Updated URLScan for archive URLs. Fixes #84 by RajuKoushik · Pull Request #94 · aboutcode-org/scancode-server

RajuKoushik · 2017-08-23T19:40:36Z

Signed-off-by: rajukoushik <g.rajukoushik@gmail.com>

JonoYang

Sorry for the delay. I made a few comments regarding your code. Thanks for the contribution!

JonoYang · 2017-09-27T01:15:13Z

+    Create and save a file at `path` present at `url` using `scan_id` and bare `path` and
+    `file_name` and apply the scan.
+    """
+    r = requests.get(url)


Why is path repeated here?

JonoYang · 2017-09-27T01:21:44Z

+    url_parse = urlparse(url)
+    os.chdir(path)
+
+    if r.status_code == 200:


scancode already ships extractcode, which can extract many different archive types and may be helpful here. For example: https://github.com/nexB/scancode-toolkit/blob/develop/src/extractcode/extract.py#L101
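
To illustrate the extraction step being discussed, here is a minimal sketch that uses the standard library's `shutil.unpack_archive` as a stand-in. It is not the extractcode API, and the helper name `extract_archive` is hypothetical. It only handles the formats `shutil` knows about (zip and the common tar variants), whereas extractcode covers more.

```python
import os
import shutil


def extract_archive(archive_path, target_dir):
    """Extract the archive at `archive_path` into `target_dir`.

    Hypothetical helper: a stdlib stand-in for a real extractor such as
    extractcode. The archive format is inferred from the file extension.
    """
    # Create the target directory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    # Raises shutil.ReadError if the extension is not a registered format.
    shutil.unpack_archive(archive_path, target_dir)
    return target_dir
```

Passing the target directory explicitly would also avoid the `os.chdir(path)` call in the diff, since nothing then depends on the process's working directory.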

JonoYang · 2017-09-27T01:23:19Z

+                scan_directory = None
+                scan_id = create_scan_id(user, url, scan_directory, scan_start_time)
+                current_scan = Scan.objects.get(pk=scan_id)
+                path = '/'.join([path, '{}'.format(current_scan.pk)])


Use os.path.join() to ensure consistency when joining paths
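
A small sketch of that suggestion, assuming the goal is one directory per scan named after the scan's primary key. The helper name `scan_directory_for` is hypothetical and not part of the PR.

```python
import os


def scan_directory_for(base_path, scan_pk):
    """Return the per-scan directory under `base_path`.

    Hypothetical helper: os.path.join picks the right separator for the
    platform and does not double it when `base_path` ends with one.
    """
    # str() replaces the '{}'.format(...) call used in the diff.
    return os.path.join(base_path, str(scan_pk))
```

In the diff this would replace `'/'.join([path, '{}'.format(current_scan.pk)])` with `os.path.join(path, str(current_scan.pk))`.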

JonoYang · 2017-09-27T01:48:13Z

+                for i in allowed_exts:
+                    if url_parse.path.endswith(i):
+                        is_zip_url = True
+            finally:


Why is try-finally used?
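
For comparison, a minimal sketch of the same extension check without `try`/`finally`. The contents of `ALLOWED_EXTS` are an assumption, since the PR's `allowed_exts` list is not shown in this excerpt, and `is_archive_url` is a hypothetical name. `str.endswith` accepts a tuple of suffixes, so the explicit loop and flag are not needed.

```python
from urllib.parse import urlparse

# Assumed values: the PR's actual allowed_exts list is not visible here.
ALLOWED_EXTS = ('.zip', '.tar', '.tar.gz')


def is_archive_url(url, allowed_exts=ALLOWED_EXTS):
    """Return True if the path component of `url` ends with an allowed extension."""
    # Only the path is checked, so query strings and fragments are ignored.
    # Lowercasing is an addition to the PR's logic, to accept e.g. ".ZIP".
    path = urlparse(url).path.lower()
    return path.endswith(tuple(allowed_exts))
```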

JonoYang · 2017-09-27T02:05:44Z

+            is_zip_url = False
+
+            try:
+                for i in allowed_exts:


We may already have some code that identifies whether or not a file is an archive. I will ask @pombredanne.
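
Until that is confirmed, one way to detect archives by content rather than by URL extension is sketched below using only the standard library. This is not scancode's detection code, `looks_like_archive` is a hypothetical name, and it only recognizes zip and tar files.

```python
import tarfile
import zipfile


def looks_like_archive(location):
    """Return True if the file at `location` is a zip or tar archive.

    Hypothetical helper: inspects the file's content, so it still works
    when the download URL has no extension or a misleading one.
    """
    return zipfile.is_zipfile(location) or tarfile.is_tarfile(location)
```

A check like this would run after the download, on the saved file, instead of on the URL string.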

Updated URLScan for archive URLs. Fixes aboutcode-org#84

26b48a1

Signed-off-by: rajukoushik <g.rajukoushik@gmail.com>

JonoYang suggested changes Sep 27, 2017

singh1114 mentioned this pull request Jan 31, 2018

Extending the scancode API #95

Open

RajuKoushik wants to merge 1 commit into aboutcode-org:develop from RajuKoushik:zipurl