Skip to content
This repository was archived by the owner on Feb 12, 2021. It is now read-only.

Updated URLScan for archive URLs. Fixes #84#94

Open
RajuKoushik wants to merge 1 commit into
aboutcode-org:developfrom
RajuKoushik:zipurl
Open

Updated URLScan for archive URLs. Fixes #84#94
RajuKoushik wants to merge 1 commit into
aboutcode-org:developfrom
RajuKoushik:zipurl

Conversation

@RajuKoushik
Copy link
Copy Markdown
Contributor

Signed-off-by: rajukoushik g.rajukoushik@gmail.com

Signed-off-by: rajukoushik <g.rajukoushik@gmail.com>
Copy link
Copy Markdown
Member

@JonoYang JonoYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I made a few comments regarding your code. Thanks for the contribution!

Comment thread scanapp/tasks.py
Create and save a file at `path` present at `url` using `scan_id` and bare `path` and
`file_name` and apply the scan.
"""
r = requests.get(url)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is path repeated here?

Comment thread scanapp/tasks.py
url_parse = urlparse(url)
os.chdir(path)

if r.status_code == 200:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have code from extractcode in scancode that can do extractions on different archive types which may be helpful, for example https://github.com/nexB/scancode-toolkit/blob/develop/src/extractcode/extract.py#L101

Comment thread scanapp/views.py
scan_directory = None
scan_id = create_scan_id(user, url, scan_directory, scan_start_time)
current_scan = Scan.objects.get(pk=scan_id)
path = '/'.join([path, '{}'.format(current_scan.pk)])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use os.path.join() to ensure consistency when joining paths

Comment thread scanapp/views.py
for i in allowed_exts:
if url_parse.path.endswith(i):
is_zip_url = True
finally:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is try-finally used?

Comment thread scanapp/views.py
is_zip_url = False

try:
for i in allowed_exts:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have some code that identify whether or not files are archives or not. I will ask @pombredanne

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants