Improve error messages for entity extraction to aid debugging #40
base: master
@@ -774,9 +774,9 @@ def add_url_safely(url_str):
             if cleaned_url:
                 urls.add(cleaned_url)
         except (ValueError, AttributeError) as e:
-            print(f"[*] Error processing URL {url_str}: {e}")
+            print(f"[*] Error processing URL: value={url_str!r}, error={type(e).__name__}: {e}")
         except Exception as e:
-            print(f"[*] Unexpected error with URL {url_str}: {e}")
+            print(f"[*] Unexpected error with URL: value={url_str!r}, error={type(e).__name__}: {e}")

     # Process URLs found with regex
     for url in re.findall(url_format, body, re.DOTALL):
@@ -802,16 +802,16 @@ def add_url_safely(url_str):

                 urls.add(UUF(full_url).rebuild())
             except AttributeError:
-                print("[*] AttributeError: Invalid attribute in URL")
+                print(f"[*] AttributeError: Invalid attribute in URL, value={href!r}")
             except ValueError:
-                print("[*] ValueError: Invalid URL format")
+                print(f"[*] ValueError: Invalid URL format, value={href!r}")
             except Exception as e:
-                print(f"Error parsing text with BeautifulSoup: {e}")
-                print(f"[*] Unexpected error: {e}")
+                print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
+                print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
Comment on lines +809 to +810

-                print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
-                print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
+                print(f"[*] Unexpected error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")

Copilot AI, Feb 12, 2026
These error prints include value={body!r}. Since body is the full HTML/text chunk, this can be extremely large and may leak sensitive content into logs/stdout. Consider logging only type(body), len(body), and a truncated preview (or a hash) instead of the full repr.
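One way to follow this suggestion is a small helper that reports only the type, length, a short hash, and a truncated preview of the value. The sketch below is an assumption, not code from this PR: `summarize_for_log` and its `preview_len` parameter are hypothetical names, and the choice of SHA-256 is arbitrary.

```python
import hashlib


def summarize_for_log(value, preview_len=80):
    """Return a short, bounded description of value for log output.

    Hypothetical helper: not part of the PR under review.
    """
    # Work on a string form so bytes, None, and other types are handled uniformly.
    text = value if isinstance(value, str) else repr(value)
    # A truncated digest lets two log lines be correlated without printing the content.
    digest = hashlib.sha256(text.encode("utf-8", errors="replace")).hexdigest()[:12]
    preview = text[:preview_len]
    suffix = "..." if len(text) > preview_len else ""
    return (
        f"type={type(value).__name__}, len={len(text)}, "
        f"sha256={digest}, preview={preview!r}{suffix}"
    )
```

A handler could then print `summarize_for_log(body)` instead of `{body!r}`. If even a short preview might expose sensitive content, the preview field could be dropped and only the type, length, and hash kept.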
In this BeautifulSoup link loop, the exception handlers interpolate href (e.g., value={href!r}), but href is assigned inside the try. If an exception is raised before href is set (e.g., during url.get(...)), the handler will raise UnboundLocalError and mask the original failure. Initialize href = None before the try (or use a safer fallback like printing url/full_url via locals().get(...)).
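The first fix mentioned here, binding the name before the `try`, might look like the sketch below. It is a stand-in for the real loop and makes several assumptions: `collect_hrefs` is a hypothetical function, plain dicts and `None` take the place of BeautifulSoup tags, and the scheme check exists only to trigger a `ValueError`.

```python
def collect_hrefs(tags):
    """Collect href values from tag-like objects (hypothetical stand-in for the link loop)."""
    found, errors = [], []
    for tag in tags:
        # Bind href before the try so every handler can reference it safely.
        href = None
        try:
            href = tag.get("href")  # AttributeError if tag has no .get (e.g. None)
            if not href.startswith("http"):  # AttributeError if href is None
                raise ValueError("unsupported scheme")
            found.append(href)
        except (AttributeError, ValueError) as e:
            # href is always bound here; it is None when the failure happened early.
            errors.append(f"[*] {type(e).__name__}: value={href!r}, error={e}")
    return found, errors


found, errors = collect_hrefs(
    [{"href": "https://example.com"}, None, {"href": "ftp://x"}, {}]
)
```

Resetting `href` at the top of each iteration also prevents a handler from reporting a stale value left over from the previous link.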