Improve error messages for entity extraction to aid debugging #40

Open
devbm7 wants to merge 1 commit into junquera:master from devbm7:master

devbm7 commented Jul 21, 2025

  • Enhanced all error print statements for URL/entity extraction (Skype, Discord, Telegram, WhatsApp, etc.) to include:
    • The entity type being processed
    • The problematic value that caused the error
    • The exception type and message
  • This makes errors much more descriptive and actionable for debugging, addressing the open issue about insufficient error context.

This would help with #39


Copilot AI left a comment

Pull request overview

This PR enhances stdout error messages emitted during URL/entity extraction to include the problematic value and the exception type/message, improving debugging context as requested in issue #39.

Changes:

  • Add value=... and error=<ExceptionType>: <message> context to URL parsing error prints in extract_links.
  • Add similar contextual information to entity-specific URL extraction errors (I2P/Tor/WhatsApp/Discord/Telegram/Skype) in _analyze_chunk.
  • Expand BeautifulSoup parsing error prints to include the relevant value being processed.
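
The message shape listed above (entity type, offending value, exception type and message) can be sketched as a small formatter. This is only an illustration: `format_extraction_error` is a hypothetical helper that does not appear in the PR, which edits the individual `print` calls directly.

```python
def format_extraction_error(entity: str, value: object, exc: BaseException) -> str:
    """Build one error line carrying the entity type, the offending value,
    and the exception type and message (hypothetical helper, not from the PR)."""
    return (
        f"[*] {entity} extraction error: "
        f"value={value!r}, error={type(exc).__name__}: {exc}"
    )


try:
    # Stand-in failure; the real code would raise while parsing a URL.
    raise ValueError("Invalid URL format")
except ValueError as exc:
    message = format_extraction_error("Telegram", "t.me/??", exc)
    print(message)
```

Routing every handler through one formatter would also keep the `[*]` prefix and field order identical across entity types.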

Comment on lines 803 to +807
      urls.add(UUF(full_url).rebuild())
  except AttributeError:
-     print("[*] AttributeError: Invalid attribute in URL")
+     print(f"[*] AttributeError: Invalid attribute in URL, value={href!r}")
  except ValueError:
-     print("[*] ValueError: Invalid URL format")
+     print(f"[*] ValueError: Invalid URL format, value={href!r}")
Copilot AI Feb 12, 2026

In this BeautifulSoup link loop, the exception handlers interpolate href (e.g., value={href!r}), but href is assigned inside the try. If an exception is raised before href is set (e.g., during url.get(...)), the handler will raise UnboundLocalError and mask the original failure. Initialize href = None before the try (or use a safer fallback like printing url/full_url via locals().get(...)).
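
A minimal sketch of the suggested fix, assuming a simplified loop: `collect_hrefs` is a hypothetical function, and plain dicts stand in for BeautifulSoup tags (both expose `.get`). Binding `href` before the `try` means the handler can always interpolate it, even when the failure happens before the assignment.

```python
def collect_hrefs(anchors):
    """Collect href values, recording a contextual message for each failure.

    Hypothetical stand-in for the link loop; not the project's actual code.
    """
    found, errors = [], []
    for anchor in anchors:
        href = None  # bound before the try, so the handler never hits UnboundLocalError
        try:
            href = anchor.get("href")   # may raise before href is reassigned
            found.append(href.strip())  # raises AttributeError when href is None
        except AttributeError as e:
            errors.append(
                f"[*] AttributeError: Invalid attribute in URL, "
                f"value={href!r}, error={type(e).__name__}: {e}"
            )
    return found, errors


# None fails inside anchor.get(...); {} yields href=None and fails on .strip().
found, errors = collect_hrefs([{"href": " https://example.com "}, None, {}])
for line in errors:
    print(line)
```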

Copilot uses AI. Check for mistakes.
Comment on lines +812 to +814
+     print(f"[*] TypeError: Invalid input type for BeautifulSoup, value={body!r}")
  except Exception as e:
-     print(f"[*] Error with HTML parsing: {e}")
+     print(f"[*] Error with HTML parsing: value={body!r}, error={type(e).__name__}: {e}")
Copilot AI Feb 12, 2026

These error prints include value={body!r}. Since body is the full HTML/text chunk, this can be extremely large and may leak sensitive content into logs/stdout. Consider logging only type(body), len(body), and a truncated preview (or a hash) instead of the full repr.
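
One way to follow this suggestion is to log a bounded summary instead of the full repr. The sketch below assumes a hypothetical helper `summarize_value` and an arbitrary 80-character preview limit; neither comes from the PR.

```python
import hashlib


def summarize_value(value, limit=80):
    """Describe a value by type, length, short hash, and truncated preview.

    Hypothetical helper; non-string values are summarized via their repr,
    so len= then refers to the length of that repr.
    """
    text = value if isinstance(value, str) else repr(value)
    preview = text[:limit] + ("..." if len(text) > limit else "")
    # A short digest lets two log lines be matched without exposing the content.
    digest = hashlib.sha256(text.encode("utf-8", errors="replace")).hexdigest()[:12]
    return (
        f"type={type(value).__name__}, len={len(text)}, "
        f"sha256={digest}, preview={preview!r}"
    )


body = "<html>" + "x" * 500 + "</html>"
summary = summarize_value(body)
print(f"[*] Error with HTML parsing: {summary}")
```

The preview can still expose the first characters of sensitive input, so dropping it and keeping only the type, length, and hash may be preferable in some deployments.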

Comment on lines +809 to +810
+     print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
+     print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
Copilot AI Feb 12, 2026

This exception path prints two separate lines for the same failure and uses a different prefix format than the other error messages. Consider consolidating to a single, consistently-formatted message to reduce noisy/duplicated output while preserving the new context fields.

Suggested change
-     print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
-     print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
+     print(f"[*] Unexpected error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")

