Improve error messages for entity extraction to aid debugging by devbm7 · Pull Request #40 · junquera/restalker

devbm7 · 2025-07-21T10:55:49Z

Enhanced all error print statements for URL/entity extraction (Skype, Discord, Telegram, WhatsApp, etc.) to include:
- The entity type being processed
- The problematic value that caused the error
- The exception type and message

This makes errors much more descriptive and actionable for debugging, addressing the open issue about insufficient error context.
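Here is a minimal sketch of the message shape described above. It assumes a single shared formatter instead of a hand-written f-string at every call site. The names format_extraction_error and extract_entity_url are hypothetical and are not part of restalker's API. urllib.parse.urlsplit stands in for restalker's own URL handling:

```python
from urllib.parse import urlsplit


def format_extraction_error(entity_type, value, exc):
    """Build one error line with the entity type, offending value, and exception."""
    return f"[*] {entity_type} extraction failed: value={value!r}, error={type(exc).__name__}: {exc}"


def extract_entity_url(entity_type, value):
    """Hypothetical per-entity step: normalize a URL, reporting failures with context."""
    try:
        # urlsplit stands in for restalker's own URL parsing; it raises
        # ValueError on malformed input such as an unclosed IPv6 bracket.
        return urlsplit(value).geturl()
    except ValueError as e:
        print(format_extraction_error(entity_type, value, e))
        return None
```

For example, extract_entity_url("Discord", "http://[::1") returns None. It also prints one line that names the entity type, the raw value, and the exception type and message, which covers the three fields listed above.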

This would help with #39


Copilot

Pull request overview

This PR enhances stdout error messages emitted during URL/entity extraction to include the problematic value and the exception type/message, improving debugging context as requested in issue #39.

Changes:

- Add value=... and error=<ExceptionType>: <message> context to URL parsing error prints in extract_links.
- Add similar contextual information to entity-specific URL extraction errors (I2P/Tor/WhatsApp/Discord/Telegram/Skype) in _analyze_chunk.
- Expand BeautifulSoup parsing error prints to include the relevant value being processed.


Copilot · 2026-02-12T10:25:07Z

restalker/restalker.py

```diff
                                urls.add(UUF(full_url).rebuild())
                        except AttributeError:
-                            print("[*] AttributeError: Invalid attribute in URL")
+                            print(f"[*] AttributeError: Invalid attribute in URL, value={href!r}")
                        except ValueError:
-                            print("[*] ValueError: Invalid URL format")
+                            print(f"[*] ValueError: Invalid URL format, value={href!r}")
```


In this BeautifulSoup link loop, the exception handlers interpolate href (e.g., value={href!r}), but href is assigned inside the try. If an exception is raised before href is set (e.g., during url.get(...)), the handler will raise UnboundLocalError and mask the original failure. Initialize href = None before the try (or use a safer fallback like printing url/full_url via locals().get(...)).
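Here is a sketch of the suggested fix, assuming a simplified loop. collect_links is a hypothetical stand-in for the loop in extract_links. The anchors are plain objects exposing .get("href"), and urllib.parse.urljoin replaces restalker's UUF, which is not reproduced here:

```python
from urllib.parse import urljoin


def collect_links(anchors, base_url):
    """Hypothetical stand-in for the BeautifulSoup link loop in extract_links."""
    urls = set()
    for url in anchors:
        href = None  # bound before the try, so every handler below can read it
        try:
            href = url.get("href")  # may raise AttributeError before href is reassigned
            full_url = urljoin(base_url, href)  # may raise ValueError on malformed URLs
            urls.add(full_url)
        except AttributeError:
            print(f"[*] AttributeError: Invalid attribute in URL, value={href!r}")
        except ValueError:
            print(f"[*] ValueError: Invalid URL format, value={href!r}")
    return urls
```

Suppose an anchor is not a tag at all, so url.get(...) itself fails. The handler then reports value=None instead of raising UnboundLocalError on top of the original exception.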

Copilot · 2026-02-12T10:25:08Z

restalker/restalker.py

```diff
+            print(f"[*] TypeError: Invalid input type for BeautifulSoup, value={body!r}")
        except Exception as e:
-            print(f"[*] Error with HTML parsing: {e}")
+            print(f"[*] Error with HTML parsing: value={body!r}, error={type(e).__name__}: {e}")
```


These error prints include value={body!r}. Since body is the full HTML/text chunk, this can be extremely large and may leak sensitive content into logs/stdout. Consider logging only type(body), len(body), and a truncated preview (or a hash) instead of the full repr.
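Here is a minimal sketch of the summary this comment proposes. describe_body is a hypothetical helper, and the 12-character SHA-256 prefix and 60-character preview are arbitrary choices. Setting preview_len=0 drops the content entirely when even a preview could leak sensitive data:

```python
import hashlib


def describe_body(body, preview_len=60):
    """Summarize a possibly huge HTML/text chunk for an error message."""
    if not isinstance(body, (str, bytes)):
        # Non-text input (e.g. the TypeError path): report only the type
        return f"type={type(body).__name__}"
    raw = body.encode("utf-8", errors="replace") if isinstance(body, str) else body
    # A short hash lets repeated failures on the same chunk be correlated
    digest = hashlib.sha256(raw).hexdigest()[:12]
    parts = [f"type={type(body).__name__}", f"len={len(body)}", f"sha256={digest}"]
    if preview_len > 0:
        preview = body[:preview_len]
        more = "..." if len(body) > preview_len else ""
        parts.append(f"preview={preview!r}{more}")
    return ", ".join(parts)
```

A handler would then print something like f"[*] Error with HTML parsing: {describe_body(body)}, error={type(e).__name__}: {e}" instead of the full repr.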

Copilot · 2026-02-12T10:25:08Z

restalker/restalker.py

```diff
+                            print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
+                            print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
```


This exception path prints two separate lines for the same failure and uses a different prefix format than the other error messages. Consider consolidating to a single, consistently-formatted message to reduce noisy/duplicated output while preserving the new context fields.

Suggested change

```diff
-print(f"Error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
-print(f"[*] Unexpected error: value={href!r}, error={type(e).__name__}: {e}")
+print(f"[*] Unexpected error parsing text with BeautifulSoup: value={href!r}, error={type(e).__name__}: {e}")
```
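Here is a sketch of the suggested consolidation in context. parse_fragment is a hypothetical wrapper, and its parse argument is a placeholder for the real BeautifulSoup call, not restalker's code. Any failure produces exactly one line with the same "[*]" prefix as the other handlers:

```python
def parse_fragment(href, parse):
    """Hypothetical wrapper around the BeautifulSoup text-parsing step."""
    try:
        return parse(href)
    except Exception as e:
        # One consistently prefixed line per failure, replacing the two separate prints
        print(
            f"[*] Unexpected error parsing text with BeautifulSoup: "
            f"value={href!r}, error={type(e).__name__}: {e}"
        )
        return None
```

If int is passed as a stand-in parser, parse_fragment("42", int) returns 42. parse_fragment("not a number", int) returns None and prints a single line that ends with the ValueError message.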

CarlosLannister requested a review from Copilot February 12, 2026 10:22

Copilot started reviewing on behalf of CarlosLannister February 12, 2026 10:22

Copilot AI reviewed Feb 12, 2026

devbm7 wants to merge 1 commit into junquera:master from devbm7:master
