🧪 Add error path test for extract_text in PDF extraction #41
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Pull request overview
Adds a Rust unit test covering the error path for PDF text extraction when the provided PDF file path does not exist.
Changes:
- Add a new unit test asserting `extract_text` returns an `Err` for a missing PDF file path.
- Add repository artifacts (`*.orig`, `patch.diff`, and `test_plan.sh`) alongside the test change.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| rust/crates/tools/src/pdf_extract.rs | Adds a new unit test for the “missing file” error path in extract_text. |
| rust/crates/tools/src/pdf_extract.rs.orig | Adds a full duplicate/backup copy of the PDF extraction module. |
| patch.diff | Adds a standalone patch artifact duplicating the code change. |
| test_plan.sh | Adds a script that runs cargo test --workspace from the rust/ directory. |
    let path = Path::new("/this/path/does/not/exist.pdf");
    let result = extract_text(path);
extract_text_returns_error_for_nonexistent_file uses a hard-coded absolute path (/this/path/does/not/exist.pdf). This can make the test flaky if that path exists in some environments. Prefer constructing a guaranteed-missing path under std::env::temp_dir() (e.g., add a unique/random component) and asserting on that instead.
    - let path = Path::new("/this/path/does/not/exist.pdf");
    - let result = extract_text(path);
    + let unique = format!(
    +     "clawd-pdf-extract-missing-{}-{}.pdf",
    +     std::process::id(),
    +     std::time::SystemTime::now()
    +         .duration_since(std::time::UNIX_EPOCH)
    +         .unwrap()
    +         .as_nanos()
    + );
    + let path = std::env::temp_dir().join(unique);
    + let _ = std::fs::remove_file(&path);
    + let result = extract_text(path.as_path());
    //! Minimal PDF text extraction.
    //!
    //! Reads a PDF file, locates `/Contents` stream objects, decompresses with
    //! flate2 when the stream uses `/FlateDecode`, and extracts text operators
    //! found between `BT` / `ET` markers.

    use std::io::Read as _;
    use std::path::Path;

    /// Extract all readable text from a PDF file.
    ///
    /// Returns the concatenated text found inside BT/ET operators across all
    /// content streams. Non-text pages or encrypted PDFs yield an empty string
    /// rather than an error.
    pub fn extract_text(path: &Path) -> Result<String, String> {
        let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
        Ok(extract_text_from_bytes(&data))
    }
This .orig file appears to be a full duplicate/backup of pdf_extract.rs and is not referenced anywhere in the repo. Keeping it in-tree increases maintenance burden and risks it going stale/confusing future edits; it should be removed from the PR.
    --- rust/crates/tools/src/pdf_extract.rs
    +++ rust/crates/tools/src/pdf_extract.rs
    @@ -404,4 +404,12 @@
             // cleanup
             let _ = std::fs::remove_dir_all(&dir);
         }
    +
    +    #[test]
    +    fn extract_text_returns_error_for_nonexistent_file() {
    +        let path = Path::new("/this/path/does/not/exist.pdf");
    +        let result = extract_text(path);
    +        assert!(result.is_err());
    +        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
    +    }
     }
patch.diff looks like a generated patch artifact and is not referenced by the build/tests. Committing it adds noise and can confuse future changes; please remove it from the PR (or place it under a docs/fixtures location if it’s intentionally needed, with a reference).
    cd rust
    cargo test --workspace
This new script isn’t referenced by CI or repository documentation, and it duplicates the existing Rust CI job (cargo test --workspace). If it’s not required by an external system, please remove it to avoid repo clutter; if it is required, add a brief reference to where/how it’s used.
    - cd rust
    - cargo test --workspace
    + # Deprecated: this wrapper duplicated the repository Rust CI test command.
    + # Run `cd rust && cargo test --workspace` directly instead.
@claude code review

@codex code review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdac3c1cf8
    #[test]
    fn extract_text_returns_error_for_nonexistent_file() {
        let path = Path::new("/this/path/does/not/exist.pdf");
Make nonexistent-path test independent of host filesystem
This test assumes /this/path/does/not/exist.pdf is always absent, but if that path exists in any runner image or local environment, extract_text can return Ok(_) and the assertion will fail despite correct behavior. That makes the test environment-dependent and potentially flaky; use a temp-directory path that is guaranteed not to exist for the duration of the test instead of a fixed absolute path.
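One way to get a path that is reliably absent is to create a brand-new, empty directory under the system temp dir and point at a file name inside it. The sketch below is only an illustration under stated assumptions: `fresh_empty_dir` is a hypothetical helper, and `extract_text` here is a stand-in that copies just the read-and-map-error line from the module shown in this PR (the real byte-to-text step is replaced by a placeholder).

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for the crate's `extract_text` (assumption): only the
/// read-and-map-error line mirrors the code shown in this PR.
fn extract_text(path: &Path) -> Result<String, String> {
    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
    // Placeholder for the real PDF parsing step.
    Ok(String::from_utf8_lossy(&data).into_owned())
}

/// Hypothetical helper: create a new, empty directory under the system
/// temp dir. `create_dir` fails if the directory already exists, so a
/// successful call means the directory was created here and is empty.
fn fresh_empty_dir(tag: &str) -> PathBuf {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    loop {
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let dir = std::env::temp_dir().join(format!("{tag}-{}-{n}", std::process::id()));
        match std::fs::create_dir(&dir) {
            Ok(()) => return dir,
            // Name already taken: try the next counter value.
            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => continue,
            Err(e) => panic!("could not create temp dir {}: {e}", dir.display()),
        }
    }
}

fn main() {
    let dir = fresh_empty_dir("pdf-extract-missing");
    // Nothing has been written into `dir`, so this file is not there.
    let missing = dir.join("missing.pdf");
    let result = extract_text(&missing);
    // Clean up before asserting so a failure does not leak the directory.
    let _ = std::fs::remove_dir_all(&dir);
    let err = result.expect_err("reading a missing file should fail");
    assert!(err.starts_with("failed to read PDF:"));
}
```

Compared with a timestamp-based file name, this relies on `create_dir` refusing to reuse an existing directory rather than on the name being unlikely to collide. In the actual test module the body of `main` would presumably move into the `#[test]` function.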
@copilot, make changes based on the above suggestion.
    @@ -0,0 +1,548 @@
    //! Minimal PDF text extraction.
Remove committed backup snapshot from source tree
Checking in pdf_extract.rs.orig adds a full duplicate of the production module, which is not part of the build and can silently drift from pdf_extract.rs, creating confusion in repo-wide search/review and unnecessary maintenance overhead. This looks like an editor/merge artifact and should be removed (or prevented via ignore rules) to keep the crate sources unambiguous.
@copilot, make changes based on the above suggestion.
@gemini-code-assist, what are the steps and line numbers to make this fix?

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
🎯 What: Adds an error path test for `extract_text` to cover the scenario where a non-existent PDF path is provided.

📊 Coverage: Specifically tests that `extract_text` correctly bubbles up file access errors as `Err(String)` and verifies the error message format.

✨ Result: Improved test coverage on error paths for PDF extraction capabilities within `tools`.

PR created automatically by Jules for task 3223951371666115144 started by @badMade