🧪 Add error path test for extract_text in PDF extraction #41

Open
badMade wants to merge 1 commit into main from testing-improvement/extract_text_error_path-3223951371666115144

badMade commented Apr 24, 2026

Owner

🎯 What: Adds an error path test for extract_text to cover the scenario where a non-existent PDF path is provided.
📊 Coverage: Specifically tests that extract_text correctly bubbles up file access errors as Err(String) and verifies the error message format.
Result: Improved test coverage of the error paths for PDF extraction in the tools crate.
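
For reference, a minimal sketch of the error path described above. It assumes `extract_text` wraps `std::fs::read` the way the snippet quoted later in this thread does; `extract_text_from_bytes` here is a hypothetical stand-in (not the crate's real parser), and the temp-dir file name is made up for illustration.

```rust
use std::path::Path;

/// Hypothetical stand-in for the crate's parser so the sketch compiles alone.
fn extract_text_from_bytes(data: &[u8]) -> String {
    String::from_utf8_lossy(data).into_owned()
}

/// Mirrors the wrapper quoted in this PR: a read failure becomes `Err(String)`
/// whose message starts with a fixed prefix followed by the OS error text.
fn extract_text(path: &Path) -> Result<String, String> {
    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
    Ok(extract_text_from_bytes(&data))
}

#[test]
fn read_failure_is_reported_with_stable_prefix() {
    // Assumed-absent file name (illustrative only); removed first to be safe.
    let missing = std::env::temp_dir()
        .join(format!("extract-text-sketch-{}.pdf", std::process::id()));
    let _ = std::fs::remove_file(&missing);

    let err = extract_text(&missing).unwrap_err();
    // Only the prefix is asserted: the OS-specific tail differs per platform.
    assert!(err.starts_with("failed to read PDF:"));
}
```

Asserting on the prefix rather than the whole string keeps the check independent of the platform's wording for "file not found".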


PR created automatically by Jules for task 3223951371666115144 started by @badMade

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Copilot AI review requested due to automatic review settings April 24, 2026 03:39
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot AI left a comment

Pull request overview

Adds a Rust unit test covering the error path for PDF text extraction when the provided PDF file path does not exist.

Changes:

  • Add a new unit test asserting extract_text returns an Err for a missing PDF file path.
  • Add repository artifacts (*.orig, patch.diff, and test_plan.sh) alongside the test change.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File | Description
rust/crates/tools/src/pdf_extract.rs | Adds a new unit test for the “missing file” error path in extract_text.
rust/crates/tools/src/pdf_extract.rs.orig | Adds a full duplicate/backup copy of the PDF extraction module.
patch.diff | Adds a standalone patch artifact duplicating the code change.
test_plan.sh | Adds a script that runs cargo test --workspace from the rust/ directory.

Comment on lines +551 to +552
let path = Path::new("/this/path/does/not/exist.pdf");
let result = extract_text(path);

Copilot AI Apr 24, 2026

extract_text_returns_error_for_nonexistent_file uses a hard-coded absolute path (/this/path/does/not/exist.pdf). This can make the test flaky if that path exists in some environments. Prefer constructing a guaranteed-missing path under std::env::temp_dir() (e.g., add a unique/random component) and asserting on that instead.

Suggested change
- let path = Path::new("/this/path/does/not/exist.pdf");
- let result = extract_text(path);
+ let unique = format!(
+     "clawd-pdf-extract-missing-{}-{}.pdf",
+     std::process::id(),
+     std::time::SystemTime::now()
+         .duration_since(std::time::UNIX_EPOCH)
+         .unwrap()
+         .as_nanos()
+ );
+ let path = std::env::temp_dir().join(unique);
+ let _ = std::fs::remove_file(&path);
+ let result = extract_text(path.as_path());
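
One way to package the suggestion above is a small helper, sketched here under assumptions: `missing_pdf_path` and its `tag` parameter are hypothetical names, not part of the crate, and the sketch calls `std::fs::read` directly because that is the call `extract_text` wraps.

```rust
use std::path::PathBuf;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: builds a file name under the temp dir that is very
/// unlikely to exist, from the process id and a nanosecond timestamp.
fn missing_pdf_path(tag: &str) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0); // clock set before the epoch: fall back to 0, no panic
    let name = format!("{tag}-{}-{nanos}.pdf", std::process::id());
    std::env::temp_dir().join(name)
}

#[test]
fn generated_path_is_missing() {
    let path = missing_pdf_path("pdf-extract-missing");
    // Best-effort removal in case a stale file with this name is present.
    let _ = std::fs::remove_file(&path);
    assert!(!path.exists());
    // `std::fs::read` is what `extract_text` calls, so it must fail here.
    assert!(std::fs::read(&path).is_err());
}
```

Compared with the inline version, the helper avoids the `unwrap()` on `duration_since` and can be reused by other missing-file tests.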

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +19
//! Minimal PDF text extraction.
//!
//! Reads a PDF file, locates `/Contents` stream objects, decompresses with
//! flate2 when the stream uses `/FlateDecode`, and extracts text operators
//! found between `BT` / `ET` markers.

use std::io::Read as _;
use std::path::Path;

/// Extract all readable text from a PDF file.
///
/// Returns the concatenated text found inside BT/ET operators across all
/// content streams. Non-text pages or encrypted PDFs yield an empty string
/// rather than an error.
pub fn extract_text(path: &Path) -> Result<String, String> {
    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
    Ok(extract_text_from_bytes(&data))
}

Copilot AI Apr 24, 2026

This .orig file appears to be a full duplicate/backup of pdf_extract.rs and is not referenced anywhere in the repo. Keeping it in-tree increases maintenance burden and risks it going stale/confusing future edits; it should be removed from the PR.

Suggested change
//! Minimal PDF text extraction.
//!
//! Reads a PDF file, locates `/Contents` stream objects, decompresses with
//! flate2 when the stream uses `/FlateDecode`, and extracts text operators
//! found between `BT` / `ET` markers.
use std::io::Read as _;
use std::path::Path;
/// Extract all readable text from a PDF file.
///
/// Returns the concatenated text found inside BT/ET operators across all
/// content streams. Non-text pages or encrypted PDFs yield an empty string
/// rather than an error.
pub fn extract_text(path: &Path) -> Result<String, String> {
    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
    Ok(extract_text_from_bytes(&data))
}

Comment thread patch.diff
Comment on lines +1 to +15
--- rust/crates/tools/src/pdf_extract.rs
+++ rust/crates/tools/src/pdf_extract.rs
@@ -404,4 +404,12 @@
         // cleanup
         let _ = std::fs::remove_dir_all(&dir);
     }
+
+    #[test]
+    fn extract_text_returns_error_for_nonexistent_file() {
+        let path = Path::new("/this/path/does/not/exist.pdf");
+        let result = extract_text(path);
+        assert!(result.is_err());
+        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
+    }
 }

Copilot AI Apr 24, 2026

patch.diff looks like a generated patch artifact and is not referenced by the build/tests. Committing it adds noise and can confuse future changes; please remove it from the PR (or place it under a docs/fixtures location if it’s intentionally needed, with a reference).

Suggested change
--- rust/crates/tools/src/pdf_extract.rs
+++ rust/crates/tools/src/pdf_extract.rs
@@ -404,4 +404,12 @@
         // cleanup
         let _ = std::fs::remove_dir_all(&dir);
     }
+
+    #[test]
+    fn extract_text_returns_error_for_nonexistent_file() {
+        let path = Path::new("/this/path/does/not/exist.pdf");
+        let result = extract_text(path);
+        assert!(result.is_err());
+        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
+    }
 }

Comment thread test_plan.sh
Comment on lines +2 to +3
cd rust
cargo test --workspace

Copilot AI Apr 24, 2026

This new script isn’t referenced by CI or repository documentation, and it duplicates the existing Rust CI job (cargo test --workspace). If it’s not required by an external system, please remove it to avoid repo clutter; if it is required, add a brief reference to where/how it’s used.

Suggested change
- cd rust
- cargo test --workspace
+ # Deprecated: this wrapper duplicated the repository Rust CI test command.
+ # Run `cd rust && cargo test --workspace` directly instead.

badMade marked this pull request as ready for review April 26, 2026 21:38

badMade commented Apr 27, 2026

Owner Author

@claude code review

badMade commented Apr 27, 2026

Owner Author

@codex code review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdac3c1cf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


#[test]
fn extract_text_returns_error_for_nonexistent_file() {
    let path = Path::new("/this/path/does/not/exist.pdf");

P2: Make nonexistent-path test independent of host filesystem

This test assumes /this/path/does/not/exist.pdf is always absent, but if that path exists in any runner image or local environment, extract_text can return Ok(_) and the assertion will fail despite correct behavior. That makes the test environment-dependent and potentially flaky; use a temp-directory path that is guaranteed not to exist for the duration of the test instead of a fixed absolute path.
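
A minimal sketch of the temp-directory approach, under stated assumptions: `fresh_dir` is a hypothetical helper (not in the crate), the directory name is illustrative, and the guarantee only holds as long as nothing else writes into that directory while the test runs. It calls `std::fs::read` directly, the call that `extract_text` wraps.

```rust
use std::path::PathBuf;

/// Hypothetical helper: creates a brand-new private directory under the temp
/// dir. `create_dir` fails if the directory already exists, so on success the
/// directory is known to be empty.
fn fresh_dir(tag: &str) -> std::io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!("{tag}-{}", std::process::id()));
    // Clear leftovers from an earlier run that happened to reuse this pid.
    let _ = std::fs::remove_dir_all(&dir);
    std::fs::create_dir(&dir)?;
    Ok(dir)
}

#[test]
fn file_inside_fresh_dir_is_missing() {
    let dir = fresh_dir("pdf-extract-missing").expect("create temp dir");
    let path = dir.join("absent.pdf");

    // Nothing has written into `dir`, so reading the file must fail.
    assert!(std::fs::read(&path).is_err());

    // cleanup
    let _ = std::fs::remove_dir_all(&dir);
}
```

Because the test owns the directory, the missing file no longer depends on what a runner image happens to contain at a fixed absolute path.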

Owner Author

@copilot, make changes based on the above suggestion.

@@ -0,0 +1,548 @@
//! Minimal PDF text extraction.

P3: Remove committed backup snapshot from source tree

Checking in pdf_extract.rs.orig adds a full duplicate of the production module, which is not part of the build and can silently drift from pdf_extract.rs, creating confusion in repo-wide search/review and unnecessary maintenance overhead. This looks like an editor/merge artifact and should be removed (or prevented via ignore rules) to keep the crate sources unambiguous.

Owner Author

@copilot, make changes based on the above suggestion.

badMade commented Apr 30, 2026

Owner Author

@gemini-code-assist, what are the steps and line numbers to make this fix?

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

2 participants