🧪 Add error path test for `extract_text` in PDF extraction by badMade · Pull Request #41 · badMade/claw-code

badMade · 2026-04-24T03:39:42Z

🎯 What: Adds an error path test for extract_text to cover the scenario where a non-existent PDF path is provided.
📊 Coverage: Specifically tests that extract_text correctly bubbles up file access errors as Err(String) and verifies the error message format.
✨ Result: Improved test coverage on error paths for PDF extraction capabilities within tools.
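
As a rough illustration of the error path described above, the sketch below isolates the `map_err` plus `?` pattern that turns an I/O failure into `Err(String)`. The names `describe_read_error` and `byte_count` are hypothetical and are not part of the crate; only the `"failed to read PDF: "` prefix comes from the code quoted in this PR. Passing the I/O result in directly lets the error branch be exercised without touching the filesystem.

```rust
use std::io;

/// Hypothetical helper: applies the same prefix that `extract_text` uses
/// when it maps an I/O error to a `String`.
fn describe_read_error(e: io::Error) -> String {
    format!("failed to read PDF: {e}")
}

/// Hypothetical stand-in for the read step: takes the I/O result directly
/// so the error path can be exercised without touching the filesystem.
fn byte_count(read_result: io::Result<Vec<u8>>) -> Result<usize, String> {
    // `?` returns early with the mapped `Err(String)` on failure.
    let data = read_result.map_err(describe_read_error)?;
    Ok(data.len())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn io_error_becomes_prefixed_string() {
        let not_found = io::Error::new(io::ErrorKind::NotFound, "no such file");
        let err = byte_count(Err(not_found)).unwrap_err();
        assert!(err.starts_with("failed to read PDF:"));
    }
}
```

Asserting on the prefix rather than the full message keeps the check independent of the platform-specific text that the operating system supplies for a real missing file.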

PR created automatically by Jules for task 3223951371666115144 started by @badMade

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

google-labs-jules · 2026-04-24T03:39:43Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

gemini-code-assist · 2026-04-24T03:39:45Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot

Pull request overview

Adds a Rust unit test covering the error path for PDF text extraction when the provided PDF file path does not exist.

Changes:

Add a new unit test asserting extract_text returns an Err for a missing PDF file path.
Add repository artifacts (*.orig, patch.diff, and test_plan.sh) alongside the test change.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
`rust/crates/tools/src/pdf_extract.rs`	Adds a new unit test for the “missing file” error path in `extract_text`.
`rust/crates/tools/src/pdf_extract.rs.orig`	Adds a full duplicate/backup copy of the PDF extraction module.
`patch.diff`	Adds a standalone patch artifact duplicating the code change.
`test_plan.sh`	Adds a script that runs `cargo test --workspace` from the `rust/` directory.


Copilot · 2026-04-24T03:41:50Z

+        let path = Path::new("/this/path/does/not/exist.pdf");
+        let result = extract_text(path);


extract_text_returns_error_for_nonexistent_file uses a hard-coded absolute path (/this/path/does/not/exist.pdf). This can make the test flaky if that path exists in some environments. Prefer constructing a guaranteed-missing path under std::env::temp_dir() (e.g., add a unique/random component) and asserting on that instead.

Suggested change

-        let path = Path::new("/this/path/does/not/exist.pdf");
-        let result = extract_text(path);
+        let unique = format!(
+            "clawd-pdf-extract-missing-{}-{}.pdf",
+            std::process::id(),
+            std::time::SystemTime::now()
+                .duration_since(std::time::UNIX_EPOCH)
+                .unwrap()
+                .as_nanos()
+        );
+        let path = std::env::temp_dir().join(unique);
+        let _ = std::fs::remove_file(&path);
+        let result = extract_text(path.as_path());

Copilot · 2026-04-24T03:41:50Z

+//! Minimal PDF text extraction.
+//!
+//! Reads a PDF file, locates `/Contents` stream objects, decompresses with
+//! flate2 when the stream uses `/FlateDecode`, and extracts text operators
+//! found between `BT` / `ET` markers.
+
+use std::io::Read as _;
+use std::path::Path;
+
+/// Extract all readable text from a PDF file.
+///
+/// Returns the concatenated text found inside BT/ET operators across all
+/// content streams.  Non-text pages or encrypted PDFs yield an empty string
+/// rather than an error.
+pub fn extract_text(path: &Path) -> Result<String, String> {
+    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
+    Ok(extract_text_from_bytes(&data))
+}
+


This .orig file appears to be a full duplicate/backup of pdf_extract.rs and is not referenced anywhere in the repo. Keeping it in-tree increases maintenance burden and risks it going stale/confusing future edits; it should be removed from the PR.

Suggested change

-//! Minimal PDF text extraction.
-//!
-//! Reads a PDF file, locates `/Contents` stream objects, decompresses with
-//! flate2 when the stream uses `/FlateDecode`, and extracts text operators
-//! found between `BT` / `ET` markers.
-
-use std::io::Read as _;
-use std::path::Path;
-
-/// Extract all readable text from a PDF file.
-///
-/// Returns the concatenated text found inside BT/ET operators across all
-/// content streams.  Non-text pages or encrypted PDFs yield an empty string
-/// rather than an error.
-pub fn extract_text(path: &Path) -> Result<String, String> {
-    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
-    Ok(extract_text_from_bytes(&data))
-}

Copilot · 2026-04-24T03:41:51Z

+--- rust/crates/tools/src/pdf_extract.rs
+++ rust/crates/tools/src/pdf_extract.rs
+@@ -404,4 +404,12 @@
+         // cleanup
+         let _ = std::fs::remove_dir_all(&dir);
+     }
+
+    #[test]
+    fn extract_text_returns_error_for_nonexistent_file() {
+        let path = Path::new("/this/path/does/not/exist.pdf");
+        let result = extract_text(path);
+        assert!(result.is_err());
+        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
+    }
+ }


patch.diff looks like a generated patch artifact and is not referenced by the build/tests. Committing it adds noise and can confuse future changes; please remove it from the PR (or place it under a docs/fixtures location if it’s intentionally needed, with a reference).

Suggested change

---- rust/crates/tools/src/pdf_extract.rs
-+++ rust/crates/tools/src/pdf_extract.rs
-@@ -404,4 +404,12 @@
-         // cleanup
-         let _ = std::fs::remove_dir_all(&dir);
-     }
-+
-+    #[test]
-+    fn extract_text_returns_error_for_nonexistent_file() {
-+        let path = Path::new("/this/path/does/not/exist.pdf");
-+        let result = extract_text(path);
-+        assert!(result.is_err());
-+        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
-+    }
- }

Copilot · 2026-04-24T03:41:51Z

+cd rust
+cargo test --workspace


This new script isn’t referenced by CI or repository documentation, and it duplicates the existing Rust CI job (cargo test --workspace). If it’s not required by an external system, please remove it to avoid repo clutter; if it is required, add a brief reference to where/how it’s used.

Suggested change

-cd rust
-cargo test --workspace
+# Deprecated: this wrapper duplicated the repository Rust CI test command.
+# Run `cd rust && cargo test --workspace` directly instead.

badMade · 2026-04-27T03:41:51Z

@claude code review

badMade · 2026-04-27T03:41:56Z

@codex code review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cdac3c1cf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T03:43:24Z

+
+    #[test]
+    fn extract_text_returns_error_for_nonexistent_file() {
+        let path = Path::new("/this/path/does/not/exist.pdf");


Make nonexistent-path test independent of host filesystem

This test assumes /this/path/does/not/exist.pdf is always absent, but if that path exists in any runner image or local environment, extract_text can return Ok(_) and the assertion will fail despite correct behavior. That makes the test environment-dependent and potentially flaky; use a temp-directory path that is guaranteed not to exist for the duration of the test instead of a fixed absolute path.
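
One way to get a path that is known to be missing, sketched here under stated assumptions, is to create a fresh directory and look for a file inside it. `fresh_temp_dir` is a hypothetical helper, and the `extract_text` below is a stand-in that copies only the signature and error mapping of the function quoted earlier in this review; the real function also parses the bytes it reads. Because `std::fs::create_dir` fails when the name already exists, a successful call means the directory was just created by this process and is empty, barring interference from another process.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in with the same signature and error mapping as the quoted
/// `extract_text`; it does not attempt any PDF parsing.
fn extract_text(path: &Path) -> Result<String, String> {
    let data = std::fs::read(path).map_err(|e| format!("failed to read PDF: {e}"))?;
    Ok(String::from_utf8_lossy(&data).into_owned())
}

/// Hypothetical helper: creates a new, empty directory under the system
/// temp dir. `create_dir` errors if the name is already taken, so on
/// success the directory is one this process just made.
fn fresh_temp_dir() -> std::io::Result<PathBuf> {
    static COUNTER: AtomicU64 = AtomicU64::new(0);
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    let name = format!("pdf-extract-test-{}-{n}", std::process::id());
    let dir = std::env::temp_dir().join(name);
    std::fs::create_dir(&dir)?;
    Ok(dir)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn extract_text_returns_error_for_nonexistent_file() {
        let dir = fresh_temp_dir().expect("create temp dir");
        // Nothing has been written into `dir`, so this file is absent.
        let path = dir.join("missing.pdf");
        let result = extract_text(&path);
        // Clean up before asserting so a failure does not leak the directory.
        let _ = std::fs::remove_dir_all(&dir);
        assert!(result.unwrap_err().starts_with("failed to read PDF:"));
    }
}
```

If a stale directory with the same name is left over from an earlier run, `create_dir` returns an error and the test fails loudly instead of silently reusing it.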

badMade · 2026-04-30

@copilot, make changes based on the above suggestion.

chatgpt-codex-connector · 2026-04-27T03:43:24Z

@@ -0,0 +1,548 @@
+//! Minimal PDF text extraction.


Remove committed backup snapshot from source tree

Checking in pdf_extract.rs.orig adds a full duplicate of the production module, which is not part of the build and can silently drift from pdf_extract.rs, creating confusion in repo-wide search/review and unnecessary maintenance overhead. This looks like an editor/merge artifact and should be removed (or prevented via ignore rules) to keep the crate sources unambiguous.
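
Ignore rules would normally live in the repository's ignore file. As an alternative illustration in Rust, the sketch below scans a source tree for leftover `*.orig` files so that a test or pre-commit check could flag them. Both function names are hypothetical, and nothing here is taken from the repository.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical predicate: true when the file name ends in `.orig`,
/// the suffix of the backup file flagged in this review.
fn is_backup_artifact(path: &Path) -> bool {
    path.extension().and_then(|e| e.to_str()) == Some("orig")
}

/// Hypothetical helper: recursively collects `*.orig` files under `root`.
fn find_backup_artifacts(root: &Path) -> std::io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in std::fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        // `DirEntry::file_type` does not follow symlinks, which avoids
        // looping on symlink cycles.
        if entry.file_type()?.is_dir() {
            found.extend(find_backup_artifacts(&path)?);
        } else if is_backup_artifact(&path) {
            found.push(path);
        }
    }
    Ok(found)
}
```

A check built on this would assert that `find_backup_artifacts` returns an empty list for the crate's `src` directory.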

badMade · 2026-04-30

@copilot, make changes based on the above suggestion.

badMade · 2026-04-30T02:03:52Z

@gemini-code-assist, what are the steps and line numbers to make this fix?

gemini-code-assist · 2026-04-30T02:03:55Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

🧪 Add error path test for extract_text when file does not exist

cdac3c1

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 24, 2026 03:39

Copilot started reviewing on behalf of badMade April 24, 2026 03:40

Copilot AI reviewed Apr 24, 2026


badMade marked this pull request as ready for review April 26, 2026 21:38

chatgpt-codex-connector Bot reviewed Apr 27, 2026

