feat(attest-npmjs.com): add support for dir inside the subject path #74

AEnguerrand wants to merge 17 commits into main from
Conversation
…tion upload process - Updated the action to support multiple .tgz files and improved error handling for missing files. - Changed output format to include a list of packages instead of individual package details. - Refactored the attestation generation script to process multiple packages and construct a JSON array for attestations. - Improved the upload process to handle multiple attestation files and added error handling for upload failures.
… handling - Enhanced the handling of .tgz files by introducing a function to process each file individually, allowing for better error management and validation of JSON outputs from npm pack. - Updated the logic to use 'find' for locating attestation files, improving robustness against large file counts and filenames with spaces. - Ensured that the action gracefully handles cases where no valid npm packages or attestation files are found, providing clearer error messages.
…handling - Changed the output to generate JSON files for package lists and attestation files, improving data management. - Updated the script to read from the new JSON files instead of environment variables, enhancing reliability and error handling. - Added checks to ensure the existence of required files before processing, providing clearer error messages for missing files.
…files - Added logic to process pre-packed .tgz files by extracting metadata directly from the tarball, improving efficiency. - Updated error handling for invalid package extraction and JSON output validation, providing clearer warnings. - Maintained existing functionality for non-packed files by continuing to use 'npm pack' for metadata generation.
…integrity calculation - Introduced a unique temporary file for extracting package.json from pre-packed .tgz files to prevent collisions during parallel execution. - Enhanced integrity calculation by stripping whitespace from the base64 output, ensuring cleaner results. - Updated error handling to remove temporary files appropriately, maintaining a clean environment.
…bugging - Updated integrity calculation to prefer sha512sum if available, falling back to openssl for compatibility. - Improved logging to include file details and calculated SHA512 in hex format for better debugging of integrity mismatches. - Ensured whitespace is stripped from base64 output for cleaner results.
… enhance debugging output - Simplified integrity calculation by using openssl for base64 encoding and ensuring whitespace is stripped for cleaner results. - Improved debugging output to include calculated SHA512 in hex format and detailed file information, aiding in troubleshooting integrity mismatches. - Adjusted fallback logic for SHA512 calculation to enhance compatibility and reliability.
… calculation - Replaced the use of openssl and sha512sum with a Node.js script for calculating the integrity hash, improving reliability and avoiding shell-related issues. - Enhanced error handling for hash calculation failures, providing clearer feedback during execution. - Updated debugging output to include both SHA512 hex and base64 integrity values for better traceability.
- Reformatted the Node.js script for calculating the integrity hash to improve readability and maintainability. - Ensured consistent error handling and output formatting for better debugging and traceability during hash calculations.
Pull Request Overview
This PR enhances the attest-for-npmsjs-com action to support processing multiple npm packages from a directory, rather than just a single package file. The changes refactor the code to iterate over multiple packages and generate attestations for each one.
Key changes:
- Refactored to support both single files and directories as `subject-path`
- Added support for pre-packed `.tgz` files alongside npm package directories
- Changed from single package processing to batch processing with iteration over multiple packages
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| slsa-github-generator-nodejs-layout.sh | Refactored to process multiple packages via a process_package function and iterate over a packages list file |
| action.yml | Added process_file function to handle both pre-packed .tgz files and directories, changed to generate attestations for multiple packages in a loop |
actions/attest-for-npmsjs-com/slsa-github-generator-nodejs-layout.sh
```shell
fi
echo "ATTESTATION_SIGNED_PATH=${ATTESTATION_SIGNED_PATH}" >> $GITHUB_OUTPUT

ATTESTATION_FILES_JSON=$(echo "$ATTESTATION_FILES" | jq -R -s -c 'split("\n") | map(select(length > 0))')
```
If filenames contain spaces or special characters, the ATTESTATION_FILES variable (which is the output of find) will not properly preserve individual filenames when converted to JSON. Use 'find -print0' with null-delimited output and proper handling to avoid issues with filenames containing whitespace or newlines.
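As a rough illustration of the null-delimited approach this comment suggests (the directory and file names below are made up for the demo; they are not the action's paths):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hedged sketch: -print0 emits NUL-delimited paths, and jq splits on \u0000,
# so a filename containing spaces stays a single JSON array entry.
dir=$(mktemp -d)
touch "$dir/pkg one.build.slsa" "$dir/pkg-two.build.slsa"

files_json=$(find "$dir" -maxdepth 1 -name "*.build.slsa" -print0 \
  | jq -R -s -c 'split("\u0000") | map(select(length > 0))')

count=$(echo "$files_json" | jq 'length')
echo "$count"
rm -rf "$dir"
```

The trailing NUL that `find -print0` emits produces one empty string after the split, which the `select(length > 0)` filter discards.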
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…out.sh Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
```shell
local pkg_name=$(jq -r '.name' "$tmp_pkg_json")
local pkg_version=$(jq -r '.version' "$tmp_pkg_json")
local filename=$(basename "$file")
```
Using basename on the file path could cause issues if the tarball is not in the current directory. The attestation will reference just the filename without its full path, which might cause confusion or errors in subsequent steps that need to locate the file. Consider preserving the relative path or using a consistent approach for file identification.
```diff
-local filename=$(basename "$file")
+local filename="$file"
```
```javascript
const hex = crypto.createHash('sha512').update(content).digest('hex');
const base64 = crypto.createHash('sha512').update(content).digest('base64');
```
The hash is calculated twice (lines 63-64) - once for hex and once for base64. This requires reading and processing the file content twice, which is inefficient for large files. Consider calculating the hash once and converting the result to both formats, or use a single Buffer to convert between hex and base64 representations.
```diff
-const hex = crypto.createHash('sha512').update(content).digest('hex');
-const base64 = crypto.createHash('sha512').update(content).digest('base64');
+const hash = crypto.createHash('sha512').update(content).digest();
+const hex = hash.toString('hex');
+const base64 = hash.toString('base64');
```
```shell
  echo "Error: PACKAGES_LIST_FILE (${PACKAGES_LIST_FILE}) not found"
  exit 1
fi
```
The temporary file cleanup at line 98 should be wrapped in a trap to ensure it's removed even if the script exits early due to an error. Consider adding trap "rm -f \"$ATTESTATIONS_FILE\"" EXIT after creating the temp file to prevent file leakage on script failures.
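The trap pattern this comment recommends can be sketched as a standalone demo (not the action's actual code): the subshell below simulates an early error, and the `EXIT` trap still removes the temp file.

```shell
#!/usr/bin/env bash
# Hedged sketch: an EXIT trap removes the temp file on every exit path,
# including the simulated mid-script failure below.
tmpname=$(
  ATTESTATIONS_FILE=$(mktemp)
  trap 'rm -f "$ATTESTATIONS_FILE"' EXIT
  echo "$ATTESTATIONS_FILE"
  exit 1   # simulate an early error; the EXIT trap still fires
)
if [ ! -e "$tmpname" ]; then
  echo "temp file was cleaned up despite the early exit"
fi
```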
```yaml
shell: bash
run: |
  # Initialize a temporary file to accumulate JSON results from npm pack
  touch packages_accumulated.json
```
The temporary files packages_accumulated.json, packages_list.json, and attestation_files.json created during execution are not cleaned up. Consider adding cleanup logic at the end of the workflow or using trap handlers to ensure these files are removed after the action completes, preventing potential workspace pollution.
```
    digest: {
      ($alg): $digest
    }
  }
```
The error message could be more helpful by providing guidance on what the user should check. Consider adding information about where this file should be created or what might have gone wrong in the previous step. For example: "Error: PACKAGES_LIST_FILE (${PACKAGES_LIST_FILE}) not found. This file should have been created by the package-details step."
```shell
# and avoid "Argument list too long" errors.
if find ./attestations-signed -maxdepth 1 -name "*.build.slsa" -print -quit | grep -q .; then
  ATTESTATION_FILES=$(find ./attestations-signed -maxdepth 1 -name "*.build.slsa")
else
```
The JSON array construction using split("\n") is fragile. If any file path contains a newline character (which is valid in Unix filenames), this will break. Additionally, if ATTESTATION_FILES is empty or contains only whitespace, the map(select(length > 0)) might still produce unexpected results. Consider using a more robust approach, such as using find with -print0 and processing null-delimited output, or directly constructing the JSON array from the find command.
```javascript
const hex = crypto.createHash('sha512').update(content).digest('hex');
const base64 = crypto.createHash('sha512').update(content).digest('base64');
console.log(hex + ' ' + base64);
} catch (e) {
  console.error(e);
  process.exit(1);
}
```
The inline Node.js script is complex and hard to maintain within the bash script. Consider extracting this into a separate .js file in the action directory for better readability, testing, and maintenance. This would also allow for proper syntax highlighting and linting of the JavaScript code.
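The extraction this comment suggests could look roughly like the following (a sketch under assumptions: `hash.js` is an illustrative file name, not one from the PR, and the hashing script is reduced to the base64 output):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hedged sketch: keep the hashing logic in a standalone hash.js next to the
# action, then invoke it from bash, instead of embedding JS in a shell string.
workdir=$(mktemp -d)
cat > "$workdir/hash.js" <<'EOF'
const fs = require('fs');
const crypto = require('crypto');
// argv[2] is the file to hash (argv[0] is node, argv[1] is this script).
const digest = crypto.createHash('sha512').update(fs.readFileSync(process.argv[2])).digest();
console.log(digest.toString('base64'));
EOF

printf 'hello' > "$workdir/input.bin"
integrity=$(node "$workdir/hash.js" "$workdir/input.bin")
echo "$integrity"
rm -rf "$workdir"
```

A separate `.js` file also gets syntax highlighting and linting, as the comment notes, and avoids shell-quoting pitfalls inside the heredoc-style string.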
```javascript
if (response.status !== 201) {
  throw new Error(`Failed to upload attestation for ${attestationPath}. HTTP status code: ${response.status}`);
}
```
The check if (response.status !== 201) is redundant and potentially misleading. When using github.request(), a non-201 status code will automatically throw an error, so this check will never be reached in failure scenarios. The error handling is already covered by the surrounding try-catch block. Consider removing this check.
```diff
-if (response.status !== 201) {
-  throw new Error(`Failed to upload attestation for ${attestationPath}. HTTP status code: ${response.status}`);
-}
+// Success: github.request() did not throw, so status is 201.
```
```shell
# Use 'find' to safely handle directories with a large number of files or filenames containing spaces.
# We process files sequentially to avoid "Argument list too long" errors that can occur with shell expansion.
while IFS= read -r -d '' file; do
  process_file "$file"
done < <(find "${{ inputs.subject-path }}" -maxdepth 1 -name "*.tgz" -print0)
```
[nitpick] When subject-path is a directory, the code only processes .tgz files (line 122). This means it won't automatically pack npm packages from subdirectories containing package.json files. Consider adding a comment explaining this behavior, or extending the functionality to also find and pack directories with package.json files if that's the intended use case. The PR title mentions "add support for dir inside the subject path" which suggests this might be incomplete.
Suggested change:

```shell
# When subject-path is a directory, process both:
# - .tgz files at the top level
# - subdirectories (at top level) containing a package.json (i.e., unpacked npm packages)
# We process files sequentially to avoid "Argument list too long" errors that can occur with shell expansion.
# First, process all .tgz files at the top level
while IFS= read -r -d '' file; do
  process_file "$file"
done < <(find "${{ inputs.subject-path }}" -maxdepth 1 -type f -name "*.tgz" -print0)
# Next, process all subdirectories at the top level that contain a package.json
while IFS= read -r -d '' dir; do
  # Only process if package.json exists in the directory
  if [ -f "$dir/package.json" ]; then
    # Run npm pack in the directory, capture the output .tgz file path
    tgz_path=$(cd "$dir" && npm pack --json | jq -r '.[0].filename')
    if [ -n "$tgz_path" ] && [ -f "$dir/$tgz_path" ]; then
      process_file "$dir/$tgz_path"
      # Optionally, clean up the packed .tgz file
      rm -f "$dir/$tgz_path"
    else
      echo "Warning: npm pack failed for $dir. Skipping."
    fi
  fi
done < <(find "${{ inputs.subject-path }}" -mindepth 1 -maxdepth 1 -type d -print0)
```
```yaml
shell: bash
id: scan-attestations-signed
```
The logic combining find ... -print -quit | grep -q . is unnecessarily complex. The -print -quit will print at most one matching file and exit, then grep -q . checks if there's any output. This can be simplified to just checking the exit code of find: if find ./attestations-signed -maxdepth 1 -name "*.build.slsa" -quit | grep -q .; then. However, a cleaner approach would be to store the full find results in a variable first and check if it's empty, avoiding the need to run find twice (once for checking, once for getting all files).
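A sketch of the single-invocation alternative described here (the temporary directory and file names are illustrative, not the action's paths):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hedged sketch: run find once, keep the result, and branch on emptiness.
# This avoids the find|grep probe followed by a second find over the same tree.
dir=$(mktemp -d)
touch "$dir/a.build.slsa" "$dir/b.build.slsa"

ATTESTATION_FILES=$(find "$dir" -maxdepth 1 -name "*.build.slsa")
if [ -z "$ATTESTATION_FILES" ]; then
  echo "Error: no signed attestation files found" >&2
  exit 1
fi
count=$(echo "$ATTESTATION_FILES" | wc -l)
echo "$count"
rm -rf "$dir"
```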
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 13 comments.
```shell
if [ -f "${{ inputs.subject-path }}" ]; then
  PACKAGE_PATH=${{ inputs.subject-path }}
  process_file "${{ inputs.subject-path }}"
elif [ -d "${{ inputs.subject-path }}" ]; then
  # Use 'find' to safely handle directories with a large number of files or filenames containing spaces.
  # We process files sequentially to avoid "Argument list too long" errors that can occur with shell expansion.
  while IFS= read -r -d '' file; do
    process_file "$file"
  done < <(find "${{ inputs.subject-path }}" -maxdepth 1 -name "*.tgz" -print0)
else
  PACKAGE_PATH=$(ls -1 ${{ inputs.subject-path }}/*.tgz | head -n 1)
  echo "Error: subject-path '${{ inputs.subject-path }}' is not a valid file or directory."
  exit 1
fi
```
The code only checks if the subject-path is a file or directory, but doesn't handle the case where it doesn't exist. If subject-path points to a non-existent path, both the -f and -d checks will fail, leading to the else block at line 123. However, the error message "is not a valid file or directory" is misleading—it could also mean the path doesn't exist. Consider adding an explicit check for existence first:
```shell
if [ ! -e "${{ inputs.subject-path }}" ]; then
  echo "Error: subject-path '${{ inputs.subject-path }}' does not exist."
  exit 1
elif [ -f "${{ inputs.subject-path }}" ]; then
  process_file "${{ inputs.subject-path }}"
elif [ -d "${{ inputs.subject-path }}" ]; then
  # ... directory handling ...
else
  echo "Error: subject-path '${{ inputs.subject-path }}' is not a regular file or directory."
  exit 1
fi
```

```shell
local node_hash_script="
const fs = require('fs');
const crypto = require('crypto');
try {
  const content = fs.readFileSync(process.argv[1]);
  const hex = crypto.createHash('sha512').update(content).digest('hex');
  const base64 = crypto.createHash('sha512').update(content).digest('base64');
  console.log(hex + ' ' + base64);
} catch (e) {
  console.error(e);
  process.exit(1);
}
"
```
The Node.js hash calculation script reads file content synchronously using fs.readFileSync() without any size limit. If a malicious or corrupted .tgz file is extremely large, this could cause the Node.js process to run out of memory. While npm packages are typically reasonably sized, consider adding a file size check before reading, or use streaming APIs for more robust handling:
```javascript
const stats = fs.statSync(process.argv[1]);
if (stats.size > 100 * 1024 * 1024) { // 100MB limit as example
  console.error('File too large');
  process.exit(1);
}
```

```shell
# We suppress stderr to silently skip invalid packages or files that are not valid npm packages,
# allowing the action to proceed with only the valid ones found in the directory.
if ! json_output=$(npm pack "$file" --json 2>/dev/null); then
  echo "Warning: Failed to pack $file. It might not be a valid npm package. Skipping."
```
The error handling for npm pack may silently skip packages without providing adequate feedback. When npm pack fails, the function returns without incrementing any counter or logging which file failed (the warning is shown but not clearly visible in CI logs). If all packages in a directory fail to pack, the later check at line 129 will catch this, but it won't indicate which specific packages failed. Consider improving the warning message to be more prominent (e.g., using echo "::warning::Failed to pack $file...") so issues are visible in GitHub Actions annotations.
```diff
-echo "Warning: Failed to pack $file. It might not be a valid npm package. Skipping."
+echo "::warning::Failed to pack $file. It might not be a valid npm package. Skipping."
```
```markdown
### Features

* Add artifact upload for npm package tarballs
```
The changelog entry "Add artifact upload for npm package tarballs" doesn't accurately describe the changes in this PR. Based on the PR title and code changes, the main feature is adding support for directories in the subject-path (allowing attestation of multiple packages) rather than artifact upload functionality. Consider updating to something like: "Add support for directory input in subject-path to attest multiple npm packages"
```diff
-* Add artifact upload for npm package tarballs
+* Add support for directory input in subject-path to attest multiple npm packages
```
```yaml
      echo "ATTESTATION_SIGNED_PATH=${ATTESTATION_SIGNED_PATH}" >> $GITHUB_OUTPUT
  - name: Upload Attestation to the Github API
```

```shell
ATTESTATION_FILES_JSON=$(echo "$ATTESTATION_FILES" | jq -R -s -c 'split("\n") | map(select(length > 0))')
```
The conversion of file paths to JSON using jq -R -s -c splits on newlines, but if any file path contains a newline character (which is technically possible on Unix systems), this will create incorrect entries in the array. While newlines in filenames are rare, it's better to use a more robust approach. Consider using find with -print0 and processing with null-delimited strings, or collect paths into a JSON array directly:
```diff
-ATTESTATION_FILES_JSON=$(echo "$ATTESTATION_FILES" | jq -R -s -c 'split("\n") | map(select(length > 0))')
+ATTESTATION_FILES_JSON=$(find ./attestations-signed -maxdepth 1 -name "*.build.slsa" -print0 | jq -R -s -c 'split("\u0000") | map(select(length > 0))')
```
```shell
local node_hash_script="
const fs = require('fs');
const crypto = require('crypto');
try {
  const content = fs.readFileSync(process.argv[1]);
  const hex = crypto.createHash('sha512').update(content).digest('hex');
  const base64 = crypto.createHash('sha512').update(content).digest('base64');
  console.log(hex + ' ' + base64);
} catch (e) {
  console.error(e);
  process.exit(1);
}
"
```
The Node.js hash calculation script computes both hex and base64 hashes, but only the base64 hash is used (line 78 integrity=$(echo "$hash_output" | awk '{print $2}')). The hex hash on line 77 is calculated, printed for debugging, but serves no other purpose. Consider removing the hex hash calculation to simplify the code and improve performance:
```javascript
const content = fs.readFileSync(process.argv[1]);
const base64 = crypto.createHash('sha512').update(content).digest('base64');
console.log(base64);
```

And update line 78 to: `local integrity="$hash_output"`
```shell
if ! tar -xzOf "$file" package/package.json > "$tmp_pkg_json" 2>/dev/null; then
  echo "Warning: Failed to extract package.json from $file. Skipping."
  rm -f "$tmp_pkg_json"
  return
fi
```
When extracting package.json from a tarball fails, the function warns and returns, but the error message from tar is silenced with 2>/dev/null. This makes debugging difficult if a .tgz file is corrupted or has an unexpected structure. Consider capturing the error and including it in the warning message, or at least allowing the stderr to be visible:
```shell
if ! tar -xzOf "$file" package/package.json > "$tmp_pkg_json" 2>&1; then
  echo "Warning: Failed to extract package.json from $file. The file may be corrupted or not a valid npm package tarball. Skipping."
  rm -f "$tmp_pkg_json"
  return
fi
```

```diff
-if ! tar -xzOf "$file" package/package.json > "$tmp_pkg_json" 2>/dev/null; then
-  echo "Warning: Failed to extract package.json from $file. Skipping."
-  rm -f "$tmp_pkg_json"
-  return
-fi
+local tmp_tar_err=$(mktemp)
+if ! tar -xzOf "$file" package/package.json > "$tmp_pkg_json" 2> "$tmp_tar_err"; then
+  local tar_err_msg=$(cat "$tmp_tar_err")
+  echo "Warning: Failed to extract package.json from $file. Skipping."
+  echo "tar error: $tar_err_msg"
+  rm -f "$tmp_pkg_json" "$tmp_tar_err"
+  return
+fi
+rm -f "$tmp_tar_err"
```
```shell
  process_file "${{ inputs.subject-path }}"
elif [ -d "${{ inputs.subject-path }}" ]; then
  # Use 'find' to safely handle directories with a large number of files or filenames containing spaces.
  # We process files sequentially to avoid "Argument list too long" errors that can occur with shell expansion.
```
When processing a directory, if no .tgz files are found, the find command will complete successfully but won't process any files. The subsequent check at line 129 will catch this case, but the error message "No valid npm packages found to attest" is ambiguous—it doesn't distinguish between "no .tgz files found" and "found .tgz files but all failed to process". Consider adding a more specific check or error message after the find loop to clarify which scenario occurred.
```diff
 # We process files sequentially to avoid "Argument list too long" errors that can occur with shell expansion.
+tgz_files_count=$(find "${{ inputs.subject-path }}" -maxdepth 1 -name "*.tgz" | wc -l)
+if [ "$tgz_files_count" -eq 0 ]; then
+  echo "Error: No .tgz files found in directory '${{ inputs.subject-path }}'."
+  exit 1
+fi
```
```yaml
  - name: Upload Attestation to the Github API
```

```shell
ATTESTATION_FILES_JSON=$(echo "$ATTESTATION_FILES" | jq -R -s -c 'split("\n") | map(select(length > 0))')
echo "$ATTESTATION_FILES_JSON" > attestation_files.json
```
The temporary file attestation_files.json is created but never cleaned up. While GitHub Actions cleans up the workspace after job completion, it's good practice to explicitly remove temporary files. Consider adding cleanup either at the end of this step or in a subsequent step.
```shell
if ! digest=$(echo "${base64_digest}" | base64 -d 2>/dev/null | od -A n -v -t x1 | tr -d ' \n'); then
  echo "Error: Failed to base64-decode integrity digest for package '${PACKAGE_NAME}': '${base64_digest}'" >&2
```
The base64 decoding uses the -d flag, which is standard on Linux but may not be available or may behave differently on some systems (e.g., macOS uses -D in older versions). While GitHub Actions typically runs on Linux, consider adding a comment about the platform dependency or use a more portable approach. Additionally, the error message could be more specific about what went wrong (invalid base64 vs. other error).
```diff
-if ! digest=$(echo "${base64_digest}" | base64 -d 2>/dev/null | od -A n -v -t x1 | tr -d ' \n'); then
-  echo "Error: Failed to base64-decode integrity digest for package '${PACKAGE_NAME}': '${base64_digest}'" >&2
+# Note: base64 -d is standard on Linux, but macOS may require -D. Alternatively, openssl base64 -d is more portable.
+if ! digest=$(echo "${base64_digest}" | openssl base64 -d 2>/dev/null | od -A n -v -t x1 | tr -d ' \n'); then
+  echo "Error: Failed to base64-decode integrity digest for package '${PACKAGE_NAME}': '${base64_digest}'. This may be due to invalid base64 input or platform incompatibility (try installing openssl or GNU coreutils)." >&2
```
No description provided.