Fix release notes parser to handle bot-generated content#66
Conversation
The regex pattern was greedy and incorrectly captured bot-generated 'Summary by Sourcery' content within release notes sections. This caused the parser to fail with 'expect section header' errors. Changes: - Changed regex from greedy to reluctant quantifier (.*?) - Replaced negative lookahead with positive lookahead for clarity - Added \s* to handle optional whitespace before 'Summary by' - Pattern now stops before any bot-generated content The fix maintains backward compatibility with all existing test cases while correctly handling PRs with bot-generated summaries. Test coverage: - Added test case for Sourcery bot content - All 36 TestCheckReleaseNotesTask tests pass - All TestGenerateReleaseNotesTask tests pass - No regressions detected Signed-off-by: Yihong Wang <yh.wang@ibm.com>
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdjusts the release-notes regex to stop before bot-generated "Summary by" footers and adds a targeted test to ensure PRs with Sourcery bot content are parsed correctly without regressing existing behavior. Sequence diagram for release notes parsing with bot-generated Summary by contentsequenceDiagram
participant CiJob
participant GenerateReleaseNotesTask
participant ReleaseNotesParser
participant ReleaseNotePattern
CiJob->>GenerateReleaseNotesTask: run()
GenerateReleaseNotesTask->>ReleaseNotesParser: parse(prBody)
ReleaseNotesParser->>ReleaseNotePattern: match("== release notes ==...Summary by Sourcery...")
Note over ReleaseNotePattern: Pattern: == release note(s)? ==\w*\n?(.*?)(?=\s*Summary by|\z)
ReleaseNotePattern-->>ReleaseNotesParser: match groups
Note over ReleaseNotesParser: Group 2 contains only release notes section
ReleaseNotesParser-->>GenerateReleaseNotesTask: ReleaseNotes
GenerateReleaseNotesTask-->>CiJob: success (no "expect section header" error)
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
cc @unidevel |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The updated pattern stops on any
Summary bypreceded by optional whitespace anywhere in the text, which may prematurely terminate content ifSummary byappears in the middle of a sentence; consider tightening the lookahead to something like(?=\n\s*Summary by|\z)so it only triggers at the start of a line. - The comment above
RELEASE_NOTE_PATTERNstill refers to stopping before a "\nSummary by" footer, but the regex now uses\s*Summary by; either align the implementation with the comment or update the comment to reflect the broader matching behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The updated pattern stops on any `Summary by` preceded by optional whitespace anywhere in the text, which may prematurely terminate content if `Summary by` appears in the middle of a sentence; consider tightening the lookahead to something like `(?=\n\s*Summary by|\z)` so it only triggers at the start of a line.
- The comment above `RELEASE_NOTE_PATTERN` still refers to stopping before a "\nSummary by" footer, but the regex now uses `\s*Summary by`; either align the implementation with the comment or update the comment to reflect the broader matching behavior.
## Individual Comments
### Comment 1
<location path="presto-release-tools/src/main/java/com/facebook/presto/release/tasks/GenerateReleaseNotesTask.java" line_range="88-91" />
<code_context>
// Pattern: "== release note(s)? ==" followed by optional whitespace and newline, then captures content
// until it encounters "\nSummary by" or reaches end of string
- protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?((?:(?!\\nSummary by).)*)", CASE_INSENSITIVE + MULTILINE + DOTALL);
+ protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?(.*?)(?=\\s*Summary by|\\z)", CASE_INSENSITIVE + MULTILINE + DOTALL);
protected static final Pattern HEADER_PATTERN = Pattern.compile("(.*) change(s)$", CASE_INSENSITIVE);
public static final List<Pattern> VALID_SECTION_HEADERS = ImmutableList.of(
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The `Summary by` lookahead should likely be anchored to the start of a line to avoid truncating notes that mention it in the middle of text.
With `MULTILINE + DOTALL`, `(?=\s*Summary by|\z)` will end the match on any `Summary by` preceded by whitespace anywhere in the body, not only on footer lines. A note like `... as summarized in Summary by XYZ ...` would be truncated. If this is meant to target only footer-style lines, anchor to line start, e.g. `(?=^\s*Summary by|\z)` (keeping `MULTILINE`).
```suggestion
// Matches release notes section and stops capturing before "Summary by" footer (added by bots like Sourcery)
// Pattern: "== release note(s)? ==" followed by optional whitespace and newline, then captures content
// until it encounters a line starting with "Summary by" or reaches end of string
protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?(.*?)(?=^\\s*Summary by|\\z)", CASE_INSENSITIVE + MULTILINE + DOTALL);
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Matches release notes section and stops capturing before "Summary by" footer (added by bots like Sourcery) | ||
| // Pattern: "== release note(s)? ==" followed by optional whitespace and newline, then captures content | ||
| // until it encounters "\nSummary by" or reaches end of string | ||
| protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?((?:(?!\\nSummary by).)*)", CASE_INSENSITIVE + MULTILINE + DOTALL); | ||
| protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?(.*?)(?=\\s*Summary by|\\z)", CASE_INSENSITIVE + MULTILINE + DOTALL); |
There was a problem hiding this comment.
suggestion (bug_risk): The Summary by lookahead should likely be anchored to the start of a line to avoid truncating notes that mention it in the middle of text.
With MULTILINE + DOTALL, (?=\s*Summary by|\z) will end the match on any Summary by preceded by whitespace anywhere in the body, not only on footer lines. A note like ... as summarized in Summary by XYZ ... would be truncated. If this is meant to target only footer-style lines, anchor to line start, e.g. (?=^\s*Summary by|\z) (keeping MULTILINE).
| // Matches release notes section and stops capturing before "Summary by" footer (added by bots like Sourcery) | |
| // Pattern: "== release note(s)? ==" followed by optional whitespace and newline, then captures content | |
| // until it encounters "\nSummary by" or reaches end of string | |
| protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?((?:(?!\\nSummary by).)*)", CASE_INSENSITIVE + MULTILINE + DOTALL); | |
| protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?(.*?)(?=\\s*Summary by|\\z)", CASE_INSENSITIVE + MULTILINE + DOTALL); | |
| // Matches release notes section and stops capturing before "Summary by" footer (added by bots like Sourcery) | |
| // Pattern: "== release note(s)? ==" followed by optional whitespace and newline, then captures content | |
| // until it encounters a line starting with "Summary by" or reaches end of string | |
| protected static final Pattern RELEASE_NOTE_PATTERN = Pattern.compile("== release note(s)? ==\\w*\\n?(.*?)(?=^\\s*Summary by|\\z)", CASE_INSENSITIVE + MULTILINE + DOTALL); |
The regex pattern was greedy and incorrectly captured bot-generated 'Summary by Sourcery' content within release notes sections. This caused the parser to fail with 'expect section header' errors.
Changes:
The fix maintains backward compatibility with all existing test cases while correctly handling PRs with bot-generated summaries.
Test coverage: