Feature/merge near dupes #257

Draft
rlskoeser wants to merge 2 commits into develop from feature/merge-near-dupes

@rlskoeser
Collaborator

Associated Issue(s): resolves #251

Changes in this PR

  • Preliminary logic to identify & merge excerpts with a high degree of overlap

Notes

  • Based on earlier notebook code, using span overlap and overlap factor to measure how much two excerpts overlap
  • Does not yet merge excerpts; hoping to figure out a way to refactor & reuse exact span merge logic
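
For readers unfamiliar with the terms, here is a minimal sketch of what "span overlap" and "overlap factor" could look like for two character spans. The function names, the half-open `[start, end)` convention, and the choice to normalize by the longer span are assumptions made for illustration; the PR's actual definitions may differ.

```python
def span_overlap(start_a: int, end_a: int, start_b: int, end_b: int) -> int:
    """Number of characters shared by two half-open spans [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def overlap_factor(start_a: int, end_a: int, start_b: int, end_b: int) -> float:
    """Shared length as a fraction of the longer span (assumed normalization)."""
    longest = max(end_a - start_a, end_b - start_b)
    if longest <= 0:
        # Empty or inverted spans cannot meaningfully overlap.
        return 0.0
    return span_overlap(start_a, end_a, start_b, end_b) / longest
```

Under these assumptions, two spans that differ by a few characters at either end score close to 1.0, while adjacent spans that only touch score 0.0.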

@rlskoeser rlskoeser requested a review from laurejt February 25, 2026 20:00
@rlskoeser rlskoeser changed the base branch from develop to feature/merge-exact-spans March 2, 2026 20:01
# to avoid having to reconcile columns first
output_df = pl.concat([output_df, merged_output_df], how="diagonal")

# now identify & merge partial overlap
Collaborator Author

@laurejt sorry for the confusing diff! I didn't realize how many changes were in here. This is the part of the code I wanted your thoughts on.

Base automatically changed from feature/merge-exact-spans to develop March 4, 2026 19:08
- new method to identify overlapping spans, with configurable overlap
- new method to combine groups of ids
- first-pass to refactor merge logic for use by exact and partial overlap
- initial method to run all partial merge steps
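
The first two steps in this commit message can be sketched in plain Python: find pairs of spans whose overlap meets a configurable threshold, then combine the pairs into groups of ids that should be merged together. This is only an illustration under stated assumptions. `find_overlapping_pairs`, `combine_id_groups`, the `min_overlap` parameter, and the use of union-find are hypothetical and are not taken from the PR, which works on polars DataFrames.

```python
from itertools import combinations


def find_overlapping_pairs(spans, min_overlap=0.9):
    """Return id pairs whose overlap factor is at least min_overlap.

    spans maps id -> (start, end) with half-open spans. The overlap factor
    is assumed to be shared length divided by the longer span's length.
    """
    pairs = []
    for (id_a, (s_a, e_a)), (id_b, (s_b, e_b)) in combinations(spans.items(), 2):
        shared = max(0, min(e_a, e_b) - max(s_a, s_b))
        longest = max(e_a - s_a, e_b - s_b)
        if longest > 0 and shared / longest >= min_overlap:
            pairs.append((id_a, id_b))
    return pairs


def combine_id_groups(pairs):
    """Combine id pairs into connected groups using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    # Sort for deterministic output.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical excerpt spans: a, b, and c are near duplicates; d stands alone.
spans = {"a": (0, 100), "b": (2, 100), "c": (5, 102), "d": (500, 600)}
groups = combine_id_groups(find_overlapping_pairs(spans, min_overlap=0.9))
```

Grouping transitively means that if a overlaps b and b overlaps c, all three land in one group even when a and c alone would fall below the threshold. Whether that behavior is wanted for excerpt merging is a design choice this sketch does not settle.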
@rlskoeser rlskoeser force-pushed the feature/merge-near-dupes branch from e51dfaa to c67fed2 Compare March 13, 2026 16:49
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.93%. Comparing base (6efc35b) to head (c67fed2).

❌ Your patch check has failed because the patch coverage (25.00%) is below the target coverage (95.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #257      +/-   ##
===========================================
- Coverage    79.76%   78.93%   -0.84%     
===========================================
  Files           23       23              
  Lines         2120     2150      +30     
===========================================
+ Hits          1691     1697       +6     
- Misses         429      453      +24     