feat: expression to match pronoun-be, word pair or contraction#2944
feat: expression to match pronoun-be, word pair or contraction#2944hippietrail wants to merge 8 commits intoAutomattic:masterfrom
Conversation
86xsk
left a comment
There was a problem hiding this comment.
One thing I'm wondering is whether it might make sense to implement this as a function of SequenceExpr rather than creating a dedicated Expr.
On one hand, having a dedicated type does help to organize things a bit, on the other, it is just a wrapper for a preset SequnceExpr under the hood.
harper-core/src/expr/pronoun_be.rs
Outdated
| let expr = SequenceExpr::default().then_any_of(vec![ | ||
| Box::new( | ||
| SequenceExpr::default() | ||
| .then_subject_pronoun() | ||
| .t_ws() | ||
| .t_set(&["am", "are", "is", "was", "were"]), | ||
| ), | ||
| Box::new(WordSet::new(&[ | ||
| "i'm", "we're", "you're", "he's", "she's", "it's", "they're", | ||
| ])), | ||
| ]); |
There was a problem hiding this comment.
I would consider LazyLocking this or constructing and storing this inside the struct, especially since constructing a WordSet (currently) requires O(n2) string comparisons.
There was a problem hiding this comment.
I would consider LazyLocking this or constructing and storing this inside the struct, especially since constructing a
WordSet(currently) requires O(n2) string comparisons.
Hmm even more like an implementation detail? You make it sound like there's a more efficient way to implement WordSet?
There was a problem hiding this comment.
The biggest cost with WordSet is the duplicate checking it does whenever a value is inserted. You can get a sizable performance benefit (~10%) on the lint_essay benchmark simply by removing those checks in WordSet::add and WordSet::add_chars. (Since WordSet stores its words in a SmallVec, checking for duplicates requires scanning through all words already contained.)
I've dabbled locally with trying to optimize it before, but I didn't spend too much time on it and didn't finish anything.
There was a problem hiding this comment.
I haven't dedicated time to that problem because the vast majority of WordSet constructions are done at startup, and thus don't have a tangible impact on linting latency.
harper-core/src/expr/pronoun_be.rs
Outdated
|
|
||
| use super::{Expr, SequenceExpr}; | ||
|
|
||
| #[derive(Default)] |
There was a problem hiding this comment.
I don't think you strictly need the Default derive here since it's a ZST, though I guess it doesn't hurt to have.
My one concern is that it may lead to usage of PronounBe::default() even though PronounBe alone would suffice as long as it remains a ZST.
There was a problem hiding this comment.
I don't think you strictly need the
Defaultderive here since it's a ZST, though I guess it doesn't hurt to have.
Oh I thought I deleted that! I was doing lots of experimentation with different ctors for optional inclusions such as the common misspellings without the apostrophes.
My one concern is that it may lead to usage of
PronounBe::default()even thoughPronounBealone would suffice as long as it remains a ZST.
Removed it. Thanks.
Hmm I hadn't considered that. I did it this way because in my head it's similar to the 'spelled number', 'time unit', etc. expressions. Though I'm not sure there were any similar ones not made by me so I wonder what I based the decision on with my first one. I think 'fixed expression' might've been the OG.
I think the idea was that semantically it's its own expression so it made semantic sense. On my one hand |
elijah-potter
left a comment
There was a problem hiding this comment.
It is preferable to have it as its own type. The organization is worth any potential performance trade off. I can't imagine there is a trade-off, though, since the compiler and the downstream branch predictor tends be pretty good at optimizing these situations.
However, as @86xsk alluded to, you should not be constructing those WordSet instances in the hot loop. Please refactor this to construct them as part of a new constructor.
Oh I didn't even realize! Fix is in... |
Issues
N/A
Description
An expression that matches pronouns followed by inflections of the word "be", including when they're joined as contractions.
Using this
Exprwill avoid some potential pitfalls:But it does not avoid mismatched pairs such as "I are" or "you is".
Nor does it support common "wrong apostrophes":
;or´Potential improvements:
How Has This Been Tested?
Unit tests for all standard pairs and contractions using sentences harvested from GitHub.
Handcrafted unit tests for edge cases.
Checklist