Haschunkrefs sentence #4514

Open
Xinlu-Y wants to merge 5 commits into main from haschunkrefs-sentence

Collaborator

Xinlu-Y commented Dec 9, 2025

Contributes to #4476

  • Move the sdep/shortdep/lnames helpers into Language.Drasil.Sentence, derive a real HasChunkRefs instance, and export sentenceRefs so any chunk field of type Sentence reports its UIDs.
  • Leave Language.Drasil.Sentence.Extract as a re-export shim so existing imports still build.

This prepares the automation work in #4434 by turning Sentence into a proper “chunk atom” for declareHasChunkRefs.

JacquesCarette previously approved these changes Dec 10, 2025
Owner

JacquesCarette left a comment

This makes sense to me, but I also want @balacij's review, to make sure this is going in the same direction as the work he is doing.

Comment on lines +158 to +234
-- | Helpers for extracting references -----------------------------------------

-- | Generic traverse of all positions that could lead to /symbolic/ 'UID's from 'Sentence's.
getUIDs :: Sentence -> [UID]
getUIDs (Ch ShortStyle _ _) = []
getUIDs (Ch TermStyle _ _) = []
getUIDs (Ch PluralTerm _ _) = []
getUIDs (SyCh a) = [a]
getUIDs Sy {} = []
getUIDs NP {} = []
getUIDs S {} = []
getUIDs P {} = []
getUIDs Ref {} = []
getUIDs Percent = []
getUIDs ((:+:) a b) = getUIDs a ++ getUIDs b
getUIDs (Quote a) = getUIDs a
getUIDs (E a) = meNames a
getUIDs EmptyS = []

-- | Generic traverse of all positions that could lead to /symbolic/ and /abbreviated/ 'UID's from 'Sentence's
-- but doesn't go into expressions.
getUIDshort :: Sentence -> [UID]
getUIDshort (Ch ShortStyle _ a) = [a]
getUIDshort (Ch TermStyle _ _) = []
getUIDshort (Ch PluralTerm _ _) = []
getUIDshort SyCh {} = []
getUIDshort Sy {} = []
getUIDshort NP {} = []
getUIDshort S {} = []
getUIDshort Percent = []
getUIDshort P {} = []
getUIDshort Ref {} = []
getUIDshort ((:+:) a b) = getUIDshort a ++ getUIDshort b
getUIDshort (Quote a) = getUIDshort a
getUIDshort E {} = []
getUIDshort EmptyS = []

-----------------------------------------------------------------------------
-- And now implement the exported traversals all in terms of the above
-- | This is to collect /symbolic/ 'UID's that are printed out as a 'Symbol'.
sdep :: Sentence -> [UID]
sdep = nubOrd . getUIDs
{-# INLINE sdep #-}

-- This is to collect symbolic 'UID's that are printed out as an /abbreviation/.
shortdep :: Sentence -> [UID]
shortdep = nubOrd . getUIDshort
{-# INLINE shortdep #-}

-- | Generic traverse of all positions that could lead to /reference/ 'UID's from 'Sentence's.
lnames :: Sentence -> [UID]
lnames Ch {} = []
lnames SyCh {} = []
lnames Sy {} = []
lnames NP {} = []
lnames S {} = []
lnames Percent = []
lnames P {} = []
lnames (Ref a _ _) = [a]
lnames ((:+:) a b) = lnames a ++ lnames b
lnames Quote {} = []
lnames E {} = []
lnames EmptyS = []
{-# INLINE lnames #-}

-- | Get /reference/ 'UID's from 'Sentence's.
lnames' :: [Sentence] -> [UID]
lnames' = concatMap lnames
{-# INLINE lnames' #-}

sentenceRefs :: Sentence -> Set.Set UID
sentenceRefs sent = Set.fromList (lnames sent ++ sdep sent ++ shortdep sent)
{-# INLINE sentenceRefs #-}

instance HasChunkRefs Sentence where
  chunkRefs = sentenceRefs
  {-# INLINABLE chunkRefs #-}
Collaborator

This is great! Now that we have this, do we still need to expose sdep, shortdep, etc.?

Collaborator

We're also now using a Set for chunkRefs. Should we change those functions to use sets, too? (This would need to be a separate PR if so)

Collaborator Author

Xinlu-Y commented Dec 29, 2025

> This is great! Now that we have this, do we still need to expose sdep, shortdep, etc.?

Probably keep them for now. They’re still used outside this module (TraceTable, GetChunks, and re-exported via Development/Sentence.Extract).

Collaborator Author

> We're also now using a Set for chunkRefs. Should we change those functions to use sets, too? (This would need to be a separate PR if so)

sdep/lnames' currently return [UID] and are used as lists in GetChunks.hs and TraceTable.hs.

-- | Gets a list of quantities ('DefinedQuantityDict') from a 'Sentence' in order to print.
vars' :: Sentence -> ChunkDB -> [DefinedQuantityDict]
vars' a m = map (`findOrErr` m) $ sdep a
-- | Gets a list of concepts ('ConceptChunk') from a 'Sentence' in order to print.
concpt :: Sentence -> ChunkDB -> [Sentence]
concpt a m = map (definition . defResolve' m) $ sdep a
getDependenciesOf :: HasUID a => [a -> [Sentence]] -> [a] -> [(UID, [UID])]
getDependenciesOf fs = map (\x -> (x ^. uid, concatMap (lnames' . ($ x)) fs))

If sdep/lnames' change to Set UID, those call sites would need to convert back to a list (e.g., Set.toAscList), right? Does the order matter for outputs?
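The ordering question can be illustrated with a minimal sketch. It assumes `UID` behaves like any `Ord` type (a `String` stands in for it here), and `dedupList`, `dedupViaSet`, and `example` are hypothetical names, not part of Drasil. `nubOrd` keeps the first occurrence of each element in its original position, while a round trip through `Set` returns elements in ascending order.

```haskell
import Data.Containers.ListUtils (nubOrd)
import qualified Data.Set as Set

-- Stand-in for Drasil's UID type (assumption: only the Ord instance matters here).
type UID = String

-- Deduplicate while keeping first-occurrence order, as sdep does via nubOrd.
dedupList :: [UID] -> [UID]
dedupList = nubOrd

-- Deduplicate through a Set; the list comes back in ascending order.
dedupViaSet :: [UID] -> [UID]
dedupViaSet = Set.toAscList . Set.fromList

-- Hypothetical UIDs in the order they might appear in a Sentence.
example :: [UID]
example = ["velocity", "mass", "velocity", "accel"]

-- dedupList example   gives ["velocity","mass","accel"]
-- dedupViaSet example gives ["accel","mass","velocity"]
```

So a call site that prints its results in order of appearance (as `vars'` and `concpt` may) would likely see a different order after a switch to `Set.toAscList`, even though the set of UIDs is the same.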

Collaborator

I think this code was all overwritten in another one of your PRs? This PR might be outdated now.

{-# INLINE sentenceRefs #-}

instance HasChunkRefs Sentence where
  chunkRefs = sentenceRefs
Collaborator

Can't we use getUIDs instead of sentenceRefs? This way we only need to traverse a Sentence once to get the same information?

Collaborator

If not, seeing as sentenceRefs isn't exposed by the module, can we just inline it into the definition of chunkRefs?

Collaborator Author

> Can't we use getUIDs instead of sentenceRefs? This way we only need to traverse a Sentence once to get the same information?

getUIDs only collects symbolic UIDs (including those found in expressions). It ignores Ref and Ch ShortStyle, whereas sentenceRefs combines symbol + short + ref, so getUIDs isn't equivalent. I added getAllUIDs, which collects all three in one pass.
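As a rough illustration of the one-pass idea, here is a sketch over a simplified stand-in for `Sentence`. The toy data type, the `String` UID, and the name `allUIDs` are all assumptions made for this example; it is not the `getAllUIDs` from the PR, and it leaves out the `E` (expression) case because `meNames` is not modelled.

```haskell
import qualified Data.Set as Set

-- Stand-in for Drasil's UID (assumption).
type UID = String

data Style = ShortStyle | TermStyle | PluralTerm

-- Toy version of Sentence with only the constructors relevant to UID collection.
data Sentence
  = Ch Style UID          -- chunk mention; only ShortStyle is collected
  | SyCh UID              -- symbol mention
  | Ref UID               -- reference
  | S String              -- plain text
  | Sentence :+: Sentence -- concatenation
  | Quote Sentence
  | EmptyS

infixr 5 :+:

-- Collect symbol, abbreviation, and reference UIDs in a single traversal,
-- accumulating directly into a Set instead of concatenating three lists.
allUIDs :: Sentence -> Set.Set UID
allUIDs = go Set.empty
  where
    go acc (Ch ShortStyle u) = Set.insert u acc
    go acc (Ch _ _)          = acc
    go acc (SyCh u)          = Set.insert u acc
    go acc (Ref u)           = Set.insert u acc
    go acc (a :+: b)         = go (go acc a) b
    go acc (Quote a)         = go acc a
    go acc (S _)             = acc
    go acc EmptyS            = acc
```

One difference to keep in mind: this sketch descends into `Quote` for every kind of UID, whereas `lnames` in the diff above returns `[]` for `Quote`. A single-pass version would have to pick one behaviour for references inside quotes, so it may not be a drop-in equivalent of `sentenceRefs`.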

@JacquesCarette
Owner

When are you going to be able to get back to this @Xinlu-Y ?

@Xinlu-Y
Collaborator Author

Xinlu-Y commented Dec 29, 2025

> When are you going to be able to get back to this @Xinlu-Y ?

I’m working on it today

Ch :: SentenceStyle -> TermCapitalization -> UIDRef typ -> Sentence
-- | A branch of Ch dedicated to SymbolStyle only.
- SyCh :: UID -> Sentence
+ SyCh :: UIDRef typ -> Sentence
Collaborator

This particular one was done in #4810.

It would be good to add constraints on what we expect from typ for the other things, but those should probably be split off into other PRs.

Comment on lines +33 to +35
-- | Extract the 'UID' from a 'UIDRef'.
unRef :: UIDRef t -> UID
unRef (UIDRef u) = u
Collaborator

This was pulled into #4808 as raw and rawUni.

Comment on lines +29 to +31
-- | Create a 'UIDRef' from a raw 'UID'.
uidRef :: UID -> UIDRef t
uidRef = UIDRef
Collaborator

I think we should try to avoid 'upgrading' UID references. I think hide is a bit better because it means that the specific chunk (and correct type) was in scope at some point.
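The distinction can be sketched with a phantom-typed reference. Everything below is a hypothetical model: `hideRef`, `upgradeRef`, `rawUID`, the `HasUID` class, and `Quantity` are illustrative names and are not Drasil's actual `hide`, `uidRef`, or `raw`. The point is only that a constructor taking a chunk value fixes the phantom type from something real, while a constructor taking a bare `UID` lets the caller pick any type.

```haskell
newtype UID = UID String deriving (Eq, Ord, Show)

-- Phantom-typed reference: 't' records what kind of chunk the UID points at.
newtype UIDRef t = UIDRef UID deriving (Eq, Show)

-- Minimal class for things that carry a UID (assumption for this sketch).
class HasUID c where
  uidOf :: c -> UID

-- Hypothetical chunk type.
data Quantity = Quantity { qUID :: UID, qName :: String }

instance HasUID Quantity where
  uidOf = qUID

-- "Hide"-style constructor: needs an actual chunk, so the phantom type
-- is tied to a value that was in scope.
hideRef :: HasUID c => c -> UIDRef c
hideRef = UIDRef . uidOf

-- "Upgrade"-style constructor: the caller chooses 't' freely and nothing
-- checks that a chunk of that type exists under this UID.
upgradeRef :: UID -> UIDRef t
upgradeRef = UIDRef

-- Forget the phantom type and recover the raw UID.
rawUID :: UIDRef t -> UID
rawUID (UIDRef u) = u
```

With this model, `upgradeRef (UID "v") :: UIDRef Quantity` type-checks even if no `Quantity` with that UID was ever defined, which is the kind of unchecked claim the comment above is wary of.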

spec _ Percent = P.E $ P.MO P.Perc
spec _ (P s) = P.E $ symbol s
- spec sm (SyCh s) = P.E $ symbol $ lookupC' sm s
+ spec sm (SyCh s) = P.E $ symbol $ lookupC' sm (unRef s)
Collaborator

Similarly, at 'get chunk' sites, we should try to use 'unhide' as much as possible, since it knows the correct type to look for in the ChunkDB. That said, in this particular case it is difficult: I had to resort to a fairly major hack in #4810.
