You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently histograms are a distinct construct throughout the compiler (Hist in the SOACS IR, and SegHist in the GPU/MC IR). Just as we rewrote scatters to be expressed through accumulators, we could do the same for histograms. The advantages would be the following:
An overall simplification of the compiler.
Allowing multiple histogram updates per input element.
Consistent code generation no matter whether it is a "real" histogram in the source program, or produced by a program transformation such as AD.
I think Cosmin proposed this some time ago as well.
The main challenge is that we do very careful and precise code generation for histograms. However, I believe it may not be so hard to detect that a SegMap is actually "histogram-like", and do our usual subhistogramming/multipass/whatever strategies. Here is an idea:
When generating code for a SegMap, see if it performs any UpdateAccs on free accumulators with operators. If so, record these and treat it as a histogram.
Essentially generate code like we currently do for SegHist, except that instead of having a list of HistOps that we manually apply after the KernelBody has run, the KernelBody itself does the updates. In order to do multihistogramming, we simply change which array a given accumulator (identified by its certificate) refers to. We still need to have all the same logic for combining multiple subhistograms. This is tracked in the ImpState and is currently looked up by lookupAcc, but we could easily add a combinator for locally overriding the underlying array.
The main challenges are the following:
Cosmin's carefully tuned histogram strategy assumes only one update per input element. If we eventually add support for multiple updates, we will have to refine the strategy. This is probably not so difficult and can be postponed.
We will need to handle arbitrary accumulators in reverse-mode AD.
Point (2) here is really difficult, I fear. I added support for scatter-like WithAccs to AD, which was difficult enough, but I don't really know how we could possibly handle arbitrary update operators without adding a tape. Currently we assume that all WithAcc operators are the result of preceding applications of AD, which means they are linear, and so have a constant adjoint. One solution is to preserve Hist at the SOACS level until after AD, at which point we rewrite it to accumulators.
Currently histograms are a distinct construct throughout the compiler (
Histin the SOACS IR, andSegHistin the GPU/MC IR). Just as we rewrote scatters to be expressed through accumulators, we could do the same for histograms. The advantages would be the following:UpdateAdd-based optimisations would work transparently with histograms, such as what is asked for in Fuseconcatwithreduce_by_index#2418.I think Cosmin proposed this some time ago as well.
The main challenge is that we do very careful and precise code generation for histograms. However, I believe it may not be so hard to detect that a
SegMapis actually "histogram-like", and do our usual subhistogramming/multipass/whatever strategies. Here is an idea:SegMap, see if it performs anyUpdateAccs on free accumulators with operators. If so, record these and treat it as a histogram.SegHist, except that instead of having a list ofHistOps that we manually apply after theKernelBodyhas run, theKernelBodyitself does the updates. In order to do multihistogramming, we simply change which array a given accumulator (identified by its certificate) refers to. We still need to have all the same logic for combining multiple subhistograms. This is tracked in theImpStateand is currently looked up bylookupAcc, but we could easily add a combinator for locally overriding the underlying array.The main challenges are the following:
Point (2) here is really difficult, I fear. I added support for scatter-like
WithAccs to AD, which was difficult enough, but I don't really know how we could possibly handle arbitrary update operators without adding a tape. Currently we assume that allWithAccoperators are the result of preceding applications of AD, which means they are linear, and so have a constant adjoint. One solution is to preserveHistat the SOACS level until after AD, at which point we rewrite it to accumulators.