I've been playing around with Pathom and it's been great.
Thank you for this amazing library. It really feels like an amazing paradigm shift that's really flown under the radar. I wanna write a blog post once I have this digested a bit better :))
So far I've been rewriting "simple" pipelines in my data processing code. I'm hoping there will be some decoupling and simpler APIs as a result.
But I'm now trying to see if I can abstract more concepts in my application with Pathom.
I tried to make a simple example that's hopefully easy to understand.
On a high level I'm curious:
- Can this be made performant?
- Should I just not be doing this in Pathom..?
- How would you architect this problem?
So first, in vanilla Clojure:
I have some x-y points. I can view these points as:
- polar coordinates
- normalized (unit length) vectors
You can imagine this as one little part of some larger application..
(use 'clojure.math)
(def
xy-points
"Just a long list of `[x y]` pairs"
(repeatedly 1000000
#(vector (rand)
(rand))))
#_
(take 5
xy-points)
;; => ([0.9595777815722415 0.955966600109044]
;; [0.4582556879334392 0.4586413232063461]
;; [0.07833332358513934 0.07130513464892763]
;; [0.7655731396104173 0.37268965701951307]
;; [0.40480720580295426 0.3954351287740743])
(defn to-polar
  "Convert an `[x y]` pair to a `[radius angle]` pair"
  [[x y]]
  [(sqrt (+ (pow x 2)
            (pow y 2))) ; radius
   (atan2 y x)])        ; angle
#_
(->> xy-points
(mapv to-polar)
(take 5))
;; => ([1.354496828867144 0.7835129670951717]
;; [0.6483441515706128 0.7858187507103993]
;; [0.10592701171653926 0.738464855959417]
;; [0.8514692082173457 0.4530410919554245]
;; [0.5658955866045997 0.7736871501622257])
(def xy-points-normalized
  "Convert my `[x y]` pairs to a normalized vector
  Also `[x y]`"
  (let [polar-coords (->> xy-points
                          (mapv to-polar))]
    (mapv (fn [[x y] [radius _]]
            (vector (/ x radius)
                    (/ y radius)))
          xy-points
          polar-coords)))
#_
(take 5
xy-points-normalized)
;; => ([0.9063464150748226 0.4225354137596247]
;; [0.9724893343242329 0.23294740742410108]
;; [0.3622086279586388 0.9320970495781651]
;; [0.6496713506235731 0.7602151907051993]
;; [0.39979032478244364 0.916606620208663])
This is quite performant.
I could probably get fancier with arrays and transduction etc. ..
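As a minimal sketch of the "fancier" direction, the normalization can be done in one pass with a transducer, so no intermediate vector of polar coordinates is built. The names `normalize-xy` and `normalize-points` are hypothetical (not part of the code above), and `clojure.math/hypot` stands in for the explicit `sqrt`/`pow` radius calculation.

```clojure
(require '[clojure.math :as math])

(defn normalize-xy
  "Scale one `[x y]` pair to unit length.
  A zero-length pair of doubles yields NaN components."
  [[x y]]
  (let [radius (math/hypot x y)]
    [(/ x radius) (/ y radius)]))

(defn normalize-points
  "Normalize every `[x y]` pair in a single pass using a transducer."
  [points]
  (into [] (map normalize-xy) points))

(normalize-points [[3.0 4.0] [1.0 0.0]])
;; => [[0.6 0.8] [1.0 0.0]]
```

This only skips the unused angle and the intermediate collection; it does not address the labeling problems.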
It's simple but has lots of downsides:
- Secret index-based meanings for values (sometimes the first slot is an x-coord, sometimes a radius).
- If this is part of a larger application.. you need to remember if your vector has already been normalized or not.
- Maybe you already got the polar form somewhere else and you could reuse it when normalizing your values.. So you need to write another to-polar overload that takes precomputed radius values. Yuck
- Unused values.. I call to-polar during normalization, which calculates an angle which I never use. (the example is a bit artificial :)) )
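To make the third downside concrete, here is a rough sketch of the kind of extra arity one ends up writing to reuse a precomputed radius. The function name `normalize` is hypothetical and is not part of the code above.

```clojure
(require '[clojure.math :as math])

(defn normalize
  "Scale an `[x y]` pair to unit length.
  The 2-arity version reuses a radius the caller has already computed."
  ([[x y :as point]]
   (normalize point (math/hypot x y)))
  ([[x y] radius]
   [(/ x radius) (/ y radius)]))

(normalize [3.0 4.0])     ;; => [0.6 0.8]
(normalize [3.0 4.0] 5.0) ;; => [0.6 0.8]
```

Every caller now has to know whether a radius is already available and pick the arity by hand, which is the bookkeeping I'd like the engine to take over.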
Now I'm trying to rethink this in terms of Pathom and how I can get the engine to do all the work for me!
I just need to define resolvers that take x-y points and return the values I'm interested in
(add-libs {'com.wsscode/pathom3 {:mvn/version "2025.01.16-alpha"}})
(require
'[com.wsscode.pathom3.connect.operation :as pco]
'[com.wsscode.pathom3.connect.indexes :as pci]
'[com.wsscode.pathom3.interface.smart-map :as psm])
(pco/defresolver $radius
"Take `::x` and `::y` values and use pythagoras to return a `::radius`"
[{::keys [x
y]}]
{::radius (sqrt (+ (pow x ;;radius
2)
(pow y
2)))})
#_
($radius {::x 3.0
::y 5.0})
;; => #:user{:radius 5.830951894845301}
(pco/defresolver $angle
"Take `::x` and `::y` values and return an `::angle`"
[{::keys [x
y]}]
{::angle (atan2 y
x)})
#_
($angle {::x 3.0
::y 5.0})
;; => #:user{:angle 1.0303768265243125}
(pco/defresolver $normalized-point
"L2 norm.. divide x y points by their length"
[{::keys [x
y
radius]}]
{::x-norm (/ x
radius)
::y-norm (/ y
radius)})
#_
($normalized-point {::x 3.0
::y 5.0
::radius 2.0})
;; => #:user{:x-norm 1.5, :y-norm 2.5}
;; here the radius can only be inferred with the full `register`
;; so for testing you need to supply the values
#_
(::x-norm (psm/smart-map (pci/register [$radius
                                        $angle
                                        $normalized-point])
                         {::x 3.0
                          ::y 4.0}))
;; => 0.6 ;; this is 3.0/5.0 (think 3,4,5 triangle)
So those work great! It seems very composable.
Now I want to work with long lists of points.
Instead of [x y] pairs I should use labeled maps - which I think is a lot more clear
(def xy-point-maps
"Now I remake the x-y data as labeled maps
So the x and y are explicitly labeled"
(repeatedly 100000
#(hash-map ::x
(rand)
::y
(rand))))
#_
(take 5
xy-point-maps)
;; => (#:user{:y 0.36939725673492985, :x 0.9602380622865435}
;; #:user{:y 0.16960025245899446, :x 0.8414923599000971}
;; #:user{:y 0.23689244515168162, :x 0.3515261363669846}
;; #:user{:y 0.41155488019829933, :x 0.27518489053192907}
;; #:user{:y 0.04144667976964722, :x 0.7061907847196561})
I can then process lists of these maps using "nested inputs":
https://pathom3.wsscode.com/docs/resolvers#nested-inputs
As far as I understand the EQL, the system doesn't actually know there will be a list coming in.. so I'm a bit unclear if this is a perf bottleneck.
(pco/defresolver $polar-data
"Converts a list of xy point maps to
a list of polar coordinates"
[{::keys [cartesian-data]}]
{::pco/input [{::cartesian-data [::radius
::angle ]}]}
{::polar-data (mapv (fn [{::keys [radius
angle]}]
{:radius radius
:angle angle}) ;; maybe superfluous
cartesian-data)})
#_
($polar-data {::cartesian-data [{::radius 3.0
::angle 4.0}]})
;; => #:user{:polar-data [{:radius 3.0, :angle 4.0}]}
#_
($polar-data {::cartesian-data [{::x 3.0
::y 4.0}]})
;; => #:user{:polar-data [{:radius nil, :angle nil}]}
;; Doesn't work b/c it needs the register to find that `radius`!
(pco/defresolver $normalized-data
[{::keys [cartesian-data]}]
{::pco/input [{::cartesian-data [::x-norm
::y-norm]}]}
{::normalized-data (mapv (fn [{::keys [x-norm
y-norm]}]
{::x-norm x-norm
::y-norm y-norm}) ;; maybe superfluous
cartesian-data)})
(def env (pci/register [$radius
$angle
$normalized-point
$polar-data
$normalized-data]))
(def smap (psm/smart-map env
{::cartesian-data xy-point-maps}))
(take 5
(::normalized-data smap))
;; => (#:user{:x-norm 0.08712330380715341, :y-norm 0.9961975355991032}
;; #:user{:x-norm 0.47009242740808693, :y-norm 0.8826171931780916}
;; #:user{:x-norm 0.7068921332233743, :y-norm 0.7073213640113715}
;; #:user{:x-norm 0.8241905340531331, :y-norm 0.5663126023471587}
;; #:user{:x-norm 0.9010577036764886, :y-norm 0.4336992214026367})
(take 5
(::polar-data smap))
;; => ({:radius 0.2674784113124991, :angle 1.4835624270007897}
;; {:radius 0.9248188165128844, :angle 1.0814008320003392}
;; {:radius 0.7074320549968458, :angle 0.7857016754029951}
;; {:radius 0.9759277453181741, :angle 0.602024979576675}
;; {:radius 0.28699758977324435, :angle 0.4485941613968548})
In the end it works! Pretty cool stuff
I haven't benchmarked it, but I'm guessing that if I calculate the polar form first, the values are reused when getting the normalized view.
Maybe it'd be interesting to add another resolver that goes from polar to cartesian (I'm not sure how the loop would be handled)
But the result comes out quite slow and chokes if there are too many points. Maybe I'm doing something wrong here? I don't quite understand the performance characteristics of nested inputs. I assume the engine only plans the resolution once and reuses that plan for the whole sequence. Is that right?
I have a general sense that I'm.. as Steve Jobs says.. "holding it wrong"
Should I avoid making the last two resolvers entirely? They don't really do anything other than repackage the data. Maybe I should just do the nested input at each call site?
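For reference, a minimal sketch of what "nested input at the call site" could look like, using an EQL join passed to `p.eql/process` instead of the two repackaging resolvers. It assumes the Pathom 3 dependency loaded earlier; the resolver names `radius-sketch` and `normalized-sketch` and the var `sketch-env` are hypothetical stand-ins so the block is self-contained. I have not measured whether this is any faster.

```clojure
(require '[clojure.math :as math]
         '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.interface.eql :as p.eql])

;; Per-point resolvers only; no list-level resolver is defined.
(pco/defresolver radius-sketch [{::keys [x y]}]
  {::radius (math/hypot x y)})

(pco/defresolver normalized-sketch [{::keys [x y radius]}]
  {::x-norm (/ x radius)
   ::y-norm (/ y radius)})

(def sketch-env
  (pci/register [radius-sketch normalized-sketch]))

;; The join in the query asks for the derived keys on every item
;; of the ::cartesian-data collection.
(def sketch-result
  (p.eql/process sketch-env
                 {::cartesian-data [{::x 3.0 ::y 4.0}]}
                 [{::cartesian-data [::x-norm ::y-norm]}]))

(get-in sketch-result [::cartesian-data 0])
```

The trade-off is that every caller spells out the join itself, but no resolver exists purely to rename data.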
I have a specific application in mind where I'm pulling out coordinate pairs from a tech.ml.dataset table (to make all sorts of different plots with thing/geom) and I'd like to effectively "enhance" each row with derived values using Pathom. I can make low-level row resolvers that take rows and effectively add entries (such as polar coordinates). Or I can have high-level resolvers that take columns in the table and derive new columns. But the second option gets messy when the derived entries only make sense for certain rows... So I'm more partial to the first solution if it's relatively fast.
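To illustrate the row-level option without Pathom or tech.ml.dataset, here is a plain-Clojure sketch where rows are ordinary maps and derived entries are only added to rows that have the needed inputs. The name `enhance-row`, the unqualified keys, and the sample rows are all hypothetical; in the real application the rows would come from the dataset and the derivation from resolvers.

```clojure
(require '[clojure.math :as math])

(defn enhance-row
  "Add :radius and :angle to a row that has numeric :x and :y.
  Rows without coordinates are returned unchanged."
  [{:keys [x y] :as row}]
  (if (and (number? x) (number? y))
    (assoc row
           :radius (math/hypot x y)
           :angle  (math/atan2 y x))
    row))

(def sample-rows
  [{:id 1 :x 3.0 :y 4.0}
   {:id 2 :label "no coordinates"}])

(def enhanced-rows
  (mapv enhance-row sample-rows))
```

Here the first row gains `:radius 5.0` plus an `:angle`, and the second passes through untouched, which is the "only makes sense for certain rows" case that gets awkward with whole-column resolvers.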
Do you have any other general thoughts or advice architecturally?
Thank you once again for this awesome lib