Should I be processing low-level/small elements with Pathom? #233

@kxygk

I've been playing around with Pathom and it's been great.

Thank you for this amazing library. It really feels like an amazing paradigm shift that's really flown under the radar. I wanna write a blog post once I have this digested a bit better :))

So far I've been rewriting "simple" pipelines in my data processing code. I'm hoping there will be some decoupling and simpler APIs as a result.

But I'm now trying to see if I can abstract more concepts in my application with Pathom.

I tried to make a simple example that's hopefully easy to understand.

On a high level I'm curious:

  • Can this be made performant?
  • Should I just not be doing this in Pathom..?
  • How would you architect this problem?

So first in Vanilla Clojure

I have some x-y point. I can view these points as:

  • polar coordinates
  • normalized (unit length) vectors

You can imagine this as one little part of some larger application.

(use 'clojure.math)

(def
  xy-points
  "Just a long list of `[x y]` pairs"
  (repeatedly 1000000
              #(vector (rand)
                       (rand))))
#_
(take 5
      xy-points)
;; => ([0.9595777815722415 0.955966600109044]
;;     [0.4582556879334392 0.4586413232063461]
;;     [0.07833332358513934 0.07130513464892763]
;;     [0.7655731396104173 0.37268965701951307]
;;     [0.40480720580295426 0.3954351287740743])

(defn
  to-polar
  "Convert the `[x y]` pairs to `[radius angle]` pairs"
  [[x
    y]]
  [ (sqrt (+ (pow x   ;;radius
                  2)
             (pow y
                  2)))
   (atan2 y
          x) ])
#_
(->> xy-points
     (mapv to-polar)
     (take 5))
;; => ([1.354496828867144 0.7835129670951717]
;;     [0.6483441515706128 0.7858187507103993]
;;     [0.10592701171653926 0.738464855959417]
;;     [0.8514692082173457 0.4530410919554245]
;;     [0.5658955866045997 0.7736871501622257])

(def xy-points-normalized
  "Convert my `[x y]` pairs to a normalized vector
  Also `[x y]`"
  (let [polar-coords (->> xy-points
                          (mapv to-polar))]
    (mapv (fn [[x
                y]
               [radius
                _]]
            (vector (/ x
                       radius)
                    (/ y
                       radius)))
          xy-points
          polar-coords)))
#_
(take 5
      xy-points-normalized)
;; => ([0.9063464150748226 0.4225354137596247]
;;     [0.9724893343242329 0.23294740742410108]
;;     [0.3622086279586388 0.9320970495781651]
;;     [0.6496713506235731 0.7602151907051993]
;;     [0.39979032478244364 0.916606620208663])

This is quite performant.
I could probably get fancier with arrays, transducers, etc.
It's simple but has lots of downsides:

  • Secret index based meanings to values (sometimes x-coord sometimes radius).
  • If this is part of a larger application.. you need to remember if your vector has already been normalized or not.
  • Maybe you already got the polar form somewhere else and you could reuse it when normalizing your values. So you need to write another normalization overload that takes precomputed radius values. Yuck
  • Unused values.. I call to-polar during normalization, which calculates an angle which I never use.
    (the example is a bit artificial :)) )
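To make the "overload" downside concrete, here is a minimal sketch in plain Clojure. The name `normalize` is hypothetical and not part of the code above; it uses `hypot` from `clojure.math` (Clojure 1.11+) so the angle is never computed.

```clojure
(require '[clojure.math :as math])

(defn normalize
  "Scale an `[x y]` pair to unit length.
  The 2-arity form reuses a radius computed elsewhere;
  the 1-arity form computes the radius itself."
  ([[x y :as point]]
   (normalize point (math/hypot x y)))
  ([[x y] radius]
   [(/ x radius) (/ y radius)]))

(normalize [3.0 4.0])      ;; => [0.6 0.8]
(normalize [3.0 4.0] 5.0)  ;; => [0.6 0.8]
```

This removes the unused angle, but the caller still has to remember whether a radius is already available, which is the bookkeeping I'd like the engine to take over.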

Now I'm trying to rethink this in terms of Pathom and how I can get the engine to do all the work for me!

I just need to define resolvers that take x-y points and return the values I'm interested in

(add-libs {'com.wsscode/pathom3 {:mvn/version "2025.01.16-alpha"}})

(require
'[com.wsscode.pathom3.connect.operation :as pco]
'[com.wsscode.pathom3.connect.indexes :as pci]
'[com.wsscode.pathom3.interface.smart-map :as psm])

(pco/defresolver $radius
"Take `::x` and `::y` values and use pythagoras to return a `::radius`"
[{::keys [x
          y]}]
{::radius (sqrt (+ (pow x   ;;radius
                        2)
                   (pow y
                        2)))})
#_
($radius {::x 3.0
          ::y 5.0})
;; => #:user{:radius 5.830951894845301}

(pco/defresolver $angle
"Take `::x` and `::y` values and return an `::angle`"
[{::keys [x
          y]}]
{::angle (atan2 y
                x)})
#_
($angle {::x 3.0
         ::y 5.0})
;; => #:user{:angle 1.0303768265243125}

(pco/defresolver $normalized-point
"L2 norm.. divide x y points by their length"
[{::keys [x
          y
          radius]}]
{::x-norm (/ x
             radius)
 ::y-norm (/ y
             radius)})

#_
($normalized-point {::x      3.0
                    ::y      5.0
                    ::radius 2.0}) 
;; => #:user{:x-norm 1.5, :y-norm 2.5}
;; here the radius can only be inferred with the full `register`
;; so for testing you need to supply the values
#_
(::x-norm  (psm/smart-map (pci/register [$radius
                                         $angle
                                         $normalized-point])
                          {::x 3.0
                           ::y 4.0}))
;; => 0.6  ;; this is 3.0/5.0 (think 3,4,5 triangle)

So those work great! It seems very composable.
Now I want to work with long lists of points.
Instead of [x y] pairs I should use labeled maps - which I think is a lot more clear

(def xy-point-maps
  "Now I remake the x-y data as labeled maps
  So the x and y are explicitly labeled"
  (repeatedly 100000
              #(hash-map ::x
                         (rand)
                         ::y
                         (rand))))
#_
(take 5
      xy-point-maps)
;; => (#:user{:y 0.36939725673492985, :x 0.9602380622865435}
;;     #:user{:y 0.16960025245899446, :x 0.8414923599000971}
;;     #:user{:y 0.23689244515168162, :x 0.3515261363669846}
;;     #:user{:y 0.41155488019829933, :x 0.27518489053192907}
;;     #:user{:y 0.04144667976964722, :x 0.7061907847196561})
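If the data already exists as `[x y]` pairs, it can be relabeled instead of regenerated. A small sketch; `pair->point-map` is a hypothetical helper name, and the printed form assumes it is evaluated in the `user` namespace:

```clojure
(defn pair->point-map
  "Turn an `[x y]` pair into a map with namespaced `::x` / `::y` keys."
  [[x y]]
  {::x x
   ::y y})

(mapv pair->point-map [[3.0 4.0] [1.0 0.0]])
;; => [#:user{:x 3.0, :y 4.0} #:user{:x 1.0, :y 0.0}]
```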

I can then process these lists of values using "nested inputs"

https://pathom3.wsscode.com/docs/resolvers#nested-inputs

As far as I understand the EQL, the system doesn't actually know there will be a list coming in.. so I'm a bit unclear if this is a perf bottleneck.

(pco/defresolver $polar-data
  "Converts a list of xy point maps to
   a list of polar coordinates"
[{::keys [cartesian-data]}]
{::pco/input [{::cartesian-data [::radius
                                 ::angle ]}]}
  {::polar-data (mapv (fn [{::keys [radius
                                    angle]}]
                        {:radius radius
                         :angle  angle}) ;; maybe superfluous
                      cartesian-data)})
#_
($polar-data {::cartesian-data [{::radius 3.0
                                 ::angle  4.0}]})
;; => #:user{:polar-data [{:radius 3.0, :angle 4.0}]}
#_
($polar-data {::cartesian-data [{::x 3.0
                                 ::y 4.0}]})
;; => #:user{:polar-data [{:radius nil, :angle nil}]}
;; Doesn't work b/c it needs the register to find that `radius`!

(pco/defresolver $normalized-data
  [{::keys [cartesian-data]}]
  {::pco/input [{::cartesian-data [::x-norm
                                   ::y-norm]}]}
  {::normalized-data (mapv (fn [{::keys [x-norm
                                         y-norm]}]
                             {::x-norm x-norm
                              ::y-norm y-norm}) ;; maybe superfluous
                           cartesian-data)})

(def env (pci/register [$radius
                        $angle
                        $normalized-point
                        $polar-data
                        $normalized-data]))

(def smap (psm/smart-map env
                         {::cartesian-data xy-point-maps}))

(take 5
      (::normalized-data smap))
;; => (#:user{:x-norm 0.08712330380715341, :y-norm 0.9961975355991032}
;;     #:user{:x-norm 0.47009242740808693, :y-norm 0.8826171931780916}
;;     #:user{:x-norm 0.7068921332233743, :y-norm 0.7073213640113715}
;;     #:user{:x-norm 0.8241905340531331, :y-norm 0.5663126023471587}
;;     #:user{:x-norm 0.9010577036764886, :y-norm 0.4336992214026367})


(take 5
      (::polar-data smap))
;; => ({:radius 0.2674784113124991, :angle 1.4835624270007897}
;;     {:radius 0.9248188165128844, :angle 1.0814008320003392}
;;     {:radius 0.7074320549968458, :angle 0.7857016754029951}
;;     {:radius 0.9759277453181741, :angle 0.602024979576675}
;;     {:radius 0.28699758977324435, :angle 0.4485941613968548})

In the end it works! Pretty cool stuff

I haven't benchmarked it, but I'm guessing that if I calculate the polar form, the values are reused when getting the normalized view.
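A rough way to check that guess would be a small timing helper. This is only a sketch: `elapsed-ms` is a hypothetical name, it measures a single run (no JIT warm-up), and a proper benchmarking library would give more trustworthy numbers.

```clojure
(defn elapsed-ms
  "Call the zero-argument function `f`, which must return a collection,
  walk the whole result so lazy work is included, and return the
  elapsed wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (dorun (f))
    (/ (- (System/nanoTime) start) 1e6)))

;; Plain Clojure baseline:
(elapsed-ms #(map inc (range 1000000)))

;; With the definitions from this post, building a fresh smart map for
;; each measurement so one run's results don't carry over to the next:
;; (elapsed-ms #(::polar-data
;;               (psm/smart-map env {::cartesian-data xy-point-maps})))
```

Timing `::polar-data` followed by `::normalized-data` on the same smart map, versus on two fresh ones, should show whether anything is actually being reused.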

Maybe it'd be interesting to add another resolver that goes from polar to cartesian (I'm not sure how the loop would be handled)
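A sketch of what that inverse resolver could look like. `$cartesian-from-polar` is a hypothetical name; I have not checked how the planner treats the resulting cycle once it is registered alongside `$radius` and `$angle`.

```clojure
(require '[clojure.math :as math]
         '[com.wsscode.pathom3.connect.operation :as pco])

(pco/defresolver $cartesian-from-polar
  "Take `::radius` and `::angle` values and return `::x` and `::y`"
  [{::keys [radius angle]}]
  {::x (* radius (math/cos angle))
   ::y (* radius (math/sin angle))})

;; Called directly, like the other resolvers above:
($cartesian-from-polar {::radius 2.0
                        ::angle  0.0})
;; => #:user{:x 2.0, :y 0.0}
```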

But the result is coming out quite slow and chokes if there are too many points. Maybe I'm doing something wrong here? I don't quite understand the performance characteristics of the nested inputs. I assume the engine plans the resolver chain only once and reuses that plan for the whole sequence. Is that right?

I have a general sense that I'm.. as Steve Jobs says.. "holding it wrong"

Should I avoid making the last two resolvers entirely? They don't really do anything other than repackage the data. Maybe I should just do the nested input at each call site?
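For comparison, this is roughly what "nested input at the call site" could look like without the two repackaging resolvers: an EQL join run through `p.eql/process`. It is a self-contained sketch that repeats two of the point-level resolvers (with `hypot` standing in for the explicit Pythagoras); `point-env` is a hypothetical name.

```clojure
(require '[clojure.math :as math]
         '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.interface.eql :as p.eql])

(pco/defresolver $radius
  [{::keys [x y]}]
  {::radius (math/hypot x y)})

(pco/defresolver $normalized-point
  [{::keys [x y radius]}]
  {::x-norm (/ x radius)
   ::y-norm (/ y radius)})

(def point-env
  (pci/register [$radius
                 $normalized-point]))

;; The join asks for ::x-norm and ::y-norm on every item of
;; ::cartesian-data; no list-level resolver is involved.
(p.eql/process point-env
               {::cartesian-data [{::x 3.0
                                   ::y 4.0}]}
               [{::cartesian-data [::x-norm
                                   ::y-norm]}])
;; expected: one item under ::cartesian-data with
;; ::x-norm 0.6 and ::y-norm 0.8 (the 3,4,5 triangle again)
```

Whether this is any faster than the nested-input resolvers is exactly what I don't know; it only removes the repackaging step.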

I have a specific application in mind where I'm pulling out coordinate pairs from a tech.ml.dataset table (to make all sorts of different plots with thi.ng/geom) and I'd like to effectively "enhance" each row with derived values using Pathom. I can make low-level row resolvers that take rows and effectively add entries (such as polar coordinates). Or I can have high-level resolvers that take columns in the table and derive new columns. But the second option gets messy when the derived entries only make sense for certain rows. So I'm more partial to the first solution if it's relatively fast.

Do you have any other general thoughts or advice architecturally?

Thank you once again for this awesome lib
