Experiment with improving the parallelization of the layout optimizer  #12

@LTLA

Description

It may be possible to improve the parallelization of optimize_layout without changing the results relative to the single-threaded case. The idea is to precompute a "graph" of dependencies between observations and spin off threads whenever we encounter a run of observations that do not depend on each other.

  1. Iterate across observations within each epoch as usual. At each observation, find its neighbors and hit the RNG to draw its negative samples. The combined set of neighbors and negative samples defines that observation's "dependencies".
  2. Check whether any of these dependencies overlap with observations already queued on other threads.
    • If they overlap with exactly one thread's queue, add the observation to that thread's task queue.
    • If they overlap with multiple threads' queues, close the task queues for all threads and put this observation on hold.
    • If there is no overlap, add the observation to the task queue of the thread with the shortest queue.
  3. Once all task queues are closed, execute the queued work by spinning up one thread per non-empty task queue.
  4. Restart this process with the on-hold observation, or finish if no observation was held.
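The queue-assignment step above could be sketched as follows. This is a minimal illustration, not the actual implementation: the `Scheduler` type, its members, and the integer observation/dependency IDs are all hypothetical.

```cpp
#include <vector>
#include <unordered_map>

// Hypothetical sketch of dependency-based queue assignment. Each observation's
// dependencies (neighbors + negative samples) are checked against what is
// already queued; conflicts spanning multiple queues force a flush.
struct Scheduler {
    int num_queues;
    std::vector<std::vector<int>> queues;  // queued observation IDs per thread
    std::unordered_map<int, int> owner;    // dependency ID -> queue index
    bool closed = false;                   // all queues closed, ready to execute
    int on_hold = -1;                      // observation waiting for the flush

    explicit Scheduler(int n) : num_queues(n), queues(n) {}

    // Returns true if the observation was queued, false if it was put on hold
    // because its dependencies span multiple queues (which closes all queues).
    bool add(int obs, const std::vector<int>& deps) {
        int target = -1;
        for (int d : deps) {
            auto it = owner.find(d);
            if (it != owner.end()) {
                if (target == -1) {
                    target = it->second;          // first conflicting queue
                } else if (target != it->second) {
                    closed = true;                // conflicts across >1 queue
                    on_hold = obs;
                    return false;
                }
            }
        }
        if (target == -1) {
            // No conflicts: pick the shortest queue.
            target = 0;
            for (int q = 1; q < num_queues; ++q) {
                if (queues[q].size() < queues[target].size()) target = q;
            }
        }
        queues[target].push_back(obs);
        owner[obs] = target;                      // obs itself is also updated
        for (int d : deps) owner[d] = target;
        return true;
    }
};
```

For example, two observations with disjoint dependencies land on different queues, and a third observation touching both queues triggers the hold-and-flush path.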

Hopefully, the task queues build up to a healthy size so that we don't have to spin up threads too often. At around 10 µs per thread start and 500 epochs, we'd be aiming for ~100 start/stop cycles per epoch to keep the total thread overhead under 10 seconds.
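Spelling out that budget (assuming, for illustration, that the ~100 start/stops happen per epoch, each cycle spawns $T$ threads, and a thread start costs roughly 10 µs):

$$
100 \times 500 \times T \times 10\,\mu\text{s} = 0.5\,T \text{ seconds},
$$

which stays under the 10-second budget for up to $T \approx 20$ threads per cycle.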

Note that this also requires holding onto the identities of the negative samples until execution. This shouldn't take much memory, but it may be wise to set a buffer size (e.g., 10 million negative samples) beyond which all task queues are closed and execution is triggered.
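A cap like that could look something like the sketch below; the type, member names, and default limit are hypothetical and only illustrate the flush-on-full idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical buffer for the identities of drawn negative samples.
// When it fills up, the caller should close all task queues and execute,
// which frees the buffer for the next batch.
struct NegativeSampleBuffer {
    std::size_t cap;
    std::vector<int> samples;  // sample identities across all queues

    explicit NegativeSampleBuffer(std::size_t c = 10000000) : cap(c) {}

    // Record one observation's negative samples; returns true when the
    // buffer has reached its cap and a flush should be triggered.
    bool record(const std::vector<int>& neg) {
        samples.insert(samples.end(), neg.begin(), neg.end());
        return samples.size() >= cap;
    }

    void clear() { samples.clear(); }  // call after execution completes
};
```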
