Skip to content

race condition with scheduler initialization #83

@shwestrick

Description

@shwestrick

A student of mine, @darshand15, has been doing some testing with ParlayLib. We think we've found a race condition in the scheduler initialization.

Suppose you have 2 threads.

  1. At the moment of scheduler initialization, we have num_awake_workers set to 2.
  2. Immediately after thread 0 finishes initializing the scheduler, thread 1 is "awake" but may not be scheduled by the OS right away.
  3. Suppose thread 0 now executes pardo and therefore scheduler.spawn(&right_job); from here it executes if (first) wake_up_a_worker() but this does not do atomic_notify_one because the current number of awake threads is set to 2 (equal to num_threads).
  4. Now, thread 1 starts executing. It enters worker() and then wait_for_work(), and therefore immediately goes to sleep, waiting on the wake_up_counter.

The problem is that we have now entered a situation where thread 0 has a spare job in its deque, but thread 1 is not awake to steal it.

We were able to reproduce this race condition in practice with a simple program that looks like:

void loop() {
  // just loop for a while
}

int main() {
  loop();
  pardo(
    [](){ loop(); },
    [](){ loop(); }
  );
}

In this example, the scheduler gets initialized (lazily) at the beginning of the pardo. When we run this program on 2 threads, we almost always see sequential performance (no speedup). Actually, I don't think we've yet managed to see any speedup on 2 threads for this example program.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions