Description
Is your feature request related to a problem? Please describe.
We need an automated logic for increasing or decreasing the number of workers in a Pathway pipeline based on the current data processing load.
This logic should work as follows:
- The user configures a maximum and minimum number of workers and a time window over which a trigger event must occur.
- There are two types of trigger events:
  - Increase trigger: high output latency, indicating that the current number of workers cannot keep up with the data stream.
  - Decrease trigger: underutilized resources, as measured from differential_dataflow.
When the configured trigger event occurs within the specified time window, Pathway restarts with an exponentially larger or smaller number of workers: the worker count is multiplied or divided by a scaling factor. The base scaling factor is 2; we should also consider a scaling factor of φ.
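The scaling step above could be sketched roughly as follows. This is only an illustration, not existing Pathway code: the function name `next_worker_count`, its parameters, and the rounding rule (round up when growing, round down when shrinking) are all assumptions, and φ is taken to mean the golden ratio.

```python
import math

# Assumption: "φ" in the proposal refers to the golden ratio (about 1.618).
PHI = (1 + math.sqrt(5)) / 2


def next_worker_count(current, direction, min_workers, max_workers, factor=2.0):
    """Return the worker count to restart with (hypothetical helper).

    The count is multiplied or divided by `factor` and then clamped
    to the user-configured [min_workers, max_workers] range.
    """
    if factor <= 1:
        raise ValueError("scaling factor must be greater than 1")
    if direction == "increase":
        # Round up so a non-integer factor such as PHI always makes progress.
        proposed = math.ceil(current * factor)
    elif direction == "decrease":
        # Round down; the clamp below keeps the result at or above the minimum.
        proposed = math.floor(current / factor)
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return max(min_workers, min(max_workers, proposed))
```

With these assumptions, a pipeline at 4 workers would move to 8 on an increase trigger with factor 2, or to 7 with factor φ. A restart is only needed when the returned value differs from the current count.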
Between restarts, the computation state must be preserved. This can be achieved using persistence in state-saving mode. Since restarts occur on the same machine, the backend can keep the state in memory using a key-value wrapper instead of storing it on disk, which significantly speeds up state saving and restart. Users should also have the option to persist the state on disk if memory needs to be conserved.
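To make the two storage options concrete, here is a minimal sketch of a key-value wrapper with an in-memory and an on-disk variant. None of these classes exist in Pathway; `StateStore`, `InMemoryStateStore`, and `DiskStateStore` are hypothetical names, and the sketch assumes the in-memory store lives in a process that survives the worker restart (for example, a supervisor).

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StateStore(ABC):
    """Hypothetical key-value interface for state kept across restarts."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStateStore(StateStore):
    """Keeps state in a dict: fast, but uses RAM and is lost if the process dies."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class DiskStateStore(StateStore):
    """Keeps one file per key: slower, but frees memory between restarts."""

    def __init__(self, root: str) -> None:
        self._root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hex-encode the key so any string maps to a safe file name.
        return os.path.join(self._root, key.encode("utf-8").hex())

    def put(self, key: str, value: bytes) -> None:
        # Write to a temporary file first, then rename it into place,
        # so a reader never sees a partially written value.
        fd, tmp_path = tempfile.mkstemp(dir=self._root)
        with os.fdopen(fd, "wb") as f:
            f.write(value)
        os.replace(tmp_path, self._path(key))

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None
```

Because both variants share one interface, the choice between speed and memory conservation could be a single configuration switch.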
Describe the solution you'd like
Implement the automatic scaling mechanism described above, including:
- User-configurable min/max worker counts and time window;
- Detection of increase and decrease triggers based on latency and resource utilization;
- Exponential adjustment of worker count with configurable scaling factor;
- Preservation of computation state between restarts, with both in-memory and optional on-disk persistence.
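Trigger detection over the configured time window might look like the sketch below. It assumes one possible reading of the proposal, namely that a trigger fires only once its condition has held continuously for the whole window; the class `SustainedTrigger` and its fields are hypothetical, and how latency or utilization is measured is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedTrigger:
    """Fires once a condition has held continuously for `window_s` seconds.

    Assumption: "occurs within the time window" is read here as
    "holds for the whole window"; other readings are possible.
    """

    window_s: float
    _since: Optional[float] = None  # time at which the condition started holding

    def observe(self, now: float, condition: bool) -> bool:
        if not condition:
            # Any sample that breaks the condition resets the window.
            self._since = None
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.window_s
```

One instance could track the increase condition (for example, output latency above a user-chosen threshold) and another the decrease condition (utilization below a threshold), each fed with periodic samples and a monotonic timestamp.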
Describe alternatives you've considered
None.
Additional context
Additional context may be added here. The issue may also be decomposed into several smaller ones in the future.