A simulation environment to analyze different autoscaling algorithms of containers and nodes.
ASCAL is a Python package that simulates different autoscaling algorithms, focusing on minimizing costs in Container as a Service (CaaS) systems deployed on Infrastructure as a Service (IaaS) clouds. It applies to CPU-bound applications.
The following autoscaling algorithms are implemented:
- RHA. Reactive and horizontal autoscaling of containers and nodes. It implements HPA (Horizontal Pod Autoscaler) and CA (Cluster Autoscaler) concepts of Kubernetes autoscalers.
- RHVA. Reactive horizontal/vertical autoscaling of containers and nodes. FCMA algorithms (https://github.com/asi-uniovi/fcma) are used to calculate near-optimal allocations in terms of cost at periodic times, called Fixed Deployment Windows (FDWs). The transition between two consecutive FDWs relies on optimizing the recycling of containers and nodes.
- PHVA. Predictive horizontal/vertical autoscaling of containers and nodes. Analogous to the previous one, but assuming a perfect load prediction. Time is divided into prediction windows, where application loads at a given percentile are known in advance.
- RHA-RHVA. Mixed reactive horizontal autoscaling and reactive horizontal/vertical autoscaling. It combines the RHA and RHVA autoscalers. It periodically readjusts with RHVA the deployment generated by the RHA algorithm.
- RHA-PHVA. Mixed reactive horizontal autoscaling and predictive horizontal/vertical autoscaling. It combines the RHA and RHVA autoscalers. It periodically readjusts with PHVA the deployment generated by the RHA algorithm.
Inputs:
- A set of applications.
- Application workloads, measured in req/s (requests per second).
- A family of instance classes. Currently, ASCAL works with a single instance class family, for example, the AWS C5, M5 and R5 families of instance classes. A family is made up of instance classes with identical performance per core, assuming there is enough memory to run the applications. An example of family definition can be found in file aws_eu_west_1.py. In that file, AWS c5, AWS m5 and AWS r5 instances have the same performance per core for any application. The instances differ in the number of cores and memory, but the cost per core and per GiB of memory is constant. File aws_eu_west_1.py can be used as a reference to create new instance class families.
- Autoscaler characterization.
Outputs:
- Application performances.
- System cost.
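To make the family constraint concrete, here is a minimal sketch in plain Python. The class, the names and the prices are illustrative assumptions, not ASCAL's actual API; aws_eu_west_1.py is the authoritative reference for real family definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceClass:
    # Hypothetical structure, for illustration only; aws_eu_west_1.py
    # defines the real instance class families used by ASCAL.
    name: str
    cores: int
    mem_gib: float
    price_per_hour: float  # $/hour

# Within one family the cost per core is constant, so doubling the
# cores doubles both the price and the memory.
c5_family = [
    InstanceClass("c5.large",   2,  4.0, 0.096),
    InstanceClass("c5.xlarge",  4,  8.0, 0.192),
    InstanceClass("c5.2xlarge", 8, 16.0, 0.384),
]

# Every instance in the family has the same price per core.
assert len({round(ic.price_per_hour / ic.cores, 6) for ic in c5_family}) == 1
```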
Clone the repository:

```shell
git clone https://github.com/asi-uniovi/ascal.git
cd ascal
```

Install uv (instructions).

Synchronize the packages with uv:

```shell
uv sync
```

uv will create a local virtual environment in the .venv folder and install there ascal in editable mode, together with all the required dependencies. You can manually activate this virtual environment, but it is not required if you run the Python scripts via uv, with the command uv run <script>.
An example of usage can be found in the examples folder, which can be run with:

```shell
cd examples/ex1
uv run ex1.py
```

A Jupyter notebook with the same example can be found in examples/ex1/ex1.ipynb.
In order to use the package in your own code, the scheme provided in the example can be followed.
1. Write the required imports.

```python
from ascal import AscalConfig, Ascal
from examples import aws_eu_west_1_c5m5r5
```

Module aws_eu_west_1_c5m5r5 defines the cost, cores and GiB of AWS's C5, M5 and R5 instances and can be used as a reference to define new instance class families.
2. Set the ASCAL configuration.

```python
ascal_config = AscalConfig.get_from_config_yaml(config_file, aws_eu_west_1_c5m5r5.c5_m5_r5_fm)
```

Argument config_file is a YAML file with ASCAL's configuration. Argument aws_eu_west_1_c5m5r5.c5_m5_r5_fm is the instance class family. The file also contains the child families c5, m5 and r5, which can be used to restrict the type of nodes used by the autoscaler.
3. Create an ASCAL simulation.

```python
ascal_problem = Ascal(ascal_config, log=log_file)
```

The second argument is an optional log file. Logging is disabled when it is not set.
4. Execute an ASCAL simulation.

```python
ascal_problem.run()
```

It simulates for the specified number of simulation units. In the example it simulates until the end, defined by the length of the application workloads. The simulation unit is 1 second.
5. Generate CSV files with workloads and simulation results.

```python
ascal_problem.write_workload_csv('workloads.csv')
ascal_problem.write_performance_csv('performances.csv')
ascal_problem.write_cost_csv('cost.csv')
```

6. Optionally, plot the workloads and simulation results.

```python
ascal_problem.plot(ascal_problem.get_workloads(), "Application Workloads", "req/s")
ascal_problem.plot(ascal_problem.get_performances(), "Application Performances", "req/s")
cluster_cost = ascal_problem.get_cluster_cost()
total_cost_str = f"total cost = {sum(cluster_cost)/3600:.3f} $"
ascal_problem.plot({total_cost_str: cluster_cost}, "Cluster Cost", "$/hour")
```

An ASCAL simulation can be configured through a YAML configuration file, which is divided into four sections:
- Autoscalers.
- Timings.
- Applications.
- Simulation time.
An example can be found in file examples/ex1/config.yaml, which is described below.
1. Autoscalers.

```yaml
autoscalers:
  h_reactive:
    time_period: 60 # Seconds
    desired_cpu_utilization: 0.7
    node_utilization_threshold: 0.5
    replica_scale_down_stabilization_time: 300
    node_scale_down_stabilization_time: 600
    aggs: 2
```
One or more autoscalers can be defined, although only one will be used. The h_reactive autoscaler corresponds to RHA, i.e., the reactive and horizontal autoscaler. This autoscaler wakes up every time_period and creates/removes containers and nodes. For each application, the number of container replicas is adjusted to obtain the desired_cpu_utilization, using the same ideas behind HPA in Kubernetes. If more replicas are required and they cannot be allocated in the current nodes, new nodes are created, using the minimum number of nodes while optimizing the cost. For example, if two c5.xlarge nodes would be required to allocate the new containers, a single c5.2xlarge node is used instead, which provides the same computational resources at the same cost, but with less fragmentation (which helps in the allocation process). The removal of application replicas cannot start before replica_scale_down_stabilization_time seconds have elapsed since the last time the number of replicas increased.
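The replica adjustment just described can be sketched with the standard proportional rule used by HPA in Kubernetes. This is a simplified illustration, not ASCAL's actual code:

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     desired_cpu_utilization: float) -> int:
    # Proportional HPA-style rule: scale the replica count by the ratio
    # between the observed and the desired CPU utilization.
    return max(1, math.ceil(current_replicas * current_utilization
                            / desired_cpu_utilization))

# An application running 4 replicas at 91 % average CPU, with a target
# of 0.7, needs ceil(4 * 0.91 / 0.7) = ceil(5.2) = 6 replicas.
print(desired_replicas(4, 0.91, 0.7))  # → 6
```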
When a node's CPU utilization is below node_utilization_threshold, the node is eligible for removal. A simulation process analogous to that performed by CA in Kubernetes is carried out, trying to move all its containers to other nodes. If the movement is successful, the node is removed. Node removals are not attempted until node_scale_down_stabilization_time seconds have elapsed since the last node creation.
Unlike HPA in Kubernetes, this autoscaler is able to work with containers of different sizes, in terms of CPU cores, for the same application. Each application defines a minimum-size container and a set of aggregations that multiply the CPU cores and the performance, while the memory remains constant. Property aggs: 2 indicates that all the applications use aggregation 2 (double CPU cores and performance), but other settings are possible, such as {app0: [1, 2], app1: 2}.
```yaml
  hv_reactive:
    time_period: 60 # Seconds
    desired_cpu_utilization: 0.7
    algorithm: fcma
    transition_time_budget: 100 # Seconds
    hot_node_scale_up: False
```

The hv_reactive autoscaler corresponds to RHVA, i.e., the reactive horizontal/vertical autoscaler. This autoscaler wakes up every time_period, also named scheduling window, and creates/removes containers and nodes of different sizes. For each application, the total performance of its container replicas is adjusted to obtain the desired_cpu_utilization. Just like with the horizontal autoscaler, a value lower than one is necessary to handle temporary overloads, or those caused by the delay between requesting the creation of a node and the moment the node becomes available. hot_node_scale_up enables hot node upgrades within the same instance class family.
A new deployment of containers and nodes is calculated at every time period using the specified algorithm. Currently there are three available algorithms: fcma1, fcma2 and fcma3, in order of decreasing cost but increasing calculation time.
Transitioning between two consecutive scheduling windows requires a sequence of creations/removals of containers and nodes. Property transition_time_budget provides a budget for the transition algorithm, but the actual time to perform the transition may be higher when the value is too small and new nodes need to be created. Setting this parameter to zero uses a simple transition algorithm that tries to minimize the transition time.
```yaml
  hv_predictive:
    prediction_window: 240 # Seconds
    prediction_percentile: 95 # Percentage
    algorithm: fcma
    transition_time_budget: 200 # Seconds
    hot_node_scale_up: True
```

The hv_predictive autoscaler corresponds to PHVA, i.e., the predictive horizontal/vertical autoscaler. This autoscaler wakes up every prediction_window and creates/removes containers and nodes of different sizes.
At the current prediction window, the application workload (at the given percentile) is assumed to be known for the next prediction window. This workload is used to calculate a new deployment for the next prediction window using the provided algorithm (fcma in the example). Two transitions are scheduled: one that completes before the end of the current prediction window, for those applications with increased workload, and another that completes just at the beginning of the next prediction window, for those applications with decreased workload.
Property transition_time_budget is the total time available to perform the two transitions but, again, the actual time may be higher.
Some systems allow hot node scaling-up, so a node can be upgraded to acquire more computational resources without being removed and recreated.
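The perfect-prediction assumption can be sketched as follows: for each prediction window, the autoscaler sizes the next deployment for the workload at the configured percentile of the next window's samples. The helper below is an illustration (nearest-rank percentile), not part of ASCAL's API:

```python
import math

def window_percentile(samples: list[int], percentile: float) -> int:
    # Nearest-rank percentile of the per-second workload samples
    # belonging to one prediction window.
    s = sorted(samples)
    rank = max(1, math.ceil(percentile / 100 * len(s)))
    return s[rank - 1]

# A 240-second prediction window: the next deployment is sized for the
# 95th-percentile load instead of the peak.
print(window_percentile(list(range(1, 241)), 95))  # → 228
```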
```yaml
  h_reactive_hv_reactive:
    desired_cpu_utilization: 0.7
    h_time_period: 60
    h_node_utilization_threshold: 0.5
    h_replica_scale_down_stabilization_time: 300
    h_node_scale_down_stabilization_time: 600
    hv_time_period: 600
    hv_algorithm: fcma
    hv_transition_time_budget: 100
    hot_node_scale_up: False
```

The h_reactive_hv_reactive autoscaler corresponds to RHA-RHVA, i.e., a combination of a reactive horizontal autoscaler and a reactive horizontal/vertical autoscaler. It basically performs as a reactive horizontal autoscaler that is readjusted every hv_time_period seconds by a reactive horizontal/vertical autoscaler, moving the deployment to a near-optimal one.
```yaml
  h_reactive_hv_predictive:
    h_desired_cpu_utilization: 0.7
    h_time_period: 60
    h_node_utilization_threshold: 0.5
    h_replica_scale_down_stabilization_time: 300
    h_node_scale_down_stabilization_time: 600
    hv_prediction_window: 600
    hv_prediction_percentile: 95
    hv_algorithm: fcma
    hv_transition_time_budget: 100
    hot_node_scale_up: False
```

The h_reactive_hv_predictive autoscaler corresponds to RHA-PHVA, i.e., a combination of a reactive horizontal autoscaler and a predictive horizontal/vertical autoscaler. It basically performs as a reactive horizontal autoscaler that is readjusted every hv_prediction_window seconds by a predictive horizontal/vertical autoscaler, moving the deployment to a near-optimal one.
Finally, the autoscaler to be used must be set. For example:

```yaml
autoscaler: h_reactive_hv_reactive
```

2. Timings.
This section defines the timings, in seconds, related to node creation, node removal, container allocation and container removal.
```yaml
timing_args:
  node_time_to_billing: 20
  node_creation_time: 100
  node_removal_time: 10
  container_creation_time: 1
  container_removal_time: 5
  hot_node_scale_up_time: 60
```

Property node_time_to_billing is the time between the request for a new node and the moment the node starts to be billed. Between node_time_to_billing and node_creation_time the node cannot allocate containers yet, but it is billed. Finally, hot_node_scale_up_time is the time required to perform a node upgrade.
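The billed-but-unusable interval implied by these timings can be worked out with plain arithmetic (an illustration, not ASCAL code; the node price below is a hypothetical value):

```python
# Timeline of a node creation request under the timings above.
node_time_to_billing = 20   # s until billing starts
node_creation_time = 100    # s until the node can allocate containers

billed_unusable_s = node_creation_time - node_time_to_billing
# Assuming a hypothetical node price of 0.384 $/hour:
wasted_cost = 0.384 * billed_unusable_s / 3600
print(f"{billed_unusable_s} s billed before the node is usable "
      f"({wasted_cost:.5f} $ per creation)")
```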
3. Applications.
This section in the YAML file defines the applications.
```yaml
apps:
  app0:
    load:
      file: ../../traces/triangle_rand.csv
      time_interval: 1
      repeat: 1
      load_offset: 10
      load_mult: 4
      time_offset: 0
    container:
      cpu: 260 mcores
      mem: 0.420 GiB
      perf: 1 req/s
      aggs: [1, 2, 4]
```

The application workload is based on a trace file in CSV format, with a column per application, where each line corresponds to a sample of integer workloads. The other properties preprocess the trace. time_interval is the time in seconds between two consecutive samples. The whole set of samples is repeated repeat times; for example, one hour of samples can be repeated to obtain the trace for several hours. Workloads can be incremented by an integer load_offset and multiplied by a float load_mult. Finally, the first sample of the workload is repeated time_offset times, adding a time offset to the final workload.
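The preprocessing chain just described can be sketched in a few lines. This is an illustration, not ASCAL's implementation; in particular, applying load_offset before load_mult is an assumption:

```python
# Sketch of the trace preprocessing: repeat, offset, multiply, and
# time-shift one workload column.
def preprocess(samples: list[int], repeat: int, load_offset: int,
               load_mult: float, time_offset: int) -> list[float]:
    trace = samples * repeat                           # repeat the whole trace
    trace = [(s + load_offset) * load_mult for s in trace]
    # time_offset: prepend copies of the first sample to delay the trace
    return [trace[0]] * time_offset + trace

print(preprocess([1, 2, 3], repeat=2, load_offset=10, load_mult=4.0,
                 time_offset=1))
# → [44.0, 44.0, 48.0, 52.0, 44.0, 48.0, 52.0]
```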
Each application is implemented as a set of containers, with allocated cores, memory and an associated performance. Three different types of containers, or container classes, are defined for app0 in the example. The cpu, mem and perf data in the YAML file correspond to the minimum-size container, with aggregation level 1. The other container classes correspond to aggregation levels 2 and 4: they have 2x and 4x the cores and performance, but the memory is assumed to be the same. The value 1/perf represents the minimum inter-arrival time required for the minimum-size container to meet the application's timing requirements.
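The container classes of app0 can be sketched as follows (an illustrative helper, not ASCAL's API):

```python
# Container classes derived from app0's minimum-size container:
# cores and performance scale with the aggregation level, memory
# stays constant.
min_container = {"cpu_mcores": 260, "mem_gib": 0.420, "perf_rps": 1.0}

def container_class(agg: int) -> dict:
    return {
        "cpu_mcores": min_container["cpu_mcores"] * agg,
        "mem_gib": min_container["mem_gib"],   # memory does not scale
        "perf_rps": min_container["perf_rps"] * agg,
    }

print(container_class(2))
# → {'cpu_mcores': 520, 'mem_gib': 0.42, 'perf_rps': 2.0}
```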
4. Simulation time.
The simulation time is defined in this section.

```yaml
simulation_time: 3600
```

If the workload is longer than this time, it is truncated at the end. However, if the workload is too short, the last sample is repeated to reach the simulation time.
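The truncate-or-pad behaviour can be sketched as follows (an illustration of the rule described above, not ASCAL's code):

```python
# Reconcile a workload column with simulation_time: a longer trace is
# truncated, a shorter one is padded by repeating its last sample.
def fit_to_simulation_time(workload: list[float],
                           simulation_time: int) -> list[float]:
    if len(workload) >= simulation_time:
        return workload[:simulation_time]
    return workload + [workload[-1]] * (simulation_time - len(workload))

print(fit_to_simulation_time([5.0, 6.0], 4))  # → [5.0, 6.0, 6.0, 6.0]
```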