Oasis server healthcheck - slow when under load

> **Report:** The healthcheck API was lagging under high load, which caused EKS to kill the Oasis server pod. We raised the probe thresholds, so it seems OK for now. I am also refactoring parts of our code to cache some things so we don't hit the Oasis endpoints all the time. But it's really worth looking into Oasis server optimisation if you have time.

> On a side note, there seems to be an internal process inside the server that also tries to call the health API, but with the wrong URL: instead of `api/health`, it calls `health`. I think it's `wait-for-server.sh`, but I couldn't work out how it's connected to Gunicorn.


### Might be caused by one of two issues
* **Resource problem** -> the platform node is overloaded, so the healthcheck is slow to respond. This may be correct behaviour, but something might still need optimisation.
* **Max concurrent requests issue** -> because the HTTP server is synchronous (Gunicorn, WSGI), it is limited to `number of workers * threads per worker` concurrent requests. If long-running requests occupy all of those slots, the healthcheck response hangs until one frees up.
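
The second cause can be illustrated without Gunicorn at all. The sketch below is only a model, not the Oasis server code: it treats the `workers * threads` slots as a fixed-size thread pool (the function name `healthcheck_latency` and the numbers are made up for illustration) and measures how long a trivial healthcheck waits when every slot is held by a long-running request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def healthcheck_latency(slots: int, long_requests: int, long_duration: float) -> float:
    """Return how many seconds a trivial healthcheck waits for a free slot.

    `slots` stands in for (workers * threads per worker); this is a model of a
    synchronous server, not Gunicorn itself.
    """
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # Simulate long-running API requests taking up slots.
        for _ in range(long_requests):
            pool.submit(time.sleep, long_duration)

        # The healthcheck itself is instant, but it must queue for a slot.
        start = time.monotonic()
        pool.submit(lambda: "OK").result()
        return time.monotonic() - start


if __name__ == "__main__":
    # All slots busy: the healthcheck is delayed by roughly long_duration.
    print("saturated:", healthcheck_latency(slots=2, long_requests=2, long_duration=0.5))
    # One slot free: the healthcheck answers almost immediately.
    print("spare slot:", healthcheck_latency(slots=2, long_requests=1, long_duration=0.5))
```

If the real server behaves like this model, the probe timeout is effectively racing the longest in-flight request, which would match the pod being killed only under load.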

### Options
- Give the healthcheck calls their own dedicated thread
- For the side note, it probably makes sense to update the routing so that both `api/healthcheck` and `healthcheck` are valid
- Update the server to support async HTTP
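
As a rough sketch of the first two options together, the snippet below runs a tiny standard-library HTTP server on its own daemon thread and answers both URL forms. Everything here is an assumption for illustration: the names `HealthHandler` and `start_health_server`, the response body, and the idea of a separate listener are not taken from the Oasis codebase, and the real fix may belong in the existing routing or Gunicorn config instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Accept both URL forms mentioned in the issue, with or without a trailing slash.
HEALTH_PATHS = {"/healthcheck", "/healthcheck/", "/api/healthcheck", "/api/healthcheck/"}


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that only serves healthcheck requests."""

    def do_GET(self):
        if self.path in HEALTH_PATHS:
            body = b'{"status": "OK"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Silence per-request logging so probes do not flood the logs.
        pass


def start_health_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the health listener on a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

One trade-off to keep in mind: a listener like this only proves the process is alive, not that the main workers can still serve API requests, so it suits a liveness probe better than a readiness probe. With several Gunicorn workers, each worker process would also need its own port or the listener would have to live in a single process.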

