Discussion: RPC server automatic recovery #38

Wanted to start a discussion to see if anyone has ideas on how to improve the reliability of the RPC server.

The recent issues with batch requests caused the RPC server to stop responding without recovering on its own. PM2 does not detect the problem and does not restart the service automatically. The node operator has to notice that something is wrong and run pm2 restart manually. Not ideal when it happens in the middle of the night.

How can we make it more resilient? Can the node detect the problem and restart itself? Currently, the death of the RPC server does not crash the whole service, so PM2/docker cannot know if it needs to be restarted.
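One possible direction, sketched below, is an in-process watchdog: the node periodically probes its own RPC endpoint and, after several consecutive failures or timeouts, exits the whole process so that PM2/docker sees a crash and restarts it. This is only a rough sketch under assumptions. The names `createWatchdog`, `withTimeout`, and `WatchdogOptions`, as well as the thresholds, are hypothetical and not part of the existing codebase, and the probe itself (for example a cheap request against the local RPC port) is left to the caller.

```typescript
// Hypothetical watchdog sketch; all names and thresholds are assumptions.
type Probe = () => Promise<void>;

interface WatchdogOptions {
  probe: Probe;            // resolves if the RPC server answered, rejects otherwise
  timeoutMs: number;       // how long a single probe may take before it counts as failed
  maxFailures: number;     // consecutive failures tolerated before declaring the server dead
  onUnhealthy: () => void; // what to do then, e.g. () => process.exit(1)
}

// Reject if the given promise does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("probe timed out")), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Returns a `check` function that runs one probe and resolves to the
// current number of consecutive failures.
function createWatchdog(opts: WatchdogOptions): () => Promise<number> {
  let failures = 0;
  return async function check(): Promise<number> {
    try {
      await withTimeout(opts.probe(), opts.timeoutMs);
      failures = 0; // any successful probe resets the counter
    } catch (_err) {
      failures += 1;
      if (failures >= opts.maxFailures) {
        opts.onUnhealthy();
      }
    }
    return failures;
  };
}

// Possible wiring (not executed here): probe every 10 seconds and exit with a
// non-zero code after 3 consecutive failures so the process manager restarts us.
//
//   const check = createWatchdog({
//     probe: myRpcProbe,            // hypothetical: sends a cheap request to the local RPC port
//     timeoutMs: 5_000,
//     maxFailures: 3,
//     onUnhealthy: () => process.exit(1),
//   });
//   setInterval(check, 10_000);
```

The timeout matters here because the reported failure mode is a server that hangs rather than one that returns an error. Requiring several consecutive failures is meant to avoid restarting on a single slow response.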

This may also need more investigation to understand why the server stops responding without throwing an exception.
