Discussion: RPC server automatic recovery #38

@hiveuprss

Wanted to start a discussion to see if anyone has ideas on how to improve the reliability of the RPC server.

The recent issues with batch requests left the RPC server in a state where it stops responding and does not recover on its own. PM2 does not detect the problem, so it does not restart the service automatically. The node operator has to notice the outage and run pm2 restart manually, which is not ideal when it happens in the middle of the night.

How can we make it more resilient? Can the node detect the problem and restart itself? Currently, the death of the RPC server does not crash the whole service, so PM2/docker cannot know if it needs to be restarted.
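One possible direction, sketched here under several assumptions: an in-process watchdog that probes the RPC endpoint on a timer and exits the process after a few consecutive failures, so that PM2 (which restarts exited processes by default) or a Docker restart policy can bring the service back. All names below (FailureTracker, startWatchdog, the probe function, the interval and threshold values) are hypothetical and not taken from the project.

```typescript
// Hypothetical sketch: none of these names exist in the project.

/** Counts consecutive failed probes; a single success resets the count. */
class FailureTracker {
  private failures = 0;

  constructor(private readonly threshold: number) {}

  /** Records one probe result; returns true once the threshold is reached. */
  record(ok: boolean): boolean {
    this.failures = ok ? 0 : this.failures + 1;
    return this.failures >= this.threshold;
  }
}

/**
 * Runs `probe` every `intervalMs` and calls `onUnhealthy` after `threshold`
 * consecutive failures. By default it exits with a non-zero code so that an
 * external supervisor (PM2, Docker) restarts the whole service.
 */
function startWatchdog(
  probe: () => Promise<boolean>,
  intervalMs: number,
  threshold: number,
  onUnhealthy: () => void = () => process.exit(1),
): ReturnType<typeof setInterval> {
  const tracker = new FailureTracker(threshold);
  let probing = false;

  return setInterval(async () => {
    if (probing) return; // previous probe still in flight, skip this tick
    probing = true;
    try {
      // A probe that throws counts as a failure.
      const ok = await probe().catch(() => false);
      if (tracker.record(ok)) onUnhealthy();
    } finally {
      probing = false;
    }
  }, intervalMs);
}
```

The probe is injected because the right health check depends on the server: it could be a real RPC call against localhost with a timeout, for example. Requiring several consecutive failures avoids restarting on a single slow response. Note that an in-process timer only helps while the event loop is still running; if the whole process is blocked, the check would have to live in a separate process or a Docker HEALTHCHECK instead.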

This may need more investigation to understand why the server stops responding without throwing an exception.
