Reduce an error level (1.1.2 branch)#147
Reduce an error level (1.1.2 branch)#147cbeck88 wants to merge 1 commit intomobilecoinfoundation:release-1.1.2from
Conversation
| Err(err) => { | ||
| // Failing to publish a report is not fatal, because we attempt to publish | ||
| Err(Error::PublishReport) => { | ||
| // This is already logged adequately within |
There was a problem hiding this comment.
Somewhat unrelated to this PR, but in regards to the logging inside publish_report. I see a comment noting that if it "doesn't resolve itself" then we need to take action. Could you provide context for how I might identify that it is not going to resolve itself?
There was a problem hiding this comment.
Yeah this is a good question, and maybe we should change the code more aggressively.
So the idea here was that we try to publish, but we don't block if we can't. Because it's not critical that we republish the report with a higher expiry number, as long as one of our publications succeeds before the key actually expires.
There are other things we could do here:
- Use retries, and log an error but only if e.g. 3 retries fail
- Keep track of how often the report publication has failed, and start blocking and looping infinitely if it fails too many times in a row.
I wanted to avoid making significant changes in a release branch, so I just wanted to adjust the log level in this commit.
Could you provide context for how I might identify that it is not going to resolve itself?
If this warning persists every time we attempt to publish, then eventually the key will expire and users can't publish.
If we manage to process a block without emitting this warning, then the system has recovered.
I don't think random connection issues are going to cause this to happen -- if we randomly lose connection to postgres and this publication action fails, it has to be resolved when we try to write block data to postgres, because that is going to block infinitely on being able to write the data, and we won't hit this action again until that action succeeds.
There was a problem hiding this comment.
We can consider whether to make this do something else in master branch (1.2.0) I think?
Sometimes, we lose connection to the sql DB, and the report server write operation fails.
This should not be logged as an error because it's not actionable and errors get converted to sentry alerts.