Skip to content

Race condition between __record_timer() and flush() #86

@oseemann

Description

@oseemann

Occasionally our pystatsd instance freezes, maybe in a similar way as in #40. Strace showed a KeyError in the following line:

See server.py:118

if key not in self.timers:
            self.timers[key] = [ [], ts ]
self.timers[key][0].append(float(value or 0))
self.timers[key][1] = ts  # <--- KeyError

It seems that the check ensures that the key is always present in self.timers. However, there's a second thread running, flushing the contents of self.timers periodically, and also deleting the contents of self.timers.

See server.py:228

del(self.timers[k])

When the flush happens right between the check in line 115 and the access in lines 117 and 118 then a KeyError will be raised. We think this is exactly what happens.

We haven't seen it happen in self.__record_counter, but it looks vulnerable in the same way.

We haven't had time yet to come up with a patch, but maybe something like this would work:

Use a threading.Lock() to synchronize access to self.timers and self.counters. The self.flush() involves network communication and can therefore take a long time. We don't want to block the receiving code during that time. Therefore maybe just use the lock to (deep-)copy self.timers and reset it. Then release the lock and flush from the copy. (Or not copy at all, but just switch references, i.e. assign self.timers to a second variable pointing to the same dictionary and then assign self.timers an empty dict and continue flushing from the new variable.)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions