Skip to content

bad timew performance regression from timegm relocation into libshared #104

@smemsh

Description

@smemsh

The commit 5cb49e6 from #99 appears to create severe performance degradation in TimeWarrior as detailed in GothenburgBitFactory/timewarrior#703. It's clear by backing out the libshared peg in timewarrior to the previous commit 8ad3646, it fixes the performance issue. FYI @tobiolo.

I have ~9.25k time intervals in my timew db. After adding this gmtime() patch the following main differences from call count trace of this command:

timew get dom.tracked.1.start
  • 37k (4x intervals) new openat()/close()/lseek(), which appear to be for /etc/localtime
  • 112k (12x intervals) new newfstatat()
  • 74k (8x intervals) new read()
  • 18k (2x intervals) new mktime()/setenv()/unsetenv()/timegm(), joining the count of time()/gmtime() whereas correspondingly, these functions are not called before the patch.
  • 37k (4x intervals) new tzset() calls

So I think for every interval in my timew data, a Datetime object is used for beginning and end, or maybe Datetime::resolve() is called and ends up running:

_date = utc ? timegm (&t) : mktime (&t);

So there's 18.5k of those for 9.25k entries, and for some reason 2x that number of tzset() are called (I see only one call in the patch, not 2), every one of which opens /etc/localtime. And then there's seeking and reading.

So that should be fixed...

I took a look at the glibc version of timegm() and followed it, I don't see any tzset() being done inside that. I don't know why the version in Taskwarrior was doing that. It must be that prior to moving this into libshared, Timewarrior was using the system version of timegm(), but now it's using the one in libshared which does not detect HAVE_TIMEGM correctly.

My cmake is not defining HAVE_TIMEGM, yet my system does have this function. I think we should maybe make this private implementation used on Windows only as the timegm() function appears to be available on both GNU and BSD.

Also I don't see why we need tzset()/*env() at all even in the private implementation. This is unnecessary (as evidenced by the glibc implementation not needing it).

If taskwarrior is doing this it would likewise benefit from not using this slow implementation. However I don't see timegm() being called in taskwarrior anywhere...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions