
Running time barely affected by data size or GPU #8

@zhxiaokang


Thank you for the great tool!

While testing it, I noticed that `python ECOLE_call.py` takes a very long time, so I ran some experiments and collected the following timings:

| Dataset | Sample | CPU duration | GPU duration | Speedup (CPU time / GPU time) |
|---|---|---|---|---|
| test | HG001_part | 56m 35s | 47m 46s | 1.18x |
| test | HG006_part | 59m 25s | 47m 14s | 1.26x |
| GIAB | HG001 | 53m 05s | 51m 16s | 1.04x |
| GIAB | HG002 | 1h 01m 10s | 50m 59s | 1.20x |
| GIAB | HG003 | 55m 55s | 48m 26s | 1.15x |
| GIAB | HG004 | 51m 32s | 48m 43s | 1.06x |
| GIAB | HG005 | 59m 00s | 50m 59s | 1.16x |
| GIAB | HG006 | 1h 00m 00s | 48m 39s | 1.23x |
| GIAB | HG007 | 57m 57s | 49m 15s | 1.18x |
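For reference, the Speedup column is the CPU duration divided by the GPU duration. Here is a small Python sketch that recomputes it from the durations in the table; the helper names `to_seconds` and `speedup` are my own, for illustration only, and are not part of ECOLE.

```python
import re

# Timings copied from the table: (sample, CPU duration, GPU duration).
TIMINGS = [
    ("HG001_part", "56m 35s", "47m 46s"),
    ("HG006_part", "59m 25s", "47m 14s"),
    ("HG001", "53m 05s", "51m 16s"),
    ("HG002", "1h 01m 10s", "50m 59s"),
    ("HG003", "55m 55s", "48m 26s"),
    ("HG004", "51m 32s", "48m 43s"),
    ("HG005", "59m 00s", "50m 59s"),
    ("HG006", "1h 00m 00s", "48m 39s"),
    ("HG007", "57m 57s", "49m 15s"),
]


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h 01m 10s' or '56m 35s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r"(\d+)([hms])", duration))


def speedup(cpu: str, gpu: str) -> float:
    """Speedup of the GPU run over the CPU run: CPU time / GPU time."""
    return to_seconds(cpu) / to_seconds(gpu)


if __name__ == "__main__":
    for sample, cpu, gpu in TIMINGS:
        print(f"{sample:<11} {speedup(cpu, gpu):.2f}x")
```

Rounded to two decimals, this reproduces every value in the Speedup column.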

HG001_part and HG006_part are subsets of the original datasets, each containing about 2% of the reads.

I have two surprising observations:

  1. Overall, the GPU gave only a modest speedup over the CPU (between 1.04x and 1.26x).
  2. The subsets took almost the same time as the full datasets (on CPU, HG001_part even took longer than HG001: 56m 35s vs. 53m 05s).
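As a rough check of observation 2, here is a back-of-the-envelope Python sketch. It assumes runtime = fixed cost + variable cost × f, where f is the fraction of reads processed (1.0 for the full data, 0.02 for the subsets). That linear model is my own assumption, not something documented for ECOLE, and `split_runtime` is a hypothetical helper.

```python
# Assumed model (not from ECOLE): runtime = fixed + variable * f,
# where f is the fraction of reads processed (1.0 = full dataset).
SUBSET_FRACTION = 0.02  # the subsets hold about 2% of the reads


def split_runtime(t_part: float, t_full: float,
                  fraction: float = SUBSET_FRACTION) -> tuple[float, float]:
    """Solve the two-point linear model for (fixed, variable) in seconds.

    `variable` is the extra time needed to process 100% of the reads.
    """
    variable = (t_full - t_part) / (1.0 - fraction)
    fixed = t_full - variable
    return fixed, variable


# (label, subset seconds, full-dataset seconds), converted from the table.
RUNS = [
    ("HG001 CPU", 56 * 60 + 35, 53 * 60 + 5),
    ("HG001 GPU", 47 * 60 + 46, 51 * 60 + 16),
    ("HG006 CPU", 59 * 60 + 25, 60 * 60),
    ("HG006 GPU", 47 * 60 + 14, 48 * 60 + 39),
]

if __name__ == "__main__":
    for label, t_part, t_full in RUNS:
        fixed, variable = split_runtime(t_part, t_full)
        print(f"{label}: fixed ~{fixed:.0f}s ({fixed / t_full:.0%} of full run), "
              f"variable ~{variable:.0f}s")
```

Under this model, the GPU timings attribute roughly 93% (HG001) and 97% (HG006) of the full run to the fixed part. The CPU HG001 pair even gives a negative variable cost, which only means run-to-run variation is larger than the data-dependent part.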

Is this expected? If so, why does it happen?
