Skip to content

Speed up 3 slow tests.#369

Merged
bgrant merged 4 commits into
enthought:masterfrom
cowlicks:faster-tests
May 16, 2014
Merged

Speed up 3 slow tests.#369
bgrant merged 4 commits into
enthought:masterfrom
cowlicks:faster-tests

Conversation

@cowlicks
Copy link
Copy Markdown
Contributor

@cowlicks cowlicks commented May 9, 2014

No description provided.

@cowlicks cowlicks added the easy label May 9, 2014
Comment thread distarray/dist/tests/test_distarray.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kindof cheating, isn't it? :) I'd prefer us to keep the bounds the same and find a better and faster way to do the assignment rather than the for loops below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the for loops can be replaced with fromarray. But I don't see any reason to make this test so large.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, May 12, 2014 at 11:17 AM, Blake Griffith
notifications@github.comwrote:

In distarray/dist/tests/test_distarray.py:

 def test_from_global_dim_data_uu(self):
  •    rows = 6
    
  •    cols = 20
    
  •    rows = 3
    
  •    cols = 10
    

the for loops can be replaced with fromarray.

+1.

But I don't see any reason to make this test so large.

Under what definition of "large" is a 120-element array considered so? We
should have a few 106 element arrays in the normal test suite for some
small scaling tests. The performance and profiling test suite should work
with 10
9 element arrays routinely to make obvious any communication or
performance bottlenecks.

Please keep this test at 120 elements at least, and when you or someone
else implements it on top of fromarray with slicing, we should bump that
number up a few orders of magnitude.


Reply to this email directly or view it on GitHubhttps://github.com//pull/369/files#r12538308
.

Kurt W. Smith, Ph.D. ksmith@enthought.com
Enthought, Inc. http://www.enthought.com | 512.536.1057

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these from_global_dim_data_<distribution> tests are indexing with for loops like this. Should this be changed for the all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are also filling up the whole array with unique values, then not doing anything with them...? I suppose these tests are also trying to test that ever element can be indexed too.

But if we just want to test that we can make a distarray with a given distribution and index all its elements we can do:

  • create distribution
  • create distarray
  • call distarray.tondarraay

Should I change the tests to work like this?

@kwmsmith
Copy link
Copy Markdown
Contributor

@cowlicks what's the overall effect on peformance with these changes?

@kwmsmith kwmsmith added this to the 0.3 milestone May 10, 2014
@cowlicks
Copy link
Copy Markdown
Contributor Author

@kwmsmith here is before remote
image
... and local
image

I'm currently trying to generate the after images. Also, sorry these are images, the nose plugin uses color escape codes which I have not been able to remove, so piping the output to a text file is unreadable.

@cowlicks
Copy link
Copy Markdown
Contributor Author

I've added changes to more tests to this PR. See 23d7157

@cowlicks
Copy link
Copy Markdown
Contributor Author

Changes mostly affect test_distarray.py so here are the remote timings there, on this pr:
screenshot from 2014-05-12 16 31 29

And on master:

screenshot from 2014-05-12 16 36 30

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume the toarray() call is equivalent to the for loop above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but not exactly. It iterates over the whole array (like the for loop) and calls __getitem__. But it all happens on the client.

The for loop has a round trip for every element and calls __setitem__.

cowlicks added 4 commits May 16, 2014 14:00
speed things up by replacing for loops over arrays with calls to
.toarray(). Which still tests that the array can be indexed, but it
pulls it to the client and does it all on the client instead of
remotely.

This change affects:
    * test_from_global_dim_data_irregular_block
    * test_from_global_dim_data_bu
    * test_from_global_dim_data_uu
    * test_from_global_dim_data_bc
    * test_irregular_block_assignment
@bgrant
Copy link
Copy Markdown
Contributor

bgrant commented May 16, 2014

LGTM. I did similar things in #375, so there may be a minor conflict.

bgrant added a commit that referenced this pull request May 16, 2014
@bgrant bgrant merged commit 521ef90 into enthought:master May 16, 2014
@cowlicks cowlicks deleted the faster-tests branch May 17, 2014 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants