Merged
25 changes: 8 additions & 17 deletions distarray/dist/tests/test_distarray.py
@@ -137,8 +137,7 @@ def test_from_global_dim_data_irregular_block(self):
         )
         distribution = Distribution(self.context, glb_dim_data)
         distarr = DistArray(distribution, dtype=int)
-        for i in range(global_size):
-            distarr[i] = i
+        distarr.toarray()
Contributor

I presume the toarray() call is equivalent to the for loop above?

Contributor Author

Yes, but not exactly. It iterates over the whole array (like the for loop) and calls __getitem__, but it all happens on the client.

The for loop has a round trip for every element and calls __setitem__.
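
To make the per-element cost concrete, here is a minimal sketch with a hypothetical stand-in class (FakeRemoteArray is not part of distarray) that only counts simulated round trips. It assumes each __setitem__ call costs one round trip and that a bulk fetch costs one; the thread does not say how many messages toarray() actually sends, so the bulk case is purely illustrative.

```python
class FakeRemoteArray:
    """Hypothetical stand-in for a distributed array; not distarray's API."""

    def __init__(self, size):
        self._data = [0] * size
        self.round_trips = 0  # simulated client <-> engine messages

    def __setitem__(self, index, value):
        # Assumption: every element assignment is its own round trip.
        self.round_trips += 1
        self._data[index] = value

    def fetch_all(self):
        # Assumption: a bulk fetch moves the whole array in one round trip.
        self.round_trips += 1
        return list(self._data)


size = 40
looped = FakeRemoteArray(size)
for i in range(size):
    looped[i] = i  # one simulated round trip per element

bulk = FakeRemoteArray(size)
values = bulk.fetch_all()  # a single simulated round trip

print(looped.round_trips, bulk.round_trips)
```

Under these assumptions the loop's cost grows linearly with the array size, while the bulk call stays constant.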


     def test_from_global_dim_data_1d(self):
         total_size = 40
@@ -180,9 +179,7 @@ def test_from_global_dim_data_bu(self):
         )
         distribution = Distribution(self.context, glb_dim_data)
         distarr = DistArray(distribution, dtype=int)
-        for i in range(rows):
-            for j in range(cols):
-                distarr[i, j] = i*cols + j
+        distarr.toarray()

     def test_from_global_dim_data_bc(self):
         """ Test creation of a block-cyclic array. """
@@ -204,12 +201,11 @@ def test_from_global_dim_data_bc(self):
             },)
         distribution = Distribution(self.context, global_dim_data)
         distarr = DistArray(distribution, dtype=int)
-        for i in range(rows):
-            for j in range(cols):
-                distarr[i, j] = i*cols + j
+        distarr.toarray()
         las = distarr.get_localarrays()
         local_shapes = [la.local_shape for la in las]
-        self.assertSequenceEqual(local_shapes, [(3,5), (3,4), (2,5), (2,4)])
+        self.assertSequenceEqual(local_shapes,
+                                 [(3, 5), (3, 4), (2, 5), (2, 4)])

     def test_from_global_dim_data_uu(self):
         rows = 6
@@ -226,9 +222,7 @@ def test_from_global_dim_data_uu(self):
         )
         distribution = Distribution(self.context, glb_dim_data)
         distarr = DistArray(distribution, dtype=int)
-        for i in range(rows):
-            for j in range(cols):
-                distarr[i, j] = i*cols + j
+        distarr.toarray()

     def test_global_dim_data_local_dim_data_equivalence(self):
         rows, cols = 5, 9
@@ -300,7 +294,6 @@ def test_global_dim_data_local_dim_data_equivalence(self):
         self.assertSequenceEqual(actual, expected)

     def test_irregular_block_assignment(self):
-        global_shape = (5, 9)
         global_dim_data = (
             {
                 'dist_type': 'b',
@@ -313,9 +306,7 @@ def test_irregular_block_assignment(self):
         )
         distribution = Distribution(self.context, global_dim_data)
         distarr = DistArray(distribution, dtype=int)
-        for i in range(global_shape[0]):
-            for j in range(global_shape[1]):
-                distarr[i, j] = i + j
+        distarr.toarray()


 class TestDistArrayCreation(unittest.TestCase):
@@ -329,7 +320,7 @@ def tearDown(self):
         self.context.close()

     def test___init__(self):
-        shape = (100, 100)
+        shape = (5, 5)
Contributor

Does making this change speed things up much? I'd expect the runtime of this to be essentially independent of the array's shape.

Contributor Author

I thought so too, but this test is really slow for some reason. Locally the runtime drops by roughly half, from 0.3643 to 0.1543; no clue why.

Contributor

That's really surprising -- is the factor of 2 difference repeatable? The new array is 400 times smaller than the original, but the total number of roundtrips is unchanged. We should look into this one to figure out why.

Does setting shape = (1000, 1000) increase the runtime significantly?
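
One way to check whether a timing difference like this is repeatable is to run the measurement several times and compare the best and worst runs. The sketch below uses the standard library's timeit.repeat on a hypothetical workload (run_test_body is a stand-in that touches n*n values locally; it is not the actual test and involves no distarray calls).

```python
import timeit


def run_test_body(n):
    """Hypothetical workload standing in for the test; touches n*n values."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * n + j
    return total


def measure(n, repeats=5):
    # timeit.repeat returns one elapsed time per repeat. The minimum is the
    # least noisy estimate; the gap to the maximum shows the run-to-run spread.
    times = timeit.repeat(lambda: run_test_body(n), number=1, repeat=repeats)
    return min(times), max(times)


for n in (5, 100):
    best, worst = measure(n)
    print(f"n={n}: best={best:.6f}s worst={worst:.6f}s")
```

If the best and worst times for one shape overlap with those of the other, the observed factor of 2 may just be noise.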

Contributor Author

This could be related to the client-controller communication, since the message is sent 4 times: we are now sending 100 elements instead of 40,000.

Contributor Author

shape = (1000, 1000) causes test___init__ to take 20 seconds, a big slowdown. I'm opening an issue now.

Contributor Author

Oh, this test is calling da.tondarray(), which indexes over every element of the distarray; that is why it takes so long.
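
A rough sketch of why the shape matters once every element is indexed: assuming the conversion touches each element exactly once, the work grows with the product of the dimensions. element_fetches is a hypothetical helper for this arithmetic, not a distarray function.

```python
from math import prod


def element_fetches(shape):
    # Assumption: converting to a local array indexes each element once,
    # so the number of fetches equals the number of elements.
    return prod(shape)


for shape in [(5, 5), (100, 100), (1000, 1000)]:
    print(shape, element_fetches(shape))
```

Under that assumption, the (100, 100) array needs 400 times as many fetches as the (5, 5) one, and (1000, 1000) needs a million in total.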

Contributor

I see -- that makes sense. I didn't see the rest of the method in GH's diff view, so I thought the method ended with the fill() call.

Contributor Author

On a side note, I tested that creating distarrays happens in constant time, as expected. I confirmed this while trying to reproduce the lag in this test.

         distribution = Distribution.from_shape(self.context, shape, ('b', 'c'))
         da = DistArray(distribution, dtype=int)
         da.fill(42)
7 changes: 1 addition & 6 deletions distarray/dist/tests/test_distributed_io.py
@@ -209,12 +209,7 @@ def test_save_3d(self):

         dist = {0: 'b', 1: 'c', 2: 'n'}
         distribution = Distribution.from_shape(self.dac, shape, dist=dist)
-        da = self.dac.empty(distribution)
-
-        for i in range(shape[0]):
-            for j in range(shape[1]):
-                for k in range(shape[2]):
-                    da[i, j, k] = source[i, j, k]
+        da = self.dac.fromarray(source, distribution)

         self.dac.save_hdf5(self.output_path, da, mode='w')
         with self.h5py.File(self.output_path, 'r') as fp: