Replace many Pandas operations with NumPy #198
JCGoran wants to merge 4 commits into stefmolin:main from
Congratulations on making your first pull request to Data Morph! Please familiarize yourself with the contributing guidelines, if you haven't already.
Thanks for the PR, @JCGoran! As I'm sure you've seen, I have a backlog to get through 😄 I hope to get to this in the next few weeks.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #198   +/-   ##
=======================================
  Coverage   98.53%   98.53%
=======================================
  Files          58       58
  Lines        1907     1915    +8
  Branches      114      114
=======================================
+ Hits         1879     1887    +8
  Misses         25       25
  Partials        3        3
stefmolin left a comment
Let's start by pulling the LineCollection changes into a separate PR.
298870b to e708f6c
Bump, this is more or less ready for review as-is.
I haven't forgotten 😄 I'm going to work through the PyCon Taiwan sprint PRs first since I couldn't get to them all at the event, and I want to think more about the design of the internals here. I'm traveling right now and will have very limited time for the next couple of weeks.
| A dataset with columns x and y. | ||
| x : Iterable[Number] | ||
| The ``x`` value of the dataset. | ||
| x, y = ( | ||
| start_shape.df['x'].to_numpy(copy=True), | ||
| start_shape.df['y'].to_numpy(copy=True), | ||
| ) |
Can't we use the _x and _y from the Dataset.__init__() changes?
Suggested change:
- x, y = (
-     start_shape.df['x'].to_numpy(copy=True),
-     start_shape.df['y'].to_numpy(copy=True),
- )
+ x, y = start_shape._x, start_shape._y
I'm also wondering if we need to copy here, when we copy in the loop.
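To make the question about copying concrete, here is a minimal sketch. It assumes the perturbation step copies its inputs before modifying them; `perturb` and the sample arrays are hypothetical stand-ins, not Data Morph's actual code. Under that assumption, the starting arrays are never mutated, so an upfront copy would be redundant.

```python
import numpy as np


def perturb(x, y, rng):
    """Hypothetical stand-in: copy the inputs, then nudge one random point."""
    new_x, new_y = x.copy(), y.copy()
    i = rng.integers(0, len(x))
    new_x[i] += rng.normal(scale=0.1)
    new_y[i] += rng.normal(scale=0.1)
    return new_x, new_y


start_x = np.array([1.0, 2.0, 3.0])
start_y = np.array([4.0, 5.0, 6.0])

# No upfront copy: the loop only ever rebinds x and y to fresh arrays.
x, y = start_x, start_y
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = perturb(x, y, rng)

# The starting arrays are untouched and share no memory with the result.
print(start_x, start_y)
print(np.shares_memory(x, start_x))
```

If any step modified `x` or `y` in place before the first copy, the upfront copy would still be needed.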
| self._x = self.df['x'].to_numpy() | ||
| self._y = self.df['y'].to_numpy() |
Suggested change:
- self._x = self.df['x'].to_numpy()
- self._y = self.df['y'].to_numpy()
+ self._x, self._y = self.df[['x', 'y']].to_numpy().T
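For reference, a small sketch of how the one-line form unpacks, using a made-up three-row DataFrame: selecting both columns gives an array of shape `(n, 2)`, and transposing it gives two rows that unpack into `x` and `y`.

```python
import numpy as np
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})

both = df[['x', 'y']].to_numpy()  # shape (3, 2): one row per point
x, y = both.T                     # shape (2, 3): unpacks into two 1-D arrays

print(both.shape, x, y)
```

One difference from converting each column separately: the two-column conversion produces a single array, so both columns end up with a common dtype.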
| y1 : Iterable[Number] | ||
| The original value of ``y``. | ||
| x2 : Iterable[Number] | ||
| The perturbed value of ``x``. |
Extra space:
| The perturbed value of ``x``. | |
| The perturbed value of ``x``. |
| self._x = self.df['x'].to_numpy() | ||
| self._y = self.df['y'].to_numpy() |
Should these be properties? If we change the DataFrame, these will no longer match.
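A rough sketch of what the property approach could look like. The `Dataset` class below is a hypothetical, pared-down stand-in and not the real class; it only illustrates that properties are computed from the current DataFrame on each access, so they cannot go stale.

```python
import pandas as pd


class Dataset:
    """Hypothetical, pared-down stand-in for illustration only."""

    def __init__(self, df):
        self.df = df

    @property
    def _x(self):
        # Recomputed from the current DataFrame on every access.
        return self.df['x'].to_numpy()

    @property
    def _y(self):
        return self.df['y'].to_numpy()


dataset = Dataset(pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]}))
dataset.df = dataset.df.assign(x=[10.0, 20.0, 30.0])

# The property reflects the replaced DataFrame.
print(dataset._x)
```

The trade-off is that the conversion runs on every access, which may matter if the arrays are read inside a hot loop.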
| morphed_data = perturbed_data | ||
| if self._is_close_enough(x, y, *perturbed_data): | ||
| x, y = perturbed_data | ||
| morphed_data = pd.DataFrame({'x': x, 'y': y}) |
This isn't necessary in the loop with the switch to NumPy. We can have _record_frames() make the DataFrame only if we need to save the CSV. The plot() function can be reworked to use NumPy, and since this method returns the DataFrame at the end, we can build it once outside of this loop instead of doing it thousands of times.
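A sketch of the loop shape described above, under stated assumptions: `is_close_enough` and `morph` here are simplified, hypothetical stand-ins (the check only compares means), not the project's actual methods. The point is only that the loop works on arrays and the DataFrame is built once on return.

```python
import numpy as np
import pandas as pd


def is_close_enough(x1, y1, x2, y2, tol=0.01):
    """Hypothetical check: the means of x and y each stay within tol."""
    return (
        abs(x1.mean() - x2.mean()) < tol
        and abs(y1.mean() - y2.mean()) < tol
    )


def morph(x, y, iterations, seed=0):
    """Hypothetical simplified loop: arrays inside, DataFrame once at the end."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        new_x, new_y = x.copy(), y.copy()
        i = rng.integers(len(x))
        new_x[i] += rng.normal(scale=0.01)
        new_y[i] += rng.normal(scale=0.01)
        if is_close_enough(x, y, new_x, new_y):
            x, y = new_x, new_y  # keep working with NumPy arrays
    # Build the DataFrame a single time, after the loop.
    return pd.DataFrame({'x': x, 'y': y})


result = morph(
    np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]), iterations=1000
)
print(result)
```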
Describe your changes
Perf before:
Perf after:
which is more or less in line with the circular shapes.
Checklist