Replace many Pandas operations with NumPy by JCGoran · Pull Request #198 · stefmolin/data-morph

JCGoran · 2024-07-15T22:02:25Z

Describe your changes

Use NumPy instead of pandas to avoid the overhead.

Perf before:

320361688 function calls (316213771 primitive calls) in 116.715 seconds

Perf after:

79419311 function calls (78769517 primitive calls) in 43.085 seconds

which is more or less in line with the circular shapes.
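The two summaries above have the shape of a `cProfile`/`pstats` report (roughly 4× fewer function calls and about 2.7× less wall time). As a minimal sketch of how a before/after comparison like this could be collected, the snippet below profiles two stand-in workloads. The functions `mean_with_pandas`, `mean_with_numpy`, and `profile_calls` are hypothetical illustrations and are not part of Data Morph.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def mean_with_pandas(values, iterations=200):
    """Hypothetical workload: route every computation through a Series."""
    return [pd.Series(values).mean() for _ in range(iterations)]


def mean_with_numpy(values, iterations=200):
    """Hypothetical workload: the same computation on the raw array."""
    return [np.mean(values) for _ in range(iterations)]


def profile_calls(func, *args):
    """Return (total function calls, total seconds) for one call to func."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stats = pstats.Stats(profiler, stream=io.StringIO())
    return stats.total_calls, stats.total_tt


data = np.linspace(0.0, 1.0, 100)
pandas_calls, pandas_seconds = profile_calls(mean_with_pandas, data)
numpy_calls, numpy_seconds = profile_calls(mean_with_numpy, data)
print(f'pandas: {pandas_calls} calls, NumPy: {numpy_calls} calls')
```

The call counts are usually the more stable signal here, since wall time under a profiler varies between runs.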

Checklist

Test cases have been modified/added to cover any code changes.
Docstrings have been modified/created for any code changes.
All linting and formatting checks pass (see the contributing guidelines for more information).

github-actions

Congratulations on making your first pull request to Data Morph! Please familiarize yourself with the contributing guidelines, if you haven't already.

stefmolin · 2024-07-16T22:15:39Z

Thanks for the PR, @JCGoran! As I'm sure you've seen, I have a backlog to get through 😄 I hope to get to this in the next few weeks.

codecov · 2024-07-16T22:15:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.53%. Comparing base (e440ee7) to head (0a25272).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #198   +/-   ##
=======================================
  Coverage   98.53%   98.53%           
=======================================
  Files          58       58           
  Lines        1907     1915    +8     
  Branches      114      114           
=======================================
+ Hits         1879     1887    +8     
  Misses         25       25           
  Partials        3        3

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/data_morph/data/dataset.py | `74.07% <100.00%> (+0.65%)` | ⬆️ |
| src/data_morph/data/stats.py | `100.00% <100.00%> (ø)` | |
| src/data_morph/morpher.py | `100.00% <100.00%> (ø)` | |
| src/data_morph/plotting/static.py | `100.00% <100.00%> (ø)` | |
| tests/data/test_stats.py | `100.00% <100.00%> (ø)` | |
| tests/test_morpher.py | `100.00% <100.00%> (ø)` | |


stefmolin

Let's start by pulling the LineCollection changes into a separate PR.

src/data_morph/shapes/bases/line_collection.py

JCGoran · 2024-09-24T16:28:51Z

Bump, this is more or less ready for review as-is.

stefmolin · 2024-09-25T12:10:45Z

I haven't forgotten 😄 I'm going to work through the PyCon Taiwan sprint PRs first since I couldn't get to them all at the event, and I want to think more about the design of the internals here. I'm traveling right now and will have very limited time for the next couple of weeks.

stefmolin · 2024-07-22T22:10:22Z

src/data_morph/data/stats.py

-        A dataset with columns x and y.
+    x : Iterable[Number]
+        The ``x`` value of the dataset.
+


Suggested change

stefmolin · 2024-07-22T22:11:04Z

src/data_morph/morpher.py

+        x, y = (
+            start_shape.df['x'].to_numpy(copy=True),
+            start_shape.df['y'].to_numpy(copy=True),
+        )


Can't we use the _x and _y from the Dataset.__init__() changes?

Suggested change

-        x, y = (
-            start_shape.df['x'].to_numpy(copy=True),
-            start_shape.df['y'].to_numpy(copy=True),
-        )
+        x, y = start_shape._x, start_shape._y

I'm also wondering if we need to copy here, when we copy in the loop.
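To illustrate the copy question, here is a minimal NumPy-only sketch. It assumes the loop perturbs a fresh copy on every iteration and only ever rebinds the working name; `perturb_in_loop` is a hypothetical stand-in, not the real morphing loop.

```python
import numpy as np


def perturb_in_loop(start_x, iterations, seed=0):
    """Hypothetical loop: each iteration perturbs a fresh copy of the current array."""
    rng = np.random.default_rng(seed)
    x = start_x  # no upfront copy; just another name for the same array
    for _ in range(iterations):
        candidate = x.copy()  # the per-iteration copy mentioned in the review
        candidate[rng.integers(len(candidate))] += 0.1
        x = candidate  # rebinds the name; start_x is never written to
    return x


start_x = np.array([1.0, 2.0, 3.0])
result = perturb_in_loop(start_x, iterations=5)
print(start_x, result)
```

Under those assumptions the starting array is left untouched without an upfront copy. The one caveat in this sketch is the zero-iteration case, where the returned array is the very same object as the input.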

stefmolin · 2024-07-22T22:12:00Z

src/data_morph/data/dataset.py

+        self._x = self.df['x'].to_numpy()
+        self._y = self.df['y'].to_numpy()


Suggested change

-        self._x = self.df['x'].to_numpy()
-        self._y = self.df['y'].to_numpy()
+        self._x, self._y = self.df[['x', 'y']].to_numpy().T
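A small sketch comparing the two forms on a made-up DataFrame (the data here is an assumption for illustration). The values match, but converting both columns at once produces a single array with a common dtype, so an integer column next to a float column comes back as floats.

```python
import numpy as np
import pandas as pd

# Made-up data: integer x values next to float y values.
df = pd.DataFrame({'x': [1, 2, 3], 'y': [0.5, 1.5, 2.5]})

# Column-by-column conversion keeps each column's own dtype.
x_col, y_col = df['x'].to_numpy(), df['y'].to_numpy()

# Converting both columns together yields one (n, 2) array; transposing it
# gives a (2, n) array that unpacks into the two coordinate arrays.
x_t, y_t = df[['x', 'y']].to_numpy().T

print(x_col.dtype, x_t.dtype)
```

When both columns are already floats, as is likely for coordinate data, the two forms should yield the same values and dtype.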

stefmolin · 2024-07-22T22:12:47Z

src/data_morph/morpher.py

+        y1 : Iterable[Number]
+            The original value of ``y``.
+        x2 : Iterable[Number]
+            The perturbed  value of ``x``.


Extra space:

Suggested change

-            The perturbed  value of ``x``.
+            The perturbed value of ``x``.

stefmolin · 2025-02-09T21:24:26Z

src/data_morph/data/dataset.py

+        self._x = self.df['x'].to_numpy()
+        self._y = self.df['y'].to_numpy()


Should these be properties? If we change the DataFrame, these will no longer match.
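One way the property idea could look, as a rough sketch: `DatasetSketch` is a hypothetical, pared-down stand-in for `Dataset`, keeping only the `df` attribute and the `_x`/`_y` names from the diff above.

```python
import numpy as np
import pandas as pd


class DatasetSketch:
    """Hypothetical, pared-down stand-in for Dataset (not the real class)."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    @property
    def _x(self) -> np.ndarray:
        # Derived from the DataFrame on every access, so it cannot go stale.
        return self.df['x'].to_numpy()

    @property
    def _y(self) -> np.ndarray:
        return self.df['y'].to_numpy()


dataset = DatasetSketch(pd.DataFrame({'x': [1.0, 2.0], 'y': [3.0, 4.0]}))
dataset.df = dataset.df.assign(x=dataset.df['x'] * 10)
print(dataset._x)  # reflects the reassigned DataFrame
```

The trade-off is that each access repeats the conversion, so a hot loop would probably want to bind the arrays to local names once before iterating.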

stefmolin · 2025-02-09T21:30:52Z

src/data_morph/morpher.py

-                morphed_data = perturbed_data
+            if self._is_close_enough(x, y, *perturbed_data):
+                x, y = perturbed_data
+                morphed_data = pd.DataFrame({'x': x, 'y': y})


This isn't necessary in the loop with the switch to NumPy. We can have _record_frames() build the DataFrame only when we need to save the CSV, and plot() can be reworked to use NumPy. To return the DataFrame at the end of this method, we can build it once outside the loop instead of thousands of times inside it.
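A minimal sketch of that restructuring, under stated assumptions: `morph_sketch` and `is_close_enough` are hypothetical simplifications (the real acceptance check and perturbation logic are not shown in this thread), and frame recording and plotting are left out entirely.

```python
import numpy as np
import pandas as pd


def is_close_enough(x1, y1, x2, y2, tolerance=0.01):
    """Hypothetical acceptance check: the means stay within a tolerance."""
    return (
        abs(x1.mean() - x2.mean()) < tolerance
        and abs(y1.mean() - y2.mean()) < tolerance
    )


def morph_sketch(start_x, start_y, iterations, seed=0):
    """Keep the loop on NumPy arrays and build the DataFrame once at the end."""
    rng = np.random.default_rng(seed)
    # np.array copies by default, so the caller's data is left untouched.
    x, y = np.array(start_x, dtype=float), np.array(start_y, dtype=float)
    for _ in range(iterations):
        new_x, new_y = x.copy(), y.copy()
        idx = rng.integers(len(x))
        new_x[idx] += rng.normal(scale=0.1)
        new_y[idx] += rng.normal(scale=0.1)
        if is_close_enough(x, y, new_x, new_y):
            x, y = new_x, new_y  # plain arrays; no DataFrame built here
    return pd.DataFrame({'x': x, 'y': y})  # constructed exactly once


start_x, start_y = [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]
morphed = morph_sketch(start_x, start_y, iterations=1000)
print(morphed)
```

A CSV-writing step could follow the same pattern, constructing a DataFrame only for the frames that actually get saved.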

github-actions bot added the testing, shapes, data, plotting, and morpher labels Jul 15, 2024

github-actions bot reviewed Jul 15, 2024


stefmolin added this to the 0.3.0 milestone Jul 16, 2024

JCGoran mentioned this pull request Jul 21, 2024

Cache various statistics to improve performance #204 (Draft)

stefmolin reviewed Jul 22, 2024


JCGoran mentioned this pull request Jul 22, 2024

Optimize LineCollection #206 (Merged)

Replace some pandas operations with numpy

e708f6c

JCGoran force-pushed the jelic/feature/vectorize branch from 298870b to e708f6c Compare July 22, 2024 22:08

github-actions bot removed the shapes label Jul 22, 2024

JCGoran changed the title ~~Refactor to use more numpy functions internally~~ Replace many Pandas operations with NumPy Jul 22, 2024

JCGoran requested a review from stefmolin July 30, 2024 18:04

Merge branch 'main' into jelic/feature/vectorize

cd2dd8e

JCGoran and others added 2 commits November 10, 2024 16:58

Merge branch 'main' into jelic/feature/vectorize

34be08a

Merge branch 'main' into jelic/feature/vectorize

0a25272

stefmolin requested changes Feb 9, 2025


stefmolin removed this from the 0.3.0 milestone Feb 17, 2025

