pandas groupby is too slow for experiment size datasets

**Problem** 

`epf.py` uses pandas groupby operations on `epoch_id` and `time`, for instance in QC and in `center_eeg`.

The groupby operations are too slow for use on experiment-sized datasets and need to be replaced, probably with numpy operations.

**Solution**

TBD. Centering is just arithmetic on floats, so it only needs the underlying numpy arrays.

Maybe vectorize, something like this pseudocode for `center_eeg`:

- look up rows in each epoch in the centering interval

```
# parenthesize each comparison: & binds tighter than >= and <
idxs = np.where((epochs.time >= start) & (epochs.time < stop))[0]
```

- slice out the np array of shape (n_epochs * n_center_times, n_eeg_streams) for the centering interval

```
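# index rows of the underlying float array; epochs[idxs] on a DataFrame would select columns
center_data = epochs[eeg_streams].to_numpy()[idxs]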
```

- unstack/reshape the center_data from 2D (n_epochs * n_center_times, n_eeg_streams) to 3D (n_epochs, n_center_times, n_eeg_streams)
- compute the epoch mean across times (axis 1) = a 2D array of interval means (n_epochs, n_eeg_streams)
- np.repeat/tile/broadcast the interval means for each epoch by the number of times per epoch, back to the original dimensions (n_epochs * n_times, n_eeg_streams)

This gives a new 2D array (n_epochs * n_times, n_eeg_streams) in which every row of an epoch holds that epoch's centering-interval mean for each EEG stream.

```
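n_epochs = epochs.epoch_id.nunique()
n_times = len(epochs) // n_epochs  # assumes equal-length epochs, rows sorted by epoch then time
center_mns = np.repeat(center_data.reshape(n_epochs, -1, len(eeg_streams)).mean(axis=1), n_times, axis=0)
assert center_mns.shape == epochs[eeg_streams].shape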
```
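The reshape, mean, and repeat steps can be checked on a toy array small enough to verify by hand. This is only a sketch: the sizes, values, and time stamps are made up, and it assumes rows are stored epoch by epoch with the same time stamps in every epoch.

```python
import numpy as np

n_epochs, n_times, n_eeg_streams = 2, 3, 2

# Made-up data, rows ordered epoch by epoch (epoch-major)
data = np.array(
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],   # epoch 0, times -1, 0, 1
     [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]   # epoch 1, times -1, 0, 1
)
time = np.tile([-1, 0, 1], n_epochs)

# Rows inside the centering interval -1 <= time < 1
in_interval = (time >= -1) & (time < 1)
center_data = data[in_interval]               # shape (n_epochs * 2, n_eeg_streams)

# 2D -> 3D (n_epochs, n_center_times, n_eeg_streams), then mean over times
interval_mns = center_data.reshape(n_epochs, -1, n_eeg_streams).mean(axis=1)
# interval_mns is [[1.5, 15.0], [4.5, 45.0]]

# Repeat each epoch's row of means n_times times, back to data.shape
center_mns = np.repeat(interval_mns, n_times, axis=0)
assert center_mns.shape == data.shape

centered = data - center_mns
```

For this epoch-major row order `np.repeat(..., axis=0)` is the right choice. `np.tile(interval_mns, (n_times, 1))` would interleave the epochs (epoch 0, epoch 1, epoch 0, ...) and line the means up with the wrong rows.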

Centering the epochs by the mean of the centering interval is then a one-line subtraction:

```
epochs[eeg_streams] = epochs[eeg_streams] - center_mns
```
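Put together, the steps above might look like the following sketch. The function name `center_eeg_vectorized`, its signature, and the `ch1`/`ch2` columns are hypothetical and not taken from `epf.py`. It assumes `epoch_id` and `time` are ordinary columns, rows are sorted by epoch and then time, and every epoch has the same time stamps.

```python
import numpy as np
import pandas as pd


def center_eeg_vectorized(epochs, eeg_streams, start, stop):
    """Subtract each epoch's mean over start <= time < stop, without groupby.

    Hypothetical helper: assumes rows sorted by epoch_id then time and
    identical time stamps in every epoch.
    """
    n_epochs = epochs["epoch_id"].nunique()
    n_times = len(epochs) // n_epochs
    if n_epochs * n_times != len(epochs):
        raise ValueError("epochs do not all have the same number of rows")

    data = epochs[eeg_streams].to_numpy(dtype=float)
    in_interval = ((epochs["time"] >= start) & (epochs["time"] < stop)).to_numpy()

    # (n_epochs * n_center_times, k) -> (n_epochs, n_center_times, k) -> (n_epochs, k)
    interval_mns = data[in_interval].reshape(n_epochs, -1, len(eeg_streams)).mean(axis=1)
    center_mns = np.repeat(interval_mns, n_times, axis=0)

    centered = epochs.copy()
    centered[eeg_streams] = data - center_mns
    return centered


# Small synthetic example: 4 epochs, times -2..2, two made-up channels
rng = np.random.default_rng(0)
n_epochs, times = 4, np.arange(-2, 3)
epochs = pd.DataFrame({
    "epoch_id": np.repeat(np.arange(n_epochs), len(times)),
    "time": np.tile(times, n_epochs),
    "ch1": rng.normal(size=n_epochs * len(times)),
    "ch2": rng.normal(size=n_epochs * len(times)),
})
centered = center_eeg_vectorized(epochs, ["ch1", "ch2"], start=-2, stop=0)

# After centering, each epoch's mean over the centering interval should be ~0
check = centered[(centered["time"] >= -2) & (centered["time"] < 0)]
interval_means_after = check.groupby("epoch_id")[["ch1", "ch2"]].mean().to_numpy()
```

The reshape only works when every epoch contributes the same number of rows to the centering interval. Ragged epochs would need a different approach, which this sketch does not cover.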

Run `%%timeit` to see whether this helps; if not, find something that does.
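Outside a notebook, the same comparison can be run with the standard library `timeit` module. The sketch below is an assumption-laden stand-in: `center_per_epoch` is a hypothetical per-epoch loop, not the actual `epf.py` code, and the data are synthetic with made-up channel names. It first checks that both paths agree and then times them. No timings are quoted here because they depend on the machine and dataset size.

```python
import timeit

import numpy as np
import pandas as pd

EEG = ["ch1", "ch2"]  # hypothetical channel names


def center_per_epoch(epochs, start, stop):
    """Stand-in for the slow path: one pandas operation per epoch."""
    parts = []
    for _, epoch in epochs.groupby("epoch_id", sort=False):
        in_interval = (epoch["time"] >= start) & (epoch["time"] < stop)
        epoch = epoch.copy()
        epoch[EEG] = epoch[EEG] - epoch.loc[in_interval, EEG].mean()
        parts.append(epoch)
    return pd.concat(parts)


def center_numpy(epochs, start, stop):
    """Vectorized path; assumes equal-length epochs sorted by epoch_id, then time."""
    n_epochs = epochs["epoch_id"].nunique()
    data = epochs[EEG].to_numpy(dtype=float)
    in_interval = ((epochs["time"] >= start) & (epochs["time"] < stop)).to_numpy()
    mns = data[in_interval].reshape(n_epochs, -1, len(EEG)).mean(axis=1)
    out = epochs.copy()
    out[EEG] = data - np.repeat(mns, len(epochs) // n_epochs, axis=0)
    return out


# Synthetic data: 200 epochs x 100 time points
rng = np.random.default_rng(0)
n_epochs, times = 200, np.arange(-20, 80)
epochs = pd.DataFrame({
    "epoch_id": np.repeat(np.arange(n_epochs), len(times)),
    "time": np.tile(times, n_epochs),
    "ch1": rng.normal(size=n_epochs * len(times)),
    "ch2": rng.normal(size=n_epochs * len(times)),
})

# Both paths should give the same centered values before timing them
slow = center_per_epoch(epochs, -20, 0)
fast = center_numpy(epochs, -20, 0)
assert np.allclose(slow[EEG].to_numpy(), fast[EEG].to_numpy())

t_slow = timeit.timeit(lambda: center_per_epoch(epochs, -20, 0), number=3)
t_fast = timeit.timeit(lambda: center_numpy(epochs, -20, 0), number=3)
print(f"per-epoch loop: {t_slow:.3f} s, numpy: {t_fast:.3f} s")
```

Checking equivalence before timing matters here: a faster version is only useful if it reproduces the groupby results on real epochs, including any edge cases the synthetic data do not exercise.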




