Get within time range over segments #525

@LawrenceBorst

It would be ideal to have a function like am.get_within_time_range(segments: pd.DataFrame, **kwargs) that applies the time-range filter directly to a DataFrame of segments, instead of requiring something like am.annotations as input.

I've had the need for such a function on a few occasions.

Below is a function I wrote a while back for some quick experimentation. It seems to behave slightly differently from the ChildProject (CP) one: for one, I don't think the CP version actually cuts segments at the edges. Of course, we could parametrize that.

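from typing import Callable

import pandas as pd
from ChildProject.utils import TimeInterval  # assumed import path


def _get_within_time_range(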
    segments: pd.DataFrame, interval: TimeInterval
) -> pd.DataFrame:
    """
    The ChildProject function doesn't always seem to be correct(?). Besides, it clips segments at the edges,
    which I don't want for the purpose of this script.
    This function is also significantly faster; profiling made it clear that
    the ChildProject routine was the main performance bottleneck in running this script.
    """
    segments = segments[
        (
            segments["offset_time"].map(lambda t: t.to_pydatetime().time())
            >= interval.start.time()
        )
        & (
            segments["onset_time"].map(lambda t: t.to_pydatetime().time())
            <= interval.stop.time()
        )
    ]

    segments = segments.apply(get_row_callback_min(interval), axis=1)
    segments = segments.apply(get_row_callback_max(interval), axis=1)

    return segments


def get_row_callback_min(
    time_interval: TimeInterval,
) -> Callable[[pd.Series], pd.Series]:
    def row_callback(row: pd.Series) -> pd.Series:
        onset_time: pd.Timestamp = row["onset_time"]

        if onset_time.to_pydatetime().time() <= time_interval.start.time():
            row["onset_time"] = pd.Timestamp(
                year=onset_time.year,
                month=onset_time.month,
                day=onset_time.day,
                hour=time_interval.start.hour,
                minute=time_interval.start.minute,
                second=time_interval.start.second,
            )

        return row

    return row_callback


def get_row_callback_max(
    time_interval: TimeInterval,
) -> Callable[[pd.Series], pd.Series]:
    def row_callback(row: pd.Series) -> pd.Series:
        offset_time: pd.Timestamp = row["offset_time"]

        if offset_time.to_pydatetime().time() >= time_interval.stop.time():
            row["offset_time"] = pd.Timestamp(
                year=offset_time.year,
                month=offset_time.month,
                day=offset_time.day,
                hour=time_interval.stop.hour,
                minute=time_interval.stop.minute,
                second=time_interval.stop.second,
            )

        return row

    return row_callback
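
To make the "parametrize that" idea concrete, here is a rough vectorized sketch of what a segments-based version could look like. It is not the ChildProject implementation: the standalone function name, the `start`/`stop` arguments (plain `datetime.time` values instead of a `TimeInterval`) and the `clip` flag are all assumptions for illustration. It also assumes that `onset_time` and `offset_time` are datetime64 columns and that `start <= stop`, so intervals crossing midnight are not handled.

```python
from datetime import time

import pandas as pd


def _as_timedelta(t: time) -> pd.Timedelta:
    """Clock time expressed as an offset from midnight."""
    return pd.Timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)


def get_within_time_range(
    segments: pd.DataFrame, start: time, stop: time, clip: bool = True
) -> pd.DataFrame:
    """Hypothetical sketch: keep segments overlapping [start, stop] by clock time.

    If `clip` is True, onsets and offsets are cut at the interval edges;
    otherwise overlapping segments are returned unchanged.
    """
    start_td, stop_td = _as_timedelta(start), _as_timedelta(stop)

    onset = segments["onset_time"]
    offset = segments["offset_time"]

    # Midnight of each timestamp's own calendar day.
    onset_day = onset.dt.normalize()
    offset_day = offset.dt.normalize()

    # Same filter as the row-wise version: the segment must end at or after
    # `start` and begin at or before `stop` (compared by time of day).
    keep = (offset >= offset_day + start_td) & (onset <= onset_day + stop_td)
    result = segments.loc[keep].copy()

    if clip:
        lower = onset_day[keep] + start_td
        upper = offset_day[keep] + stop_td
        # Move onsets earlier than `start` up to it, and offsets later than
        # `stop` down to it, without any per-row Python callbacks.
        result["onset_time"] = result["onset_time"].where(
            result["onset_time"] >= lower, lower
        )
        result["offset_time"] = result["offset_time"].where(
            result["offset_time"] <= upper, upper
        )

    return result


# Usage with made-up data: three segments, interval 09:00 to 10:00.
segments = pd.DataFrame(
    {
        "onset_time": pd.to_datetime(
            ["2020-01-01 08:30:00", "2020-01-01 09:30:00", "2020-01-01 12:00:00"]
        ),
        "offset_time": pd.to_datetime(
            ["2020-01-01 09:15:00", "2020-01-01 10:30:00", "2020-01-01 12:30:00"]
        ),
    }
)
clipped = get_within_time_range(segments, time(9, 0), time(10, 0))
unclipped = get_within_time_range(segments, time(9, 0), time(10, 0), clip=False)
```

Working on whole columns avoids the two `apply(..., axis=1)` passes, which is where most of the per-row overhead in the version above would come from, and the `clip` flag lets callers choose between the two edge behaviours discussed here.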
