[summary] df may get duplicated rows when using time_index=raw #426

@asnyv

With time_index=raw, the resample_smry_dates method is still called to support functionality like start_date, end_date and normalize, and its output is then used as input when fetching the summary data through ecl. If one or more time steps are shorter than the resolution of the TIME vector, the TIME vector ends up with non-unique values. The data is then fetched only from the first of the time steps sharing a TIME value, which gives duplicated rows instead of the data for each subsequent time step. A user observed this after #412: the first of the non-unique time steps typically has a considerably longer TIMESTEP than the subsequent ones, so the TIMESTEP correction may not be robust, since it can jump further than the unique TIME value.
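The effect described above can be reproduced with plain pandas. This is a minimal sketch under stated assumptions, not the library's actual code path: the `raw` frame, the `FOPT` values and the `lookup_by_date` helper are made up for illustration, and the helper only mimics "fetch the first time step that matches each requested date".

```python
import pandas as pd

# Hypothetical raw summary data: the last two time steps are closer together
# than the resolution of the stored time value, so they share a timestamp.
raw = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-02"]),
        "FOPT": [0.0, 100.0, 101.0],
    }
)


def lookup_by_date(frame, dates):
    """Return, for each requested date, the first row that has that date.

    Illustrative stand-in for a date-based fetch; not an ecl function.
    """
    first_per_date = frame.drop_duplicates(subset="DATE", keep="first")
    first_per_date = first_per_date.set_index("DATE")
    # Repeated dates in `dates` select the same (first) row repeatedly.
    return first_per_date.loc[dates].reset_index()


# Requesting every raw date returns the first matching row twice,
# so the value 101.0 from the last time step is lost.
by_date = lookup_by_date(raw, list(raw["DATE"]))
print(by_date)
```

In this sketch `raw` has no duplicated rows, while `by_date` contains the row for 2020-01-02 twice with the same `FOPT` value.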

For time_index=None this is not an issue, as resample_smry_dates is not called (though that might mean that not all arguments, like start_date, actually work for None?). It seems we have to request the data from ecl for raw the way we currently do for None, to make sure we actually get the raw data. If we have to support start_date, end_date and normalize, we likely have to fetch the data from ecl twice when they are defined: once for raw (as currently done for None), and once for start_date, end_date and normalize. The possible TIMESTEP correction (#412) would then be applied before cutting/merging the two DataFrames. If so, this should probably be the behavior for both raw and None.
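The "cutting" part of this proposal could look roughly like the sketch below. It assumes the raw data has already been fetched without any date resampling and carries a `DATE` column; `cut_raw_frame` is a hypothetical helper, not an existing function. The second fetch at the boundary dates, normalize, and the merge of the two frames are deliberately left out.

```python
import pandas as pd


def cut_raw_frame(raw, start_date=None, end_date=None):
    """Keep only raw rows with start_date <= DATE <= end_date.

    Rows are filtered, never looked up by date, so time steps that share
    a DATE value are all preserved. Hypothetical helper for illustration.
    """
    mask = pd.Series(True, index=raw.index)
    if start_date is not None:
        mask &= raw["DATE"] >= pd.Timestamp(start_date)
    if end_date is not None:
        mask &= raw["DATE"] <= pd.Timestamp(end_date)
    return raw[mask].reset_index(drop=True)


# Made-up raw data with two time steps sharing the date 2020-01-02.
raw_steps = pd.DataFrame(
    {
        "DATE": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]
        ),
        "FOPT": [0.0, 100.0, 101.0, 150.0],
    }
)

cut = cut_raw_frame(raw_steps, start_date="2020-01-02", end_date="2020-01-02")
print(cut)
```

Filtering instead of re-fetching by date means both time steps on 2020-01-02 survive the cut with their own values.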

Labels: bug
