Pandas DataFrame used where a NumPy array is expected in ArrayUtils.band_df #3

@CyanBC


When no mask is passed to ArrayUtils.band_df, there are cases where the input array is not the NumPy array the function expects. The result is an error on an index that contains an Ellipsis, because pandas and NumPy interpret that kind of indexing differently. The indexing in question has the form:

`[..., bn]`
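For reference, this is what `[..., bn]` does on a NumPy array: `...` stands for "all leading axes", so the expression picks band `bn` from the last axis. The sketch below assumes, as the traceback suggests, that band_df stores bands along the last axis; the arrays are made-up stand-ins, not data from the guide.

```python
import numpy as np

# Made-up stand-in for a band array: 4 pixels x 3 bands.
imarr = np.arange(12, dtype=float).reshape(4, 3)

# `...` expands to every leading axis, so this selects band 0 for every pixel.
band0 = imarr[..., 0]
print(band0)  # [0. 3. 6. 9.]

# The same expression works on a 3-D (rows, cols, bands) image.
img = np.arange(24).reshape(2, 4, 3)
print(img[..., 0].shape)  # (2, 4)

# It also works on masked arrays; the per-element mask is carried along.
masked = np.ma.masked_less(imarr, 2)
band0_masked = masked[..., 0]
```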

In the Lyzenga method guide you have:
`x_train, x_test, y_train, y_test = train_test_split(df[imrds.band_names], df.depth, train_size=20000, random_state=5)`

which returns pandas objects rather than NumPy arrays: DataFrames for x_train and x_test, and Series for y_train and y_test (because df.depth is a Series).

Further along, these are passed to ArrayUtils.band_df as pandas DataFrames:
`traindf = ArrayUtils.band_df(x_train)`

However, the function's docstring expects:
`imarr : np.array or np.ma.MaskedArray`

This probably went unnoticed because the tests of ArrayUtils.band_df always used a mask, which sends the input through ArrayUtils.equalize_band_masks. That function does not have the indexing problem and returns:
`tuple of N np.ma.MaskedArray`

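The error you get is:
`KeyError: (Ellipsis, 0)`
because a pandas DataFrame cannot be subset with `[..., n]`: `DataFrame.__getitem__` treats the whole tuple `(Ellipsis, 0)` as a single column label, finds no such column, and raises a KeyError.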
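A minimal reproduction of the difference, using a small made-up DataFrame in place of `df[imrds.band_names]` (the column names `b1` to `b3` are hypothetical). The same key works on the NumPy array but fails on the DataFrame; `.iloc` is the pandas way to select by position.

```python
import numpy as np
import pandas as pd

# Made-up 3-band training table standing in for df[imrds.band_names].
x_train = pd.DataFrame({"b1": [0.1, 0.2], "b2": [0.3, 0.4], "b3": [0.5, 0.6]})

# NumPy reads (Ellipsis, 0) as "all leading axes, column 0".
first_band = x_train.to_numpy()[..., 0]
print(first_band)  # [0.1 0.2]

# DataFrame.__getitem__ looks the tuple up as one column label instead.
try:
    x_train[..., 0]
    caught = None
except KeyError as err:
    caught = err
    print(repr(err))

# Positional column selection in pandas goes through .iloc.
print(x_train.iloc[:, 0].tolist())  # [0.1, 0.2]
```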

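My quick fix was to convert them to NumPy arrays with `.as_matrix()`. That method has since been removed from pandas (in 1.0); the current equivalent is `.to_numpy()`:

    x_train = x_train.to_numpy()
    x_test = x_test.to_numpy()
    y_train = y_train.to_numpy()
    y_test = y_test.to_numpy()

This seems to work, but I don't know whether it will cause problems later on.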
