Pandas dataframe used where numpy array is expected in ArrayUtils.band_df #3
Description
When ArrayUtils.band_df is called without a mask, the input array is sometimes not the numpy array the function expects. The result is a KeyError on Ellipsis, because pandas does not support the numpy-style ellipsis indexing that the function uses:
[ ... , bn ]
In the Lyzenga method guide you have:
x_train, x_test, y_train, y_test = train_test_split( df[imrds.band_names],df.depth,train_size=20000,random_state=5)
which returns pandas objects rather than numpy arrays: DataFrames for x_train and x_test, and Series for y_train and y_test.
Further along, x_train is passed to ArrayUtils.band_df while still a pandas DataFrame:
traindf = ArrayUtils.band_df( x_train )
Even though the function expects:
imarr : np.array or np.ma.MaskedArray
This probably went unnoticed because ArrayUtils.band_df was always tested with a mask. In that case ArrayUtils.equalize_band_masks runs first; it has no indexing problems and returns:
tuple of N np.ma.MaskedArray
The error you get is:
KeyError: (Ellipsis, 0)
because you cannot subset a pandas DataFrame with [ ... , ]
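The indexing difference can be reproduced without the library at all. The sketch below uses a made-up two-band DataFrame; the column names `b1` and `b2` are placeholders and do not come from the guide.

```python
import pandas as pd

# Hypothetical two-band table; the column names are placeholders.
df = pd.DataFrame({"b1": [0.1, 0.2, 0.3], "b2": [0.4, 0.5, 0.6]})

# numpy understands ellipsis indexing: take band 0 along the last axis.
arr = df.to_numpy()
band0 = arr[..., 0]

# pandas treats the tuple (Ellipsis, 0) as a column label and cannot find it.
try:
    df[..., 0]
    raised = False
except KeyError:
    raised = True
```

Here `band0` holds the first column as a numpy array, while the same subscript on the DataFrame ends in the `except` branch.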
My quick fix was to do:
x_train = x_train.as_matrix()
x_test = x_test.as_matrix()
y_train = y_train.as_matrix()
y_test = y_test.as_matrix()
which seems to work, but I don't know if it will cause issues later on.
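For anyone hitting this on a newer pandas: `as_matrix()` was deprecated in pandas 0.23 and removed in 1.0, and `to_numpy()` (available since 0.24) is the replacement. A possible version of the same workaround is sketched below. The helper name `as_ndarray` is hypothetical and not part of the library, and the example data is made up.

```python
import numpy as np
import pandas as pd


def as_ndarray(data):
    """Convert pandas input to a numpy array; pass anything else through."""
    if isinstance(data, (pd.DataFrame, pd.Series)):
        return data.to_numpy()
    # Already array-like: returned untouched so a MaskedArray keeps its mask.
    return data


# Made-up stand-in for the guide's dataframe (placeholder column names).
df = pd.DataFrame({"b1": [1.0, 2.0], "b2": [3.0, 4.0], "depth": [5.0, 6.0]})
x = as_ndarray(df[["b1", "b2"]])  # 2-D array, one column per band
y = as_ndarray(df["depth"])       # 1-D array of depths

# Ellipsis indexing now works on the converted input.
first_band = x[..., 0]
```

Non-pandas input is returned as-is rather than passed through `np.asarray`, because `np.asarray` would silently drop the mask of an `np.ma.MaskedArray`.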