Description
Would it be feasible to use a tool like pandas to check the input sizes and shapes before launching any punpy computations?
Punpy allows a lot of flexibility in the forms, shapes and dimensions of its inputs and outputs, but it is somewhat difficult to figure out the exact input and output shapes that are expected. This is especially true when dealing with the default numpy parallelisation, which adds an extra dimension to the input shape.
A tool like pandas implements an index feature, which a format checker could use to decide whether the shapes and dimensions of the samples, inputs, outputs, uncertainties and covariances are coherent. It could also make it possible to determine the output shape before launching the propagation.
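As a rough illustration of such a checker, here is a minimal sketch that uses only pandas and numpy. The function name `check_coherence` and the layout (one column per parameter, one row per measurement, covariance labelled by parameter name on both axes) are assumptions made for the example, not part of punpy.

```python
import numpy as np
import pandas as pd


def check_coherence(inputs, uncertainties, covariance=None):
    """Hypothetical checker: return a list of problems (empty if coherent)."""
    problems = []

    # Parameter names are compared as sets, so column order is not an error.
    if set(inputs.columns) != set(uncertainties.columns):
        diff = sorted(set(inputs.columns) ^ set(uncertainties.columns))
        problems.append(f"parameter names differ: {diff}")

    # Inputs and uncertainties should describe the same measurements (rows).
    if not inputs.index.equals(uncertainties.index):
        problems.append(
            f"row index differs: inputs {inputs.shape}, "
            f"uncertainties {uncertainties.shape}"
        )

    if covariance is not None:
        n = inputs.shape[1]
        if covariance.shape != (n, n):
            problems.append(
                f"covariance shape {covariance.shape}, expected {(n, n)}"
            )
        elif (set(covariance.index) != set(inputs.columns)
              or set(covariance.columns) != set(inputs.columns)):
            problems.append("covariance labels do not match parameter names")

    return problems


x = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
u = pd.DataFrame({"b": [0.1, 0.1, 0.1], "a": [0.2, 0.2, 0.2]})

ok = check_coherence(x, u)                  # same names, same rows
short = check_coherence(x, u.iloc[:2])      # one row missing
bad_cov = check_coherence(x, u, pd.DataFrame(np.eye(3)))  # 3x3 for 2 params
print(ok, short, bad_cov, sep="\n")
```

Returning a list of messages rather than raising on the first mismatch lets the user see every inconsistency in one pass.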
Furthermore, the indexing feature would make it possible to reorder the parameters automatically, by making sure the column names match between, for instance, the uncertainties and the inputs, so that two dimensions or columns of the input cannot be confused. These are admittedly silly mistakes, but debugging them can be very tedious, and I believe I might not be the only one facing this problem.
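The automatic reordering could look something like the sketch below, which relies on `DataFrame.reindex` to put the uncertainty columns in the same order as the input columns. The helper name `align_to_inputs` is hypothetical, and the one-column-per-parameter layout is an assumption.

```python
import pandas as pd


def align_to_inputs(inputs, uncertainties):
    """Hypothetical helper: reorder uncertainty columns to match the inputs."""
    missing = inputs.columns.difference(uncertainties.columns)
    extra = uncertainties.columns.difference(inputs.columns)
    if len(missing) or len(extra):
        # Refuse to guess: a silent reindex would fill missing columns with NaN.
        raise ValueError(
            f"missing uncertainties for {list(missing)}, "
            f"unexpected uncertainties for {list(extra)}"
        )
    return uncertainties.reindex(columns=inputs.columns)


x = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 5.0]})
u = pd.DataFrame({"b": [0.1, 0.1], "a": [0.2, 0.2]})  # columns in another order

aligned = align_to_inputs(x, u)
print(list(aligned.columns))
```

The explicit name check matters because `reindex` on its own would quietly introduce NaN columns for any parameter that has no uncertainty.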
I am thinking of pandas because I believe it is one of the most widespread data preprocessing libraries in the Python scientific community that punpy targets. Other options such as xarray exist, but they seem more complex to use. This is only an opinion, and I would not mind using another library if it is more suitable.
I would suggest developing such a feature as a separate class, function or module from MCPropagation and LPUPropagation, decoupling the format checking from the core computations performed by these two classes. Format checking would then be a completely optional step, and users who simply want to pass basic numpy arrays to the propagation methods could still do so.
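To illustrate the decoupling, here is a minimal sketch of a standalone preparation step that sits in front of the propagation and never touches it. The function name `prepare` is hypothetical, and the array layout the propagation classes actually expect is not assumed here: labelled data is checked and converted to plain numpy arrays, while plain arrays pass through untouched.

```python
import numpy as np
import pandas as pd


def prepare(inputs, uncertainties):
    """Hypothetical optional step: check labelled data, return plain arrays."""
    if isinstance(inputs, pd.DataFrame) and isinstance(uncertainties, pd.DataFrame):
        if set(inputs.columns) != set(uncertainties.columns):
            raise ValueError("parameter names differ between inputs and uncertainties")
        # Put the uncertainty columns in the same order as the input columns.
        uncertainties = uncertainties[inputs.columns]
        if not inputs.index.equals(uncertainties.index):
            raise ValueError("inputs and uncertainties do not share the same rows")
        return inputs.to_numpy(), uncertainties.to_numpy()
    # Plain arrays carry no labels, so there is nothing to check by name.
    return np.asarray(inputs), np.asarray(uncertainties)


x_df = pd.DataFrame({"a": [1.0, 2.0], "b": [4.0, 5.0]})
u_df = pd.DataFrame({"b": [0.1, 0.1], "a": [0.2, 0.2]})
x_arr, u_arr = prepare(x_df, u_df)      # checked, reordered, converted

raw = np.ones((2, 2))
raw_x, raw_u = prepare(raw, raw)        # passed through unchanged
print(u_arr)
```

Because the step only returns numpy arrays, the propagation code would not need to know whether the check was run at all.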