Somewhere between a fully fledged neural-net library and the existing convolution functions in pal/image, would there be any elements that are a good fit for the Pal library?
Imagine the following function:
3D x 3D -> 2D convolution, with bias, clamped output (supply a minimum, e.g. 0, -1, or -FLT_MAX for no effect) for 'ReLU', and optional max-pooling (N=1,2,3...; N=1 for no pooling) to reduce the output image size.
This would be a big chunk of the basic layer evaluation in deep-learning image-recognition algorithms. You'd invoke multiple 3D x 3D -> 2D convolutions (one per filter) for a 3D result.
It would be important to include the ReLU & max-pooling in the same function, since fusing them avoids significant memory traffic: the unclamped, unpooled intermediate image never has to be written out and read back. You could provide a helper function for a 3D x 3D -> 2D convolution without those steps that just calls it with (min=-FLT_MAX, pooling=1) ... or have an outright separate function if needed.
An input could be (width x height x channels) - image-planes - or a true 3D image (volume data).
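A rough reference sketch of the fused operation, to make the idea concrete. This is not PAL code: the name `conv3d_to_2d_ref`, the planar layout (`src[c][y][x]`, `filter[c][fy][fx]`), the "valid" (no padding) output size and the non-overlapping pooling windows are all assumptions made for illustration. It computes a cross-correlation (no kernel flip), as neural-net layers usually do.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: one un-biased 3D x 3D dot product with the filter's
 * top-left corner at (y, x). Planar layout: src[c][y][x], filter[c][fy][fx]. */
static float conv3d_at(const float *src, const float *filter,
                       int rows, int cols, int channels,
                       int frows, int fcols, int y, int x)
{
    float acc = 0.0f;
    for (int c = 0; c < channels; c++)
        for (int fy = 0; fy < frows; fy++)
            for (int fx = 0; fx < fcols; fx++)
                acc += src[((size_t)c * rows + (y + fy)) * cols + (x + fx)]
                     * filter[((size_t)c * frows + fy) * fcols + fx];
    return acc;
}

/* Hypothetical reference routine: 3D x 3D -> 2D convolution with bias,
 * lower clamp (min_output = 0 gives ReLU, -FLT_MAX disables it) and
 * pool x pool max-pooling (pool = 1 disables it).
 * dst holds ((rows-frows+1)/pool) x ((cols-fcols+1)/pool) floats. */
void conv3d_to_2d_ref(const float *src, const float *filter, float *dst,
                      int rows, int cols, int channels,
                      int frows, int fcols,
                      float bias, float min_output, int pool)
{
    assert(pool >= 1 && frows <= rows && fcols <= cols);
    int out_rows = (rows - frows + 1) / pool;
    int out_cols = (cols - fcols + 1) / pool;

    for (int oy = 0; oy < out_rows; oy++) {
        for (int ox = 0; ox < out_cols; ox++) {
            float best = -FLT_MAX;
            /* Clamp and pool in registers; nothing intermediate is stored. */
            for (int py = 0; py < pool; py++) {
                for (int px = 0; px < pool; px++) {
                    float v = conv3d_at(src, filter, rows, cols, channels,
                                        frows, fcols,
                                        oy * pool + py, ox * pool + px) + bias;
                    if (v < min_output) v = min_output;
                    if (v > best) best = v;
                }
            }
            dst[(size_t)oy * out_cols + ox] = best;
        }
    }
}
```

Calling it once per filter, each call writing its own output plane, would give the 3D result for a whole layer.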
Then imagine functions for training such a thing (backpropagation and accumulating error deltas through the weights).
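For the training side, a minimal sketch of what one backward step could look like. Again this is purely illustrative: the name `conv3d_to_2d_backward_ref` and its arguments are assumptions, the layout is planar (`src[c][y][x]`, `filter[c][fy][fx]`), pooling is left out (pool = 1), and the clamp is treated like ReLU, so no gradient flows through outputs that sit at `min_output`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine: backward pass for an un-pooled, "valid"
 * 3D x 3D -> 2D cross-correlation with bias and lower clamp.
 *   out      forward output, (rows-frows+1) x (cols-fcols+1)
 *   d_out    error delta arriving at each output
 *   d_src    accumulated delta for each input value   (same shape as src)
 *   d_filter accumulated delta for each filter weight (same shape as filter)
 *   d_bias   accumulated delta for the bias
 * All d_* buffers are accumulated into (+=), so zero them before the first call. */
void conv3d_to_2d_backward_ref(const float *src, const float *filter,
                               const float *out, const float *d_out,
                               float *d_src, float *d_filter, float *d_bias,
                               int rows, int cols, int channels,
                               int frows, int fcols, float min_output)
{
    assert(frows <= rows && fcols <= cols);
    int out_rows = rows - frows + 1;
    int out_cols = cols - fcols + 1;

    for (int y = 0; y < out_rows; y++) {
        for (int x = 0; x < out_cols; x++) {
            size_t o = (size_t)y * out_cols + x;
            if (out[o] <= min_output)
                continue;               /* output was clamped: gradient is zero */
            float g = d_out[o];
            *d_bias += g;
            for (int c = 0; c < channels; c++) {
                for (int fy = 0; fy < frows; fy++) {
                    for (int fx = 0; fx < fcols; fx++) {
                        size_t s = ((size_t)c * rows + (y + fy)) * cols + (x + fx);
                        size_t f = ((size_t)c * frows + fy) * fcols + fx;
                        d_filter[f] += g * src[s];    /* dL/dw = delta * input  */
                        d_src[s]    += g * filter[f]; /* dL/dx = delta * weight */
                    }
                }
            }
        }
    }
}
```

The access pattern is the same as in the forward pass (every filter weight is touched once per output position), so the same data-reuse argument should apply.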
I think a single step like that would go a long way to leveraging the Epiphany hardware; you'd have a lot of data reuse, perhaps uploading an entire 3D filter across multiple cores, then streaming an image through it.
This would be a stepping stone to a full neural-net library which could implement pipelines between net layers. Getting some capability in the Pal library might make the Epiphany chip more appealing to neural-net/deep-learning researchers.
Short of that, are there other ways to generalize 2D convolutions to be more useful?
E.g. if the 3rd dimension was interleaved (e.g. [row0 [r0,g0,b0,r1,g1,b1...] row1 [r0,g0,b0,r1,g1,b1...] ...]), could you treat it as a 2D convolution with strided input (then merely adding 'col_step', 'row_step' parameters, e.g. col_step=3 for r,g,b input)? This would still require inserting a clamping & max-pool stage into your 2D convolution, and again, if you're worried about parameter explosion, a simple helper could provide a streamlined interface. Thresholding/clamping is fairly common in image processing, I think (e.g. extracting certain edges from an image, blurring highlights, keeping results in an output range for bit-reduction, etc.).
Stepped inputs/outputs would allow using this function for filtered image downscaling, or perhaps colour-space conversions.
/* 2D convolution, extended */
void p_conv2d_ex_f32(const float *src_image, const float *filter, float *output,
    int rows, int cols,
    int row_stride,       // distance in memory between input rows (= cols for a full image; cols may be less than row_stride to apply to sub-tiles)
    int mrows, int mcols, // filter size
    float bias,           // added to all values prior to output
    float min_output,     // = -FLT_MAX for no effect, = 0.0 to only emit values > 0, etc.
    float max_output,     // = FLT_MAX for no effect
    int column_step, int row_step, int output_step, // = 1 by default; = 3 for interleaved RGB and filter, etc.
    int max_pooling       // = 1 for no reduction, = 2 to pool 2x2 squares to one output, etc.
); // can also implement 3Dx3D->2D convolution by setting 'column_step' to depth, with interleaved
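To pin down one possible reading of those parameters, here is a small reference sketch with the same argument order as the proposed prototype. It is deliberately named `conv2d_ex_ref` because it is not a PAL function, and several semantics are guesses the proposal leaves open: `column_step`/`row_step` are taken as the distance (in floats / in rows) the filter window moves between adjacent outputs, `output_step` as the distance in floats between written outputs, pooling windows as non-overlapping, and the operation as a "valid" cross-correlation (no kernel flip, no padding).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical reference routine following the proposed prototype.
 * Before pooling there are ((rows-mrows)/row_step + 1) rows and
 * ((cols-mcols)/column_step + 1) columns; max_pooling divides both. */
void conv2d_ex_ref(const float *src_image, const float *filter, float *output,
                   int rows, int cols, int row_stride,
                   int mrows, int mcols,
                   float bias, float min_output, float max_output,
                   int column_step, int row_step, int output_step,
                   int max_pooling)
{
    assert(mrows <= rows && mcols <= cols && cols <= row_stride);
    assert(column_step >= 1 && row_step >= 1);
    assert(output_step >= 1 && max_pooling >= 1);

    int conv_rows = (rows - mrows) / row_step + 1;
    int conv_cols = (cols - mcols) / column_step + 1;
    int out_rows = conv_rows / max_pooling;
    int out_cols = conv_cols / max_pooling;

    for (int oy = 0; oy < out_rows; oy++) {
        for (int ox = 0; ox < out_cols; ox++) {
            float best = -FLT_MAX;
            for (int py = 0; py < max_pooling; py++) {
                for (int px = 0; px < max_pooling; px++) {
                    /* Top-left corner of this filter window in the input. */
                    int y0 = (oy * max_pooling + py) * row_step;
                    int x0 = (ox * max_pooling + px) * column_step;
                    float acc = bias;
                    for (int i = 0; i < mrows; i++)
                        for (int j = 0; j < mcols; j++)
                            acc += src_image[(size_t)(y0 + i) * row_stride + x0 + j]
                                 * filter[i * mcols + j];
                    if (acc < min_output) acc = min_output;
                    if (acc > max_output) acc = max_output;
                    if (acc > best) best = acc;
                }
            }
            output[((size_t)oy * out_cols + ox) * output_step] = best;
        }
    }
}
```

Under this reading, a 2x2-pixel interleaved RGB image is passed as rows=2, cols=6; a 1x3 filter {0.25, 0.5, 0.25} with column_step=3 then acts as a colour-to-grey conversion, and a filter spanning mcols = 3 * filter_width floats gives the 3D x 3D -> 2D case. On a single-channel image, column_step=2 and row_step=2 with a small blur filter would give filtered downscaling.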
(Also a minor point: I would have thought it more logical to order 'cols, rows', as per width/height, for images stored in memory; you can still label them rows/cols for people thinking about it as a matrix in the linear-algebra sense.)