Suggestion: Convolutional-neural-nets?

Somewhere between a fully fledged neural-net library and the existing convolution functions in pal/image, would there be any elements that are a good fit for the Pal library?

Imagine the following function:

A 3D x 3D -> 2D convolution with bias, a clamped output (supply a minimum, e.g. 0 or -1, or -FLT_MAX for no effect) to implement 'ReLU', and optional max-pooling (N = 1, 2, 3, ...; N = 1 for no pooling) to reduce the output image size.

This would be a big chunk of the basic layer evaluation in deep-learning image-recognition algorithms. You'd invoke multiple 3D x 3D -> 2D convolutions, one per filter, to build up a 3D result.

It would be important to include the ReLU and max-pooling steps in the same function, since fusing them avoids significant memory traffic. You could provide a helper for a plain 3D x 3D -> 2D convolution without those steps that just calls it with (min=-FLT_MAX, pooling=1), or have an outright separate function if needed.
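To make the fused step concrete, here is a minimal scalar reference sketch. The name `conv3d_to_2d_relu_pool_f32`, the planar (depth x rows x cols) layout, and the 'valid' (no padding) border handling are all assumptions for illustration, not existing PAL API. Like most CNN code, it computes a correlation (the filter is not flipped).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical reference routine, not part of PAL.
 * src:    depth planes of rows x cols floats (planar layout, assumed)
 * filter: depth planes of frows x fcols floats
 * dst:    ((rows - frows + 1) / pool) x ((cols - fcols + 1) / pool) floats
 * For each output: sum the 'valid' correlation over all depth planes, add
 * bias, clamp from below at min_out, then keep the maximum of each
 * pool x pool tile. The intermediate image is never written to memory. */
static void conv3d_to_2d_relu_pool_f32(const float *src, const float *filter,
                                       float *dst, int rows, int cols, int depth,
                                       int frows, int fcols, float bias,
                                       float min_out, int pool)
{
    assert(pool >= 1 && frows <= rows && fcols <= cols);
    int out_rows = (rows - frows + 1) / pool;
    int out_cols = (cols - fcols + 1) / pool;

    for (int orow = 0; orow < out_rows; orow++) {
        for (int ocol = 0; ocol < out_cols; ocol++) {
            float best = -FLT_MAX;
            for (int pr = 0; pr < pool; pr++) {
                for (int pc = 0; pc < pool; pc++) {
                    int r0 = orow * pool + pr;   /* window origin in src */
                    int c0 = ocol * pool + pc;
                    float acc = bias;
                    for (int d = 0; d < depth; d++)
                        for (int fr = 0; fr < frows; fr++)
                            for (int fc = 0; fc < fcols; fc++)
                                acc += src[((size_t)d * rows + r0 + fr) * cols + c0 + fc]
                                     * filter[((size_t)d * frows + fr) * fcols + fc];
                    if (acc < min_out) acc = min_out;   /* min_out = 0 gives ReLU */
                    if (acc > best) best = acc;         /* max-pooling */
                }
            }
            dst[(size_t)orow * out_cols + ocol] = best;
        }
    }
}
```

Calling it with `min_out = -FLT_MAX` and `pool = 1` gives the plain convolution, so the "helper" variant would be a one-line wrapper.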

An input could be a stack of image planes (width x height x channels) or a true 3D image (volume data).
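The two obvious memory layouts for such an input differ only in how an element is addressed. A small sketch (both helper names are hypothetical, just to pin down the indexing):

```c
#include <stddef.h>

/* Planar layout: each channel (or depth slice) is a contiguous
 * rows x cols plane, and the planes follow one another in memory. */
static size_t planar_index(int ch, int row, int col, int rows, int cols)
{
    return ((size_t)ch * rows + row) * cols + col;
}

/* Interleaved layout: all channels of one pixel are adjacent,
 * e.g. r0,g0,b0,r1,g1,b1,... along each row. */
static size_t interleaved_index(int ch, int row, int col, int cols, int channels)
{
    return ((size_t)row * cols + col) * channels + ch;
}
```

Planar data suits the "one 2D convolution per plane, then sum" view, while interleaved data suits the strided 2D approach discussed further down.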

Then imagine functions for training such a thing (backpropagation, accumulating error deltas through the weights).
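As a rough idea of what the training side might involve, here is a sketch of accumulating the filter and bias gradients for one such layer. It assumes pooling is disabled, a planar input layout, and that the caller already has the gradient of the loss with respect to each output; the name `conv3d_to_2d_grad_f32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not part of PAL.
 * src:         depth planes of rows x cols floats (forward-pass input)
 * out:         forward-pass output, (rows-frows+1) x (cols-fcols+1)
 * grad_out:    d(loss)/d(out), same shape as out
 * grad_filter: depth x frows x fcols, accumulated into (caller zeroes it)
 * grad_bias:   single float, accumulated into */
static void conv3d_to_2d_grad_f32(const float *src, const float *out,
                                  const float *grad_out, float *grad_filter,
                                  float *grad_bias, int rows, int cols, int depth,
                                  int frows, int fcols, float min_out)
{
    assert(frows <= rows && fcols <= cols);
    int out_rows = rows - frows + 1;
    int out_cols = cols - fcols + 1;

    for (int r = 0; r < out_rows; r++) {
        for (int c = 0; c < out_cols; c++) {
            size_t o = (size_t)r * out_cols + c;
            /* A clamped output does not depend on the weights,
             * so it contributes no gradient. */
            if (out[o] <= min_out)
                continue;
            float g = grad_out[o];
            *grad_bias += g;
            for (int d = 0; d < depth; d++)
                for (int fr = 0; fr < frows; fr++)
                    for (int fc = 0; fc < fcols; fc++)
                        grad_filter[((size_t)d * frows + fr) * fcols + fc] +=
                            g * src[((size_t)d * rows + r + fr) * cols + c + fc];
        }
    }
}
```

A full backward pass would also need the gradient with respect to the input (to pass to the previous layer) and a max-pooling gate; both are left out here to keep the sketch short.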

I think a single step like that would go a long way towards leveraging the Epiphany hardware: you'd have a lot of data reuse, perhaps uploading an entire 3D filter across multiple cores, then streaming an image through it.

This would be a stepping stone to a full neural-net library, which could implement pipelines between net layers. Getting some capability into the Pal library might make the Epiphany chip more appealing to neural-net/deep-learning researchers.

Short of that, are there other ways to generalize 2D convolutions to be more useful?

e.g. if the 3rd dimension was interleaved (e.g. [row0 [r0,g0,b0,r1,g1,b1...] row1 [r0,g0,b0,r1,g1,b1...] ...]), could you treat it as a 2D convolution with strided input, merely adding 'col_step' and 'row_step' parameters (e.g. col_step=3 for r,g,b input)? This would still require adding a clamping and max-pool stage to your 2D convolution, and again, if you're worried about parameter explosion, a simple helper could provide a streamlined interface. Thresholding/clamping is fairly common in image processing, I think (e.g. extracting certain edges from an image, blurring highlights, keeping results in an output range for bit reduction, etc.).

Stepped inputs/outputs would also allow using this function for filtered image downscaling, or perhaps colour-space conversions.
```
/* 2D convolution, extended */
void p_conv2d_ex_f32(const float *src_image, const float *filter, float *output,
                     int rows, int cols,
                     int row_stride,      // distance in memory between input rows (=cols or less to apply to subtiles)
                     int mrows, int mcols, // filter size
                     float bias,          // added to all values prior to output
                     float min_output,    // =-FLT_MAX for no effect, =0.0 to only emit values >0 etc
                     float max_output,    // = FLT_MAX for no effect
                     int column_step, int row_step, int output_step, // =1 by default; =3 for interleaved RGB and filter, etc.
                     int max_pooling      // =1 for no reduction, =2 to pool 2x2 squares to one output etc
);
// can also implement 3Dx3D->2D convolution by setting 'column_step' to depth, with interleaved
```
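For discussion, here is one possible scalar reading of that signature, under a different name (`conv2d_ex_ref_f32`) so it isn't mistaken for a settled definition. The assumptions: `cols` counts floats per row, `column_step`/`row_step` are how far the filter window moves per output, `output_step` is the distance in floats between successive outputs, the border handling is 'valid' (no padding), and the filter is applied without flipping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch of the proposed interface, not PAL code. */
static void conv2d_ex_ref_f32(const float *src, const float *filter, float *dst,
                              int rows, int cols, int row_stride,
                              int mrows, int mcols, float bias,
                              float min_output, float max_output,
                              int column_step, int row_step, int output_step,
                              int max_pooling)
{
    assert(mrows <= rows && mcols <= cols && min_output <= max_output);
    assert(column_step >= 1 && row_step >= 1);
    assert(output_step >= 1 && max_pooling >= 1);

    /* Size of the stepped convolution result, then of the pooled result. */
    int conv_rows = (rows - mrows) / row_step + 1;
    int conv_cols = (cols - mcols) / column_step + 1;
    int out_rows = conv_rows / max_pooling;
    int out_cols = conv_cols / max_pooling;

    for (int orow = 0; orow < out_rows; orow++) {
        for (int ocol = 0; ocol < out_cols; ocol++) {
            float best = min_output;  /* every clamped value is >= min_output */
            for (int pr = 0; pr < max_pooling; pr++) {
                for (int pc = 0; pc < max_pooling; pc++) {
                    const float *win = src
                        + (size_t)((orow * max_pooling + pr) * row_step) * row_stride
                        + (size_t)(ocol * max_pooling + pc) * column_step;
                    float acc = bias;
                    for (int fr = 0; fr < mrows; fr++)
                        for (int fc = 0; fc < mcols; fc++)
                            acc += win[(size_t)fr * row_stride + fc]
                                 * filter[fr * mcols + fc];
                    if (acc < min_output) acc = min_output;
                    if (acc > max_output) acc = max_output;
                    if (acc > best) best = acc;
                }
            }
            dst[((size_t)orow * out_cols + ocol) * output_step] = best;
        }
    }
}
```

Under this reading, a 1x3 filter with `column_step=3` over interleaved RGB collapses each pixel to one value (a colour-space-style conversion), and `output_step=3` would write the result into one channel of an interleaved output. Passing `-FLT_MAX`/`FLT_MAX` (from `<float.h>`) disables the clamps.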

(Also, a minor point: I would have thought it more logical to order the parameters 'cols, rows', as per width/height for images stored in memory; you can still label them rows/cols for people thinking about it as a matrix in the linear-algebra sense.)



Suggestion: Convolutional-neural-nets? #224
