
PHASE 0

Pull ENTIRE thread on L-FISTA: idea: do dict learning on speech spectra

STAGE A: basic

0. ~~Setup PyTest!~~
1. ~~Make FISTA work (unit test/toy problem)~~
2. ~~(A) Make L-FISTA work (unit test/toy problem)~~
3. ~~(B) (L)ISTA (subclass FISTA or vice versa)~~
4. Make a basic dict learning demo
5. Make a convolutional dict learning demo
6. Multi-dataset dataloader for MNIST + ASIRRA.
  1. unit test dataloader...
  2. atom visualizer
7. Combine into config system. Train a linear dictionary on medium data (e.g. mnist, cifar)
8. Train L-FISTA and show it's as good as FISTA and faster. Compare with a generic neural net (CNN) encoder. (Wall-clock plots; sparsity plots: cloud/histogram; unit-test viz suite?)
9. Repeat 0, 1 for SALSA class (unit-test). 2, 3 should follow automatically.
10. Start blog / readme / colab / notebook about this.
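
As a reference point for items 0 and 1 above, here is a minimal sketch of a FISTA toy problem for the lasso objective `0.5 * ||A x - y||^2 + lam * ||x||_1`. It assumes plain NumPy rather than whatever framework the repo uses, and the names `fista`, `soft_threshold`, and `lasso_objective` are hypothetical, not taken from the codebase.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_objective(A, y, x, lam):
    r = A @ x - y
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

def fista(A, y, lam, n_iter=200):
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of the quadratic data term.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    z = x.copy()   # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y)
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy problem (all sizes and values are illustrative assumptions):
# a random dictionary with unit-norm atoms and a 3-sparse code.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = fista(A, y, lam=0.05)
```

A PyTest-style check could compare `lasso_objective` at `x_hat` against its value at the zero vector, which avoids asserting exact support recovery on a random dictionary.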

STAGE B: more interesting example

8. Try more interesting dataset
  1. 1D on financial time series?
  2. Speech separation as goal?...
  3. Generalize to 2D dictionaries?
9. Update blog (maybe separate blog)

STAGE C: beyond backprop

10. Encoder subclass for saving + optimizing codes (unit-test)
11. Alt-min style: joint dict/L-FISTA coolness (...or is that implicit?)
12. Update blog (maybe separate blog)

Make it public-ready:

STAGE A: dual-dictionary / MCA

0. FISTA and SALSA MCA (unit test/toy problem)
1. Dual-dictionary class + dataloader encapsulation framework (unit test)
2. LFISTA and LSALSA-MCA (unit test/toy problem)
3. LFISTA and LSALSA-MCA (MNIST + ASIRRA)
  1. re-create all plots
  2. make classifier thing... or just skip this particular part.
4. Update blog: official LSALSA paper blog
5. At this point, should be done recreating paper. Email Anna (and respond to rando?)
  1. visualize the analytical results somehow? Like an N=2 version?... top couple of eigenvalues?
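
For the dual-dictionary toy problem in item 0, one possible starting point is sketched below: a signal made of spikes plus one cosine is separated by running plain ISTA on the concatenated dictionary `[D1, D2]`. This is an assumption-laden illustration, not the LSALSA method. The choice of an identity and a DCT dictionary, the helper names `dct_atoms` and `mca_ista`, and all sizes are hypothetical.

```python
import numpy as np

def dct_atoms(n):
    # Orthonormal DCT-II basis; column k is the k-th cosine atom.
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0, :] /= np.sqrt(2.0)
    return C.T

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def mca_ista(D1, D2, y, lam, n_iter=500):
    # Solve one lasso problem over the stacked dictionary, then split
    # the code into the part belonging to each dictionary.
    D = np.hstack([D1, D2])
    L = np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - D.T @ (D @ x - y) / L, lam / L)
    n1 = D1.shape[1]
    return x[:n1], x[n1:]

n = 64
D1 = np.eye(n)        # spike dictionary
D2 = dct_atoms(n)     # oscillation dictionary
x1_true = np.zeros(n)
x1_true[[10, 40]] = [3.0, -2.0]
x2_true = np.zeros(n)
x2_true[5] = 4.0
y = D1 @ x1_true + D2 @ x2_true

x1_hat, x2_hat = mca_ista(D1, D2, y, lam=0.1)
spikes_hat = D1 @ x1_hat   # estimated spike component
smooth_hat = D2 @ x2_hat   # estimated cosine component
```

Swapping the inner loop for FISTA or SALSA would leave the dictionary stacking and the code split unchanged, which is roughly what a dual-dictionary class would need to encapsulate.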

STAGE B: alt-min

6. Use framework to train deep neural net on some classification problem (unit test?)
7. Make each layer an algo block, try realizing parallelized layer optimization
  1. spawn Alternating Minimizer for each layer?
  2. or should each layer be an encoder and their optimize methods get called in parallel?
8. If it works, do unit tests and some time tests
9. write/update blog, email Anna+Irina etc...

**Crazy Ideas...**

joint dictionary learning for MCA
use linear dictionaries to guide activation (or saliency) maps of less-interpretable architectures
extension of beyond-backprop to multi-task/objective?
"Take your time and THINK": variable-length forward propagation in small architectures
  1. tiny convex RNNs as variable-complexity neuron bundles