Day 1

This was a pretty fun day. About five hours spent in a combination of reading, watching, and experimenting. I covered OOP, args/kwargs, nested loops, error handling, string manipulation, and more advanced if/else chains. Lots of mini projects to reinforce topics. Especially interesting were classes, as I'd seen them before but hadn't actually written any. They seem super useful and clean, but I'd like to know a bit more about scope within them and what the conventions are. The projects were a calculator, a shopping cart, and finally a racing car simulator built around classes. Looking forward to NumPy tomorrow.
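
On the scope question, here is a minimal sketch of the three places a name can live in a class: on the class itself, on one instance, or inside a single method call. The `RacingCar` class and its attributes are hypothetical stand-ins, not the actual simulator code.

```python
class RacingCar:
    # Class attribute: shared by every instance unless an instance shadows it.
    wheels = 4

    def __init__(self, name, top_speed):
        # Instance attributes: stored on this particular object through self.
        self.name = name
        self.top_speed = top_speed
        self.position = 0.0

    def drive(self, hours):
        # Local variable: exists only for the duration of this call.
        distance = self.top_speed * hours
        self.position += distance
        return self.position


car_a = RacingCar("A", 200)
car_b = RacingCar("B", 180)

car_a.drive(0.5)    # changes car_a.position only
car_a.wheels = 3    # creates an instance attribute that shadows the class one
```

One common convention is a leading underscore (for example `self._fuel`) for attributes meant for internal use. Python does not enforce it; it is a signal to other readers.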

Day 2

This was a more specialized day. I continued learning about classes and inheritance, including multiple inheritance. I also started on NumPy fundamentals: basic matrix operations, indexing, and creating arrays from files.
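
A small sketch of how multiple inheritance resolves method calls. The class names are made up for illustration. The point is that `super()` follows the method resolution order (MRO) of the instance's class, so it does not always mean "my direct parent".

```python
class Vehicle:
    def describe(self):
        return ["Vehicle"]


class Electric(Vehicle):
    def describe(self):
        return ["Electric"] + super().describe()


class Racer(Vehicle):
    def describe(self):
        return ["Racer"] + super().describe()


class ElectricRacer(Electric, Racer):
    def describe(self):
        return ["ElectricRacer"] + super().describe()


# The MRO is the order Python searches for attributes and methods.
mro_names = [cls.__name__ for cls in ElectricRacer.__mro__]

# Each super() call hands off to the next class in that order,
# so Electric's super() reaches Racer here, not Vehicle.
chain = ElectricRacer().describe()
```

Here `mro_names` is `['ElectricRacer', 'Electric', 'Racer', 'Vehicle', 'object']`, and `chain` visits the four user-defined classes in that same order.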

Day 3

I dove into NumPy fundamentals for the first part of the day. I learned many ways to manipulate arrays, both on whole arrays and by index. I also learned about and applied the linear algebra that NumPy enables. Details are in numpylearning.ipynb. To cap the day off, I coded a two-layer neural network, entirely from scratch, that predicts XOR gate outputs from inputs. After 1000 iterations, the model predicted correctly with > 98% accuracy.
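
For reference, a from-scratch XOR network can be sketched in NumPy roughly as follows. This is not the code from the notebook: the hidden size of 4, the tanh hidden activation, the sigmoid output, the binary cross-entropy loss, the learning rate, and the seed are all assumptions chosen to keep the example short. How well it fits after 1000 steps depends on those choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# The four XOR input/output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

hidden = 4  # assumed hidden width
W1 = rng.normal(0, 1, (2, hidden))
b1 = np.zeros((1, hidden))
W2 = rng.normal(0, 1, (hidden, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs):
    h = np.tanh(inputs @ W1 + b1)   # hidden layer
    p = sigmoid(h @ W2 + b2)        # output probability
    return h, p


def bce(p, target):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(target * np.log(p + eps)
                          + (1 - target) * np.log(1 - p + eps)))


lr = 0.5
losses = []
for step in range(1000):
    h, p = forward(X)
    losses.append(bce(p, y))

    # Backward pass. For sigmoid + BCE the output-layer error is (p - y) / N.
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Plain full-batch gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

_, predictions = forward(X)
```

Rounding `predictions` gives the predicted gate outputs, and `losses` can be plotted to check that training is heading in the right direction.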

Day 4

First, I added batching to the toy network, giving the model batches of 2 input/output pairs at a time. I also refined it with three layers and added a configurable learning rate and early stopping. After that, I moved to PyTorch. After building a network from scratch, finally using PyTorch felt like a cheat code. I rebuilt the exact same network in torch, then optimized a few things, such as switching the final layer from ReLU to sigmoid. This, plus an order of magnitude more epochs and a lower learning rate, allowed the network to hit over 99.9% confidence. I also did about 30 minutes of Python basics, learning about super() and inheritance.
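
The batching step can be sketched as a small generator that shuffles the data once per epoch and yields it in chunks. The helper name `iterate_minibatches` is hypothetical; only the batch size of 2 comes from the notes above.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(X, y, batch_size=2, rng=rng))
```

With four XOR pairs and a batch size of 2, one epoch produces two weight updates instead of one, each computed from half of the data.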

Day 5

Today I added the first real dataset to my collection of networks. I used the iris dataset to train a small 6-layer network. The network was able to classify the data points into their respective species with an accuracy of 100% on both test and training data. I added model features I'd never used before: scaling, normalization, dropout, and robust model evaluation with a classification report and confusion matrix. I also learned how to do EDA prior to model training by using matplotlib and seaborn. It was very rewarding to see where the things I've been learning fit in. I'm wondering if there's a better way to tune hyperparameters than trial and error.
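
To make the evaluation step concrete, here is a sketch of what a confusion matrix computes, written in plain NumPy rather than with whichever library the notebook used. The labels are made up for illustration and are not iris results.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Made-up labels for three classes, purely for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()                 # correct predictions / all predictions
recall_per_class = np.diag(cm) / cm.sum(axis=1)    # correct / true count, per class
```

The diagonal holds correct predictions and every off-diagonal entry is a specific kind of mistake, which is why the matrix says more than a single accuracy number.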

Day 6

Titanic shocked me. I didn't think there was any chance the data would be descriptive enough for any remotely good predictions. I was very wrong. After looking at some samples other people had made, I trimmed a bunch of somewhat useless features from the dataset and made my third (fourth) net. It was pretty huge for this dataset, but I figured I'd start big, with a 128, 64, 32, 16 structure. Much to my surprise, it scored a perfect 100% on test data. After spending quite some time making sure it really was test data, my mind was blown. Somehow, this statistical model had extracted a seemingly perfect set of rules for who lived and died. Incredibly impressive, yet very grim indeed. I also spent some time adding L2 weight decay and some more sophisticated plots of model vitals, such as a loss curve. Very fun day.
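
As a sketch of what L2 weight decay does to a single update, assuming plain SGD (the notes do not say which optimizer was used that day): the penalty adds a term proportional to the weights themselves, so every step also shrinks them toward zero. The function name and the numbers are illustrative.

```python
import numpy as np


def sgd_step_with_l2(w, grad, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * ||w||^2."""
    # The penalty's gradient is weight_decay * w, added to the data gradient.
    return w - lr * (grad + weight_decay * w)


w = np.array([1.0, -2.0])
grad = np.zeros_like(w)   # a zero data gradient isolates the decay effect
w_next = sgd_step_with_l2(w, grad, lr=0.1, weight_decay=0.5)
```

With a zero data gradient, each weight is simply multiplied by `1 - lr * weight_decay`, which is 0.95 here.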

Day 7

Back after a little while on vacation. Started by drafting a quick flattened MNIST net; 128- and 64-neuron layers got a decent 97% test accuracy. After playing around with the activation function, learning rate, and optimizer, I got that to 98%. That was about the limit for what I could do with an MLP. After reading through the torch conv documentation, I added some convolutional layers to the start of the network. This dramatically improved convergence time, and the test accuracy got to about 99.5% (though I think I would struggle just as much to classify some of the ones it got "wrong"). Lastly, I spent a bit over an hour investigating why the DataLoader was eating so much of the runtime (>80%), and got it down to about 60% but still couldn't go further. I'll look into it more tomorrow.
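
One practical detail when putting convolutional layers in front of an MLP is working out how many features reach the first linear layer. A sketch of that arithmetic, using the standard output-size formula for convolution and pooling without dilation. The kernel sizes, padding, and channel count below are assumptions, not the actual architecture.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                             # MNIST images are 28x28
side = conv_output_size(side, kernel=3, padding=1)    # 3x3 conv, padding 1 keeps 28
side = conv_output_size(side, kernel=2, stride=2)     # 2x2 max pool halves to 14
side = conv_output_size(side, kernel=3, padding=1)    # second conv keeps 14
side = conv_output_size(side, kernel=2, stride=2)     # second pool halves to 7

channels = 64                                         # hypothetical channel count
flattened = channels * side * side                    # input size of the first linear layer
```

Under these assumed layers the flattened size is 64 * 7 * 7 = 3136, compared with 784 inputs for the flattened-image MLP.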

Day 8

The model is much more optimized as of today. After switching to a custom loader that needed less IO, I also switched datasets to Fashion-MNIST, because the results on digit MNIST weren't very informative at near 100%. Initially, my test accuracy was at about 90% with only two convolutional layers, so I added a third. I also switched to Adam as my optimizer. This reached about 92% test accuracy. On the performance side of things, I made a few high-impact changes. The first was shrinking the batch size to 1024, a friendlier number that is a power of 2. I also added mixed precision using torch.amp, which about halved the runtime. Back to accuracy, I switched optimizers again to AdamW with weight decay and added a small LR decay, multiplying the learning rate by 0.95 every ten epochs. Batch normalization was the last major change I made, and that brought me up to about 93.5% test accuracy. After playing around a bit with hyperparameters, namely learning rate, I got it up to 94%.
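
The learning-rate decay described above amounts to a step schedule. A minimal sketch in plain Python, where the base learning rate of 1e-3 is a placeholder and not the value actually used:

```python
def stepped_lr(base_lr, epoch, gamma=0.95, step_size=10):
    """Learning rate after multiplying by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


base_lr = 1e-3   # hypothetical starting learning rate
schedule = [stepped_lr(base_lr, epoch) for epoch in (0, 9, 10, 25)]
```

The rate stays flat for epochs 0 to 9, drops to 95% of the base at epoch 10, and has been multiplied by 0.95 twice by epoch 25. In PyTorch, `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.95` produces this kind of schedule, though the notes do not say which mechanism was used.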

Day 9

After completing chapters 3 & 4 of Understanding Deep Learning by Prince, I uploaded the notebook equivalents of those chapters, which reinforce the topics covered with their code counterparts. Each notebook is a woven-together sequence of tutorials and opportunities to extend your thinking somewhat. While these notebooks were not all done today, 4.3 was finished today. Chapters 3 and 4 of Prince detail shallow and deep net formalism, and will make up the mathematical foundation for the rest of the book. I'm reading this book as part of my plan to master deep learning, a critical element of which is the mathematical basis it is built upon. More notebooks to come as I dive into chapter 5, Loss Functions.