- New Task: Segmentation
- Head Pose Estimation: A Regression Task
- Preview next lesson: IMDB Review Sentiment, an NLP Task
- Extra: Jeremy mentioned activations and pointed to one great resource
Dataset class in PyTorch
```python
class Dataset(object):
    """An abstract class representing a dataset.

    All other datasets should subclass it and override these methods.
    """
    def __getitem__(self, index):
        """Allow indexing."""
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError
```
Methods named `__xxx__()` are called "dunder" (double underscore) methods, e.g. `__getitem__` is "dunder getitem".
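As a toy illustration (not PyTorch's actual class), these two dunder methods are all Python needs to make an object indexable with `ds[i]` and measurable with `len(ds)`:

```python
# Toy dataset implementing the two dunder methods (a sketch, not torch's class).
class SquaresDataset:
    """Returns (x, x**2) pairs, like a tiny supervised dataset."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Called when you write ds[index]
        return index, index ** 2

    def __len__(self):
        # Called when you write len(ds)
        return self.n

ds = SquaresDataset(5)
print(len(ds))  # 5
print(ds[3])    # (3, 9)
```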
PyTorch has another class, the DataLoader, which builds minibatches from a Dataset. fastai's DataBunch then wraps a training DataLoader and a validation DataLoader together. fastai also provides the data block API, which customizes the creation of a DataBunch by isolating the underlying parts of that process into separate blocks, mainly:
- Where are the inputs and how to create them?
- How to split the data into training and validation sets?
- How to label the inputs?
- What transforms to apply?
- How to add a test set?
- How to wrap in dataloaders and create the DataBunch?
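The steps above can be mirrored by a plain-Python sketch. This is not the fastai API, just the shape of the pipeline (collect, split, label, batch), with hypothetical file names:

```python
import random

# Hypothetical items standing in for image files.
items = [f"img_{i}.jpg" for i in range(10)]            # where are the inputs?

random.seed(0)
valid_idx = set(random.sample(range(len(items)), 2))   # how to split?
train_items = [x for i, x in enumerate(items) if i not in valid_idx]
valid_items = [x for i, x in enumerate(items) if i in valid_idx]

def label_func(fname):                                  # how to label?
    num = int(fname.split("_")[1].split(".")[0])
    return "even" if num % 2 == 0 else "odd"

train = [(x, label_func(x)) for x in train_items]

def batches(data, bs):                                  # wrap in "dataloaders"
    for i in range(0, len(data), bs):
        yield data[i:i + bs]

print(len(train_items), len(valid_items))  # 8 2
first_batch = next(batches(train, 4))
print(len(first_batch))                    # 4
```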
Check the fastai docs for function signatures and types.
To find the corresponding notebooks for the docs, go to the fastai repo.
For multi-label image classification such as this one, to put the data into a DataBunch using the data block API, we need to use ImageList (and not ImageDataBunch). This makes sure the model created has the proper loss function to deal with multiple labels per image.
```python
# get_transforms does image data augmentation, flipping horizontally by default.
# flip_vert=True also enables vertical flips and 90-degree rotations,
# giving 8 possible orientations -- fine for satellite imagery.
# max_warp=0.: fastai has fast perspective warping, but satellite images
# are taken from directly above, so we don't need warping.
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
```
We often want to call the same function but with different values of a parameter. For example,
```python
# We want to call accuracy_thresh with a different thresh
def acc_02(inp, targ):
    return accuracy_thresh(inp, targ, thresh=0.2)

# Equivalent: a CS concept called "partial function application" --
# pass in the original function plus the fixed parameter, and get back
# a new wrapper function (Python 3).
from functools import partial
acc_02 = partial(accuracy_thresh, thresh=0.2)
```
This is a really common thing to do!
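Here is a runnable stdlib example of the same idea, with a toy function standing in for `accuracy_thresh`:

```python
from functools import partial

# Toy stand-in for accuracy_thresh: which predictions clear the threshold?
def above_thresh(preds, thresh=0.5):
    return [p > thresh for p in preds]

# partial fixes thresh=0.2 and returns a new one-argument function.
above_02 = partial(above_thresh, thresh=0.2)

print(above_thresh([0.1, 0.3, 0.9]))  # [False, False, True]
print(above_02([0.1, 0.3, 0.9]))      # [False, True, True]
```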
Question: How to use online feedback to retrain the model? Answer: Add the newly labeled data to the training set, load the old model, unfreeze, use a slightly larger learning rate and more epochs, and fine-tune the model some more.
Before unfreeze, we train only the model's last layers. The learning rate plot should look like this.
Note: do not set the learning rate at the bottom of the curve; set it at the steepest downward slope.
After unfreeze, we fine-tune the full model. The learning rate plot should look like this.
For this shape, find where the loss starts to go up, and set the left end of the lr range to be 10x smaller than that point; the right end is the learning rate from the frozen stage divided by 5 or 10. We will talk about this (called discriminative learning rates) in the future.
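One plausible way to spread a `slice(lo, hi)` range over layer groups is log-space interpolation; this is a sketch of the idea only, an assumption about the scheme, not fastai's exact implementation:

```python
# Sketch: spread learning rates from lo to hi across n layer groups,
# evenly spaced in log space (an assumed scheme, not fastai's exact code).
def lr_range(lo, hi, n):
    if n == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n - 1))
    return [lo * ratio ** i for i in range(n)]

# Early layers get small lrs, later layers get larger ones.
lrs = lr_range(1e-5, 1e-3, 3)
print(lrs)  # approximately [1e-05, 1e-04, 1e-03]
```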
In the notebook example for the planet data challenge, Jeremy first trained a model on 128 by 128 images. This is for faster experimentation, and that model can then be used as a pretrained model for the actual 256 by 256 images.
The process is to create a DataBunch with the new size:
```python
learn.load('stage-2-rn50')

# Note: set bs=32 and restart the kernel after loading the saved model,
# or the GPU can run out of memory.
data = (src.transform(tfms, size=256)
        .databunch(bs=32).normalize(imagenet_stats))
learn.data = data
data.train_ds[0][0].shape  # torch.Size([3, 256, 256])

# Freeze means we go back to training only the last few layers (transfer learning)
learn.freeze()
learn.lr_find()
learn.recorder.plot()  # check the image below for output

lr = 1e-2/2
learn.fit_one_cycle(5, slice(lr))
learn.save('stage-1-256-rn50')

learn.unfreeze()
learn.fit_one_cycle(5, slice(1e-5, lr/5))
learn.recorder.plot_losses()

# save() is used for stages; the data reference is also saved
learn.save('stage-2-256-rn50')

# export() writes a pickle file for inference: it saves all transforms
# and weights, but not the data.
# See https://docs.fast.ai/tutorial.inference.html
learn.export()
```
New Task: Segmentation
In segmentation, every pixel needs to be classified.
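"Every pixel gets a class" just means taking, at each pixel, the class with the highest score. A minimal sketch with plain Python lists (not the actual model output format):

```python
# Per-pixel classification: argmax over the class axis at every pixel.
# scores[c][r][col] is the score for class c at pixel (r, col).
def segment(scores):
    n_classes = len(scores)
    rows, cols = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][r][col]) for col in range(cols)]
        for r in range(rows)
    ]

# 2 hypothetical classes (0 = background, 1 = road), 2x2 "image"
scores = [
    [[0.9, 0.2], [0.4, 0.8]],   # class 0 scores
    [[0.1, 0.8], [0.6, 0.2]],   # class 1 scores
]
print(segment(scores))  # [[0, 1], [1, 0]]
```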
The training data needs images with every pixel labeled. It is hard to create such datasets, so usually we download existing ones.
Every time we use the datasets, we should find the citation and credit the creators appropriately.
Question: What to do if training loss > validation loss? Answer: This means underfitting. Try: 1. training more epochs; 2. a smaller learning rate; 3. less regularization: weight decay, dropout, data augmentation.
The model used for segmentation is U-Net.
fit_one_cycle helps you get "super convergence", i.e. training roughly 10x faster than plain SGD. It uses a small lr and large momentum at the beginning; then the lr increases while the momentum decreases. When it's close to the optimum, the lr decreases again and the momentum increases. Paper by Leslie Smith. Towards Data Science article here.
You pass in the maximum learning rate, and it uses a range of learning rates as the picture below shows: it goes up first and down after. The downward part is called annealing, which is well known; the upward part is quite new. The motivation is to avoid the optimization getting stuck in a local minimum: the loss surface is usually quite bumpy in some areas and flat in others.
The approach was proposed by Leslie Smith. Read more about it here.
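The lr half of the schedule can be sketched like this. The exact curve in fastai differs (newer versions use cosine segments), so treat this linear up-then-down version as an illustration of the shape only:

```python
# Sketch of a one-cycle learning-rate schedule: linear warm-up to lr_max,
# then linear annealing back down. Momentum would move in the opposite
# direction (high -> low -> high), not shown here.
def one_cycle_lr(step, total_steps, lr_max, warm_steps):
    if step <= warm_steps:
        return lr_max * step / warm_steps              # going up
    frac = (step - warm_steps) / (total_steps - warm_steps)
    return lr_max * (1 - frac)                         # going down (annealing)

total = 10
lrs = [one_cycle_lr(s, total, lr_max=0.01, warm_steps=3) for s in range(total + 1)]
print(max(lrs))   # peak is about 0.01, reached at the end of warm-up
print(lrs[0], lrs[total])  # zero at both ends
```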
The fastai version of U-Net beats the published state-of-the-art result, a model called the hundred-layer tiramisu!
To train in mixed precision (16-bit floats), which is faster and uses less memory on recent GPUs, append .to_fp16() to any learner:

```python
learn = unet_learner(data, models.resnet34, metrics=metrics).to_fp16()
```

Make sure the GPU driver is up to date to use this feature.
Head Pose Estimation: A Regression Task
This is a regression task: the output is a set of (x, y) coordinates. We train a CNN, but instead of a cross-entropy loss we use MSE (mean squared error).
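For concreteness, here is MSE on predicted coordinate pairs in plain Python (real code would use torch's MSELoss on tensors):

```python
# Minimal sketch of MSE over predicted (x, y) coordinates.
def mse(pred, targ):
    """Mean of squared differences over all flattened coordinates."""
    diffs = [(p - t) ** 2 for pp, tt in zip(pred, targ) for p, t in zip(pp, tt)]
    return sum(diffs) / len(diffs)

pred = [(0.5, 0.5), (0.0, 1.0)]
targ = [(0.5, 1.0), (0.0, 0.0)]
print(mse(pred, targ))  # 0.3125
```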
Preview next lesson: IMDB Review Sentiment, an NLP Task
For texts, we create a TextDataBunch from a csv. Texts need to be tokenized and numericalized.
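Tokenize and numericalize can be sketched in a few lines of plain Python. This is not fastai's actual tokenizer (which is smarter and adds special tokens like xxbos); the xxunk-at-index-0 convention here is an assumption:

```python
# Hedged sketch: tokenization (text -> tokens) and numericalization
# (tokens -> integer ids via a vocab).
def tokenize(text):
    return text.lower().split()

def build_vocab(texts):
    vocab = {"xxunk": 0}  # reserve 0 for unknown words (assumed convention)
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(text, vocab):
    return [vocab.get(tok, 0) for tok in tokenize(text)]

texts = ["this movie was great", "this movie was terrible"]
vocab = build_vocab(texts)
print(numericalize("this movie was great", vocab))  # [1, 2, 3, 4]
print(numericalize("unseen words", vocab))          # [0, 0]
```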
When we do text classification, we actually create 2 models: one is a language model (pretrained, for transfer learning later), the other is the classification model.
The SOTA accuracy for this dataset is ~95% and this notebook achieves that level.
Extra: Jeremy mentioned activations and pointed to one great resource
A visual proof that neural networks can approximate any function (the universal approximation theorem):
(This is an online book, need to check it out!)
What deep learning really is, from a math perspective: a series of matrix multiplications with max(x, 0) (ReLU) in between, where we use gradient descent to adjust the weights in these matrices to reduce the final error. The forward pass is something like E = loss_func(W3.dot(max(W2.dot(max(W1.dot(X), 0)), 0))). The only thing is that the matrices are large. That's it.
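The matrix-multiply-then-ReLU chain can be run end to end with tiny hand-picked matrices (plain Python lists; real code would use torch or numpy):

```python
# Minimal forward pass: matrix multiplies with ReLU in between.
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(vi, 0.0) for vi in v]

# Tiny hand-picked weight matrices (illustrative only).
W1 = [[1.0, -1.0], [0.5, 0.5]]
W2 = [[1.0, 1.0]]
x = [2.0, 1.0]

h = relu(matvec(W1, x))   # hidden layer: [1.0, 1.5]
out = matvec(W2, h)       # output layer: [2.5]
print(out)  # [2.5]
```

Training would then compare `out` to a target with a loss function and nudge W1 and W2 by gradient descent.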
Usually, the hardest part is creating the DataBunch; the rest is straightforward in fastai.