In this tutorial, we look in depth at the middle level API for collecting data in computer vision. First we will see how to use:

- Transform to process the data
- Pipeline to compose transforms

Those are just functions with added functionality. For dataset processing, we will look in a second part at:

- TfmdLists to apply one Pipeline of Transforms on a collection of items
- Datasets to apply several Pipelines of Transforms in parallel on a collection of items and produce tuples

The general rule is to use TfmdLists when your transforms will output the tuple (input,target) and Datasets when you build separate Pipelines for each of your input(s)/target(s).
After this tutorial, you might be interested in the siamese tutorial, which goes even more in depth in the data APIs, showing you how to write your custom types and how to customize the behavior of show_batch and show_results.
from fastai.vision.all import *
Cleaning and processing data is one of the most time-consuming things in machine learning, which is why fastai tries to help you as much as it can. At its core, preparing the data for your model can be formalized as a sequence of transformations you apply to some raw items. For instance, in a classic image classification problem, we start with filenames. We have to open the corresponding images, resize them, convert them to tensors, and maybe apply some kind of data augmentation before we are ready to batch them. And that's just for the inputs of our model; for the targets, we need to extract the label from our filenames and convert it to an integer.
This process needs to be somewhat reversible, because we often want to inspect our data to double-check that what we feed the model actually makes sense. That's why fastai represents all those operations by Transforms, which you can sometimes undo with a decode method.
First we'll have a look at the basic steps using a single MNIST image. We'll start with a filename, and see step by step how it can be converted into a labelled image that can be displayed and used for modeling. We use the usual untar_data to download our dataset (if necessary) and get all the image files:
source = untar_data(URLs.MNIST_TINY)/'train'
items = get_image_files(source)
fn = items[0]; fn
We'll look at each Transform needed in turn. Here's how we can open an image file:
img = PILImage.create(fn); img
Then we can convert it to a C*H*W tensor (for channel x height x width, which is the convention in PyTorch):
tconv = ToTensor()
img = tconv(img)
img.shape,type(img)
Now that's done, we can create our labels. First extracting the text label:
lbl = parent_label(fn); lbl
And then converting to an int for modeling:
tcat = Categorize(vocab=['3','7'])
lbl = tcat(lbl); lbl
We use decode to reverse transforms for display. Reversing the Categorize transform results in a class name we can display:
lbld = tcat.decode(lbl)
lbld
We can compose our image steps using Pipeline:
pipe = Pipeline([PILImage.create,tconv])
img = pipe(fn)
img.shape
A Pipeline can decode and show an item.
pipe.show(img, figsize=(1,1), cmap='Greys');
The show method works behind the scenes with types. Transforms will make sure the type of an element they receive is preserved. Here PILImage.create returns a PILImage, which knows how to show itself. tconv converts it to a TensorImage, which also knows how to show itself.
type(img)
Those types are also used to enable different behaviors depending on the input received (for instance you don't do data augmentation the same way on an image, a segmentation mask or a bounding box).
Creating your own Transform
Creating your own Transform is way easier than you think. In fact, each time you have passed a label function to the data block API or to ImageDataLoaders.from_name_func, you have created a Transform without knowing it. At its base, a Transform is just a function. Let's show how you can easily add a transform by implementing one that wraps a data augmentation from the albumentations library.
First things first, you will need to install the albumentations library. Uncomment the following cell to do so if needed:
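# !pip install albumentations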
Then it will be easier to see the result of the transform on a color image bigger than the MNIST one we had before, so let's load something from the PETS dataset.
source = untar_data(URLs.PETS)
items = get_image_files(source/"images")
We can still open it with PILImage.create:
img = PILImage.create(items[0])
img
We will show how to wrap one transform, but you can just as easily wrap any set of transforms you combined in a Compose. Here let's do some ShiftScaleRotate:
from albumentations import ShiftScaleRotate
The albumentations transforms work on numpy images, so we just convert our PILImage to a numpy array before wrapping it back with PILImage.create (this function takes filenames as well as arrays or tensors).
aug = ShiftScaleRotate(p=1)
def aug_tfm(img):
    np_img = np.array(img)
    aug_img = aug(image=np_img)['image']
    return PILImage.create(aug_img)
aug_tfm(img)
tfm = Transform(aug_tfm)
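Since Transform wraps our plain function, we can call the result exactly as before (the output will vary, since the augmentation is random):
tfm(img)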
If you have some state in your transform, you might want to create a subclass of Transform. In that case, the function you want to apply should be written in the encodes method (the same way you implement forward for a PyTorch module):
class AlbumentationsTransform(Transform):
    def __init__(self, aug): self.aug = aug
    def encodes(self, img: PILImage):
        aug_img = self.aug(image=np.array(img))['image']
        return PILImage.create(aug_img)
We also added a type annotation: this will make sure this transform is only applied to PILImages and their subclasses. For any other object, it won't do anything. You can also write as many encodes methods as you want with different type annotations, and the Transform will properly dispatch on the objects it receives.
This is because, in practice, the transform is often applied as an item_tfms (or a batch_tfms) that you pass in the data block API. Those items are a tuple of objects of different types, and the transform may have different behaviors on each part of the tuple.
Let's check here how this works:
tfm = AlbumentationsTransform(ShiftScaleRotate(p=1))
a,b = tfm((img, 'dog'))
show_image(a, title=b);
The transform was applied over the tuple (img, "dog"). img is a PILImage, so it applied the encodes method we wrote. "dog" is a string, so the transform did nothing to it.
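As a quick sanity check, calling the transform on something that is not a PILImage simply returns it unchanged:
tfm('dog')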
Sometimes, however, you need your transform to take your tuple as a whole: for instance, albumentations is applied simultaneously on images and segmentation masks. In this case you need to subclass ItemTransform instead of Transform. Let's see how this works:
cv_source = untar_data(URLs.CAMVID_TINY)
cv_items = get_image_files(cv_source/'images')
img = PILImage.create(cv_items[0])
mask = PILMask.create(cv_source/'labels'/f'{cv_items[0].stem}_P{cv_items[0].suffix}')
ax = img.show()
ax = mask.show(ctx=ax)
We then write a subclass of ItemTransform that can wrap any albumentations augmentation transform, but only for a segmentation problem:
class SegmentationAlbumentationsTransform(ItemTransform):
    def __init__(self, aug): self.aug = aug
    def encodes(self, x):
        img,mask = x
        aug = self.aug(image=np.array(img), mask=np.array(mask))
        return PILImage.create(aug["image"]), PILMask.create(aug["mask"])
And we can check how it gets applied on the tuple (img, mask). This means you can pass it as an item_tfms in any segmentation problem.
tfm = SegmentationAlbumentationsTransform(ShiftScaleRotate(p=1))
a,b = tfm((img, mask))
ax = a.show()
ax = b.show(ctx=ax)
There is more you can implement in a Transform: you can reverse its behavior by adding a decodes method, and initialize some inner state with setups. We'll look at both in the next section:
Loading the pets dataset using only Transform
Let's see how to use fastai.data to process the Pets dataset. If you are used to writing your own PyTorch Datasets, what will feel more natural is to write everything in one Transform. We use source to refer to the underlying source of our data (e.g. a directory on disk, a database connection, a network connection, etc). Then we grab the items.
source = untar_data(URLs.PETS)/"images"
items = get_image_files(source)
We'll use this function to create consistently sized tensors from image files:
def resized_image(fn:Path, sz=128):
    x = Image.open(fn).convert('RGB').resize((sz,sz))
    # Convert image to tensor for modeling
    return tensor(array(x)).permute(2,0,1).float()/255.
Before we can create a Transform, we need a type that knows how to show itself (if we want to use the show method). Here we define a TitledImage:
class TitledImage(fastuple):
    def show(self, ctx=None, **kwargs): show_titled_image(self, ctx=ctx, **kwargs)
Let's check it works:
img = resized_image(items[0])
TitledImage(img,'test title').show()
To decode data for showing purposes (like de-normalizing an image or converting back an index to its corresponding class), we implement a decodes method inside a Transform.
class PetTfm(Transform):
    def __init__(self, vocab, o2i, lblr): self.vocab,self.o2i,self.lblr = vocab,o2i,lblr
    def encodes(self, o): return [resized_image(o), self.o2i[self.lblr(o)]]
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
The Transform opens and resizes the image on one side, and labels it and converts that label to an index using o2i on the other side. Inside the decodes method, we decode the index using the vocab. The image is left as is (we can't really show a filename!).
To use this Transform, we need a label function. Here we use a regex on the name attribute of our filenames:
labeller = using_attr(RegexLabeller(pat = r'^(.*)_\d+.jpg$'), 'name')
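We can check what the labeller returns on one of our filenames (the exact class name depends on which file comes first):
labeller(items[0])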
Then we gather all the possible labels, uniqueify them, and ask for the two correspondences (vocab and o2i) using bidir=True. We can then use them to build our pet transform.
vals = list(map(labeller, items))
vocab,o2i = uniqueify(vals, sort=True, bidir=True)
pets = PetTfm(vocab,o2i,labeller)
We can check how it's applied to a filename:
x,y = pets(items[0])
x.shape,y
And we can decode our transformed version and show it:
dec = pets.decode([x,y])
dec.show()
Note that, like __call__ and encodes, we implemented a decodes method but we actually call decode on our Transform.
Also note that our decodes method received the two objects (x and y). We said in the previous section that Transforms dispatch over tuples (for encoding as well as decoding) but here it took our two elements as a whole and did not try to decode x and y separately. Why is that? It's because we pass a list [x,y] to decodes. Transforms dispatch over tuples, but tuples only. And as we saw as well, to prevent a Transform from dispatching over a tuple, we just have to make it an ItemTransform:
class PetTfm(ItemTransform):
    def __init__(self, vocab, o2i, lblr): self.vocab,self.o2i,self.lblr = vocab,o2i,lblr
    def encodes(self, o): return (resized_image(o), self.o2i[self.lblr(o)])
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
pets = PetTfm(vocab,o2i,labeller)
dec = pets.decode(pets(items[0]))
dec.show()
Now let's make our ItemTransform automatically set up its state from the data. This way, when we combine our Transform with the data, it will automatically get set up without us having to do anything. This is very easy to do: just copy the lines we had before to build the categories inside the transform, in a setups method:
class PetTfm(ItemTransform):
    def setups(self, items):
        self.labeller = using_attr(RegexLabeller(pat = r'^(.*)_\d+.jpg$'), 'name')
        vals = map(self.labeller, items)
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o): return (resized_image(o), self.o2i[self.labeller(o)])
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
Now we can create our Transform, call its setup, and it will be ready to be used:
pets = PetTfm()
pets.setup(items)
x,y = pets(items[0])
x.shape, y
And like before, there is no problem decoding it:
dec = pets.decode((x,y))
dec.show()
To benefit from fastai's transforms (like Resize or FlipItem) in a Pipeline, our transform should return a fastai type, so let's tweak PetTfm to return a PILImage instead of a resized tensor:
class PetTfm(ItemTransform):
    def setups(self, items):
        self.labeller = using_attr(RegexLabeller(pat = r'^(.*)_\d+.jpg$'), 'name')
        vals = map(self.labeller, items)
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o): return (PILImage.create(o), self.o2i[self.labeller(o)])
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
tfms = Pipeline([PetTfm(), Resize(224), FlipItem(p=1), ToTensor()])
Calling setup on a Pipeline will set up each transform in order:
tfms.setup(items)
To check the setup was done properly, we want to see if we did build the vocab. One cool trick of Pipeline is that, when asking for an attribute, it will look through each of its Transforms for that attribute and give you the result (or the list of results if the attribute is in multiple transforms):
tfms.vocab
Then we can call our pipeline:
x,y = tfms(items[0])
x.shape,y
tfms.show(tfms(items[0]))
Pipeline.show will call decode on each Transform until it gets a type that knows how to show itself. The library considers a tuple as knowing how to show itself if all its parts have a show method. Here it does not happen before reaching PetTfm, since the second part of our tuple is an int. But after decoding the original PetTfm, we get a TitledImage, which has a show method.
It's worth noting that the Transforms of the Pipeline are sorted by their internal order attribute (with a default of order=0). You can always check the order of the transforms in a Pipeline by looking at its representation:
tfms
FlipItem.order,Resize.order
To customize the order of a Transform, just set order = ... before the __init__ (it's a class attribute). Let's give PetTfm an order of -5 to be sure it's always run first:
class PetTfm(ItemTransform):
    order = -5
    def setups(self, items):
        self.labeller = using_attr(RegexLabeller(pat = r'^(.*)_\d+.jpg$'), 'name')
        vals = map(self.labeller, items)
        self.vocab,self.o2i = uniqueify(vals, sort=True, bidir=True)
    def encodes(self, o): return (PILImage.create(o), self.o2i[self.labeller(o)])
    def decodes(self, x): return TitledImage(x[0],self.vocab[x[1]])
Then we can mess up the order of the transforms in our Pipeline, but it will fix itself:
tfms = Pipeline([Resize(224), PetTfm(), FlipItem(p=1), ToTensor()])
tfms
One pipeline makes a TfmdLists
tls = TfmdLists(items, [Resize(224), PetTfm(), FlipItem(p=0.5), ToTensor()])
x,y = tls[0]
x.shape,y
We did not need to pass anything to PetTfm thanks to our setups method: the Pipeline was automatically set up on the items during initialization, so PetTfm has created its vocab like before:
tls.vocab
We can ask the TfmdLists to show the items we got:
tls.show((x,y))
Or we have a shortcut with show_at:
show_at(tls, 0)
TfmdLists has an 's' in its name because it can represent several transformed lists: your training and validation sets. To use that functionality, we just need to pass splits at initialization. splits should be a list of lists of indices (one list per set). To help create splits, we can use all the splitters of the fastai library:
splits = RandomSplitter(seed=42)(items)
splits
tls = TfmdLists(items, [Resize(224), PetTfm(), FlipItem(p=0.5), ToTensor()], splits=splits)
Then your tls gets train and valid attributes (it also had them before, but valid was empty and train contained everything).
show_at(tls.train, 0)
An interesting thing is that, unless you pass train_setup=False, your transforms are set up on the training set only (which is best practice): the items received by setups are just the elements of the training set.
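For instance, we can check the sizes of the two subsets (a quick sanity check; the exact numbers depend on the dataset and the random split):
len(tls.train),len(tls.valid)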
Getting to DataLoaders
From a TfmdLists, getting a DataLoaders object is very easy: you just have to call the dataloaders method:
dls = tls.dataloaders(bs=64)
And show_batch will just work:
dls.show_batch()
You can even add augmentation transforms, since we have a proper fastai typed image. Just remember to add the IntToFloatTensor transform that deals with the conversion of int to float (fastai's augmentation transforms on the GPU require float tensors). When calling TfmdLists.dataloaders, you pass the batch_tfms to after_batch (and potential new item_tfms to after_item):
dls = tls.dataloaders(bs=64, after_batch=[IntToFloatTensor(), *aug_transforms()])
dls.show_batch()
Using Datasets
Datasets applies a list of lists of transforms (or list of Pipelines) lazily to the items of a collection, creating one output per list of transforms/Pipeline. This makes it easier for us to separate out the steps of a process, so that we can re-use them and modify the process more easily. This is what lays the foundation of the data block API: we can easily mix and match types as inputs or outputs, as they are associated to certain pipelines of transforms.
For instance, let's write our own ImageResizer transform with two different implementations for images and masks:
class ImageResizer(Transform):
    "Resize image to `size` using `resample`"
    order = 1
    def __init__(self, size, resample=Image.BILINEAR):
        if not is_listy(size): size=(size,size)
        self.size,self.resample = (size[1],size[0]),resample
    def encodes(self, o:PILImage): return o.resize(size=self.size, resample=self.resample)
    def encodes(self, o:PILMask):  return o.resize(size=self.size, resample=Image.NEAREST)
Specifying the type annotations makes it so that our transform does nothing to things that are neither PILImage nor PILMask, resizes images with self.resample, and resizes masks with nearest-neighbor interpolation. To create a Datasets, we then pass two pipelines of transforms, one for the input and one for the target:
tfms = [[PILImage.create, ImageResizer(128), ToTensor(), IntToFloatTensor()],
[labeller, Categorize()]]
dsets = Datasets(items, tfms)
We can check that inputs and outputs have the right types:
t = dsets[0]
type(t[0]),type(t[1])
We can decode and show using dsets:
x,y = dsets.decode(t)
x.shape,y
dsets.show(t);
And we can pass our train/validation split like in TfmdLists:
dsets = Datasets(items, tfms, splits=splits)
But we are not using the fact that Transforms dispatch over tuples here. ImageResizer, ToTensor and IntToFloatTensor could be passed as transforms over the tuple. This is done in .dataloaders by passing them to after_item. They won't do anything to the category but will only be applied to the inputs.
tfms = [[PILImage.create], [labeller, Categorize()]]
dsets = Datasets(items, tfms, splits=splits)
dls = dsets.dataloaders(bs=64, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor()])
And we can check it works with show_batch:
dls.show_batch()
If you just want to build one DataLoader from your Datasets (or the previous TfmdLists), you can pass it directly to TfmdDL:
dsets = Datasets(items, tfms)
dl = TfmdDL(dsets, bs=64, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor()])
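We can look at a batch from this DataLoader like with any fastai DataLoader (a quick check):
dl.show_batch(max_n=4)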
By using the same transforms in after_item but a different kind of target (here segmentation masks), the targets are automatically processed as they should be, thanks to the type-dispatch system.
cv_source = untar_data(URLs.CAMVID_TINY)
cv_items = get_image_files(cv_source/'images')
cv_splitter = RandomSplitter(seed=42)
cv_split = cv_splitter(cv_items)
cv_label = lambda o: cv_source/'labels'/f'{o.stem}_P{o.suffix}'
tfms = [[PILImage.create], [cv_label, PILMask.create]]
cv_dsets = Datasets(cv_items, tfms, splits=cv_split)
dls = cv_dsets.dataloaders(bs=64, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor()])
dls.show_batch(max_n=4)
If we want to use the augmentation transform we created before, we just need to add one thing to it: we want it to be applied on the training set only, not the validation set. To do this, we specify it should only be applied on a specific idx of our splits by adding split_idx=0 (0 is for the training set, 1 for the validation set):
class SegmentationAlbumentationsTransform(ItemTransform):
    split_idx = 0
    def __init__(self, aug): self.aug = aug
    def encodes(self, x):
        img,mask = x
        aug = self.aug(image=np.array(img), mask=np.array(mask))
        return PILImage.create(aug["image"]), PILMask.create(aug["mask"])
Now if we recreate our Datasets and DataLoaders with this transform, the data augmentation is only applied on the training set:
cv_dsets = Datasets(cv_items, tfms, splits=cv_split)
dls = cv_dsets.dataloaders(bs=64, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor(),
SegmentationAlbumentationsTransform(ShiftScaleRotate(p=1))])
dls.show_batch(max_n=4)
Let's go back to our pets dataset...
tfms = [[PILImage.create], [labeller, Categorize()]]
dsets = Datasets(items, tfms, splits=splits)
dls = dsets.dataloaders(bs=64, after_item=[ImageResizer(128), ToTensor(), IntToFloatTensor()])
...and imagine we have some new files to classify.
path = untar_data(URLs.PETS)
tst_files = get_image_files(path/"images")
len(tst_files)
We can create a dataloader that takes those files and applies the same transforms as the validation set with DataLoaders.test_dl:
tst_dl = dls.test_dl(tst_files)
tst_dl.show_batch(max_n=9)
Extra:
You can call learn.get_preds, passing this newly created dataloader, to make predictions on our new images!
What is really cool is that, after you have finished training your model, you can save it with learn.export; this also saves all the transforms that need to be applied to your data. At inference time, you just need to load your learner with load_learner and you can immediately create a dataloader with test_dl to use it to generate new predictions!