12  A Language Model from Scratch

We’re now ready to go deep… deep into deep learning! You already learned how to train a basic neural network, but how do you go from there to creating state-of-the-art models? In this part of the book we’re going to uncover all of the mysteries, starting with language models.

You saw in Chapter 10 how to fine-tune a pretrained language model to build a text classifier. In this chapter, we will explain exactly what is inside that model, and what an RNN is. First, let's gather some data that will allow us to quickly prototype our various models.

12.1 The Data

Whenever we start working on a new problem, we first try to think of the simplest dataset we can find that will let us try out methods quickly and easily, and interpret the results. When we started working on language modeling a few years ago, we couldn't find any datasets that allowed for quick prototyping, so we made one. We call it Human Numbers, and it simply contains the first 10,000 numbers written out in English.

j: One of the most common practical mistakes I see even amongst highly experienced practitioners is failing to use appropriate datasets at appropriate times during the analysis process. In particular, most people tend to start with datasets that are too big and too complicated.
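As a concrete starting point, here is a minimal sketch of how you might download the dataset and peek at it with fastai. It assumes the `URLs.HUMAN_NUMBERS` download constant and a `train.txt`/`valid.txt` file layout; check `path.ls()` if your copy is organized differently.

```python
from fastai.text.all import *

# Download and extract the Human Numbers dataset (assumes the
# URLs.HUMAN_NUMBERS constant points at the prepared archive).
path = untar_data(URLs.HUMAN_NUMBERS)

# Collect every line of text; we assume the archive contains
# train.txt and valid.txt, one spelled-out number per line.
lines = L()
with open(path/'train.txt') as f: lines += L(*f.readlines())
with open(path/'valid.txt') as f: lines += L(*f.readlines())

# Join everything into one long stream of tokens separated by '.',
# which is all we need for quick language-model prototyping.
text = ' . '.join([l.strip() for l in lines])
tokens = text.split(' ')
print(lines[:3])    # the spelled-out forms of the first few numbers
print(tokens[:10])  # the start of the token stream
```

Since spelling out numbers in English only needs a few dozen distinct words, the resulting token stream has a tiny vocabulary and is easy to inspect by eye, which is exactly what we want while prototyping.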


This is just a preview of this chapter. The rest of the chapter is not available here, but you can read the source notebook, which has the same content (with less polished formatting).