In Chapters 6 and 7, we use algorithms such as regularized linear models, support vector machines, and naive Bayes models to predict outcomes from predictors including text data. Deep learning models approach the same tasks and have the same goals, but the algorithms involved are different. Deep learning models are “deep” in the sense that they use multiple layers to learn how to map from input features to output outcomes; this is in contrast to the kinds of models we used in the previous two chapters which use a shallow (single) mapping.

Deep learning models can be effective for text prediction problems because they use these multiple layers to capture complex relationships in language.

The layers in a deep learning model are connected in a network and these models are called neural networks, although they do not work much like a human brain. The layers can be connected in different configurations called network architectures. Three of the most common architectures used for text data are a recurrent neural network (RNN), a convolutional neural network (CNN), and long short-term memory (LSTM)7. These architectures sometimes incorporate word embeddings, as described in Chapter 5.

TODO: Algorithmic bias specific to deep learning or similar

For the following chapters, we will use tidymodels packages along with Tensorflow and the R interface to Keras (Allaire and Chollet 2020) for preprocessing, modeling, and evaluation.

Table 7.1 presents some key differences between deep learning and what, in this book, we call machine learning methods.

TABLE 7.1: Comparing deep learning with other machine learning methods
Machine learning Deep learning
Faster to train Takes more time to train
Software is typically easier to install Software can be more challenging to install
Can achieve good performance with less data Requires more data for good performance
Depends on preprocessing to model more than very simple relationships Can model highly complex relationships


Allaire, JJ, and François Chollet. 2020. Keras: R Interface to ’Keras’.

  1. In other situations you may do best using a different architecture, for example, when working with dense, tabular data.↩︎