3  Data Ethics

As we discussed in Chapters 1 and 2, sometimes machine learning models can go wrong. They can have bugs. They can be presented with data that they haven’t seen before, and behave in ways we don’t expect. Or they could work exactly as designed, but be used for something that we would much prefer they were never, ever used for.

Because deep learning is such a powerful tool and can be used for so many things, it becomes particularly important that we consider the consequences of our choices. The philosophical study of ethics is the study of right and wrong, including how we can define those terms, recognize right and wrong actions, and understand the connection between actions and consequences. The field of data ethics has been around for a long time, and there are many academics focused on this field. It is being used to help define policy in many jurisdictions; it is being used in companies big and small to consider how best to ensure good societal outcomes from product development; and it is being used by researchers who want to make sure that the work they are doing is used for good, and not for bad.

As a deep learning practitioner, therefore, it is likely that at some point you are going to be put in a situation where you need to consider data ethics. So what is data ethics? It’s a subfield of ethics, so let’s start there.

J: At university, philosophy of ethics was my main thing (it would have been the topic of my thesis, if I’d finished it, instead of dropping out to join the real world). Based on the years I spent studying ethics, I can tell you this: no one really agrees on what right and wrong are, whether they exist, how to spot them, which people are good, and which bad, or pretty much anything else. So don’t expect too much from the theory! We’re going to focus on examples and thought starters here, not theory.

In answering the question “What Is Ethics?”, the Markkula Center for Applied Ethics says that the term refers to:

There is no list of right answers. There is no list of dos and don’ts. Ethics is complicated and context-dependent. It involves the perspectives of many stakeholders. Ethics is a muscle that you have to develop and practice. In this chapter, our goal is to provide some signposts to help you on that journey.

Spotting ethical issues is best done as part of a collaborative team. This is the only way you can really incorporate different perspectives. Different people’s backgrounds will help them to see things that may not be obvious to you. Working with a team is helpful for many “muscle-building” activities, including this one.

This chapter is certainly not the only part of the book where we talk about data ethics, but it’s good to have a place where we focus on it for a while. To get oriented, it’s perhaps easiest to look at a few examples. So, we picked out three that we think effectively illustrate some of the key topics.


This is just a preview of this chapter. The rest of the chapter is not available here, but you can read the source notebook, which has the same content (with less polished formatting).