The main class to get your data ready for model training is TabularDataLoaders, together with its factory methods. Check out the tabular tutorial for usage examples.

This class should not be used directly; prefer one of the factory methods instead. All of these factory methods accept the following arguments:

cat_names: the names of the categorical variables
cont_names: the names of the continuous variables
y_names: the names of the dependent variables
y_block: the TransformBlock to use for the target
valid_idx: the indices to use for the validation set (a random split is used if none are given)
bs: the batch size
val_bs: the batch size for the validation DataLoader (defaults to bs)
shuffle_train: whether to shuffle the training DataLoader
n: overrides the number of elements in the dataset
device: the PyTorch device to use (defaults to default_device())
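
Since valid_idx is just a list of row indices, a reproducible hold-out can be prepared before calling a factory method. The following is a minimal sketch using only the Python standard library; make_valid_idx and its valid_pct and seed parameters are hypothetical names chosen for illustration and are not part of fastai. It assumes the indices refer to row positions in the DataFrame.

```python
import random

def make_valid_idx(n_rows, valid_pct=0.2, seed=42):
    """Return a sorted list of row indices to hold out for validation."""
    rng = random.Random(seed)          # local RNG so the split is reproducible
    n_valid = int(n_rows * valid_pct)  # number of held-out rows
    return sorted(rng.sample(range(n_rows), n_valid))

# Hypothetical dataset size of 1000 rows, holding out 20% of them
valid_idx = make_valid_idx(n_rows=1000, valid_pct=0.2)
print(len(valid_idx))  # 200
```

The resulting list could then be passed as valid_idx to a factory method, in the same way as list(range(800,1000)) in the adult example.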

Let's look at an example with the adult dataset:

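from fastai.tabular.all import *

# Download the sample of the adult dataset and load it into a DataFrame
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df.head()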

# Columns to treat as categorical and as continuous variables
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
# Preprocessing steps applied to the data
procs = [Categorify, FillMissing, Normalize]

# Rows 800 to 999 form the validation set; batches contain 64 rows
dls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names,
                                 y_names="salary", valid_idx=list(range(800,1000)), bs=64)

dls.show_batch()
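
The procs list in the example names three preprocessing steps. As a rough conceptual sketch, and not fastai's actual implementation, the plain-Python code below imitates what such steps typically do on a tiny hand-made table: integer-encoding a categorical column, filling a missing continuous value with the median while recording a flag column (compare the education-num_na column in the displayed batch), and standardizing to zero mean and unit variance. The sample rows, the helper column names workclass_code and education-num_norm, and the choice of median and population standard deviation are assumptions made for illustration.

```python
from statistics import mean, median, pstdev

# Tiny hand-made table (hypothetical values) with one missing entry
rows = [
    {"workclass": "Private",   "education-num": 12.0},
    {"workclass": "State-gov", "education-num": None},
    {"workclass": "Private",   "education-num": 14.0},
    {"workclass": "Local-gov", "education-num": 10.0},
]

# Categorify-like step: map each category to an integer code (0 kept for unknowns)
categories = sorted({r["workclass"] for r in rows})
vocab = {cat: i + 1 for i, cat in enumerate(categories)}
for r in rows:
    r["workclass_code"] = vocab.get(r["workclass"], 0)

# FillMissing-like step: flag missing values, then fill them with the median
present = [r["education-num"] for r in rows if r["education-num"] is not None]
fill_value = median(present)
for r in rows:
    r["education-num_na"] = r["education-num"] is None
    if r["education-num_na"]:
        r["education-num"] = fill_value

# Normalize-like step: subtract the mean and divide by the standard deviation
values = [r["education-num"] for r in rows]
mu, sigma = mean(values), pstdev(values)
for r in rows:
    r["education-num_norm"] = (r["education-num"] - mu) / sigma

for r in rows:
    print(r)
```

In a real pipeline the statistics (vocabulary, median, mean, standard deviation) would normally be computed on the training rows only and then reused for the validation rows.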

The same DataLoaders can be built directly from the CSV file with from_csv:

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, procs=procs, cat_names=cat_names, cont_names=cont_names,
                                  y_names="salary", valid_idx=list(range(800,1000)), bs=64)

The first rows of the DataFrame, as displayed by df.head():

	age	workclass	fnlwgt	education	education-num	marital-status	occupation	relationship	race	sex	capital-gain	capital-loss	hours-per-week	native-country	salary
0	49	Private	101320	Assoc-acdm	12.0	Married-civ-spouse	NaN	Wife	White	Female	0	1902	40	United-States	>=50k
1	44	Private	236746	Masters	14.0	Divorced	Exec-managerial	Not-in-family	White	Male	10520	0	45	United-States	>=50k
2	38	Private	96185	HS-grad	NaN	Divorced	NaN	Unmarried	Black	Female	0	0	32	United-States	<50k
3	38	Self-emp-inc	112847	Prof-school	15.0	Married-civ-spouse	Prof-specialty	Husband	Asian-Pac-Islander	Male	0	0	40	United-States	>=50k
4	42	Self-emp-not-inc	82297	7th-8th	NaN	Married-civ-spouse	Other-service	Wife	Black	Female	0	0	50	United-States	<50k

A batch of processed data, as displayed by dls.show_batch():

	workclass	education	marital-status	occupation	relationship	race	education-num_na	age	fnlwgt	education-num	salary
0	Private	HS-grad	Divorced	Sales	Not-in-family	White	False	40.0	116632.001407	9.0	>=50k
1	State-gov	Some-college	Never-married	Protective-serv	Own-child	Black	False	22.0	293363.998886	10.0	<50k
2	Private	HS-grad	Divorced	Craft-repair	Own-child	White	False	35.0	126568.998886	9.0	<50k
3	Private	Masters	Divorced	Exec-managerial	Unmarried	Black	False	39.0	150061.001071	14.0	>=50k
4	Private	Some-college	Never-married	Sales	Own-child	White	False	21.0	283756.998474	10.0	<50k
5	Private	Masters	Married-civ-spouse	Sales	Husband	White	False	29.0	134565.997603	14.0	<50k
6	Self-emp-not-inc	HS-grad	Married-civ-spouse	Farming-fishing	Husband	White	False	39.0	148442.999504	9.0	<50k
7	Private	Some-college	Married-civ-spouse	Adm-clerical	Husband	White	False	49.0	280524.999991	10.0	>=50k
8	Local-gov	HS-grad	Divorced	Handlers-cleaners	Not-in-family	White	False	39.0	166497.000063	9.0	>=50k
9	?	11th	Never-married	?	Own-child	White	False	17.0	47407.001911	7.0	<50k

Tabular data

`class` `TabularDataLoaders`[source]

`TabularDataLoaders.from_df`[source]

`TabularDataLoaders.from_csv`[source]
