Helper functions to get data into a `DataLoaders` for the tabular application, and the higher-level class `TabularDataLoaders`

The main class to get your data ready for model training is TabularDataLoaders and its factory methods. Check out the tabular tutorial for usage examples.

class TabularDataLoaders

TabularDataLoaders(*loaders, path='.', device=None) :: DataLoaders

Basic wrapper around several DataLoaders with factory methods for tabular data

This class should not be used directly; prefer one of the factory methods instead. All of those factory methods accept the following arguments:

  • cat_names: the names of the categorical variables
  • cont_names: the names of the continuous variables
  • y_names: the names of the dependent variables
  • y_block: the TransformBlock to use for the target
  • valid_idx: the indices to use for the validation set (defaults to a random split otherwise)
  • bs: the batch size
  • val_bs: the batch size for the validation DataLoader (defaults to bs)
  • shuffle_train: whether to shuffle the training DataLoader
  • n: overrides the number of elements in the dataset
  • device: the PyTorch device to use (defaults to default_device())
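
To make the role of valid_idx concrete, here is a minimal pure-Python sketch of how a list of validation indices partitions the rows of a dataset. The helper `split_by_valid_idx` and the row count of 1200 are hypothetical and only illustrate the idea; this is not fastai's own splitting code.

```python
def split_by_valid_idx(n_rows, valid_idx):
    """Return (train_idx, valid_idx) for a dataset with n_rows rows.

    Hypothetical helper: rows listed in valid_idx go to the validation set,
    every other row goes to the training set.
    """
    valid = set(valid_idx)
    bad = [i for i in valid if not 0 <= i < n_rows]
    if bad:
        raise IndexError(f"validation indices out of range: {sorted(bad)[:5]}")
    train_idx = [i for i in range(n_rows) if i not in valid]
    return train_idx, sorted(valid)


# Assumed dataset size, chosen only for illustration.
train_idx, valid_idx = split_by_valid_idx(1200, list(range(800, 1000)))
print(len(train_idx), len(valid_idx))  # 1000 200
```

Passing an explicit index list like this keeps the validation set fixed across runs, which a random split would not guarantee unless it is seeded.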

TabularDataLoaders.from_df

TabularDataLoaders.from_df(df, path='.', procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, valid_idx=None, bs=64, val_bs=None, shuffle_train=True, n=None, device=None)

Create from df in path using procs

Let's have a look at an example with the adult dataset:

from fastai.tabular.all import *

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df.head()

|   | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49 | Private | 101320 | Assoc-acdm | 12.0 | Married-civ-spouse | NaN | Wife | White | Female | 0 | 1902 | 40 | United-States | >=50k |
| 1 | 44 | Private | 236746 | Masters | 14.0 | Divorced | Exec-managerial | Not-in-family | White | Male | 10520 | 0 | 45 | United-States | >=50k |
| 2 | 38 | Private | 96185 | HS-grad | NaN | Divorced | NaN | Unmarried | Black | Female | 0 | 0 | 32 | United-States | <50k |
| 3 | 38 | Self-emp-inc | 112847 | Prof-school | 15.0 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | United-States | >=50k |
| 4 | 42 | Self-emp-not-inc | 82297 | 7th-8th | NaN | Married-civ-spouse | Other-service | Wife | Black | Female | 0 | 0 | 50 | United-States | <50k |

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names, 
                                 y_names="salary", valid_idx=list(range(800,1000)), bs=64)
dls.show_batch()

|   | workclass | education | marital-status | occupation | relationship | race | education-num_na | age | fnlwgt | education-num | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Private | HS-grad | Divorced | Sales | Not-in-family | White | False | 40.0 | 116632.001407 | 9.0 | >=50k |
| 1 | State-gov | Some-college | Never-married | Protective-serv | Own-child | Black | False | 22.0 | 293363.998886 | 10.0 | <50k |
| 2 | Private | HS-grad | Divorced | Craft-repair | Own-child | White | False | 35.0 | 126568.998886 | 9.0 | <50k |
| 3 | Private | Masters | Divorced | Exec-managerial | Unmarried | Black | False | 39.0 | 150061.001071 | 14.0 | >=50k |
| 4 | Private | Some-college | Never-married | Sales | Own-child | White | False | 21.0 | 283756.998474 | 10.0 | <50k |
| 5 | Private | Masters | Married-civ-spouse | Sales | Husband | White | False | 29.0 | 134565.997603 | 14.0 | <50k |
| 6 | Self-emp-not-inc | HS-grad | Married-civ-spouse | Farming-fishing | Husband | White | False | 39.0 | 148442.999504 | 9.0 | <50k |
| 7 | Private | Some-college | Married-civ-spouse | Adm-clerical | Husband | White | False | 49.0 | 280524.999991 | 10.0 | >=50k |
| 8 | Local-gov | HS-grad | Divorced | Handlers-cleaners | Not-in-family | White | False | 39.0 | 166497.000063 | 9.0 | >=50k |
| 9 | ? | 11th | Never-married | ? | Own-child | White | False | 17.0 | 47407.001911 | 7.0 | <50k |
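
The extra education-num_na column in the batch above comes from the preprocessing steps passed as procs. As a rough illustration of what the three steps do conceptually, here is a pure-Python sketch on a toy table. The helper names, the toy values, and the choice of population standard deviation are assumptions made for this example; fastai's actual implementation operates on pandas data and differs in detail.

```python
from statistics import mean, median, pstdev

# Toy rows; None marks a missing value.
rows = [
    {"workclass": "Private", "education-num": 12.0},
    {"workclass": "Self-emp-inc", "education-num": None},
    {"workclass": "Private", "education-num": 15.0},
    {"workclass": None, "education-num": 9.0},
]


def categorify(values):
    """Map each category to an integer code, keeping 0 for missing values."""
    vocab = ["#na#"] + sorted({v for v in values if v is not None})
    codes = {v: i for i, v in enumerate(vocab)}
    return [codes.get(v, 0) for v in values], vocab


def fill_missing(values):
    """Fill gaps with the median and record which entries were missing."""
    fill = median(v for v in values if v is not None)
    was_na = [v is None for v in values]
    return [fill if v is None else v for v in values], was_na


def normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


work_codes, vocab = categorify([r["workclass"] for r in rows])
filled, edu_na = fill_missing([r["education-num"] for r in rows])
scaled = normalize(filled)

print(vocab)       # ['#na#', 'Private', 'Self-emp-inc']
print(work_codes)  # [1, 2, 1, 0]
print(edu_na)      # [False, True, False, False]
print(filled)      # [12.0, 12.0, 15.0, 9.0]
```

The boolean list plays the role of the `_na` column: it preserves the fact that a value was missing even after the gap has been filled. Note that show_batch displays decoded values, which is why the continuous columns appear in their original units rather than normalized.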

TabularDataLoaders.from_csv

TabularDataLoaders.from_csv(csv, path='.', procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, valid_idx=None, bs=64, val_bs=None, shuffle_train=True, n=None, device=None)

Create from csv file in path using procs

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, procs=procs, cat_names=cat_names, cont_names=cont_names, 
                                  y_names="salary", valid_idx=list(range(800,1000)), bs=64)
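
Given the matching argument lists, from_csv can be thought of as reading the file and then handing the resulting table to from_df. The toy class below sketches that delegation pattern with the standard library only; `ToyLoaders`, its fields, and the in-memory CSV are hypothetical stand-ins, not fastai code.

```python
import csv
import io


class ToyLoaders:
    """Hypothetical stand-in that only stores rows and column roles."""

    def __init__(self, rows, cat_names, cont_names, y_names):
        self.rows = rows
        self.cat_names, self.cont_names, self.y_names = cat_names, cont_names, y_names

    @classmethod
    def from_df(cls, rows, cat_names=None, cont_names=None, y_names=None):
        # Treat a list of dicts as the "data frame" in this sketch.
        return cls(rows, cat_names or [], cont_names or [], y_names)

    @classmethod
    def from_csv(cls, csv_file, **kwargs):
        # Read the file, then delegate everything else to from_df.
        rows = list(csv.DictReader(csv_file))
        return cls.from_df(rows, **kwargs)


text = "age,workclass,salary\n49,Private,>=50k\n38,Self-emp-inc,<50k\n"
toy = ToyLoaders.from_csv(io.StringIO(text), cat_names=["workclass"],
                          cont_names=["age"], y_names="salary")
print(len(toy.rows), toy.rows[0]["workclass"])  # 2 Private
```

Keeping all the preprocessing options in one method and letting the file-based constructor forward its keyword arguments means both entry points stay in sync.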