Experiment Tracking Tip

2020-09-03

Here’s an easy technique that has streamlined my workflows.

Often there are details that you want to track alongside your data. I work with collections of tabular data. For example, I may have seven datasets, each representing data collected on a specific day of the week, and I often want to run experiments on each one. I found it useful to create a dataset class, so that each dataset becomes an object I can instantiate and track.

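class dataset:
    """One dataset plus the metadata needed to run an experiment on it."""
    def __init__(self, path, target, rate, label='', note=''):
        self.path = path          # path to the data
        self.target = target      # target value (binary events are counted)
        self.rate = rate          # rate value supplied for this dataset
        self.label_path = label   # path to any label file
        self.note = note          # free-form notes about the dataset
        self.accuracy = 0         # filled in after evaluation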

For me, some key details are the path to the data, the target value (I’m counting binary events), the path to any label file, and any notes about the dataset. Obviously, this class could be extended with any other attributes (or even methods) you need. With the elements I have here, the object attributes capture enough information to load data, perform a test, and generate accuracy metrics. For example, if you had an evaluate() function and a model() function, you could write:

for data in datasets:
    data.accuracy = evaluate(model(data.path), data.label_path)
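To make that loop concrete, here is a minimal, self-contained sketch. Everything outside the class is a hypothetical placeholder: the file paths, the in-memory dictionaries standing in for files on disk, and the bodies of model() and evaluate(). The rate attribute is stored only so the constructor argument has somewhere to go.

```python
class dataset:
    def __init__(self, path, target, rate, label='', note=''):
        self.path = path
        self.target = target
        self.rate = rate  # assumption: kept as-is, its meaning is not described
        self.label_path = label
        self.note = note
        self.accuracy = 0


# Hypothetical stand-ins for files on disk, so the sketch runs without any I/O.
FAKE_PREDICTIONS = {
    "data/mon.csv": [1, 0, 1, 1],
    "data/tue.csv": [0, 0, 1, 0],
}
FAKE_LABELS = {
    "labels/mon.csv": [1, 0, 0, 1],
    "labels/tue.csv": [0, 0, 1, 0],
}


def model(path):
    """Hypothetical model: a real one would load the data at `path` and predict."""
    return FAKE_PREDICTIONS[path]


def evaluate(predictions, label_path):
    """Hypothetical metric: fraction of predictions that match the labels."""
    labels = FAKE_LABELS[label_path]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# One object per dataset, collected in a list.
datasets = [
    dataset("data/mon.csv", target=1, rate=1.0, label="labels/mon.csv", note="Monday"),
    dataset("data/tue.csv", target=1, rate=1.0, label="labels/tue.csv", note="Tuesday"),
]

# The experiment loop: each object carries everything the experiment needs.
for data in datasets:
    data.accuracy = evaluate(model(data.path), data.label_path)

for data in datasets:
    print(f"{data.note}: accuracy={data.accuracy:.2f}")
```

Because the result is written back onto the object, the list doubles as a record of the experiment once the loop finishes.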

This structure gives me an organized way to iterate through my datasets. I usually create a list of these objects, then perform my experiments by iterating over the list. In embarrassingly parallel scenarios, it’s easy to punt function calls on these dataset objects over to joblib.
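The parallel version might look something like the sketch below. To keep it dependency-free, it swaps in the standard library's concurrent.futures rather than joblib; a roughly equivalent joblib call is noted in a comment. The run_experiment() function, the paths, and the scores are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


class dataset:
    def __init__(self, path, target, rate, label='', note=''):
        self.path = path
        self.target = target
        self.rate = rate  # assumption: kept as-is, its meaning is not described
        self.label_path = label
        self.note = note
        self.accuracy = 0


# Hypothetical precomputed scores, standing in for a real model and metric.
HYPOTHETICAL_SCORES = {
    "data/mon.csv": 0.75,
    "data/tue.csv": 1.0,
    "data/wed.csv": 0.5,
}


def run_experiment(data):
    """Hypothetical experiment: returns a score instead of mutating `data`."""
    return HYPOTHETICAL_SCORES[data.path]


datasets = [dataset(path, target=1, rate=1.0) for path in HYPOTHETICAL_SCORES]

# Each dataset is independent, so the calls can run concurrently.
# pool.map returns results in the same order as the input list.
with ThreadPoolExecutor(max_workers=2) as pool:
    accuracies = list(pool.map(run_experiment, datasets))

# Roughly equivalent with joblib (not run here):
#   from joblib import Parallel, delayed
#   accuracies = Parallel(n_jobs=2)(delayed(run_experiment)(d) for d in datasets)

# Write the results back onto the objects in the parent.
for data, accuracy in zip(datasets, accuracies):
    data.accuracy = accuracy
```

Returning the score and assigning it afterwards is a deliberate choice: with process-based workers, each worker gets a copy of the dataset object, so an attribute set inside the worker would not reach the original.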

