Tracking the runs of the workflow🔗

Since most of the time workflows are run multiple times, it can be useful to have a history of past runs and their results. For this bandsaw comes with a bandsaw.tracking.tracker.TrackerExtension.

This extension tracks information about the different parts of a workflow, be it the run itself or the executed tasks with their result. This information is then send to a backend where the data is stored and can be accessed and integrated into ones own process and monitoring solution. Currently, the only available backend is the bandsaw.tracking.filesystem.FileSystemBackend which stores the data in json format in a local directory.

Configuration🔗

backend (bandsaw.tracking.backend.Backend)🔗

backend is a required positional argument. It contains the pre-configured backend instance that will be used for storing the tracking information. The object has to inherit from type bandsaw.tracking.backend.Backend.

Example configuration🔗

import bandsaw

from bandsaw.tracking.tracker import TrackerExtension
from bandsaw.tracking.filesystem import FileSystemBackend

tracking_directory = '/path/to/my/tracking/directory'

configuration = bandsaw.Configuration()
configuration.add_extension(TrackerExtension(FileSystemBackend(tracking_directory)))

Backends🔗

All backends need to inherit from the base class bandsaw.tracking.backend.Backend which defines the interface that is being expected from the tracker extension. It defines a set of methods which take the information about the workflow:

def track_run(self, ids, run_info):
    pass

def track_task(self, ids, task_info):
    pass

def track_execution(self, ids, execution_info):
    pass

def track_session(self, ids, session_info):
    pass

def track_result(self, ids, result_info):
    pass

def track_attachments(self, ids, attachments):
    pass

def track_distribution_archive(self, distribution_archive):
    pass

All *_infos that are passed over are dictionaries which contain the information.

FileSystemBackend🔗

The first available backend is the bandsaw.tracking.filesystem.FileSystemBackend which writes the info objects to the file system. For this it takes a path to a directory as configuration, where the individual *-info.json files are stored. The directory layout follows a specific schema:

.tracking/
├── runs
│   ├── 025c8e7e-6992-11ec-8e2d-48f17f64520d
│   │   ├── 0d268ac06a82654e_76560cb43662cc8f_025c8e7e-6992-11ec-8e2d-48f17f64520d
│   │   └── run-info.json
│   └── 6aa525e8-698f-11ec-8e2d-48f17f64520d
│       ├── 0d268ac06a82654e_76560cb43662cc8f_6aa525e8-698f-11ec-8e2d-48f17f64520d
│       └── run-info.json
└── tasks
    └── 0d268ac06a82654e
        ├── 76560cb43662cc8f
        │   ├── 025c8e7e-6992-11ec-8e2d-48f17f64520d
        │   │   ├── attachments
        │   │   │   ├── metrics.json
        │   │   │   └── session.log
        │   │   ├── result-info.json
        │   │   └── session-info.json
        │   ├── 6aa525e8-698f-11ec-8e2d-48f17f64520d
        │   │   ├── attachments
        │   │   │   ├── metrics.json
        │   │   │   └── session.log
        │   │   ├── result-info.json
        │   │   └── session-info.json
        │   └── execution-info.json
        └── task-info.json

tasks contains on its level a single directory for each individual task whose name is the task_id of the containing task. The task directory contains a single task-info.json file, which contains a json object with some information about this specific tasks. Every execution of this task is stored in a separate subdirectory named with the execution_id of the corresponding execution. These directories contain a execution-info.json file with meta information about the arguments used in this execution and subdirectories named by the run_id for every specific run, where the task was executed with these specific arguments. This directory contains then an session-info.json with information about the session of this execution and the result-info.json which describes the computed result. If attachments where created by the advice chain while executing the task, those are stored in the attachments directory. Additionally, in the root of the tracking directory a directory runs stores information about the executed sessions for each individual run. Every run has their own specific directory named by its run_id This directory contains a run-info.json and empty files named with the session_ids of every session that was computed during this run.

Example JSON info file format🔗

All *-info.json files have the same format and share some of their content. An example can be found below. Most files contain only a subset of this example data.

{
  "task": {
    "id": "0d268ac06a82654e",
    "definition": "demo.greet",
    "advice_parameters": {}
  },
  "execution": {
    "id": "76560cb43662cc8f",
    "arguments": [
      {
        "type": "str",
        "value": "Christoph",
        "size": "9",
        "name": "recipient"
      }
    ]
  },
  "run": {
    "id": "6aa525e8-698f-11ec-8e2d-48f17f64520d"
  },
  "configuration": "bandsaw_config",
  "distribution_archive": {
    "modules": [
      "__main__",
      "bandsaw",
      "bandsaw_config",
      "multimeter"
    ],
    "id": null
  },
  "session": {
    "id": "0d268ac06a82654e_76560cb43662cc8f_6aa525e8-698f-11ec-8e2d-48f17f64520d"
  },
  "result": {
    "value": {
      "type": "int",
      "value": "16421"
    }
  }
}

Last update: 2021-12-31