This user guide can be used as a starting point for getting a deeper understanding of the inner workings of the bandsaw library. It is meant for users who want to learn about individual details or who plan to extend its features by developing their own advices or extensions.
First, we start with some high-level concepts that we use throughout the library. The purpose of this section is to explain the structure of bandsaw and introduce a common set of terms that helps to talk about the underlying ideas.
The fundamental basis in bandsaw is the idea of a workflow. We think of a workflow as a single python script that contains the code for all individual steps, which are performed in a defined sequence.
The individual steps can have dependencies on external inputs and on results of other
steps, so that they form a directed acyclic graph. Those steps are referred to as
Tasks within bandsaw.
Tasks are pieces of code that are used to process data. Task instances can be created
by calling the
create_task(cls, obj) class
method of the
Task class. At the moment, bandsaw
supports only free functions defined on module level as tasks.
Usually, tasks are defined by adding the
@bandsaw.task
decorator to the code that should be executed as a task.
@bandsaw.task
def my_function(x):
    return x
Each task has a unique
task_id that is derived from the code that is executed. This can
be used to differentiate between different tasks.
Tasks can take arbitrary arguments, which means we can have multiple executions of the
same task that differ in the given arguments. This is captured by a
Run object, which encapsulates the arguments for a specific
task execution. Similar to the
task_id, each run has a separate
run_id that differentiates between different runs.
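The relationship between tasks, runs and their ids can be sketched with plain hashing. This is only an illustration and not bandsaw's actual derivation: the helper names `derive_task_id` and `derive_run_id`, the use of SHA-256 and the JSON encoding of the arguments are all assumptions made for this example.

```python
import hashlib
import json


def derive_task_id(func):
    """Hypothetical helper: hash the function's name and compiled bytecode."""
    digest = hashlib.sha256()
    digest.update(func.__qualname__.encode("utf-8"))
    digest.update(func.__code__.co_code)
    return digest.hexdigest()[:16]


def derive_run_id(args, kwargs):
    """Hypothetical helper: hash a canonical JSON form of the arguments."""
    payload = json.dumps({"args": list(args), "kwargs": kwargs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def double(x):
    return 2 * x


def square(x):
    return x * x


# Different code gives different task ids; the id is stable for the same code.
assert derive_task_id(double) != derive_task_id(square)
assert derive_task_id(double) == derive_task_id(double)

# The same task can have many runs, one per distinct set of arguments.
assert derive_run_id((1,), {}) == derive_run_id((1,), {})
assert derive_run_id((1,), {}) != derive_run_id((2,), {})
```

The point of the sketch is the split of responsibilities: one id identifies the code, the other identifies one particular set of arguments for that code.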
Objects that implement the
Advice protocol are the
mechanism that allows bandsaw to influence the execution of the task. Each advice class
can implement two different methods,
before(self, session) and
after(self, session). The
before() method is called before the task is actually executed and allows the
advice to make changes to the way the task is executed, e.g. running the task on a
different system or returning a result early without executing the task at all. The
after() method is called after the task was actually executed and returned a
result. This allows an advice to make changes to the result or use it in a different way
than just returning it.
Both methods decide what happens next by calling the appropriate method on the
session, the sole argument that both methods take.
Session is the object that manages the process
of executing a task for a specific run. It defines the different actions that advices
can take when their
before() and
after() methods are called.
When a task is called within a workflow, a new
Session object is instantiated
with the task, a run object containing the task's arguments and a list of advices, called
the "advice chain", that should be used for advising the execution.
before() executing a task🔗
The session calls the
before() methods of all advices one after another in the order
in which the advices are defined in the advice chain. Each
Advice has to tell the session how to continue. There are two options:
- continue with the next advice by calling
session.proceed()
- return early by skipping the following advices AND the actual task execution by calling
session.conclude(result) and providing a
Result that should be used instead. After the advice has concluded with a result, the only advices whose
after() methods will be called are the advices that come before it in the advice chain.
Executing the task🔗
If all advices of the advice chain decided to
proceed(), the session will execute
the task and keep the value it returns. If an exception or error is raised during
the execution, it will be stored in the
Result instance in the session.
after() executing a task🔗
After the task has been executed, the session begins to call the
after() methods of
the advices in the REVERSED order of the advice chain. This means the advice that was
called last for
before() is called first for
after().
Now each advice can decide how to continue. Similar to
before(), there are two options:
- continue with the current result and the next advice by calling
session.proceed()
- return a different result and continue with that by calling
session.conclude(result) with a different
Result instance. All following advices will be called using
after() with a session containing the new result.
Once all advices have finished, the session will unpack the result and either return its value to the workflow or re-raise the error.
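The control flow described above can be sketched with a small stand-in. `ToySession`, its `initiate()` method, the simplified `Result` class and the two example advices are hypothetical and written only for this illustration; they mimic the documented `proceed()`/`conclude()` behavior but are not bandsaw's implementation.

```python
class Result:
    """Simplified stand-in: holds either the returned value or the raised exception."""

    def __init__(self, value=None, exception=None):
        self.value = value
        self.exception = exception


class ToySession:
    """Simplified stand-in for a session, NOT bandsaw's Session class."""

    def __init__(self, func, args, advices):
        self.func = func
        self.args = args
        self.advices = list(advices)
        self.result = None
        self._decision = None

    def proceed(self):
        self._decision = "proceed"

    def conclude(self, result):
        self.result = result
        self._decision = "conclude"

    def _advise(self, method):
        # Every advice call must end in exactly one decision.
        self._decision = None
        method(self)
        if self._decision is None:
            raise RuntimeError("advice called neither proceed() nor conclude()")
        return self._decision

    def initiate(self):
        entered = []
        concluded_early = False
        # before(): in chain order, until one advice concludes early.
        for advice in self.advices:
            if self._advise(advice.before) == "conclude":
                concluded_early = True
                break
            entered.append(advice)
        # Execute the task only if every advice decided to proceed.
        if not concluded_early:
            try:
                self.result = Result(value=self.func(*self.args))
            except Exception as error:
                self.result = Result(exception=error)
        # after(): reversed order, only for advices that proceeded in before().
        for advice in reversed(entered):
            self._advise(advice.after)
        # Unpack the result: return the value or re-raise the error.
        if self.result.exception is not None:
            raise self.result.exception
        return self.result.value


class Trace:
    """Example advice that records when it is called and always proceeds."""

    def __init__(self, name, log):
        self.name, self.log = name, log

    def before(self, session):
        self.log.append("before " + self.name)
        session.proceed()

    def after(self, session):
        self.log.append("after " + self.name)
        session.proceed()


class ShortCircuit:
    """Example advice that concludes in before(), so the task never runs."""

    def before(self, session):
        session.conclude(Result(value="cached"))

    def after(self, session):
        session.proceed()


def add_one(x):
    return x + 1


log = []
value = ToySession(add_one, (1,), [Trace("a", log), Trace("b", log)]).initiate()

early_log = []
early_chain = [Trace("a", early_log), ShortCircuit(), Trace("b", early_log)]
early_value = ToySession(add_one, (1,), early_chain).initiate()
```

In the first run, `log` records `before a`, `before b`, `after b`, `after a`. In the second run, only advice `a` is called, because `ShortCircuit` concludes before advice `b` and the task are reached.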
Serializing the session🔗
In order to move the execution of a task across different python interpreters, an advice
can use the capability of a
Session to serialize its state to a stream and recreate it
at a later point in time or on a different platform.
For this, the session provides two methods: one that serializes its state to a stream, and
restore(stream), which recreates the session from such a stream. Both methods will
(de-)serialize the complete session including context, task, run and result. The only
things missing here are the objects from the advice chain, since bandsaw can't force them
to be serializable. This means that the same advice chain with the same name must be
available from the configuration at the time the session is restored.
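Why the advice chain has to be available by name on the restoring side can be sketched as follows. The functions `save_state` and `restore_state`, the `ADVICE_CHAINS` dictionary and the JSON payload layout are assumptions made for this example and do not reflect bandsaw's stream format.

```python
import io
import json

# Stand-in for the configuration on the restoring side: chains looked up by name.
ADVICE_CHAINS = {"default": ["logging-advice", "caching-advice"]}


def save_state(state, chain_name, stream):
    """Write everything except the advice objects; keep only the chain's name."""
    payload = {"chain": chain_name, "state": state}
    stream.write(json.dumps(payload).encode("utf-8"))


def restore_state(stream, chains):
    """Read the state back and re-attach the chain from the local configuration."""
    payload = json.loads(stream.read().decode("utf-8"))
    advices = chains[payload["chain"]]  # KeyError if the chain is not configured here
    return payload["state"], advices


stream = io.BytesIO()
original = {"task": "my_function", "args": [1, 2], "result": None}
save_state(original, "default", stream)

stream.seek(0)  # rewind, as if the stream had been sent to another interpreter
state, advices = restore_state(stream, ADVICE_CHAINS)
```

Only the name of the chain travels with the serialized state, so the lookup on the restoring side fails if that name is not configured there.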
For an example of how the transfer to a different python interpreter can be implemented, please
look at the implementation of the
Bandsaw needs configuration to know which advices to apply to the individual tasks. This
configuration is given in the form of an object of the
bandsaw.config.Configuration class. Just
creating a new
Configuration object will create an empty, but working configuration
that does nothing and executes tasks without any changes.
import bandsaw
configuration = bandsaw.Configuration()
The class has all the required methods to configure the different aspects of bandsaw.
An advice chain is a sequence of objects implementing the
Advice protocol that should
be used for advising task executions. An advice chain is added to the configuration
with the add_advice_chain() method. It takes instances of
Advice as positional arguments and an optional name:
...
configuration.add_advice_chain(
    bandsaw.advices.logging.LoggingAdvice(),
)
Each advice chain has a name that can be used to choose which chain to use per task.
If no name is given, the chain with name 'default' is configured. Already existing
chains will be overwritten. So if you configure two different advice chains with no or
the same name, the latter will replace the former.
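The naming and overwrite rules can be illustrated with a small stand-in. `ToyConfiguration` and its `add_chain`/`get_chain` methods are hypothetical names used only here; the sketch models nothing but a mapping from chain names to lists of advices.

```python
class ToyConfiguration:
    """Stand-in that only models named advice chains, not bandsaw's Configuration."""

    def __init__(self):
        self._chains = {}

    def add_chain(self, *advices, name="default"):
        # Re-using a name replaces the chain that was registered before.
        self._chains[name] = list(advices)
        return self

    def get_chain(self, name="default"):
        # A missing chain surfaces as a KeyError.
        return self._chains[name]


config = ToyConfiguration()
config.add_chain("logging")                    # becomes the 'default' chain
config.add_chain("caching")                    # replaces the 'default' chain
config.add_chain("logging", name="my_chain")   # separate, named chain

default_chain = config.get_chain()
named_chain = config.get_chain("my_chain")

try:
    config.get_chain("unknown")
    missing_raises = False
except KeyError:
    missing_raises = True
```

After these calls, the default chain contains only the second registration, while `my_chain` is unaffected.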
In order to transfer tasks between different python interpreters, bandsaw needs the
capability to serialize tasks, their arguments and internal classes. For this, bandsaw
defines a serializer class that can be implemented to support different types of serialization.
Which serializer to use can be configured as part of the configuration:
from bandsaw.serialization.json import JsonSerializer
...
configuration.set_serializer(JsonSerializer())
Bandsaw comes with two different serialization formats:
The pickle-based serializer has
the advantage that it works out of the box with most standard python types. It uses
the standard python
pickle module
to serialize python objects and should work across different python versions. If custom
types that don't work with pickle need to be serialized (e.g. as part of some arguments
to a task), support for pickle can easily be added to them.
One disadvantage of pickle is that sometimes the serialized representation of a value
is not unique. Since bandsaw uses the serialized form of arguments to derive the
run_id of a run, this can lead to inconsistencies, when the same arguments can lead
to different run_ids.
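One way to see this effect with only the standard library is to pickle two equal dictionaries that were built in a different key order. The snippet does not use bandsaw; the comparison with sorted JSON is included only to show what a canonical form looks like and makes no claim about how bandsaw's serializers order keys.

```python
import json
import pickle

# Two dictionaries with the same content, built in a different order.
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}

# As values they are equal ...
values_equal = first == second

# ... but pickle writes the items in insertion order, so the bytes differ.
pickles_equal = pickle.dumps(first) == pickle.dumps(second)

# A canonical form, here JSON with sorted keys, is identical for both.
canonical_equal = json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)

print(values_equal, pickles_equal, canonical_equal)
```

Any id that is derived from the pickled bytes would therefore differ for these two equal arguments.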
JsonSerializer uses JSON as the
format for the serialized data. The standard
json library supports only primitive types like strings, dict, int etc., so for all complex
types one needs to explicitly add code to serialize them. Bandsaw implements support for
serializing exceptions, tuples and all of its internal classes, so that they can be
serialized to json.
To serialize a custom type to json, bandsaw offers two options:
The easiest way to make your custom type json serializable is to inherit from the
SerializableValue base class
and implement its two abstract methods.
serialized(self) must return a value that is json serializable (e.g. a dict
containing only primitives). The class method
deserialize(cls, values) is given this
value and returns a new instance of the class.
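The round trip through these two methods can be sketched without importing bandsaw. `SerializableValueSketch` below is a local stand-in that only mirrors the two method signatures named above; with bandsaw installed, one would inherit from its SerializableValue base class instead. The `Point` class is a made-up example type.

```python
import abc
import json


class SerializableValueSketch(abc.ABC):
    """Local stand-in mirroring serialized(self) and deserialize(cls, values)."""

    @abc.abstractmethod
    def serialized(self):
        """Return a representation that consists only of JSON primitives."""

    @classmethod
    @abc.abstractmethod
    def deserialize(cls, values):
        """Create a new instance from the value returned by serialized()."""


class Point(SerializableValueSketch):
    """Made-up custom type used as a task argument."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def serialized(self):
        return {"x": self.x, "y": self.y}

    @classmethod
    def deserialize(cls, values):
        return cls(values["x"], values["y"])


# Round trip: object -> primitives -> JSON text -> primitives -> object.
text = json.dumps(Point(1, 2).serialized())
restored = Point.deserialize(json.loads(text))
```

The important property is that `deserialize()` accepts exactly what `serialized()` produced after it has gone through JSON.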
Create a ValueSerializer🔗
If the custom type can't be changed, there is the option to create a new
ValueSerializer class that
can serialize this type. It consists of 3 different methods:
- The first has to return True if value is of the type that this particular serializer can serialize.
- The second has to return the serialized representation of the type, consisting only of primitives that are json serializable.
- The third has to return a new instance of custom_type from its serialized representation.
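The three responsibilities can be sketched for a type that cannot be changed, here `fractions.Fraction` from the standard library. The class below does not inherit from bandsaw's ValueSerializer, and the method names `can_serialize`, `serialize` and `deserialize` are placeholders chosen for this example; the actual names have to be taken from the ValueSerializer class.

```python
import json
from fractions import Fraction


class FractionSerializerSketch:
    """Hypothetical serializer for a type we cannot modify (method names are assumed)."""

    def can_serialize(self, value):
        # True only for the type this serializer is responsible for.
        return isinstance(value, Fraction)

    def serialize(self, value):
        # Only JSON primitives: a dict of two integers.
        return {"numerator": value.numerator, "denominator": value.denominator}

    def deserialize(self, representation):
        # Rebuild a new instance from the primitives.
        return Fraction(representation["numerator"], representation["denominator"])


serializer = FractionSerializerSketch()
value = Fraction(3, 4)

if serializer.can_serialize(value):
    text = json.dumps(serializer.serialize(value))
    restored = serializer.deserialize(json.loads(text))
```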
Finally, the new
ValueSerializer needs to be added to the json serializer as part
of the configuration:
How to use bandsaw🔗
Now with some knowledge about the different concepts within bandsaw at our hands, let's dive into the topic of how to put the library to good use.
Install the library🔗
Use stable release from PyPI🔗
All stable versions of bandsaw are available on PyPI
and can be downloaded and installed from there. The easiest option to get it installed
into your python environment is by using pip:
$ pip install bandsaw
Use from source🔗
Bandsaw's Git repository is available for everyone and can easily be cloned to your local machine:
$ cd /your/local/directory
$ git clone https://gitlab.com/kantai/bandsaw.git
$ cd bandsaw
If you want to make changes to the library, please follow the guidance in the README.md on how to set up the necessary tools for testing your changes.
If you just want to use the library, it is sufficient to add the path to your local
bandsaw repository to your
$PYTHONPATH variable, e.g.:
$ export PYTHONPATH="$PYTHONPATH:/your/local/directory/bandsaw"
Defining the individual tasks of your workflow🔗
In order to use bandsaw in your workflow, you first have to import its package.
import bandsaw
...
Splitting up your workflow into individual tasks can be done by annotating the
individual functions with the
@bandsaw.task decorator:
@bandsaw.task
def my_function(x):
    ...
    return x
In the current implementation, the decorator can only be applied to free functions, so it isn't possible to decorate methods of classes for now:
class MyClass:
    # Not supported: the decorator cannot be applied to methods.
    @bandsaw.task
    def my_method(self, x):
        return x
Bandsaw loads its configuration automatically at the time it is needed. This happens by
dynamically importing a python module that contains an instance of the class
Configuration assigned to the variable
configuration:
import bandsaw
configuration = bandsaw.Configuration()
Bandsaw expects the configuration to be found in a default python module. The
name of the module can be changed in two ways:
- Set the
BANDSAW_CONFIG environment variable to a different module name. Bandsaw reads the name of the module containing the configuration from this variable if it is set. So by setting it to e.g.
my_config, it will import
my_config and take the configuration from it instead.
- Add an additional keyword argument to the
@task decorator that contains the name of the configuration module to use for this task:
@bandsaw.task(config='my_config_module')
def my_function(x):
    ...
    return x
By adding the
config keyword argument, we can choose the configuration we want to use per task.
The existence of the configuration is checked as soon as the decorator is being applied. If no configuration could be loaded, an error is raised, usually at module import time.
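The lookup through the environment variable can be sketched with the standard library. This is an assumption about the mechanism, not bandsaw's code: `load_configuration` and the default module name passed to it are hypothetical, and a plain dictionary takes the place of a Configuration instance.

```python
import importlib
import os
import sys
import types


def load_configuration(default_module, env_var="BANDSAW_CONFIG"):
    """Sketch: the environment variable, if set, overrides the default module name."""
    module_name = os.environ.get(env_var) or default_module
    module = importlib.import_module(module_name)
    try:
        return module.configuration
    except AttributeError:
        raise LookupError(f"module {module_name!r} defines no 'configuration'") from None


# Demo: register an in-memory module instead of writing my_config.py to disk.
my_config = types.ModuleType("my_config")
my_config.configuration = {"chains": {}}  # placeholder for a Configuration instance
sys.modules["my_config"] = my_config

os.environ["BANDSAW_CONFIG"] = "my_config"
loaded = load_configuration(default_module="hypothetical_default_config")
```

Because the module is imported when the configuration is first needed, a wrong module name shows up as an import error at that point.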
Choose the advice chain🔗
Each task can define which advice chain should be used. This can be done by adding the
chain keyword argument to the task decorator:
@bandsaw.task(chain='my_chain')
def my_function(x):
    ...
    return x
If no chain exists with the specified name, a
KeyError is raised.
Bandsaw is extensible and allows developers to add new functionality in various ways:
Implementing custom advices🔗
Advices are the main building blocks for adding new functionality. Implementing an advice is easy: the only necessary work is to adhere to a protocol and follow some simple rules.
- Advices have to call the session.
When called via
before() or
after(), the advice is responsible for telling the
session how to continue. It must call exactly once either
session.proceed() or
session.conclude(). Failure to do so will lead to a
RuntimeError raised by the session.
- Advices shouldn't keep state themselves.
Advice instances are used for multiple tasks. Therefore, their implementation shouldn't
store any task specific state themselves. If task specific state is necessary, it can be
easily added to the
context of the session.
- Advices may be instantiated multiple times and on different machines.
Every time a task is transferred to a different python interpreter, its configuration and
with it the configured advices are instantiated again, even if they aren't actually
called. Therefore, advices shouldn't do much work or interact with their environment
at instantiation time, but only when they are actually called via
before() or
after(). Usually they can assume that both methods are called in the same environment for the same task.
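The rules above can be put together in a small example advice. `TimingAdvice` is hypothetical, and the sketch assumes that the session's context behaves like a dictionary; `FakeSession` is a minimal stand-in used to exercise the advice without bandsaw and implements only what this example needs.

```python
import time


class TimingAdvice:
    """Hypothetical advice that measures how long a task takes."""

    # No __init__ work: the advice may be instantiated without ever being called.

    def before(self, session):
        # Task specific state goes into the session's context, not onto self.
        session.context["timing.start"] = time.monotonic()
        session.proceed()  # exactly one decision per call

    def after(self, session):
        started = session.context.pop("timing.start")
        session.context["timing.duration"] = time.monotonic() - started
        session.proceed()


class FakeSession:
    """Minimal stand-in for a session: a context and a proceed() counter."""

    def __init__(self):
        self.context = {}
        self.proceed_calls = 0

    def proceed(self):
        self.proceed_calls += 1


advice = TimingAdvice()
session = FakeSession()
advice.before(session)
advice.after(session)
```

Since the advice stores nothing on itself, the same instance can be reused for any number of tasks and sessions.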