=========
Autoflow
=========

As apps are defined as regular Python functions, we decided to also use Python
to define flows. The main goal was to free users from manually defining
dependencies between flow components.

The idea is to turn app processors into *function stubs* - empty functions
that copy the signature of the processor. This enables two important things:

1. The function stub gives the user a hint of what the processor expects as
   input and what it returns as output. It also copies docstrings, so the
   function is self-contained and its purpose is clear.

2. As the stub has no implementation, you are not required to install any of
   the dependencies needed to run the processor. Once you have installed the
   :code:`malevich` package, you can use a processor of any complexity (API
   calls, model inference, etc.) without installing extra packages.

To better understand what happens to apps when they are used in flows,
consider the following processor:

.. code-block:: python

    import pandas as pd
    from transformers import AutoTokenizer

    @processor()
    def tokenize(text: DF, ctx: Context) -> DF:
        """Tokenize texts using BERT tokenizer."""
        tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

        # Text pre-processing
        text = ...

        # Actual tokenization
        tokens = ...

        # Return a dataframe with tokens
        return pd.DataFrame({
            "tokens": ...
        })

Once you install the app, a function stub is created in the :code:`malevich`
package. The stub is a regular Python function that copies the signature of
the processor and has no implementation:

.. code-block:: python

    from malevich._autoflow.function import autotrace

    @autotrace
    def tokenize(text, config: dict = {}):
        """Tokenize texts using BERT tokenizer."""
        return OperationNode(...)

So, to define a flow with this processor, you do not need to install the
:code:`transformers` package. Simply import the function stub and use it as a
regular Python function:

.. code-block:: python

    from malevich import collection, flow
    from malevich.my_app import tokenize

    @flow()
    def tokenize_flow():
        texts = collection(...)

        # Calling the tokenize function stub
        # creates an operation node
        tokens = tokenize(texts)

        return tokens

    task = tokenize_flow()
    task.interpret()

    # Remote execution
    print(task())

.. note::

    Such a simple design is achieved by **Autoflow** - a special engine that
    enables dependency tracking. Whenever you call one of the special Malevich
    functions (:code:`collection`, :code:`asset.file`, :code:`asset.multifile`,
    or a processor stub), you produce a special kind of entity: a tracer.
    Tracers are wrappers around arbitrary objects that hold a reference to the
    current execution plan and advance it when passed to other stubs. To be
    precise, in the example above, :code:`texts` is a traced object, and when
    it is passed to the function stub :code:`tokenize`, a new dependency
    (texts → tokenize) is registered within the flow.

.. warning::

    Beware that :code:`tokens` has nothing to do with the actual result of the
    :code:`tokenize` processor. It is just a placeholder used to define a
    dependency between :code:`texts` and :code:`tokenize`. The actual result
    of the flow is retrieved by returning :code:`tokens` from the flow
    function, running the flow, and requesting results. See
    `Working with Results`_ for more details.
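
To make the tracer idea more concrete, below is a minimal toy sketch of the
same pattern. It is **not** Malevich's implementation: the :code:`Graph`,
:code:`Tracer`, and :code:`traced` names are purely illustrative and do not
exist in the package.

.. code-block:: python

    import functools

    class Graph:
        """A toy execution plan: just a list of (source, target) edges."""
        def __init__(self):
            self.edges = []

    class Tracer:
        """Wraps an arbitrary value and holds a reference to its plan."""
        def __init__(self, graph, name, value=None):
            self.graph = graph
            self.name = name
            self.value = value

    def traced(fn):
        """Turn a function into a stub that records dependencies instead of running."""
        @functools.wraps(fn)
        def stub(*args):
            # Find the plan carried by any traced argument
            graph = next(a.graph for a in args if isinstance(a, Tracer))
            for arg in args:
                if isinstance(arg, Tracer):
                    graph.edges.append((arg.name, fn.__name__))
            # Return a new tracer so the chain can continue
            return Tracer(graph, fn.__name__)
        return stub

    # Build a plan without executing anything
    plan = Graph()
    texts = Tracer(plan, "texts")      # plays the role of collection(...)

    @traced
    def tokenize(text):
        """Stub: no real tokenization happens here."""

    tokens = tokenize(texts)           # registers the edge ("texts", "tokenize")
    print(plan.edges)                  # [('texts', 'tokenize')]

The real engine works on the same principle: the object returned by a stub is
not a result but a tracer that carries the plan forward, and every call adds
an edge to the flow instead of performing any computation.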