===================
malevich.square.df
===================

:class:`DF`, :class:`DFS`, :class:`OBJ`, :class:`M` and :class:`Sink` are special types
used to annotate the arguments of units, such as processors, in your apps.

.. automodule:: malevich.square.df

    .. autoclass:: malevich.square.df.DF
        
        .. automethod:: malevich.square.df.DF.scheme_name

    .. class:: DFS

        Wrapper class for tabular data.
        
        DFS is a container for multiple DFs. It is used to denote an
        output of processors that return multiple data frames. 
        
        Each of the elements of DFS is also a :class:`DF` or :class:`DFS`.

        Usage
        =====
        
        DFS is primarily used to denote types of arguments of processors. There
        are a couple of cases to consider:
        
        1. Explicit number of inputs
        -----------------------------
        
        If you know the number of inputs and their schemes, you can use DFS
        in the following way:
        
        .. code-block:: python
        
            from typing import Any
            from malevich.square import DFS, processor
            
            @processor()
            def my_processor(dfs: DFS["users", Any]):
                ...
                
        Here, we have one input argument (from either a collection or a previous app) that
        consists of two data frames. The first data frame has scheme "users",
        and the second data frame has an arbitrary scheme.
        
        2. Variable number of inputs
        -----------------------------
        
        You may also accept an unbounded number of inputs. In this case, you
        should use :class:`malevich.square.df.M` together with DFS:
        
        .. code-block:: python
        
            from typing import Any
            from malevich.square import DFS, M, processor
            
            @processor()
            def process_tables(dfs: DFS[M["sql_tables"]]):
                ...
                
            @processor()
            def process_user_data(dfs: DFS["users", M[Any]]):
                ...
                
        Here, we have two processors. The first one accepts any number of data
        frames with scheme "sql_tables". The second one accepts one data frame
        with scheme "users", and any number of data frames with arbitrary schemes
        as one argument.
        
        .. note::
        
            When iterated, an argument of type :code:`DFS[M["sql_tables"]]`
            contains exactly one element of type DFS, which consists of a number
            of data frames with scheme "sql_tables".

            When iterating over an argument of type :code:`DFS["users", M[Any]]`,
            the first element will be of type DF, and the second element will be
            of type DFS, consisting of data frames with arbitrary schemes.
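
        The grouping described in the note can be modelled in plain Python. The
        sketch below is only an illustration, not the library's implementation:
        :code:`Many` and :code:`group_frames` are hypothetical names, strings stand
        in for data frames, and lists stand in for :class:`DFS` objects. It assumes
        at most one variadic entry per specification.

        .. code-block:: python

            from dataclasses import dataclass
            from typing import Any, List


            @dataclass(frozen=True)
            class Many:
                """Hypothetical stand-in for ``M[scheme]``."""
                scheme: Any


            def group_frames(spec: List[Any], frames: List[Any]) -> List[Any]:
                """Group a flat list of frames according to a DFS-like spec."""
                fixed = sum(1 for entry in spec if not isinstance(entry, Many))
                extra = len(frames) - fixed  # frames left for the variadic entry
                if extra < 0:
                    raise ValueError("not enough frames for the fixed entries")

                grouped, pos = [], 0
                for entry in spec:
                    if isinstance(entry, Many):
                        # A variadic entry becomes one nested group.
                        grouped.append(list(frames[pos:pos + extra]))
                        pos += extra
                    else:
                        # A fixed entry consumes exactly one frame.
                        grouped.append(frames[pos])
                        pos += 1
                return grouped


            # Models DFS[M["sql_tables"]]: one element holding every frame.
            tables = group_frames([Many("sql_tables")], ["t1", "t2", "t3"])

            # Models DFS["users", M[Any]]: one frame, then a nested group.
            user_data = group_frames(["users", Many(Any)], ["users_df", "a", "b"])

        In this model, :code:`tables` has a single element that holds all three
        frames, while :code:`user_data` has two elements: the "users" frame and a
        nested group with the remaining frames.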

        .. automethod:: malevich.square.df.DFS.__getitem__
        .. automethod:: malevich.square.df.DFS.__iter__
        .. automethod:: malevich.square.df.DFS.__len__

    .. autoclass:: malevich.square.df.OBJ
        :members:

    .. autoclass:: malevich.square.df.M
        :members:
        
    .. class:: Sink
    
        Wrapper class that denotes a special kind of processor input.

        Normally, each argument in a processor's function signature corresponds to
        exactly one output of the previous processor or exactly one collection.
        
        To denote a processor that is able to accept a variable number of inputs,
        you should use this class.
        
        .. code-block:: python
        
            from typing import Any
            from malevich.square import Sink, DFS, M, processor
            
            @processor()
            def merge_tables_sink(dfs: Sink["sql_tables"]):
                pass
                
            @processor()
            def merge_tables_dfs(dfs: DFS[M["sql_tables"]]):
                pass
                
        Here, we have two processors. :code:`Sink[schema]` is
        equivalent to :code:`List[DFS[M[schema]]]`. 
        
        The difference between the two processors is that
        the first one can be connected to any number of processors
        that return data frames with scheme "sql_tables", while the
        second one can be connected to exactly one processor that
        returns any number of data frames with scheme "sql_tables".
        
        See the difference visually:
        
        .. mermaid::
        
            graph TD
                C1[Prev. processor] -->|table_1, table_2| B[merge_tables_dfs]
                C2[Collection 1] -->|table_1| A[merge_tables_sink]
                C3[Collection 2] -->|table_2| A[merge_tables_sink]
                C4[Prev. processor] -->|table_3, table_4| A[merge_tables_sink]
                
        In this case, in function :code:`merge_tables_sink`, accessing
        :code:`dfs[0]` will return a :class:`DFS` object consisting of
        a single data frame with scheme "sql_tables", but accessing
        :code:`dfs[2]` will return a :class:`DFS` with two data frames
        with scheme "sql_tables" inside. 
        
        In the case of :code:`merge_tables_dfs`, accessing :code:`dfs[0]`
        will return a :class:`DFS` object consisting of two data frames
        with scheme "sql_tables". There is no way to connect more than
        one processor to :code:`merge_tables_dfs`.
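
        The two shapes described above can be pictured with plain Python lists.
        This is only an illustrative sketch: nested lists stand in for
        :class:`DFS` objects, strings stand in for data frames, and the variable
        names are hypothetical.

        .. code-block:: python

            # merge_tables_sink: one entry per connection, in connection order.
            sink_argument = [
                ["table_1"],             # Collection 1
                ["table_2"],             # Collection 2
                ["table_3", "table_4"],  # Prev. processor
            ]

            # merge_tables_dfs: a single connection that carries two frames.
            dfs_argument = [["table_1", "table_2"]]

            # Walking a Sink-like value: outer loop over connections,
            # inner loop over the frames each connection delivered.
            all_sink_frames = [
                frame for group in sink_argument for frame in group
            ]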
        
        .. note::
        
            If a processor has other arguments besides the :code:`Sink` argument,
            inputs are first mapped to the non-Sink arguments, and the Sink
            greedily collects the rest.

            In other words, the first and last inputs are connected to the non-Sink arguments
            (if there are any), and the remaining inputs are included in the Sink.

            .. code-block:: python3

                from malevich.square import DF, Sink, processor

                @processor()
                def merge_tables_sink(
                    table_1: DF,
                    dfs: Sink["sql_tables"],
                    table_2: DF,
                ):
                    pass

            .. mermaid::

                graph TD
                    C1[App 1] -->|table_1| A[merge_tables_sink]
                    C2[Collection 1] -->|table_2| A[merge_tables_sink]
                    C4[App 2] -->|table_4, table_5| A[merge_tables_sink]
                    C3[Collection 2] -->|table_3| A[merge_tables_sink]
                
            Consider the example above. The argument :code:`table_1` receives the
            data frame :code:`table_1` from :code:`App 1` (the first input), and the
            argument :code:`table_2` receives the data frame :code:`table_3` from
            :code:`Collection 2` (the last input). The remaining input data frames
            (:code:`table_2`, :code:`table_4` and :code:`table_5`) are included in the
            :code:`dfs` argument. Accessing :code:`dfs[0]` will
            return a :class:`DFS` object consisting of a single data frame (:code:`table_2`),
            but accessing :code:`dfs[1]` will return a :class:`DFS` with two data frames (:code:`table_4` and :code:`table_5`).
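
            The mapping rule can be sketched in plain Python. This is a toy model
            under stated assumptions, not the library's implementation:
            :code:`bind_inputs` is a hypothetical helper, each input is a list of
            strings standing in for data frames, and arguments declared before the
            Sink are assumed to take inputs from the front while arguments declared
            after it take inputs from the back.

            .. code-block:: python

                from typing import Any, Dict, List


                def bind_inputs(
                    params: List[str], sink: str, inputs: List[Any]
                ) -> Dict[str, Any]:
                    """Bind inputs to parameter names, letting the sink take the rest."""
                    idx = params.index(sink)
                    before, after = params[:idx], params[idx + 1:]
                    if len(inputs) < len(before) + len(after):
                        raise ValueError("not enough inputs for non-sink arguments")

                    tail_start = len(inputs) - len(after)
                    bound: Dict[str, Any] = {}
                    # Arguments before the sink consume inputs from the front.
                    bound.update(zip(before, inputs))
                    # Arguments after the sink consume inputs from the back.
                    bound.update(zip(after, inputs[tail_start:]))
                    # The sink greedily collects everything in between.
                    bound[sink] = list(inputs[len(before):tail_start])
                    return bound


                # Inputs in the order they are connected in the diagram above.
                inputs = [
                    ["table_1"],             # App 1
                    ["table_2"],             # Collection 1
                    ["table_4", "table_5"],  # App 2
                    ["table_3"],             # Collection 2
                ]
                bound = bind_inputs(["table_1", "dfs", "table_2"], "dfs", inputs)

            In this model, :code:`bound["dfs"]` holds two groups, matching the
            description of :code:`dfs[0]` and :code:`dfs[1]` above.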