slitflow.manager module

class Pipeline(root_dir)[source]

Bases: object

Manage the sequential running of the Data class and file IO.

root_dir

File path to the project directory.

Type:

str

df

Pipeline table consisting of a series of data classes.

Type:

pandas.DataFrame

init_df()[source]

Create a pipeline table.

init_folder()[source]

Make the project folder if it doesn’t exist.

save(sheet_name)[source]

Export the pipeline table as a CSV file.

The CSV file is saved in the g0_config folder.

Parameters:

sheet_name (str) – Pipeline CSV file name without extension.

load(sheet_names)[source]

Import pipeline table from the CSV file.

The CSV file is loaded from the g0_config folder.

Parameters:

sheet_names (str or list of str) – Pipeline CSV file name, or list of names, without extension.
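
The save and load descriptions above can be illustrated with a minimal, standard-library sketch. It is not the slitflow implementation: the helper names (sheet_path, save_rows, load_rows), the column names, and the choice to concatenate several sheets in order are assumptions made for illustration. Only the g0_config folder and the "file name without extension" convention come from the text.

```python
import csv
import tempfile
from pathlib import Path


def sheet_path(root_dir, sheet_name):
    """Return the assumed CSV location: <root_dir>/g0_config/<sheet_name>.csv."""
    return Path(root_dir) / "g0_config" / f"{sheet_name}.csv"


def save_rows(root_dir, sheet_name, rows, fieldnames):
    """Write task rows (a list of dicts) to the pipeline CSV file."""
    path = sheet_path(root_dir, sheet_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_rows(root_dir, sheet_names):
    """Read one sheet (str) or several sheets (list of str) into one list."""
    if isinstance(sheet_names, str):
        sheet_names = [sheet_names]
    rows = []
    for name in sheet_names:
        with sheet_path(root_dir, name).open(newline="") as f:
            rows.extend(csv.DictReader(f))  # assumption: sheets are concatenated
    return rows


# Hypothetical project folder and column names, for illustration only.
with tempfile.TemporaryDirectory() as root:
    fields = ["class_name", "run_mode"]
    save_rows(root, "pipe_a", [{"class_name": "A", "run_mode": 0}], fields)
    save_rows(root, "pipe_b", [{"class_name": "B", "run_mode": 2}], fields)
    loaded = load_rows(root, ["pipe_a", "pipe_b"])
```

Values read back from CSV are strings, so a real loader has to convert columns such as run_mode afterwards; the set_* methods documented below cover that kind of conversion.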

add(class_name, run_mode, address, grp_name, ana_name, obs_names, reqs_address, reqs_split, param)[source]

Add a task to the pipeline table.

Parameters:
  • class_name (str) – Class name string.

  • run_mode (int) – Run mode (0=single data, single CPU; 1=single data, multi CPU; 2=multi data, single CPU; 3=multi data, multi CPU).

  • address (tuple) – (group no, analysis no) to save the task.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • obs_names (list of str) – List of observation names that are used for data file names.

  • reqs_address (list of tuple) – List of (group no, analysis no) of required data files.

  • reqs_split (list of int or list of list of int) – List of split depth of each required data. Each element should be [load_split, data_split]. If load_split and data_split are the same, it can be specified as [split]. That is, it is specified in the format [[load_split1, data_split1], [load_split2, data_split2], …] or [load_and_data_split1, load_and_data_split2,…].

  • param (dict) – Parameter dictionary.

set_class_name(class_name)[source]

Standardize various types of class_name input into a formatted string.

Parameters:

class_name (Data or str) – Input to set the class name.

Returns:

class_name string that can be executed with eval(). The “slitflow” package can be imported as “sf”.

Return type:

str

set_run_mode(run_mode)[source]

Convert run mode to integer.

Parameters:

run_mode (int or str) – Input to set the run mode.

Returns:

Run mode number (0=single data, single CPU; 1=single data, multi CPU; 2=multi data, single CPU; 3=multi data, multi CPU).

Return type:

int
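
As a rough illustration of the conversion described above, the following sketch accepts an int or a numeric string and returns a run mode number. The function name to_run_mode and the error handling are assumptions; the actual method may accept other string forms.

```python
def to_run_mode(run_mode):
    """Convert an int or a numeric string such as "2" to a run mode in 0-3."""
    mode = int(run_mode)  # raises ValueError for non-numeric strings
    if mode not in (0, 1, 2, 3):
        raise ValueError(f"run_mode must be 0, 1, 2 or 3, got {run_mode!r}")
    return mode


mode_from_str = to_run_mode("2")
mode_from_int = to_run_mode(0)
```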

set_address(address)[source]

Check address format.

Parameters:

address (tuple of int, or str) – Input address should be (group_no, analysis_no).

Returns:

(group_no, analysis_no)

Return type:

tuple of int
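
A minimal sketch of such an address check is shown below. It assumes that a string address is written like a Python tuple literal, for example "(1, 2)"; that format, and the name to_address, are assumptions rather than documented behavior.

```python
import ast


def to_address(address):
    """Return (group_no, analysis_no) as a tuple of int."""
    if isinstance(address, str):
        # Assumption: the string form is a tuple literal such as "(1, 2)".
        address = ast.literal_eval(address)
    address = tuple(address)
    if len(address) != 2 or not all(isinstance(n, int) for n in address):
        raise ValueError(f"address must be (group_no, analysis_no): {address!r}")
    return address


addr_from_tuple = to_address((1, 2))
addr_from_str = to_address("(3, 4)")
```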

set_grp_name(address, grp_name)[source]

Check input group name.

Additional restrictions will be written here.

Parameters:
  • address (tuple of int, or str) – Input address should be (group_no, analysis_no).

  • grp_name (str) – Group name to check.

Returns:

Group name

Return type:

str

set_ana_name(ana_name)[source]

Check input analysis name.

Additional restrictions will be written here.

Parameters:

ana_name (str) – Analysis name to check.

Returns:

Analysis name

Return type:

str

set_reqs_address(reqs_address)[source]

Check required addresses.

Parameters:

reqs_address (list of tuple) – List of (group_no, analysis_no) of required data.

Returns:

List of required data address

Return type:

list of tuple

set_obs_names(obs_names)[source]

Check and convert observation names.

Parameters:

obs_names (list or str) – List of observation names.

Returns:

Observation names

Return type:

list of str

set_reqs_split(reqs_split, reqs_address)[source]

Check and convert split depth to resplit required data.

Parameters:
  • reqs_split (list or str) – List of split_depth of required data. reqs_split should be [[load_split1, data_split1], [load_split2, data_split2], …] or [load_and_data_split1, load_and_data_split2,…]

  • reqs_address (list of tuple) – List of required addresses, used to check the number of required data.

Returns:

List of split_depth of required data

Return type:

list of int
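
The two accepted input formats can be illustrated with a short sketch that expands the shorthand into explicit [load_split, data_split] pairs. This is only an illustration under assumptions: the name expand_reqs_split, the string parsing, and the paired output format are hypothetical, and the real method may return a different structure.

```python
import ast


def expand_reqs_split(reqs_split, reqs_address):
    """Expand each element into an explicit [load_split, data_split] pair."""
    if isinstance(reqs_split, str):
        # Assumption: the string form is a list literal such as "[1, [0, 2]]".
        reqs_split = ast.literal_eval(reqs_split)
    if len(reqs_split) != len(reqs_address):
        raise ValueError("reqs_split needs one element per required address")
    pairs = []
    for item in reqs_split:
        if isinstance(item, int):
            pairs.append([item, item])        # same depth for load and data
        elif len(item) == 1:
            pairs.append([item[0], item[0]])  # [split] shorthand
        else:
            load_split, data_split = item
            pairs.append([load_split, data_split])
    return pairs


mixed = expand_reqs_split([1, [0, 2]], [(1, 1), (1, 2)])
from_str = expand_reqs_split("[1]", [(1, 1)])
```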

set_param(param)[source]

Check parameter dictionary.

Parameters:

param (dict, str, or None) – Input to set as a parameter dictionary.

Returns:

Parameter dictionary

Return type:

dict
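
The accepted input types (dict, str, or None) suggest a normalization along the following lines. This sketch assumes that None becomes an empty dictionary and that a string holds a Python dict literal; both are assumptions, as is the name to_param.

```python
import ast


def to_param(param):
    """Return a parameter dictionary from a dict, a dict-literal string, or None."""
    if param is None:
        return {}  # assumption: no parameters means an empty dict
    if isinstance(param, str):
        # Assumption: the string is a dict literal such as "{'length': 5}".
        param = ast.literal_eval(param) if param.strip() else {}
    if not isinstance(param, dict):
        raise TypeError(f"param must be a dict, str, or None: {param!r}")
    return dict(param)


param_none = to_param(None)
param_str = to_param("{'length': 5}")
```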

run(sheet_name=None, indices=None)[source]

Run selected tasks.

Parameters:
  • sheet_name (str, optional) – Pipeline CSV file name without extension.

  • indices (list of int, optional) – Task indices to run.

load_obs_names(obs_names, reqs_address)[source]

Get observation names from saved files if obs_names is an empty list.

Parameters:
  • obs_names (list) – Observation names. An empty list is required to execute this method.

  • reqs_address (list of tuple) – List of required address tuples. The first address is used to pick up observation names.

Returns:

List of observation names

Return type:

list of str

convert_indices(indices=None)[source]

Standardize the indices argument of run method.

Parameters:

indices (None or int or tuple or list) –

Task row indices to run:

  • None : run all rows.

  • int : run the row at the given position.

  • list : run the rows at the given positions.

  • tuple : run the rows selected by (start, end, step), where step is optional. Setting tuple[1] == 0 selects through the last row.

Returns:

Task row indices to run

Return type:

pandas.Int64Index

Examples

When index of self.df is reset:

>>> self.convert_indices()
self.df.index
>>> self.convert_indices(-1)
pd.Index([self.df.index[-1]])
>>> self.convert_indices([1, -1])
pd.Index([self.df.index[1], self.df.index[-1]])
>>> self.convert_indices(range(3))
self.df.index[:3]
>>> self.convert_indices((1, -1))
self.df.index[1:-1]
>>> self.convert_indices((1, 0, 2))
self.df.index[1::2]
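
The selection rules in the examples can be reproduced with a plain-list sketch. It works on a list of row labels instead of a pandas index, and the name select_rows is hypothetical, so it only mirrors the documented behavior rather than the actual implementation.

```python
def select_rows(labels, indices=None):
    """Pick row labels following the None / int / list / tuple rules."""
    labels = list(labels)
    if indices is None:
        return labels                      # all rows
    if isinstance(indices, int):
        return [labels[indices]]           # one row by position
    if isinstance(indices, tuple):
        start, end = indices[0], indices[1]
        step = indices[2] if len(indices) > 2 else 1
        stop = None if end == 0 else end   # end == 0 means "to the last row"
        return labels[start:stop:step]
    return [labels[i] for i in indices]    # list or range of positions


row_labels = [10, 11, 12, 13, 14]
every_second = select_rows(row_labels, (1, 0, 2))
```

With these labels, (1, 0, 2) starts at position 1 and takes every second row through the end, which matches self.df.index[1::2] in the example.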

run_one_data(class_name, reqs_split, reqs_address, obs_name, param, grp_name, ana_name, run_mode, address)[source]

Execute a task that is not split into multiple files.

Parameters:
  • class_name (str) – eval() executable class name string.

  • reqs_split (list) – List of split depth of each required data.

  • reqs_address (list of tuple) – List of required data address.

  • obs_name (list of str) – Observation names.

  • param (dict) – Parameter dictionary.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • run_mode (int) – Run mode number. This should be 0 or 1.

  • address (tuple) – (group_no, analysis_no) of the result data.

run_one_data_multi_obs(class_name, reqs_split, reqs_address, obs_names, param, grp_name, ana_name, run_mode, address)[source]

Execute a task that is not split into multiple files.

The first element of obs_names is used as the result file name.

Parameters:
  • class_name (str) – eval() executable class name string.

  • reqs_split (list) – List of split depth of each required data.

  • reqs_address (list of tuple) – List of required data address.

  • obs_names (list of str) – Observation names.

  • param (dict) – Parameter dictionary.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • run_mode (int) – Run mode number. This should be 0 or 1.

  • address (tuple) – (group_no, analysis_no) of the result data.

run_multi_data(class_name, reqs_split, reqs_address, obs_name, param, grp_name, ana_name, run_mode, address)[source]

Execute a task that is split into multiple files.

Parameters:
  • class_name (str) – eval() executable class_name string.

  • reqs_split (list) – List of split depth of each required data.

  • reqs_address (list of tuple) – List of required data address.

  • obs_name (list of str) – Observation names.

  • param (dict) – Parameter dictionary.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • run_mode (int) – Run mode number. This should be 0 or 1.

  • address (tuple) – (group_no, analysis_no) of the result data.

run_Obs2Depth(class_name, reqs_split, reqs_address, obs_names, param, grp_name, ana_name, run_mode, address)[source]

Merge different observations into one observation with depth.

Caution

Currently only run_mode=0 is supported.

Parameters:
  • class_name (str) – eval() executable class_name string.

  • reqs_split (list) – List of split depth of each required data.

  • reqs_address (list of tuple) – List of required data address.

  • obs_names (list of str) – Observation names.

  • param (dict) – Parameter dictionary.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • run_mode (int) – Run mode number. This should be 0 or 1.

  • address (tuple) – (group_no, analysis_no) of the result data.

  • param – Parameter dictionary. It should contain the following item.

  • param["obs_name"] (str) – Newly created observation name.

run_delete(reqs_address, obs_names, param)[source]

Delete selected data.

Parameters:
  • reqs_address (list of tuple) – List of (group no, analysis no) of the data to delete.

  • obs_names (list of str) – Observation names to delete.

  • param (dict, optional) – Parameter dictionary. param may contain the following item.

  • param["keep"] (str, optional) –

    Defines the delete type.

    • info : Do not delete the information files.

    • folder : Delete the information files but not the folder itself.

run_copy(address, ana_name, grp_name, reqs_address, obs_names, param)[source]

Copy data from a different analysis.

Parameters:
  • address (tuple) – (group_no, analysis_no) of copy destination.

  • ana_name (str) – Analysis name of copy destination.

  • grp_name (str) – Group name of copy destination.

  • reqs_address (list of tuple) – List containing only one data address of copy source.

  • obs_names (list of str) – List containing only one observation name of copy destination.

  • param (dict) – Parameter dictionary. It may contain the following item.

  • param["obs_name"] (str, optional) – Observation name of copy source.

run_index(class_name, reqs_address, obs_names, param, grp_name, ana_name, address)[source]

A run method specific to the tbl.convert.Index class.

The slitflow.tbl.convert.Index class creates an index table from the required Data object. It loads only the index file of the required data, so the required data itself does not need to be loaded.

Parameters:
  • class_name (str) – eval() executable class name string.

  • reqs_address (list of tuple) – List of required data address.

  • obs_names (list of str) – Observation names.

  • param (dict) – Parameter dictionary.

  • grp_name (str) – Group name.

  • ana_name (str) – Analysis name.

  • address (tuple) – (group_no, analysis_no) of the result data.

make_flowchart(fig_name, label_type, is_vertical=False, scale=(0.5, 1), format='png', dpi=300)[source]

Create a workflow graph in the g0_config directory.

Parameters:
  • fig_name (str) – Name of the flowchart file.

  • label_type (str) –

    Description type. This should be one of:

    • “class_desc” : shows the one-line class description from the class docstring.

    • “grp_ana” : shows “grp_name (newline) ana_name”.

  • is_vertical (bool) – Flowchart direction. Defaults to False (horizontal).

  • scale (tuple of float) – Scale factors of (width, height). Defaults to (0.5, 1).

  • format (str) – File save format. Defaults to “png”.

  • dpi (int) – Dots per inch of the exported file. Defaults to 300.

rename_info_class(grp_no, ana_no, new_name)[source]

Rename the class name stored in the info file.

Rename the class name in the info.json file of saved data. Use this method when the class name of the saved data has changed.

Parameters:
  • grp_no (int) – Group number.

  • ana_no (int) – Analysis number.

  • new_name (str) – New class name as slitflow.modulename.ClassName.