slitflow.manager module

class Pipeline(root_dir)

Bases: object

Manage the sequential execution of Data classes and file I/O.

root_dir

File path to the project directory.

Type:: str

df

Pipeline table consisting of a series of data classes.

Type:: pandas.DataFrame

init_df(): Create a pipeline table.

init_folder(): Make the project folder if it doesn’t exist.

save(sheet_name)

Export the pipeline table as a CSV file.

The CSV file is saved in the g0_config folder.

Parameters:: sheet_name (str) – Pipeline CSV file name without extension.

load(sheet_names)

Import pipeline table from the CSV file.

The CSV file is loaded from the g0_config folder.

Parameters:: sheet_names (str or list of str) – Pipeline CSV file name without extension.
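
As a rough illustration of where save and load look for the file, the path can be pictured as follows. This is a minimal sketch, not the library's code: the helper name pipeline_csv_path is hypothetical, and it assumes the file sits directly in the g0_config folder under root_dir with ".csv" appended to the extension-less sheet name.

```python
from pathlib import Path


def pipeline_csv_path(root_dir, sheet_name):
    """Hypothetical helper: build the CSV path for a pipeline sheet.

    Assumes the file lives directly in <root_dir>/g0_config and that
    ".csv" is appended to the extension-less sheet name.
    """
    return Path(root_dir) / "g0_config" / (sheet_name + ".csv")


csv_path = pipeline_csv_path("my_project", "pipeline")
print(csv_path.as_posix())  # my_project/g0_config/pipeline.csv
```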

add(class_name, run_mode, address, grp_name, ana_name, obs_names, reqs_address, reqs_split, param)

Add a task to the pipeline table.

Parameters:

class_name (str) – Class name string.
run_mode (int) – Run mode (0=single data, single CPU; 1=single data, multi CPU; 2=multi data, single CPU; 3=multi data, multi CPU).
address (tuple) – (group no, analysis no) to save the task.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
obs_names (list of str) – List of observation names that are used for data file names.
reqs_address (list of tuple) – List of (group no, analysis no) of required data files.
reqs_split (list of int or list of list of int) – List of split depth of each required data. Each element should be [load_split, data_split]. If load_split and data_split are the same, it can be specified as [split]. That is, it is specified in the format [[load_split1, data_split1], [load_split2, data_split2], …] or [load_and_data_split1, load_and_data_split2,…].
param (dict) – Parameter dictionary.

set_class_name(class_name)

Standardize various types of class_name input into a formatted string.

Parameters:: class_name (Data or str) – Input to set the class name.
Returns:: eval() executable class_name string. “slitflow” package can be imported as “sf”.
Return type:: str

set_run_mode(run_mode)

Convert run mode to integer.

Parameters:: run_mode (int or str) – Input to set the run mode.
Returns:: Run mode number (0=single data, single CPU; 1=single data, multi CPU; 2=multi data, single CPU; 3=multi data, multi CPU).
Return type:: int
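
The int-or-str conversion described above might look like the sketch below. Both function names are hypothetical, and the sketch assumes the string form is simply the digit as text (as it would be after reading a CSV cell); the library may accept other spellings.

```python
def to_run_mode(run_mode):
    """Hypothetical stand-in for the int-or-str conversion."""
    mode = int(run_mode)  # accepts 2 as well as "2"
    if mode not in (0, 1, 2, 3):
        raise ValueError(f"run_mode must be 0, 1, 2 or 3, got {run_mode!r}")
    return mode


def is_multi_data(mode):
    """Modes 0 and 1 are listed as single data, 2 and 3 as multi data."""
    return mode >= 2


print(to_run_mode("2"))                # 2
print(is_multi_data(to_run_mode(1)))   # False
```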

set_address(address)

Check address format.

Parameters:: address (tuple of int, or str) – Input address should be (group_no, analysis_no).
Returns:: (group_no, analysis_no)
Return type:: tuple of int
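
A format check of this kind could be sketched as follows. The function to_address is hypothetical, and the sketch assumes the string form is the tuple written as text, for example "(1, 2)"; the actual parsing rules of set_address are not documented here.

```python
import ast


def to_address(address):
    """Hypothetical check: accept (group_no, analysis_no) as tuple or text."""
    if isinstance(address, str):
        # literal_eval parses "(1, 2)" without running arbitrary code
        address = ast.literal_eval(address)
    address = tuple(address)
    if len(address) != 2 or not all(isinstance(n, int) for n in address):
        raise ValueError(
            f"address must be (group_no, analysis_no), got {address!r}")
    return address


print(to_address("(1, 2)"))  # (1, 2)
print(to_address((3, 4)))    # (3, 4)
```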

set_grp_name(address, grp_name)

Check input group name.

Additional restrictions will be written here.

Parameters:

address (tuple of int, or str) – Input address should be (group_no, analysis_no).
grp_name (str) – Group name to check.

Returns:

Group name

Return type:

str

set_ana_name(ana_name)

Check input analysis name.

Additional restrictions will be written here.

Parameters:: ana_name (str) – Analysis name to check.
Returns:: Analysis name
Return type:: str

set_reqs_address(reqs_address)

Check required addresses.

Parameters:: reqs_address (list of tuple) – List of (group_no, analysis_no) of required data.
Returns:: List of required data address
Return type:: list of tuple

set_obs_names(obs_names)

Check and convert observation names.

Parameters:: obs_names (list or str) – List of observation names.
Returns:: Observation names
Return type:: list of str

set_reqs_split(reqs_split, reqs_address)

Check and convert split depth to resplit required data.

Parameters:

reqs_split (list or str) – List of split_depth of required data. reqs_split should be [[load_split1, data_split1], [load_split2, data_split2], …] or [load_and_data_split1, load_and_data_split2,…]
reqs_address (list of tuple) – List of required addresses, used to check the number of required data.

Returns:

List of split_depth of required data

Return type:

list of int
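
The two accepted reqs_split layouts can be reconciled as in the sketch below. The function normalize_reqs_split is hypothetical, and returning every entry as a [load_split, data_split] pair is an assumption made for illustration; the library's actual return shape may differ.

```python
def normalize_reqs_split(reqs_split, reqs_address):
    """Hypothetical normalizer: expand entries to [load_split, data_split]."""
    if len(reqs_split) != len(reqs_address):
        raise ValueError("need one split entry per required address")
    pairs = []
    for split in reqs_split:
        if isinstance(split, int):
            # shorthand: one depth used for both loading and data splitting
            pairs.append([split, split])
        elif len(split) == 1:
            # [split] form with a single shared depth
            pairs.append([split[0], split[0]])
        else:
            load_split, data_split = split
            pairs.append([load_split, data_split])
    return pairs


print(normalize_reqs_split([1, [0, 2]], [(1, 1), (1, 2)]))
# [[1, 1], [0, 2]]
```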

set_param(param)

Check parameter dictionary.

Parameters:: param (dict, str, or None) – Input to set as a parameter dictionary.
Returns:: Parameter dictionary
Return type:: dict

run(sheet_name=None, indices=None)

Run selected tasks.

Parameters:

sheet_name (str, optional) – Pipeline CSV file name without extension.
indices (list of int, optional) – Task indices to run.

load_obs_names(obs_names, reqs_address)

Get observation names from saved files if obs_names is an empty list.

Parameters:

obs_names (list) – Observation names. An empty list is required to execute this method.
reqs_address (list of tuple) – List of required address tuples. The first address is used to pick up observation names.

Returns:

List of observation names

Return type:

list of str

convert_indices(indices=None)

Standardize the indices argument of run method.

Parameters:

indices (None or int or tuple or list) –

Task row indices to run.

None : run all rows.
int : run the single row at that position.
list : run the rows at the listed positions.
tuple : run the rows selected by (start, end, step (optional)). Setting tuple[1] == 0 selects through the last row.

Returns:

Task row indices to run

Return type:

pandas.Int64Index

Examples

When index of self.df is reset:

>>> self.convert_indices()
self.df.index
>>> self.convert_indices(-1)
pd.Index([self.df.index[-1]])
>>> self.convert_indices([1, -1])
pd.Index([self.df.index[1], self.df.index[-1]])
>>> self.convert_indices(range(3))
self.df.index[:3]
>>> self.convert_indices((1, -1))
self.df.index[1:-1]
>>> self.convert_indices((1, 0, 2))
self.df.index[1::2]
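
The same selection rules can be tried out on a plain list of row labels. The sketch below is a list-based illustration only; select_rows is a hypothetical name, and the real method returns a pandas index rather than a list.

```python
def select_rows(row_labels, indices=None):
    """Hypothetical list-based version of the selection rules."""
    if indices is None:
        return list(row_labels)              # all rows
    if isinstance(indices, int):
        return [row_labels[indices]]         # one row by position
    if isinstance(indices, tuple):
        start, end, *rest = indices
        step = rest[0] if rest else None
        stop = None if end == 0 else end     # end == 0 means "to the last row"
        return list(row_labels[start:stop:step])
    return [row_labels[i] for i in indices]  # list or range of positions


rows = [10, 11, 12, 13, 14]
print(select_rows(rows, (1, -1)))    # [11, 12, 13]
print(select_rows(rows, (1, 0, 2)))  # [11, 13]
```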

run_one_data(class_name, reqs_split, reqs_address, obs_name, param, grp_name, ana_name, run_mode, address)

Execute a task that is not split into multiple files.

Parameters:

class_name (str) – eval() executable class name string.
reqs_split (list) – List of split depth of each required data.
reqs_address (list of tuple) – List of required data address.
obs_name (list of str) – Observation names.
param (dict) – Parameter dictionary.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
run_mode (int) – Run mode number. This should be 0 or 1.
address (tuple) – (group_no, analysis_no) of the result data.

run_one_data_multi_obs(class_name, reqs_split, reqs_address, obs_names, param, grp_name, ana_name, run_mode, address)

Execute a task that is not split into multiple files.

The first element of obs_names is used for the result file name.

Parameters:

class_name (str) – eval() executable class name string.
reqs_split (list) – List of split depth of each required data.
reqs_address (list of tuple) – List of required data address.
obs_names (list of str) – Observation names.
param (dict) – Parameter dictionary.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
run_mode (int) – Run mode number. This should be 0 or 1.
address (tuple) – (group_no, analysis_no) of the result data.

run_multi_data(class_name, reqs_split, reqs_address, obs_name, param, grp_name, ana_name, run_mode, address)

Execute a task that is split into multiple files.

Parameters:

class_name (str) – eval() executable class_name string.
reqs_split (list) – List of split depth of each required data.
reqs_address (list of tuple) – List of required data address.
obs_name (list of str) – Observation names.
param (dict) – Parameter dictionary.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
run_mode (int) – Run mode number. This should be 2 or 3.
address (tuple) – (group_no, analysis_no) of the result data.

run_Obs2Depth(class_name, reqs_split, reqs_address, obs_names, param, grp_name, ana_name, run_mode, address)

Merge different observations into one observation with depth.

Caution

Currently only run_mode=0 is supported.

Parameters:

class_name (str) – eval() executable class_name string.
reqs_split (list) – List of split depth of each required data.
reqs_address (list of tuple) – List of required data address.
obs_names (list of str) – Observation names.
param (dict) – Parameter dictionary. This should have the item below.
param["obs_name"] (str) – Newly created observation name.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
run_mode (int) – Run mode number. Currently only 0 is supported.
address (tuple) – (group_no, analysis_no) of the result data.

run_delete(reqs_address, obs_names, param)

Delete selected data.

Parameters:

reqs_address (list of tuple) – List of (group no, analysis no) to delete.
obs_names (list of str) – Observation names to delete.
param (dict, optional) – Parameter dictionary. param may have the item below.
param["keep"] (str, optional) –
Defines the delete type.
- info : Do not delete the information files.
- folder : Delete the information files but not the folder itself.

run_copy(address, ana_name, grp_name, reqs_address, obs_names, param)

Copy data from a different analysis.

Parameters:

address (tuple) – (group_no, analysis_no) of copy destination.
ana_name (str) – Analysis name of copy destination.
grp_name (str) – Group name of copy destination.
reqs_address (list of tuple) – List containing only one data address of copy source.
obs_names (list of str) – List containing only one observation name of copy destination.
param (dict) – Parameter dictionary. This may have the item below.
param["obs_name"] (str, optional) – Observation name of copy source.

run_index(class_name, reqs_address, obs_names, param, grp_name, ana_name, address)

A specific run method for the tbl.convert.Index class.

The slitflow.tbl.convert.Index class creates an index table from a required Data object. It loads only the index file of the required data, so the required data itself does not need to be loaded.

Parameters:

class_name (str) – eval() executable class name string.
reqs_address (list of tuple) – List of required data address.
obs_names (list of str) – Observation names.
param (dict) – Parameter dictionary.
grp_name (str) – Group name.
ana_name (str) – Analysis name.
address (tuple) – (group_no, analysis_no) of the result data.

make_flowchart(fig_name, label_type, is_vertical=False, scale=(0.5, 1), format='png', dpi=300)

Create a workflow graph in the g0_config directory.

Parameters:

fig_name (str) – Name of the flowchart file.
label_type (str) –
Description type. This should be one of:
- “class_desc” : shows the one-line class description from the class docstring.
- “grp_ana” : shows “grp_name (newline) ana_name”.
is_vertical (bool) – Flowchart direction. Defaults to False (horizontal).
scale (tuple of float) – Scale factors of (width, height). Defaults to (0.5, 1).
format (str) – File save format. Defaults to “png”.
dpi (int) – Dots per inch of the exported file. Defaults to 300.

rename_info_class(grp_no, ana_no, new_name)

Rename info class name.

Rename the class name recorded in the info.json file of saved data. Use this method when the class name of the saved data has been changed.

Parameters:

grp_no (int) – Group number.
ana_no (int) – Analysis number.
new_name (str) – New class name as slitflow.modulename.ClassName.