slitflow.data module

class Data(info_path=None)

Bases: object

Base class for all Data classes.

All analysis classes should be subclasses of this class. In this class, run() applies process() to each split of the data.
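A minimal sketch of this subclassing pattern follows. The stand-in mimics how run() might apply a static process() to each split. MiniData and Scale are hypothetical names, and the way reqs are paired split by split is an assumption, not slitflow's actual implementation.

```python
class MiniData:
    """Hypothetical stand-in for slitflow.data.Data (not the real class)."""

    def __init__(self):
        self.data = []  # one result per split

    def run(self, reqs, param=None):
        # Assumption: reqs holds one list of splits per required object,
        # and the i-th splits of all reqs are passed to process() together.
        param = param or {}
        self.data = [self.process(list(split), param) for split in zip(*reqs)]

    @staticmethod
    def process(reqs, param=None):
        raise NotImplementedError("Implement in each subclass.")


class Scale(MiniData):
    """Hypothetical analysis class that multiplies each value by a factor."""

    @staticmethod
    def process(reqs, param=None):
        factor = (param or {}).get("factor", 1)
        return [value * factor for value in reqs[0]]


obj = Scale()
obj.run(reqs=[[[1, 2], [3]]], param={"factor": 10})
print(obj.data)  # [[10, 20], [30]]
```

Because process() is static, its result depends only on its arguments, which keeps each split's calculation independent of the object's state.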

info

Information object containing column and parameter information.

Type:: Info

reqs

List of Data objects required by the static process() method of this class.

Type:: list of Data

data

List of result data calculated by process().

Type:: list of data such as pandas.DataFrame or numpy.ndarray

n_worker

Number of CPUs used by process(). It is computed as cpu_count * slitflow.CPU_RATE and is used by run_mp().

Type:: int
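The n_worker formula can be sketched as follows. compute_n_worker is a hypothetical helper, not part of slitflow, and truncating to an int while keeping at least one worker are assumptions about the rounding.

```python
import os

CPU_RATE = 0.7  # default value documented for Data.CPU_RATE


def compute_n_worker(cpu_count=None, cpu_rate=CPU_RATE):
    """Sketch of n_worker = cpu_count * CPU_RATE.

    Assumptions: the product is truncated to an int and at least one
    worker is always kept; slitflow's exact rounding may differ.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, int(cpu_count * cpu_rate))


print(compute_n_worker(8))  # 5
print(compute_n_worker(1))  # 1
```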

memory_limit

Maximum memory usage, defined by slitflow.MEMORY_LIMIT. This limit prevents memory exhaustion while loading data and during calculation.

Type:: int

EXT

File extension of the data file, including the leading “.” (for example, '.pickle' in Pickle). Must be defined in each subclass.

Type:: str

MEMORY_LIMIT = 0.9

CPU_RATE = 0.7

load(file_nos=None): Load and split data files.

load_from_file()

load_from_keep()

load_data(path): Implement in each subclass.

save(clear=True)

save_data(data, path): Implement in each subclass.

clear_data()

keep_data()

split(split_depth=None, index=None): Split info index and data.

set_split(split_depth)

Split info index and data.

This method can be used to override split_depth.

split_data(): Implement in each subclass.

set_reqs(reqs=None, param=None)

Prepare the required data.

This step depends strongly on the analysis type. Frequently used routines are provided in slitflow.setreqs.

set_info(param={})

Convert input information into an Info object.

This method creates the column and parameter information. The column information is used to handle the data structure, and the parameter dictionaries are passed as the param argument of process(). This method is called before run() and must be implemented in each subclass.

Parameters:: param (dict, optional) – Parameters for columns or params.

set_index()

Create the index structure of this analysis data.

This step depends strongly on the analysis type. Frequently used routines are provided in slitflow.setindex.

run(reqs=None, param=None)

Execute the series of processing steps on all data.

Parameters:

reqs (list of any) – List of required data.
param (dict) – Dictionary of parameters.

run_mp(reqs=None, param=None)

Execute run() using multiple CPUs.

This method uses concurrent.futures.ProcessPoolExecutor.
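The sketch below shows the general ProcessPoolExecutor pattern that a method like run_mp() could follow. The names process and run_mp_sketch, the toy calculation, and submitting one task per split are assumptions, not slitflow's code.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def process(split, param):
    """Toy stand-in for a subclass's static process() (hypothetical)."""
    return [value * param["factor"] for value in split]


def run_mp_sketch(splits, param, n_worker):
    """Apply process() to every split in parallel, keeping split order."""
    with ProcessPoolExecutor(max_workers=n_worker) as executor:
        futures = [executor.submit(process, split, param) for split in splits]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # The guard is required on platforms that start workers with "spawn".
    n_worker = max(1, int((os.cpu_count() or 1) * 0.7))
    print(run_mp_sketch([[1, 2], [3]], {"factor": 10}, n_worker))
    # prints [[10, 20], [30]]
```

Collecting the futures in submission order, rather than with as_completed(), keeps each result aligned with its split.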

post_run()

Implement in each subclass.

This method is used when an additional step is required, for example, adding the index to a calculated data table or adding the calculated results to the param information.
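As a hypothetical example of the first case, the helper below prepends each split's index values to the rows of its calculated table. Tables are modeled as lists of tuples to stay dependency-free; a real subclass would more likely work on pandas.DataFrame objects, and add_index is not a slitflow function.

```python
def add_index(results, index):
    """Prepend the index values of each split to every row of its result.

    results: list of tables, one per split; each table is a list of row tuples.
    index:   list of index tuples, one per split, in the same order.
    """
    if len(results) != len(index):
        raise ValueError("results and index must have the same length")
    return [
        [tuple(split_index) + tuple(row) for row in table]
        for split_index, table in zip(index, results)
    ]


results = [[(10.0,), (20.0,)], [(30.0,)]]
index = [(1, 1), (1, 2)]
print(add_index(results, index))
# [[(1, 1, 10.0), (1, 1, 20.0)], [(1, 2, 30.0)]]
```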

static process(reqs, param={}): Calculation code.

class Pickle(info_path=None)

Bases: Data

Pickle Data class.

Warning

The Pickle Data class is not recommended because the saved binary data is difficult to access. It is recommended to create subsequent export classes that convert the binary data to a table or image.

EXT = '.pickle'

load_data(path)[source]: Load pickle data.

split_data()[source]: Pickle object can not be split.

save_data(data, path)[source]: Save pickle data.