Configuration

The constructor of SapientML class consumes various parameters depending on plugin installation. Here we show the parameters you can assign at the constructor of SapientML in cases of each model_type assigned.

Model types

sapientml provides the plugin mechanism for generating source code that is different from the original algorithm of sapientml in utilizing machine learning models and preprocessing components. Each plugin has a unique model_type, and users can choose one of them as a parameter of the constructor of SapientML class. The default value of model_type is sapientml, which is provided by sapientml_core plugin.

Parameters for `sapientml`

target_columns (list[str]): Names of target columns.
task_type (classification’, ‘regression’, or None) = None: Identifies the task type from classification or regression, or automatically suggests it if set to None
adaptation_metric (str) = ‘f1’ if task_type is ‘classification’, ‘r2’ if ‘regression’: Metric for evaluation. f1, auc, ROC_AUC, accuracy, Gini, LogLoss, MCC (Matthews correlation coefficient), QWK (Quadratic weighted kappa) are available for classification. r2, RMSLE, RMSE, MAE are available for regression.
split_method (‘random’, ‘time’, or ‘group’) = ‘random’: Method of train-test split. random uses random split. time requires split_column_name. This sorts the data rows based on the column, and then splits data. group requires split_column_name. This splits the data so that rows with the same value of split_column_name are not placed in both training and test data.
split_seed (int) = 17: Random seed for train-test split. Ignored when split_method='time'.
split_train_size (float) = 0.75: The ratio of training size to input data. Ignored when split_method='time'.
split_column_name (str or None) = None: Name of the column used to split. Ignored when split_method='random'
time_split_num (int) = 5: Passed to n_splits of TimeSeriesSplit. Valid only when split_method='time'.
time_split_index (int) = 4: The index of the split from TimeSeriesSplit. Valid only when split_method='time'.
split_stratification (bool or None) = None: To perform stratification in train-test split. Valid only when task_type='classification'.
initial_timeout (int) = 600: Timelimit to execute each generated script. Ignored when hyperparameter_tuning=True and hyperparameter_tuning_timeout is set.
timeout_for_test (int) = 0: Timelimit to execute test script (final_script) and Visualization.
cancel (CancellationToken or None) = None: Object to interrupt evaluations.
project_name (str or None) = None: Project name.
debug (bool) = False: Debug mode or not.
use_pos_list (list[str]) = [“名詞”, “動詞”, “助動詞”, “形容詞”, “副詞”]: List of parts-of-speech to be used during text analysis. This variable is used for japanese texts analysis. Select the part of speech below. “名詞”, “動詞”, “形容詞”, “形容動詞”, “副詞”.
use_word_stemming (bool) = True: Specify whether or not word stemming is used. This variable is used for japanese texts analysis.
n_models (int) = 3: Number of output models to be tried.
seed_for_model (int) = 42: Random seed for models such as RandomForestClassifier.
id_columns_for_prediction (list[str] or None) = None: Name of the dataframe columns that outputs the prediction result.
use_word_list (list[str], dict[str, list[str]], or None) = None: List of words to be used as features when generating explanatory variables from text. If dict type is specified, key must be a column name and value must be a list of words.
hyperparameter_tuning (bool) = False: On/Off of hyperparameter tuning.
hyperparameter_tuning_n_trials (int) = 10: The number of trials of hyperparameter tuning.
hyperparameter_tuning_timeout (int) = 0: Time limit for hyperparameter tuning in each generated script. Ignored when hyperparameter_tuning is False.
hyperparameter_tuning_random_state (int) = 1023: Random seed for hyperparameter tuning.
predict_option (‘default’ or ‘probability’) = ‘default’: Specify predict method (default: predict(), probability: predict_proba().)
permutation_importance (bool) = True: On/Off of outputting permutation importance calculation code.
add_explanation (bool) = False: If True, outputs ipynb files including EDA and explanation.

Configuration

Model types

Parameters for sapientml

Parameters for `sapientml`