PyCaret AutoML Classification

Type

transform

Class

fire.nodes.pycaret.NodePyCaretAutoMLClassification

Fields

Name	Title	Description
response_column	Target Column	The label column for model fitting
include_algos	Include Algos	The list of algorithms to use for training. By default all algorithms are selected: ‘lr’, ‘knn’, ‘nb’, ‘dt’, ‘svm’, ‘rbfsvm’, ‘gpc’, ‘mlp’, ‘ridge’, ‘rf’, ‘qda’, ‘ada’, ‘gbc’, ‘lda’, ‘et’, ‘xgboost’, ‘lightgbm’, ‘catboost’.
train_size	Train size	Percentage of the data to use for training. The remaining data is used to validate the trained model.
top_n_model	Top N Models	Number of top-ranked models to show on the leaderboard.
imputation_type	Imputation Type	The type of imputation to use. Can be either ‘simple’ or ‘iterative’.
iterative_imputation_iters	Iterative Imputation Iters	Number of iterations. Ignored when imputation_type is not `iterative`.
categorical_features	Categorical Features	It takes a list of strings with column names that are categorical.
categorical_imputation	Categorical Imputation	Missing values in categorical features are imputed with a constant `not available` value. The other available option is `mode`.
categorical_iterative_imputer	Categorical Iterative Imputer	Estimator for iterative imputation of missing values in categorical features.
high_cardinality_features	High Cardinality Features	When categorical features contain many levels, they can be compressed into fewer levels using this parameter. It takes a list of strings with column names that are categorical.
high_cardinality_method	High Cardinality Method	Categorical features with high cardinality are replaced with the frequency of values in each level occurring in the training dataset. The other available method is clustering, which trains the K-Means clustering algorithm on the statistical attributes of the training data and replaces the original value of the feature with the cluster label.
numeric_features	Numeric Features	If the inferred data types are not correct or the silent param is set to True, `numeric features` param can be used to overwrite or define the data types.
numeric_imputation	Numeric Imputation	Missing values in numeric features are imputed with the ‘mean’ value of the feature in the training dataset. The other available options are ‘median’ and ‘zero’.
numeric_iterative_imputer	Numeric Iterative Imputer	Estimator for iterative imputation of missing values in numeric features.
date_features	Date Features	If the inferred data types are not correct or the silent param is set to True, `date features` param can be used to overwrite or define the data types. It takes a list of strings with column names that are DateTime.
ignore_features	Ignore Features	This param can be used to ignore features during model training. It takes a list of strings with column names that are to be ignored.
normalize	Normalize	When set to True, it transforms the numeric features by scaling them to a given range.
normalize_method	Normalize Method	Defines the method for scaling.
transformation	Transformation	When set to True, it applies a power transform to the numeric features to make them more Gaussian-like.
transformation_method	Transformation Method	Defines the method for transformation.
handle_unknown_categorical	Handle Unknown Categorical	When set to True, unknown categorical levels in unseen data are replaced by the most or least frequent level as learned in the training dataset.
unknown_categorical_method	Unknown Categorical Method	Method used to replace unknown categorical levels in unseen data.
pca	PCA	When set to True, dimensionality reduction is applied to project the data into a lower dimensional space using the method defined in `pca method` parameter.
pca_method	PCA Method	Method used to perform PCA when `pca` is set to True.
ignore_low_variance	Ignore Low Variance	When set to True, all categorical features with insignificant variances are removed from the data.
combine_rare_levels	Combine Rare Levels	When set to True, levels in categorical features whose frequency percentile falls below a certain threshold are combined into a single level.
rare_level_threshold	Rare Level Threshold	Percentile distribution below which rare categories are combined. Ignored when `combine rare levels` is not True.
remove_outliers	Remove Outliers	When set to True, outliers are removed from the training data using Singular Value Decomposition.
outliers_threshold	Outliers Threshold	The percentage of outliers to be removed from the training dataset. Ignored when `remove outliers` is not True.
remove_multicollinearity	Remove Multicollinearity	When set to True, features with inter-correlations higher than the defined threshold are removed. When two features are highly correlated with each other, the feature that is less correlated with the target variable is removed. Only considers numeric features.
multicollinearity_threshold	Multicollinearity Threshold	Threshold for correlated features. Ignored when `remove multicollinearity` is not True.
remove_perfect_collinearity	Remove Perfect Collinearity	When set to True, perfect collinearity (features with correlation = 1) is removed from the dataset. When two features are 100% correlated, one of them is randomly removed from the dataset.
path	Path	Model Save Path.
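
The rule described for `remove_multicollinearity` and `multicollinearity_threshold` can be illustrated with a minimal pure-Python sketch. This is not the node's or PyCaret's actual implementation. The helper names `pearson` and `drop_multicollinear`, the sample columns, and the use of absolute Pearson correlation are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def drop_multicollinear(features, target, threshold=0.9):
    """Return the names of the features kept (hypothetical helper).

    features: dict mapping column name -> list of numbers
    target:   list of numeric labels, same length as each column
    """
    names = list(features)
    kept = list(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in kept or b not in kept:
                continue  # one of the pair was already removed
            if abs(pearson(features[a], features[b])) > threshold:
                # Drop the feature that is less correlated with the target.
                # On a tie, this sketch drops the second feature of the pair.
                corr_a = abs(pearson(features[a], target))
                corr_b = abs(pearson(features[b], target))
                kept.remove(a if corr_a < corr_b else b)
    return kept


# Illustrative data: x2 is almost exactly 2 * x1, x3 is unrelated.
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 4, 6, 8, 10, 13],
    "x3": [5, 1, 4, 2, 6, 3],
}
target = [0, 0, 0, 1, 1, 1]

kept = drop_multicollinear(features, target, threshold=0.9)
print(kept)
```

In this example `x1` and `x2` exceed the threshold, and `x2` is dropped because it is slightly less correlated with the target than `x1`. A lower threshold removes more features, so it is worth checking the kept columns after changing it.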