Korean Society for Technology of Plasticity (KSTP) Hands-on Session 2

Predict Bulk Modulus with PyCaret


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH


1. AI in Materials Science

1.1. The Fourth Paradigm in Materials Science





Image: Ankit Agrawal and Alok Choudhary, "Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science," APL Materials 4, 053208 (2016); https://doi.org/10.1063/1.4946894

1.2. Fast Materials Screening

  • AI-based methods as a pre-screening tool for traditional methods like DFT





Image: Park, H., Bartel, C. J., Ceder, G., Zapol, P., "Layered Transition Metal Oxides as Ca Intercalation Cathodes: A Systematic First-Principles Evaluation," Adv. Energy Mater, 2021, 11, 2101698. https://doi.org/10.1002/aenm.202101698

1.3. Materials Database

  • The Materials Project is a database of material properties predicted with Density Functional Theory (DFT).
  • It provides structural information and property data for inorganic materials
  • https://materialsproject.org/









2. PyCaret





  • PyCaret is ideal for:
    • AutoML
      • Automates the workflow from data preprocessing to model validation





In [ ]:
# The setup(silent = True) calls below assume a PyCaret 2.x release
#!pip install pycaret
In [2]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [3]:
import pycaret
import pycaret.regression
import pycaret.classification
import numpy as np
import pandas as pd

# filter warnings messages from the notebook
import warnings
warnings.filterwarnings('ignore')

3. Regression

3.1. Feature Extraction

Descriptors

  • Extract features using matminer
  • Covalent radius, s$\cdot$p$\cdot$d$\cdot$f orbital valence, oxidation state, space group, density, etc.
  • Bulk modulus is directly related to the interatomic potential and the volume per atom





Dataset

  • Input: Descriptors
  • Output: Bulk modulus
  • The bulk modulus of a substance measures how strongly it resists uniform compression
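For reference, the standard definition of the bulk modulus relates a change in pressure $P$ to the relative change in volume $V$. Reading the target column name `k_vrh` as the Voigt-Reuss-Hill average of the bulk modulus is an assumption based on common Materials Project naming; the notebook itself does not state it.

$$K = -V \frac{\partial P}{\partial V}$$

$$K_{\mathrm{VRH}} = \frac{K_{\mathrm{Voigt}} + K_{\mathrm{Reuss}}}{2}$$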
In [4]:
df_reg = pd.read_csv("/content/drive/MyDrive/kstp/data_files/df_reg.csv", index_col = 0)
df_reg
Out[4]:
k_vrh vpa density MagpieData mean MeltingT MagpieData mean NUnfilled packing fraction MagpieData mode MeltingT MagpieData minimum NUnfilled MagpieData maximum GSvolume_pa MagpieData mean GSvolume_pa MagpieData minimum NValence MagpieData mean NdUnfilled MagpieData mode NUnfilled MagpieData mean NpValence MagpieData avg_dev NpUnfilled MagpieData minimum MeltingT MagpieData maximum MeltingT MagpieData maximum NdValence MagpieData mode GSvolume_pa MagpieData mean MendeleevNumber MagpieData minimum Electronegativity MagpieData minimum MendeleevNumber std_dev oxidation state
0 295.077545 12.957800 13.988541 2496.500000 4.000000 0.570238 1687.00 4.0 20.440000 17.265000 4.0 2.000 4.0 1.000000 2.000000 1687.00 3306.00 6.0 14.090000 67.500000 1.90 57.0 5.656854
1 74.370488 17.868860 6.519289 1401.760000 2.800000 0.788912 1211.40 0.0 54.230000 24.146000 2.0 1.200 3.0 0.800000 1.920000 1050.00 1768.00 10.0 10.245000 56.400000 0.95 8.0 0.000000
2 234.099927 15.435634 17.027465 2016.300000 3.500000 0.686917 2041.40 2.0 16.690000 15.437500 4.0 2.750 2.0 0.000000 0.000000 1941.00 2041.40 9.0 15.020000 58.000000 1.54 43.0 2.529822
3 30.178322 18.871482 3.312854 606.150000 0.500000 0.732266 453.69 0.0 22.890000 18.892917 1.0 0.000 1.0 0.000000 0.000000 453.69 923.00 10.0 16.593333 35.000000 0.98 1.0 0.000000
4 41.336301 19.547245 5.439796 633.502500 0.250000 0.705746 923.00 0.0 25.237586 21.902730 1.0 0.000 0.0 0.000000 0.000000 234.32 923.00 10.0 22.890000 52.000000 0.98 1.0 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7137 62.971555 11.111316 2.536134 470.408750 2.250000 0.587636 54.80 1.0 16.593333 12.401250 1.0 0.875 2.0 2.000000 1.000000 54.80 2183.00 3.0 9.105000 49.625000 0.98 1.0 3.043544
7138 278.173744 8.570548 5.426881 2277.285714 5.857143 0.686866 2348.00 5.0 13.010000 9.674286 3.0 3.000 5.0 0.571429 2.448980 2183.00 2348.00 3.0 7.172500 60.857143 1.63 46.0 3.346640
7139 32.487326 15.415494 1.283597 743.880588 1.705882 0.718573 453.69 1.0 20.440000 17.498431 1.0 0.000 1.0 0.470588 1.439446 453.69 1687.00 0.0 16.593333 19.117647 0.98 1.0 0.000000
7140 44.284962 16.322156 5.419970 388.491429 2.000000 0.345413 54.80 0.0 31.560000 16.214286 6.0 0.000 2.0 3.142857 0.571429 54.80 903.78 10.0 9.105000 83.857143 1.65 69.0 3.082207
7141 71.567048 11.830199 4.046538 622.086667 1.333333 0.675695 54.80 1.0 16.593333 12.256111 1.0 0.000 1.0 1.333333 0.888889 54.80 1357.77 10.0 9.105000 50.666667 0.98 1.0 1.732051

7142 rows × 23 columns

3.2. Pipeline Setup

  • Initializes the training environment and creates the transformation pipeline

  • "setup()" must be called before any other PyCaret function

  • It takes two mandatory parameters: "data" and "target"
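The hold-out split that "setup()" performs with "train_size = 0.9" can be pictured with a small standard-library sketch. This is not PyCaret's own code: the helper name `holdout_split` is hypothetical and the shuffled order will differ from PyCaret's. It only shows how 7142 rows divide into the train and test sizes reported in the setup summary.

```python
import random

def holdout_split(n_rows, train_size, seed):
    """Shuffle row indices and split them into train and hold-out parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    n_train = int(n_rows * train_size)     # floor; the hold-out set gets the remainder
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = holdout_split(n_rows=7142, train_size=0.9, seed=123)
print(len(train_idx), len(test_idx))
```

The two lengths correspond to the "Transformed Train Set" and "Transformed Test Set" rows of the setup output.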

In [5]:
pipeline = pycaret.regression.setup(data = df_reg, 
                                    target = 'k_vrh', 
                                    train_size = 0.9, 
                                    fold = 5, 
                                    silent = True, 
                                    session_id = 123)
Description Value
0 session_id 123
1 Target k_vrh
2 Original Data (7142, 23)
3 Missing Values True
4 Numeric Features 22
5 Categorical Features 0
6 Ordinal Features False
7 High Cardinality Features False
8 High Cardinality Method None
9 Transformed Train Set (6427, 22)
10 Transformed Test Set (715, 22)
11 Shuffle Train-Test True
12 Stratify Train-Test False
13 Fold Generator KFold
14 Fold Number 5
15 CPU Jobs -1
16 Use GPU False
17 Log Experiment False
18 Experiment Name reg-default-name
19 USI 0a0e
20 Imputation Type simple
21 Iterative Imputation Iteration None
22 Numeric Imputer mean
23 Iterative Imputation Numeric Model None
24 Categorical Imputer constant
25 Iterative Imputation Categorical Model None
26 Unknown Categoricals Handling least_frequent
27 Normalize False
28 Normalize Method None
29 Transformation False
30 Transformation Method None
31 PCA False
32 PCA Method None
33 PCA Components None
34 Ignore Low Variance False
35 Combine Rare Levels False
36 Rare Level Threshold None
37 Numeric Binning False
38 Remove Outliers False
39 Outliers Threshold None
40 Remove Multicollinearity False
41 Multicollinearity Threshold None
42 Remove Perfect Collinearity True
43 Clustering False
44 Clustering Iteration None
45 Polynomial Features False
46 Polynomial Degree None
47 Trignometry Features False
48 Polynomial Threshold None
49 Group Features False
50 Feature Selection False
51 Feature Selection Method classic
52 Features Selection Threshold None
53 Feature Interaction False
54 Feature Ratio False
55 Interaction Threshold None
56 Transform Target False
57 Transform Target Method box-cox
INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=False, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=[], target='k_vrh',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_strategy=...
                ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                ('cluster_all', 'passthrough'),
                ('dummy', Dummify(target='k_vrh')),
                ('fix_perfect', Remove_100(target='k_vrh')),
                ('clean_names', Clean_Colum_Names()),
                ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                ('dfs', 'passthrough'), ('pca', 'passthrough')],
         verbose=False)
INFO:logs:setup() succesfully completed......................................

Normalization





  • Preprocessing such as PCA, feature selection, and normalization is possible
  • We can normalize the data by setting "normalize=True" in "setup()"
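As a rough illustration of z-score normalization (the method reported as "zscore" in the setup summary), the sketch below standardizes one column using only the standard library. The helper names are hypothetical, and using the population standard deviation is an assumption about the scaler, not something this notebook confirms.

```python
import statistics

def fit_zscore(train_column):
    """Return the mean and population standard deviation of the training values."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    return mean, std

def apply_zscore(column, mean, std):
    """Scale values with statistics learned from the training data."""
    return [(value - mean) / std for value in column]

# Density values taken from the first five rows of df_reg shown earlier
train_density = [13.988541, 6.519289, 17.027465, 3.312854, 5.439796]
mean, std = fit_zscore(train_density)
scaled = apply_zscore(train_density, mean, std)
print([round(v, 3) for v in scaled])
```

The statistics are fitted on training data only and then reused for any other data, so no information from the hold-out set leaks into the scaling.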
In [6]:
pipeline1 = pycaret.regression.setup(data = df_reg, 
                                     target = 'k_vrh', 
                                     train_size = 0.9, 
                                     fold = 5, 
                                     normalize = True, 
                                     silent = True, 
                                     session_id = 123)
Description Value
0 session_id 123
1 Target k_vrh
2 Original Data (7142, 23)
3 Missing Values True
4 Numeric Features 22
5 Categorical Features 0
6 Ordinal Features False
7 High Cardinality Features False
8 High Cardinality Method None
9 Transformed Train Set (6427, 22)
10 Transformed Test Set (715, 22)
11 Shuffle Train-Test True
12 Stratify Train-Test False
13 Fold Generator KFold
14 Fold Number 5
15 CPU Jobs -1
16 Use GPU False
17 Log Experiment False
18 Experiment Name reg-default-name
19 USI a07b
20 Imputation Type simple
21 Iterative Imputation Iteration None
22 Numeric Imputer mean
23 Iterative Imputation Numeric Model None
24 Categorical Imputer constant
25 Iterative Imputation Categorical Model None
26 Unknown Categoricals Handling least_frequent
27 Normalize True
28 Normalize Method zscore
29 Transformation False
30 Transformation Method None
31 PCA False
32 PCA Method None
33 PCA Components None
34 Ignore Low Variance False
35 Combine Rare Levels False
36 Rare Level Threshold None
37 Numeric Binning False
38 Remove Outliers False
39 Outliers Threshold None
40 Remove Multicollinearity False
41 Multicollinearity Threshold None
42 Remove Perfect Collinearity True
43 Clustering False
44 Clustering Iteration None
45 Polynomial Features False
46 Polynomial Degree None
47 Trignometry Features False
48 Polynomial Threshold None
49 Group Features False
50 Feature Selection False
51 Feature Selection Method classic
52 Features Selection Threshold None
53 Feature Interaction False
54 Feature Ratio False
55 Interaction Threshold None
56 Transform Target False
57 Transform Target Method box-cox
INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=False, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=[], target='k_vrh',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_strategy=...
                                                  target='k_vrh')),
                ('P_transform', 'passthrough'), ('binn', 'passthrough'),
                ('rem_outliers', 'passthrough'), ('cluster_all', 'passthrough'),
                ('dummy', Dummify(target='k_vrh')),
                ('fix_perfect', Remove_100(target='k_vrh')),
                ('clean_names', Clean_Colum_Names()),
                ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                ('dfs', 'passthrough'), ('pca', 'passthrough')],
         verbose=False)
INFO:logs:setup() succesfully completed......................................

3.3. Training





  • Returns the top-performing model according to the metric given in the “sort” parameter
  • Reports performance averaged over 5-fold cross validation (the "fold = 5" set in "setup()")
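To make the fold structure concrete, here is a minimal sketch of K-fold index generation, assuming contiguous, unshuffled folds. The function name `kfold_indices` is hypothetical and this is not PyCaret's implementation; it only shows how the 6427 training rows divide into 5 validation folds.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, validation
        start = stop

fold_sizes = [len(val) for _, val in kfold_indices(n_rows=6427, n_folds=5)]
print(fold_sizes)
```

Each row is used for validation exactly once, and every model is scored on data it did not see during that fold's training.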
In [7]:
top_model = pycaret.regression.compare_models(sort = 'R2')
Model MAE MSE RMSE R2 RMSLE MAPE TT (Sec)
et Extra Trees Regressor 12.0479 594.9934 24.2459 0.8960 0.3211 0.2908 2.096
lightgbm Light Gradient Boosting Machine 13.1333 599.6549 24.3223 0.8953 0.3345 0.3181 0.408
rf Random Forest Regressor 13.4195 698.4265 26.3109 0.8778 0.3296 0.3245 5.834
gbr Gradient Boosting Regressor 15.6848 703.4731 26.3972 0.8770 0.3865 0.4309 1.590
knn K Neighbors Regressor 18.8921 1017.3413 31.8747 0.8216 0.4326 0.4564 0.196
dt Decision Tree Regressor 18.9631 1250.4976 35.2976 0.7808 0.4398 0.3759 0.178
lr Linear Regression 27.6286 1617.5449 40.1885 0.7166 0.6450 1.3883 1.148
ridge Ridge Regression 27.6279 1617.5062 40.1880 0.7166 0.6449 1.3880 0.038
br Bayesian Ridge 27.6245 1617.4076 40.1869 0.7166 0.6440 1.3864 0.038
huber Huber Regressor 27.3125 1641.9605 40.4763 0.7124 0.6599 1.5571 0.170
lasso Lasso Regression 28.2483 1673.2683 40.8707 0.7068 0.6444 1.3864 0.032
ada AdaBoost Regressor 32.6235 1866.3366 43.1900 0.6720 0.7023 1.1851 0.622
en Elastic Net 31.0166 1901.4729 43.5832 0.6667 0.6450 1.1215 0.034
par Passive Aggressive Regressor 30.9325 2130.8661 46.0131 0.6269 0.7398 1.9523 0.032
omp Orthogonal Matching Pursuit 33.7089 2237.7258 47.2829 0.6076 0.6832 1.6168 0.040
lar Least Angle Regression 32.9794 2343.6933 46.7437 0.5925 0.7113 1.5937 0.036
llar Lasso Least Angle Regression 60.9662 5703.7367 75.5084 -0.0005 1.0052 2.6295 0.032
dummy Dummy Regressor 60.9662 5703.7368 75.5084 -0.0005 1.0052 2.6295 0.016
INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 2
INFO:logs:ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',
                    max_depth=None, max_features='auto', max_leaf_nodes=None,
                    max_samples=None, min_impurity_decrease=0.0,
                    min_impurity_split=None, min_samples_leaf=1,
                    min_samples_split=2, min_weight_fraction_leaf=0.0,
                    n_estimators=100, n_jobs=-1, oob_score=False,
                    random_state=123, verbose=0, warm_start=False)
INFO:logs:compare_models() succesfully completed......................................
In [8]:
lasso = pycaret.regression.create_model('lasso', fold = 5)
MAE MSE RMSE R2 RMSLE MAPE
Fold
0 28.5412 1877.9492 43.3353 0.6860 0.6319 1.1045
1 28.4220 1503.8801 38.7799 0.7173 0.6346 1.0561
2 28.4987 1619.6501 40.2449 0.7234 0.6728 1.2390
3 27.8673 1575.6525 39.6945 0.7202 0.6126 0.8133
4 27.9125 1789.2094 42.2990 0.6869 0.6699 2.7191
Mean 28.2483 1673.2683 40.8707 0.7068 0.6444 1.3864
Std 0.2955 138.8760 1.6889 0.0167 0.0234 0.6804
INFO:logs:create_model_container: 19
INFO:logs:master_model_container: 19
INFO:logs:display_container: 3
INFO:logs:Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)
INFO:logs:create_model() succesfully completed......................................
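The Mean and Std rows of the fold table can be checked by hand. The sketch below recomputes them for R2 from the five fold values printed above, assuming that Std is the population standard deviation.

```python
import statistics

# R2 per fold, copied from the create_model('lasso', fold = 5) table above
r2_per_fold = [0.6860, 0.7173, 0.7234, 0.7202, 0.6869]

mean_r2 = statistics.fmean(r2_per_fold)
std_r2 = statistics.pstdev(r2_per_fold)   # population standard deviation

print(round(mean_r2, 4), round(std_r2, 4))
```

The rounded results agree with the Mean (0.7068) and Std (0.0167) reported in the table.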

Turn off the K-fold cross validation

  • With "cross_validation = False", each model is scored once on the hold-out set, so the estimate is noisier and the performance of the model may be overestimated
In [9]:
models = pycaret.regression.compare_models(sort = 'R2', cross_validation = False)
Model MAE MSE RMSE R2 RMSLE MAPE TT (Sec)
lightgbm Light Gradient Boosting Machine 12.5440 451.4136 21.2465 0.9219 0.3249 0.2789 0.29
rf Random Forest Regressor 11.9251 463.2798 21.5239 0.9199 0.3017 0.2681 5.70
et Extra Trees Regressor 11.3981 556.9921 23.6007 0.9036 0.2978 0.2479 2.62
gbr Gradient Boosting Regressor 15.6892 668.0345 25.8464 0.8844 0.3723 0.3922 2.55
knn K Neighbors Regressor 17.9950 827.4488 28.7654 0.8569 0.3869 0.3661 0.03
dt Decision Tree Regressor 18.1102 1040.7476 32.2606 0.8200 0.3962 0.3470 0.15
lr Linear Regression 29.1737 1809.9832 42.5439 0.6869 0.7109 2.3270 0.02
ridge Ridge Regression 29.1751 1810.0604 42.5448 0.6869 0.7108 2.3277 0.02
lar Least Angle Regression 29.1737 1809.9846 42.5439 0.6869 0.7109 2.3270 0.02
br Bayesian Ridge 29.1856 1810.6336 42.5515 0.6868 0.7105 2.3321 0.03
lasso Lasso Regression 30.0671 1897.7419 43.5631 0.6717 0.7275 2.6124 0.02
en Elastic Net 32.2494 1944.7172 44.0989 0.6636 0.7063 1.8929 0.01
huber Huber Regressor 29.3523 2053.9846 45.3209 0.6447 0.7408 2.9267 0.22
ada AdaBoost Regressor 36.0101 2121.9747 46.0649 0.6329 0.8378 1.4045 0.89
par Passive Aggressive Regressor 32.5164 2264.5766 47.5876 0.6082 0.8107 3.0197 0.02
omp Orthogonal Matching Pursuit 36.1562 2731.0672 52.2596 0.5275 0.7631 3.2885 0.01
llar Lasso Least Angle Regression 61.6404 5780.9048 76.0323 -0.0001 1.0266 2.8614 0.02
dummy Dummy Regressor 61.6404 5780.9048 76.0323 -0.0001 1.0266 2.8614 0.00
INFO:logs:create_model_container: 19
INFO:logs:master_model_container: 19
INFO:logs:display_container: 4
INFO:logs:LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
              importance_type='split', learning_rate=0.1, max_depth=-1,
              min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
              n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
              random_state=123, reg_alpha=0.0, reg_lambda=0.0, silent='warn',
              subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
INFO:logs:compare_models() succesfully completed......................................
In [10]:
lasso = pycaret.regression.create_model('lasso', cross_validation = False)
MAE MSE RMSE R2 RMSLE MAPE
0 30.067101 1897.741943 43.563099 0.6717 0.7275 2.6124
INFO:logs:display_container: 5
INFO:logs:Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)
INFO:logs:create_models() succesfully completed......................................

3.4. Model Analysis

In [11]:
pycaret.regression.evaluate_model(lasso)
INFO:logs:Initializing evaluate_model()
INFO:logs:evaluate_model(estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False), fold=None, fit_kwargs=None, plot_kwargs=None, feature_name=None, groups=None, use_train_data=False)

3.5. Hyperparameter Tuning





  • Search strategies include grid search, random search, and Bayesian optimization
  • Here tuning changes the Lasso "alpha" from 1.0 to 0.19 and raises the mean 5-fold R2 from 0.7068 to 0.7158
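Random search, one of the strategies listed above, can be sketched on a toy problem. Everything here is an illustrative assumption rather than PyCaret's internals: the data are synthetic, the one-feature lasso uses a closed-form soft-thresholding solution without intercept, and the helper names are hypothetical.

```python
import random

def fit_lasso_1d(xs, ys, alpha):
    """Closed-form one-feature lasso without intercept (soft-thresholding)."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n
    z = sum(x * x for x in xs) / n
    shrunk = max(abs(rho) - alpha, 0.0)
    return (shrunk if rho > 0 else -shrunk) / z

def r2_score(ys, preds):
    """Coefficient of determination."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def random_search(train, valid, n_iter, seed):
    """Try n_iter random alpha values and keep the one with the best validation R2."""
    rng = random.Random(seed)
    best_alpha, best_r2 = None, float("-inf")
    for _ in range(n_iter):
        alpha = 10 ** rng.uniform(-3, 1)          # log-uniform sample in [0.001, 10]
        w = fit_lasso_1d(*train, alpha)
        score = r2_score(valid[1], [w * x for x in valid[0]])
        if score > best_r2:
            best_alpha, best_r2 = alpha, score
    return best_alpha, best_r2

# Synthetic toy data (hypothetical): y = 2x + noise
data_rng = random.Random(0)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [2 * x + data_rng.gauss(0, 0.1) for x in xs]
train, valid = (xs[:150], ys[:150]), (xs[150:], ys[150:])

best_alpha, best_r2 = random_search(train, valid, n_iter=50, seed=123)
default_w = fit_lasso_1d(*train, alpha=1.0)
default_r2 = r2_score(valid[1], [default_w * x for x in valid[0]])
print(best_alpha, best_r2, default_r2)
```

On this toy data the default "alpha = 1.0" shrinks the coefficient to zero, so any sampled value that leaves a nonzero coefficient scores better, which mirrors the role of "n_iter" candidates in the tuning cell.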
In [12]:
tuned_model = pycaret.regression.tune_model(lasso, 
                                            optimize = 'R2', 
                                            choose_better = True, 
                                            n_iter = 50, 
                                            fold = 5)
MAE MSE RMSE R2 RMSLE MAPE
Fold
0 27.6558 1813.2025 42.5817 0.6968 0.6264 1.0405
1 28.1306 1457.2672 38.1742 0.7261 0.6463 1.0540
2 27.9286 1596.6030 39.9575 0.7274 0.6776 1.2729
3 27.3751 1530.2343 39.1182 0.7283 0.6077 0.7910
4 27.1054 1712.9308 41.3876 0.7003 0.6683 2.7697
Mean 27.6391 1622.0476 40.2438 0.7158 0.6452 1.3856
Std 0.3688 127.3025 1.5752 0.0141 0.0259 0.7086
INFO:logs:create_model_container: 21
INFO:logs:master_model_container: 21
INFO:logs:display_container: 6
INFO:logs:Lasso(alpha=0.19, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)
INFO:logs:tune_model() succesfully completed......................................

Before hyperparameter tuning

In [13]:
lasso
Out[13]:
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)

After hyperparameter tuning

In [14]:
tuned_model
Out[14]:
Lasso(alpha=0.19, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False)

3.6. Prediction





  • Generates predictions (the "Label" column) using a trained model
  • Called without new data, it tests the trained model on the unseen hold-out set (715 rows here)
In [15]:
pycaret.regression.predict_model(tuned_model)
INFO:logs:Initializing predict_model()
INFO:logs:predict_model(estimator=Lasso(alpha=0.19, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False), probability_threshold=None, encoded_labels=True, drift_report=False, raw_score=False, round=4, verbose=True, ml_usecase=MLUsecase.REGRESSION, display=None, drift_kwargs=None)
INFO:logs:Checking exceptions
INFO:logs:Preloading libraries
INFO:logs:Preparing display monitor
Model MAE MSE RMSE R2 RMSLE MAPE
0 Lasso Regression 29.2663 1829.792603 42.7761 0.6834 0.7118 2.4135
Out[15]:
vpa density MagpieData mean MeltingT MagpieData mean NUnfilled packing fraction MagpieData mode MeltingT MagpieData minimum NUnfilled MagpieData maximum GSvolume_pa MagpieData mean GSvolume_pa MagpieData minimum NValence ... MagpieData minimum MeltingT MagpieData maximum MeltingT MagpieData maximum NdValence MagpieData mode GSvolume_pa MagpieData mean MendeleevNumber MagpieData minimum Electronegativity MagpieData minimum MendeleevNumber std_dev oxidation state k_vrh Label
0 0.701711 -0.798078 -0.837051 0.211009 1.895607 -0.898599 -0.368097 4.474749 2.278716 -0.918808 ... -0.834311 -0.157167 -1.604618 -0.717333 -1.143494 -1.529214 -1.211086 0.443816 63.367096 54.765396
1 -0.617081 -0.457117 -0.511846 -0.420388 -0.753925 -0.898599 -0.368097 -0.708838 -1.022111 -0.093024 ... -0.834311 -0.006234 0.843769 -0.717333 0.627458 0.276397 0.239350 0.447951 76.100433 90.682869
2 1.468199 -0.278924 -0.616194 -1.192096 -1.116219 -0.177000 -1.006112 0.236755 0.562947 2.109065 ... 0.128198 -1.454171 0.843769 -0.222338 1.089826 0.541220 1.231753 0.528459 43.414299 21.156166
3 -0.572876 -0.881211 -0.652042 0.304550 -1.339661 -0.900070 -0.368097 -0.708838 -0.900161 -0.093024 ... -0.836273 -0.006234 -1.332575 -0.717333 0.729237 0.276397 0.239350 0.421930 36.862297 93.154945
4 -0.867244 2.043521 1.319934 3.859084 1.163660 1.695573 2.183963 -0.534352 -0.944250 -0.368285 ... 1.207558 0.477466 -1.604618 -0.914362 -0.643324 -0.108800 -0.638545 2.590237 210.491104 276.203522
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
710 -0.107009 -0.698368 -0.290859 1.333494 0.190230 -0.898599 0.269918 0.111275 -0.156392 -0.368285 ... -0.834311 -0.157167 -1.604618 -0.717333 -1.021360 -0.493996 -0.982069 -1.120963 114.602539 94.318146
711 -0.639954 -0.528004 -0.777578 -0.892767 -0.540587 -0.898599 -0.368097 -0.052050 -0.970308 -0.918808 ... -0.834311 -0.699374 0.843769 -0.717333 0.575696 -1.192166 -1.325594 0.487291 119.183334 71.598373
712 0.407570 0.451744 0.210555 0.912562 0.871026 1.091488 -1.006112 -0.261630 0.229826 -0.368285 ... -0.563430 -0.157167 0.843769 0.621347 -1.701824 -0.156949 -0.982069 -1.120963 78.364815 87.682449
713 0.579226 -0.080156 0.558155 0.865792 -0.553754 -0.143086 0.269918 0.236755 0.769799 -0.093024 ... 0.173435 0.216006 0.843769 0.438336 0.473336 -0.229173 0.277519 -1.120963 35.157047 91.986328
714 -0.841389 -0.216850 -1.202937 0.042637 -0.608252 -0.898599 0.269918 -0.312777 -0.692642 -0.368285 ... -0.834311 -1.203634 0.843769 -0.717333 1.250346 0.444921 1.384430 0.787872 184.245941 107.263069

715 rows × 24 columns
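The metrics in the summary row are computed from pairs of true values ("k_vrh") and predictions ("Label"). As a small sketch using the standard definitions, the function below computes MAE, MSE, RMSE and R2 for the first five hold-out rows shown above. With only five rows the numbers will not match the full 715-row summary.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

# First five hold-out rows from the table above: k_vrh (true) and Label (predicted)
k_vrh = [63.367096, 76.100433, 43.414299, 36.862297, 210.491104]
label = [54.765396, 90.682869, 21.156166, 93.154945, 276.203522]

metrics = regression_metrics(k_vrh, label)
print({name: round(value, 4) for name, value in metrics.items()})
```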

Prediction on any data point

In [16]:
rf = pycaret.regression.create_model('rf', fold = 5)
MAE MSE RMSE R2 RMSLE MAPE
Fold
0 13.4856 882.6203 29.7089 0.8524 0.3264 0.3242
1 13.8110 560.0324 23.6650 0.8947 0.3412 0.3628
2 13.0609 563.2316 23.7325 0.9038 0.3299 0.2862
3 13.7995 819.6946 28.6303 0.8545 0.3059 0.2450
4 12.9405 666.5537 25.8177 0.8834 0.3445 0.4045
Mean 13.4195 698.4265 26.3109 0.8778 0.3296 0.3245
Std 0.3633 131.9695 2.4827 0.0209 0.0136 0.0559
INFO:logs:create_model_container: 22
INFO:logs:master_model_container: 22
INFO:logs:display_container: 8
INFO:logs:RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=-1, oob_score=False,
                      random_state=123, verbose=0, warm_start=False)
INFO:logs:create_model() succesfully completed......................................
In [17]:
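# Row 236 of the raw table: column 0 is the target (k_vrh), the rest are features
arbitrary_point = df_reg.iloc[236, 1:].values.reshape(1,-1)

# Note: rf.predict() is called on raw features here; if preprocessing such as
# normalization was enabled in setup(), predict_model(rf, data = ...) applies it first
pred = rf.predict(arbitrary_point)
ground_truth = df_reg.iloc[236,0]

print("Model prediction: ", pred)
print("Ground truth: ", ground_truth)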
Model prediction:  [161.26855656]
Ground truth:  157.38629328397334

4. Classification

  • Classify whether the material is metal or non-metal from composition information
  • Input: Descriptors obtained from composition
  • Output: 0 (Non-metal) or 1 (Metal)
  • Follow the same procedure as for the regression task
In [18]:
import pycaret.classification
In [19]:
df_cls = pd.read_csv("/content/drive/MyDrive/kstp/data_files/df_cls.csv", index_col = 0)
df_cls
Out[19]:
formula is_metal H He Li Be B C N O ... Fm Md No Lr 0-norm 2-norm 3-norm 5-norm 7-norm 10-norm
0 Ag(AuS)2 True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 3 0.600000 0.514256 0.460906 0.441882 0.428730
1 Ag(W3Br7)2 True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 3 0.726873 0.683796 0.668584 0.666919 0.666681
2 Ag0.5Ge1Pb1.75S4 False 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 4 0.621647 0.569761 0.553591 0.551970 0.551738
3 Ag0.5Ge1Pb1.75Se4 False 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 4 0.621647 0.569761 0.553591 0.551970 0.551738
4 Ag2BBr True 0.0 0 0.0 0.0 0.25 0.0 0.0 0.00 ... 0 0 0 0 3 0.612372 0.538609 0.506099 0.501109 0.500098
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4916 ZrTaN3 False 0.0 0 0.0 0.0 0.00 0.0 0.6 0.00 ... 0 0 0 0 3 0.663325 0.614463 0.600984 0.600078 0.600002
4917 ZrTe True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 2 0.707107 0.629961 0.574349 0.552045 0.535887
4918 ZrTi2O True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.25 ... 0 0 0 0 3 0.612372 0.538609 0.506099 0.501109 0.500098
4919 ZrTiF6 True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 3 0.770552 0.752308 0.750039 0.750001 0.750000
4920 ZrW2 True 0.0 0 0.0 0.0 0.00 0.0 0.0 0.00 ... 0 0 0 0 2 0.745356 0.693361 0.670782 0.667408 0.666732

4921 rows × 111 columns
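The element columns (H, He, ...) appear to hold atomic fractions, and the "p-norm" columns appear to be norms of those fractions. That reading is an assumption, but it is consistent with the ZrW2 row above. The sketch below uses hypothetical helper names and does not call matminer.

```python
def element_fractions(composition):
    """Convert element amounts, e.g. {"Zr": 1, "W": 2}, into atomic fractions."""
    total = sum(composition.values())
    return {element: amount / total for element, amount in composition.items()}

def stoichiometric_norm(fractions, p):
    """p-norm of the atomic fractions; p = 0 counts the elements present."""
    if p == 0:
        return sum(1 for f in fractions.values() if f > 0)
    return sum(f ** p for f in fractions.values()) ** (1.0 / p)

zrw2 = element_fractions({"Zr": 1, "W": 2})
for p in (0, 2, 3, 5):
    print(p, round(stoichiometric_norm(zrw2, p), 6))
```

For ZrW2 this gives a 0-norm of 2, a 2-norm of 0.745356 and a 3-norm of 0.693361, matching the last row of the table.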

Use the remaining columns, excluding formula

In [20]:
df_cls = df_cls.iloc[:,1:]
In [21]:
cls = pycaret.classification.setup(data = df_cls,
                                   target = 'is_metal',
                                   train_size = 0.9,
                                   fold = 5,
                                   silent = True,
                                   session_id = 123)
Description Value
0 session_id 123
1 Target is_metal
2 Target Type Binary
3 Label Encoded False: 0, True: 1
4 Original Data (4921, 110)
5 Missing Values False
6 Numeric Features 85
7 Categorical Features 24
8 Ordinal Features False
9 High Cardinality Features False
10 High Cardinality Method None
11 Transformed Train Set (4428, 109)
12 Transformed Test Set (493, 109)
13 Shuffle Train-Test True
14 Stratify Train-Test False
15 Fold Generator StratifiedKFold
16 Fold Number 5
17 CPU Jobs -1
18 Use GPU False
19 Log Experiment False
20 Experiment Name clf-default-name
21 USI 17db
22 Imputation Type simple
23 Iterative Imputation Iteration None
24 Numeric Imputer mean
25 Iterative Imputation Numeric Model None
26 Categorical Imputer constant
27 Iterative Imputation Categorical Model None
28 Unknown Categoricals Handling least_frequent
29 Normalize False
30 Normalize Method None
31 Transformation False
32 Transformation Method None
33 PCA False
34 PCA Method None
35 PCA Components None
36 Ignore Low Variance False
37 Combine Rare Levels False
38 Rare Level Threshold None
39 Numeric Binning False
40 Remove Outliers False
41 Outliers Threshold None
42 Remove Multicollinearity False
43 Multicollinearity Threshold None
44 Remove Perfect Collinearity True
45 Clustering False
46 Clustering Iteration None
47 Polynomial Features False
48 Polynomial Degree None
49 Trignometry Features False
50 Polynomial Threshold None
51 Group Features False
52 Feature Selection False
53 Feature Selection Method classic
54 Features Selection Threshold None
55 Feature Interaction False
56 Feature Ratio False
57 Interaction Threshold None
58 Fix Imbalance False
59 Fix Imbalance Method SMOTE
INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=False, features_todrop=[],
                                      id_columns=[],
                                      ml_usecase='classification',
                                      numerical_features=[], target='is_metal',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_st...
                ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                ('cluster_all', 'passthrough'),
                ('dummy', Dummify(target='is_metal')),
                ('fix_perfect', Remove_100(target='is_metal')),
                ('clean_names', Clean_Colum_Names()),
                ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                ('dfs', 'passthrough'), ('pca', 'passthrough')],
         verbose=False)
INFO:logs:setup() succesfully completed......................................

Classification metrics





  • Accuracy, AUC, Recall, F1, etc.
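  • For all metrics, the closer to 1, the better the classification performance (an AUC of 0.0000 is shown for models that do not output class probabilities, such as ridge and svm)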
In [22]:
models = pycaret.classification.compare_models(sort = 'Accuracy')
Model Accuracy AUC Recall Prec. F1 Kappa MCC TT (Sec)
lightgbm Light Gradient Boosting Machine 0.9063 0.9644 0.9003 0.9118 0.9058 0.8126 0.8129 0.236
et Extra Trees Classifier 0.9006 0.9638 0.8867 0.9127 0.8993 0.8013 0.8020 0.702
rf Random Forest Classifier 0.8970 0.9598 0.8822 0.9096 0.8956 0.7941 0.7946 0.718
dt Decision Tree Classifier 0.8740 0.8740 0.8637 0.8825 0.8727 0.7480 0.7487 0.090
lda Linear Discriminant Analysis 0.8726 0.9383 0.8606 0.8825 0.8712 0.7453 0.7458 0.126
knn K Neighbors Classifier 0.8713 0.9289 0.8538 0.8853 0.8691 0.7426 0.7432 0.638
ridge Ridge Classifier 0.8701 0.0000 0.8601 0.8784 0.8690 0.7403 0.7408 0.026
lr Logistic Regression 0.8650 0.9331 0.8660 0.8648 0.8652 0.7299 0.7303 0.244
gbc Gradient Boosting Classifier 0.8629 0.9379 0.8764 0.8540 0.8649 0.7258 0.7264 0.818
ada Ada Boost Classifier 0.8589 0.9337 0.8642 0.8555 0.8596 0.7177 0.7181 0.350
svm SVM - Linear Kernel 0.8496 0.0000 0.8949 0.8313 0.8572 0.6991 0.7107 0.108
nb Naive Bayes 0.7660 0.8791 0.6367 0.8597 0.7308 0.5322 0.5515 0.048
qda Quadratic Discriminant Analysis 0.5452 0.5756 0.5143 0.7212 0.4123 0.0898 0.1255 0.092
dummy Dummy Classifier 0.5005 0.5000 1.0000 0.5005 0.6671 0.0000 0.0000 0.036
INFO:logs:create_model_container: 14
INFO:logs:master_model_container: 14
INFO:logs:display_container: 2
INFO:logs:LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_depth=-1,
               min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
               n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
               random_state=123, reg_alpha=0.0, reg_lambda=0.0, silent='warn',
               subsample=1.0, subsample_for_bin=200000, subsample_freq=0)
INFO:logs:compare_models() succesfully completed......................................
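The Dummy Classifier row in the table above is a useful sanity check on how the columns relate. Its recall of 1.0000 and precision of 0.5005 suggest it labels every sample as positive (an interpretation, not something the output states). The short sketch below recomputes its F1 from the reported precision and recall only.

```python
# Values copied from the Dummy Classifier row of the comparison table
precision = 0.5005
recall = 1.0000

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # matches the F1 column of the dummy row (0.6671)
```

An F1 of about 0.67 with Kappa and MCC of exactly 0 shows why a single score can mislead on a balanced two-class problem: every real model in the table should be judged against this baseline.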
In [23]:
rf = pycaret.classification.create_model('rf', fold = 5)
Fold Accuracy AUC Recall Prec. F1 Kappa MCC
0 0.8883 0.9587 0.8626 0.9097 0.8855 0.7765 0.7776
1 0.8905 0.9518 0.8939 0.8879 0.8909 0.7810 0.7811
2 0.8962 0.9639 0.8916 0.8998 0.8957 0.7923 0.7924
3 0.9062 0.9622 0.8849 0.9245 0.9043 0.8124 0.8132
4 0.9040 0.9622 0.8781 0.9262 0.9015 0.8079 0.8090
Mean 0.8970 0.9598 0.8822 0.9096 0.8956 0.7941 0.7946
Std 0.0071 0.0043 0.0113 0.0146 0.0068 0.0142 0.0144
INFO:logs:create_model_container: 15
INFO:logs:master_model_container: 15
INFO:logs:display_container: 3
INFO:logs:RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=-1, oob_score=False, random_state=123, verbose=0,
                       warm_start=False)
INFO:logs:create_model() succesfully completed......................................
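The Mean and Std rows of the 5-fold table above can be reproduced from the per-fold values. The sketch below does this for the Accuracy column with NumPy; that the Std row is a population standard deviation (`ddof=0`) is inferred from the arithmetic, not stated in the output.

```python
import numpy as np

# Accuracy of folds 0-4, copied from the create_model output above
fold_accuracy = np.array([0.8883, 0.8905, 0.8962, 0.9062, 0.9040])

mean_accuracy = fold_accuracy.mean()
std_population = fold_accuracy.std(ddof=0)  # divides by n
std_sample = fold_accuracy.std(ddof=1)      # divides by n - 1, shown for comparison

print(f"mean            = {mean_accuracy:.4f}")
print(f"std (ddof=0)    = {std_population:.4f}")
print(f"std (ddof=1)    = {std_sample:.4f}")
```

A small standard deviation relative to the mean indicates that the random forest performs consistently across the folds.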
In [24]:
pycaret.classification.evaluate_model(rf)
In [25]:
pycaret.classification.predict_model(rf)
Model Accuracy AUC Recall Prec. F1 Kappa MCC
0 Random Forest Classifier 0.9047 0.9627 0.8894 0.9087 0.8989 0.8087 0.8089
Out[25]:
H Li Be B C N O F Na Mg ... Fm_0 Md_0 No_0 Lr_0 0-norm_2 0-norm_3 0-norm_4 is_metal Label Score
0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 1.0 0.0 True True 0.98
1 0.0 0.0 0.0 0.0 0.0 0.0 0.636364 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 1.0 0.0 True True 0.86
2 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 1.0 0.0 True True 0.87
3 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 0.0 1.0 False False 1.00
4 0.0 0.0 0.0 0.0 0.0 0.0 0.250000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 0.0 1.0 False False 0.71
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
488 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 1.0 0.0 True True 0.99
489 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 1.0 0.0 0.0 True False 0.75
490 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 0.0 1.0 False False 0.98
491 0.0 0.0 0.0 0.0 0.0 0.0 0.250000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 0.0 1.0 False False 0.73
492 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.0 0.0 ... 1.0 1.0 1.0 1.0 0.0 1.0 0.0 True True 0.56

493 rows × 112 columns
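In the output above, `is_metal` is the true class, `Label` is the predicted class, and `Score` is understood here to be the predicted probability of that label (an assumption based on PyCaret 2.x behaviour). As a minimal sketch, the snippet below rebuilds only the ten rows visible in the truncated display and compares the two columns; the resulting figure describes those ten rows, not the full 493-row hold-out set.

```python
import pandas as pd

# (row index, is_metal, Label, Score) for the ten rows shown in the output
rows = [
    (0, True, True, 0.98),
    (1, True, True, 0.86),
    (2, True, True, 0.87),
    (3, False, False, 1.00),
    (4, False, False, 0.71),
    (488, True, True, 0.99),
    (489, True, False, 0.75),
    (490, False, False, 0.98),
    (491, False, False, 0.73),
    (492, True, True, 0.56),
]
preds = pd.DataFrame(rows, columns=["idx", "is_metal", "Label", "Score"]).set_index("idx")

# A prediction is correct when the predicted label equals the true class
preds["correct"] = preds["is_metal"] == preds["Label"]

subset_accuracy = preds["correct"].mean()
misclassified = preds.index[~preds["correct"]].tolist()

print("accuracy on the visible rows:", subset_accuracy)
print("misclassified row indices:", misclassified)
```

Applying the same comparison to the full DataFrame returned by `predict_model` should reproduce the hold-out Accuracy reported in the summary row.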
