my_station

Welcome to my domain. Enjoy my posts, and feel free to contact me if you have any questions or need help with anything.

family_snowfall

Statistics for Data Science

Introduction: In the realm of data science, statistics reign supreme. They serve as the cornerstone of every data scientist’s journey, shaping each analytical endeavour from inception to fruition. Before delving into the intricate world of algorithms and predictive modelling, one must embark on a voyage of exploration known as exploratory data analysis (EDA). This preliminary step involves deciphering the intricacies of the data landscape through the lens of statistical techniques.

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 6: Advance Evaluation

Previous: Part 5: Modeling - Train, Test and Evaluate. In this final part, we further evaluate the model chosen in the previous part (Random Forest) using several evaluation metrics: the ROC curve, AUC, and the KS test, which are among the most common metrics for evaluating credit risk models. We also run k-fold cross-validation to check for data leakage and overfitting.
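To make these metrics concrete, here is a minimal, dependency-free sketch of AUC, the KS statistic, and k-fold index splitting. It is not the code from the post: the function names (`roc_auc`, `ks_statistic`, `k_fold_indices`) and the toy labels and scores are made up for illustration, and in practice a library such as scikit-learn or SciPy would normally be used instead.

```python
from bisect import bisect_right


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    gap = 0.0
    for threshold in sorted(set(pos + neg)):
        cdf_pos = bisect_right(pos, threshold) / len(pos)
        cdf_neg = bisect_right(neg, threshold) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy labels (1 = bad loan, 0 = good loan) and made-up model scores.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]

auc = roc_auc(y_true, scores)
ks = ks_statistic(y_true, scores)
folds = list(k_fold_indices(len(y_true), k=4))

print(f"AUC = {auc}, KS = {ks}, number of folds = {len(folds)}")
```

In a real evaluation the rows would usually be shuffled (and often stratified by label) before being cut into folds, and the model would be refit on each training fold and scored on the matching test fold. Scores that vary widely across folds are one sign of overfitting.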

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 5: Modeling - Train, Test and Evaluate

Previous: Part 4: Feature Scaling and Encoding. These are the steps we will conduct in this part of the project: (1) divide the dataset into a training set and a testing set; (2) apply imbalance resampling to the training set only; (3) develop several models; (4) evaluate the models. Train-test split (80% - 20%): As we know, the df_model DataFrame has 242059 rows. Given this size, we will use an 80% train / 20% test split.
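The first two steps can be sketched as follows. This is a simplified illustration and not the post's actual code: the excerpt does not say which resampling method is used, so random oversampling of the minority class is assumed here, and the helper names (`train_test_split_rows`, `oversample_minority`) and toy rows are hypothetical stand-ins for df_model.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def oversample_minority(rows, label_of, seed=42):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    rng.shuffle(balanced)
    return balanced


# Toy data: (feature, label) rows with 10% positives. Hypothetical, not df_model.
rows = [(i, 1 if i % 10 == 0 else 0) for i in range(1000)]

# Split first, then resample the training rows only, so the test set keeps
# the original class balance and contains no duplicated training rows.
train, test = train_test_split_rows(rows, test_fraction=0.2)
train_balanced = oversample_minority(train, label_of=lambda row: row[1])

print(len(train), len(test), len(train_balanced))
```

Resampling after the split matters: if rows were duplicated before splitting, copies of the same row could land in both sets and inflate the test scores. Applied to 242059 rows, an 80/20 split gives roughly 193,647 training rows and 48,412 test rows, depending on how the split is rounded.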

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 4: Feature Scaling and Encoding

Previous: Part 3: Feature Engineering and Selection. Feature Scaling: Feature scaling is a preprocessing step in machine learning that standardizes or normalizes the range of the independent variables (features) of a dataset. The goal is to bring all features to a similar scale, so that features with large ranges do not dominate the others and the model can learn more effectively. Two common methods for feature scaling are: Min-Max Scaling (Normalization): scales the values in a feature to a range between 0 and 1. Standardization (Z-score): scales the values in a feature to have a mean of 0 and a standard deviation of 1.
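A small sketch of the two scaling methods, written with the standard library only so that the arithmetic is visible. The function names and the `loan_amounts` values are made up for illustration; the post itself may well use a library scaler instead.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1: (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column.
loan_amounts = [1000, 2000, 3000, 4000, 5000]

scaled = min_max_scale(loan_amounts)
z_scores = standardize(loan_amounts)

print(scaled)    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_scores)
```

To avoid leaking information from the test set, the minimum, maximum, mean, and standard deviation should be computed on the training data only and then reused to transform the test data.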

The Best Way to Learn is to Do

There is always room for improvement.

Recent Posts

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 3: Feature Engineering and Selection

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 2: Defining The Label and Making Target Column

Machine Learning Credit Risk Modelling : A Supervised Learning. Part 1: Understanding The Data