Welcome to my domain! Enjoy my posts, and feel free to contact me at your convenience if you have any questions or need help with anything.
Recent Posts
Statistics for Data Science
Introduction
In the realm of data science, statistics reign supreme. They serve as the cornerstone of every data scientist's journey, shaping each analytical endeavour from inception to fruition. Before delving into the intricate world of algorithms and predictive modelling, one must embark on a voyage of exploration known as exploratory data analysis (EDA). This preliminary step involves deciphering the intricacies of the data landscape through the lens of statistical techniques.
Machine Learning Credit Risk Modelling: A Supervised Learning. Part 6: Advanced Evaluation
In this final part, we will further evaluate the model we chose in the previous chapter (Random Forest) using several evaluation metrics:
- The ROC curve, AUC and the Kolmogorov-Smirnov (KS) statistic, since these are among the most common metrics for evaluating credit risk models.
- K-fold cross-validation, to check the model for overfitting and possible data leakage.
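To make these metrics concrete, here is a minimal, dependency-free Python sketch. It assumes binary labels where 1 marks a default, and a list of model scores (for example, predicted default probabilities). The helper names roc_auc, ks_statistic and kfold_indices are hypothetical; in practice you would more likely reach for scikit-learn's sklearn.metrics.roc_auc_score and sklearn.model_selection.KFold. AUC is computed with the rank-based (Mann-Whitney) formulation, and KS is the largest gap between the cumulative score distributions of the two classes, which is the same as the maximum of |TPR - FPR| over all thresholds.

```python
import random


def roc_auc(y_true, scores):
    """Hypothetical helper: AUC via average ranks (Mann-Whitney U); ties get averaged ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a block of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def ks_statistic(y_true, scores):
    """Hypothetical helper: max gap between the cumulative score distributions of each class."""
    pairs = sorted(zip(scores, y_true))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for idx, (s, y) in enumerate(pairs):
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # only compare once every observation tied at this score has been counted
        if idx + 1 == len(pairs) or pairs[idx + 1][0] != s:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


def kfold_indices(n_samples, k=5, seed=42):
    """Hypothetical helper: yield (train, test) index lists for k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most one
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))       # 0.75
print(ks_statistic(y, s))  # 0.5
```

In a full cross-validation run, you would fit the model on each train fold, score the held-out fold, and compare the AUC and KS values across folds; large swings between folds, or a big gap between training and validation scores, are a warning sign.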
Machine Learning Credit Risk Modelling: A Supervised Learning. Part 5: Modeling - Train, Test and Evaluate
These are the steps we will carry out in this part of the project:
- Divide the dataset into a training set and a testing set.
- Apply imbalance resampling to the training set only.
- Develop several models.
- Evaluate the models.
Train-test split (80% - 20%)
As we know, the df_model DataFrame has 242,059 rows; given this size, we will use an 80% train / 20% test split.
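The split-then-resample order can be sketched in plain Python as below. This is an illustration under assumptions, not the post's actual code: the helper names train_test_split_indices and random_oversample are hypothetical, labels are assumed to be 0/1, the test size is rounded up, and simple random oversampling (duplicating minority-class rows) stands in for whichever resampling technique is used. The key point is that resampling only ever draws from training indices, so no test row leaks into training.

```python
import math
import random


def train_test_split_indices(n_samples, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices and split them (test size rounded up)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * n_samples)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def random_oversample(train_idx, labels, seed=42):
    """Hypothetical helper: duplicate minority rows (with replacement) until classes are equal."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choices(minority, k=len(majority) - len(minority)) if minority else []
    resampled = majority + minority + extra
    rng.shuffle(resampled)
    return resampled


train_idx, test_idx = train_test_split_indices(242059)
print(len(train_idx), len(test_idx))  # 193647 48412
```

With a real DataFrame, these index lists could be passed to df_model.iloc to select the rows, and the test rows would be left untouched by resampling.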
Machine Learning Credit Risk Modelling: A Supervised Learning. Part 4: Feature Scaling and Encoding
Feature Scaling
Feature scaling is a preprocessing step in machine learning that standardizes or normalizes the range of the independent variables (features) of a dataset. The goal is to bring all features to a similar scale, preventing features with large ranges from dominating others and helping many models, especially distance-based and gradient-based ones, learn more effectively. Two common methods for feature scaling are:
- Min-Max Scaling (Normalization): scales the values in a feature to a range between 0 and 1.
- Standardization (Z-score Scaling): scales the values in a feature to have a mean of 0 and a standard deviation of 1.
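Both methods are one-line formulas: min-max scaling computes x' = (x - min) / (max - min), and standardization computes z = (x - mean) / std. The minimal Python sketch below applies them to a single feature column; the function names are hypothetical, the population standard deviation is assumed, and constant columns are mapped to 0 to avoid dividing by zero.

```python
import statistics


def min_max_scale(values):
    """Hypothetical helper: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: no spread to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Hypothetical helper: shift to mean 0 and scale to (population) std 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant feature: no spread to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(standardize([1.0, 2.0, 3.0]))
```

In a real pipeline, the min, max, mean and standard deviation should be computed on the training set only and then reused on the test set; scikit-learn's MinMaxScaler and StandardScaler follow this pattern through their fit and transform methods.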