Posts
Statistics for Data Science
Introduction
In the realm of data science, statistics reign supreme. They serve as the cornerstone of every data scientist’s journey, shaping each analytical endeavour from inception to fruition. Before delving into the intricate world of algorithms and predictive modelling, one must embark on a voyage of exploration known as exploratory data analysis (EDA). This preliminary step involves deciphering the intricacies of the data landscape through the lens of statistical techniques.
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 6: Advance Evaluation
Part 5: Modeling - Train, Test and Evaluate
In this final part, we will further evaluate the model we chose in the previous part (Random Forest) using several evaluation metrics. The metrics we will use are:
ROC, AUC, and the KS test, since these are the most common metrics for evaluating credit risk models.
K-fold cross-validation, to check for overfitting and for signs of data leakage.
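As a rough illustration of the metrics listed above, the sketch below computes ROC AUC, the KS statistic, and 5-fold cross-validated AUC with scikit-learn. The synthetic dataset, the hyperparameters, and all variable names are assumptions made for this example; they are not the data or settings used in the series.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic, imbalanced stand-in for the credit data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Predicted probability of the positive class (class 1).
proba = model.predict_proba(X_test)[:, 1]

# ROC curve and the area under it.
fpr, tpr, _ = roc_curve(y_test, proba)
auc = roc_auc_score(y_test, proba)

# KS statistic: the largest gap between the cumulative score distributions
# of the two classes, which equals max(TPR - FPR) along the ROC curve.
ks = float(np.max(tpr - fpr))

# 5-fold stratified cross-validation on the training set, scored by AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

print(f"AUC: {auc:.3f}  KS: {ks:.3f}")
print(f"CV AUC: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

A large gap between the cross-validated AUC and the test AUC, or a wide spread across folds, is one hint that the model may be overfitting.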
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 5: Modeling - Train, Test and Evaluate
Part 4: Feature Scaling and Encoding
These are the steps we will conduct in this part of the project:
Divide our dataset into a training set and a testing set.
Apply imbalance resampling only to the training set.
Develop several models.
Evaluate the models.
Train - test split (80% - 20%)
As we know, the df_model DataFrame has 242059 rows. Given this size, we will use an 80% train / 20% test split.
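The first two steps above can be sketched as follows. This is a minimal example on synthetic data: the excerpt does not say which resampling method the series uses, so random oversampling of the minority class (via `sklearn.utils.resample`) is used here purely as a stand-in, and all variable names are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for df_model (assumption): 1000 rows, roughly 10% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)

# 80% train / 20% test, stratified so both sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Resample the training set only; the test set keeps its original distribution.
minority = y_train == 1  # assumes class 1 is the minority class
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=n_majority, random_state=42,
)
X_train_bal = np.vstack([X_train[~minority], X_min_up])
y_train_bal = np.concatenate([y_train[~minority], y_min_up])

print("test size:", len(y_test))
print("balanced train class counts:", np.bincount(y_train_bal))
```

Resampling after the split keeps duplicated or synthetic rows out of the test set, so the evaluation reflects the original class balance.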
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 4: Feature Scaling and Encoding
Part 3: Feature Engineering and Selection
Feature Scaling
Feature scaling is a preprocessing step in machine learning that standardizes or normalizes the range of independent variables or features of the dataset. The goal is to bring all features to a similar scale, preventing some features from dominating others and ensuring that the model can learn more effectively. Two common methods for feature scaling are:
Min-Max Scaling (Normalization): Scales the values in a feature to a range between 0 and 1.
Standardization (Z-score Normalization): Scales the values in a feature to have a mean of 0 and a standard deviation of 1.
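The two methods can be written out directly in NumPy. This is a small sketch on made-up income values; the function names and the data are assumptions for illustration only.

```python
import numpy as np

def min_max_scale(x):
    """Rescale values to the range [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    # Note: a constant column (max == min) would divide by zero here.
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Rescale values to mean 0 and standard deviation 1: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical feature values with one large outlier.
income = np.array([20_000, 35_000, 50_000, 120_000])

income_minmax = min_max_scale(income)
income_std = standardize(income)

print(income_minmax)
print(income_std)
```

In practice, scikit-learn's `MinMaxScaler` and `StandardScaler` perform the same transformations and can be fitted on the training set and then applied to the test set.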
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 3: Feature Engineering and Selection
Part 2: Defining The Label and Making Target Column
What Are Feature Engineering and Feature Selection? Why Are They Important?
Feature Engineering:
Feature engineering involves creating new features or modifying existing features in a dataset to improve the performance of machine learning models. It’s a crucial step in the data preprocessing phase. The goal is to provide the model with more meaningful and relevant information, helping it to better understand patterns and relationships within the data.
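To make the idea concrete, here is a minimal pandas sketch that derives two new features from existing columns. The column names (`loan_amnt`, `annual_inc`, `issue_d`, `earliest_cr_line`) and the values are hypothetical and chosen only for illustration; they are not taken from the dataset used in the series.

```python
import pandas as pd

# Hypothetical loan records (assumed column names and values).
df = pd.DataFrame({
    "loan_amnt": [10000, 5000, 20000],
    "annual_inc": [50000, 25000, 40000],
    "issue_d": pd.to_datetime(["2015-01-01", "2016-06-01", "2014-03-01"]),
    "earliest_cr_line": pd.to_datetime(["2005-01-01", "2010-06-01", "2000-03-01"]),
})

# New feature 1: loan size relative to income.
df["loan_to_income"] = df["loan_amnt"] / df["annual_inc"]

# New feature 2: length of credit history at the time the loan was issued.
df["credit_history_years"] = (
    (df["issue_d"] - df["earliest_cr_line"]).dt.days / 365.25
)

print(df[["loan_to_income", "credit_history_years"]])
```

Both derived columns express a relationship (a ratio and a duration) that the raw columns only contain implicitly.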
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 2: Defining The Label and Making Target Column
Part 1: Understanding The Data
What Are the Label and the Target Column?
In machine learning, the concepts of a label and a target column are crucial for supervised learning tasks. Let’s break down these terms:
Label:
In machine learning, a label refers to the output or the dependent variable that the model is trying to predict. It represents the “answer” or the expected outcome for each input example in the dataset.
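A small pandas sketch of how a label can be stored as a target column is shown below. The `loan_status` values and the choice of which statuses count as "bad" are assumptions for illustration; the series defines its own label in the full post.

```python
import pandas as pd

# Hypothetical loan records (assumed column names and status values).
df = pd.DataFrame({
    "loan_amnt": [10000, 5000, 20000, 7500],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Default"],
})

# Assumed labelling rule: these statuses are treated as bad loans (label 1).
bad_statuses = {"Charged Off", "Default"}
df["target"] = df["loan_status"].isin(bad_statuses).astype(int)

# Separate the inputs (features) from the label the model should predict.
X = df.drop(columns=["loan_status", "target"])
y = df["target"]

print(y.tolist())
```

The raw status column is dropped from the features because it would reveal the answer to the model.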
Machine Learning Credit Risk Modelling : A Supervised Learning. Part 1: Understanding The Data
Business Understanding
Credit risk refers to the potential that a borrower may fail to meet their financial obligations, such as repaying a loan or meeting interest payments. It is a fundamental component of lending and financial services, and understanding and managing credit risk is crucial for banks, financial institutions, and lenders.
Assessing credit risk can help financial institutions manage and minimize the probability of false positives (i.e., lending money to someone who can’t repay) and false negatives (i.