Comparative Analysis of Naive Bayes and K-Nearest Neighbors Algorithms for Customer Churn Prediction: A Kaggle Dataset Case Study
Abstract
This research compares the Naive Bayes and K-Nearest Neighbors (K-NN) algorithms for predicting customer churn using a Kaggle dataset. Data preprocessing includes encoding categorical variables and applying the SMOTE method to balance the class distribution. Naive Bayes shows improved results on SMOTE-balanced data, while K-NN experiences a notable drop in performance. Although K-NN retains an accuracy of around 0.56, its Precision, Recall, and F1-Score decrease significantly. Conversely, Naive Bayes on balanced data shows a lower F1-Score for the minority class ('exited') but maintains favorable performance overall. In conclusion, Naive Bayes is more robust to class imbalance than K-NN, particularly on balanced data. The choice of model depends on the specific goals in addressing class imbalance. Further research should optimize K-NN parameters to improve performance on imbalanced data, with attention to variations in data scale and distribution.
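For orientation, the following is a minimal sketch of the kind of pipeline the abstract describes, using scikit-learn and imbalanced-learn. The file name, the 'Exited' target column, and the choice of k=5 are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the compared workflow: encode categoricals, balance with SMOTE,
# then train and evaluate Naive Bayes and K-NN on the same held-out split.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("churn.csv")                    # hypothetical dataset file
X = pd.get_dummies(df.drop(columns=["Exited"]))  # encode categorical variables
y = df["Exited"]                                 # assumed binary churn target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority ('exited') class in the training split only,
# so the test set keeps its original class distribution.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("K-NN", KNeighborsClassifier(n_neighbors=5))]:
    model.fit(X_bal, y_bal)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```

Reporting per-class Precision, Recall, and F1-Score alongside accuracy, as done here, is what exposes the gap the abstract notes: a model can hold its accuracy while its minority-class metrics collapse.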
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.