Comparative Analysis of Naive Bayes and K-Nearest Neighbors Algorithms for Customer Churn Prediction: A Kaggle Dataset Case Study
Abstract
This research compares the Naive Bayes and K-Nearest Neighbors (K-NN) algorithms for predicting customer churn using a Kaggle dataset. Data preprocessing includes encoding categorical variables and applying the SMOTE method to balance the class distribution. Naive Bayes shows improved results on SMOTE-balanced data, while K-NN experiences a notable drop in performance. Although K-NN retains an accuracy of around 0.56, its Precision, Recall, and F1-Score decrease significantly. Conversely, Naive Bayes on balanced data shows a lower F1-Score for the minority class ('exited') but maintains favorable performance overall. In conclusion, Naive Bayes is more robust to class imbalance than K-NN, particularly on balanced data. The choice of model depends on the specific goals in addressing class imbalance. Further research should optimize K-NN parameters to improve performance on imbalanced data, with attention to variations in data scale and distribution.
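For orientation, the following is a minimal sketch of the kind of pipeline the abstract describes, using scikit-learn and imbalanced-learn. The file name, the 'Exited' target column, and the choice of k=5 are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the compared workflow: encode categoricals, balance with SMOTE,
# then train and evaluate Naive Bayes and K-NN on the same held-out split.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("churn.csv")                    # hypothetical dataset file
X = pd.get_dummies(df.drop(columns=["Exited"]))  # encode categorical variables
y = df["Exited"]                                 # assumed binary churn target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the minority ('exited') class in the training split only,
# so the test set keeps its original class distribution.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

for name, model in [("Naive Bayes", GaussianNB()),
                    ("K-NN", KNeighborsClassifier(n_neighbors=5))]:
    model.fit(X_bal, y_bal)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```

Reporting per-class Precision, Recall, and F1-Score alongside accuracy, as done here, is what exposes the gap the abstract notes: a model can hold its accuracy while its minority-class metrics collapse.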
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.