Finding Category Value Using Mean Shift Clustering to Optimize Naïve Bayes Classification

Authors

  • Berlian Rahmy Lidiawaty: Information System Study Program, School of Industrial and System Engineering, Telkom University, Surabaya Campus, Jl. Ketintang No.156, Surabaya 60231, East Java, Indonesia
  • Arip Ramadan: Information System Study Program, School of Industrial and System Engineering, Telkom University, Surabaya Campus, Jl. Ketintang No.156, Surabaya 60231, East Java, Indonesia
  • Tita Ayu Rospricilia: Information System Study Program, School of Industrial and System Engineering, Telkom University, Surabaya Campus, Jl. Ketintang No.156, Surabaya 60231, East Java, Indonesia
  • Najma Attaqiya Alya: Data Science Technology Study Program, Department of Engineering, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga, Surabaya, 60115, Indonesia; Institute of Statistics and Data Science, Faculty of Science, National Tsing Hua University, Hsinchu, Taiwan
  • Dwi Rantini: Data Science Technology Study Program, Department of Engineering, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga, Surabaya, 60115, Indonesia; Research Group of Data-Driven Decision Support System, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga, Surabaya, 60115, Indonesia
  • Alhassan Sesay: Faculty of Transformative Education, the United Methodist University, Sierra Leone

DOI:

https://doi.org/10.19139/soic-2310-5070-3161

Keywords:

Naïve Bayes, Mean Shift Clustering, Classification, Optimize, Education

Abstract

The Naïve Bayes classifier is a simple classification method that makes predictions quickly and accurately by treating the independent variables as conditionally independent given the class. However, the categorical Naïve Bayes classifier requires each independent variable to be divided into categories, while some attributes in real data are continuous and uncategorized. This study therefore proposes a measurable and precise model for categorizing such variables. The main objective is to develop a categorization model for independent variables using the Mean Shift clustering algorithm in order to optimize the performance of the Naïve Bayes classifier. The proposed model was evaluated in experiments on two datasets. The first contains 191 records with 4 attributes and 6 classes; the second contains 2,000 records with 7 attributes and 2 classes. In both datasets, several attributes were initially uncategorized and were categorized using Mean Shift clustering, which grouped them into meaningful categories. On the first dataset, the accuracy of the proposed categorical Naïve Bayes classifier reached 80.1%, an improvement of 5.74%. On the second dataset, the accuracy increased to 84.25%, a 3% improvement. The results of this research are expected to contribute to the field of education, especially in the area of machine learning.
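
The pipeline described in the abstract (cluster a continuous attribute with Mean Shift, use the cluster index as its category, then fit a categorical Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the flat kernel, the bandwidth of 8, the Laplace smoothing, the toy data, and all names (`mean_shift_1d`, `categorize`, `CategoricalNB`) are assumptions chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=200):
    """Flat-kernel Mean Shift on 1-D data; returns the sorted modes found."""
    modes = []
    for x in values:
        for _ in range(max_iter):
            # Shift x to the mean of all points inside its window.
            window = [v for v in values if abs(v - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        # Keep the converged point only if it is not near a known mode.
        if not any(abs(x - m) < bandwidth for m in modes):
            modes.append(x)
    return sorted(modes)


def categorize(value, modes):
    """Category of a value = index of its nearest mode."""
    return min(range(len(modes)), key=lambda i: abs(value - modes[i]))


class CategoricalNB:
    """Categorical Naive Bayes with Laplace smoothing (alpha is an assumption)."""

    def fit(self, X, y, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # Number of distinct categories seen for each feature.
        self.n_cats = [len({row[j] for row in X}) for j in range(len(X[0]))]
        self.counts = defaultdict(int)  # (feature, category, class) -> count
        for row, label in zip(X, y):
            for j, cat in enumerate(row):
                self.counts[(j, cat, label)] += 1
        return self

    def predict(self, row):
        def log_posterior(c):
            lp = math.log(self.class_counts[c] / self.n)
            for j, cat in enumerate(row):
                num = self.counts.get((j, cat, c), 0) + self.alpha
                den = self.class_counts[c] + self.alpha * self.n_cats[j]
                lp += math.log(num / den)
            return lp

        return max(self.classes, key=log_posterior)


# Hypothetical toy data: one continuous attribute, one categorical attribute.
scores = [52, 55, 58, 71, 74, 76, 90, 93, 95]
attendance = ["low", "low", "high", "low", "high", "high", "high", "high", "low"]
labels = ["fail", "fail", "fail", "pass", "pass", "pass",
          "honors", "honors", "honors"]

modes = mean_shift_1d(scores, bandwidth=8)  # bandwidth is an assumed value
X = [(categorize(s, modes), a) for s, a in zip(scores, attendance)]
model = CategoricalNB().fit(X, labels)

prediction = model.predict((categorize(73, modes), "high"))
print(len(modes), prediction)
```

In this sketch the number of categories is not fixed in advance; it follows from the number of modes that Mean Shift finds, which in turn depends on the chosen bandwidth.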

Published

2026-02-17

Section

Research Articles

How to Cite

Finding Category Value Using Mean Shift Clustering to Optimize Naïve Bayes Classification. (2026). Statistics, Optimization & Information Computing, 15(5), 4164-4171. https://doi.org/10.19139/soic-2310-5070-3161
