Fuzzified Clustering and Sample Reduction for Intelligent High Performance Distributed Classification of Heterogeneous Uncertain Big Data

Authors

  • Sherouk Samir Moawad, Statistics Department, Faculty of Economics and Political Sciences, Cairo University, Egypt
  • Magued Osman, Statistics Department, Faculty of Economics and Political Sciences, Cairo University, Giza, Egypt
  • Ahmed Shawky Moussa, Computer Science Department, Faculty of Computers and Artificial Intelligence, Cairo University, Egypt

DOI:

https://doi.org/10.19139/soic-2310-5070-2275

Keywords:

Big Data, Fuzzified Clustering, Classifier Ensemble, Weighted Subsampling, Parallel Classification, Sample Reduction, Veracity.

Abstract

diverse datasets efficiently. This paper introduces a Fuzzified Clustering technique with sample reduction and distributed Parallel Classification (FCPC). Fuzzified clustering is well suited to Big Data because it partitions datasets intelligently while handling uncertainty and overlapping data points. FCPC uses this capability to reduce dataset size while capturing the essential data structure and improving classification performance. FCPC is compared on benchmark Big Data sets with traditional classifiers, which require the entire dataset to fit in memory. Four classification techniques were evaluated using three metrics: Accuracy, Area Under the ROC Curve, and F1 Score. The proposed model showed improved predictive power with a sample reduction of approximately 90%, leading to enhanced performance and potential reductions in computational resources.
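
The general idea of clustering with soft memberships and then keeping a small weighted subsample per cluster can be illustrated with a minimal sketch. This is not the authors' FCPC implementation, and the abstract does not specify the algorithm. The sketch assumes standard fuzzy c-means for the clustering step and Efraimidis-Spirakis weighted sampling for the reduction step. The function names `fuzzy_c_means` and `reduce_sample`, the fuzzifier `m=2.0`, and the `keep=0.10` fraction (chosen to mirror the roughly 90% reduction reported above) are all hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means (an assumed stand-in for 'fuzzified clustering').

    Returns (centers, u), where u[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Start from random memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total for j in range(d)]
            )
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], ck), 1e-12) for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dj) ** (2.0 / (m - 1.0)) for dj in dist)
    return centers, u


def reduce_sample(u, keep=0.10, seed=0):
    """Keep roughly `keep` of each cluster's points, favoring high membership.

    Hypothetical reduction rule: points are assigned to their highest-membership
    cluster, then sampled without replacement with weights equal to that
    membership (Efraimidis-Spirakis keys). Returns sorted indices of kept points.
    """
    rng = random.Random(seed)
    c = len(u[0])
    kept = []
    for k in range(c):
        members = [i for i in range(len(u)) if max(range(c), key=lambda j: u[i][j]) == k]
        if not members:
            continue
        size = max(1, round(keep * len(members)))
        # A larger weight makes a larger key more likely, so sort descending.
        keyed = sorted(members, key=lambda i: rng.random() ** (1.0 / u[i][k]), reverse=True)
        kept.extend(keyed[:size])
    return sorted(kept)
```

The indices returned by `reduce_sample` can be used to subset both the features and the labels before training a classifier on each partition. The distributed and ensemble parts of the method are not sketched here.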

Published

2025-01-29

Section

Research Articles

How to Cite

Fuzzified Clustering and Sample Reduction for Intelligent High Performance Distributed Classification of Heterogeneous Uncertain Big Data. (2025). Statistics, Optimization & Information Computing, 13(3), 1162-1191. https://doi.org/10.19139/soic-2310-5070-2275