PSO+K-means Algorithm for Anomaly Detection in Big Data

  • Rasim M. Alguliyev, Institute of Information Technology, Azerbaijan National Academy of Sciences
  • Ramiz M. Aliguliyev, Institute of Information Technology, Azerbaijan National Academy of Sciences
  • Fargana J. Abdullayeva, Institute of Information Technology, Azerbaijan National Academy of Sciences
Keywords: Anomaly detection, Big Data, Particle Swarm Optimization, Clustering, k-means

Abstract

Clustering is widely regarded as an effective approach to anomaly detection. However, the well-known k-means algorithm and other classical clustering algorithms are sensitive to the choice of initial cluster centers and tend to converge to local optima, which prevents them from producing accurate results in anomaly detection. To improve detection accuracy, this paper proposes a new weighted clustering method based on a combination of the PSO (particle swarm optimization) and k-means algorithms. The proposed method is tested on the Yahoo! S5 dataset, and the results are compared with those of the k-means algorithm. The experiments show that the proposed method is more robust than k-means and yields more accurate results.
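The general idea of combining PSO with k-means can be illustrated with a minimal sketch. It is not the weighted method proposed in the paper, whose details are not given in this abstract. It only shows one common way to pair the two algorithms: a global-best PSO searches for a good set of initial centers, and ordinary k-means then refines them. The function names, the PSO coefficients (w, c1, c2), the swarm size, and the quantile rule for flagging anomalies are all assumptions made for illustration.

```python
import numpy as np


def sse(data, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pso_centers(data, k, n_particles=15, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; each particle encodes one candidate set of k centers.

    The coefficient values are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Start every particle from k points drawn from the data.
    pos = np.stack([data[rng.choice(n, size=k, replace=False)]
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(data, p) for p in pos])
    g = pbest_cost.argmin()
    gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        cost = np.array([sse(data, p) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
    return gbest


def kmeans(data, centers, n_iter=50):
    """Lloyd's iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1), np.sqrt(d2.min(axis=1))


def detect_anomalies(data, k=2, quantile=0.98):
    """Flag points whose distance to the nearest center is unusually large.

    The quantile threshold is an assumed rule, chosen only for this sketch.
    """
    centers, _, dist = kmeans(data, pso_centers(data, k))
    return dist > np.quantile(dist, quantile), centers
```

Seeding k-means from the best particle rather than from a single random draw is what addresses the sensitivity to initial centers described above: the swarm evaluates many candidate center sets before the local refinement begins.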

Author Biographies

Rasim M. Alguliyev, Institute of Information Technology, Azerbaijan National Academy of Sciences
Director of the Institute of Information Technology of Azerbaijan National Academy of Sciences (ANAS) and academician-secretary of ANAS. He is a full member of ANAS and a full professor. He received his BSc and MSc in electronic computing machines from the Azerbaijan Technical University in 1979. He received his PhD and Doctor of Science (higher degree after PhD) in Computer Science in 1995 and 2003, respectively. His research interests include information security, e-government, data mining, big data, online social network analysis, cloud computing, evolutionary and swarm computation, and scientometrics. He is the author of more than 590 papers, 4 monographs, 4 patents, and several books.
Ramiz M. Aliguliyev, Institute of Information Technology, Azerbaijan National Academy of Sciences
Head of Department at the Institute of Information Technology of ANAS. He received his BSc and MSc in applied mathematics from Baku State University, Azerbaijan, in 1983. He received his PhD in Mathematics (2002) and Doctor of Science (higher degree after PhD) in Computer Science (2011). His research interests include text mining, clustering, evolutionary and swarm optimization, web mining, online social network analysis, big data analytics, and scientometrics. He is the author of 176 papers and 4 books.
Fargana J. Abdullayeva, Institute of Information Technology, Azerbaijan National Academy of Sciences
She graduated from the “Computer techniques and technologies” faculty of Sumqait State University. In 2003 she joined the Institute of Information Technology of ANAS, where she worked on information security provision. In 2004, as a degree candidate, she began her work on the subject “Development of methods and algorithms for providing information security of population and migration system”. She defended her thesis in 2010. At present, she conducts research on cloud computing security, anomaly detection, load balancing, and task scheduling. She is the author of more than 40 papers.

Published
2019-05-19
How to Cite
Alguliyev, R. M., Aliguliyev, R. M., & Abdullayeva, F. J. (2019). PSO+K-means Algorithm for Anomaly Detection in Big Data. Statistics, Optimization & Information Computing, 7(2), 348-359. https://doi.org/10.19139/soic.v7i2.623
Section
Research Articles