Similarity Technique Effectiveness of Optimized Fuzzy C-means Clustering Based on Fuzzy Support Vector Machine for Noisy Data

Fuzzy VIKOR C-means (FVCM) is an unsupervised fuzzy clustering algorithm that improves the accuracy and computational speed of Fuzzy C-means (FCM). It also reduces sensitivity to noisy and outlier data and improves the performance and quality of the clusters. Because FVCM allocates data to specific clusters using a similarity technique, reducing the effect of noisy data increases cluster quality. This paper presents a new approach, called FVCM-FSVM, that allocates noisy data to clusters more accurately. It overcomes the constraints of noisy points through a fuzzy support vector machine (FSVM). At each stage, samples with a high degree of membership are selected to train the FSVM classifier, and the labels of the remaining samples are then predicted. The process continues until FVCM-FSVM converges. The numerical experiments show that the proposed approach performs better than FVCM and achieves high accuracy.


Introduction
Partitional clustering algorithms divide a data set into non-overlapping subsets so that each data point belongs to exactly one subset. In fuzzy clustering, a point belongs to each cluster with a weight between zero and one. Fuzzy clustering has important applications in image processing, pattern recognition, object recognition, and other areas [1,2,3]. Related techniques include fuzzy c-means [4], possibilistic fuzzy c-means (PFCM) [5], and credibility fuzzy c-means (CFCM) [6]. FVCM [7] is an improved FCM algorithm that addresses the sensitivity of FCM to noisy data. It performs well in detecting noisy data and improves both accuracy and computational speed.

Sensitivity to noisy and outlier data is one of the main problems of FCM. Several approaches have recently been proposed to solve it, including the robust clustering approach TCLUST [8], fuzzy c-means-relaxed constraints support vector machine (FCM-RSVM) [9], relative entropy fuzzy c-means (REFCM) [10], and algorithms based on type-2 fuzzy sets such as [11,12]. In addition, Khanali and Vaziri compare partitional algorithms in [13].

Sefidian and Daneshpour [14] combine grey system theory concepts with FCM to improve clustering accuracy. Their method is called grey based fuzzy c-means and mutual information (GFCMI) based feature selection imputation. For each record with missing values, it determines a set of similar records and estimates the missing value by regression imputation. Bu [15] uses tensor canonical polyadic decomposition in FCM to compress the attributes of each object and the amount of data processed in Internet of Things (IoT) systems. The resulting method, called high-order tensor fuzzy c-means (HOFCM), improves clustering efficiency. Ramos et al. [16] provide a diagnostic algorithm for novel faults. In it, density oriented fuzzy c-means reduces confusion by removing outlier data, and kernel fuzzy c-means classifies the data to reduce classification errors. Heidari et al. [17] propose a joint formulation that combines fuzzy c-means clustering with distance metric learning. This formulation reduces sensitivity to initialization, improves the non-linear optimization problem, and increases accuracy.

The FSVM [18] classification algorithm extends the Support Vector Machine (SVM) [19]. It uses fuzzy concepts to give each training sample a different degree of importance, which is taken into account during learning. FSVM has a stronger theoretical foundation than neural networks. Because its training problem is a quadratic program, it also reaches a global optimum. In addition, FSVM is resistant to over-fitting, has clear geometric interpretations, and needs little space to store the prediction model. Through its fuzzy membership function, FSVM mitigates the difficulty of classifying noisy and outlier data and allocates border data more accurately.

SVM has many applications. Gupta and Richhariya [20] recently proposed entropy based fuzzy least squares support vector machine (EFLSSVM-CIL) and entropy based fuzzy least squares twin support vector machine (EFLSTWSVM-CIL) to reduce the impact of class imbalance in binary-class data sets. Class imbalance is a common problem because the class that matters most in an application (the minority class) contains fewer samples than the class of less interest (the majority class). Yan and Wang [21] introduced a matching decision method based on an improved SVM. It uses triangular fuzzy theory to better classify small, uncertain, and unbalanced samples.
Further, Hamidzadeh and Moslemnejad [22] proposed belief function and fuzzy rough set-boundary samples (BFFR-BS). During training, this method identifies the data points at the class boundaries and detects noisy data points. Zhou et al. [23] introduced a new membership function that overcomes the dependence of FSVM and reduces its sensitivity to noisy and outlier data. To address class imbalance, Samma et al. [24] combined particle swarm optimization (PSO) with FSVM. For linear and nonlinear FSVM, Tang [25] defined a new fuzzy membership function that improves accuracy and reduces the negative impact of outliers. Zhou et al. [26] used FSVM with a hybrid kernel function and a genetic algorithm (GA) in a control chart patterns recognition (CCPR) method, achieving better learning ability and performance.

FVCM allocates some points to a particular cluster using a similarity technique, so these points may receive low membership values. FSVM, by contrast, is more tolerant of noisy points and responds quickly because it uses fuzzy membership functions. When it chooses the decision boundary, the classifier maximizes the minimum distance to each class, and the boundary is determined by a subset of points called support vectors. The proposed method therefore selects samples with high membership values to train the FSVM classifier, and the FSVM then predicts the labels of the remaining samples. This reallocates the noisy points correctly, and the process continues until FVCM-FSVM converges.

The rest of this paper is organized as follows. Section 2 describes the general field of data mining and the main concepts of FSVM and FVCM. Section 3 presents the structure of the proposed algorithm. Section 4 reports the experimental results. Finally, Section 5 presents the conclusions and future work.

Background
Data mining helps process large amounts of data and discover the knowledge contained in data sources. Model learning methods in data exploration fall into two groups: predictive (supervised) methods and descriptive (unsupervised) methods. Figure 1 shows the model learning methods. Predictive methods use the values of some attributes to predict the value of a specified attribute. Classification, regression, and anomaly detection are predictive model learning methods. Common classification algorithms include neural networks, support vector machines, decision trees, and Bayesian networks. Descriptive methods explain the relationships among the data without regard to a label or external variable. Clustering [13], association rules [27], and sequential patterns are descriptive model learning methods. In all clustering algorithms, the main goal is to minimize intra-cluster distances, which makes each cluster compact, and to maximize the separation of clusters from each other. Clustering methods are divided into hierarchical, density-based, grid-based, incremental, and partitional clustering.

FSVM
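FSVM improves classification accuracy by weighting each training sample with a fuzzy membership. Consider a labelled training set $S = \{(x_i, y_i, s_i)\}_{i=1}^{n}$. Here $x_i \in \mathbb{R}^p$, $p$ is the dimension of the training samples, $y_i \in \{-1, 1\}$ is the label, and $0 < s_i \le 1$ is the fuzzy membership. The optimal separating hyperplane (OSH) is found by solving

$$\min_{W, b, \xi} \ \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{n} s_i \xi_i \quad \text{s.t.} \quad y_i \left(W \cdot \varphi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n.$$

In these equations, $W$ is the normal vector of the separating hyperplane, $b$ is the bias of the pair $(W, b)$, and $\varphi$ maps the samples into the feature space. $C$ is a constant that controls the misclassification errors. A smaller $s_i$ for the point $x_i$ reduces the effect of the error measure $\xi_i$. With a kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, the Lagrangian dual problem is

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to $\sum_{i=1}^{n} y_i \alpha_i = 0$ and $0 \le \alpha_i \le s_i C$, where $\alpha_i$ is the Lagrange multiplier. The decision function that gives the predicted class label for $x$ is [18]

$$y = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right).$$

H. KHANALI, AND B. VAZIRI 621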
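The role of the memberships $s_i$ in the primal problem can be illustrated with a minimal sketch. It assumes a linear kernel ($\varphi$ is the identity). It minimises the primal objective by full-batch subgradient descent instead of solving the dual quadratic program, so it only approximates the exact FSVM solution. The function names, step size, epoch count, and the membership value 0.01 given to the outlier are hypothetical choices made for illustration.

```python
import numpy as np


def train_linear_fsvm(X, y, s, C=1.0, lr=0.01, epochs=500):
    """Linear FSVM sketch: minimise
        0.5 * ||w||^2 + C * sum_i s_i * max(0, 1 - y_i * (w . x_i + b))
    by full-batch subgradient descent. y_i in {-1, +1}, 0 < s_i <= 1."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        # Each active point's hinge subgradient is scaled by its membership s_i,
        # which plays the role of the per-sample penalty s_i * C.
        coef = C * s[active] * y[active]
        grad_w = w - coef @ X[active]
        grad_b = -coef.sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


def predict(X, w, b):
    """Sign of the linear decision function w . x + b."""
    return np.where(X @ w + b >= 0, 1, -1)


# Usage: two well-separated clusters plus one mislabelled outlier whose
# (hypothetical) fuzzy membership is tiny, so it barely moves the hyperplane.
rng = np.random.default_rng(0)
X_clean = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y_clean = np.array([-1] * 20 + [1] * 20)
X = np.vstack([X_clean, [[2.5, 2.5]]])
y = np.append(y_clean, -1)        # outlier lies in the +1 cluster but is labelled -1
s = np.append(np.ones(40), 0.01)  # low membership for the outlier

w, b = train_linear_fsvm(X, y, s)
clean_accuracy = (predict(X_clean, w, b) == y_clean).mean()
```

If the outlier's membership is raised to 1, it pulls on the hyperplane as strongly as any other point. That is the effect the fuzzy weighting is meant to dampen.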

FVCM
The principal idea of the FVCM fuzzy clustering algorithm comes from the extended VIKOR decision-making method [28]. In that method, various alternatives are evaluated against several criteria to find the best solutions. When the algorithm evaluates clusters, it considers not only their similarity through internal validation measures but also their quality through external validation measures. According to Algorithm 1, Dunn's index, the Davies-Bouldin index, the entropy, and the density and means functions serve as the decision-making criteria. The alternatives ranked by extended VIKOR are the improved points, and the algorithm iterates until convergence is achieved.
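FVCM ranks alternatives with extended VIKOR [28], whose exact formulation is not given in this section. As a rough illustration only, the sketch below implements the classic VIKOR ranking over a matrix of criterion values. It computes the group utility S, the individual regret R, and the compromise index Q with the conventional weight v = 0.5. The criteria values, the weights, and the function name `vikor_rank` are hypothetical. Dunn's index is treated as a benefit criterion, and the Davies-Bouldin index and entropy as cost criteria, in line with how those measures are interpreted in Experiment 1.

```python
import numpy as np


def _minmax(x):
    """Rescale x to [0, 1]; a constant vector maps to zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def vikor_rank(F, weights, benefit, v=0.5):
    """Classic VIKOR. F is (alternatives x criteria); benefit[j] is True when
    larger values of criterion j are better. Returns (S, R, Q); lower Q is better."""
    F = np.asarray(F, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    best = np.where(benefit, F.max(axis=0), F.min(axis=0))   # f*_j
    worst = np.where(benefit, F.min(axis=0), F.max(axis=0))  # f^-_j
    span = best - worst
    span[span == 0] = 1.0  # a constant criterion contributes no regret
    d = w * (best - F) / span  # weighted, normalised distance from the best value
    S = d.sum(axis=1)          # group utility
    R = d.max(axis=1)          # individual regret
    Q = v * _minmax(S) + (1 - v) * _minmax(R)
    return S, R, Q


# Hypothetical criterion values for three alternatives:
# columns = Dunn's index (higher is better), Davies-Bouldin index and entropy (lower is better).
F = [[0.8, 0.5, 0.3],
     [0.4, 0.9, 0.6],
     [0.6, 0.7, 0.4]]
S, R, Q = vikor_rank(F, weights=[0.4, 0.4, 0.2], benefit=[True, False, False])
ranking = np.argsort(Q)  # best alternative first
```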
Algorithm 1: FVCM algorithm
Step 1: Initialize k (number of clusters) and m (fuzziness parameter).
Step 2:
Step 3: Determine a set of alternatives: use the membership degree of each sample as its alternative.
Step 4: Update the set of criteria (Dunn's index, the means, the density, the Davies-Bouldin index, and the entropy).
Step 5: Rate the clusters and calculate the center vectors $C_k = [c_j]$ from $U^{(k)}$ using extended VIKOR.
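Step 5 computes the cluster centres from the membership matrix $U^{(k)}$. FVCM is an improved FCM, so the sketch below shows, as background only, the standard FCM alternating updates. The membership update is $u_{ij} = 1 / \sum_{l} (\|x_j - c_i\| / \|x_j - c_l\|)^{2/(m-1)}$ and the centre update is $c_i = \sum_j u_{ij}^m x_j / \sum_j u_{ij}^m$. The sketch also tracks the objective $J_m = \sum_i \sum_j u_{ij}^m \|x_j - c_i\|^2$. It does not reproduce the VIKOR-based re-rating of Steps 3 to 5. The function names, the Euclidean distance, the small `eps` guard, and the stopping rule are assumptions.

```python
import numpy as np


def fcm_step(X, centers, m=2.0, eps=1e-12):
    """One standard FCM iteration. X: (n, p), centers: (k, p).
    Returns (U, new_centers, J); U[i, j] is the membership of x_j in cluster i."""
    # Squared distances between every centre and every point, shape (k, n).
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    # u_ij = 1 / sum_l (d_ij / d_lj)^(2/(m-1)); with squared distances the exponent is 1/(m-1).
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=1)
    Um = U ** m
    new_centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    J = (Um * d2).sum()  # objective J_m for the new U and the centres passed in
    return U, new_centers, J


def fcm(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate fcm_step until the centres move less than tol."""
    centers = np.asarray(init_centers, dtype=float)
    history = []
    for _ in range(max_iter):
        U, new_centers, J = fcm_step(X, centers, m)
        history.append(J)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return U, centers, history


# Usage: two synthetic blobs; the initial centres are fixed for reproducibility.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
U, centers, history = fcm(X, X[[0, -1]], m=2.0)
```

Because each half-step minimises $J_m$ exactly with respect to either $U$ or the centres, the recorded objective never increases from one iteration to the next.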

Proposed FVCM-FSVM algorithm
In the proposed method, FVCM is improved by combining it with the classification constraints of fuzzy support vector machines, so the method is called FVCM-FSVM. FVCM is a partitional clustering algorithm with no predefined classes. Its inputs are unlabelled samples, which are clustered only on the basis of the similarity criterion. FVCM clustering first determines the cluster centres and the fuzzy membership values. Figure 2 shows the general process. In the next step, each sample receives a specific label, which is used as input to the FSVM. The learning algorithm has a training stage and a testing stage, and its purpose is to predict the labels from the other features of the samples. The data set is therefore divided into a training set and a test set. In the training phase, the membership values of the data points to the clusters are used, and part of each cluster is taken as the training set for learning and model building. These samples naturally have high membership values. The remaining data are used at the test stage to estimate the accuracy of the classification model. Over the iterations, the noisy points receive the minimum membership values, which leads to class balance. The process continues recursively until the convergence conditions are met. At each stage, the FVCM evaluation criteria improve the quality of the clusters, and this interaction between the two algorithms is what increases the accuracy. This procedure is shown in Algorithm 2.
As described above, the FSVM-based similarity technique of FVCM-FSVM increases the quality of the clustering. Figure 3 illustrates the behaviour of the proposed method on a synthetic data set with two clusters. Cluster 1 is shown in red (training data) and purple (classified data). Cluster 2 is shown in green (training data) and fluorescent green (classified data). The points with higher membership are used for training and then to separate the noisy points. The figure shows how FVCM-FSVM improves the quality of the clusters when noisy points are present.
Algorithm 2: FVCM-FSVM algorithm
Step 1: Initialize k (number of clusters) and m (fuzziness parameter).
Step 2:
Step 3: Determine a set of alternatives: use the membership degree of each sample as its alternative.
Step 4: Update the set of criteria (Dunn's index, the means, the density, the Davies-Bouldin index, and the entropy).
Step 5: Rate the clusters and calculate the center vectors $C_k = [c_j]$ from $U^{(k)}$ using extended VIKOR.
Step 7: Assign the 90% of samples with the highest membership values to the training set and the remaining 10% of samples to the test set.
Step 8: Insert the class labels.
Step 9: Train the one-against-all FSVM.
Step 10: End while.

[Figure: (a) The first data set; (b) The second data set]
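Step 7 can be sketched as follows, under stated assumptions. Each sample is hard-labelled by its largest membership, which supplies the class labels of Step 8. The 90% split is then taken inside each cluster, and the most confident samples are kept for training, consistent with selecting part of each cluster with high membership. The helper name `split_by_membership`, the per-cluster interpretation, and rounding the 90% share down are hypothetical. The FSVM of Step 9 would then be trained on `train_idx` and used to relabel `test_idx`.

```python
import numpy as np


def split_by_membership(U, train_frac=0.9):
    """U is a (k, n) fuzzy membership matrix whose columns sum to 1.
    Returns (labels, train_idx, test_idx)."""
    labels = U.argmax(axis=0)  # hard label = cluster with the largest membership
    strength = U.max(axis=0)   # how strongly each sample belongs to that cluster
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        order = idx[np.argsort(-strength[idx], kind="stable")]  # most confident first
        n_train = max(1, int(np.floor(train_frac * len(idx))))
        train.extend(order[:n_train].tolist())
    train_idx = np.array(sorted(train), dtype=int)
    test_idx = np.setdiff1d(np.arange(U.shape[1]), train_idx)
    return labels, train_idx, test_idx


# Hypothetical memberships for 10 samples in 2 clusters.
U = np.array([[0.95, 0.90, 0.85, 0.80, 0.55, 0.10, 0.20, 0.15, 0.05, 0.45],
              [0.05, 0.10, 0.15, 0.20, 0.45, 0.90, 0.80, 0.85, 0.95, 0.55]])
labels, train_idx, test_idx = split_by_membership(U)
# The weakest members of each cluster (samples 4 and 9) go to the test set,
# where the trained FSVM would predict their labels.
```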

Experiment
FVCM and FVCM-FSVM are evaluated in three weight modes. All experiments were run in MATLAB R2015b on a computer with an Intel Core™ i5 processor (1.80 GHz) and 4 GB of RAM (3.88 GB). The experiments are repeated 20 times for each of the three UCI data sets [29], which are as follows:
1 Iris data set contains 150 instances described by 4 features and classified into three classes.
2 Glass Identification data set contains 214 instances described by 10 features and classified into seven classes.
3 Haberman's Survival data set contains 306 instances described by 3 features and classified into two classes.
These data sets are randomly divided into two parts: a training section (90% of the data) and a test section (10% of the data). For multi-class classification, the one-against-all strategy is used with an RBF kernel.

Experiment 1
The quality of the clusters produced by the two algorithms (FVCM and FVCM-FSVM) is evaluated with six criteria: Dunn's index, the Davies-Bouldin index, the entropy, data processing time (seconds), the number of iterations, and accuracy (percent). Dunn's index identifies sets of clusters that are compact and well separated. It aims to maximize inter-cluster distances while minimizing intra-cluster distances [30]. The index is defined as [31]:

$$D_{nc} = \min_{i = 1, \dots, nc} \left\{ \min_{j = i+1, \dots, nc} \left( \frac{d(c_i, c_j)}{\max_{k = 1, \dots, nc} \operatorname{diam}(c_k)} \right) \right\}$$
where $d(c_i, c_j)$ is the dissimilarity function between two clusters $c_i$ and $c_j$, $\operatorname{diam}(c)$ is the diameter of a cluster [31], and $nc$ is the number of clusters. Thus, large values of $D_{nc}$ correspond to good clusters [30]. The drawbacks of Dunn's index are the considerable time needed to compute it and its sensitivity to noise in the data set [31]. Like Dunn's index, the Davies-Bouldin index aims to identify sets of clusters that are compact and well separated [30]. It uses a similarity measure $R_{ij}$ between the clusters $c_i$ and $c_j$. This measure is based on the dispersion $s_i$ of cluster $c_i$ and the dissimilarity $d_{ij}$ between the two clusters. The $R$ index is defined as follows [31]:

$$R_{ij} = \frac{s_i + s_j}{d_{ij}}, \qquad R_i = \max_{j = 1, \dots, nc,\ j \ne i} R_{ij}, \quad i = 1, \dots, nc,$$

where $nc$ is the number of clusters, $s_i$ is the average distance between the data of cluster $i$ and its center, and $d_{ij}$ is the distance between the centers of clusters $i$ and $j$. The Davies-Bouldin index is then defined as [31]:

$$DB_{nc} = \frac{1}{nc} \sum_{i=1}^{nc} R_i$$

$DB_{nc}$ is the average similarity between each cluster $c_i$ $(i = 1, \dots, nc)$ and its most similar cluster [31]. Therefore, the configuration that minimizes $DB_{nc}$ determines the optimal number of clusters [30]. Entropy measures cluster quality by quantifying the amount of disorder [32], and lower entropy means better clustering. Consider a data set $S$ with labelled classes that is clustered into $k$ clusters. Let $n_k$ be the number of data points in the $k$th cluster, and let $n_{sk}$ be the number of data points from the $s$th class in the $k$th cluster. The entropy of the $k$th cluster is given by [9]:

$$E_k = -\sum_{s} \frac{n_{sk}}{n_k} \log \frac{n_{sk}}{n_k}$$

The total entropy for the set of $k$ clusters is defined as [9]:

$$E = \sum_{k} \frac{n_k}{n} E_k$$

where $n$ is the number of data points in the data set. The accuracy is the percentage of data points that the examined method clusters correctly:
$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \qquad (11)$$

For a fair assessment, all experimental conditions follow [7], and the tables compare the FVCM results reported in [7] with the experimental results of FVCM-FSVM. According to [33], internal validation measures such as Dunn's index and the Davies-Bouldin index achieve better results. In addition, Dunn's index and the Davies-Bouldin index recognize noisy and outlier data more effectively than the other evaluation criteria [34], so they are given higher weights. As stated above, extended VIKOR allocates weights to the evaluation criteria. Three weight modes are therefore considered, minimum (#1), average (#2), and maximum (#3), as shown in Table 1. In each mode (#1, #2, #3) the weights sum to one, and weights smaller than 0.1 are ineffective.
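The validity measures and Eq. (11) can be computed for a hard partition with the sketch below. It assumes Euclidean distance and a hard partition obtained beforehand by assigning each point to its highest-membership cluster. It takes $d(c_i, c_j)$ as the closest-pair distance between clusters and $\operatorname{diam}(c)$ as the largest intra-cluster distance, which are common choices for Dunn's index, and it uses the natural logarithm for the entropy. The text does not say how clusters are matched to classes in Eq. (11), so the sketch assumes that each cluster is mapped to its most frequent class. All function names are hypothetical, and `davies_bouldin` needs at least two clusters.

```python
import numpy as np
from collections import Counter


def _pairwise(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))


def dunn_index(X, labels):
    """D_nc: smallest closest-pair distance between two clusters divided by
    the largest cluster diameter (largest intra-cluster distance)."""
    groups = [X[labels == c] for c in np.unique(labels)]
    max_diam = max(_pairwise(g, g).max() for g in groups)
    min_sep = min(_pairwise(groups[i], groups[j]).min()
                  for i in range(len(groups)) for j in range(i + 1, len(groups)))
    return min_sep / max_diam


def davies_bouldin(X, labels):
    """DB_nc = (1/nc) * sum_i max_{j != i} (s_i + s_j) / d_ij."""
    ids = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in ids])
    # s_i: mean distance of cluster i's points to its centre.
    s = np.array([np.linalg.norm(X[labels == c] - centers[i], axis=1).mean()
                  for i, c in enumerate(ids)])
    d = _pairwise(centers, centers)  # d_ij: distance between centres
    nc = len(ids)
    R = [max((s[i] + s[j]) / d[i, j] for j in range(nc) if j != i) for i in range(nc)]
    return float(np.mean(R))


def clustering_entropy(labels, classes):
    """E = sum_k (n_k / n) * E_k, with E_k = -sum_s p_sk * log(p_sk), p_sk = n_sk / n_k."""
    n = len(labels)
    total = 0.0
    for c in np.unique(labels):
        counts = np.array(list(Counter(classes[labels == c].tolist()).values()), dtype=float)
        p = counts / counts.sum()
        total += (counts.sum() / n) * -(p * np.log(p)).sum()
    return total


def clustering_accuracy(labels, classes):
    """Eq. (11), assuming each cluster is mapped to its most frequent class."""
    correct = sum(Counter(classes[labels == c].tolist()).most_common(1)[0][1]
                  for c in np.unique(labels))
    return correct / len(labels)


# Usage on a tiny hand-made partition (hypothetical data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])   # hard cluster assignment
classes = np.array([0, 0, 1, 1])  # ground-truth classes
print(dunn_index(X, labels), davies_bouldin(X, labels),
      clustering_entropy(labels, classes), clustering_accuracy(labels, classes))
```

Majority mapping can assign two clusters to the same class. A one-to-one matching, for example via the Hungarian algorithm, is a stricter alternative. Multiply the accuracy by 100 to report it as a percentage.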

Iris data set
In this subsection, the efficiency of the algorithms is evaluated on the Iris data set. FVCM and FVCM-FSVM are evaluated with three weight modes and three clusters (k=3). The results are shown in Table 2, where the best results are in bold. As Table 2 shows, the evaluation criteria used in both methods affect the membership value of each element relative to the membership values of its neighbouring points. In addition, the targeted displacements in the extended VIKOR method lead FVCM and FVCM-FSVM to more accurate optima. Because the results are stable across the various evaluations, the convergence conditions are resistant to noisy data. Moreover, the fuzzy nature of both methods and their ability to correct misallocated samples give the smallest number of clustering iterations. Overall, because of the FSVM training, all three modes of FVCM-FSVM have longer run times than all three modes of FVCM. The results of FVCM and FVCM-FSVM are assessed with the DB and Dunn indexes, which compare the intra-cluster distance with the inter-cluster distance. In terms of cluster quality, FVCM-FSVM obtains better inter-cluster distances and cluster diameters than FVCM. The smaller DB index values and larger Dunn index values indicate that FVCM-FSVM produces denser clusters, which leads to better separation of the clusters.

Glass data set
In this subsection, the efficiency of the algorithms is evaluated on the Glass data set. FVCM and FVCM-FSVM are evaluated with three weight modes and seven clusters (k=7). The results are shown in Table 3, where the best results are in bold.
As Table 3 shows, FVCM and FVCM-FSVM use the evaluation criteria as measures of the similarity and proximity of the data, and extended VIKOR is adapted to this setting. The clusters of the Glass data set have various densities, and the comparison between clusters shows that FVCM-FSVM provides denser clusters according to the DB index. This means that the clusters are well separated from each other. Although the Glass data set is a standard data set that contains clusters of different sizes, the best entropy value belongs to FVCM-FSVM #3. The data distribution and the weights of the alternatives are the factors behind this result. This property contributes greatly to the accuracy of the algorithm.

Haberman's survival data set
In this subsection, the efficiency of the algorithms is evaluated on the Haberman's survival data set. FVCM and FVCM-FSVM are evaluated with three weight modes and two clusters (k=2). The results are shown in Table 4, where the best results are in bold. As Table 4 shows, when the intra-cluster distance is compared with the inter-cluster distance, FVCM produces dense, well-separated clusters, so the best DB index belongs to this algorithm. This reflects its ability to locate the cluster centres accurately, which leads to good clustering efficiency. The diameter of the FVCM-FSVM clusters becomes very influential when noisy data are present, and this parameter is useful for detecting cluster density. As a result, when FVCM-FSVM is run on the Haberman's survival data set, which has non-spherical clusters, its Dunn's index is somewhat better than in the other cases. FSVM is therefore the main reason the algorithm resists noisy data. The homogeneity of the elements of each cluster is measured as 0.62011. Some noisy points in FVCM-FSVM are likely to be assigned to a specific cluster with low membership values, so the FSVM constraints are effective in improving fuzzy clustering performance. As a result, the accuracy of the algorithm reaches 98.751%.

Experiment 2
Based on the results of Experiment 1, the weight mode with the best accuracy is selected for each algorithm (Tables 2, 3, and 4). Figures 4, 5, and 6 illustrate the behaviour of the objective function versus the number of iterations for both algorithms. FVCM-FSVM shows more flexibility in the fuzzy inequalities, meaning that the distribution of the samples is taken into account even in the early iterations. In addition, FVCM-FSVM has better clustering accuracy, which indicates proper positioning of the noisy data. Using the extended VIKOR evaluation criteria as alternatives leads FVCM and FVCM-FSVM to optimal clustering in the first iterations. The objective function decreases to the convergence level within those iterations, and neither algorithm suffers from the local optimality problem.

Conclusion
In this paper, FVCM-FSVM extends FVCM with FSVM to overcome the destructive effects of noisy data and to increase accuracy, because FSVM is a classification method with good generalization ability. FVCM-FSVM selects samples with high membership degrees to train the FSVM, and it predicts the labels of the remaining samples with the FSVM classifier. As a result, noisy data are allocated correctly by reassigning them accurately. FVCM and FVCM-FSVM share the same structure with different weights. Because FVCM-FSVM includes a learning phase for the FSVM, it needs more clustering time than FVCM, but it achieves higher clustering accuracy. The proposed algorithm also gives better clustering quality and more accurate cluster-centre locations than FVCM. Its accuracy results further indicate reduced sensitivity to noisy data. Moreover, FVCM-FSVM avoids the local optimality problem and converges reasonably. Future studies can focus on optimizing the points of each cluster through parallelization.