Two-Step Semidefinite Programming Approach to Clustering and Dimensionality Reduction

Inspired by the recently proposed statistical technique called clustering and disjoint principal component analysis (CDPCA), this paper presents a new algorithm for clustering objects and dimensionality reduction, based on Semidefinite Programming (SDP) models. The Two-Step-SDP algorithm is based on SDP relaxations of two clustering problems and on a K-means step in a reduced space. The Two-Step-SDP algorithm was implemented and tested in R, a widely used open source software. Besides returning clusters of both objects and attributes, the Two-Step-SDP algorithm returns the variance explained by each component and the component loadings. The numerical experiments on different data sets show that the algorithm is quite efficient and fast. Compared to other known iterative algorithms for clustering, namely, the K-means and ALS algorithms, the computational time of the Two-Step-SDP algorithm is comparable to that of the K-means algorithm, and it is faster than the ALS algorithm.


Introduction
The advances of computer technology have enabled the storage of large databases, such as, for example, data sets of sequenced genomes. When dealing with real data sets, it is often necessary not only to reduce the dimension of the attribute space (dimensionality reduction), but also to reveal patterns among the objects (clustering). Clustering methods and Principal Component Analysis (PCA) are extremely important for data visualization. These techniques have been widely studied and applied to many real-life data sets, in areas such as Statistics, Machine Learning, Pattern Recognition, Engineering, Computational Biology and Image Processing [3,18,34].
The reduction of the object space is usually done by applying a clustering method to the data set. Clustering is an unsupervised learning technique. Clustering consists in dividing a given set of objects into groups, called clusters, based on some similarity criterion, in such a way that the objects within a cluster are more similar to one another than the objects belonging to different clusters. There are different types of clustering problems. In this paper, we focus on the partitional clustering problem, where nonoverlapping clusters are constructed. In such a problem, each object can be considered as a point in an n-dimensional space and each cluster can be identified by its center, called centroid, a non-observable object calculated by taking the mean of all the objects assigned to this cluster [18,32,34]. To express similarity between objects, i.e., homogeneity inside a cluster, several similarity measures have been proposed, such as a metric defined on the data set [2,7]. One of the most used (dis)similarity measures is the squared Euclidean distance [3,4,14,18,34]. Given a number of objects, we will minimize the sum of the squared distances between each object and the corresponding cluster centroid. The resulting problem is called in the literature the minimum sum-of-squares clustering (MSSC) problem (see e.g., [2,4,6,14,18,26,33,34]). The MSSC problem is usually formulated as a Binary Integer Programming problem [2,25,26,33], but it can also be rewritten as a (0,1)-Semidefinite Programming (SDP) problem [19,25,26]. In both formulations, the assignment of the m objects into p clusters, 2 ≤ p < m, is encoded in a binary (m × p) matrix

U = [u_ij], with u_ij = 1 if the object i is assigned to the cluster j, and u_ij = 0 otherwise,   (1)

and U is called an assignment matrix. By construction, U is a binary row stochastic matrix satisfying the condition Ue_p = e_m, where e_k ∈ R^k is a vector with all entries equal to one, k ∈ N. Notice that rank(U) = p.
Each cluster can be identified by its cluster center, called centroid [18,32,34]. Thus, if the objects are assigned with the assignment matrix U, then the centroid c_j ∈ R^n of the cluster C_j, for j = 1, ..., p, can be defined as

c_j = (Σ_{i=1}^m u_ij x_i) / (Σ_{i=1}^m u_ij),   (2)

where x_i ∈ R^n denotes the i-th object (the i-th row of the data matrix D). The (p × n) object cluster centroid matrix, whose rows correspond to the clusters represented by their centroids c_j, is denoted here by D̄. In [32], this matrix is written as D̄ = (U^T U)^{-1} U^T D. Given a (m × p) matrix M whose p columns are linearly independent, the matrix P of the orthogonal projection of the space R^m onto the subspace spanned by the columns of M has the form P = M(M^T M)^{-1} M^T and satisfies the following properties ([1,35]):

P = P^T,   (3)
P^2 = P,   (4)
rank(P) = tr(P),   (5)
rank(P) = p,   (6)
P has eigenvalues either 0 or 1.   (7)
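To make these objects concrete, the following minimal R sketch (with arbitrary toy values; not the paper's implementation) builds an assignment matrix U, computes the centroid matrix D̄ = (U^T U)^{-1} U^T D, and checks two of the projection properties (3)-(7):

```r
# Toy example: m = 5 objects, n = 2 attributes, p = 2 clusters (values are arbitrary)
D <- matrix(c(1.0, 1.2, 0.9, 5.0, 5.1,
              2.0, 1.9, 2.1, 6.0, 5.9), nrow = 5)
# Assignment matrix U: objects 1-3 in cluster 1, objects 4-5 in cluster 2
U <- matrix(c(1, 1, 1, 0, 0,
              0, 0, 0, 1, 1), nrow = 5)
# Centroid matrix Dbar = (U^T U)^{-1} U^T D; its rows are the centroids c_j in (2)
Dbar <- solve(t(U) %*% U) %*% t(U) %*% D
# Orthogonal projection onto the column space of U, cf. properties (3)-(7)
P <- U %*% solve(t(U) %*% U) %*% t(U)
all.equal(P %*% P, P)  # idempotent: P^2 = P, cf. (4)
sum(diag(P))           # tr(P) = rank(P) = p = 2, cf. (5)-(6)
```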
From the last property, it follows that any orthogonal projection matrix P is positive semidefinite, denoted by P ≽ 0.
Given any square matrices A and B, A ≽ B means that A − B is a positive semidefinite matrix. Given an assignment matrix U, denote by C(U) the subspace of R^m spanned by its columns, and consider the (m × m) matrix Z = [z_ij] of the following form ([25,26]):

Z = U(U^T U)^{-1} U^T,   (8)

i.e., z_ij = 1/|C_l| if the objects i and j belong to the same cluster C_l, and z_ij = 0 otherwise. It is easy to verify that Z has nonnegative elements and is the orthogonal projection matrix of the space R^m onto the space C(U). Hence, Z satisfies the properties (3)-(7). Now, consider a partition of a set of n attributes into k, 2 ≤ k < n, disjoint subsets S_j, j = 1, ..., k, called components. The assignment of attributes can be stored in a (n × k) binary matrix

V = [v_ij], with v_ij = 1 if the attribute i belongs to the subset S_j, and v_ij = 0 otherwise.   (9)

Notice that V is a binary row stochastic matrix satisfying Ve_k = e_n and rank(V) = k. Let C(V) be the subspace of R^n spanned by the columns of the matrix V.
The matrix

H = V(V^T V)^{-1} V^T   (10)

is an orthogonal projection matrix of R^n onto C(V), and satisfies the above properties (3)-(7). Given a data matrix D, and assignment matrices U and V, we introduce the following matrices:
• B - a diagonal matrix of order n specifying the loadings of each component: for each attribute i assigned to the component q (i.e., v_iq = 1), the i-th diagonal entry of B is the i-th entry of c_q, where v_q is the q-th column of V and c_q is the eigenvector associated to the largest eigenvalue of the matrix (UD̄diag(v_q))^T UD̄diag(v_q) (here, diag(w) is a diagonal matrix whose entries are the elements of the vector w),
• A = BV - a (n × k) component loading matrix with a unique nonzero element per row, satisfying rank(A) = k and A^T A = I_k,
• Y = DA - a component score matrix, where y_iq is the value of the i-th object for the q-th component,
• Ȳ = D̄A - an object centroid matrix in the reduced space of the components.
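A toy R illustration of these constructions follows. It is only a sketch under simplifying assumptions (here n = k = 2, so the attribute partition is trivial), with the loadings extracted as described in the item on B above:

```r
# Toy illustration of V, H, the loadings, A, Y = DA and Ybar = Dbar A
D <- matrix(c(1.0, 1.2, 0.9, 5.0, 5.1,
              2.0, 1.9, 2.1, 6.0, 5.9), nrow = 5)      # m = 5, n = 2
U <- matrix(c(1, 1, 1, 0, 0,
              0, 0, 0, 1, 1), nrow = 5)                # p = 2 clusters of objects
Dbar <- solve(t(U) %*% U) %*% t(U) %*% D               # centroid matrix
V <- diag(2)                     # n = k = 2: each attribute is its own component
H <- V %*% solve(t(V) %*% V) %*% t(V)                  # projection onto C(V), cf. (10)
A <- matrix(0, nrow = 2, ncol = 2)
for (q in 1:2) {
  Mq  <- U %*% Dbar %*% diag(V[, q], nrow = 2)         # U Dbar diag(v_q)
  cq  <- eigen(t(Mq) %*% Mq, symmetric = TRUE)$vectors[, 1]
  idx <- which(V[, q] == 1)
  A[idx, q] <- cq[idx]           # one nonzero entry per row of A
}
Y    <- D %*% A                  # component scores
Ybar <- Dbar %*% A               # object centroids in the reduced space
```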

Clustering and dimensionality reduction
In this section, we formulate the clustering problem and describe a standard algorithm to solve it, the K-means algorithm. Then, we describe a statistical technique called CDPCA, which permits reducing not only the object space, but also the attribute space.

The clustering problem
Consider the clustering problem introduced in the previous section. The sum of the squared distances between each object and the centroid of the cluster to which it belongs is equal to

Σ_{i=1}^m Σ_{j=1}^p u_ij ∥x_i − c_j∥²,

where the cluster centroid c_j is defined in (2). Thus, the minimum sum-of-squares clustering (MSSC) problem can be formulated as follows:

min_{U} Σ_{i=1}^m Σ_{j=1}^p u_ij ∥x_i − c_j∥²
s.t. Σ_{j=1}^p u_ij = 1, i = 1, ..., m,
     Σ_{i=1}^m u_ij ≥ 1, j = 1, ..., p,
     u_ij ∈ {0, 1}.   (11)

The first constraint in (11) ensures that each object is assigned to a single cluster and the second constraint ensures that each cluster has at least one object assigned. Any feasible solution of problem (11) is an assignment matrix U.
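For illustration, the MSSC objective in (11) can be evaluated in R for any given assignment matrix U. This is a minimal sketch (the function name is hypothetical), not the implementation used in the paper:

```r
# Minimal sketch: MSSC objective of (11) for a given assignment matrix U
mssc_objective <- function(D, U) {
  Dbar <- solve(t(U) %*% U) %*% t(U) %*% D  # centroids, cf. (2)
  sum((D - U %*% Dbar)^2)                   # sum of squared distances to centroids
}
```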
Problem (11) is a Binary Integer Programming problem with discrete variables and a nonlinear, nonconvex objective function. This problem is known to be NP-hard and hence, very difficult to solve [2,4,15,26]. Therefore, many approaches have been proposed to solve it, either by exact algorithms or heuristics (see e.g., [2,4,19,25,26]). By far, the most popular algorithm for solving the MSSC problem (11) is the K-means algorithm, whose main steps can be outlined as follows ([15]):

Algorithm 1 K-means ([15])
1: Choose an initial partition of p clusters (or generate it randomly) and find the corresponding centroids,
2: Assign each object i, i = 1, ..., m, to the closest centroid,
3: Update the centroids using the current assignments.
Steps 2 and 3 are repeated until the within cluster sum of squares is no longer reduced. Numerical tests show that the K-means algorithm returns well-separated clusters having a convex-shaped geometry, i.e., spherical or elliptical [14,15].
It is worthwhile mentioning that the K-means algorithm should be used on scaled data, since it relies on Euclidean distances. Moreover, the K-means algorithm returns a local optimum and depends on the initial choice of the centroids [6,15,18]. To overcome this drawback, it is recommended to consider several random initializations [15]. For example, in [32], a "tandem analysis" (i.e., PCA followed by applying the K-means algorithm using only the first few components) was carried out with 10000 random starts of the K-means algorithm using the first two principal components of a particular data set. The data set consists of 20 objects and 6 attributes, and the aim was to obtain 3 clusters of objects.
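A hedged R sketch of such a tandem analysis, using simulated stand-in data of the same 20 × 6 shape (the actual data set from [32] is not reproduced here):

```r
# Tandem analysis sketch (cf. [32]): PCA, then K-means on the first two
# principal components with many random starts; the data here are simulated
set.seed(1)
X  <- scale(matrix(rnorm(20 * 6), nrow = 20))   # stand-in for a 20 x 6 data set
pc <- prcomp(X)$x[, 1:2]                        # scores on the first two components
fit <- kmeans(pc, centers = 3, nstart = 10000)  # 3 clusters, 10000 random starts
fit$cluster                                     # resulting object assignment
```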

The Clustering and Disjoint Principal Component Analysis (CDPCA)
A new methodology called CDPCA was recently proposed by Vichi and Saporta in [32] for obtaining not only nonoverlapping clusters of objects, but also a partition of the attribute space into disjoint subsets.This is important for visualization and interpretation purposes of (large) data sets.
Given a data set, CDPCA groups m objects into p, 2 ≤ p < m, clusters identified by their centroids and, simultaneously, partitions n attributes into k, 2 ≤ k < n, disjoint subsets of attributes.
In what follows, we briefly describe the CDPCA methodology (see [23,32] for further details). A given data matrix D can be modelled as follows:

D = UȲA^T + E,   (12)

where U is the object assignment matrix, Ȳ is the object centroid matrix in the reduced space of the components, A is the component loading matrix and E is a (m × n) error matrix.
The idea is to minimize the error in the CDPCA model (12), i.e., to minimize the norm of the error matrix: ∥D − UȲA^T∥₂². It is easy to see that this problem is equivalent to maximizing ∥UȲA^T∥₂². Since the columns A_i, i = 1, ..., k, of the matrix A are orthogonal, i.e., A^T A = I_k, the matrix Ȳ is given by Ȳ = D̄A, and the trace of a square matrix equals the trace of its transpose, maximizing ∥UȲA^T∥₂² is the same as maximizing ∥UD̄A∥₂², the between cluster deviance in the reduced space. Indeed, ∥UȲA^T∥₂² = tr(UȲA^T(UȲA^T)^T) = tr(UȲȲ^T U^T) = ∥UȲ∥₂² = ∥UD̄A∥₂². Hence, we get the optimization problem

max ∥UD̄A∥₂²   (13)

over the assignment matrices U and the component loading matrices A with A^T A = I_k. In [32], to solve the problem (13), it is proposed to consider a decomposition of the matrix A in order to include a binary and row stochastic matrix V, specifying the partition of the n attributes into k disjoint subsets. The positions of the nonzero elements in the matrix A are identified by the positions of the unit elements in the matrix V. Hence, the CDPCA problem can be formulated as the following Quadratic Mixed Integer Programming problem:

max_{U,V,A} ∥UD̄A∥₂²
s.t. u_ij ∈ {0,1}, Ue_p = e_m,
     v_ij ∈ {0,1}, Ve_k = e_n,
     A = BV, A^T A = I_k.   (14)

The first two constraints correspond to the assignment of the m objects into p clusters and the next two constraints represent the assignment of the n attributes into k disjoint components. The remaining constraints are associated to the PCA implementation. The objective function value given by ∥UD̄A∥₂² can be computed either by tr(UD̄A(UD̄A)^T), which corresponds to the between cluster distances, or by tr((UD̄A)^T UD̄A), which represents the total variance of the data in the reduced space, where the objects are identified by their centroids.
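The coincidence of the two trace expressions is easy to check numerically. A small R sketch with toy matrices (a trivial loading matrix A is assumed purely for illustration):

```r
# Sketch: the two trace expressions for ||U Dbar A||^2 coincide (toy matrices)
U    <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1), nrow = 5)
D    <- matrix(c(1.0, 1.2, 0.9, 5.0, 5.1, 2.0, 1.9, 2.1, 6.0, 5.9), nrow = 5)
Dbar <- solve(t(U) %*% U) %*% t(U) %*% D
A    <- diag(2)                    # hypothetical loadings with A^T A = I_k
M    <- U %*% Dbar %*% A
sum(diag(M %*% t(M)))   # tr(U Dbar A (U Dbar A)^T): between cluster distances
sum(diag(t(M) %*% M))   # tr((U Dbar A)^T U Dbar A): total variance, same value
```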

Alternating Least-Squares (ALS) algorithm
Problem (14) is difficult to solve due to the presence of discrete variables. In [32], an Alternating Least-Squares (ALS) algorithm is proposed to solve this problem.
The ALS algorithm starts from a randomly chosen assignment of the objects, specified by the matrix U, and an assignment of the attributes, specified by V. Then, it repeats the following four steps at each iteration:
1. update the assignment of attributes V,
2. apply PCA to get the component loadings,
3. update the assignment of objects U,
4. update the centroid matrix D̄ = (U^T U)^{-1} U^T D.
In [23], we showed that each iteration of the ALS algorithm can be summarily described by two basic steps: the assignment of objects via K-means, and the reduction of the attribute space via application of PCA to the resulting centroids. The simplified version of the ALS algorithm is presented below.

Algorithm 2 Alternating Least-Squares (Two-Step version of ALS) ([23])
input: D, data matrix of m objects and n attributes; p and k, the desired numbers of clusters of objects and attributes, respectively; ε, numerical tolerance.
output: U, object assignment matrix; V, attribute assignment matrix; A, component loading matrix; ∥UD̄A∥₂², the between cluster deviance in the reduced space of the components.
repeat
1. K-means step (for the objects):
• assign the m objects into p clusters, obtaining the matrix U,
• calculate the centroids in the space of the observed attributes and find the matrix D̄,
• identify the objects by their cluster centroids in the space of the observed attributes and get UD̄.
2. PCA step (for the attributes):
• assign the n attributes into k subsets and obtain the matrix V,
• obtain the loadings of the CDPCA components and get the matrix A,
• calculate the centroids in the reduced space of the k CDPCA components and define the matrix Ȳ = D̄A,
• identify the objects in the reduced space of the k CDPCA components and calculate the matrix Y = DA.
until the difference between two consecutive computations of the between cluster deviance in the reduced space is smaller than ε.
Since the objective function of problem (14) is bounded above, the algorithm converges to a stationary point, which is a local maximum of problem (14) [32]. This procedure can be considered a heuristic and thus, to increase the chances of reaching the global maximum, it has been suggested to run the algorithm several times for different initial assignment matrices U and V, randomly chosen at the beginning of each run.
The difference between the four-step ALS algorithm proposed in [32] and its simplified version described in [23], and summarized in Algorithm 2, consists in the way the matrices V and A are updated. In a general iteration of the Two-Step version of ALS, these matrices are sequentially constructed by an iterative procedure applied to their rows and columns, in order to maximize the between cluster deviance in the reduced space, while in the four-step ALS algorithm the matrices V and A are updated separately. Numerical experiments show that the ALS algorithm in its simplified Two-Step version (Algorithm 2) is faster than the four-step version.
In the following section, we will present a new algorithm for clustering and dimensionality reduction, based on SDP models.

Two-Step-SDP: A new version of clustering and dimensionality reduction
In this section, the clustering problem is reformulated as a nonlinear SDP-based model. To solve this model, an approximation algorithm based on a linear SDP relaxation is described. The new Two-Step-SDP algorithm, based on a modification of the ALS algorithm to perform CDPCA, is then presented, and some differences between the Two-Step-SDP and the ALS algorithms are discussed.

SDP-based formulation for the clustering problem
In [25,26], another formulation for the clustering problem (11), called the 0-1 SDP model, is proposed. This formulation involves an unknown orthogonal projection matrix Z defined in (8). It is shown that the objective function in (11) is equal to the function tr(DD^T(I_m − Z)), and the 0-1 SDP model is formulated as follows:

min_Z tr(DD^T(I_m − Z))
s.t. Z = Z^T, Z^2 = Z,
     tr(Z) = p,
     Ze_m = e_m,
     Z ≥ 0.   (15)

The first two constraints in (15) imply that Z is an orthogonal projection matrix, and the next constraint ensures that there are exactly p clusters. The last but one constraint in problem (15) means that each object is assigned to a single cluster, i.e., each row of the matrix Z has sum equal to 1. The last constraint ensures that Z has nonnegative elements.
In [26], it is proved that problem (15) is equivalent to (11). Problem (15) is called the 0-1 SDP model in [25,26], since the first two constraints imply that Z is positive semidefinite, with eigenvalues 0 or 1. Nevertheless, this model does not have a standard SDP form; therefore, in this paper, we call it the SDP-based model of the clustering problem (11).
Since minimizing tr(DD^T(I_m − Z)) is equivalent to maximizing tr(DD^T Z), we can rewrite problem (15) as

max_Z tr(DD^T Z)
s.t. Z = Z^T, Z^2 = Z,
     tr(Z) = p,
     Ze_m = e_m,
     Z ≥ 0.   (16)

Problem (16) is very difficult to solve due to the nonlinearity of the second constraint. In [25], an approximation algorithm based on a linear SDP relaxation is proposed to obtain its solution.
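The equivalence of the objectives of (11) and (15)/(16) is easy to verify numerically. The following R sketch uses random toy data and a fixed assignment matrix:

```r
# Numerical check (toy data): for Z = U (U^T U)^{-1} U^T,
# tr(D D^T (I_m - Z)) equals the MSSC objective of (11)
set.seed(2)
U <- matrix(c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1), nrow = 5)
D <- matrix(rnorm(10), nrow = 5)
Z <- U %*% solve(t(U) %*% U) %*% t(U)
sum(diag(D %*% t(D) %*% (diag(5) - Z)))       # objective of (15)
Dbar <- solve(t(U) %*% U) %*% t(U) %*% D
sum((D - U %*% Dbar)^2)                       # objective of (11): same value
```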

SDP-based approximation algorithm
Approximation algorithms are widely used to solve hard optimization problems (see e.g., [4,12,13,26]). The general idea of an approximation algorithm is to first solve a relaxation of the hard problem, and then apply some rounding procedure to the obtained solution in order to find a feasible solution of the original problem. It is known that SDP relaxations provide good approximations. The first attempt to use an SDP-based approximation algorithm dates back to 1995, when Goemans and Williamson [13] suggested a randomized approximation algorithm for the max-cut problem, a well known NP-hard problem [20]. Since then, SDP has been successfully applied in the development of approximation algorithms for several classes of hard combinatorial optimization problems. It is also known that standard interior point SDP methods, such as the primal-dual interior point methods [5,12,30,36], are not so efficient for large scale problems and thus, different strategies have been developed [11,19,25].
In [25], an approximation algorithm that uses a linear SDP relaxation is proposed to obtain an approximate solution of (16). First, the SDP relaxation is solved using a procedure based on a characterization introduced in [24]. Then, a rounding procedure is used to extract a feasible solution of (16). In what follows, we consider the two steps of the approximation algorithm based on a SDP relaxation proposed in [25].

Linear SDP relaxation and its solution
The SDP-based model (16) can be relaxed to a simpler problem by removing the nonnegativity requirement on the entries of Z and replacing its first two constraints by the constraint I_m ≽ Z ≽ 0, yielding the following convex SDP problem:

max_Z tr(DD^T Z)
s.t. tr(Z) = p,
     Ze_m = e_m,
     I_m ≽ Z ≽ 0.   (17)

Notice that the last constraint ensures that all the eigenvalues of the positive semidefinite matrix Z are less than or equal to 1.
It is worthwhile mentioning that the solution of the relaxed problem (17) may not coincide with the solution to the SDP-based problem (16).
Since standard SDP methods are not so efficient for large scale problems, in [25], an alternative procedure to solve the SDP problem (17) is suggested. This procedure is based on the characterization of the sum of the largest eigenvalues of a symmetric matrix introduced in [24].
First, notice that one of the eigenvalues of any feasible solution Z of problem (17) is equal to 1, and the corresponding eigenvector is (1/√m) e_m. Moreover, any such matrix Z can be written as

Z = Z_1 + (1/m) e_m e_m^T,   (18)

where Z_1 is the matrix obtained from Z by column and row centring with the orthogonal projection matrix

P_e = I_m − (1/m) e_m e_m^T,   (19)

i.e., Z_1 = P_e Z P_e. Notice that tr(Z_1) = tr(Z) − 1 = p − 1. Applying the same transformation to DD^T, we get the matrix W_1, which is given by

W_1 = P_e DD^T P_e.   (20)

Second, notice that maximizing the objective function in (17) is equivalent to maximizing tr(DD^T Z_1), which in turn is equal to tr(W_1 Z_1). Then, problem (17) can be rewritten in the form of the following problem w.r.t. the matrix variable Z_1:

max_{Z_1} tr(W_1 Z_1)
s.t. tr(Z_1) = p − 1,
     I_m ≽ Z_1 ≽ 0.   (21)

Based on the results from [24] and [25], the optimal solution of the SDP problem (21) is attained if and only if

tr(W_1 Z_1) = λ_1 + ⋯ + λ_{p−1},

where λ_1, ..., λ_m are the eigenvalues of the matrix W_1 listed in decreasing order.
Any solution Z_1 of the SDP problem (21) has the form Z_1 = FF^T, where F is the (m × (p − 1)) matrix whose columns are the eigenvectors associated to the p − 1 largest eigenvalues of the matrix W_1 [20,24,25].
Hence, after obtaining a solution Z_1 of the SDP problem (21), we replace it in (18) to get the solution of the SDP relaxed problem (17).
Therefore, a procedure to obtain a solution Z of the relaxed problem (17) can be outlined as follows ([25]):

Algorithm 3 Solving the SDP relaxed problem (17)
1: Compute the matrix W_1 by using (20),
2: Apply SVD to obtain the p − 1 largest eigenvalues of the matrix W_1 and the associated eigenvectors, v_1, ..., v_{p−1}, and construct the matrix F, whose columns are these eigenvectors,
3: Set Z = FF^T + (1/m) e_m e_m^T.

The solution of the SDP problem (17), obtained using Algorithm 3, can be used as an approximate solution of the SDP-based problem (16). This solution is not necessarily feasible. To find a feasible solution of (16), a rounding procedure is needed. In what follows, we describe a rounding procedure that was proposed in [26].
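Before turning to the rounding step, here is a minimal R sketch of Algorithm 3. It uses eigen() on the symmetric matrix W_1 in place of an explicit SVD (equivalent here, since W_1 is symmetric), and the function name is illustrative:

```r
# Minimal R sketch of Algorithm 3; D is the m x n data matrix, p the number of clusters
solve_relaxed_sdp <- function(D, p) {
  m  <- nrow(D)
  Pe <- diag(m) - matrix(1 / m, m, m)      # centring projector I_m - (1/m) e_m e_m^T
  W1 <- Pe %*% D %*% t(D) %*% Pe           # matrix W_1 in (20)
  Fm <- eigen(W1, symmetric = TRUE)$vectors[, 1:(p - 1), drop = FALSE]
  Fm %*% t(Fm) + matrix(1 / m, m, m)       # Z = F F^T + (1/m) e_m e_m^T
}
```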

Rounding the approximate solution
It can be observed that if Z is a solution of problem (16), then ZD has at least p different rows. Notice that ZD can be considered an object centroid based matrix, where each object is identified by the centroid of the cluster to which it belongs. Based on this observation, the following rounding procedure is suggested in [26].
Algorithm 4 Rounding procedure for the SDP-based problem (16)
1: Given the data matrix D and the solution Z of the relaxed problem (17), select p different rows of ZD and define the initial centroids,
2: Apply K-means to problem (11) using the initial centroids to obtain U.

Notice that the SDP-based approximation algorithm, described by Algorithms 3 and 4, performs a PCA (Principal Component Analysis) step, by reducing the problem into (21) and solving it using SVD, and a K-means step. Motivated by this and by the idea of the CDPCA methodology, in the following section we propose a new approach based on a Two-Step-SDP clustering scheme, which uses approximation algorithms based on SDP relaxations for clustering and dimensionality reduction.
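For concreteness, a possible R sketch of the rounding step in Algorithm 4 follows; it assumes that Z comes from Algorithm 3 and that ZD contains at least p distinct rows, and the function name is illustrative:

```r
# Minimal sketch of Algorithm 4: round the relaxed solution Z to an assignment
round_to_assignment <- function(D, Z, p) {
  centres <- unique(round(Z %*% D, 8))[1:p, , drop = FALSE]  # p distinct rows of ZD
  km <- kmeans(D, centers = centres)                         # K-means on problem (11)
  diag(p)[km$cluster, , drop = FALSE]                        # binary assignment matrix U
}
```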

A new approach: Two-Step-SDP
Motivated by the works [23,25,26] and [32], we propose a new approach to CDPCA by combining SDP models and the CDPCA methodology. The new approach is called Two-Step-SDP, since two clustering problems are considered and solved using an SDP-based approximation algorithmic framework, and since we modify and improve the ALS algorithm in its Two-Step version, summarized in Algorithm 2, in order to use SDP models for clustering objects and attributes. There are two main modifications to Algorithm 2. In the first modification, we suggest applying the SDP-based approximation algorithm described in Section 4.2 to construct the initial matrices U and V, instead of choosing them randomly. This approximation algorithm is expected to return almost optimal solutions quite fast. The second modification to Algorithm 2 consists in updating the component loading matrix A using the current assignment matrix V. Both modifications improve Algorithm 2 not only in terms of computational time, but also in efficiency.
The idea of the Two-Step-SDP approach is to obtain not only a solution of the SDP-based problem defined in (16) for clustering objects, but also a solution of an SDP-based problem for clustering attributes into disjoint subsets, using an approximation algorithm based on the framework described in the previous section. The solutions of such problems are computed in order to maximize the between cluster deviance in the reduced space of the components. Therefore, the Two-Step-SDP approach can be considered an alternative to CDPCA.
Consider the SDP-based model (16) for the clustering problem. Recall that for clustering the objects into groups, in (16) we use the matrix Z defined in (8) as an unknown orthogonal projection matrix determined by the assignment matrix U. For clustering attributes into disjoint subsets, we can use the matrix V and an unknown orthogonal projection matrix H defined in (10). Using the model (16) with the matrix variable H, we get an SDP-based problem for clustering attributes.
As mentioned above, SDP-based models of the form (16) are difficult to solve. Let us relax the nonlinear constraints of the SDP-based problems using the approach described in the previous section. Thus, for the SDP-based problem for clustering objects, we get the relaxed SDP problem (17), and for the SDP-based problem for clustering attributes, we get the following relaxation:

max_H tr(D^T DH)
s.t. tr(H) = k,
     He_n = e_n,
     I_n ≽ H ≽ 0.   (22)

To solve the linear SDP problems (17) and (22), one can use the approach described in Section 4.2, and the solution of the original SDP-based models can be obtained using a rounding procedure.
The rounding procedure used in the Two-Step-SDP approach can be summarized as follows. Given assignment matrices U and V, apply the CDPCA methodology to obtain the component loading matrix A, as well as the component score matrix Y and the object centroid matrix in the reduced space, Ȳ. Next, perform the K-means algorithm for clustering the objects in the reduced space, i.e., apply K-means to the matrix Y using Ȳ as initial centroids. Finally, compute the between cluster deviance in the reduced space of the components. The algorithm stops when the between cluster deviance is no longer increased.

Two-Step-SDP algorithm
The algorithmic scheme of the Two-Step-SDP algorithm for clustering and dimensionality reduction can be described as follows.
Algorithm 5 Two-Step-SDP
input: D, data matrix of m objects and n attributes; p and k, the desired numbers of clusters of objects and attributes, respectively; ε, numerical tolerance.
output: U, object assignment matrix; V, attribute assignment matrix; A, component loading matrix; ∥ZDA∥₂², the between cluster deviance in the reduced space of the components.
1: Solve approximately the SDP-based model for clustering objects:
a) Considering the relaxed model (17), compute the matrix W_1 by using (20),
b) Use SVD to obtain the p − 1 largest eigenvalues of the matrix W_1 and the associated eigenvectors, v_1, ..., v_{p−1}, and construct the matrix F, whose columns are these eigenvectors,
c) Set the approximate solution: Z = FF^T + (1/m) e_m e_m^T.
2: Solve approximately the SDP-based model for clustering attributes:
a) Considering the relaxed model (22), compute the matrix W_2 by using (20) on the matrix D^T D,
b) Use SVD to obtain the k − 1 largest eigenvalues of the matrix W_2 and the associated eigenvectors, w_1, ..., w_{k−1}, and construct the matrix G, whose columns are these eigenvectors,
c) Set the approximate solution: H = GG^T + (1/n) e_n e_n^T.
3: Rounding procedure:
Initialization: randomly select p different rows from ZD, and k different rows from HD^T, as initial centroids for the K-means algorithm to get the initial matrices U and V, respectively.
a) Compute Z* = U(U^T U)^{-1} U^T and D̄ = (U^T U)^{-1} U^T D.
b) Compute A by fixing each column v_q, q = 1, ..., k, of V and replacing the nonzero elements in v_q by the elements of the eigenvector associated to the largest eigenvalue of the matrix (Z*D[c])^T Z*D[c], where c is a row vector containing the row indices of the nonzero elements of v_q, i.e., the attributes that contribute to the component q, and [c] means that only the columns specified in c are used.
c) Update U by performing the K-means algorithm in the reduced space, i.e., applying it to the matrix Y = DA with initial centroids Ȳ = D̄A. Randomly select k different rows from HD^T as initial centroids for K-means and get V.
d) Compute the between cluster deviance in the reduced space, ∥Z*DA∥₂².
Stopping criterion: the rounding procedure stops when the difference between two consecutive computations of the between cluster deviance in the reduced space is smaller than the pre-specified tolerance value ε.
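Step 3b is the least standard part of the scheme. The following R sketch shows one way to implement it; the function name update_loadings is illustrative, not the paper's code:

```r
# Sketch of step 3b of Algorithm 5: loadings for a fixed attribute partition V
update_loadings <- function(D, U, V) {
  n  <- nrow(V); k <- ncol(V)
  Zs <- U %*% solve(t(U) %*% U) %*% t(U)        # Z* = U (U^T U)^{-1} U^T
  A  <- matrix(0, n, k)
  for (q in 1:k) {
    cols <- which(V[, q] == 1)                  # attributes contributing to component q
    Mq   <- Zs %*% D[, cols, drop = FALSE]      # Z* D[c]
    A[cols, q] <- eigen(t(Mq) %*% Mq, symmetric = TRUE)$vectors[, 1]
  }
  A
}
```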
Since the Two-Step-SDP algorithm uses the approximate solutions of the nonlinear SDP-based problems for clustering objects and attributes, which are close to the final solutions, the clustering process is expected to be finite.
The main feature of the Two-Step-SDP algorithm is that it constructs the clusters of attributes and uses them to cluster the objects by maximizing the between cluster deviance in the reduced space of the components, i.e., the K-means algorithm is performed in a lower-dimensional space. Moreover, the Two-Step-SDP algorithm provides not only the allocation of objects and attributes into clusters, but also the component loadings associated with the resulting allocation of attributes. The quality of the clustering can be measured by the between cluster deviance in the reduced space of the components, or by the within sum of squared distances between each object and the cluster centroids.

Two-Step-SDP and ALS
Notice that both the Two-Step-SDP and the ALS algorithms use iterative schemes to obtain the best solution. Nevertheless, there are some differences. In particular, in the ALS algorithm, the update of the matrices V and A is done using an alternating procedure that works row-by-row and column-by-column on these matrices, obtaining components that explain the largest variance in order to maximize the between cluster deviance in the reduced space (see the details in [23]). Such an alternating procedure has to be performed nk times at each iteration of the main algorithm. Thus, it may be quite expensive in terms of computational time and memory. In the Two-Step-SDP algorithm, the matrix V is obtained using the solution of the SDP relaxation problem (22), and therefore, it is expected to be close to the optimal assignment. Hence, at each iteration of the rounding procedure, the update of the matrix A requires only one step. Notice that in the Two-Step-SDP algorithm, the dimensionality reduction is done by finding a partition of the attributes specified by V, and then computing the component loadings specified in the matrix A for this particular partition. Thus, the dimensionality reduction may not explain the largest variance. Therefore, the Two-Step-SDP algorithm can be regarded as a simplified version of the ALS algorithm.

Computational results
This section presents our numerical experiments using the statistical software R, an open source software widely used by the statistical community that offers many routines for different kinds of problems. Details on how to use our implementation of the Two-Step-SDP algorithm, as well as some examples, are presented. A comparison of the results obtained using the proposed Two-Step-SDP algorithm and other existing techniques is also reported.

The R function TwostepSDPClust
We have implemented the Two-Step-SDP algorithm in R. In order to solve the problems more efficiently, we used two strategies concerning the computation of eigenvalues and eigenvectors. In the steps of the algorithm, we need to compute eigenvalues and eigenvectors of possibly huge covariance matrices of the form M^T M, where M is a matrix with more columns than rows. In these cases, we used the approach suggested in [27] and considered the smaller matrix MM^T. It is shown there that the nonzero eigenvalues of M^T M and MM^T coincide, and each unit eigenvector w of M^T M associated to the eigenvalue λ is related to the unit eigenvector v of the smaller matrix MM^T by w = M^T v/√λ. In the rounding procedure step of the Two-Step-SDP algorithm, we need to compute the largest eigenvalue and the corresponding eigenvector of a matrix. Here, we have implemented the Power Method [8], which proved to be faster than the standard methods for computing eigenvalues and eigenvectors.
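A minimal R sketch of both devices, the power method and the small-matrix trick, follows; the function name is illustrative, not the one used in TwostepSDPClust:

```r
# Power method sketch for the leading eigenpair of a symmetric PSD matrix S,
# and the small-matrix trick: an eigenvector of M^T M from one of M M^T
power_method <- function(S, tol = 1e-10, maxit = 1000) {
  v <- rnorm(nrow(S)); v <- v / sqrt(sum(v^2))
  for (i in seq_len(maxit)) {
    w <- drop(S %*% v); w <- w / sqrt(sum(w^2))
    if (sum((w - v)^2) < tol) { v <- w; break }
    v <- w
  }
  list(value = drop(t(v) %*% S %*% v), vector = v)
}

set.seed(3)
M  <- matrix(rnorm(5 * 200), nrow = 5)           # more columns than rows
ep <- power_method(M %*% t(M))                   # eigenpair of the small M M^T
w  <- drop(t(M) %*% ep$vector) / sqrt(ep$value)  # unit eigenvector of M^T M: w = M^T v / sqrt(lambda)
```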
The Two-Step-SDP algorithm is implemented in R in the function TwostepSDPClust, which is available from the author upon request. This function is suitable for data matrices with numeric elements and starts by standardizing the data (the description of the standardization step can be found in [23]). The user needs to input the following arguments in TwostepSDPClust:
• data - the numeric data matrix,
• p - the number of clusters of objects,
• k - the number of clusters of attributes.

The following data sets were used in the numerical experiments:
• OECD countries - the data set of 20 objects and 6 attributes from [31], representing the short-term macroeconomic scenario (1999) of OECD countries; notice that, although the true classes are not known, the experiments reported in [31,32] suggest dividing the objects into 3 clusters.
• Breast Cancer - the Wisconsin Breast Cancer data set from the UCI repository; it contains 683 instances (originally, there were 699 instances, but 16 of them were excluded, since they contain missing values), each instance is described by 9 attributes with integer values in the range 1-10 and a binary class label, which divides the instances into two clusters, representing the type of tumor (benign or malignant).
• Diabetes - the Pima Indians Diabetes data set from the UCI repository; it contains 768 objects and 8 attributes, divided into two classes, representing the results of testing for diabetes (positive or negative).
• Colon - the data set from microarray experiments on colon tissue samples, available in the R package plsgenomics; it contains 62 samples and 2000 genes, divided into 2 classes, representing the type of tumor (benign or malignant).
• leukemia - the leukemia data set, also available in the R package plsgenomics; it contains 38 samples and 3051 genes, also divided into two classes, representing the type of tumor (benign or malignant).
• SRBCT - the gene expression data set from microarray experiments on small round blue cell tumors, also available in the package plsgenomics; it contains 83 objects (called samples) and 2308 attributes (genes), divided into 4 groups, representing four cancer variants.
• Iris - Fisher's Iris data set, available in the UCI repository, the most famous data set for clustering experiments; it contains 150 objects described by 4 attributes, and the true classification of the objects into 3 clusters, representing three species of the Iris flower, is known.
• Soybean - the Soybean data set from the UCI repository; it contains 47 objects described by 35 attributes, where the four true classes of objects are also known, and represent four soybean diseases.
For the numerical tests using the Two-Step-SDP algorithm, we set the tolerance to 10^{-8} and the maximum number of iterations to 100. The K-means heuristic was executed on all the data sets previously scaled, with the number of random starts set to 1000 and the maximum number of iterations set to 100000. The tolerance for the ALS algorithm was set to 10^{-5} and the maximum number of iterations to 100. To improve the quality of the ALS solutions, the numerical tests were run 1000 times, with the exception of the gene expression data sets Colon, leukemia and SRBCT. For the data sets Colon and leukemia, 20 runs were performed, and for SRBCT, only 10 runs were performed, because the computational time of the ALS algorithm became unreasonably long in these cases. For the Soybean data set, the standardization step was not executed, since the data matrix presents null columns.
Following the analysis presented in [32] for the OECD countries data set, we consider only two principal components for all the tested data sets, i.e., k = 2. Table I summarizes the characteristics of the data sets. It also contains the desired number of clusters of objects, represented by p. For all the presented data sets, the true classes are known, with the exception of the OECD countries data set. The detailed numerical results are presented in Tables II, III and IV. Tables II and III contain the results obtained using the Two-Step-SDP algorithm and the K-means algorithm, respectively, while Table IV contains the results obtained using the ALS algorithm. The first column of Tables II, III and IV contains the name of the data set, and the second column contains the computational time in seconds. In these tables, iter is the number of iterations, wssd is the within sum of squared distances, i.e., the within cluster deviance, bcd is the between cluster deviance, bcdr is the between cluster deviance in the reduced space of the components, bcdrp is the between cluster deviance in the reduced space as a percentage of the total variance, e is the error associated with the CDPCA model (12), Usize and Vsize represent the sizes of the clusters of objects and attributes, respectively, and Exp.var. is the variance explained by the components. It should be noticed that, according to the information provided by the algorithms, Tables II, III and IV contain different numbers of columns.

Comparing the results obtained using the Two-Step-SDP algorithm, presented in Table II, and the standard K-means algorithm, displayed in Table III, we can conclude that, in general, both approaches are quite efficient in terms of computational time and solution quality. For the leukemia data set, the K-means algorithm performs considerably faster. Notice that the K-means algorithm had to be executed for clustering both attributes and objects; thus, its computational time is the sum of the running times of both procedures. With respect to the clusters of attributes, the K-means algorithm does not provide further information on the variance explained by the resulting components, while the Two-Step-SDP algorithm does. It can be observed that in almost all cases, the values of the within and between cluster deviances obtained using the Two-Step-SDP and the K-means algorithms are quite similar. The major difference in the performance of these algorithms is on the Soybean data set. The Two-Step-SDP algorithm returned a within cluster deviance equal to 453.12 and a between cluster deviance equal to 2760.87, while the K-means algorithm returned a within cluster deviance equal to 205.96 and a between cluster deviance equal to 484.2. Regarding the clusters of attributes, it can be observed that the Two-Step-SDP and the K-means algorithms return clusters of equal sizes, with the exception of the Soybean and SRBCT data sets. With respect to the clusters of objects, these algorithms returned different clusters for the Iris, Soybean and SRBCT data sets.
Comparing the results presented in Tables II and IV, we can conclude that the Two-Step-SDP algorithm is faster than the ALS algorithm. In a couple of iterations, it finds a solution that either yields the same value of the between cluster deviance in the reduced space as that returned by the ALS algorithm, or is very close to it. It can also be observed that the errors of solving the CDPCA model (12) using the Two-Step-SDP and the ALS algorithms are very similar. Regarding the results of the clustering of objects, it can be observed that for the Breast Cancer, Diabetes, Colon, leukemia and Iris data sets, the sizes of each cluster coincide for both approaches. With respect to the clusters of attributes, there are some differences, which induce differences in the variance explained by the obtained components. The ALS algorithm provides better results for the Soybean, OECD countries and SRBCT data sets in terms of the value of the between cluster deviance in the reduced space and of the variance explained by the components. It can also be observed that for the Breast Cancer and the Colon data sets, the variances explained by the first component are, respectively, 62.52% and 37.56% for the Two-Step-SDP algorithm, and 39.11% and 23.39% for the ALS algorithm.
In order to compare the quality of the object clusters obtained using the three algorithms, we present pseudo-confusion matrices, summarizing the (mis)classification of the objects when the true classes are known. In a pseudo-confusion matrix, the rows correspond to the true classes and the columns correspond to the classes returned by the algorithms. Notice that the numbering of the predicted classes may differ from that chosen for the real ones. The analysis of the performance of each algorithm can be complemented by computing its accuracy. Here, we measure the accuracy of an algorithm by the number of correct predictions of objects divided by the total number of objects. Considering the pseudo-confusion matrices presented in Table V, we can conclude that all the approaches provide the same results in terms of classification of the objects for the Breast Cancer data, with a very high accuracy of 96%; the Diabetes data, with an accuracy of 66%; the Colon data, with an accuracy of 55%; and the leukemia data, with an accuracy of 68%. For the SRBCT and Soybean data sets, all the approaches present different cluster structures. The K-means algorithm presents the lowest accuracy for the SRBCT data, 37%, while the ALS algorithm has the highest, 47%, which is quite close to that obtained using the Two-Step-SDP algorithm, 45%. For the Soybean data, the Two-Step-SDP algorithm yielded an accuracy of 57%, the ALS algorithm an accuracy of 53%, and the K-means algorithm the highest accuracy, 72%. For the Iris data set, the Two-Step-SDP and ALS algorithms obtained the same classifications, with an accuracy of 87%, while the K-means algorithm returned an accuracy of 83%.
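For instance, the pseudo-confusion matrix and the accuracy can be computed in R along the following lines; the labels are hypothetical, and in general the predicted labels must be matched to the true ones before counting correct predictions:

```r
# Pseudo-confusion matrix and accuracy for hypothetical true/predicted labels
true_cls <- c(1, 1, 2, 2, 2, 3, 3)
pred_cls <- c(2, 2, 1, 1, 3, 3, 3)
table(true = true_cls, predicted = pred_cls)  # rows: true classes
relabel  <- c(2, 1, 3)[pred_cls]              # swap labels 1 and 2 (by inspection)
mean(relabel == true_cls)                     # accuracy: 6/7, about 0.86
```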

Concluding remarks
In this paper, we have presented a new approach to clustering and dimensionality reduction. The Two-Step-SDP algorithm is presented for clustering both objects and attributes. The algorithm starts by solving particular relaxed SDP models, whose solutions are used to get the initial centroids for the clustering procedure. Then, it uses an iterative rounding scheme for clustering the objects by maximizing the between cluster deviance in the reduced space. Since the initial centroids are obtained from the solutions of the SDP relaxations, the rounding procedure is expected to stop in a finite number of iterations. The new algorithm is implemented in an easy-to-use software application in R, and we have included tests that show the efficiency of the Two-Step-SDP algorithm. A comparison with other clustering algorithms, namely the ALS and K-means algorithms, was performed.
Based on the numerical experiments, we can conclude that the Two-Step-SDP algorithm finds solutions that are close to those obtained using the ALS and K-means heuristics. The experiments show that the Two-Step-SDP algorithm is comparable to the K-means algorithm in terms of execution time, and that it is significantly faster than the ALS algorithm. Although the computational time of K-means is slightly better than that of the Two-Step-SDP algorithm in several cases, the K-means algorithm returns less information than the Two-Step-SDP algorithm. In particular, K-means does not provide further information on the partition of the attribute space, namely, on the component loadings and the variance explained by the components, while the Two-Step-SDP algorithm does. We can conclude that the Two-Step-SDP algorithm for clustering and dimensionality reduction can be considered a modification, or a simplified version, of the ALS algorithm: it provides almost the same information, but is much faster in obtaining the assignments, and the loss of information on the variance explained by the components is quite insignificant.


Table I .
Summary of the characteristics of the data sets and the number of clusters of objects used in the experiments.

Table II .
Numerical results using the Two-Step-SDP algorithm.

Table III .
Numerical results using the K-means algorithm.

Table IV .
Numerical results using the ALS algorithm.

Table V .
Pseudo-confusion matrices for the classification of objects obtained using the R functions TwostepSDPClust, kmeans and CDpca. The OECD countries data set is not included, since its true classes are unknown.