Regularized Jacobi Wavelets Kernel for Support Vector Machines

A new family of regularized Jacobi wavelets is constructed. Based on these wavelets, a new kernel for support vector machines is presented. Using kernel and frame theory, the Reproducing Kernel Hilbert Space of this kernel is identified. We show that, without being a universal kernel, the proposed one possesses a good separation property and a strong ability to extract discriminative features. These theoretical results are confirmed and supported by numerical experiments.


Introduction
Support vector machines (SVMs) have become a dominant tool in machine learning, used for both data classification and regression ([20]). They have been extensively applied in many fields such as pattern recognition, biology, medical diagnosis, chemistry and bioinformatics ([8,16,21]). The classification problem can be reduced to consideration of the two-class problem, namely binary classification. The principle of SVMs is to recast this binary classification problem as the search for a separating hyperplane. SVM theory rests on three main concepts:
• The Structural Risk Minimization principle ([20]), which has been shown to be superior to the traditional Empirical Risk Minimization principle ([3]).
• The use of a convex quadratic optimization problem to solve the maximum margin problem; efficient solvers are now available that can handle such problems even with large and dense data ([18]).
• When data are not linearly separable, SVMs use a kernel function that maps the data into a high-dimensional feature space in which the problem becomes linearly separable.
Combining wavelet analysis with orthogonal polynomials gave rise to new wavelets, such as the Legendre, Chebyshev and Hermite wavelets, which have become a powerful technique for signal processing, solving differential equations, optimal control, and calculus of variations problems ([2,7,14]). Because of a continuity problem, kernels based on this type of wavelet have never been introduced before, in particular in the context of SVMs. In this paper, we build a new kernel based on Jacobi wavelets. The Legendre and Chebyshev wavelets represent particular cases of Jacobi wavelets of indices (α, β), with (α = 0, β = 0) and (α = β = −1/2) respectively, which shows the generality of our framework. Using frame theory ([13]), we identify the Reproducing Kernel Hilbert Space (RKHS) of this kernel and we confirm that this new kernel has the separation property. These theoretical results are confirmed and supported by numerical experiments. The paper is structured as follows: Section 2 is devoted to kernel and frame theories. In Section 3, we present the main results of this work: we describe the construction of the regularized Jacobi wavelets kernel and identify its RKHS using frame theory. Numerical experiments are given in Section 4, where the Jacobi wavelets kernel is evaluated on two-dimensional illustrative examples in order to also give a graphical comparison with other kernels. Finally, we conclude the paper in Section 5.

Given a finite training set {(x_i, y_i), i = 1, ..., l}, where the x_i are l points in R^d and the corresponding labels y_i lie in {−1, 1}, the goal is to find a hyperplane in R^d that separates the two classes: class 1 and class −1. In linear classification, the aim is to construct a classifier f (a decision function) that distinguishes between the two sets. In practice, however, data are generally not linearly separable. A nonlinear classification is then proposed that maps the data into a high-dimensional feature space H via a transformation ϕ called a feature map (ϕ : X → H), such that the transformed data are linearly separable. Pulled back into the original space, this hyperplane describes a surface.

The optimization problem
The classic soft margin SVM formulation ([5]) involves a penalty parameter C that trades off model complexity against misclassified points. It is not necessary to know the explicit form of the mapping ϕ, since we can replace the inner product ϕ(x_i)^T ϕ(x_j) by a kernel function K such that K(x_i, x_j) = ⟨ϕ(x_i), ϕ(x_j)⟩_H, where ⟨., .⟩_H denotes the inner product of H; the matrix (K_ij), i, j = 1, ..., l, is called the Gram matrix of the kernel K.
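The kernel trick just described can be exercised directly: scikit-learn's SVC accepts a precomputed Gram matrix, so the feature map ϕ never has to be written down. A minimal sketch on hypothetical XOR-like toy data with an RBF Gram matrix (the data and parameter choices are illustrative, not the paper's):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical XOR-like toy data (not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

def rbf_gram(A, B, gamma=1.0):
    """Gram matrix K_ij = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Training only ever sees the inner products K(x_i, x_j), never phi(x_i) itself.
clf = SVC(C=10.0, kernel="precomputed")
clf.fit(rbf_gram(X, X), y)
pred = clf.predict(rbf_gram(X, X))  # rows: evaluation points, columns: training points
```

For prediction on new points, the Gram matrix passed to `predict` has one row per test point and one column per training point.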

The solution α = (α_1, ..., α_l) of the optimization problem (2) and the bias b, which can be recovered from the Lagrange multiplier associated with the equality constraint of (2) ([3]), give the classifier function that is fundamental for the learning task. The integer sv denotes the number of support vectors, which are the points x_i whose associated Lagrange multipliers are nonzero (α_i ̸= 0). Thus, the decision function can be represented by only a few data points (the so-called support vectors). For a given kernel, an SVM classifier that yields fewer support vectors is desirable.
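The decision function can be rebuilt by hand from the support vectors alone, which makes the sparsity claim concrete. A sketch on hypothetical data (scikit-learn's `dual_coef_` attribute already stores the products α_i y_i):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly-structured toy data for illustration.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
y = np.where(X[:, 0] > 0, 1, -1)

gamma = 0.5
clf = SVC(C=1.0, kernel="rbf", gamma=gamma).fit(X, y)

# f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over the support vectors only.
sv = clf.support_vectors_          # the x_i with alpha_i != 0
coef = clf.dual_coef_.ravel()      # alpha_i * y_i for those x_i
b = clf.intercept_[0]
K = np.exp(-gamma * ((sv[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
f = coef @ K + b                   # agrees with clf.decision_function(X)
```

Every point outside the margin contributes nothing to the sum, which is why the classifier is carried by a few support vectors.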

Kernels for SVMs
In the sequel, we present the basic definition and some important properties of kernels for SVMs.
Definition 1 [17] Let X be a non-empty set. A function K : X × X → R is called a kernel on X if there exists an R-Hilbert space H and a map ϕ : X → H such that K(x, x′) = ⟨ϕ(x), ϕ(x′)⟩_H for all x, x′ ∈ X. The map ϕ is called a feature map and H a feature space of K.
The most used kernels are the radial basis function (RBF), or Gaussian, kernel, defined for some positive parameter γ by K(x, x′) = exp(−γ∥x − x′∥²), and the polynomial kernel, defined for some positive integer p. One can construct new kernels from existing ones using kernel properties. A good study of kernel theory is given in reference [17].
A characterization of kernels for SVMs is given by the following theorem ([17]): Theorem 1 (Mercer) Let X be a non-empty set and let K : X × X → R be a function. Then K is a kernel if and only if it is symmetric and its Gram matrix is positive semidefinite.
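Mercer's condition can be checked numerically for the classical kernels above: build the Gram matrix, then test symmetry and positive semidefiniteness through the eigenvalues. A sketch (the polynomial kernel is written here in the common form (⟨x, z⟩ + 1)^p, an assumption since the paper's exact formula is not shown):

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def poly_kernel(x, z, p=3):
    # One common polynomial form: K(x, z) = (<x, z> + 1)^p
    return (np.dot(x, z) + 1.0) ** p

def gram(kernel, X):
    """Gram matrix K_ij = kernel(x_i, x_j) over a sample X."""
    return np.array([[kernel(a, b) for b in X] for a in X])

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 3))
for kernel in (rbf_kernel, poly_kernel):
    G = gram(kernel, X)
    assert np.allclose(G, G.T)                  # symmetric
    assert np.linalg.eigvalsh(G).min() > -1e-6  # positive semidefinite (up to round-off)
```

A strictly negative eigenvalue well below round-off would disqualify a candidate function as a kernel.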
It is known that every kernel admits several feature spaces. The next definition introduces the RKHS, which is, in a certain sense, the smallest feature space of a kernel and can consequently serve as a canonical one.
3. The space H is called an RKHS over X if the Dirac functional δ_x is continuous for all x ∈ X.

Frame for SVMs
One can construct kernels and their associated RKHSs from frame theory ([13]). We give the definition of a frame and then a characterization result needed in the remainder.
Definition 3 [4] Let N ∈ N and let {ψ_n}_{n=1,...,N} be a set of non-zero functions of a Hilbert space (H, ⟨., .⟩_H). The family {ψ_n}_{n=1,...,N} is a frame for H if there exist constants B, G > 0 such that B∥f∥²_H ≤ Σ_{n=1}^{N} |⟨f, ψ_n⟩_H|² ≤ G∥f∥²_H for all f ∈ H. The numbers B and G are called frame bounds.
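In finite dimensions the optimal frame bounds B and G are the extreme eigenvalues of the frame operator S f = Σ_n ⟨f, ψ_n⟩ ψ_n. A small sketch with the classical three-vector "Mercedes-Benz" frame of R² (a standard textbook example, not taken from the paper), which is tight with B = G = 3/2:

```python
import numpy as np

# Mercedes-Benz frame: three unit vectors of R^2 at 120-degree angles.
psi = np.array([[0.0, 1.0],
                [-np.sqrt(3) / 2, -0.5],
                [np.sqrt(3) / 2, -0.5]])

# Frame operator S f = sum_n <f, psi_n> psi_n; its extreme eigenvalues
# are the optimal frame bounds B and G.
S = psi.T @ psi
eig = np.linalg.eigvalsh(S)
B, G = eig.min(), eig.max()

# This frame is tight: sum_n |<f, psi_n>|^2 = (3/2) * ||f||^2 for every f.
f = np.array([0.3, -1.2])
coeffs = psi @ f
assert np.isclose((coeffs ** 2).sum(), 1.5 * (f ** 2).sum())
```

For a tight frame (B = G) the dual frame is simply {ψ_n / B}, which makes reconstruction immediate.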
The reconstruction of f from its frame coefficients requires the following definition of a dual frame.
Let H be the set of functions H = {f = Σ_{n=1}^{N} a_n ψ_n, a_n ∈ R for n = 1, ..., N}.
The space (H, ⟨., .⟩_B) is an RKHS and its reproducing kernel is built from the dual frame {ψ̃_n}_{n=1,...,N} in H. We will combine SVM and wavelet theories to build a new kernel based on Jacobi wavelets, and we will characterize the theoretical properties of this new kernel in the next section.

The Jacobi wavelets kernel
The Jacobi polynomials J_m^{(α,β)} are defined by a three-term iterative formula for all α > −1, β > −1. These polynomials belong to the weighted space associated with the weight w(x) = (1 − x)^α (1 + x)^β; in the orthogonality relation, δ_{nm} denotes the Kronecker symbol, Γ the Euler gamma function and ⟨., .⟩_w the weighted inner product. The Jacobi wavelets are defined by ([23]) ψ_{n,m}(x) = 2^{(k+1)/2} J̃_m^{(α,β)}(2^{k+1}x − 2n + 1) for x ∈ [(n − 1)/2^k, n/2^k) and ψ_{n,m}(x) = 0 otherwise, where k ∈ N, n = 1, ..., 2^k indexes the subintervals of the decomposition level, m = 0, 1, ..., M is the degree of the normalized Jacobi polynomial J̃_m^{(α,β)} (M ∈ N). The coefficient 2^{(k+1)/2} ensures normalization.
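Jacobi polynomials are available in SciPy as `scipy.special.eval_jacobi`; the case α = β = 0 recovers Legendre, consistent with the special cases noted in the introduction. The wavelet below is a sketch built on the change of variable t = 2^{k+1}x − 2n + 1 used later in the paper, but it deliberately omits the normalizing constant:

```python
import numpy as np
from scipy.special import eval_jacobi, eval_legendre

# Sanity check: J_m^{(0,0)} coincides with the Legendre polynomial P_m.
t = np.linspace(-1, 1, 7)
for m in range(4):
    assert np.allclose(eval_jacobi(m, 0.0, 0.0, t), eval_legendre(m, t))

def jacobi_wavelet(x, n, m, k, alpha, beta):
    """Unnormalized Jacobi wavelet: J_m^{(a,b)} rescaled to the n-th dyadic
    subinterval of [0, 1] (a sketch; the paper's version also carries a
    normalizing constant)."""
    lo, hi = (n - 1) / 2**k, n / 2**k
    inside = (x >= lo) & (x < hi)
    t = 2**(k + 1) * x - 2 * n + 1       # maps [lo, hi) onto [-1, 1)
    return np.where(inside, eval_jacobi(m, alpha, beta, t), 0.0)

x = np.linspace(0, 1, 9)
vals = jacobi_wavelet(x, n=1, m=2, k=1, alpha=0.0, beta=0.0)
```

With n = 1 and k = 1 the wavelet is supported only on [0, 1/2), which is exactly the source of the discontinuities studied below.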

Theorem 3
The family {ψ_{n,m}} forms an orthonormal basis of (H_wav, ⟨., .⟩_wav). The symmetry and bilinearity of ⟨., .⟩_wav are obvious; we only show that ⟨., .⟩_wav is positive definite. We have

Now, we show that the family {ψ_{n,m}} is orthonormal. This follows by using the change of variable t = 2^{k+1}x − 2n + 1 and the fact that the family of normalized Jacobi polynomials is orthonormal. We will use the same approach of constructing an RKHS by frame theory given in Theorem 2. One can affirm the existence of a positive constant ∆ for fixed M, k ∈ N such that the frame inequality holds. From (4) and the fact that an orthogonal basis is a frame ([4]), we deduce from Theorem 2 that H_wav is an RKHS and its reproducing kernel is given by K(x, y) = Σ_{n,m} ψ_{n,m}(x) ψ_{n,m}(y). In general, the functions belonging to H_wav are not all continuous. In fact, for k = 1 and M = 1, we study the continuity of f at 1/2.
We take a_{n,m} = a_{n′,m} = a_m for n, n′ = 1, ..., 2^k and find that the one-sided limits at 1/2 do not coincide. A kernel of type (5) has never been used for SVMs, since continuity is necessary to have a high-performance RKHS ([6]). This continuity problem is addressed in the following subsection.
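Any finite family {ψ_{n,m}} yields a positive semidefinite kernel of the expansion type K(x, y) = Σ_{n,m} ψ_{n,m}(x) ψ_{n,m}(y), since its Gram matrix is Φ Φᵀ for the feature matrix Φ. A sketch with unregularized, unnormalized wavelets and hypothetical parameter choices:

```python
import numpy as np
from scipy.special import eval_jacobi

def wavelet_features(x, k=1, M=2, alpha=0.0, beta=0.0):
    """Feature vectors (psi_{n,m}(x))_{n,m}: a finite wavelet family induces the
    expansion kernel K(x, y) = sum_{n,m} psi_{n,m}(x) psi_{n,m}(y) (a sketch,
    without the paper's regularization or normalizing constants)."""
    feats = []
    for n in range(1, 2**k + 1):
        lo, hi = (n - 1) / 2**k, n / 2**k
        inside = (x >= lo) & (x < hi)
        t = 2**(k + 1) * x - 2 * n + 1
        for m in range(M + 1):
            feats.append(np.where(inside, eval_jacobi(m, alpha, beta, t), 0.0))
    return np.stack(feats, axis=-1)

x = np.linspace(0, 0.99, 25)
Phi = wavelet_features(x)
K = Phi @ Phi.T            # Gram matrix of the expansion kernel
```

Plotting a row of K against x would show the jump at the dyadic point 1/2, the discontinuity that motivates the regularization below.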

The regularized Jacobi wavelets
Now, we consider the regularized Jacobi wavelets ψ̂_{n,m}. Let Ĥ be the space of functions spanned by these regularized wavelets. One can claim that:

Theorem 4
All functions belonging to the space Ĥ are continuous.

Proof
We know that the function f is continuous on each open subinterval ((n − 1)/2^k, n/2^k); it only remains to study the continuity at the points n/2^k, n = 1, ..., 2^k − 1. We just need to compare the left and right limits at these points.

hence, f is continuous.

Remark 1
The family {ψ̂_{n,m}} forms a frame for Ĥ. To apply Theorem 2, we must compute the dual of (6). The analysis operator U can be defined as in ([4]), its adjoint U* is the synthesis operator, and S = U*U is called the frame operator. This operator is linear and self-adjoint. Note that, in terms of the frame operator ([4]), we have the reconstruction of f from the coefficients ⟨f, ψ̂_{n,m}⟩, for f ∈ Ĥ.
Now, it remains to determine the dual frame functions.

Theorem 6
The space H is the RKHS associated with the Jacobi wavelets in the multidimensional case, and K(., .) is its reproducing kernel.
ii. Let f ∈ H and x ∈ [0, 1]^d. Since f_j ∈ Ĥ and Ĥ is an RKHS, the last equality follows from relations (11) and (13). Then, the reproducing property is verified.
iii. The Dirac functional is continuous on H. Indeed, for x ∈ [0, 1]^d and f ∈ H, we know that each functional δ_{x_j} is continuous on Ĥ, so δ_x is continuous.
The following theorem generalizes Remark 1 to the multidimensional case.

Theorem 7
The family of multidimensional regularized Jacobi wavelets forms a frame for H.

Proof
We know that

Let f ∈ H. By applying relation (10) to every function f_j in Ĥ, we get

Remark 2
The space Ĥ is the vector space of polynomials of degree less than or equal to M. So, we can separate the data by applying Theorem 6 of ([6]).

Numerical tests
In this section, we validate the performance of our regularized Jacobi wavelets kernel on two examples in two-dimensional space and give a geometric comparison between the Jacobi kernel, the Jacobi wavelets kernel and other kernels: the Haar wavelet kernel, the wavelet kernel proposed by Zhang et al. ([24]), the Hermite and Laguerre polynomial kernels, the RBF kernel and the polynomial kernel. Table 1 lists the kernels used in our numerical tests. Since the Jacobi wavelets are defined on [0, 1], the original data must first be transformed to [0, 1] using relation (14). The kernel parameters are selected by ten-fold cross-validation. First, the data are normalized by relation (14). Then, 90% of the normalized data are chosen randomly as the training set and the rest as the testing set. The best kernel parameters are the ones that give the best accuracy over the 10 blocks. The accuracy is given by the formula accuracy = (number of correctly predicted data / total number of testing data) × 100.
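The selection protocol above can be sketched with scikit-learn's GridSearchCV, shown here for the RBF kernel on hypothetical synthetic data; the min-max rescaling below is one common way to map data to [0, 1] and stands in for the paper's relation (14), whose exact formula is not reproduced here:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical two-class data: points inside vs outside the unit circle.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0, 1, -1)

# Min-max rescaling of each feature to [0, 1] (an assumed stand-in for (14)).
Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# 90% training / 10% testing split, as in the paper's protocol.
Xtr, Xte, ytr, yte = train_test_split(Xn, y, test_size=0.1, random_state=0)

# Ten-fold cross-validation over the paper's grids for C and gamma.
grid = {"C": [2.0 ** i for i in range(-5, 16)],
        "gamma": [2.0 ** i for i in range(-15, 4)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=10).fit(Xtr, ytr)
accuracy = 100.0 * (search.predict(Xte) == yte).mean()
```

The same loop applies to any of the kernels in Table 1 once they are exposed as callables or precomputed Gram matrices.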
We choose the variation of the parameters as follows: the parameter C takes the values 2^i with i = −5, ..., 15; the RBF parameter γ takes the values 2^i with i = −15, ..., 3; the degree of the polynomial N varies from 1 to 6; the dilation parameter of the wavelets k varies from 1 to 3, and the parameter a varies from 0.5 to 4 in steps of 0.5. Finally, the parameters α and β vary from −0.75 to 2 in steps of 0.25.
For the first example, the programs were run on a PC with a 2.2 GHz processor and 4 GB of RAM, and for the second example on a PC with a 3.2 GHz processor and 4 GB of RAM. The SVM solver used is fitcsvm from Matlab version R2014a.

Example 1 : two spiral dataset
We consider the two-spiral dataset, which consists of points of R^2 lying on two intertwined spirals that are not linearly separable. The spiral dataset has 100 points, 50 for each class (Figure 1). In Table 2, we report only the best results for each kernel; they are shown in Figures 2-9.
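A generator for data of this shape can be sketched as follows (the paper's exact spiral generator is not specified, so the radii and number of turns below are assumptions):

```python
import numpy as np

def two_spirals(n_per_class=50, turns=2.0, seed=0):
    """Generate two intertwined spirals in R^2, 50 points per class by default
    (a sketch of the dataset's shape, not the paper's exact generator)."""
    rng = np.random.default_rng(seed)
    theta = np.sqrt(rng.uniform(0, 1, n_per_class)) * turns * 2 * np.pi
    r = theta                                  # radius grows with the angle
    spiral1 = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    spiral2 = -spiral1                         # second spiral, rotated by pi
    X = np.vstack([spiral1, spiral2])
    y = np.hstack([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y

X, y = two_spirals()
```

No linear classifier separates these two classes, which is what makes the dataset a standard stress test for kernels.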

Example 2 : checkerboard
The second dataset is the checkerboard, consisting of 486 blue points and 514 red points. For computational reasons, we only consider 200 points of the checkerboard, chosen randomly (see Figure 10). The best results are reported in Table 3 and Figures 11-18.
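Checkerboard data of this kind can be sketched by labeling uniform points on [0, 1]² by the parity of the grid cell they fall in (the number of cells is an assumption, since the paper does not specify its board):

```python
import numpy as np

def checkerboard(n=200, cells=4, seed=0):
    """Sample n points uniformly in [0, 1]^2 and label them by the parity of
    the checkerboard cell they fall in (a sketch of the dataset's shape)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (n, 2))
    ix = np.floor(X * cells).astype(int)       # cell indices along each axis
    y = np.where((ix[:, 0] + ix[:, 1]) % 2 == 0, 1, -1)
    return X, y

X, y = checkerboard()
```

The alternating cells force any successful kernel to carve a highly non-linear, periodic decision boundary.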

Conclusion
In this paper, a new kernel based on regularized Jacobi wavelets is proposed. We show that the functions belonging to the RKHS defined by the Jacobi wavelets are not all continuous, while continuity is essential for the separation property. Therefore, we construct another RKHS by introducing regularized Jacobi wavelets that form a frame for the new RKHS. Using frame theory, we show that this new RKHS verifies the separation property. Experimental results show that this kernel can provide competitive results compared to other kernel functions. Kernels based on wavelets and orthogonal polynomials have the reputation of having an expensive Gram matrix in terms of computation time; reducing this cost is a natural direction for future work.