An Ensemble Based Offline Handwritten Signature Verification System

In the field of security and forgery prevention, handwritten signatures have long been the most widely recognized and most practical biometric. Although handwritten signature verification systems are studied using both on-line and off-line approaches, off-line signature verification is more difficult than on-line verification. This is due to the lack of dynamic information, viz. a database which constantly stores the latest signature of the person. In this paper, an approach using ensemble methods is adopted to classify a signature as a forgery or not. In the proposed system, three base classifiers are used: one unsupervised classifier, Fuzzy C-Means (FCM), and two supervised classifiers, Naive Bayes (NB) and Support Vector Machine (SVM). An attempt is made to compare the different approaches. We attempt both categories of classification, not only because both are applicable in this particular case but also to find out which performs better. In most cases, it is observed that Naive Bayes and the Ensemble are comparable, as both exhibit better performance than the other two classifiers. Between these two, the Ensemble classifier performs better than Naive Bayes in most cases, and consequently we have taken the Ensemble as the final classifier.


Introduction
A signature is a combination of letters, words and special symbols. Signatures are used as a means of identity verification for authentication, attestation and authorization in different legal transactions, especially in the banking environment [1]. Depending on the method used to acquire the samples, a signature verification system may be online or offline [2,3]. Off-line handwritten recognition is more difficult, and also a more practical application area, than on-line recognition. In a pattern recognition system, an ensemble classifier is the fusion of multiple base classifiers to predict the class label of an unseen instance [4]. Here, the base classifiers can be aggregated with the help of combiners like simple majority voting, weighted majority voting, Bayesian combination, probabilistic approximation etc. Among them, simple majority voting is a very widely used combiner due to its simplicity [5]. The output which appears most frequently among the votes of the base classifiers is taken as the pattern of the instance [6].
An ensemble classifier may be of two types, viz. parallel and serial [7]. In the proposed system, we have applied a parallel ensemble classifier in which Fuzzy C-Means (FCM), the Naive Bayes (NB) classifier and the Support Vector Machine (SVM) are applied as base classifiers, such that they perform the classification task independently and their outcomes are finally aggregated to predict the pattern of a given signature. The choice of these three classifiers is based on the fact that we do not have a readily available large training dataset, and NB, being a high-bias/low-variance classifier, is a good choice when the training set is small. SVMs are popular in text classification problems, where very high-dimensional spaces are the norm. They also give high accuracy and good theoretical guarantees regarding overfitting, and with an appropriate kernel they can work well even if the data are not linearly separable in the base feature space. FCM was adopted as a writer-independent verification model, where a single classifier deals with the whole population of writers. The main advantages of the writer-independent model are a reduction in the number of signature samples required for training and validation, and the ability to absorb new writers without generating new personal models.
The prediction results of these three classifiers are fused with the simple majority voting combiner. Based on the possible voting conditions (as given in Table 1), the ensemble classifier decides whether a tested signature is genuine or a forgery. We present in Figure 1 the overview of our proposed work. The objective of the study was to detect forgery for all three classes of counterfeit, viz. random, simple and skilled. It has enormous practical utility because the system can be applied in banking, financial and other document registration organizations, where signatures have to be documented and verified in bulk every day. The contribution of the study may be summarized as follows:
• Development of a well-documented and real-time database consisting of genuine and forged samples of all three categories: random, simple and skilled.
• Development of an algorithm for several classification methods and creation of the ensemble for the most accurate output.
• Development of the algorithm for all three categories of forgery. Hence it is a robust algorithm.
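The simple majority voting combiner described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `majority_vote` and the label strings are assumptions made for the example. With three base classifiers and two possible labels, a tie cannot occur.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that receives the most votes (hypothetical helper)."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Example: two of the three base classifiers vote "forgery".
base_decisions = {"FCM": "forgery", "NB": "genuine", "SVM": "forgery"}
print(majority_vote(base_decisions.values()))  # prints "forgery"
```

A single dissenting base classifier is therefore outvoted by the other two.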

Preprocessing
Here, a binary signature image (obtained by converting the RGB image to grayscale and then to binary form) is taken as the input image, on which operations like complement binarization, removal of redundant bordering components, adjustment of signature spaces etc. are performed [8,9].
1. Conversion to binary complement signature: The complement of a binary signature is the inverse of the binary signature. A complement binary signature is displayed in Figure 2. Here, to invert the binary signature, the logical operator NOT is applied to every pixel of that binary signature.
2. Removal of redundant bordering components: During adjustment of signatures in Photoshop software, bordering components such as those shown in Figure 3(a) were found to be present in some of the samples. We have discarded such elements in order to interpret only the signature region. Let F_mark and I_bor denote the marker image and the original signature image respectively. To extract only the bordered elements from the given signature, we have performed morphological reconstruction from the marker F_mark, taking I_bor as the mask image. The resultant image J contains only the bordered elements. To obtain only the signature region, free from bordered elements, the elements of J are then removed from I_bor. The final resultant signature is displayed in Figure 3(b).
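The two preprocessing steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the image is a list of rows of 0/1 values with the signature as foreground (1) after complementing, 8-connectivity is used, the marker consists of the foreground pixels lying on the image border, and the function names are hypothetical. For binary images, growing the marker inside the mask in this way is equivalent to reconstruction by dilation.

```python
from collections import deque


def complement(img):
    """Logical NOT of a binary image given as rows of 0/1 values."""
    return [[1 - p for p in row] for row in img]


def clear_border(img):
    """Remove foreground components that touch the image border."""
    h, w = len(img), len(img[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()
    # Marker (F_mark): foreground pixels located on the image border.
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and img[r][c]:
                reached[r][c] = True
                queue.append((r, c))
    # Reconstruction: grow the marker inside the mask (I_bor), 8-connected.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and img[rr][cc] and not reached[rr][cc]):
                    reached[rr][cc] = True
                    queue.append((rr, cc))
    # "reached" plays the role of J; the result is I_bor with J removed.
    return [[1 if img[r][c] and not reached[r][c] else 0 for c in range(w)]
            for r in range(h)]
```

Only components that do not touch the border survive, which matches the intent of keeping the signature region alone.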

Features Extraction
In the proposed system, we have extracted the global and local features mentioned below [10,11,12,13,14].

Classification
The three base classifiers are explained below:

FCM clustering technique:
The FCM technique partitions data elements or objects into clusters based on their similarity of behavior [15,16,17]. We have developed our algorithm for verification of signatures with the FCM technique on the following assumption: as long as the cluster size (i.e. the number of data items, including the testing data, present in the cluster that contains the testing data) is more than 2, the clustering process continues. When the cluster size becomes 2, the given signature is assumed to be genuine, and when it becomes 1 (i.e. no training points remain with the testing data), the signature is recognized as a forgery.
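The decision rule above can be sketched with a small two-cluster FCM in plain Python. This is a hedged illustration rather than the authors' code: the fuzzifier m = 2, the deterministic initialisation of the centers, the hardening of memberships by maximum, the `feature_pairs` argument (which pair of feature indices to cluster on in each round) and the "undecided" outcome when the rounds run out are all assumptions introduced for the example.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def _memberships(points, centers, m):
    rows = []
    for x in points:
        d = [_dist(x, v) for v in centers]
        if min(d) == 0.0:
            # The point coincides with one or more centers.
            row = [1.0 if dk == 0.0 else 0.0 for dk in d]
            total = sum(row)
            row = [r / total for r in row]
        else:
            p = 2.0 / (m - 1.0)
            row = [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]
        rows.append(row)
    return rows


def fcm(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Fuzzy C-Means for c >= 2; returns (centers, membership matrix)."""
    n, dim = len(points), len(points[0])
    order = sorted(range(n), key=lambda i: points[i])
    # Assumed initialisation: spread the centers over the sorted data.
    centers = [list(points[order[k * (n - 1) // (c - 1)]]) for k in range(c)]
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][j] for i in range(n)) / total
                                for j in range(dim)])
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


def verify(train, test, feature_pairs):
    """Split repeatedly, keeping the cluster that contains the test item."""
    members = list(train)
    for a, b in feature_pairs:
        pts = [(s[a], s[b]) for s in members] + [(test[a], test[b])]
        _, u = fcm(pts)
        labels = [max(range(len(row)), key=row.__getitem__) for row in u]
        members = [s for s, lab in zip(members, labels[:-1]) if lab == labels[-1]]
        size = len(members) + 1  # training items plus the test item
        if size == 2:
            return "genuine"
        if size == 1:
            return "forgery"
    return "undecided"  # assumption: the source does not cover this case
```

For instance, `verify([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], (5.1, 5.0), [(0, 1)])` leaves the test item with exactly one training item and therefore returns "genuine". The sketch clusters on raw feature values; whether the features are rescaled beforehand is not stated in the text.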

NB Classifier:
Formally, the NB classifier is based on the following three conditions [18]:
• Pre-defined knowledge about the classes, in terms of prior probabilities, should be available. (Here, we have taken prior = 0.5, since before testing a signature p(genuine) = p(forgery).)
• To estimate the parameters, a suitable distribution is selected. (In the proposed system, the Gaussian distribution is implemented, being the general system of distributions.)
• To assign a class label to a test pattern, the posterior probabilities of the available classes are evaluated.
Finally, the class label of the pattern is declared based on the maximum posterior probability. That is, the tested signature is genuine if post(gen) > post(frg) and a forgery if post(gen) < post(frg).
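The three conditions above can be sketched as a small Gaussian Naive Bayes in plain Python. This is an illustrative sketch under assumptions: per-feature means and sample variances are estimated from the training samples of each class, a small constant is added to each variance for numerical stability, scores are compared in log space, and all function names and toy feature values are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Per-feature mean and sample variance (needs at least two samples)."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    variances = [sum((s[j] - means[j]) ** 2 for s in samples) / (n - 1) + 1e-9
                 for j in range(dim)]
    return means, variances


def log_posterior(x, samples, prior):
    """log(prior) plus the sum of the log Gaussian likelihood of each feature."""
    means, variances = fit_gaussian(samples)
    score = math.log(prior)
    for xj, mu, var in zip(x, means, variances):
        score += -0.5 * math.log(2.0 * math.pi * var) - (xj - mu) ** 2 / (2.0 * var)
    return score


def classify(x, genuine, forged, prior_genuine=0.5):
    post_gen = log_posterior(x, genuine, prior_genuine)
    post_frg = log_posterior(x, forged, 1.0 - prior_genuine)
    # Ties are resolved as "forgery" here; the source does not define them.
    return "genuine" if post_gen > post_frg else "forgery"


# Toy feature vectors, not taken from the paper's database.
genuine = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.9)]
forged = [(3.0, 0.5), (3.2, 0.4), (2.9, 0.6)]
print(classify((1.1, 2.0), genuine, forged))  # prints "genuine"
```

Working with log scores avoids underflow when many small likelihoods are multiplied, and it leaves the comparison between post(gen) and post(frg) unchanged.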

SVM Classifier:
In our proposed system, we have applied the LS-SVM classifier, with its parameters selected by the trial and error method. In the LS-SVM classifier, every data point acts as a support vector, although some of them contribute more than others [19]. We have selected the following parameters.
• Kernel function: Radial Basis Function, where the value of δ is taken as 1.
• Boxconstraint: The box constraint is the tradeoff parameter between error and margin, which is taken as 1e-2 in the system.
SVM can classify both linearly and non-linearly separable data. The data in our proposed system are non-linearly separable.
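A minimal LS-SVM classifier with an RBF kernel can be sketched as below. It follows the standard least-squares formulation, in which training reduces to solving one linear system for the bias and the alpha values, so every training point receives a nonzero alpha. Treating the box constraint 1e-2 as the regularisation constant `gamma`, treating δ = 1 as the RBF width `sigma`, and the exact kernel parametrisation exp(-||a-b||²/(2·sigma²)) are assumptions, as are all function names.

```python
import math


def rbf(a, b, sigma=1.0):
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def train_lssvm(X, y, gamma=1e-2, sigma=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [bias; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + [float(yi) for yi in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias, alphas


def predict(x, X, y, bias, alphas, sigma=1.0):
    s = bias + sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alphas, y, X))
    return "genuine" if s >= 0 else "forgery"
```

Here the label +1 stands for genuine and -1 for forgery. Because the bias and alpha values depend only on the training set, they stay the same for every test signature verified against that set.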

Results and Discussion
Experiments were conducted with the methodology described above on various test signatures belonging to the categories genuine, skilled forgery, simple forgery and random forgery. One signature of each type is illustrated below.
Figure 6. A tested genuine signature before and after application of the preprocessing steps.

Verification with Genuine Signature
We illustrate the result with the help of a signature (which is originally genuine), as indicated in Figure 6. The extracted features after requisite pre-processing are indicated in Table 3.
Using FCM: From Table 3, the co-ordinates of the tested signature with respect to the features Area and Height-Width Ratio are (642, 0.3455). In Figure 7, this tested data item is indicated by an arrow mark. From the alignment of the training data items, it is clear that the indicated data item is in the left-hand side cluster. Since our verification result with FCM is based on the assumption that the number of data items in the final cluster is < 3, we have to divide again the particular cluster in which the tested data item lies, based on the features Normalized Area (NA) and Sum of Local Normalized Areas (SLNA). Now, the co-ordinates of the tested data item for the features Normalized Area and Sum of Local Normalized Areas are (0.0384, 0.1536), as indicated by an arrow mark in Figure 8. From the figure, it is clear that the indicated data item lies in the right cluster. Since this data item forms a cluster with one other data item of the training set, we can conclude that the tested signature is genuine.
Using NB: Here the probability values obtained for the given signature were post(gen) = 0.4923 and post(frg) = 6.2974e-004. Hence based on maximum posterior probability, the tested signature is recognized as genuine.
Thus, based on the simple majority voting algorithm, the ensemble classifier correctly identifies the given signature as genuine. The method is then applied to one skilled, one simple and one random forgery (Figures 9, 11 and 13) of the same signature. It may be mentioned that here the values of the test data will be different, but the training sets will be the same as before.
Figure 9. A tested skilled forgery signature before and after application of the preprocessing steps.

Using FCM: The FCM technique has to be conducted three times: first with IA and HWR, then with NA and MVP, and finally with MHP and SLNA, from which it is clear that the tested signature is a forgery (Figure 10).
Using NB: Here post(gen) = 0.1273 and post(frg) = 4.1712e-004. Hence, by the method mentioned above, the tested signature is wrongly recognized as genuine.
Using SVM: Since the training set of the tested signature is the same as in the genuine case, its support vectors, values of alpha and bias are also the same as in that case, and they classify the tested signature as a forgery. Thus, the results of the individual classifiers and the Ensemble classifier are as follows:

Using FCM: Here also, the FCM has to be conducted three times. As a result, the tested signature is classified as a forgery (Figure 12).
Using SVM: Here also, the training set of the tested signature is the same.

Using FCM: Here the test signature is classified as a forgery after the first step of clustering, with features IA and HWR (Figure 14).
Using SVM: Here also, the training set of the tested signature is the same. Thus, the results of the individual classifiers and the Ensemble classifier are:

FCM        NB         SVM        Ensemble Decision
Forgery    Forgery    Forgery    Forgery

Performance measurement
The proposed method is evaluated with the evaluation criteria [20] listed in Table 4. A smaller value of AER indicates a better-performing system. From Tables 5-7, it is observed that Naive Bayes and the Ensemble are comparable, as both exhibit better performance than the other two classifiers. Between these two, the Ensemble classifier performs better than Naive Bayes in most cases, and consequently we have taken the Ensemble as the final classifier.
Jana et al. [21] developed a Euclidean distance based offline signature verification system (database size = 175, language = English) with global and local features. The computed FRR and FAR are 2.86% and 17.14% respectively. With the help of global and local features, Bhausaheb et al. [22] presented a Euclidean distance based offline signature verification system (language = English) with accuracy of 90% to 100% for 60 to 360 signature samples. Hatkar et al. [23] explained a neural network based offline signature verification system (database size = 1000, language = English) with geometrical features. The accuracy of their system was 86.25%. Chugh et al. [24] illustrated a Kohonen network based offline signature verification system with a database size of 250. They obtained FAR and FRR of 2.8% and 5% respectively.
Sigari et al. [25] elaborated an ensemble offline signature verification system based on three simple classifiers with Nearest Neighbor, using different combination methods, among which Majority Voting obtained the best result with 19.25% AER. Here, the database consists of 30 signature samples of 20 classes. Swanepoel [26] developed an offline signature verification system with a database size of 4800, where the AER obtained is 10.23%. Here, the combined classifier is constructed by ensembling DTW and HMM base classifiers.
Finally, we have compared the performance of the proposed system with other systems that use an ensemble classifier with the majority voting method, as tabulated in Table 8.
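The error rates used in this section can be computed as in the sketch below. Because Table 4 is not reproduced in this text, the sketch assumes the usual definitions: FAR is the percentage of forgeries accepted as genuine, FRR is the percentage of genuine signatures rejected as forgeries, and AER is the mean of FAR and FRR. The function name and label strings are hypothetical.

```python
def error_rates(true_labels, predicted_labels):
    """Return (FAR, FRR, AER) in percent under the assumed definitions."""
    genuine = forged = false_reject = false_accept = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == "genuine":
            genuine += 1
            false_reject += pred == "forgery"
        else:
            forged += 1
            false_accept += pred == "genuine"
    far = 100.0 * false_accept / forged
    frr = 100.0 * false_reject / genuine
    return far, frr, (far + frr) / 2.0


truth = ["genuine"] * 4 + ["forgery"] * 4
pred = ["genuine", "genuine", "genuine", "forgery",
        "forgery", "forgery", "genuine", "genuine"]
print(error_rates(truth, pred))  # prints (50.0, 25.0, 37.5)
```

Under these definitions a system that lowers FAR only by raising FRR by the same amount, or the reverse, gains nothing in AER.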

Conclusion
Table 8 indicates that our proposed system gives a better result than the other three systems in terms of AER. Hence this technique may be adopted to resolve the confusing and routine cases of handwritten signature forgery faced in real-life situations.