Improper Multivariate Receiver Operating Characteristic (iMROC) Curve

In a multivariate setup, classification techniques are significant for identifying the exact status of an individual/observer along with the accuracy of the test. One such classification technique is the Multivariate Receiver Operating Characteristic (MROC) Curve. This technique is well known for explaining the extent of correct classification, with the curve lying above the random classifier (guessing line) when it satisfies all of its properties, especially the property of an increasing likelihood ratio function. However, there are circumstances where the curve violates this property; such a curve is termed an improper curve. This paper demonstrates the methodology for characterizing the improperness of the MROC Curve and ways of measuring it. The methodology is explained using real data sets.


Introduction
In classification, there are plenty of techniques for identifying an individual/observer's status in a wide variety of fields such as Psychology, Banking, Forensics, Medicine, etc. [12,3]. One such classification technique, the Receiver Operating Characteristic (ROC) Curve, has been adopted by many authors to evaluate the accuracy of a test, especially in the field of medicine [6]. ROC analysis is mainly used to identify an individual's health status by defining an optimal threshold for a biomarker observed for the disease in question. The first parametric ROC is the Binormal ROC Curve, where the variable under study for two independent populations (healthy/diseased or signal/noise) follows Normal distributions [3].
The ROC Curve has three important properties [4]:
• With y denoting the true positive rate and x denoting the false positive rate, the ROC Curve is a monotonic increasing function in the positive quadrant, lying between y = 0 at x = 0 and y = 1 at x = 1.
• The ROC Curve is unaltered if the classification scores undergo a strictly increasing transformation.
• The slope of the ROC Curve (the likelihood ratio of the ROC Curve) at threshold value 'c' is always positive and is given by dy/dx = p(c|1)/p(c|0), the ratio of the densities of the score U in the two populations at c.
When dealing with practical problems, several variables are often involved, which calls for a classifier rule that combines them for better classification. Su and Liu [11], Reiser and Faraggi [8], Schisterman et al. [10], Liu et al. [5], Yuan and Ghosh [13], Chang and Park [2] and Sameera et al. [9] are among those who proposed extensions of the univariate ROC model to the multivariate setup. The present work is based on the Multivariate ROC (MROC) model proposed by Sameera et al. [9], who showed that their model works better than the model proposed by Su and Liu [11] and is applicable to data where the covariance structures of the two populations are proportional or non-proportional. Among the properties of the ROC Curve, the most important one to verify is the concavity behavior, i.e., that the slope of the ROC Curve at 'c' is always positive and decreases as the false positive rate increases. What happens if a curve does not satisfy the concavity property? If the curve violates this property, it might affect the accuracy of the test as well as the optimal cutoff point defined for that particular test.
Mathematically, a meaningful decision variable should be an increasing function of the likelihood ratio [7], and the corresponding MROC Curve is said to be "Proper". A function whose first derivative is decreasing throughout an open interval is called concave in that interval, and a function whose first derivative is increasing throughout an open interval is called convex in that interval. The slope of an MROC Curve for a continuous decision variable is equal to the likelihood ratio at the corresponding threshold. When the decision variable is an increasing function of the likelihood ratio, the likelihood ratio rises with the threshold while the false positive rate (FPR) falls, so the slope of the MROC Curve decreases as the FPR increases; that is, the MROC Curve is concave everywhere (0 ≤ FPR ≤ 1). If the decision variable is not an increasing function of the likelihood ratio, then its model and the corresponding MROC Curve are said to be improper.
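The proper/improper distinction can be illustrated numerically. The following is a minimal sketch, assuming a univariate binormal decision variable with means `m0`, `m1` and standard deviations `s0`, `s1` in the two populations; the function names `roc_slope` and `is_proper` and all parameter values are hypothetical and introduced only for illustration. It evaluates the slope of the ROC Curve as the likelihood ratio at the threshold matching each FPR and checks whether that slope decreases over a grid of FPR values.

```python
import numpy as np
from scipy.stats import norm


def roc_slope(fpr, m0, s0, m1, s1):
    """Slope of a binormal ROC Curve at the given FPR.

    The slope equals the likelihood ratio (density ratio) evaluated at the
    threshold c that produces this FPR in population 0.
    """
    c = m0 + s0 * norm.ppf(1.0 - fpr)  # threshold matching the FPR
    return norm.pdf(c, loc=m1, scale=s1) / norm.pdf(c, loc=m0, scale=s0)


def is_proper(m0, s0, m1, s1, grid=None):
    """Return True if the slope never increases with FPR on the grid.

    This is a numerical check on a finite grid (an assumption of this
    sketch), not a proof of concavity.
    """
    if grid is None:
        grid = np.linspace(0.001, 0.999, 999)
    slopes = roc_slope(grid, m0, s0, m1, s1)
    return bool(np.all(np.diff(slopes) <= 0.0))


if __name__ == "__main__":
    # Equal spreads: the likelihood ratio is monotone in the threshold.
    print(is_proper(0.0, 1.0, 1.0, 1.0))
    # Unequal spreads: the slope eventually increases again (improper).
    print(is_proper(0.0, 1.0, 1.0, 2.0))
```

In this sketch, equal standard deviations give a curve that passes the check, whereas unequal standard deviations produce a slope that starts increasing beyond some FPR, which is the behavior described above as improper.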

Illustration of Improper MROC (iMROC) Curve
Consider the Indian Liver Patients (ILP) data set, for which the MROC Curve has been drawn and depicted in Figure 1. At first glance the fitted MROC Curve seems to be proper, but on closer inspection its improperness can be identified. In such a situation, the usual MROC Curve methodology might not reflect the true accuracy of a test. Figure 1 marks two crucial points, namely the Crossing Reference line ($t_0$, or Crossing Point) and the Inflection Reference line ($t_1$, or Inflection Point). Note the visible 'dip' in the curve, which crosses the chance line near the upper right-hand corner of the unit square plot. In Figure 1, the MROC Curve crosses the chance line at the point (1-Specificity, Sensitivity) = (0.96, 0.96), shown by the intersection of the "crossing" reference line with the MROC Curve. Furthermore, this MROC Curve is concave for FPR < 0.76 but convex for FPR > 0.76. The point on the MROC Curve that separates the concave and convex portions of the curve is called the "Inflection Point ($t_1$)". Similarly, the point at which the MROC Curve crosses the chance line, where FPR = TPR, is called the "Chance line crossing point or Crossing Point ($t_0$)". Although the dip of the curve is visible in Figure 1, i.e., the MROC Curve is not concave everywhere, it is not possible to locate the inflection point visually. In general, for improper MROC Curves, it is not easy to identify by eye the point where the curve changes from concave to convex. To deal with this situation, the improper MROC Curve methodology has been developed and is demonstrated below.

MROC Curve
Consider the vectors of test scores of two independent multivariate normal populations with mean vectors $\mu_0$, $\mu_1$ and covariance matrices $\Sigma_0$ and $\Sigma_1$, with sample sizes m and n respectively.
where $b$ ($\neq 0$) is a $k \times 1$ vector, U is the test score and c is a scalar. The threshold value thus obtained using (1) is given as
which is the form of the Multivariate ROC model [9]. The AUC of the MROC Curve is
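As a hedged sketch only, the quantities named in this section take the following standard forms when the test score is assumed to be the linear combination $U = b^{\top}X$ of multivariate normal scores. The shorthand $m_j = b^{\top}\mu_j$ and $s_j^2 = b^{\top}\Sigma_j b$ is introduced here for illustration and is not taken from the source; the source's own equation numbers and its choice of $b$ are not reproduced.

```latex
\begin{align*}
U \mid 0 &\sim N(m_0, s_0^2), \qquad U \mid 1 \sim N(m_1, s_1^2),
  \qquad m_j = b^{\top}\mu_j,\quad s_j^2 = b^{\top}\Sigma_j b,\\[4pt]
x(c) &= \mathrm{FPR} = P(U > c \mid 0) = 1 - \Phi\!\left(\frac{c - m_0}{s_0}\right),\\
y(c) &= \mathrm{TPR} = P(U > c \mid 1) = \Phi\!\left(\frac{m_1 - c}{s_1}\right),\\
c &= m_0 + s_0\,\Phi^{-1}(1 - x),\\
y(x) &= \Phi\!\left(\frac{m_1 - m_0 - s_0\,\Phi^{-1}(1 - x)}{s_1}\right),\\
\mathrm{AUC} &= P(U_1 > U_0) = \Phi\!\left(\frac{m_1 - m_0}{\sqrt{s_0^2 + s_1^2}}\right).
\end{align*}
```

Here $\Phi$ is the standard normal distribution function. The fourth line follows by inverting the FPR expression for $c$, and the fifth by substituting that threshold into the TPR expression.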

Crossing Point
Let 'c' denote the threshold corresponding to a chance line crossing FPR. Then
on further simplification, the expression for the crossing threshold $c_0$ is
Let $t_0$ denote the chance line crossing FPR corresponding to $c_0$. Then
on substituting (6) in the above expression, we obtain the expression for the crossing point as
Uniqueness of $t_0$ follows from the uniqueness of $c_0$.
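The crossing-point argument can be worked through under an assumed binormal model for the test score, $U \mid j \sim N(m_j, s_j^2)$ with $s_0 \neq s_1$. The symbols $m_j$ and $s_j$ are illustrative shorthand rather than the source's notation, so the expressions below are a sketch of the reasoning and not necessarily the exact printed forms.

```latex
\begin{align*}
\text{At the crossing threshold } c_0:\quad
  & P(U > c_0 \mid 1) = P(U > c_0 \mid 0)\\
\Longleftrightarrow\quad
  & \Phi\!\left(\frac{m_1 - c_0}{s_1}\right) = \Phi\!\left(\frac{m_0 - c_0}{s_0}\right)\\
\Longleftrightarrow\quad
  & s_0\,(m_1 - c_0) = s_1\,(m_0 - c_0)\\
\Longrightarrow\quad
  & c_0 = \frac{s_1 m_0 - s_0 m_1}{s_1 - s_0},\\[6pt]
t_0 = P(U > c_0 \mid 0)
  &= \Phi\!\left(\frac{m_0 - c_0}{s_0}\right)
   = \Phi\!\left(\frac{m_1 - m_0}{s_1 - s_0}\right).
\end{align*}
```

Because $\Phi$ is strictly increasing, the equation for $c_0$ is linear and has exactly one solution when $s_0 \neq s_1$, which is why $t_0$ inherits its uniqueness from $c_0$. When $s_0 = s_1$ and $m_0 \neq m_1$, no finite crossing threshold exists.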

Inflection Point
The MROC Curve is twice differentiable. From basic calculus results concerning concave functions, it follows that the MROC Curve is concave (convex) over an open interval if its second derivative is negative (positive) throughout that interval. The approach is to show that the second derivative of the MROC Curve is negative throughout (0, $t_1$) and positive throughout ($t_1$, 1) if ν < 1, and positive throughout (0, $t_1$) and negative throughout ($t_1$, 1) if ν > 1. Let t denote an FPR with corresponding threshold c. The derivative of the MROC Curve evaluated at t is equal to the likelihood ratio evaluated at c, i.e.,
It follows, using the chain rule, that
Since
t is a strictly decreasing function of c and
Therefore, equation (8) can be rewritten as
Since ϕ(·) > 0, it follows from equation (9) that the second derivative of the MROC Curve and the derivative of the likelihood ratio have opposite signs when evaluated at t and c, respectively. That is,
Since the logarithmic function is strictly increasing, the likelihood ratio and log likelihood ratio derivatives have the same sign; hence it follows that
The log likelihood ratio and its derivative with respect to c are given by
The threshold value at the inflection point can be estimated as
on further simplification, the threshold value at the inflection point is given by
Then the corresponding FPR is
on substituting $c_1$ in the above equation, the FPR at the corresponding $c_1$ is given by
on further simplification, the FPR value at the inflection point is as follows
Since the derivative of the log likelihood ratio has the opposite sign of the second derivative of the MROC Curve evaluated at the corresponding FPR (10), and thresholds less than $c_1$ correspond to FPRs greater than $t_1$ and vice versa, this FPR value is the unique inflection point FPR $t_1$, and $c_1$ is its corresponding inflection point threshold, i.e., the derivative of the log likelihood ratio at $c_1$ is zero.
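The chain of steps in this section can be sketched under an assumed binormal model for the test score, $U \mid j \sim N(m_j, s_j^2)$ with $s_0 \neq s_1$, where $m_j$ and $s_j$ are illustrative shorthand not taken from the source. The derivation below follows the stated logic (chain rule, sign argument, log likelihood ratio) and is offered as a worked example, not as the source's exact equations.

```latex
\begin{align*}
\frac{dy}{dt} &= LR(c) = \frac{f_1(c)}{f_0(c)},
  \qquad t = 1 - \Phi\!\left(\frac{c - m_0}{s_0}\right),\\[4pt]
\frac{d^2 y}{dt^2} &= \frac{d\,LR(c)}{dc}\cdot\frac{dc}{dt},
  \qquad \frac{dt}{dc} = -\frac{1}{s_0}\,\phi\!\left(\frac{c - m_0}{s_0}\right) < 0,\\[4pt]
\frac{d^2 y}{dt^2} &= -\,\frac{s_0}{\phi\!\left(\frac{c - m_0}{s_0}\right)}\;\frac{d\,LR(c)}{dc}
  \quad\Longrightarrow\quad
  \operatorname{sign}\!\left(\frac{d^2 y}{dt^2}\right)
  = -\operatorname{sign}\!\left(\frac{d \log LR(c)}{dc}\right),\\[4pt]
\log LR(c) &= \log\frac{s_0}{s_1} - \frac{(c - m_1)^2}{2 s_1^2} + \frac{(c - m_0)^2}{2 s_0^2},
  \qquad
  \frac{d \log LR(c)}{dc} = \frac{c - m_0}{s_0^2} - \frac{c - m_1}{s_1^2},\\[4pt]
\frac{d \log LR(c)}{dc}\bigg|_{c = c_1} = 0
  &\;\Longrightarrow\;
  c_1 = \frac{m_0 s_1^2 - m_1 s_0^2}{s_1^2 - s_0^2},\\[4pt]
t_1 = 1 - \Phi\!\left(\frac{c_1 - m_0}{s_0}\right)
  &= \Phi\!\left(\frac{s_0\,(m_1 - m_0)}{s_1^2 - s_0^2}\right).
\end{align*}
```

The derivative of the log likelihood ratio is linear in $c$, so it changes sign exactly once, at $c_1$. With $s_1 > s_0$ it is positive for $c > c_1$, which makes the curve concave for FPRs below $t_1$ and convex above it; the pattern reverses when $s_1 < s_0$.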

Results and Discussion
In order to demonstrate the methodology, two real datasets are used, namely MCA and ILP. Further, the ILP dataset has been split according to the sex of the patients. Of these, the ILP male dataset yields an improper ROC Curve and has therefore been chosen for demonstration purposes.

ILP Male dataset
The Indian Liver Patient Male Dataset [1] contains 10 variables: age, gender, total Bilirubin, direct Bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos. Selector is a class label used to divide the subjects into groups (liver patient or not). The intrinsic measures TPR and FPR, the summary measure AUC and the optimal cut point are computed using equations (1) to (5). The observed AUC is 0.7495, which indicates moderate classification; TPR and FPR are 0.6992 and 0.3008 respectively at the optimal cut point c = 1.5372. The best linear combination is given by
If the test score is greater than the optimal cutoff, i.e., 1.5372, the individual is classified as diseased, otherwise healthy. The MROC Curve is drawn and depicted in Figure 2. From Figure 2, it is clear that the fitted MROC Curve crosses the chance line as it moves towards the top right corner of the unit square plot, which makes it an improper MROC Curve. Using the proposed methodology, the inflection point and chance line crossing reference points are obtained and highlighted in Figure 2. The inflection point and crossing point, along with their thresholds, are reported in Table 1.
In Table 1, it is observed that at the point of inflection $t_1$ = 0.5221, with $c_1$ = −0.8002, the MROC Curve becomes convex (Figure 2). That is, this MROC Curve is concave for FPR < 0.5221 but convex for FPR > 0.5221. Similarly, the MROC Curve crosses the chance line at the crossing point $t_0$ = 0.6235, with $c_0$ = −0.9119. Hence, the area under the curve up to $t_1$ = 0.5221 reflects the correct accuracy of the curve, whereas the area from $t_1$ = 0.5221 to 1 will be biased, as healthy and diseased individuals will be misdiagnosed from this point onward.
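The kind of computation behind such a table can be sketched as follows. The sketch assumes the projected test score is normal in each group with means `m0`, `m1` and standard deviations `s0`, `s1`, and derives the crossing and inflection points from that binormal assumption; the function names, parameter names and the parameter values are all hypothetical. It does not use the ILP data, does not reproduce the values in Table 1, and its closed forms are not claimed to be the source's printed expressions. The area up to the inflection point is obtained by numerical integration with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def crossing_point(m0, s0, m1, s1):
    """Threshold c0 and FPR t0 at which TPR equals FPR (requires s0 != s1)."""
    c0 = (s1 * m0 - s0 * m1) / (s1 - s0)
    t0 = norm.cdf((m1 - m0) / (s1 - s0))
    return c0, t0


def inflection_point(m0, s0, m1, s1):
    """Threshold c1 and FPR t1 where the log likelihood ratio slope is zero."""
    c1 = (m0 * s1**2 - m1 * s0**2) / (s1**2 - s0**2)
    t1 = norm.cdf(s0 * (m1 - m0) / (s1**2 - s0**2))
    return c1, t1


def roc_curve(t, m0, s0, m1, s1):
    """TPR of the assumed binormal model at false positive rate t."""
    return norm.cdf((m1 - m0 - s0 * norm.ppf(1.0 - t)) / s1)


def partial_auc(upper, m0, s0, m1, s1):
    """Area under the assumed ROC Curve for FPR in (0, upper)."""
    area, _abs_err = quad(roc_curve, 0.0, upper, args=(m0, s0, m1, s1))
    return area


if __name__ == "__main__":
    # Hypothetical projected-score parameters, chosen only for illustration.
    params = (0.0, 1.0, 1.0, 2.0)
    c0, t0 = crossing_point(*params)
    c1, t1 = inflection_point(*params)
    print("crossing point:   c0 =", c0, " t0 =", t0)
    print("inflection point: c1 =", c1, " t1 =", t1)
    print("area up to t1:", partial_auc(t1, *params))
    print("full area:    ", partial_auc(1.0, *params))
```

With these illustrative parameters the inflection FPR lies below the crossing FPR, so the curve turns convex before it meets the chance line, which matches the ordering of $t_1$ and $t_0$ reported above.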

MCA dataset
The neonatal dataset consists of two procedures, MCA and CPR, used to check the blood flow from the womb of the mother to the baby in order to assess the growth of the baby. Three indices, namely pulsatility index (PI), resistivity index (RI) and Systolic/Diastolic (S/D) ratio, were measured in all the procedures. The intrinsic measures TPR and FPR, the summary measure AUC and the optimal cut point are computed using equations (1) to (5). The observed AUC is 0.6253, which indicates moderate classification; TPR and FPR are 0.5968 and 0.4032 at the optimal cut point c = −3.1749. The best linear combination is given by
The above linear combination can be used for identifying the status of a new individual. If the test score is greater than the optimal cutoff, i.e., −3.1749, the individual is classified as diseased, otherwise healthy. Further, the MROC Curve is drawn and depicted in Figure 3. From Figure 3, it is clear that the fitted MROC Curve crosses the chance line as it moves towards the top right corner of the unit square plot, which leads to an improper MROC Curve. Using the proposed methodology, the inflection point and chance line crossing reference points are obtained and highlighted in Figure 3. The inflection point and crossing point, along with their thresholds, are reported in Table 2.
In Table 2, it is observed that at the point of inflection $t_1$ = 0.5631, with $c_1$ = −3.3421, the MROC Curve becomes convex (Figure 3). That is, this MROC Curve is concave for FPR < 0.5631 but convex for FPR > 0.5631.

ILP Complete dataset
The Indian Liver Patient Dataset [1] contains 10 variables: age, gender, total Bilirubin, direct Bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos. Selector is a class label used to divide the subjects into groups (liver patient or not).
The AUC is observed to be 0.7365, which indicates moderate classification. For this dataset, TPR and FPR are 0.6921 and 0.3079 at the optimal cut point c = 1.7773. The best linear combination is given by
If the test score is greater than the optimal cutoff, i.e., 1.7773, the individual is classified as diseased, otherwise healthy. Further, the MROC Curve is drawn and depicted in Figure 4. From Figure 4, it is very difficult to say whether the fitted MROC Curve is proper or improper. However, using the proposed methodology, it is clear that the fitted MROC Curve crosses the chance line at the crossing point where FPR = TPR = 0.9602.
In Table 3, it is observed that at the point of inflection $t_1$ = 0.7406, with $c_1$ = 1.2713, the MROC Curve becomes convex (Figure 4). Furthermore, this MROC Curve is concave for FPR < 0.7406 but convex for FPR > 0.7406.

Conclusion
In classification, the MROC Curve and its parameters are estimated through the maximum likelihood estimation procedure based on the multivariate model. This model implies that the decision variable is not a monotone function of the likelihood ratio; hence the method produces improper MROC Curves that are not concave everywhere and cross the chance line, implying that the test performs worse than chance for a range of FPR values. Although in most situations the degree of improperness is so small that it cannot be seen, it is important to be able to easily identify those MROC Curves where the improperness is visible. The main interest of this paper is to identify the degree of improperness of the fitted MROC Curve; this has been explained and visualized in the figures for the real data sets considered.