Solving a Typical Small Sample Size MRSM Dataset Problem Using a Flexible Hybrid Ensemble Approach for Credibility

Multiresponse surface methodology often involves small-data analytics which, statistically, have regression modelling credibility problems. This is worsened by dataset, model selection and solution methodology uncertainties. It is difficult for solution methodologies that select and use a single best model per response at simultaneous optimisation to deal effectively with these problems. This paper exploited the fact that model selection criteria choose differently, in a flexible hybrid ensemble system, to generate several solutions for integration and comparison. Mean square prediction errors, with bias-variance-covariance decomposition values, were computed and analysed at simultaneous optimisation. The results suggest that the credibility of the final solution is enhanced when working with multiple models, solution methodologies and results. However, the results do not show any significant effect of small sample size correction on model selection criteria, and analysis of the bias-variance-covariance decompositions at simultaneous optimisation does not encourage dependence on theoretical optimality for best results.


Introduction
Multiple response surface methodology (MRSM) involves experimental design and execution, regression modelling, model selection, response surface analysis, and optimisation [1,2,3,4]. MRSM often involves small-data analytics. The credibility of working with small sample size datasets is a big issue in statistics [5,6]. Practitioners face the problem of model uncertainty in MRSM work, and classical model selection (MS) and model averaging (MA) are the common techniques used to attempt to solve this problem [7], though MS is more common. Over the years, many MS criteria have been developed, resulting in the problem of MS criterion uncertainty, as the criteria often make different choices [8]. Moreover, MS has been criticised for its failure to account for dataset and model uncertainty, model bias, and the loss of information contained in discarded response models, resulting in over-optimistic prediction accuracy and understatement of variance [3,9,10]. These problems compound into solution uncertainty, which leaves a question over the credibility of the final result.
Optimisation in MRSM is multi-objective in nature and is performed after regression modelling and the selection of a single "best" model for each response [3,4]. Once single "best" models for each response are available, optimisation is performed to determine optimum or desired working process parameters. The existence of so many MRSM solution methodologies that often give different results implies the existence of solution methodology uncertainty which, again, leads to solution uncertainty and lack of solution credibility. A typical small sample size MRSM dataset problem, borrowed from [11], of an experiment done to determine rubber-covered mining conveyor belting optimum cure times (Tc (min)) for different rubber thicknesses (Rt (mm)), is shown in Table 1. The intention was to derive the relationship shown in equation (1). Given the small sample size MRSM experiment dataset in Table 1, and that the component adhesion and cover hardness are parametric functions of cure time and rubber thickness (see Figure 1), the company wanted to submit a work instruction to the shop floor with a table of the form in Table 2. Rawlings et al. in [5] defined as small sample size anything satisfying the relationship n/k < 40 and stated that no MS exercise should be taken seriously when the sample size is as small as (n − k) ≤ 10, where n is the sample size and k is the number of model parameters. In this case n/k = 13/6 < 40 and (n − k) = (13 − 6) = 7 < 10 [11]. MRSM literature has many such small sample size examples [3]. To guarantee solution credibility, there was an obvious need for rigour in the solution methodology when working with such a small sample size MRSM dataset. The rest of this paper is organised as follows: Section 2 describes the solution methodology, Section 3 presents the empirical example investigation results, and Section 4 presents the discussion and conclusion.
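The two small-sample rules of thumb quoted above can be checked mechanically. A minimal sketch, with `small_sample_flags` a hypothetical helper name and the values n = 13, k = 6 taken from the Table 1 example:

```python
# Sketch: checking the Rawlings et al. small sample size rules of thumb
# for the Table 1 dataset (n = 13 runs, k = 6 model parameters).
def small_sample_flags(n: int, k: int) -> dict:
    """Return the two rules of thumb quoted in the text."""
    return {
        "n_over_k": n / k,          # dataset flagged "small" when < 40
        "n_minus_k": n - k,         # MS not to be taken seriously when <= 10
        "small_sample": n / k < 40,
        "ms_unreliable": n - k <= 10,
    }

flags = small_sample_flags(13, 6)
print(flags)  # n/k ≈ 2.17 < 40 and n - k = 7 <= 10: both rules triggered
```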

Solution Methodology
Zhang in [12] suggested that, for the practitioner who is less concerned with theoretical optimality, it is more important to have available methods that are simple but flexible enough to be used in a variety of practical situations. Burnham and Anderson in [9] proposed multi-model inference (M-MI). An ensemble is defined as a system constructed with several individual models/algorithms working in parallel, whose outputs are combined with a suitable decision fusion strategy to produce a single answer for a given problem [14], as shown in Figure 2.

Figure 2. A general ensemble system architecture.
A bias-variance decomposition of a single regression model is shown in equation (2):

MSE_i = E[(F_i − T)²],   (2)

where F_i is the fitted value, T is the target value and E(f) is the expected value of the function f [13]. This decomposition can be reduced to

MSE_i = (E(F_i) − T)² + E[(F_i − E(F_i))²] = bias_i² + var_i.   (3)

Krogh and Vedelsby in [15] derived that

E_ens = E_ave − D,   (4)

where E_ave is the average error of the base models and D is the diversity (ambiguity) term estimating the amount of variability among the base models. E_ens ≤ E_ave since D is non-negative. This implies that

E_ens ≤ (1/M) Σ_{i=1}^{M} MSE_i,   (5)

where MSE_i is the mean square error of each base model in the ensemble. Ueda and Nakano in [16] then produced the bias-variance-covariance decomposition of the generalisation error of ensemble estimators. In this decomposition it is assumed that the ensemble is a uniformly weighted average of the M base models,

f_ens = (1/M) Σ_{i=1}^{M} F_i,   (6)

and then

E[(f_ens − T)²] = bias² + (1/M) var + (1 − 1/M) covar,   (7)

where bias, var and covar are the averaged bias, variance and covariance of the base models. This shows that if we are able to design an ensemble with low bias, low variance and uncorrelated individual solution methodologies, we can expect improved generalisation performance.
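The Krogh and Vedelsby result E_ens = E_ave − D can be verified numerically. A minimal sketch, assuming a simple-average ensemble and synthetic predictions (not the paper's data):

```python
import numpy as np

# Sketch: numerical check of the Krogh-Vedelsby ambiguity decomposition
# E_ens = E_ave - D for a uniformly weighted (simple-average) ensemble.
rng = np.random.default_rng(0)
target = rng.normal(size=50)                          # true values
preds = target + rng.normal(scale=0.5, size=(5, 50))  # 5 noisy base models

ens = preds.mean(axis=0)                    # simple-average ensemble
E_ave = np.mean((preds - target) ** 2)      # average base-model MSE
D = np.mean((preds - ens) ** 2)             # diversity (ambiguity) term
E_ens = np.mean((ens - target) ** 2)        # ensemble MSE

assert np.isclose(E_ens, E_ave - D)         # the decomposition holds exactly
assert E_ens <= E_ave                       # because D is non-negative
```

Because D is a mean of squares, the ensemble error can never exceed the average base-model error, which is the motivation for integrating several accurate and diverse solutions.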
Unlike the MRSM approach, which is based on selecting a single "best" model for each response before simultaneous optimisation, the ensemble system integrates several accurate results after simultaneous optimisation and is thus less risky and more credible, and it can be more accurate than any of its base members if they are both accurate and diverse [17,18,19,20,21,22]. The ensemble approach is popular for solving complex problems, including small sample size dataset problems [23,24,25,26,27]. The following flexible hybrid ensemble solution methodology was, therefore, employed, as shown in Figure 3.

Application of the hybrid ensemble methodology
Hybrid ensembles are heterogeneous mixtures of base models, base methodologies and ensemble systems [28,29]. Hybrid ensemble systems have been demonstrated to perform better than homogeneous systems [30,31]. Figure 3 shows the hybrid ensemble of five solution methodologies used to obtain five solutions, which were then integrated into two before a final credible solution was selected. Figure 4 presents the diagrammatic flow of the hybrid ensemble.
All-possible-regressions modelling was carried out on the small sample size MRSM dataset of Table 1 for both the hardness and adhesion responses. Response surface conformance analysis (RSCA) was done on the two sets of twenty-five OLS adhesion and hardness response models. The hardness response had only one model with a conforming response surface [11]. The hybrid ensemble base methodologies, therefore, focused on selecting candidate sets of adhesion response models for simultaneous optimisation with the single hardness response model to generate accurate and diverse results for integration. Table A1, in the Appendix, shows the fifteen MS criteria used to assess the twenty-five adhesion response models. Table A2, in the Appendix, shows the adhesion response models selected as best (in red) by each MS criterion. The fifteen MS criteria were then split into three groups of five: (i) MS criteria not corrected for small sample size inefficiency (Table 3), (ii) MS criteria corrected for small sample size inefficiency (Table 4), and (iii) prediction MS criteria (Table 5). Tables 3 to 5 show the MS criteria, their values and the models they selected. The Votes column shows the number of times a particular response model was selected; for example, model [Tc.Rt, Rt²] was selected four times, by the MS criteria AIC, BIC, KIC and TIC. These three groups were used to construct the first three base ensembles in the hybrid ensemble. Table 8 shows the implications of comparisons of results from the various base ensembles of Figure 4. The fourth base ensemble methodology utilised the first five "best" AICc adhesion models as base models for simultaneous optimisation, as proposed by Burnham and Anderson in [9] in their multi-model inference (M-MI) system. Table 6 shows the five adhesion response models, their AICc values and the computed smoothed AICc (S-AICc) weights. The formula for the S-AICc weight of model i is w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i = AICc_i − min_j AICc_j. The fifth and last methodology combined the adhesion response
models of the M-MI ensemble system using their S-AICc weights, as shown in Table 7, to come up with an averaged model estimator for simultaneous optimisation with the hardness response model. The difference between the M-MI ensemble and the M-MI CBFMA methodology was that, whilst M-MI combined results after simultaneous optimisation, M-MI CBFMA combined the five models using MA before simultaneous optimisation. After simultaneous optimisation of the adhesion response base models with the single hardness response model with a conforming response surface, results from the individual solution methodologies were compared and integrated using arithmetic averaging (Ave.) and majority vote (M.Vote). Majority vote meant selecting the cure time with the highest frequency of occurrence for a given rubber thickness. Equation (5) of Krogh and Vedelsby in [15] suggested that the mean square prediction error (MSPE) of the hybrid ensemble would be better than or equal to the average of the base methodologies. Table 8 shows the expected implications of analysing results from different base ensembles.
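The S-AICc weighting used by the M-MI and M-MI CBFMA methodologies can be sketched numerically. A minimal sketch, with placeholder AICc values rather than those of Table 6:

```python
import numpy as np

# Sketch: smoothed (Akaike) AICc weights as used in multi-model
# inference; the AICc values below are illustrative placeholders.
def s_aicc_weights(aicc):
    delta = np.asarray(aicc, dtype=float)
    delta -= delta.min()          # AICc differences from the best model
    w = np.exp(-0.5 * delta)      # relative likelihood of each model
    return w / w.sum()            # normalise so the weights sum to one

w = s_aicc_weights([20.1, 20.9, 22.4, 23.0, 24.7])
print(w.round(3))  # the lowest-AICc model receives the largest weight
```

These weights are what M-MI CBFMA applies to the five adhesion models before simultaneous optimisation, whereas M-MI applies them when combining results afterwards.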

Bias-variance-covariance decomposition
Computations of the bias-variance decomposition of the base models and of the bias-variance-covariance decomposition of the base ensemble estimators at simultaneous optimisation were done using equations (2) to (7). Table 9 shows the simultaneous optimisation and the determination of the various components of the bias-variance decomposition of an adhesion-hardness response model pair using Excel. The Tc (min) column is the estimated cure time for each rubber thickness Rt (mm). The value e² is the square of the residual, the difference between the target value and the fitted value. Var. is the variance (ref. equation (3)).
Similar simultaneous optimisations and computations of the bias-variance decomposition for the rest of the base models of all base methodologies were done and contributed to the bias-variance-covariance decompositions of the ensembles in Tables 10, 11, 12 and 13, respectively. The results show that a response model with the best MSPE and bias on the adhesion side normally has the worst MSPE and bias on the hardness response model side at simultaneous optimisation, and the reverse is true. This suggests an MSPE (adhesion)-MSPE (hardness) trade-off that could be reminiscent of the popular bias-variance trade-off in statistics and machine learning. Khuri and Conlon in [32] called this the simultaneous optimisation compromise.

Integration of ensembles
Tables 14, 15, 16 and 17 show the integrations of the individual base model cure time estimates for each of the four ensemble systems whose bias-variance-covariance computations are shown in Tables 10, 11, 12 and 13, respectively. The weights used to compute the weighted averages (W.AVE) in Tables 14, 15 and 16 were taken from the Votes columns of Tables 3, 4 and 5, while Table 17 used the S-AICc weights of Table 6. The yellow shading in Tables 14 to 18 shows where the cure time estimates of each base model agreed with the weighted average (W.AVE) values in the last column. The formula for the relative accuracy of base models and methodologies relative to the weighted average or hybrid ensemble results is shown in equation (15). The adhesion base response models selected in the small sample size inefficiency uncorrected and corrected criteria ensemble systems were the same; the only difference between Tables 14 and 15 is in the weights, as shown in the Votes columns of Tables 3 and 4.
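The vote-weighted averaging behind the W.AVE columns can be sketched as follows; the estimates and votes below are hypothetical, not the values of Tables 3 to 5:

```python
import numpy as np

# Sketch: vote-weighted average (W.AVE) of base-model cure time
# estimates for one rubber thickness; illustrative numbers only.
estimates = np.array([21.5, 22.0, 22.5])  # three base models' estimates (min)
votes = np.array([4, 1, 1])               # times each model was selected

# np.average normalises the weights internally, so the votes can be
# used directly without converting them to fractions first.
w_ave = np.average(estimates, weights=votes)
print(round(w_ave, 2))  # 21.75
```

A model selected by more criteria thus pulls the ensemble estimate towards its own cure time, which is the intent of the Votes-based weighting.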

Integration of the results of the five solutions methodologies
Both the Ave. and the M.Vote fusion methods gave the same cure time estimates. The three base methodologies of Prediction MS, M-MI and M-MI CBFMA agreed on all the cure time estimates. The uncorrected and corrected MS information criteria ensembles had the same results, though they differed from the other three on three rubber thicknesses.
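The two fusion strategies compared here, arithmetic averaging and majority vote, can be sketched on hypothetical cure time estimates from the five methodologies for one rubber thickness (not the Table 18 values):

```python
from collections import Counter
from statistics import mean

# Sketch: the two decision fusion strategies described in the solution
# methodology, applied to illustrative cure time estimates (min).
estimates = [22.0, 22.0, 21.5, 22.0, 21.5]

ave_fused = mean(estimates)                           # arithmetic averaging
vote_fused = Counter(estimates).most_common(1)[0][0]  # majority vote

print(ave_fused, vote_fused)  # 21.8 22.0
```

The two rules coincide whenever the estimates cluster tightly, which is consistent with Ave. and M.Vote giving the same fused cure times in this study.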
The bias-variance-covariance decomposition of the hybrid ensemble with the five different solution methodologies is shown in Table 18.
The small sample size uncorrected and corrected MS information criteria ensembles comparatively had the worst MSPE and bias values on the adhesion response side, but the best on the hardness side. M-MI and M-MI CBFMA, on the other hand, had the best MSPE and bias values on the adhesion side. The Prediction MS criteria ensemble had the worst MSPE and bias values on the hardness side but was middle-of-the-road on the adhesion side. It was, therefore, difficult to select a "best" methodology in this case.

Discussion and Conclusion
The discussion of the results is structured around the comparisons/analysis and implications presented in Table 8.

Effect of small sample size inefficiency correction
The adhesion models selected by the two sets of five small sample size uncorrected and corrected MS information criteria were the same. The cure time estimates of the small sample size uncorrected and corrected MS information criteria ensembles were thus the same. The differences in the MSPE and bias values of the two ensembles were insignificant. This suggests that the effect of small sample size correction on model selection criteria may be insignificant when multiple model selection criteria are used.

Error of optimism
The cure time estimates of the uncorrected and corrected MS criteria ensembles differ on three rubber thicknesses from those of the Prediction MS criteria and M-MI ensembles, and their theoretical accuracy values are worse. This demonstrates the error of optimism: response models that fit the dataset very well are not necessarily good at generalisation.

Effect of CBFMA
The M-MI and M-MI CBFMA ensembles had the same cure time estimates but different theoretical accuracies. The M-MI CBFMA had a better theoretical accuracy than the M-MI. This agrees with the literature.

Solution uncertainty/credibility
Six out of eight results (0.75) were in agreement, as shown in Figure 5. Based on the principle that the greater the uncertainty, the lower the credibility, this result had the least uncertainty and hence the highest credibility. In this particular MRSM example, the hybrid ensemble did not produce a final result that was more accurate than all the base methodologies, but one more credible than the individual base methodologies. The fact that different answers were obtained by different solution methodologies exposed the risk of depending on a result from a single methodology. This suggests that a multiple solution methodology approach exposes the problem of solution methodology uncertainty, which is not exposed by the use of a single solution methodology, and supports the ensembling of results from different methodologies. Multiple methodologies allow for comparison of, and justification of the selection of, the final result. The accuracy values in the bias-variance-covariance decomposition at simultaneous optimisation do not give a clear indication of the "best" methodology or "best" response model pair. Future research would look at:
• A simple software tool that can easily generate solutions from different methodologies for integration using multiple fusion methodologies, with the most logical solution selected.
• A further study of the simultaneous optimisation compromise and its relationship to prediction accuracy.

Figure 3. The flexible hybrid ensemble solution methodology.

Figure 4. The diagrammatic flow of the hybrid ensemble.


Figure 5. Comparison of results.

Table 1. Averaged results for the two-factor CCD MRSM conveyor belting experiment (columns: Run, Tc (min.), Rt (mm), Ave. Hardness, Ave. Adhesion).

Figure 1. Conveyor belting vulcanisation process and the desired minimum quality requirements: Adhesion(Tc, Rt) ≥ 12 kN/m, Hardness(Tc, Rt) ≥ 60 Shore A.

Table 2. Showing the tabular form of equation (1) required in the work instruction.

Table 3. Showing adhesion response models selected as best by the five small sample size uncorrected MS criteria (AIC, BIC, HQ, KIC and TIC).

Table 4. Showing adhesion response models selected as best by the five small sample size corrected MS criteria.

Table 5. Showing adhesion response models selected as best by the five prediction MS criteria (Adeq. Prec., Cp − k, PRESS, AP and Cp).

Table 6. Showing adhesion response models selected as best by the AICc MS criterion.

Table 7. Showing the computation of the frequentist model averaged estimator (M-MI CBFMA).

Table 8. Showing the implications of comparisons of results from different base ensembles.

Table 9. The simultaneous optimisation of the adhesion response model [Tc.Rt, Rt²] with the hardness response model.

Table 10. Bias-variance-covariance decomposition of the Uncorrected MS ensemble.

Table 11. Bias-variance-covariance decomposition of the Corrected MS ensemble.

Table 12. Bias-variance-covariance decomposition of the Prediction MS ensemble.

Table 13. Bias-variance-covariance decomposition of the M-MI ensemble.

Table 14. Results integration for the Uncorrected MS criteria ensemble cure time estimates.

Table 15. Results integration for the Corrected MS ensemble.

Table 16. Results integration for the Prediction MS criteria ensemble.

Table 17. Results integration for the M-MI ensemble.

Table 18. Showing the cure time estimates of the various methods and their integration.

Table 19. Comparing the bias-variance-covariance decomposition results of all methods.