Selection of Initial Points Using Latin Hypercube Sampling for Active Learning

Nompilo Mabaso; Roelof Coetzer; Shawn Liebenberg

doi:10.19139/soic-2310-5070-2688

Nompilo Mabaso Focus Area for Pure and Applied Analytics, North-West University, South Africa; School of Mathematical and Statistical Sciences, North-West University, South Africa https://orcid.org/0000-0003-4594-6345
R.L.J Coetzer Focus Area for Pure and Applied Analytics, North-West University, South Africa https://orcid.org/0000-0002-7505-3911
Shawn C. Liebenberg Focus Area for Pure and Applied Analytics, North-West University, South Africa; School of Mathematical and Statistical Sciences, North-West University, South Africa https://orcid.org/0000-0002-0106-3084

Keywords: Active learning, uncertainty sampling, Latin hypercube designs, logistic regression

Abstract

Classification requires labelling large sets of data, which is often time-consuming and expensive. Active learning is a machine learning technique that has gained popularity in recent years because it can reduce the amount of labelled data required to train accurate models. The success of an active learner relies heavily on the selection of the initial points used to initialise the active learning process. In this paper, we compare the traditional random sampling approach with maximin Latin hypercube sampling, conditioned Latin hypercube sampling, and a modified Latin hypercube sampling procedure for initialising active learning when estimating a logistic regression model in binary classification problems. The results are demonstrated using simulated data sets and an actual case study. We show that the Latin hypercube sampling designs outperform random sampling on all the performance measures evaluated. Specifically, the conditioned Latin hypercube sampling design achieves high prediction accuracy with a smaller sample size for both homogeneous and heterogeneous classes. In contrast, the modified Latin hypercube sampling design yields the smallest variance of prediction across varying initial sample sizes for both homogeneous and heterogeneous classes. Furthermore, principal component analysis indicates that approximately 10% of the data is required to develop an accurate and precise logistic regression classifier.
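
The abstract does not give the authors' exact procedures, so the following is only a minimal Python sketch of the general idea, not the paper's implementation. It assumes features scaled to [0, 1], approximates a maximin Latin hypercube design by random search over candidate designs, and matches each design point to its nearest unlabelled pool point to obtain the initial set. All function names (`latin_hypercube`, `maximin_lhs`, `nearest_pool_indices`, `most_uncertain`) are hypothetical, and the nearest-point matching rule is an assumption made for illustration.

```python
import math
import random


def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with one point per stratum in every dimension."""
    columns = []
    for _ in range(d):
        # One uniform draw inside each of the n equal-width strata, then shuffle.
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (needs n >= 2)."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def maximin_lhs(n, d, n_candidates=50, seed=0):
    """Among random LHS designs, keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    designs = [latin_hypercube(n, d, rng) for _ in range(n_candidates)]
    return max(designs, key=min_pairwise_distance)


def nearest_pool_indices(design, pool):
    """Assumed matching rule: each design point takes the closest unused pool point."""
    chosen = []
    for point in design:
        best = min(
            (j for j in range(len(pool)) if j not in chosen),
            key=lambda j: math.dist(point, pool[j]),
        )
        chosen.append(best)
    return chosen


def most_uncertain(probabilities, labelled):
    """Uncertainty sampling: unlabelled index whose P(y = 1) is closest to 0.5."""
    candidates = [j for j in range(len(probabilities)) if j not in labelled]
    return min(candidates, key=lambda j: abs(probabilities[j] - 0.5))


# Illustrative pool of 200 two-dimensional points (synthetic, not the paper's data).
pool_rng = random.Random(1)
pool = [(pool_rng.random(), pool_rng.random()) for _ in range(200)]
design = maximin_lhs(n=10, d=2)
initial = nearest_pool_indices(design, pool)
```

In an active learning loop, the points indexed by `initial` would be labelled first and used to fit the logistic regression; the fitted probabilities for the whole pool would then be passed to `most_uncertain` to choose each subsequent point to label.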

Published

2025-11-24

How to Cite

Mabaso, N., Coetzer, R., & Liebenberg, S. (2025). Selection of Initial Points Using Latin Hypercube Sampling for Active Learning. Statistics, Optimization & Information Computing, 15(3), 1664-1691. https://doi.org/10.19139/soic-2310-5070-2688

Issue

Vol 15 No 3 (2026)

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).