Selection of Initial Points Using Latin Hypercube Sampling for Active Learning

Keywords: Active learning, uncertainty sampling, Latin hypercube designs, logistic regression

Abstract

Classification requires large sets of labelled data, and labelling is often time-consuming and expensive. Active learning is a machine learning technique that has gained popularity in recent years because it can reduce the amount of labelled data required to train accurate models. The success of an active learner relies heavily on the initial points used to start the active learning process. In this paper, we compare traditional random sampling with maximin Latin hypercube sampling, conditioned Latin hypercube sampling, and a modified Latin hypercube sampling procedure for initialising active learning when estimating logistic regression models in binary classification problems. We show that the Latin hypercube sampling designs outperform random sampling on all the performance measures evaluated. The results are demonstrated using simulated data sets and an actual case study. Specifically, the conditioned Latin hypercube sampling design achieves high prediction accuracy with a smaller sample size for both heterogeneous and homogeneous classes. In contrast, the modified Latin hypercube sampling design yields the smallest variance of prediction across varying initial sample sizes for both homogeneous and heterogeneous classes. Furthermore, principal component analysis indicates that approximately 10% of the data is required to develop an accurate and precise logistic regression classifier.
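
As an illustration of the general idea, the sketch below draws a basic Latin hypercube design and maps each design point to its nearest unused point in an unlabelled pool, giving a space-filling initial set to label. It is a minimal sketch under stated assumptions, not the procedure used in the paper: it implements plain Latin hypercube sampling rather than the maximin, conditioned, or modified designs, the nearest-neighbour matching step is an assumption, and the names `latin_hypercube`, `select_initial_points`, and `pool` are hypothetical.

```python
import random


def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with one point per stratum in every dimension."""
    columns = []
    for _ in range(d):
        # One uniform draw inside each of the n equal-width strata, then shuffle
        # so that strata are paired at random across dimensions.
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]


def select_initial_points(pool, n, rng):
    """Pick n distinct pool indices by nearest-neighbour matching to an LHS design.

    Assumption: the pool is min-max scaled to the unit cube and each design
    point is matched to the closest pool point that has not yet been chosen.
    """
    d = len(pool[0])
    lo = [min(x[j] for x in pool) for j in range(d)]
    hi = [max(x[j] for x in pool) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # guard constant features
    scaled = [tuple((x[j] - lo[j]) / span[j] for j in range(d)) for x in pool]

    chosen = []
    available = set(range(len(pool)))
    for target in latin_hypercube(n, d, rng):
        best = min(
            available,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(scaled[i], target)),
        )
        chosen.append(best)
        available.remove(best)
    return chosen


# Usage: choose 20 initial points to label from a synthetic pool of 200.
rng = random.Random(0)
pool = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
initial_idx = select_initial_points(pool, 20, rng)
```

The indices in `initial_idx` would then be labelled and used to fit the first logistic regression model, after which an uncertainty sampling loop could query further points from the remaining pool.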
Published
2025-11-24
How to Cite
Mabaso, N., Coetzer, R., & Liebenberg, S. (2025). Selection of Initial Points Using Latin Hypercube Sampling for Active Learning. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2688
Section
Research Articles