Weighted Machine Learning

Mahdi Hashemi, Hassan A. Karimi

Abstract


Not all training samples are equally important in supervised machine learning. This situation arises in many applications: some training samples are measured by more accurate devices, training samples come from sources of differing reliability, some training samples inspire more confidence than others, some are more relevant than others, or the user simply wishes to emphasize certain training samples. Non-weighted machine learning techniques assume equally important training samples: (a) the cost of misclassification is the same for all training samples in parametric classification techniques, (b) all residuals carry equal weight in parametric regression models, and (c) when voting in non-parametric classification and regression models, training samples either have equal weights or their weights are determined internally by kernels in the feature space, leaving no room for external weights. The weighted least squares model is an example of a weighted machine learning technique that takes the training samples' weights into account. In this work, we develop weighted versions of the Bayesian predictor, perceptron, multilayer perceptron, SVM, and decision tree, and show how their results differ from those of their non-weighted counterparts.
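As a concrete illustration of the weighted least squares model mentioned above, the sketch below solves the standard closed-form solution β = (XᵀWX)⁻¹XᵀWy, where W is a diagonal matrix of sample weights. The toy data, weight values, and function name are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Closed-form weighted least squares: beta = (X^T W X)^{-1} X^T W y."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Toy data following y = 2x, except the last sample, which is an outlier.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias + feature
y = np.array([0.0, 2.0, 4.0, 30.0])

w_equal = np.ones(4)                          # ordinary least squares
w_down = np.array([1.0, 1.0, 1.0, 1e-6])      # near-zero weight on the outlier

beta_ols = weighted_least_squares(X, y, w_equal)  # pulled toward the outlier
beta_wls = weighted_least_squares(X, y, w_down)   # recovers intercept 0, slope 2
```

Down-weighting the unreliable sample lets the fit ignore it, whereas the equally weighted fit is distorted by it; this is the kind of external, user-supplied weighting that the paper extends to other classifiers.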

Keywords


Classification; Regression; Bayesian; Multilayer perceptron; Support vector machines; Decision tree.


DOI: 10.19139/soic.v6i4.479
