Learning Unknown Structure in CRFs via Adaptive Gradient Projection Method
AbstractWe study the problem of fitting probabilistic graphical models to the given data when the structure is not known. More specifically, we focus on learning unknown structure in conditional random fields, especially learning both the structure and parameters of a conditional random field model simultaneously. To do this, we first formulate the learning problem as a convex minimization problem by adding an l_2-regularization to the node parameters and a group l_1-regularization to the edge parameters, and then a gradient-based projection method is proposed to solve it which combines an adaptive stepsize selection strategy with a nonmonotone line search. Extensive simulation experiments are presented to show the performance of our approach in solving unknown structure learning problems.
J. Barzilai and J. M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., vol. 8, no. 1, pp. 141-148, 1988.
J. Besag, Efficiency of pseudo-likelihood estimation for simple Gaussian fields, Biometrika, vol. 64, pp. 616-618, 1977
F. Biglari, M. A. Hassan and W. J. Leong, New quasi-Newton methods via higher order tensor models, J. Comput. Appl. Math., vol. 235, pp. 2412-2422, 2011.
F. Biglari and M. Solimanpur, Scaling on the spectral gradient method, J. Optim. Theory Appl., vol. 158, no. 2, pp. 626-635, 2013.
E. G. Birgin, J. M. Mart´ ınez and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets, SIAM J. Optim., vol.10, no. 4, pp. 1196-1211, 2000.
S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 2011.
Y. Dai and R. Fletcher, Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming, Numer. Math., vol. 100, no. 1, pp. 21-47, 2005.
Y. Dai, J. Yuan and Y. Yuan, Modified two-point stepsize gradient methods for unconstrained optimization, Comput. Optim. Appl., vol.22, no. 1, pp. 103-109, 2002.
L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton’s method, SIAM J. Numer. Anal., vol. 23, pp. 707-716, 1986.
S. Huang, Z. Wan and X. Chen, A new nonmonotone line search technique for unconstrained optimization, Numer. Algorithms, vol. 68, no.4, pp. 671-689, 2015.
J. Lafferty, A. McCallum and F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in ICML, pp. 282-289, 2011.
J. Nocedal and S. J. Wright, Convex Optimization, New York: Springer Science & Business Media, 2006.
M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM J. Optim., vol.7, no. 1, pp. 26-33, 1997.
M. Schmidt, K. Murphy, G. Fung and R. Rosales, Structure learning in random fields for heart motion abnormality detection, in CVPR, 2008.
M. Schmidt, E. van den Berg, M. P. Friedlander and K. Murphy, Optimizing costly functions with simple constraints: a limitedmemory projected quasi-Newton algorithm, in AISTATS, 2009.
B. A. Turlach, W. N. Venables and S. J. Wright, Simultaneous variable selection, Technometrics, vol. 47, no. 3, pp. 349-363, 2005.
W. Wang and Q. Wang, Approximated function based spectral gradient algorithm for sparse signal recovery, Stat., Optim. Inf. Comput., vol. 2, no. 1, pp. 10-20, 2014.
Z. Wei, G. Yu, G. Yuan and Z. Lian, The superlinear convergence of a modified BFGS-type method for unconstrained optimization, Comput. Optim. Appl., vol. 29, no. 3, pp. 315-332, 2004.
Y. Xiao, Q. Wang and D. Wang, Notes on the Dai-Yuan-Yuan modified spectral gradient method, J. Comput. Appl. Math., vol. 234, no. 10, pp. 2986-2992, 2010.
G. Yu, L. Qi, Y. Sun and Y. Zhou, Impulse noise removal by a nonmonotone adaptive gradient method, Signal Process., vol. 90, pp. 2891-2897, 2010.
H. Zhang and W. W. Hager, A nonmonotone line search technique and its application to unconstrained optimization, SIAM J. Optim., vol. 14, pp. 1043-1056, 2004.
J. Zhang, N. Deng and L. Chen, New quasi-Newton equation and related methods for unconstrained optimization, J. Optim. Theory Appl., vol. 102, pp. 147-167, 1999.
B. Zhou, L. Gao and Y. Dai, Gradient methods with adaptive stepsizes, Comput. Optim. Appl., vol. 35, pp. 69-86, 2006.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).