Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

  • Stanley Xu, Institute for Health Research, Kaiser Permanente Colorado
  • Emily B. Schroeder
  • Susan Shetterly
  • Glenn K. Goodrich
  • Patrick J. O’Connor
  • John F. Steiner
  • Julie A. Schmittdiel
  • Jay Desai
  • Ram D. Pathak
  • Romain Neugebauer
  • Melissa G. Butler
  • Lester Kirchner
  • Marsha A. Raebel


In studies that use electronic health record data, imputation of important data elements such as glycated hemoglobin (A1c) has become common. However, few studies have systematically examined the validity of various imputation strategies for missing A1c values. We derived a complete dataset from an incident diabetes population with no missing values for A1c, fasting plasma glucose (FPG), random plasma glucose (RPG), age, and gender. We then created missing A1c values under two assumptions: missing completely at random (MCAR) and missing at random (MAR). We imputed A1c values, compared the imputed values to the true A1c values, and used these data to assess the impact of A1c on initiation of antihyperglycemic therapy. Under MCAR, imputation of A1c based on FPG 1) estimated a continuous A1c within ±1.88% of the true A1c 68.3% of the time, and 2) estimated a categorical A1c within ± one category of the true A1c about 50% of the time. Including RPG in the imputation slightly improved precision but did not improve accuracy. Under MAR, including gender and age in addition to FPG improved the accuracy of imputed continuous A1c but not of categorical A1c. Moreover, imputing up to 33% missing A1c values did not change accuracy or precision and did not alter the impact of A1c on initiation of antihyperglycemic therapy. When A1c values are used as a predictor variable, a simple imputation algorithm based only on age, sex, and fasting plasma glucose gave acceptable results.
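The evaluation design described above (start from complete data, delete A1c values, impute them, and compare against the truth) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, the FPG-to-A1c coefficients and noise level are invented for illustration, and a single deterministic linear regression on FPG stands in for whatever imputation model the study used. The function names (`simulate_cohort`, `mask_mcar`, `fit_ols`, `evaluate`) are hypothetical. Here accuracy is summarized as the mean imputation error and precision as its standard deviation. If the errors are roughly normal, about 68% of imputed values fall within one standard deviation of the truth, which is one plausible reading of the "68.3% of the time" figure.

```python
import random
import statistics


def simulate_cohort(n, seed=0):
    """Return synthetic (FPG, A1c) pairs. All coefficients are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        fpg = rng.gauss(150.0, 40.0)                   # mg/dL, hypothetical
        a1c = 3.0 + 0.03 * fpg + rng.gauss(0.0, 1.0)   # percent, hypothetical
        rows.append((fpg, a1c))
    return rows


def mask_mcar(rows, fraction, seed=1):
    """Pick indices to treat as missing, independently of any data values (MCAR)."""
    rng = random.Random(seed)
    k = int(round(fraction * len(rows)))
    return set(rng.sample(range(len(rows)), k))


def fit_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def evaluate(rows, missing):
    """Fit on observed rows, impute the masked A1c values, compare to truth."""
    observed = [r for i, r in enumerate(rows) if i not in missing]
    intercept, slope = fit_ols([r[0] for r in observed],
                               [r[1] for r in observed])
    # Imputation error = imputed A1c minus true A1c for each masked row.
    errors = [intercept + slope * rows[i][0] - rows[i][1]
              for i in sorted(missing)]
    bias = statistics.fmean(errors)        # accuracy: mean error
    spread = statistics.stdev(errors)      # precision: SD of the error
    within = sum(abs(e) <= spread for e in errors) / len(errors)
    return bias, spread, within


rows = simulate_cohort(2000)
missing = mask_mcar(rows, 0.33)
bias, spread, within = evaluate(rows, missing)
print(f"bias={bias:.3f}  sd={spread:.3f}  share within 1 SD={within:.3f}")
```

A MAR variant could make the masking probability depend on observed covariates such as age or gender instead of sampling indices uniformly. That change would only affect `mask_mcar`, which is why the masking step is kept separate from the evaluation step.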

Author Biography

Stanley Xu, Institute for Health Research, Kaiser Permanente Colorado
Head of Biostatistics; Associate Professor, Colorado School of Public Health



How to Cite
Xu, S., Schroeder, E. B., Shetterly, S., Goodrich, G. K., O’Connor, P. J., Steiner, J. F., Schmittdiel, J. A., Desai, J., Pathak, R. D., Neugebauer, R., Butler, M. G., Kirchner, L., & Raebel, M. A. (2014). Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data. Statistics, Optimization & Information Computing, 2(2), 93-104.
Research Articles