  • Research article
  • Open access

Health plan administrative records versus birth certificate records: quality of race and ethnicity information in children



To understand racial and ethnic disparities in health care utilization and their potential underlying causes, valid information on race and ethnicity is necessary. However, the validity of pediatric race and ethnicity information in administrative records from large integrated health care systems using electronic medical records is largely unknown.


Information on race and ethnicity of 325,810 children born between 1998 and 2008 was extracted from health plan administrative records and compared with birth certificate records. Positive predictive values (PPV) were calculated for correct classification of race and ethnicity in administrative records, using birth certificate records as the criterion standard.


Misclassification of ethnicity and race in administrative records occurred in 23.1% and 33.6% of children, respectively; missing ethnicity (48.3%) and missing race (40.9%) information accounted for the largest share of misclassified cases. Misclassification was most common in children of minority groups. PPV for White, Black, Asian/Pacific Islander, American Indian/Alaskan Native, multiple, and other race was 89.3%, 86.6%, 73.8%, 18.2%, 51.8%, and 1.2%, respectively. PPV for Hispanic ethnicity was 95.6%. Racial and ethnic information improved with an increasing number of medical visits. Subgroup analyses comparing racial classification between non-Hispanics and Hispanics showed that White, Black, and Asian race were classified more accurately among non-Hispanics than among Hispanics.


In children, race and ethnicity information from administrative records has significant limitations in accurately identifying small minority groups. These results suggest that the quality of racial information obtained from administrative records may benefit from supplementation with birth certificate data.



Increasing attention has been given to the research potential of information collected in electronic health records [1-3]. Electronic health records have been used successfully to improve patient care [4-6]. They also provide important information on demographic and behavioural characteristics, medical conditions, and health care costs [7-10]. Among the most pressing questions is understanding racial and ethnic disparities in health care utilization and their underlying causes [11, 12]. Addressing these questions requires valid race and ethnicity information.

Many health plans collect race and ethnicity information from their members [13, 14]. These data come from various sources, such as insurance enrollment forms, inpatient and outpatient visit information, and birth certificates, and their quality varies by source. Some studies indicate that the quality of these administrative data is fairly good in adults [14, 15] but limited for small minority groups such as American Indians [15]. The quality of race and ethnicity information for children, however, is largely unknown. Because young children have relatively frequent medical visits and are usually accompanied by a parent, race and ethnicity information may be of higher quality for children than for adults.

Information from birth certificates is considered a criterion standard because it is nearly universal, is based on self-reported race and ethnicity, and has been frequently validated [16-19]. Race and ethnicity information from birth certificates has been shown to be a valid data source, with positive predictive values (PPV) above 96% for most races, although known limitations exist for Native Americans [16].

To fill the knowledge gap on the quality of race and ethnicity information for children in the administrative records of integrated health care systems, we compared information from these administrative records of a large managed health care system to the maternal and paternal race and ethnicity information obtained from birth certificates. We also investigated the main sources of racial and ethnic misclassification and the effect of health care utilization on the quality of race and ethnicity information, taking into consideration that information in the electronic health record of an integrated health care system is constantly updated.


Study design and population

Kaiser Permanente Southern California (KPSC) is an integrated health care system that provides health care for approximately 3.3 million members in southern California. The coverage area of KPSC includes 10 counties with approximately 22.7 million residents (based on 2008 estimates). Thus, KPSC members represent about 16% of the underlying population. Members receive medical care in KPSC-owned hospitals and medical offices in the southern California area. On average, about 30,000 children are born in KPSC hospitals each year. For the present study, we identified 357,389 children who were delivered in KPSC hospitals between January 1, 1998 and December 31, 2008. We excluded 31,579 (8.8%) children because maternal race was missing on the birth certificate, resulting in a final study population of 325,810 children. The study protocol was reviewed and approved by the Institutional Review Board of KPSC.

Race and ethnicity information from birth certificate records

Race and ethnicity from birth certificates are often used for federal statistics, particularly for intercensal population estimates and annual statistical tabulations of maternal and child health [20-22]. Because all children in the study were born in KPSC-owned hospitals, birth certificate information was collected by clerks during the hospital stay and was based on parental self-report. Maternal and paternal race from the KPSC birth certificate record database served as the criterion standard to classify children as White, Black, Asian/Pacific Islander (PI), American Indian/Alaskan Native (AIAN), other race, or multiple races. If maternal and paternal race differed, children were classified as multiple races. Paternal race was unknown on the birth certificates of 20.5% of the children; these children were classified according to maternal race.
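The race rule above can be sketched as a small function. This is a minimal illustration, assuming races are coded as plain strings and unknown values as `None`; the function name `child_race_from_birth_certificate` and the label `"Multiple"` are hypothetical and not taken from the study's code.

```python
from typing import Optional


def child_race_from_birth_certificate(maternal: Optional[str],
                                      paternal: Optional[str]) -> Optional[str]:
    """Derive a child's race from parental race on the birth certificate.

    Hypothetical helper sketching the rule described in the text;
    None stands for an unknown/missing value.
    """
    if maternal is None:
        # Children with missing maternal race were excluded from the study.
        return None
    if paternal is None or paternal == maternal:
        # Unknown paternal race: fall back to maternal race.
        return maternal
    # Parents of different races: classify the child as multiple races.
    return "Multiple"


print(child_race_from_birth_certificate("White", "Black"))  # Multiple
print(child_race_from_birth_certificate("Asian/PI", None))  # Asian/PI
```

Treating an unknown father as "same as the mother" is what lets the 20.5% of children without paternal race stay in the analysis, at the cost of possibly missing some multiracial children.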

An infant's ethnicity was classified as Hispanic or non-Hispanic based on maternal and paternal ethnicity from birth certificates. If at least one parent was Hispanic, the infant was classified as Hispanic. Paternal ethnicity was unknown for 7.3% of children; these children were classified according to maternal ethnicity. Maternal ethnicity was unknown for 53 children (<0.1%); these children were classified based on paternal ethnicity or, if that was also unknown, as unknown.
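The ethnicity rule can be expressed in the same way. In this sketch each parent's ethnicity is assumed to be a boolean (`True` for Hispanic) or `None` when unknown; the function name is hypothetical.

```python
from typing import Optional


def child_ethnicity_from_birth_certificate(maternal_hispanic: Optional[bool],
                                           paternal_hispanic: Optional[bool]) -> str:
    """Hypothetical helper: derive a child's ethnicity from parental ethnicity."""
    # Keep only the parents whose ethnicity is recorded (None = unknown).
    known = [flag for flag in (maternal_hispanic, paternal_hispanic) if flag is not None]
    if not known:
        return "Unknown"
    # At least one Hispanic parent makes the child Hispanic.
    return "Hispanic" if any(known) else "Non-Hispanic"


print(child_ethnicity_from_birth_certificate(False, True))  # Hispanic
print(child_ethnicity_from_birth_certificate(None, None))   # Unknown
```

Filtering out unknown parents before applying "any Hispanic parent" covers both fallbacks in the text (unknown father, then unknown mother) with a single rule.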

Race and ethnicity information from health plan administrative records

Racial categories in health plan administrative records are collapsed into White, Black, AIAN, Asian/PI, multiple races, other races, and unknown/missing. Ethnic categories are Hispanic and non-Hispanic. Information on race, ethnicity, and language preference is collected at health plan enrollment as well as during inpatient and outpatient medical visits; together, these are referred to as administrative records. Medical staff are asked to update these records, so the information can change over time. For the present study, race and ethnicity information was extracted as of December 31, 2008. Administrative records combine the most recent information from three sources: (1) the Kaiser Foundation System, a management information system for health plan administration and accounting; (2) the electronic medical record (EMR) system HealthConnect; and (3) a hospital inpatient information system used before EMRs were implemented. None of these sources included information from birth certificates. Within these sources, the language preference for medical visits and other contacts, as provided by the patient or guardian, was used to supplement race and ethnicity information: a KPSC member is classified as Asian/PI if any Asian language is preferred and as Hispanic if Spanish is preferred. If the three sources provide conflicting information on race (ignoring unknown values), race is classified as multiple.
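A minimal sketch of how the administrative-record rules might combine is shown below. Several details are assumptions, not documented in the study: that a preferred Asian language overrides any recorded race, the specific list in `ASIAN_LANGUAGES`, and that ethnicity arrives as a single already-resolved value. All names are hypothetical.

```python
from typing import Iterable, Optional

# Hypothetical set of Asian languages; the study does not list the languages it used.
ASIAN_LANGUAGES = {"Cantonese", "Mandarin", "Vietnamese", "Korean", "Tagalog", "Japanese"}


def admin_race(source_races: Iterable[Optional[str]],
               preferred_language: Optional[str] = None) -> str:
    """Combine race from several administrative sources (sketch).

    Assumptions: a preferred Asian language overrides recorded race,
    unknown/missing values are ignored, and conflicting known values
    yield 'Multiple'.
    """
    if preferred_language in ASIAN_LANGUAGES:
        return "Asian/PI"
    known = {race for race in source_races if race is not None and race != "Unknown"}
    if not known:
        return "Unknown"
    if len(known) > 1:
        return "Multiple"
    return next(iter(known))


def admin_ethnicity(recorded: Optional[str],
                    preferred_language: Optional[str] = None) -> str:
    """A preferred Spanish language marks the member as Hispanic (sketch)."""
    if preferred_language == "Spanish":
        return "Hispanic"
    return recorded if recorded is not None else "Unknown"


print(admin_race(["White", None, "Black"]))  # Multiple
print(admin_ethnicity(None, "Spanish"))      # Hispanic
```

Note that the conflict rule means a single erroneous entry in one source is enough to turn a single-race child into "Multiple", which is one way the low PPV for multiple races could arise.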

Statistical analysis

We calculated the racial/ethnic distribution of the study population based on birth certificates (the criterion standard) and on administrative records. We also calculated sensitivity (the conditional probability that a child of a specific race/ethnicity according to birth certificates is classified as such in administrative records) and positive predictive value (PPV; the proportion of children classified as a specific race/ethnicity in administrative records who have that race/ethnicity according to birth certificates). Sensitivity was calculated with and without children with missing race/ethnicity information in the administrative records to distinguish misclassification of race/ethnicity from misclassification due to non-classification. The proportion of unknown/missing race in the administrative records was comparable across most races/ethnicities (12.5-15.5%) except AIAN (22.8%). Multivariable logistic regression models were used to estimate the association of correct classification of race/ethnicity with the length of health insurance coverage, the number of medical encounters, and race and ethnicity. Odds ratios (OR) and their 95% confidence intervals (CI) are reported. Analyses were performed with PASW Statistics 17.0 (SPSS Inc., Chicago, IL).
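The two measures, and the effect of excluding children with missing administrative data, can be illustrated with a short sketch. The function name and the toy data are invented for illustration; missing administrative values are assumed to be coded as `None`.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_ppv(reference: Sequence[str],
                    admin: Sequence[Optional[str]],
                    category: str,
                    exclude_missing: bool = False) -> Tuple[float, float]:
    """Sensitivity and PPV of `admin` against `reference` for one category.

    reference: race/ethnicity from birth certificates (criterion standard)
    admin: race/ethnicity from administrative records; None means unknown/missing
    exclude_missing: drop children whose administrative value is missing
    """
    pairs = list(zip(reference, admin))
    if exclude_missing:
        pairs = [(ref, adm) for ref, adm in pairs if adm is not None]
    true_pos = sum(1 for ref, adm in pairs if ref == category and adm == category)
    ref_pos = sum(1 for ref, _ in pairs if ref == category)    # TP + FN
    admin_pos = sum(1 for _, adm in pairs if adm == category)  # TP + FP
    sensitivity = true_pos / ref_pos if ref_pos else float("nan")
    ppv = true_pos / admin_pos if admin_pos else float("nan")
    return sensitivity, ppv


# Toy data (invented for illustration, not study data).
bc = ["Hispanic"] * 4 + ["Non-Hispanic"] * 4
adm = ["Hispanic", "Hispanic", "Hispanic", None,
       "Non-Hispanic", "Non-Hispanic", "Non-Hispanic", "Hispanic"]
print(sensitivity_ppv(bc, adm, "Hispanic"))                        # (0.75, 0.75)
print(sensitivity_ppv(bc, adm, "Hispanic", exclude_missing=True))  # (1.0, 0.75)
```

Excluding missing administrative values shrinks only the sensitivity denominator, because a missing value can never count as a positive classification; this is why sensitivity rises after exclusion while PPV is unchanged.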


The majority of children (n = 174,361) enrolled in the study were Hispanic (Table 1). Regardless of ethnicity, 69.5% of children were classified as White (n = 240,214), 10.6% Black (n = 37,056), 9.7% Asian/PI (n = 40,895), 0.2% AIAN (n = 1,390), 0.3% of other race (n = 1,514), and 9.7% of multiple race (n = 4,741) based on maternal and paternal race from birth certificates.

Table 1 Characteristics of the study population

Identification of Hispanic Ethnicity in Administrative Records

According to administrative records, 43.1% of children were Hispanic, 48.2% non-Hispanic, and 8.8% of unknown ethnicity. Sensitivity and PPV for Hispanic ethnicity were 76.9% and 95.6%, respectively (Table 2). Because the largest share of misclassified cases (48.3%) was due to missing ethnicity information in administrative records, we also calculated sensitivity excluding children with missing information. When children with unknown ethnicity in administrative records were excluded, sensitivity was 84.7%.
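As a rough consistency check, the reported PPV and sensitivity can be linked through the shared count of true positives. The sketch below uses only rounded figures from the text, so results are approximate; the implied number of Hispanic children with unknown ethnicity in administrative records is a back-of-the-envelope derivation, not a figure reported by the study.

```python
# Rounded figures reported in the text.
n_total = 325_810              # study population
n_hispanic_bc = 174_361        # Hispanic according to birth certificates
share_hispanic_admin = 0.431   # share classified Hispanic in administrative records
ppv = 0.956                    # reported PPV for Hispanic ethnicity

# PPV = TP / (number classified Hispanic in administrative records)
n_hispanic_admin = share_hispanic_admin * n_total
true_pos = ppv * n_hispanic_admin

# Sensitivity = TP / (number Hispanic on birth certificates)
sensitivity = true_pos / n_hispanic_bc
print(f"implied sensitivity: {sensitivity:.3f}")  # 0.770 (reported: 0.769)

# Derived, not reported: birth-certificate Hispanic children with unknown
# ethnicity in administrative records, given 84.7% sensitivity without them.
implied_unknown = n_hispanic_bc - true_pos / 0.847
print(f"implied Hispanic children with unknown ethnicity: {implied_unknown:,.0f}")
```

The implied sensitivity agrees with the reported 76.9% to within rounding, which suggests the reported percentages are internally consistent.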

Table 2 Sensitivity and positive predictive values for racial/ethnic information from administrative and birth certificate records in children

Hispanic ethnicity was identified more accurately in administrative records for children whose parents were both Hispanic than for children with only one Hispanic parent (p < 0.001). Among children whose parents were both non-Hispanic or both Hispanic, 87.6% were classified in administrative records in accordance with birth certificates. Among children with only one Hispanic parent, those with a Hispanic mother (66.2%) were more likely to be identified as Hispanic in administrative records than those with a Hispanic father (26.9%, p < 0.001).

Correct identification of Hispanic ethnicity was positively associated with the duration of health insurance coverage (OR per year of coverage 1.03, 95%-CI 1.02-1.04) but not with the total number of medical encounters. However, among encounter types, inpatient (OR per additional encounter 1.22, 95%-CI 1.19-1.25) and emergency (OR per additional encounter 1.17, 95%-CI 1.16-1.18) visits were strongly associated with correct classification of Hispanic ethnicity.
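Per-encounter odds ratios compound multiplicatively. The sketch below assumes each encounter count enters the logistic model as a linear term on the logit scale (the model specification is not given in the text) and holds the other covariates constant. It also shows the standard Wald construction of an OR and its 95% CI from a coefficient; the function names are hypothetical.

```python
import math


def cumulative_or(or_per_unit: float, units: float) -> float:
    """OR for `units` additional encounters, assuming a linear term on the logit scale."""
    return or_per_unit ** units


def wald_or_ci(beta: float, se: float, z: float = 1.96):
    """OR and Wald 95% CI from a logistic-regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported per-encounter ORs for correct Hispanic classification.
print(round(cumulative_or(1.22, 3), 2))  # inpatient, 3 extra visits -> 1.82
print(round(cumulative_or(1.17, 3), 2))  # emergency, 3 extra visits -> 1.6
```

For example, three additional inpatient stays correspond to roughly 1.8-fold higher odds of correct classification under this assumption; the multiplication would not hold if the model included nonlinear terms or interactions.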

Identification of Race in Administrative Records

According to administrative records, 56.8% of children were White, 9.8% Black, 9.2% Asian/PI, 0.2% AIAN, 9.8% of other race, 0.5% of multiple races, and 13.7% of unknown race. The overall sensitivity was 66.4% but was higher in the three largest racial groups (Table 2). The low sensitivity was mainly caused by the large number of children with unknown or missing race in administrative records (40.9% of misclassified cases). When children with unknown race in administrative records were excluded, the overall sensitivity increased to 86.7%.

Sensitivity and PPV were lowest in children of multiple races. Among misclassified children of multiple races, 56.2% were misclassified because only the maternal race was recorded and 17.5% because only the paternal race was recorded. When children with known race in administrative records were counted as correctly classified if at least one parent's race was reflected correctly, the overall PPV increased to 95.7% and the sensitivity to 90.2%.
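The lenient criterion can be sketched as a per-child check. This is an illustration under stated assumptions: unknown administrative race is coded as `None` or `"Unknown"` and treated as non-classified (returned as `None`), and the function name `is_correct` is hypothetical.

```python
from typing import Optional


def is_correct(admin: Optional[str], child_bc: str, maternal: str,
               paternal: Optional[str], lenient: bool = False) -> Optional[bool]:
    """Is the administrative race correct relative to the birth certificate?

    Returns None for non-classified children (unknown administrative race).
    With lenient=True, a multiple-race child also counts as correct when the
    administrative race matches either parent's race.
    """
    if admin is None or admin == "Unknown":
        return None
    if admin == child_bc:
        return True
    if lenient and child_bc == "Multiple":
        return admin in (maternal, paternal)
    return False


print(is_correct("White", "Multiple", "White", "Black"))                # False
print(is_correct("White", "Multiple", "White", "Black", lenient=True))  # True
```

For single-race children the child's race equals the recorded parental race, so the lenient flag only changes the outcome for children of multiple races, matching the scenario described above.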

Correct identification of race varied by ethnicity (Table 2), number of medical encounters, and birth outcome. The odds of correct race identification in administrative records were higher in non-Hispanics than in Hispanics (OR 2.62, 95%-CI 2.56-2.68). Stillbirth (OR 0.005, 95%-CI 0.004-0.006), but not neonatal death (OR 0.99, 95%-CI 0.72-1.38), decreased the odds of correct race identification. The total number of medical encounters only slightly increased the odds of correct race identification (OR per additional encounter 1.01, 95%-CI 1.00-1.01). Among encounter types, inpatient (OR per additional encounter 1.32, 95%-CI 1.30-1.35) and emergency (OR per additional encounter 1.09, 95%-CI 1.09-1.10) visits showed the strongest association with correct race classification. Duration of health insurance coverage was not associated with the odds of correct race identification. Patterns of racial classification deviating from birth certificates (i.e. misclassification) differed significantly between non-Hispanics and Hispanics (Figure 1). When children with unknown race in administrative records were excluded, racial classification was more accurate in non-Hispanic Whites, Blacks, and Asians (PPV 81%). Hispanic children from minority groups were frequently misclassified as White.

Figure 1

Patterns of racial classification in administrative records from Hispanic (n = 174,361) and non-Hispanic (n = 151,396) children. Abbreviations used: Asian/PI: Asian or Pacific Islander, AIAN: American Indian or Alaskan Native.


This study compared the most recent race and ethnicity data collected in the administrative records of a large, integrated health plan with information available from birth certificates. The information was accurate for ethnicity and for the three largest racial groups (White, Black, and Asian). Two major causes of disagreement between administrative and birth certificate records were identified: (1) missing information in administrative records and (2) classification of children of multiple races based on information from only one parent. Addressing both causes would increase the sensitivity of racial classification from 66.4% to 90.2%, with a PPV of 95.7%. Because race and ethnicity information in health plan administrative records is constantly updated, information was more accurate in children with more medical encounters. Sensitivity and PPVs were generally higher in non-Hispanics than in Hispanics. Data quality was limited for children of multiple races and children of AIAN origin.

The quality of racial and ethnic information in children has not been well studied. Nevertheless, the results of the present study are comparable to those of two previous studies of race and ethnicity information in adults [14, 15]. In these studies, PPVs for Whites and Blacks ranged from 86.7% to 95.1%, whereas PPVs and sensitivity for small minority groups such as AIAN were generally poor [15]. Similarly, PPV and sensitivity for Hispanic adults were lower than for non-Hispanic Whites and Blacks. These patterns are generally consistent with the accuracy of racial and ethnic information in Medicare enrollment databases [23]. The present study also shows that patterns of misclassification varied greatly between Hispanic and non-Hispanic children.

In the present study, one major reason for race/ethnicity misclassification in administrative records was missing information (non-classification). After exclusion of non-classified individuals, sensitivity improved significantly for Whites, Blacks, and AIAN. This partly explains the lower sensitivity in our study compared with other studies that excluded non-classified individuals from their study population [14, 15]. Incomplete and missing information on race, ethnicity, and language in health care organization databases has been reported previously [24]. Our results suggest that birth certificate information is not routinely used to fill gaps in administrative records, even where it is available, as in this setting.

The second important cause of disagreement between administrative records and birth certificates was the misclassification of children whose parents were of different races (i.e. multiple races). Most of these children were misclassified because the racial information of only one parent, mostly the mother, was used for classification. One possible explanation is the frequently observed simplification of multiracial heritage: multiple races are often reported as one main race [25, 26]. Multiracial identification varies across regions and races; in particular, AIAN are less likely to report themselves as multiracial [26]. It is also possible that the mother's presence at birth and at later medical encounters accounts for the predominance of maternal race.

The present study adds new information on changes in the quality of race information over the course of membership. In the integrated health care system studied here, race and ethnicity data are updated during medical visits, unlike in other settings such as health insurance claims, where race/ethnicity information is usually collected at enrollment. The quality of information increased with the number of medical encounters, especially inpatient visits. Although the magnitude of these effects may differ by organization, our results are likely generalizable to other integrated health care settings that update their patients' demographic data during office visits.

Our study benefited from the large size of a diverse population, with adequate representation of Hispanic and non-Hispanic racial and ethnic groups to provide ample statistical power and valid estimates of sensitivity and PPV. A limitation of the present study is the use of birth certificate records as the criterion standard. Previous studies that carefully reviewed birth certificate records have reported that they provide relatively valid information on race and ethnicity [16, 19], and race and ethnicity from birth certificates are used as standards for federal statistics such as intercensal population estimates [20-22]. However, although PPVs are 96% or above for most races, significant limitations in data quality have been described for individuals of AIAN origin [16].

Misclassification of racial and ethnic minorities can lead to data misinterpretation and erroneous conclusions. Incorrect classification of individuals from a small minority group may lead to over- or underestimation of health disparities and race-related risk factors. Accurate racial and ethnic information is therefore crucial for health care research.


The results of the present study suggest that the overall quality of racial and ethnic information in administrative records is relatively good for distinguishing Hispanics from non-Hispanics and for identifying White and Black children. However, relying on health plan administrative records alone leads to frequent misclassification of minority groups and individuals of multiple races. Linking birth certificate information to the administrative records of children, where available, can therefore improve the accuracy of race and ethnicity classification.


  1. Adams WG, Mann AM, Bauchner H: Use of an electronic medical record improves the quality of urban pediatric primary care. Pediatrics. 2003, 111 (3): 626-632. 10.1542/peds.111.3.626.

  2. Bates DW, Ebell M, Gotlieb E, Zapp J, Mullins HC: A proposal for electronic medical records in U.S. primary care. J Am Med Inform Assoc. 2003, 10 (1): 1-10.

  3. Bordowitz R, Morland K, Reich D: The use of an electronic medical record to improve documentation and treatment of obesity. Fam Med. 2007, 39 (4): 274-279.

  4. Flower KB, Perrin EM, Viadro CI, Ammerman AS: Using body mass index to identify overweight children: barriers and facilitators in primary care. Ambul Pediatr. 2007, 7 (1): 38-44.

  5. Garg AX, Adhikari NK, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, Sam J, Haynes RB: Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005, 293 (10): 1223-1238. 10.1001/jama.293.10.1223.

  6. Kemper AR, Uren RL, Clark SJ: Adoption of electronic health records in primary care pediatric practices. Pediatrics. 2006, 118 (1): e20-e24. 10.1542/peds.2005-3000.

  7. Cebul RD: Using electronic medical records to measure and improve performance. Trans Am Clin Climatol Assoc. 2008, 119: 65-75.

  8. D'Avolio LW: Electronic medical records at a crossroads: impetus for change or missed opportunity? JAMA. 2009, 302 (10): 1109-1111. 10.1001/jama.2009.1319.

  9. Dunn MJ: Benefits of electronic medical records outweigh every challenge. WMJ. 2007, 106 (3): 159-160.

  10. Wu RC, Straus SE: Evidence for handheld electronic medical records in improving care: a systematic review. BMC Med Inform Decis Mak. 2006, 6: 26.

  11. Coker TR, Elliott MN, Kataoka S, Schwebel DC, Mrug S, Grunbaum JA, Cuccaro P, Peskin MF, Schuster MA: Racial/Ethnic disparities in the mental health care utilization of fifth grade children. Acad Pediatr. 2009, 9 (2): 89-96.

  12. Stevens GD, Shi L: Effect of managed care on children's relationships with their primary care physicians: differences by race. Arch Pediatr Adolesc Med. 2002, 156 (4): 369-377.

  13. Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N: A New Method for Estimating Race/Ethnicity and Associated Disparities Where Administrative Records Lack Self-Reported Race/Ethnicity. Health Serv Res. 2008.

  14. West CN, Geiger AM, Greene SM, Harris EL, Liu IL, Barton MB, Elmore JG, Rolnick S, Nekhlyudov L, Altschuler A, et al: Race and ethnicity: comparing medical records to self-reports. J Natl Cancer Inst Monogr. 2005, 35: 72-74.

  15. Gomez SL, Kelsey JL, Glaser SL, Lee MM, Sidney S: Inconsistencies between self-reported ethnicity and ethnicity recorded in a health maintenance organization. Ann Epidemiol. 2005, 15 (1): 71-79.

  16. Baumeister L, Marchi K, Pearl M, Williams R, Braveman P: The validity of information on "race" and "Hispanic ethnicity" in California birth certificate data. Health Serv Res. 2000, 35 (4): 869-883.

  17. Braveman P, Pearl M, Egerter S, Marchi K, Williams R: Validity of insurance information on California birth certificates. Am J Public Health. 1998, 88 (5): 813-816. 10.2105/AJPH.88.5.813.

  18. Brender JD, Suarez L, Langlois PH: Validity of parental work information on the birth certificate. BMC Public Health. 2008, 8: 95.

  19. Northam S, Knapp TR: The reliability and validity of birth certificates. J Obstet Gynecol Neonatal Nurs. 2006, 35 (1): 3-12.

  20. Schoendorf KC, Parker JD, Batkhan LZ, Kiely JL: Comparability of the birth certificate and 1988 Maternal and Infant Health Survey. Vital Health Stat 2. 1993, 116: 1-19.

  21. National Center for Health Statistics (NCHS): Documentation for Intercensal population estimates of the specified Hispanic origin groups.

  22. State of California DoF: Projected Total Population of California Counties 1990 to 2040. Edited by: State of California DoF. Sacramento. 1993

  23. Arday SL, Arday DR, Monroe S, Zhang J: HCFA's racial and ethnic data: current accuracy and recent improvements. Health Care Financ Rev. 2000, 21 (4): 107-116.

  24. Hasnain-Wynia R, Baker DW: Obtaining data on patient race, ethnicity, and primary language in health care organizations: current challenges and proposed solutions. Health Serv Res. 2006, 41 (4 Pt 1): 1501-1518.

  25. Harris DR, Sim JJ: Who is Multiracial? Assessing the Complexity of Lived Race. Am Sociol Rev. 2002, 67: 614-627. 10.2307/3088948.

  26. Tafoya SM, Johnson H, Hill LE: Who chooses to choose two? The American People: Census 2000. Edited by: Farley R, Haaga J. 2005, New York: Russell Sage Foundation, 332-351.



This research was supported by the National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health [R21DK085395, PI: Koebnick] and by Kaiser Permanente Direct Community Benefit Funds.

Author information



Corresponding author

Correspondence to Corinna Koebnick.

Additional information

Competing interests

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

Authors' contributions

Conception and design of the study: NS, RY, CK

Acquisition of the data: NS, RY, CK

Analysis and interpretation: NS, RY, ALG, DG, DS, SJJ, WC, SD, CK

Drafting and critically revising the manuscript: NS, RY, ALG, DG, DS, SJJ, WC, SD, CK

Final approval: NS, RY, ALG, DG, DS, SJJ, WC, SD, CK

Ning Smith and Rajan L Iyer contributed equally to this work.


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Smith, N., Iyer, R.L., Langer-Gould, A. et al. Health plan administrative records versus birth certificate records: quality of race and ethnicity information in children. BMC Health Serv Res 10, 316 (2010).
