Health plan administrative records versus birth certificate records: quality of race and ethnicity information in children
- Ning Smith†1,
- Rajan L Iyer†1,
- Annette Langer-Gould1,
- Darios T Getahun1,
- Daniel Strickland1,
- Steven J Jacobsen1,
- Wansu Chen1,
- Stephen F Derose1 and
- Corinna Koebnick1
© Smith et al; licensee BioMed Central Ltd. 2010
Received: 7 April 2010
Accepted: 23 November 2010
Published: 23 November 2010
To understand racial and ethnic disparities in health care utilization and their potential underlying causes, valid information on race and ethnicity is necessary. However, the validity of pediatric race and ethnicity information in administrative records from large integrated health care systems using electronic medical records is largely unknown.
Information on race and ethnicity of 325,810 children born between 1998-2008 was extracted from health plan administrative records and compared to birth certificate records. Positive predictive values (PPV) were calculated for correct classification of race and ethnicity in administrative records compared to birth certificate records.
Misclassification of ethnicity and race in administrative records occurred in 23.1% and 33.6% of children, respectively, much of it due to missing ethnicity (48.3%) and race (40.9%) information. Misclassification was most common in children of minority groups. PPV for White, Black, Asian/Pacific Islander, American Indian/Alaskan Native, multiple and other was 89.3%, 86.6%, 73.8%, 18.2%, 51.8% and 1.2%, respectively. PPV for Hispanic ethnicity was 95.6%. Racial and ethnic information improved with increasing number of medical visits. Subgroup analyses comparing racial classification between non-Hispanics and Hispanics showed that classification of White, Black and Asian race was more accurate among non-Hispanics than among Hispanics.
In children, race and ethnicity information from administrative records has significant limitations in accurately identifying small minority groups. These results suggest that the quality of racial information obtained from administrative records may benefit from additional supplementation by birth certificate data.
Increasing attention has been given to the research potential of information collected in electronic health records [1–3]. Electronic health records have been successfully used to improve patient care [4–6]. Electronic health records also help to obtain important information on demographic and behavioural characteristics, medical conditions, and health care costs [7–10]. Among the most pressing questions is how to understand racial and ethnic disparities in health care utilization and their underlying causes [11, 12]. To address these questions, valid race and ethnicity information is needed.
Many health plans collect race and ethnicity information from their members [13, 14]. These data come from various sources, such as insurance enrollment forms, inpatient and outpatient visit information, and birth certificates, and their quality varies by source. Some studies indicate that the quality of these administrative data is fairly good in adults [14, 15] but has some limitations for small minority groups such as American Indians. The quality of race and ethnicity information for children, however, is largely unknown. Because young children have relatively frequent medical visits and are accompanied by a parent, the quality of race and ethnicity information may be higher for children than for adults.
Information from birth certificates is considered a criterion standard because it is nearly universal, includes self-reported race and ethnicity, and has been frequently validated [16–19]. While race and ethnicity information from birth certificates has been shown to provide a valid data source with positive predictive values (PPV) for most races above 96%, known limitations exist for Native Americans.
To fill the knowledge gap on the quality of race and ethnicity information for children in the administrative records of integrated health care systems, we compared information from these administrative records of a large managed health care system to the maternal and paternal race and ethnicity information obtained from birth certificates. We also investigated the main sources of racial and ethnic misclassification and the effect of health care utilization on the quality of race and ethnicity information, taking into consideration that information in the electronic health record of an integrated health care system is constantly updated.
Study design and population
Kaiser Permanente Southern California (KPSC) is an integrated health care system that provides health care for approximately 3.3 million members in southern California. The coverage area of KPSC includes 10 counties with approximately 22.7 million residents (based on 2008 estimates). Thus, KPSC members represent about 16% of the underlying population. Members receive medical care in KPSC owned hospitals and medical offices in the southern California area. On average, about 30,000 children are born in KPSC hospitals each year. For the present study, we identified 357,389 children who were delivered in KPSC hospitals between January 1, 1998 and December 31, 2008. We excluded 31,579 (8.8%) children because the maternal race was missing on the birth certificate, resulting in a final study population of 325,810 children. The study protocol was reviewed and approved by the Institutional Review Board of KPSC.
Race and ethnicity information from birth certificate records
Race and ethnicity from birth certificates are often used for federal statistics, particularly for intercensal population estimates and annual statistical tabulations regarding maternal and child health [20–22]. Because all children in the study were born in KPSC-owned hospitals, birth certificate information was collected by clerks during the hospital stay and was based on parental self-report. Information on maternal and paternal race from the KPSC birth certificate record database was used as the criterion standard to classify children as White, Black, Asian/Pacific Islander (PI), American Indian/Alaskan Native (AIAN), other race, or multiple races. If the maternal and paternal race were not identical, children were classified as multiple races. The paternal race from birth certificates was unknown in 20.5% of the children. These children were classified according to the maternal race information obtained.
An infant's ethnicity was classified as Hispanic or non-Hispanic based on maternal and paternal ethnicity information obtained from birth certificates. If at least one parent was of Hispanic ethnicity, the infant was classified as Hispanic. Paternal ethnicity was unknown in 7.3% of the children. These children were classified according to maternal ethnicity information. Maternal ethnicity was unknown in 53 children (0.02%), who were classified based upon paternal information or classified as unknown.
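The classification rules described above for the criterion standard can be sketched in a few lines of Python. This is a minimal illustration only, not the study's actual code: the function names `child_race` and `child_ethnicity`, the category labels, and the use of `None` for an unknown parental value are assumptions made for the example.

```python
from typing import Optional


def child_race(maternal: str, paternal: Optional[str]) -> str:
    """Derive a child's race from parental race on the birth certificate.

    Maternal race is always known here because children with missing
    maternal race were excluded from the study population.
    """
    if paternal is None:       # paternal race unknown: use maternal race
        return maternal
    if maternal != paternal:   # parental races differ: multiple races
        return "Multiple"
    return maternal


def child_ethnicity(maternal: Optional[str], paternal: Optional[str]) -> str:
    """Derive a child's ethnicity from parental ethnicity."""
    if "Hispanic" in (maternal, paternal):   # at least one Hispanic parent
        return "Hispanic"
    known = [e for e in (maternal, paternal) if e is not None]
    return known[0] if known else "Unknown"  # fall back to the known parent


print(child_race("White", "Black"))          # parental races differ
print(child_race("Asian/PI", None))          # paternal race unknown
print(child_ethnicity("Non-Hispanic", "Hispanic"))
```

Keeping the two derivations separate mirrors the text, in which race and ethnicity are classified independently of each other.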
Race and ethnicity information from health plan administrative records
Racial categories from health plan administrative records are collapsed to White, Black, AIAN, Asian/PI, multiple races, other races, and unknown/missing races. Ethnic categories are Hispanic and non-Hispanic. Information on race, ethnicity, and language preference is collected at health plan enrollment, as well as during inpatient and outpatient medical visits. These are referred to as administrative records. Medical staff are asked to update these administrative records and, therefore, information can change over time. For the present study, information on race and ethnicity was extracted as of December 31, 2008. Administrative records include information from three different sources using the most recent information: (1) the Kaiser Foundation System, which is a management information system for health plan administration and accounting; (2) the electronic medical record (EMR) system HealthConnect; and (3) a hospital inpatient information system which was used before EMRs were implemented. No information from birth certificates was included in these sources. Within these sources, language preference for medical visits and other contacts provided by the patient or guardian was used to supplement this information. A KPSC member is classified as Asian/PI race if any Asian language is preferred. A KPSC member is classified as Hispanic if any Spanish language is preferred. If the three informational sources deliver contradictory information on race (other than unknown information), the race is classified as multiple.
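The way the three administrative sources are combined into one race value can be illustrated with a short Python sketch. It is a hypothetical reconstruction under stated assumptions, not the health plan's implementation: the function name `admin_race`, the example language set `ASIAN_LANGUAGES`, and the choice to apply the language rule only when no source reports a race are all assumptions, because the text does not specify how language preference interacts with a recorded race.

```python
from typing import Iterable, Optional

# Illustrative subset only; the actual list of languages is not given in the text.
ASIAN_LANGUAGES = {"Cantonese", "Korean", "Mandarin", "Tagalog", "Vietnamese"}


def admin_race(source_races: Iterable[Optional[str]],
               language: Optional[str] = None) -> str:
    """Combine race values from several administrative sources.

    Unknown values (None or "Unknown") are ignored. Contradictory known
    values yield "Multiple". Assumption: language preference is used only
    to fill in a race when no source reports one.
    """
    known = {r for r in source_races if r not in (None, "Unknown")}
    if len(known) > 1:                 # sources contradict each other
        return "Multiple"
    if len(known) == 1:                # sources agree, or only one reports
        return next(iter(known))
    if language in ASIAN_LANGUAGES:    # supplement from language preference
        return "Asian/PI"
    return "Unknown"


print(admin_race(["White", None, "White"]))
print(admin_race(["White", "Black", None]))
print(admin_race([None, "Unknown", None], language="Korean"))
```

The analogous rule for ethnicity (Hispanic if Spanish is preferred) would follow the same pattern with a different language set.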
We calculated the racial/ethnic distribution of the study population based on birth certificates as the criterion standard and compared it with administrative records. We also calculated sensitivity (the conditional probability that a specific race/ethnicity according to birth certificates is correctly classified as such in administrative records) and positive predictive value (PPV; the proportion of children classified in administrative records as being of a specific race/ethnicity who are of that race/ethnicity according to birth certificates). Sensitivity was calculated with and without subjects who have missing race/ethnicity information in the administrative records to distinguish between misclassification of race/ethnicity and misclassification due to non-classification. The distribution of unknown/missing race in the administrative records was comparable among most races/ethnicities (12.5-15.5%) except for AIAN (22.8%). Multivariable logistic regression models were used to estimate the relationship of correct classification of race/ethnicity with the length of health insurance coverage, number of medical encounters, and race and ethnicity. Odds ratios (OR) and their corresponding 95% confidence intervals (CI) are given. Statistical software package PASW Statistics 17.0 was used (SPSS Inc., Chicago, IL).
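To make the two measures concrete, the following Python sketch computes sensitivity and PPV for one category from paired records, with and without records that are missing in the administrative data. The function name `sensitivity_ppv`, the tuple layout (birth certificate value, administrative value), and the toy data are assumptions for illustration; the study itself used PASW Statistics.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]  # (birth certificate value, administrative value)


def sensitivity_ppv(pairs: List[Pair], category: str,
                    exclude_missing: bool = False) -> Tuple[float, float]:
    """Return (sensitivity, PPV) for one race/ethnicity category.

    Birth certificates are treated as the criterion standard. A missing
    administrative value is represented by None.
    """
    if exclude_missing:
        pairs = [(b, a) for b, a in pairs if a is not None]
    true_pos = sum(1 for b, a in pairs if b == category and a == category)
    n_birth = sum(1 for b, _ in pairs if b == category)  # criterion positives
    n_admin = sum(1 for _, a in pairs if a == category)  # administrative positives
    sens = true_pos / n_birth if n_birth else float("nan")
    ppv = true_pos / n_admin if n_admin else float("nan")
    return sens, ppv


# Toy data, not study data.
records = [("White", "White"), ("White", "White"), ("White", None),
           ("White", "Black"), ("Black", "Black"), ("Black", "White")]

print(sensitivity_ppv(records, "White"))                        # missing counted as misclassified
print(sensitivity_ppv(records, "White", exclude_missing=True))  # missing excluded
```

In the toy data, excluding the record with a missing administrative value raises sensitivity for White from 2/4 to 2/3 while leaving PPV at 2/3, which is the distinction between misclassification and non-classification drawn above.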
[Table: Characteristics of the study population (n = 325,810). Rows: age in 2008 (y); membership duration (y); medical encounters (n); stillbirth, neonatal death, or other non-live births (%); Hispanic ethnicity (%).]
Identification of Hispanic Ethnicity in Administrative Records
[Table: Sensitivity and positive predictive values for racial/ethnic information from administrative and birth certificate records in children. Columns include positive predictive value and records with unknown race/ethnicity. Panels: all children (n = 325,810); non-Hispanic children (n = 151,396); Hispanic children (n = 174,361).]
Identification of Hispanic ethnicity in administrative records was better in children whose parents were both Hispanic than in children with only one Hispanic parent (p < 0.001). In administrative records, 87.6% of children with parents who were both non-Hispanic or both Hispanic were identified in accordance with birth certificates. Among children with only one Hispanic parent, children with Hispanic mothers (66.2%) were more likely to be identified as Hispanic in administrative records than children with Hispanic fathers (26.9%, p < 0.001).
Correct identification of Hispanic ethnicity was positively associated with the duration of health insurance coverage (OR per year of health insurance coverage 1.03, 95% CI 1.02-1.04) but not with the number of all medical encounters. However, among medical encounters, inpatient (OR for each additional encounter 1.22, 95% CI 1.19-1.25) and emergency (OR for each additional encounter 1.17, 95% CI 1.16-1.18) visits showed a strong association with correct classification of Hispanic ethnicity.
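Because these odds ratios are expressed per year or per additional encounter, the implied odds ratio over several units can be obtained by raising the per-unit value to that power. The short Python sketch below illustrates this arithmetic; it assumes the predictor entered the logistic model as a linear term on the log-odds scale, and the function name `odds_ratio_over` is hypothetical.

```python
def odds_ratio_over(or_per_unit: float, units: float) -> float:
    """Odds ratio implied over `units` units of a linear predictor.

    In a logistic model the log-odds change by log(OR) per unit, so the
    odds ratio over k units is OR ** k.
    """
    return or_per_unit ** units


# Per-unit estimates quoted in the text above.
print(round(odds_ratio_over(1.03, 5), 3))  # five years of coverage
print(round(odds_ratio_over(1.22, 2), 3))  # two inpatient encounters
```

Under that assumption, five years of coverage correspond to an odds ratio of about 1.16, and two inpatient encounters to about 1.49.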
Identification of Race in Administrative Records
According to administrative records, 56.8% of children were White, 9.8% Black, 9.2% Asian/PI, 0.2% AIAN, 9.8% of other race, 0.5% of multiple race, and 13.7% of unknown race. The overall sensitivity was 66.4%, but this was higher in the three largest racial groups (Table 2). The low sensitivity was mainly caused by high numbers of children with unknown or missing race/ethnicity in administrative records (40.9% of misclassified cases). When children with unknown race in administrative records were excluded, the overall sensitivity increased to 86.7%.
The sensitivity and PPVs were lowest in children of multiple races. Among incorrectly classified children of multiple races, 56.2% were not identified correctly because only the maternal race was recorded and 17.5% because only the paternal race was recorded. If children with known race are counted as correctly classified when at least one parent's racial information was reflected correctly, the overall PPV increased to 95.7% with a sensitivity of 90.2%.
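The difference between the strict comparison and the more lenient one, in which a match with at least one parent's race counts as correct, can be sketched as follows. The function name `race_agrees` and its arguments are hypothetical and serve only to illustrate the two scoring rules; `None` stands for an unknown paternal race.

```python
from typing import Optional


def race_agrees(admin: str, maternal: str, paternal: Optional[str],
                lenient: bool = False) -> bool:
    """Check whether the administrative race agrees with the birth certificate.

    Strict: the administrative race must equal the criterion-standard race
    ("Multiple" when parental races differ, maternal race when paternal
    race is unknown). Lenient: a match with either parent's race is also
    counted as correct.
    """
    if paternal is None:
        criterion = maternal
    elif maternal != paternal:
        criterion = "Multiple"
    else:
        criterion = maternal
    if admin == criterion:
        return True
    return lenient and admin in (maternal, paternal)


# A child of a White mother and a Black father recorded as "White".
print(race_agrees("White", "White", "Black"))                # strict rule
print(race_agrees("White", "White", "Black", lenient=True))  # lenient rule
```

Under the strict rule this child counts as misclassified, whereas under the lenient rule the record counts as correct because it reflects the maternal race.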
This study used the most recent race and ethnicity data collected as part of the administrative records of a large, integrated health plan and compared them to information available from birth certificates. The information was accurate for ethnicity and for the three largest racial groups (White, Black, and Asian). Two major causes of disagreement between administrative and birth certificate records were identified: (1) missing information in administrative records and (2) classification of children of multiple races based on information from only one parent. Eliminating these causes would increase the sensitivity for correct racial classification from 66.4% to 95.7%. Because race and ethnicity information in health plan administrative records is constantly updated, information was more accurate in children with more medical encounters. Sensitivity and PPVs were generally higher in non-Hispanics than in Hispanics. Limitations in data quality were noted for children of multiple races and children of AIAN origin.
The quality of racial and ethnic information in children has not been well studied. However, the results from the present study were comparable to those of two previous studies investigating race and ethnicity information in adults [14, 15]. In these studies, PPVs for Whites and Blacks were between 86.7% and 95.1%. However, PPVs and sensitivity for small minority groups such as AIAN were generally poor. Similarly, PPV and sensitivity for Hispanic adults were lower than for non-Hispanic Whites and Blacks. These patterns are generally consistent with the accuracy observed for racial and ethnic information in Medicare enrollment databases. The present study also shows that the patterns of misclassification varied greatly between Hispanic and non-Hispanic children.
In the present study, one major reason for race/ethnicity misclassification in the administrative records was missing information (non-classification). After exclusion of non-classified individuals, the sensitivity improved significantly for White, Black, and AIAN children. This partially explains the lower sensitivity observed in our study compared to other studies which excluded non-classified individuals from their study population [14, 15]. Incomplete and missing information on race, ethnicity and language in databases from health care organizations has been reported previously by others. The results from our study suggest that birth certificate information is not routinely used to fill missing information in administrative records, even if available as in this setting.
The second important cause of disagreement between administrative records and birth certificates was the misclassification of children whose parents were of different races (i.e. multiple races). Among children of multiple races, the vast majority were misclassified because racial information from only one parent, mostly the mother, was used for classification purposes. One possible explanation for this misclassification is an often observed simplification of multiracial heritage. Multiple races are often reported as one main race [25, 26]. Multiracial identification varies across regions and races; in particular, AIAN are less likely to report themselves as multiracial. It may also be speculated that maternal presence during birth as well as at later medical encounters accounts for this observation.
The present study adds new information on changes in the quality of race information over the course of membership. Race and ethnicity data collected in an integrated health care system such as the one used in the present study are updated during medical visits, as opposed to other settings such as health insurance claims where race/ethnicity information is usually collected at enrollment. The present study shows that the quality of information increased over time with increasing number of medical encounters, especially inpatient visits. Although the effects may differ in magnitude by organization, we can assume our results are generalizable to other integrated health care settings that update their patients' demographic data during office visits.
Our study benefited from the substantial size of a diverse population with adequate representation of Hispanic and non-Hispanic racial and ethnic groups to generate ample statistical power and allow valid estimates of sensitivity and PPVs. A limitation of the present study is the use of information obtained from birth certificate records as a criterion standard. After carefully reviewing birth certificate records, previous studies have reported that they provide relatively valid information on race and ethnicity [16, 19]. Race and ethnicity from birth certificates are also used as standards for federal statistics such as intercensal population estimates [20–22]. Despite PPVs of 96% and above for most races, significant limitations of the data quality were described for individuals of AIAN origin.
Misclassification of racial and ethnic minorities can lead to data misinterpretation and erroneous conclusions. Incorrect classification of individuals of a small minority group may lead to over- or underestimation of health disparities and race-related risk factors. Therefore, accurate racial and ethnic information is crucial for health care research.
Results of the present study suggest that the overall quality of racial and ethnic information is relatively good for distinguishing between Hispanics and non-Hispanics, Whites, and Blacks. Our results also show that use of health plan administrative records alone leads to frequent misclassification of minority groups and individuals of multiple races. Thus, linking birth certificate information to the administrative records of children can optimize the accuracy of race and ethnicity classification if this information is available.
This research was supported by the National Institute of Diabetes and Digestive and Kidney Diseases at the National Institutes of Health [R21DK085395, PI: Koebnick] and by Kaiser Permanente Direct Community Benefit Funds.
- Adams WG, Mann AM, Bauchner H: Use of an electronic medical record improves the quality of urban pediatric primary care. Pediatrics. 2003, 111 (3): 626-632. 10.1542/peds.111.3.626.
- Bates DW, Ebell M, Gotlieb E, Zapp J, Mullins HC: A proposal for electronic medical records in U.S. primary care. J Am Med Inform Assoc. 2003, 10 (1): 1-10.
- Bordowitz R, Morland K, Reich D: The use of an electronic medical record to improve documentation and treatment of obesity. Fam Med. 2007, 39 (4): 274-279.
- Flower KB, Perrin EM, Viadro CI, Ammerman AS: Using body mass index to identify overweight children: barriers and facilitators in primary care. Ambul Pediatr. 2007, 7 (1): 38-44.
- Garg AX, Adhikari NK, McDonald H, Rosas-Arellano MP, Devereaux PJ, Beyene J, Sam J, Haynes RB: Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005, 293 (10): 1223-1238. 10.1001/jama.293.10.1223.
- Kemper AR, Uren RL, Clark SJ: Adoption of electronic health records in primary care pediatric practices. Pediatrics. 2006, 118 (1): e20-e24. 10.1542/peds.2005-3000.
- Cebul RD: Using electronic medical records to measure and improve performance. Trans Am Clin Climatol Assoc. 2008, 119: 65-75.
- D'Avolio LW: Electronic medical records at a crossroads: impetus for change or missed opportunity?. JAMA. 2009, 302 (10): 1109-1111. 10.1001/jama.2009.1319.
- Dunn MJ: Benefits of electronic medical records outweigh every challenge. WMJ. 2007, 106 (3): 159-160.
- Wu RC, Straus SE: Evidence for handheld electronic medical records in improving care: a systematic review. BMC Med Inform Decis Mak. 2006, 6: 26.
- Coker TR, Elliott MN, Kataoka S, Schwebel DC, Mrug S, Grunbaum JA, Cuccaro P, Peskin MF, Schuster MA: Racial/Ethnic disparities in the mental health care utilization of fifth grade children. Acad Pediatr. 2009, 9 (2): 89-96.
- Stevens GD, Shi L: Effect of managed care on children's relationships with their primary care physicians: differences by race. Arch Pediatr Adolesc Med. 2002, 156 (4): 369-377.
- Elliott MN, Fremont A, Morrison PA, Pantoja P, Lurie N: A New Method for Estimating Race/Ethnicity and Associated Disparities Where Administrative Records Lack Self-Reported Race/Ethnicity. Health Serv Res. 2008.
- West CN, Geiger AM, Greene SM, Harris EL, Liu IL, Barton MB, Elmore JG, Rolnick S, Nekhlyudov L, Altschuler A, et al: Race and ethnicity: comparing medical records to self-reports. J Natl Cancer Inst Monogr. 2005, (35): 72-74.
- Gomez SL, Kelsey JL, Glaser SL, Lee MM, Sidney S: Inconsistencies between self-reported ethnicity and ethnicity recorded in a health maintenance organization. Ann Epidemiol. 2005, 15 (1): 71-79.
- Baumeister L, Marchi K, Pearl M, Williams R, Braveman P: The validity of information on "race" and "Hispanic ethnicity" in California birth certificate data. Health Serv Res. 2000, 35 (4): 869-883.
- Braveman P, Pearl M, Egerter S, Marchi K, Williams R: Validity of insurance information on California birth certificates. Am J Public Health. 1998, 88 (5): 813-816. 10.2105/AJPH.88.5.813.
- Brender JD, Suarez L, Langlois PH: Validity of parental work information on the birth certificate. BMC Public Health. 2008, 8: 95.
- Northam S, Knapp TR: The reliability and validity of birth certificates. J Obstet Gynecol Neonatal Nurs. 2006, 35 (1): 3-12.
- Schoendorf KC, Parker JD, Batkhan LZ, Kiely JL: Comparability of the birth certificate and 1988 Maternal and Infant Health Survey. Vital Health Stat 2. 1993, (116): 1-19.
- National Center for Health Statistics (NCHS): Documentation for Intercensal population estimates of the specified Hispanic origin groups. [http://www.cdc.gov/nchs/data/dvs/DOCUMENTATION.pdf]
- State of California DoF: Projected Total Population of California Counties 1990 to 2040. Edited by: State of California DoF. Sacramento. 1993
- Arday SL, Arday DR, Monroe S, Zhang J: HCFA's racial and ethnic data: current accuracy and recent improvements. Health Care Financ Rev. 2000, 21 (4): 107-116.
- Hasnain-Wynia R, Baker DW: Obtaining data on patient race, ethnicity, and primary language in health care organizations: current challenges and proposed solutions. Health Serv Res. 2006, 41 (4 Pt 1): 1501-1518.
- Harris DR, Sim JJ: Who is Multiracial? Assessing the Complexity of Lived Race. Am Sociol Rev. 2002, 67: 614-627. 10.2307/3088948.
- Tafoya SM, Johnson H, Hill LE: Who chooses to choose two?. The American People: Census 2000. Edited by: Farley R, Haga J. 2005, New York: Russell Sage Foundation, 332-351.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6963/10/316/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.