Health plan administrative records versus birth certificate records: quality of race and ethnicity information in children

Background To understand racial and ethnic disparities in health care utilization and their potential underlying causes, valid information on race and ethnicity is necessary. However, the validity of pediatric race and ethnicity information in administrative records from large integrated health care systems using electronic medical records is largely unknown. Methods Information on race and ethnicity of 325,810 children born between 1998 and 2008 was extracted from health plan administrative records and compared to birth certificate records. Positive predictive values (PPV) were calculated for correct classification of race and ethnicity in administrative records compared to birth certificate records. Results Misclassification of ethnicity and race in administrative records occurred in 23.1% and 33.6% of children, respectively; the largest share was due to missing ethnicity (48.3%) and race (40.9%) information. Misclassification was most common in children of minority groups. PPV for White, Black, Asian/Pacific Islander, American Indian/Alaskan Native, multiple and other was 89.3%, 86.6%, 73.8%, 18.2%, 51.8% and 1.2%, respectively. PPV for Hispanic ethnicity was 95.6%. Racial and ethnic information improved with increasing number of medical visits. Subgroup analyses comparing racial classification between non-Hispanics and Hispanics showed that classification of White, Black and Asian race was more accurate among non-Hispanics than Hispanics. Conclusions In children, race and ethnicity information from administrative records has significant limitations in accurately identifying small minority groups. These results suggest that the quality of racial information obtained from administrative records may benefit from supplementation with birth certificate data.


Background
Increasing attention has been given to the research potential of information collected in electronic health records [1][2][3]. Electronic health records have been successfully used to improve patient care [4][5][6]. Electronic health records also help to obtain important information on demographic and behavioural characteristics, medical conditions, and health care costs [7][8][9][10]. Among the most burning questions is the understanding of racial and ethnic disparities in health care utilization and their underlying causes [11,12]. To address these problems, valid race and ethnicity information is needed.
Many health plans collect race and ethnicity information from their members [13,14]. These data come from various sources such as insurance enrollment forms, inpatient and outpatient visit information, and birth certificates, and their quality varies by source. Some studies indicated that the quality of these administrative data is fairly good in adults [14,15] but has some limitations for small minority groups such as American Indians [15]. The quality of race and ethnicity information for children, however, is largely unknown. Relatively frequent medical visits at a young age, during which children are accompanied by a parent, may result in higher quality of race and ethnicity information for children than for adults.
Information from birth certificates is considered a criterion standard because it is nearly universal, includes self-reported race and ethnicity, and has been frequently validated [16][17][18][19]. While race and ethnicity information from birth certificates has been shown to provide a valid data source with positive predictive values (PPV) for most races above 96%, known limitations exist for Native Americans [16].
To fill the knowledge gap on the quality of race and ethnicity information for children in the administrative records of integrated health care systems, we compared information from these administrative records of a large managed health care system to the maternal and paternal race and ethnicity information obtained from birth certificates. We also investigated the main sources of racial and ethnic misclassification and the effect of health care utilization on the quality of race and ethnicity information, taking into consideration that information in the electronic health record of an integrated health care system is constantly updated.

Race and ethnicity information from birth certificate records
Race and ethnicity from birth certificates are often used for federal statistics, particularly for intercensal population estimates and annual statistical tabulations regarding maternal and child health [20][21][22]. Because all children were born in KPSC-owned hospitals, birth certificate information was collected by clerks during the hospital stay and was based on parental self-report. Information on maternal and paternal race from the KPSC birth certificate record database was used as the criterion standard to classify children as White, Black, Asian/Pacific Islander (PI), American Indian/Alaskan Native (AIAN), other race, or multiple races. If the maternal and paternal race were not identical, children were classified as multiple races. The paternal race from birth certificates was unknown in 20.5% of the children. These children were classified according to the maternal race information obtained.
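The classification rule described above can be written as a short function. The following is a minimal sketch, not the study's actual code: the function name, the use of `None` for unknown values, and the category label strings are illustrative assumptions. The text does not state how an unknown maternal race was handled, so the fallback to paternal race is also an assumption.

```python
from typing import Optional


def classify_child_race(maternal: Optional[str], paternal: Optional[str]) -> Optional[str]:
    """Derive a child's criterion-standard race from parental races.

    None stands for an unknown parental race (hypothetical encoding).
    """
    if paternal is None:
        # Stated in the text: unknown paternal race -> use maternal race.
        return maternal
    if maternal is None:
        # Assumption: not described in the text; mirrors the paternal rule.
        return paternal
    # Identical parental races are kept; differing races become "Multiple".
    return maternal if maternal == paternal else "Multiple"
```

For example, a child with a White mother and an Asian/PI father would be assigned "Multiple" under this sketch, whereas a child with a Black mother and an unknown father would be assigned "Black".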
An infant's ethnicity was classified as Hispanic or non-Hispanic based on maternal and paternal ethnicity information obtained from birth certificates. If at least one parent was of Hispanic ethnicity, the infant was classified as Hispanic. Paternal ethnicity was unknown in 7.3%. These children were classified according to maternal ethnicity information. Maternal ethnicity was unknown in 53 children (<0.01%) who were classified based upon paternal information, or classified as unknown.
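The ethnicity rule can be sketched in the same way. The function name, the `None` encoding for unknown values, and the label strings are illustrative assumptions rather than the study's implementation.

```python
from typing import Optional


def classify_child_ethnicity(maternal: Optional[str], paternal: Optional[str]) -> str:
    """Derive a child's criterion-standard ethnicity from parental ethnicity.

    None stands for an unknown parental ethnicity (hypothetical encoding).
    """
    known = [value for value in (maternal, paternal) if value is not None]
    if "Hispanic" in known:
        # At least one Hispanic parent -> child classified as Hispanic.
        return "Hispanic"
    if known:
        # Only non-Hispanic information is available from the known parent(s).
        return "Non-Hispanic"
    # Neither parent's ethnicity is known.
    return "Unknown"
```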

Race and ethnicity information from health plan administrative records
Racial categories from health plan administrative records are collapsed to White, Black, AIAN, Asian/PI, multiple races, other races, and unknown/missing races. Ethnic categories are Hispanic and non-Hispanic. Information on race, ethnicity, and language preference is collected at health plan enrollment, as well as during inpatient and outpatient medical visits. These are referred to as administrative records. Medical staff are asked to update these administrative records and, therefore, information can change over time. For the present study, information on race and ethnicity was extracted as of Dec 31, 2008. Administrative records include information from three different sources using the most recent information: (1) the Kaiser Foundation System, which is a management information system for health plan administration and accounting; (2) the electronic medical record (EMR) system HealthConnect; and (3) a hospital inpatient information system which was used before EMRs were implemented. No information from birth certificates was included in these sources. Within these sources, language preference for medical visits and other contacts provided by the patient or guardian was used to supplement this information. A KPSC member is classified as Asian/PI race if any Asian language is preferred. A KPSC member is classified as Hispanic if any Spanish language is preferred. If the three informational sources deliver contradictory information on race (other than unknown information), the race is classified as multiple.
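One possible reading of these combination rules is sketched below. Several details are assumptions not confirmed by the text: language preference is used here only to fill in a value when no source records one (an override reading is equally possible), the set of Asian languages is purely illustrative, and all names and label strings are hypothetical.

```python
from typing import Iterable, Optional

# Illustrative only; the study does not list which languages were counted.
ASIAN_LANGUAGES = {"Chinese", "Vietnamese", "Korean", "Tagalog"}


def classify_admin_race(source_races: Iterable[Optional[str]],
                        preferred_language: Optional[str] = None) -> str:
    """Combine race values from the three administrative sources."""
    known = {race for race in source_races if race is not None}
    if len(known) > 1:
        # Contradictory known values across sources -> multiple races.
        return "Multiple"
    if len(known) == 1:
        return next(iter(known))
    # Assumption: language supplements the record only when race is missing.
    if preferred_language in ASIAN_LANGUAGES:
        return "Asian/PI"
    return "Unknown"


def classify_admin_ethnicity(recorded: Optional[str],
                             preferred_language: Optional[str] = None) -> str:
    """Return recorded ethnicity, supplemented by Spanish language preference."""
    if recorded is not None:
        return recorded
    if preferred_language == "Spanish":
        return "Hispanic"
    return "Unknown"
```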

Statistical analysis
We calculated the racial/ethnic distribution of the study population based on birth certificates as the criterion standard and compared it with administrative records. We also calculated sensitivity (the conditional probability that a specific race/ethnicity according to birth certificates is correctly classified as such in administrative records) and positive predictive value (PPV; the proportion of children correctly classified in administrative records as being of a specific race/ethnicity among all children assigned this race/ethnicity in administrative records). Sensitivity was calculated with and without subjects who had missing race/ethnicity information in the administrative records to distinguish between misclassification of race/ethnicity and misclassification due to non-classification. The distribution of unknown/missing race in the administrative records was comparable among most races/ethnicities (12.5-15.5%) except for AIAN (22.8%). Multivariable logistic regression models were used to estimate the relationship of correct classification of race/ethnicity with the length of health insurance coverage, number of medical encounters, and race and ethnicity. Odds ratios (OR) and their corresponding 95% confidence intervals (CI) are given. Statistical software package PASW Statistics 17.0 was used (SPSS Inc., Chicago, IL).
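To make the two measures concrete, the sketch below computes sensitivity and PPV for one category from paired reference (birth certificate) and recorded (administrative) values, with an option to exclude children whose administrative value is missing. The function name, the `None` encoding for missing values, and the toy data are illustrative assumptions; the analysis itself was run in PASW Statistics.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_and_ppv(reference: Sequence[str],
                        recorded: Sequence[Optional[str]],
                        category: str,
                        exclude_missing: bool = False) -> Tuple[float, float]:
    """Sensitivity and PPV of `recorded` against `reference` for one category."""
    pairs = list(zip(reference, recorded))
    if exclude_missing:
        # Drop children with no administrative value (non-classification).
        pairs = [(ref, rec) for ref, rec in pairs if rec is not None]
    true_pos = sum(1 for ref, rec in pairs if ref == category and rec == category)
    ref_total = sum(1 for ref, _ in pairs if ref == category)  # truly in category
    rec_total = sum(1 for _, rec in pairs if rec == category)  # recorded as category
    sensitivity = true_pos / ref_total if ref_total else float("nan")
    ppv = true_pos / rec_total if rec_total else float("nan")
    return sensitivity, ppv


# Toy data (not from the study): six children, one with a missing record.
reference = ["Hispanic"] * 4 + ["Non-Hispanic"] * 2
recorded = ["Hispanic", "Hispanic", None, "Non-Hispanic", "Non-Hispanic", "Hispanic"]
sens_all, ppv_all = sensitivity_and_ppv(reference, recorded, "Hispanic")
sens_known, ppv_known = sensitivity_and_ppv(reference, recorded, "Hispanic",
                                            exclude_missing=True)
```

In the toy data, excluding the missing record raises sensitivity because the denominator shrinks, while PPV is unchanged because a missing value is never counted as a positive classification.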

Identification of Hispanic Ethnicity in Administrative Records
According to administrative records, 43.1% of children were Hispanic, 48.2% non-Hispanic, and 8.8% had an unknown ethnicity. Sensitivity and PPV for Hispanic ethnicity were 76.9% and 95.6%, respectively (Table 2). Because most cases of misclassification were due to missing ethnicity information in administrative records (48.3% of misclassified cases), we also calculated sensitivity without those with missing information. If children with unknown ethnicity in administrative records were excluded, the sensitivity was 84.7%.
Identification of Hispanic ethnicity in administrative records was better in children whose parents were both Hispanic as compared to children with only one Hispanic parent (p < 0.001). In administrative records, 87.6% of children with parents who were both non-Hispanic or both Hispanic were identified in accordance with birth certificates. Among children with only one Hispanic parent, children with Hispanic mothers (66.2%) were more likely to be identified as Hispanic in administrative records than children with Hispanic fathers (26.9%, p < 0.001).
Correct identification of Hispanic ethnicity was positively associated with the duration of health insurance coverage (OR per year of health insurance coverage 1.03, 95%-CI 1.02-1.04) but not with the number of all medical encounters. However, among medical encounters, inpatient (OR for each additional encounter 1.22, 95%-CI 1.19-1.25) and emergency (OR for each additional encounter 1.17, 95%-CI 1.16-1.18) visits showed a strong association with correct classification of Hispanic ethnicity.
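As a reading aid for the per-encounter odds ratios, the sketch below shows how an OR per additional encounter compounds over several encounters under the usual logistic-model assumption that the effect is linear on the log-odds scale. The function names and the baseline probability are hypothetical; only the OR of 1.22 per inpatient encounter comes from the results above.

```python
def cumulative_odds_ratio(or_per_unit: float, units: int) -> float:
    """Odds ratio for `units` additional encounters, assuming log-linear effects."""
    return or_per_unit ** units


def shift_probability(baseline_probability: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Three additional inpatient encounters at OR 1.22 each.
or_three_inpatient = cumulative_odds_ratio(1.22, 3)

# Hypothetical baseline probability of correct classification (not from the study).
probability_after = shift_probability(0.70, or_three_inpatient)
```

Odds ratios act on odds rather than probabilities, so the resulting change in the probability of correct classification depends on the baseline chosen.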

Identification of Race in Administrative Records
According to administrative records, 56.8% of children were White, 9.8% Black, 9.2% Asian/PI, 0.2% AIAN, 9.8% of other race, 0.5% of multiple race, and 13.7% of unknown race. The overall sensitivity was 66.4%, but this was higher in the three largest racial groups (Table 2). The low sensitivity was mainly caused by high numbers of children with unknown or missing race/ethnicity in administrative records (40.9% of misclassified cases). When children with unknown race in administrative records were excluded, the overall sensitivity increased to 86.7%.
The sensitivity and PPVs were lowest in children of multiple races. Among incorrectly classified children of multiple races, 56.2% were not identified correctly because only the maternal race was recorded, and 17.5% because only the paternal race was recorded. If children with known race are counted as correctly classified when at least one parent's racial information was reflected correctly, the overall PPV increased to 95.7% with a sensitivity of 90.2%. Correct identification of race varied by ethnicity (Table 2), the number of medical encounters, and birth outcome. The odds ratio for correct identification of race in the administrative records was higher in non-Hispanics (OR 2.62, 95%-CI 2.56-2.68) than in Hispanics. Stillbirth (OR 0.005, 95%-CI 0.004-0.006) but not neonatal death (OR 0.99, 95%-CI 0.72-1.38) decreased the odds for correct race identification. The total number of medical encounters only slightly increased the odds for correct race identification (OR for each additional encounter 1.01, 95%-CI 1.00-1.01). Among all medical encounters, inpatient (OR for each additional encounter 1.32, 95%-CI 1.30-1.35) and emergency (OR for each additional encounter 1.09, 95%-CI 1.09-1.10) visits showed the strongest association with correct race classification. Duration of health insurance coverage was not associated with the odds for correct race identification. Patterns of racial classification deviating from birth certificates (i.e., misclassification) differed significantly between non-Hispanics and Hispanics (Figure 1).
When children with unknown race in administrative records were excluded, racial classification was more accurate in non-Hispanic Whites, Blacks, and Asians (PPV 81%). Hispanic children from minority groups were frequently misclassified as White.

Discussion
This study utilized the most recent race and ethnicity data collected as part of the administrative records of a large, integrated health plan and compared it to information available from birth certificates. The information was accurate for ethnicity and for the three largest racial groups (White, Black, and Asian). Two major causes of disagreement between administrative and birth certificate records were identified: (1) missing information in administrative records and (2) classification of children of multiple races based on information from only one parent. Eliminating these causes would increase the sensitivity for correct racial classification from 66.4% to 95.7%. Because race and ethnicity information in health plan administrative records is constantly updated, information was more accurate in children with more medical encounters. Sensitivity and PPVs were generally higher in non-Hispanics than in Hispanics. Limitations in data quality were noted for children of multiple races and children of AIAN origin.
The quality of racial and ethnic information in children has not been well studied. However, the results from the present study were comparable to two previous studies investigating race and ethnicity information in adults [14,15]. In these studies, PPVs for Whites and Blacks were between 86.7% and 95.1%. However, PPVs and sensitivity for small minority groups such as AIAN were generally poor [15]. Similarly, PPV and sensitivity for Hispanic adults were lower than for non-Hispanic Whites and Blacks. These patterns are generally consistent with the accuracy observed for racial and ethnic information in Medicare enrollment databases [23]. The present study also shows that the patterns of misclassification varied greatly between Hispanic and non-Hispanic children.
In the present study, one major reason for race/ethnicity misclassification in the administrative records was missing information (non-classification). After exclusion of non-classified individuals, the sensitivity improved significantly for Whites, Blacks, and AIAN. This partially explains the lower sensitivity observed in our study compared to other studies which excluded non-classified individuals from their study population [14,15]. Incomplete and missing information on race, ethnicity and language in databases from health care organizations has been reported previously by others [24]. The results from our study suggest that birth certificate information is not routinely used to fill in missing information in administrative records, even when it is available, as in this setting.
The second important cause of disagreement between administrative records and birth certificates was the misclassification of children whose parents were of different races (i.e., multiple races). Among children of multiple races, the vast majority were misclassified because racial information from only one parent (mostly maternal information) was used for classification purposes. One possible explanation for this misclassification is an often observed simplification of multiracial heritage. Multiple races are often reported as one main race [25,26]. Multiracial identification varies across regions and races; in particular, AIAN are less likely to report themselves as multiracial [26]. It may also be speculated that maternal presence during birth as well as at later medical encounters accounts for this observation.
The present study adds new information on changes in the quality of race information over the course of membership. Race and ethnicity data collected in an integrated health care system such as the one used in the present study are updated during medical visits, as opposed to other settings such as health insurance claims, where race/ethnicity information is usually collected at enrollment. The present study shows that the quality of information increased over time with increasing number of medical encounters, especially inpatient visits. Although the effects may differ in magnitude by organization, we assume that our results are generalizable to other integrated health care settings that update their patients' demographic data during office visits.
Our study benefited from the substantial size of a diverse population with adequate numbers of Hispanic and non-Hispanic racial and ethnic group representation to generate ample statistical power and allow valid estimates of sensitivity and PPVs. A limitation of the present study is the use of information obtained from birth certificate records as a criterion standard. After carefully reviewing the birth certificate records, previous studies have reported that birth certificate records provide relatively valid information on race and ethnicity [16,19]. Race and ethnicity from birth certificates are also used as standards for federal statistics such as intercensal population estimates [20][21][22]. Despite PPVs of 96% and above for most races, significant limitations of the data quality were described for individuals of AIAN origin.
Misclassification of racial and ethnic minorities can lead to data misinterpretation and erroneous conclusions. Incorrect classification of individuals of a small minority group may lead to over- or underestimation of health disparities and race-related risk factors. Therefore, accurate racial and ethnic information is crucial for health care research.

Conclusions
Results of the present study suggest that the overall quality of racial and ethnic information is relatively good for distinguishing between Hispanics and non-Hispanics, Whites, and Blacks. Our results also show that use of health plan administrative records alone leads to frequent misclassification of minority groups and individuals of multiple races. Thus, linking birth certificate information to the administrative records of children can optimize the accuracy of race and ethnicity classification if this information is available.