Validation of use of billing codes for identifying telemedicine encounters in administrative data

Background: Telemedicine is the use of telecommunication technology to remotely provide healthcare services. Evaluation of telemedicine use often relies on administrative data, but the validity of identifying telemedicine encounters in administrative data is not known. The objective of this study was to assess the accuracy of billing codes for identifying telemedicine use. Methods: In this retrospective study of encounters within a large integrated health system from January 2016 to December 2017, we examined the accuracy of billing codes for identifying live-interactive and store-and-forward telemedicine encounters compared to manual chart review. To further examine external validity, we applied these codes and assessed patient and visit characteristics for identified live-interactive telemedicine encounters and store-and-forward telemedicine encounters in a second data set. Results: In manual review of 390 encounters, 75 encounters were live-interactive telemedicine and 158 were store-and-forward telemedicine. In weighted analysis, the presence of the GT modifier in the absence of the GQ modifier or CPT code 99444 yielded 100% sensitivity and 99.99% specificity for identification of live-interactive telemedicine encounters. The presence of either the GQ modifier or the CPT code 99444 had 100% sensitivity and 100% specificity for identification of store-and-forward telemedicine encounters. Applying these algorithms to a second data set (n = 5,917,555) identified telemedicine encounters with expected patient and visit characteristics. Conclusions: These findings provide support for use of CPT codes to perform telemedicine research in administrative data, aiding ongoing work to understand the role of non-face-to-face care in optimizing health care delivery.


Background
Telemedicine encompasses the use of telecommunication technology to remotely provide healthcare services that support patient care [1]. Telemedicine can take many forms, with some models facilitating live interactive communication between patients and clinicians ("synchronous") while other models record and send patient information, allowing clinicians to review and respond to this information at a later time (store-and-forward, or "asynchronous").
Both of these models have the potential to overcome geographic barriers and to enhance the timeliness of care. While these features certainly have appeal, potential concerns with telemedicine include the possibility of increased unnecessary visits due to the greater ease of access (contributing to increased health care costs), the potential for lower quality of care (particularly if clinical decisions are made with inadequate information), and the potential to perpetuate or exacerbate access disparities [2][3][4]. Thus, understanding the use, trends, and impact of telemedicine across multiple specialties and applications is vital to promoting telemedicine use that facilitates more effective, efficient, equitable, and patient-centered care delivery.
Administrative data are an important resource for understanding trends in telemedicine use and for assessing the effectiveness of telemedicine compared to other care delivery strategies. Administrative data analyses of telemedicine encounters rely on billing modifiers to identify telemedicine visits. Telemedicine billing modifiers were designated by the United States Centers for Medicare and Medicaid Services (CMS) to indicate encounters that used two specific types of telemedicine: live interactive telemedicine (GT) and store-and-forward telemedicine (GQ). Through identification of these codes, analyses of administrative data have contributed to our current understanding of telemedicine practices. For example, a recent analysis of a Minnesota All-Payers Claims Database elucidated a comprehensive picture of telemedicine use across multiple payer types and patient geographies [5]. Other studies used administrative data to study telemedicine for mental health services [6], substance use disorder treatments [7], primary care [8], and acute respiratory infections [9]. Prior analyses have also highlighted trends in telemedicine use among specific populations, such as rural communities [10] and children [11].
While billing codes within administrative data are widely used to identify telemedicine encounters, we are not aware of any validation studies comparing these codes to a gold standard of chart review. Understanding the sensitivity and specificity of these codes for identification of telemedicine in general, as well as for synchronous versus asynchronous telemedicine specifically, will allow for clearer understanding of the strengths and limitations of this approach for identification of telemedicine encounters. Thus, in this study, we assessed the validity of using billing modifiers to identify telemedicine encounters within a large integrated health system.

Study design and data sources
Our study had two parts. First, we examined the accuracy of billing codes in identifying telemedicine encounters, the goal of which was to determine the criterion validity of use of these billing codes (i.e., to what degree do these codes reflect the occurrence of a telemedicine encounter?). To accomplish this, we performed manual chart review and compared billing codes to chart review findings, allowing determination of sensitivity and specificity of specific combinations of billing codes for telemedicine encounters and refinement of billing code algorithms for identifying telemedicine encounters.
Second, we examined the frequency with which telemedicine encounters were identified using billing code algorithms, overall and by encounter characteristics, to provide external validity for use of these billing codes in a separate data set (i.e., after applying the validated algorithms to another data set, to what degree do identified telemedicine encounters reflect expected characteristics of telemedicine encounters?). To accomplish this, we applied the billing code algorithms with the best performance during the prior step, and compared the patient and visit characteristics of identified telemedicine encounters.
For both parts of this analysis, we used data from a 40-hospital, 4900-physician integrated health system spanning Western Pennsylvania. Multiple electronic health records are used across inpatient and outpatient services within this system, and our analysis was not limited to a specific electronic health record or a specific telemedicine platform. We used University of Pittsburgh Medical Center (UPMC) clinical data obtained through the University of Pittsburgh Health Record Research Request (R3) Service. For the first part of the study, we sampled encounters from a data set from January 1-June 30, 2016. For the second part of the study, we used a complete encounter data set from July 1, 2016-December 31, 2017. Analyses were conducted in StataSE 14 (StataCorp), and this project was approved by the University of Pittsburgh Institutional Review Board.

Candidate components of telemedicine identification algorithms
Prior to analysis, we identified multiple potential components of algorithms for identification of telemedicine, based on review of prior approaches [5,8,10] and on insurer bulletins [12]. Specifically, we focused on two modifiers (GT and GQ), which were developed for indication of live interactive telemedicine and store-and-forward telemedicine, respectively. We also investigated additional CPT codes indicating telemedicine and online services (e.g., 99444, G0425-G0427) delivered by a physician, nurse practitioner, or physician assistant. We did not investigate CPT codes designed for use by allied health professionals (e.g., G0270) to allow more focused study and because of the potential for such services to be documented outside of systems available for manual review.
From this set of encounters, a random stratified sample of 390 encounters was selected for chart review. To ensure adequate representation of different combinations of billing codes in the sample, we randomly sampled approximately 130 encounters with the GT modifier, 130 with the GQ modifier, and 130 with neither modifier, with this sample size set to provide estimates with a margin of error at or below 5%. Within each modifier group (GT, GQ, and neither), we sampled an equal number of inpatient and outpatient encounters (if available) to ensure sampling of encounters across potential sources of variability in coding accuracy (Table 1). The GQ modifier was only associated with outpatient billing codes, however, and therefore all 130 encounters with the GQ modifier were sampled from outpatient billing code encounters.
Manual review of two electronic medical record systems was then conducted by trained reviewers using a structured instrument to assess and record whether the selected encounter was completed through telemedicine, whether the telemedicine used was synchronous or asynchronous, and additional features of telemedicine encounters, when identified. We determined whether the visit was telemedicine and what type of telemedicine encounter first by reviewing the full text of the clinical documentation, second by searching specifically for common keywords and phrases in the encounters ("telemedicine", "audio", "visual", "electronic"), and third by specifically reviewing the exam documentation for indication of in-person versus remote evaluation. Any indication of telemedicine through these three steps was considered sufficient to categorize that visit as a telemedicine encounter.
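The stratified draw described above can be illustrated with a minimal Python sketch. This is not the authors' sampling code (the study analyses were run in Stata); the function name `stratified_sample`, the `stratum_of` callback, and the toy encounter records are assumptions made for illustration only.

```python
import random

def stratified_sample(encounters, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` encounters at random from each stratum.

    `stratum_of` is a hypothetical callback mapping an encounter to its
    stratum label (for example "GT", "GQ", or "neither").
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = {}
    for enc in encounters:
        by_stratum.setdefault(stratum_of(enc), []).append(enc)
    # Small strata contribute every available encounter.
    return {
        label: rng.sample(members, min(per_stratum, len(members)))
        for label, members in by_stratum.items()
    }

# Toy data (made up): 500 unmodified, 200 GT, and 50 GQ encounters.
toy = (
    [{"id": i, "modifier": "neither"} for i in range(500)]
    + [{"id": 500 + i, "modifier": "GT"} for i in range(200)]
    + [{"id": 700 + i, "modifier": "GQ"} for i in range(50)]
)
sample = stratified_sample(toy, lambda enc: enc["modifier"], per_stratum=130)
```

Because the strata are sampled at very different rates, any estimate computed from such a sample needs to be weighted back to the stratum sizes in the full data.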
Among visits determined to be telemedicine (either live interactive or store-and-forward), we also extracted additional elements to examine the specific contexts of telemedicine use. Specifically, to understand the context of the telemedicine encounter relative to in-person care, we extracted whether any prior or follow-up visits occurred with the same specialty in the year prior or the year subsequent to the telemedicine encounter.
To achieve high reliability in this manual review of encounters, we first had two reviewers (DY, KNR) review 6 training encounters outside of the sample, after which discrepancies were discussed encounter by encounter. Then the first 20 encounters in the sample were independently reviewed by these 2 reviewers with discrepancies discussed, after which the next 40 encounters were again reviewed independently by 2 reviewers with discrepancies discussed. The Cohen's kappa statistic for the first set of 20 was 0.92 and for the second set of 40 was 1.00, indicating a high level of agreement between reviewers. The remaining encounters were reviewed by a single reviewer (DY). Encounters were reviewed in a random order, and the reviewer was blinded to the billing codes for each encounter. Sampled encounters with no available clinical documentation were replaced with another randomly selected encounter from the same sampling stratum, with 2 additional GQ encounters, no additional GT encounters, and 22 additional encounters with neither modifier sampled. Data were entered into REDCap electronic data capture tools hosted at the University of Pittsburgh.
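For readers unfamiliar with the agreement statistic reported above, Cohen's kappa compares observed agreement with the agreement expected by chance from each reviewer's marginal category frequencies. The sketch below is a generic illustration, not the authors' code; the function name and the toy ratings are assumptions.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy ratings (made up): the reviewers disagree on one of six encounters.
reviewer_1 = ["live", "live", "saf", "none", "none", "none"]
reviewer_2 = ["live", "live", "saf", "none", "none", "saf"]
kappa = cohens_kappa(reviewer_1, reviewer_2)  # approximately 0.75 here
```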

Chart review analysis
Results from manual review of encounters were weighted to account for the proportion of each stratum in the available encounters relative to the sampled encounters (Table 1). Applying survey functions to account for these weights, we calculated sensitivity and specificity of these billing codes for identification of telemedicine visits, including 95% confidence intervals (CIs). We determined these values for multiple combinations of telemedicine types (live interactive telemedicine, store-and-forward, or either) and multiple permutations of billing codes (GT or GQ as well as specific sets of CPT codes; Table 2), allowing manual identification of preferable algorithms for identifying telemedicine encounters. We also determined the positive predictive value (PPV) and negative predictive value (NPV) for each combination of billing codes. After selecting the preferred algorithm for identification of each type of telemedicine visit in the overall data, we further determined the sensitivity, specificity, PPV, and NPV for these algorithms within specific subsets of visits identified by CPT codes (e.g., inpatient, outpatient, etc.). Finally, we used descriptive statistics to compare encounter review findings for live-interactive versus store-and-forward telemedicine in terms of prior and follow-up visits within the same department.
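The weighting step can be sketched as follows. Each reviewed encounter is assumed to carry a weight equal to the number of available encounters in its stratum divided by the number sampled from that stratum, and the 2x2 table is accumulated in weighted counts. This is a simplified illustration and not the Stata survey routines used in the study: it produces point estimates only (no confidence intervals), and the function names and toy numbers are assumptions.

```python
from collections import Counter

def stratum_weights(population_counts, sample_counts):
    """Weight per sampled encounter: stratum size / number sampled."""
    return {s: population_counts[s] / sample_counts[s] for s in sample_counts}

def _ratio(numerator, denominator):
    # Return None rather than raising when a cell total is empty.
    return numerator / denominator if denominator else None

def weighted_accuracy(records, weights):
    """records: iterable of (stratum, code_positive, chart_positive)."""
    cells = Counter()
    for stratum, code_positive, chart_positive in records:
        cells[(code_positive, chart_positive)] += weights[stratum]
    tp, fp = cells[(True, True)], cells[(True, False)]
    fn, tn = cells[(False, True)], cells[(False, False)]
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }

# Toy example (made-up numbers, not the study data).
weights = stratum_weights({"GT": 1000, "none": 100000}, {"GT": 10, "none": 10})
records = (
    [("GT", True, True)] * 9        # coded and confirmed on chart review
    + [("GT", True, False)] * 1     # coded but not telemedicine on review
    + [("none", False, False)] * 10 # uncoded and not telemedicine
)
result = weighted_accuracy(records, weights)
```

In this toy input a single false positive in the small coded stratum barely moves the weighted specificity, because the uncoded stratum dominates the weighted count of true negatives, while it lowers the PPV noticeably.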

External validation data set
The second set of administrative data was obtained for the 18 months after the initial data set (July 1, 2016-December 31, 2017). Encounters were again identified for inclusion based on the same CPT codes as the prior data set. Encounters were not excluded based on patient age. Variables in this second data set included billing codes, billing modifiers, patient age, race/ethnicity, insurer, ZIP code, visit date, provider, and specialty department. Insurer type was categorized as commercial, public (Medicaid, Medicare), and self-pay/missing. Straight line distance from patient ZIP code centroid to the primary tertiary medical center location was determined using Stata calculations. County rural/urban status was determined using USDA rural-urban continuum codes [13]. Visit dates were categorized into quarters. Clinician specialties were summarized into the following groups: primary care, emergency medicine, dermatology, pediatric specialties, non-pediatric medical specialties, non-pediatric surgical specialties, psychiatry, and other. In this second data set, we determined whether a telemedicine visit occurred by applying the most sensitive and specific algorithms for live interactive telemedicine and store-and-forward telemedicine determined through the prior analysis. Specifically, live interactive telemedicine was defined as the presence of the GT modifier in the absence of the GQ modifier and the absence of the 99444 CPT code. Store-and-forward telemedicine was defined as the presence of the GQ modifier or the 99444 CPT code, regardless of the presence of the GT modifier. Telemedicine of either type was defined as the presence of the GT modifier, the GQ modifier, or the 99444 CPT code.
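The three definitions above translate directly into a small classification rule. The sketch below is one way to express them; the function name, the argument layout (a collection of modifiers and a collection of CPT codes per encounter), and the example code 99213 are assumptions for illustration rather than the structure of the study data.

```python
def classify_encounter(modifiers, cpt_codes):
    """Apply the preferred algorithms to one encounter.

    modifiers: collection of billing modifiers on the encounter, e.g. {"GT"}
    cpt_codes: collection of CPT/HCPCS codes, as strings, e.g. {"99444"}
    """
    has_gt = "GT" in modifiers
    has_gq = "GQ" in modifiers
    has_99444 = "99444" in cpt_codes

    # Store-and-forward: GQ or 99444, regardless of whether GT is present.
    if has_gq or has_99444:
        return "store-and-forward"
    # Live interactive: GT in the absence of both GQ and 99444.
    if has_gt:
        return "live-interactive"
    return "not telemedicine"

def is_any_telemedicine(modifiers, cpt_codes):
    """Telemedicine of either type: GT, GQ, or 99444 present."""
    return classify_encounter(modifiers, cpt_codes) != "not telemedicine"
```

Checking the store-and-forward condition first makes the precedence explicit: an encounter carrying both GT and GQ is classified as store-and-forward, matching the definition above.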

External validation analysis
Using chi-square tests, we compared visit characteristics for non-telemedicine visits, live interactive telemedicine visits, and store-and-forward visits to determine external validity in this separate data set. We also assessed the unique number of providers delivering care via each type of visit.
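As a reminder of what the chi-square comparison computes, the sketch below derives the Pearson chi-square statistic and its degrees of freedom for a contingency table of visit type by characteristic. It is a generic illustration, not the Stata procedure used in the study; it assumes every row and column total is nonzero, omits the p-value (which requires the chi-square distribution), and uses made-up counts.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    table: list of rows of observed counts; all row and column totals
    are assumed to be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Toy table (made-up counts): rows are visit types, columns are
# metropolitan versus nonmetropolitan residence.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```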
Based on prior studies [6] and insurer policies [14], we hypothesized that live interactive telemedicine would be more prevalent among individuals residing greater distances from the main tertiary medical center and residing in rural counties. We hypothesized that store-and-forward telemedicine would be more prevalent among younger adults than older adults, because younger patients have been reported in prior evaluations of store-and-forward dermatology encounters [15]. We hypothesized that both types of telemedicine would be more prevalent in more recent quarters given reports of increasing telemedicine use [8,11].

Chart review
The first data set included 2,096,002 encounters without GT or GQ modifiers, 2,628 with the GT modifier, and 258 with the GQ modifier. Among these, a sample of 390 were reviewed. Through manual encounter review of these 390 encounters, we identified 75 live interactive telemedicine visits, 158 store-and-forward telemedicine visits, and 157 visits that were not telemedicine. Of note, 5 of the encounters sampled based on the presence of the GT modifier also contained the GQ modifier (4%), and 108 of the encounters sampled based on the presence of the GQ modifier also contained the GT modifier (82%). After adjusting for prevalence of each sampling stratum through survey weights, the presence of the GT and/or GQ modifier had 100% sensitivity and 99.99% specificity (95% CI 99.98-99.99%) for broadly identifying telemedicine encounters (Table 2), including either store-and-forward or live interactive telemedicine visits. In this sample, the positive predictive value (PPV) was 90.8% (95% CI 86.0-94.0%) and the negative predictive value (NPV) was 100%.
For identification of live interactive telemedicine visits, the presence of the GT modifier had 100% sensitivity and 99.93% specificity (95% CI 99.92-99.95%). Identifying live-interactive telemedicine with the presence of GT modifier after excluding cases that also had the GQ modifier or the CPT code 99444 improved the specificity to 99.99% (95% CI 99.98-99.99%) while maintaining 100% sensitivity, making this the preferred algorithm for live interactive telemedicine identification. Because telehealth codes (G0406-G0408; G0425-G0427; G0508-G0509) consistently co-occurred with the GT modifier in this sample, algorithm sensitivity/specificity neither improved nor worsened with the addition of these codes into identification algorithms.
For identification of store-and-forward telemedicine visits, the presence of the GQ modifier had 36.36% sensitivity (95% CI 25.2-49.3%) and 100% specificity. Using the presence of either the GQ modifier or the CPT code 99444 had 100% sensitivity and 100% specificity for identification of store-and-forward telemedicine visits, making this the preferred algorithm for store-and-forward identification.
Among only outpatient visits, the presence of the GT and/or GQ modifier had 100% sensitivity and 99.99% specificity (95% CI 99.98-100%), with PPV of 97.0% (95% CI 89.3-99.2%) and NPV of 100% (Table 3). When examining only inpatient visits, the presence of the GT and/or GQ modifier had 100% sensitivity and 99.98% specificity (95% CI 99.97-99.98%), with 64.6% PPV (95% CI 52.1-75.4%) and 100% NPV. Using encounter review data, we also compared the overall clinical context of live-interactive versus store-and-forward telemedicine encounters. In general, live-interactive telemedicine visits more often occurred within the context of ongoing clinical relationships, while store-and-forward visits more often occurred as isolated encounters. Specifically, from unweighted encounter review results, the percentage of encounters in which patients had a prior in-person visit in the same department was 23% among live interactive telemedicine encounters, compared to 8% of store-and-forward telemedicine encounters (p = 0.002). Similarly, the percentage of encounters in which patients had a subsequent follow-up visit in the same department (in-person and/or telemedicine) was 57% among live interactive telemedicine encounters and 33% of store-and-forward encounters (p < 0.001).

External validation
We then applied the preferred telemedicine identification algorithms to the second dataset to examine external validity of this algorithm outside of the original chart review data set. The second data set (July 1, 2016-December 31, 2017) included 5,917,555 encounters with over 6000 clinicians. Among these encounters, 888,365 (15%) were by ≤17-year-old patients and 2,138,653 (36%) were by > 65-year-old patients (Table 4). Similar proportions of encounters were covered by commercial (47%) and public insurance (49%).
Visits identified as live-interactive telemedicine were associated with expected encounter characteristics. Specifically, live-interactive telemedicine visits were more likely to be with patients living farther from the tertiary care center (60-90 miles: 65% of live interactive telemedicine visits versus 14% of in-person visits; Table 4). Live interactive telemedicine visits were also more commonly from nonmetropolitan communities (66% of live interactive telemedicine encounters versus 14% of in-person visits). Additionally, a greater percentage of live interactive telemedicine visits occurred in the most recent quarter (20% in the fourth quarter of 2017, increased from 15.2% in the third quarter of 2016). These findings were consistent with our expectations for live-interactive telemedicine visits. Additionally, live-interactive telemedicine encounters were predominantly associated with consultation CPT codes (inpatient, outpatient, and telehealth) and with age > 65 years old (42%).
Visits identified as store-and-forward telemedicine encounters were also associated with expected encounter characteristics. Specifically, these encounters were most likely to occur among young adults (25-44-year-old: 55% of store-and-forward encounters versus 16% of in-person encounters). Encounters identified as store-and-forward were also predominantly in more recent quarters (fourth quarter of 2017: 23% of all store-and-forward encounters, increased from 14% in the third quarter of 2016). These findings further support the validity of the identification algorithm in this category as well. Additionally, we found that store-and-forward telemedicine encounters were more likely to occur with patients in close proximity to the tertiary care center.

Discussion
With increasing interest in and use of telemedicine, health services researchers require accurate means to identify telemedicine encounters in order to identify trends in use, clarify disparities in use, and determine impact on health outcomes. In this analysis, we examined the validity of using specific modifiers and billing codes to identify any telemedicine encounter and to identify specific subtypes of telemedicine encounters (live-interactive and store-and-forward). Using the GT and GQ modifiers and the 99444 CPT code together to identify any telemedicine encounter had 100% sensitivity and 99.99% specificity for telemedicine encounters. Our analysis provides strong support for use of these codes in telemedicine research, essential for ongoing work to understand the role of non-face-to-face care in optimizing health care delivery.
Through this analysis, we examined criterion validity for identification of telemedicine encounters with billing codes, and we refined algorithms to achieve high sensitivity and specificity. We subsequently further assessed external validity by examining the degree to which identified telemedicine encounters were associated with expected patient and visit characteristics in a second data set. Results in this second phase of our analysis were consistent with hypothesized findings, providing evidence of external validity. Specifically, we anticipated increased use of live-interactive and of store-and-forward telemedicine over time, which we observed in these data. Also consistent with our expectations, live interactive telemedicine encounters were more commonly observed among patients who lived in more rural communities and at greater distance from the tertiary care center. These findings are consistent with prior literature [5,6] and consistent with Medicaid and Medicare policy during the study period [14]. Specifically, Medicare telemedicine reimbursement was limited to health professional shortage areas during the study period, and Pennsylvania Medicaid policy in place during the study period suggests that providers should consider travel time greater than 60 min in rural areas or greater than 30 min in an urban area when considering telemedicine use [14]. Also consistent with prior studies and expected findings, we observed higher use of store-and-forward telemedicine among younger adults compared to older adults. Altogether, these findings provide additional support for the validity of use of billing claims to identify telemedicine and to differentiate between live-interactive and store-and-forward telemedicine.
We also observed additional associations between patient characteristics and use of each subtype of telemedicine beyond those we hypothesized. These additional findings add further to our understanding of the use of live-interactive telemedicine versus store-and-forward telemedicine. For example, live interactive telemedicine encounters were more common among adults over 65 years of age and publicly insured individuals. The consistency in federal regulations governing use of live interactive telemedicine among Medicare recipients, in contrast to the complexity of policies and practices among commercial insurers and Medicaid managed care organizations, may contribute to this finding. In contrast, store-and-forward telemedicine use appeared to be primarily among commercially insured individuals. Additionally, the patients receiving telemedicine encounters were disproportionately white (94% of live interactive telemedicine encounters; 93% of store-and-forward telemedicine encounters with non-missing race), raising concerns about whether use of telemedicine is enhancing access uniformly versus perpetuating disparities.
In examining the clinical context of telemedicine visits, it appears that specialties primarily offer one of the two models of telemedicine, rather than both. For example, dermatology and emergency medicine department clinicians provided a large number of store-and-forward encounters and no live-interactive telemedicine encounters. In contrast, medical specialties and surgical specialties provided the bulk of live-interactive telemedicine encounters and rare store-and-forward encounters. This may be appropriate adoption of the technology that best fits a field's clinical care, but may also reflect the difficulties behind starting telemedicine service lines, requiring departments to focus on one technology and workflow at a time. From our manual review of encounters, we also observed different longitudinal contexts of each telemedicine type, with live interactive telemedicine associated more often than store-and-forward telemedicine with prior in-person visits and with subsequent follow-up visits. These additional findings help to characterize the contexts of live interactive and store-and-forward telemedicine use within a large health care system, adding to the main findings which validate use of claims for identification of telemedicine encounters.
One limitation of our analysis is that we examined encounters within one health system. However, the health system includes over 40 hospitals and thousands of providers across a large geographic area, and we used separate data from a more recent time period for the second phase of our analysis. Additionally, telemedicine encounters remain a small percentage of overall care in this system, such that there is the potential for billing and coding to be relatively idiosyncratic; however, we did identify 103 unique providers contributing to live interactive telemedicine claims and 76 unique providers contributing to store-and-forward claims. An additional limitation is that telemedicine billing continues to evolve, with Medicare adding unique telemedicine place of service codes in 2018 and proposing additional telemedicine-specific encounter codes in 2019 [12]. While future work may need to update this analysis, our analysis remains an important step in validating the most common billing codes in current analyses. Finally, we note that because positive and negative predictive values are influenced by prevalence, the PPVs and NPVs will not be applicable to health systems in which the prevalence of telemedicine use differs significantly from the prevalence observed in our system.
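The dependence of predictive values on prevalence follows from Bayes' rule, and a short sketch makes the point concrete. The function names and the prevalence values below are illustrative assumptions, not figures from the study; only the 100% sensitivity and 99.99% specificity inputs are taken from the results above.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Same test characteristics, two hypothetical prevalences of telemedicine.
ppv_at_1_in_1000 = ppv(1.0, 0.9999, 0.001)
ppv_at_1_in_10000 = ppv(1.0, 0.9999, 0.0001)
```

With sensitivity and specificity held fixed, the PPV falls as telemedicine becomes rarer, which is why the predictive values reported here should not be carried over to settings with a very different prevalence.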

Conclusion
We identified algorithms with high sensitivity and specificity for identification of telemedicine encounters overall and for identification specifically of live-interactive telemedicine and store-and-forward telemedicine encounters, and we provide additional evidence of validity through assessment of the association of each type of telemedicine encounter with expected patient and visit characteristics. These findings provide strong methodological support for use of CPT codes in telemedicine research, essential for ongoing work to understand the role of non-face-to-face care in optimizing health care delivery.