Sample and study design
We reviewed all eligible insurance claims for low back pain (LBP) using the 2015–2016 Health Care Cost Institute (HCCI) database, which includes health insurance claims data for approximately 50 million insured individuals per year in the US across four private health insurance companies (Aetna, Humana, United Healthcare, and Kaiser Permanente), including claims covered by a Medicare Advantage plan. The data that support the findings of this study are available from HCCI (https://healthcostinstitute.org), but restrictions apply to the availability of these data, which were used under license for a fee to conduct the current study, and so are not publicly available. HCCI is an independent, non-profit organization that licenses access to insurance claims data via secure enclave. We merged county-level data from the 2015 Area Health Resource File (AHRF) and distance information from the National Bureau of Economic Research (NBER) [23, 24].
We restricted the sample to include individuals 18 years or older who lived and received services within the 50 states and the District of Columbia. Individuals were excluded if they had a diagnosis of LBP, a serious illness associated with non-musculoskeletal LBP (S1 Table A), or an opioid prescription, as defined by National Drug Codes identified by the Centers for Disease Control and Prevention, in the 6 months prior to the index date (the date of diagnosis of low back pain), which we refer to as a “clean period”. We further restricted the sample to those individuals who had continuous insurance enrollment for one year after the index date. LBP was defined using ICD-9/10-CM codes from the literature that were frequently used to designate nonspecific LBP diagnoses (S1 Table B) [26, 27].
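The clean-period and enrollment restrictions above can be sketched as a simple eligibility check. This is a minimal illustration on hypothetical claim records, not the HCCI schema or the authors' actual code; the field names and category labels are invented for the example.

```python
from datetime import date, timedelta

CLEAN_PERIOD_DAYS = 180   # 6-month clean period before the index date
FOLLOW_UP_DAYS = 365      # required continuous enrollment after the index date

def is_eligible(index_date, prior_claims, enrollment_end, age):
    """Illustrative eligibility check.

    prior_claims: list of (service_date, category) tuples, where category is
    one of 'lbp', 'serious_illness', or 'opioid' (hypothetical labels).
    """
    if age < 18:
        return False
    clean_start = index_date - timedelta(days=CLEAN_PERIOD_DAYS)
    for service_date, category in prior_claims:
        # A disqualifying claim inside the clean period excludes the individual
        if clean_start <= service_date < index_date and \
                category in ("lbp", "serious_illness", "opioid"):
            return False
    # Require continuous enrollment for one year after the index date
    return enrollment_end >= index_date + timedelta(days=FOLLOW_UP_DAYS)
```

In practice the same logic would run over diagnosis and pharmacy claim tables rather than an in-memory list, but the exclusion rules are the ones stated above.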
We defined cohorts based on the first provider seen on the index date. Provider categories were included in the study based on the most frequent health care providers seen first for an episode of musculoskeletal LBP in the database. The provider categories examined were: 1) acupuncturist (Acu), 2) advanced practice registered nurse (APRN), 3) chiropractor (Chiro), 4) emergency medicine (EM), 5) orthopedic specialist (Ortho), 6) physical medicine and rehabilitation (PM&R), 7) physical therapist (PT), and 8) primary care physician (PCP), including family medicine, internal medicine, and osteopathic medicine. All other provider types not specified above were collapsed into a category of “Other”. The most common other provider types, which together made up about 75% of this category, included radiology (23%), anesthesiology (17%), unknown provider type (15%), other non-physician provider (6%), and neurological surgeon (5%). We assumed that these providers were not common choices for an individual with a new diagnosis of LBP, but we kept this Other category in the analysis to maximize sample size. For PT, we used the Current Procedural Terminology (CPT) code 97001, which is an evaluation code for new PT visits, and excluded individuals with the re-evaluation CPT code (97002).
In some cases, individuals saw more than one provider on the index date. We excluded these individuals to ensure a uniform analysis of treatment decisions associated with the first provider. Of the 525,663 individuals excluded, approximately 25% saw a PCP, 20% saw an EM provider, and 40% saw an “Other” type of provider; as such, the activities of these groups may be slightly underrepresented. Our final sample size was 3,799,593. Our study was approved by The George Washington University Institutional Review Boards (IRB #011814). All protocols involving human participants were in accordance with national, international, and institutional guidelines and the Declaration of Helsinki. A waiver of consent was granted by The George Washington University IRB. Our research team accessed the dataset under a signed data use agreement with HCCI that is re-examined annually. HCCI held all identifiers and prohibited their release to the study team.
As our dependent variables, we included several utilization and cost measures within the one-year post-index period. We included a set of binary measures related to opioid prescriptions. To allow for comparison across studies, we used the definition by Kazis and colleagues for early- and long-term opioid prescriptions. Receiving an early opioid prescription was defined as filling a prescription within 30 days of the index date. Receiving a long-term opioid prescription was defined as filling a prescription within 60 days of the index date and either 1) receiving a supply of 120 days or more of pills in the one-year post-index period or 2) receiving a supply of 90 days or more of pills with 10 or more refills in that year. We created binary measures for a) diagnostic imaging services, defined as whether an individual had magnetic resonance imaging (MRI) or a computed tomography (CT) scan, b) radiography, c) surgery related to LBP (S1 Table A), d) emergency department (ED) visit, and e) inpatient hospitalization. A binary measure was created for any serious illness related to LBP (see S1 Table A for ICD-9 and ICD-10 codes) to address concerns about delays in diagnosis associated with provider types.
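The early- and long-term opioid definitions above are rule-based and can be expressed directly. The sketch below is illustrative only: the fill-record structure is hypothetical, and the number of fills is used as a proxy for the refill count in the definition.

```python
from datetime import date, timedelta

def classify_opioid_use(index_date, fills):
    """Classify opioid use per the early/long-term rules described above.

    fills: list of (fill_date, days_supplied) tuples (hypothetical structure,
    not the claims schema). Returns (early, long_term) booleans.
    """
    year_end = index_date + timedelta(days=365)
    in_year = [(d, s) for d, s in fills if index_date <= d <= year_end]

    # Early: any filled prescription within 30 days of the index date
    early = any(d <= index_date + timedelta(days=30) for d, _ in in_year)

    # Long-term: a fill within 60 days, plus either >=120 days supplied in the
    # year, or >=90 days supplied with >=10 fills (proxy for refills)
    fill_within_60 = any(d <= index_date + timedelta(days=60) for d, _ in in_year)
    total_supply = sum(s for _, s in in_year)
    long_term = fill_within_60 and (total_supply >= 120 or
                                    (total_supply >= 90 and len(in_year) >= 10))
    return early, long_term
```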
We used two measures of costs. First, we defined total health care costs as the net paid amount to the provider after all deductions and calculations over the course of the one-year post-index date. Second, we defined total out-of-pocket costs, which are a subset of total health care costs, to include deductibles, coinsurance, and copays for all visits over the course of the year. Negative or missing values were coded as zero.
Control variables included whether an individual identified at the index date as female or male, age categories (with age 18 to 34 as the reference category and age 75 and older collapsed into one category), whether an individual was in a Medicare Advantage plan, plan type categorized by flexibility of provider referrals (preferred provider organization and exclusive provider organization as the combined reference category; health maintenance organization; point of service; and “other”, including private fee-for-service, independent, life insurance, and unknown), whether the individual was in a high-deductible plan, and an Elixhauser Index value based on all other diagnoses identified on the index date (accounting for changes in coding from ICD-9 to ICD-10-CM).
Using data from the AHRF, we categorized the county in which an individual lived based on Urban Influence Code (UIC), which accounts for population density and urban influence, collapsed into three categories: metropolitan (reference), micropolitan, and noncore (rural) [23, 29]. To account for socioeconomic factors influencing the utilization of health care by individuals, we included county typology codes defined by the Economic Research Service (ERS), including whether the county population had low levels of educational attainment (defined as 20% or more of the working-age population lacking a high school diploma or equivalent), low employment levels (defined as less than 65% of the working-age population employed), and whether the county population had persistent poverty levels. We included the percent of the county that was uninsured. We also controlled for whether a patient lived in a state with limits or provisions on physical therapy services.
Instrumental variables estimation
Observational studies require statistical techniques to control for confounding, or selection bias, in who receives an intervention. To control for the selection bias associated with an individual’s choice of the first provider seen, which is the variable of interest in our study, we used an econometric technique called instrumental variable (IV) estimation that is well-known for causal inference in health services research and epidemiological studies [30, 31]. We specifically used the two-stage residual inclusion (2SRI) approach to IV estimation, which is recommended for use in non-linear modeling to create consistent estimators. IV estimation requires a variable (the “instrument”) that strongly predicts the intervention that an individual receives (i.e., the first provider seen at the index date) but is not directly associated with the outcome measures. We used the copay associated with the index date as our instrument. For a sensitivity analysis, we also tested differential distance, an instrument used in a previous study, defined as the difference between 1) the distance between an individual and the first provider of choice and 2) the distance between an individual and the closest alternative provider (see S2 for further discussion about measures and S2 Tables A and B for first-stage results).
Both instruments had a large Wald statistic. Given that the Wald statistic is asymptotically equivalent to the likelihood-ratio test statistic, a generalization of the F-statistic, we have evidence that our instruments exceed the conventional F-statistic threshold of 10, indicating strong instruments [32, 33]. Given the large sample size in this study, achieving this threshold was not surprising.
We report descriptive statistics for bivariate analyses. In the first stage of the 2SRI, we predicted the first provider type seen (PT as the reference group) as a function of the instrument (e.g., copay) and the control variables described in the previous section using a multinomial logistic regression model, which is appropriate for categorical outcomes with no natural ordering (see S2 Tables A and B). In the second stage, we used probit models to predict the probability of each of our health care utilization measures (e.g., opioid prescription, imaging service, ED visit, hospitalization, surgery, and serious illness), and generalized linear models with a gamma distribution and log link to estimate total and out-of-pocket costs, as a function of the first provider seen, the control variables from the first stage, and the raw residuals from the first stage. We conducted two additional sensitivity analyses: 1) using deviance residuals in place of raw residuals, which produced nearly identical results, and 2) adding state dummies, which produced similar parameter estimates but was not included in the final model to avoid overfitting. While we kept the “Other” category of providers in our models to preserve sample size and further reduce selection bias, we report predicted probabilities for each of the binary outcome measures and the predicted costs for individuals seen by each of the specific provider types; results for the Other group are available upon request. Robust standard errors were used for all models.
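The residual-inclusion logic of 2SRI can be illustrated with a small simulation. This sketch uses linear least squares for both stages purely to show the mechanics (in the linear case 2SRI coincides with two-stage least squares); the paper's actual models use a multinomial logit first stage and probit/gamma-GLM second stages, and the data here are simulated, not HCCI claims.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)   # unobserved confounder (drives both exposure and outcome)
z = rng.normal(size=n)   # instrument (plays the role of the copay)
x = rng.normal(size=n)   # observed control variable

# Exposure depends on the instrument and the confounder; outcome has a true
# exposure effect of 1.5 but is also driven by the confounder
d = 0.8 * z + 0.5 * u + rng.normal(size=n)
y = 1.5 * d + 0.7 * x - 1.0 * u + rng.normal(size=n)

def ols(y, cols):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression of outcome on exposure and control: biased by u
naive_d = ols(y, [d, x])[1]

# Stage 1: regress exposure on instrument and control, keep raw residuals
b1 = ols(d, [z, x])
resid = d - np.column_stack([np.ones(n), z, x]) @ b1

# Stage 2: include the first-stage residuals alongside the exposure;
# the exposure coefficient is now consistent for the true effect
tsri_d = ols(y, [d, x, resid])[1]

print(f"naive: {naive_d:.2f}, 2SRI: {tsri_d:.2f}")
```

The residual term absorbs the part of the exposure that is correlated with the unobserved confounder, which is the same role the first-stage residuals play in the non-linear specification described above.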