Pragmatic adaptation of implementation research measures for a novel context and multiple professional roles: a factor analysis study

Background: Although some advances have been made in recent years, the lack of measures remains a major challenge in the field of implementation research. This results in frequent adaptation of implementation measures for different contexts (including different types of respondents or professional roles) than those for which they were originally developed and validated. The psychometric properties of these adapted measures are often not rigorously evaluated or reported. In this study, we examined the internal consistency, factor structure, and structural invariance of four well-validated measures of inner setting factors across four groups of respondents. The items in these measures were adapted as part of an evaluation of a large-scale organizational change in a rehabilitation hospital, which involved transitioning to a new building and a new model of patient care, facilitated by a significant redesign of patient care and research spaces.
Methods: Items were tailored for the context and perspective of different respondent groups and shortened for pragmatism. Confirmatory factor analysis was then used to test study hypotheses related to fit, internal consistency, and invariance across groups.
Results: The survey was administered to approximately 1208 employees; 785 responded (65% response rate) across the roles of clinician, researcher, leader, support staff, or dual clinician and researcher. For each of the four scales, confirmatory factor analysis demonstrated adequate fit that largely replicated the original measure. However, a few items loaded poorly and were removed from the final models. Internal consistencies of the final scales were acceptable. For scales that were administered to multiple professional roles, factor structures were not statistically different across groups, indicating structural invariance.
Conclusions: The four inner setting measures were robust for use in this new context and across the multiple stakeholder groups surveyed. Shortening these measures did not significantly impair their measurement properties; however, as this study was cross-sectional, future studies are required to evaluate the predictive validity and test-retest reliability of these measures. The successful use of adapted measures across contexts, across and between respondent groups, and with fewer items is encouraging, given the current emphasis on designing pragmatic implementation measures.


Background
Measurement issues in dissemination and implementation research are some of the most pressing challenges in the field due to a lack of psychometric studies and a tendency for scales to need adaptation for specific contexts and respondents [1,2]. As implementation research expands within the field of health services research, it is important that healthcare researchers are aware of best practices in implementation measurement [2]. Several implementation measures have been developed and validated in the social sciences and mental health fields to assess, for example, attitudes toward use of evidence-based practices (EBPs), organizational culture, and implementation leadership [3][4][5]. Using validated measures, or providing psychometric data when using existing measures in a new context, will improve the reproducibility and comparability of research on implementation determinants and processes [1]. Further, given the overall dearth of well-established measures for specific contexts and stakeholder types, researchers often adapt existing measures to align with the aims of a new study. Not only is this common practice, it is specifically endorsed as a research priority in the recent re-issue of the National Institutes of Health funding opportunity announcements on Dissemination and Implementation Research in Health posted May 8, 2019 (https://grants.nih.gov/grants/guide/pa-files/PAR- .html), in which researchers are discouraged from developing new measures solely for use in a specific study and are instead encouraged to use standard, validated measures where possible.
However, adapting a measure for a new context may affect the reliability and validity of the measure, necessitating re-evaluation of the measurement properties. We cannot assume that the construct being measured is salient across contexts as diverse as elementary schools, mental healthcare, or hospital-based adult medicine, because respondents may interpret items differently. Adaptations may include word changes to improve the specificity and interpretability of items for the new context and new participants, respectively. However, the psychometric properties of few implementation research measures have been formally evaluated across diverse health care contexts.
There is also a growing emphasis on pragmatic measures in implementation research. Among other characteristics, pragmatic measures should be salient to both the stakeholders and researchers, be perceived as low burden, and have broad applicability [6][7][8]. However, adapting validated measures to make them more pragmatic (i.e., more salient, shorter, and with broader application) could reduce the psychometric stability of the original measure. Thus, when adapting established implementation research questionnaires for a new context, it is imperative to verify that properties such as internal consistency and factor structure are acceptable and comparable to the original measure. For example, the Implementation Leadership Scale (ILS) was originally validated in social services settings and subsequently applied to substance use disorder treatment organizations. Aarons et al. [3] found that the properties of the ILS in this new context were acceptable and consistent with the original measure.
It is also important to understand whether internal consistency and factor structure of validated scales are adequate across different stakeholder roles, because inherent differences in perspective could translate to a different interpretation of the items. Current implementation research models and frameworks specify that obtaining data from people with varied perspectives is necessary [1,9]. However, stakeholders with different professional roles within complex health systems may have different perspectives on the implementation of innovations [10,11]. These differences may reflect actual differences in perspective or simply differences in the questionnaire's constructs between groups that are unintended or unanticipated. It is therefore important to look beyond simple mean-level differences to determine whether the psychometric properties of the questionnaire vary between stakeholder groups. It may also be beneficial to assess structural invariance within confirmatory factor analysis to test for differences between respondent groups.
The purpose of this report is to describe our process for adapting validated implementation research measures to reflect the context and the distinct perspectives of key stakeholder groups in a large organizational change that involved changes in physical space and leadership structure, as well as changes in personnel and team roles. Specifically, the change was the transition of a major urban academic hospital, the Rehabilitation Institute of Chicago (RIC), to a new building, named the Shirley Ryan AbilityLab (SRAlab), and a new model of patient care that emphasizes collaborative efforts among clinicians and investigators. Prior to this transition, we were interested in examining the (1) internal consistency, (2) factor structure, and (3) variation in factor structures in implementation research measures across stakeholder groups such as clinicians, researchers, leaders, support staff, or dual-role clinician-researchers. Variation in factor structure was important due to differing item sets in some cases. In addition, our observations during transition preparations led us to hypothesize that the interpretation and understanding of items may differ based on stakeholders' roles within the organization.

Context
In 2015, a group of implementation researchers from Northwestern University and researchers from the SRAlab, an academic physical medicine and rehabilitation hospital, began a partnership to evaluate an upcoming organizational change. The patients, clinical staff, and research scientists of the RIC were relocating to the SRAlab, a newly constructed building one block away. This transition involved a physical move to a state-of-the-art building, a reorganization of leadership, and a significant change in patient care and research practices: patient-care floors were designed to accommodate research staff alongside healthcare providers, with the goal of increasing collaboration between researchers and clinicians.
The clinical-academic team developed a survey to evaluate the implementation factors that would contribute to, and were affected by, the transition to the new model of care, which included (1) all private rooms, (2) improved technology integration, (3) research labs embedded within clinical therapy spaces, and consequently (4) increased interactions between clinicians and researchers.
The survey was administered 3 months prior to the transition, with plans to re-administer the survey periodically in the years following the transition.

Measure selection
We initially identified four validated implementation research measures to evaluate (1) leadership climate, (2) beliefs about the upcoming transition, and individuals' (3) use of and (4) attitudes toward EBPs. Consistent with the Consolidated Framework for Implementation Research (CFIR) [12], the selected measures assessed determinants in the domains of "characteristics of the intervention", or model of care, and the "inner setting", which were hypothesized to be the most relevant to the transition. These constructs have been shown to impact the uptake of innovations in healthcare delivery systems [13].

Measure adaptation
We adapted each measure to the SRAlab context and the professional roles we intended to survey. Instrument adaptations were discussed by a workgroup comprised of investigators with expertise in implementation research, organizational and systems change, and rehabilitative services provided by RIC, together with representatives from the three primary roles in RIC that we intended to survey: clinical staff, researchers, and leaders (dual-role leader/researcher). The workgroup selected subscales and items from each measure that were deemed salient to the transition and tailored each selected item for the various stakeholder groups, to ensure relevance. For example, in the Implementation Leadership Scale [14], we adapted the question originally developed by Aarons et al. "[Name of Supervisor] supports employee efforts to learn more about evidence-based practice" to "RIC's leadership team supports clinicians' efforts to learn more about research" (to be asked of clinicians) and to "RIC's leadership team supports researchers' efforts to use clinical practice to drive research development" (to be asked of researchers). In order to be more pragmatic, we removed less relevant or redundant items and subscales through a series of meetings with leadership and pilot testing with representatives from each of the professional roles. The feedback was used to refine the remaining items. A list of the adapted items for each role is presented in Table 1.

Survey administration
After we received IRB approval from Northwestern University, the survey was administered using Research Electronic Data Capture (REDCap), a web-based application for data collection and management, hosted at Northwestern University Feinberg School of Medicine [15,16]. Active email addresses (n = 1208) in the human resources system, the best means of contacting all current employees, were used to invite employees to take the survey over 7 weeks, from 1/17/2017 to 3/3/2017, beginning 3 months before the transition. Email communication via REDCap ensured participant anonymity while permitting follow-up of nonrespondents. We used several strategies to encourage participation, including offering incentives (a customized mug with the SRAlab logo, a raffled dinner with the RIC Chief Executive Officer), periodic email prompts, and in-person reminders during clinical and research team meetings. Written informed consent was obtained electronically prior to administration of the online survey.

Participants
Question banks were adapted for and administered based on four self-reported primary roles: clinician (physicians, nurses, and allied health therapists), researcher, leader, or support staff. Secondary roles were also reported, allowing us to generate five analysis categories accounting for dual roles: clinician only, researcher only, support staff only, leaders (including those with primary and secondary leadership roles), and people with dual roles as a clinician/researcher. A total of 785 employees completed the survey, for an overall response rate of 65%. Response rates by primary role were 63% for clinicians (n = 544), 58% for researchers (n = 100), 92% for leaders (n = 79), and 64% for support staff (n = 52). Response rates were approximate, as we were unable to verify that all of the 1208 unique email addresses were active. For example, the 43 surveys that were returned as undeliverable, suggesting that the person was no longer employed at RIC or that they used a different address, were included in the total number of surveys administered. Respondents were predominantly female (77%) and White (76%; Black: 13%; Asian: 10%; other race/ethnicity: < 1%), and most had been employed in the hospital for less than 10 years (73%), with 53% reporting less than 5 years' employment.
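To illustrate why the response rate is described as approximate, the following minimal sketch recomputes it from the counts reported above, once with all 1208 invitations as the denominator and once excluding the 43 undeliverable invitations. The second denominator is an illustrative assumption; the authors report only the first calculation, and the function name `response_rate` is hypothetical.

```python
# Counts taken from the text. The "deliverable only" denominator is an
# illustrative assumption, not a figure reported by the authors.
invited = 1208        # email addresses that were sent an invitation
completed = 785       # completed surveys
undeliverable = 43    # invitations returned as undeliverable


def response_rate(responses: int, denominator: int) -> float:
    """Return the response rate as a percentage of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * responses / denominator


# Rate as reported: every invitation counts toward the denominator.
reported = response_rate(completed, invited)
# Alternative: drop invitations known to be undeliverable.
adjusted = response_rate(completed, invited - undeliverable)

print(f"All invitations:  {reported:.1f}%")
print(f"Deliverable only: {adjusted:.1f}%")
```

Because the undeliverable invitations remain in the reported denominator, the reported rate is, if anything, a slight underestimate of participation among employees who actually received the survey.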

Measures
Implementation Leadership Scale (ILS) comprises 12 items that are rated on a 5-point scale indicating the degree to which the leader performs a specific behavior [14]. The 4 original subscales include Proactive Leadership (3 items), Knowledgeable Leadership (3 items), Supportive Leadership (3 items), and Perseverant Leadership (3 items). The mean of the subscales is computed to create the ILS total mean score (α = 0.97). Internal consistencies of the original subscales and total ILS score range from α = 0.93-0.98 in published studies [3,14]. Our adapted ILS included 6 items: 1 from the Proactive subscale, 1 from the Knowledgeable subscale, 3 from the Supportive subscale, and 1 from the Perseverant subscale. All professional roles except for support staff answered these questions regarding the leadership team, including leaders. For this and all other measures, subscale scores were only computed when at least three items from the original subscale were used (see Data Analysis section).
Organizational Change Recipients' Beliefs Scale (OCRBS) is a 24-item scale to assess respondents' beliefs about a current or proposed change, to gauge the degree of buy-in among recipients and assess beliefs about Discrepancy (4 items), Appropriateness (5 items), Efficacy (5 items), Principal Support (6 items), and Valence (4 items) that could adversely impact the success of the change [4]. Internal consistencies of the original scales ranged from α = 0.86-0.95 in the original validation [4]. In our study, all participants answered 8 items: 1 in Discrepancy, 1 in Appropriateness, 3 in Efficacy, 1 in Principal Support, and 2 in Valence. We modified the original 7-cell anchored format to a 5-point Likert scale to be consistent with our other items. This change to a 5-point scale resulted from our bench testing procedures, during which respondents reported that differences between points were clearer and easier to interpret on the 5-point scales than on the instruments with 7-point scales. This change was also made to the Evidence-Based Practice Questionnaire for the same reasons.
Evidence-Based Practice Questionnaire (EBPQ) comprises 24 items and was developed to measure nurses' EBP use, attitudes, and knowledge [17]. Internal consistencies are acceptable, with Cronbach's alpha (α) of 0.87 for the full questionnaire; α = 0.85 for the Practice of EBP subscale; α = 0.79 for the Attitude towards EBP subscale; and α = 0.91 for the Knowledge/Skills associated with EBP subscale [17]. Our adapted survey included 5 items from the Practice subscale for clinicians, rated on a 5-point scale.
Evidence-Based Practice Attitudes Scale (EBPAS) comprises 15 items across four dimensions pertaining to attitudes of mental health therapists toward adopting and delivering EBPs: openness to new practices (4 items); intuitive appeal of EBP (4 items); likelihood of adopting EBP given the requirements to do so (3 items); and perceived divergence of usual practices from research-based, academically-developed interventions (4 items) [5]. Internal consistency of the original subscales in the original study of the EBPAS ranged from α = 0.59 to α = 0.90 with an overall α = 0.77 [5]. Our adapted survey included 7 items for clinicians (3 from Openness, 2 from Appeal, 1 from Requirements, and 1 from Divergence).
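The internal consistencies quoted for each measure are Cronbach's alpha values. As a minimal sketch of how alpha is computed from item-level responses, the example below uses the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy ratings are hypothetical, complete responses are assumed, and the study itself computed internal consistency in SAS 9.4.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for complete data (rows = respondents, columns = items)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 5-point ratings from five respondents on three items.
toy_ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(toy_ratings)
print(f"alpha = {alpha:.2f}")
```

Alpha depends on the number of items as well as on their intercorrelations, which is one reason shortened scales warrant a fresh check of internal consistency.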

Data analysis
Confirmatory factor analysis (CFA) was conducted in Mplus 8 [18] using maximum likelihood estimation, while correlations, internal consistency, and descriptive statistics were computed in SAS 9.4. Determination of model fit included standard indicators: comparative fit index (CFI) [19], the root mean square error of approximation (RMSEA) [20], and the weighted root mean square residual (WRMR) [21]. Good fit to the data was indicated by CFI values greater than 0.93 [22], RMSEA values less than 0.06, and WRMR values less than 1.0 [18,23,24]. First, we conducted an independent CFA for each measure by replicating the original subscales when at least 3 items were available, or by including all items in a single overall scale when subscales could not be specified. We removed items when there was evidence of low contribution to the underlying construct, as evidenced by standardized factor loadings < 0.5, and allowed items within scales/subscales to correlate when doing so resulted in a significant improvement in model fit. After fitting an acceptable overall CFA model, we tested for structural invariance by professional role for the ILS and OCRBS using a Wald test, to determine whether the role of the respondent was related to differences in the factor loadings of the latent variables. Variances and standard errors were allowed to vary across roles. We report omnibus tests of mean differences in each scale by professional role.
Table 2 shows internal consistency and final CFA models for each scale. Two items were eliminated from the OCRBS model, one from the EBPAS, and one from the EBPQ. Each final model provided acceptable fit to the data and the standardized factor loadings were statistically significant (p < 0.001), ranging from 0.63 to 0.95 for the retained items, indicating that the factors, with correlated items when appropriate, contributed to the latent construct.
The multiple-group CFA by professional role approached statistical significance on the ILS (Wald  (Table S1), r = .41-.69 on the OCRBS (Table S2), r = .32-.73 on the EBPQ (Table S3), and r = .42-.83 on the EBPAS (Table S4).
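The fit cutoffs and the loading-based retention rule described in the data analysis can be sketched as simple decision rules. This is only bookkeeping around results that would come from the CFA software (Mplus in this study); the names `FitIndices`, `meets_fit_criteria`, and `items_to_retain`, and all numeric values in the usage lines, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Fit statistics as they would be read from CFA output."""
    cfi: float
    rmsea: float
    wrmr: float


def meets_fit_criteria(fit: FitIndices) -> bool:
    """Apply the cutoffs stated in the text: CFI > 0.93, RMSEA < 0.06, WRMR < 1.0."""
    return fit.cfi > 0.93 and fit.rmsea < 0.06 and fit.wrmr < 1.0


def items_to_retain(loadings: dict[str, float], cutoff: float = 0.5) -> list[str]:
    """Keep items whose standardized loading is not below the cutoff."""
    return [item for item, loading in loadings.items() if loading >= cutoff]


# Hypothetical output for one scale; these are not values from the study.
example_fit = FitIndices(cfi=0.96, rmsea=0.05, wrmr=0.80)
example_loadings = {"item_1": 0.72, "item_2": 0.41, "item_3": 0.88}

print(meets_fit_criteria(example_fit))
print(items_to_retain(example_loadings))
```

In practice the model is refitted after each removal, since dropping an item changes both the remaining loadings and the fit indices.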

Results
Mean values of the scales or subscales were calculated based on the factors included in the final CFA models. A significant difference between professional roles was found in the mean ILS (F [3, 721] = 6.27, p < 0.001), such that leaders rated leadership support higher than did researchers and dual-role clinician/researchers. Clinicians also rated their leaders higher on the ILS than did the dual-role clinician/researchers. In addition, there was a significant difference in mean OCRBS between roles (F [4, 787] = 5.38, p < 0.001), such that researchers reported less buy-in to the proposed change to the SRAlab and the new model of care compared to clinicians, support staff, and leaders.
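The omnibus tests above are one-way comparisons of scale means across roles. As a minimal sketch of how such an F statistic and its degrees of freedom arise, the example below implements a textbook one-way ANOVA in plain Python. The function name `one_way_anova` and the toy scores are hypothetical, and the p-value is omitted because it would require an F distribution routine from a statistics library.

```python
from statistics import mean


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    scores = [x for values in groups.values() for x in values]
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    if df_between < 1 or df_within < 1:
        raise ValueError("need at least two groups and more scores than groups")

    grand_mean = mean(scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variability of individual scores around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    if ss_within == 0:
        raise ValueError("no within-group variability")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mean scale scores for three roles; not data from the study.
toy_scores = {
    "clinician": [4.0, 4.5, 5.0],
    "researcher": [3.0, 3.5, 4.0],
    "leader": [2.0, 2.5, 3.0],
}
f_stat, df_b, df_w = one_way_anova(toy_scores)
print(f"F[{df_b}, {df_w}] = {f_stat:.2f}")
```

The between-groups degrees of freedom equal the number of groups minus one, which matches the reported F [3, ...] for the four roles that answered the ILS and F [4, ...] for the five roles that answered the OCRBS.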

Discussion
The assumption that the properties of an implementation measure construct will hold when administered in different contexts and to various stakeholders is rarely tested, and applying measures without appropriate customization could lead to misinterpretation. Our results indicated that measures of leadership climate, beliefs regarding change, and use of and attitudes toward EBP had adequate internal consistency and factor loadings. However, in some cases, the best fitting CFA model required removing additional items, further shortening the original scale. This could have been in part due to inclusion of single items from the original subscales that, when combined with the remaining items, resulted in poor factor loadings on a general construct. Although the measures of leadership support and beliefs about change did not differ significantly based on the respondent's role, the data suggest that caution is required when applying brief implementation measures to people with different roles in an organization.
These findings are promising for several reasons. First, they demonstrate the robustness of four common implementation research measures used to assess the inner setting subdomains of leadership climate, beliefs regarding change, and use of and attitudes toward EBPs, even when these measures are shortened for pragmatism and adapted.
Second, our results indicate that tailoring the items of well-validated scales to new contexts and for specific stakeholder perspectives is feasible and empirically supported. While the current structure of the ILS guides researchers to tailor the question stems for a specific context, our results support considering this approach during the development and validation of new pragmatic measures. Tailoring items could result in better predictive validity of these measures by reducing error variance and misinterpretation of general items applied to a specific problem or viewpoint. In most implementation research studies, some adaptation of items to the context of the study is necessary or preferable to obtain valid and reliable results. This study also shows that some measures developed for a specific context might contain items that do not translate well to other contexts. This is exemplified by removal of some items during CFA, even after the items had been selected as relevant and tailored to better match this specific context by key stakeholders.
Third, results show that shortened versions of some implementation research measures can be developed. However, the shortened measures used in this study retained only a single item from some of the original subscales, which reduces specificity for addressing research questions that require a psychometrically robust scale. Although we did not test the predictive validity of the adapted scales, establishing that a measure has adequate internal consistency and factor loadings and is invariant across respondent groups is a necessary first step.
Additional File 1 includes the adapted versions of the ILS, OCRBS, EBPQ, and EBPAS resulting from this study. Scoring for each adapted scale simply involves calculating the mean.
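Scoring by the mean can be sketched as follows. The handling of unanswered items (averaging over answered items and requiring a minimum number of answers) is an assumption made for illustration, since the text does not specify a missing-data rule; the function name `scale_score` and the example ratings are hypothetical.

```python
from statistics import mean
from typing import Optional


def scale_score(
    item_responses: list[Optional[int]], min_answered: int = 1
) -> Optional[float]:
    """Mean of the answered items, or None if too few items were answered.

    The min_answered rule is an illustrative assumption, not a rule
    stated in the study.
    """
    answered = [r for r in item_responses if r is not None]
    if len(answered) < max(min_answered, 1):
        return None
    return float(mean(answered))


# Hypothetical responses to a six-item adapted scale rated 1 to 5.
complete = scale_score([4, 5, 3, 4, 5, 3])
partial = scale_score([4, None, 2, None, None, None], min_answered=2)
too_sparse = scale_score([4, None, None, None, None, None], min_answered=2)

print(complete, partial, too_sparse)
```

Using the mean rather than the sum keeps scores on the original 1 to 5 response metric, so scales with different numbers of retained items remain directly comparable.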

Limitations
Limitations of this study include the potential loss of specificity due to the shortening of measures. For example, the original EBPAS had four subscales. Our final model replicated the openness subscale and created a new subscale containing items from both the appeal and requirement subscales of the original measure that relate to the likelihood to use EBP. Were our research questions and hypotheses specific to the independent role of each of these subscales, it would not have been advisable to reduce the items. Future research could address the reliability, construct, and predictive validity of our adapted measures [25]. Additionally, future work could include analyses based on item response theory, rather than using CFA, to determine the appropriateness of our reduced scales and removal of items. In this study, we confirmed our CFA results by documenting internal consistency with and without the dropped item(s). A second limitation is the reliance on one healthcare organization. Future research to replicate these findings across organization types and with different respondents is needed. However, this study supports continued evaluation of this specific organizational change with confidence in our measurement approach. Last, this study includes only a handful of the subdomains of the CFIR, which has measures for many but not all of the subdomains. The Society for Implementation Research Collaboration Instrument Review Project has compiled a comprehensive repository of available measures for each subdomain [6,26]. Although generalizability of each measure was not included in their review and ratings, even a cursory scan of the included measures suggests that some are quite specific to a particular service context, respondent, or clinical practice. These measures could prove more challenging to adapt than the more general measures described in this paper.

Conclusions
This study demonstrates methods for adapting and shortening implementation research measures and examining the impact on multiple psychometric properties. We selected measures that are widely used and whose original versions are psychometrically sound. However, evaluation studies are needed for other implementation measures. Similarly, development of new measures should include their evaluation in diverse contexts and with varied stakeholders. With a current emphasis on more pragmatic implementation research measures [8], these results are encouraging from the standpoint of use across contexts, with different respondent groups, and with reduced item counts. Validating adaptations of existing measures and publication of cross-informant and cross-setting psychometric evaluations such as this can help to address the noted gaps and shortcomings of implementation research instrumentation.
Additional file 1: Table S1. Intercorrelations and descriptive statistics for ILS. Table S2. Intercorrelations and descriptive statistics for OCRBS.