Psychometric properties of an instrument measuring communication within and between the professional groups licensed practical nurses and registered nurses in anaesthetic clinics

Background The most common cause of clinical incidents and adverse events in relation to surgery is communication error. There is a shortage of studies on communication between registered nurses and licensed practical nurses as well as of instruments to measure their perception of communication within and between the professional groups. The aim of the present study was to evaluate the psychometric properties of the Swedish version of the adapted ICU Nurse-Physician Questionnaire, designed to also measure communication within and between two professional groups: licensed practical nurses and registered nurses. Specifically, the aim was to examine the instrument’s construct validity using confirmatory factor analysis and its internal consistency using Cronbach’s Alpha. Methods A cross-sectional and correlational design was used. The setting was anaesthetic clinics in two Swedish hospitals. A total of 316 questionnaires were delivered during spring 2011, of which 195 were analysed to evaluate the psychometric properties of the questionnaire. Construct validity was assessed using confirmatory factor analysis and internal consistency using Cronbach’s Alpha. To assess items with missing values, we conducted a sensitivity analysis of two sets of data, and to assess the assumption of normally distributed data, we used Bayesian estimation. Results The results support the construct validity and internal consistency of the adapted ICU Nurse-Physician Questionnaire. Model fit indices for the confirmatory factor analysis were acceptable, and estimated factor loadings were reasonable. There were no large differences between the estimated factor loadings when comparing the two samples, suggesting that items with missing values did not alter the findings. The estimated factor loadings from Bayesian estimation were very similar to the maximum likelihood results. This indicates that confirmatory factor analysis using maximum likelihood produced reliable factor loadings.
Regarding internal consistency, alpha values ranged from 0.72 to 0.82. Conclusions The tests of the adapted ICU Nurse-Physician Questionnaire indicate acceptable construct validity and internal consistency, both of which need to be further tested in new settings and samples. Trial registration Current controlled trials http://www.controlled-trials.com Communication and patient safety in anaesthesia and intensive care. Does implementation of SBAR make any differences? Identifier: ISRCTN37251313, retrospectively registered (assigned 08/11/2012).

group and between other professional groups is investigated. There is a shortage of studies concerning the communication between nurses and LPNs as well as of instruments to measure their perceptions of the communication within and between the professional groups.
The aim of the present study was to evaluate the psychometric properties of the Swedish version of the adapted ICU Nurse-Physician Questionnaire, which is designed to also measure communication within and between professional groups: licensed practical nurses (LPNs) and registered nurses (RNs)/nurse anaesthetists (RNA)/specialist nurse in intensive care/theatre nurse. Specifically, the aim was to examine the instrument's construct validity and its internal consistency.

Design
A cross-sectional and correlational design was used. The present study is part of a project examining staff members' perceptions of relationships and communication across different professions. The data were collected prior to and after implementation of the communication tool Situation-Background-Assessment-Recommendation (SBAR) at an anaesthetic clinic [18], and the present study is based on baseline data.

The ICU nurse-physician questionnaire
The original ICU Nurse-Physician Questionnaire was derived from the Organizational Culture Inventory and showed Cronbach's alphas greater than 0.60 [19]. A shortened version consists of six sections: Relationships and communication within the ICU; Teamwork and leadership; Perceived effectiveness; Managing disagreements between physicians/managing disagreements between nurses and physicians; Authority; Satisfaction [20]. The creators of the questionnaire had not completed tests for the shorter version. However, earlier studies carried out in the US [21][22][23], Canada [24] and Japan [25], using parts of the questionnaire, have shown Cronbach's alphas between 0.51 and 0.93. In the present study, section one of the short version (Relationships and communications within the ICU), adapted for LPNs, was used. It consists of five factors: Within-group communication openness (4 items); Between-group communication openness (4 items); Within-group communication accuracy (4 items); Between-group communication accuracy (3 items); Communication timeliness (3 items). Openness involves the extent to which nurses and physicians can say what they mean when speaking with each other without fear or misunderstanding. Accuracy involves the degree to which nurses and physicians believe in the accuracy of the information presented to them by the other party. Timeliness involves the degree to which information about patients is relayed promptly to the personnel who need to be informed [19]. Responses to the items are made on a 5-point Likert scale ranging from "Strongly Disagree" to "Strongly Agree". Negatively worded items are reverse-scored before the item scores are averaged into factor scores.
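The scoring rule described above (reverse the negatively worded items, then average the items of a factor) can be sketched as follows. This is a minimal illustration, not the scoring script used in the study: the item labels, the choice of which item is negatively worded, and the respondent's answers are all hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert scale, 1 = Strongly Disagree


def reverse_score(value):
    """Mirror a response on the 1-5 scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - value


def factor_score(responses, items, negative_items):
    """Average one respondent's items for a factor.

    Negatively worded items are reversed first so that a higher
    score always points in the same direction.
    """
    scored = [
        reverse_score(responses[item]) if item in negative_items else responses[item]
        for item in items
    ]
    return mean(scored)


# Hypothetical respondent and item labels (not the questionnaire's own item codes).
respondent = {"item_a": 4, "item_b": 5, "item_c": 2, "item_d": 4}
score = factor_score(
    respondent,
    items=["item_a", "item_b", "item_c", "item_d"],
    negative_items={"item_c"},  # assumed to be the negatively worded item
)
print(score)
```

Here the answer 2 on the assumed negative item is counted as 4 before averaging.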

Swedish version of relationships and communications within the ICU
During the project evaluating relationships and communication between different professionals in an anaesthetic clinic, the ICU Nurse-Physician Questionnaire (Short version, section one) was used [18]. Before the study began, permission to use the questionnaire was obtained from the developer. Four of the authors (MR, ME, GM, and CLS) translated the English version into Swedish. A bilingual professional translator subsequently carried out a back translation. Discrepancies between the two versions were thereafter discussed in the research group, and minor changes were made to make the questions understandable to physicians, RNs and LPNs in the context of a Swedish anaesthetic clinic [26]. The original questionnaire was created to address the relationships and communication between physicians and nurses only. As LPNs are a common staff group in Sweden, the questionnaire was adapted to also measure the relationships and communication between nurses and LPNs.

Context and participants
The setting was an anaesthetic clinic at two hospitals located in central Sweden, in the same county council and sharing the same top management. All LPNs, nurses and physicians working in the ORs, ICUs and PACUs were invited to participate in the study if they had been working at the anaesthetic clinic for the past 6 months and would be continuing their employment. The specialist nurses in intensive care, the RNAs, and the theatre nurses have a protected professional title, indicating that the person holding the title is a registered nurse with a Master's degree in either specialist nursing-intensive care, specialist nursing-anaesthesia care or specialist nursing-surgery care [27]. A specialist intensive care nurse has the authority to, when instructed, address, evaluate, and judge, e.g., analgesia and sedation. The title of nurse anaesthetist exists in the US, Sweden, Norway, Denmark, and Switzerland. A nurse anaesthetist has the authority to, when instructed, independently induce, maintain and conclude general anaesthesia, with some support from an anaesthesiologist. The specialist nurse in surgery care has the authority to, e.g., independently ensure that hygienic and aseptic standards are met as well as to organize the work associated with the surgery [28]. In Sweden, LPN is the most common profession within healthcare, with about 250,000 practitioners. The LPN is a vocational degree usually obtained after upper secondary education and does not have a protected professional title. At the anaesthetic clinic, the LPNs work in the OR, PACU and ICU.

Procedure
A total of 316 questionnaires, and two reminders, were sent out during spring 2011. The questionnaires were coded and sent back in prepaid envelopes to the researchers. The researchers worked at a university and were not employed at the hospital, and the respondents were not in a position of dependency in relation to the researchers. The response rate was 73% (n = 230 of 316), of whom 195 respondents were nurses and LPNs (Table 1). Questionnaires measuring the relationships and communication between physicians and nurses were excluded. The 195 questionnaires were analysed to evaluate the psychometric properties of the Swedish version of the adapted ICU Nurse-Physician Questionnaire (Relationships and communications within the ICU).

Data analysis
Statistical methods
Data were analysed with IBM SPSS Statistics version 22.0; for the CFA we used IBM SPSS AMOS 22.0. Construct validity and especially structural validity were assessed using confirmatory factor analysis (CFA).
The sample consists of 195 individuals. Regarding sample size recommendations, we considered the sample to be adequate for CFA [29]. The ratio of our sample size to the number of items was almost 11:1.
Out of the sample of 195 individuals, there were some internal missing values, i.e. individuals with missing observations in parts of the questionnaire. Fourteen individuals had at least one missing value for some items in the questionnaire.
Our analyses were conducted in two-by-two modelling steps: two ways of handling missing data and two methods of estimation. We first deleted the individuals with at least one missing observation on an item, a so-called listwise deletion. The primary purpose of this initial analysis (missing listwise) was to be able to use modification indices and standardized residuals to look for possible modifications in the specification of the model [30]. We then repeated the analysis on the full sample (missing casewise), for which all available data were used; no individuals with missing observations on items were deleted from the analysis. The original CFA model (Model 1) is displayed in Additional file 1: Figure S1. To set the scale of the factors, we restricted one of the regression weights from each of the factors to one. Furthermore, we allowed the factors to correlate with each other. Checks for multivariate normality were conducted using the index of multivariate kurtosis and its critical ratio [30]. The models were fitted to the covariance matrix. The modification indices and standardized residuals were then used to obtain evidence of misfit.
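The difference between the two missing-data scenarios can be illustrated with a small sketch. The toy responses and item names below are hypothetical; only the logic of listwise deletion (drop every respondent with at least one missing item) is taken from the description above. In the study itself, 14 of the 195 respondents had at least one missing value, which leaves 181 complete cases.

```python
def has_missing(row):
    """True if the respondent left at least one item unanswered (None)."""
    return any(value is None for value in row.values())


def listwise_delete(rows):
    """Keep only respondents who answered every item."""
    return [row for row in rows if not has_missing(row)]


# Hypothetical toy data: four respondents, three items.
rows = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 4, "q3": 4},
    {"q1": None, "q2": None, "q3": 1},
]

complete_cases = listwise_delete(rows)           # "missing listwise" sample
n_incomplete = sum(has_missing(r) for r in rows)  # respondents lost by deletion
print(len(rows), n_incomplete, len(complete_cases))
```

In the "missing casewise" scenario no rows are dropped; the estimation method has to cope with the `None` entries instead.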
In each of the two missing data scenarios, we first estimated the model using the maximum likelihood (ML) method. CFA based on listwise deletion assumes that the mechanism underlying the missing data can be categorized as missing completely at random (MCAR) [30,31]. Listwise deletion increases standard errors and reduces statistical power. If the missing data pattern cannot be assumed to be MCAR, the estimates may be biased [30,32]. When there is missing data on items, i.e., missing casewise, AMOS uses the full information maximum likelihood (FIML) method, which produces consistent and unbiased estimates. Because it uses all information in the data, the method is likely to result in greater statistical power.
Given that responses to our items were made on a Likert scale, we have strictly ordinal scaled categorical data. According to Byrne [30], the methodological approach to the analysis of categorical variables in AMOS is Bayesian estimation. We therefore also conducted the analyses using the Bayesian method of estimation to evaluate estimation diagnostics and to compare the estimates derived from the two estimation methods. Without going into detail, the Bayesian estimation method uses a Markov Chain Monte Carlo (MCMC) algorithm to simulate the posterior distribution from which estimates of the factor loadings are obtained [31]. Other characteristics from the distribution may be used as estimates of standard errors, skewness and kurtosis. The number of simulations needed to obtain stable estimates from the posterior distribution was determined using a convergence statistic [30,31,33].
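To make the idea of simulating a posterior distribution concrete, the following is a toy random-walk Metropolis sampler. It is not the algorithm configuration used by AMOS and it does not fit a CFA model: it only estimates the mean of a handful of hypothetical scores, assuming a flat prior and a known standard deviation of 1. The data, step size, seed and number of draws are all illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev


def log_posterior(mu, data, sigma=1.0):
    """Log posterior of mu up to a constant: flat prior, normal likelihood."""
    return -sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)


def metropolis(data, n_draws, burn_in, step=0.5, seed=1):
    """Random-walk Metropolis sampler; the first `burn_in` draws are discarded."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, data)
    draws = []
    for i in range(burn_in + n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # Accept the proposal with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(current)
    return draws


# Hypothetical scores; their sample mean is 4.05.
data = [3.8, 4.1, 4.4, 3.9, 4.3, 4.0, 4.2, 3.7]
draws = metropolis(data, n_draws=5000, burn_in=500)

posterior_mean = mean(draws)  # point estimate of the parameter
posterior_sd = stdev(draws)   # plays the role of a standard error
print(round(posterior_mean, 2), round(posterior_sd, 2))
```

The retained draws approximate the posterior distribution, so their mean serves as the estimate and their spread as a measure of uncertainty, which is the same role the posterior plays for the factor loadings.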
Model fit was assessed using the model chi-square (χ²), degrees of freedom (df), χ²/df, and goodness-of-fit indices such as the comparative fit index (CFI), the standardized root mean square residual (SRMR), and the root mean square error of approximation (RMSEA) with a p-value (PCLOSE) for testing that RMSEA is no greater than 0.05. This combination of fit indices is in line with the recommendations in Hooper et al. [34]. The model χ² is the likelihood ratio test statistic that compares the observed sample covariance matrix with the estimated model covariance matrix. A low model χ² and, in relation to its degrees of freedom, a value of χ²/df < 2.0 are indicative of a well-fitting model. The CFI is based on a comparison of the estimated covariance matrix with the null model in which all variables are uncorrelated, i.e., a comparison of the model chi-square to the independence chi-square. According to an earlier recommendation, CFI > 0.90 was indicative of a good fit, but later CFI > 0.95 was suggested [30,34,35]. The CFI is thought to perform well with small sample sizes [30,34]. The SRMR is based on squared differences between the sample correlation matrix and the hypothesized model correlation matrix. Acceptable values for SRMR are lower than 0.08, or preferably lower than 0.05 [30,34]. Finally, the RMSEA is based on the idea of how well the model, with optimally chosen parameter estimates, would fit the true population covariance matrix. Lower values on this index indicate a better fit. Earlier, a value below 0.08 was taken to indicate good fit, but later recommendations suggest that values below 0.06 indicate good fit [30]. P-values (PCLOSE) for testing the null hypothesis RMSEA ≤ 0.05 should be p > 0.05. The internal consistency of the items was assessed using Cronbach's alpha coefficient.
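The relationship between the chi-square statistics and some of these indices can be sketched in a few lines. The degrees-of-freedom count follows from the model as described (18 items, five correlated factors, one loading per factor fixed to one, covariance structure only). The chi-square values in the example are hypothetical and are not the values reported in Table 2. The RMSEA below uses N − 1 in the denominator, which is an assumption; some programs use N. SRMR and PCLOSE are not computed here.

```python
import math


def cfa_degrees_of_freedom(n_items, n_factors, n_error_covariances=0):
    """df = distinct variances/covariances minus freely estimated parameters."""
    moments = n_items * (n_items + 1) // 2
    free_loadings = n_items - n_factors  # one loading per factor is fixed to 1
    error_variances = n_items
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    free = (free_loadings + error_variances + factor_variances
            + factor_covariances + n_error_covariances)
    return moments - free


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index from the model and independence chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


df_model_1 = cfa_degrees_of_freedom(18, 5)                         # original model
df_model_2 = cfa_degrees_of_freedom(18, 5, n_error_covariances=1)  # one correlated error pair
df_null = 18 * 17 // 2  # independence model: only the 18 variances are estimated

# Hypothetical chi-square values, for illustration only.
chi2_model, chi2_null, n = 200.0, 1400.0, 181
ratio = chi2_model / df_model_1
print(df_model_1, df_model_2, round(ratio, 2),
      round(cfi(chi2_model, df_model_1, chi2_null, df_null), 3),
      round(rmsea(chi2_model, df_model_1, n), 3))
```

Allowing one pair of error terms to correlate costs one degree of freedom, which is why a modified model of this kind has one df fewer than the original.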
The level for statistical significance was set at α = 0.05 for all tests.
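Cronbach's alpha, the internal-consistency coefficient referred to above (not to be confused with the significance level α), can be computed from the item variances and the variance of the summed score. The sketch below assumes complete data for one factor; the six respondents and their answers are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one factor.

    rows: one list of item scores per respondent, all of the same length
    and without missing values.
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of six respondents to a four-item factor.
rows = [
    [4, 4, 5, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
    [3, 2, 3, 3],
]
alpha = cronbach_alpha(rows)
print(round(alpha, 2))
```

Alpha approaches 1 when the items vary together across respondents and falls towards 0 when they vary independently.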

Construct validity
Missing listwise (n = 181)
The fit indices for the original model (Model 1) are shown in Table 2, and the model is shown in Additional file 1: Figure S1. According to the criteria for model fit, the indices were not entirely acceptable. Using modification indices and standardized residuals as well as theoretical reasoning, we tested a modified model (Model 2), Additional file 1: Figure S2, in which the error terms of item 'ICU9' (Within-group Communication Accuracy; "I feel that certain ICU nurses [Licensed Practical Nurses] don't completely understand the information they receive") and 'ICU18' (Between-group Communication Accuracy; "I feel that certain ICU nurses [Licensed Practical Nurses] don't completely understand the information they receive") were allowed to correlate. The fit indices were improved and now showed a better model fit; see Table 2. In Table 3, the estimates of factor loadings are displayed. Factor loadings from re-estimation of the original model using a Bayesian method are also shown. These factor loadings show only minor discrepancies from the estimates made using the Maximum Likelihood method. Diagnostics from the Bayesian method (see Additional file 2: Table S1) reveal some skewness and kurtosis of the estimates. Moreover, some instability of the estimates was indicated (see Additional file 2: Figure S1a-d of trace and autocorrelation). This could be a sign of lack of fit in estimating the model using ML, because the items were on an ordinal measurement scale. The critical ratio for multivariate kurtosis for the included items was 11.99, which is a sign of multivariate non-normality.

Missing casewise (n = 195)
The fit indices for the original model (Model 1) and the modified model (Model 2) using missing casewise (i.e., using all available data) are shown in Table 4. The fit indices were slightly improved compared to those from the smaller sample and, for the modified model, now showed an acceptable model fit; see Table 4. Table 5 displays the estimates of factor loadings. Factor loadings from a re-estimation using the Bayesian method of the original model are also shown. These factor loadings show only minor discrepancies from the estimates made using the Maximum Likelihood method. Diagnostics from the Bayesian estimation (see Additional file 2) still reveal some skewness and kurtosis of the estimates. Moreover, some instability of the estimates was indicated. Also in the analysis of all available data, this could be a sign of lack of fit in estimating the model

Main findings
This is the first study to assess the psychometric properties of the Nurse-Physician Questionnaire (Short version, section one), adapted for LPNs and used in the context of a Swedish anaesthetic clinic. In the study, we found support for the construct validity of the adapted ICU Nurse-Physician Questionnaire as well as for its internal consistency.
Validity often refers to the ability of an instrument to measure what it purports to measure. Finding empirical support for validity is a matter of degree and involves a combination of logical arguments and an ongoing and iterative process. The concept of validity and construct validity has been discussed and defined in various ways [36]. We looked at construct validity as one important component of validity, alongside content validity and criterion validity as in, for example, the COSMIN checklist manual [37]. In the present study, we focused on the structural part of construct validity and used CFA to test whether the a priori structure of the adapted ICU Nurse-Physician Questionnaire fitted the sampled data. This gave partial insight into the scale's construct validity. Other aspects of construct validity, such as contrasted groups and hypothesis testing, were not investigated in the study. We used CFA because it is more suitable for testing specific hypotheses regarding the relationship between items and the latent factors than is, e.g., exploratory factor analysis (EFA).
The CFA model first tested was a five-factor model with correlated factors, based on cases with complete data, i.e., cases with at least one missing data point on the items were deleted. This model was not completely satisfactory as judged by the goodness-of-fit indices. A slightly better fit was achieved by letting two error terms correlate. Correlation was allowed between ICU9 (Within-group Communication Accuracy; "I feel that certain ICU [Nurses] don't completely understand the information they receive") and ICU18 (Between-group Communication Accuracy; "I feel that certain ICU [Licensed Practical Nurses] don't completely understand the information they receive"). The correlation was allowed because one can reasonably assume that personnel are unsure as to the extent to which all information is generally understood. Such a correlation may be caused by respondent bias in the associated items or a high degree of overlap in item content [30].
Missing values are a persistent problem in quantitative research. Handling missing values using listwise deletion requires that the mechanism underlying missing values be a completely random one (MCAR) to yield unbiased estimates. This is not a realistic assumption in most cases. Listwise deletion also reduces the number of observations, making the standard errors larger. Instead of imputing values, we reanalysed the material using all of the available data. The ML method in AMOS for dealing with missing data is FIML, which uses all data and produces unbiased, efficient and consistent estimates, provided we can assume that the missing data are missing at random (MAR), which is a less strict assumption than missing completely at random (MCAR) [31]. Because there were only 14 cases with at least one missing value, the differences between the estimated factor loadings were not particularly large. Another concern was analysing the ordinal data as if they were quantitative and normally distributed. A common approach is to regard ordinal variables as a crude realization of underlying normally distributed data and to proceed with standard statistical methods, or to base the CFA on other correlation coefficients, such as polychoric correlations, and on asymptotic distribution-free methodology, which requires very large sample sizes. Many of our variables showed marked skewness and kurtosis, and the index of multivariate kurtosis and its critical ratio indicated multivariate kurtosis. This could lead to biased factor loadings and correlations as well as to standard error estimates that are too low when using ML estimation [30]. In the Bayesian estimation, convergence was achieved after 500 so-called burn-in samples and an additional 98,500 random draws.
A strength of our study was the use of Bayesian estimation as an alternative to ML estimation for ordinal data. In a Bayesian framework, inferences do not depend on the assumptions of normally distributed variables and large samples. Although this framework is quite different in its inference perspective, it is useful for comparison with ML estimates. With normally distributed data and in large samples with no outliers and no missing data, one should expect the estimates in CFA to be close [38]. Despite some skewness, kurtosis and signs of autocorrelation in the Bayesian estimates of the factor loadings, they were very similar to the maximum likelihood results. This indicates that CFA using ML produces quite reliable factor loadings, but inflated goodness-of-fit statistics (chi-square values). This information enhances the conclusions from the CFA regarding the validity of the proposed construct.
Explicit calculation of sample size for a given power or precision has not been carried out. Although this is not satisfactory, we have relied on commonly used recommendations for sample sizes appropriate for CFA. Such recommendations suggest a ratio of sample size to number of items ranging from 3:1 to 15:1 and a minimum sample size of 200 [29,39]. Our sample of 195 comes close to fulfilling these recommendations and may be considered adequate.
Interpretation of the goodness-of-fit indices indicated an acceptable model fit. This is not to say that the construct could not be improved as regards validity. It is also a possibility that our indices were inflated via the model chi-square due to departure from assumptions for using ML.
In conclusion, a strong feature of our study was that we used two sets of data, missing listwise as well as missing casewise, to assess the robustness of the model to missing cases, and that we used two estimation methods, ML and Bayesian, to compare the effects of the data's measurement level on the results.
One limitation of translating a questionnaire from one language to another is that there are no linguistic or cultural universals that can guarantee equivalence between texts. However, the risk of misinterpretation should have been minimized, as the questionnaire was translated by four authors (MR, ME, GM, and CLS) who have worked as RNs in hospital care and was back-translated by a bilingual professional translator. One of the authors (CLS) is a theatre nurse and another (MR) is an ICU nurse as well as an RNA; both have worked at anaesthetic clinics for decades and are therefore familiar with the context.
Summing up, the relationships and communication within and between different professional groups are crucial to patient safety. With the original version of Nurse-Physician Questionnaire, nurse-physician relationships and communication can be studied. The strength of the present additions to the adapted ICU Nurse-Physician Questionnaire is that the relationships and communication of the entire personnel group can be studied. However, the generalizability of the survey should be tested in other settings. Hopefully, regular measurements can capture potential relationships and communication difficulties within and between all professional groups, thus allowing measures to improve communication, collaboration and teamwork to be taken in time.

Conclusion
The tests of the adapted ICU Nurse-Physician Questionnaire indicate acceptable construct validity and internal consistency, both of which need to be further tested in new settings and samples. Although use of ML estimation is not strictly correct for ordinally scaled variables, the estimates did not differ substantially from those obtained using the Bayesian method.
Construct validity is one aspect of the validation of an instrument. CFA provides a means to test construct validity but should be supported by other evidence as well.
Additional file 1. A schematic illustration of Model 1 and 2.
Additional file 2. Detailed information regarding Bayesian estimation.