Measuring informal workplace learning outcomes in residency training: a validation study | BMC Medical Education

Residency in Germany

In Germany, residency training is independent of university-based undergraduate medical education. The medical associations (Ärztekammern) of the federal states are responsible for residency training and determine the learning content of specialist training. This includes mandatory rotations in certain clinical areas (e.g. four years of anaesthesiology and one year of intensive care for an anaesthesiologist), as well as a documented minimum number of certain procedures (e.g. 50 central venous catheters). Residents rotate through the different areas of their clinic to become familiar with the entire spectrum of their specialty and to learn the necessary skills; at the end, the medical association decides in a "collegial examination interview" whether to grant specialist certification.

Currently, while competency-based curricula are well integrated into undergraduate medical education, they are not generally applied to postgraduate specialist training. Furthermore, validated tools to assess competency either do not yet exist or have not been accepted by German licensing institutions [21,22,23]. Although competition for employment is greater in the United States than in Germany, calls for changes in medical training have grown in recent years [24]. However, the structure of residency in Germany still differs substantially from that in American hospitals [25, 26]. In countries such as Great Britain and Ireland, by contrast, competency-based assessments have been introduced to guarantee structured surgical training. These include dedicated teaching clinics, requirements for the ongoing education of the educators, and regular training days for trainees [27,28,29]. In Ireland, surgical residents must complete lab-based operative skills assessments before being entrusted with professional activities [27, 29]. As mentioned above, German residency programmes define the required content and length of training, but they do not specify the skills and competencies residents must possess at the end. Accordingly, the end of residency is defined by time served and knowledge acquired, rather than by skills and competencies.


Data collection occurred in the context of a survey study conducted among Bavarian medical graduates (Bayerische Absolventenstudie Medizin, MediBAS) between October 2018 and January 2019. Invitations to participate were distributed to 1,610 physicians who had graduated between October 1, 2016 and September 30, 2017 from one of five Bavarian medical schools (Friedrich-Alexander Universität Erlangen, Ludwig-Maximilians-Universität München, Technische Universität München, Universität Regensburg, and Julius-Maximilians-Universität Würzburg). The survey was distributed via mail (paper-based) or email (Questback, Globalpark Inc.; for more details, see [30]). A total of 528 participants (n = 339 female; age: M = 29.79, SD = 3.37 years) completed the survey and were included in the present study. Of these, 88.8% were still employed in their first job after graduation, 9.3% had already started another job, and only three physicians were not employed at the time of the survey. The average number of working hours per week was 52.37 (SD = 9.42, min = 8.50, max = 85.00). A total of 58.3% worked in institutions with more than 500 employees, 25.2% in hospitals with 50–499 employees, and 11.4% in institutions with 2–49 employees; 5.1% did not provide this information. The distribution of medical disciplines can be found in Table 1, and the occupation type in Table 2. At the time of the survey, the participants had been in graduate medical training for an average of 13.5 months (SD = 7.2). Participants who had not (yet) begun graduate medical training were excluded from these analyses, as informal WPL in residency training could be ruled out for them. The participants were informed about the content and purpose of the study and gave their informed consent in advance. The data collection was anonymous.

Table 1 Distribution of medical disciplines among participants
Table 2 Occupation type of participants

Validation approach

Our validation approach is based on the concept of construct validity, which postulates that the trustworthiness of a score can be established by demonstrating the relationship between a (theoretical) construct and the measure under consideration [31]. The first step in investigating construct validity was to examine the psychometric properties of the German translation of the original questionnaire. We began by examining the dimensionality (factor structure) of the scale using exploratory and confirmatory factor analyses; this step addressed RQ1. We then examined the reliability of the instrument to answer RQ2.

Finally, to answer RQ3, we compared the results of the factor analysis with the three subscales "medical expertise", "communication", and "scholarship" of the Freiburg Questionnaire to Assess Competencies in Medicine (FKM; [32]) to examine construct validity. For this purpose, we assumed that a higher degree of competency would be related to higher informal WPL outcomes.


The Questionnaire of Informal Workplace Learning Outcomes was developed and first validated for social care workers in Belgium [1]. A second Belgian study investigated informal WPL outcomes by adapting the instrument for policy inspectors [6]. Both studies administered the instrument in Flemish. The validation of the original instrument [1] yielded three factors (GLO, OLLO, and JSLO), which were replicated in the study with policy inspectors [6]. The model fit in both studies was acceptable ([1]: CFI = 0.95, SRMR = 0.04, RMSEA = 0.05; [6]: CFI = 0.92, SRMR = 0.072, RMSEA = 0.078). Internal consistency was also satisfactory in both studies ([1, 6]: α ≥ 0.73).

As the work environments of socio-educational care workers and healthcare providers differ, the original Questionnaire of Informal Workplace Learning Outcomes by Kyndt et al. [1] was adapted to the work environment of healthcare (see item wording in Table 3). The questionnaire was then translated into German following the recommendations of Wild et al. [33]. The translation was done by two educational researchers with English language skills at the C1–C2 level. A back-translation was not performed due to time constraints (the start of the study was fixed by the committee administering the data collection). The translated version was discussed by the research group to ensure the substantive accuracy of the translation and its suitability for residency training. Based on the suggestions of Kyndt et al. [1], we adjusted the JSLO scale to the professional field of medicine. We eliminated item No. 10 of the original JSLO, "to support clients in their social participation", as it describes an activity that applies to only a small proportion of the medical profession (e.g. psychiatry or psychosomatics), reducing the number of items from five to four. In addition, JSLO items No. 3 and No. 4 were adjusted to the medical context. The items were rated on a 5-point Likert scale (1 = "disagree" to 5 = "agree").

Table 3 Overview of factor structure of measure by Kyndt et al., Janssens et al., and the present study

Further data included age, gender, and the FKM score with its three subscales "medical expertise", "communication", and "scholarship". The medical expertise scale refers to the knowledge and skills needed to conduct basic diagnostics and to develop treatment plans [32]. The communication scale covers different communicative situations with patients and how to build trustworthy relationships with them [32]. Finally, the scholarship scale covers the scientific competencies of reading, interpreting, and applying medical research findings in one's daily work. The competencies described by the FKM are rated on a 5-point Likert scale (1 = "not at all" to 5 = "very much"; [32]).


Missing data were imputed using the Random Forest imputation method [34] in R, applying the packages "missForest" [35] and "randomForest" [36]. We began the instrument validation with a CFA to answer RQ1; its aim was to investigate how well the factor structure of the original scale developed by Kyndt et al. [1] fit our data. Because the data fit the original model structure poorly, we tested the common variance of the intercorrelation matrix using the Kaiser–Meyer–Olkin measure and Bartlett's test of sphericity. The Kaiser–Meyer–Olkin measure of sampling adequacy was 0.91, and Bartlett's test was significant (p < 0.001), showing that our data were suitable for exploratory factor analysis (EFA; [37]). We conducted an EFA with a robust maximum likelihood estimator to estimate the appropriate number of factors. Only variables with a loading ≥ 0.4 were assigned to a factor [6, 37]. We then retained a four-factor solution for further analysis. The item OLLO No. 11, "…to fulfil managerial tasks autonomously", did not fit any of the factors and was therefore excluded from further analyses.
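The loading cut-off rule can be sketched as follows. This is an illustrative Python sketch (the study itself used R), and the loadings matrix below is invented, not taken from the actual EFA results:

```python
# Illustrative only: assign each item to the factor on which its
# absolute loading meets the cut-off (>= 0.40); items that reach the
# cut-off on no factor are dropped, as happened to item OLLO No. 11.
# The loadings below are invented, not the study's EFA results.

CUTOFF = 0.40

def assign_items(loadings):
    """loadings: dict mapping item name -> loadings on each factor.
    Returns (assignments: item -> factor index, dropped: [items])."""
    assignments, dropped = {}, []
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= CUTOFF:
            assignments[item] = best
        else:
            dropped.append(item)
    return assignments, dropped

example = {
    "GLO_1":   [0.72, 0.10, 0.05, 0.08],
    "JSLO_3":  [0.12, 0.65, 0.20, 0.11],
    "OLLO_11": [0.31, 0.28, 0.22, 0.18],  # below cut-off on all factors
}
assignments, dropped = assign_items(example)
```

Note that the study reports only the ≥ 0.4 criterion; stricter rules (e.g. also requiring a minimum gap to the second-highest loading) exist but are not assumed here.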

We compared the two models and identified the more suitable structural equation model (SEM) using a χ² difference test (ANOVA). First, we compared the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) of the two models; the smaller these values, the better the model fits the data. Second, we interpreted the χ² difference and its significance. The χ² difference between the two models was significant (p < 0.001). Based on these comparisons, Model 1 performed better (Table 4). We then calculated the measurement models for each factor of Model 1 (cf. Table 5); the fit of each measurement model was satisfactory [38]. For further analysis, we examined the explained variance for each factor and for the overall scale. To answer RQ2, the reliability of the factors was tested using McDonald's omega and Cronbach's alpha; we considered omega and alpha ≥ 0.7 adequate for a reliable instrument [39].
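The information-criterion comparison rests on AIC = 2k − 2·lnL and BIC = k·ln(n) − 2·lnL, where k is the number of free parameters, lnL the log-likelihood, and n the sample size. A minimal sketch (in Python rather than R; the log-likelihoods and parameter counts below are invented for illustration):

```python
from math import log

# Smaller AIC/BIC = better trade-off between fit and model complexity.
# The log-likelihoods and parameter counts are hypothetical; n = 528
# is the study's sample size.

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*lnL."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*lnL."""
    return k * log(n) - 2 * log_lik

n = 528
model_1 = {"log_lik": -5100.0, "k": 40}   # hypothetical values
model_2 = {"log_lik": -5150.0, "k": 46}   # hypothetical values

better = "Model 1" if aic(**model_1) < aic(**model_2) else "Model 2"
```

BIC penalises extra parameters more heavily than AIC once ln(n) > 2, which holds for any sample larger than about 7 observations, so the two criteria can disagree for closely matched models.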

Table 4 Comparison of fit indices of the two structural equation models
Table 5 Fit indices of measurement models of the four-factor solution
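The reliability check for RQ2 can be illustrated with a minimal Cronbach's alpha computation (a Python sketch with invented responses; McDonald's omega additionally requires the factor loadings of a fitted model and is omitted here):

```python
from statistics import variance

# Cronbach's alpha from a respondents x items score matrix:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
# The response data below are invented for illustration.

def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Perfectly consistent (invented) responses yield alpha = 1.0:
demo = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(demo)
```

Against the ≥ 0.7 criterion cited from [39], a factor whose items produce an alpha below 0.7 would be flagged as insufficiently reliable.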

To answer RQ3, we then investigated the relationship between the WPL factors and the three FKM scales "medical expertise", "communication", and "scholarship" using Spearman's rho correlation coefficient, as the data were not normally distributed. Following Field [40], we interpreted a correlation coefficient of r = ±0.1 as a small effect, r = ±0.3 as a medium effect, and r = ±0.5 as a large effect. The significance level was set at p ≤ 0.05. The analyses were conducted with the R package lavaan (version 3.6.1; [41, 42]) and IBM SPSS statistical software (version 28; [43]).

To further investigate the construct validity of the instrument, we calculated an SEM with the three FKM scales as predictors of the four WPL factors. In the first step, a measurement model covering the four latent variables of informal WPL outcomes was calculated. In the second step, a regression model added the three FKM scales as predictors to the measurement model [39]. The SEM was calculated using Mplus 8.7 [44] with maximum likelihood estimation. We applied the fit index cut-offs used by Kline [39] (CFI ≥ 0.90, TLI ≥ 0.95, SRMR ≤ 0.05, RMSEA ≤ 0.08).
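The fit-index screening can be sketched as follows (a Python illustration with hypothetical fit values, assuming the indices are CFI, TLI, SRMR, and RMSEA with the cut-offs cited from Kline [39]):

```python
# Screen a set of fit indices against the cut-offs cited in the text:
# CFI >= 0.90, TLI >= 0.95, SRMR <= 0.05, RMSEA <= 0.08.
# The values in demo_fit are hypothetical, not the study's results.

CUTOFFS = {
    "CFI":   (">=", 0.90),
    "TLI":   (">=", 0.95),
    "SRMR":  ("<=", 0.05),
    "RMSEA": ("<=", 0.08),
}

def fit_violations(fit):
    """Return the list of indices whose values violate their cut-off."""
    bad = []
    for index, (direction, bound) in CUTOFFS.items():
        ok = fit[index] >= bound if direction == ">=" else fit[index] <= bound
        if not ok:
            bad.append(index)
    return bad

demo_fit = {"CFI": 0.93, "TLI": 0.96, "SRMR": 0.041, "RMSEA": 0.061}
violations = fit_violations(demo_fit)     # empty list: acceptable fit
```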
