Copyright 2008 American Medical Association. All Rights Reserved. Applicable FARS/DFARS Restrictions Apply to Government Use.
To investigate pediatric residents' efforts to assess understanding in discussions about positive newborn screening test results. Newborn screening saves lives, but confusion about false-positive and carrier results often leads to psychosocial problems.
Explicit-criteria abstraction of transcripts of encounters with standardized parents of a fictitious infant found to carry cystic fibrosis or sickle cell hemoglobinopathy.
Simulated doctor-patient encounter.
Pediatric residents participating in an educational workshop on how to inform parents about positive newborn screening test results.
Main Outcome Measures
Abstraction used an explicit-criteria data dictionary with definitions for 5 different ways to assess understanding. A “partial” designation was used for assessments that used leading syntax or were not followed by a pause for response.
Interabstractor reliability over 59 transcripts (up to 2 per resident) was κ = 0.93. Only 26 of 59 transcripts (44.1%) met definite criteria for at least 1 assessment of understanding. Most assessments were of the less effective close-ended (37.3% of transcripts) and “OK?” question types (32.2% of transcripts). Only 3 transcripts met definite criteria for an open-ended assessment, and no transcripts included a request for a teach-back, the type thought to be most effective. Four transcripts (6.8%) included an advance request for questions. When partial-criteria assessments were included, an additional 31 transcripts (52.5%) contained at least 1 assessment attempt.
The small number of assessments of understanding and the high fraction of less effective assessments do not bode well for parental understanding, especially for parents with limited health literacy. Training programs should address assessments of understanding, but quality improvement activities using explicit-criteria assessment methods like these may also be needed.
Communication is said to be the “main ingredient” of medical care1 and “the vehicle by which technical care is implemented and on which its success depends.”2(p1744) Unfortunately, communication problems have been documented across much of health care.3-12 Medical schools and residency programs provide training in communication skills,13 but without periodic reinforcement, it is unclear whether the benefits of training persist. It also seems unlikely that physicians already in practice will have the time or resources necessary for either training or refresher courses. We have therefore been working to adapt quality improvement techniques to the needs of communication.14-17 Quality improvement has a track record of changing physician behavior on a population scale and uses fewer resources than educational programs require. We use quality indicators to operationalize individual communication behaviors. Quality indicators are explicitly defined, quantitatively reliable measures of ideal clinical behaviors, each representing a small domain within the overall quality of care.18,19 In this study, we used a quality indicator method to analyze pediatric residents' efforts to assess parent understanding during a discussion about a newborn screening result.
Assessment of understanding (AU) questions (Table 1) are important in the many situations in which a physician needs to explicitly verify a patient's knowledge about a topic or confirm that a message has been successfully conveyed.4,20-27 AU questions are especially critical for patients with limited health literacy, patients who are asked to adhere to a complicated treatment plan, and patients who might not be displaying their feelings because of shock or for cultural reasons. Physicians can also include a statement early in each encounter encouraging the patient to interrupt if questions arise (an “advance request for questions”).
We are especially concerned about the use of AUs after newborn screening. Nearly every US newborn is screened for a panel of life-threatening diseases,28,29 but most “abnormal” results turn out to be false positives or reveal that the infant is a genetic carrier for cystic fibrosis (CF) or sickle cell hemoglobinopathy (SCH). Some authors have described these infants as having a “nondisease” because they do not become sick, but their parents can develop psychosocial complications such as anxiety, depression, stigmatization, or misconceptions about whether the infant has a disease.30-38 Confusion could be partially due to primary care physicians' role in informing parents, since communication in primary care has been criticized by families39 and public health officials.40 Newborn screening programs provide educational materials to parents and physicians, but in a recent survey, we found that most programs lack a follow-up mechanism to monitor psychosocial outcomes.40 Families of carrier infants can be helped by genetic counselors, but the supply of counselors is so limited that most families will not be able to access their services.41
These communication failings and psychological complications are often cited by ethicists and policy experts in arguments against the routine use of genetic and molecular screening technologies. To keep newborn screening viable, we have argued that screening programs or referral centers should introduce population-scale programs to assess and improve the quality of communication services and parents' psychological outcomes.16 Such programs may help ensure that newborn screening does more good than harm, much as biomedical follow-up mechanisms ensure that infants with diseases receive appropriate care and have favorable outcomes.40 Once developed for newborn screening, population-scale communication quality-assurance techniques may be applicable to many other communication problems in health care.
To determine the frequency and type of AUs, we developed a new explicit-criteria procedure to abstract transcripts of conversations between pediatric residents and standardized parents of a fictitious infant whose newborn screening test results suggested carrier status for CF or SCH. Methods were approved by institutional review boards at Yale and the Medical College of Wisconsin.
The participants were all residents beyond their first year in a prominent pediatric residency program. Residents were scheduled in groups of 4 to attend a workshop entitled “Communicating Genetic Test Results: Newborn Screening for Cystic Fibrosis and Sickle Cell Hemoglobinopathy.” The residents were told that the objectives of the workshop were to “learn about newborn screening” and to “practice your communication with 2 simulated patients.” Each workshop began with a 10-minute review of newborn screening, CF, SCH, and autosomal recessive inheritance. The review did not address how to communicate results. The workshops were part of the official curriculum, but residents gave informed consent and were free to decline the use of their tapes in research.
Each resident was audiotaped in 2 separate encounters. A handout described the screening result and some background data but did not suggest how to inform the parent. In the SCH carrier scenarios, the handout reported a screening result of hemoglobin F, A, and S, a result that had been presented in the review session as definitely indicating that an infant is an SCH carrier. In the “likely CF carrier” scenarios, the handout reported an elevated screening immunoreactive trypsinogen level and the presence of 1 ΔF508 mutation with no multiallele follow-up screening. The review sessions had presented such a result as suggesting that the infant was probably a carrier but still had a 5% to 10% chance of having the disease because of an undetected allele.42 The infant's mother and father were both portrayed as adopted so that the session could focus on risk communication rather than on taking a family history.
During the year, 6 female standardized parents worked on the project; each was chosen to plausibly depict the age and ethnicity of a mother of an infant with CF or SCH. The order of encounters for each resident was randomly determined in double-crossover fashion over the 2 diseases and 2 script types. The 2 script types were included for a separate validation study of the new script type, the Brief Standardized Communication Assessment. The Brief Standardized Communication Assessment is a streamlined version of a standardized patient script that requires less training, discourages improvisation, and encourages counseling to continue via nonleading questions such as “Is there anything else I should know?” The comparison script resembled scripts used in most educational settings, which seek to make the encounter realistic by asking the patient to use acting skills and improvise a 2-way dialogue.43 All standardized parents were coached not to appear anxious, to avoid requests for clarification, and to adopt a “polite nod” body language that does not necessarily denote understanding. These instructions helped our analysis focus on AUs rather than on the resident's ability to re-explain for parents who appear confused (an important skill but not the subject of this study).
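The double-crossover assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes each resident saw each disease once and each script type once across the 2 encounters, with the disease-script pairing and the encounter order each randomized per resident. The function name `assign_encounters` and the seed value are hypothetical.

```python
import random

DISEASES = ("CF carrier", "SCH carrier")
SCRIPTS = ("Brief Standardized Communication Assessment", "comparison script")


def assign_encounters(resident_ids, seed=None):
    """Return {resident_id: [(disease, script), (disease, script)]}.

    Assumption: each resident sees each disease once and each script type
    once; which script is paired with which disease, and which encounter
    comes first, are randomized independently for each resident.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible schedule
    schedule = {}
    for rid in resident_ids:
        scripts = list(SCRIPTS)
        rng.shuffle(scripts)                 # randomize disease-script pairing
        pairs = list(zip(DISEASES, scripts))
        rng.shuffle(pairs)                   # randomize encounter order
        schedule[rid] = pairs
    return schedule


if __name__ == "__main__":
    # Hypothetical workshop group of 4 residents.
    for rid, encounters in assign_encounters(range(1, 5), seed=2007).items():
        print(rid, encounters)
```

Shuffling the pairing and the order separately means all 4 possible schedules (2 pairings × 2 orders) are equally likely for each resident; a stricter design could instead balance the 4 schedules across residents.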
We taped 32 residents in 2 interviews apiece, but taping equipment failed during part of 5 interviews, so the final sample consisted of the remaining 59 transcripts. Tapes were transcribed verbatim and proofread for accuracy by a board-certified pediatrician. We used a sentence diagramming procedure to divide transcripts into individual “statements,” each with 1 subject and 1 predicate.
We adapted the transcript abstraction procedure from methods used in medical record review.44 Abstractors were guided by a quality improvement–style data dictionary derived from existing guidelines.20,21,25,27 The data dictionary contained explicit-criteria definitions and examples for 4 different types of AUs (Table 1) and for the advance request for questions. For a “definite” designation, the AU had to be followed by a pause for parent response, even if the response was only a simple continuer like “uh huh.” A “partial” AU designation was used for AUs that were not followed by a pause or that used a leading syntax (eg, “You don't have any questions, do you?”). An “OK?” question AU was counted only when the word “OK” was transcribed as an interrogative because of rising voice pitch.
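The definite/partial rule above can be expressed as a small decision function. This is a sketch, assuming that an abstractor has already answered a few yes/no questions about each candidate statement; the boolean parameter names and the `Criteria` scale are hypothetical stand-ins, and the full data dictionary (Table 1) is not reproduced here.

```python
from enum import IntEnum


class Criteria(IntEnum):
    """Ordinal rating for one candidate AU, from lowest to highest."""
    ABSENT = 0
    PARTIAL = 1
    DEFINITE = 2


def rate_au(is_au_attempt, followed_by_pause, leading_syntax,
            is_ok_question=False, ok_is_interrogative=True):
    """Rate one statement under the definite/partial/absent scheme.

    - An "OK" counts only if it was transcribed as an interrogative
      (rising voice pitch); otherwise it is not an AU at all.
    - Definite: followed by a pause for the parent's response (even a
      continuer such as "uh huh") and not phrased with leading syntax.
    - Partial: no pause for response, or leading syntax.
    """
    if not is_au_attempt:
        return Criteria.ABSENT
    if is_ok_question and not ok_is_interrogative:
        return Criteria.ABSENT
    if followed_by_pause and not leading_syntax:
        return Criteria.DEFINITE
    return Criteria.PARTIAL


if __name__ == "__main__":
    # "You don't have any questions, do you?" followed by a pause:
    print(rate_au(True, followed_by_pause=True, leading_syntax=True).name)
```

Using an ordered `IntEnum` keeps the 3 levels comparable, which is convenient if the ratings are later fed into an ordinal agreement statistic.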
Half of the transcripts were abstracted in duplicate to assess interabstractor reliability, and following the suggestion of Feinstein,45 half of the duplicates were discussed afterward to ensure quality control and consistency. Discrepancies between abstractors were resolved automatically by a spreadsheet to avoid subjective judgment.
Statistical analysis was done using JMP software (SAS Institute, Cary, NC). Interabstractor reliability was calculated using the Cohen κ method with an adaptation for ordinal agreement.46 Variables, such as the presence of the various AU types, were analyzed using 1-way analysis of variance for continuous responses across categorical variables and the χ² test for categorical responses.
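An ordinal adaptation of the Cohen κ is usually a weighted κ, in which disagreements between distant categories count more than disagreements between adjacent ones. The paper does not state which weights were used, so the sketch below assumes linear weights over an absent/partial/definite scale coded 0, 1, 2; it illustrates the technique rather than reproducing the authors' JMP calculation, and `weighted_kappa` is a hypothetical helper.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale 0..k-1.

    Disagreement weight for categories i and j is |i - j| / (k - 1)
    (linear) or its square (quadratic).
    kappa_w = 1 - observed disagreement / chance-expected disagreement.
    """
    k = n_categories
    if k < 2:
        raise ValueError("need at least 2 categories")
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if any(r not in range(k) for r in list(ratings_a) + list(ratings_b)):
        raise ValueError("ratings must be integers in 0..k-1")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    n = len(ratings_a)
    joint = Counter(zip(ratings_a, ratings_b))  # observed pair counts
    marg_a = Counter(ratings_a)                  # rater A category counts
    marg_b = Counter(ratings_b)                  # rater B category counts

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed = sum(weight(i, j) * joint[(i, j)] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
                   for i in range(k) for j in range(k))
    if expected == 0:
        # Both raters used one identical category throughout: kappa undefined.
        return float("nan")
    return 1 - observed / expected


if __name__ == "__main__":
    # Hypothetical ratings: 0 = absent, 1 = partial, 2 = definite.
    abstractor_1 = [2, 1, 1, 0, 2, 1, 0, 1]
    abstractor_2 = [2, 1, 0, 0, 2, 1, 0, 1]
    print(round(weighted_kappa(abstractor_1, abstractor_2, 3), 3))
```

As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score(a, b, labels=[0, 1, 2], weights="linear")` should agree, because scaling all weights by 1/(k - 1) cancels in the ratio; passing `labels` explicitly keeps an unused middle category from shifting the distances.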
Quality improvement interventions often involve feedback to a targeted subgroup most in need of improvement, so, as in our earlier studies,14,15,17,47 this analysis included a procedure for adapting data for clinician feedback. The problem with feedback about AU questions is that no AU taxonomy is familiar to most clinicians. To help categorize AU quality indicator performance for future interventions, scores were grouped ordinally using the familiar letter grades of A through F, with C and lower grades denoting clinicians in need of improvement. Comparisons across transcripts were made using a grade point average system of zero to 4 points. Given recent recommendations for requests for a teach-back in the health literacy literature,21 a definite-criteria example of this AU type was used to define an A grade. At the other end, an F grade was used for transcripts that contained no AUs of any type, and a D grade for transcripts with only AU attempts such as “OK?” questions or partial-criteria AUs. This left the C grade for close-ended AUs and the B grade for open-ended AUs.
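The grading scheme above can be sketched as a small function, assuming that each transcript receives the grade of its highest-ranked AU (which is what leaves C for close-ended and B for open-ended AUs). The AU type labels, the function names, and the input format are hypothetical conveniences, not part of the published method.

```python
# Grade points on the 0-to-4 scale used for comparisons across transcripts.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def transcript_grade(definite_types, has_partial_au):
    """Letter grade for one transcript, set by its highest-ranked AU.

    definite_types: AU types in the transcript that met definite criteria,
        using the hypothetical labels "teach-back", "open-ended",
        "close-ended", and "ok".
    has_partial_au: True if at least one partial-criteria AU was abstracted.
    """
    if "teach-back" in definite_types:
        return "A"
    if "open-ended" in definite_types:
        return "B"
    if "close-ended" in definite_types:
        return "C"
    if "ok" in definite_types or has_partial_au:
        return "D"  # AU attempts only
    return "F"      # no AUs of any type


def grade_point_average(grades):
    """Mean grade points over a non-empty list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def needs_feedback(grade):
    """A grade of C or lower marks a transcript as needing improvement."""
    return GRADE_POINTS[grade] <= GRADE_POINTS["C"]


if __name__ == "__main__":
    grades = [
        transcript_grade({"close-ended", "ok"}, has_partial_au=True),
        transcript_grade(set(), has_partial_au=True),
        transcript_grade({"open-ended"}, has_partial_au=False),
    ]
    print(grades, grade_point_average(grades),
          [needs_feedback(g) for g in grades])
```

Keeping the grade-to-points mapping in one dictionary means the feedback threshold and the grade point average stay consistent if the scale is later revised.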
Descriptive data on the participants (Table 2) were similar to those of the population of the residency program at the time of the study. Interviews ranged from 5 to 20 minutes (mean, 9.8 minutes). A total of 144 definite AUs and 551 partial AUs were identified in 102,135 words of transcribed conversation. Interabstractor reliability corrected for chance was κ = 0.93, with separate coefficients of 1.0 for advance requests for questions, 0.33 for open-ended questions (which were rare), 0.85 for close-ended AUs, and 0.95 for “OK?” questions.
Only 26 of 59 transcripts (44.1%) contained at least 1 AU meeting definite criteria (Table 3). Among these, the mean (SD) number of definite AUs per transcript was 5.5 (7.6). Definite criteria for an advance request for questions were found in 4 of 59 transcripts (6.8%). There were no statistically significant associations between the number of AUs and the resident's sex or year in residency, duration of counseling, or a screening result of CF or SCH carrier status (power was 85% to detect a difference of 6 AUs).
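A power statement like the one above can be illustrated with a normal approximation to a two-sided, two-sample comparison of means. This is only a sketch: the authors do not report the inputs behind the 85% figure, so the SD of 7.6 (reported above for the 26 transcripts with definite AUs), the near-even split of 59 transcripts into 2 groups (as in a CF vs SCH comparison), and α = .05 are all assumptions, and `approx_power_two_groups` is a hypothetical helper.

```python
from statistics import NormalDist


def approx_power_two_groups(difference, sd, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means.

    Assumes a common standard deviation `sd` in both groups.
    """
    z = NormalDist()  # standard normal distribution
    se = sd * (1 / n1 + 1 / n2) ** 0.5          # SE of the mean difference
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = difference / se                     # standardized true difference
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Illustrative inputs only: a 6-AU difference, SD 7.6, 30 vs 29 transcripts.
    print(round(approx_power_two_groups(6, 7.6, 30, 29), 2))
```

With these assumed inputs the function gives a power of about 0.86; other plausible inputs would give different values, so the result should not be read as a reconstruction of the authors' calculation.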
The percentages of transcripts meeting definite criteria for each AU type are shown in Table 3. Only 3 transcripts contained definite criteria for an open-ended AU, and no transcripts contained definite criteria for a request for a teach-back. The less effective close-ended and “OK?” question AUs were far more common. Fortunately, the “OK?” question was the sole type of AU in only 3 of the 26 transcripts with definite AUs (11.5%).
As mentioned earlier, the definite/partial/absent scheme helped to differentiate between AUs and statements that may have been intended as AUs but were less likely to be effective. When abstractors' partial ratings were considered, at least 1 AU “attempt” was seen in an additional 31 transcripts (52.5%), so that 57 of 59 transcripts met either definite or partial criteria for at least 1 AU. With partial criteria included, the overall mean (SD) number of AUs increased to 12 (14.9) per transcript, but there were still no statistically significant associations with case or resident characteristics. The AU type most responsible for this increase was the “OK?” question, which was seen with no pause for response in 21 of the 31 new transcripts (63.6%), with up to 63 instances in a single transcript.
Under the letter-grade definitions proposed earlier, the mean (SD) grade point average would have been 1.4 (0.7), a D+ average. Residents were moderately consistent across their 2 transcripts: 22 residents (73%) had the same letter grade for both transcripts, while 6 (20%) differed by only 1 grade. No transcripts would have qualified for an A grade because none of the residents requested a teach-back from either of the standardized parents. Three transcripts (5%) would have received a B grade, 20 transcripts (34%) met criteria for a C grade, and 34 transcripts (58%) met criteria for a D grade. Two transcripts would have received an F grade because they contained no partial or definite AUs at all.
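As a check on the arithmetic, the grade counts reported above (0 A, 3 B, 20 C, 34 D, and 2 F across 59 transcripts) reproduce the reported mean on the 0-to-4 scale:

```latex
\text{GPA} = \frac{4(0) + 3(3) + 2(20) + 1(34) + 0(2)}{59} = \frac{83}{59} \approx 1.41
```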
Thus, if transcripts qualified for targeted feedback on the basis of a C or lower grade, a total of 56 transcripts (94.9%) would have received feedback, including at least 1 transcript from every resident. This is far from the “targeted” feedback preferred in many quality improvement projects.
The purpose of this study was to investigate pediatric residents' efforts to assess parental understanding during discussions about a positive newborn screening test result. Nearly all transcripts included at least 1 AU attempt when partial-criteria AUs were counted, but fewer than half met definite criteria for any AU, and most AUs were of the close-ended or “OK?” question type. Only 3 residents included an open-ended AU, and none of the residents asked for a teach-back, the communication behavior thought to be most effective at assessing understanding.21
These findings have important implications for medical education, newborn screening, and health care in general. Physicians are often asked to explain complicated subjects to patients, but unless patient understanding is explicitly verified, the physician may never know whether counseling was effective. The stakes are high: in the case of newborn screening, misunderstandings can lead to anxiety, depression, vulnerable child syndrome, stigmatization, or reproductive or medical decisions that the parents later come to regret.30-38 Physicians may be less likely to overlook these complications if they can be taught to include AU questions in counseling. Such a focused educational strategy seems more like the often-disparaged “bag-of-tricks” model of communication than like the global paradigm that is most common in educational settings. The difference between focused and global perspectives was also anecdotally evident in the educational workshop aspect of the project, in which taped conversations were evaluated much more favorably than they might have been if preceptors had had access to AU quality indicator scores at the time of evaluation. If such discrepancies between global and specific perspectives are found in future studies, medical schools and residency programs should consider implementing more behavior-specific teaching and evaluation methods.
On a population scale, where a traditional educational approach would require too many resources and too much time from physicians, our methodological approach may finally enable quality improvement professionals to assess and improve communication. To evaluate the feasibility of our methods, we tracked time and expenses, and we project that future quality improvement projects will be achievable with minimal training at a cost of less than $50 per physician. Use of an explicit-criteria abstraction technique instead of a subjective rating scale should help reduce bias and improve interabstractor reliability and the ability to compare or rank physicians; the κ coefficient of 0.93 is excellent for a social science method at an early stage of development.
This study was limited by its small sample size, but from a quality improvement perspective, we see some of its other limitations as strengths. Qualitative methods would have provided a richer description of conversation, but they have limited reliability and would be prohibitively expensive for use in quality improvement. Using standardized parent encounters instead of real parents for quality improvement projects avoids the logistical, privacy, and consent difficulties associated with audiotaping actual clinic visits. Simulation is also useful because a sense of being observed prompts clinicians to greater efforts; the resulting competence data approximate a ceiling for process of care, because competence is necessary but not sufficient for performance.48 Simulation also allows an equal-footing comparison across clinicians that would be impossible in analyses of actual encounters. Even so, we reduced the artificiality of simulation by holding the workshops in the residents' continuity clinic. We saw residents as ideal for this demonstration project because many are near the peak of their content knowledge on these topics.
Further study will be necessary to investigate a possible relationship between AU use and communication outcomes, but such studies will need to examine more complex variables than whether an AU appears anywhere in the conversation. We have therefore developed a new “load” calculation to represent patterns of AU placement in conversation (M.H.F. and S. Christopher, MA, unpublished data, 2007). Until more research has been done on AU placement relative to informational content, it is important for all health care providers to at least recognize that guidelines advise assessing understanding at several points in a conversation.4,20,22,24-27
In sum, the quality of communication observed in this study was marked by many missed opportunities to assess understanding, suggesting to us a route by which many parents will develop psychological problems. Communication about newborn screening is challenging because of differences in testing methods, diseases, and families,42 and the psychosocial complications of poor communication are in part inhibiting society from implementing new screening programs. If integrated into US newborn screening practices, communication quality assurance may benefit clinicians and families and help satisfy society's ethical concerns about the risks and benefits of expanded newborn and genetic screening. Other communication quality assurance programs could also be developed to improve communication processes and outcomes in other areas of health care.
Correspondence: Michael H. Farrell, MD, Center for Patient Care and Outcomes Research, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI 53226 (email@example.com).
Accepted for Publication: July 17, 2007.
Author Contributions: Study concept and design: Farrell. Acquisition of data: Farrell and Kuruvilla. Analysis and interpretation of data: Farrell and Kuruvilla. Drafting of the manuscript: Farrell and Kuruvilla. Critical revision of the manuscript for important intellectual content: Farrell. Statistical analysis: Farrell and Kuruvilla. Obtained funding: Farrell. Administrative, technical, and material support: Farrell. Study supervision: Farrell.
Financial Disclosure: None reported.
Funding/Support: Dr Farrell is supported in part by grant K01HL072530 from the National Heart, Lung, and Blood Institute.
Additional Information: When the project began, Dr Farrell was assistant professor at the Yale Primary Care Medicine Research Center (Waterbury, Connecticut) and Dr Kuruvilla was a medical student in the Yale University School of Medicine (New Haven, Connecticut).
Additional Contributions: Jeffrey Stein, MD, Lynnea Ladouceur, MPH, and Stephanie Christopher, MA, provided valuable advice and assistance.
Farrell MH, Kuruvilla P. Assessment of Parental Understanding by Pediatric Residents During Counseling After Newborn Genetic Screening. Arch Pediatr Adolesc Med. 2008;162(3):199-204. doi:10.1001/archpediatrics.2007.55