Help or Hindrance?
By Beth W. Orenstein
Vol. 24 No. 7 P. 10
Does AI assist breast imaging? The answer is open to debate.
The American Cancer Society reports that the number of patients diagnosed with breast cancer has been steadily increasing by 0.5% a year. Breast cancer remains the number two cancer killer of women (behind lung), yet the number of deaths from the disease has not risen correspondingly. Many believe that more women are surviving breast cancer in the 2020s thanks, at least in part, to effective screening and early detection. When breast cancer is caught early, survival increases significantly, says Stamatia Destounis, MD, FACR, FSBI, FAIUM, managing partner at Elizabeth Wende Breast Care in Rochester, New York, where she has practiced as a breast radiologist since 1993.
Mammography is a key to early detection, but it is not perfect. According to the National Cancer Institute, screening mammograms miss about 20% of breast cancers. Could AI help make mammography more efficient and effective? That is the big question.
Research is mixed. A headline-making prospective study out of Sweden, published in July in the journal Lancet Oncology, found that a single radiologist reading with AI assistance detected more cancers than two radiologists “double reading” mammograms without AI. A retrospective study published online in July in Radiology: Artificial Intelligence found that a deep learning tool trained to predict future breast cancer might aid in the detection of precancerous changes in the breast in a high-risk, racially diverse population. However, two other recently published studies found that AI was not helpful and was possibly even detrimental.
One study, conducted in a laboratory reader-study setting and published in May in Radiology, found that false feedback from an AI-based decision support system might impair the performance of radiologists at every level of expertise when reading mammograms. Another study, published in July in the American Journal of Roentgenology (AJR), failed to show a benefit of AI with screening mammography and supplemental breast ultrasound in patients with dense breasts.
Constance Lehman, MD, PhD, a professor of radiology at Harvard Medical School and codirector of the Breast Imaging Research Center at Massachusetts General Hospital in Boston, says breast imaging has a head start when it comes to AI. The FDA approved the first commercial computer-aided detection (CAD) system as a second opinion for screening mammography in 1998.
“So, we’ve had a long time to study it and see the pearls and pitfalls of computers helping radiologists read mammograms,” Lehman says.
Lehman says she’s not surprised that current research reaches different conclusions about the efficiency and effectiveness of AI in interpreting mammograms. Each study, she says, has to be put in perspective based on the specific study design and methods. Lehman says she is excited about what the prospective, randomized Swedish study says about the use of AI in reading mammograms, even though it was done in Europe, where it is standard practice to have two radiologists read every exam. That’s not standard practice in the United States, she says.
The Swedish researchers looked at mammograms of more than 80,000 women who were scanned between April 2021 and July 2022. The mammograms of half the women were read by AI before a radiologist analyzed them. Two radiologists read the other group’s mammograms without the use of AI. All the radiologists in the study were highly experienced. About 20% more cancers were found in the first group (AI and radiologist) than in the group read by two radiologists without the additional technology. When AI was used, the cancer detection rate was six per 1,000, compared with five per 1,000 for radiologists alone.
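The “about 20%” figure follows directly from the two detection rates. A minimal sketch of that arithmetic, using only the rounded rates quoted above; the function and variable names are illustrative and do not come from the study:

```python
def relative_increase(new_rate: float, baseline_rate: float) -> float:
    """Return the change of new_rate relative to baseline_rate, as a fraction."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


# Rounded cancer detection rates quoted in the article (cancers per 1,000).
ai_assisted_rate = 6 / 1000
double_reading_rate = 5 / 1000

increase = relative_increase(ai_assisted_rate, double_reading_rate)
print(f"Relative increase in cancer detection: {increase:.0%}")
```

Because the inputs are rounded, the result is only approximate, which is consistent with the article describing the increase as “about” 20%.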
“They stopped halfway in the study to do a safety check,” Lehman notes. “Because they are actually using this clinically, they wanted to make sure that the single radiologist wasn’t finding fewer cancers than their standard of double reads. And they were very pleased to see that, actually, the single radiologist with AI had a 20% increase in cancer detection.”
The recall rate was 2.2% by the single radiologist with AI and 2% by the standard “double reader” system, Lehman says. “Although some believe these findings are not relevant to screening in the US, given we rarely have double reader settings, I think these findings should give us pause. Is it possible we could follow a similar pathway and significantly reduce our false positive rates in the US without a drop in our cancer detection rates? That is intriguing.”
Lehman says some have resisted comparisons between the United States and Europe “because of differences in our medical legal systems and our increased attention to detecting DCIS [ductal carcinoma in situ]. That makes sense,” Lehman says, “but at the end of the day, you have to look at the hard science and the outcomes and say, which program would you rather be in? What a great outcome of low (2.2%) recall rates and high (6.1 cancers per thousand) cancer detection. It’s hard to argue with the success of those results.”
Given how important early detection is, anything that helps make radiologists more accurate when reading mammograms is “wonderful,” Destounis says.
The Swedish group that used AI to read mammograms found it reduced reading workload by 44%. Kristina Lång, MD, PhD, an associate professor of radiology diagnostics from Lund University and a principal investigator for the study, says another “apparent potential” of using AI is that it could help reduce the workload of radiologists, which is much needed given the workforce shortage in Europe. “That, however, remains to be assessed in the full study population of 100,000 women and a two-year follow up (primary endpoint),” she says.
More Studies Needed
Lehman says she has no doubt that AI is going to be a central component of image interpretation going forward. “We are past the point of no return. We can’t go back to not having AI. It is so clear that it can help us.” That’s why, she says, studies such as the one in Radiology that found AI could bias interpretation are just as important as the Swedish study. “At this time, many breast imaging practices are using AI tools. They are essentially replacing CAD tools with more modern AI tools. Because it’s becoming more and more routine, we need to know the impact on the outcomes of patients and the accuracy of our mammographic interpretations with and without AI.”
The goal of the Radiology study was to determine how automation bias can affect radiologists at varying levels of experience when reading mammograms aided by AI. Researchers from institutions in Germany and the Netherlands conducted a prospective study in which 27 radiologists read 50 mammograms. The readers provided their BI-RADS assessment, assisted by an AI system. While BI-RADS categorization is not a diagnosis, it is crucial in helping physicians determine next steps.
The mammograms were presented in two randomized sets. The first was a training set of 10, in which the AI suggested the correct BI-RADS category. The second set contained incorrect BI-RADS categories, purportedly suggested by AI, in 12 of the remaining 40 mammograms. The radiologists were significantly worse at assigning the correct BI-RADS scores in the cases where the purported AI suggested an incorrect BI-RADS category. Inexperienced radiologists assigned the correct BI-RADS in nearly 80% of cases in which the AI suggested the correct BI-RADS category. When the AI system suggested the incorrect category, their accuracy fell to less than 20%. The results were only slightly better with experienced radiologists—those with more than 15 years of experience on average. They saw their accuracy fall from 82% to 45.5% when the purported AI suggested the incorrect category.
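To put the experienced readers’ numbers in perspective, the drop can be expressed both in percentage points and relative to their starting accuracy. This is a minimal sketch using only the two figures quoted above (82% and 45.5%); the helper name is hypothetical and the calculation is not taken from the study itself:

```python
def accuracy_drop(with_correct_ai: float, with_incorrect_ai: float) -> tuple[float, float]:
    """Return (drop in percentage points, drop as a fraction of the starting accuracy)."""
    if with_correct_ai <= 0:
        raise ValueError("with_correct_ai must be positive")
    absolute_points = with_correct_ai - with_incorrect_ai
    relative_fraction = absolute_points / with_correct_ai
    return absolute_points, relative_fraction


# Accuracy figures (in percent) quoted in the article for experienced radiologists.
points, fraction = accuracy_drop(82.0, 45.5)
print(f"Drop: {points:.1f} percentage points ({fraction:.1%} of the starting accuracy)")
```

The figures for inexperienced readers are given only as “nearly 80%” and “less than 20%,” so they are left out of the calculation rather than guessed at.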
Study lead author Thomas Dratsch, MD, PhD, from the Institute of Diagnostic and Interventional Radiology at the University Hospital Cologne in Germany, says the researchers anticipated that inaccurate AI predictions would influence the decisions made by the radiologists in the study, particularly for those with less experience. However, in a press release about the study, he said, “It was surprising to find that even highly experienced radiologists were adversely impacted by the AI system’s judgments, albeit to a lesser extent than their less seasoned counterparts.” The researchers say their results show why the effects of human-machine interaction must be carefully considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI.
No Additional Benefit?
The AJR study, published in July, included 1,325 women (mean age, 53 years) with dense breasts who underwent both screening mammography and supplemental breast ultrasound within a one-month interval from January 2017 to December 2017. Prior mammograms were available for comparison in 91.2% of exams, and prior ultrasounds in 91.8%. Fifteen radiologists (five who were on staff and 10 who were fellows) interpreted mammography and ultrasound examinations. A commercially available AI algorithm was used to retrospectively evaluate mammographic examinations for cancer presence. Screening performances were then compared among mammography, AI, ultrasound, and test combinations, using generalized estimating equations.
A benign diagnosis required at least 24 months of imaging stability. Ultimately, mammography with AI, mammography with ultrasound, and mammography with both ultrasound and AI showed recall rates of 14.9%, 11.7%, and 21.4%; sensitivity of 83.3%, 100%, and 100%; specificity of 85.8%, 89.1%, and 79.4%; and accuracy of 85.7%, 89.2%, and 79.5%, respectively. The researchers concluded that, for patients with dense breasts undergoing screening in the incidence setting, a commercial AI tool did not provide additional benefit to mammography with supplementary ultrasound.
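For readers less familiar with these measures, the sketch below shows how recall rate, sensitivity, specificity, and accuracy are conventionally derived from four outcome counts. The class name and the counts are hypothetical and chosen only for illustration; they are not the AJR study’s data, and the study’s own definitions may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Outcome counts for one screening strategy (a recall counts as test-positive)."""
    true_positives: int   # women with cancer who were recalled
    false_negatives: int  # women with cancer who were not recalled
    false_positives: int  # cancer-free women who were recalled
    true_negatives: int   # cancer-free women who were not recalled

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_negatives
                + self.false_positives + self.true_negatives)

    def recall_rate(self) -> float:
        # Share of all screened women called back for further workup.
        return (self.true_positives + self.false_positives) / self.total

    def sensitivity(self) -> float:
        # Share of cancers that the strategy flagged.
        return self.true_positives / (self.true_positives + self.false_negatives)

    def specificity(self) -> float:
        # Share of cancer-free women correctly not recalled.
        return self.true_negatives / (self.true_negatives + self.false_positives)

    def accuracy(self) -> float:
        # Share of all women classified correctly.
        return (self.true_positives + self.true_negatives) / self.total


# Hypothetical counts for illustration only; NOT the AJR study's data.
example = ScreeningCounts(true_positives=9, false_negatives=1,
                          false_positives=100, true_negatives=890)

print(f"Recall rate: {example.recall_rate():.1%}")
print(f"Sensitivity: {example.sensitivity():.1%}")
print(f"Specificity: {example.specificity():.1%}")
print(f"Accuracy:    {example.accuracy():.1%}")
```

Because cancers are rare in a screening population, accuracy tracks specificity closely, which is why the paired figures reported above are nearly identical.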
Destounis says that because the study is a retrospective review and not a prospective one, the findings may be skewed. “The radiologist had knowledge of the mammogram prior to the ultrasound interpretation,” she says. She also notes that, looking at the results, it appears as though AI “did quite well.” Mammography detected eight cancers, including four invasive ductal carcinomas (IDC) and four DCIS; standalone AI detected nine cancers, including five IDC and four DCIS; and ultrasound detected eight cancers, including six IDC and two DCIS. Standalone AI detected one IDC and one DCIS that were not identified by mammography and three DCIS that were not detected by ultrasound. “So, I would say that AI would have been helpful if used in conjunction with mammography or ultrasound imaging,” Destounis says.
Si Eun Lee, MD, PhD, of Yonsei University College of Medicine in Yongin, Korea, the lead author of the AJR study, says the use of ultrasound is powerful in Korea due to a higher prevalence of dense breasts in its population. “Prior studies have demonstrated that AI can help in detecting more cancers and can even enhance the specificity of mammography,” she says. “A significant portion of this can be addressed by ultrasound, even if it demands extensive manpower. The second distinction is that our study population is almost exclusively in an incidence setting. Compared to older studies, this could boost the performance of mammography and even ultrasound. Hence, in this context, the utility of AI might be somewhat restricted.”
Lee believes that even in populations screened solely by mammography without ultrasound, AI proves beneficial in reducing missed cancers and decreasing false positive recalls. “For this, it’s vital to understand the characteristics of the AI program in use,” she says. For example, Lunit Insight, which is used often in Korea, has an acceptable false positive rate that is similar to or sometimes lower than that of radiologists. “Users should consider the findings from AI and weigh them alongside other diagnostic tools,” she says. “It’s also essential to recognize which types of cancers are commonly overlooked, such as circumscribed masses, grouped calcifications, or subtle asymmetries.”
Could AI be modified in a way that would make it more effective in this context? Lee says that, among the many AI programs available, she is aware of only one that bases its evaluation on comparisons with prior studies; most do not. “I believe such a feature could help reduce false recalls,” she says. “Also, continuous refinement with complex cases might boost the program’s accuracy. However, mammography has an inherent limitation due to the masking effect. This means AI will have its limitations, necessitating support from supplementary tools such as ultrasound or MR.”
Lehman says AI has the potential to help radiologists provide more accurate interpretations of mammography and other breast imaging exams. She believes AI is “showing great promise” but that it’s not ready—just yet—to be a standalone tool for interpretation.
Destounis adds that, as more research is conducted and more AI algorithms are developed, “I see AI being used with success in many modalities, not just mammography.”
Beth W. Orenstein, of Northampton, Pennsylvania, is a freelance medical writer and a frequent contributor to Radiology Today.