Who’s Minding the AI?
By Beth W. Orenstein
Radiology Today
Vol. 23 No. 1 P. 22

Experts highlight the need to regulate AI/ML devices in radiology.

Although the immediate past secretary of the US Department of Health and Human Services had submitted a proposal to exempt certain medical devices from the FDA’s 510(k) premarket notification requirements, the exemption never took effect. The devices that would have been covered include several types that radiologists use in patient care, such as AI/machine learning (ML)-enabled software for computer-assisted/aided triage, detection, or diagnosis. After several health care organizations, including the ACR, RSNA, and the Society for Imaging Informatics in Medicine, expressed deep concerns, the Biden administration announced in April 2021 that it had withdrawn the controversial proposal. The withdrawal has raised new questions about how AI/ML in medicine, and particularly in radiology, should be regulated.

The question that deserves much attention, says Raymond Geis, MD, FSIIM, FACR, an adjunct associate professor of radiology at National Jewish Health in Colorado and a senior scientist at the ACR Data Science Institute, isn’t whether AI/ML in medicine should be regulated. The answer to that is “absolutely,” and few would argue otherwise, he says. “AI/ML software is helping make decisions that affect how a patient is cared for or a patient’s health. You need some sort of regulation to do that. The real question is: ‘What kind of regulation?’”

The FDA largely regulates software based on its intended use and on the level of risk to patients in the event of inaccuracies. The FDA considers software a “medical device” if it is intended to treat, diagnose, cure, mitigate, or prevent disease or other conditions. It categorizes medical device products that rely on AI/ML as “software as a medical device.” Examples of FDA-cleared or -approved AI-enabled products include IDx-DR, software that analyzes images of the eye to determine whether the patient should be referred to an eye professional; OsteoDetect, software that analyzes X-rays for signs of distal radius fracture, marking the location to aid in its detection and diagnosis; and ContaCT, software that analyzes CT images of the brain for indications typically associated with a stroke and immediately texts a specialist if a large vessel blockage is suspected.

Most devices that include AI software tools fall under Class II: devices considered to be of moderate to high risk. Most Class II devices undergo 510(k) review, which requires the manufacturer to demonstrate that its device is “substantially equivalent” to an existing device on the market, with the same intended use and technological characteristics. Some Class II device manufacturers can instead submit a De Novo request to the FDA. A De Novo request can be used for devices that are novel but considered to be lower risk because their safety and underlying technology are well understood.

Jan Witowski, MD, PhD, a postdoctoral research fellow in the department of radiology at New York University Grossman School of Medicine, believes Class II is sufficient for most AI/ML-enabled devices. In an ideal world, he says, no regulation would be necessary. However, he notes, medical devices, especially those that are AI-enabled, are inevitably complex, and at least some oversight is necessary because neither their manufacturers nor their end users can be expected to know all the potential risks associated with them. “There’s a lot of discussion whether the current regulations are tight enough,” Witowski says. “But the current regulations for these devices are very strict compared with what happens at the research level.”

Geis says regulation of AI/ML devices is a “real tough problem at the moment.” He believes “there is an almost unanimous feeling these things have to be validated, at least among the clinical community.” But he also says, “This question of how we are going to define what validated means is up in the air. We don’t have answers yet.”

More Rigor Needed
Bibb Allen, MD, FACR, a radiologist in Birmingham, Alabama, and CMO of the ACR Data Science Institute, believes that the requirements for FDA approval of AI/ML software are not particularly rigorous. AI/ML products don’t have to go through randomized clinical trials, as pharmaceuticals and other devices do. Those trials are expensive and time-consuming, making them impractical for AI/ML software, he says. But he would at least like to see the FDA require manufacturers to be more transparent when seeking approval of their software. “The problem is that the FDA is not requiring AI/ML software developers to expose all the data they have—not how many separate sites validated their software, not the demographic data, not which manufacturers’ scanners it was tested on, and not what protocols they used,” Allen says.

Such transparency is critical for patient safety when imagers use the software, Allen says. For example, AI/ML software designed to detect lung nodules on CT scans may have been trained exclusively on an adult population. “Lung nodules in children tend to have a different appearance and tend to be smaller than those we find in adults,” Allen says. “When those algorithms are used in settings in hospitals where they’re doing both adults and children, the model breaks down.”

Allen also believes that approvals need to be particularly hard to obtain when it comes to AI/ML software that makes decisions autonomously—without input from qualified physicians. AI making patient care decisions autonomously could put patients at risk, he says. “An AI device might not find the intracranial hemorrhage, but if a radiologist reviews it and finds it, then the patient is still good,” Allen says. To date, IDx-DR is the only FDA-approved software that does not require physician input. Primary care physicians use it to analyze photos of the patient’s retina and, if the software suggests diabetic retinopathy, refer the patient to an ophthalmologist, he says. “If that happens, the ophthalmologists will do their own exam and check the results of the AI,” Allen says. “So, the patient is not really going to go on to have downstream medical care as a result. We don’t consider that a patient safety risk.”

Standardizing Datasets
Hari Trivedi, MD, an assistant professor of radiology and biomedical informatics and codirector of the HITI Lab at Emory University in Atlanta, agrees with Allen when it comes to transparency. “Right now, the challenge with FDA approval for AI models is that each company submits their own data set for evaluation,” Trivedi says. “There is little guidance for what types of patients, diseases, racial, or geographic representation must be included in that dataset.”

The playing field could be leveled, he says, if AI models could be tested against a single, standardized dataset that was representative of the target population. “If you had one standardized dataset to use for testing models with common use cases, the performance of all models could be compared against a standard benchmark,” Trivedi says. Ideally, an organizing body would create and manage a reference dataset for testing common AI models, “but the challenge is that a dataset would need to be created for each use case in radiology,” he adds.
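As a rough illustration of the idea, here is a minimal Python sketch, with invented vendors, labels, and predictions, of how models could be scored against one shared reference dataset so that their sensitivity and specificity can be compared directly:

# Hypothetical sketch: scoring several vendors' models against one shared
# reference dataset so their results are directly comparable.
# All names and data are invented for illustration.

def sensitivity_specificity(reference_labels, predictions):
    """Compute sensitivity and specificity for binary findings (1 = disease present)."""
    tp = sum(1 for truth, pred in zip(reference_labels, predictions) if truth == 1 and pred == 1)
    fn = sum(1 for truth, pred in zip(reference_labels, predictions) if truth == 1 and pred == 0)
    tn = sum(1 for truth, pred in zip(reference_labels, predictions) if truth == 0 and pred == 0)
    fp = sum(1 for truth, pred in zip(reference_labels, predictions) if truth == 0 and pred == 1)
    return tp / (tp + fn), tn / (tn + fp)

# One shared reference standard (eg, expert-adjudicated exam labels) ...
reference_labels = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]

# ... and each vendor's predictions on exactly the same exams.
vendor_predictions = {
    "vendor_a": [1, 0, 0, 1, 0, 0, 0, 1, 0, 1],
    "vendor_b": [1, 0, 1, 1, 1, 0, 0, 0, 0, 0],
}

for vendor, preds in vendor_predictions.items():
    sens, spec = sensitivity_specificity(reference_labels, preds)
    print(f"{vendor}: sensitivity={sens:.2f}, specificity={spec:.2f}")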

The idea for standardized reference data for the FDA isn’t as preposterous as it might seem, Trivedi says. The idea has been discussed in the radiology AI community for some time. “It’s my opinion,” Trivedi says, “that it is very plausible that the FDA would eventually go in that direction. It would not be without challenges, such as who gets to decide what the standard dataset is, who curates it, and what it would be used for. It is fraught with many challenges, but I don’t think that the current method of each company submitting a one-off for their own FDA clearance is the ideal method.”

Monitoring and Retraining
Bradley J. Erickson, MD, PhD, a radiologist at the Mayo Clinic in Rochester, Minnesota, and chair of the American Board of Imaging Informatics, says it is critical not only to validate results at the time of product acceptance and deployment but also to have an ongoing monitoring and, possibly, retraining mechanism in place. “Nonimaging AI tools have already shown a tendency to drift, even with seemingly codified textual input, and that is likely to be more of an issue with images and the installation of new imaging devices,” Erickson says.
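One simple form such monitoring could take, sketched here in Python with simulated scores and a commonly cited rule-of-thumb threshold rather than any validated protocol, is to compare the distribution of the model’s recent output scores against the distribution recorded at acceptance testing and flag large shifts for review:

# Hypothetical sketch of one simple drift check: compare the distribution of a
# model's recent output scores against the distribution seen at acceptance
# testing, using a population stability index (PSI). Data and threshold are
# illustrative only, not a validated monitoring protocol.
import math
import random

def psi(baseline, recent, bins=10):
    """Population stability index between two score samples in [0, 1]."""
    def fractions(scores):
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        return [(c + 1e-6) / (len(scores) + 1e-6 * bins) for c in counts]
    base_frac, recent_frac = fractions(baseline), fractions(recent)
    return sum((r - b) * math.log(r / b) for b, r in zip(base_frac, recent_frac))

random.seed(0)
baseline_scores = [random.betavariate(2, 5) for _ in range(5000)]  # simulated scores at acceptance testing
recent_scores = [random.betavariate(3, 4) for _ in range(5000)]    # simulated scores from a recent window

drift = psi(baseline_scores, recent_scores)
print(f"PSI = {drift:.3f}")
if drift > 0.2:  # commonly cited rule of thumb, not a regulatory threshold
    print("Score distribution has shifted; flag for review and possible retraining.")

The statistic and threshold are only illustrative; the point, as Erickson suggests, is that the checking continues after deployment rather than stopping at the original validation.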

In addition, the rigor of validation activities for software (including AI) should be “risk based,” ie, the higher the risk, the more important it is to have an independent review, says David Vidal, JD, regulatory director of the Center for Digital Health-Data & Analytics at Mayo Clinic. (Vidal, who was with IDx before joining Mayo, was the lawyer who secured the first FDA clearance for an AI product.) AI devices at a risk level that necessitates FDA notification are generally required to submit formal clinical investigation results, and the FDA typically looks for at least three geographically and demographically diverse independent sites to be represented, Vidal says.

How safe is safe enough when it comes to AI/ML device approval? Because each AI device is unique in its intended use and deployment, it would be difficult for the FDA to establish a general standard of safety for all AI, Erickson says.

As a result, “safe enough” is mostly defined by the AI manufacturer after conducting a risk analysis and determining that the benefits of the AI outweigh the potential risks, he says. Safety and efficacy thresholds can be communicated using certain metrics, such as sensitivity and specificity or agreement with the physician. “But, for AI especially, continuing safety requires trained people, robust processes, and the right tools, such as real-world monitoring of the AI in practice, incorporating necessary changes based on issues or complaints, and reviewing the circumstances surrounding the AI to ensure the environment doesn’t change the input to the AI—eg, a new CT manufacturer.”
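As one hypothetical example of tracking agreement with the physician, the sketch below computes Cohen’s kappa between an AI tool’s flags and radiologists’ final reads on the same exams; the data and the alert level are invented for illustration:

# Hypothetical sketch: "agreement with the physician" tracked as Cohen's kappa
# between the AI's flags and the radiologists' final reads on the same exams.
# The data and the alert level are invented for illustration only.

def cohens_kappa(ai_flags, radiologist_reads):
    """Cohen's kappa for two binary raters (1 = finding present)."""
    n = len(ai_flags)
    observed = sum(1 for a, r in zip(ai_flags, radiologist_reads) if a == r) / n
    p_ai = sum(ai_flags) / n
    p_rad = sum(radiologist_reads) / n
    expected = p_ai * p_rad + (1 - p_ai) * (1 - p_rad)
    return (observed - expected) / (1 - expected)

ai_flags          = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]
radiologist_reads = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1]

kappa = cohens_kappa(ai_flags, radiologist_reads)
print(f"Cohen's kappa = {kappa:.2f}")
if kappa < 0.6:  # illustrative alert level, not a regulatory standard
    print("Agreement with physicians is slipping; investigate before continuing use.")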

Also, Vidal says, it is necessary to ensure that the output of the AI continues to align with the clinical standard of care, which may itself change, and to revalidate the AI when, for example, material changes are made to it or to the associated software or third-party tools used to deploy it. Most decisions regarding safety and efficacy are specific to the particular product’s intended use and surrounding circumstances, Vidal says. “As a result, the FDA cannot make all decisions.” Instead, FDA regulations define general principles and practices that must be followed, eg, risk management and lifecycle management, “so that medical device manufacturers can effectively communicate with the FDA that they have done their due diligence for assuring safety and efficacy for their particular device.”

More FDA Guidance
The FDA is releasing guidance with increasing frequency. In October 2021, it released guiding principles for good machine learning practice in medical device development. In November 2021, it released draft guidance on the content of premarket submissions for software in medical devices. The FDA has also recently issued two proposed frameworks for how to regulate AI in the future. One is a pilot precertification program, which Erickson likens to airport “precheck”; companies demonstrating a robust quality culture and organizational excellence could qualify for a streamlined review of their FDA submissions.

The other is a proposed framework for modifications to AI. It would help AI manufacturers establish a predetermined change control plan with the FDA that would allow AI to be more adaptive, without needing to go back to the FDA. “Elements of the predetermined change protocol framework are already being used in cleared AI devices,” Vidal says.

AI models need to be constantly updated and improved over time. Which improvements will require reapproval from the FDA? Trivedi expects that the FDA will err on the side of caution and require new approval when manufacturers make substantial changes to a model. But is changing the location of one button that performs a particular function a substantial change? Does it affect the way the user interacts with the software? Those are the types of questions the FDA will have to decide, he says.

— Beth W. Orenstein, of Northampton, Pennsylvania, is a freelance medical writer and regular contributor to Radiology Today.