Study: AI models require comprehensive preclinical testing to uncover safety concerns

An AI algorithm used to detect hip fractures outperformed human radiologists, but researchers found flaws that would prevent its safe use without further testing, according to a study published in The Lancet.

The researchers evaluated a deep learning model designed to detect proximal femoral fractures on frontal X-rays of patients presenting to the emergency department, trained on data from the Royal Adelaide Hospital in Australia.

They compared the model's accuracy against five radiologists on data from the Royal Adelaide Hospital, and then performed an external validation study using images from Stanford University Medical Center in the US.

Lastly, they performed an algorithmic audit to characterise the model's errors.

In the Royal Adelaide study, the area under the receiver operating characteristic curve (AUC) measuring the AI model's performance was 0.994, compared with an AUC of 0.969 for the radiologists. On the Stanford data, the model's performance was measured at an AUC of 0.980.
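The AUC summary statistic the study relies on can be read as the probability that a randomly chosen fracture case receives a higher model score than a randomly chosen non-fracture case. A minimal sketch of that calculation, with labels and scores invented purely for illustration (not the study's data):

```python
# Toy sketch: AUC as the probability that a random positive case
# outscores a random negative case. All values below are made up.

def auc(y_true, y_score):
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]              # 1 = fracture present
y_score = [0.1, 0.3, 0.8, 0.9, 0.65, 0.7, 0.4, 0.6]
print(auc(y_true, y_score))  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 1.0 means every fracture case outscores every non-fracture case; 0.5 is chance-level ranking.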

However, the researchers found that the model could not be safely deployed in a new setting without preparation.

“While the AI system's discrimination (AUC) was maintained on external testing, the reduction in sensitivity at the pre-specified operating point (from 95.5 to 75.0) would make the system unusable in the new environment,” the authors wrote.

“While this shift could be mitigated by choosing a new operating point, as shown when we performed the same analysis post hoc (where a small decrease in precision indicates a small reduction in discriminative performance), this would require a localisation process to determine the new operating point in the new environment.”
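The "localisation process" the authors describe amounts to re-deriving the model's decision threshold on data from the new site rather than reusing the one chosen at the development site. A minimal sketch of that idea, with invented validation data and an illustrative sensitivity target (not the study's method or values):

```python
# Sketch of operating-point localisation: pick the highest score threshold
# on a local validation set that still meets a target sensitivity.
# Data and the target below are illustrative assumptions only.

def localize_threshold(y_true, y_score, target_sensitivity=0.95):
    """Return the highest threshold whose sensitivity meets the target."""
    pos_scores = sorted((s for y, s in zip(y_true, y_score) if y == 1),
                        reverse=True)
    n_pos = len(pos_scores)
    # Lowering the threshold admits more positives; stop once enough are caught.
    for k, s in enumerate(pos_scores, start=1):
        if k / n_pos >= target_sensitivity:
            return s  # classify "fracture" when score >= s
    return min(pos_scores)

# Hypothetical local validation data from the new site:
y_true  = [1, 1, 1, 1, 1, 0, 0, 0]
y_score = [0.95, 0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1]
threshold = localize_threshold(y_true, y_score, 0.95)
print(threshold)  # 0.3 -- only this low a cutoff catches 95% of fractures here
```

The trade-off the authors note follows directly: keeping sensitivity fixed in a new environment can force the threshold down, admitting more false positives, which is why the operating point must be re-validated locally rather than transferred.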

Although the model performed well overall, the audit also found that in some cases it made errors that a human radiologist would not make, or errors that were entirely unexpected.

“While the model performs very well at the task of proximal femoral fracture detection when evaluated with summary statistics, the model was found to make errors that are non-human and potentially unpredictable, including in cases that humans find easy to interpret,” the authors write.

Why it matters

The researchers said the study shows the importance of rigorous testing before implementing AI models.

“The model was more effective than the radiologists it was tested against, and maintained its performance on external validation, but showed some unexpected limitations during further testing. This has implications for future testing and decision-making,” they wrote.


Some companies are already using AI to analyse medical images. Last month, Aidoc received two FDA 510(k) clearances for programs that flag and triage potential pneumothorax and brain aneurysms. Another company, Qure.ai, raised $40 million in funding shortly after it received FDA clearance for a tool that helps providers assess the placement of breathing tubes on chest X-rays.

While proponents of AI argue that it can improve outcomes and reduce costs, research has shown that many of the datasets used to develop these tools come from the US and China, which may limit their applicability in other countries. Bias is also a major concern for providers and researchers, as it can exacerbate health disparities.
