A new report in the Journal of the American Medical Informatics Association has shown that real-world data contained in unstructured narratives has big predictive value when it comes to clinical research.
WHY IT MATTERS
While structured data in the electronic health record has obvious value, the research in JAMIA suggests that real-world data captured in unstructured notes offers more accuracy when trained algorithms are used to mine it.
The challenges of making good use of unstructured data have been well documented, and researchers in this case depended on artificial intelligence technology from Verantos (whose founder, Stanford professor Dr. Dan Riskin, was an investigator on the study) to mine it for insights. Even so, the details contained in these EHR narratives, with their real-world insights into patient history, conditions, procedures and more, were more useful than structured data in predicting coronary artery disease.
“With growing availability of digital health data and technology, health-related studies are increasingly augmented or implemented using real world data,” wrote the researchers, led by Tina Hernandez-Boussard, associate professor of biomedical informatics, data science and surgery at Stanford University School of Medicine.
“Recent federal initiatives promote the use of RWD to make clinical assertions that influence regulatory decision-making,” the researchers said. “Our objective was to determine whether traditional real world evidence techniques in cardiovascular medicine achieve accuracy sufficient for credible clinical assertions, also known as ‘regulatory-grade’ RWE.”
For the retrospective observational study, which used six years’ worth of deidentified EHR data, a specified set of clinical concepts was mined from both structured (using standard query techniques) and unstructured EHR data (using AI).
“The dataset included 10,840 clinical notes,” researchers explained. “Individual concept occurrence ranged from 194 for coronary artery bypass graft to 4502 for diabetes mellitus.”
Granular detail of that kind helped the real-world evidence drawn from narrative notes support more accurate predictive modeling, they found.
With structured EHR data, or EHR-S, “average recall and precision were 51.7% and 98.3%, respectively,” according to the report. For unstructured data (EHR-U) those numbers were 95.5% and 95.3%.
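To make those two measures concrete: recall is the share of true occurrences of a clinical concept that a method actually finds, while precision is the share of its findings that are correct. The short Python sketch below shows the arithmetic. The counts in it are hypothetical, chosen only so the results land near the reported averages; they are not taken from the study, and the function names are illustrative.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of extracted concept mentions that are correct."""
    found = true_pos + false_pos
    return true_pos / found if found else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real concept mentions that were actually extracted."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


# Hypothetical counts (NOT from the study): 1,000 true mentions of a concept.
# A structured-data query finds 517 of them and adds 9 false hits.
ehr_s = {"tp": 517, "fp": 9, "fn": 483}
# An unstructured-notes extractor finds 955 of them and adds 47 false hits.
ehr_u = {"tp": 955, "fp": 47, "fn": 45}

for name, c in (("EHR-S", ehr_s), ("EHR-U", ehr_u)):
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name}: precision={p:.1%} recall={r:.1%}")
```

In this illustration, the structured query is almost never wrong about what it finds, yet it misses close to half of the true cases, which is the pattern the reported EHR-S figures describe.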
The researchers concluded that, “overall, EHR-S did not meet regulatory grade criteria, while EHR-U did. These results suggest that recall should be routinely measured in EHR-based studies intended for regulatory use. Furthermore, advanced data and technologies may be required to achieve regulatory grade results.”