At just 11 percent, the five-year relative survival rate for pancreatic cancer is the lowest of any cancer diagnosis. Those grim odds are largely because the disease is typically caught in its advanced stages. When the disease is caught in its earliest stages, five-year survival rates can reach as high as 80 percent; however, current screening guidelines apply only to a very small fraction of the 62,000 pancreatic cancer cases that are diagnosed each year in the United States.
Investigators at Harvard-affiliated Beth Israel Deaconess Medical Center (BIDMC), in collaboration with colleagues at Massachusetts Institute of Technology, built and validated a risk prediction model to help physicians identify patients who are at high risk for developing pancreatic cancer. The team’s model, a neural network trained on de-identified data from electronic health records from 55 U.S. health care organizations, flagged patients 40 years or older as at risk of developing pancreatic cancer up to 18 months before diagnosis and caught 3.5 times as many cases as current screening guidelines would if applied to the same group. Their findings appear in eBioMedicine, part of Lancet Discovery Science.
“Most cases of pancreatic cancer are not detected until they are already advanced, and they are no longer curable,” said Limor Appelbaum, an investigator at BIDMC and an instructor at Harvard Medical School. “The people who have the opportunity to undergo screening represent about 10 percent of the pancreatic cancer cases that we know of. That’s a very small proportion. We’re trying to catch as many of the other 90 percent of cases as we can.”
Current screening guidelines target only people with an inherited predisposition to pancreatic cancer — people who have either first-degree relatives with the disease or a known gene mutation that puts them at risk of developing it. With an estimated relative risk of developing pancreatic cancer that is at least five times higher than that of the general population, these patients are eligible for yearly screenings, typically MRI scans, “known to be very effective and strongly correlated with far better survival rates,” Appelbaum said.
That’s why she and colleagues turned to electronic health record (EHR) data to find more people who would likely benefit from earlier screening.
“There are signals in the data that’s being routinely collected already when people see their primary care physician or go to the ED with a broken ankle—symptoms that show up, such as certain medications or changes in lab values,” Appelbaum said. “Taken together, these are all signals that can predict pancreatic cancer before the cancer is actually detected, and that gives us the opportunity to catch those cancers early, before it has spread.”
Named PrismNN, the team’s machine learning model was trained on data from more than 1.5 million electronic health records provided by industry partner TriNetX. The data set included an average of 13 years of historical data about demographics, doctor visits, diagnoses, lab work, procedures and medications for more than 35,000 patients who eventually developed pancreatic cancer and more than 1.5 million controls. The model flagged patients at high risk for developing cancer using 87 features that it automatically selected from the input training data.
“Thanks to the depth and breadth of the data set PrismNN was trained on, it can be applied anywhere in the U.S. because it includes data from our nation’s diverse population,” Appelbaum said. “The idea is to bring this capability into every clinic, to every computer for every physician, whether they are working in a tertiary hospital in Boston or in a small community clinic in the Southwest, so it’s a really critically important feature of our model.”
To further validate the model, Appelbaum and colleagues are assessing its real-time accuracy by allowing PrismNN to sort patients into low-, intermediate- and high-risk groups and then following their outcomes. Additionally, the investigators are inviting patients flagged by the model to participate in studies to search for common, quantifiable physical characteristics, called biomarkers, that could signal predisposition to pancreatic cancer. In this way, PrismNN could serve on its own as an eligibility criterion for annual screening or serve as an initial filter for people with selected biomarkers who would then go on to traditional pancreatic cancer screening.
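For readers curious how sorting patients into risk groups might look in practice, here is a minimal Python sketch. It assumes the model emits a risk score between 0 and 1 for each patient. The cutoff values, function names and patient IDs are hypothetical, chosen only for illustration; the article does not report the thresholds PrismNN uses.

```python
from collections import defaultdict

# Hypothetical cutoffs for illustration only; not taken from the PrismNN study.
LOW_CUTOFF = 0.05
HIGH_CUTOFF = 0.20


def risk_tier(score, low_cutoff=LOW_CUTOFF, high_cutoff=HIGH_CUTOFF):
    """Map a model risk score in [0, 1] to 'low', 'intermediate' or 'high'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high_cutoff:
        return "high"
    if score >= low_cutoff:
        return "intermediate"
    return "low"


def group_by_tier(scores):
    """Group patient IDs by risk tier, given a {patient_id: score} mapping."""
    groups = defaultdict(list)
    for patient_id, score in scores.items():
        groups[risk_tier(score)].append(patient_id)
    return dict(groups)


# Example with made-up patient IDs and scores.
example = group_by_tier({"patient_a": 0.01, "patient_b": 0.10, "patient_c": 0.50})
```

In a follow-up study of the kind described above, each group's later outcomes would then be compared against the tier the model assigned.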
“Our approach enables potential expansion of the population targeted for screening beyond the traditionally screened minority with an inherited predisposition,” Appelbaum said. “Our PrismNN model sets the stage to identify more high-risk patients.”
Co-authors included Irving D. Kaplan of BIDMC; lead author Kai Jia and co-senior author Martin Rinard of MIT; and Steven Kundrot, Matvey B. Palchuk, Jeff Warnick and Katherine Haapala of TriNetX, LLC.
This work was funded by Prevent Cancer Foundation, TriNetX, Boeing, DARPA, NSF and Aarno Labs. Kaplan and Appelbaum are not aware of any payments or services paid to themselves or BIDMC that could be perceived to influence the submitted work. Please see the published paper for a complete list of declared interests.