How An Informatics Expert at University of Chicago Tackles Data Standardization

Samuel Volchenboum, MD, PhD, describes how he approaches data interoperability as a grassroots effort at the University of Chicago, encouraging adoption from the bottom up among fellow physicians.

February 18, 2021
What led to the Pediatric Cancer Data Commons?

One of the main causes of this lack of interoperability is a lack of data standardization. We have a system where if you go from one hospital to another hospital across the street, it’s still very likely that they are going to have trouble using your data. A lot of medical data are still transferred on CDs and via fax machine.

It really drove my professional direction towards trying to build systems that allowed us to share data better and to really tackle the problems at the root. We didn’t want to do the easy thing, which is just take data and then hire people to standardize and then share it. We wanted to tackle it as a grassroots phenomenon: teach people what standards are, empower them to build a data dictionary that’s validated with international input, and then let people harmonize the data into that standard and, ultimately, use that standard to collect subsequent data.

Around 2014, Dr Susan Cohn, a neuroblastoma expert at UChicago Medicine, came to me and said, “We have this collection of about 10,000 patients of neuroblastoma, which is a very rare pediatric tumor. It’s in a big Excel sheet. Can you help us figure out how to share it?”

What does the Pediatric Cancer Data Commons hope to accomplish?

We hope to move beyond pediatric cancer into other areas, and we hope to empower people to do the right things with data. We set up an infrastructure to take these data, which were somewhat standardized, and started working with this neuroblastoma group to develop rules around how we would share the data and how researchers could use them. And based on that success, the rhabdomyosarcoma group came to us to build their data commons. We started from the ground with them; we went to get collaborators in Europe and in the US. We created a big consortium, and set up rules of engagement. It took us a year and a half to build a data dictionary, painstakingly negotiating every single value and element in the data dictionary to come to a consensus.

And then after that, we started to receive more funding to support our work with more disease groups. Now we’re covering just about every pediatric cancer and are in various stages of building data dictionaries and harmonizing the data. We are tackling the problem at its source and providing a great way to share data and lower barriers to research.

What has this work elucidated for you about current data capture?

People are always going to enter data in the easiest way possible. If you offer people a blank text box, they are going to use it. If you have to enter 10 things about a patient (their performance status, height, weight, blood pressure, etc.) and you’re faced with 10 checkboxes or a big empty text box, you’re probably going to just fill that box with text that will then be very hard to study.

The current methods of data capture could cater very well to structured capture, but they allow easy outs. That may serve the purpose of billing, but it ignores the really great things you can accomplish with the EHR: serving the research needs and performing rules and evidence-based medicine. You lose all those things when you don’t capture the data in a granular, structured form.

You can circumvent that a little bit by creating smart forms, where, for example, if you’re going to see a patient with sickle cell, it provides the questions that you ask and gives you a place to put every answer. There needs to be strong top-down leadership. Eventually, there may be an Alexa in your office to listen to the whole thing and automatically fill everything out. But for now, we’re still relying on people to do the right thing.
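To make the contrast between a free-text box and structured capture concrete, here is a minimal sketch of how a form might check that each measurement arrives as its own numeric field. The field names and the `validate_entry` helper are hypothetical and chosen only for illustration; they do not describe any particular EHR or smart-form product.

```python
# Hypothetical numeric fields a structured form might require.
NUMERIC_FIELDS = ("height_cm", "weight_kg", "systolic_mmhg", "diastolic_mmhg")


def validate_entry(entry):
    """Return a list of problems; an empty list means every field is structured."""
    problems = []
    for name in NUMERIC_FIELDS:
        value = entry.get(name)
        if value is None:
            problems.append(f"missing field: {name}")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # Free text such as "about 110" cannot be analyzed without cleanup.
            problems.append(f"{name} must be a number, got {value!r}")
    return problems


structured = {"height_cm": 110.5, "weight_kg": 19, "systolic_mmhg": 95, "diastolic_mmhg": 60}
free_text = {"height_cm": "about 110", "notes": "wt 19kg, BP 95/60"}

print(validate_entry(structured))
print(validate_entry(free_text))
```

The structured entry passes with no problems, while the free-text entry is flagged for every field, even though a human reader could recover most of the values from the note.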

What are the challenges that you’re coming across when creating a common data language?

The biggest problem is that, up until now, most groups have been left to their own devices in creating their own data collection forms. The data elements we see are not clearly harmonized. For example, in the USA, one group might have, for site of disease, “lips, cheek or face.” But the group in Europe might just have “head.” You have to sit there and have a scientific discussion, and they have to come to an agreement. So that’s one of the biggest challenges: trying to figure out how to let science drive the building of the dictionary. Then harmonizing the data is just a technical issue. There are cultural issues, language issues, issues around which coding system to use, etc. We’ve been able to tackle these by getting buy-in from the right kinds of clinical leaders across the different countries.
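Once the groups have agreed on a value set, the technical harmonization step can be sketched as a lookup from each group's local values to the agreed standard. The example below assumes, purely for illustration, that the consortium settled on the coarser value “head”; the mapping, field name, and helper functions are hypothetical and are not taken from the PCDC data dictionary.

```python
# Hypothetical mapping from local "site of disease" values to the agreed value.
SITE_TO_STANDARD = {
    "lips": "head",
    "cheek": "head",
    "face": "head",
    "head": "head",
}


def harmonize_site(local_value):
    """Return the agreed standard value, or None if the value is unmapped."""
    return SITE_TO_STANDARD.get(local_value.strip().lower())


def harmonize_records(records):
    """Split records into harmonized rows and rows that need human review."""
    harmonized, needs_review = [], []
    for record in records:
        standard = harmonize_site(record["site_of_disease"])
        if standard is None:
            # Unmapped values go back to the clinical experts, not to a guess.
            needs_review.append(record)
        else:
            harmonized.append({**record, "site_of_disease": standard})
    return harmonized, needs_review


rows = [
    {"id": 1, "site_of_disease": "Cheek "},
    {"id": 2, "site_of_disease": "head"},
    {"id": 3, "site_of_disease": "arm"},
]
harmonized, needs_review = harmonize_records(rows)
print(harmonized)
print(needs_review)
```

Routing unmapped values to a review list rather than forcing them into a category keeps the scientific decision with the consortium, which is the point of the negotiation described above.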

How does someone enter into the PCDC and start contributing data?

We’ve taken a very UN approach to this. The different disease commons are run by disease consortia that we’ve helped set up for each disease group. So, there’s a soft tissue sarcoma consortium, an AML consortium, a neuroblastoma consortium. For example, there’s a group in South America that wants to join our retinoblastoma group. We invite them to meetings, and then the executive committee will have to work with them to decide if they want to come in as members and contribute data and what those data look like.

One of our big missions over the next couple of years is to move very decidedly into less developed areas of the world where most pediatric cancer occurs and try to bring this same idea of standardized data collection to help improve care. To do that, we’re obviously going to need to try to bring in many more groups. Right now, we’ve mainly engaged groups in North America and Western Europe, but we need to move beyond that to get all these other groups in. Defining the governance and rules of engagement is going to be really important.

How does this impact or affect the type of care that clinicians are able to offer their patients?

The goal of our work is to make research data more available to clinicians to try to lower the barriers to research. If you look at the neuroblastoma group, they’ve used the data for all sorts of projects that have changed the way we diagnose and treat children with neuroblastoma. At one point, the age cutoff for neuroblastoma risk was one year. Based on the data in the data commons, they did a study and changed it to 18 months. That immediately affects a large number of patients and the kinds of chemotherapy they will get.

One of the more exciting things we’re working on now is creating better ways to match patients to precision medicine trials. We are working with The Leukemia & Lymphoma Society to create a decision support tool that will allow clinicians to enter information about their patient (clinical, genomic, immunophenotype) and match them to clinical trials. The goal of the Pediatric Acute Leukemia (PedAL) project is to have a child with relapsed leukemia go onto a clinical trial within 72 hours. We’re hoping to release the first generation of the tool later this year.

One of the big frontiers we’re going to cross next is how to get data right out of the EHR in real time to help with patient care. Because right now, by the time you collect data into a research commons, it’s old; the patient is diagnosed or already being treated. But how can we collect data in real time and use it for real time decision making for patients?

What are the security aspects of developing an international data commons platform?

We have made the determination only to work with de-identified data for our data commons. We remove the dates and instead have the age in days of the patient at the time of whatever event happened. We may not know that a kid had a CT scan on July 1, but we’ll know that they were 58 days old at the time. But just because the data are de-identified, that does not change at all the fact that we operate in a fully HIPAA-compliant infrastructure. We have been successful in implementing GDPR regulation throughout our data commons. But that means that for every single site that we’re sharing data with, we are negotiating a data-sharing agreement.
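The date-to-age conversion described above can be sketched in a few lines. The record layout, the field names, and the birth date are hypothetical; the birth date was chosen only so that a July 1 scan falls on day 58. Removing dates is just one part of de-identification, so this is an illustration of that single step rather than a complete procedure.

```python
from datetime import date


def age_in_days(birth_date, event_date):
    """Return the patient's age in days at the time of an event."""
    if event_date < birth_date:
        raise ValueError("event date precedes birth date")
    return (event_date - birth_date).days


def deidentify_event(record):
    """Drop the calendar dates and keep only the age in days at the event."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in ("birth_date", "event_date")
    }
    cleaned["age_in_days"] = age_in_days(record["birth_date"], record["event_date"])
    return cleaned


# Hypothetical source record as it might exist at the contributing site.
ct_scan = {
    "event": "CT scan",
    "birth_date": date(2020, 5, 4),
    "event_date": date(2020, 7, 1),
}
print(deidentify_event(ct_scan))  # {'event': 'CT scan', 'age_in_days': 58}
```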

Even though the data are de-identified, the GDPR is extremely strict about holding data. There is the concept of “the right to be forgotten,” meaning that if a patient in the EU decides they don’t want their data as part of the commons anymore, even if it’s de-identified, we have to be able to show that we can remove their data from the commons.
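As a rough illustration of what being able to show removal could involve, the sketch below assumes each de-identified record carries a pseudonymous `subject_id` that the contributing site can link back to the patient. That identifier, the record layout, and both helper functions are assumptions made for this example; the interview does not describe how the commons implements removal.

```python
def erase_subject(records, subject_id):
    """Return the remaining records and the number of rows removed."""
    remaining = [r for r in records if r["subject_id"] != subject_id]
    return remaining, len(records) - len(remaining)


def is_erased(records, subject_id):
    """Check that no record for the subject remains in the data set."""
    return all(r["subject_id"] != subject_id for r in records)


# Hypothetical commons contents keyed by pseudonymous subject identifiers.
commons = [
    {"subject_id": "S-001", "event": "diagnosis", "age_in_days": 400},
    {"subject_id": "S-002", "event": "diagnosis", "age_in_days": 912},
    {"subject_id": "S-001", "event": "CT scan", "age_in_days": 431},
]

commons, removed = erase_subject(commons, "S-001")
print(removed, is_erased(commons, "S-001"))  # 2 True
```

A real system would also need to cover backups, derived data sets, and copies already shared with researchers, which this sketch does not attempt.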

What is coming down the pipeline for the Pediatric Cancer Data Commons and your work?

One of my personal goals is to start to do more with real world evidence. We have a pilot project where we’re starting to bring in things like pollution data, traffic data and crime data at a hyper-local level to try to understand why there are differences in outcomes for people.

In this pilot, we’re looking at kids with asthma, where we will simultaneously map their information to weather, traffic, green space, crime, food availability, and see if we can model their outcomes from asthma based on nonclinical factors, and then try to recommend improvements. That’s the next frontier: being able to do that in a way that’s scalable and extensible.


For more information on Clinical Research as a Care Option, visit CRAACOevent.com.
