Published on November 7th, 2012 | by Daniel Castro
5 Q’s on Data Innovation with Lynn Etheredge
Lynn Etheredge is an independent consultant on health care and social policy issues and heads the Rapid Learning Project at George Washington University in Washington, DC. I asked Lynn to share with me his thoughts on how health care research is changing as a result of the increased use of data.
Castro: How will rapid learning health networks change how health care research is performed?
Etheredge: Traditionally, health research has relied on in vitro and in vivo methods—lab work and animal and human experiments. The rapid-learning networks add in silico research—using computerized databases and networks with individual-level, clinically rich, and longitudinal data from millions of patients captured in electronic health records. Francis Collins, NIH’s director, has recently proposed a new national patient-centered research network with 20-30 million patients. As discussed in a recent report—Toward Precision Medicine (National Academy of Sciences, 2012)—this will revolutionize biomedical research, clinical practice, and public health.
Castro: How is this type of in silico research being used to help patients today?
Etheredge: The most dramatic results have been in pediatric cancers. Major childhood cancers used to have 90% mortality rates; today, life expectancy is almost the same as for children who have never had cancer, i.e. a 90%+ cure rate. This is a result of an organized research system that reports and learns from the treatment of every child with cancer, and shares those data and experiences for rapid-cycle learning. Many other networks are now producing breakthrough studies, including disease-specific (cancer, cardiovascular, diabetes) and genetics-related research. A few weeks ago, NIH announced its new Health Care System (HCS) Collaboratory, which creates a new national research system using organized delivery systems, like HMOs and leading academic health centers, that now have electronic health records databases for millions of patients. The FDA’s new mini-Sentinel system for studying drug safety issues is now accessing over 125 million patient records and is already doing hundreds of studies a year. The largest organized delivery system, Kaiser Permanente, with 9 million patients in its EHR databases, is ramping up quickly and will likely do over 1,000 studies this year.
Castro: Both the public and private sectors have invested millions of dollars in electronic health record systems, especially over the last few years. What needs to happen so that researchers can take advantage of all of this data in these systems for their medical research?
Etheredge: Collaboration and data-sharing are critical so the world’s researchers can access and learn from this cornucopia of “big data”. There are technical issues here, but mostly NIH, as the nation’s leading research funder, needs to require that publicly supported research studies make full (de-identified) data sets available for open science. And Kaiser’s (NIH-supported) Biobank, with 500,000 patient records of clinical, genetic and environmental data, also needs to expand its capabilities to serve as the world’s leading resource for open science.
Castro: Do researchers have the tools and resources they need to take advantage of the data that is available?
Etheredge: I think of “rapid-learning” as the result of great researchers + computing power + great databases + Apps. We already have great researchers and peta-flop computers (a quadrillion operations per second). We need to complete a national system of pre-designed, pre-populated, pre-positioned databases for open science, so researchers can literally log on to the world’s evidence base for biomedical and clinical research. There is also a large challenge in creating the research tools (“Apps”) to get the learning out of the databases quickly. In the future, biology and medicine will increasingly become “digital sciences”, and new software will have a major role.
Castro: What other changes are needed in the research community so that we can build a stronger evidence-based health care system?
Etheredge: To an economist, the sharing of research data is a classic case of the “economics (or tragedy) of the commons”. From a high-level perspective, the arithmetic for data-sharing is compelling: if 100 institutions each contribute 100 case records to a common database, there will be 10,000 shared records, a “return on investment” for each institution of 9,900 additional records for 100 records contributed, i.e. 99:1. It sure beats the rate of return on Treasuries! But, as economists have noted in the case of production of new knowledge, there is sub-optimal data-sharing precisely because most beneficiaries do not share in the production costs. To foster open science, there need to be organizers, incentives, and/or rules. Attribution/credit is a related, solvable issue so that researchers who spend many years creating a first-class database do not feel ripped off by someone harvesting their data and benefiting from all of their hard work. Trust is another issue. We train researchers to collect their own data, as their professional reputations depend on the credibility of results; the research community needs confidence that, in using shared data resources, they can count on the quality of those resources.
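The data-sharing arithmetic in the answer above can be checked with a short Python sketch. The function name `sharing_return` and its parameters are illustrative and do not come from the interview. The sketch assumes every institution contributes the same number of records and that no records overlap between institutions.

```python
def sharing_return(institutions, records_each):
    """Return (pooled total, records gained from others, gain per record contributed).

    Assumes equal contributions and no duplicate records across institutions.
    """
    pooled = institutions * records_each   # size of the common database
    gained = pooled - records_each         # records an institution did not collect itself
    ratio = gained / records_each          # records gained per record contributed
    return pooled, gained, ratio


pooled, gained, ratio = sharing_return(100, 100)
print(f"pooled={pooled}, gained={gained}, ratio={ratio:.0f}:1")
# prints: pooled=10000, gained=9900, ratio=99:1
```

Under these assumptions the ratio is always the number of institutions minus one, so the return grows with the number of participants rather than with the size of any single contribution.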
“5 Q’s on Data Innovation” is part of an ongoing series of interviews for Data Innovation Day by ITIF Senior Analyst Daniel Castro. If you have a suggestion for someone who should be featured, send an email to Daniel Castro at firstname.lastname@example.org.