An Introduction to the Tools and Policies Behind Data-Driven Innovation
This article was originally published in Ideas Lab.
Excited about data? Join the club. Data, both “big” and small, has the potential to grow the economy, cut costs in government and improve the health and welfare of individuals around the world.
While some organizations have been slow to adopt data-driven innovations, there has been a great deal of innovation throughout the entire “data lifecycle,” which includes collection, storage, analysis, use and dissemination. Not every data-driven initiative involves all five stages, but most touch on each of them to some degree.
Collection and Storage
Data is collected principally through sensors and electronic records. Sensors have plummeted in price and are now widely deployed. They can record information, as in video and audio recordings, or measure it, as in devices that gauge speed, temperature, pressure, chemical content and other physical properties of the world. Sensors measure the subtle changes in the global environment that inform hurricane and drought predictions at the National Weather Service, and they capture the video footage that enables doctors to diagnose patients from afar. Electronic records, such as those created by online forms and transactions, are the other major means of data collection. The United Parcel Service uses its electronic records to optimize its drivers’ routes, and Visa uses its electronic records to alert banks to potentially fraudulent activity.
Once data is collected, it must be stored. Data storage has also become much cheaper in recent decades. It cost around $440,000 to store a single gigabyte of data in 1980, while today it costs about five cents. As of 2013, more than a billion gigabytes of data were stored in the cloud alone. Specialized databases exist to store particular kinds of information, from documents to network data, and many newer databases are built with large-scale “big data” performance in mind.
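To put the storage figures above in perspective, the short Python sketch below works out how many times cheaper a gigabyte has become. It uses only the two prices quoted in this article, expressed in whole cents to avoid rounding error; the function names and the 1,000-gigabyte example are illustrative assumptions, not figures from any source.

```python
# Prices quoted in the article, in US cents per gigabyte.
COST_PER_GB_1980_CENTS = 44_000_000  # about $440,000
COST_PER_GB_TODAY_CENTS = 5          # about five cents


def decline_factor(old_cents: int, new_cents: int) -> int:
    """Return how many times cheaper the new price is than the old one."""
    return old_cents // new_cents


def storage_cost_dollars(gigabytes: int, cents_per_gb: int) -> float:
    """Return the cost in dollars of storing the given number of gigabytes."""
    return gigabytes * cents_per_gb / 100


factor = decline_factor(COST_PER_GB_1980_CENTS, COST_PER_GB_TODAY_CENTS)
print(f"Storage is roughly {factor:,} times cheaper per gigabyte.")

# Hypothetical example: the cost of storing 1,000 gigabytes at each price.
print(f"1980:  ${storage_cost_dollars(1_000, COST_PER_GB_1980_CENTS):,.2f}")
print(f"Today: ${storage_cost_dollars(1_000, COST_PER_GB_TODAY_CENTS):,.2f}")
```

By these figures, the price per gigabyte has fallen by a factor of about 8.8 million.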
Analysis and Use
The analysis category includes a wide range of techniques and technologies used to probe stored data for insights. Data scientists, the professionals who conduct advanced data analysis, typically have backgrounds in statistics, mathematics and computer science, and can deploy tools ranging from custom code to off-the-shelf analysis software to test hypotheses on data and identify unusual observations. Data analysis is useful in a wide range of contexts, from helping online dating site OkCupid find the best match for each user, to helping the Department of the Interior prioritize conservation efforts.
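As a rough illustration of what “identifying unusual observations” can look like in custom code, the Python sketch below flags values that sit far from the average of a data set. This is only one simple approach (a z-score test), not a description of how any organization named here works; the function name, the threshold of two standard deviations and the sample readings are all assumptions made for the example.

```python
from statistics import mean, stdev


def unusual_observations(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious anomaly.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(unusual_observations(readings))
```

A test this simple can miss anomalies in small or skewed data sets, which is one reason analysts often turn to more robust methods in practice.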
After analysis, data can be used to inform decision-making. This can mean informing the decisions of a person in his or her home or the “decisions” of a robot in a factory. Human decisions are often aided by data visualizations or data-driven decision support systems, while computer systems and robots can be automated to perform better as more data becomes available. Google’s self-driving car uses video, as well as data on speed, acceleration and other factors, to avoid obstacles on the road and keep from disturbing its passengers. Humans use recommendation services like Netflix and Pandora to help them decide what movies to watch or what music to listen to.
Policymakers can continue to encourage innovation in all areas of the data lifecycle. In 2012, the Obama Administration announced a one-time big data research and development initiative with $200 million in funding. Funding efforts such as this should be continued and expanded, since these technologies can have strong positive spillover effects throughout the economy.
Government can also work to improve access to data by making additional government data sets available, as well as by creating the legal and regulatory frameworks necessary to encourage data sharing and reuse in different industries. It can also encourage skills development by better incorporating data-related disciplines into science, technology, engineering and math (STEM) development programs and course curricula at the K-12 and university levels.