“Frontiers in Massive Data Analysis,” by the National Research Council
Frontiers in Massive Data Analysis is a book-length report from the U.S. National Research Council, intended to assess the current state of data analysis, identify gaps in current theory and practice, and propose a research agenda to fill those gaps. Specifically, it details the challenges to data management and analysis that have arisen in the “big data” environment. These include, among others, dealing with distributed data sources, tracking data provenance, developing scalable and parallelizable algorithms and enabling data sharing.
The book covers the research areas of data representation, computational complexity, model-building, sampling and human-data interaction. One of the main conclusions of Frontiers is the need for significantly expanded interdisciplinary work in approaching “big data” problems. Computer scientists, statisticians, mathematicians and domain experts will need to work together to overcome many of the challenges detailed in the book.