What Big Data Can Do for National Statistics
The Bureau of Labor Statistics (BLS) is the principal U.S. agency tasked with collecting, analyzing, and disseminating essential labor market information. BLS data has broad applications in the public and private sectors alike. The Consumer Price Index, the nation’s most widely used inflation measure, and the Current Population Survey, which provides unemployment data, are critical indicators for federal monetary and fiscal policy and influence business decisions across entire industries. Other BLS data products, such as the Consumer Expenditure Survey and the National Compensation Survey, create value by helping private companies target the most lucrative market segments and attract the best talent with competitive wages.
But the BLS’ efforts to collect more data and create new data products have been stymied by several rounds of budget cuts in recent years. Since 2010, the agency has had to cut several programs and restructure others, reducing its ability to provide timely and accurate economic information. The President’s 2015 budget would give the BLS more funding than it received in 2014, but it would also add new data collection responsibilities and other mandates, so the additional funding would not be sufficient to restore the programs cut in 2013 and 2014. Some of these cuts stemmed from government-wide cost-cutting measures, but a difficult fiscal environment should not lead Congress to sacrifice long-term data quality for short-term savings. Instead, legislators should help reduce long-term costs by supporting the BLS’ adoption of alternative data sources.
National surveys are costly. The Current Population Survey, for example, draws data from about 60,000 households each month, a task that, as of 2009, required 2,200 highly trained interviewers. But although surveys are the standard data collection method for national statistical agencies such as the BLS, they are not the only way to collect large quantities of nationally representative data. The Massachusetts Institute of Technology’s Billion Prices Project draws prices from hundreds of online retailers around the world to create its inflation index. The index updates daily and collects five times as many prices as the BLS’ monthly Consumer Price Index. Premise Data Corp, a San Francisco-based startup, has experimented with a near-real-time approach to price collection, supplementing web data aggregation with part-time employees who use smartphones and custom software to upload price data remotely. Less conventional data sources may show promise as well. Google researchers showed in a 2011 paper that search query data drawn from Google Trends can produce accurate forecasts of certain economic indicators, including unemployment claims and consumer confidence.
Other statistical agencies have already begun exploring such alternative data sources. The Bureau of Economic Analysis, for instance, is experimenting with anonymized data from the financial software firm Intuit to improve official estimates of employment and sales trends. But such initiatives are in their infancy and deserve much more attention from national statistical agencies, including the BLS. One way Congress can encourage broader integration of these and other potentially cost-saving data sources is to fund one-time research initiatives examining their feasibility. This would allow the BLS to pursue reforms that could yield cheaper, higher-quality, and more timely data in the long term, reconciling tight congressional budget constraints with the need for a robust national statistical infrastructure.