Published on January 21st, 2016 | by Joshua New


Fueling Machine Learning Research

Yahoo has made its massive Yahoo News Feed data set publicly available to spur machine learning research. The data set consists of anonymized user interactions on Yahoo services, including Yahoo News, Yahoo Sports, and Yahoo Finance. At 13.5 terabytes, it contains approximately 110 billion interactions from 20 million users, recorded from February 2015 to May 2015. Yahoo published the data to help academics working in data science, who often lack access to the large-scale data sets needed to train machine learning algorithms and develop new analytic techniques.

Get the data.

Image: Scott Schiller

About the Author

Joshua New is a policy analyst at the Center for Data Innovation. He has a background in government affairs, policy, and communication. Prior to joining the Center for Data Innovation, Joshua graduated from American University with degrees in C.L.E.G. (Communication, Legal Institutions, Economics, and Government) and Public Communication. His research focuses on methods of promoting innovative and emerging technologies as a means of improving the economy and quality of life. Follow Joshua on Twitter @Josh_A_New.
