Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Yahoo Releases Largest-ever Machine Learning Dataset for Researchers

Suju Rajan – “Data is the lifeblood of research in machine learning. However, access to truly large-scale datasets is a privilege that has been traditionally reserved for machine learning researchers and data scientists working at large companies – and out of reach for most academic researchers. Research scientists at Yahoo Labs have long enjoyed working on large-scale machine learning problems inspired by consumer-facing products. This has enabled us to advance the thinking in areas such as search ranking, computational advertising, information retrieval, and core machine learning. A key aspect of interest to the external research community has been the application of new algorithms and methodologies to production traffic and to large-scale datasets gathered from real products. Today, we are proud to announce the public release of the largest-ever machine learning dataset to the research community. The dataset stands at a massive ~110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015. The Yahoo News Feed dataset is a collection based on a sample of anonymized user interactions on the news feeds of several Yahoo properties, including the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate.”

Sorry, comments are closed for this post.