WSDM Cup 2022 is a machine learning competition co-located with the leading WSDM (Web Search and Data Mining) conference. Tasks are hosted on public competition platforms (e.g. Kaggle, Biendata), and each offers a clear specification of the objective, a dataset (anonymized and de-identified as appropriate), and a small reward for the top-3 winning teams (typically $5-10K total, split across winners). Participation can also lead to internship and full-time opportunities with leading companies. Past tasks have come from partners such as Microsoft, Wikimedia, Adobe, ByteDance, Spotify, Baidu, and more, covering problems as diverse as fake news classification, user retention prediction, music recommendation, vandalism detection, and knowledge-base completion, and winning teams have achieved excellent results across the board. You can find previous task specifications and associated results below for inspiration: 2016, 2017, 2018, 2019, 2020.


[Oct 15] WSDM Cup tasks have officially started!  Please see below for the competition details and links.

[Sep 29] We are excited to announce the following confirmed tasks for the WSDM Cup 2022. WSDM Cup competitions will start after October 15th and run until late January (individual dates will depend on task and sponsor details, which will be shared soon).  Please check back soon for more details.

User Retention Score Prediction

Task sponsor: iQIYI

Task abstract: iQIYI is the world’s leading movie and video streaming platform, with nearly 8 billion hours spent on the iQIYI app each month and over 500 million monthly active users. It features highly popular original content, as well as a comprehensive library of other professionally produced content and user-generated content. The iQIYI app uses deep-learning AI algorithms and massive user data to produce content that caters to various user tastes and to deliver a superior entertainment experience. User retention is a key indicator of user satisfaction, and iQIYI measures it with a retention score over the next N days. For example, a user with a retention score of 3 for the next 7 days is expected to launch the iQIYI app on 3 separate days during that period. Predicting the retention score is very challenging: it depends not only on the individual user but also on entertainment trends, e.g., the number of daily users fluctuates sharply when blockbuster original content is released. The goal of the task is to predict user retention scores using deep models.
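To make the retention score definition concrete, here is a minimal Python sketch that counts the distinct days on which a user launched the app within the next N days. The function name, the input format (a list of launch dates), and the choice to start the window on the day after the reference day are assumptions made for illustration; the official task specification may define the window differently.

```python
from datetime import date, timedelta


def retention_score(launch_dates, reference_day, n_days=7):
    """Count distinct days with at least one app launch in the next n_days.

    Assumption: the window covers reference_day + 1 through
    reference_day + n_days, inclusive on both ends.
    """
    window_start = reference_day + timedelta(days=1)
    window_end = reference_day + timedelta(days=n_days)
    # A set removes repeated launches on the same day.
    active_days = {d for d in launch_dates if window_start <= d <= window_end}
    return len(active_days)


# Hypothetical launch log for one user: two launches on Oct 16,
# one each on Oct 18 and Oct 21, and one outside the 7-day window.
launches = [
    date(2021, 10, 16),
    date(2021, 10, 16),
    date(2021, 10, 18),
    date(2021, 10, 21),
    date(2021, 10, 30),
]
score = retention_score(launches, date(2021, 10, 15), n_days=7)
print(score)  # 3
```

Multiple launches on the same day count once, which matches the "3 separate days" wording in the example above.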

Temporal Link Prediction

Task sponsor: Intel / Amazon

Task abstract: Real-world datasets can often be expressed as graphs, with entities as nodes and interactions as edges. Examples include user-user interactions in social networks and user-item interactions in recommender systems. In practice these graphs are often temporal, with new edges arriving with timestamps. Unlike traditional link prediction, which asks whether an edge exists between two nodes in a partially observed graph, temporal link prediction asks whether an edge will exist between two nodes within a given time span. This makes it more useful than traditional link prediction, since multiple applications can be built around the model, such as forecasting customer demand in e-commerce or forecasting which events will happen in a social network. We expect a single model that can handle two kinds of data simultaneously: a dynamic event graph with entities as nodes and events as edges, and a user-item graph with users and items as nodes and various interactions as edges. The task is to predict whether an edge of a given type will exist between two given nodes before a given timestamp.
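The prediction target described above can be sketched as a labeling function over timestamped, typed edges. This is only an illustration of the question being asked, not the official data format or evaluation: the `Edge` record, the integer timestamps, and the half-open window `[start, end)` are all assumptions.

```python
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    """Hypothetical record for one timestamped, typed interaction."""
    src: int
    dst: int
    edge_type: int
    timestamp: int


def edge_label(edges: Iterable[Edge], src: int, dst: int,
               edge_type: int, start: int, end: int) -> int:
    """Return 1 if an edge of edge_type from src to dst occurs in [start, end)."""
    return int(any(
        e.src == src and e.dst == dst and e.edge_type == edge_type
        and start <= e.timestamp < end
        for e in edges
    ))


# Toy held-out edges observed after the training cut-off (made-up values).
future_edges = [
    Edge(src=1, dst=2, edge_type=0, timestamp=105),
    Edge(src=1, dst=3, edge_type=1, timestamp=140),
]

label_hit = edge_label(future_edges, 1, 2, 0, start=100, end=120)
label_wrong_type = edge_label(future_edges, 1, 2, 1, start=100, end=120)
label_too_late = edge_label(future_edges, 1, 3, 1, start=100, end=120)
print(label_hit, label_wrong_type, label_too_late)  # 1 0 0
```

A model for the task would output a probability for each such query, with labels like these serving as ground truth.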

Cross-market Recommendation

Task sponsor: University of Amsterdam / University of Massachusetts Amherst / Amazon

Task abstract: E-commerce companies often operate across markets; for instance, Amazon has expanded its operations and sales to 18 markets (i.e. countries) around the globe. Cross-market recommendation concerns the problem of recommending relevant products to users in a target market (e.g., a resource-scarce market) by leveraging data from similar high-resource markets, e.g. using data from the U.S. market to improve recommendations in a target market. The key challenge, however, is that data such as user interactions with products (clicks, purchases, reviews) carry biases specific to the individual markets. Therefore, algorithms trained on a source market are not necessarily effective in a different target market. Despite its significance, little progress has been made in cross-market recommendation, mainly due to a lack of experimental data for researchers. In this WSDM Cup challenge, we provide user purchase and rating data from various markets, enriched with review data in different languages, with considerable subsets of items shared across markets. The goal is to improve individual recommendation systems in these target markets by leveraging data from similar auxiliary markets.
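As a rough illustration of leveraging an auxiliary market, the sketch below pools ratings from a source market into a target market's item scores, down-weighting the source ratings to reflect possible market-specific bias. It is a naive weighted-average baseline chosen for illustration only, not the organizers' method; the function name, the `(user, item, rating)` tuple format, and the `source_weight` parameter are all hypothetical.

```python
from collections import defaultdict


def pooled_item_scores(target_ratings, source_ratings, target_catalog,
                       source_weight=0.5):
    """Weighted mean rating per target-market item.

    Target ratings get weight 1.0; source ratings for items shared with
    the target catalog get source_weight. Other source items are ignored.
    """
    totals = defaultdict(float)
    weights = defaultdict(float)
    for _user, item, rating in target_ratings:
        totals[item] += rating
        weights[item] += 1.0
    for _user, item, rating in source_ratings:
        if item in target_catalog:  # only shared items transfer
            totals[item] += source_weight * rating
            weights[item] += source_weight
    return {item: totals[item] / weights[item] for item in totals}


# Toy data: item C has no ratings in the target market, item D is not sold there.
target = [("u1", "A", 5.0), ("u2", "B", 2.0)]
source = [("s1", "B", 4.0), ("s2", "B", 4.0), ("s3", "C", 5.0), ("s4", "D", 5.0)]
scores = pooled_item_scores(target, source, target_catalog={"A", "B", "C"})

# Rank by score (descending), breaking ties by item name.
ranking = sorted(scores, key=lambda item: (-scores[item], item))
print(ranking)  # ['A', 'C', 'B']
```

In this toy example, item C becomes recommendable in the target market purely from source-market evidence, which is the kind of transfer the task is about.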


Please see below for the timeline:

Event: Completion Date
Proposal Submission Due: August 10, 2021
Selected Sponsors Finalized: August 20, 2021
Sponsors Set Up Tasks: August 20 – October 15, 2021
Task Competitions Start: October 15, 2021
Task Competitions End: January 24, 2022
Winner Reports Submitted: February 15, 2022
WSDM Cup’22 Conference Event: February 21-25, 2022