WSDM Cup 2020

WSDM Cup is a competition-style event co-located with the leading WSDM conference. This year, we have three exciting competition tasks from Microsoft Research, 4Paradigm and Sichuan Airlines, each of which makes available one or more industrial-scale datasets, enabling research on new problems. The challenges will be conducted through publicly accessible data challenge platforms, and each will have clear objectives to maximize, which will be used to establish rankings. Top participants will receive cash prizes of thousands of dollars and be invited to present their work at WSDM Cup in February 2020.

Task 1 (Citation Intent Recognition) from Microsoft Research has been launched!

Task 2 (Automated Time Series Regression) from 4Paradigm has been launched!

Task 3 (Flight Delay Discovery and Optimization) from Sichuan Airlines has been launched!

Task 1

Microsoft Research - Citation Intent Recognition

For centuries, a key to the remarkable technological progress in our society has been the unassailable integrity exhibited by scientists in conducting scholarly communications. New discoveries and theories are openly distributed and discussed in published articles, and impactful contributions are often recognized by the research community at large in the form of citations. However, with the competition for research funding and promotions getting ever fiercer, unscrupulous behaviors aimed at “gaming the system” rather than advancing the frontiers of our knowledge have become regrettably prevalent. In a practice known as “coercive citation”, journal editors have been seen to force authors to cite marginally relevant articles in particular journals to boost those journals' impact factors, and paper reviewers have been seen to do the same solely to increase their own citation counts or h-index. Such conduct is an affront to the highest integrity demanded of every scientist and technologist and, left unchecked, can undermine public trust and hamper future developments in science and technology. This contest is the first in a series that explores the extent to which web search and data mining technologies can be employed to distinguish superfluous citations from genuine recognition. In this contest, however, we focus on a necessary first step, recognizing the citation intent of the authors: contestants are asked to develop a system that can recognize the citation intent of a given passage in a scholarly article and retrieve relevant citation targets from a given database.

Access the competition details here! Good luck!

Task 2

4Paradigm - Automated Time Series Regression

Machine learning (ML) has achieved remarkable success in time series-related tasks, e.g., classification, regression and clustering. For time series regression, ML methods show strong predictive performance. However, in practice, it is very difficult to switch between different datasets without human effort. To address this problem, Automated Machine Learning (AutoML) has been proposed to explore automatic pipelines that train an effective ML model for a given task requirement. Since its proposal, AutoML has been explored in various applications, and a series of AutoML competitions, e.g., the Auto-ML Track at KDD Cup, Automated Natural Language Processing (AutoNLP) and Automated Computer Vision (AutoCV), have been organized by 4Paradigm, Inc. and ChaLearn (sponsored by Google and Microsoft). These competitions have drawn a lot of attention from both academic researchers and industrial practitioners. In this challenge, we further propose the Automated Time Series Regression (AutoSeries) competition, which aims at automated solutions for time series regression tasks. This challenge is restricted to multivariate regression problems, which come from different time series domains, including air quality, sales, work presence, city traffic, etc. Submitted solutions are expected to flexibly handle multiple types of datasets, automatically extract useful features, discover temporal correlations, and be generic enough to apply to unseen datasets.

Access the competition details here! Good luck!

Task 3

Sichuan Airlines - Flight Delay Discovery and Optimization

Sichuan Airlines Co., Ltd was established on August 29, 2002. Its headquarters is located in Chengdu, with nine branches in Chongqing, Beijing, Yunnan, etc. and three operational bases in Shenzhen, Nanning and Mianyang. Growing from its original seven routes to the current 200-plus routes, Sichuan Airlines has built a well-operating network integrating main routes, secondary routes, international routes, regional routes and branch routes, which contributes to the formation of a regional comprehensive transportation hub. On the one hand, the numbers of passengers and routes served by Sichuan Airlines have both shown a gradual upward trend. On the other hand, Sichuan Airlines is also facing an increasing number of challenges in scheduling flights, such as severe weather and aircraft faults, and these issues have the potential to cause large-scale delays of subsequent flights at the relevant airports. When such large-scale delays may occur, the dispatcher needs to adjust the flight schedules in time to keep flight scheduling reasonable and orderly. However, Sichuan Airlines currently adjusts flight scheduling and updates flight information in the system manually, which means that in the case of extreme weather, identifying and adjusting flights is not only time-consuming and laborious but also subject to many restrictions. The purpose of this project is to implement a model that automatically identifies subsequent flights that are likely to be delayed and recommends an optimization scheme.

Access the competition details here! Good luck!


WSDM Cup Chairs

Questions about the WSDM Cup 2020 should be directed to: