WSDM Cup 2019

WSDM Cup 2019 Workshop Program

Friday, February 15, 9:00-17:00, MCEC Room 112

WSDM CUP Participant Reports
8:30am – 9:00am Registration
9:00am – 9:10am Opening and Prizes (Zhifeng Bao and Jianzhong Qi)
9:10am – 10:30am Keynote - Mining Geospatial Data (Gao Cong, Nanyang Technological University, Singapore)
10:30am – 11:00am Tea break (poster session for all accepted posters)
Fake News Classification Challenge
11:00am – 11:15am Fake News Detection at ByteDance (Shiwei Wu, ByteDance)
11:15am – 12:00pm Transferring, Transforming, Ensembling: The Novel Formula of Identifying Fake News (Lam Pham)

Trust or Suspect? An Empirical Ensemble Framework for Fake News Classification (Shuaipeng Liu, Shuo Liu and Lei Ren)

Fake News Detection as Natural Language Inference (Kai-Chou Yang, Timothy Niven and Hung-Yu Kao)
Intelligent Flight Schedules Challenge
12:00pm – 12:15pm The Intelligent Decision of Flights Adjusting Rule-Based Flight Scheduling Optimisation (Sichen Zhao, Wei Shao and Haitao Zhu)
12:15pm – 1:30pm Lunch (not provided, but please listen for late announcements)
Sequential Skip Prediction Challenge
1:30pm – 2:15pm Recommendations at Spotify: Marketplace and Sequences (Brian Brost, Spotify Research)
2:15pm – 3:00pm Session-based Sequential Skip Prediction via Recurrent Neural Networks (Lin Zhu and Yihong Chen)

Modelling Sequential Music Track Skips Using a Multi-RNN Approach (Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen and Christina Lioma)

Sequential Skip Prediction with Few-Shot in Streamed Music Contents (Sungkyun Chang, Seungjin Lee and Kyogu Lee)
3:00pm – 3:30pm Tea break (poster session for all accepted posters)
Retention Rate of Baidu Hao Kan APP Users Challenge
3:30pm – 5:00pm Retention Rate of Baidu Hao Kan APP Users Challenge
3:30pm – 4:30pm Multi-Aspect Representation Learning (Dingcheng Li, Baidu)
4:30pm – 5:00pm An Effective Classification Method with RNN and Grand Boosting for Retention Rate of Baidu Hao Kan APP Challenge (Bo Wang, Weilong Chen and Ailin Sheng)

A Hybrid Approach for User Retention Rate Prediction (Haocheng Xu, Siyi Liu and Jiaxin Wu)
5:00pm Workshop Closing

Speakers

Gao Cong, Nanyang Technological University, Singapore

Title: Mining Geospatial Data

Abstract: With the proliferation of GPS-enabled devices, massive amounts of data containing both geospatial and textual content are being generated at an unprecedented scale. The talk will cover the topic of mining geospatial data.

Bio: Dr Gao Cong received his PhD from the National University of Singapore in 2004. He is a professor at Nanyang Technological University, Singapore. Before relocating to Singapore, he worked at Aalborg University, Microsoft Research Asia, and the University of Edinburgh. His current research interests include geo-textual data management and data mining.

Shiwei Wu, ByteDance

Title: Fake News Detection at ByteDance

Abstract: ByteDance is a global content platform that enables people to enjoy various content in various forms. One of the challenges we face is combating different types of fake news. Fake news here refers to all forms of false, inaccurate or misleading information. In this talk I will present our approach to building a fake news detection system. At ByteDance, we implemented a system that combines manual review with automated detection algorithms. We apply natural language processing methods, including text classification and semantic matching, to a variety of data forms such as content and reviews to detect fake news.

Bio: Shiwei Wu is an engineer at ByteDance. He received his Master’s degree in computer science from Fudan University. His interests focus on natural language processing, and he is currently working on content quality control in Toutiao (ByteDance’s main product), an AI-powered app that recommends personalized information to its users.

Brian Brost, Spotify Research

Title: Recommendations at Spotify - Marketplace and Sequences

Abstract: In this talk I will provide an overview of machine learning and recommendation challenges faced by Spotify. I will also describe the motivation behind our WSDM Cup sequential skip prediction challenge. Finally, I will describe some new research challenges related to the problem of sequential recommendations.

Bio: Brian Brost is a Senior Researcher at Spotify Research in the United Kingdom.

Dingcheng Li, Baidu

Title: Multi-Aspect Representation Learning

Abstract: In 2018, our team developed a more expressive representation learning model with topic embedding and knowledge graph embedding. By integrating topic word embedding with a sparse autoencoder, our model is capable of capturing thematic information from language data with high precision. Along similar lines, we developed a model named multi-label and multi-level neural network, which targets automatic semantic indexing of documents with hierarchical terminology. Meanwhile, we integrated knowledge graph embedding into topic modeling with the hierarchical Dirichlet process so that more interpretable topics can be obtained, with higher topic coherence and more accurate document classification. These works were published in IEEE Big Data'18, SDM'19 and WWW'19. We will also talk about the design of the WSDM Cup Baidu Challenge.

Bio: Dr. Li is currently a research scientist at Baidu Cognitive Computational Lab. He was previously a senior software engineer in Watson Health Cloud at IBM. He received his PhD from the University of Minnesota, Twin Cities in 2011. His current research spans natural language processing, medical text mining, and medical informatics, as well as deep learning, deep reinforcement learning and machine learning. He is the author or co-author of over 20 peer-reviewed papers in premier journals and conferences in relevant fields, including ACL (Association for Computational Linguistics), NAACL, Journal of Natural Language Engineering, WWW, SDM, AMIA (American Medical Informatics Association), and Journal of Biomedical Informatics, among others. He received the NIH Pathway to Independence Award (K99/R00) in 2015. He is a member of AMIA, ACL and ACM.

Task 1: ByteDance - Fake News Classification

ByteDance is a China-based global Internet technology company. Its goal is to build a global content platform that enables people to enjoy various content in different forms, with an emphasis on informing, entertaining, and inspiring people across language, culture, and geography. One of the challenges ByteDance faces is combating different types of fake news, here referring to all forms of false, inaccurate, or misleading information. To address this, ByteDance has created a large database of fake news articles, and any new article must go through a test for content truthfulness before being published, based on matching between the new article and the articles in the database. Articles identified as containing fake news are then withdrawn after human verification of their status. The accuracy and efficiency of this process are therefore crucial to making the platform safe, reliable, and healthy. ByteDance invites researchers and students in the community to take part in the following task. Given the title of a fake news article A and the title of an incoming news article B, participants are asked to classify whether B may contain fake news.

Register now at https://www.kaggle.com/c/fake-news-pair-classification-challenge/
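
As a rough illustration of the title-matching idea in the task description, the Python sketch below flags an incoming title when its word overlap with any known fake title reaches a threshold. The Jaccard measure, the whitespace tokenization, the 0.5 threshold, and the function names are all assumptions made for illustration. They are not ByteDance's method or an official baseline for the challenge.

```python
from typing import Iterable


def jaccard(title_a: str, title_b: str) -> float:
    """Token-set overlap between two titles, from 0.0 (disjoint) to 1.0 (identical sets)."""
    tokens_a = set(title_a.lower().split())
    tokens_b = set(title_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def flag_for_review(new_title: str,
                    known_fake_titles: Iterable[str],
                    threshold: float = 0.5) -> bool:
    """Return True if the new title closely matches any known fake title.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return any(jaccard(new_title, fake) >= threshold for fake in known_fake_titles)
```

A flagged title would still go to human verification, as the task description states. A competitive entry would likely replace the word-overlap score with a learned model over the title pair.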

Task 2: Spotify - Sequential Skip Prediction Challenge

Spotify is an online music streaming service with over 180 million active users interacting with a library of over 35 million tracks. A central challenge for Spotify is to recommend the right music to each user. While there is a large body of work on recommender systems, there is very little work, or data, describing how users sequentially interact with the streamed content they are presented with. In music in particular, the question of if, and when, a user skips a track is an important implicit feedback signal. Spotify has released a dataset and challenge to WSDM in the hope of spurring research on this important and understudied problem in streaming. The challenge focuses on the task of session-based sequential skip prediction, that is, predicting, given a user’s preceding interactions, whether they will skip the next tracks encountered in a session.

Register now at https://www.crowdai.org/challenges/spotify-sequential-skip-prediction-challenge

Task 3: Baidu - Retention Rate of Baidu Hao Kan APP Users

Baidu Hao Kan App is an aggregation platform that provides users with a massive amount of high-quality short video content. It covers video categories such as fun, music, film, entertainment, games, life, and essays. Baidu Hao Kan App uses intelligent algorithms to understand users' interests and preferences and to recommend tailored video content to users. In the process of rapid growth, Baidu Hao Kan App faces new challenges. New users may download the app and browse and play videos for a while. Some new users will use the app again to watch videos the next day (referred to as “retained” users), while others never use the app again. The challenge is to identify factors that increase the user retention rate and to determine the reasons that affect user retention.

Register now at http://dianshi.baidu.com/competition/24/rule
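
To make the notion of a "retained" user concrete, the Python sketch below computes a next-day retention rate from the definition in the task description: the share of new users who use the app again on the day after their first use. The data layout (a mapping from user to first-use date, plus a set of user and date activity pairs) and the function name are assumptions made for illustration. The actual challenge data format is described on the task page.

```python
from datetime import date, timedelta
from typing import Dict, Set, Tuple


def next_day_retention_rate(first_use: Dict[str, date],
                            activity: Set[Tuple[str, date]]) -> float:
    """Fraction of new users who were active on the day after their first use.

    first_use: hypothetical mapping of user id -> date of first app use.
    activity:  hypothetical set of (user id, date) pairs on which the user was active.
    """
    if not first_use:
        return 0.0
    retained = sum(
        1
        for user, first_day in first_use.items()
        if (user, first_day + timedelta(days=1)) in activity
    )
    return retained / len(first_use)
```

In this sketch, a user who returns only two or more days later does not count as retained, which follows the next-day wording of the task description.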

Task 4: Sichuan Airlines - Intelligent Flight Schedules

Sichuan Airlines Co., Ltd was established on August 29, 2002 and is headquartered in Chengdu, China. It has branches in Chongqing and Yunnan, and operational bases in Harbin, Beijing, Hangzhou, Xi'an, Sanya, Tianjin, Urumqi, and Xichang. It now operates over 160 flight routes. Sichuan Airlines faces a number of challenges in scheduling its large number of flights, such as severe weather and aircraft servicing, which may cause large-scale flight delays. When large-scale delays occur, the dispatcher must adjust flight schedules in a timely and effective manner. The purpose of this task is to design and implement an algorithm that automatically identifies flights that may be delayed and recommends an optimization scheme. For example, when extreme weather causes large-scale delays at multiple bases, the algorithm should automatically identify the subsequent flights that may be delayed and recommend an optimal flight replacement plan under various practical constraints.

Register now at http://dianshi.baidu.com/competition/25/rule

Important Dates

Competitions begin: Mid-November 2018
Competitions end and result notification: See each task page
Workshop paper submission deadline: January 11, 2019
WSDM Cup workshop at WSDM 2019 in Melbourne, Australia: February 15, 2019

WSDM Cup Chairs

Zhifeng Bao, RMIT University, zhifeng.bao@rmit.edu.au
Jianzhong Qi, The University of Melbourne, jianzhong.qi@unimelb.edu.au