14th ACM International WSDM Conference


The following tutorials have been selected for WSDM 2021:

Morning (March 8)

Afternoon (March 8)

Evening (March 8)


Scalable Graph Neural Networks with Deep Graph Library

Presenters: Minjie Wang (Amazon, China), Quan Gan (Amazon, China), Mufei Li (Amazon, China), Zheng Zhang (Amazon, China)

Abstract: Learning from graph and relational data plays a major role in many applications including social network analysis, marketing, e-commerce, information retrieval, knowledge modeling, medical and biological sciences, engineering, and others. In the last few years, Graph Neural Networks (GNNs) have emerged as a promising new supervised learning framework capable of bringing the power of deep representation learning to graph and relational data. This ever-growing body of research has shown that GNNs achieve state-of-the-art performance for problems such as link prediction, fraud detection, target-ligand binding activity prediction, knowledge-graph completion, and product recommendations. In practice, many real-world graphs are very large, so scalable solutions for training GNNs efficiently on large graphs are urgently needed. This tutorial will provide an overview of the theory behind GNNs, discuss the types of problems that GNNs are well suited for, and introduce some of the most widely used GNN model architectures together with the problems and applications they are designed to solve. It will introduce the Deep Graph Library (DGL), a scalable GNN framework that simplifies the development of efficient GNN-based training at a large scale. The tutorial will provide hands-on sessions to show how to use DGL to perform scalable training in different settings (multi-GPU training and distributed training).

Advances in Bias-aware Recommendation on the Web


Presenters: Ludovico Boratto (Eurecat - Centre Tecnológic de Catalunya, Spain), Mirko Marras (École Polytechnique Fédérale de Lausanne EPFL, Switzerland)

Abstract: Ranking and recommender systems play a key role in today's online platforms, strongly influencing the information-seeking behavior of a vast number of users. However, these systems are trained on data that often convey imbalances and inequalities, and such patterns might be captured and emphasized in the results the system provides to its end users, creating exposure biases and providing unfair results. Given that biases are becoming a threat to information seeking, (i) studying the interdisciplinary concepts and problem space, (ii) formulating and designing a bias-aware algorithmic pipeline, and (iii) materializing and mitigating the effects of bias, while retaining the effectiveness of the underlying system, are rapidly becoming prominent and timely activities.

The proposed tutorial is organized around this topic, presenting the WSDM community with recent advances in the assessment and mitigation of data and algorithmic bias in recommender systems. We will first introduce conceptual foundations, by surveying the state of the art and describing real-world examples of how bias can impact recommendation algorithms from several perspectives (e.g., ethics and the system's objectives). The tutorial will continue with a systematic presentation of algorithmic solutions to uncover, assess, and reduce bias along the recommendation design process. A practical part will then provide attendees with concrete implementations of pre-, in-, and post-processing bias mitigation algorithms, leveraging open-source tools and public datasets. In this part, tutorial participants will be engaged in the design of the bias countermeasures and in articulating impacts on stakeholders. We will finally conclude the tutorial with an analysis of the emerging open issues and future directions in this vibrant and rapidly evolving research area.

Information to Wisdom: Commonsense Knowledge Extraction and Compilation


Presenters: Simon Razniewski (Max Planck Institute for Informatics, Germany), Niket Tandon (Allen Institute for AI, USA), Aparna S. Varde (Montclair State University, USA)

Abstract: Commonsense knowledge is a cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt's birth date or Angelina Jolie's movie awards, have received much attention, commonsense knowledge on general concepts (politicians, bicycles, printers) and activities (eating pizza, fixing printers) has only been tackled recently. In this tutorial we present state-of-the-art methodologies for the compilation and consolidation of such commonsense knowledge (CSK). We cover text-extraction-based, multi-modal and Transformer-based techniques, with a special focus on the issues of web search and ranking, which are of particular relevance to the WSDM community.

Beyond Probability Ranking Principle: Modeling the Dependencies among Documents

Presenters: Liang Pang (Institute of Computing Technology, CAS, China), Qingyao Ai (The University of Utah, USA), Jun Xu (Renmin University of China, China)

Abstract: The Probability Ranking Principle (PRP) is the fundamental principle for ranking; it assumes that each document has a unique and independent probability of satisfying a particular information need. Traditional heuristic features and well-known learning-to-rank approaches were designed following the PRP. Recent deep-learning-enhanced ranking models, also referred to as “deep text matching”, obey the PRP as well. However, the PRP is not optimal for ranking, because documents are not independent of one another in many recent ranking tasks, such as pseudo-relevance feedback and interactive information retrieval. To solve this problem, a new trend in ranking models is to model the dependencies among documents.

In this tutorial, we aim to give a comprehensive survey of recent progress in ranking models that go beyond the PRP. Our tutorial offers a perspective on the field: we try to categorize these models based on their intrinsic assumptions and to formalize the standard problems. In this way, we hope to draw researchers' attention to this field, which we expect to lead to substantial improvements in information retrieval. The tutorial mainly consists of three parts. Firstly, we introduce the ranking problem and the well-known Probability Ranking Principle. Secondly, we present traditional approaches under the PRP. Lastly, we illustrate the limitations of the PRP and introduce the most recent work that models the dependencies among documents in a sequential way and in a global way.
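
As a rough illustration of the contrast drawn above, the sketch below ranks a toy candidate set in two ways. The first ranker scores every document independently, as the PRP prescribes. The second selects documents sequentially, penalizing each candidate for its similarity to documents already chosen. That second ranker is a maximal-marginal-relevance-style heuristic used here purely for illustration; it is not claimed to be one of the models covered in the tutorial. The relevance scores, similarity values, and the names `prp_rank`, `sequential_rank`, and `trade_off` are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy data: d1 and d2 are near-duplicates, d3 is distinct.
RELEVANCE: Dict[str, float] = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
SIMILARITY: Dict[FrozenSet[str], float] = {frozenset(("d1", "d2")): 0.95}


def prp_rank(relevance: Dict[str, float]) -> List[str]:
    """Rank by descending relevance; each document is scored independently."""
    return sorted(relevance, key=lambda doc: relevance[doc], reverse=True)


def sequential_rank(
    relevance: Dict[str, float],
    similarity: Dict[FrozenSet[str], float],
    trade_off: float = 0.5,
) -> List[str]:
    """Greedy selection: each pick depends on the documents already ranked."""
    remaining = set(relevance)
    ranking: List[str] = []

    def marginal(doc: str) -> float:
        # Redundancy is the highest similarity to any already selected document.
        redundancy = max(
            (similarity.get(frozenset((doc, chosen)), 0.0) for chosen in ranking),
            default=0.0,
        )
        return trade_off * relevance[doc] - (1.0 - trade_off) * redundancy

    while remaining:
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(remaining), key=marginal)
        ranking.append(best)
        remaining.remove(best)
    return ranking


if __name__ == "__main__":
    print(prp_rank(RELEVANCE))
    print(sequential_rank(RELEVANCE, SIMILARITY))
```

With this toy data the independent ranker places the two near-duplicates at the top, while the sequential ranker moves the distinct document ahead of the second near-duplicate.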

Pretrained Transformers for Text Ranking: BERT and Beyond

Presenters: Andrew Yates (Max Planck Institute for Informatics, Germany), Rodrigo Nogueira (University of Waterloo, Canada), and Jimmy Lin (University of Waterloo, Canada)

Abstract: The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond.

In this tutorial, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our tutorial: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this tutorial also attempts to prognosticate where the field is heading.

Personalization in Practice: Methods and Applications

Presenters: Dmitri Goldenberg (, Tel Aviv, Israel), Kostia Kofman (, Tel Aviv, Israel), Javier Albert (, Tel Aviv, Israel), Sarai Mizrachi (, Tel Aviv, Israel), Adam Horowitz (, Tel Aviv, Israel), Irene Teinemaa (, Amsterdam)

Abstract: Personalization is one of the key applications of machine learning, with widespread usage across e-commerce, entertainment, manufacturing, healthcare, and various other industries. While many machine learning techniques integrate state-of-the-art advances and enhanced performance capabilities from year to year, personalization and recommender systems applications are often late adopters, typically due to heightened complexity of implementation. This tutorial presents recent advances in personalization techniques and demonstrates their practical applications in real-world case studies from leading online platforms. We give an overview of some of the most prominent research areas of recent years in machine-learning-based personalization, specifically deep learning, causality, and active exploration. We dive deep into a large body of work aimed at solving key business problems, covering uplift modeling, contextual bandits and sequence modeling methods. We also cover emerging topics in personalization, including explainability, fairness, natural interfaces, and content generation, addressing the technological and user experience considerations of their application.

Systemic Challenges and Solutions on Bias and Unfairness in Peer Review


Presenter: Nihar B. Shah (Carnegie Mellon University, USA)

Abstract: Peer review is the backbone of scientific research. Yet peer review is frequently called "biased," "broken," "corrupt," and "unscientific" in many scientific disciplines. This problem is further compounded by the near-exponentially growing number of submissions to various computer science conferences. Due to the prevalence of the "Matthew effect" of the rich getting richer in academia, any source of unfairness in the peer review system, such as those discussed in this tutorial, can considerably affect the entire career trajectory of (young) researchers. In this tutorial, we will discuss a number of systemic challenges in peer review such as biases, subjectivity, miscalibration, dishonest behavior, noise, as well as policy choices. For each issue, we will first present insightful experiments to understand the issue. Then we will present computational techniques designed to address these challenges. We will also highlight a number of open problems.

Neural Structured Learning: Training Neural Networks with Structured Signals

Presenters: Arjun Gopalan (Google Research, USA), Da-Cheng Juan (Google Research, USA), Cesar Ilharco Magalhães (Google Research, USA), Chun-Sung Ferng (Google Research, USA), Allan Heydon (Google Research, USA), Chun-Ta Lu (Google Research, USA), Philip Pham (Google Research, USA), George Yu (Google Research, USA), Yicheng Fan (Google Research, USA), Yueqi Wang (Google Research, USA).

Abstract: We present Neural Structured Learning (NSL), a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit, as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. Structured signals are commonly used to represent relations or similarity among samples that may be labeled or unlabeled. Leveraging these signals during neural network training therefore harnesses both labeled and unlabeled data, which can improve model accuracy, particularly when the amount of labeled data is relatively small. Additionally, models trained with samples that are generated by adding adversarial perturbation have been shown to be robust against malicious attacks, which are designed to mislead a model’s prediction or classification. NSL generalizes to both Neural Graph Learning and Adversarial Learning. Neural Structured Learning is open-sourced on GitHub and is part of the TensorFlow ecosystem. The NSL website contains the theoretical foundations of the technology, API documentation, and hands-on tutorials. NSL is widely used in Google across many products and services.

Our tutorial will cover several aspects of Neural Structured Learning with an emphasis on two techniques -- graph regularization and adversarial regularization. In addition to using interactive hands-on tutorials that demonstrate the NSL framework and APIs in TensorFlow, we also plan to have short presentations that accompany them to provide additional motivation and context. Finally, we will discuss some recent research that is closely related to Neural Structured Learning but not yet part of its framework in TensorFlow. Topics here include using graphs for learning embeddings and several advanced models of graph neural networks. This will demonstrate the generality of the Neural Structured Learning framework as well as open doors to future extensions and collaborations with the community.
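
To make the idea of graph regularization slightly more concrete, here is a minimal sketch in plain Python. It assumes a simplified form of the objective: the ordinary supervised loss plus a weighted penalty on the squared distance between the embeddings of samples connected by an edge. It does not use the NSL API. The function names, the `multiplier` parameter, and the toy embeddings and edge weights are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# An edge is (sample_id, neighbor_id, edge_weight); all values are hypothetical.
Edge = Tuple[str, str, float]


def squared_l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_regularized_loss(
    supervised_loss: float,
    embeddings: Dict[str, Sequence[float]],
    edges: List[Edge],
    multiplier: float = 0.1,
) -> float:
    """Supervised loss plus a penalty that pulls neighboring embeddings together."""
    neighbor_penalty = sum(
        weight * squared_l2(embeddings[src], embeddings[dst])
        for src, dst, weight in edges
    )
    return supervised_loss + multiplier * neighbor_penalty


if __name__ == "__main__":
    toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [1.0, 2.0]}
    toy_edges = [("a", "b", 1.0), ("a", "c", 0.5)]
    print(graph_regularized_loss(0.5, toy_embeddings, toy_edges, multiplier=0.5))
```

Because the penalty depends only on embeddings and edges, it can be computed for unlabeled neighbors as well, which is how this kind of objective can draw on both labeled and unlabeled data.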

Tutorial on Conversational Recommendation Systems

Presenters: Zuohui Fu (Rutgers University, USA), Yikun Xian (Rutgers University, USA), Yongfeng Zhang (Rutgers University, USA), Yi Zhang (University of California Santa Cruz, USA).

Abstract: Recent years have witnessed the emergence of conversational systems, including both physical devices and mobile-based applications. Both the research community and industry believe that conversational systems will have a major impact on human-computer interaction, and specifically, the RecSys community has begun to explore Conversational Recommendation Systems. Conversational recommendation aims at finding or recommending the most relevant information (e.g., web pages, answers, movies, products) for users based on textual or spoken dialogs, through which users can communicate with the system more efficiently using natural language conversations. Due to users’ constant need to look for information to support both work and daily life, conversational recommendation systems will be one of the key techniques towards an intelligent web. The tutorial focuses on the foundations and algorithms for conversational recommendation, as well as their applications in real-world systems such as search engines, e-commerce and social networks. The tutorial aims at introducing and communicating conversational recommendation methods to the community, as well as bringing together researchers and practitioners interested in this research direction for discussion, the exchange of ideas, and the promotion of research.

Deep Learning for Anomaly Detection: Challenges, Methods, and Opportunities


Presenters: Guansong Pang (University of Adelaide, Australia), Longbing Cao (University of Technology Sydney, Australia), Charu Aggarwal (IBM T. J. Watson Research Center, USA)

Abstract: Anomaly detection can offer important insights into many safety-critical or commercially significant real-world applications such as extreme climate event detection, mechanical fault detection, terrorist detection, fraud detection, and malicious URL detection, to name just a few. In this tutorial we aim to present a comprehensive review of the advances in deep learning techniques specifically designed for anomaly detection. Deep learning has gained tremendous success in transforming many data mining and machine learning tasks, but popular deep learning techniques are inapplicable to anomaly detection due to some unique characteristics of anomalies, e.g., rarity, heterogeneity, boundless nature, and the prohibitively high cost of collecting large-scale anomaly data. In this tutorial, we first discuss the challenges presented in anomaly detection, and then present a systematic overview of this area from various learning perspectives in three high-level categories of methods and 11 fine-grained subcategories of methods. We introduce the key intuitions, objective functions, underlying assumptions, advantages and disadvantages of these deep anomaly detection methods, and discuss how they may address the aforementioned challenges. Lastly, we review some closely related areas, such as out-of-distribution detection and curiosity learning in reinforcement learning, and then discuss important future research opportunities.

WSDM 2021 Tutorial Chairs