Knowledge-Augmented Methods for Natural Language Processing
Duration: Half-Day (AM)
Website: https://github.com/zcgzcgzcg1/WSDM2023_Knowledge_NLP_Tutorial
Summary: Knowledge in NLP has been a rising trend especially after the advent of large-scale pre-trained models. Knowledge is critical to equip statistics-based models with common sense, logic and other external information. In this tutorial, we will introduce recent state-of-the-art works in applying knowledge in language understanding, language generation and commonsense reasoning.
Presenters: Chenguang Zhu (Microsoft Cognitive Services Research), Yichong Xu (Microsoft Cognitive Services Research), Xiang Ren (University of Southern California), Bill Yuchen Lin (University of Southern California), Meng Jiang (University of Notre Dame), Wenhao Yu (University of Notre Dame)
Hate Speech: Detection, Mitigation and Beyond
Duration: Half-Day (PM)
Website: https://hate-alert.github.io/talk/wsdm_tutorial/
Summary: Social media sites such as Twitter and Facebook have connected billions of people and given users the opportunity to share their ideas and opinions instantly. That being said, there are several negative consequences as well, such as online harassment, trolling, cyber-bullying, fake news, and hate speech. Out of these, hate speech presents a unique challenge as it is deeply ingrained in our society and is often linked with offline violence. Social media platforms rely on human moderators to identify hate speech and take necessary action. However, with the increase in online hate speech, these platforms are turning toward automated hate speech detection and mitigation systems. This shift brings several challenges and hence is an important avenue to explore for the computational social science community. In this tutorial, we present an exposition of hate speech detection and mitigation in three steps. First, we describe the current state of research in the hate speech domain, focusing on different hate speech detection and mitigation systems that have been developed over time. Next, we highlight the challenges that these systems might carry, such as bias and the lack of transparency. The final section concretizes the path ahead, providing clear guidelines for the community working in hate speech and related domains. We also outline the open challenges and research directions for interested researchers.
Presenters: Punyajoy Saha (IIT Kharagpur), Mithun Das (IIT Kharagpur), Binny Mathew (IIT Kharagpur), Animesh Mukherjee (IIT Kharagpur)
A Tutorial on Domain Generalization
Duration: Half-Day (PM)
Website: https://dgresearch.github.io/
Summary: With the availability of massive labeled training data, powerful machine learning models can be trained. However, the traditional I.I.D. assumption that the training and testing data should follow the same distribution is often violated in reality. While existing domain adaptation approaches can tackle domain shift, they rely on target samples for training. Domain generalization is a promising technology that aims to train models with good generalization ability to unseen distributions. In this tutorial, we will present the recent advances in domain generalization. Specifically, we introduce the background, formulation, and theory behind this topic. Our primary focus is on the methodology, evaluation, and applications. We hope this tutorial can draw the interest of the community and provide a thorough review of this area.
Presenters: Jindong Wang (Microsoft Research Asia), Haoliang Li (City University of Hong Kong), Sinno Pan (Nanyang Technological University), Xing Xie (Microsoft Research Asia)
Trustworthy Algorithmic Ranking Systems
Duration: Half-Day (PM)
Website: https://github.com/socialcomplab/Trustworthy-ARS-Tutorial-WSDM22
Summary: This tutorial aims to provide its audience with an interdisciplinary overview of fairness and non-discrimination, diversity, and transparency as the main dimensions of trustworthy AI systems, tailored to algorithmic ranking systems such as search engines and recommender systems. We will equip the mostly technical audience of WSDM with the necessary understanding of the ethical implications of their research and development on the one hand, and of recent political and legal regulations that address the aforementioned dimensions on the other hand. While the tutorial foremost takes a European perspective, because EU regulation is at the forefront of elaborating guidelines for ethical and trustworthy AI, we also review initiatives outside of Europe, in particular in the US and China. Since ensuring non-discrimination, diversity, and transparency in retrieval and recommendation systems is a global endeavor in which academic institutions and companies in different parts of the world collaborate, this tutorial is also relevant to researchers and practitioners in countries that do not yet regulate AI technologies, in particular since such regulations are becoming increasingly common. The tutorial, therefore, targets both academic scholars and practitioners around the globe, by reviewing recent research and providing practical examples addressing one or more of the trustworthiness aspects, and showcasing how new regulations affect the audience’s daily work.
Presenters: Markus Schedl (Johannes Kepler University and Linz Institute of Technology), Emilia Gómez (European Commission, Joint Research Centre and Universitat Pompeu Fabra), Elisabeth Lex (Graz University of Technology)
Proactive Conversational Agents
Duration: Half-Day (AM)
Website: https://github.com/lsyysl9711/WSDM2023_Proactive_Conversational_Agents_Tutorial
Summary: Conversational agents, commonly known as dialogue systems, have gained escalating popularity in recent years. Their widespread applications support conversational interactions with users and accomplish various tasks as personal assistants. However, one key weakness in existing conversational agents is that they only learn to passively answer user queries via training on pre-collected and manually-labeled data. Such passiveness makes the interaction modeling and system-building process relatively easier, but it largely hinders the possibility of being human-like, hence lowering the user engagement level. In this tutorial, we introduce and discuss methods to equip conversational agents with the ability to interact with end users in a more proactive way. This three-hour tutorial is divided into three parts and includes two interactive exercises. It reviews and presents recent advancements on the topic, focusing on automatically expanding ontology space, actively driving conversation by asking questions or strategically shifting topics, and retrospectively conducting response quality control.
Presenters: Lizi Liao (Singapore Management University), Grace Hui Yang (Georgetown University), Chirag Shah (University of Washington)
Preference-Based Offline Evaluation
Duration: Half-Day (AM)
Website: https://github.com/claclark/wsdm2023-tutorial
Summary: A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly-effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, will review early work in preference-based evaluation and cover recent developments.
Presenters: Charles Clarke (University of Waterloo), Fernando Diaz (Google), Negar Arabzadeh (University of Waterloo)
Natural and Artificial Dynamics in GNNs
Duration: Half-Day (AM)
Website: https://github.com/DongqiFu/Natural-and-Artificial-Dynamics-in-GNNs-A-Tutoriall
Summary: In the big data era, the relationships between entities are becoming more complex. Therefore, graph (or network) data attracts increasing research attention for carrying complex relational information. For a myriad of graph mining/learning tasks, graph neural networks (GNNs) have proven to be effective tools for extracting informative node and graph representations, which empower a broad range of applications such as recommendation, fraud detection, molecule design, and many more. However, real-world scenarios bring pragmatic challenges to GNNs. First, the input graphs are evolving, i.e., the graph structure and node features are time-dependent. Integrating temporal information into GNNs to enhance their representation power requires additional ingenious designs. Second, the input graphs may be unreliable, noisy, and suboptimal for a variety of downstream graph mining/learning tasks. How could end-users deliberately modify the given graphs (e.g., graph topology and node features) to boost GNNs’ utility (e.g., accuracy and robustness)? Inspired by the above two kinds of dynamics, in this tutorial, we focus on topics of natural dynamics and artificial dynamics in GNNs and introduce the related works systematically. After that, we point out some promising but under-explored research problems in the combination of these two dynamics. We hope this tutorial will be beneficial to researchers and practitioners in areas including data mining, machine learning, and general artificial intelligence.
Presenters: Dongqi Fu (University of Illinois at Urbana-Champaign), Zhe Xu (University of Illinois at Urbana-Champaign), Hanghang Tong (University of Illinois at Urbana-Champaign), Jingrui He (University of Illinois at Urbana-Champaign)
Next-Generation Challenges of Responsible Data Integration
Duration: Half-Day (AM)
Website: https://asudeh.github.io/indexlab/tutorial22.htm
Summary: Data integration has been extensively studied by the data management community and is a core task in the data pre-processing step of ML pipelines. When the integrated data is used for analysis and model training, responsible data science requires addressing concerns about data quality and bias. We present a tutorial on data integration and responsibility, highlighting the existing efforts in responsible data integration along with research opportunities and challenges. In this tutorial, we encourage the community to audit data integration tasks with responsibility measures and develop integration techniques that optimize the requirements of responsible data science. We focus on three critical aspects: (1) the requirements to be considered for evaluating and auditing data integration tasks for quality and bias; (2) the data integration tasks that elicit attention to data responsibility measures and methods to satisfy these requirements; and, (3) techniques, tasks, and open problems in data integration that help achieve data responsibility.
Presenters: Fatemeh Nargesian (University of Rochester), Abolfazl Asudeh (University of Illinois Chicago), H. V. Jagadish (University of Michigan)
Data Democratisation with Deep Learning: The Anatomy of a Natural Language Data Interface
Duration: Half-Day (PM)
Website: https://darelab.imsi.athenarc.gr/tutorials/datadem_wsdm23/
Summary: In the age of the Digital Revolution, almost all human activities, from industrial and business operations to medical and academic research, are reliant on the constant integration and utilisation of ever-increasing volumes of data. However, the explosive volume and complexity of data make data querying and exploration challenging even for experts, and make the need to democratise access to data, even for non-technical users, all the more evident. It is time to lift all technical barriers by empowering users to access relational databases through conversation. We consider 3 main research areas that a natural language data interface is based on: Text-to-SQL, SQL-to-Text, and Data-to-Text. The purpose of this tutorial is a deep dive into these areas, covering state-of-the-art techniques and models, and explaining how the progress in the deep learning field has led to impressive advancements. We will present benchmarks that sparked research and competition, and discuss open problems and research opportunities, with one of the most important challenges being the integration of these 3 research areas into one conversational system.
Presenters: George Katsogiannis-Meimarakis (Athena Research Center), Mike Xydas (Athena Research Center), Georgia Koutrika (Athena Research Center)
AutoML for Deep Recommender Systems: Fundamentals and Advances
Duration: Half-Day (PM)
Website: https://advanced-recommender-systems.github.io/AutoML-Recommendations/
Summary: Recommender systems have become increasingly important in our daily lives since they play an important role in mitigating the information overload problem, especially in many user-oriented online services. Recommender systems aim to identify a set of items that best match users’ explicit or implicit preferences, by utilizing the user and item interactions to improve the accuracy. With the fast advancement of deep neural networks (DNNs) in the past few decades, recommendation techniques have achieved promising performance. However, we still face three inherent challenges in designing deep recommender systems (DRS): 1) the majority of existing DRS are developed based on hand-crafted components, which requires ample expert knowledge of recommender systems; 2) human error and bias can lead to suboptimal components, which reduces the recommendation effectiveness; 3) non-trivial time and engineering efforts are usually required to design the task-specific components in different recommendation scenarios. In this tutorial, we aim to give a comprehensive survey on the recent progress of advanced Automated Machine Learning (AutoML) techniques for solving the above problems in deep recommender systems. More specifically, we will present feature selection, feature embedding search, feature interaction search, and whole DRS pipeline model training and comprehensive search for deep recommender systems. In this way, we expect that academic researchers and industrial practitioners in related fields can gain a deep understanding of and accurate insight into the spaces, stimulate more ideas and discussions, and promote developments of technologies in recommendations.
Presenters: Xiangyu Zhao (City University of Hong Kong), Wenqi Fan (The Hong Kong Polytechnic University), Bo Chen (Huawei Noah’s Ark Lab), Yejing Wang (City University of Hong Kong), Huifeng Guo (Huawei Noah’s Ark Lab), Ruiming Tang (Huawei Noah’s Ark Lab)