Tutorials

Full-Day Tutorial

Health Search: From Consumers to Clinicians

Bevan Koopman, CSIRO, Australia

Guido Zuccon, University of Queensland, Australia

When: Full day.

Topic: This tutorial will cover topics from an area of information retrieval (IR) with significant societal impact – health search. Whether it is searching patient records, helping medical professionals find best-practice evidence, or helping the public locate reliable and readable health information online, health search is a challenging area for IR research with an actively growing community and many open problems. This tutorial will provide attendees with a full stack of knowledge on health search, from understanding users and their problems to practical, hands-on sessions on current tools and techniques, current campaigns and evaluation resources, as well as important open questions and future directions.

Audience: Researchers of all levels seeking to understand the challenges, tasks and recent developments in information retrieval related to health search. No prior knowledge in health search is required, making this tutorial ideal for those unfamiliar with this domain. The tutorial is also suitable for those familiar with health search as they will acquire insights from hands-on sessions. The tutorial will also provide an analysis of the successes and failures of current techniques, and an outline of the opportunities for IR research in the health domain.

Presenters: Bevan Koopman is a Research Scientist at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), based in Brisbane, Australia. He leads projects dedicated to health search: novel search engine technology to improve access, retrieval and analysis of different health data. Guido Zuccon is a Senior Lecturer at the University of Queensland, an ARC DECRA fellow, and a Google Faculty Award recipient. His research interests include formal models of search and evaluation methods, in particular applied to health search. Guido and Bevan have a long history of collaboration in health research and co-supervise a number of PhD students in the area of health search.


Half-Day Tutorials

Causal Inference and Counterfactual Reasoning

Emre Kıcıman, Microsoft, USA

Amit Sharma, Microsoft, India

When: Morning.

Topic: As computing systems are more frequently and more actively intervening to improve people’s work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature in statistics, social sciences and machine learning. To reason about such effects, we will introduce the key ingredient that causal analysis depends on, counterfactual reasoning, and describe the two most popular frameworks, based on Bayesian graphical models and potential outcomes. Based on this, we will cover a range of methods suitable for doing causal inference with large-scale online data, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We will also focus on best practices for evaluation and validation of causal inference techniques, drawing from our own experiences.

Audience: Empirical researchers who have experience working with data gathered from computing systems and are interested in causal insights from their data. Intermediate understanding of probability and statistics is preferred, although not required for following the core counterfactual concepts and examples.

Presenters: The presenters, Emre Kıcıman (Principal Researcher, Microsoft Research AI) and Amit Sharma (Researcher, Microsoft Research India), focus on methods for causal inference from large-scale data, for information retrieval, recommendation, computational social science, social computing and other applications. Amit's work combines principles from Bayesian graphical models, data mining and machine learning. He is focused on understanding the underlying mechanisms that shape people's activities as they interact with algorithmic systems, with an emphasis on the effect of recommendation systems and social influences. Emre's research focuses on causal analysis of large-scale, individual-level timelines to support decision-making applications, and he is broadly engaged in AI and its impacts on people and society.


Deep Learning for Matching in Search and Recommendation

Jun Xu, Renmin University of China, China

Xiangnan He, National University of Singapore, Singapore

Hang Li, Bytedance AI Lab, China

When: Morning.

Topic: Matching is a key problem in both search and recommendation. In recent years, deep learning has been successfully applied to the task, and neural network based matching models have been extensively utilized in practice, including deep semantic matching models for search and neural collaborative filtering models for recommendation. In this tutorial, the presenters will give a comprehensive survey of recent progress in deep learning for matching in search and recommendation. The tutorial is unique in that it provides a unified view of search and recommendation. By unifying the two tasks under the view of matching and comparatively reviewing existing techniques, the presenters will help participants gain a deeper understanding of, and more insight into, the space.

Audience: This half-day tutorial targets PhD students, researchers, and industry practitioners who are interested in acquiring or advancing their knowledge of search and recommendation. Participants are expected to have basic knowledge of search, recommendation, and machine learning.

Presenters: Dr Jun Xu is a professor at the School of Information, Renmin University of China. Dr Xiangnan He is a senior research fellow at the School of Computing, National University of Singapore (NUS). Dr Hang Li is a director of AI Lab, Bytedance Technology. The three presenters are experts on the subject of the tutorial.


Privacy-Preserving Data Mining in Industry

Krishnaram Kenthapadi, LinkedIn, USA

Ilya Mironov, Google, USA

Abhradeep Thakurta, University of California Santa Cruz, USA

When: Morning.

Topic: Preserving the privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and privacy has received renewed attention in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and the evolution of privacy techniques leading to the definition of differential privacy and its associated techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, Microsoft's differential privacy deployment for collecting Windows telemetry, and Uber's SQL differential privacy tool. We will also discuss various open source as well as commercial privacy tools, and conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.

Audience: This tutorial is aimed at attendees with a wide range of interests and backgrounds, including researchers interested in knowing about various privacy breaches, regulations, definitions, and techniques as well as practitioners interested in deploying privacy-preserving mechanisms for web-scale data mining applications. We will not assume any prerequisite knowledge, and present the intuition underlying various privacy definitions and techniques to ensure that the material is accessible to all WSDM attendees.

Presenters: The presenters have extensive experience in the field of privacy-preserving data mining, including the theory and application of privacy techniques. Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the fairness, transparency, explainability, and privacy modeling efforts across different LinkedIn applications. He also serves as LinkedIn's representative in Microsoft's AI and Ethics in Engineering and Research (AETHER) Committee. Ilya Mironov is a Staff Research Scientist at Google Brain working on privacy in machine learning. Abhradeep Guha Thakurta is an Assistant Professor in the Department of Computer Science at the University of California, Santa Cruz. His primary research interest is in the intersection of data privacy and machine learning. He focuses on demonstrating, both theoretically and in practice, that privacy enables designing better machine learning algorithms, and vice versa.


Conducting Laboratory Experiments Properly with Statistical Tools

Tetsuya Sakai, Waseda University, Japan

When: Afternoon.

Topic: This hands-on half-day tutorial consists of two sessions. Part I covers the following topics: preliminaries; paired and two-sample t-tests, confidence intervals; one-way ANOVA and two-way ANOVA without replication; familywise error rate. Part II covers the following topics: Tukey's HSD test, simultaneous confidence intervals; randomisation test and randomised Tukey HSD test; what's wrong with statistical significance tests; effect sizes and statistical power; topic set size design and power analysis; summary; and how to report your results.

Audience: Participants should have some prior knowledge of the very basics of statistical significance testing, such as the normal distribution and the Type I error, and are strongly encouraged to bring a laptop with R already installed. They will learn how to design and conduct statistical significance tests for comparing the mean effectiveness scores of two or more systems appropriately, and to report on the test results in an informative manner.

Presenter: Tetsuya Sakai is a professor and the head of department at the Department of Computer Science and Engineering, Waseda University, Japan. He is also a visiting professor at the National Institute of Informatics. He joined Toshiba in 1993 and obtained a PhD from Waseda in 2000. From 2000 to 2001, he was supervised by the late Karen Spärck Jones at the Computer Laboratory, University of Cambridge, as a visiting researcher. In 2007, he joined NewsWatch, Inc. as the director of the Natural Language Processing Lab. In 2009, he joined Microsoft Research Asia. He joined the Waseda faculty in 2013, and has received three teaching awards from the university (2014, 2016, 2017). He is an editor-in-chief of the Information Retrieval Journal (Springer) and an associate editor of ACM TOIS. He is the author of a Springer book: Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power (2018).


Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned

Sarah Bird, Facebook, USA

Krishnaram Kenthapadi, LinkedIn, USA

Emre Kıcıman, Microsoft, USA

Margaret Mitchell, Google, USA

When: Afternoon.

Topic: Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice, by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.

Audience: This tutorial is aimed at attendees with a wide range of interests and backgrounds, including researchers interested in knowing about algorithmic bias / discrimination issues and takeaways, key regulations / laws, and fairness definitions / techniques as well as practitioners interested in implementing fairness-aware mechanisms for web-scale machine learning applications. We will not assume any prerequisite knowledge, and will present the intuition underlying various fairness definitions and techniques to ensure that the material is accessible to all WSDM attendees.

Presenters: The presenters have extensive experience in the field of fairness-aware machine learning, including the theory and application of fairness techniques. Sarah Bird leads strategic projects at the intersection of AI research and products at Facebook. Her current work focuses on AI Ethics and developing AI responsibly at scale. Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the fairness, transparency, explainability, and privacy modeling efforts across different LinkedIn applications. He also serves as LinkedIn's representative in Microsoft's AI and Ethics in Engineering and Research (AETHER) Committee. Emre Kıcıman is a Principal Researcher at Microsoft Research AI. His current research focuses on causal analysis and data bias in the context of computational social science analyses and decision support systems; and, broadly, the implications of AI for people and society. Margaret Mitchell is a Senior Research Scientist in Google's Research & Machine Intelligence group, working on artificial intelligence, multimodality, and ethics, and she currently leads Google's Ethical AI team. Her research involves vision-language, computer vision, and grounded language generation, focusing on how to evolve artificial intelligence towards positive goals.


Fake News: Fundamental Theories, Detection Strategies and Challenges

Xinyi Zhou, Syracuse University, USA

Reza Zafarani, Syracuse University, USA

Kai Shu, Arizona State University, USA

Huan Liu, Arizona State University, USA

When: Afternoon.

Topic: The explosive growth of fake news and its erosion of democracy, justice, and public trust have increased the demand for fake news detection. As an interdisciplinary topic, the study of fake news encourages a concerted effort of experts in computer and information science, political science, journalism, social psychology, and economics. A comprehensive framework to systematically understand and detect fake news is necessary to attract and unite researchers in related areas to work on the topic of fake news. This tutorial aims to clearly present (1) fake news detection problems, challenges, and research directions; (2) a comparison between fake news and other related concepts (for example, rumors); (3) the fundamental theories developed across various disciplines that facilitate interdisciplinary research; (4) various detection strategies unified under a comprehensive fake news detection framework; and (5) the state-of-the-art datasets, patterns, and models. We present fake news detection from various perspectives, broadly adopting techniques from data mining, machine learning, natural language processing, information retrieval and social search. With the 2020 US presidential election approaching, challenges for automatic, effective and efficient fake news detection are also detailed in this tutorial.

Audience: Researchers, graduate students, practitioners, and project managers in areas such as computer science and engineering, information science and management, journalism, political science, social sciences, psychology and economics. Preliminary background in data mining, machine learning, and natural language processing is recommended for tutorial participants.

Presenters: Xinyi Zhou is a PhD student in Computer and Information Science and Engineering at Syracuse University. She also works as a research assistant at the Data Lab, advised by Dr Reza Zafarani. Her research interests span Computational Journalism, Data Mining, and Social Media Mining. Reza Zafarani is an Assistant Professor of EECS at Syracuse University. Reza's research interests are in Social Media Mining, Data Mining, Machine Learning, and Social Network Analysis, with an emphasis on addressing challenges in large-scale data analytics to enhance the scientific discovery process from big data, especially in social media. Kai Shu is a PhD student majoring in computer science at Arizona State University. He also works as a research assistant at the Data Mining and Machine Learning Lab (DMML), supervised by Dr Huan Liu. Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his PhD in Computer Science at the University of Southern California and his B.Eng. in Computer Science and Electrical Engineering at Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at the National University of Singapore.

Key Dates

  • Early bird registration deadline 8 December 2018
  • On time registration deadline 8 February 2019
  • WSDM conference 11-15 February 2019