The 11th ACM International Conference on Web Search and Data Mining

Los Angeles, California, USA, Feb. 5-9, 2018.

Tutorials

In case of limited space due to popularity, space will be first come, first served.

Neural Networks for Information Retrieval

09:00 - 17:00, Monday, February 5, 2018
Room: Salon II & III
Tom Kenter (University of Amsterdam), Alexey Borisov (Yandex), Christophe Van Gysel (University of Amsterdam), Mostafa Dehghani (University of Amsterdam), Maarten de Rijke (University of Amsterdam), Bhaskar Mitra (Microsoft)
Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many approaches to many IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR.

A Critical Review of Online Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

09:00 - 12:30, Monday, February 5, 2018
Room: Admiralty
Alexandra Olteanu (IBM), Emre Kiciman (Microsoft), Carlos Castillo (Universitat Pompeu Fabra)
Online social data like user-generated content, expressed or implicit relations among people, and behavioral traces are at the core of many popular web applications and platforms, driving the research agenda of researchers in both academia and industry. The promises of social data are many, including the understanding of "what the world thinks" about a social issue, brand, product, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. However, many academics and practitioners are increasingly warning against the naïve usage of social data. They highlight that there are biases and inaccuracies occurring at the source of the data, but also introduced along the data processing pipeline; there are methodological limitations and pitfalls, as well as ethical boundaries and unexpected outcomes that are often overlooked. Such oversights can lead to wrong or inappropriate results that can be consequential.

This tutorial recognizes that the rigor with which these issues are addressed by different researchers varies across a wide range, and aims to survey and categorize common classes of data biases and pitfalls that can occur both at the sources of social data and along the prototypical data processing pipeline.


Athlytics: Winning in Sports with Data

09:00 - 12:30, Monday, February 5, 2018
Room: Pavilion
Konstantinos Pelechrinis (University of Pittsburgh), Evangelos Papalexakis (University of California Riverside)
Data and analytics have been part of the sports industry from as early as the 1870s, when the first boxscore in baseball was recorded. However, it is only recently that advanced data mining and machine learning techniques have been utilized for facilitating the operations of sports franchises. While part of the reason is related to the ability to collect more fine-grained data, an equally important factor for this turn to analytics is the huge success and competitive advantage that early adopters of investment in analytics enjoyed (popularized by the best-seller "Moneyball", which described the success that the Oakland Athletics had with analytics). Draft selection, game-day decision making and player evaluation are just a few of the applications where sports analytics play a crucial role today. Apart from the sports clubs, other stakeholders in the industry (e.g., the leagues' offices and the media) invest in analytics. The leagues increasingly rely on data in order to decide on potential rule changes. For instance, the most recent rule change in the NFL, i.e., the kickoff touchback, was a result of thorough data analysis of concussion instances. In this tutorial we will review the literature on data mining and machine learning techniques for sports analytics. We will introduce the audience to the design and methodologies behind advanced metrics such as the adjusted plus/minus for evaluating basketball players, spatial metrics for evaluating the ability of a player to spread the defense in basketball, and the Player Efficiency Rating (PER). We will also go into depth on advanced data mining methods, and in particular tensor mining, that can analyze heterogeneous data similar to the data available in today's sports world.

Differential Privacy for Information Retrieval

09:00 - 12:30, Monday, February 5, 2018
Room: Plaza
Grace Hui Yang (Georgetown University), Sicong Zhang (Georgetown University)
The concern for privacy is real for any research that uses user data. Information Retrieval (IR) is not an exception. Many IR algorithms and applications require the use of users' personal information, contextual information and other sensitive and private information. The extensive use of personalization in IR has become a double-edged sword. Sometimes, the concern becomes so overwhelming that IR research has to stop to avoid privacy leaks. The good news is that recently there has been increasing attention paid to the joint field of privacy and IR -- privacy-preserving IR. As part of the effort, this tutorial offers an introduction to differential privacy (DP), one of the most advanced techniques in privacy research, and provides the necessary theoretical knowledge for applying privacy techniques in IR. Differential privacy is a technique that provides strong privacy guarantees for data protection. Theoretically, it aims to maximize the data utility in statistical datasets while minimizing the risk of exposing individual data entries to any adversary. Differential privacy has been successfully applied to a wide range of applications in databases (DB) and data mining (DM). The research in privacy-preserving IR is relatively new; however, research has shown that DP is also effective in supporting multiple IR tasks. This tutorial aims to lay a theoretical foundation of DP and explain how it can be applied to IR. It highlights the differences between IR tasks and DB and DM tasks, and how DP connects to IR. We hope the attendees of this tutorial will gain a good understanding of DP and the other knowledge necessary to work on the newly minted joint research field of privacy and IR.
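
As an illustration of the utility-versus-exposure trade-off described above, the following is a minimal sketch of the Laplace mechanism, a standard DP building block, applied to a counting query over a toy query log. It is not taken from the tutorial materials; the function names, the log format and the choice of epsilon are assumptions made for this example only.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return an epsilon-differentially-private count of matching records.

    Assumes each individual contributes at most one record, so adding or
    removing one individual changes the true count by at most 1
    (sensitivity 1). The noise scale is therefore 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical query log: one (user_id, query) record per user.
    query_log = [
        ("u1", "flu symptoms"),
        ("u2", "cheap flights"),
        ("u3", "flu vaccine"),
        ("u4", "weather"),
    ]
    noisy = private_count(query_log, lambda r: "flu" in r[1], epsilon=0.5)
    print(f"noisy count of flu-related queries: {noisy:.2f}")
```

A smaller epsilon gives a stronger privacy guarantee but a noisier count; a larger epsilon does the opposite.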

Influence Maximization in Online Social Networks

09:00 - 12:30, Monday, February 5, 2018
Room: Ballroom Terrace
Cigdem Aslay (ISI Foundation), Laks V.S. Lakshmanan (The University of British Columbia), Wei Lu (LinkedIn Corporation), Xiaokui Xiao (National University of Singapore)
Viral marketing, a popular concept in the business literature, has recently attracted a lot of attention in computer science as well, due to its high application potential and computational challenges. The idea of viral marketing is simple yet appealing: by targeting the most influential users in a social network (e.g., by giving them free or price-discounted samples), one can exploit the power of the network effect through word-of-mouth, thus delivering the marketing message to a large portion of the network, analogous to the spread of a virus. Influence maximization is the key algorithmic problem behind viral marketing. The problem, as originally defined by Kempe et al. [32], is as follows: given (i) a directed social network, (ii) a set of weights associated with edges, representing strengths or probabilities of influence among users, (iii) a stochastic influence propagation model that governs how a certain behavior would diffuse among users, and (iv) a cardinality constraint k, the aim is to identify a set of k nodes, called the "seed set", that can be targeted to maximize the expected number of influenced nodes. Kempe et al. studied influence maximization as a discrete optimization problem, obtaining provable approximation guarantees under several social influence propagation models. Following this seminal work, research on the dynamics of social influence propagation and influence maximization took off in several dimensions. In this tutorial we cover major algorithmic and theoretical developments and issues arising in this field. A good chunk of this research has been done in the data mining and databases communities. While related tutorials [2, 10, 23, 24, 35, 55] appeared in VLDB'11, KDD'11, KDD'12, WSDM'13, WWW'15, and KDD'15, our tutorial showcases recent advances in the field not covered by the previous tutorials.
A tutorial like the one that we propose can allow interested researchers and practitioners to gain up-to-date knowledge on the recent theoretical and algorithmic developments and seize the opportunity to contribute to the advancement of this fast-paced field.
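
To make the problem statement above concrete, here is a minimal sketch of greedy seed selection under the independent cascade model, with expected spread estimated by Monte Carlo simulation. It is an illustration under stated assumptions rather than the tutorial's own material: the graph encoding (a dict mapping each node to its out-neighbours and edge probabilities), the function names and the number of simulation runs are all chosen for this example, and the more scalable algorithms developed in later work are not shown.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade simulation; return the number of active nodes.

    `graph` maps a node to a dict {out_neighbour: influence_probability}.
    Each newly activated node gets a single chance to activate each of its
    currently inactive out-neighbours.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, runs, rng):
    # Monte Carlo estimate of the expected number of influenced nodes.
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seed_set(graph, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(chosen))
        best = max(
            candidates,
            key=lambda n: expected_spread(graph, chosen + [n], runs, rng),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy network with edge influence probabilities.
    toy_graph = {
        "a": {"b": 0.5, "c": 0.5},
        "b": {"d": 0.3},
        "e": {"f": 0.4},
    }
    print(greedy_seed_set(toy_graph, k=2))
```

Each greedy step re-estimates the spread of every remaining candidate, so the cost grows quickly with network size. That cost is one reason the scalability of influence maximization has attracted so much follow-up research.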

Network Science of Teams: Characterization, Prediction, and Optimization

13:30 - 17:00, Monday, February 5, 2018
Room: Admiralty
Liangyue Li (Arizona State University), Hanghang Tong (Arizona State University)
In defining the essence of professional teamwork, Hackman and Katz [4] stated that teams function as "purposive social systems", defined as people who are readily identifiable to each other by role and position and who work interdependently to accomplish one or more collective objectives. Teams are increasingly indispensable to achievement in any organization. This is perhaps most evident in multinational organizations, where communication technology has transformed geographically dispersed teams and networks. Business operations in large organizations now involve large, interactive, and layered networks of teams and personnel communicating across hierarchies and countries during the execution of complex and multifaceted international businesses. Despite organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the social, cognitive and information level in relation to team performance and network dynamics. Generally speaking, team performance can be viewed as the composite of the following three aspects: (1) its users, (2) the tasks that the team performs and (3) the networks that the team is embedded in or operates on. In this tutorial, we will provide a comprehensive review of the recent advances in characterizing, predicting and optimizing teams' performance in the context of composite networks (i.e., social-cognitive-information networks). Research in sociology and psychology has long been trying to characterize high-performing teams in organizations. The basics of team effectiveness were identified by J. Richard Hackman, who uncovered a groundbreaking insight: what matters most to collaboration are certain enabling conditions. Recent studies found that three of Hackman's conditions -- a compelling direction, a strong structure, and a supportive context -- continue to be particularly critical to team success [3].
We will comprehensively survey the related literature in sociology, psychology and computer science. Understanding the dynamic mechanisms that drive the success of high-performing teams can provide key insights into building the best teams and hence lifting the productivity and profitability of organizations. For this purpose, we introduce some of the recent work on developing novel predictive models to forecast the long-term performance of teams (point prediction) as well as the pathway to impact (trajectory prediction). From a practical perspective, it is important to form a good team in the context of networks for a given task. For an existing team, it is often desirable to optimize its performance by expanding the team with a new member who has certain expertise, or by finding a new candidate to replace a current underperforming team member. We will introduce recent advances in team performance optimization.

Tutorial on Metrics of User Engagement: Applications to News, Search and E-Commerce

13:30 - 17:00, Monday, February 5, 2018
Room: Marina Vista
Mounia Lalmas (Spotify), Liangjie Hong (Etsy Inc.)
User engagement plays a central role in companies operating online services, such as search engines, news portals, e-commerce sites, and social networks. A main challenge is to leverage collected knowledge about the daily online behavior of millions of users to understand what engages them in the short term and, more importantly, in the long term. The most common way that engagement is measured is through various online metrics, acting as proxy measures of user engagement. This tutorial will review these metrics, their advantages and drawbacks, and their appropriateness to various types of online services. As case studies, we will focus on three types of services: news, search and e-commerce. We will also briefly discuss how to develop better machine learning models to optimize online metrics, and how to design experiments to test these models.

Mining Knowledge Graphs From Text

13:30 - 17:00, Monday, February 5, 2018
Room: Ballroom Terrace
Jay Pujara (University of Southern California), Sameer Singh (University of California, Irvine)
Knowledge graphs have become an increasingly crucial component in machine intelligence systems, powering ubiquitous digital assistants and inspiring several large-scale academic projects across the globe. Our tutorial explains why knowledge graphs are important, how knowledge graphs are constructed, and where new research opportunities exist for improving the state of the art. In this tutorial, we cover the many sophisticated approaches that complete and correct knowledge graphs. We organize this exploration into two main classes of models. The first class includes probabilistic logical frameworks that use graphical models, random walks, or statistical rule mining to construct knowledge graphs. The second class includes latent space models such as matrix and tensor factorization and neural networks. We conclude the tutorial with a critical comparison of techniques and results. We will offer practical advice for novices to identify common empirical challenges and concrete data sets for initial experimentation. Finally, we will highlight promising areas of current and future work.