Tutorials
If a tutorial fills up due to popularity, seats will be allocated on a
first-come, first-served basis.
Neural Networks for Information Retrieval
09:00 - 17:00, Monday, February 5, 2018
Room: Salon II & III
Tom Kenter (University of Amsterdam), Alexey Borisov (Yandex), Christophe Van Gysel (University of
Amsterdam), Mostafa Dehghani (University of Amsterdam), Maarten de Rijke (University of Amsterdam),
Bhaskar Mitra (Microsoft)
Machine learning plays a role in many aspects of modern IR systems,
and deep learning is applied in all of them. The fast pace of
modern-day research has given rise to many approaches to many
IR problems. The amount of information available can be overwhelming
both for junior students and for experienced researchers
looking for new research topics and directions. The aim of this full-day
tutorial is to give a clear overview of current tried-and-trusted
neural methods in IR and how they benefit IR.
A Critical Review of Online Social Data: Biases, Methodological
Pitfalls, and Ethical Boundaries
09:00 - 12:30, Monday, February 5, 2018
Room: Admiralty
Alexandra Olteanu (IBM), Emre Kiciman (Microsoft), Carlos Castillo (Universitat Pompeu Fabra)
Online social data like user-generated content, expressed or implicit
relations among people, and behavioral traces are at the core of
many popular web applications and platforms, driving the research
agenda of researchers in both academia and industry. The promises
of social data are many, including the understanding of "what the
world thinks" about a social issue, brand, product, celebrity, or other
entity, as well as enabling better decision-making in a variety of
fields including public policy, healthcare, and economics. However,
many academics and practitioners are increasingly warning against
the naïve usage of social data. They highlight that there are biases
and inaccuracies occurring at the source of the data, but also introduced
during the data processing pipeline; there are methodological
limitations and pitfalls, as well as ethical boundaries and unexpected
outcomes that are often overlooked. Such oversights can
lead to wrong or inappropriate results that can be consequential.
This tutorial recognizes that the rigor with which these issues are
addressed by different researchers varies across a wide range, and
aims to survey and categorize common classes of data biases and
pitfalls that can occur both at the sources of social data and
along the prototypical data processing pipeline.
Athlytics: Winning in Sports with Data
09:00 - 12:30, Monday, February 5, 2018
Room: Pavillion
Konstantinos Pelechrinis (University of Pittsburgh), Evangelos Papalexakis (University of California Riverside)
Data and analytics have been part of the sports industry from as
early as the 1870s, when the first boxscore in baseball was recorded.
However, it is only recently that advanced data mining and machine
learning techniques have been utilized for facilitating the operations
of sports franchises. While part of the reason is related to
the ability to collect more fine-grained data, an equally important
factor for this turn to analytics is the huge success and competitive
advantage that early adopters of investment in analytics enjoyed
(popularized by the best-seller "Moneyball" that described the success
that Oakland Athletics had with analytics). Draft selection,
game-day decision making and player evaluation are just a few of
the applications where sports analytics play a crucial role today.
Apart from the sports clubs, other stakeholders in the industry (e.g.,
the leagues' offices, media, etc.) invest in analytics. The leagues increasingly
rely on data in order to decide on potential rule changes.
For instance, the most recent rule change in the NFL, i.e., the kickoff
touchback, was a result of thorough data analysis of concussion
instances. In this tutorial we will review the literature in data mining
and machine learning techniques for sports analytics. We will
introduce the audience to the design and methodologies behind
advanced metrics such as the adjusted plus/minus for evaluating
basketball players, spatial metrics for evaluating the ability of a
player to spread the defense in basketball, and the Player Efficiency
Rating (PER). We will also go into depth on advanced data mining
methods, and in particular tensor mining, that can analyze heterogeneous
data similar to those available in today's sports world.
Differential Privacy for Information Retrieval
09:00 - 12:30, Monday, February 5, 2018
Room: Plaza
Grace Hui Yang (Georgetown University), Sicong Zhang (Georgetown University)
The concern for privacy is real for any research that uses user data.
Information Retrieval (IR) is not an exception. Many IR algorithms
and applications require the use of users' personal information,
contextual information and other sensitive and private information.
The extensive use of personalization in IR has become a double-edged
sword. Sometimes, the concern becomes so overwhelming
that IR research has to stop to avoid privacy leaks. The good news
is that increasing attention has recently been paid to the
joint field of privacy and IR -- privacy-preserving IR. As part of
the effort, this tutorial offers an introduction to differential privacy
(DP), one of the most advanced techniques in privacy research,
and provides the necessary theoretical knowledge for applying
privacy techniques in IR. Differential privacy is a technique that
provides strong privacy guarantees for data protection. Theoretically,
it aims to maximize the data utility in statistical datasets
while minimizing the risk of exposing individual data entries to any
adversary. Differential privacy has been successfully applied to a
wide range of applications in databases (DB) and data mining (DM).
Research in privacy-preserving IR is relatively new; however,
research has shown that DP is also effective in supporting multiple
IR tasks. This tutorial aims to lay a theoretical foundation of DP and
explain how it can be applied to IR. It highlights the differences between
IR tasks and DB and DM tasks and how DP connects to IR. We hope
the attendees of this tutorial will have a good understanding of DP
and other necessary knowledge to work on the newly minted joint
research field of privacy and IR.
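As a rough illustration of the kind of guarantee described above, the sketch below applies the Laplace mechanism, one standard way of achieving epsilon-differential privacy, to a counting query over a toy query log. The log contents and the names `laplace_noise` and `private_count` are hypothetical and are not taken from the tutorial.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical query log: (user id, query text) pairs.
    query_log = [
        (1, "flu symptoms"),
        (2, "cheap flights"),
        (3, "flu symptoms"),
        (4, "weather tomorrow"),
    ]
    rng = random.Random(2018)
    noisy = private_count(query_log, lambda r: r[1] == "flu symptoms", 0.5, rng)
    print(f"noisy count of 'flu symptoms' queries: {noisy:.2f}")
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of a less accurate count, which is the utility and risk trade-off the abstract refers to.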
Influence Maximization in Online Social Networks
09:00 - 12:30, Monday, February 5, 2018
Room: Ballroom Terrace
Cigdem Aslay (ISI Foundation), Laks V.S. Lakshmanan (The University of British Columbia), Wei Lu (LinkedIn Corporation), Xiaokui Xiao (National University of Singapore)
Viral marketing, a popular concept in the business literature, has
recently also attracted a lot of attention in computer science, due
to its high application potential and computational challenges. The
idea of viral marketing is simple yet appealing: by targeting the
most influential users in a social network (e.g., by giving them
free or price-discounted samples), one can exploit the power of
the network effect through word-of-mouth, thus delivering the
marketing message to a large portion of the network, analogous to
the spread of a virus.
Influence maximization is the key algorithmic problem behind
viral marketing. The problem, as originally defined by Kempe et
al. [32], is as follows: given (i) a directed social network, (ii) a set of
weights associated with edges, representing strengths or probabilities
of influence among users, (iii) a stochastic influence propagation
model that governs how a certain behavior would diffuse among
users, and (iv) a cardinality constraint k, the aim is to identify a set of
k nodes, called the "seed set", that can be targeted to maximize the
expected number of influenced nodes. Kempe et al. studied influence
maximization as a discrete optimization problem, obtaining
provable approximation guarantees under several social influence
propagation models. Following this seminal work, research on the
dynamics of social influence propagation and influence maximization
took off in several dimensions.
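The problem statement above can be made concrete with a small sketch. It assumes the independent cascade model as the propagation model (one common choice; the abstract does not fix a model), estimates expected spread by Monte Carlo simulation, and picks seeds by greedy hill-climbing. The graph encoding and all function names are illustrative assumptions.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    `graph` maps a node to a list of (neighbor, probability) pairs; each
    newly activated node gets a single chance to activate each neighbor.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of influenced nodes."""
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs


def greedy_seed_set(graph, nodes, k, runs=200, seed=0):
    """Greedily add the node with the largest estimated spread, k times."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for candidate in nodes:
            if candidate in chosen:
                continue
            spread = expected_spread(graph, chosen + [candidate], runs, rng)
            if spread > best_spread:
                best_node, best_spread = candidate, spread
        chosen.append(best_node)
    return chosen


if __name__ == "__main__":
    # Toy directed network with influence probabilities on the edges.
    toy_graph = {
        "a": [("b", 0.5), ("c", 0.5), ("d", 0.5)],
        "e": [("f", 0.9)],
    }
    print(greedy_seed_set(toy_graph, ["a", "b", "c", "d", "e", "f"], k=2))
```

Each greedy step re-estimates the spread of every remaining candidate, so the cost grows quickly with network size; this sketch is only meant to show the structure of the problem, not a scalable method.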
In this tutorial we cover major algorithmic and theoretical developments
and issues arising in this field. A good chunk of this
research has been done in the data mining and databases communities.
While related tutorials [2, 10, 23, 24, 35, 55] appeared in VLDB'11, KDD'11, KDD'12, WSDM'13, WWW'15, and KDD'15,
our tutorial showcases recent advances in the field not covered by
the previous tutorials. A tutorial like the one that we propose can
allow interested researchers and practitioners to gain up-to-date
knowledge on the recent theoretical and algorithmic developments
and seize the opportunity to contribute to the advancement of this
fast-paced field.
Network Science of Teams:
Characterization, Prediction, and Optimization
13:30 - 17:00, Monday, February 5, 2018
Room: Admiralty
Liangyue Li (Arizona State University), Hanghang Tong (Arizona State University)
In defining the essence of professional teamwork, Hackman and
Katz [4] stated that teams function as "purposive social systems",
defined as people who are readily identifiable to each other by
role and position and who work interdependently to accomplish
one or more collective objectives. Teams are increasingly indispensable
to achievement in any organization. This is perhaps most
evident in multinational organizations where communication technology
has transformed geographically dispersed teams and
networks. Business operations in large organizations now involve
large, interactive, and layered networks of teams and personnel
communicating across hierarchies and countries during the
execution of complex and multifaceted international businesses.
Despite organizations' substantial dependency on teams, fundamental
knowledge about the conduct of team-enabled operations
is lacking, especially at the social, cognitive and information level in
relation to team performance and network dynamics.
Generally speaking, team performance can be viewed as
the composite of the following three aspects: (1) its users,
(2) the tasks that the team performs and (3) the networks that the team
is embedded in or operates on. In this tutorial, we will provide
a comprehensive review of the recent advances in characterizing,
predicting and optimizing teams' performance in the context of
composite networks (i.e., social-cognitive-information networks).
Research in sociology and psychology has long been trying to
characterize the high-performing teams in organizations. The basics
of team effectiveness were identified by J. Richard Hackman,
who uncovered a groundbreaking insight: what matters most to
collaboration are certain enabling conditions. Recent studies found
that three of Hackman's conditions -- a compelling direction, a
strong structure, and a supportive context -- continue to be particularly
critical to team success [3]. We will comprehensively survey the
related literature in sociology, psychology and computer science.
Understanding the dynamic mechanisms that drive the success
of high-performing teams can provide key insights into building
the best teams and hence lifting the productivity and profitability
of the organizations. For this purpose, we introduce some of the
recent work on developing novel predictive models to forecast the
long-term performance of teams (point prediction) as well as the
pathway to impact (trajectory prediction).
From the practical perspective, it is important to form a good
team in the context of networks for a given task. For an existing
team, it is often desirable to optimize its performance through
expanding the team by bringing in a new team member with certain
expertise, or finding a new candidate to replace a current underperforming
team member. We will introduce recent advances in
team performance optimization.
Tutorial on Metrics of User Engagement: Applications to News,
Search and E-Commerce
13:30 - 17:00, Monday, February 5, 2018
Room: Marina Vista
Mounia Lalmas (Spotify), Liangjie Hong (Etsy Inc.)
User engagement plays a central role in companies operating online
services, such as search engines, news portals, e-commerce sites,
and social networks. A main challenge is to leverage collected
knowledge about the daily online behavior of millions of users to
understand what engages them in the short term and, more importantly,
in the long term. The most common way that engagement is measured
is through various online metrics, acting as proxy measures of
user engagement. This tutorial will review these metrics, their
advantages and drawbacks, and their appropriateness to various
types of online services. As case studies, we will focus on three
types of services: news, search and e-commerce. We will also briefly
discuss how to develop better machine learning models to optimize
online metrics, and design experiments to test these models.
Mining Knowledge Graphs From Text
13:30 - 17:00, Monday, February 5, 2018
Room: Ballroom Terrace
Jay Pujara (University of Southern California), Sameer Singh (University of California, Irvine)
Knowledge graphs have become an increasingly crucial component
in machine intelligence systems, powering ubiquitous
digital assistants and inspiring several large-scale academic
projects across the globe. Our tutorial explains why knowledge
graphs are important, how knowledge graphs are constructed,
and where new research opportunities exist for
improving the state-of-the-art. In this tutorial, we cover the
many sophisticated approaches that complete and correct
knowledge graphs. We organize this exploration into two
main classes of models. The first includes probabilistic logical
frameworks that use graphical models, random walks, or
statistical rule mining to construct knowledge graphs. The
second class of models includes latent space models such as
matrix and tensor factorization and neural networks. We
conclude the tutorial with a critical comparison of techniques
and results. We will offer practical advice for novices to identify
common empirical challenges and concrete data sets for
initial experimentation. Finally, we will highlight promising
areas of current and future work.