WSDM '19- Proceedings of the Twelfth ACM International Conference on Web Search and Data MiningFull Citation in the ACM Digital Library
SESSION: Keynote & Invited Talks
Online services are increasingly intelligent. They evolve intelligently through A/B testing and experimentation, employ artificial intelligence in their core functionality using machine learning, and seamlessly engage human intelligence by connecting people in a low-friction manner. All of this has resulted in incredibly engaging experiences -- but not particularly productive ones. As more and more of people's most important tasks move online we need to think carefully about the underlying influence online services have on people's ability to attend to what matters to them. There is an opportunity to use intelligence for this to do more than just not distract people and actually start helping people attend to what matters even better than they would otherwise. This presentation explores the ways we might make it as compelling and easy to start an important task as it is to check social media.
Technologists have a responsibility to develop Data Science and AI methods that satisfy fairness, accountability, transparency, and ethical requirements. This statement has repeatedly been made in recent years and in many quarters, including major newspapers and magazines. The technical community has responded with work in this direction. However, almost all of this work has been directed towards the decision-making algorithm that performs a task such as scoring or classification. This presentation examines the Data Science pipeline, and points out the importance of addressing responsibility in all stages of this pipeline, and not just the decision-making stage. The presentation then outlines some recent research results that have been obtained in that regard.
The computing industry has been on an inexorable march toward simplifying human-computer interaction, and earlier this decade Amazon bet big on combining voice technology and artificial intelligence. In 2014, with the introduction of Echo and Alexa, Amazon created an entirely new technology category with an AI-first strategy and vision. Since then, Alexa has captured the imagination of customers across the globe, and the company has accelerated the pace of AI research and innovation in support of its promise to improve Alexa every day. In this presentation Rohit Prasad, Vice President and Head Scientist of Amazon Alexa, shares his insights into how recent scientific innovations are advancing Alexa.
The goals of learning from user data and preserving user privacy are often considered to be in conflict. This presentation will demonstrate that there are contexts when provable privacy guarantees can be an enabler for better web search and data mining (WSDM), and can empower researchers hoping to change the world by mining sensitive user data. The presentation starts by motivating the rigorous statistical data privacy definition that is particularly suitable for today's world of big data, differential privacy. It will then demonstrate how to achieve differential privacy for WSDM tasks when the data collector is trusted by the users. Using Chrome's deployment of RAPPOR as a case study, it will be shown that achieving differential privacy while preserving utility is feasible even when the data collector is not trusted. The presentation concludes with open problems and challenges for the WSDM community.
Interactive systems such as search engines or recommender systems are increasingly moving away from single-turn exchanges with users. Instead, series of exchanges between the user and the system are becoming mainstream, especially when users have complex needs or when the system struggles to understand the user's intent. Standard machine learning has helped us a lot in the single-turn paradigm, where we use it to predict: intent, relevance, user satisfaction, etc. When we think of search or recommendation as a series of exchanges, we need to turn to bandit algorithms to determine which action the system should take next, or to reinforcement learning to determine not just the next action but also to plan future actions and estimate their potential pay-off. The use of reinforcement learning for search and recommendations comes with a number of challenges, because of the very large action spaces, the large number of potential contexts, and noisy feedback signals characteristic for this domain. This presentation will survey some recent success stories of reinforcement learning for search, recommendation, and conversations; and will identify promising future research directions for reinforcement learning for search and recommendation.
SESSION: Session 1: Search and Ranking
Dictionary-based compression schemes provide fast decoding operation, typically at the expense of reduced compression effectiveness compared to statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off balance between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.
Reducing excessive costs in feature acquisition and model evaluation has been a long-standing challenge in learning-to-rank systems. A cascaded ranking architecture turns ranking into a pipeline of multiple stages, and has been shown to be a powerful approach to balancing efficiency and effectiveness trade-offs in large-scale search systems. However, learning a cascade model is often complex, and usually performed stagewise independently across the entire ranking pipeline. In this work we show that learning a cascade ranking model in this manner is often suboptimal in terms of both effectiveness and efficiency. We present a new general framework for learning an end-to-end cascade of rankers using backpropagation. We show that stagewise objectives can be chained together and optimized jointly to achieve significantly better trade-offs globally. This novel approach is generalizable to not only differentiable models but also state-of-the-art tree-based algorithms such as LambdaMART and cost-efficient gradient boosted trees, and it opens up new opportunities for exploring additional efficiency-effectiveness trade-offs in large-scale search systems.
Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to the task of learning the optimal ranking function that achieves the minimum Wasserstein distance. Specifically, given the query level predictions and the ground truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance is, the closer the prediction gets to the ground-truth. To better capture the inherent relevance-based order information among documents with different relevance labels and lower the variance of predictions for documents with the same relevance label, ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that: compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE and ApxNDCG), WassRank can achieve substantially improved performance in terms of nDCG and ERR across different rank positions. Specifically, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE and ApxNDCG in terms of [email protected] are 15%, 5%, 7%, 5%, respectively.
MSA: Jointly Detecting Drug Name and Adverse Drug Reaction Mentioning Tweets with Multi-Head Self-Attention
Twitter is a popular social media platform for information sharing and dissemination. Many Twitter users post tweets to share their experiences about drugs and adverse drug reactions. Automatic detection of tweets mentioning drug names and adverse drug reactions at a large scale has important applications such as pharmacovigilance. However, detecting drug name and adverse drug reaction mentioning tweets is very challenging, because tweets are usually very noisy and informal, and there are massive misspellings and user-created abbreviations for these mentions. In addition, these mentions are usually context dependent. In this paper, we propose a neural approach with hierarchical tweet representation and multi-head self-attention mechanism to jointly detect tweets mentioning drug names and adverse drug reactions. In order to alleviate the influence of massive misspellings and user-created abbreviations in tweets, we propose to use a hierarchical tweet representation model to first learn word representations from characters and then learn tweet representations from words. In addition, we propose to use multi-head self-attention mechanism to capture the interactions between words to fully model the contexts of tweets. Besides, we use additive attention mechanism to select the informative words to learn more informative tweet representations. Experimental results validate the effectiveness of our approach.
SESSION: Session 2: Knowledge Graphs and Analytics
Extracting entities and their relations from text is an important task for understanding massive text corpora. Open information extraction (IE) systems mine relation tuples (i.e., entity arguments and a predicate string to describe their relation) from sentences. These relation tuples are not confined to a predefined schema for the relations of interests. However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions. In this paper, we propose a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework. Leveraging facts from external knowledge bases as supervision, the new system can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics. Our system operates by solving a joint optimization problem to unify (1) segmenting entity/relation phrases in individual sentences based on local context; and (2) measuring the quality of tuples extracted from individual sentences with a translating-based objective. Learning the two subtasks jointly helps correct errors produced in each subtask so that they can mutually enhance each other. Experiments on two real-world corpora from different domains demonstrate the effectiveness, generality, and robustness of ReMine when compared to state-of-the-art open IE systems.
Recent studies show that by combining network topology and node attributes, we can better understand community structures in complex networks. However, existing algorithms do not explore "contextually" similar node attribute values, and therefore may miss communities defined with abstract concepts. We propose a community detection and characterization algorithm that incorporates the contextual information of node attributes described by multiple domain-specific hierarchical concept graphs. The core problem is to find the context that can best summarize the nodes in communities, while also discovering communities aligned with the context summarizing communities. We formulate the two intertwined problems, optimal community-context computation, and community discovery, with a coordinate-ascent based algorithm that iteratively updates the nodes' community label assignment with a community-context and computes the best context summarizing nodes of each community. Our unique contributions include (1) a composite metric on Informativeness and Purity criteria in searching for the best context summarizing nodes of a community; (2) a node similarity measure that incorporates the context-level similarity on multiple node attributes; and (3) an integrated algorithm that drives community structure discovery by appropriately weighing edges. Experimental results on public datasets show nearly 20 percent improvement on F-measure and Jaccard for discovering underlying community structure over the current state-of-the-art of community detection methods. Community structure characterization was also accurate to find appropriate community types for four datasets.
Representation learning models map data instances into a low-dimensional vector space, thus facilitating the deployment of subsequent models such as classification and clustering models, or the implementation of downstream applications such as recommendation and anomaly detection. However, the outcome of representation learning is difficult to be directly understood by users, since each dimension of the latent space may not have any specific meaning. Understanding representation learning could be beneficial to many applications. For example, in recommender systems, knowing why a user instance is mapped to a certain position in the latent space may unveil the user's interests and profile. In this paper, we propose an interpretation framework to understand and describe how representation vectors distribute in the latent space. Specifically, we design a coding scheme to transform representation instances into spatial codes to indicate their locations in the latent space. Following that, a multimodal autoencoder is built for generating the description of a representation instance given its spatial codes. The coding scheme enables indication of position with different granularity. The incorporation of autoencoder makes the framework capable of dealing with different types of data. Several metrics are designed to evaluate interpretation results. Experiments under various application scenarios and different representation learning models are conducted to demonstrate the flexibility and effectiveness of the proposed framework.
Identifying and recommending potential new customers for local businesses are crucial to the survival and success of local businesses. A key component to identifying the right customers is to understand the decision-making process of choosing a business over the others. However, modeling this process is an extremely challenging task as a decision is influenced by multiple factors. These factors include but are not limited to an individual's taste or preference, the location accessibility of a business, and the reputation of a business from social media. Most of the recommender systems lack the power to integrate multiple factors together and are hardly extensible to accommodate new incoming factors. In this paper, we introduce a unified framework, CORALS, which considers the personal preferences of different customers, the geographical influence, and the reputation of local businesses in the customer recommendation task. To evaluate the proposed model, we conduct a series of experiments to extensively compare with 12 state-of-the-art methods using two real-world datasets. The results demonstrate that CORALS outperforms all these baselines by a significant margin in most scenarios. In addition to identifying potential new customers, we also break down the analysis for different types of businesses to evaluate the impact of various factors that may affect customers' decisions. This information, in return, provides a great resource for local businesses to adjust their advertising strategies and business services to attract more prospective customers.
A supervised method relies on simple, lightweight features in order to distinguish Wikipedia articles that are classes (Shield volcano) from other articles (Kilauea). The features are lexical or semantic in nature. Experimental results in multiple languages over multiple evaluation sets demonstrate the superiority of the proposed method over previous work.
Fact-checking is a crucial task for accurately populating, updating and curating knowledge graphs. Manually validating candidate facts is time-consuming. Prior work on automating this task focuses on estimating truthfulness using numerical scores which are not human-interpretable. Others extract explicit mentions of the candidate fact in the text as an evidence for the candidate fact, which can be hard to directly spot. In our work, we introduce ExFaKT, a framework focused on generating human-comprehensible explanations for candidate facts. ExFaKT uses background knowledge encoded in the form of Horn clauses to rewrite the fact in question into a set of other easier-to-spot facts. The final output of our framework is a set of semantic traces for the candidate fact from both text and knowledge graphs. The experiments demonstrate that our rewritings significantly increase the recall of fact-spotting while preserving high precision. Moreover, we show that the explanations effectively help humans to perform fact-checking and can also be exploited for automating this task.
Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions -- bi-directional effects between entities and relations --- help select related information when predicting a new triple, but haven't been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding which explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective -- giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.
Question answering over knowledge graph (QA-KG) aims to use facts in the knowledge graph (KG) to answer natural language questions. It helps end users more efficiently and more easily access the substantial and valuable knowledge in the KG, without knowing its data structures. QA-KG is a nontrivial problem since capturing the semantic meaning of natural language is difficult for a machine. Meanwhile, many knowledge graph embedding methods have been proposed. The key idea is to represent each predicate/entity as a low-dimensional vector, such that the relation information in the KG could be preserved. The learned vectors could benefit various applications such as KG completion and recommender systems. In this paper, we explore to use them to handle the QA-KG problem. However, this remains a challenging task since a predicate could be expressed in different ways in natural language questions. Also, the ambiguity of entity names and partial names makes the number of possible answers large. To bridge the gap, we propose an effective Knowledge Embedding based Question Answering (KEQA) framework. We focus on answering the most common types of questions, i.e., simple questions, in which each question could be answered by the machine straightforwardly if its single head entity and single predicate are correctly identified. To answer a simple question, instead of inferring its head entity and predicate directly, KEQA targets at jointly recovering the question's head entity, predicate, and tail entity representations in the KG embedding spaces. Based on a carefully-designed joint distance metric, the three learned vectors' closest fact in the KG is returned as the answer. Experiments on a widely-adopted benchmark demonstrate that the proposed KEQA outperforms the state-of-the-art QA-KG methods.
Relevance search over a knowledge graph (KG) has gained much research attention. Given a query entity in a KG, the problem is to find its most relevant entities. However, the relevance function is hidden and dynamic. Different users for different queries may consider relevance from different angles of semantics. The ambiguity in a query is more noticeable in the presence of thousands of types of entities and relations in a schema-rich KG, which has challenged the effectiveness and scalability of existing methods. To meet the challenge, our approach called RelSUE requests a user to provide a small number of answer entities as examples, and then automatically learns the most likely relevance function from these examples. Specifically, we assume the intent of a query can be characterized by a set of meta-paths at the schema level. RelSUE searches a KG for diversified significant meta-paths that best characterize the relevance of the user-provided examples to the query entity. It reduces the large search space of a schema-rich KG using distance and degree-based heuristics, and performs reasoning to deduplicate meta-paths that represent equivalent query-specific semantics. Finally, a linear model is learned to predict meta-path based relevance. Extensive experiments demonstrate that RelSUE outperforms several state-of-the-art methods.
Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users' complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model achieves more flexibility for applications and superior prediction accuracy than a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.
Exploiting low-rank structure of the user-item rating matrix has been the crux of many recommendation engines. However, existing recommendation engines force raters with heterogeneous behavior profiles to map their intrinsic rating scales to a common rating scale (e.g. 1-5). This non-linear transformation of the rating scale shatters the low-rank structure of the rating matrix, therefore resulting in a poor fit and consequentially, poor recommendations. In this paper, we propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. Essentially, for recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low dimensional structure. The rating scale transformations can be generated for each user, for a cluster of users, or for all the users at once, forming the basis of three simple and efficient algorithms proposed in this paper, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Experimental results on two synthetic and seven real-world datasets show that CMTRF outperforms other state-of-the-art baselines.
Topic sparsity refers to the observation that individual documents usually focus on several salient topics instead of covering a wide variety of topics, and a real topic adopts a narrow range of terms instead of a wide coverage of the vocabulary. Understanding this topic sparsity is especially important for analyzing user-generated web content and social media, which are featured in the form of extremely short posts and discussions. As topic sparsity of individual documents in online social media increases, so does the difficulty of analyzing the online text sources using traditional methods. In this paper, we propose two novel neural models by providing sparse posterior distributions over topics based on the Gaussian sparsemax construction, enabling efficient training by stochastic backpropagation. We construct an inference network conditioned on the input data and infer the variational distribution with the relaxed Wasserstein (RW) divergence. Unlike existing works based on Gaussian softmax construction and Kullback-Leibler (KL) divergence, our approaches can identify latent topic sparsity with training stability, predictive performance, and topic coherence. Experiments on different genres of large text corpora have demonstrated the effectiveness of our models as they outperform both probabilistic and neural methods.
SESSION: Session 3: Recommendation and Temporal Trends
Random walks can provide a powerful tool for harvesting the rich network of interactions captured within item-based models for top-n recommendation. They can exploit indirect relations between the items, mitigate the effects of sparsity, ensure wider itemspace coverage, as well as increase the diversity of recommendation lists. Their potential however, is hindered by the tendency of the walks to rapidly concentrate towards the central nodes of the graph, thereby significantly restricting the range of K-step distributions that can be exploited for personalized recommendations. In this work we introduce RecWalk; a novel random walk-based method that leverages the spectral properties of nearly uncoupled Markov chains to provably lift this limitation and prolong the influence of users' past preferences on the successive steps of the walk--allowing the walker to explore the underlying network more fruitfully. A comprehensive set of experiments on real-world datasets verify the theoretically predicted properties of the proposed approach and indicate that they are directly linked to significant improvements in top-n recommendation accuracy. They also highlight RecWalk's potential in providing a framework for boosting the performance of item-based models. RecWalk achieves state-of-the-art top-n recommendation quality outperforming several competing approaches, including recently proposed methods that rely on deep neural networks.
Newsworthy events are broadcast through multiple mediums and prompt the crowds to produce comments on social media. In this paper, we propose to leverage on this behavioral dynamics to estimate the most relevant time periods for an event (i.e., query). Recent advances have shown how to improve the estimation of the temporal relevance of such topics. In this approach, we build on two major novelties. First, we mine temporal evidences from hundreds of external sources into topic-based external collections to improve the robustness of the detection of relevant time periods. Second, we propose a formal retrieval model that generalizes the use of the temporal dimension across different aspects of the retrieval process. In particular, we show that temporal evidence of external collections can be used to (i) infer a topic's temporal relevance, (ii) select the query expansion terms, and (iii) re-rank the final results for improved precision. Experiments with TREC Microblog collections show that the proposed time-aware retrieval model makes an effective and extensive use of the temporal dimension to improve search results over the most recent temporal models. Interestingly, we observe a strong correlation between precision and the temporal distribution of retrieved and relevant documents.
Word embeddings are a powerful approach for analyzing language and have been widely popular in numerous tasks in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is typically sequentially processed and parameters are synchronously updated. Distributed architectures for asynchronous training that have been proposed either focus on scaling vocabulary sizes and dimensionality or suffer from expensive synchronization latencies. In this paper, we propose a scalable approach to train word embeddings by partitioning the input space instead in order to scale to massive text corpora while not sacrificing the performance of the embeddings. Our training procedure does not involve any parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes and we get comparable and sometimes even up to 45% performance improvement in a variety of NLP benchmarks using models trained by our distributed procedure which requires $1/10$ of the time taken by the baseline approach. Finally we also show that we are robust to missing words in sub-models and are able to effectively reconstruct word representations.
Social connections are known to be helpful for modeling users' potential preferences and improving the performance of recommender systems. However, in social-aware recommendations, there are two issues which influence the inference of users' preferences, and haven't been well-studied in most existing methods: First, the preferences of a user may only partially match that of his friends in certain aspects, especially when considering a user with diverse interests. Second, for an individual, the influence strength of his friends might be different, as not all friends are equally helpful for modeling his preferences in the system. To address the above issues, in this paper, we propose a novel Social Attentional Memory Network (SAMN) for social-aware recommendation. Specifically, we first design an attention-based memory module to learn user-friend relation vectors, which can capture the varying aspect attentions that a user share with his different friends. Then we build a friend-level attention component to adaptively select informative friends for user modeling. The two components are fused together to mutually enhance each other and lead to a finer extended model. Experimental results on three publicly available datasets show that the proposed SAMN model consistently and significantly outperforms the state-of-the-art recommendation methods. Furthermore, qualitative studies have been made to explore what the proposed attention-based memory module and friend-level attention have learnt, which provide insights into the model's learning process.
Long Short-Term Memory (LSTM) is one of the most powerful sequence models for user browsing history \citetan2016improved,korpusik2016recurrent or natural language text \citemikolov2010recurrent.Despite the strong performance, it has not gained popularity for user-facing applications, mainly owing to a large number of parameters and lack of interpretability. Recently \citetzaheer2017latent introduced latent LSTM Allocation (LLA) to address these problems by incorporating topic models with LSTM, where the topic model maps observed words in each sequence to topics that evolve using an LSTM model. In our experiments, we found the resulting model, although powerful and interpretable, to show shortcomings when applied to sequence data that exhibit multi-modes of behaviors with abrupt dynamic changes. To address this problem we introduce thLLA: a threading LLA model. thLLA has the ability to break each sequence into a set of segments and then model the dynamic in each segment using an LSTM mixture. In that way, thLLA can model abrupt changes in sequence dynamics and provides a better fit for sequence data while still being interpretable and requiring fewer parameters. In addition, thLLA uncovers hidden themes in the data via its dynamic mixture components. However, such generalization and interpretability come at a cost of complex dependence structure, for which inference would be extremely non-trivial. To remedy this, we present an efficient sampler based on particle MCMC method for inference that can draw from the joint posterior directly. Experimental results confirm the superiority of thLLA and the stability of the new inference algorithm on a variety of domains.
SESSION: Session 4: FATE & Privacy
Biased language commonly occurs around topics which are of controversial nature, thus, stirring disagreement between the different involved parties of a discussion. This is due to the fact that for language and its use, specifically, the understanding and use of phrases, the stances are cohesive within the particular groups. However, such cohesiveness does not hold across groups. In collaborative environments or environments where impartial language is desired (e.g. Wikipedia, news media), statements and the language therein should represent equally the involved parties and be neutrally phrased. Biased language is introduced through the presence of inflammatory words or phrases, or statements that may be incorrect or one-sided, thus violating such consensus. In this work, we focus on the specific case of phrasing bias, which may be introduced through specific inflammatory words or phrases in a statement. For this purpose, we propose an approach that relies on a recurrent neural networks in order to capture the inter-dependencies between words in a phrase that introduced bias. We perform a thorough experimental evaluation, where we show the advantages of a neural based approach over competitors that rely on word lexicons and other hand-crafted features in detecting biased language. We are able to distinguish biased statements with a precision of P=0.917, thus significantly outperforming baseline models with an improvement of over 30%. Finally, we release the largest corpus of statements annotated for biased language.
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In this work, we propose a preventive approach for privacy-preserving sharing of genomic data in decentralized networks for Genome-wide association studies (GWASs), which have been widely used in discovering the association between genotypes and phenotypes. The key components of this work are: a decentralized secure network, with a privacy- preserving sharing protocol, and a gene fragmentation framework that is trainable in an end-to-end manner. Our experiments on real datasets show the effectiveness of our privacy-preserving approaches as well as significant improvements in efficiency when compared with recent, related algorithms.
Protecting User Privacy: An Approach for Untraceable Web Browsing History and Unambiguous User Profiles
The overturning of the Internet Privacy Rules by the Federal Communications Commissions (FCC) in late March 2017 allows Internet Service Providers (ISPs) to collect, share and sell their customers' Web browsing data without their consent. With third-party trackers embedded on Web pages, this new rule has put user privacy under more risk. The need arises for users on their own to protect their Web browsing history from any potential adversaries. Although some available solutions such as Tor, VPN, and HTTPS can help users conceal their online activities, their use can also significantly hamper personalized online services, i.e., degraded utility. In this paper, we design an effective Web browsing history anonymization scheme, PBooster, aiming to protect users' privacy while retaining the utility of their Web browsing history. The proposed model pollutes users' Web browsing history by automatically inferring how many and what links should be added to the history while addressing the utility-privacy trade-off challenge. We conduct experiments to validate the quality of the manipulated Web browsing history and examine the robustness of the proposed approach for user privacy protection.
It has been established that, ratings are missing not at random in recommender systems. However, little research has been done to reveal how the ratings are missing. In this paper we present one possible explanation of the missing not at random phenomenon. We verify that, using a variety of different real-life datasets, there is a spiral process for a silent minority in recommender systems where (1) people whose opinions fall into the minority are less likely to give ratings than majority opinion holders; (2) as the majority opinion becomes more dominant, the rating possibility of a majority opinion holder is intensifying but the rating possibility of a minority opinion holder is shrinking; (3) only hardcore users remain to rate for minority opinions when the spiral achieves its steady state. Our empirical findings are beneficial for future recommendation models. To demonstrate the impact of our empirical findings, we present a probabilistic model that mimics the generation process of spiral of silence. We experimentally show that, the presented model offers more accurate recommendations, compared with state-of-the-art recommendation models.
Fighting Fire with Fire: Using Antidote Data to Improve Polarization and Fairness of Recommender Systems
The increasing role of recommender systems in many aspects of society makes it essential to consider how such systems may impact social good. Various modifications to recommendation algorithms have been proposed to improve their performance for specific socially relevant measures. However, previous proposals are often not easily adapted to different measures, and they generally require the ability to modify either existing system inputs, the system's algorithm, or the system's outputs. As an alternative, in this paper we introduce the idea of improving the social desirability of recommender system outputs by adding more data to the input, an approach we view as as providing 'antidote' data to the system. We formalize the antidote data problem, and develop optimization-based solutions. We take as our model system the matrix factorization approach to recommendation, and we propose a set of measures to capture the polarization or fairness of recommendations. We then show how to generate antidote data for each measure, pointing out a number of computational efficiencies, and discuss the impact on overall system accuracy. Our experiments show that a modest budget for antidote data can lead to significant improvements in the polarization or fairness of recommendations.
Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.
Online Learning to Rank is a powerful paradigm that allows to train ranking models using only online feedback from its users.In this work, we consider Federated Online Learning to Rank setup (FOLtR) where on-mobile ranking models are trained in a way that respects the users' privacy. We require that the user data, such as queries, results, and their feature representations are never communicated for the purpose of the ranker's training. We believe this setup is interesting, as it combines unique requirements for the learning algorithm: (a) preserving the user privacy, (b) low communication and computation costs, (c) learning from noisy bandit feedback, and (d) learning with non-continuous ranking quality measures. We propose a learning algorithm FOLtR-ES that satisfies these requirements. A part of FOLtR-ES is a privatization procedure that allows it to provide ε-local differential privacy guarantees, i.e. protecting the clients from an adversary who has access to the communicated messages. This procedure can be applied to any absolute online metric that takes finitely many values or can be discretized to a finite domain. Our experimental study is based on a widely used click simulation approach and publicly available learning to rank datasets MQ2007 and MQ2008. We evaluate FOLtR-ES against offline baselines that are trained using relevance labels, linear regression model and RankingSVM. From our experiments, we observe that FOLtR-ES can optimize a ranking model to perform similarly to the baselines in terms of the optimized online metric, Max Reciprocal Rank.
SESSION: Session 5: Understanding Conversation, Discussion, and Opinions
In an increasingly polarized world, demagogues who reduce complexity down to simple arguments based on emotion are gaining in popularity. Are opinions and online discussions falling into demagoguery? In this work, we aim to provide computational tools to investigate this question and, by doing so, explore the nature and complexity of online discussions and their space of opinions, uncovering where each participant lies. More specifically, we present a modeling framework to construct latent representations of opinions in online discussions which are consistent with human judgments, as measured by online voting. If two opinions are close in the resulting latent space of opinions, it is because humans think they are similar. Our framework is theoretically grounded and establishes a surprising connection between opinion and voting models and the sign-rank of matrices. Moreover, it also provides a set of practical algorithms to both estimate the dimensionality of the latent space of opinions and infer where opinions expressed by the participants of an online discussion lie in this space. Experiments on a large dataset from Yahoo! News, Yahoo! Finance, Yahoo! Sports, and the Newsroom app show that many discussions are multisided, reveal a positive correlation between the complexity of a discussion, its linguistic diversity and its level of controversy, and show that our framework may be able to circumvent language nuances such as sarcasm or humor by relying on human judgments instead of textual analysis.
We consider context-response matching with multiple types of representations for multi-turn response selection in retrieval-based chatbots. The representations encode semantics of contexts and responses on words, n-grams, and sub-sequences of utterances, and capture both short-term and long-term dependencies among words. With such a number of representations in hand, we study how to fuse them in a deep neural architecture for matching and how each of them contributes to matching. To this end, we propose a multi-representation fusion network where the representations can be fused into matching at an early stage, at an intermediate stage, or at the last stage. We empirically compare different representations and fusing strategies on two benchmark data sets. Evaluation results indicate that late fusion is always better than early fusion, and by fusing the representations at the last stage, our model significantly outperforms the existing methods, and achieves new state-of-the-art performance on both data sets. Through a thorough ablation study, we demonstrate the effect of each representation to matching, which sheds light on how to select them in practical systems.
Recent years have witnessed the flourishing of podcasts, a unique type of audio medium. Prior work on podcast content modeling focused on analyzing Automatic Speech Recognition outputs, which ignored vocal, musical, and conversational properties (e.g., energy, humor, and creativity) that uniquely characterize this medium. In this paper, we present an Adversarial Learning-based Podcast Representation (ALPR) that captures non-textual aspects of podcasts. Through extensive experiments on a large-scale podcast dataset (88,728 episodes from 18,433 channels), we show that (1) ALPR significantly outperforms the state-of-the-art features developed for music and speech in predicting theseriousness andenergy of podcasts, and (2) incorporating ALPR significantly improves the performance of topic-based podcast-popularity prediction. Our experiments also reveal factors that correlate with podcast popularity.
We tackle Attitude Detection, which we define as the task of extracting the replier's attitude, i.e., a target-polarity pair, from a given one-round conversation. While previous studies considered Target Extraction and Polarity Classification separately, we regard them as subtasks of Attitude Detection. Our experimental results show that treating the two subtasks independently is not the optimal solution for Attitude Detection, as achieving high performance in each subtask is not sufficient for obtaining correct target-polarity pairs. Our jointly trained model AD-NET substantially outperforms the separately trained models by alleviating the target-polarity mismatch problem. Moreover, we proposed a method utilising the attitude detection model to improve retrieval-based chatbots by re-ranking the response candidates with attitude features. Human evaluation indicates that with attitude detection integrated, the new responses to the sampled queries from are statistically significantly more consistent, coherent, engaging and informative than the original ones obtained from a commercial chatbot.
SESSION: Session 6: Networks and Social Behavior
Pattern counting in graphs is fundamental to several network sci- ence tasks, and there is an abundance of scalable methods for estimating counts of small patterns, often called motifs, in large graphs. However, modern graph datasets now contain richer structure, and incorporating temporal information in particular has become a key part of network analysis. Consequently, temporal motifs, which are generalizations of small subgraph patterns that incorporate temporal ordering on edges, are an emerging part of the network analysis toolbox. However, there are no algorithms for fast estimation of temporal motifs counts; moreover, we show that even counting simple temporal star motifs is NP-complete. Thus, there is a need for fast and approximate algorithms. Here, we present the first frequency estimation algorithms for counting temporal motifs. More specifically, we develop a sampling framework that sits as a layer on top of existing exact counting algorithms and enables fast and accurate memory-efficient estimates of temporal motif counts. Our results show that we can achieve one to two orders of magnitude speedups over existing algorithms with minimal and controllable loss in accuracy on a number of datasets.
The phenomenon of edge clustering in real-world networks is a fundamental property underlying many ideas and techniques in network science. Clustering is typically quantified by the clustering coefficient, which measures the fraction of pairs of neighbors of a given center node that are connected. However, many common explanations of edge clustering attribute the triadic closure to a head node instead of the center node of a length-2 path; for example, a friend of my friend is also my friend. While such explanations are common in network analysis, there is no measurement for edge clustering that can be attributed to the head node. Here we develop local closure coefficients as a metric quantifying head-node-based edge clustering. We define the local closure coefficient as the fraction of length-2 paths emanating from the head node that induce a triangle. This subtle difference in definition leads to remarkably different properties from traditional clustering coefficients. We analyze correlations with node degree, connect the closure coefficient to community detection, and show that closure coefficients as a feature can improve link prediction.
Social media is becoming popular for news consumption due to its fast dissemination, easy access, and low cost. However, it also enables the wide propagation of fake news, i.e., news with intentionally false information. Detecting fake news is an important task, which not only ensures users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues from news contents, which are generally not effective because fake news is often intentionally written to mislead users by mimicking true news. Therefore, we need to explore auxiliary information to improve detection. The social context during news dissemination process on social media forms the inherent tri-relationship, the relationship among publishers, news pieces, and users, which has the potential to improve fake news detection. For example, partisan-biased publishers are more likely to publish fake news, and low-credible users are more likely to share fake news. In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world datasets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection.
Crowdsourcing has become a standard methodology to collect manually annotated data such as relevance judgments at scale. On crowdsourcing platforms like Amazon MTurk or FigureEight, crowd workers select tasks to work on based on different dimensions such as task reward and requester reputation. Requesters then receive the judgments of workers who self-selected into the tasks and completed them successfully. Several crowd workers, however, preview tasks, begin working on them, reaching varying stages of task completion without finally submitting their work. Such behavior results in unrewarded effort which remains invisible to requesters. In this paper, we conduct the first investigation into the phenomenon of task abandonment, the act of workers previewing or beginning a task and deciding not to complete it. We follow a three-fold methodology which includes 1) investigating the prevalence and causes of task abandonment by means of a survey over different crowdsourcing platforms, 2) data-driven analyses of logs collected during a large-scale relevance judgment experiment, and 3) controlled experiments measuring the effect of different dimensions on abandonment. Our results show that task abandonment is a widely spread phenomenon. Apart from accounting for a considerable amount of wasted human effort, this bears important implications on the hourly wages of workers as they are not rewarded for tasks that they do not complete. We also show how task abandonment may have strong implications on the use of collected data (for example, on the evaluation of IR systems).
Twitter's popularity has fostered the emergence of various illegal user activities - one such activity is to artificially bolster visibility of tweets by gaining large number of retweets within a short time span. The natural way to gain visibility is time-consuming. Therefore, users who want their tweets to get quick visibility try to explore shortcuts - one such shortcut is to approach the blackmarket services, and gain retweets for their own tweets by retweeting other customers' tweets. Thus the users intrinsically become a part of a collusive ecosystem controlled by these services. In this paper, we propose CoReRank, an unsupervised framework to detect collusive users (who are involved in producing artificial retweets), and suspicious tweets (which are submitted to the blackmarket services) simultaneously. CoReRank leverages the retweeting (or quoting) patterns of users, and measures two scores - the 'credibility' of a user and the 'merit' of a tweet. We propose a set of axioms to derive the interdependency between these two scores, and update them in a recursive manner. The formulation is further extended to handle the cold start problem. CoReRank is guaranteed to converge in a finite number of iterations and has linear time complexity. We also propose a semi-supervised version of CoReRank (called CoReRank+) which leverages a partial ground-truth labeling of users and tweets. Extensive experiments are conducted to show the superiority of CoReRank compared to six baselines on a novel dataset we collected and annotated. CoReRank beats the best unsupervised baseline method by 269% (20%) (relative) average precision and 300% (22.22%) (relative) average recall in detecting collusive (genuine) users. CoReRank+ beats the best supervised baseline method by 33.18% AUC. CoReRank also detects suspicious tweets with 0.85 (0.60) average precision (recall). To our knowledge, CoReRank is the first unsupervised method to detect collusive users and suspicious tweets simultaneously with theoretical guarantees.
Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location), and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of both cross-modal correlations among various modalities and structural dependencies between different social media sessions, and the diverse attribute information of different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework, that first reformulates multi-modal social media data as a heterogeneous network and then aims to learn node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that the XBully framework is superior to the state-of-the-art cyberbullying detection models.
An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors. Diffusion of news stories from one user to another depends not only on the stories' content and the genuineness but also on the alignment of the topical interests between the users. In this paper, we propose a novel Bayesian nonparametric model that incorporates homogeneity of news stories as the key component that regulates the topical similarity between the posting and sharing users' topical interests. Our model extends hierarchical Dirichlet process to model the topics of the news stories and incorporates Bayesian Gaussian process latent variable model to discover the homogeneity values. We train our model on a real-world social network dataset and find homogeneity values of news stories that strongly relate to their labels of genuineness and their contents. Finally, we show that the supervised version of our model predicts the labels of news stories better than the state-of-the-art neural network and Bayesian models.
Performing anomaly detection on attributed networks concerns with finding nodes whose patterns or behaviors deviate significantly from the majority of reference nodes. Its success can be easily found in many real-world applications such as network intrusion detection, opinion spam detection and system fault diagnosis, to name a few. Despite their empirical success, a vast majority of existing efforts are overwhelmingly performed in an unsupervised scenario due to the expensive labeling costs of ground truth anomalies. In fact, in many scenarios, a small amount of prior human knowledge of the data is often effortless to obtain, and getting it involved in the learning process has shown to be effective in advancing many important learning tasks. Additionally, since new types of anomalies may constantly arise over time especially in an adversarial environment, the interests of human expert could also change accordingly regarding to the detected anomaly types. It brings further challenges to conventional anomaly detection algorithms as they are often applied in a batch setting and are incapable to interact with the environment. To tackle the above issues, in this paper, we investigate the problem of anomaly detection on attributed networks in an interactive setting by allowing the system to proactively communicate with the human expert in making a limited number of queries about ground truth anomalies. Our objective is to maximize the true anomalies presented to the human expert after a given budget is used up. Along with this line, we formulate the problem through the principled multi-armed bandit framework and develop a novel collaborative contextual bandit algorithm, named GraphUCB. In particular, our developed algorithm: (1) explicitly models the nodal attributes and node dependencies seamlessly in a joint framework; and (2) handles the exploration-exploitation dilemma when querying anomalies of different types. Extensive experiments on real-world datasets show the improvement of the proposed algorithm over the state-of-the-art algorithms.
Core-periphery structure is a common property of complex networks, which is a composition of tightly connected groups of core vertices and sparsely connected periphery vertices. This structure frequently emerges in traffic systems, biology, and social networks via underlying spatial positioning of the vertices. While core-periphery structure is ubiquitous, there have been limited attempts at modeling network data with this structure. Here, we develop a generative, random network model with core-periphery structure that jointly accounts for topological and spatial information by "core scores'' of vertices. Our model achieves substantially higher likelihood than existing generative models of core-periphery structure, and we demonstrate how the core scores can be used in downstream data mining tasks, such as predicting airline traffic and classifying fungal networks. We also develop nearly linear time algorithms for learning model parameters and network sampling by using a method akin to the fast multipole method, a technique traditional to computational physics, which allow us to scale to networks with millions of vertices with minor tradeoffs in accuracy.
We propose a general view that demonstrates the relationship between network embedding approaches and matrix factorization. Unlike previous works that present the equivalence for the approaches from a skip-gram model perspective, we provide a more fundamental connection from an optimization (objective function) perspective. We demonstrate that matrix factorization is equivalent to optimizing two objectives: one is for bringing together the embeddings of similar nodes; the other is for separating the embeddings of distant nodes. The matrix to be factorized has a general form: S-β. The elements of $\mathbfS $ indicate pairwise node similarities. They can be based on any user-defined similarity/distance measure or learned from random walks on networks. The shift number β is related to a parameter that balances the two objectives. More importantly, the resulting embeddings are sensitive to β and we can improve the embeddings by tuning β. Experiments show that matrix factorization based on a new proposed similarity measure and β-tuning strategy significantly outperforms existing matrix factorization approaches on a range of benchmark networks.
Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error rate and great time reduction compared against a series of baselines, including several approximation algorithms on GED computation, and many existing graph neural network based models. Our study suggests SimGNN provides a new direction for future research on graph similarity computation and graph similarity search.
Existing embedding methods for attributed networks aim at learning low-dimensional vector representations for nodes only but not for both nodes and attributes, resulting in the fact that they cannot capture the affinities between nodes and attributes. However, capturing such affinities is of great importance to the success of many real-world attributed network applications, such as attribute inference and user profiling. Accordingly, in this paper, we introduce a Co-embedding model for Attributed Networks (CAN), which learns low-dimensional representations of both attributes and nodes in the same semantic space such that the affinities between them can be effectively captured and measured. To obtain high-quality embeddings, we propose a variational auto-encoder that embeds each node and attribute with means and variances of Gaussian distributions. Experimental results on real-world networks demonstrate that our model yields excellent performance in a number of applications compared with state-of-the-art techniques.
SESSION: Session 7: E-commerce and Recommendation
Relevance is the core problem of a search engine, and one of the main challenges is the vocabulary gap between user queries and documents. This problem is more serious in e-commerce, because language in product titles is more professional. Query rewriting and semantic matching are two key techniques to bridge the semantic gap between them to improve relevance. Recently, deep neural networks have been successfully applied to the two tasks and enhanced the relevance performance. However, such approaches suffer from the sparseness of training data in e-commerce scenario. In this study, we investigate the instinctive connection between query rewriting and semantic matching tasks, and propose a co-training framework to address the data sparseness problem when training deep neural networks. We first build a huge unlabeled dataset from search logs, on which the two tasks can be considered as two different views of the relevance problem. Then we iteratively co-train them via labeled data generated from this unlabeled set to boost their performance simultaneously. We conduct a series of offline and online experiments on a real-world e-commerce search engine, and the results demonstrate that the proposed method improves relevance significantly.
The users often have many product-related questions before they make a purchase decision in E-commerce. However, it is often time-consuming to examine each user review to identify the desired information. In this paper, we propose a novel review-driven framework for answer generation for product-related questions in E-commerce, named RAGE. We develope RAGE on the basis of the multi-layer convolutional architecture to facilitate speed-up of answer generation with the parallel computation. For each question, RAGE first extracts the relevant review snippets from the reviews of the corresponding product. Then, we devise a mechanism to identify the relevant information from the noise-prone review snippets and incorporate this information to guide the answer generation. The experiments on two real-world E-Commerce datasets show that the proposed RAGE significantly outperforms the existing alternatives in producing more accurate and informative answers in natural language. Moreover, RAGE takes much less time for both model training and answer generation than the existing RNN based generation models.
Evaluating algorithmic recommendations is an important, but difficult, problem. Evaluations conducted offline using data collected from user interactions with an online system often suffer from biases arising from the user interface or the recommendation engine. Online evaluation (A/B testing) can more easily address problems of bias, but depending on setting can be time-consuming and incur risk of negatively impacting the user experience, not to mention that it is generally more difficult when access to a large user base is not taken as granted. A compromise based on \em counterfactual analysis is to present some subset of online users with recommendation results that have been randomized or otherwise manipulated, log their interactions, and then use those to de-bias offline evaluations on historical data. However, previous work does not offer clear conclusions on how well such methods correlate with and are able to predict the results of online A/B tests. Understanding this is crucial to widespread adoption of new offline evaluation techniques in recommender systems. In this work we present a comparison of offline and online evaluation results for a particular recommendation problem: recommending playlists of tracks to a user looking for music. We describe two different ways to think about de-biasing offline collections for more accurate evaluation. Our results show that, contrary to much of the previous work on this topic, properly-conducted offline experiments do correlate well to A/B test results, and moreover that we can expect an offline evaluation to identify the best candidate systems for online testing with high probability.
In e-commerce portals, generating answers for product-related questions has become a crucial task. In this paper, we propose the task of product-aware answer generation, which tends to generate an accurate and complete answer from large-scale unlabeled e-commerce reviews and product attributes. Unlike existing question-answering problems, answer generation in e-commerce confronts three main challenges: (1) Reviews are informal and noisy; (2) joint modeling of reviews and key-value product attributes is challenging; (3) traditional methods easily generate meaningless answers. To tackle above challenges, we propose an adversarial learning based model, named PAAG, which is composed of three components: a question-aware review representation module, a key-value memory network encoding attributes, and a recurrent neural network as a sequence generator. Specifically, we employ a convolutional discriminator to distinguish whether our generated answer matches the facts. To extract the salience part of reviews, an attention-based review reader is proposed to capture the most relevant words given the question. Conducted on a large-scale real-world e-commerce dataset, our extensive experiments verify the effectiveness of each module in our proposed model. Moreover, our experiments show that our model achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations.
Recommendation in the modern world is not only about capturing the interaction between users and items, but also about understanding the relationship between items. Besides improving the quality of recommendation, it enables the generation of candidate items that can serve as substitutes and supplements of another item. For example, when recommending Xbox, PS4 could be a logical substitute and the supplements could be items such as game controllers, surround system, and travel case. Therefore, given a network of items, our objective is to learn their content features such that they explain the relationship between items in terms of substitutes and supplements. To achieve this, we propose a generative deep learning model that links two variational autoencoders using a connector neural network to create Linked Variational Autoencoder (LVA). LVA learns the latent features of items by conditioning on the observed relationship between items. Using a rigorous series of experiments, we show that LVA significantly outperforms other representative and state-of-the-art baseline methods in terms of prediction accuracy. We then extend LVA by incorporating collaborative filtering (CF) to create CLVA that captures the implicit relationship between users and items. By comparing CLVA with LVA we show that inducing CF-based features greatly improve the recommendation quality of substitutable and supplementary items on a user level.
SESSION: Session 8: Counterfactual and Causal Learning
We consider the novel problem of evaluating a recommendation policy offline in environments where the reward signal is non-stationary. Non-stationarity appears in many Information Retrieval (IR) applications such as recommendation and advertising, but its effect on off-policy evaluation has not been studied at all. We are the first to address this issue. First, we analyze standard off-policy estimators in non-stationary environments and show both theoretically and experimentally that their bias grows with time. Then, we propose new off-policy estimators with moving averages and show that their bias is independent of time and can be bounded. Furthermore, we provide a method to trade-off bias and variance in a principled way to get an off-policy estimator that works well in both non-stationary and stationary environments. We experiment on publicly available recommendation datasets and show that our newly proposed moving average estimators accurately capture changes in non-stationary environments, while standard off-policy estimators fail to do so.
Industrial recommender systems deal with extremely large action spaces -- many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is however subject to biases caused by only observing feedback on recommendations selected by the previous versions of the recommender. In this work, we present a general recipe of addressing such biases in a production top-K recommender system at Youtube, built with a policy-gradient-based algorithm, i.e. REINFORCE. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the orders of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on Youtube.
In this paper, we propose an offline counterfactual policy estimation framework called Genie to optimize Sponsored Search Marketplace. Genie employs an open box simulation engine with click calibration model to compute the KPI impact of any modification to the system. From the experimental results on Bing traffic, we showed that Genie performs better than existing observational approaches that employs randomized experiments for traffic slices that have frequent policy updates. We also show that Genie can be used to tune completely new policies efficiently without creating risky randomized experiments due to cold start problem. As time of today, Genie hosts more than $10000$ optimization jobs yearly which runs more than $30$ Million processing node hours of big data jobs for Bing Ads. For the last 3 years, Genie has been proven to be the one of the major platforms to optimize Bing Ads Marketplace due to its reliability under frequent policy changes and its efficiency to minimize risks in real experiments.
Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \citeJoachims/etal/17a can provably overcome presentation bias when observation propensities are known, it remains to show how to effectively estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. First, we show how to harvest a specific type of intervention data from historic feedback logs of multiple different ranking functions, and show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, we find that the new estimator provides superior propensity estimates in two real-world systems -- Arxiv Full-text Search and Google Drive Search. Beyond these two points, we find that the method is robust to a wide range of settings in simulation studies.
We evaluate the impact of probabilistically-constructed digital identity data collected between Sep. 2017 and Dec. 2017, approximately, in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying user. The identity data allows to generate "identity-based", rather than "identifier-based", user models, giving a fuller picture of the interests of the users underlying the identifiers. We employ off-policy evaluation techniques to evaluate the potential of identity-powered lookalike models without incurring the risk of allowing untested models to direct large amounts of ad spend or the large cost of performing A/B tests. We add to historical work on off-policy evaluation by noting a significant type of "finite-sample bias" that occurs for studies combining modestly-sized datasets and evaluation metrics based on ratios involving rare events (e.g., conversions). We illustrate this bias using a simulation study that later informs the handling of inverse propensity weights in our analyses on real data. We demonstrate significant lift in identity-powered lookalikes versus an identity-ignorant baseline: on average ~70% lift in conversion rate, CVR, with a concordant drop in cost-per-acquisition, CPA. This rises to factors of ~(4-32)x for identifiers having little data themselves, but that can be inferred to belong to users with substantial data to aggregate across identifiers. This implies that identity-powered user modeling is especially important in the context of identifiers having very short lifespans (i.e., frequently churned cookies). Our work motivates and informs the use of probabilistically-constructed digital identities in the marketing context. It also deepens the canon of examples in which off-policy learning has been employed to evaluate the complex systems of the internet economy.
A Sequential Test for Selecting the Better Variant: Online A/B testing, Adaptive Allocation, and Continuous Monitoring
Online A/B tests play an instrumental role for Internet companies to improve products and technologies in a data-driven manner. An online A/B test, in its most straightforward form, can be treated as a static hypothesis test where traditional statistical tools such as p-values and power analysis might be applied to help decision makers determine which variant performs better. However, a static A/B test presents both time cost and the opportunity cost for rapid product iterations. For time cost, a fast-paced product evolution pushes its shareholders to consistently monitor results from online A/B experiments, which usually invites peeking and altering experimental designs as data collected. It is recognized that this flexibility might harm statistical guarantees if not introduced in the right way, especially when online tests are considered as static hypothesis tests. For opportunity cost, a static test usually entails a static allocation of users into different variants, which prevents an immediate roll-out of the better version to larger audience or risks of alienating users who may suffer from a bad experience. While some works try to tackle these challenges, no prior method focuses on a holistic solution to both issues. In this paper, we propose a unified framework utilizing sequential analysis and multi-armed bandit to address time cost and the opportunity cost of static online tests simultaneously. In particular, we present an imputed sequential Girshick test that accommodates online data and dynamic allocation of data. The unobserved potential outcomes are treated as missing data and are imputed using empirical averages. Focusing on the binomial model, we demonstrate that the proposed imputed Girshick test achieves Type-I error and power control with both a fixed allocation ratio and an adaptive allocation such as Thompson Sampling through extensive experiments. In addition, we also run experiments on historical Etsy.com A/B tests to show the reduction in opportunity cost when using the proposed method.
We have seen a massive growth of online experiments at Internet companies. Although conceptually simple, A/B tests can easily go wrong in the hands of inexperienced users and on an A/B testing platform with little governance. An invalid A/B test hurts the business by leading to non-optimal decisions. Therefore, it is now more important than ever to create an intelligent A/B platform that democratizes A/B testing and allows everyone to make quality decisions through built-in detection and diagnosis of invalid tests. In this paper, we share how we mined through historical A/B tests and identified the most common causes for invalid tests, ranging from biased design, self-selection bias to attempting to generalize A/B test result beyond the experiment population and time frame. Furthermore, we also developed scalable algorithms to automatically detect invalid A/B tests and diagnose the root cause of invalidity. Surfacing up invalidity not only improved decision quality, but also served as a user education and reduced problematic experiment designs in the long run.
SESSION: Session 9: Recommendation
Online P2PL systems allow lending and borrowing between peers without the need for intermediaries such as banks. Convenience and high rate of returns have made P2PL systems very popular. Recommendation systems have been developed to help lenders make wise investment decisions, lowering the chances of overall default. However, P2PL marketplace suffers from low financial liquidity, i.e., loans of different grades are not always available for investment. Moreover, P2PL investments are long term (usually a few years), hence, incorrect investment cannot be liquidated easily. Overall, the state-of-the-art recommendation systems do not account for the low market liquidity and hence, can lead to unwise investment decisions. In this paper we remedy this shortcoming by building a recommendation framework that builds an investment portfolio, which results in the highest return and the lowest risk along with a statistical measure of the number of days required for the amount to be completely funded. Our recommendation system predicts the grade and number of loans that will appear in the future when constructing the investment portfolio. Experimental results show that our recommendation engine outperforms the current state-of-the-art techniques. Our recommendation system can increase the probability of achieving the highest return with the lowest risk by ~ 69%.
The rapid growth of Internet services and mobile devices provides an excellent opportunity to satisfy the strong demand for the personalized item or product recommendation. However, with the tremendous increase of users and items, personalized recommender systems still face several challenging problems: (1) the hardness of exploiting sparse implicit feedback; (2) the difficulty of combining heterogeneous data. To cope with these challenges, we propose a gated attentive-autoencoder (GATE) model, which is capable of learning fused hidden representations of items' contents and binary ratings, through a neural gating structure. Based on the fused representations, our model exploits neighboring relations between items to help infer users' preferences. In particular, a word-level and a neighbor-level attention module are integrated with the autoencoder. The word-level attention learns the item hidden representations from items' word sequences, while favoring informative words by assigning larger attention weights. The neighbor-level attention learns the hidden representation of an item's neighborhood by considering its neighbors in a weighted manner. We extensively evaluate our model with several state-of-the-art methods and different validation metrics on four real-world datasets. The experimental results not only demonstrate the effectiveness of our model on top-N recommendation but also provide interpretable results attributed to the attention modules.
This paper reformulates the problem of recommending related queries on a search engine as an extreme multi-label learning task. Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. Each of the top 100 million queries on Bing was treated as a separate label in the proposed reformulation and an extreme classifier was learnt which took the user's query as input and predicted the relevant subset of 100 million queries as output. Unfortunately, state-of-the-art extreme classifiers have not been shown to scale beyond 10 million labels and have poor prediction accuracies for queries. This paper therefore develops the Slice algorithm which can be accurately trained on low-dimensional, dense deep learning features popularly used to represent queries and which efficiently scales to 100 million labels and 240 million training points. Slice achieves this by reducing the training and prediction times from linear to logarithmic in the number of labels based on a novel negative sampling technique. This allows the proposed reformulation to address some of the limitations of traditional related searches approaches in terms of coverage, density and quality. Experiments on publically available extreme classification datasets with low-dimensional dense features as well as related searches datasets mined from the Bing logs revealed that slice could be more accurate than leading extreme classifiers while also scaling to 100 million labels. Furthermore, slice was found to improve the accuracy of recommendations by 10% as compared to state-of-the-art related searches techniques. Finally, when added to the ensemble in production in Bing, slice was found to increase the trigger coverage by 52%, the suggestion density by 33%, the overall success rate by 2.6% and the success rate for tail queries by 12.6%. Slice's source code can be downloaded from .
Neural collaborative filtering (NCF) and recurrent recommender systems (RRN) have been successful in modeling relational data (user-item interactions). However, they are also limited in their assumption of static or sequential modeling of relational data as they do not account for evolving users' preference over time as well as changes in the underlying factors that drive the change in user-item relationship over time. We address these limitations by proposing a Neural network based Tensor Factorization (NTF) model for predictive tasks on dynamic relational data. The NTF model generalizes conventional tensor factorization from two perspectives: First, it leverages the long short-term memory architecture to characterize the multi-dimensional temporal interactions on relational data. Second, it incorporates the multi-layer perceptron structure for learning the non-linearities between different latent factors. Our extensive experiments demonstrate the significant improvement in both the rating prediction and link prediction tasks on various dynamic relational data by our NTF model over both neural network based factorization models and other traditional methods.
Shaping Feedback Data in Recommender Systems with Interventions Based on Information Foraging Theory
Recommender systems rely heavily on the predictive accuracy of the learning algorithm. Most work on improving accuracy has focused on the learning algorithm itself. We argue that this algorithmic focus is myopic. In particular, since learning algorithms generally improve with more and better data, we propose shaping the feedback generation process as an alternate and complementary route to improving accuracy. To this effect, we explore how changes to the user interface can impact the quality and quantity of feedback data -- and therefore the learning accuracy. Motivated by information foraging theory, we study how feedback quality and quantity are influenced by interface design choices along two axes: information scent and information access cost. We present a user study of these interface factors for the common task of picking a movie to watch, showing that these factors can effectively shape and improve the implicit feedback data that is generated while maintaining the user experience.
Online communities such as Facebook and Twitter are enormously popular and have become an essential part of the daily life of many of their users. Through these platforms, users can discover and create information that others will then consume. In that context, recommending relevant information to users becomes critical for viability. However, recommendation in online communities is a challenging problem: 1) users' interests are dynamic, and 2) users are influenced by their friends. Moreover, the influencers may be context-dependent. That is, different friends may be relied upon for different topics. Modeling both signals is therefore essential for recommendations. We propose a recommender system for online communities based on a dynamic-graph-attention neural network. We model dynamic user behaviors with a recurrent neural network, and context-dependent social influence with a graph-attention neural network, which dynamically infers the influencers based on users' current interests. The whole model can be efficiently fit on large-scale data. Experimental results on several real-world data sets demonstrate the effectiveness of our proposed approach over several competitive baselines including state-of-the-art models.
We propose a new time-dependent predictive model of user-item ratings centered around local coherence -- that is, while both users and items are constantly in flux, within a short-term sequence, the neighborhood of a particular user or item is likely to be coherent. Three unique characteristics of the framework are: (i) it incorporates both implicit and explicit feedbacks by extracting the local coherence hidden in the feedback sequences; (ii) it uses parallel recurrent neural networks to capture the evolution of users and items, resulting in a dual factor recommendation model; and (iii) it combines both coherence-enhanced consistent latent factors and dynamic latent factors to balance short-term changes with long-term trends for improved recommendation. Through experiments on Goodreads and Amazon, we find that the proposed model can outperform state-of-the-art models in predicting users' preferences.
In this paper, we focus on the task of sequential recommendation using taxonomy data. Existing sequential recommendation methods usually adopt a single vectorized representation for learning the overall sequential characteristics, and have a limited modeling capacity in capturing multi-grained sequential characteristics over context information. Besides, existing methods often directly take the feature vectors derived from context information as auxiliary input, which is difficult to fully exploit the structural patterns in context information for learning preference representations. To address above issues, we propose a novel Taxonomy-aware Multi-hop Reasoning Network, named TMRN, which integrates a basic GRU-based sequential recommender with an elaborately designed memory-based multi-hop reasoning architecture. For enhancing the reasoning capacity, we incorporate taxonomy data as structural knowledge to instruct the learning of our model. We associate the learning of user preference in sequential recommendation with the category hierarchy in the taxonomy. Given a user, for each recommendation, we learn a unique preference representation corresponding to each level in the taxonomy based on her/his overall sequential preference. In this way, the overall, coarse-grained preference representation can be gradually refined in different levels from general to specific, and we are able to capture the evolvement and refinement of user preference over the taxonomy, which makes our model highly explainable. Extensive experiments show that our proposed model is superior to state-of-the-art baselines in terms of both effectiveness and interpretability.
Convolutional Neural Networks (CNNs) have been recently introduced in the domain of session-based next item recommendation. An ordered collection of past items the user has interacted with in a session (or sequence) are embedded into a 2-dimensional latent matrix, and treated as an image. The convolution and pooling operations are then applied to the mapped item embeddings. In this paper, we first examine the typical session-based CNN recommender and show that both the generative model and network architecture are suboptimal when modeling long-range dependencies in the item sequence. To address the issues, we introduce a simple, but very effective generative model that is capable of learning high-level representation from both short- and long-range item dependencies. The network architecture of the proposed model is formed of a stack of holed convolutional layers, which can efficiently increase the receptive fields without relying on the pooling operation. Another contribution is the effective use of residual block structure in recommender systems, which can ease the optimization for much deeper networks. The proposed generative model attains state-of-the-art accuracy with less training time in the next item recommendation task. It accordingly can be used as a powerful recommendation baseline to beat in future, especially when there are long sequences of user feedback.
Time is of the Essence: A Joint Hierarchical RNN and Point Process Model for Time and Item Predictions
In recent years session-based recommendation has emerged as an increasingly applicable type of recommendation. As sessions consist of sequences of events, this type of recommendation is a natural fit for Recurrent Neural Networks (RNNs). Several additions have been proposed for extending such models in order to handle specific problems or data. Two such extensions are 1.) modeling of inter-session relations for catching long term dependencies over user sessions, and 2.) modeling temporal aspects of user-item interactions. The former allows the session-based recommendation to utilize extended session history and inter-session information when providing new recommendations. The latter has been used to both provide state-of-the-art predictions for when the user will return to the service and also for improving recommendations. In this work, we combine these two extensions in a joint model for the tasks of recommendation and return-time prediction. The model consists of a Hierarchical RNN for the inter-session and intra-session items recommendation extended with a Point Process model for the time-gaps between the sessions. The experimental results indicate that the proposed model improves recommendations significantly on two datasets over a strong baseline, while simultaneously improving return-time predictions over a baseline return-time prediction model.
Variational autoencoders were proven successful in domains such as computer vision and speech processing. Their adoption for modeling user preferences is still unexplored, although recently it is starting to gain attention in the current literature. In this work, we propose a model which extends variational autoencoders by exploiting the rich information present in the past preference history. We introduce a recurrent version of the VAE, where instead of passing a subset of the whole history regardless of temporal dependencies, we rather pass the consumption sequence subset through a recurrent neural network. At each time-step of the RNN, the sequence is fed through a series of fully-connected layers, the output of which models the probability distribution of the most likely future preferences. We show that handling temporal information is crucial for improving the accuracy of the VAE: In fact, our model beats the current state-of-the-art by valuable margins because of its ability to capture temporal dependencies among the user-consumption sequence using the recurrent encoder still keeping the fundamentals of variational autoencoders intact.
A user-generated review document is a product between the item's intrinsic properties and the user's perceived composition of those properties. Without properly modeling and decoupling these two factors, one can hardly obtain any accurate user understanding nor item profiling from such user-generated data. In this paper, we study a new text mining problem that aims at differentiating a user's subjective composition of topical content in his/her review document from the entity's intrinsic properties. Motivated by the Item Response Theory (IRT), we model each review document as a user's detailed response to an item, and assume the response is jointly determined by the individuality of the user and the property of the item. We model the text-based response with a generative topic model, in which we characterize the items' properties and users' manifestations of them in a low-dimensional topic space. Via posterior inference, we separate and study these two components over a collection of review documents. Extensive experiments on two large collections of Amazon and Yelp review data verified the effectiveness of the proposed solution: it outperforms the state-of-art topic models with better predictive power in unseen documents, which is directly translated into improved performance in item recommendation and item summarization tasks.
SESSION: Session 10: Personalization and Characterizing User Behavior
As one of the Web's primary multilingual knowledge sources, Wikipedia is read by millions of people across the globe every day. Despite this global readership, little is known about why users read Wikipedia's various language editions. To bridge this gap, we conduct a comparative study by combining a large-scale survey of Wikipedia readers across 14 language editions with a log-based analysis of user activity. We proceed in three steps. First, we analyze the survey results to compare the prevalence of Wikipedia use cases across languages, discovering commonalities, but also substantial differences, among Wikipedia languages with respect to their usage. Second, we match survey responses to the respondents' traces in Wikipedia's server logs to characterize behavioral patterns associated with specific use cases, finding that distinctive patterns consistently mark certain use cases across language editions. Third, we show that certain Wikipedia use cases are more common in countries with certain socio-economic characteristics; e.g., in-depth reading of Wikipedia articles is substantially more common in countries with a low Human Development Index. These findings advance our understanding of reader motivations and behaviors across Wikipedia languages and have implications for Wikipedia editors and developers of Wikipedia and other Web technologies.
Email triage involves going throughunhandled emails and deciding what to do with them. This familiar process can become increasingly challenging as the number of unhandled email grows. During a triage session, users commonlydefer handling emails that they cannot immediately deal with to later. These deferred emails, are often related to tasks that are postponed until the user has more time or the right information to deal with them. In this paper, through qualitative interviews and a large-scale log analysis, we study when and whatenterprise email users tend to defer. We found that users are more likely to defer emails when handling them involves replying, reading carefully, or clicking on links and attachments. We also learned that the decision to defer emails depends on many factors such as user's workload and the importance of the sender. Our qualitative results suggested that deferring is very common, and our quantitative log analysis confirms that 12% of triage sessions and 16% of daily active users had at least one deferred email on weekdays. We also discuss severaldeferral strategies such as marking emails as unread and flagging that are reported by our interviewees, and illustrate how such patterns can be also observed in user logs. Inspired by the characteristics of deferred emails and contextual factors involved in deciding if an email should be deferred, we train a classifier for predicting whether a recently triaged email is actually deferred. Our experimental results suggests that deferral can be classified with modest effectiveness. Overall, our work provides novel insights about how users handle their emails and how deferral can be modeled.
Estimating how long a task will take to complete (i.e., the task duration) is important for many applications, including calendaring and project management. Population-scale calendar data contains distributional information about time allocated by individuals for tasks that may be useful to build computational models for task duration estimation. This study analyzes anonymized large-scale calendar appointment data from hundreds of thousands of individuals and millions of tasks to understand expected task durations and the longitudinal evolution in these durations. Machine-learned models are trained using the appointment data to estimate task duration. Study findings show that task attributes, including content (anonymized appointment subjects), context, and history, are correlated with time allocated for tasks. We also show that machine-learned models can be trained to estimate task duration, with multiclass classification accuracies of almost 80%. The findings have implications for understanding time estimation in populations, and in the design of support in digital assistants and calendaring applications to find time for tasks and to help people, especially those who are new to a task, block sufficient time for task completion.
Understanding search intents behind queries is of vital importance for improving search performance or designing better evaluation metrics. Although there exist many efforts in Web search user intent taxonomies and investigating how users' interaction behaviors vary with the intent types, only a few of them have been made specifically for the image search scenario. Different from previous works which investigate image search user behavior and task characteristics based on either lab studies or large scale log analysis, we conducted a field study which lasts one month and involves 2,040 search queries from 555 search tasks. By this means, we collected relatively large amount of practical search behavior data with extensive first-tier annotation from users. With this data set, we investigate how various image search intents affect users' search behavior, and try to adopt different signals to predict search satisfaction under the certain intent. Meanwhile, external assessors were also employed to categorize each search task using four orthogonal intent taxonomies. Based on the hypothesis that behavior is dependent of task type, we analyze user search behavior on the field study data, examining characteristics of the session, click and mouse patterns. We also link the search satisfaction prediction to image search intent, which shows that different types of signals play different roles in satisfaction prediction as intent varies. Our findings indicate the importance of considering search intent in user behavior analysis and satisfaction prediction in image search.
Demographics of online users such as age and gender play an important role in personalized web applications. However, it is difficult to directly obtain the demographic information of online users. Luckily, search queries can cover many online users and the search queries from users with different demographics usually have some difference in contents and writing styles. Thus, search queries can provide useful clues for demographic prediction. In this paper, we study predicting users' demographics based on their search queries, and propose a neural approach for this task. Since search queries can be very noisy and many of them are not useful, instead of combining all queries together for user representation, in our approach we propose a hierarchical user representation with attention (HURA) model to learn informative user representations from their search queries. Our HURA model first learns representations for search queries from words using a word encoder, which consists of a CNN network and a word-level attention network to select important words. Then we learn representations of users based on the representations of their search queries using a query encoder, which contains a CNN network to capture the local contexts of search queries and a query-level attention network to select informative search queries for demographic prediction. Experiments on two real-world datasets validate that our approach can effectively improve the performance of search query based age and gender prediction and consistently outperform many baseline methods.
SESSION: Session 11: Domain Transfer and Representation Learning
Understanding user behavior and predicting future behavior on the web is critical for providing seamless user experiences as well as increasing revenue of service providers. Recently, thanks to the remarkable success of recurrent neural networks (RNNs), it has been widely used for modeling sequences of user behaviors. However, although sequential behaviors appear across multiple domains in practice, existing RNN-based approaches still focus on the single-domain scenario assuming that sequential behaviors come from only a single domain. Hence, in order to analyze sequential behaviors across multiple domains, they require to separately train multiple RNN models, which fails to jointly model the interplay among sequential behaviors across multiple domains. Consequently, they often suffer from lack of information within each domain. In this paper, we first introduce a practical but overlooked phenomenon in sequential behaviors across multiple domains, i.e.,domain switch where two successive behaviors belong to different domains. Then, we propose aDomain Switch-Aware Holistic Recurrent Neural Network (DS-HRNN) that effectively shares the knowledge extracted from multiple domains by systematically handlingdomain switch for the multi-domain scenario. DS-HRNN jointly models the multi-domain sequential behaviors and accurately predicts the future behaviors in each domain with only a single RNN model. Our extensive evaluations on two real-world datasets demonstrate that \DCHRNN\ outperforms existing RNN-based approaches and non-sequential baselines with significant improvements by up to 14.93% in terms of recall of the future behavior prediction.
People often make commitments to perform future actions. Detecting commitments made in email (e.g., "I'll send the report by end of day'') enables digital assistants to help their users recall promises they have made and assist them in meeting those promises in a timely manner. In this paper, we show that commitments can be reliably extracted from emails when models are trained and evaluated on the same domain (corpus). However, their performance degrades when the evaluation domain differs. This illustrates the domain bias associated with email datasets and a need for more robust and generalizable models for commitment detection. To learn a domain-independent commitment model, we first characterize the differences between domains (email corpora) and then use this characterization to transfer knowledge between them. We investigate the performance of domain adaptation, namely transfer learning, at different granularities: feature-level adaptation and sample-level adaptation. We extend this further using a neural autoencoder trained to learn a domain-independent representation for training samples. We show that transfer learning can help remove domain bias to obtain models with less domain dependence. Overall, our results show that domain differences can have a significant negative impact on the quality of commitment detection models and that transfer learning has enormous potential to address this issue.
Users seek direct answers to complex questions from large open-domain knowledge sources like the Web. Open-domain question answering has become a critical task to be solved for building systems that help address users' complex information needs. Most open-domain question answering systems use a search engine to retrieve a set of candidate documents, select one or a few of them as context, and then apply reading comprehension models to extract answers. Some questions, however, require taking a broader context into account, e.g., by considering low-ranked documents that are not immediately relevant, combining information from multiple documents, and reasoning over multiple facts from these documents to infer the answer. In this paper, we propose a model based on the Transformer architecture that is able to efficiently operate over a larger set of candidate documents by effectively combining the evidence from these documents during multiple steps of reasoning, while it is robust against noise from low-ranked non-relevant documents included in the set. We use our proposed model, called TraCRNet, on two public open-domain question answering datasets, SearchQA and Quasar-T, and achieve results that meet or exceed the state-of-the-art.
Representation learning in heterogeneous networks faces challenges due to heterogeneous structural information of multiple types of nodes and relations, and also due to the unstructured attribute or content (e.g., text) associated with some types of nodes. While many recent works have studied homogeneous, heterogeneous, and attributed networks embedding, there are few works that have collectively solved these challenges in heterogeneous networks. In this paper, we address them by developing a Semantic-aware Heterogeneous Network Embedding model (SHNE). SHNE performs joint optimization of heterogeneous SkipGram and deep semantic encoding for capturing both heterogeneous structural closeness and unstructured semantic relations among all nodes, as function of node content, that exist in the network. Extensive experiments demonstrate that SHNE outperforms state-of-the-art baselines in various heterogeneous network mining tasks, such as link prediction, document retrieval, node recommendation, relevance search, and class visualization.
Deep text matching approaches have been widely studied for many applications including question answering and information retrieval systems. To deal with a domain that has insufficient labeled data, these approaches can be used in a Transfer Learning (TL) setting to leverage labeled data from a resource-rich source domain. To achieve better performance, source domain data selection is essential in this process to prevent the "negative transfer" problem. However, the emerging deep transfer models do not fit well with most existing data selection methods, because the data selection policy and the transfer learning model are not jointly trained, leading to sub-optimal training efficiency. In this paper, we propose a novel reinforced data selector to select high-quality source domain data to help the TL model. Specifically, the data selector "acts" on the source domain data to find a subset for optimization of the TL model, and the performance of the TL model can provide "rewards" in turn to update the selector. We build the reinforced data selector based on the actor-critic framework and integrate it to a DNN based transfer learning model, resulting in a Reinforced Transfer Learning (RTL) method. We perform a thorough experimental evaluation on two major tasks for text matching, namely, paraphrase identification and natural language inference. Experimental results show the proposed RTL can significantly improve the performance of the TL model. We further investigate different settings of states, rewards, and policy optimization methods to examine the robustness of our method. Last, we conduct a case study on the selected data and find our method is able to select source domain data whose Wasserstein distance is close to the target domain data. This is reasonable and intuitive as such source domain data can provide more transferability power to the model.
We propose a link prediction algorithm that is based on spring-electrical models. The idea to study these models came from the fact that spring-electrical models have been successfully used for networks visualization. A good network visualization usually implies that nodes similar in terms of network topology, e.g., connected and/or belonging to one cluster, tend to be visualized close to each other. Therefore, we assumed that the Euclidean distance between nodes in the obtained network layout correlates with a probability of a link between them. We evaluate the proposed method against several popular baselines and demonstrate its flexibility by applying it to undirected, directed and bipartite networks.
Solving the Sparsity Problem in Recommendations via Cross-Domain Item Embedding Based on Co-Clustering
Session-based recommendations recently receive much attentions due to no available user data in many cases, e.g., users are not logged-in/tracked. Most session-based methods focus on exploring abundant historical records of anonymous users but ignoring the sparsity problem, where historical data are lacking or are insufficient for items in sessions. In fact, as users' behavior is relevant across domains, information from different domains is correlative, e.g., a user tends to watch related movies in a movie domain, after listening to some movie-themed songs in a music domain (i.e., cross-domain sessions). Therefore, we can learn a complete item description to solve the sparsity problem using complementary information from related domains. In this paper, we propose an innovative method, called Cross-Domain Item Embedding method based on Co-clustering (CDIE-C), to learn cross-domain comprehensive representations of items by collectively leveraging single-domain and cross-domain sessions within a unified framework. We first extract cluster-level correlations across domains using co-clustering and filter out noise. Then, cross-domain items and clusters are embedded into a unified space by jointly capturing item-level sequence information and cluster-level correlative information. Besides, CDIE-C enhances information exchange across domains utilizing three types of relations (i.e., item-to-context-item, item-to-context-co-cluster and co-cluster-to-context-item relations). Finally, we train CDIE-C with two efficient training strategies, i.e., joint training and two-stage training. Empirical results show CDIE-C outperforms the state-of-the-art recommendation methods on three cross-domain datasets and can effectively alleviate the sparsity problem.
SESSION: Session 12: Text Understanding
The recent art in relation extraction is distant supervision which generates training data by heuristically aligning a knowledge base with free texts and thus avoids human labelling. However, the concerned relation mentions often use the bag-of-words representation, which ignores inner correlations between features located in different dimensions and makes relation extraction less effective. To capture the complex characteristics of relation expression and tighten the correlated features, we attempt to discover and utilise informative correlations between features by the following four phases: 1) formulating semantic similarities between lexical features using the embedding method; 2) constructing generative relation for lexical features with different sizes of side windows; 3) computing correlation scores between syntactic features through a kernel-based method; and 4) conducting a distillation process for the obtained correlated feature pairs and integrating informative pairs with existing relation extraction models. The extensive experiments demonstrate that our method can effectively discover correlation information and improve the performance of state-of-the-art relation extraction methods.
Comparative summarization is an effective strategy to discover important similarities and differences in collections of documents biased to users' interests. A natural method of this task is to find important and corresponding content. In this paper, we propose a novel research task of automatic query-based across-time summarization in news archives as well as we introduce an effective method to solve this task. The proposed model first learns an orthogonal transformation between temporally distant news collections. Then, it generates a set of corresponding sentence pairs based on a concise integer linear programming framework. We experimentally demonstrate the effectiveness of our method on the New York Times Annotated Corpus.
There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.
In this paper, we advance the state-of-the-art in topic modeling by means of a new document representation based on pre-trained word embeddings for non-probabilistic matrix factorization. Specifically, our strategy, called CluWords, exploits the nearest words of a given pre-trained word embedding to generate meta-words capable of enhancing the document representation, in terms of both, syntactic and semantic information. The novel contributions of our solution include: (i)the introduction of a novel data representation for topic modeling based on syntactic and semantic relationships derived from distances calculated within a pre-trained word embedding space and (ii)the proposal of a new TF-IDF-based strategy, particularly developed to weight the CluWords. In our extensive experimentation evaluation, covering 12 datasets and 8 state-of-the-art baselines, we exceed (with a few ties) in almost cases, with gains of more than 50% against the best baselines (achieving up to 80% against some runner-ups). Finally, we show that our method is able to improve document representation for the task of automatic text classification.
SESSION: Demonstration Papers
In recent years, we have witnessed a rapid increase of text content stored in digital archives such as newspaper archives or web archives. With the passage of time, it is however difficult to effectively perform search within such collections due to vocabulary and context change. In this paper, we present a system that helps to find analogical terms across temporal text collections by applying non-linear transformation. We implement two approaches for analog retrieval where one of them allows users to also input an aspect term specifying particular perspective of a query. The current prototype system permits temporal analog search across two different time periods based on New York Times Annotated Corpus.
Cross-lingual summarization (CLS) aims to create summaries in a target language, from a document or document set given in a different, source language. Cross-lingual summarization can play a critical role in enabling cross-lingual information access for millions of people across the globe who do not speak or understand languages having large representation on the web. It can also make documents originally published in local languages quickly accessible to a large audience which does not understand those local languages. Though cross-lingual summarization has gathered some attention in the last decade, there has been no serious effort to publish rigorous software for this task. In this paper, we provide a design for an end-to-end CLS software called clstk. Besides implementing a number of methods proposed by different CLS researchers over years, the software integrates multiple components critical for CLS. We hope that this extremely modular tool-kit will help CLS researchers to contribute more effectively to the area.
Retrieval models in information retrieval are used to rank documents for typically under-specified queries. Today machine learning is used to learn retrieval models from click logs and/or relevance judgments that maximizes an objective correlated with user satisfaction. As these models become increasingly powerful and sophisticated, they also become harder to understand. Consequently, it is hard for to identify artifacts in training, data specific biases and intents from a complex trained model like neural rankers even if trained purely on text features. EXS is a search system designed specifically to provide its users with insight into the following questions: "What is the intent of the query according to the ranker?'', "Why is this document ranked higher than another?'' and "Why is this document relevant to the query?''. EXS uses a version of a popular posthoc explanation method for classifiers -- LIME, adapted specifically to answer these questions. We show how such a system can effectively help a user understand the results of neural rankers and highlight areas of improvement.
Designing desirable and aesthetical manifestation of web graphic user interfaces (GUI) is a challenging task for web developers. After determining a web page's content, developers usually refer to existing pages, and adapt the styles from desired pages into the target one. However, it is not only difficult to find appropriate pages to exhibit the target page's content, but also tedious to incorporate styles from different pages harmoniously in the target page. To tackle these two issues, we propose FaceOff, a data-driven automation system that assists the manifestation design of web GUI. FaceOff constructs a repository of web GUI templates based on 15,491 web pages from popular websites and professional design examples. Given a web page for designing manifestation, FaceOff first segments it into multiple blocks, and retrieves GUI templates in the repository for each block. Subsequently, FaceOff recommends multiple combinations of templates according to a Convolutional Neural Network (CNN) based style-embedding model, which makes the recommended style combinations diverse and accordant. We demonstrate that FaceOff can retrieve suitable GUI templates with well-designed and harmonious style, and thus alleviate the developer efforts.
In this paper, we would like to demonstrate an intelligent traffic analytics system called T4, which enables intelligent analytics over real-time and historical trajectories from vehicles. At the front end, we visualize the current traffic flow and result trajectories of different types of queries, as well as the histograms of traffic flow and traffic lights. At the back end, T4 is able to support multiple types of common queries over trajectories, with compact storage, efficient index and fast pruning algorithms. The output of those queries can be used for further monitoring and analytics purposes. Moreover, we train the deep models for traffic flow prediction and traffic light control to reduce traffic congestion. A preliminary version of T4 is available at https://sites.google.com/site/shengwangcs/torch.
Interactive Visualization of Urban Areas of Interest: A Parameter-Free and Efficient Footprint Method
Understanding urban areas of interest (AOIs) is essential to decision making in various urban planning and exploration tasks. Such AOIs can be computed based on the geographic points that satisfy the user query. In this demo, we present an interactive visualization system of urban AOIs, supported by a parameter-free and efficient footprint method called AOI-shapes. Compared to state-of-the-art footprint methods, the proposed AOI-shapes (i) is parameter-free, (ii) is able to recognize multiple regions/outliers, (iii) can detect inner holes, and (iv) supports the incremental method. We demonstrate the effectiveness and efficiency of the proposed AOI-shapes based on a real-world real estate dataset in Australia. A preliminary version of the online demo can be accessed at http://aoishapes.com/.
For a tourist who wishes to stroll in an unknown city, it is useful to have a recommendation of not just the shortest routes but also routes that are pleasant. This paper demonstrates a system that provides pleasant route recommendation. Currently, we focus on routes that have much green and bright views. The system measures pleasure scores by extracting colors or objects in Google Street View panorama images and re-ranks shortest paths in the order of the computed pleasure scores. The current prototype provides route recommendation for city areas in Tokyo, Kyoto and San Francisco.
In this paper, we develop a neural attentive interpretable recommendation system, named NAIRS. A self-attention network, as a key component of the system, is designed to assign attention weights to interacted items of a user. This attention mechanism can distinguish the importance of the various interacted items in contributing to a user profile. %, and it also provides interpretable recommendations. Based on the user profiles obtained by the self-attention network, NAIRS offers personalized high-quality recommendation. Moreover, it develops visual cues to interpret recommendations. This demo application with the implementation of NAIRS enables users to interact with a recommendation system, and it persistently collects training data to improve the system. The demonstration and experimental results show the effectiveness of NAIRS.
In this work, we demonstrate structured search capabilities of the GYANI indexing infrastructure. GYANI allows linguists, journalists, and scholars in humanities to search large semantically annotated document collections in a structured manner by supporting queries with regular expressions between word sequences and annotations. In addition to this, we provide support for attaching semantics to words via annotations in the form of part-of-speech, named entities, temporal expressions, and numerical quantities. We demonstrate that by enabling such structured search capabilities we can quickly gather annotated text regions for various knowledge-centric tasks such as information extraction and question answering.
The recent introduction of entity-centric implicit network representations of unstructured text offers novel ways for exploring entity relations in document collections and streams efficiently and interactively. Here, we present TopExNet as a tool for exploring entity-centric network topics in streams of news articles. The application is available as a web service at https://topexnet.ifi.uni-heidelberg.de.
Open multidimensional data from existing sources and social media often carries insightful information on social issues. With the increase of high volume data and the proliferation of visual analytics platforms, users can more easily interact with and pick out meaningful information from a large dataset. In this paper, we present VisCrime, a system that uses visual analytics to maps out crimes that have occurred in a region/neighbourhood. VisCrime is underpinned by a novel trajectory algorithm that is used to create trajectories from open data sources that reports incidents of crime and data gathered from social media. Our system can be accessed at http://viscrime.ml/deckmap
We propose KGdiff, a new interactive visualization tool for social media content focusing on entities and relationships. The core component is a layout algorithm that highlights the differences between two graphs. We apply this algorithm on knowledge graphs consisting of named entities and their relations extracted from text streams over different time periods. The visualization system provides additional information such as the volume and frequency ranking of entities and allows users to select which parts of the graph to visualize interactively. On Twitter and news article collections, KGdiff allows users to compare different data subsets. Results of such comparisons often reveal topical or geographical changes in a discussion. More broadly, graph differences are useful for a wide range of relational data comparison tasks, such as comparing social interaction graphs, identifying changes in user behavior, or discovering differences in graphs from distinct sources, geography, or political stance.
SESSION: Doctoral Consortium Papers
Understanding and predicting the popularity of online items is an important open problem in social media analysis. Most of the recent work on popularity prediction is either based on learning a variety of features from full network data or using generative processes to model the event time data. We identify two gaps in the current state of the art prediction models. The first is the unexplored connection and comparison between the two aforementioned approaches. In our work, we bridge gap between feature-driven and generative models by modelling social cascade with a marked Hawkes self-exciting point process. We then learn a predictive layer on top for popularity prediction using a collection of cascade history. Secondly, the existing methods typically focus on a single source of external influence, whereas for many types of online content such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously - e.g. microblogs or traditional media coverage. We propose a recurrent neural network based model for asynchronous streams that connects multiple streams of different granularity via joint inference. We further design two new measures, one to explain the viral potential of videos, the other to uncover latent influences including seasonal trends. This work provides accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.
The goal of this thesis is to develop techniques for comparative summarisation of multimodal document collections. Comparative summarisation is extractive summarisation in comparative settings, where documents form two or more groups, e.g. articles on the same topic but from different sources. Comparative summarisation involves, not only, selecting representative and diverse samples within groups, but also samples that highlight commonalities and differences between the groups. We posit that comparative summarisation is a fruitful problem for diverse use cases, such as comparing content over time, authors, or distinct view points. We formulate the problem of comparative summarisation by reducing it to binary classification problem and define objectives to incorporate representativeness, diversity and comparativeness. We design new automatic and crowd-sourced evaluation protocols for summarisation evaluation that scales much better than the evaluations requiring manually created ground truth summaries. We show the efficacy of the approach in a newly curated datasets of controversial news topics. We plan to develop new collection comparison methods for multimodal document collections.
Our understanding of the web has been evolving from a large database of information to a Socio - Cognitive Space, where humans are not just using the web but participating in the web. World wide web has evolved into the largest source of information in the history, and it continues to grow without any known agenda. The web needs to be observed and studied to understand various impacts of it on the society (both positive and negative) and shape the future of the web and the society. This gave rise to the global grid of Web Observatories which focus and observe various aspects of the web. Web Observatories aim to share and collaborate various data sets, analysis tools and applications with all web observatories across the world. We plan to design and develop a Web Observatory called to observe and understand online social cognition. We propose that the social media on the web is acting as a Marketplace of Opinions where multiple users with differing interests exchange opinions. For a given trending topic on social media, we propose a model to identify the Signature of the trending topic which characterizes the discourse around the topic.
The share of videos on Internet traffic has been growing, e.g., people are now spending a billion hours watching YouTube videos every day. Therefore, understanding how videos capture attention on a global scale is also of growing importance for both research and practice. In online platforms, people can interact with videos in different ways -- there are behaviors of active participation (watching, commenting, and sharing) and that of passive consumption (viewing). In this paper, we take a data-driven approach to studying how human attention is allocated in online videos with respect to both active and passive behaviors. We first investigate the active interaction behaviors by proposing a novel metric to represent the aggregate user engagement on YouTube videos. We show this metric is correlated with video quality, stable over lifetime, and predictable before video's upload. Next, we extend the line of work on modelling video view counts by disentangling the effects of two dominant traffic sources -- related videos and YouTube search. Findings from this work can help content producers to create engaging videos and hosting platforms to optimize advertising strategies, recommender systems, and many more applications.
Epidemic models and Hawkes point process models are two common model classes for information diffusion. Recent work has revealed the equivalence between the two for information diffusion modeling. This allows tools created for one class of models to be applied to another. However, epidemic models and Hawkes point processes can be connected in more ways. This thesis aims to develop a rich set of mathematical equivalences and extensions, and use them to ask and answer questions in social media and beyond. Specifically, we show our plan of generalizing the equivalence of the two model classes by extending it to Hawkes point process models with arbitrary memory kernels. We then outline a rich set of quantities describing diffusion, including diffusion size and extinction probability, introduced in the fields where the models are originally designed. Lastly, we discuss some novel applications of these quantities in a range of problems such as popularity prediction and popularity intervention.
As heterogeneous verticals account for more and more in search engines, users' preference of search results is largely affected by their presentations. Apart from texts, multimedia information such as images and videos has been widely adopted as it makes the search engine result pages (SERPs) more informative and attractive. It is more proper to regard the SERP as an information union, not separate search results because they interact with each other. Considering these changes in search engines, we plan to better exploit the contents of search results displayed on SERPs through deep neural networks and formulate the pagewise optimization of SERPs as a reinforcement learning problem.
Networks can be extracted from a wide range of real systems, such as online social networks, communication networks and biological systems. Detection of cohesive groups in these graphs, primarily based on link information, is the goal of community detection. Community structures emerge when a group of nodes is more likely to be linked to each other than to the rest of the network. The modules found can be disjoint or overlapping. Another relevant feature of networks is the possibility to evolve over time. Furthermore, nodes can have valuable information to improve the community detection process. Hence, in this work we propose to design a soft overlapping community detection method for static and dynamic social networks with node attributes. Preliminary results on a toy network are promising.
Traditionally, recommenders have been based on a single-shot model based on past user actions. Conversational recommenders allow incremental elicitation of user preference by performing user-system dialogue. For example, the systems can ask about user preference toward a feature associated with the items. In such systems, it is important to design an efficient conversation, which minimizes the number of question asked while maximizing the preference information obtained. Therefore, this research is intended to explore possible ways to design a conversational recommender with an efficient preference elicitation. Specifically, it focuses on the order of questions. Also, an idea proposed to suggest answers for each question asked, which can assist users in giving their feedback.
Web-based image search engines differ from Web search engines greatly. The intents or goals behind human interactions with image search engines are different. In image search, users mainly search images instead of Web pages or online services. It is essential to know why people search for images because user satisfaction may vary as intent varies. Furthermore, image search engines show results differently. For example, grid-based placement is used in image search instead of the linear result list, so that users can browse result list both vertically and horizontally. Different user intents and system UIs lead to different user behavior. Thus, it is hard to apply standard user behavior models developed for general Web search to image search. To better understand user intent and behavior in image search scenarios, we plan to conduct the lab-based user study, field study and commercial search log analysis. We then propose user behavior models based on the observation from data analysis to improve the performance of Web image search engines.
SESSION: Tutorial Summaries
As computing systems are more frequently and more actively intervening to improve people's work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. Conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal analysis. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature from statistics, social sciences and machine learning. We will first motivate the use of causal inference through examples in domains such as recommender systems, social media datasets, health, education and governance. To tackle such questions, we will introduce the key ingredient that causal analysis depends on---counterfactual reasoning---and describe the two most popular frameworks based on Bayesian graphical models and potential outcomes. Based on this, we will cover a range of methods suitable for doing causal inference with large-scale online data, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We will also focus on best practices for evaluation and validation of causal inference techniques, drawing from our own experiences. After attending this tutorial, participants will understand the basics of causal inference, be able to appropriately apply the most common causal inference methods, and be able to recognize situations where more complex methods are required.
This hands-on half-day tutorial consists of two sessions. Part~I covers the following topics: Preliminaries; Paired and two-sample t-tests, confidence intervals; One-way ANOVA and two-way ANOVA without replication; Familiwise error rate. Part~II covers the following topics: Tukey's HSD test, simultaneous confidence intervals; Randomisation test and randomised Tukey HSD test; What's wrong with statistical significance tests?; Effect sizes, statistical power; Topic set size design and power analysis; Summary: how to report your results. Participants should have some prior knowledge about the very basics of statistical significance testing and are strongly encouraged to bring a laptop with R already installed. They will learn how to design and conduct statistical significance tests for comparing the mean effectiveness scores of two or more systems appropriately, and to report on the test results in an informative manner.
Matching is the key problem in search and recommendation, that is to measure the relevance of a document to a query or the interest of a user on an item. Previously, machine learning methods have been exploited to address the problem, which learn a matching function from labeled data, also referred to as "learning to match". In recent years, deep learning has been successfully applied to matching and significant progresses have been made. Deep semantic matching models for search and neural collaborative filtering models for recommendation are becoming the state-of-the-art technologies. The key to the success of the deep learning approach is its strong ability in learning of representations and generalization of matching patterns from raw data (e.g., queries, documents, users, and items, particularly in their raw forms). In this tutorial, we aim to give a comprehensive survey on recent progress in deep learning for matching in search and recommendation. Our tutorial is unique in that we try to give a unified view on search and recommendation. In this way, we expect researchers from the two fields can get deep understanding and accurate insight on the spaces, stimulate more ideas and discussions, and promote developments of technologies. The tutorial mainly consists of three parts. Firstly, we introduce the general problem of matching, which is fundamental in both search and recommendation. Secondly, we explain how traditional machine learning techniques are utilized to address the matching problems in search and recommendation. Lastly, we elaborate how deep learning can be effectively used to solve the matching problems in both tasks.
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial aims to present an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness-first" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice, by presenting case studies from different technology companies. Based on our experiences in industry, we will identify open problems and research challenges for the data mining / machine learning community.
The explosive growth of fake news and its erosion to democracy, justice, and public trust increased the demand for fake news detection. As an interdisciplinary topic, the study of fake news encourages a concerted effort of experts in computer and information science, political science, journalism, social science, psychology, and economics. A comprehensive framework to systematically understand and detect fake news is necessary to attract and unite researchers in related areas to conduct research on fake news. This tutorial aims to clearly present (1) fake news research, its challenges, and research directions; (2) a comparison between fake news and other related concepts (e.g., rumors); (3) the fundamental theories developed across various disciplines that facilitate interdisciplinary research; (4) various detection strategies unified under a comprehensive framework for fake news detection; and (5) the state-of-the-art datasets, patterns, and models. We present fake news detection from various perspectives, which involve news content and information in social networks, and broadly adopt techniques in data mining, machine learning, natural language processing, information retrieval and social search. Facing the upcoming 2020 U.S. presidential election, challenges for automatic, effective and efficient fake news detection are also clarified in this tutorial.
The HS2019 tutorial will cover topics from an area of information retrieval (IR) with significant societal impact --- health search. Whether it is searching patient records, helping medical professionals find best-practice evidence, or helping the public locate reliable and readable health information online, health search is a challenging area for IR research with an actively growing community and many open problems. This tutorial will provide attendees with a full stack of knowledge on health search, from understanding users and their problems to practical, hands-on sessions on current tools and techniques, current campaigns and evaluation resources, as well as important open questions and future directions.
Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
SESSION: Workshop Summaries
The 1st International Workshop on Context-Aware Recommendation Systems with Big Data Analytics (CARS-BDA)
With the explosive growth of online service platforms, increasing number of people and enterprises are doing everything online. In order for organizations, governments, and individuals to understand their users, and promote their products or services, it is necessary for them to analyse big data and recommend the media or online services in real time. Effective recommendation of items of interest to consumers has become critical for enterprises in domains such as retail, e-commerce, and online media. Driven by the business successes, academic research in this field has also been active for many years. Through many scientific breakthroughs have been achieved, there are still tremendous challenges in developing effective and scalable recommendation systems for real-world industrial applications. Existing solutions focus on recommending items based on pre-set contexts, such as time, location, weather etc. The big data sizes and complex contextual information add further challenges to the deployment of advanced recommender systems. This workshop aims to bring together researchers with wide-ranging backgrounds to identify important research questions, to exchange ideas from different research disciplines, and, more generally, to facilitate discussion and innovation in the area of context-aware recommender systems and big data analytics.
Matching between two information objects is the core of many different information retrieval (IR) applications including Web search, question answering, and recommendation. Recently, deep learning methods have yielded immense success in speech recognition, computer vision, and natural language processing, significantly advancing state-of-the-art of these areas. In the IR community, deep learning has also attracted much attention, and researchers have proposed a large number of deep matching models to tackle the matching problem for different IR applications. Despite the fact that deep matching models have gained significant progress in these areas, there are still many challenges to be addressed when applying these models to real IR scenarios. In this workshop, we focus on the applicability of deep matching models to practical applications. We aim to discuss the issues of applying deep matching models to production systems, as well as to shed some light on the fundamental characteristics of different matching tasks in IR. website : https://wsdm2019-dapa.github.io/index.html
The first workshop on Interactive Data Mining is held in Melbourne, Australia, on February 15, 2019 and is co-located with 12th ACM International Conference on Web Search and Data Mining (WSDM 2019). The goal of this workshop is to share and discuss research and projects that focus on interaction with and interactivity of data mining systems. The program includes invited speaker, presentation of research papers, and a discussion session.
The task intelligence workshop at the 2019 ACM Web Search and Data Mining (WSDM) conference comprised a mixture of research paper presentations, reports from data challenge participants, invited keynote(s) on broad topics related to tasks, and a workshop-wide discussion about task intelligence and its implications for system development.