WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data MiningFull Citation in the ACM Digital Library
SESSION: Keynote Talks
Towards Autonomous Driving
The automotive and transportation industry is going through a tectonic shift in the next decade with the advent of Connectivity, Automation, Sharing, and Electrification (CASE). Autonomous driving presents a historical opportunity to transform the academic, technological, and industrial landscape with advanced sensing and actuation, high definition mapping, new machine learning algorithms, smart planning and control, increasing computing powers, and new infrastructure with 5G, cloud and edge computing. Indeed, we have witnessed unprecedented innovation and activities in the past five years in R&D, investment, joint ventures, road tests and commercial trials, from auto makers, tier-ones, and new forces from the internet and high-tech industries.
In this talk, I will speak about this historical opportunity and challenges from technological, industrial and policy perspectives. I will address some of the core controversial and critical issues in the advancement of autonomous driving: open vs closed, Lidar vs cameras, progressive L2-L3-L4 vs new L4, autonomous capabilities vs V2X, Robotaxi vs vertical, China vs global, automakers vs new players, and the evolutional path and end game.
Beyond-Accuracy Goals, Again
Improving the performance of information retrieval systems tends to be narrowly scoped. Often, better prediction performance is considered the only metric of improvement. As a result, work on improving information retrieval methods usually focuses on im- proving the methods' accuracy. Such a focus is myopic. Instead, as researchers and practitioners we should adopt a richer perspective measuring the performance of information retrieval systems. I am not the first to make this point (see, e.g., ), but I want to highlight dimensions that broaden the scope considered so far and offer a number of examples to illustrate what this would mean for our research agendas.
First, trustworthiness is a prerequisite for people, organizations, and societies to use AI-based, and, especially, machine learning- based systems in general, and information retrieval systems in particular. Trust can be gained in an intrinsic manner by revealing the inner workings of an AI-based system, i.e., through explainability. Or it can be gained extrinsically by showing, in a principled or empirical manner, that a system upholds verifiable guarantees. Such guarantees should obtained for the following dimensions (at a minimum): (i) accuracy, including well-defined and explained contexts of usage; (ii) reliability, including exhibiting parity with respect to sensitive attributes; (iii) repeatable and reproducible results, including audit trails; (iv) resilience to adversarial examples, distributional shifts; and (v) safety, including privacy-preserving search and recommendation.
Second, in information retrieval, our experiments are mostly conducted in controlled laboratory environments. Extrapolating this information to evaluate the real-world effects often remains a challenge. This is particularly true when measuring the impact of information retrieval systems across broader scales, both temporally and spatially. Conducting controlled experimental trials for evaluating real-world impacts of information retrieval systems can result in depicting a snapshot situation, where systems are tailored towards that specific environment. As society is constantly changing, the requirements set for information retrieval systems are changing as well, resulting in short-term and long-term feedback loops with interactions between society and information retrieval systems.
Learning to Understand Audio and Multimodal Content
Music, podcasts and audiobooks are rich audio content types with strong listener engagement. Search and recommendation across these content types can be greatly enhanced with a deep understanding of their content; across audio, text, and other multimodal content. In this talk, I discuss some of the challenges and opportunities in understanding this content. This deep understanding of content enables us to delight our users and expand the reach of our content creators. As part of enabling wider academic research into podcast content understanding, Spotify Research  has released a podcast dataset  with 120,000 hours of podcasts in English  and Portuguese .
SESSION: Session 1: Social Issues in Data Mining
Local Edge Dynamics and Opinion Polarization
The proliferation of social media platforms, recommender systems, and their joint societal impacts have prompted significant interest in opinion formation and evolution within social networks. We study how local edge dynamics can drive opinion polarization. In particular, we introduce a variant of the classic Friedkin-Johnsen opinion dynamics, augmented with a simple time-evolving network model. Edges are iteratively added or deleted according to simple rules, modeling decisions based on individual preferences and network recommendations.
Via simulations on synthetic and real-world graphs, we find that the combined presence of two dynamics gives rise to high polarization: 1) confirmation bias -- i.e., the preference for nodes to connect to other nodes with similar expressed opinions and 2) friend-of-friend link recommendations, which encourage new connections between closely connected nodes. We show that our model is tractable to theoretical analysis, which helps explain how these local dynamics erode connectivity across opinion groups, affecting polarization and a related measure of disagreement across edges. Finally, we validate our model against real-world data, showing that our edge dynamics drive the structure of arbitrary graphs, including random graphs, to more closely resemble real social networks.
Our code and supplemental materials are available at https://github.com/adamlechowicz/opinion-polarization/.
Never Too Late to Learn: Regularizing Gender Bias in Coreference Resolution
Leveraging pre-trained language models (PLMs) as initializers for efficient transfer learning has become a universal approach for text-related tasks. However, the models not only learn the language understanding abilities but also reproduce prejudices for certain groups in the datasets used for pre-training. Recent studies show that the biased knowledge acquired from the datasets affects the model predictions on downstream tasks. In this paper, we mitigate and analyze the gender biases in PLMs with coreference resolution, which is one of the natural language understanding (NLU) tasks. PLMs exhibit two types of gender biases: stereotype and skew. The primary causes for the biases are the imbalanced datasets with more male examples and the stereotypical examples on gender roles. While previous studies mainly focused on the skew problem, we aim to mitigate both gender biases in PLMs while maintaining the model's original linguistic capabilities. Our method employs two regularization terms, Stereotype Neutralization (SN) and Elastic Weight Consolidation (EWC). The models trained with the methods show to be neutralized and reduce the biases significantly on the WinoBias dataset compared to the public BERT. We also invented a new gender bias quantification metric called the Stereotype Quantification (SQ) score. In addition to the metrics, embedding visualizations were used to interpret how our methods have successfully debiased the models.
Marginal-Certainty-Aware Fair Ranking Algorithm
Ranking systems are ubiquitous in modern Internet services, including online marketplaces, social media, and search engines. Traditionally, ranking systems only focus on how to get better relevance estimation. When relevance estimation is available, they usually adopt a user-centric optimization strategy where ranked lists are generated by sorting items according to their estimated relevance. However, such user-centric optimization ignores the fact that item providers also draw utility from ranking systems. It has been shown in existing research that such user-centric optimization will cause much unfairness to item providers, followed by unfair opportunities and unfair economic gains for item providers.
To address ranking fairness, many fair ranking methods have been proposed. However, as we show in this paper, these methods could be suboptimal as they directly rely on the relevance estimation without being aware of the uncertainty (i.e., variance of the estimated relevance). To address this uncertainty, we propose a novel Marginal-Certainty-aware Fair algorithm named MCFair. MCFair jointly optimizes fairness and user utility, while relevance estimation is constantly updated in an online manner. In MCFair, we first develop a ranking objective that includes uncertainty, fairness, and user utility. Then we directly use the gradient of the ranking objective as the ranking score. We theoretically prove that MCFair based on gradients is optimal for the aforementioned ranking objective. Empirically, we find that on semi-synthesized datasets, MCFair is effective and practical and can deliver superior performance compared to state-of-the-art fair ranking methods. To facilitate reproducibility, we release our code.
Beyond Digital "Echo Chambers": The Role of Viewpoint Diversity in Political Discussion
Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming---siloed in so called "echo chambers" of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems and adapt them to the context of social media conversations. This is the first study to apply these two metrics (Representation and Fragmentation) to real world data and to consider the implications for online conversations specifically. We apply these measures to two topics---daylight savings time (DST), which serves as a control, and the more politically polarized topic of immigration. We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST. Further, we find that while pro-immigrant views receive consistent pushback on the platform, anti-immigrant views largely operate within echo chambers. We observe less severe yet similar patterns for DST. Taken together, Representation and Fragmentation paint a meaningful and important new picture of viewpoint diversity.
BLADE: Biased Neighborhood Sampling based Graph Neural Network for Directed Graphs
Directed graphs are ubiquitous and have applications across multiple domains including citation, website, social, and traffic networks. Yet, majority of research involving graph neural networks (GNNs) focus on undirected graphs. In this paper, we deal with the problem of node recommendation in directed graphs. Specifically, given a directed graph and query node as input, the goal is to recommend top- nodes that have a high likelihood of a link with the query node. Here we propose BLADE, a novel GNN to model directed graphs. In order to jointly capture link likelihood and link direction, we employ an asymmetric loss function and learn dual embeddings for each node, by appropriately aggregating features from its neighborhood. In order to achieve optimal performance on both low and high-degree nodes, we employ a biased neighborhood sampling scheme that generates locally varying neighborhoods which differ based on a node's connectivity structure. Extensive experimentation on several open-source and proprietary directed graphs show that BLADE outperforms state-of-the-art baselines by 6-230% in terms of HitRate and MRR for the node recommendation task and 10.5% in terms of AUC for the link direction prediction task. We perform ablation study to accentuate the importance of biased neighborhood sampling employed in generating higher quality recommendations for both low-degree and high-degree query nodes. Further, BLADE delivers significant improvement in revenue and sales as measured through an A/B experiment.
Mining User-aware Multi-relations for Fake News Detection in Large Scale Online Social Networks
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., news layer and user layer) to extract multi-relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process conducts on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake in large scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake which outperforms all baselines, and the users' credibility signals learned by interaction relation can notably improve the performance of our model.
SESSION: Session 2: Recommender Systems I
Simplifying Graph-based Collaborative Filtering for Recommendation
Graph Convolutional Networks (GCNs) are a popular type of machine learning models that use multiple layers of convolutional aggregation operations and non-linear activations to represent data. Recent studies apply GCNs to Collaborative Filtering (CF)-based recommender systems (RSs) by modeling user-item interactions as a bipartite graph and achieve superior performance. However, these models face difficulty in training with non-linear activations on large graphs. Besides, most GCN-based models could not model deeper layers due to the over-smoothing effect with the graph convolution operation. In this paper, we improve the GCN-based CF models from two aspects. First, we remove non-linearities to enhance recommendation performance, which is consistent with the theories in simple graph convolutional networks. Second, we obtain the initialization of the embedding for each node in the graph by computing the network embedding on the condensed graph, which alleviates the over smoothing problem in graph convolution aggregation operation with sparse interaction data. The proposed model is a linear model that is easy to train, scalable to large datasets, and shown to yield better efficiency and effectiveness on four real datasets.
Self-Supervised Group Graph Collaborative Filtering for Group Recommendation
Nowadays, it is more and more convenient for people to participate in group activities. Therefore, providing some recommendations to groups of individuals is indispensable. Group recommendation is the task of suggesting items or events for a group of users in social networks or online communities. In this work, we study group recommendation in a particular scenario, namely occasional group recommendation, which has few or no historical directly interacted items. Existing group recommendation methods mostly adopt attention-based preference aggregation strategies to capture group preferences. However, these models either ignore the complex high-order interactions between groups, users and items or greatly reduce the efficiency by introducing complex data structures. Moreover, occasional group recommendation suffers from the problem of data sparsity due to the lack of historical group-item interactions. In this work, we focus on addressing the aforementioned challenges and propose a novel group recommendation model called Self-Supervised Group Graph Collaborative Filtering (SGGCF). The goal of the model is capturing the high-order interactions between users, items and groups and alleviating the data sparsity issue in an efficient way. First, we explicitly model the complex relationships as a unified user-centered heterogeneous graph and devise a base group recommendation model. Second, we explore self-supervised learning on the graph with two kinds of contrastive learning module to capture the implicit relations between groups and items. At last, we treat the proposed contrastive learning loss as supplementary and apply a multi-task strategy to jointly train the BPR loss and the proposed contrastive learning loss. We conduct extensive experiments on three real-world datasets, and the experimental results demonstrate the superiority of our proposed model in comparison to the state-of-the-art baselines.
Towards Universal Cross-Domain Recommendation
In industry, web platforms such as Alibaba and Amazon often provide diverse services for users. Unsurprisingly, some developed services are data-rich, while some newly started services are data-scarce accompanied by severe data sparsity and cold-start problems. To alleviate the above problems and incubate new services easily, cross-domain recommendation (CDR) has attracted much attention from industrial and academic researchers. Generally, CDR aims to transfer rich user-item interaction information from related source domains (e.g., developed services) to boost recommendation quality of target domains (e.g., newly started services). For different scenarios, previous CDR methods can be roughly divided into two branches: (1) Data sparsity CDR fulfills user preference aided by other domain data to make intra-domain recommendations for users with few interactions, (2) Cold-start CDR projects user preference from other domain to make inter-domain recommendations for users with none interactions. In the past years, many outstanding CDR methods are emerged, however, to the best of our knowledge, none of them attempts to solve the two branches simultaneously. In this paper, we provide a unified framework, namely UniCDR, which can universally model different CDR scenarios by transferring the domain-shared information. Extensive experiments under the above 2 branches on 4 CDR scenarios and 6 public and large-scale industrial datasets demonstrate the effectiveness and universal ability of our UniCDR.
A Personalized Neighborhood-based Model for Within-basket Recommendation in Grocery Shopping
Users of online shopping platforms typically purchase multiple items at a time in the form of a shopping basket. Personalized within-basket recommendation is the task of recommending items to complete an incomplete basket during a shopping session. In contrast to the related task of session-based recommendation, where the goal is to complete an ongoing anonymous session, we have access to the shopping history of the user in within-basket recommendation. Previous studies have shown the superiority of neighborhood-based models for session-based recommendation and the importance of personal history in the grocery shopping domain. But their applicability in within-basket recommendation remains unexplored.
We propose PerNIR, a neighborhood-based model that explicitly models the personal history of users for within-basket recommendation in grocery shopping. The main novelty of PerNIR is in modeling the short-term interests of users, which are represented by the current basket, as well as their long-term interest, which is reflected in their purchasing history. In addition to the personal history, user neighbors are used to capture the collaborative purchase behavior. We evaluate PerNIR on two public and proprietary datasets. The experimental results show that it outperforms 10 state-of-the-art competitors with a significant margin, i.e., with gains of more than 12% in terms of hit rate over the second best performing approach. Additionally, we showcase an optimized implementation of our method, which computes recommendations fast enough for real-world production scenarios.
Disentangled Negative Sampling for Collaborative Filtering
Negative sampling is essential for implicit collaborative filtering to generate negative samples from massive unlabeled data. Unlike existing strategies that consider items as a whole when selecting negative items, we argue that normally user interactions are mainly driven by some relevant, but not all, factors of items, leading to a new direction of negative sampling. In this paper, we introduce a novel disentangled negative sampling (DENS) method. We first disentangle the relevant and irrelevant factors of positive and negative items using a hierarchical gating module. Next, we design a factor-aware sampling strategy to identify the best negative samples by contrasting the relevant factors while keeping irrelevant factors similar. To ensure the credibility of the disentanglement, we propose to adopt contrastive learning and introduce four pairwise contrastive tasks, which enable to learn better disentangled representations of the relevant and irrelevant factors and remove the dependency on ground truth. Extensive experiments on five real-world datasets demonstrate the superiority of DENS against several state-of-the-art competitors, achieving over 7% improvement over the strongest baseline in terms of Recall@20 and NDCG@20. Our code is publically available at https://github.com/Riwei-HEU/DENS .
Relation Preference Oriented High-order Sampling for Recommendation
The introduction of knowledge graphs (KG) into recommendation systems (RS) has been proven to be effective because KG introduces a variety of relations between items. In fact, users have different relation preferences depending on the relationship in KG. Existing GNN-based models largely adopt random neighbor sampling strategies to process propagation; however, these models cannot aggregate biased relation preference local information for a specific user, and thus cannot effectively reveal the internal relationship between users' preferences. This will reduce the accuracy of recommendations, while also limiting the interpretability of the results.
Therefore, we propose a Relation Preference oriented High-order Sampling (RPHS) model to dynamically sample subgraphs based on relation preference and hard negative samples for user-item pairs. We design a path sampling strategy based on relation preference, which can encode the critical paths between specific user-item pairs to sample the paths in the high-order message passing subgraphs. Next, we design a mixed sampling strategy and define a new propagation operation to further enhance RPHS's ability to distinguish negative signals. Through the above sampling strategies, our model can better aggregate local relation preference information and reveal the internal relationship between users' preferences. Experiments show that our model outperforms the state-of-the-art models on three datasets by 14.98%, 5.31%, and 8.65%, and also performs well in terms of interpretability. The codes are available at https://github.com/RPHS/RPHS.git
SESSION: Session 3: Graph Neural Networks
Minimum Entropy Principle Guided Graph Neural Networks
Graph neural networks (GNNs) are now the mainstream method for mining graph-structured data and learning low-dimensional node- and graph-level embeddings to serve downstream tasks. However, limited by the bottleneck of interpretability that deep neural networks present, existing GNNs have ignored the issue of estimating the appropriate number of dimensions for the embeddings. Hence, we propose a novel framework called Minimum Graph Entropy principle-guided Dimension Estimation, i.e. MGEDE, that learns the appropriate embedding dimensions for both node and graph representations. In terms of node-level estimation, a minimum entropy function that counts both structure and attribute entropy, appraises the appropriate number of dimensions. In terms of graph-level estimation, each graph is assigned a customized embedding dimension from a candidate set based on the number of dimensions estimated for the node-level embeddings. Comprehensive experiments with node and graph classification tasks and nine benchmark datasets verify the effectiveness and generalizability of MGEDE.
Learning to Distill Graph Neural Networks
Graph Neural Networks (GNNs) can effectively capture both the topology and attribute information of a graph, and have been extensively studied in many domains. Recently, there is an emerging trend that equips GNNs with knowledge distillation for better efficiency or effectiveness. However, to the best of our knowledge, existing knowledge distillation methods applied on GNNs all employed predefined distillation processes, which are controlled by several hyper-parameters without any supervision from the performance of distilled models. Such isolation between distillation and evaluation would lead to suboptimal results. In this work, we aim to propose a general knowledge distillation framework that can be applied on any pretrained GNN models to further improve their performance. To address the isolation problem, we propose to parameterize and learn distillation processes suitable for distilling GNNs. Specifically, instead of introducing a unified temperature hyper-parameter as most previous work did, we will learn node-specific distillation temperatures towards better performance of distilled models. We first parameterize each node's temperature by a function of its neighborhood's encodings and predictions, and then design a novel iterative learning process for model distilling and temperature learning. We also introduce a scalable variant of our method to accelerate model training. Experimental results on five benchmark datasets show that our proposed framework can be applied on five popular GNN models and consistently improve their prediction accuracies with 3.12% relative enhancement on average. Besides, the scalable variant enables 8 times faster training speed at the cost of 1% prediction accuracy.
MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution
Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors. Recently, some studies have discussed the importance of modeling neighborhood distribution on the graph. However, most existing GNNs aggregate neighbors' features through single statistic (e.g., mean, max, sum), which loses the information related to neighbor's feature distribution and therefore degrades the model performance. In this paper, inspired by the method of moment in statistical theory, we propose to model neighbor's feature distribution with multi-order moments. We design a novel GNN model, namely Mix-Moment Graph Neural Network (MM-GNN), which includes a Multi-order Moment Embedding (MME) module and an Element-wise Attention-based Moment Adaptor module. MM-GNN first calculates the multi-order moments of the neighbors for each node as signatures, and then use an Element-wise Attention-based Moment Adaptor to assign larger weights to important moments for each node and update node representations. We conduct extensive experiments on 15 real-world graphs (including social networks, citation networks and web-page networks etc.) to evaluate our model, and the results demonstrate the superiority of MM-GNN over existing state-of-the-art models.
Global Counterfactual Explainer for Graph Neural Networks
Graph neural networks (GNNs) find applications in various domains such as computational biology, natural language processing, and computer security. Owing to their popularity, there is an increasing need to explain GNN predictions since GNNs are black-box machine learning models. One way to address this is counterfactual reasoning where the objective is to change the GNN prediction by minimal changes in the input graph. Existing methods for counterfactual explanation of GNNs are limited to instance-specific local reasoning. This approach has two major limitations of not being able to offer global recourse policies and overloading human cognitive ability with too much information. In this work, we study the global explainability of GNNs through global counterfactual reasoning. Specifically, we want to find a small set of representative counterfactual graphs that explains all input graphs. Towards this goal, we propose GCFExplainer, a novel algorithm powered by vertex-reinforced random walks on an edit map of graphs with a greedy summary. Extensive experiments on real graph datasets show that the global explanation from GCFExplainer provides important high-level insights of the model behavior and achieves a 46.9% gain in recourse coverage and a 9.5% reduction in recourse cost compared to the state-of-the-art local counterfactual explainers.
Effective Graph Kernels for Evolving Functional Brain Networks
The graph kernel of the functional brain network is an effective method in the field of neuropsychiatric disease diagnosis like Alzheimer's Disease (AD). The traditional static brain networks cannot reflect dynamic changes of brain activities, but evolving brain networks, which are a series of brain networks over time, are able to seize such dynamic changes. As far as we know, the graph kernel method is effective for calculating the differences among networks. Therefore, it has a great potential to understand the dynamic changes of evolving brain networks, which are a series of chronological differences. However, if the conventional graph kernel methods which are built for static networks are applied directly to evolving networks, the evolving information will be lost and accurate diagnostic results will be far from reach. We propose an effective method, called Global Matching based Graph Kernels (GM-GK), which captures dynamic changes of evolving brain networks and significantly improves classification accuracy. At the same time, in order to reflect the natural properties of the brain activity of the evolving brain network neglected by the GM-GK method, we also propose a Local Matching based Graph Kernel (LM-GK), which allows the order of the evolving brain network to be locally fine-tuned. Finally, the experiments are conducted on real data sets and the results show that the proposed methods can significantly improve the neuropsychiatric disease diagnostic accuracy.
Self-Supervised Graph Structure Refinement for Graph Neural Networks
Graph structure learning (GSL), which aims to learn the adjacency matrix for graph neural networks (GNNs), has shown great potential in boosting the performance of GNNs. Most existing GSL works apply a joint learning framework where the estimated adjacency matrix and GNN parameters are optimized for downstream tasks. However, as GSL is essentially a link prediction task, whose goal may largely differ from the goal of the downstream task. The inconsistency of these two goals limits the GSL methods to learn the potential optimal graph structure. Moreover, the joint learning framework suffers from scalability issues in terms of time and space during the process of estimation and optimization of the adjacency matrix. To mitigate these issues, we propose a graph structure refinement (GSR) framework with a pretrain-finetune pipeline. Specifically, The pre-training phase aims to comprehensively estimate the underlying graph structure by a multi-view contrastive learning framework with both intra- and inter-view link prediction tasks. Then, the graph structure is refined by adding and removing edges according to the edge probabilities estimated by the pre-trained model. Finally, the fine-tuning GNN is initialized by the pre-trained model and optimized toward downstream tasks. With the refined graph structure remaining static in the fine-tuning space, GSR avoids estimating and optimizing graph structure in the fine-tuning phase which enjoys great scalability and efficiency. Moreover, the fine-tuning GNN is boosted by both migrating knowledge and refining graphs. Extensive experiments are conducted to evaluate the effectiveness (best performance on six benchmark datasets), efficiency, and scalability (13.8 times faster using 32.8% GPU memory compared to the best GSL baseline on Cora) of the proposed model.
SESSION: Session 4: Best of WSDM 2023
Efficiently Leveraging Multi-level User Intent for Session-based Recommendation via Atten-Mixer Network
Session-based recommendation (SBR) aims to predict the user's next action based on short and dynamic sessions. Recently, there has been an increasing interest in utilizing various elaborately designed graph neural networks (GNNs) to capture the pair-wise relationships among items, seemingly suggesting the design of more complicated models is the panacea for improving the empirical performance. However, these models achieve relatively marginal improvements with exponential growth in model complexity. In this paper, we dissect the classical GNN-based SBR models and empirically find that some sophisticated GNN propagations are redundant, given the readout module plays a significant role in GNN-based models. Based on this observation, we intuitively propose to remove the GNN propagation part, while the readout module will take on more responsibility in the model reasoning process. To this end, we propose the Multi-Level Attention Mixture Network (Atten-Mixer), which leverages both concept-view and instance-view readouts to achieve multi-level reasoning over item transitions. As simply enumerating all possible high-level concepts is infeasible for large real-world recommender systems, we further incorporate SBR-related inductive biases, i.e., local invariance and inherent priority to prune the search space. Experiments on three benchmarks demonstrate the effectiveness and efficiency of our proposal. We also have already launched the proposed techniques to a large-scale e-commercial online service since April 2021, with significant improvements of top-tier business metrics demonstrated in the online experiments on live traffic.
Learning Stance Embeddings from Signed Social Graphs
A challenge in social network analysis, is understanding the position, or stance, of people on a large set of topics. While past work has modeled (dis)agreement in social networks using signed graphs, these approaches have not modeled agreement patterns across a range of correlated topics. For instance, disagreement on one topic may make disagreement (or agreement) more likely for related topics. Recognizing topics influence agreement and disagreement, we propose the Stance Embeddings Model (SEM), which jointly learns embeddings for each user and topic in signed social graphs with distinct edge types for each topic. By jointly learning user and topic embeddings, SEM can perform cold-start topic stance detection, predicting the stance of a user on topics for which we have not observed their engagement. We demonstrate the effectiveness of SEM using two large-scale Twitter signed graph datasets that we open-source. One dataset, TwitterSG, labels (dis)agreements using engagements between users via tweets to derive topic-informed, signed edges. The other, BirdwatchSG, leverages community reports on misinformation and misleading content. On TwitterSG and BirdwatchSG, SEM shows a 39% and 26% error reduction respectively against strong topic-agnostic baselines.
Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning
We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome the challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the need to store infinitely many policies in previous methods to only constantly many policies, which achieves nearly optimal policy efficiency, making it practical and favorable for industrial usage. We further show that this method is guaranteed to converge to the optimal policy, which cannot be achieved by previous value-based reinforcement learning methods for marketing budget allocation. Our experiments on a large-scale marketing campaign with tens-of-millions users and more than one billion budget verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
Knowledge Enhancement for Contrastive Multi-Behavior Recommendation
A well-designed recommender system can accurately capture the attributes of users and items, reflecting the unique preferences of individuals. Traditional recommendation techniques usually focus on modeling the singular type of behaviors between users and items. However, in many practical recommendation scenarios (e.g., social media, e-commerce), there exist multi-typed interactive behaviors in user-item relationships, such as click, tag-as-favorite, and purchase in online shopping platforms. Thus, how to make full use of multi-behavior information for recommendation is of great importance to the existing system, which presents challenges in two aspects that need to be explored: (1) Utilizing users' personalized preferences to capture multi-behavioral dependencies; (2) Dealing with the insufficient recommendation caused by sparse supervision signal for target behavior. In this work, we propose a Knowledge Enhancement Multi-Behavior Contrastive Learning Recommendation (KMCLR) framework, including two Contrastive Learning tasks and three functional modules to tackle the above challenges, respectively. In particular, we design the multi-behavior learning module to extract users' personalized behavior information for user-embedding enhancement, and utilize knowledge graph in the knowledge enhancement module to derive more robust knowledge-aware representations for items. In addition, in the optimization stage, we model the coarse-grained commonalities and the fine-grained differences between multi-behavior of users to further improve the recommendation effect. Extensive experiments and ablation tests on the three real-world datasets indicate our KMCLR outperforms various state-of-the-art recommendation methods and verify the effectiveness of our method.
SESSION: Session 5: Recommender Systems II
Learning to Distinguish Multi-User Coupling Behaviors for TV Recommendation
This paper is concerned with TV recommendation, where one major challenge is the coupling behavior issue that the behaviors of multiple users are coupled together and not directly distinguishable because the users share the same account. Unable to identify the current watching user and use the coupling behaviors directly could lead to sub-optimal recommendation results due to the noise introduced by the behaviors of other users. Most existing methods deal with this issue either by unsupervised clustering algorithms or depending on latent user representation learning with strong assumptions. However, they neglect to sophisticatedly model the current session behaviors, which carry the information of user identification. Another critical limitation of the existing models is the lack of supervision signal on distinguishing behaviors because they solely depend on the final click label, which is insufficient to provide effective supervision. To address the above problems, we propose the Coupling Sequence Model (COSMO) for TV recommendation. In COSMO, we design a session-aware co-attention mechanism that uses both the candidate item and session behaviors as the query to attend to the historical behaviors in a fine-grained manner. Furthermore, we propose to use the data of accounts with multiple devices (e.g., families with various TV sets), which means the behaviors of one account are generated on different devices. We regard the device information as weak supervision and propose a novel pair-wise attention loss for learning to distinguish the coupling behaviors. Extensive offline experiments and online A/B tests over a commercial TV service provider demonstrate the efficacy of COSMO compared to the existing models.
Range Restricted Route Recommendation Based on Spatial Keyword
In this paper, we focus on a new route recommendation problem, i.e., when a user gives a keyword and range constraint, the route that contains the maximum number of POIs tagged with the keyword or similar POIs in the range will be returned for him. This is a practical problem when people want to explore a place, e.g., find a route within 2 km containing as many clothing stores as possible. To solve the problem, we first calculate the score of each edge in road networks based on the number and similarity of POIs. Then, we reformulate the problem into finding the path in a graph with the maximum score within the distance constraint problem, which is proved NP-hard. Given this, we not only propose an exact branch and bound (BnB) algorithm, but also devise a more efficient top-k based network expansion (k-NE) algorithm to find the near-optimal solution. Extensive experiments on real datasets not only verify the effectiveness of the proposed route recommendation algorithm, but also show that the efficiency and accuracy of k-NE algorithm are completely acceptable.
Meta Policy Learning for Cold-Start Conversational Recommendation
Conversational recommender systems (CRS) explicitly solicit users' preferences for improved recommendations on the fly. Most existing CRS solutions count on a single policy trained by reinforcement learning for a population of users. However, for users new to the system, such a global policy becomes ineffective to satisfy them, i.e., the cold-start challenge. In this paper, we study CRS policy learning for cold-start users via meta reinforcement learning. We propose to learn a meta policy and adapt it to new users with only a few trials of conversational recommendations. To facilitate fast policy adaptation, we design three synergetic components. Firstly, we design a meta-exploration policy dedicated to identifying user preferences via a few exploratory conversations, which accelerates personalized policy adaptation from the meta policy. Secondly, we adapt the item recommendation module for each user to maximize the recommendation quality based on the collected conversation states during conversations. Thirdly, we propose a Transformer-based state encoder as the backbone to connect the previous two components. It provides comprehensive state representations by modeling complicated relations between positive and negative feedback during the conversation. Extensive experiments on three datasets demonstrate the advantage of our solution in serving new users, compared with a rich set of state-of-the-art CRS solutions.
Variational Reasoning over Incomplete Knowledge Graphs for Conversational Recommendation
Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only few of existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus naturally accompanied with CRSs to enhance the incomplete KGs; and perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs refactor. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRSs datasets. Experimental results confirm the effectiveness of our proposed method.
A Causal View for Item-level Effect of Recommendation on User Preference
Recommender systems not only serve users but also affect user preferences through personalized recommendations. Recent researches investigate the effects of the entire recommender system on user preferences, i.e., system-level effects, and find that recommendations may lead to problems such as echo chambers and filter bubbles. To properly alleviate the problems, it is necessary to estimate the effects of recommending a specific item on user preferences, i.e., item-level effects. For example, by understanding whether recommending an item aggravates echo chambers, we can better decide whether to recommend it or not.
This work designs a method to estimate the item-level effects from the causal perspective. We resort to causal graphs to characterize the average treatment effect of recommending an item on the preference of another item. The key to estimating the effects lies in mitigating the confounding bias of time and user features without the costly randomized control trials. Towards the goal, we estimate the causal effects from historical observations through a method with stratification and matching to address the two confounders, respectively. Nevertheless, directly implementing stratification and matching is intractable, which requires high computational cost due to the large sample size. We thus propose efficient approximations of stratification and matching to reduce the computation complexity. Extensive experimental results on two real-world datasets validate the effectiveness and efficiency of our method. We also show a simple example of using the item-level effects to provide insights for mitigating echo chambers.
Counterfactual Collaborative Reasoning
Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how the two reasoning abilities can be jointly modeled to enhance both accuracy and explainability of machine learning models. More specifically, by integrating two important types of reasoning ability--counterfactual reasoning and (neural) logical reasoning--we propose Counterfactual Collaborative Reasoning (CCR), which conducts counterfactual logic reasoning to improve the performance. In particular, we use recommender system as an example to show how CCR alleviate data scarcity, improve accuracy and enhance transparency. Technically, we leverage counterfactual reasoning to generate "difficult" counterfactual training examples for data augmentation, which--together with the original training examples--can enhance the model performance. Since the augmented data is model irrelevant, they can be used to enhance any model, enabling the wide applicability of the technique. Besides, most of the existing data augmentation methods focus on "implicit data augmentation" over users' implicit feedback, while our framework conducts "explicit data augmentation" over users explicit feedback based on counterfactual logic reasoning. Experiments on three real-world datasets show that CCR achieves better performance than non-augmented models and implicitly augmented models, and also improves model transparency by generating counterfactual explanations.
SESSION: Session 6: Learning
NGAME: Negative Mining-aware Mini-batching for Extreme Classification
Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods, allowing them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper notices that memory overheads of popular negative mining techniques often force mini-batch sizes to remain small and slow training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches offering significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through-rates. Code for NGAME is available at https://github.com/Extreme-classification/ngame
Adversarial Autoencoder for Unsupervised Time Series Anomaly Detection and Interpretation
In many complex systems, devices are typically monitored and generating massive multivariate time series. However, due to the complex patterns and little useful labeled data, it is a great challenge to detect anomalies from these time series data. Existing methods either rely on less regularizations, or require a large number of labeled data, leading to poor accuracy in anomaly detection. To overcome the limitations, in this paper, we propose an adversarial autoencoder anomaly detection and interpretation framework named DAEMON, which performs robustly for various datasets. The key idea is to use two discriminators to adversarially train an autoencoder to learn the normal pattern of multivariate time series, and thereafter use the reconstruction error to detect anomalies. The robustness of DAEMON is guaranteed by the regularization of hidden variables and reconstructed data using the adversarial generation method. An unsupervised approach used to detect anomalies is proposed. Moreover, in order to help operators better diagnose anomalies, DAEMON provides anomaly interpretation by computing the gradients of anomalous data. An extensive empirical study on real data offers evidence that the framework is capable of outperforming state-of-the-art methods in terms of the overall F1-score and interpretation accuracy for time series anomaly detection.
Few-shot Node Classification with Extremely Weak Supervision
Few-shot node classification aims at classifying nodes with limited labeled nodes as references. Recent few-shot node classification methods typically learn from classes with abundant labeled nodes (i.e., meta-training classes) and then generalize to classes with limited labeled nodes (i.e., meta-test classes). Nevertheless, on real-world graphs, it is usually difficult to obtain abundant labeled nodes for many classes. In practice, each meta-training class can only consist of several labeled nodes, known as the extremely weak supervision problem. In few-shot node classification, with extremely limited labeled nodes for meta-training, the generalization gap between meta-training and meta-test will become larger and thus lead to suboptimal performance. To tackle this issue, we study a novel problem of few-shot node classification with extremely weak supervision and propose a principled framework X-FNC under the prevalent meta-learning framework. Specifically, our goal is to accumulate meta-knowledge across different meta-training tasks with extremely weak supervision and generalize such knowledge to meta-test tasks. To address the challenges resulting from extremely scarce labeled nodes, we propose two essential modules to obtain pseudo-labeled nodes as extra references and effectively learn from extremely limited supervision information. We further conduct extensive experiments on four node classification datasets with extremely weak supervision to validate the superiority of our framework compared to the state-of-the-art baselines.
Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild
Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the prediction scores of the multiple subtasks for a specific target policy has not been widely explored yet. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach shows better performance in content moderation compared to existing threshold optimization methods and heuristics.
A Multi-graph Fusion Based Spatiotemporal Dynamic Learning Framework
Spatiotemporal data forecasting is a fundamental task in the field of graph data mining. Typical spatiotemporal data prediction methods usually capture spatial dependencies by directly aggregating features of local neighboring vertices in a fixed graph. However, this kind of aggregators can only capture localized correlations between vertices, and while been stacked for larger receptive field, they fall into the dilemma of over-smoothing. Additional, in temporal perspective, traditional methods focus on fixed graphs, while the correlations among vertexes can be dynamic. And time series components integrated strategies in traditional spatiotemporal learning methods can hardly handle frequently and drastically changed sequences. To overcome those limitations of existing works, in this paper, we propose a novel multi-graph based dynamic learning framework. First, a novel Dynamic Neighbor Search (DNS) mechanism is introduced to model global dynamic correlations between vertices by constructing a feature graph (FG), where the adjacency matrix is dynamically determined by DNS. Then we further alleviate the over-smoothing issue with our newly designed Adaptive Heterogeneous Representation (AHR) module. Both FG and origin graph (OG) are fed into the AHR modules and fused in our proposed Multi-graph Fusion block. Additionally, we design a Differential Vertex Representation (DVR) module which takes advantage of differential information to model temporal trends. Extensive experiments illustrate the superior forecasting performances of our proposed multi-graph based dynamic learning framework on six real-world spatiotemporal datasets from different cities and domains, and this corroborates the solid effectiveness of our proposed framework and its superior generalization ability.
Simultaneous Linear Multi-view Attributed Graph Representation Learning and Clustering
Over the last few years, various multi-view graph clustering methods have shown promising performances. However, we argue that these methods can have limitations. In particular, they are often unnecessarily complex, leading to scalability problems that make them prohibitive for most real-world graph applications. Furthermore, many of them can handle only specific types of multi-view graphs. Another limitation is that the process of learning graph representations is separated from the clustering process, and in some cases these methods do not even learn a graph representation, which severely restricts their flexibility and usefulness. In this paper we propose a simple yet effective linear model that addresses the dual tasks of multi-view attributed graph representation learning and clustering in a unified framework. The model starts by performing a first-order neighborhood smoothing step for the different individual views, then gives each one a weight corresponding to its importance. Finally, an iterative process of simultaneous clustering and representation learning is performed w.r.t. the importance of each view, yielding a consensus embedding and partition of the graph. Our model is generic and can deal with any type of multi-view graph. Finally, we show through extensive experimentation that this simple model consistently achieves competitive performances w.r.t. state-of-the-art multi-view attributed graph clustering models, while at the same time having training times that are shorter, in some cases by orders of magnitude.
SESSION: Session 7: Graph Mining
DeMEtRIS: Counting (near)-Cliques by Crawling
We study the problem of approximately counting cliques and near cliques in a graph, where the access to the graph is only available through crawling its vertices; thus typically seeing only a small portion of it. This model, known as the random walk model or the neighborhood query model has been introduced recently and captures real-life scenarios in which the entire graph is too massive to be stored as a whole or be scanned entirely and sampling vertices independently is non-trivial in it.
We introduce DeMEtRIS: Dense Motif Estimation through Random Incident Sampling. This method provides a scalable algorithm for clique and near clique counting in the random walk model. We prove the correctness of our algorithm through rigorous mathematical analysis and extensive experiments. Both our theoretical results and our experiments show that DeMEtRIS obtains a high precision estimation by only crawling a sub-linear portion on vertices, thus we demonstrate a significant improvement over previously known results.
Interpretable Research Interest Shift Detection with Temporal Heterogeneous Graphs
Researchers dedicate themselves to research problems they are interested in and often have evolving research interests in their academic careers. The study of research interest shift detection can help to find facts relevant to scientific training paths, scientific funding trends, and knowledge discovery. Existing methods define specific graph structures like author-conference-topic networks, and co-citing networks to detect research interest shift. They either ignore the temporal factor or miss heterogeneous information characterizing academic activities. More importantly, the detection results lack the interpretations of how research interests change over time, thus reducing the model's credibility. To address these issues, we propose a novel interpretable research interest shift detection model with temporal heterogeneous graphs. We first construct temporal heterogeneous graphs to represent the research interests of the target authors. To make the detection interpretable, we design a deep neural network to parameterize the generation process of interpretation on the predicted results in the form of a weighted sub-graph. Additionally, to improve the training process, we propose a semantic-aware negative data sampling strategy to generate non-interesting auxiliary shift graphs as contrastive samples. Extensive experiments demonstrate that our model outperforms the state-of-the-art baselines on two public academic graph datasets and is capable of producing interpretable results.
Self-supervised Graph Representation Learning for Black Market Account Detection
Nowadays, Multi-purpose Messaging Mobile App (MMMA) has become increasingly prevalent. MMMAs attract fraudsters and some cybercriminals provide support for frauds via black market accounts (BMAs). Compared to fraudsters, BMAs are not directly involved in frauds and are more difficult to detect. This paper illustrates our BMA detection system SGRL (Self-supervised Graph Representation Learning) used in WeChat, a representative MMMA with over a billion users. We tailor Graph Neural Network and Graph Self-supervised Learning in SGRL for BMA detection. The workflow of SGRL contains a pretraining phase that utilizes structural information, node attribute information and available human knowledge, and a lightweight detection phase. In offline experiments, SGRL outperforms state-of-the-art methods by 16.06%-58.17% on offline evaluation measures. We deploy SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, and it exceeds the alternative by 7.27% on the online evaluation measure. In conclusion, SGRL can alleviate label reliance, generalize well to unseen data, and effectively detect BMAs in WeChat.
GOOD-D: On Unsupervised Graph Out-Of-Distribution Detection
Most existing deep learning models are trained based on the closed-world assumption, where the test data is assumed to be drawn i.i.d. from the same distribution as the training data, known as in-distribution (ID). However, when models are deployed in an open-world scenario, test samples can be out-of-distribution (OOD) and therefore should be handled with caution. To detect such OOD samples drawn from unknown distribution, OOD detection has received increasing attention lately. However, current endeavors mostly focus on grid-structured data and its application for graph-structured data remains under-explored. Considering the fact that data labeling on graphs is commonly time-expensive and labor-intensive, in this work we study the problem of unsupervised graph OOD detection, aiming at detecting OOD graphs solely based on unlabeled ID data. To achieve this goal, we develop a new graph contrastive learning framework GOOD-D for detecting OOD graphs without using any ground-truth labels. By performing hierarchical contrastive learning on the augmented graphs generated by our perturbation-free graph data augmentation method, GOOD-D is able to capture the latent ID patterns and accurately detect OOD graphs based on the semantic inconsistency in different granularities (i.e., node-level, graph-level, and group-level). As a pioneering work in unsupervised graph-level OOD detection, we build a comprehensive benchmark to compare our proposed approach with different state-of-the-art methods. The experiment results demonstrate the superiority of our approach over different methods on various datasets.
Graph Explicit Neural Networks: Explicitly Encoding Graphs for Efficient and Accurate Inference
As the state-of-the-art graph learning models, the message passing based neural networks (MPNNs) implicitly use the graph topology as the "pathways" to propagate node features. This implicit use of graph topology induces the MPNNs' over-reliance on (node) features and high inference latency, which hinders their large-scale applications in industrial contexts. To mitigate these weaknesses, we propose the Graph Explicit Neural Network (GENN) framework. GENN can be flexibly applied to various MPNNs and improves them by providing more efficient and accurate inference that is robust in feature-constrained settings. Specifically, we carefully incorporate recent developments in network embedding methods to efficiently prioritize the graph topology for inference. From this vantage, GENN explicitly encodes the topology as an important source of information to mitigate the reliance on node features. Moreover, by adopting knowledge distillation (KD) techniques, GENN takes an MPNN as the teacher to supervise the training for better effectiveness while avoiding the teacher's high inference latency. Empirical results show that our GENN infers dramatically faster than its MPNN teacher by 40x-78x. In terms of accuracy, GENN yields significant gains (more than 40%) for its MPNN teacher when the node features are limited based on our explicit encoding. Moreover, GENN outperforms the MPNN teacher even in feature-rich settings thanks to our KD design.
Alleviating Structural Distribution Shift in Graph Anomaly Detection
Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes --- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the heterophily and homophily can change across training and testing data, which is called structural distribution shift (SDS) in this paper. The mainstream methods are built on graph neural networks (GNNs), benefiting the classification of normals from aggregating homophilous neighbors, yet ignoring the SDS issue for anomalies and suffering from poor generalization.
This work solves the problem from a feature view. We observe that the degree of SDS varies between anomalies and normal nodes. Hence to address the issue, the key lies in resisting high heterophily for anomalies meanwhile benefiting the learning of normals from homophily. Since different labels correspond to the difference of critical anomaly features which make great contributions to the GAD, we tease out the anomaly features on which we constrain to mitigate the effect of heterophilous neighbors and make them invariant. However, the prior distribution of anomaly features is dynamic and hard to estimate, we thus devise a prototype vector to infer and update this distribution during training. For normal nodes, we constrain the remaining features to preserve the connectivity of nodes and reinforce the influence of the homophilous neighborhood. We term our proposed framework asGraph Decomposition Network (GDN). Extensive experiments are conducted on two benchmark datasets, and the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where anomalies have largely different structural distribution across training and testing environments. Codes are open-sourced in https://github.com/blacksingular/wsdm_GDN.
SESSION: Session 8: Recommendation and Learning
One for All, All for One: Learning and Transferring User Embeddings for Cross-Domain Recommendation
Cross-domain recommendation is an important method to improve recommender system performance, especially when observations in target domains are sparse. However, most existing techniques focus on single-target or dual-target cross-domain recommendation (CDR) and are hard to be generalized to CDR with multiple target domains. In addition, the negative transfer problem is prevalent in CDR, where the recommendation performance in a target domain may not always be enhanced by knowledge learned from a source domain, especially when the source domain has sparse data. In this study, we propose CAT-ART, a multi-target CDR method that learns to improve recommendations in all participating domains through representation learning and embedding transfer. Our method consists of two parts: a self-supervised Contrastive AuToencoder (CAT) framework to generate global user embeddings based on information from all participating domains, and an Attention-based Representation Transfer (ART) framework which transfers domain-specific user embeddings from other domains to assist with target domain recommendation. CAT-ART boosts the recommendation performance in any target domain through the combined use of the learned global user representation and knowledge transferred from other domains, in addition to the original user embedding in the target domain. We conducted extensive experiments on a collected real-world CDR dataset spanning 5 domains and involving a million users. Experimental results demonstrate the superiority of the proposed method over a range of prior arts. We further conducted ablation studies to verify the effectiveness of the proposed components. Our collected dataset will be open-sourced to facilitate future research in the field of multi-domain recommender systems and user modelling.
S2TUL: A Semi-Supervised Framework for Trajectory-User Linking
Trajectory-User Linking (TUL) aiming to identify users of anonymous trajectories, has recently received increasing attention due to its wide range of applications, such as criminal investigation and personalized recommendation systems. In this paper, we propose a flexible <u>S</u>emi-<u>S</u>upervised framework for <u>T</u>rajectory-<u>U</u>ser <u>L</u>inking, namely S2TUL, which includes five components: trajectory-level graph construction, trajectory relation modeling, location-level sequential modeling, a classification layer and greedy trajectory-user relinking. The first two components are proposed to model the relationships among trajectories, in which three homogeneous graphs and two heterogeneous graphs are firstly constructed and then delivered into the graph convolutional networks for converting the discrete identities to hidden representations. Since the graph constructions are irrelevant to the corresponding users, the unlabelled trajectories can also be included in the graphs, which enables the framework to be trained in a semi-supervised way. Afterwards, the location-level sequential modeling component is designed to capture fine-grained intra-trajectory information by passing the trajectories into the sequential neural networks. Finally, these two level representations are concatenated into a classification layer to predict the user of the input trajectory. In the testing phase, a greedy trajectory-user relinking method is proposed to assure the linking results satisfy the timespan overlap constraint. We conduct extensive experiments on three public datasets with six representative competitors. The evaluation results demonstrate the effectiveness of the proposed framework.
DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms
User engagement prediction plays a critical role in designing interaction strategies to grow user engagement and increase revenue in online social platforms. Through the in-depth analysis of the real-world data from the world's largest professional social platforms, i.e., LinkedIn, we find that users expose diverse engagement patterns, and a major reason for the differences in user engagement patterns is that users have different intents. That is, people have different intents when using LinkedIn, e.g., applying for jobs, building connections, or checking notifications, which shows quite different engagement patterns. Meanwhile, user intents and the corresponding engagement patterns may change over time. Although such pattern differences and dynamics are essential for user engagement prediction, differentiating user engagement patterns based on user dynamic intents for better user engagement forecasting has not received enough attention in previous works. In this paper, we proposed a Dynamic Intent Guided Meta Network (DIGMN), which can explicitly model user intent varying with time and perform differentiated user engagement forecasting. Specifically, we derive some interpretable basic user intents as prior knowledge from data mining and introduce prior intents to explicitly model dynamic user intent. Furthermore, based on the dynamic user intent representations, we propose a meta-predictor to perform differentiated user engagement forecasting. Through a comprehensive evaluation of LinkedIn anonymous user data, our method outperforms state-of-the-art baselines significantly, i.e., 2.96% and 3.48% absolute error reduction, on coarse-grained and fine-grained user engagement prediction tasks, respectively, demonstrating the effectiveness of our method.
Federated Unlearning for On-Device Recommendation
The increasing data privacy concerns in recommendation systems have made federated recommendations attract more and more attention. Existing federated recommendation systems mainly focus on how to effectively and securely learn personal interests and preferences from their on-device interaction data. Still, none of them considers how to efficiently erase a user's contribution to the federated training process. We argue that such a dual setting is necessary. First, from the privacy protection perspective, "the right to be forgotten (RTBF)" requires that users have the right to withdraw their data contributions. Without the reversible ability, federated recommendation systems risk breaking data protection regulations. On the other hand, enabling a federated recommender to forget specific users can improve its robustness and resistance to malicious clients' attacks.
To support user unlearning in federated recommendation systems, we propose an efficient unlearning method FRU (Federated Recommendation Unlearning), inspired by the log-based rollback mechanism of transactions in database management systems. It removes a user's contribution by rolling back and calibrating the historical parameter updates and then uses these updates to speed up federated recommender reconstruction. However, storing all historical parameter updates on resource-constrained personal devices is challenging and even infeasible. In light of this challenge, we propose a small-sized negative sampling method to reduce the number of item embedding updates and an importance-based update selection mechanism to store only important model updates. To evaluate the effectiveness of FRU, we propose an attack method to disturb federated recommenders via a group of compromised users. Then, we use FRU to recover recommenders by eliminating these users' influence. Finally, we conduct extensive experiments on two real-world recommendation datasets (i.e. MovieLens-100k and Steam-200k) with two widely used federated recommenders to show the efficiency and effectiveness of our proposed approaches.
Cognition-aware Knowledge Graph Reasoning for Explainable Recommendation
Knowledge graphs (KGs) have been widely used in recommendation systems to improve recommendation accuracy and interpretability effectively. Recent research usually endows KG reasoning to find the multi-hop user-item connection paths for explaining why an item is recommended. The existing path-finding process is well designed by logic-driven inference algorithms, while there exists a gap between how algorithms and users perceive the reasoning process. Factually, human thinking is a natural reasoning process that can provide more proper and convincing explanations of why particular decisions are made. Motivated by the Dual Process Theory in cognitive science, we propose a cognition-aware KG reasoning model CogER for Explainable Recommendation, which imitates the human cognition process and designs two modules, i.e., System~1 (making intuitive judgment) and System~2 (conducting explicit reasoning), to generate the actual decision-making process. At each step during the cognition-aware reasoning process, System~1 generates an intuitive estimation of the next-step entity based on the user's historical behavior, and System~2 conducts explicit reasoning and selects the most promising knowledge entities. These two modules work iteratively and are mutually complementary, enabling our model to yield high-quality recommendations and proper reasoning paths. Experiments on three real-world datasets show that our model achieves better recommendation results with explanations compared with previous methods.
Multi-Intention Oriented Contrastive Learning for Sequential Recommendation
Sequential recommendation aims to capture users' dynamic preferences, in which data sparsity is a key problem. Most contrastive learning models leverage data augmentation to address this problem, but they amplify noises in original sequences. Contrastive learning has the assumption that two views (positive pairs) obtained from the same user behavior sequence must be similar. However, noises typically disturb the user's main intention, which results in the dissimilarity of two views.
To address this problem, in this work, we formalize the denoising problem by selecting the user's main intention, and apply contrastive learning for the first time under this topic, i.e., we propose a novel framework, namely Multi-Intention Oriented Contrastive Learning Recommender (IOCRec). In order to create high-quality views with intent-level, we fuse local and global intentions to unify sequential patterns and intent-level self-supervision signals. Specifically, we design the sequence encoder in IOCRec which includes three modules: local module, global module and disentangled module. The global module can capture users' global preferences, which is independent of the local module. The disentangled module can obtain multi-intention behind global and local representations. From a fine-grained perspective, IOCRec separates different intentions to guide the denoising process. Extensive experiments on four widely-used real datasets demonstrate the effectiveness of our new method for sequential recommendation.
SESSION: Session 9: Language Models and Text Mining
Friendly Conditional Text Generator
Our goal is to control text generation with more fine-grained conditions at lower computational cost than is possible with current alternatives; these conditions are attributes (i.e., multiple codes and free-text). As large-scale pre-trained language models (PLMs) offer excellent performance in free-form text generation, we explore efficient architectures and training schemes that can best leverage PLMs. Our framework, Friendly Conditional Text Generator (FCTG), introduces a multi-view attention (MVA) mechanism and two training tasks, Masked Attribute Modeling (MAM) and Attribute Linguistic Matching (ALM), to direct various PLMs via modalities between the text and its attributes. The motivation of FCTG is to map texts and attributes into a shared space, and bridge their modality gaps, as the texts and attributes reside in different regions of semantic space. To avoid catastrophic forgetting, modality-free embedded representations are learnt, and used to direct PLMs in this space, FCTG applies MAM to learn attribute representations, maps them in the same space as text through MVA, and optimizes their alignment in this space via ALM. Experiments on publicly available datasets show that FCTG outperforms baselines over higher level conditions at lower computation cost.
Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning
Pre-trained Language Models (PLMs) have achieved remarkable performance for various language understanding tasks in IR systems, which require the fine-tuning process based on labeled training data. For low-resource scenarios, prompt-based learning for PLMs exploits prompts as task guidance and turns downstream tasks into masked language problems for effective few-shot fine-tuning. In most existing approaches, the high performance of prompt-based learning heavily relies on handcrafted prompts and verbalizers, which may limit the application of such approaches in real-world scenarios. To solve this issue, we present CP-Tuning, an end-to-end Contrastive Prompt Tuning framework for fine-tuning PLMs without any manual engineering of task-specific prompts and verbalizers. It is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters. We further propose the pair-wise cost-sensitive contrastive learning procedure to optimize the model in order to achieve verbalizer-free class mapping and enhance the task-invariance of prompts. It explicitly learns to distinguish different classes and makes the decision boundary smoother by assigning different costs to easy and hard cases. Experiments over a variety of language understanding tasks and different PLMs show that CP-Tuning outperforms state-of-the-art methods.
Visual Matching is Enough for Scene Text Retrieval
Given a text query, the task of scene text retrieval aims at searching and localizing all the text instances that are contained in an image gallery. The state-of-the-art method learns a cross-modal similarity between the query text and the detected text regions in natural images to facilitate retrieval. However, this cross-modal approach still cannot well bridge the heterogeneity gap between the text and image modalities. In this paper, we propose a new paradigm that converts the task into a single-modality retrieval problem. Unlike previous works that rely on character recognition or embedding, we directly leverage pictorial information by rendering query text into images to learn the glyph feature of each character, which can be utilized to capture the similarity between query and scene text images. With the extracted visual features, we devise a synthetic label image guided feature alignment mechanism that is robust to different scene text styles and layouts. The modules of glyph feature learning, text instance detection, and visual matching are jointly trained in an end-to-end framework. Experimental results show that our proposed paradigm achieves the best performance in multiple benchmark datasets. As a side product, our method can also be easily generalized to support text queries with unseen characters or languages in a zero-shot manner.
AGREE: Aligning Cross-Modal Entities for Image-Text Retrieval Upon Vision-Language Pre-trained Models
Image-text retrieval is a challenging cross-modal task that arouses much attention. While the traditional methods cannot break down the barriers between different modalities, Vision-Language Pre-trained (VLP) models greatly improve image-text retrieval performance based on massive image-text pairs. Nonetheless, the VLP-based methods are still prone to produce retrieval results that cannot be cross-modal aligned with entities. Recent efforts try to fix this problem at the pre-training stage, which is not only expensive but also unpractical due to the unavailable of full datasets. In this paper, we novelly propose a lightweight and practical approach to align cross-modal entities for image-text retrieval upon VLP models only at the fine-tuning and re-ranking stages. We employ external knowledge and tools to construct extra fine-grained image-text pairs, and then emphasize cross-modal entity alignment through contrastive learning and entity-level mask modeling in fine-tuning. Besides, two re-ranking strategies are proposed, including one specially designed for zero-shot scenarios. Extensive experiments with several VLP models on multiple Chinese and English datasets show that our approach achieves state-of-the-art results in nearly all settings.
Can Pre-trained Language Models Understand Chinese Humor?
Humor understanding is an important and challenging research in natural language processing. As the popularity of pre-trained language models (PLMs), some recent work makes preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question: whether PLMs are capable of humor understanding? This paper is the first work that systematically investigates the humor understanding ability of PLMs. For this purpose, a comprehensive framework with three evaluation steps and four evaluation tasks is designed. We also construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework. Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.
SESSION: Poster Session
IDNP: Interest Dynamics Modeling Using Generative Neural Processes for Sequential Recommendation
Recent sequential recommendation models rely increasingly on consecutive short-term user-item interaction sequences to model user interests. These approaches have, however, raised concerns about both short- and long-term interests. (1) short-term: interaction sequences may not result from a monolithic interest, but rather from several intertwined interests, even within a short period of time, resulting in their failures to model skip behaviors; (2) long-term: interaction sequences are primarily observed sparsely at discrete intervals, other than consecutively over the long run. This renders difficulty in inferring long-term interests, since only discrete interest representations can be derived, without taking into account interest dynamics across sequences. In this study, we address these concerns by learning (1) multi-scale representations of short-term interests; and (2) dynamics-aware representations of long-term interests. To this end, we present an Interest Dynamics modeling framework using generative Neural Processes, coined IDNP, to model user interests from a functional perspective. IDNP learns a global interest function family to define each user's long-term interest as a function instantiation, manifesting interest dynamics through function continuity. Specifically, IDNP first encodes each user's short-term interactions into multi-scale representations, which are then summarized as user context. By combining latent global interest with user context, IDNP then reconstructs long-term user interest functions and predicts interactions at upcoming query timestep. Moreover, IDNP can model such interest functions even when interaction sequences are limited and non-consecutive. Extensive experiments on four real-world datasets demonstrate that our model outperforms the state-of-the-art on various evaluation metrics.
Disentangled Representation for Diversified Recommendations
Accuracy and diversity have long been considered to be two conflicting goals for recommendations. We point out, however, that as the diversity is typically measured by certain pre-selected item attributes, e.g., category as the most popularly employed one, improved diversity can be achieved without sacrificing recommendation accuracy, as long as the diversification respects the user's preference about the pre-selected attributes. This calls for a fine-grained understanding of a user's preferences over items, where one needs to recognize the user's choice is driven by the quality of the item itself, or the pre-selected attributes of the item.
In this work, we focus on diversity defined on item categories. We propose a general diversification framework agnostic to the choice of recommendation algorithm. Our solution disentangles the learnt user representation in the recommendation module into category- independent and category-dependent components to differentiate a user's preference over items from two orthogonal perspectives. Experimental results on three benchmark datasets and online A/B test demonstrate the effectiveness of our solution in improving both recommendation accuracy and diversity. In-depth analysis suggests that the improvement is due to our improved modeling of users' categorical preferences and refined ranking within item categories.
Slate-Aware Ranking for Recommendation
We see widespread adoption of slate recommender systems, where an ordered item list is fed to the user based on the user interests and items' content. For each recommendation, the user can select one or several items from the list for further interaction. In this setting, the significant impact on user behaviors from the mutual influence among the items is well understood. The existing methods add another step of slate re-ranking after the ranking stage of recommender systems, which considers the mutual influence among recommended items to re-rank and generate the recommendation results so as to maximize the expected overall utility. However, to model the complex interaction of multiple recommended items, the re-ranking stage usually can just handle dozens of candidates because of the constraint of limited hardware resource and system latency. Therefore, the ranking stage is still essential for most applications to provide high-quality candidate set for the re-ranking stage. In this paper, we propose a solution named Slate-Aware ranking (SAR ) for the ranking stage. By implicitly considering the relations among the slate items, it significantly enhances the quality of the re-ranking stage's candidate set and boosts the relevance and diversity of the overall recommender systems. Both experiments with the public datasets and internal online A/B testing are conducted to verify its effectiveness.
DisenPOI: Disentangling Sequential and Geographical Influence for Point-of-Interest Recommendation
Point-of-Interest (POI) recommendation plays a vital role in various location-aware services. It has been observed that POI recommendation is driven by both sequential and geographical influences. However, since there is no annotated label of the dominant influence during recommendation, existing methods tend to entangle these two influences, which may lead to sub-optimal recommendation performance and poor interpretability. In this paper, we address the above challenge by proposing DisenPOI, a novel Disentangled dual-graph framework for POI recommendation, which jointly utilizes sequential and geographical relationships on two separate graphs and disentangles the two influences with self-supervision. The key novelty of our model compared with existing approaches is to extract disentangled representations of both sequential and geographical influences with contrastive learning. To be specific, we construct a geographical graph and a sequential graph based on the check-in sequence of a user. We tailor their propagation schemes to become sequence-/geo-aware to better capture the corresponding influences. Preference proxies are extracted from check-in sequence as pseudo labels for the two influences, which supervise the disentanglement via a contrastive loss. Extensive experiments on three datasets demonstrate the superiority of the proposed model.
MUSENET: Multi-Scenario Learning for Repeat-Aware Personalized Recommendation
Personalized recommendation has been instrumental in many real applications. Despite the great progress, the underlying multi-scenario characteristics (e.g., users may behave differently under different scenarios) are largely ignored by existing recommender systems. Intuitively, modeling different scenarios properly could significantly improve the recommendation accuracy, and some existing work has explored this direction. However, these work assumes the scenarios are explicitly given, and thus becomes less effective when such information is unavailable. To complicate things further, proper scenario modeling from data is challenging and the recommendation models may easily overfit to some scenarios. In this paper, we propose a multi-scenario learning framework, MUSENET, for personalized recommendation. The key idea of MUSENET is to learn multiple implicit scenarios from the user behaviors, with a careful design inspired by the causal interpretation of recommender systems to avoid the overfitting issue. Additionally, since users' repeat consumptions account for a large part of the user behavior data on many e-commerce platforms, a repeat-aware mechanism is integrated to handle users' repurchase intentions within each scenario. Comprehensive experimental results on both industrial and public datasets demonstrate the effectiveness of the proposed approach compared with the state-of-the-art methods.
VRKG4Rec: Virtual Relational Knowledge Graph for Recommendation
Incorporating knowledge graph as side information has become a new trend in recommendation systems. Recent studies regard items as entities of a knowledge graph and leverage graph neural networks to assist item encoding, yet by considering each relation type independently. However, relation types are often too many and sometimes one relation type involves too few entities. We argue that there may exist some latent relevance among relations in KG. It may not necessary nor effective to consider all relation types for item encoding. In this paper, we propose a VRKG4Rec model (Virtual Relational Knowledge Graphs for Recommendation), which clusters relations with latent relevance to generates virtual relations. Specifically, we first construct virtual relational graphs (VRKGs) by an unsupervised learning scheme. We also design a local weighted smoothing (LWS) mechanism for node encoding on VRKGs, which iteratively updates a node embedding only depending on the node itself and its neighbors, but involve no additional training parameters. LWS mechanism is also employed on a user-item bipartite graph for user representation learning, which utilizes item encodings with virtual relational knowledge to help train user representations. Experiment results on two public datasets validate that our VRKG4Rec model outperforms the state-of-the-art methods. The implementations are available at https://github.com/lulu0913/VRKG4Rec.
Knowledge-Adaptive Contrastive Learning for Recommendation
By jointly modeling user-item interactions and knowledge graph (KG) information, KG-based recommender systems have shown their superiority in alleviating data sparsity and cold start problems. Recently, graph neural networks (GNNs) have been widely used in KG-based recommendation, owing to the strong ability of capturing high-order structural information. However, we argue that existing GNN-based methods have the following two limitations. Interaction domination: the supervision signal of user-item interaction will dominate the model training, and thus the information of KG is barely encoded in learned item representations; Knowledge overload: KG contains much recommendation-irrelevant information, and such noise would be enlarged during the message aggregation of GNNs. The above limitations prevent existing methods to fully utilize the valuable information lying in KG. In this paper, we propose a novel algorithm named Knowledge-Adaptive Contrastive Learning (KACL) to address these challenges. Specifically, we first generate data augmentations from user-item interaction view and KG view separately, and perform contrastive learning across the two views. Our design of contrastive loss will force the item representations to encode information shared by both views, thereby alleviating the interaction domination issue. Moreover, we introduce two learnable view generators to adaptively remove task-irrelevant edges during data augmentation, and help tolerate the noises brought by knowledge overload. Experimental results on three public benchmarks demonstrate that KACL can significantly improve the performance on top-K recommendation compared with state-of-the-art methods.
Heterogeneous Graph Contrastive Learning for Recommendation
Graph Neural Networks (GNNs) have become powerful tools in modeling graph-structured data in recommender systems. However, real-life recommendation scenarios usually involve heterogeneous relationships (e.g., social-aware user influence, knowledge-aware item dependency) which contains fruitful information to enhance the user preference learning. In this paper, we study the problem of heterogeneous graph-enhanced relational learning for recommendation. Recently, contrastive self-supervised learning has become successful in recommendation. In light of this, we propose a <u>H</u>eterogeneous <u>G</u>raph <u>C</u>ontrastive <u>L</u>earning (HGCL), which is able to incorporate heterogeneous relational semantics into the user-item interaction modeling with contrastive learning-enhanced knowledge transfer across different views. However, the influence of heterogeneous side information on interactions may vary by users and items. To move this idea forward, we enhance our heterogeneous graph contrastive learning with meta networks to allow the personalized knowledge transformer with adaptive contrastive augmentation. The experimental results on three real-world datasets demonstrate the superiority of HGCL over state-of-the-art recommendation methods. Through ablation study, key components in HGCL method are validated to benefit the recommendation performance improvement. The source code of the model implementation is available at the link https://github.com/HKUDS/HGCL.
Exploiting Explicit and Implicit Item relationships for Session-based Recommendation
The session-based recommendation aims to predict users' immediate next actions based on their short-term behaviors reflected by past and ongoing sessions. Graph neural networks (GNNs) recently dominated the related studies, yet their performance heavily relies on graph structures, which are often predefined, task-specific, and designed heuristically. Furthermore, existing graph-based methods either neglect implicit correlations among items or consider explicit and implicit relationships altogether in the same graphs. We propose to decouple explicit and implicit relationships among items. As such, we can capture the prior knowledge encapsulated in explicit dependencies and learned implicit correlations among items simultaneously in a flexible and more interpretable manner for effective recommendations. We design a dual graph neural network that leverages the feature representations extracted by two GNNs: a graph neural network with a single gate (SG-GNN) and an adaptive graph neural network (A-GNN). The former models explicit dependencies among items. The latter employs a self-learning strategy to capture implicit correlations among items. Our experiments on four real-world datasets show our model outperforms state-of-the-art methods by a large margin, achieving 18.46% and 70.72% improvement in HR@20, and 49.10% and 115.29% improvement in MRR@20 on Diginetica and LastFM datasets.
Improving News Recommendation with Channel-Wise Dynamic Representations and Contrastive User Modeling
News modeling and user modeling are the two core tasks of news recommendation. Accurate user representation and news representation can enable the recommendation system to provide users with precise recommendation services. Most existing methods use deep learning models such as CNN and Self-Attention to extract text features from news titles and abstracts to generate specific news vectors. However, the CNN-based methods have fixed parameters and cannot extract specific features for different input words, while the Self-Attention-based methods have high computational costs and are difficult to capture local features effectively. In our proposed method, we build a category-based dynamic component to generate suitable parameters for different inputs and extract local features from multiple perspectives. Meanwhile, users will mistakenly click on some news terms they are not interested in, so there will be some interaction noises in the datasets. In order to explore the critical user behaviors in user data and reduce the impact of noise data on user modeling, we adopt a frequency-aware contrastive learning method in user modeling. Experiments on real-world datasets verify the effectiveness of our proposed method.
Calibrated Recommendations as a Minimum-Cost Flow Problem
Calibration in recommender systems has recently gained significant attention. In the recommended list of items, calibration ensures that the various (past) areas of interest of a user are reflected with their corresponding proportions. For instance, if a user has watched, say, 80 romance movies and 20 action movies, then it is reasonable to expect the recommended list of movies to be comprised of about 80% romance and 20% action movies as well. Calibration is particularly important given that optimizing towards accuracy often leads to the user's minority interests being dominated by their main interests, or by a few overall popular items, in the recommendations they receive. In this paper, we propose a novel approach based on the max flow problem for generating calibrated recommendations. In a series of experiments using two publicly available datasets, we demonstrate the superior performance of our proposed approach compared to the state-of-the-art in generating relevant and calibrated recommendation lists.
Generative Slate Recommendation with Reinforcement Learning
Recent research has employed reinforcement learning (RL) algorithms to optimize long-term user engagement in recommender systems, thereby avoiding common pitfalls such as user boredom and filter bubbles. They capture the sequential and interactive nature of recommendations, and thus offer a principled way to deal with long-term rewards and avoid myopic behaviors. However, RL approaches are intractable in the slate recommendation scenario - where a list of items is recommended at each interaction turn - due to the combinatorial action space. In that setting, an action corresponds to a slate that may contain any combination of items.
While previous work has proposed well-chosen decompositions of actions so as to ensure tractability, these rely on restrictive and sometimes unrealistic assumptions. Instead, in this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder. Then, the RL agent selects continuous actions in this latent space, which are ultimately decoded into the corresponding slates. By doing so, we are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates instead of independent items, in particular by enabling diversity. Our experiments performed on a wide array of simulated environments confirm the effectiveness of our generative modeling of slates over baselines in practical scenarios where the restrictive assumptions underlying the baselines are lifted. Our findings suggest that representation learning using generative models is a promising direction towards generalizable RL-based slate recommendation.
SGCCL: Siamese Graph Contrastive Consensus Learning for Personalized Recommendation
Contrastive-learning-based neural networks have recently been introduced to recommender systems, due to their unique advantage of injecting collaborative signals to model deep representations, and the self-supervision nature in the learning process. Existing contrastive learning methods for recommendations are mainly proposed through introducing augmentations to the user-item (U-I) bipartite graphs. Such a contrastive learning process, however, is susceptible to bias towards popular items and users, because higher-degree users/items are subject to more augmentations and their correlations are more captured. In this paper, we advocate a <u>S</u>iamese <u>G</u>raph <u>C</u>ontrastive <u>C</u>onsensus <u>L</u>earning (SGCCL) framework, to explore intrinsic correlations and alleviate the bias effects for personalized recommendation. Instead of augmenting original U-I networks, we introduce siamese graphs, which are homogeneous relations of user-user (U-U) similarity and item-item (I-I) correlations. A contrastive consensus optimization process is also adopted to learn effective features for user-item ratings, user-user similarity, and item-item correlation. Finally, we employ the self-supervised learning coupled with the siamese item-item/user-user graph relationships, which ensures unpopular users/items are well preserved in the embedding space. Different from existing studies, SGCCL performs well on both overall and debiasing recommendation tasks resulting in a balanced recommender. Experiments on four benchmark datasets demonstrate that SGCCL outperforms state-of-the-art methods with higher accuracy and greater long-tail item/user exposure.
AutoGen: An Automated Dynamic Model Generation Framework for Recommender System
Considering the balance between revenue and resource consumption for industrial recommender systems, intelligent recommendation computing has been emerging recently. Existing solutions deploy the same recommendation model to serve users indiscriminately, which is sub-optimal for total revenue maximization. We propose a multi-model service solution by deploying different-complexity models to serve different-valued users. An automated dynamic model generation framework AutoGen is elaborated to efficiently derive multiple parameter-sharing models with diverse complexities and adequate predictive capabilities. A mixed search space is designed and an importance-aware progressive training scheme is proposed to prevent interference between different architectures, which avoids the model retraining and improves the search efficiency, thereby efficiently deriving multiple models. Extensive experiments are conducted on two public datasets to demonstrate the effectiveness and efficiency of AutoGen.
Robust Training of Graph Neural Networks via Noise Governance
Graph Neural Networks (GNNs) have become widely-used models for semi-supervised learning. However, the robustness of GNNs in the presence of label noise remains a largely under-explored problem. In this paper, we consider an important yet challenging scenario where labels on nodes of graphs are not only noisy but also scarce. In this scenario, the performance of GNNs is prone to degrade due to label noise propagation and insufficient learning. To address these issues, we propose a novel RTGNN (<u>R</u>obust <u>T</u>raining of <u>G</u>raph <u>N</u>eural <u>N</u>etworks via Noise Governance) framework that achieves better robustness by learning to explicitly govern label noise. More specifically, we introduce self-reinforcement and consistency regularization as supplemental supervision. The self-reinforcement supervision is inspired by the memorization effects of deep neural networks and aims to correct noisy labels. Further, the consistency regularization prevents GNNs from overfitting to noisy labels via mimicry loss in both the inter-view and intra-view perspectives. To leverage such supervisions, we divide labels into clean and noisy types, rectify inaccurate labels, and further generate pseudo-labels on unlabeled nodes. Supervision for nodes with different types of labels is then chosen adaptively. This enables sufficient learning from clean labels while limiting the impact of noisy ones. We conduct extensive experiments to evaluate the effectiveness of our RTGNN framework, and the results validate its consistent superior performance over state-of-the-art methods with two types of label noises and various noise rates.
Cooperative Explanations of Graph Neural Networks
With the growing success of graph neural networks (GNNs), the explainability of GNN is attracting considerable attention. Current explainers mostly leverage feature attribution and selection to explain a prediction. By tracing the importance of input features, they select the salient subgraph as the explanation. However, their explainability is at the granularity of input features only, and cannot reveal the usefulness of hidden neurons. This inherent limitation makes the explainers fail to scrutinize the model behavior thoroughly, resulting in unfaithful explanations.
In this work, we explore the explainability of GNNs at the granularity of both input features and hidden neurons. To this end, we propose an explainer-agnostic framework, <u>C</u>ooperative <u>G</u>NN <u>E</u>xplanation (CGE) to generate the explanatory subgraph and subnetwork simultaneously, which jointly explain how the GNN model arrived at its prediction. Specifically, it first initializes the importance scores of input features and hidden neurons with masking networks. Then it iteratively retrains the importance scores, refining the salient subgraph and subnetwork by discarding low-scored features and neurons in each iteration. Through such cooperative learning, CGE not only generates faithful and concise explanations, but also exhibits how the salient information flows by activating and deactivating neurons. We conduct extensive experiments on both synthetic and real-world datasets, validating the superiority of CGE over state-of-the-art approaches. Code is available at https://github.com/MangoKiller/CGE_demo.
Bring Your Own View: Graph Neural Networks for Link Prediction with Personalized Subgraph Selection
Graph neural networks (GNNs) have received remarkable success in link prediction (GNNLP) tasks. Existing efforts first predefine the subgraph for the whole dataset and then apply GNNs to encode edge representations by leveraging the neighborhood structure induced by the fixed subgraph. The prominence of GNNLP methods significantly relies on the adhoc subgraph. Since node connectivity in real-world graphs is complex, one shared subgraph is limited for all edges. Thus, the choices of subgraphs should be personalized to different edges. However, performing personalized subgraph selection is nontrivial since the potential selection space grows exponentially to the scale of edges. Besides, the inference edges are not available during training in link prediction scenarios, so the selection process needs to be inductive. To bridge the gap, we introduce a Personalized Subgraph Selector (PS2) as a plug-and-play framework to automatically, personally, and inductively identify optimal subgraphs for different edges when performing GNNLP. PS2 is instantiated as a bi-level optimization problem that can be efficiently solved differently. Coupling GNNLP models with PS2, we suggest a brand-new angle towards GNNLP training: by first identifying the optimal subgraphs for edges; and then focusing on training the inference model by using the sampled subgraphs. Comprehensive experiments endorse the effectiveness of our proposed method across various GNNLP backbones (GCN, GraphSage, NGCF, LightGCN, and SEAL) and diverse benchmarks (Planetoid, OGB, and Recommendation datasets). Our code is publicly available at https://github.com/qiaoyu-tan/PS2
Towards Faithful and Consistent Explanations for Graph Neural Networks
Uncovering rationales behind predictions of graph neural networks (GNNs) has received increasing attention over recent years. Instance-level GNN explanation aims to discover critical input elements, like nodes or edges, that the target GNN relies upon for making predictions. Though various algorithms are proposed, most of them formalize this task by searching the minimal subgraph which can preserve original predictions. However, an inductive bias is deep-rooted in this framework: several subgraphs can result in the same or similar outputs as the original graphs. Consequently, they have the danger of providing spurious explanations and fail to provide consistent explanations. Applying them to explain weakly-performed GNNs would further amplify these issues. To address this problem, we theoretically examine the predictions of GNNs from the causality perspective. Two typical reasons of spurious explanations are identified: confounding effect of latent variables like distribution shift, and causal factors distinct from the original input. Observing that both confounding effects and diverse causal rationales are encoded in internal representations, we propose a simple yet effective countermeasure by aligning embeddings. Concretely, concerning potential shifts in the high-dimensional space, we design a distribution-aware alignment algorithm based on anchors. This new objective is easy to compute and can be incorporated into existing techniques with no or little effort. Theoretical analysis shows that it is in effect optimizing a more faithful explanation objective in design, which further justifies the proposed approach.
Position-Aware Subgraph Neural Networks with Data-Efficient Learning
Data-efficient learning on graphs (GEL) is essential in real-world applications. Existing GEL methods focus on learning useful representations for nodes, edges, or entire graphs with "small" labeled data. But the problem of data-efficient learning for subgraph prediction has not been explored. The challenges of this problem lie in the following aspects: 1) It is crucial for subgraphs to learn positional features to acquire structural information in the base graph in which they exist. Although the existing subgraph neural network method is capable of learning disentangled position encodings, the overall computational complexity is very high. 2) Prevailing graph augmentation methods for GEL, including rule-based, sample-based, adaptive, and automated methods, are not suitable for augmenting subgraphs because a subgraph contains fewer nodes but richer information such as position, neighbor, and structure. Subgraph augmentation is more susceptible to undesirable perturbations. 3) Only a small number of nodes in the base graph are contained in subgraphs, which leads to a potential "bias" problem that the subgraph representation learning is dominated by these "hot" nodes. By contrast, the remaining nodes fail to be fully learned, which reduces the generalization ability of subgraph representation learning. In this paper, we aim to address the challenges above and propose a Position-Aware Data-Efficient Learning framework for subgraph neural networks called PADEL. Specifically, we propose a novel node position encoding method that is anchor-free, and design a new generative subgraph augmentation method based on a diffused variational subgraph autoencoder, and we propose exploratory and exploitable views for subgraph contrastive learning. Extensive experiment results on three real-world datasets show the superiority of our proposed method over state-of-the-art baselines.
Graph Neural Networks with Interlayer Feature Representation for Image Super-Resolution
Although deep learning has been extensively studied and achieved remarkable performance on single image super-resolution (SISR), existing convolutional neural networks (CNN) mainly focus on broader and deeper architecture design, ignoring the detailed information of the image itself and the potential relationship between the features. Recently, several attempts have been made to address the SISR with graph representation learning. However, existing GNN-based methods learning to deal with the SISR problem are limited to the information processing of the entire image or the relationship processing between different feature images of the same layer, ignoring the interdependence between the extracted features of different layers, which is not conducive to extracting deeper hierarchical features. In this paper, we propose an interlayer feature representation based graph neural network for image super-resolution (LSGNN), which consists of a layer feature graph representation learning module and a channel spatial attention module. The layer feature graph representation learning module mainly captures the interdependence between the features of different layers, which can learn more fine-grained image detail features. In addition, we also unified a channel attention module and a spatial attention module into our model, which takes into account the channel dimension information and spatial scale information, to improve the expressive ability, and achieve high quality image details. Extensive experiments and ablation studies demonstrate the superiority of the proposed model.
DGRec: Graph Neural Network for Recommendation with Diversified Embedding Generation
Graph Neural Network (GNN) based recommender systems have been attracting more and more attention in recent years due to their excellent performance in accuracy. Representing user-item interactions as a bipartite graph, a GNN model generates user and item representations by aggregating embeddings of their neighbors. However, such an aggregation procedure often accumulates information purely based on the graph structure, overlooking the redundancy of the aggregated neighbors and resulting in poor diversity of the recommended list. In this paper, we propose diversifying GNN-based recommender systems by directly improving the embedding generation procedure. Particularly, we utilize the following three modules: submodular neighbor selection to find a subset of diverse neighbors to aggregate for each GNN node, layer attention to assign attention weights for each layer, and loss reweighting to focus on the learning of items belonging to long-tail categories. Blending the three modules into GNN, we present DGRec (Diversified GNN-based Recommender System) for diversified recommendation. Experiments on real-world datasets demonstrate that the proposed method can achieve the best diversity while keeping the accuracy comparable to state-of-the-art GNN-based recommender systems. We open source DGRec at https://github.com/YangLiangwei/DGRec.
CLNode: Curriculum Learning for Node Classification
Node classification is a fundamental graph-based task that aims to predict the classes of unlabeled nodes, for which Graph Neural Networks (GNNs) are the state-of-the-art methods. Current GNNs assume that nodes in the training set contribute equally during training. However, the quality of training nodes varies greatly, and the performance of GNNs could be harmed by two types of low-quality training nodes: (1) inter-class nodes situated near class boundaries that lack the typical characteristics of their corresponding classes. Because GNNs are data-driven approaches, training on these nodes could degrade the accuracy. (2) mislabeled nodes. In real-world graphs, nodes are often mislabeled, which can significantly degrade the robustness of GNNs. To mitigate the detrimental effect of the low-quality training nodes, we present CLNode, which employs a selective training strategy to train GNN based on the quality of nodes. Specifically, we first design a multi-perspective difficulty measurer to accurately measure the quality of training nodes. Then, based on the measured qualities, we employ a training scheduler that selects appropriate training nodes to train GNN in each epoch. To evaluate the effectiveness of CLNode, we conduct extensive experiments by incorporating it in six representative backbone GNNs. Experimental results on real-world networks demonstrate that CLNode is a general framework that can be combined with various GNNs to improve their accuracy and robustness.
Inductive Graph Transformer for Delivery Time Estimation
Providing accurate estimated time of package delivery on users' purchasing pages for e-commerce platforms is of great importance to their purchasing decisions and post-purchase experiences. Although this problem shares some common issues with the conventional estimated time of arrival (ETA), it is more challenging with the following aspects: 1) Inductive inference. Models are required to predict ETA for orders with unseen retailers and addresses; 2) High-order interaction of order semantic information. Apart from the spatio-temporal features, the estimated time also varies greatly with other factors, such as the packaging efficiency of retailers, as well as the high-order interaction of these factors. In this paper, we propose an inductive graph transformer (IGT) that leverages raw feature information and structural graph data to estimate package delivery time. Different from previous graph transformer architectures, IGT adopts a decoupled pipeline and trains transformer as a regression function that can capture the multiplex information from both raw feature and dense embeddings encoded by a graph neural network (GNN). In addition, we further simplify the GNN structure by removing its non-linear activation and the learnable linear transformation matrix. The reduced parameter search space and linear information propagation in the simplified GNN enable the IGT to be applied in large-scale industrial scenarios. Experiments on real-world logistics datasets show that our proposed model can significantly outperform the state-of-the-art methods on estimation of delivery time.
Ask "Who", Not "What": Bitcoin Volatility Forecasting with Twitter Data
Understanding the variations in trading price (volatility), and its response to exogenous information, is a well-researched topic in finance. In this study, we focus on finding stable and accurate volatility predictors for a relatively new asset class of cryptocurrencies, in particular Bitcoin, using deep learning representations of public social media data obtained from Twitter. For our experiments, we extracted semantic information and user statistics from over 30 million Bitcoin-related tweets, in conjunction with 15-minute frequency price data over a horizon of 144 days. Using this data, we built several deep learning architectures that utilized different combinations of the gathered information. For each model, we conducted ablation studies to assess the influence of different components and feature sets over the prediction accuracy. We found statistical evidences for the hypotheses that: (i) temporal convolutional networks perform significantly better than both classical autoregressive models and other deep learning-based architectures in the literature, and (ii) tweet author meta-information, even detached from the tweet itself, is a better predictor of volatility than the semantic content and tweet volume statistics. We demonstrate how different information sets gathered from social media can be utilized in different architectures and how they affect the prediction results. As an additional contribution, we make our dataset public for future research.
Search Behavior Prediction: A Hypergraph Perspective
At E-Commerce stores such as Amazon, eBay, and Taobao, the shopping items and the query words that customers use to search for the items form a bipartite graph that captures search behavior. Such a query-item graph can be used to forecast search trends or improve search results. For example, generating query-item associations, which is equivalent to predicting links in the bipartite graph, can yield recommendations that can customize and improve the user search experience. Although the bipartite shopping graphs are straightforward to model search behavior, they suffer from two challenges: 1) The majority of items are sporadically searched and hence have noisy/sparse query associations, leading to a long-tail distribution. 2) Infrequent queries are more likely to link to popular items, leading to another hurdle known as disassortative mixing.
To address these two challenges, we go beyond the bipartite graph to take a hypergraph perspective, introducing a new paradigm that leverages <u>auxiliary</u> information from anonymized customer engagement sessions to assist the <u>main task</u> of query-item link prediction. This auxiliary information is available at web scale in the form of search logs. We treat all items appearing in the same customer session as a single hyperedge. The hypothesis is that items in a customer session are unified by a common shopping interest. With these hyperedges, we augment the original bipartite graph into a new hypergraph. We develop a Dual-Channel Attention-Based Hypergraph Neural Network (DCAH), which synergizes information from two potentially noisy sources (original query-item edges and item-item hyperedges). In this way, items on the tail are better connected due to the extra hyperedges, thereby enhancing their link prediction performance. We further integrate DCAH with self-supervised graph pre-training and/or DropEdge training, both of which effectively alleviate disassortative mixing. Extensive experiments on three proprietary E-Commerce datasets show that DCAH yields significant improvements of up to 24.6% in mean reciprocal rank (MRR) and 48.3% in recall compared to GNN-based baselines. Our source code is available at https://github.com/amazon-science/dual-channel-hypergraph-neural-network.
A Multimodal Framework for the Identification of Vaccine Critical Memes on Twitter
Memes can be a useful way to spread information because they are funny, easy to share, and can spread quickly and reach further than other forms. With increased interest in COVID-19 vaccines, vaccination-related memes have grown in number and reach. Memes analysis can be difficult because they use sarcasm and often require contextual understanding. Previous research has shown promising results but could be improved by capturing global and local representations within memes to model contextual information. Further, the limited public availability of annotated vaccine critical memes datasets limit our ability to design computational methods to help design targeted interventions and boost vaccine uptake. To address these gaps, we present VaxMeme, which consists of 10,244 manually labelled memes. With VaxMeme, we propose a new multimodal framework designed to improve the memes' representation by learning the global and local representations of memes. The improved memes' representations are then fed to an attentive representation learning module to capture contextual information for classification using an optimised loss function. Experimental results show that our framework outperformed state-of-the-art methods with an F1-Score of 84.2%. We further analyse the transferability and generalisability of our framework and show that understanding both modalities is important to identify vaccine critical memes on Twitter. Finally, we discuss how understanding memes can be useful in designing shareable vaccination promotion, myth debunking memes and monitoring their uptake on social media platforms.
Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation
With the growth of high-dimensional sparse data in web-scale recommender systems, the computational cost to learn high-order feature interaction in CTR prediction task largely increases, which limits the use of high-order interaction models in real industrial applications. Some recent knowledge distillation based methods transfer knowledge from complex teacher models to shallow student models for accelerating the online model inference. However, they suffer from the degradation of model accuracy in knowledge distillation process. It is challenging to balance the efficiency and effectiveness of the shallow student models. To address this problem, we propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation. The proposed lightweight student model DAGFM can learn arbitrary explicit feature interactions from teacher networks, which achieves approximately lossless performance and is proved by a dynamic programming algorithm. Besides, an improved general model KD-DAGFM+ is shown to be effective in distilling both explicit and implicit feature interactions from any complex teacher model. Extensive experiments are conducted on four real-world datasets, including a large-scale industrial dataset from WeChat platform with billions of feature dimensions. KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments, showing the superiority of DAGFM to deal with the industrial scale data in CTR prediction task.
Heterogeneous Graph-based Context-aware Document Ranking
Users' complex information needs usually require consecutive queries, which results in sessions with a series of interactions. Exploiting such contextual interactions has been proven to be favorable for result ranking. However, existing studies mainly model the contextual information independently and sequentially. They neglect the diverse information hidden in different relations and structured information of session elements as well as the valuable signals from other relevant sessions. In this paper, we propose HEXA, a heterogeneous graph-based context-aware document ranking framework. It exploits heterogeneous graphs to organize the contextual information and beneficial search logs for modeling user intents and ranking results. Specifically, we construct two heterogeneous graphs, i.e., a session graph and a query graph. The session graph is built from the current session queries and documents. Meanwhile, we sample the current query's k-layer neighbors from search logs to construct the query graph. Then, we employ heterogeneous graph neural networks and specialized readout functions on the two graphs to capture the user intents from local and global aspects. Finally, the document ranking scores are measured by how well the documents are matched with the two user intents. Results on two large-scale datasets confirm the effectiveness of our model.
Learning and Maximizing Influence in Social Networks Under Capacity Constraints
Influence maximization (IM) refers to the problem of finding a subset of nodes in a network through which we could maximize our reach to other nodes in the network. This set is often called the "seed set", and its constituent nodes maximize the social diffusion process. IM has previously been studied in various settings, including under a time deadline, subject to constraints such as that of budget or coverage, and even subject to measures other than the centrality of nodes. The solution approach has generally been to prove that the objective function is submodular, or has a submodular proxy, and thus has a close greedy approximation. In this paper, we explore a variant of the IM problem where we wish to reach out to and maximize the probability of infection of a small subset of bounded capacity K. We show that this problem does not exhibit the same submodular guarantees as the original IM problem, for which we resort to the theory of gamma-weakly submodular functions. Subsequently, we develop a greedy algorithm that maximizes our objective despite the lack of submodularity. We also develop a suitable learning model that out-competes baselines on the task of predicting the top-K infected nodes, given a seed set as input.
Graph Summarization via Node Grouping: A Spectral Algorithm
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
Beyond Individuals: Modeling Mutual and Multiple Interactions for Inductive Link Prediction between Groups
Link prediction is a core task in graph machine learning with wide applications. However, little attention has been paid to link prediction between two group entities. This limits the application of the current approaches to many real-life problems, such as predicting collaborations between academic groups or recommending bundles of items to group users. Moreover, groups are often ephemeral or emergent, forcing the predicting model to deal with challenging inductive scenes. To fill this gap, we develop a framework composed of a GNN-based encoder and neural-based aggregating networks, namely the Mutual Multi-view Attention Networks (MMAN). First, we adopt GNN-based encoders to model multiple interactions among members and groups through propagating. Then, we develop MMAN to aggregate members' node representations into multi-view group representations and compute the final results by pooling pairwise scores between views. Specifically, several view-guided attention modules are adopted when learning multi-view group representations, thus capturing diversified member weights and multifaceted group characteristics. In this way, MMAN can further mimic the mutual and multiple interactions between groups. We conduct experiments on three datasets, including two academic group link prediction datasets and one bundle-to-group recommendation dataset. The results demonstrate that the proposed approach can achieve superior performance on both tasks compared with plain GNN-based methods and other aggregating methods.
Scalable Adversarial Attack Algorithms on Influence Maximization
In this paper, we study the adversarial attacks on influence maximization under dynamic influence propagation models in social networks. In particular, given a known seed set S, the problem is to minimize the influence spread from S by deleting a limited number of nodes and edges. This problem reflects many application scenarios, such as blocking virus (e.g. COVID-19) propagation in social networks by quarantine and vaccination, blocking rumor spread by freezing fake accounts, or attacking competitor's influence by incentivizing some users to ignore the information from the competitor. In this paper, under the linear threshold model, we adapt the reverse influence sampling approach and provide efficient algorithms of sampling valid reverse reachable paths to solve the problem. We present three different design choices on reverse sampling, which all guarantee 1/2 - ε approximation (for any small ε >0) and an efficient running time.
Ranking-based Group Identification via Factorized Attention on Social Tripartite Graph
Due to the proliferation of social media, a growing number of users search for and join group activities in their daily life. This develops a need for the study on the ranking-based group identification (RGI) task, i.e., recommending groups to users. The major challenge in this task is how to effectively and efficiently leverage the item interaction information from users' and groups' online behaviors, in addition to the information from social interaction between users and groups. Though recent developments of Graph Neural Networks (GNNs) succeed in aggregating both social and user-item interaction simultaneously, they however fail to comprehensively resolve this RGI task. In this paper, we propose a novel GNN-based framework named Contextualized Factorized Attention for Group identification (CFAG). We devise tripartite graph convolution to aggregate information from different types of neighborhoods among users, groups, and items. To cope with the data sparsity issue, we devise a novel propagation augmentation (PA) layer, which is based on our proposed factorized attention mechanism. PA layers efficiently learn the relevance degree of non-neighbor nodes to improve the information propagation to users. Experimental results on three benchmark datasets verify the superiority of CFAG. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.
Graph Sequential Neural ODE Process for Link Prediction on Dynamic and Sparse Graphs
Link prediction on dynamic graphs is an important task in graph mining. Existing approaches based on dynamic graph neural networks (DGNNs) typically require a significant amount of historical data (interactions over time), which is not always available in practice. The missing links over time, which is a common phenomenon in graph data, further aggravates the issue and thus creates extremely sparse and dynamic graphs. To address this problem, we propose a novel method based on the neural process, called Graph Sequential Neural ODE Process (GSNOP). Specifically, GSNOP combines the advantage of the neural process and neural ordinary differential equation that models the link prediction on dynamic graphs as a dynamic-changing stochastic process. By defining a distribution over functions, GSNOP introduces the uncertainty into the predictions, making it generalize to more situations instead of overfitting to the sparse data. GSNOP is also agnostic to model structures that can be integrated with any DGNN to consider the chronological and geometrical information for link prediction. Extensive experiments on three dynamic graph datasets show that GSNOP can significantly improve the performance of existing DGNNs and outperform other neural process variants.
S2GAE: Self-Supervised Graph Autoencoders are Generalizable Learners with Graph Masking
Self-supervised learning (SSL) has been demonstrated to be effective in pre-training models that can be generalized to various downstream tasks. Graph Autoencoder (GAE), an increasingly popular SSL approach on graphs, has been widely explored to learn node representations without ground-truth labels. However, recent studies show that existing GAE methods could only perform well on link prediction tasks, while their performance on classification tasks is rather limited. This limitation casts doubt on the generalizability and adoption of GAE. In this paper, for the first time, we show that GAE can generalize well to both link prediction and classification scenarios, including node-level and graph-level tasks, by redesigning its critical building blocks from the graph masking perspective. Our proposal is called Self-Supervised Graph Autoencoder--S2GAE, which unleashes the power of GAEs with minimal yet nontrivial efforts. Specifically, instead of reconstructing the whole input structure, we randomly mask a portion of edges and learn to reconstruct these missing edges with an effective masking strategy and an expressive decoder network. Moreover, we theoretically prove that S2GAE could be regarded as an edge-level contrastive learning framework, providing insights into why it generalizes well. Empirically, we conduct extensive experiments on 21 benchmark datasets across link prediction and node & graph classification tasks. The results validate the superiority of S2GAE against state-of-the-art generative and contrastive methods. This study demonstrates the potential of GAE as a universal representation learner on graphs. Our code is publicly available at https://github.com/qiaoyu-tan/S2GAE.
Dependency-aware Self-training for Entity Alignment
Entity Alignment (EA), which aims to detect entity mappings (i.e. equivalent entity pairs) in different Knowledge Graphs (KGs), is critical for KG fusion. Neural EA methods dominate current EA research but still suffer from their reliance on labelled mappings. To solve this problem, a few works have explored boosting the training of EA models with self-training, which adds confidently predicted mappings into the training data iteratively. Though the effectiveness of self-training can be glimpsed in some specific settings, we still have very limited knowledge about it. One reason is the existing works concentrate on devising EA models and only treat self-training as an auxiliary tool. To fill this knowledge gap, we change the perspective to self-training to shed light on it. In addition, the existing self-training strategies have limited impact because they introduce either much False Positive noise or a low quantity of True Positive pseudo mappings. To improve self-training for EA, we propose exploiting the dependencies between entities, a particularity of EA, to suppress the noise without hurting the recall of True Positive mappings. Through extensive experiments, we show that the introduction of dependency makes the self-training strategy for EA reach a new level. The value of self-training in alleviating the reliance on annotation is actually much higher than what has been realised. Furthermore, we suggest future study on smart data annotation to break the ceiling of EA performance.
CL4CTR: A Contrastive Learning Framework for CTR Prediction
Many Click-Through Rate (CTR) prediction works focused on designing advanced architectures to model complex feature interactions but neglected the importance of feature representation learning, e.g., adopting a plain embedding layer for each feature, which results in sub-optimal feature representations and thus inferior CTR prediction performance. For instance, low frequency features, which account for the majority of features in many CTR tasks, are less considered in standard supervised learning settings, leading to sub-optimal feature representations. In this paper, we introduce self-supervised learning to produce high-quality feature representations directly and propose a model-agnostic Contrastive Learning for CTR (CL4CTR) framework consisting of three self-supervised learning signals to regularize the feature representation learning: contrastive loss, feature alignment, and field uniformity. The contrastive module first constructs positive feature pairs by data augmentation and then minimizes the distance between the representations of each positive feature pair by the contrastive loss. The feature alignment constraint forces the representations of features from the same field to be close, and the field uniformity constraint forces the representations of features from different fields to be distant. Extensive experiments verify that CL4CTR achieves the best performance on four datasets and has excellent effectiveness and compatibility with various representative baselines.
Weakly Supervised Entity Alignment with Positional Inspiration
The current success of entity alignment (EA) is still mainly based on large-scale labeled anchor links. However, the refined annotation of anchor links still consumes a lot of manpower and material resources. As a result, an increasing number of works based on active learning, few-shot learning, or other deep network learning techniques have been developed to address the performance bottleneck caused by a lack of labeled data. These works focus either on the strategy of choosing more informative labeled data or on the strategy of model training, while it remains opaque why existing popular EA models (e.g., GNN-based models) fail the EA task with limited labeled data. To overcome this issue, this paper analyzes the problem of weakly supervised EA from the perspective of model design and proposes a novel weakly supervised learning framework, Position Enhanced Entity Alignment (PEEA). Besides absorbing structural and relational information, PEEA aims to increase the connections between far-away entities and labeled ones by incorporating positional information into the representation learning with a Position Attention Layer (PAL). To fully utilize the limited anchor links, we further introduce a novel position encoding method that considers both anchor links and relational information from a global view. The proposed position encoding will be fed into PEEA as additional entity features. Extensive experiments on public datasets demonstrate the effectiveness of PEEA.
Zero to Hero: Exploiting Null Effects to Achieve Variance Reduction in Experiments with One-sided Triggering
In online experiments where the intervention is only exposed, or "triggered", for a small subset of the population, it is critical to use variance reduction techniques to estimate treatment effects with sufficient precision to inform business decisions. Trigger-dilute analysis is often used in these situations, and reduce the sampling variance of overall intent-to-treat (ITT) effects by an order of magnitude equal to the inverse of the triggering rate; for example, a triggering rate of 5% corresponds to roughly a 20x reduction in variance. To apply trigger-dilute analysis, one needs to know experimental subjects' triggering counterfactual statuses, i.e., the counterfactual behavior of subjects under both treatment and control conditions. In this paper, we propose an unbiased ITT estimator with reduced variance applicable for experiments where the triggering counterfactual status is only observed in the treatment group. Our method is based on the efficiency augmentation idea of CUPED and draws upon identification frameworks from the principal stratification and instrumental variables literature. The unbiasedness of our estimation approach relies on a testable assumption that the augmentation term used for covariate adjustment equals zero in expectation. Unlike traditional covariate adjustment or principal score modeling approaches, our estimator can incorporate both pre-experiment and in-experiment observations. We demonstrate through both a real-world experiment and simulations that our estimator can remain unbiased and achieve precision improvements as large as if triggering status were fully observed, and in some cases can even outperform trigger-dilute analysis.
Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task. Datasets and codes are released at https://github.com/HITsz-TMG/Hansel.
Self-supervised Multi-view Disentanglement for Expansion of Visual Collections
Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. Finally, we show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection.
Efficient Integration of Multi-Order Dynamics and Internal Dynamics in Stock Movement Prediction
Advances in deep neural network (DNN) architectures have enabled new prediction techniques for stock market data. Unlike other multivariate time-series data, stock markets show two unique characteristics: (i) multi-order dynamics, as stock prices are affected by strong non-pairwise correlations (e.g., within the same industry); and (ii) internal dynamics, as each individual stock shows some particular behaviour. Recent DNN-based methods capture multi-order dynamics using hypergraphs, but rely on the Fourier basis in the convolution, which is both inefficient and ineffective. In addition, they largely ignore internal dynamics by adopting the same model for each stock, which implies a severe information loss.
In this paper, we propose a framework for stock movement prediction to overcome the above issues. Specifically, the framework includes temporal generative filters that implement a memory-based mechanism onto an LSTM network in an attempt to learn individual patterns per stock. Moreover, we employ hypergraph attentions to capture the non-pairwise correlations. Here, using the wavelet basis instead of the Fourier basis, enables us to simplify the message passing and focus on the localized convolution. Experiments with US market data over six years show that our framework outperforms state-of-the-art methods in terms of profit and stability. Our source code and data are available at https://github.com/thanhtrunghuynh93/estimate.
Telecommunication Traffic Forecasting via Multi-task Learning
Accurate telecommunication time series forecasting is critical for smart management systems of cellular networks, and has a special challenge in predicting different types of time series simultaneously at one base station (BS), e.g., the SMS, Calls, and Internet. Unlike the well-studied single target forecasting problem for one BS, this distributed multi-target forecasting problem should take advantage of both the intra-BS dependence of different types of time series at the same BS and the inter-BS dependence of time series at different BS. To this end, we first propose a model to learn the inter-BS dependence by aggregating the multi-view dependence, e.g., from the viewpoint of SMS, Calls, and Internet. To incorporate the interBS dependence in time series forecasting, we then propose a Graph Gate LSTM (GGLSTM) model that includes a graph-based gate mechanism to unite those base stations with a strong dependence on learning a collaboratively strengthened prediction model. We also extract the intra-BS dependence by an attention network and use it in the final prediction. Our proposed approach is evaluated on two real-world datasets. Experiment results demonstrate the effectiveness of our model in predicting multiple types of telecom traffic at the distributed base stations.
Combining vs. Transferring Knowledge: Investigating Strategies for Improving Demographic Inference in Low Resource Settings
For some learning tasks, generating a large labeled data set is impractical. Demographic inference using social media data is one such task. While different strategies have been proposed to mitigate this challenge, including transfer learning, data augmentation, and data combination, they have not been explored for the task of user level demographic inference using social media data. This paper explores two of these strategies: data combination and transfer learning. First, we combine labeled training data from multiple data sets of similar size to understand when the combination is valuable and when it is not. Using data set distance, we quantify the relationship between our data sets to help explain the performance of the combination strategy. Then, we consider supervised transfer learning, where we pretrain a model on a larger labeled data set, fine-tune the model on smaller data sets, and incorporate regularization as part of the transfer learning process. We empirically show the strengths and limitations of the proposed techniques on multiple Twitter data sets.
Active Ensemble Learning for Knowledge Graph Error Detection
Knowledge graphs (KGs) could effectively integrate a large number of real-world assertions, and improve the performance of various applications, such as recommendation and search. KG error detection has been intensively studied since real-world KGs inevitably contain erroneous triples. While existing studies focus on developing a novel algorithm dedicated to one or a few data characteristics, we explore advancing KG error detection by assembling a set of state-of-the-art (SOTA) KG error detectors. However, it is nontrivial to develop a practical ensemble learning framework for KG error detection. Existing ensemble learning models heavily rely on labels, while it is expensive to acquire labeled errors in KGs. Also, KG error detection itself is challenging since triples contain rich semantic information and might be false because of various reasons. To this end, we propose to leverage active learning to minimize human efforts. Our proposed framework - KAEL, could effectively assemble a set of off-the-shelf error detection algorithms, by actively using a limited number of manual annotations. It adaptively updates the ensemble learning policy in each iteration based on active queries, i.e., the answers from experts. After all annotation budget is used, KAEL utilizes the trained policy to identify remaining suspicious triples. Experiments on real-world KGs demonstrate that we can achieve significant improvement when applying KAEL to assemble SOTA error detectors. KAEL also outperforms SOTA ensemble learning baselines significantly.
Stochastic Solutions for Dense Subgraph Discovery in Multilayer Networks
Network analysis has played a key role in knowledge discovery and data mining. In many real-world applications in recent years, we are interested in mining multilayer networks, where we have a number of edge sets called layers, which encode different types of connections and/or time-dependent connections over the same set of vertices. Among many network analysis techniques, dense subgraph discovery, aiming to find a dense component in a network, is an essential primitive with a variety of applications in diverse domains. In this paper, we introduce a novel optimization model for dense subgraph discovery in multilayer networks. Our model aims to find a stochastic solution, i.e., a probability distribution over the family of vertex subsets, rather than a single vertex subset, whereas it can also be used for obtaining a single vertex subset. For our model, we design an LP-based polynomial-time exact algorithm. Moreover, to handle large-scale networks, we also devise a simple, scalable preprocessing algorithm, which often reduces the size of the input networks significantly and results in a substantial speed-up. Computational experiments demonstrate the validity of our model and the effectiveness of our algorithms.
Model-based Unbiased Learning to Rank
Unbiased Learning to Rank(ULTR), i.e., learning to rank documents with biased user feedback data, is a well-known challenge in information retrieval. Existing methods in unbiased learning to rank typically rely on click modeling or inverse propensity weighting(IPW). Unfortunately, search engines face the issue of a severe long-tail query distribution, which neither click modeling nor IPW handles well. Click modeling usually requires that the same query-document pair appears multiple times for reliable inference, which makes it fall short for tail queries; IPW suffers from high variance since it is highly sensitive to small propensity score values. Therefore, a general debiasing framework that works well under tail queries is sorely needed. To address this problem, we propose a model-based unbiased learning-to-rank framework. Specifically, we develop a general context-aware user simulator to generate pseudo clicks for unobserved ranked lists to train rankers, which addresses the data sparsity problem. In addition, considering the discrepancy between pseudo clicks and actual clicks, we take the observation of a ranked list as the treatment variable and further incorporate inverse propensity weighting with pseudo labels in a doubly robust way. The derived bias and variance indicate that the proposed model-based method is more robust than existing methods. Extensive experiments on benchmark datasets, including simulated datasets and real click logs, demonstrate that the proposed model-based method consistently outperforms state-of-the-art methods in various scenarios. The code is available at https://github.com/rowedenny/MULTR.
Reducing Negative Effects of the Biases of Language Models in Zero-Shot Setting
Pre-trained language models (PLMs) such as GPTs have been revealed to be biased towards certain target classes because of the prompt and the model's intrinsic biases. In contrast to the fully supervised scenario where there are a large number of costly labeled samples that can be used to fine-tune model parameters to correct for biases, there are no labeled samples available for the zero-shot setting. We argue that a key to calibrating the biases of a PLM on a target task in zero-shot setting lies in detecting and estimating the biases, which remains a challenge. In this paper, we first construct probing samples with the randomly generated token sequences, which are simple but effective in detecting inputs for stimulating GPTs to show the biases; and we pursue an in-depth research on the plausibility of utilizing class scores for the probing samples to reflect and estimate the biases of GPTs on a downstream target task. Furtherly, in order to effectively utilize the probing samples and thus reduce negative effects of the biases of GPTs, we propose a lightweight model Calibration Adapter (CA) along with a self-guided training strategy that carries out distribution-level optimization, which enables us to take advantage of the probing samples to fine-tune and select only the proposed CA, respectively, while keeping the PLM encoder frozen. To demonstrate the effectiveness of our study, we have conducted extensive experiments, where the results indicate that the calibration ability acquired by CA on the probing samples can be successfully transferred to reduce negative effects of the biases of GPTs on a downstream target task, and our approach can yield better performance than state-of-the-art (SOTA) models in zero-shot settings.
Separating Examination and Trust Bias from Click Predictions for Unbiased Relevance Ranking
Alleviating the examination and trust bias in ranking systems is an important research line in unbiased learning-to-rank (ULTR). Current methods typically use the propensity to correct the biased user clicks and then learn ranking models based on the corrected clicks. Though successes have been achieved, directly modifying the clicks suffers from the inherent high variance because the propensities are usually involved in the denominators of corrected clicks. The problem gets even worse in the situation of mixed examination and trust bias. To address the issue, this paper proposes a novel ULTR method called Decomposed Ranking Debiasing (DRD). DRD is tailored for learning unbiased relevance models with low variance in the existence of examination and trust bias. Unlike existing methods that directly modify the original user clicks, DRD proposes to decompose each click prediction as the combination of a relevance term outputted by the ranking model and other bias terms. The unbiased relevance model, therefore, can be learned by fitting the overall click predictions to the biased user clicks. A joint learning algorithm is developed to learn the relevance and bias models' parameters alternatively. Theoretical analysis showed that, compared with existing methods, DRD has lower variance while retains unbiasedness. Empirical studies indicated that DRD can effectively reduce the variance and outperform the state-of-the-art ULTR baselines.
Unbiased and Efficient Self-Supervised Incremental Contrastive Learning
Contrastive Learning (CL) has been proved to be a powerful self-supervised approach for a wide range of domains, including computer vision and graph representation learning. However, the incremental learning issue of CL has rarely been studied, which brings the limitation in applying it to real-world applications. Contrastive learning identifies the samples with the negative ones from the noise distribution that changes in the incremental scenarios. Therefore, only fitting the change of data without noise distribution causes bias, and directly retraining results in low efficiency. To bridge this research gap, we propose a self-supervised Incremental Contrastive Learning (ICL) framework consisting of (i) a novel Incremental InfoNCE (NCE-II) loss function by estimating the change of noise distribution for old data to guarantee no bias with respect to the retraining, (ii) a meta-optimization with deep reinforced Learning Rate Learning (LRL) mechanism which can adaptively learn the learning rate according to the status of the training processes and achieve fast convergence which is critical for incremental learning. Theoretically, the proposed ICL is equivalent to retraining, which is based on solid mathematical derivation. In practice, extensive experiments in different domains demonstrate that, without retraining a new model, ICL achieves up to 16.7x training speedup and 16.8x faster convergence with competitive results.
Pairwise Fairness in Ranking as a Dissatisfaction Measure
Fairness and equity have become central to ranking problems in information access systems, such as search engines, recommender systems, or marketplaces. To date, several types of fair ranking measures have been proposed, including diversity, exposure, and pairwise fairness measures. Out of those, pairwise fairness is a family of metrics whose normative grounding has not been clearly explicated, leading to uncertainty with respect to the construct that is being measured and how it relates to stakeholders' desiderata.
In this paper, we develop a normative and behavioral grounding for pairwise fairness in ranking. Leveraging measurement theory and user browsing models, we derive an interpretation of pairwise fairness centered on the construct of producer dissatisfaction, tying pairwise fairness to perceptions of ranking quality. Highlighting the key limitations of prior pairwise measures, we introduce a set of reformulations that allow us to capture behavioral and practical aspects of ranking systems. These reformulations form the basis for a novel pairwise metric of producer dissatisfaction. Our analytical and empirical study demonstrates the relationship between dissatisfaction, pairwise, and exposure-based fairness metrics, enabling informed adoption of the measures.
Uncertainty Quantification for Fairness in Two-Stage Recommender Systems
Many large-scale recommender systems consist of two stages. The first stage efficiently screens the complete pool of items for a small subset of promising candidates, from which the second-stage model curates the final recommendations. In this paper, we investigate how to ensure group fairness to the items in this two-stage architecture. In particular, we find that existing first-stage recommenders might select an irrecoverably unfair set of candidates such that there is no hope for the second-stage recommender to deliver fair recommendations. To this end, motivated by recent advances in uncertainty quantification, we propose two threshold-policy selection rules that can provide distribution-free and finite-sample guarantees on fairness in first-stage recommenders. More concretely, given any relevance model of queries and items and a point-wise lower confidence bound on the expected number of relevant items for each threshold-policy, the two rules find near-optimal sets of candidates that contain enough relevant items in expectation from each group of items. To instantiate the rules, we demonstrate how to derive such confidence bounds from potentially partial and biased user feedback data, which are abundant in many large-scale recommender systems. In addition, we provide both finite-sample and asymptotic analyses of how close the two threshold selection rules are to the optimal thresholds. Beyond this theoretical analysis, we show empirically that these two rules can consistently select enough relevant items from each group while minimizing the size of the candidate sets for a wide range of settings.
Generating Explainable Product Comparisons for Online Shopping
An essential part of making shopping purchase decisions is to compare and contrast products based on key differentiating features, but doing this manually can be overwhelming. Prior methods offer limited product comparison capabilities, e.g., via pre-defined common attributes that may be difficult to understand, or irrelevant to a particular product or user. Automatically generating an informative, natural-sounding, and factually consistent comparative text for multiple product and attribute types is a challenging research problem. We describe HCPC (Human Centered Product Comparison), to tackle two kinds of comparisons for online shopping: (i) product-specific, to describe and compare products based on their key attributes; and (ii) attribute-specific comparisons, to compare similar products on a specific attribute. To ensure that comparison text is faithful to the input product data, we introduce a novel multi-decoder, multi-task generative language model. One decoder generates product comparison text, and a second one generates supportive, explanatory text in the form of product attribute names and values. The second task imitates a copy mechanism, improving the comparison generator, and its output is used to justify the factual accuracy of the generated comparison text, by training a factual consistency model to detect and correct errors in the generated comparative text. We release a new dataset (https://registry.opendata.aws/) of ~15K human generated sentences, comparing products on one or more attributes (the first such data we know of for product comparison). We demonstrate on this data that HCPC significantly outperforms strong baselines, by ~10% using automatic metrics, and ~5% using human evaluation.
Reducing the Bias of Visual Objects in Multimodal Named Entity Recognition
Visual information shows to empower accurately named entity recognition in short texts, such as posts from social media. Previous work on multimodal named entity recognition (MNER) often regards an image as a set of visual objects, trying to explicitly align visual objects and entities. However, these methods may suffer the bias introduced by visual objects when they are not identical to entities in quantity and entity type. Different from this kind of explicit alignment, we argue that implicit alignment is effective in optimizing the shared semantic space learning between text and image for improving MNER. To this end, we propose a de-bias contrastive learning based approach for MNER, which studies modality alignment enhanced by cross-modal contrastive learning. Specifically, our contrastive learning adopts a hard sample mining strategy and a debiased contrastive loss to alleviate the bias of quantity and entity type, respectively, which globally learns to align the feature spaces from text and image. Finally, the learned semantic space works with a NER decoder to recognize entities in text. Conducted on two benchmark datasets, experimental results show that our approach outperforms the current state-of-the-art methods.
Variance-Minimizing Augmentation Logging for Counterfactual Evaluation in Contextual Bandits
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems, since they allow efficient reuse of existing log data. However, there are fundamental limits to using existing log data alone, since the counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated. To overcome this limitation, we explore the question of how to design data-gathering policies that most effectively augment an existing dataset of bandit feedback with additional observations for both learning and evaluation. To this effect, this paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem. We explore multiple approaches to computing MVAL policies efficiently, and find that they can be substantially more effective in decreasing the variance of an estimator than naïve approaches.
Unbiased Knowledge Distillation for Recommendation
As a promising solution for model compression, knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency. Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge (\iesoft labels ) to supervise the learning of a compact student model. However, we find such a standard distillation paradigm would incur serious bias issue --- popular items are more heavily recommended after the distillation. This effect prevents the student model from making accurate and fair recommendations, decreasing the effectiveness of RS.
In this work, we identify the origin of the bias in KD --- it roots in the biased soft labels from the teacher, and is further propagated and intensified during the distillation. To rectify this, we propose a new KD method with a stratified distillation strategy. It first partitions items into multiple groups according to their popularity, and then extracts the ranking knowledge within each group to supervise the learning of the student. Our method is simple and teacher-agnostic --- it works on distillation stage without affecting the training of the teacher model. We conduct extensive theoretical and empirical studies to validate the effectiveness of our proposal. We release our code at: https://github.com/chengang95/UnKD.
Differentially Private Temporal Difference Learning with Stochastic Nonconvex-Strongly-Concave Optimization
Temporal difference (TD) learning with nonlinear function approximation (nonlinear TD learning for short) permits evaluating policies using neural networks, and is a core component in modern deep reinforcement learning. Though significant advances have been made to improve its effectiveness, little attention has been paid to the data privacy faced when applying it in real applications. To mitigate the privacy concerns in practical applications of nonlinear TD learning, in this paper, we consider preserving its privacy under the notion of differential privacy (DP). This problem is challenging since nonlinear TD learning is usually studied in the formulation of stochastic nonconvex-strongly-concave optimization to obtain finite-sample analysis, which requires simultaneously preserving privacy on both primal and dual sides. To this end, we adopt a single-timescale algorithm, which optimizes both sides using learning rates of the same order, to avoid unnecessary privacy costs. Further, we achieve a good trade-off between the privacy and utility guarantees by perturbing gradients on both sides using Gaussian noises with well-calibrated variances. Consequently, our algorithm achieves rigorous (ε,δ)-DP guarantee with the utility upper bounded by ~O((d/log(1/δ))1/8 (nε)1/4) where n is the trajectory length and d is the ambient dimension of the feature space. Extensive experiments conducted in OpenAI Gym validate the advantages of our algorithm.
Revisiting Code Search in a Two-Stage Paradigm
With a good code search engine, developers can reuse existing code snippets and accelerate software development process. Current code search methods can be divided into two categories: traditional information retrieval (IR) based and deep learning (DL) based approaches. DL-based approaches include the cross-encoder paradigm and the bi-encoder paradigm. However, both approaches have certain limitations. The inference of IR-based and bi-encoder models are fast; however, they are not accurate enough; while cross-encoder models can achieve higher search accuracy but consume more time. In this work, we propose TOSS, a two-stage fusion code search framework that can combine the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-K code candidates, and then uses fine-grained cross-encoders for finer ranking. Furthermore, we conduct extensive experiments on different code candidate volumes and multiple programming languages to verify the effectiveness of TOSS. We also compare TOSS with six data fusion methods. Experimental results show that TOSS is not only efficient, but also achieves state-of-the-art accuracy with an overall mean reciprocal ranking (MRR) score of 0.763, compared to the best baseline result on the CodeSearchNet benchmark of 0.713.
Multi-queue Momentum Contrast for Microvideo-Product Retrieval
The booming development and huge market of micro-videos bring new e-commerce channels for merchants. Currently, more micro-video publishers prefer to embed relevant ads into their micro-videos, which not only provides them with business income but helps the audiences to discover their interesting products. However, due to the micro-video recording by unprofessional equipment, involving various topics and including multiple modalities, it is challenging to locate the products related to micro-videos efficiently, appropriately, and accurately. We formulate the microvideo-product retrieval task, which is the first attempt to explore the retrieval between the multi-modal and multi-modal instances.
A novel approach named Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval, consisting of the uni-modal feature and multi-modal instance representation learning. Moreover, a discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories. We collect two large-scale microvideo-product datasets (MVS and MVS-large) for evaluation and manually construct the hierarchical category ontology, which covers sundry products in daily life. Extensive experiments show that MQMC outperforms the state-of-the-art baselines. Our replication package (including code, dataset, etc.) is publicly available at https://github.com/duyali2000/MQMC.
Beyond Hard Negatives in Product Search: Semantic Matching Using One-Class Classification (SMOCC)
Semantic matching is an important component of a product search pipeline. Its goal is to capture the semantic intent of the search query as opposed to the syntactic matching performed by a lexical matching system. A semantic matching model captures relationships like synonyms, and also captures common behavioral patterns to retrieve relevant results by generalizing from purchase data. They however suffer from lack of availability of informative negative examples for model training. Various methods have been proposed in the past to address this issue based upon hard-negative mining and contrastive learning.
In this work, we propose a novel method for semantic matching based on one-class classification called SMOCC. Given a query and a relevant product, SMOCC generates the representation of an informative negative which is then used to train the model. Our method is based on the idea of generating negatives by using adversarial search in the neighborhood of the positive examples. We also propose a novel approach for selecting the radius to generate adversarial negative products around queries based on the model's understanding of the query. Depending on how we select the radius, we propose two variants of our method: SMOCC-QS, that quantizes the queries using their specificity, and SMOCC-EM, that uses expectation-maximization paradigm to iteratively learn the best radius. We show that our method outperforms the state-of-the-art hard negative mining approaches by increasing the purchase recall by 3 percentage points, and improving the percentage of exacts retrieved by up to 5 percentage points while reducing irrelevant results by 1.8 percentage points.
Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval
Zero-shot entity retrieval, aiming to link mentions to candidate entities under the zero-shot setting, is vital for many tasks in Natural Language Processing. Most existing methods represent mentions/entities via the sentence embeddings of corresponding context from the Pre-trained Language Model. However, we argue that such coarse-grained sentence embeddings can not fully model the mentions/entities, especially when the attention scores towards mentions/entities are relatively low. In this work, we propose GER, a Graph enhanced Entity Retrieval framework, to capture more fine-grained information as complementary to sentence embeddings. We extract the knowledge units from the corresponding context and then construct a mention/entity centralized graph. Hence, we can learn the fine-grained information about mention/entity by aggregating information from these knowledge units. To avoid the graph bottleneck for the central mention/entity node, we construct a hierarchical graph and design a novel Hierarchical Graph Attention Network~(HGAN). Experimental results on popular benchmarks demonstrate that our proposed GER framework performs better than previous state-of-the-art models.
Feature Missing-aware Routing-and-Fusion Network for Customer Lifetime Value Prediction in Advertising
Nowadays, customer lifetime value (LTV) plays an important role in mobile game advertising, since it can be beneficial to adjust ad bids and ensure that the games are promoted to the most valuable users. Some neural models are utilized for LTV prediction based on the rich user features. However, in the advertising scenario, due to the privacy settings or limited length of log retention, etc, most of existing approaches suffer from the missing feature problem. Moreover, only a small fraction of purchase behaviours can be observed. The label sparsity inevitably limits model expressiveness. To tackle the aforementioned challenges, we propose a feature missing-aware routing-and-fusion network (MarfNet) to reduce the effect of the missing features while training. Specifically, we calculate the missing states of raw features and feature interactions for each sample. Based on the missing states, two missing-aware layers are designed to route samples into different experts, thus each expert can focus on the real features of samples assigned to it. Finally we get the missing-aware representation by the weighted fusion of the experts. To alleviate the label sparsity, we further propose a batch-in dynamic discrimination enhanced (Bidden) loss weight mechanism, which can automatically assign greater loss weights to difficult samples in the training process. Both offline experiments and online A/B tests have validated the superiority of our proposed Bidden-MarfNet.
Multimodal Pre-Training with Self-Distillation for Product Understanding in E-Commerce
Product understanding refers to a series of product-centric tasks, such as classification, alignment and attribute values prediction, which requires fine-grained fusion of various modalities of products. Excellent product modeling ability will enhance the user experience and benefit search and recommendation systems. In this paper, we propose MBSD, a pre-trained vision-and-language model which can integrate the heterogeneous information of product in a single stream BERT-style architecture. Compared with current approaches, MBSD uses a lightweight convolutional neural network instead of a heavy feature extractor for image encoding, which has lower latency. Besides, we cleverly utilize user behavior data to design a two-stage pre-training task to understand products from different perspectives. In addition, there is an underlying imbalanced problem in multimodal pre-training, which will impairs downstream tasks. To this end, we propose a novel self-distillation strategy to transfer the knowledge in dominated modality to weaker modality, so that each modality can be fully tapped during pre-training. Experimental results on several product understanding tasks demonstrate that the performance of MBSD outperforms the competitive baselines.
Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation
Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models provides great support for designing neural cross-lingual retrieval models. However, due to unbalanced pre-training data in different languages, multilingual language models have already shown a performance gap between high and low-resource languages in many downstream tasks. And cross-lingual retrieval models built on such pre-trained models can inherit language bias, leading to suboptimal result for low-resource languages. Moreover, unlike the English-to-English retrieval task, where large-scale training collections for document ranking such as MS MARCO are available, the lack of cross-lingual retrieval data for low-resource language makes it more challenging for training cross-lingual retrieval models. In this work, we propose OPTICAL: <u>Op</u>timal <u>T</u>ransport dist<u>i</u>llation for low-resource <u>C</u>ross-lingual information retrieval. To transfer a model from high to low resource languages, OPTICAL forms the cross-lingu<u>al</u> token alignment task as an optimal transport problem to learn from a well-trained monolingual retrieval model. By separating the cross-lingual knowledge from knowledge of query document matching, OPTICAL only needs bitext data for distillation training, which is more feasible for low-resource languages. Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages, including neural machine translation.
An F-shape Click Model for Information Retrieval on Multi-block Mobile Pages
Most click models focus on user behaviors towards a single list. However, with the development of user interface (UI) design, the layout of displayed items on a result page tends to be multi-block style instead of a single list, which requires different assumptions to model user behaviors more accurately. There exist click models for multi-block pages in desktop contexts, but they cannot be directly applied to mobile scenarios due to different interaction manners, result types and especially multi-block presentation styles. In particular, multi-block mobile pages can normally be decomposed into interleavings of basic vertical blocks and horizontal blocks, thus resulting in typically F-shape forms. To mitigate gaps between desktop and mobile contexts for multi-block pages, we conduct a user eye-tracking study, and identify users' sequential browsing, block skip and comparison patterns on F-shape pages. These findings lead to the design of a novel F-shape Click Model (FSCM), which serves as a general solution to multi-block mobile pages. Firstly, we construct a Directed Acyclic Graph (DAG) for each page, where each item is regarded as a vertex and each edge indicates the user's possible examination flow. Secondly, we propose DAG-structured GRUs and a comparison module to model users' sequential (sequential browsing, block skip) and non-sequential (comparison) behaviors respectively. Finally, we combine GRU states and comparison patterns to perform user click predictions. Experiments show that FSCM outperforms baseline models.
Boosting Advertising Space: Designing Ad Auctions for Augment Advertising
In online e-commerce platforms, sponsored ads are always mixed with non-sponsored organic content (recommended items). To guarantee user experience, online platforms always impose strict limitations on the number of ads displayed, becoming the bottleneck for advertising revenue. To boost advertising space, we introduce a novel advertising business paradigm called Augment Advertising, where once a user clicks on a leading ad on the main page, instead of being shown the corresponding products, a collection of mini-detail ads relevant to the clicked ad is displayed. A key component for augment advertising is to design ad auctions to jointly select leading ads on the main page and mini-detail ads on the augment ad page. In this work, we decouple the ad auction into a two-stage auction, including a leading ad auction and a mini-detail ad auction. We design the Potential Generalized Second Price (PGSP) auction with Symmetric Nash Equilibrium (SNE) for leading ads, and adopt GSP auction for mini-detail ads. We have deployed augment advertising on Taobao advertising platform, and conducted extensive offline evaluations and online A/B tests. The evaluation results show that augment advertising could guarantee user experience while improving the ad revenue and the PGSP auction outperforms baselines in terms of revenue and user experience in augment advertising.
A Bird's-eye View of Reranking: From List Level to Page Level
Reranking, as the final stage of multi-stage recommender systems, refines the initial lists to maximize the total utility. With the development of multimedia and user interface design, the recommendation page has evolved to a multi-list style. Separately employing traditional list-level reranking methods for different lists overlooks the inter-list interactions and the effect of different page formats, thus yielding suboptimal reranking performance. Moreover, simply applying a shared network for all the lists fails to capture the commonalities and distinctions in user behaviors on different lists. To this end, we propose to draw a bird's-eye view of page-level reranking and design a novel Page-level Attentional Reranking (PAR) model. We introduce a hierarchical dual-side attention module to extract personalized intra- and inter-list interactions. A spatial-scaled attention network is devised to integrate the spatial relationship into pairwise item influences, which explicitly models the page format. The multi-gated mixture-of-experts module is further applied to capture the commonalities and differences of user behaviors between different lists. Extensive experiments on a public dataset and a proprietary dataset show that PAR significantly outperforms existing baseline models.
Long-Document Cross-Lingual Summarization
Cross-Lingual Summarization (CLS) aims at generating summaries in one language for the given documents in another language. CLS has attracted wide research attention due to its practical significance in the multi-lingual world. Though great contributions have been made, existing CLS works typically focus on short documents, such as news and guides. Different from these short texts, long documents such as academic articles usually discuss complicated subjects and consist of thousands of words, making them non-trivial to process and summarize. To promote CLS research on long documents, we construct Perseus, the first long-document CLS dataset which collects about 94K Chinese scientific documents paired with English summaries. The average length of documents in Perseus is more than 2000 tokens. As a preliminary study on long-document CLS, we build and evaluate various CLS baselines, including pipeline and end-to-end methods. Experimental results on Perseus show the superiority of the end-to-end baseline, which performs the best among all methods. Furthermore, to provide a deeper understanding, we manually analyze the model outputs and discuss specific challenges faced by current approaches. We hope that our work could benchmark long-document CLS and benefit future studies.
FineSum: Target-Oriented, Fine-Grained Opinion Summarization
Target-oriented opinion summarization is to profile a target by extracting user opinions from multiple related documents. Instead of simply mining opinion ratings on a target (e.g., a restaurant) or on multiple aspects (e.g., food, service) of a target, it is desirable to go deeper, to mine opinion on fine-grained sub-aspects (e.g., fish). However, it is expensive to obtain high-quality annotations at such fine-grained scale. This leads to our proposal of a new framework, FineSum, which advances the frontier of opinion analysis in three aspects: (1) minimal supervision, where no document-summary pairs are provided, only aspect names and a few aspect/sentiment keywords are available; (2) fine-grained opinion analysis, where sentiment analysis drills down to a specific subject or characteristic within each general aspect; and (3) phrase-based summarization, where short phrases are taken as basic units for summarization, and semantically coherent phrases are gathered to improve the consistency and comprehensiveness of summary. Given a large corpus with no annotation, FineSum first automatically identifies potential spans of opinion phrases, and further reduces the noise in identification results using aspect and sentiment classifiers. It then constructs multiple fine-grained opinion clusters under each aspect and sentiment. Each cluster expresses uniform opinions towards certain sub-aspects (e.g., "fish" in "food" aspect) or characteristics (e.g., "Mexican" in "food" aspect). To accomplish this, we train a spherical word embedding space to explicitly represent different aspects and sentiments. We then distill the knowledge from embedding to a contextualized phrase classifier, and perform clustering using the contextualized opinion-aware phrase embedding. Both automatic evaluations on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which make it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Specifically, to keep the dialog on track for professional interviews, we pre-train a knowledge selector module to extract information from resume in the job-resume matching. A dialog generator is also pre-trained with ungrounded dialogs, learning to generate fluent responses. Then, a decoding manager is finetuned to combine information from the two pre-trained modules to generate the interview question. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
Concept-Oriented Transformers for Visual Sentiment Analysis
In the richly multimedia Web, detecting sentiment signals expressed in images would support multiple applications, e.g., measuring customer satisfaction from online reviews, analyzing trends and opinions from social media. Given an image, visual sentiment analysis aims at recognizing positive or negative sentiment, and occasionally neutral sentiment as well. A nascent yet promising direction is Transformer-based models applied to image data, whereby Vision Transformer (ViT) establishes remarkable performance on large-scale vision benchmarks. In addition to investigating the fitness of ViT for visual sentiment analysis, we further incorporate concept orientation into the self-attention mechanism, which is the core component of Transformer. The proposed model captures the relationships between image features and specific concepts. We conduct extensive experiments on Visual Sentiment Ontology (VSO) and Yelp.com online review datasets, showing that not only does the proposed model significantly improve upon the base model ViT in detecting visual sentiment but it also outperforms previous visual sentiment analysis models with narrowly-defined orientations. Additional analyses yield insightful results and better understanding of the concept-oriented self-attention mechanism.
SESSION: Demo Session I
UnCommonSense in Action! Informative Negations for Commonsense Knowledge Bases
Knowledge bases about commonsense knowledge i.e., CSKBs, are crucial in applications such as search and question answering. Prominent CSKBs mostly focus on positive statements. In this paper we show that materializing important negations increases the usability of CSKBs. We present Uncommonsense, a web portal to explore informative negations about everyday concepts: (i) in a research-focused interface, users get a glimpse into results-per-steps of the methodology; (ii) in a trivia interface, users can browse fun negative trivia about concepts of their choice; and (iii) in a query interface, users can submit triple-pattern queries with explicit negated relations and compare results with significantly less relevant answers from the positive-only baseline. It can be accessed at:https://uncommonsense.mpi-inf.mpg.de/.
A Framework for Detecting Frauds from Extremely Few Labels
In this paper, we present a framework to deal with the fraud detection task with extremely few labeled frauds. We involve human intelligence in the loop in a labor-saving manner and introduce several ingenious designs to the model construction process. Namely, a rule mining module is introduced, and the learned rules will be refined with expert knowledge. The refined rules will be used to relabel the unlabeled samples and get the potential frauds. We further present a model to learn with the reliable frauds, the potential frauds, and the rest normal samples. Note that the label noise problem, class imbalance problem, and confirmation bias problem are all addressed with specific strategies when building the model. Experimental results are reported to demonstrate the effectiveness of the framework.
MMBench: The Match Making Benchmark
Video gaming has gained huge popularity over the last few decades. As reported, there are about 2.9 billion gamers globally. Among all genres, competitive games are one of the most popular ones.
Matchmaking is a core problem for competitive games, which determines the player satisfaction, hence influences the game success. Most matchmaking systems group the queuing players into opposing teams with similar skill levels. The key challenge is to accurately rate the players' skills based on their match performances. There has been an increasing amount of effort on developing such rating systems such as Elo, Glicko.
However, games with different game-plays might have different game modes, which might require an extensive amount of effort for rating system customization. Even though there are many rating system choices and various customization strategies, there is a clear lack of a systematic framework with which different rating systems can be analysed and compared against each other. Such a framework could help game developers to identify the bottlenecks of their matchmaking systems and enhance the performance of their matchmaking systems.
To bridge the gap, we present MMBench, the first benchmark framework for evaluating different rating systems. It serves as a fair means of comparison for different rating systems and enables a deeper understanding of different rating systems. In this paper, we will present how MMBench could benchmark the three major rating systems, Elo, Glicko, Trueskill in the battle modes of 1 vs 1, n vs n, battle royal and teamed battle royal over both real and synthetic datasets.
SoCraft: Advertiser-level Predictive Scoring for Creative Performance on Meta
In this technical demonstration, we present SoCraft, a framework to build an advertiser-level multimedia ad content scoring platform for Meta Ads. The system utilizes a multimodal deep neural architecture to score and evaluate advertised content on Meta using both high- and low-level features of its contextual data such as text, image, targeting, and ad settings. In this demo, we present two deep models, SoDeep and SoWide, and validate the effectiveness of SoCraft with a successful real-world case study in Singapore.
DisKeyword: Tweet Corpora Exploration for Keyword Selection
How to accelerate the search for relevant topical keywords within a tweet corpus? Computational social scientists conducting topical studies employ large, self-collected or crowdsourced social media datasets such as tweet corpora. Comprehensive sets of relevant keywords are often necessary to sample or analyze these data sources. However, naively skimming through thousands of keywords can quickly become a daunting task. In this study, we present a web-based application to simplify the search for relevant topical hashtags in a tweet corpus. DisKeyword allows users to grasp high-level trends in their dataset, while iteratively labeling keywords recommended based on their links to prior labeled hashtags. We open-source our code under the MIT license.
AgAsk: A Conversational Search Agent for Answering Agricultural Questions
While large amounts of potentially useful agricultural resources (journal articles, manuals, reports) are available, their value cannot be realised if they cannot be easily searched and presented to the agriculture users in a digestible form.AgAsk is a conversational search system for the agricultural domain, providing tailored answers to growers questions. AgAsk is underpinned by an efficient and effective neural passage ranking model fine-tuned on real world growers' questions. An adaptable, messaging-style user interface is deployed via the Telegram messaging platform, allowing users to ask natural language questions via text or voice, and receive short natural language answers as replies.
AgAsk is empirically evaluated on an agricultural passage retrieval test collection. The system provides a single entry point to access the information needed for better growing decisions. Much of the system is domain agnostic and would benefit other domains. AgAsk can be interacted via Telegram; further information about AgAsk, including codebases, instructions and demonstration videos can be accessed at https://ielab.io/publications/agask-agent.
Privacy Aware Experiments without Cookies
Consider two brands that want to jointly test alternate web experiences for their customers with an A/B test. Such collaborative tests are today enabled usingthird-party cookies, where each brand has information on the identity of visitors to another website, ensuring a consistent treatment experience. With the imminent elimination of third-party cookies, such A/B tests will become untenable. We propose a two-stage experimental design, where the two brands only need to agree on high-level aggregate parameters of the experiment to test the alternate experiences. Our design respects the privacy of customers. We propose an unbiased estimator of the Average Treatment Effect (ATE), and provide a way to use regression adjustment to improve this estimate. On real and simulated data, we show that the approach provides valid estimate of the ATE and is robust to the proportion of visitors overlapping across the brands. Our demonstration describes how a marketer can design such an experiment and analyze the results.
ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling
The power of artificial intelligence (AI) models originates with sophisticated model architecture as well as the sheer size of the model. These large-scale AI models impose new and challenging system requirements regarding scalability, reliability, and flexibility. One of the most promising solutions in the industry is to train these large-scale models on distributed deep-learning frameworks. With the power of all distributed computations, it is desired to achieve a training process with excellent scalability, elastic scheduling (flexibility), and fault tolerance (reliability). In this paper, we demonstrate the scalability, flexibility, and reliability of our open-source Elastic Deep Learning (ElasticDL) framework. Our ElasticDL utilizes an open-source system, i.e., Kubernetes, for automating deployment, scaling, and management of containerized application features to provide fault tolerance and support elastic scheduling for DL tasks.
PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals
We propose a PiggyBack, a Visual Question Answering platform that allows users to apply the state-of-the-art visual-language pretrained models easily. We integrate visual-language models, pretrained by HuggingFace, an open-source API platform of deep learning technologies; however, it cannot be runnable without programming skills or deep learning understanding. Hence, our PiggyBack supports an easy-to-use browser-based user interface with several deep-learning visual language pretrained models for general users and domain experts. The PiggyBack includes the following benefits: Portability due to web-based and thus runs on almost any platform, A comprehensive data creation and processing technique, and ease of use on visual language pretrained models. The demo video can be found at https://youtu.be/iz44RZ1lF4s.
A Synthetic Search Session Generator for Task-Aware Information Seeking and Retrieval
For users working on a complex search task, it is common to address different goals at various stages of the task through query iterations. While addressing these goals, users go through different task states as well. Understanding these task states latent under users' interactions is crucial in identifying users' changing intents and search behaviors to simulate and achieve real-time adaptive search recommendations and retrievals. However, the availability of sizeable real-world web search logs is scarce due to various ethical and privacy concerns, thus often challenging to develop generalizable task-aware computation models. Furthermore, session logs with task state labels are rarer. For many researchers who lack the resources to directly and at scale collect data from users and conduct a time-consuming data annotation process, this becomes a considerable bottleneck to furthering their research. Synthetic search sessions have the potential to address this gap. This paper shares a parsimonious model to simulate synthetic web search sessions with task state information, which interactive information retrieval (IIR) and search personalization studies could utilize to develop and evaluate task-based search and retrieval systems.
UserSimCRS: A User Simulation Toolkit for Evaluating Conversational Recommender Systems
We present an extensible user simulation toolkit to facilitate automatic evaluation of conversational recommender systems. It builds on an established agenda-based approach and extends it with several novel elements, including user satisfaction prediction, persona and context modeling, and conditional natural language generation. We showcase the toolkit with a pre-existing movie recommender system and demonstrate its ability to simulate dialogues that mimic real conversations, while requiring only a handful of manually annotated dialogues as training data.
Travel Bird: A Personalized Destination Recommender with TourBERT and Airbnb Experiences
We present Travel Bird, a novel personalized destination recommendation and exploration interface which allows its users to find their next tourist destination by describing their specific preferences in a narrative form. Unlike other solutions, Travel Bird is based on TourBERT, a novel NLP model we developed, specifically tailored to the tourism domain. Travel Bird creates a two-dimensional personalized destination exploration space from TourBERT embeddings of social media content and the users' textual description of the experience they are looking for. In this demo, we will showcase several use cases for Travel Bird, which are beneficial for consumers and destination management organizations.
SESSION: Demo Session II
CoQEx: Entity Counts Explained
For open-domain question answering, queries on entity counts, such ashow many languages are spoken in Indonesia, are challenging. Such queries can be answered through succinct contexts with counts:estimated 700 languages, and instances:Javanese and Sundanese. Answer candidates naturally give rise to a distribution, where count contexts denoting the queried entity counts and their semantic subgroups often coexist, while the instances ground the counts in their constituting entities. In this demo we showcase the CoQEx methodology (<u>Co</u>unt <u>Q</u>ueries <u>Ex</u>plained) [5,6], which aggregates and structures explanatory evidence across search snippets, for answering user queries related to entity counts . Given a entity count query, our system CoQEx retrieves search-snippets and provides the user with a distribution-aware prediction prediction, categorizes the count contexts into semantic groups and ranks instances grounding the counts, all in real-time. Our demo can be accessed athttps://nlcounqer.mpi-inf.mpg.de/.
Web of Conferences: A Conference Knowledge Graph
Academic conferences have been proven to be significant in facilitating academic activities. To promote information retrieval specific to academic conferences, building complete, systematic, and professional conference knowledge graphs is a crucial task. However, many related systems mainly focus on general knowledge of overall academic information or concentrate services on specific domains. Aiming at filling this gap, this work demonstrates a novel conference knowledge graph, namely Web of Conferences. The system accommodates detailed conference profiles, conference ranking lists, intelligent conference queries, and personalized conference recommendations. Web of Conferences supports detailed conference information retrieval while providing the ranking of conferences based on the most recent data. Conference queries in the system can be implemented via precise search or fuzzy search. Then, according to users' query conditions, personalized conference recommendations are available. Web of Conferences is demonstrated with a user-friendly visualization interface and can be served as a useful information retrieval system for researchers.
MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction
Boolean query construction is often critical for medical systematic review literature search. To create an effective Boolean query, systematic review researchers typically spend weeks coming up with effective query terms and combinations. One challenge to creating an effective systematic review Boolean query is the selection of effective MeSH Terms to include in the query. In our previous work, we created neural MeSH term suggestion methods and compared them to state-of-the-art MeSH term suggestion methods. We found neural MeSH term suggestion methods to be highly effective.
In this demonstration, we build upon our previous work by creating (1) a Web-based MeSH term suggestion prototype system that allows users to obtain suggestions from a number of underlying methods and (2) a Python library that implements ours and others' MeSH term suggestion methods and that is aimed at researchers who want to further investigate, create or deploy such type of methods. We describe the architecture of the web-based system and how to use it for the MeSH term suggestion task. For the Python library, we describe how the library can be used for advancing further research and experimentation, and we validate the results of the methods contained in the library on standard datasets. Our web-based prototype system is available at http://ielab-mesh-suggest.uqcloud.net, while our Python library is at https://github.com/ielab/meshsuggestlib.
Developing and Evaluating Graph Counterfactual Explanation with GRETEL
The black-box nature and the lack of interpretability detract from constant improvements in Graph Neural Networks (GNNs) performance in social network tasks like friendship prediction and community detection. Graph Counterfactual Explanation (GCE) methods aid in understanding the prediction of GNNs by generating counterfactual examples that promote trustworthiness, debiasing, and privacy in social networks. Alas, the literature on GCE lacks standardised definitions, explainers, datasets, and evaluation metrics. To bridge the gap between the performance and interpretability of GNNs in social networks, we discuss GRETEL, a unified framework for GCE methods development and evaluation. We demonstrate how GRETEL comes with fully extensible built-in components that allow users to define ad-hoc explainer methods, generate synthetic datasets, implement custom evaluation metrics, and integrate state-of-the-art prediction models.
DistriBayes: A Distributed Platform for Learning, Inference and Attribution on Large Scale Bayesian Network
To improve the marketing performance in the financial scenario, it is necessary to develop a trustworthy model to analyze and select promotion-sensitive customers. Bayesian Network (BN) is suitable for this task because of its interpretability and flexibility, but it usually suffers the exponentially growing computation complexity as the number of nodes grows. To tackle this problem, we present a comprehensive distributed platform named DistriBayes, which can efficiently learn, infer and attribute on a large-scale BN all-in-one platform. It implements several score-based structure learning methods, loopy belief propagation with backdoor adjustment for inference, and a carefully optimized search procedure for attribution. Leveraging the distributed cluster, DistriBayes can finish the learning and attribution on Bayesian Network with hundreds of nodes and millions of samples in hours.
SimSumIoT: A Platform for Simulating the Summarisation from Internet of Things
Summarising from the Web could be formed as a problem of multi-document Summarisaiton (MDS) from multiple sources. In contrast to the current MDS problem that involves working on benchmark datasets which provide well clustered set of documents, we envisage to build a pipeline for content Summarisaiton from the Web, but narrow down to the Social Internet of Things (SIoT) paradigm, starting at data collection from the IoT objects, then applying natural language processing techniques for grouping and summarising the data, to distributing summaries back to the IoT objects. In this paper, we present our simulation tool, SimSumIoT, that simulates the process of data sharing, receiving, clustering, and Summarisaiton. A Web-based interface is developed for this purpose allowing users to visualize the process through a set of interactions. The Web interface is accessible via http://simsumlot.tk.
AntTS: A Toolkit for Time Series Forecasting in Industrial Scenarios
Time series forecasting is an important ingredient in the intelligence of business and decision processes. In industrial scenarios, the time series of interest are mostly macroscopic time series that are aggregated from microscopic time series, e.g., the retail sales is aggregated from the sales of different goods, and that are also intervened by certain treatments on the microscopic individuals, e.g., issuing discount coupons on some goods to increase the retail sales. These characteristics are not considered in existing toolkits, which just focus on the "natural" time series forecasting that predicts the future value based on historical data, regardless of the impact of treatments. In this paper, we present AntTS, a time series toolkit paying more attention on the forecasting of the macroscopic time series with underlying microscopic time series and certain treatments, besides the "natural" time series forecasting. AntTS consists of three decoupled modules, namely Clustering module, Natural Forecasting module, and Effect module, which are utilized to study the homogeneous groups of microscopic individuals, the "natural" time series forecasting of homogeneous groups, and the treatment effect estimation of homogeneous groups. With the combinations of different modules, it can exploit the microscopic individuals and the interventions on them, to help the forecasting of macroscopic time series. We show that AntTS helps address many typical tasks in the industry.
"Just To See You Smile": SMILEY, a Voice-Guided <strike>GUY</strike> GAN
In this technical demonstration, we present SMILEY, a voice-guided virtual assistant. The system utilizes a deep neural architecture ContraCLIP to manipulate facial attributes using voice instructions, allowing for deeper speaker engagement and smoother customer experience when being used in the "virtual concierge" scenario. We validate the effectiveness of SMILEY and ContraCLIP via a successful real-world case study in Singapore and a large-scale quantitative evaluation.
Unsupervised Question Duplicate and Related Questions Detection in e-learning platforms
Online learning platforms provide diverse questions to gauge the learners' understanding of different concepts. The repository of questions has to be constantly updated to ensure a diverse pool of questions to conduct assessments for learners. However, it is impossible for the academician to manually skim through the large repository of questions to check for duplicates when onboarding new questions from external sources. Hence, we propose a toolQDup in this paper that can surface near-duplicate and semantically related questions without any supervised data. The proposed tool follows an unsupervised hybrid pipeline of statistical and neural approaches for incorporating different nuances in similarity for the task of question duplicate detection. We demonstrate thatQDup can detect near-duplicate questions and also suggest related questions for practice with remarkable accuracy and speed from a large repository of questions. The demo video of the tool can be found at https://www.youtube.com/watch?v=loh0_-7XLW4.
DOCoR: Document-level OpenIE with Coreference Resolution
Open Information Extraction (OpenIE) extracts relational fact tuples in the form of <subject, relation, object> from text. Most existing OpenIE solutions operate at sentence level and extract relational tuples solely from a sentence. However, many sentences exist as a part of paragraph or a document, where coreferencing is common. In this demonstration, we present a system which refines the semantic tuples generated by OpenIE with the aid of a coreference resolution tool. Specifically, all coreferential mentions across the entire document are identified and grouped into coreferential clusters. Objects and subjects in the extracted tuples from OpenIE which match any coreferential mentions are then resolved with a suitable representative term. In this way, our system is able to resolve both anaphoric and cataphoric references, to achieve Document-level OpenIE with Coreference Resolution (DOCoR). The demonstration video can be viewed at https://youtu.be/o9ZSWCBvlDs
A Semantic Search Framework for Similar Audit Issue Recommendation in Financial Industry
Audit issues summarize the findings during audit reviews and provide valuable insights of risks and control gaps in a financial institute. Despite the wide use of data analytics and NLP in financial services, due to the diverse coverage and lack of annotations, there are very few use cases that analyze audit issue writing and derive insights from it. In this paper, we propose a deep learning based semantic search framework to search, rank and recommend similar past issues based on new findings. We adopt a two-step approach. First, a TF-IDF based search algorithm and a Bi-Encoder are used to shortlist a set of issue candidates based on the input query. Then a Cross-Encoder will re-rank the candidates and provide the final recommendation. We will also demonstrate how the models are deployed and integrated with the existing workbench to benefit auditors in their daily work.
SESSION: Doctoral Consortium
Classification of Different Participating Entities in the Rise of Hateful Content in Social Media
Hateful content is a growing concern across different platforms, whether it is a moderated platform or an unmoderated platform. The public expression of hate speech encourages the devaluation of minority members. It has some consequences in the real world as well. In such a scenario, it is necessary to design AI systems that could detect such harmful entities/elements in online social media and take cautionary actions to mitigate the risk/harm they cause to society. The way individuals disseminate content on social media platforms also deviates. The content can be in the form of texts, images, videos, etc. Hence hateful content in all forms should be detected, and further actions should be taken to maintain the civility of the platform. We first introduced two published works addressing the challenges of detecting low-resource multilingual abusive speech and hateful user detection. Finally, we discuss our ongoing work on multimodal hateful content detection.
Generalizing Graph Neural Network across Graphs and Time
Graph-structured data widely exist in diverse real-world scenarios, analysis of these graphs can uncover valuable insights about their respective application domains. However, most previous works focused on learning node representation from a single fixed graph, while many real-world scenarios require representations to be quickly generated for unseen nodes, new edges, or entirely new graphs. This inductive ability is essential for high-throughtput machine learning systems. However, this inductive graph representation problem is quite difficult, compared to the transductive setting, for that generalizing to unseen nodes requires new subgraphs containing the new nodes to be aligned to the neural network trained already. Meanwhile, following a message passing framework, graphneural network (GNN) is an inductive and powerful graph representation tool. We further explore inductive GNN from more specific perspectives: (1) generalizing GNN across graphs, in which we tackle with the problem of semi-supervised node classification across graphs; (2) generalizing GNN across time, in which we mainly solve the problem of temporal link prediction; (3) generalizing GNN across tasks; (4) generalizing GNN across locations.
Graphs: Privacy and Generation through ML
Graphs are ubiquitous, which makes machine learning on graphs an important research area. While there are many aspects to this field, our research is focused primarily on two aspects of it. The first research question concerns privacy in graphs, where our work primarily focuses on preserving structural privacy in graphs. The second research question is about generating graphs. With applications in various fields such as drug discovery, designing novel proteins, etc., graph generation is emerging as an essential problem. This paper briefly describes the problems and the methodology to address them.
Data-Efficient Graph Learning Meets Ethical Challenges
Recommender systems have achieved great success in our daily life. In recent years, the ethical concerns of AI systems have gained lots of attention. At the same time, graph learning techniques are powerful in modelling the complex relations among users and items under recommender system applications. These graph learning- based methods are data hungry, which brought a significant data efficiency challenge. In this proposal, I introduce my PhD research from three aspects: 1) Efficient privacy-preserving recommendation for imbalanced data. 2) Efficient recommendation model training for Insufficient samples. 3) Explainability in the social recommendation. Challenges and solutions of the above research problems have been proposed in this proposal.
From Classic GNNs to Hyper-GNNs for Detecting Camouflaged Malicious Actors
Graph neural networks (GNNs), which extend deep learning models to graph-structured data, have achieved great success in many applications such as detecting malicious activities. However, GNN-based models are vulnerable to camouflage behavior of malicious actors, i.e., the performance of existing GNN-based models has been hindered significantly. In this research proposal, we follow two research directions to address this challenge. One direction focuses on enhancing the existing GNN-based models and enabling them to identify both camouflaged and non-camouflaged malicious actors. In this regard, we propose to explore an adaptive aggregation strategy, which empowers GNN-based models to handle camouflage behavior of fraudsters. The other research direction concentrates on leveraging hypergraph neural networks (hyper-GNNs) to learn nodes' representation for more effective identification of camouflaged malicious actors.
Efficient Graph Learning for Anomaly Detection Systems
Anomaly detection plays a significant role in preventing from detrimental effects of abnormalities. It brings many benefits in real-world sectors ranging from transportation, finance to cybersecurity. In reality, millions of data do not stand independently, but they might be connected to each other and form graph or network data. A more advanced technique, named graph anomaly detection, is required to model that data type. The current works of graph anomaly detection have achieved state-of-the-art performance compared to regular anomaly detection. However, most models ignore the efficiency aspect, leading to several problems like technical bottlenecks. This project mainly focuses on improving the efficiency aspect of graph anomaly detection while maintaining its performance.
Self-supervision and Controlling Techniques to Improve Counter Speech Generation
Hate speech is a challenging problem in today's online social media. One of the current solutions followed by different social media platforms is detecting hate speech using human-in-the-loop approaches. After detection, they moderate such hate speech by deleting the posts or suspending the users. While this approach can be a short-term solution for reducing the spread of hate, many researchers argue that it stifles freedom of expression. An alternate strategy that does not hamper freedom of expression is counterspeech. Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. This pipeline has two major challenges 1) How to improve the performance of generation without a large-scale dataset since building the dataset is costly 2) How to add control in the counter speech generation to make it more personalized. In this paper, we present our published and proposed research aimed at solving these two challenges.
Understanding the Effect of Outlier Items in E-commerce Ranking
Implicit feedback is an attractive source of training data in Learning to Rank (LTR). However, naively use of this data can produce unfair ranking policies originating from both exogenous and endogenous factors. Exogenous factors comes from biases in the training data, which can lead to rich-get-richer dynamics. Endogenous factors can result in ranking policies that do not allocate exposure among items in a fair way. Item exposure is a common components influencing both endogenous and exogenous factors which depends on not only position but also Inter-item dependencies. In this project, we focus on a specific case of these Inter-item dependencies which is the existence of an outlier in the list. We first define and formalize outlierness in ranking, then study the effects of this phenomenon on endogenous and exogenous factors. We further investigate the visual aspects of presentational features and their impact on item outlierness.
Knowledge-Augmented Methods for Natural Language Processing
Knowledge in NLP has been a rising trend especially after the advent of large-scale pre-trained models. Knowledge is critical to equip statistics-based models with common sense, logic and other external information. In this tutorial, we will introduce recent state-of-the-art works in applying knowledge in language understanding, language generation and commonsense reasoning.
Hate Speech: Detection, Mitigation and Beyond
Social media sites such as Twitter and Facebook have connected billions of people and given the opportunity to the users to share their ideas and opinions instantly. That being said, there are several negative consequences as well such as online harassment, trolling, cyber-bullying, fake news, and hate speech. Out of these, hate speech presents a unique challenge as it is deeply engraved into our society and is often linked with offline violence. Social media platforms rely on human moderators to identify hate speech and take necessary action. However, with the increase in online hate speech, these platforms are turning toward automated hate speech detection and mitigation systems. This shift brings several challenges to the plate, and hence, is an important avenue to explore for the computation social science community.
In this tutorial, we present an exposition of hate speech detection and mitigation in three steps. First, we describe the current state of research in the hate speech domain, focusing on different hate speech detection and mitigation systems that have developed over time. Next, we highlight the challenges that these systems might carry like bias and the lack of transparency. The final section concretizes the path ahead, providing clear guidelines for the community working in hate speech and related domains. We also outline the open challenges and research directions for interested researchers.
A Tutorial on Domain Generalization
With the availability of massive labeled training data, powerful machine learning models can be trained. However, the traditional I.I.D. assumption that the training and testing data should follow the same distribution is often violated in reality. While existing domain adaptation approaches can tackle domain shift, it relies on the target samples for training. Domain generalization is a promising technology that aims to train models with good generalization ability to unseen distributions. In this tutorial, we will present the recent advance of domain generalization. Specifically, we introduce the background, formulation, and theory behind this topic. Our primary focus is on the methodology, evaluation, and applications. We hope this tutorial can draw interest of the community and provide a thorough review of this area. Eventually, more robust systems can be built for responsible AI. All tutorial materials and updates can be found online at https://dgresearch.github.io/.
Trustworthy Algorithmic Ranking Systems
This tutorial aims at providing its audience an interdisciplinary overview about the topics of fairness and non-discrimination, diversity, and transparency as relevant dimensions of trustworthy AI systems, tailored to algorithmic ranking systems such as search engines and recommender systems. We will equip the mostly technical audience of WSDM with the necessary understanding of the social and ethical implications of their research and development on the one hand, and of recent ethical guidelines and regulatory frameworks addressing the aforementioned dimensions on the other hand. While the tutorial foremost takes a European perspective, starting from the concept of trustworthy AI and discussing EU regulation in this area currently in the implementation stages, we also consider related initiatives worldwide. Since ensuring non-discrimination, diversity, and transparency in retrieval and recommendation systems is an endeavor in which academic institutions and companies in different parts of the world should collaborate, this tutorial is relevant for researchers and practitioners interested in the ethical, social, and legal impact of their work. The tutorial, therefore, targets both academic scholars and practitioners around the globe, by reviewing recent research and providing practical examples addressing these particular trustworthiness aspects, and showcasing how new regulations affect the audience's daily work.
Proactive Conversational Agents
Conversational agents, or commonly known as dialogue systems, have gained escalating popularity in recent years. Their widespread applications support conversational interactions with users and accomplishing various tasks as personal assistants. However, one key weakness in existing conversational agents is that they only learn to passively answer user queries via training on pre-collected and manually-labeled data. Such passiveness makes the interaction modeling and system-building process relatively easier, but it largely hinders the possibility of being human-like hence lowering the user engagement level. In this tutorial, we introduce and discuss methods to equip conversational agents with the ability to interact with end users in a more proactive way. This three-hour tutorial is divided into three parts and includes two interactive exercises. It reviews and presents recent advancements on the topic, focusing on automatically expanding ontology space, actively driving conversation by asking questions or strategically shifting topics, and retrospectively conducting response quality control.
Preference-Based Offline Evaluation
A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.
Natural and Artificial Dynamics in GNNs: A Tutorial
In the big data era, the relationship between entities becomes more complex. Therefore, graph (or network) data attracts increasing research attention for carrying complex relational information. For a myriad of graph mining/learning tasks, graph neural networks (GNNs) have been proven as effective tools for extracting informative node and graph representations, which empowers a broad range of applications such as recommendation, fraud detection, molecule design, and many more. However, real-world scenarios bring pragmatic challenges to GNNs. First, the input graphs are evolving, i.e., the graph structure and node features are time-dependent. Integrating temporal information into the GNNs to enhance their representation power requires additional ingenious designs. Second, the input graphs may be unreliable, noisy, and suboptimal for a variety of downstream graph mining/learning tasks. How could end-users deliberately modify the given graphs (e.g., graph topology and node features) to boost GNNs' utility (e.g., accuracy and robustness)? Inspired by the above two kinds of dynamics, in this tutorial, we focus on topics of natural dynamics and artificial dynamics in GNNs and introduce the related works systematically. After that, we point out some promising but under-explored research problems in the combination of these two dynamics. We hope this tutorial could be beneficial to researchers and practitioners in areas including data mining, machine learning, and general artificial intelligence.
Next-generation Challenges of Responsible Data Integration
Data integration has been extensively studied by the data management community and is a core task in the data pre-processing step of ML pipelines. When the integrated data is used for analysis and model training, responsible data science requires addressing concerns about data quality and bias. We present a tutorial on data integration and responsibility, highlighting the existing efforts in responsible data integration along with research opportunities and challenges. In this tutorial, we encourage the community to audit data integration tasks with responsibility measures and develop integration techniques that optimize the requirements of responsible data science. We focus on three critical aspects: (1) the requirements to be considered for evaluating and auditing data integration tasks for quality and bias; (2) the data integration tasks that elicit attention to data responsibility measures and methods to satisfy these requirements; and, (3) techniques, tasks, and open problems in data integration that help achieve data responsibility.
Data Democratisation with Deep Learning: The Anatomy of a Natural Language Data Interface
In the age of the Digital Revolution, almost all human activities, from industrial and business operations to medical and academic research, are reliant on the constant integration and utilisation of ever-increasing volumes of data. However, the explosive volume and complexity of data makes data querying and exploration challenging even for experts, and makes the need to democratise the access to data, even for non-technical users, all the more evident. It is time to lift all technical barriers, by empowering users to access relational databases through conversation. We consider 3 main research areas that a natural language data interface is based on: Text-to-SQL, SQL-to-Text, and Data-to-Text. The purpose of this tutorial is a deep dive into these areas, covering state-of-the-art techniques and models, and explaining how the progress in the deep learning field has led to impressive advancements. We will present benchmarks that sparked research and competition, and discuss open problems and research opportunities with one of the most important challenges being the integration of these 3 research areas into one conversational system.
AutoML for Deep Recommender Systems: Fundamentals and Advances
Recommender systems have become increasingly important in our daily lives since they play an important role in mitigating the information overload problem, especially in many user-oriented online services. Recommender systems aim to identify a set of items that best match users' explicit or implicit preferences, by utilizing the user and item interactions to improve the accuracy. With the fast advancement of deep neural networks (DNNs) in the past few decades, recommendation techniques have achieved promising performance. However, we still meet three inherent challenges to design deep recommender systems (DRS): 1) the majority of existing DRS are developed based on hand-crafted components, which requires ample expert knowledge recommender systems; 2) human error and bias can lead to suboptimal components, which reduces the recommendation effectiveness; 3) non-trivial time and engineering efforts are usually required to design the task-specific components in different recommendation scenarios.
In this tutorial, we aim to give a comprehensive survey on the recent progress of advanced Automated Machine Learning (AutoML) techniques for solving the above problems in deep recommender systems. More specifically, we will present feature selection, feature embedding search, feature interaction search, and whole DRS pipeline model training and comprehensive search for deep recommender systems. In this way, we expect academic researchers and industrial practitioners in related fields can get deep understanding and accurate insight into the spaces, stimulate more ideas and discussions, and promote developments of technologies in recommendations.
4th Crowd Science Workshop - CANDLE: Collaboration of Humans and Learning Algorithms for Data Labeling
Crowdsourcing has been used to produce impactful and large-scale datasets for Machine Learning and Artificial Intelligence (AI), such as ImageNET, SuperGLUE, etc. Since the rise of crowdsourcing in early 2000s, the AI community has been studying its computational, system design, and data-centric aspects at various angles. We welcome the studies on developing and enhancing of crowdworker-centric tools, that offer task matching, requester assessment, instruction validation, among other topics. We are also interested in exploring methods that leverage the integration of crowdworkers to improve the recognition and performance of the machine learning models. Thus, we invite studies that focus on shipping active learning techniques, methods for joint learning from noisy data and from crowds, novel approaches for crowd-computer interaction, repetitive task automation, and role separation between humans and machines. Moreover, we invite works on designing and applying such techniques in various domains, including e-commerce and medicine.
Integrity 2023: Integrity in Social Networks and Media
Integrity 2023 is the fourth edition of the successful Workshop on Integrity in Social Networks and Media, held in conjunction with the ACM Conference on Web Search and Data Mining (WSDM) in the past three years. The goal of the workshop is to bring together researchers and practitioners to discuss content and interaction integrity challenges in social networks and social media platforms. The event consists of a combination of invited talks by reputed members of the Integrity community from both academia and industry and peer-reviewed contributed talks and posters solicited via an open call-for-papers.
The 3rd International Workshop on Machine Learning on Graphs (MLoG)
Graphs, which encode pairwise relations between entities, are a kind of universal data structure for a lot of real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. Recently, machine learning techniques are widely developed and utilized to effectively tame graphs for discovering actionable patterns and harnessing them for advancing various graph-related computational tasks. Huge success has been achieved and numerous real-world applications have benefited from it. However, since in today's world, we are generating and gathering data in a much faster and more diverse way, real-world graphs are becoming increasingly large-scale and complex. More dedicated efforts are needed to propose more advanced machine learning techniques and properly deploy them for real-world applications in a scalable way. Thus, we organize The 3rd International Workshop on Machine Learning on Graphs (MLoG), held in conjunction with the 16th ACM Conference on Web Search and Data Mining (WSDM), which provides a venue to gather academia researchers and industry researchers/practitioners to present the recent progress on machine learning on graphs.
International Workshop on Learning with Knowledge Graphs: Construction, Embedding, and Reasoning
A knowledge graph (KG) consists of numerous triples, in which each triple, i.e., (head entity, relation, tail entity), denotes a real-world assertion. Many large-scale KGs have been developed, e.g., general-purpose KGs Freebase and YAGO. Also, lots of domain-specific KGs are emerging, e.g., COVID-19 KGs, biomedical KGs, and agricultural KGs. By embedding KGs into low-dimensional vectors, i.e., representations of entities and relations, we could integrate KGs into machine learning models and enhance the performance of many prediction tasks, including search, recommendations, and question answering. During the construction, refinement, embedding, and application of KGs, a variety of KG learning algorithms have been developed to handle challenges in various real-world scenarios. Moreover, graph neural networks have also brought new opportunities to KG learning. This workshop aims to engage with active researchers from KG communities, recommendation communities, natural language processing communities, and other communities, and deliver state-of-the-art research insights into the core challenges in KG learning.
WSDM 2023 Workshop on Interactive Recommender Systems
Interactive recommender systems have attracted increasingly research attentions from both academia and industry. This workshop is a half-day event, which provides a forum for researchers and practitioners to discuss recent research progress and novel research directions about interactive recommender systems. The program will include two keynotes and 6 to 8 research paper presentations. The objective of this workshop is to consolidate the recent technical progresses about interactive recommendation, which will be a promising research and development direction for future recommendation technologies. This workshop will attract the attention of researchers from both academia and industry. It aligns with WSDM's spirit of promoting the collaborations between academia and industry.
SESSION: Industry Day
Responsible AI for Trusted AI-powered Enterprise Platforms
With the rapidly growing AI market opportunities and the accelerated adoption of AI technologies for a wide range of real-world applications, responsible AI has attracted increasing attention in both academia and industries. In this talk, I will focus on the topics of responsible AI in the industry settings towards building trusted AI-powered enterprise platforms. I will share our efforts and experience of responsible AI for enterprise at Salesforce, from defining the principles to putting them into practice to build trust in AI. Finally, I will also address some emerging challenges and open issues of recent generative AI advances and call for actions of joint responsible AI efforts from academia, industries and governments.
Simulating Humans at Scale to Evaluate Voice Interfaces for TVs: the Round-Trip System at Comcast
Evaluating large-scale customer-facing voice interfaces involves a variety of challenges, such as data privacy, fairness or unintended bias, and the cost of human labor. Comcast's Xfinity Voice Remote is one such voice interface aimed at users looking to discover content on their TVs. The artificial intelligence (AI) behind the voice remote currently powers multiple voice interfaces, serving tens of millions of requests every day, from users across the globe.In this talk, we introduce a novel Round-Trip system we have built to evaluate the AI serving these voice interfaces in a semi-automated manner, providing a robust and cheap alternative to traditional quality assurance methods. We discuss five specific challenges we have encountered in Round-Trip and describe our solutions in detail.
Under the Hood of Social Media Advertising: How Do We use AI Responsibly for Advertising Targeting and Creative Evaluation
Digital Advertising is historically one of the most developed areas where Machine Learning and AI have been applied since its origination. From smart bidding to creative content generation and DCO, AI is well-demanded in the modern digital marketing industry and partially serves as a backbone of most of the state-of-the-art computational advertising systems, making them impossible for the AI tech and the programmatic systems to exist apart from one another. At the same time, given the drastic growth of the available AI technology nowadays, the issue of responsible AI utilization as well as the balance between the opportunity of deploying AI systems and the possible borderline etic and privacy-related consequences are still yet to be discussed comprehensively in both business and research communities. Particularly, an important issue of automatic User Profiling use in modern Programmatic systems like Meta Ads as well as the need for responsible application of the creative assessment models to fit into the business etic guidelines is yet to be described well. Therefore, in this talk, we are going to discuss the technology behind modern programmatic bidding and content scoring systems and the responsible application of AI by SoMin.ai to manage the Advertising targeting and Creative Validation process.
An Open-Source Suite of Causal AI Tools and Libraries
We propose to accelerate use-inspired basic research in causal AI through a suite of causal tools and libraries that simultaneously provides core causal AI functionality to practitioners and creates a platform for research advances to be rapidly deployed. In this presentation, we describe our contributions towards an open-source causal AI suite. We describe some of their applications, the lessons learned from their usage, and what is next.
Compliance Analyses of Australia's Online Household Appliances
Commercially sold electrical or gas products must comply with the safety standards imposed within a country and get registered and certified by a regulated body. However, with the increasing transition of businesses to e-commerce platforms, it becomes challenging to govern the compliance status of online products. This can increase the risk of purchasing non-compliant products which may be unsafe to use. Additionally, examining the compliance status before purchasing can be strenuous because the relevant compliance information can be ambiguous and not always directly available. Therefore, we collaborated with a regulated body from Australia, Energy Safe Victoria, and conducted compliance analyses for household appliances sold on multiple online platforms. A fully autonomous method shown in this public repository is also introduced to check the compliance status of any online product. In this talk, we discuss the compliance check process, which incorporates fuzzy logic for textual matching and a Convolutional Neural Network (CNN) model to classify the product listing based on the images listed. Subsequently, we studied the results with the business users and found that many online listings are non-compliant, signifying that online-shopping consumers are highly susceptible to buying unsafe products. We hope this talk can inspire more follow-up works that collaborate with regulated bodies to introduce a user-friendly compliance check platform that assists in educating consumers to purchase compliant products.
Considerations for Ethical Speech Recognition Datasets
Speech AI Technologies are largely trained on publicly available datasets or by the massive web-crawling of speech. In both cases, data acquisition focuses on minimizing collection effort, without necessarily taking the data subjects' protection or user needs into consideration. This results to models that are not robust when used on users who deviate from the dominant demographics in the training set, discriminating individuals having different dialects, accents, speaking styles, and disfluencies. In this talk, we use automatic speech recognition as a case study and examine the properties that ethical speech datasets should possess towards responsible AI applications. We showcase diversity issues, inclusion practices, and necessary considerations that can improve trained models, while facilitating model explainability and protecting users and data subjects. We argue for the legal & privacy protection of data subjects, targeted data sampling corresponding to user demographics & needs, appropriate meta data that ensure explainability & accountability in cases of model failure, and the sociotechnical & situated model design. We hope this talk can inspire researchers & practitioners to design and use more human-centric datasets in speech technologies and other domains, in ways that empower and respect users, while improving machine learning models' robustness and utility.
Incorporating Fairness in Large Scale NLU Systems
NLU models power several user facing experiences such as conversations agents and chat bots. Building NLU models typically consist of 3 stages: a) building or finetuning a pre-trained model b) distilling or fine-tuning the pre-trained model to build task specific models and, c) deploying the task-specific model to production. In this presentation, we will identify fairness considerations that can be incorporated in the aforementioned three stages in the life-cycle of NLU model building: (i) selection/building of a large scale language model, (ii) distillation/fine-tuning the large model into task specific model and, (iii) deployment of the task specific model. We will present select metrics that can be used to quantify fairness in NLU models and fairness enhancement techniques that can be deployed in each of these stages. Finally, we will share some recommendations to successfully implement fairness considerations when building an industrial scale NLU system.
Privacy in the Time of Language Models
Pretrained large language models (LLMs) have consistently shown state-of-the-art performance across multiple natural language processing (NLP) tasks. These models are of much interest for a variety of industrial applications that use NLP as a core component. However, LLMs have also been shown to memorize portions of their training data, which can contain private information. Therefore, when building and deploying LLMs, it is of value to apply privacy-preserving techniques that protect sensitive data.
In this talk, we discuss privacy measurement and preservation techniques for LLMs that can be applied in the context of industrial applications and present case studies of preliminary solutions. We discuss select strategies and metrics relevant for measuring memorization in LLMs that can, in turn, be used to measure privacy-risk in these models. We then discuss privacy-preservation techniques that can be applied at different points of the LLM training life-cycle; including our work on an algorithm for fine-tuning LLMs with improved privacy. In addition, we discuss our work on privacy-preserving solutions that can be applied to LLMs during inference and are feasible for use at run time.
Learning to Infer Product Attribute Values From Descriptive Texts and Images
Online marketplaces are able to offer a staggering array of products that no physical store can match. While this makes it more likely for customers to find what they want, in order for online providers to ensure a smooth and efficient user experience, they must maintain well-organized catalogs, which depends greatly on the availability of per-product attribute values such as color, material, brand, to name a few. Unfortunately, such information is often incomplete or even missing in practice, and therefore we have to resort to predictive models as well as other sources of information to impute missing attribute values.
In this talk we present the deep learning-based approach that we have developed at Rakuten Group to extract attribute values from product descriptive texts and images. Starting from pretrained architectures to encode textual and visual modalities, we discuss several refinements and improvements that we find necessary to achieve satisfactory performance and meet strict business requirements, namely improving recall while maintaining a high precision (>= 95%). Our methodology is driven by a systematic investigation into several practical research questions surrounding multimodality, which we revisit in this talk. At the heart of our multimodal architecture, is a new method to combine modalities inspired by empirical cross-modality comparisons. We present the latter component in details, point out one of its major limitations, namely exacerbating the issue of modality collapse, i.e., when the model forgets one modality, and describe our mitigation to this problem based on a principled regularization scheme.
We present various empirical results on both Rakuten data as well as public benchmark datasets, which provide evidence of the benefits of our approach compared to several strong baselines. We also share some insights to characterise the circumstances in which the proposed model offers the most significant improvements. We conclude this talk by criticising the current model and discussing possible future developments and improvements.
Our model is successfully deployed in Rakuten Ichiba - a Rakuten marketplace - and we believe that our investigation into multimodal attribute value extraction for e-commerce will benefit other researchers and practitioners alike embarking on similar journeys.
Recent Advances on Deep Learning based Knowledge Tracing
Knowledge tracing (KT) is the task of using students' historical learning interaction data to model their knowledge mastery over time so as to make predictions on their future interaction performance. Recently, remarkable progress has been made of using various deep learning techniques to solve the KT problem. However, the success behind deep learning based knowledge tracing (DLKT) approaches is still left somewhat unknown and proper measurement and analysis of these DLKT approaches remain a challenge.
In this talk, we will comprehensively review recent developments of applying state-of-the-art deep learning approaches in KT problems, with a focus on those real-world educational data. Beyond introducing the recent advances of various DLKT models, we will discuss how to guarantee valid comparisons across DLKT methods via thorough evaluations on several publicly available datasets. More specifically, we will talk about (1) KT related psychometric theories; (2) the general DLKT modeling framework that covers recently developed DLKT approaches from different categories; (3) the general DLKT benchmark that allows existing approaches comparable on public KT datasets; (4) the broad application of algorithmic assessment and personalized feedback. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources to obtain ready-to-use models, and how related models and techniques benefit real-world KT applications.
SESSION: Smart City Day
Social Public Health Infrastructure for a Smart City Citizen Patient: Advances and Opportunities for AI Driven Disruptive Innovation
Promoting health, preventing disease, and prolonging life are central to the success of any smart city initiative. Today, wireless communication, data infrastructure, and low-cost sensors such as lifestyle and activity trackers are making it increasingly possible for cities to collect, collate, and innovate on developing a smart infrastructure. Combining this with AI driven disruptions for human behavior change can fundamentally transform delivery of public health for the citizen patient. Most urban development government bodies consider such infrastructure to be a distributed ecosystem consisting of physical infrastructure, institutional infrastructure, social infrastructure and economic infrastructure. In this talk we will focus mostly on the social health infrastructure component and first showcase some of the recent initiatives created with the purpose of addressing health which is a key social goal, as a smart city goal. Specifically we will discuss how technical advances in IoT, recommendation systems, geospatial computing, and digital health therapeutics are creating a new future. Yet, such advances are not addressing the issues of making healthy behavior change sustainable with broad health equity for those that need it most. Overwhelming nature of siloed apps and digital health solutions which often leave the citizens overwhelmed and those most in need underserved. This talk highlights the advances and opportunities created when behavioral economics and public health combine with AI and cloud infrastructure to make smart public health initiatives personalized to each individual citizen patient.
SmartCityBus - A Platform for Smart Transportation Systems
With the growth of the Internet of Things (IoT), Smart(er) Cities have been a research goal of researchers, businesses and local authorities willing to adopt IoT technologies to improve their services. Among them, Smart Transportation [7,8], the integrated application of modern technologies and management strategies in transportation systems, refers to the adoption of new IoT solutions to improve urban mobility. These technologies aim to provide innovative solutions related to different modes of transport and traffic management and enable users to be better informed and make safer and 'smarter' use of transport networks. This talk presents SmartCityBus, a data-driven intelligent transportation system (ITS) whose main objective is to use online and offline data in order to provide accurate statistics and predictions and improve public transportation services in the short and medium/long term.
Metropolitan-scale Mobility Digital Twin
Mobility digital twin, which is a a virtual replica of the mobility in the physical world, is the key building block of modern smart city applications at a metropolitan scale, including traffic regulation, emergency management and epidemic control. To duplicate the mobility in the physical world and show the potential outcome based on either the current state or manipulated conditions, we are facing with three main challenges: 1) how to sense real-time human mobility at a large scale and assimilate different data sources to infer a dynamic city mobility state; 2) how to make an accurate prediction for the mobility replica that adapts with dynamic city mobility state; 3) how to simulate the mobility in response to different conditions. In this talk, we will present our recent studies, practices and perspectives on mobility digital twin in addressing these above challenges, with applications in the real-world scenarios with industrial connections.
Towards an Event-Aware Urban Mobility Prediction System
Today, thanks to the rapid developing mobile and sensor networks in IoT (Internet of Things) systems, spatio-temporal big data are being constantly generated. They have brought us a data-driven possibility to sense and understand crowd mobility on a city scale. A fundamental task towards the next-generation mobility services, such as Intelligent Transportation Systems (ITS), Mobility-as-a-Service (MaaS), is spatio-temporal predictive modeling of the geo-sensory signals. There is a recent line of research leveraging deep learning techniques to boost the forecasting performance on such tasks. While simulating the regularity of mobility behaviors (e.g., routines, periodicity) in a more sophisticated way, the existing studies ignore an important part of urban activities, i.e., events. Including holidays, extreme weathers, pandemic, accidents, various urban events happen from time to time and cause non-stationary phenomena, which by nature make the spatio-temporal forecasting task challenging. We thereby envision an event-aware urban mobility prediction model that is capable of fast adapting and making reliable predictions in different scenarios, which is crucial to decision making towards emergency response and urban resilience.
Student Behavior Pattern Mining and Analysis: Towards Smart Campuses
Understanding student behavior patterns is fundamental to building smart campuses. However, the diversity of student behavior and the complexity of educational data not only bring great obstacles to the relevant research, but also leads to unstable performance and low reliability of current student behavior analysis systems. The emergence of educational big data and the latest advances in deep learning and representation learning provide unprecedented opportunities to tackle the above problems. In this talk, we introduce how we mine and analyze student behavior patterns by overcoming the complexity of educational data. Specifically, we propose a series of algorithmic frameworks, which take advantage of network science, data mining, and machine learning to form a data-driven system for mining and analyzing student behavior patterns. Our research not only fills the gap in the field of student abnormal behavior warning and student status monitoring, but also provides insights into data-driven smart city construction.