SESSION: Keynote Talk Abstracts
Delphic Costs and Benefits in Web Search: A Utilitarian and Historical Analysis
We present a new framework to conceptualize and operationalize the total user experience of search by studying the entirety of a search journey from a utilitarian point of view.
Web search engines are widely perceived as “free”. But search requires time and effort: in reality there are many intermingled non-monetary costs (e.g. time costs, cognitive costs, interactivity costs) and the benefits may be marred by various impairments, such as misunderstanding and misinformation. This characterization of costs and benefits appears to be inherent to the human search for information within the pursuit of some larger task: most of the costs and impairments can be identified in interactions with any web search engine, interactions with public libraries, and even in interactions with ancient oracles. To emphasize this innate connection, we call these costs and benefits “Delphic”, in contrast to explicitly financial costs and benefits.
Our main thesis is that users’ satisfaction with a search engine mostly depends on their experience of Delphic costs and benefits, in other words, on their utility. Consumer utility correlates with classic measures of search engine quality, such as ranking, precision, and recall, but is not completely determined by them. To argue our thesis, we catalog the Delphic costs and benefits and show how the development of web search over the last quarter century, from its classic Information Retrieval roots to the integration of Large Language Models and Generative AI, was driven to a great extent by the quest to decrease Delphic costs and increase Delphic benefits.
We hope that the Delphic costs framework will engender new ideas and new research for evaluating and improving the web experience for everyone.
This talk reflects joint work with Preston McAfee and Marc Najork.
What I Learned from Spending a Dozen Years in the Dark Web
Founded in 2011, Silk Road was the first online anonymous marketplace, in which buyers and sellers could transact with anonymity guarantees far superior to those available in online or offline alternatives, thanks to the innovative use of cryptocurrencies and network anonymization. Business on Silk Road, primarily involving narcotics trafficking, was brisk, and before long competitors appeared. After Silk Road was taken down by law enforcement, a dynamic ecosystem of online anonymous marketplaces emerged. That ecosystem remains highly active to this day, and has been surprisingly resilient to multiple law-enforcement takedown operations as well as “exit scams,” in which the operators of a marketplace abruptly abscond with any money left on the platform.
In this talk, I describe insights we have gained from more than twelve years of active measurements of the online anonymous market ecosystem [1,2,3,4,5,6]. I first highlight the scientific challenges in collecting such data at scale, discuss how overall revenues rapidly grew to hundreds of millions of dollars per year, and describe the leading types of commerce taking place – primarily narcotics, but also cybercrime commoditization. Second, I present several analyses of vendors, ranging from our efforts to match a priori disparate handles to unique individuals, to predicting whether a vendor will be successful in the future. Finally, I introduce some of the unique data we could access, namely backend data from police seizures, and show how we used it to validate our measurements.
While online anonymous marketplaces have recently seen their influence dip a bit – possibly due to a combination of constant infighting and relentless police activity – the insights we gained from studying them for so long help considerably demystify the “dark web,” and more generally, online crime ecosystems, which turn out to be very economically rational environments.
The Journey to A Knowledgeable Assistant with Retrieval-Augmented Generation (RAG)
Large Language Models (LLMs) have demonstrated strong capabilities in comprehending and generating human language, as well as emerging abilities like reasoning and tool use. These advancements have been revolutionizing techniques on every front, including the development of personal assistants. However, inherent limitations such as a lack of factuality and a tendency to hallucinate make LLMs less suitable for creating knowledgeable and trustworthy assistants.
In this talk, we describe our journey in building a knowledgeable AI assistant by harnessing LLM techniques. We start with a comprehensive set of experiments designed to answer two questions: how reliable are LLMs at answering factual questions, and how does performance differ across different types of factual knowledge? Subsequently, we constructed a federated Retrieval-Augmented Generation (RAG) system that integrates external information from both the web and knowledge graphs into text generation. This system supports the conversation functionality of the Ray-Ban Meta smart glasses, providing trustworthy information on real-time topics like stocks and sports, and information on torso-to-tail entities such as local restaurants. Additionally, we are exploring the potential of external knowledge to facilitate multi-modal Q&A. We will share our techniques, our findings, and the path forward in this talk.
Unlocking Human Curiosity
Human curiosity has always been boundless. Yet for millennia, limited access to information constrained our ability to explore that curiosity. The advent of the web transformed the information landscape, but too often information remained out of reach or required too much effort to find. Limitations in query understanding, the corpus of content, and information fragmentation continued to create substantial hurdles.
Today, we are early in another profound transformation. Rapid innovations in AI are painting a future where these barriers are crumbling. Yet with this shift, we must also solve a set of critical challenges around trust.
In this talk, I will discuss how AI advancements are reshaping how we access and understand information, along with the associated technical, product, and policy challenges. I will explore how advancements in natural language processing, multimodal and cross-language content understanding, and generative AI are breaking down the barriers users face in expressing their questions and easily comprehending results. I will also share how progress on challenges in content safety, authenticity, AI-generated content, bias, and information literacy will be needed to maintain the user trust required to truly capitalize on the moment.
Unveiling AI-Driven Collective Action for a Worker-Centric Future
Collective action by gig knowledge workers is a potent method for enhancing labor conditions on platforms like Upwork, Amazon Mechanical Turk, and Toloka. However, this type of collective action is still rare today. Existing systems for supporting collective action are inadequate for workers to identify and understand their various workplace problems, plan effective solutions, and put those solutions into action. This talk will discuss how my research lab is creating worker-centric, AI-enhanced technologies that enable collective action among gig knowledge workers. Building solid AI-enhanced technologies to enable gig worker collective action will pave the way for a fair and ethical gig economy: one with fair wages, humane working conditions, and increased job security. I will discuss how my proposed approach involves first integrating “sousveillance,” a concept coined by Steve Mann, into these technologies. Sousveillance involves individuals or groups using surveillance tools to monitor and record those in positions of power. In this case, the technologies enable gig workers to monitor their workplace and their algorithmic bosses, giving them access to their own workplace data for the first time. This facilitates the first stage of collective action: problem identification. I will then discuss how we combine this data with Large Language Models (LLMs) and social theories to create intelligent assistants that guide workers through the later stages of collective action: sensemaking and solution implementation.
The talk will present a set of case studies to showcase this vision of designing data-driven AI technologies to power gig worker collective action. In particular, I will present three systems: 1) GigSousveillance, which allows workers to monitor and collect their own job-related data, facilitating quantification of workplace problems; 2) GigSense, which equips workers with an AI assistant that facilitates sensemaking of their work problems, helping workers strategically devise solutions to their challenges; and 3) GigAction, an AI assistant that guides workers in implementing their proposed solutions. I will discuss how we are designing and implementing these systems by adopting a participatory design approach with workers, while also conducting experiments and longitudinal deployments in the real world. I conclude by presenting a research agenda for transforming and rethinking the role of AI in our workplaces, and for researching effective socio-technical solutions in favor of a worker-centric future and against techno-authoritarianism.
SESSION: Full Research Papers
Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions
Conversational question answering (CQA) systems aim to create interactive search systems that effectively retrieve information by interacting with users. To replicate human-to-human conversations, existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher). Despite its effectiveness, challenges exist: human annotation is time-consuming, inconsistent, and not scalable. To address this issue and investigate the applicability of LLMs to CQA simulation, we propose a simulation framework that employs zero-shot LLMs to simulate teacher–student interactions. Our framework involves two LLMs interacting on a specific topic, with the first LLM acting as a student, generating questions to explore a given search topic. The second LLM plays the role of a teacher by answering questions and is equipped with additional information, including a text on the given topic. We implement both the student and the teacher by zero-shot prompting the GPT-4 model. To assess the effectiveness of LLMs in simulating CQA interactions and to understand the disparities between LLM- and human-generated conversations, we evaluate the simulated data from various perspectives. We begin by evaluating the teacher’s performance through both automatic and human assessment. Next, we evaluate the performance of the student, analyzing and comparing the disparities between questions generated by the LLM and those generated by humans. Furthermore, we conduct extensive analyses to thoroughly examine LLM performance by benchmarking state-of-the-art reading comprehension models on both datasets. Our results reveal that the teacher LLM generates lengthier answers that tend to be more accurate and complete. The student LLM generates more diverse questions, covering more aspects of a given topic.
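The two-LLM loop described above can be sketched in a few lines. The framework zero-shot prompts GPT-4 for both roles; `ask_llm` below is a hypothetical stand-in so the control flow runs without an API, and the prompts are illustrative rather than the paper’s exact wording.

```python
# Sketch of the zero-shot student-teacher simulation loop.
# `ask_llm` is a placeholder for a chat-completion call.

def ask_llm(system_prompt: str, dialogue: list) -> str:
    # Placeholder: a real implementation would send `system_prompt` plus
    # the running `dialogue` to a chat-completion endpoint (e.g., GPT-4).
    role = "QUESTION" if "student" in system_prompt else "ANSWER"
    return f"{role} {len(dialogue) // 2 + 1}"

def simulate_conversation(topic: str, evidence: str, turns: int = 3) -> list:
    student_prompt = f"You are a student exploring the topic: {topic}. Ask one question."
    teacher_prompt = f"You are a teacher. Answer using only this text: {evidence}"
    dialogue = []
    for _ in range(turns):
        dialogue.append(ask_llm(student_prompt, dialogue))  # student asks
        dialogue.append(ask_llm(teacher_prompt, dialogue))  # teacher answers with evidence
    return dialogue

print(simulate_conversation("volcanoes", "Volcanoes form where plates meet."))
```

The student prompt sees only the topic, while the teacher prompt carries the grounding text, mirroring the asymmetry described in the paper.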
IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification
Language models such as Bidirectional Encoder Representations from Transformers (BERT) have been very effective in various Natural Language Processing (NLP) and text mining tasks, including text classification. However, some tasks still pose challenges for these models, including text classification with limited labels, which can result in a cold-start problem. Although some approaches have attempted to address this problem through single-stage clustering as an intermediate training step coupled with a pre-trained language model, which generates pseudo-labels to improve classification, these methods are often error-prone due to the limitations of the clustering algorithms. To overcome this, we have developed a novel two-stage intermediate clustering with subsequent fine-tuning that models the pseudo-labels reliably, resulting in reduced prediction errors. The key novelty of our model, IDoFew, is that two-stage clustering with two different clustering algorithms exploits the advantages of complementary algorithms, reducing the errors in generating reliable pseudo-labels for fine-tuning. Our approach has shown significant improvements compared to strong baseline models.
LabelCraft: Empowering Short Video Recommendations with Automated Label Crafting
Short video recommendations often face limitations due to the quality of user feedback, which may not accurately depict user interests. To tackle this challenge, a new task has emerged: generating more dependable labels from original feedback. Existing label generation methods rely on manual rules, demanding substantial human effort and potentially misaligning with the desired objectives of the platform. To transcend these constraints, we introduce LabelCraft, a novel automated label generation method explicitly optimizing pivotal operational metrics for platform success. By formulating label generation as a higher-level optimization problem above recommender model optimization, LabelCraft introduces a trainable labeling model for automatic label mechanism modeling. Through meta-learning techniques, LabelCraft effectively addresses the bi-level optimization hurdle posed by the recommender and labeling models, enabling the automatic acquisition of intricate label generation mechanisms. Extensive experiments on real-world datasets corroborate LabelCraft’s excellence across varied operational metrics, encompassing usage time, user engagement, and retention. Codes are available at https://github.com/baiyimeng/LabelCraft.
MAD: Multi-Scale Anomaly Detection in Link Streams
Given an arbitrary group of computers, how can we identify abnormal changes in their communication patterns? How can we assess whether the absence of some communications is normal or due to a failure? How can we distinguish local from global events when communication data are extremely sparse and volatile? Existing approaches for anomaly detection in interaction streams, focusing on edges, nodes, or graphs, lack the flexibility to monitor arbitrary communication topologies. Moreover, they rely on structural features that are not adapted to highly sparse settings. In this work, we introduce MAD, a novel Multi-scale Anomaly Detection algorithm that (i) allows querying for the normality/abnormality of an arbitrary group of observed/non-observed communications at a given time; and (ii) handles the highly sparse and uncertain nature of interaction data through a scoring method based on a novel probabilistic and multi-scale analysis of sub-graphs. In particular, MAD is (a) flexible: it can assess whether any time-stamped subgraph is anomalous, making edge, node, and graph anomalies particular instances; (b) interpretable: its multi-scale analysis allows characterizing the scope and nature of anomalies; (c) efficient: given historical data of length N and M observed/non-observed communications to analyze, MAD produces an anomaly score in O(NM); and (d) effective: it significantly outperforms state-of-the-art alternatives tailored for edge, node, or graph anomalies.
Ranking with Long-Term Constraints
The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may only improve short-term engagement, but not the long-term sustainability of the platform and the long-term benefits to its users, content providers, and other stakeholders. In this paper, we thus develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that go well beyond individual sessions, and we provide new control-based algorithms to achieve these goals. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they provide interesting trade-offs in efficiency, robustness, and the ability to plan ahead.
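The control idea sketched above, perturbing short-term ranking scores as little as possible while steering cumulative exposure toward long-term targets, can be illustrated with a toy proportional correction. The 1/rank exposure model, the gain parameter, and all names here are illustrative assumptions, not the paper’s actual controllers.

```python
# Toy proportional controller: each session, an item's short-term relevance
# score gets a correction proportional to how far its accumulated exposure
# lags its long-term target.

def rank_with_exposure_control(relevance, targets, exposure, gain=1.0):
    """Rank items for one session, updating accumulated exposure in place.

    relevance: item -> short-term relevance score
    targets:   item -> cumulative exposure the item should have by now
    exposure:  item -> exposure actually accumulated so far (mutated)
    """
    adjusted = {
        item: relevance[item] + gain * (targets[item] - exposure[item])
        for item in relevance
    }
    ranking = sorted(adjusted, key=adjusted.get, reverse=True)
    for rank, item in enumerate(ranking, start=1):
        exposure[item] += 1.0 / rank   # position-based exposure model
    return ranking

relevance = {"a": 1.0, "b": 0.9, "c": 0.2}
targets = {i: 0.0 for i in relevance}
exposure = {i: 0.0 for i in relevance}
for _ in range(3):
    # Long-term goal: equal exposure shares, growing by 1/3 per session.
    targets = {i: t + 1 / 3 for i, t in targets.items()}
    print(rank_with_exposure_control(relevance, targets, exposure))
```

Run over several sessions, the under-exposed low-relevance item is eventually surfaced to meet its target, while rankings stay relevance-driven whenever targets are on track.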
LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection
As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, the detection of Twitter bots has become a crucial task. Though graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighboring users multiple hops away from the targets, and fetching neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that after fine-tuning on the Twitter bot detection task, pretrained language models achieve competitive performance while not requiring a graph structure during deployment. Inspired by this finding, we propose LMBot, a novel bot detection framework that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection, combating the data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed it into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as input features for the GNN, enabling LMBot to optimize for bot detection and distill knowledge back into the LM in an iterative, mutually enhancing process. Armed with the LM, we can perform graph-less inference with graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also shows strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies also show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
To Copy, or not to Copy; That is a Critical Issue of the Output Softmax Layer in Neural Sequential Recommenders
Recent studies suggest that existing neural models have difficulty handling repeated items in sequential recommendation tasks. However, our understanding of this difficulty is still limited. In this study, we substantially advance this field by identifying a major source of the problem: the single hidden state embedding and static item embeddings in the output softmax layer. Specifically, the similarity structure of the global item embeddings in the softmax layer sometimes forces the single hidden state embedding to be close to new items when copying is a better choice, while at other times inappropriately forcing the hidden state to be close to items from the input. To alleviate the problem, we adapt recently proposed softmax alternatives such as softmax-CPR to sequential recommendation tasks and demonstrate that the new softmax architectures unleash the capability of the neural encoder to learn when to copy and when to exclude items from the input sequence. With only simple modifications to the output softmax layers of SASRec and GRU4Rec, softmax-CPR achieves consistent improvements across 12 datasets. With almost the same model size, our best method not only improves the average NDCG@10 of GRU4Rec on 5 datasets with duplicated items by 10% (4%-17% individually) but also improves it on 7 datasets without duplicated items by 24% (8%-39%)!
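A generic copy mechanism of the kind discussed above can be sketched as a mixture of a generation softmax over the full catalog and a copy distribution restricted to in-session items. This is an illustrative pointer-style mixture under assumed names, not the exact softmax-CPR parameterization.

```python
import math

# Illustrative "copy vs. generate" mixture: interpolate a generation softmax
# over the catalog with a copy distribution over items already in the session.

def softmax(scores):
    m = max(scores.values())
    exps = {i: math.exp(s - m) for i, s in scores.items()}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def copy_aware_distribution(gen_logits, session_items, copy_gate):
    """gen_logits: item -> logit from the hidden state;
    session_items: items present in the input sequence;
    copy_gate: probability mass routed to copying (0..1)."""
    p_gen = softmax(gen_logits)
    p_copy = softmax({i: gen_logits[i] for i in session_items})
    return {
        i: (1 - copy_gate) * p_gen[i] + copy_gate * p_copy.get(i, 0.0)
        for i in gen_logits
    }

# With a high gate, the in-session item dominates even with a lower logit:
dist = copy_aware_distribution({"a": 2.0, "b": 1.0, "c": 0.5},
                               session_items={"b"}, copy_gate=0.7)
print(max(dist, key=dist.get))  # b
```

The gate plays the role the encoder must learn: high when copying is the better choice, low when the next item should come from outside the input sequence.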
PEFA: Parameter-Free Adapters for Large-scale Embedding-based Retrieval Models
Embedding-based Retrieval Models (ERMs) have emerged as a promising framework for large-scale text retrieval problems, powered by large language models. Nevertheless, fine-tuning ERMs to reach state-of-the-art results can be expensive due to the extreme scale of data as well as the complexity of multi-stage pipelines (e.g., pre-training, fine-tuning, distillation). In this work, we propose the PEFA framework, namely ParamEter-Free Adapters, for fast tuning of ERMs without any backward pass during optimization. At the index-building stage, PEFA equips the ERM with a non-parametric k-nearest-neighbor (kNN) component. At the inference stage, PEFA performs a convex combination of two scoring functions, one from the ERM and the other from the kNN. Based on the neighborhood definition, the PEFA framework induces two realizations: PEFA-XL (i.e., extra large), using double ANN indices, and PEFA-XS (i.e., extra small), using a single ANN index. Empirically, PEFA achieves significant improvements in two retrieval applications. For document retrieval, in terms of Recall@100, PEFA improves not only pre-trained ERMs on Trivia-QA by an average of 13.2%, but also fine-tuned ERMs on NQ-320K by an average of 5.5%. For product search, PEFA improves the Recall@100 of the fine-tuned ERMs by an average of 5.3% and 14.5% for PEFA-XS and PEFA-XL, respectively. Our code is available at https://github.com/amzn/pecos/tree/mainline/examples/pefa-wsdm24.
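The inference-time blending described above, a convex combination of the ERM similarity and a non-parametric kNN score, can be sketched as follows. The cosine similarity, the tiny list-based "index" of (training-query embedding, relevant-doc id) pairs, and all names are illustrative assumptions; real deployments use ANN indices.

```python
import math

# Sketch of convex-combination scoring: lam * ERM score + (1 - lam) * kNN score.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pefa_score(query_emb, doc_id, doc_emb, knn_index, lam=0.5, k=2):
    s_erm = cosine(query_emb, doc_emb)
    # kNN component: fraction of the k most similar training queries
    # whose relevant document is this one.
    neighbors = sorted(knn_index, key=lambda qd: cosine(query_emb, qd[0]),
                       reverse=True)[:k]
    s_knn = sum(1.0 for _, d in neighbors if d == doc_id) / k
    return lam * s_erm + (1.0 - lam) * s_knn

index = [([1.0, 0.0], "d1"), ([0.9, 0.1], "d1"), ([0.0, 1.0], "d2")]
print(pefa_score([1.0, 0.0], "d1", [0.5, 0.5], index, lam=0.5))
```

Because the kNN term is built from the training index alone, "tuning" amounts to building the index and choosing the interpolation weight: no backward pass is needed, which is the point of the parameter-free adapter.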
Empathetic Response Generation with Relation-aware Commonsense Knowledge
The development of AI for mental health is a growing field with potential global impact. Machine agents need to perceive users’ mental states and respond empathetically. Since mental states are often latent and implicit, building such chatbots requires both knowledge learning and knowledge utilization. Our work contributes to this by developing a chatbot that aims to recognize and empathetically respond to users’ mental states. We introduce a Conditional Variational Autoencoder (CVAE)-based model that utilizes relation-aware commonsense knowledge to generate responses. This model, while not a replacement for professional mental health support, demonstrates promise in offering informative and empathetic interactions in a controlled environment. On the EmpatheticDialogues dataset, we compare against several state-of-the-art methods and empirically validate the effectiveness of our approach in terms of response informativeness and empathy exhibition. A detailed analysis is also given to demonstrate the learning capability as well as model interpretability. Our code is accessible at http://github.com/ChangyuChen347/COMET-VAE.
Professional Network Matters: Connections Empower Person-Job Fit
Online recruitment platforms typically employ Person-Job Fit models in their core service to automatically match suitable job seekers with appropriate job positions. While existing works leverage historical or contextual information, they often disregard a crucial aspect: job seekers’ social relationships in professional networks. This paper emphasizes the importance of incorporating professional networks into the Person-Job Fit model. Our approach consists of two stages: (1) defining a Workplace Heterogeneous Information Network (WHIN) to capture heterogeneous knowledge, including professional connections, and pre-training representations of various entities using a heterogeneous graph neural network; (2) designing a Contextual Social Attention Graph Neural Network (CSAGNN) that supplements users’ missing information with contextual information from their professional connections. We introduce a job-specific attention mechanism in CSAGNN to handle noisy professional networks, leveraging pre-trained entity representations from WHIN. We demonstrate the effectiveness of our approach through experimental evaluations conducted across three real-world recruitment datasets from LinkedIn, showing superior performance compared to baseline models.
Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering
Contrastive Learning (CL) has shown promising performance in collaborative filtering. The key idea is to use a contrastive loss to generate augmentation-invariant embeddings by maximizing the mutual information between different augmented views of the same instance. However, we empirically observe that existing CL models suffer from the dimensional collapse issue, where user/item embeddings span only a low-dimensional subspace of the entire feature space. This suppresses other dimensional information and weakens the distinguishability of embeddings. Here we propose a non-contrastive learning objective, named nCL, which explicitly mitigates dimensional collapse of representations in collaborative filtering. Our nCL aims to achieve the geometric properties of Alignment and Compactness on the embedding space. In particular, alignment pushes together the representations of positively related user-item pairs, while compactness finds the optimal coding length of user/item embeddings, subject to a given distortion. More importantly, nCL requires neither data augmentation nor negative sampling during training, making it scalable to large datasets compared to contrastive learning methods. Experimental results demonstrate the superiority of nCL.
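The alignment property described above reduces to pulling the embeddings of interacting user-item pairs together, for example via a mean squared distance. The sketch below covers only this alignment term under assumed names and toy embeddings; the compactness (coding-length) term is omitted.

```python
# Alignment-term sketch for a non-contrastive objective: mean squared
# distance between embeddings of positively related user-item pairs.
# No negatives and no augmentations are needed for this term.

def alignment_loss(user_embs, item_embs, pairs):
    """pairs: iterable of (user_id, item_id) positive interactions."""
    pairs = list(pairs)
    total = 0.0
    for u, i in pairs:
        total += sum((a - b) ** 2 for a, b in zip(user_embs[u], item_embs[i]))
    return total / len(pairs)

users = {"u1": [0.9, 0.1], "u2": [0.0, 1.0]}
items = {"i1": [1.0, 0.0], "i2": [0.0, 1.0]}
# Interacting pairs whose embeddings should be pulled together:
print(alignment_loss(users, items, [("u1", "i1"), ("u2", "i2")]))
```

Minimizing alignment alone would collapse everything to a point, which is exactly why a second term (compactness in nCL) is needed to keep the embedding space spread out.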
TemporalMed: Advancing Medical Dialogues with Time-Aware Responses in Large Language Models
Medical dialogue models predominantly emphasize generating coherent and clinically accurate responses. However, in many clinical scenarios, time plays a pivotal role, often dictating subsequent patient management and interventions. Recognizing the latent importance of temporal dynamics, this paper introduces a novel dimension to medical dialogues: timestamps. We argue that integrating time-sensitive directives can profoundly impact medical advice, using an illustrative example of post-surgery care with and without timestamps. Our contributions are threefold. First, we highlight the intrinsic significance of timestamps in medical conversations, marking a paradigm shift in dialogue modeling. Second, we present an innovative dataset and framework explicitly tailored for time-stamped medical dialogues, enabling the model not only to provide medical counsel but also to chronologically outline care regimens. Lastly, empirical evaluations indicate our method’s proficiency on time-stamped tasks and reveal an uptick in performance in broader medical Q&A domains. Through these efforts, we aspire to set new benchmarks in patient-centric and time-sensitive medical dialogue systems.
Exploiting Duality in Open Information Extraction with Predicate Prompt
Open information extraction (OpenIE) aims to extract schema-free triplets of the form (subject, predicate, object) from a given sentence. Compared with general information extraction (IE), OpenIE poses more challenges for IE models, especially when multiple complicated triplets exist in a sentence. To extract these complicated triplets more effectively, in this paper we propose a novel generative OpenIE model, DualOIE, which performs a dual task alongside extracting triplets from the sentence, i.e., converting the triplets back into the sentence. Such a dual task encourages the model to correctly recognize the structure of the given sentence and is thus helpful for extracting all potential triplets from the sentence. Specifically, DualOIE extracts triplets in two steps: 1) first extracting a sequence of all potential predicates, and 2) then using the predicate sequence as a prompt to induce the generation of triplets. Our experiments on two benchmarks and a dataset we constructed from Meituan demonstrate that DualOIE achieves the best performance among state-of-the-art baselines. Furthermore, an online A/B test on the Meituan platform shows improvements of 0.93% in QV-CTR and 0.56% in UV-CTR when the triplets extracted by DualOIE were leveraged in Meituan’s search system.
Customized and Robust Deep Neural Network Watermarking
As the excellent performance of deep neural networks (DNNs) enhances a wide spectrum of applications, protecting the intellectual property (IP) of DNNs has received increasing attention, and DNN watermarking approaches have been proposed for ownership verification to prevent potential misuse or theft of DNN models. However, we observe that existing DNN watermarking methods suffer from two major weaknesses: i) incomplete protection against advanced watermark removal attacks, such as fine-tuning with large learning rates, re-training after pruning, and, most importantly, the distillation attack; ii) limited customization ability, where multiple watermarked models cannot be uniquely identified, especially after removal attacks. To address these critical issues, we propose two new DNN watermarking approaches: Unified Soft-label Perturbation (USP), which provides a robust watermark for detecting model theft, and Customized Soft-label Perturbation (CSP), which embeds a different watermark in each copy of the DNN model to enable customized watermarking. Experimental results show that USP and CSP resist all the watermark removal attacks, especially the distillation attack, and that CSP achieves very promising watermark customization ability, significantly outperforming the other state-of-the-art baselines.
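To illustrate the general soft-label-perturbation idea, a secret pattern added to output probabilities and later verified by projecting a suspect model’s outputs onto that pattern, here is a toy sketch. The key shape, epsilon, and detection threshold are hypothetical illustrations, not the paper’s exact USP/CSP construction.

```python
# Toy sketch of soft-label-perturbation watermarking: add a small secret,
# zero-mean perturbation (the "key") to output probabilities, then verify
# ownership by the mean projection of outputs onto the key.

def embed_watermark(soft_labels, key, eps=0.02):
    """Shift each probability vector along the secret key, renormalize."""
    watermarked = []
    for p in soft_labels:
        q = [max(pi + eps * ki, 1e-6) for pi, ki in zip(p, key)]
        z = sum(q)
        watermarked.append([qi / z for qi in q])
    return watermarked

def detect_watermark(outputs, key, threshold=0.0):
    """Mean projection of output vectors onto the key; positive => watermarked."""
    score = sum(
        sum(pi * ki for pi, ki in zip(p, key)) for p in outputs
    ) / len(outputs)
    return score > threshold

key = [1.0, -1.0, 0.0]                  # secret, zero-mean key
clean = [[1 / 3, 1 / 3, 1 / 3]] * 4     # unwatermarked soft labels
print(detect_watermark(embed_watermark(clean, key), key))  # True
print(detect_watermark(clean, key))                        # False
```

A distilled copy trained on these perturbed soft labels would tend to inherit the bias along the key, which is the intuition for why soft-label watermarks can survive distillation; per-copy keys would give the customization CSP targets.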
Overlapping and Robust Edge-Colored Clustering in Hypergraphs
A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address both of these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
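A toy version of the greedy flavor of Edge-Colored Clustering: color each node with the plurality color among its incident (hyper)edges, then count edge "mistakes" (edges with some endpoint of a different color). The paper’s budgeted overlap and node-deletion variants are not modeled in this sketch.

```python
from collections import Counter

# Greedy plurality-color assignment for edge-colored (hyper)graph clustering.

def greedy_ecc(edges):
    """edges: list of (color, node_set). Returns (coloring, #mistakes)."""
    tally = {}
    for color, nodes in edges:
        for v in nodes:
            tally.setdefault(v, Counter())[color] += 1
    # Each node takes its most frequent incident edge color.
    coloring = {v: counts.most_common(1)[0][0] for v, counts in tally.items()}
    # An edge is a mistake if any of its endpoints has a different color.
    mistakes = sum(
        1 for color, nodes in edges if any(coloring[v] != color for v in nodes)
    )
    return coloring, mistakes

edges = [("red", {1, 2}), ("red", {2, 3}), ("blue", {3, 4})]
coloring, mistakes = greedy_ecc(edges)
print(coloring, mistakes)
```

Node 3 here is exactly the overlap case the paper targets: it sits between a red and a blue edge, so any single-color assignment leaves one mistake, whereas a budgeted overlapping assignment (or deleting the noisy node) could remove it.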
CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking
Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach for learning data representations without relying on labels. However, existing SSL methods require expensive computations over negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and aggregating them into a global embedding through a cross-modal aggregator. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing to handle missing inputs or negative-pair sampling for contrastive learning. We evaluate our method on a wide range of data, including motion sensors such as accelerometers and gyroscopes, and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal activity). Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning.
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization
Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBench, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pre-trained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on 5.5B tokens of geoscience text corpus, including over 1 million pieces of geoscience literature, and utilize GeoSignal’s supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce. Meanwhile, we equip K2 with the ability to use tools, making it a basic geoscience aide. Experiments conducted on GeoBench demonstrate the effectiveness of our approach and datasets for geoscience knowledge understanding and utilization. We open-source all the training data and K2 model checkpoints at https://github.com/davendw49/k2.
CL4DIV: A Contrastive Learning Framework for Search Result Diversification
Search result diversification aims to provide a diversified document ranking list so as to cover as many intents as possible and satisfy the various information needs of different users. Existing approaches usually represent documents by pretrained embeddings (such as doc2vec and GloVe). These document representations cannot adequately capture the document’s content and struggle to capture how well a document covers the user intents underlying a given query. Moreover, the limited amount of labeled data for search result diversification exacerbates the difficulty of obtaining more effective document representations. To alleviate these problems and learn more effective document representations, we propose a Contrastive Learning framework for search result DIVersification (CL4DIV). Specifically, we design three contrastive learning tasks from the perspective of subtopics, documents, and candidate document sequences, which correspond to three essential elements in search result diversification. These training tasks are employed to pretrain the document encoder and the document sequence encoder, which are used in the diversified ranking model. Experimental results show that CL4DIV significantly outperforms all existing diversification models. Further analysis demonstrates that our method has wide applicability and can also be used to improve several existing methods.
TTC-QuAli: A Text-Table-Chart Dataset for Multimodal Quantity Alignment
In modern documents, numerical information is often presented in multimodal formats such as text, tables, and charts. However, the heterogeneity of these sources poses a challenge for machines attempting to jointly read and understand the numerical semantics conveyed through text, tables, and charts. In this paper, we introduce a multimodal dataset called Text-Table-Chart Quantity Alignment (TTC-QuAli). This dataset is designed to facilitate a new task that involves linking related quantities across text, tables, and charts. TTC-QuAli is a comprehensive dataset that contains 4,498 quantities in text, aligned with 1,086 chart images and 1,503 tables from real-world statistical reports. It is the first dataset to provide high-quality annotations for linking quantities across multiple modalities, and it includes challenging composite (aggregated/calculated) quantity linking. To address the challenge of bridging representation gaps between different modalities and capturing their shared contextual semantics, we introduce ConTTC, a novel transformer-based cross-modal contrastive learning architecture. It is the first architecture to jointly model text, tables, and charts, employing contrastive learning to learn unified representations for multimodal quantity linking. Our experiments demonstrate that TTC-QuAli presents a significant challenge for existing baselines and serves as a valuable benchmark for future research. Experimental results show that ConTTC significantly outperforms all baseline methods.
Guardian: Guarding against Gradient Leakage with Provable Defense for Federated Learning
Federated learning is a privacy-focused learning paradigm, which trains a global model with gradients uploaded from multiple participants, circumventing explicit exposure of private data. However, previous research on gradient leakage attacks suggests that gradients alone are sufficient to reconstruct private data, rendering the privacy protection mechanism of federated learning unreliable. Existing defenses commonly craft transformed gradients based on ground-truth gradients to obfuscate the attacks, but are often less capable of maintaining good model performance together with satisfactory privacy protection. In this paper, we propose a novel and effective defense framework named guarding against gradient leakage (Guardian), which produces transformed gradients by jointly optimizing two theoretically derived metrics associated with gradients, one for performance maintenance and one for privacy protection. In this way, the transformed gradients produced by Guardian achieve, in theory, minimal privacy leakage at a given performance maintenance level. Moreover, we design an ingenious initialization strategy for faster generation of transformed gradients to enhance the practicality of Guardian in real-world applications, and we demonstrate theoretical convergence of Guardian with respect to the performance of the global model. Extensive experiments on various tasks show that, without sacrificing much accuracy, Guardian can effectively defend against state-of-the-art gradient leakage attacks, whereas baseline defense approaches have only a slight effect.
Contextual MAB Oriented Embedding Denoising for Sequential Recommendation
Deep neural networks have become the de-facto standard for sequential recommendation. In existing techniques, an embedding vector is assigned to each item, encoding all of its characteristics in latent space. Recommendation then reduces to devising a similarity metric to predict the user’s next behavior. Here, we consider each dimension of an embedding vector as a (latent) feature. Though effective, this treatment leaves it unknown which feature carries what semantics for an item. In reality, such interpretability is highly preferable, since a specific group of features may induce a particular relation among items while the others are irrelevant. Unfortunately, previous work overlooks feature semantics at such a fine-grained level. When each item has multiple latent aspects, as is prevalent in real-world data, the relations between items become very complex, and existing solutions often fail to improve recommendation performance. It is therefore necessary to disentangle the item embeddings and extract credible features in a context-aware manner.
To address this issue, in this work we present a novel Contextual MAB based Embedding Denoising model (COMED for short) to adaptively identify relevant dimension-level features for better recommendation. Specifically, COMED formulates the embedding denoising task as a Contextual Multi-armed Bandit problem. For each feature of the item embedding, we assign a two-armed neural bandit to determine whether the constituent semantics should be preserved, rendering the whole process as embedding denoising. By aggregating the denoised embeddings as contextual information, a reward function derived from the similarity between the historical interaction sequence and the target item is designed to approximate the maximum expected payoffs of the bandits for efficient learning. Considering the possible inefficiency of training this serial operating mechanism, we also design a swift learning strategy that accelerates the co-guidance between the renovated sequential embedding and the parallel actions of the neural bandits for better recommendation. Comprehensive experiments conducted on four widely recognized benchmarks substantiate the efficiency and efficacy of our framework.
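The per-dimension two-armed bandit idea can be sketched roughly as follows (a hypothetical epsilon-greedy sketch in NumPy; COMED itself uses neural bandits and a learned reward function, neither of which is reproduced here). Each embedding dimension independently learns whether keeping or dropping its feature yields higher similarity-based reward:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # embedding dimensionality; one two-armed bandit per dimension

# Estimated payoff of each arm (column 0 = drop the feature, 1 = keep it).
q = np.zeros((d, 2))
counts = np.ones((d, 2))

def denoise(item_emb, actions):
    """Zero out the dimensions whose bandit chose the 'drop' arm."""
    return item_emb * actions

def reward(seq_emb, target_emb):
    """Cosine similarity between the (denoised) history and the target item."""
    return seq_emb @ target_emb / (
        np.linalg.norm(seq_emb) * np.linalg.norm(target_emb) + 1e-9)

target = rng.normal(size=d)
for step in range(200):
    item = target + rng.normal(scale=0.1, size=d)
    item[3] = rng.normal(scale=5.0)  # dimension 3 carries pure noise
    explore = rng.random(d) < 0.1    # epsilon-greedy action selection
    actions = np.where(explore, rng.integers(0, 2, size=d), q.argmax(axis=1))
    r = reward(denoise(item, actions), target)
    counts[np.arange(d), actions] += 1
    q[np.arange(d), actions] += (r - q[np.arange(d), actions]) / counts[np.arange(d), actions]

# With enough steps, the bandit for the noisy dimension typically learns
# to prefer the 'drop' arm; the informative dimensions keep theirs.
```

This is only a caricature of the learning dynamics; it illustrates why a similarity-derived reward can drive per-dimension keep/drop decisions.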
Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights
Adapters, plug-in neural network modules with a small number of tunable parameters, have emerged as a parameter-efficient transfer learning technique for adapting pre-trained models to downstream tasks, especially in the natural language processing (NLP) and computer vision (CV) fields. Meanwhile, learning recommendation models directly from raw item modality features (e.g., texts in NLP and images in CV) can enable effective and transferable recommender systems (called TransRec). In view of this, a natural question arises: can adapter-based learning techniques achieve parameter-efficient TransRec with good performance?
To this end, we perform empirical studies to address several key sub-questions. First, we ask whether adapter-based TransRec performs comparably to TransRec based on standard full-parameter fine-tuning, and whether this holds for recommendation with different item modalities, e.g., textual RS and visual RS. Second, we benchmark existing adapters, which have been shown to be effective in NLP and CV tasks, on item recommendation tasks. Third, we carefully study several key factors for adapter-based TransRec, namely where and how to insert these adapters. Finally, we look at the effects of adapter-based TransRec when either scaling up its source training data or scaling down its target training data. Our paper provides key insights and practical guidance on unified and transferable recommendation, a less studied recommendation scenario. We release our codes and other materials at https://github.com/westlake-repl/Adapter4Rec/.
DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting
Subgraph counting is the problem of counting the occurrences of a given query graph in a large target graph. Large-scale subgraph counting is useful in various domains, such as motif analysis for social networks and loop counting for money laundering detection. Recently, neural methods have been proposed to address the exponential runtime complexity of exact subgraph counting. However, existing approaches fall short in three aspects. First, subgraph counts vary from zero to millions across different graphs, posing a much larger challenge than regular graph regression tasks. Second, current scalable graph neural networks have limited expressive power and fail to efficiently distinguish graphs for count prediction. Furthermore, existing neural approaches cannot predict query occurrence positions.
We introduce DeSCo, a scalable neural deep subgraph counting pipeline designed to accurately predict both the count and the occurrence positions of queries on target graphs after a single training run. First, DeSCo uses a novel canonical partition to divide the large target graph into small neighborhood graphs, greatly reducing the count variation while guaranteeing no missing or double counting. Second, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately count in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with a 137× improvement in the mean squared error of count prediction, while maintaining polynomial runtime complexity. Our open-source project is at https://github.com/fuvty/DeSCo.
PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation
By helping merchants provide, and customers access, a variety of services through mini-apps, online service platforms have come to occupy a critical position in effective content delivery, making it increasingly urgent to recommend items in a new domain launched by the service provider. However, the non-negligible gap between the source and diversified target domains poses a considerable challenge to cross-domain recommendation systems, which often leads to performance bottlenecks in industrial settings. While entity graphs have the potential to serve as a bridge between domains, rudimentary utilization still fails to distill useful knowledge and can even induce the negative transfer issue. To this end, we propose PEACE, a Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation. To bridge the domain gap, PEACE is built upon a multi-interest and entity-oriented pre-training architecture, which not only benefits the learning of generalized knowledge in a multi-granularity manner but also helps leverage more structural information in the entity graph. Then, we bring prototype learning into the pre-training over source domains, so that representations of users and items are greatly improved by the contrastive prototype learning module and the prototype enhanced attention mechanism for adaptive knowledge utilization. To ease the pressure of online serving, PEACE is deployed in a lightweight manner, and significant performance improvements are observed in both online and offline environments.
CausalMMM: Learning Causal Structure for Marketing Mix Modeling
In online advertising, marketing mix modeling (MMM) is employed to predict the gross merchandise volume (GMV) of brand shops and help decision-makers adjust the budget allocation across advertising channels. Traditional MMM methods based on regression techniques can fail to handle the complexity of marketing. Although some efforts try to encode causal structures for better prediction, they carry the strict restriction that the causal structures are known a priori and unchangeable. In this paper, we define a new causal MMM problem that automatically discovers interpretable causal structures from data and yields better GMV predictions. To achieve causal MMM, two essential challenges must be addressed: (1) Causal Heterogeneity: the causal structures of different kinds of shops vary greatly. (2) Marketing Response Patterns: various marketing response patterns, i.e., the carryover effect and the shape effect, have been validated in practice. We argue that causal MMM needs to dynamically discover specific causal structures for different shops, and that its predictions should comply with the prior-known marketing response patterns. Thus, we propose CausalMMM, which integrates Granger causality in a variational inference framework to measure the causal relationships between different channels and predict GMV with the regularization of both temporal and saturation marketing response patterns. Extensive experiments show that CausalMMM not only achieves superior causal structure learning on synthetic datasets, with improvements of 5.7% to 7.1%, but also enhances GMV prediction results on a representative E-commerce platform.
Calibration-compatible Listwise Distillation of Privileged Features for CTR Prediction
In machine learning systems, privileged features refer to the features that are available during offline training but inaccessible for online serving. Previous studies have recognized the importance of privileged features and explored ways to tackle online-offline discrepancies. A typical practice is privileged features distillation (PFD): train a teacher model using all features (including privileged ones) and then distill the knowledge from the teacher model using a student model (excluding the privileged features), which is then employed for online serving. In practice, the pointwise cross-entropy loss is often adopted for PFD. However, this loss is insufficient to distill the ranking ability for CTR prediction. First, it does not consider the non-i.i.d. characteristic of the data distribution, i.e., other items on the same page significantly impact the click probability of the candidate item. Second, it fails to consider the relative item order ranked by the teacher model’s predictions, which is essential to distill the ranking ability. To address these issues, we first extend the pointwise-based PFD to the listwise-based PFD. We then define the calibration-compatible property of distillation loss and show that commonly used listwise losses do not satisfy this property when employed as distillation loss, thus compromising the model’s calibration ability, which is another important measure for CTR prediction. To tackle this dilemma, we propose Calibration-compatible LIstwise Distillation (CLID), which employs carefully-designed listwise distillation loss to achieve better ranking ability than the pointwise-based PFD while preserving the model’s calibration ability. We theoretically prove it is calibration-compatible. Extensive experiments on public datasets and a production dataset collected from the display advertising system of Alibaba further demonstrate the effectiveness of CLID.
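The pointwise-versus-listwise distinction above can be made concrete with a generic NumPy sketch (the logits below are illustrative, and CLID's actual calibration-compatible loss is more carefully designed than this plain listwise KL divergence):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pointwise_distill(student_logits, teacher_probs):
    """Pointwise binary cross-entropy against the teacher's per-item
    probabilities; it ignores the other items on the same page."""
    p = 1.0 / (1.0 + np.exp(-student_logits))
    return -np.mean(teacher_probs * np.log(p) + (1 - teacher_probs) * np.log(1 - p))

def listwise_distill(student_logits, teacher_logits):
    """Listwise KL divergence between teacher and student softmax over one
    page, so the teacher's relative item order is what gets distilled."""
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

teacher = np.array([2.0, 0.5, -1.0])  # teacher (with privileged features) scores one page
student = np.array([1.5, 0.7, -0.8])  # student scores the same items

loss_list = listwise_distill(student, teacher)
loss_point = pointwise_distill(student, 1.0 / (1.0 + np.exp(-teacher)))
```

The listwise loss is zero exactly when the student reproduces the teacher's distribution over the page, which is why a naively chosen listwise loss can distort absolute click probabilities and motivates the calibration-compatibility analysis in the paper.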
Motif-based Prompt Learning for Universal Cross-domain Recommendation
Cross-Domain Recommendation (CDR) stands as a pivotal technology addressing issues of data sparsity and cold start by transferring general knowledge from the source to the target domain. However, existing CDR models suffer from limited adaptability across various scenarios due to their inherent complexity. To tackle this challenge, recent advancements introduce universal CDR models that leverage shared embeddings to capture general knowledge across domains and transfer it through “Multi-task Learning” or “Pre-train, Fine-tune” paradigms. However, these models often overlook the broader structural topology that spans domains and fail to align training objectives, potentially leading to negative transfer. To address these issues, we propose a motif-based prompt learning framework, MOP, which introduces motif-based shared embeddings to encapsulate generalized domain knowledge, catering to both intra-domain and inter-domain CDR tasks. Specifically, we devise three typical motifs: butterfly, triangle, and random walk, and encode them through a Motif-based Encoder to obtain motif-based shared embeddings. Moreover, we train MOP under the “Pre-training & Prompt Tuning” paradigm. By unifying pre-training and recommendation tasks as a common motif-based similarity learning task and integrating adaptable prompt parameters to guide the model in downstream recommendation tasks, MOP excels in transferring domain knowledge effectively. Experimental results on four distinct CDR tasks demonstrate that MOP outperforms state-of-the-art models.
User Behavior Enriched Temporal Knowledge Graphs for Sequential Recommendation
Knowledge Graphs (KGs) enhance recommendations by providing external connectivity between items. However, there is limited research on distilling relevant knowledge in sequential recommendation, where item connections can change over time. To address this, we introduce the Temporal Knowledge Graph (TKG), which incorporates such dynamic features of user behaviors into the original KG while emphasizing sequential relationships. The TKG captures both patterns of entity dynamics (nodes) and structural dynamics (edges). Considering real-world applications with large-scale and rapidly evolving user behavior patterns, we propose an efficient two-phase framework called TKG-SRec, which strengthens Sequential Recommendation with Temporal KGs. In the first phase, we learn dynamic entity embeddings using our novel Knowledge Evolution Network (KEN), which brings together pretrained static knowledge with evolving temporal knowledge. In the second phase, downstream sequential recommender models utilize these time-specific dynamic entity embeddings with compatible neural backbones like GRUs, Transformers, and MLPs. In our extensive experiments over four datasets, TKG-SRec outperforms the current state-of-the-art by a statistically significant 5% on average. Detailed analysis validates that such filtered temporal knowledge better adapts entity embeddings for sequential recommendation. In summary, TKG-SRec provides an effective and efficient approach.
User Consented Federated Recommender System Against Personalized Attribute Inference Attack
Recommender systems can be privacy-sensitive. To protect users’ private historical interactions, federated learning has been proposed for distributed learning of user representations. Using federated recommender (FedRec) systems, users can train a shared recommendation model on local devices, preventing raw data transmission and collection. However, the recommendation model learned by a common FedRec may still be vulnerable to private information leakage, particularly attribute inference attacks, meaning that an attacker can easily infer users’ personal attributes from the learned model. Additionally, traditional FedRecs seldom consider the diverse privacy preferences of users, leading to difficulties in balancing recommendation utility and privacy preservation. Consequently, FedRecs may simultaneously suffer unnecessary recommendation performance loss from over-protection and private information leakage. In this work, we propose a novel user-consented federated recommendation system (UC-FedRec) that flexibly satisfies users’ different privacy needs at a minimal price in recommendation accuracy. UC-FedRec allows users to self-define their privacy preferences to meet various demands and makes recommendations with user consent. Experiments conducted on different real-world datasets demonstrate that our framework is more efficient and flexible than the baselines. Our code is available at https://github.com/HKUST-KnowComp/UC-FedRec.
SCAD: Subspace Clustering based Adversarial Detector
Adversarial examples pose significant challenges to the robustness of Natural Language Processing (NLP) models, often causing notable performance degradation. While various detection methods have been proposed with the aim of differentiating clean and adversarial inputs, they often require fine-tuning with ample data, which is problematic in low-resource scenarios. To alleviate this issue, we propose a Subspace Clustering based Adversarial Detector (termed SCAD), which leverages a union of subspaces to model the clean data distribution. Specifically, SCAD estimates the feature distribution across semantic subspaces, assigning unseen examples to the nearest one for effective discrimination. The construction of semantic subspaces does not require many observations and is hence ideal for the low-resource setting.
The proposed algorithm achieves detection results better than or competitive with previous state-of-the-art methods on a combination of three well-known text classification benchmarks and four attack methods. Further empirical analysis suggests that SCAD effectively handles the low-resource setting where clean training data is limited.
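The nearest-subspace rule can be sketched in NumPy as follows (the synthetic clusters, subspace rank, and threshold are illustrative assumptions rather than SCAD's actual configuration): each semantic class is modeled by a low-dimensional affine subspace fitted from a handful of clean examples, and an input whose projection residual is large for every subspace is flagged as adversarial.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_subspace(X, k):
    """Mean and top-k principal directions of one semantic cluster."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return X.mean(axis=0), Vt[:k]

def residual(x, mean, basis):
    """Distance from x to the affine subspace (projection residual)."""
    v = x - mean
    return np.linalg.norm(v - basis.T @ (basis @ v))

# Two clean "semantic subspaces": clusters living on different 2-D planes of R^8.
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
B = 5.0 + rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
subspaces = [fit_subspace(A, 2), fit_subspace(B, 2)]

def detect(x, threshold):
    """Flag x as adversarial if it lies far from every clean subspace."""
    d = min(residual(m_x, m, V) for m, V in subspaces for m_x in [x])
    return d > threshold
```

Note that fitting each subspace needs only enough clean examples to span a few principal directions, which is the intuition behind the low-resource claim.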
From Second to First: Mixed Censored Multi-Task Learning for Winning Price Prediction
A transformation from second-price auctions (SPA) to first-price auctions (FPA) has been observed in online advertising. The consequent coexistence of mixed FPA and SPA auction types has further led to the problem of mixed censorship, making bid landscape forecasting, the prerequisite for bid shading, more difficult. Our key insight is that the winning price (WP) under SPA can be effectively transferred to FPA scenarios if they share similar user groups, advertisers, and bidding environments. Fully utilizing the winning price under mixed censorship can effectively alleviate the FPA censorship problem and improve the performance of winning price prediction (a.k.a. bid landscape forecasting). In this work, we propose a Multi-task Mixed Censorship Predictor (MMCP) that uses multi-task learning (MTL) to leverage the WP under SPA as supervision for FPA. A Double-gate Mixture-of-Experts architecture is proposed to alleviate the negative transfer problem of multi-task learning in our context. Furthermore, several auxiliary modules, including a first-second mapping module and an adaptive censorship loss function, are introduced to integrate MTL and winning price prediction. Extensive experiments on two real-world datasets demonstrate the superior performance of the proposed MMCP compared with other state-of-the-art FPA models under various performance metrics. The implementation is available on GitHub (https://github.com/Currycurrycurry/MMCP/).
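Mixed censorship itself is easy to state in likelihood terms. The stdlib-only sketch below (the Gaussian WP model and this Tobit-style loss are our illustrative assumptions, not MMCP's adaptive censorship loss) shows how SPA logs give exact labels while FPA logs only bound the winning price from one side:

```python
from math import erf, log, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def censored_nll(mu, sigma, bid, outcome, wp=None):
    """Negative log-likelihood of one auction under a Gaussian WP model.

    outcome: 'spa'      -> WP observed exactly (second-price logs)
             'fpa_win'  -> only known that WP <= bid
             'fpa_lose' -> only known that WP > bid
    """
    if outcome == 'spa':  # exact label: plain Gaussian log-density
        return 0.5 * ((wp - mu) / sigma) ** 2 + log(sigma)
    z = (bid - mu) / sigma
    if outcome == 'fpa_win':  # probability mass below the bid
        return -log(max(norm_cdf(z), 1e-12))
    return -log(max(1.0 - norm_cdf(z), 1e-12))  # 'fpa_lose': mass above the bid
```

Summing this loss over a mixed SPA/FPA log and minimizing over (mu, sigma) per context is one standard way to let uncensored SPA labels anchor the censored FPA samples.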
Incomplete Graph Learning via Attribute-Structure Decoupled Variational Auto-Encoder
Graph Neural Networks (GNNs) conventionally operate under the assumption that node attributes are entirely observable. Their performance deteriorates notably when confronted with incomplete graphs, due to the inherent message-passing mechanisms. Current solutions either employ classic imputation techniques or adapt GNNs to tolerate missing attributes. However, their ability to generalize is impeded, especially when dealing with high rates of missing attributes. To address this, we map the representations of the two essential views on graphs, attributes and structure, into a common shared latent space, ensuring robust tolerance even at high missing rates. Our proposed neural model, named ASD-VAE, parameterizes this space via a coupled-and-decoupled learning procedure, reminiscent of brain cognitive processes and multimodal fusion. Initially, ASD-VAE separately encodes attributes and structure, generating representations for each view. A shared latent space is then learned by maximizing the likelihood of the joint distribution of the different view representations through coupling. Next, the shared latent space is decoupled into separate views, and the reconstruction loss of each view is calculated. Finally, the missing attribute values are imputed from the learned latent space. In this way, the model offers enhanced resilience against the skewed and biased distributions typified by missing information and subsequently benefits downstream graph machine-learning tasks. Extensive experiments conducted on four typical real-world incomplete graph datasets demonstrate the superior performance of ASD-VAE against the state of the art.
DiffKG: Knowledge Graph Diffusion Model for Recommendation
Knowledge Graphs (KGs) have emerged as invaluable resources for enriching recommendation systems by providing a wealth of factual information and capturing semantic relationships among items. Leveraging KGs can significantly enhance recommendation performance. However, not all relations within a KG are equally relevant or beneficial for the target recommendation task. In fact, certain item-entity connections may introduce noise or lack informative value, thus potentially misleading our understanding of user preferences. To bridge this research gap, we propose a novel knowledge graph diffusion model for recommendation, referred to as DiffKG. Our framework integrates a generative diffusion model with a data augmentation paradigm, enabling robust knowledge graph representation learning. This integration facilitates a better alignment between knowledge-aware item semantics and collaborative relation modeling. Moreover, we introduce a collaborative knowledge graph convolution mechanism that incorporates collaborative signals reflecting user-item interaction patterns, guiding the knowledge graph diffusion process. We conduct extensive experiments on three publicly available datasets, consistently demonstrating the superiority of our DiffKG compared to various competitive baselines. We provide the source code repository of our proposed DiffKG model at the following link: https://github.com/HKUDS/DiffKG
Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation
Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase the robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding them only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of the robustness of trained models, we use and release large numbers of diverse reformulations generated by prompting ChatGPT for benchmark test sets (resulting in a 20x increase in size). Our findings show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only.
MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation
In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs), where multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users’ preferences for items. To this end, we point out the following two limitations of existing GCN-based multimedia recommender systems: (L1) although the multimodal features of items a user has interacted with can reveal her preferences, existing methods utilize GCNs designed to focus only on capturing collaborative signals, resulting in insufficient reflection of the multimodal features in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her with a single embedding regardless of the target item’s multimodal features and then use that embedding to predict her preference for the target item. To address these issues, we propose a novel multimedia recommender system, named MONET, composed of the following two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET codes are available at https://github.com/Kimyungi/MONET.
C²DR: Robust Cross-Domain Recommendation based on Causal Disentanglement
Cross-domain recommendation aims to leverage heterogeneous information to transfer knowledge from a data-sufficient domain (source domain) to a data-scarce domain (target domain). Existing approaches mainly focus on learning single-domain user preferences and then employ a transfer module to obtain cross-domain user preferences, but they ignore the modeling of users’ domain-specific preferences on items. We argue that incorporating domain-specific preferences from the source domain introduces irrelevant information that does not generalize to the target domain. Additionally, directly combining domain-shared and domain-specific information may hinder the target domain’s performance. To this end, we propose C²DR, a novel approach that disentangles domain-shared and domain-specific preferences from a causal perspective. Specifically, we formulate a causal graph that captures the critical causal relationships in the underlying recommendation process, explicitly identifying domain-shared and domain-specific information as causally independent variables. Then, we introduce disentanglement regularization terms to learn distinct representations of the causal variables that obey the independence constraints of the causal graph. Remarkably, our proposed method enables effective intervention and transfer of domain-shared information, thereby improving the robustness of the recommendation model. We evaluate the efficacy of C²DR through extensive experiments on three real-world datasets, demonstrating significant improvements over state-of-the-art baselines.
Likelihood-Based Methods Improve Parameter Estimation in Opinion Dynamics Models
We show that a maximum likelihood approach for parameter estimation in agent-based models (ABMs) of opinion dynamics outperforms the typical simulation-based approach. Simulation-based approaches simulate the model repeatedly in search of a set of parameters that generates data similar enough to the observed one. In contrast, likelihood-based approaches derive a likelihood function that connects the unknown parameters to the observed data in a statistically principled way. We compare these two approaches on the well-known bounded-confidence model of opinion dynamics.
We do so in three realistic scenarios of increasing complexity in terms of data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, and (iii) observed interactions with noisy proxies of the opinions. To realize the likelihood-based approach, we first cast the model into a probabilistic generative form that admits a proper data likelihood. Then, we describe the three scenarios via probabilistic graphical models and show the nuances involved in translating the model. Finally, we implement these models in an automatic differentiation framework, enabling easy and efficient maximum likelihood estimation via gradient descent. The resulting likelihood-based estimates are up to 4× more accurate and require up to \timeratio× less computational time.
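As a toy illustration of the likelihood-based approach in scenario (i), one can write down a Gaussian log-likelihood for a bounded-confidence-style opinion update and maximize it by gradient descent. Everything below is our own simplified assumption (parameter names, noise model), with a hand-derived gradient standing in for the autodiff framework described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: opinion updates with unknown convergence rate mu and
# Gaussian observation noise (illustrative assumptions, not the paper's setup).
mu_true, sigma, n = 0.3, 0.05, 500
x_i = rng.uniform(0.0, 1.0, n)                 # opinions before interaction
x_j = x_i + rng.uniform(-0.2, 0.2, n)          # interaction partners (in bound)
x_next = x_i + mu_true * (x_j - x_i) + rng.normal(0.0, sigma, n)

def grad_neg_log_lik(mu):
    # Gradient of the Gaussian negative log-likelihood w.r.t. mu
    # (up to the constant 1/sigma^2 factor, which only rescales the step).
    resid = x_next - x_i - mu * (x_j - x_i)
    return -2.0 * np.sum(resid * (x_j - x_i))

mu_hat = 0.0
for _ in range(500):                           # plain gradient descent
    mu_hat -= 1e-2 * grad_neg_log_lik(mu_hat)
```

Unlike the simulation-based approach, no repeated forward simulation over candidate parameters is needed: a few hundred gradient steps recover `mu_hat` close to `mu_true`.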
Multi-Intent Attribute-Aware Text Matching in Searching
Text matching systems have become a fundamental service on most search platforms. For instance, they are responsible for matching user queries to relevant candidate items, or for rewriting a user-input query into a pre-selected high-performing one for a better search experience. In practice, both queries and items often contain multiple attributes, such as the category of an item or the location mentioned in a query, which represent condensed key information helpful for matching. However, most existing works downplay the effectiveness of attributes by integrating them into text representations as mere supplementary information. Hence, in this work, we focus on exploring the relationship between the attributes on the two sides. Since attributes from the two ends are often not aligned in number or type, we propose to exploit them through multiple-intent modeling. The intents extracted from attributes summarize the diverse needs of queries and the rich content of items; they are more refined and abstract, and can be aligned for paired inputs. Concretely, we propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: an attribute-aware encoder, multi-intent modeling, and intent-aware matching. In the attribute-aware encoder, the text and attributes are weighted and processed through a scaled attention mechanism according to the attributes’ importance. Afterward, the multi-intent modeling component extracts intents from the two ends and aligns them. Here, we introduce a distribution loss to ensure the learned intents are diverse yet concentrated, and a Kullback-Leibler divergence loss to align the learned intents. Finally, in intent-aware matching, the intents are evaluated via a self-supervised masking task and then incorporated to produce the final matching result.
Extensive experiments on three real-world datasets from different matching scenarios show that MIM significantly outperforms state-of-the-art matching baselines. MIM has also been validated in an online A/B test, where it brings significant improvements on three business metrics in the query rewriting and query-item relevance tasks compared with the online baseline in the Alipay App.
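The KL-based intent alignment described above can be sketched as follows; the shapes and softmax parameterization are our assumptions, not MIM's actual implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_alignment_loss(query_intent_logits, item_intent_logits, eps=1e-9):
    """KL(P_query || P_item) averaged over the batch: small when the intent
    distributions extracted from the two ends agree, large when they diverge."""
    p = softmax(query_intent_logits)
    q = softmax(item_intent_logits)
    return float((p * (np.log(p + eps) - np.log(q + eps))).sum(-1).mean())
```

Minimizing this loss pulls the query-side and item-side intent distributions toward each other, which is the role the alignment term plays alongside the diversity-encouraging distribution loss.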
Text-Video Retrieval via Multi-Modal Hypergraph Networks
Text-video retrieval is a challenging task that aims to identify relevant videos given textual queries. Compared to conventional textual retrieval, the main obstacle in text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content. Previous works primarily focus on aligning the query and the video by finely aggregating word-frame matching signals. However, inspired by the human cognitive process of modularly judging the relevance between text and video, we argue that such judgment requires high-order matching signals, owing to the consecutive and complex nature of video content. In this paper, we propose chunk-level text-video matching, where query chunks are extracted to describe a specific retrieval unit, and video chunks are segmented into distinct clips. We formulate chunk-level matching as n-ary correlation modeling between the words of the query and the frames of the video, and introduce a multi-modal hypergraph for this purpose. By representing textual units and video frames as nodes and using hyperedges to depict their relationships, a multi-modal hypergraph is constructed, so that the query and the video can be aligned in a high-order semantic space. In addition, to enhance the model’s generalization ability, the extracted features are fed into a variational inference component to obtain variational representations under a Gaussian distribution. The combination of hypergraphs and variational inference allows our model to capture complex, n-ary interactions among textual and visual content. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the text-video retrieval task.
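A toy version of the n-ary chunk-level correlation can be written with a hypergraph incidence matrix, where one hyperedge jointly connects all words and frames of a chunk. The node names, chunk assignments, and degree-normalization scheme below are illustrative, not the paper's.

```python
import numpy as np

# Nodes: query words (w*) and video frames (f*); each hyperedge is a chunk
# that jointly connects several words and frames (an n-ary correlation).
nodes = ["w0", "w1", "w2", "f0", "f1", "f2"]
chunks = {"chunk_a": ["w0", "w1", "f0", "f1"],
          "chunk_b": ["w2", "f1", "f2"]}

H = np.zeros((len(nodes), len(chunks)))        # incidence matrix
for e, members in enumerate(chunks.values()):
    for m in members:
        H[nodes.index(m), e] = 1.0

# One node -> hyperedge -> node propagation step with degree normalization.
deg_v = H.sum(axis=1, keepdims=True)           # node degrees
deg_e = H.sum(axis=0, keepdims=True)           # hyperedge sizes
X = np.eye(len(nodes))                         # toy node features
X_new = (H / deg_v) @ ((H / deg_e).T @ X)
```

After one such step, a frame's feature mixes with every word of the same chunk at once, rather than only through pairwise word-frame edges — the high-order signal the abstract argues for.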
CDRNP: Cross-Domain Recommendation to Cold-Start Users via Neural Process
Cross-domain recommendation (CDR) has proven to be a promising way to tackle the user cold-start problem: it makes recommendations for users in the target domain by transferring user preferences derived from the source domain. Traditional CDR studies follow the embedding-and-mapping (EMCDR) paradigm, which transfers user representations from the source to the target domain by learning a user-shared mapping function, neglecting user-specific preferences. Recent CDR studies attempt to learn user-specific mapping functions in a meta-learning paradigm, which regards each user’s CDR as an individual task but neglects the preference correlations among users, limiting the information available for user representations. Moreover, both paradigms neglect the explicit user-item interactions from both domains during the mapping process. To address these issues, this paper proposes a novel CDR framework with neural processes (NP), termed CDRNP. In particular, it builds on the meta-learning paradigm to leverage user-specific preferences, and further introduces a stochastic process via NP to capture the preference correlations among overlapping and cold-start users, thus generating more powerful mapping functions that map the user-specific preferences and common preference correlations to a predictive probability distribution. In addition, we introduce a preference remainer to enhance the common preferences of overlapping users, and finally devise an adaptive conditional decoder with preference modulation to make predictions for cold-start users on items in the target domain. Experimental results demonstrate that CDRNP outperforms previous SOTA methods in three real-world CDR scenarios.
Global Heterogeneous Graph and Target Interest Denoising for Multi-behavior Sequential Recommendation
Multi-behavior sequential recommendation (MBSR) predicts a user’s next item of interest based on their interaction history across different behavior types. Although existing studies have proposed capturing the correlations between different types of behavior, two important challenges remain unexplored: (i) handling heterogeneous item transitions (from both global and local perspectives), and (ii) mitigating the noise that arises from incorporating auxiliary behaviors. To address these issues, we propose a novel solution, Global Heterogeneous Graph and Target Interest Denoising for Multi-behavior Sequential Recommendation (GHTID). In particular, we view the transitions between items under different behavior types as different relations and construct two heterogeneous graphs. By considering item relationships under these behavior-type transitions, we propose two heterogeneous graph convolution modules that explicitly learn heterogeneous item transitions. Moreover, we utilize two attention networks to integrate the long-term and short-term interests associated with the target behavior, alleviating the noisy interference of auxiliary behaviors. Extensive experiments on four real-world datasets demonstrate that our method outperforms other state-of-the-art methods.
Inverse Learning with Extremely Sparse Feedback for Recommendation
Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve service quality. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like clicks. However, in full-screen video viewing experiences like TikTok and Reels, the click action is absent, making user feedback unclear and introducing noise into model training. Existing approaches to denoising recommendation mainly focus on positive instances while ignoring the noise in the large number of sampled negative instances. In this paper, we propose a meta-learning method to annotate unlabeled data from loss and gradient perspectives, considering the noise in both positive and negative instances. Specifically, we first propose an Inverse Dual Loss (IDL) to boost true-label learning and prevent false-label learning. We then propose an Inverse Gradient (IG) method to explore the correct updating gradient and adjust the update via meta-learning. Finally, we conduct extensive experiments on both benchmark and industrial datasets, where our proposed method significantly improves AUC by 9.25% against state-of-the-art methods. Further analysis verifies that the proposed inverse learning framework is model-agnostic and can improve a variety of recommendation backbones. The source code, along with the best hyper-parameter settings, is available at this link: https://github.com/Guanyu-Lin/InverseLearning.
Mixed Attention Network for Cross-domain Sequential Recommendation
In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, but it suffers from data sparsity issues, especially for new users. One promising line of work is cross-domain recommendation, which trains models on data from multiple domains to improve performance in data-scarce domains. Recently proposed cross-domain sequential recommendation models such as PiNet and DASL share a common drawback: they rely heavily on users who overlap across domains, which limits their use in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract domain-specific and cross-domain information. First, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential patterns. Then we propose a mixed attention layer with item-similarity attention, sequence-fusion attention, and group-prototype attention to capture local/global item similarity, fuse the local/global item sequences, and extract user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further analysis also illustrates that our proposed method is model-agnostic and that each component is effective. The code and data are available at https://github.com/Guanyu-Lin/MAN.
Multi-Sequence Attentive User Representation Learning for Side-information Integrated Sequential Recommendation
Side-information integrated sequential recommendation incorporates supplementary information to alleviate data sparsity. State-of-the-art works mainly leverage side information to improve the attention calculation and thus learn user representations more accurately. However, some limitations remain. Most works merely learn the user representation at the item level and overlook the association between the item sequence and the side-information sequences when calculating attention, resulting in an incomplete user representation. Others learn user representations at both the item and side-information levels but still suffer from insufficient optimization of the multiple user representations. To address these limitations, we propose a novel model, the Multi-Sequence Sequential Recommender (MSSR), which learns multiple user representations from diverse sequences. Specifically, we design a multi-sequence integrated attention layer that learns more attentive pairs than existing works and adaptively fuses these pairs to learn the user representation. Moreover, our user representation alignment module constructs self-supervised signals to optimize the representations, which are further refined by our side-information predictor during training. For item prediction, MSSR additionally considers the side information of the candidate item, enabling a comprehensive measurement of the user’s preferences. Extensive experiments on four public datasets show that MSSR outperforms eleven state-of-the-art baselines. Visualization and a case study also demonstrate the rationality and interpretability of MSSR.
Pre-trained Recommender Systems: A Causal Debiasing Perspective
Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI, where models are pre-trained on broad data describing a generic task space and then adapted successfully to a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting this paradigm to recommender systems, which have been less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, and that can then be rapidly adapted to improve few-shot learning performance in unseen new domains (with limited data).
However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different E-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named \model. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.
MultiFS: Automated Multi-Scenario Feature Selection in Deep Recommender Systems
Multi-scenario recommender systems (MSRSs) have been increasingly used on real-world industrial platforms for their excellent advantages in mitigating data sparsity and reducing maintenance costs. However, conventional MSRSs usually use all relevant features indiscriminately, ignoring that different kinds of features have varying importance under different scenarios, which can cause confusion and performance degradation. In addition, existing feature selection methods for deep recommender systems tend not to explore scenario relations. In this paper, we propose a novel automated multi-scenario feature selection (MultiFS) framework to bridge this gap; it considers scenario relations and utilizes a hierarchical gating mechanism to select features for each scenario. Specifically, MultiFS first efficiently obtains feature importance across all scenarios through a scenario-shared gate. Then, scenario-specific gates identify feature importance for individual scenarios from the subset of features the shared gate ranks as less important. Subsequently, MultiFS imposes constraints on the two gates to make the learning mechanism more feasible and combines the two to select exclusive features for different scenarios. We evaluate MultiFS and demonstrate its ability to enhance multi-scenario model performance through experiments on two public multi-scenario benchmarks.
Capturing Temporal Node Evolution via Self-supervised Learning: A New Perspective on Dynamic Graph Learning
Dynamic graphs play an important role in many fields, such as social relationship analysis, recommender systems, and medical science, as graphs evolve over time, and capturing their evolution patterns is fundamental. Existing works mostly focus on constraining the temporal smoothness between neighboring snapshots but fail to capture sharp shifts, which can be beneficial for graph dynamics embedding. To address this, we assume the evolution of a dynamic graph node can be split into a temporal shift embedding and a temporal consistency embedding. We therefore propose the Self-supervised Temporal-aware Dynamic Graph representation Learning framework (STDGL), which disentangles the temporal shift embedding from the temporal consistency embedding via a well-designed auxiliary task that models both local and global node connectivity in a self-supervised manner, enhancing the learning of interpretable graph representations and improving performance on various downstream tasks. Extensive experiments on link prediction, edge classification, and node classification tasks demonstrate that STDGL successfully learns disentangled temporal shift and consistency representations. Furthermore, the results show significant improvements of STDGL over state-of-the-art methods, along with appealing interpretability and transferability owing to the disentangled node representations.
ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models
Personalized content-based recommender systems have become indispensable tools for users to navigate the vast amount of content available on platforms like daily news websites and book recommendation services. However, existing recommenders face significant challenges in understanding the content of items. Large language models (LLMs), which possess deep semantic comprehension and extensive knowledge from pretraining, have proven effective in various natural language processing tasks. In this study, we explore the potential of leveraging both open- and closed-source LLMs to enhance content-based recommendation. With open-source LLMs, we utilize their deep layers as content encoders, enriching the representation of content at the embedding level. For closed-source LLMs, we employ prompting techniques to enrich the training data at the token level. Through comprehensive experiments, we demonstrate the high effectiveness of both types of LLMs and show the synergistic relationship between them. Notably, we observe a significant relative improvement of up to 19.32% over existing state-of-the-art recommendation models. These findings highlight the immense potential of both open- and closed-source LLMs for enhancing content-based recommendation systems. We have made our code and LLM-generated data available (https://github.com/Jyonn/ONCE) for other researchers to reproduce our results.
Knowledge Graph Context-Enhanced Diversified Recommendation
The field of Recommender Systems (RecSys) has been extensively studied to enhance accuracy by leveraging users’ historical interactions. Nonetheless, this persistent pursuit of accuracy frequently engenders diminished diversity, culminating in the well-recognized “echo chamber” phenomenon. Diversified RecSys has emerged as a countermeasure, placing diversity on par with accuracy and garnering noteworthy attention from academic circles and industry practitioners. This research explores diversified RecSys within the intricate context of knowledge graphs (KGs). These KGs act as repositories of interconnected information concerning entities and items, offering a propitious avenue for amplifying recommendation diversity through the incorporation of insightful contextual information. Our contributions include introducing two innovative metrics, Entity Coverage and Relation Coverage, which effectively quantify diversity within the KG domain. Additionally, we introduce the Diversified Embedding Learning (DEL) module, meticulously designed to formulate user representations with an innate awareness of diversity. In tandem, we introduce a novel technique named Conditional Alignment and Uniformity (CAU), which adeptly encodes KG item embeddings while preserving contextual integrity. Collectively, our contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within KG-informed RecSys paradigms.
Interact with the Explanations: Causal Debiased Explainable Recommendation System
In recent years, the field of recommendation systems has witnessed significant advancements, with explainable recommendation systems gaining prominence as a crucial area of research. These systems aim to enhance user experience by providing transparent and compelling recommendations, accompanied by explanations. However, a persistent challenge lies in addressing biases that can influence the recommendations and explanations offered by these systems. Such biases often stem from a tendency to favor popular items and generate explanations that highlight their common attributes, thereby deviating from the objective of delivering personalized recommendations and explanations. While existing debiasing methods have been applied in explainable recommendation systems, they often overlook the model-generated explanations in tackling biases. Consequently, biases in model-generated explanations may persist, potentially compromising system performance and user satisfaction.
To address biases in both model-generated explanations and recommended items, we discern the impact of model-generated explanations in recommendation through a formulated causal graph. Inspired by this causal perspective, we propose a novel approach termed Causal Explainable Recommendation System (CERS), which incorporates model-generated explanations into the debiasing process and enacts causal interventions based on user feedback on the explanations. By utilizing model-generated explanations as intermediaries between user-item interactions and recommendation results, we adeptly mitigate the biases via targeted causal interventions. Experimental results demonstrate the efficacy of CERS in reducing popularity bias while simultaneously improving recommendation performance, leading to more personalized and tailored recommendations. Human evaluation further affirms that CERS generates explanations tailored to individual users, thereby enhancing the persuasiveness of the system.
Attribute Simulation for Item Embedding Enhancement in Multi-interest Recommendation
Our research reveals that multi-interest recommendation models in the matching stage tend to exhibit an under-clustered item embedding space, which leads to a low discernibility between items and hampers item retrieval. This highlights the necessity for item embedding enhancement. However, item attributes, which serve as effective side information for enhancement, are either unavailable or incomplete in many public datasets due to the labor-intensive nature of manual annotation tasks. This dilemma raises two meaningful questions: 1. Can we bypass manual annotation and directly simulate complete attribute information from the interaction data? And 2. If feasible, how can we simulate attributes with high accuracy and low complexity in the matching stage?
In this paper, we first establish the theoretical feasibility of approximating the item-attribute correlation matrix through elementary transformations of the item co-occurrence matrix. Then, based on further derivation, we propose a simple yet effective module, SimEmb (Item Embedding Enhancement via Simulated Attribute), for multi-interest recommendation in the matching stage. By simulating attributes with the co-occurrence matrix, SimEmb discards the ID-based item embedding and employs an attribute-weighted summation for item embedding enhancement. Comprehensive experiments on four benchmark datasets demonstrate that our approach notably enhances the clustering of item embeddings and significantly outperforms SOTA models with an average improvement of 25.59% on Recall@20.
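The attribute-weighted summation at the heart of SimEmb can be sketched as follows. The nonnegative random projection standing in for the paper's elementary transformations, and all shapes and names, are our illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_attrs, dim = 6, 3, 4

# Item co-occurrence counts from interaction data (symmetric, zero diagonal).
C = rng.integers(0, 5, size=(n_items, n_items))
C = np.triu(C, 1)
C = C + C.T

# Simulated item-attribute weights: a nonnegative projection of the
# row-normalized co-occurrence matrix (a stand-in for the paper's
# elementary-transformation derivation).
W = C / np.maximum(C.sum(axis=1, keepdims=True), 1)
attr_weights = W @ np.abs(rng.normal(size=(n_items, n_attrs)))
attr_weights /= np.maximum(attr_weights.sum(axis=1, keepdims=True), 1e-9)

# Embedding enhancement: items become attribute-weighted sums over a small
# attribute table instead of free ID-based vectors.
attr_table = rng.normal(size=(n_attrs, dim))
item_emb = attr_weights @ attr_table
```

Because `item_emb` is driven by shared attribute vectors, items with similar simulated attributes land near each other, which is the clustering effect the paper aims for.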
Generative Models for Complex Logical Reasoning over Knowledge Graphs
Answering complex logical queries over knowledge graphs (KGs) is a fundamental yet challenging task. Recently, query representation has been a mainstream approach to complex logical reasoning, making the target answer and query closer in the embedding space. However, there are still two limitations. First, prior methods model the query as a fixed vector, but ignore the uncertainty of relations on KGs. In fact, different relations may contain different semantic distributions. Second, traditional representation frameworks fail to capture the joint distribution of queries and answers, which can be learned by generative models that have the potential to produce more coherent answers. To alleviate these limitations, we propose a novel generative model, named DiffCLR, which exploits the diffusion model for complex logical reasoning to approximate query distributions. Specifically, we first devise a query transformation to convert logical queries into input sequences by dynamically constructing contextual subgraphs. Then, we integrate them into the diffusion model to execute a multi-step generative process, and a structure-enhanced self-attention is further designed for incorporating the structural features embodied in KGs. Experimental results on two benchmark datasets show our model effectively outperforms state-of-the-art methods, particularly in multi-hop chain queries with significant improvement.
MADM: A Model-agnostic Denoising Module for Graph-based Social Recommendation
Graph-based social recommendation improves prediction accuracy by leveraging the high-order neighboring information contained in social relations. However, most methods ignore that social relations can be noisy for recommendation. Several studies attempt to tackle this problem by denoising the social graph, but they suffer from (1) adaptability issues with respect to other graph-based social recommendation models and (2) insufficiency issues in user social representation learning. To address these limitations, we propose a model-agnostic graph denoising module (denoted as MADM) that works as a plug-and-play component providing a refined social structure to base models. Meanwhile, to propel user social representations to be minimal yet sufficient for recommendation, MADM further employs mutual information maximization (MIM) between the user social representations and the interaction graph, realized in two ways: contrastive learning and forward predictive learning. We provide theoretical insights and guarantees from the perspectives of information theory and multi-view learning to explain its rationality. Extensive experiments on three real-world datasets demonstrate the effectiveness of MADM.
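The contrastive route to mutual-information maximization can be sketched with a standard InfoNCE objective between the two views of each user; this generic form is our stand-in, not MADM's exact loss.

```python
import numpy as np

def info_nce(social_view, interaction_view, tau=0.2):
    """InfoNCE between a user's social-graph and interaction-graph
    representations; matching rows are positive pairs, and the rest of
    the batch serves as in-batch negatives."""
    a = social_view / np.linalg.norm(social_view, axis=1, keepdims=True)
    b = interaction_view / np.linalg.norm(interaction_view, axis=1, keepdims=True)
    logits = (a @ b.T) / tau
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())
```

Minimizing this loss pulls each user's two views together, maximizing a lower bound on their mutual information, which matches the "minimal and sufficient" representation goal stated above.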
Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction
Automatic bundle construction is a crucial prerequisite step in various bundle-aware online services. Previous approaches are mostly designed to model the bundling strategy of existing bundles. However, it is hard to acquire large-scale, well-curated bundle datasets, especially for platforms that have not offered bundle services before. Even on platforms with mature bundle services, many items are included in few or even zero bundles, giving rise to sparsity and cold-start challenges for bundle construction models. To tackle these issues, we leverage multimodal features, item-level user feedback signals, and bundle composition information to achieve a comprehensive formulation of bundle construction. This formulation, however, poses two new technical challenges: (1) how to learn effective representations by optimally unifying multiple features, and (2) how to address the missing-modality, noise, and sparsity problems induced by incomplete query bundles. In this work, we address these challenges with a Contrastive Learning-enhanced Hierarchical Encoder (CLHE). Specifically, we use self-attention modules to combine the multimodal and multi-item features, and then leverage both item- and bundle-level contrastive learning to enhance representation learning, thereby countering the missing-modality, noise, and sparsity problems. Extensive experiments on four datasets in two application domains demonstrate that our method outperforms a list of SOTA methods. The code and dataset are available at https://github.com/Xiaohao-Liu/CLHE.
Source Free Graph Unsupervised Domain Adaptation
Graph Neural Networks (GNNs) have achieved great success on a variety of tasks with graph-structured data, among which node classification is an essential one. Unsupervised Graph Domain Adaptation (UGDA) shows practical value in reducing the labeling cost for node classification. It leverages knowledge from a labeled graph (the source domain) to tackle the same task on another, unlabeled graph (the target domain). Most existing UGDA methods rely heavily on the labeled source graph: they use its labels as the supervision signal and are jointly trained on both the source and target graphs. However, in some real-world scenarios the source graph is inaccessible because of privacy issues. We therefore propose a novel scenario named Source-Free Unsupervised Graph Domain Adaptation (SFUGDA), in which the only information available from the source domain is a well-trained source model, without any exposure to the source graph or its labels. As a result, existing UGDA methods are no longer feasible. To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm, SOGA, which fully exploits the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph. We prove the effectiveness of the proposed algorithm both theoretically and empirically. Experimental results on four cross-domain tasks show consistent improvements in the Macro-F1 score and Macro-AUC.
A Linguistic Grounding-Infused Contrastive Learning Approach for Health Mention Classification on Social Media
Social media users employ disease and symptom words in different ways, including describing personal health experiences figuratively or in other general discussions. The health mention classification (HMC) task aims to distinguish between these uses, which is important for public health applications. Existing HMC studies address the problem with pretrained language models (PLMs). However, remaining gaps in the area include the need for linguistic grounding, the requirement for large volumes of labelled data, and the fact that solutions are often only tested on Twitter or Reddit, which provides limited evidence of model transportability. To address these gaps, we propose a novel method that uses a transformer-based PLM to obtain a contextual representation of target (disease or symptom) terms, coupled with a contrastive loss that, informed by linguistic theories, establishes a larger gap between the literal and figurative uses of target terms. We introduce a simple and effective approach for harvesting candidate instances from a broad corpus, and generalise the proposed method via self-training to address the label-scarcity challenge. Our experiments on publicly available health-mention datasets from Twitter (HMC2019) and Reddit (RHMD) demonstrate that our method outperforms state-of-the-art HMC methods on both datasets. We further analyse the transferability and generalisability of our method and conclude with a discussion of the empirical and ethical considerations of our study.
RecJPQ: Training Large-Catalogue Sequential Recommenders
Sequential Recommendation is a popular recommendation task that uses the order of user-item interactions to model users’ evolving interests and the sequential patterns in their behaviour. Current state-of-the-art Transformer-based models for sequential recommendation, such as BERT4Rec, generate sequence embeddings and compute scores for catalogue items, but the increasing catalogue size makes training these models costly. The Joint Product Quantisation (JPQ) method, originally proposed for passage retrieval, markedly reduces the size of the retrieval index with minimal effect on model effectiveness by replacing passage embeddings with a limited number of shared sub-embeddings. This paper introduces RecJPQ, a novel adaptation of JPQ for sequential recommendation, which replaces each item embedding with a concatenation of a limited number of shared sub-embeddings and therefore limits the number of learnable model parameters. The main idea of RecJPQ is to split items into sub-item entities before training the main recommendation model, inspired by the splitting of words into tokens and the training of tokenizers in language models. We apply RecJPQ to SASRec, BERT4Rec, and GRU models on three large-scale sequential datasets. Our results show that RecJPQ can notably reduce model size (e.g., a 48× reduction for the Gowalla dataset with no effectiveness degradation). RecJPQ can also improve model performance through a regularization effect (e.g., a +0.96% NDCG@10 improvement on the Booking.com dataset). Overall, RecJPQ allows the training of state-of-the-art transformer models in industrial applications, where datasets with millions of items are common.
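The shared sub-embedding idea can be sketched as follows. This is an illustrative assumption about the mechanism, not the authors' implementation: each item is assigned G codebook indices before training, and its embedding is the concatenation of the G selected sub-embeddings, so the parameter count depends on the codebook size rather than the catalogue size.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (not the paper's): 100k items, 8 codebooks of 256
# sub-embeddings each, 16 dimensions per sub-embedding.
num_items, G, codes_per_group, sub_dim = 100_000, 8, 256, 16

# Codebooks: G groups of shared, learnable sub-embeddings.
codebooks = rng.normal(size=(G, codes_per_group, sub_dim))
# Per-item code assignment, fixed before training the recommender
# (analogous to tokenizing words before training a language model).
item_codes = rng.integers(0, codes_per_group, size=(num_items, G))

def item_embedding(item_id: int) -> np.ndarray:
    """Concatenate the G sub-embeddings selected by the item's codes."""
    parts = [codebooks[g, item_codes[item_id, g]] for g in range(G)]
    return np.concatenate(parts)  # shape: (G * sub_dim,)

# The learnable-parameter saving relative to a dense embedding table:
full_table_params = num_items * G * sub_dim   # dense baseline
jpq_params = G * codes_per_group * sub_dim    # shared codebooks only
```

Only the codebooks are learnable here; the per-item code table is a fixed integer lookup, which is how the approach caps the number of trainable parameters as the catalogue grows.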
Intent Contrastive Learning with Cross Subsequences for Sequential Recommendation
Users’ purchase behaviors are mainly driven by their intentions (e.g., buying clothes for decoration, buying brushes for painting). Modeling a user’s latent intention can significantly improve the performance of recommendations. Previous works model users’ intentions either by considering predefined labels in auxiliary information or by introducing stochastic data augmentation to learn intentions in the latent space. However, auxiliary information is sparse and not always available for recommender systems, and stochastic data augmentation may introduce noise and thus change the intentions hidden in a sequence. Leveraging user intentions for sequential recommendation (SR) is therefore challenging because they are frequently varied and unobserved. In this paper, Intent Contrastive Learning with Cross Subsequences for Sequential Recommendation (ICSRec) is proposed to model users’ latent intentions. Specifically, ICSRec first segments a user’s sequential behaviors into multiple subsequences using a dynamic sliding operation and feeds these subsequences into the encoder to generate representations of the user’s intentions. To tackle the absence of explicit intention labels, ICSRec assumes that different subsequences with the same target item may represent the same intention, and proposes coarse-grained intent contrastive learning to push these subsequences closer together. Fine-grained intent contrastive learning is then proposed to capture the fine-grained intentions within sequential behaviors. Extensive experiments conducted on four real-world datasets demonstrate the superior performance of the proposed ICSRec model compared with baseline methods.
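The coarse-grained contrastive objective can be illustrated with a standard InfoNCE-style loss, which is the usual choice for pulling positive pairs together; whether ICSRec uses exactly this form is an assumption. Here, two subsequence representations that share a target item act as the positive pair, and subsequences with other targets act as negatives:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor, one positive, and K negatives.
    A small loss means the anchor is much closer (in cosine similarity)
    to the positive than to any negative."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)]
                      + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive is index 0

rng = np.random.default_rng(1)
h = rng.normal(size=64)                         # subsequence representation
same_intent = h + 0.05 * rng.normal(size=64)    # subsequence, same target item
other = [rng.normal(size=64) for _ in range(8)] # subsequences, other targets
loss = info_nce(h, same_intent, other)          # small: pair is already close
```

Minimizing this loss over many such (anchor, positive) pairs is what pushes subsequences presumed to share an intention closer in the latent space.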
Budgeted Embedding Table For Recommender Systems
At the heart of contemporary recommender systems (RSs) are latent factor models that provide a quality recommendation experience to users. These models use embedding vectors, typically of a uniform and fixed size, to represent users and items. As the number of users and items continues to grow, this design becomes inefficient and hard to scale. Recent lightweight embedding methods have enabled different users and items to have diverse embedding sizes, but they are commonly subject to two major drawbacks. Firstly, they limit the embedding size search to optimizing a heuristic that balances recommendation quality against memory complexity, where the trade-off coefficient needs to be manually tuned for every memory budget requested. The implicitly enforced memory complexity term can even fail to cap the parameter usage, so the resultant embedding table may fail to meet the memory budget strictly. Secondly, most solutions, especially reinforcement learning-based ones, derive and optimize the embedding size for each user/item on an instance-by-instance basis, which impedes search efficiency. In this paper, we propose Budgeted Embedding Table (BET), a novel method that generates table-level actions (i.e., embedding sizes for all users and items) that are guaranteed to meet pre-specified memory budgets. Furthermore, by leveraging a set-based action formulation and set representation learning, we present an innovative action search strategy powered by an action fitness predictor that efficiently evaluates each table-level action. Experiments show state-of-the-art performance on two real-world datasets when BET is paired with three popular recommender models under different memory budgets.
SSLRec: A Self-Supervised Learning Framework for Recommendation
Self-supervised learning (SSL) has gained significant interest in recent years as a solution to address the challenges posed by sparse and noisy data in recommender systems. Despite the growing number of SSL algorithms designed to provide state-of-the-art performance in various recommendation scenarios (e.g., graph collaborative filtering, sequential recommendation, social recommendation, KG-enhanced recommendation), there is still a lack of unified frameworks that integrate recommendation algorithms across different domains. Such a framework could serve as the cornerstone for self-supervised recommendation algorithms, unifying the validation of existing methods and driving the design of new ones. To address this gap, we introduce SSLRec, a novel benchmark platform that provides a standardized, flexible, and comprehensive framework for evaluating various SSL-enhanced recommenders. The SSLRec framework features a modular architecture that allows users to easily evaluate state-of-the-art models and a complete set of data augmentation and self-supervised toolkits to help create SSL recommendation models with specific needs. Furthermore, SSLRec simplifies the process of training and evaluating different recommendation models with consistent and fair settings. Our SSLRec platform covers a comprehensive set of state-of-the-art SSL-enhanced recommendation models across different scenarios, enabling researchers to evaluate these cutting-edge models and drive further innovation in the field. Our implemented SSLRec framework is available at the source code repository https://github.com/HKUDS/SSLRec.
GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction
Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available at https://github.com/anonymoususer437/GAD-NR. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two of the three types of anomalies studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.
Ad-load Balancing via Off-policy Learning in a Content Marketplace
Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user’s position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.
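The Inverse Propensity Scoring estimator mentioned above has a simple, well-known form: reweight each logged reward by the ratio of the target policy's probability of the logged action to the logging policy's probability. The sketch below uses a synthetic logging policy, action space, and reward model purely for illustration; it is not the paper's system.

```python
import random

def ips_value(logs, target_policy):
    """Unbiased IPS estimate of a target policy's value from logged
    (context, action, reward, logging_propensity) tuples."""
    total = 0.0
    for context, action, reward, logged_p in logs:
        weight = target_policy(context)[action] / logged_p
        total += weight * reward
    return total / len(logs)

random.seed(0)
n_actions = 3                      # e.g., candidate ad-loads for a feed fetch

def target_policy(context):        # hypothetical policy favoring action 2
    return [0.1, 0.1, 0.8]

# Simulate logs under a uniform logging policy, where action 2 yields a
# higher reward probability (0.6 vs. 0.2 for the other actions).
logs = []
for _ in range(5000):
    action = random.randrange(n_actions)
    p = 0.6 if action == 2 else 0.2
    reward = 1.0 if random.random() < p else 0.0
    logs.append((None, action, reward, 1.0 / n_actions))

estimate = ips_value(logs, target_policy)
# True value of the target policy: 0.8 * 0.6 + 0.2 * 0.2 = 0.52
```

The Doubly Robust estimator extends this by adding a learned reward model as a control variate, reducing variance while keeping unbiasedness when either the model or the propensities are correct.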
ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees
Graph Neural Networks (GNNs) have become a popular tool for learning on graphs, but their widespread use raises privacy concerns as graph data can contain personal or sensitive information. Differentially private GNN models have been recently proposed to preserve privacy while still allowing for effective learning over graph-structured datasets. However, achieving an ideal balance between accuracy and privacy in GNNs remains challenging due to the intrinsic structural connectivity of graphs. In this paper, we propose a new differentially private GNN called ProGAP that uses a progressive training scheme to improve such accuracy-privacy trade-offs. Combined with the aggregation perturbation technique to ensure differential privacy, ProGAP splits a GNN into a sequence of overlapping submodels that are trained progressively, expanding from the first submodel to the complete model. Specifically, each submodel is trained over the privately aggregated node embeddings learned and cached by the previous submodels, leading to an increased expressive power compared to previous approaches while limiting the incurred privacy costs. We formally prove that ProGAP ensures edge-level and node-level privacy guarantees for both training and inference stages, and evaluate its performance on benchmark graph datasets. Experimental results demonstrate that ProGAP can achieve up to 5-10% higher accuracy than existing state-of-the-art differentially private GNNs. Our code is available at https://github.com/sisaman/ProGAP.
Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling
Generative language models (LMs) have become omnipresent across data science. For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted. LM performance has consistently increased with model size, but so has the monetary cost of querying ever larger models. Importantly, however, not all inputs are equally hard: some require larger LMs for obtaining a satisfactory solution, whereas for others smaller LMs suffice. Based on this fact, we design a framework for cost-effective language model choice, called “Fly-swat or cannon” (FORC). Given a set of inputs and a set of candidate LMs, FORC judiciously assigns each input to an LM predicted to do well on it according to a so-called meta-model, aiming to achieve high overall performance at low cost. The cost–performance tradeoff can be flexibly tuned by the user. Options include, among others, maximizing total expected performance (or the number of processed inputs) while staying within a given cost budget, or minimizing total cost while processing all inputs. We evaluate FORC on 14 datasets covering five natural language tasks, using four candidate LMs of vastly different size and cost. With FORC, we match the performance of the largest available LM while achieving a cost reduction of 63%. Via our publicly available library (https://github.com/epfl-dlab/forc), researchers as well as practitioners can thus save large amounts of money without sacrificing performance.
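A toy version of the assignment problem FORC addresses can be sketched as follows. The meta-model's predicted scores, the per-query costs, and the greedy upgrade rule are all illustrative assumptions, not the paper's algorithm; the point is only to show how per-input LM choice under a budget works.

```python
def assign_lms(pred_scores, costs, budget):
    """pred_scores[i][m]: predicted quality of LM m on input i.
    Greedy sketch: start every input on the cheapest LM, then spend the
    remaining budget on upgrades with the best score gain per extra cost."""
    cheapest = min(range(len(costs)), key=lambda m: costs[m])
    assignment = [cheapest] * len(pred_scores)
    spent = costs[cheapest] * len(pred_scores)
    upgrades = []
    for i, scores in enumerate(pred_scores):
        for m in range(len(costs)):
            extra = costs[m] - costs[cheapest]
            gain = scores[m] - scores[cheapest]
            if extra > 0 and gain > 0:
                upgrades.append((gain / extra, gain, extra, i, m))
    for _, gain, extra, i, m in sorted(upgrades, reverse=True):
        if assignment[i] == cheapest and spent + extra <= budget:
            assignment[i], spent = m, spent + extra
    return assignment, spent

# Two hypothetical LMs: small (cheap) and large (expensive), three inputs.
scores = [[0.90, 0.92],   # easy input: the small LM is nearly as good
          [0.30, 0.85],   # hard input: worth the large LM
          [0.50, 0.80]]   # in between
costs = [1.0, 10.0]
assignment, spent = assign_lms(scores, costs, budget=13.0)
# Only the hard input is routed to the large LM within the budget.
```

The easy input stays on the cheap model because its predicted gain per extra cost is tiny, which is exactly the "fly-swat, not cannon" intuition.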
Proxy-based Item Representation for Attribute and Context-aware Recommendation
Neural network approaches in recommender systems have shown remarkable success by representing a large set of items as a learnable embedding table. However, infrequent items may receive inadequate training, making it difficult to learn meaningful representations for them. We observe that in attribute- and context-aware settings, the poorly learned embeddings of infrequent items impair recommendation accuracy. To address this issue, we propose a proxy-based item representation that expresses each item as a weighted sum of learnable proxy embeddings. The proxy weights are determined by the attributes and context of each item, and may incorporate bias terms for frequent items to further reflect collaborative signals. The proxy-based method computes item representations compositionally, ensuring each representation resides inside a well-trained simplex and thus has guaranteed quality. Additionally, because the proxy embeddings are shared across all items, infrequent items can borrow the training signals of frequent items in a unified model structure and an end-to-end manner. Our proposed method is a plug-and-play model that can replace the item encoding layer of any neural network-based recommendation model while consistently improving recommendation performance with much smaller parameter usage. Experiments conducted on real-world recommendation benchmark datasets demonstrate that our proposed model outperforms state-of-the-art models in recommendation accuracy by up to 17% while using only 10% of the parameters.
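The simplex property described above follows directly from using softmax weights over a shared proxy set. The sketch below assumes a single-layer weight network and arbitrary dimensions for illustration; the paper's exact architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
n_proxies, feat_dim, emb_dim = 16, 32, 64
proxies = rng.normal(size=(n_proxies, emb_dim))   # shared across all items
W = rng.normal(size=(feat_dim, n_proxies)) * 0.1  # hypothetical weight network

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def item_representation(attr_context_feat: np.ndarray) -> np.ndarray:
    """Convex combination of shared proxies: the softmax puts the weight
    vector on the simplex, so the result lies in the proxies' convex hull."""
    weights = softmax(attr_context_feat @ W)
    return weights @ proxies

feat = rng.normal(size=feat_dim)   # an item's attribute/context features
rep = item_representation(feat)
```

Because every item, frequent or not, is built from the same well-trained proxies, an item seen only a few times still receives a representation inside a region the model has trained thoroughly.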
Causality Guided Disentanglement for Cross-Platform Hate Speech Detection
Despite their value in promoting open discourse, social media platforms are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content rely on domain-specific terms, affecting their ability to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular linguistic signals or the use of certain categories of words. Another significant challenge arises when platforms lack high-quality annotated data for training, leading to a need for cross-platform models that can adapt to different distribution shifts. Our research introduces a cross-platform hate speech detection model capable of being trained on one platform’s data and generalizing to multiple unseen platforms. One way to achieve good generalizability across platforms is to disentangle the input representations into invariant and platform-dependent features. We also argue that learning causal relationships, which remain constant across diverse environments, can significantly aid in understanding invariant representations in hate speech. By disentangling input into platform-dependent features (useful for predicting hate targets) and platform-independent features (used to predict the presence of hate), we learn invariant representations resistant to distribution shifts. These features are then used to predict hate speech across unseen platforms. Our extensive experiments across four platforms highlight our model’s enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.
Long-Term Value of Exploration: Measurements, Findings and Algorithms
Effective exploration is believed to positively influence the long-term user experience on recommendation platforms. Determining its exact benefits, however, has been challenging: regular A/B tests on exploration often measure neutral or even negative engagement metrics while failing to capture its long-term benefits. We introduce new experiment designs to formally quantify the long-term value of exploration by examining its effects on the content corpus, and by connecting content corpus growth to the long-term user experience in real-world experiments. Having established the value of exploration, we investigate the Neural Linear Bandit algorithm as a general framework for introducing exploration into any deep learning based ranking system. We conduct live experiments on one of the largest short-form video recommendation platforms, serving billions of users, to validate the new experiment designs, quantify the long-term value of exploration, and verify the effectiveness of the adopted neural linear bandit algorithm for exploration.
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
Large language models (LLMs) are becoming attractive as few-shot reasoners for solving Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be serialized as input to LLMs, there is a lack of comprehensive studies examining whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark includes seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4. We find that performance varies depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing on the insights gained through the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using the internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (+2.31%), HybridQA (+2.13%), SQA (+2.72%), Feverous (+0.84%), and ToTTo (+5.68%). We believe that our open-source benchmark (code and data at https://github.com/microsoft/TableProvider) and proposed prompting methods can serve as a simple yet generic starting point for future research.
LEAD: Liberal Feature-based Distillation for Dense Retrieval
Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model. Traditional methods include response-based methods and feature-based methods. Response-based methods are widely used but suffer from a lower performance ceiling because they ignore intermediate signals, while feature-based methods have constraints on vocabularies, tokenizers, and model architectures. In this paper, we propose a liberal feature-based distillation method (LEAD). LEAD aligns the distributions between the intermediate layers of the teacher model and the student model; it is effective, extendable, and portable, and places no requirements on vocabularies, tokenizers, or model architectures. Extensive experiments show the effectiveness of LEAD on widely-used benchmarks, including MS MARCO Passage Ranking, TREC 2019 DL Track, MS MARCO Document Ranking, and TREC 2020 DL Track. Our code is available at https://github.com/microsoft/SimXNS/tree/main/LEAD.
Rethinking and Simplifying Bootstrapped Graph Latents
Graph contrastive learning (GCL) has emerged as a representative paradigm in graph self-supervised learning, where negative samples are commonly regarded as the key to preventing model collapse and producing distinguishable representations. Recent studies have shown that GCL without negative samples can achieve state-of-the-art performance as well as scalability improvement, with bootstrapped graph latent (BGRL) as a prominent step forward. However, BGRL relies on a complex architecture to maintain the ability to scatter representations, and the underlying mechanisms enabling the success remain largely unexplored. In this paper, we introduce an instance-level decorrelation perspective to tackle the aforementioned issue and leverage it as a springboard to reveal the potential unnecessary model complexity within BGRL. Based on our findings, we present SGCL, a simple yet effective GCL framework that utilizes the outputs from two consecutive iterations as positive pairs, eliminating the negative samples. SGCL only requires a single graph augmentation and a single graph encoder without additional parameters. Extensive experiments conducted on various graph benchmarks demonstrate that SGCL can achieve competitive performance with fewer parameters, lower time and space costs, and significant convergence speedup.
A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval
Dense retrieval methods have been mostly focused on unstructured text and less attention has been drawn to structured data with various aspects, e.g., products with aspects such as category and brand. Recent work has proposed two approaches to incorporate the aspect information into item representations for effective retrieval by predicting the values associated with the item aspects. Despite their efficacy, they treat the values as isolated classes (e.g., “Smart Homes”, “Home, Garden & Tools”, and “Beauty & Health”) and ignore their fine-grained semantic relation. Furthermore, they either enforce the learning of aspects into the CLS token, which could confuse it from its designated use for representing the entire content semantics, or learn extra aspect embeddings only with the value prediction objective, which could be insufficient especially when there are no annotated values for an item aspect.
Aware of these limitations, we propose a MUlti-granulaRity-aware Aspect Learning model (MURAL) for multi-aspect dense retrieval. It leverages aspect information across various granularities to capture both coarse and fine-grained semantic relations between values. Moreover, MURAL incorporates separate aspect embeddings as input to transformer encoders so that the masked language model objective can assist implicit aspect learning even without aspect-value annotations. Extensive experiments on two real-world datasets of products and mini-programs show that MURAL outperforms state-of-the-art baselines significantly. Code will be available at the URL.
Temporal Blind Spots in Large Language Models
Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to the handling of factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, on rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
Efficient, Direct, and Restricted Black-Box Graph Evasion Attacks to Any-Layer Graph Neural Networks via Influence Function
Graph neural network (GNN), the mainstream method for learning on graph data, is vulnerable to graph evasion attacks, in which an attacker can fool a trained GNN model by slightly perturbing the graph structure. Existing work has at least one of the following drawbacks: 1) limited to directly attacking two-layer GNNs; 2) inefficient; and 3) impractical, as it requires knowing all or part of the GNN model parameters.
We address the above drawbacks and propose an influence-based efficient, direct, and restricted black-box evasion attack on any-layer GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, defined on GNNs and label propagation (LP), respectively. We then observe that GNNs and LP are strongly connected in terms of our defined influences. Based on this, we reformulate the evasion attack on GNNs as calculating label influence on LP, which is inherently applicable to any-layer GNNs and requires no knowledge of the internal GNN model. Finally, we propose an efficient algorithm to calculate label influence. Experimental results on various graph datasets show that, compared to state-of-the-art white-box attacks, our attack achieves comparable attack performance but with a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is effective at attacking multi-layer GNNs.
CityCAN: Causal Attention Network for Citywide Spatio-Temporal Forecasting
Citywide spatio-temporal (ST) forecasting is a fundamental task for many urban applications, including traffic accident prediction, taxi demand planning, and crowd flow forecasting. The goal of this task is to generate accurate predictions concurrently for all regions within a city. Prior works devote great effort to modeling the ST correlations. However, they often overlook intrinsic correlations and the inherent data distribution across the city, both of which are influenced by urban zoning and functionality, resulting in inferior performance on citywide ST forecasting. In this paper, we introduce CityCAN, a novel causal attention network, to collectively generate predictions for every region of a city. We first present a causal framework that identifies useful correlations among regions and filters out useless ones via an intervention strategy. Within the framework, a Global Local-Attention Encoder, which leverages attention mechanisms, is designed to jointly learn both local and global ST correlations among correlated regions. We then design a citywide loss to constrain the prediction distribution by incorporating the citywide distribution. Extensive experiments on three real-world applications demonstrate the effectiveness of CityCAN.
Distribution Consistency based Self-Training for Graph Neural Networks with Sparse Labels
Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework for leveraging the abundance of unlabeled data; it expands the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. In this work, we therefore explore the potential of explicitly bridging the distribution shift between the expanded training set and the test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework that identifies pseudo-labeled nodes that are both informative and capable of remedying the distribution discrepancy, and formulates this as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model’s generalizability in assigning pseudo-labels. We evaluate our proposed method on four publicly available benchmark datasets, and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.
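The self-training loop this work builds on can be sketched with the standard confidence-based selection rule. Note this baseline rule is exactly what the paper argues is insufficient (it ignores distribution shift); the threshold and selection logic below are the common baseline, not DC-GST's distribution-consistency criterion.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """probs: per-node class-probability lists for unlabeled nodes.
    Return (node_index, pseudo_label) pairs the model is confident about,
    to be added to the training set for the next self-training round."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected

# Toy model outputs for three unlabeled nodes over three classes.
probs = [[0.97, 0.02, 0.01],   # confident -> pseudo-labeled as class 0
         [0.40, 0.35, 0.25],   # uncertain -> skipped
         [0.05, 0.93, 0.02]]   # confident -> pseudo-labeled as class 1
pseudo = select_pseudo_labels(probs)
```

Because confident predictions tend to come from regions already well covered by the labeled set, repeatedly expanding the training set this way can skew its distribution further from the test set, which is the failure mode DC-GST's selection criterion is designed to counteract.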
FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes
Empirical loss minimization during machine learning training can inadvertently introduce bias, stemming from discrimination and societal prejudices present in the data. To address the shortcomings of traditional fair machine learning methods, which often rely on sensitive information about the training data or mandate significant model alterations, we present FairIF, a unique two-stage training framework. Distinctively, FairIF enhances fairness by recalibrating training sample weights using the influence function. Notably, it employs sensitive information from a validation set, rather than the training set, to determine these weights. This approach accommodates situations where sensitive training data is missing or inaccessible. FairIF ensures fairness across demographic groups by retraining models on the reweighted data. It stands out by offering a plug-and-play solution, obviating the need for changes to the model architecture or the loss function. We demonstrate that the fairness performance of FairIF is guaranteed during testing with only a minimal impact on classification performance. Additionally, we show that our framework adeptly addresses issues such as group size disparities, distribution shifts, and class size discrepancies. Empirical evaluations on three synthetic and five real-world datasets across six model architectures confirm FairIF’s efficiency and scalability. The experimental results indicate superior fairness-utility trade-offs compared to other methods, regardless of bias types or architectural variations. Moreover, the adaptability of FairIF to pretrained models for subsequent tasks and its capability to rectify unfairness originating during the pretraining phase are further validated through our experiments.
NeuralReconciler for Hierarchical Time Series Forecasting
Time series forecasting has wide-ranging applications in business intelligence, including predicting logistics demand and estimating power consumption in a smart grid, which subsequently facilitate decision-making processes. In many real-world scenarios, such as department sales of multiple Walmart stores across different locations, time series data possess hierarchical structures with non-linear and non-Gaussian properties. Leveraging structural information among hierarchical time series while learning from non-linear correlations and non-Gaussian data distributions thus becomes crucial to enhancing prediction accuracy. This paper proposes a novel approach named NeuralReconciler for Hierarchical Time Series (HTS) prediction through trainable attention-based reconciliation and Normalizing Flow (NF), the latter being used to approximate the complex (usually non-Gaussian) data distribution for multivariate time series forecasting. To reconcile the HTS data, a new flexible reconciliation strategy via an attention-based encoder-decoder neural network is proposed, which is distinct from current methods that rely on strong assumptions (e.g., all forecasts being unbiased estimates and the noise distribution being Gaussian). Furthermore, using the reparameterization trick, each independent component (i.e., forecasts via NF and attention-based reconciliation) is integrated into a trainable end-to-end model. Our proposed NeuralReconciler has been extensively evaluated on real-world datasets and achieves consistent state-of-the-art performance compared to well-acknowledged and advanced baselines, with a 20% relative improvement on five benchmarks.
Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation
In the realm of deep learning-based recommendation systems, the increasing computational demands, driven by the growing number of users and items, pose a significant challenge to practical deployment. This challenge is primarily twofold: reducing the model size while effectively learning user and item representations for efficient recommendations. Despite considerable advancements in model compression and architecture search, prevalent approaches face notable constraints. These include substantial additional computational costs from pre-training/re-training in model compression and an extensive search space in architecture design. Additionally, managing complexity and adhering to memory constraints is problematic, especially in scenarios with strict time or space limitations. Addressing these issues, this paper introduces a novel learning paradigm, Dynamic Sparse Learning (DSL), tailored for recommendation models. DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight’s significance and the model’s sparsity distribution during training. This approach ensures a consistent and minimal parameter budget throughout the full learning lifecycle, paving the way for “end-to-end” efficiency from training to inference. Our extensive experimental results underline DSL’s effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance. Our code is available at https://github.com/shuyao-wang/DSL.
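The prune-and-regrow cycle underlying dynamic sparse training in general can be sketched in a few lines: drop the smallest-magnitude active weights and regrow the same number at random inactive positions, so the parameter budget never changes. The drop/grow criteria below are the generic ones, not necessarily DSL's.

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_and_regrow(w, mask, drop_frac=0.3):
    """One dynamic-sparse update: drop the smallest-magnitude active
    weights and regrow as many at random inactive positions, keeping
    the total parameter budget constant (illustrative criteria)."""
    active = np.flatnonzero(mask)
    k = max(1, int(drop_frac * active.size))
    # Drop: the k active weights with the smallest magnitude.
    drop = active[np.argsort(np.abs(w[active]))[:k]]
    mask[drop] = False
    w[drop] = 0.0
    # Regrow: k random inactive positions, initialized to zero
    # (they receive gradients in subsequent training steps).
    grow = rng.choice(np.flatnonzero(~mask), size=k, replace=False)
    mask[grow] = True
    return w, mask

w = rng.normal(size=100)
mask = np.zeros(100, dtype=bool)
mask[rng.choice(100, size=20, replace=False)] = True  # 80% sparse budget
w = w * mask
w, mask = prune_and_regrow(w, mask)
```

Repeating this periodically during training lets the sparsity pattern migrate toward important connections while the memory footprint stays fixed.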
Follow the LIBRA: Guiding Fair Policy for Unified Impression Allocation via Adversarial Rewarding
The diverse advertiser demands (brand effects or immediate outcomes) lead to distinct selling patterns (pre-agreed volumes with an under-delivery penalty versus per-auction competition) and pricing patterns (fixed prices versus varying bids) in guaranteed delivery (GD) and real-time bidding (RTB) advertising. This necessitates fair impression allocation to unify the two markets, promoting ad content diversity and overall revenue. Existing approaches often deprive RTB ads of equal exposure opportunities by prioritizing GD ads, and coarse-grained methods suffer from: 1) ambiguous rewards, since the varied objectives and constraints of GD fulfillment and RTB utility hinder measuring each allocation’s contribution to the global interest; 2) intensified competition, as the coexistence of GD and RTB ads complicates their mutual relationships; and 3) policy degradation, because evolving user traffic and bid landscapes demand adaptivity to distribution shifts.
We propose LIBRA, a generative-adversarial framework that unifies GD and RTB ads through request-level modeling. To guide the generative allocator, we solve convex optimization on historical data to derive hindsight optimal allocations that balance fairness and utility. We then train a discriminator to distinguish the generated actions from the demonstrations of this solved latent expert policy, providing an integrated reward to align LIBRA with the optimal fair policy. LIBRA employs a self-attention encoder to capture the competitive relations among varying numbers of candidate ads per allocation. Further, it enhances the discriminator with an information-bottleneck-based summarizer to prevent overfitting to irrelevant distractors in the ad environment. LIBRA adopts a decoupled structure, where the offline discriminator continuously fine-tunes on newly arriving allocations and periodically guides the online allocation policy’s updates to accommodate online dynamics. LIBRA has been deployed on the Tencent advertising system for over four months, with extensive experiments conducted. Online A/B tests demonstrate significant lifts in ad income (3.17%), overall click-through rate (1.56%), and cost-per-mille (3.20%), contributing a daily revenue increase of hundreds of thousands of RMB.
Not All Negatives Are Worth Attending to: Meta-Bootstrapping Negative Sampling Framework for Link Prediction
The rapid development of graph neural networks (GNNs) has spurred progress in link prediction, achieving promising performance across various applications. Unfortunately, through a comprehensive analysis, we surprisingly find that current link predictors with dynamic negative samplers (DNSs) suffer from a migration phenomenon between “easy” and “hard” samples, which runs against the DNS preference for choosing “hard” negatives and thus severely hinders their capability. Towards this end, we propose the MeBNS framework, serving as a general plugin that can potentially improve current negative-sampling-based link predictors. In particular, we elaborately devise a Meta-learning Supported Teacher-student GNN (MST-GNN) that is not only built upon a teacher-student architecture for alleviating the migration between “easy” and “hard” samples but also equipped with a meta-learning-based sample re-weighting module that helps the student GNN distinguish “hard” samples in a fine-grained manner. To effectively guide the learning of MST-GNN, we prepare a Structure enhanced Training Data Generator (STD-Generator) and an Uncertainty based Meta Data Collector (UMD-Collector) for supporting the teacher and student GNN, respectively. Extensive experiments show that MeBNS achieves remarkable performance across six link prediction benchmark datasets.
Towards Better Chinese Spelling Check for Search Engines: A New Dataset and Strong Baseline
Misspellings in search engine queries may prevent search engines from returning accurate results. For Chinese mobile search engines, due to the different input methods (e.g., hand-written and T9 input methods), more types of misspellings exist, making this problem more challenging. As an essential module of search engines, Chinese Spelling Check (CSC) models aim to detect and correct misspelled Chinese characters in user-issued queries. Despite the great value of CSC to search engines, there is no CSC benchmark collected from real-world search engine queries. To fill this gap, we construct and release the Alipay Search Engine Query (AlipaySEQ) spelling check dataset. To the best of our knowledge, AlipaySEQ is the first Chinese Spelling Check dataset collected from the real-world scenario of Chinese mobile search engines. It consists of 15,522 high-quality human-annotated and 1,175,151 automatically generated samples. To demonstrate the unique challenges of AlipaySEQ in the era of Large Language Models (LLMs), we conduct a thorough study to analyze the differences between AlipaySEQ and existing SIGHAN benchmarks and compare the performance of various baselines, including existing task-specific methods and LLMs. We observe that all baselines fail to perform satisfactorily due to the over-correction problem. In particular, LLMs exhibit surprisingly below-par performance on AlipaySEQ. Therefore, to alleviate the over-correction problem, we introduce a model-agnostic CSC Self-Refine Framework (SRF) to construct a strong baseline. Comprehensive experiments demonstrate that our proposed SRF, while more effective than existing models on both AlipaySEQ and SIGHAN15, is still far from achieving satisfactory performance on our real-world dataset. With the newly collected real-world dataset and strong baseline, we hope more progress can be achieved on this challenging and valuable task.
Diff-MSR: A Diffusion Model Enhanced Paradigm for Cold-Start Multi-Scenario Recommendation
With the explosive growth of various commercial scenarios, there is an increasing number of studies on multi-scenario recommendation (MSR), which trains the recommender system with data from multiple scenarios, aiming to improve recommendation performance on all of them synchronously. However, due to the large discrepancy in the number of interactions among domains, multi-scenario recommendation models usually suffer from insufficient learning and negative transfer, especially on cold-start scenarios, thus exacerbating the data sparsity issue. To fill this gap, in this work we propose a novel diffusion model enhanced paradigm tailored for the cold-start problem in multi-scenario recommendation in a data-driven generative manner. Specifically, based on all-domain data, we leverage the diffusion model with our newly designed variance schedule and proposed classifier, which explicitly boosts recommendation performance on cold-start scenarios by exploiting high-quality, informative embeddings generated from the abundant data-rich scenarios. Our experiments on the Douban and Amazon datasets demonstrate two strengths of the proposed paradigm: (i) its effectiveness, with significant accuracy increases of 8.5% and 1% on the two datasets, and (ii) its compatibility with various multi-scenario backbone models. The implementation code is available for easy reproduction.
Cost-Effective Active Learning for Bid Exploration in Online Advertising
As a bid optimization algorithm in the first-price auction (FPA), bid shading is used in online advertising to avoid overpaying for advertisers. However, we find that the bid shading approach can incur serious local optima, which prevent advertisers from maximizing long-term surplus. In this work, we identify the reason behind these local optima: a lack of winning-price information, which creates a conflict between short-term surplus and the training of the winning-rate prediction model, and is further propagated through over-exploitation of the model. To rectify this problem, we propose a cost-effective active learning strategy, namely CeBE, for bid exploration. Specifically, we comprehensively consider the uncertainty and density of samples to calculate exploration utility, and use a (2+ε)-approximation greedy algorithm to control exploration costs. Instead of selecting bid prices that maximize the expected surplus for all bid requests, we employ the bid exploration strategy to determine bid prices. By trading off a portion of surplus, we can train the model on higher-quality data to enhance its performance, enabling the system to achieve long-term surplus. Our method is straightforward and applicable to real-world industrial environments: it is effective across various categories of winning-rate prediction models. We conducted empirical studies to validate the efficacy of our approach. Compared to the traditional bid shading system, CeBE yields an average surplus improvement of 8.16% across various models and datasets.
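For readers unfamiliar with bid shading: in a first-price auction the advertiser picks the bid that maximizes expected surplus, (value - bid) × P(win | bid). The sketch below uses a hypothetical logistic win-rate curve as a stand-in for a learned model; none of it is CeBE's actual algorithm.

```python
import numpy as np

# Bid shading in a first-price auction: choose the bid maximizing
# expected surplus (value - bid) * P(win | bid).
value = 10.0                          # advertiser's private value
bids = np.linspace(0.0, value, 1001)  # candidate bid grid

def win_rate(b):
    # Hypothetical learned win-rate curve, steepest around bid = 6.
    return 1.0 / (1.0 + np.exp(-(b - 6.0)))

surplus = (value - bids) * win_rate(bids)
best_bid = bids[np.argmax(surplus)]
# The shaded bid sits strictly below the value: bidding the full value
# would win often but yield zero surplus.
```

The abstract's point is that a win-rate model trained only on such exploitation-driven bids sees no winning-price signal near unexplored bids, which is exactly the feedback loop CeBE's exploration is designed to break.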
AutoPooling: Automated Pooling Search for Multi-valued Features in Recommendations
Large-scale recommender systems usually contain hundreds of multi-valued feature fields, which have different numbers of values per field. For easier computation in traditional fixed-shape neural networks, pooling operators are widely used to compress each multi-valued feature into a fixed-dimension vector. Most existing works set a single pooling method for all fields, but this leads to sub-optimal results, because different feature fields have different information distributions and thus require different pooling methods. In this work, we propose an AutoML-based framework, called AutoPooling, which can automatically and efficiently search for the optimal pooling operator for each multi-valued feature. Specifically, learnable weights are assigned to all candidate pooling operators in each feature field. Then an AutoML-based algorithm is used to learn both the model parameters and the field-aware weights. Finally, the optimal pooling operator can be acquired based on the associated weights. We evaluate the proposed framework on both public and industrial datasets. The results show that AutoPooling significantly outperforms the benchmarks. Further experimental results show that our method is robust across various deep recommendation models and different search spaces.
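The "learnable weights over candidate pooling operators" idea can be sketched as a soft (DARTS-style) relaxation: a softmax over per-field logits mixes the candidate operators during training, and the operator with the largest weight is selected afterwards. This is a generic relaxation for illustration, not necessarily AutoPooling's exact formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Candidate pooling operators for one multi-valued feature field.
POOLS = {
    "mean": lambda v: v.mean(axis=0),
    "max":  lambda v: v.max(axis=0),
    "sum":  lambda v: v.sum(axis=0),
}

def mixed_pool(values, logits):
    """Soft mixture of pooling operators; logits are learnable per field."""
    weights = softmax(logits)
    outs = [f(values) for f in POOLS.values()]
    return sum(w * o for w, o in zip(weights, outs)), weights

values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])  # 3 values, dim 2
logits = np.array([0.0, 2.0, -1.0])                      # favors "max"
pooled, weights = mixed_pool(values, logits)
chosen = list(POOLS)[int(np.argmax(weights))]            # -> "max"
```

During search, the logits would be trained jointly with the model; at deployment only the single chosen operator per field is kept, so inference cost does not grow with the candidate set.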
LLMRec: Large Language Models with Graph Augmentation for Recommendation
The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction edges, (ii) enhancing the understanding of item node attributes, and (iii) conducting user node profiling, intuitively from the natural language perspective. By employing these strategies, we address the challenges posed by sparse implicit feedback and low-quality side information in recommenders. Besides, to ensure the quality of the augmentation, we develop a denoised data robustification mechanism that includes noisy implicit feedback pruning and MAE-based feature enhancement, which help refine the augmented data and improve its reliability. Furthermore, we provide theoretical analysis to support the effectiveness of LLMRec and clarify the benefits of our method in facilitating model optimization. Experimental results on benchmark datasets demonstrate the superiority of our LLM-based augmentation approach over state-of-the-art techniques. To ensure reproducibility, we have made our code and augmented data publicly available at: https://github.com/HKUDS/LLMRec.git.
Unified Visual Preference Learning for User Intent Understanding
In the world of E-Commerce, the core task is to understand personalized preferences from various kinds of heterogeneous information, such as textual reviews, item images, and historical behaviors. In current systems, this heterogeneous information is mainly exploited to generate better item or user representations. For example, in the scenario of visual search, the importance of modeling the query image has been widely acknowledged. However, existing solutions focus on improving the representation quality of the query image while overlooking the personalized visual preference of the user. Note that visual features significantly affect the user’s decision; e.g., a user is more likely to click items with her preferred design. Hence, it is fruitful to exploit visual preference to deliver better personalization.
To this end, we propose a simple yet effective target-aware visual preference learning framework (named Tavern) for both item recommendation and search. The proposed Tavern works as an individual and generic model that can be smoothly plugged into different downstream systems. Specifically, for visual preference learning, we utilize the image of the target item to derive the visual preference signals for each historically clicked item. This procedure is modeled as a form of representation disentanglement, where the visual preference signals are extracted by stripping away noisy information irrelevant to visual preference from the visual information shared between the target and historical items. During this process, a novel selective orthogonality disentanglement is proposed to avoid significant information loss. Then, a GRU network is utilized to aggregate these signals to form the final visual preference representation. Extensive experiments over three large-scale real-world datasets covering visual search, product search, and recommendation demonstrate the superiority of our proposed Tavern over existing technical alternatives. A further ablation study also confirms the validity of each design choice.
Continuous-time Autoencoders for Regular and Irregular Time Series Imputation
Time series imputation is one of the most fundamental tasks for time series. Real-world time series datasets are frequently incomplete (or irregular with missing observations), in which case imputation is strongly required. Many different time series imputation methods have been proposed. Recent self-attention-based methods show state-of-the-art imputation performance. However, designing an imputation method based on continuous-time recurrent neural networks (RNNs), i.e., neural controlled differential equations (NCDEs), has long been overlooked. To this end, we redesign time series (variational) autoencoders based on NCDEs. Our method, called continuous-time autoencoder (CTA), encodes an input time series sample into a continuous hidden path (rather than a hidden vector) and decodes it to reconstruct and impute the input. In our experiments with 4 datasets and 19 baselines, our method shows the best imputation performance in almost all cases.
Neural Kalman Filtering for Robust Temporal Recommendation
Temporal recommendation methods can achieve superior accuracy by continuously updating user/item embeddings as new interactions arrive. However, the randomness of user behaviors introduces noise into user interactions and causes deviations in the modeling of user preference, resulting in sub-optimal performance. To this end, we propose NeuFilter, a robust temporal recommendation algorithm based on neural Kalman filtering, to learn more accurate user and item embeddings from noisy interactions. Classic Kalman filtering is time-consuming when applied to recommendation because of its covariance matrices. We therefore propose a neural-network solution to Kalman filtering, achieving higher efficiency and stronger expressivity. Specifically, NeuFilter consists of three alternating units: 1) a prediction unit, which predicts user and item embeddings based on their historical embeddings; 2) an estimation unit, which updates user and item embeddings in a manner similar to Kalman filtering; and 3) a correction unit, which corrects the updated embeddings from the estimation unit to ensure reliable estimation and accurate updates. Experiments on two recommendation tasks show that NeuFilter achieves higher accuracy than state-of-the-art methods while remaining highly robust. Moreover, our empirical studies on a node classification task further confirm the importance of handling noise in tasks on temporal graphs, shedding new light on temporal graph modeling.
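For reference, one predict/update cycle of the classic Kalman filter that NeuFilter's neural units emulate looks as follows; the embedding plays the role of a latent state observed through noisy interactions. Dimensions and noise covariances below are arbitrary illustrative choices.

```python
import numpy as np

F = np.eye(2)             # state transition (here: static preference)
H = np.eye(2)             # observation model
Q = 0.01 * np.eye(2)      # process noise covariance
R = 0.50 * np.eye(2)      # observation noise covariance

x = np.zeros(2)           # state estimate (e.g., a user embedding)
P = np.eye(2)             # estimate covariance

z = np.array([1.0, -1.0]) # noisy observation from a new interaction

# Predict step.
x_pred = F @ x
P_pred = F @ P @ F.T + Q

# Update step: the Kalman gain balances trust between the prediction
# and the new observation according to their covariances.
K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
x_new = x_pred + K @ (z - H @ x_pred)
P_new = (np.eye(2) - K @ H) @ P_pred
```

The abstract's efficiency concern is visible here: maintaining and inverting these covariance matrices scales poorly with embedding dimension, which is what NeuFilter's neural units are designed to avoid.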
Deep Evolutional Instant Interest Network for CTR Prediction in Trigger-Induced Recommendation
Recommendation plays a key role in many industries, e.g., e-commerce, streaming media, and social media. Recently, a new recommendation scenario called Trigger-Induced Recommendation (TIR), in which users can explicitly express their instant interests via trigger items, is emerging as essential in many e-commerce platforms, e.g., Alibaba.com and Amazon. Without explicitly modeling the user’s instant interest, traditional recommendation methods usually obtain sub-optimal results in TIR. Even the few methods that consider the trigger and target items simultaneously still fail to account for the temporal information of user behaviors, the dynamic change of the user’s instant interest as the user scrolls down, and the interactions between the trigger and target items. To tackle these problems, we propose a novel method, the Deep Evolutional Instant Interest Network (DEI2N), for click-through rate prediction in TIR scenarios. Specifically, we design a User Instant Interest Modeling Layer to predict the dynamic change in the intensity of instant interest as the user scrolls down. Temporal information is utilized in user behavior modeling. Moreover, an Interaction Layer is introduced to learn better interactions between the trigger and target items. We evaluate our method on several offline and real-world industrial datasets. Experimental results show that our proposed DEI2N outperforms state-of-the-art baselines. In addition, online A/B testing demonstrates its superiority over the existing baseline in real-world production environments.
On the Effectiveness of Unlearning in Session-Based Recommendation
Session-based recommendation predicts users’ future interests from their previous interactions within a session. Beyond memorizing historical samples, requests for unlearning, i.e., removing the effect of certain training samples, also arise for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored to session-based recommendation. On the one hand, these approaches cannot achieve satisfactory unlearning effects due to the collaborative correlations and sequential connections between the unlearned item and the remaining items in the session. On the other hand, little work has verified unlearning effectiveness in the session-based recommendation scenario.
In this paper, we propose SRU, a session-based recommendation unlearning framework, which enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness in session-based recommendation. Specifically, we first partition the training sessions into separate sub-models according to the similarity across sessions; then we utilize an attention-based aggregation layer to fuse the hidden states according to the correlation between the session and the centroid of the data in each sub-model. To improve unlearning effectiveness, we further propose three extra data deletion strategies: collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). Besides, we propose an evaluation metric that measures whether an unlearned sample can be inferred after data deletion, to verify unlearning effectiveness. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. Experimental results demonstrate the effectiveness of our methods. Codes and data are available at https://github.com/shirryliu/SRU-code.
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of these modalities presents challenges for neural networks. In this paper, we introduce WebLM, a multimodal pre-training network designed to address the limitations of solely modeling text and structure modalities of HTML in webpages. Instead of processing document images as unified natural images, WebLM integrates the hierarchical structure of document images to enhance the understanding of markup-language-based documents. Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively. Empirical results demonstrate that the pre-trained WebLM significantly surpasses previous state-of-the-art pre-trained models across several webpage understanding tasks. The pre-trained models and code are available at https://github.com/X-LANCE/weblm.
Towards Alignment-Uniformity Aware Representation in Graph Contrastive Learning
Graph Contrastive Learning (GCL) methods benefit from two key properties: alignment and uniformity, which encourage related objects to be represented close together while pushing apart unrelated objects. Most GCL methods aim to preserve alignment and uniformity through random graph augmentation strategies and indiscriminate negative sampling. However, their performance is highly sensitive to graph augmentation, which requires cumbersome trial-and-error and expensive domain-specific knowledge as guidance. Moreover, indiscriminate negative sampling inevitably suffers from sampling bias, i.e., negative samples drawn from the same class as the anchor. To remedy these issues, we propose a unified GCL framework towards Alignment-Uniformity Aware Representation learning (AUAR), which achieves better alignment while improving uniformity without graph augmentation or negative sampling. Specifically, we propose intra- and inter-alignment losses that align the representation of a node with itself and with its cluster centroid to maintain label invariance. Furthermore, we introduce a uniformity loss with theoretical analysis, which pushes apart the representations of unrelated nodes from different classes and tends to provide informative variance across classes. Extensive experiments demonstrate that our method outperforms existing GCL methods on node classification and clustering tasks across three widely used datasets.
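The alignment and uniformity properties the abstract refers to are commonly quantified with the standard metrics of Wang and Isola (2020): mean distance between positive pairs, and the log of the average pairwise Gaussian potential over all embeddings. A minimal sketch (the embeddings below are synthetic; AUAR's actual losses differ in how pairs are formed):

```python
import numpy as np

def alignment(x, y, alpha=2):
    """Mean distance between positive pairs (lower = better aligned)."""
    return (np.linalg.norm(x - y, axis=1) ** alpha).mean()

def uniformity(x, t=2):
    """Log average pairwise Gaussian potential (lower = more uniform)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(x), k=1)          # distinct pairs only
    return np.log(np.exp(-t * d2[iu]).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)   # embeddings on unit sphere
z_pos = z + 0.01 * rng.normal(size=z.shape)     # slightly perturbed positives
z_pos /= np.linalg.norm(z_pos, axis=1, keepdims=True)

a = alignment(z, z_pos)   # near 0 for near-identical positive pairs
u = uniformity(z)         # negative; more negative = more spread out
```

Optimizing alignment alone collapses all embeddings to a point, which is why a uniformity term (or negatives) is needed as a counterweight; AUAR's contribution is achieving this balance without augmentation or negative sampling.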
Debiasing Sequential Recommenders through Distributionally Robust Optimization over System Exposure
Sequential recommendation (SR) models are typically trained on user-item interactions, which are affected by system exposure bias: user interactions depend on the subset of items the system exposes to the user. As a result, the user preference learned by a biased SR model is not fully consistent with the true user preference. Existing debiasing methods do not make full use of system exposure data and suffer from sub-optimal recommendation performance and high variance.
In this paper, we propose to debias sequential recommenders through Distributionally Robust Optimization (DRO) over system exposure data. The key idea is to utilize DRO to optimize the worst-case error over an uncertainty set to safeguard the model against the distributional discrepancy caused by exposure bias. The main challenge in applying DRO to exposure debiasing in sequential recommendation lies in how to construct the uncertainty set and avoid overestimating user preference on biased samples. Moreover, since the test set may also be affected by exposure bias, how to evaluate the debiasing effect is also an open question. To this end, we first introduce an exposure simulator trained on the system exposure data to calculate the exposure distribution, which is then regarded as the nominal distribution for constructing the uncertainty set of DRO. Then, we introduce a penalty on items with high exposure probability to avoid overestimating user preference for biased samples. Finally, we design a debiased self-normalized inverse propensity score (SNIPS) evaluator for evaluating the debiasing effect on the biased offline test set. We conduct extensive experiments on two real-world datasets to verify the effectiveness of the proposed methods. Experimental results demonstrate the superior exposure debiasing performance of the proposed methods. Codes and data are available at https://github.com/nancheng58/DebiasedSR_DRO.
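The SNIPS estimator named in the abstract is a standard off-policy evaluation tool: rewards are weighted by importance ratios between the target and logging (exposure) policies, and the weighted sum is normalized by the sum of weights rather than the sample count, which trades a small bias for much lower variance than plain IPS. A minimal sketch on synthetic logged data:

```python
import numpy as np

def snips(rewards, target_probs, logging_probs):
    """Self-normalized inverse propensity score estimate of the target
    policy's value: weighted mean of rewards with importance weights
    w = pi_target / pi_logging, normalized by the weight sum."""
    w = target_probs / logging_probs
    return (w * rewards).sum() / w.sum()

rng = np.random.default_rng(0)
n = 1000
logging_probs = rng.uniform(0.1, 0.9, size=n)  # exposure propensities
target_probs = rng.uniform(0.1, 0.9, size=n)   # evaluated policy's propensities
rewards = rng.binomial(1, 0.3, size=n).astype(float)

v = snips(rewards, target_probs, logging_probs)
```

Because it is a weighted average of the observed rewards, the SNIPS estimate always stays within the reward range, unlike plain IPS, whose unnormalized weights can push the estimate outside it.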
Unified Pretraining for Recommendation via Task Hypergraphs
Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge through pretraining in widely used ID-dependent datasets. On the one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, since the IDs differ. On the other hand, pretraining and finetuning on the same dataset carries a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework, Unified Pretraining for Recommendation via Task Hypergraphs (UPRTH). For a unified learning pattern that handles the diverse requirements and nuances of various pretext tasks, we design task hypergraphs that generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.
GAP: A Grammar and Position-Aware Framework for Efficient Recognition of Multi-Line Mathematical Formulas
Formula recognition endeavors to automatically identify mathematical formulas from images. Currently, the Encoder-Decoder model has significantly advanced the translation from image to corresponding formula markup. Nonetheless, previous research primarily concentrated on single-line formula recognition, ignoring the recognition of multi-line formulas, which presents additional challenges such as stricter grammatical restrictions and two-dimensional positions. In this work, we present GAP (Grammar And Position-Aware formula recognition), a comprehensive framework designed to tackle the challenges of multi-line mathematical formula recognition. First, to overcome the limitations imposed by grammar, we design a novel Grammar Aware Contrastive Learning (GACL) module, integrating complex grammar rules into the transcription model through a contrastive learning mechanism. Furthermore, primitive contrastive learning lacks clear directions for comprehending grammar rules and can lead to unstable convergence or prolonged training cycles. To enhance training efficiency, we propose Rank-Based Sampling (RBS) specialized for multi-line formulas, which guides the learning process by the importance ranking of different grammar errors. Finally, spatial location information is critical given the two-dimensional nature of multi-line formulas. To help the model keep track of this global information, we introduce a Visual Coverage (VC) mechanism that incorporates historical attention information into the image features in a parameter-free manner. To validate the effectiveness of our GAP framework, we construct a new dataset, Multi-Line, containing 12,002 multi-line formulas, and conduct extensive experiments demonstrating GAP’s efficacy in capturing grammatical rules, enhancing recognition accuracy, and improving training efficiency. Codes and datasets are available at https://github.com/Sinon02/GAP.
COTER: Conditional Optimal Transport meets Table Retrieval
Ad hoc table retrieval refers to the task of performing semantic matching between given queries and candidate tables. In recent years, the approach to addressing this retrieval task has undergone significant shifts, transitioning from utilizing hand-crafted features to leveraging the power of Pre-trained Language Models (PLMs). However, key challenges arise when candidate tables contain shared items, and/or queries refer to only a subset of table items rather than the entire table. Existing models often struggle to distinguish the most informative items and fail to accurately identify the relevant items required to match the query.
To bridge this gap, we propose Conditional Optimal Transport based table retrievER (COTER). The proposed algorithm is characterized by simplifying candidate tables, where the semantic meaning of one or several words (from the original table) is enabled to be effectively “transported” to individual words (from the simplified table), under the prior condition of the query. COTER achieves two essential goals simultaneously: minimizing the semantic loss during the table simplification and ensuring that retained items from simplified tables effectively match the given query. Importantly, the theoretical foundation of COTER empowers it to adapt dynamically to different queries and enhances the overall performance of the table retrieval. Experiments on two popular Web-table retrieval benchmarks show that COTER can effectively identify informative table items without sacrificing retrieval accuracy. This leads to the new state-of-the-art with substantial gains of up to 0.48 absolute Mean Average Precision (MAP) points, compared to the previously reported best result.
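For intuition on the "transport" metaphor: plain entropy-regularized optimal transport moves mass between two word distributions under a cost matrix, solvable with Sinkhorn iterations. COTER's query-conditional formulation is more elaborate; the toy sketch below shows only the unconditional building block, with made-up costs.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations:
    returns a transport plan whose rows sum to a and columns sum to b."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy transport of semantic mass from 3 table words to 2 query words;
# lower cost = more semantically compatible (values are illustrative).
cost = np.array([[0.1, 1.0],
                 [0.9, 0.2],
                 [0.5, 0.5]])
a = np.full(3, 1 / 3)   # uniform mass over table words
b = np.full(2, 1 / 2)   # uniform mass over query words
plan = sinkhorn(cost, a, b)
```

Reading the plan row-wise shows where each table word's meaning "goes" under the query: table words with low transport cost to a query word absorb most of its mass, which is the mechanism table simplification can exploit.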
PhoGAD: Graph-based Anomaly Behavior Detection with Persistent Homology Optimization
A multitude of toxic online behaviors, ranging from network attacks to anonymous traffic and spam, have severely disrupted the smooth operation of networks. Due to the inherent sender-receiver nature of network behaviors, graph-based frameworks are commonly used for detecting anomalous behaviors. However, in real-world scenarios, the boundary between normal and anomalous behaviors tends to be ambiguous. The local heterophily of graphs interferes with detection, and existing methods based on nodes or edges introduce unwanted noise into representation results, thereby impairing detection effectiveness. To address these issues, we propose PhoGAD, a graph-based anomaly detection framework. PhoGAD leverages persistent homology optimization to clarify behavioral boundaries. Building upon this, the weights of adjacent edges are designed to mitigate the effects of local heterophily. Subsequently, to tackle the noise problem, we conduct a formal analysis and propose a disentangled representation-based explicit embedding method, ultimately achieving anomaly behavior detection. Experiments on intrusion, traffic, and spam datasets verify that PhoGAD surpasses state-of-the-art (SOTA) frameworks in detection efficacy. Notably, PhoGAD demonstrates robust detection even with diminished anomaly proportions, highlighting its applicability to real-world scenarios. The analysis of persistent homology demonstrates its effectiveness in capturing the topological structure formed by normal edge features. Additionally, ablation experiments validate the effectiveness of the innovative mechanisms integrated within PhoGAD.
Linear Recurrent Units for Sequential Recommendation
State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models. Yet such models are computationally expensive and often too slow for real-time recommendation. Furthermore, the self-attention operation is performed at the sequence level, making low-cost incremental inference challenging. Inspired by recent advances in efficient language modeling, we propose linear recurrent units for sequential recommendation (LRURec). Like recurrent neural networks, LRURec offers rapid inference and supports incremental inference on sequential inputs. By decomposing the linear recurrence operation and designing recursive parallelization in our framework, LRURec provides the additional benefits of reduced model size and parallelizable training. Moreover, we optimize the architecture of LRURec through a series of modifications that address the lack of non-linearity and improve training dynamics. To validate the effectiveness of our proposed LRURec, we conduct extensive experiments on multiple real-world datasets and compare its performance against state-of-the-art sequential recommenders. Experimental results demonstrate the effectiveness of LRURec, which consistently outperforms baselines by a significant margin. Results also highlight the efficiency of LRURec with our parallelized training paradigm and fast inference on long sequences, showing its potential to further enhance user experience in sequential recommendation.
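The incremental-inference advantage claimed above follows from the form of a linear recurrence: each new item updates the hidden state in O(d) time, without re-reading the sequence as self-attention must. Below is a minimal, illustrative numpy sketch of a real-valued linear recurrent layer; it is not LRURec's actual architecture (which adds decompositions, non-linearities, and parallel training), and all names are ours.

```python
import numpy as np

def lru_scan(x, lam, B):
    """Sequential form of a linear recurrent unit:
        h_t = lam * h_{t-1} + B @ x_t
    x:   (T, d_in) input sequence, e.g. item embeddings
    lam: (d_h,) per-channel decay in (0, 1) for stability
    B:   (d_h, d_in) input projection
    Returns all hidden states, shape (T, d_h)."""
    T = x.shape[0]
    h = np.zeros(lam.shape[0])
    out = np.empty((T, lam.shape[0]))
    for t in range(T):
        h = lam * h + B @ x[t]   # one cheap update per new item
        out[t] = h
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
lam = np.array([0.9, 0.5, 0.99])
B = rng.normal(size=(3, 4))
states = lru_scan(x, lam, B)
```

Because the recurrence is linear, the same states can also be computed in parallel over the sequence with an associative prefix scan, which is what makes training parallelizable while inference stays incremental.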
IncMSR: An Incremental Learning Approach for Multi-Scenario Recommendation
For better performance and lower resource consumption, multi-scenario recommendation (MSR) trains a unified model to serve all scenarios by leveraging data from multiple scenarios. Current work in MSR focuses on designing effective networks for better information transfer among different scenarios. However, it omits two important issues that arise when applying MSR models in industrial settings. The first is the efficiency problem caused by mixed data, which delays model updates and leads to performance degradation. The second is that MSR models are insensitive to changes in data distribution over time, resulting in suboptimal effectiveness on incoming data. In this paper, we propose an incremental learning approach for MSR (IncMSR), which can not only improve training efficiency but also perceive changes in distribution over time. Specifically, we first quantify the pair-wise distance between representations along the scenario, time, and time-scenario dimensions, respectively. Then, we decompose the MSR model into scenario-shared and scenario-specific parts and apply fine-grained constraints to the quantified distances with respect to the two parts. Finally, all constraints are fused within a metric learning framework as a supplementary penalty term to the original MSR loss function. Offline experiments on two real-world datasets demonstrate the superiority and compatibility of our proposed approach.
Defense Against Model Extraction Attacks on Recommender Systems
The robustness of recommender systems has become a prominent topic within the research community. Numerous adversarial attacks have been proposed, but most rely on extensive prior knowledge: all white-box attacks, and most black-box attacks, assume that certain external knowledge is available. Among these attacks, the model extraction attack stands out as a promising and practical method, involving training a surrogate model by repeatedly querying the target model. However, the existing literature leaves a significant gap when it comes to defending recommender systems against model extraction attacks. In this paper, we introduce Gradient-based Ranking Optimization (GRO), the first defense strategy designed to counter such attacks. We formalize the defense as an optimization problem, aiming to minimize the loss of the protected target model while maximizing the loss of the attacker’s surrogate model. Since top-k ranking lists are non-differentiable, we transform them into swap matrices, which are differentiable. These swap matrices serve as input to a student model that emulates the surrogate model’s behavior. By back-propagating the loss of the student model, we obtain gradients for the swap matrices, which are used to compute a swap loss that maximizes the loss of the student model. We conduct experiments on three benchmark datasets to evaluate the performance of GRO, and the results demonstrate its superior effectiveness in defending against model extraction attacks.
Maximizing Malicious Influence in Node Injection Attack
Graph neural networks (GNNs) have achieved impressive performance on various graph-related tasks. However, recent studies have found that GNNs are vulnerable to adversarial attacks. Node injection attacks (NIA) have emerged as a scenario of graph adversarial attacks in which the attack is performed by injecting malicious nodes into the original graph instead of modifying it directly. In this paper, we focus on a more realistic NIA scenario, where the attacker is only allowed to inject a small number of nodes to degrade the performance of GNNs with very limited information. We analyze the susceptibility of nodes and, based on this, propose a global node injection attack framework, MaxiMal, to maximize malicious information under a strict black-box setting. MaxiMal first introduces a susceptible-reverse influence sampling strategy to select neighbor nodes that can spread malicious information widely. A contrastive loss is then introduced to optimize the objective by updating the edges and features of the injected nodes. Extensive experiments on three benchmark datasets demonstrate the superiority of our proposed MaxiMal over state-of-the-art approaches.
Interpretable Imitation Learning with Dynamic Causal Relations
Imitation learning, which learns agent policy by mimicking expert demonstrations, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, interpreting the control policies learned by the agent remains a difficult task. Difficulties arise from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models lacking interpretability; 2) the latent causal mechanism behind agents’ decisions may vary along the trajectory rather than staying static across time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, CAIL. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained end-to-end. Once the model is learned, we can obtain the causal relations among state and action variables behind its decisions, exposing the policies it has learned. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed CAIL in learning dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.
RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis
Aspect-based sentiment analysis (ABSA) is dedicated to forecasting the sentiment polarity of aspect terms within sentences. Employing graph neural networks to capture structural patterns from syntactic dependency parsing has been confirmed as an effective approach for boosting ABSA. In most works, the topology of dependency trees or dependency-based attention coefficients is loosely treated as edges between aspects and opinions, which can result in insufficient and ambiguous syntactic utilization. To address these problems, we propose a new reinforced dependency graph convolutional network (RDGCN) that improves the importance calculation of dependencies in both the distance and type views. First, we propose an importance calculation criterion based on minimum distances over dependency trees. Under this criterion, we design a distance-importance function that leverages reinforcement learning for weight distribution search and dissimilarity control. Since dependency types, unlike tree distances, often lack explicit syntax, we use global attention and mask mechanisms to design type-importance functions. Finally, we merge these weights and implement feature aggregation and classification. Comprehensive experiments demonstrate the effectiveness of the criterion and the importance functions, and RDGCN delivers excellent results.
CreST: A Credible Spatiotemporal Learning Framework for Uncertainty-aware Traffic Forecasting
Spatiotemporal traffic forecasting plays a critical role in intelligent transportation systems, empowering diverse urban services. Existing traffic forecasting frameworks usually devise various learning strategies to capture spatiotemporal correlations from the perspective of volume alone. However, we argue that previous traffic predictions remain unreliable for two reasons. First, the influence of context factor-wise interactions on dynamic region-wise correlations remains underexploited. Second, these dynamics raise a credibility issue for forecasting that has not been well explored. In this paper, we exploit informative traffic-related context factors to jointly tackle dynamic regional heterogeneity and explain the stochasticity, toward credible, uncertainty-aware traffic forecasting. Specifically, to internalize dynamic contextual influences into the learning process, we design a context-cross relational embedding to capture interactions among context factors, and generate a virtual graph topology to dynamically relate pairwise regions with context embeddings. To quantify prediction credibility, we attribute data-side aleatoric uncertainty to contexts and re-utilize them for aleatoric uncertainty quantification. We then couple a dual-pipeline learning scheme with a shared objective to produce discrepancies between model outputs and quantify model-side epistemic uncertainty. The two uncertainties are fed through a spatiotemporal network to extract uncertainty evolution patterns. Finally, comprehensive experiments and model deployments corroborate the credibility of our framework.
Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices
While Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications, we demonstrate that, in link prediction, the common practice of including the edges being predicted in the graph at training and/or test time has an outsized impact on the performance of low-degree nodes. We theoretically and empirically investigate how these practices impact node-level performance across different degrees. Specifically, we explore three issues that arise: (I1) overfitting; (I2) distribution shift; and (I3) implicit test leakage. The former two issues lead to poor generalizability to the test data, while the latter leads to overestimation of the model’s performance and directly impacts the deployment of GNNs. To address these issues in a systematic way, we introduce an effective and efficient GNN training framework, SpotTarget, which leverages our insight on low-degree nodes: (1) at training time, it excludes a (training) edge to be predicted if it is incident to at least one low-degree node; and (2) at test time, it excludes all test edges to be predicted (thus mimicking real scenarios of using GNNs, where the test data is not included in the graph). SpotTarget helps researchers and practitioners adhere to best practices for learning from graph data, which are frequently overlooked even by the most widely used frameworks. Our experiments on various real-world datasets show that SpotTarget makes GNNs up to 15× more accurate on sparse graphs and significantly improves their performance for low-degree nodes in dense graphs.
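The training-time rule described above is simple enough to sketch. The following is an illustrative Python rendering of that rule as the abstract states it, not SpotTarget's actual implementation; the function name, threshold parameter, and data layout are all our own assumptions.

```python
from collections import Counter

def spot_target_filter(edges, target_edges, degree_threshold):
    """Sketch of the stated training-time rule: drop a target (training)
    edge from the message-passing graph if it is incident to at least
    one low-degree node. Edges are undirected (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    targets = set(map(frozenset, target_edges))
    kept = []
    for u, v in edges:
        is_target = frozenset((u, v)) in targets
        low_degree = deg[u] <= degree_threshold or deg[v] <= degree_threshold
        if is_target and low_degree:
            continue  # exclude: edge is being predicted and touches a low-degree node
        kept.append((u, v))
    return kept

# Toy graph: node 1 has degree 2, so the target edge (1, 2) is excluded.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
kept = spot_target_filter(edges, target_edges=[(1, 2)], degree_threshold=2)
```

At test time the corresponding rule is simpler still: exclude every test edge unconditionally, so the graph matches what would be available at deployment.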
Collaboration and Transition: Distilling Item Transitions into Multi-Query Self-Attention for Sequential Recommendation
Modern recommender systems employ various sequential modules such as self-attention to learn dynamic user interests. However, these methods are less effective in capturing collaborative and transitional signals within user interaction sequences. First, the self-attention architecture uses the embedding of a single item as the attention query, making it challenging to capture collaborative signals. Second, these methods typically follow an auto-regressive framework, which is unable to learn global item transition patterns. To overcome these limitations, we propose a new method called Multi-Query Self-Attention with Transition-Aware Embedding Distillation (MQSA-TED). First, we propose an L-query self-attention module that employs flexible window sizes for attention queries to capture collaborative signals. In addition, we introduce a multi-query self-attention method that balances the bias-variance trade-off in modeling user preferences by combining long and short-query self-attentions. Second, we develop a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings, which enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals. Experimental results on four real-world datasets demonstrate the effectiveness of the proposed modules.
The Devil is in the Data: Learning Fair Graph Neural Networks via Partial Knowledge Distillation
Graph neural networks (GNNs) are increasingly used in many high-stakes tasks, and as a result, their fairness has recently attracted growing attention. GNNs have been shown to be unfair, tending to make discriminatory decisions toward certain demographic groups defined by sensitive attributes such as gender and race. While recent works have been devoted to improving fairness performance, they often require access to demographic information, which greatly limits their applicability in real-world scenarios due to legal restrictions. To address this problem, we present a demographic-agnostic method to learn fair GNNs via knowledge distillation, namely FairGKD. Our work is motivated by the empirical observation that training GNNs on partial data (i.e., only node attributes or only topology data) can improve their fairness, albeit at the cost of utility. To strike a balanced trade-off between fairness and utility, we employ a set of fairness experts (i.e., GNNs trained on different partial data) to construct a synthetic teacher, which distills fairer and more informative knowledge to guide the learning of the GNN student. Experiments on several benchmark datasets demonstrate that FairGKD, which does not require access to demographic information, improves the fairness of GNNs by a large margin while maintaining their utility. Our code is available via: \code.
Dance with Labels: Dual-Heterogeneous Label Graph Interaction for Multi-intent Spoken Language Understanding
Multi-intent spoken language understanding (SLU) has garnered increasing attention since it can handle complex utterances expressing multiple intents in real-world scenarios. However, existing joint models are disturbed by label statistical frequencies, or adopt homogeneous graphs to capture interactions between the different types (e.g., intent and slot) of label nodes, thereby limiting the performance. To overcome these limitations, we propose Dual Heterogeneous Graph Label Interaction for multi-intent SLU, named DHLG. Concretely, we propose a global static heterogeneous label graph interaction layer to model both intra- and inter-label statistical dependencies across the entire training corpus. Based on this, we introduce a local dynamic heterogeneous label graph layer to further facilitate adaptive interactions between intents and slots for each utterance. Extensive experiments and analyses on two widely-used benchmark datasets demonstrate the superiority of our proposed DHLG over state-of-the-art methods.
MultiSPANS: A Multi-range Spatial-Temporal Transformer Network for Traffic Forecast via Structural Entropy Optimization
Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules that generate informative ST-token embeddings to facilitate attention computation. Then, based on ST-tokens and spatial-temporal position encoding, we employ Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, the structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments demonstrate the superiority of the presented framework over several state-of-the-art methods on real-world traffic datasets, and show that longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS.
SESSION: Short Demo Papers
IAI MovieBot 2.0: An Enhanced Research Platform with Trainable Neural Components and Transparent User Modeling
While interest in conversational recommender systems has been on the rise, operational systems suitable for serving as research platforms for comprehensive studies are currently lacking. This paper introduces an enhanced version of the IAI MovieBot conversational movie recommender system, aiming to evolve it into a robust and adaptable platform for conducting user-facing experiments. The key highlights of this enhancement include the addition of trainable neural components for natural language understanding and dialogue policy, transparent and explainable modeling of user preferences, along with improvements in the user interface and research infrastructure.
WordGraph: A Python Package for Reconstructing Interactive Causal Graphical Models from Text Data
We present WordGraph, a Python package for exploring the topics of document corpora. WordGraph builds causal graphical models from the vocabulary of text data and offers interactive visualizations of term networks. Our easy-to-use package ships with a pre-built pipeline that exposes the main modules through Jupyter widgets, encapsulating an entire vocabulary-exploration process within a single Jupyter notebook cell, with straightforward parameter settings and interactive plots. The WordGraph pipeline is fully customizable by adding or removing widgets or changing default parameters. To assist users with no background in Python or Jupyter notebooks who wish to explore the topics of large corpora, we also provide automatic dashboard generation from the customizable notebook pipeline, in the style of a web application. WordGraph is available through a GitHub repository at https://github.com/MLDS-software/WordGraph.
CharmBana: Progressive Responses with Real-Time Internet Search for Knowledge-Powered Conversations
Chatbots are often hindered by the latency associated with integrating real-time web search results, compromising user experience. To overcome this, we present CharmBana, an innovative social chatbot that introduces the use of progressive response generation to effortlessly blend search results into the bot’s responses, while ensuring low response latency. The use of progressive responses is especially beneficial for voice-based chatbots, where the preliminary response buys time for a detailed follow-up, ensuring a smooth user interaction. As a result, our method not only cuts down user waiting times by 50% but also generates more relevant, precise, and engaging search inquiries. When tested in the Alexa Prize Socialbot Grand Challenge 5, our chatbot employing progressive responses consistently received higher user ratings.
GEMRec: Towards Generative Model Recommendation
Recommender Systems are built to retrieve relevant items to satisfy users’ information needs. The candidate corpus usually consists of a finite set of items that are ready to be served, such as videos, products, or articles. With recent advances in Generative AI such as GPT and Diffusion models, a new form of recommendation task is yet to be explored where items are to be created by generative models with personalized prompts. Taking image generation as an example, with a single prompt from the user and access to a generative model, it is possible to generate hundreds of new images in a few minutes. How shall we attain personalization in the presence of “infinite” items? In this preliminary study, we propose a two-stage framework, namely Prompt-Model Retrieval and Generative Model Ranking, to approach this new task formulation. We release GEMRec-18K, a prompt-model interaction dataset with 18K images generated by 200 publicly available generative models paired with a diverse set of 90 textual prompts. Through a demo user interface based on the proposed framework, we illustrate the promise of Generative Model Recommendation as a novel personalization problem and highlight future directions. Our code and dataset are available at: https://github.com/MAPS-research/GEMRec.
EvidenceQuest: An Interactive Evidence Discovery System for Explainable Artificial Intelligence
Explainable Artificial Intelligence (XAI) aims to make artificial intelligence (AI) systems transparent and understandable to humans, providing clear explanations for the decisions made by AI models. This paper presents a novel pipeline and a digital dashboard that provides a user-friendly platform for interpreting the results of machine learning algorithms using XAI technology. The dashboard utilizes evidence-based design principles to deliver information clearly and concisely, enabling users to better understand the decisions made by their algorithms. We integrate XAI services into the dashboard to explain the algorithm’s predictions, allowing users to understand how their models function and make informed decisions. We demonstrate a motivating scenario in banking and present how the proposed system enhances transparency and accountability and improves trust in the technology.
SIRUP: Search-based Book Recommendation Playground
This work presents a playground platform to demonstrate and interactively explore a suite of methods for utilizing user review texts to generate book recommendations. The focus is on search-based settings where the user provides situational context by focusing on a genre, a given item, their full user profile, or a newly formulated query. The platform allows exploration over two large datasets with various methods for creating concise user profiles.
Ginkgo-P: General Illustrations of Knowledge Graphs for Openness as a Platform
Accessibility and openness are two of the most important factors motivating AI and Web research. For example, as the cost of training and deploying large Knowledge Graph (KG) systems increases, valuable auxiliary features such as visualization, explainability, and automation are often overlooked, diminishing impact and popularity. Furthermore, current KG research has grown convoluted and abstract, discouraging collaboration. To this end, we present Ginkgo-P, a platform that automatically illustrates any KG algorithm with nothing but a script and a data file. Additionally, Ginkgo-P elucidates modern KG research on the UMLS dataset with interactive demonstrations in four categories: KG Node Recommendation, KG Completion, KG Question Answering, and KG Reinforcement Learning. These categories and their many applications are increasingly ubiquitous yet lack both introductory and advanced resources to accelerate interest and contributions; with just a few clicks, our demonstration addresses this by providing an open platform for users to integrate individual KG algorithms. The source code for Ginkgo-P is available; we hope it will propel future KG systems to become more accessible as open-source projects.
Real-time E-bike Route Planning with Battery Range Prediction
Electric bicycles (EBs) have gained immense popularity as an environmentally friendly and convenient transportation mode. However, range anxiety remains a major concern for EB users. This paper presents a real-time route planning model focused on predicting the remaining range of EBs. First, we represent the user’s interaction data and the real-time battery state as a dynamic graph. We then propose a novel approach called the Real-Time Electric Bicycle Remaining Range (RtRR) prediction model, which leverages the graph structure and jointly optimizes temporal edge convolution, LSTM, and Transformer models to estimate the remaining EB battery range. Based on the prediction, we can update optimal cycling routes for users in real time, taking charging station locations into account. Extensive evaluations demonstrate that our proposed RtRR model outperforms 9 baseline methods on real-world datasets. Route planning based on RtRR prediction effectively alleviates range anxiety and enhances the user experience. It can be accessed at https://github.com/gu-yongchun/Real-time-E-bike-Route-Planning-with-Battery-Range-Prediction.
An Interpretable Brain Graph Contrastive Learning Framework for Brain Disorder Analysis
In this paper, we propose an interpretable brain graph contrastive learning framework that learns brain graph representations in an unsupervised way for disorder prediction and pathogenic analysis. Our framework consists of two key designs: we first utilize a controllable data augmentation strategy that perturbs unimportant structures and attribute features to generate augmented brain graphs. Then, considering that the difference between healthy and patient brain graphs is small, we introduce hard negative sample evaluation to weight the negative samples of the contrastive loss, which learns more discriminative brain graph representations. More importantly, our method can surface salient brain regions and connections for pathogenic analysis. We conduct disorder prediction and interpretability experiments on three real-world neuroimaging datasets to demonstrate the effectiveness of our framework.
Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs
The unique capabilities of Large Language Models (LLMs), such as natural language text generation, position them as strong candidates for providing explanations for recommendations. However, regardless of model size, most existing LLMs struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, which combines aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience building the framework and present an interactive demonstration for exploring our results.
Future Timelines: Extraction and Visualization of Future-related Content From News Articles
In today’s rapidly evolving world, maintaining a comprehensive overview of the future landscape is essential for staying competitive and making informed decisions. However, given the large volume of daily news, manually obtaining a thorough overview of an entity’s future prospects is quite challenging. To address this, we present a system designed to automatically extract and summarize future-related information about a queried entity from news articles. Our approach utilizes a novel, publicly accessible multi-source dataset comprising 6,800 annotated sentences to fine-tune a language model to identify future-related sentences. We then use topic modeling to extract the main topics from the data, rank them by relevance, and present them on an interactive timeline. User evaluations have shown that the timelines and summaries our system produces are useful. The system is available as a web application at: https://chronicle2050.regevson.com.
Temporal Graph Analysis with TGX
Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely designed for static graphs, where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for the analysis of temporal networks, encompassing an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets, as well as any novel dataset in .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node sub-sampling to accelerate work with larger datasets. For comprehensive investigation, TGX offers network analysis through a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Traffic (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas such as studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available at: https://github.com/ComplexData-MILA/TGX.
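Of the processing steps listed above, discretization is the most generic: a continuous-time edge stream is bucketed into snapshots of fixed width. The sketch below illustrates the idea in plain Python; it is not TGX's API (whose function names and signatures differ), and the names here are ours.

```python
def discretize(events, interval):
    """Group a timestamped edge stream into fixed-width snapshot buckets.
    events: iterable of (u, v, t) tuples; interval: bucket width.
    Returns {snapshot_index: [(u, v), ...]}."""
    snapshots = {}
    for u, v, t in events:
        # Integer division maps each timestamp to its snapshot index.
        snapshots.setdefault(int(t // interval), []).append((u, v))
    return snapshots

# Toy stream: four interactions over ~3.5 time units, bucketed per unit.
events = [(0, 1, 0.5), (1, 2, 1.2), (2, 0, 1.9), (0, 2, 3.4)]
snaps = discretize(events, interval=1.0)
```

Each snapshot can then be treated as a static graph, which is what lets static-graph measures (average degree, edge counts per timestamp) be tracked over time.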
Vector Search with OpenAI Embeddings: Lucene Is All You Need
We provide a reproducible, end-to-end demonstration of vector search with OpenAI embeddings using Lucene on the popular MS MARCO passage ranking test collection. The main goal of our work is to challenge the prevailing narrative that a dedicated vector store is necessary to take advantage of recent advances in deep neural networks as applied to search. Quite the contrary, we show that hierarchical navigable small-world network (HNSW) indexes in Lucene are adequate to provide vector search capabilities in a standard bi-encoder architecture. This suggests that, from a simple cost-benefit analysis, there does not appear to be a compelling reason to introduce a dedicated vector store into a modern “AI stack” for search, since such applications have already received substantial investments in existing, widely deployed infrastructure.
Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters
Recently, Large Language Models (LLMs) have achieved impressive zero-shot performance across a variety of Natural Language Processing (NLP) tasks, especially generative text tasks. Yet, the large size of LLMs often leads to high computational costs for model training and online deployment. In our work, we present ALTER, a system that effectively builds multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture of the underlying model, capturing both intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaboration between adapters at a small computational cost. Experimental results over a mixture of NLP tasks show that our proposed MTA architecture and the two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains.
A Scalable Open-Source System for Segmenting Urban Areas with Road Networks
Segmenting an urban area into regions is fundamentally important for many spatio-temporal applications. The traditional grid-based method offers a simple solution, as it divides the city map into equal-sized grids, but it fails to preserve semantic information about the original urban structure. Several studies apply the road network to cut the metropolitan area into meaningful blocks. However, existing works do not achieve good scalability, and no public system has been provided so far. To address these problems, we build the first scalable vector-based system that generates reasonable regions using all levels of the road network. We conduct an evaluation to prove the efficiency and effectiveness of our system. We also publish our system as a Python library through the Python Package Index (PyPI), and demonstrate its utility in this paper using real public datasets. The source code and useful instructions can be found at https://github.com/PaddlePaddle/PaddleSpatial/tree/main/paddlespatial/tools/genregion.
Domain Level Interpretability: Interpreting Black-box Model with Domain-specific Embedding
The importance of incorporating interpretability into machine learning models has been increasingly emphasized. While previous literature has typically focused on feature-level interpretability, such as analyzing which features are important and how they influence the final decision, real-world applications often require domain-level interpretability, which relates to groups of features. Domain-level interpretability holds the potential for enhanced informativeness and comprehensibility. Unfortunately, there has been limited research in this direction. In this paper, we address this issue and introduce our proposed method DIDE, which obtains domain-level interpretability from domain-specific latent embeddings. To enhance the effectiveness of the framework, we draw inspiration from the gradient smoothing philosophy and propose noise injection in the embedding space, resulting in smoothed interpretability. We conduct extensive experiments to validate the effectiveness of DIDE, and demonstrate its applications in assisting daily business tasks in Alipay.
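The noise-injection idea above is closely related to SmoothGrad-style attribution, where gradients are averaged over perturbed copies of an input. The following minimal sketch illustrates that general idea on a toy function standing in for a model output over a 2-d domain embedding; the function, noise scale, and sample count are illustrative assumptions, not the DIDE implementation.

```python
import random

def grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def smoothed_grad(f, x, sigma=0.1, n_samples=50, seed=0):
    """Average gradients over Gaussian-perturbed copies of x (SmoothGrad)."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        acc = [a + gi for a, gi in zip(acc, grad(f, noisy))]
    return [a / n_samples for a in acc]

# Toy "model output" as a function of a 2-d domain embedding.
f = lambda z: z[0] ** 2 + 0.5 * z[1]
print(smoothed_grad(f, [1.0, 2.0]))  # approx. [2.0, 0.5], the exact gradient at [1, 2]
```

Averaging over perturbed embeddings damps the sample-to-sample jitter of individual gradients, which is what yields the "smoothed" interpretability described above.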
Wildfire: A Twitter Social Sensing Platform for Layperson
We present Wildfire, an innovative social sensing platform designed for laypersons. The goal is to support users in conducting social sensing tasks on Twitter data without programming or data analytics skills. Existing open-source and commercial social sensing tools only support data collection using simple keyword-based or account-based search. In contrast, Wildfire employs a heuristic graph exploration method to selectively expand the collected tweet-account graph and thereby retrieve additional task-relevant tweets and accounts. This approach allows for the collection of data to support complex social sensing tasks that cannot be met with a simple keyword search. In addition, Wildfire provides a range of analytic tools, such as text classification, topic generation, and entity recognition, which can be crucial for tasks such as trend analysis. The platform also provides a web-based user interface for creating and monitoring tasks, exploring collected data, and performing analytics.
SESSION: Tutorial Papers
Some Useful Things to Know When Combining IR and NLP: The Easy, the Hard and the Ugly
Deep nets such as GPT are at the core of the current advances in many systems and applications. Things are moving fast; techniques become obsolete quickly (within weeks). How can we take advantage of new discoveries and incorporate them into our existing work? Are new developments radical improvements, or incremental repetitions of established concepts, or combinations of both?
In this tutorial, we aim to bring interested researchers and practitioners up to speed on the recent and ongoing techniques around ML and Deep learning in the context of IR and NLP. Additionally, our goal is to clarify terminology, emphasize fundamentals, and outline problems and new research opportunities.
Introduction to Responsible AI
In the first part of this tutorial we define responsible AI and discuss the problems embedded in terms like ethical or trustworthy AI. In the second part, to set the stage, we cover irresponsible AI: discrimination (e.g., the impact of human biases); pseudo-science (e.g., biometric-based behavioral predictions); human limitations (e.g., human incompetence, cognitive biases); technical limitations (e.g., data as a proxy for reality, flawed evaluation); social impact (e.g., unfair digital markets, or the mental health and disinformation issues created by large language models); and environmental impact (e.g., indiscriminate use of computing resources). These examples do carry a personal bias, but they set the context for the third part, where we cover the current challenges: ethical principles, governance, and regulation. We finish by discussing our responsible AI initiatives, many recommendations, and some philosophical issues.
Unbiased Learning to Rank: On Recent Advances and Practical Applications
Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations, along with several applications of its methods.
The tutorial is divided into four parts: Firstly, we give an overview of the different forms of bias that can be addressed with ULTR methods. Secondly, we present a comprehensive discussion of the latest estimation techniques in the ULTR field. Thirdly, we survey published results of ULTR in real-world applications. Fourthly, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications.
This tutorial is intended to benefit both researchers and industry practitioners interested in developing new ULTR solutions or utilizing them in real-world applications.
Bridging Text Data and Graph Data: Towards Semantics and Structure-aware Knowledge Discovery
Graphs and texts are two key modalities in data mining. In many cases, the data presents a mixture of the two modalities and the information is often complementary: in e-commerce data, the product-user graph and product descriptions capture different aspects of product features; in scientific literature, the citation graph, author metadata, and the paper content all contribute to modeling the paper impact.
Towards Trustworthy Large Language Models
Large Language Models are among the most exciting technologies developed in the last few years. While the models’ capabilities continue to improve, researchers, practitioners, and the general public are increasingly aware of some of their shortcomings. What will it take to build trustworthy large language models?
This tutorial will present a range of recent findings, discussions, questions, and partial answers in the space of trustworthiness in large language models. While this tutorial will not attempt a comprehensive overview of this rich area, we aim to provide participants with tools and insights for understanding both the conceptual foundations of trustworthiness and a broad range of ongoing research efforts. We will tackle some of the hard questions that you may have about trustworthy large language models, and hopefully address some misconceptions that have become pervasive.
Strategic ML: How to Learn With Data That ‘Behaves’
The success of machine learning across a wide array of tasks and applications has made it appealing to use it also in the social domain. Indeed, learned models now form the backbone of recommendation systems, social media platforms, online markets, and e-commerce services, where they are routinely used to inform decisions by, for, and about their human users. But humans are not conventional inputs: they have goals, beliefs, and aspirations, and take action to promote their own interests. Given that standard learning methods are not designed to handle inputs that ‘behave’, a natural question is: how should we design learning systems when we know they will be deployed and used in social settings? This tutorial introduces strategic machine learning, a new and emerging subfield of machine learning that aims to develop a disciplined framework for learning under strategic user behavior. The working hypothesis of strategic ML is simple: users want things, and act to achieve them. Surprisingly, this basic truism is difficult to address within the conventional learning framework. The key challenge is that how users behave often depends on the learned decision rule itself; thus, strategic learning seeks to devise methods that are able to anticipate and accommodate such responsive behavior. To this end, strategic ML offers a formalism for reasoning about strategic responses, for designing appropriate learning objectives, and for developing practical tools for learning in strategic environments. The tutorial will survey recent and ongoing work in this new domain, present key theoretical and empirical results, provide practical tools, and discuss open questions and landmark challenges.
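The core phenomenon, users best-responding to a published decision rule, can be illustrated with a toy one-dimensional example. All numbers and the linear gaming cost below are hypothetical choices for illustration, not material from the tutorial:

```python
def best_response(x, threshold, cost=1.0):
    """A user moves up to the decision boundary iff the gain (1) exceeds the cost."""
    gap = threshold - x
    if 0 < gap and gap * cost < 1.0:   # gaming is worth it
        return threshold
    return x

def accuracy(xs, ys, threshold, strategic=False, cost=1.0):
    """Accuracy of the rule 'predict 1 iff x >= threshold', with or without gaming."""
    correct = 0
    for x, y in zip(xs, ys):
        if strategic:
            x = best_response(x, threshold, cost)
        pred = 1 if x >= threshold else 0
        correct += (pred == y)
    return correct / len(xs)

# True label: qualified iff raw feature >= 0.5
xs = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]
ys = [0, 0, 0, 1, 1, 1]

print(accuracy(xs, ys, 0.5))                           # 1.0: non-strategic users
print(accuracy(xs, ys, 0.5, strategic=True, cost=2.0)) # 0.5: users game the naive rule
print(accuracy(xs, ys, 1.0, strategic=True, cost=2.0)) # 1.0: threshold shifted by gain/cost = 0.5
```

The learner that anticipates gaming shifts its threshold by the maximal manipulation budget (gain divided by cost) and restores accuracy; this threshold-shifting logic is the simplest instance of the anticipate-and-accommodate reasoning the tutorial formalizes.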
Practical Bandits: An Industry Perspective
The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms have seen a large and growing interest from industrial applications, such as search, recommendation and advertising. Indeed, with the bandit lens comes the promise of direct optimisation for the metrics we care about.
Nevertheless, the road to successfully applying bandits in production is not an easy one. Even when the action space and rewards are well-defined, practitioners still need to make decisions regarding multi-arm or contextual approaches, on- or off-policy setups, delayed or immediate feedback, myopic or long-term optimisation, etc. To make matters worse, industrial platforms typically give rise to large action spaces in which existing approaches tend to break down. The research literature on these topics is broad and vast, but this can overwhelm practitioners, whose primary aim is to solve practical problems, and therefore need to decide on a specific instantiation or approach for each project. This tutorial will take a step towards filling that gap between the theory and practice of bandits. Our goal is to present a unified overview of the field and its existing terminology, concepts and algorithms—with a focus on problems relevant to industry. We hope our industrial perspective will help future practitioners who wish to leverage the bandit paradigm for their application.
SESSION: Doctoral Consortiums
Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents
We observe a change in the way users access information, namely the rise of conversational information access (CIA) agents. However, the automatic evaluation of these agents remains an open challenge. Moreover, the training of CIA agents is cumbersome, as it mostly relies on conversational corpora, expert knowledge, and reinforcement learning. User simulation has been identified as a promising solution for automatic evaluation and has previously been used in reinforcement learning. In this research, we investigate how user simulation can be leveraged in the context of CIA. We organize the work in three parts. We begin by identifying requirements for user simulators for training and evaluating CIA agents, and compare existing types of simulators against these requirements. We then plan to combine these different types of simulators into a new hybrid simulator. Finally, we aim to extend simulators to handle more complex information-seeking scenarios.
Understanding User Behavior in Carousel Recommendation Systems for Click Modeling and Learning to Rank
Although carousels (also known as multi-lists) have replaced the ranked list as the standard user interface for recommender systems in many domains (e-commerce, streaming services, etc.), many questions remain unanswered and many areas undeveloped when compared to the literature on ranked lists. This is due to two significant barriers: the lack of public datasets and the lack of eye tracking user studies of browsing behavior. Clicks, the standard feedback collected by recommender systems, are insufficient to understand the whole interaction process of a user with a recommender, requiring system designers to make assumptions, especially about browsing behavior. Eye tracking provides a means to elucidate the process and test these assumptions. In this extended abstract, the PhD project is outlined, which aims to address the open research questions in carousel recommender systems by: 1) improving our understanding of users’ browsing behavior with carousels, 2) formulating a new click model based on empirical evidence of users’ behavior, and 3) proposing a learning to rank algorithm adapted to the carousel setting. For this purpose, we will carry out the first eye tracking user study within a carousel movie recommendation setting and make the resulting unique dataset of users’ gaze and clicks publicly available.
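As a concrete, hypothetical illustration of what a carousel click model might look like, consider a position-based model in which the click probability factorizes into an examination probability for each (row, rank) grid position and an item attractiveness term. Given examination probabilities, attractiveness can be estimated from click logs with a simple moment estimate; the names and numbers below are illustrative assumptions, not the model this project will propose:

```python
def estimate_attractiveness(logs, examine):
    """Moment estimate under a position-based model for a carousel grid:
    P(click) = P(examine (row, rank)) * P(item is attractive), so
    attractiveness ~= clicks / summed examination probability over impressions."""
    clicks, exposure = {}, {}
    for (row, rank), item, clicked in logs:
        exposure[item] = exposure.get(item, 0.0) + examine[(row, rank)]
        clicks[item] = clicks.get(item, 0) + clicked
    return {item: clicks[item] / exposure[item] for item in exposure}

# Hypothetical examination probabilities: the top-left slot is examined far more often.
examine = {(0, 0): 0.8, (1, 1): 0.5}
# Hypothetical log entries: ((row, rank), item, clicked)
logs = ([((0, 0), "movie_a", 1)] * 3 + [((0, 0), "movie_a", 0)] * 2
        + [((1, 1), "movie_b", 1), ((1, 1), "movie_b", 0), ((1, 1), "movie_b", 0)])

print(estimate_attractiveness(logs, examine))
```

Raw click-through rates (0.6 vs. 0.33) would favor the top-left slot far more strongly than the examination-normalized attractiveness estimates (0.75 vs. 0.67) do; whether such examination assumptions hold for carousels is exactly what the planned eye tracking study can test empirically.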
Grounded and Transparent Response Generation for Conversational Information-Seeking Systems
While previous conversational information-seeking (CIS) research has focused on passage retrieval, reranking, and query rewriting, the challenge of synthesizing retrieved information into coherent responses remains. The proposed research delves into the intricacies of response generation in CIS systems. Open-ended information-seeking dialogues introduce multiple challenges that may lead to potential pitfalls in system responses. The study focuses on generating responses grounded in the retrieved passages and being transparent about the system’s limitations. Specific research questions revolve around obtaining confidence-enriched information nuggets, automatic detection of incomplete or incorrect responses, generating responses communicating the system’s limitations, and evaluating enhanced responses. By addressing these research tasks the study aspires to contribute to the advancement of conversational response generation, fostering more trustworthy interactions in CIS dialogues, and paving the way for grounded and transparent systems to meet users’ needs in an information-driven world.
Learning Opinion Dynamics from Data
The foundation of my doctoral thesis is the estimation of agent-based models (ABMs) that simulate opinion dynamics using a likelihood-based method. I establish that the principles governing ABMs can be transformed into equivalent probabilistic generative models that admit a well-defined likelihood function. Consequently, I have incorporated these models into an automatic differentiation framework, which simplifies the process and improves the efficiency of performing maximum likelihood estimation through gradient descent techniques.
The first work shows that our maximum likelihood approach outperforms the typical simulation-based approach in estimating the parameters of a bounded confidence model (BCM) in different settings. We compared the two approaches in three realistic scenarios of increasing complexity depending on data availability: (i) fully observed opinions and interactions, (ii) partially observed interactions, and (iii) observed interactions with noisy proxies of the opinions. The comparison is strikingly unbalanced, both in terms of computational time and estimation error.
Thanks to the formalism of Probabilistic Graphical Models, it is possible to draw on probabilistic machine learning tools for treating ABMs. Hence, the proposed scheme opens the door to a broad range of inference techniques. In particular, my ongoing research generalizes the estimation of opinion dynamics models to avoid the most intricate step, the derivation of the likelihood. We cast the ABMs into a probabilistic programming framework, where it is possible to apply black-box variational inference (VI) without an explicit definition of the (possibly intractable) likelihood. To demonstrate this methodology, we extend the BCM to cover the most common update rules in the opinion dynamics literature.
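For readers unfamiliar with bounded confidence models, the dynamics being estimated can be sketched in a few lines: at each step a random pair of agents interacts, and both move toward each other iff their opinions differ by less than a confidence bound. The parameters and setup below are illustrative, not the thesis's experimental configuration:

```python
import random

def simulate_bcm(opinions, eps, mu, steps, seed=0):
    """Pairwise bounded confidence dynamics: a random pair (i, j) interacts,
    and both opinions move toward each other iff |x_i - x_j| < eps."""
    rng = random.Random(seed)
    x = list(opinions)
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)
        if abs(x[i] - x[j]) < eps:
            x[i], x[j] = x[i] + mu * (x[j] - x[i]), x[j] + mu * (x[i] - x[j])
    return x

# Ten agents spread uniformly on [0, 1]; with a wide confidence bound every
# interaction succeeds and the population converges to consensus.
final = simulate_bcm([i / 9 for i in range(10)], eps=1.0, mu=0.5, steps=2000)
print(max(final) - min(final))  # the opinion spread shrinks toward 0
```

Simulation-based estimation repeatedly runs such a simulator and compares summary statistics against data; the likelihood-based approach described above instead treats the stochastic update rule as a probabilistic generative model, which is what makes gradient-based maximum likelihood estimation possible.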
Gaussian Graphical Model-Based Clustering of Time Series Data
Time series subsequence clustering is a useful tool for recognizing dynamic changes and uncovering interesting patterns in time series, and it can also be applied to downstream tasks. In addition to clustering the data, the interpretability of the clusters is crucial when analyzing the data, particularly because we frequently lack information about each cluster. The Gaussian Graphical Model (GGM) provides a clear explanation of a cluster, as the inverse covariance (precision) matrix of the GGM, which can be viewed as a network, encodes the conditional independence structure. In this study, we aim to enhance our understanding of the data by achieving GGM-based clustering for time series data of various types and structures. Furthermore, our objective is to utilize GGMs in downstream tasks, including missing value imputation and forecasting, by taking advantage of the relationships between variables shown in the network. We aim to answer the following research questions: RQ1: How can we obtain interpretable clusters for tensor time series? RQ2: How can we deal with time series containing missing values? RQ3: How can we effectively forecast a time series stream?
Effective and Efficient Transformer Models for Sequential Recommendation
Sequential Recommender Systems use the order of user-item interactions to predict the next item in the sequence. This task is similar to Language Modelling, where the goal is to predict the next token based on the sequence of past tokens. Therefore, adaptations of language models, and in particular Transformer-based models, have achieved state-of-the-art results for sequential recommendation. However, despite the similarities, the sequential recommendation problem poses a number of specific challenges not present in Language Modelling. These challenges include the large catalogue size of real-world recommender systems, which increases GPU memory requirements and slows down both training and inference of recommender models. Another challenge is that a good recommender system should focus not only on recommendation accuracy but also on additional metrics, such as diversity and novelty, which makes the direct adaptation of language model training strategies problematic. Our research focuses on solving these challenges. In this doctoral consortium abstract, we briefly describe the motivation and background for our work, and then pose research questions and discuss current progress towards solving the described problems.
Framework for Bias Detection in Machine Learning Models: A Fairness Approach
The research addresses bias and inequity in binary classification problems in machine learning. Despite existing ethical frameworks for artificial intelligence, detailed guidance on practices and techniques to address these issues is lacking. The main objective is to identify and analyze theoretical and practical components related to the detection and mitigation of biases and inequalities in machine learning. The proposed approach combines best practices, ethics, and technology to promote the responsible use of artificial intelligence in Colombia. The methodology covers the definition of performance and fairness interests; interventions in pre-processing, processing, and post-processing; and the generation of recommendations and explainability of the model.
Using Causal Inference to Solve Uncertainty Issues in Dataset Shift
Dataset shift leads to uncertainty issues, which in turn cause model accuracy to decline. We use causality instead of correlation to find invariant characteristics and to resolve the uncertainty issues between different dataset distributions (e.g., domain adaptation). We summarize which datasets can be used in current domain training, and build a benchmarking framework for causal learning that combines causal inference with traditional models to detect, address, and characterize dataset shift.
Multi-Granular Text Classification with Minimal Supervision
Our society is inundated with massive amounts of unstructured text data, posing great challenges for people to fetch needed data, digest critical information, and derive actionable knowledge. Such needs necessitate the development of text classification, a fundamental task towards structuring unstructured web data. Existing methods either require heavy human annotation or work only within a limited scope (e.g., classification into only a small number of classes), far from real-world needs. Recently developed deep learning and pre-trained language models boost our research substantially, but many problems still remain. Therefore, we propose to develop a minimally-supervised approach to structure massive text into a multi-granularity text space. We explore the following four subtasks: (1) weak supervision enrichment, (2) PLM-enhanced weakly-supervised text classification, (3) empowering fine-grained text classification with an enriched taxonomy, and (4) joint classification of multi-granular text units.
SESSION: Industry Day Talk Abstracts
Augmenting Keyword-based Search in Mobile Applications Using LLMs
Search in mobile applications has traditionally been keyword driven and limited to simple queries, such as searching for product names, even when the apps support much richer, transactional experiences. On the other hand, search on the web has evolved into queries that are complex, objective-based and most often in natural language. The recent advances in Generative AI make it possible to bring the power of web-like, conversational searches into mobile applications. In this talk, we present the various problems, opportunities and challenges in harnessing LLMs to augment the traditional search experience in mobile applications.
Applications of LLMs in E-Commerce Search and Product Knowledge Graph: The DoorDash Case Study
Extracting knowledge from unstructured or semi-structured textual information is essential for the machine learning applications that power DoorDash’s search experience, and the development and maintenance of its product knowledge graph. Large language models (LLMs) have opened up new possibilities for utilizing their power in these areas, replacing or complementing traditional natural language processing methods. LLMs are also proving to be useful in the label and annotation generation process, which is critical for these use cases. In this talk, we will provide a high-level overview of how we incorporated LLMs for search relevance and product understanding use cases, as well as the key lessons learned and challenges faced during their practical implementation.
Scaling Use-case Based Shopping using LLMs
Products on e-commerce websites are usually organized based on seller-provided product attributes. Customers looking for a product typically have certain needs or product use-cases in mind, e.g., headphones for gym classes, or a printer for a small business. However, they often struggle to map these use-cases to product attributes and subsequently fail to find the product they need. In this talk, we present a use-case based shopping (UBS) ML system that facilitates use-case based customer experiences (CXs). The UBS system recommends dominant product use-cases to customers, along with the most relevant products for those use-cases. Use-cases and their definitions vary across product categories and marketplaces (MPs), which makes it infeasible to collect the large amounts of training data needed to train supervised models for thousands of e-commerce categories and multiple MPs. In this talk, we present our work on scaling the UBS model by instruction-tuning an LLM for our task.
HealAI: A Healthcare LLM for Effective Medical Documentation
Since the advent of LLMs like GPT-4, practitioners in various industries have been trying to harness their power. Healthcare is an industry where this is a particularly challenging problem due to the high accuracy requirements. Prompt engineering is a common technique used to design instructions for model responses; however, its challenge lies in the fact that generic models may not be trained to accurately execute these specific tasks. We will present our journey of developing a cost-effective medical LLM that surpasses GPT-4 in medical note-writing tasks. We will touch upon our trials with medical prompt engineering, GPT-4’s limitations, and training an optimized LLM for specific medical tasks. We will showcase multiple comparisons of model sizes, training data, and pipeline designs that enabled us to outperform GPT-4 with smaller models, maintaining precision, reducing biases, preventing hallucinations, and enhancing note-writing style.
Mitigating Factual Inconsistency and Hallucination in Large Language Models
Large Language Models (LLMs) have demonstrated remarkable capabilities in a range of language-related tasks, enabling applications in fields such as healthcare, education, and financial services. However, they are prone to producing factually incorrect responses, or “hallucinations”, which can have detrimental consequences such as loss of credibility and diminished customer trust. In this presentation, we showcase a solution that addresses the challenge of minimizing hallucinations. Our solution provides accurate responses and generates detailed explanations, thereby enabling users to know how the model arrived at the final response. Additionally, it verifies whether the explanations are factually correct and offers insights into whether the generated explanations are directly derived from the provided context or inferred from it. We also systematically assess the quality of generated responses using an LLM-based evaluation technique. We present empirical results on benchmark datasets to demonstrate the effectiveness of our approach. Our presentation also examines the impact of individual components of the solution in enhancing the factual correctness of the final response. This research is vital for industries utilizing LLMs, as it provides a means to enhance the reliability of responses and mitigate the risks associated with factual hallucinations. Researchers and practitioners seeking to enhance the reliability of LLM responses will find valuable insights in this presentation.
Recent Advances in Refinement Recommendations
Navigating vast e-commerce websites with extensive product catalogs can be a daunting challenge for shoppers. To assist customers in finding the products they desire, e-commerce platforms provide product attribute filters, commonly referred to as “refinements.” These refinements serve as a vital navigational aid, enabling customers to refine their search results based on specific product attributes such as material, color, size, brand, etc. However, on mobile devices refinements are not easily discoverable due to the lack of screen space. To improve discoverability, contextually relevant refinements are suggested in-line on the search page by refinement recommendation systems. In this work, we discuss the evolution of refinement recommendation strategies: (a) a search query-based classification approach, where for a given search query we train a classification model with the refinements as labels; (b) a session-based classification approach, where for a given sequence of session interactions we train a sequence classification model with the refinements as labels; and (c) a session-based generation approach, with the sequence of session interactions as input and the refinement name as output.
Foundation Models for Aerial Robotics
Developing machine intelligence abilities in robots and autonomous systems is an expensive and time-consuming process. Existing solutions are tailored to specific applications and are hard to generalize. Furthermore, the scarcity of training data adds a layer of complexity to deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to address both of these issues. The platform enables robots to learn, compose, and adapt skills to their physical capabilities, environmental constraints, and goals. One of the components of GRID is a state-of-the-art simulation system that models robot physics and the machine intelligence processes. This system in turn is tightly coupled with a multitude of Foundation Models, enabling rapid prototyping, design, debugging, and refinement of robot AI models. GRID is designed from the ground up to be extensible to accommodate new types of robots, vehicles, hardware platforms, and software protocols. In addition, the modular design enables various deep ML components and existing foundation models to be easily used in a wider variety of robot-centric problems. We showcase the platform in various aerial robotics scenarios and demonstrate how it dramatically accelerates the development of machine-intelligent robots.
Scaling Up LLM Reviews for Google Ads Content Moderation
Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency make them prohibitive for casual use on large datasets, such as the Google Ads repository. This study proposes a method for scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of ads for which we select one representative ad per cluster. We then use LLMs to review only the representative ads. Finally, we propagate the LLM decisions for the representative ads back to their clusters. This method reduces the number of reviews by more than 3 orders of magnitude while achieving a 2x recall compared to a baseline non-LLM model. The success of this approach is a strong function of the representations used in clustering and label propagation; we found that cross-modal similarity representations yield better results than uni-modal representations.
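The filter, cluster, review-one-representative, and propagate pipeline described above can be sketched generically. Everything below, the greedy Jaccard clustering, the similarity threshold, and the stub llm_review, is an illustrative stand-in for the (unspecified) production components:

```python
def dedupe(ads):
    """Drop exact duplicate ad texts, keeping the first occurrence."""
    seen, unique = set(), []
    for ad in ads:
        if ad["text"] not in seen:
            seen.add(ad["text"])
            unique.append(ad)
    return unique

def cluster(ads, sim, threshold=0.7):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, else start a new cluster."""
    clusters = []
    for ad in ads:
        for members in clusters:
            if sim(ad, members[0]) >= threshold:
                members.append(ad)
                break
        else:
            clusters.append([ad])
    return clusters

def moderate(ads, sim, llm_review):
    """Review only one representative per cluster, then propagate its label."""
    decisions = {}
    for members in cluster(dedupe(ads), sim):
        verdict = llm_review(members[0])     # one LLM call per cluster
        for ad in members:
            decisions[ad["id"]] = verdict    # label propagation
    return decisions

# --- illustrative stand-ins for the production components ---
def jaccard(a, b):
    wa, wb = set(a["text"].split()), set(b["text"].split())
    return len(wa & wb) / len(wa | wb)

calls = []
def fake_llm_review(ad):                     # stub for an expensive LLM call
    calls.append(ad["id"])
    return "reject" if "miracle" in ad["text"] else "allow"

ads = [
    {"id": 1, "text": "miracle cure pills"},
    {"id": 2, "text": "miracle cure pills now"},
    {"id": 3, "text": "running shoes sale"},
]
print(moderate(ads, jaccard, fake_llm_review))  # {1: 'reject', 2: 'reject', 3: 'allow'}
print(len(calls))                               # 2 LLM calls cover 3 ads
```

As the abstract notes, the savings hinge on the representation behind the similarity function: the better it groups truly equivalent ads, the more reviews can be propagated without sacrificing recall.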
Customer Understanding for Recommender Systems
Recommender systems are powerful tools for enhancing customer engagement and driving sales for Rakuten businesses. However, to achieve their full potential, these systems must possess a profound understanding of customer behaviors. This understanding can be gained from a variety of sources, including customer purchase history, customer feedback, and customer behavioral patterns. One of the most important aspects of customer understanding is the ability to identify lookalike customers, understand their behavioral patterns, and predict lifestyle attributes, for example whether a customer is married, owns a car, or plays golf. Rakuten provides more than 70 different services and relies heavily on recommendations for many of its products. On our platforms, we observe that groups of customers who share similar interests, needs, or behaviors often end up being attracted to similar products or services. Customer preferences can change over time, so it is important for recommender systems to adapt to those changes. This can be achieved by tracking customer behavior, static or dynamic environment changes around targeted customers, and their feedback. We utilize various graph- and deep-learning-based models to address the customer understanding problem.
The objective of this talk is to offer a comprehensive overview of customer understanding and modeling for complex recommender systems, leading to increased customer satisfaction, loyalty, and sales. We present various empirical results on both Rakuten data and public benchmark datasets, providing evidence of the benefits of customer understanding on various downstream tasks. We also share insights characterizing the circumstances in which the graph-based and deep-learning-based models offer the most significant improvements. We conclude the talk with a critique of the current models and a discussion of possible future developments and improvements.
“Maya”- A Conversational Shopping Assistant for Fashion at Myntra
Fashion eCommerce is challenging because users often have very broad intents, e.g., "I want a dress to wear for my cousin's wedding." Traditional single-shot retrieval mechanisms like search do not work well for such intents. To solve this problem we developed "Maya", a conversational shopping assistant based on LLMs. Maya has been deployed in production and is available at scale to a select cohort of users on our platform. It is arguably one of the earliest production deployments of generative AI fashion assistants at this scale. We describe the journey of building Maya and the lessons learned after production deployment.
Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers
Generative AI has significantly reduced the entry barrier to the domain of AI owing to its ease of use and core capabilities of automation, translation, and intelligent actions in our day-to-day lives. Currently, the large language models (LLMs) that power such chatbots are utilized primarily for their automation capabilities on a limited scope. One major limitation of the currently evolving family of LLMs is hallucination, wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts, and inaccurate LLM parameters, and they occur most often when mathematical facts are combined with language-based context. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions specialized for financial decision makers: prototyping, scaling, and LLM evolution using human feedback. These three stages and the novel data-to-answer generation modules presented in this work are necessary to ensure that generative AI products are reliable and of sufficiently high quality to aid key decision-making processes.
Accelerating Pharmacovigilance using Large Language Models
Pharmacovigilance is the science and practice of monitoring, assessing, and preventing adverse effects or any other drug-related problems. Pharmacovigilance ensures the post-market safety of pharmaceuticals and plays a crucial role in public health by enhancing drug safety. This discipline involves collecting, analyzing, and reporting data on adverse events, allowing for informed regulatory decisions. Manual systems face challenges in handling data volume, potentially leading to oversight and delays. Automation with advanced technologies can be a practical solution to mitigate these challenges and ensure efficient data management.
In this talk, we explain the potential application of pre-trained Large Language Models (LLMs) in pharmacovigilance. We begin with a comprehensive overview of the process, covering all stages of the lifecycle. We emphasize the pivotal step of scrutinizing documents for relevant adverse effects and elucidate the measures that enhance its effectiveness. We then delineate our strategy for generating informative summaries, with specific emphasis on adverse effects and their antecedent occurrences, and underscore the imperative of validating factual accuracy via a fact-checking mechanism. We demonstrate the framework's efficacy with a focus on output fidelity and summary informativeness, provide quantifiable evidence of the time-saving benefits of our method compared to manual approaches, advocate for the adoption of our framework in pharmacovigilance, and conclude by addressing potential refinements for increased efficiency.
Automated Tailoring of Large Language Models for Industry-Specific Downstream Tasks
Foundational Large Language Models (LLMs) are pre-trained on huge corpora encompassing broad subjects to become versatile and generalize to future downstream tasks. However, their effectiveness falls short when dealing with tasks that are highly specialized to a specific use case. Even with current prompt engineering techniques such as few-shot or Chain-of-Thought reasoning prompts, the required level of results is not yet achievable with foundational models alone. The alternative approach is to fine-tune the LLM, but a common challenge is the limited availability of task-specific training data. In this talk, we introduce an end-to-end automated framework for tailoring a model to industry-specific downstream tasks, in which the first step is to generate task-specific custom data from unstructured documents. Next, we discuss our optimized distributed training pipeline for fine-tuning LLMs on the generated data. Finally, we provide an overview of the statistical and customized metrics we employ for assessing the performance of the fine-tuned LLM. This automated framework alleviates the burden of manual adjustments and streamlines the process, providing a model fully customized to the unique requirements of any specific business use case.
Fresh Content Recommendation at Scale: A Multi-funnel Solution and the Potential of LLMs
Recommendation systems serve as a conduit connecting users to an incredibly large, diverse, and ever-growing collection of content. In practice, missing information on fresh content needs to be filled in so that it can be exposed to and discovered by its audience. In this context, we share our success stories in building a dedicated fresh-content recommendation stack on a large commercial platform and shed light on the use of Large Language Models (LLMs) for fresh-content recommendation in an industrial setting. To nominate fresh content, we built a multi-funnel nomination system that combines (i) a two-tower model with strong generalization power for coverage, and (ii) a sequence model with near real-time updates on user feedback for relevance, effectively balancing coverage and relevance. Beyond that, the reasoning and generalization capabilities of LLMs present exciting prospects for enhancing recommendation systems. We share our initial efforts on employing LLMs as data augmenters to bridge the knowledge gap on cold-start items during the training phase. This approach circumvents the costly generation process during inference, presenting a model-agnostic, forward-looking solution for fresh-content recommendation.
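As a rough illustration of the two-tower nomination idea mentioned above, the sketch below scores fresh candidates by the dot product of user and item embeddings produced by separate encoders. The single linear "towers", dimensions, and random features are purely illustrative, not the production model:

```python
# Toy two-tower nomination: separate user and item encoders map
# heterogeneous features into a shared embedding space, and fresh
# candidates are ranked by dot-product similarity to the user.
import numpy as np

rng = np.random.default_rng(0)
d_user, d_item, d_emb = 8, 6, 4

# Stand-in "towers": single linear layers (real towers are deep nets
# trained on engagement data).
W_user = rng.normal(size=(d_user, d_emb))
W_item = rng.normal(size=(d_item, d_emb))

def user_tower(x: np.ndarray) -> np.ndarray:
    return x @ W_user

def item_tower(y: np.ndarray) -> np.ndarray:
    return y @ W_item

user = rng.normal(size=(d_user,))            # one user's feature vector
fresh_items = rng.normal(size=(5, d_item))   # five fresh candidates

# Score all candidates at once; because item embeddings need no
# user-specific computation, they can be indexed for fast retrieval.
scores = item_tower(fresh_items) @ user_tower(user)
nominated = np.argsort(-scores)[:2]          # top-2 nominations
print(nominated)
```

The decoupling of the two towers is what gives the generalization and coverage properties the abstract refers to: fresh items get embeddings from their content features alone, before any engagement signal exists.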
Lessons Learnt from Building Friend Recommendation Systems
Friend recommendation systems in online social networks such as Snapchat help users find friends and build meaningful connections, leading to heightened user engagement and retention. While friend recommendation systems follow the classical recommendation system paradigm of retrieval and ranking, they pose distinctive challenges different from item recommendation systems (e.g., YouTube videos, Amazon products, Netflix movies) and require special considerations. In this paper, we elucidate the unique challenges encountered and share insights from developing the friend recommendation system for hundreds of millions of users on Snapchat.
AAGenRec: A Novel Approach for Mitigating Inter-task Interference in Multi-task Optimization of Sequential Behavior Modeling
Multi-task optimization is an emerging research field in recommender systems that aims to enhance the recommendation performance across multiple tasks. Various methodologies have been introduced to address aspects like balancing task weights, handling gradient conflicts, and achieving Pareto optimality. These approaches have shown promise in specific contexts, but are not well-suited for real-world scenarios that involve user sequential behaviors. To address this gap, we present AAGenRec, a novel and effective solution for sequential behavior modeling within multi-task recommender systems, inspired by concepts from acoustic attenuation and genetic differences. Specifically, AAGenRec leverages an established genetic distance method to quantify the dissimilarity between tasks, then introduces an impact attenuation mechanism to mitigate the uncertain task interference in multi-task optimization. Extensive experiments conducted on public e-commerce datasets demonstrate the effectiveness of AAGenRec.
SESSION: WSDM Day Talk Abstracts
Automated Topic Generation for the Mexican Platform for Access to Government Public Information During the Period 2003-2020
The right of access to public information, as safeguarded by national laws and international agreements, is a fundamental guarantee that ensures the ability of every citizen to request and receive government-generated information. In Mexico, the National Institute for Transparency, Access to Information, and Protection of Personal Data (INAI) is responsible for upholding this right. INAI has implemented policies aimed at enhancing access to information, with the National Transparency Platform (NPT) being a notable initiative [ford2016evolucion]. On this platform, individuals from around the world can submit information requests to the Mexican government.
The application of automated techniques for topic generation, such as Latent Dirichlet Allocation (LDA), replaces the labor-intensive process of manually identifying themes within public information requests to the Mexican government, resulting in a more efficient solution [berliner2018information]. The research adhered to a rigorous methodological approach, involving data collection over the period from 2003 to 2020, with a total of 2,518,875 information requests. The data underwent preprocessing, which included the removal of special characters, segmentation into smaller units (tokenization), and the elimination of common "stop words." The vocabulary was automatically refined based on Zipf's law, enhancing the model's efficiency without compromising accuracy. Subsequently, an LDA topic model was generated using a genetic algorithm [aldana2015clustering], guided by scores reflecting the coherence of topics. The results, in combination with the optimized vocabulary, led to an improved LDA model customized to meet specific requirements. Topics specific to each state were identified and interpreted using keywords and probabilities.
To evaluate the similarity between topics, topic embedding vectors from the BETO model were employed, and the K-Means algorithm was used to cluster descriptions into coherent groups [CaneteCFP2020]. The primary issue addressed in this study is the enhancement of topic generation from public government information requests. Automated topic generation plays a pivotal role in information retrieval, with the principal challenge being the achievement of both accuracy and efficiency in generating topics from a diverse array of information requests.
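A minimal sketch of the pipeline described above (tokenization, frequency-based vocabulary pruning, LDA, and clustering of documents in topic space) follows. The toy corpus, frequency cutoffs, and topic count are illustrative; simple document-frequency thresholds stand in for the Zipf's-law vocabulary refinement and genetic-algorithm search used in the study, and BETO embeddings could be substituted for the topic vectors:

```python
# Toy topic-generation pipeline: vectorize requests with frequency
# pruning, fit LDA, then cluster documents by their topic mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

requests = [
    "solicitud informacion presupuesto salud publica",
    "presupuesto salud hospitales publicos",
    "contratos obra publica carretera",
    "licitacion obra carretera municipal",
]

# Frequency-based vocabulary pruning (a crude stand-in for the
# Zipf's-law refinement): drop terms that appear in nearly all docs.
vectorizer = CountVectorizer(min_df=1, max_df=0.9)
counts = vectorizer.fit_transform(requests)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-document topic distribution

# Cluster documents by their topic mixtures into coherent groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_topics)
print(clusters)
```

In the actual study the topic count and vocabulary were tuned by a genetic algorithm guided by topic-coherence scores, and per-state topics were interpreted from the learned keyword probabilities.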
In this study, we introduce a novel method for automating the generation of topics from information requests. Our approach exceeds previous expectations by delivering significantly higher accuracy and efficiency in topic generation. This advancement enables the automatic extraction of pertinent topics from information requests, offering a substantial improvement in data governance and retrieval across various fields of study.
Our findings represent a significant advancement in the field of automated topic generation. A general geographic pattern reveals that a majority of requests are concentrated in a limited number of states and municipalities, including Mexico City, the State of Mexico, and Jalisco. To obtain preliminary results and validate the methodology, data corresponding to the year 2020 was used, comprising a total of 679,173 public information requests.
In relation to the outcomes of topics categorized by geographic location, as depicted in Figure 1, the first group primarily focuses on matters related to justice, legality, and labor issues. The second group addresses concerns pertaining to the environment and natural resources. The third group delves into similar themes regarding migration and energy. The fourth group places its emphasis on educational and health issues associated with COVID. The fifth group addresses subjects related to trade and personal data protection.
This comprehensive study offers valuable insights for enhancing transparency and open data policies, safeguarding citizens’ information access rights, and promoting transparency across various sectors. Furthermore, automating topic generation holds the potential to revolutionize information access and organization, with a wide range of applications in research, government data retrieval, interdisciplinary research, public policy, and open data advocacy.
Genomic-World Fungi Data: Synteny Part
Fungal genomes undergo duplication events such as Whole Genome Duplication (WGD), which contribute to rapid adaptation and increased genetic diversity. Our synteny analyses provide insights into genome integrity, completeness, evolutionary patterns, and functional conservation across different fungal species. Highly conserved synteny patterns underscore the significance of cell membrane proteins, proteolytic capabilities, dynein motor proteins, and endocytosis processes in Basidiomycota, supporting the essential roles these processes play in development throughout evolution. This research establishes the groundwork for further exploration of genomic diversity and for applying synteny analysis to elucidate evolutionary patterns and ecological relationships among fungal species.
Profiling Urban Mobility Patterns with High Spatial and Temporal Resolution: A Deep Dive into Cellphone Geo-position Data
Traditionally, urban travel patterns have been obtained through origin-destination surveys. This method has drawbacks such as high costs, limited representativeness of the surveyed population, and low spatial and temporal resolution. This study proposes using historical mobile-device geolocation data to depict population displacement patterns at high spatial and temporal resolution. As an illustrative example, travel patterns were derived for a Latin American megacity (the metropolitan area of Monterrey, Mexico) from a database of 0.7 million users monitored over three months. We describe the solutions formulated to tackle the challenges posed by this method, as well as the use of the gathered information to obtain dynamic origin-destination matrices, quantify the average number of daily trips and kilometers traveled per inhabitant, estimate hourly population density, and identify the destinations that attract the most trips. We also suggest using this information to assess the impact of massive events such as concerts and sports gatherings on city mobility and air pollution.
Integrating Knowledge Graph Data with Large Language Models for Explainable Inference
We propose a method that enables Large Language Models to access Knowledge Graph (KG) data and to justify their text generation by showing the specific graph data the model accessed during inference. For this, we combine language models with methods from Neurosymbolic Artificial Intelligence designed to answer queries on Knowledge Graphs. This is done by modifying the model so that, at different stages of inference, it outputs an Existential Positive First-Order (EPFO) query, which is then processed by an additional query appendix. In turn, the query appendix uses neural link predictors along with description-aware embeddings to resolve these queries. The queries are then logged and used as an explanation of the inference process of the complete model. Lastly, we train the model using a Linear Temporal Logic (LTL) constraint-based loss function to measure the consistency of the queries among each other and with the final model output.
Preserving Heritage: Developing a Translation Tool for Indigenous Dialects
The preservation and understanding of indigenous languages are crucial, given their substantial contribution to the cultural and linguistic heritage of communities. Despite their undeniable value, these languages are threatened by extinction due to a dwindling number of native speakers and the predominance of oral traditions over written forms. In this context, this study aims to contribute to the conservation of these languages through the development of a Spanish-to-indigenous-language translator. The research employs neural machine translation technology, investigating three distinct approaches: a transformer-based translation model, fine-tuning a Finnish translator, and fine-tuning a multilingual translator. The results obtained from these methodologies are promising, demonstrating competitive viability when compared to the limited existing research in this field.
Automatic Extraction of Patterns in Digital News Articles of Femicides occurred in Mexico by Text Mining Techniques
Femicides in Mexico are a severe social problem that, according to recent research, is increasing [7]. Although it is true that Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (SESNSP) periodically publishes information on the incidence of femicides at the national level, this information is limited to counting femicides by states in the country [5].
Given this scenario, unstructured data related to femicides in Mexico was the best source of information for this research. We analyzed 5,000 texts from digital newspaper articles from the most important newspapers in Mexico. These articles were collected by different activists during the period from 2016 to 2020.
SESSION: Workshop Abstracts
WSDM 2024 Workshop on Large Language Models for Individuals, Groups, and Society
This workshop discusses the cutting-edge developments in research and applications of personalizing large language models (LLMs) and adapting them to the demands of diverse user populations and societal needs. The full-day workshop includes several keynotes and invited talks, a poster session and a panel discussion.
The 3rd International Workshop on Interactive and Scalable Information Retrieval Methods for eCommerce (ISIR-eCom 2024)
Over the past few years, consumer behavior has shifted from traditional in-store shopping to online shopping. For example, eCommerce sales have grown from around 5% of total US sales in 2012 to around 15.4% in 2023. This rapid growth of eCommerce has created new challenges and vital new requirements for intelligent information retrieval systems, which leads to the primary motivations of this workshop:
(1) Since the pandemic, eCommerce has become an important part of people's routine, and they use online shopping for everything from the smallest grocery items to big electronics and even cars. With such a large assortment of products and millions of users, achieving higher scalability without losing accuracy is a leading concern for eCommerce information retrieval systems.
(2) The diversity of buyers makes the relevance of results highly subjective, because relevance varies across buyers. The most suitable and intuitive solution to this problem is to make the system interactive so that it provides the right relevance for each user. Hence, interactive information retrieval systems are becoming a necessity in eCommerce.
(3) To handle sudden changes in buyers' behavior, industry adopted existing sub-optimal information retrieval techniques for various eCommerce tasks. In parallel, practitioners have also started researching better solutions and are in dire need of help from the research community.
This workshop will provide a forum to discuss and learn the latest trends for interactive and scalable information retrieval approaches for eCommerce. It will provide academic and industrial researchers a platform to present their latest works, share research ideas, present and discuss various challenges, and identify the areas where further research is needed. It will foster the development of a strong research community focused on solving eCommerce-related information retrieval problems that provide superior eCommerce experience to all users.
The 5th International Workshop on Machine Learning on Graphs (MLoG)
Graphs, which encode pairwise relations between entities, are a kind of universal data structure for much real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. Recently, machine learning techniques have been widely developed and utilized to effectively tame graphs, discovering actionable patterns and harnessing them to advance various graph-related computational tasks. Huge success has been achieved and numerous real-world applications have benefited from it. However, since we are now generating and gathering data in a much faster and more diverse way, real-world graphs are becoming increasingly large-scale and complex. More dedicated efforts are needed to propose more advanced machine learning techniques and properly deploy them for real-world applications in a scalable way. Thus, we organize The 5th International Workshop on Machine Learning on Graphs (MLoG) (https://mlog-workshop.github.io/wsdm24.html), held in conjunction with the 17th ACM Conference on Web Search and Data Mining (WSDM), which provides a venue to gather academic and industry researchers and practitioners to present recent progress on machine learning on graphs.
Integrity 2024: Integrity in Social Networks and Media
Integrity 2024 is the fifth edition of the Workshop on Integrity in Social Networks and Media, held in conjunction with the ACM Conference on Web Search and Data Mining (WSDM) since the 2020 edition [1-4]. The goal of the workshop is to bring together academic and industry researchers working on integrity, fairness, trust and safety in social networks to discuss the most pressing risks and cutting-edge technologies to reliably measure and mitigate them. The event consists of invited talks from academic experts and industry leaders as well as peer-reviewed papers and posters through an open call-for-papers.
WSDM 2024 Workshop on Representation Learning & Clustering
Data clustering and representation learning play an indispensable role in data science. They are very useful to explore massive data in many fields, including information retrieval, natural language processing, bioinformatics, recommender systems, and computer vision. Despite their success, most existing clustering methods are severely challenged by the data generated by modern applications, which are typically high dimensional, noisy, heterogeneous, and sparse or even collected from multiple sources or represented by multiple views where each describes a perspective of the data. This has driven many researchers to investigate new effective clustering models to overcome these difficulties. One promising category of such models relies on representation learning. Indeed, learning a good data representation is crucial for clustering algorithms, and combining the two tasks is a common way of exploring this type of data. The idea is to embed the original data into a low dimensional latent space and then perform clustering on this new space. However, both tasks can be carried out sequentially or jointly. Many clustering algorithms, including deep learning versions, are based on these two modes of combining the two tasks.
Psychology-informed Information Access Systems Workshop
The Psychology-informed Information Access Systems (PsyIAS) workshop bridges the fields of machine learning and psychology, aiming to connect the research communities of information retrieval, recommender systems, natural language processing, as well as cognitive and behavioral psychology. It serves as a forum for multidisciplinary discussions about the use of psychological constructs, theories, and empirical findings for modeling and predicting user preferences, intents, and behaviors. PsyIAS particularly focuses on research that incorporates such psychology-inspired models into the search, retrieval, and recommendation processes, creates corresponding algorithms and systems, or looks into the role of cognitive processes underlying human information access. More information can be found at https://sites.google.com/view/psyias.