WSDM CUP 2024 – Xiaohongshu – Conversational Multi-Doc QA

Introduction:

Despite progress in large language model-based chatbots, conversational question-answering (QA) is still challenging, especially with current or trending topics. A typical solution involves providing relevant documents for the models to reference. However, it’s often noted that these documents can overwhelm or mislead the language models.

We invite you to participate in this conversational QA challenge, which features a mix of relevant and irrelevant documents from Xiaohongshu. Your systems will be trained on real-world data and assessed using criteria that evaluate both lexical and semantic relatedness. The top-3 teams will be awarded prizes of $1500, $1000, and $500 USD, respectively.

Competition:

  • Title
    • Conversational Multi-Doc QA
  • Rules
    • Your models are required to answer user questions based on the conversational history and the provided reference documents (a minimal input-assembly sketch follows this list)
      • Input: History, Reference Documents, Question.
      • Output: Answer.
    • Model Scale Requirement: Ensure your model has fewer than 14 billion (14B) parameters. The overall solution will be reviewed after the submission deadline.
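
How the three inputs are combined into a model prompt is left to each team. The snippet below is a minimal sketch of one way to flatten a sample into a single prompt string; the field names follow the Data section below, and the template wording is purely illustrative, not part of the official task.

Python
# Illustrative only: one possible way to assemble the model input.
def build_prompt(sample: dict) -> str:
    parts = []
    # Reference documents (at most 5, see the Data section).
    for i, doc in enumerate(sample.get("documents", []), start=1):
        parts.append(f"[Document {i}] {doc}")
    # Conversational history as sequential QA pairs.
    for turn in sample.get("history", []):
        parts.append(f"User: {turn['question']}")
        parts.append(f"Assistant: {turn['answer']}")
    # Current user question.
    parts.append(f"User: {sample['question']}")
    parts.append("Assistant:")
    return "\n".join(parts)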

Data:

  • Description
    • Format: training, eval, and test data are provided in `json` format; each sample includes the following fields (see the example and the end-to-end sketch below):
      • uuid: string, a unique identifier for each example
      • history: list of tuples of strings, sequential QA pairs
      • documents: list of strings, at most 5 reference documents
      • question: string, user question
      • answer: string, reference answer (not given in test data)
      • keywords: list of strings, reference keywords that should preferably appear in the answer (not given in the training/eval/test sets)

Example:

Python
# Training example.
{
    "uuid": "xxxxx",
    "history": [
        {"question": "xxx", "answer": "xxx"},
        {"question": "xxx", "answer": "xxx"},
        …
    ],
    "documents": [
        "Jun 17th through Fri the 21st, 2024 at the Seattle Convention Center, Vancouver Convention Center.",
        "Workshops within a \"track\" will take place in the same room (or be co-located), and workshop organizers will be asked to work closely with others in their track …",
        …
    ],
    "question": "Where will CVPR 2024 happen?",
    "answer": "CVPR 2024 will happen at the Seattle Convention Center, Vancouver.",
    "keywords":  # Will not be given.
    [
        "Vancouver", "CVPR 2024", "Seattle Convention Center"
    ]
}
 
# Submission example for the eval/test phase.
[
    {
        "uuid": "xxxxx",
        "prediction": "CVPR 2024 will happen at the Seattle Convention Center, Vancouver."
    },
    …
]
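
To make the formats above concrete, the sketch below reads an eval/test file, runs a placeholder generate_answer function (standing in for whatever model a team actually uses), and writes predictions in the submission layout shown above. The file names and the generate_answer helper are hypothetical, and the sketch assumes the data file holds a JSON list of samples.

Python
# Hypothetical end-to-end sketch: read samples, predict, write a submission file.
import json

def generate_answer(sample: dict) -> str:
    # Placeholder: replace with real model inference.
    return "..."

# "eval.json" / "submission.json" are assumed file names, not official ones.
with open("eval.json", "r", encoding="utf-8") as f:
    samples = json.load(f)  # list of samples with the fields described above

submission = [
    {"uuid": s["uuid"], "prediction": generate_answer(s)}
    for s in samples
]

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(submission, f, ensure_ascii=False, indent=2)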

Evaluation:

  • Submission:
    • We use Codabench to host the competition; please refer to https://www.codabench.org/competitions/1772/ for details.
    • Format: participants should submit their results in `json` format, where each entry corresponds to a test example and includes the following fields:
      • uuid: string, a unique identifier for each test example
      • prediction: string, your answer
    • Criterion
      • Metrics (a minimal scoring sketch follows this list):
        • Keywords Recall: whether the answer contains the ground-truth keywords, measured by exact matching (see the keywords field in the example data).
        • ROUGE-L: whether the answer is similar to the reference answer, measured by fuzzy matching (see the answer field in the example data).
      • Ranking Procedure:
        • The overall performance will be determined by examining the mean rank of the above metrics on the Phase 2 (Test set) leaderboard.
        • In cases where teams have identical mean ranks, preference will be given to the team with the higher ROUGE-L score.
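
The official scoring code is not published here, so the sketch below only approximates the two metrics under stated assumptions: Keywords Recall is taken as the fraction of reference keywords that appear verbatim in the prediction, and ROUGE-L is computed with the open-source rouge-score package, whose tokenization (and therefore exact scores) may differ from the organizers' implementation.

Python
# Approximate scoring sketch; not the official evaluation code.
from rouge_score import rouge_scorer  # pip install rouge-score

def keywords_recall(prediction: str, keywords: list) -> float:
    # Fraction of reference keywords found verbatim (exact match) in the prediction.
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw in prediction)
    return hits / len(keywords)

def rouge_l(prediction: str, reference: str) -> float:
    # ROUGE-L F-measure; tokenization here may differ from the official scorer.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, prediction)["rougeL"].fmeasure

# Example with the sample shown earlier.
pred = "CVPR 2024 will happen at the Seattle Convention Center, Vancouver."
ref = "CVPR 2024 will happen at the Seattle Convention Center, Vancouver."
kws = ["Vancouver", "CVPR 2024", "Seattle Convention Center"]
print(keywords_recall(pred, kws), rouge_l(pred, ref))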

Organization: