Deadlines

November 18, 2024: Start of Competition
February 3, 2025: Submission Deadline
March 10-14, 2025: Award Ceremony

All deadlines are at 11:59 PM UTC on the corresponding day unless otherwise noted.

Participate Here

We are delighted to announce the 2025 WSDM Cup: LMSYS Multilingual Chatbot Arena prediction contest, in collaboration with Kaggle and LMSYS.org.

Background

As generative AI methods like large language models become increasingly critical to web-scale search and understanding, LLM evaluation has taken on central importance. Static evaluation datasets are fraught with challenges, both because of contamination and leakage and because of the difficulty of accurately rating the free-text responses these models generate. Human raters provide a useful gold standard, but this approach has obvious scalability limitations. Auto-raters, which use one LLM as a judge of other LLMs' outputs, promise scalability, but they are trustworthy only when we have a strong basis for demonstrating that the judge is a faithful proxy for human evaluations.

The creation of useful auto-rater models also aligns with the concept of reward models, or preference models, in reinforcement learning from human feedback (RLHF). Previous research has identified limitations in directly prompting an existing LLM for preference predictions, often stemming from biases such as favoring whichever response is presented first (position bias), favoring longer responses (verbosity bias), or favoring the judge's own outputs (self-enhancement bias).
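As a concrete illustration, position bias is commonly controlled by querying the judge twice with the response order swapped and keeping only consistent verdicts. The sketch below shows this general technique under assumed names: `llm_judge` is a hypothetical stand-in for a real model call, not an API from the competition materials.

```python
def llm_judge(prompt: str, response_a: str, response_b: str) -> str:
    """Hypothetical judge call; returns 'A' or 'B' for the preferred response."""
    raise NotImplementedError  # stand-in for a real LLM API call


def debiased_preference(prompt: str, resp_1: str, resp_2: str) -> str:
    """Query the judge twice with the responses in both orders.

    If the verdict flips when the order is swapped, the judgment is
    position-dependent on this pair, so we report a tie instead.
    """
    first = llm_judge(prompt, resp_1, resp_2)   # resp_1 shown in slot A
    second = llm_judge(prompt, resp_2, resp_1)  # resp_1 shown in slot B
    if first == "A" and second == "B":
        return "resp_1"  # consistent preference for resp_1
    if first == "B" and second == "A":
        return "resp_2"  # consistent preference for resp_2
    return "tie"         # verdict changed with position
```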

The Challenge

In this iteration of the WSDM Cup, we challenge competitors to create useful and efficient auto-rater models. To ensure that these models are truly reflective of human evaluations, we have partnered with LMSYS.org, which has become one of the leading resources in the field through its well-known chatbot arena, lmarena.ai. The arena has led the way in open-ended, community-based evaluation of LLMs through blind side-by-side comparisons, and it provides the perfect resource for training and evaluating auto-rater models.
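For context on how such blind pairwise votes become model rankings, the arena community has aggregated verdicts with rating systems in the Elo family (LMSYS has also described Bradley-Terry variants). The following is a minimal sketch of a generic Elo-style update for illustration only, not the arena's actual scoring code.

```python
K = 32  # update step size, a conventional Elo choice

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, outcome: float) -> tuple[float, float]:
    """outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    return r_a + K * (outcome - e_a), r_b + K * ((1.0 - outcome) - (1.0 - e_a))

# Example: two models start at 1000; model A wins one battle.
ra, rb = update(1000.0, 1000.0, 1.0)
print(ra, rb)  # A gains the points B loses: 1016.0, 984.0
```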

Because the WSDM community is global, we focus this competition on the creation of multilingual auto-rater models using chatbot arena data.
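Concretely, each training example pairs a user prompt with two model responses and the human voter's verdict, and the auto-rater's job is to predict that verdict. The minimal sketch below uses illustrative column names (`prompt`, `response_a`, `response_b`, `winner`), which are assumptions rather than the official schema (see the Kaggle page for the real data description), and a deliberately naive length-based baseline that also shows why verbosity bias matters.

```python
import pandas as pd

# One toy multilingual battle; real data would hold many such rows.
battles = pd.DataFrame({
    "prompt":     ["¿Cuál es la capital de Francia?"],
    "response_a": ["La capital de Francia es París."],
    "response_b": ["No lo sé."],
    "winner":     ["model_a"],  # label: which response the voter preferred
})

def length_baseline(row: pd.Series) -> str:
    """Naive baseline: guess that the longer response wins.

    Useful mainly as a sanity check; its accuracy also hints at how much
    verbosity bias a learned judge could pick up from the labels.
    """
    longer_a = len(row["response_a"]) >= len(row["response_b"])
    return "model_a" if longer_a else "model_b"

battles["pred"] = battles.apply(length_baseline, axis=1)
print((battles["pred"] == battles["winner"]).mean())  # baseline accuracy
```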

Kaggle is generously hosting the competition and providing a prize fund of $50,000 USD. The competition begins on Monday, November 18, 2024, and the final deadline for submissions is February 3, 2025, so now is the time to get started.

Get Started

For full details and to participate, please visit the WSDM Cup Multilingual Chatbot Arena competition page on Kaggle.