With the recent explosive growth of online activities, data are often recorded as heterogeneous graphs, ranging from Facebook’s Open Graph, which records our social and communication activities, to the graphs gathered by major search engine companies that represent a snapshot of our collective knowledge. As demonstrated in many web search and data mining applications, a critical element in making the best use of these data is the ability to assess the relative importance of the nodes.
In the 2016 WSDM Cup, the challenge will be to assess the query-independent importance of scholarly articles, using data from the Microsoft Academic Graph, a large heterogeneous graph comprising publications, authors, venues, organizations, and fields of study. The goal of this ranking challenge is to provide the best static rank values (as defined in http://en.wikipedia.org/wiki/Learning_to_rank or http://www2006.org/programme/files/xhtml/3101/p3101-Richardson.html) for each publication entity in a heterogeneous graph. Static rank plays a key role in recommendation systems, especially in cold start scenarios, and also helps search engines determine the ranking of search results (e.g., for queries like “papers by author x” or “papers about topic y”). Traditional metrics have relied heavily on citation counts, which favor the more established, seminal papers and treat all citations as equal (and positive) indicators of importance and impact. We invite the community to jointly explore and develop better alternatives in this challenge.
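To make the notion of a citation-based static rank concrete, the following sketch computes a basic PageRank-style score over a toy citation graph. This is only the classic baseline the challenge invites participants to improve upon, not a suggested solution; the function name and data layout are illustrative assumptions.

```python
# Illustrative baseline: PageRank-style static rank over a citation graph.
# This treats all citations as equal positive signals, which is exactly
# the limitation the challenge asks the community to move beyond.

def static_rank(citations, damping=0.85, iterations=50):
    """citations: dict mapping each paper to the list of papers it cites."""
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for paper, cited in citations.items():
            if cited:
                # a paper passes its rank, split equally, to the papers it cites
                share = damping * rank[paper] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # dangling paper (cites nothing): spread its mass uniformly
                for p in papers:
                    new_rank[p] += damping * rank[paper] / n
        rank = new_rank
    return rank

# Tiny example: B and C both cite A; C also cites B.
ranks = static_rank({"B": ["A"], "C": ["A", "B"], "A": []})
# A, being the most cited, receives the highest static rank.
```

Note how the most-cited paper dominates regardless of who cites it or why; alternatives might weight citations by the citing paper's venue, recency, or context.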
Microsoft Research has released the Microsoft Academic Graph for use in this challenge, and it is available now on Microsoft Azure. The entire graph can be downloaded (37 GB) or accessed directly on Azure. Should you wish to use Azure in your research, Microsoft Research is making Azure awards available to the research community via the Azure for Research program. The next deadline for award requests is August 15, 2015.
This challenge encompasses two phases. During Phase 1, all entries will be evaluated against human-annotated data consisting of pairwise comparisons of papers by trained and actively practicing researchers. The organizers will provide a weekly leaderboard, and the top teams at the end of Phase 1 will be invited to enter Phase 2, where their algorithms will be re-run against an updated graph and the results will be deployed to a public search engine to gather real-world user engagement data.
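Evaluating a static rank against pairwise human judgments might be sketched as below. The metric shown (fraction of judged pairs on which the submitted scores agree with the annotators) and the data format are assumptions for illustration only; the official evaluation details are defined by the organizers.

```python
def pairwise_agreement(scores, judged_pairs):
    """Fraction of human-judged pairs on which the submitted static rank
    agrees with the annotators.

    scores: dict mapping paper id -> submitted static rank value
    judged_pairs: list of (better, worse) tuples, where annotators judged
                  `better` to be the more important paper

    Hypothetical metric for illustration; not the official Phase 1 measure.
    """
    correct = sum(1 for better, worse in judged_pairs
                  if scores[better] > scores[worse])
    return correct / len(judged_pairs)

# Example: the ranking agrees with two of the three judged pairs.
scores = {"p1": 0.9, "p2": 0.5, "p3": 0.2}
pairs = [("p1", "p2"), ("p2", "p3"), ("p3", "p1")]
agreement = pairwise_agreement(scores, pairs)  # 2/3
```

A pairwise metric like this sidesteps the need for annotators to produce a full ordering of millions of papers: each judgment only requires comparing two papers from a related field.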