Tutorials
If demand for a tutorial exceeds room capacity, seats will be allocated on a first-come, first-served basis.
Understanding Offline Political Systems by Mining Online Political Data
Monday, February 22, 2016
9:00 - 12:00
Oren Tsur, David Lazer and Tina Eliassi-Rad
https://sites.google.com/site/orentsur/wsdm_2016_tutorial_political_mining
"Man is by nature a political animal", as asserted by Aristotle. This political nature manifests itself in the data we produce and the traces we leave online. In this tutorial, we address a number of fundamental issues regarding the mining of political data: What types of data could be considered political? What can we learn from such data? Can we use the data to predict political changes? How can these prediction tasks be done efficiently? Can we use online socio-political data to get a better understanding of our political systems and of recent political changes? What are the pitfalls and inherent shortcomings of using online data for political analysis? In recent years, with the abundance of data, these questions, among others, have gained importance, especially in light of the global political turmoil and the upcoming 2016 US presidential election. We will introduce relevant political science theory, describe the challenges within the framework of computational social science, and present state-of-the-art approaches bridging social network analysis, graph mining, and natural language processing.
Click Models for Web Search and their Applications to IR
Monday, February 22, 2016
14:00 - 17:00
Ilya Markov, Maarten de Rijke and Aleksandr Chuklin
http://clickmodels.weebly.com/wsdm-2016-tutorial.html
Click models, probabilistic models of the behavior of search engine users, have been studied extensively by the information retrieval community during the last eight years. We now have a handful of basic click models, inference methods, evaluation principles and applications of click models that form the building blocks of ongoing research efforts in the area. The goal of this tutorial is to bring together current efforts in the area, summarize the research performed so far and give a holistic view of existing click models for web search.
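As a rough illustration of what a "basic click model" with an inference method can look like, the sketch below fits a rank-based click-through-rate model, one of the simplest models in the click model literature: the probability of a click depends only on the result's rank and is estimated by maximum likelihood as clicks divided by impressions. This is not taken from the tutorial material; the function name `fit_rank_ctr`, the session format and the toy log are assumptions made for the example.

```python
from collections import defaultdict


def fit_rank_ctr(sessions):
    """Estimate P(click | rank) as clicks / impressions at each rank.

    `sessions` is a list of sessions; each session is a list of 0/1
    click indicators, one per result, ordered by rank (hypothetical format).
    """
    shown = defaultdict(int)    # impressions per rank
    clicked = defaultdict(int)  # clicks per rank
    for clicks in sessions:
        for rank, was_clicked in enumerate(clicks, start=1):
            shown[rank] += 1
            clicked[rank] += int(was_clicked)
    return {rank: clicked[rank] / shown[rank] for rank in sorted(shown)}


# Hypothetical toy click log: 1 = click, 0 = skip, ordered by rank.
toy_sessions = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
rank_ctr = fit_rank_ctr(toy_sessions)
print(rank_ctr)
```

More expressive click models typically add latent variables (for example, whether a result was examined) and need iterative inference rather than simple counting.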
Large Scale Distributed Data Science using Apache Spark
Monday, February 22, 2016
14:00 - 17:30
James Shanahan and Liang Dai
http://wsdm2016-sparktutorial.droppages.com/
Apache Spark is an open-source cluster computing framework. It has emerged as the next-generation big data processing engine, overtaking Hadoop MapReduce, which helped ignite the big data revolution. Spark maintains MapReduce’s linear scalability and fault tolerance, but extends it in a few important ways: it is much faster (up to 100 times faster for certain applications), and it is much easier to program in, thanks to its rich APIs in Python, Java, Scala and R and to its core data abstraction, the distributed data frame. In addition, it goes far beyond batch applications to support a variety of compute-intensive tasks, including interactive queries, streaming, machine learning, and graph processing.
This tutorial will provide an accessible introduction to large-scale distributed machine learning and data mining, and to Spark and its potential to revolutionize academic and commercial data science practices. It is divided into two parts. The first part will cover fundamental Spark concepts, including Spark Core, data frames, the Spark Shell, Spark Streaming, Spark SQL, MLlib, and more. The second part will focus on hands-on algorithmic design and development with Spark, developing algorithms from scratch such as decision tree learning, association rule mining (Apriori), graph processing algorithms such as PageRank and shortest path, and gradient descent algorithms such as support vector machines and matrix factorization. Industrial applications and deployments of Spark will also be presented. Example code will be made available in Python (PySpark) notebooks.
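To give a flavour of "developing algorithms from scratch", here is a minimal single-machine sketch of PageRank by power iteration, written in plain Python rather than PySpark so that it runs without a cluster. It is not the tutorial's code. The graph, the function name `pagerank` and the parameter defaults are assumptions for illustration. The loop is split into a "map"-like step (each node emits rank shares to its neighbours) and a "reduce"-like step (shares are summed per node), which is roughly how the computation would be partitioned in a distributed setting.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `links` maps each node to a list of nodes it links to.
    Every node must appear as a key, even if it has no outgoing links.
    """
    nodes = list(links)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # "Map"-like step: each node sends an equal share of its rank
        # to every node it links to.
        contribs = {node: 0.0 for node in nodes}
        dangling = 0.0  # rank held by nodes without outgoing links
        for node, outs in links.items():
            if outs:
                share = ranks[node] / len(outs)
                for dst in outs:
                    contribs[dst] += share
            else:
                dangling += ranks[node]
        # "Reduce"-like step: combine received shares with the teleport
        # term; dangling rank is spread evenly over all nodes.
        ranks = {
            node: (1 - damping) / n
            + damping * (contribs[node] + dangling / n)
            for node in nodes
        }
    return ranks


# Hypothetical toy graph for illustration.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

In this toy graph, node "c" collects links from three other nodes and ends up with the highest rank, while "d" has no incoming links and keeps only the teleport share.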