We’re delighted to have three distinguished researchers to give keynotes at WSDM 2015.
- Making Sense of Big Data with the Berkeley Data Analytics Stack, Michael Franklin (UC Berkeley)
- The Information Life of Social Networks, Lada Adamic (Facebook)
- Learning from User Interactions, Thorsten Joachims (Cornell University)
Making Sense of Big Data with the Berkeley Data Analytics Stack
Abstract: The Berkeley AMPLab is creating a new approach to data analytics. Launched in early 2011, the lab aims to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and in crowds). The lab is realizing its ideas through the development of a freely available Open Source software stack called BDAS: the Berkeley Data Analytics Stack. In the four years the lab has been in operation, we’ve released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS features prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project.
Given this initial success, the lab is continuing on its research path, moving “up the stack” to better integrate and support advanced analytics and to make people a full-fledged resource for making sense of data. In this talk, I’ll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on our newest efforts, including some or all of: the GraphX graph processing system, the Velox and MLBase machine learning platforms, and the SampleClean framework for hybrid human/computer data cleaning. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost, and quality requirements throughout the analytics lifecycle.
Bio: Michael Franklin is the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Science Division at the University of California, Berkeley. He has over 30 years of experience in the database, data analytics, and data management fields as a researcher, lab director, faculty member, entrepreneur, and software developer. Prof. Franklin is also the Director of the Algorithms, Machines, and People Laboratory (AMPLab) at UC Berkeley. The AMPLab currently works with 23 industrial sponsors including founding sponsors Amazon Web Services, Google, and SAP, and received a National Science Foundation CISE “Expeditions in Computing” award, announced as part of the White House Big Data research initiative in March 2012. AMPLab is well known for creating a number of popular systems in the Open Source Big Data ecosystem including Spark, Mesos, Shark, GraphX and MLlib, all parts of the Berkeley Data Analytics Stack (BDAS). Prof. Franklin is also a co-PI and Executive Committee member for the Berkeley Institute for Data Science, part of a multi-campus initiative to advance Data Science Environments. He is an ACM Fellow, a two-time winner of the ACM SIGMOD “Test of Time” award, and recipient of the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley.
The Information Life of Social Networks
Abstract: Vast amounts of information are propagated in online social networks such as Facebook. This talk will describe several studies characterizing how information diffuses over social ties, from the growth of individual cascades to the predictability of their eventual size. It will also characterize the diffusion of specific kinds of information, including rumors, memes, and social movements.
Bio: Lada Adamic leads the Product Science group within Facebook’s Data Science Team. She is also an adjunct associate professor at the University of Michigan’s School of Information and Center for the Study of Complex Systems. Her research interests center on information dynamics in networks: how information diffuses, how it can be found, and how it influences the evolution of a network’s structure. Her projects have included identifying expertise in online question and answer forums, studying the dynamics of viral marketing, and characterizing the structural and communication patterns in online social media. She has received an NSF CAREER award, a University of Michigan Henry Russell award, and the 2012 Lagrange Prize in Complex Systems.
Learning from User Interactions
Abstract: The ability to learn from user interactions can give systems access to unprecedented amounts of world knowledge. This is already evident in search engines, recommender systems, and electronic commerce, and other applications are likely to follow in the near future (e.g., education, smart homes). More generally, the ability to learn from user interactions promises pathways for solving knowledge-intensive tasks ranging from natural language understanding to autonomous robotics.
Learning from user interactions, however, means learning from data that does not necessarily fit the assumptions of the standard machine learning models. Since interaction data consists of the choices that humans make, it has to be interpreted with respect to how humans make decisions, which is influenced by the decision context and constraints like human motivation and human abilities.
In this talk, I argue that we need learning approaches that explicitly model user-interaction data as the result of human decision making. To this effect, the talk explores how integrating micro-economic models of human behavior into the learning process leads to new learning algorithms that have provable guarantees under verifiable assumptions and to learning systems that perform robustly in practice. These findings imply that the design space of such human-interactive learning systems encompasses not only the machine learning algorithm itself, but also the design of the interaction under an appropriate model of user behavior.
Bio: Thorsten Joachims is a Professor in the Department of Computer Science and in the Department of Information Science at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in information access, language technology, and recommendation. His past research focused on support vector machines, text classification, structured output prediction, convex optimization, learning to rank, learning with preferences, and learning from implicit feedback. In 2001, he finished his dissertation, advised by Prof. Katharina Morik, at the University of Dortmund, where he had also received his Diplom in Computer Science in 1997. Between 2000 and 2001 he worked as a postdoc at the GMD Institute for Autonomous Intelligent Systems. From 1994 to 1996 he was a visiting scholar with Prof. Tom Mitchell at Carnegie Mellon University.