The “Practice and Experience” talks represent our first attempt at WSDM to weave together cutting-edge academic research and real-world practice from top industry experts. These talks are organized thematically and presented alongside research papers in the conference’s single-track format.
Vaclav Petricek (eHarmony)
Humans have a mixed record in choosing romantic partners. Are looks or brains more important for a happy marriage?
Matchmaking is an age-old concept that has been revolutionized by the advent of the Internet. Suddenly, the pool of potential partners one can plausibly consider has exploded, and because courtship has moved online, we can now collect and analyze unprecedented amounts of data on romantic interactions.
If you are looking for love, you may want to take advantage of this accumulated knowledge to give yourself a leg up. However, drawing causal inferences from this data (a.k.a. “dating advice”) can be problematic because of various sampling biases. Instead, I will show how we leverage this rich data, Hadoop, Vowpal Wabbit, GBMs (gradient boosting machines), graph optimization, and contextual bandit techniques to build a matchmaking system that improves your chances of a happy marriage.
I will focus specifically on how we approach three problems:
- Compatibility: matching for the long term based on psychological traits
- Affinity: modeling immediate attraction
- Distribution: whom to introduce to whom, and when
Did you know that, according to a recent PNAS study, eHarmony couples are happier and less likely to break up or divorce than couples who met offline or through all other dating sites combined?
About the speaker: Vaclav Petricek is a Director of Machine Learning at Santa Monica-based eHarmony, where he is responsible for data science and optimization in eHarmony’s core matchmaking algorithms. He also runs the Los Angeles Machine Learning Meetup group and helps startups build recommender systems. Before joining eHarmony, Vaclav was a researcher at University College London, where he focused on collaborative filtering, social networks, web structure, and online auctions. Before that, he worked at several Czech startups. Vaclav earned his PhD in Computer Science and a master’s degree in Distributed Systems from Charles University in Prague, Czech Republic.
Graph Search: Personalized Search over a Trillion Connections
Xiao Li (Facebook)
The Facebook social network service is built upon an enormous graph representing over a billion users and entities, hundreds of billions of photos, and a trillion connections. Graph Search is a personalized search engine that understands user queries expressed in natural language, seeks answers by traversing the relevant graph edges, and ranks results using various signals extracted from the graph. This lets users find, in aggregate, restaurants in New York that their friends have visited, or stories about Nelson Mandela by residents of South Africa. In this talk, I describe the nuts and bolts of the Graph Search system, focusing on real-world challenges and solutions in indexing, retrieval, ranking, and natural language processing.
About the speaker: Xiao Li is an engineering manager on the Graph Search team at Facebook. She received her Ph.D. from the University of Washington, Seattle, in 2007. From 2007 to 2011, before joining Facebook, she was a researcher at Microsoft Research, Redmond. She has published over 40 refereed papers in speech and language understanding, information retrieval, and machine learning, and is an inventor on over 20 granted or pending patents. She was named one of MIT Technology Review’s Innovators Under 35 in 2011.
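To make “seeking answers through the traversal of relevant graph edges” concrete, here is a minimal sketch assuming a toy in-memory graph with hypothetical edge types (`friend`, `visited`, `is_a`, `located_in`) and hypothetical node names. A query like “restaurants in New York my friends have visited” becomes a two-hop walk (searcher to friends, friends to places), filtered by type and location edges, and ranked by a single stand-in signal: how many friends visited each place. This illustrates the idea only; the abstract does not describe how Graph Search actually indexes or ranks.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph: edges[(node, edge_type)] -> set of neighbor nodes.
edges = defaultdict(set)


def add_edge(src, edge_type, dst):
    edges[(src, edge_type)].add(dst)


def neighbors(node, edge_type):
    # .get avoids inserting empty entries into the defaultdict on lookup.
    return edges.get((node, edge_type), set())


def friends_visited_restaurants(user, city):
    """Two-hop traversal: user -friend-> friend -visited-> place.

    Keeps places that are restaurants located in `city`, ranked by the
    number of friends who visited them (a stand-in for real ranking signals).
    """
    counts = Counter()
    for friend in neighbors(user, "friend"):
        for place in neighbors(friend, "visited"):
            if "restaurant" in neighbors(place, "is_a") and city in neighbors(place, "located_in"):
                counts[place] += 1
    # Most-visited first; ties broken by name for a deterministic order.
    return sorted(counts, key=lambda p: (-counts[p], p))


# Usage with made-up data.
add_edge("alice", "friend", "bob")
add_edge("alice", "friend", "carol")
add_edge("bob", "visited", "deli_a")
add_edge("bob", "visited", "bistro_b")
add_edge("carol", "visited", "deli_a")
add_edge("carol", "visited", "cafe_c")
for place, city in [("deli_a", "New York"), ("bistro_b", "New York"), ("cafe_c", "San Francisco")]:
    add_edge(place, "is_a", "restaurant")
    add_edge(place, "located_in", city)

print(friends_visited_restaurants("alice", "New York"))  # ['deli_a', 'bistro_b']
```

At real scale the per-query walk would be served from distributed indexes rather than a Python dictionary, but the shape of the query, expanding edges and then aggregating and ranking, is the same.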
Challenges and New Directions in Recommendation Systems
Guy Lebanon (Amazon)
The construction of modern recommendation systems in industry owes a large debt to the discoveries of the academic community. However, the field of recommendation systems in industry is currently facing new challenges, which are not adequately addressed by the major research efforts in the academic community. I will survey the growing gap between academic and industry efforts in recommendation systems, consider several major challenges, and explore several new directions in this space.
About the speaker: Guy Lebanon is a senior manager at Amazon, where he leads the Machine Learning Science Group. Prior to that, he was a tenured professor at the Georgia Institute of Technology and a scientist at Google and Yahoo. His main research areas are statistical machine learning, computational statistics, and information visualization. Guy received his PhD in 2005 from Carnegie Mellon University and his BA and MS degrees from the Technion – Israel Institute of Technology. Dr. Lebanon has authored over 60 refereed publications. He is an action editor of the Journal of Machine Learning Research, was the program chair of the 2012 ACM CIKM Conference, and will be the conference co-chair of AI & Statistics (AISTATS 2015). He received the NSF CAREER Award, the ICML best paper runner-up award, and the Yahoo Faculty Research and Engagement Award, and is a Siebel Scholar.
Response Prediction for Display Advertising
Olivier Chapelle (Criteo)
Click-through rate and conversion rate estimation are two core prediction tasks in display advertising. I will present a machine learning framework based on logistic regression that is specifically designed for the particular demands of display advertising. The resulting system has the following characteristics: it is easy to implement and deploy; it is highly scalable (we have trained it on terabytes of data); and it produces models with state-of-the-art accuracy.
About the speaker: Olivier Chapelle is a principal research scientist at Criteo, where he works on machine learning for display advertising. Prior to that, he was part of the machine learning group at Yahoo! Research, and before that he worked at the Max Planck Institute in Tübingen. His main research interests include kernel machines, semi-supervised learning, ranking, and large-scale learning. He graduated in theoretical computer science from the Ecole Normale Supérieure de Lyon in 1999 and received his PhD from the University of Paris 6 in 2002. He has authored over 80 publications and has been granted more than 10 patents. He has served as an associate editor of the Machine Learning Journal and of Transactions on Pattern Analysis and Machine Intelligence.
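As a rough illustration of logistic regression applied to click prediction, here is a minimal sketch. It assumes that categorical features (hypothetical `name=value` strings) are mapped into a fixed-size weight vector with the hashing trick and that the model is trained by online stochastic gradient descent on log loss. The hash size, learning rate, L2 strength, and feature names are illustrative assumptions, not details of the framework presented in the talk.

```python
import math
import zlib


class HashedLogisticRegression:
    """Online logistic regression over hashed categorical features (a sketch)."""

    def __init__(self, num_bits=18, learning_rate=0.1, l2=1e-6):
        self.size = 1 << num_bits
        self.w = [0.0] * self.size
        self.lr = learning_rate
        self.l2 = l2

    def _indices(self, features):
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        # Hash collisions are simply accepted, as in the usual hashing trick.
        return [zlib.crc32(f.encode("utf-8")) % self.size for f in features]

    def predict(self, features):
        z = sum(self.w[i] for i in self._indices(features))
        z = max(min(z, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, clicked):
        """One SGD step on log loss; returns the pre-update prediction."""
        p = self.predict(features)
        g = p - (1.0 if clicked else 0.0)  # d(log loss)/dz for the sigmoid
        for i in self._indices(features):
            self.w[i] -= self.lr * (g + self.l2 * self.w[i])
        return p


# Usage on made-up impressions: (features, clicked). "bias" acts as an intercept.
model = HashedLogisticRegression()
data = [
    (["ad=42", "site=news"], 1),
    (["ad=42", "site=sports"], 0),
    (["ad=7", "site=news"], 0),
] * 200
for feats, y in data:
    model.update(["bias"] + feats, y)

p_pos = model.predict(["bias", "ad=42", "site=news"])
p_neg = model.predict(["bias", "ad=7", "site=sports"])
print(round(p_pos, 3), round(p_neg, 3))
```

Returning the pre-update prediction from `update` makes it easy to track progressive validation loss while streaming over the data. Hashing keeps memory bounded regardless of how many distinct ad, site, or user values appear, which is one common way to keep such a model easy to deploy and scalable.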