Welcome to our weekly Data Management Seminar! The seminar takes place on Tuesdays, 1-2pm.
Please check back often for updates on the topic of each week.
Our previous seminars can be found here.
|Date||Topic / Reading|
|01/20/2016||Prof. Samuel Madden from MIT will be giving a talk: Interactive Data Analytics: the New Frontier.|
|01/26/2016||Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker. Readings in Database Systems, 5th Edition.|
|02/02/2016||Xiangyao Yu from MIT will be giving a talk: Time-Traveling Cache Coherence and Concurrency Control in CS150.|
|02/04/2016||Dr. Ce Zhang from Stanford will be giving a talk: DeepDive: A Data Management System for Machine Learning Workloads in CS151.|
|02/09/2016||Yue Wang. Enhancing Interactivity in Conventional Queries.|
|02/23/2016||Liudmila Elagina. Rank Aggregation under Differential Privacy.|
|03/08/2016||Prof. Julia Stoyanovich. Data Responsibly: Fairness, Neutrality and Transparency in Data Analysis.|
|03/22/2016||Abhishek Roy. Distributed Data Processing in Bioinformatics.|
|03/29/2016||Dan Zhang. Visualizing Differentially Private Data.|
|04/05/2016||Prof. Danielle Citron.|
Interactive Data Analytics: the New Frontier
Speaker: Prof. Samuel Madden
Time: 01/20/2016, 4:00pm-5:00pm
Data analytics often involves data exploration, where a data set is repeatedly analyzed to understand root causes, find patterns, or extract insights. Such analysis is frequently bottlenecked by the underlying data processing system, as analysts wait for their queries to complete against a complex multilayered software stack. In this talk, I’ll describe some exploratory analytics applications we’ve built in the MIT database group over the past few years, and will then describe some of the challenges and opportunities that arise when building more efficient data exploration systems that will allow these applications to become truly interactive, even when processing billions of data points.
Samuel Madden is a Professor of Electrical Engineering and Computer Science in MIT’s Computer Science and Artificial Intelligence Laboratory. His research interests include databases, distributed computing, and networking. Madden is a leader in the emerging field of “Big Data”, heading the Intel Science and Technology Center (ISTC) for Big Data, a multi-university collaboration on developing new tools for processing massive quantities of data. He also leads BigData@CSAIL, an industry-backed initiative to unite researchers at MIT and leaders from industry to investigate the issues related to systems and algorithms for data that is high rate, massive, or very complex.
Madden received his Ph.D. from the University of California at Berkeley in 2003 where he worked on the TinyDB system for data collection from sensor networks. Madden was named one of Technology Review’s Top 35 Under 35 in 2005, and is the recipient of several awards, including an NSF CAREER Award in 2004, a Sloan Foundation Fellowship in 2007, best paper awards in VLDB 2004 and 2007, and a best paper award in MobiCom 2006.
Time-Traveling Cache Coherence and Concurrency Control
Speaker: Xiangyao Yu
Time: 02/02/2016, 1:00pm-2:00pm
Parallel computing is important. But parallel computing is hard. Building a shared-memory parallel system is challenging due to the difficulty of ensuring consistency and coherence over modern multi-core CPUs. All existing shared-memory systems enforce global memory order using either physical time or logical time (Lamport Clocks). But these designs are suboptimal and limit the performance of highly parallel shared-memory systems.
In this talk, I introduce a new concept of time called “physiological time” that combines the strengths of physical and logical time while avoiding their problems. The key advantage of physiological time is that it enables an application to move memory operations (loads/stores) forward and backward in time in order to avoid conflicts. To evaluate this idea, we implemented our approach in a new cache coherence protocol (Tardis / PACT’15) for multi-core CPUs and a new concurrency control algorithm (TicToc / SIGMOD’16) for on-line transaction processing database systems. Both algorithms are simpler, more scalable, and perform better than existing state-of-the-art implementations. I will then provide a preview of how we think physiological time can be used in a distributed operating environment.
Xiangyao Yu is a 4th year PhD student from MIT. He received his BS degree from Tsinghua University in 2012. His research interests span computer architecture, databases, and distributed systems.
DeepDive: A Data Management System for Machine Learning Workloads
Speaker: Dr. Ce Zhang
Time: 02/04/2016, 4:00pm-5:00pm
Many pressing questions in science are macroscopic: they require scientists to consult information expressed in a wide range of resources, many of which are not organized in a structured relational form. Knowledge base construction (KBC) is the process of populating a knowledge base, i.e., a relational database storing factual information, from unstructured inputs. KBC holds the promise of facilitating a range of macroscopic sciences by making information accessible to scientists. One key challenge in building a high-quality KBC system is that developers must often deal with data that are both diverse in type and large in size. Further complicating the scenario is that these data need to be manipulated by both relational operations and state-of-the-art machine-learning techniques.
My research focuses on building a data management system for machine learning workloads, with the goal of easing the complex process of building KBC systems. The system, called DeepDive, ultimately aims to allow scientists to build KBC systems, and machine learning systems in general, by declaratively specifying domain knowledge without worrying about any algorithmic, performance, or scalability issues. DeepDive has been used by users without machine learning expertise in a number of domains, from paleobiology to genomics to anti-human trafficking. In this talk, I will describe the DeepDive framework, its applications, and the underlying techniques we developed to speed up a range of machine learning workloads by up to two orders of magnitude.
Ce is a postdoctoral researcher in Computer Science at Stanford University. He is working with Christopher Ré on data management and database systems. With the indispensable help of many collaborators, his PhD work produced the system DeepDive, a trained data system for automatic knowledge-base construction. As part of his PhD thesis, he led the research efforts that won the 2014 SIGMOD Best Paper Award and was invited to the “Best of VLDB 2015” special issue; PaleoDeepDive, a machine-reading system for paleontologists, was featured in Nature magazine, and he also led the Stanford team that produced the top-performing machine-reading system for TAC-KBP 2014 slot-filling evaluations using DeepDive. Ce obtained his PhD from the University of Wisconsin-Madison, advised by Christopher Ré, and his Bachelor of Science degree from Peking University, advised by Bin Cui.
Speaker: Prof. Danielle Citron
Time: 04/05/2016, 1:00pm-2:00pm
Professor Danielle Citron is the Lois K. Macht Research Professor & Professor of Law at the University of Maryland Francis King Carey School of Law. Her work focuses on information privacy, cyber law, automated systems, and civil rights.
Professor Citron is the author of Hate Crimes in Cyberspace published by Harvard University Press. Cosmopolitan and Harper’s Bazaar nominated her book as one of the “Top 20 Best Moments for Women” in 2014; Boston University Law Review hosted an online symposium devoted to her book in 2015.