Welcome to our weekly Data Management Seminar! The seminar takes place on Mondays, 10:30-11:30am.
Please check back often for updates on each week's topic.
Our previous seminars can be found here.
| Date | Topic / Reading |
|---|---|
| 09/12/2016 | Introductions for our new members. |
| 10/11/2016 | Ontology-based Data Access. |
| 10/24/2016 | Prof. Tim Kraska from Brown University will be giving a talk: Interactive Data Science. |
| 10/31/2016 | Prof. Arnab Nandi from the Ohio State University will be giving a talk: Querying Without Keyboards: Challenges in Gesture-driven Data Exploration. |
| 11/02/2016 | Prof. Jonathan Ullman from Northeastern University will be giving a talk: Dusting for Fingerprints in Private Data. |
| 11/28/2016 | Interactive Data Exploration. |
| 12/05/2016 | Tool Demo 1. |
| 12/12/2016 | Tool Demo 2. |
Interactive Data Science
Speaker: Prof. Tim Kraska
Time: 10/24/2016, 12:30pm-1:30pm
Unleashing the full potential of Big Data requires a paradigm shift in the algorithms and tools used to analyze data, toward more interactive systems with highly collaborative and visual interfaces. Ideally, a data scientist and a domain expert should be able to make discoveries together by directly manipulating, analyzing, and visualizing data on the spot, instead of going back and forth with each other for weeks. Current systems, such as traditional databases or more recent analytical frameworks like Hadoop or Spark, are ill-suited for this purpose: they were designed neither to be interactive nor to support the special requirements of visual data exploration. Similarly, most machine learning algorithms cannot provide initial answers at “human speed” (i.e., in under a second), and existing methods are insufficient to convey the impact of various risk factors, such as those caused by incomplete data or (implicit) multiple-hypothesis testing.
In this talk, I will present our vision of a new approach for conducting interactive exploratory analytics and explain why integrating the aforementioned features requires a complete rethinking of the full analytics stack from the interface to the “guts”. I will present recent results towards this vision including our novel interface, analytical engine and index structure, and outline what challenges are still ahead of us.
Tim Kraska is an Assistant Professor in the Computer Science department at Brown University. His current research focuses on Big Data management systems for modern hardware and new types of workloads, especially interactive analytics. Before joining Brown, Tim spent three years as a postdoc in the AMPLab at UC Berkeley, where he worked on hybrid human-machine database systems and cloud-scale data management systems. Tim received his PhD from ETH Zurich under the supervision of Donald Kossmann. He was awarded an NSF CAREER Award (2015), an Air Force Young Investigator Award (2015), a Swiss National Science Foundation Prospective Researcher Fellowship (2010), a DAAD Scholarship (2006), a University of Sydney Master of Information Technology Scholarship for outstanding achievement (2005), the University of Sydney Siemens Prize (2005), two VLDB best demo awards (2015 and 2011), and an ICDE best paper award (2013).
Querying Without Keyboards: Challenges in Gesture-driven Data Exploration
Speaker: Prof. Arnab Nandi
Time: 10/31/2016, 10:30am-11:30am
New computing devices that use “natural” modes of interaction, such as multitouch and gestures, are rapidly becoming more popular than traditional keyboard-based interaction. These devices are used to consume and directly interact with data in a wide range of contexts, including business intelligence, data-driven sciences, and healthcare. Applications for such devices are highly interactive and place a fundamentally different set of expectations on the underlying data infrastructure. In this talk, we rethink various aspects of the database stack, from the query language to the underlying query execution layers, to address such interactive workloads. We explore the impact of treating interactivity as a first-class concept, and show that our methods result in experiences that are not only effective but also more fluid and intuitive for the end user.
Arnab Nandi is an Assistant Professor in the Computer Science and Engineering department at The Ohio State University. Arnab’s research is in the area of database systems, focusing on exploiting user behavior to address challenges in large-scale data analytics and interactive query interfaces. This involves solving problems that span databases, interactive visualization, human-computer interaction, and information retrieval. He is the 2016 recipient of the IEEE TCDE Early Career Award for his contributions towards user-focused data interaction. Arnab is also a recipient of the US NSF CAREER award, a Google Faculty Award, and a Yahoo! PhD Fellowship. Before joining Ohio State, Arnab received his PhD from the University of Michigan, Ann Arbor, in 2011, under the supervision of H.V. Jagadish.
Dusting for Fingerprints in Private Data
Speaker: Prof. Jonathan Ullman
Time: 11/02/2016, 12:00pm-1:10pm
We describe a powerful new family of attacks that recover sensitive information about individuals using only simple summary statistics computed on a dataset. Notably, our attacks succeed under minimal assumptions on the distribution of the data, even if the attacker has very limited information about this distribution, and even if the summary statistics are significantly distorted. Our attacks build on and generalize the method of fingerprinting codes for proving lower bounds in differential privacy, and also extend the practical attacks on genomic datasets demonstrated by Homer et al. Surprisingly, the amount of noise that our attacks can tolerate nearly matches the amount of noise required to achieve differential privacy, meaning that the robust privacy guarantees of differential privacy come at almost no cost in our model.
Based on joint work with Cynthia Dwork, Adam Smith, Thomas Steinke, and Salil Vadhan.
Jon Ullman is an assistant professor in the College of Computer and Information Sciences at Northeastern University. His research addresses questions such as “when and how can we analyze sensitive datasets without compromising privacy?” and “how can we prevent false discovery in the empirical sciences?” using tools from cryptography, machine learning, algorithms, and game theory. Before joining Northeastern, he completed his PhD at Harvard University and was in the inaugural class of junior fellows in the Simons Society of Fellows.