Welcome to our weekly Data Management Seminar!  The seminar takes place on Tuesdays, 1-2pm, in room CS 303.  In Fall 2015, we will explore the topic of “Fairness, accountability, and non-discrimination in the era of big data”, though we may have some guest speakers on a variety of topics.

Thanks  for sponsoring our Data Management Seminar!

Please check back often for updates on the topic of each week.

Date Topic / Reading
09/22/2015 B. Friedman and H. Nissenbaum. Bias in computer systems. ACM Trans. Inf. Syst.
S. Barocas and A. D. Selbst. Big data's disparate impact. [Part I: How data mining discriminates]
09/29/2015  L. Sweeney. Discrimination in online ad delivery. Queue, 11(3):10, 2013.
M. Kay, C. Matuszek, and S. A. Munson. Unequal representation and gender stereotypes in image search results for occupations.  CHI 2015, p. 3819–3828.
10/06/2015 M. Lécuyer, et al. Xray: Enhancing the web’s transparency with differential correlation. USENIX Security 2014.
10/08/2015 Dr. Laura Haas from IBM Research Lab will be giving a talk: Accelerating the Discovery of Insights from Data from 1pm to 2pm in Room 150/151.
10/19/2015 Prof. Remco Chang from Tufts University will be giving a talk: Big Data Visual Analytics: A User-Centric Approach at 12:30pm in Room 150.
10/20/2015 D. Pedreschi, et al. Discrimination-aware Data Mining. KDD 2008.
10/27/2015 F. Kamiran, et al. Discrimination aware decision tree learning. ICDM 2010.
11/10/2015 C. Dwork, et al. Fairness through awareness. ITCS 2012.
11/24/2015 R. Zemel, et al. Learning fair representations. ICML 2013.
12/01/2015 L. Chen, et al. Peeking Beneath the Hood of Uber. IMC 2015.
12/07/2015 Prof. Ninghui Li from Purdue University will be giving a talk: Differential Privacy: What Does It Mean and What Can Be Achieved? at 12:30pm in Room 150 & Room 151.
12/08/2015 Prof. Kobbi Nissim from Ben-Gurion University & Harvard University will be giving a talk: Privacy: from Theory to Reality at 4:00pm in Room 150.



  1. Accelerating the Discovery of Insights from Data

    Speaker: Dr. Laura Haas
    Time: 1:00pm – 2:00pm, 10/08/2015
    Location: Room 150/151
    Today, businesses and scientists alike struggle to get to the value in their data. Their challenges include finding and gaining access to the data they need, “wrangling” the data into a form they can use, and setting up the systems and software to be used – all before even tackling the analysis. With no coordination, multiple groups may re-do the heavy lifting to ready the data for use, or struggle to figure out what data is already available. Further, the skills required to get from raw data to insight span a broad range from systems to data management, optimization, statistics, algorithms, story-telling and visualization. Rarely can you find such multi-disciplinary expertise in one team – it is typically scattered across multiple business units or departments.
    The IBM Research Accelerated Discovery Lab is a unique, collaborative environment specifically designed to facilitate complex analytic projects by tackling these challenges. One of the key elements of the Lab is the notion of a data lake, accessed through an easy-to-use, collaborative tool called LabBook, which, together with new practices such as datastorming, helps bridge the gaps between experts from different disciplines. We will highlight some successful applications of these technologies, from diverse fields such as medical research, food safety, social media analytics and predictive equipment maintenance.
    Laura Haas is an IBM Fellow and Director of IBM Research’s Accelerated Discovery Lab. She was Director of Computer Science at IBM’s Almaden Research Center from 2005 to 2011, and had worldwide responsibility for IBM Research’s exploratory science program from 2009 through 2013. From 2001 to 2005, she led the Information Integration Solutions architecture and development teams in IBM’s Software Group. Previously, Dr. Haas was a research staff member and manager at Almaden. She is best known for her work on the Starburst query processor, from which DB2 LUW was developed, on Garlic, a system which allowed integration of heterogeneous data sources, and on Clio, the first semi-automatic tool for heterogeneous schema mapping. She has received several IBM awards for Outstanding Innovation and Technical Achievement, an IBM Corporate Award for information integration technology, the Anita Borg Institute Technical Leadership Award, and the ACM SIGMOD Codd Innovation Award. Dr. Haas was Vice President of the VLDB Endowment Board of Trustees from 2004 to 2009, and is a member of the National Academy of Engineering and the IBM Academy of Technology, an ACM Fellow, a Fellow of the American Academy of Arts and Sciences, and Vice Chair of the board of the Computing Research Association.

  2. Big Data Visual Analytics: A User-Centric Approach

    Speaker: Prof. Remco Chang
    Time: 12:30pm, 10/19/2015
    Location: Room 150
    Modern visualization systems often assume that the data can fit within the computer’s memory. With such an assumption, visualizations can quickly slice and dice the data and help the users examine and explore the data in a wide variety of ways. However, in the age of Big Data, the assumption that data can fit within memory no longer applies. One critical challenge in designing visual analytics systems today is therefore to allow the users to explore large and remote datasets at an interactive rate. In this talk, I will present our research in approaching this problem in a user-centric manner. In the first half of the talk, I will present preliminary work with the database group at MIT on developing a big data visualization system based on the idea of predictive prefetching and precomputation. In the second half of the talk, I will present mechanisms and approaches for performing prefetching that are based on users’ past interaction histories and their perceptual abilities.
    Remco Chang is an Assistant Professor in the Computer Science Department at Tufts University. He received his BS from Johns Hopkins University in 1997 in Computer Science and Economics, MSc from Brown University in 2000, and PhD in computer science from UNC Charlotte in 2009. Prior to his PhD, he worked for Boeing developing real-time flight tracking and visualization software, followed by a position at UNC Charlotte as a research scientist. His current research interests include visual analytics, information visualization, and human-computer interaction. His research has been funded by NSF, DHS, MIT Lincoln Lab, and Draper. He has received best paper, best poster, and honorable mention awards at InfoVis, VAST, CHI, and VDA. He is currently an associate editor of the ACM Transactions on Interactive Intelligent Systems (TiiS) and the Human Computation journals, and he has served on program committees and in organizational roles at leading conferences such as InfoVis, VAST, and CHI. He received the NSF CAREER Award in 2015.

  3. Differential Privacy: What Does It Mean and What Can Be Achieved?

    Speaker: Prof. Ninghui Li
    Time: 12:30pm, 12/07/2015
    Location: Room 150 & Room 151
    Over the last decade, differential privacy (DP) has emerged as the standard privacy notion for research in privacy-preserving data analysis and publishing. However, there is an ongoing debate about the meaning and value of DP. Some hail DP as offering strong privacy protection regardless of the adversary’s prior knowledge while enabling all kinds of data analysis. Others criticize DP’s privacy guarantee and utility limitations.
    In the first part of the talk, we explore the meanings of DP, trying to answer the question: under what condition(s) does the notion of DP deliver the promised privacy guarantee? We show that DP is based on the following Personal Data Principle: “Data privacy means giving an individual control over his or her personal data. Privacy does not mean that no information about the individual is learned, or no harm is done to an individual. Enforcing the latter is infeasible and unreasonable.” Furthermore, the question of when DP is adequate for a given setting is not just a technical question and depends on legal and ethical considerations.
    In the second part of the talk, we give a survey of the state of the art in publishing a summary of a relational dataset, ranging from publishing histograms for one-dimensional and two-dimensional datasets, to answering marginal queries for datasets with dozens of dimensions, and finally to finding frequent itemsets in transactional datasets with thousands of dimensions or more.

    Ninghui Li is a Professor of Computer Science at Purdue University, where he has been a faculty member since 2003. His research interests are in security and privacy. He has published over 130 refereed papers in these areas. Prof. Li is currently on the editorial boards of IEEE Transactions on Dependable and Secure Computing (TDSC), Journal of Computer Security (JCS), and ACM Transactions on Internet Technology (TOIT). He was on the editorial board of the VLDB Journal from 2007 to 2013. He recently served as Program Chair of the 2014 and 2015 ACM Conference on Computer and Communications Security (CCS), ACM’s flagship conference in the field of security and privacy.

  4. Privacy: from Theory to Reality

    Speaker: Prof. Kobbi Nissim
    Time: 4:00pm – 5:00pm, 12/08/2015
    Location: Room 150
    The treatment of privacy in data analysis took a dramatic shift a little more than a decade ago – as failures of traditional privacy-preserving techniques were beginning to accumulate, a theoretical, foundational approach to privacy emerged. A key product of this theoretical treatment is “differential privacy”, a definition of privacy in the context of data analysis that has concrete privacy consequences.

    Differential privacy has become a rich, fast-evolving framework for developing privacy-preserving algorithms and for studying some of the fundamental properties of privacy. Moreover, differential privacy has proved to interact fruitfully with other research areas, and to influence applications that are (seemingly) not related to privacy. With a mature theoretical basis, differential privacy is now ready for inclusion in real-world systems.

    We will look into the intuition behind differential privacy, review some of its theory, and examine some of the challenges in applying differential privacy.

    The talk will be self-contained; no prior background on privacy will be assumed.

    Kobbi Nissim is a Professor of Computer Science at Ben-Gurion University and a Senior Research Fellow at the Center for Research on Computation and Society at Harvard. Trained in cryptography, Kobbi maintains a healthy level of paranoia, and feels the ground is shaky whenever issues of security and privacy are not formally defined and analysed.

    Kobbi’s current work is focused on the theory and application of differential privacy. His work from 2003 and 2004 with Dinur and Dwork initiated rigorous foundational research on privacy and presented a precursor of Differential Privacy, a strong definition of privacy in computation that he introduced in 2006 with Dwork, McSherry and Smith. With collaborators, Nissim established some of the basic constructions supporting differential privacy, and studied differential privacy and its relationships with cryptography, statistics, computational learning, mechanism design, and social networks as well as policy and regulation. Since 2011, Kobbi has been involved with the Privacy Tools for Sharing Research Data project at Harvard University, developing privacy-preserving tools for the sharing of social science data. Other contributions of Nissim include the BGN homomorphic encryption scheme with Boneh and Goh, and research on private approximations.

    In 2013, Nissim received, with Irit Dinur, the ACM Alberto O. Mendelzon Test-of-Time award for their PODS 2003 paper initiating the foundational work on privacy. In 2016 he will receive, with Dwork, McSherry, and Smith, the IACR TCC Test-of-Time award for their TCC 2006 paper introducing differential privacy.