News & Events
[PDL Visit Day 2015] Roger MacNicol (Oracle)
With the end of the civil war between Hadoop and traditional databases, customers have data in both, using the most appropriate tool for each kind of data. The natural result is a need for a unified query infrastructure that provides a simple interface for requesting reports that may draw on data in, for example, Oracle, MongoDB, and Cloudera, and that returns those results in a timely manner. We proposed and implemented an architecture based on Oracle’s SmartScan Read More
[PDL Visit Day 2015] Tirthankar Lahiri (Oracle)
The Oracle Database In-Memory Option allows Oracle to function as the industry-first dual-format in-memory database. Row formats are ideal for OLTP workloads which typically use indexes to limit their data access to a small set of rows, while column formats are better suited for Analytic operations which typically examine a small number of columns from a large number of rows. Since no single data format is ideal for all types of workloads, our approach was to allow data to be Read More
DB Seminar [Spring 2015]: Round Table Discussion
This Monday we will have a round table discussion Read More
DB Seminar [Spring 2015]: Bruno Ribeiro
Abstract: Complex network phenomena – such as information cascades in online social networks – are hard to fully observe, model, and forecast. In forecasting, a recent trend has been to forgo the use of parsimonious models in favor of models with increasingly large degrees of freedom that are trained to learn the behavior of a process from historical data. Extrapolating this trend into the future, eventually we would renounce models altogether. But is it possible to forecast the evolution Read More
DB Seminar [Spring 2015]: Miguel Araujo
Abstract: What do real communities in social networks look like? How can we find them efficiently? Community detection plays a key role in understanding the structure of real-life graphs with impact on recommendation systems, load balancing and routing. Previous community detection methods look for uniform blocks in adjacency matrices, but after studying four real networks with ground-truth communities, we provide empirical evidence that communities are best represented as having hyperbolic structure. Our new matrix decomposition method is able to describe binary Read More
Justin Levandoski + Dharma Shukla (Microsoft)
Azure DocumentDB is Microsoft's multi-tenant distributed database service for managing JSON documents at Internet scale. DocumentDB is now generally available to Azure developers. Built from the ground up as a multi-tenant service, DocumentDB is designed to operate within extremely frugal resource budgets while providing predictable performance and robust resource isolation to its tenants. DocumentDB indexing enables automatic indexing of documents without requiring a schema or secondary indices. Uniquely, DocumentDB provides real-time consistent queries in the face of very high rates Read More
Peter Bailis (University of California, Berkeley)
The rise of Internet-scale geo-replicated services has led to considerable upheaval in the design of modern data management systems. Namely, given the availability, latency, and throughput penalties associated with classic mechanisms such as serializable transactions, a broad class of systems (e.g., "NoSQL") has sought weaker alternatives that reduce the use of expensive coordination during system operation, often at the cost of application integrity. When can we safely forgo the cost of this expensive coordination, and when must we pay the Read More
Sudipto Das (Microsoft Research)
Multi-tenancy and resource sharing are essential to make a Relational Database-as-a-Service (DaaS), such as Azure SQL Database, cost-effective. However, one major consequence of resource sharing is that the performance of one tenant's workload can be significantly affected by the resource demands of co-located tenants. In the SQLVM project at Microsoft Research, our approach to performance isolation in a DaaS is to isolate the key resources, such as CPU, I/O and memory, needed by the tenants' workload. The major challenge is Read More
DB Seminar [Spring 2015]: Pengtao Xie
Abstract: Personal photos are enjoying explosive growth with the popularity of photo-taking devices and social media. The vast number of online photos largely reflects users’ interests, emotions and opinions. Mining user interests from personal photos can boost a number of utilities, such as advertising, interest-based community detection and photo recommendation. In this talk, I will introduce our work on mining user interests from personal photos. We propose a User Image Latent Space Model to jointly model user interests and image contents. User interests are modeled as latent Read More
DB Seminar [Spring 2015]: Gisele Pappa
Abstract: In this seminar I will present three of my ongoing projects. I will start by talking about dengue fever modeling, its challenges and opportunities. Dengue fever is a tropical, mosquito-transmitted disease that has been growing significantly in the past decade. The main goal of this project is to exploit real case data and Twitter data to generate a predictive system that allows government policies to be put in place up to two weeks before disease outbreaks actually happen. Next, I will present a Bayesian Read More