Archived Events

Sep 14 2017
InfluxDB Storage Engine Internals (Paul Dix)
Speaker:
Paul Dix
System:
InfluxDB
Video:
YouTube

InfluxDB is an open source time series database written in Go. This talk will introduce how InfluxDB structures time series data and what makes it different from other use cases like OLTP. We'll then go into the internals of the storage engine we wrote from scratch, the Time Structured Merge Tree, heavily inspired by LSM trees. In addition to the...

Sep 11 2017
[DB Seminar] Fall 2017: Joy Arulraj
Speaker:
Joy Arulraj

For the first time in 25 years, a new non-volatile memory (NVM) category is being created that is expected to be 1000 times faster than current durable storage devices. The advent of NVM will fundamentally change the dichotomy between memory and durable storage in database systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes...

May 22 2017
[DB Seminar] Spring 2017: Yingjun Wu
Speaker:
Yingjun Wu

The emergence of large main memories and massively parallel processors has triggered the development of multi-core main-memory database management systems (DBMSs). Although the reduction of disk accesses results in low single-thread transaction execution time, scaling these systems on multi-core machines remains notoriously difficult. In particular, the concurrent processing of a large number of transactions can bring about significant performance bottlenecks....

May 15 2017
[DB Seminar] Spring 2017: Priya Govindan
Speaker:
Priya Govindan

The structure of real-world complex networks has long been an area of interest, and one common way to describe the structure of a network has been with the k-core decomposition. The core number of a node can be thought of as a measure of its centrality and importance, and is used by applications such as community detection, understanding viral spreads,...

May 15 2017
Alicia Klinvex (Sandia National Labs)
Speaker:
Alicia Klinvex

As parallel computing tends toward the exascale, scientific data produced by simulations are growing increasingly massive, sometimes resulting in terabytes of data. By viewing this data as a dense tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving impressive compression ratios with negligible loss in accuracy. We present recent improvements in our distributed-memory parallel implementation...

May 11 2017
Pedro Ribeiro (University of Porto)
Speaker:
Pedro Ribeiro

One way of understanding the design principles of complex networks is to look at how they are organized at the subgraph level. In this talk I will describe how subgraphs can be seen as fundamental structural units and how they can provide a powerful and very flexible framework for characterizing and comparing networks. I will focus on two concepts geared around...

May 8 2017
[DB Seminar] Spring 2017: Andy Pavlo
Speaker:
Andy Pavlo

Most of the academic papers on concurrency control published in the last five years have assumed the following two design decisions: (1) applications execute transactions with serializable isolation and (2) applications execute most (if not all) of their transactions using stored procedures. I know this because I am guilty of writing these papers too. But results from a recent survey...

May 5 2017
Miguel Araújo (Thesis defense dry-run)
Speaker:
Miguel Araújo

The identification of anomalies and communities of nodes in real-world graphs has applications in widespread domains, from the automatic categorization of Wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets such as time-evolving networks. What do...

May 2 2017
Agma Traina and Caetano Traina (University of São Paulo)
Speaker:
Agma Traina and Caetano Traina

The evolution of Relational Database Management Systems must include resources to handle not only big data, but also complex data (such as images, audio, videos, graphs, multidimensional data, long texts, time series, genetic sequences, etc.), where order-based comparisons are not appropriate and identity-based comparisons are meaningless. Comparing complex data by similarity extracts much more meaning from the data. However, current...

May 1 2017
[DB Seminar] Spring 2017: Marcel Kornacker
Speaker:
Marcel Kornacker
System:
Impala

Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar...