Archived Events

May 15 2017
Alicia Klinvex (Sandia National Labs)
Speaker: Alicia Klinvex

As parallel computing tends toward the exascale, scientific data produced by simulations are growing increasingly massive, sometimes resulting in terabytes of data. By viewing these data as a dense tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving impressive compression ratios with negligible loss in accuracy. We present recent improvements in our distributed-memory parallel implementation...

May 11 2017
Pedro Ribeiro (University of Porto)
Speaker: Pedro Ribeiro

One way of understanding the design principles of complex networks is to look at how they are organized at the subgraph level. In this talk I will describe how subgraphs can be seen as fundamental structural units and how they can provide a powerful and very flexible framework for characterizing and comparing networks. I will focus on two concepts geared around...

May 8 2017
[DB Seminar] Spring 2017: Andy Pavlo
Speaker: Andy Pavlo

Most of the academic papers on concurrency control published in the last five years have assumed the following two design decisions: (1) applications execute transactions with serializable isolation and (2) applications execute most (if not all) of their transactions using stored procedures. I know this because I am guilty of writing these papers too. But results from a recent survey...

May 5 2017
Miguel Araújo (Thesis defense dry-run)
Speaker: Miguel Araújo

The identification of anomalies and communities of nodes in real-world graphs has applications in a wide range of domains, from the automatic categorization of Wikipedia articles or websites to bank fraud detection. While recent and ongoing research is supplying tools for the analysis of simple unlabeled data, it is still a challenge to find patterns and anomalies in large labeled datasets such as time-evolving networks. What do...

May 2 2017
Agma Traina and Caetano Traina (University of São Paulo)
Speaker: Agma Traina and Caetano Traina

The evolution of Relational Database Management Systems must include resources to handle not only big data but also complex data (such as images, audio, videos, graphs, multidimensional data, long texts, time series, genetic sequences, etc.), where order-based comparisons are not appropriate and identity-based comparisons are meaningless. Comparing complex data by similarity draws much more meaning from the data. However, current...

May 1 2017
[DB Seminar] Spring 2017: Marcel Kornacker
Speaker: Marcel Kornacker
System: Impala

Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar...

Apr 25 2017
Dhivya Eswaran and Zongge Liu (SDM2017 dry run)
Speaker: Dhivya Eswaran and Zongge Liu

Dhivya and Zongge will give dry runs of their SDM 2017 talks.

Dhivya's talk information:
Title: The Power of Certainty: A Dirichlet Multinomial Model for Belief Propagation
Abstract: Given a friendship network, how certain are we that Smith is a progressive (vs. conservative)? How can we propagate these certainties through the network? While belief propagation marked the beginning of principled label propagation to classify...

Apr 24 2017
[DB Seminar] Spring 2017: Dana Van Aken
Speaker: Dana Van Aken
System: OtterTune

Database management system (DBMS) configuration tuning is an essential aspect of any data-intensive application effort. But this is historically a difficult task because DBMSs have hundreds of configuration "knobs" that control everything in the system, such as the amount of memory to use for caches and how often data is written to storage. The problem with these knobs is that...

Apr 17 2017
[DB Seminar] Spring 2017: Hyeontaek Lim
Speaker: Hyeontaek Lim

Multi-core in-memory databases promise high-speed online transaction processing. However, the performance of individual designs suffers when the workload characteristics miss their small sweet spot of a desired contention level, read-write ratio, record size, processing rate, and so forth. Cicada is a single-node multi-core in-memory transactional database with serializability. To provide high performance under diverse workloads, Cicada reduces overhead and contention...

Apr 10 2017
[DB Seminar] Spring 2017: Mohammad Hammoud
Speaker: Mohammad Hammoud

Relational join is a fundamental data management operation that strongly influences the performance of almost every database query. In this talk, I will show that different workload characteristics and hardware configurations necessitate different main-memory hash join models. Subsequently, I will identify four effective models by which any hash-based join algorithm can be executed. I will characterize the relative merits of...