News & Events
Pitt/CMU DB Meetup – Spyros Blanas (Ohio State)
Web data are commonly processed using thousands of CPU cores, and large-scale scientific simulations are quickly approaching the one million CPU core mark. At this scale, the barrier to efficient data analysis is commonly the limited bandwidth to the disk. The growing main memory capacities allow data to be intelligently reduced, analyzed and transformed in situ, before being written to disk or transferred over the network. This talk focuses on accelerating data analysis by embedding in-memory processing capabilities within existing Read More
DB Seminar [Fall 2014]: Thomas Marshall
Big data processing can be expensive and slow, a problem made worse when your data set keeps changing, forcing you to reanalyze it repeatedly. Incremental computation can speed things up by minimizing the work that must be done to update output in response to changing input, but many previous efforts at incremental computation have been limited in the algorithms they can express or require a lot of effort on the part of the application programmer to implement. ThomasDB seeks to Read More
Ippokratis Pandis (Cloudera)
On-line transaction processing (OLTP) is one of the two most important enterprise data management applications. Transaction processing workloads typically exhibit high concurrency and provide ample opportunities for parallel execution by multicore hardware. Unfortunately, due to the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents. As a result, transaction processing systems cannot always convert abundant request-level parallelism into execution parallelism, due to communication bottlenecks. In order to improve scalability of transaction processing, we identify Read More
Bradley C. Kuszmaul (MIT/Tokutek)
Bradley C. Kuszmaul is the founder and Chief Architect at Tokutek and a Research Scientist in the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory. This talk will discuss how B-trees, Log-Structured Merge Trees and Streaming B-trees operate, and what their asymptotic performance is. Part of the "Seven Databases in Seven Weeks" Seminar Series: http://db.cs.cmu.edu/seminar2014 Read More
Jimeng Sun (Georgia Institute of Technology)
Predictive modeling plays an important role in biomedical research. Thanks to the explosion of Electronic Health Records (EHR), interest in building predictive models using EHR data has skyrocketed in recent years. However, the methodologies for developing a predictive model are still labor intensive and ad hoc. Such rudimentary approaches have hindered the quality and throughput of healthcare and biomedical research. In this talk, we promote a holistic approach that combines 1) algorithm development and 2) system building. We believe such Read More
Flavio Figueiredo wins 1st place in two out of the three tasks of the ECML/PKDD Predictive Analytics Challenge
Flavio was a visiting scholar with the database group at SCS during 2014-15. The challenge was to predict the popularity of a web page time series using 1h of data. Popularity was measured by the number of visits, Facebook likes, and Twitter mentions the page receives. The target time for predictions was 48h. Thus, given 1h worth of activity, the task was to predict the popularity (= amount of activity) of a page for the next 48 hours. Flavio's Read More
DB Seminar [Fall 2014]: Zhanpeng Fang
Online gaming is one of the largest industries on the Internet, generating tens of billions of dollars in revenues annually. One core problem in online games is to find and convert free users into paying customers, which is of great importance for the sustainable development of almost all online games. Although much research has been conducted, there are still several challenges that remain largely unsolved: What are the fundamental factors that trigger the users to pay? How does users’ paying Read More
Andrew Morrow (MongoDB)
MongoDB is the next-generation database that helps businesses transform their industries by harnessing the power of data. The world’s most sophisticated organizations, from cutting-edge startups to the largest companies, use MongoDB to create applications never before possible at a fraction of the cost of traditional databases. This talk will review the feature set of MongoDB and the history and rationale behind those features, including BSON, the document model, replication, and database sharding. MongoDB's feature set represents a particular set of Read More
DB Seminar [Fall 2014]: Alex Beutel
As we record growing amounts of increasingly detailed user actions and complex interactions, how can we understand and make use of the vast amount of user data? In order to make use of this growing user data, there are a number of technical hurdles: we must be able to understand and model our users, we must be able to handle fraudulent data and adversarial users, and we must be able to scale our learning algorithms and models to big data. Read More
DB Seminar [Fall 2014]: Pengtao Xie
Many graph mining and analysis services have been deployed on the cloud, including Neo4j, GraphDB, Dydra, Infinity Graph, GraphLab, and System G, to name a few. Cloud-based graph analytics can relieve users of the burden of implementing and maintaining graph algorithms. However, it puts the security and privacy of users' graph data at risk. To solve this problem, we propose CryptGraph, which runs graph analytics on encrypted graphs to preserve the privacy of both users' graph data and the analytic results. In CryptGraph, users encrypt their graph before uploading Read More