News & Events
Summer 2017 Research Internships
The Carnegie Mellon Database Group is offering multiple internship positions for a special summer research project at its Pittsburgh, Pennsylvania campus. It will be an intense 12-week internship from June to August 2017. The project is to build a new open-source distributed database system from scratch. We are therefore looking for candidates who have strong systems-level C/C++ programming skills. Interns will receive a three-month summer salary (commensurate with skills and experience) and reimbursement of travel expenses. Read More
[DB Seminar] Fall 2016: Michael Zhang
Current architectures for main-memory online transaction processing (OLTP) database management systems (DBMS) are based on one of two design choices. In the partition choice, the data is assumed to be well partitioned. Transactions run with little or no concurrency control inside a partition. In the non-partition choice, the data is not required to be partitioned and the system carefully controls communication to achieve high performance. The partition choice has the advantage of performance if the data partitions well. The non-partition Read More
[DB Seminar] Fall 2016: Kijung Shin
What do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we use them for algorithm design and applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. In this talk, we explore pervasive patterns that are related to k-cores and that emerge in graphs from several diverse domains. Read More
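The k-core definition in the abstract above can be illustrated with a short sketch. It uses the standard peeling approach (repeatedly delete vertices whose degree is below k), which is not necessarily the method used in the talk. The function name `k_core` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the vertex set of the k-core of an undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    This is an illustrative peeling sketch, not the speaker's algorithm.
    """
    # Build an adjacency map; sets collapse duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # Start with every vertex that already violates the degree bound.
    stack = [v for v, d in degree.items() if d < k]
    removed = set()

    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        # Deleting v lowers the degree of each surviving neighbor.
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)

    return set(adj) - removed


# A triangle (a, b, c) with one pendant vertex d attached to c.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
two_core = k_core(example_edges, 2)
three_core = k_core(example_edges, 3)
```

In this example the 2-core is the triangle, because the pendant vertex is peeled off first, and the 3-core is empty.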
Neil Shah (Thesis proposal dry-run)
Given the ever-growing prevalence of online social services, usefully leveraging massive datasets has become an increasingly important challenge for businesses and end-users alike. Online services capture a wealth of information about user behavior and platform interactions, such as who-follows-whom relationships in social networks and who-rates-what-and-when relationships in e-commerce networks. Since many of these services rely on data-driven algorithms to recommend relevant content to their users, authenticity of user behavior is paramount to success. But given anonymity Read More
[DB Seminar] Fall 2016: Ziqi Wang
As multicore architecture is becoming the new normal of today's computers, many traditional programming paradigms for mutual exclusion have become a major source of scalability bottlenecks. To counter such bottlenecks for our in-memory database prototype at Carnegie Mellon University [1], we implemented a lock-free B+Tree multimap index based on BwTree, which was originally proposed by Microsoft Research [2]. In this presentation, detailed techniques for ensuring correct concurrent updates, efficient operations, and improving scalability are discussed, with an insight into low Read More
[DB Seminar] Fall 2016: Canceled (Nov 14)
This week's DB seminar is canceled. Read More
[DB Seminar] Fall 2016: Neil Shah
Livestreaming platforms have become increasingly popular in recent years as a means of sharing and advertising creative content. Popular content streamers who attract large viewership to their live broadcasts can earn a living by means of ad revenue, donations and channel subscriptions. Unfortunately, this incentivized popularity has simultaneously created an incentive for fraudsters to offer astroturfing services that artificially inflate viewership metrics by providing fake “live” views to customers. Our work provides a number of major contributions: (a) formulation: Read More
[DB Seminar] Fall 2016: Round table discussion
We will have a round table discussion. Read More
Charlie Swanson (MongoDB)
Everyone knows distributed systems are hard. At MongoDB we want to make it easy to express complex queries and extract insights from your data, but we also need to be able to scale to enormous data sets. To help you scale, we support a deployment which partitions the data amongst multiple machines, but a distributed system complicates even simple queries. Efficiently grouping together results can involve non-trivial communication amongst several machines. For example, the distributed query implementation must coordinate with Read More
Jessie Li (Penn State)
How can we harness the increasingly available big data to understand our dynamic ecosystem? For example, why do people or animals move through space in certain ways, and how do their movements respond to their surrounding environments? Why are crimes more frequent in certain regions, and can we explain this using heterogeneous urban data? Is shale gas development contaminating our environment, and how can we mine the correlations between the environment and all potential factors? Our research aims to develop data mining techniques Read More