News & Events
DB Seminar [Fall 2014]: Lianghong Xu
With the rise of large-scale, Web-based applications, users are increasingly adopting a new class of document-oriented database management systems (DBMSs) that allow for rapid prototyping while also achieving scalable performance. As with other distributed storage systems, replication is an important consideration for document DBMSs in order to guarantee availability. Replication can be between failure-independent nodes in the same data center and/or in geographically diverse data centers. A replicated DBMS maintains synchronization across multiple nodes by sending operation logs (oplogs) across...
DB Seminar [Fall 2014]: Sang-Chul Lee
Abstract: This talk addresses the problem in Web page ranking of effectively combining link and content information with efficiency high enough to be applicable to real-world search engines. Unlike previous surfer models, our approach is based on the viewpoint of a Web page author. Based on this viewpoint, we formulate the concept of contribution score, which indicates the amount to which a term in each page is utilized by other pages. To improve efficiency without loss of effectiveness, we exploit...
Marcos K. Aguilera (ex-MSR SVC)
Web applications (web mail, web stores, social networks, etc.) keep massive amounts of data that must be readily available to users. The storage system underlying these applications has evolved dramatically over the past 25 years, from file systems, to SQL database systems, to a large variety of NoSQL systems. In this talk, we contemplate this fascinating history and present a new storage system called Yesquel. Yesquel combines several advantages of prior systems. It supports the SQL query language to facilitate...
DB Seminar [Fall 2014]: Nobu Furukawa
Abstract: Improving student productivity in online learning depends on designing learning environments based on principles derived from learning science research into how people learn. Students master a skill by solving a sequence of practice exercises related to that skill. The initial development of a skills model for a course, which involves defining skills and associating them with exercises, relies heavily on human intuition; the model may therefore contain discrepancies that lead to differences between expected and actual student performance. Learning Factors Analysis...
Ryan Betts (VoltDB)
VoltDB is an in-memory, relational, SQL, fully ACID database well suited to supporting transactional applications against high-speed event feeds. We’ll discuss: why VoltDB exists and the Stonebraker H-Store history; how VoltDB differs from its academic roots (and why); unique VoltDB capabilities that enable streaming data pipelines; and some engineering insights after 6 years of developing a distributed, consistent, high-performance, fault-tolerant production system supporting many, many billions of production transactions every day. Part of the "Seven Databases in Seven...
The Future of Databases is Not a Database (Ori Herrnstadt)
We all get excited about the next technical capability. In-memory - cool; scalable - even cooler; vector-based execution, real-time code generation, etc. But do these really tackle the most important problems that will lead the next generation of databases? In this presentation Ori will present FoundationDB - a fault-tolerant, scalable and transactional K/V store, and the languages he and his team are building on top of it, such as SQL, document and graph. He will then explore how this...
DB Seminar [Fall 2014]: Jianquan Liu
In this talk, Dr. Liu will briefly introduce the related research topics currently being pursued at the Central Research Laboratories of NEC Corporation, such as big data processing. He will then focus on introducing a commercial-level demo system for surveillance video search, named Wally, which will be exhibited at ACM Multimedia 2014. Wally is a scalable, distributed, automated video surveillance system with rich search functionalities, integrated with image processing products developed by NEC, such...
DB Seminar [Fall 2014]: Yuto Yamaguchi
The location profiles of social media users are valuable for various applications, such as marketing and real-world analysis. As most users do not disclose their home locations, the problem of inferring home locations has been well studied in recent years. In fact, most existing methods perform batch inference using static (i.e., pre-stored) social media contents. However, social media contents are generated and delivered in real-time as social streams. In this situation, it is important to continuously update...
Ippokratis Pandis (Cloudera)
The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the academic and Hadoop communities now have an open-sourced codebase that helps query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to run interactively on native Hadoop data rather than in long-running batch...
DB Seminar [Fall 2014]: Neil Shah
Abstract: How can we detect suspicious users in large online networks? Online popularity of a user or product (via follows, page-likes, etc.) can be monetized on the premise of higher ad click-through rates or increased sales. Web services and social networks which incentivize popularity thus suffer from a major problem of fake connections from link fraudsters looking to make a quick buck. Typical methods of catching this suspicious behavior use spectral techniques to spot large groups of often blatantly fraudulent...