Chris Jermaine (Rice University)
In this talk, I’ll describe the SimSQL system, which is a platform for writing and executing statistical codes over large data sets, particularly for machine learning applications. Codes that run on SimSQL can be written in a very high-level, declarative language called Buds. A Buds program looks a lot like a mathematical specification of an algorithm, and statistical codes written in Buds are often just a few lines long.
At its heart, SimSQL is really a relational database system, and like other relational systems, SimSQL is designed to support data independence. That is, a single declarative code for a particular statistical inference problem can be used regardless of data set size, compute hardware, and physical data storage and distribution across machines. One concern is that a platform supporting data independence will not perform well. But we’ve done extensive experimentation, and have found that SimSQL performs as well as other competitive platforms that support writing and executing machine learning codes for large data sets.
Chris Jermaine is an associate professor of computer science at Rice University. He is the recipient of an Alfred P. Sloan Foundation Research Fellowship, a National Science Foundation CAREER award, and an ACM SIGMOD Best Paper Award. In his spare time, Chris enjoys outdoor activities such as hiking, climbing, and whitewater boating. In one particular exploit, Chris and his wife floated a whitewater raft (home-made from scratch using a sewing machine, glue, and plastic) over 100 miles down the Nizina River (and beyond) in Alaska.
More Info: http://www.pdl.cmu.edu/SDI/2015/020515.html