[Vaccination 2021] Convex: Life Without a Backend Team (James Cowling)
Many of us have devoted decades of our lives to making databases faster, cheaper, more scalable, more reliable... much of which is completely irrelevant to the average developer. The main obstacle for the next generation of app developers is convenience. The "serverless revolution" and architectures like Jamstack demonstrate a desire for life without a backend team, yet current technologies in this sphere are poorly suited to truly dynamic/interactive content. Convex is a database and execution platform designed to allow developers...
[Vaccination 2021] Query Optimization and Acceleration at Dremio (Steven Phillips)
This talk is part of the Vaccination Database (Second Dose) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149)
[Vaccination 2021] Fluree – Cloud-Native Ledger Graph Database (Brian Platz)
Fluree is an immutable RDF graph database that, beyond performing typical modern database functions, emphasizes security, trust, provenance, privacy, and interoperability. Fluree is open source under the AGPL license and is built on Clojure and open W3C standards. Fluree natively supports JSON and JSON-LD and can leverage/enforce any RDF ontology. The Fluree system consists of a cryptographically-secure ledger to handle state and a scalable semantic graph database to serve queries. Fluree uses SmartFunctions, smart and flexible embedded data policies similar...
[Vaccination 2021] Vertica – High Performance Over Varying Terrain (Stephen Walkauskas)
Vertica is an OLAP database, originally designed when a "beefy" server had 16GB of memory and 8 CPUs, and a 5TB data warehouse was considered to be HUGE. A lot has changed since then. Though CPUs have gotten only a bit faster, there are more cores on a die. Applications are commonly run in virtualized environments these days. Data volumes have ballooned, and multi-petabyte databases are becoming more common. We've enjoyed the challenge of re-architecting Vertica to keep pace. In this...
[Vaccination 2021] The Pinecone Vector Database System (Edo Liberty)
Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. Learn how we combine state-of-the-art vector indexing with infrastructure orchestration to provide high-performance vector search at any scale.
Rethinking Systems for Data-Intensive Computing (Matei Zaharia)
A growing fraction of applications today, from basic business processing to machine learning, are data-intensive: they need to correctly process and produce massive datasets that are too large for any human to inspect. These applications pose many systems challenges, from programming interfaces, to monitoring and debugging (how can a human make sure these applications are working well?), to performance. I’ll talk about several research projects that introduce novel ways to tackle these challenges. On the performance side, many researchers have...
[Vaccination 2021] An Overview of the Starburst Trino Query Optimizer (Karol Sobczak)
Starburst unlocks the value of data by making it fast and easy to analyze anywhere. Starburst queries data across any database, making it instantly actionable for organizations. With Starburst, teams can lower the total cost of their infrastructure and analytics investments and use the tools that work best for their business, while our open source Trino roots prevent data lock-in. Trusted by companies like Comcast, FINRA, and Condé Nast, Starburst helps companies make better decisions faster on all data. Starburst is...
[Vaccination 2021] Reinventing Amazon Redshift (Ippokratis Pandis)
In 2013, eight years ago, Amazon Web Services revolutionized the data warehousing industry by launching Amazon Redshift, the first fully managed, petabyte-scale cloud data warehouse solution. Amazon Redshift made it simple and cost-effective to efficiently analyze large volumes of data using existing business intelligence tools. This launch was a significant leap from the traditional on-premises data warehousing solutions, which were expensive, rigid (not elastic), and needed a lot of tribal knowledge to perform. Unsurprisingly, customers embraced Amazon Redshift and it...
[Vaccination 2021] How to Count Things with dbt (Drew Banin)
Modern organizations leverage machine learning, data science, and AI to build predictive, responsive, and personalized applications. BUT! Most are bad at counting things. That's where dbt comes in. dbt is an open source framework used to define, test, and document datasets. In this talk, we will discuss the what, why, and how behind dbt and data warehousing in the year 2021.
[Vaccination 2021] Bodo: Automatic HPC Performance and Scaling for Data Processing in Python (Ehsan Totoni)
Python is the language of choice for machine learning (ML) and AI, but SQL has been used for data processing for decades. Many data applications are a mix of the two languages, which makes development and deployment cumbersome for data teams. BodoSQL addresses the "two-language" problem by compiling Python and SQL code together, providing type checking, error checking, end-to-end optimization, and parallelization across the two languages. Furthermore, BodoSQL uses Bodo's high performance computing (HPC) parallel architecture with MPI for...