Automated Database Design for Large-Scale Scientific Data

Database support for scientific data is challenging because of the massive data volumes and the complex, diverse query workloads of modern scientific applications. An effective physical database design is critical for supporting a large variety of SQL queries over large-scale data. Detailed workload information is the key to guiding the physical design process towards efficient solutions. For instance, the Sloan Digital Sky Survey (SDSS) astronomical database contains tables with hundreds of attributes, which can be queried in various combinations. Designing indexes for such tables requires detailed workload knowledge in order to identify the attribute subsets that are important for queries and must be indexed. Besides performance, physical database design must satisfy additional constraints related to the maintenance of large-scale data: structures such as indexes or materialized views are constrained by the available resources (such as disk space) and by the overhead of database updates, which is proportional to the number of structures that must be maintained.
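
To make the idea of workload-driven index selection under a resource constraint concrete, the following is a minimal sketch and not the method of this project. It assumes a workload given as tuples of the attributes each query references, a hypothetical size_of function that estimates the disk space of an index, and a simple greedy heuristic (frequency per unit of space) that is chosen here purely for illustration.

```python
from collections import Counter


def candidate_indexes(workload, min_support=2):
    """Count how often each attribute subset is referenced by a query.

    workload: iterable of attribute tuples, one per query (assumed input format).
    Returns a dict mapping frozenset(attributes) -> number of queries,
    keeping only subsets referenced by at least min_support queries.
    """
    counts = Counter(frozenset(attrs) for attrs in workload)
    return {attrs: n for attrs, n in counts.items() if n >= min_support}


def pick_indexes(candidates, size_of, budget):
    """Greedily choose candidates by frequency per unit of space.

    size_of: hypothetical estimator of index size for an attribute subset.
    budget:  available disk space, in the same units as size_of.
    Returns (chosen subsets, space used). A candidate that does not fit
    in the remaining budget is skipped.
    """
    ranked = sorted(
        candidates,
        key=lambda a: (-candidates[a] / size_of(a), sorted(a)),  # ties broken by name
    )
    chosen, used = [], 0
    for attrs in ranked:
        cost = size_of(attrs)
        if used + cost <= budget:
            chosen.append(attrs)
            used += cost
    return chosen, used
```

Because every chosen structure must also be maintained on update, a fuller treatment would add a per-index maintenance cost to the ranking; the sketch accounts only for disk space.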



Award # 0431008, COLLABORATIVE RESEARCH: SEI + II (AST): Bypass-Yield Caching for Large-Scale Scientific Database Workloads in the World-Wide Telescope.