Talk title: Building Modern HPC Storage for Data-intensive Applications
Talk time: January 3, 2018, 14:00
Abstract: Modern scientific discoveries rely heavily on large-scale data processing, making the storage system one of the most critical components in high-performance computing (HPC) systems. To facilitate scientific applications, we focused on two aspects of the modern HPC storage infrastructure: 1) fast data access; 2) efficient data management. Regarding data access performance, we focused on solving the I/O interference issues in large-scale parallel systems. We proposed a randomized I/O scheduling algorithm and validated its advantages on real-world workloads. Regarding data management, we focused on the concept of rich metadata, which records detailed run-time data access activities in addition to the traditional POSIX metadata. Such information can be used efficiently to understand how data were processed, how a result was generated, or how a discovery can be losslessly reproduced. To build such an infrastructure, we conducted a series of studies on an adaptive rich metadata granularity model and tracing tool, a graph-based rich metadata storage engine with new incremental graph partitioning algorithms, and a rich metadata query engine based on asynchronous graph traversal. Compared with state-of-the-art solutions, our system paved the way to a practical rich metadata management system with better performance and scalability.
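Dong Dai (代栋) received his bachelor's degree in Computer Science and Technology and his Ph.D. in Computer Architecture from the University of Science and Technology of China. After completing his Ph.D. in 2013, he moved to Texas Tech University in the United States for postdoctoral research, and he is now a research assistant professor there. His research focuses on high-performance computing and storage systems, in particular parallel file systems, I/O scheduling and optimization, metadata management for HPC systems, and large-scale graph storage. He has published numerous papers in mainstream international journals and conferences in these areas. His core results have appeared multiple times at top international conferences in high-performance and parallel computing, including the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), the ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), and Parallel Architectures and Compilation Techniques (PACT).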