Software-defined Hardware Without Sacrificing Performance
Matthew Feldman (Electrical engineer)
Published
Stanford University, 2021
URL
http://books.google.com.hk/books?id=fCygzgEACAAJ&hl=&source=gbs_api
Abstract
In recent years, field programmable gate arrays (FPGAs) have emerged as popular accelerators for certain kinds of compute workloads. FPGAs allow the programmer to define and implement digital circuits that are specialized for arbitrary computation. They exhibit performance-per-watt advantages over traditional instruction-based architectures, like CPUs and GPUs, and are more flexible than customized application-specific integrated circuits (ASICs). Traditionally, the best languages for programming FPGAs rely on the programmer to describe their circuit in a register-transfer language (RTL), such as Verilog or VHDL. This level of abstraction is cumbersome and makes it difficult for the programmer to design their application quickly and easily. It is rare for a programmer who is an expert in a field such as machine learning to also be an expert in writing RTL. This mismatch is the motivation for the work in this thesis.

The work presented in this thesis envisions a new class of performance-oriented programmers who wish to design highly efficient applications on FPGAs without requiring intimate knowledge of RTL. A variety of new high-level languages aim to bridge this gap between performance and abstraction. One approach is high-level synthesis (HLS) tools, which allow the programmer to write code in languages like C with pragmas that describe how the code should map to hardware. While these tools can generate highly efficient hardware designs, their APIs can be too restrictive for programmers who cannot formulate their whole application within the domain.

In this work, we introduce tools and optimizations built on top of Spatial, a high-level language for programming FPGAs and other reconfigurable dataflow architectures (RDAs), such as Plasticine. We show how these tools help the programmer achieve up to 22x better performance while improving resource utilization by a factor of up to 2.8x on a variety of standard benchmarks. We then introduce a memory partitioning tool that quickly solves for highly efficient partitioning schemes. We show that this tool automatically improves LUT utilization by up to 86%, BRAM utilization by up to 38%, and almost always eliminates all DSPs from the memory partitioning logic on a variety of benchmarks, compared to other state-of-the-art tools. Next, we show how these new components in Spatial can be used to design a data compression kernel that solves a real-world data compression problem. We explored a variety of classes of machine learning kernels and swept the parameter space of each to characterize it on ML accuracy, latency, and resource utilization. Finally, we describe a series of enhancements added to the Spatial language and compiler that enable new kinds of applications and make it possible to use Spatial in more diverse environments.
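As a rough illustration of the "C with pragmas" HLS style the abstract contrasts with Spatial (not taken from the thesis itself), the sketch below writes a dot product for a Xilinx/Vitis-style HLS tool. The function name, buffer size, and banking factor are illustrative assumptions; the pragmas request loop pipelining and cyclic memory partitioning, the kind of banking decision the thesis's memory partitioning tool makes automatically within Spatial.

    #include <cstdint>

    // Minimal HLS-style sketch (illustrative only, not from the thesis).
    // Computes a dot product over 1024-element on-chip buffers.
    int32_t dot_product(const int32_t *a, const int32_t *b) {
        int32_t buf_a[1024];
        int32_t buf_b[1024];
        // Bank each buffer so several elements can be read per cycle;
        // the partitioning factor here is chosen by hand.
    #pragma HLS ARRAY_PARTITION variable=buf_a cyclic factor=4
    #pragma HLS ARRAY_PARTITION variable=buf_b cyclic factor=4

        // Copy inputs into on-chip memory.
        for (int i = 0; i < 1024; ++i) {
    #pragma HLS PIPELINE II=1
            buf_a[i] = a[i];
            buf_b[i] = b[i];
        }

        // Fully pipelined accumulation loop.
        int32_t acc = 0;
        for (int i = 0; i < 1024; ++i) {
    #pragma HLS PIPELINE II=1
            acc += buf_a[i] * buf_b[i];
        }
        return acc;
    }

In Spatial, by contrast, the programmer expresses on-chip memories and parallel patterns directly, and the compiler's banking analysis chooses partitioning schemes automatically, which is the mechanism behind the LUT, BRAM, and DSP improvements quoted above.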