Enabling Efficient Parallelism for Applications with Dependences and Irregular Memory Accesses
Peng Jiang (Professor of computer science)
Published
Ohio State University, 2019
URL
http://books.google.com.hk/books?id=fOXzzQEACAAJ&hl=&source=gbs_api
Description
Another problem that prevents many applications from exploiting fine-grain parallelism is irregular memory access. Particle simulations, unstructured grid computations, sparse matrix multiplication, and iterative graph algorithms are examples of such applications. Previous work mainly relies on tiling and data reorganization to improve data locality and resolve data conflicts in SIMD processing of these applications. However, the overhead of data reorganization can sometimes be so high that it negates the benefit of SIMD processing itself. We provide a technique that reuses data reorganization information to amortize this overhead for a class of dynamic or adaptive irregular applications. We also present a vectorization method called bucketized hashing with offsets to accelerate hash-based aggregation, which suffers from data conflicts but cannot afford any data reorganization because the computation has only one iteration. For emerging platforms, we further present a technique called in-vector reduction, which uses the conflict-detection feature of the Intel AVX-512 instruction set to resolve data conflicts in associative irregular applications and achieve high SIMD utilization with little or no data reorganization overhead. Sparse matrix multiplication represents another important class of irregular applications. A particular type of sparse matrix computation that involves multiplying a sparse matrix by one or more dense matrices (e.g., SpMM and SDDMM) has emerged in recent years because of its importance in machine learning and data mining applications. We observe that current techniques for accelerating SpMM and SDDMM deliver good performance only when the sparse matrix has a clustered structure. Based on this observation, we propose a clustering-based row-reordering technique to further improve the performance of SpMM and SDDMM on GPUs.
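
For readers unfamiliar with the two kernels named in the description, the following is a minimal scalar sketch of what SpMM and SDDMM compute. It is not the thesis's GPU implementation and does not include the row-reordering technique. The CSR storage layout and the names `spmm_csr`, `sddmm_csr`, `indptr`, `indices`, and `data` are assumptions chosen for illustration.

```python
def spmm_csr(indptr, indices, data, dense):
    """SpMM: multiply a CSR sparse matrix (n_rows x n_cols) by a dense matrix (n_cols x k)."""
    n_rows = len(indptr) - 1
    k = len(dense[0]) if dense else 0
    out = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = out[i]
        # Visit only the stored nonzeros of row i.
        for p in range(indptr[i], indptr[i + 1]):
            j, a_ij = indices[p], data[p]
            dense_row = dense[j]  # irregular access: j follows the sparsity pattern
            for c in range(k):
                row_out[c] += a_ij * dense_row[c]
    return out


def sddmm_csr(indptr, indices, data, x, y):
    """SDDMM: values of S * (X @ Y^T) elementwise, computed only at the nonzeros of S."""
    out = [0.0] * len(data)
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            # Dot product of row i of X with row j of Y.
            dot = sum(xv * yv for xv, yv in zip(x[i], y[j]))
            out[p] = data[p] * dot
    return out


# Hypothetical 3x3 sparse matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

spmm_result = spmm_csr(indptr, indices, data, dense)
sddmm_result = sddmm_csr(indptr, indices, data, dense, dense)
```

In both loops the rows of the dense operand are selected by the column indices of the sparse matrix, which is the irregular access pattern the description refers to. When nearby sparse rows share many column indices, the same dense rows are reused, which is one plausible reading of why a clustered structure helps.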