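Introduction
- OpenACC stands for Open Accelerators.
- Compilers: PGI, now nvc or nvc++ (NVIDIA HPC SDK); installing the PGI compiler on Linux
- Targets GPUs; its directive style resembles the OpenMP model extended to GPUs, and AMD GPUs are supported as well.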
Getting-started example
Sample code
Compiler options
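- -ta=tesla: Compiler option to target NVIDIA Tesla GPUs (older PGI syntax; current nvc/nvc++ releases enable OpenACC with -acc and take GPU settings via -gpu, e.g. -gpu=cc80).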
- -Minfo=accel: Provides feedback about the code generated by the compiler.
Common directives
Loops
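- #pragma acc parallel: Starts parallel execution on the GPU; unlike kernels, the programmer (not the compiler) is responsible for ensuring the code is safe to parallelize.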
- #pragma acc kernels: Identifies a code block for parallelization, allowing the compiler to automatically manage parallelism.
- #pragma acc loop: Used within parallel or kernels regions to indicate loops that should be parallelized.
Functions and variables
- #pragma acc routine: Makes a function callable from GPU code (it can still be called from CPU code).
- #pragma acc declare: Used for declaring variables on the device, creating an implicit data region for the scope of the declaration.
Data transfer
- #pragma acc data: Manages data movement to and from the GPU.
- #pragma acc enter data: Specifies data that should be moved to the GPU.
- #pragma acc exit data: Specifies data to be moved back from the GPU.
- #pragma acc update: Synchronizes data between the host and the GPU.
- copy, copyin, copyout, create, present: Clauses on data constructs that define how data is handled (e.g., whether it is copied to/from the GPU or only allocated there).
Fine-grained thread control
- gang, worker, vector: Used with loop directive to control how loop iterations are distributed over parallel execution units.
- collapse(n): Collapses n tightly nested loops into a single iteration space to expose more parallelism.
- reduction(operator:list): Performs a reduction operation (like sum, max) across parallel elements.