【HPC】 OpenACC介绍

主要介绍OpenACC

References

介绍

  • Open Accelerators,OpenACC
  • 编译器:PGI,nvc或nvc++,linux下PGI编译器安装
  • 针对GPU,OpenMP模型在GPU上的扩展,支持AMD GPU

入门例程

示例代码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
// 代码 1:test.cpp,使用命令编译:nvc++ -o test -gpu=mem:managed test.cpp
#include <iostream>
#include <vector>
#include <openacc.h>

// Function to initialize the vectors with values
void initialize(std::vector<double>& a, std::vector<double>& b, int n) {
	for(int i = 0; i < n; ++i) {
		a[i] = static_cast<double>(i);
		b[i] = static_cast<double>(2 * i);
	}
}

// detect if GPU is actually running
void detect_gpu()
{
	double a[100], b[100];
	#pragma acc parallel loop
	for (int i = 0; i < 100; ++i) {
		if (i == 10) {
			if (acc_on_device(acc_device_not_host))
				printf("Executing on GPU.\n");
			else
				printf("Not executing on GPU.\n");
		}
		a[i] += b[i];
	}
}

int main() {
	const int n = 1000000; // Size of the vectors
	std::vector<double> a(n), b(n), c(n);
	double *pa = a.data(), *pb = b.data(), *pc = c.data();

	// Initialize vectors a and b
	initialize(a, b, n);

	detect_gpu();

	// Using OpenACC to offload the following computation to an accelerator
	// and explicitly handle data movement
#pragma acc data copyin(pa[0:n], pb[0:n]) copyout(pc[0:n])
	{
#pragma acc parallel loop
		for(int i = 0; i < n; ++i)
			pc[i] = pa[i] + pb[i];
	}

	// Display the first 10 results
	for(int i = 0; i < 10; ++i) {
		std::cout << "c[" << i << "] = " << c[i] << std::endl;
	}
}

编译器选项

  • -ta=tesla: Compiler option to target NVIDIA Tesla GPUs.
  • -Minfo=accel: Provides feedback about the code generated by the compiler.

常用命令

循环

  • #pragma acc parallel: GPU 并行运算
  • #pragma acc kernels: Identifies a code block for parallelization, allowing the compiler to automatically manage parallelism.
  • #pragma acc loop: Used within parallel or kernels regions to indicate loops that should be parallelized. 函数和变量
  • #pragma acc routine: 让一个函数可以在 GPU 代码中被调用(也可以在 CPU 代码调用)。
  • #pragma acc declare: Used for declaring variables or creating a data region. 数据传输
  • #pragma acc data: Manages data movement to and from the GPU.
  • #pragma acc enter data: Specifies data that should be moved to the GPU.
  • #pragma acc exit data: Specifies data to be moved back from the GPU.
  • #pragma acc update: Synchronizes data between the host and the GPU.
  • copy, copyin, copyout, create, present: Clauses for data construct to define how data is handled (e.g., whether it’s copied to/from the GPU or just created there). 线程精细控制
  • gang, worker, vector: Used with loop directive to control how loop iterations are distributed over parallel execution units.
  • collapse(n): Collapses nested loops to enhance parallelism.
  • reduction(operator:list): Performs a reduction operation (like sum, max) across parallel elements.
Licensed under CC BY-NC-SA 4.0
最后更新于 Feb 23, 2025 11:00 +0800
loveleaves
使用 Hugo 构建
主题 StackJimmy 设计