This is a short post on how to view the actual compiler-generated CUDA code when running OpenACC on NVIDIA hardware. It is a warm-up for my upcoming post on where data goes with OpenACC's various data directives.
When compiling an OpenACC-accelerated program, here's what a canonical compile command looks like for the PGI compiler:
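As a rough sketch, such a build might look like the following. The source and output names (`base64.c`, `base64`) are placeholders, and `-Minfo=accel` is an optional extra assumed here for illustration; it asks the compiler to report what it offloaded. The script only prints the command unless `pgcc` and the source file are actually present.

```shell
#!/bin/sh
# Placeholder names (assumptions); substitute your own source file and binary.
SRC=base64.c
OUT=base64

# -acc          enable OpenACC directives
# -ta=nvidia    target NVIDIA GPUs
# -Minfo=accel  (assumed extra) report what the compiler accelerated
BASE_CMD="pgcc -acc -ta=nvidia -Minfo=accel $SRC -o $OUT"
echo "$BASE_CMD"

# Only attempt the build when the PGI compiler and the source file exist.
if command -v pgcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $BASE_CMD
fi
```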
With this set of options, the intermediate PTX or CUDA code is not visible to the user. However, if we add keepgpu,nollvm to the -ta=nvidia option, the compiler dumps those files for us to see. With keepgpu alone, you get only the PTX source code and binaries.
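The modified build described above can be sketched as follows. Again, the file names and `-Minfo=accel` are assumptions for illustration. Depending on the compiler release, the sub-options may need to follow a colon instead (for example `-ta=tesla:keepgpu,nollvm`), so treat the exact spelling as something to check against your compiler's documentation.

```shell
#!/bin/sh
# Placeholder names (assumptions); substitute your own source file and binary.
SRC=base64.c
OUT=base64

# keepgpu  keep the intermediate GPU files instead of deleting them
# nollvm   per the text above, needed so the CUDA source is kept, not only PTX
TA_OPTS="nvidia,keepgpu,nollvm"
KEEP_CMD="pgcc -acc -ta=$TA_OPTS -Minfo=accel $SRC -o $OUT"
echo "$KEEP_CMD"

# Only attempt the build when the PGI compiler and the source file exist.
if command -v pgcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $KEEP_CMD
    # Newest files first; the kept intermediate files should be among them.
    ls -lt | head -n 10
fi
```

Listing the directory by modification time is just a quick way to spot the files the compiler left behind without assuming their exact names.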
As an example, consider the following OpenACC-annotated function:

    unsigned int encode_block(char *input, unsigned int size, char *output) {
        // ....shortened for brevity....
        #pragma acc data present(input[0:size]), present(base64_LUT[64]), copyout(output[0:4*size/3])
        #pragma acc kernels
        #pragma acc loop private(decoded_octets, k)
        for (i = 0; i < size; i = i + 3) {
            // Calculate the output array position based
            // on the input array position (loop iteration)
            k = (4 * i) / 3;
            decoded_octets[0] = input[i] >> 2;
            output[k] = base64_LUT[decoded_octets[0]];
            // ....shortened for brevity....
Compiled with these options, it would generate an intermediate file that looks like this: