Profiling PyTorch with NVIDIA Deep Learning Profiler (DLProf)

DLProf is a tool from NVIDIA for profiling deep learning workloads written in major frameworks such as TensorFlow and PyTorch. It provides detailed profiling of both CPU and GPU activity. This makes it useful for checking how efficiently training uses the hardware and for analyzing custom CUDA code. DLProf can also offer suggestions on how to improve the code.

Setting up DLProf

To use DLProf, it is recommended to run it inside the NGC PyTorch Docker container, which comes with all DLProf features installed and ready to run.

As of January 2022, the supported PyTorch container version is 21.11. It can be downloaded with the following Docker command:

docker pull nvcr.io/nvidia/pytorch:21.11-py3
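To confirm that an environment (for example, a running NGC container) actually provides what the later steps need, a small stdlib-only check can be run with Python inside it. This is a minimal sketch. The helper names `missing_tools` and `module_available` are hypothetical. The list of required components is an assumption taken from the commands and module used later in this guide: `dlprof`, `dlprofviewer`, and `nvidia_dlprof_pytorch_nvtx`.

```python
import importlib.util
import shutil

# Assumption: these are the components used later in this guide.
REQUIRED_TOOLS = ("dlprof", "dlprofviewer")      # command-line tools
REQUIRED_MODULE = "nvidia_dlprof_pytorch_nvtx"   # imported by the profiled script


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def module_available(name=REQUIRED_MODULE):
    """Return True if `name` can be imported by this Python interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    problems = missing_tools()
    if not module_available():
        problems.append(REQUIRED_MODULE)
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All DLProf components found.")
```

Outside the container, this check is expected to report the components as missing.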


Preparing PyTorch Code

Before running the DLProf container, the PyTorch code must be updated to import and initialize the nvidia_dlprof_pytorch_nvtx module by adding the following lines near the top of the script:

import torch
import nvidia_dlprof_pytorch_nvtx

nvidia_dlprof_pytorch_nvtx.init()

The training or inference loop to be profiled must then be wrapped in PyTorch's NVTX context manager:

with torch.autograd.profiler.emit_nvtx():
    ...  # run the training or inference loop to be profiled here
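If the same script should also run outside the DLProf container, one option is to make the DLProf setup conditional. This is a sketch under that assumption, not part of DLProf itself. The hypothetical helper `init_dlprof_if_available` calls `nvidia_dlprof_pytorch_nvtx.init()` only when the module is installed. The hypothetical helper `nvtx_context` returns `torch.autograd.profiler.emit_nvtx()` only when profiling is on, and returns a no-op context otherwise.

```python
import contextlib
import importlib.util


def init_dlprof_if_available():
    """Call nvidia_dlprof_pytorch_nvtx.init() if the module is installed.

    Returns True if DLProf instrumentation was initialized, False otherwise.
    """
    if importlib.util.find_spec("nvidia_dlprof_pytorch_nvtx") is None:
        return False
    import nvidia_dlprof_pytorch_nvtx
    nvidia_dlprof_pytorch_nvtx.init()
    return True


def nvtx_context(enabled):
    """Return emit_nvtx() when profiling is enabled, else a no-op context."""
    if not enabled:
        return contextlib.nullcontext()
    import torch  # only imported when profiling is actually enabled
    return torch.autograd.profiler.emit_nvtx()


if __name__ == "__main__":
    profiling = init_dlprof_if_available()
    with nvtx_context(profiling):
        pass  # the training or inference loop to be profiled goes here
```

If torch is always importable, you can get the same on/off behavior by passing `enabled=profiling` to `emit_nvtx`. The lazy import above keeps the no-profiling path free of any torch dependency.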


Launching DLProf

Once the previous steps have been completed, the code can now be profiled.

docker run --rm --gpus=1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it -p8000:8000 -v /path/to/code/on/host:/data nvcr.io/nvidia/pytorch:21.11-py3
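A small helper can rebuild the `docker run` command above as an argument list. This is useful for scripting the launch or for documenting what each flag does. It is a sketch: `docker_run_args` is a hypothetical name, and the comments describe standard Docker options rather than anything specific to DLProf.

```python
import shlex

IMAGE = "nvcr.io/nvidia/pytorch:21.11-py3"


def docker_run_args(host_dir, image=IMAGE, port=8000):
    """Build the argument list for the docker run command shown above."""
    return [
        "docker", "run",
        "--rm",                        # delete the container when it exits
        "--gpus=1",                    # expose one GPU to the container
        "--shm-size=1g",               # size of /dev/shm, used e.g. by DataLoader workers
        "--ulimit", "memlock=-1",      # no limit on locked memory
        "--ulimit", "stack=67108864",  # 64 MiB stack size limit
        "-it",                         # interactive session with a terminal
        f"-p{port}:{port}",            # publish the DLProf viewer port
        "-v", f"{host_dir}:/data",     # mount the code directory at /data
        image,
    ]


if __name__ == "__main__":
    # Prints the same command line as above (shlex.join needs Python 3.8+).
    print(shlex.join(docker_run_args("/path/to/code/on/host")))
```

Passing the list to `subprocess.run` avoids shell quoting issues if the host path contains spaces.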

Once the container has been launched, navigate to the /data directory and start DLProf:

dlprof --force true --output_path ./dlprof --profile_name test python code.py

--force true will automatically overwrite previous results with the same name

--output_path sets the output directory to store the results

--profile_name sets the name of this profiling session for output files
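When several profiling runs are needed, the `dlprof` invocation can be generated from Python. This is a minimal sketch that uses only the three options described above. The helper names `dlprof_args` and `run_dlprof` are hypothetical. It assumes the script being profiled lives in `/data`, as in the container setup above.

```python
import shlex
import subprocess


def dlprof_args(script, output_path="./dlprof", profile_name="test", force=True):
    """Build the dlprof command line used above to profile `python <script>`."""
    args = ["dlprof"]
    if force:
        args += ["--force", "true"]  # overwrite earlier results with the same name
    args += ["--output_path", output_path, "--profile_name", profile_name]
    return args + ["python", script]


def run_dlprof(script, workdir="/data", **options):
    """Run dlprof from `workdir`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(dlprof_args(script, **options), cwd=workdir, check=True)


if __name__ == "__main__":
    # Prints the same command line as the dlprof example above.
    print(shlex.join(dlprof_args("code.py")))
```

For example, `run_dlprof("code.py", profile_name="baseline")` would store a separately named profile next to earlier ones.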


Analyzing Results

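DLProf has a built-in results viewer, DLProf Viewer, that can be accessed through a web browser. It can be launched with the following command, run from the directory given by --output_path, with profile_name replaced by the value passed to --profile_name: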

dlprofviewer -b 0.0.0.0 ./profile_name_dldb.sqlite

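-b 0.0.0.0 binds the server to all network interfaces. This allows the results to be viewed from a remote machine and, through the -p8000:8000 mapping in the docker run command, from outside the container.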

-p changes the port from the default of 8000; if it is changed, the port mapping in the docker run command must be changed to match
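The viewer command can also be assembled from the profiling settings. This sketch assumes the database is written into the --output_path directory and follows the `<profile_name>_dldb.sqlite` naming used in the command above. Both `dldb_path` and `dlprofviewer_args` are hypothetical helper names.

```python
import shlex
from pathlib import Path


def dldb_path(output_path, profile_name):
    """Assumed location of the DLProf database for a given profile name."""
    return Path(output_path) / f"{profile_name}_dldb.sqlite"


def dlprofviewer_args(db_path, bind="0.0.0.0", port=8000):
    """Build the dlprofviewer command line using the -b and -p options."""
    return ["dlprofviewer", "-b", bind, "-p", str(port), str(db_path)]


if __name__ == "__main__":
    db = dldb_path("./dlprof", "test")
    print(shlex.join(dlprofviewer_args(db)))
```

If `port` is changed here, the `-p` mapping in the `docker run` command has to use the same value.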

The results can then be viewed by accessing the following address in a web browser:

http://<Host IP Address>:8000
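When the viewer is started from a script, it can help to wait until the address above responds before opening a browser. This is a minimal stdlib sketch. The helper names `viewer_url` and `wait_for_viewer` are hypothetical. It assumes the viewer answers a plain GET on its root URL with HTTP 200.

```python
import time
import urllib.request


def viewer_url(host, port=8000):
    """URL of the DLProf viewer running on `host`."""
    return f"http://{host}:{port}"


def wait_for_viewer(url, timeout=30.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # server not up yet or unreachable (URLError subclasses OSError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_viewer(viewer_url("192.0.2.10"))` returns True once the viewer on that host is serving, or False after 30 seconds.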