GPU Programming with Python


In this article, we’ll dive into GPU programming with Python. Using the ease of Python, you can unlock the incredible computing power of your video card’s GPU (graphics processing unit). In this example, we’ll work with NVIDIA’s CUDA library.


For this exercise, you’ll need either a physical machine running Linux with an NVIDIA GPU or a GPU-based instance on Amazon Web Services. Either should work fine, but if you choose a physical machine, make sure the NVIDIA proprietary drivers are installed. See the instructions:

You’ll also need the CUDA Toolkit installed. This example uses Ubuntu 16.04 LTS specifically, but there are downloads available for most major Linux distributions at the following URL:

I prefer the .deb based download, and these examples will assume you chose that route. The file you download is a .deb package but doesn’t have a .deb extension, so renaming it to end in .deb is helpful. Then you install it with:

sudo dpkg -i package-name.deb

If you are prompted about installing a GPG key, please follow the instructions given to do so.

Now you’ll need to install the cuda package itself. To do so, run:

sudo apt-get update
sudo apt-get install cuda -y

This part can take a while, so you might want to grab a cup of coffee. Once it’s done, I recommend rebooting to ensure all modules are properly reloaded.
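
Once the machine is back up, a quick sanity check can confirm that the driver and toolkit binaries are where you expect them. The sketch below is not part of the original setup. It assumes the toolkit landed in /usr/local/cuda/bin (a common default for the .deb packages, but verify it on your system), and find_tool is a hypothetical helper name.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path to an executable, or None if it cannot be found.

    Looks on PATH first, then in any extra directories supplied.
    """
    path = shutil.which(name)
    if path:
        return path
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Assumption: the CUDA Toolkit installed into /usr/local/cuda.
CUDA_BIN_DIRS = ["/usr/local/cuda/bin"]

if __name__ == "__main__":
    # nvidia-smi ships with the driver, nvcc with the toolkit.
    for tool in ("nvidia-smi", "nvcc"):
        location = find_tool(tool, CUDA_BIN_DIRS)
        print("%-10s %s" % (tool, location or "not found"))
```

If nvcc is reported only under the extra directory, the toolkit is installed but not on your PATH, which is usually fine for the Numba workflow used here.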

Next, you’ll need the Anaconda Python distribution. You can download that here:

Grab the 64-bit version and install it like this:

sh Anaconda*.sh

(The star in the above command is a shell wildcard, so the command runs regardless of the exact version in the installer’s file name.)

The default install location should be fine, and in this tutorial, we’ll use it. By default, it installs to ~/anaconda3

At the end of the install, you’ll be prompted to decide if you wish to add Anaconda to your path. Answer yes here to make running the necessary commands easier. To ensure this change takes place, after the installer finishes completely, log out then log back in to your account.
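
After logging back in, you may want to confirm that the python on your path is the Anaconda one. The following is a minimal sketch, assuming the default ~/anaconda3 location mentioned above; is_under_prefix is a made-up helper name.

```python
import os
import sys


def is_under_prefix(executable, prefix):
    """Return True if the executable path lives inside the given directory."""
    executable = os.path.realpath(executable)
    prefix = os.path.realpath(os.path.expanduser(prefix))
    return os.path.commonpath([executable, prefix]) == prefix


if __name__ == "__main__":
    # Assumption: Anaconda was installed to its default location.
    print("Interpreter:", sys.executable)
    print("Anaconda Python:", is_under_prefix(sys.executable, "~/anaconda3"))
```

Run it with plain `python`. If the second line prints False, the path change has probably not taken effect in your current session.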

More info on Installing Anaconda:

Finally, we’ll need to install Numba. Numba uses the LLVM compiler to compile Python to machine code. This not only enhances the performance of regular Python code but also provides the glue necessary to send instructions to the GPU in binary form. To install it, run:

conda install numba
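
Before moving on, it can be worth checking that Numba actually sees your GPU. The sketch below relies on numba.cuda.is_available(), which reports whether a usable CUDA device was found. The function name describe_gpu_support is just for illustration.

```python
def describe_gpu_support():
    """Return a short, human-readable summary of Numba's CUDA support."""
    try:
        import numba
        from numba import cuda
    except ImportError:
        return "Numba is not installed in this Python environment."

    if cuda.is_available():
        return "Numba %s found a CUDA-capable GPU." % numba.__version__
    return ("Numba %s is installed, but no usable CUDA GPU was detected."
            % numba.__version__)


if __name__ == "__main__":
    print(describe_gpu_support())
```

If no GPU is detected on a machine that has one, the driver or toolkit installation is the first place to look.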

Limitations and Benefits of GPU Programming

It’s tempting to think that we can convert any Python program into a GPU-based program, dramatically accelerating its performance. However, the GPU on a video card works considerably differently than a standard CPU in a computer.

CPUs handle a lot of different inputs and outputs and have a wide assortment of instructions for dealing with these situations. They also are responsible for accessing memory, dealing with the system bus, handling protection rings, segmenting, and input/output functionality. They are extreme multitaskers with no specific focus.

GPUs, on the other hand, are built to process simple functions at blindingly fast speed. To accomplish this, they expect a more uniform state of input and output, and they achieve it by specializing in scalar functions. A scalar function takes one or more inputs but returns only a single output. Both the inputs and the output must be of types pre-defined by NumPy, such as float32.
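
To make the idea of a scalar function concrete, here is a small CPU-only sketch. It uses NumPy’s np.vectorize as a stand-in for what the GPU does with such a function: apply it independently to every element. The name scale_and_shift is invented for illustration, and np.vectorize itself is a convenience wrapper around a Python loop, not a fast path.

```python
import numpy as np


def scale_and_shift(x, y):
    """A scalar function: two numbers in, exactly one number out."""
    return 2.0 * x + y


# np.vectorize applies the scalar function to each pair of elements.
# otypes pins the output to a fixed NumPy type, here float32.
elementwise = np.vectorize(scale_and_shift, otypes=[np.float32])

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([10.0, 20.0, 30.0], dtype=np.float32)
result = elementwise(a, b)

print(result)        # [12. 24. 36.]
print(result.dtype)  # float32
```

Because each output element depends only on the matching input elements, the work can be split across many GPU threads without any coordination between them.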

Example Code

In this example, we’ll create a simple function that takes two arrays of values, adds them together element by element, and returns the resulting array. To demonstrate the power of the GPU, we’ll run one version of this function on the CPU and one on the GPU and display the times. The documented code is below:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# This should be a substantially high value. On my test machine, this took
# 33 seconds to run via the CPU and just over 3 seconds on the GPU.
NUM_ELEMENTS = 100000000

# This is the CPU version.
def vector_add_cpu(a, b):
    c = np.zeros(NUM_ELEMENTS, dtype=np.float32)
    for i in range(NUM_ELEMENTS):
        c[i] = a[i] + b[i]
    return c

# This is the GPU version. Note the @vectorize decorator. This tells
# numba to turn this into a GPU vectorized function.
@vectorize(["float32(float32, float32)"], target='cuda')
def vector_add_gpu(a, b):
    return a + b

def main():
    a_source = np.ones(NUM_ELEMENTS, dtype=np.float32)
    b_source = np.ones(NUM_ELEMENTS, dtype=np.float32)

    # Time the CPU function
    start = timer()
    vector_add_cpu(a_source, b_source)
    vector_add_cpu_time = timer() - start

    # Time the GPU function
    start = timer()
    vector_add_gpu(a_source, b_source)
    vector_add_gpu_time = timer() - start

    # Report times
    print("CPU function took %f seconds." % vector_add_cpu_time)
    print("GPU function took %f seconds." % vector_add_gpu_time)

    return 0

if __name__ == "__main__":
    main()
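
The script above discards the arrays it computes, so it never confirms that the results are correct. A minimal check might look like the sketch below. The helper check_vector_add is hypothetical, and np.add stands in for the function under test so that the snippet runs without a GPU. On a CUDA machine, passing vector_add_gpu instead should work the same way.

```python
import numpy as np


def check_vector_add(add_fn, num_elements=1000):
    """Return True if add_fn(a, b) matches element-wise addition."""
    rng = np.random.default_rng(seed=0)
    a = rng.random(num_elements, dtype=np.float32)
    b = rng.random(num_elements, dtype=np.float32)
    expected = a + b
    actual = np.asarray(add_fn(a, b))
    return actual.shape == expected.shape and bool(np.allclose(actual, expected))


if __name__ == "__main__":
    # np.add is only a stand-in for the function being verified.
    print(check_vector_add(np.add))  # True
```

A small array is enough here, since the goal is to verify the arithmetic rather than to measure speed.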

To run the example, type:


NOTE: If you run into issues when running your program, try using “conda install accelerate”.

As you can see, the CPU version runs considerably slower.

If it doesn’t, your workload is too small. Raise NUM_ELEMENTS to a larger value (on my machine, the break-even point seemed to be around 100 million). Setting up the GPU and moving the data to it takes a small but noticeable amount of time, so the workload has to be large enough to make the operation worthwhile. Once you raise it above the threshold for your machine, you’ll notice a substantial performance improvement of the GPU version over the CPU version.
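
One way to locate that threshold on your own machine is to time both versions over a range of sizes. The harness below is a rough sketch, not the method used for the numbers quoted in this article. The names time_call, sweep_sizes, and loop_add are invented for illustration, and np.add is used as the second function only so the snippet runs without a GPU.

```python
import numpy as np
from timeit import default_timer as timer


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time, in seconds, over several calls."""
    best = float("inf")
    for _ in range(repeats):
        start = timer()
        fn(*args)
        best = min(best, timer() - start)
    return best


def sweep_sizes(fn_a, fn_b, sizes):
    """Time two element-wise add functions over a range of array sizes.

    Returns a list of (size, seconds_a, seconds_b) tuples.
    """
    results = []
    for n in sizes:
        a = np.ones(n, dtype=np.float32)
        b = np.ones(n, dtype=np.float32)
        results.append((n, time_call(fn_a, a, b), time_call(fn_b, a, b)))
    return results


def loop_add(a, b):
    """Plain Python loop, similar in spirit to the CPU version above."""
    c = np.zeros(len(a), dtype=np.float32)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


if __name__ == "__main__":
    # Substitute vector_add_gpu for np.add on a CUDA machine.
    for n, t_loop, t_other in sweep_sizes(loop_add, np.add, [1000, 10000, 100000]):
        print("%8d elements: loop %.6f s, other %.6f s" % (n, t_loop, t_other))
```

Taking the best of several runs reduces the effect of one-off costs such as GPU initialization on the first call, which would otherwise skew the smaller sizes.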


I hope you’ve enjoyed this basic introduction to GPU programming with Python. Though the example above is trivial, it provides the framework you need to take your ideas further using the power of your GPU.
