Search

This intent of this example is to provide very basic code that teaches the mechanics of writing GPGPU software. While the computations are unnecessary, it provides side-by-side comparisons of several computing languages and the manner in which a GPU can be accessed.

## Software

The software is hosted at GitHub: Basic GPU Repository Page.

The recommended method of obtaining the software is by cloning the repository with Git. Otherwise, a ZIP archive can be downloaded from the repository page.

The repository contains several different implementations of the problem described in the next section:

• Single-threaded, sequential C++, making use of the Eigen template library for linear algebra computations.
• Single-threaded, sequential Matlab. This is done to compare with what the majority of economists would use to solve the problem.
• Thrust, using the OpenMP backend to solve the problem in parallel on several CPU cores.
• Thrust, using the CUDA backend to solve the problem in parallel on the GPU.
• CUDA C.

The side-by-side implementations allow users to learn the basics of massively parallel programming by comparing code with familiar benchmarks (Matlab or sequential C++).

## Problem Description

Consider the second-order polynomial

$$y = ax^2 + bx + c. \label{objective}$$

The figure below depicts a range of these polynomials for $a \in [-0.9, -0.1]$, $b = 2.3$ and $c = 5.4$, where the darkest line corresponds to the case $a = -.0.1$.

For a given set of parameter values, it is trivial to show that the maximum occurs at

$$x = \frac{b}{2a}. \label{max}$$

However, for the purposes of demonstrating the mechanics of parallel computing, the software uses Newton's Method to numerically compute the maximum for each $a \in [-0.9, -0.1]$.

## Algorithm

1. Discretize the range of the second-order parameter, $a$, so that it is an equally-spaced grid of $N$ values between -0.9 and -0.1. Denote the grid $\mathcal{A}$.
2. for each $a \in \mathcal{A}$
3.     Find the maximum of Equation \eqref{objective} via Newton's Method.
4. end for

A basic implementation would involve computing the maxima in Step 3 in a serial fashion for each value of $a \in \mathcal{A}$ in the loop at Step 2. Alternatively, with many parallel processors available, the loop at Step 2 could be eliminated and the optimization at Step 3 could be assigned to an individual processor and computed in parallel for each $a \in \mathcal{A}$.