This intent of this example is to provide very basic code that teaches the mechanics of writing GPGPU software. While the computations are unnecessary, it provides side-by-side comparisons of several computing languages and the manner in which a GPU can be accessed.
Software
The software is hosted at GitHub: Basic GPU Repository Page.
The recommended method of obtaining the software is by cloning the repository with Git. Otherwise, a ZIP archive can be downloaded from the repository page.
The repository contains several different implementations of the problem described in the next section:
- Single-threaded, sequential C++, making use of the Eigen template library for linear algebra computations.
- Single-threaded, sequential Matlab. This is done to compare with what the majority of economists would use to solve the problem.
- Thrust, using the OpenMP backend to solve the problem in parallel on several CPU cores.
- Thrust, using the CUDA backend to solve the problem in parallel on the GPU.
- CUDA C.
The side-by-side implementations allow users to learn the basics of massively parallel programming by comparing code with familiar benchmarks (Matlab or sequential C++).
Problem Description
Consider the second-order polynomial
\begin{equation} y = ax^2 + bx + c. \label{objective} \end{equation}
The figure below depicts a range of these polynomials for $a \in [-0.9, -0.1]$, $b = 2.3$ and $c = 5.4$, where the darkest line corresponds to the case $a = -.0.1$.

For a given set of parameter values, it is trivial to show that the maximum occurs at
\begin{equation} x = \frac{b}{2a}. \label{max} \end{equation}
However, for the purposes of demonstrating the mechanics of parallel computing, the software uses Newton's Method to numerically compute the maximum for each $a \in [-0.9, -0.1]$.
Algorithm
- Discretize the range of the second-order parameter, $a$, so that it is an equally-spaced grid of $N$ values between -0.9 and -0.1. Denote the grid $\mathcal{A}$.
- for each $a \in \mathcal{A}$
- Find the maximum of Equation \eqref{objective} via Newton's Method.
- end for
A basic implementation would involve computing the maxima in Step 3 in a serial fashion for each value of $a \in \mathcal{A}$ in the loop at Step 2. Alternatively, with many parallel processors available, the loop at Step 2 could be eliminated and the optimization at Step 3 could be assigned to an individual processor and computed in parallel for each $a \in \mathcal{A}$.