Optimizing the Forward Pass with the RELU_BIAS Epilog
In this section, I demonstrate how to use epilogs to implement a forward pass of a simple linear layer. This layer first multiplies the input vectors by a weights matrix, then adds a bias to each element of the resulting matrix, and finally applies the ReLU activation function.
ReLU, short for Rectified Linear Unit, is a commonly used activation function that replaces negative values with zeros while leaving positive values unchanged.
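For a quick illustration (a minimal sketch using CuPy, which the later snippets also rely on), ReLU can be applied element-wise like this:

import cupy

v = cupy.array([-2.0, 0.0, 3.0])
print(cupy.maximum(v, 0))  # prints [0. 0. 3.]: negative entries become zero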
In terms of matrix operations, the layer can be expressed as follows:

relu(Wx + B)

In the equation, the following definitions are true:

- W represents the weights matrix
- x represents the input vector
- B represents the bias vector
- relu represents the ReLU activation function
Assume that you have your inputs, weights, and bias as CuPy arrays:
import cupy

num_inputs, num_outputs = 784, 100
batch_size = 256

# Weights and bias for the layer, plus a batch of inputs (one vector per column)
weights = cupy.random.rand(num_outputs, num_inputs)  # shape (100, 784)
bias = cupy.random.rand(num_outputs)                 # shape (100,)
x = cupy.zeros((num_inputs, batch_size))             # shape (784, 256)
In the most basic version, you can implement this linear layer by using nvmath-python to calculate the matrix product of the weights and the input, and then handling the bias addition and ReLU with CuPy operations:
from nvmath.linalg.advanced import Matmul

mm = Matmul(weights, x)
mm.plan()

def forward():
    y = mm.execute()            # y = weights @ x
    y += bias[:, cupy.newaxis]  # add the bias to every column
    y[y < 0] = 0                # apply ReLU
    return y
To improve the performance of the code, take advantage of the RELU_BIAS epilog to perform all three operations in a single, fused cuBLAS operation. This epilog first adds the bias to the result of the multiplication and then applies the ReLU function.
You can specify the epilog using the `epilog` argument of the `Matmul.plan` method. Some epilogs, including RELU_BIAS, take extra inputs, which can be specified in the `epilog_inputs` dictionary. For more information about epilogs, see nvmath.linalg.advanced.Matmul.
from nvmath.linalg.advanced import MatmulEpilog

mm = Matmul(weights, x)
mm.plan(epilog=MatmulEpilog.RELU_BIAS, epilog_inputs={"bias": bias})

def forward():
    y = mm.execute()  # matmul, bias addition, and ReLU in a single fused cuBLAS call
    return y
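To get a rough sense of the speedup on your own hardware, you can time both variants with CuPy's benchmarking helper. The following is a minimal sketch that reuses the weights, bias, and x arrays defined earlier; mm_naive, mm_fused, and forward_naive are illustrative names, and the actual speedup depends on your GPU, data types, and problem sizes.

import cupy
from cupyx.profiler import benchmark
from nvmath.linalg.advanced import Matmul, MatmulEpilog

# Unfused variant: matmul in cuBLAS, then bias addition and ReLU as separate CuPy kernels
mm_naive = Matmul(weights, x)
mm_naive.plan()

def forward_naive():
    y = mm_naive.execute()
    y += bias[:, cupy.newaxis]
    y[y < 0] = 0
    return y

# Fused variant: bias addition and ReLU run inside the same cuBLAS operation
mm_fused = Matmul(weights, x)
mm_fused.plan(epilog=MatmulEpilog.RELU_BIAS, epilog_inputs={"bias": bias})

print(benchmark(forward_naive, n_repeat=100))
print(benchmark(mm_fused.execute, n_repeat=100))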
Optimizing the Backward Pass with the DRELU_BGRAD Epilog
In backpropagation, when you know how the loss function depends on the output of a layer, you need to propagate that gradient backward through the layer and also compute the gradient of the loss with respect to the bias.
For more information about the derivations of the formulas used to compute the gradients, see Automatic Differentiation and Neural Networks.
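In terms of the variables used in the code below, these gradients amount to a matrix multiplication, an element-wise mask, and a reduction over the batch axis:

grad_t1 = weights.T @ grad, with entries zeroed wherever t1 < 0
grad_bias = grad_t1 summed over the batch axis (axis=1)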
The operations required to compute these gradients can be implemented as follows:
mm = Matmul(weights.T, grad)
mm.plan()

def backward():
    grad_t1 = mm.execute()                 # weights.T @ grad
    grad_t1[mask] = 0                      # assuming that `mask = (t1 < 0)`
    grad_bias = cupy.sum(grad_t1, axis=1)  # sum over the batch dimension
    return grad_t1, grad_bias
To optimize your backward pass, use the DRELU_BGRAD epilog. Assume that the gradient propagated from the next layer is available as `grad`, and that `relu_mask` is the auxiliary ReLU mask saved during the forward pass. This epilog applies the ReLU mask to the result of the multiplication and returns the bias gradient as an auxiliary output:
mm = Matmul(weights.T, grad)
mm.plan(epilog=MatmulEpilog.DRELU_BGRAD, epilog_inputs={"relu_aux": relu_mask})

def backward():
    grad_t1, aux_outputs = mm.execute()     # masked gradient plus auxiliary outputs
    grad_bias = aux_outputs["drelu_bgrad"]  # bias gradient computed by the epilog
    return grad_t1, grad_bias
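In the snippets above, `relu_mask`, `grad`, and `t1` come from the surrounding network. As a rough end-to-end illustration, the following sketch wires a small two-layer example together. It assumes that the RELU_AUX_BIAS epilog accepts the bias under the "bias" key and returns the ReLU bit-mask in the auxiliary outputs under the "relu_aux" key; W1, W2, b1, and grad_t2 are placeholder names for this illustration.

import cupy
from nvmath.linalg.advanced import Matmul, MatmulEpilog

n_in, n_hidden, n_out, batch = 784, 100, 10, 256
W1 = cupy.random.rand(n_hidden, n_in)
b1 = cupy.random.rand(n_hidden)
W2 = cupy.random.rand(n_out, n_hidden)
x = cupy.random.rand(n_in, batch)

# Forward pass of layer 1, saving the ReLU mask as an auxiliary output
# (assumes RELU_AUX_BIAS returns it under the "relu_aux" key)
mm_fwd = Matmul(W1, x)
mm_fwd.plan(epilog=MatmulEpilog.RELU_AUX_BIAS, epilog_inputs={"bias": b1})
t1, aux = mm_fwd.execute()
relu_mask = aux["relu_aux"]

# Backward through layer 2's weights: grad_t2 stands in for the gradient of the
# loss with respect to layer 2's pre-activation (placeholder values here)
grad_t2 = cupy.random.rand(n_out, batch)
mm_bwd = Matmul(W2.T, grad_t2)
mm_bwd.plan(epilog=MatmulEpilog.DRELU_BGRAD, epilog_inputs={"relu_aux": relu_mask})
grad_t1, aux_outputs = mm_bwd.execute()
grad_b1 = aux_outputs["drelu_bgrad"]  # gradient for layer 1's bias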
Conclusion
With the epilogs of nvmath-python, you can fuse common deep learning computations in your Python code, which can greatly improve performance. For more information, see the nvmath-python: Unleashing the Full Capabilities of NVIDIA Math Libraries within Python documentation. For an end-to-end implementation of a simple neural network with nvmath-python, see the Backpropagation Jupyter notebook on GitHub.
nvmath-python is an open-source library, so feel free to visit the /NVIDIA/nvmath-python GitHub repo and reach out to us there.
Frequently Asked Questions
Q1: What is nvmath-python?
nvmath-python is an open-source Python library that provides Python programmers with access to high-performance mathematical operations from NVIDIA CUDA-X math libraries.
Q2: What is an epilog?
An epilog is an operation that can be fused with a mathematical operation being performed, like FFT or matrix multiplication. Available epilogs cover the most common deep-learning computations.
Q3: How do I use epilogs?
You can use epilogs by specifying the epilog argument of the Matmul.plan method. Some epilogs, including RELU_BIAS, take extra inputs, which can be specified in the epilog_inputs dictionary.
Q4: What is the benefit of using epilogs?
The benefit of using epilogs is that they enable you to fuse common deep-learning computations together in your Python code, which can greatly improve the performance.
Q5: Where can I find more information about nvmath-python?
You can find more information about nvmath-python in the nvmath-python: Unleashing the Full Capabilities of NVIDIA Math Libraries within Python documentation and the Backpropagation Jupyter notebook on GitHub.

