Project 4
Parallel Programming with Machine Learning
Prologue
In this project, you will have the opportunity to gain insight and practice in using OpenACC to accelerate machine learning algorithms. Specifically, you will be accelerating softmax regression and neural networks (NNs).
First, you will need to understand the basic principles and algorithms of softmax regression and neural networks. Then, you will work with OpenACC, a programming model for parallel computing that makes it easier for you to optimize your code to run on GPUs, thereby greatly increasing the speed of computation.
This assignment will help you understand the importance of parallel computing in machine learning, especially when working with large-scale data and complex models. You will learn how to effectively utilize hardware resources to improve the performance and efficiency of machine learning algorithms.
REMINDER: Please start ASAP to avoid the peak period of cluster job submission.
Task0: Setup
Download the dataset from BB. Unzip dataset.zip into the folder project4. The structure of the working directory should look like the following:
$ tree .
.
├── build
│   ├── nn
│   ├── nn_openacc
│   ├── softmax
│   └── softmax_openacc
├── dataset
│   ├── testing
│   │   ├── t10k-images.idx3-ubyte
│   │   └── t10k-labels.idx1-ubyte
│   └── training
│       ├── train-images.idx3-ubyte
│       └── train-labels.idx1-ubyte
├── README.md
├── sbatch.sh
├── src
│   ├── nn_classifier.cpp
│   ├── nn_classifier_openacc.cpp
│   ├── simple_ml_ext.cpp
│   ├── simple_ml_ext.hpp
│   ├── simple_ml_openacc.cpp
│   ├── simple_ml_openacc.hpp
│   ├── softmax_classifier.cpp
│   └── softmax_classifier_openacc.cpp
└── test.sh

5 directories, 20 files
Task1: Train MNIST with softmax regression
Softmax regression, also known as multinomial logistic regression, is an extension of logistic regression to multi-class classification problems. The mathematical expression is as follows:
Suppose we have an input vector $x \in \mathbb{R}^n$, and we want to classify it into one of $K$ different classes.

For each class $j$, we have a weight vector $\theta_j \in \mathbb{R}^n$ and a bias term $b_j$.

We can compute the unnormalized log probability (logit) of $x$ belonging to class $j$ as follows:

$$z_j = x^T \theta_j + b_j$$

This gives us an output vector $z$, where each element $z_j$ represents the unnormalized log probability of $x$ belonging to class $j$.

We can then convert these unnormalized log probabilities into probabilities using the softmax function. The softmax function is defined as follows:

$$p(y = j \mid x) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

This gives us a probability vector $p \in \mathbb{R}^K$, where each element $p_j$ represents the probability of $x$ belonging to class $j$. This is the mathematical expression of softmax regression: it maps an input vector $x$ to a probability vector $p$, where each element represents the probability of belonging to a class.
This process is also known as softmax classification. In practice, we usually choose the class with the highest probability as the predicted class.
For a multi-class output that can take on values $y \in \{1, \dots, k\}$, the softmax loss takes as input a vector of logits $z \in \mathbb{R}^k$ and the true class $y \in \{1, \dots, k\}$, and returns a loss defined by:

$$\ell_{\mathrm{softmax}}(z, y) = \log \sum_{i=1}^{k} \exp z_i - z_y$$

Softmax gradient descent optimization: we need to compute the gradients of the loss function with respect to the weights and biases, and then update the weights and biases. We can also write this in the more compact notation we discussed in class. Namely, if we let $X \in \mathbb{R}^{m \times n}$ denote a design matrix of some $m$ inputs (either the entire dataset or a minibatch), $y \in \{1, \dots, k\}^m$ a corresponding vector of labels, and overload $\ell_{\mathrm{softmax}}$ to refer to the average softmax loss, then

$$\nabla_\Theta \ell_{\mathrm{softmax}}(X\Theta, y) = \frac{1}{m} X^T (Z - I_y)$$

where

$$Z = \mathrm{normalize}(\exp(X\Theta)) \quad \text{(normalization applied row-wise)}$$

denotes the matrix of logits, and $I_y \in \mathbb{R}^{m \times k}$ represents a concatenation of one-hot bases for the labels in $y$.
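To make the row-wise normalization $Z = \mathrm{normalize}(\exp(X\Theta))$ concrete, here is a minimal C++ sketch for logits stored row-major in a flat array. The function name and signature are illustrative only (they are not the required interface of the helper you will implement later); a numerically robust version would also subtract each row's maximum before exponentiating, which the note below says you may skip here.

```cpp
#include <cmath>
#include <cstddef>

// Sketch: exponentiate an m x k row-major matrix of logits in place and
// normalize each row so that it sums to 1 (i.e., row-wise softmax).
void softmax_normalize_rows(float *Z, size_t m, size_t k)
{
    for (size_t i = 0; i < m; ++i) {
        float row_sum = 0.0f;
        for (size_t j = 0; j < k; ++j) {
            Z[i * k + j] = std::exp(Z[i * k + j]);
            row_sum += Z[i * k + j];
        }
        for (size_t j = 0; j < k; ++j) {
            Z[i * k + j] /= row_sum;
        }
    }
}
```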
Here is the given training code in Python:

def softmax_regression_epoch(X, y, theta, lr=0.1, batch=100):
    for i in range(0, X.shape[0], batch):
        X_b = X[i : i + batch]
        h_X_exp = np.exp(np.dot(X_b, theta))
        Z = h_X_exp / np.sum(h_X_exp, axis=1)[:, None]
        Y = np.zeros(Z.shape, np.float32)
        Y[np.arange(y[i : i + batch].size), y[i : i + batch]] = 1
        gradients = np.dot(X_b.T, Z - Y) / batch * lr
        theta -= gradients

Note that for a "real" implementation of the softmax loss you would want to scale the logits to prevent numerical overflow, but we won't worry about that here (the rest of the assignment will work fine even if you don't worry about this).

The functions you need to implement in C++ are:

| Function Declaration | What does the function do |
| softmax_regression_epoch_cpp(X, y, theta, m, n, k, lr, batch) | train softmax regression for 1 epoch |
| train_softmax(train_data, test_data, num_classes, epochs, lr, batch) | train a softmax classifier |

There are some functions inside the softmax_regression_epoch_cpp function that you also need to fill in the details:

| Function Declaration | What does the function do |
| matrix_dot(A, B, C, m, n, k) | perform a matrix multiplication between matrices A and B, with the result stored in matrix C |
| matrix_dot_trans(A, B, C, n, m, k) | perform a matrix multiplication between the transpose of A and B, with the result stored in matrix C |
| matrix_minus(A, B, m, n) | subtract matrix B from matrix A element-wise (A − B), with the result stored in matrix A |
| matrix_softmax_normalize(A, m, n) | apply the softmax activation function to matrix A (row-wise) |
| vector_to_one_hot_matrix(y, Y, m, n) | convert a vector y into a one-hot encoded matrix Y with dimensions m × n |
| matrix_div_scalar(A, scalar, m, n) | divide all elements of matrix A by the scalar value |
| matrix_mul_scalar(A, scalar, m, n) | multiply all elements of matrix A by the scalar value |
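To show how these helpers can be composed, here is a hedged sketch of one epoch of softmax regression that mirrors the Python reference above. The function name, parameter types, argument order of the helper calls, and the std::vector temporaries are all illustrative; the exact signatures and buffer handling must follow the declarations in simple_ml_ext.hpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include "simple_ml_ext.hpp"  // assumed to declare the helper functions listed above

// Illustrative sketch: X is m x n (row-major), y holds m labels in {0,...,k-1},
// theta is n x k. Helper argument order below is indicative only.
void softmax_regression_epoch_sketch(const float *X, const unsigned char *y,
                                     float *theta, size_t m, size_t n, size_t k,
                                     float lr, size_t batch)
{
    std::vector<float> Z(batch * k), Y(batch * k), grad(n * k);
    for (size_t start = 0; start < m; start += batch) {
        size_t b = std::min(batch, m - start);               // last batch may be smaller
        const float *X_b = X + start * n;
        matrix_dot(X_b, theta, Z.data(), b, n, k);           // Z = X_b * theta
        matrix_softmax_normalize(Z.data(), b, k);            // Z = row-wise softmax(Z)
        vector_to_one_hot_matrix(y + start, Y.data(), b, k); // Y = I_y for this batch
        matrix_minus(Z.data(), Y.data(), b, k);              // Z = Z - I_y
        matrix_dot_trans(X_b, Z.data(), grad.data(), b, n, k); // grad = X_b^T (Z - I_y)
        matrix_div_scalar(grad.data(), (float)b, n, k);      // grad /= batch size
        matrix_mul_scalar(grad.data(), lr, n, k);            // grad *= learning rate
        matrix_minus(theta, grad.data(), n, k);              // theta -= grad
    }
}
```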
In the implementation, you are allowed to define your own variables and functions to facilitate your programming.

The outcome is like below:

| Epoch | Train Loss | Train Err | Test Loss | Test Err |
| 0 | 0.35134 | 0.10182 | 0.33588 | 0.09400 |
| 1 | 0.32142 | 0.09268 | 0.31086 | 0.08730 |
| 2 | 0.30802 | 0.08795 | 0.30097 | 0.08550 |
| 3 | 0.29987 | 0.08532 | 0.29558 | 0.08370 |
| 4 | 0.29415 | 0.08323 | 0.29215 | 0.08230 |
| 5 | 0.28981 | 0.08182 | 0.28973 | 0.08090 |
| 6 | 0.28633 | 0.08085 | 0.28793 | 0.08080 |
| 7 | 0.28345 | 0.07997 | 0.28651 | 0.08040 |
| 8 | 0.28100 | 0.07923 | 0.28537 | 0.08010 |
| 9 | 0.27887 | 0.07847 | 0.28442 | 0.07970 |
Task2: Accelerate softmax with OpenACC
You need to accelerate the train_softmax function and the functions inside the softmax_regression_epoch_cpp function with OpenACC.
Hint: You can accelerate the program by applying OpenACC to each function.
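As a starting point, here is a hedged sketch of what an OpenACC version of the matrix-multiplication helper could look like. The name matrix_dot_openacc_sketch is illustrative; also note that the copyin/copyout clauses transfer data on every call, which is simple but wasteful, so keeping data resident on the device (see the sketch in Task 4) is usually where the real speedup comes from.

```cpp
#include <cstddef>

// Illustrative OpenACC version of C = A * B with row-major storage
// (A is m x n, B is n x k, C is m x k).
void matrix_dot_openacc_sketch(const float *A, const float *B, float *C,
                               size_t m, size_t n, size_t k)
{
#pragma acc parallel loop collapse(2) copyin(A[0 : m * n], B[0 : n * k]) copyout(C[0 : m * k])
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < k; ++j) {
            float sum = 0.0f;
            for (size_t t = 0; t < n; ++t) {
                sum += A[i * n + t] * B[t * k + j];
            }
            C[i * k + j] = sum;
        }
    }
}
```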
Task3: Train MNIST with neural network
The inference and training process of a neural network can be described by the following formulas:
1. Forward Propagation (Inference)
The forward propagation process of a neural network can be described by the following formula, where $a^{(l)}$ is the activation value of the $l$-th layer, $W^{(l)}$ is the weight of the $l$-th layer, $b^{(l)}$ is the bias of the $l$-th layer, and $f$ is the activation function:

$$a^{(l)} = f\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

This process starts from the input layer, goes through the calculation of each layer's weights and biases, as well as the activation function, and finally obtains the predicted value of the output layer.

2. Backward Propagation (Training)

The training process of a neural network mainly updates the weights and biases through the backpropagation algorithm. First, we need to define a loss function $L$ to measure the gap between the predicted value and the true value. Then, we update the weights and biases by calculating the gradient of the loss function with respect to the weights and biases:

$$\frac{\partial L}{\partial W^{(l)}} = \frac{\partial L}{\partial a^{(l)}} \cdot \frac{\partial a^{(l)}}{\partial W^{(l)}}, \qquad \frac{\partial L}{\partial b^{(l)}} = \frac{\partial L}{\partial a^{(l)}} \cdot \frac{\partial a^{(l)}}{\partial b^{(l)}}$$

Here, $\frac{\partial L}{\partial a^{(l)}}$ can be propagated from the next layer to the previous layer through the chain rule. Finally, we use the gradient descent method to update the weights and biases:

$$W^{(l)} = W^{(l)} - \alpha \frac{\partial L}{\partial W^{(l)}}, \qquad b^{(l)} = b^{(l)} - \alpha \frac{\partial L}{\partial b^{(l)}}$$

Here, $\alpha$ is the learning rate, which controls the step size of the update.

In this project, we are going to implement a 2-layer NN with SGD:

$$Z = W_2^T \mathrm{ReLU}(W_1^T x)$$

where $W_1 \in \mathbb{R}^{n \times d}$ and $W_2 \in \mathbb{R}^{d \times k}$ represent the weights of the network (which has a $d$-dimensional hidden unit), and where $z \in \mathbb{R}^k$ represents the logits output by the network. We again use the softmax / cross-entropy loss, meaning that we want to solve the optimization problem of minimizing the average softmax loss of these logits over the training examples.

Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in class, but also provide the final form here for ease of implementation). Specifically, let

$$Z_1 \in \mathbb{R}^{m \times d} = \mathrm{ReLU}(X W_1)$$
$$G_2 \in \mathbb{R}^{m \times k} = \mathrm{normalize}(\exp(Z_1 W_2)) - I_y$$
$$G_1 \in \mathbb{R}^{m \times d} = \mathbb{1}\{Z_1 > 0\} \circ (G_2 W_2^T)$$

where $\mathbb{1}\{Z_1 > 0\}$ is a binary matrix with entries equal to zero or one depending on whether each term in $Z_1$ is strictly positive, and where $\circ$ denotes elementwise multiplication. Then the gradients of the objective are given by:

$$\nabla_{W_1} \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y) = \frac{1}{m} X^T G_1$$
$$\nabla_{W_2} \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y) = \frac{1}{m} Z_1^T G_2$$
Here is the given training code in Python:

def nn_epoch(X, y, W1, W2, lr=0.1, batch=100):
    for i in range(0, X.shape[0], batch):
        X_b = X[i : i + batch]
        Z1 = np.maximum(0, np.dot(X_b, W1))
        h_Z1_exp = np.exp(np.dot(Z1, W2))
        Z2 = h_Z1_exp / np.sum(h_Z1_exp, axis=1)[:, None]
        Y = np.zeros(Z2.shape, np.float32)
        Y[np.arange(y[i : i + batch].size), y[i : i + batch]] = 1
        G1 = np.dot(Z2 - Y, W2.T) * (Z1 > 0)
        W1_l = np.dot(X_b.T, G1) / batch * lr
        W2_l = np.dot(Z1.T, Z2 - Y) / batch * lr
        W1 -= W1_l
        W2 -= W2_l

The functions you need to implement in C++ are:

| Function Declaration | What does the function do |
| nn_epoch_cpp(X, y, W1, W2, m, n, l, k, lr, batch) | train the 2-layer NN for 1 epoch |
| train_nn(train_data, test_data, num_classes, hidden_dim, epochs, lr, batch) | train a 2-layer NN classifier |

There are some new functions inside the nn_epoch_cpp function that you also need to fill in the details:

| Function Declaration | What does the function do |
| matrix_trans_dot(A, B, C, m, n, k) | perform a matrix multiplication between A and the transpose of B, with the result stored in matrix C |
| matrix_mul(A, B, size) | multiply matrix A by matrix B element-wise, with the result stored in matrix A |

The outcome is like below:

| Epoch | Train Loss | Train Err | Test Loss | Test Err |
| 0 | 0.13466 | 0.04023 | 0.14293 | 0.04240 |
| 1 | 0.09653 | 0.03020 | 0.11593 | 0.03700 |
| 2 | 0.07351 | 0.02227 | 0.10043 | 0.03170 |
| 3 | 0.05862 | 0.01715 | 0.09091 | 0.02880 |
| 4 | 0.04677 | 0.01298 | 0.08348 | 0.02650 |
| 5 | 0.03878 | 0.01015 | 0.07878 | 0.02490 |
| 6 | 0.03281 | 0.00822 | 0.07595 | 0.02470 |
| 7 | 0.02796 | 0.00672 | 0.07341 | 0.02390 |
| 8 | 0.02452 | 0.00558 | 0.07204 | 0.02280 |
| 9 | 0.02133 | 0.00453 | 0.07076 | 0.02240 |
| 10 | 0.01880 | 0.00365 | 0.07004 | 0.02200 |
| 11 | 0.01675 | 0.00320 | 0.06925 | 0.02190 |
| 12 | 0.01510 | 0.00265 | 0.06867 | 0.02190 |
| 13 | 0.01345 | 0.00203 | 0.06821 | 0.02150 |
| 14 | 0.01217 | 0.00150 | 0.06793 | 0.02080 |
| 15 | 0.01136 | 0.00128 | 0.06787 | 0.02100 |
| 16 | 0.01010 | 0.00098 | 0.06725 | 0.02060 |
| 17 | 0.00949 | 0.00090 | 0.06736 | 0.02050 |
| 18 | 0.00860 | 0.00068 | 0.06690 | 0.02020 |
| 19 | 0.00793 | 0.00050 | 0.06666 | 0.02030 |
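For reference, here is a hedged sketch of how nn_epoch_cpp might be composed from the helper functions above, mirroring the Python nn_epoch. As before, the function name, types, argument order, and buffer handling are illustrative and should follow your simple_ml_ext.hpp; the ReLU and its mask are written inline here because no dedicated helper is listed for them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include "simple_ml_ext.hpp"  // assumed to declare the helper functions listed above

// Illustrative sketch of one SGD epoch of the 2-layer NN: X is m x n,
// y holds m labels, W1 is n x l (hidden dim l), W2 is l x k.
void nn_epoch_sketch(const float *X, const unsigned char *y, float *W1, float *W2,
                     size_t m, size_t n, size_t l, size_t k, float lr, size_t batch)
{
    std::vector<float> Z1(batch * l), G2(batch * k), Y(batch * k),
                       G1_buf(batch * l), W1_grad(n * l), W2_grad(l * k);
    for (size_t start = 0; start < m; start += batch) {
        size_t b = std::min(batch, m - start);
        const float *X_b = X + start * n;

        matrix_dot(X_b, W1, Z1.data(), b, n, l);                 // Z1 = X_b * W1
        for (size_t i = 0; i < b * l; ++i)                       // Z1 = ReLU(Z1)
            Z1[i] = std::max(Z1[i], 0.0f);

        matrix_dot(Z1.data(), W2, G2.data(), b, l, k);           // G2 = Z1 * W2
        matrix_softmax_normalize(G2.data(), b, k);               // row-wise softmax
        vector_to_one_hot_matrix(y + start, Y.data(), b, k);     // Y = I_y
        matrix_minus(G2.data(), Y.data(), b, k);                 // G2 = softmax - I_y

        matrix_trans_dot(G2.data(), W2, G1_buf.data(), b, k, l); // G1 = G2 * W2^T
        for (size_t i = 0; i < b * l; ++i)                       // apply mask 1{Z1 > 0}
            if (Z1[i] <= 0.0f) G1_buf[i] = 0.0f;

        matrix_dot_trans(X_b, G1_buf.data(), W1_grad.data(), b, n, l);   // X_b^T G1
        matrix_dot_trans(Z1.data(), G2.data(), W2_grad.data(), b, l, k); // Z1^T G2

        matrix_div_scalar(W1_grad.data(), (float)b, n, l);       // /= batch size
        matrix_mul_scalar(W1_grad.data(), lr, n, l);             // *= learning rate
        matrix_minus(W1, W1_grad.data(), n, l);                  // W1 -= update

        matrix_div_scalar(W2_grad.data(), (float)b, l, k);
        matrix_mul_scalar(W2_grad.data(), lr, l, k);
        matrix_minus(W2, W2_grad.data(), l, k);                  // W2 -= update
    }
}
```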
Task4: Accelerate neural network with OpenACC
You need to accelerate the train_nn function and the functions inside the nn_epoch_cpp function with OpenACC.
Since the computation precision differs between the CPU and GPU platforms, there is a tiny gap between the outcome of the sequential and OpenACC programs. Here is the sample output of OpenACC:
| Epoch | Train Loss | Train Err | Test Loss | Test Err |
| 0 | 0.13466 | 0.04023 | 0.14293 | 0.04240 |
| 1 | 0.09699 | 0.03037 | 0.11628 | 0.03700 |
| 2 | 0.07349 | 0.02233 | 0.10028 | 0.03230 |
| 3 | 0.05790 | 0.01675 | 0.09053 | 0.02800 |
| 4 | 0.04668 | 0.01280 | 0.08374 | 0.02650 |
| 5 | 0.03846 | 0.01003 | 0.07861 | 0.02520 |
| 6 | 0.03255 | 0.00810 | 0.07542 | 0.02420 |
| 7 | 0.02800 | 0.00678 | 0.07333 | 0.02410 |
| 8 | 0.02444 | 0.00548 | 0.07163 | 0.02350 |
| 9 | 0.02127 | 0.00447 | 0.07054 | 0.02290 |
| 10 | 0.01869 | 0.00365 | 0.06941 | 0.02230 |
| 11 | 0.01683 | 0.00318 | 0.06875 | 0.02200 |
| 12 | 0.01501 | 0.00252 | 0.06818 | 0.02120 |
| 13 | 0.01352 | 0.00200 | 0.06757 | 0.02080 |
| 14 | 0.01241 | 0.00172 | 0.06769 | 0.02070 |
| 15 | 0.01116 | 0.00120 | 0.06712 | 0.02050 |
| 16 | 0.01014 | 0.00098 | 0.06664 | 0.02010 |
| 17 | 0.00948 | 0.00088 | 0.06664 | 0.02030 |
| 18 | 0.00856 | 0.00067 | 0.06628 | 0.01980 |
| 19 | 0.00815 | 0.00057 | 0.06644 | 0.01970 |
Hint: You can accelerate the program by applying OpenACC to each function.
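One optimization that typically matters here (a hedged sketch, not the required design): wrap the whole epoch in an OpenACC data region so the training data and weights stay resident on the GPU, and write the helper kernels with present(...) clauses so nothing is copied inside the loop. The function names and shapes below are illustrative.

```cpp
#include <cstddef>

// A helper kernel written to assume its operands are already on the device.
void matrix_minus_device(float *A, const float *B, size_t m, size_t n)
{
#pragma acc parallel loop present(A[0 : m * n], B[0 : m * n])
    for (size_t i = 0; i < m * n; ++i)
        A[i] -= B[i];
}

// Sketch of keeping data resident across a whole epoch.
void nn_epoch_openacc_sketch(const float *X, const unsigned char *y,
                             float *W1, float *W2, size_t m, size_t n,
                             size_t l, size_t k, float lr, size_t batch)
{
#pragma acc data copyin(X[0 : m * n], y[0 : m]) copy(W1[0 : n * l], W2[0 : l * k])
    {
        for (size_t start = 0; start < m; start += batch) {
            // ... call helper kernels such as matrix_minus_device here; since all
            // arrays are inside the enclosing data region, no transfers occur ...
        }
    }
}
```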
Extra Credit: Extend Neural Network to Convolutional Neural Network with OpenACC
You need to implement and accelerate the train_cnn function and the functions inside the cnn_epoch_cpp function with OpenACC. You can use any hyperparameters and filters you like. Note that your CNN should achieve better accuracy than the previous 2-layer NN.
Hint: You can accelerate the program by applying OpenACC to each function. Making the filter sizes static (fixed at compile time) may help a lot with time performance.
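To illustrate the hint about compile-time filter sizes, here is a hedged sketch of a direct convolution forward pass with OpenACC (single input channel, valid padding, stride 1). All names, shapes, and hyperparameters (KH, KW, OUT_C) are illustrative; the design of your cnn_epoch_cpp is up to you.

```cpp
#include <cstddef>

// Filter dimensions fixed at compile time (per the hint) so the compiler can
// unroll the innermost loops.
constexpr int KH = 3, KW = 3, OUT_C = 8;

// Illustrative direct convolution: 'in' holds 'batch' single-channel H x W
// images, 'filters' holds OUT_C filters of size KH x KW, 'out' receives
// batch x OUT_C x (H-KH+1) x (W-KW+1) feature maps.
void conv2d_forward_sketch(const float *in, const float *filters, float *out,
                           int batch, int H, int W)
{
    const int OH = H - KH + 1, OW = W - KW + 1;
#pragma acc parallel loop collapse(4) \
    copyin(in[0 : (size_t)batch * H * W], filters[0 : (size_t)OUT_C * KH * KW]) \
    copyout(out[0 : (size_t)batch * OUT_C * OH * OW])
    for (int b = 0; b < batch; ++b)
        for (int c = 0; c < OUT_C; ++c)
            for (int i = 0; i < OH; ++i)
                for (int j = 0; j < OW; ++j) {
                    float sum = 0.0f;
                    for (int u = 0; u < KH; ++u)
                        for (int v = 0; v < KW; ++v)
                            sum += in[(size_t)b * H * W + (i + u) * W + (j + v)] *
                                   filters[(size_t)c * KH * KW + u * KW + v];
                    out[(((size_t)b * OUT_C + c) * OH + i) * OW + j] = sum;
                }
}
```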
How to Execute the Program
Execute the bash script.
bash ./test.sh

Baseline:

| Softmax Sequential | Softmax OpenACC | NN Sequential | NN OpenACC |
| 9767 ms | 1066 ms | 683586 ms | 68563 ms |
NOTICE: the training outcome of the classifier (including loss and error) should match the sample outcome number by number.
Requirements & Grading Policy Machine Learning (50%)
Task1: Train MNIST with softmax regression (10%)
Task2: Accelerate softmax with OpenACC (20%)
Task3: Train MNIST with neural network (10%)
Task4: Accelerate neural network with OpenACC (10%)
Your programs should compile and execute correctly and produce the expected computation results to get the full grade in this part.
Performance of Your Program (30%)
7.5% for each Task
Try your best to do optimization on your parallel programs for higher speedup. If your programs show similar performance to the baseline performance, then you can get the full mark for this part. Points will be deducted if your parallel programs perform poorly while no justification can be found in the report.
One Report in PDF (20%, No Page Limit)
Regular Report (10%)
The report does not have to be very long and beautiful to help you get a good grade, but you need to include what you have done and what you have learned in this project. The following components should be included in the report:
- How to compile and execute your program to get the expected output on the cluster.
- Explain clearly how you designed and implemented each algorithm.
- Show the experiment results you get, and do some numerical analysis, such as calculating the speedup and efficiency, demonstrated with tables and figures.
- What kinds of optimizations have you tried to speed up your parallel program, and how do they work?
- Any interesting discoveries you found during the experiment?
Profiling OpenACC with nsys (10%)
You are required to practice profiling OpenACC programs with nsys, as explained in the Instruction of profiling tools with perf and nsys. The command-line profiling with nsys is mandatory, while the Nsight Systems GUI is optional.
Extra Credits (10%)
Implement CNN (5%)
Accelerate CNN with OpenACC (5%)
Extra optimizations or interesting discoveries in the first three tasks may also earn you some extra credits.
The Extra Credit Policy
According to the professor, the extra credits in this project cannot be added to other projects to make them full marks. The credits are the honor you received from the professor and the teaching staff, and the professor may help raise you to a higher grade level if you are at the boundary of two grade levels and he thinks you deserve a better grade with your extra credits. For example, if you are among the top students with B+ grade, and get enough extra credits, the professor may raise you to A- grade. Furthermore, the professor will invite a few students with high extra credits to have dinner with him.
Grading Policy for Late Submission
- Late submission within 10 minutes after the DDL is tolerated for possible issues during submission.
- 10 points are deducted for each day late after the DDL (11 minutes late will be considered as one day, so be careful).
- Zero points if you submit your project more than two days late.
File Structure to Submit on BlackBoard
<Your StudentID>.pdf  # Report
<Your StudentID>.zip  # Codes
├── sbatch.sh
├── src
│   ├── nn_classifier.cpp
│   ├── nn_classifier_openacc.cpp
│   ├── simple_ml_ext.cpp
│   ├── simple_ml_ext.hpp
│   ├── simple_ml_openacc.cpp
│   ├── simple_ml_openacc.hpp
│   ├── softmax_classifier.cpp
│   └── softmax_classifier_openacc.cpp
└── test.sh