Coursework | COMP3702 Artificial Intelligence - Assignment 3: Reinforcement Learning

COMP3702 Artificial Intelligence (Semester 2, 2024) Assignment 3: Reinforcement Learning Assignment Writing Service

Key information: Assignment Writing Service

Due: 1pm, Friday 25 October Assignment Writing Service
This assignment assesses your skills in training and understanding algorithms for solving Reinforcement Learning Problems. Assignment Writing Service
Assignment 3 contributes 20% to your final grade. Assignment Writing Service
This assignment consists of two parts: (1) programming and (2) a report. Assignment Writing Service
This is an individual assignment. Assignment Writing Service
Your program (code) must be zipped and submitted to Blackboard. It will not be explicitly graded, and instead will form the basis for analysis in the report. Assignment Writing Service
Your report should be in .pdf format and named according to the format a3-COMP3702-[SID].pdf, where SID is your student ID. Each question should start on a separate page and be clearly indicated. Failure to do so will result in a 5% penalty. The report is to be submitted via Gradescope. Reports will be graded by the teaching team. Assignment Writing Service
Note: this assignment will take several hours to run. Ensure you start early! Assignment Writing Service

Reinforcement Learning Assignment Writing Service

In this assignment, you will implement Deep Reinforcement Learning algorithms and analyse their parameters and performance. This assignment will test your skills in training and understanding reinforcement learning algorithms for practical problems and understanding of key algorithm features and parameters. Assignment Writing Service

Gymnasium API Assignment Writing Service

This assignment will make use of the OpenAI Gym/Gymnasium1, which is a standard API for reinforcement learning with a diverse collection of reference environments. This assignment will start by investigating the Cart Pole and Lunar Lander environments shown in Figure 1, which are within the classic-control and box2d set of environments respectively. If using pip, you can install Gymnasium and the dependencies for these set of environments via the following: Assignment Writing Service
```
pip install gymnasium
pip install gymnasium[classic-control]
```
Additional dependencies are required in some environments as per the documentation: e.g. pip install swig for Box2D environments and pip install pygame for the visualisation libraries. Assignment Writing Service

Figure 1: Cart Pole and Lunar Lander environments Assignment Writing Service

1Gymnasium is a maintained fork of OpenAI’s Gym library. Assignment Writing Service

COMP3702 Assignment 3: Reinforcement Learning Assignment Writing Service

To get an understanding of these environments, you can visualise them using the human render mode using code as below: Assignment Writing Service

   import gymnasium as gym

# Initialise the environment
# You can replace the environment e.g. "LunarLander-v2" or -v3
# Ensure the environment (e.g. classic-control, box2d) is installed using pip/conda env = gym.make("CartPole-v1", render_mode='human') Assignment Writing Service

   # Reset the environment to generate the first observation

   observation, info = env.reset(seed=42)
   for _ in range(500):

       # this is where you would insert your policy, here we randomly sample an action

       action = env.action_space.sample()

# step (transition) through the environment with the action, receiving the # next observation, reward and if the episode has terminated or truncated observation, reward, terminated, truncated, info = env.step(action) Assignment Writing Service

       # If the episode has ended then we can reset to start a new episode

       if terminated or truncated:
           observation, info = env.reset()

env.close() Assignment Writing Service

PyTorch Assignment Writing Service

To train neural networks, we will be using the PyTorch machine learning framework. To download PyTorch, follow the instructions here: https://pytorch.org/get-started/locally/, following the instructions for your computer system. Assignment Writing Service

If your computer has a dedicated GPU, select PyTorch with GPU support, and download the appropriate drivers (ROCm for AMD GPUs and CUDA for Nvidia GPUs). Make sure you only install one - either the GPU or the CPU version! Note that the classic-control environments can train faster using CPU than GPU. Assignment Writing Service

Task Assignment Writing Service

Your task is to run the Deep Q-Network (DQN) algorithm and variants such as Duelling DQN and Double DQN, and to write a report investigating the algorithms’ performance and hyper-parameters as detailed in the report section. Assignment Writing Service

For background, we recommend that you read through the following tutorials. You can make use of the code in your solutions with attribution: Assignment Writing Service

• Official DQN PyTorch Tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_ q_learning.html Assignment Writing Service

• Tutorial sample code: https://github.com/comp3702/tutorial11 Assignment Writing Service

When training reinforcement learning algorithms, typically we assess the solution quality using the 100-step moving average episode reward (i.e., R100) received by the learning agent. At time step t, the 100-step moving reward is the average episode reward earned by your learning agent in the episodes [t − 100, t]. If the Q-values imply a poor quality policy, this value will be low. If the Q-values correspond to a high-value policy, the 100-step moving average reward will be higher. We use a moving average because rewards may only be received occasionally and the episode reward is affected by sources of randomness including the exploration strategy. You will need to write a function that plots the R100 vs episodes for analysis in the report. Assignment Writing Service

Page 2 Assignment Writing Service

COMP3702 Assignment 3: Reinforcement Learning Assignment Writing Service

The report Assignment Writing Service

The report tests your understanding of Reinforcement Learning algorithms and the methods used in the code. You should use the code at https://github.com/comp3702/tutorial11 as a foundation to complete the assignment. Assignment Writing Service

Question 1. Q-learning vs Value Iteration Assignment Writing Service

Q-learning is closely related to the Value Iteration algorithm for Markov decision processes. Assignment Writing Service

a) Describe two key similarities between Q-learning and Value Iteration. Answer in no more than 5 lines of text. (5 marks) Assignment Writing Service
b) Give one key difference between Q-learning and Value Iteration. Answer in no more than 5 lines of text. (5 marks) Assignment Writing Service

Question 2. Comparing CartPole-v0 and CartPole-v1 Assignment Writing Service

In this question, you will train two DQNs for two versions of CartPole – v0 and v1 – till the R100 value reaches the reward_threshold (or “stopping reward”) and compare the two learnt policies. According to the source code, the difference between CartPole-v0 and CartPole-v1 are the max_episode_steps and reward_threshold as specified in Table 1. Assignment Writing Service

Table 1: Differences between CartPole-v0 and -v1 Assignment Writing Service

Environment max_episode_steps reward_threshold CartPole-v0 200 195.0 Assignment Writing Service

CartPole-v1 500 475.0 An episode ends if any one of the following occurs: Assignment Writing Service

Termination: Pole Angle is greater than ±12◦ Assignment Writing Service
Termination: Cart Position is greater than ±2.4 (centre of the cart reaches the edge of the display) Assignment Writing Service
Truncation: Episode length is greater than 500 (for CartPole-v1) or 200 (for CartPole-v0) Assignment Writing Service

a) Implement a function to plot the R100 value vs Episode reward. You will need to import a plotting library, e.g. import matplotlib.pyplot as plt, and can implement a function similar to that used in the PyTorch tutorial. Copy or screenshot your code implementation for your answer, citing any resources you used to develop this. (As part of this, you may also want to implement saving and loading of results and/or plots). (5 marks) Assignment Writing Service
b) Plot the R100 value vs Episode number for CartPole-v0 and CartPole-v1 DQN models. Ensure your axes are correctly labelled and indicate what each plot represents (e.g., using a legend or caption). (5 marks) Assignment Writing Service
c) Describe and compare the learnt policies for CartPole-v0 and CartPole-v1. You may make use of the saved video examples on Blackboard titled “CartPole-v0.mp4” and “CartPole-v1.mp4”. Based on your observation of these learnt policies, the definition of the environment and your plots, explain why you think the values of max_episode_steps and reward_threshold were increased from v0 to v1. Assignment Writing Service

(5 marks) Assignment Writing Service

Note: you may need to train the model several times to observe the desired behaviour differences. You can use the human render mode to visualise the policies extracted from the trained neural networks as in the Tutorial solutions or simply describe the saved videos on Blackboard. Assignment Writing Service

Page 3 Assignment Writing Service

COMP3702 Assignment 3: Reinforcement Learning Assignment Writing Service

Question 3. Loss function and Target network Assignment Writing Service

When training neural networks, a loss function is used to compare the current neural network’s predictions to ground truth values, and then gradient descent with backpropagation is applied to adjust the weights of the neural network so the predictions move closer to the ground truth values. In supervised learning we make use of a dataset of paired input and output labels {x,y} for training. In reinforcement learning, the agent receives experiences of states, actions and rewards and through exploration it must estimate the state-action values (in Q-learning). Assignment Writing Service

a) With reference to TD-learning, describe the loss function used to train the neural network in DQN. Use equations and highlight the components corresponding to the “target value” and the neural network’s current state-action value predictions. (5 marks) Assignment Writing Service
b) Examining the code in dqn gym.py, specify which variables correspond to the “target values” and the current state-action value predictions/estimates. (5 marks) Assignment Writing Service
c) An important implementation detail in DQNs is having a separate target network. Describe why a separate target network is needed. (5 marks) Assignment Writing Service
d) Compare the performance of synchronising the target network periodically (e.g. alpha sync = false, target net sync = 1000), versus using soft updates (e.g. alpha sync = true, tau = 0.005). You may select which environment to apply this to (e.g. CartPole-v0). The parameters can be updated in a config file e.g. dqn.yaml (10 marks) Assignment Writing Service

Question 4. Learning-rate Assignment Writing Service

For this question, consider the CartPole-v0 environment (or an environment of your choice) and use the R100 value (100-step moving average episode reward) as a measure of the quality of the learnt policy. Assignment Writing Service

a) Plot the quality of the policy learned by DQN, as given by R100, against episode number for three different fixed values of the learning_rate (which is called α in the lecture notes and in many texts and online tutorials). For this question, do not adjust α over time, rather keep it the same value throughout the learning process. Your plot should display the solution quality up to an episode count where either the performance stabilises (typically > 1000 episodes) or a clear difference in learning rates can be observed. (5 marks) Assignment Writing Service
b) With reference to your plot(s), comment on the effect of varying the learning_rate. (5 marks) Assignment Writing Service
c) Use a plot (either self-drawn or sourced and cited) to describe what happens when the learning_rate Assignment Writing Service

is too high. (5 marks) Assignment Writing Service

Question 5. Epsilon Assignment Writing Service

a) Describe the purpose of the epsilon hyperparameter. (5 marks) Assignment Writing Service
b) Briefly explain how the values of epsilon decay, epsilon final and epsilon start affect training performance. (Note: You do not need to show plots in your answer, but will need to do experimentation to understand the effect of the parameters). (5 marks) Assignment Writing Service

Question 6. DQN vs Double DQN or Duelling DQN Assignment Writing Service

Two variants of DQN are Double DQN and Duelling DQN. Details of the methods can be read in the papers: • Double DQN: https://arxiv.org/abs/1509.06461 Assignment Writing Service

• Duelling DQN: https://arxiv.org/abs/1511.06581
An implementation of Duelling DQN is included in the tutorial11 code. Assignment Writing Service

Choose one of these algorithms and discuss its improvements relative to the original/vanilla DQN. (10 marks) Assignment Writing Service

Page 4 Assignment Writing Service

COMP3702 Assignment 3: Reinforcement Learning Assignment Writing Service

Question 7. Applying DQN beyond CartPole (20 marks) Assignment Writing Service

Based on your study of hyperparameters and understanding of DQN, train a DQN agent for an additional environment from Gymnasium and report on any implementation/hyperparameter changes you had to make, and your agent’s performance. For a simple environment consider LunarLander-v3 or MountainCar-v0. For environments that are image-based like Pong, you will need to use convolutional neural networks, and will require a GPU for efficient training. See sample code here: https://github.com/comp3702/dqn-pong. Assignment Writing Service

This question is effectively repeating the experiments from the previous few questions, but where you get to choose which experiments to perform (e.g. which hyper-parameters to vary), and for your chosen environment instead of CartPole. e.g. choose some hyper-parameters (learning rate, epsilon, tau/target sync interval, number of hidden layers, etc), try a few values for each, and select the best value for each of your chosen hyper-parameters based on your results (which allows you to justify your choice of those values). Assignment Writing Service

Criteria: Assignment Writing Service

Experiment with ≥ 3 hyperparameters and provide evidence and justification for your selection Assignment Writing Service
Provide evidence of experimentation including plots comparing performance of various hyperparameter Assignment Writing Service

settings Assignment Writing Service
Describe what you observed / the effect of the parameter change and why you chose it/ why you modified it Assignment Writing Service
Report final settings of hyperparameter values Assignment Writing Service
Demonstrate that you can solve a level. e.g. define what is a good policy/solved for your selected environment and show that your agent achieves this. e.g. R100 value or description/screenshot of behaviour Assignment Writing Service

Ensure you use headings and paragraphs and tables where appropriate, to facilitate reading of your assignment. Assignment Writing Service

Bee Cart Pole credit to Anonymous COMP3702 student and Generative AI. Assignment Writing Service

END OF ASSIGNMENT 3 Assignment Writing Service

Page 5 Assignment Writing Service

COMP3702 Assignment 3: Reinforcement Learning Assignment Writing Service

Academic Misconduct Assignment Writing Service

The University defines Academic Misconduct as involving “a range of unethical behaviours that are designed to give a student an unfair and unearned advantage over their peers.” UQ takes Academic Misconduct very seriously and any suspected cases will be investigated through the University’s standard policy (https:// ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct). If you are found guilty, you may be expelled from the University with no award. Assignment Writing Service

It is the responsibility of the student to ensure that you understand what constitutes Academic Misconduct and to ensure that you do not break the rules. If you are unclear about what is required, please ask. Assignment Writing Service

In the coding part of COMP3702 assignments, you are allowed to draw on publicly-accessible resources and provided tutorial solutions, but you must make reference or attribution to its source, in comments next to the referenced code, and include a list of references you have drawn on in your solution.py docstring. Assignment Writing Service

If you have utilised Generative AI tools such as ChatGPT, you must clearly cite any use of generative AI in each instance. To reference your use of AI see: Assignment Writing Service

• https://guides.library.uq.edu.au/referencing/chatgpt-and-generative-ai-tools
Failure to reference use of generative AI tools constitutes student misconduct under the Student Code of Assignment Writing Service

Conduct. Assignment Writing Service

It is the responsibility of the student to take reasonable precautions to guard against unauthorised access by others to his/her work, however stored in whatever format, both before and after assessment. You must not show your code to, or share your code with, any other student under any circumstances. You must not post your code to public discussion forums (including Ed Discussion) or save your code in publicly accessible repositories (check your security settings). You must not look at or copy code from any other student. Assignment Writing Service

All submitted files (code and report) will be subject to electronic plagiarism detection and misconduct proceed- ings will be instituted against students where plagiarism or collusion is suspected. The electronic plagiarism detection can detect similarities in code structure even if comments, variable names, formatting etc. are modified. If you collude to develop your code or answer your report questions, you will be caught. Assignment Writing Service

For more information, please consult the following University web pages: • Information regarding Academic Integrity and Misconduct: Assignment Writing Service

– https://my.uq.edu.au/information-and-services/manage-my-program/student-integrity-and- conduct/academic-integrity-and-student-conduct Assignment Writing Service

– http://ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct • Information on Student Services: Assignment Writing Service

– https://www.uq.edu.au/student-services/ Assignment Writing Service

Late submission
Assignment Writing Service

Students should not leave assignment preparation until the last minute and must plan their workloads to meet advertised or notified deadlines. It is your responsibility to manage your time effectively. Assignment Writing Service

It may take the autograder up to an hour to grade your submission. It is your responsibility to ensure you are uploading your code early enough and often enough that you are able to resolve any issues that may be revealed by the autograder before the deadline. Submitting non-functional code just before the deadline, and not allowing enough time to update your code in response to autograder feedback is not considered a valid reason to submit late without penalty. Assignment Writing Service

Assessment submissions received after the due time (or any approved extended deadline) will be subject to a late penalty of 10% per 24 hours of the maximum possible mark for the assessment item. Assignment Writing Service

In the event of exceptional circumstances, you may submit a request for an extension. You can find guide- lines on acceptable reasons for an extension here https://my.uq.edu.au/information-and-services/manage-my- program/exams-and-assessment/applying-extension, and submit the UQ Application for Extension of Assess- ment form. Assignment Writing Service

Page 6 Assignment Writing Service