XJTLU Entrepreneur College (Taicang) Cover Sheet
Module code and Title: DTS201TC Pattern Recognition
School Title: School of Artificial Intelligence and Advanced Computing
Assignment Title: Coursework
Submission Deadline: 5 pm China time (UTC+8 Beijing) on Thursday 31st October 2024
Final Word Count: Around 1500
I certify that I have read and understood the University’s Policy for dealing with Plagiarism, Collusion and the Fabrication of Data (available on Learning Mall Online). With reference to this policy I certify that:
• My work does not contain any instances of plagiarism and/or collusion.
• My work does not contain any fabricated data.
By uploading my assignment onto Learning Mall Online, I formally declare that all of the above information is true to the best of my knowledge and belief.
If you agree to let the university use your work anonymously for teaching and learning purposes, please type “yes” here.
Scoring – For Tutor Use
Student ID:
Stage of Marking | Marker Code | Learning Outcomes Achieved (F/P/M/D) (please modify as appropriate): A, B | Final Score
1st Marker – red pen
Moderation – green pen
IM Initials:
The original mark has been accepted by the moderator (please circle as appropriate): Y / N
Data entry and score calculation have been checked by another tutor (please circle): Y
2nd Marker if needed – green pen
For Academic Office Use
Date Received | Days late | Late Penalty
Possible Academic Infringement (please tick as appropriate): ☐ Category A ☐ Category B ☐ Category C ☐ Category D ☐ Category E
Total Academic Infringement Penalty (A, B, C, D, E, please modify where necessary): _____________________
Students
Please save your assignment in a PDF document, and package your code as a ZIP file. Submit both the technical report and the code file via Learning Mall Core to the appropriate drop box. Electronic submission is the only method accepted; no hard copies will be accepted.
You must download your file and check that it is viewable after submission. Documents may become corrupted during the uploading process (e.g. due to slow internet connections). However, students themselves are responsible for submitting a functional and correct file for assessments.
Weight: 40%
Overview:
This coursework is the assessment for DTS201TC and aims to evaluate understanding of pattern representation, feature discovery and selection, foundations of pattern recognition algorithms and machines including statistical, structural and neural methods.
Learning Outcomes:
A. demonstrate understanding of foundations of pattern recognition algorithms and machines, including statistical, structural and neural methods;
B. demonstrate understanding of data structures for pattern representation, feature discovery and selection;
Avoid Plagiarism
• Do NOT submit work from others.
• Do NOT share code/work with others.
• Do NOT copy and paste directly from sources without proper attribution.
• Do NOT use paid services to complete assignments for you.
Technical Report Requirements:
The student loads the image data (Covid-19 CT images) using a Python package and reduces the high-dimensional data to low-dimensional data (e.g., dimension = 20). The Covid-19 CT image dataset has 397 non-Covid-19 images and 349 Covid-19 images. Around 20 new features obtained by PCA dimension reduction can achieve good classification performance without losing much accuracy. The student needs to compare model performance using the 20 new features with model performance using all original features. Three machine learning models (K-Nearest Neighbour (KNN) classifier, Naive Bayes classifier and Multi-layer Perceptron (MLP) Neural Networks) will be used for this comparison. Classification performance differs between machine learning algorithms, and this also affects the feature ranking. The student needs to rank the importance of the 20 new features using each of the three models (for the feature ranking, the student can set a feature to its mean value). Finally, the student needs to write a technical report (around 1500 words) that includes the following sections:
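The workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not a reference solution. The real CT images are replaced by a synthetic stand-in matrix from a hypothetical helper, make_stand_in_data. It uses the brief's class sizes, stores one flattened 32×32 "image" per row and scales pixels to [0, 1]. The 50/50 stratified split follows Table 1. The hyperparameters (k = 5 neighbours, one hidden layer of 64 units) are arbitrary choices. Loading the real images would replace make_stand_in_data: read each image, convert it to grayscale, resize all images to a common size and flatten them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline


def make_stand_in_data(n_pixels=32 * 32, seed=0):
    """Hypothetical SYNTHETIC stand-in for flattened CT images (not the real data).

    Uses the brief's class sizes: 397 non-Covid-19 (label 0), 349 Covid-19 (label 1).
    """
    rng = np.random.default_rng(seed)
    y = np.array([0] * 397 + [1] * 349)
    X = rng.random((y.size, n_pixels))          # pixel values in [0, 1]
    # Add a weak class signal to the first 100 "pixels" of Covid-19 rows.
    X[y == 1, :100] = np.clip(X[y == 1, :100] + 0.2, 0.0, 1.0)
    return X, y


def compare_models(X, y, n_components=20, seed=0):
    """(train accuracy, test accuracy) per model on original vs PCA features."""
    # 50% training / 50% test, keeping the class ratio in both halves.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed)
    # Hyperparameters are arbitrary illustrative choices.
    models = {
        "kNN": lambda: KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB,
        "MLP-NNs": lambda: MLPClassifier(hidden_layer_sizes=(64,),
                                         max_iter=300, random_state=seed),
    }
    results = {}
    for name, make_model in models.items():
        candidates = {
            "Original features": make_model(),
            # PCA inside the pipeline is fitted on the training half only.
            f"{n_components} new features": make_pipeline(
                PCA(n_components=n_components, random_state=seed), make_model()),
        }
        for feature_set, model in candidates.items():
            model.fit(X_tr, y_tr)
            results[(name, feature_set)] = (model.score(X_tr, y_tr),
                                            model.score(X_te, y_te))
    return results


if __name__ == "__main__":
    X, y = make_stand_in_data()
    for (name, feature_set), (tr, te) in compare_models(X, y).items():
        print(f"{name:12s} {feature_set:18s} train={tr:.3f} test={te:.3f}")
```

Placing PCA inside the pipeline means the projection is learned from the training half only, so the test half does not leak into the 20 new features.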
Report Title: Covid-19 CT Image Data Classification Using Principal Component Analysis (PCA) and K-Nearest Neighbour (KNN) Classifier, Naive Bayes Classifier and Multi-layer Perceptron (MLP) Neural Networks.
Section 1: Introduction (7 marks)
Machine learning (ML) can learn from training data and has demonstrated greater accuracy than clinicians in predicting Covid-19.
The student needs to explain the problems of using the original high-dimensional image data for Covid-19 prediction and introduce possible solutions.
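One concrete aspect of this problem is that each image contributes thousands of pixel features while the dataset has only 746 images. In such high-dimensional spaces, Euclidean distances from a point to all others tend to become similar. This weakens distance-based models such as kNN. The sketch below illustrates the effect with uniformly random points rather than CT data. The relative_contrast helper is hypothetical, and the dimensions compared (2, 20 and 4096, where 4096 corresponds to an assumed 64×64 image) are illustrative choices.

```python
import numpy as np


def relative_contrast(d, n_points=500, seed=0):
    """(max - min) / min distance from one random query to n random points in [0, 1]^d.

    Illustrative only: uniform random data, not CT images.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, d))
    query = rng.random(d)
    dist = np.linalg.norm(points - query, axis=1)   # Euclidean distances
    return (dist.max() - dist.min()) / dist.min()


for d in (2, 20, 4096):
    print(d, round(relative_contrast(d), 3))
```

The contrast shrinks sharply as d grows, meaning the "nearest" and "farthest" images become almost equally far away. This is one motivation for reducing the data to a small number of features before classification.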
Section 2: Principal Component Analysis Method and Experimental Design (10 marks)
The student needs to give the correct PCA formula and explain why PCA is needed for this application.
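In standard form, PCA works on n training images x_1, ..., x_n, each flattened to a d-dimensional vector. It computes the mean μ = (1/n) Σ x_i and the sample covariance C = (1/(n−1)) Σ (x_i − μ)(x_i − μ)^T, then solves C w_j = λ_j w_j. Each image is mapped to z = W^T (x − μ), where W = [w_1, ..., w_k] holds the eigenvectors with the k largest eigenvalues (k = 20 in this coursework). The NumPy sketch below implements this formula. It is an illustration, not the coursework's required implementation. It takes the eigenvectors from the SVD of the centred data rather than forming the d×d covariance matrix, which would be very large for image data. The right singular vectors of the centred data are exactly the eigenvectors of C, with λ_j = s_j² / (n − 1). The function name pca_fit_transform is hypothetical.

```python
import numpy as np


def pca_fit_transform(X, k=20):
    """PCA of X (n_samples, n_features): returns Z, W, eigenvalues and mean.

    Z = (X - mu) @ W are the k new features; the columns of W are the top-k
    eigenvectors of the sample covariance C = Xc^T Xc / (n - 1).
    """
    mu = X.mean(axis=0)                      # per-feature (per-pixel) mean
    Xc = X - mu                              # centred data
    # Right singular vectors of Xc are the eigenvectors of C (already sorted
    # by decreasing singular value, i.e. decreasing eigenvalue).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                             # (n_features, k) projection matrix
    eigvals = s[:k] ** 2 / (X.shape[0] - 1)  # lambda_j = s_j^2 / (n - 1)
    Z = Xc @ W                               # z = W^T (x - mu) for every row
    return Z, W, eigvals, mu
```

To project the test images, reuse the training mean and projection matrix: Z_test = (X_test - mu) @ W. Each λ_j is the variance of the j-th new feature, so λ_j / Σ λ gives the share of variance that feature retains.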
Section 3: Experimental Results with Analysis (15 marks)
The student needs to use tables to show the correct experimental results, with analysis, in the report. The student also needs to analyze which of the three models works best using all original features and using the 20 new features, and how much accuracy is lost by using PCA for Covid-19 classification/prediction.
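The accuracy loss from PCA can be reported as the difference between the accuracy with all original features and the accuracy with the 20 new features, in percentage points. The minimal sketch below assumes accuracies are collected in a nested dict; the helper accuracy_loss_table and that layout are hypothetical. The only numbers used are the brief's Table 1 example values (93% and 90% for kNN), which are placeholders and not real results.

```python
def accuracy_loss_table(results):
    """Build Table 1-style rows and find the best model per feature set.

    results: model name -> {"Original features": acc, "20 new features": acc},
    accuracies given as fractions in [0, 1].
    """
    rows = []
    for model, accs in results.items():
        orig = accs["Original features"]
        pca = accs["20 new features"]
        loss_pp = round((orig - pca) * 100, 2)   # loss in percentage points
        rows.append((model, f"{orig:.0%}", f"{pca:.0%}", loss_pp))
    best_orig = max(results, key=lambda m: results[m]["Original features"])
    best_pca = max(results, key=lambda m: results[m]["20 new features"])
    return rows, best_orig, best_pca


# Placeholder values copied from the brief's Table 1 example, not real results.
rows, best_orig, best_pca = accuracy_loss_table(
    {"kNN": {"Original features": 0.93, "20 new features": 0.90}})
for model, orig, pca, loss in rows:
    print(f"{model}: original {orig}, 20 new features {pca}, loss {loss} points")
```

Reporting the loss separately for the training and test halves makes it clear whether PCA mainly reduces overfitting or genuinely discards useful information.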
Section 4: Conclusion (5 marks)
The student can conclude the advantages and disadvantages of using PCA for Covid-19 prediction, based on the experimental results and the PCA algorithm. The student can also conclude whether or not there is feature ranking uncertainty across the three machine learning models. The student can give recommendations to improve Covid-19 prediction in this section.
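One possible way to quantify feature ranking uncertainty is to compare the rankings that two models produce using Spearman's rank correlation, ρ = 1 − 6 Σ d_i² / (n(n² − 1)). Here d_i is the difference between the positions of feature i in the two rankings. This measure is an assumption chosen for illustration; the brief does not prescribe one. The formula is exact here because a feature ranking has no ties. The helper spearman_rho is hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman correlation between two tie-free rankings of the same features.

    rank_a, rank_b: feature indices ordered from most to least important,
    e.g. the rankings obtained with kNN and with Naive Bayes.
    """
    n = len(rank_a)
    pos_a = {f: i for i, f in enumerate(rank_a)}   # feature -> position in ranking A
    pos_b = {f: i for i, f in enumerate(rank_b)}   # feature -> position in ranking B
    d2 = sum((pos_a[f] - pos_b[f]) ** 2 for f in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman_rho([0, 1, 2, 3], [0, 1, 3, 2]))   # two rankings that swap the last pair
```

A value near 1 for every pair of models suggests a stable ranking. Values well below 1 indicate that the importance of the 20 new features depends on the classifier, which is evidence of ranking uncertainty.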
Section 5: References (3 marks)
The student needs to read at least three reference papers for the technical report.
Note: The student needs to write around 1500 words for the technical report and provide the Python code.
Figure 1: Covid-19 and Non-Covid-19 CT images
Table 1 (Example): Three model performances using the 20 new features and all original features

Training Performance (50% of the image data)
Models | Original features | 20 new features
kNN | e.g., 93% | e.g., 90%
Naive Bayes | ... | ...
MLP-NNs | ... | ...

Test Performance (50% of the image data)
Models | Original features | 20 new features
kNN | ... | ...
Naive Bayes | ... | ...
MLP-NNs | ... | ...
Table 2 (Example): Ranking 20 new features using PCA and all data

Classifier | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Feature 6 | ... | Feature 20
Accuracy using k-NN classifier | e.g., 80% | ... | ... | ... | ... | ... | ... | ...
Accuracy using Naive Bayes classifier | e.g., 82% | ... | ... | ... | ... | ... | ... | ...
Accuracy using MLP-NNs classifier | e.g., 90% | ... | ... | ... | ... | ... | ... | ...
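The brief suggests setting a feature to its mean value for the ranking. One reading of this, assumed here, is a mean-replacement test: replace feature j with its mean, keep the other features unchanged, and record the classifier's accuracy. A lower accuracy means feature j matters more, and that accuracy fills one cell of the table. The minimal sketch below works with any fitted classifier that has a predict method. The helper name and the toy nearest-centroid classifier used for the demo are hypothetical stand-ins, not one of the three models in the brief.

```python
import numpy as np


def rank_features_by_mean_replacement(model, Z, y):
    """Accuracy with each feature set to its mean, plus features ranked by importance.

    model: fitted classifier with predict(Z); Z: (n_samples, n_features), e.g. the
    20 PCA features; y: labels. Lowest accuracy = most important feature.
    """
    accuracies = np.empty(Z.shape[1])
    for j in range(Z.shape[1]):
        Z_mod = Z.copy()
        Z_mod[:, j] = Z[:, j].mean()          # neutralise feature j only
        accuracies[j] = np.mean(model.predict(Z_mod) == y)
    ranking = np.argsort(accuracies, kind="stable")   # most important first
    return accuracies, ranking


class ToyNearestCentroid:
    """Hypothetical stand-in classifier used only for this demo."""

    def fit(self, Z, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([Z[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, Z):
        d = ((Z[:, None, :] - self.centroids_[None]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


# Synthetic demo: only feature 2 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 400)
Z = rng.normal(size=(400, 5))
Z[:, 2] += 3 * y
acc, ranking = rank_features_by_mean_replacement(ToyNearestCentroid().fit(Z, y), Z, y)
print(acc.round(3), ranking)
```

Fitted scikit-learn classifiers such as KNeighborsClassifier, GaussianNB and MLPClassifier all expose predict, so the same function can produce one row of the table per model. Comparing the resulting rankings shows whether they agree.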