arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Fri Apr 18, 2008 4:15 pm Post subject: Project 3 |
|
|
Data Files:
================
The training images data file. Contents include digitized hand-written training images, i.e., 28x28 pixel images of digits, that you use to compute the parameters of your Gaussian class-conditional probability density functions (pdfs). You will compute a different pdf for each digit 0-9 to generate 10 pdfs in total.
train-images-idx3-ubyte
The training labels data file. Contents include the correct labels for the hand-written training images. The label values are 0-9 indicating the true value of the handwritten digit. They are indexed the same way the training images are indexed, i.e., the first image from the image data is associated with the first label from the label data.
train-labels-idx1-ubyte
The testing images data file. Contents include digitized hand-written test images, i.e., 28x28 pixel images of digits, that you will classify with your system.
t10k-images-idx3-ubyte
The testing labels data file. Contents include the correct labels for the hand-written test images. The label values are 0-9 indicating the true value of the handwritten digit. They are indexed the same way the testing images are indexed, i.e., the first image from the image data is associated with the first label from the label data.
t10k-labels-idx1-ubyte
Project Assignment PDF
=====================
Project 3 PDF
Overall skeleton of the project3 MATLAB code to help you get started on the project:
============================================================
project3skeleton.m
Supporting MATLAB functions to load the data and display the hand-written images:
===========================================================
readUByteImageAndLabel.m
getImage.m |
arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Tue Apr 22, 2008 3:33 pm Post subject: |
|
|
Sample output:
Note that these results use 25 training examples of each digit and classify the first 400 test images.
======BEGIN OUTPUT================
Loading the 25 training images to compute PCA vectors with 784-dimensions.
Done computing the scatter of 25 training vectors for each class.
Results for digit 0 achieved 31 correct classifications of 33 total instances, 6.06 percent error.
Results for digit 1 achieved 57 correct classifications of 57 total instances, 0.00 percent error.
Results for digit 2 achieved 24 correct classifications of 44 total instances, 45.45 percent error.
Results for digit 3 achieved 27 correct classifications of 35 total instances, 22.86 percent error.
Results for digit 4 achieved 32 correct classifications of 46 total instances, 30.43 percent error.
Results for digit 5 achieved 21 correct classifications of 42 total instances, 50.00 percent error.
Results for digit 6 achieved 26 correct classifications of 34 total instances, 23.53 percent error.
Results for digit 7 achieved 31 correct classifications of 41 total instances, 24.39 percent error.
Results for digit 8 achieved 16 correct classifications of 27 total instances, 40.74 percent error.
Results for digit 9 achieved 33 correct classifications of 41 total instances, 19.51 percent error.
Got 298 correct classifications of 400 total classifications.
Classifier has 25.50 percent error.
Time spent for PCA classifier classifications is 15.96 seconds
======END OUTPUT================ |
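The percent-error figures in the output above follow from simple counting: error = 100 * (total - correct) / total. A minimal sketch of that tally, assuming made-up label vectors (trueLabels, predLabels, and pctError are hypothetical names, not taken from the project skeleton):

```matlab
% Hypothetical toy labels; in the project these would come from the
% test label file and from the classifier's predictions.
trueLabels = [0 0 1 1 1 2 2 2 2 2];
predLabels = [0 1 1 1 1 2 2 0 2 1];

digits = 0:2;                          % only three classes in this toy case
pctError = zeros(1, numel(digits));

for i = 1:numel(digits)
    d = digits(i);
    isDigit = (trueLabels == d);               % instances whose true label is d
    total = sum(isDigit);
    correct = sum(predLabels(isDigit) == d);   % of those, how many were predicted as d
    pctError(i) = 100 * (total - correct) / total;
    fprintf('Results for digit %d achieved %d correct classifications of %d total instances, %.2f percent error.\n', ...
            d, correct, total, pctError(i));
end

% Overall error across every test instance.
totalCorrect = sum(predLabels == trueLabels);
overallError = 100 * (numel(trueLabels) - totalCorrect) / numel(trueLabels);
fprintf('Classifier has %.2f percent error.\n', overallError);
```

Applied to the posted totals, the same formula gives 100 * (400 - 298) / 400 = 25.50 percent.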
nchuku
Joined: 25 Jan 2008 Posts: 6

|
Posted: Wed Apr 23, 2008 9:18 am Post subject: Need a quick help |
|
|
From the skeleton you gave us for this project, I know that Ivec is the reshaped version of a training image. What, then, is the gammaMatrix? I am a bit confused about that. Also, to compute psi, do I find the mean of Ivec or the mean of the gammaMatrix?
Thanks |
arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Wed Apr 23, 2008 4:59 pm Post subject: |
|
|
The gamma matrix is the matrix of images, with one column for each image vector (Ivec). If we load 50 training images, gammaMatrix has dimensions 784x50.
The mean image is the average of the image vector Ivec over all images or, equivalently, the 784x1 column vector obtained by averaging each row of gammaMatrix across its columns. |
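A minimal sketch of that averaging on a toy matrix, assuming 4-pixel "images" instead of 784-pixel ones (phiMatrix is a hypothetical name used here for the mean-subtracted images):

```matlab
% Toy stand-in for gammaMatrix: 3 "images" of 4 pixels each, one per column.
% The real matrix would be 784 x M for M training images.
gammaMatrix = [1 2 3;
               4 5 6;
               7 8 9;
               0 0 3];

% Mean image psi: average each row across its columns -> 4x1 column vector.
psi = mean(gammaMatrix, 2);

% Mean-subtracted images: subtract psi from every column.
% repmat is used so the code also runs on older MATLAB releases.
phiMatrix = gammaMatrix - repmat(psi, 1, size(gammaMatrix, 2));

disp(psi');                  % the mean image, shown as a row: 2 5 8 1
disp(sum(phiMatrix, 2)');    % every row of phiMatrix sums to zero
```

Note that mean(gammaMatrix) without the second argument averages down each column instead and returns a 1xM row vector, which is not the mean image.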
engrforever
Joined: 24 Jan 2008 Posts: 10

|
Posted: Sun Apr 27, 2008 7:42 am Post subject: |
|
|
Dr. Willis,
I am not clear on the following:
When computing the principal components in "computeFullEigenSpace", bullet 5 asks us to normalize the eigenvectors and selectively make changes if eigenvectors are <= 0.
The results we acquire from [eigVecs, eigVals] = eig(covMatrix); are already normalized.
Am I misunderstanding something here? |
arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Sun Apr 27, 2008 6:26 pm Post subject: |
|
|
All columns of Ufull and Ureduced should be unit length and mutually perpendicular: the dot product of a column with any other column of the same matrix should be 0, and the dot product of a column with itself should be 1 (unit length).
If your code is implemented correctly, your vectors should satisfy these criteria. |
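One way to check both criteria at once is to test whether U' * U is close to the identity matrix. The sketch below assumes a small made-up data set and a covariance built as phi * phi' / M; the variable names other than eigVecs, eigVals, and covMatrix are hypothetical:

```matlab
% Hypothetical data: 10 sample vectors of dimension 5, one per column.
X = [magic(5), magic(5)'];
M = size(X, 2);

% Mean-subtract and build a symmetric covariance matrix.
psi = mean(X, 2);
phi = X - repmat(psi, 1, M);
covMatrix = (phi * phi') / M;
covMatrix = (covMatrix + covMatrix') / 2;   % guard against round-off asymmetry

[eigVecs, eigVals] = eig(covMatrix);

% Column lengths: each should be 1.
colNorms = sqrt(sum(eigVecs.^2, 1));

% Gram matrix: entry (i,j) is the dot product of columns i and j,
% so it should equal the identity for an orthonormal set.
G = eigVecs' * eigVecs;
orthoError = max(max(abs(G - eye(size(G)))));

fprintf('Largest deviation from unit length: %g\n', max(abs(colNorms - 1)));
fprintf('Largest deviation of U''*U from identity: %g\n', orthoError);
```

For a real symmetric input, eig returns orthonormal eigenvectors, so both deviations should be at round-off level. A large value usually points to a step after eig (rescaling, sign changes, column selection) that altered the vectors.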
engrforever
Joined: 24 Jan 2008 Posts: 10

|
Posted: Mon Apr 28, 2008 12:46 pm Post subject: |
|
|
Dr. Willis,
I do not understand a few of the equations. Do you have a few minutes? I will be in Woodward 211. |
twmeiswi
Joined: 21 Mar 2007 Posts: 7

|
Posted: Mon Apr 28, 2008 7:58 pm Post subject: |
|
|
Hi,
I noticed in the instructions for Project 3 that, for kNNClassify, I_test should be passed gamma_z. I believe that it is impossible to calculate the value of phi without the psi value, i.e., the average of the images. I believe I should pass the omega value to kNNClassify and not findkNN. Also, I believe either findkNN or kNNClassify could be excluded from the project because essentially they do the same thing. What parameters should these functions take, and why is kNNClassify only passed gamma_z?
Thanks,
Tom |
arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Mon Apr 28, 2008 10:18 pm Post subject: |
|
|
featureValues is the omega matrix, i.e., the set of M K-dimensional points.
featureLabels is the matrix L, the set of labels for the omega matrix, one label for each row.
Yes, you need to pass in psi to compute phi_test from I_test (simple subtraction).
If you wish to do nearest neighbor estimation for k>1, then the work from project 2 in the kNNClassify and findkNN functions is very useful.
kNNClassify and findkNN do totally different things. Briefly:
findkNN generates the candidate samples which are the k-nearest neighbors.
kNNClassify determines the label to assign to the test vector based on the values returned by the findkNN function.
As stated in the project assignment, kNNClassify has more than just gamma_z as input parameters. |
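The split between the two steps can be sketched in a few lines. This is an illustration only: the real findkNN and kNNClassify signatures are defined in the project assignment and are not reproduced here, and the projection omega = Ureduced' * phi_test is an assumption based on the usual PCA formulation. All values are made up.

```matlab
% Hypothetical toy setup: 3-pixel "images" projected onto K = 2 components.
psi      = [1; 1; 1];                 % mean image
Ureduced = [1 0; 0 1; 0 0];           % orthonormal columns (assumed basis)
I_test   = [5.5; 6.2; 1];             % test image as a column vector

% phi_test needs psi: simple subtraction, then projection into feature space.
phi_test  = I_test - psi;
omegaTest = (Ureduced' * phi_test)';  % 1 x K row, to match featureValues rows

% Training features: M = 6 points, one per row, with one label per row.
featureValues = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];   % omega matrix, M x K
featureLabels = [0; 0; 0; 1; 1; 1];               % L
k = 3;

% Step 1 (the role findkNN plays): locate the k nearest training points.
diffs = featureValues - repmat(omegaTest, size(featureValues, 1), 1);
dists = sqrt(sum(diffs.^2, 2));       % Euclidean distance to each row
[sortedDists, order] = sort(dists);
neighborIdx = order(1:k);

% Step 2 (the role kNNClassify plays): majority vote over neighbor labels.
predictedLabel = mode(featureLabels(neighborIdx));

fprintf('Nearest rows: %s, predicted label: %d\n', ...
        mat2str(sort(neighborIdx)'), predictedLabel);
```

The first step only returns candidates; the second decides the label from them, which is why neither function makes the other redundant.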
arwillis Site Admin
Joined: 02 Nov 2005 Posts: 396

|
Posted: Tue Apr 29, 2008 11:09 am Post subject: |
|
|
My previous result used 0.1*median(lambda) as the threshold, which differs from what the project specifies. Using mean(lambda) as the threshold, lambda_bar, I get the following result.
=============================================================
Loading the 25 training images to compute PCA vectors with 784-dimensions.
Done computing the scatter of 25 training vectors for each class.
Results for digit 0 achieved 29 correct classifications out of 33 total instances, 12.12 percent error.
Results for digit 1 achieved 57 correct classifications out of 57 total instances, 0.00 percent error.
Results for digit 2 achieved 27 correct classifications out of 44 total instances, 38.64 percent error.
Results for digit 3 achieved 24 correct classifications out of 35 total instances, 31.43 percent error.
Results for digit 4 achieved 35 correct classifications out of 46 total instances, 23.91 percent error.
Results for digit 5 achieved 22 correct classifications out of 42 total instances, 47.62 percent error.
Results for digit 6 achieved 28 correct classifications out of 34 total instances, 17.65 percent error.
Results for digit 7 achieved 30 correct classifications out of 41 total instances, 26.83 percent error.
Results for digit 8 achieved 18 correct classifications out of 27 total instances, 33.33 percent error.
Results for digit 9 achieved 30 correct classifications out of 41 total instances, 26.83 percent error.
Got 300 correct classifications out of 400 total classifications.
Classifier has 25.00 percent error.
Time spent for PCA classifier classifications is 20.77 seconds |
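The two thresholds mentioned in this post can be compared on a toy spectrum. The sketch below assumes that eigenvectors are kept when their eigenvalue is strictly greater than the threshold; the comparison direction, the sample eigenvalues, and the names keep, keepAlt, and altThreshold are assumptions for illustration only.

```matlab
% Hypothetical eigenvalues and a matching set of eigenvector columns.
lambda = [0.1; 0.5; 2.0; 9.4];
Ufull  = eye(4);

% Threshold from the project: the mean eigenvalue.
lambdaBar = mean(lambda);
keep      = lambda > lambdaBar;          % logical mask over components
Ureduced  = Ufull(:, keep);

% Threshold used for the earlier post: one tenth of the median eigenvalue.
altThreshold = 0.1 * median(lambda);
keepAlt      = lambda > altThreshold;

fprintf('mean threshold %.3f keeps %d of %d components\n', ...
        lambdaBar, sum(keep), numel(lambda));
fprintf('0.1*median threshold %.3f keeps %d of %d components\n', ...
        altThreshold, sum(keepAlt), numel(lambda));
```

Because a few large eigenvalues pull the mean upward, the mean threshold typically retains fewer components than 0.1*median(lambda) does.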