Correlation between test and homework scores
Suppose we have a set of exam scores and a set of homework scores for a class and we want to see how they compare or correlate. There are many ways to make the comparison, many of which you could dream up on your own. One of the simplest is the raw correlation coefficient, defined as the cosine of the angle between the two sets of scores thought of as vectors in R^n, where n is the number of students in the class. For example, here are the exam scores for a test.
> ex := [98,64,87,72,73,64,87,84,93,57,79,88,79,83,92,87,70,27,88,87,52,84,80,62,96];
and here are the corresponding lists of missed (ms) and right (rt) answer counts for homework done in the 5 weeks prior to the test.
> ms := [10,29,13,35,2,68,36,10,10,84,28,40,39,46,33,15,51,50,51,23,39,58,13,40,44];
> rt := [102,95,118,74,11,118,76,102,102,72,79,117,73,174,279,76,96,145,95,147,72,54,99,92,198];
Now the average exam score is 77.3 (79.4 if the 27 is thrown out).
> exave := evalf(convert(ex,`+`)/nops(ex));
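The average without the low score can be checked the same way. This is only a sketch: ex2 is a name introduced here for illustration, and it relies on the 27 being the 18th entry of ex, which subsop removes by substituting NULL.
> ex2 := subsop(18=NULL, ex);
> evalf(convert(ex2,`+`)/nops(ex2));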
The total attempts (tot) and homework percentages (hm) are computed from ms and rt.
> tot := ms + rt;
> hm := [seq(100.*rt[i]/tot[i],i=1..nops(rt))];
Perhaps the most natural correlation is to compute the cosine of the angle between the exam vector and the various homework vectors. We would interpret a return of 1 to mean that the two score vectors correlate perfectly. A return of 0 means that the scores are totally unrelated (perpendicular to each other). When would we expect a negative correlation? Perhaps when the scales are reversed. For example, we would expect a negative correlation between homework missed and exam score.
Here is a Maple word which computes this correlation. corr has two inputs u and v (vectors of the same length); it returns the cosine of the angle between u and v.
> corr := proc(u,v)
    # cosine of the angle between u and v: (u . v)/(|u| |v|)
    evalf(linalg[dotprod](u,v)/(linalg[norm](u,2)*linalg[norm](v,2)))
  end;
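As a quick check of the definition, corr can be tried on two small made-up pairs of vectors (they are not part of the class data). The first pair is perpendicular, so corr should return 0. In the second pair one vector is twice the other, so the angle between them is 0 and corr should return 1.
> corr([1,0],[0,1]);
> corr([1,2],[2,4]);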
> corr(ex,rt);
> corr(ex,ms);
> corr(ex,tot);
> corr(ex,hm);
This is fantastic! Look at how well hm correlates with ex. Unfortunately, ex also correlates fairly well with ms, where we said we would expect negative correlation. In fact, two vectors of nonnegative numbers can never have a negative correlation under this definition, since their dot product is a sum of nonnegative terms, so we must change our definition of correlation. The standard way to do this is to subtract from each entry of a vector the average of that vector's entries before the cosine is computed. That is what goodcorr below does.
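> goodcorr := proc(u,v)
    local mu,mv,i,m;
    # subtract the average of u from each entry of u
    m := convert(u,`+`)/nops(u);
    mu := [seq(u[i]-m,i=1..nops(u))];
    # do the same for v
    m := convert(v,`+`)/nops(v);
    mv := [seq(v[i]-m,i=1..nops(v))];
    # cosine of the angle between the centered vectors
    corr(mu,mv)
  end;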
Using this to correlate ex with hm, ms, rt, and tot:
> goodcorr(ex,hm);
> goodcorr(ex,ms);
> goodcorr(ex,tot);
> goodcorr(ex,rt);
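A small made-up example (not class data) shows what subtracting the averages accomplishes. The lists [1,2,3] and [3,2,1] run in opposite directions, yet corr returns 10/14, about 0.714, because all the entries are positive. After centering, the lists become [-1,0,1] and [1,0,-1], which point in exactly opposite directions, so goodcorr returns -1.
> corr([1,2,3],[3,2,1]);
> goodcorr([1,2,3],[3,2,1]);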
Actually, this correlation is already available as linearcorrelation in the describe subpackage of the stats package.
> with(stats);
> with(describe);
> linearcorrelation(ex,hm);
> evalf(linearcorrelation(ex,ms));
> evalf(linearcorrelation(ex,tot));
> evalf(linearcorrelation(ex,rt));
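To confirm that the two computations agree, we can subtract one from the other. This assumes that linearcorrelation computes the same centered cosine as goodcorr. If it does, each difference should be zero up to floating-point roundoff.
> goodcorr(ex,hm) - evalf(linearcorrelation(ex,hm));
> goodcorr(ex,ms) - evalf(linearcorrelation(ex,ms));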
So we see that the correlation between homework grades and exam grades is fairly strong, although not by any means perfect. When we get to the chapter on least squares approximation, we will examine a different way to measure correlation.