So we start with a teacher's prediction of the grade distribution for a final exam. The teacher expects 25% of the final exam grades to be A's, 30% to be B's, 35% to be C's, and 10% to be D's. The teacher then gives the test and discovers that seven are A's, seven are B's, five are C's, and one is a D, and we want to know how well the expected compares to what she actually observed. So we're going to end up running a chi-square goodness of fit test to see how well the actual frequencies fit the expected frequencies. Now the values 7, 7, 5, and 1 would be referred to as the observed frequencies. So in order to run the goodness of fit test, we're going to have to find a chi-square test statistic, which calls for a formula: the sum of observed minus expected, quantity squared, divided by expected.
So because of that formula, we're now going to need an additional column on our chart up here, and we're going to call that the expected frequencies. If this teacher expected 25% of the students to get A's and there were 20 students, then he or she would expect to have five A's. This teacher expected 30% of the 20 students to get B's, so we would expect six B's. For 35% of the 20 students, we would expect seven C's, and for 10% of the 20 students, we would expect two D's. Now, technically, we can't run a goodness of fit test on this data.
And the reason is that when you run a goodness of fit test, all the expected values should be greater than or equal to five. Since we came up with an expected value of two, technically we don't have a large enough sample size to run an effective goodness of fit test. But we're going to do so anyway, just for the practice of running a goodness of fit test. So we're now going to create a column called O minus E, quantity squared, divided by E. That means we're going to take the observed A's, which was seven.
We're going to subtract the expected A's, which was five. We're going to square that quantity and divide by five, which is the expected value. In doing so, we get a fraction of 4/5. Then we're gonna do that for the B's. We observed seven, and we expected six.
We're gonna square that difference and divide by the expected, and you'll get 1/6. Then we observed five C's and expected seven C's, so we get 4/7. And finally we observed one D and expected two, so we'll get one half. When we add up that column, we get what is called our chi-square test statistic, and our chi-square test statistic for this data is 2.0381.
We're then going to calculate a p-value, and the p-value is going to be equal to the probability that our chi-square is greater than that test statistic we just found. A picture oftentimes helps with this. The chi-square family of graphs, for the most part, is skewed right. For this particular case, the degrees of freedom of this graph is three, and we find degrees of freedom for chi-square tests by taking the number of categories and subtracting one. Since this teacher broke her grades up into A, B, C, and D, we had four categories.
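The arithmetic up to this point can be double-checked with a short Python sketch. This is only an illustration, not part of the calculator walkthrough: the variable names are made up for the example, and exact fractions are used so the per-grade values match the ones written on the board.

```python
from fractions import Fraction

# Data from the example: observed counts and the teacher's predicted percentages.
grades = ["A", "B", "C", "D"]
observed = [7, 7, 5, 1]
expected_pct = [25, 30, 35, 10]

n = sum(observed)  # 20 students in total

# Expected frequency = predicted percentage of the 20 students.
expected = [Fraction(p, 100) * n for p in expected_pct]

# Grades whose expected count falls below the "at least five" guideline.
too_small = [g for g, e in zip(grades, expected) if e < 5]

# One (O - E)^2 / E term per grade, kept as exact fractions.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

chi_square = sum(contributions)   # the chi-square test statistic
df = len(grades) - 1              # number of categories minus one

for g, o, e, c in zip(grades, observed, expected, contributions):
    print(g, "observed", o, "expected", str(e), "contribution", str(c))
print("expected counts below 5:", too_small)
print("chi-square =", str(chi_square), "which is about", f"{float(chi_square):.4f}")
print("degrees of freedom =", df)
```

The four contributions come out as 4/5, 1/6, 4/7, and 1/2, and the only grade flagged by the expected-count check is D.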
So that means the degrees of freedom for this chi-square test is going to be three. And we know that the average of a chi-square distribution is equal to the degrees of freedom. So the average of this is going to be three, and you always find the average slightly to the right of the peak. So we're gonna have a three right here, and what we're doing when we're running this test is calculating the p-value, and the p-value would be: what is the probability that the chi-square value is greater than 2.0381? So we're going to have to use a feature in the calculator in order to generate that area to the right of 2.0381 for a chi-square distribution with three degrees of freedom. What we're going to use is our chi-square cdf, or cumulative distribution function, in your graphing calculator to calculate this, and when you do so it asks you for the lower boundary, the upper boundary, and then the degrees of freedom.
So in our example, the lower boundary is going to be 2.0381. The upper boundary is going to be just a super, super large number. We're going to just say 10 to the 99th power, and our degrees of freedom here was three. So I'm going to bring in my graphing calculator to calculate that p-value. We're going to hit the second button, then hit the VARS button for the distributions, and we're going to select the chi-square cdf in my calculator.
It's number eight. We're gonna type in that lower boundary, 2.0381, and follow it up with a comma. We're then going to type in a super large number, 10 to the 99th power, and then follow that up with the degrees of freedom, and we get a p-value of 0.5645. So what does that refer to in our chart? It refers to the area, or the proportion of the curve, that is shaded to the right of 2.0381. Now, we could have gotten that another way using the graphing calculator, which also gives us the actual graph or image as well. So I'm gonna bring my calculator in one more time. Let me clear what we had there.
And we're going to hit that STAT button, select TESTS, and do a chi-square goodness of fit test, which in my calculator is letter D. In doing so, we're going to have to tell it the degrees of freedom, which in this case was three. And you can see we have our total in there, the 2.0381. Let me try again: TESTS, goodness of fit test. And now you can see the p-value is 0.5645, and you can see what the picture looks like.
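For anyone following along without a graphing calculator, here is a rough Python sketch of the same right-tail area. It relies on an assumption worth stating loudly: the closed-form expression below only works for exactly three degrees of freedom, and the function name is invented for this example. If SciPy happens to be installed, `scipy.stats.chi2.sf(x, 3)` should give the same area for any statistic x.

```python
import math

def chi2_upper_tail_df3(x):
    """Area to the right of x under a chi-square curve with 3 degrees of freedom.

    Hypothetical helper for this example only: the formula is the closed form
    of the upper tail for df = 3 and is NOT valid for other degrees of freedom.
    """
    z = x / 2
    return math.erfc(math.sqrt(z)) + (2 / math.sqrt(math.pi)) * math.sqrt(z) * math.exp(-z)

# Test statistic from the worked example: 4/5 + 1/6 + 4/7 + 1/2.
chi_square = 4 / 5 + 1 / 6 + 4 / 7 + 1 / 2

p_value = chi2_upper_tail_df3(chi_square)
print(f"chi-square = {chi_square:.4f}, p-value = {p_value:.4f}")
```

The printed p-value should land very close to the 0.5645 shown on the calculator screen.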
Okay. And the picture on the calculator kind of mirrors the picture that I have drawn on the whiteboard as well. Okay, so from there, I'm gonna get rid of the graphing calculator at this point, and we're ready to draw a conclusion. So we would have to write hypotheses, and our null hypothesis is going to state that the distribution of grades on this final exam is as stated, which was 25% A's, 30% B's, 35% C's, and 10% D's. The alternative hypothesis is going to be that the distribution of grades on the final exam differs from what is stated, and all it has to do is differ in one category. So it might be that the A's are 25% but the B's are 25% as well. As long as one of the categories differs, then it would fall into that category.
So when you're running a chi-square test, to make your decision you are going to compare the level of significance to the p-value. If your level of significance is greater than your p-value, then the decision has to be to reject the null hypothesis. In our case, we're going to be running our test at a 5% significance level. So we're asking whether 0.05 is greater than the p-value, and our p-value is 0.5645, so that is false. That's not a true statement.
So because that's not true, we're not going to reject the null hypothesis. Our decision is going to be to fail to reject the null hypothesis. And if we fail to reject the null hypothesis, then we have to draw this conclusion: there is not enough evidence at the 5% significance level to state that the distribution is different from the stated proportions of 25% A's, 30% B's, 35% C's, and 10% D's.
So there's not enough evidence to say it's not that. Now, we're not saying it is that. We're saying there's just not enough evidence to dispute that it is that. So that's how we run a chi-square goodness of fit test.
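The decision rule from the end of the example can be written down as a tiny Python sketch. The function name and the wording of the returned strings are made up for illustration; the 0.05 significance level and the 0.5645 p-value are the ones from the example.

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis only when the significance level exceeds the p-value."""
    if alpha > p_value:
        return "reject H0"
    return "fail to reject H0"

# Values from the worked example.
alpha = 0.05
p_value = 0.5645

decision = decide(p_value, alpha)
print(f"Is {alpha} > {p_value}? {alpha > p_value}. Decision: {decision}")
```

Since 0.05 is not greater than 0.5645, the sketch lands on the same decision as the example: fail to reject the null hypothesis.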