A DESCRIPTION OF TEXAS A&M UNIVERSITY’S TEST SCORE REPORT
Measurement and Research Services
Memo No. 18
November 12, 2004
For every test submitted to Measurement and Research Services’ test scoring service, a report on the general characteristics of the test is produced. This bulletin describes the statistics reported in the test score report and will aid in interpreting the test results. For information on how to submit tests to be scored, see the handout “Test Scoring Service Information Bulletin”. Information on the data files generated by the scoring program can be found in the handout “Record Formats for Data Files produced from Optically Scanned Sheets”.
The first pages of the test report describe the information used in scoring the test and list the reporting and output options requested. The first page lists general descriptive and test item count information, lists the scoring constants used, and shows which reporting and output options were requested. The following pages show the scoring key, the scoring weight assigned to each item, the arrangement of items on each test form, and the weights assigned to each option on multiple keyed items.
***General Description Information***
The first section of the test score report lists the descriptive information collected on the scoring request form. The same information is listed on the top line of each page of the report. An example of this section is shown below.
GENERAL INFORMATION
Course : TST 123
Taught by : PROFNAME FM
Test Number : 07
Date Scored : 11/12/04
In the example, the course was TST 123 and the instructor was F.M. Profname. The test was labeled number 07, and it was scored on November 12, 2004.
***Test Item Information***
The next section lists the number of items used in determining the total score. The test used as an example had 20 items, one of which was omitted. The following information is reported.
TEST ITEM INFORMATION
Number of the Last Item : 20
Number of Items Omitted : 1
Number of Items Scored : 19
The last item listed on the answer key was item twenty, and the item listed as omitted was left blank on the answer key. The number of items scored is the difference between the number of items listed and the number of items omitted. This information provides a quick check of whether the test was properly scored.
A high number of omitted items usually indicates that something is amiss.
***Scoring Information***
The additive and multiplicative constants used in deriving the test scores are listed in the next section. It will look like this:
SCORING FACTORS
Bonus points 0
Scaling Factor 1.00
Bonus points refers to the number of points which will be added to every student’s score. If no bonus points are to be awarded, the value printed will be “0”. The scaling factor is the value by which item scores are multiplied. If a scaling factor is not specified, a factor of one will be used. The weights shown in the SCORING WEIGHTS section reflect the scaling factor. In the example above, no bonus points will be awarded.
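As a rough illustration of how the two constants combine, the sketch below assumes that each item score is multiplied by the scaling factor and that the bonus points are then added once to the student's total. The function name `total_score` and the sample item points are hypothetical, and the scoring program's exact order of operations is not documented in this bulletin.

```python
def total_score(item_points, scaling_factor=1.0, bonus_points=0.0):
    """Combine per-item points into a total score.

    Assumption: every item score is multiplied by the scaling factor,
    and the bonus is added once to the student's score.
    """
    scaled = sum(points * scaling_factor for points in item_points)
    return scaled + bonus_points


# Hypothetical student: 15 one-point items answered correctly, 4 missed.
points = [1] * 15 + [0] * 4

# Defaults match the example report: bonus 0, scaling factor 1.00.
print(total_score(points))                                      # 15.0
print(total_score(points, scaling_factor=2.0, bonus_points=5))  # 35.0
```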
***List of the Options Selected***
At the bottom of the first page is a listing of the report and output options that you requested. An example is shown below.
OPTIONS REQUESTED:
Report Options              Output Options
Name Roster                 Test Scores (floppy)
ID/Error Roster
Individual Score Reports
Unless you requested a report or output option on the Scoring Request Sheet, it will not be listed in this section and will not be produced. See the handout entitled “Test Scoring Service Information Bulletin” for a complete list of the options available.
***Answer Key, Item Weight and Form Information***
The second page of the report displays the answer key, the scoring weights assigned to each test item, and the item order for all of the alternate forms of the test. An example of this section is shown below.
ANSWER KEY, SCORING WEIGHTS AND VALID TEST FORMS
 STANDARD FORM (A)  | ALTERNATIVE FORMS
Item | Key | Weight |  B |  C |  D
   1 |  C  |   1    | 20 | 11 |
   2 |  A  |   1    | 19 | 12 |
   3 |  D  |   1    | 18 | 13 |
   4 |  E  |   1    | 17 | 14 |
   5 |  A  |   1    | 16 | 15 |
   6 | ***MULTI***  | 15 | 16 |
   7 |  B  |   1    | 14 | 17 |
   8 |  B  |   1    | 13 | 18 |
   9 |  B  |   1    | 12 | 19 |
  10 |  A  |   1    | 11 | 20 |
  11 |  E  |   1    | 10 |  1 |
  12 | ***MULTI***  |  9 |  2 |
  13 |  ***OMIT***  |  8 |  3 |
  14 |  A  |   1    |  7 |  4 |
  15 |  E  |   1    |  6 |  5 |
  16 |  B  |   1    |  5 |  6 |
  17 |  C  |   1    |  4 |  7 |
  18 |  D  |   1    |  3 |  8 |
  19 |  D  |   1    |  2 |  9 |
  20 |  ***ZERO***  |  1 | 10 |
MULT: Multiple Key Item. Weights are listed below.
OMIT: Omitted Item
ZERO: Zero Weighted Item
The column labeled “key” lists the correct answer to each item. The weight column shows how many points will be added to the total score when each item is answered correctly. If you did not assign special item weights, the weight assigned to each item will be “1”. If an item has more than one response for which points are awarded, the word “MULT” will appear in the key and weight columns. For the test used as an example, multiple weights are shown for items 6 and 12. Weights for items with multiple responses are listed in the next section of the report. Items omitted from the test will show the word “OMIT” (see item 13), and items receiving zero weight will show the word “ZERO” (see item 20).
The numbers shown in the columns under “ALTERNATIVE FORMS” are the item orders for each form of the test. In the preceding example, there were two alternative forms (B and C). The order of the items on Form B was the reverse of the order on Form A (the Standard Form). The items of the Standard Form were split in half and the two halves swapped to produce the item order for Form C. Reading across the line for the first item, the information shown in the example tells us that item 1 on Form A appeared as the last item on Form B, and as the eleventh item on Form C.
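The two item orders in the example follow simple rules, which the sketch below reproduces for a 20-item test. The function names and the `to_standard_order` helper are hypothetical; they only illustrate how the position columns for Form B and Form C could be used to put a student's answers back into Standard Form order, and are not a description of the scoring program itself.

```python
NUM_ITEMS = 20


def form_b_position(item):
    """Form B reverses the standard order: item 1 is last, item 20 is first."""
    return NUM_ITEMS + 1 - item


def form_c_position(item):
    """Form C swaps the two halves: items 1-10 come after items 11-20."""
    half = NUM_ITEMS // 2
    return item + half if item <= half else item - half


def to_standard_order(responses, position_of):
    """Reorder answers given in form order into Standard Form (A) order.

    responses[k] is the answer marked at position k + 1 on the alternate form.
    """
    return [responses[position_of(item) - 1] for item in range(1, NUM_ITEMS + 1)]


print(form_b_position(1), form_c_position(1))    # 20 11
print(form_b_position(11), form_c_position(11))  # 10 1
```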
***Multiple Key Items***
If your test had items which had multiple keys, the next section of your report will appear something like this:
MULTIPLE KEYED ITEM WEIGHTS
Item |  A   |  B   |  C   |  D   |  E
   6 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00
  12 | 1.00 | 0.00 | 2.00 | 0.00 | 0.00
Recall that items 6 and 12 in the example had more than one response which earned credit. This section shows the points awarded for each response to these items. On item 12, a student selecting response “A” will be awarded one point, and a student selecting response “C” will receive two points. No other answer is awarded credit.
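A minimal sketch of how such a weight table could be applied when scoring one response is shown below. The weights are copied from the example; the dictionary layout and the function name `points_for` are hypothetical, and treating an omitted answer as zero points is an assumption consistent with the rule that omitted items earn no credit.

```python
# Response weights copied from the MULTIPLE KEYED ITEM WEIGHTS example.
MULTI_KEY_WEIGHTS = {
    6:  {"A": 0.0, "B": 1.0, "C": 0.0, "D": 0.0, "E": 1.0},
    12: {"A": 1.0, "B": 0.0, "C": 2.0, "D": 0.0, "E": 0.0},
}


def points_for(item, response):
    """Return the points earned for one response to a multiple keyed item.

    An omitted answer (None) or an unlisted response earns no points.
    """
    return MULTI_KEY_WEIGHTS[item].get(response, 0.0)


print(points_for(12, "A"))  # 1.0
print(points_for(12, "C"))  # 2.0
print(points_for(12, "B"))  # 0.0
```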
ITEM ANALYSIS
Following the key and option information is the item analysis. It contains information about each test item. Computed for each item is its 1) difficulty, 2) discrimination, 3) number of omissions, 4) number of responses to each option, and 5) discrimination for each response option. The item analysis information will look like this:
Item 12      Diff.=  0.62   Response     A*      B     C*      D      E
             Disc.=  0.32   Count         7     10     58      8     14
             Omits=     3   Disc (R)    .18   -.27    .40   -.09   -.13

Item 13      ***Item Omitted***

Item 14  =>  Diff.=  0.14   Response     A*      B      C      D      E
         =>  Disc.= -0.21   Count        14     57     13      0     17
             Omits=     0   Disc (R)   -.21<   .23<  -.04    .00   -.01
The arrows (“=>” and “<”) flag item statistics which may indicate that the item is not working properly. All item statistics and the criteria for warning flags are described below.
1) Item Difficulty - The first value shown for each item is the item’s difficulty (Diff). It is the sum of all of the points awarded on the item divided by the maximum number of points possible. The value of this statistic will be between 0.00 (indicating that none of the students received credit for their answer) and 1.00 (meaning that all the students received maximum credit). The term “item difficulty” is a misnomer. A better term would be “easiness”, because the higher the value of the index, the easier the item was. In the example above, the difficulty index for item 14 was 0.14. Since this item had only one correct answer, this means that 14% of the students answered the item correctly. The value can be verified by dividing the number of students who chose response “A” on item 14 by the total number of students who took the test (100).
Computing the item difficulty for item 12, a multiple key item, is slightly more complicated. Recall that the item has two responses for which credit is given (A and C). If both responses were worth the same number of points (which they are not), the item difficulty could be calculated by counting the students who received credit on the item and dividing by the total number of students. To compute the difficulty index for an item with different response weights, multiply the number of students selecting each option by the response weight, sum the five products, and divide by the maximum number of points possible (the number of students times the maximum credit).
One may ask what the range of item difficulties should be on a good test. The answer is that it depends on what you wish to know. If the purpose of a test is to determine whether the students have mastered a topic area, one would expect high difficulty values. If the purpose of a test is to discriminate between different levels of achievement, items with difficulty values between 0.3 and 0.7 will be most effective. Warning flags will appear next to the item difficulty index when the item is extremely easy (the difficulty is greater than 0.95) or extremely difficult (the index is less than 0.20).
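The calculation can be checked against the example with a short sketch. The response counts and weights below are taken from the example report, and the total of 100 students is the figure stated in the text; the function name `item_difficulty` and the dictionary layout are hypothetical.

```python
def item_difficulty(counts, weights, num_students):
    """Points awarded on the item divided by the maximum points possible."""
    awarded = sum(counts[option] * weights[option] for option in counts)
    max_possible = num_students * max(weights.values())
    return awarded / max_possible


# Item 12: responses A and C earn 1 and 2 points respectively.
counts_12 = {"A": 7, "B": 10, "C": 58, "D": 8, "E": 14}
weights_12 = {"A": 1.0, "B": 0.0, "C": 2.0, "D": 0.0, "E": 0.0}
# (7 * 1 + 58 * 2) / (100 * 2) = 123 / 200
print(round(item_difficulty(counts_12, weights_12, 100), 3))  # 0.615, reported as 0.62

# Item 14: a single keyed item, so only response A earns credit.
counts_14 = {"A": 14, "B": 57, "C": 13, "D": 0, "E": 17}
weights_14 = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 0.0}
print(round(item_difficulty(counts_14, weights_14, 100), 2))  # 0.14
```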
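The flagging rule for difficulty can be sketched as a simple threshold check. The function and constant names are hypothetical, and the handling of values exactly equal to 0.95 or 0.20 (not flagged here) is an assumption based on the wording “greater than” and “less than”.

```python
EASY_THRESHOLD = 0.95  # flag when the difficulty is greater than this value
HARD_THRESHOLD = 0.20  # flag when the difficulty is less than this value


def difficulty_flag(difficulty):
    """Return True when the difficulty index falls outside the stated limits."""
    return difficulty > EASY_THRESHOLD or difficulty < HARD_THRESHOLD


print(difficulty_flag(0.62))  # False (item 12 in the example)
print(difficulty_flag(0.14))  # True  (item 14 in the example)
```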
2) Item Discrimination - The value directly beneath the item difficulty is the item discrimination index. It is the correlation between the points awarded on an item and the total test score. This index can have values between -1.00 and 1.00. A negative value for this index means that students who missed the item received total scores higher than the students who answered the item correctly. A positive correlation means that the students answering correctly performed better than the students who missed the item. A value of 0.0 indicates that there was no difference between the two groups. The absolute value of the index is a measure of the strength of the relationship. Values above 0.5 mean that the students who answered the item correctly (or incorrectly, in the case of negative values) frequently had the highest scores on the test. An item’s difficulty can affect the discrimination index. Items which are very easy (or very difficult) will not discriminate very well between high and low scoring groups. On these items, nearly everyone will have gotten the item right (or wrong) regardless of how they performed on the other items on the test. Items which can discriminate well are those which have difficulties between 0.3 and 0.7. A warning flag will appear by the discrimination statistic when the index is below 0.1 on items of moderate difficulty, and for all items when the index is below zero.
The item discrimination index conveys some very useful information. All the values on an achievement test should be positive. If one is not, it may mean that the item: 1) does not measure what the other items in the test are assessing, 2) was poorly worded or ambiguous, or 3) was miskeyed. In any case, the item should be examined. Items with discrimination indices near zero should also be examined. If the difficulty index of the item is not near 1.0 or 0.0, the item may have one of the problems listed above. For instructor-developed tests, most of the items should have discrimination indices above 0.20.
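A minimal sketch of such a correlation, using the ordinary Pearson formula, is given below. The student data are invented for illustration, the function name `pearson` is hypothetical, and two details are assumptions: that the total score includes the item itself, and that an undefined correlation (no variation in either variable) is reported as 0.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: an undefined correlation is shown as 0
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six students: points on one item, and total scores.
item_points = [1, 1, 1, 0, 0, 0]
total_scores = [18, 16, 15, 12, 14, 9]
print(round(pearson(item_points, total_scores), 2))  # 0.81
```

In this invented data set the students who earned the point also had the higher totals, so the index is strongly positive.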
In the example above, item 12 has a discrimination index of 0.32. This indicates that the item is discriminating quite well between low and high scoring students. Item 14 displays a discrimination index value of -0.21. A negative value of this size generally indicates that something is amiss. In this case, it appears that the item was incorrectly keyed.
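The warning criteria for the discrimination index can be sketched as follows. The bulletin does not define “moderate difficulty” for this flag, so the 0.3 to 0.7 range used here is an assumption borrowed from the range it recommends for discriminating items; the function name is hypothetical.

```python
def discrimination_flag(discrimination, difficulty, moderate_range=(0.3, 0.7)):
    """Return True when the discrimination index should be flagged.

    Any negative index is flagged. An index below 0.1 is flagged only
    when the item is of moderate difficulty (assumed range: 0.3 to 0.7).
    """
    if discrimination < 0:
        return True
    low, high = moderate_range
    return low <= difficulty <= high and discrimination < 0.1


print(discrimination_flag(0.32, 0.62))   # False (item 12 in the example)
print(discrimination_flag(-0.21, 0.14))  # True  (item 14 in the example)
```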
3) Number of Omits - Beneath the discrimination index is the number of students who did not answer the item. No points are awarded to a student when an item is omitted. In the example, three students omitted item 12, and no students omitted item 14. A warning flag by the number of omissions indicates that more than five percent of students omitted that item.
Test-wise students will not omit an item on a multiple-choice test unless there is a penalty for wrong answers. When the students are well-informed, there will be very few omissions. The number of omissions can tell you when a test is too long. The number of omissions will rise dramatically at the end of a test when the students don’t have enough time to finish.
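The five percent criterion for the omission flag amounts to a single comparison, sketched below with the example counts and the 100 students stated earlier. The function name `omit_flag` is hypothetical.

```python
def omit_flag(num_omits, num_students, threshold=0.05):
    """Return True when more than five percent of students omitted the item."""
    return num_omits / num_students > threshold


print(omit_flag(3, 100))  # False (item 12: 3% omitted)
print(omit_flag(0, 100))  # False (item 14: no omissions)
print(omit_flag(6, 100))  # True  (hypothetical item with 6% omitted)
```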
4) Response Count - The number of students who selected each response is listed to the right of the discrimination index. For item 12 in the example, seven students selected response “A,” ten chose “B,” fifty-eight selected the highest-weighted response (option “C”), and so on. Response count information is useful for identifying common misconceptions among the students. It is also helpful in identifying distracters