A DESCRIPTION OF TEXAS A&M UNIVERSITY’S TEST SCORE REPORT

 

Measurement and Research Services

Memo No.  18

November 12, 2004

 

OVERVIEW

 

For every test submitted to Measurement and Research Services’ test scoring service, a report on the general characteristics of the test is produced.  This bulletin describes the statistics reported in the test score report and will aid in the interpretation of the test results.  For information on how to submit tests to be scored, see the handout “Test Scoring Service Information Bulletin”.  Information on the data files generated by the scoring program can be found in the handout “Record Formats for Data Files produced from Optically Scanned Sheets”.

 

 

KEY AND OPTION INFORMATION

 

The first pages of the test report describe the information used in scoring the test, and list the reporting and output options requested.  The first page lists general descriptive and test item count information, lists the scoring constants used, and shows which reporting and output options were requested.  The following pages show the scoring key, the scoring weight assigned to each item, the arrangement of items on each test form, and the weights assigned to each option on multiple keyed items.

 

***General Description Information***

     

The first section of the test score report lists the descriptive information collected on the scoring request form.  The same information is listed on the top line of each page of the report.  An example of this section is shown below.

  

 

            GENERAL INFORMATION

 

   Course   :                             TST 123

   Taught by    :                     PROFNAME FM

   Test Number  :                    07

   Date Scored   :                    11/12/04               

 

In the example, the course was TST 123 and the instructor was F.M. Profname.  The test was labeled number 07, and it was scored on November 12, 2004.

 

***Test Item Information***

 

The next section lists the number of items used in determining the total score.  The test used as an example had 20 items, one of which was omitted.  The following information is reported.

 

                 TEST ITEM INFORMATION

 

                  Number of the Last Item  :                                20

                  Number of Items Omitted   :                               1

                  Number of Items Scored    :                              19

 

The last item listed on the answer key was item twenty, and the item listed as omitted was left blank on the answer key.  The number of items scored is the difference between the number of items listed and the number of items omitted.  This information provides a quick check of whether the test was properly scored.

A high number of omitted items usually indicates that something is amiss.

 

***Scoring Information***

 

 

The additive and multiplicative constants used in deriving the test scores are also listed on the first page.  This section will look like this:

 

SCORING FACTORS

 

                                                         Bonus points               0                                                                                             

                                                          Scaling Factor            1.00

 

Bonus points refers to the number of points which will be added to every student’s score.  If no bonus points are to be awarded, the value printed will be “0”.  The  scaling factor is the value by which item scores are multiplied.  If a scaling factor is not specified, a factor of one will be used.  The weights shown in the SCORING WEIGHTS section reflect the scaling factor.  In the example above, no bonus points will be awarded. 

                  

***List of the Options Selected***

 

At the bottom of the first page is a listing of the report and output options that you requested.  An example is shown below.               

 

  

                 OPTIONS REQUESTED:

 

                     Report Options                                                   Output Options       

                

                     Name roster                                                     Test Scores (floppy)      

                    ID/Error Roster             

                    Individual Score Reports  

 

 

Unless you requested a report or output option on the Scoring Request Sheet, it will not be listed in this section and will not be produced.  See the handout entitled “Test Scoring Service Information Bulletin” for a complete list of the options available.        

 

***Answer Key, Item Weight and Form Information***

 

The second page of the report displays the answer key, the scoring weights assigned to each test item, and the item order for all of the alternate forms of the test.  An example of this section is shown below.        


ANSWER KEY, SCORING WEIGHTS AND VALID TEST FORMS

        STANDARD FORM (A)             ALTERNATIVE FORMS

   Item     Key      Weight            B       C       D

     1       C          1             20      11
     2       A          1             19      12
     3       D          1             18      13
     4       E          1             17      14
     5       A          1             16      15
     6       ***MULTI***              15      16
     7       B          1             14      17
     8       B          1             13      18
     9       B          1             12      19
    10       A          1             11      20
    11       E          1             10       1
    12       ***MULTI***               9       2
    13       ***OMIT***                8       3
    14       A          1              7       4
    15       E          1              6       5
    16       B          1              5       6
    17       C          1              4       7
    18       D          1              3       8
    19       D          1              2       9
    20       ***ZERO***                1      10


 


MULT: Multiple Key Item.  Weights are listed below.   

OMIT: Omitted Item  

ZERO: Zero Weighted Item


 

The column labeled “Key” lists the correct answer to each item.  The weight column shows how many points will be added to the total score when each item is answered correctly.  If you did not assign special item weights, the weight assigned to each item will be “1”.  If an item has more than one response for which points are awarded, the word “MULT” will appear in the key and weight columns.  For the test used as an example, multiple weights are shown for items 6 and 12.  Weights for items with multiple responses are listed in the next section of the report.  Items omitted from the test will show the word “OMIT” (see item 13), and items receiving zero weight will show the word “ZERO” (see item 20).

 

The numbers shown in the columns under “ALTERNATIVE FORMS” are the item orders for each form of the test.  In the preceding example, there were two alternative forms (B and C).  The order of the items on Form B was the reverse of the order on Form A (the Standard Form).  The items of the Standard Form were split in half and the two halves exchanged to produce the item order for Form C.  Reading across the line for the first item, the information shown in the example tells us that item 1 on Form A appeared as the last item on Form B and as the eleventh item on Form C.
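The correspondence between forms can be thought of as a lookup table.  The following Python sketch is illustrative only: the names FORM_B, FORM_C and to_standard_order are hypothetical and are not part of the scoring program.  It assumes, as in the example, that each column gives the position at which a Standard Form item appears on the alternate form.

```python
# Position of each Standard Form (A) item on the alternate forms,
# following the example: Form B reverses the order, Form C swaps the halves.
N_ITEMS = 20
FORM_B = {item: N_ITEMS + 1 - item for item in range(1, N_ITEMS + 1)}
FORM_C = {item: item + 10 if item <= 10 else item - 10
          for item in range(1, N_ITEMS + 1)}


def to_standard_order(responses_by_position, form_positions):
    """Re-key a sheet answered in alternate-form order to Form A item numbers.

    responses_by_position maps a position on the alternate form to the
    response marked there; positions left blank are simply absent.
    """
    return {
        item: responses_by_position[position]
        for item, position in form_positions.items()
        if position in responses_by_position
    }


# A Form B sheet on which positions 20 and 19 were answered "C" and "A".
sheet = {20: "C", 19: "A"}
print(to_standard_order(sheet, FORM_B))  # {1: 'C', 2: 'A'}
```

Once a sheet has been re-keyed this way, every form can be scored against the single Standard Form key.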

 

***Multiple Key Items***

 

If your test had items which had multiple keys, the next section of your report will appear something like this:

 

 

 

MULTIPLE KEYED ITEM WEIGHTS

   Item        A        B        C        D        E

     6       0.00     1.00     0.00     0.00     1.00
    12       1.00     0.00     2.00     0.00     0.00

 

 

 

 

Recall that items 6 and 12 in the example had more than one response which earned credit.  This section shows the points awarded for each response to these items.  On item 12, a student selecting response “A” will be awarded one point, and a student selecting response “C” will receive two points.  No other answer is awarded credit.
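The way option weights turn into a total score can be sketched in a few lines of Python.  This is a minimal illustration, not the scoring program itself: OPTION_WEIGHTS, item_points and total_score are hypothetical names, only a few items from the example key are included, and the weights are assumed to already reflect the scaling factor.

```python
# Credit earned by each response, taken from the example key.
# Single-keyed items have one credited option; omitted items (such as
# item 13) have no entry and therefore earn nothing.
OPTION_WEIGHTS = {
    6: {"B": 1.0, "E": 1.0},
    11: {"E": 1.0},
    12: {"A": 1.0, "C": 2.0},
    14: {"A": 1.0},
}


def item_points(item, response):
    """Points awarded for one response; uncredited responses earn 0."""
    return OPTION_WEIGHTS.get(item, {}).get(response, 0.0)


def total_score(responses, bonus_points=0.0):
    """Sum the credit over all answered items and add any bonus points."""
    return bonus_points + sum(item_points(i, r) for i, r in responses.items())


student = {6: "E", 11: "E", 12: "C", 13: "A", 14: "B"}
print(total_score(student))  # 4.0
```

The student earns one point each on items 6 and 11, two points on item 12, and nothing on the omitted item 13 or the incorrectly answered item 14.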


 

ITEM ANALYSIS

 

 

Following the key and option information is the item analysis.  It contains information about each test item.  Computed for each item is its 1) difficulty, 2) discrimination, 3) number of omissions, 4) number of responses to each option, and 5) discrimination for each response option.  The item analysis information will look like this:

  

Item 12       Diff.=   0.62     Response      A*      B       C*      D       E
              Disc.=   0.32     Count          7     10      58       8      14
              Omits=      3     Disc (R)     .18    -.27     .40    -.09    -.13

Item 13       ***Item Omitted***

Item 14    => Diff.=   0.14     Response      A*      B       C       D       E
           => Disc.=  -0.21     Count         14     57      13       0      17
              Omits=      0     Disc (R)    -.21<    .23<   -.04     .00    -.01

 

The arrows flag item statistics which suggest that the item may not be working properly.  All item statistics and the criteria for warning flags are described below.

 

1)  Item Difficulty - The first value shown for each item is the item’s difficulty (Diff).  It is the sum of all of the points awarded on the item divided by the maximum number of points possible.  The value of this statistic will be between 0.00 (indicating none of the students received credit for their answer) and 1.00 (meaning that all the students received maximum credit).  The term “item difficulty” is a misnomer.  A better term would be “easiness” because the higher the value of the index, the easier the item was.  In the example above, the difficulty index for item 14 was 0.14.  Since this item had only one correct answer, this means that 14% of the students answered the item correctly.  The value can be verified by dividing the number of students who chose response “A” on item 14 by the total number of students who took the test (100).

 

Computing the item difficulty for item 12, a multiple key item, is slightly more complicated.  Recall that the item has two responses for which credit is given (A and C).  If both responses were worth the same number of points (which they are not), then the item difficulty could be calculated by counting the students who received credit on the item and dividing by the total number of students.  To compute the difficulty index for an item with different response weights, multiply the number of students selecting each option by the response weight, sum the five products, and divide by the maximum number of points possible (the number of students times the maximum credit).

 

One may ask what the range of item difficulties should be on a good test.  The answer is that it depends on what you wish to know.  If the purpose of a test is to determine whether the students have mastered a topic area, one would expect high difficulty values.  If the purpose of a test is to discriminate between different levels of achievement, items with difficulty values between 0.3 and 0.7 will be most effective.  Warning flags will appear next to the item difficulty index when the item is extremely easy (the difficulty is greater than 0.95), or when it is extremely difficult (the index is less than 0.20).
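The difficulty calculation described above can be checked against the example with a short Python sketch.  The function name item_difficulty is hypothetical, and the counts and weights are those shown for items 12 and 14 in the example.

```python
def item_difficulty(counts, weights, n_students):
    """Points awarded on an item divided by the maximum points possible."""
    max_credit = max(weights.values())
    awarded = sum(n * weights.get(option, 0.0) for option, n in counts.items())
    return awarded / (n_students * max_credit)


# Item 12: responses A (1 point) and C (2 points) earn credit.
item12 = item_difficulty({"A": 7, "B": 10, "C": 58, "D": 8, "E": 14},
                         {"A": 1.0, "C": 2.0}, n_students=100)
# Item 14: only response A earns credit.
item14 = item_difficulty({"A": 14, "B": 57, "C": 13, "D": 0, "E": 17},
                         {"A": 1.0}, n_students=100)
print(item12, item14)  # 0.615 0.14
```

For item 12 the students earned 7 × 1 + 58 × 2 = 123 of a possible 200 points, which the report rounds to 0.62.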

 

2)  Item Discrimination - The value directly beneath the item difficulty is the item discrimination index.  It is the correlation between the points awarded on an item and the total test score.  This index can have values between -1.00 and 1.00.  A negative value for this index means that students who missed the item tended to receive higher total scores than the students who answered the item correctly.  A positive correlation means that the students answering correctly tended to perform better than the students who missed the item.  A value of 0.0 indicates that there was no difference between the two groups.  The absolute value of the index is a measure of the strength of the relationship.  Values above 0.5 mean that the students who answered the item correctly (or incorrectly, in the case of negative values) frequently had the highest scores on the test.  An item’s difficulty can affect the discrimination index.  Items which are very easy (or very difficult) will not discriminate very well between high and low scoring groups.  On these items, nearly everyone will have gotten the item right (or wrong) regardless of how they performed on the other items on the test.  Items which can discriminate well are those which have difficulties between 0.3 and 0.7.  A warning flag will appear by the discrimination statistic when the index is below 0.1 on items of moderate difficulty, and for all items when the index is below zero.

 

The item discrimination index conveys some very useful information.  All the values on an achievement test should be positive.  If one is not, it may mean that the item: 1) does not measure what the other items in the test are assessing, 2) was poorly worded or ambiguous, or 3) was miskeyed.  In any case, the item should be examined.  Items with discrimination indices near zero should also be examined.  If the difficulty index of the item is not near 1.0 or 0.0, the item may have one of the problems listed above.  For instructor-developed tests, most of the items should have discrimination indices above 0.20.

 

In the example above, item 12 has a discrimination index of 0.32.  This indicates that the item is discriminating quite well between low and high scoring students.  Item 14 displays a discrimination index value of -0.21.  A negative value of this size generally indicates that something is amiss.  In this case, it appears that the item was incorrectly keyed.    
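Since the discrimination index is described as a correlation between item points and total score, it can be sketched as an ordinary Pearson correlation.  The Python below is an illustration under assumptions: the function name pearson and the six-student data set are hypothetical, and the bulletin does not say whether the program removes the item from the total score before correlating.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either list has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Hypothetical data for six students: points on one item and total scores.
item_points = [1, 1, 1, 0, 1, 0]
total_scores = [19, 17, 15, 12, 11, 8]
print(round(pearson(item_points, total_scores), 2))
```

In this made-up data the students who earned the point mostly have the higher totals, so the index comes out positive; swapping the 1s and 0s would reverse its sign.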

 

3) Number of Omits - Beneath the discrimination index is the number of students that did not answer the item.  No points are awarded to a student when an item is omitted.  In the example, three students omitted item 12, and no students omitted item 14.  A warning flag by the number of omissions indicates that more than five percent of students omitted that item. 

 

Test-wise students will not omit an item on a multiple-choice test unless there is a penalty for wrong answers.  When the students are well-informed, there will be very few omissions.  The number of omissions can tell you when a test is too long.  The number of omissions will rise dramatically at the end of a test when the students don’t have enough time to finish. 

 

4) Response Count - The number of students who selected each response is listed to the right of the discrimination index.  For item 12 in the example, seven students selected response “A,” ten chose “B,” fifty-eight selected option “C” (the response earning the most credit), and so on.  Response count information is useful for identifying common misconceptions among the students.  It is also helpful in identifying distracters (incorrect response options) which are incorrect or not feasible.  In the example, no one selected response “D” on item 14.  This does not necessarily mean that the option needs to be revised, but it does serve to call notice to a possible problem.

 

5) Response Discrimination - Below each response count is the correlation between selecting each response and the total score on the exam.  This index is very similar to the item discrimination index, and may be interpreted in the same way.  A negative value for this index means that students receiving a low score tended to select this option more than higher scoring students.  Conversely, a positive value indicates that higher scoring students tended to select the response more often.  Ideally, the correct responses should have positive values, and incorrect responses should have negative values.  A warning flag will mark the response discrimination values which do not meet this criterion.

 

In the example, the correct answers in item 12 have positive response discrimination values (0.18 and 0.40).  The incorrect responses all have negative values (-0.27, -0.09, and -0.13).  This suggests that the item was behaving as intended.  In comparison, item 14 shows a negative response discrimination index for the correct answer, and shows a positive value for one of the distracters.  This usually indicates that the item was miskeyed.  The response discrimination index for the correct response option will be identical to the item discrimination index for items with only one answer.  The item discrimination index for a question with more than one answer is a function of the weight given to each response and the value of each response discrimination index.
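The warning-flag criteria given for the five item statistics can be gathered into one place.  The Python sketch below is a reading of those criteria, not the program’s actual logic: the function name item_flags is hypothetical, “moderate difficulty” is assumed to mean 0.30 to 0.70, and a response discrimination of exactly zero is assumed to go unflagged (as response “D” of item 14 does in the example).

```python
def item_flags(difficulty, discrimination, omits, n_students,
               response_disc, credited, moderate=(0.30, 0.70)):
    """Return the names of the statistics that would receive a warning flag."""
    flags = []
    if difficulty > 0.95 or difficulty < 0.20:
        flags.append("difficulty")
    is_moderate = moderate[0] <= difficulty <= moderate[1]
    if discrimination < 0 or (is_moderate and discrimination < 0.10):
        flags.append("discrimination")
    if omits > 0.05 * n_students:
        flags.append("omits")
    for option, value in response_disc.items():
        # Credited responses should correlate positively with the total
        # score and uncredited ones negatively; zero is left unflagged.
        is_credited = option in credited
        if (is_credited and value < 0) or (not is_credited and value > 0):
            flags.append("response " + option)
    return flags


item14 = item_flags(0.14, -0.21, omits=0, n_students=100,
                    response_disc={"A": -0.21, "B": 0.23, "C": -0.04,
                                   "D": 0.00, "E": -0.01},
                    credited={"A"})
print(item14)
# ['difficulty', 'discrimination', 'response A', 'response B']
```

The four flags returned for item 14 match the arrows and “<” marks in the example, while the same call with item 12’s statistics returns an empty list.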

 

 

 

DISTRIBUTION OF STUDENT SCORES

 

Each test report will contain two pages that describe how the test scores were distributed among the examinees.  The first page presents the information in a table.  It lists the frequency, cumulative frequency, percentile rank, standardized score and percentage correct for each possible score between the highest and the lowest earned scores.  An example of the report and a brief description of the statistics contained within the table are listed below.               

 

DISTRIBUTION OF STUDENT SCORES

                         CUMULATIVE    PERCENT    STANDARD    PERCENT
   SCORE    FREQUENCY    FREQUENCY      RANK       SCORE      CORRECT

     20          3          100          99          70         100
     19          4           97          95          67          95
     18          6           93          90          63          90
     17          7           87          84          60          85
     16         11           80          75          57          80
     15         13           69          63          54          75
     14         13           56          50          50          70
     13         10           43          38          47          65
     12         11           33          28          44          60
     11          6           22          19          41          55
     10          7           16          13          37          50
      9          5            9           7          34          45
      8          2            4           3          31          40
      7          2            2           1          27          35

 

 

1) Score - This is the total test score computed using the scoring weights described in the first two pages.  The scores shown include all the values between the highest and lowest obtained scores.  In the example, the scores ranged from a high of 20 to a low of 7.

 

2) Frequency - The number of students receiving each score.  The most frequently received scores on the test in the example were 14 and 15.  Thirteen students received each score.

 

3) Cumulative Frequency - The number of students who scored at or below a given score.  In the table above, we can see that 80 students earned a score of 16 or below.

 

4) Percentile Rank - The percentage of persons scoring below a specific score, counting half of the persons who received exactly that score.  It is obtained by totaling the frequencies of all the scores below the given score, adding half the frequency of students receiving the score, dividing by the total number of examinees, and multiplying by one hundred.  In the example, the percentile rank for a score of 14 was 50.  This means that a student with a score of 14 stands at the middle of the group of students taking the test.
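The percentile rank procedure can be verified against the example table with a short Python sketch.  The names FREQ and percentile_rank are hypothetical, and rounding halves upward is an assumption chosen because it reproduces the ranks printed in the example.

```python
from math import floor

# Frequencies from the example distribution (score -> number of students).
FREQ = {20: 3, 19: 4, 18: 6, 17: 7, 16: 11, 15: 13, 14: 13,
        13: 10, 12: 11, 11: 6, 10: 7, 9: 5, 8: 2, 7: 2}


def percentile_rank(score, freq):
    """Percent of examinees below the score plus half of those at the score."""
    n = sum(freq.values())
    below = sum(f for s, f in freq.items() if s < score)
    rank = 100 * (below + freq.get(score, 0) / 2) / n
    return floor(rank + 0.5)  # round halves up (assumed from the example)


print(percentile_rank(14, FREQ), percentile_rank(20, FREQ))  # 50 99
```

For a score of 14, 43 students scored lower and 13 received that score, so the rank is (43 + 6.5) / 100 = 49.5%, reported as 50.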

 

5) Standard Score - This is a transformation of the test scores to create scores which have a mean of 50 and a standard deviation of 10.  Commonly referred to as T-scores, they are computed by a) subtracting the mean from the score, b) dividing by the standard deviation, c) multiplying the result by 10, and d) adding 50.  Standard scores are often used to give equal weight to several tests of varying length and difficulty before combining them to compute a final score.  The example shows that a score of 17 is equivalent to a standard score of 60.  Since the standard deviation of these scores is known to be 10, we know that this score is one standard deviation higher than the mean.
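The four steps of the standard score calculation translate directly into code.  In this Python sketch the function name standard_score is hypothetical, the mean and standard deviation are the values reported for the example test, and rounding to a whole number is an assumption based on how the scores are printed.

```python
from math import floor


def standard_score(raw, mean, std_dev):
    """T-score: rescale so the mean is 50 and the standard deviation is 10."""
    t = 10 * (raw - mean) / std_dev + 50
    return floor(t + 0.5)  # reported as a whole number (assumed)


# Mean and standard deviation from the example's summary statistics.
MEAN, STD_DEV = 13.89, 3.06
print([standard_score(s, MEAN, STD_DEV) for s in (20, 17, 14, 7)])
# [70, 60, 50, 27]
```

A raw score of 17 is (17 - 13.89) / 3.06, or about 1.02 standard deviations above the mean, which gives the standard score of 60 shown in the table.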

 

6) Percent Correct - This is a person’s score divided by the maximum possible score and multiplied by one hundred.  For a test with no weighted items, it is the percentage of items answered correctly.  In the example, a score of 10 is equivalent to a percent score of 50.

 

The second page describing the distribution of scores shows graphically the spread of the scores.  For the data in the preceding example, the display would look like this:

 

 

                DISTRIBUTION OF STUDENT SCORES

               NUMBER
   SCORE     OF STUDENTS      TALLIES             NOTE: Each tally mark represents
                                                        1 student(s)

     20        3  (  3%)      III
     19        4  (  4%)      IIII
     18        6  (  6%)      IIIIII
     17        7  (  7%)      IIIIIII
     16       11  ( 11%)      IIIIIIIIIII
     15       13  ( 13%)      IIIIIIIIIIIII
     14       13  ( 13%)      IIIIIIIIIIIII
     13       10  ( 10%)      IIIIIIIIII
     12       11  ( 11%)      IIIIIIIIIII
     11        6  (  6%)      IIIIII
     10        7  (  7%)      IIIIIII
      9        5  (  5%)      IIIII
      8        2  (  2%)      II
      7        2  (  2%)      II
             ___
             100

 

 

The values listed under “NUMBER OF STUDENTS” are the same as the frequencies shown on the prior page.  The value in parentheses following each frequency count is the percentage of the students who received the score.  A tally mark will be recorded in the histogram to the right of the scores to represent the number of students earning each score.  If there are more than 50 possible different scores, intervals of scores will be represented instead of individual scores.  If more than 60 students receive the same score or fall into the same score interval, then the value represented by each tally mark will be larger than one and its value will be reported above the histogram.

 

If one of the test forms was scored incorrectly, a quick glance at the histogram will identify the problem.  When test forms are scored with an incorrect key, the resulting scores will be near the chance level.  This will be reflected in the histogram as a large cluster of scores at the low end of the distribution.

 

 

ITEM AND TEST CHARACTERISTICS

 

The page describing item and test characteristics contains three sections: 1) the distribution of the item difficulty indices, 2) the distribution of the item discrimination indices, and 3) a summary of the test’s statistics.  Each section is described below.

 

 

 

 

 

***Distribution of Item Difficulties***

 

The top third of this page will contain a display of the distribution of the item difficulty indices.  An example is shown below.

 

              DISTRIBUTION OF ITEM DIFFICULTIES

                      NUMBER
     RANGES          OF ITEMS       TALLIES       NOTE: Each tally mark represents
                                                        1 item(s)

    .90 to 1.00      4  ( 21%)      IIII
    .80 to  .89      3  ( 16%)      III
    .70 to  .79      2  ( 11%)      II
    .60 to  .69      4  ( 21%)      IIII
    .50 to  .59      2  ( 11%)      II
    .40 to  .49      1  (  5%)      I
    .30 to  .39      2  ( 11%)      II
    .20 to  .29      0  (  0%)
    .10 to  .19      1  (  5%)      I
    .00 to  .09      0  (  0%)

                    19

 

 

 

 

 

 

 

 

 

 

This display shows how many items had item difficulty values within specific ranges.  It is useful for quickly examining the difficulty of the items.  In the example, one item is obviously more difficult than the others, and seven of the items were answered correctly by over eighty percent of the students.

 

 

***Distribution of Item Discrimination Indices***

 

The middle section of the page shows graphically the distribution of the item discrimination indices for the test items.  An example of the histogram is displayed below.

 

           DISTRIBUTION OF ITEM DISCRIMINATION INDICES

                      NUMBER
     RANGES          OF ITEMS       TALLIES       NOTE: Each tally mark represents
                                                        1 item(s)

    .90 to 1.00      0  (  0%)
    .80 to  .89      0  (  0%)
    .70 to  .79      0  (  0%)
    .60 to  .69      0  (  0%)
    .50 to  .59      1  (  5%)      I
    .40 to  .49      2  ( 11%)      II
    .30 to  .39      4  ( 21%)      IIII
    .20 to  .29      6  ( 32%)      IIIIII
    .10 to  .19      3  ( 16%)      III
    .00 to  .09      2  ( 11%)      II
  -1.00 to -.01      1  (  5%)      I

                    19

 

 

 

 

This histogram has a single category for all negative values.  Negative item discrimination indices mean that students who answer the item correctly receive lower total scores than do the students who miss the item.  Usually there is something wrong with items that show negative discrimination.  When one is found, the instructor should look closely at the item analysis information as well as the item itself.  The interpretation of the item discrimination index is discussed more fully in the section of this bulletin covering item analysis.

 

***Summary of Test Statistics***

 

The bottom portion of the page lists thirteen statistics describing the test.  The table contains two sets of values.  The values in the left column are expressed in test score units.  The values in the column on the right are expressed as percentages.  An example of the table is presented below.

 

SUMMARY OF TEST STATISTICS

   TEST STATISTIC                       VALUE      PERCENT

   MEDIAN                               14.04        70.20
   MEAN OF THE TEST                     13.89        69.45
   STANDARD DEVIATION                    3.06        15.31
   STANDARD ERROR MEASURE                2.09        10.45
   RELIABILITY (Coefficient Alpha)       0.71
   MEAN DIFFICULTY                       0.68
   MEAN DISCRIMINATION                   0.26

   HIGHEST SCORE                           20       100.00
   LOWEST SCORE                             7        35.00
   MAXIMUM SCORE                           20       100.00

   NUMBER OF FORMS                          3
   NUMBER OF SCORED ITEMS                  19
   NUMBER OF STUDENTS                     100

 

 

 

 

Each of the statistics listed in the summary table is described below.

 

1) Median - The score at which half the students fall above and half below.  It is a useful estimate of the typical score, especially when there are extremely high or extremely low scores.  The median in the example is 14.04, which translates into a percent correct of 70.20.
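The bulletin does not say how the program arrives at a fractional median such as 14.04.  One common method that reproduces this value is to interpolate within the score interval containing the middle examinee; the Python sketch below assumes that method, and the names FREQ and interpolated_median are hypothetical.

```python
# Frequencies from the example distribution (score -> number of students).
FREQ = {20: 3, 19: 4, 18: 6, 17: 7, 16: 11, 15: 13, 14: 13,
        13: 10, 12: 11, 11: 6, 10: 7, 9: 5, 8: 2, 7: 2}


def interpolated_median(freq):
    """Median of whole-number scores, interpolated within the score interval."""
    half = sum(freq.values()) / 2
    below = 0
    for score in sorted(freq):
        if below + freq[score] >= half:
            # Treat the score as the interval (score - 0.5, score + 0.5).
            return (score - 0.5) + (half - below) / freq[score]
        below += freq[score]


print(round(interpolated_median(FREQ), 2))  # 14.04
```

Here 43 students scored below 14 and 13 students scored exactly 14, so the median lies 7/13 of the way through the interval from 13.5 to 14.5.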

 

2) Mean - The arithmetic average of all the test scores.  The most commonly used estimate of a typical score, the mean is the sum of all the scores divided by the number of examinees.  In the example, the mean of the raw scores is 13.89, and the mean of the scores in percent units is 69.45.  The mean and the median will be very similar if the scores are symmetrically distributed.  When there are several extreme scores, the median may be a better estimate of the typical score.

 

3) Standard Deviation - This is a measure of the variability or deviation of the scores from the mean.  The value is the square root of the average squared deviation from the mean.  The more the scores are spread out, the higher the standard deviation will be.  The standard deviation of the raw scores in the summary above is 3.06.
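The mean and standard deviation reported for the example can be reproduced from the score distribution.  In this Python sketch the names FREQ and mean_and_std_dev are hypothetical; the squared deviations are averaged over all examinees, following the definition given above.

```python
from math import sqrt

# Frequencies from the example distribution (score -> number of students).
FREQ = {20: 3, 19: 4, 18: 6, 17: 7, 16: 11, 15: 13, 14: 13,
        13: 10, 12: 11, 11: 6, 10: 7, 9: 5, 8: 2, 7: 2}


def mean_and_std_dev(freq):
    """Mean and standard deviation of a frequency distribution."""
    n = sum(freq.values())
    mean = sum(score * f for score, f in freq.items()) / n
    variance = sum(f * (score - mean) ** 2 for score, f in freq.items()) / n
    return mean, sqrt(variance)


mean, std_dev = mean_and_std_dev(FREQ)
print(round(mean, 2), round(std_dev, 2))  # 13.89 3.06
```

Dividing the squared deviations by the number of examinees, rather than by one less than that number, is what yields 3.06 for these data.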

 

4) Standard Error Measure - This statistic is a measure of the stability of a test score.  The more stable a test score is, the lower the standard error of measurement (SEM) will be.  The value may be computed by multiplying the square root of one minus the square of the reliability (discussed in the next section) by the standard deviation. 

The concept behind the standard error of measurement involves the idea of repeated testing of an individual with many tests of similar content and difficulty.  Because of slight differences in test content, in testing conditions and in the individual’s responses, the scores from parallel forms of the test will not be the same.  Instead, there will be a distribution of scores, and the mean of the distribution is generally regarded as the best estimate of a person’s ability.  The standard error of measurement is the standard deviation of this distribution.

 

The value computed by the scoring program uses information from only one administration of a test.  By assuming that the distribution of scores of parallel tests would be normally distributed, the standard error of measurement can be estimated.  In the example, the SEM was 2.09.  This may be interpreted to mean that the score which best represents a student’s true capability will be within 2.09 of their raw score about 68% of the time.

 

5) Reliability - This coefficient is an estimate of the extent to which each test item measures what the entire test is measuring.  The statistic computed is referred to as coefficient alpha, and it is an index of a test’s internal consistency.  The coefficient will have a value between 0.00 and 1.00.  A coefficient of 1.00 means that each item measures exactly the same ability as does the total test score.  A coefficient of 0.00 means that the item scores are unrelated to the total test score.  Values between 0.60 and 0.80 are typical for classroom tests.  The reliability in the example, 0.71, falls in this range, which is about average for the tests which are scored by the scoring service at Texas A&M.

 

Reliability is influenced by several test characteristics.  Most important is the similarity of the content of the items.  A test which attempts to assess many different abilities will probably have a lower reliability than a test which attempts to measure only one ability.  Also of importance is the length of the test.  All other things being equal, longer tests are more reliable than shorter tests.  A third influence on reliability is difficulty.  Tests which are very easy or very difficult tend to be less reliable than those of moderate difficulty.

 

The reliability of a test may be increased by using the information provided in the item analysis.  The most important statistic to examine is the index of item discrimination.  A test’s reliability will be increased by deleting items with negative discrimination indices (even though the test will be shortened).  It is also likely that eliminating very easy or very hard items will increase the overall reliability of the test.  However, when considering deleting any item from a test, remember that the item and test statistics should only serve as a guide and that small increases in reliability are less important than representative coverage of the topic area.
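For readers who want to see how coefficient alpha is obtained, the standard textbook formula compares the sum of the item variances with the variance of the total scores.  The Python sketch below uses that formula; the function name coefficient_alpha and the five-student data set are hypothetical, and the bulletin does not document the program’s exact computation.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha from a list of per-student lists of item points."""
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    # Assumes at least two items and some variation in the total scores.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical item points for five students on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(scores), 2))  # 0.75
```

When every item ranks the students identically the coefficient reaches 1.00, and it falls as the items agree less with one another.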

 

6) Mean Difficulty - The mean difficulty is the average of the item difficulties.  This value gives a quick estimate of the overall difficulty of the test.  Multiplied by one hundred, it approximates the average of the test in percent scores.  In the example, the mean item difficulty was 0.68.  This means that the average student was awarded about 68% of the points possible, or that students answered a typical item correctly about 68% of the time.

 

7) Mean Discrimination - This is the average of the item discrimination values.  It serves to provide a quick estimate of the items’ ability to distinguish between high and low scoring students.  The mean discrimination for the test in the example was 0.26.  This is a typical value for a classroom test.

 

8) Highest Score - This is the highest score earned by a student on the test.  The highest score for the test described above was 20.

 

9)  Lowest Score - This is the lowest score attained by a student on the exam.  The lowest score for the test shown in the example was 7 ( 35% correct ).

 

10) Maximum Score - This is the highest possible score that could be earned on the test.  It is the basis for all percent correct computations.  In the example, the highest possible was 20.

 

 

 

11) Number of Forms - This is the number of different forms of the test which were graded together in the analysis.  It will correspond to the number of item orders shown on the second page of the report.  In the example, there were three forms used - the standard form and two alternate forms. 

 

12) Number of Items Scored - The number of items used in determining the final score is listed here.  It will be identical to the number of items scored listed on the first page of the report.  There were 19 items used in the test in the example.

 

13) Number of Students - The final value listed is the number of tests scored.  In the example, one hundred tests were analyzed.

 

***Test Form Statistics***

 

 

Test reports for tests that use more than one form will include a page that shows the number of students who took each form and the mean expressed in raw scores and percentages.  An example of this page is shown below.

 

 

 

          STATISTICS FOR THE DIFFERENT TEST FORMS

                     NUMBER OF
                     STUDENTS       MEAN      PERCENT

   TEST FORM A          31         14.07       70.35
   TEST FORM B          34          5.37       26.85
   TEST FORM C          35         14.43       72.15

 

****The means of Forms A and B are significantly different****

 

****The means of Forms B and C are significantly different****

 

 

This analysis is important because it enables you to check the average score for each form.  Assuming that the forms are distributed randomly throughout the class, the mean scores of the forms should be comparable.  If the mean on a form is significantly different from the other form(s), it may mean that the correspondence of the items between the standard form and the form in question is incorrect.  In the example, the means for Forms A and C are about 70% and 72% correct, while the mean for Form B was only 27%.  The mean for Form B is typical of what happens when a form is scored incorrectly; the average score will be near the chance level of success.  If this occurs, refer back to the second page of the report where the correspondence between the forms is shown and recheck the item orders.

 

The scoring program will automatically compare the means of each of the forms to each of the other forms.  If there are any statistically significant differences found, a warning will be printed.  In the example, the mean of Form B was found to be markedly different from the means of Forms A and C.
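As a rough illustration of this comparison, the sketch below summarizes each form and flags pairs of forms whose means differ sharply.  The memo does not say which statistical test the scoring program uses; Welch's t statistic and the cutoff of 2.0 are assumptions made for illustration, as are the function names and the sample scores.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

MAX_SCORE = 20  # maximum possible score, consistent with the example table


def form_summary(scores_by_form, max_score=MAX_SCORE):
    """Return {form: (number of students, mean raw score, mean percent)}."""
    summary = {}
    for form, scores in scores_by_form.items():
        m = mean(scores)
        summary[form] = (len(scores), m, 100 * m / max_score)
    return summary


def welch_t(a, b):
    """Welch's t statistic for two independent samples (an assumed test)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def differing_pairs(scores_by_form, cutoff=2.0):
    """Return the pairs of forms whose |t| exceeds the (assumed) cutoff."""
    flagged = []
    for (fa, a), (fb, b) in combinations(sorted(scores_by_form.items()), 2):
        if abs(welch_t(a, b)) > cutoff:
            flagged.append((fa, fb))
    return flagged


# Hypothetical raw scores, not the data behind the memo's table.
scores = {"A": [14, 15, 13, 14], "B": [5, 6, 5, 6], "C": [15, 14, 14, 15]}
for fa, fb in differing_pairs(scores):
    print(f"The means of Forms {fa} and {fb} differ markedly")
```

The percent column follows directly from the mean: with a maximum possible score of 20, a mean of 14.07 corresponds to 70.35 percent.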

 

 

                           

                          SCORE ROSTERS AND INDIVIDUAL SCORE REPORTS

 

There are five different rosters which can be reported.  They are the 1) name roster, 2) ID roster, 3) name/error roster, 4) ID/error roster, and 5) individual score report.  Each roster is described in the following section.  Not all of the rosters are produced for every report.  The rosters that were selected for your report are listed on the first page of each test score report.

 

***Name Roster***

 

 

The name roster lists score information alphabetically by last name.  An example of the roster is shown below.

                                 NAME SCORE ROSTER

                                                       PERCENTILE   STDRD   PERCENT
NAME                ID NUMBER   FLAGS   FORM   SCORE   RANK         SCORE   CORRECT

BALLPLAYER CASH     ?           *O*     B      7       1            27      35
BRASSY LOUDEN       123571113           A?     10      13           37      50
CIRRIUS UKANT B     421246816           C      13      38           47      65
FLUBRANENS ROCK     4873256.31          B?     18      90           63      90
OILMONEY HOUSTON    543219876           C      15      63           54      75
RANGERIDER BILLY    409845053           A      14      50           50      70
THANT MILLIE        456123789   *M*     B      12      28           44      60

MEAN                                           12.71                        63.57
STD DEV                                        2.79                         13.95

7 STUDENTS

 

Listed for each student on this roster are the name, ID number, warning flags (if any), test form, raw score, percentile rank, standard score, and percent correct.  Each statistic was described earlier in the section on the distribution of student scores.  When a student fails to enter a name or ID number, a question mark will be printed in its place.
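The percent correct column and the MEAN line of the example roster can be reproduced from the raw scores.  The sketch below assumes a maximum possible score of 20, which is consistent with the example (a raw score of 7 is reported as 35 percent); the names used are illustrative only.

```python
from statistics import mean

MAX_SCORE = 20  # assumed maximum possible score for the example test

# Raw scores of the seven students on the example name roster.
raw_scores = [7, 10, 13, 18, 15, 14, 12]


def percent_correct(score, max_score=MAX_SCORE):
    """Convert a raw score to a percentage of the maximum possible score."""
    return 100 * score / max_score


percents = [percent_correct(s) for s in raw_scores]
print(f"MEAN {mean(raw_scores):.2f} {mean(percents):.2f}")  # MEAN 12.71 63.57
```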

 

Warning flags are used to notify the instructor of conditions that might affect a student’s score.  A flag will appear for two reasons: 1) an O will be printed when the student omits more than 5 percent of the items, or 2) an M will appear when a student has more than two multiple-mark responses.  In addition, a question mark will be printed next to the test form when a student does not indicate which test form was taken on a test with multiple forms (the letter indicates the key used in scoring the student’s test).
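The two flag rules can be written as a small function.  This is only a sketch: the function name and arguments are hypothetical, and the second rule is read here as "more than two items with multiple marks."

```python
def warning_flags(n_items, n_omitted, n_multiple_marks):
    """Return the warning flags for one answer sheet.

    "O": more than 5 percent of the items were omitted.
    "M": more than two items had multiple marks (assumed reading of the rule).
    """
    flags = []
    # Integer form of n_omitted / n_items > 5 percent, avoiding float rounding.
    if n_omitted * 100 > 5 * n_items:
        flags.append("O")
    if n_multiple_marks > 2:
        flags.append("M")
    return flags


print(warning_flags(20, 3, 0))  # 3 of 20 items omitted (15 percent)
```

On a 20-item test, a single omitted item is exactly 5 percent and so does not trigger the O flag; two omitted items do.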

 

 

***ID Roster***

 

The ID roster is identical to the name roster except that the student names and warning flags are not included.  An example is shown below.

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

 

ID SCORE ROSTER
SECTION 500

                     PERCENTILE   STANDARD   PERCENT
ID NUMBER    SCORE   RANK         SCORE      CORRECT

235051890    15      63           54         75
360983706    15      63           54         75
432112345    16      75           57         80
538626345    13      38           47         65
543626835    19      95           70         95

 

The identification numbers are listed in ascending order.  When an ID was not entered or the ID was invalid, a question mark will be shown under “ID NUMBER.”

 

The roster shown above is an example of the report when analyses by section are requested with the ID roster.  The section number, in this case 500, is listed at the top of each section roster.  A roster for students who did not fill in a section number or who supplied an invalid number will also be printed.  Section summaries and test flags are not printed on ID or ID/Error rosters.

 

 

***Name/Error Roster***

 

The name/error roster is an alphabetical listing which includes student name, ID number, warning flags, test form, score, percent correct, and a listing of all the student’s incorrect responses.  An example of this roster is printed below.

  

                         NAME/ERROR ROSTER

                                                     PERCENT   OMITTED OR
NAME              ID NUMBER   FLAGS   FORM   SCORE   CORRECT   INCORRECT ITEMS

?                 436925814           C      18      90        8A 14B
AGGIE A N M       369258147   *M*     B      14      70        2C 8? 10? 12A 14C 18C
BALOO HULL A      581470369   *O*     B      9       45        2B 8E 10C 11 15C
                                                               16D 17 18A 20
BAUM ADAM         ?                   C      20      100       GOT ALL ITEMS CORRECT
TANLINE MAGGIE    325476981           A      17      85        2C 11B 19
TERIST MILLER     357913579           A      17      85        2C 11B 19

 

 

 

The error listing shows the item number and response for each item on which the student did not receive full credit.  In the example, Miller Terist chose response “C” on item 2.  The correct answer was “A,” and he received no points for his answer.  He also did not answer item 19.  This is indicated by the blank following the item number.  On items with several weighted responses, only the answers which give the maximum number of points for the item are considered correct.  Responses which are awarded partial credit will be listed in the error listing.

 

Question marks appear in three places on this roster.  When a name or ID number is not entered, a question mark will be reported.  The question mark in the error listing for A.N.M.  Aggie means that the student selected more than one option as a response to the item.  The program will score such items as incorrect and award no points.
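The conventions of the error listing (a letter for the response chosen, a question mark for multiple marks, and a blank for an omitted item) can be decoded mechanically.  The sketch below is illustrative only; the function name and the status wording are hypothetical and not part of the scoring program.

```python
import re

# An entry is an item number followed by an optional response letter or "?".
TOKEN = re.compile(r"(\d+)([A-Z?]?)")


def parse_error_listing(listing):
    """Split an error listing such as "2C 8? 19" into (item, status) pairs."""
    if listing.strip() == "GOT ALL ITEMS CORRECT":
        return []
    entries = []
    for token in listing.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized entry: {token!r}")
        item, response = int(match.group(1)), match.group(2)
        if response == "":
            status = "omitted"
        elif response == "?":
            status = "multiple marks"
        else:
            status = f"answered {response}"
        entries.append((item, status))
    return entries


print(parse_error_listing("2C 8? 19"))
# [(2, 'answered C'), (8, 'multiple marks'), (19, 'omitted')]
```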

 

This roster is useful in the event the answer key is changed after the tests have been scored.  An instructor can read through the error listing and modify scores without rescanning the entire test.  It can also be helpful when cheating is suspected.  An examination of the errors of students thought to be copying from one another can provide evidence of collusion.
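One simple way to compare two students' error listings is to collect the entries they have in common.  The sketch below is a hypothetical helper, not a feature of the scoring program, and matching errors by themselves do not establish that copying occurred.

```python
def shared_errors(listing_a, listing_b):
    """Return the entries (item number plus response) found in both listings."""
    common = set(listing_a.split()) & set(listing_b.split())
    # Sort by item number so the result reads like the roster.
    return sorted(common, key=lambda e: int("".join(ch for ch in e if ch.isdigit())))


# Listings taken from the example name/error roster.
print(shared_errors("2C 11B 19", "2C 11B 19"))               # ['2C', '11B', '19']
print(shared_errors("2C 8? 10? 12A 14C 18C", "2C 11B 19"))   # ['2C']
```

Note that an item omitted by both students (such as item 19 in the first comparison) also counts as a shared entry.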

 

 

 

***ID/Error Roster***

 

The ID/Error roster lists student scores and errors by ID number, and is used primarily for posting test scores for large sections.  An example of the roster is shown below.

                                

                           ID/ERROR ROSTER
                      FOR SELECTED STUDENTS

ID          TEST           PERCENT   OMITTED OR INCORRECT ITEMS
NUMBER      FORM   SCORE   CORRECT

?                  20      100       GOT ALL ITEMS CORRECT
325476981          17      85        2C 11B 19
357913579          17      85        2C 11B 20
369258147          14      70        2C 8? 10C 12A 14C 18C
436925814          18      90        8A 14B
581470369          9       45        2B 3A 8E 10C 15C 16D 17 18A
                                     19A 20C

 

With the exception of the student’s name, this roster contains the same information as the Name/Error roster.  When this roster is used for posting grades, it’s a good idea for the instructor to have a Name/Error roster for a cross reference.

 

The roster shown above is an example of the report that lists information for selected students.  Notice that the student with a raw score of 14 missed item 12, even though one point was awarded for selecting option “A.”  This is because the maximum credit possible on the item was two points (awarded for answering “C”).
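The partial-credit rule can be sketched as follows.  The weights for item 12 come from the example above; the function and variable names are hypothetical.

```python
def score_item(option_weights, response):
    """Return (points earned, listed as missed?) for one weighted item.

    An item is listed as missed whenever the response earns less than the
    maximum number of points available on that item.
    """
    points = option_weights.get(response, 0)
    missed = points < max(option_weights.values())
    return points, missed


item_12 = {"A": 1, "C": 2}  # option weights from the memo's example
print(score_item(item_12, "A"))  # (1, True): partial credit, still listed
print(score_item(item_12, "C"))  # (2, False): full credit, not listed
```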

 

The information under “TEST FORM” is not listed on this roster because only one test form was used.  If more than one form was used, the form information would be shown.

 

***Individual Score Reports***

 

An individual score report is a 3 ¼” by 7 ½” card or slip of paper that describes each student’s performance.  It contains the same information listed in the Name/Error roster.  It was designed to be handed out in class.  An example is shown below.

 

 

EXAMPLE  JUSTIN                                                  ID No.  495209902

FM PROFNAME      Course TST 123      Section 456      Test No.  07      Date 11/12/04
Test Form: B     Test score: 17      Maximum Possible: 20               Percent Correct: 85

***MISSED ITEMS (ITEM NUMBER – CORRECT ANSWER/YOUR ANSWER)***

4 – E/B           12 – M/D           17 – C/

M: More than one correct answer.  Option marked was not the best.

 

 

 

The information on incorrect items lists the number of the item missed, the correct answer, and the student’s response.  In the example, the correct answer for item 4 was “E” and this student chose option “B.”  Item 12 was a multiple key item, and the correct answer is represented by an “M.”  The student, Justin Example, chose response “D,” which was not one of the correct answers.  In this report, as in the rosters that include error listings, responses which do not receive full credit are included with the missed items.  The third item missed in the example is a result of the student not answering the item.  A blank is shown as the student’s response to indicate that item 17 was omitted.
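The missed-item entries follow a fixed pattern, which the sketch below reproduces.  It is an illustration only: the function name is hypothetical, and the keyed answers given for item 12 are made up, since the memo does not state them.

```python
def format_missed_item(item, keyed_answers, response):
    """Format one entry as "item – correct answer/your answer".

    Multiple key items show "M" in place of the correct answer, and an
    omitted item shows a blank for the student's response.
    """
    correct = keyed_answers[0] if len(keyed_answers) == 1 else "M"
    return f"{item} \u2013 {correct}/{response or ''}"


print(format_missed_item(4, ["E"], "B"))        # 4 – E/B
print(format_missed_item(12, ["A", "C"], "D"))  # 12 – M/D (keys are hypothetical)
print(format_missed_item(17, ["C"], None))      # 17 – C/
```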

 

The scanning operator will decide whether to print the individual score reports in our office or at the computing center. 

 

 

      11/12/04   

 
