Introduction
Student ratings of faculty are being used increasingly by faculty, students and administration for making formative and summative decisions about teaching effectiveness. Two decades ago, only about ten percent of American colleges and universities relied on student ratings. Today, most institutions use some form of student ratings. In many cases they are the only tangible evidence of teaching performance. The increase in use of student ratings comes, not coincidentally, at a time when state legislatures and other funding sources are giving closer scrutiny to the teaching missions and effectiveness of higher education institutions and demanding greater accountability from them. For many faculty, this is a cause of some concern. The student ratings are viewed at times as valid or invalid, reliable or unreliable, and useful or useless. Moreover, faculty at major research universities have traditionally acquired and held their positions on the basis of their scholarship. To now be evaluated on their teaching is, to some, a change in the ground rules.
At our university, a large land-grant institution, student ratings of faculty are fairly recent: they were mandated less than five years ago and are still incompletely implemented. The percentage of organized courses systematically rated by students has grown from less than two percent to almost seventy percent in eight semesters. This means that many faculty have not yet had their fears of the process allayed.
The major concerns of faculty are:
1. Ratings will be lower in some courses because of the nature of the courses themselves. Specifically, ratings are lower in lower level courses which are often required, and ratings are lower in courses with large sections.
2. Good instruction and good research go hand in hand so it is unnecessary to evaluate them separately. Some faculty would argue that they should only be evaluated by colleagues with excellent publication records.
The concerns of students are:
2. Faculty are too concerned with research to the detriment of their teaching effectiveness.
The concern of the university administration is:
Research on all of these questions is mixed. With regard to the extraneous variables, Aleamoni and Hexner (1980) cited eight studies which support the belief that lower ratings are associated with larger classes but seven studies that found no such relationship. In the same review, they cited eight studies that found no relationship between student ratings and course level, but eighteen that found higher ratings for graduate and upper division courses. Aleamoni (1981) cited five studies that found higher ratings for faculty of higher rank and five other studies that showed no relationship between ratings and rank.
On the issue of research productivity, McDaniel and Feldhusen (1970) found significant correlations between productivity and ratings. Their correlations ranged from -.3 to .15 if the instructor was the first author, and from .00 to .33 if the instructor was the second author. Aleamoni and Yimer (1973) did not find any significant correlations. Their correlations ranged from .00 to .16.
This study attempted to determine the relationship between student ratings and the variables of class size, course level, faculty rank, faculty productivity, and intended use of the ratings.
The student ratings system is a Cafeteria Model (Derry, Seibert, and Starry, 1974) which permits students, deans, department heads, and faculty to take part by selecting items to be used on the ratings form. Students participated through the Student Senate, which worked with the Faculty Senate to develop the student ratings system. Student Senate representatives wrote five global items which were intended to be the core items on all forms. The ratings on these five global items (Table 1) are the dependent variables in this study. In practice, deans and department heads are the primary stakeholders for student ratings. They select up to twenty items from the item bank, including the five student items. These items serve the summative evaluation function of the process. If fewer than twenty items are selected, faculty may have the option of selecting additional items for formative evaluation.
A random sample of 150 courses was drawn from all courses in the colleges of Liberal Arts, Architecture, Business, and Agriculture that were rated in Fall 1991. The student ratings forms in these colleges include the five global items. The class sizes ranged from a minimum of 2 to a maximum of 215. The mean class size was 33.3. Course levels were: 100 level = 28, 200 level = 8, 300 level = 40, 400 level = 22, 600 level = 52. Instructor ranks were: graduate assistant (GAT) = 19, lecturer/instructor = 36, assistant professor = 35, associate professor = 33, professor = 27. Not all departments represented in the sample allow their faculty to select items. Faculty had the option of selecting items for formative evaluation in 118 of the courses, and in 28 of these they actually selected items.
Deans and department heads receive the results of the global, student items along with the results of the items they selected. The instructors receive the results of all the items. Since the results of the faculty-selected items are not reported to the deans and department heads, their use is purely formative. It can be presumed that faculty who do select additional items are interested in more information to improve their teaching. At the very least, they are more involved in the process of course evaluations than faculty who do not select items. It can also be assumed that they find the student ratings useful and plan to use the information. Thus, the selection of items was used in this analysis as a variable which identifies instructors who find the student ratings useful for making formative decisions.
Productivity was determined by searching bibliographic databases for the publication records of faculty in the sample. The searches were restricted to the three-year period of 1989 to 1991 so that research would be current with instruction and so that the effect of career length would be minimized. The databases are listed in Table 2. Each publication was assigned a weight according to its type, based on the weighting scheme used by Aleamoni and Yimer (1973). In their system, an authored book was given a weight of 15, an edited book a weight of 9, an article a weight of 3, and a book review a weight of 2. A summary of publications by rank is presented in Table 3.
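As an illustration of the weighting scheme just described, the following minimal sketch computes a weighted productivity score for one instructor. The weights come from the text; the dictionary keys, the function name, and the choice to sum the weighted counts into a single score are assumptions made for this example, since the text does not spell out how the weights were aggregated.

```python
# Weights per publication type, as given in the text (Aleamoni and Yimer, 1973).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {"authored_book": 15, "edited_book": 9, "article": 3, "book_review": 2}


def productivity_score(counts):
    """Return a weighted publication score for one instructor.

    `counts` maps a publication type to the number of publications of that
    type. Summing weight * count is an assumption of this sketch.
    """
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown publication types: {sorted(unknown)}")
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


# Hypothetical instructor: one edited book, four articles, two book reviews.
example = {"edited_book": 1, "article": 4, "book_review": 2}
score = productivity_score(example)  # 9*1 + 3*4 + 2*2 = 25
```

Under this assumed aggregation, a single authored book counts as much as five articles, so the score is dominated by books whenever they are present.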
| Table 1. Core items rated by students. | |
|---|---|
| repeat | I would take another course from this professor. |
| fair grade | The exams were presented and graded fairly. |
| work amt | The amount of work and/or reading was reasonable for the credit hours received in the course. |
| effective | I believe this instructor was an effective teacher. |
| help | Help was readily available for questions and/or homework outside of class. |
| Table 2. Databases searched |
|---|
| Agricola |
| ABI/Inform |
| Arts & Humanities Index |
| American History & Life |
| Books in Print |
| MLA Index |
| Psych Lit |
| Science Citation Index |
| Social Science Citation Index |
| Sociofile |
| Wilson Periodical Indexes |
| Table 3. Publications (authored books + edited books + articles + book reviews) by rank | |||||
|---|---|---|---|---|---|
| Rank | GAT | Lect/inst | Asst | Assoc | Prof |
| Median | 0 | 0 | 1 | 3 | 2 |
| Maximum | 2 | 6 | 13 | 22 | 22 |
Distributions of means. The distributions of mean ratings by rank are presented in Figures 1a through 1e for each item:
Figure 1a: Repeat * Rank
Figure 1b: Fair grade * Rank
Figure 1c: Work Amount * Rank
Figure 1d: Effective * Rank
Figure 1e: Help * Rank
| Table 4. Summary statistics for faculty rank | |||||
|---|---|---|---|---|---|
| REPEAT | |||||
| | Associate Professor | Assistant Professor | Graduate Assistant Teacher | Lecturer/Instructor | Professor |
| Medians | 4 | 4.32 | 4.22 | 4.28 | 4.24 |
| Cases | 33 | 35 | 19 | 36 | 27 |
| Min | 2.67 | 2.40 | 3 | 2.25 | 2 |
| Max | 4.91 | 5 | 4.91 | 5 | 5 |
| 25th%ile | 3.60 | 3.66 | 3.92 | 3.63 | 3.60 |
| 75th%ile | 4.50 | 4.68 | 4.52 | 4.74 | 4.50 |
| FAIR GRADE | |||||
| | Associate Professor | Assistant Professor | Graduate Assistant Teacher | Lecturer/Instructor | Professor |
| Medians | 4.23 | 4.07 | 4.17 | 4.27 | 4.32 |
| Cases | 33 | 35 | 19 | 36 | 27 |
| Min | 3 | 2.40 | 3.38 | 2.80 | 2.60 |
| Max | 5 | 5 | 4.63 | 5 | 5 |
| 25th%ile | 3.83 | 3.56 | 3.95 | 3.82 | 3.76 |
| 75th%ile | 4.50 | 4.52 | 4.36 | 4.65 | 4.64 |
| WORK AMOUNT | |||||
| | Associate Professor | Assistant Professor | Graduate Assistant Teacher | Lecturer/Instructor | Professor |
| Medians | 4.15 | 4.12 | 4.15 | 4.26 | 4.12 |
| Cases | 33 | 35 | 19 | 36 | 27 |
| Min | 2.69 | 1.94 | 3.44 | 3 | 2.67 |
| Max | 4.67 | 4.89 | 4.51 | 5 | 4.93 |
| 25th%ile | 3.89 | 3.50 | 3.89 | 3.82 | 3.50 |
| 75th%ile | 4.38 | 4.53 | 4.36 | 4.60 | 4.52 |
| EFFECTIVE | |||||
| | Associate Professor | Assistant Professor | Graduate Assistant Teacher | Lecturer/Instructor | Professor |
| Medians | 4.23 | 4.36 | 4.42 | 4.33 | 4.31 |
| Cases | 33 | 35 | 19 | 36 | 27 |
| Min | 2.67 | 2.73 | 3.36 | 2.50 | 3 |
| Max | 5 | 5 | 4.92 | 5 | 5 |
| 25th%ile | 3.94 | 3.89 | 4.24 | 3.88 | 3.79 |
| 75th%ile | 4.62 | 4.67 | 4.60 | 4.77 | 4.64 |
| HELP | |||||
| | Associate Professor | Assistant Professor | Graduate Assistant Teacher | Lecturer/Instructor | Professor |
| Medians | 4.13 | 4.21 | 4.34 | 4.37 | 4 |
| Cases | 33 | 35 | 19 | 36 | 27 |
| Min | 3.27 | 2.33 | 3.13 | 3.30 | 3.27 |
| Max | 5 | 5 | 4.75 | 5 | 4.93 |
| 25th%ile | 3.87 | 3.83 | 4.07 | 4.04 | 3.84 |
| 75th%ile | 4.50 | 4.50 | 4.52 | 4.49 | 4.64 |
The distributions of mean ratings by course level are presented in Figures 2a through 2e for each item:
Figure 2a: Repeat * Level
Figure 2b: Fair grade * Level
Figure 2c: Work Amount * Level
Figure 2d: Effective * Level
Figure 2e: Help * Level
| Table 5. Summary statistics for Course Level | |||||
|---|---|---|---|---|---|
| REPEAT | |||||
| | 100 Level | 200 Level | 300 Level | 400 Level | 600 Level |
| Medians | 4.32 | 4.53 | 3.66 | 4.07 | 2.14 |
| Cases | 28 | 8 | 40 | 22 | 52 |
| Min | 2.58 | 3.09 | 2.25 | 3.56 | 2 |
| Max | 4.92 | 4.82 | 4.90 | 5 | 5 |
| 25th%ile | 4 | 3.44 | 3.36 | 3.77 | 3.84 |
| 75th%ile | 4.65 | 4.80 | 4.27 | 4.67 | 4.68 |
| FAIR GRADE | |||||
| | 100 Level | 200 Level | 300 Level | 400 Level | 600 Level |
| Medians | 4.34 | 4.32 | 3.92 | 4.23 | 4.24 |
| Cases | 28 | 8 | 40 | 22 | 52 |
| Min | 2.52 | 3.07 | 2.60 | 3.73 | 2.40 |
| Max | 4.88 | 4.77 | 4.89 | 5 | 5 |
| 25th%ile | 4.09 | 3.60 | 3.69 | 3.92 | 3.71 |
| 75th%ile | 4.57 | 4.48 | 4.33 | 4.67 | 4.57 |
| WORK AMOUNT | |||||
| | 100 Level | 200 Level | 300 Level | 400 Level | 600 Level |
| Medians | 4.27 | 4.34 | 3.92 | 4.18 | 4.17 |
| Cases | 28 | 8 | 40 | 22 | 52 |
| Min | 1.94 | 3.38 | 2.29 | 2.94 | 2 |
| Max | 4.67 | 4.76 | 4.85 | 4.89 | 5 |
| 25th%ile | 4 | 3.58 | 3.52 | 3.85 | 3.56 |
| 75th%ile | 4.42 | 4.59 | 4.30 | 4.51 | 4.57 |
| EFFECTIVE | |||||
| | 100 Level | 200 Level | 300 Level | 400 Level | 600 Level |
| Medians | 4.47 | 4.62 | 4 | 4.24 | 4.48 |
| Cases | 28 | 8 | 40 | 22 | 52 |
| Min | 3.20 | 3.36 | 2.50 | 3.88 | 2.67 |
| Max | 4.96 | 4.87 | 4.90 | 5 | 5 |
| 25th%ile | 4.19 | 3.86 | 3.64 | 3.99 | 3.96 |
| 75th%ile | 4.70 | 4.84 | 4.45 | 4.56 | 4.67 |
| HELP | |||||
| | 100 Level | 200 Level | 300 Level | 400 Level | 600 Level |
| Medians | 4.35 | 4.23 | 4.03 | 4.26 | 4.32 |
| Cases | 28 | 8 | 40 | 22 | 52 |
| Min | 3.49 | 3.51 | 3.13 | 3.33 | 2.33 |
| Max | 4.75 | 4.67 | 4.74 | 5 | 5 |
| 25th%ile | 4.04 | 3.66 | 3.79 | 3.90 | 3.98 |
| 75th%ile | 4.53 | 4.61 | 4.40 | 4.48 | 4.67 |
Correlations. Correlations between mean ratings and class size are presented in Table 6 for each item. Considering the total sample, the correlations range from -.249 to .093. On only one item -- Help was readily available for questions and/or homework outside of class (help) -- was the correlation significantly different from zero. Four of the five correlations were negative, indicating higher ratings with smaller class sizes. Ratings and class size were then correlated separately for instructors who did not select items and those who did. For those who did not select items, the pattern of correlations is the same as for the entire sample; for those who selected items, however, the correlations are all near zero but positive.
On the question of availability of help, a significant negative correlation with class size (-.304) disappears for faculty who are interested in more feedback from students (.045). Similar results can be seen when only those courses in which faculty had the option to select items were considered (-.245/.045) and when only courses taught by non-tenured ranks (GATs, lecturers/instructors) were considered (-.383/.160). For tenured ranks, the correlation between ratings on help and class size goes from significant negative (-.300) to non-significant negative (-.106).
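The text does not state which test was used to decide whether a correlation differs from zero. As a rough check, the sketch below applies the Fisher z approximation (a swapped-in technique, not necessarily the authors' method) to two of the help correlations from Table 6. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated as
    approximately standard normal under the null hypothesis.
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Help item, total sample split by item selection (values from Table 6).
p_no_selection = fisher_z_p_value(-0.304, 122)  # instructors who selected no items
p_selected = fisher_z_p_value(0.045, 28)        # instructors who selected items
```

Under this approximation the first correlation falls below the .05 level and the second does not, which agrees with the asterisks in Table 6. With only 28 courses in the second group, a correlation would need to be fairly large before it reached significance.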
In general, the effect of selecting items is to shift the correlation between class size and ratings in the positive direction. For only one of the twenty pairs of correlations (no selection versus selected items) is the shift in the opposite direction. A sign test of all twenty pairs of correlations was significant (p<.001). Thus, the negative association between class size and teaching effectiveness in the eyes of students disappears for faculty who approach student ratings as formative.
| Table 6. Correlations between class size and ratings of the course on each item. | ||||||
|---|---|---|---|---|---|---|
| | N | repeat | fair grade | work amt | effective | help |
| Total Sample | 150 | -0.052 | -0.053 | 0.093 | -0.048 | -0.249* |
| No selection | 122 | -0.112 | -0.108 | 0.072 | -0.108 | -0.304* |
| Select items | 28 | 0.200 | 0.150 | 0.203 | 0.160 | 0.045 |
| Courses with option to select | 118 | 0.080 | 0.052 | 0.137 | 0.044 | -0.175 |
| No selection | 90 | 0.049 | 0.023 | 0.127 | 0.009 | -0.245* |
| Select items | 28 | 0.200 | 0.150 | 0.203 | 0.160 | 0.045 |
| Tenured ranks | 95 | -0.182 | -0.103 | 0.041 | -0.156 | -0.281* |
| No selection | 78 | -0.227* | -0.146 | 0.031 | -0.205 | -0.300* |
| Select items | 17 | -0.050 | 0.034 | 0.029 | -0.019 | -0.106 |
| Non-tenure ranks | 55 | 0.238 | 0.058 | 0.216 | 0.173 | -0.207 |
| No selection | 44 | 0.192 | -0.012 | 0.183 | 0.136 | -0.383* |
| Select items | 11 | 0.384 | 0.234 | 0.347 | 0.279 | 0.160 |
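The sign test on the twenty pairs of correlations in Table 6 can be reproduced with a short sketch. It assumes an exact two-sided binomial sign test (the text does not say whether the test was one- or two-sided), and the function and variable names are hypothetical.

```python
from math import comb


def sign_test_p_value(n_up, n_down):
    """Exact two-sided sign test p-value; tied pairs must be dropped first."""
    n = n_up + n_down
    if n == 0:
        raise ValueError("no non-tied pairs")
    k = min(n_up, n_down)
    # Probability of k or fewer outcomes of the rarer sign under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Table 6 correlations, row by row: total sample, courses with option to
# select, tenured ranks, non-tenure ranks (five items each).
no_selection = [
    -0.112, -0.108, 0.072, -0.108, -0.304,
    0.049, 0.023, 0.127, 0.009, -0.245,
    -0.227, -0.146, 0.031, -0.205, -0.300,
    0.192, -0.012, 0.183, 0.136, -0.383,
]
selected = [
    0.200, 0.150, 0.203, 0.160, 0.045,
    0.200, 0.150, 0.203, 0.160, 0.045,
    -0.050, 0.034, 0.029, -0.019, -0.106,
    0.384, 0.234, 0.347, 0.279, 0.160,
]

diffs = [s - n for n, s in zip(no_selection, selected)]
n_up = sum(d > 0 for d in diffs)    # pairs where selecting items raises r
n_down = sum(d < 0 for d in diffs)  # pairs where selecting items lowers r
p_value = sign_test_p_value(n_up, n_down)
```

With the table values, 19 of the 20 pairs move in the positive direction and one (work amt for tenured ranks) moves the other way, giving a p-value of about .00004, consistent with the reported p<.001.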
Correlations between mean ratings and instructor's publications are presented for each item in Table 7. All but one of the correlations are positive but most are non-significant. The number of publications correlates significantly with fair grading for the total sample, for courses with the option to select items, and for courses taught by tenure-rank faculty. For non-tenure ranks, publications correlate significantly with ratings on two items -- repeat and effective teacher.
The overall effect of selecting items is to increase the correlation between publications and ratings of effectiveness. For only one pair of correlations is the effect in the opposite direction. A sign test of all twenty pairs of correlations was significant (p<.001).
| Table 7. Correlations between publications and ratings of the course on each item. | ||||||
|---|---|---|---|---|---|---|
| | N | repeat | fair grade | work amt | effective | help |
| Total Sample | 150 | 0.135 | 0.187* | 0.108 | 0.116 | 0.066 |
| No selection | 122 | 0.117 | 0.16 | 0.099 | 0.093 | 0.004 |
| Select items | 28 | 0.168 | 0.262 | 0.107 | 0.153 | 0.335 |
| Courses with option to select | 118 | 0.143 | 0.198* | 0.15 | 0.14 | 0.085 |
| No selection | 90 | 0.119 | 0.165 | 0.145 | 0.114 | -0.004 |
| Select items | 28 | 0.168 | 0.262 | 0.107 | 0.153 | 0.335 |
| Tenured ranks | 95 | 0.171 | 0.254* | 0.163 | 0.149 | 0.128 |
| No selection | 78 | 0.139 | 0.215 | 0.139 | 0.117 | 0.085 |
| Select items | 17 | 0.246 | 0.366 | 0.212 | 0.209 | 0.399 |
| Non-tenure ranks | 55 | 0.304* | 0.23 | 0.259 | 0.288* | 0.125 |
| No selection | 44 | 0.298* | 0.201 | 0.212 | 0.289 | 0.02 |
| Select items | 11 | 0.421 | 0.415 | 0.554 | 0.402 | 0.543 |
The absence of mean differences in the ratings by course level should be welcome news to faculty who fear that having to teach lower level courses will adversely affect their ratings. Likewise, the absence of a mean difference by instructor rank should be welcome news. In general, it is hoped that extraneous variables will have a negligible impact on ratings of teaching effectiveness. Still, some caveats on the results are in order. First, rank and level are partially confounded in that GATs and lecturer/instructors teach fewer upper level and graduate courses than do higher ranked faculty. Second, course level is partially confounded with class size. Thus, independent estimates of each effect are nearly impossible to obtain.
One extraneous variable that appears to be important is class size. It stands to reason that large classes will be less favorably rated than small ones. It is not surprising that there is a significant negative correlation between class size and the rating of availability of help. It is encouraging that the correlation becomes almost zero when faculty select items. The interpretation we favor is that these instructors are more analytical about their teaching and find the student ratings useful in improving it. The major caveat to be offered is that the sample becomes very small when only instructors who select items are considered.
The mostly positive correlations between publications and student ratings suggest that teaching and research are not incompatible activities but complementary forms of scholarship. Although few of the correlations are significant, due in part to the small sample sizes, certain correlations stand out, particularly the correlations between publications and ratings of the fairness of grading. Perhaps instructors whose own scholarship has been evaluated and rated more often are, themselves, better evaluators of others' scholarship. The other significant correlations worth noting are those for non-tenure ranked faculty. The GATs and lecturer/instructors are hired to teach. For them, doing research is extra effort. Although the number of publications in this group is small, the positive correlations suggest that building scholarly credentials through research carries over to classroom teaching.