Teaching or Research? The relationship between scholarly productivity and students' judgments of teaching.

A paper presented at the Annual Meeting of the American Evaluation Association, Seattle, 1992.

Authors:

Mark E. Troy
Measurement and Research Services
257 Bizzell Hall West
Texas A&M University
College Station, Texas 77843-4239
Phone: 409-845-0532
Fax: 409-847-8666

Katherine Friedrich
Department of Educational Psychology
Texas A&M University

Mary F. Troy
Office of International Programs Coordination
Texas A&M University

Introduction

Student ratings of faculty are being used increasingly by faculty, students and administration for making formative and summative decisions about teaching effectiveness. Two decades ago, only about ten percent of American colleges and universities relied on student ratings. Today, most institutions use some form of student ratings. In many cases they are the only tangible evidence of teaching performance. The increase in use of student ratings comes, not coincidentally, at a time when state legislatures and other funding sources are giving closer scrutiny to the teaching missions and effectiveness of higher education institutions and demanding greater accountability from them. For many faculty, this is a cause of some concern. The student ratings are viewed at times as valid or invalid, reliable or unreliable, and useful or useless. Moreover, faculty at major research universities have traditionally acquired and held their positions on the basis of their scholarship. To now be evaluated on their teaching is, to some, a change in the ground rules.

At our university, a large, land-grant university, student ratings of faculty are fairly recent, having been mandated less than five years ago, and still incompletely implemented. The percentage of organized courses systematically rated by students has grown from less than two percent to almost seventy percent in eight semesters. This means that many faculty have still not had their fears of the process allayed.

The major concerns of faculty are:

The concerns of students are:

The concern of the university administration is:

Research on all of these questions is mixed. With regard to the extraneous variables, Aleamoni and Hexner (1980) cited eight studies which support the belief that lower ratings are associated with larger classes but seven studies that found no such relationship. In the same review, they cited eight studies that found no relationship between student ratings and course level, but eighteen that found higher ratings for graduate and upper division courses. Aleamoni (1981) cited five studies that found higher ratings for faculty of higher rank and five other studies that showed no relationship between ratings and rank.

On the issue of research productivity, McDaniel and Feldhusen (1970) found significant correlations between productivity and ratings. Their correlations ranged from -.3 to .15 if the instructor was the first author, and from .00 to .33 if the instructor was the second author. Aleamoni and Yimer (1973) did not find any significant correlations. Their correlations ranged from .00 to .16.

This study attempted to determine the relationship between student ratings and the variables of class size, course level, faculty rank, faculty productivity, and intended use of the ratings.

The student ratings system

The student ratings system is a Cafeteria Model (Derry, Seibert, and Starry, 1974), which permits students, deans, department heads, and faculty to take part by selecting items to be used on the ratings form. Students participated through the Student Senate, which worked with the Faculty Senate to develop the student ratings system. Student Senate representatives wrote five global items which were intended to be the core items on all forms. The ratings on these five global items (Table 1) are the dependent variables in this study. In practice, deans and department heads are the primary stakeholders for student ratings. Their involvement is to select up to twenty items from the item bank, including the five student items. These items serve the summative evaluation function of the process. If fewer than twenty items are selected, faculty may have the option of selecting additional items for formative evaluation.

Sample

A random sample of 150 courses was drawn from all courses in the colleges of Liberal Arts, Architecture, Business, and Agriculture that were rated in Fall 1991. The student ratings forms in these colleges include the five global items. The class sizes ranged from a minimum of 2 to a maximum of 215. The mean class size was 33.3. Course levels were: 100 level = 28, 200 level = 8, 300 level = 40, 400 level = 22, 600 level = 52. Instructor ranks were: graduate assistant teacher (GAT) = 19, lecturer/instructor = 36, assistant professor = 35, associate professor = 33, professor = 27. Not all departments represented in the sample allow their faculty to select items. Faculty had the option of selecting items for formative evaluation in 118 of the courses, and in 28 of those courses they actually selected items.

Deans and department heads receive the results of the global, student items along with the results of the items they selected. The instructors receive the results of all the items. Since the results of the faculty-selected items are not reported to the deans and department heads, their use is purely formative. It can be presumed that faculty who do select additional items are interested in more information to improve their teaching. At the very least, they are more involved in the process of course evaluations than faculty who do not select items. It can also be assumed that they find the student ratings useful and plan to use the information. Thus, the selection of items was used in this analysis as a variable which identifies instructors who find the student ratings useful for making formative decisions.

Productivity

Productivity was determined by searching bibliographic databases for the publication records of faculty in the sample. The searches were restricted to the three-year period of 1989 to 1991 so that research would be current with instruction and so that the effect of career length would be minimized. The databases are listed in Table 2. Each publication was assigned a weight according to its type, based on the weighting scheme used by Aleamoni and Yimer (1973). In their system, an authored book was given a weight of 15, an edited book a weight of 9, an article a weight of 3, and a book review a weight of 2. A summary of publications by rank is presented in Table 3.
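The weighting scheme described above can be illustrated with a short sketch. The four weights are those stated in the text; the function name, the dictionary keys, and the example record are hypothetical and serve only to show the arithmetic.

```python
# Weights as reported in the text (after Aleamoni and Yimer, 1973).
# The key names are illustrative labels, not taken from the paper.
PUBLICATION_WEIGHTS = {
    "authored_book": 15,
    "edited_book": 9,
    "article": 3,
    "book_review": 2,
}


def weighted_productivity(counts):
    """Return the weighted publication score for one instructor.

    `counts` maps a publication type to the number of publications of
    that type found in the search window (1989 to 1991 in the study).
    """
    unknown = set(counts) - set(PUBLICATION_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown publication types: {sorted(unknown)}")
    return sum(PUBLICATION_WEIGHTS[kind] * n for kind, n in counts.items())


# Hypothetical record, not drawn from the sample:
# one authored book, two articles, one book review.
example_counts = {"authored_book": 1, "article": 2, "book_review": 1}
example_score = weighted_productivity(example_counts)  # 15 + 2*3 + 2
print(example_score)
```

Under this scheme a single authored book counts as much as five articles, so the score is dominated by books where they occur.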

Table 1. Core items rated by students.

Label       Item
repeat      I would take another course from this professor.
fair grade  The exams were presented and graded fairly.
work amt    The amount of work and/or reading was reasonable for the credit hours received in the course.
effective   I believe this instructor was an effective teacher.
help        Help was readily available for questions and/or homework outside of class.

Table 2. Databases searched
Agricola
ABI/Inform
Arts & Humanities Index
American History & Life
Books in Print
MLA Index
Psych Lit
Science Citation Index
Social Science Citation Index
Sociofile
Wilson Periodical Indexes

Table 3. Publications (authored books + edited books + articles + book reviews) by rank

Rank      GAT  Lect/inst  Asst  Assoc  Prof
Median    0    0          1     3      2
Maximum   2    6          13    22     22

Results

Distributions of means. The distributions of mean ratings by rank are presented in Figures 1a through 1e, one figure for each item. The values are displayed in Table 4. The shading on the figures indicates the 95% confidence interval around the median. It can be seen from the overlap of the shaded areas that there are no significant differences in ratings by instructor's rank.

Table 4. Summary statistics for faculty rank

REPEAT       Associate Professor  Assistant Professor  Graduate Assistant Teacher  Lecturer/Instructor  Professor
Medians      4                    4.32                 4.22                        4.28                 4.24
Cases        33                   35                   19                          36                   27
Min          2.67                 2.40                 3                           2.25                 2
Max          4.91                 5                    4.91                        5                    5
25th%ile     3.60                 3.66                 3.92                        3.63                 3.60
75th%ile     4.50                 4.68                 4.52                        4.74                 4.50

FAIR GRADE   Associate Professor  Assistant Professor  Graduate Assistant Teacher  Lecturer/Instructor  Professor
Medians      4.23                 4.07                 4.17                        4.27                 4.32
Cases        33                   35                   19                          36                   27
Min          3                    2.40                 3.38                        2.80                 2.60
Max          5                    5                    4.63                        5                    5
25th%ile     3.83                 3.56                 3.95                        3.82                 3.76
75th%ile     4.50                 4.52                 4.36                        4.65                 4.64

WORK AMOUNT  Associate Professor  Assistant Professor  Graduate Assistant Teacher  Lecturer/Instructor  Professor
Medians      4.15                 4.12                 4.15                        4.26                 4.12
Cases        33                   35                   19                          36                   27
Min          2.69                 1.94                 3.44                        3                    2.67
Max          4.67                 4.89                 4.51                        5                    4.93
25th%ile     3.89                 3.50                 3.89                        3.82                 3.50
75th%ile     4.38                 4.53                 4.36                        4.60                 4.52

EFFECTIVE    Associate Professor  Assistant Professor  Graduate Assistant Teacher  Lecturer/Instructor  Professor
Medians      4.23                 4.36                 4.42                        4.33                 4.31
Cases        33                   35                   19                          36                   27
Min          2.67                 2.73                 3.36                        2.50                 3
Max          5                    5                    4.92                        5                    5
25th%ile     3.94                 3.89                 4.24                        3.88                 3.79
75th%ile     4.62                 4.67                 4.60                        4.77                 4.64

HELP         Associate Professor  Assistant Professor  Graduate Assistant Teacher  Lecturer/Instructor  Professor
Medians      4.13                 4.21                 4.34                        4.37                 4
Cases        33                   35                   19                          36                   27
Min          3.27                 2.33                 3.13                        3.30                 3.27
Max          5                    5                    4.75                        5                    4.93
25th%ile     3.87                 3.83                 4.07                        4.04                 3.84
75th%ile     4.50                 4.50                 4.52                        4.49                 4.64
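
The overlap comparison described before Table 4 can be approximated from the tabled medians, quartiles, and case counts. The paper does not say how the shaded intervals were computed, so the sketch below assumes the common notched box plot convention, median plus or minus about 1.57 times the interquartile range divided by the square root of n. That formula, the function names, and the choice of the two ranks compared are assumptions made for illustration.

```python
import math


def notch_interval(median, q1, q3, n, factor=1.57):
    """Approximate 95% interval around a median.

    Uses median +/- factor * IQR / sqrt(n), the notched box plot
    convention; this is an assumed method, not one stated in the paper.
    """
    half_width = factor * (q3 - q1) / math.sqrt(n)
    return (median - half_width, median + half_width)


def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# REPEAT item, values taken from Table 4.
associate = notch_interval(4.00, 3.60, 4.50, 33)
assistant = notch_interval(4.32, 3.66, 4.68, 35)
print(associate, assistant, intervals_overlap(associate, assistant))
```

Under this assumption the two intervals overlap, which agrees with the reading of the figures given in the text for this pair of ranks.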

The distributions of mean ratings by course level are presented in Figures 2a through 2e, one figure for each item. The values are displayed in Table 5. On each item, the median rating of 300 level courses is the lowest, but the differences are not significant.

Table 5. Summary statistics for course level

REPEAT       100 Level  200 Level  300 Level  400 Level  600 Level
Medians      4.32       4.53       3.66       4.07       2.14
Cases        28         8          40         22         52
Min          2.58       3.09       2.25       3.56       2
Max          4.92       4.82       4.90       5          5
25th%ile     4          3.44       3.36       3.77       3.84
75th%ile     4.65       4.80       4.27       4.67       4.68

FAIR GRADE   100 Level  200 Level  300 Level  400 Level  600 Level
Medians      4.34       4.32       3.92       4.23       4.24
Cases        28         8          40         22         52
Min          2.52       3.07       2.60       3.73       2.40
Max          4.88       4.77       4.89       5          5
25th%ile     4.09       3.60       3.69       3.92       3.71
75th%ile     4.57       4.48       4.33       4.67       4.57

WORK AMOUNT  100 Level  200 Level  300 Level  400 Level  600 Level
Medians      4.27       4.34       3.92       4.18       4.17
Cases        28         8          40         22         52
Min          1.94       3.38       2.29       2.94       2
Max          4.67       4.76       4.85       4.89       5
25th%ile     4          3.58       3.52       3.85       3.56
75th%ile     4.42       4.59       4.30       4.51       4.57

EFFECTIVE    100 Level  200 Level  300 Level  400 Level  600 Level
Medians      4.47       4.62       4          4.24       4.48
Cases        28         8          40         22         52
Min          3.20       3.36       2.50       3.88       2.67
Max          4.96       4.87       4.90       5          5
25th%ile     4.19       3.86       3.64       3.99       3.96
75th%ile     4.70       4.84       4.45       4.56       4.67

HELP         100 Level  200 Level  300 Level  400 Level  600 Level
Medians      4.35       4.23       4.03       4.26       4.32
Cases        28         8          40         22         52
Min          3.49       3.51       3.13       3.33       2.33
Max          4.75       4.67       4.74       5          5
25th%ile     4.04       3.66       3.79       3.90       3.98
75th%ile     4.53       4.61       4.40       4.48       4.67

Correlations. Correlations between mean ratings and class size are presented in Table 6 for each item. Considering the total sample, the correlations range from -.249 to .093. On only one item -- Help was readily available for questions and/or homework outside of class (help) -- was the correlation significantly different from zero. Four of the five correlations were negative, indicating higher ratings with smaller class sizes. Ratings and class size were then correlated separately for instructors who did not select items and those who did. For those who did not select items, the pattern of correlations is the same as for the entire sample; for those who selected items, however, the correlations are all positive, though small and non-significant.

On the question of availability of help, a significant negative correlation with class size (-.304) disappears for faculty who are interested in more feedback from students (.045). Similar results can be seen when only those courses in which faculty had the option to select items were considered (-.245/.045) and when only courses taught by non-tenure ranks (GATs, lecturers/instructors) were considered (-.383/.160). For tenured ranks, the correlation between ratings on help and class size goes from significant negative (-.300) to non-significant negative (-.106).

In general, the correlation between class size and ratings is higher (more positive) for faculty who selected items than for those who did not. In only one of the twenty pairs of correlations is the difference in the opposite direction. A sign test of all twenty pairs of correlations was significant (p<.001). Thus, the negative effect of increasing class size on teaching effectiveness in the eyes of students is erased by faculty who approach student ratings as formative.
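
The sign test result can be reproduced with an exact binomial calculation. The sketch below takes the reported figures (twenty pairs, one in the minority direction) and assumes a null hypothesis under which either direction is equally likely for each pair. The function name and the two-sided default are illustrative choices; the paper does not state whether its test was one-sided or two-sided.

```python
from math import comb


def sign_test_p(n_pairs, n_minority, two_sided=True):
    """Exact binomial sign test.

    Under H0 each pair is equally likely to differ in either direction.
    `n_minority` is the count of pairs in the less frequent direction.
    """
    if not 0 <= n_minority <= n_pairs // 2:
        raise ValueError("n_minority must lie between 0 and n_pairs // 2")
    # Probability of observing n_minority or fewer pairs in one direction.
    tail = sum(comb(n_pairs, k) for k in range(n_minority + 1)) / 2 ** n_pairs
    return min(1.0, 2 * tail) if two_sided else tail


# Reported values: N = 20 pairs, x = 1 pair in the opposite direction.
p_value = sign_test_p(20, 1)
print(p_value < 0.001)
```

With these inputs the tail probability is 21 / 2^20 for one side, so the result falls well below .001 under either the one-sided or the two-sided reading.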

Table 6. Correlations between class size and ratings of the course on each item.

Group                          N    repeat      fair grade  work amt    effective   help
Total Sample                   150  -0.052      -0.053      0.093       -0.048      -0.249*
  No selection                 122  -0.112      -0.108      0.072       -0.108      -0.304*
  Select items                 28   0.200       0.150       0.203       0.160       0.045
Courses with option to select  118  0.080       0.052       0.137       0.044       -0.175
  No selection                 90   0.049       0.023       0.127       0.009       -0.245*
  Select items                 28   0.200       0.150       0.203       0.160       0.045
Tenured ranks                  95   -0.182      -0.103      0.041       -0.156      -0.281*
  No selection                 78   -0.227*     -0.146      0.031       -0.205      -0.300*
  Select items                 17   -0.050      0.034       0.029       -0.019      -0.106
Non-tenure ranks               55   0.238       0.058       0.216       0.173       -0.207
  No selection                 44   0.192       -0.012      0.183       0.136       -0.383*
  Select items                 11   0.384       0.234       0.347       0.279       0.160

* p < .05. Sign test of differences in correlations between selection and non-selection (N = 20, x = 1): p < .001.
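
The significance flags in Table 6 can be checked approximately from each correlation and its sample size. The paper does not name the test it used, so the sketch below assumes a two-sided test of zero correlation based on Fisher's z transformation with a normal approximation. The function name is hypothetical, and for the smallest subgroups an exact t test could give slightly different p-values.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using Fisher's z approximation.

    The statistic atanh(r) * sqrt(n - 3) is treated as standard normal.
    """
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Total-sample correlations with class size from Table 6 (N = 150).
p_help = correlation_p_value(-0.249, 150)     # flagged with * in the table
p_work_amt = correlation_p_value(0.093, 150)  # not flagged
print(p_help < 0.05, p_work_amt < 0.05)
```

The same correlation can be significant in a large group and non-significant in a small one, which is why the subgroups of 28, 17, and 11 courses carry no flags despite some sizable coefficients.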

Correlations between mean ratings and instructor's publications are presented for each item in Table 7. All but one of the correlations are positive but most are non-significant. The number of publications correlates significantly with fair grading for the total sample, for courses with the option to select items, and for courses taught by tenure-rank faculty. For non-tenure ranks, publications correlate significantly with ratings on two items -- repeat and effective teacher.

The overall effect of selecting items is to increase the correlation between publications and ratings of effectiveness. For only one pair of correlations is the effect in the opposite direction. A sign test of all twenty pairs of correlations was significant (p<.001).

Table 7. Correlations between publications and ratings of the course on each item.

Group                          N    repeat      fair grade  work amt    effective   help
Total Sample                   150  0.135       0.187*      0.108       0.116       0.066
  No selection                 122  0.117       0.16        0.099       0.093       0.004
  Select items                 28   0.168       0.262       0.107       0.153       0.335
Courses with option to select  118  0.143       0.198*      0.15        0.14        0.085
  No selection                 90   0.119       0.165       0.145       0.114       -0.004
  Select items                 28   0.168       0.262       0.107       0.153       0.335
Tenured ranks                  95   0.171       0.254*      0.163       0.149       0.128
  No selection                 78   0.139       0.215       0.139       0.117       0.085
  Select items                 17   0.246       0.366       0.212       0.209       0.399
Non-tenure ranks               55   0.304*      0.23        0.259       0.288*      0.125
  No selection                 44   0.298*      0.201       0.212       0.289       0.02
  Select items                 11   0.421       0.415       0.554       0.402       0.543

* p < .05. Sign test of differences in correlations between selection and non-selection (N = 20, x = 1): p < .001.

Discussion

No mean differences in the ratings by course level should be welcome news to faculty who fear that having to teach lower level courses will adversely affect their ratings. Likewise, no mean difference by instructor rank should be welcome news. In general, it is hoped that extraneous variables will have a negligible impact on ratings of teaching effectiveness. Still, some caveats on the results are in order. First, rank and level are partially confounded in that GATs and lecturers/instructors teach fewer upper level and graduate courses than do higher ranked faculty. Second, course level is partially confounded with class size. Thus, independent estimates of each effect are nearly impossible to obtain.

One extraneous variable that appears to be important is class size. It stands to reason that large classes will be less favorably rated than small ones. It is not surprising that there is a significant negative correlation between class size and the rating of availability of help. It is encouraging that the correlation becomes almost zero when faculty select items. The interpretation we favor is that these instructors are more analytical about their teaching and find the student ratings useful in improving it. The major caveat to be offered is that the sample becomes very small when only instructors who select items are considered.

The mostly positive correlations between publications and student ratings suggest that teaching and research are not incompatible activities but complementary forms of scholarship. Although few of the correlations are significant, due in part to the small sample sizes, certain correlations stand out, particularly the correlations between publications and ratings of the fairness of grading. Perhaps instructors whose own scholarship has been evaluated and rated more often are, themselves, better evaluators of others' scholarship. The other significant correlations worth noting are those for non-tenure ranked faculty. The GATs and lecturers/instructors are hired to teach. For them, doing research is extra effort. Although the number of publications in this group is small, the positive correlations suggest that building scholarly credentials through research carries over to classroom teaching.

References

Aleamoni, L.M. (1981). "Student Ratings of Instruction." In Jason Millman (Ed.) Handbook of Teacher Evaluation. Beverly Hills: Sage Publications.

Aleamoni, L.M. and P.Z. Hexner. (1980). "A review of the research on student evaluations and a report on the effect of different sets of instructions on student course and instructor evaluations." Instructional Science, 9: 67-84.

Aleamoni, L.M. and M. Yimer. (1973). "An investigation of the relationship between colleague rating, student rating, research productivity, and academic rank in rating instructional effectiveness." Journal of Educational Psychology, 64: 274-277.

Derry, J.O., W.F. Seibert and A.R. Starry. (1974). The CAFETERIA System: A New Approach to Course and Instructor Evaluation. (IRB 74-1) West Lafayette, IN: Purdue University, Measurement and Research Center.

McDaniel, E.D. and J.F. Feldhusen. (1970). "Relationships between faculty ratings and indexes of service and scholarship." Proceedings of the 78th Annual Convention of the American Psychological Association, 5: 619-620.