The New Jersey Department of Education has released two reports evaluating the first year of its teacher evaluation pilot. One was prepared by the Evaluation Pilot Advisory Committee and the other by the Rutgers University Graduate School of Education.
Here’s the memo that was sent out to all chief school administrators and charter school leaders.
After the passage of TEACHNJ, the Legislature's reform of teacher and principal evaluation and tenure law, the DOE selected ten districts to participate in a teacher evaluation pilot program. Districts, on a very short timeline, selected teacher evaluation rubrics (most chose Charlotte Danielson’s model) and started the time-sucking practice of lengthy, data-driven teacher evaluations.
Both reports praise the commitment of teachers and administrators as they pioneer a framework for fairly evaluating teachers based largely on student growth. They reference the difficulty of changing a culture where all teachers are deemed above average, and note that this endeavor is, by definition, a long process. Pilot districts have devoted enormous time and resources to extensive professional development, collaboration, and implementation.
Here are a few take-aways.
Principals are overwhelmed by the time involved in performing the evaluations. From the Rutgers report:
It is hard to see how the new expectations for teacher observation can be sustained and still meet the intent of the law if each observation is as time consuming as has been the case, if extensive numbers of observations are required, if the pool of observers is limited to current school administrators, and if those administrators must continue to do everything they are now doing.
The EPAC study also notes that pilot districts were constrained by the DOE’s late notification of their grant awards.
Administrators are gung-ho about the efficacy of the evaluation rubrics. Teachers? Not so much. From the Rutgers report:
[A]dministrators generally had a more positive view of these evaluation rubrics than did teachers. For instance, 74% of administrators agreed that the evaluation rubrics assessed teachers accurately, as did 32% of the teachers. Similarly, 75% of administrators agreed that the rubrics generated information that provided useful individual feedback or guidance for professional development, as did 53% of teachers. These differences may not be surprising given that teachers are being evaluated and administrators are not, but the patterns persist across many items.
For more on this, see Blue Jersey’s commentary.
The DOE hasn’t given enough guidance on how to evaluate teachers who are assigned to non-tested areas and subjects. From EPAC: “Going forward, and as soon as possible, the Department should provide clear guidance to districts in these areas.”
Most disturbingly, both the EPAC and Rutgers reports point out that while teachers are now to be graded on a four-point scale (different language is used in different rubrics, but it’s something like ineffective, partially effective, effective, and highly effective), evaluators mostly gave scores in the top two categories. This may be a result of a slowly changing culture or a tentativeness with the new evaluation instruments, but, as Rutgers points out, the recent MET study (also referenced in the DOE press release) did not show the same bias.
EPAC:
While some differentiation between teacher performance levels occurred in some districts, in the majority, there was a heavy weighting towards the effective and highly effective ratings. All of these figures show that a diminishingly small number of teachers were given an ineffective rating. Districts A and C had no teachers earn this rating (the zeroes are not rounded figures). In half of the districts for which there is data, no teachers were rated ineffective. In all cases, teacher ratings skew heavily towards the upper two categories.
Rutgers:
Overall, there was a significant range in how districts utilized the rubric. For example, in one district, 60% of teachers received the highest score, while in other districts, only 6% received the top score. However, the modal score was 3 in all but one district, indicating that the vast majority of teachers were judged using a term such as proficient. Few teachers in most districts received a score of 2 or below. In most districts, scores are clustered at 1 or 2 points that are relatively high on the scale. However, there are a substantial number of scores given in the lower part of the scale, especially for one district.
Hey, maybe all our teachers really are great.
However, if this new tenure law, intended to identify great teachers and dismiss terrible ones, dissolves into the same desultory practice of rating everyone “effective” and “highly effective,” then we’re left with the same system that the new legislation was designed to dismantle. Talk about a time-suck.
From Rutgers:
With at least two local teacher associations among the pilot districts challenging the legitimacy of the new system for teacher evaluation, the use of the new program for dismissal or other personnel decisions would likely be extremely tenuous for administrators this year. There was also evidence that administrators, especially principals, truly believe that all of their teachers are effective and that personnel decisions need not to be made. This mindset would also limit the use of the teacher evaluation tool for personnel decisions.
Labels: DOE, tenure, VAM