Shifting Assessment

I grew up in the small town of Laramie, Wyoming, and I don’t specifically remember taking a standardized assessment until I was in high school. Of course, in my classes before that, many teachers used the standard method of a high-stakes summative assessment at the end of a unit or lesson. The process was nearly always the same: an occasional pre-assessment, followed by the lesson or unit, with the looming test at the end. This was a model I was able to master. Not the material, mind you, but the model. The priority was placed on the grade I would get, and because of this I worried less about the learning goal and more about outperforming others. Research confirms this effect of grades in general: they shift student focus from learning toward performance and comparison (Ryan & Deci, 2020).

As I mentioned before, the first standardized test I remember taking was in high school. This would have been around 2002, my junior year. That timing makes sense, considering that 2002 was the year No Child Left Behind (NCLB) effectively took hold in schools. NCLB mandated that states give statewide tests once in grades 10 through 12 (Lee, n.d.), and the test I remember must have been part of this mandate. At the time, I was very anxious because I had put so much importance on the outcome. When we actually got the test, however, it was very simplistic. I clearly remember a question that asked me, a high school junior, to read and write down the time shown on a clock. The whole experience was unnerving and somewhat patronizing.

It wasn’t until I got to college that I came into contact with rubrics. I had chosen to study theater arts, so I not only had summative assessments in subjects like theater history but was also being assessed on actual performances. For these performance projects to be assessed effectively, several of my professors had created rubrics. It was a game-changer. I loved knowing up front what was expected of me and, conversely, knowing exactly what I had or had not accomplished at the end.

Unfortunately, some of the less helpful pieces of my experience are still around. According to Ryan and Deci (2020), grading remains pervasive in schools around the world and is almost synonymous with school itself, even though there is little evidence of its positive effect on learning. However, many educators have made strides toward de-emphasizing grades by using a variety of formative assessments along with rubrics for student learning.

Standardized tests like the one I took in high school are still very present in our system today. High-stakes testing (HST) has often been ineffective (Ryan & Deci, 2020), but it remains a necessity for gathering large-scale data. Although standardized testing is still hotly debated, sources like the Ohio Department of Education (ODOE) (2016) give actionable, practical guidelines for creating and choosing assessments that are better formulated for students. The Every Student Succeeds Act (ESSA) also brought about many changes from NCLB in testing and school accountability, which are well organized in the infographic below from Educators for Excellence (2016).

[Infographic: No Child Left Behind v. The Every Student Succeeds Act (Educators for Excellence, 2016)]

Overall, in my experience, the purpose and process of assessment have become better defined since my time in school. Even in just the last 10 years of teaching in my elementary school, the landscape of assessment has shifted for the better, hopefully leading more students to mastery and understanding.

References

Educators for Excellence. (2016, October 4). Infographic: No Child Left Behind v. The Every Student Succeeds Act [Web log post]. Retrieved from https://e4e.org/blog-news/blog/infographic-no-child-left-behind-v-every-student-succeeds-act

Lee, A. M., JD. (n.d.). No Child Left Behind (NCLB): What You Need to Know [Web log post]. Retrieved from https://www.understood.org/en/school-learning/your-childs-rights/basics-about-childs-rights/no-child-left-behind-nclb-what-you-need-to-know

Ohio Department of Education (ODOE). (2016). A Guide to Using SLOs as a Locally-Determined Measure of Student Growth (Rep. No. Guidebook). Retrieved from https://education.ohio.gov/getattachment/Topics/Teaching/Educator-Evaluation-System/Ohio-s-Teacher-Evaluation-System/Student-Growth-Measures/Student-Learning-Objective-Examples/SLO-Guidebook-041516.pdf.aspx

Ryan, R. M., & Deci, E. L. (2020). Intrinsic and extrinsic motivation from a self-determination theory perspective: Definitions, theory, practices, and future directions. Contemporary Educational Psychology, 61. Retrieved from https://selfdeterminationtheory.org/wp-content/uploads/2020/04/2020_RyanDeci_CEP_PrePrint.pdf

The Philosophical Battle of Creating Assessments

Who should be in charge of creating assessments for Student Learning Objectives (SLOs)? Assessments are at the heart of the SLO process and are the main gauge of student performance evaluated by educators (RSN, 2014). Therefore, this question is of the utmost importance. There are two main schools of thought on the answer. One holds that pre-approved, standardized assessments should be used by teachers and districts to keep validity consistent. The other, a more flexible approach, holds that an assessment can be “...any measure that allows students to effectively demonstrate what they know and can do…” (RSN, 2014, p. 14), and can be created by teacher teams or include standardized assessments.

Let’s look at both of these options through the lens of three important criteria for assessment laid out by the Ohio Department of Education (2016): (a) alignment to standards, (b) stretch, and (c) validity and reliability.

First, how well do these two schools of thought meet the requirement that assessments align to standards? Alignment means that items on the assessment cover all the standards for that grade or subject, do not cover standards outside the scope of the course or grade, and are distributed in proportion to the time spent on each standard (ODOE, 2016). Pre-approved or commercially constructed assessments may have the advantage here, as they are inherently built on standards, with content tied directly to Common Core or state-level standards. However, these pre-built tests may break the other caveats noted above. Because they are produced at a national or state level, they may include questions on standards not covered in a given course or grade, or omit questions on standards that were, detracting from the validity of the test itself.
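
To make the distribution criterion concrete, here is a minimal sketch of allocating test items in proportion to instructional time. This is my own illustration, not an ODOE procedure; the standard labels, hour counts, and item total are all hypothetical.

```python
# Illustrative only: distribute assessment items in proportion to the
# instructional time spent on each standard (hypothetical numbers).

instructional_hours = {
    "Standard A": 12,
    "Standard B": 8,
    "Standard C": 4,
}

total_items = 30
total_hours = sum(instructional_hours.values())

for standard, hours in instructional_hours.items():
    # Round to whole items; a real test blueprint would reconcile any
    # rounding so the counts still sum to total_items.
    items = round(total_items * hours / total_hours)
    print(f"{standard}: {items} of {total_items} items "
          f"({hours} of {total_hours} instructional hours)")
```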

Assessments created by teacher teams may lack the streamlined connection to subject or grade that pre-approved tests provide. In addition, if teacher teams from different districts and schools all create their own assessments, measuring student growth at the macro scale of a district or state becomes more complex, since there is no standardized format or common list of questions. This could also be seen as an advantage, however. Especially in the current landscape of pandemic recovery, teachers know which standards have been prioritized at the district and school levels and may be able to create assessments that better represent the standards actually addressed. This would hold true even without a pandemic in the picture.

How do these approaches align with the idea of stretch? Stretch refers to the ability of an assessment to show the growth of both the lowest- and highest-achieving students (ODOE, 2016). Although many pre-approved tests have this in mind, they may be less flexible in this area. Because they are by definition standardized, they may lack the questions needed to show the achievement of both the highest and lowest learners. Another consideration is that content is often not changed for students with different needs; rather, accommodations are created so those students can take the same test.

In contrast, an assessment built by a teacher team may have input from the teachers of those high- and low-achieving students, creating the opportunity to write sections and questions that rigorously examine the growth of both groups. The pitfall to avoid here is creating a test geared specifically toward low or high learners, as that skews the validity and reliability of the data collected. Teachers’ input can also produce an assessment that is not simply loaded with accommodations but designed with those students in mind from the start.

Finally, how do these models stand up to the criteria of validity and reliability? In other words, assessments should produce consistent results and measure what they are intended to measure (ODOE, 2016). Assessments can be evaluated against four important guideposts:

  • The assessment should not use overly complicated vocabulary or language, unless it is testing reading skills.

  • Test items should be written clearly and concisely.  Performance assessments must have clear steps.

  • Clear rubrics and scoring guides should be provided, especially for performance assessments.

  • Testing conditions should be consistent across classes (ODOE, 2016).

Pre-approved assessments may or may not follow these guideposts. Often these assessments have been created by teams that think through the wording of questions exhaustively. Even so, there have been many examples of questions that students could not understand because of wording, or even because of the choice of activity in a story problem. Standardized assessments are often written with majority groups in mind, which can leave some minority groups confused by wording or topic (Kim & Zabelina, 2015). These tests can also have confusing scoring guides, or may lack rubrics entirely, since standardized performance assessment is still relatively young. They do have the distinct advantage of consistent conditions and instructions, since those are often strictly laid out.

Teacher-created assessments may solve some of these problems. Because teachers know their students, they may be able to create assessments that are manageable for students while avoiding culturally biased topics. Teacher teams can write rubrics for specific performance assessments as those assessments are created. However, poorly constructed tests, rubrics, or scoring guides can undermine validity and reliability. Consistency of testing conditions would rest on the shoulders of teachers, schools, and districts to keep as standardized as possible. Avoiding large incongruities would certainly mean more work for the educators on the ground.

Overall, there are pros and cons to each system, and a basic philosophical difference lies between the two schools of thought on assessing for SLOs. In the corner of pre-approved assessments, the philosophy is one of consistency and uniformity in gathering data; its logic and practicality are obvious. In the other corner, the philosophy is one of shifting power to the teachers and educators who work with students every day, trading uniformity for the hope of deeper understanding. John R. Troutman McCrann (2018) sums up this philosophy in one pithy comment: “...it starts and ends with a very simple idea: I, the students' teacher, have expert knowledge about my students and...content standards, so I ought to have the power to assess those students' growth on those standards” (para. 5). The issue is a complex one, but it demands debate to ensure that the students who put the “S” in SLO achieve all that they can.

References

Kim, K. H., & Zabelina, D. (2015). Cultural Bias in Assessment: Can Creativity Assessment Help? International Journal of Critical Pedagogy, 6, 129–147. Retrieved from http://libjournal.uncg.edu/ijcp/article/viewFile/301/856

Ohio Department of Education (ODOE). (2016). A Guide to Using SLOs as a Locally-Determined Measure of Student Growth (Rep. No. Guidebook). Retrieved from https://education.ohio.gov/getattachment/Topics/Teaching/Educator-Evaluation-System/Ohio-s-Teacher-Evaluation-System/Student-Growth-Measures/Student-Learning-Objective-Examples/SLO-Guidebook-041516.pdf.aspx

Reform Support Network (RSN). (2014). A toolkit for implementing high-quality student learning objectives 2.0. Retrieved from https://www2.ed.gov/about/inits/ed/implementation-support-unit/tech-assist/toolkit-implementing-learning-objectives-2-0.pdf

Troutman McCrann, J. R. (2018). Putting Assessment Back in the Hands of Teachers. Educational Leadership, 75(5), 41-45. Retrieved from http://www.ascd.org/publications/educational-leadership/feb18/vol75/num05/Putting-Assessment-Back-in-the-Hands-of-Teachers.aspx

Are Summative Tests Working?

Even though the landscape of education has changed dramatically in the last 70 years, the format of standardized tests has remained oddly the same since the 1950s (Bryant, 2018). Here in Colorado, students prepare to take the Colorado Measures of Academic Success (CMAS) assessments each spring. Having been a proctor and technology assistant for the testing, I can attest that it looks much the way a standardized test has always looked, except with computers. Students must sit in a defined space, making no noise, under strict time limits, answering mostly multiple-choice questions. The format very much adheres to the values of the industrialized 20th century: cost efficiency, quantifiability, uniformity, speed, and mass production (Bryant, 2018). So our questions should be: does that age-old format work for modern learners, and if not, what should replace the current system?

Let’s take a look at the first question. Other Organisation for Economic Co-operation and Development (OECD) countries have focused their energy on test validity, while the U.S. has stayed on the track of test reliability (Vander Ark, 2019). On the Programme for International Student Assessment (PISA) given in 2015, the U.S. ranked in the middle of the pack, well behind many other advanced industrial nations (DeSilver, 2017). The Pew Research Center’s analysis of those results shows several other nations far ahead of the U.S. in math, science, and reading. Of course, these numbers are themselves derived from a standardized test, and full systems of education are much more complex than summative tests alone. Still, it can be gleaned from these numbers that something isn’t quite right with the current system. Bryant (2018) asserts that “...the fixed content and rigid testing conditions [used in the U.S.] severely constrain the skills and knowledge that can be assessed” (para. 3). This includes skills built through substantive, authentic learning experiences, as well as essential non-cognitive skills like resilience and collaboration (Bryant, 2018). On a simpler level, standardized tests do not take into account all the knowledge teachers have of their students (Vander Ark, 2019).

So what should replace this system if it isn’t working? We need, and will continue to need, some form of large-scale testing. Classroom-based assessments alone are not enough for districts, states, or the country to gather data. Without that data, we could not see what is working and what is not, where resources are needed, or how groups and subgroups differ (Bryant, 2018). Since this is the case, we need to make the tests more valuable by designing them to encompass a wider range of important academic skills. In his book The Promise of Next Generation Assessment, David Conley outlines 10 principles for better assessment (Getting Smart Staff, 2018), including the idea of cumulative validity. This idea takes advantage of many points of data about a student, which can include classroom-based evidence, continuous assessment, real-world and performance-based assessments, diploma networks (like IB), and AI grading of digital portfolios. All of these can be used to push the large-scale summative test toward validity and, ultimately, the benefit of students.
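
As a back-of-the-envelope illustration of cumulative validity, the sketch below combines several points of evidence about one student into a single weighted composite. This is my own sketch of the aggregation idea, not Conley’s method; the evidence categories, scores, and weights are all hypothetical.

```python
# Illustrative only: "cumulative validity" as a weighted combination of
# several evidence sources for one student (hypothetical data).

evidence = {
    "classroom_based":  (82, 0.30),  # classroom-based evidence
    "continuous":       (75, 0.25),  # ongoing formative checks
    "performance_task": (88, 0.25),  # real-world, performance-based task
    "portfolio":        (79, 0.20),  # scored digital portfolio
}

total_weight = sum(weight for _, weight in evidence.values())
composite = sum(score * weight for score, weight in evidence.values()) / total_weight

print(f"Composite growth score: {composite:.1f} (0-100 scale)")
```

The point of the sketch is simply that no single test carries the whole judgment; each source contributes evidence in proportion to the weight a district or state chooses to give it.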

References

Bryant, W. (2018, February 20). The Future of Testing. Getting Smart. Retrieved from https://www.gettingsmart.com/2018/02/the-future-of-testing/

DeSilver, D. (2017, February 15). U.S. students’ academic achievement still lags that of their peers in many other countries. Pew Research Center. Retrieved from https://www.pewresearch.org/fact-tank/2017/02/15/u-s-students-internationally-math-science/

Getting Smart Staff. (2018, September 20). David Conley on Next Generation Assessment. Getting Smart. Retrieved from https://www.gettingsmart.com/2018/09/david-conley-on-next-generation-assessment/

Vander Ark, T. (2019, April 1). A Proposal For The End Of Standardized Testing. Forbes. Retrieved from https://www.forbes.com/sites/tomvanderark/2019/04/01/a-proposal-for-the-end-of-standardized-testing/#7860494621d8