This paper identifies the characteristics and effects of different testing methods on individuals and examines how these practices are reflected at the international level. It compares standardized testing with alternative testing and gives particular attention to students with special educational needs. The nature of each process is discussed, its impact at the individual and country level is considered, and recommendations are offered.
In the educational context, assessment and standardized testing are key matters. In the US, the No Child Left Behind Act, signed by President Bush on January 8, 2002, makes an explicit call for educational accountability. The law focuses on ways that lead to academic progress and reinforces the use of tests and assessments at the state level to monitor student progress toward 100 percent proficiency for all students by 2014. As a result, districts, schools, and teachers have placed a new emphasis on tests and additional assessments to monitor student learning and ensure that students can do well on state tests (The nature of assessment: A guide to standardized testing, retrieved from site). The standards underlie the quality of the assessment practices used by teachers and by state and federal agencies to measure student achievement. Standardized testing is the most important part of a systematic type of assessment, involving the collection and interpretation of educational data. As Popham (1999) concisely puts it, a standardized test is one that is “administered and scored in a predetermined, standard manner”. Standardization refers to the identical administration of the test, which must be applied under the same conditions every time. Such strict control of other variables makes it possible to infer that the results are attributable to students’ performance. Moreover, it allows schools, districts, or states to be compared on test performance. A great advantage of these tests is that they meet psychometric standards (standards of test study, design, and administration) for reliability, validity, and lack of bias (Zucker, 2003; Joint Committee on Testing Practices, 2004). Reliability means that the test can be administered repeatedly and yield approximately the same score; validity means that the test accurately measures the construct it is intended to measure.
These qualities of standardized tests explain their use on a large scale and support inferences and comparisons between groups of learners.
The premise that reinforced the need for this type of testing is the perception that effective education requires information about learning at different points during the process. Therefore, two kinds of assessment have evolved: formative and summative. Formative assessment provides information about learning in process. It may consist of weekly quizzes, tests, and essays given by teachers to their students. The results of formative assessments help teachers and students understand how students are progressing in learning and what adjustments need to be made in instruction (The nature of assessment: A guide to standardized testing, retrieved from site). Summative assessment refers to the “high-stakes”, “standardized” testing carried out by the states, which has created a great deal of controversy in the educational field. Summative assessment focuses on the state of student learning at certain end points in a student’s academic career: at the end of a school year, or at certain grades such as the 3rd, 5th, 8th, and 11th. It is a standardized form of summing up students’ learning experiences. The term “high-stakes testing” has been coined to label the consequences for schools (or students) that, as a result of state testing, fail to maintain a steady increase in achievement across certain categories of students (i.e., minority, poor, and special education students).
Thus, it becomes obvious that there is a great amount of pressure on schools to demonstrate academic progress, and this pressure comes mainly in the form of standardized testing, since high test scores have become a primary criterion against which the worth of an educational system is judged (Pawlak, J.).
The characteristic forms of standardized tests are multiple-choice, fill-in-the-blank, and true/false tests. These forms have been criticized because they are considered to measure only a few of the standards and to provide only a single measure of student learning at a certain point in time. Though efforts have been made to create more sophisticated multiple-choice tests, it remains true that tests of this kind “are clearly limited in the kinds of achievement they can measure” (Zucker, 2003). Another type of task integrated in standardized tests is the open-ended item, in which students are asked to respond either by writing a few sentences in short-answer form or by writing an extended essay. The “constructed response” allows students to display knowledge and apply critical thinking skills, but it is more difficult to score. At this point several types of large-scale standardized tests should be mentioned. Among these are the National Assessment of Educational Progress (NAEP); certain international tests such as the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA); college admissions tests (SAT, ACT); and combined programs of courses and examinations, of which Advanced Placement (AP) and International Baccalaureate (IB) are representative. Of these, NAEP is representative, as it covers a large spectrum of subject matters and US states. It is administered to students in the 4th, 8th, and 12th grades every two years in reading and mathematics, and at longer intervals in other academic subjects such as science, history, and geography. Fifty-one states and territories now participate in NAEP.
Returning to the debate concerning standardized tests, another issue is the group administration of these forms of evaluation, which, together with the decontextualized form in which they assess and promote learning, has been underlined by teachers and researchers as a major disadvantage. The high stakes attached to state tests have been identified as yet another weak point of national educational systems, since certain researchers consider this type of test a way of boosting scores, or an opportunity for educators to “devote most of their energy to raising students’ scores on conventional achievement tests” (Meisels et al., 2003). Such a line of reasoning highlights further criticisms of standardized tests, such as the fact that they are intended to assess the current status of students’ achievement and are not designed to determine appropriate instructional strategies. Instruction itself is negatively influenced by such tests, since it comes to rely mainly on worksheets, narrowing the curriculum and the acquisition of (cognitive) skills.
As far as educators are concerned, they are most often held responsible for the performance of their students. The sanctions or rewards to which they may be exposed follow from the publication of a school’s results in newspapers and on television; a comparative framework is often used in order to emphasize the performance of schools considered to be “failing”. In such a context, it has been underlined that all kinds of comparisons are encouraged, for example between urban and rural schools, which leaves room for inaccuracy and misjudgment even though the same scale is used (Natriello, 2000; Meier, 2002; Darling-Hammond, 2003, cited in Pawlak, J.). A negative effect on children has also been attested, in the form of test anxiety provoked by the high stakes. The factors identified as maintaining high levels of anxiety in children are the duration of testing (two or three hours on each test-taking day), the unfamiliar setting, their knowledge that the state-wide curriculum is evaluated by an end-of-the-year test, and, of course, the consequences that accompany stressful evaluative situations. Grade retention and denial of diplomas are two possible negative consequences that greatly affect students’ further academic achievement.
As shown above, the major criticism against standardized testing is the restriction it imposes on the educational opportunities of students. This is an argument for recognizing that this method, though practiced on a large scale and regarded by the United States educational system as the most objective and scientific way of assessing performance, is not the only existing measure of performance. The preoccupation with standardized tests coexists in the educational area with other forms of assessment of a more traditional, formative nature, based on students’ abilities to recognize, recall, or apply newly learned knowledge.
In this context educators have felt the need for a more authentic assessment, which would highlight improved strategies of teaching and learning. Authentic assessment, also referred to as performance assessment, portfolio assessment, curriculum-embedded instruction, or integrated education, represents an alternative to standardized testing. Alternative assessment has been described as “…an instructional-driven measurement in which students’ actual classroom performance is evaluated in terms of standards-infused criteria” (Meisels et al., 2003). This type of assessment is multidimensional, evaluating students’ learning by taking into account broader concepts of intelligence, ability, and intrapersonal and interpersonal abilities, as well as learning through visual, auditory, and kinesthetic modalities. The skills become an integral part of the instructional cycle, and the feedback provided by the teacher is meant to be formative (fed immediately back into learning and teaching) rather than summative (summarizing where students have reached in their development at the end of the topic). The distinction between standardized testing and alternative testing may thus be framed in the formative and summative terms described at the beginning of this paper.
The tasks used to assess performance contrast with multiple-choice exercises, since students are encouraged to express their knowledge through open-ended questions, essays, portfolios, story retelling, writing samples, projects and exhibitions, experiments and demonstrations, and integrated performance assessments. These require higher-order thinking, authentic tasks (tasks that are meaningful, challenging, engaging, and relevant in a real context), integrative tasks (tasks that call for a combination of skills across the curriculum), and constructed responses (which value the process as well as the product of the answer) (Callison, 1998, cited in Pawlak, J.).
Standardized testing in the form of traditional multiple-choice, short-answer, true/false, or fill-in-the-blank tests provides only a single measure of student learning and considers only a narrow portion of students’ abilities. By contrast, curriculum-embedded performance assessment, which encompasses developmental guidelines and checklists, portfolios, summary reports, and so on, provides a more comprehensive evaluation and facilitation of learning and progress. This line of practice is very different from the standardized one, since the latter is based on government regulations that include annual targets for academic achievement, participation in assessments, and high school graduation rates. The debate arises mostly from the fact that the established targets must be applied to the major racial and ethnic groups, the economically disadvantaged, special education students, and students with limited English proficiency (Jewell, M.). Taking this into consideration, it becomes clear that alternative assessment is more effective when working with students with special needs, since including a variety of types of assessment provides students with ample opportunities to demonstrate their abilities at their own pace. Moreover, each individual is taken into consideration, performance is assessed using multiple sources, and instruction may be adapted to one’s special learning needs. In standardized testing, the format is more rigid and fosters stress; it presumes a universal learning pace and assesses learning as a summary of one educational stage. The assessment is carried out in groups, and the individual receives a numeric score that determines whether he or she passes or fails. This outcome is the only feedback individual students receive in standardized testing. In alternative assessment, feedback is very important and reinforces learning, stimulating performance on a gradual, long-term basis.
On the other hand, the performance of students with special needs may be improved through advance preparation and familiarization with the content. Jewell insisted that schools should ensure that special education students and students with limited English proficiency receive the appropriate accommodations permitted by the test. In addition, thorough instruction of students in appropriate test-taking strategies will help improve test performance and reduce test anxiety. Arnold acknowledged the necessity of including students with special educational needs in the assessment and accountability system; it is therefore of critical importance to ensure appropriate allocation of resources and learning opportunities for these students.
After considering the individual differences in testing and the relationships with various testing formats, an important matter to be discussed is the comparison of the US assessment standards with those of other countries.
The US educational system consists of different levels: primary school, secondary school, undergraduate level, and graduate level. The major assessments are usually carried out at the end of a school year, or at certain grades such as the 3rd, 5th, 8th, and 11th. The majority of students are enrolled in public schools. More specifically, students are measured annually in reading and math from third through eighth grade and at least once during high school (“Education in the United States,” 2002, cited in Huang, Kelly). The main goal of such testing is to indicate whether a school has been improving or not. Another type of standardized testing applies to students planning to pursue postsecondary education. Usually during high school, students are required to take either the Scholastic Aptitude Test (SAT) or the American College Testing (ACT) in order to qualify for most universities (“Education in the United States,” 2002). In China, on the other hand, standardized testing has a greater impact on students’ further education. Junior high school students start to prepare to score high enough on the national senior high school entrance exams at the end of the 8th grade, so that they can attend one of the most prestigious senior high schools in the country (“Education in the Republic of China,” 2005, cited in Huang, Kelly). After getting into a senior high school, the main goal is to score well enough on the national university entrance exams in order to attend a four-year university. If they fail the exam, they cannot apply to any of these schools. The assessment systems of China and Japan are similar to that of the US in making use of standardized testing. However, it appears that these countries place an even stronger emphasis on standardized testing than the US does. Due to the strong reinforcement of high-stakes testing in China and Japan, the negative effects among students are even more prominent than for US students.
Japanese and Chinese students study for longer than their American counterparts, are exposed to higher levels of stress, are more anxious in evaluative situations, and have a stronger sense of academic achievement. Such high academic expectations in these countries lead to low self-esteem and to high rates of depression and suicide among students (“Education in the Republic of China,” 2005). However, there is also a difference in performance between the three countries, since Japanese and Chinese students score higher than American and other international students at different testing sessions. Moreover, another distinction is made in terms of curriculum and instructional strategy. In Japan, for instance, teachers teach the same curriculum and use the same instructional strategies. They strive to teach the same thing in the same way. In the US, however, teachers show greater flexibility towards curriculum and instruction. They also make use of alternative testing and feel free to use any kind of strategy or material they consider would help instruction.
As for the European educational systems, a general feature is that final-year exams are very common (for example, in France and Germany). In addition, several European countries (the United Kingdom, the Netherlands, Slovenia, and Lithuania) combine school-leaving examinations with university entrance examinations. As in the US, tests are carried out at a national level and are based on the curriculum. The preoccupation with examinations linked to explicit national (or state) standards is accompanied by a concern for alignment with international expectations, as reflected, for instance, in the indicators of the Organisation for Economic Co-operation and Development (OECD) or the results of multinational assessments such as the Third International Mathematics and Science Study (TIMSS) (Crighton, J.). The US debate concerning national testing is also heard in the UK, for instance, where student performance on national curriculum key stage testing at ages seven, eleven, fourteen, and sixteen has led to the publication of “league tables” listing schools in order of their students’ performance.
The analysis of various educational systems as reflected in testing practices may be summarized by identifying two major tendencies: the focus on national standards and the focus on competences. The focus on national standards is reflected in the summative approach to education and is less adequate when it comes to students with special educational needs. The emphasis on competences is better illustrated by formative or alternative assessments and instructional practices. This kind of approach is more adequate for students with special needs, since it allows for multi-source assessment, is concerned with individual performance, and makes it possible to monitor progress.
However, individuals and countries differ when it comes to such issues. Where teachers and students are concerned, it is important to take into account both sides of assessment: the objective, standardized one, and the more qualitative and subjective one. Both are important for assessing students and are relevant for developing further instructional strategies and governmental policies. Students with special needs benefit more from a qualitative approach. As for countries, differences arise in terms of system organization and emphasis on standards or competences. Nevertheless, the belief systems and values of a nation have a great impact on educational assessment (for instance, the case of Japan).
1. Popham, W. J. (1999). Why standardized tests don’t measure educational quality. Educational Leadership, 56(6), 8-15.
2. Meisels, S., Atkins-Burnett, S., Xue, Y. (2003). Creating a System of Accountability: The Impact of Instructional Assessment on Elementary Children’s Achievement Test Scores. Education Policy Analysis Archives, 11(9).
3. Huang, Kelly. Standardized Testing. Retrieved from site on March 22.
4. Crighton, Johanna V. Standardized Tests and Educational Policy. Retrieved from site on March 22. http://education.stateuniversity.com/pages/2505/Testing.html
5. Zucker, S. (2003). Fundamentals of standardized testing. San Antonio, TX: Harcourt Assessment, Inc.
6. Joint Committee on Testing Practices. (2004). Code of fair testing practices in education (Revised). Washington, D.C.: American Psychological Association.
7. The nature of assessment: A guide to standardized testing. Retrieved from site on March 22. www.centerforpubliceducation.org/site/c.kjJXJ5MPIwE/b.2506203/k.680A/Standards_and_testing.htm
8. Jewell, M. “No Child Left Behind:” Implications for Special Education Students and Students with Limited English Proficiency. Retrieved from site on March 22. http://www.newhorizons.org/spneeds/improvement/jewell.htm
9. Arnold, Nancy. Introduction to Alternate Assessments. Retrieved from site on March 22. http://www.newhorizons.org/spneeds/inclusion/teaching/arnold.htm
10. Pawlak, Julie. Standardized Testing vs. Authentic Assessment. Retrieved from site on March 22. www.bankstreet.edu/gems/neweducators/standardizedtestingvs.doc