The National Educational Panel Study (NEPS), a newly established large-scale assessment study in Germany, has taken on the challenge of including students with special educational needs (SEN) in its conceptual design. In particular, students with SEN in the area of learning (SEN-L) are oversampled within the NEPS. Their educational biographies and relevant context factors will be assessed longitudinally on the basis of interviews and questionnaires administered to their parents, teachers, and school principals. Obtaining data (test data, questionnaires) from the target subjects themselves, however, is by no means a simple, straightforward endeavor but requires careful research strategies. In this article we will briefly discuss the problems of specifying the target population of students with SEN-L and present a focused review of the research literature relevant to including students with SEN-L in large-scale assessments. Specifically, we will focus on the challenges of standardized, reliable, and valid testing of the competencies of students with SEN-L within large-scale assessments. In addition, the article outlines the basic design of feasibility studies within the NEPS that test whether competence assessments in special schools are structurally comparable to those in regular schools. These studies will further explore the necessity of test accommodations for students with SEN-L and the impact of such accommodations on the validity and comparability of test scores of students with and without SEN-L.
Abstract. This editorial introduces a special issue of Large-Scale Assessments in Education (LSAE) that addresses key challenges in analyzing longitudinal data from large-scale studies. These challenges include ensuring fair measurement across time, developing common metrics, and correcting for measurement error. The special issue highlights recent methodological innovations, particularly for studies such as the National Educational Panel Study (NEPS), and provides approaches for improving the accuracy and robustness of longitudinal educational research. The papers in this issue present advances in methods for estimating trends, incorporating background information, and analyzing longitudinal relationships between constructs. Innovative approaches such as Bayesian modeling for borrowing historical information, continuous-time models for capturing developmental trends, and plausible value estimation provide practical solutions for researchers working with complex longitudinal data. In addition, the issue presents new software tools that facilitate the implementation of these advanced methodologies. Together, these papers contribute to both the theory and practice of educational assessment and provide valuable insights for those working with longitudinal data in national and international panel studies.
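To make the plausible value idea mentioned above concrete (a generic sketch in our own notation, not the specific models developed in the issue): plausible values are multiple random draws of a respondent's latent ability from its posterior distribution, given the item responses \(x_i\) and background variables \(z_i\),
\[
\theta_i^{(1)},\dots,\theta_i^{(M)} \;\sim\; p(\theta_i \mid x_i, z_i) \;\propto\; P(x_i \mid \theta_i)\, \phi\!\left(\theta_i \mid z_i^{\top}\gamma,\ \sigma^2\right),
\]
where \(P(x_i \mid \theta_i)\) is the IRT measurement model and the conditioning model is a latent regression on the background variables. Secondary analyses are then run on each of the \(M\) draws and combined with the usual multiple-imputation rules.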
The Berkeley Evaluation and Assessment Research (BEAR) Center has for the last several years been involved in the development of an assessment system, which we call the BEAR Assessment System. The system consists of four principles, each associated with a practical “building block” [Wilson 2005], as well as an activity that helps integrate the four parts (see the section starting on p. 325). Its original deployment was as a curriculum-embedded system in science [Wilson et al. 2000], but it has clear and logical extensions to other contexts, such as higher education [Wilson and Scalise 2006] and large-scale assessment [Wilson 2005], and to disciplinary areas such as chemistry [Claesgens et al. 2002] and, the focus of this chapter, mathematics.
In this paper, we use data from the German PISA 2003 sample to study the effects of central exit examinations on student performance, student attitudes, and teacher behavior. Unlike earlier studies, we use (i) a value-added measure to pin down the effect of central exit exams on learning in the last year before the exam and (ii) separate test scores for mathematical literacy and curriculum-based knowledge. The findings indicate that central exit exams improve only curriculum-based knowledge and do not affect mathematical literacy. Moreover, teachers in German states with central exit examinations are more active and tend to be more performance-oriented. Students, although performing better, are less motivated in school.
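A generic value-added specification of the kind referred to above (a sketch in our own notation; the paper's exact model may differ) regresses the test score at the end of the final year on the score one year earlier, so that the central exam indicator picks up learning gains in that year rather than accumulated prior achievement:
\[
y_{is}^{t} \;=\; \alpha \;+\; \gamma\, y_{is}^{t-1} \;+\; \delta\, \mathit{CE}_s \;+\; x_{is}^{\top}\beta \;+\; \varepsilon_{is},
\]
where \(y_{is}^{t}\) and \(y_{is}^{t-1}\) are the test scores of student \(i\) in state \(s\) at the two measurement occasions, \(\mathit{CE}_s\) indicates whether state \(s\) has central exit examinations, and \(x_{is}\) collects student and school controls; \(\delta\) is the value-added effect of interest.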
Abstract. Response styles can influence item responses in addition to a respondent’s latent trait level. A common concern is that comparisons between individuals based on sum scores may be rendered invalid by response style effects. This paper investigates a multidimensional approach to modeling traits and response styles simultaneously. Models incorporating different response styles as well as personality traits (Big Five facets) were compared with regard to model fit. Relationships between traits and response styles were investigated, and different approaches to modeling extreme response style (ERS) were compared with regard to their effects on trait estimates. All multidimensional models showed a better fit than the unidimensional models, indicating that response styles influenced item responses, with ERS showing the largest incremental variance explanation. ERS and midpoint response style were largely trait-independent, whereas acquiescence and disacquiescence were strongly related to several personality traits. Expected a posteriori estimates of participants’ trait levels did not differ substantially between two-dimensional and unidimensional models when a set of heterogeneous items was used to model ERS. A minor adjustment of trait estimates occurred when the same items were used to model ERS and the trait, although the ERS dimension in this approach reflected only scale-specific ERS rather than a general ERS tendency.
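One common way to formalize such a multidimensional trait-plus-ERS model (a sketch of a standard parameterization, not necessarily the exact one used here) extends the partial credit model with an additional ERS dimension that loads only on the extreme response categories:
\[
P(X_{ij}=k \mid \theta_i,\eta_i) \;=\; \frac{\exp\!\Big(k\,\theta_i + w_k\,\eta_i - \sum_{l=1}^{k}\delta_{jl}\Big)}{\sum_{m=0}^{K}\exp\!\Big(m\,\theta_i + w_m\,\eta_i - \sum_{l=1}^{m}\delta_{jl}\Big)},
\]
where \(\theta_i\) is the substantive trait, \(\eta_i\) the ERS dimension, \(\delta_{jl}\) are item-category thresholds (with the empty sum defined as zero), and the scoring weights are \(w_0 = w_K = 1\) and \(w_k = 0\) otherwise, so that \(\eta_i\) raises the probability of endorsing the two extreme categories.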
Assessment literacy is a crucial aspect of teachers’ professional knowledge and is relevant to fostering students’ learning. With regard to experimentation, teachers have to be able to assess student achievement when students form hypotheses, design experiments, and analyze data. Teachers therefore need to be familiar with criteria for experimentation as well as with student conceptions of experimentation. The present study modeled and measured 495 German pre-service teachers’ knowledge of what to assess regarding experimentation competences in biology. The measurement instrument used an open-answer format, and responses were modeled using item response theory (IRT). We argue that knowledge of what to assess regarding experimentation competences is a one-dimensional construct, and we provide evidence for the validity of the measurement. Furthermore, we describe qualitative findings on pre-service teachers’ knowledge of what to assess, in particular difficulties concerning the assessment of student conceptions and the use of scientific terms in the assessments. We discuss the findings in terms of implications for science teacher education and further research perspectives.
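For reference, a minimal one-dimensional IRT formulation consistent with this claim (assuming dichotomously scored open answers; the study's actual scoring and model, e.g. a partial credit variant, may differ) is the Rasch model
\[
P(X_{ij}=1 \mid \theta_i,\beta_j) \;=\; \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)},
\]
where \(\theta_i\) is pre-service teacher \(i\)'s knowledge of what to assess and \(\beta_j\) is the difficulty of item \(j\); unidimensionality is then examined by comparing the fit of this model with multidimensional alternatives.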