Evidence of Reliability and Validity

The ATLAST teacher assessments have strong evidence of validity and reliability.

Validity

Three lines of evidence support the argument that the assessments are valid measures of teachers’ knowledge of the targeted science concepts. First, cognitive interviews with teachers (see Writing Items and Cognitive Interviews) established that teachers interpret the items as intended and must use their content knowledge to answer the items correctly. Second, a panel of three content experts (e.g., individuals with a Ph.D. in physics) reviewed the assessment items (see Expert Review) at three stages (see Development) to ensure content accuracy. A panel also reviewed the final assessment and judged it to be an adequate measure of the content domain. Finally, dimensionality analyses (including both factor analysis and cluster analysis) indicate that all items on the assessment measure a single dominant trait.
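To illustrate what a "single dominant trait" looks like in data, the sketch below inspects the eigenvalues of an inter-item correlation matrix. This is a simplified stand-in, not the factor or cluster analysis used for the ATLAST assessments. The item scores are simulated from a hypothetical one-trait model, and the sample size, number of items, and loadings are all assumptions chosen for illustration.

```python
import numpy as np

# Simulated data only: 500 hypothetical teachers, 8 hypothetical items,
# all driven by one underlying ability plus item-specific noise.
rng = np.random.default_rng(0)
n_teachers, n_items = 500, 8
ability = rng.normal(size=(n_teachers, 1))
noise = rng.normal(size=(n_teachers, n_items))
item_scores = 0.8 * ability + 0.6 * noise

# Eigenvalues of the inter-item correlation matrix, largest first.
correlations = np.corrcoef(item_scores, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(correlations))[::-1]

# A first eigenvalue much larger than the second suggests one dominant dimension.
first_to_second_ratio = eigenvalues[0] / eigenvalues[1]
print(eigenvalues.round(2), round(first_to_second_ratio, 1))
```

With real response data, the same check would be run on the observed item scores instead of the simulated matrix.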

HRI conducted a separate investigation of teacher assessment validity. Each assessment was administered to approximately 100 middle school science teachers. For each assessment, the teachers with the five lowest scores and the teachers with the five highest scores were then interviewed by content area experts who did not know how the individuals had scored. Based on the interview, each teacher was categorized as having either extensive or limited content knowledge. These categorizations were then compared with the assessment scores as an indication of validity. Across all 30 teachers interviewed, there was only one instance in which the expert’s categorization did not agree with the test score.
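The agreement between expert categorizations and test scores can be expressed as a simple rate. The short sketch below uses only the two counts reported above (30 teachers, 1 disagreement); the variable names are illustrative.

```python
# Counts taken from the text: 30 teachers interviewed, 1 disagreement.
n_interviewed = 30
n_disagreements = 1

# Share of teachers for whom the expert categorization matched the test score.
agreement_rate = (n_interviewed - n_disagreements) / n_interviewed
print(f"{agreement_rate:.1%}")  # 96.7%
```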

Reliability

The ATLAST teacher assessments have strong internal reliability and test-retest reliability.


Internal Reliability

Using data from the final field test, the internal reliability of each assessment was calculated using both a classical approach (Cronbach’s alpha) and Item Response Theory (IRT).  The IRT internal reliability for each teacher assessment is given below:

  • Flow of Matter and Energy: 0.85
  • Force & Motion: 0.85
  • Plate Tectonics: 0.86

The Cronbach’s alpha values were similarly high.
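For readers unfamiliar with the classical approach, the following is a minimal sketch of how Cronbach’s alpha is computed from a matrix of item scores. It uses the standard formula, not ATLAST’s actual analysis code, and the response matrix is made up for illustration. The function name `cronbach_alpha` is hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)       # sample variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # sample variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up responses (1 = correct, 0 = incorrect): five teachers, four items.
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 3))  # 0.8
```

Alpha rises when items vary together, so that the variance of the total score is large relative to the sum of the individual item variances.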

Test-Retest Reliability

To study test-retest reliability, ATLAST recruited approximately 100 middle grades science teachers for each assessment. Each teacher took the test twice, two weeks apart, with no intervening instruction. The test-retest reliability calculated for each assessment is given below.

  • Flow of Matter and Energy: 0.93
  • Force & Motion: 0.88
  • Plate Tectonics: 0.94
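The text does not state which statistic was used for test-retest reliability. A common choice is the Pearson correlation between scores from the two administrations, and the sketch below assumes that approach. The scores are made up, and the function name `retest_correlation` is hypothetical.

```python
import numpy as np

def retest_correlation(first_scores, second_scores):
    """Pearson correlation between two administrations of the same test.

    Assumption: test-retest reliability is estimated as a Pearson correlation;
    the source does not confirm the exact statistic used.
    """
    first = np.asarray(first_scores, dtype=float)
    second = np.asarray(second_scores, dtype=float)
    if first.shape != second.shape:
        raise ValueError("Both administrations must cover the same teachers.")
    return float(np.corrcoef(first, second)[0, 1])

# Made-up total scores for six teachers, listed in the same order both times.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 12]
r = retest_correlation(first_administration, second_administration)
print(round(r, 2))
```

A value near 1 means teachers kept nearly the same rank order across the two sittings.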

Although no standards exist for minimum reliability, values approaching 0.9 are generally considered good.[1]

[1] Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.