|
The validity of a test concerns what the test measures and how well it does so.
Whether a test is valid depends, in part, on its specific purpose.
For example, is the test valid for this particular purpose, in this
particular situation, for these particular participants? Only after
a test's reliability (consistency) has been established can
researchers consider a test's validity.
(1) Does the test cover the content of interest? For example, are the items on an achievement test for statistics based on statistical concepts?
(2) Is the test appropriate for your participants? For example, are the items geared toward college math majors or psychology majors?
(1) Subjective methods involve asking experts to judge the relevance of the test items.
(2) Empirical methods identify which test items can be grouped or categorized together.
Example: In a study (Williams, Weiss, & Rolfus, 2003) with 1,525 children, the 15 subtests of the WISC-IV formed four factors. Children tended to perform similarly on the subtests within each factor; that is, the subtests within each factor seemed to be measuring similar abilities. For example, the similarities, vocabulary, comprehension, information, and word reasoning subtests formed one factor. The researchers looked at the content of these subtests and decided to name the factor Verbal Comprehension. So if children do very well on defining vocabulary words, we would expect them to show strong social comprehension skills as well. This gives evidence for convergent validity within a factor.
The remaining 10 subtests did not fall into the verbal comprehension index; they clustered into three other indices. Just because children do very well in defining vocabulary words does not mean that they will tend to do well in copying designs with colored blocks. The existence of different factors or indices (i.e., verbal comprehension, perceptual reasoning, working memory, processing speed) suggests that children's performance may differ across indices: children who do very well on verbal comprehension subtests might do very poorly on working memory subtests. The fact that certain subtests fit better into one factor and not another gives evidence for discriminant validity.
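The convergent and discriminant pattern in the example can be illustrated with simulated data. The sketch below is not the analysis from the cited study: the sample size, the noise level, the two latent abilities, and the three subtest names are all assumptions chosen for illustration. It shows only that subtests driven by the same ability correlate strongly with each other, while subtests driven by different abilities do not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_children = 500  # hypothetical sample size, not the study's 1,525

# Two hypothetical latent abilities, generated independently of each other.
verbal = rng.normal(size=n_children)
perceptual = rng.normal(size=n_children)


def simulate_subtest(latent, noise_sd=0.6):
    """Simulate a subtest score as latent ability plus random noise."""
    return latent + rng.normal(scale=noise_sd, size=latent.shape)


# Hypothetical subtests: two load on the verbal ability, one on the perceptual.
scores = {
    "vocabulary": simulate_subtest(verbal),
    "comprehension": simulate_subtest(verbal),
    "block_design": simulate_subtest(perceptual),
}


def correlation(a, b):
    """Pearson correlation between two simulated subtests."""
    return float(np.corrcoef(scores[a], scores[b])[0, 1])


# Same factor: a high correlation is evidence of convergent validity.
convergent_r = correlation("vocabulary", "comprehension")
# Different factors: a low correlation is evidence of discriminant validity.
discriminant_r = correlation("vocabulary", "block_design")

print(f"vocabulary vs. comprehension: r = {convergent_r:.2f}")
print(f"vocabulary vs. block design:  r = {discriminant_r:.2f}")
```

A factor analysis of real subtest scores generalizes this idea: it looks for groups of subtests whose scores rise and fall together across many children.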
(2) Predictive validity
Predictive validity is a type of criterion-related validity where the criterion measures are obtained in the future, usually months or years after test scores are obtained. An example is predicting college graduation from entrance exam scores. The ideal situation for this type of validity is to administer the test during a time of open enrollment, hiring, etc., so that a full range of results is possible on the outcome measures.
Example: The Pre-Kindergarten Screen (PKS) is a standardized screening measure for assessing school readiness in children between 4 years 0 months and 5 years 11 months of age. There are 10 scores: gross motor skills, fine motor skills, following directions, block tapping, visual matching, visual memory, imitation, basic academic skills, delayed gratification, and total score. The predictive validity of the PKS was examined by comparing children's pre-kindergarten PKS scores to both their kindergarten outcomes and their teachers' identification of the highest and lowest performing students. For the comparison to kindergarten outcomes, the PKS accurately classified 98.7% of a group of 392 children. For the comparison to high and low performing children, it accurately classified 91.2% of 125 children (Chittobran, 2003).
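The classification percentages reported for the PKS are proportions of children whose screening result matched their later outcome. As a minimal sketch, assuming each child has one screening decision and one later outcome, that proportion can be computed as below. The function name, the labels, and the eight-child data set are hypothetical; only the percentages and group sizes in the last two lines come from the text.

```python
def classification_accuracy(predicted, actual):
    """Return the proportion of cases where the prediction matches the outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical screening decisions and later outcomes for 8 children.
predicted = ["ready", "ready", "at_risk", "ready",
             "at_risk", "ready", "ready", "at_risk"]
actual = ["ready", "ready", "at_risk", "ready",
          "ready", "ready", "ready", "at_risk"]

accuracy = classification_accuracy(predicted, actual)
print(f"Correctly classified: {accuracy:.1%}")

# Approximate counts implied by the percentages reported in the text.
approx_correct_outcomes = round(0.987 * 392)  # kindergarten outcome comparison
approx_correct_extremes = round(0.912 * 125)  # high/low performer comparison
print(approx_correct_outcomes, approx_correct_extremes)
```

Overall accuracy alone does not show how the errors split between children wrongly flagged and children wrongly passed, so it is usually read alongside the base rate of the outcome in the sample.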
|