Chapter Summary

Because the body of psychological research is extensive, and some of it is conflicting, the ability to judge the quality of a given study or set of studies is important. How does one go about making such judgments? This chapter considers how reliability, validity, and standardization operate as mechanisms for evaluating research.

Reliability concerns the extent to which the results of a given study are consistent. For example, does a replication of the study produce the same results? Reliability can be further broken down into external and internal reliability. External reliability concerns the stability of responses over time: when individuals’ responses to the same test at different times differ substantially, the test is said to have low external reliability. Internal reliability, by contrast, concerns the consistency of a test within itself. For example, on a five-item scale intended to assess anxiety, individuals should respond similarly to all five items; if the scale is internally reliable, no single item should produce widely divergent responses. Psychologists can test internal reliability by using the split-half method or by calculating Cronbach’s alpha.
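The reliability checks described above reduce to a few standard calculations, sketched below under stated assumptions. All data are hypothetical, and the function names (`pearson_r`, `split_half`, `cronbach_alpha`) are illustrative rather than taken from any particular statistics package. Test-retest (external) reliability is shown as the correlation between total scores from two sessions. The split-half version uses an odd-even split of the items followed by the Spearman-Brown correction, which is one common way to split a scale but not the only one. Cronbach’s alpha uses its usual formula: k/(k - 1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores).

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half(responses):
    """Odd-even split-half reliability, stepped up with Spearman-Brown.

    responses: one list of item scores per respondent.
    """
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    r_half = pearson_r(odd, even)
    # Correlating two half-length tests underestimates the reliability of
    # the full-length test; Spearman-Brown adjusts for the doubled length.
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(responses):
    """alpha = k / (k - 1) * (1 - sum of item variances / total-score variance)."""
    k = len(responses[0])
    item_vars = sum(variance(item) for item in zip(*responses))
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical 1-5 ratings from six people on a five-item anxiety scale.
responses = [
    [4, 5, 4, 4, 5],
    [2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 1, 2, 2],
    [3, 4, 3, 3, 4],
]

# External (test-retest) reliability: correlate total scores from two sessions.
time1 = [sum(row) for row in responses]
time2 = [21, 9, 15, 22, 10, 16]  # hypothetical retest totals

print(f"test-retest r:    {pearson_r(time1, time2):.3f}")
print(f"split-half:       {split_half(responses):.3f}")
print(f"Cronbach's alpha: {cronbach_alpha(responses):.3f}")
```

Values near 1 indicate high consistency. Because these made-up respondents answer every item in much the same way, all three coefficients come out high. A single item that drew widely divergent responses would lower both the split-half coefficient and alpha.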

In general, validity is concerned with the extent to which a researcher is measuring what she or he intends to measure. Several types of validity exist: face validity, the extent to which what is being tested is obvious; content validity, the extent to which a test fully represents the topic of concern, as determined by experts in the field; criterion validity, the extent to which current scores predict scores on other related measures taken at the same time (concurrent validity) or in the future (predictive validity); and finally, construct validity, the extent to which a measure successfully taps into the psychological construct it is intended to study. Constructs must be revisited continually over time as our understanding of psychological phenomena grows and shifts.
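Criterion validity is commonly summarized as a validity coefficient: the correlation between scores on the test and scores on a related criterion. The sketch below assumes Pearson correlation as that coefficient. The criteria shown (a clinician’s rating taken in the same session and a follow-up measure six months later) and all numbers are hypothetical, chosen only to contrast concurrent and predictive validity. `validity_coefficient` is an illustrative name.

```python
from statistics import mean


def validity_coefficient(test_scores, criterion_scores):
    """Pearson r between test scores and scores on an external criterion."""
    mt, mc = mean(test_scores), mean(criterion_scores)
    cov = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion_scores))
    st = sum((t - mt) ** 2 for t in test_scores) ** 0.5
    sc = sum((c - mc) ** 2 for c in criterion_scores) ** 0.5
    return cov / (st * sc)


# Hypothetical anxiety-scale totals for six people.
scale_totals = [22, 8, 16, 23, 8, 17]

# Concurrent validity: criterion collected at the same time
# (hypothetical clinician anxiety ratings from the same session).
clinician_rating = [8, 3, 6, 9, 2, 7]

# Predictive validity: criterion collected later
# (hypothetical follow-up anxiety measure six months afterward).
follow_up = [7, 2, 5, 6, 3, 5]

print(f"concurrent r: {validity_coefficient(scale_totals, clinician_rating):.3f}")
print(f"predictive r: {validity_coefficient(scale_totals, follow_up):.3f}")
```

The calculation is identical in both cases; what distinguishes concurrent from predictive validity is only when the criterion is measured.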

The last formal mechanism through which the quality of psychological research can be evaluated is standardization. The process of standardization allows psychologists to determine the norms for a given test or scale. These norms then make it possible to interpret an individual’s responses, or the average responses of one group, in comparison to a larger population. Most psychological constructs are assumed to follow a normal distribution. This means that, given a sufficient number of responses, most people will score near the middle, or average, while few people will score dramatically above or below average. Using previously standardized measures therefore allows for better interpretation of data.
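One common way to apply norms is to convert a raw score to a z-score, which expresses how many standard deviations it lies from the norm mean. Under the normal-distribution assumption described above, the z-score can then be converted to a percentile rank. The sketch below is a minimal illustration. The standardization sample is hypothetical, the helper names are illustrative, and it relies on Python’s `statistics.NormalDist` (Python 3.8+) for the normal curve.

```python
from statistics import NormalDist, mean, stdev


def build_norms(norm_sample):
    """Summarize a (hypothetical) standardization sample by its mean and SD."""
    return mean(norm_sample), stdev(norm_sample)


def z_score(raw, norm_mean, norm_sd):
    """Number of standard deviations a raw score lies from the norm mean."""
    return (raw - norm_mean) / norm_sd


def percentile(z):
    """Percent of the population expected to score below z,
    assuming scores are normally distributed."""
    return 100 * NormalDist().cdf(z)


# Hypothetical totals from a standardization (norm) sample.
norm_sample = [14, 16, 15, 12, 18, 15, 13, 17, 16, 14]
norm_mean, norm_sd = build_norms(norm_sample)

for raw in (15, 18, 22):
    z = z_score(raw, norm_mean, norm_sd)
    print(f"raw {raw}: z = {z:+.2f}, percentile rank = {percentile(z):.0f}")

# Under a normal curve, roughly 68% of people fall within 1 SD of the mean.
print(f"within +/-1 SD: {percentile(1) - percentile(-1):.1f}%")
```

A raw score means little on its own. Comparing it with the norm sample is what shows whether a respondent scores near the middle or unusually far above or below average.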

Additional Online Resources

Tutorial on Internal Validity from Athabasca University:

“Construct Validity in Psychological Tests,” by Cronbach and Meehl (1955):

Video lesson on reliability and validity from Education Portal:

Podcast on Reliability in Personality Tests:


Test your knowledge of the keywords and definitions in the chapter.


Interactive Quiz for Chapter 6


Question 1:

a) internally reliable
b) externally reliable
c) unreliable
d) face valid

Question 2:

a) test–retest reliability
b) external reliability
c) internal reliability
d) inter-rater reliability

Question 3:

a) Cronbach’s alpha
b) internal validity testing
c) split-half correlation
d) scale degradation assessment

Question 4:

a) continuous
b) dichotomous
c) categorical
d) repeated multiple times

Question 5:

a) item-total correlations
b) item discrimination between extreme groups
c) reliability coefficients
d) internal–external reliability assessments

Question 6:

a) construct validity
b) known groups criterion
c) predictive validity
d) characteristic validity

Question 7:

a) criterion validity
b) face validity
c) expert validity
d) content validity

Question 8:

a) criterion validity predicts scores on future measures, while concurrent validity predicts scores on measures taken at the same time
b) concurrent validity predicts scores on future measures, while criterion validity predicts scores on measures taken at the same time
c) criterion validity predicts scores on other related variables, while concurrent validity predicts scores on the same test at a later date
d) concurrent validity predicts scores on other related variables, while criterion validity predicts scores on the same test at a later date

Question 9:

a) z-score
b) x-score
c) y-score
d) s-score

Question 10:

a) bell distribution
b) standard distribution
c) normal distribution
d) average distribution