Given the volume of psychological research, some of it conflicting, the ability to judge the quality of a given study or set of studies is important. How does one go about making such determinations? This chapter considers how reliability, validity, and standardization operate as mechanisms for evaluating research.

Reliability concerns the extent to which the results of a given study are consistent. For example, does a replication of the study produce the same results? Reliability can be further broken down into external and internal reliability. External reliability considers the stability of responses over time. When individuals’ responses to a given test differ substantially from one occasion to the next, the test is said to have low external reliability. Internal reliability, by contrast, is concerned with the consistency of the test within itself. For example, with a five-item scale intended to assess anxiety, individuals should respond similarly to all five items; no one item should lead to widely divergent responses if the scale is said to be internally reliable. Psychologists can test internal reliability by using the split-half method or by calculating Cronbach’s alpha.
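The two internal reliability checks named above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the response data are hypothetical, the function names are invented for this example, and the split-half function applies the Spearman-Brown correction, which the summary does not mention but which is commonly used to adjust for the halved test length.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance(col) for col in zip(*rows)]   # variance of each item
    total_var = variance([sum(r) for r in rows])        # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half(rows):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(r[0::2]) for r in rows]    # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in rows]   # items 2, 4, ...
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)                # assumption: Spearman-Brown correction


# Hypothetical data: six respondents, five anxiety items scored 1 to 5.
responses = [
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [3, 3, 3, 4, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 1, 1, 2],
    [3, 4, 3, 3, 3],
]
print(round(cronbach_alpha(responses), 3))
print(round(split_half(responses), 3))
```

Values near 1 on either statistic indicate that the items move together, as the anxiety-scale example describes; an item that draws widely divergent responses would pull both values down.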

In general, validity is concerned with the extent to which a researcher is measuring what she or he intends to measure. A variety of types of validity exist: face validity, the extent to which what is being tested is obvious; content validity, the extent to which a test wholly represents the topic of concern, as determined by experts in the field; criterion validity, the extent to which current scores predict scores on other related measures taken at the same time (concurrent validity) or in the future (predictive validity); and, finally, construct validity, the extent to which a measure successfully taps into the psychological construct that the researcher is attempting to study. Constructs must be addressed consistently, over time, as our understanding of psychological phenomena grows and shifts.
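Criterion validity is often summarized as a correlation between scores on the test and scores on the criterion measure. The sketch below assumes that approach; the scores are hypothetical and the function name is chosen for illustration only. If the criterion is measured at the same sitting, the result speaks to concurrent validity; if it is measured later, to predictive validity.

```python
from math import sqrt
from statistics import mean


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    mt, mc = mean(test_scores), mean(criterion_scores)
    cov = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion_scores))
    st = sqrt(sum((t - mt) ** 2 for t in test_scores))
    sc = sqrt(sum((c - mc) ** 2 for c in criterion_scores))
    return cov / (st * sc)


# Hypothetical scores for six people on a new anxiety scale and on an
# established anxiety measure completed at the same sitting.
new_scale = [12, 18, 9, 22, 15, 20]
established = [30, 41, 25, 48, 33, 45]
print(round(validity_coefficient(new_scale, established), 3))
```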

The last formal mechanism through which psychological research can be evaluated for quality is standardization. The process of standardization allows psychologists to determine the norms for a given test or scale. The norms then allow for better understanding of individual responses, or even of the average responses of one group, in comparison to a larger population. Most psychological constructs are assumed to follow a normal distribution. This means that, given a sufficient number of responses, most people will fall near the middle or average, while few people will score dramatically above or below average. Using previously standardized test measures allows for better interpretation of data.
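As a rough illustration of how norms support interpretation, a raw score can be converted to a standardized score (z-score) and, under the normality assumption described above, to an expected percentile rank. The norm mean and standard deviation below are hypothetical, as are the function names.

```python
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm mean, in standard deviations."""
    return (raw - norm_mean) / norm_sd


def percentile_rank(raw, norm_mean, norm_sd):
    """Expected proportion of the norm group scoring at or below `raw`,
    assuming scores are normally distributed."""
    return NormalDist(norm_mean, norm_sd).cdf(raw)


# Hypothetical norms for a standardized scale.
NORM_MEAN, NORM_SD = 50.0, 10.0

for raw in (40, 50, 65):
    z = z_score(raw, NORM_MEAN, NORM_SD)
    p = percentile_rank(raw, NORM_MEAN, NORM_SD)
    print(raw, z, round(p, 3))
```

A score at the norm mean has a z-score of 0 and sits at the middle of the distribution; scores further from the mean correspond to increasingly rare results.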

Additional Online Resources

Flashcards

Interactive Quiz for Chapter 6

Instructions: For each question, click on the radio button beside your answer. When you have completed the entire quiz, click the “Submit my answers” button at the bottom of the page to receive your results.

Question 1:

Carl weighs himself on a Monday morning, maintains a consistent lifestyle and eating habits, and weighs himself again on Tuesday morning. If Carl’s weight is the same on Tuesday as it was on Monday, we can say his measure is ________.
a) internally reliable
b) externally reliable
c) unreliable
d) face valid

Question 2:

Stability reliability is also known as _______.
a) test–retest reliability
b) external reliability
c) internal reliability
d) inter-rater reliability

Question 3:

Correlation between two scores on two equal parts of a test is known as ________.
a) Cronbach’s alpha
b) internal validity testing
c) split-half correlation
d) scale degradation assessment

Question 4:

The Kuder–Richardson measure is used to assess reliability when responses are ________.
a) continuous
b) dichotomous
c) categorical
d) repeated multiple times

Question 5:

All of the following are methods for increasing reliability through item analysis except ________.
a) item-total correlations
b) item discrimination between extreme groups
c) reliability coefficients
d) internal–external reliability assessments

Question 6:

If prior research indicates that a population of individuals is likely to have a specific trait you’re interested in measuring, this allows you to do a test of ________.
a) construct validity
b) known groups criterion
c) predictive validity
d) characteristic validity

Question 7:

If experts agree that a measure of depression fully covers the entire range of depressive characteristics, the measure is said to have ________.
a) criterion validity
b) face validity
c) expert validity
d) content validity

Question 8:

The main difference between predictive validity and concurrent validity is that ________.
a) predictive validity predicts scores on future measures, while concurrent validity predicts scores on measures taken at the same time
b) concurrent validity predicts scores on future measures, while predictive validity predicts scores on measures taken at the same time
c) predictive validity predicts scores on other related variables, while concurrent validity predicts scores on the same test at a later date
d) concurrent validity predicts scores on other related variables, while predictive validity predicts scores on the same test at a later date

Question 9:

A standardized score is also known as a ________.
a) z-score
b) x-score
c) y-score
d) s-score

Question 10:

The bell-shaped distribution of scores around a mean when large sample sizes are employed is known as the ________.
a) bell distribution
b) standard distribution
c) normal distribution
d) average distribution
