Validity in Psychological Tests

Why Measures Like Validity and Reliability Are Important

Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.

Psychological assessment is an important part of both experimental research and clinical treatment. One of the greatest concerns when creating a psychological test is whether or not it actually measures what we think it is measuring.

For example, a test might be designed to measure a stable personality trait but instead measure transitory emotions generated by situational or environmental conditions. When a test is valid, its results accurately reflect the dimension being assessed.

Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. There are four types of validity: content validity, criterion-related validity, construct validity, and face validity.

This article discusses what each of these four types of validity is and how they are used in psychological tests. It also explores how validity compares with reliability, which is another important measure of a test's accuracy and usefulness.

Content Validity

When a test has content validity, the items on the test represent the entire range of possible items the test should cover. Individual test questions may be drawn from a large pool of items that cover a broad range of topics.

In some instances where a test measures a trait that is difficult to define, expert judges may rate each item’s relevance. Because each judge's rating is based on opinion, two independent judges rate the items separately. Items that both judges rate as strongly relevant are included in the final test.
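The two-judge procedure described above can be sketched in a few lines of Python. This is only an illustration: the item names, the 1-to-4 rating scale, and the cutoff for "strongly relevant" are all assumptions rather than part of any standard procedure.

```python
# Hypothetical relevance ratings (1 = not relevant, 4 = highly relevant)
# from two independent judges. Item names and scale are assumptions.
judge_a = {"item_1": 4, "item_2": 2, "item_3": 4, "item_4": 3}
judge_b = {"item_1": 4, "item_2": 3, "item_3": 3, "item_4": 1}

STRONG = 4  # assumed cutoff for "strongly relevant"


def select_items(ratings_a, ratings_b, cutoff=STRONG):
    """Keep only items that both judges rated at or above the cutoff."""
    return sorted(
        item
        for item in ratings_a
        if item in ratings_b
        and ratings_a[item] >= cutoff
        and ratings_b[item] >= cutoff
    )


selected = select_items(judge_a, judge_b)
print(selected)
```

Requiring agreement from both judges is a conservative choice: an item that only one judge rates highly is left out.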

Internal and External Validity

Internal and external validity are used to determine whether or not the results of an experiment are meaningful. Internal validity relates to the way a study is designed and carried out, while external validity examines how well the findings may apply in other settings.

Criterion-Related Validity

A test is said to have criterion-related validity when it has demonstrated its effectiveness in predicting criteria, or indicators, of a construct.

For example, when an employer hires new employees, they will examine different criteria that could predict whether or not a prospective hire will be a good fit for a job. People who do well on a test may be more likely to do well at a job, while people with a low score on a test may be more likely to do poorly at that job.

There are two different types of criterion validity: concurrent and predictive.

Concurrent Validity

Concurrent validity occurs when criterion measures are obtained at the same time as test scores, indicating the ability of test scores to estimate an individual’s current state. For example, a test that measures levels of depression would be said to have concurrent validity if its scores agreed with another indicator of depression collected at the same time, such as a clinician's diagnosis.

Predictive Validity

Predictive validity occurs when the criterion measures are obtained at some point after the test. Career and aptitude tests are examples of tests that rely on predictive validity, since they are used to help determine who is likely to succeed or fail in certain subjects or occupations.
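Criterion-related validity is commonly summarized as a correlation between test scores and the criterion. The sketch below computes a Pearson correlation by hand. The data are invented for illustration, and the variable names are hypothetical. The same calculation applies to concurrent validity when the criterion is collected at the same time as the test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented data for illustration only.
aptitude_scores = [52, 61, 70, 48, 85, 77]          # test given at hiring
performance_later = [3.1, 3.4, 3.9, 2.8, 4.6, 4.0]  # rating collected later

validity_coefficient = pearson_r(aptitude_scores, performance_later)
print(round(validity_coefficient, 2))
```

A coefficient near 1 would suggest that higher test scores go along with higher later performance in this sample. It does not by itself show that the test causes or explains that performance.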

Construct Validity

A test has construct validity if it demonstrates an association between the test scores and the prediction of a theoretical trait. Intelligence tests are one example of measurement instruments that should have construct validity. A valid intelligence test should be able to accurately measure the construct of intelligence rather than other characteristics, such as memory or education level.

Construct validity is sometimes confused with content validity, which looks at whether a test covers the full range of behaviors that make up the construct being measured. In employee selection, for example, the procedure for establishing content validity is to identify the tasks necessary to perform a job, such as typing, design, or physical ability.

In order to demonstrate the content validity of a selection procedure, the behaviors demonstrated in the selection should be a representative sample of the behaviors of the job.

Face Validity

Face validity is one of the most basic measures of validity. Essentially, researchers are simply taking the validity of the test at face value by looking at whether it appears to measure the target variable. On a measure of happiness, for example, the test would be said to have face validity if it appeared to actually measure levels of happiness.

Face validity only means that the test looks like it works. It does not mean that the test has been proven to work. However, if the measure seems to be valid at this point, researchers may investigate further in order to determine whether the test is valid and should be used in the future.

A survey asking people which political candidate they plan to vote for would be said to have high face validity. A complex test used as part of a psychological experiment that looks at a variety of values, characteristics, and behaviors might be said to have low face validity, because the exact purpose of the test is not immediately clear, particularly to the participants.

Reliability vs. Validity

While validity examines how well a test measures what it is intended to measure, reliability refers to how consistent the results are. There are four ways to assess reliability:

  • Internal consistency: Internal consistency examines the consistency of different items within the same test. 
  • Inter-rater: In this method, multiple independent judges score the same test responses, and the agreement between their scores is examined.
  • Parallel or alternate forms: This approach uses different forms of the same test and compares the results.
  • Test-retest: This measures the reliability of results by administering the same test at different points in time.
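As one concrete illustration of internal consistency, the sketch below computes Cronbach's alpha, a widely used internal-consistency statistic. The source does not name a specific statistic, so this choice is an assumption, and the response data are invented.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])  # number of items on the test
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))
    # Sum of the sample variances of each individual item.
    item_variances = sum(variance(col) for col in columns)
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


# Invented responses: 5 respondents x 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Test-retest and parallel-forms reliability are typically estimated differently, by correlating the two sets of scores rather than by comparing items within one administration.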

It's important to remember that a test can be reliable without being valid. Consistent results do not always indicate that a test is measuring what researchers designed it to measure.

Frequently Asked Questions

  • What is external validity in psychology?

    External validity is how well the results of a test apply in other settings. The findings of a test with strong external validity will apply to practical situations and take real-world variables into account.

  • What is internal validity in psychology?

    Internal validity examines the procedures and structure of a test to determine how well it was conducted and whether or not its results are valid. A test with strong internal validity will establish cause and effect and should eliminate alternative explanations for the findings.

  • What is the difference between reliability and validity in psychology?

    Reliability is an examination of how consistent and stable the results of an assessment are. Validity refers to how well a test actually measures what it was created to measure. Reliability measures the precision of a test, while validity looks at accuracy.

  • What is an example of reliability in psychology?

    An example of reliability in psychology research would be administering a personality test to the same person on more than one occasion to see if the person obtains the same result. If the score is the same or similar each time, it is an indicator that the test is reliable.

  • What kind of data measures content validity in psychology?

    Content validity is measured by checking to see whether the content of a test accurately depicts the construct being tested. Generally, experts on the subject matter would determine whether or not a test has acceptable content validity.

  • How do you assure validity in a psychological study?

    Validity can be demonstrated by showing a clear relationship between the test and what it is meant to measure. This can be done by showing that a test has one (or more) of the four types of validity: content validity, criterion-related validity, construct validity, and/or face validity.

By Kendra Cherry, MSEd
Kendra Cherry, MSEd, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."