Split Half Reliability (A Comprehensive Guide)

Reliability and validity are the two major concepts of psychometrics.

Reliability is a measure of the consistency of scores of individuals on a particular scale.

It is mainly determined through a correlation-based coefficient. The most widely used reliability coefficient is Cronbach’s alpha.

This article explores reliability, its different types, and how they can be measured. It then examines split-half reliability, a measure of internal reliability, in depth.


In psychological terms, reliability refers to consistency in the results of research methods.

In the strict sense used in psychological testing, reliability refers to the consistency of an individual’s scores when a test is administered twice over a period of time.

For example, a person who weighs himself twice in one day expects the same result both times.

The result should be the same; a scale that shows different results each time is of no use.

The same analogy holds for a tape measure marked in inches.

If it gives two different lengths for the same rope on two occasions, it is of no use.

Similarly, a research method that produces the same results each time can be considered reliable.

For the reliability of psychological measures, the correlation coefficient can be used.

If a research measure has a high correlation coefficient it means it is highly reliable.

Note that an individual is very unlikely to produce exactly the same score on the same measure over time, because of a number of factors collectively termed error. Even so, the two scores will be correlated to some degree, and this can be expressed through a correlation coefficient.

Types of Reliability:

Types of reliability can be divided into two broad categories known as internal reliability and external reliability.

These reliabilities are explored here:

External reliability:

External reliability refers to how much scores on a measure are consistent from one use to another.

Overall scores from one use of the measure are compared with those from another to establish a correlation coefficient.

There are different types of external reliability.

Inter-Rater Reliability:

When humans are part of the measurement process, researchers need to take into account the errors they can make, such as misinterpretation, lack of motivation, and distraction.

All these factors are termed error, and they can result in low reliability.

Thus, if a researcher tries to establish reliability from the research data and the resulting correlation coefficient is low, the researcher is stuck.

This is where the role of inter-rater reliability comes into play. This allows the researcher to establish the reliability outside of the context of research.

If research takes place over a longer period of time, researchers can also establish the consistency of a rater’s scores over that period.

In this type of reliability, a researcher might ask raters to assign each observation to a particular category and then measure the agreement between them.

For example, a measure has 5 subscales and a total of 100 items. Each rater assigns each item to a specific subscale.

Suppose 3 of 4 raters place an item on the same subscale; this implies 75% agreement between raters for that item.

Agreement is recorded for each item, and an overall coefficient is then established through a statistical procedure.
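
The per-item agreement described above can be sketched in a few lines of Python. This is a minimal illustration, not a full inter-rater statistic: the function name percent_agreement and the ratings data are hypothetical, and simple percent agreement does not correct for agreement that would occur by chance.

```python
from collections import Counter

def percent_agreement(assignments):
    """Share of raters who chose the most common category for one item."""
    counts = Counter(assignments)
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(assignments)

# Hypothetical data: each row is one item, each entry is the subscale (1-5)
# that one of four raters assigned the item to.
ratings = [
    [2, 2, 2, 5],  # 3 of 4 raters agree
    [1, 1, 1, 1],  # all raters agree
    [3, 4, 3, 4],  # raters are split evenly
]

per_item = [percent_agreement(row) for row in ratings]
mean_agreement = sum(per_item) / len(per_item)

print(per_item)        # agreement for each item
print(mean_agreement)  # average agreement across items
```

The first row reproduces the 3-of-4 case from the text, which yields an agreement of 0.75 for that item.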

Test-Retest Reliability:

In test-retest reliability, the consistency of scores on a measure administered twice over a period of time is explored.

The same measure is administered to the same group on two different occasions, with a time interval between them.

The correlation between the two sets of scores is then computed, giving the correlation coefficient.

The higher the correlation coefficient, the higher the reliability of the measure. The time interval between administrations is crucial.

If the time interval is short, the correlation is likely to be higher; if the time interval is long, the correlation between the two sets of scores is likely to be low.
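
As a rough illustration of the test-retest procedure, the sketch below computes a Pearson correlation between two administrations of the same measure. The scores are invented for the example, and the helper name pearson is an assumption of this sketch rather than anything prescribed by the method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical total scores of six people on the same measure,
# administered on two occasions.
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 12]

test_retest_r = pearson(first_administration, second_administration)
print(round(test_retest_r, 3))
```

A value close to 1 would indicate that people kept roughly the same rank order and spacing across the two occasions.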

Parallel Form Reliability:

In this type of reliability, two measures are developed and then the correlation between the scores of those two measures is established.

Parallel form reliability necessitates that two forms must measure the same construct.

For this purpose, a large number of items are constructed measuring the same construct, and then they are divided into two groups.

These two forms are then administered to the same group and the correlation between the two sets of scores is established.

The difficulty in parallel form reliability is that it assumes that two forms measure the same construct.

It also becomes burdensome because the researcher has to construct a large number of items.

Parallel form reliability is very similar to split-half reliability. The only difference is that in split-half reliability a single measure is administered as a whole.

Internal Reliability:

To establish the internal reliability of a measure, a single measure is administered to a group of individuals and the reliability of that instrument is then established.

In effect, the internal reliability of an instrument is measured by calculating how coherent the items measuring the same construct are.

In simple words, it is a measure of how coherent the results produced by different items measuring the same construct are.

Internal reliability can be measured through the following procedures.

Average inter-item correlation:

To establish internal reliability through average inter-item correlation, the measure is administered to a group of individuals.

The correlation between each pair of items is computed separately. For example, if the scale consists of 6 items, there will be 15 correlation values, one for each pair of items.

The mean of these correlations is then taken, and it serves as the coefficient indicating the reliability of the measure.
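
The calculation can be sketched as follows for a 6-item scale, which gives 15 item pairs. The response data are made up for illustration, and the helper pearson is an assumption of this sketch.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical responses: one row per respondent, one column per item (6 items).
responses = [
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
    [3, 4, 3, 4, 3, 4],
]

items = list(zip(*responses))  # transpose so each entry holds one item's scores
pair_correlations = [pearson(a, b) for a, b in combinations(items, 2)]
average_inter_item_r = sum(pair_correlations) / len(pair_correlations)

print(len(pair_correlations))          # number of item pairs: 15
print(round(average_inter_item_r, 3))  # mean of the pairwise correlations
```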

Average Item total correlation:

This uses the technique of inter-item correlation; the only difference is that a total score across all items is computed and this total is then treated as another variable.

In the example above, the total score on the 6 items is calculated and acts as a 7th variable.

The correlation between each item and the total is established, and the average of these correlations serves as the correlation coefficient.

This correlation coefficient then determines the reliability of the scale.
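
A minimal sketch of this idea is shown below, under the assumption that each item is correlated with the total score and those item-total correlations are then averaged. The data and the helper pearson are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical responses: one row per respondent, one column per item (6 items).
responses = [
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
    [3, 4, 3, 4, 3, 4],
]

# The total score per respondent acts as the additional, 7th variable.
totals = [sum(row) for row in responses]

# Correlate each of the 6 items with the total score, then average.
item_total_correlations = [pearson(item, totals) for item in zip(*responses)]
average_item_total_r = sum(item_total_correlations) / len(item_total_correlations)

print(round(average_item_total_r, 3))
```

Because each item is itself part of the total, these correlations tend to be somewhat inflated; the sketch leaves that as is to stay close to the description in the text.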

Split half reliability:

The split half reliability method is used to measure the internal consistency of the scale.

This implies that it measures the extent to which the components of a scale contribute to the construct being measured.

As the name indicates, in split-half reliability the items measuring the same construct are divided into two halves.

The complete measure is administered to a large group of individuals.

For each individual, the items in each half are totalled separately, giving two scores per individual.

The correlation between the two sets of scores across individuals is then computed, giving the correlation coefficient.

This correlation coefficient determines the reliability of the measure.

The higher the correlation between the two halves of the measure, the more reliable the scale.

Steps to determine split-half reliability:

To determine split-half reliability, the following steps should be followed:

  1. The first step is to administer the scale to a large group of individuals. For split-half reliability, the sample should contain at least 30 individuals.
  2. In the next step, the researcher divides the test into two halves. How the items are divided is up to the researcher, who can split them randomly or use one of many strategies, such as separating even-numbered and odd-numbered items.
  3. Next, the researcher calculates the score on each half for each individual separately, which gives two scores for each individual.
  4. In the final step, the correlation between the two halves of the measure is computed and the Spearman-Brown correction is applied, giving the split-half reliability of the measure.
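
The steps above can be sketched in Python using an odd-even split. This is an illustrative sketch only: the toy data set has six respondents, far fewer than the 30 recommended in step 1, the names pearson and split_half are hypothetical, and the correction line uses the standard equal-length Spearman-Brown form 2r / (1 + r).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient for two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half(responses):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    # Step 2: odd-even split (items 1, 3, 5, ... versus items 2, 4, 6, ...).
    # Step 3: total each half separately for every respondent.
    odd_scores = [sum(row[0::2]) for row in responses]
    even_scores = [sum(row[1::2]) for row in responses]
    # Step 4: correlate the two halves, then apply the correction.
    r_half = pearson(odd_scores, even_scores)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Step 1 (toy stand-in): one row per respondent, one column per item.
responses = [
    [4, 5, 4, 3, 4, 5],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 4],
    [1, 2, 1, 2, 2, 1],
    [3, 4, 3, 4, 3, 4],
]

r_half, r_full = split_half(responses)
print(round(r_half, 3), round(r_full, 3))
```

An odd-even split is used here because it is deterministic; a random split would follow the same steps with a shuffled item order.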

Split half reliability and Spearman-Brown correction:

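Because the correlation coefficient is based on halves that each contain only half the items of the test, it underestimates the reliability of the full-length measure.

To better estimate the reliability of the complete measure, the Spearman-Brown correction is applied, which is as follows:

$$r_{SB} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

where $r_{hh}$ is the correlation between the two halves and $r_{SB}$ is the estimated reliability of the full-length measure.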


This correction can only be applied if the two halves have equal length. If they do not, the following formula can be used:

[Formula image: image197c (1).png]
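
To see how much the equal-length Spearman-Brown correction raises a half-test correlation, the short sketch below applies the standard form 2r / (1 + r) to a few example values. The function name spearman_brown and the input values are chosen for illustration; the unequal-length formula is not covered here.

```python
def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two equal halves."""
    return 2 * r_half / (1 + r_half)

# Illustrative half-test correlations and their corrected estimates.
for r_half in (0.4, 0.6, 0.8):
    print(r_half, round(spearman_brown(r_half), 2))
# 0.4 0.57
# 0.6 0.75
# 0.8 0.89
```

The corrected value is always higher than a positive half-test correlation below 1, which reflects the point that a half-length test understates the reliability of the full measure.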

Drawbacks of split-half reliability:

The main drawback of split-half reliability is that it works well only for a measure consisting of a large number of items (100 is most often recommended) measuring the same construct.

This rarely happens in psychology, as most scales or measures assess a set of constructs.

For example, the NEO-FFI measures personality, but its items measure different constructs related to personality, namely extraversion, neuroticism, openness to experience, agreeableness, and conscientiousness.

Thus to establish the reliability of such scales split-half reliability cannot be used. 

Split half reliability and Parallel form reliability:

Split-half reliability is very similar to parallel form reliability, in which items are divided into two sets and then administered to individuals.

The major differences between split-half reliability and parallel form reliability are as follows:

  • In contrast to split-half reliability, the two forms in parallel form reliability are administered separately, with a short time interval between them.
  • In parallel form reliability, observations are independent of each other, which is not the case in split-half reliability.
  • Parallel form reliability requires that the two forms be equivalent, which is not a requirement for split-half reliability.


There are two major types of reliability: internal reliability and external reliability.

External reliability can be established through different methods including parallel form reliability, test-retest reliability and inter-rater reliability.

Internal reliability measures the extent to which constituents of a test contribute to its reliability, that is the coherence of the results of different items of the test.

Internal reliability can be established through inter-item correlation, item-total correlation, and split-half reliability.

In split-half reliability, the items are divided into two halves and the correlation between the scores on the two halves is computed, giving the correlation coefficient.

FAQs about Split half reliability 

What is split-half reliability in psychology?

In split-half reliability, the test is split into two halves.

The score on each half is computed for each individual, and the correlation between the two sets of scores is then determined to establish the split-half reliability of the scale.

Split-half reliability should not be confused with validity, which concerns whether the test measures what it is supposed to measure.

What are the three types of reliability?

Psychologists usually take into account the following three types of reliability:

– Test-retest reliability (the scale is administered twice and the correlation between the two sets of scores is established)

– Internal consistency (the correlation across items, that is, how far the items of the scale are consistent with one another)

– Inter-rater reliability (reliability is established through consistency in the scoring of different raters)

What is Validity? 

Validity of a measure implies that the test measures what it is supposed to measure.

For a test to be valid it must also be reliable; a reliable test, however, is not necessarily valid.





