How to estimate the precision of your SUS score
Moving from the respondents’ score to the user base
Although the System Usability Scale (SUS) is frequently utilised by UX practitioners, the precision of this measure is often overlooked. If 100 respondents yield an average score of 70 out of 100, where does the true score for the whole user base lie?
In this article we take a quick look at the System Usability Scale for measuring the perception of usability, and then at how to estimate the precision of this measure. If you are familiar with this tool, I suggest you jump to the section ‘Estimating the precision for the SUS’.
What is the SUS?
The System Usability Scale (SUS) is a standardised questionnaire designed to measure the perceived usability of a product. As such, it provides a subjective usability metric, in contrast with objective usability metrics like task success rate, number of errors, or time on task.
Standardised questionnaires are based on established and validated measures, ensuring that they measure what they are intended to measure (valid) and that the results are consistent across different studies (reliable). Over the years the SUS has become an industry standard for measuring perceived usability, with references in over 1300 scientific articles and publications.
The benefits of using standardised measures include increased objectivity, reliability, validity, and cross-sectional and longitudinal comparison. Moreover, by using standardised questionnaires, researchers reduce the time needed to craft well-designed questionnaires.
The SUS consists of 10 items. The respondents indicate their degree of agreement on a five-point response scale (from 1, Strongly disagree, to 5, Strongly agree). The final score provides the perceived degree of usability.
When used during usability studies with small samples, it is actually good practice to add an open-ended question after each question to probe further if the score is low (e.g. ‘What is the reason for this scoring?’, ‘Can you tell more?’) to dive into user pain points.
How to calculate the score
Calculating the SUS score can be tricky because odd-numbered items are worded positively (like #3, ‘I thought the system was easy to use’) while even-numbered ones are worded negatively (like #2, ‘I found the system unnecessarily complex’). Moreover, the converted responses must be summed together and then multiplied by 2.5 to provide a score within the range 0–100.
Anyhow, the calculation of the score can be broken down into a few simple steps. For each respondent, proceed as follows:
- For odd items, subtract one from the user response.
- For even-numbered items, subtract the user response from 5.
- Add up the converted responses for each user and multiply that total by 2.5.
Then compute the average of the respondents’ SUS scores. Keep in mind that although the scores range from 0 to 100, they are not meant as percentages.
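The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring tool: the function name `sus_score` and the three respondents' answers are made up for the example.

```python
from statistics import mean

def sus_score(responses):
    """Return the SUS score (0-100) for one respondent's ten answers (each 1-5)."""
    if len(responses) != 10:
        raise ValueError("the SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items: subtract one from the response
        else:
            total += 5 - response   # even items: subtract the response from 5
    return total * 2.5              # rescale the 0-40 total to 0-100

# Hypothetical answers from three respondents (items 1 to 10, in order).
respondents = [
    [4, 2, 4, 1, 5, 2, 4, 2, 4, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
]
scores = [sus_score(r) for r in respondents]   # [82.5, 50.0, 100.0]
average_sus = mean(scores)                     # 77.5
```

Note that a respondent who answers 3 to every item ends up at 50, the midpoint of the scale.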
For interpreting the SUS, it can be useful to associate it with adjectives describing the product usability. More details about how to interpret the SUS are provided in 5 Ways to Interpret a SUS Score by Sauro.
Estimating the precision for the SUS
For estimating the precision of the SUS score we need to introduce the idea of a confidence interval around a mean.
A confidence interval is a range of values, built from the observations gathered on a sample, that we think will contain the true (unknown) population parameter with a certain probability.
The key point of this calculation is understanding that we will never know the true value for a user base made up of thousands of people; we can only make ‘estimates’ based on a few respondents. For this reason, we need a way to estimate the accuracy of such an estimate.
The confidence interval is affected by three variables:
- the sample size: the greater the number of respondents, the narrower the range of values (thus, a better estimation of the true value);
- the variability of the responses, or formally, the standard deviation. A high variability among the respondents’ scores will result in a wider confidence interval, as it will be hard to pinpoint the exact value;
- a probability level associated with the confidence interval (see the next section if you wish to dive into the details).
After collecting the SUS responses we have all of these elements: we know how many respondents completed the questionnaire; we can easily calculate the standard deviation for the scores in Excel; and for the probability we can use the ‘constant’ 1.96.
We can then compute the confidence interval by applying the following formula:
$$\text{CI} = \bar{x} \pm 1.96 \times \frac{sd}{\sqrt{n}}$$
Where $\bar{x}$ is the average SUS score of the sample, sd is the standard deviation of the sample, n is the sample size, and 1.96 is the value associated with the probability level.
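If you prefer a script to a spreadsheet, the same calculation (the average plus or minus 1.96 × sd / √n) can be sketched in Python as follows. The function name `sus_confidence_interval` and the 40 scores are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev

def sus_confidence_interval(scores, z=1.96):
    """Return (average, margin_of_error, lower, upper) for a list of SUS scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate the standard deviation")
    average = mean(scores)
    sd = stdev(scores)           # sample standard deviation (n - 1 in the denominator)
    margin = z * sd / sqrt(n)    # half the width of the confidence interval
    return average, margin, average - margin, average + margin

# Hypothetical SUS scores from 40 respondents.
scores = [50, 60, 70, 80, 90] * 8
average, margin, lower, upper = sus_confidence_interval(scores)
print(f"{average:.1f} ± {margin:.1f}  ->  [{lower:.1f}, {upper:.1f}]")
```

With these made-up scores the average is 70 and the margin is roughly 4.4, so the interval runs from about 65.6 to 74.4. The default `z=1.96` corresponds to the constant used in the text; pass a different value to change the probability level.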
How to report the score results and its precision
After the calculation we can decide on the best way to report the results.
One common way is to report the results using the margin of error, namely half the width of the confidence interval. For instance, suppose that for a score of 70 you found that the width of the confidence interval is equal to 10. The results could then be written as: ‘The SUS score for product X equals 70 ± 5’. In this case, the reader will know that the true score for the user base is likely to fall between 65 (70 - 5) and 75 (70 + 5).
Another way is to report the lower and upper bounds of the confidence interval in brackets. This format is less common among practitioners but is often found in scientific papers. Taking the example above, the results would be written as: ‘The SUS score for product X is 70 with 95% CI [65, 75]’.
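Both reporting styles can be produced from the same two numbers, the average and the margin of error. The helper below is a small sketch; `report_sus` is a made-up name, and rounding to whole numbers is an assumption made to match the example sentences.

```python
def report_sus(product, average, margin, confidence=95):
    """Return the 'plus or minus' and the bracketed report as two strings."""
    lower, upper = average - margin, average + margin
    plus_minus = f"The SUS score for {product} equals {average:.0f} ± {margin:.0f}"
    interval = (f"The SUS score for {product} is {average:.0f} "
                f"with {confidence}% CI [{lower:.0f}, {upper:.0f}]")
    return plus_minus, interval

plus_minus, interval = report_sus("product X", 70, 5)
print(plus_minus)   # The SUS score for product X equals 70 ± 5
print(interval)     # The SUS score for product X is 70 with 95% CI [65, 75]
```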
Read on for more details: the level of confidence
While calculating the accuracy of the SUS measure we said that the range of values of a confidence interval is tied to ‘a probability level’: formally, this is called the level of confidence.
The level of confidence (LoC) gives the proportion of estimated confidence intervals that would contain the true population parameter if the same questionnaire were administered to many random samples drawn from the user base. Taking the opposite perspective, you can think about the confidence level as an indication of how often you are willing to be wrong.
Among UX practitioners the LoC is often set at 90% or 95%. These levels offer a reasonable trade-off between the sample size and the accuracy needed for day-to-day measures. The ‘constants’ associated with these levels of confidence are 1.64 for 90% and 1.96 for 95%.
However, there are occasions or industries, especially the health sector, where you might want a higher degree of confidence in your estimate (e.g. estimating the maximum time it takes to go through a defibrillator’s instructions). In such cases, a 99% confidence interval is preferred (and the associated value is then 2.58).
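These ‘constants’ are simply quantiles of the standard normal distribution, so they can be recomputed for any level of confidence. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_for_confidence` is made up for the example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return the two-sided z value for a confidence level given as a fraction (e.g. 0.95)."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Split the remaining probability equally between the two tails.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

z_values = {level: round(z_for_confidence(level), 2) for level in (0.90, 0.95, 0.99)}
print(z_values)   # {0.9: 1.64, 0.95: 1.96, 0.99: 2.58}
```

Rounded to two decimals this gives 1.64, 1.96 and 2.58 (the last one is 2.5758 before rounding).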
Note. What was said above assumes your sample is large; for small samples the values associated with the probability levels are somewhat larger, as they come from the t-distribution rather than the normal distribution. You can find the right value using an online calculator.