Five methods for eliciting subjective ratings after each task in a usability test were evaluated. The methods included simple Likert scales as well as a technique derived from Usability Magnitude Estimation. They were tested in a large-scale online study in which participants performed six tasks on an intranet site. Task performance data showed the same pattern as the subjective ratings from all five methods, and all five methods yielded significant differences in the ratings across tasks. A sub-sampling analysis showed that one method yielded the most reliable results at the small sample sizes typical of usability tests.
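
The abstract does not spell out how the sub-sampling analysis was run, so the following is only a minimal sketch of one plausible form of it. It assumes that reliability is measured as the average correlation between per-task mean ratings in a random sub-sample of participants and the per-task means in the full sample. The function names, the number of draws, and the synthetic ratings are all hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: flat means count as no agreement.
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def task_means(ratings):
    """Mean rating per task; `ratings` holds one row per participant."""
    return [mean(col) for col in zip(*ratings)]


def subsample_reliability(ratings, n, draws=1000, seed=0):
    """Average correlation between sub-sample and full-sample task means."""
    rng = random.Random(seed)
    full = task_means(ratings)
    correlations = []
    for _ in range(draws):
        sub = rng.sample(ratings, n)  # n participants, without replacement
        correlations.append(pearson(task_means(sub), full))
    return mean(correlations)


# Synthetic illustration only: 200 hypothetical participants, six tasks,
# 5-point ratings scattered around made-up task-level means.
_rng = random.Random(42)
_assumed_task_means = [4.2, 3.1, 3.8, 2.5, 4.5, 3.4]
data = [
    [min(5, max(1, round(_rng.gauss(m, 1.0)))) for m in _assumed_task_means]
    for _ in range(200)
]

if __name__ == "__main__":
    for n in (6, 12, 30):
        print(n, round(subsample_reliability(data, n), 3))
```

Under these assumptions, running the same procedure once per rating method and comparing the resulting values at small `n` would indicate which method recovers the full-sample pattern most consistently with few participants.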