Various test case selection criteria have been proposed for quality testing of software. Test sets that satisfy different criteria commonly differ in both size and fault-detecting ability. Moreover, test sets that satisfy a stronger criterion and detect more faults usually consist of more test cases. A question that often puzzles software testing professionals and researchers is: when a testing criterion C1 helps to detect more faults than another criterion C2, is it because C1 specifically requires test cases that are more fault-sensitive than those for C2, or is it essentially because C1 requires more test cases than C2? In this paper, we discuss several methods and approaches for investigating this question, and empirically compare several common coverage criteria for testing logical decisions, taking into consideration the different sizes of the test sets required by these criteria. Our results clearly demonstrate that the stronger criteria under study are more fault-sensitive than the weaker ones, and not solely because the former require more test cases. More importantly, we illustrate a general approach, which takes into account the size of the generated test sets, for demonstrating the superiority of one testing criterion over another.