On Wed, 11 Jan 2012 07:00:53 -0800, Robert Goldman <rpgoldman@sift.info> wrote:
On 1/11/12 Jan 11 -1:16 AM, Daniel Herring wrote:
On Wed, 11 Jan 2012, Daniel Herring wrote:
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms:
PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED
(XPASS and XFAIL are not in POSIX; change test polarity if desired)
http://www.gnu.org/software/dejagnu/manual/x47.html#posix
I guess I'd be inclined to say "too bad for POSIX" and add XPASS and XFAIL....
The reason that I'd be willing to flout (or "extend and extinguish" ;->) the standard is that there is no obvious advantage to POSIX compliance in this case that would compensate for the loss in information.
cheers, r
I agree.
I really have no idea what is common practice in standard unit testing protocols; it isn't my background (which is mathematics). The only reason I suggested the additions is that they carry useful information, some of which is lost if you don't have all four cases. In my consulting practice I have used all four, and I have seen them used by others, in one form or another, in most test settings.
There are many good descriptions of binary hypothesis testing; here is one (the two models in this setting would be something like H='test passes' and 0='test fails'):
"In binary hypothesis testing, assuming at least one of the two models does indeed correspond to reality, there are four possible scenarios:

Case 1: H0 is true, and we declare H0 to be true
Case 2: H0 is true, but we declare H1 to be true
Case 3: H1 is true, and we declare H1 to be true
Case 4: H1 is true, but we declare H0 to be true

In cases 2 and 4, errors occur. The names given to these errors depend on the area of application. In statistics, they are called type I and type II errors respectively, while in signal processing they are known as a false alarm or a miss."
(from http://cnx.org/content/m11531/latest/)
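
As a rough illustration of the four scenarios quoted above, here is a minimal Python sketch. The function name `classify` and the returned labels are hypothetical, chosen only for this example; they are not part of the quoted text or of any test framework.

```python
def classify(h0_is_true: bool, declared_h0: bool) -> tuple[int, str]:
    """Map (reality, declaration) to one of the four quoted cases.

    h0_is_true  -- True if H0 corresponds to reality, False if H1 does
    declared_h0 -- True if we declare H0 to be true, False if we declare H1
    """
    if h0_is_true and declared_h0:
        return (1, "correct")
    if h0_is_true and not declared_h0:
        return (2, "type I error (false alarm)")
    if not h0_is_true and not declared_h0:
        return (3, "correct")
    # Remaining combination: H1 is true, but H0 was declared.
    return (4, "type II error (miss)")


# Enumerate all four combinations of reality and declaration.
for truth in (True, False):
    for declared in (True, False):
        print(truth, declared, classify(truth, declared))
```

Only cases 2 and 4 are errors, which is why a scheme with just two outcomes cannot tell them apart from the correct cases.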
One might argue that Bayes testing procedures are not appropriate in software verification tests, but I think this would be short-sighted. It is virtually impossible to design tests which cover every possible data/usage scenario for any but the simplest pieces of code. In practice, the test designer picks the tests he thinks are most important. That's where the statistics come in, in the broader sense. Testing several hundred out of the hundreds of thousands or millions of possible permutations of test parameters always implies that statistical assumptions are being made. Being limited to two of the four test results makes it impossible to evaluate the results with any degree of rigor.
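
To put rough numbers on the coverage point, here is a small Python sketch. The parameter space (six parameters with ten values each) and the sample size of 300 are made-up figures for illustration only, and the helper names are hypothetical.

```python
import math
import random


def space_size(values_per_parameter):
    """Number of distinct parameter combinations."""
    return math.prod(values_per_parameter)


def sample_cases(values_per_parameter, n, seed=0):
    """Draw n random test cases (with replacement) from the space."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(k) for k in values_per_parameter)
            for _ in range(n)]


sizes = [10] * 6                  # assumed: 6 parameters, 10 values each
total = space_size(sizes)         # 1,000,000 combinations
cases = sample_cases(sizes, 300)  # assumed: a few hundred hand-picked tests
fraction = len(set(cases)) / total

print(f"{len(set(cases))} distinct cases out of {total} ({fraction:.4%})")
```

Under these assumptions the suite touches at most 0.03% of the space, so any conclusion about the untested remainder rests on an assumption about how representative the chosen cases are.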
I am indifferent as to the terminology applied to cases 2 and 4, so long as they are available. If they are not, it throws unnecessary uncertainty over the entire corpus of test results. And having them available doesn't force those who don't see their necessity to use them. They can choose to simply ignore them and limit their information to the two conditional cases:
{Case 1 | not Case 2}
{Case 3 | not Case 4}
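
One possible reading of this, sketched in Python using the four names suggested earlier in the thread (OK, FAIL, UNEXPECTEDOK, EXPECTEDFAIL). The `Outcome` enum and the `outcome` and `collapse` helpers are hypothetical and do not come from any existing test library; the mapping of expectations onto the four states is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    OK = "OK"
    FAIL = "FAIL"
    UNEXPECTEDOK = "UNEXPECTEDOK"
    EXPECTEDFAIL = "EXPECTEDFAIL"


def outcome(passed: bool, expected_to_pass: bool = True) -> Outcome:
    """Combine the observed result with the stated expectation."""
    if expected_to_pass:
        return Outcome.OK if passed else Outcome.FAIL
    return Outcome.UNEXPECTEDOK if passed else Outcome.EXPECTEDFAIL


def collapse(o: Outcome) -> bool:
    """Two-state view for users who ignore expectations: did it pass?"""
    return o in (Outcome.OK, Outcome.UNEXPECTEDOK)


results = [outcome(True), outcome(False),
           outcome(True, expected_to_pass=False),
           outcome(False, expected_to_pass=False)]

for r in results:
    print(r.value, "->", "pass" if collapse(r) else "fail")
```

Reporting four states costs nothing for users who only want two, since the collapse is a one-line function, whereas the reverse reconstruction is not possible.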
Regards, Jeff