11.01.2012, 20:31, "Jeffrey Cunningham" jeffrey@jkcunningham.com:
I really have no idea what is common practice in standard Unit Testing protocols - it isn't my background (which is mathematics). The only reason I suggested the additions is that it is useful information, some of which is lost if you don't have all four cases. And in my consulting practice I have used all four and seen them in use by others in one form or another in most test settings.
Maybe you are right: when a test marked as a "known failure" passes, we should draw user and developer attention to it instead of just marking it as "OK".
My concern is that I want to keep things as simple as possible. Supporting known failure / unexpected OK would require unifying the way these statuses are represented across all the testing frameworks used by CL libraries. Taking into account that some testing frameworks do not even compile on some lisps, I am considering postponing this level of result detail until we have a reliable way to deliver results.
Also, I display not the status of individual tests, but an aggregated status of the whole test suite. If all the failures are "known", the aggregated status will be "known failure". If all the OKs are unexpected, the aggregated status is "unexpected OK". But if both known failures and unexpected OKs are present, how should they be combined? Probably just as "fail", expecting the maintainer to click the status, open the full library log, and find the details there.
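For illustration, here is a minimal sketch of such aggregation in Common Lisp (the keyword names and the function COMBINE-STATUSES are just examples, not the actual cl-test-grid code):

  (defun combine-statuses (statuses)
    "STATUSES is a list of per-test statuses: :ok, :fail, :known-fail
  or :unexpected-ok. Returns a single aggregated status for the suite."
    (let ((has-fail (member :fail statuses))
          (has-known-fail (member :known-fail statuses))
          (has-unexpected-ok (member :unexpected-ok statuses)))
      (cond (has-fail :fail)                       ; any plain failure dominates
            ((and has-known-fail has-unexpected-ok) :fail) ; mixed -> just "fail"
            (has-known-fail :known-fail)           ; only known failures (plus OKs)
            (has-unexpected-ok :unexpected-ok)     ; only unexpected OKs (plus OKs)
            (t :ok))))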
There are many good descriptions of binary hypothesis testing, here is one: [...] from http://cnx.org/content/m11531/latest/
(the two models in this setting would be something like H1='test passes' and H0='test fails')
Whether a test fails or passes is not a hypothesis, but a given measurement - we know the test status from the test suite. I have the impression you are speaking not about tests marked as "known failure", but about error-handling tests, where we expect particular code to signal an error and the test verifies that the error is really signaled. If the error is signaled, the test passes; if it is not signaled when expected, the test fails. That is a separate question, which I leave to the test developers.
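To make the distinction concrete, here is a rough sketch of such an error-handling check in plain Common Lisp (SIGNALS-ERROR-P is an illustrative helper, not part of any existing test framework):

  (defmacro signals-error-p (error-type &body body)
    "Return T if evaluating BODY signals a condition of type ERROR-TYPE,
  NIL if BODY returns normally."
    `(handler-case (progn ,@body nil)
       (,error-type () t)))

  ;; Passes: dividing by zero signals DIVISION-BY-ZERO.
  (signals-error-p division-by-zero (/ 1 0))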
If we take test pass/fail as given measurements, the hypothesis pair a user is interested in is H0 "The library version is broken" and H1 "I can use this version of the library, it correctly implements the functions specified".
Another pair of hypotheses, important for a developer, is: H0 "My recent changes did not break anything" and H1 "My recent changes introduced new bugs". That is where annotating the given measurement "test fails" with the attribute "known failure" helps.
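For example, a tiny sketch of how the annotation lets a developer separate old failures from new ones (the variable *KNOWN-FAILURES* and the test names are hypothetical):

  (defparameter *known-failures* '(test-foo test-bar)
    "Tests that were already failing before the recent changes.")

  (defun new-failures (failed-tests)
    "Failures not covered by *KNOWN-FAILURES*; a non-empty result
  suggests the recent changes introduced new bugs (H1), an empty
  result supports H0."
    (set-difference failed-tests *known-failures*))

  ;; (new-failures '(test-foo test-baz)) => (TEST-BAZ)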
One might argue that Bayesian testing procedures are not appropriate in software verification tests, but I think this would be short-sighted.
You are right. QA professionals approach the problem using statistical methods. I remember a university course about software reliability; it described methods to predict the number of undetected bugs remaining in a system, the probability of failure during use of the system, etc.