Hello.
For my Common Lisp testing project I aggregate the results of a library's test suite into a single value - ok/fail.
I just tested ECL and got the following output from the CFFI test suite:
4 out of 228 total tests failed: DEFCFUN.NOOP, CALLBACKS.BFF.1, STRING.ENCODING.UTF-16.BASIC, STRING.ENCODINGS.ALL.BASIC. No unexpected failures.
What meaning do you put into the term "expected failure"? Does it mean the library is buggy, but these bugs are known? Or does it mean that some non-required features are absent, but the library is in general OK?
I am interested both in a short answer - as a library author, how do you think the CFFI test suite should be marked if only expected failures are present - OK or FAIL?
And I am also curious, in this concrete example, what do these 4 failures mean for CFFI on ECL?
Best regards, - Anton
On Tue, Jan 10, 2012 at 12:00 PM, Anton Vodonosov avodonosov@yandex.ru wrote:
What meaning do you put into the term "expected failure"? Does it mean the library is buggy, but these bugs are known? Or does it mean that some non-required features are absent, but the library is in general OK?
I may have not been consistent in my usage of marking expected failures, but they mark known bugs unlikely to be fixed in the short-term. Either in CFFI or in the Lisp implementation. Now that we have a bugtracker, known failures should definitely point to their respective issue.
Non-required features are marked using cffi-sys::foo symbols in *features* so we shouldn't have failures related to that.
I am interested both in a short answer - as a library author, how do you think the CFFI test suite should be marked if only expected failures are present - OK or FAIL?
Depends on the use case. In terms of notifications, I would rather be warned about new failures. In terms of a summary, I'd like to see the results broken down into OK, FAIL, KNOWNFAIL.
And I am also curious, in this concrete example, what do these 4 failures mean for CFFI on ECL?
DEFCFUN.NOOP fails because ECL's :void type returns NIL instead of (VALUES) and we'd need to implement a work-around in CFFI. Since this is such a minor detail, it will likely sit unfixed for the foreseeable future. CALLBACKS.BFF.1 is an ECL bug: it used to crash (or hang?) while compiling the SUM-126-NO-LL defcallback. It might have been fixed in the meantime. The string encoding failures are a known CFFI bug, IIRC.
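Roughly, the situation is something like this (an illustration only, not the actual test; it assumes a no-argument, no-result C function "noop" from the test library):

;; Illustration of the DEFCFUN.NOOP issue (not the real test).
(cffi:defcfun ("noop" %noop) :void)   ; hypothetical binding of a no-op C function

(multiple-value-list (%noop))
;; with a proper :void return there are no values, so this is the empty list;
;; on ECL the call returns NIL, so the result is (NIL) instead.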
HTH,
On 01/10/2012 04:28 AM, Luís Oliveira wrote:
I am interested both in a short answer - as a library author, how do you think the CFFI test suite should be marked if only expected failures are present - OK or FAIL?
Depends on the use case. In terms of notifications, I would rather be warned about new failures. In terms of a summary, I'd like to see the results broken down into OK, FAIL, KNOWNFAIL.
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL? You have to consider the cases where one expects a failure but it passes, too. If those are too long, use the Bayes approach: FP and FN (false-positive and false-negative).
Jeff
On 01/10/2012 04:28 AM, Luís Oliveira wrote:
I may have not been consistent in my usage of marking expected failures, but they mark known bugs unlikely to be fixed in the short-term. Either in CFFI or in the Lisp implementation. ... In terms of notifications, I would rather be warned about new failures. In terms of a summary, I'd like to see the results broken down into OK, FAIL, KNOWNFAIL.
10.01.2012, 23:12, "Jeff Cunningham" jeffrey@jkcunningham.com:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL? You have to consider the cases where one expects a failure but it passes, too.
I think it is rather theoretical. If no test framework provides a notion of UNEXPECTEDOK, that means it has not been needed in regression testing practice.
I am even reluctant about EXPECTEDFAIL, because the term is contradictory and its meaning is not obvious - it is confusing.
If we take into account that test results are observed not only by developers but also by library users, we can imagine a user seeing EXPECTEDFAIL and asking himself: "Expected FAIL... Is it OK? Can I use the library?"
But I see that several regression testing frameworks provide a notion of expected failures and developers use it.
And now I understand the goal - to simplify detection of _new_ regressions.
Therefore I think I will introduce an expected failure status in cl-test-grid (in the near future).
Jeff Cunningham:
In a testing scenario, "expected failure" to me means the test was designed to fail and it did. Usually, these are set up to test error handling.
Robert Goldman:
That is not how the term is used in the CFFI tests, or in most of the unit testing libraries.
Indeed, Robert is right. If we want to test error handling by designing a test which should signal an error, then if the error is really signaled the test status is OK, and if the error is not signaled the status is FAIL.
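For example, with the RT framework (which the CFFI test suite uses), such an error-handling test might look roughly like this (a sketch, not an actual CFFI test):

;; The test PASSES only if an error is actually signaled by the body.
(rt:deftest error-handling.example
    (handler-case (error "something went wrong")
      (error () :error-signaled))
  :error-signaled)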
In a large testing environment we would periodically toss in a couple of tests we knew would fail - one thing it tests is that the people running the tests are actually running them. If they didn't come back with these failures we knew there was a breakdown in the process.
The goal of cl-test-grid is that if people are running tests, the results are shared and we can always check the online reports :)
PS: even today it is possible to distinguish expected failures from unexpected ones in cl-test-grid - one just needs to click the "fail" link to open the logs, where the test suite prints whether the failures are unexpected or not. BTW, you might have noticed that CFFI has unexpected failures on almost all the Lisp implementations.
Best regards, - Anton
On Wed, 11 Jan 2012 09:01:19 +0400, Anton Vodonosov said:
On 01/10/2012 04:28 AM, Luís Oliveira wrote:
I may have not been consistent in my usage of marking expected failures, but they mark known bugs unlikely to be fixed in the short-term. Either in CFFI or in the Lisp implementation. ... In terms of notifications, I would rather be warned about new failures. In terms of a summary, I'd like to see the results broken down into OK, FAIL, KNOWNFAIL.
10.01.2012, 23:12, "Jeff Cunningham" jeffrey@jkcunningham.com:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL? You have to consider the cases where one expects a failure but it passes, too.
I think it is rather theoretical. If no test framework provides a notion of UNEXPECTEDOK, that means it has not been needed in regression testing practice.
I am even reluctant about EXPECTEDFAIL, because the term is contradictory and its meaning is not obvious - it is confusing.
If we take into account that test results are observed not only by developers but also by library users, we can imagine a user seeing EXPECTEDFAIL and asking himself: "Expected FAIL... Is it OK? Can I use the library?"
But I see that several regression testing frameworks provide a notion of expected failures and developers use it.
And now I understand the goal - to simplify detection of _new_ regressions.
Therefore I think I will introduce an expected failure status in cl-test-grid (in the near future).
FWIW, our internal test harness uses the term "known" rather than "expected" for this situation.
The per-release/per-platform list of known failures is kept separate from the tests, which allows the test harness to report success as long as the set of known failures matches.
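A rough sketch of that idea (made-up names, not our actual harness):

;; The run is reported as a success as long as the observed failures
;; are exactly the known failures for this release/platform.
(defun suite-status (observed-failures known-failures)
  (if (and (subsetp observed-failures known-failures :test #'string-equal)
           (subsetp known-failures observed-failures :test #'string-equal))
      :ok
      :fail))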
__Martin
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms: PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED (XPASS and XFAIL are not in POSIX; change test polarity if desired) http://www.gnu.org/software/dejagnu/manual/x47.html#posix
- Daniel
On Wed, 11 Jan 2012, Daniel Herring wrote:
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms: PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED (XPASS and XFAIL are not in POSIX; change test polarity if desired) http://www.gnu.org/software/dejagnu/manual/x47.html#posix
See also these test protocols: http://testanything.org/ https://launchpad.net/subunit
- Daniel
I would create a KNOWNFAILURE status (or something like that) for failures which are known but which the developers won't bother to fix for the present time.
My two cents.
2012/1/11 Daniel Herring dherring@tentpost.com
On Wed, 11 Jan 2012, Daniel Herring wrote:
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms: PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED (XPASS and XFAIL are not in POSIX; change test polarity if desired) http://www.gnu.org/software/dejagnu/manual/x47.html#posix
See also these test protocols: http://testanything.org/ https://launchpad.net/subunit
- Daniel
On 1/11/12 Jan 11 -1:16 AM, Daniel Herring wrote:
On Wed, 11 Jan 2012, Daniel Herring wrote:
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms: PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED (XPASS and XFAIL are not in POSIX; change test polarity if desired) http://www.gnu.org/software/dejagnu/manual/x47.html#posix
I guess I'd be inclined to say "too bad for POSIX" and add XPASS and XFAIL....
The reason that I'd be willing to flout (or "extend and extinguish" ;->) the standard is that there is no obvious advantage to POSIX compliance in this case that would compensate for the loss in information.
cheers, r
On Wed, 11 Jan 2012 07:00:53 -0800, Robert Goldman rpgoldman@sift.info wrote:
On 1/11/12 Jan 11 -1:16 AM, Daniel Herring wrote:
On Wed, 11 Jan 2012, Daniel Herring wrote:
On Tue, 10 Jan 2012, Jeff Cunningham wrote:
How about OK, FAIL, UNEXPECTEDOK, and EXPECTEDFAIL?
FWIW, here's one established set of terms: PASS, FAIL, UNRESOLVED, UNTESTED, UNSUPPORTED (XPASS and XFAIL are not in POSIX; change test polarity if desired) http://www.gnu.org/software/dejagnu/manual/x47.html#posix
I guess I'd be inclined to say "too bad for POSIX" and add XPASS and XFAIL....
The reason that I'd be willing to flout (or "extend and extinguish" ;->) the standard is that there is no obvious advantage to POSIX compliance in this case that would compensate for the loss in information.
cheers, r
I agree.
I really have no idea what is common practice in standard Unit Testing protocols - it isn't my background (which is mathematics). The only reason I suggested the additions is that it is useful information, some of which is lost if you don't have all four cases. And in my consulting practice I have used all four and seen them in use by others in one form or another in most test settings.
There are many good descriptions of binary hypothesis testing, here is one: (the two models in this setting would be something like H1='test passes' and H0='test fails')
"In binary hypothesis testing, assuming at least one of the two models does indeed correspond to reality, there are four possible scenarios:
Case 1: H0 is true, and we declare H0 to be true
Case 2: H0 is true, but we declare H1 to be true
Case 3: H1 is true, and we declare H1 to be true
Case 4: H1 is true, but we declare H0 to be true
In cases 2 and 4, errors occur. The names given to these errors depend on the area of application. In statistics, they are called type I and type II errors respectively, while in signal processing they are known as a false alarm or a miss."
(from http://cnx.org/content/m11531/latest/)
One might argue that Bayes testing procedures are not appropriate in software verification tests but I think this would be short-sighted. It is virtually impossible to design tests which cover every possible data/usage scenario for any but the simplest pieces of code. So what in fact happens is that the test designer picks the tests he thinks are most important. That's where the statistics come in, in the broader sense. Testing several hundred out of the hundreds of thousands or millions of possible permutations of test parameters always implies that statistical assumptions are being made. Being limited to 2 of 4 test results makes it impossible to evaluate the results with any degree of rigor.
I am indifferent as to the terminology applied to cases 2 and 4, so long as they are available. If they are not, it throws unnecessary uncertainty over the entire corpus of test results. And having them available doesn't force those who don't see their necessity to use them. They can choose to simply ignore them and limit their information to the two conditional cases:
{Case 1 | not Case 2} {Case 3 | not Case 4}
Regards, Jeff
11.01.2012, 20:31, "Jeffrey Cunningham" jeffrey@jkcunningham.com:
I really have no idea what is common practice in standard Unit Testing protocols - it isn't my background (which is mathematics). The only reason I suggested the additions is that it is useful information, some of which is lost if you don't have all four cases. And in my consulting practice I have used all four and seen them in use by others in one form or another in most test settings.
Maybe you are right, and when a test marked as a "known failure" passes we should draw user and developer attention to it rather than just mark it as "OK".
My concern is that I want to keep things as simple as possible - support for known fail / unexpected OK would require unifying the way it is represented in all the testing frameworks used by CL libraries. Taking into account that some testing frameworks just do not compile on some Lisps, I am considering ways to postpone detailing the results until we have a reliable way to deliver them.
Also, I display not the status of individual tests, but an aggregated status of the whole test suite. If all the failures are "known", the aggregated status will be "known failure". If all the OKs are unexpected, the aggregated status is "unexpected OK". But if both known failures and unexpected OKs are present, how do we combine them? Probably just as "fail", and expect the maintainer to click the status to open the full library log and find the details there.
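Something like the following combination rule, perhaps (just a sketch of the idea, not the actual cl-test-grid code):

;; Combine per-test statuses into one aggregated status for the suite.
(defun aggregate-status (statuses)
  (cond ((member :fail statuses) :fail)             ; any unexpected failure wins
        ((and (member :known-failure statuses)
              (member :unexpected-ok statuses))
         :fail)                                     ; mixed case - let the maintainer check the log
        ((member :unexpected-ok statuses) :unexpected-ok)
        ((member :known-failure statuses) :known-failure)
        (t :ok)))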
There are many good descriptions of binary hypothesis testing, here is one: [...] from http://cnx.org/content/m11531/latest/
(the two models in this setting would be something like H1='test passes' and H0='test fails')
Whether a test fails or passes is not a hypothesis, but a given measurement - we know the test status from the test suite. I have the impression you are speaking not about tests marked as "known failure", but about error handling tests, where we expect particular code to signal an error, and the test verifies that the error is really signaled. If the error is signaled, the test passes; if it is not signaled when expected, the test fails. That is another question, which I leave to the test developers.
If we have test pass/fail as given measurements, the hypothesis pair a user is interested in is H0 "The library version is broken" and H1 "I can use this version of the library; it correctly implements the functions specified".
Another pair of hypotheses, important for a developer, is: H0 "My recent changes did not break anything" and H1 "My recent changes introduced new bugs". That's where annotating the given measurement "test fails" with the attribute "known failure" helps.
One might argue that Bayes testing procedures are not appropriate in software verification tests but I think this would be short-sighted.
You are right. QA professionals approach the problem using statistical methods. I remember in university there was a course about software reliability; it described methods to predict the number of undetected bugs remaining in a system, the probability of failure during use of the system, etc.
Hello.
Thanks everyone who answered in this thread. It was very helpful.
I now collect test results for CFFI (and other libraries using the RT test framework) down to individual test failures, including the information about which failures are known (aka "expected"). Now I am implementing this for other test frameworks.
As for CFFI, you can see that of the 14 Lisp/OS combinations we've run the tests on, only on two of them are all the failures known: http://common-lisp.net/project/cl-test-grid/pivot_lib-lisp_ql.html
I also apply the "known failures" idea in another way - I compare the test results of two consecutive Quicklisp distributions on the same Lisp implementation and detect new failures which were absent in the old version. (Here the "known" failures are the failures in the previous version - something along the lines of keeping the list of "known" failures separate from the tests.)
That way, even if the library test suite already had failures, we can detect and quickly react to new bugs.
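In essence this comparison is a simple set difference (a sketch; the real cl-test-grid names may differ):

;; Failures present in the new run but absent from the previous run
;; are the new regressions we want to react to quickly.
(defun new-failures (previous-run-failures current-run-failures)
  (set-difference current-run-failures previous-run-failures
                  :test #'string-equal))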
Best regards, - Anton
On Sun, Mar 11, 2012 at 8:22 PM, Anton Vodonosov avodonosov@yandex.ru wrote:
As for CFFI, you can see that of the 14 Lisp/OS combinations we've run the tests on, only on two of them are all the failures known: http://common-lisp.net/project/cl-test-grid/pivot_lib-lisp_ql.html
Cool stuff!
However, the results are a bit depressing. So many fails. :-) Perhaps you could show the ratio of failed to total tests and maybe show known-fail/unexpected-ok in yellow/orange rather than red.
I wonder if compilation errors could be printed. Error messages like this are not very helpful: http://cl-test-grid.appspot.com/blob?key=AMIfv97suboJpeei-uBWzlkqcR7CTlyh0Izhvi7u_29HNBgu80ScYf0Mj6zWPjgbsosA-F0Q12HP8o9S5zhsEelTfss8_3C7sjgcuG_q_grR-jMfXPLLRzu6CNytLoNk23rwqlQ6AsajxTRYFubFbz3iBWl5uo8iZQ.
Cheers,
On 3/12/12 Mar 12 -6:55 PM, Luís Oliveira wrote:
On Sun, Mar 11, 2012 at 8:22 PM, Anton Vodonosov avodonosov@yandex.ru wrote:
As for CFFI, you can see that of the 14 Lisp/OS combinations we've run the tests on, only on two of them are all the failures known: http://common-lisp.net/project/cl-test-grid/pivot_lib-lisp_ql.html
Cool stuff!
However, the results are a bit depressing. So many fails. :-) Perhaps you could show the ratio of failed to total tests and maybe show known-fail/unexpected-ok in yellow/orange rather than red.
I wonder if compilation errors could be printed. Error messages like this are not very helpful: http://cl-test-grid.appspot.com/blob?key=AMIfv97suboJpeei-uBWzlkqcR7CTlyh0Izhvi7u_29HNBgu80ScYf0Mj6zWPjgbsosA-F0Q12HP8o9S5zhsEelTfss8_3C7sjgcuG_q_grR-jMfXPLLRzu6CNytLoNk23rwqlQ6AsajxTRYFubFbz3iBWl5uo8iZQ.
Cheers,
In the hopes it will be helpful, here are the test results after a git pull of CFFI today, on Mac OS X, Allegro CL 8.2 64-bit:
21 out of 260 total tests failed: FUNCALL.VARARGS.DOUBLE, DEFCFUN.UNSIGNED-LONG-LONG, DEFCFUN.NOOP, DEFCFUN.VARARGS.FLOAT, DEFCFUN.VARARGS.DOUBLE, DEFCFUN.BFF.1, DEFCFUN.BFF.2, CALLBACKS.BFF.1, CALLBACKS.BFF.2, FOREIGN-GLOBALS.REF.UPPERCASEINT2, FOREIGN-GLOBALS.REF.UPPER-CASE-INT2, FOREIGN-GLOBALS.REF.MIXEDCASEINT2, FOREIGN-GLOBALS.REF.MIXED-CASE-INT2, FOREIGN-ALLOC.10, POINTERP.4, POINTERP.5, POINTER-EQ.NON-POINTERS.1, POINTER-EQ.NON-POINTERS.2, NULL-POINTER-P.NON-POINTER.2, STRING.ENCODING.UTF-16.BASIC, STRING.ENCODINGS.ALL.BASIC. 16 unexpected failures: FUNCALL.VARARGS.DOUBLE, DEFCFUN.VARARGS.FLOAT, DEFCFUN.VARARGS.DOUBLE, DEFCFUN.BFF.1, DEFCFUN.BFF.2, CALLBACKS.BFF.2, FOREIGN-GLOBALS.REF.UPPERCASEINT2, FOREIGN-GLOBALS.REF.UPPER-CASE-INT2, FOREIGN-GLOBALS.REF.MIXEDCASEINT2, FOREIGN-GLOBALS.REF.MIXED-CASE-INT2, FOREIGN-ALLOC.10, POINTERP.4, POINTERP.5, POINTER-EQ.NON-POINTERS.1, POINTER-EQ.NON-POINTERS.2, NULL-POINTER-P.NON-POINTER.2.
If this is the sort of thing you want, I will generate more of these. I have a bunch of different CL implementations installed so that I can test ASDF.
On Tue, Mar 13, 2012 at 3:40 PM, Robert Goldman rpgoldman@sift.info wrote:
If this is the sort of thing you want, I will generate more of these.
What I meant was that cl-test-grid's output is not very informative when there's a compilation error while loading the test suite. Does that make sense?
Hello, thanks for the feedback
13.03.2012, 03:55, "Luís Oliveira" luismbo@gmail.com:
However, the results are a bit depressing. So many fails. :-)
Not so many - 20 failed tests in total. The same failures repeat on different Lisps.
Here is the breakdown by failures, for the quicklisp 2012-02-08 and the lisp implementations we tested:
"callbacks.bff.1" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-macosx-x64" "ccl-1.7-f95-win-x86" "ccl-1.8-f95-macosx-x64" "cmu-20c_release-20c__20c_unicode_-linux-x86" "ecl-11.1.1-606449eb-linux-x86") "callbacks.bff.2" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-macosx-x64" "ccl-1.7-f95-win-x86" "ccl-1.8-f95-macosx-x64" "cmu-20c_release-20c__20c_unicode_-linux-x86") "callbacks.uninterned" => ("ecl-11.1.1-606449eb-linux-x86") "defcfun.bff.2" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-win-x86" "clisp-2.49-unix" "clisp-2.49-win" "cmu-20c_release-20c__20c_unicode_-linux-x86" "sbcl-1.0.54-linux-x86") "defcfun.noop" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-macosx-x64" "ccl-1.7-f95-win-x86" "ccl-1.8-f95-macosx-x64" "ecl-11.1.1-606449eb-linux-x86") "defcfun.stdcall.1" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "defcfun.undefined" => ("cmu-20c_release-20c__20c_unicode_-linux-x86") "defcfun.varargs.double" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "defcfun.varargs.float" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "foreign-symbol-pointer.1" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "funcall.stdcall.1" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "funcall.varargs.double" => ("ccl-1.7-f95-win-x86" "clisp-2.49-win") "string.encoding.utf-16.basic" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-macosx-x64" "ccl-1.7-f95-win-x86" "ccl-1.8-f95-macosx-x64" "clisp-2.49-unix" "clisp-2.49-win" "cmu-20c_release-20c__20c_unicode_-linux-x86" "ecl-11.1.1-606449eb-linux-x86" "sbcl-1.0.49-linux-amd64" "sbcl-1.0.54-linux-x86" "sbcl-1.0.54.45-a2bef14-macosx-x64") "string.encodings.all.basic" => ("ccl-1.7-f95-linux-x86" "ccl-1.7-f95-macosx-x64" "ccl-1.7-f95-win-x86" "ccl-1.8-f95-macosx-x64" "clisp-2.49-unix" "clisp-2.49-win" "cmu-20c_release-20c__20c_unicode_-linux-x86" "ecl-11.1.1-606449eb-linux-x86" "sbcl-1.0.49-linux-amd64" "sbcl-1.0.54-linux-x86" "sbcl-1.0.54.45-a2bef14-macosx-x64") "struct.alignment.3" => ("ccl-1.7-f95-linux-x86") "struct.alignment.4" => ("ccl-1.7-f95-linux-x86") "struct.alignment.5" => ("ccl-1.7-f95-linux-x86") "struct.alignment.6" => ("ccl-1.7-f95-linux-x86") "struct.alignment.7" => ("ccl-1.7-f95-linux-x86") "struct.alignment.8" => ("ccl-1.7-f95-linux-x86")
Perhaps you could show the ratio of failed to total tests
Can't do it now - we do not collect the total number of tests in a test suite. In my opinion it is not very useful information (compared to the list of failed tests, which I hope will allow us to prevent new failures in the future).
and maybe show known-fail/unexpected-ok in yellow/orange rather than red.
this is doable - added a TODO item
I wonder if compilation errors could be printed. Error messages like this are not very helpful: http://cl-test-grid.appspot.com/blob?key=AMIfv97suboJpeei-uBWzlkqcR7CTlyh0Izhvi7u_29HNBgu80ScYf0Mj6zWPjgbsosA-F0Q12HP8o9S5zhsEelTfss8_3C7sjgcuG_q_grR-jMfXPLLRzu6CNytLoNk23rwqlQ6AsajxTRYFubFbz3iBWl5uo8iZQ.
Lisp compilation failures are present in the logs, but in this case we have a C library compilation error. This can be seen from the component description in the ASDF error message: #<C-TEST-LIB "cffi-tests" "tests" "libtest">
Actually, I try to represent this case - absence of the native library and therefore the inability to run the tests - by the status :no-resource. But this failure is signaled differently on windows and linux; I didn't notice this, and as a result this linux test run was unable to recognize the failure as :no-resource.
Some details.
In cffi-tests.asd the
(defmethod perform ((o compile-op) (c c-test-lib))
does not try to run "make" on windows,
#-windows (unless (zerop (run-shell-command "cd ~A; make"
and on windows the only error signaled in the absence of the native library is cffi:load-foreign-library-error. I rely on it when detecting :no-resource.
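Roughly like this (simplified; the real cl-test-grid code is more involved):

;; Sketch: treat a missing native test library as :no-resource
;; rather than as a test failure.
(handler-case
    (ql:quickload :cffi-tests)
  (cffi:load-foreign-library-error () :no-resource))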
Added a TODO item to detect it as :no-resource on linux too, when the compile-op for the library fails.
Best regards, - Anton
On Tue, Mar 13, 2012 at 9:31 PM, Anton Vodonosov avodonosov@yandex.ru wrote:
13.03.2012, 03:55, "Luís Oliveira" luismbo@gmail.com:
However, the results are a bit depressing. So many fails. :-)
Not so many - 20 failed tests in total. The same failures repeat on different Lisps.
Exactly. There are not that many failures, but the picture drawn by the grid makes the current status look grimmer than it is. :-)
Perhaps you could show the ratio of failed to total tests
Can't do it now - we do not collect the total number of tests in a test suite. In my opinion it is not very useful information (compared to the list of failed tests, which I hope will allow us to prevent new failures in the future).
Agreed. It's not very important.
Lisp compilation failures are present in the logs, but in this case we have a C library compilation error. This can be seen from the component description in the ASDF error message: #<C-TEST-LIB "cffi-tests" "tests" "libtest">
Ah, my bad. While I'm making feature requests, perhaps a backtrace would be useful, though? :-)
In cffi-tests.asd the
(defmethod perform ((o compile-op) (c c-test-lib))
does not try to run "make" on windows,
#-windows (unless (zerop (run-shell-command "cd ~A; make"
and on windows the only error signaled in the absence of the native library is cffi:load-foreign-library-error. I rely on it when detecting :no-resource.
Should we assume that 'make' is available on windows? Is that a common setup for cygwin/mingw? What about other toolsets such as Microsoft's? Do they ship with make?
Cheers,
14.03.2012, 04:01, "Luís Oliveira" luismbo@gmail.com:
Lisp compilation failures are present in the logs, but in this case we have a C library compilation error. This can be seen from the component description in the ASDF error message: #<C-TEST-LIB "cffi-tests" "tests" "libtest">
Ah, my bad. While I'm making feature requests, perhaps a backtrace would be useful, though? :-)
They would, of course... but there is no portable way to retrieve a backtrace of a CL condition.
In cffi-tests.asd the
(defmethod perform ((o compile-op) (c c-test-lib))
does not try to run "make" on windows,
#-windows (unless (zerop (run-shell-command "cd ~A; make"
and on windows the only error signaled in the absence of the native library is cffi:load-foreign-library-error. I rely on it when detecting :no-resource.
Should we assume that 'make' is available on windows? Is that a common setup for cygwin/mingw? What about other toolsets such as Microsoft's? Do they ship with make?
GNU make is often installed with cygwin (but not so many people install cygwin). Microsoft Visual Studio ships with a program called nmake - mostly compatible with make, but not completely. Again, not everyone has Visual Studio. Actually, make is not that necessary; it's just one C file we need to compile, and the compile.bat from the tests directory is OK (but it requires the Microsoft Visual C compiler anyway).
In short, even if we implement the compile-op for windows, it is going to fail for 85% of users.
I think it's OK to ask the user to compile the library manually.
What I need to fix is to more correctly recognize the cases when the tests are impossible to run. Probably some improvements in the compile-op implementation will help here, but I don't know yet what they would be.
BTW, to be clear, the failure on ABCL represented in the logs as
Class not found: com.sun.jna.Native
is also :no-resource - CFFI on ABCL requires jna.jar to be present on the classpath, which I don't have.
I have a TODO item to recognize this situation as :no-resource, too.
On Wed, Mar 14, 2012 at 12:15 AM, Anton Vodonosov avodonosov@yandex.ru wrote:
They would, of course... but there is no portable way to retrieve a backtrace of a CL condition.
swank should provide a portable way to do that.
GNU make is often installed with cygwin (but not so many people install cygwin). Microsoft Visual Studio ships with a program called nmake - mostly compatible with make, but not completely. Again, not everyone has Visual Studio. Actually, make is not that necessary; it's just one C file we need to compile, and the compile.bat from the tests directory is OK (but it requires the Microsoft Visual C compiler anyway).
In short, even if we implement the compile-op for windows, it is going to fail for 85% of users.
I think it's OK to ask the user to compile the library manually.
Sure, but it's not suitable for a buildbot or something along those lines. I assume that at some point test results for cl-test-grid will be provided by automated runs?
I've registered this issue in the bug tracker...
On Wed, 2012-03-14 at 00:29 +0000, Luís Oliveira wrote:
On Wed, Mar 14, 2012 at 12:15 AM, Anton Vodonosov avodonosov@yandex.ru wrote:
They would, of course... but there is no portable way to retrieve a backtrace of a CL condition.
swank should provide a portable way to do that.
Or even better: https://gitorious.org/conium
There is trivial-backtrace, I believe, if one is willing to include that in the CL-test-grid. Possibly a "smaller" include than swank.
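Something along these lines, perhaps (a sketch; run-library-tests is a made-up placeholder for the actual test-runner entry point):

;; Print a backtrace for any error escaping the test run,
;; using trivial-backtrace instead of pulling in swank.
(handler-bind ((error (lambda (condition)
                        (trivial-backtrace:print-backtrace
                         condition :output *error-output*))))
  (run-library-tests))   ; hypothetical entry point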
Sent from my iPad
On Mar 13, 2012, at 19:30, "Luís Oliveira" luismbo@gmail.com wrote:
On Wed, Mar 14, 2012 at 12:15 AM, Anton Vodonosov avodonosov@yandex.ru wrote:
They would, of course... but there is no portable way to retrieve a backtrace of a CL condition.
swank should provide a portable way to do that.
GNU make is often installed with cygwin (but not so many people install cygwin). Microsoft Visual Studio ships with a program called nmake - mostly compatible with make, but not completely. Again, not everyone has Visual Studio. Actually, make is not that necessary; it's just one C file we need to compile, and the compile.bat from the tests directory is OK (but it requires the Microsoft Visual C compiler anyway).
In short, even if we implement the compile-op for windows, it is going to fail for 85% of users.
I think it's OK to ask the user to compile the library manually.
Sure, but it's not suitable for a buildbot or something along those lines. I assume that at some point test results for cl-test-grid will be provided by automated runs?
I've registered this issue in the bug tracker...
-- Luís Oliveira http://r42.eu/~luis/
On Mar 13, 2012, at 19:30, "Luís Oliveira" luismbo@gmail.com wrote:
On Wed, Mar 14, 2012 at 12:15 AM, Anton Vodonosov avodonosov@yandex.ru wrote:
They would, of course... but there is no portable way to retrieve a backtrace of a CL condition.
swank should provide a portable way to do that.
14.03.2012, 04:35, "Robert Goldman" rpgoldman@gmail.com:
There is trivial-backtrace, I believe, if one is willing to include that in the CL-test-grid. Possibly a "smaller" include than swank.
14.03.2012, 04:34, "Stelian Ionescu" sionescu@cddr.org:
Or even better: https://gitorious.org/conium
I knew about trivial-backtrace (it borrows code from swank), but conium is new to me.
I will keep that in mind, but it will not always work (neither of them will).
For example ECL - its Lisp-to-C compiler strips function names, so the backtrace is unreadable (both with swank and with trivial-backtrace).
And for ECL, last time I tried, it was only able to retrieve a backtrace of the REPL thread, but not of other threads.
Actually, the backtrace of a compilation error will always be similar: test-grid::libtest -> quicklisp:quickload -> asdf:operate ... It will not give that much information.
Also, I want to keep the test runner workable even if trivial-backtrace or some other dependency can't be compiled on a given Lisp. So it should be implemented with care.
Considering all this, I think in the near future I will not work on adding backtraces. While they are good, usually it's not very difficult to understand the reason without a backtrace (in the worst case, by running the tests again to reproduce the problem).
Best regards, - Anton
14.03.2012, 04:29, "Luís Oliveira" luismbo@gmail.com:
I think it's OK to ask the user to compile the library manually.
Sure, but it's not suitable for a buildbot or something along those lines. I assume that at some point test results for cl-test-grid will be provided by automated runs?
In an ideal world the build bot will work fully automatically. But some library test suites are not fully automated. I thought about an approach where, by default, the contributor makes zero effort - just runs a simple command.
But if he is willing to help more, we may inform him what manual preparations make it possible to test more libraries. In the default mode these non-automated libraries just have the status :no-resource.
Probably the extreme example of a library which cannot be fully automated is cl-sql. Its test suite needs running SQL servers (for all the backends - postgres, oracle, etc.).
I've registered this issue in the bug tracker...
I can't find it on launchpad...
Idea - maybe just committing a precompiled .dll binary would be a solution?
On Wed, Mar 14, 2012 at 1:07 AM, Anton Vodonosov avodonosov@yandex.ru wrote:
I can't find it on launchpad...
https://bugs.launchpad.net/cffi/+bug/954615
On 01/10/2012 04:00 AM, Anton Vodonosov wrote:
Hello.
For my Common Lisp testing project I aggregate the results of a library's test suite into a single value - ok/fail.
I just tested ECL and got the following output from the CFFI test suite:
4 out of 228 total tests failed: DEFCFUN.NOOP, CALLBACKS.BFF.1, STRING.ENCODING.UTF-16.BASIC, STRING.ENCODINGS.ALL.BASIC. No unexpected failures.
What meaning do you put into the term "expected failure"? Does it mean the library is buggy, but these bugs are known? Or does it mean that some non-required features are absent, but the library is in general OK?
I am interested both in a short answer - as a library author, how do you think the CFFI test suite should be marked if only expected failures are present - OK or FAIL?
And I am also curious, in this concrete example, what do these 4 failures mean for CFFI on ECL?
In a testing scenario, "expected failure" to me means the test was designed to fail and it did. Usually, these are set up to test error handling. In a large testing environment we would periodically toss in a couple of tests we knew would fail - one thing it tests is that the people running the tests are actually running them. If they didn't come back with these failures we knew there was a breakdown in the process.
Jeff
On 1/10/12 Jan 10 -5:51 PM, Jeff Cunningham wrote:
On 01/10/2012 04:00 AM, Anton Vodonosov wrote:
Hello.
For my Common Lisp testing project I aggregate the results of a library's test suite into a single value - ok/fail.
I just tested ECL and got the following output from the CFFI test suite:
4 out of 228 total tests failed: DEFCFUN.NOOP, CALLBACKS.BFF.1, STRING.ENCODING.UTF-16.BASIC, STRING.ENCODINGS.ALL.BASIC. No unexpected failures.
What meaning do you put into the term "expected failure"? Does it mean the library is buggy, but these bugs are known? Or does it mean that some non-required features are absent, but the library is in general OK?
I am interested both in a short answer - as a library author, how do you think the CFFI test suite should be marked if only expected failures are present - OK or FAIL?
And I am also curious, in this concrete example, what do these 4 failures mean for CFFI on ECL?
In a testing scenario, "expected failure" to me means the test was designed to fail and it did. Usually, these are set up to test error handling. In a large testing environment we would periodically toss in a couple of tests we knew would fail - one thing it tests is that the people running the tests are actually running them. If they didn't come back with these failures we knew there was a breakdown in the process.
That is not how the term is used in the CFFI tests, or in most of the unit testing libraries. Those libraries offer facilities for a test to PASS if and only if the code raises an error (or an error of a particular type).
We should not conflate the notion of "raise an error" (or, for that matter, "return NIL") with the notion of "fail a test." Tests are about verifying expectations, and we can have valid, test-passing expectations that code will raise an error.
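For instance, FiveAM (one such unit testing library) provides a signals check that passes exactly when the body signals the named condition - a generic example, not one from the CFFI suite:

(fiveam:test expects-an-error
  ;; Passes only if DIVISION-BY-ZERO is actually signaled.
  (fiveam:signals division-by-zero
    (/ 1 0)))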
best, r