Hello!
I recently wrote a script to run FiveAM tests for one of my libraries on many different implementations on your own machine: https://gitlab.common-lisp.net/uri-template/uri-template2/blob/master/run-te...
It would be really nice if I did not have to copy-paste that script to my other libraries, and instead could contribute a generalized version to Roswell (on which the script is based) and have it work for any project using any test library.
What I would like to be able to do, for any system:
(handler-case (asdf:test-system "system")
  (asdf:test-op-test-failure (condition)
    (princ condition uiop:*stderr*)
    (uiop:quit 1)))
The attached patch adds an ASDF:TEST-OP-TEST-FAILURE condition that test libraries can inherit from. I also added the necessary functionality to FiveAM: https://github.com/sionescu/fiveam/pull/58
It should be easy to add similar functionality to other testing libraries. This will make test automation trivial, with few, if any, changes required to systems that use ASDF and the testing libraries.
One thing I would like to discuss is which condition ASDF:TEST-OP-TEST-FAILURE should inherit from. It definitely should not be ERROR - there is no error in test-op, or in running the tests; test failures are a regular and expected outcome of running tests. Also problematic is the widespread abuse of HANDLER-CASE to catch any ERROR; I am afraid that, if signaled from popular test libraries, an ERROR would break someone's test-running code somewhere. WARNING seems like a nice parent condition, but maybe having ASDF:TEST-OP-TEST-FAILURE inherit directly from CONDITION is a better idea. Thoughts?
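For concreteness, here is a minimal sketch of what the condition and a test-library subclass could look like if CONDITION is chosen as the parent (the subclass, its slot, and its reporter are illustrative, not part of the attached patch):

;; In ASDF (sketch):
(define-condition test-op-test-failure (condition)
  ()
  (:documentation
   "Signaled by a test library when tests run under ASDF:TEST-OP fail."))

;; In a test library (illustrative subclass and reporter):
(define-condition my-test-failure (asdf:test-op-test-failure)
  ((failed-tests :initarg :failed-tests :reader failed-tests))
  (:report (lambda (condition stream)
             (format stream "~D test~:P failed: ~{~A~^, ~}"
                     (length (failed-tests condition))
                     (failed-tests condition)))))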
Vladimir
Stelian, I think there is something wrong with the reply you sent. The Subject header is mangled in the Mailman archives where I spotted it. I never received this email:
https://mailman.common-lisp.net/pipermail/asdf-devel/2019-September/006356.h...
I'd like to have something that gathers enough information to allow building a suitable integration with continuous integration systems (there are already a few ad-hoc ones around).
What exactly would that look like?
For example, a common taxonomy of test statuses is: SUCCEEDED, FAILED, BROKEN, SKIPPED, NOTSTARTED.
I do not see what this has to do with my proposal. FiveAM does not even distinguish failed from broken tests (I do not know if any Common Lisp test libraries do - a broken test would generally be a file compilation error). NOTSTARTED means having asynchronous updates go from the test library to some external system; that is completely out of scope of ASDF.
Whatever support ends up in ASDF should be sufficient to allow outputting common test report formats, such as JUnit, TAP, etc...
You can easily do this with my proposal by having the condition reporter output TAP or JUnit XML. Formatting test results is the responsibility of the test library.
There are libraries that help test libraries with this formatting: https://github.com/e-user/testbild https://github.com/brobinson9999/cl-tap-producerX in case you want to add JUnit XML output to FiveAM.
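As a sketch of the TAP case (the FAILED-TESTS and TOTAL-TESTS readers on a library-defined subclass are hypothetical, not part of the attached patch), the condition reporter could look something like:

(define-condition my-tap-test-failure (asdf:test-op-test-failure)
  ((failed-tests :initarg :failed-tests :reader failed-tests)
   (total-tests  :initarg :total-tests  :reader total-tests))
  (:report (lambda (condition stream)
             ;; Emit a minimal TAP document listing only the failures.
             (format stream "1..~D~%" (total-tests condition))
             (loop for name in (failed-tests condition)
                   for i from 1
                   do (format stream "not ok ~D - ~A~%" i name)))))

A harness that simply PRINCs the condition then gets a TAP report without knowing anything about the test library.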
otherwise we risk ending up with something that cannot be improved because users start relying on it and we don't want to break backwards compatibility.
This is not a problem because conditions can be sub-classed.
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
On 16 Sep 2019, at 4:23, Vladimir Sedach wrote:
Stelian, I think there is something wrong with the reply you sent. The Subject header is mangled in the Mailman archives where I spotted it. I never received this email:
https://mailman.common-lisp.net/pipermail/asdf-devel/2019-September/006356.h...
FWIW, I didn't either. The display of the subject line on the mailman site may suggest why (I don't know):
``` =?UTF-8?Q?Re:_PATCH:_SIGNAL_a_condition_on_test_failures, _for_use_with_A?= SDF:TEST-SYSTEM ```
I'd like to have something that gathers enough information to allow building a suitable integration with continuous integration systems (there are already a few ad-hoc ones around).
What exactly would that look like?
Don't a lot of these CI systems just rely on the exit status of a build step? I typically just wrap a trivial script around running the test-op from fiveam-asdf, and make lisp exit with a non-zero status when it encounters an error. That's been sufficient for me, but then I tend to just look at success or failure, and read the console output on failures. I don't do anything fancy like generate XML...
Robert Goldman rpgoldman@sift.info writes:
Don't a lot of these CI systems just rely on the exit status of a build step? I typically just wrap a trivial script around running the test-op from fiveam-asdf, and make lisp exit with a non-zero status when it encounters an error. That's been sufficient for me, but then I tend to just look at success or failure, and read the console output on failures. I don't do anything fancy like generate XML...
That is what all of the CI scripts I have looked at in various projects do (there may be others I have not seen; please post them if you know of any). FiveAM itself provides a typical example: https://github.com/sionescu/fiveam/blob/master/.travis.yml#L40
(uiop:quit (if (some (lambda (x) (typep x '5am::test-failure))
                     (5am:run :it.bese.fiveam))
               1
               0))
With my proposal, this becomes:
(handler-case (asdf:test-system "any-system")
  (asdf:test-op-test-failure (condition)
    (princ condition uiop:*stderr*)
    (uiop:quit 1)))
And now the lowest common denominator case for CI systems is automatic.
To expand on my previous statement that this proposal will not have backward-compatibility problems for CI systems because conditions can be sub-classed, here is what that could look like:
Someone writes a GitLab continuous integration library, call it cl-gitlab-ci, and wants to add support for it to test libraries. cl-gitlab-ci defines a condition:
(define-condition cl-gitlab-ci-test-failure ()
  ((test-failure-info …)
   (test-skip-info …))
  …)
Libraries like FiveAM then subclass this condition and fill in the necessary slots:
(define-condition test-spec-failure
    (cl-gitlab-ci-test-failure asdf:test-op-test-failure)
  …)
This does not break backward compatibility.
Same thing for a Travis integration library.
When someone decides to write a CI meta-library that abstracts over cl-gitlab-ci, cl-travis-ci, etc. (which as a basis will have what Stelian is asking for), they can define a condition class, generic-ci-test-failure, then send patches to the CI libraries to handle that condition, and send patches to the test libraries to subclass generic-ci-test-failure and fill out the slots (this is just like adding support for another CI library to the test library). Again, this does not break backward compatibility.
Currently popular CI programs include Buildbot, GitLab, Jenkins, Travis CI (essentially limited to GitHub). If someone has a proposal for a protocol that will cover the needed features for all of these systems, please post it.
Right now the only library for any of these programs that I know of is cl-travis. The hypothetical CI meta-library will ideally be built from experience with several (as yet non-existent) libraries for the various CI programs. Not only do I think that this is outside the scope of ASDF, I also suspect that designing a protocol for CI integration now, without the experience gained from actual libraries for the various CI systems, is what is going to result in backward compatibility problems.
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
On Sep 15, 2019, at 05:07, Vladimir Sedach vas@oneofus.la wrote:
Hello!
I recently wrote a script to run FiveAM tests for one of my libraries on many different implementations on your own machine: https://gitlab.common-lisp.net/uri-template/uri-template2/blob/master/run-te...
It would be really nice if I did not have to copy-paste that script to my other libraries, and instead could contribute a generalized version to Roswell (on which the script is based) and have it work for any project using any test library.
What I would like to be able to do, for any system:
(handler-case (asdf:test-system "system")
  (asdf:test-op-test-failure (condition)
    (princ condition uiop:*stderr*)
    (uiop:quit 1)))
[Last year for Emotiq][emotiq] I implemented such a proposal as [asdf-test-harness][] to the maturity needed for us to use as part of Continuous Integration for all our commits.
Each flavor of testing framework needs to write a simple adaptor that returns a condition containing a boolean indicating success or failure and the testing framework specific results.
It’s been a while, so please press me on the claim that my code is more mature; enough time has passed that I don’t remember writing the code at the moment…
[emotiq]: https://github.com/easye/emotiq/ [asdf-test-harness]: https://github.com/easye/asdf-test-harness/
Mark Evenson evenson@panix.com writes:
Each flavor of testing framework needs to write a simple adaptor that returns a condition containing a boolean indicating success or failure and the testing framework specific results.
I am not sure if you are aware of cl-test-grid¹ - it takes the same approach.
Unfortunately, it has the same downsides. There is only a subset of the test libraries supported. The code depends on the internal workings of the test libraries and is often out of date. The projects cannot be used as-is to test arbitrary systems (with cl-test-grid, you have to manually add configuration for each system to testsuites/testsuites.lisp, asdf-test-harness seems to depend on emotiq configuration files).
This approach has not worked well in the past 8 years (cl-test-grid was started in 2011), and I do not think it is viable.
What you have also done in asdf-test-harness, adding an ASDF-TEST-SYSTEM-FAILURE condition to ASDF, is a better approach. I have shown how this can be used by test libraries (with working implementations for FiveAM³ and rove⁴). This removes the need for asdf-test-harness, cl-test-grid, etc. to each have to implement custom adapters for every different test library, completely eliminating the dependency between test harnesses and particular test libraries. The test libraries themselves do not need any new dependencies either.
This decouples the systems and will make projects like asdf-test-harness much easier to extend and maintain.
As an example, writing a script to test systems in different implementations now becomes trivial.⁵
¹ https://github.com/cl-test-grid/cl-test-grid ² https://github.com/easye/asdf-test-harness/blob/master/asdf-test-harness.lis... ³ https://github.com/sionescu/fiveam/pull/58 ⁴ https://github.com/vsedach/rove/commit/f6f8822eedc61d131c3b1d37b45c6d48cefcf... ⁵ https://gitlab.common-lisp.net/uri-template/uri-template2/blob/master/run-te...
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
On Sep 25, 2019, at 21:06, Vladimir Sedach vas@oneofus.la wrote:
Mark Evenson evenson@panix.com writes:
Each flavor of testing framework needs to write a simple adaptor that returns a condition containing a boolean indicating success or failure and the testing framework specific results.
Unfortunately, it has the same downsides. There is only a subset of the test libraries supported. The code depends on the internal workings of the test libraries and is often out of date. The projects cannot be used as-is to test arbitrary systems (with cl-test-grid, you have to manually add configuration for each system to testsuites/testsuites.lisp, asdf-test-harness seems to depend on emotiq configuration files).
This approach has not worked well in the past 8 years (cl-test-grid was started in 2011), and I do not think it is viable.
What you have also done in asdf-test-harness, adding an ASDF-TEST-SYSTEM-FAILURE condition to ASDF, is a better approach. I have shown how this can be used by test libraries (with working implementations for FiveAM³ and rove⁴). This removes the need for asdf-test-harness, cl-test-grid, etc. to each have to implement custom adapters for every different test library, completely eliminating the dependency between test harnesses and particular test libraries. The test libraries themselves do not need any new dependencies either.
This decouples the systems and will make projects like asdf-test-harness much easier to extend and maintain.
Thanks for taking the time to understand my code enough to cogently critique the two approaches.
As I understand it, both approaches share the need to add a condition to ASDF that indicates that an invocation of an ASDF test-op has failed.
The difference is that you propose getting changes into all the various test frameworks to signal this error condition with a subclass that contains meaningful information for the given framework. Whereas I (and cl-test-grid), being in a hurry to get something working without being able to wait for modifications to the test suites, need to write an adaptor for each test suite. And consequently these adaptors will often be “brittle”, depending on implementation details that may not remain constant over time.
If this is indeed an accurate summary, then I would endorse your patch to ASDF as necessary for both approaches with the following suggestions for improvements:
1. Have a slot in your base condition class TEST-OP-TEST-FAILURE in which one can record the ASDF component which caused the failure. It is probably possible to dig this information out of the stack, but that will be messy. This would also allow for distinguishing when multiple TEST-OP-TEST-FAILURES are signaled from a single ASDF:TEST-OP invocation, as will be the case when one “chains” test invocations over many ASDF systems.
2. Provide an implementation of the subclass of TEST-OP-TEST-FAILURE that contains the basic structure of a reporter class for the information that should be present in all test frameworks, namely the total number of tests run, the number of failed tests, the identities of the failed tests, and a slot for a human readable error message, along with a reporter function that displays this information. Having an implementation class to work from would make it easier for test frameworks to adapt.
3. Go ahead and define a subclass of this condition for when no tests have been run.
4. As for adoption of your strategy by test frameworks, we will have the problem that a given test framework won’t want to adopt the conditions because they aren’t in the version of ASDF it is using, or can easily get hold of. To solve this, we might somehow define the code within the ASDF source tree so that one can make a standalone ASDF system (“ASDF-TEST-CONDITIONS” or some such) that one may include separately from actually upgrading ASDF.
Sincerely, Mark Evenson
Thank you for the specific suggestions, Mark.
Mark Evenson evenson@panix.com writes:
- Have a slot in your base condition class TEST-OP-TEST-FAILURE in which one can record the ASDF component which caused the failure. It is probably possible to dig this information out of the stack, but that will be messy. This would also allow for distinguishing when multiple TEST-OP-TEST-FAILURES are signaled from a single ASDF:TEST-OP invocation, as will be the case when one “chains” test invocations over many ASDF systems.
This is really easy to do with a special variable in an initarg, but are there any systems that you know of that do this? I would definitely like to test with them, because I thought that nested TEST-OP was not supposed to work. From the "More Elaborate Testing" section of the best practices document¹:
"You MUST NOT call asdf:operate or any of its derivatives, such as asdf:load-system or asdf:test-system from within a perform method."
Unfortunately, it looks like ROVE:RUN-SYSTEM-TESTS does exactly that.
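For reference, the special-variable idea could look roughly like this inside ASDF (the variable and slot names are hypothetical, not taken from the patch):

;; Bind the system being tested around TEST-OP.
(defvar *system-under-test* nil)

(defmethod perform :around ((operation test-op) (component system))
  (let ((*system-under-test* component))
    (call-next-method)))

;; The condition picks up the binding as a default initform,
;; evaluated in the dynamic environment where it is signaled.
(define-condition test-op-test-failure (condition)
  ((system :initarg :system
           :initform *system-under-test*
           :reader test-failure-system)))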
- Provide an implementation of the subclass of TEST-OP-TEST-FAILURE that contains the basic structure of a reporter class for the information that should be present in all test frameworks, namely the total number of tests run, the number of failed tests, the identities of the failed tests, and a slot for a human readable error message, along with a reporter function that displays this information. Having an implementation class to work from would make it easier for test frameworks to adapt.
I tried to avoid enforcing required slots, but as both asdf-test-harness and cl-test-grid want a list of failed tests, that is a strong case to make the above slots required in TEST-OP-TEST-FAILURE itself.
cl-test-grid wants a list of test names as strings (it wants them down-cased, but that is a detail that can be left to cl-test-grid). A list of strings is a requirement that any test library should be able to satisfy (worst case, it could be a list of random names), and looks to me specific enough for most test harness use cases.
The length of the list of failed test names gives the count of failed tests.
It seems to me that having a slot for an error message is redundant with the reporter function, given that I think it should be up to the test library to define the reporter function, and not for TEST-OP-TEST-FAILURE to dictate how it is printed. That way, if a test library has a flag to print results in a machine-readable format, the flag will work without any changes if the overridden reporter function re-uses the library's output facilities, and as long as the test harness PRINCs the condition, the test harness does not need to do anything either.
- Go ahead and define a subclass of this condition for when no tests have been run.
I thought about doing this, but with the above slots, there is no need to - the test library can signal TEST-OP-TEST-FAILURE with a 0 count of total number of tests run.
- As for adoption of your strategy by test frameworks, we will have the problem that a given test framework won’t want to adopt the conditions because they aren’t in the version of ASDF it is using, or can easily get hold of. To solve this, we might somehow define the code within the ASDF source tree so that one can make a standalone ASDF system (“ASDF-TEST-CONDITIONS” or some such) that one may include separately from actually upgrading ASDF.
That is something that Robert brought up in the merge request discussion.² It looks like this can be handled with the #+/#- reader conditionals or the #. reader macro to provide CL:WARNING as a fallback super-class. I am open to any ideas.
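One way the #. trick could work (a sketch, not tested against any particular ASDF version): compute the super-class at read time, falling back to CL:WARNING when the installed ASDF does not export the new condition:

(define-condition my-test-failure
    (#.(or (and (find-package "ASDF")
                (find-symbol "TEST-OP-TEST-FAILURE" "ASDF"))
           'cl:warning))
  ())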
I went ahead and added the slots to the ASDF², FiveAM³, and rove⁴ implementations. Up next, I am going to work on adding support to cl-test-grid for a library that uses rove, which cl-test-grid does not support yet.
¹ https://github.com/fare/asdf/blob/master/doc/best_practices.md ² https://gitlab.common-lisp.net/asdf/asdf/merge_requests/124 ³ https://github.com/sionescu/fiveam/pull/58 ⁴ https://github.com/fukamachi/rove/pull/29
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
Hi.
Thank you, everyone, for addressing this topic; having a unified representation of test results would be very useful.
A couple of thoughts:
- success should also be signaled, so we can distinguish a version where this new protocol is not implemented from the version where tests pass

- the minimal requirement is a success / failure designator; the failed test names can be optional

- For a caller of asdf:test-op it would be more convenient to have a single signal. Ideally, it should just be a return value of the asdf:operate function. As I understand it, we only consider the possibility of the test result being signaled multiple times during test-op because we hope to make this work for everyone without library authors explicitly modifying their code, by adding the new functionality to the test frameworks instead. A good goal, although I can imagine some corner cases. Still, even if we expect test results to be signalled multiple times during a test-op, it would be good to provide a wrapper which aggregates them into a single return value.
(common-test-results:collect (asdf:test-system "my-system"))
- as others mention, it also occurred to me that this new functionality should not necessarily be declared inside of ASDF; it could be some separate library, say common-test-result. I'm not 100% sure about this, but currently I lean more towards a separate lib, at least for the beginning. The ASDF test-op docs could just refer to it.

- If delivering test results through a condition, a test failure should not be an error or warning, in my opinion. A test failure is an anticipated possible outcome. An error during tests is an abnormal situation - no access to needed files, memory exhausted, null pointers, etc.

- the slot for the failing asdf system could probably be avoided; the list of failed test names could be enough, if the names are "fully qualified", i.e. include the package or system name.
Anton Vodonosov avodonosov@yandex.ru writes:
- success should also be signaled, so we can distinguish a version where this new protocol is not implemented from the version where tests pass
That is a good idea.
For a caller of asdf:test-op it would be more convenient to have a single signal. Ideally, it should just be a return value of the asdf:operate function. As I understand it, we only consider the possibility of the test result being signaled multiple times during test-op because we hope to make this work for everyone without library authors explicitly modifying their code, by adding the new functionality to the test frameworks instead. A good goal, although I can imagine some corner cases. Still, even if we expect test results to be signalled multiple times during a test-op, it would be good to provide a wrapper which aggregates them into a single return value.
(common-test-results:collect (asdf:test-system "my-system"))
That is a good idea. I think it goes together well with the fully qualified test names recommendation.
- as others mention, it also occurred to me that this new functionality should not necessarily be declared inside of ASDF; it could be some separate library, say common-test-result. I'm not 100% sure about this, but currently I lean more towards a separate lib, at least for the beginning. The ASDF test-op docs could just refer to it.
Raising a signal is a work-around for the inability of TEST-OP to return a result. I would like to avoid making an entire library out of a work-around that is specific to ASDF.
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
On 30 Sep 2019, at 6:23, Anton Vodonosov wrote:
Hi.
Thank you, everyone, for addressing this topic, having a unified representation of test results would be very useful.
A couple of thoughts:
- success should also be signaled, so we can distinguish a version where this new protocol is not implemented from the version where tests pass
This requires a protocol where ASDF can "know" when the test op is done, so that it can distinguish "has succeeded" from "has not failed yet." It's possible that this could be managed by the test op on the system as a whole, but the design must consider the possibility that the test-op may be implemented so that:
1. Multiple calls are made into the test library (possibly even multiple test libraries are used)
2. TEST-OP may involve invoking the test operation on systems depended on (e.g., where a system has subsystems).
- the minimal requirement is a success / failure designator; the failed test names can be optional
- Additional requirement: the condition should support accessing both a long and a short form report. In degenerate implementations, these can, of course, be identical.
- For a caller of asdf:test-op it would be more convenient to have a single signal. Ideally, it should just be a return value of the asdf:operate function. As I understand it, we only consider the possibility of the test result being signaled multiple times during test-op because we hope to make this work for everyone without library authors explicitly modifying their code, by adding the new functionality to the test frameworks instead. A good goal, although I can imagine some corner cases. Still, even if we expect test results to be signalled multiple times during a test-op, it would be good to provide a wrapper which aggregates them into a single return value.
(common-test-results:collect (asdf:test-system "my-system"))
This would require that the test library provide an extensible protocol for fusing together multiple test results. And note that the above suggestion will not work, because ASDF does not ever *return* a value from operating. This has to do with the way ASDF creates a plan and then executes it. The plan doesn't support a notion of "return value," so the only way to get information out of ASDF is through conditions.
One could, possibly, make `asdf:test-system` have a return value by making it handle conditions, but that would break the current equivalence that `asdf:test-system` is just a shorthand for calling `OPERATE`.
- as others mention, it also occurred to me that this new functionality should not necessarily be declared inside of ASDF; it could be some separate library, say common-test-result. I'm not 100% sure about this, but currently I lean more towards a separate lib, at least for the beginning. The ASDF test-op docs could just refer to it.
I agree -- I think `TRIVIAL-TEST-INTERFACE` might be a better first step. I suppose the alternative rationale is that a test interface that was *not* intended for incorporation into ASDF would be able to just *return* things, instead of *signaling* them.
- If delivering test results through a condition, a test failure should not be an error or warning, in my opinion. A test failure is an anticipated possible outcome. An error during tests is an abnormal situation - no access to needed files, memory exhausted, null pointers, etc.
That is true, but it's also true that it would require special condition-handling to fit test results into continuous integration -- programmers would no longer be able to just use `quit-on-error`, which is a very handy way to turn a lisp test into something that works in Jenkins or, for that matter, any infrastructure based on shell scripting.
I'd rather have to write code to handle errors when I *don't* want them, than have test failure not be an error.
If I'm running interactively, it's not a bother to deal with this as an error condition -- I can easily get out of the debugger. But writing a lot of code to catch `TEST-FAILURE` conditions and translate them into exit with non-zero status would be a pain.
A solution might be to have a top-level handler that can turn these conditions into errors, or not, as appropriate. But unfortunately, it's not at all easy to pass information from top-level calls to ASDF operations into the way those operations are executed, since `OPERATION` objects no longer carry attributes (Faré removed them because attribute propagation never worked correctly, but sometimes I still regret this).
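Such a top-level handler could be as simple as this (a sketch, using the condition name from Vladimir's patch):

(defun test-system-or-die (system)
  ;; Promote a test failure to an ERROR so that quit-on-error style
  ;; scripting keeps working.
  (handler-bind ((asdf:test-op-test-failure
                   (lambda (condition)
                     (error "Tests failed: ~A" condition))))
    (asdf:test-system system)))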
- the slot for the failing asdf system could probably be avoided; the list of failed test names could be enough, if the names are "fully qualified", i.e. include the package or system name.
I don't think we can make any assumptions about the above -- there's no rule about how a programmer can assign test names in a library like FiveAM to packages. Similarly, when writing tests, the programmer does not generally put in the tests information about the containing system -- indeed, doing so would be a violation of standard notions of abstraction (containers know about the contained; contained things don't have to know about their containers).
Some kind of dynamic binding could allow these condition objects to automatically collect information about the system under test.
This is such a complex issue that we should either have a pretty substantial design before we put it into ASDF, or we should kick this out of ASDF into another library.
I'm not willing to incorporate into ASDF something that would incur substantial maintenance debt, without very high confidence that the design is solid. This will have tentacles everywhere.
I would note also that getting a new library into Quicklisp for this is going to be a lot easier than getting a new ASDF into Quicklisp: Xach has for years refused to update the ASDF version in Quicklisp, and I don't see any reason to believe this will change.
Robert Goldman rpgoldman@sift.info writes:
- success should also be signaled, so we can distinguish a version where this new protocol is not implemented from the version where tests pass
This requires a protocol where ASDF can "know" when the test op is done, so that it can distinguish "has succeeded" from "has not failed yet." It's possible that this could be managed by the test op on the system as a whole, but the design must consider the possibility that the test-op may be implemented so that:
1. Multiple calls are made into the test library (possibly even multiple test libraries are used)
2. TEST-OP may involve invoking the test operation on systems depended on (e.g., where a system has subsystems).
There would be three situations here:
1. OPERATE TEST-OP returns with no relevant conditions signaled. You can infer that the condition protocol is not implemented.
2. OPERATE TEST-OP returns and one or more test failure conditions are signaled.
3. OPERATE TEST-OP returns and only test success conditions are signaled.
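A sketch of how a harness could distinguish the three cases (TEST-OP-TEST-SUCCESS is the success condition Anton proposed; it does not exist in the current patch):

(defun test-outcome (system)
  "Return :NOT-IMPLEMENTED, :FAILED, or :PASSED for SYSTEM's TEST-OP."
  (let ((failed nil)
        (succeeded nil))
    (handler-bind ((asdf:test-op-test-failure
                     (lambda (condition)
                       (declare (ignore condition))
                       ;; Returning normally declines the condition,
                       ;; so the test run is not disturbed.
                       (setf failed t)))
                   (asdf:test-op-test-success
                     (lambda (condition)
                       (declare (ignore condition))
                       (setf succeeded t))))
      (asdf:test-system system))
    (cond (failed :failed)
          (succeeded :passed)
          (t :not-implemented))))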
- the minimal requirement is a success / failure designator; the failed test names can be optional
- Additional requirement: the condition should support accessing both a long and a short form report. In degenerate implementations, these can, of course, be identical.
What would the long and short form reports look like?
some corner cases. Still, even if we expect test results being signalled multiple times during a test-op, it would be good to provide a wrapper which aggregates them into a single return value.
(common-test-results:collect (asdf:test-system "my-system"))
This would require that the test library provide an extensible protocol for fusing together multiple test results.
It is simpler than that: take all of the conditions, sum the numbers of tests executed, and append all of the test failure lists. No need for library-specific code.
And note that the above suggestion will not work, because ASDF does not ever *return* a value from operating. This has to do with the way ASDF creates a plan and then executes it. The plan doesn't support a notion of "return value," so the only way to get information out of ASDF is through conditions.
What COMMON-TEST-RESULTS:COLLECT would do is handle and coalesce multiple conditions and re-signal a single condition.
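A sketch of what that could look like, assuming a FAILED-TESTS reader and :FAILED-TESTS initarg on the condition (hypothetical names; the aggregated condition is re-signaled once at the end):

(defun call-collecting-test-failures (thunk)
  (let ((failures '()))
    (handler-bind ((asdf:test-op-test-failure
                     (lambda (condition)
                       ;; Returning normally from a HANDLER-BIND handler
                       ;; declines the condition, leaving the test
                       ;; library's control flow alone.
                       (push condition failures))))
      (funcall thunk))
    (when failures
      (signal 'asdf:test-op-test-failure
              :failed-tests (loop for condition in (nreverse failures)
                                  append (failed-tests condition))))))

(defmacro collect (&body body)
  `(call-collecting-test-failures (lambda () ,@body)))

;; Usage: (collect (asdf:test-system "my-system"))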
I agree -- I think `TRIVIAL-TEST-INTERFACE` might be a better first step. I suppose the alternative rationale is that a test interface that was *not* intended for incorporation into ASDF would be able to just *return* things, instead of *signaling* them.
The point is, systems already define TEST-OP. I am trying to use that. The code that uses TEST-OP can do whatever it needs to, but the communication between that code and the test libraries has to be done by stack-based mechanisms like conditions or special variables.
That is true, but it's also true that it would require special condition-handling to fit test results into continuous integration -- programmers would no longer be able to just use `quit-on-error`, which is a very handy way to turn a lisp test into something that works in Jenkins or, for that matter, any infrastructure based on shell scripting.
Right now errors are not signaled on test failures in most definitions of TEST-OP I looked at, so this is not something that is currently going on. Neither is this something that would stop working for anyone that has the signal-error-on-failure flags set for their test library, or is throwing errors explicitly.
I'd rather have to write code to handle errors when I *don't* want them, than have test failure not be an error.
If I'm running interactively, it's not a bother to deal with this as an error condition -- I can easily get out of the debugger. But writing a lot of code to catch `TEST-FAILURE` conditions and translate them into exit with non-zero status would be a pain.
Test libraries already have flags controlling whether to signal errors on test failures or not. Having the condition be a sub-class of ERROR would not only be annoying in the REPL, it would break whatever test automation code uses these flags, and it would change the behavior of TEST-OP, most of whose callers do not expect it to signal errors on test failures right now. That is a lot of breakage across thousands of existing systems, just to avoid doing the following in a few test automation scripts:
(handler-case (asdf:test-system "some-system")
  (asdf:test-op-test-failure (condition)
    (princ condition uiop:*stderr*)
    (uiop:quit 1)))
- the slot for the failing asdf system could probably be avoided; the list of failed test names could be enough, if the names are "fully qualified", i.e. include the package or system name.
I don't think we can make any assumptions about the above -- there's no rule about how a programmer can assign test names in a library like FiveAM to packages.
FiveAM test names are symbols, so they already get printed with their package name in the implementation I did for FiveAM, without any extra work.
I would note also that getting a new library into Quicklisp for this is going to be a lot easier than getting a new ASDF into Quicklisp: Xach has for years refused to update the ASDF version in Quicklisp, and I don't see any reason to believe this will change.
As I mentioned before, I would like to avoid creating a whole library out of something that is a work-around to OPERATE not returning results.
Unfortunately, it seems there are both social and technical problems with updating ASDF. In particular I do not see a good mechanism for advertising the availability of this condition protocol to test libraries (there does not seem to be an established way of advertising new ASDF functionality other than the :ASDF3.3 :ASDF3.2 etc. keywords in *FEATURES*).
As Anton pointed out, this necessitates the libraries signaling a condition for test success, which necessitates a function like COMMON-TEST-RESULTS:COLLECT. While writing an implementation of that function, I realized I would need to add continue restarts (the only way to handle a condition without affecting control flow).
Taking all of this together, it becomes apparent that avoiding ASDF in order to provide a more useful TEST-OP is, ironically, the way to go. It is simpler to drive communication down the stack by binding special variables to act as accumulators than it is to communicate up the stack with signals, handlers, and restarts. Putting this into a library means TEST-OP would still retain a use as a way to trigger test runs without knowing details about either the tests or the test library (the first half of this proposal), but it will unfortunately mean that ASDF will continue to have no say about what the effects of TEST-OP are (the second half of this proposal).
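A sketch of the special-variable accumulator approach (all names hypothetical):

;; Harness-side accumulator; left unbound unless a harness binds it.
(defvar *test-results*)

(defun record-test-results (results)
  ;; Called by a test library adapter at the end of its run; a no-op
  ;; when no harness is collecting.
  (when (boundp '*test-results*)
    (push results *test-results*)))

;; Harness side: bind the accumulator around TEST-OP, inspect it after.
(defun run-tests-collecting (system)
  (let ((*test-results* '()))
    (asdf:test-system system)
    *test-results*))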
-- Vladimir Sedach Software engineering services in Los Angeles https://oneofus.la
Vladimir, the library's content is not just the signal trick (or special vars).
It defines the unified common test result representation, and provides a bridge between the protocol by which test libraries "signal" their results and the protocol by which the user running the tests consumes them. The library can also provide means to create the common test results as a normal return value, or maybe deliver them through a stack-based mechanism, but explicitly, without relying on test frameworks to do it automatically (in some cases I anticipate explicit creation of the result will be more convenient).
And yes, most important: adding such functionality to the public ASDF API requires too much upfront thinking. After a separate library experiments with the approach and stabilizes, ASDF can incorporate it, or officially refer to it.
As Robert mentions in the pull request, having this functionality in new versions of ASDF would require the consumer to check feature flags in their code. In the case of a separate library, the consumer can just load this library, and it will work even with old versions of ASDF.
Also, the library can provide serialization of the common test result to files, thus simplifying implementation of the test-report-op, as Fare mentions.