On 30 Sep 2019, at 6:23, Anton Vodonosov wrote:

Hi.

Thank you, everyone, for addressing this topic; having a unified representation
of test results would be very useful.

A couple of thoughts:


- success should also be signaled, so we can distinguish a version where
this new protocol is not implemented from the version where tests pass

This requires a protocol where ASDF can "know" when the test op is done, so that it can distinguish "has succeeded" from "has not failed yet." It's possible that this could be managed by the test op on the system as a whole, but the design must consider the possibility that the test-op may be implemented so that:

  1. Multiple calls are made into the test library (possibly even multiple test libraries are used)
  2. TEST-OP may involve invoking the test operation on systems depended on (e.g., where a system has subsystems).
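
To make that concrete, here is a minimal sketch of the kind of condition hierarchy that could support this. Every name in it is hypothetical, not an existing ASDF API:

  ;; Hypothetical protocol: a test-op signals one result condition per
  ;; test-library invocation, including on success.
  (define-condition test-op-result (condition)
    ((system :initarg :system :reader result-system)))

  (define-condition test-op-success (test-op-result) ())

  (define-condition test-op-failure (test-op-result)
    ((failed-test-names :initarg :failed-test-names :initform '()
                        :reader failed-test-names)))

A caller that sees no result condition after TEST-OP completes would know the system predates the protocol; seeing only TEST-OP-SUCCESS conditions (possibly several, one per subsystem or per test-library call) would mean the tests actually passed.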

- the minimal requirement is a success / failure designator, the failed
test names can be optional

- For a caller of asdf:test-op it would be more convenient to have a single
signal. Ideally, it should be just a return value of the asdf:operate function.
As I understand it, we only consider the possibility of the test result being
signaled multiple times during test-op because we hope to make it work for
everyone without library authors explicitly modifying their code, only adding
this new functionality to test frameworks. A good goal, although I can imagine
some corner cases. Still, even if we expect test results to be signaled
multiple times during a test-op, it would be good to provide a wrapper
which aggregates them into a single return value.

(common-test-results:collect (asdf:test-system "my-system"))

This would require that the test library provide an extensible protocol for fusing together multiple test results. And note that the above suggestion will not work, because ASDF does not ever return a value from operating. This has to do with the way ASDF creates a plan and then executes it. The plan doesn't support a notion of "return value," so the only way to get information out of ASDF is through conditions.

One could, possibly, make asdf:test-system have a return value by making it handle conditions, but that would break the current equivalence that asdf:test-system is just a shorthand for calling OPERATE.
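
A standalone wrapper avoids that problem, since it can handle the conditions itself. A minimal sketch, assuming hypothetical result conditions like the TEST-OP-RESULT family above:

  ;; Sketch only: collect every result condition signaled during the
  ;; run, then fuse them into a single success/failure value.
  (defun collect-test-results (system)
    (let ((results '()))
      (handler-bind ((test-op-result
                       (lambda (c) (push c results))))
        (asdf:test-system system))
      ;; Success only if something was signaled and nothing failed.
      (values (and results
                   (notany (lambda (c) (typep c 'test-op-failure))
                           results))
              results)))

Because the wrapper sits outside ASDF, asdf:test-system itself stays a plain shorthand for OPERATE.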

- as others have mentioned, it also occurred to me that this new functionality
should not necessarily be declared inside of ASDF; it could be
some separate library, say common-test-result. I'm not 100% sure
about this, but currently I lean more towards a separate lib, at least
for the beginning. The ASDF test-op docs could just refer to it.

I agree -- I think TRIVIAL-TEST-INTERFACE might be a better first step. I suppose the alternative rationale is that a test interface that was not intended for incorporation into ASDF would be able to just return things, instead of signaling them.

- If delivering test results through a condition, test failure should not
be an error or warning, in my opinion. Test failure is an anticipated
possible outcome. An error during tests is an abnormal situation -
no access to needed files, memory exhausted, null pointers, etc.

That is true, but it's also true that it would require special condition-handling to fit test results into continuous integration -- programmers would no longer be able to just use quit-on-error, which is a very handy way to turn a lisp test into something that works in Jenkins or, for that matter, any infrastructure based on shell scripting.

I'd rather have to write code to handle errors when I don't want them, than have test failure not be an error.

If I'm running interactively, it's not a bother to deal with this as an error condition -- I can easily get out of the debugger. But writing a lot of code to catch TEST-FAILURE conditions and translate them into exit with non-zero status would be a pain.

A solution might be to have a top-level handler that can turn these conditions into errors, or not, as appropriate. But unfortunately, it's not at all easy to pass information from top-level calls to ASDF operations into the way those operations are executed, since OPERATION objects no longer carry attributes (Faré removed them because attribute propagation never worked correctly, but sometimes I still regret this).
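
As a sketch of the top-level-handler idea (the condition name and the knob are hypothetical):

  ;; Hypothetical knob: how a signaled test failure should surface.
  (defvar *on-test-failure* :error
    "One of :error, :exit, or nil (note and continue).")

  (defun run-tests (system)
    (handler-bind ((test-op-failure
                     (lambda (c)
                       (ecase *on-test-failure*
                         (:error (error c))    ; resignal as an error
                         (:exit (uiop:quit 1)) ; CI-friendly exit status
                         ((nil) nil)))))       ; decline, keep going
      (asdf:test-system system)))

Interactively one would leave the knob at :error and use the debugger; a CI script would set it to :exit and get quit-on-error behavior back.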

- the slot for the failing asdf system could probably be avoided;
the list of failed test names could be enough, if the names are "fully qualified",
i.e. include the package or system name.

I don't think we can make any assumptions about the above -- there's no rule about how a programmer assigns test names to packages in a library like FiveAM. Similarly, when writing tests, the programmer does not generally put information about the containing system into the tests -- indeed, doing so would violate standard notions of abstraction (containers know about the contained; contained things don't have to know about their containers).

Some kind of dynamic binding could allow these condition objects to automatically collect information about the system under test.
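
For instance (purely a sketch; the variable and the method are not part of ASDF):

  ;; Bind the current system around TEST-OP so a result condition can
  ;; default its SYSTEM slot from the binding, without the tests
  ;; themselves knowing anything about their container.
  (defvar *system-under-test* nil)

  (defmethod asdf:perform :around ((op asdf:test-op) (c asdf:system))
    (let ((*system-under-test* (asdf:component-name c)))
      (call-next-method)))

  ;; A result condition could then carry
  ;;   (:default-initargs :system *system-under-test*)
  ;; since default initargs are evaluated at signal time, inside the
  ;; binding.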

This is such a complex issue that we should either have a pretty substantial design before we put it into ASDF, or we should kick this out of ASDF into another library.

I'm not willing to incorporate into ASDF something that would incur substantial maintenance debt, without very high confidence that the design is solid. This will have tentacles everywhere.

I would note also that getting a new library into Quicklisp for this is going to be a lot easier than getting a new ASDF into Quicklisp: Xach has for years refused to update the ASDF version in Quicklisp, and I don't see any reason to believe this will change.

27.09.2019, 10:20, "Vladimir Sedach" <vas@oneofus.la>:

Thank you for the specific suggestions, Mark.

Mark Evenson <evenson@panix.com> writes:

 1. Have a slot in your base condition class TEST-OP-TEST-FAILURE in
 which one can record the ASDF component which caused the failure.
 It is probably possible to dig this information out of the stack,
 but that will be messy. This would also allow for distinguishing
 when multiple TEST-OP-TEST-FAILURES are signaled from a single
 ASDF:TEST-OP invocation, as will be the case when one “chains” test
 invocation over many ASDF systems.

This is really easy to do with a special variable in an initarg, but
are there any systems that you know of that do this? I would
definitely like to test with them, because I thought that nested
TEST-OP was not supposed to work. From the "More Elaborate Testing"
section of the best practices document¹:

"You MUST NOT call asdf:operate or any of its derivatives, such as
asdf:load-system or asdf:test-system from within a perform method."

Unfortunately, it looks like ROVE:RUN-SYSTEM-TESTS does exactly that.

 2. Provide an implementation of the subclass of
 TEST-OP-TEST-FAILURE that contains the basic structure of a
 reporter class for the information that should be present in all
 test frameworks, namely the total number of tests run, the number
 of failed tests, the identities of the failed tests, and a slot for
 a human readable error message, along with a reporter function that
 displays this information. Having an implementation class to work
 from would make it easier for test frameworks to adapt.

I tried to avoid enforcing required slots, but as both
asdf-test-harness and cl-test-grid want a list of failed tests, that
is a strong case for making the above slots required in
TEST-OP-TEST-FAILURE itself.

cl-test-grid wants a list of test names as strings (it wants them
down-cased, but that is a detail that can be left to cl-test-grid). A
list of strings is a requirement that any test library should be able
to satisfy (worst case, it could be a list of random names), and
looks specific enough to me for most test harness use cases.

The length of the list of failed test names gives the count of failed
tests.
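
In code, the slot layout under discussion might look like this (the superclass is left as plain CONDITION here; the names are the ones being debated, not a published API):

  (define-condition test-op-test-failure (condition)
    ((tests-run                        ; total number of tests executed
      :initarg :tests-run :initform 0 :reader tests-run)
     (failed-test-names                ; list of strings, e.g. ("SOME-TEST")
      :initarg :failed-test-names :initform '()
      :reader failed-test-names)))

  ;; The failed count needs no slot of its own:
  (defun failed-test-count (c)
    (length (failed-test-names c)))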

It seems to me that having a slot for an error message is redundant
with the reporter function, given that I think it should be up to the
test library to define the reporter function, and not for
TEST-OP-TEST-FAILURE to dictate how it is printed. That way, if a
test library has a flag to print results in machine-readable format,
the flag will work without any changes, provided the overridden reporter
function re-uses the library's output facilities; and as long as the
test harness PRINCs the condition, the test harness does not need to
do anything either.
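
Concretely, that division of labor could look like the following (the library condition class is hypothetical):

  ;; The test library owns the report function; a harness that just
  ;; PRINCs the condition gets the library's own formatting for free.
  (define-condition my-library-test-failure (test-op-test-failure) ()
    (:report (lambda (condition stream)
               (format stream "~D test~:P failed: ~{~A~^, ~}"
                       (failed-test-count condition)
                       (failed-test-names condition)))))

  ;; In the harness, something as simple as
  ;;   (handler-bind ((test-op-test-failure #'princ)) ...)
  ;; prints whatever the library chose to report.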

 3. Go ahead and define the subclass of this condition when no tests
 have been run.

I thought about doing this, but with the above slots there is no
need to: the test library can signal TEST-OP-TEST-FAILURE with a
count of 0 for the total number of tests run.
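
For example, the no-tests-ran case would just be:

  ;; "No tests ran" is reported as a failure with zero tests run:
  (signal 'test-op-test-failure :tests-run 0 :failed-test-names '())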

 4. As for your strategy for adoption by test frameworks, we will have
 the problem that a given test framework won't want to adopt the
 conditions because they aren't in the version of ASDF it is using,
 or can easily get hold of. To solve this, we might somehow define
 the code within the ASDF source tree so that one can make a
 standalone ASDF system (“ASDF-TEST-CONDITIONS” or some such) that
 one may include separately from actually upgrading ASDF.

That is something that Robert brought up in the merge request
discussion.² It looks like this can be handled with the #+ / #-
reader conditionals or the #. reader macro to provide CL:WARNING as
a fallback super-class. I am open to any ideas.
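
For example, with reader conditionals (the :asdf-test-conditions feature is hypothetical - whatever feature the defining ASDF pushes would go there):

  ;; Inherit from the ASDF condition when it exists, otherwise fall
  ;; back to CL:WARNING, so the library still loads under older ASDFs.
  (define-condition my-test-failure
      (#+asdf-test-conditions asdf:test-op-test-failure
       #-asdf-test-conditions cl:warning)
    ())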

I went ahead and added the slots to ASDF², the FiveAM³, and the rove⁴
implementations. Up next, I am going to work on adding support to
cl-test-grid for a library that uses rove, which cl-test-grid does
not support yet.

¹ https://github.com/fare/asdf/blob/master/doc/best_practices.md
² https://gitlab.common-lisp.net/asdf/asdf/merge_requests/124
³ https://github.com/sionescu/fiveam/pull/58
⁴ https://github.com/fukamachi/rove/pull/29

--
Vladimir Sedach
Software engineering services in Los Angeles https://oneofus.la