Hi.
I'm not sure if this is appropriate for this mailing list, but I was wondering if and how people are using SLIME to test their programs. Looking at the SLIME manual, there is nothing that really jumps out at me as a program testing tool.
Personally, I tend to have two windows, one above the other: the top one displays an open Lisp file where I edit my definitions, the bottom one displays the SLIME REPL where I test those definitions. The problem with this approach is that I don't save my test cases, i.e. I have no automated regression testing.
How do other SLIMErs deal with this?
Regards,
Dirk Gerrits
Dirk Gerrits dirk@dirkgerrits.com writes:
I'm not sure if this is appropriate for this mailing list, but I was wondering if and how people are using SLIME to test their programs. Looking at the SLIME manual, there is nothing that really jumps out at me as a program testing tool.
If you haven't already then you might like to look at:
ClickCheck by Darius Bacon: http://www.accesscom.com/~darius/software/clickcheck.html
You tell it how to make test cases and then it automatically tests hundreds of randomly-generated ones.
Peter Seibel's unit test framework in his book: http://www.gigamonkeys.com/book/practical-building-a-unit-test-framework.htm...
I always go ad-hoc myself because, frankly, I don't take testing as seriously as I should :-). In SLIME we have a homebrew elisp-driven testing framework on `M-x slime-run-tests' that presents its results in an outline-mode buffer.
I have a CL program for encoding and decoding TCP/IP packets that tests itself. I did that ad-hoc, but now that I think about it, that might be a good application for ClickCheck: http://www.hexapodia.net/pipermail/small-cl-src/2004-July/000030.html
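For what it's worth, the basic shape of such a self-test is a round trip over generated inputs. A rough sketch -- MAKE-PACKET, PACKET=, ENCODE-PACKET and DECODE-PACKET are placeholders here, not the actual program's API:

  (defun random-packet ()
    "Build a packet with random but plausible field values."
    (make-packet :source-port (random 65536)
                 :destination-port (random 65536)
                 :payload (let ((bytes (make-array (random 64)
                                                   :element-type '(unsigned-byte 8))))
                            (map-into bytes (lambda () (random 256))))))

  (defun round-trip-test (&optional (trials 100))
    "Encode and decode TRIALS random packets, checking that each one
  survives the round trip unchanged (up to PACKET= equality)."
    (loop repeat trials
          for packet = (random-packet)
          always (packet= packet (decode-packet (encode-packet packet)))))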
-Luke
Luke Gorrie wrote:
Dirk Gerrits dirk@dirkgerrits.com writes:
I'm not sure if this is appropriate for this mailing list, but I was wondering if and how people are using SLIME to test their programs. Looking at the SLIME manual, there is nothing that really jumps out at me as a program testing tool.
If you haven't already then you might like to look at: [snipped]
These are certainly interesting, but I wasn't really asking what testing framework I should use. (We have http://www.cliki.net/Test%20Framework for that. ;))
My problem is how to use any such testing framework /effectively/. The style of writing a bit of code in the REPL, trying it out, and copying it to my file, editing it a bit, trying it out in the REPL again... is comfortable for me.
When writing tests, you have to compare the results you get with the results you expect, but sometimes I'm not so sure about the latter. The REPL approach lets me actually get the result, and then argue about whether it is correct or not. I suppose I could then copy this result, make it into a test case in my file, and stop whining on this list, but somehow even that is already too much for me. (Talk about commitment to testing, eh? ;))
Perhaps I should look into SLIME's internals and hack up a command to make the last REPL evaluation into a test case. Possibly with some hooks to make it work for any given testing framework.
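Something along these lines, perhaps (PREVIOUS-EVALUATION-AS-TEST is a made-up name, and this sketch assumes the RT framework; a real SLIME command would presumably live in Elisp instead). At the REPL, the standard variables + and * hold the previous form and its value, so a little helper can print them as a DEFTEST form to paste into a test file:

  (defun previous-evaluation-as-test (name &optional (stream *standard-output*))
    "Print an RT:DEFTEST form that pairs the previous REPL form (+)
  with its value (*) under NAME.  Only useful when the value has a
  readable printed representation."
    (format stream "~&(rt:deftest ~S~%  ~S~%  ~S)~%" name + *))

  ;; For example, after evaluating (frob 42) by hand and getting :FROBBED-42:
  ;;   CL-USER> (previous-evaluation-as-test 'frob.42)
  ;;   (rt:deftest FROB.42
  ;;     (FROB 42)
  ;;     :FROBBED-42)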
I always go ad-hoc myself because, frankly, I don't take testing as seriously as I should :-).
Well, it's not really our fault; Common Lisp makes ad-hoc testing so damn easy. :) It's not as if C++ has a REPL. (Well, I guess there's http://home.mweb.co.za/sd/sdonovan/underc.html but let's not digress into such madness.)
In SLIME we have a homebrew elisp-driven testing framework on `M-x slime-run-tests' that presents its results in an outline-mode buffer.
Nice. Something like the compilation notes buffer, but for test results. That might be nice to have for programs created /with/ SLIME. Hmm, more hacking to do. :)
I have a CL program for encoding and decoding TCP/IP packets that tests itself. I did that ad-hoc, but now I think about that might be a good application for ClickCheck. http://www.hexapodia.net/pipermail/small-cl-src/2004-July/000030.html
Yes, ClickCheck seems like a very nice idea. I'm just a bit worried that by using a different random seed each time you may be 'losing' test cases that may 'resurface' at any future moment, at which point it may be harder to know what caused the error. But then, such a case might be one you wouldn't have written using a non-random approach... And with the same random seed, you're just getting a static set of test cases that may not include the critical ones, which you'd then have to add yourself... Let's say that I'm intrigued but skeptical. :)
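One way around the 'losing test cases' worry might be to record the failing inputs from each random run, so they can be promoted to ordinary fixed regression tests. A rough sketch with made-up names (this is not ClickCheck's actual API):

  (defun run-random-trials (test-fn n &key (generator (lambda () (random most-positive-fixnum))))
    "Call TEST-FN on N inputs produced by GENERATOR and return the list
  of inputs for which it returned NIL, so they can be added to the test
  suite as fixed cases."
    (let ((failures '()))
      (dotimes (i n (nreverse failures))
        (let ((input (funcall generator)))
          (unless (funcall test-fn input)
            (push input failures))))))

  ;; e.g. (run-random-trials (lambda (x) (= x (parse-integer (princ-to-string x)))) 1000)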
Regards,
Dirk Gerrits
On Fri, 16 Jul 2004, Dirk Gerrits wrote:
My problem is how to use any such testing framework /effectively/. The style of writing a bit of code in the REPL, trying it out, and copying it to my file, editing it a bit, trying it out in the REPL again... is comfortable for me.
FWIW.
I never type any code I'm likely to want to keep into the REPL. If there's a place for it, it goes into the appropriate file; if not, I tend to keep a scratch.lisp buffer around for stuff like that. Then I either C-x C-e or C-c C-c it.
In the REPL I only do very ephemeral stuff: tweaking FORMAT expressions, munging files, using it as a desktop calculator, REQUIRE'ing stuff, etc.
For tests I typically use the RT framework, and have a test-op for the system the tests are for. To run all the tests I just do (in the REPL):
(oos 'test-op :foo)
and when working on a single case I may edit & recompile the relevant code & tests, and then just rerun those ones from the REPL:
(rt:do-test 'foo-tests:frob.42)
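Roughly, the wiring behind that looks like this (the system, package, and test names are made up for illustration):

  ;;; In foo-tests.lisp:
  (defpackage :foo-tests
    (:use :cl)
    (:export #:frob.42))
  (in-package :foo-tests)

  (rt:deftest frob.42
    (foo:frob 42)   ; the form under test
    :frobbed-42)    ; the expected value(s)

  ;;; In foo.asd, so that (oos 'test-op :foo) loads the tests and runs RT.
  ;;; INTERN is used so the .asd file can be read before RT is loaded.
  (defmethod asdf:perform ((op asdf:test-op)
                           (system (eql (asdf:find-system :foo))))
    (asdf:oos 'asdf:load-op :foo-tests)
    (funcall (intern "DO-TESTS" :rt)))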
Cheers,
-- Nikodemus "Not as clumsy or random as a C++ or Java. An elegant weapon for a more civilized time."
Luke Gorrie luke@bluetail.com writes:
In SLIME we have a homebrew elisp-driven testing framework on `M-x slime-run-tests' that presents its results in an outline-mode buffer.
With the current CVS and CMUCL 18e-r4, I get 'Failed on 23 (0 expected) of 43 tests'. I've attached the outline-mode buffer contents. The first failure seems to be a red herring, and the others appear to be of the form '"CL-USER> ..." expected but got "SWANK> ..."'.
Regards,
Dirk Gerrits
Dirk Gerrits dirk@dirkgerrits.com writes:
Luke Gorrie luke@bluetail.com writes:
In SLIME we have a homebrew elisp-driven testing framework on `M-x slime-run-tests' that presents its results in an outline-mode buffer.
With the current CVS and CMUCL 18e-r4, I get 'Failed on 23 (0 expected) of 43 tests'. I've attached the outline-mode buffer contents. The first failure seems to be a red herring, and the others appear to be of the form '"CL-USER> ..." expected but got "SWANK> ..."'.
Hmmmm. That sounds like the stuff I've been mucking around in lately. I'll take a look.
-Peter
Peter Seibel peter@javamonkey.com writes:
Dirk Gerrits dirk@dirkgerrits.com writes:
Luke Gorrie luke@bluetail.com writes:
In SLIME we have a homebrew elisp-driven testing framework on `M-x slime-run-tests' that presents its results in an outline-mode buffer.
With the current CVS and CMUCL 18e-r4, I get 'Failed on 23 (0 expected) of 43 tests'. I've attached the outline-mode buffer contents. The first failure seems to be a red herring, and the others appear to be of the form '"CL-USER> ..." expected but got "SWANK> ..."'.
Hmmmm. That sounds like the stuff I've been mucking around in lately. I'll take a look.
Okay. So I obviously didn't run the tests before I committed those changes. Shame on me. So now I'm trying to be good and run them and am getting not only all kinds of failures but also--it seems--the tests themselves hanging. I'm using Allegro 6.2 on GNU/Linux. Any ideas?
-Peter
Peter Seibel peter@javamonkey.com writes:
Okay. So I obviously didn't run the tests before I committed those changes. Shame on me. So now I'm trying to be good and run them and am getting not only all kinds of failures but also--it seems--the tests themselves hanging. I'm using Allegro 6.2 on GNU/Linux. Any ideas?
I'm not sure. I have only used the test suite under CMUCL myself, but recently Helmut checked in some stuff that suggests he's run it everywhere (it knows how many failures to expect per Lisp.)
FWIW the test suite sometimes takes some time (e.g. a minute or two) to run and might appear to have hung. I ran it successfully (reporting one trivial/false failure) under CMUCL-19a-pre3 earlier this week.
If it does hang you can C-g it to see the partial results. If you set `slime-test-debug-on-error' you'll get the Elisp debugger if there is an error in a test or if you abort it.
Luke Gorrie luke@bluetail.com writes:
Peter Seibel peter@javamonkey.com writes:
Okay. So I obviously didn't run the tests before I committed those changes. Shame on me. So now I'm trying to be good and run them and am getting not only all kinds of failures but also--it seems--the tests themselves hanging. I'm using Allegro 6.2 on GNU/Linux. Any ideas?
I'm not sure. I have only used the test suite under CMUCL myself, but recently Helmut checked in some stuff that suggests he's run it everywhere (it knows how many failures to expect per Lisp.)
FWIW the test suite sometimes takes some time (e.g. a minute or two) to run and might appear to have hung. I ran it successfully (reporting one trivial/false failure) under CMUCL-19a-pre3 earlier this week.
Yes. This was my main problem--I saw a message in the mini-buffer that said "Evaluation aborted" or some such, after which it appeared to hang. So I assumed something bad had happened. But if I just waited a while, things went okay. (However, under SBCL 0.8.12 it asked me whether I wanted to enter a recursive edit and then dropped me in the debugger. Didn't have that problem in Allegro.)
Anyway, I checked in a fix for the package-related test failures that I introduced the other day. Sorry about that. There are still 7 failures instead of the expected 5 under Allegro. Here are the two unexpected failures:
* arglist
** input: (swank:start-server (swank:start-server port-file &optional (style *communication-style*) dont-close))
At the top level (no debugging or pending RPCs)
FAILED: Argument list is as expected:
  expected: ["(swank:start-server port-file &optional (style *communication-style*) dont-close)"]
  actual:   ["(swank:start-server port-file &optional style dont-close)"]
At the top level (no debugging or pending RPCs)
* interactive-eval-output
** input: ((+ 1 2) ;;;; (+ 1 2) ... SWANK> nil)
Buffer contains result:
  expected: [";;;; (+ 1 2) ... SWANK> "]
  actual:   [";;;; (+ 1 2) ... SWANK> "]
Buffer visible?:
  expected: [nil]
  actual:   [nil]
** input: ((princ 10) ;;;; (princ 10) ... 10 SWANK> t)
FAILED: Buffer contains result:
  expected: [";;;; (princ 10) ... 10 SWANK> "]
  actual:   [";;;; (princ 10) ... 10
SWANK> "]
-Peter
Peter Seibel peter@javamonkey.com writes:
Anyway, I checked in a fix for the package-related test failures that I introduced the other day. Sorry about that. There are still 7 failures instead of the expected 5 under Allegro.
Okay, I fixed one of the two unexpected failures, and the other failure went away by magic. So we're back to only 5 failures, all expected, under Allegro 6.2.
-Peter
Peter Seibel peter@javamonkey.com writes:
Okay, I fixed one of the two unexpected failures, and the other failure went away by magic. So we're back to only 5 failures, all expected, under Allegro 6.2.
Thanks!
I get the feeling that the effort we put into the test suite now, as we approach 1.0, will really pay off.
Peter Seibel peter@javamonkey.com writes:
Okay, I fixed one of the two unexpected failures, and the other failure went away by magic. So we're back to only 5 failures, all expected, under Allegro 6.2.
Nice! Under CMUCL 18e I still get this one though:
* arglist
** input: (swank:start-server (swank:start-server port-file &optional (style *communication-style*) dont-close))
At the top level (no debugging or pending RPCs)
FAILED: Argument list is as expected:
  expected: ["(swank:start-server port-file &optional (style *communication-style*) dont-close)"]
  actual:   ["(swank:start-server port-file &optional (style *communication-style*) dont-close)"]
At the top level (no debugging or pending RPCs)
Not really critical, but it prevents me from getting that 'all tests passed'. ;)
Luke Gorrie luke@bluetail.com writes:
I get the feeling that the effort we put into the test suite now, as we approach 1.0, will really pay off.
Just an idea, not sure if it's feasible, but perhaps it might be nice to have a daily/weekly/whateverly report of actual versus expected failures for all supported implementations sent to the list, just like the daily changelog diff.
Regards,
Dirk Gerrits
Dirk Gerrits dirk@dirkgerrits.com writes:
Just an idea, not sure if it's feasible, but perhaps it might be nice to have a daily/weekly/whateverly report of actual versus expected failures for all supported implementations sent to the list, just like the daily changelog diff.
Yes, that's an excellent idea. We have a script called 'test.sh' in CVS that drives the test suite in batch-mode if anyone wants to have a crack at setting this up. There'd be some rough edges but we're here to help.
One tricky bit is that the implementations don't all run on the same OS/CPU, and the commercial ones require licenses to run in batch mode.
-Luke