Hello.
I have recently performed cl-test-grid test runs for several ABCL versions, and have now reviewed the results.
Two things I want to report.
1. Some degradation of error handling was introduced somewhere between abcl-1.2.0-fasl42 and abcl-1.3.0-dev-svn-14580-fasl42.
It shows up when loading the ASDF systems :cells and :test-gtk (the latter is from the cells-gtk3 project). ASDF:load-op fails for these systems with a stack overflow.
The old ABCL allows me to catch the stack overflow error and record the asdf:load failure, while the new ABCL sometimes crashes (the java process terminates): http://common-lisp.net/project/cl-test-grid/abcl/abcl-diff16.html
To reproduce the situation on the cl-test-grid.cloud.efficito.com machine, let's ql:quickload the :cells system and save the ABCL process output:
Old ABCL:
java -jar ~/lisps/abcl-1.2.0/dist/abcl.jar --noinit --nosystem --batch --load ~/cl-test-grid/work-dir/agent/quicklisp/setup.lisp --eval '(handler-case (ql:quickload :cells) (serious-condition (e) (format t "!!! Serious condition occurred: ~A !!!~%" (type-of e))))' > log 2>&1 && less log
Old ABCL always behaves the same: it prints the stack trace, and then prints the message from our handler-case - "!!! Serious condition occurred .."
New ABCL:
java -jar ~/lisps/abcl/dist/abcl.jar --noinit --nosystem --batch --load ~/cl-test-grid/work-dir/agent/quicklisp/setup.lisp --eval '(handler-case (ql:quickload :cells) (serious-condition (e) (format t "!!! Serious condition occurred: ~A !!!~%" (type-of e))))' > log 2>&1 && less log
New ABCL sometimes prints only the "!!! Serious condition occurred ..." message from our handler-case, without the stack trace. But sometimes it doesn't invoke our handler-case at all, and instead prints a stack trace as shown below, and then the java process exits:
Exception in thread "interpreter" java.lang.ClassCastException: org.armedbear.lisp.Symbol cannot be cast to org.armedbear.lisp.LispThread$StackMarker
        at org.armedbear.lisp.LispThread.getStackTop(LispThread.java:721)
        at org.armedbear.lisp.LispThread.pushStackFrame(LispThread.java:736)
        at org.armedbear.lisp.Lisp.pushJavaStackFrames(Lisp.java:374)
        at org.armedbear.lisp.Lisp.stackError(Lisp.java:388)
        at org.armedbear.lisp.asdf_134.execute(asdf.lisp:1436)
        at org.armedbear.lisp.Symbol.execute(Symbol.java:814)
        at org.armedbear.lisp.LispThread.execute(LispThread.java:832)
        at org.armedbear.lisp.asdf_402.execute(asdf.lisp:4458)
        at org.armedbear.lisp.Symbol.execute(Symbol.java:803)
        at org.armedbear.lisp.LispThread.execute(LispThread.java:814)
        at org.armedbear.lisp.asdf_675.execute(asdf.lisp:6127)
The full stacktrace is in the file attached.
As you can see, the problem is caused by a ClassCastException thrown while handling the first (stack overflow) error. I don't know why the ClassCastException doesn't always happen. Maybe it depends on the internal state of the ABCL process.
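Since the crash is intermittent, it may help to count how often each outcome occurs over repeated runs. Below is a rough sketch of a shell helper for that. It assumes the output of each run was saved to its own file instead of a single "log"; the function names and the outcome labels are made up for illustration, and I have not checked whether the old ABCL stack trace could also match the first pattern.

```shell
# Classify one saved ABCL log by the two outcomes described above.
# classify_log and its labels (crashed/handled/unknown) are hypothetical names.
classify_log() {
    if grep -q 'java.lang.ClassCastException' "$1"; then
        # The interpreter thread died before our handler-case ran.
        echo crashed
    elif grep -q '!!! Serious condition occur' "$1"; then
        # Our handler-case printed its message (pattern tolerates spelling variants).
        echo handled
    else
        echo unknown
    fi
}

# Print a count per outcome for any number of saved logs.
tally_logs() {
    for f in "$@"; do
        classify_log "$f"
    done | sort | uniq -c
}
```

For example, after redirecting the runs to log.1, log.2 and so on, "tally_logs log.*" prints one line per outcome with the number of runs that ended that way.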
2. Erik asked whether cl-test-grid testing became faster after the patch for "Lazy allocation of LispStackFrames". There is no noticeable speedup: a full test run on both new and old ABCL takes around 18 hours.
For your information: in cl-test-grid every ASDF system and every test suite is run in a separate process; in total around 1500 processes are started. It might be the case that in some other use case Lisp code execution is faster, but in cl-test-grid the difference is not noticeable.
Best regards,
- Anton