[cl-debian] Bug#407606: cmucl fails at initialization

Package: cmucl
Version: 19d-20061116-1
Severity: normal

Since I last upgraded cmucl, I get the following error and backtrace whenever I start it.

$ cmucl
Error in function UNIX::SIGSEGV-HANDLER: Segmentation Violation at #x10044FB8.
   [Condition of type SIMPLE-ERROR]

Restarts:
  0: [ABORT] Skip remaining initializations.

Debug (type H for help)

(UNIX::SIGSEGV-HANDLER #<unused-arg> #<unused-arg> #.(SYSTEM:INT-SAP #x3FFFCA7C))
Source: Error finding source:
Error in function LISP::%ENUMERATE-SEARCH-LIST: Undefined search list: default
0] backtrace

0: (UNIX::SIGSEGV-HANDLER #<unused-arg> #<unused-arg> #.(SYSTEM:INT-SAP #x3FFFCA7C))
1: (UNIX::SIGSEGV-HANDLER 3 #<unused-arg> #<unused-arg> #.(SYSTEM:INT-SAP #x3FFFCA7C))[:EXTERNAL]
2: ("call_into_lisp+#x8C [#x805560C] cmucl")
3: ("funcall3+#x32 [#x8055422] cmucl")
4: ("interrupt_handle_now+#x105 [#x8050940] cmucl")
5: (EQUAL #<Unprintable Instance {6C69663D}> #<ARRAY-TYPE SIMPLE-BASE-STRING>)
6: (EQUAL (#<Unprintable Instance {6C69663D}> . #<Unprintable Instance {682F3A65}>) (#<ARRAY-TYPE SIMPLE-BASE-STRING> #<MEMBER-TYPE NULL>))
7: ((FLET #:G30 KERNEL::%TYPE-INTERSECTION-CACHE-LOOKUP))
8: (KERNEL::%TYPE-INTERSECTION (#<ARRAY-TYPE SIMPLE-BASE-STRING> #<MEMBER-TYPE NULL>))
9: (KERNEL::UNION-COMPLEX-SUBTYPEP-ARG2 #<ARRAY-TYPE SIMPLE-BASE-STRING> #<UNION-TYPE LIST>)
10: (KERNEL:CSUBTYPEP #<ARRAY-TYPE SIMPLE-BASE-STRING> #<UNION-TYPE LIST>)
11: (MAKE-SEQUENCE SIMPLE-STRING 11 :INITIAL-ELEMENT NIL)
12: (LISP::CONCAT-TO-SIMPLE* SIMPLE-STRING "/home/fare" "/")
13: (DEFAULT-DIRECTORY)
14: (LISP::ENVIRONMENT-INIT)
15: ((LABELS LISP::%RESTART-LISP SAVE-LISP))
16: ((LABELS LISP::RESTART-LISP SAVE-LISP))

0] (quit)
$

Interestingly, if I run cmucl -noinit, I get no such error. However, I have no initialization file, as far as I can tell: neither ~/init.lisp nor ~/.cmucl-init.lisp exists. And strace fails on cmucl after the first memory-management-related segfault, so I don't know what's going on.
Also interestingly, if another user starts cmucl, it works. If the same user starts with a different $HOME, it also fails. If another user with essentially the same config starts it, it works.

It used to all work fine. Failure 100% reproducible in my current environment. I'm baffled.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
May your desire to be correct overcome your desire to have been correct (which you were not, anyway). -- Faré

-- System Information:
Debian Release: 4.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.16.9-blefuscu
Locale: LANG=en_US.iso-8859-1, LC_CTYPE=en_US.iso-8859-1 (charmap=ISO-8859-1) (ignored: LC_ALL set to en_US.iso-8859-1)

Versions of packages cmucl depends on:
ii  common-lisp-controller  6.9     This is a Common Lisp source and c
ii  debconf [debconf-2.0]   1.5.11  Debian configuration management sy

Versions of packages cmucl recommends:
pn  binfmt-support  <none>  (no description available)

-- debconf information:
  cmucl/upgradeproblems:

On Saturday 20 January 2007 05:28, Faré wrote:
> Since I last upgraded cmucl, I get the following error and backtrace whenever I start it.
> $ cmucl
> Error in function UNIX::SIGSEGV-HANDLER: Segmentation Violation at #x10044FB8. [Condition of type SIMPLE-ERROR]

Hmm.

> -- System Information: .. Kernel: Linux 2.6.16.9-blefuscu

Could you upgrade your kernel to the new one in unstable (linux-image-2.6.18-3-686)? I'm guessing that will fix the problem.

Groetjes, Peter
--
signature -at- pvaneynd.mailworks.org
http://www.livejournal.com/users/pvaneynd/
"God, root, what is difference?" Pitr | "God is more forgiving." Dave Aronson|

Using a custom-compiled kernel 2.6.18 (has to be custom - or it won't boot on this crypto'ed machine), I have exactly the same symptoms (plus other unrelated trouble trying to resume-from-ram). I don't think it is kernel-related.

Actually, I realize that it dies when I'm within screen with TERM=screen.linux as fare, but not when I am in a session outside of screen, when I override TERM with "screen" or "linux", or run as root or a different user in screen. Note that screen.linux is defined in $TERMCAP (that is the same in both working and non-working sessions). Something is fishy, possibly in cmucl's TERM handling.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Wealth, like happiness, is never attained when sought after directly. It comes as a by-product of providing a useful service. -- Henry Ford

After some experimentation, it looks like the problem is a buffer overflow (of all things!) when variable $TERMCAP is too big. Removing an entry from $TERMCAP makes cmucl happy. Making one lengthier again makes it unhappy again.

PS: <subliminal>cl-launch</subliminal>

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
The difference between a programmer and a user, is that the programmer knows there is no difference between using and programming. -- Faré
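
To put numbers on "too big", it may help to record the sizes involved before each run. The following is only a minimal sketch: the helper names termcap_bytes and environ_bytes are made up for illustration and do not come from the thread, and the byte counts are those reported by wc, not necessarily what cmucl itself sees.

```shell
#!/bin/sh
# Hypothetical helpers (assumption: not part of cmucl or of the original report).

# Print the length in bytes of $TERMCAP, or 0 if it is unset or empty.
termcap_bytes() {
    printf '%s' "${TERMCAP-}" | wc -c | tr -d ' '
}

# Print the number of bytes that `env` emits for the whole environment
# (names, '=' signs, values and one newline per variable).
environ_bytes() {
    env | wc -c | tr -d ' '
}

echo "TERMCAP: $(termcap_bytes) bytes; environment: $(environ_bytes) bytes"
```

Running this in both a working and a failing session would show whether the two really differ in TERMCAP size or only in the rest of the environment.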

Faré wrote:
> After some experimentation, it looks like the problem is a buffer overflow (of all things!) when variable $TERMCAP is too big.

Interesting. And dangerous. But I cannot reproduce it:

$ echo $TERMCAP | wc -c
11448
$ cmucl
CMU Common Lisp CVS 19d 19d-release (19D), running on frost
With core: /usr/lib/cmucl/lisp.core
Dumped on: Sat, 2006-12-30 21:27:55+01:00 on frost
For support see http://www.cons.org/cmucl/support.html Send bug reports to the debian BTS.
or to pvaneynd@debian.org
type (help) for help, (quit) to exit, and (demo) to see the demos

Loaded subsystems:
    Python 1.1, target Intel x86
    CLOS based on Gerd's PCL 2004/04/14 03:32:47
*

> PS: <subliminal>cl-launch</subliminal>

I hear you ;-)

Groetjes, Peter

It's pretty reproducible here, with cmucl 19d-20061116-1, which is the latest I find in unstable.

When you test it, does your $TERM match your $TERMCAP? Otherwise, cmucl might not be trying to use it.

TERM=screen.linux
TERMCAP='SC|screen.linux|VT 100/ANSI X3.64 virtual terminal:\
:hs:ts=\E_:fs=\E\\:ds=\E_\E\\:\
:DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
:cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
:do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
:le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
:li#27:co#100:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
:cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
:im=\E[4h:ei=\E[4l:mi:IC=\E[%d@:ks=\E[?1h\E=:\
:ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
:ti=\E[?1049h:te=\E[?1049l:us=\E[4m:ue=\E[24m:so=\E[3m:\
:se=\E[23m:mb=\E[5m:md=\E[1m:mh=\E[2m:mr=\E[7m:\
:me=\E[m:ms:\
:Co#8:pa#64:AF=\E[3%dm:AB=\E[4%dm:op=\E[39;49m:AX:\
:vb=\Eg:as=\E(0:ae=\E(B:\
:ac=\140\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\
:k0=\E[10~:k1=\EOP:k2=\EOQ:k3=\EOR:k4=\EOS:k5=\E[15~:\
:k6=\E[17~:k7=\E[18~:k8=\E[19~:k9=\E[20~:k;=\E[21~:\
:F1=\E[23~:F2=\E[24~:F3=\E[25~:F4=\E[26~:F5=\E[28~:\
:F6=\E[29~:F7=\E[31~:F8=\E[32~:F9=\E[33~:FA=\E[34~:kb=:\
:K2=\E[G:kB=\E[Z:kh=\E[1~:@1=\E[1~:kH=\E[4~:@7=\E[4~:\
:kN=\E[6~:kP=\E[5~:kI=\E[2~:kD=\E[3~:ku=\EOA:kd=\EOB:\
:kr=\EOC:kl=\EOD:'

or

TERMCAP='SC|screen.linux|VT 100/ANSI X3.64 virtual terminal:\
:kr=\EOC111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111:kl=\EOD22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222:ku=\EOA111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111:kd=\EOB4444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444444:' [ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] Majority, n.: That quality that distinguishes a crime from a law. On 03/05/07, Peter Van Eynde <pvaneynd@debian.org> wrote:
> Faré wrote:
> > After some experimentation, it looks like the problem is a buffer overflow (of all things!) when variable $TERMCAP is too big.
>
> Interesting. And dangerous.
> But I cannot reproduce it:
>
> $ echo $TERMCAP | wc -c
> 11448
> $ cmucl
> CMU Common Lisp CVS 19d 19d-release (19D), running on frost
> With core: /usr/lib/cmucl/lisp.core
> Dumped on: Sat, 2006-12-30 21:27:55+01:00 on frost
> For support see http://www.cons.org/cmucl/support.html Send bug reports to the debian BTS.
> or to pvaneynd@debian.org
> type (help) for help, (quit) to exit, and (demo) to see the demos
>
> Loaded subsystems: Python 1.1, target Intel x86 CLOS based on Gerd's PCL 2004/04/14 03:32:47

Hello Faré

I tried with the environment set as you gave, but still it works. Actually I cannot find serious references to TERMCAP in the cmucl sources so I fail to see where it could crash the image...

What does strace say?

Groetjes, Peter

On 04/05/07, Peter Van Eynde <pvaneynd@debian.org> wrote:
> Hello Faré
> I tried with the environment set as you gave, but still it works. Actually I cannot find serious references to TERMCAP in the cmucl sources so I fail to see where it could crash the image...

There is a hemlock/termcap.lisp and a hemlock/rompsite.lisp, where the environment variable is used. Maybe it would help if these were compiled with a better debugging setting?

> What does strace say?

strace and ltrace output attached. Not very informative to me.

NB: regarding the suggestion by Pierre Thierry, I wouldn't know what to ask GDB.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
The main difference between a computer salesman and a used car salesman is that the used car salesman can probably drive and knows when he's lying. - Peter da Silva

On 08/05/2007 at 16:13, Faré wrote:
> NB: regarding the suggestion by Pierre Thierry, I wouldn't know what to ask GDB.

Just do:

$ gdb cmucl

If it segfaults, then do

(gdb) bt

That might help.

Quickly,
Pierre
--
nowhere.man@levallois.eu.org
OpenPGP 0xD9D50D8A

OK, it gets weirder.

On the zsh command-line, I can make it fail deterministically. In a sh script, it deterministically works. I traced that to the argv[0]. Using zsh, I can explicitly call

#!/bin/zsh -f
ARGV0=cmucl /usr/bin/cmucl

and have it fail deterministically (given the appropriately long TERMCAP and TERM -- otherwise it still works).

Note that in the backtrace, those frames are fishy:

10: (KERNEL:CSUBTYPEP #<ARRAY-TYPE SIMPLE-BASE-STRING> #<UNION-TYPE LIST>)
11: (MAKE-SEQUENCE SIMPLE-STRING 15 :INITIAL-ELEMENT NIL)
12: (LISP::CONCAT-TO-SIMPLE* SIMPLE-STRING "/home/fare/bug" "/")

The initial-element NIL is "undefined behaviour" when a BASE-CHAR is otherwise wanted. This is in CONCAT-TO-SIMPLE* from code/seq.lisp. The compiler might be confused between inferred types and producing something fishy.

When I start cmucl and get it in buggy mode, then trying to

(LISP::CONCAT-TO-SIMPLE* 'SIMPLE-STRING "/home/fare/bug" "/")

or

(MAKE-SEQUENCE 'SIMPLE-STRING 15 :INITIAL-ELEMENT NIL)

from the debugger's REPL consistently causes the same SIGSEGV. Whereas when I start it in non-buggy mode, the former gives the expected result "/home/fare/bug/" (the current directory) and the latter correctly gives a

Type-error in KERNEL::OBJECT-NOT-BASE-CHAR-ERROR-HANDLER: NIL is not of type BASE-CHAR

This suggests an issue with low-safety evaluation. How do I check the "current" safety settings?

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
The program isn't debugged until the last user is dead.

A big TERMCAP and an ARGV0 of length <= 7 reveal the bug. It looks like the overall size and/or alignment of the environment in general may contribute to revealing the bug or not. Indeed, trying to reproduce the bug with a different environment causes a very different pattern in when the bug is triggered. Sometimes, I don't even need to change TERM or TERMCAP.

If you can suggest other things to try, I'll be glad.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Every program has at least one bug and can be shortened by at least one instruction -- from which, by induction, one can deduce that every program can be reduced to one instruction which doesn't work.
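
If the trigger really is the overall size or alignment of the environment, one way to map it is to sweep a dummy variable through a range of lengths and note where the command under test fails. This is a rough sketch under stated assumptions: probe_padding and the PAD variable are hypothetical names, and the caller has to supply a command that exits non-zero exactly when the failure occurs.

```shell
#!/bin/sh
# probe_padding CMD MAX
# Run CMD (through sh -c) with a dummy environment variable PAD holding
# 0, 1, ..., MAX bytes, and print each padding size at which CMD exits
# with a non-zero status. PAD exists only to shift the environment layout.
probe_padding() {
    cmd=$1
    max=$2
    pad=
    n=0
    while [ "$n" -le "$max" ]; do
        if ! PAD="$pad" sh -c "$cmd" >/dev/null 2>&1 </dev/null; then
            echo "$n"
        fi
        pad="${pad}x"
        n=$((n + 1))
    done
}

# Possible use (an assumption: it relies on the error text from the report
# showing up in cmucl's output, which has not been verified here):
#   probe_padding '! echo "(quit)" | cmucl 2>&1 | grep -q SIGSEGV-HANDLER' 64
```

A regular pattern in the printed sizes (for example every fourth or eighth value) would point to alignment, while a single threshold would point to total size.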

Hi Faré!

On Wed, 20 Jun 2007 05:03:41 +0200, Faré wrote:
> a big TERMCAP and an ARGV0 of length <= 7 reveals the bug.

I cannot reproduce it on etch nor on a clean sid chroot:

=====
$ export TERM=screen.linux
$ export TERMCAP='SC|screen.linux|VT 100/ANSI X3.64 virtual terminal:\
:hs:ts=\E_:fs=\E\\:ds=\E_\E\\:\
:DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
:cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
:do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
:le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
:li#27:co#100:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
:cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
:im=\E[4h:ei=\E[4l:mi:IC=\E[%d@:ks=\E[?1h\E=:\
:ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
:ti=\E[?1049h:te=\E[?1049l:us=\E[4m:ue=\E[24m:so=\E[3m:\
:se=\E[23m:mb=\E[5m:md=\E[1m:mh=\E[2m:mr=\E[7m:\
:me=\E[m:ms:\
:Co#8:pa#64:AF=\E[3%dm:AB=\E[4%dm:op=\E[39;49m:AX:\
:vb=\Eg:as=\E(0:ae=\E(B:\
:ac=\140\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\
:k0=\E[10~:k1=\EOP:k2=\EOQ:k3=\EOR:k4=\EOS:k5=\E[15~:\
:k6=\E[17~:k7=\E[18~:k8=\E[19~:k9=\E[20~:k;=\E[21~:\
:F1=\E[23~:F2=\E[24~:F3=\E[25~:F4=\E[26~:F5=\E[28~:\
:F6=\E[29~:F7=\E[31~:F8=\E[32~:F9=\E[33~:FA=\E[34~:kb=:\
:K2=\E[G:kB=\E[Z:kh=\E[1~:@1=\E[1~:kH=\E[4~:@7=\E[4~:\
:kN=\E[6~:kP=\E[5~:kI=\E[2~:kD=\E[3~:ku=\EOA:kd=\EOB:\
:kr=\EOC:kl=\EOD:'
$ ARGV0=cmucl /usr/bin/cmucl
=====

I also tried the second TERMCAP setting you gave at [1], but still without success in reproducing the bug, both with bash or dash.

Can you confirm you still experience this bug, please?

Thx, bye,
Gismo / Luca

Footnotes:
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407606#30

Your message dated Wed, 25 Jun 2008 00:32:38 +0200 with message-id <871w2mfoqh.fsf@gismo.pca.it> and subject line Re: [cl-debian] Bug#407606: cmucl fails at initialization has caused the Debian Bug report #407606, regarding cmucl fails at initialization to be marked as done.

This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact owner@bugs.debian.org immediately.)

--
407606: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407606
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

participants (5)
- "Faré"
- Luca Capello
- owner@bugs.debian.org
- Peter Van Eynde
- Pierre THIERRY