[cl-debian] sbcl with sb-threads not compatible with 2.4 kernel

Hi,
It seems that even though we test in linux-os.c whether futexes work and set linux_no_threads_p accordingly, something is calling futex_wait nevertheless. As the futex calls do _NO_ error checking, this means the system then goes into a loop, retrying the failing futex call over and over again.
So: is 2.4 support dead or is this a bug?
Groetjes, Peter
--
signature -at- pvaneynd.mailworks.org
http://www.livejournal.com/users/pvaneynd/
"God, root, what is difference?" Pitr | "God is more forgiving." Dave Aronson|
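The missing error check described above could be sketched roughly as follows. This is only an illustration, not SBCL's runtime code: the names checked_futex_wait and futex_available are hypothetical, and the sketch assumes a Linux system where the raw futex system call is reachable through syscall(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper (not SBCL's futex_wait): returns 0 on success or a
 * negative errno value, so the caller can tell ENOSYS apart from a wakeup. */
int checked_futex_wait(int *addr, int expected)
{
    long rc = syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    return rc == -1 ? -errno : 0;
}

/* Probe once at startup. The futex word holds 0 but we claim to expect 1,
 * so a kernel with futex support returns immediately (EAGAIN) instead of
 * blocking. Returns 0 only when the kernel reports ENOSYS, 1 otherwise. */
int futex_available(void)
{
    int word = 0;
    int rc = checked_futex_wait(&word, 1);
    return rc != -ENOSYS;
}
```

A caller that gets -ENOSYS back from such a wrapper can stop or fall back, rather than retrying a call that can never succeed.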

Peter Van Eynde <pvaneynd@debian.org> writes:
So: is 2.4 support dead or is this a bug?
Uh, it sounds like a bug: one, just like any other, which will be fixed when sufficient interest is taken in fixing it from people it affects. If no-one is interested in fixing it, then 2.4 support on the x86 will effectively be dead. Surely you don't need this explained? Cheers, Christophe

On Wednesday 10 August 2005 08:18, Peter Van Eynde wrote:
Hi,
It seems that even though we test in linux-os.c whether futexes work and set linux_no_threads_p accordingly, something is calling futex_wait nevertheless. As the futex calls do _NO_ error checking, this means the system then goes into a loop, retrying the failing futex call over and over again.
That's strange. I have a 2.6 kernel, but replacing futex_wake body with "return 0;" and futex_wait with "for(;;);" didn't give me problems. There are two calls to futex_wait: get-mutex calls it if the mutex is not available (should not happen and thus ok to loop if there is only one thread), and condition-wait which I think is similar.
So: is 2.4 support dead or is this a bug?
Tell us how to reproduce it and what is the observed behaviour.
Groetjes, Peter
Cheers, Gábor

Gábor Melis wrote:
So: is 2.4 support dead or is this a bug?
Tell us how to reproduce it and what is the observed behaviour.
The easiest way to reproduce is running the official 0.9.3 binaries (extracted from the rpm) on a debian kernel-image-2.4.27-2-686 kernel. You get:
pvaneynd@sharrow:~/downloads/sbcl-test :) $ ./sbcl --core ./sbcl.core --userinit /dev/null --sysinit /dev/null
Linux with NPTL support (e.g. kernel 2.6 or newer) required for
thread-enabled SBCL. Disabling thread support.
This is SBCL 0.9.3, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
internal error #31
SC: 14, Offset: 0 lispobj 0x500000b
SC: 14, Offset: 2 lispobj 0x50679af
fatal error encountered in SBCL pid 7714(tid 0):
internal error too early in init, can't recover
The system is too badly corrupted or confused to continue at the Lisp
level. If the system had been compiled with the SB-LDB feature, we'd drop
into the LDB low-level debugger now. But there's no LDB in this build, so
we can't really do anything but just exit, sorry.
pvaneynd@sharrow:~/downloads/sbcl-test :( $ md5sum sbcl sbcl.core
6ac2292d6245d6731f4b811cffc5ea16 sbcl
99283ac68b50ed807c10631798428872 sbcl.core
Groetjes, Peter

Peter Van Eynde wrote:
Tell us how to reproduce it and what is the observed behaviour.
I got a better example:
3/pvaneynd@sharrow:~/fakeroot/darcs-upstream :( $ uname -a
Linux sharrow 2.6.12.3-mine2 #1 Fri Aug 5 18:19:08 CEST 2005 i686 GNU/Linux
3/pvaneynd@sharrow:~/fakeroot/darcs-upstream :) $ sbcl
This is SBCL 0.9.3, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (quit)
3/pvaneynd@sharrow:~/fakeroot/darcs-upstream :) $ LD_ASSUME_KERNEL=2.4.1 sbcl
This is SBCL 0.9.3, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
internal error #29
SC: 14, Offset: 4 lispobj 0x50ba64f
fatal error encountered in SBCL pid 11009(tid 0):
internal error too early in init, can't recover
The system is too badly corrupted or confused to continue at the Lisp
level. If the system had been compiled with the SB-LDB feature, we'd drop
into the LDB low-level debugger now. But there's no LDB in this build, so
we can't really do anything but just exit, sorry.
3/pvaneynd@sharrow:~/fakeroot/darcs-upstream :( $ LD_ASSUME_KERNEL=2.4.1 sbcl
This is SBCL 0.9.3, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Segmentation fault
Please note that the error changes.
I got the idea from a debian bugreport: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511&archive=yes
I quote: "Recent glibc switches to use NPTL instead of LinuxThreads when 2.6 kernel is used. If you set environment variable LD_ASSUME_KERNEL=2.4.1 and rerun his programs on 2.6 kernel, the problem is just disappeared (because LinuxThreads is used). Note that NPTL uses futex for mutex protection, instead LinuxThreads uses signal."
I traced the problem on a 2.4 kernel until the first nl_langinfo that blows up with a sigsegv, so it seems to possibly fit.
So getting threaded sbcl to work again on 2.4 looks a little too difficult to do in the short amount of time I have available.
Do people think it is better to have a sbcl that bombs out on 2.4 with "you should run 2.6" or should I drop the threading on x86?
Groetjes, Peter
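One way to picture the "bomb out on 2.4" option is a check on the kernel release string, sketched below. This is an assumption-laden illustration, not what the runtime does: release_at_least and running_kernel_at_least are hypothetical names, and the parsing only looks at the leading major.minor of strings such as the "2.6.12.3-mine2" shown by uname -a above.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if a release string such as
 * "2.6.12.3-mine2" is at least major.minor, 0 if it is older or
 * cannot be parsed. */
int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return 0;
    return maj > major || (maj == major && min >= minor);
}

/* Apply the same test to the running kernel as reported by uname(2). */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;
    if (uname(&u) != 0)
        return 0;
    return release_at_least(u.release, major, minor);
}
```

Note that a version check alone would not catch the LD_ASSUME_KERNEL=2.4.1 case shown above, where the kernel is 2.6 but glibc selects LinuxThreads; the threading library has to be checked separately.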

Peter Van Eynde wrote: [snip]
Do people think it is better to have a sbcl that bombs out on 2.4 with "you should run 2.6" or should I drop the threading on x86?
The Debian kernel team pushes strongly for 2.6 only (at least for i386 and amd64) and there's currently not much activity for 2.4 updates beyond 2.4.27 for sarge. Some architectures may keep 2.4 for a while, but for i386/amd64 it is de facto dead. Thiemo

On Thu, 2005-08-11 at 23:32 +0200, Thiemo Seufer wrote:
Peter Van Eynde wrote: [snip]
Do people think it is better to have a sbcl that bombs out on 2.4 with "you should run 2.6" or should I drop the threading on x86?
The Debian kernel team pushes strongly for 2.6 only (at least for i386 and amd64) and there's currently not much activity for 2.4 updates beyond 2.4.27 for sarge. Some architectures may keep 2.4 for a while, but for i386/amd64 it is de facto dead.
Argh, have mercy on us poor sysadmins. The last thing I need to do is upgrade all of our servers to 2.6 only to keep sbcl happy :-/ Cheers Ralf Mattes
Thiemo
_______________________________________________
cl-debian mailing list
cl-debian@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/cl-debian

R. Mattes wrote:
On Thu, 2005-08-11 at 23:32 +0200, Thiemo Seufer wrote:
Peter Van Eynde wrote: [snip]
Do people think it is better to have a sbcl that bombs out on 2.4 with "you should run 2.6" or should I drop the threading on x86?
The Debian kernel team pushes strongly for 2.6 only (at least for i386 and amd64) and there's currently not much activity for 2.4 updates beyond 2.4.27 for sarge. Some architectures may keep 2.4 for a while, but for i386/amd64 it is de facto dead.
Argh, have mercy on us poor sysadmins. The last thing I need to do is upgrade all of our servers to 2.6 only to keep sbcl happy :-/
Sarge will still retain 2.4.27 as installed default, only unstable will change, which is hopefully not what you run on your servers. :-) An additional sarge backport of a newer SBCL with threading disabled might be useful. Thiemo

Thanks for all the input. I've decided to make sbcl bomb out when having :sb-threads and running on 2.4 or when NPTL is not available. (see patch)
Thiemo Seufer wrote:
An additional sarge backport of a newer SBCL with threading disabled might be useful.
I've decided to have a separate sarge24 (and hoary24) directory on the p.d.o repository with a sbcl without threading.
Groetjes, Peter
--- sbcl-0.9.3.36.orig/src/runtime/linux-os.c
+++ sbcl-0.9.3.36/src/runtime/linux-os.c
@@ -92,6 +92,21 @@
 int linux_sparc_siginfo_bug = 0;
 int linux_no_threads_p = 0;
+#ifdef LISP_FEATURE_SB_THREAD
+int isnptl (void)
+{
+  size_t n = confstr (_CS_GNU_LIBPTHREAD_VERSION, NULL, 0);
+  if (n > 0)
+    {
+      char *buf = alloca (n);
+      confstr (_CS_GNU_LIBPTHREAD_VERSION, buf, n);
+      if (strstr (buf, "NPTL"))
+        return 1;
+    }
+  return 0;
+}
+#endif
+
 void os_init(void)
 {
@@ -121,9 +136,13 @@
     }
 #ifdef LISP_FEATURE_SB_THREAD
     futex_wait(futex,-1);
-    if(errno==ENOSYS) linux_no_threads_p = 1;
-    if(linux_no_threads_p)
-        fprintf(stderr,"Linux with NPTL support (e.g. kernel 2.6 or newer) required for \nthread-enabled SBCL. Disabling thread support.\n\n");
+    if(errno==ENOSYS) {
+        lose(stderr,"This version of sbcl is compiled with threading support, but your kernel is too old to support this.\n\
+Please use a more recent kernel or a version without threading support.\n");
+    }
+    if(! isnptl()) {
+        lose("This version of sbcl only works correctly with the NPTL threading library. Please use a newer glibc, older sbcl or stop using LD_ASSUME_KERNEL");
+    }
 #endif
     os_vm_page_size = getpagesize();
 }
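For readers who want to try the NPTL check from the patch outside the SBCL runtime, a standalone variant might look like the sketch below. It assumes glibc, which provides _CS_GNU_LIBPTHREAD_VERSION; the function names version_names_nptl and libpthread_is_nptl are hypothetical, and malloc is used in place of the patch's alloca purely as a choice for this sketch.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: does a libpthread version string mention NPTL?
 * glibc reports strings of the form "NPTL <version>" or
 * "linuxthreads-<version>". */
int version_names_nptl(const char *version)
{
    return version != NULL && strstr(version, "NPTL") != NULL;
}

/* Ask the C library which thread implementation is in use.
 * Returns 1 for NPTL, 0 otherwise (including when the query is
 * not supported by this C library). */
int libpthread_is_nptl(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, NULL, 0);
    if (n == 0)
        return 0;
    char *buf = malloc(n);
    if (buf == NULL)
        return 0;
    confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, n);
    int result = version_names_nptl(buf);
    free(buf);
    return result;
#else
    return 0;
#endif
}
```

Splitting the string test from the confstr call keeps the matching logic checkable without depending on which threading library the host happens to use.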

On Thursday 11 August 2005 23:21, Peter Van Eynde wrote:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=301511&archive=yes
I quote: "Recent glibc switches to use NPTL instead of LinuxThreads when 2.6 kernel is used. If you set environment variable LD_ASSUME_KERNEL=2.4.1 and rerun his programs on 2.6 kernel, the problem is just disappeared (because LinuxThreads is used). Note that NPTL uses futex for mutex protection, instead LinuxThreads uses signal."
I've read the bug report twice and still don't understand what it means for us. It says that one can get caught out by calling non-reentrant glibc functions from signal handlers and if glibc uses nptl a likely symptom is blocking in a futex.
I traced the problem on a 2.4 kernel until the first nl_langinfo that blows up with a sigsegv, so it seems to possibly fit.
I traced the problem on a 2.4 kernel until the first futex call, which returns with ENOSYS and is followed immediately by an endless stream of SIGSEGVs.
So getting threaded sbcl to work again on 2.4 looks a little too difficult to do in the short amount of time I have available.
That I agree with. And it wouldn't be very surprising if it turned out that with linuxthreads linked in we don't have a chance at all.
Do people think it is better to have a sbcl that bombs out on 2.4 with "you should run 2.6" or should I drop the threading on x86?
Let me help, if you compile it with gcc4 then disable threading because it just doesn't work at all (or as well as when compiled with gcc3 :-)). If it is compiled with gcc3 then I'd go with threads and drop 2.4 support.
Groetjes, Peter
Cheers, Gabor
participants (6)
- Christophe Rhodes
- Gábor Melis
- Peter Van Eynde
- Peter Van Eynde
- R. Mattes
- Thiemo Seufer