From a mail by Stelian Ionescu:
From: Stelian Ionescu sionescu@cddr.org
To: armedbear-devel@common-lisp.net
Cc:
Date: Sat, 23 Mar 2013 01:07:04 +0000 (UTC)
Subject: Bordeaux-threads test hangs

The B-T test suite hangs on test "condition-variable". If I do a ^C ABCL exits, so I'm not sure how to debug it.

    CL-USER(1): (lisp-implementation-version)
    "1.1.1"
    "OpenJDK_64-Bit_Server_VM-Oracle_Corporation-1.7.0_17-b02"
    "amd64-Linux-3.7.10-10-default"
On 3/24/13 2150 , Erik Huelsmann wrote: […]
Even though I swear I ensured that BORDEAUX-THREADS always works before making releases, I have reproduced the problem on abcl-1.1.1 onwards, but it doesn't occur in abcl-1.1.0. I am about to kick off a regression bisect to figure out where this broke.
Filed as [ticket 312][1].
[1]: http://trac.common-lisp.net/armedbear/ticket/312
25.03.2013, 14:03, "Mark Evenson" evenson@panix.com: […]
Mark, the test does not always hang.
I think this is a bug in the test, because the test is not guaranteed to work.
It creates 100 threads, and each thread waits for (= i *shared*). In every thread i has a different value, from 0 to 99. So the threads are chained, and each thread waits until the previous one has incremented *shared*.
But the threads use bt:condition-notify to interact, which delivers the notification to only one of the waiting threads, and there is no guarantee that it will be the right thread.
The test passes on SBCL. Maybe SBCL always chooses to notify the first thread in the waiting queue.
But the bt:condition-notify contract does not require this.
In short, I think what we see is not a bug in ABCL or bordeaux-threads, but a bug in the test.
Best regards, - Anton
On 3/25/13 1129 , Anton Vodonosov wrote: […]
Mark, the test does not always hang.
Thanks for the confirmation, as I was just coming around to this realization as the only way to explain the inconsistencies.
Digging into the test
    (test condition-variable
      (setf *shared* 0)
      (let ((num-procs 100))
        (dotimes (i num-procs)
          (make-thread
           (compile nil
                    `(lambda ()
                       (with-lock-held (*lock*)
                         (loop until (= ,i *shared*)
                               do (condition-wait *condition-variable* *lock*))
                         (incf *shared*))
                       (condition-notify *condition-variable*)))))
        (with-lock-held (*lock*)
          (loop until (= num-procs *shared*)
                do (condition-wait *condition-variable* *lock*)))
        (is (equal num-procs *shared*))))
I really don't understand what is being tested here. Since there is no delay in starting the threads, for a non-loaded CPU each thread never really invokes the CONDITION-WAIT. Instead, each thread "sees" that it is the correct worker in the chain, calls essentially a no-op CONDITION-NOTIFY and then exits. Wouldn't one want to delay the execution of the threads by some random amount before starting things going?
On Mon, Mar 25, 2013 at 8:02 AM, Mark Evenson evenson@panix.com wrote: […]
The test suggests a cute idea, whether or not the original author intended it. Typically a condition variable is used for producing/consuming, but in this case it could be used for something like a game of "hot potato". The bug is that the potato is not getting passed around. The fix is to move the existing condition-notify inside the lock and to insert a condition-notify before each condition-wait call. In effect each participant in the game is both a consumer and a producer -- a relay for the potato until the participant is "it".
However even with that fix the test seems irredeemable because I don't think a cascade of condition-notifies is required to reach all waiting threads, i.e., all participants aren't guaranteed to get the potato. The potato could perpetually bounce between two participants, for example. At least there's no apparent need for a Lisp implementation to come with a "potato guarantee".
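For illustration only, here is one way the chain could be made independent of which waiter a notification reaches. This is not the fix proposed in this thread and not what the B-T suite does: it swaps the single shared condition variable for one condition variable per thread, so each condition-notify has at most one possible recipient. The function RUN-CHAIN and its local names are hypothetical, and the sketch assumes Bordeaux-Threads is already loaded under its usual BT nickname.

```lisp
;; Sketch under assumptions: Bordeaux-Threads is loaded (package nickname BT).
;; RUN-CHAIN and all local names are hypothetical, not part of the B-T test suite.
(defun run-chain (&optional (num-procs 100))
  (let ((lock (bt:make-lock))
        (shared 0)
        ;; Slot I is waited on only by thread I; the extra last slot
        ;; is waited on only by the calling thread.
        (cvs (make-array (1+ num-procs)))
        (threads '()))
    (dotimes (i (1+ num-procs))
      (setf (aref cvs i) (bt:make-condition-variable)))
    (dotimes (i num-procs)
      (let ((i i))                      ; fresh binding for each closure
        (push (bt:make-thread
               (lambda ()
                 (bt:with-lock-held (lock)
                   ;; Re-check the predicate after every wakeup.
                   (loop until (= i shared)
                         do (bt:condition-wait (aref cvs i) lock))
                   (incf shared)
                   ;; Notify while still holding the lock. Only thread I+1
                   ;; (or the caller, for the last thread) can be waiting here.
                   (bt:condition-notify (aref cvs (1+ i))))))
              threads)))
    (bt:with-lock-held (lock)
      (loop until (= num-procs shared)
            do (bt:condition-wait (aref cvs num-procs) lock)))
    (mapc #'bt:join-thread threads)
    shared))
```

Calling (run-chain) should return 100. A notification sent before its recipient has started waiting is harmless here, because every thread checks its predicate under the lock before it waits. Whether this still exercises what the original test meant to exercise is a separate question.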
On 3/25/13 1102 , Mark Evenson wrote: […]
Even though I swear I ensured that BORDEAUX-THREADS always works before making releases, I have reproduced the problem on abcl-1.1.1 onwards, but it doesn't occur in abcl-1.1.0. I am about to kick off a regression bisect to figure out where this broke.
Hmm, I'm getting very inconsistent test results, in that I can't get abcl-1.1.0 to pass the test that it previously passed. Maybe there is some interaction with SLIME and/or the underlying JVM platform and/or when ASDF decides to recompile things. I need to double-check my assumptions here and start the testing over to find where this fails.