[armedbear-devel] Losing on multiprocessing

8 Apr 2015

      Howdy,

I wonder if those of you who have worked with threads might have a quick
look to see if I am doing something stupid.

https://lsw2.googlecode.com/svn/branches/bona/util/jargrep.lisp

The situation is that I want to do stuff (like look for matches to a
regular expression) in 240k files which comprise 52G of data.

I am running on a VM allocated 5 CPUs each with three cores.

Because at the moment the disk subsystem isn't very fast, I decided to
approach this by breaking up the 240k files into 15 parts and putting each
part in a jar file.
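
To make the split concrete, here is a minimal sketch in Java of dealing a list of file names into 15 parts. It is an illustration only, not how the jars were actually built: the class name PartitionSketch, the partition helper, and the "file-N" names are all made up, and round-robin balances the number of files per part, not the number of bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {

    // Deal items round-robin into `parts` buckets: item i goes to bucket i % parts.
    public static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            buckets.add(new ArrayList<T>());
        }
        for (int i = 0; i < items.size(); i++) {
            buckets.get(i % parts).add(items.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical file names standing in for the real 240k files.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 240000; i++) {
            names.add("file-" + i);
        }
        List<List<String>> parts = partition(names, 15);
        System.out.println(parts.size() + " parts, first has "
                + parts.get(0).size() + " files");
    }
}
```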

The code linked above looks for a regular expression (two methods for
two different regex handlers: java and dk.brics.automaton).

It is invoked something like:

(jar-map-threads-automaton-find
 regex
 (generate-filename-sequence "/data/jars/15/file#.jar" 2 0 14))

This spawns off 15 threads that go at it for something around a minute. As
they find hits they save them in a lisp hash table keyed by the entry name
in the jar file, which is unique across all the jar files.

The result of running this is about (and there's the rub) 20 key-value
pairs in the hash table (I had read that ABCL hash tables are thread safe).
The problem is that different runs of this code on the same data produce
different numbers of key-value pairs, between 13 and 24!
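
For comparison, the pattern I am aiming for (one thread per jar, hits stored in a shared table keyed by entry name, results read only after every thread has finished) can be sketched in plain Java. This is not the code in jargrep.lisp, just an illustration under stated assumptions: the class JarGrepSketch and its helpers are made up, the jars are tiny temporary files written by the sketch itself, java.util.regex stands in for both regex handlers, and a ConcurrentHashMap stands in for the Lisp hash table. It also collects any exception raised inside a worker, since a thread that dies quietly would also show up as missing keys.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.regex.Pattern;

public class JarGrepSketch {

    // One thread per jar; returns entry name -> first matching line.
    public static Map<String, String> grepJars(List<Path> jars, Pattern regex)
            throws InterruptedException {
        Map<String, String> hits = new ConcurrentHashMap<>();
        Queue<Throwable> errors = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (Path jar : jars) {
            Thread t = new Thread(() -> scan(jar, regex, hits, errors));
            workers.add(t);
            t.start();
        }
        // Wait for every worker before anyone reads the table.
        for (Thread t : workers) {
            t.join();
        }
        if (!errors.isEmpty()) {
            throw new IllegalStateException(errors.size() + " worker(s) failed",
                    errors.peek());
        }
        return hits;
    }

    // Scan one jar line by line; record failures instead of dying silently.
    static void scan(Path jar, Pattern regex, Map<String, String> hits,
                     Queue<Throwable> errors) {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.isDirectory()) continue;
                try (BufferedReader r = new BufferedReader(new InputStreamReader(
                        jf.getInputStream(e), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        if (regex.matcher(line).find()) {
                            hits.put(e.getName(), line);
                            break;
                        }
                    }
                }
            }
        } catch (Throwable ex) {
            errors.add(ex);
        }
    }

    // Helper for the demo only: write a small jar from name -> text content.
    public static Path writeJar(Path dir, String name, Map<String, String> files)
            throws IOException {
        Path jar = dir.resolve(name);
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                out.putNextEntry(new JarEntry(f.getKey()));
                out.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return jar;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("jargrep");
        List<Path> jars = new ArrayList<>();
        for (int i = 0; i < 15; i++) {
            Map<String, String> files = new LinkedHashMap<>();
            files.put("part" + i + "/hit.txt", "alpha\nneedle " + i + "\n");
            files.put("part" + i + "/miss.txt", "nothing here\n");
            jars.add(writeJar(dir, "file" + i + ".jar", files));
        }
        Map<String, String> hits = grepJars(jars, Pattern.compile("needle \\d+"));
        System.out.println(hits.size() + " hits");
    }
}
```

With this shape the number of hits should be the same on every run, so if the Lisp version follows the same steps and still varies, the difference would have to come from the table itself, from reading it before all threads are done, or from errors inside the threads.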

I'm not sure whether I'm just not doing this the right way, in which case
it would be very helpful to get an explanation of why not, or there's a
problem somewhere in the implementation.

Any ideas would be greatly appreciated.

Best,
Alan

(LISP-IMPLEMENTATION-VERSION)
"1.2.0-dev-svn-14436M"

"Java_HotSpot(TM)_64-Bit_Server_VM-Oracle_Corporation-1.7.0_21-b11"

"amd64-Linux-3.8.0-30-generic"

Alan Ruttenberg