Hi,
Has anyone looked at this? I'm traversing a large structure and using an eq (tried eql too) hash table to record where I've been so that I don't get caught in circularities. There are about 1.5M nodes, but what I see is that adding entries to the hash table slows down (a lot) over time, even when I give an initial size of 4,000,000. That would be a maximum occupancy of under 40%, lower than the load factor at which any implementation would typically rehash.
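For context, the core of what I'm doing looks roughly like this (simplified sketch; node-children stands in for my real accessor):

    ;; EQ hash table used as a visited set, pre-sized so it should never
    ;; need to grow for ~1.5M nodes.
    (defun walk (root visit-fn)
      (let ((seen (make-hash-table :test 'eq :size 4000000)))
        (labels ((walk-node (node)
                   (unless (gethash node seen)
                     (setf (gethash node seen) t)
                     (funcall visit-fn node)
                     ;; node-children is a stand-in for the real traversal code
                     (dolist (child (node-children node))
                       (walk-node child)))))
          (walk-node root))))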
The table fills quickly at first, but the insertion rate keeps dropping: at 300k entries I'm getting about 2,500 new entries per second; by 600k entries it is down to 750 entries/sec; at 900k (where it stands now) it's about 500 entries/sec.
That makes me suspect that the hash function isn't distributing keys well across the underlying vector, and that the slowdown is due to buckets growing long.
I'm not sure how to probe this, or how to fix it - any ideas? I suppose I could look for an alternative hash implementation, but I thought I would check with the list first to see whether anyone else can shed some light.
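One probe I could try, to separate the hash table's own behavior from the rest of my traversal: time synthetic inserts into a fresh EQ table in batches and see whether the per-batch time climbs on its own. Something like this (untested sketch):

    ;; Insert fresh conses as keys in batches of 100k and report how long
    ;; each batch takes; a steady climb would point at the table itself.
    (defun time-inserts (&optional (total 1500000) (batch 100000))
      (let ((table (make-hash-table :test 'eq :size 4000000)))
        (loop for start from 0 below total by batch
              do (let ((t0 (get-internal-real-time)))
                   (loop repeat batch
                         do (setf (gethash (cons nil nil) table) t))
                   (format t "~D entries: ~,2F sec for last ~D~%"
                           (hash-table-count table)
                           (/ (- (get-internal-real-time) t0)
                              internal-time-units-per-second)
                           batch)))))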
Thanks, Alan
Alan Ruttenberg <alanruttenberg@gmail.com> writes:
> Hi,
> Has anyone looked at this? I'm traversing a large structure and using an eq (tried eql too) hash table to record where I've been so that I don't get caught in circularities. There are about 1.5M nodes, but what I see is that adding entries to the hash table slows down (a lot) over time, even when I give an initial size of 4,000,000. That would be a maximum occupancy of under 40%, lower than the load factor at which any implementation would typically rehash.
> The table fills quickly at first, but the insertion rate keeps dropping: at 300k entries I'm getting about 2,500 new entries per second; by 600k entries it is down to 750 entries/sec; at 900k (where it stands now) it's about 500 entries/sec.
> That makes me suspect that the hash function isn't distributing keys well across the underlying vector, and that the slowdown is due to buckets growing long.
> I'm not sure how to probe this, or how to fix it - any ideas? I suppose I could look for an alternative hash implementation, but I thought I would check with the list first to see whether anyone else can shed some light.
I haven't actively pushed our CL:HASH-TABLE to such limits, but I would be very interested in including alternative implementations in ABCL if that solves your problem. [WeakHashTable][] provides an example of an alternate hash table structure that "looks" like a standard CL:HASH-TABLE to existing code.
[WeakHashTable]: http://lisp.not.org/trac/armedbear/browser/trunk/abcl/src/org/armedbear/lisp...
ABCL's [Java implementation of CL:HASH-TABLE][HashTable.java] uses a pre-allocated array of HashEntry objects for the underlying storage buckets, so unless there is some sort of memory pressure from GC, I am surprised by this behavior. Standard questions for your side: Can you comfortably fit 10^6 of your records in memory? Are you copying the structure when adding to the HASH-TABLE, or just inserting a reference? I might start investigating by enabling the -Xloggc switch (which logs GC status to a file with time stamps) to see whether the GC pauses get longer as you get deeper into your traversal.
[HashTable.java]: http://lisp.not.org/trac/armedbear/browser/trunk/abcl/src/org/armedbear/lisp...
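Assuming you launch ABCL in the usual way with "java -jar abcl.jar" (adjust for whatever launcher script you actually use), an invocation along these lines should produce a time-stamped GC log:

    java -verbose:gc -Xloggc:gc.log -jar abcl.jar

If the GC entries in gc.log get longer or more frequent as your table fills, memory pressure rather than bucket length would be the more likely culprit.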
A [ticket in the Trac][1] is always an easy place to record such a problem so it doesn't get lost (log in with OpenID if you need to).
[1]: http://lisp.not.org/trac/armedbear/report/1