Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Con: Common Lisp already uses the name "hash table", so it would be easier for existing Common Lisp programmers to see the analogy.
Advice? Thanks!
Thanks!
-- Dan
FWIW, I'd call 'em maps.
I think that would be more accurate and fit better with the rest of the programming culture in the large (whatever _that_ might be!).
-- Gary Warren King, metabang.com Cell: (413) 559 8738 Fax: (206) 338-4052 gwkkwg on Skype * garethsan on AIM * gwking on twitter
Daniel Weinreb dlw@google.com writes:
Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Con: Common Lisp already uses the name "hash table", so it would be easier for existing Common Lisp programmers to see the analogy.
Advice? Thanks!
Cesarum has an ADAPTATIVE-DICTIONARY abstraction that does that:
https://gitorious.org/com-informatimago/com-informatimago/blobs/master/commo...
It uses hash-tables when it's fast, and faster data structure when hash-tables are slow (ie. when they don't have enough elements).
Thank you for letting me know about this.
One of the things I feel strongly about, though, is avoiding names like dictionary-get. Fhash's package is the kind where you're not supposed to do a "use". You're supposed to get at the external symbols by using explicit package prefixes, so the dictionary- isn't there.
Anyway, I'll take a look at what you wrote. Thanks again.
-- Dan
On Mon, Jun 13, 2011 at 9:52 PM, Pascal J. Bourguignon < pjb@informatimago.com> wrote:
Daniel Weinreb dlw@google.com writes:
Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Con: Common Lisp already uses the name "hash table", so it would be easier for existing Common Lisp programmers to see the analogy.
Advice? Thanks!
Cesarum has an ADAPTATIVE-DICTIONARY abstraction that does that:
https://gitorious.org/com-informatimago/com-informatimago/blobs/master/commo...
It uses hash-tables when it's fast, and faster data structure when hash-tables are slow (ie. when they don't have enough elements).
-- __Pascal Bourguignon__ http://www.informatimago.com/ A bad day in () is better than a good day in {}.
pro mailing list pro@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
On Mon, Jun 13, 2011 at 12:18 PM, Daniel Weinreb dlw@google.com wrote:
Here are pros and cons of changing it that I can see. Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Yes it is! It's a hash table with one bucket.
But I prefer the term "map" anyway :-)
-- Scott
On Jun 13, 2011, at 21:18 , Daniel Weinreb wrote:
Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Actually, in Java the naming reflects the separation between abstraction and implementation. AFAIU, clojure does not quite do this and neither does Python.
I would advocate settling down on a Java-esque nomenclature with MAP or DICTIONARY as "names" for the abstraction and with different names for the implementations; e.g., DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name it....
Cheers -- MA
-- Marco Antoniotti, Associate Professor tel. +39 - 02 64 48 79 01 DISCo, Università Milano Bicocca U14 2043 http://bimib.disco.unimib.it Viale Sarca 336 I-20126 Milan (MI) ITALY
Please note that I am not checking my Spam-box anymore. Please do not forward this email without asking me first.
On Tue, Jun 14, 2011 at 9:37 AM, Marco Antoniotti antoniotti.marco@disco.unimib.it wrote:
On Jun 13, 2011, at 21:18 , Daniel Weinreb wrote:
Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Actually, in Java the naming reflects the separation between abstraction and implementation. AFAIU, clojure does not quite do this and neither does Python.
I would advocate settling down on a Java-esque nomenclature with MAP or DICTIONARY as "names" for the abstraction and with different names for the implementations; e.g., DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name it....
AFAIK, Java started just like Lisp with the Hashtable class (implementation without interface) which later was deprecated in favor of Map (interface) + HashMap, TreeMap, ... (implementations). There is already a somewhat official API for extensible sequences, designed by Christophe Rhodes and implemented in SBCL and ABCL (and maybe others, I don't know). We could similarly have extensible maps, even though "sequence" is an already recognized abstraction in CL, while "map" is not.
Alessio
Could you tell me where to find that? Thanks. -- Dan
On Tue, Jun 14, 2011 at 3:50 AM, Alessio Stalla alessiostalla@gmail.comwrote:
On Tue, Jun 14, 2011 at 9:37 AM, Marco Antoniotti antoniotti.marco@disco.unimib.it wrote:
On Jun 13, 2011, at 21:18 , Daniel Weinreb wrote:
Friends,
I wrote a little package for "fash hash tables", which provide an abstraction that is analogous to that of Common Lisp hash tables, but is faster for tables with few elements, and only slightly inferior for tables with many elements.
I did this because performance analysis showed that our system was spending too much time in hash table operations, and using the new package helped.
I have recently been cleaning this up, one reason being that I'd like to open source it. The function names used to be things like getfhash and mapfhash. Now they are like fhash:get and fhash:map-elements and so on.
However, before I open-source it, I was to make sure it's "right". It recently occurred to me that the package name "fhash" has problems.
Here are pros and cons of changing it that I can see.
Pro: I's not a hash table in the small-cardinality case; it's a linear lookup. So the name is not actually accurate.
Pro: Calling such a data structure a "hash table", even as Common Lisp does, is an abstraction violation. Whether it works by hashing is an implementation detail. The Java collection library calls this a Map. Python calls it a dictionary. Clojure calls it a map. Those are both better names.
Actually, in Java the naming reflects the separation between abstraction
and implementation. AFAIU, clojure does not quite do this and neither does Python.
I would advocate settling down on a Java-esque nomenclature with MAP or
DICTIONARY as "names" for the abstraction and with different names for the implementations; e.g., DICTIONARY-TREE, DICTIONARY-FHASH, DICTIONARY-HASH-TABLE, you name it....
AFAIK, Java started just like Lisp with the Hashtable class (implementation without interface) which later was deprecated in favor of Map (interface) + HashMap, TreeMap, ... (implementations). There is already a somewhat official API for extensible sequences, designed by Christophe Rhodes and implemented in SBCL and ABCL (and maybe others, I don't know). We could similarly have extensible maps, even though "sequence" is an already recognized abstraction in CL, while "map" is not.
Alessio
pro mailing list pro@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
On Tue, Jun 14, 2011 at 3:38 PM, Daniel Weinreb dlw@google.com wrote:
Could you tell me where to find that? Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C. Rhodes - can be found for example here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep1&type=pdf
I don't remember how well the paper describes SBCL's implementation; I think it's worth taking a look at it to see how it combines CLOS (for genericity) with regular functions special-cased on CL built-in types. ABCL's impl is almost a clone of SBCL's, with only minor adaptations.
I agree that classes in CLOS are overrated. CLOS is mainly about generic functions and it's a pity, imho, that GFs can only be specialized on classes. With minor changes CLOS could be more general.
Cheers, Alessio
Hi. I read Christophe's paper on extensible sequences. I don't think this bears on my new package, though, for two reasons:
(1) it's only about sequences; maps don't fit into its framework.
(2) He is proposing here a change that would have to be made to every Common Lisp implementation. As may have been apparent from other email I've sent, I am, sadly, pessimistic that we can really get all of the implementors to make changes in harmony. It's not that they are bad or incompetent or anything like that. It's just that they're busy people with other priorities. In some cases, the priorities include "putting food on the table" (in the metaphorical sense), i.e. it would be easier if someone could pay them to do this, but I don't see how that would happen.
Anyway, thanks for pointing me at this very interesting paper.
-- Dan
On Tue, Jun 14, 2011 at 10:13 AM, Alessio Stalla alessiostalla@gmail.comwrote:
On Tue, Jun 14, 2011 at 3:38 PM, Daniel Weinreb dlw@google.com wrote:
Could you tell me where to find that? Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C. Rhodes - can be found for example here: < http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep...
I don't remember how well the paper describes SBCL's implementation; I think it's worth taking a look at it to see how it combines CLOS (for genericity) with regular functions special-cased on CL built-in types. ABCL's impl is almost a clone of SBCL's, with only minor adaptations.
I agree that classes in CLOS are overrated. CLOS is mainly about generic functions and it's a pity, imho, that GFs can only be specialized on classes. With minor changes CLOS could be more general.
Cheers, Alessio
Please don't take this personal, but: Pessimism is infectious. It would be better if you would keep your pessimism for yourself. There is some amount of good enthusiasm still in this community, and it needs to be encouraged further, not discouraged!
If there is enough interest and enthusiasm, then things do move. I have had very positive experiences myself with the Closer to MOP project and ContextL, where many - almost all - Common Lisp vendors were very open to bug fixes and suggestions for improvements, to the extent that the overall support for MOP-based extensions has been considerably improved. Another more recent example is the adoption of ASDF 2.x, which is by now included as a default system definition facility in many Common Lisp implementations. There are more such examples.
Thanks, Pascal
On 29 Jun 2011, at 00:06, Daniel Weinreb wrote:
Hi. I read Christophe's paper on extensible sequences. I don't think this bears on my new package, though, for two reasons:
(1) it's only about sequences; maps don't fit into its framework.
(2) He is proposing here a change that would have to be made to every Common Lisp implementation. As may have been apparent from other email I've sent, I am, sadly, pessimistic that we can really get all of the implementors to make changes in harmony. It's not that they are bad or incompetent or anything like that. It's just that they're busy people with other priorities. In some cases, the priorities include "putting food on the table" (in the metaphorical sense), i.e. it would be easier if someone could pay them to do this, but I don't see how that would happen.
Anyway, thanks for pointing me at this very interesting paper.
-- Dan
On Tue, Jun 14, 2011 at 10:13 AM, Alessio Stalla alessiostalla@gmail.com wrote: On Tue, Jun 14, 2011 at 3:38 PM, Daniel Weinreb dlw@google.com wrote:
Could you tell me where to find that? Thanks. -- Dan
The paper - titled "User-extensible sequences in Common Lisp" by C. Rhodes - can be found for example here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.1604&rep=rep1&type=pdf
I don't remember how well the paper describes SBCL's implementation; I think it's worth taking a look at it to see how it combines CLOS (for genericity) with regular functions special-cased on CL built-in types. ABCL's impl is almost a clone of SBCL's, with only minor adaptations.
I agree that classes in CLOS are overrated. CLOS is mainly about generic functions and it's a pity, imho, that GFs can only be specialized on classes. With minor changes CLOS could be more general.
Cheers, Alessio
pro mailing list pro@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/pro
-- Pascal Costanza The views expressed in this email are my own, and not those of my employer.
On Wed, Jun 29, 2011 at 12:06 AM, Daniel Weinreb dlw@google.com wrote:
Hi. I read Christophe's paper on extensible sequences. I don't think this bears on my new package, though, for two reasons: (1) it's only about sequences; maps don't fit into its framework.
Yes, I was aware of that; I suggested the paper because it details what is, imho, an effort analogous to yours, and might give inspiration wrt. the general spirit, the API and the integration with the rest of CL.
(2) He is proposing here a change that would have to be made to every Common Lisp implementation.
Such a change would have to be made because CL already has a sequence abstraction, so it makes sense to extend it rather than to provide another abstraction with the same goals. Since there's no map abstraction in CL, a similar API for maps would not need any special support from implementations. Extra features like LOOP integration might be provided only on CL implementations with an extensible LOOP, or by using Iterate instead, or, shameless plug, doplus [1].
As may have been apparent from other email I've sent, I am, sadly, pessimistic that we can really get all of the implementors to make changes in harmony. It's not that they are bad or incompetent or anything like that. It's just that they're busy people with other priorities. In some cases, the priorities include "putting food on the table" (in the metaphorical sense), i.e. it would be easier if someone could pay them to do this, but I don't see how that would happen. Anyway, thanks for pointing me at this very interesting paper.
I'm with Pascal on pessimism: I'm sure all sane Lisp implementers will add any feature that is reasonably easy to implement and is demanded by sufficiently many users. Fixes to the MOP generally satisfy both these rules. Extensible sequences do not, yet, at least because most Lispers don't know about them or don't find them sufficiently useful to bug their vendors about them. In my personal experience with ABCL, where dealing with Java libraries is fairly common, having Java Lists be natively understood as CL sequences is valuable. I imagine that if, e.g., FSet would get more users, having it integrated with sequences would be appealing (even if it would open another can of worms since the CL sequence API assumes mutability - a design mistake, imho).
[1] http://code.google.com/p/tapulli/wiki/doplus
Alessio
Just for the record, I think extensible sequences would be really great! -- Dan
On Wed, Jun 29, 2011 at 5:24 PM, Alessio Stalla alessiostalla@gmail.comwrote:
On Wed, Jun 29, 2011 at 12:06 AM, Daniel Weinreb dlw@google.com wrote:
Hi. I read Christophe's paper on extensible sequences. I don't think this bears on my new package, though, for two reasons: (1) it's only about sequences; maps don't fit into its framework.
Yes, I was aware of that; I suggested the paper because it details what is, imho, an effort analogous to yours, and might give inspiration wrt. the general spirit, the API and the integration with the rest of CL.
(2) He is proposing here a change that would have to be made to every Common Lisp implementation.
Such a change would have to be made because CL already has a sequence abstraction, so it makes sense to extend it rather than to provide another abstraction with the same goals. Since there's no map abstraction in CL, a similar API for maps would not need any special support from implementations. Extra features like LOOP integration might be provided only on CL implementations with an extensible LOOP, or by using Iterate instead, or, shameless plug, doplus [1].
As may have been apparent from other email I've sent, I am, sadly, pessimistic that we can really get all of the implementors to make changes in harmony. It's not that they are bad or incompetent or anything like that. It's just that they're busy people with other priorities. In some cases, the priorities include "putting food on the table" (in the metaphorical sense), i.e. it would be easier if someone could pay them to do this, but I don't see how that would happen. Anyway, thanks for pointing me at this very interesting paper.
I'm with Pascal on pessimism: I'm sure all sane Lisp implementers will add any feature that is reasonably easy to implement and is demanded by sufficiently many users. Fixes to the MOP generally satisfy both these rules. Extensible sequences do not, yet, at least because most Lispers don't know about them or don't find them sufficiently useful to bug their vendors about them. In my personal experience with ABCL, where dealing with Java libraries is fairly common, having Java Lists be natively understood as CL sequences is valuable. I imagine that if, e.g., FSet would get more users, having it integrated with sequences would be appealing (even if it would open another can of worms since the CL sequence API assumes mutability - a design mistake, imho).
[1] http://code.google.com/p/tapulli/wiki/doplus
Alessio
- Daniel Weinreb qyj@tbbtyr.pbz [2011-06-30 18:12:02 -0400]:
Just for the record, I think extensible sequences would be really great!
Just for the record, CLISP has had extensible sequences since forever (far predating my involvement). See clisp/src/defseq.lisp which defines the standard sequences (vectors, lists et al); the infrastructure is in C file clisp/src/sequence.d:
/* O(seq_types) contains a list of type descriptors for sequences. These are simple-vectors of length 16, containing: SEQ-TYPE ; the type of the sequence, usually a symbol Access functions: SEQ-INIT SEQ-UPD SEQ-ENDTEST SEQ-FE-INIT SEQ-FE-UPD SEQ-FE-ENDTEST SEQ-ACCESS SEQ-ACCESS-SET SEQ-COPY SEQ-LENGTH SEQ-MAKE SEQ-ELT SEQ-SET-ELT SEQ-INIT-START SEQ-FE-INIT-END
Explanation of the functions SEQ-XXX:
A "pointer" is something, that can step through a sequence. There are pointers, that move from left to right; they are created with INIT or INIT-START, copied with COPY, UPD to advance one step, ENDTEST for testing, if they have reached the end of the Sequence, ACCESS for fetching the element, which is pointed to by the pointer, ACCESS-SET for setting the element, which is pointed to by the pointer. There are also pointers, that move from right to left; they are created with FE-INIT or FE-INIT-END, copied with COPY, FE-UPD for moving them one step to the left, FE-ENDTEST for testing, if they have reached the end of the Sequence, ACCESS for fetching the element, which is pointed to by the pointer. For them, ACCESS-SET does not work.
Movement operations: INIT (lambda (seq) ...) -> pointer returns the leftmost pointer of SEQ. UPD (lambda (seq pointer) ...) -> pointer returns a pointer to the adjacent neighbor at the right. SEQ-UPD can assume, that the right border of SEQ is not stepped over. ENDTEST (lambda (seq pointer) ...) -> bool tests, if this pointer is at the right end of SEQ. The same "FROM END" : FE-INIT (lambda (seq) ...) -> pointer returns the rightmost pointer of SEQ. FE-UPD (lambda (seq pointer) ...) -> pointer returns a pointer to the adjacent neighbor at the left. SEQ-FE-UPD can assume, that the left border of SEQ is not stepped over. FE-ENDTEST (lambda (seq pointer) ...) -> bool tests, if this pointer is at the left end of SEQ. Access via pointer: ACCESS (lambda (seq pointer) ...) -> value returns the element in SEQ the pointer is pointing to. ACCESS-SET (lambda (seq pointer value) ...) -> sets the element where the pointer is pointing to in SEQ, to the specified value. Works only for pointers that move from left to right! COPY (lambda (pointer) ...) -> pointer returns a copy of the Pointer to SEQ (because UPD and FE-UPD can operate destructively on the pointers) Total length: LENGTH (lambda (seq) ...) -> size returns the (active) length of the Sequence SEQ. MAKE (lambda (size) ...) -> sequence returns a newly allocated, empty sequence, that has the type SEQ-TYPE and the specified length. Access via index (usually more inefficient than via pointer): ELT (lambda (seq index) ...) -> value returns (ELT SEQ index) SET-ELT (lambda (seq index value) ...) -> sets (ELT SEQ index) to value. INIT-START (lambda (seq index) ...) -> pointer returns a pointer which moves in SEQ from left to right from Position index. Must execute the Range-test by itself. FE-INIT-END (lambda (seq index) ...) -> pointer returns a pointer which moves in SEQ from right to left from Position index. Must execute the Range-test by itself. */
of course, the presence of index accessors prevents things like hash tables, trees and maps from being considered sequences...