AFAICT, the CLIM spec doesn't say anything about what are the acceptable values for the DUPLICATE-TEST argument to FORMAT-GRAPH-FROM-ROOTS (although it does specify that the default value is EQL). This seems very unfortunate. If we were to be limited to the conventional equality test values, then we would be able to use a hash-table to store the graph nodes. However, if the spec permits arbitrary functions of two arguments, that seems to make use of hash-tables impossible. Am I overlooking something? Is this a deficiency in the spec?
thanks, R
On Monday, July 18, 2005, at 08:13 pm, rpgoldman@real-time.com wrote:
AFAICT, the CLIM spec doesn't say anything about what are the acceptable values for the DUPLICATE-TEST argument to FORMAT-GRAPH-FROM-ROOTS (although it does specify that the default value is EQL). This seems very unfortunate. If we were to be limited
It says that it is a '...function of two arguments that is used to compare two objects...' I think any function of two arguments that can be used to compare two objects should be ok.
to the conventional equality test values, then we would be able to use a hash-table to store the graph nodes. However, if the spec permits arbitrary functions of two arguments, that seems to make use of hash-tables impossible. Am I overlooking something? Is this a deficiency in the spec?
I don't understand what the benefit of using hashtables to store the nodes is. That said, it seems to me that whilst the spec permits arbitrary functions of two arguments, any specific node type will need a particular DUPLICATE-TEST function... or am I missing the point of the question totally?
-Duncan
thanks, R _______________________________________________ mcclim-devel mailing list mcclim-devel@common-lisp.net http://common-lisp.net/cgi-bin/mailman/listinfo/mcclim-devel
Duncan Rose wrote:
On Monday, July 18, 2005, at 08:13 pm, rpgoldman@real-time.com wrote:
AFAICT, the CLIM spec doesn't say anything about what are the acceptable values for the DUPLICATE-TEST argument to FORMAT-GRAPH-FROM-ROOTS (although it does specify that the default value is EQL). This seems very unfortunate. If we were to be limited
It says that it is a '...function of two arguments that is used to compare two objects...' I think any function of two arguments that can be used to compare two objects should be ok.
to the conventional equality test values, then we would be able to use a hash-table to store the graph nodes. However, if the spec permits arbitrary functions of two arguments, that seems to make use of hash-tables impossible. Am I overlooking something? Is this a deficiency in the spec?
I don't understand what the benefit of using hashtables to store the nodes is. That said, it seems to me that whilst the spec permits arbitrary functions of two arguments, any specific node type will need a particular DUPLICATE-TEST function... or am I missing the point of the question totally?
Given the fact that there might be a very large number of nodes, and that we have to check for duplicates, a hash-table providing constant time lookup seems pretty desirable, so one could do
(make-hash-table :test duplicate-test) (gethash (funcall duplicate-key node) hash-table)
to find duplicates.
Perhaps /I'm/ missing the point --- do you have a different vision of how the duplicate detection could be performed?
I haven't thought long and hard about it, but in the worst case, given an arbitrary duplicate-test function, isn't it the case that we can't do better than linear search to check for duplicates? I.e., we have no idea how to thin the field of possible match candidates, so we can't do better than check EVERY existing graph node to see if a newly-added one matches. That seems unacceptable. Or am I overlooking somethign?
Note also that since you have the freedom to define the DUPLICATE-KEY funciton yourself, restricting the DUPLICATE-TEST values to one of the hash-table-acceptable alternatives doesn't seem like a profound sacrifice.
Can anyone think of a case where it would be a substantial sacrifice?
Best,
Robert
P.S. This discussion has reminded me to add an annotation to the CLIM spec to this effect.
On Thursday, July 21, 2005, at 06:55 pm, Robert P. Goldman wrote:
Duncan Rose wrote:
On Monday, July 18, 2005, at 08:13 pm, rpgoldman@real-time.com wrote:
AFAICT, the CLIM spec doesn't say anything about what are the acceptable values for the DUPLICATE-TEST argument to FORMAT-GRAPH-FROM-ROOTS (although it does specify that the default value is EQL). This seems very unfortunate. If we were to be limited
It says that it is a '...function of two arguments that is used to compare two objects...' I think any function of two arguments that can be used to compare two objects should be ok.
to the conventional equality test values, then we would be able to use a hash-table to store the graph nodes. However, if the spec permits arbitrary functions of two arguments, that seems to make use of hash-tables impossible. Am I overlooking something? Is this a deficiency in the spec?
I don't understand what the benefit of using hashtables to store the nodes is. That said, it seems to me that whilst the spec permits arbitrary functions of two arguments, any specific node type will need a particular DUPLICATE-TEST function... or am I missing the point of the question totally?
Given the fact that there might be a very large number of nodes, and that we have to check for duplicates, a hash-table providing constant time lookup seems pretty desirable, so one could do
(make-hash-table :test duplicate-test) (gethash (funcall duplicate-key node) hash-table)
to find duplicates.
Perhaps /I'm/ missing the point --- do you have a different vision of how the duplicate detection could be performed?
I'm with you now; I agree that duplicate node checking would be easier and likely quicker using hashtables. I suspect that the original intention was that the graph be traversed to find duplicates; as you say, a linear search (perhaps something better could be achieved for ordered trees, but I'm not sure that the function arguments allow such possibilities).
I haven't thought long and hard about it, but in the worst case, given an arbitrary duplicate-test function, isn't it the case that we can't do better than linear search to check for duplicates? I.e., we have no idea how to thin the field of possible match candidates, so we can't do better than check EVERY existing graph node to see if a newly-added one matches. That seems unacceptable. Or am I overlooking somethign?
I think that the original intention was not to provide a built in function for drawing arbitrarily complex graphs of many nodes. Perhaps it was (again, it would be nice if one of the CLIM designers were subscribed to this list to answer such questions ;-). In the CLIM mailing list archive (ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/gui/ clim/mail/) a lot of these kinds of questions seem to me to have been answered with 'if the supplied functionality doesn't meet your specific requirement, write something that does'. I think this may be one of those cases where the generalised CLIM functionality fails under specific usage, that either was not anticipated (in which case there's an argument for changing FORMAT-GRAPH-FROM-ROOTS) or that was anticipated and purposefully left unsupported - of course, we are only left to guess at which of these is correct.
Note also that since you have the freedom to define the DUPLICATE-KEY funciton yourself, restricting the DUPLICATE-TEST values to one of the hash-table-acceptable alternatives doesn't seem like a profound sacrifice.
Can anyone think of a case where it would be a substantial sacrifice?
In as far as it departs from the specification, it's a sacrifice I think (though probably not a substantial sacrifice). To retread old ground, I would personally still prefer to see these kinds of changes implemented outside the main codebase of McCLIM (or at least, external to the implementation of the functionality that is present in the specification). At least that way, somebody could take the McCLIM 'extension' and drop it into LW or ACL, and have their application work. That said, it's not like the vendors (especially Franz) have adhered to the spec anyway (Franz likes to reorder argument lists which kind of makes more sense, and improves consistency I guess, but does (IMO) damage portability. I can only conclude that portability between CLIM implementations wasn't Franz' main concern with regards to CLIM. Is it ours?) so maybe this is an irrelevance.
In summary: I think your argument has merit, I just feel that changing FORMAT-GRAPH-FROM-ROOTS isn't the place to rectify the problem(s).
Without further user feedback I fear it will be difficult to resolve this issue to everyones satisfaction.
-Duncan
Best,
Robert
P.S. This discussion has reminded me to add an annotation to the CLIM spec to this effect.
"DR" == Duncan Rose duncan@robotcat.demon.co.uk writes:
DR> On Thursday, July 21, 2005, at 06:55 pm, Robert P. Goldman wrote:
>> Duncan Rose wrote: >> >>> On Monday, July 18, 2005, at 08:13 pm, rpgoldman@real-time.com wrote: >>>> >>>> AFAICT, the CLIM spec doesn't say anything about what are the >>>> acceptable values for the DUPLICATE-TEST argument to >>>> FORMAT-GRAPH-FROM-ROOTS (although it does specify that the default >>>> value is EQL). This seems very unfortunate. If we were to be >>>> limited >>> It says that it is a '...function of two arguments that is used to >>> compare two objects...' >>> I think any function of two arguments that can be used to compare two >>> objects should be ok. >>>> to the conventional equality test values, then we would be able to >>>> use >>>> a hash-table to store the graph nodes. However, if the spec permits >>>> arbitrary functions of two arguments, that seems to make use of >>>> hash-tables impossible. Am I overlooking something? Is this a >>>> deficiency in the spec? >>> I don't understand what the benefit of using hashtables to store the >>> nodes is. That said, it seems to me that whilst the spec permits >>> arbitrary functions of two arguments, any specific node type will >>> need a particular DUPLICATE-TEST function... or am I missing the >>> point of the question totally? >> >> Given the fact that there might be a very large number of nodes, and >> that we have to check for duplicates, a hash-table providing constant >> time lookup seems pretty desirable, so one could do >> >> (make-hash-table :test duplicate-test) >> (gethash (funcall duplicate-key node) hash-table) >> >> to find duplicates. >> >> Perhaps /I'm/ missing the point --- do you have a different vision of >> how the duplicate detection could be performed? >>
DR> I'm with you now; I agree that duplicate node checking would DR> be easier and likely quicker using hashtables. I suspect that DR> the original intention was that the graph be traversed to find DR> duplicates; as you say, a linear search (perhaps something DR> better could be achieved for ordered trees, but I'm not sure DR> that the function arguments allow such possibilities).
I think I should provide an implementation that is based on the use of hash-tables, checking for the argument being EQ, EQL, EQUAL or EQUALP. I can have a mixin class providing the hash table and a find-previous-node generic function that does the lookup, and a linear search implementation of the method for the default case.
[...snip...]
DR> I think that the original intention was not to provide a built in DR> function for drawing arbitrarily complex graphs of many nodes. Perhaps DR> it was (again, it would be nice if one of the CLIM designers were DR> subscribed to this list to answer such questions ;-). In the CLIM DR> mailing list archive DR> (ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/gui/ DR> clim/mail/) a lot of these kinds of questions seem to me to have been DR> answered with 'if the supplied functionality doesn't meet your specific DR> requirement, write something that does'. I think this may be one of DR> those cases where the generalised CLIM functionality fails under DR> specific usage, that either was not anticipated (in which case there's DR> an argument for changing FORMAT-GRAPH-FROM-ROOTS) or that was DR> anticipated and purposefully left unsupported - of course, we are only DR> left to guess at which of these is correct.
This is a good point. It's clear, for example, that format-graph-from-roots is insufficient for a graph *editor*, since it doesn't provide support for being able to move the graph nodes, etc. So there's a case where one is clearly moving beyond format-graph-from-roots. But my case (simple dag display) seems acceptable.
>> Note also that since you have the freedom to define the DUPLICATE-KEY >> funciton yourself, restricting the DUPLICATE-TEST values to one of the >> hash-table-acceptable alternatives doesn't seem like a profound >> sacrifice. >> >> Can anyone think of a case where it would be a substantial sacrifice?
DR> In as far as it departs from the specification, it's a DR> sacrifice I think (though probably not a substantial DR> sacrifice). To retread old ground, I would personally still DR> prefer to see these kinds of changes implemented outside the DR> main codebase of McCLIM (or at least, external to the DR> implementation of the functionality that is present in the DR> specification). At least that way, somebody could take the DR> McCLIM 'extension' and drop it into LW or ACL, and have their DR> application work.
This seems like a good idea, but I'm skeptical that it will work in practice. AFAICT, the documented graph-formatting interface is simply not sufficient to allow someone to create a portable extension. I am certainly finding that if I want to build onto what's there, I need to (or at least, it's a lot easier to) do it by using stuff that is not part of the specified API. For example, if one wishes to add slots to graph-output-record types, one really must take advantage of the fact that format-graph-from-roots in McCLIM passes extra arguments to the graph-output-record constructor, which is not (unless I missed something) part of the graph-formatting protocol. Similarly, the CLIM spec does not allow for additional keyword arguments to format-graph-from-roots, which IMHO would make the avowed intention of providing an extensible API well-nigh impossible. I take advantage of the fact that McCLIM has format-graph-from-roots with &allow-other-keys.
DR> That said, it's not like the vendors (especially Franz) have DR> adhered to the spec anyway (Franz likes to reorder argument DR> lists which kind of makes more sense, and improves consistency DR> I guess, but does (IMO) damage portability. I can only DR> conclude that portability between CLIM implementations wasn't DR> Franz' main concern with regards to CLIM. Is it ours?) so DR> maybe this is an irrelevance.
Since I don't expect ever to be able to afford a CLIM license from a vendor, this is not a huge concern of mine! That's not intended to be a slam --- CLIM seems horribly un-economical from the perspective of a Lisp vendor, combining the bad features of hugeness and expense to maintain with a vanishingly small user base.
DR> In summary: I think your argument has merit, I just feel that DR> changing FORMAT-GRAPH-FROM-ROOTS isn't the place to rectify DR> the problem(s).
Granted. I think that the hash-table can be supported using CLOS specialization w/o violating the spec (modulo &allow-other-keys, which seems mandatory and, in any case, isn't MY violation!), and if it can be, should be.
I hope to provide a better patch RSN.
OPEN QUESTION: The HASH-TABLE slot in standard-graph-output-record is not documented as to purpose. However, it seems to be used for duplicate-handling in GENERATE-GRAPH-NODES. On that basis, I'm inclined to remove it, move it into the mixin I've proposed, and provide something that's actually MORE standard-compliant than the current version.
OPEN QUESTION 2: Would people who are using format-graph-from-roots be willing to send me a test case or test cases so that I can make sure any new version I create doesn't bust things?
Thanks, Robert
There is one problem with the strategy I've proposed of using mixins to handle different possible arguments to format-graph-from-roots:
1. Recall the proposed strategy:
a. add a mixin class that would be something like HASHABLE-GRAPH.
b. move the HASH-TABLE slot from STANDARD-GRAPH-OUTPUT-RECORD to HASHABLE-GRAPH. Add a NODE-LIST slot to STANDARD-GRAPH-OUTPUT-RECORD.
c. Add a FIND-DUPLICATE-GRAPH-NODE generic function that would do linear search on NODE-LIST by default, but that would do a hash lookup if the graph-output-record were of type HASHABLE-GRAPH.
d. Given a call to FORMAT-GRAPH-FROM-ROOTS, if DUPLICATE-TEST were one of the acceptable values (EQ, EQL, EQUAL or EQUALP), then make the hash-table, else use NODE-LIST.
2. Problem: as far as I can tell, this would make (if you pardon the pun) a hash of the existing McCLIM graph-type declaration process. Recall that when one defines a new graph-type, one needs to call DEFINE-GRAPH-TYPE to create a mapping from graph-type name (by convention, it seems, a keyword symbol, but could be simply a symbol) to graph-output-record subclass. But this approach isn't really compatible with the approach I've outlined in #1, above, since we'd need to be creating (at least) TWO new classes, one with the HASHABLE-GRAPH mixin, and one without, and the person defining the graph-output record would have to know about this difference.
3. Possible work-arounds:
a. Avoid using CLOS and method dispatch to solve this problem. Instead, create a DUPLICATE-GRAPH-NODE function that is *not* methodified, but that simply checks the value of duplicate-test, and either uses a hash-table or not, as appropriate. Somehow this bothers me, but not as much as:
b. Do something scary involving CHANGE-CLASS.
The route of least resistance is clearly:
c. Go ahead and violate the CLIM spec, and restrict DUPLICATE-TEST to be one of EQ, EQL, EQUAL or EQUALP. Note that *this is the status quo* --- the existing McCLIM code already makes a graph node hash table and initializes it using DUPLICATE-TEST, so I will just be continuing in the current line, not adding any new spec violations. Actually, note that if we don't want to choose this alternative, I actually need to fix existing code --- it's not enough to just add code, I actually need to fix the existing graph layout code
If I had this to design ab initio (not that I would dare to design a graphic interface spec myself), I would just do 3(c), and make the spec limit DUPLICATE-TEST to one of the hash-able values, which seems far better.
Anyone have a vendor CLIM license and know if they support arbitrary DUPLICATE-TEST values? Reading the Franz user's guide suggests that they do.
Unless I get a strong vote for 3(c), I will follow 3(a), and try to have a patch RSN.
BTW, shouldn't McCLIM cough up a hairball if you try to use format-graph-from-roots with :merge-duplicates t and :graph-type :tree?
Cheers, Robert