Hi,
Quoting Ivan Shvedunov (ivan4th@gmail.com):
> Dear colleagues, I've pushed the changes I've accumulated over the last week to my github repository (https://github.com/ivan4th/commonqt/). Please review them and let me know whether I should push some of them to the main repo.
Thanks. I have gone ahead and pushed f5c8f374 to the main repo; this includes everything except the caching changes. (Please speak up if that partial push leaves anything broken.)
Here are my thoughts on the caching:
There are two code paths to consider: the fast path, which hits the cache, and the slow path, which goes through full dispatch. But the latter currently includes situations that do not use the cache at all, even though they should, or that are likely to exceed the cache size.
Possible areas of improvement include:
a. Make the fast path even faster. This is what I had been working on (before I took a pause with CommonQt development).
There is certainly still room for improvement; depending on the number of arguments, perhaps somewhere between a 3x and a 10x speedup on SBCL. Clozure is perhaps 5x slower than SBCL, which we might hopefully improve on.
b. Make the slow path faster. In principle a worthy goal:
Although only some code will truly benefit from this, the potential speedup for affected code is tremendous, because the slow path is currently extremely slow. I don't have numbers, but back when CommonQt always went through the slow path, it was basically unusable on Clozure.
c. Fix the cases that do not benefit from the cache at all yet: there are many such cases at the moment.
Your marshalling changes fall under "making the slow path faster". I appreciate this goal in principle, but I am somewhat reluctant to push those changes just yet, for the following reasons:
1. Benchmarks. Let's write benchmarks for the slow path before trying to improve it. (My existing micro-benchmarks are only for the fast path.)
2. Thread safety and use of hash tables. CommonQt is meant to be thread-safe for ordinary operations like method invocations. I know that many Lisps support thread-safe hash tables, but I think it would be nice if we could build caching without having to depend on somewhat unportable features for these data structures.
The existing caching code reimplements hash tables and takes care to atomically replace single elements of a vector for this reason.
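To illustrate the idea, here is a minimal toy sketch (not the actual CommonQt internals), assuming keys that SXHASH/EQUAL handle sensibly and assuming that storing into a single SVREF slot is atomic with respect to readers on the implementations we care about:

  ;; Toy fixed-size cache: entries are freshly consed (key . value)
  ;; pairs, and an update replaces a whole vector slot rather than
  ;; mutating an entry in place, so a concurrent reader sees either
  ;; the old cons, the new cons, or NIL -- never a half-written entry.
  (defconstant +cache-size+ 512)

  (defun make-cache ()
    (make-array +cache-size+ :initial-element nil))

  (defun cache-lookup (cache key)
    (let ((entry (svref cache (mod (sxhash key) +cache-size+))))
      (when (and entry (equal (car entry) key))
        (cdr entry))))

  (defun cache-store (cache key value)
    (setf (svref cache (mod (sxhash key) +cache-size+))
          (cons key value))
    value)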
(Why is thread safety needed at all? Qt only has one GUI thread, but additional Lisp threads appear as Qt threads and are useful for all non-GUI code, e.g. Qt I/O multiplexing. BTW, note that startup-related code that initializes smoke modules is an exception: Initialization is not thread safe at this point. Users are expected to do this only once.)
3. Duplicate caching: Method lookup would have a cache, which calls marshalling functions on cache misses, and those would also have a (different) cache. Can't we have one cache that covers both? Perhaps this is fundamentally the same argument as point 4:
4. Finally, a thought that is not an objection, but merely a question of priorities:
Most code should hit the fast path anyway, and I consider it a bug that %METHOD-INVOCATION-CALLBACK doesn't do so.
So I'd like to work on c. in preference to b. for the moment. The analysis/plan is basically this:
- The current cache is a "call site"-dependent cache. I used this approach because it is the easiest. It's not a bad idea in principle, because it needs "hash table" entries (in the generic sense, not the cl:hash-table sense) only for those argument signatures actually encountered for a specific call site.
- It has a big downside, however: it performs badly for functions that use different methods and argument signatures every time. In general, if instead of

    (defun yourfun () (#_yourcall))

  you write

    (defun yourfun () (helperfunction))
    (defun helperfunction () (#_yourcall))

  the cache moves from yourfun to helperfunction. If there are other callers of helperfunction, they suddenly share the same cache. (A sketch of the call-site mechanism follows below.)
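For illustration only, here is roughly what "call-site-dependent" means mechanically; CACHED-CALL, SIGNATURE-OF, RESOLVE-METHOD and INVOKE-RESOLVED are made-up names, not the real #_ expansion. Reusing the toy cache from above:

  ;; Each macroexpansion gets its own cache cell via LOAD-TIME-VALUE,
  ;; i.e. one cache per textual call site.
  (defmacro cached-call (method-name instance &rest args)
    `(let* ((cache (load-time-value (make-cache)))
            (arglist (list ,@args))
            (key (signature-of ,instance arglist))          ; hypothetical
            (entry (or (cache-lookup cache key)
                       (cache-store cache key
                                    (resolve-method ,method-name key))))) ; slow path
       (invoke-resolved entry ,instance arglist)))          ; hypothetical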
The solution is to move away from call-site-specific caches and towards class-specific and method(-name)-specific caches. Basically, we would make a pre-allocated Lisp array of cache data structures (probably at smoke module loading time).
One simple model for the run-time dispatch would be to use the class id as an index into that array, and compute a hash key out of the method name and signature.
Another similar approach (perhaps closer to the current code) is to index on the method name first, and then hash on class and argument signature.
(But note that we need to cache the step that computes the method id, so we cannot use that as a hash key.)
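To make the first model concrete, here is a rough sketch along those lines, again reusing the toy cache from above. All names are hypothetical; RESOLVER stands for the slow path, and its result (method id plus marshaller, say) is the cached value rather than part of the key, per the note above:

  (defvar *class-caches* nil
    "One cache per smoke class id, pre-allocated at module load time.")

  (defun init-class-caches (class-count)
    (setf *class-caches* (make-array class-count))
    (dotimes (i class-count)
      (setf (svref *class-caches* i) (make-cache))))

  (defun lookup-method (class-id method-name signature resolver)
    (let ((cache (svref *class-caches* class-id))
          (key (cons method-name signature)))
      (or (cache-lookup cache key)
          (cache-store cache key (funcall resolver)))))

The second model would just invert the indexing: an array indexed by (interned) method name, hashing on class id and argument signature within each entry.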
The second approach is very similar to how CLOS is usually implemented, I think. Our problem is very close to CLOS in any case, and rather unlike vtable dispatch in normal OO languages. Hopefully our code can be much simpler than CLOS, though; e.g. because we can work out in advance how large the cache tables need to be for the usual smoke modules, so that we don't need to enlarge and re-hash things.
[...]
> - UNMARSHALLER now caches unmarshalled closures. This seems to improve performance a bit in %METHOD-INVOCATION-CALLBACK, as it doesn't cache anything itself.

Side note: The specific code in %METHOD-INVOCATION-CALLBACK is obviously distinct from the code for ordinary method dispatch, but I think it could use the same global cache tables.

> Also, I've got rid of the redundant NONCONST-NAME function I introduced previously. I did some tests on QListView code with a custom model using the SBCL statistical profiler, and it seems that all this indeed helps (I will later distill that test code into an automatic benchmark).
Thank you, a benchmark would be most helpful.
d.