On 08/17/2015 11:09 PM, Daniel Kochmański wrote:
> Extending ECL or as an implementation-specific package for it might be
> another option, but again, this is beyond my knowledge of its internals.
> Could you elaborate a little on this?
As a conceptual design, without having examined ECL's internal docs or source-- a very big disclaimer, indeed-- here is one approach, considered primarily from a C & Unix perspective:
1. Since my criteria revolve around cheap & efficient workers and cheating garbage collection upon worker exit, I'd build the memory management system on mmap() and munmap(), one mapping per worker heap. All memory consed within the context of a worker must be constrained to exist only within the large blocks of memory from its private mmap()'d areas, so that worker exit is just unmapping. A sketch follows.
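A minimal sketch of such a per-worker arena, assuming CFFI is loaded; none of this is existing ECL machinery. The constants are the Linux values (MAP_ANONYMOUS in particular differs on the BSDs), and :unsigned-long / :long stand in for size_t / off_t on an LP64 platform.

;;; Per-worker heap arena via mmap(2)/munmap(2) -- assumptions as noted above.
(defconstant +prot-read+     #x1)
(defconstant +prot-write+    #x2)
(defconstant +map-private+   #x02)
(defconstant +map-anonymous+ #x20)   ; Linux value; #x1000 on the BSDs

(cffi:defcfun ("mmap" %mmap) :pointer
  (addr :pointer) (length :unsigned-long) (prot :int)
  (flags :int) (fd :int) (offset :long))

(cffi:defcfun ("munmap" %munmap) :int
  (addr :pointer) (length :unsigned-long))

(defstruct worker-heap base size)

(defun make-worker-heap (size)
  "Map a private anonymous region to act as one worker's private heap."
  (let ((base (%mmap (cffi:null-pointer) size
                     (logior +prot-read+ +prot-write+)
                     (logior +map-private+ +map-anonymous+)
                     -1 0)))
    ;; MAP_FAILED is (void *)-1, i.e. an all-ones pointer for the word size.
    (when (= (cffi:pointer-address base)
             (1- (ash 1 (* 8 (cffi:foreign-type-size :pointer)))))
      (error "mmap failed for a ~D-byte worker heap" size))
    (make-worker-heap :base base :size size)))

(defun destroy-worker-heap (heap)
  "Worker exit: hand the whole heap back to the OS in one munmap, no GC sweep."
  (%munmap (worker-heap-base heap) (worker-heap-size heap)))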
2. Regardless of whether it's ECL or another Lisp system, introduce a form such as WITH-WORKER that could combine Erlang's spawn() parameters with a BODY maintained as a list whose elements the runtime would time-slice between. WITH-WORKER could vaguely resemble WITH-OPEN-FILE, so let's leave it at that for now. More below.
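To make the shape of that form concrete, one possible expansion is sketched here. SPAWN-WORKER is hypothetical (it stands for whatever registers the worker with a scheduler), MAKE-WORKER-HEAP is the arena allocator from point 1, and the :affinity hint anticipates point 4; none of these are existing ECL functions.

;;; Hypothetical WITH-WORKER surface syntax.  Each form in BODY becomes one
;;; "step" thunk, so a scheduler can interleave steps across workers.
(defmacro with-worker ((name &key (heap-size (* 1024 1024)) affinity) &body body)
  `(spawn-worker
    :name     ',name
    :heap     (make-worker-heap ,heap-size)   ; per-worker arena from point 1
    :affinity ,affinity                       ; hint only; ignored on first cut
    :steps    (list ,@(mapcar (lambda (form) `(lambda () ,form)) body))))

;; Usage, vaguely in the spirit of WITH-OPEN-FILE:
;; (with-worker (echo :heap-size (* 4 1024 1024))
;;   (print :step-one)
;;   (print :step-two))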
3. Where Erlang/BEAM has the concept of "reductions" as the means to introduce and enforce some notion of fairness across workers, perhaps the Lispy version *doesn't* try to protect programmers from hurting themselves. Iterating over the BODY as a list would translate into round-robin interleaving across all WITH-WORKER instances on the same CPU core-- that is, on the same scheduler.
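A minimal sketch of that round-robin interleaving, under the same assumptions: each worker is just a name plus the list of step thunks its WITH-WORKER body produced, and a single scheduler cycles over all of them, one step each per pass. No reduction counting, no preemption.

(defun run-scheduler (workers)
  "WORKERS is a list of (name . steps), STEPS being a list of thunks.
Each pass runs one step of every live worker; a worker whose steps are
exhausted simply drops out -- its heap would be munmap'd at that point."
  (loop while workers
        do (setf workers
                 (loop for (name . steps) in workers
                       do (funcall (first steps))
                       when (rest steps)
                         collect (cons name (rest steps))))))

;; Example: two workers interleaved on one scheduler.
;; (run-scheduler
;;   (list (cons 'a (list (lambda () (print :a1)) (lambda () (print :a2))))
;;         (cons 'b (list (lambda () (print :b1))))))
;; prints :A1 :B1 :A2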
4. As with Erlang/BEAM: introduce one scheduler per CPU core. Worker migration to re-balance load per core is a pain point, but avoid anything too fancy here. Start with random-but-even distribution and random-but-even re-balancing.
(But beware: attempting to localize workers based upon message-passing communication patterns adds too much complexity to get correct for the "average" use case. Instead, similarly to thread "colours", just specify affinity hints as optional parameters to WITH-WORKER, which should be *ignored* on first cut.)
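As an illustration of "random-but-even" placement (a sketch under the assumptions above, not ECL API): shuffle the workers, then deal them round-robin onto one run queue per core. Affinity hints are deliberately not consulted, per the first-cut caveat.

(defun shuffle (list)
  "Fisher-Yates shuffle of a fresh copy of LIST."
  (let ((vec (coerce list 'vector)))
    (loop for i from (1- (length vec)) downto 1
          do (rotatef (aref vec i) (aref vec (random (1+ i)))))
    (coerce vec 'list)))

(defun distribute-workers (workers ncores)
  "Return a vector of NCORES run queues with WORKERS dealt out randomly
but evenly (queue lengths differ by at most one)."
  (let ((runqueues (make-array ncores :initial-element '())))
    (loop for w in (shuffle workers)
          for core = 0 then (mod (1+ core) ncores)
          do (push w (aref runqueues core)))
    runqueues))

;; Each run queue would then feed one RUN-SCHEDULER instance pinned to a core;
;; periodic re-balancing could simply re-deal the same way.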
5. As with Erlang/BEAM: when sending messages across cores, *copy* rather than share data for better cache locality. Of course, the Lispy approach would probably be to allow copying *or* sharing, and a thin package on top of something like SBCL's sb-concurrency (with an optional parameter to override) could enforce the recommended approach. Caveat below.
(See http://sbcl.org/manual/index.html#sb_002dconcurrency and particularly take note of the Edya Ladan-Mozes and Nir Shavit paper reference for minimizing locks for queues.)
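A hedged sketch of what that thin layer might look like, here with a plain lock-protected FIFO via bordeaux-threads (an assumption of mine-- on SBCL one would drop the lock and use SB-CONCURRENCY:QUEUE directly). COPY-TREE stands in for a real deep copy into the receiver's heap.

(defstruct mailbox
  (lock (bt:make-lock))
  (messages '()))

(defun send-message (mailbox message &key share)
  "Deliver MESSAGE to MAILBOX.  By default the payload is copied, so the
sender's private heap can be munmap'd at exit without checking for readers;
:SHARE T is the \"enough rope\" escape hatch, with the NUMA penalty noted
in point 6 below."
  (let ((payload (if share message (copy-tree message))))
    (bt:with-lock-held ((mailbox-lock mailbox))
      (setf (mailbox-messages mailbox)
            (nconc (mailbox-messages mailbox) (list payload))))
    payload))

(defun receive-message (mailbox)
  "Pop the oldest message, or NIL if the mailbox is empty (no blocking)."
  (bt:with-lock-held ((mailbox-lock mailbox))
    (pop (mailbox-messages mailbox))))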
6. There are obvious points that I've ignored here; for instance, in order to have cheap worker cleanup upon exit, the system would need guarantees that no other worker is sharing its data. There are sound reasons why Erlang's semantics are purely functional! But for the Lispy approach, perhaps keep the global heap for such things: anything to be shared goes there, along with the performance penalty for reaching across NUMA nodes. That is, "here's enough rope..." But seriously, just as Scala embraced OO early in its lifetime yet now largely considers it best to avoid, I suspect we'd feel the same about shared read/write memory in the long run once we have enough lines of code written with this proposed system.
Of course, the above may have no contact with the reality of ECL.
For instance, if there is an equivalent to a global lock then the above design fails quickly.
Perhaps this gives ideas to someone who has working knowledge of such internals.
Thanks, -Daniel