Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
Handling tens of thousands of requests per second per node isn't uncommon, and these often have *several* workers per request or connection: hundreds of thousands of processes. Under such scenarios, anything less than this approach to lightweight processes might suffer from stalls during long gc runs that would be avoided or significantly reduced under Erlang's model.
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
For those unfamiliar with it, what Erlang means by "process" stems from an implementation of Actor Model with caveats. Each process is more co-routine than thread, and many may run within the same OS thread managed by the Erlang scheduler. The BEAM VM pre-empts processes via "reduction" counting, which may be understood as unit of VM time. Their famed tag line, "Let it crash," may be loosely understood in CL terms as an implicit HANDLER-CASE.
The open question here is to address a stated non-goal of CL-MUPROC, "we rely on the CL implementation's MP system" and "considerably heavier than Erlang processes". [See presentation link from https://common-lisp.net/project/cl-muproc/ ]
Some Erlang-on-CL packages use OS threads or in the case of Erlang-in-Lisp, fork(). While easier for a library author to implement, these contradict the definition of cheap here. Other such packages punt the issue altogether and instead focus only on pattern-matching aspects of Erlang the language.
Then there's Lisp-Flavoured-Erlang, and Elixir offers hygienic macros that manipulate the full AST properly-- all play nice on same Erlang VM at runtime-- for people simply looking to avoid Erlang syntax or semantics. But let's focus on cheap Erlang style processes in Common Lisp here, please.
1. Semantics of library elements are straight-forward enough (e.g., SPAWN, EXIT) and may be borrowed wholesale and safely named within their own package.
2. Memory allocation & garbage collection:
Erlang BEAM VM doesn't begin to reclaim memory until very late, so an initial implementation here might assume ephemeral worker processes and omit gc until exit of a worker. However, some processes are long-lived in practice.
One compromise might be acceptable: one nursery per process, but anything promoted to higher generations gets handled however your gc does it now.
This states nothing about use of shared versus multiple heaps across OS threads, so such matters may continue to be implementation-dependent.
3. Co-routines:
For something similar to Erlang's reductions, there would need to be a measure of runtime complexity per process.
However, I've used single thread co-routines for near realtime systems in C and CL with a loop of function pointers and ensuring that each referenced function executes under some threshold determined through experimentation and confirmed via test cases. No pre-empting needed. While fragile and not easily portable across different hardware (including same architecture), this may be acceptable for a preliminary draft.
Using CL-MUPROC routines as an example and extending MUPROCN: perhaps its body becomes this top-level list of functions from which interleaving across processes may occur. Then add variations for playing well with DO or LOOP semantics.
4. Message-passing:
SBCL's sb-concurrency extension with Queue and Mailbox (implementation of "Optimistic FIFO Queue") can be the solution here too. More aligned with CL principles, allowing for multiple mailboxes-- and therefore priority messaging-- would be a welcome advancement beyond Erlang. (Erlang allows only one mailbox per process: sequential but with pattern matching, nothing out-of-band...)
An important implementation detail that simplifies gc and increases cache locality across NUMA nodes: messages get duplicated when delivered-- each process only references its own copy!
5. (Almost) nothing shared:
Erlang enforces strict prohibition against shared memory, and common practice is to use an in-memory database (ETS) as key/value store in lieu of globals. Scala allows but discourages shared memory.
A CL-inspired take on this might be to use SBCL's approach with thread creation: upon creating a new process, you get: A) global special values, but modify at own risk... B) LET bindings are local to each process; C) threads don't inherit dynamic bindings from parent. i.e., http://sbcl.org/manual/index.html#Special-Variables
6. Scheduling processes on OS threads:
This is delicate in Erlang, and I've experienced specialized use cases interfering with their scheduler when under heavy load. Instead for our purposes, let the programmer handle the mapping of processes to OS threads. Less is more here.
7. Finally, gen_server & OTP fiends may be implemented as their own packages... or skipped entirely!
Thoughts on feasibility?
Thanks, -Daniel
Have you looks at Lisp-Flavored Erlang ("LFE")? - http://lfe.io
It's really quite interesting, IMO.
--Scott
On Mon, Aug 3, 2015 at 11:12 AM, shenanigans@sonic.net wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
Handling tens of thousands of requests per second per node isn't uncommon, and these often have *several* workers per request or connection: hundreds of thousands of processes. Under such scenarios, anything less than this approach to lightweight processes might suffer from stalls during long gc runs that would be avoided or significantly reduced under Erlang's model.
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
For those unfamiliar with it, what Erlang means by "process" stems from an implementation of Actor Model with caveats. Each process is more co-routine than thread, and many may run within the same OS thread managed by the Erlang scheduler. The BEAM VM pre-empts processes via "reduction" counting, which may be understood as unit of VM time. Their famed tag line, "Let it crash," may be loosely understood in CL terms as an implicit HANDLER-CASE.
The open question here is to address a stated non-goal of CL-MUPROC, "we rely on the CL implementation's MP system" and "considerably heavier than Erlang processes". [See presentation link from https://common-lisp.net/project/cl-muproc/ ]
Some Erlang-on-CL packages use OS threads or in the case of Erlang-in-Lisp, fork(). While easier for a library author to implement, these contradict the definition of cheap here. Other such packages punt the issue altogether and instead focus only on pattern-matching aspects of Erlang the language.
Then there's Lisp-Flavoured-Erlang, and Elixir offers hygienic macros that manipulate the full AST properly-- all play nice on same Erlang VM at runtime-- for people simply looking to avoid Erlang syntax or semantics. But let's focus on cheap Erlang style processes in Common Lisp here, please.
- Semantics of library elements are straight-forward enough (e.g., SPAWN,
EXIT) and may be borrowed wholesale and safely named within their own package.
- Memory allocation & garbage collection:
Erlang BEAM VM doesn't begin to reclaim memory until very late, so an initial implementation here might assume ephemeral worker processes and omit gc until exit of a worker. However, some processes are long-lived in practice.
One compromise might be acceptable: one nursery per process, but anything promoted to higher generations gets handled however your gc does it now.
This states nothing about use of shared versus multiple heaps across OS threads, so such matters may continue to be implementation-dependent.
- Co-routines:
For something similar to Erlang's reductions, there would need to be a measure of runtime complexity per process.
However, I've used single thread co-routines for near realtime systems in C and CL with a loop of function pointers and ensuring that each referenced function executes under some threshold determined through experimentation and confirmed via test cases. No pre-empting needed. While fragile and not easily portable across different hardware (including same architecture), this may be acceptable for a preliminary draft.
Using CL-MUPROC routines as an example and extending MUPROCN: perhaps its body becomes this top-level list of functions from which interleaving across processes may occur. Then add variations for playing well with DO or LOOP semantics.
- Message-passing:
SBCL's sb-concurrency extension with Queue and Mailbox (implementation of "Optimistic FIFO Queue") can be the solution here too. More aligned with CL principles, allowing for multiple mailboxes-- and therefore priority messaging-- would be a welcome advancement beyond Erlang. (Erlang allows only one mailbox per process: sequential but with pattern matching, nothing out-of-band...)
An important implementation detail that simplifies gc and increases cache locality across NUMA nodes: messages get duplicated when delivered-- each process only references its own copy!
- (Almost) nothing shared:
Erlang enforces strict prohibition against shared memory, and common practice is to use an in-memory database (ETS) as key/value store in lieu of globals. Scala allows but discourages shared memory.
A CL-inspired take on this might be to use SBCL's approach with thread creation: upon creating a new process, you get: A) global special values, but modify at own risk... B) LET bindings are local to each process; C) threads don't inherit dynamic bindings from parent. i.e., http://sbcl.org/manual/index.html#Special-Variables
- Scheduling processes on OS threads:
This is delicate in Erlang, and I've experienced specialized use cases interfering with their scheduler when under heavy load. Instead for our purposes, let the programmer handle the mapping of processes to OS threads. Less is more here.
- Finally, gen_server & OTP fiends may be implemented as their own
packages... or skipped entirely!
Thoughts on feasibility?
Thanks, -Daniel
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
Handling tens of thousands of requests per second per node isn't uncommon, and these often have *several* workers per request or connection: hundreds of thousands of processes. Under such scenarios, anything less than this approach to lightweight processes might suffer from stalls during long gc runs that would be avoided or significantly reduced under Erlang's model.
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
It would be great to have such rapid fire threads. In addition to Erlang style, it could also support Concurrent ML style and Reppy Channels. Threads would be able to form spaghetti stacks and be quickly pared when dead. Could do a lot more speculative threading.
But that really does sound like a redesign of CL.
(I’m one of the authors of an Erlang-like system in Lisp, called Butterfly. The Scheme guys have one called Termite.)
- DM
On Aug 3, 2015, at 8:34 AM, Stelian Ionescu sionescu@cddr.org wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
Handling tens of thousands of requests per second per node isn't uncommon, and these often have *several* workers per request or connection: hundreds of thousands of processes. Under such scenarios, anything less than this approach to lightweight processes might suffer from stalls during long gc runs that would be avoided or significantly reduced under Erlang's model.
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
-- Stelian Ionescu a.k.a. fe[nl]ix Quidquid latine dictum sit, altum videtur. http://common-lisp.net/project/iolib http://common-lisp.net/project/iolib
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
and adding first class heaps would be a very useful addition with or without any of the other erlang stuff. i don't know how hard that is, but i assume it should be rather simple if the responsibility is pushed over to the user to make sure that there are no dangling pointers after destroying a heap.
or am i missing something?
On Mon, Aug 3, 2015 at 3:36 PM, Attila Lendvai attila@lendvai.name wrote:
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
and adding first class heaps would be a very useful addition with or without any of the other erlang stuff. i don't know how hard that is, but i assume it should be rather simple if the responsibility is pushed over to the user to make sure that there are no dangling pointers after destroying a heap.
or am i missing something?
Not Common Lisp (yet), but Racket has custodians for first-class resource allocation pool, and all kinds of concurrency primitives. Since it's a programming language designed to implement other programming languages on top of it, it would make a great basis for a "new" common lisp implementation.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org There are a thousand hacking at the branches of evil to one who is striking at the root. — Thoreau
I did something similar several years ago for Butterfly, where I pre-instantiate a pool of threads into the “stable”, and utilize a Grand Central Dispatch (real and / or simulated) to run various jobs. No sense killing the horses after they run the racetrack.
But this is still a far cry (a few dozen threads) from what Erlang and CML provide (numbering in the thousands).
- DM
On Aug 3, 2015, at 12:46 PM, Faré fahree@gmail.com wrote:
On Mon, Aug 3, 2015 at 3:36 PM, Attila Lendvai <attila@lendvai.name mailto:attila@lendvai.name> wrote:
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
and adding first class heaps would be a very useful addition with or without any of the other erlang stuff. i don't know how hard that is, but i assume it should be rather simple if the responsibility is pushed over to the user to make sure that there are no dangling pointers after destroying a heap.
or am i missing something?
Not Common Lisp (yet), but Racket has custodians for first-class resource allocation pool, and all kinds of concurrency primitives. Since it's a programming language designed to implement other programming languages on top of it, it would make a great basis for a "new" common lisp implementation.
—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org http://fare.tunes.org/ There are a thousand hacking at the branches of evil to one who is striking at the root. — Thoreau
On Mon, Aug 3, 2015 at 3:36 PM, Attila Lendvai attila@lendvai.name wrote:
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
There are two main reasons why Erlang(the VM, not so much the language IMO) is so effective in its niche:
1) Per-process(green thread) GC heap and no sharing between processes except through very narrow and explicit primitives. This makes Erlang code pretty safe for concurrency as long as you're not seeking to satisfy hard real-time constraints.
2) It has a bytecode-interpreting VM in which each instruction is a safepoint and all I/O is non-blocking, basically implementing scheduler preemption in user space. This allow Erlang watcher processes to randomly kill worker processes and restart them. Being an interpreter is crucial here, because as soon as they tried compiling to native code(HiPE) they had problems with tight loops in pure Erlang code that would not yield to the scheduler and thus starved other processes. Almost nobody uses HiPE. Erlang code can still block if it calls foreign code or it performs some syscalls that can't be avoided to block, but that's why they try to implement everything in pure Erlang - they switched the default SSL implementation to one written by them, a few years ago, because they had issues with OpenSSL.
So again, unless you're willing to implement something along those lines, you could have Erlang-style concurrency - which is still useful I guess - but you wouldn't achieve the rock-solid robustness for which the Erlang VM is so famous(except for inter-process message queues that are unbounded by default so you can easily use all memory if you're not careful).
Yeah, this is why I suggested LFE in my first reply.
--Scott
On Aug 3, 2015, at 4:19 PM, Stelian Ionescu sionescu@cddr.org wrote:
On Mon, Aug 3, 2015 at 3:36 PM, Attila Lendvai attila@lendvai.name wrote:
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
There are two main reasons why Erlang(the VM, not so much the language IMO) is so effective in its niche:
Per-process(green thread) GC heap and no sharing between processes except through very narrow and explicit primitives. This makes Erlang code pretty safe for concurrency as long as you're not seeking to satisfy hard real-time constraints.
It has a bytecode-interpreting VM in which each instruction is a safepoint and all I/O is non-blocking, basically implementing scheduler preemption in user space. This allow Erlang watcher processes to randomly kill worker processes and restart them. Being an interpreter is crucial here, because as soon as they tried compiling to native code(HiPE) they had problems with tight loops in pure Erlang code that would not yield to the scheduler and thus starved other processes. Almost nobody uses HiPE.
Erlang code can still block if it calls foreign code or it performs some syscalls that can't be avoided to block, but that's why they try to implement everything in pure Erlang - they switched the default SSL implementation to one written by them, a few years ago, because they had issues with OpenSSL.
So again, unless you're willing to implement something along those lines, you could have Erlang-style concurrency - which is still useful I guess - but you wouldn't achieve the rock-solid robustness for which the Erlang VM is so famous(except for inter-process message queues that are unbounded by default so you can easily use all memory if you're not careful).
-- Stelian Ionescu a.k.a. fe[nl]ix Quidquid latine dictum sit, altum videtur. http://common-lisp.net/project/iolib
Yeah, this is why I suggested LFE in my first reply.
Yep, forgot to reply to your email.
+1 for LFE
Thanks to all who responded!
Here's one message replying to highlights, yet as always very informed answers all around!
On 08/03/2015 12:36 PM, Attila Lendvai wrote:
How might we get equivalent cheap ephemeral processes into a contemporary Common Lisp implementation?
In short, you need to write from scratch a new CL implementation. Current ones are not designed with the Erlang constraints in mind.
well, Nikodemus had some plans for green threads for SBCL and it didn't sound like a rewrite.
and adding first class heaps would be a very useful addition with or without any of the other erlang stuff. i don't know how hard that is, but i assume it should be rather simple if the responsibility is pushed over to the user to make sure that there are no dangling pointers after destroying a heap.
or am i missing something?
This would give a lot of traction for my projects, especially for *tasks* that generate a lot of garbage to be otherwise collected. (e.g., one use case: lots of temporary hash tables that grow to many GB within a minute, queried a few dozen times, then discarded. I've previously used 100 line Perl scripts for this, just to avoid the gc hit. Called it via file based queue and/or pipe. Everything else in SBCL or CCL.)
On 08/03/2015 08:18 AM, Scott McKay wrote:
Have you looks at Lisp-Flavored Erlang ("LFE")?
It's really quite interesting, IMO.
--Scott
Yes, I have worked with LFE, Elixir and Erlang. I was hoping to get away from their single mailbox per process limitation, among other issues.
Robert & Duncan are doing an excellent job on LFE. It's come a long way in the past year or so. If anyone hasn't tried it lately, please do! For instance, they added a macro for those who were put off by the colon operator, so more conventional module:fn has been usable for over a year. And Jose has a very extensive macro system for Elixir that's possibly closest to Common Lisp's for a non-Lisp.
On 08/03/2015 01:19 PM, Stelian Ionescu wrote:
- It has a bytecode-interpreting VM in which each instruction is a safepoint and all I/O is non-blocking, basically implementing scheduler preemption in user space. This allow Erlang watcher processes to randomly kill worker processes and restart them.
Ah, that's the insight to their details I was unaware of. Thanks!
In earlier CL approaches, I just wrapped the main loop in a handler-case and found that sufficient for my needs, but others may find my approach lacking... And you don't want to know about the external watchdog for something similar but in C.
Erlang's VM has a very large set of use cases that they are supporting, far larger than the proposed set of features from the original posting.
As you also stated (but trimmed here), yes I can make do without Erlang's robustness guarantees. A tens of thousands of workers without a heavy thread pool and fast/cheap workers are my main use cases.
On 08/03/2015 12:46 PM, Faré wrote:
Not Common Lisp (yet), but Racket has custodians for first-class resource allocation pool, and all kinds of concurrency primitives. Since it's a programming language designed to implement other programming languages on top of it, it would make a great basis for a "new" common lisp implementation.
Thanks for that. I've been looking for an excuse to delve deeper into Racket, but since I started with scheme R3 in college, that pesky explicit value for false still feels wrong to me. ;^)
Thanks again everyone, -Daniel
On Aug 3, 2015, at 18:12 , shenanigans@sonic.net wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
…
Wasn’t it Prolog? I believe the first version was written in Prolog.
In any case, apart from the niceties of the Erlang VM, how much of CL will be there in a ground up new implementation? And how much can be leveraged out of different implementations (e.g., LW has wonderful concurrency primitive available)?
Let me say that the traffic on CLL re LFE and Flavors seems a bit misdirected to me. I want CLOS, not Flavors.
Cheers — MA
I've been trying to convince the people over the in FBP group that this is much easier than it sounds (esp. if you're rolled your own RTOS, as I have done many times).
(I have to leave now, but will answer in more detail later tomorrow).
Here are some slides for a not-yet-presented talk:
https://github.com/guitarvydas/FBP-from-scratch
And here is a raw implementation of such extremely-green threads done in C (sorry :-):
https://github.com/guitarvydas/collate-fbp-classic
Given closures, this should be even easier.
'til later
pt
I've been trying to convince the people over the in FBP group that this is much easier than it sounds (esp. if you're rolled your own RTOS, as I have done many times).
(I have to leave now, but will answer in more detail later tomorrow).
Here are some slides for a not-yet-presented talk:
This is correct but useless to most people, who aren't doing signal processing and need to run on an off-the-shelf OS kernel.
I apologize for my absence.
@David - FBP == flow-based programming.
http://www.jpaulmorrison.com/fbp/
http://www.amazon.ca/Dataflow-Reactive-Programming-Systems-Practical/dp/1497...
Frame-based is interesting in its own right, though http://www.amazon.ca/Framing-Software-Reuse-Lessons-World/dp/013327859X
Just about everyone has used a restricted form of FBP. It’s called “BASH pipe syntax”.
@Stelian, this technique works with or without an O/S.
“… Being an interpreter is crucial here, because as soon as they tried compiling to native code(HiPE) they had problems with tight loops in pure Erlang code that would not yield to the scheduler and thus starved other processes. Almost nobody uses HiPE. …”
This is the part that I have difficulties communicating to people :-).
Here’s another attempt, where I discuss long-running loops and rolling your own scheduler...
https://bittarvydas.wordpress.com/2015/09/27/long-running-loops/
pt
What is FBP? (surely not Federal Bureau of Prisons). Frame Based Programming? (having looked at the slides, maybe so), Flow Based Programming?
- DM
On Mon, 03 Aug 2015 08:12:36 -0700, shenanigans-65eDfwRo+1xeoWH0uzbU5w wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
I have to voice interest. I would love for a CL implementation to support lightweight threads.
The open question here is to address a stated non-goal of CL-MUPROC, "we rely on the CL implementation's MP system" and "considerably heavier than Erlang processes". [See presentation link from https://common-lisp.net/project/cl-muproc/ ]
How did I miss cl-muproc? My spare time hacking in the lat months has been implementing the *exact same thing*. Oh well...
On 08/17/2015 10:15 AM, Max Rottenkolber wrote:
On Mon, 03 Aug 2015 08:12:36 -0700, shenanigans-65eDfwRo+1xeoWH0uzbU5w wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
I have to voice interest. I would love for a CL implementation to support lightweight threads.
There was a successful IndieGoGo campaign for SBCL features in the recent past, so might there be someone interested, willing and able to pursue implementing something like this?
This can be its own thing rather than any attempt at paying homage to Erlang or BEAM vm: for me at least, it's about runtime performance and avoiding the big gc hits by discarding an entire heap upon worker exit but in a way that accommodates hundreds of thousands of concurrent workers. Start small, and build from there. Erlang & BEAM address all sorts of use cases for which a Lisp system wouldn't necessarily need to be constrained-- and keeping it more Lispy anyway.
I personally would contribute a few thousand US dollars for such an effort, as it's beyond my knowledge of SBCL internals at this time.
Extending ECL or as a implementation-specific package for it might be another option, but again, this is beyond my knowledge of its internals.
While I respect the various languages orbiting Erlang, I'd rather use Common Lisp.
Thanks, -Daniel
Hello,
shenanigans@sonic.net writes:
On 08/17/2015 10:15 AM, Max Rottenkolber wrote:
On Mon, 03 Aug 2015 08:12:36 -0700, shenanigans-65eDfwRo+1xeoWH0uzbU5w wrote:
Creators of Erlang have a Lisp background, and one feature of the Erlang VM (BEAM) that I'd like back-ported into Common Lisp is their process.
An Erlang "process" is cheap to create, cheap to destroy, cheap when blocked, and upon exit performs bulk gc of its allocated memory; e.g., munmap().
I have to voice interest. I would love for a CL implementation to support lightweight threads.
There was a successful IndieGoGo campaign for SBCL features in the recent past, so might there be someone interested, willing and able to pursue implementing something like this?
This can be its own thing rather than any attempt at paying homage to Erlang or BEAM vm: for me at least, it's about runtime performance and avoiding the big gc hits by discarding an entire heap upon worker exit but in a way that accommodates hundreds of thousands of concurrent workers. Start small, and build from there. Erlang & BEAM address all sorts of use cases for which a Lisp system wouldn't necessarily need to be constrained-- and keeping it more Lispy anyway.
I personally would contribute a few thousand US dollars for such an effort, as it's beyond my knowledge of SBCL internals at this time.
Extending ECL or as a implementation-specific package for it might be another option, but again, this is beyond my knowledge of its internals.
Could you elaborate a little on this?
While I respect the various languages orbiting Erlang, I'd rather use Common Lisp.
Thanks, -Daniel
Regards, Daniel
On 08/17/2015 11:09 PM, Daniel Kochmański wrote:
Extending ECL or as a implementation-specific package for it might be
another option, but again, this is beyond my knowledge of its internals.
Could you elaborate a little on this?
As a conceptual design without having examined ECL internal docs or source-- a very big disclaimer, indeed-- this is one approach considering primarily the C & Unix perspective:
1. Since my criteria revolve around cheap & efficient workers and cheating garbage collection upon worker exit, I'd build the memory management system upon mmap() and munmap() for each worker heap. All memory consed within context of the worker must be constrained to exist only within the large blocks of memory from its private mmap()'d areas.
2. Regardless of ECL or other Lisp system, introduce a form such as WITH-WORKER that could combine Erlang's spawn() parameters and BODY that is maintained as a list for which the form at runtime would time-slice between its elements. WITH-WORKER could vaguely resemble WITH-OPEN-FILE, so let's leave it at that for now. More below.
3. Where Erlang/BEAM has the concept of "reductions" as the means by which to introduce and enforce some notion of fairness across workers, perhaps the Lispy version *doesn't* try to protect the programmer from hurting him/herself. Iterating over the BODY as list would translate into round-robin interleaving across all WITH-WORKER instances on the same CPU core-- same scheduler.
4. As with Erlang/BEAM: introduce one scheduler per CPU core. Worker migration to re-balance load per core is a pain point, but avoid anything too fancy here. Start with random-but-even distribution and random-but-even re-balancing.
(But beware: attempting to localize workers based upon message-passing communication patterns adds too much complexity to get correct for the "average" use case. Instead, similarly to thread "colours", just specify affinity hints as optional parameters to WITH-WORKER, which should be *ignored* on first cut.)
5. As with Erlang/BEAM: when sending messages across cores, *copy* rather than share data for better cache locality. Of course, the Lispy approach would probably be to allow copying *or* sharing, and a thin package on top of something like SBCL's sb-concurrently (with an optional parameter to override) could enforce the recommended approach. Caveat below.
(See http://sbcl.org/manual/index.html#sb_002dconcurrency and particularly take note of the Edya Ladan-Mozes and Nir Shavit paper reference for minimizing locks for queues.)
6. There are obvious points that I've ignored here, such as in order to have cheap worker cleanup upon exit, the system would need guarantees that no other worker is sharing its data. There are sound reasons why Erlang semantics are pure functional programming! But for the Lispy approach, perhaps keep the global heap for such things, and anything to be shared goes there-- along with the performance penalty for reaching across NUMA nodes. That is, "here's enough rope..." But seriously, just as Scala used OO early in its lifetime yet largely considers it best to avoid that now, so too I suspect we'd feel the same about shared read/write memory in the long-run once we have enough lines of code written with this proposed system.
Of course, the above may have no contact with reality of ECL.
For instance, if there is an equivalent to a global lock then the above design fails quickly.
Perhaps this gives ideas to someone who has working knowledge of such internals.
Thanks, -Daniel
On 08/18/2015 08:24 PM, shenanigans@sonic.net wrote:
- As with Erlang/BEAM: when sending messages across cores, *copy*
rather than share data for better cache locality. Of course, the Lispy approach would probably be to allow copying *or* sharing, and a thin package on top of something like SBCL's sb-concurrently (with an optional parameter to override) could enforce the recommended approach. Caveat below.
One of the larger issues I had with Erlang was that you couldn't sic a bunch of threads on analyzing or searching immutable multi-GB raw data structures (not ETS entries, nor hacking around with byte blobs).
My ideal layout would involve per-process heaps, with the option to either cons into or copy into the immutable shared heap. I believe collecting a shared add-only immutable-data heap in this form might not require stopping the world. As was mentioned elsewhere, however, designing around immutability advantages does venture further away from Lisp's style. Splitting memory into per-process heaps, shared immutable heap, and a shared mutable stop-the-world-to-GC heap might be going a little overboard, especially since the allocation/copy style generally has to be declared by the human.