Antiweb seems to use many of the techniques we'd like for an erlang-in-lisp implementation: processes that communicate through messages, an efficient event loop to manage a great number of connections, etc.
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] A tautology is a thing which is tautological.
---------- Forwarded message ----------
From: doug@hcsw.org
Date: 2008/7/17
Subject: Invitation to beta-test new Common Lisp webserver
To: fahree@gmail.com
Hi Faré,
We spoke earlier on IRC and apparently have similar ideas about server design so I thought you might be interested in having a look at a webserver I have been working on:
I'm appending the first section of the "Design of Antiweb" page of the manual.
Best,
Doug
Antiweb is a webserver written in Common Lisp, C, and Perl by Doug Hoyte and Hoytech. Antiweb is not a "proof of concept" and is not "exploratory code". We intend the core design of Antiweb (as laid out in this design paper) to be stable for the next 10+ years of use.
The two webservers that have had the largest influence on Antiweb4 are nginx and lighttpd. We studied these and other excellent servers closely while designing Antiweb4. Another, more obscure server that has influenced Antiweb is fhttpd.
Why another webserver? In our opinion, the biggest problem with the above servers is that they aren't written in lisp. Many servers that we studied have grafted on extension languages (e.g., Perl for nginx and Lua for lighttpd). Antiweb is different. Instead of being a C program that uses some other language, Antiweb is a lisp program that uses C (and Perl).
Like nginx, lighttpd, fhttpd, and Antiweb3, Antiweb4 is an asynchronous or event-based server, meaning that a single thread of control multiplexes multiple client connections. Antiweb is a collection of unix processes. Connections are transferred between processes with sendmsg(). When this happens, any data that was initially read from the socket is transferred along with the socket itself. The socket is always closed in the sending process.
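The hand-off described above can be sketched in Python. This is only an illustration of passing a descriptor with sendmsg() and SCM_RIGHTS, not Antiweb's actual code: the function names, the 4-byte length header, and the single-recvmsg() simplification are all assumptions made for the example.

```python
import array
import socket
import struct

def send_connection(channel, conn, initial_data):
    """Pass conn's descriptor, plus bytes already read from it, over a unix socket."""
    # Hypothetical framing: a 4-byte length header so the message is never empty.
    header = struct.pack("!I", len(initial_data))
    fds = array.array("i", [conn.fileno()])
    channel.sendmsg([header, initial_data],
                    [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])
    conn.close()  # the sending process always closes its copy

def recv_connection(channel, max_data=65536):
    """Receive a descriptor and its buffered bytes; returns (socket, data)."""
    fds = array.array("i")
    # Simplification: assumes the whole message arrives in one recvmsg() call.
    msg, ancdata, _flags, _addr = channel.recvmsg(
        4 + max_data, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    (length,) = struct.unpack("!I", msg[:4])
    return socket.socket(fileno=fds[0]), msg[4:4 + length]
```

A sender would call send_connection(channel, accepted_socket, bytes_read_so_far) and the receiving process would call recv_connection(channel) on its end of the unix socket, then continue parsing from the returned bytes.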
To multiplex connections inside a process, Antiweb uses a state machine data structure defined in src/libantiweb.h. Antiweb requires either the kqueue() or epoll() stateful level-triggered event APIs.
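As a rough illustration of multiplexing with a per-connection state machine, here is a Python sketch. It does not reproduce the structure in src/libantiweb.h; the states, class, and function names are hypothetical. Python's selectors module picks epoll on Linux and kqueue on the BSDs, both used in level-triggered mode.

```python
import selectors
import socket
from enum import Enum, auto

class State(Enum):          # hypothetical states, not Antiweb's
    READING_REQUEST = auto()
    SENDING_RESPONSE = auto()

class Conn:
    def __init__(self, sock):
        self.sock = sock
        self.state = State.READING_REQUEST
        self.inbuf = b""
        self.outbuf = b""

def add_connection(sel, sock):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ, Conn(sock))

def step(sel, conn):
    """Advance one connection's state machine after a readiness event."""
    if conn.state is State.READING_REQUEST:
        chunk = conn.sock.recv(4096)
        if not chunk:                       # peer closed the connection
            sel.unregister(conn.sock)
            conn.sock.close()
            return
        conn.inbuf += chunk
        if b"\r\n\r\n" in conn.inbuf:       # sketch: ignores pipelined requests
            conn.inbuf = b""
            conn.outbuf = b"HTTP/1.1 204 No Content\r\n\r\n"
            conn.state = State.SENDING_RESPONSE
            sel.modify(conn.sock, selectors.EVENT_WRITE, conn)
    elif conn.state is State.SENDING_RESPONSE:
        sent = conn.sock.send(conn.outbuf)
        conn.outbuf = conn.outbuf[sent:]
        if not conn.outbuf:
            conn.state = State.READING_REQUEST
            sel.modify(conn.sock, selectors.EVENT_READ, conn)

def run_once(sel, timeout=1.0):
    """One pass of the event loop: a single thread services every ready socket."""
    for key, _events in sel.select(timeout):
        step(sel, key.data)
```

Because the events are level-triggered, a connection that is only partly read or written is simply reported again on the next pass, so each step can do a bounded amount of work.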
* On a 32-bit linux/CMUCL system, 10000 inactive keepalive connections consume about 3M of user-space memory (in addition to two lisp images).
* The number of inactive keepalive connections has negligible performance impact on new connections.
There are three modes for sending files: medium, small, and large:
1. Medium: These files are mmap()ed (memory mapped) to avoid copying the file's data into user-space. The data is copied directly from the filesystem to the kernel's socket buffer.
2. Small: These files are read into a user-space buffer because a small read() is often cheaper than mmap()+munmap().
3. Large: Antiweb uses a user-space buffer for large files. This is to avoid disk-thrashing when serving many large files to clients concurrently (idea from lighttpd) and to avoid running out of address space on 32 bit systems.
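The choice between the three modes might look something like the following Python sketch. The size thresholds and chunk size are invented for illustration (the text above does not state Antiweb's actual cut-offs), and the function names are hypothetical.

```python
import mmap
import os

SMALL_MAX = 4 * 1024             # hypothetical threshold
MEDIUM_MAX = 16 * 1024 * 1024    # hypothetical threshold
CHUNK = 64 * 1024                # hypothetical user-space buffer size

def choose_send_mode(size, small_max=SMALL_MAX, medium_max=MEDIUM_MAX):
    if size <= small_max:
        return "small"
    if size <= medium_max:
        return "medium"
    return "large"

def file_buffers(path, small_max=SMALL_MAX, medium_max=MEDIUM_MAX, chunk=CHUNK):
    """Yield bytes-like buffers for a file according to its send mode."""
    mode = choose_send_mode(os.path.getsize(path), small_max, medium_max)
    with open(path, "rb") as f:
        if mode == "small":
            yield f.read()                   # one cheap read()
        elif mode == "medium":
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                view = memoryview(mm)        # no copy into user-space
                try:
                    yield view
                finally:
                    view.release()
        else:
            while True:                      # bounded user-space buffer
                block = f.read(chunk)
                if not block:
                    break
                yield block
```

The consumer must finish with each yielded buffer before asking for the next one, since the memory map is released as soon as the generator resumes.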
Super-size it: Because Antiweb uses a 64-bit off_t type and lisp's unlimited precision integers on all systems, Antiweb can serve files of any size. It also supports download resuming for all three file send modes.
Antiweb's data structures are designed for pipelining. Antiweb uses vectored I/O (also known as scatter-gather I/O) along with non-blocking I/O nearly everywhere. Antiweb's internal message passing protocol uses pipelining also. For example, an HTTP connection that pipelines two requests for small files followed by one request for a medium file is responded to with a single writev() system call consisting of the following:
* The HTTP headers and file contents for the first two small files
* The HTTP headers for the medium file
* As much of the memory mapped medium file as it takes to fill the kernel's socket buffer.
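The single writev() call described above can be approximated in Python with os.writev(). This is a sketch under assumptions: the function name and argument layout are hypothetical, and the caller is expected to handle a short write by resuming from the returned byte count.

```python
import mmap
import os

def send_pipelined(fd, small_responses, medium_headers, medium_map):
    """Issue one writev() covering several pipelined responses.

    small_responses: list of (headers, body) byte strings for small files.
    medium_map: a read-only mmap of the medium file.
    Returns the number of bytes the kernel accepted, which may be fewer
    than the total when the socket buffer fills up.
    """
    buffers = []
    for headers, body in small_responses:
        buffers.append(headers)
        buffers.append(body)
    buffers.append(medium_headers)
    buffers.append(medium_map)   # sent straight from the mapping, no user-space copy
    return os.writev(fd, buffers)
```

On a non-blocking socket the kernel takes as much of the mapped file as fits in its socket buffer, and the remainder is sent when the descriptor next becomes writable.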
Subsequently, all the generated log messages are written to the hub process with another writev(). The hub will eventually append the log messages (as well as any others that queue up) onto the axslog log file.
To see the connection statistics of a worker process, use the -stats command:
# antiweb -stats /var/aw/example.conf
...
Keepalive Time: 65 seconds
Total Connections: 41
HTTP requests: 72
Avg reqs/conn: 1.8
File descriptor usage (estimate): 17/32767
Current Connections: 11
Keepalives: 7
Sending files to: 2
Proxy: Sources: 0 Sinks: 0 Idle: 0
Timers: 0
Hub: 1
Unix Connections: 1
Lingering: 0
Zombies: 0
...
Notice that in addition to the HTTP traffic, there is also a connection to the hub's unix socket that was opened on start-up, and one other open unix socket. That other unix socket is you. You created a supervisor connection while asking for stats info.
-stats will also tell you how hosts are mapped to directories on a worker:
# antiweb -stats /var/aw/example.conf
...
Host -> HTML root mappings:
localhost -> /var/www/testing
example.com -> /var/www/example.com
www.example.com -> /var/www/example.com
...
Although usually we love it, sometimes pipelining is bad. Antiweb deliberately tears down persistent HTTP connections on certain responses:
* 4XX and 5XX HTTP Errors - This is to prevent blind web vulnerability scanners like nikto from persisting or pipelining 95+ percent of their requests.
* Directory Listings - To prevent pipelined recursive crawling.
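Expressed as code, the teardown rule above amounts to a small predicate. The function and parameter names here are hypothetical and the sketch covers only the two cases listed.

```python
def keep_alive_after(status, is_directory_listing, client_wants_keepalive=True):
    """Decide whether a connection stays open after a response is sent."""
    if not client_wants_keepalive:
        return False
    if 400 <= status <= 599:      # 4XX and 5XX errors tear the connection down
        return False
    if is_directory_listing:      # discourage pipelined recursive crawling
        return False
    return True
```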
When finished with a connection, Antiweb shuts down the write direction of the socket and lingers as required by HTTP/1.1. Antiweb always gracefully degrades for HTTP/0.9 and HTTP/1.0 clients. Antiweb has first-class IPv6 support. If you really do want to pipeline 4XX and 5XX errors, you have two options:
1. Use Antiweb's rewrite module to change problematic requests into requests for existing files.
2. Use Antiweb's fast-files module. This is a memory cache that supports accelerated static content, pre-generation of HTTP headers, negative caching, and persisting/pipelining 404 errors.
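For readers unfamiliar with the lingering close mentioned earlier (shut down the write direction, then wait for the client to finish), here is a blocking Python sketch. Antiweb presumably does this inside its event loop rather than with a blocking call; the function name, timeout, and drain limit are assumptions for the example.

```python
import socket

def lingering_close(sock, timeout=2.0, max_drain=65536):
    """Half-close the socket, drain what the client still sends, then close.

    Returns the number of bytes discarded while lingering.
    """
    sock.shutdown(socket.SHUT_WR)     # tell the client no more data is coming
    sock.settimeout(timeout)
    drained = 0
    try:
        while drained < max_drain:
            chunk = sock.recv(4096)
            if not chunk:             # client closed its side: safe to close
                break
            drained += len(chunk)
    except OSError:                   # includes the timeout case
        pass
    finally:
        sock.close()
    return drained
```

Draining before the final close avoids a reset that could destroy response data the client has not yet read.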
Antiweb was designed with security in mind from the beginning. Here are some of the security decisions made during the Antiweb design process:
* Virtual hosts are privilege-separated without proxying. Once the hub has determined which worker should handle a connection, it transfers the socket to the worker process and has nothing further to do with the connection. Worker processes run under different UIDs from the hub (and each other). Workers are optionally chroot()ed.
* Workers have no access to log files: all log messages are sent to the hub over the unix socket. This means that a compromised worker process cannot steal previously created log messages or log messages created by other workers.
* CGI processes can be restricted with resource limitations.
* Even on lisps without unicode support, Antiweb4 guarantees all internal data and filenames are UTF-8 encoded. This includes verifying all code-points are in their shortest possible representation and that there are no otherwise invalid surrogates.
* Antiweb processes never try to clean up or recover in the event of an unexpected condition. A process cannot do that because it has failed. Some other process that hasn't failed will clean up after it.
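The UTF-8 checks in the list above (shortest-form encodings only, no surrogates) can be illustrated with a small byte-level validator. This Python sketch is not Antiweb's implementation, and the function name is hypothetical. It works on raw bytes, which is the situation on a lisp without unicode support.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 with no overlong forms or surrogates."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                       # ASCII
            i += 1
            continue
        if 0xC0 <= b0 <= 0xDF:
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:                               # stray continuation or invalid lead byte
            return False
        if i + need >= n:                   # sequence is truncated
            return False
        for c in data[i + 1:i + 1 + need]:
            if c & 0xC0 != 0x80:            # not a continuation byte
                return False
            cp = (cp << 6) | (c & 0x3F)
        if cp < lowest:                     # overlong: not the shortest representation
            return False
        if 0xD800 <= cp <= 0xDFFF:          # surrogate code points are invalid in UTF-8
            return False
        if cp > 0x10FFFF:                   # beyond the Unicode range
            return False
        i += need + 1
    return True
```

For example, the two-byte sequence C0 AF is rejected because it is an overlong encoding of "/", a classic way to smuggle path separators past naive filters.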
Antiweb also includes an experimental new technology for constructing webpages called Anti Webpages. These are Perl-inspired programs that let you draw page layouts with significant whitespace, glue together HTML/CSS/Javascript, and more.
Antiweb was created for admins, by admins. Please let us know any ways you think it could be better.