[erlang-in-lisp-devel] Fwd: Invitation to beta-test new Common Lisp webserver

21 Jul 2008

Antiweb seems to use many of the techniques we'd like for an
erlang-in-lisp implementation: processes that communicate through
messages, an efficient event loop to manage a great number of
connections, etc.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
A tautology is a thing which is tautological.

---------- Forwarded message ----------
From:  <doug@hcsw.org>
Date: 2008/7/17
Subject: Invitation to beta-test new Common Lisp webserver
To: fahree@gmail.com

Hi Faré,

We spoke earlier on IRC and apparently have similar ideas
about server design so I thought you might be interested
in having a look at a webserver I have been working on:

http://hoytech.com/antiweb/

I'm appending the first section of the "Design of Antiweb"
page of the manual.

Best,

Doug

Antiweb is a webserver written in Common Lisp, C, and Perl by Doug Hoyte and
Hoytech. Antiweb is not a "proof of concept" and is not "exploratory code". We
intend the core design of Antiweb (as laid out in this design paper) to be
stable for the next 10+ years of use.

The two webservers that have had the largest influence on Antiweb4 are nginx and
lighttpd. We drew liberally on these and other excellent servers while
designing Antiweb4. Another, more obscure server that has influenced Antiweb is
fhttpd.

Why another webserver? In our opinion, the biggest problem with the above
servers is that they aren't written in lisp. Many servers that we studied have
grafted on extension languages (e.g., Perl for nginx and Lua for lighttpd).
Antiweb is different. Instead of being a C program that uses some other
language, Antiweb is a lisp program that uses C (and Perl).

Like nginx, lighttpd, fhttpd, and Antiweb3, Antiweb4 is an asynchronous or
event-based server, meaning that a single thread of control multiplexes multiple
client connections. Antiweb is a collection of unix processes. Connections are
transferred between processes with sendmsg(). When this happens, any data that
was initially read from the socket is transferred along with the socket itself.
The socket is always closed in the sending process.
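For readers who have not used descriptor passing: sendmsg() on a unix-domain socket can carry open file descriptors as SCM_RIGHTS ancillary data, alongside ordinary bytes in the same message. Below is a minimal C sketch, not Antiweb's code, of handing off a connection together with the bytes already read from it. The names hand_off_fd() and receive_fd() are hypothetical, and the sketch assumes a SOCK_DGRAM or SOCK_SEQPACKET channel so that one sendmsg() arrives as exactly one recvmsg().

```c
#define _GNU_SOURCE            /* expose socket APIs under strict compiler modes */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass `fd` over the unix-domain socket `chan`, together with the `pre_len`
 * bytes at `pre` that were already read from it. The length travels first
 * so the receiver knows how many pre-read bytes follow. On success the
 * sender closes its copy, so the receiver becomes the only owner.
 * Returns 0 on success, -1 on error. */
int hand_off_fd(int chan, int fd, const void *pre, size_t pre_len)
{
    assert(fd >= 0);
    struct iovec iov[2] = {
        { .iov_base = &pre_len,    .iov_len = sizeof pre_len },
        { .iov_base = (void *)pre, .iov_len = pre_len },
    };
    union { struct cmsghdr align; char bytes[CMSG_SPACE(sizeof(int))]; } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = { 0 };
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;            /* this message carries descriptors */
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    if (sendmsg(chan, &msg, 0) < 0)
        return -1;
    close(fd);                            /* always closed in the sender */
    return 0;
}

/* Receive a descriptor sent by hand_off_fd(). The pre-read bytes are copied
 * into `buf` (capacity `cap`) and their count is stored in *pre_len.
 * Returns the new descriptor, or -1 on error. */
int receive_fd(int chan, void *buf, size_t cap, size_t *pre_len)
{
    struct iovec iov[2] = {
        { .iov_base = pre_len, .iov_len = sizeof *pre_len },
        { .iov_base = buf,     .iov_len = cap },
    };
    union { struct cmsghdr align; char bytes[CMSG_SPACE(sizeof(int))]; } ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    msg.msg_control = ctrl.bytes;
    msg.msg_controllen = sizeof ctrl.bytes;

    ssize_t n = recvmsg(chan, &msg, 0);
    if (n < 0)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    if ((size_t)n < sizeof *pre_len || (msg.msg_flags & MSG_TRUNC)) {
        close(fd);                        /* pre-read data did not fit in buf */
        return -1;
    }
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open socket, so the client never notices the hand-off.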

To multiplex connections inside a process, Antiweb uses a state machine data
structure defined in src/libantiweb.h. Antiweb requires either the kqueue() or
epoll() stateful level-triggered event APIs.

 * On a 32-bit Linux/CMUCL system, 10000 inactive keepalive connections
   consume about 3 MB of user-space memory (in addition to two lisp images).
 * The number of inactive keepalive connections has negligible performance
   impact on new connections.
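With a level-triggered API, a process does not have to drain a socket every time it is woken: anything left unread is simply reported again on the next wait. The sketch below is Linux-only because it uses epoll (kqueue would play the same role on BSD), and it shows that property with a deliberately small read size. The struct conn type and its states are hypothetical stand-ins for the real state machine in src/libantiweb.h, which is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection state, not Antiweb's actual structure. */
enum conn_state { CONN_READING, CONN_CLOSED };

struct conn {
    int fd;
    enum conn_state state;
    size_t bytes_in;          /* request bytes consumed so far */
};

/* Register a connection with the epoll set `ep`. Leaving out EPOLLET keeps
 * the registration level-triggered: epoll_wait() keeps reporting the fd
 * for as long as unread data remains. */
int watch_readable(int ep, struct conn *c)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = c };
    return epoll_ctl(ep, EPOLL_CTL_ADD, c->fd, &ev);
}

/* One turn of the event loop: wait up to timeout_ms, then read at most
 * `chunk` bytes from each ready connection. Returns the number of ready
 * connections, or -1 on error. */
int run_once(int ep, size_t chunk, int timeout_ms)
{
    struct epoll_event evs[64];
    char buf[4096];
    assert(chunk > 0 && chunk <= sizeof buf);

    int n = epoll_wait(ep, evs, 64, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct conn *c = evs[i].data.ptr;
        ssize_t r = read(c->fd, buf, chunk);
        if (r > 0) {
            c->bytes_in += (size_t)r;   /* a real server would parse here */
        } else if (r == 0 || errno != EAGAIN) {
            /* peer closed or hard error: drop the connection */
            epoll_ctl(ep, EPOLL_CTL_DEL, c->fd, NULL);
            close(c->fd);
            c->state = CONN_CLOSED;
        }
    }
    return n;
}
```

Because all of a connection's progress lives in its struct, an idle keepalive costs only that struct plus an epoll registration, which fits the memory figures above.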

There are three modes for sending files: medium, small, and large:

 1. Medium: These files are mmap()ed (memory mapped) to avoid copying the
   file's data into user-space. The data is copied directly from the filesystem
   to the kernel's socket buffer.
 2. Small: These files are read into a user-space buffer because a small
   read() is often cheaper than mmap()+munmap().
 3. Large: Antiweb uses a user-space buffer for large files. This is to avoid
   disk-thrashing when serving many large files to clients concurrently (idea
   from lighttpd) and to avoid running out of address space on 32-bit systems.
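The size-based choice between the three modes could be sketched in C as follows. The thresholds SMALL_MAX and LARGE_MIN are invented for illustration (the design notes do not give Antiweb's values), and send_file() writes to a blocking descriptor until done, whereas Antiweb's non-blocking sockets would take only as much as fits in the socket buffer per call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical thresholds, not Antiweb's real values. */
#define SMALL_MAX ((off_t)16 * 1024)
#define LARGE_MIN ((off_t)64 * 1024 * 1024)
#define CHUNK     (64 * 1024)

enum send_mode { SEND_SMALL, SEND_MEDIUM, SEND_LARGE };

enum send_mode choose_send_mode(off_t size)
{
    if (size <= SMALL_MAX) return SEND_SMALL;
    if (size >= LARGE_MIN) return SEND_LARGE;
    return SEND_MEDIUM;
}

/* Write all of buf to out (blocking descriptors only, in this sketch). */
static int write_all(int out, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(out, buf, len);
        if (n < 0) return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy the file at `path` to `out` using the mode its size selects.
 * Returns the mode used, or -1 on error. */
int send_file(int out, const char *path)
{
    assert(out >= 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    enum send_mode mode = choose_send_mode(st.st_size);
    int rc = 0;
    if (mode == SEND_MEDIUM) {
        /* medium: write() straight from the mapping, no read() copy */
        size_t len = (size_t)st.st_size;
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
            rc = -1;
        } else {
            rc = write_all(out, map, len);
            munmap(map, len);
        }
    } else {
        /* small: one read() covers the whole file; large: the same bounded
         * buffer is reused, so address-space use stays constant */
        static char buf[CHUNK];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            if (write_all(out, buf, (size_t)n) < 0) { rc = -1; break; }
        if (n < 0) rc = -1;
    }
    close(fd);
    return rc < 0 ? -1 : (int)mode;
}
```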

Super-size it: Because Antiweb uses a 64-bit off_t type and lisp's unlimited
precision integers on all systems, Antiweb can serve files of any size. It also
supports download resuming for all three file send modes.
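Resuming a download means honoring an HTTP Range request such as "Range: bytes=5000000000-". In C, the stand-in for lisp's unlimited-precision integers is to keep every offset in a 64-bit type and reject anything that would overflow it. A minimal sketch, assuming only the single-range forms "bytes=START-" and "bytes=START-END" (suffix ranges like "bytes=-500" and multi-range requests are left out); parse_byte_range() is a hypothetical helper, not Antiweb's API.

```c
#include <assert.h>
#include <stdint.h>

/* Parse a run of decimal digits at *p into *out, failing on overflow. */
static int parse_digits(const char **p, int64_t *out)
{
    const char *s = *p;
    int64_t v = 0;
    if (*s < '0' || *s > '9')
        return -1;
    for (; *s >= '0' && *s <= '9'; s++) {
        int d = *s - '0';
        if (v > (INT64_MAX - d) / 10)
            return -1;                      /* would not fit in 64 bits */
        v = v * 10 + d;
    }
    *p = s;
    *out = v;
    return 0;
}

/* Parse "bytes=START-" or "bytes=START-END" for a file of `size` bytes.
 * On success stores the inclusive range in *first and *last and returns 0;
 * returns -1 if the value is malformed or the range is unsatisfiable. */
int parse_byte_range(const char *v, int64_t size, int64_t *first, int64_t *last)
{
    assert(size >= 0);
    for (const char *prefix = "bytes="; *prefix; prefix++, v++)
        if (*v != *prefix)
            return -1;

    int64_t a, b = size - 1;                /* open-ended: up to the last byte */
    if (parse_digits(&v, &a) < 0 || *v++ != '-')
        return -1;
    if (*v != '\0') {
        if (parse_digits(&v, &b) < 0 || *v != '\0')
            return -1;
        if (b >= size)
            b = size - 1;                   /* clamp an END past the file */
    }
    if (a >= size || a > b)
        return -1;                          /* nothing left to send */
    *first = a;
    *last = b;
    return 0;
}
```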

Antiweb's data structures are designed for pipelining. Antiweb uses vectored I/O
(also known as scatter-gather I/O) along with non-blocking I/O nearly
everywhere. Antiweb's internal message passing protocol also uses pipelining.
For example, an HTTP connection that pipelines two requests for small files
followed by one request for a medium file is responded to with a single writev()
system call consisting of the following:

 * The HTTP headers and file contents for the first two small files
 * The HTTP headers for the medium file
 * As much of the memory mapped medium file as it takes to fill the kernel's
   socket buffer.

Subsequently, all the generated log messages are written to the hub process with
another writev(). The hub will eventually append the log messages (as well as
any others that queue up) onto the axslog log file.
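A minimal sketch of such a single-writev() flush, assuming a hypothetical queue of (pointer, length) chunks that may point at header buffers, small-file buffers, or an mmap()ed region. Whatever the kernel does not accept stays queued, with a partially written chunk advanced in place, so the next writable event resumes where this one stopped. The struct chunk type, flush_queue(), and the IOV_MAX_SKETCH cap are illustrative names, not Antiweb's.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define IOV_MAX_SKETCH 64     /* hypothetical cap; the system limit is IOV_MAX */

/* One queued piece of output: header text, file data, or part of a mapping. */
struct chunk { const void *base; size_t len; };

/* Send as much of the queue as the descriptor accepts with one writev().
 * Fully written chunks are removed from the front of the queue; a partially
 * written chunk is advanced in place. Returns bytes written, or -1. */
ssize_t flush_queue(int fd, struct chunk *q, size_t *count)
{
    assert(q != NULL && count != NULL);
    struct iovec iov[IOV_MAX_SKETCH];
    size_t n = *count < IOV_MAX_SKETCH ? *count : IOV_MAX_SKETCH;
    for (size_t i = 0; i < n; i++) {
        iov[i].iov_base = (void *)q[i].base;
        iov[i].iov_len = q[i].len;
    }
    ssize_t w = writev(fd, iov, (int)n);
    if (w < 0)
        return -1;

    size_t left = (size_t)w, done = 0;
    while (done < *count && left >= q[done].len) {
        left -= q[done].len;                /* this chunk went out whole */
        done++;
    }
    if (done < *count) {                    /* advance the partial chunk */
        q[done].base = (const char *)q[done].base + left;
        q[done].len -= left;
    }
    memmove(q, q + done, (*count - done) * sizeof *q);
    *count -= done;
    return w;
}
```

Batching log records for the hub works the same way: queue them as chunks and flush them with one call.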

To see the connection statistics of a worker process, use the -stats command:

# antiweb -stats /var/aw/example.conf
...
Keepalive Time: 65 seconds
Total Connections: 41  HTTP requests: 72  Avg reqs/conn: 1.8
File descriptor usage (estimate): 17/32767
Current Connections: 11
 Keepalives: 7  Sending files to: 2
 Proxy: Sources: 0  Sinks: 0  Idle: 0
 Timers: 0  Hub: 1  Unix Connections: 1
 Lingering: 0  Zombies: 0
...

Notice that in addition to the HTTP traffic, there is also a connection to the
hub's unix socket that was opened on start-up, and one other open unix socket.
That other unix socket is you. You created a supervisor connection while asking
for stats info.

-stats will also tell you how hosts are mapped to directories on a worker:

# antiweb -stats /var/aw/example.conf
...
Host -> HTML root mappings:
 localhost -> /var/www/testing
 example.com -> /var/www/example.com
 www.example.com -> /var/www/example.com
...

Although we usually love it, pipelining is sometimes bad. Antiweb deliberately
tears down persistent HTTP connections on certain responses:

 * 4XX and 5XX HTTP Errors - This is to prevent blind web vulnerability
   scanners like nikto from persisting or pipelining 95+ percent of their
   requests.
 * Directory Listings - To prevent pipelined recursive crawling.
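The teardown rule above amounts to a small decision made after each response. In this sketch keep_connection() and its client_wants_keepalive argument are hypothetical: the latter stands for the usual HTTP/1.0 and HTTP/1.1 Connection-header logic, which is not shown, and the fast-files exception for cached 404s is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to keep an HTTP connection open after a response. */
bool keep_connection(int status, bool is_dir_listing, bool client_wants_keepalive)
{
    assert(status >= 100 && status <= 599);   /* a valid HTTP status code */
    if (status >= 400)
        return false;        /* 4XX/5XX: stop blind scanners from pipelining */
    if (is_dir_listing)
        return false;        /* no pipelined recursive crawling */
    return client_wants_keepalive;
}
```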

When finished with a connection, Antiweb will shut down the write direction of
the socket and linger as required by HTTP/1.1. Antiweb always degrades
gracefully for HTTP/0.9 and HTTP/1.0 clients. Antiweb has first-class IPv6
support. If you really do want to pipeline 4XX and 5XX errors, you have two
options:

 1. Use Antiweb's rewrite module to change problematic requests into requests
   for existing files.
 2. Use Antiweb's fast-files module. This is a memory cache that supports
   accelerated static content, pre-generation of HTTP headers, negative
   caching, and persisting/pipelining 404 errors.
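The lingering close mentioned earlier can be sketched with shutdown() and a bounded read loop. Closing a TCP socket while client data is still unread can make the kernel send an RST, which may discard a response the client has not yet read; half-closing first and draining avoids that. This blocking linger_close() and its timeout parameter are illustrative only; an event-driven server would instead keep such sockets in a lingering state inside its event loop (the -stats output above reports a Lingering count).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Half-close `fd`, discard whatever the client still sends until it closes
 * its side or `timeout_ms` passes with no data, then close.
 * Returns the number of bytes discarded. */
size_t linger_close(int fd, int timeout_ms)
{
    assert(fd >= 0);
    size_t discarded = 0;
    char buf[4096];
    shutdown(fd, SHUT_WR);                /* send FIN: our response is complete */
    for (;;) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        if (poll(&p, 1, timeout_ms) <= 0)
            break;                        /* timed out or poll failed */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;                        /* client closed, or read error */
        discarded += (size_t)n;
    }
    close(fd);
    return discarded;
}
```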

Antiweb was designed with security in mind from the beginning. Here are some of
the security decisions made during the Antiweb design process:

 * Virtual hosts are privilege-separated without proxying. Once the hub has
   determined which worker should handle a connection, it transfers the socket
   to the worker process and has nothing further to do with the connection.
   Worker processes run under different UIDs from the hub (and from each
   other). Workers are optionally chroot()ed.
 * Workers have no access to log files: all log messages are sent to the hub
   over the unix socket. This means that a compromised worker process cannot
   steal previously created log messages or log messages created by other
   workers.
 * CGI processes can be restricted with resource limitations.
 * Even on lisps without Unicode support, Antiweb4 guarantees all internal
   data and filenames are UTF-8 encoded. This includes verifying that all code
   points are in their shortest possible representation and that no surrogate
   code points (which are invalid in UTF-8) appear.
 * Antiweb processes never try to clean up or recover in the event of an
   unexpected condition. A failed process cannot reliably do that; some other
   process that hasn't failed will clean up after it.
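The UTF-8 guarantee in the list above comes down to a strict decoder. A sketch of such a check in C (utf8_valid() is a hypothetical name, not Antiweb's function): it rejects overlong encodings, surrogate code points U+D800..U+DFFF, values above U+10FFFF, stray continuation bytes, and truncated sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s[0..len) is well-formed UTF-8, 0 otherwise. */
int utf8_valid(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;                 /* continuation bytes expected */
        unsigned long cp, min;    /* decoded value, smallest legal value */
        if (b < 0x80) { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;            /* stray continuation byte, or 0xF8..0xFF */
        if (len - i <= n) return 0;                  /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0; /* not a continuation */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return 0;                      /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode */
        i += n + 1;
    }
    return 1;
}
```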

Antiweb also includes an experimental new technology for constructing webpages
called Anti Webpages. These are Perl-inspired programs that let you draw page
layouts with significant whitespace, glue together HTML/CSS/Javascript, and
more.

Antiweb was created for admins, by admins. Please let us know any ways you think
it could be better.


Faré
