Hello, and a happy new year to everyone.
As promised on #lisp, I did some profiling of redisplay, both
incremental and non incremental. The program I wrote displays a
certain number of paragraphs of text, each paragraph has an average of
10 lines of text, each line has an average of 10 words, and each word
has an average of 5.5 characters. Here are the results so far.
Experiment 1: 100 paragraphs. Make three nested loops, each time
doing a format for each word (and the space after it), a terpri after
each line, and another terpri after each paragraph.
seconds | consed | calls | sec/call | name
----------------------------------------------------------
1.788 | 40,813,664 | 18,499 | 0.000097 | STREAM-WRITE-STRING
1.431 | 109,924,368 | 1,085 | 0.001319 | STREAM-WRITE-CHAR
0.416 | 14,113,200 | 18,515 | 0.000022 | CLIM:STREAM-ADD-STRING-OUTPUT
----------------------------------------------------------
3.635 | 164,851,232 | 38,099 | | Total
estimated total profiling overhead: 0.05 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 2: Set drawing-p to nil while the redisplay function is
running and do a replay at the end. This experiment shows that most
of the time (75%) spent in stream-write-string above was spent
actually displaying the string, and not adding output records. It is
perhaps a bit surprising that stream-write-string is faster (by a
factor 40) than stream-write-char. I have no explanation for that at
this point.
seconds | consed | calls | sec/call | name
----------------------------------------------------------
1.107 | 73,825,392 | 1,263 | 0.000877 | STREAM-WRITE-CHAR
0.436 | 15,569,280 | 18,310 | 0.000024 | STREAM-WRITE-STRING
0.344 | 12,530,640 | 18,313 | 0.000019 | CLIM:STREAM-ADD-STRING-OUTPUT
----------------------------------------------------------
1.887 | 101,925,312 | 37,886 | | Total
estimated total profiling overhead: 0.05 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 3: Surround each paragraph and new line output with
`with-new-output-record' (no record type give) so that we get a
structure of nested output records. This experiment shows that there is
a little but not significant difference between inserting new output
records when there are already many of them and when there are few of
them.
seconds | consed | calls | sec/call | name
---------------------------------------------------------
0.823 | 66,124,880 | 1,073 | 0.000767 | STREAM-WRITE-CHAR
0.504 | 14,937,136 | 18,029 | 0.000028 | STREAM-WRITE-STRING
0.312 | 12,665,376 | 18,045 | 0.000017 | CLIM:STREAM-ADD-STRING-OUTPUT
---------------------------------------------------------
1.640 | 93,727,392 | 37,147 | | Total
estimated total profiling overhead: 0.05 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 4: Like experiment 3, except give an explicit type of
standard-tree-output-record. There is no significant difference
compared to experiment 3, though there might have been for many output
records. I guess this shows that tree output records can handle large
outputs without any significant speed penalty.
seconds | consed | calls | sec/call | name
---------------------------------------------------------
0.920 | 65,444,656 | 1,060 | 0.000867 | STREAM-WRITE-CHAR
0.520 | 14,775,504 | 17,995 | 0.000029 | STREAM-WRITE-STRING
0.388 | 12,564,224 | 18,011 | 0.000022 | CLIM:STREAM-ADD-STRING-OUTPUT
---------------------------------------------------------
1.828 | 92,784,384 | 37,066 | | Total
estimated total profiling overhead: 0.05 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 5: Use incremental redisplay, and replace the calls to
with-new-output-record with calls to updating-output. Do not count
initial display of the pane, and instead invoke the redisplay 100
times (by using a command that does nothing). Notice that there are
other panes around, so you can't rely on the `calls' and `sec/call'
figures of redisplay-frame-pane. This experiment shows that
incremental redisplay is doing its job, and costing around 12 ms in
this case.
seconds | consed | calls | sec/call | name
--------------------------------------------------------
1.260 | 34,307,008 | 1,300 | 0.000970 | CLIM:REDISPLAY-FRAME-PANE
0.048 | 802,576 | 1,700 | 0.000028 | CLIM:STREAM-ADD-STRING-OUTPUT
0.030 | 1,068,960 | 1,600 | 0.000019 | STREAM-WRITE-CHAR
0.016 | 73,456 | 100 | 0.000159 | STREAM-WRITE-STRING
--------------------------------------------------------
1.354 | 36,252,000 | 4,700 | | Total
estimated total profiling overhead: 0.01 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 6: Like experiment 5, but do 10 times as many paragraphs
(1000 paragraphs, 10k lines in total). Incremental redisplay seems to
cost 100ms in this case.
seconds | consed | calls | sec/call | name
---------------------------------------------------------
10.429 | 235,221,216 | 1,300 | 0.008023 | CLIM:REDISPLAY-FRAME-PANE
0.027 | 1,532,000 | 1,700 | 0.000016 | CLIM:STREAM-ADD-STRING-OUTPUT
0.026 | 196,448 | 1,600 | 0.000016 | STREAM-WRITE-CHAR
0.008 | 530,816 | 100 | 0.000079 | STREAM-WRITE-STRING
---------------------------------------------------------
10.490 | 237,480,480 | 4,700 | | Total
estimated total profiling overhead: 0.01 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 7: Like experiment 6, but instead of a command that does
nothing, invoke a command that modifies the first word of the first
line of the first paragraph (and changes the output records
accordingly). There are about twice as many calls to
stream-write-string and stream-add-string-output as I had expected.
It should have to redraw around 10 strings (one line) in each call, so
I had expected 10000 calls. Still, this experiment shows that
incremental redisplay itself is working, but it is a bit slow, around
100ms per call, even though almost all output records are the same and
in the same place.
seconds | consed | calls | sec/call | name
----------------------------------------------------------
12.228 | 269,291,632 | 1,300 | 0.009407 | CLIM:REDISPLAY-FRAME-PANE
0.601 | 17,087,472 | 20,400 | 0.000029 | STREAM-WRITE-STRING
0.375 | 12,453,424 | 20,400 | 0.000018 | CLIM:STREAM-ADD-STRING-OUTPUT
0.031 | 86,624 | 1,200 | 0.000026 | STREAM-WRITE-CHAR
----------------------------------------------------------
13.236 | 298,919,152 | 43,300 | | Total
estimated total profiling overhead: 0.06 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 8: Like experiment 7, but instead of adding a word to the
first line, either add a new line at the beginning or delete the first
line, so that all top-level output records will be in the wrong
place. There is a noticeable delay between screen updates in this
experiment (around a second).
seconds | consed | calls | sec/call | name
------------------------------------------------------------
90.237 | 2,637,312,912 | 1,300 | 0.069413 | CLIM:REDISPLAY-FRAME-PANE
0.723 | 20,536,496 | 25,210 | 0.000029 | STREAM-WRITE-STRING
0.433 | 17,614,384 | 25,210 | 0.000017 | CLIM:STREAM-ADD-STRING-OUTPUT
0.039 | 57,472 | 1,376 | 0.000029 | STREAM-WRITE-CHAR
------------------------------------------------------------
91.432 | 2,675,521,264 | 53,096 | | Total
estimated total profiling overhead: 0.07 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 9: Similar to experiment 8. Use incremental redisplay. Do
not count initial display of the pane. Invoke the redisplay 100 times
by using a command that either adds a new line at the beginning or
deletes the first line, so that all top-level output records will be
in the wrong place. This experiment shows that at least 80% of the
time spent in redisplay-frame-pane is spent in invoke-updating-output,
and that more than 80% of the time spent in invoke-updating-output is
spent in move-output-record. The reason for that is twofold. First,
the spec says that the position (and start-cursor-position and
end-cursor-position) of an output record is "absolute", i.e. relative
to the stream (as opposed to being relative to the parent output
record). Second, the spec was interpreted very literally by the
McCLIM implementors to mean that the absolute position had to be stored
in the output record. Thus, moving a top-level output record involves
recursively moving each child. In our case, this means that when a
paragraph is moved, each line is moved as well, resulting in at least
10 times as much work as necessary. A better idea would be to store
the position relative to the parent, to sum them up when the position
is asked for, and to only modify the position of the root record, when
it is to be moved.
seconds | consed | calls | sec/call | name
-------------------------------------------------------------
64.196 | 1,425,446,000 | 99,900 | 0.000643 | CLIM-INTERNALS::MOVE-OUTPUT-RECORD
9.620 | 162,623,840 | 101,482 | 0.000095 | CLIM:INVOKE-UPDATING-OUTPUT
0.852 | 1,442,704 | 104,846 | 0.000008 | CLIM:ADD-OUTPUT-RECORD
-------------------------------------------------------------
74.669 | 1,589,512,544 | 306,228 | | Total
estimated total profiling overhead: 0.40 seconds
overhead estimation parameters:
8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
--
Robert Strandh
---------------------------------------------------------------------
Greenspun's Tenth Rule of Programming: any sufficiently complicated C
or Fortran program contains an ad hoc informally-specified bug-ridden
slow implementation of half of Common Lisp.
---------------------------------------------------------------------