Hello, and a happy new year to everyone.
As promised on #lisp, I did some profiling of redisplay, both incremental and non incremental. The program I wrote displays a certain number of paragraphs of text, each paragraph has an average of 10 lines of text, each line has an average of 10 words, and each word has an average of 5.5 characters. Here are the results so far.
Experiment 1: 100 paragraphs. Make three nested loops, each time doing a format for each word (and the space after it), a terpri after each line, and another terpri after each paragraph.
seconds | consed | calls | sec/call | name ---------------------------------------------------------- 1.788 | 40,813,664 | 18,499 | 0.000097 | STREAM-WRITE-STRING 1.431 | 109,924,368 | 1,085 | 0.001319 | STREAM-WRITE-CHAR 0.416 | 14,113,200 | 18,515 | 0.000022 | CLIM:STREAM-ADD-STRING-OUTPUT ---------------------------------------------------------- 3.635 | 164,851,232 | 38,099 | | Total
estimated total profiling overhead: 0.05 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 2: Set drawing-p to nil while the redisplay function is running and do a replay at the end. This experiment shows that most of the time (75%) spent in stream-write-string above was spent actually displaying the string, and not adding output records. It is perhaps a bit surprising that stream-write-string is faster (by a factor 40) than stream-write-char. I have no explanation for that at this point.
seconds | consed | calls | sec/call | name ---------------------------------------------------------- 1.107 | 73,825,392 | 1,263 | 0.000877 | STREAM-WRITE-CHAR 0.436 | 15,569,280 | 18,310 | 0.000024 | STREAM-WRITE-STRING 0.344 | 12,530,640 | 18,313 | 0.000019 | CLIM:STREAM-ADD-STRING-OUTPUT ---------------------------------------------------------- 1.887 | 101,925,312 | 37,886 | | Total
estimated total profiling overhead: 0.05 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 3: Surround each paragraph and new line output with `with-new-output-record' (no record type give) so that we get a structure of nested output records. This experiment shows that there is a little but not significant difference between inserting new output records when there are already many of them and when there are few of them.
seconds | consed | calls | sec/call | name --------------------------------------------------------- 0.823 | 66,124,880 | 1,073 | 0.000767 | STREAM-WRITE-CHAR 0.504 | 14,937,136 | 18,029 | 0.000028 | STREAM-WRITE-STRING 0.312 | 12,665,376 | 18,045 | 0.000017 | CLIM:STREAM-ADD-STRING-OUTPUT --------------------------------------------------------- 1.640 | 93,727,392 | 37,147 | | Total
estimated total profiling overhead: 0.05 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 4: Like experiment 3, except give an explicit type of standard-tree-output-record. There is no significant difference compared to experiment 3, though there might have been for many output records. I guess this shows that tree output records can handle large outputs without any significant speed penalty.
seconds | consed | calls | sec/call | name --------------------------------------------------------- 0.920 | 65,444,656 | 1,060 | 0.000867 | STREAM-WRITE-CHAR 0.520 | 14,775,504 | 17,995 | 0.000029 | STREAM-WRITE-STRING 0.388 | 12,564,224 | 18,011 | 0.000022 | CLIM:STREAM-ADD-STRING-OUTPUT --------------------------------------------------------- 1.828 | 92,784,384 | 37,066 | | Total
estimated total profiling overhead: 0.05 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 5: Use incremental redisplay, and replace the calls to with-new-output-record with calls to updating-output. Do not count initial display of the pane, and instead invoke the redisplay 100 times (by using a command that does nothing). Notice that there are other panes around, so you can't rely on the `calls' and `sec/call' figures of redisplay-frame-pane. This experiment shows that incremental redisplay is doing its job, and costing around 12 ms in this case.
seconds | consed | calls | sec/call | name -------------------------------------------------------- 1.260 | 34,307,008 | 1,300 | 0.000970 | CLIM:REDISPLAY-FRAME-PANE 0.048 | 802,576 | 1,700 | 0.000028 | CLIM:STREAM-ADD-STRING-OUTPUT 0.030 | 1,068,960 | 1,600 | 0.000019 | STREAM-WRITE-CHAR 0.016 | 73,456 | 100 | 0.000159 | STREAM-WRITE-STRING -------------------------------------------------------- 1.354 | 36,252,000 | 4,700 | | Total
estimated total profiling overhead: 0.01 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 6: Like experiment 5, but do 10 times as many paragraphs (1000 paragraphs, 10k lines in total). Incremental redisplay seems to cost 100ms in this case.
seconds | consed | calls | sec/call | name --------------------------------------------------------- 10.429 | 235,221,216 | 1,300 | 0.008023 | CLIM:REDISPLAY-FRAME-PANE 0.027 | 1,532,000 | 1,700 | 0.000016 | CLIM:STREAM-ADD-STRING-OUTPUT 0.026 | 196,448 | 1,600 | 0.000016 | STREAM-WRITE-CHAR 0.008 | 530,816 | 100 | 0.000079 | STREAM-WRITE-STRING --------------------------------------------------------- 10.490 | 237,480,480 | 4,700 | | Total
estimated total profiling overhead: 0.01 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 7: Like experiment 6, but instead of a command that does nothing, invoke a command that modifies the first word of the first line of the first paragraph (and changes the output records accordingly). There are about twice as many calls to stream-write-string and stream-add-string-output as I had expected. It should have to redraw around 10 strings (one line) in each call, so I had expected 10000 calls. Still, this experiment shows that incremental redisplay itself is working, but it is a bit slow, around 100ms per call, even though almost all output records are the same and in the same place.
seconds | consed | calls | sec/call | name ---------------------------------------------------------- 12.228 | 269,291,632 | 1,300 | 0.009407 | CLIM:REDISPLAY-FRAME-PANE 0.601 | 17,087,472 | 20,400 | 0.000029 | STREAM-WRITE-STRING 0.375 | 12,453,424 | 20,400 | 0.000018 | CLIM:STREAM-ADD-STRING-OUTPUT 0.031 | 86,624 | 1,200 | 0.000026 | STREAM-WRITE-CHAR ---------------------------------------------------------- 13.236 | 298,919,152 | 43,300 | | Total
estimated total profiling overhead: 0.06 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 8: Like experiment 7, but instead of adding a word to the first line, either add a new line at the beginning or delete the first line, so that all top-level output records will be in the wrong place. There is a noticeable delay between screen updates in this experiment (around a second).
seconds | consed | calls | sec/call | name ------------------------------------------------------------ 90.237 | 2,637,312,912 | 1,300 | 0.069413 | CLIM:REDISPLAY-FRAME-PANE 0.723 | 20,536,496 | 25,210 | 0.000029 | STREAM-WRITE-STRING 0.433 | 17,614,384 | 25,210 | 0.000017 | CLIM:STREAM-ADD-STRING-OUTPUT 0.039 | 57,472 | 1,376 | 0.000029 | STREAM-WRITE-CHAR ------------------------------------------------------------ 91.432 | 2,675,521,264 | 53,096 | | Total
estimated total profiling overhead: 0.07 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling
Experiment 9: Similar to experiment 8. Use incremental redisplay. Do not count initial display of the pane. Invoke the redisplay 100 times by using a command that either adds a new line at the beginning or deletes the first line, so that all top-level output records will be in the wrong place. This experiment shows that at least 80% of the time spent in redisplay-frame-pane is spent in invoke-updating-output, and that more than 80% of the time spent in invoke-updating-output is spent in move-output-record. The reason for that is twofold. First, the spec says that the position (and start-cursor-position and end-cursor-position) of an output record is "absolute", i.e. relative to the stream (as opposed to being relative to the parent output record). Second, the spec was interpreted very literally by the McCLIM implementors to mean that the absolute position had to be stored in the output record. Thus, moving a top-level output record involves recursively moving each child. In our case, this means that when a paragraph is moved, each line is moved as well, resulting in at least 10 times as much work as necessary. A better idea would be to store the position relative to the parent, to sum them up when the position is asked for, and to only modify the position of the root record, when it is to be moved.
seconds | consed | calls | sec/call | name ------------------------------------------------------------- 64.196 | 1,425,446,000 | 99,900 | 0.000643 | CLIM-INTERNALS::MOVE-OUTPUT-RECORD 9.620 | 162,623,840 | 101,482 | 0.000095 | CLIM:INVOKE-UPDATING-OUTPUT 0.852 | 1,442,704 | 104,846 | 0.000008 | CLIM:ADD-OUTPUT-RECORD ------------------------------------------------------------- 74.669 | 1,589,512,544 | 306,228 | | Total
estimated total profiling overhead: 0.40 seconds overhead estimation parameters: 8.000001e-9s/call, 1.312e-6s total profiling, 4.56e-7s internal profiling