I took a text file of about 450MB. Using SBCL, when I read it like this:
(defun do-test2 ()
  (with-open-file (stream *text-file*)
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size :element-type 'character)
             for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters))))))
It took an average of 1.08125s to read (4 trials).
This procedure:
(defun do-test3 ()
  (with-open-file (stream *text-file* :element-type '(unsigned-byte 8))
    (let ((buffer-size (* 16 1024 1024))) ; 16M
      (time
       (loop with buffer = (make-array buffer-size :element-type '(unsigned-byte 8))
             for n-characters = (read-sequence buffer stream)
             while (< 0 n-characters))))))
It took an average of 0.07s.
With this modified to set the :external-format to :iso8859-1 and to read into an array of :element-type 'character, it took an average of 0.8095s.
So there seems to be some overhead in the Unicode handling. Note that I didn't have a file at hand that actually had ISO-8859-1 (non-ASCII) content in it, so I don't know whether that would have complicated matters.
This suggests that just moving bits around without worrying about their interpretation may be faster than treating them as characters. So you could try reading bytes and see whether that changes your results at all.
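For reference, ISO-8859-1 is about as simple as a character encoding gets: each byte value 0 to 255 maps to the Unicode code point with the same number, so decoding is just widening each byte to a char. A minimal Java sketch (the class and method names are hypothetical) showing that a hand-rolled widening loop gives the same result as the JDK's ISO_8859_1 decoder:

```java
import java.nio.charset.StandardCharsets;

public class Latin1Decode {
    // ISO-8859-1 maps every byte value 0..255 to the Unicode code point
    // with the same number, so decoding is a plain widening copy.
    static char[] decodeLatin1(byte[] bytes, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) {
            out[i] = (char) (bytes[i] & 0xFF); // mask so the byte is treated as unsigned
        }
        return out;
    }

    public static void main(String[] args) {
        // Every possible byte value, 0..255.
        byte[] sample = new byte[256];
        for (int i = 0; i < 256; i++) {
            sample[i] = (byte) i;
        }
        String manual = new String(decodeLatin1(sample, sample.length));
        String library = new String(sample, StandardCharsets.ISO_8859_1);
        System.out.println(manual.equals(library)); // prints "true"
    }
}
```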
I'm not a real expert in CL file I/O, so it's likely that this could be done better.
On 21 Oct 2022, at 16:18, Garrett Dangerfield wrote:
I tried changing (make-array buffer-size :element-type 'character) to (make-array buffer-size :element-type 'byte) and I got additional warnings and it took 70 seconds instead of 20.
Thanks,
Garrett.
On Fri, Oct 21, 2022 at 1:47 PM Robert Goldman <rpgoldman@sift.net> wrote:
I don't know what data you are reading, but is there any chance that the difference is that when you read text in Lisp as ISO-8859-1, Lisp is actually processing the text as Unicode, whereas when you read it in Java you are just slamming raw bytes into memory?
Maybe this is relevant? https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
I don't use Java myself, so I can't say, and I don't have access to your data, but it does seem like the Java code is doing something simpler than the Lisp code.
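This can be tested from the Java side: instead of reading raw bytes, make Java decode the file into characters, which is closer to what the Lisp code does. A minimal sketch, assuming an ISO-8859-1 file whose path is passed as the first command-line argument; the class and method names are hypothetical, and the 16M buffer only mirrors the thread:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharReadBenchmark {
    private static final int BUFFER_SIZE = 16 * 1024 * 1024; // 16M, as in the thread

    // Reads the whole file as raw bytes (what the original Java code does).
    static long readBytes(String path) throws IOException {
        byte[] buff = new byte[BUFFER_SIZE];
        long total = 0;
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buff)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Reads the whole file decoded as ISO-8859-1 chars (closer to the Lisp code).
    static long readChars(String path) throws IOException {
        char[] buff = new char[BUFFER_SIZE];
        long total = 0;
        try (Reader in = new InputStreamReader(new FileInputStream(path), StandardCharsets.ISO_8859_1)) {
            int n;
            while ((n = in.read(buff)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CharReadBenchmark FILE");
            return;
        }
        String path = args[0];
        long t0 = System.nanoTime();
        long nBytes = readBytes(path);
        long t1 = System.nanoTime();
        long nChars = readChars(path);
        long t2 = System.nanoTime();
        System.out.printf("bytes: %d in %.3f s%n", nBytes, (t1 - t0) / 1e9);
        System.out.printf("chars: %d in %.3f s%n", nChars, (t2 - t1) / 1e9);
    }
}
```

After compiling, run something like `java CharReadBenchmark great_expectations.iso`. The byte pass runs first and may warm the OS page cache for the character pass, so it is worth repeating runs or swapping the order before comparing the times.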
What happens if you change your Lisp code to read-sequence of type byte instead of character?

On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
I don't want to cause a firestorm here, but I was doing some simple
benchmarks on file I/O between Java, ABCL, and SBCL and I'm a bit shocked,
honestly.

Reading a 2.5M file in 16M chunks (using iso-8859-1):
- abcl takes a tad over 1 second
- sbcl takes 0.04 seconds

Reading a 5.8G file in 16M chunks (using iso-8859-1 for Lisp; for Java
it's just bytes):
- abcl takes... too long, I gave up
- sbcl takes between 20 and 21 seconds
- Java takes 1.5 seconds

These are all run on the same computer using the same files, etc.
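Converting the 5.8G figures into throughput (in the thread's own units, taking 20.5 s as the midpoint of the SBCL range) makes the size of the gap explicit; over the full 20 to 21 s range the ratio is roughly 13.3 to 14:

```latex
\text{SBCL: } \frac{5.8\,\text{G}}{20.5\,\text{s}} \approx 0.28\,\text{G/s}, \qquad
\text{Java: } \frac{5.8\,\text{G}}{1.5\,\text{s}} \approx 3.9\,\text{G/s}, \qquad
\frac{20.5\,\text{s}}{1.5\,\text{s}} \approx 13.7
```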
What's up with this? Thoughts? I'd heard that SBCL should be as fast as C
under at least some circumstances. I'd wager that C is at least as fast as
Java (probably faster).

Thanks,
Garrett Dangerfield. (he/him/his)

P.S. Don't get me wrong, I *LOVE* Lisp, I'm trying to get away from Java as
fast as I can (the syntax is killing me slowly). I've used ABCL in
projects before (it was wonderful, Java doesn't handle XML well).

Lisp code:
(with-open-file (stream "/media/danger/OS/temp/jars.txt"
                        :external-format :iso-8859-1) ; great_expectations.iso
  (let ((size (file-length stream))
        (buffer-size (* 16 1024 1024))) ; 16M
    (time
     (loop with buffer = (make-array buffer-size :element-type 'character)
           for n-characters = (read-sequence buffer stream)
           while (< 0 n-characters)))))

Java code:
private static final int BUFFER_SIZE = 16 * 1024 * 1024;

try (InputStream in = new FileInputStream("/media/danger/OS/temp/great_expectations.iso")) {
    byte[] buff = new byte[BUFFER_SIZE];
    int chunkLen = -1;
    long start = System.currentTimeMillis();
    while ((chunkLen = in.read(buff)) != -1) {
        System.out.println("chunkLen = " + chunkLen);
    }
    double duration = System.currentTimeMillis() - start;
    duration /= 1000;
    System.out.println(String.format("it took %.2f secs", duration));
} catch (Exception e) {
    e.printStackTrace(System.out);
} finally {
    System.out.println("Done.");
}

Robert P. Goldman
Research Fellow
Smart Information Flow Technologies (d/b/a SIFT, LLC)
319 N. First Ave., Suite 400
Minneapolis, MN 55401
Voice: (612) 326-3934
Email: rpgoldman@SIFT.net