I don't know what data you are reading, but is there any chance the difference is that when you read the text in Lisp as ISO-8859-1, Lisp is actually decoding it into Unicode characters, whereas in Java you are just slamming raw bytes into memory?
Maybe this is relevant? https://stackoverflow.com/questions/979932/read-unicode-text-files-with-java
I don't use Java myself, so I can't say, and I don't have access to your data, but it does seem like the Java code is doing something simpler than the Lisp code.
What happens if you change your Lisp code to open the stream and `read-sequence` into a buffer with element type `(unsigned-byte 8)` instead of `character`?
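Something along these lines is what I have in mind. It is only a sketch that I have not run against your data: `count-bytes-in-chunks` is a name I made up, and I am assuming that opening the file with `:element-type '(unsigned-byte 8)` is enough to skip character decoding in both SBCL and ABCL.

```lisp
;; Sketch only: the same chunked loop as in the original message, but the
;; stream and the buffer both hold raw octets, so no decoding to
;; characters should be involved.
;; COUNT-BYTES-IN-CHUNKS is a hypothetical helper name.
(defun count-bytes-in-chunks (path &key (buffer-size (* 16 1024 1024)))
  "Read PATH in BUFFER-SIZE chunks of octets and return the total number read."
  (with-open-file (stream path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8))))
      ;; READ-SEQUENCE returns the index of the first element it did not
      ;; fill, which is the number of octets read since we start at 0.
      (loop for n-bytes = (read-sequence buffer stream)
            while (plusp n-bytes)
            sum n-bytes))))
```

You could then time it the same way as before, e.g. `(time (count-bytes-in-chunks "/media/danger/OS/temp/great_expectations.iso"))`, and compare the result against the `character` version.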
On 21 Oct 2022, at 13:43, Garrett Dangerfield wrote:
I don't want to cause a firestorm here but I was doing some simple benchmarks on file I/O between Java, ABCL, and SBCL and I'm a bit shocked, honestly.
Reading a 2.5M file in 16M chunks (using iso-8859-1):
- abcl takes a tad over 1 second
- sbcl takes 0.04 seconds
Reading a 5.8G file in 16M chunks (using iso-8859-1 for Lisp, for Java it's just bytes):
- abcl takes...too long, I gave up
- sbcl takes between 20 and 21 seconds
- Java takes 1.5 seconds
These are all run on the same computer using the same files, etc.
What's up with this? Thoughts? I'd heard that SBCL should be as fast as C under at least some circumstances. I'd wager that C is at least as fast as Java (probably faster).
Thanks, Garrett Dangerfield. (he/him/his)
P.S. Don't get me wrong, I *LOVE* Lisp, I'm trying to get away from Java as fast as I can (the syntax is killing me slowly). I've used ABCL in projects before (it was wonderful, Java doesn't handle XML well).
Lisp code:

(with-open-file (stream "/media/danger/OS/temp/jars.txt" :external-format :iso-8859-1) ; great_expectations.iso
  (let ((size (file-length stream))
        (buffer-size (* 16 1024 1024))) ; 16M
    (time
     (loop with buffer = (make-array buffer-size :element-type 'character)
           for n-characters = (read-sequence buffer stream)
           while (< 0 n-characters)))))
Java code:

private static final int BUFFER_SIZE = 16 * 1024 * 1024;

try (InputStream in = new FileInputStream("/media/danger/OS/temp/great_expectations.iso")) {
    byte[] buff = new byte[BUFFER_SIZE];
    int chunkLen = -1;
    long start = System.currentTimeMillis();
    while ((chunkLen = in.read(buff)) != -1) {
        System.out.println("chunkLen = " + chunkLen);
    }
    double duration = System.currentTimeMillis() - start;
    duration /= 1000;
    System.out.println(String.format("it took %,2f secs", duration));
} catch (Exception e) {
    e.printStackTrace(System.out);
} finally {
    System.out.println("Done.");
}
Robert P. Goldman Research Fellow Smart Information Flow Technologies (d/b/a SIFT, LLC)
319 N. First Ave., Suite 400 Minneapolis, MN 55401
Voice: (612) 326-3934 Email: rpgoldman@SIFT.net