I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte buffer. For example we want to write a long string to a byte stream using a fixed size byte buffer. STRING-TO-OCTETS seems to return the buffer and the number of bytes written. E.g.

  (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
        (string (make-string 100 :initial-element #\a)))
    (stream:string-to-octets string :external-format :utf32 :buffer buffer))

Returns a new vector (non-eq to buffer) and 404.

Shouldn't one return value also indicate how many characters were converted so that the conversion can be continued at that character offset without allocating a fresh buffer?

Helmut
On 10/29/11 4:18 AM, Helmut Eller wrote:
> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte buffer. For example we want to write a long string to a byte stream using a fixed size byte buffer. STRING-TO-OCTETS seems to return the buffer and the number of bytes written. E.g.
>
>   (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>         (string (make-string 100 :initial-element #\a)))
>     (stream:string-to-octets string :external-format :utf32 :buffer buffer))
>
> Returns a new vector (non-eq to buffer) and 404.
>
> Shouldn't one return value also indicate how many characters were converted so that the conversion can be continued at that character offset without allocating a fresh buffer?
Good question. It seems that string-to-octets grew the buffer: the new buffer has all of the converted characters, and the original contains just the portion that fit.

I'll have to look through the history to see why it is this way, but I think you're right. If a buffer is supplied, string-to-octets shouldn't grow the buffer; it should stop when the buffer is full. Although, I can see why it needs to: the code doesn't know how many octets are needed for a character until the character is converted, and by then we may have exceeded the buffer, so the buffer gets a partially converted character.

If you were going the other way (octets-to-string), there's octets-to-string-counted that tells you how many octets were consumed for each character. This was needed to make streams fast because an input buffer may not have a full character at the end of the buffer.

Ray
On 10/29/11 4:18 AM, Helmut Eller wrote:
> I'm wondering how STRING-TO-OCTETS can be used with a fixed sized byte buffer. For example we want to write a long string to a byte stream using a fixed size byte buffer. STRING-TO-OCTETS seems to return the buffer and the number of bytes written. E.g.
>
>   (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
>         (string (make-string 100 :initial-element #\a)))
>     (stream:string-to-octets string :external-format :utf32 :buffer buffer))
Is this better:

  (let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
        (string (make-string 100 :initial-element #\u+3b2)))
    (multiple-value-bind (b p i last)
        (stream:string-to-octets string :external-format :utf8 :buffer buffer)
      (values b p i last)))

  #(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206)
  19
  9
  18

9 is the number of characters converted, 18 is the index+1 in the buffer where the last valid octet was placed. The last octet is the first octet of the 2-octet utf8 encoding for #\u+3b2.

If no buffer is specified, the entire string is converted and the buffer is returned along with the number of octets generated. (Kind of redundant now.)

Ray
* Raymond Toy [2011-10-30 03:28] writes:
> Is this better:
>
>   (let ((buffer (make-array 19 :element-type '(unsigned-byte 8)))
>         (string (make-string 100 :initial-element #\u+3b2)))
>     (multiple-value-bind (b p i last)
>         (stream:string-to-octets string :external-format :utf8 :buffer buffer)
>       (values b p i last)))
>
>   #(206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206 178 206)
>   19
>   9
>   18
>
> 9 is the number of characters converted, 18 is the index+1 in the buffer where the last valid octet was placed. The last octet is the first octet of the 2-octet utf8 encoding for #\u+3b2.
Is 19 the number of octets written? Or is it an index? Might be nice to either use counts or indexes consistently. I'm not sure that I understand the purpose of the 18. Is this something that is needed later?
> If no buffer is specified, the entire string is converted and the buffer is returned along with the number of octets generated. (Kind of redundant now.)

Returning the buffer is also redundant in the case where buffer is specified. We can return only 3 values in registers; would it be worthwhile to make the two cases asymmetric? In one case return 2 (or 3) indexes and in the other case just the buffer.

Helmut
On 10/30/11 12:04 AM, Helmut Eller wrote:
> * Raymond Toy [2011-10-30 03:28] writes:
>> 9 is the number of characters converted, 18 is the index+1 in the buffer where the last valid octet was placed. The last octet is the first octet of the 2-octet utf8 encoding for #\u+3b2.
>
> Is 19 the number of octets written? Or is it an index? Might be nice to either use counts or indexes consistently.
>
> I'm not sure that I understand the purpose of the 18. Is this something that is needed later?
Yeah, that was kind of messy. This is what I currently have:

  * (let ((buffer (make-array 20 :element-type '(unsigned-byte 8)))
          (string (make-string 100 :initial-element #\u+f012)))
      (stream:string-to-octets string :external-format :utf8 :buffer buffer))

  #(239 128 146 239 128 146 239 128 146 239 128 146 239 128 146 239 128 146 239 128)
  18
  6

So 18 is the number of valid octets actually written. (The last two octets form an incomplete conversion.) The 6 is the number of characters consumed to produce those 18 octets.

For the case where no buffer is specified, a new buffer is created and the second return value is the number of octets written (same as the buffer length), and the third value is the number of characters (length of the string).

Is that better?

Ray
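The contract described above (buffer, count of valid octets, count of characters consumed) is what makes a fixed-size buffer reusable across a long string. The following is only a minimal portable sketch of that idea, not CMUCL's code: `utf8-length`, `encode-utf8-into` and `write-string-in-chunks` are hypothetical names invented for the illustration, the `:start` keyword is an assumption, and the encoder treats each Lisp character as one code point (no surrogate-pair handling). Unlike the behaviour shown in the thread, it stops before a character that does not fit instead of writing a partial one, but the two counts it reports are the same.

```lisp
;; Hypothetical stand-in for the three-value contract; NOT stream:string-to-octets.
(defun utf8-length (code)
  "Number of octets UTF-8 needs for the code point CODE."
  (cond ((< code #x80) 1)
        ((< code #x800) 2)
        ((< code #x10000) 3)
        (t 4)))

(defun encode-utf8-into (string buffer &key (start 0))
  "Encode STRING from character index START into BUFFER.
Stops before any character that would not fit completely.
Returns BUFFER, the number of valid octets written, and the
number of characters consumed."
  (let ((pos 0)
        (consumed 0))
    (loop for i from start below (length string)
          for code = (char-code (char string i))
          for n = (utf8-length code)
          while (<= (+ pos n) (length buffer))
          do (if (= n 1)
                 (setf (aref buffer pos) code)
                 (progn
                   ;; Lead octet: length marker plus the top bits.
                   (setf (aref buffer pos)
                         (logior (ecase n (2 #xC0) (3 #xE0) (4 #xF0))
                                 (ash code (* -6 (1- n)))))
                   ;; Continuation octets: 10xxxxxx, six bits each.
                   (loop for k from 1 below n
                         do (setf (aref buffer (+ pos k))
                                  (logior #x80
                                          (ldb (byte 6 (* 6 (- n k 1)))
                                               code))))))
             (incf pos n)
             (incf consumed))
    (values buffer pos consumed)))

(defun write-string-in-chunks (string buffer-size emit)
  "Encode STRING through one buffer of BUFFER-SIZE octets, calling EMIT
with the buffer and the number of valid octets for every chunk."
  (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
        (start 0))
    (loop while (< start (length string))
          do (multiple-value-bind (buf octets chars)
                 (encode-utf8-into string buffer :start start)
               (when (zerop chars)
                 (error "A buffer of ~D octets cannot hold one character."
                        buffer-size))
               (funcall emit buf octets)
               ;; Resume at the first character that was not converted.
               (incf start chars)))))
```

With a real output stream, EMIT could be something like `(lambda (buf n) (write-sequence buf stream :end n))`. The third return value is what lets the loop advance without allocating a second buffer. The sketch assumes a Lisp whose characters cover the code points being encoded.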
* Raymond Toy [2011-10-30 16:30] writes:
> Is that better?

Yes, looks good.

Helmut
* Helmut Eller [2011-10-30 18:28] writes:
> Yes, looks good.

BTW, it would also be useful to have a parameter to specify a start position in buffer. With that it would be possible to buffer up multiple small chunks.

Helmut
On 10/30/11 3:40 PM, Helmut Eller wrote:
>>> Is that better?
>> Yes, looks good.
> BTW, it would also be useful to have a parameter to specify a start position in buffer. With that it would be possible to buffer up multiple small chunks.

That could be solved by passing in displaced arrays. But you can't, because string-to-octets wants a simple-array. That should probably be changed.

I'll look into adding a :buffer-start parameter.

Ray
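To make the "buffer up multiple small chunks" idea concrete, here is a rough sketch of how a start position in the buffer could be used. Everything in it is an assumption for illustration: `encode-latin1-into` and `pack-chunks` are made-up names, Latin-1 was chosen only to keep the encoder to a few lines, and the `:buffer-start` keyword here merely mirrors the name proposed above rather than any actual CMUCL signature.

```lisp
;; Hypothetical helper; it only mimics a :buffer-start style interface.
(defun encode-latin1-into (string buffer &key (buffer-start 0))
  "Copy the character codes of STRING into BUFFER from BUFFER-START on,
stopping when BUFFER is full.  Returns the index one past the last octet
written and the number of characters consumed."
  (let ((n (min (length string) (- (length buffer) buffer-start))))
    (dotimes (i n)
      (let ((code (char-code (char string i))))
        (assert (< code 256) () "Not a Latin-1 character: ~S" (char string i))
        (setf (aref buffer (+ buffer-start i)) code)))
    (values (+ buffer-start n) n)))

(defun pack-chunks (chunks buffer)
  "Append each string in CHUNKS to BUFFER and return the fill index."
  (let ((fill 0))
    (dolist (chunk chunks fill)
      ;; Each chunk starts where the previous one ended.
      (setf fill (encode-latin1-into chunk buffer :buffer-start fill)))))
```

For example, `(pack-chunks '("GET " "/x" " HTTP/1.0") buffer)` with a 20-octet buffer fills the first 15 octets and returns 15, so the caller can flush once instead of once per chunk. A displaced array would give the same effect without a new keyword, provided the encoder accepted non-simple arrays.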
Participants (2): Helmut Eller, Raymond Toy