Previous discussion: https://bugs.ruby-lang.org/issues/20394
Context
When trying to write pure Ruby gems that are competitive in terms of performance with C extensions, a very common bottleneck is the parsing of text-based protocols and formats, such as the Redis RESP protocol, or even the PDF format (FYI @gettalong).
As a result, the most efficient way to parse integers out of a string in Ruby is currently to reimplement atoi using String#getbyte, which is a bit ridiculous.
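For reference, the getbyte-based approach looks roughly like this. It's only a minimal sketch: `parse_uint` is a hypothetical helper name (not an existing API), and it only handles unsigned ASCII decimal digits.

```ruby
# Hypothetical helper: hand-rolled atoi on top of String#getbyte.
# Returns [value, next_offset], or nil if no digit is found at `offset`.
def parse_uint(string, offset = 0)
  value = 0
  pos = offset
  # 48..57 are the byte values of "0".."9"; getbyte returns nil past the end.
  while (byte = string.getbyte(pos)) && byte >= 48 && byte <= 57
    value = value * 10 + (byte - 48)
    pos += 1
  end
  return nil if pos == offset # no digits consumed

  [value, pos]
end

# Example with a RESP-style integer reply, skipping the leading ":".
value, next_offset = parse_uint(":1234\r\n", 1)
```

No intermediate String is allocated, but every parser ends up carrying its own copy of this loop.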
Otherwise, if you create a substring with String#slice or StringScanner#scan and then call to_i or Integer, instantiating the substring and copying the bytes really tanks the performance.
Proposal
Given that StringScanner is a default gem, is often involved in string parsing, and already acts as a "pointer into a String", I think it's well positioned to offer an efficient way to parse an Integer without instantiating a useless temporary string.
Basically an optimized way to do scanner.scan(/\d+/).to_i.
The API could be any of:
scanner.scan(/\d+/, :to_i)
scanner.scan(/\d+/, Integer)
scanner.scan_integer(/\d+/)
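To make the intended semantics concrete, here is a rough pure-Ruby sketch of what any of these could do. `scan_integer_sketch` is a hypothetical name used only for illustration, and it assumes the method returns nil and leaves the position untouched when nothing matches. The real thing would live in C inside strscan; this version only avoids the temporary String by combining StringScanner#skip with String#getbyte.

```ruby
require "strscan"

# Hypothetical reference behaviour, not an existing strscan API.
# Consumes ASCII digits at the current position and returns them as an Integer,
# or returns nil (without moving the scanner) if there are none.
def scan_integer_sketch(scanner)
  start = scanner.pos
  length = scanner.skip(/\d+/) # advances the scanner, returns the match length or nil
  return nil unless length

  string = scanner.string
  value = 0
  start.upto(start + length - 1) do |i|
    value = value * 10 + (string.getbyte(i) - 48) # 48 is the byte value of "0"
  end
  value
end

scanner = StringScanner.new("1234\r\n")
number = scan_integer_sketch(scanner)
```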
Logically the two supported types would be Integer and Float, but perhaps others would be helpful for other protocols?
@kou as maintainer of strscan, do you have any opinion? I'm happy to put in the work on this, but I'd need to know if the feature is desired, and which API would be deemed acceptable.
Also cc @tenderlove @mame from previous discussions.