unicode - Delphi 2009 RawByteString vagaries


Suppose you want to display the raw byte contents of a UTF-8 string, for whatever perverse reason.

  var
    utf8Str: UTF8String;
  begin
    utf8Str := 'ąćęłńóśźż';
  end;

(1) This does not do it; it displays the readable form:

  memo1.Lines.Add(RawByteString(utf8Str)); // output: 'ąćęłńóśźż'

(2) This, however, "works" - note the concatenation:

  memo1.Lines.Add('x' + RawByteString(utf8Str)); // output: 'xÄ…Ä‡Ä™Ĺ‚Ĺ„ĂłĹ›ĹşĹź'

I understand (1), although the compiler-forced conversion to UnicodeString seems to rule out ever displaying a RawByteString as-is. Still, why does the behavior change in (2)?

(3) Stranger still - let's reverse the concatenation:

  memo1.Lines.Add(RawByteString(utf8Str) + 'x'); // output: 'ąćęłńóśźżx'

I have been reading up on the new string types in Delphi and thought I understood how they work, but this is a puzzle.

RawByteString exists to minimize the number of overloads required for functions that work with AnsiStrings of different code pages.

In general, do not declare variables of type RawByteString. Do not type-cast values to that type. Do not concatenate variables of that type. About the only things you are supposed to do with it:

  • Declare a parameter of this type (the original intent)
  • Index over such parameters
  • Search them
  • Perform smart operations that check the actual code page of the string, using the StringCodePage function.

For example, note that the StringCodePage function uses RawByteString as its own input type. That way, it works with any AnsiString, instead of converting the code page before the string is passed as an argument.
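As a rough illustration of that intended use, here is a minimal sketch of a routine that takes a RawByteString parameter, reports the string's code page, and indexes over its bytes. DumpBytes is a hypothetical helper name, not part of the RTL; only StringCodePage, Format, IntToHex, Ord and Length are real RTL routines. The sketch also assumes the source file is saved in an encoding the compiler reads correctly, so that the literal survives intact.

```delphi
program RawByteDump;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not an RTL routine). Because the parameter is a
// RawByteString, any AnsiString flavor, including UTF8String, can be
// passed in without a code page conversion on the way.
function DumpBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  // Report the code page the string actually carries.
  Result := Format('cp=%d:', [StringCodePage(S)]);
  // Indexing a RawByteString yields AnsiChar values, i.e. the raw bytes.
  for I := 1 to Length(S) do
    Result := Result + ' ' + IntToHex(Ord(S[I]), 2);
end;

var
  utf8Str: UTF8String;
begin
  // Assumption: the source file encoding lets the compiler read this literal.
  utf8Str := 'ąćęłńóśźż';
  Writeln(DumpBytes(utf8Str));
end.
```

In a form, the same call could feed the memo from the question, for example memo1.Lines.Add(DumpBytes(utf8Str)). The result is an ordinary string built from hex digits, so no RawByteString is ever cast or concatenated.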

In your case, things like concatenation are largely undefined. The behavior changed between the RTM release and Update 2, but when the RTL string-concatenation function receives several strings with different code pages, there is no easy way to figure out which code page the final string should use. That is only one of the reasons you should not concatenate them the way you are doing.

