I am trying to parse some RTF that I get back from a server. For most of the text this works fine (using a Rich TextBox control does the job), but some of the RTF seems to contain an additional "encoding", and some letters become corrupted.
The original string is as follows (and includes some of the characters used in Polish):
ąćęłńóśźż
The RTF string with hex-encoded characters that the server sends back looks like this:
{\lang1045\langfe1045\f16383{\'b9\'e6\'ea\'b3{\f7\'a8\'bd\'a8\'ae}\'9c\'9f\'bf}}
I'm having trouble decoding the ńó characters in the returned string: each of them is represented by two hex values, while the rest of the string is represented by a single hex value per character (as expected).
Using a Rich TextBox control to "parse" the RTF results in corrupted text (the two characters in question are displayed as different, unwanted characters).
If I encode the plain string to hex manually, using the expected code page (ANSI code page 1250, Latin 2, LCID 1045), I get the following:
\'b9\'e6\'ea\'b3\'f1\'f3\'9c\'9f\'bf
How can I work out how to decode {\f7\'a8\'bd\'a8\'ae}?
The string viewed directly on the server looks fine, which means the characters are stored correctly (if they are corrupted, it happens somewhere in the conversion before sending). I'm not sure the problem is on the server side (as I have no control over it), but since the server is used for many translation functions, I think that the returned string is okay.
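For reference, the manual encoding described above can be reproduced in a few lines of Python. This is only a sketch: `to_rtf_hex` is a hypothetical helper name, and it assumes that Windows-1250 (Python's `cp1250` codec) is the code page the server was expected to use.

```python
# The Polish sample string from the question.
text = "ąćęłńóśźż"

def to_rtf_hex(s: str, codec: str = "cp1250") -> str:
    """Encode s with the given code page and write every byte as an RTF \\'hh escape."""
    return "".join(f"\\'{b:02x}" for b in s.encode(codec))

expected = to_rtf_hex(text)
print(expected)  # \'b9\'e6\'ea\'b3\'f1\'f3\'9c\'9f\'bf
```

Comparing this with the server's string shows that only the two bytes `f1 f3` (ńó) were replaced, by the four bytes `a8 bd a8 ae` inside the `\f7` group.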
I have gone through the RTF spec, but I could not find any hint about this kind of mixed encoding.
I do not know why it is happening, but it appears to be GBK encoding (or something sufficiently similar).
Perhaps the server tries to do something "smart" to find the letters, or the default character encoding of the server is GBK or so, and those characters (and only those) also exist in GBK, so it prefers that.
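If the GBK guess holds, the returned fragment could be decoded by switching code pages per font group. The following is a minimal sketch, not a real RTF parser: `decode_fragment`, `FONT_CODECS` and the mapping of `\f7` to GBK are assumptions made for illustration (a proper reader would take the charset from the RTF font table), and only `\'hh` escapes, control words and braces are handled.

```python
import re

# Assumption for illustration: font \f7 carries GBK bytes, everything else
# is Windows-1250. A real reader would look this up in the font table.
FONT_CODECS = {"f7": "gbk"}
DEFAULT_CODEC = "cp1250"

# Matches a \'hh escape, a control word with optional numeric parameter, or a brace.
TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\([a-z]+)(-?\d+)? ?|([{}])")

def decode_fragment(rtf: str) -> str:
    out = []               # decoded text pieces
    pending = bytearray()  # raw bytes waiting to be decoded with `codec`
    codec = DEFAULT_CODEC
    stack = []             # codec saved at each "{" and restored at "}"

    def flush():
        out.append(pending.decode(codec, errors="replace"))
        pending.clear()

    for m in TOKEN.finditer(rtf):
        hex_byte, word, param, brace = m.groups()
        if hex_byte:
            pending.append(int(hex_byte, 16))
            continue
        flush()  # decode collected bytes before the codec can change
        if brace == "{":
            stack.append(codec)
        elif brace == "}":
            if stack:
                codec = stack.pop()
        elif word == "f":
            codec = FONT_CODECS.get("f" + (param or ""), DEFAULT_CODEC)
    flush()
    return "".join(out)

rtf = r"{\lang1045\langfe1045\f16383{\'b9\'e6\'ea\'b3{\f7\'a8\'bd\'a8\'ae}\'9c\'9f\'bf}}"
print(decode_fragment(rtf))
```

Collecting the bytes before decoding matters here, because each GBK character spans two `\'hh` escapes.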
I found this out by putting the offending hex codes (A8 BD A8 AE) as bytes into a simple HTML file, so that I could go through my browser's encodings to see whether anything matches:
<html><body>¨½¨®</body></html>
To my surprise, my browser came up with "ńó" directly.
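The same experiment can be done without a browser by trying a handful of codecs on the offending bytes. This is a rough sketch: `matching_codecs` is a hypothetical helper, and the candidate list is an arbitrary selection of Python codec names, not an exhaustive search.

```python
# The four offending bytes from the \f7 group and the text they should produce.
offending = bytes.fromhex("A8BDA8AE")
target = "ńó"

# Arbitrary selection of candidate encodings to try.
candidates = ["cp1250", "cp1252", "iso8859_2", "utf-8", "big5", "shift_jis", "gbk"]

def matching_codecs(data: bytes, expected: str, codecs: list) -> list:
    """Return the codec names that decode data to exactly the expected text."""
    hits = []
    for name in codecs:
        try:
            if data.decode(name) == expected:
                hits.append(name)
        except UnicodeDecodeError:
            pass  # bytes are not valid in this encoding
    return hits

hits = matching_codecs(offending, target, candidates)
print(hits)
```

Any codec that shows up in the result is a plausible source of the extra bytes; the single-byte code pages in the list cannot match, since they turn four bytes into four characters.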