U+XXXXX characters at status bar

#1 U+XXXXX characters at status bar

Posted by: Freeman | Date: 2015-03-29 15:11 | IP: IP Logged

Since PSPad has had full Unicode support since build 2656, I'd like to discuss the character codes shown in the status bar. I'm not sure which solution would be better.

PSPad uses a monospace font, so characters above U+FFFF appear as two separate codes, a surrogate pair. The status bar shows their codes, such as $D83D and $DE0A, but there is no way to see the full character code (code point), such as #128522 or U+1F60A.
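
For reference, the relation between U+1F60A and the two codes above can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 arithmetic, not PSPad code; the function name `to_surrogate_pair` is made up for the example.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as a surrogate pair")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F60A)
print(f"${high:04X} ${low:04X}")  # prints "$D83D $DE0A"
```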

Because the caret can be placed on either component of a surrogate pair, the existing behavior is correct. Would it be possible to extend the status bar band to also show the full code point, whichever component of the pair the caret is on, like this:
? 55357 $D83D → #128522, U+1F60A
? 56842 $DE0A → #128522, U+1F60A
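
A rough Python sketch of how such a status text could be built for either half of the pair. It assumes the editor exposes the text as a list of 16-bit code units and the caret as an index into it; `status_text`, `units`, and `caret` are hypothetical names, and the leading character glyph from the lines above is left out.

```python
def status_text(units: list[int], caret: int) -> str:
    """Describe the UTF-16 code unit at index `caret` in the format proposed above."""
    unit = units[caret]
    text = f"{unit} ${unit:04X}"
    high = low = None
    if 0xD800 <= unit <= 0xDBFF and caret + 1 < len(units):
        high, low = unit, units[caret + 1]   # caret is on the first half
    elif 0xDC00 <= unit <= 0xDFFF and caret > 0:
        high, low = units[caret - 1], unit   # caret is on the second half
    # Append the code point only when both halves form a valid pair.
    if high is not None and 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
        cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
        text += f" → #{cp}, U+{cp:04X}"
    return text


units = [0x0041, 0xD83D, 0xDE0A]   # "A" followed by U+1F60A
print(status_text(units, 1))  # 55357 $D83D → #128522, U+1F60A
print(status_text(units, 2))  # 56842 $DE0A → #128522, U+1F60A
print(status_text(units, 0))  # 65 $0041
```

Looking at the neighbouring code unit on both sides means the same code point is reported no matter which half the caret points at.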

Or, alternatively, add a new band to the status bar specifically for code points?

Is the status bar Unicode-aware now? I cannot see characters like → or ə on it, although the character codes are right. Why duplicate the character itself in the status bar?


#2 Re: U+XXXXX characters at status bar

Posted by: pspad | Date: 2015-03-29 15:22 | IP: IP Logged

The status bar is a Unicode component, but I don't know how to recognize that a char is part of a surrogate pair. The editor presents it as 2 characters.


#3 Re: U+XXXXX characters at status bar

Posted by: Freeman | Date: 2015-03-29 15:28 | IP: IP Logged

pspad:
The status bar is a Unicode component

So why does it show '?' (a question mark) instead of a character like 'ə' (U+0259, which is not above U+FFFF)?


#4 Re: U+XXXXX characters at status bar

Posted by: pspad | Date: 2015-03-29 15:48 | IP: IP Logged

Because of me. Some non-Unicode functions were used, which broke Unicode chars. Fixed.

I will move PSPad to a Unicode version of Delphi to get full Unicode support everywhere, but it will take some time.


#5 Re: U+XXXXX characters at status bar

Posted by: vbr | Date: 2015-03-29 16:49 | IP: IP Logged

pspad:
The status bar is a Unicode component, but I don't know how to recognize that a char is part of a surrogate pair. The editor presents it as 2 characters.

Hi,
if there is a need for it, the original Unicode character can be computed from the surrogates.
See, for example, the following page (below the interactive conversion forms and tables there is some general background, including the conversion math):

www.russellcottrell.com

Some sample JavaScript code is given there as well.
It would be possible to use this via PSPad scripting, but the usage wouldn't be straightforward (scripts currently can't access the status bar, so it would have to be prompts called with keyboard shortcuts).

It could most likely be coded internally too, possibly with some checks for invalid surrogates outside of the regular pairs etc.
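
As an illustration of the conversion math with such a check, here is a minimal Python sketch (not the code from the linked page; `combine_surrogates` is a hypothetical name). It rejects anything that is not a regular high/low pair and cross-checks the result against Python's built-in UTF-16 decoder.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Compute the code point encoded by a UTF-16 high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD83D, 0xDE0A)
print(f"#{cp}, U+{cp:X}")  # prints "#128522, U+1F60A"

# Cross-check: decode the same two code units as big-endian UTF-16 bytes.
raw = (0xD83D).to_bytes(2, "big") + (0xDE0A).to_bytes(2, "big")
assert ord(raw.decode("utf-16-be")) == cp
```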

regards
vbr


#6 Re: U+XXXXX characters at status bar

Posted by: pspad | Date: 2015-03-29 17:08 | IP: IP Logged

This isn't a problem. But when it is a surrogate pair, I need to read 2 bytes and calculate the real char code. How can I know whether it is a surrogate pair?
Is it a rule that a surrogate pair always starts with D8xx followed by DCxx..DFxx?
If that is a rule, I can take 2 bytes and calculate the char value.


#7 Re: U+XXXXX characters at status bar

Posted by: vbr | Date: 2015-03-29 20:43 | IP: IP Logged

pspad:
This isn't a problem. But when it is a surrogate pair, I need to read 2 bytes and calculate the real char code. How can I know whether it is a surrogate pair?
Is it a rule that a surrogate pair always starts with D8xx followed by DCxx..DFxx?
If that is a rule, I can take 2 bytes and calculate the char value.

I believe it is a rule for valid UTF-16 encoding; see:
unicode.org

"
Q: What are surrogates?

A: Surrogates are code points from two special ranges of Unicode values, reserved for use as the leading, and trailing values of paired code units in UTF-16. Leading, also called high, surrogates are from D800₁₆ to DBFF₁₆, and trailing, or low, surrogates are from DC00₁₆ to DFFF₁₆. They are called surrogates, since they do not represent characters directly, but only as a pair.
"

The next section of that page has more information about the conversion, but it is probably the same as on the previously linked page.

(The Unicode standard recommends treating invalid use of surrogates as errors, but in a text editor I would rather keep them and report them as individual characters.)
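
A small Python sketch of that lenient approach, assuming the text is available as a list of 16-bit code units (the name `scan_utf16` and the labels it returns are made up for the example). Valid pairs are combined into one code point, while unpaired surrogates are kept and reported on their own instead of raising an error.

```python
def scan_utf16(units: list[int]) -> list[tuple[str, int]]:
    """Group UTF-16 code units into code points, keeping lone surrogates as-is."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= unit <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # High surrogate followed by a low surrogate: one code point.
            out.append(("pair", 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)))
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # Surrogate without a matching partner: report it individually.
            out.append(("lone surrogate", unit))
            i += 1
        else:
            out.append(("bmp", unit))
            i += 1
    return out


for kind, value in scan_utf16([0x0041, 0xD83D, 0xDE0A, 0xDE0A, 0xD83D]):
    print(f"{kind}: U+{value:04X}")
```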

vbr


#8 Re: U+XXXXX characters at status bar

Posted by: pspad | Date: 2015-03-30 05:29 | IP: IP Logged

OK, the next build will show surrogate pairs in the status bar.
