Text Regex

#1 Text Regex

Posted by: maki | Date: 2019-04-08 10:06 | IP: IP Logged

My regex is incorrect; what should I change?
I am searching for text made up of the characters below, within a single line.

^([\p{Cyrillic}]+[\-\.\,\!\…\?\(\)\„\”—0-9\s]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”—0-9\s]*|[\-\.\,\…\?\(\)\„\”—0-9]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”—0-9\s]*)$

Text included:
\p{Cyrillic}
!
!!
!!!
!!!!
?
??
???
… (unicode)
— (unicode)
-
.
..
...
,
0-9
(
)
„ (unicode)
” (unicode)
"
\s (space)
\
/
\x{200b}
*
#
@
%

Edited 1 time(s). Last edit at 2019-04-08 10:09 by maki.


#2 Re: Text Regex

Posted by: maki | Date: 2019-04-08 10:54 | IP: IP Logged

Still wrong:

^([\p{Cyrillic}]+[\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*|[\-\.\,\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”\,\;\\\/\*\#\@\&\:\.x{200B}—0-9\s]*)$

Edited 2 time(s). Last edit at 2019-04-08 11:00 by maki.


#3 Re: Text Regex

Posted by: vbr | Date: 2019-04-08 14:37 | IP: IP Logged

Hi,
The PSPad regex engine doesn't currently support several features used in your pattern:

e.g. the Unicode properties \p{...} (such as \p{Cyrillic}), which refer to a Unicode block or script, or codepoint matching using \x{...}. In this notation there must be a backslash before the x, otherwise it is handled literally; in a tool that supports it, the pattern would have to contain \x{200B} to match
# ​ (dec.: 8203) (hex.: 0x200b) # ​ ZERO WIDTH SPACE (Other, Format) (General Punctuation [8192-8303] [0x2000-0x206f])

You may use other tools that currently support these features, or it might be possible to replace the properties with character sets,
e.g. with [Ѐ-ԯ] you can match all codepoints in the ranges 0x0400-0x04FF (Cyrillic) and 0x0500-0x052F (Cyrillic Supplement),
which might be sufficient for some use cases.

The codepoint or unicode notation is not supported - the respective characters must be entered directly in the pattern.
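
To illustrate the character-set replacement outside PSPad, here is a minimal sketch in Python's standard `re` module, which also lacks \p{...}. The names `CYRILLIC` and `cyrillic_words` are made up for this example, and it assumes the two blocks named above are enough for the text at hand.

```python
import re

# Assumption: U+0400-U+052F (Cyrillic + Cyrillic Supplement) is sufficient;
# a real \p{Cyrillic} property covers further blocks that are not included here.
CYRILLIC = "\u0400-\u052F"  # the same range as the character set [Ѐ-ԯ]

cyrillic_word = re.compile(f"[{CYRILLIC}]+")

def cyrillic_words(text: str) -> list[str]:
    """Return every run of characters that falls inside the range above."""
    return cyrillic_word.findall(text)

words = cyrillic_words("Привет, world! Как дела?")
print(words)
```

In PSPad itself the range has to be typed with the literal characters, as described above, since the \u escapes here are a Python string feature.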

hth,
vbr


#4 Re: Text Regex

Posted by: maki | Date: 2019-04-08 15:11 | IP: IP Logged

But I have no problems with Cyrillic; I can replace it with a standard regex that works in PSPad or another editor. What about the rest of the pattern, though? I can't get it to work.

Edited 2 time(s). Last edit at 2019-04-08 15:15 by maki.


#5 Re: Text Regex

Posted by: vbr | Date: 2019-04-08 17:03 | IP: IP Logged

OK, the zero width space can be problematic too, but it can be copied directly and it can be handled in PSPad, even in the search dialog. It should be the "invisible" character between the following parens:
(​)
Unfortunately, the editor component in PSPad has some problems with selecting such characters, but in general it is possible to select and copy a larger part of the surrounding text and (carefully) delete the other parts as needed.
PSPad also displays it the same way as a regular space, as a dot marking whitespace.
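
If the character is hard to spot in the editor, a small script outside PSPad can locate it. This is only a sketch in Python, not a PSPad feature; `describe_invisible` is a hypothetical helper name, and it relies on the zero width space having the Unicode category "Cf" (Other, Format) mentioned earlier in the thread.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, dec. 8203, hex 0x200b

def describe_invisible(text: str) -> list[tuple[int, str]]:
    """List (index, Unicode name) for every format character (category Cf)."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

sample = "(" + ZWSP + ")"             # parens with the invisible character inside
found = describe_invisible(sample)    # positions and names of format characters
cleaned = sample.replace(ZWSP, "")    # the same text without zero width spaces
print(found, repr(cleaned))
```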
vbr


#6 Re: Text Regex

Posted by: maki | Date: 2019-04-08 17:11 | IP: IP Logged

OK, let's set the "ZERO WIDTH SPACE" aside. I'm talking about ordinary regex; let's not talk about Unicode anymore.

\char = match a literal

How do I match all the characters from the list above in this thread?


#7 Re: Text Regex

Posted by: vbr | Date: 2019-04-09 08:02 | IP: IP Logged

Hi, what is the pattern supposed to match? Some (preferably short) samples of real text strings would probably make it clearer.
If Cyrillic or some "special", i.e. non-ASCII, punctuation should be matched, Unicode definitely needs to be taken into account.
The current patterns are most likely more complicated than needed; e.g. you don't need to escape most of the literals in a character class (between [...]), most likely only - ^ [ and ] (with some further position-dependent exceptions).
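
The point about escaping can be tried outside PSPad. The following is a minimal sketch in Python, assuming the intent is "a single line made only of Cyrillic letters plus the characters listed in post #1, with at least one Cyrillic letter"; that reading, and the names `CYRILLIC`, `EXTRA` and `is_allowed`, are assumptions for illustration. Python's `re` syntax is not guaranteed to behave like PSPad's engine; in Python the backslash also has to be escaped inside a character class.

```python
import re

# Assumption: U+0400-U+052F stands in for \p{Cyrillic}.
CYRILLIC = "\u0400-\u052F"

# Characters from the list in post #1. Inside [...] only the backslash is
# escaped (\\); \s is the whitespace class. The hyphen is placed last, so it
# is taken literally and needs no escape.
EXTRA = r'!?….,()„”"\s\\/*#@%0-9' + "\u200b—-"

# Optional non-Cyrillic prefix, one Cyrillic letter, then any allowed character.
line_re = re.compile(f"^[{EXTRA}]*[{CYRILLIC}][{CYRILLIC}{EXTRA}]*$")

def is_allowed(line: str) -> bool:
    """True if the line uses only allowed characters and has a Cyrillic letter."""
    return line_re.match(line) is not None

print(is_allowed("Привет, мир!"))   # only allowed characters, has Cyrillic
print(is_allowed("Hello, мир!"))    # Latin letters are not in either set
print(is_allowed("123 ... ???"))    # no Cyrillic letter at all
```

Compared with the patterns earlier in the thread, this form needs no alternation, because the single mandatory Cyrillic letter in the middle covers both "punctuation first" and "letters first" lines.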
vbr


#8 Re: Text Regex

Posted by: maki | Date: 2019-04-09 09:19 | IP: IP Logged

I will not give examples, because there are too many different ones and a more precise match would be needed for them. There is no point in matching everything; that would be very bad. Also, this is my work, and I will not post private texts.
It simply has to contain Russian words and all the characters that I listed earlier. That's all.


#9 Re: Text Regex

Posted by: maki | Date: 2019-04-21 09:05 | IP: IP Logged

SOLVED.
