How to extract all Russian words from text?
Posted by: Haunebu | Date: 2021-04-12 16:17 | IP: IP Logged
How can I extract all Russian words (strings) from a text?
Posted by: pspad | Date: 2021-04-12 16:41 | IP: IP Logged
I don't know what exactly you want.
From plain text, from HTML, or from something else?
Do you want a list of words? Use File info and you can get a list of unique words.
Posted by: pspad | Date: 2021-04-12 16:44 | IP: IP Logged
How do you recognize a Russian word? By being written in Cyrillic?
Again a strange request without any information, examples, ...
We are not mind readers and we don't own a crystal ball, so we can't see into your head.
E.g.:
Putin was on the trip.
Is "Putin" a Russian word and the rest isn't?
Edited 2 time(s). Last edit at 2021-04-12 16:55 by pspad.
Posted by: Haunebu | Date: 2021-04-12 19:38 | IP: IP Logged
I asked a very simple question. It is strange that such an intelligent person does not understand simple things.
Take any text that contains, e.g., Polish or English words mixed with Cyrillic words. I want to extract only the Cyrillic words.
Example text:
w pierwszym i szóstym regułą Ганвлла Чпа Ale kler
Edited 2 time(s). Last edit at 2021-04-12 19:40 by Haunebu.
Posted by: pspad | Date: 2021-04-12 19:47 | IP: IP Logged
Thank you for the explanation.
A Russian word doesn't have to be written in Cyrillic.
"Stunde" is a German word, "čas" is a Russian word, "hour" is an English word.
I am glad that an intelligent person like you understands now.
You can use regular expressions, because Cyrillic uses a different set of characters than Latin. Regular expressions allow you to match an interval given by character values.
Open the key table, find the start and the end of the Cyrillic range, and use them.
Edited 1 time(s). Last edit at 2021-04-12 20:04 by pspad.
Posted by: pspad | Date: 2021-04-12 20:03 | IP: IP Logged
Patterns for the regular expression:
[\x{0400}-\x{04FF}]+
or use
[А-Яа-я]+
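For anyone who wants to try the two patterns outside PSPad, here is a minimal sketch in Python. It is not what PSPad does internally; the function name extract_cyrillic_words and the constant names are made up for illustration. Python's re module has no \x{0400} syntax, so the same range is written with \u escapes. One detail worth knowing: the range А-Я plus а-я covers U+0410 to U+044F and therefore leaves out Ё and ё, while the full block range includes them.

```python
import re

# Whole Cyrillic block U+0400..U+04FF; Python's re spells \x{0400} as \u0400.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")
# Basic Russian alphabet range; it excludes Ё (U+0401) and ё (U+0451).
RUSSIAN_BASIC = re.compile(r"[А-Яа-я]+")


def extract_cyrillic_words(text, pattern=CYRILLIC_BLOCK):
    """Return every maximal run of characters matched by pattern, in order."""
    return pattern.findall(text)


sample = "w pierwszym i szóstym regułą Ганвлла Чпа Ale kler"
print(extract_cyrillic_words(sample))                  # ['Ганвлла', 'Чпа']
print(extract_cyrillic_words("ёлка", RUSSIAN_BASIC))   # ['лка']
print(extract_cyrillic_words("ёлка", CYRILLIC_BLOCK))  # ['ёлка']
```

If the text may contain ё, the block range is the safer of the two choices.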
Posted by: Haunebu | Date: 2021-04-12 20:03 | IP: IP Logged
The simple pattern [А-Яа-я] finds everything, but it does not export words.
With the Extract option "Display matched string only",
it exports only single Cyrillic letters, one per line.
So where are all the words?
Edited 1 time(s). Last edit at 2021-04-12 20:03 by Haunebu.
Posted by: Haunebu | Date: 2021-04-12 20:05 | IP: IP Logged
It works fine, I just had to add the "One or More" quantifier: +
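The effect of the missing + can be reproduced outside PSPad with a small Python sketch (an illustration only; the variable names are made up). Without a quantifier the character class matches exactly one letter per match, which is why the export contained single letters; with + each match is a whole run of consecutive letters.

```python
import re

text = "regułą Ганвлла Чпа Ale"

# Without a quantifier the class matches exactly one character per match.
single_letters = re.findall(r"[А-Яа-я]", text)
# With "+" each match is the longest run of consecutive Cyrillic letters.
whole_words = re.findall(r"[А-Яа-я]+", text)

print(single_letters)  # ['Г', 'а', 'н', 'в', 'л', 'л', 'а', 'Ч', 'п', 'а']
print(whole_words)     # ['Ганвлла', 'Чпа']
```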
Posted by: pspad | Date: 2021-04-12 20:06 | IP: IP Logged
Where is the "+" char in your expression? Please copy and paste it correctly.
Now you want to export the words? You asked how to find them.
Use the COPY button in the Find dialog to export the matches into a new file.
Posted by: Haunebu | Date: 2021-04-12 20:15 | IP: IP Logged
I had just lost the + from my regular expression; now it's OK.