
Re: How to extract all Russian words from text?


#1 How to extract all Russian words from text?

Posted by: Haunebu | Date: 2021-04-12 16:17

How can I extract all Russian words (strings) from a text?



#2 Re: How to extract all Russian words from text?

Posted by: pspad | Date: 2021-04-12 16:41

I don't know exactly what you want.
From plain text, from HTML, or something else?
Do you want a list of words? Use File info, and you can get a list of unique words.



#3 Re: How to extract all Russian words from text?

Posted by: pspad | Date: 2021-04-12 16:44

How do you recognize a Russian word? Is it one written in Cyrillic?
Again, a strange request without any information, examples, ...
We are not mind readers and we don't own a crystal ball, so we can't see into your head.

E.g.:
Putin was on the trip.
"Putin" is a Russian word and the rest isn't?

Edited 2 time(s). Last edit at 2021-04-12 16:55 by pspad.



#4 Re: How to extract all Russian words from text?

Posted by: Haunebu | Date: 2021-04-12 19:38

I asked a very simple question. It is strange that such an intelligent person does not understand simple things.
Suppose you have a text that contains, e.g., Polish or English words mixed with Cyrillic words. I want to extract only the Cyrillic words.

Example text:
w pierwszym i szóstym regułą Ганвлла Чпа Ale kler

Edited 2 time(s). Last edit at 2021-04-12 19:40 by Haunebu.



#5 Re: How to extract all Russian words from text?

Posted by: pspad | Date: 2021-04-12 19:47

Thank you for the explanation.
A Russian word doesn't have to be written in Cyrillic.
"Stunde" is a German word, "čas" is a Russian word, "hour" is an English word.
I am glad that an intelligent person like you understands now.

You can use regular expressions, because Cyrillic uses a different set of characters than Latin, and regular expressions let you specify a range of characters by their code values.
Open the key table, find the first and last Cyrillic characters, and use their codes as the range.
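If you don't want to read the codes off the key table, a programming language can report them. Below is a minimal Python sketch for illustration only (`code_point` is a hypothetical helper, not part of PSPad). For reference, the Cyrillic Unicode block spans U+0400 to U+04FF; the basic Russian letters А to я occupy U+0410 to U+044F, while Ё (U+0401) and ё (U+0451) lie outside that sub-range.

```python
# Sketch (assumption: Python 3): print the code points of a few Cyrillic
# letters, the same kind of values a character/key table would show.
# The whole Cyrillic Unicode block spans U+0400 to U+04FF.


def code_point(ch: str) -> str:
    """Return the Unicode code point of a single character as 'U+XXXX'."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # First/last capital and small letters of the basic alphabet, plus Ё/ё
    for ch in "АЯаяЁё":
        print(ch, code_point(ch))
```

Running it lists А, Я, а, я at U+0410, U+042F, U+0430, U+044F, and Ё, ё at U+0401, U+0451.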

Edited 1 time(s). Last edit at 2021-04-12 20:04 by pspad.



#6 Re: How to extract all Russian words from text?

Posted by: pspad | Date: 2021-04-12 20:03

Patterns for the regular expression:
[\x{0400}-\x{04FF}]+
or use
[А-Яа-я]+
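To check the two patterns outside the editor, here is a sketch using Python's `re` module (an illustration, not PSPad's own regex engine). Python writes a code point as `\uXXXX` rather than the `\x{XXXX}` form above. The first pattern matches any character in the Cyrillic block, including non-Russian letters; the second covers only U+0410 to U+044F, so it misses ё. The `[А-Яа-яЁё]+` variant and the helper `cyrillic_words` are my own additions for illustration.

```python
import re

# Python equivalents of the two patterns (sketch; Python's re, not PSPad).
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")   # whole Cyrillic block
RUSSIAN_BASIC = re.compile(r"[А-Яа-я]+")           # pattern from the post
RUSSIAN_WITH_YO = re.compile(r"[А-Яа-яЁё]+")       # assumption: also accept Ё/ё


def cyrillic_words(text: str) -> list:
    """Return every run of Cyrillic-block characters found in text."""
    return CYRILLIC_BLOCK.findall(text)


sample = "w pierwszym i szóstym regułą Ганвлла Чпа Ale kler"
print(cyrillic_words(sample))           # ['Ганвлла', 'Чпа']
print(RUSSIAN_BASIC.findall("ёлка"))    # ['лка']  (ё is U+0451, outside А-я)
print(RUSSIAN_WITH_YO.findall("ёлка"))  # ['ёлка']
```

The Polish letters ó, ł and ą are in the Latin ranges, so neither pattern picks them up.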



#7 Re: How to extract all Russian words from text?

Posted by: Haunebu | Date: 2021-04-12 20:03

The simple pattern [А-Яа-я] finds everything, but it does not export whole words.
With the extract option "Display matched string only", it exports only single Cyrillic letters, one per line.
So where are all the words?

Edited 1 time(s). Last edit at 2021-04-12 20:03 by Haunebu.



#8 Re: How to extract all Russian words from text?

Posted by: Haunebu | Date: 2021-04-12 20:05

It works fine; you just have to add the "One or More" quantifier, +.
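The difference the + makes can be sketched with Python's `re` module (an illustration only; PSPad's engine behaves the same way for this case as far as the thread shows). Without a quantifier the character class matches exactly one letter per match; with "one or more" it matches a whole run of letters, i.e. a word.

```python
import re

# Sketch: effect of the "+" quantifier (Python's re, not PSPad).
text = "Ale Чпа kler"

single = re.findall(r"[А-Яа-я]", text)  # no quantifier: one letter per match
words = re.findall(r"[А-Яа-я]+", text)  # "one or more": whole runs of letters

print(single)  # ['Ч', 'п', 'а']
print(words)   # ['Чпа']
```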



#9 Re: How to extract all Russian words from text?

Posted by: pspad | Date: 2021-04-12 20:06

Where is the "+" character in your expression? Please copy and paste it correctly.
Now you want to export the words? You asked how to find them.

Use the COPY button in the Find dialog to export them into a new file.
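Outside PSPad, the same "find, then export to a new file" step can be sketched in Python. This is only a rough analogue of the COPY button, not a description of how it works; `cyrillic_word_list` and `export_cyrillic_words` are hypothetical helpers, and the optional de-duplication mirrors the unique-word list mentioned earlier in the thread.

```python
import re
from pathlib import Path

# Sketch (hypothetical helpers): collect Cyrillic words, one per line.
CYRILLIC_WORD = re.compile(r"[\u0400-\u04FF]+")


def cyrillic_word_list(text: str, unique: bool = False) -> str:
    """Return the Cyrillic words in text, one per line."""
    words = CYRILLIC_WORD.findall(text)
    if unique:
        # dict.fromkeys drops duplicates but keeps first-seen order
        words = list(dict.fromkeys(words))
    return "".join(word + "\n" for word in words)


def export_cyrillic_words(src: Path, dst: Path, unique: bool = False) -> None:
    """Read src as UTF-8 and write its Cyrillic words to a new file dst."""
    text = src.read_text(encoding="utf-8")
    dst.write_text(cyrillic_word_list(text, unique), encoding="utf-8")
```

For example, `export_cyrillic_words(Path("input.txt"), Path("words.txt"), unique=True)` would write each distinct Cyrillic word of `input.txt` to `words.txt` (both file names are placeholders).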



#10 Re: How to extract all Russian words from text?

Posted by: Haunebu | Date: 2021-04-12 20:15

I had just lost the + from my regular expression; now it's OK.








Editor PSPad - freeware editor, © 2001 - 2024 Jan Fiala, Hosted by Webhosting TOJEONO.CZ, design by WebDesign PAY & SOFT, code Petr Dvořák, Privacy policy and GDPR