
How to combine several regular expressions


#21 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-10-30 11:34

However, the *.log file cannot simply be converted to plain text.
In theory I could extract only the text from the file, but I can never strip out all of the code, so I end up removing hundreds of thousands of unnecessary code fragments by hand.
It is a brutal job.
The log records the full content of every page (if supported).
I have a parser log that, in addition to the connection and executable entries, contains useful text (for example, descriptions from the social network).

Mini-example:

1481405751408-12/10/16 10:35:51 [jd.http.Browser(loadConnection)] ->
{"response":[17129,{"id":,"to_id":-26,"date":14,"post_type":"post","text":"#БП_Россия<br><br>Анонимно пожалуйста...<br><

Edited 1 time(s). Last edit at 2017-10-30 11:36 by maki.



#22 Re: How to combine several regular expressions

Posted by: pspad | Date: 2017-10-30 12:19

First, you need to open the LOG in the correct encoding.
If it contains a mix of UTF-8 and ANSI, you can't open it as UTF-8.
In my last answer I described how to reopen a file in the correct encoding.

Without knowing your file's content, or without a sample, I can't help you any further.
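
For a log that mixes UTF-8 and ANSI lines, one possible approach outside the editor is to decode each line separately. The following is only a minimal Python sketch, not a PSPad feature: the function name `decode_mixed` and the choice of Windows-1251 as the ANSI fallback are assumptions made for illustration.

```python
def decode_mixed(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode a byte string line by line.

    Each line is tried as strict UTF-8 first; if that fails, the line is
    decoded with the fallback code page (assumed here to be Windows-1251).
    """
    lines = []
    for chunk in raw.split(b"\n"):
        try:
            lines.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            # Undecodable bytes become U+FFFD instead of raising an error.
            lines.append(chunk.decode(fallback, errors="replace"))
    return "\n".join(lines)


# Hypothetical two-line input: first line UTF-8, second line Windows-1251.
raw = "привет".encode("utf-8") + b"\n" + "пока".encode("cp1251")
text = decode_mixed(raw)
```

The per-line fallback only helps when each line is consistently in one encoding; a line that mixes both would still come out partly garbled.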



#23 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-10-30 14:33

EmEditor opens the file correctly, automatically detecting the encoding as UTF-8.

Sample text
<br>А __<br>Так же ", \"сонный\", \" убитый\". Глупоо человек доверяет мне.<br>

I need to correct the regex so that it matches both

<br>text<br>

and <br>text</br>

(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))

I do not know what the best option for extracting the text is, because I do not want to mix up text that sits on a very long line (10000+ characters).
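
One alternative to the lookaround pattern above is to split each line on the tags instead of matching between them. This is a different technique, shown only as a hedged Python sketch: the helper name `br_segments` is hypothetical, and dropping the first and last pieces is an assumption that mimics the requirement that the text sit between two tags.

```python
import re

# Matches both <br> and </br>.
BR_TAG = re.compile(r"</?br>")


def br_segments(line: str) -> list[str]:
    """Return the non-empty text pieces that lie between two br tags."""
    parts = BR_TAG.split(line)
    # The first and last pieces are not enclosed by tags on both sides.
    inner = parts[1:-1]
    return [piece for piece in inner if piece]


sample = r'<br>А __<br>Так же ", \"сонный\", \" убитый\". Глупоо человек доверяет мне.<br>'
segments = br_segments(sample)
```

Because the split places no restriction on the characters inside a segment, text containing `\"` is kept. How it performs on lines of 10000+ characters has not been measured here.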

My view setting:
Wrap by Characters

Issue 2:
When a file is opened in the wrong encoding, the text cannot be selected correctly.

Reproduction: open a file with a broken charset and try to select the damaged text.

Edited 3 time(s). Last edit at 2017-10-30 14:40 by maki.



#24 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-11-01 09:13

@pspad - This is a bug!
Please copy the text into the program and try to select it. You will not succeed!
I checked this and it is a real bug. Why not fix it?

Copy/paste text:
pastebin.com

I need to fix the charset automatically, but I do not know whether PSPad has a way to do that.

For example, converting from an unknown encoding to a Russian encoding, and so on.



#25 Re: How to combine several regular expressions

Posted by: pspad | Date: 2017-11-01 09:24

Of course such text is hard to select: because of the wrong encoding it contains control characters.
I tried to produce wrongly encoded text, but I wasn't successful. PSPad correctly detected Cyrillic (azbuka) and UTF-8. Can you prepare a sample that is detected wrongly?
Without it I can't do anything.



#26 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-11-01 10:08

(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))
or
(?<=<br>)[^\\\[\]\{\}]*?(?=(<\/br>|<br>))

The regex does not work properly because the text contains \
What should be changed in the regex?

I will not send you the whole file because it contains a huge amount of private data.
But what should I send to demonstrate the bad encoding? Because it does show up.



#27 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-11-01 10:11

pspad:
Of course such text is hard to select: because of the wrong encoding it contains control characters.
I tried to produce wrongly encoded text, but I wasn't successful. PSPad correctly detected Cyrillic (azbuka) and UTF-8. Can you prepare a sample that is detected wrongly?
Without it I can't do anything.

example: www33.zippyshare.com



#28 Re: How to combine several regular expressions

Posted by: pspad | Date: 2017-11-01 10:38

Just tested. When I save it as ANSI 1251, PSPad isn't able to detect the correct encoding because there aren't enough Cyrillic characters.
In this case I change the encoding to ANSI 1251 in the Encoding menu and reload (Ctrl+R) to see the correct content.

When I save your sample as UTF-8 without BOM and reopen it, PSPad opens it correctly as a UTF-8 file.
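
The re-save step described above can also be done outside the editor. The snippet below is a minimal Python sketch, assuming the source bytes really are Windows-1251; the function name `cp1251_to_utf8` is hypothetical. Python's `utf-8` codec writes no BOM, whereas `utf-8-sig` would prepend one.

```python
import codecs


def cp1251_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1251 bytes as UTF-8 without a BOM."""
    # Assumption: the input is valid Windows-1251; otherwise this raises.
    return raw.decode("cp1251").encode("utf-8")


# Hypothetical sample: a short Cyrillic word stored as Windows-1251.
converted = cp1251_to_utf8("тест".encode("cp1251"))
has_bom = converted.startswith(codecs.BOM_UTF8)
```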



#29 Re: How to combine several regular expressions

Posted by: pspad | Date: 2017-11-01 10:41

maki:
(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))
or
(?<=<br>)[^\\\[\]\{\}]*?(?=(<\/br>|<br>))

The regex does not work properly because the text contains \
What should be changed in the regex?

I will not send you the whole file because it contains a huge amount of private data.
But what should I send to demonstrate the bad encoding? Because it does show up.

If you want to search for the '\' character, you need to escape it with another '\'. That means you use \\, not \\\
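
To see how the escaping plays out, the pattern can be tried in Python. As Python's `re` module reads the class `[^\\\[\]\{\}]`, the three backslashes are `\\` (a literal backslash) followed by `\[` (a literal bracket), so the class excludes backslashes and any segment containing `\"` is skipped. PSPad's own regex engine may behave differently. This is only a sketch: the names `ORIGINAL` and `RELAXED` are hypothetical, and dropping `\\` from the class is one possible change, not a confirmed fix.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(?<=<br>)[^\\\[\]\{\}]*?(?=(</br>|<br>))")

# Assumed variant: the backslash is no longer excluded by the class.
RELAXED = re.compile(r"(?<=<br>)[^\[\]\{\}]*?(?=</br>|<br>)")

# Shortened version of the sample text from the thread.
sample = r'<br>А __<br>Так же ", \"сонный\", \" убитый\".<br>'

original_hits = [m.group(0) for m in ORIGINAL.finditer(sample)]
relaxed_hits = [m.group(0) for m in RELAXED.finditer(sample)]
```

With this sample, `ORIGINAL` finds only the first segment, while `RELAXED` also returns the segment that contains the backslashes.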



#30 Re: How to combine several regular expressions

Posted by: maki | Date: 2017-11-01 10:52

Now it detects the encoding differently, but again wrongly:
"РѕР ± СЂРѕРіРѕ РІСЂРμРјРμРЅРё СЃСѓС,РѕРє." РќРμС,

It is a pity that it does not detect the encoding automatically, without my having to switch encodings manually.

PSPad - detects the wrong encoding
Notepad++ - detects the correct encoding
EmEditor - detects the correct encoding
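
Runs of "Рѕ", "СЂ" and similar pairs like those in the garbled line above are what UTF-8 Cyrillic bytes typically look like when they are read as Windows-1251. The following minimal Python sketch reproduces and reverses that effect on a hypothetical string; the function names are illustrative, and the repair only works when every byte survived the wrong decoding intact.

```python
def as_mojibake(text: str) -> str:
    """Show how UTF-8 text looks when it is misread as Windows-1251."""
    return text.encode("utf-8").decode("cp1251")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the bytes, then decode them as UTF-8."""
    return garbled.encode("cp1251").decode("utf-8")


# Hypothetical sample text, not taken from the poster's file.
original = "доброго времени суток"
garbled = as_mojibake(original)
restored = repair(garbled)
```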

Edited 1 time(s). Last edit at 2017-11-01 10:54 by maki.







