div class tag regex issue
Posted by: maki | Date: 2020-01-27 15:46
<div class="wall_post_text">Музыка...т жизни: Ничего Невозможного Нет. Люблю каждого, каждому дарю частичку света и радости. <br><br>Предпочитаю слова и предложения на русском литературном языке ресных людей, с кем можно говорить, говорить обо всем
<br><br>Кремён, СПб, Почтамт до востребования, 190000<br><br>жду:*</div>
How can I extract this as plain text, with the parts separated onto their own lines, like this:
Музыка...т жизни: Ничего Невозможного Нет. Люблю каждого, каждому дарю частичку света и радости.
Предпочитаю слова и предложения на русском литературном языке ресных людей, с кем можно говорить, говорить обо всем
Кремён, СПб, Почтамт до востребования, 190000
жду:*
My expression is not perfect. Please help!
(?<=div class="wall_post_text">).[^\[\]\{\}]*?(?=<br>)|(?<=br>).[^\[\]\{\}]*?(?=<br>)
Edited 3 time(s). Last edit at 2020-01-27 15:53 by maki.
Posted by: Vany | Date: 2020-01-27 16:24
What about replacing <br><br> with \n as a first step?
--
Vany
(PSPad 5.5.1.812 x32, W10h/p x64 en/cs)
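The two-step idea suggested above (turn the <br> runs into newlines first, then take the text) can also be tried outside the editor. The following is a minimal Python sketch, not something from the thread: the function name div_to_lines is hypothetical, and it assumes the div is not nested and that any leftover tags can simply be dropped.

```python
import html
import re


def div_to_lines(fragment: str) -> list[str]:
    """Return the text of a wall_post_text div, one entry per <br>-separated part."""
    # Lazy (.*?) stops at the first closing </div>; DOTALL lets the body span lines.
    match = re.search(r'<div class="wall_post_text">(.*?)</div>', fragment, flags=re.DOTALL)
    if match is None:
        return []
    body = match.group(1)
    # Step 1: collapse every run of <br> tags (plus surrounding whitespace) into one newline.
    body = re.sub(r'(?:\s*<br\s*/?>\s*)+', '\n', body)
    # Step 2: drop any remaining tags and decode entities such as &amp;.
    body = html.unescape(re.sub(r'<[^>]+>', '', body))
    # Keep only non-empty lines.
    return [line.strip() for line in body.split('\n') if line.strip()]
```

Calling div_to_lines on the fragment from the first post should give four strings, one per paragraph, which can then be joined with '\n'.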
Posted by: pspad | Date: 2020-01-27 16:32
Maki, you keep asking the same thing again and again.
Last time you opened your HTML in a browser, copied the text from it, and were happy. Do it again: you will save your time and ours, and you will be happy again.
Posted by: maki | Date: 2020-01-27 16:41
PSPad, the first issue (HTML to TXT) is fully resolved.
Now I have a second problem, and it is much harder: another 1 GB file, about 1,000,000,000 characters.
This time it is a Java log, HTML to HTML.
It is not easy to convert. The code is very complex.
Vany, your suggestion does not work for me.
Edited 2 time(s). Last edit at 2020-01-27 16:44 by maki.
Posted by: maki | Date: 2020-01-27 17:26
I want to extract the text:
<div class="wall_post_text">extract text</div>
Edited 2 time(s). Last edit at 2020-01-27 17:28 by maki.
Posted by: maki | Date: 2020-01-27 17:52
My regex is still wrong:
<a\s+(?:[^>]*?\s+)?href=\\"\\/wall-(\d+)\?q=.*?<\\/span>
OR
<a\s+(?:[^>]*?\s+)?href=\\"\\/wall\-(\d+)\?q\=.+<\\/span>
Edited 1 time(s). Last edit at 2020-01-27 17:59 by maki.
Posted by: pspad | Date: 2020-01-27 18:35
If you want to extract your lines, open the Search dialog and search for this regular expression:
<div class="wall_post_text">(.*)</div>
then use the COPY button.
It will copy only the lines that contain the content you are looking for.
Second step, applied to the result:
Search: <div class="wall_post_text">(.*)</div>
Replace: $1
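For comparison, the same search-then-replace-with-$1 idea can be sketched in Python. This is only an illustration under assumptions, not PSPad's behaviour: the function name extract_div_contents is hypothetical, and the group is written as lazy (.*?) rather than (.*) so that two divs on one line are captured separately instead of as one long match.

```python
import re

# Same pattern as in the post above, except that the group is lazy.
PATTERN = re.compile(r'<div class="wall_post_text">(.*?)</div>')


def extract_div_contents(lines):
    """Yield the captured group of every match, like Search followed by Replace with $1."""
    for line in lines:
        # Lines without a match yield nothing, which mirrors copying only matching lines.
        for match in PATTERN.finditer(line):
            yield match.group(1)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so a large file does not need to be loaded into memory at once.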
Posted by: maki | Date: 2020-01-27 18:50
[Window Title]
Info
[Content]
Occurrence of "<div class="wall_post_text">(.*)</div>" was found 0 times
[OK]
Unfortunately it doesn't work. The tags are completely different in the log.
Example:
<div class=\"wall_post_text\"><a href=\"\/feed?section=search&q=%23%D0%91%D0%9F_%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F\">#БП_Россия<\/a><br><br>Привет!<br>Меня деле.<br>Больше всего я люблю учиться,путетвовать и учить языки. В собеседнике ищу желание переписываться и только. Мы вместе учить языки,рассказывать о своих будх, дискутировать на проблемные темы и все что ты захочешь.<br>Адрес дам в лс.<\/div>
Posted by: pspad | Date: 2020-01-27 18:57
Why did you send an example that differs from your real text?
The regular expression for your current text is:
<div class=\\"wall_post_text\\">(.*)<\\/div>
If your real text differs from what you have sent, modify the regular expression accordingly.
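The escaped form of the pattern can be checked against the log sample with a short Python sketch. Everything here beyond the pattern itself is an assumption: the function name posts_from_log is hypothetical, the group is made lazy, only the \" and \/ escapes seen in the sample are undone, and the log is assumed to keep each div on a single line.

```python
import re

# In the log, quotes and slashes are backslash-escaped (\" and \/),
# so the regex has to match a literal backslash before each of them.
ESCAPED_DIV = re.compile(r'<div class=\\"wall_post_text\\">(.*?)<\\/div>')
BR_RUN = re.compile(r'(?:<br\s*/?>)+')   # one or more consecutive <br> tags
TAG = re.compile(r'<[^>]+>')             # any remaining tag, e.g. <a ...> or </a>


def posts_from_log(lines):
    """Yield one list of text parts per wall_post_text div found in the lines."""
    for line in lines:
        for match in ESCAPED_DIV.finditer(line):
            # Undo the two escapes visible in the sample.
            body = match.group(1).replace('\\"', '"').replace('\\/', '/')
            # Split on <br> runs, then strip leftover tags from each part.
            parts = [TAG.sub('', part).strip() for part in BR_RUN.split(body)]
            yield [part for part in parts if part]
```

For a file of around 1 GB, passing the open file object (opened with the correct encoding) as lines keeps memory use low, provided the log really contains line breaks; if it is one single line, a chunked reader would be needed instead.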
Posted by: maki | Date: 2020-01-27 19:02
Because the text contains private data, I limited what I posted. If you want the real text, I can send it by e-mail.