PSPad forum > English discussion forum > Conversion -> Remove HTML tags - option does not work
Posted by: maki | Date: 2017-12-12 03:56
The Remove HTML tags option does not work.
<p>I want to extract the text</p>
Posted by: pspad | Date: 2017-12-12 05:55
The original file must be recognized as HTML. Change the syntax to HTML first.
Posted by: maki | Date: 2017-12-12 14:14
But it still does not remove 100% of the HTML code!
For example, it does not remove:
/div
/div
/li!-- #comment-## --
li class="comment even thread-even depth-1" id="dsq-comment-1926"
div id="dsq-comment-header-1926" class="dsq-comment-header"
cite id="dsq-cite-1926"
a id="dsq-author-user-1926" href="http://Website" target="_blank" rel="nofollow"Martyna/a
/cite
/div
div id="dsq-comment-body-1926" class="dsq-comment-body"
div id="dsq-comment-message-1926" class="dsq-comment-message"ppotrafi zatrzymać uciekający czas#8230;/p
/div
/div
/li!-- #comment-## --
li class="comment odd alt thread-odd thread-alt depth-1" id="dsq-comment-1927"
div id="dsq-comment-header-1927" class="dsq-comment-header"
Posted by: pspad | Date: 2017-12-12 15:54
Sorry, but /div isn't a tag.
A tag is content enclosed in < and >.
Send an exact example of what isn't removed.
Posted by: maki | Date: 2017-12-12 17:46
If the browser saves the page as HTML, it should be HTML.
I do not know why PSPad does not detect ANY page as HTML.
PSPad ALWAYS converts the code to HTML incorrectly.
But no matter, I have another question...
Maybe there is a different way,
without removing the code:
How can I extract any text between <p> and </p> tags?
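If a script outside PSPad is acceptable, here is a minimal Python sketch of the extraction, built on the standard library's html.parser module. ParagraphTextExtractor and extract_paragraphs are hypothetical names, and the sketch assumes <p> elements are explicitly closed and not nested. It collects the text of each paragraph, drops inner tags such as <em>, and decodes entities like &#8230;.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text content of every <p> element."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &#8230; into characters
        super().__init__(convert_charrefs=True)
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, and attributes (class, style) are ignored
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (<em>, <a>, ...) is kept as plain text
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = '<div><p>I want to extract the text</p><p class="indent">Second <em>one</em>&#8230;</p></div>'
print(extract_paragraphs(sample))
# ['I want to extract the text', 'Second one…']
```

A real parser is used here instead of a regular expression because it copes with attributes and inner tags without extra pattern work.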
Posted by: epement | Date: 2018-01-22 21:21
Tell me exactly what you mean by the term "extract". Extract to where?
Have you tried a find/replace for those symbols? Example:
Find: </?p>
Options: Regular Expressions
Replace: {nothing}
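For anyone who wants to try the same expression outside PSPad, this is a rough Python equivalent of that find/replace using re.sub (remove_p_tags is a hypothetical name). PSPad's regex engine may differ in details, but this simple pattern should behave the same way. The second call shows the limitation raised next: a <p> with attributes is not matched.

```python
import re

# Same pattern as the PSPad search: "<", an optional "/", then "p>"
P_TAG = re.compile(r"</?p>")


def remove_p_tags(text: str) -> str:
    """Delete bare <p> and </p> tags, keeping the text between them."""
    return P_TAG.sub("", text)


print(remove_p_tags("<p>I want to extract the text</p>"))
# I want to extract the text

print(remove_p_tags('<p class="indent">x</p>'))
# <p class="indent">x
```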
What if the <p> tag contains a class or style attribute, such as <p class="indent"> ?
What should happen then?
What about other embedded tags, such as <i>, <em>, <a ...>, <b>, etc.?
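One possible answer to the attribute question, sketched in Python: the pattern below is an assumption of this example, not something PSPad provides. It also removes <p> tags that carry attributes, while leaving other tags such as <em> and <pre> untouched. It assumes attribute values contain no < or > characters.

```python
import re

# "<", optional "/", "p", then either ">" directly or whitespace plus any
# attributes up to ">". Requiring whitespace keeps it from matching <pre>, <param>.
P_TAG_WITH_ATTRS = re.compile(r"</?p(?:\s[^<>]*)?>", re.IGNORECASE)


def remove_p_tags_keep_others(text: str) -> str:
    """Hypothetical helper: strip <p ...> and </p>, keep every other tag."""
    return P_TAG_WITH_ATTRS.sub("", text)


sample = '<p class="indent">Some <em>emphasis</em> and <pre>code</pre></p>'
print(remove_p_tags_keep_others(sample))
# Some <em>emphasis</em> and <pre>code</pre>
```

The same pattern can be typed into PSPad's Find box with Regular Expressions enabled, though case-insensitivity there depends on PSPad's own search options.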
To simply remove all tags between angle brackets, do a find/replace like this:
Find: <[^<>]*>
Options: Regular Expressions
Replace: {nothing}
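The same "remove everything between angle brackets" replace, as a minimal Python sketch (strip_all_tags is a hypothetical name). The sample is an assumption: it is one of the leftover fragments from earlier in the thread with angle brackets restored. The last call illustrates pspad's point that text without < and > is not a tag and is therefore left alone.

```python
import re

# Any "<", then a run of characters that are not angle brackets, then ">"
ANY_TAG = re.compile(r"<[^<>]*>")


def strip_all_tags(text: str) -> str:
    """Delete every <...> tag, including <!-- ... --> comments without brackets inside."""
    return ANY_TAG.sub("", text)


sample = ('<li class="comment"><cite><a href="http://Website">Martyna</a>'
          '</cite></li><!-- #comment-## -->')
print(strip_all_tags(sample))
# Martyna

# No angle brackets, so nothing matches
print(strip_all_tags("/div"))
# /div
```

This pattern breaks if an attribute value or script contains < or >, which is one reason a real HTML parser is safer for arbitrary pages.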
I have a feeling that you are asking for something more than this.
Eric