Re: extract urls preceded by a predefined string in a html file
Posted by: Esgrimidor | Date: 2014-02-14 20:21 | IP: IP Logged
Extract URLs preceded by a predefined string in an HTML file to a new file, with one URL per line.
How can I do that?
Best Regards
--
Nice program indeed
Posted by: pspad | Date: 2014-02-14 20:38 | IP: IP Logged
To extract URLs into a new file:
open the search dialog
click on the button with ! next to the search field and choose Find URL
press the Copy button
Posted by: Esgrimidor | Date: 2014-02-15 10:35 | IP: IP Logged
Running to try.
Posted by: Esgrimidor | Date: 2014-02-15 10:53 | IP: IP Logged
I think it is failing.
The file has 412812 lines; PSPad seems to be running but never finishes creating the target file.
Then I tried deleting the string that appears in the search box at the beginning and leaving it empty...
It works now:
23,668 URLs extracted, each one on a separate line.
But how can I filter if I want only the URLs preceded by a predefined string?
Posted by: pspad | Date: 2014-02-15 11:25 | IP: IP Logged
Modify the regular expression by adding, at the beginning, a string that identifies what you want.
Posted by: Esgrimidor | Date: 2014-02-15 21:08 | IP: IP Logged
I will try and comment. I am not used to regular expressions.
Best Regards
Posted by: pspad | Date: 2014-02-15 21:37 | IP: IP Logged
If you want help from us, you need to provide some examples of your lines. Sorry, but we are not mind readers.
Posted by: Esgrimidor | Date: 2014-02-19 12:16 | IP: IP Logged
Trying again.
This is part of the file to extract URLs from:
www.proof.com
[Ref]http://itv.com
gent asom.net
[Ref]http://www.imagen.org
The predefined string is "[Ref]".
How can I do that?
Best Regards
Posted by: pspad | Date: 2014-02-19 12:20 | IP: IP Logged
Simply put [Ref] at the beginning of the existing regular expression:
\[Ref\]
The backslashes are needed because brackets are control characters in regular expressions.
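For readers who want to check the effect of the backslashes outside PSPad, here is a minimal Python sketch. It assumes a simplified URL pattern (`https?://\S+`); the expression PSPad's Find URL actually inserts is not shown in this thread and is probably different. The last sample line is an extra, made-up line added only to show that unmarked URLs are skipped.

```python
import re

# Hypothetical, simplified URL pattern; not the one PSPad's "Find URL" uses.
URL = r"https?://\S+"

# Sample lines from the thread, plus one invented unmarked URL at the end.
sample = """www.proof.com
[Ref]http://itv.com
gent asom.net
[Ref]http://www.imagen.org
http://unmarked.example.com"""

# Escaped brackets match the literal text "[Ref]".
escaped = re.compile(r"\[Ref\](" + URL + ")")

# Unescaped, [Ref] is a character class: exactly one of the characters R, e, f.
unescaped = re.compile(r"[Ref](" + URL + ")")

with_prefix = escaped.findall(sample)
without_escape = unescaped.findall(sample)

print(with_prefix)      # only the URLs that follow the literal "[Ref]"
print(without_escape)   # nothing: no URL here is directly preceded by R, e or f
```

Without the backslashes the expression silently looks for something else, which is why it finds nothing in this sample.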
Posted by: Andreas | Date: 2014-02-19 13:09 | IP: IP Logged
You can try with search and replace.
search:
\[Ref\](.*$)|.*
replace:
$1
$1 is the back reference to the value matched inside the first pair of parentheses
| means OR
$ means END of line
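The search and replace above can be tried out in Python as a rough sketch. This only mirrors the idea; PSPad's regex engine may differ in details. Python writes the back reference as `\1` or `m.group(1)` instead of `$1`, and needs `re.MULTILINE` so that `$` matches at the end of every line.

```python
import re

# Sample lines from the thread.
sample = "www.proof.com\n[Ref]http://itv.com\ngent asom.net\n[Ref]http://www.imagen.org"

# Pattern from the post. With re.MULTILINE, $ matches at each line end,
# and "." never crosses a newline by default.
pattern = re.compile(r"\[Ref\](.*$)|.*", re.MULTILINE)

# Lines matched by the first alternative keep only group 1 (the part after [Ref]);
# everything matched by the second alternative is replaced with nothing.
replaced = pattern.sub(lambda m: m.group(1) or "", sample)

# The non-matching lines are emptied, not deleted, so blank lines remain.
kept = [line for line in replaced.splitlines() if line]

print(repr(replaced))
print(kept)
```

The blank lines left behind are the reason a further clean-up step is needed to get one URL per line.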
But if you want the output cleaned up in a new file, I think you have to write a script. Read the PSPad manual on scripting and use JavaScript, for which you can find plenty of help and tutorials.
Search for an online regex tester and a regex tutorial in your language.
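As an alternative to a PSPad script, the whole task can be sketched as a small standalone Python script. This is not PSPad scripting and uses none of its API. The function names are made up for this example, and the rule that a URL ends at whitespace, a quote or an angle bracket is an assumption that may need adjusting for your file.

```python
import re
from pathlib import Path


def extract_prefixed_urls(text, prefix="[Ref]"):
    """Return the URLs that directly follow `prefix`, in document order."""
    # re.escape adds the backslashes, so "[Ref]" is matched literally.
    # Assumption: a URL ends at whitespace, a quote, or an angle bracket.
    pattern = re.compile(re.escape(prefix) + r"""(https?://[^\s"'<>]+)""")
    return pattern.findall(text)


def extract_to_file(source, target, prefix="[Ref]"):
    """Write the matching URLs to `target`, one per line; return the count."""
    text = Path(source).read_text(encoding="utf-8", errors="replace")
    urls = extract_prefixed_urls(text, prefix)
    Path(target).write_text("\n".join(urls) + ("\n" if urls else ""), encoding="utf-8")
    return len(urls)
```

A call such as `extract_to_file("input.html", "urls.txt")` would then write the result; both file names are placeholders.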