Re: Filter out part of string

#1 Filter out part of string

Posted by: Gibboz2k | Date: 2013-09-16 11:54

Hi Everybody,

My first post here :)

I am facing the following problem and am not sure how to go about handling this:

I have a folder full of HTML files. I need to extract part of a string from these files to get a list of results. Let me give you an example of what I need. An HTML file could contain the string class="abc error"><zshp:txt gv_textk="zz_example_text.123" />. I am interested in finding each position where gv_textk= appears in the file and extracting what is between the two speech marks after it. In this case, that is zz_example_text.123.

Ideally I am left with a list of these entries from all of the files in the directory, which I can then copy into Excel.

Any ideas how I can automate this process? (I did not find anything within the forum regarding this)

Thanks in advance,
Gibboz2k


#2 Re: Filter out part of string

Posted by: vbr | Date: 2013-09-16 14:26

Hi,
if the needed substring is always on a single line (i.e. without linebreaks), you can use PSPad's search features, e.g. the regular expression search.
For example, using the pattern:
gv_textk="[^"]*"
with the option [x] Regular expressions activated
and the "Copy" button in the search dialog, you get a list of all occurrences of the needed attribute in the given file.
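Outside PSPad, the same pattern can be tried with any regex engine. A minimal sketch in Python 3 (not a PSPad feature; the function name find_attributes is hypothetical), applied to the example string from the first post:

```python
import re

# The pattern from above: the attribute name, an opening quote, any run of
# characters that are not a quote, and the closing quote.
PATTERN = re.compile(r'gv_textk="[^"]*"')


def find_attributes(text):
    """Return every complete gv_textk="..." match, in order of appearance."""
    # With no capturing group, findall() returns the whole match each time.
    return PATTERN.findall(text)


sample = 'class="abc error"><zshp:txt gv_textk="zz_example_text.123" />'
print(find_attributes(sample))  # ['gv_textk="zz_example_text.123"']
```

The negated class [^"]* stops at the first closing quote, so the match never runs past the end of the attribute value.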

It is less convenient if there are many such files. There is a function
Search: Search and replace in files,
but unfortunately it doesn't support regular expressions.

If the phrase gv_textk= is only used in this meaning and is unambiguous, you can first search for this plain substring:
gv_textk=
in the given directory (Search scope: (o) Selected directory, possibly restricted to a certain file type, or *.*).

In the results list, click the "paper-sheet" icon (Open results in new document).

Then you can apply the regular expression pattern mentioned at the beginning to this pre-filtered data:

gv_textk="[^"]*"

(use the Copy function)
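The two-step approach above (a plain-substring search in files, then the regex on the collected lines) can be mirrored in a short script. This is a minimal Python 3 sketch, not PSPad's own behaviour: it assumes UTF-8 files directly inside one folder, and the names prefilter, extract and the *.html default are hypothetical.

```python
import re
from pathlib import Path

NEEDLE = "gv_textk="                       # plain substring for step 1
PATTERN = re.compile(r'gv_textk="[^"]*"')  # regex for step 2


def prefilter(directory, glob="*.html"):
    """Step 1: collect the lines that contain the plain substring.

    Only files directly inside `directory` matching `glob` are read
    (pass "*.*" to read every file). UTF-8 is an assumed encoding.
    """
    hits = []
    for path in sorted(Path(directory).glob(glob)):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if NEEDLE in line:
                    hits.append(line.rstrip("\n"))
    return hits


def extract(lines):
    """Step 2: run the regex over the pre-filtered lines only."""
    return [match for line in lines for match in PATTERN.findall(line)]
```

For example, extract(prefilter("html_folder")) returns the full gv_textk="..." matches, where "html_folder" stands for your own directory.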

If you then need only the quoted content, run one more regex replacement on the copied output:
search for:
gv_textk="([^"]*)"
replace with:
$1
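The same capture-and-replace step in Python 3, as a sketch (the function name strip_to_value is hypothetical). Python's re.sub writes the backreference as \1 where PSPad uses $1:

```python
import re

# The pattern from the post, with a capturing group around the quoted value.
PATTERN = re.compile(r'gv_textk="([^"]*)"')


def strip_to_value(line):
    """Replace each gv_textk="..." match with just its captured value."""
    # r"\1" refers to the first capturing group (PSPad's $1).
    return PATTERN.sub(r"\1", line)


copied = ['gv_textk="zz_example_text.123"', 'gv_textk="other.7"']
print([strip_to_value(s) for s in copied])  # ['zz_example_text.123', 'other.7']
```

Text outside the match is left unchanged, just as with the replace in PSPad.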

If your data is more complex, contains linebreaks within the "interesting" attributes, or has multiple matches on some lines, it would be harder to extract this way.
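For these harder cases, a small script may be simpler than the editor. A minimal Python 3 sketch, assuming UTF-8 HTML files directly inside one folder; the function names, the *.html filter, the optional whitespace around "=" and the output file name are all hypothetical. Each file is read as a whole, so a value broken across lines is still found, and every match on a line is reported. The result is written as a two-column CSV file (file name, value) that can be opened in Excel:

```python
import csv
import re
from pathlib import Path

# One capturing group, so findall() returns only the quoted values.
# [^"]* also matches newlines, so a value may span a linebreak.
# Allowing spaces around "=" is an assumption about the HTML formatting.
PATTERN = re.compile(r'gv_textk\s*=\s*"([^"]*)"')


def values_in_text(text):
    """Return the quoted value of every gv_textk attribute in text."""
    return PATTERN.findall(text)


def collect(directory, glob="*.html"):
    """Return (file name, value) pairs for all matching files in directory."""
    rows = []
    for path in sorted(Path(directory).glob(glob)):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            rows.extend((path.name, value) for value in values_in_text(text))
    return rows


def write_csv(rows, out_path):
    """Write the pairs to a CSV file; the UTF-8 BOM helps Excel detect the encoding."""
    with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "gv_textk"])
        writer.writerows(rows)
```

For example, write_csv(collect("html_folder"), "gv_textk.csv"), with your own folder path. Depending on regional settings, Excel may expect a semicolon separator; csv.writer(f, delimiter=";") would produce that instead.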

hth,
vbr
