Re: How to extract all links from plain text?

#31 Re: How to extract all links from plain text?

Posted by: pspad | Date: 2020-01-19 11:35 | IP: IP Logged

OK. In the Edit menu / Special conversion there are these options:
URL -> text
text -> URL

But I found that it doesn't work correctly for characters such as Chinese. I will fix it, and the fix will be available in the next developer build.

I will also check whether the other conversions there are fully Unicode-ready.



#32 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-19 13:32 | IP: IP Logged

PSPad changes the encoding incorrectly:
Percent-encoding to Unicode (Current Encoding)

What should work:
Percent-encoding to Unicode (UTF-8)

Edited 1 time(s). Last edit at 2020-01-19 13:33 by maki.



#33 Re: How to extract all links from plain text?

Posted by: pspad | Date: 2020-01-19 14:40 | IP: IP Logged

I don't know what percent-encoding is, or why it should be UTF-8. Where?

Please, when you write anything, try to write it so that I can answer you without any further questions.



#34 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-19 15:54 | IP: IP Logged

In PSPad you will not see the name if it is Unicode.
However, choosing the regular Unicode encoding will corrupt the characters. UTF-8 must be used here. URL encoding replaces unsafe characters with a "%" followed by two hexadecimal digits; a non-ASCII character is first converted to bytes (normally UTF-8), and each byte is written this way.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
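
To illustrate why the charset matters, here is a rough sketch in Python. It is not PSPad's own conversion code; the sample string, the variable names and the choice of cp1250 as the "wrong" code page are all assumptions made for the example. Percent-encoding works on bytes, so the decoder has to assume the same charset the encoder used.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "中文 page"  # hypothetical sample: Chinese characters plus a space

# quote() converts the text to UTF-8 bytes by default and writes each
# unsafe byte as "%" plus two hexadecimal digits; a space becomes %20.
encoded = quote(text)
print(encoded)  # %E4%B8%AD%E6%96%87%20page

# quote_plus() follows the form-style convention where a space becomes "+".
encoded_plus = quote_plus(text)
print(encoded_plus)  # %E4%B8%AD%E6%96%87+page

# Decoding as UTF-8 (the default) restores the original text.
decoded = unquote(encoded)
decoded_plus = unquote_plus(encoded_plus)

# Decoding the same bytes with a single-byte code page (cp1250 is only an
# example) yields different, garbled characters.
garbled = unquote(encoded, encoding="cp1250", errors="replace")
print(decoded == text, garbled == text)  # True False
```

Note that unquote() leaves "+" alone; only unquote_plus() turns it back into a space.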

Edited 1 time(s). Last edit at 2020-01-19 15:55 by maki.



#35 Re: How to extract all links from plain text?

Posted by: pspad | Date: 2020-01-19 16:20 | IP: IP Logged

What are you running to get the URL-safe encoding?



#36 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-21 11:46 | IP: IP Logged

@pspad - thanks for the help. Your regex is good but still not perfect; you should try to improve it.

Example: it detects this invalid URL:

wwwww.1.com



#37 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-21 12:39 | IP: IP Logged

maki:
@pspad - thanks for the help. Your regex is good but still not perfect; you should try to improve it.

Example: it detects this invalid URL:

wwwww.1.com

maybe???

w{3,3}
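
A quick way to see what that quantifier does, sketched in Python's re module (not PSPad's regex engine, so details may differ). The pattern with \b is a hypothetical variant added for comparison, not something from this thread. On its own, w{3,3} is the same as w{3}, and without an anchor it still finds "www." inside "wwwww.".

```python
import re

# "w{3,3}" means "exactly three w" and is equivalent to "w{3}".
unanchored = re.compile(r"w{3,3}\.")
# Hypothetical variant: \b forces the three w's to start at a word boundary.
anchored = re.compile(r"\bw{3}\.")

results = {}
for sample in ["www.1.com", "wwwww.1.com", "1.com"]:
    results[sample] = (bool(unanchored.search(sample)),
                       bool(anchored.search(sample)))
    print(sample, *results[sample])

# www.1.com True True
# wwwww.1.com True False
# 1.com False False
```

The last line shows the trade-off: a pattern that insists on www also stops matching 1.com.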



#38 Re: How to extract all links from plain text?

Posted by: pspad | Date: 2020-01-21 12:48 | IP: IP Logged

maki:

maybe???

w{3,3}

What about the case where the text contains only 1.com? Is that a valid URL or not?
The www in a URL isn't mandatory.
I don't think you can create a universal expression that will handle all of your possible or broken URLs.
Do it in 2 steps:
1. extract anything from your text that looks like a URL
2. check its validity
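
The two steps above could be sketched like this in Python. This is only one possible reading of the idea: the function names, the candidate pattern and the validity rule (hostname labels of letters, digits and hyphens, with an alphabetic last label) are assumptions for illustration, and what counts as "valid" is a policy you have to choose yourself.

```python
import re
from urllib.parse import urlsplit

# Step 1 pattern (assumed): any run of non-space characters containing a dot.
CANDIDATE = re.compile(r"[^\s<>\"']+\.[^\s<>\"']+")
# One hostname label: 1-63 letters, digits or hyphens, no hyphen at the ends.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def extract_candidates(text):
    """Step 1: collect anything that looks like a URL, minus trailing punctuation."""
    return [m.rstrip(".,;:!?") for m in CANDIDATE.findall(text)]

def is_valid(candidate):
    """Step 2: a deliberately simple validity check on the hostname only."""
    url = candidate if "://" in candidate else "http://" + candidate
    try:
        host = urlsplit(url).hostname
    except ValueError:  # raised for malformed bracketed hosts
        return False
    if not host:
        return False
    labels = host.split(".")
    if len(labels) < 2 or not all(LABEL.fullmatch(l) for l in labels):
        return False
    return labels[-1].isalpha() and len(labels[-1]) >= 2

text = "See www.example.com, 1.com and broken..com or http://com.pl today."
candidates = extract_candidates(text)
valid = [c for c in candidates if is_valid(c)]
print(candidates)  # ['www.example.com', '1.com', 'broken..com', 'http://com.pl']
print(valid)       # ['www.example.com', '1.com', 'http://com.pl']
```

With this rule 1.com passes and so would wwwww.1.com, because both are syntactically fine hostnames; rejecting one of them needs an extra rule in step 2, not a cleverer extraction pattern.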

Edited 1 time(s). Last edit at 2020-01-21 12:48 by pspad.



#39 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-22 09:22 | IP: IP Logged

As the old Russian proverb says: "You won't drink all the vodka, you won't have all the women, but you have to try!"

That's why it's not worth giving up. Genius apparently lies in simplicity, so I keep trying.

mathiasbynens.be/demo/url-regex

Edited 1 time(s). Last edit at 2020-01-22 09:23 by maki.



#40 Re: How to extract all links from plain text?

Posted by: maki | Date: 2020-01-26 18:47 | IP: IP Logged

Quote:

I don't think you can create a universal expression that will handle all of your possible or broken URLs.

Everything is possible, and today I succeeded.
Of course, the syntax will not work in PSPad.
208 characters

<a alt="<>" href="http://www.stairws.com">
w5ww.com www.com http://com.pl
www.com
wwww.com
1.com

Tested in Notepad++: it works. For today, this is the perfect regular expression for matching URLs.

Edited 3 time(s). Last edit at 2020-01-26 18:50 by maki.


