HTML to TXT (Unicode/Cyrillic Support)

#1 HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 08:51

I am looking for a good HTML-to-TXT converter.
I tried several tools, but they all damaged the Russian Cyrillic text.
Do you know of any good tool that handles Cyrillic?

Size: 220 MB of HTML

Any help or ideas would be highly appreciated.

In addition, none of the tools I tested can handle a large HTML file. Every one of them crashes!

Edited 1 time(s). Last edit at 2020-01-25 08:51 by maki.


#2 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 09:42

Also, is it possible to convert many thousands of HTML files (CP-1251) to TXT (UTF-8) in one go? How can I do it?
I want to create one plain text file from many thousands of HTML files. Unfortunately, none of the tools I tried can handle CP-1251 encoding, and none can handle this much data (220 MB of HTML in total).
How can I deal with all these problems?


#3 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: pspad | Date: 2020-01-25 09:44

I just tested this with a 112 MB Cyrillic file in 32-bit PSPad.
Remove tags works without any problem and didn't break the Cyrillic encoding.
It took 22 s and I got plain text in Cyrillic.

Another way is to open your file in a browser, select all, copy the text and paste it into an editor.


#4 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: pspad | Date: 2020-01-25 09:52

maki:
Also, is it possible to convert many thousands of HTML files (CP-1251) to TXT (UTF-8) in one go? How can I do it?
I want to create one plain text file from many thousands of HTML files. Unfortunately, none of the tools I tried can handle CP-1251 encoding, and none can handle this much data (220 MB of HTML in total).
How can I deal with all these problems?

Use PSPad batch encoding from the encoding menu.
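
For readers who prefer a script to the editor's menu, the same kind of batch re-encoding can be sketched in Python. This is an illustration only, not how PSPad does it: the function names `reencode_file` and `reencode_tree`, the `*.html` pattern and the chunk size are all assumptions. Reading in chunks keeps memory use flat even when a single file is very large.

```python
from pathlib import Path

CHUNK_CHARS = 1 << 20  # assumed chunk size: about one million characters per read


def reencode_file(src: Path, dst: Path) -> None:
    """Copy src to dst, decoding CP-1251 and writing UTF-8, chunk by chunk."""
    # newline="" leaves the original line endings untouched on both sides.
    # errors="replace" turns the few bytes CP-1251 does not define into U+FFFD.
    with open(src, "r", encoding="cp1251", errors="replace", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(CHUNK_CHARS)
            if not chunk:
                break
            fout.write(chunk)


def reencode_tree(src_dir: Path, dst_dir: Path) -> int:
    """Re-encode every *.html file under src_dir into the same layout under dst_dir.

    dst_dir is assumed to lie outside src_dir, so freshly written files
    are never picked up as input. Returns the number of files converted.
    """
    count = 0
    for src in sorted(src_dir.rglob("*.html")):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        reencode_file(src, dst)
        count += 1
    return count
```

A call such as `reencode_tree(Path("html_cp1251"), Path("html_utf8"))` (both directory names are hypothetical) would convert a whole folder. Any charset declaration inside the files still names windows-1251 afterwards, which matters only if the HTML is viewed again rather than stripped to text.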


#5 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 09:54

I want the text to be separated correctly.
Don't mix everything.


#6 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 09:56

No 64-bit browser can handle such a file. It always ends in a crash ("Aw, Snap!" etc.), an error or a freeze.
File too big => "Not responding" message

Edited 3 time(s). Last edit at 2020-01-25 09:57 by maki.


#7 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 09:59

PSPad => File => Import All File(s)... Missing option

PSPad option HTML merge "strip headers/footers" ???

Edited 1 time(s). Last edit at 2020-01-25 10:01 by maki.


#8 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: maki | Date: 2020-01-25 10:06

How can the pattern ALSO cover JavaScript code?

<("[^"]*"|'[^']*'|[^'">])*>

And fix &nbsp;

Example &nbsp; &nbsp;&nbsp; &nbsp;Любой объект, реально
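
The pattern above matches individual tags, but the code between `<script>` and `</script>` is ordinary text to it and would survive. One possible way to drop that code and tidy `&nbsp;` at the same time is to swap the regular expression for Python's standard `html.parser`, as in this minimal sketch. The names `TextExtractor` and `html_to_text` are made up for the example, and collapsing runs of spaces into one is an assumption about what "fix" means here.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring everything inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup: str) -> str:
    """Return the visible text of markup, with &nbsp; turned into plain spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts).replace("\u00a0", " ")
    # Collapse runs of spaces and tabs; line breaks are left alone.
    return re.sub(r"[ \t]+", " ", text).strip()


sample = '<p>Example &nbsp; &nbsp;&nbsp; &nbsp;Любой объект</p><script>var a = "<b>";</script>'
print(html_to_text(sample))  # prints: Example Любой объект
```

The sketch does not insert line breaks between block elements such as paragraphs, so adjacent blocks run together unless the source already contains newlines.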

Edited 3 time(s). Last edit at 2020-01-25 10:10 by maki.


#9 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: pspad | Date: 2020-01-25 10:10

maki:
I want the text to be separated correctly.
Don't mix everything.

Thank you for the explanation, very exact as usual.
Find another tool. I am sorry, but I don't have time to write tens of questions to get any useful information about the problem from you.

Open your big file in a web browser and copy the text from the browser window. This is the best way to get the exact text.


#10 Re: HTML to TXT (Unicode/Cyrillic Support)

Posted by: pspad | Date: 2020-01-25 10:11

maki:
PSPad => File => Import All File(s)... Missing option

PSPad option HTML merge "strip headers/footers" ???

Why? It isn't the MAKI editor. PSPad doesn't serve only you.
Write a script for it.
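
Such a script could take many forms. As one rough sketch in Python (not a PSPad script), the following merges a folder of CP-1251 HTML files into a single UTF-8 text file, with a separator line before each file's text so the sources stay apart. The names `strip_html` and `merge_to_text`, the `=====` separator and the `*.html` pattern are assumptions, and the regex-based tag removal is deliberately crude.

```python
import html
import re
from pathlib import Path

# Whole <script>...</script> and <style>...</style> blocks, including their content.
BLOCK_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag; crude, since a ">" inside an attribute value ends the match early.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(markup: str) -> str:
    """Remove script/style blocks and tags, then decode entities such as &nbsp;."""
    text = TAG_RE.sub("", BLOCK_RE.sub("", markup))
    return html.unescape(text).replace("\u00a0", " ")


def merge_to_text(src_dir: Path, out_file: Path) -> int:
    """Write the text of every *.html file under src_dir into out_file (UTF-8).

    Each file is read whole, which is fine for many small files but not
    for a single huge one. Returns the number of files merged.
    """
    count = 0
    with out_file.open("w", encoding="utf-8") as out:
        for src in sorted(src_dir.rglob("*.html")):
            markup = src.read_text(encoding="cp1251", errors="replace")
            out.write(f"===== {src.name} =====\n")  # assumed separator format
            out.write(strip_html(markup).strip() + "\n\n")
            count += 1
    return count
```

Files are processed in sorted path order, so the merged output is reproducible from run to run.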

