HTML to TXT (Unicode/Cyrillic Support)
Posted by: maki | Date: 2020-01-25 08:51 | IP: IP Logged
I am looking for a good HTML to TXT converter.
I tried several tools, but they all damaged the Russian Cyrillic text.
Do you know any good tool that handles Cyrillic?
Size: 220 MB of HTML
Any help or ideas would be highly appreciated.
In addition, none of the tools I tested can handle a large HTML file. Every tool crashes!
Edited 1 time(s). Last edit at 2020-01-25 08:51 by maki.
Posted by: maki | Date: 2020-01-25 09:42 | IP: IP Logged
Also, is it possible to convert many thousands of HTML files (CP-1251) to TXT (UTF-8) at once? How would I do it?
I want to create one plain text file from many thousands of HTML files. Unfortunately, none of the tools can handle CP-1251 encoding, and none can handle this much data: 220 MB (total HTML size).
How can I deal with all these problems?
Posted by: pspad | Date: 2020-01-25 09:44 | IP: IP Logged
Just tested with a 112 MB Cyrillic file in 32-bit PSPad.
Remove tags works without any problem and didn't break the Cyrillic encoding.
It took 22 s and I got plain text in Cyrillic.
Another way is to open your file in a browser, select all, copy the text, and paste it into an editor.
Posted by: pspad | Date: 2020-01-25 09:52 | IP: IP Logged
maki: Also, is it possible to convert many thousands of HTML files (CP-1251) to TXT (UTF-8) at once? How would I do it?
I want to create one plain text file from many thousands of HTML files. Unfortunately, none of the tools can handle CP-1251 encoding, and none can handle this much data: 220 MB (total HTML size).
How can I deal with all these problems?
Use PSPad batch encoding from the encoding menu.
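
Outside PSPad, the same CP-1251 to UTF-8 re-encoding step can be sketched in a few lines of Python. This is only an illustration, not what PSPad does internally: the function names `reencode_file` and `reencode_tree`, the `*.html` pattern, and the 1 Mi-character chunk size are all assumptions. Reading in chunks keeps memory use flat even for files of 100 MB or more.

```python
from pathlib import Path


def reencode_file(src: Path, dst: Path, src_enc: str = "cp1251",
                  dst_enc: str = "utf-8", chunk_chars: int = 1 << 20) -> None:
    """Copy src to dst, decoding as src_enc and encoding as dst_enc chunk by chunk."""
    # newline="" disables newline translation, so line endings are preserved.
    with open(src, "r", encoding=src_enc, newline="") as fin, \
         open(dst, "w", encoding=dst_enc, newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)


def reencode_tree(src_dir: Path, dst_dir: Path, pattern: str = "*.html") -> int:
    """Re-encode every file in src_dir matching pattern into dst_dir; return the count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        reencode_file(src, dst_dir / src.name)
        count += 1
    return count
```

Note that this changes only the bytes on disk. Any charset declaration inside the HTML itself (for example a `windows-1251` meta tag) is left as written, and a byte that is undefined in CP-1251 raises `UnicodeDecodeError` instead of being silently replaced.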
Posted by: maki | Date: 2020-01-25 09:54 | IP: IP Logged
I want the text to be separated correctly.
Don't mix everything.
Posted by: maki | Date: 2020-01-25 09:56 | IP: IP Logged
No 64-bit browser can handle such a file. It always crashes: Aw Snap, etc. Various errors and freezes.
File too big => "Not responding" message
Edited 3 time(s). Last edit at 2020-01-25 09:57 by maki.
Posted by: maki | Date: 2020-01-25 09:59 | IP: IP Logged
PSPad => File => Import All File(s)... Missing option
PSPad option HTML merge "strip headers/footers" ???
Edited 1 time(s). Last edit at 2020-01-25 10:01 by maki.
Posted by: maki | Date: 2020-01-25 10:06 | IP: IP Logged
How can I ALSO include JavaScript code in the match?
<("[^"]*"|'[^']*'|[^'">])*>
nd fix
Example Любой объект, реально
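
One possible reading of this question is that whole `<script>...</script>` elements should be removed together with the tags. A minimal Python sketch under that assumption follows. `TAG_RE` is the pattern quoted above, unchanged; `SCRIPT_RE` and `strip_markup` are hypothetical names introduced here. Regular expressions only approximate HTML parsing, so this may misbehave on malformed markup.

```python
import re

# The tag pattern from the post, unchanged: it skips over quoted attribute
# values so that a ">" inside quotes does not end the tag early.
TAG_RE = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")

# Assumption: "JavaScript code" means complete <script>...</script> elements.
# DOTALL lets "." cross line breaks; ".*?" stops at the first closing tag.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def strip_markup(html: str) -> str:
    """Remove script elements first, then every remaining tag."""
    without_scripts = SCRIPT_RE.sub("", html)
    return TAG_RE.sub("", without_scripts)


sample = ('<p>Любой объект, реально</p>'
          '<script type="text/javascript">var a = "<b>";</script>')
print(strip_markup(sample))  # prints: Любой объект, реально
```

The order matters: with the tag pattern alone, the script body `var a = "";` would remain in the output as text.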
Edited 3 time(s). Last edit at 2020-01-25 10:10 by maki.
Posted by: pspad | Date: 2020-01-25 10:10 | IP: IP Logged
maki: I want the text to be separated correctly.
Don't mix everything.
Thank you for the explanation, as usual very exact.
Find another tool. I am sorry, but I don't have time to write dozens of questions to get any useful information about the problem from you.
Browse your big file in a web browser and copy the text from the browser window. This is the best way to get the exact text.
Posted by: pspad | Date: 2020-01-25 10:11 | IP: IP Logged
maki: PSPad => File => Import All File(s)... Missing option
PSPad option HTML merge "strip headers/footers" ???
Why? It isn't the MAKI editor. PSPad doesn't serve only you.
Write a script for it.
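
As a rough idea of what such a script could look like, here is a sketch in standalone Python (not a PSPad script). It reads each CP-1251 HTML file, extracts the visible text with the standard library's `html.parser`, and appends it to one UTF-8 file with a separator line per source file. The names `TextExtractor`, `html_to_text`, and `merge_html_files`, the `===== name =====` separator, and the `*.html` pattern are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "\n".join(parser.parts)


def merge_html_files(src_dir: Path, out_file: Path, src_enc: str = "cp1251") -> int:
    """Write the text of every *.html file in src_dir to out_file as UTF-8."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(src_dir.glob("*.html")):
            # errors="replace" keeps going if a byte is not valid CP-1251.
            html = path.read_text(encoding=src_enc, errors="replace")
            out.write(f"===== {path.name} =====\n")  # separator between files
            out.write(html_to_text(html))
            out.write("\n\n")
            count += 1
    return count
```

Each input file is processed on its own, so memory use depends on the largest single file and not on the 220 MB total. A call such as `merge_html_files(Path("pages"), Path("all.txt"))` would be typical usage, with both paths being placeholders.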