
Re: encoding wrongly detected on large files

#1 encoding wrongly detected on large files

Posted by: JoKalliauer | Date: 2019-04-28 17:48 | IP: IP Logged

If I open this file (https://homepage.boku.ac.at/jokalliauer/Internet/Bug.svg), PSPad detects "ANSI Western European (1252)", but the file is UTF-8 encoded.

However, if I open this file (https://homepage.boku.ac.at/jokalliauer/Internet/NoBug.svg), PSPad detects "Unicode UTF-8 no BOM (65001)".

The file starts with `<?xml version="1.0" encoding="UTF-8"?>`, therefore the encoding should be clear.
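
To illustrate what "clear from the header" could mean in practice, here is a minimal Python sketch that reads the encoding name from an XML declaration. It is not PSPad's code; the function name `declared_xml_encoding` is made up for this example, and the sketch only handles ASCII-compatible encodings (a UTF-16 file would need BOM handling first).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_xml_encoding(head: bytes):
    """Return the declared encoding name, or None if nothing is declared."""
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 BOM if present
        head = head[3:]
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else None

print(declared_xml_encoding(b'<?xml version="1.0" encoding="UTF-8"?><svg/>'))
```

Only the first few hundred bytes of the file are needed for this check, so the file size does not matter.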



#2 Re: encoding wrongly detected on large files

Posted by: pspad | Date: 2019-04-28 18:03 | IP: IP Logged

PSPad uses roughly the first 10 kB of text for autodetection. Because your file contains an encoded image and the few UTF-8 encoded characters are at the end, PSPad does not detect it.
Try putting some UTF-8 encoded characters before the image.

The header information is not relevant for autodetection; PSPad is not a web browser that interprets the file content.

You can change the code page to UTF-8 at any time in the Encoding menu and reload the file (Ctrl+R).
If you have enabled the option to save file state, PSPad will remember the code page for this file.
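
The effect of a limited detection window can be reproduced with a small Python sketch. This is only an assumed heuristic for illustration, not PSPad's actual algorithm; `guess_encoding` and its `sample_size` parameter are hypothetical names. A sample that is pure ASCII gives no evidence for UTF-8, so the sketch falls back to the ANSI code page.

```python
import codecs

def guess_encoding(data: bytes, sample_size: int = 10_000) -> str:
    """Guess 'utf-8' or 'cp1252' by looking only at the first sample_size bytes."""
    sample = data[:sample_size]
    if sample.isascii():
        return "cp1252"  # no non-ASCII bytes seen: fall back to ANSI
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the sample end
        decoder.decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"

# A large ASCII body with a single UTF-8 character near the end of the file
data = b"<svg>" + b"A" * 20_000 + "ä".encode("utf-8") + b"</svg>"
print(guess_encoding(data))                         # window misses the character
print(guess_encoding(data, sample_size=len(data)))  # window covers the whole file
```

With the default window the non-ASCII character is never seen; placing one inside the first 10 kB, as suggested above, changes the result.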



#3 Re: encoding wrongly detected on large files

Posted by: JoKalliauer | Date: 2019-04-29 15:19 | IP: IP Logged

Thank you for your fast answer!

PSPad is a lightweight editor that can edit SVG files from 1 MB to 1 GB, but I do not know whether there are any special Unicode characters in a file (since I only modify the files marginally), and I would like to open SVGs as UTF-8 by default.

Regarding adding a Unicode character:
I am processing many SVG files (mostly UTF-8) on Wikimedia Commons, which are then re-uploaded. Changing the content (e.g. adding a comment) just to work around a very uncommon software problem (Inkscape, Illustrator, Notepad++, common browsers, ... do not have this problem) is undesirable.

PSPad recognises file types (for highlighting), therefore I would like to set the default encoding according to the file type. (SVGs are almost always UTF-8, even old ones.)

If this is not implemented:
How can I change the default encoding for all files I open? (I do not want any autodetection at all.)
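
What I have in mind for a per-file-type default could look roughly like this Python sketch. The mapping and the name `default_encoding_for` are purely hypothetical and do not correspond to any existing PSPad setting.

```python
import os

# Hypothetical per-extension defaults; anything else uses the fallback.
DEFAULT_ENCODINGS = {
    ".svg": "utf-8",
    ".xml": "utf-8",
}

def default_encoding_for(path: str, fallback: str = "cp1252") -> str:
    """Pick an encoding from the file extension, without any autodetection."""
    extension = os.path.splitext(path)[1].lower()
    return DEFAULT_ENCODINGS.get(extension, fallback)

print(default_encoding_for("Bug.svg"))
print(default_encoding_for("notes.txt"))
```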



#4 Re: encoding wrongly detected on large files

Posted by: pspad | Date: 2019-04-29 15:25 | IP: IP Logged

Program settings / files
The default code page (CP) is set there.
You can also use a project, where you can specify the CP as well.



#5 Re: encoding wrongly detected on large files

Posted by: Andreas | Date: 2019-04-29 15:41 | IP: IP Logged

I have no problem opening your first linked file. It shows UTF-8 no BOM.

Here are my settings:
- format for new files: UNIX
- standard code page: UTF-8 no BOM

Toolbar settings:
- Coding: UTF-8 no BOM
- Coding: auto coding - deactivated
- Format: UNIX

If you test this, keep in mind that once you have opened a file and PSPad shows an unwanted code page, PSPad saves this somewhere. So when you test my settings, you always have to test with a new file. For example, you can use a copy of your file.

PS: I think the UNIX setting is irrelevant; I included it just for completeness.



#6 Re: encoding wrongly detected on large files

Posted by: Andreas | Date: 2019-04-29 15:44 | IP: IP Logged

pspad:
Program settings / files
There is default CP...

This will not work if automatic encoding detection is activated. With autodetection, his file shows as ANSI anyway.



#7 Re: encoding wrongly detected on large files

Posted by: viplex | Date: 2019-08-06 07:06 | IP: IP Logged

Hi,
I am having issues with the automatic encoding detection too. Personally, I am not sure that 10 kB is a good threshold for the detection. In my case I have both an HTML and a PHP file of about 35 kB; they have some UTF-8 characters in the middle and at the end, which are always broken by PSPad.

The problem is that I don't notice it until it's too late and the PHP file is causing fatal errors in the parser.

Do you think bumping up the limit would have a noticeable performance impact? Maybe the check could run asynchronously rather than at file open? UTF-8 is really widespread and IMHO this should be considered. I have solved the issue temporarily by putting an emoji at the start of my files, but that only covers files I already know will break, not the rest.

I would imagine that something like 1 or 5 MB should do the job for most files, and it would have no impact on smaller files anyway.
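
As a rough Python sketch of the idea (my own illustration, not how PSPad works internally): the file is read in chunks and validated as UTF-8 up to a configurable limit, so small files are checked completely and large files stop at the limit. The names `is_valid_utf8`, `limit` and `chunk_size` are made up for this example.

```python
import codecs
import io

def is_valid_utf8(stream, limit=5 * 1024 * 1024, chunk_size=64 * 1024):
    """Return True if the first `limit` bytes of a binary stream decode as UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # End of file: final=True rejects a truncated trailing sequence.
            try:
                decoder.decode(b"", final=True)
            except UnicodeDecodeError:
                return False
            return True
        try:
            # The incremental decoder copes with characters split across chunks.
            decoder.decode(chunk)
        except UnicodeDecodeError:
            return False
        remaining -= len(chunk)
    return True  # limit reached without finding an invalid byte sequence

# A 35 kB ASCII body followed by a cp1252-encoded "é" (byte 0xE9)
ansi_file = b"a" * 35_000 + b"\xe9 latin text"
print(is_valid_utf8(io.BytesIO(ansi_file)))                # whole file is checked
print(is_valid_utf8(io.BytesIO(ansi_file), limit=10_000))  # stops before the 0xE9
```

Passing this check only shows that the bytes are well-formed UTF-8; for a pure ASCII file it cannot tell UTF-8 and ANSI apart.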

Best regards



#8 Re: encoding wrongly detected on large files

Posted by: JL2019 | Date: 2019-08-19 13:34 | IP: IP Logged

+10



#9 Re: encoding wrongly detected on large files

Posted by: pspad | Date: 2019-08-20 04:58 | IP: IP Logged

I will extend the range used for detection; this isn't a problem.
It will slow down file opening very slightly.






