Problems with spellchecker

#1 Problems with spellchecker

Posted by: myroslava | Date: 08/27/2004 19:56


I'm from Ukraine and I really like PsPad. Since there is no non-ispell spelling dictionary for Ukrainian, I decided to start putting one together... it's going to be a hell of a lot of work... I hope I'll get it finished SOME time, and I'm almost sorry I had this crazy idea...

But using the spellchecker with a VERY limited dictionary has revealed a number of bugs in it. I work like this: I copy some text into PsPad, it underlines almost everything :), and then I add words.

So, now the list of problems:

1) Sometimes the spellchecker doesn't suggest a replacement for a word, even though I know perfectly well that the dictionary contains a word I have just added, which differs from the word in question by ONE letter only (yes, I've saved the file and reloaded the dictionary). I must add, though, that this happens with one letter only: the newly (re)introduced "g" as opposed to "h" (roughly speaking).

2) Whenever there is a compound word with a hyphen (where both parts, before and after the hyphen, are NOT in the dictionary yet), the underline disappears as soon as the hyphen is typed. So a misspelled hyphenated word will not be flagged. I noticed this only because I know the limits of my dictionary; someone with a complete dictionary wouldn't notice it.

3) Sometimes the spellchecker treats one word as two separate words. If you right-click on the first part of an underlined word, it shows replacement suggestions for that first part only (e.g. about 4 characters long), and if you click on the second part, it shows suggestions for the second part. There's no way of knowing how this could affect spellchecking with a complete dictionary. Similarly, if you choose "add to dictionary" from the context menu, it adds only part of the word. Strange, huh?

4) Case sensitivity. All words in the dictionary must be lowercase; otherwise the spellchecker doesn't recognize them (e.g. the name of a person or a town). So I started adding everything in lowercase, but this defeats the whole purpose: for example, if someone wrote a person's name starting with a lowercase letter, it would not be flagged and they wouldn't notice it...

5) Apostrophes. Words containing an apostrophe are underlined even though they HAVE been added to the dictionary. I don't know whether it treats such a word as two words or something else. With the English dictionary it doesn't recognize words like "man's" either, so maybe it doesn't treat the apostrophe as a word break, because then it would recognize "man" and underline only "s". Instead, it underlines the whole word.
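Point 4 could in principle be handled with a case-aware lookup instead of an all-lowercase dictionary. Below is a minimal sketch, assuming Python and a plain set of dictionary words; the function name `is_known` is hypothetical, and the rule it applies (lowercase entries also accept Capitalized and ALL-CAPS spellings, while entries stored with a capital must keep it) is just one possible policy, not how PsPad works.

```python
def is_known(word: str, dictionary: set[str]) -> bool:
    """Case-aware lookup (one possible policy, not PsPad's).

    - A lowercase entry ("file") accepts "file", "File" and "FILE"
      (start of a sentence, headings).
    - An entry stored with a capital ("Kyiv") accepts "Kyiv" and "KYIV",
      but NOT "kyiv", so a name typed in lowercase is still flagged.
    """
    if word in dictionary:
        return True
    lower = word.lower()
    # Capitalized or all-caps spelling of a lowercase dictionary entry.
    if (word == lower.capitalize() or word == word.upper()) and lower in dictionary:
        return True
    # All-caps spelling of a capitalized entry, e.g. "KYIV" for "Kyiv".
    if word == word.upper() and lower.capitalize() in dictionary:
        return True
    return False
```

With `{"file", "Kyiv"}` as the dictionary, "File", "FILE", "Kyiv" and "KYIV" are accepted, while "kyiv" is still underlined.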

I hope this helps. I'm not a programmer, so I can't offer concrete suggestions, sorry; only general ideas: treat everything between spaces as one word, treat the hyphen like a space (a word-breaking character, or whatever the correct programmer's term is), and do NOT treat the apostrophe as a word break, but recognize it as a character in its own right.
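The last two ideas (a hyphen breaks a word, an apostrophe does not) can be sketched with a regular expression. This assumes Python's `re` module and treats the ASCII apostrophe, U+2019 and U+02BC as apostrophes, since Ukrainian texts use all three; the name `tokenize` is hypothetical, and this is an illustration, not how PsPad or SynEdit actually split text into words.

```python
import re

# ASCII apostrophe, right single quotation mark, modifier letter apostrophe.
APOSTROPHES = "'\u2019\u02bc"

# A run of letters (any script: \w minus digits and underscore), optionally
# joined to further letter runs by a single apostrophe. Hyphens, spaces and
# punctuation are not part of the pattern, so they act as word breaks.
WORD_RE = re.compile(rf"[^\W\d_]+(?:[{APOSTROPHES}][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Return the words a spellchecker would look up, in order."""
    return WORD_RE.findall(text)
```

For example, `tokenize("man's well-known file")` gives `["man's", "well", "known", "file"]`, and `tokenize("м'ята")` keeps the Ukrainian word whole, so the dictionary lookup sees "man's" rather than "man" plus "s".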

Good luck. Again, I enjoy using PsPad. Hope it gets better and better.

#2 Re: Problems with spellchecker

Posted by: pspad | Date: 08/27/2004 20:21

1) The spellchecker suggests a word if it begins with the same letter and the difference is in one or two other letters (missing, added, or changed).

4) Yes, this is a limitation.

2, 3, 5) I know about these. It will be better (I hope) with the next version of SynEdit (the editor's base component).
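The suggestion rule in point 1 can be sketched as follows, assuming "difference" means plain Levenshtein edit distance (letters missing, added or changed) and a dictionary held as a Python list. The names `edit_distance` and `suggest` are hypothetical, and PSPad's real matching may well differ; the sketch only mirrors the rule as stated.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest inserted, deleted or changed letters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a letter from a
                cur[j - 1] + 1,            # insert a letter into a
                prev[j - 1] + (ca != cb),  # change a letter (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_diff: int = 2) -> list[str]:
    """Entries with the same first letter and at most max_diff edits, closest first."""
    if not word:
        return []
    candidates = []
    for entry in dictionary:
        if entry and entry[0] == word[0]:  # same-first-letter rule
            d = edit_distance(word, entry)
            if d <= max_diff:
                candidates.append((d, entry))
    candidates.sort()
    return [entry for _, entry in candidates]
```

For example, `suggest("fle", ["file", "fine", "files", "mile"])` returns `["file", "files", "fine"]`, while `suggest("vile", ["file"])` returns an empty list even though the words differ by one letter. The first-letter filter keeps the candidate set small, at the cost of never correcting a mistyped first letter.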

If you need a dictionary, try looking for ISpell or OpenOffice spellcheck dictionaries.
When I created the Czech dictionary, I started with a dictionary of some 10,000 words. I took some electronic books and articles from the Internet and grew it to 320,000 words. And because I know Czech grammar, I wrote a special loader that generates the other word forms, so the Czech dictionary now has over 700,000 effective words...
It takes a long time and many hours of work...

#3 Re: Problems with spellchecker

Posted by: myroslava | Date: 08/27/2004 20:55

1) What if a person misspells the first letter? That happens. Also, sometimes it offers suggestions that differ by 3 letters. So I think the first-letter problem could theoretically be solved (a 3-letter difference means it has rather broad matching capabilities... but maybe that's only because the dictionary is small, so memory can handle it).

4) OK, nothing to be done there.

2, 3, 5) OK, understood; I'll wait. Anyway, my dictionary won't be ready any time soon either :) As they say, patience is a virtue :)

I have an ispell dictionary for OpenOffice (I'm an OOo user), but it would be a pain in the posterior to strip the affix (.aff) flags after each word and then redo the word forms. The Ukrainian ispell dictionary does seem to work rather well, but having looked inside it, I'd prefer to start from scratch (it would take too long to explain, but some things in it are stupid). And again, I don't know the programming tricks to automate generating word forms. I know grammar too :), but I approach it from a linguist's point of view, not a programmer's: I don't add the whole paradigm of technically possible word forms when the word can never be used in those forms (there are lots of incomplete verb paradigms).
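Stripping the affix flags themselves is easy to script. This is a minimal sketch, assuming the OpenOffice dictionary is a MySpell/Hunspell-style `.dic` file whose first line is the entry count and whose other lines look like `word` or `word/FLAGS`; escaped slashes are not handled, and the names `strip_affix_flags` and `convert` are hypothetical. Note that removing the flags keeps only the base forms: the word forms the flags would have generated are lost and must be added back.

```python
from pathlib import Path


def strip_affix_flags(dic_lines: list[str]) -> list[str]:
    """Turn .dic lines such as "word/FLAGS" into bare words, in file order.

    Skips the first line (the entry count) and blank lines; anything after
    the first whitespace-separated field (e.g. morphological data) is dropped.
    """
    words = []
    for line in dic_lines[1:]:
        fields = line.split()
        if not fields:
            continue
        words.append(fields[0].split("/", 1)[0])
    return words


def convert(dic_path: str, out_path: str, encoding: str) -> None:
    # The .dic encoding is declared by the SET line of the matching .aff file;
    # pass that value as `encoding`. The output is written as UTF-8.
    lines = Path(dic_path).read_text(encoding=encoding).splitlines()
    Path(out_path).write_text("\n".join(strip_affix_flags(lines)) + "\n", encoding="utf-8")
```

For example, the lines `["3", "файл/AB", "Київ/C", "м'ята"]` become `["файл", "Київ", "м'ята"]`.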

After a couple of weekends, my dictionary has slightly over 15,000 word forms. Yes, this takes long hours of work :)

Well, thanks for the answer, and thanks for making PsPad. It really helps with my HTML coding :)

#4 Re: Problems with spellchecker

Posted by: myroslava | Date: 08/27/2004 21:03

Oh, yeah, and I think the Ukrainian interface needs some corrections. I can't promise to do it soon, but some time in the future...

We are in the terrible position of a newly independent country whose language was suppressed for a long time. So now different TV channels use different inflections (in some cases), there are no unified grammar and spelling rules, and there is that problem with the letter "g", which many people don't know when to use. On top of that, some people want to "Ukrainize" foreign technical terms (like "file"), and the Ukrainian PsPad interface uses a couple of *different* versions of some terms ("file" among them), which is confusing. I'll correct some things and unify the language across the interface. Again, I can't promise to do it soon.

#5 Re: Problems with spellchecker

Posted by: Stefan | Date: 08/29/2004 22:36

> When I create Czech dictionary

Jan, how did you sort out duplicate words?
Do you have a tool? I'm looking for such a tool too.

Will you please share your experience with me?

greets, Stefan

#6 Re: Problems with spellchecker

Posted by: pspad | Date: 08/29/2004 22:58

I use the Sort function in PSPad (menu Edit / Sort) with
[x] remove duplicates
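For word lists coming from several sources, roughly the same sort-and-remove-duplicates step can be scripted. A minimal sketch, assuming Python and word lists already loaded as lists of lines; the name `merge_word_lists` is hypothetical. Two caveats: duplicates are matched exactly (so "Kyiv" and "kyiv" both stay), and sorting is by Unicode code point, so uppercase sorts before lowercase and the Ukrainian letters є, і, ї, ґ end up after я rather than in alphabetical order.

```python
def merge_word_lists(*lists: list[str]) -> list[str]:
    """Combine word lists, drop blank lines and exact duplicates, sort the rest.

    Sorting is by Unicode code point, not by any language's alphabet.
    """
    unique = {w.strip() for words in lists for w in words if w.strip()}
    return sorted(unique)
```

For example, `merge_word_lists(["file", "fine"], ["file", "", "Kyiv"])` returns `["Kyiv", "file", "fine"]`. A locale-aware collation would be needed for a properly alphabetized Ukrainian or Czech list.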

#7 Re: Problems with spellchecker

Posted by: Stefan | Date: 08/29/2004 23:15

> [x] remove duplicates

*yawn*

THX, I'm ashamed :D

greets, Stefan

#8 Re: Problems with spellchecker

Posted by: jekov | Date: 03/01/2011 16:47

Hi there,
I needed it, so I converted the Ukrainian spelling dictionary from EmEditor's dictionary, for spellchecking without an installed Office. If anyone needs it: uk_UA_Spell

