Script for dictionary creation

#1 Script for dictionary creation

Posted by: MrSpock | Date: 06/03/2006 14:43

Subject modified for primping
from "Small script to help with dictionary creation"
to "Script for dictionary creation"

--------------------------------------------

One notable difference between a computer spell-checking dictionary and a typical print dictionary is that the latter can be confined to the "basic" forms of the words it contains, while the former must list all the inflected forms too.

I've written a small and very simple script that might make the lexicographer's work a little easier by automatically adding inflection suffixes to a list of words (provided they are all inflected according to the same paradigm). That is to say, it could be used to convert a list comprising the two words "hot" and "flat" into the list "hot, hotter, hottest, flat, flatter, flattest". Uh, well, that's pretty much it. ;-) Not very impressive, to be sure, and it still leaves the vast majority of the work to the human user. But I believe that anything that goes beyond this rather trivial achievement is bound to become immensely complicated and language-specific.

Download the script.

This script can be easily understood and modified by non-programmers, and its basic idea can be used in every language I know, with more or less interesting results. For German and Latin (and Turkish, I'm told) it is quite useful.

Usage: Copy deklinator.vbs to your PSPad\Script\VBScript\ folder. Upon restarting PSPad, you will find a new submenu called "Grammatik" in your PSPad Script menu, with a single entry, "Deklinieren". If some text is selected, the script assumes it should operate on the selection. If nothing is selected, it assumes it should convert all text in the active window. Note: With very large files, this can take some time.
As long as you don't modify it for your own purposes, the script will generate five new lines for every line in your active window/selection when run. These lines will be copies of the old ones, with one or two letters attached at the end.
In its original form, the script expects to be run from a list of German adjectives in the active PSPad window. It just adds "-e", "-em", "-en" etc. to the lines in that very window. The words in the list must be separated by newline characters.
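
The behaviour described above can be sketched roughly as follows. This is a Python illustration, not the original deklinator.vbs (which is VBScript and works on the PSPad editor window); the names `decline_lines` and `run` are made up for this sketch, and skipping blank lines is an assumption. The post itself names only "-e", "-em", "-en" etc.; the full list of five suffixes is taken from the "blah" sample output given later in this thread.

```python
# Sketch only: a stand-in for the script's default behaviour, not its real code.
GERMAN_ADJECTIVE_SUFFIXES = ["e", "em", "en", "er", "es"]


def decline_lines(text, suffixes=GERMAN_ADJECTIVE_SUFFIXES):
    """Follow every word (one per line) with one inflected copy per suffix."""
    out = []
    for line in text.splitlines():
        word = line.strip()
        if not word:
            continue  # assumption: blank lines are simply dropped
        out.append(word)                          # keep the original line
        out.extend(word + s for s in suffixes)    # five new lines per word
    return "\n".join(out)


def run(selection, whole_text):
    """Use the selection if there is one, otherwise the whole window text."""
    source = selection if selection else whole_text
    return decline_lines(source)


if __name__ == "__main__":
    print(run("", "blah\nklein"))
```

Passing a different `suffixes` list is the Python counterpart of editing the suffix entries in the script.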

Customization: If you want to modify it, open the script file with (preferably) PSPad and try to understand what's going on. There are lots of code comments, so I think everybody can do that without much programming experience.

Edited 1 time(s). Last edit at 01/01/2007 14:49 by Stefan.


#2 Re: Small script to help with dictionary creation

Posted by: no | Date: 07/22/2006 12:09

The link doesn't work.

Is there an updated one?


#3 Re: Small script to help with dictionary creation

Posted by: MrSpock | Date: 07/22/2006 14:58

Strange. The link works for me. But anyhow, I have now uploaded the script to the PSPad user extensions section.

Edited 1 time(s). Last edit at 07/22/2006 15:43 by MrSpock.


#4 Re: Small script to help with dictionary creation

Posted by: MrSpock | Date: 01/15/2007 16:52

I have just uploaded an updated version of the above-mentioned script, under the name "language tools". It now has a routine that divides verbs into categories based on regular expressions and chooses the right suffix set (sometimes, that is). ;-)
For languages that are more systematic than German, a modified version should be quite useful. Since there seem to be several Esperanto enthusiasts around here, what about an Esperanto dictionary?
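
To give an idea of what choosing a suffix set by regular expression can look like, here is a small Python sketch. It is not the code from "language tools" (that script is VBScript, and its actual patterns and suffix sets are not shown in this thread); the three categories, the name `conjugate` and the present-tense endings are illustrative assumptions only.

```python
import re

# Hypothetical categories, tried in order:
# (pattern on the infinitive, number of characters to strip, suffixes to add)
VERB_CATEGORIES = [
    (re.compile(r"[td]en$"), 2, ["e", "est", "et", "en"]),  # arbeiten
    (re.compile(r"e[lr]n$"), 1, ["e", "st", "t", "n"]),     # wandern
    (re.compile(r"en$"), 2, ["e", "st", "t", "en"]),        # machen
]


def conjugate(infinitive):
    """Return some present-tense forms, or the word unchanged if no pattern fits."""
    for pattern, strip, suffixes in VERB_CATEGORIES:
        if pattern.search(infinitive):
            stem = infinitive[:-strip]
            return [stem + s for s in suffixes]
    return [infinitive]  # irregular or unknown: left for the human user


if __name__ == "__main__":
    for verb in ["machen", "arbeiten", "wandern", "sein"]:
        print(verb, "->", ", ".join(conjugate(verb)))
```

The order of the list matters, because the most general pattern ("en$") would otherwise swallow the special cases. Verbs such as "tanzen" still come out wrong with these three rules, which illustrates why the routine only picks the right set "sometimes".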


#5 Re: Small script to help with dictionary creation

Posted by: Tedehur | Date: 01/17/2007 01:35

MrSpock:
I have just uploaded an updated version of the above-mentioned script, under the name "language tools". It now has a routine that divides verbs into categories based on regular expressions and chooses the right suffix set (sometimes, that is). ;-)
For languages that are more systematic than German, a modified version should be quite useful. Since there seem to be several Esperanto enthusiasts around here, what about an Esperanto dictionary?

I'm not exactly sure that the word "several" is adequate.
But it's true that your script would be extremely efficient for languages with a regular structure, such as Esperanto.
The only problem I see concerns the special characters. Among the common encodings, they are supported only by Unicode (e.g. UTF-8) and ISO-8859-3, and in documents using other codepages they are usually replaced by equivalent letter combinations. For instance, ĉ is written cx or ch when it is unsupported.
That would require the dictionary to include all 3 forms, or a character conversion to be performed first. :-(
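
Both options can be sketched in a few lines of Python (not part of the script discussed here; the function names are made up for this sketch). The tables follow the usual x- and h-conventions as I understand them, including the assumption that ŭ becomes plain "u" in the h-form; only lower-case letters are handled.

```python
import re

# Substitution tables (lower case only, to keep the sketch short).
X_SYSTEM = {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"}
H_SYSTEM = {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"}


def to_system(word, table):
    """Replace every accented letter by its ASCII combination."""
    return "".join(table.get(ch, ch) for ch in word)


def all_spellings(word):
    """Option 1: list the Unicode, x- and h-spellings, without duplicates."""
    variants = [word, to_system(word, X_SYSTEM), to_system(word, H_SYSTEM)]
    return list(dict.fromkeys(variants))


def from_x_system(text):
    """Option 2: convert x-spelled text back to the accented letters."""
    reverse = {combo: letter for letter, combo in X_SYSTEM.items()}
    return re.sub(r"[cghjsu]x", lambda m: reverse[m.group(0)], text)


if __name__ == "__main__":
    print(all_spellings("ĉevalo"))
    print(from_x_system("cxiujxauxde"))
```

Only the x-spelling is converted back automatically here, since "x" is not otherwise an Esperanto letter. The h-spelling is ambiguous in the reverse direction (a plain "u" or a "gh" inside a compound could be genuine), so it is safer to generate it from the accented form than to guess.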


#6 Re: Small script to help with dictionary creation

Posted by: MrSpock | Date: 01/17/2007 03:53

What a pity. If only Ludwik Zamenhof had foreseen codepages and all the trouble we have with them. :-(


#7 Re: Small script to help with dictionary creation

Posted by: junke101 | Date: 04/23/2008 20:58

Not knowing any German at all, I'm having a hard time confidently working out what the correct replacements for the suffix and suffix inflection strings would be in English (and, honestly, what menu_item[1-4] even translate to). While I imagine I could figure it out at a basic level by studying the code long enough, I'm really hoping someone has already made such a translation and would be willing to share it, or at least point me in the right direction.

Anyone?


#8 Re: Small script to help with dictionary creation

Posted by: MrSpock | Date: 04/23/2008 22:24

@junke101: Most of the stuff isn't applicable to any language other than German. One of the peculiarities of German is that virtually any number of words can be run together. For example, "Danube steamship company" is "Donaudampfschifffahrtsgesellschaft" in German, where you can optionally insert hyphens to mark the word boundaries, e.g. "Donau-Dampfschifffahrts-Gesellschaft."
Similarly, "four-year employment" is "vierjährige Beschäftigung," where "vier" means "four," and "jährige" can't be translated directly but corresponds to English "year."
Basically, the script just adds suffixes and/or prefixes in cases where there is some system in this chaos.

Here's a version where I translated the menu entries (not sure if it helps much though): langtool.vbs

I'm afraid the best advice I can give is to run the different functions on a small list of words with, say, four entries and see what they do.

The first function, run on the single word "blah," will produce:
blah
blahe
blahem
blahen
blaher
blahes

For Latin, for example, you could easily change this to transform "ros" into:
rosa
rosae
rosam

You would have to make these changes to the "Deklinieren" function:

suffix(0) = "a"
suffix(1) = "ae"
suffix(2) = "am"

[Delete the rest of the suffix(i) stuff.]

The others are very similar in principle. Nothing fancy, but it can help a lot, provided you split your basic forms into several files.
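
The idea of keeping one word list per paradigm and one suffix set per list could be sketched like this in Python. This is not the VBScript from langtool.vbs; the paradigm names and the functions `expand` and `build_dictionary` are invented for the illustration, and plain in-memory lists stand in for the separate files. The empty suffix in the German set keeps the basic form, as in the "blah" output above; the Latin set leaves the bare stem out, on the assumption that "ros" itself is not wanted in the dictionary.

```python
# Hypothetical paradigm table: one suffix set per kind of word list.
PARADIGMS = {
    "german_adjective": ["", "e", "em", "en", "er", "es"],
    "latin_first_declension": ["a", "ae", "am"],
}


def expand(words, paradigm):
    """Attach every suffix of the chosen paradigm to every word."""
    suffixes = PARADIGMS[paradigm]
    return [word + suffix for word in words for suffix in suffixes]


def build_dictionary(lists_by_paradigm):
    """Expand each list with its own paradigm and merge the results."""
    entries = []
    for paradigm, words in lists_by_paradigm.items():
        entries.extend(expand(words, paradigm))
    return sorted(set(entries))


if __name__ == "__main__":
    print(expand(["ros"], "latin_first_declension"))
    print(build_dictionary({
        "german_adjective": ["blah"],
        "latin_first_declension": ["ros"],
    }))
```

Adding a paradigm then means adding one entry to the table, which corresponds to editing the suffix(i) lines in the script.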

I'm not sure if private messages in this forum work, but I think there is an e-mail address in my profile. Maybe I can help you if you send me some examples, but I'm afraid I can't promise anything. :-(
