Posted by: MrSpock | Date: 2006-06-03 12:43
Subject modified for primping
from "Small script to help with dictionary creation"
to "Script for dictionary creation"
--------------------------------------------
One notable difference between a computer spell-checking dictionary and a typical print dictionary is that the latter can be confined to the "basic" forms of the words it contains, while the former must list all the inflected forms too.
I've written a small and very simple script that might help to facilitate the lexicographer's work a little, by automatically adding inflection suffixes to a list of words (provided they are inflected according to the same paradigm). That is to say it could be used to convert a list that comprises the two words "hot" and "flat" to the list "hot, hotter, hottest, flat, flatter, flattest". Uh, well, that's pretty much it. Not very impressive, to be sure, and it still leaves the vast majority of the work to the human user. But I believe that anything that goes beyond this rather trivial achievement is bound to become immensely complicated and language-specific.
This script can be easily understood and modified by non-programmers, and its basic idea can be used in every language I know, with more or less interesting results. For German and Latin (and Turkish, I'm told) it is quite useful.
Usage: Copy deklinator.vbs to your PSPad\Script\VBScript\ folder. Upon restarting PSPad, you will find a new submenu called "Grammatik" in your PSPad Script menu, with a single entry, "Deklinieren". If some text is selected, the script assumes it should operate on the selection. If nothing is selected, it assumes it should convert all text in the active window. Note: With very large files, this can take some time.
As long as you don't modify it for your own purposes, the script will generate five new lines for every line in your active window/selection when run. These lines will be copies of the old ones, with one or two letters attached at the end.
In its original form, the script expects to be run from a list of German adjectives in the active PSPad window. It just adds "-e", "-em", "-en" etc. to the lines in that very window. The words in the list must be separated by newline characters.
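For readers who want to see the idea outside PSPad, here is a minimal sketch in Python (not the VBScript of deklinator.vbs itself). The function name `decline` and the skipping of blank lines are assumptions made for illustration; the suffix list is the one visible in the sample output quoted in this thread.

```python
# Sketch only: the original script is VBScript and runs inside PSPad.
# The five suffixes match the sample output shown in this thread.
GERMAN_ADJECTIVE_SUFFIXES = ["e", "em", "en", "er", "es"]


def decline(text, suffixes=GERMAN_ADJECTIVE_SUFFIXES):
    """Return text in which every word line is followed by its suffixed copies."""
    out = []
    for line in text.splitlines():
        word = line.strip()
        if not word:
            continue  # assumption: blank lines are dropped
        out.append(word)  # keep the basic form
        out.extend(word + suffix for suffix in suffixes)
    return "\n".join(out)


print(decline("blah"))
```

Run on the single line "blah", this prints the basic form followed by its five suffixed copies, one per line.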
Customization: If you want to modify it, open the script file with (preferably) PSPad and try to understand what's going on. There are lots of code comments, so I think everybody can do that without much programming experience.
Edited 1 time(s). Last edit at 2007-01-01 13:49 by Stefan.
Posted by: no | Date: 2006-07-22 10:09
The link doesn't work.
Is there an updated one?
Posted by: MrSpock | Date: 2006-07-22 12:58
Strange. The link works for me. But anyhow, I have now uploaded the script to the PSPad user extensions section.
Edited 1 time(s). Last edit at 2006-07-22 13:43 by MrSpock.
Posted by: MrSpock | Date: 2007-01-15 15:52
I have just uploaded an updated version of the above-mentioned script, under the name "language tools". It now has a routine that divides verbs into categories based on regular expressions and chooses the right suffix set (sometimes, that is).
For languages that are more systematic than German, a modified version will be quite useful. Since there seem to be several Esperanto enthusiasts around here, what about an Esperanto dictionary?
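The idea of sorting verbs into categories with regular expressions and then picking a suffix set can be sketched as follows. This is Python, not the VBScript of the uploaded script, and the rule table `VERB_RULES` and the function `conjugate` are hypothetical. The sketch covers only two regular German present-tense patterns and leaves irregular verbs alone.

```python
import re

# Hypothetical rule table: (pattern for the infinitive, present-tense endings).
# Rules are tried in order, so the more specific pattern comes first.
VERB_RULES = [
    # Stems ending in -d or -t take a linking "e" before -st and -t.
    (re.compile(r"^(?P<stem>.+[dt])en$"), ["e", "est", "et", "en"]),
    # Default regular pattern for infinitives in -en.
    (re.compile(r"^(?P<stem>.+)en$"), ["e", "st", "t", "en"]),
]


def conjugate(infinitive):
    """Return the distinct present-tense forms, or [] if no rule matches."""
    for pattern, endings in VERB_RULES:
        match = pattern.match(infinitive)
        if match:
            stem = match.group("stem")
            return [stem + ending for ending in endings]
    return []  # unknown category: left for the human lexicographer


print(conjugate("machen"))
print(conjugate("arbeiten"))
```

Returning an empty list for unmatched words keeps the "sometimes" honest: anything the rules do not recognise is left for manual handling instead of being given wrong endings.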
Posted by: Tedehur | Date: 2007-01-17 00:35
MrSpock: I have just uploaded an updated version of the above-mentioned script, under the name "language tools". It now has a routine that divides verbs into categories based on regular expressions and chooses the right suffix set (sometimes, that is).
For languages that are more systematic than German, a modified version will be quite useful. Since there seem to be several Esperanto enthusiasts around here, what about an Esperanto dictionary?
I'm not exactly sure that the word "several" is adequate.
But it's true that your script would be extremely efficient for languages with a regular structure, such as Esperanto.
The only problem I see concerns the special characters. They are supported only by Unicode encodings such as UTF-8 and by ISO-8859-3, and in documents using other codepages they are usually replaced by equivalent letter combinations. For instance, ĉ is written cx or ch when unsupported.
That would require the dictionary to include all 3 forms, or to perform a character conversion first.
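Both options can be sketched in a few lines of Python. All function names here are made up for illustration. The sketch assumes the usual conventions: the x-system appends "x" to the base letter, and the h-system appends "h" but writes ŭ as plain u. Only the x-system is converted back, because h-system digraphs can also occur in ordinary words.

```python
# Esperanto letters with diacritics and their base letters.
ACCENTED = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u"}


def _respell(word, marker_for):
    """Replace each accented letter by its base letter plus a marker."""
    out = []
    for ch in word:
        base = ACCENTED.get(ch.lower())
        if base is None:
            out.append(ch)  # ordinary letter, keep as is
        else:
            base = base.upper() if ch.isupper() else base
            out.append(base + marker_for(ch.lower()))
    return "".join(out)


def to_x_system(word):
    return _respell(word, lambda ch: "x")


def to_h_system(word):
    # Assumed convention: ŭ becomes plain u in the h-system.
    return _respell(word, lambda ch: "" if ch == "ŭ" else "h")


def spellings(word):
    """All spellings of a word, Unicode form first, duplicates removed."""
    return list(dict.fromkeys([word, to_x_system(word), to_h_system(word)]))


def from_x_system(text):
    """Convert x-system digraphs back to accented letters (cx -> ĉ)."""
    for accented, base in ACCENTED.items():
        text = text.replace(base + "x", accented)
        text = text.replace(base.upper() + "x", accented.upper())
        text = text.replace(base.upper() + "X", accented.upper())
    return text


print(spellings("ĉevalo"))
print(from_x_system("sxangxi"))
```

Generating the extra spellings triples the entries only for words that actually contain an accented letter; converting the text first keeps the dictionary small but requires x-system input.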
Posted by: MrSpock | Date: 2007-01-17 02:53
What a pity. If only Ludwik Zamenhof had foreseen codepages and all the trouble we have with them.
Posted by: junke101 | Date: 2008-04-23 18:58
Not knowing any German at all, I'm having a hard time confidently deciphering what the correct replacements would be for the suffix and suffix inflection strings in English (and, honestly, what menu_item[1-4] even translate to). While I imagine I can 'figure it out' at a basic level by studying the code long enough, I'm really hoping someone has already made such a translation and would be willing to share it, or at least point me in the right direction.
Anyone?
Posted by: MrSpock | Date: 2008-04-23 20:24
@junke101: Most of the stuff isn't applicable to any other language than German. One of the peculiarities of German is that virtually any number of words can be run together. For example, "Danube steam ship company" is "Donaudampfschifffahrtsgesellschaft" in German, where you can optionally insert hyphens to mark the word boundaries, e.g. "Donau-Dampfschifffahrts-Gesellschaft."
Similarly, "four year employment" is "vierjährige Beschäftigung," where "vier" means "four," and "jährige" can't be translated but corresponds to English "year."
Basically, the script just adds suffixes and/or prefixes in cases where there is some system in this chaos.
Here's a version where I translated the menu entries (not sure if it helps much though): langtool.vbs
I'm afraid the best advice I can give is to run the different functions on a small list of words with, say, four entries and see what they do.
The first function, run on the single word "blah," will produce:
blah
blahe
blahem
blahen
blaher
blahes
For Latin, for example, you could easily change this to transform "ros" into:
rosa
rosae
rosam
You would have to make these changes to the "Deklinieren" function:
suffix(0) = "a"
suffix(1) = "ae"
suffix(2) = "am"
[Delete the rest of the suffix(i) stuff.]
The others are very similar in principle. Nothing fancy, but it can help a lot, provided you split your basic forms into several files.
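Splitting the basic forms by paradigm can be pictured like this. It is a sketch in Python, not part of langtool.vbs; the names `PARADIGMS`, `expand` and `word_lists` are hypothetical, and the in-memory lists stand in for the separate files. The Latin entry lists only the three endings from the example above.

```python
# Hypothetical paradigm table: one suffix set per kind of word list.
PARADIGMS = {
    "german_adjective": ["", "e", "em", "en", "er", "es"],
    "latin_first_declension": ["a", "ae", "am"],
}


def expand(stems, paradigm):
    """Attach every suffix of the chosen paradigm to every stem."""
    suffixes = PARADIGMS[paradigm]
    return [stem + suffix for stem in stems for suffix in suffixes]


# One stem list per paradigm, standing in for one file per paradigm.
word_lists = {
    "german_adjective": ["blah"],
    "latin_first_declension": ["ros"],
}

for paradigm, stems in word_lists.items():
    print(paradigm, expand(stems, paradigm))
```

Keeping the suffix sets in one table means a new paradigm is a new entry and a new word list, with no change to the expansion code.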
I'm not sure if private messages in this forum work, but I think there is an e-mail address in my profile. Maybe I can help you if you send me some examples, but I'm afraid I can't promise anything.