Kamus: Computer Assisted Morphological Analysis of Indonesian Transitive Verbs

Robyn PHILLIPS

Lawrie BROWN []

Dennis HART [1]

Ian MACFARLING [2]

Abstract

This paper describes the development of a morphemic translator, called Kamus, for Bahasa Indonesia transitive verbs. A prototype has been developed so that linguists translating passages from Indonesian to English have a quick alternative to consulting dictionaries and thesauri for the specific meaning of a verb, or for an alternate choice. Kamus is not only an electronic dictionary and thesaurus, but also a linguistic tool that provides the user with grammatical information about the word itself. Kamus has been developed for experienced linguists translating Indonesian passages, but can also be used by beginners as an aid to learning the language. Although this prototype has concentrated only on Indonesian transitive verbs, the design is capable of being modified to include the whole Indonesian language. This would provide a more complete electronic Indonesian dictionary, as well as a possible foundation for a sentence translator.

Contents

Abstract
1 Introduction
2 Literature Review
3 Nature of Indonesian
4 Default Behaviour of Kamus
5 Overview of Kamus
5.1 Parser
5.2 Database Retriever
5.3 Database
5.4 Words Stored in the Database
6 Results of Kamus
7 Further Research
8 Conclusion
9 References
Annex A - Entity Relationship Diagram for the Kamus Database

1 Introduction

Ever since electronic dictionaries were first developed, linguists and potential linguists have enjoyed using them on a wide variety of hardware - from the desktop computer down to hand-held machines. They appeal to linguists because they offer a quick and easy method of looking up foreign words to obtain equivalent English meanings. Electronic dictionaries and word translators have been developed for a wide variety of languages - especially those most commonly taught such as German, French, Japanese and Mandarin Chinese. Unfortunately, languages that are equally, if not more, useful to Australia, such as Indonesian, do not even rate a mention amongst these electronic aids [Suda87].

The aim of this project was to develop a computer assisted morphological analyser of Indonesian transitive verbs. This encompassed building an electronic dictionary to store information about the words. A parser was also built to analyse and display grammatical information for translated words. The design of Kamus is such that future versions can be modified to include the whole vocabulary of Bahasa Indonesia.

The electronic dictionary of Kamus is effectively a dictionary and thesaurus combined. After an Indonesian transitive verb is entered, Kamus displays the meaning of the word, a list of similar words with slightly different meanings (in case the user has selected the wrong form of the verb for their needs), and a list of words with similar meanings.

The parser provides the user with as much grammatical information about the word as possible by just examining its structure. This information includes the prefix, root and suffix of the word (if applicable) and the effects of the affixes on the meaning of the root. This is being done as a prototype for a larger grammatical analyser, one that will translate whole sentences instead of only words. We have not tackled the problem of developing a sentence analyser at this time. Instead, the current Kamus prototype only offers the user possible translations of one word at a time.

2 Literature Review

Two known systems for Machine Translation of Indonesian are EICATS and a Multi-Language Translation System using Interlingua.

EICATS [ShSu88] is an English-Indonesian Computer Aided Translation System developed by Shimura and Sukmadjaja. They integrated the three main processes of a machine translation system - analysis, transfer and generation, to form a real time translation system. It has been tested using typical sentences and found to give satisfactory results. Sukmadjaja has also developed an electronic Indonesian dictionary [Sukm88], although little information on it has been obtained.

The Multi-Language Translation System using Interlingua [Tsuj90] is a result of an international cooperation project involving the five countries of China, Malaysia, Thailand, Japan and Indonesia. The system is being developed to translate between Japanese and each of the other languages using an intermediate language called Interlingua. It is hoped that by using this method, once a language has been set up to translate into Interlingua, it can then be translated into any other language that is also set up for Interlingua. The authors were hoping for an accuracy of 80-90% with pre-editing but no further information to date has been found as to the actual results of their work.

Although the respective authors appear pleased with their results so far, most of these translation systems are not being used outside of research purposes. One of the contributing factors to this may be that linguists are wary about trusting computers to provide an accurate translation. For these reasons, we decided to restrict our work to a morphemic analyser and translator for use by linguists. However, in future this could form the basis for a more sophisticated translation system.

3 Nature of Indonesian

An Indonesian word consists of one or more morphemes added together. A morpheme is the smallest component of a language that has a value within the language structure [MacD76]. There are two types of morphemes in Bahasa Indonesia, roots and affixes. These roots and affixes are combined together to form words.

Roots are morphemes that occur by themselves, or in combinations (e.g. in the word memberitahukan, the beritahu actually consists of two roots, beri and tahu, joined together).

Affixes are morphemes that never occur independently, but rather always occur in a fixed relationship to a base. A "base" may simply be a root, or it can be a more complicated structure consisting of several morphemes. It could be a duplicated root, or a combination of several roots in some morphological or syntactic arrangement, and possibly with affixes attached as well. A root is thus just one possible form of a base [MacD76].

Affixes fall into three classes: prefixes, infixes and suffixes. A prefix is a morpheme that is attached to the beginning of the base word (e.g. memberitahukan, the mem being the prefix). A variety of rules govern how a prefix is attached to a base word, depending on which prefix it is and the make-up of the base word. This is discussed in more detail below. Infixes are not commonly used in modern Bahasa Indonesia. They are morphemes that are placed inside the root, usually after the first syllable. This project does not deal with infixes so they will not be discussed further. A suffix is a morpheme that is attached to the end of a base word (e.g. memberitahukan, the kan being the suffix).

Words can be constructed by combining morphemes together in particular patterns. A word can consist of (ignoring infixes) a root alone (e.g. tidur), a prefix and a root (e.g. membeli = mem- + beli), a suffix and a root (e.g. gambaran = gambar + -an), or a prefix and a root and a suffix (e.g. membesarkan = mem- + besar + -kan). Depending on the root, each of these prefix and suffix combinations can have different effects on the final meaning of the word.

The prefix me- is one of the more common prefixes used in Indonesian, especially as far as verbs are concerned. Some me-root verbs are always transitive, some me-root verbs are either transitive or intransitive according to whether or not they have an object and some me-root verbs never have an object, so are always intransitive. When the prefix me- is attached to certain roots, changes occur in the initial sound of the root, or an extra sound appears before the root (see Table 1). In this table, the asterisked letters are dropped when the me- prefix is attached to a root starting with one of them. The examples in the Table (which show the whole word with prefix, and the relevant base word in parentheses) illustrate this. As will be seen below, this added some complexity to the design of the Kamus parser and its supporting database.

Prefix   Root beginning with   Examples
mem      b, f, p*              memberi (beri), meminjam (pinjam)
men      d, j, c, t*           mendasarkan (dasar), menerima (terima)
meng     vowel, g, h, k*       mengacara (acara), mengunci (kunci)
meny     s*                    menyisir (sisir)
me       all other sounds      melanggar (langgar)

Table 1: me- prefixes with initial sounds
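The rules of Table 1 can be illustrated with a minimal sketch that attaches the appropriate me- form to a root. This is not code from Kamus: the function name attach_me and the rule list are hypothetical, and the letters 'b' and 'd' are included on the strength of the examples memberi and mendasarkan and of the later statement that men- attaches to roots beginning with 't', 'c', 'd' or 'j'.

```python
# Each rule: (prefix form, initial letters that are kept, initial letter that is dropped).
# A hypothetical encoding of Table 1, not the representation used by Kamus itself.
ME_RULES = [
    ("mem", "bf", "p"),
    ("men", "djc", "t"),
    ("meng", "aeiough", "k"),
    ("meny", "", "s"),
]


def attach_me(root):
    """Attach the me- prefix to a root, applying the sound changes of Table 1."""
    if not root:
        raise ValueError("root must not be empty")
    first = root[0]
    for form, kept, dropped in ME_RULES:
        if first == dropped:
            # Asterisked letter: it disappears when the prefix is attached.
            return form + root[1:]
        if first in kept:
            return form + root
    # All other initial sounds take the plain me- form.
    return "me" + root


for example in ("beri", "pinjam", "terima", "kunci", "sisir", "langgar"):
    print(example, "->", attach_me(example))
```

The parser has to run these rules in reverse, which is why a dropped letter cannot be recovered from the surface form alone and a database lookup is needed.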

Several Indonesian linguists from a variety of backgrounds were interviewed concerning this project. One of the first things to become evident was that none of the existing Indonesian dictionaries are considered to be very comprehensive, although one or two dictionaries do exist that are better than the others. The reason for this was considered to be related to the way Indonesians treat their language.

In Indonesia, a broad vocabulary is a sign of a highly educated person, and such a person commands respect. Indonesians wishing to be seen as highly educated want not only to increase the size of their vocabulary, but also to be seen to possess a large one. Therefore, when an unknown word is heard, the hearer (rather than admit ignorance of it) will often assign a likely meaning on the basis of the context and use it subsequently with that assigned meaning. This results in a variety of meanings, or shades of meaning, for each word; a variety that is reflected in different dictionary definitions. To capture the meaning of a word reliably, several dictionaries may need to be consulted. Ideally, Kamus should consolidate the definitions from a number of dictionaries to obtain accurate word meanings but, in the prototype, provision of this completeness of meaning has not been attempted.

There is evidently a lack of electronic versions of published Indonesian dictionaries. It has been reported to us that the publishers deny having electronic versions, stating that they always type out their dictionaries anew. This is entirely possible because of the cheap labour available in the Republic. Since we have been unable to obtain an electronic form of an Indonesian dictionary for Kamus, one was created from scratch.

4 Default Behaviour of Kamus

It is an axiom that there is an exception to every rule and, in the context of this paper, such exceptions make attempts at programming a computer to translate a language difficult. This has certainly been the case with Kamus and its parser, and assumptions have had to be made about the default behaviour. These assumptions are used when a word to be translated is not found in Kamus' supporting database, and they are designed to help in determining the likely grammatical structure of the unknown word. That is, they relate to determining what (if any) affixes are present, and the base word to which they are attached.

If a word to be translated is not in the Kamus database, but contains the same letters in the correct positions as a possible affix, then it is assumed that those letters are an affix, and they are treated accordingly. For example, assume the word entered is kembang. One of the prefixes stored in the database is ke. If kembang is present in the database (as a whole word), then the supposed prefix will be ignored. However, if kembang is not found in the database, then the default behaviour leads to the candidate prefix ke being separated, and a search for the remainder mbang is conducted. In this case the parsed result is wrong, but in the majority of cases this behaviour provides the correct decomposition.

Words involving a prefix and dropped letter (see Table 1) also need to be catered for. Firstly, the prefix is recognized and separated, and then a database search conducted to see if what remains is a valid base word. If not, and the word to be translated is a "dropped letter" candidate, then the potential missing letter is attached and another database search conducted to see if this is a known base word. If it is still not found, then the attached letter is removed again and the word (minus prefix) is assumed to be an unknown root. For example, assume the word entered is mengacara. In this case meng is recognized as a valid prefix and separated, leaving acara. If acara is not found in the database, then Kamus realises that base words to which meng is attached may in fact begin with a 'k' even though it does not appear in the original word. The program therefore adds 'k' to the base word to form kacara and then looks for this in the database. If it is found, nothing more is done but, if kacara cannot be found either, then Kamus assumes that there is no such word and reverts it back to acara to continue on as normal.
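The dropped-letter fallback just described might be sketched as follows. The set KNOWN_ROOTS and the mapping DROPPED_LETTER are hypothetical in-memory stand-ins for the Kamus database, and find_base is an illustrative name rather than a function from the actual program.

```python
# Hypothetical stand-ins for database lookups.
KNOWN_ROOTS = {"acara", "kunci", "pinjam", "beri"}
DROPPED_LETTER = {"mem": "p", "men": "t", "meng": "k", "meny": "s"}


def find_base(word, prefix):
    """Separate an already recognised prefix and decide what the base word is."""
    remainder = word[len(prefix):]
    if remainder in KNOWN_ROOTS:
        return remainder, "known root"
    dropped = DROPPED_LETTER.get(prefix)
    if dropped is not None and dropped + remainder in KNOWN_ROOTS:
        # The root lost its first letter when the prefix was attached.
        return dropped + remainder, "known root, dropped letter restored"
    # Neither form is known: remove the added letter again and assume an unknown root.
    return remainder, "unknown root"


print(find_base("mengacara", "meng"))
print(find_base("mengunci", "meng"))
print(find_base("mengambil", "meng"))
```

In this toy data set ambil is deliberately absent, so the last call shows the final fallback: kambil is tried, not found, and the remainder is reported as an unknown root.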

If the word begins with the letters di, and the whole word does not exist in the database, then it is assumed that this word is in the object focus form and it will be converted into its subject focus counterpart. For example, assume the word entered is dibeli. This word does not exist in the database so the program assumes that the user meant the passive form of membeli. It achieves this by removing the di, and looking up the database to find which prefix to replace it with. Kamus will find mem and add it on to form membeli, proceeding to parse the word as normal from then on.
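A minimal sketch of this conversion is given below, under the assumption that the replacement prefix is chosen from the first letter of what remains once di is removed. KNOWN_WORDS and REPLACED_BY are hypothetical stand-ins for database content, and the function returns the replacement prefix and the remainder as a pair rather than a rebuilt word.

```python
# Hypothetical whole words stored in the database that merely begin with "di".
KNOWN_WORDS = {"dinding"}

# First letter of the remainder -> me- form that replaces di- (illustrative data).
REPLACED_BY = {}
for letters, me_form in (("bfp", "mem"), ("djct", "men"),
                         ("aeioughk", "meng"), ("s", "meny")):
    for letter in letters:
        REPLACED_BY[letter] = me_form


def resolve_di(word):
    """Return (me- form, remainder) for an object focus word, or None if not applicable."""
    if word in KNOWN_WORDS or not word.startswith("di") or len(word) <= 2:
        return None
    remainder = word[2:]
    # Any first letter without an entry takes the plain me- form.
    return REPLACED_BY.get(remainder[0], "me"), remainder


print(resolve_di("dibeli"))
print(resolve_di("dinding"))
```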

Finally, it is worth noting that each of the different forms of the me- prefix is treated as a prefix in its own right.

5 Overview of Kamus

As has been noted, Kamus is a morphemic analyser that allows a user to enter an Indonesian transitive verb to obtain an equivalent English meaning. Its development is described in detail in [Phil93]. Kamus provides the user with the meaning of the word, a list of related words, the grammatical structure of the word (i.e. root, and prefix and suffix if present), and a small description of the effects of the affixes on that type of root. All of this information is presented to the user more quickly than if they had to manually look up the information in a hardcopy dictionary. If the required word is known to Kamus then it will accurately provide this information to the user. However, even if the word is not in the database, Kamus has the ability to provide a reasonable guess as to its grammatical structure.

There are two versions of Kamus available, an X-Windows version using Motif Widgets and WCL resource tools (see Figures 1 and 2), and a UNIX command-line version. In order to run Kamus, the user just enters a word to be translated into the main window (Figure 1). The program then analyses the grammatical structure of the word, looks up the database to find its meaning as well as any additional grammatical information, and also any related words. All of this information is then displayed for the user to examine (Figure 2).

Figure 1: Kamus Main Window

Figure 2: Kamus Results Window

Kamus uses a database which contains all the information required by the program, including the definitions for each word, lists of related words, lists of affixes to look for when parsing, as well as grammatical information about the different prefix, root and suffix combinations. This database will be described in more detail later.

The overall structure of Kamus is shown in Figure 3, from which it can be seen that there are four main sub-systems. The interface sub-system mediates between the user and the rest of the program. For the X-Windows version of Kamus, it creates the windows being used and manages them, as well as retrieving the word from the user and returning the retrieved information. The interface for the UNIX command-line version prompts for a word and returns the information.

Figure 3: Data Flow Diagram - First Level

After the Interface sub-system obtains a word from the user, it is submitted to the Parser for analysis. A list of suffixes and prefixes is retrieved from the database and used to break down the word into its morphemes (i.e. prefix, suffix, and root, as applicable). These morpheme(s) and the original word are passed on to the Database Retriever. This searches the database to provide a meaning for the word and a list of any synonyms. It also retrieves all of the relevant grammatical information belonging to the word. Finally, all of the information is displayed for the user. Note that the database itself is not called from the interface sub-system. It is only used by the parser and the database retriever to produce the required results.

5.1 Parser

Some aspects of the operation of the parser have already been discussed under "Default Behaviour of Kamus". This Section presents a more complete description.

The parser was developed to analyse the word and to produce as much grammatical information as possible. Its role is to break down the word into the root, prefix and suffix (as appropriate), since knowing these is a prerequisite for successful later analysis.

The first thing the parser does when it is called is check to see whether the word it has just been given is a root word. This is done by searching the database for a match against words of type 'root'. If the whole word is found as a root, then there is no need to go any further and the parser exits.

If the word given to the parser is not found as a root then it probably contains more than one morpheme. On this assumption, a check is first made to see if any of the known suffixes exist as part of the word. The list of possible suffixes is retrieved from the database, and the word is checked against them. If no match is found then the parser continues on to check for a prefix. On the other hand, if a match is found then two possibilities remain - either the word has a genuine suffix as part of its structure, or the base word happens to contain the same set of characters as a suffix and the match is spurious. To determine which of these is the case, the parser leaves the purported suffix attached and checks for the existence of a prefix. The concept here is that if a prefix is found then the parser separates it from the whole word and checks to see if the remainder is a known root. If it is, the suffix match must have been a spurious one and the word actually consists of a prefix attached to a root that happens to contain the same letters as a suffix. Alternatively, failure of this check is taken to mean that the suffix match is genuine. In this case, the parser separates the (genuine) suffix from the rest of the word and checks what is left for a prefix. Failure to find one means that the original word consisted only of a root plus a suffix, whereas success means that the original word consisted of a prefix, root and suffix combined together.
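The suffix and prefix checks described above can be sketched as follows. The word lists are small hypothetical samples (including the suffix -i, which the text does not mention), affixes are tried in list order, and the dropped-letter and di- handling are left out to keep the sketch short; it is an illustration of the decision sequence, not the Kamus source.

```python
# Hypothetical sample data standing in for the database lists.
ROOTS = {"beli", "besar", "gambar", "tidur"}
PREFIXES = ["mem", "me", "di", "ke"]
SUFFIXES = ["kan", "an", "i"]


def split_prefix(word):
    """Separate the first listed prefix that matches; return ('', word) if none does."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):]
    return "", word


def parse(word):
    """Return (prefix, root, suffix), with '' for any part that is absent."""
    if word in ROOTS:
        return "", word, ""
    suffix = next((s for s in SUFFIXES
                   if word.endswith(s) and len(word) > len(s)), "")
    if suffix:
        # Leave the purported suffix attached and look for a prefix first.
        prefix, rest = split_prefix(word)
        if prefix and rest in ROOTS:
            return prefix, rest, ""          # the suffix match was spurious
        # Otherwise treat the suffix as genuine and check the stem for a prefix.
        prefix, root = split_prefix(word[:-len(suffix)])
        return prefix, root, suffix
    prefix, root = split_prefix(word)
    return prefix, root, ""


for w in ("tidur", "membeli", "gambaran", "membesarkan"):
    print(w, parse(w))
```

Here membeli shows the spurious case: its final i matches a listed suffix, but separating mem- leaves the known root beli, so no suffix is reported.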

This method of checking for a prefix whilst in the middle of checking a suffix, and then again and again (until a recognized root is found or no other possibilities remain), is the way that Kamus provides its equivalent to the back-tracking found in declarative languages like Prolog.

The main hurdle experienced in the development of the parser was dealing with words that were special cases. After considering several solutions it was decided that the parser should assume that if a word is not in the database, then it complies with "normal" grammatical rules. The decomposition algorithm described above implements this.

Another complication that arose while programming the parser concerned the transformational grammar that characterizes prefixes and root words where the initial letter is dropped. This concept was described earlier in relation to Table 1 and the prefixes involved were mem- for the letter 'p', meng- for the letter 'k', men- for the letter 't', and meny- for the letter 's'. This situation was complicated further because these prefixes may be attached to roots beginning with any letter, not just those that are dropped. It therefore was not just a case of removing the prefix, adding the dropped letter and then looking up the root.

To deal with these cases, Kamus first searches for and separates any prefix from the word, and then looks for the remainder as a root in the database. Success means that no dropped letter was involved, but failure could be because of a dropped letter so this case is considered next. This entails adding the appropriate dropped letter on to the beginning of the root and checking for the modified root's existence in the database. If it is found, then the decomposition process is complete and a dropped leading letter was involved. If the modified root is not found then the added letter is removed, the separated prefix replaced, and the reconstituted whole word checked for other possible prefixes. This was done to deal with the me- class of prefixes. For example, if a word begins with the letters meng there are two possible outcomes from the parser. The first is that the word contains the prefix meng-, and the other is that the word contains the prefix me- and the root begins with 'ng'. Both of these may occur, so if one decomposition does not result in a successful search, the other is tried. There is a third outcome that might be considered, and that is a prefix of men- with the root starting with 'g'. This, however, is not possible since the prefix men- is only attached to words beginning with 't', 'c', 'd' or 'j'. The same reasoning applies to the pairs mem- and me-, meny- and me-, and men- and me-. Kamus considers these possibilities but, of course, does not succeed with the impossible decompositions because the "roots" produced by them do not exist in the supporting database.
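The retry across alternative me- forms might look like the sketch below. The roots are illustrative only (nganga and nanti are included to show roots that begin with 'ng' and 'n'), the prefix order is an assumption, and decompose is a hypothetical name.

```python
# Hypothetical sample roots standing in for the database.
ROOTS = {"nganga", "kunci", "tulis", "nanti"}

# (prefix form, letter it may have dropped), tried in this assumed order.
ME_FORMS = [("meng", "k"), ("meny", "s"), ("mem", "p"), ("men", "t"), ("me", None)]


def decompose(word):
    """Try each me- form in turn; return (prefix, root) or None if nothing fits."""
    for prefix, dropped in ME_FORMS:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        if rest in ROOTS:
            return prefix, rest
        if dropped is not None and dropped + rest in ROOTS:
            return prefix, dropped + rest
        # Not found: restore the word and move on to the next candidate prefix.
    return None


print(decompose("mengunci"))   # meng- with the dropped 'k' restored
print(decompose("menganga"))   # falls through meng- and men- to me- + nganga
```

For menganga the sketch also tries men- plus ganga, the "third outcome" mentioned above, and rejects it simply because no such root is stored.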

A decision has been made to exclude as much grammatical information from the actual code of the parser as possible. The code in the parser is written in the form "when X occurs, do Y", but at compilation time, the actual contents of X and Y are not known. At run time, X and Y are retrieved from the database and placed into the 'shells' as required. Most of this grammatical information has been stored in the database tables "prefix_list", "suffix_list" and "replaced_by". This design decision should make future modifications to Kamus to include other types of words easier since it will basically be a matter of adding affixes to the lists in the database, and only rarely adding another 'shell' to the code.
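The idea of a code 'shell' filled from the database at run time can be sketched as below. Kamus itself uses an Ingres database through embedded SQL; here Python's sqlite3 module is used purely as a stand-in. The table name prefix_list and the columns prefix, length and dropped_letter follow the description in this paper (the class column is omitted), while the rows and function names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefix_list (prefix TEXT, length INTEGER, dropped_letter TEXT)")
conn.executemany(
    "INSERT INTO prefix_list VALUES (?, ?, ?)",
    [("mem", 3, "p"), ("men", 3, "t"), ("meng", 4, "k"), ("meny", 4, "s"),
     ("me", 2, None), ("di", 2, None), ("ke", 2, None)],
)


def load_prefix_rules(connection):
    """Fetch the prefixes at run time, longest first (an assumed ordering)."""
    rows = connection.execute(
        "SELECT prefix, dropped_letter FROM prefix_list ORDER BY length DESC, prefix"
    )
    return [(prefix, dropped) for prefix, dropped in rows]


def apply_shell(word, rules):
    """Generic shell: when the word starts with X, separate X and report its dropped letter."""
    for prefix, dropped in rules:
        if word.startswith(prefix):
            return prefix, word[len(prefix):], dropped
    return None


rules = load_prefix_rules(conn)
print(apply_shell("menyisir", rules))
print(apply_shell("kembang", rules))
```

Adding a new prefix in this arrangement means inserting a row rather than editing the function, which is the property the design decision aims for.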

5.2 Database Retriever

The role of the database retriever is to retrieve the meaning of the selected word in addition to other grammatical information. It receives the root, any affixes found, plus the original word from the parser and, using this data, it can obtain the meaning of the word through an embedded SQL interface between the program and the database.

To provide all of the relevant grammatical information, the type of the root (e.g. noun, adverb, etc.) and the structure of the word (e.g. root, prefix plus root, etc) is retrieved. The structure of the word informs the program where to look in the database. The program then retrieves the meaning of the affix(es) based on the word's root type.
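The retrieval steps can be sketched with two queries. Again sqlite3 stands in for the embedded SQL interface to Ingres; the table names word and prefix_root are those of the Kamus database, but every column name and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (word TEXT PRIMARY KEY, meaning TEXT, root_type TEXT, word_type TEXT);
CREATE TABLE prefix_root (prefix TEXT, root_type TEXT, effect TEXT);
INSERT INTO word VALUES ('membeli', 'to buy', 'transitive verb', 'prefix + root');
INSERT INTO prefix_root VALUES ('mem', 'transitive verb', 'active (subject focus) form of the verb');
""")


def retrieve(word, prefix):
    """Look up the meaning, then the effect of the prefix for this root type."""
    row = conn.execute(
        "SELECT meaning, root_type, word_type FROM word WHERE word = ?", (word,)
    ).fetchone()
    if row is None:
        return None
    meaning, root_type, word_type = row
    effect = None
    # The structure of the word decides which affix table is consulted.
    if word_type == "prefix + root":
        hit = conn.execute(
            "SELECT effect FROM prefix_root WHERE prefix = ? AND root_type = ?",
            (prefix, root_type),
        ).fetchone()
        effect = hit[0] if hit else None
    return {"meaning": meaning, "root_type": root_type,
            "word_type": word_type, "prefix_effect": effect}


print(retrieve("membeli", "mem"))
```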

5.3 Database

The database has been designed to contain all of the information specific to the Indonesian language. That is, word meanings, grammatical information, as well as lists of affixes and their meanings for various root types.

The design of the database can be viewed as consisting of four main parts: the part that shows how a word has other related words, and then the three parts that show how the different combinations of affixes and the root can exist, that is root plus prefix, root plus suffix, and root plus prefix plus suffix.

An entity relationship diagram was constructed and converted into a relational schema (which specifies the tables that are required to store the information). The full entity relationship diagram can be found in Annex A. The rectangles in this diagram represent entities, the things about which data is to be stored. The diamond symbols represent relationships between the entities. For example, the word ajak "has root type" transitive verb, where transitive verb is one of the valid types of root. The ellipses on the diagram are known as attributes. These specify the actual data items that are recorded about the various entities, or relationships, of the database. As an example, looking at the entity "prefix_list", we record the letters of each prefix ("prefix"), its length ("length"), the letter dropped from the root when it is attached, if any ("dropped_letter"), and its class ("class"). Finally, the characters "1", "M" and "N" specify the cardinalities of the relationships in the database. For example, a word can be of only one type (e.g. "noun") but there will be many words of the same type (e.g. there are many words that are nouns).

There are six main tables in the database supporting Kamus, these being "word" (to store the meaning of a word, the type of its root and the type of the word as a whole), "related_words" (to store the synonyms), "root_suffix" (to store the words of type root + suffix), "prefix_root" (to store the words of type prefix + root), "prefix_root_suffix" (to store the words of type prefix + root + suffix) and "replaced_by" (to indicate which prefixes are replaced by which and under what circumstances).

The entity "word" participates in three relationships, two with other entities - namely "type of word", and "type of root", and one with itself - for related words. The relationship with "word type" specifies the form of the word with respect to the affixes. That is, whether it is a root, prefix + root, root + suffix, or prefix + root + suffix. The relationship with "root type" specifies the grammatical type of the root of the word, e.g. noun, transitive verb, intransitive verb, simple verb, adverb or adjective. The relevance of "word type" and "root type" is that they are used to retrieve grammatical information about the word to display to the user.

The relationship "related to" connects the entity "word" to itself. This allows each particular word to be connected to other words that are either its synonyms or related in some other way (such as a different grammatical form with the same root). Words with the same root are included since there are times when writing Indonesian that a linguist might know that he or she wants to use a particular form of the root, but cannot think of which one. Kamus can then be used to retrieve the root word, but in addition to synonyms it will also provide other forms with the same root from which the linguist can choose.
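A self-referencing relationship of this kind is commonly stored as a table of word pairs. The sketch below assumes such a layout for "related_words" (its column names are hypothetical) and uses sqlite3 as a stand-in for Ingres; the pairs shown are forms sharing the root beli. Both directions are queried, on the assumption that a pair may have been recorded only one way round.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (word TEXT PRIMARY KEY, meaning TEXT);
CREATE TABLE related_words (
    word TEXT REFERENCES word(word),
    related_word TEXT REFERENCES word(word),
    PRIMARY KEY (word, related_word)
);
INSERT INTO word VALUES ('beli', 'buy');
INSERT INTO word VALUES ('membeli', 'to buy');
INSERT INTO word VALUES ('membelikan', 'to buy (something) for (someone)');
INSERT INTO related_words VALUES ('membeli', 'beli');
INSERT INTO related_words VALUES ('membeli', 'membelikan');
""")


def related(word):
    """Return the words related to `word`, whichever way round the pair was stored."""
    rows = conn.execute(
        "SELECT related_word FROM related_words WHERE word = ? "
        "UNION "
        "SELECT word FROM related_words WHERE related_word = ? "
        "ORDER BY 1",
        (word, word),
    )
    return [row[0] for row in rows]


print(related("membeli"))
print(related("beli"))
```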

The bottom half of the entity relationship diagram is concerned with how the affixes can be joined to particular types of root, and what effect the conjunction will have on the meaning for each different type of root. It also details the grammatical type of the whole word that results from the connection (i.e. noun, adjective, etc).

The bottom right corner of the entity relationship diagram relates to prefixes. This data is used by the parser to break down the word into its morphemes (prefix, root and suffix). For each prefix that is stored there is the ability to store what has been called a 'dropped letter', though of course this only applies to a few of them (see Table 1). As described above, knowledge about dropped letters is required by the parser to reconstruct the root after removing a prefix that entails a dropped letter.

The entity "prefix_list" participates in a relationship "replaced by". This deals with the passive form of transitive verbs. When transitive verbs are used in their passive form, the me- prefix is replaced by a di- prefix. When parsing the word, the parser needs to recognise that if the di- prefix is part of the word, and the word is not a root in the database, then the di- prefix needs to be replaced with the corresponding me- prefix before further analysis. This is worked out by looking at the first letter of the word after the di- prefix has been removed, since this will indicate which me- prefix is the appropriate replacement. An interesting complication is that when the di- prefix is used, letters that are normally dropped with an me- prefix are retained.

The database has been designed so that it can be upgraded to cater for the whole Indonesian vocabulary. Although the prototype deals only with transitive verbs, the database should be able to be easily modified to cope with other types of words. In achieving this the parser component of Kamus may need to have some extra rules added, and the database will need not only the extra words themselves but also the other prefixes and suffixes of the language outside of those used by the transitive verbs.

5.4 Words Stored in the Database

The list of words that has been stored in the database is mostly from Langkah Baru [JoSt89], [John90]. These references were chosen so that the list of transitive verbs obtained would provide a large enough selection of vocabulary to demonstrate the prototype. Over six hundred root words were chosen, resulting in more than fourteen hundred different combinations. It was originally intended to go through the Echols and Shadily dictionary Kamus Indonesia Inggris [EcSh89] to make the list more complete, but this was not necessary in order to develop a working prototype so this task has been left as an area for future development.

Not every transitive verb from Langkah Baru [JoSt89], [John90] was used, and there may be some stored in the database that are not transitive verbs. The reason for this is that there are no known Indonesian dictionaries that specify the type of the word (i.e. transitive verb). Kamus Besar [Kamu88] gives information about the root but does not distinguish between the classes of verbs; nor does it give the type of the word when showing the different forms of the roots. We have thus attempted to classify the verbs in the prototype ourselves. If any further work is carried out on Kamus, the grammatical information that has already been entered into the database will have to be checked for correctness.

The synonyms for the prototype were obtained from Kamus Ungkapan Indonesia-Inggris, which is an Indonesian-English Dictionary [PoSu85a],[PoSu85b]. This also turned out to be a very time consuming task since, although the dictionary would list one word as being a synonym for another, it was not necessarily listed the other way around. Any synonyms mentioned in the dictionary that were not part of the chosen vocabulary were not included.

6 Results of Kamus

Table 2 shows several words that were put through Kamus along with the results that were output. The first three words, membawa, memotong, and mempertambangkan are words that exist in the database and, as can be seen, they have been parsed correctly. The first is a 'normal' word where the prefix is simply attached to the root without any modification. The second involves a dropped letter since the root word begins with 'p' and the prefix is mem (see Table 1). The third one demonstrates that Kamus treats two prefixes joined together, such as memper, as one prefix. Mereka is not a transitive verb and does not have any prefixes or suffixes but, since it has not been entered into the database, Kamus has followed its default behaviour (i.e. assumed that it follows the 'normal' syntactical rules) and has parsed it into a prefix + root. The final word is not in the database either, and nor is it a transitive verb. Kepandaian is actually a noun, but as the affixes ke and an are used for transitive verbs as well, Kamus has correctly parsed the word. This demonstrates that with the addition of the correct affixes for other parts of the Indonesian language, Kamus can be used to translate those words correctly as well.

Selected Word      Prefix   Root      Suffix
membawa            mem      bawa
memotong           mem      potong
mempertambangkan   memper   tambang   kan
mereka             me       reka
kepandaian         ke       pandai    an

Table 2: Sample Results from Kamus.

The main limitations highlighted during the testing of the Kamus prototype were:

a The user does not have the opportunity to select a word from a large vocabulary in the database. Only 600 root words are stored, together with 800 transitive verbs formed from these root words. Ideally, the database would contain not only all of the transitive verbs, but the majority of the Indonesian language. Kamus also cannot deal with transitive verbs that contain a duplicated root or two roots joined together, or that use the shortened versions of the pronouns engkau and aku.

b If information needs to be modified or added to the database, the user must be familiar with the underlying Ingres database software. As this is not a reasonable expectation, an interface to the database should be developed to allow the user to modify the information without needing an understanding of the database software or the database structure.

c The information that currently exists in the database is neither totally accurate nor complete, nor is it from a wide range of sources. A basic meaning for each word from only two different dictionaries has been entered into the database to demonstrate Kamus' potential. The information has not been checked for its level of compliance with Indonesia's language standards and no effort was made to ensure that it included the different spellings that are available for some words.

d The parser is dependent on the order in which the affixes are retrieved from the database: if there are two or more alternative decompositions of a word, Kamus stops after detecting the "first" one.

e Kamus has only been developed for a UNIX system and uses an underlying Ingres database.

f If a word is not stored in the database, the result of the parser cannot be guaranteed to be correct. Kamus can, however, make reasonably intelligent guesses in such circumstances.
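Limitation d can be illustrated with a small Python sketch that collects every decomposition instead of stopping at the first. It is an assumption-laden illustration, not the Kamus parser: the function decompositions and the affix lists passed to it are hypothetical, and dropped letters are ignored for brevity.

```python
def decompositions(word, prefixes, suffixes):
    """Return every (prefix, root, suffix) split, in the order affixes are given."""
    results = []
    for prefix in prefixes:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in suffixes:
            if not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)]
            if root:  # a split must leave a non-empty root
                results.append((prefix, root, suffix))
    return results


word = "mempertambangkan"
# The same affixes retrieved in two different orders (hypothetical lists).
order_a = decompositions(word, ["mem", "memper"], ["kan"])
order_b = decompositions(word, ["memper", "mem"], ["kan"])
```

Both calls find the same two decompositions, but the first element differs: order_a begins with the split on mem, while order_b begins with the split on memper. A parser that keeps only the first result therefore depends on retrieval order, whereas one that returns the whole list could let the user or a later ranking step choose.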

7 Further Research

Many improvements can be made to Kamus; some would remove the limitations previously identified and others would add features to improve its functionality. The following is a list of the main areas:

a One of the first improvements should be to increase the vocabulary so that it contains not only all of the transitive verbs, but the majority of the Indonesian language.

b An interface application program needs to be developed to allow the user to modify the content of the database without needing to know how to operate the underlying database software, or the database structure.

c For Kamus to have a practical use, it needs to be made available on other computer operating systems and underlying database software.

d As Kamus is a prototype for translating single words, the program could be used as a basis for a larger and more sophisticated translation system that translates whole sentences.

e Finally, as Kamus has been developed as an electronic tool for translating Indonesian, there is no reason why it could not be adapted to be used to translate other languages that exhibit a similar prefix-root-suffix structure.

8 Conclusion

Kamus is an experimental prototype that we believe has been successful in acting as a combined electronic dictionary/thesaurus and grammatical analyser. It provides a quick and easy tool for obtaining equivalent English meaning(s) for Indonesian words. When Kamus is given a word, a parser breaks it down into its prefix, root and suffix, as appropriate. The database retriever then retrieves the word's meaning, a list of synonyms and other grammatical information such as the type of the word and its root, and a description of the generic meaning produced when its affixes are joined together with that particular type of root. Kamus will produce the correct results if the word is in its supporting database (assuming that the information stored in the database is correct); otherwise it will give a logical estimate of the likely structure of the word. Kamus was developed in a UNIX environment with an Ingres database supporting it. There are two interface environments, the first being a windows style environment using X-Windows and Motif Widgets via WCL and the second being a UNIX command line environment.

At present, Kamus has been developed for translating Indonesian transitive verbs, but there is no reason why the program cannot be developed further to include the remainder of the Indonesian language.

During this project a computer assisted morphological analyser of Indonesian transitive verbs was developed. Although it requires more work, it is believed that what has been done is a positive step towards developing electronic tools for use as translators between English and the languages of Australia's neighbouring countries.

9 References

[EcSh89] ECHOLS, John M. and SHADILY, Hassan. Kamus Indonesia Inggris : An Indonesian-English Dictionary. Cornell University Press, Jakarta, 1989.

[EcSh90] ECHOLS, John M. and SHADILY, Hassan. Kamus Inggris Indonesia : An English-Indonesian Dictionary. Cornell University Press, Jakarta, 1990.

[John90] JOHNS, Yohanni. Bahasa Indonesia Book Two Second Edition Langkah Baru: A New Approach. Australian National University Press, Rushcutters Bay, 1990.

[JoSt89] JOHNS, Yohanni and STOKES, Robyn. Bahasa Indonesia Book One Langkah Baru: A New Approach. Australian National University Press, Rushcutters Bay, 1989.

[Kamu88] Kamus Besar Bahasa Indonesia (The Indonesian Large Dictionary). Department of Education and Culture, Jakarta, 1988.

[MacD76] MACDONALD, Roderick Ross. Indonesian Reference Grammar. Georgetown University Press, Washington DC, USA, 1976.

[Phil93] PHILLIPS, Robyn. Kamus: Computer Assisted Morphological Analysis of Indonesian Transitive Verbs. BSc (Honours) thesis, Department of Computer Science, University College, UNSW, Australian Defence Force Academy, Canberra, Australia. October 1993.

[PoSu85a] PODO, Hadi and SULLIVAN, Joseph. Kamus Ungkapan Indonesia-Inggris Jilid I A-L. Penerbit PT Gramedia, Jakarta, 1985.

[PoSu85b] PODO, Hadi and SULLIVAN, Joseph. Kamus Ungkapan Indonesia-Inggris Jilid II M-Z. Penerbit PT Gramedia, Jakarta, 1985.

[ShSu88] SHIMURA, Masamichi and SUKMADJAJA, Darmawan. "An English-Indonesian Computer Aided Translation System". Journal of Japanese Society for Artificial Intelligence. Volume 3, Number 1, January 1988, pp103 - 107.

[Sukm88] SUKMADJAJA, D. "Building an Indonesian electronic dictionary". International Symposium on electronic dictionaries - ISED '88. Inter Group, Tokyo, 1988, pp76-68.

[Suda87] SUDARWO, I. "The need for MT in Indonesia". MT Machine Translation summit. Manuscripts and Program. Japan, 1987, p113.

[Tsuj90] TSUJI, Yoshihide. "Multi-Language Translation System at Using Interlingua for Asian Languages". Proceedings of an International Conference organized by the IPSJ to Commemorate the 30th Anniversary. IPSJ, 1990, pp 545 - 552.

Annex A - Entity Relationship Diagram for the Kamus Database

[Figure: Entity Relationship Diagram for the Kamus Database]

Definitions of terms used in the Kamus entity relationship diagram.

A full word is a whole word consisting of the following syntax:

[prefix] + root + [suffix]

where the brackets indicate that the affix may or may not exist.

"Type of word" is the grammatical type of the full word. This may be, for example, noun, transitive verb, adjective, etc.

"Type of root" is the grammatical type of the root. This may be, for example, noun, transitive verb, adjective, etc. Note that words based on a particular root do not necessarily have the same type as their root.

The prefix and suffix are the defined group of letters that may exist at the beginning and end of the word respectively.
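The definitions above can be expressed as a small data model. The Python sketch below is an illustration only and does not reproduce the entity relationship diagram or the Ingres schema; the class names Root and FullWord, their field names, and the type labels given to the example word are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:
    text: str
    root_type: str  # grammatical type of the root, e.g. "adjective"


@dataclass(frozen=True)
class FullWord:
    root: Root
    word_type: str    # grammatical type of the full word, e.g. "noun"
    prefix: str = ""  # empty string when the word has no prefix
    suffix: str = ""  # empty string when the word has no suffix

    def surface(self):
        """Join [prefix] + root + [suffix]; dropped letters are not modelled."""
        return self.prefix + self.root.text + self.suffix


# Illustrative entry: the type of the full word differs from the type of its root.
pandai = Root(text="pandai", root_type="adjective")
kepandaian = FullWord(root=pandai, word_type="noun", prefix="ke", suffix="an")
```

Keeping the root as a separate record lets several full words share one root while each carries its own type, which matches the note that words based on a root need not have the same type as that root.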


[1] Department of Computer Science

[2] Department of Politics

University College, UNSW, Australian Defence Force Academy,

Canberra ACT 2600 Australia