A New Runtime for the EC Erlang Compiler

Lawrie Brown
School of IT and EE, Australian Defence Force Academy, Canberra, Australia
`Lawrie.Brown@adfa.edu.au`

Abstract

This paper will discuss the new runtime being developed for the EC Erlang Compiler. EC is intended to be both the core compiler technology for the Magnus massively scaleable computing platform, and to implement some previously identified safety extensions to Erlang for use in mobile code applications.

I start with a very brief overview of the Erlang language, a functional language designed to support robust, reliable, distributed, near real-time applications, with the emphasis on the features the runtime must support. I then briefly list other efforts at developing an Erlang compiler. Next I discuss my work on developing the new runtime to support the EC compiler, and some of the problems and issues that arose in doing this. These include the representation and handling of Erlang terms and its consequences for garbage collection; the use of detached system pthreads to implement Erlang processes (and why we could not use the standard cancellation mechanisms); the use of the dlopen library to implement dynamic module loading; and some issues with external interaction. I conclude with some open issues to be resolved and areas for further research.

1 Introduction

This paper details some of the issues raised and solutions trialed in developing a new, reasonably complete, runtime for the EC Erlang compiler. Erlang [1,2] is a dynamically typed, single assignment language which uses pattern matching for variable binding and function selection, which has inherent support for lightweight concurrent and distributed processes, and has error detection and recovery mechanisms. It was developed at the Ericsson Computer Science Laboratory to satisfy a requirement for a language suitable for building large soft real-time control systems, particularly telecommunications systems. The Erlang system in general use uses a threaded interpreter of intermediate code compiled from Erlang source modules. This OSE Erlang system has been released as an open source product[17], and is freely available for download. However previous efforts to create a full compiler for Erlang have met with mixed success.

In previous work, I have proposed extensions to the language to improve its ability to support safe mobile code execution, which are known as Safe Erlang [8,7]. The extensions included the use of unforgeable references (capabilities) with associated rights for safety critical data types (eg process identifiers), and support for a hierarchy of execution nodes. I had previously trialed the concepts in a prototype, but now needed to incorporate them into a full production Erlang system.

My current work involves drawing these threads together. Maurice Castro at the Software Engineering Research Centre (SERC), RMIT, was developing the EC Erlang compiler [9,10,11]. The core compiler existed, but with very limited run-time support. This was precisely where I needed to make changes to the language to support my safety extensions. Hence I proceeded to develop a greatly extended run-time, with support for the full language features as well as my safety extensions. This paper documents some of the problems that arose, and the solutions chosen, in this development.

2 Erlang Language Requirements

Erlang is a functional language designed to support robust, reliable, distributed, near real-time applications. Some of the key features of Erlang needed for this include:

Concurrency: extremely light-weight process creation, with message passing between them
Distribution: easy remote process creation, message passing and interaction across multiple nodes
Robustness: error and exception handlers for fault-tolerant code
Soft real-time response: event handling in milliseconds
Hot code upgrade: of modules on a running system
Incremental code loading: of modules named at execution time as needed
External interfaces: to devices, files, network, non-Erlang processes, treated as special "processes"

The Erlang language is functional, derived (now very loosely) from a prolog progenitor. As well as the basic language syntax, it defines a quite large number of BIFs (Built-In Functions). In some ways these are more akin to O/S support utilities than language support because of the wide range of features supported as native by Erlang.

A key focus of Erlang is on soft, real-time, systems. It is essential that it has extremely cheap and efficient process spawning. This allows code such as the following example (a trivial echo deamon):

    Echo = spawn(echo,loop,[]), 
    Echo ! {self(),"Here is a message"}, 
    ....
    loop() ->
      receive  
        {From,Msg} -> From ! Msg, loop()
      end.

The process identifier returned from the spawn call is used to send messages to that process. But it also can be used for a range of other purposes such as sending signals to it (including killing it), or examine its process dictionary. These additional uses needed to be limited in a Safe Erlang implementation.

Further, distribution is part of the core language. Hence the spawn of the new thread can be directed at some other execution node as easily as on the local node. The following code fragment would start the echo daemon process on a remote node. The message passing though would be identical to that already shown.

    Echo = spawn(SomeNode,echo,loop,[]),

The ëverything is a process" model is also used for interaction with the external world - to devices, files, network etc. These special processes are called ports, are created by a special call, and implemented using code compiled in to the standard runtime.

Another key feature of Erlang is its support for incremental code loading and hot code upgrade. When a function in another module is called, the binding to the code for that module is delayed till runtime. For example you can have code like:

    Word = string:sub_word(" Hello old boy !",3),
    apply(io,write,[Word]),

At the point of call, the system must identify the required code and transfer execution to it. Hence in the code above, the binding to modules string or io is delayed until run-time.

Further, there is support for the replacement of modules whilst execution is occurring. Erlang assumes that at least two instances of any module may be loaded and running at any time. New references to a module will always used the most recently loaded, but previous calls may have resulted in code still executing in the previous version. When the application is convinced that all such calls have been completed, then the old code may be unloaded.

In developing the runtime, I needed to include support for these critical features of the language.

3 Erlang Compilers and EC

The current main production Erlang system is Öpen Source Erlang" (OSE). OSE uses an internally threaded interpreter of intermediate BEAM code, compiled from the original Erlang source modules. Whilst the efficiency of this system is quite good, as has been demonstrated in its production use in a range of applications, it does suffer the overhead of interpretation.

The original motivation for starting work on EC, was that a full, unencumbered, compiler was needed for the Magnus project at SERC. The existing OSE did not provide this. Also, from my perspective of implementing safety extensions, the OSE runtime is complex and not well-documented, with many inter-dependencies.

There have been several previous efforts at developing a full compiler for Erlang. These include:

HiPE - High-Performance Erlang: HiPE is a native code compiler for X86 & SPARC architectures, developed by a team at Uppsala University, Sweden [16,15,13]. It is now part of the standard OSE system, and utilises the existing OSE interpreter runtime. Whilst it is now widely available as part of the OSE distribution, since it uses the same runtime, it has the same limitations for our purposes.
ETOS - Erlang to Scheme: ETOS is an Erlang compiler that functions by translating Erlang to Scheme, and then using the Gambit Scheme Compiler to generate C code, which is finally compiled for the target system [12]. The limitations of this system are that it is currently incomplete, has a proprietary licence, is convoluted, and lastly, the most recent versions are only available as a binary distribution. This eliminates it as a suitable platform for our research.
Gerl - Geoff's Erlang: Gerl (Geoff's Erlang) compiles a large subset of Erlang into C++, which is then compiled for the target platform [20,19]. It also includes support for a number of additional features including a concurrent monitoring system and simple static type analysis. Whilst this is close to what was desired for this research, it had some limitations. In particular there are some semantic incompatibilities between Erlang and C++, especially concerning calling conventions and Erlang's overloaded function definitions which are resolved at runtime. It is also unable to interact with existing OSE Erlang systems.

Given these perceived issues with any of the existing compilers, Maurice Castro commenced the development of the EC (Erlang Compiler) system [9,10,11]. As of mid-2002, the basic compiler was complete, but with a very limited runtime library.

One issue EC faced was that the Erlang language is still evolving. As a target for EC, the language recognised is as specified in the original Erlang book, and the subsequent language reference manual [1,3], with the minimum of changes needed to support the required safety extensions [8,7].

3.1 System Targets

From the beginning, EC was built and tested on a range of target systems. In particular it had been compiled and tested on FreeBSD, Redhat Linux, Solaris and MacOSX. Although the compiler ran on all these platforms, it only included a code generator for x86 assembler, targeted to the FreeBSD and Linux systems. On the other platforms it ran as a cross-compiler. Code generators for other architectures would be desirable, and is an area for future work.

Utilising multiple target platforms has been very beneficial in creating reliable code. In conjunction with an extensive regression test suite used to constantly verify the output from the various modules in the compiler, it was very useful to identify bugs and ambiguities in the code (not to mention incomplete library API specifications)!

4 A New EC Run-Time

As of mid-2002, the existing EC runtime supported the core primitive data types and basic operations on them. However it had no support for garbage collection, concurrent processes, message passing, distributed nodes, or module loading and upgrade. That is, none of the really ïnteresting" parts of the language (not to mention much harder to implement). Clearly to be a useful implementation of Erlang, these features had to be added. As well, I wished to use this implementation as a validation of the Safe Erlang language extensions and features.

5 Erlang Data Type Support

EC keeps all data values on the heap. References to these values use a tagged address. That is the bottom 3 bits of the address are used as a flag to indicate one of 8 major data types, being: Ïnteger, Float, Atom, PidPortRef, Tuple, Cons, Function, Binary". Details of the representation of most (except PidPortRef), are given in [9]. Remembering to mask these bits out before dereferencing is a nice trap for the careless (even despite having macros to make this easier, sigh)!

Note that references to data values can be found in other compound data items (eg tuples or lists/cons), in local variables in Erlang process (thread local) stacks, and in system data structures (such as process, module, registered_name and other tables, and in the message-passing buffers). This impacts on the selection of a garbage collector, as detailed below. Support for most of the primitive and compound data types had already been written by Maurice in the original EC development. Missing was any support at all for PidPortRef or Binary terms, and very incomplete support for Function terms.

5.1 Capabilities for PidPortRef Types

A key element identified as being required to support a Safe Erlang variant, was the use of unforgeable references (capabilities) with associated rights for safety critical data types (such as process identifiers). In existing Erlang systems, these values are all associated with the PidPortRef data type. Consequently I chose to implement a capability type as a generalization of this existing type. A capability is a globally (almost) unique identifier for some resource. I support a range of capability sub-types for resources including: "references, open files, net connections, loaded modules, nodes, process identifiers", amongst others. Within each of these resource types, the capability uses a monotonically increasing integer index of such resources on some node, and uses a globally distinct node name (currently DNS name) for the node.

As well as identifying some resource, a capability also includes a set of rights defining what operations are permitted on that resource. In EC we needed to determine an appropriate list of rights to use. These were resolved to be a general set of rights for manipulating capabilities: [info, register, unregister, restrict, revoke], and then for each sub-type, rights which map onto BIFs or categories of BIFs which manipulate such types. For example for process-identifiers: [exit, group_leader, kill, link, process_flag, send], or for nodes: [spawn, halt, processes, newnode, fileops, netops], etc. These may still need to be refined further with experience with their use.

Lastly, in order to make these unforgeable, an opaque, private value is associated with each capability, which can be used by the originating node to verify that the capability is valid and intact whenever it receives a request to manipulate the resource referenced by the capability. Note that this implementation does not require the capability's validity be verified elsewhere, thus avoiding the problem of key distribution that plagues many more general capability systems. In the current implementation, a 64-bit cryptographic hash is used as the private data, based on the TEA-CBC cipher [18]. This was chosen as it is very small and fast, with the hash size being a compromise between security and size. However other implementations such as password capabilities could be chosen and used on a node by node basis, since the check value is opaque outside the originating node.

Hence in EC a capability value on the heap is implemented as a structured data value of the form: [type|rights|index|nodeid|private].

5.2 Function Types

The existing runtime had limited support for function terms when used for intra-module function references. This had to be generalized to inter-module references, which interacted with the development of support for dynamic module loading and execution, detailed below. In particular, a function reference can be made to variables containing the module and function names (as atoms), which will not be known till runtime. Hence actual mapping to a specific function can only occur during execution. That is, you can have a function reference which looks like:

    Somefun = fun Name/Arity,
    Res = Somefun(Arg1, Arg2),

Only at runtime will the actual values in the Name and Arity variables be known, and hence the actual function, and its loaded address be identified.

One currently unresolved complexity with implementing Erlang function references involves inline function definitions which refer to local variables in the defining function, whose values need to be preserved for subsequent references to the function. This is the intention of the initvar field noted above. However its actual implementation is still incomplete. This aspect of the language seems difficult to map into a compiled implementation.

5.3 Binary Types

Support for binaries was a fairly simple addition. A binary data value on the heap had the form: [type|len|data...]. The contents of the binary were assumed to be an öpaque" blob (and in particular to not contain pointers to other data items on the heap for garbage collection purposes).

One issue that is currently unresolved is support for the concat and split binary BIFs. Currently these involve the creation of entirely new binary values on the heap, and hence the copying of all of the data contents. I have seriously considered an alternate implementation, tagged in the type field, which involves an array of [len|addr] pairs, being a scatter-gather collection of components of the binary. This would greatly improve the efficiency of the concat and split binary BIFs, especially on larger binary values, at a cost of greater complexity of code in other BIFs which must utilise the binary data values (and which currently need only refer to a single "chunk" of data). This issue is yet to be resolved.

5.4 Garbage Collection

Garbage collection is a serious issue for implementing any functional language, since it assumes storage management is automatic. That is, terms are created when needed, and the space used must be released when no longer referenced anywhere. These mechanisms can make code much safer, since it avoids memory leaks. However writing such a package is a non-trivial undertaking, and not of research interest for either Maurice or myself.

In the specific case of implementing Erlang in EC, there are a number of issues of concern. As already noted above, references to potentially collectible data items can occur in a range of places - heap, stack, system data, which must all be identified so they can be scanned. Also execution of code that could change these values must be suspended whilst collection is occurring. Given the multiple thread environment being implemented, this is non-trivial, and means that the garbage collector must be thread-aware, cogniscient of where the various thread stacks are located, and how to access and manipulate the thread-management data to suspend executing threads.

A key design decision concerning data storage, was whether to use a single global heap, or multiple, thread-local heaps. The tradeoff is that the former means that message passing can be done by copying data references to the heap, rather than complete data items as the latter would require. However the use of thread-local heaps would simplify garbage collection, since only the only thread would need to be suspended whilst collection occurs. However given the centrality of message-passing to Erlang code, and the potentially rather large size of data values (particularly binaries, lists and tuples), a global heap implementation was chosen.

For contrast, it is worth noting that the OSE system uses thread-local heaps by default, though support for a global heap is being developed (currently problematic), precisely because of the performance concerns noted above. The problems though are in the thread management. GERL also chose to use thread-local heaps and hence to copy for message passing.

After researching available packages the Boehm-Demers-Weisner mark-sweep garbage-collector was chosen for use [6,5,4]. This is available on many platforms, including all the target O S we used. Unfortunately the versions with thread support have been much less widely ported, currently only being available on the Linux and Solaris targets of those we use. This means that on the other platforms garbage collection is unavailable.

This is a major limitation, however no other package with comparable facilities exists, so it's not clear how we can progress this issue. It should also be noted that we have yet to really stress-test the capabilities of the garbage-collector in a reasonably large application with significant amounts of data manipulation. This is an open but certainly critical, issue.

6 Concurrency

A key feature of Erlang is support for cheap and efficient process spawning. Hence we clearly needed to support some form of light-weight processes. For greatest portability POSIX pthreads were chosen [14]. This meant that since the overall process space was common, with memory shared by all the threads, that a global heap, and message-passing by copying references rather than data was possible. More specifically, we chose to use detached, system scheduled, Posix threads, leaving the scheduling to the system scheduler for fairest sharing of the system resources. It is of interest that the GERL implementation in contrast, mapped Erlang process onto Linux processes, relying on the light-weight process forking implementations in the latest Linux kernels for efficiency, and accepting message passing by copying data.

Having chosen pthreads, it was next found that just because it's a standard, that doesn't mean all implementations are equal - because they're not! On the various target platforms we used, different sets of the pthread functions were available. Also different sets of Makefile flags were needed to correctly build the libraries on the different O/S targets. Further the underlying thread identifiers varied greatly. The greatest problem area related to cancellation, discussed below. However the variation in identifier form meant we had to build tables to map the "generic process index" we used in our process identifier capabilities to the local pthread identifier on any given system. Which meant managing our own üser space" process table.

6.1 Key System Data Structures

Hence as part of the concurrency implementation, we needed to build and manage a üser-space" process table. For every Erlang process, implemented as a separate pthread, we needed to keep a record with details of its pthread identifier, Erlang master process capability, list of Erlang processes linked to it, its registered name, received messages buffer, assorted flags, access locks etc. These records were then stored in a hash-chain (or bucket-hash) table, which comprised an array of pointers to lists of the process-table records.

The hash-chain structure proved very effective, and it was re-used for a number of other system tables, including the module, node, and registered_name tables.

Another key data structure is the use of thread-specific data in each Erlang process (pthread). It is a record, accessed using the pthread_getspecific call, which houses assorted process (pthread) specific data. This includes the list of pending messages flushed from the global process-table record for the thread, the private process (pthread) Erlang dictionary, and other utility variables. It was convenient, and efficient, to use this data area, rather than to have to access the process-table record whenever such information was needed, with the overhead of locking that entails.

6.2 Message Passing

A key field in the process-table record is the received message buffer. For ease of debugging from within erlang programs, this was implemented as a standard Erlang list (ie a linked lists of cons terms, terminated by the null list tag). Each item in the list has the next message sent as its head element. One critical difference though was that this list was allowed to be mutable, where as Erlang lists are required to be immutable (any change requires copying the list - but leads to safer code). This was done for efficiency reasons. However it does mean that care is needed if the full list is passed back to Erlang code (usually only done for debugging reasons), in which case the list is cloned to create an immutable copy. This same approach was taken with a number of other system managed lists, including the linked processes list, and the process dictionary.

Also, instead of allocating new cons cells on the heap for every message, a (configurable) fixed sized buffer was used. This meant it was possible that messages may not be delivered when it overflowed, this was not believed to be a major concern. Especially since the implementation has each Erlang process (pthread) flushing all messages out of this buffer and copying into a list referenced from the thread-specific data. Thus the central buffer only has to hold messages sent between receive calls for any Erlang process. Given the centrality of message passing to much Erlang code, this was felt to be reasonable.

6.3 Fun and Games with Locking

Whenever there is shared data, there is of course, a requirement to provide locking to ensure serialization of access to this data. All of the key system tables are shared, and hence must be locked. In the first instance a ptread_mutex was used to manage access to these global data structures, including the global atom, open file, loaded module, registered name, open net socket, known nodes, and of course process tables.

The process tables handling though, was rather more complex than the other tables (which were fairly simple key-value lookups). There were a number of reasons why processes might want access to a record in the process table, and having obtained it, may wish to hold it for some time whilst, for example, writing messages or signals etc to it. For this reason, a two-stage locking process was chosen for this. The global mutex is used to access or manipulate the table as a whole. Then in every record there is a condition variable used to manage access to it, and which must be acquired before any change is made to that record. Several overlapping conditions were used depending on who wanted access and what for. Once the condition variable lock was acquired, the overall table mutex could be released. This approach seems to work well, and handles for example, the case when the process is exiting (see below) to ensure that all other attempted accesses are flushed before acquiring the lock.

7 Cancellation is a Nightmare

One of the biggest problems (read nightmares!) in developing the new runtime was providing suitable support for cancellation of Erlang processes (pthreads). Erlang permits any process knowing the process identifier of another to send it a signal which may or may not result in termination of the other process. Further when terminating, an Erlang process needs to convey its exit status to the list of all processes linked to it (as wishing to receive said status). This is done by passing a message of appropriate form to all such processes. Hence process termination is a rather complex and involved process. Although the use of capabilities placed some limits on who could take such actions, they were still very much present.

Having chosen to use pthreads, it seemed an obvious choice to use the thread cancellation mechanisms provided. I should state that the Posix pthread guide [14], did warn that the use of cancellation was extremely problematic and fraught with danger. They were right!!! The short answer is (which took 2 months of frustration to reach) it doesn't work!

In perhaps a little more detail the problems that occurred firstly involved highly varying support for the mechanisms. In particular on MacOSX 10.1 and 10.2 for example, the code simply did not work as specified! Leaving that aside, an effort was made to use it on the other platforms. The next issue was the selection of the cancellation mode - deferred cancellation was chosen as it meant that cancellation would only occur when calling certain, documented, system library routines (unfortunately the list of exactly which routines also varied from O/S to O/S). When a thread was sent a cancellation signal, the next time it called one of the defined routines, control would be transferred to the cancellation handler (multiple of which could be pushed onto a cancellation stack). So far so good.

The next issue was the dreaded locks. When a thread was canceled and started executing the cancellation handler, it could have been interrupted whilst holding one of the locks to access a global table. Initially when there was just the process and names tables this did not seem too bad. The process needed to record when it was holding a lock so it could be safely released by the handler. But with the number of locks growing this started to become very cumbersome. An alternate approach was trialed, of using errorchecking mutexes. This means that, at the cost of additional overhead, all mutexes track which thread currently holds them, so that attempts by threads not holding them to release the lock can safely be ignored. And then having the cancellation handler simply attempt to release all locks. Of course, it goes without saying that support for this feature was not present on all platforms! And the ever growing number of locks was still problematic.

But wait, there's more! As noted above, a process (pthread) being canceled needs to relay its status to other linked processes. Which involves sending messages to them, by adding these messages to their received message buffer. Which means acquiring a suitable lock on the process table and then on the condition variable for the specific record for that process. But this call is one of the cancellation points. Which results in the cancellation routine being recursively called. Which results in a rapid spiral into oblivion! In short it just doesn't work.

So how did we solve the problem? In a nutshell, abandon any attempt to use pthread cancellation, and code it explicitly into the process table handling. An exiting flag was added to each thread's process table record. This could be set by any other thread which had acquired the appropriate locks. Some time later when the target thread next accessed the process table (which, note, occurs for all message passing operations), it can check this flag as part of the handling. If set it calls its exit handler (which carefully set flags so it knew it had already been called)! This all worked well, and even better, only used features that were actually supported on all target O/S platforms! The downside, is that cancellation would be deferred until the targeted thread next accessed the process table. But realistically, this is very little different to how pthread deferred cancellation works. And in essence all we had done was reimplement deferred cancellation in user space - but where we had complete control of the process.

So the moral of this sorry tale is that you should not even contemplate using cancellation with pthreads interacting with each other and using locks. It just does not work! Which is, of course, what they said in the first place, and which begs the question of why they even bother with the calls in the API. Whatever, I hope you don't get caught.

8 Dynamic Code Loading and Execution

The next major implementation issue was adding support for runtime binding of function calls to the appropriate routine, and providing support for dynamic code loading and replacement.

In EC the decision was made early to use standard C calling conventions, to maximise interoperability with code compiled in other languages. All function parameters and its return value had to be Erlang term references (type eptr). A naming convention was adopted whereby a call to function F in module M with N arguments would be mapped to an entry point name of M_F_A, and that this function would be located in a file with name M (and a suitable suffix for dynamically loaded modules on the O/S).

Further, as has been noted before, Erlang can include code like:

    apply(io,write,Args),

Even though the module and functions names might be known at compile time (though they might also be variables), given the use of an arbitrary list Args for the arguments, the actual number of arguments would not be known till runtime. Apart from delaying resolution of the actual entry point name till then, a further concern was how to actually build a conventional C function call with the number of arguments only being determined at runtime. In a word, it simply can't be done in C. For the target architecture, some inline x86 assembler code was used to walk the argument list, pushing arguments onto the stack in the correct order, and then doing the actual function call. For all other systems a really "nasty hack" ^TM involving a switch statement on the number of args and a series of calls where the correct number of args were used (and if you wanted more than 10 args you were plumb out of luck!). Very messy. And a limitation that GERL ran into also.

Having managed the problem of building a call, there was still the issue of determining the entry point. The solution chosen was to require all modules to include a global table with the name M_module_info which contained a series of triples being [function_name, arity, entry_point_addr]. This could be scanned to determine which specific function was needed, and the entry point address was then used in the call. The EC compiler generates this (and the other tables noted below) automatically. Code compiled from another language needs these tables to be created manually. Note that the first entry in the table is reserved, and holds module_name, number_exported_functions, module_load_address, the last being the address in memory where dlopen placed the loaded module.

As well as the M_module_info table, there are also two other tables which can optionally be included. M_import_tab listing other modules imported for use by this module, and the M_attribute_tab which lists Erlang attribute names and associated values (both as plain C strings).

In order to locate the correct M_module_info table for any module M, pointers to them were kept in a global loaded modules table. This is another hash-chain table, keyed off the module name. It also has the nice property that if a module of the same name is loaded (to replace an older module), since entries are inserted at the head of the chain from any bucket, the most recently loaded module table for any name will be accessed. This global table is initialised with an entry for the erlang module, which is a placeholder for all the BIFs which are implemented as part of the runtime. This table is automatically generated (sed is wonderful) whenever the runtime is compiled. And thankfully permits the actual code to be scattered over a large number of statically linked in modules.

Which then leads to the issue of how to dynamically load modules at runtime. Again, in the interests of maximum portability, the obvious choice seemed to be the use of the dlopen library. Although this is not the standard mechanism for dynamic code loading on MacOSX, thankfully the 3rd party dlcompat library took care of this issue. After working out the necessary Makefile flags (which needless to say were vastly different on all supported platforms), this proved to be relatively straightforward to implement. The use of the module info table detailed above was chosen over direct use of the dlsym call, since it both permitted support for ßtatic" modules (such as erlang), and also the handling of code updating more easily.

The checking for existing use of, and the subsequent unloading of superceeded modules is still to be implemented.

9 External Interaction

In OSE Erlang, being an interpreter, interaction with external devices and processes, including access to files and network, is handled using driver code linked in with the runtime. The "handle" for such external data, is a port, one form of the PidPortRef data type.

Since directly supports links to external code, the decision was made not to take this approach. For the general case, suitable interface code could be written in any suitable language, with glue routines conforming to the EC module conventions (noted above) used to provide the interface between the Erlang code and the external code.

In the case of file and network access though, suitable glue code was provided as part of the runtime, and specific "file" and "net" sub-types of the general capability type were used as the handle for such open files or network connections. This has the added advantage of leveraging the capability rights mechanism to control what forms of access are permitted to these types of resources.

10 Distributed Erlang Nodes

Probably the last critical element of the runtime was providing support for interaction with other Erlang nodes, thus implementing the distribution features of Erlang which make the language so powerful. There are two key components of this. First is the handling of the external representation (serialization) of Erlang terms. And second is the implementation of the distribution protocol, and the handling of remote requests from other nodes.

The ëxternal format" has evolved through several versions in the evolution of Erlang. It is documented and implemented in the erl_interface standard library module in the OSE Erlang distribution [17]. Whilst much of this could be directly adapted for use by the EC runtime, there were problems with the handling of functions and the general "capability" (ie pidportref) data type. In the end, there had to be two versions of the external format. One compatible with the OSE protocol (which lost some information present in the original EC terms, which then had to be either cached or recreated), and another with a couple of non-standard extensions for use when interacting with other EC nodes which encoded all of the information in EC terms. An EC node will correctly decode either, but care needs to be taken when sending data to a non-EC remote node.

The final critical component is the distribution protocol. Again this has evolved, and is still evolving as Erlang changes. It has two components. The first involves interaction with an Internet daemon called epmd, a standard part of the OSE Erlang distribution [17], which provides a mapping between Erlang node names on some system and the TCP port they listen for external connections on. This has been implemented, so EC nodes can register themselves, and can lookup other nodes.

The second key component is the actual interaction with the remote nodes. This is still under development as perhaps the last critical component still missing in the EC runtime. Once this is done, it should be possible for Erlang code running on EC nodes to interact with code running on conventional OSE Erlang nodes, which is a key design goal. Note that GERL cannot do this, whilst of course HiPE can, since it uses the same runtime as OSE Erlang.

11 Other Issues and Further Work

At this stage the final component of the distribution protocol critically needs to be written. It should then be possible to seriously compare and contrast code running in an EC node to that running on an OSE node. As well, EC code needs to be stress tested with suitably large, long running, and complex applications to ensure that all the mechanisms used function in a production environment. Only then will it possible to evaluate whether the effort of building a fully compiled system has been beneficial.

As far as further work is concerned, there are still a number of unresolved issues in the implementation, and alternatives. In the compiler code, the choice of data representation and efficient cacheing of references to primitive types such as atoms, integers, floats and to function references, has been flagged for further discussion. Also resolution of the incomplete garbage collection support has to be resolved. Perhaps somewhat further in the future, it would be highly desirable to develop code generators for other architectures, particularly PPC and SPARC since the rest of the system already works on systems using these. The design of the compiler isolates code generation to the final (sixth) phase of the compiler, hence adding new generators should not be too difficult.

12 Conclusions

This paper documents some of the design decisions taken, and the problems found, in implementing a reasonably complete runtime for the EC Erlang compiler. It had to provide support for the various features of the language, most critically concurrency, cancellation, and dynamic code loading. The use of the Posix pthreads, the use of üser space" cancellation handling, and the use of the "dlopen" library are probably some of the key final design choices.

At this time, the new EC runtime is close to being a reasonably complete implementation of the Erlang language in a fully compiled environment. I hope it will be used to contrast the efficiency of code running in this environment against the more traditional interpreted environment. In its development, I believe it has highlighted some interesting issues in the development of a fully compiled environment for a functional language like Erlang.

On a more personal and pragmatic note, this work has resulted in the original, minimal runtime comprising 1 module, about 1500 lines of C code and 72 external entry points expanding into a rather more substantial 16 modules, 15000 lines of C code and 222 external entry points during my period on sabbatical! The latter number is rather indicative of the fact that the Erlang BIFs include much functionality more commonly associated with O/S support, emphasising that it is intended to be a self-contained language for many applications.

13 Acknowledgments

This work was conducted during my special studies program in S2 2002, whilst visiting SERC at RMIT in Melbourne. In particular, I'd like to thank Dr Maurice Castro for the initial work on EC, and his input on my extensions to it; and Prof Fergus O'Brien for his support in hosting me at SERC.

References

[1]: J. Armstrong, R. Virding, C. Wikstrom, and M. Williams. Concurrent Programming in Erlang. Prentice Hall, 2nd edition, 1996. http://www.erlang.org/download/erlang_book_toc.html.
[2]: Joe Armstrong. The Development of Erlang. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming, pages 196-203. ACM, 1997.
[3]: Jonas Barklund and Robert Virding. Erlang 4.7.3 Reference Manual: Draft 0.7, 1999. http://www.bluetail.com/~rv/Erlang-spec/Drafts/es_4.7_0.7.ps.gz.
[4]: H. Boehm. A garbage collector for C and C++. http://www.hpl.hp.com/personal/Hans_Boehm/gc/.
[5]: H. Boehm. Space Efficient Conservative Garbage Collection. SIGPLAN Notices, 28(6):197-206, June 1993. http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi93.ps.Z.
[6]: H. Boehm and M. Weisner. Garbage Collection in an Uncooperative Environment. Software Practice & Experience, pages 807-820, Sept 1988.
[7]: Lawrie Brown. SSErl - Prototype of a Safer Erlang. Technical Report CS04/97, School of Computer Science, Australian Defence Force Academy, Canberra, Australia, Nov 1997. http://www.unsw.adfa.edu.au/~lpb/papers/tr9704.html.
[8]: Lawrie Brown and Dan Sahlin. Extending Erlang for Safe Mobile Code Execution. In Y. Mu V. Varadharajan, editor, Information and Communication Security, volume 1726 of Lecture Notes in Computer Science, pages 39-53. Springer-Verlag, Nov 1999. http://www.unsw.adfa.edu.au/~lpb/research/sserl/icics99.html.
[9]: Maurice Castro. EC: an Erlang Compiler. Technical Report SERC-0128, Software Engineering Research Centre, RMIT, Melbourne, Australia, Jul 2001. http://www.serc.rmit.edu.au/~ec/docs/SERC-0128.pdf.
[10]: Maurice Castro. EC: an Erlang Compiler. In Seventh International Erlang/OTP User Conference, Stockholm, Sweden, Sep 2001. Erlang Systems. http://www.erlang.se/euc/01/castro2001.ps.
[11]: Maurice Castro and Lawrie Brown. EC: an Erlang Compiler, 2003. http://www.cs.adfa.edu.au/~ec/.
[12]: Marc Feeley. ETOS Distribution, 2003. http://www.iro.umontreal.ca/~etos/.
[13]: Erik Johansson, Mikael Pettersson, and Konstantinos Sagonas. A High Performance Erlang System. In 2nd International Conference on Principles and Practice of Declarative Programming (PPDP 2000), Montreal, Canada, Sept 2000. http://www.csd.uu.se/~happi/publications/p32-johansson.pdf.
[14]: Bil Lewis and Daniel J. Berg. Multithreaded Programming With PThreads. Sun Microsystems Press, 1997. 0136807291.
[15]: K. Sagonas, M. Pettersson, R. Carlsson, P. Gustafsson, , and T. Lindahl. All you wanted to know about the Hipe compiler (and might have been afraid to ask). In ( Submitted for publication), 2003. http://www.csd.uu.se/~kostis/Papers/erlang03.pdf.
[16]: K. Sagonas, M. Pettersson, R. Carlsson, P. Gustafsson, , and T. Lindahl. HIPE Homepage, 2003. http://www.csd.uu.se/projects/hipe/.
[17]: Erlang Systems. Open Source Erlang Distribution, 1999. http://www.erlang.org/.
[18]: David J. Wheeler and Roger M. Needham. TEA, a Tiny Encryption Algorithm, volume 1008 of Lecture Notes in Computer Science, pages 363-366. Springer-Verlag, 1994. http://www.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html.
[19]: Geoff Wong. Compiling Erlang via C. Technical Report SERC-079, Software Engineering Research Centre, RMIT, Melbourne, Australia, Dec 1998. http://goanna.cs.rmit.edu.au/~geoff/erlang/gerl_paper.ps.gz.
[20]: Geoff Wong. GERL Distribution, 2002. http://goanna.cs.rmit.edu.au/~geoff/erlang/.

File translated from T_EX by T_TH, version 2.89.
On 30 Jun 2003, 14:31.