A New Runtime for the EC Erlang Compiler
Dr Lawrie Brown
School of Computer Science, Australian Defence Force Academy, Canberra, Australia
Email: Lawrie.Brown@adfa.edu.au
Last updated: 12 Mar 2003
Abstract
This seminar will discuss the new runtime being developed for the
EC Erlang Compiler, which is intended to be both the core compiler
technology for the Magnus massively scalable computing platform
and a means of generating standalone executables from Erlang
programs. It has also been used to implement some previously
identified extensions that improve the safety of Erlang when it is
used for mobile code and other applications. These include
using capabilities for all critical resources (processes, open
files, network connections, references etc.), as well as a hierarchy
of nodes within an Erlang system.
The seminar will start with a brief overview of the Erlang language,
a functional language developed at the Ericsson Computer Science
Laboratory to support robust, reliable, distributed, near real-time
applications. I will then describe the proposed language safety
extensions and give a brief overview of the Magnus architecture. The
latter half of the seminar will discuss my work on the new runtime for EC.
This includes the use of detached system pthreads to implement
Erlang processes, the use of the dlopen interface to implement
dynamic module loading, memory allocation for Erlang terms and
its consequences for garbage collection, and some open issues in
the implementation of capabilities and the use of their rights in
the system.
Introduction
- A New Runtime for the EC Erlang Compiler
- Introduction
- Erlang - a functional, distributed language
- for robust, reliable, distributed, near real-time applications
- EC - an Erlang compiler to native code
- for the Magnus massively scalable computing platform
- as a testbed for safe Erlang mobile code extensions
- ecrt - a runtime for EC
- using detached system pthreads, dlopen, GC
Erlang
- Erlang - Functional
- Erlang - Concurrent
- focus is on soft real-time systems
- have extremely cheap and efficient process spawning
- use returned process identifier (pid) for all interactions
Echo = spawn(echo, loop, []),
Echo ! {self(), "Here is a message"},
....

-module(echo).
-export([loop/0]).   % loop/0 must be exported for spawn/3 to start it

loop() ->
    receive
        {From, Msg} -> From ! Msg, loop()
    end.
- Erlang - Distributed
- remote spawn almost identical
- given pid, all use is independent of locality
- eases implementation of robust distributed applications
Echo = spawn(SomeNode,echo,loop,[]),
....
- Erlang Language Features
- Concurrency - extremely light-weight processes with message passing
- Distribution - easy process creation & interaction on many nodes
- Robustness - error and exception handlers for fault-tolerance
- Soft real-time response - handles events in milliseconds
- Hot code upgrade - on a running system
- Incremental code loading - at boot time or as needed
- External interfaces - to devices, files, network, other processes
- Why Erlang?
- a production grade language
- impressive time to market
- shorter time, better quality
- good performance
- sufficiently good for real telecomms
- in large systems, any loss of speed relative to C is more than compensated
for by ease of development and maintenance
- real-time garbage collection works
- Limitations in Erlang now
- pure functional use is safe
- only dangerous with side effects
- pids/ports too powerful
- forgeable
- unrestricted control over resource
- e.g. can kill any process on the system
- processes have same "world-view"
- modules, names, resources, servers
- no remote code loading in context
- Safer Execution in Erlang
- need controlled access to processes and external resources (ports)
- make these data types capabilities
- want custom world-view
- implement a hierarchy of (sub)nodes
- registered names
- modules available
- resource limits
- need remote code loading in context
- Safe Erlang - Capabilities
- unique, unforgeable resource name with associated rights to use
<Type,Rights,Value,Node,Private>
- for nodes/pids/ports/user capas
- nb. capabilities are not the files, databases etc. themselves, just the means of access to them
- possible implementations considered
- crypto hash check, validated by node
- password (sparse) key to table on node
- created/checked only by owner node
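The "password (sparse) key" option above can be sketched in C. This is a minimal illustration under stated assumptions, not ECRT code: the field widths, the names `capa_t`, `capa_create` and `capa_check`, the fixed-size table and the xorshift key generator are all hypothetical. It shows the two properties the slide relies on: only the owner node creates and checks capabilities, and altering any field (e.g. widening the rights) or guessing the key fails validation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for the <Type,Rights,Value,Node,Private> tuple. */
typedef struct {
    uint8_t  type;        /* node, pid, port or user capability */
    uint32_t rights;      /* bitmask of permitted operations */
    uint64_t value;       /* identifies the underlying resource */
    uint32_t node;        /* owner node that created the capability */
    uint64_t private_key; /* sparse "password" chosen by the owner */
} capa_t;

#define CAPA_TABLE_SIZE 64

/* Capabilities issued by this (owner) node. */
static capa_t issued[CAPA_TABLE_SIZE];
static size_t n_issued = 0;

/* Placeholder key source (xorshift64). NOT cryptographically secure:
   a real node would draw keys from a strong random source. */
static uint64_t key_state = 0x9E3779B97F4A7C15ULL;

static uint64_t next_key(void) {
    key_state ^= key_state << 13;
    key_state ^= key_state >> 7;
    key_state ^= key_state << 17;
    return key_state;
}

/* Create a capability on this node and record it; false if the table is full. */
bool capa_create(uint32_t this_node, uint8_t type, uint32_t rights,
                 uint64_t value, capa_t *out) {
    if (n_issued == CAPA_TABLE_SIZE)
        return false;
    capa_t c = { type, rights, value, this_node, next_key() };
    issued[n_issued++] = c;
    *out = c;
    return true;
}

/* Only the owner node validates: every field, including the sparse key,
   must match an entry it issued. */
bool capa_check(uint32_t this_node, const capa_t *c) {
    if (c->node != this_node)
        return false;              /* not ours: must be checked by its owner */
    for (size_t i = 0; i < n_issued; i++) {
        const capa_t *e = &issued[i];
        if (e->private_key == c->private_key && e->type == c->type &&
            e->rights == c->rights && e->value == c->value)
            return true;
    }
    return false;
}
```

The crypto-hash alternative would avoid the table by having the owner recompute a keyed hash over the other fields; the table version trades memory on the owner node for a simpler check.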
- Safe Erlang - Nodes
- provide hierarchy of (sub)nodes
- each with a custom context
- registered names
- control which servers are accessible
- modules available
- module name aliasing
- support remote loading, safety renaming
- resource limits on processes
- partition & constrain use of system resources
- child nodes must partition their parent's share
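One way to read "child nodes must partition their parent's share" is sketched below for a single resource, the number of processes. The names (`node_t`, `node_new_child`, `node_spawn`, ...) and the quota policy are hypothetical assumptions for illustration; the point is the invariant `in_use + delegated <= limit`, which guarantees that no subtree of nodes can ever consume more than the share its root was given.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node accounting for one resource (process count). */
typedef struct node {
    struct node *parent;
    unsigned limit;      /* total processes this node may account for */
    unsigned in_use;     /* processes currently running directly in this node */
    unsigned delegated;  /* part of the limit handed on to child nodes */
} node_t;

/* Invariant: in_use + delegated <= limit, so this never underflows. */
static unsigned node_available(const node_t *n) {
    return n->limit - n->in_use - n->delegated;
}

/* A child's quota is carved out of the parent's remaining share. */
bool node_new_child(node_t *parent, unsigned share, node_t *child) {
    if (share > node_available(parent))
        return false;
    parent->delegated += share;
    child->parent = parent;
    child->limit = share;
    child->in_use = 0;
    child->delegated = 0;
    return true;
}

/* spawn succeeds only while the node has unused quota. */
bool node_spawn(node_t *n) {
    if (node_available(n) == 0)
        return false;
    n->in_use++;
    return true;
}

void node_process_exited(node_t *n) {
    if (n->in_use > 0)
        n->in_use--;
}

/* Halting a child kills its processes and returns its share to the parent.
   Simplification: assumes the child has no child nodes of its own. */
void node_halt_child(node_t *child) {
    child->parent->delegated -= child->limit;
    child->in_use = 0;
    child->limit = 0;
}
```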
- Safe Erlang - Remote Code Execution
- support "remote" code loading and execution in context
- to support mobile agents, applets
- module refs in "remote" modules
- load requested module from remote system, not from local system
- each module has a "context"
- considering code mobility
EC
- EC - A Compiler for Erlang
- generates fast native machine code
- for a large subset of Erlang
- independent of existing runtime
- to be widely available, portable, royalty free
- for use as core compiler for Magnus
- and as testbed for safe Erlang extensions
- Open Source Erlang
- current main Erlang implementation
- internally threaded interpreter for the BEAM machine
- robust, excellent development
- reasonable speed, limited by interpreter
- Other Erlang Implementations
- HiPE - High-Performance Erlang
- native code compiler for X86 & SPARC
- links to the existing Open Source Erlang (OSE) interpreter/runtime
- ETOS - Erlang to Scheme
- converts Erlang to Scheme
- then use Gambit Scheme Compiler to generate C
- currently incomplete, has proprietary licence
- Gerl - Geoff's Erlang
- compiles a large subset of Erlang into C++
- semantic incompatibilities with C++ (esp. calling conventions)
- closest to what's desired but has limitations
- EC Structure
- EC Status
- currently compiles most of the language
- as specified in Erlang book, and Erlang Ref Man 4.4 (1999)
- external format binaries, distribution, GC to come
- ports not needed (file/net now full types, can link to C etc)
- preprocessor TBD
- compiles to x86 assembler for BSD/Linux
- integrates with runtime to implement BIFs, concurrency,
message passing, distribution, garbage collection
- Magnus
- a massively scaleable computing platform
- separate parallel activities
- message-passing
- non-blocking interconnect
- implementation to use
- DMA @ bus speed across switch
- WDM passive optical switch
- Erlang
- Magnus Architecture
- Magnus WDM Switch
ECRT
- ECRT - A New Runtime for EC
- previous version of EC had minimal runtime
- support for some core types and basic ops
- but NOT concurrency, message passing etc
- ECRT intended to provide full support for language
- implement all types (incl extended pidportref type)
- provide concurrency (multiple Erlang processes) & message passing
- provide dynamic code loading
- builds and tests on FreeBSD, Linux, MacOSX, Solaris
- i.e. guess the next backend targets for the compiler ;-)
- ECRT - Basic Data Types
- Erlang is dynamically typed
- all values are tagged references to the heap
- have 8 fundamental types:
- atom, cons (list item), tuple, integer, float, function,
binary, pidportref (capability)
- use global heap for all terms
- some issues with garbage collection in a threaded environment
- but message passing requires only pointer transfer
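The slides do not say how the type tag is stored. A common technique, assumed here purely for illustration, is low-bit pointer tagging: eight types fit exactly in 3 bits, provided heap cells are 8-byte aligned. All names below are hypothetical, and `malloc` stands in for the shared garbage-collected heap.

```c
#include <stdint.h>
#include <stdlib.h>

/* The eight fundamental types: exactly enough values for a 3-bit tag. */
typedef enum {
    T_ATOM, T_CONS, T_TUPLE, T_INTEGER,
    T_FLOAT, T_FUNCTION, T_BINARY, T_PIDPORTREF
} tag_t;

typedef uintptr_t term_t;                 /* tagged reference into the heap */

#define TAG_MASK ((uintptr_t)7)           /* low 3 bits hold the tag */

typedef struct { term_t head, tail; } cons_cell;

/* Heap cells must be 8-byte aligned so the low 3 bits are free for the tag. */
term_t make_term(tag_t tag, void *cell) {
    if (((uintptr_t)cell & TAG_MASK) != 0)
        abort();
    return (uintptr_t)cell | (uintptr_t)tag;
}

tag_t term_tag(term_t t) { return (tag_t)(t & TAG_MASK); }
void *term_ptr(term_t t) { return (void *)(t & ~TAG_MASK); }

/* malloc stands in for the global GC'd heap used by ECRT. */
term_t make_integer(long v) {
    long *cell = malloc(sizeof *cell);
    if (cell == NULL)
        abort();
    *cell = v;
    return make_term(T_INTEGER, cell);
}

term_t make_cons(term_t head, term_t tail) {
    cons_cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->head = head;
    c->tail = tail;
    return make_term(T_CONS, c);
}
```

With one heap shared by all threads, sending a message only copies the single `term_t` word into the receiver's queue; the price is that the collector must consider the roots of every thread.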
- ECRT - Capability (Pidportref) Type
- ECRT - Capability - Issues
- determining the appropriate set of rights to use
- basically map to the BIFs that change status
- eg. all: MASTER, INFO, REGISTER, UNREGISTER, RESTRICT, REVOKE
- process: EXIT, GROUP_LEADER, KILL, LINK, PROCESS_FLAG, SEND, DEBUG
- NODE rights - can limit ability to access external environment
- eg. open files or net connections, change module load path etc
- node: SPAWN, HALT, PROCESSES, NEWNODE, MONITOR_NODE, FILEOPS, MODULE_PATH, NETOPS
- instead, enforce such interaction via message passing to a more trusted node
- open issue: should all processes in a node be treated as equivalent?
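The rights listed above map naturally onto a bitmask. In this minimal sketch the bit assignments and the function names are hypothetical, and RESTRICT is assumed to mean "derive a capability holding a subset of the current rights", which requires the RESTRICT right itself and can never add rights.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the rights named on the slide. */
enum {
    /* all capabilities */
    R_MASTER = 1u << 0, R_INFO = 1u << 1, R_REGISTER = 1u << 2,
    R_UNREGISTER = 1u << 3, R_RESTRICT = 1u << 4, R_REVOKE = 1u << 5,
    /* process capabilities */
    R_EXIT = 1u << 6, R_GROUP_LEADER = 1u << 7, R_KILL = 1u << 8,
    R_LINK = 1u << 9, R_PROCESS_FLAG = 1u << 10, R_SEND = 1u << 11,
    R_DEBUG = 1u << 12,
    /* node capabilities */
    R_SPAWN = 1u << 13, R_HALT = 1u << 14, R_PROCESSES = 1u << 15,
    R_NEWNODE = 1u << 16, R_MONITOR_NODE = 1u << 17, R_FILEOPS = 1u << 18,
    R_MODULE_PATH = 1u << 19, R_NETOPS = 1u << 20
};

typedef uint32_t rights_t;

/* A BIF may proceed only if every right it needs is held. */
bool has_rights(rights_t held, rights_t needed) {
    return (held & needed) == needed;
}

/* RESTRICT: the result is the intersection, so it is never more powerful
   than the original. Fails if the holder lacks the RESTRICT right. */
bool restrict_rights(rights_t held, rights_t keep, rights_t *out) {
    if (!has_rights(held, R_RESTRICT))
        return false;
    *out = held & keep;
    return true;
}
```

For example, a parent could hand a mobile agent a pid capability restricted to `R_SEND` only, so the agent can message that process but not kill or link to it.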
- ECRT - Concurrency
- Erlang inherently concurrent with message passing between processes
- node implemented as a heavyweight Unix process with address space protection
- Erlang processes implemented as detached, system scheduled, Posix threads
- have a user-space Erlang process table (hash-chain) to manage the many aspects
of Erlang processes (links, message buffers, flags, registered names)
- use thread-specific data to hold local info (pid, heap, process dictionary)
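A minimal sketch of spawning an Erlang process as a detached POSIX thread, with its per-process record in thread-specific data, follows. The record fields, the numeric pid, the live-process counter and `example_body` are hypothetical; only the pthread calls are standard POSIX. Because detached threads cannot be joined, the sketch keeps a count of live processes so the node can still tell when all of them have finished.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Per-process record held in thread-specific data (fields are illustrative). */
typedef struct {
    long pid;                      /* hypothetical numeric pid */
    void (*body)(void);            /* the compiled Erlang function to run */
} erl_proc;

static pthread_key_t self_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Count of live processes, since detached threads cannot be joined. */
static pthread_mutex_t live_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t live_cv = PTHREAD_COND_INITIALIZER;
static int live = 0;

static void make_key(void) {
    pthread_key_create(&self_key, free);   /* free() the record at thread exit */
}

/* Equivalent of self(): fetch the calling thread's process record. */
erl_proc *erl_self(void) {
    return pthread_getspecific(self_key);
}

static void process_finished(void) {
    pthread_mutex_lock(&live_lock);
    if (--live == 0)
        pthread_cond_broadcast(&live_cv);
    pthread_mutex_unlock(&live_lock);
}

static void *proc_start(void *arg) {
    pthread_setspecific(self_key, arg);
    ((erl_proc *)arg)->body();
    process_finished();
    return NULL;
}

/* spawn: run body in a detached thread, so no one ever has to join it. */
bool erl_spawn(long pid, void (*body)(void)) {
    pthread_once(&key_once, make_key);
    erl_proc *p = malloc(sizeof *p);
    if (p == NULL)
        return false;
    p->pid = pid;
    p->body = body;

    pthread_mutex_lock(&live_lock);
    live++;
    pthread_mutex_unlock(&live_lock);

    pthread_attr_t attr;
    pthread_t tid;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&tid, &attr, proc_start, p);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(p);
        process_finished();
        return false;
    }
    return true;
}

/* Block until every spawned process has returned. */
void erl_wait_idle(void) {
    pthread_mutex_lock(&live_lock);
    while (live > 0)
        pthread_cond_wait(&live_cv, &live_lock);
    pthread_mutex_unlock(&live_lock);
}

/* Example process body: reads its own pid through thread-specific data. */
long last_seen_pid = -1;

void example_body(void) {
    long me = erl_self()->pid;
    pthread_mutex_lock(&live_lock);
    last_seen_pid = me;
    pthread_mutex_unlock(&live_lock);
}
```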
- ECRT - Concurrency - Issues
- cancellation is a nightmare!
- Erlang processes need to be able to send exit signals to each other
- originally tried using Posix deferred thread cancellation
- simply could not make it work reliably
- problems were the handling of locks & the amount of work on cleanup
- a cancellation request could be acted on whilst the Erlang process held a lock,
which then had to be tracked & released (this became very hard)
- exiting processes must signal their exit status to other linked processes,
but this involves interactions requiring locks etc. (led to recursive cleanup calls!)
- finally discarded any use of cancellation
- now set a flag in our process data structure
- rely on Erlang processes (threads) to check it regularly (they must check on
message passing and other process interactions)
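The flag-based replacement for thread cancellation can be sketched together with a mailbox. In this illustration (all names hypothetical, integer exit reasons assumed) an exit signal sets a flag under the mailbox lock and wakes the target if it is blocked; the target notices the flag the next time it calls receive. Since the receiver always releases the lock itself before returning, no lock is ever held when a process decides to exit, which is exactly the cleanup problem deferred cancellation caused. Sending queues only the pointer to the term.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct msg { struct msg *next; void *term; } msg;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    msg *head, *tail;
    bool exit_pending;     /* set by another process instead of cancelling it */
    int  exit_reason;      /* hypothetical integer exit reason */
} mailbox;

void mbox_init(mailbox *m) {
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->nonempty, NULL);
    m->head = m->tail = NULL;
    m->exit_pending = false;
    m->exit_reason = 0;
}

/* send: only the pointer to the (global-heap) term is queued; no copy. */
bool mbox_send(mailbox *m, void *term) {
    msg *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->next = NULL;
    n->term = term;
    pthread_mutex_lock(&m->lock);
    if (m->tail) m->tail->next = n; else m->head = n;
    m->tail = n;
    pthread_cond_signal(&m->nonempty);
    pthread_mutex_unlock(&m->lock);
    return true;
}

/* Exit signal: set the flag and wake the target if it is blocked in receive. */
void mbox_signal_exit(mailbox *m, int reason) {
    pthread_mutex_lock(&m->lock);
    m->exit_pending = true;
    m->exit_reason = reason;
    pthread_cond_broadcast(&m->nonempty);
    pthread_mutex_unlock(&m->lock);
}

/* receive: checks the flag every time; returns false if the caller must exit. */
bool mbox_receive(mailbox *m, void **term, int *reason) {
    pthread_mutex_lock(&m->lock);
    while (!m->exit_pending && m->head == NULL)
        pthread_cond_wait(&m->nonempty, &m->lock);
    if (m->exit_pending) {
        *reason = m->exit_reason;
        pthread_mutex_unlock(&m->lock);
        return false;
    }
    msg *n = m->head;
    m->head = n->next;
    if (m->head == NULL)
        m->tail = NULL;
    pthread_mutex_unlock(&m->lock);
    *term = n->term;
    free(n);
    return true;
}
```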
- ECRT - Dynamic Code Loading
- Erlang modules are dynamically loaded and can be replaced
- have implemented this using dlopen() et al
- indeed can load not just Erlang (EC modules) but any compiled
code which conforms to naming/layout conventions
- module M in file M.so
- function F arity A has entry point M_F_A
- have module_info table ["F", A, *M_F_A]
- runtime manages a table (hash-chain) of loaded modules used to identify
externally accessible functions in each module
- must consult this table on every apply call and external function call
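The naming convention above (module M in M.so, entry point M_F_A) can be sketched with the standard `dlopen`/`dlsym` calls. The function names, the buffer sizes and the `dir` argument (standing in for a directory on the module load path) are assumptions for illustration. The rejection of '/' in module names is an extra safety check added in this sketch, not something the slides describe. Erlang function names that are not valid C identifiers (quoted atoms) would need some escaping scheme not shown here.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* "<dir>/M.so": module M lives in file M.so on the module load path.
   Rejects names containing '/' so a module name cannot escape dir. */
bool module_file_name(const char *dir, const char *mod, char *buf, size_t len) {
    if (strchr(mod, '/') != NULL)
        return false;
    int n = snprintf(buf, len, "%s/%s.so", dir, mod);
    return n >= 0 && (size_t)n < len;          /* false if truncated */
}

/* Function F of arity A in module M has entry point "M_F_A". */
bool entry_point_name(const char *mod, const char *fun, unsigned arity,
                      char *buf, size_t len) {
    int n = snprintf(buf, len, "%s_%s_%u", mod, fun, arity);
    return n >= 0 && (size_t)n < len;
}

/* Load a module; returns the dlopen handle, or NULL (see dlerror()). */
void *load_module(const char *dir, const char *mod) {
    char path[512];
    if (!module_file_name(dir, mod, path, sizeof path))
        return NULL;
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}

/* Resolve M:F/A in a loaded module to its compiled entry point. */
void *resolve_function(void *handle, const char *mod, const char *fun,
                       unsigned arity) {
    char sym[256];
    if (!entry_point_name(mod, fun, arity, sym, sizeof sym))
        return NULL;
    return dlsym(handle, sym);
}
```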
- ECRT - Dynamic Code Loading - Issues
- have some concerns about the overhead of consulting the loaded module table on function calls
- still resolving how to implement code replacement
- loading a new module is easy, will be inserted before old in hash-chain,
hence all new calls will use latest version
- real issue is determining when no further Erlang processes are using old code
- current thinking is to have external calls instrument themselves to count
entry/exit
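The scheme just described (insert the new version before the old one in the hash chain, and have external calls count entry/exit) might look like the single-threaded sketch below. All names are hypothetical. A real table would need locking, and the window between looking up a version and calling `call_enter` would have to be closed, otherwise an old version could be judged idle just as a late caller enters it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct module_ver {
    struct module_ver *next;   /* next entry in this hash chain (older versions follow) */
    char name[64];
    int version;               /* illustrative version number */
    atomic_int active_calls;   /* external calls currently executing in this version */
} module_ver;

#define NBUCKETS 128
static module_ver *buckets[NBUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert at the head of the chain, so lookups find the newest version first. */
module_ver *module_insert(const char *name, int version) {
    module_ver *m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    strncpy(m->name, name, sizeof m->name - 1);
    m->version = version;
    atomic_init(&m->active_calls, 0);
    unsigned b = hash(name);
    m->next = buckets[b];
    buckets[b] = m;
    return m;
}

module_ver *module_lookup(const char *name) {
    for (module_ver *m = buckets[hash(name)]; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

/* Instrumentation wrapped around each external call. */
void call_enter(module_ver *m) { atomic_fetch_add(&m->active_calls, 1); }
void call_exit(module_ver *m)  { atomic_fetch_sub(&m->active_calls, 1); }

/* An old version may be unloaded once it is no longer the newest
   and no calls into it are in progress. */
bool module_old_is_idle(module_ver *old) {
    return module_lookup(old->name) != old &&
           atomic_load(&old->active_calls) == 0;
}
```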
- ECRT - Garbage Collection
- a serious issue for this type of functional language
- terms are created when needed
- space used must be released when no longer referenced
- can make code much safer since avoids memory leaks
- but not a main interest of either Maurice or myself
- chose to use the existing Boehm-Demers-Weiser mark-sweep GC
- available for many platforms (good)
- thread support much less widely ported (bad: only Linux & Solaris!)
- still need further work to resolve here
- ECRT - Other Issues
- Erlang apply BIF: apply(M, F, A)
- takes arbitrary list of arguments
- must implement by pushing arguments onto the call stack as needed
- have to use inline assembler to do efficiently (on x86)
- otherwise have "nasty hack" (tm) switch on num args!
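The portable "switch on num args" fallback can be sketched as follows. The `term` stand-in type, the maximum arity of 4 and the example functions `demo_add_2` and `demo_sum_3` (named after the M_F_A convention) are hypothetical, and the argument list is assumed to have already been converted into an array. Converting the stored generic pointer back to the function's real type before calling is well-defined C, which is what makes this dispatch portable, unlike the inline-assembler path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t term;   /* stand-in for a tagged Erlang term */

/* One pointer type per supported arity; anyfun is the generic stored form. */
typedef void (*anyfun)(void);
typedef term (*fun0)(void);
typedef term (*fun1)(term);
typedef term (*fun2)(term, term);
typedef term (*fun3)(term, term, term);
typedef term (*fun4)(term, term, term, term);

/* apply: cast back to the real type for argc and call it.
   Returns false for arities this switch does not cover. */
bool apply_n(anyfun f, size_t argc, const term *argv, term *result) {
    switch (argc) {
    case 0: *result = ((fun0)f)(); break;
    case 1: *result = ((fun1)f)(argv[0]); break;
    case 2: *result = ((fun2)f)(argv[0], argv[1]); break;
    case 3: *result = ((fun3)f)(argv[0], argv[1], argv[2]); break;
    case 4: *result = ((fun4)f)(argv[0], argv[1], argv[2], argv[3]); break;
    default: return false;   /* larger arities need more cases (or assembler) */
    }
    return true;
}

/* Example "compiled Erlang" functions following the M_F_A convention. */
term demo_add_2(term a, term b) { return a + b; }
term demo_sum_3(term a, term b, term c) { return a + b + c; }
```

The cost of this approach is one case per arity up to some fixed limit, which is why pushing arguments with inline assembler is the more general (but less portable) option.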
- having runtime build/test on multiple platforms is great for isolating bugs!
Conclusion
- Conclusion
- have made much progress into making EC usable
- have explored some interesting issues in implementing a fully
compiled version of a concurrent, functional language
- still have more to do!
- Questions
- References
- Talk Outline: http://lpb.canb.auug.org.au/adfa/seminars/adfa303/
- BDWGC: H. Boehm, "A garbage collector for C and C++", HP, http://www.hpl.hp.com/personal/Hans_Boehm/gc/
- BoWe88: H. Boehm and M. Weiser, "Garbage Collection in an Uncooperative Environment", Software Practice & Experience, pp 807-820, Sept 1988.
- Boeh93: H. Boehm, "Space Efficient Conservative Garbage Collection", ACM SIGPLAN Notices, Vol 28, No 6, pp 197-206, June 1993. http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi93.ps.Z
- BrSa99: L. Brown and D. Sahlin, "Extending Erlang for Safe Mobile Code Execution", in V. Varadharajan and Y. Mu (eds), Information and Communication Security, Lecture Notes in Computer Science, Vol 1726, Springer-Verlag, pp 39-53, Nov 1999. http://lpb.canb.auug.org.au/adfa/research/sserl/icics99.html
- Bro97d: L. Brown, "SSErl - Prototype of a Safer Erlang", Technical Report No CS04/97, School of Computer Science, Australian Defence Force Academy, Canberra, Australia, Nov 1997. http://lpb.canb.auug.org.au/adfa/papers/tr9704.html
- Cast01: M. Castro, "EC: an Erlang Compiler", Technical Report No SERC-0128, Software Engineering Research Centre, RMIT, Melbourne, Australia, Jul 2001. http://www.serc.rmit.edu.au/~ec/docs/SERC-0128.pdf
- Cast01b: M. Castro, "EC: an Erlang Compiler", in Seventh International Erlang/OTP User Conference, Erlang Systems, Stockholm, Sweden, Sep 2001. http://www.erlang.se/euc/01/castro2001.ps
- EC: M. Castro and L. Brown, "EC: an Erlang Compiler", SERC RMIT and CS ADFA, 2003. http://www.cs.adfa.edu.au/~ec/