Towards a Safer Erlang

Dr Lawrie Brown

School of Computer Science, Australian Defence Force Academy, Canberra, Australia

Abstract

To support the evolving telecommunications applications arena, there is a desire to modify the Erlang language and execution environment to provide safe and partitioned execution of externally sourced mobile agents, applets, or outsourced programs, which are imported and run on a local Erlang system. Following trials with several earlier prototypes, a consensus is emerging on the features required in a secure Erlang. This paper discusses the requirements that have been identified, and relates them to ongoing work with the SSErl prototype.

Introduction

Erlang is a declarative language for programming concurrent and distributed systems which was developed at the Ericsson Computer Science Laboratories [AVWW96], [Arms96], [Wiks94]. It is a dynamically typed, single assignment language which uses pattern matching for variable binding and function selection, has explicit mechanisms to create concurrent and distributed processes, and advanced facilities for error detection and recovery.

Two prototypes of a safer Erlang environment have been trialed.

SafeErlang: developed by Gustaf Naeser and Dan Sahlin at Uppsala [Nae97a] (and used in [JNS97]), supports a hierarchy of subnodes to control resource usage and to support remotely loaded modules, and "encrypted capabilities" for pids and nodes to control their usage.
SSErl: developed by the author whilst on sabbatical at SERC and NTNU [Bro97d], supports a hierarchy of nodes on each Erlang system which provide a custom "context" for processes in them, and the use of "password capabilities" for pids, ports, and nodes to constrain the use of these values.

Following trials with these prototypes, and extensive discussions between myself, Maurice Castro, Gustaf Naeser, Dan Sahlin, and other colleagues at Ericsson CS labs and SERC, a consensus is emerging on the features required in a safer Erlang. In the following sections I present my understanding of this consensus. Later in this paper I present a method of imposing a security policy on services by the use of a monitor function applied to server requests by a modified safe_gen_server module, as a better solution than that previously trialed with SSErl.

Rationale for a Safer Erlang

There are a number of scenarios in which a safer Erlang execution environment is desirable. These include:

outsourced code: where a system wants to run code supplied by a third-party, but in a controlled and constrained environment to limit access to resources to those necessary to run the desired application.
mobile agents: where a system chooses to run a server offering to host "mobile agents" which request to migrate a group of processes to it. These processes in turn may load and run remotely sourced modules (loaded in context based on the agent source location) using the new MID mechanism.
applets: where a system (user) chooses to run a client which requests that one or more remotely sourced modules (applets) be loaded and run.
fault isolation: where a large application is partitioned into a number of sections, each executed in a distinct environment with limits on resource usage and resource access.

All of these may be regarded as instances of "mobile code" which executes in a constrained environment at a location distinct from its origin. There are obvious safety issues raised, whereby the executing system wishes to constrain the information which may be accessed, and the resources used by such code. Some of these issues, and existing proposed solutions, are canvassed in [Bro96b]. To date most of the focus on providing safe execution environments has involved adapting traditional procedural programming languages. In this work, we are evaluating the ease of modifying a declarative, functional language, Erlang, instead.

Using Erlang as the core language for a safe execution environment provides a number of advantages. Firstly, "pure" functional language use, where functions manipulate dynamically typed data and return a value, is safe (apart from problems of excessive resource usage). It is only when these functions are permitted to have side-effects that they may become dangerous. Side effects are possible in Erlang when a function:

accesses other processes: by sending and receiving messages or signals, viewing process information, or changing the process flags, on either local or remote processes.
accesses external resources: outside the Erlang system (files, network, hardware devices etc), using the open_port BIF.
accesses permanent data: held in databases managed by the db BIFs.

Thus a safer Erlang requires controls on when such side-effects are permitted.

Features Required in a Safer Erlang

The consensus on the extensions needed to control side-effects is that two changes are needed to the current Erlang system:

nodes: should form a hierarchy within an Erlang system to provide a custom "context" of services available, restrict the use of code with side-effects, and impose resource utilisation limits.
capabilities: used to impose a finer granularity of control on the use of process (and port and node) identifiers, making these unforgeable with a specified set of rights on their use.

Nodes

A hierarchy of nodes should exist within an Erlang run-time system (an existing Erlang node). These would provide a "custom context" of services available, restrict the use of code with side-effects, and impose resource utilisation limits. Functionally these would be similar to existing Erlang nodes with some additional features.

A "custom context" for processes is provided by having distinct:

registered names table: a table of "local" names and their associated capability identifiers. Used to advertise services by name to processes executing within the node. These names are not shared with other nodes, and are not visible to other nodes unless an appropriate server is running to access them as a "distributed" name (discussed in the names section below).
loaded modules table: which specifies which modules are currently (or may be permitted to be) loaded for use by processes executing in the node. This table maps the name used in the executing code to the name used locally for the loaded module. This module name "aliasing" mechanism is used to support both redirection of names to "safer" variants of modules, as well as to support modules loaded from remote systems (for agents or applets), where the module name needs to be made unique on the local system. This table would be consulted to map the module name on all external function calls, applys, and spawns. It can be pre-loaded with "safe" aliases, as well as names of standard library modules where the local copies should be used by agents or applets. It would then be updated by other module load requests as processes execute, loading the modules from the appropriate context (source node) if permissible.
capability key data: the information necessary to create and validate unforgeable capabilities for the node, as discussed below.
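The module name "aliasing" lookup described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the module name alias_sketch, the functions new/0, resolve/2 and call/4, and the alias target safe_file are all hypothetical, and the table is modelled as an ordinary Erlang map rather than as run-time system state. For simplicity the sketch refuses any name not already in the table, whereas the text envisages loading further modules from the source node where permissible.

```erlang
-module(alias_sketch).
-export([new/0, resolve/2, call/4]).

%% Hypothetical per-node "loaded modules" table: maps the module name used
%% in the executing code to the name of the module used locally.
new() ->
    #{lists => lists,          % standard library module: use the local copy
      file  => safe_file}.     % redirect to a "safer" variant (assumed name)

%% Look up the local name for a module. In this simplified sketch, names
%% that are not in the table are refused rather than loaded on demand.
resolve(Table, Mod) ->
    case maps:find(Mod, Table) of
        {ok, Local} -> {ok, Local};
        error       -> {error, {not_permitted, Mod}}
    end.

%% An external function call consults the table before applying the function.
call(Table, Mod, Fun, Args) ->
    case resolve(Table, Mod) of
        {ok, Local} -> {ok, apply(Local, Fun, Args)};
        Error       -> Error
    end.
```

In a real implementation the same lookup would be applied by the run-time system to every external call, apply, and spawn, rather than by an explicit call/4 wrapper.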

Restrictions on "side-effects" are enforced by specifying whether or not each of the following are permissible for processes executing in the node:

open_port
block direct access to external resources such as files, network, hardware devices etc.
external process access
block access (e.g. send, exit, link, process_info) to processes running in nodes on other Erlang systems (i.e. accesses which are provided via the net_kernel).
database BIFs usage
block direct access to permanent data accessed via the local database manager.

When disabled, access to such resources would have to be mediated by server processes running on the local system, but in a more privileged node, which would be trusted to enforce an appropriate access policy for safety. Typically these servers would be advertised in the registered names table of the "restricted" node. Standard servers can be used, if they are started with a custom safety policy check function, as discussed later.

Resource limits could be imposed by a node on all processes executing within it, or within any of its descendant nodes. Limits could be imposed on CPU usage, memory usage, or the maximum number of reductions, or perhaps on combinations of these. Further work is needed to define the appropriate resources to be limited.

The general approach to creating a controlled execution environment to support the type of scenarios described above is as follows. First a number of servers are started in a node with suitable privileges, to provide controlled access to resources. Then a node would be created with "side-effects" disabled. Its registered names table would be loaded with references to these servers, its "loaded modules" table with appropriate local library and safe alias names, and appropriate resource limits would be set. Processes would then be spawned in this node to execute the desired "mobile code" modules, in a now appropriately constrained environment.
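As a rough illustration of the steps above, the description of a restricted node might be assembled as a plain data structure. Everything named here is an assumption made for the sketch: the module node_ctx_sketch, the map keys (names, modules, allow, limits), the flag names, and the reduction limit value are hypothetical, and no real node is created.

```erlang
-module(node_ctx_sketch).
-export([restricted/1, permitted/2]).

%% Build a hypothetical description of a restricted node. Servers is a list
%% of {Name, Capability} pairs for servers already running in a more
%% privileged node.
restricted(Servers) ->
    #{names   => maps:from_list(Servers),                 % registered names table
      modules => #{lists => lists, file => safe_file},    % loaded modules table
      allow   => #{open_port => false,                    % side-effects disabled
                   external  => false,
                   database  => false},
      limits  => #{max_reductions => 1000000}}.           % arbitrary example limit

%% Ask whether a class of side-effect is permitted in this node.
%% Anything not explicitly listed is treated as not permitted.
permitted(#{allow := Allow}, Op) ->
    maps:get(Op, Allow, false).
```

A process spawned in such a node would reach files or the network only through the servers listed in its names table.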

Capabilities

Currently in Erlang, if a process obtains or constructs a process identifier, it has a large degree of control over that process. It can send and receive messages or signals, view the process information, or change the process flags. This is unsafe. A finer granularity of control on the use of these process (and port and node) identifiers is needed.

Making these identifiers capabilities is one means of providing this finer granularity. A capability is a globally unique (in time and space), unforgeable name for (a specific instance of) some resource. Associated with it is a list of rights for operations permitted on the resource by holders of the capability.

Capabilities may be implemented in several ways. Encrypted capabilities and password capabilities have been demonstrated in the safer Erlang prototypes, and are being considered for use.

SafeErlang used encrypted capabilities for pids and nodes. It encrypted all of the information identifying the resource and the rights to use it. This incurs an overhead whenever the information must be accessed or verified, which depends on the type of encryption algorithm used. Once created, there is no way to revoke these capabilities apart from destroying the associated resource. It is also subject to an attacker attempting to guess the key used to encrypt the capability in order to forge other capabilities.

An alternative variant of encrypted capabilities is to use a cryptographic hash function (eg [RFC2104]) to create a check value for the capability information, which is then kept in the clear. This system is still subject to an attacker attempting to guess the key used in creating the check value, and verifying the guess against intercepted capability data. The likelihood of success will depend on the type of hash function used.

SSErl used password (sparse) capabilities [APW86]. In a password capability system, the capability is a data item which specifies the node creating it, and a random value (selected sparsely from a large address space). It is possible to try to forge a capability, but success is statistically highly improbable, and attempts should be detectable by abnormally high numbers of requests presenting invalid capabilities. The capability has no meaning on its own, but is only of use as a token when supplied with a request for some operation to the node which maintains a table of valid capabilities. This imposes a space overhead on the node state table. However password capabilities may be easily revoked by removal from the table of currently valid capabilities. A disadvantage is the size this table may grow to, particularly for long running server processes, and if capabilities are used for references or user defined values, where it is impossible to know when they have no further use. Also large tables may take some time to search, though careful selection of the table mechanism can reduce this to a minimum.
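The password capability scheme just described can be sketched in a few lines. This is not the SSErl implementation: the module pwcap_sketch, its function names, the {NodeId, Password} representation, and the 16 byte password length are assumptions chosen for illustration, and the node's table is modelled as a map passed explicitly between calls.

```erlang
-module(pwcap_sketch).
-export([new_table/0, create/4, lookup/2, revoke/2]).

%% The node's table of currently valid capabilities (initially empty).
new_table() -> #{}.

%% Create a capability: a random password chosen from a large space, plus
%% the identity of the issuing node. The resource and rights are recorded
%% only in the node's table, keyed by the password.
create(Table, NodeId, Resource, Rights) ->
    Password = crypto:strong_rand_bytes(16),   % length is an assumption
    {{NodeId, Password}, Table#{Password => {Resource, Rights}}}.

%% Validate a capability presented with a request.
%% Returns {ok, {Resource, Rights}} or error.
lookup(Table, {_NodeId, Password}) ->
    maps:find(Password, Table).

%% Revocation is simply removal from the table.
revoke(Table, {_NodeId, Password}) ->
    maps:remove(Password, Table).
```

The sketch makes the tradeoff visible: revocation is a single table removal, but every live capability costs a table entry.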

There is thus a tradeoff between these alternatives - trading some level of security with encrypted capabilities for space with password capabilities. The best alternative is likely to depend on the target environment.

Experience with the prototypes has also shown that it is important for efficient execution that all information needed to evaluate guard tests or pattern matches be present locally in the capability. This information must include the type (for the various pid/port/ref etc guards) and the value (to test if two capabilities refer to the same object).

Consequently I propose that capabilities have the following form:

      {Type,NodeId,Value,Rights,Private}

where

Type
describes the type of resource the capability references, such as a node, process, port, reference, or user defined capability type (atom).
NodeId
the identifier of the node which created the capability, and which can be asked to verify the validity of the capability or perform operations on the specified resource, which is managed by that node.
Value
identifies the resource which is referenced by the capability (node, process identifier, port identifier, reference number, or any Erlang term, respectively for the various types).
Rights
a list (bitmap?) of operations permitted on the resource referenced by the capability. The actual rights depend on the type of the capability. For a process capability, it could include rights like: [exit,link,info,register,restrict,send,trace,view].
Private
an opaque (probably binary) term, which can be used by the originating node to verify the validity of the capability when it is submitted with a request to perform some operation on the associated resource. It could either be a cryptographic check value, or a random password value, only the originating node need know.

(nb. I've expressed the capability as a tuple, as used in the prototypes, but in a final implementation it will be an indivisible, fundamental, tagged data item, as pids/ports/refs are now).

Using capabilities of this form would allow individual nodes (more likely systems) to choose whichever implementation (and associated tradeoffs) is most appropriate, whilst preserving interoperability amongst nodes.

My own preference, after evaluating the prototypes, is marginally for the use of encrypted capabilities, implemented using a cryptographic hash function (eg [RFC2104]) applied to the full, externally visible, values of the other components. Care should be taken in the choice of an appropriate hash function, since known flaws have been identified in some "obvious" modes of use [BCK96]. The overhead of validating the check value can be minimised if any "local" encrypted capabilities are checked once on creation (or import) and then simply flagged as such (say by amending the hidden data type tag to indicate it has been verified). Subsequent use of the capability then incurs no overhead. This assumes the Erlang run-time system is trusted (which is implicit in all this work). Further, for "remote" capabilities, any delays due to crypto overheads are likely to be swamped by network latencies. Each node would keep the key used to create and validate its capabilities secret, and this key could be randomly selected when the node is initialised. Any previously created capabilities must refer to no longer extant (instances of) resources (from a previous incarnation of the node), so there is no requirement to continue to be able to validate them.
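A minimal sketch of this keyed check value approach, using the tuple form proposed above, is given below. It assumes a recent Erlang/OTP release whose crypto module provides crypto:mac/4, and it picks HMAC with SHA-256 and a 32 byte key purely for illustration; the module name hmac_cap_sketch and its functions are hypothetical, and none of this is taken from the prototypes.

```erlang
-module(hmac_cap_sketch).
-export([new_key/0, make/5, verify/2]).

%% A node would pick a fresh secret key when it is initialised.
new_key() -> crypto:strong_rand_bytes(32).     % key length is an assumption

%% Keyed check value over the externally visible components.
%% HMAC-SHA-256 is an illustrative choice, not one made in the text.
check_value(Key, Type, NodeId, Value, Rights) ->
    Public = term_to_binary({Type, NodeId, Value, Rights}),
    crypto:mac(hmac, sha256, Key, Public).

%% Create a capability whose Private component is the check value.
make(Key, Type, NodeId, Value, Rights) ->
    {Type, NodeId, Value, Rights,
     check_value(Key, Type, NodeId, Value, Rights)}.

%% Recompute the check value and compare. Note that =:= is not a
%% constant-time comparison; a production version would need one.
verify(Key, {Type, NodeId, Value, Rights, Private}) ->
    Private =:= check_value(Key, Type, NodeId, Value, Rights);
verify(_Key, _NotACapability) ->
    false.
```

Because the key is regenerated at node start, capabilities from a previous incarnation of the node fail verification automatically, which matches the observation above that they need not remain valid.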

As well as capabilities for nodes, processes and ports, I propose using capabilities for references, and for extensible "user capabilities". The latter provide an unforgeable data item which can be given to "untrusted code", and subsequently verified when supplied with later requests. This could be used to implement safer versions of the file, window and network servers for example; where the "user capabilities" are used to refer to individual files etc, passed as parameters in a series of requests, but in a manner which means the server can verify that the data value associated with it has not been tampered with.

A capability may be restricted (assuming it permits it). This results in the creation of a new capability, referencing the same resource, but with a more restricted set of rights. Using this, a server process can for example, register its name against a restricted capability for itself, permitting other processes to send to it, but nothing else.
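Restriction can be sketched as a filter over the rights list. The sketch assumes the tuple form {Type,NodeId,Value,Rights,Private}, treats the presence of a restrict right as the permission to restrict, and takes a hypothetical Seal function standing in for whatever the originating node does to produce a fresh Private component; the module and function names are illustrative only.

```erlang
-module(restrict_sketch).
-export([restrict/3]).

%% Produce a new capability for the same resource, keeping only those
%% existing rights that also appear in Wanted. Rights can never be added.
%% Seal is a fun({Type,NodeId,Value,Rights}) -> Private supplied by the
%% originating node (hypothetical).
restrict({Type, NodeId, Value, Rights, _Private}, Wanted, Seal) ->
    case lists:member(restrict, Rights) of
        true ->
            NewRights = [R || R <- Rights, lists:member(R, Wanted)],
            {ok, {Type, NodeId, Value, NewRights,
                  Seal({Type, NodeId, Value, NewRights})}};
        false ->
            {error, not_permitted}
    end.
```

A server could use this to register its name against a capability carrying only the send right, as in the example given in the text.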

Capabilities would be used instead of the existing node names, pids, ports, or refs by BIFs which create or use these resources. New BIFs would be required for operations such as:

check(Capa,Op)
checks if the supplied capability is valid and permits the requested operation (this is not a guard test as it must consult the originating node to validate the capability).
make_ref(Type,Val)
creates a user capability with the specified type (atom) and value (term), as a general case of a reference.
newnode(Parent,Name,Opts)
creates a new node as a child of the Parent, with the specified context options.
restrict(Capa,Rights)
creates a new version of the supplied capability, referring to the same resource, but with a more restricted set of rights.
same(Capa1,Capa2)
guard testing whether the supplied capabilities refer to the same resource.
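Of these, same/2 is the one that can be evaluated purely from local information, which is why it can be a guard test. A sketch over the prototype tuple form follows; the module name is hypothetical, and an ordinary function like this cannot itself be used in a guard, so it only illustrates the comparison a built-in would perform.

```erlang
-module(same_sketch).
-export([same/2]).

%% Two capabilities refer to the same resource when their Type, NodeId and
%% Value components match. Rights and Private are deliberately ignored, so
%% a restricted copy is still "the same" as the original.
same({Type, NodeId, Value, _, _}, {Type, NodeId, Value, _, _}) -> true;
same(_, _) -> false.
```

By contrast, check/2 must consult the originating node, since only that node can validate the Private component.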

Other Issues

Discussions on the experiences with the prototypes identified a number of other, not directly safety related, issues which need clarification and resolution.

Types of Registered Names

Current Erlang systems provide a registered names table for each node. Names in this table may be local, or they may be distributed to all participating nodes by the "global" server. This results in a flat global namespace, which does not scale well to very large numbers of participating nodes. Additionally, the language supports distributed names, which are references to a registered name on a named node.

Names may be used instead of pids to send messages, and to identify nodes. This functionality should continue once pids and node names are replaced by capabilities, with a node name itself treated as a kind of registered name (probably a global name).

I believe that three categories of names are needed:

local: names registered in the table on a node, represented by an atom
distributed: names in the registered names table on another node, represented by a tuple pair {name,node} (where node is a capability). Such names could be accessed if the remote node chooses to run an appropriate server to accept requests to resolve local names.
global: names in a formal hierarchy, which identifies where to locate a server (or one of several redundant servers) to resolve the name (similar to the DNS in style). These would be represented by a tuple {name1,name2,name3,...}, possibly with an alternate syntactic form like name1.name2.name3... (but they should not simply be a special form of atom, so it is possible to reason about the separate components).
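Since all three categories are ordinary terms, they can be told apart by pattern matching. The sketch below assumes that a node capability is a five element tuple tagged with the atom node, and that the components of a global name are atoms; both are assumptions made for illustration, as are the module and function names.

```erlang
-module(name_sketch).
-export([classify/1]).

%% local: a plain atom.
classify(Name) when is_atom(Name) ->
    {local, Name};
%% distributed: {Name, Node}, where Node is a node capability
%% (assumed here to be a 5-tuple tagged 'node').
classify({Name, {node, _, _, _, _} = Node}) when is_atom(Name) ->
    {distributed, Name, Node};
%% global: a tuple of two or more name components, most significant first.
classify(Path) when is_tuple(Path), tuple_size(Path) >= 2 ->
    Parts = tuple_to_list(Path),
    case lists:all(fun erlang:is_atom/1, Parts) of
        true  -> {global, Parts};
        false -> {error, bad_name}
    end;
classify(_) ->
    {error, bad_name}.
```

Keeping the components of a global name as separate tuple elements, rather than packing them into one atom, is what allows clauses like these to reason about them.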

The addition of a formal, hierarchical global name space is the major new feature. It is needed, I believe, to support scaling of Erlang applications to large numbers of nodes. It is also needed to support mobile agents and long lived servers, which can be identified by name, with their associated capabilities being updated as they migrate or are restarted.

All of these names would be valid as the target of a send, and names associated with a node could be used instead of the current use of DNS node names in a number of operations. The Erlang run-time system would resolve the name as appropriate. If a cached copy is used for efficiency, there should be an automatic retrieval of the latest value if the cached value is found to refer to an invalid capability.
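The cache refresh behaviour described above might look like the following sketch. The cache is modelled as a map, and the two functions Valid and Lookup are hypothetical stand-ins for capability validation and for a query to the authoritative name server; none of these names come from the prototypes.

```erlang
-module(name_cache_sketch).
-export([resolve/4]).

%% Valid:  fun(Capa) -> boolean()        (is the cached capability still valid?)
%% Lookup: fun(Name) -> {ok, Capa} | error   (ask the authoritative server)
%% Returns {ok, Capa, NewCache} or {error, NewCache}.
resolve(Name, Cache, Valid, Lookup) ->
    case maps:find(Name, Cache) of
        {ok, Capa} ->
            case Valid(Capa) of
                true  -> {ok, Capa, Cache};
                false -> refresh(Name, Cache, Lookup)   % stale entry
            end;
        error ->
            refresh(Name, Cache, Lookup)
    end.

%% Fetch the latest value and update the cache, or drop a dead entry.
refresh(Name, Cache, Lookup) ->
    case Lookup(Name) of
        {ok, Capa} -> {ok, Capa, Cache#{Name => Capa}};
        error      -> {error, maps:remove(Name, Cache)}
    end.
```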

Module Naming Conventions

Currently Erlang supports a two level naming mechanism of functions within modules. There is no means of specifying that a group of modules are related, as is often the case in applications, or in libraries. Some concept, perhaps a project (or library), is needed for a group of related modules, which is then reflected in their names. This is recognised in the current informal convention of using a common prefix on some library and application names. That convention makes it hard to work with the grouping explicitly, or to reason about the components of the names.

There are proposals to have a formal three level hierarchy in the next release of Erlang. I would support this, and encourage the naming convention be such that the separate components can be easily matched in function clause heads or guards (ie perhaps a tuple should be used, rather than some convention of splitting an atom).

I also believe that the current standard library modules should be restructured and grouped as one (or a small number of) groups of modules (projects).

Mobile Agents

There is some interest in supporting the concept of "mobile agents" in Erlang. This was a key motivation behind the use of the SafeErlang prototype in [JNS97]. In determining the requirements for a safer Erlang, consideration needs to be given to features needed to support such agents.

The desire is to be able to take a group of processes and, on request of some process, wake up the group on another node, with all internal references intact. It is assumed that such a migration would only occur when all of the processes are requesting a reduction (function call), this being the point of scheduling in Erlang. Given the functional nature of the language, all that need be migrated would be the function name and arguments, the module code, and the process dictionary for each of the processes in the group. Since all the Erlang terms have to be marshalled and cast into external form anyway, it should not be hard to identify (capability) pids referring to the migrating processes, and flag them for replacement by appropriate new values on the new node. All other data items, including capabilities referring to other servers, or resources such as ports, would remain unchanged, and continue to be valid, referring to the appropriate remote resource (as is done now with pids). Subsequently during execution, any external function calls should be interpreted in the context of the original node for the agent, and the requested module be loaded from that node (which presumably has granted permission for that to occur) using the new MID mechanism.

The required additional features needed to support mobile agents are thus:

remote module loading: where modules can be loaded from a remote node (and kept distinct from local or other remote modules with the same name), the new MID concept will assist with this.
code mobility: where a group of executing processes can be migrated, with internal references intact, to another node (system).

An argument was proposed that the capabilities for migrating processes should remain unchanged, and that references to them should be redirected. I do not believe this is appropriate, as it is inconsistent with the definition of a capability, not to mention its practical implementation. It was motivated by a desire that other processes' references (copies of capabilities) should continue to remain valid. I believe they should be invalidated, since a migrated agent is now a new instance of itself. Instead, any process which wishes to have a "permanent" reference to an agent should use a "global" name instead. As part of the migration process, this global name can be updated with the new capability for the migrated agent.

Custom Safety Policies

To assist in the creation and usage of safer SSErl nodes, it is desirable that there be a clear and easy mechanism for specifying an appropriate safety policy. This would involve the correct specification of the options in the creation of a newnode, but would also require some means of controlling the use of some standard servers, especially those accessing the file system, network and window manager.

Currently Erlang supports a generic client-server mechanism using the gen_server module. With a small change to this module to optionally impose a check function against all messages received for the server, it is possible to run a number of instances of a standard server, each enforcing a different security policy. This results in a clean, and I believe easier to verify implementation, compared to implementing a suite of custom servers. It also results in the checking overhead only occurring when accessing such servers.
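The idea of a check function applied to every request can be sketched with a plain server loop. This is not the modified gen_server module itself, whose interface is not given here: the module checked_server_sketch, the message format, the ok | {reject, Reason} convention for the check function, and the 5 second timeout are all assumptions made for illustration.

```erlang
-module(checked_server_sketch).
-export([start/2, call/2]).

%% Check:  fun(Request) -> ok | {reject, Reason}   (the safety policy)
%% Handle: fun(Request) -> Reply                   (the standard service)
start(Check, Handle) ->
    spawn(fun() -> loop(Check, Handle) end).

%% Synchronous request to the server.
call(Server, Request) ->
    Ref = make_ref(),
    Server ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

%% Every request passes through the policy check before it is handled.
loop(Check, Handle) ->
    receive
        {call, From, Ref, Request} ->
            Reply = case Check(Request) of
                        ok               -> {ok, Handle(Request)};
                        {reject, Reason} -> {error, {policy, Reason}}
                    end,
            From ! {Ref, Reply},
            loop(Check, Handle);
        stop ->
            ok
    end.
```

Starting the same Handle function several times with different Check functions gives several instances of one standard server, each enforcing its own policy, and the checking cost is paid only by requests to those instances.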

The use of this safety check function to assist in the easy specification and implementation of custom safety policies is further elaborated on in my subsequent paper [Bro97e].

SSErl

The SSErl prototype has been extended to incorporate most of the features identified above (although for now it continues to use password capabilities, and does not support remote module loading for agents), to enable further experimentation with these concepts. It is available upon request.

Conclusions

The rationale for needing a safer version of Erlang is presented, along with the current consensus on the features required to provide this. Primarily these are the provision of a hierarchy of nodes to provide a "custom context", restrictions on "side-effects", and resource limits; and the use of capabilities for nodes, processes, ports, and user defined references. Some other issues involving registered names, module naming conventions and mobile agents are also discussed. Then a method of imposing a security policy on services by the use of a monitor function applied to server requests by a modified safe_gen_server module is presented as a better solution than that suggested previously.

Acknowledgements

The SSErl prototype and this paper were written during my special studies program in 1997, whilst visiting SERC in Melbourne, and NTNU in Trondheim, Norway. I'd like to thank my colleagues at these institutions, and at the Ericsson Computer Science Laboratory in Stockholm for their discussions and support.

References

APW86: M. Anderson, R.D. Pose, C.S. Wallace, "A Password Capability System", The Computer Journal, Vol 29, No 1, pp 1-8, 1986.
AVWW96: J. Armstrong, R. Virding, C. Wikstrom, M. Williams, "Concurrent Programming in Erlang", 2nd edn, Prentice Hall, 1996. http://www.ericsson.se/erlang/sure/main/news/book.shtml.
Arms96: J. Armstrong, "Erlang - A Survey of the Language and its Industrial Applications", in INAP'96 - The 9th Exhibitions and Symposium on Industrial Applications of Prolog, Hino, Tokyo, Japan, Oct 1996. http://www.ericsson.se/cslab/erlang/publications/inap96.ps.
BCK96: M. Bellare, R. Canetti, H. Krawczyk, "Keyed Hash Functions and Message Authentication", in Advances in Cryptology - Proceedings of Crypto'96, Lecture Notes in Computer Science, Vol 1109, Springer-Verlag, pp 1-15, 1996. http://www.research.ibm.com/security/keyed-md5.html.
Bro96b: L. Brown, "Mobile Code Security", in AUUG 96 and Asia Pacific World Wide Web 2nd Joint Conference, AUUG, Sept 1996. http://lpb.canb.auug.org.au/adfa/papers/mcode96.html.
Bro97d: L. Brown, "SSErl - Prototype of a Safer Erlang", Australian Defence Force Academy, Canberra, Australia, Technical Report, No CS04/97, Nov 1997. http://lpb.canb.auug.org.au/adfa/papers/tr9704.html.
Bro97e: L. Brown, "Custom Safety Policies in SSErl", Australian Defence Force Academy, Canberra, Australia, Technical Note, Jun 1997. http://lpb.canb.auug.org.au/adfa/papers/ssp97/sserl97e.html.
JNS97: I. Jonsson, G. Naeser, D. Sahlin, et al., "Adapting Erlang for Secure Mobile Agents", in Practical Applications of Intelligent Agents and Multi-Agents: PAAM'97, London, UK, Apr 1997. http://www.ericsson.se/cslab/~dan/reports/paam97/final/paam97.ps.
Nae97a: G. Naeser, "Your First Introduction to SafeErlang", CS, Uppsala University, Jan 1997. http://www.csd.uu.se/~gaffe/general/safe/nae97a.ps.gz.
RFC2104: H. Krawczyk, M. Bellare, R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", IETF, RFC 2104, Feb 1997.
Wiks94: C. Wikstrom, "Distributed Programming in Erlang", in PASCO'94 - First International Symposium on Parallel Symbolic Computation, Sep 1994. http://www.ericsson.se/cslab/erlang/publications/dist-erlang.ps.

The latest version of this paper may be found at: http://lpb.canb.auug.org.au/adfa/papers/ssp97/sserl97c.html.

Last updated: 27 June 1997.