Introducing SERCs Safer Erlang

Now Superseded by SSErl - Prototype of a Safer Erlang

Dr Lawrie Brown

School of Computer Science, Australian Defence Force Academy, Canberra, Australia

Last updated: 24 April 1997.

Abstract

In order to support outsourced and third party telecommunications applications, there is a desire to modify the Erlang language and execution environment to provide safe and partitioned execution of externally sourced or outsourced programs which are imported and run on a local Erlang system. This paper outlines a possible design approach, and describes the initial prototype.

Introduction

Erlang is a declarative language for programming concurrent and distributed systems which was developed at the Ericsson and Ellemtel Computer Science Laboratories [AVWW96], [Arms96], [Wiks94]. It is a dynamically typed, single assignment language which uses pattern matching for variable binding and function selection, has explicit mechanisms to create concurrent and distributed processes, and advanced facilities for error detection and recovery.

Mobile Code is code sourced from remote, possibly "untrusted" systems or suppliers, but imported and executed on a local system. Consequently such code needs to be executed within some form of "constrained" or "sandbox" environment to protect the local system from accidental or deliberate inappropriate behaviour.

Given the anticipated rapid growth in telecommunications applications software, there is expected to be a rapidly increasing need to support third party outsourced code being executed on trusted systems. It is believed this can be done with an acceptable level of safety by the use of containment methods being developed to support the concept of mobile code, as used in Java [GM95], SafeTCL [OLW96], Omniware [LSW95], and Telescript [Tar95], amongst other systems (see the overview in [Brow96b]).

The approach being considered here is to support the execution of a number of mutually untrusted (and untrusting) programs within a dedicated Erlang node. This involves partitioning the node into a collection of separate "subnodes", which provide a restricted execution environment or "sandbox", along with controlled access to processes in other "sandbox" environments. Each of these "sandboxes" forms a separate security domain, where the operations available will be constrained by an appropriately chosen security policy. Different policies can be enforced in different "sandboxes".

Making Erlang Safer

The Erlang language provides a number of inherent benefits. Its dynamic typing and single assignment prevent many classes of errors. The main additions involve controlling access to resources used to create and communicate with other processes, and to external devices. Currently this is through the use of Pids and Ports, with few restrictions on their use. Also, there is a need to partition a single "hardware" erlang node into a number of subnodes, each with a custom view of the world (in terms of registered names and which modules are available and used).

I propose protecting the former (Pids and Ports), as well as Nodes, by making them password capabilities [APW86]. In a password capability system, the capability is a data item which indicates the entity owning it (a node in this case), and a random value (selected sparsely from a large address space). An appropriate capability must be supplied, explicitly or implicitly, in order to perform most "unsafe" operations (which in Erlang involve the use of Built-In Functions - BIFs). The user is free to try to forge a capability, but it is statistically highly improbable that they will create a valid one. The capability has no meaning on its own, but is only of use when supplied to its owner (a node) along with a request for some operation. One advantage of password capabilities is the ease of revocation: a capability is revoked by removing it from its owning entity's list of currently valid capabilities. Any process subsequently trying to use it will fail with an invalid_capability error (as also occurs if a forged capability is used).
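The behaviour described above can be illustrated with a minimal sketch. It is not the prototype's code: the module name capa_sketch, the {Owner,Rand} capability shape, the not_permitted error atom, and the use of crypto:strong_rand_bytes/1 as the random source are all assumptions made for illustration.

```erlang
-module(capa_sketch).
-export([new/3, check/3, revoke/2]).

%% Hypothetical shapes: a capability is {Owner, Rand}; the owner's table
%% is a list of {Capa, Rights} pairs holding the currently valid capabilities.

%% Create a capability for Owner with the given Rights and record it.
new(Owner, Rights, Table) ->
    <<Rand:64>> = crypto:strong_rand_bytes(8),   % sparse random value
    Capa = {Owner, Rand},
    {Capa, [{Capa, Rights} | Table]}.

%% A capability only has meaning when looked up in its owner's table.
check(Capa, Op, Table) ->
    case lists:keyfind(Capa, 1, Table) of
        {Capa, Rights} ->
            case lists:member(Op, Rights) of
                true  -> ok;
                false -> {error, not_permitted}
            end;
        false ->
            %% Forged and revoked capabilities are indistinguishable here.
            {error, invalid_capability}
    end.

%% Revocation is simply removal from the owner's table.
revoke(Capa, Table) ->
    lists:keydelete(Capa, 1, Table).
```

Note that a revoked capability and a forged one fail in exactly the same way, since neither appears in the owner's table.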

For the latter, I propose creating a concept of subnodes to provide custom views. Each subnode should provide a "context" for processes executing in it. It provides distinct "registered names" and "module alias" tables. The registered names table can be modified by processes with an appropriate subnode capability, and is used to send messages to named servers. The "module alias" table is specified when the subnode is created, and is used to alias module names at run-time when functions are invoked. By appropriate customisation, modules executing within a subnode can be provided with a custom view of modules used and servers available.

Prior Work

A system with similar goals, though with a stronger focus on code mobility, was SafeErlang, developed by Gustaf Naeser et al. at Uppsala [Nae97a], [JNS97].

SafeErlang incorporates three new concepts. Encrypted capabilities are used for pids and nodes, but not (yet) for open_port. This seriously limits its ability to protect key standard library routines, and accesses to external resources. Also, the use of encryption introduces problems with appropriate key distribution in distributed environments, along with the selection of appropriate cryptographic algorithms. Subnodes are used to provide a custom collection of modules (with names rewritten at compile time), and to provide resource limits on processes executing in the subnode. Lastly, a new module loading (mid) mechanism is supplied to support code mobility. A new "code" server is used to manage the code (mid) distribution as required. In their current system, this necessitates a recompilation of source every time the module is loaded (in part to cope with varying module names needed, depending on which subnode it is being loaded into).

The focus of their project was to support mobile agents, reflected in the emphasis on providing new module management functions, and providing resource limits for subnodes. For this, as reported in [JNS97], it has been successful. However I found the arrangement of servers used, and the division of responsibility for implementing various operations, to be unnecessarily complex. In SafeErlang, each "real" erlang node (system) has a "code" server which manages the new module loading mechanism; a "gate" server which manages the keys for all the subnodes on that system; and a "name" server which implements the replacement registered name system. Also for each (sub)node on the system, there is a "node" server process which manages the information for that subnode. Every time an "unsafe" BIF is invoked in a user process, a request is sent to the "gate" server which decrypts and checks the capability supplied. Usually a message is then sent from the "gate" server to the relevant node manager which actually implements the requested operation. Consequently the node managers must explicitly manage all the requested links between processes. Responses are returned via the same path. I believe this involves an unnecessary amount of message passing and state maintenance, and that a simpler and cleaner design is possible.

SERCs Safer Erlang Prototype

SERCs Safer Erlang (SSErl) is a prototype implementation of what I believe to be a simpler and more comprehensive design of a secure erlang execution environment. It is less concerned with module mobility (though that should be added later by providing a custom error handler along with the use of the module alias mechanism), than with providing a comprehensive and elegant implementation of capabilities and subnodes with as few changes as possible to the existing erlang language specification. It is currently implemented as a collection of glue functions substituted by a modified erlang compiler for all calls of "unsafe" BIFs. These interact with "node" server processes, one for each distinct (sub)node on the erlang system. The prototype supports capabilities for pids, ports, and nodes; and a hierarchy of subnodes on each erlang system. It uses a modified compiler (adapted from that developed by Naeser [Nae97a] for the SafeErlang system).

Most of the glue functions have the form:

check with the node manager to see if the operation is permitted by the capability
if so, some key information is returned (generally a pid or a capability)
perform the desired operation (in the user's process) if necessary

Some routines require two capability checks (generally to see if the executing process is permitted to do, and the target capability permits, the desired operation). Functions to spawn new processes, open a port, or create a new subnode are also a little, but not much, more complex, since they must initialise some data structures. I believe this general structure mirrors fairly closely the logic that should be followed if these features were to be implemented within the Erlang RunTime System (ERTS) for production use.

Each SSErl process maintains a record of information which includes:

self: a capability for itself, which determines which (potentially unsafe) operations the process is permitted to perform.
parent: a capability for its parent node, used to restrict rights for newly created subnodes, and some other operations.
group_leader: a capability for the process group_leader (used for I/O).
apply check function: the name of the function which may (optionally) be called by the apply glue routine before any external function calls to validate those calls, described in [Bro97b]. Copied from the parent node state table.
aliases: the list of module aliases, used to redirect external function calls from "well-known" library modules to safer variants. Copied from the parent node state table.

This record is stored in the process dictionary, and is protected by modified put and erase functions from changes outside the sserl glue routines.
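One way such guarded dictionary functions might look is sketched below. The key name sserl_info, the function names, and the not_permitted exit reason are hypothetical; the text above does not state how the prototype names or stores the record.

```erlang
-module(pdict_guard_sketch).
-export([k_put/2, k_erase/1, k_erase/0]).

%% Hypothetical key under which the SSErl process record is kept.
-define(RESERVED, sserl_info).

%% put/2 replacement: refuse to overwrite the reserved entry.
k_put(?RESERVED, _Value) -> exit(not_permitted);
k_put(Key, Value)        -> put(Key, Value).

%% erase/1 replacement: refuse to erase the reserved entry.
k_erase(?RESERVED) -> exit(not_permitted);
k_erase(Key)       -> erase(Key).

%% erase/0 replacement: clear everything except the reserved entry,
%% and hide that entry from the returned list.
k_erase() ->
    Saved = get(?RESERVED),
    Old = erase(),
    case Saved of
        undefined -> ok;
        _         -> put(?RESERVED, Saved)
    end,
    lists:keydelete(?RESERVED, 1, Old).
```

The glue routines themselves would still use the real put and erase BIFs to maintain the record.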

Capabilities

In the prototype, a capability is a tuple {Type,Node,Ref,Rand}, where Type is one of capapid|capaport|capanode|caparef; Node is the name of the (sub)node which created the capability; Ref is an erlang reference value; and Rand is a random number. The latter two provide the statistical protection for the capability. The capability has no intrinsic meaning until it is supplied to the owning node manager. The node manager maintains as part of its state a list of [{Capa,Value}*] which maps a capability to its value. Value is a tuple of the form {Class,Val,Rights}, where Class identifies the object referenced as one of process|port|node|{user type}; Val is the process id, port number, node manager process id, or user supplied value (atom or integer); and Rights is the list of access rights. Currently the access rights supported for the various classes are:

process
db, exit, garbage_collect, group_leader, kill, link, local, open_port, priority, process_info, register, restrict, revoke, send, spawn, spawn_link, trace, trap_exit, unlink, unregister, view
port
link, local, restrict, revoke, send, unlink, view
node
delete_module, info, link, load_module, local, net_kernel, newnode, processes, restrict, revoke, shutdown, spawn, spawn_link, unlink, view
ref
local, restrict, revoke, view

Most of these correspond to permitting the BIF of the same name (or the process_flag for trap_exit or priority). Rights specific to capability manipulation include:

info
permits access to node state information;
local
indicates that name registrations may not be made globally available;
restrict
permits restriction of the capability;
revoke
permits revocation of the capability (provided it is a restricted variant);
view
permits viewing of the capability value {Class,Val,Rights}.

An appropriate capability must be supplied either explicitly (as a pid/port/node argument), or implicitly (from the process's knowledge of its own capability, or its parent node's capability), in order to perform most "unsafe" BIFs.

A capability is created whenever a process is spawned, a port is opened, a subnode is created, a reference is made, or an existing capability is restricted. They may also be created with very limited rights for existing processes outside the SSErl environment as part of its initialisation, or to correspond to pids from a list_to_pid BIF. Capabilities are destroyed (removed from the relevant node table) when the associated object (process, port or node) dies.

User capabilities (references with a user supplied type and value) are intended to assist in providing finer control for file accesses, I/O device accesses, or other potentially sensitive operations.

Subnodes

SSErl subnodes provide a distinct context for processes executing within them. Each subnode has a "node" server process which maintains the state for that subnode. The servers are registered by their node name in the "real" erlang system registered names table. This allows glue functions executing in user processes to communicate with a specified node server (as given by the node name embedded in a capability). This also allows access to non-local node servers by sending a message to '{node host}'.

The state managed by the server process for each subnode includes:

name: the name of the (sub)node as an atom, extended from the system name
self: a capability for itself (defined in its own capability table).
parent: a capability for its parent (defined in its parent's capability table)
capability table [{Capa,Value}*]: maps capabilities to their associated real data (pid) & rights
registered name table [{Name,Capa,Pid}*]: maps names to a process capability, thus permitting different subnodes to have the same name referencing different processes, allowing custom variants of standard services to be provided
module alias table [{Name,Alias}*]: remaps module names at runtime to an alias name, allowing different subnodes to direct the same module name to different actual modules. Currently this uses an exact match; it may be extended to include a prefix-match, allowing name extensions on all otherwise unmatched names, if desired.
subnode table [{CNode,Pid}*]: provides a list of all subnodes which are children of this subnode
process table [{CPid,Pid}*]: provides a list of all processes belonging to the subnode
prototypical process rights: used to restrict rights for newly created processes in the node
apply check function {Mod,Func}: the name of the function which may (optionally) be called by the apply glue routine before any external function calls to validate those calls, see [Bro97b].
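The state listed above might be held in a record along the following lines. Only the record name ninfo and the fields p_rights and aliases appear in the safenode listing later in this paper; the remaining field names, the defaults, and the helper functions are assumptions for illustration. The alias/2 function shows the exact-match lookup described for the module alias table.

```erlang
-module(ninfo_sketch).
-export([new/1, with_aliases/2, alias/2]).

%% Sketch of a node state record. Field names other than p_rights
%% and aliases are hypothetical.
-record(ninfo, {name,            % (sub)node name, an atom
                self,            % capability for this node
                parent,          % capability for the parent node
                capas = [],      % [{Capa, {Class, Val, Rights}}]
                names = [],      % [{Name, Capa, Pid}]
                aliases = [],    % [{Name, Alias}]
                subnodes = [],   % [{CNode, Pid}]
                procs = [],      % [{CPid, Pid}]
                p_rights = [],   % prototypical process rights
                apply_chk = nil  % {Mod, Func} or nil
               }).

new(Name) -> #ninfo{name = Name}.

with_aliases(St, Aliases) -> St#ninfo{aliases = Aliases}.

%% Exact-match alias lookup: unmatched module names pass through unchanged.
alias(Mod, #ninfo{aliases = Aliases}) ->
    case lists:keyfind(Mod, 1, Aliases) of
        {Mod, Alias} -> Alias;
        false        -> Mod
    end.
```

With an alias table of [{file,safe_file}], a call to the file module would be directed to safe_file, while a call to lists would be left alone.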

Glue Functions

Glue functions have been provided for a number of "unsafe" BIFs in order to implement the capability and subnode functionality, and to impose more stringent checks on the right to perform these functions. These "unsafe" BIFs may be categorised in groups as: Apply, Spawn, Module, Node, Process, Misc and Db. There are also some new BIFs to manipulate capabilities and subnodes. Calls made to these BIFs are replaced by the modified compiler (more specifically by the sys_pre_expand module), e.g. erlang:spawn(M,F,A) becomes sserl_bifs:k_spawn(M,F,A).

As mentioned earlier, most of the glue functions have the form:

check with the node manager to see if the operation is permitted by the capability
if so, some key information is returned (generally a pid or a capability)
perform the desired operation (in the user's process) if necessary

For example, the link glue routine is:

    k_link(CPid) ->
        Pid = node_request(check,CPid,link),
        link(Pid).
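The node_request call used here is not listed in this paper. A minimal sketch of how such a request might be exchanged with a node server is given below, assuming the server is registered under the node name embedded in the capability (as described in the Subnodes section). The message format, the 5 second timeout, the exit-on-failure behaviour, and the toy server are all assumptions.

```erlang
-module(node_request_sketch).
-export([start/2, node_request/3, k_link/1]).

%% Start a toy node server registered under Name. Table is a list of
%% {Capa, {Class, Val, Rights}} entries.
start(Name, Table) ->
    Pid = spawn(fun() -> loop(Table) end),
    register(Name, Pid),
    Pid.

loop(Table) ->
    receive
        {From, Tag, {check, Capa, Op}} ->
            From ! {Tag, lookup(Capa, Op, Table)},
            loop(Table);
        stop ->
            ok
    end.

lookup(Capa, Op, Table) ->
    case lists:keyfind(Capa, 1, Table) of
        {Capa, {_Class, Val, Rights}} ->
            case lists:member(Op, Rights) of
                true  -> {ok, Val};
                false -> {error, not_permitted}
            end;
        false ->
            {error, invalid_capability}
    end.

%% Ask the owning node server whether Capa permits Op. Returns the
%% underlying value (e.g. the real pid) or exits the calling process.
node_request(check, {_Type, Node, _Ref, _Rand} = Capa, Op) ->
    Tag = make_ref(),
    Node ! {self(), Tag, {check, Capa, Op}},
    receive
        {Tag, {ok, Val}}    -> Val;
        {Tag, {error, Why}} -> exit(Why)
    after 5000 ->
        exit(timeout)
    end.

%% The link glue routine from the text, using the sketch above.
k_link(CPid) ->
    Pid = node_request(check, CPid, link),
    link(Pid).
```

Because the check is a single request and reply, the link itself is made by the BIF in the user's process and the node server does not need to track it.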

By category, the glue functions are:

Apply

apply({Module,Function}, ArgList)
apply(Module, Function, ArgList): both safe and unsafe calls could result from a call to apply. Nested applies are checked, safe variants are allowed, unsafe variants are rewritten, the apply check function is called (if specified), the module names are aliased, and finally the desired function is called. A process capability is returned.

Spawns and Open_Port

spawn(Module, Function, ArgList)
spawn_link(Module, Function, ArgList): these check that the requesting process has the right to, and the parent node permits, the spawn to occur. The new process is created and initialised. A process capability for the new process is returned.
spawn(Node, Module, Function, ArgList)
spawn_link(Node, Module, Function, ArgList): these spawn the process on a separate subnode (on either the same or a different underlying erlang system), returning a process capability.
open_port(PortName, PortSettings): creates a new port, and returns a capability for it, if the process is permitted. Unfortunately this affects communications with the port, since the real underlying Pid is used as part of the communications. Currently there is no easy way to catch and rewrite this code, so port interaction code must be rewritten slightly to work with the SSErl system.

Module

delete_module(Module): checks the process is permitted to, and aliases the module name before deleting the module.
load_module(Module): checks the process is permitted to, and aliases the module name before loading the module.
purge_module(Module): checks the process is permitted to, and aliases the module name before purging the module.
check_process_code(Pid, Module): retrieves the real pid, aliases the module name and is permitted.
function_exported(A1,A2,A3): is permitted as is.
module_loaded(Module): aliases the module name and is permitted
pre_loaded(): is permitted as is.

Node

alive(Name, Port): blocked as redundant
disconnect_node(Node): blocked as redundant
get_cookie(): permitted, but should become redundant
set_cookie(Node, Cookie): permitted if self has net_kernel right, though should become redundant
is_alive(): permitted, but redundant
monitor_node(Node, Flag): permitted if self has net_kernel right
newnode(Name)
newnode(Parent,Name)
newnode(Parent,Name,{Node_Rights,Proto_Rights,Names,Aliases,Options,Apply_Chk}): new BIFs to create a subnode of the parent node, if the process's parent node capability permits. Any option values not supplied (given as nil) will be inherited from the parent node.
node(): returns the name of the parent node of the process (should really be a capability)
node(Arg): returns the name of the node which created Arg (should really be a capability)
node_info(Node): new BIF to return the node state information if Node permits info.
nodes(): returns a list of node names (should really be capabilities)
halt()
shutdown(Node): new BIF to terminate a node (halt defaults to parent node) if shutdown is permitted. If the topnode on an erlang system is shut down, then the entire system is halted.

Process Communications

exit(Pid, Reason): used to cause a process to exit if permitted
kill(Pid): used to kill a process if permitted (equiv to exit(Pid,kill) but with a separate right)
group_leader(): returns the capability of the group leader process
group_leader(Leader, Pid): used to set the group leader of a process if permitted
link(Pid): links to a process if permitted
list_to_pid(List): converts a list to a pid, returning a (very restricted) capability
pid_to_list(Pid): converts the pid referenced by the capability to a list if view permitted
processes()
processes(Node): returns a list of all process capabilities executing on a node (defaults to parent node).
process_flag(Flag, Option): used to set the trap_exit or priority flags, if self has the corresponding right. Altering the error_handler is not permitted.
process_info(Pid)
process_info(Pid, Key): retrieves process information if permitted.
register(Name, Pid): used to register a name and corresponding capability on the parent node if both the capability and self permit. Any form of capability may be registered. Currently names are strictly local. A global server may be supported later.
registered(): retrieves a list of registered name and capability pairs.
self(): returns the process's own capability.
send(To,Msg): BIF which handles the message send operation "To ! Msg". "To" may be a locally registered name, a remotely registered name and remote node capability, or a process capability; which identifies the process to receive the message.
unlink(Pid): unlinks from a process if permitted.
unregister(Name)
unregister(Pid): unregister the name or corresponding pid if it permits.
whereis(Name): returns the capability referenced by the registered name on the parent node.

Capability Functions

check(Capa,Op): new BIF to check if the capability permits Op.
make_ref()
make_ref(Type,Val): creates a reference capability. The latter call associates a user Type (an atom) and Val (an atom or integer) with the reference capability for use as a user capability.
restrict(Rights): restricts the list of rights for a process's self capability - does not create a new capability. NB: the new list of rights will be the intersection of the existing and supplied lists of rights.
restrict(Capa,Rights): new BIF to create a restricted version of an existing capability. NB: the new list of rights will be the intersection of the existing and supplied lists of rights.
revoke(Capa)
revoke(Capa,Master): new BIF to revoke a capability restricted from self or Master.
same(C1,C2): new BIF to test whether the two capabilities (which may be restricted variants) refer to the same underlying object (eg pid or port).
view(Capa): new BIF to access the information the capability refers to (its real pid and access rights) if view is permitted.
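The intersection rule for restrict, and the comparison made by same, can be sketched against a capability table of the [{Capa,Value}*] form described earlier. This is an illustration only: the function signatures take the table explicitly (the real BIFs consult the node manager), rand:uniform/1 stands in for the prototype's random source, and the error atoms are assumptions.

```erlang
-module(restrict_sketch).
-export([restrict/3, same/3]).

%% Create a restricted variant of Capa. The new rights are the
%% intersection of the existing rights and the Wanted list, so a
%% restricted capability can never gain rights.
restrict({Type, Node, _Ref, _Rand} = Capa, Wanted, Table) ->
    case lists:keyfind(Capa, 1, Table) of
        {Capa, {Class, Val, Rights}} ->
            case lists:member(restrict, Rights) of
                true ->
                    New = {Type, Node, make_ref(), rand:uniform(1 bsl 32)},
                    Kept = [R || R <- Rights, lists:member(R, Wanted)],
                    {New, [{New, {Class, Val, Kept}} | Table]};
                false ->
                    {error, not_permitted}
            end;
        false ->
            {error, invalid_capability}
    end.

%% Two capabilities are the "same" if both are valid and refer to the
%% same underlying object, regardless of their rights.
same(C1, C2, Table) ->
    case {lists:keyfind(C1, 1, Table), lists:keyfind(C2, 1, Table)} of
        {{_, {Class, Val, _}}, {_, {Class, Val, _}}} -> true;
        _ -> false
    end.
```

For example, restricting a capability holding [link,restrict,send,view] with the list [send,kill] yields a new capability holding only [send]; since that variant lacks the restrict right, it cannot be restricted further.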

Misc

erase()
put(Key,Value): permitted except that the sserl process information may not be erased or modified.
garbage_collect()
garbage_collect(Pid): permitted if self or pid respectively permits garbage_collect.
statistics(Type): permitted.
trace(Pid,How,Flags): permitted if pid permits trace.

DB

All the various "db_*" BIFs are replaced by functions which check self permits db before they are called. This probably ought to be refined further, but has not been a priority for this prototype.

Using the SSErl Prototype

The SSErl prototype is distributed as a tar file which includes source and precompiled jam files for the sserl, modified compiler, and changed standard library modules. A README file is included which describes the (minimal) customisation required. Also required is a working Erlang 4.4.1 system (available from [Erla96]).

Once installed, a Unix shell script sserl is used to start the SSErl system. It is invoked as "sserl" (normally), or "sserl -verbose" (for copious debug information). It starts erlang in a distributed mode by default. It includes a slightly modified Eshell which initialises the SSErl environment, and executes any commands given using the modified apply in an sserl environment. Thus all the new BIFs are available.

An alternate script nsserl is provided which uses a custom boot file start_sserl.boot, which must be generated from an appropriately customised copy of start_sserl.script using mkboot:mkboot(start_sserl). This script is much more dependent on the Erlang system structure.

A number of additional utility routines are provided in the sserl module, and have been incorporated into shell_default, and are available directly from the shell prompt. These are all described in the shell help(). Some of the most useful include:

info()
info(Node): which display the node status information (rather long and verbose).
ps()
ps(Node): which lists all processes executing on a node
names()
names(Node): which lists all registered names on a node
mknode(Name)
safenode(Name): create a new unrestricted or limited subnode

Subnodes are created by the newnode(Name) BIF (or by the policynode(Name,Policy_Module) or safenode(Name) library functions in the safety module, see [Bro97b]). A capability for the new node is returned. This may then be used with spawn to run processes in the node.

Some functions are provided in the test module in the test subdirectory, to exercise various aspects of the SSErl environment, particularly focusing on the modified BIFs. See test:help() for details of the various test functions.

An abbreviated sample sserl session is given in the listing below. It assumes sserl was started in the test subdirectory of the distribution. Some details have been omitted for brevity.

UNIX> sserl
Erlang (JAM) emulator version 4.4
 
Eshell V4.4  (abort with ^G)
SSErl Node 'lpb@galaxy.serc.rmit.edu.au' initialised.
(lpb@galaxy.serc.rmit.edu.au)1> help().
** shell internal commands **
... various standard output omitted
** commands in module sserl (SERCs Safer Erlang) **
init()                -- Create top node (done by shell).
help()                -- Displays this help.
info()                -- Displays info about top node.
info(Cnode)           -- Displays info about node.
... other help details omitted

(lpb@galaxy.serc.rmit.edu.au)2> info().
Node Info Details
  Name           'lpb@galaxy.serc.rmit.edu.au'
  Node Capa      {capanode,'lpb@galaxy.serc.rmit.edu.au',#Ref,7330370}
  Parent Capa    topnode
  Subnodes         
  Processes        
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,2092775} -> <0.27.0>
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,2248609} -> <0.21.0>
  Process Cnt    2
  Process Rights [..rights..]
  Capabilities     
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,2092775} -> {process,<0.27.0>,[..rights..]}
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,5056377} -> {process,<0.15.0>,[..rights..]}
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,2583009} -> {process,<0.20.0>,[..rights..]}
   {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,2248609} -> {process,<0.21.0>,[..rights..]}
   {capanode,'lpb@galaxy.serc.rmit.edu.au',#Ref,7330370} -> {node,<0.25.0>,[..rights..]}
  Seed           {667,25635,181}
  Names            
   file_server -> {capapid,'lpb@galaxy.serc.rmit.edu.au',#Ref,5056377}, <0.15.0> 
  Aliases        []
  Options        []
  Status         alive
  Ticker         <0.26.0>
ok

(lpb@galaxy.serc.rmit.edu.au)3> N1=safenode(saf1).
{capanode,'saf1.lpb@galaxy.serc.rmit.edu.au',#Ref,5591103}

(lpb@galaxy.serc.rmit.edu.au)4> P1=spawn(N1,test,test,[]).
{capapid,'saf1.lpb@galaxy.serc.rmit.edu.au',#Ref,5293704}
Test - simple test to see self - at time {15,55,9}
test: self {capapid,'saf1.lpb@galaxy.serc.rmit.edu.au',#Ref,5293704} -> {process,<0.32.0>,[db,exit,garbage_collect,group_leader,kill,link,local,open_port,priority,process_info,register,send,spawn,spawn_link,trace,trap_exit,unlink,unregister,view]}
Process List for node 'saf1.lpb@galaxy.serc.rmit.edu.au'
Capa#       Pid          Current Call                         
5293704  <0.32.0>     {sserl_bifs,k_process_info,2}           

(lpb@galaxy.serc.rmit.edu.au)5> ps(N1).
{capapid,'saf1.lpb@galaxy.serc.rmit.edu.au',#Ref,7545331}
Process List for node 'saf1.lpb@galaxy.serc.rmit.edu.au'
Capa#       Pid          Current Call                         
7545331  <0.33.0>     {sserl_bifs,k_process_info,2}           
5293704  <0.32.0>     {test,snooze,1}                         

(lpb@galaxy.serc.rmit.edu.au)6> stop().
UNIX>

Programming in the Safe Environment

Programming in the safe environment should be very little different from normal erlang coding, save that some operations may be restricted when the code is executed. Generally the new BIFs would only be used in creating a custom environment, or in some utility modules (which handle display of capabilities for example). As an example, the utility function safenode(Name) which is supplied as part of a suite of utility functions in the safety module, is listed below (note it uses other utility functions from the safety and ordsets modules).

%% safenode/1 - creates a "safer" subnode of the parent
%%    node rights exclude delete_module,load_module,net_kernel,newnode
%%    processes within it may not use group_leader,open_port,priority
safenode(Name) ->
    CParent = get_dict(node),                   % get parent capability
    % restrict node rights list from parent for new node
    Ri = view_rights(CParent),                  % get rights of parent node
    NR = subtract(Ri,
                  list_to_set([delete_module,load_module,net_kernel,newnode])),
    % restrict proto process rights for new node
    St = node_info(CParent),
    PR = subtract(St#ninfo.p_rights,
                  list_to_set([group_leader,open_port,priority])),
    % add safe module aliases to Aliases table
    NewAli = [{file,safe_file},{lists,safe_lists},{ordsets,safe_ordsets},
                {random,safe_random},{string,safe_string},{unix,safe_unix}],
    Ali = append(NewAli, St#ninfo.aliases),
    % start the safe versions of daemons used by safenode modules
    catch safe_file:start(),
    % create new node with safer rights and custom world-view
    newnode(CParent,Name,{NR,PR,nil,Ali,nil,nil}).

This also demonstrates the use of aliases. The safenode library function uses the safe_file_server. It restricts file access to the current directory only, but is accessed using the usual file functions, with the module name being appropriately aliased.

Limitations of the SSErl Prototype

The current prototype has a number of limitations.

most of the current library modules are not (yet) compiled in the safe environment. Whilst they should run, they will not see the custom subnode environment, nor will they handle capabilities
performance will be reduced due to the extra layer of glue functions, and the necessity to exchange messages with the node manager process for all unsafe BIFs
open_port functionality involves a visible change in use, since the real underlying Pid must be used as part of the communications dialog
display of capabilities for processes and nodes is obviously different to what is seen at present (though presumably the io_lib functions could be changed to hide this)
values returned for nodes are currently inconsistent, some functions return a node capability, others return a node name. In part this is due to not having underlying node capabilities in the net_kernel.

All of these limitations could be addressed by incorporating the changes directly in a new version of the Erlang Run-Time System.

Incorporating SSErl in the Erlang RunTime Environment

Once experience with this prototype has verified the validity of this approach to providing a safe erlang execution environment, it would be much better to incorporate the changes into a new version of the ERTS. Also at this time, further safety checks could be made on the manner in which the ERTS has been written.

Capabilities and Subnodes in the ERTS

Capabilities should be a fundamental erlang data type, similar to a reference. It would be uniquely tagged of course, and should include some identifier for the subnode which created the capability (probably an index into the atom table, as I believe is done now for processes and references), along with a random value selected unpredictably and sparsely from a large possible space. This should use around 128 bits. Any pseudo-random generator function used must be seeded with information that is hard to predict externally (ie some combination of time and current system data structures at least, a true random source would be ideal though). These capabilities would be used for processes, ports, and nodes, as well as extensibly for other data requiring protection.
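To give a feel for the size of the random component, the sketch below draws a 128 bit value and estimates an upper bound on the chance of a successful forgery. The three-element tuple shape and function names are hypothetical, and crypto:strong_rand_bytes/1 is used here merely as one example of an unpredictable source; it is not what the prototype uses.

```erlang
-module(capa128_sketch).
-export([new/2, guess_odds/2]).

%% Build a capability-like tuple with a 128 bit random component drawn
%% from the crypto application's strong random source.
new(Type, Node) ->
    <<Rand:128>> = crypto:strong_rand_bytes(16),
    {Type, Node, Rand}.

%% Upper bound on the probability that any of Guesses forged values
%% matches one of Live valid capabilities on a node.
guess_odds(Live, Guesses) ->
    Live * Guesses / math:pow(2, 128).
```

With a million live capabilities and a million guesses, the bound is about 3.0e-27, which illustrates why forging a capability by guessing is statistically impractical at this size.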

Subnodes should be added as a concept in the ERTS. They will primarily involve a table of relevant status information for each distinct subnode, along with some means of locating this table in the system both internally and externally. The table will include much of the information currently managed by the node manager processes in this prototype. The capability table will map capabilities to their underlying values and rights.

The process state information will need to be extended to include capabilities for itself, its parent, and its group leader; and probably a pointer directly to its parent node state table for efficiency. Note the parent node capability need not be the same as that recorded in the node state table, it may very well be a restricted version of it.

All the BIFs implemented in the ERTS which involve potentially unsafe operations will need to be rewritten to incorporate an appropriate check of rights from the supplied (or inherited) capabilities before proceeding.

Auditing BIFs and Standard Library Routines

All BIFs and standard library functions which are written in a general programming language (eg C) will need to be audited for careless coding practices which could be used to subvert the type safety of the system. These have been found to be a major source of security flaws in existing systems (eg see discussion on Java weaknesses in [DFW96]).

This component will be time-consuming, but necessary to ensure safety. Examples of poor style include any use of the standard C functions gets, sprintf, strcat, strcpy; ie any functions which could overrun a buffer supplied to them due to the absence of bounds checks on these parameters. The basic requirement is that all parameters be checked to ensure that bounds are not exceeded, that their values are sane, and cannot cause a run-time execution fault.

Other Changes for Improved Security

Some other changes which could be considered in the ERTS include the placement and implementation of the message buffer, and of the process scheduler.

Currently there is a single message buffer shared by all processes in an Erlang Node. A single buffer was chosen for efficiency and capacity management reasons, but does leave all processes on the Node susceptible to a denial of service attack. This could be created by a rogue process flooding some server with malformed messages that are not matched by any receive patterns in that server, and thus not flushed from the buffer. With the implementation of subnodes, consideration should be given to making the message buffer a component of the subnode rather than the node. Also, to reduce the impact of flood attacks, some mechanism for garbage collecting "old" messages could be provided, perhaps with a caveat that only messages which have been checked against a pattern and rejected some number of times are eligible. This is similar to solutions proposed to overcome the current spate of TCP SYN attacks on the Internet.
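The proposed garbage collection of repeatedly rejected messages can be modelled with a small sketch. Here a mailbox is assumed to be a list of {Message, RejectCount} pairs, and a receive is modelled by a predicate; the module mbox_sketch, its function names, and the Limit parameter are hypothetical and do not reflect how the ERTS stores messages.

```erlang
-module(mbox_sketch).
-export([new/0, deliver/2, scan/3]).

new() -> [].

%% Append a message with a rejection count of zero.
deliver(Msg, Mailbox) -> Mailbox ++ [{Msg, 0}].

%% Look for the first message accepted by Pred. Every message examined
%% and rejected has its count incremented; a message whose count reaches
%% Limit is discarded instead of being kept.
scan(Pred, Mailbox, Limit) -> scan(Pred, Mailbox, Limit, []).

scan(_Pred, [], _Limit, Kept) ->
    {nomatch, lists:reverse(Kept)};
scan(Pred, [{Msg, N} | Rest], Limit, Kept) ->
    case Pred(Msg) of
        true ->
            %% Match found: messages behind it were not examined.
            {{ok, Msg}, lists:reverse(Kept, Rest)};
        false when N + 1 >= Limit ->
            scan(Pred, Rest, Limit, Kept);
        false ->
            scan(Pred, Rest, Limit, [{Msg, N + 1} | Kept])
    end.
```

With a limit of 2, a junk message survives the first scan that rejects it and is dropped by the second, so a flood of unmatched messages cannot accumulate indefinitely. The choice of limit is a policy decision: too low a value risks discarding messages that a later receive would have accepted.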

The process scheduler should also probably be modified with the introduction of subnodes. Rather than share CPU cycles amongst all ready processes, consideration could be given to allocating shares to the various subnodes, and then dividing each share amongst all ready processes in that subnode.
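A minimal sketch of such two-level sharing follows, assuming each subnode is described by a tuple {Name, Share, Procs} with an integer weight and a list of its ready processes. The module sched_sketch, the tuple layout, and the notion of a total budget per scheduling round are assumptions for illustration; the real scheduler works quite differently.

```erlang
-module(sched_sketch).
-export([slices/2]).

%% Divide Total units of CPU budget among subnodes in proportion to
%% their Share, then split each subnode's budget evenly among its
%% ready processes. Subnodes with no ready processes receive nothing.
slices(Total, Subnodes) ->
    Active = [S || {_, _, Procs} = S <- Subnodes, Procs =/= []],
    Sum = lists:sum([Share || {_, Share, _} <- Active]),
    [{Name, P, (Total * Share div Sum) div length(Procs)}
     || {Name, Share, Procs} <- Active, P <- Procs].
```

With equal shares, a subnode running one process and a subnode running five processes each receive half the budget, so the single process gets 50 units out of 100 while each of the five gets 10. A subnode therefore cannot increase its overall allocation simply by spawning more processes.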

Protecting Erlang from External Attack

If the ERTS is assumed to be safe from compromise (ie assume that no-one will gain root type privileges on its host and interfere with its address space(s) directly), then the only mechanism for external subversion is via "spoof" messages being sent to the port(s) associated with the net_kernel in distributed Erlang implementations. At present, the only security mechanism used is to require a suitable "cookie" be sent with each message [AVWW96]. However, this is sent in the clear, and is subject to eavesdropping, and subsequent masquerade by an attacker.

In order to secure the messages being exchanged between distributed Erlang nodes, it is necessary to either physically protect all communications links used, or to employ cryptographic techniques to secure the communications. Possible approaches to the latter involve the use of a "digital signature" instead of a "cookie" (eg perhaps a signed hash using the shared secret), or alternatively, full encryption of all links. The use of SSL (Secure Sockets Layer) code would most likely be the best choice here [HY96]. In any case, it would mean that the new "safe distributed Erlang" would be incompatible with the existing system. This may, or may not, be a problem.
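The "signed hash using the shared secret" option might look something like the sketch below, which uses a keyed hash (HMAC with SHA-256, via crypto:mac/4 from the current OTP crypto application) in place of sending the cookie itself. The choice of HMAC-SHA-256, the module signed_msg, and the {Payload, Tag} framing are assumptions for this example and are not part of the distribution protocol. The sketch authenticates a message only; it provides no confidentiality and no protection against replay of a previously captured message.

```erlang
-module(signed_msg).
-export([sign/2, verify/2]).

%% Serialise Term and attach a tag computed over it with the shared secret.
sign(Secret, Term) ->
    Payload = term_to_binary(Term),
    {Payload, crypto:mac(hmac, sha256, Secret, Payload)}.

%% Recompute the tag and decode the payload only if the tags agree.
verify(Secret, {Payload, Tag}) when is_binary(Payload), is_binary(Tag) ->
    case equal(Tag, crypto:mac(hmac, sha256, Secret, Payload)) of
        true  -> {ok, binary_to_term(Payload, [safe])};
        false -> {error, bad_signature}
    end;
verify(_Secret, _Other) ->
    {error, bad_signature}.

%% Compare two binaries without stopping at the first differing byte.
equal(A, B) when byte_size(A) =:= byte_size(B) ->
    Diff = lists:foldl(fun({X, Y}, Acc) -> Acc bor (X bxor Y) end, 0,
                       lists:zip(binary_to_list(A), binary_to_list(B))),
    Diff =:= 0;
equal(_, _) ->
    false.
```

The secret itself never crosses the link, so an eavesdropper who captures a message cannot forge a different one, which addresses the masquerade problem noted for the cleartext cookie.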

Conclusions

This paper describes the rationale, design approach, and details of the SSErl prototype of a more secure Erlang execution environment. The prototype will be used to evaluate whether an appropriate level of abstraction has been chosen, and whether the interfaces provided are appropriate for the development of "safe" imported code systems. It is anticipated that once the design approach is validated, it will then be incorporated in a new version of the Erlang RunTime System.

Acknowledgements

The SSErl prototype and this paper were written during my special studies program in 1997, whilst visiting SERC in Melbourne and NTNU in Trondheim, Norway. I'd like to thank my colleagues at these institutions, and at the Ericsson Computer Science Laboratory in Stockholm for their discussions and support.

References

APW86: M. Anderson, R.D. Pose, C.S. Wallace, "A Password Capability System", The Computer Journal, Vol 29, No 1, pp 1-8, 1986.
AVWW96: J. Armstrong, R. Virding, C. Wikstrom, M. Williams, "Concurrent Programming in Erlang", 2nd edn, Prentice Hall, 1996. http://www.ericsson.se/erlang/sure/main/news/book.shtml.
Arms96: J. Armstrong, "Erlang - A Survey of the Language and its Industrial Applications", in INAP'96 - The 9th Exhibitions and Symposium on Industrial Applications of Prolog, Hino, Tokyo, Japan, Oct 1996. http://www.ericsson.se/cslab/erlang/publications/inap96.ps.
Bro96b: L. Brown, "Mobile Code Security", in AUUG 96 and Asia Pacific World Wide Web 2nd Joint Conference, AUUG, Sept 1996. http://lpb.canb.auug.org.au/adfa/papers/mcode96.html.
Bro97b: L. Brown, "Custom Security Policies in SSErl", Australian Defence Force Academy, Canberra, Australia, Technical Note, Apr 1997. http://lpb.canb.auug.org.au/adfa/papers/ssp97/sserl97b.html.
DFW96: D. Dean, E.W. Felten, D.S. Wallach, "Java Security: From HotJava to Netscape and Beyond", in Proceedings IEEE Symposium on Security and Privacy, IEEE, May 1996. http://www.cs.princeton.edu/sip/pub/secure96.html.
Erla96: Erlang Systems, "Erlang Distribution", Ericsson Software Technology AB, Erlang Systems, 1996. http://www.ericsson.se/erlang/.
GM95: J. Gosling, H. McGilton, "The Java Language Environment: A White Paper", Sun Microsystems, May 1995. ftp://ftp.javasoft.com/docs/.
HY96: T.J. Hudson, E.A. Young, "SSLeay and SSLapps FAQ", Uni. Queensland, 1996. http://www.psy.uq.edu.au:8080/~ftp/Crypto/.
JNS97: I. Jonsson, G. Naeser, D. Sahlin, et al., "Adapting Erlang for Secure Mobile Agents", in Practical Applications of Intelligent Agents and Multi-Agents: PAAM'97, London, UK, Apr 1997. http://www.ericsson.se/cslab/~dan/reports/paam97/final/paam97.ps.
LSW95: S. Lucco, O. Sharp, R. Wahbe, "Omniware: A Universal Substrate for Mobile Code", in Fourth International World Wide Web Conference, MIT, Dec 1995. http://www.w3.org/pub/Conferences/WWW4/Papers/165/.
Nae97a: G. Naeser, "Your First Introduction to SafeErlang", CS, Uppsala University, Jan 1997. http://www.csd.uu.se/~gaffe/general/safe/nae97a.ps.gz.
OLW96: J.K. Ousterhout, J.Y. Levy, B.B. Welch, "The Safe-Tcl Security Model", Sun Microsystems Laboratories, Mountain View, CA 94043-1100, USA, Nov 1996. http://www.sunlabs.com/research/tcl/safeTcl.ps.
Tar95: J. Tardo, "An Introduction to Safety and Security in Telescript", General Magic Inc., 1995. http://cnn.genmagic.com/Telescript/TDE/security.html.
Wiks94: C. Wikstrom, "Distributed Programming in Erlang", in PASCO'94 - First International Symposium on Parallel Symbolic Computation, Sep 1994. http://www.ericsson.se/cslab/erlang/publications/dist-erlang.ps.
