SSErl - Prototype of a Safer Erlang

Dr Lawrie Brown

School of Computer Science, Australian Defence Force Academy, Canberra, Australia

Abstract

In order to support outsourced and third party telecommunications applications, there is a desire to modify the Erlang language and execution environment to provide safe and partitioned execution of externally sourced mobile agents, applets, or outsourced programs that are imported and run on a local Erlang system. This paper describes the SSErl prototype of a safer Erlang environment.

Introduction

Erlang is a declarative language for programming concurrent and distributed systems which was developed at the Ericsson Computer Science Laboratories [AVWW96], [Arms96], [Wiks94]. It is a dynamically typed, single assignment language which uses pattern matching for variable binding and function selection, has inherent support for lightweight concurrent and distributed processes, and has error detection and recovery mechanisms. Most of the Erlang system is written in Erlang (including compilers, debuggers, standard libraries), with just the core run-time system and a number of low-level system calls (known as BIFs for BuiltIn Functions) written in C. Distributed use is almost transparent, with processes being spawned on other nodes, instead of locally. An Erlang node is one instance of the run-time environment, often implemented as a single process (with many threads) on Unix systems. Erlang is currently being used in the development of a number of very large telecommunications applications, and this usage is expected to increase.

Mobile Code is code sourced from remote, possibly "untrusted" systems or suppliers, but imported and executed on a local system. Consequently such code needs to be executed within some form of "constrained" or "sandbox" environment to protect the local system from accidental or deliberate inappropriate behaviour. Such code can be used to support mobile agents, applets, outsourced code, and fault isolation in large applications. Demand for this type of support is expected to grow.

The general design approach to a safer Erlang has been described in [BrSa97]. This paper describes the SSErl prototype of a Safer Erlang system which implements this approach. It provides extensions to an Erlang system by supporting a hierarchy of nodes within an Erlang system; the use of capabilities for nodes, processes, ports, and user capabilities; and some support for remote module loading in context.

The SSErl Prototype

The key extensions identified in [BrSa97] required to make Erlang safer were:

nodes: forming a hierarchy within an Erlang system, to provide a custom context of services available, to restrict the use of code with access to external resources, and to impose utilisation limits (on cpu, memory etc).
capabilities: to impose a finer granularity of control on the use of process (and other resource) identifiers, making these unforgeable with a specified set of rights on the use of any specific instance.
remote module loading: to allow spawning of modules sourced on another system, retaining knowledge of the source system so that subsequent module references can also be sourced from that system.

SSErl (SERC's Safer Erlang - named after the locale where I wrote the original version) is a prototype to validate the utility of these extensions. It implements them by using glue functions for all calls of unsafe BIFs. These are substituted by a modified erlang compiler, and interact with node server processes, one for each distinct node on the erlang system. The modified compiler was adapted from that developed by Naeser [Nae97a] for his SafeErlang prototype.

Most of the SSErl glue functions have the form:

resolve a registered name to a capability, if a name rather than a capability was supplied
check with the node manager to see if the operation is permitted by the capability's rights
if so, some key information is returned (generally a pid or a capability)
perform the desired operation (in the user's process)

For example, the k_exit glue routine looks as follows:

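k_exit(Proc,Reason) ->
    CPid = resolve_name(Proc),               % registered name -> capability
    Pid = node_request(check,CPid,?exit),    % node server checks the exit right
    exit(Pid,Reason).                        % performed in the user's process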

Functions to spawn new processes, open a port, or create a new subnode are only slightly more complex, since they must also initialise some data structures. I believe this general structure mirrors fairly closely the logic that should be followed if these features were to be implemented within the Erlang RunTime System (ERTS) for production use.

SSErl also currently supports the execution of remote modules in context. Subsequent simple module requests for that process will also be interpreted in that context, and be sourced from the remote node, unless listed in the module aliases table or explicitly overridden.

Each SSErl process maintains some private state including capabilities for itself and for its parent. Further details are given in Appendix A.

Nodes

A hierarchy of nodes is defined within an Erlang run-time system (an existing Erlang node). These provide a custom context of services available, and restrict the use of code with side-effects. Specifying resource utilisation limits is not supported in SSErl, though ideally it should be. Functionally these nodes are similar to existing Erlang nodes with these additional features.

A custom context for processes is provided by having distinct:

registered names table: a table of local names and their associated capability identifiers, used to advertise services by name to processes executing within the node.
module alias table: which specifies which modules are currently (or may be permitted to be) loaded for use by processes executing in the node. This table maps the name used in the executing code to the name used locally for the loaded module, and is used to support both aliasing of names to safer variants, and to provide unique names for remotely loaded modules.
capability key data: the information necessary to create and validate unforgeable capabilities for the node. The contents are opaque, and private to the module implementing the capabilities.

Restrictions on side-effects are enforced by specifying (in the process rights field) whether or not each of the following are permissible for processes executing in the node:

open_port
for direct access to external resources like files, network, hardware devices etc.
external process access
for access (eg. send,exit,link,process_info etc) to processes running in nodes on other Erlang systems (ie accesses which are provided via the net_kernel).
database BIFs usage
for direct access to the local database manager.

Each node has a node server process which maintains the state for that node, including the various tables of context info and the process rights. Further details are given in Appendix A. The servers are registered by their node name in the real erlang system's registered names table. This allows glue functions executing in user processes to communicate with a specified node server (as given by the node name embedded in a capability). This also allows access to non-local node servers by sending a message to '{node,host}' (outside the SSErl framework).
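The exchange between a glue function and its node server can be modelled in ordinary Erlang. The following is a minimal sketch, not the SSErl implementation: the module name node_sketch, the message format, the table layout {Capa, Pid, Rights}, and the use of the exit reasons safety_violation and invalid_capability are all assumptions made for illustration.

```erlang
-module(node_sketch).
-export([start/1, check/3, stop/1]).

%% Table is a list of {Capa, Pid, Rights}: the capabilities this (model)
%% node considers valid, the real pid each refers to, and its rights.
start(Table) ->
    spawn(fun() -> loop(Table) end).

%% Runs in the user's process: ask the node server whether Op is
%% permitted by Capa. Returns the real pid, or exits on a violation.
check(Server, Capa, Op) ->
    Ref = make_ref(),
    Server ! {check, self(), Ref, Capa, Op},
    receive
        {Ref, {ok, Pid}}    -> Pid;
        {Ref, {error, Why}} -> exit(Why)
    after 5000 ->
        exit(timeout)
    end.

stop(Server) ->
    Server ! stop,
    ok.

%% The node server: holds the table and answers check requests.
loop(Table) ->
    receive
        {check, From, Ref, Capa, Op} ->
            From ! {Ref, lookup(Capa, Op, Table)},
            loop(Table);
        stop ->
            ok
    end.

lookup(Capa, Op, Table) ->
    case lists:keyfind(Capa, 1, Table) of
        {Capa, Pid, Rights} ->
            case lists:member(Op, Rights) of
                true  -> {ok, Pid};
                false -> {error, safety_violation}
            end;
        false ->
            {error, invalid_capability}
    end.
```

A glue routine in the style of k_exit would call check/3 first and then perform the real operation on the pid it returns, so the privileged step still runs in the caller's process.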

A node capability is returned by the newnode(Parent,Name,Opts) BIF, which creates a new node and registers it by name in both its own and the creator's registered names tables, e.g.

      SomeNode=newnode(node(),testing,[{proc_rights,[]}]).

Either the node capability, or a registered node name (atom) can be used to specify a node to any BIF that requires this.

Capabilities

A capability is a globally unique (in time and space), unforgeable name for (a specific instance of) some resource and a list of rights for operations permitted on the resource by holders of the capability.

The same resource may be referred to from different capabilities giving the owners of the capabilities different rights. We are using capabilities to ensure that these identifiers cannot be forged, and to limit their usage to the operations desired. A capability is always created (and verified upon use) on the node which manages the resource which it references, and these resources never migrate. This node thus specifies the domain for the capability. Further, the resources referenced are never reused (a new process, even with the same arguments, is still a new instance, for example), so revocation is not the major issue it traditionally is in capability systems. In our usage capabilities are invalidated when their associated resource is destroyed (eg. the process dies). Other processes may possess invalid capabilities, but any use will raise an invalid capability exception.

Capabilities may be implemented in several ways. SSErl supports two, interoperable, variants which may be selected on a node by node basis using an option to the newnode BIF.

hash (encrypted) capabilities: use a cryptographic hash function to create a check value for the capability data, which is then appended to the capability. This may subsequently be used to verify the validity of the capability when it is presented to its node along with a request for some operation upon it. There is no means of revoking these capabilities.
password (sparse) capabilities: uses password capabilities, where the capability includes a random password value (selected sparsely from a large address space). The capability is supplied as a token to its node along with a request for some operation upon it. The node maintains a table of valid capabilities in order to verify it, and capabilities may be revoked by removing them from the table.

There is a tradeoff between these alternatives: encrypted capabilities rely on the strength of the hash for their security, need no table on the node, and cannot be revoked, whereas password capabilities cost table space but can be revoked. The best alternative is likely to depend on the target environment.

In the prototype, capabilities are a tuple (record) with the following components:

      {Type,NodeId,Value,Rights,Private}

where

Type
describes the type of resource the capability references: mid, node, pid, port, or user.
NodeId
the identifier of the node which created the capability, and which can be asked to verify the validity of the capability or perform operations on the specified resource, which is managed by that node.
Value
identifies the resource which is referenced by the capability ({name,path}, node, process identifier, port identifier, or any erlang term, respectively for the various types)
Rights
a list of operations permitted on the resource referenced by the capability. The actual rights depend on the type of the capability.
Private
an opaque term, used by the originating node to verify the validity of the capability when it is submitted with a request to perform some operation on the associated resource. It may be a cryptographic check value, or a random password value, only the originating node need know.
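The two variants can be sketched against this tuple layout in standard Erlang. This is an illustration under stated assumptions rather than SSErl's code: the module name capa_sketch and all function names are hypothetical, HMAC-SHA256 (via crypto:mac/4, OTP 22.1 or later) is substituted for whatever hash the prototype's C driver uses, and rights are kept as a plain list.

```erlang
-module(capa_sketch).
-export([make_hash/2, check_hash/2,
         make_pass/2, check_pass/2, revoke_pass/2]).

%% Keyed digest over the capability data. Key is a secret binary that
%% only the creating node holds (an assumption of this sketch).
digest(Key, Data) ->
    crypto:mac(hmac, sha256, Key, term_to_binary(Data)).

%% Hash variant: Private is a check value computed from the other fields.
make_hash(Key, {Type, NodeId, Value, Rights}) ->
    {Type, NodeId, Value, Rights, digest(Key, {Type, NodeId, Value, Rights})}.

%% Recompute the check value and compare; no table is needed.
%% (Production code would prefer a constant-time comparison.)
check_hash(Key, {Type, NodeId, Value, Rights, Check}) ->
    digest(Key, {Type, NodeId, Value, Rights}) =:= Check.

%% Password variant: Private is a random value, and the node remembers
%% every valid capability in Table. Returns {Capa, NewTable}.
make_pass({Type, NodeId, Value, Rights}, Table) ->
    Capa = {Type, NodeId, Value, Rights, crypto:strong_rand_bytes(16)},
    {Capa, [Capa | Table]}.

check_pass(Capa, Table) ->
    lists:member(Capa, Table).

%% Revocation is just removal from the table.
revoke_pass(Capa, Table) ->
    lists:delete(Capa, Table).
```

In the hash variant, altering any field (for example adding a right) changes the expected check value, so the tampered capability fails verification. In the password variant the same protection comes from the table lookup.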

The capability rights field specifies the list of access rights permitted. Internally these are stored as a bit-mask, but are specified by the user as a list of atoms. Currently the rights supported for the various classes are:

pid
exit, group_leader, kill, link, priority, info, register, restrict, revoke, send, trace, trap_exit, unregister, view
port
exit, link, register, restrict, revoke, send, unregister, view
node
halt, info, module, monitor_node, newnode, processes, register, restrict, revoke, spawn, unregister, view
mid
info, load, register, restrict, revoke, unregister, view
user
register, restrict, revoke, unregister, view

Most of these correspond to permitting the BIF of the same name (process_flag for trap_exit or priority; remote module load for load). Rights specific to capability manipulation include:

info
permits access to info on the object (ie use the node_info or process_info BIFs)
restrict
permits restriction of the capability
restricted
internal use only right that specifies that this is a restricted variant of a capability
revoke
permits revocation of the capability (if supported)
view
permits viewing of the capability as a list
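The conversion between the user-visible list of atoms and the internal bit-mask can be sketched as follows. The module name rights_sketch, the subset of rights shown, and the bit positions are assumptions chosen for illustration; they do not reproduce the values used by the prototype.

```erlang
-module(rights_sketch).
-export([to_mask/1, to_list/1]).

%% Hypothetical bit assignments for a few of the rights listed above.
bits() ->
    [{exit, 1}, {link, 2}, {send, 4}, {register, 8},
     {restrict, 16}, {revoke, 32}, {view, 64}].

%% Convert a list of right atoms to an integer mask.
%% An unknown atom causes a badmatch error rather than being ignored.
to_mask(Rights) ->
    lists:foldl(fun(R, Acc) ->
                        {R, Bit} = lists:keyfind(R, 1, bits()),
                        Acc bor Bit
                end, 0, Rights).

%% Expand a mask back to a list of atoms, in table order.
to_list(Mask) ->
    [R || {R, Bit} <- bits(), Mask band Bit =/= 0].
```

With a mask representation, checking a single right is one band operation, and intersecting two sets of rights is a band of the two masks.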

An appropriate capability must be supplied either explicitly (as a mid/node/pid/port argument), or implicitly (from the process's knowledge of its own capability, or its parent node's capability), in order to perform most unsafe BIFs.

A capability is created whenever a node is created, a process is spawned, a port is opened, a mid or user capability is made, or an existing capability is restricted. eg. Pid=spawn(testing,start,[]). Capabilities may also be created with very limited rights for existing processes outside the SSErl environment as part of its initialisation, or to correspond to pids from a list_to_pid BIF. Capabilities become invalid when the associated object (process, port or node) dies, though this may only be detected upon a subsequent attempt to use the capability.

User capabilities (with a user supplied value) are intended to assist in providing finer control for file accesses, I/O device accesses, or other potentially sensitive operations.

Additional details on the implementation of capabilities are given in Appendix B - SSErl Capability Internals.

Remote Module Execution in Context

SSErl currently supports the execution of remote modules in context. Each SSErl process maintains a table of remote module names and their associated mid capabilities. When executing within a remotely sourced module, any simple module requests will also be sourced from that remote node, unless listed in the module aliases table. Details of this process are given below in Appendix C - Apply Logic.

The module argument to the apply, spawn or spawn_link BIFs can be either a simple atom name (as currently), or can be a mid capability, created using the make_mid BIF. If a mid is used, then the module will be executed in context, and any internal references will be sourced from the same node. Thus a remote spawn in context can be done thus:

      TMid=make_mid(tricky),
      TPid=spawn(SomeNode,TMid,start,[]).
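How a plain module name might be interpreted in context can be sketched as below. The sample session later in this paper shows remotely sourced modules with local names of the form 'mid_test2@lpb@hyll.item.ntnu.no'; the sketch assumes that name is simply the module name joined to the source node name with "@". The module name context_sketch, both function names, and the use of the atom local for code that is not running in a remote context are hypothetical, and the actual logic is that of Appendix C, which this sketch does not reproduce.

```erlang
-module(context_sketch).
-export([local_name/2, resolve/3]).

%% Build the local name for module Mod sourced from Node (assumed form).
%% list_to_atom/1 creates atoms, so real code would need to bound this.
local_name(Mod, Node) ->
    list_to_atom(atom_to_list(Mod) ++ "@" ++ atom_to_list(Node)).

%% Resolve a plain module name used by running code.
%% Aliases is the node's module alias table, [{Name, Alias}].
%% Context is the source node of the executing module, or local.
resolve(Mod, Aliases, Context) ->
    case lists:keyfind(Mod, 1, Aliases) of
        {Mod, Alias}                 -> Alias;   % alias table wins
        false when Context =:= local -> Mod;     % ordinary local module
        false -> local_name(Mod, Context)        % source from remote node
    end.
```

Checking the alias table first matches the rule stated above that simple module requests are sourced from the remote node unless listed in the module aliases table.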

Some SSErl Features in more Detail

This section documents the changes to the language interface, specifically the BIFs and guards.

New BIFs

The following describes in some more detail, the interfaces to some of the new BIFs added to the SSErl prototype, particularly those related to nodes and capabilities.

halt(Node)
halts a node (specified by either registered name or capability), and all nodes and processes within it.
newnode(Parent,Name,Options)
creates a new SSErl node with the specified options, and returns a capability for the newly created node. Note: this capability will be registered with the node's Name in both the creator's and its own registered names table, to assist in supporting specification of nodes by name. The Options to this call are a list of (any of) these terms:

{proc_rights, [Ri*]}
specifies process rights from [db,extern,open_port], which will be constrained to at most those permitted by the parent.
{names, [{Name,Capa}*]}
initialises the registered names table.
{modules, [{Name,Alias}*]}
updates the module aliases table, merging the new aliases with those already defined in the parent node.
{capa, hash}
specifies the use of a crypto hash check for capabilities.
{capa, pass}
specifies the use of password capabilities.
{flags, [Flag*]}
specifies some other flag (currently {verbose,N} is supported).
Any values not supplied will be inherited from the parent node.
newnode(Parent,Name)
newnode(Name)
variants which inherit all details from the specified parent/current node.
node_info(Node)
returns the node information table. This probably releases too much information, but is needed for debugging.

check(Capa,Op)
checks if the supplied capability is valid and permits the requested operation, returns true or generates an exception.
make_capa(Value)
creates a user capability with any user supplied Value.
make_mid(Module)
creates a mid capability for the named module on the local node.
restrict(Capa,Rights)
creates a new version of the supplied capability with a more restricted set of rights. nb. the new list of rights will be the intersection of the existing and supplied lists of rights.
revoke(Capa)
revokes the specified capability (if supported). The capability must be restricted. Original (master) capabilities cannot be revoked.
view(Capa)
returns the capability as a list of terms (with rights expanded to a list).
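The rights handling of restrict, and of the restrictx variant from the safety module described later, can be sketched on a simplified capability. The module name restrict_sketch and the four-element tuple are assumptions: the Private field is omitted here, whereas a real restricted capability would also need a fresh check value or password from its node.

```erlang
-module(restrict_sketch).
-export([restrict/2, restrictx/2]).

%% Capabilities in this sketch are {Type, NodeId, Value, Rights},
%% with Rights as a list of atoms.

%% Keep only rights present in both the capability and Wanted,
%% so a restricted capability can never gain rights.
restrict({Type, NodeId, Value, Rights}, Wanted) ->
    Kept = [R || R <- Rights, lists:member(R, Wanted)],
    {Type, NodeId, Value, Kept}.

%% Complementary form: drop the listed rights and keep the rest.
restrictx({Type, NodeId, Value, Rights}, Excluded) ->
    {Type, NodeId, Value, Rights -- Excluded}.
```

Asking for a right the capability does not already hold has no effect, which is the consequence of taking the intersection.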

New and Modified Guard Tests

Several new guard tests have been added, and others modified:

capa(N)
test whether the term is a capability.
mid(N)
nid(N)
pid(N)
port(N)
test whether the term is a mid, node, pid, or port capability specifically.
same(Capa1,Capa2)
test whether the two supplied capabilities refer to the same object (ie have the same value). For efficiency reasons it does not consult with the creating nodes of the capabilities, and hence cannot check their validity (check() can be used to do this).

Modified BIFs

The following existing BIFs have either modified arguments, or modified return values, as noted. Any BIFs which require specification of a node can take either a registered node name, or a node capability. Similarly, any BIFs which require specification of a pid can take either a registered process name, or a pid capability (this is a generalisation of the existing behaviour). BIFs which return a pid or port now return a pid or port capability. In contrast, BIFs which return a node name still do so (though it should be a registered name), for compatibility with existing usage. Any BIF taking a capability (whether specified by name or value) will only be performed if the capability grants the appropriate right.

apply(M,F,A)
apply a module. M can be either a plain module name or a mid capability. If the latter, a remote module load will be done if required, and the module is then executed in the appropriate remote context.
check_process_code(Pid,Module)
Pid can be either a name or a capability.
exit(Pid,Reason)
Pid can be either a name or a capability.
group_leader()
returns a capability for this process's group leader with very limited rights (can register, send to and view, not much else).
group_leader(Leader,Pid)
Leader and Pid can be either names or capabilities.
kill(Pid)
equivalent to exit(Pid,kill). Pid can be either a name or a capability.
link(Pid)
Pid can be either a name or a capability.
list_to_pid(List)
returns a capability for the constructed pid with very limited rights (can register, send to and view, not much else).
monitor_node(Node,Flag)
Node can either be a name or a capability.
node(Term)
return the owner node name of Term (capability, reference or registered name).
nodes()
return a list of names of all known nodes, ie those listed in the local node's registered names table.
open_port(PortName,PortSettings)
returns a port capability for the new port to some external resource.
pid_to_list(Pid)
Pid can be either a name or a capability.
processes(Node)
return a list of pid capabilities for all processes on the specified Node. Node can either be a name or a capability.
process_info(Pid)
process_info(Pid,Item)
Pid can be either a name or a capability. Note: the links item still returns a list of real (not capability) pids - this is a deficiency.
register(Name,Capa)
register any type of capability, not just pids, in this node's registered names table.
registered()
returns list of all registered names and their associated capabilities (of any type).
send(To,Msg)
glue routine used to replace To!Msg. To can be a name, a capability, or a {Name,Node} tuple.
spawn(M,F,A)
spawn(Node,M,F,A)
spawn_link(M,F,A)
spawn_link(Node,M,F,A)
spawn a new process (on Node) and return a pid capability for it. Node can either be a name or a capability; M can be either a plain module name or a mid capability. If the latter, a remote module load will be done if required, and the module is then executed in the appropriate remote context.
trace(Pid,How,FlagList)
Pid can be either a name or a capability.
unlink(Pid)
Pid can be either a name or a capability.
unregister(Name)
unregister(Capa)
can unregister by either name or a capability.
whereis(Name)
returns capability (of any type) registered with that name.

Safety Module

In the safety library module, there are a number of utility functions to assist in using the SSErl prototype. A list of these is supplied by safety:help(). Of these, of special note are:

restrictx(Capa,XRights)
restrict the supplied capability by excluding rights from the supplied list.
policynode(Name,Policy)
policynode(Parent,Name,Policy)
creates a node of the current/Parent node with the supplied Name, which implements the named Policy module, see [Bro97e].
safenode(Name,Policy)
safenode(Parent,Name,Policy)
creates a safer node of the current/Parent node with the supplied Name, and various default safety measures imposed.
read_capa(FileName)
reads a capability from the file "FileName.erlc", where the capability must have been written as a binary term, assuming access to the file is permitted. If the capability is for a node, it is also registered for convenience (to help make usage of node capabilities more transparent).
write_capa(FileName,Capa)
writes the given capability to the file "FileName.erlc", as a binary term.

The latter two may be used to transfer a capability between distinct Erlang systems (nodes). nb. cookies must also be exchanged in the current prototype.
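Writing and reading a capability as a binary term can be sketched with the standard term_to_binary/1 and binary_to_term/1 BIFs. The module name capa_file_sketch is hypothetical, and the sketch leaves out both the access check and the registration of node capabilities that the safety module performs.

```erlang
-module(capa_file_sketch).
-export([write_capa/2, read_capa/1]).

%% Accept the base name as an atom or a string and add ".erlc".
file_name(Name) when is_atom(Name) -> atom_to_list(Name) ++ ".erlc";
file_name(Name) when is_list(Name) -> Name ++ ".erlc".

%% Serialise the capability in the external term format.
write_capa(Name, Capa) ->
    file:write_file(file_name(Name), term_to_binary(Capa)).

%% Read it back. The term is only a claim until the originating node
%% has verified it, so the file contents are not trusted as such.
read_capa(Name) ->
    {ok, Bin} = file:read_file(file_name(Name)),
    binary_to_term(Bin).
```

Because the Private field travels with the rest of the tuple, the file must be protected in transit in the same way as any other secret token.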

Using the SSErl Prototype

The SSErl prototype is distributed as a tar file which includes source and precompiled jam files for the sserl, modified compiler, and changed standard library modules, as well as the C source for the crypto hash driver. A README file is included which describes the (minimal) customisation required and installation process. Also required is a working Erlang system (available from [Erla96]).

Once installed, a Unix shell script sserl is used to start the SSErl system. It is invoked as

      sserl [-capa hash|pass] [-verbose 1|2|3]

where the -capa flag specifies which form of capabilities to use by default, and the -verbose specifies the level of debugging diagnostics desired. It starts erlang in a distributed mode, with a slightly modified Eshell which initialises the SSErl environment, and executes any commands given using the modified apply in an sserl environment. Thus all the new BIFs are available.

A number of additional utility routines are provided in the sserl module, and have been incorporated into shell_default, and are available directly from the shell prompt. These are all described in the shell help(). Some of the more useful include:

info()
info(Node)
display node status information (rather long and verbose)
ps()
ps(Node)
list all processes executing in a node
names()
names(Node)
list all registered names on a node
newnode(Name)
safenode(Name)
policynode(Name,Policy)
create a new unrestricted, limited, or policy constrained, node

Subnodes are created by the newnode(Name) BIF (or the safety safenode(Name) or policynode(Name,Policy_Module) library functions, see [Bro97e]). A capability for the new node is returned. This may then be used with spawn to run processes in the node.

Some functions are provided in the test module in the test subdirectory, to exercise various aspects of the SSErl environment, particularly focusing on the modified BIFs. See test:help() for details of the various test functions.

A Sample SSErl Session

An abbreviated sample sserl session is given in the listing below. It assumes sserl was started in the test subdirectory of the distribution. Some details have been omitted for brevity.

lpb@hassel_102 %sserl
Erlang (JAM) emulator version 4.5.3
Eshell V4.5.3  (abort with ^G)
SSErl v1.4 Node 'lpb@hassel.item.ntnu.no' initialised.
(lpb@hassel.item.ntnu.no)1> help().
** shell internal commands **
... details of standard shell commands omitted
** commands in module sserl v1.4 (SERCs Safer Erlang) **
init()                -- Create top node (done by shell).
help()                -- Displays this help.
info()                -- Displays info about top node.
info(Cnode)           -- Displays info about node.
pinfo()               -- Displays info about process.
names()               -- Display registered names.
names(Cnode)          -- Display registered names in node.
ps()                  -- Display topnode process list.
ps(Cnode)             -- Display nodes process list.
safenode(Name)        -- Create safe subnode of topnode.
policynode(Name,Policy) -- Create subnode with given policy.
cnode()               -- Find capability for this node.
read_capa(FileName)   -- Read capability from file.
write_capa(FileName,Capa) -- Write capability to file.
true
(lpb@hassel.item.ntnu.no)2> Saf1=safenode(saf1).
{capa,node,'saf1.lpb@hassel.item.ntnu.no',<0.30.0>,14207,#Bin}
(lpb@hassel.item.ntnu.no)3> P1=spawn_link(Saf1,test,test,[]).
{capa,pid,'saf1.lpb@hassel.item.ntnu.no',<0.31.0>,16711799,#Bin}
Test - simple test to see self - at time {16,12,22}
  Self = {capa,pid,'saf1.lpb@hassel.item.ntnu.no',<0.31.0>,16711799,#Bin}
  view(Self) = [pid,'saf1.lpb@hassel.item.ntnu.no',<0.31.0>,[exit,group_leader,
                kill,link,priority,send,trace,trap_exit,info,register,restrict,
                revoke,unregister,view],11657856311980075134]
  TestPort={'EXIT',safety_violation}
Process List for node 'saf1.lpb@hassel.item.ntnu.no'
Pid        Initial Call                   Current Call                  
<0.31.0>   {test,test,0}                  {sserl_bifs,k_process_info,2} 
  snoozing ...              
(lpb@hassel.item.ntnu.no)4> halt().

Distributed SSErl

SSErl may be used in a distributed Erlang environment. To access a remote node, it is necessary to have a capability for that node. Eventually it is intended that a global, hierarchical name server may be used to assist in distributing these capabilities. For now, the read_capa(FileName) and write_capa(FileName,Capa) utilities can be used to exchange capabilities, as shown in the demo session below.

The following session involves two systems, which here share an NFS area, running different forms of capabilities on each. On the first system (hyll), sserl is run using hash capabilities. First its node capability is written, then a safe version of the RPC server (which executes calls in a restricted subnode) is started. After the calls from hassel have been processed, the node state is displayed.

lpb@hyll_192 %sserl -capa hash
Erlang (JAM) emulator version 4.5.3
 
Eshell V4.5.3  (abort with ^G)
SSErl v1.4 Node 'lpb@hyll.item.ntnu.no' initialised.
(lpb@hyll.item.ntnu.no)1> write_capa(hyll,cnode()).
ok
(lpb@hyll.item.ntnu.no)2> RPC=safe_rpc:start().
{ok,{capa,pid,'lpb@hyll.item.ntnu.no',<0.29.0>,16711799,#Bin}}
(lpb@hyll.item.ntnu.no)3> info().
Node Info Details
  Name           'lpb@hyll.item.ntnu.no'
  Node Capa      {capa,node,'lpb@hyll.item.ntnu.no',<0.22.0>,16247,#Bin}
  Parent Capa    topnode
  Processes        
   {capa,pid,'lpb@hyll.item.ntnu.no',<0.29.0>,16711799,#Bin}
   {capa,pid,'lpb@hyll.item.ntnu.no',<0.28.0>,16711799,#Bin}
   {capa,pid,'lpb@hyll.item.ntnu.no',<0.21.0>,16711799,#Bin}
  Process Cnt    3
  Process Rights [db,extern,open_port]
  Monitors       []
  Subnodes         
   {capa,node,'rpc.lpb@hyll.item.ntnu.no',<0.30.0>,16247,#Bin}
  Names            
   'rpc.lpb@hyll.item.ntnu.no' -> {capa,node,'rpc.lpb@hyll.item.ntnu.no',<0.30.0>,16247,#Bin}
   safe_rpc -> {capa,pid,'lpb@hyll.item.ntnu.no',<0.29.0>,16711799,#Bin}
   'lpb@hyll.item.ntnu.no' -> {capa,node,'lpb@hyll.item.ntnu.no',<0.22.0>,16247,#Bin}
  Modules        [{erlang,erlang},{sserl,sserl},{sserl_bifs,sserl_bifs},...]
  Flags          []
  Capa Module    sserl_hcapa
  Capa State     {capast,#Port,[213,76,84,84,211,79,156,159,134,116,172,30,132,133,253,185],5}
  Capa Table       
ok

On the second system (hassel), sserl is run using password capabilities. First the (hash) capability for hyll is read, and then used in a safe_rpc call, and in a spawn_link. Finally the node state is displayed.

lpb@hassel_103 %sserl -capa pass
Erlang (JAM) emulator version 4.5.3
Eshell V4.5.3  (abort with ^G)
SSErl v1.4 Node 'lpb@hassel.item.ntnu.no' initialised.
(lpb@hassel.item.ntnu.no)1> Hyll=read_capa(hyll).
{capa,node,'lpb@hyll.item.ntnu.no',<47.22.0>,16247,#Bin}
(lpb@hassel.item.ntnu.no)2> safe_rpc:call(Hyll,sserl,ps,[]).
Process List for node 'rpc.lpb@hyll.item.ntnu.no'
Pid        Initial Call                   Current Call                  
<0.33.0>   {safe_rpc,reply,5}             {sserl_bifs,k_process_info,2} 
ok
(lpb@hassel.item.ntnu.no)3> spawn_link('lpb@hyll.item.ntnu.no',sserl,names,[]).
{capa,pid,'lpb@hyll.item.ntnu.no',<47.34.0>,16711799,#Bin}
Registered Name List for node 'lpb@hyll.item.ntnu.no'
Name                     Type     Val          Current Call                    
rpc.lpb@hyll.item.ntnu.n node     <47.30.0>    {sserl_node,serve,1}            
safe_rpc                 pid      <47.29.0>    {safe_gen_server,loop,7}        
lpb@hyll.item.ntnu.no    node     <47.22.0>    {sserl_node,serve,1}            
(lpb@hassel.item.ntnu.no)4> info().
Node Info Details
  Name           'lpb@hassel.item.ntnu.no'
  Node Capa      {capa,node,'lpb@hassel.item.ntnu.no',<0.22.0>,16247,#Bin}
  Parent Capa    topnode
  Processes        
   {capa,pid,'lpb@hassel.item.ntnu.no',<0.28.0>,16711799,#Bin}
   {capa,pid,'lpb@hassel.item.ntnu.no',<0.21.0>,16711799,#Bin}
  Process Cnt    2
  Process Rights [db,extern,open_port]
  Monitors       []
  Subnodes         
  Names            
   'lpb@hyll.item.ntnu.no' -> {capa,node,'lpb@hyll.item.ntnu.no',<47.22.0>,16247,#Bin}
   'lpb@hassel.item.ntnu.no' -> {capa,node,'lpb@hassel.item.ntnu.no',<0.22.0>,16247,#Bin}
  Modules        [{erlang,erlang},{sserl,sserl},{sserl_bifs,sserl_bifs},...]
  Flags          []
  Capa Module    sserl_pcapa
  Capa State     {capast,#Port,[46,193,12,169,205,133,193,165,230,237,159,224,104,192,140,54],4}
  Capa Table       
   {capa,pid,'lpb@hassel.item.ntnu.no',<0.20.0>,262271,#Bin}
   {capa,pid,'lpb@hassel.item.ntnu.no',<0.28.0>,16711799,#Bin}
   {capa,pid,'lpb@hassel.item.ntnu.no',<0.21.0>,16711799,#Bin}
   {capa,node,'lpb@hassel.item.ntnu.no',<0.22.0>,16247,#Bin}
 ok
(lpb@hassel.item.ntnu.no)5> halt().

If the shared NFS area were not available, any file transfer mechanism could have been used to transfer the (small, binary) capability data to the remote system.

Note that looking at just the capabilities, it is not possible to determine which form is being used. Only by examining the node state is this apparent.

Remote Module Execution in Context

The following session illustrates how a remote module can be accessed and executed. It assumes that another instance of sserl is running on hyll which has already written its node capability to a file (as shown in the previous example). A mid capability is created and used in the spawn to request a remote load of that module and execution in context. Note the internal module name reflects this. On the first use there is a delay whilst the module is loaded, but subsequent invocations use the loaded copy. In the second part of the example, a mid was created on hyll and written to a file. This was read and then used directly in an apply on hassel. Note again, the module name reflects its remote origin. The third part of the example shows an apply on another mid transferred from hyll, but which calls a second module using a plain name, which is interpreted in context.

lpb@hassel_104 %sserl
Erlang (JAM) emulator version 4.5.3
Eshell V4.5.3  (abort with ^G)
SSErl v1.4 Node 'lpb@hassel.item.ntnu.no' initialised.
(lpb@hassel.item.ntnu.no)1> Hyll=read_capa(hyll).
{capa,node,'lpb@hyll.item.ntnu.no',<47.22.0>,16247,#Bin}

(lpb@hassel.item.ntnu.no)2> Mid=make_mid(mid_test2).
{capa,mid,'lpb@hassel.item.ntnu.no',{'mid_test2@lpb@hassel.item.ntnu.no',
     "/home/hyll/b/lpb/mcode/Erlang/sserl/test/mid_test2"},16503,#Bin}
(lpb@hassel.item.ntnu.no)3> spawn(Hyll,Mid,f1,[from_hassel]).
{capa,pid,'lpb@hyll.item.ntnu.no',<47.31.0>,16711799,#Bin}
  'mid_test2@lpb@hassel.item.ntnu.no':f1(from_hassel) called and snoozing

(lpb@hassel.item.ntnu.no)4> Mid2=read_capa(mid2).
{capa,mid,'lpb@hyll.item.ntnu.no',{'mid_test2@lpb@hyll.item.ntnu.no',
     "/home/hyll/b/lpb/mcode/Erlang/sserl/test/mid_test2"},16503,#Bin}
(lpb@hassel.item.ntnu.no)5> apply(Mid2,f2,[on_hassel]).
  'mid_test2@lpb@hyll.item.ntnu.no':f2(on_hassel) called and snoozing
ok

(lpb@hassel.item.ntnu.no)6> Mid3=read_capa(mid3).      
{capa,mid,'lpb@hyll.item.ntnu.no',{'mid_test1@lpb@hyll.item.ntnu.no',
     "/home/hyll/b/lpb/mcode/Erlang/sserl/test/mid_test1"},16503,#Bin}
(lpb@hassel.item.ntnu.no)7> apply(Mid3,f1,[on_hassel]).
  'mid_test1@lpb@hyll.item.ntnu.no':f1(on_hassel) called
  calling mid_test2:f1(mid_test1)
  'mid_test2@lpb@hyll.item.ntnu.no':f1(mid_test1) called and snoozing
  calling mid_test2:f2(mid_test1)
  'mid_test2@lpb@hyll.item.ntnu.no':f2(mid_test1) called and snoozing
  called mid_test2:module_info(module) = 'mid_test2@lpb@hyll.item.ntnu.no'
  'mid_test1@lpb@hyll.item.ntnu.no':f1(on_hassel) snoozing ...
ok

Programming in the Safe Environment

Programming in the safe environment should differ very little from normal Erlang coding, save that some operations may be restricted when the code is executed. Generally the new BIFs would only be used in creating a custom environment, or in some utility modules (which handle display of capabilities, for example). As an example, the utility function safenode, which is supplied as part of a suite of utility functions in the safety module, is listed below in its two-argument form safenode(Parent,Name).

%% safenode/2 - creates a "safer" node of specified parent
safenode(Parent,Name) ->
    PRi = [],			% restrict proto process rights for new node
    % use safe module aliases
    Mods = [{file,safe_file},{lists,safe_lists},{ordsets,safe_ordsets},
		{random,safe_random},{string,safe_string},{unix,safe_unix}],
    % start the safe versions of daemons used by safenode modules
    catch safe_file:start(),
    % create new node with safer rights and custom world-view
    CN = newnode(Parent,Name,[{proc_rights,PRi},{modules,Mods}]),
    % and restrict it before return to prevent newnode use.
    restrictx(CN,[newnode]).

This also demonstrates the use of aliases. The safenode library function uses the safe_file_server module. It restricts file operations allowed, but is accessed using the usual file functions, with appropriate aliasing of the module name.

Limitations of the SSErl Prototype

The current prototype has a number of limitations.

performance is reduced due to the extra layer of glue functions, and the need to exchange messages with the node manager process for all unsafe BIFs
open_port functionality involves a visible change in use, since the real underlying Pid must be used as part of the communications dialog
display of capabilities for processes and nodes is obviously different to what is seen at present (though presumably the io_lib functions could be changed to hide this)
values returned for nodes are currently inconsistent: some functions return a node capability, others return a node name. In part this is due to not having underlying node capabilities in the net_kernel.
both capability implementations use an insufficiently random key, and the password table is linear.
the remote modules table is currently maintained separately in each process, whereas it ought to be managed by the node, and merely cached in each process. This results in child mid capabilities being issued for each process which uses a remote module, rather than once on any node.
ambiguity is possible with mid capabilities if two modules with the same name but in different directories exist. Massive confusion will result - as the unique name used is module@node.

All of these limitations could be addressed by incorporating the changes directly in a new version of the Erlang Run-Time System.

Incorporating SSErl in the Erlang RunTime Environment

Once experience with this prototype has verified the validity of this approach to providing a safe Erlang execution environment, it would be much better to incorporate the changes into a new version of the ERTS. At that time, further safety checks could also be made on the manner in which the ERTS has been written.

Capabilities and Subnodes in the ERTS

Capabilities should be a fundamental Erlang data type, similar to references. Each would be uniquely tagged of course, and should include some identifier for the node which created it (probably an index into the atom table, as I believe is done now for processes and references). These capabilities would be used for nodes, processes, and ports, as well as extensibly for other data requiring protection.

Subnodes should be added as a concept in the ERTS. They will primarily involve a table of relevant status information for each distinct node, along with some means of locating this table in the system both internally and externally. The table will include much of the information currently managed by the node manager processes in this prototype.

The process state information will need to be extended to include capabilities for itself, its parent, and its group leader; and probably a pointer directly to its parent node state table for efficiency. Note the parent node capability need not be the same as that recorded in the node state table, it may very well be a restricted version of it.

All the BIFs implemented in the ERTS which involve potentially unsafe operations will need to be rewritten to incorporate an appropriate check of rights from the supplied (or inherited) capabilities before proceeding.

Auditing BIFs and Standard Library Routines

All BIFs and standard library functions which are written in a general programming language (eg C) will need to be audited for careless coding practices which could be used to subvert the type safety of the system. These have been found to be a major source of security flaws in existing systems (eg see discussion on Java weaknesses in [DFW96]).

This component will be time-consuming, but necessary to ensure safety. Examples of poor style include any use of the standard C functions gets, sprintf, strcat, strcpy; ie any functions which could overrun a buffer supplied to them due to the absence of bounds checks on these parameters. The basic requirement is that all parameters be checked to ensure that bounds are not exceeded, that their values are sane, and cannot cause a run-time execution fault.

Protecting Erlang from External Attack

If the ERTS is assumed to be safe from compromise (ie assume that no-one will gain root type privileges on its host and interfere with its address space(s) directly), then the only mechanism for external subversion is via spoof messages being sent to the port(s) associated with the net_kernel in distributed Erlang implementations. At present, the only security mechanism used is to require a suitable cookie be sent with each message [AVWW96]. However, this is sent in the clear, and is subject to eavesdropping, and subsequent masquerade by an attacker.

In order to secure these messages being exchanged between distributed Erlang nodes, it is necessary to either physically protect all communications links used, or to employ cryptographic techniques to secure the communications. Possible approaches to the latter involve the use of a digital signature instead of a cookie (eg perhaps a signed hash using the shared secret), or alternatively, full encryption of all links. The use of SSL (secure socket layer) code would most likely be the best choice here [HY96]. In any case, it would mean that the new safe distributed Erlang would be incompatible with the existing system. This may, or may not, be a problem.
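As a minimal sketch of the "signed hash using the shared secret" option, the following assumes both nodes already hold the cookie and use it as an HMAC key, so that the cookie itself is never sent over the link. The module name signed_msg and its functions are hypothetical and are not part of SSErl or the net_kernel. The sketch uses crypto:mac/4 from the current OTP crypto application (not available in Erlang 4.5.3), and HMAC-SHA256 is my choice of hash here.

```erlang
-module(signed_msg).
-export([sign/2, verify/2]).

%% sign/2 - wrap a term with an HMAC tag computed over its external form.
%% Secret is the shared cookie as a binary; it never travels on the wire.
sign(Term, Secret) when is_binary(Secret) ->
    Payload = term_to_binary(Term),
    Tag = crypto:mac(hmac, sha256, Secret, Payload),
    {signed, Payload, Tag}.

%% verify/2 - recompute the tag and decode the payload only if it matches.
%% Tag is already bound, so the first case clause is an equality test.
verify({signed, Payload, Tag}, Secret)
  when is_binary(Payload), is_binary(Secret) ->
    case crypto:mac(hmac, sha256, Secret, Payload) of
        Tag -> {ok, binary_to_term(Payload)};
        _   -> {error, bad_signature}
    end;
verify(_, _) ->
    {error, bad_format}.
```

An eavesdropper sees only the payload and tag, neither of which reveals the cookie. Note that this sketch does not address replay of a captured message, nor does it hide the message contents; both would need the sequence numbers or full link encryption discussed above.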

Conclusions

This paper describes the rationale, design approach, and details of the SSErl prototype of a more secure Erlang execution environment. The prototype will be used to evaluate whether an appropriate level of abstraction has been chosen, and whether the interfaces provided are appropriate for the development of safe imported code systems. It is anticipated that once the design approach is validated, it will then be incorporated in a new version of the Erlang RunTime System.

Acknowledgements

The SSErl prototype and this paper were written during my special studies program in 1997, whilst visiting SERC in Melbourne and NTNU in Trondheim, Norway. I'd like to thank my colleagues at these institutions, and at the Ericsson Computer Science Laboratory in Stockholm for their discussions and support.

References

AVWW96: J. Armstrong, R. Virding, C. Wikstrom, M. Williams, "Concurrent Programming in Erlang", 2nd edn, Prentice Hall, 1996. http://www.ericsson.se/erlang/sure/main/news/book.shtml.
Arms96: J. Armstrong, "Erlang - A Survey of the Language and its Industrial Applications", in INAP'96 - The 9th Exhibitions and Symposium on Industrial Applications of Prolog, Hino, Tokyo, Japan, Oct 1996. http://www.ericsson.se/cslab/erlang/publications/inap96.ps.
BrSa97: L. Brown, D. Sahlin, "Extending Erlang for Safe Mobile Code Execution", Australian Defence Force Academy, Canberra, Australia, Technical Report, No CS03/97, Nov 1997. http://lpb.canb.auug.org.au/adfa/papers/tr9703.ps.gz.
Bro97e: L. Brown, "Custom Safety Policies in SSErl", Australian Defence Force Academy, Canberra, Australia, Technical Note, Jun 1997. http://lpb.canb.auug.org.au/adfa/papers/ssp97/sserl97e.html.
DFW96: D. Dean, E.W. Felten, D.S. Wallach, "Java Security: From HotJava to Netscape and Beyond", in Proceedings IEEE Symposium on Security and Privacy, IEEE, May 1996. http://www.cs.princeton.edu/sip/pub/secure96.html.
Erla96: Erlang Systems, "Erlang Distribution", Ericsson Software Technology AB, Erlang Systems, 1996. http://www.ericsson.se/erlang/.
HY96: T.J. Hudson, E.A. Young, "SSLeay and SSLapps FAQ", Uni. Queensland, 1996. http://www.psy.uq.edu.au:8080/~ftp/Crypto/.
Nae97a: G. Naeser, "Your First Introduction to SafeErlang", CS, Uppsala University, Jan 1997. http://www.csd.uu.se/~gaffe/general/safe/nae97a.ps.gz.
RFC1321: R. Rivest, "The MD5 Message-Digest Algorithm", IETF, RFC 1321, Apr 1992.
RFC1750: D. Eastlake, S. Crocker, J. Schiller, "Randomness Recommendations for Security", IETF, RFC 1750, Dec 1994.
RFC1938: N. Haller, C. Metz, "A One-Time Password System", IETF, RFC 1938, May 1996.
RFC2104: H. Krawczyk, M. Bellare, R. Canetti, "HMAC: Keyed-Hashing for Message Authentication", IETF, RFC 2104, Feb 1997.
Wiks94: C. Wikstrom, "Distributed Programming in Erlang", in PASCO'94 - First International Symposium on Parallel Symbolic Computation, Sep 1994. http://www.ericsson.se/cslab/erlang/publications/dist-erlang.ps.

Appendix A - SSErl Internal State

The SSErl prototype keeps internal state for both nodes and processes.

SSErl Nodes

The state managed by the SSErl server process is passed as a parameter St to the server loop for each node sserl_node:serve(St), and is a record which includes:

name
the name of the node as an atom, extended from its parent's name
self
a capability for itself (created in this node)
parent
a capability for its parent (created by the parent)
registered name table [{Name,Capa}*]
maps names to a process capability, thus permitting different nodes to have the same name referencing different processes, allowing custom variants of standard services to be supported
module alias table [{Name,Alias}*]
remaps the name used in the executing code to the name used locally for the loaded module. Used to support execution of safe variants of libraries, and to ensure remotely loaded modules have a unique name
node table [CNode*]
provides a list of all nodes which are children of this node
process table [CPid*]
provides a list of all processes belonging to the node
monitors table [Pid*]
provides a list of all processes (local and remote) monitoring this node
prototypical process rights
used to restrict rights (on db,extern,open_port) for processes in the node
flags
general flags (eg verbose)
capability module
the name of the module (sserl_hcapa or sserl_pcapa) used to implement the private capability details
capability state
an opaque term used by the capability module
capability table
an opaque term used by the capability module

The node state may be viewed using sserl:info(Node), where Node is a capability for the node (current node by default).

SSErl Processes

Each SSErl process maintains a record of information which includes:

self
a capability for itself, which determines which (potentially unsafe) operations the process is permitted to perform
node
a capability for its parent node, used to restrict rights for newly created subnodes, and some other operations
rem_mod
table of remote module names and associated mid capabilities
init
the initial call for the process (used by safety:ps())
p_rights
the list of process rights (copied from the node state for efficiency)
flags
the list of flags (copied from the node state for efficiency)
modules
the list of module name aliases, used to redirect external function calls (copied from the node state for efficiency)

This record is stored in the process dictionary with name sserl_pinfo, and is protected by modified put and erase functions from changes other than via the sserl glue routines. It may be viewed using sserl:pinfo().

Appendix B - SSErl Capability Internals

The SSErl prototype includes dual capability implementations - using a crypto hash check value, and a (sparse random) password.

As described previously, in SSErl capabilities are tuples with the following form:

      {Type,NodeId,Value,Rights,Private}

Hash Capabilities

Hash capabilities use a cryptographic hash function to create a check value for the capability data, which is then appended to the capability. Currently, this hash is computed as:

      Private = HMAC_MD5(term_to_binary({Type,NodeId,Value,Rights}),Key)

using the HMAC_MD5 hash function [RFC2104], implemented by the external sserl_hash driver (described below), with Key selected randomly on node creation by the driver.

In the node state, capability module is sserl_hcapa. The capability state is a tuple {port, key, count} where the port links to the sserl_hash driver; the key is that computed by the driver and returned for debugging use only; and count is a running tally of how many capabilities the node has created. The capability table is unused.
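The create and verify steps for a hash capability can be sketched as below. This is an illustration under stated assumptions rather than the sserl_hcapa code: the module hcapa_sketch and its function names are hypothetical, the hash is computed in-process with crypto:mac/4 from the current OTP crypto application instead of via the sserl_hash port, the key comes from crypto:strong_rand_bytes/1, and the check value is left at the full 128 bits.

```erlang
-module(hcapa_sketch).
-export([new_key/0, make/5, check/2]).

%% new_key/0 - choose a random per-node key (assumption: 16 random bytes,
%% rather than the time/pid derived key computed by the driver).
new_key() ->
    crypto:strong_rand_bytes(16).

%% make/5 - build {Type,NodeId,Value,Rights,Private}, where Private is
%% the HMAC_MD5 check value over the first four fields.
make(Type, NodeId, Value, Rights, Key) ->
    Private = check_value(Type, NodeId, Value, Rights, Key),
    {Type, NodeId, Value, Rights, Private}.

%% check/2 - a capability is accepted only if its check value can be
%% recomputed from its visible fields using this node's key.
check({Type, NodeId, Value, Rights, Private}, Key) ->
    Private =:= check_value(Type, NodeId, Value, Rights, Key);
check(_, _) ->
    false.

%% check_value/5 - HMAC_MD5(term_to_binary({Type,NodeId,Value,Rights}),Key)
check_value(Type, NodeId, Value, Rights, Key) ->
    crypto:mac(hmac, md5, Key, term_to_binary({Type, NodeId, Value, Rights})).
```

Because verification only needs the key, no per-capability table is required, which matches the unused capability table noted above. Altering any visible field, such as the rights, changes the recomputed check value, so the altered tuple is rejected.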

Password Capabilities

Password (sparse) capabilities include a random password value (selected sparsely from a large address space). When the capability is supplied as a token to its node along with a request for some operation, the node searches for the capability in its table of valid capabilities in order to verify it. The random value is currently computed using:

      Private = HMAC_MD5(term_to_binary({"password capability",Count}),Key)

using the HMAC_MD5 hash function [RFC2104], implemented by the external sserl_hash driver (described below), with Key selected randomly on node creation by the driver.

In the node state, capability module is sserl_pcapa. The capability state is a tuple {port, key, count} where the port links to the sserl_hash driver; the key is that computed by the driver and returned for debugging use only; and count is a running tally of how many capabilities the node has created. The capability table is the node's table of valid capabilities, organised as a simple LIFO linear list of [Capa*].
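For comparison, a password capability can be sketched as follows. Again this is only an illustration: the module pcapa_sketch and its function signatures are hypothetical, the password is computed in-process with crypto:mac/4 from the current OTP crypto application rather than via the sserl_hash port, and the full 128-bit value is kept.

```erlang
-module(pcapa_sketch).
-export([make/6, check/2]).

%% make/6 - create a capability whose Private field is a password derived
%% from the running Count, and push it on the front of the node's table
%% (LIFO order). Returns {Capa, NewCount, NewTable}.
make(Type, NodeId, Value, Rights, {Key, Count}, Table) ->
    Private = crypto:mac(hmac, md5, Key,
                         term_to_binary({"password capability", Count})),
    Capa = {Type, NodeId, Value, Rights, Private},
    {Capa, Count + 1, [Capa | Table]}.

%% check/2 - a capability is accepted only if this exact tuple is present
%% in the table of valid capabilities. The scan is linear in table size.
check(Capa, Table) ->
    lists:member(Capa, Table).
```

Here the password says nothing about the other fields; it is the table lookup that ties them together, so a tuple with altered rights fails simply because it was never entered in the table. The linear scan in check/2 is the cost noted in the list of limitations.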

sserl_hash

This implements the HMAC_MD5 hash function [RFC2104] as an external driver, linked to Erlang via a port. sserl_hash is a C program which when started via

     Port = open_port({spawn,sserl_hash},[binary,{packet,2}]),

computes a random key as Key=HMAC_MD5([time,pid,ppid],FixedKey).
This has insufficient entropy for safety, but fixing it is neither easy nor portable (cf [RFC1750]). It then returns the key as a binary to Erlang for debug usage.
The driver then loops:
1. reads a binary term Bin from Erlang
2. computes Hash=HMAC_MD5(Bin,Key) (see [RFC2104], [RFC1321])
3. folds Hash to 64 bits, as per the One-Time Password (OTP) standard, for space efficiency ([RFC1938])
4. sends the resulting 64 bits back to Erlang as a binary

The driver is loosely derived from the sample driver demo_server.c in the Erlang book [AVWW96] pp127-130, and uses the packet protocol for communication with Erlang with a 2 byte length field.
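Steps 2 and 3 of the driver loop can be sketched in Erlang as below. This is not the C driver itself: the module fold_sketch is hypothetical, and the fold shown (exclusive-or of the two 64-bit halves of the digest) is my reading of the MD5 folding described in [RFC1938]. The hash uses crypto:mac/4 from the current OTP crypto application.

```erlang
-module(fold_sketch).
-export([fold64/1, hash64/2]).

%% fold64/1 - reduce a 128-bit digest to 64 bits by XORing its two halves.
fold64(<<Hi:64, Lo:64>>) ->
    <<(Hi bxor Lo):64>>.

%% hash64/2 - compute HMAC_MD5(Bin,Key), then fold the result to 64 bits.
hash64(Bin, Key) when is_binary(Bin), is_binary(Key) ->
    fold64(crypto:mac(hmac, md5, Key, Bin)).
```

The result is always an 8 byte binary, which is what would be placed in the Private field of a capability.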

Appendix C - Apply Logic

The k_apply glue routine is the key routine responsible for appropriately aliasing module names, and supporting remote module execution in context. Briefly, its logic when called with a plain module name may be summarised as follows:

check if module is given in alias list, if so use the local name given
otherwise check the execution context and
use the local name if in local context
handle a remote module reference, obtaining a mid and loading if needed

It, and its associated routines look as follows:


%% handle named modules whose names may need to be aliased
k_apply(C,M,F,A) when atom(C), atom(M), atom(F), list(A) ->
    % lookup module name (use alias if exists, otherwise check context)
    Mod = case alias_module(M) of
        {yes,Ali}  -> Ali;                              % aliased name, use it
        {no,_}     -> check_context(C,M)                % see if local/remote
    end,
    verbose2("~w apply(~w,~w,~w,~240.4p)~n",[self(),C,Mod,F,A]),
    apply(Mod,F,A);

%% alias_module/1 looks up the real module callname
%% => {yes,CallName} | {no,Mod}
%%    scans the module alias list (local copy in pinfo)
%%    to see if the supplied module name should be aliased
alias_module(Mod) ->
    ModList = get_dict(modules),
    case keysearch(Mod,1,ModList) of
        {value,{_,RealName}} -> {yes,RealName};
        false           ->      {no,Mod}
    end.


%% check_context/2 checks & handle module in local or remote context
%%    called by k_apply(C,Mod,F,A)
%% => Callname | EXIT
check_context(P,M) ->
    Remotes = get_dict(rem_mod),                        % get remotes list
    case keysearch(P,1,Remotes) of                      % see if caller remote
        {value,{_,PMid}} ->     rem_mod(PMid,M,Remotes);% remote module
        false           ->      M                       % local module
    end.

%% rem_mod/3 handles a remote module reference, and ensures it is loaded
%% => Callname | EXIT
%%
%% NOTE - the list of remote modules & mids SHOULD be kept by the node manager
%%        and just cached here. Keeping it local is a HACK to keep it simple!!!
rem_mod(PMid,M,Remotes) ->
    Node = capa_node(PMid),
    Full = list_to_atom(atom_to_list(M)++"@"++atom_to_list(Node)), % fullname
    case keysearch(Full,1,Remotes) of                   % check if known
        {value,{_,Mid}} ->      Full;                   % already loaded
        false           ->      % obtain mid for module, save it, ensure loaded
                                Mid = node_request(child_mid,PMid,M),
                                put_dict(rem_mod,[{Full,Mid}|Remotes]),
                                remcode:ensure_loaded(Mid),
                                Full
    end.

The module remcode provides the routines to actually load a remotely sourced module. If k_apply is called with an explicit mid capability, then a remote module reference is used, and a check is made to see whether it is necessary to add the name and mid to the rem_mod table and ensure it is loaded.

The latest version of this paper may be found at: http://lpb.canb.auug.org.au/adfa/papers/tr9704.html.

This paper was last revised: 22 Oct 1997, and it currently documents SSErl v1.4 running under Erlang v4.5.3.