Erlang is a declarative language for programming concurrent and distributed systems which was developed at the Ericsson Computer Science Laboratories [AVWW96], [Arms96], [Wiks94]. It is a dynamically typed, single assignment language which uses pattern matching for variable binding and function selection, has explicit mechanisms to create concurrent and distributed processes, and advanced facilities for error detection and recovery.
Two prototypes of a safer Erlang environment have been trialed.
Following trials with these prototypes, and extensive discussions
between myself, Maurice Castro, Gustaf Naeser, Dan Sahlin, and other
colleagues at Ericsson CS labs and SERC, a consensus is emerging
on the features required in a safer Erlang.
In the following sections I present my understanding of this consensus.
Later in this paper I present a method of imposing a security policy
on services by the use of a monitor function applied to server requests
by a modified safe_gen_server module, as a better solution than that
previously trialed with SSErl.
All of these may be regarded as instances of "mobile code" which executes in a constrained environment at a location distinct from its origin. This raises obvious safety issues: the executing system wishes to constrain the information which may be accessed, and the resources used, by such code. Some of these issues, and existing proposed solutions, are canvassed in [Bro96b]. To date most of the focus on providing safe execution environments has involved adapting traditional procedural programming languages. In this work, we are evaluating the ease of modifying a declarative, functional language, Erlang, instead.
Using Erlang as the core language for a safe execution environment provides a number of advantages. Firstly "pure" functional language use, where functions manipulate dynamically typed data and return a value, is safe (apart from problems of excessive resource usage). It is only when these functions are permitted to have side-effects that they may become dangerous. Side effects are possible in Erlang when a function:
- uses the open_port BIF.
- uses the db BIFs.
A "custom context" for processes is provided by having distinct:
Restrictions on "side-effects" are enforced by specifying whether or not each of the following are permissible for processes executing in the node:
- open_port: block direct access to external resources such as files, network, hardware devices etc.
- external process access: block access (e.g. send, exit, link, process_info etc.) to processes running in nodes on other Erlang systems (i.e. accesses which are provided via the net_kernel).
- database BIFs usage: block direct access to permanent data accessed via the local database manager.
When disabled, access to such resources would have to be mediated by server processes running on the local system, but in a more privileged node, which would be trusted to enforce an appropriate access policy for safety. Typically these servers would be advertised in the registered names table of the "restricted" node. Standard servers can be used, if they are started with a custom safety policy check function, as discussed later.
Resource limits could be imposed by a node on all processes executing within it, or within any of its descendant nodes. Limits could be imposed on CPU usage, memory usage, or the maximum number of reductions, or perhaps on combinations of these. Further work is needed to define which resources should be limited.
The general approach to creating a controlled execution environment to
support the type of scenarios described above
is as follows. First a number of servers are started in a node with
suitable privileges, to provide controlled access to resources. Then a
node would be created with "side-effects" disabled. Its registered
names table would be loaded referencing these servers, its
"loaded modules" table with appropriate local library and safe alias
names; and with appropriate resource limits set. Processes would then be
spawned in this node to execute the desired "mobile code" modules,
in a now appropriately constrained environment.
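The setup sequence described above can be modelled with plain data. The following is only an illustrative sketch: the module name node_ctx_sketch, the option names and the map layout are all assumptions, and since newnode is a proposed rather than an existing BIF, the node context is represented here as a map instead of a real node.

```erlang
-module(node_ctx_sketch).
-export([new_context/3, permitted/2, whereis_name/2]).

%% Hypothetical model of a restricted node's context (not a real BIF):
%% the side-effects it allows, its registered names table and its limits.
new_context(Allowed, Names, Limits) ->
    #{allowed => Allowed,     % e.g. [] or [open_port, external, db]
      names   => Names,       % registered name -> mediating server
      limits  => Limits}.     % e.g. #{reductions => 100000}

%% true if processes in this context may use the given class of side-effect.
permitted(Op, #{allowed := Allowed}) ->
    lists:member(Op, Allowed).

%% Look up a mediating server advertised in the context's names table.
whereis_name(Name, #{names := Names}) ->
    maps:get(Name, Names, undefined).
```

A context created with an empty Allowed list corresponds to a node with all "side-effects" disabled, so that code running in it can reach external resources only through the servers listed in its names table.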
Capabilities
Currently in Erlang, if a process obtains or constructs a process
identifier, it has a large degree of control over that process.
It can send and receive messages or signals, view the process
information, or change the process flags. This is unsafe.
A finer granularity of control on the use of these process (and
port and node) identifiers is needed.
Making these identifiers capabilities is one means of providing this finer granularity. A capability is a globally unique (in time and space), unforgeable name for (a specific instance of) some resource. Associated with it is a list of rights for operations permitted on the resource by holders of the capability.
Capabilities may be implemented in several ways. Encrypted capabilities and password capabilities have been demonstrated in the safer Erlang prototypes, and are being considered for use.
SafeErlang used encrypted capabilities for pids and nodes. It encrypted all of the information identifying the resource and the rights to use it. This incurs an overhead whenever the information must be accessed or verified, which depends on the type of encryption algorithm used. Once created, there is no way to revoke these capabilities apart from destroying the associated resource. It is also subject to an attacker attempting to guess the key used to encrypt the capability in order to forge other capabilities.
An alternative variant of encrypted capabilities is to use a cryptographic hash function (e.g. [RFC2104]) to create a check value for the capability information, which is then kept in the clear. This system is still subject to an attacker attempting to guess the key used in creating the check value, and verifying the guess against intercepted capability data. The likelihood of success will depend on the type of hash function used.
SSErl used password (sparse) capabilities [APW86]. In a password capability system, the capability is a data item which specifies the node creating it, and a random value (selected sparsely from a large address space). It is possible to try to forge a capability, but success is statistically highly improbable, and attempts should be detectable by abnormally high numbers of requests presenting invalid capabilities. The capability has no meaning on its own, but is only of use as a token when supplied with a request for some operation to the node which maintains a table of valid capabilities. This imposes a space overhead on the node state table. However, password capabilities may be easily revoked by removal from the table of currently valid capabilities. A disadvantage is the size this table may grow to, particularly for long running server processes, and if capabilities are used for references or user defined values, where it is impossible to know when they have no further use. Also, large tables may take some time to search, though careful selection of the table mechanism can reduce this to a minimum.
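The password capability scheme just described can be illustrated with a small table-based sketch. This is not the SSErl implementation: the module name pw_capa_sketch, the use of a map as the table, the 16-byte password size and the five-element tuple layout are assumptions made for illustration.

```erlang
-module(pw_capa_sketch).
-export([new/0, create/5, check/3, revoke/2]).

%% The node's table of currently valid capabilities, keyed by password.
new() -> #{}.

%% Create a capability whose last element is a sparse random password,
%% and record it in the table.
create(Type, NodeId, Value, Rights, Table) ->
    Password = crypto:strong_rand_bytes(16),      % 128-bit random value
    Capa = {Type, NodeId, Value, Rights, Password},
    {Capa, Table#{Password => Capa}}.

%% Valid only if the table holds exactly this capability and Op is permitted.
check({_, _, _, Rights, Password} = Capa, Op, Table) ->
    case maps:find(Password, Table) of
        {ok, Capa} -> lists:member(Op, Rights);   % Capa is already bound
        _          -> false
    end.

%% Revocation is simply removal from the table.
revoke({_, _, _, _, Password}, Table) ->
    maps:remove(Password, Table).
```

Because the table stores the whole capability, a holder who edits the rights list of a valid capability is rejected even though the password still matches an entry.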
There is thus a tradeoff between these alternatives - trading some level of security with encrypted capabilities for space with password capabilities. The best alternative is likely to depend on the target environment.
Experience with the prototypes has also shown that it is important for efficient execution that all information needed to evaluate guard tests or pattern matches be present locally in the capability. This information must include the type (for the various pid/port/ref etc guards) and the value (to test if two capabilities refer to the same object).
Consequently I propose that capabilities have the following form:
{Type,NodeId,Value,Rights,Private}
where
- Type: describes the type of resource the capability references, such as a node, process, port, reference, or user defined capability type (atom).
- NodeId: the identifier of the node which created the capability, and which can be asked to verify the validity of the capability or perform operations on the specified resource, which is managed by that node.
- Value: identifies the resource which is referenced by the capability (node, process identifier, port identifier, reference number, or any Erlang term, respectively for the various types).
- Rights: a list (bitmap?) of operations permitted on the resource referenced by the capability. The actual rights depend on the type of the capability. For a process capability, it could include rights like [exit,link,info,register,restrict,send,trace,view].
- Private: an opaque (probably binary) term, which can be used by the originating node to verify the validity of the capability when it is submitted with a request to perform some operation on the associated resource. It could be either a cryptographic check value or a random password value, which only the originating node need know.
(nb. I've expressed the capability as a tuple, as used in the prototypes, but in a final implementation it will be an indivisible, fundamental, tagged data item, as pids/ports/refs are now).
Using capabilities of this form would allow individual nodes (more likely systems) to choose whichever implementation (and associated tradeoffs) is most appropriate, whilst preserving interoperability amongst nodes.
My own preference, after evaluating the prototypes, is marginally for the use of encrypted capabilities, implemented using a cryptographic hash function (e.g. [RFC2104]) applied to the full, externally visible, values of the other components. Care should be taken in the choice of an appropriate hash function, since known flaws have been identified in some "obvious" modes of use [BCK96]. The overhead of validating the check value can be minimised if any "local" encrypted capabilities are checked once on creation (or import) and then simply flagged as such (say by amending the hidden data type tag to indicate it has been verified). Subsequent use of the capability then incurs no overhead. This assumes the Erlang run-time system is trusted (which is implicit in all this work). Further, for "remote" capabilities, any delays due to crypto overheads are likely to be swamped by network latencies. Each node would keep the key used to create and validate its capabilities secret, and this key could be randomly selected when the node is initialised. Any previously created capabilities must refer to no longer extant (instances of) resources (from a previous incarnation of the node), so there is no requirement to continue to be able to validate them.
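As a rough sketch of the keyed check value approach, the following computes an HMAC over the visible components and stores it as the Private element. The module name hmac_capa_sketch, the choice of SHA-256, the 32-byte key and the use of term_to_binary to serialise the components are assumptions, not part of the proposal. It relies on crypto:mac/4, which is available in recent Erlang/OTP releases.

```erlang
-module(hmac_capa_sketch).
-export([new_key/0, seal/5, verify/2]).

%% Secret key, chosen at random when the node is initialised.
new_key() -> crypto:strong_rand_bytes(32).

%% Check value over the externally visible components (HMAC, as in RFC 2104).
%% SHA-256 is an assumption; the text leaves the hash function open.
mac(Key, Type, NodeId, Value, Rights) ->
    crypto:mac(hmac, sha256, Key,
               term_to_binary({Type, NodeId, Value, Rights})).

%% Build a capability whose Private element is the check value.
seal(Key, Type, NodeId, Value, Rights) ->
    {Type, NodeId, Value, Rights, mac(Key, Type, NodeId, Value, Rights)}.

%% Recompute the check value and compare. A production version would use a
%% constant-time comparison rather than =:=.
verify(Key, {Type, NodeId, Value, Rights, Private}) ->
    Private =:= mac(Key, Type, NodeId, Value, Rights).
```

Since the key is regenerated at each node start, capabilities sealed by a previous incarnation of the node fail verification, which matches the observation that they need not remain valid.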
As well as capabilities for nodes, processes and ports, I propose using capabilities for references, and for extensible "user capabilities". The latter provide an unforgeable data item which can be given to "untrusted code", and subsequently verified when supplied with later requests. This could be used to implement safer versions of the file, window and network servers for example; where the "user capabilities" are used to refer to individual files etc, passed as parameters in a series of requests, but in a manner which means the server can verify that the data value associated with it has not been tampered with.
A capability may be restricted (assuming it permits it). This results in the creation of a new capability, referencing the same resource, but with a more restricted set of rights. Using this, a server process can for example, register its name against a restricted capability for itself, permitting other processes to send to it, but nothing else.
Capabilities would be used instead of the existing node names, pids, ports, or refs by BIFs which create or use these resources. New BIFs would be required for operations such as:
- check(Capa,Op): checks if the supplied capability is valid and permits the requested operation (this is not a guard test as it must consult the originating node to validate the capability).
- make_ref(Type,Val): creates a user capability with the specified type (atom) and value (term), as a general case of a reference.
- newnode(Parent,Name,Opts): creates a new node as a child of the Parent, with the specified context options.
- restrict(Capa,Rights): creates a new version of the supplied capability, referring to the same resource, but with a more restricted set of rights.
- same(Capa1,Capa2): guard testing whether the supplied capabilities refer to the same resource.
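To make the intended behaviour of restrict and same concrete, here is a minimal sketch over the tuple form of capability used in the prototypes. The module name capa_ops_sketch, the extra Seal argument (a caller-supplied fun that computes the Private element) and the return values are assumptions; the proposed BIFs would take only the arguments listed above.

```erlang
-module(capa_ops_sketch).
-export([restrict/3, same/2]).

%% New capability for the same resource, keeping only those existing rights
%% that also appear in Wanted. Allowed only if the capability carries the
%% restrict right. Seal is a hypothetical fun computing the Private element.
restrict({Type, NodeId, Value, Rights, _Private}, Wanted, Seal) ->
    case lists:member(restrict, Rights) of
        true ->
            NewRights = [R || R <- Rights, lists:member(R, Wanted)],
            {ok, {Type, NodeId, Value, NewRights,
                  Seal(Type, NodeId, Value, NewRights)}};
        false ->
            {error, not_permitted}
    end.

%% Two capabilities refer to the same resource if type, node and value agree,
%% whatever their rights. Only locally available information is needed.
same({Type, NodeId, Value, _, _}, {Type, NodeId, Value, _, _}) -> true;
same(_, _) -> false.
```

Note that restrict can never add rights: the result is always a subset of the rights already held, and a capability restricted to [send] cannot itself be restricted further.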
Names may be used instead of pids to send messages, and to identify nodes. This functionality should continue once these are replaced by capabilities (extended, in the case of node names, by treating them as a "registered name", probably a global name).
I believe that three categories of names are needed:
- atom
- {name,node} (where node is a capability). Such names could be accessed if the remote node chooses to run an appropriate server to accept requests to resolve local names.
- {name1,name2,name3,...}, possibly with an alternate syntactic form like name1.name2.name3... (but they should not simply be a special form of atom, so it is possible to reason about the separate components).
The addition of a formal, hierarchical global name space is the major new feature. It is needed, I believe, to support scaling of Erlang applications to large numbers of nodes. It is also needed to support mobile agents and long lived servers, which can be identified by name, with their associated capabilities being updated as they migrate or are restarted.
All of these names would be valid as the target of a send, and names
associated with a node could be used instead of the current use of DNS
node names in a number of operations. The Erlang run-time system would
resolve the name as appropriate. If a cached copy is used for efficiency,
there should be an automatic retrieval of the latest value if the cached
value is found to refer to an invalid capability.
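The caching behaviour described here might look roughly as follows. This is a sketch only: the module name name_cache_sketch, the map used as the cache, and the Lookup and Valid funs (standing in for the name server and for capability validation by the run-time system) are all hypothetical.

```erlang
-module(name_cache_sketch).
-export([resolve/4]).

%% Resolve Name to a capability, preferring the cached value.
%% Lookup(Name) -> {ok, Capa} | error   asks the authoritative name server.
%% Valid(Capa)  -> boolean()            checks the capability is still valid.
%% Both funs are hypothetical stand-ins for run-time system services.
resolve(Name, Cache, Lookup, Valid) ->
    case maps:find(Name, Cache) of
        {ok, Capa} ->
            case Valid(Capa) of
                true  -> {ok, Capa, Cache};
                false -> refresh(Name, Cache, Lookup)
            end;
        error ->
            refresh(Name, Cache, Lookup)
    end.

%% Fetch the latest value and update (or drop) the cache entry.
refresh(Name, Cache, Lookup) ->
    case Lookup(Name) of
        {ok, Capa} -> {ok, Capa, Cache#{Name => Capa}};
        error      -> {error, maps:remove(Name, Cache)}
    end.
```

A stale entry, such as the capability of an agent that has since migrated, is therefore replaced transparently the next time the name is used.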
Module Naming Conventions
Currently Erlang supports a two level naming mechanism of functions
within modules. There is no means of specifying that a group of
modules are related, as is often the case in applications, or
in libraries. Some concept, perhaps a project
(or library) is needed for a group of related modules, which is
then reflected in their names. This is recognised in the current
informal convention of using a common prefix on some library and
application names. This approach is not easy to work with explicitly,
and makes it hard to reason about the components of the names.
There are proposals to have a formal three level hierarchy in the next release of Erlang. I would support this, and encourage the naming convention be such that the separate components can be easily matched in function clause heads or guards (ie perhaps a tuple should be used, rather than some convention of splitting an atom).
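The benefit of a tuple over a split atom can be seen in a small sketch. The {Project, Library, Module} layout, the module name mod_name_sketch and the policy that only the stdlib project is trusted are hypothetical, chosen purely to show the components being matched in clause heads.

```erlang
-module(mod_name_sketch).
-export([project/1, trusted/1]).

%% Hypothetical three-level module name: {Project, Library, Module}.
project({Project, _Library, _Module}) -> Project.

%% The components can be matched directly in function clause heads,
%% with no need to split an atom such as 'stdlib_lists' at run time.
trusted({stdlib, _Library, _Module})   -> true;
trusted({_Project, _Library, _Module}) -> false.
```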
I also believe that the current standard library modules should
be restructured and grouped as one (or a small number of) groups
of modules (projects).
Mobile Agents
There is some interest in supporting the concept of "mobile agents"
in Erlang. This was a key motivation behind the use of the SafeErlang
prototype in [JNS97]. In determining the requirements
for a safer Erlang, consideration needs to be given to features needed
to support such agents.
The desire is to be able to take a group of processes and, at the request of some process, wake up the group on another node, with all internal references intact. It is assumed that such a migration would only occur when all of the processes are requesting a reduction (function call), this being the point of scheduling in Erlang. Given the functional nature of the language, all that need be migrated would be the function name and arguments, the module code, and the process dictionary for each of the processes in the group. Since all the Erlang terms have to be marshalled and cast into external form anyway, it should not be hard to identify (capability) pids referring to the migrating processes, and flag them for replacement by appropriate new values on the new node. All other data items, including capabilities referring to other servers, or resources such as ports, would remain unchanged, and continue to be valid, referring to the appropriate remote resource (as is done now with pids). Subsequently during execution, any external function calls should be interpreted in the context of the original node for the agent, and the requested module be loaded from that node (which presumably has granted permission for that to occur) using the new MID mechanism.
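The replacement of capabilities during marshalling can be sketched as a recursive walk over a term. The module name migrate_sketch and the old-to-new map are assumptions; the sketch treats capabilities as tuples, as in the prototypes, and for brevity does not descend into maps or binaries.

```erlang
-module(migrate_sketch).
-export([remap/2]).

%% Walk a term being marshalled and replace every capability that refers to
%% a migrating process with its new value on the destination node.
%% Map :: #{OldCapa => NewCapa} (hypothetical; built by the migration code).
remap(Term, Map) when is_map_key(Term, Map) ->
    maps:get(Term, Map);                  % a capability of a migrating process
remap(Term, Map) when is_tuple(Term) ->
    list_to_tuple([remap(E, Map) || E <- tuple_to_list(Term)]);
remap([H | T], Map) ->
    [remap(H, Map) | remap(T, Map)];      % also handles improper lists
remap(Term, _Map) ->
    Term.                                 % everything else is left unchanged
```

Capabilities that do not appear in the map, such as those for remote servers or ports, pass through untouched and so continue to refer to their original resources.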
The additional features needed to support mobile agents are thus:
An argument was proposed that the capabilities for migrating processes
should remain unchanged, and that references to them should be redirected.
I do not believe this is appropriate, as it is inconsistent with the
definition of a capability, not to mention its practical implementation.
It was motivated by a desire that other processes' references (copies of
capabilities) should continue to remain valid. I believe they should
be invalidated, since a migrated agent is now a new instance of itself.
Instead, any process which wishes to have a "permanent" reference to
an agent should use a "global" name instead. As part of the migration
process, this global name can be updated with the new capability for
the migrated agent.
Custom Safety Policies
To assist in the creation and usage of safer SSErl nodes, it is
desirable that there be a clear and easy mechanism for specifying an
appropriate safety policy. This would involve the correct specification
of the options in the creation of a newnode, but would also require
some means of controlling the use of some standard servers, especially
those accessing the file system, network and window manager.
Currently Erlang supports a generic client-server mechanism using the
gen_server module. With a small change to this module
to optionally impose a check function against all messages received
for the server, it is possible to run a number of instances of a
standard server, each enforcing a different security policy.
This results in a clean, and I believe easier to verify implementation,
compared to implementing a suite of custom servers.
It also results in the checking overhead only occurring when accessing
such servers.
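The idea of a per-instance check function can be sketched independently of the gen_server machinery. This is not the safe_gen_server module itself: the module name policy_check_sketch, the ok | {reject, Reason} convention, the request shapes and the example policy are assumptions for illustration.

```erlang
-module(policy_check_sketch).
-export([handle/4, read_only/1]).

%% Apply the policy check to a request before the standard handler sees it.
%% Check(Request) -> ok | {reject, Reason}; Handler is the unmodified server
%% callback. Both would be supplied when the server instance is started.
handle(Request, State, Check, Handler) ->
    case Check(Request) of
        ok               -> Handler(Request, State);
        {reject, Reason} -> {reply, {error, Reason}, State}
    end.

%% Example policy for a file server: only reads below /tmp/ are allowed
%% (path normalisation is omitted in this sketch).
read_only({read, "/tmp/" ++ _}) -> ok;
read_only(_Other)               -> {reject, eacces}.
```

Two instances of the same server started with different Check funs then enforce different policies while sharing all of the handler code.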
The use of this safety check function to assist in the easy specification and implementation of custom safety policies is further elaborated on in my subsequent paper [Bro97e].
The safe_gen_server module is presented as a better solution than that suggested previously.