SICP series - Mutation and Environment Model
So far in the previous posts we have dealt only with functional programming
. According to this model, all procedures are pure
functions, which means they can be conceptually treated as mathematical functions. A function in mathematics is simply a mapping from its input values to some output values; so a function is dependent only on the input parameters it is passed. Also, the function should return the same output any time it is given the same input values. In the programming world, this entails that procedures do not alter any existing data structures, or in other words they do not mutate their state
. In fact, any time an operation is performed on a data structure, a new copy is returned from it. For example, the operation of adding an element to the end of the list (usually implemented by the function append) does not alter the given list; instead, the following steps are taken:
-
a copy of the list is made
-
the element is added to the end of the list
-
the copy of the list is returned
This means that the list passed as input to append, after being processed by this function, is unchanged, i.e. does not possess any new element. It is then possible to operate on this list assuming that nowhere throughout the course of the program it was modified. This is the reason why we say pure functions are exempt from side effects
.
The previous point, i.e. that functions limit themselves to return values rather than altering any existing data structure, shows the composability quality of functional programs. With this perspective in mind, it is possible to view programs as made of the interactions among its functions, where each function depends on the value resulting from the execution of another function; at the same time, that function can provide the input value to the next one. This model of computation is reminiscent of commands piping in Unix systems. Again, notice how the data structures in the system are left unaltered, while the functions do their computations.
Mutation
As good as this model may be, it however poses some limitations, so today we are going to introduce the topic of mutation
. This concept is implemented in Scheme by mutators, which take a variable name and a value and bind that value to that name; if that variable possessed a value, this is discarded and the new value is assigned to the variable. This is done with the following instruction.
(set! x "foo")
Notice how this differs from define
, in that this construct creates a new binding between a name and a value. It is however not possible to redefine this binding with another define
; what happens is that the previous binding is discarded and a new binding is created. So define
can only create new bindings, not alter them. Even though apparently the outcome may be the same, it is important to appreciate the subtle difference.
The important effect mutation brings to the table is that it introduces time
or context
in our model of computation. Now, two expressions with identical syntax may be associated to different semantics as they rely on the context surrounding their evaluation. Their meaning is dependent on the time they are evaluated, which means the order in which expressions are evaluated matter. For example
(define x 10)
(+ x 5) ; 15
; ...
(set! x 90)
; ...
(+ x 5) ; 95
note how in the previous excerpt the same instruction (+ x 5)
can result in two different values depending at the point in time where it is evaluated in the program.
Environment Model
This shift in paradigm from functional to state-based
requires a new evaluation model to evaluate expressions by also taking into account the effects introduced by mutation. This leads also to a shift in the viewpoint on computation. According to the new vision, the following elements can be viewed in the following manner:
-
variable
-
OLD - name for a value
-
NEW - place where it is possible to store something
-
-
procedure
-
OLD - functional description of a computational process
-
NEW - object with inherited context
-
-
expression
-
NEW - only has meaning with respect to an environment
-
These considerations bring us the the definition of what we will call the Environment Model
(EM). What follows now is a description of the structure of this model.
-
A frame consists of a table of bindings; each binding in the table is a pairing of a name and a value.
-
An environment consists of a nested sequence of frames, where a frame can be shared by multiple environments; the connection between two frames is called the
enclosing environment pointer
. All the evaluation occurs within the scope of an environment, which provides the context for how symbols and names are interpreted. -
The current environment changes when the interpreter applies a procedure.
-
The top environment is called the
global environment
(GE); it is the starting point in a program and holds the bindings for all its basic expressions. -
The rule to evaluate a combination is the followign one:
-
evaluate the sub-expressions in the current environment
-
apply the value of the first sub-expression to the values of others
-
What follow now are rules of evaluation of EM. To have a visual depiction of the following rules I advise to consult the slides of lesson 15 which can be found here [1]
-
name-rule
: a name X evaluated in environment E gives the value of X in the first frame of E where X is bound -
define-rule
: a define construct evaluated in environment E creates or replaces a binding in the first frame of E -
set!-rule
: a set! of variable X evaluated in environment E changes the binding of X in the first frame of E where X is bound -
lambda-rule
: a lambda construct evaluated in environment E creates a procedure whose environment pointer is E -
application
: this consists of four steps-
create a new frame A
-
make A into an environment E; A’s enclosing environment pointer goes to the same frame as the environment pointer of the procedure being applied P
-
within A, bind the parameters of P to the argument values
-
evaluate the body of P with E as the current environment
-
Again, note the difference between set!
and define
. set!
always finds an existing binding for the variable, walking up the chain of environment pointers until it finds a binding and changes it. define
always creates a binding in the same frame, even though there was a previous binding there.
Also note how the lambda-rule implements the concept commonly known in functional programming as function closure
. In fact, a function has access to the state of the lexical scope (frame) in which it has been defined. In other words, all the variables defined in the same scope of the function (when defined) are visible inside the function itself and can be included in its body. This means that, regardless of where in the program a function is called, the function keeps the same state of those variables even though the current lexical scope may be totally different and those variables may not exist anymore. This is because, as explained in the rule above, the procedure object has its environment pointer pointing to the frame in the environment where it was evaluated at definition time.
Finally, I conclude this post by pointing out that all rules in the EM are evaluation rule, while the last rule considers application of procedures. We will see in the next post that the interplay between expression evaluation and function application is going to be the core of the whole evaluation process of any program.