SICP series - Data Abstraction

14 November 2019

Tags: fundamentals SICP functional programming

The previous post Procedural Abstraction dealt with the idea of creating procedures to capture patterns of computation and abstracting them away by suppressing the implementation details and giving them a name so taht it is possible to treat them as primitive elements of the language. The topic of this post tackles a complementary issue to that of procedure abstraction, specifically how to group together pieces of information or data into abstract structures. Even though the problems both sides of abstraction are complementary, it will turn out that they are actually interconnected and thus the distinction among them is blurred. In fact, the procedures to manipulate the elements of a data structure usually have an inherent structure that mimics the data structure. This is just a teaser, but I will come back on this more in detail later.

Compound Data

It is sometimes handy if we can use data elements which differ from the basic primitives the language provides you. Therefore, there needs to be a way for us to define and use our own data structures, in particular compound data structures. So there needs to be a way to glue simple elements together to form a compound structure which can be treated as a unit piece of primitive data element as well as a way to pull the compound structure’s constituent pieces apart. This compound structure should be treated as a single unit, which means that it should be possible to pass it as argument and return as a value. This leads to the concept of closure.

closure: "the result obtained by creating a compound data structure can itself be treated as a primitive object and thus be input to the creation of another compound object"

Pairs

Scheme provides a basic construct to glue things together, cons (constructor). cons is a procedure which takes two expressions as input, evaluates them and glues those values together to form a pair. In order to get those elements back from the pair Scheme provides the procedures car and cdr which return the first and second element respectively. Furthermore, the contract can be extended to include operations such as predicates to check if a given data structure satisfies the contract or not. So the contract may look like this.

(cons <x-exp> <y-exp>) ==> <P>  ; constructor
(car <P>) ==> <x-val>           ; accessor
(cdr <P>) ==> <y-val>           ; accessor
(pred? <x>)                     ; predicate
  ==> #t if pred? evaluates to a pair, else #f

To portrait an implementation of a contract, the most basic example we could think is that of a 2-D point.

(define (make-point x y)
  (cons x y))
(define (point-x p)
  (car p))
(define (point-y p)
  (cdr p))
(define (pair? p)
  (; check details))

Lists

Scheme provides another primitive way of gluing together elements, the list. A list is a data structure that can hold an arbitrary number of ordered items. It is in practice implemented under the hood as a sequence of pairs with the following properties:

car-part of a pair in sequence - holds an item
cdr-part of a pair in sequence - holds a pointer to the rest of the list
empty-list nil - signals no more pairs or end of the list

Figure 1. Lists in Scheme

Note that lists are closed under the operations of cons, car and cdr, and can thus be treated as primitives.

Several procedures can be written to define operations on lists; one thing they share is their recursive structure which mimics the recursive structure of the data object. For example, consider the append operation:

(define (append l1 l2)
  (if (null? l1)
    l2
    (adjoin (first l1)
      (append (rest l1) l2))))

where adjoin, first and rest are defined in terms of cons, car and cdr for lists:

(define adjoin cons)
(define first car)
(define rest cdr)

Take-home

As seen with pairs and lists, it is possible to build systems which rely on layers of abstractions barriers, creating a sort of hierarchy of data abstractions. For example, lists are composed of pairs and groups could be defined in terms of lists. regardless of the level, the user to use only those operations exposed for a given level of abstraction without messing with the procedures of the implementation of that structure. This leads to the benefit that the user is shielded by the implementation details of the underlying representation of a given structure. Which means that alternative implementations could be provided without the user ever noticing it, provided that the contract is respected.

Abstract Data Types

In Scheme it is possible to create tagged data structures by combining symbols (tags) with data structures. Symbols are data objects which, when evaluated, refer to the name itself rather than the value associated to it. Mixing tags with data structures leads to two major benefits:

data-directed programming - functions that select the right operations based on the arguments
defensive programming - functions that fail gracefully if given bad arguments

Procedures following the data-directed programming approach first check the tag of a data object to choose the right sub-procedure designed specifically to to handle data of that given type. This leads to more modular systems, as procedures can be decomposed in specialized sub-components rather than having to deal with any possible type by themselves. With defensive programming procedures check the type of the arguments and produce an error message in case of an unexpected data object has been passed.

With these concepts in mind it is then possible to define abstract data types (ADTs). An example could be an ADT for sums. Some of the procedures which could be defined for implementing it could be for example sake.

(define (make-sum addend augend)
  (list '+ addend augend'))
(define (sum-exp? e)
  (and (pair? e) (eq? (car e) '+')))

The first procedure defines a tagged-data structure make-sum prefixed by the symbol +. The second procedure exploits defensive programming as it first checks the argument to see if it is a pair. In case it is not, then it just returns false. If it is a pair, it proceeds to check whether the first element of the list is the symbol +. We could implement a simple evaluator function like this

(define (eval e)
  (cond
    ((number? e) e)
    ((sum-exp? e)
      (+ (eval (sum-addend e))
         (eval (sum-augend e))))
    (else
      (error "unknow expression " e))))

In this excerpt we have exploited data-directed programming to check the expressions types looking for the primitive types first. If the expression is not a number, we resort to defensive programming to check if the expression is a sum.

« Is computer science the right name? SICP series - Procedural Abstraction »