Don’t Be So Direct

After years of tossing up the idea, I finally got around to putting together a short course on advanced programming concepts for scientists. The research group I belong to has grown considerably in the last few years, and there seems to be more interest in programming than ever before, so the time seemed right.

The course is called ‘Programming Paradigms for Scientific Developers’. It consists of just three lectures: the first covers Procedural and Structured Programming; the second, Object Oriented Programming; and the third, Generic Programming. The goal is not to teach people how to write an if branch in Fortran, or define a function in Python, but to address the concepts that transcend language-level details.

Scientists are not like most developers: they have a nasty habit of wanting to know why. They don’t usually accept advice unless it is accompanied by solid reasoning. Preaching inheritance and polymorphism to scientists perfectly content with common blocks and implicit typing will get you about as far as a ballet dancer in a mangrove swamp. To get through to them, you have to be able to rationalize the concepts you are advocating, and that means a lot of soul searching.

One of the concepts I use throughout my course is that of indirection. I don’t simply mean the term as it is often used in C programming to describe the role of a pointer, I mean it in a much broader sense. Indirection relates to how directly something is represented in a piece of software. For example, a function provides a means of indirectly executing a series of instructions. The alternative to a function is directly inserting the function body into the code wherever it is required.
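
To make the contrast concrete, here is a minimal sketch (a made-up example, not one from the course; the normalize routine and the data are placeholders) showing the direct and indirect routes side by side:

program indirection_demo
  implicit none
  real :: a(3) = (/ 3.0, 1.0, 2.0 /)
  real :: b(4) = (/ 4.0, 4.0, 1.0, 1.0 /)

  ! The direct alternative: paste the normalization statements here for a,
  ! then paste them again below for b, duplicating the logic at each use.

  ! The indirect version: reach the same statements through a call.
  call normalize(a)
  call normalize(b)

  print *, a
  print *, b

contains

  subroutine normalize(x)
    ! Scale an array so that its elements sum to one.
    real, intent(inout) :: x(:)
    x = x / sum(x)
  end subroutine normalize

end program indirection_demo

Every call site depends only on the name normalize; the statements themselves exist in exactly one place.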

Because indirection occurs at every level of software development, I have been able to use it as a base from which to describe new techniques. I begin by demonstrating the role of indirection in the development techniques the scientists are already familiar with, and then show how the more advanced programming paradigms facilitate other forms of indirection not possible in procedural programming.

An important form of indirection in Procedural Programming is that introduced by a procedure (i.e., a subroutine or function). A procedure allows the programmer to avoid duplicating code: the body is written once and reached indirectly via a call. Reducing duplication is an important theme in software development, and techniques that seek to introduce indirection are inevitably also designed to reduce duplication; the one facilitates the other.

A procedure also places an interface between the calling code and the procedure body. Interfaces are the means by which indirection is realized. Introducing an interface makes code more flexible: as long as the interface is fixed, the code behind it is free to vary independently of the calling code. (In object oriented terms, the code behind an interface is known as the implementation.)
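
As a small illustration (again hypothetical, and not lifted from the course), consider a module function whose body is swapped from an explicit loop to an array intrinsic; because the interface is unchanged, no calling code needs to be touched:

module norms
  implicit none
contains

  ! The interface (name, argument, result) is all that callers depend on.
  function vector_norm(x) result(n)
    real, intent(in) :: x(:)
    real             :: n

    ! An earlier implementation might have accumulated the sum in an
    ! explicit do loop; replacing it with the intrinsics below changes
    ! the code behind the interface without affecting a single caller.
    n = sqrt(sum(x*x))
  end function vector_norm

end module norms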

In summary, different forms of indirection are designed to eliminate different types of duplication by introducing different sorts of interfaces. Once you realize this, the techniques introduced in each programming paradigm make a lot more sense.

Consider Fortran’s user defined types (UDTs), which are equivalent to C’s structs. UDTs are to variables what procedures are to expressions: a form of indirection that relieves the developer from duplicating data declarations. And take inheritance in object oriented programming (OOP); it is a means of indirectly including the variables and methods of one class in another class.
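
The UDT half of that claim is easy to sketch (the Particle type below is invented for illustration, not taken from the course): the related declarations are written once, and every procedure that needs them pulls them in with a single declaration.

module particle_mod
  implicit none

  ! Declared once, instead of repeating three declarations in every
  ! procedure that deals with particles.
  type Particle
    real :: position(3)
    real :: velocity(3)
    real :: mass
  end type Particle

end module particle_mod

subroutine drift(p, dt)
  use particle_mod
  implicit none
  type (Particle), intent(inout) :: p
  real, intent(in)               :: dt

  ! One declaration of p stands in for all of its components.
  p%position = p%position + dt * p%velocity
end subroutine drift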

Polymorphism is one of the more difficult concepts to grasp for those new to OOP. It makes more sense, however, when you recognize it as yet another incarnation of indirection, one in which interfaces are introduced to free a given piece of code from making direct reference to a particular concrete data type.

Reuse of code in procedural programs — which do not make use of polymorphism — usually involves conditional branches, with one branch for each data type used. Consider the following Fortran 90 example:


integer, parameter                :: QN_OPTIMIZER = 1
integer, parameter                :: CG_OPTIMIZER = 2

type (QuasiNewtonOptimizer)       :: qn
type (ConjugateGradientOptimizer) :: cg
integer                           :: opt

read(5,*) opt

! Initialize the chosen optimizer
select case (opt)
  case (QN_OPTIMIZER)
    call new(qn)
  case (CG_OPTIMIZER)
    call new(cg)
end select

! Take a step
select case (opt)
  case (QN_OPTIMIZER)
    call takeStep(qn)
  case (CG_OPTIMIZER)
    call takeStep(cg)
end select

! Clean up
select case (opt)
  case (QN_OPTIMIZER)
    call delete(qn)
  case (CG_OPTIMIZER)
    call delete(cg)
end select

This code could form the basis of an optimization engine that includes several different types of optimizers. What you will hopefully notice is that there is a subtle form of duplication occurring in the branching structure. Exactly the same form of select block is being used for initializing the optimizers, taking a step, and deleting them again. Wouldn’t it be good if you could employ a single branching block, and somehow ‘remember’ which branch was followed, so that the rest of your code would not be polluted by duplicated blocks?

Polymorphism is in effect exactly that: a means of storing branching decisions. Consider the following rewrite of the above example:


integer, parameter                :: QN_OPTIMIZER = 1
integer, parameter                :: CG_OPTIMIZER = 2

type (QuasiNewtonOptimizer)       :: qn
type (ConjugateGradientOptimizer) :: cg
type (Optimizer)                  :: optimizer
integer                           :: opt

read(5,*) opt

! Choose implementation
select case (opt)
  case (QN_OPTIMIZER)
    call new(qn)
    optimizer = qn
  case (CG_OPTIMIZER)
    call new(cg)
    optimizer = cg
end select

! Generic type remembers the choice
call takeStep(optimizer)
call delete(optimizer)

In this case, a new ‘generic’ type called Optimizer has been added. It effectively stores the branch chosen when an optimizer is initialized. The generic type is a polymorphic pointer: when the takeStep and delete methods of the Optimizer object are invoked, it ‘looks up’ the stored concrete optimizer type, and invokes the appropriate subroutine. Naturally, the look-up is a form of indirection, freeing the calling code from direct knowledge of the executed code, including the concrete type of the optimizer.

Fortran standards prior to 2003 do not directly support this kind of polymorphism, but it is easy enough to fudge: the code above is real working Fortran 90. To find out more about what lies behind the Optimizer generic type, you can download my course slides, which go into plenty of detail.
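
For the curious, here is a rough sketch of one way such a fudge can be put together in Fortran 90. It is not the code from the slides: the qn_module and cg_module modules, their exported routines, and the copy-on-assignment policy are all assumptions made for the sake of a self-contained example. The idea is to give Optimizer a pointer component for each concrete type, overload assignment so that optimizer = qn records the choice, and branch on the stored pointer inside generic takeStep and delete routines.

module optimizer_mod
  ! Hypothetical modules assumed to provide the concrete types and their
  ! type-specific routines; the renames keep the dispatch below explicit.
  use qn_module, only : QuasiNewtonOptimizer, takeStepQN => takeStep, deleteQN => delete
  use cg_module, only : ConjugateGradientOptimizer, takeStepCG => takeStep, deleteCG => delete
  implicit none

  ! The generic type: at most one pointer is associated at a time,
  ! and that association is the stored branching decision.
  type Optimizer
    type (QuasiNewtonOptimizer), pointer       :: qn
    type (ConjugateGradientOptimizer), pointer :: cg
  end type Optimizer

  ! Overloaded assignment lets "optimizer = qn" record the choice.
  interface assignment(=)
    module procedure assignQN, assignCG
  end interface

  interface takeStep
    module procedure takeStepOptimizer
  end interface

  interface delete
    module procedure deleteOptimizer
  end interface

contains

  subroutine assignQN(self, qn)
    type (Optimizer), intent(out)           :: self
    type (QuasiNewtonOptimizer), intent(in) :: qn
    allocate(self%qn)   ! keep a private copy of the concrete optimizer
    self%qn = qn
    nullify(self%cg)
  end subroutine assignQN

  subroutine assignCG(self, cg)
    type (Optimizer), intent(out)                 :: self
    type (ConjugateGradientOptimizer), intent(in) :: cg
    allocate(self%cg)
    self%cg = cg
    nullify(self%qn)
  end subroutine assignCG

  subroutine takeStepOptimizer(self)
    type (Optimizer), intent(inout) :: self
    ! The look-up: the branch lives here, once, instead of at every call site.
    if (associated(self%qn)) then
      call takeStepQN(self%qn)
    else if (associated(self%cg)) then
      call takeStepCG(self%cg)
    end if
  end subroutine takeStepOptimizer

  subroutine deleteOptimizer(self)
    type (Optimizer), intent(inout) :: self
    if (associated(self%qn)) then
      call deleteQN(self%qn)
      deallocate(self%qn)
    else if (associated(self%cg)) then
      call deleteCG(self%cg)
      deallocate(self%cg)
    end if
  end subroutine deleteOptimizer

end module optimizer_mod

Something along those lines would let the calling code above work unchanged; the slides go through the version the course actually uses.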

Someday, I would like to incorporate automated creation of generic types into the Forpedo preprocessor, but I’ll leave that for another time …
