Category Archives: Scientific Programming

Joining the MacResearch Team

I’ve been a regular visitor to the www.macresearch.org web site pretty much since its inception, so I was very pleased to be able to accept an invitation to join the Executive Committee.

MacResearch is a web site that targets the Mac-using Scientist. It provides a wide range of services, including news feeds, software reviews, how-to articles, forums, a script repository, and — most recently — access to a 4-node Xserve computational cluster.

But one of the more important roles that MacResearch has taken on is that of mediator: polls are held regularly, and the results are summarized in a report that is communicated directly to Apple and released to the community at large. If you want to know more, either visit the site or check out the new webcast, in which Ivan Judson and Joel Dudley explain it all much more eloquently than I ever could.

What will my role be at MacResearch? To be honest, it’s a bit too early to say. I will certainly contribute content, most probably related to scientific software development in Cocoa, Python, C++, and Fortran. I also have some ideas for applications of Xgrid, but I can’t say much more than that until I find out what the existing MacResearch team have in mind. Whatever happens, I’m sure it will be an interesting ride…

Filed under C++, Cocoa, Fortran, Mac, Personal, Python, Scientific Programming

Psychic Mac

I just had one of those spooky moments. You know the ones: you are debugging, and find something totally unexpected…something that shouldn’t even be possible.

What I was doing was running a subprocess from within a Cocoa app I am developing. I was using NSTask to start a script, and retrieving the output with an NSPipe. Nothing complicated about that.

Because I am still in the early stages of development, the script I was using was just a stand-in, to make sure everything was working to plan. It simply wrote a property list to standard output, with a few static values in it. I was planning to rewrite this script later such that it invoked the UNIX command ps, to get information about tasks running on the computer.
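The same arrangement can be tried outside Cocoa. The sketch below is a rough Python analogue, not the app's actual code: subprocess.run stands in for NSTask and NSPipe, and the stand-in script and its keys (pid, command) are hypothetical, chosen only for illustration.

```python
import plistlib
import subprocess
import sys

# Hypothetical stand-in script: writes a property list with a few
# static values to standard output.
STAND_IN_SCRIPT = """
import plistlib, sys
sys.stdout.buffer.write(plistlib.dumps({"pid": 711, "command": "-bash"}))
"""

def run_stand_in():
    # Launch the script as a subprocess and capture its standard output,
    # roughly the job NSTask and NSPipe do in the Cocoa app.
    result = subprocess.run(
        [sys.executable, "-c", STAND_IN_SCRIPT],
        capture_output=True,
        check=True,
    )
    # Parse the captured bytes back into a dictionary.
    return plistlib.loads(result.stdout)

tasks = run_stand_in()
```

Because the parent only ever sees whatever the script file on disk prints, a stale copy of that file produces exactly this kind of surprise.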

To my utter surprise, when I ran the application and examined the script output in the debugger, I saw this:


  PID  TT  STAT      TIME COMMAND
  711  p1  S+     0:00.20 -bash

Hmm, that looks nothing like my property list, and, what’s more, it looks awfully like the output format of the ps command! A quick search of my project revealed no reference whatsoever to ps. What was going on? Was my Mac psychic? Did it know what I was going to do next?

As with many of these ‘How could it be so?’ debugging moments, the answer turned out to be relatively simple, but it had me spooked for a while. When I initially created the script, I had inserted a ps command into it, but had quickly forgotten that I had done this, because I then changed the script to print the property list. The script file resides in a directory that is copied into the Resources folder of the application bundle when the app is built. The problem seems to have been that Xcode did not recognize that the script needed to be recopied into the application when it was modified. A clean build fixed the problem.

The moral of the story: when confronted with something you don’t understand, your first instinct is to ascribe it to some supernatural power, when the more likely explanation is just that Xcode is buggy.

Filed under Cocoa, Mac, Scientific Programming

Eclipse and PyDev are Worth the Entrance Fee

I’ve spent the last few days working on Forpedo, a preprocessor for Fortran written in Python. Forpedo currently enables you to use basic generic programming techniques in Fortran programs; I’m now adding options for run-time polymorphism, as described here.

I don’t want to talk about Forpedo today, though, but about the Eclipse IDE, and the PyDev plugin in particular. A friend of mine pointed it out to me, and I thought I would give it a try for Python development. In the past, I have tried developing Fortran with the Photran Eclipse plugin, but found it a bit difficult to configure for my Fortran compiler and build system. In the end, I gave up.

My experience with PyDev was very different though. I came across a few bugs, but in general it works as advertised, and was very easy to configure. The editor and code completion are powerful, and it has an outline view that allows you to easily navigate to any class, method, or function in a file. Best of all, it has a graphical debugger, which sure beats debugging from the command line, or dropping print statements into the code to locate problems. Another time saver is the set of links that PyDev adds to the call stack dump, which let you jump straight to the problem spot when a script crashes.

All of these features are to be found in Xcode too, but Xcode only really works well for a handful of languages, and Python isn’t one of them. Working with Objective-C in Xcode has spoiled me, and I always dread having to go back to vi or TextMate — which is a great text editor, by the way — when I have to develop in Fortran or Python. Eclipse seems to be offering me a way out, at least for Python.

The Eclipse IDE itself is actually a very well written cross-platform application. ‘Cross-Platform’ usually equates to ‘Dodgy as Sh.t’, but you would never guess Eclipse was written in Java, and it runs as well on Linux and Windows as on the Mac. The secret seems to be the API used to develop the user interface: the Standard Widget Toolkit (SWT). Unlike Java’s other UI libraries, AWT and Swing, SWT utilizes native widgets on each platform. On the Mac, it wraps around Carbon calls, so the windows and buttons you see on the screen are the real McCoy. It makes a world of difference to the look-and-feel of an app.

So it looks like Eclipse might become a permanent addition to my Dock. If you regularly develop in Python, why not take PyDev for a run — it’s the only Python IDE worth the time of day, in my view.

Filed under Fortran, Mac, Python, Scientific Programming

Which Programming Language Will Scientists Be Using in 20 Years?

Scientific Software development is still dominated by Fortran, and what’s more, most Fortran programs have still not made the transition to the Fortran 90/95 standard, let alone Fortran 2003. Predicting the demise of Fortran is a sport almost as well-subscribed as predicting the downfall of Apple, but both Fortran and Apple have proven much more resilient than most could have anticipated. There wouldn’t be too many willing to bet against Apple at the moment, but Fortran may be about to come under pressure from new languages. Will the Fortran nay-sayers finally draw blood?

I do a lot of programming in Fortran; it forms the basis of the ADF package, which I contribute to, and most of my research programs are written in Fortran 90. I must admit to having a strong aversion to Fortran 77, especially when it is used to write new code. It was no doubt once a very powerful language, but by today’s standards is quite primitive. Fortran 90 addressed many of the deficiencies, making it a very useable language, approximately on a par with C. Some features of Fortran 90, such as arrays, are actually far superior to anything in C.

Recently, I have been following the transition to Fortran 2003 with some interest, and even looking beyond it to future Fortran standards. Fortran 2003 exists as a standard, but not yet as a practical language, because there is no compiler that fully supports it. It will be several years before it is a serious option, but you can already read about it.

Unfortunately, from my point of view, Fortran 2003 suffers the same fate as most aging languages: it is quite verbose and ungainly. It’s not necessarily the fault of the Standard authors; it’s just what happens when you try to make something do what it was never intended for, like installing a turbo in a Model T Ford. For example, Fortran 2003 finally brings Object-Oriented Programming (OOP) to Fortran, but it is not a pretty fit. Languages like Python, that were designed for OOP from day one, are much more compact and natural.

The real question is whether it even matters that these new features are clumsy. Fortran is in a monopoly position, and history tells us that that can be a formidable barrier for contenders to overcome, even when the monopolist is second rate. The sheer volume of Fortran code already in existence, and the reluctance of Scientists to learn a new programming language, may be enough to guarantee Fortran’s future.

It may not be all smooth sailing though, because for the first time in my career, there are efforts afoot to design new languages specifically with scientific applications in mind. One such language is Fortress, which is being developed at Sun Microsystems. Sun has achieved a lot with Java, and you might think that if anyone could come up with a viable new language, they could. Their team is headed up by Guy Steele, who helped design Java and contributed to the High Performance Fortran specification.

It will probably be another 5 years before Fortress is any more than an academic exercise, if it makes it that far. Even if it doesn’t supplant Fortran completely, it would be good to have some competition amongst scientific languages — at the moment it’s a one horse race.

Filed under Fortran, High Performance Computing, Scientific Programming

Why developing scientific apps for the Mac is a dead end street

MacResearch, which I think is a great site for the Mac-using Scientist, has pondered the question of why there hasn’t been a boom in Scientific Software exclusively for the Mac. The assumption is that because the Mac has free developer tools, a great application programming interface (API) in Cocoa, and easy-to-use scripting languages like AppleScript, it should only be a matter of time before a number of ‘killer’ scientific apps appear.

As a Mac-using Scientist and Developer, I am quite well credentialed to offer some insight into this. In my day job, I help develop the commercial cross-platform Quantum Chemistry software ADF, and in my spare time I develop and sell a financial-modeling app, written in Cocoa, called Trade Strategist. I also do most of my Scientific Research on the Mac, with the free Xcode tools referred to in the MacResearch article.

There are two ways in which a Scientific App could become publicly available: The first is via a company, and the second is via a Scientist. But the Mac market is relatively small, and the market for scientific Mac users even smaller. There really is not enough money in it to support a company developing exclusively for the Mac. In fact, there really isn’t enough money in Scientific Software to afford the luxury of any platform exclusivity whatsoever. Most companies I know support multiple platforms to make ends meet.

So if we assume that a company must develop cross-platform Scientific Software, it is pretty clear why there are not many company-developed apps written in Cocoa: it is a proprietary technology supported only on Mac OS X. As wonderful as it is, it is not an option for most companies, which end up choosing cross-platform solutions like Qt and Tcl/Tk.

With companies out of the running, why aren’t there more software packages coming from the Scientists themselves? Ultimately, it again comes back to money, or at least time (which equals money). Even with a great API like Cocoa, developing good software is quite a time-consuming process. To get to first base is actually quite easy: you can throw together a prototype, or a version of your software with limited functionality, in a few days or weeks. The problem is taking the next step of turning that unpolished rock into something desirable to others.

Writing documentation, thorough testing, generalizing the functionality, improving the user interface; it all takes a lot of time. And it’s time the average Scientist doesn’t have to spend on such frivolous activities. Publish or perish, man! That great user interface won’t result in any new publications, so just take the ugly but functional version of the app that you already have, use it to generate your own scientific results, and leave the swarming masses to their own devices.

I am actually talking from experience here, because I once had stars in my eyes, and a piece of software that I thought could revolutionize the Scientific World: Dynasity. Dynasity is a Cocoa visualization app. I like to think of it as a visual multi-track recording studio. You create different visual tracks, and follow them in time. Dynasity also leverages QuickTime, making it possible to capture movies and stills with the click of a button.

So where is Dynasity? Where can you download it? Well, you can’t really. Dynasity is developed and used internally in my research group, but it has never been released to the public. The time required to generalize the software for a mass audience, and the problems of supporting it thereafter, make it undesirable. And any potential sales would hardly compensate for the investment of time demanded. (If you are really intrigued about Dynasity, you can download it here. Note that this is not an official release, and comes with no promise of support or any warranty whatsoever.)

There are Scientific Apps available on the Mac, but they are often a bit underdone, for the reasons given above, and could hardly be classified as ‘killer apps’. There are exceptions though: As MacResearch mentioned in their piece, Mek&Tosj — fellow inhabitants of Amsterdam — write some great Cocoa software for Molecular Biologists. DataTank is also a Mac-exclusive that I like a lot; it is a visualization app like Dynasity, but much further developed.

One thing you could be forgiven for thinking from all of this is that the Mac is not much good to the Scientific Developer. Nothing could be further from the truth. I do most of my development work, from legacy Fortran to advanced C++ and Python scripting, with Apple’s free tools on a Mac. Xcode is great, and if you add to that Xgrid, gcc, and scripting languages like Python, Perl, Ruby, and Tcl, you have a winning combination. I also regularly use TextMate, an editor gaining favor with developers, and exclusive to the Mac.

In conclusion, the Mac is an insanely-great Scientific Platform, but don’t hold your breath for Mac-exclusives in the Scientific Realm. There simply isn’t the market for the big boys, and most scientists have better things to do with their time than writing help pages and answering questions of the form “I haven’t looked at the help pages yet, but can you tell me how to … ?”.

Filed under Cocoa, Mac, Scientific Programming

Don’t Be So Direct

After years of tossing the idea around, I finally got around to putting together a short course on advanced programming concepts for scientists. The research group I belong to has grown considerably in the last few years, and there seems to be more interest in programming than ever before, so the time seemed right.

The course is called ‘Programming Paradigms for Scientific Developers’. It consists of just three lectures: the first covers Procedural and Structured Programming; the second, Object Oriented Programming; and the third, Generic Programming. The goal is not to teach people how to write an if branch in Fortran, or define a function in Python; it is designed to address the concepts that transcend language-level details.

Scientists are not like most developers — they have a nasty habit of wanting to know why. They don’t usually accept advice unless it is accompanied by solid reasoning. Preaching inheritance and polymorphism to Scientists perfectly content with common blocks and implicit typing will get you about as far as a Ballet Dancer in a Mangrove Swamp. To get through to them, you have to be able to rationalize the concepts you are advocating, and that means a lot of soul searching.

One of the concepts I use throughout my course is that of indirection. I don’t simply mean the term as it is often used in C programming to describe the role of a pointer; I mean it in a much broader sense. Indirection relates to how directly something is represented in a piece of software. For example, a function provides a means of indirectly executing a series of instructions. The alternative to a function is directly inserting the function body into the code wherever it is required.
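A minimal sketch of that contrast in Python, with hypothetical names chosen only for illustration: the same instructions are first written out directly at each point of use, and then reached indirectly through a function.

```python
# Direct: the instructions are repeated wherever they are needed.
a = [3.0, 4.0]
b = [6.0, 8.0]
norm_a_direct = sum(x * x for x in a) ** 0.5
norm_b_direct = sum(x * x for x in b) ** 0.5

# Indirect: the instructions live in one place and are reached via a call.
def norm(vector):
    """Return the Euclidean length of a sequence of numbers."""
    return sum(x * x for x in vector) ** 0.5

norm_a = norm(a)
norm_b = norm(b)
```

Both routes give the same numbers; the difference is that a change to the calculation now has to be made in only one place.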

Because indirection occurs at every level of software development, I have been able to use it as a base from which to describe new techniques. I begin by demonstrating the role of indirection in the development techniques the Scientists are already familiar with, and then show how the more advanced programming paradigms facilitate other forms of indirection not possible in procedural programming.

An important form of indirection in Procedural Programming is that introduced by a procedure (i.e., a subroutine or function). A procedure allows the programmer to avoid duplication of code by using it indirectly via a call. Reducing duplication is an important theme in software development, and techniques that seek to introduce indirection are inevitably also designed to reduce duplication; the one facilitates the other.

A procedure also places an interface between the calling code and the procedure body. Interfaces are the means by which indirection is realized. By introducing an interface, code becomes more flexible, because as long as the interface is fixed, code behind the interface is free to vary independently of the calling code. (In object oriented terms, code behind an interface is known as implementation.)

In summary, different forms of indirection are designed to eliminate different types of duplication by introducing different sorts of interfaces. Once you realize this, the techniques introduced in each programming paradigm make a lot more sense.

Consider Fortran’s user defined types (UDTs), which are equivalent to C’s structs. UDTs are to variables what procedures are to expressions: a form of indirection that relieves the developer from duplicating data declarations. And take inheritance in object oriented programming (OOP); it is a means of indirectly including the variables and methods of one class in another class.
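Both ideas can be sketched in a few lines of Python. This is an illustration only: the Atom and ChargedAtom classes are hypothetical, and a Python dataclass is used here as a stand-in for a Fortran UDT.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    """Groups related variables so they are declared once, much like a UDT."""
    symbol: str
    x: float
    y: float
    z: float

    def distance_to_origin(self):
        return (self.x ** 2 + self.y ** 2 + self.z ** 2) ** 0.5

@dataclass
class ChargedAtom(Atom):
    """Indirectly includes the variables and method of Atom, adding one field."""
    charge: float = 0.0

ion = ChargedAtom("Na", 3.0, 4.0, 0.0, charge=1.0)
distance = ion.distance_to_origin()
```

ChargedAtom never repeats the declarations of symbol, x, y, and z, nor the distance calculation; it reaches them through Atom.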

Polymorphism is one of the more difficult concepts to grasp for those new to OOP. It makes more sense, however, when you recognize it as yet another incarnation of indirection, one in which interfaces are introduced to free a given piece of code from making direct reference to a particular concrete data type.

Reuse of code in procedural programs — which do not make use of polymorphism — usually involves conditional branches, with one branch for each data type used. Consider the following Fortran 90 example:


integer, parameter                :: QN_OPTIMIZER = 1
integer, parameter                :: CG_OPTIMIZER = 2
	
type (QuasiNewtonOptimizer)       :: qn
type (ConjugateGradientOptimizer) :: cg
integer                           :: opt
	
! Read the optimizer choice (unit 5 is conventionally standard input)
read(5,*) opt
	
select case (opt)
  case (QN_OPTIMIZER)
    call new(qn)
  case (CG_OPTIMIZER)
    call new(cg)
end select
	
select case (opt)
  case (QN_OPTIMIZER)
    call takeStep(qn)
  case (CG_OPTIMIZER)
    call takeStep(cg)
end select
	
select case (opt)
  case (QN_OPTIMIZER)
    call delete(qn)
  case (CG_OPTIMIZER)
    call delete(cg)
end select

This code could form the basis of an optimization engine that includes several different types of optimizers. What you will hopefully notice is that there is a subtle form of duplication occurring in the branching structure. Exactly the same form of select block is being used for initializing the optimizers, taking a step, and deleting them again. Wouldn’t it be good if you could employ a single branching block, and somehow ‘remember’ which branch was followed, so that the rest of your code would not be polluted by duplicated blocks?

Polymorphism is in effect exactly that: a means of storing branching decisions. Consider the following rewrite of the above example:


integer, parameter                :: QN_OPTIMIZER = 1
integer, parameter                :: CG_OPTIMIZER = 2
	
type (QuasiNewtonOptimizer)       :: qn
type (ConjugateGradientOptimizer) :: cg
type (Optimizer)                  :: optimizer
integer                           :: opt
	
! Read the optimizer choice (unit 5 is conventionally standard input)
read(5,*) opt
	
! Choose implementation
select case (opt)
  case (QN_OPTIMIZER)
    call new(qn)
    optimizer = qn
  case (CG_OPTIMIZER)
    call new(cg)
    optimizer = cg
end select
	
! Generic type remembers the choice
call takeStep(optimizer)
call delete(optimizer)

In this case, a new ‘generic’ type called Optimizer has been added. It effectively stores the branch chosen when an optimizer is initialized. The generic type is a polymorphic pointer: when the takeStep and delete methods of the Optimizer object are invoked, it ‘looks up’ the stored concrete optimizer type, and invokes the appropriate subroutine. Naturally, the look up is a form of indirection, freeing the calling code from direct knowledge of the executed code, including the concrete type of the optimizer.
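For comparison, here is a minimal sketch of the same idea in Python, where polymorphism is built into the language. The class and function names are hypothetical, and the optimizers only return a label rather than doing any real work.

```python
QN_OPTIMIZER = 1
CG_OPTIMIZER = 2

class QuasiNewtonOptimizer:
    def take_step(self):
        return "quasi-Newton step"

class ConjugateGradientOptimizer:
    def take_step(self):
        return "conjugate gradient step"

def make_optimizer(opt):
    # The only branch in the program: choose the implementation once.
    if opt == QN_OPTIMIZER:
        return QuasiNewtonOptimizer()
    if opt == CG_OPTIMIZER:
        return ConjugateGradientOptimizer()
    raise ValueError(f"unknown optimizer code: {opt}")

optimizer = make_optimizer(CG_OPTIMIZER)

# The object remembers the choice, so no further branching is needed.
message = optimizer.take_step()
```

The call to take_step looks up the concrete class of the object at run time, which is the stored branching decision described above.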

Standards of Fortran prior to 2003 do not directly support polymorphism of this type, but it is easy enough to fudge — the code above is real working Fortran 90. To find out more about what lies behind the Optimizer generic type, you can download my course slides, which go into plenty of detail.
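One way such a fudge could work, sketched in Python written in a deliberately procedural style, is shown below. This is an assumption made for illustration, not the contents of the course slides: the generic record simply stores a tag next to the concrete data, and the branching is written once inside the generic procedure. All names are hypothetical.

```python
QN, CG = 1, 2

def qn_take_step(state):
    # Concrete routine for the quasi-Newton optimizer (placeholder work).
    state["steps"] += 1
    return "quasi-Newton"

def cg_take_step(state):
    # Concrete routine for the conjugate gradient optimizer (placeholder work).
    state["steps"] += 1
    return "conjugate gradient"

def new_generic(kind, state):
    # The generic record stores the branch decision alongside the data.
    return {"kind": kind, "state": state}

def take_step(generic):
    # The select block lives here once, instead of in every caller.
    if generic["kind"] == QN:
        return qn_take_step(generic["state"])
    if generic["kind"] == CG:
        return cg_take_step(generic["state"])
    raise ValueError("unknown optimizer kind")

optimizer = new_generic(CG, {"steps": 0})
name = take_step(optimizer)
```

The branching has not disappeared; it has moved behind the interface of the generic type, where callers no longer see or duplicate it.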

Someday, I would like to incorporate automated creation of generic types into the Forpedo preprocessor, but I’ll leave that for another time …

Filed under C++, Fortran, Python, Scientific Programming