Wednesday, February 3, 2010

Taking control over legacy code


The problem

Some years ago I faced a situation where a company's main public web application ran on a legacy mainframe (OS/390) web server. It was written in REXX.

The developers had to do the actual coding in a 3270 terminal window.

If a developer wanted to use a regular text editor (TextPad), he first had to download the source file via FTP, edit it, then FTP the new version back up to the mainframe to test it. To compile the uploaded source file he had to open a terminal window, navigate to the file (dataset), and then disable and enable it to force a recompile.

Another major problem with the FTP solution was that developers would overwrite each other's changes when they uploaded their new files.

Since the source was not managed by any version control system, it was basically impossible to figure out who had changed the code, and why.

As you can see, this was not an ideal situation.

The ideal solution

The ideal solution would of course be to rewrite the application from scratch with modern technology, but this was not an option for the company. They felt that they had invested too much in the existing code and that a rewrite would take too long. Not to mention that they would have been unable to build anything new while porting the old code.

Taking control over the legacy code

Since rewriting the application was not an option, we needed to make it as convenient as possible to work with.

This is what I ended up doing:

We downloaded all the code and added it to Subversion. Then we "defined" that the version stored in Subversion was the "master (correct) version" of the code, not the version stored on the mainframe.

Then I wrote a deployment tool in Java that automated the deployment process.

Since we could not prevent other developers (in other teams) from editing the code directly on the mainframe, we needed a mechanism that kept us from silently overwriting their changes. This was a critical feature when selling the idea to my manager.

To detect this, the deployment tool automatically added some metadata to the source file when uploading it to the mainframe. This metadata contained a hash value (a CRC fingerprint) representing the exact state of the source code at upload time. This made it possible to validate the existing mainframe version of a file before overwriting it with a new version.

The metadata was generated inside a comment (/* metadata */), since the altered source file still needed to compile.
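To give an idea of how this worked, here is a minimal sketch in Java. The comment label (MASTER-CRC), the choice of CRC32 and all the helper names are my reconstruction for this post, not the actual production code:

import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.CRC32;

public class SourceStamper {

    private static final Pattern STAMP =
            Pattern.compile("/\\* MASTER-CRC:(\\p{XDigit}+) \\*/");

    /** Computes a fingerprint of the source exactly as it is uploaded. */
    static long fingerprint(String source) {
        CRC32 crc = new CRC32();
        crc.update(source.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Prepends the fingerprint as a REXX comment so the file still compiles. */
    static String stamp(String source) {
        return String.format("/* MASTER-CRC:%08x */%n%s", fingerprint(source), source);
    }

    /**
     * Returns true if the file currently on the mainframe is an unmodified
     * copy of what the tool last uploaded, i.e. it is safe to overwrite.
     */
    static boolean safeToOverwrite(String mainframeCopy) {
        Matcher m = STAMP.matcher(mainframeCopy);
        if (!m.find()) {
            return false; // no stamp: the file was put here by someone else
        }
        String body = mainframeCopy.substring(m.end()).replaceFirst("^\\r?\\n", "");
        return Long.parseLong(m.group(1), 16) == fingerprint(body);
    }
}

If the recomputed fingerprint does not match the one in the stamp, someone has edited the file directly on the mainframe, and the tool can warn instead of silently overwriting it.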

The deployment tool could also compile the source remotely on the mainframe. This was done with a Linux tool called s3270, which lets you script a 3270 terminal session. Since we needed to run the deployment tool on Windows, it ran s3270 through Cygwin.
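The scripting part looked roughly like this. s3270 reads actions on stdin and answers with output lines followed by ok or error, so driving it from Java is mostly a matter of wiring up the process streams. The host name, dataset and keystrokes below are invented placeholders, since the real navigation depends entirely on the 3270 screens involved:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class RemoteCompiler {

    public static void main(String[] args) throws IOException {
        // The original tool launched s3270 through Cygwin on Windows;
        // this sketch simply assumes an s3270 binary is on the PATH.
        Process s3270 = new ProcessBuilder("s3270").start();
        PrintWriter in = new PrintWriter(s3270.getOutputStream(), true);
        BufferedReader out = new BufferedReader(
                new InputStreamReader(s3270.getInputStream()));

        sendAction(in, out, "Connect(mainframe.example.com)"); // hypothetical host
        sendAction(in, out, "Wait(InputField)");
        // Navigate to the dataset and trigger the recompile; the actual
        // keystrokes depend on the screens of the system in question.
        sendAction(in, out, "String(\"SOME.DATASET(MEMBER)\")");
        sendAction(in, out, "Enter");
        sendAction(in, out, "Disconnect");
        in.println("Quit"); // terminates s3270; it does not answer this one
    }

    /** Sends one action and reads its output up to the ok/error line. */
    private static void sendAction(PrintWriter in, BufferedReader out,
                                   String action) throws IOException {
        in.println(action);
        String line;
        while ((line = out.readLine()) != null) {
            if (line.equals("ok")) {
                return;
            }
            if (line.equals("error")) {
                throw new IOException(action + " failed");
            }
            // "data: ..." lines and the status line arrive before ok/error.
        }
        throw new IOException("s3270 exited unexpectedly");
    }
}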

Since the upload and compile process was slow, we wanted to avoid uploading and compiling unchanged files.

To fix this we also included the Subversion URL and revision info in the metadata. This made it possible to determine which files had changed, and to upload and compile only those.
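The check itself is simple once the revision info is part of the metadata stamp. Again, the exact layout of the comment is an assumption for illustration:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangeDetector {

    private static final Pattern REVISION_STAMP =
            Pattern.compile("/\\* MASTER-REV:(\\d+) URL:(\\S+) \\*/");

    /**
     * Returns true if the mainframe copy already carries the revision we
     * are about to deploy, so the upload and recompile can be skipped.
     */
    static boolean isUpToDate(String mainframeCopy, String svnUrl, long localRevision) {
        Matcher m = REVISION_STAMP.matcher(mainframeCopy);
        return m.find()
                && m.group(2).equals(svnUrl)
                && Long.parseLong(m.group(1)) == localRevision;
    }
}

With something like this in place, the tool only uploads and recompiles files whose working-copy revision differs from the one recorded on the mainframe.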

Conclusion

By taking control of the source (adding it to Subversion) and automating the deploy and compile process, we ended up with a much better development environment.

This turned out to be only the first step away from the mainframe. Today the old application still lives on, except that it now runs inside an emulator running in Tomcat on a Windows server.

I hope this blog post inspires someone.