Refactoring: Improving the Design of Existing Code

Posted by jonathan on December 10, 2007

Refactoring: Improving the Design of Existing Code
Martin Fowler with Kent Beck, John Brant, William Opdyke and Don Roberts.
In the Addison-Wesley Object Technology Series.

Refactoring (as I’ll refer to the book from here on in) is a heavy and beautifully produced 418-page hardback book. Martin Fowler is a UK-based independent consultant who has worked on many large systems and is the author of several other books, including UML Distilled.

Refactoring is a self-help book in the tradition of Code Complete by Steve McConnell. It defines its audience clearly as working programmers and provides a set of problems, a practical and easily followed set of remedies and a rationale for applying those techniques.

Code refactoring is the process of restructuring code in a controlled way to improve its structure and clarity, whilst maintaining the meaning of the code being restructured. Many maintenance problems stem from poorly written code that has become overly complex, where objects are overly familiar with each other, and where solutions implemented expeditiously leave the software hard to understand and hard to add features to.

Typically refactorings are applied over a testable or local scope, with existing behavior being preserved. Refactoring as defined in this book is not about fixing bad designs, but instead should be applied at lower levels.

Testing à la Extreme Programming is emphasized as a control for ensuring that program meaning is not changed by refactoring. It is not overemphasized, and this is not a book about testing, but it is often mentioned and stays in the background throughout the book.

The refactorings presented in the book are not intended as a comprehensive solution for all problems, but they do offer a means to regain control of software that has been implemented poorly, or where maintenance has been shown to simply replace old bugs with newer ones.

The book is divided into two main sections, introductory material that introduces and discusses refactorings, and a lengthy taxonomy of refactorings that includes both examples and further discussion. The introductory material consists of a long worked example through simple Java code that implements printing a statement for a video store. Despite the simplicity of the code, Fowler shows in clear detail where improvements can be made, and how those improvements make the code both impressively easy to understand, and easy to maintain and add features.

Several key refactorings are demonstrated in the opening chapter including Extract Method, Move Method and Replace Conditional with Polymorphism. This is a book about programming in the object oriented paradigm, so as you might expect, the first two refactorings refer to extracting and moving object methods either into new methods, or between objects. The third example provides a means to replace special cased behavior in a single object type by deriving a sub type of the object and moving type specific code to the sub types. This is a fundamental technique in object oriented programming, and is discussed here in practical terms.
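
To make the third refactoring concrete, here is a minimal sketch in Python rather than the book's Java. The movie classes, the charge method and the prices are my own illustration, not Fowler's code; the point is only that the type-specific branches move into sub types while the results stay the same.

```python
# Hypothetical example: class names, method names and prices are illustrative only.

class Movie:
    """Before: one class special-cases its behavior on a type code."""

    def __init__(self, title, kind):
        self.title = title
        self.kind = kind  # "regular" or "new_release"

    def charge(self, days):
        if self.kind == "regular":
            return 2.0 + max(0, days - 2) * 1.5
        if self.kind == "new_release":
            return days * 3.0
        raise ValueError("unknown kind: " + self.kind)


class BaseMovie:
    """After: the shared part stays here, pricing moves to the sub types."""

    def __init__(self, title):
        self.title = title

    def charge(self, days):
        raise NotImplementedError


class RegularMovie(BaseMovie):
    def charge(self, days):
        return 2.0 + max(0, days - 2) * 1.5


class NewReleaseMovie(BaseMovie):
    def charge(self, days):
        return days * 3.0


# A small check that the refactoring preserved behavior.
for days in (1, 2, 5):
    assert Movie("x", "regular").charge(days) == RegularMovie("x").charge(days)
    assert Movie("x", "new_release").charge(days) == NewReleaseMovie("x").charge(days)
```

The loop at the end plays the role that tests play in the book: it is the control that says the meaning of the code did not change.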

Now that several actual refactorings have been introduced, Fowler provides a solid and well thought-out discussion of the why’s, when’s and when not’s of refactoring. For example, code can decay as features are added, and programmers special-case, or bodge additional functionality into existing objects. Fowler argues that this bitrot and decay makes software less reliable, leads to bugs and can accelerate as the problem gets worse. Faced with these problems, refactoring should be used to improve local design and clean up code, leading to better software that is easier to maintain, easier to debug, and easier to improve with new features as requirements change.

However, there is a caveat: since software functionality should remain unchanged during refactoring, the process of refactoring consumes resources but provides no easily measurable value. Fowler confronts this issue in a section that discusses how to tell managers that you are performing refactoring work. He denies being subversive, but his conclusion is that refactoring should essentially be folded in with normal work as it improves the overall result.

This is a bit like goofing off on the basis that you’ll think better after 20 minutes of foosball. I’d definitely subscribe to that theory, but many others may not.

Kent Beck guests in Chapter Three for a review of the issues in typical software that suggest a refactoring may be needed. This chapter is entitled Bad Smells in Code, and most of the smells presented will be familiar to any reasonably experienced programmer, and they will be a great learning experience for less experienced programmers. I got the same feeling reading this chapter as I did when I first read Code Complete. Here was someone writing down names and describing problems that I had a vague unease about, but was too inexperienced to really articulate or do something about. Typically the refactorings address the same kind of issues that a code review with a skilled, experienced programmer would address: long parameter lists, overly long methods, objects delving about in each other’s private variables, case statements, related code spread across different objects, and so on. None of these problems is debilitating in itself, but added up, they lead to software that can be prone to error and difficult to maintain.

Most of the remaining substance of the book, 209 pages, is given over to a taxonomy of refactorings. These 72 refactorings are covered in detail with comprehensive simple examples presented in Java. Each refactoring is given a clear name, a number and a line or two of descriptive text. The motivation for the refactoring is then discussed, often including caveats and cautions. The mechanics of implementing the refactoring are then listed, with one or more (and often more) examples of implementing the refactoring. Refactorings range from the very simple to more complex examples such as Convert Procedural Design to Objects.

Due to the difficulties of reproducing large and complex sections of code, Fowler sticks with relatively simple examples. These seem to grate on him more than the reader, and he can come across as somewhat embarrassed when we look at the employee, programmer, manager pay example for the tenth time. I certainly didn’t have a problem with it though.

Conclusion

This is a very well written and fun to read book. I personally feel that much of the material is implied by Code Complete, but Fowler does a fantastic job of expanding and formalizing the idea of applying explicit refactorings. Much like Code Complete gave a motivation for keeping code well commented and laid out, this book presents the case for the care and feeding of software structure. To fight bitrot and technical debt, poorly structured and unclear code should be targeted and refactored to improve structure and clarity. This gives a very real payback in terms of less required maintenance, and ease in adding features later on down the line.

Despite the fact that all the examples are in Java, the ideas are easily transferable to C++ or any procedural object oriented language.

Highly recommended.

Querying MS Visual Source Safe in Perl - Counting DIFFs

Posted by jonathan on August 15, 2007

The following Perl code automates MS Visual Source Safe to get the stats on check-ins. The input should be a list of files from the output of my other script.



#!/usr/bin/perl -w
use strict;
 
# Read a list of check-ins and call 'ss Diff' to get what was checked-in
 
$ENV{'SSDIR'} = "<source safe network folder>";
my $project = "<project folder>";
my $checkInsFile = "<output from first perl script>";
 
# Read in check-ins from file
open( checkIns, $checkInsFile ) or die "Cannot open $checkInsFile: $!";
while( <checkIns> ) {
    
    # Parse out the details from the check-in
    my( $file, $ver, $use, $date, $com );
    if( m/^(.*)\t(.*)\t(.*)\t(.*)\t(.*)$/ ) {
        ( $file, $ver, $use, $date, $com ) = ($1, $2, $3, $4, $5);
    }
    else {
        # Skip lines that are not tab-separated check-in records
        next;
    }
    
    # Put any check for conditions here
  
    # Check for some date range
    
    # Check for some file pattern
    
    # Check for some comment pattern
        
    # Summed totals for this check-in
    my $added   = 0;
    my $changed = 0;
    my $deleted = 0;
    
    # Special case version 1 to get the file line count of Version 1 (Check-in)
    if( $ver == 1 ) {
        
        # Get linecount for version 1 of file
        my $command = "ss View \$/$project/$file -V$ver|";
        open( ssView, $command );
        while( <ssView> ) {
            $added++;
        }
        close( ssView );
        
    }
    else {
        
        # Open a Unix diff from version-1 to version
        my $verMinus1 = $ver - 1;
        my $command = "ss Diff -DU \$/$project/$file -V$verMinus1~$ver|";
        open( ssDiff, $command );
        
        while( <ssDiff> ) {
            
            # Match for "number[a|d|c]number" at the start of a line
            if( m/^(\d+),?(\d*)([acd])(\d+),?(\d*)$/ ) {
                
                # Added
                if( $3 eq "a" ) {
                    $added += ($5 gt "" ? $5 : $4) - $4 + 1;
                }
        
                # Changed
                if( $3 eq "c" ) {
                    $changed += ($5 gt "" ? $5 : $4) - $4 + 1;
                }
        
                # Deleted (the removed lines are the range before the 'd')
                if( $3 eq "d" ) {
                    $deleted += ($2 gt "" ? $2 : $1) - $1 + 1;
                }
            }
        }
        
        close( ssDiff );
    }
    
    # Print check-in details + edit details
    print "$file\t$ver\t$use\t$date\t$added\t$changed\t$deleted\t$com\n";
    #print "Totals for ${file};$ver ($use) on $date added: $added changed: $changed deleted: $deleted\n";
}
 
# Done
exit 0;
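
The counting rule is easy to check by hand on a few sample lines. Here is a minimal sketch in Python, assuming the diff output uses normal-format change commands such as 8a12,15 or 5,7d3. The function names and the sample input are made up for illustration. In this sketch additions and changes are sized from the range after the letter, and deletions from the range before it, since that is where normal diff format records the removed lines.

```python
import re

# Matches change commands like "8a12,15", "5,7d3" or "10c11,12".
CHANGE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def span(start, end):
    """Number of lines in an inclusive range; end is None for a single line."""
    last = int(end) if end else int(start)
    return last - int(start) + 1


def count_changes(diff_lines):
    """Return (added, changed, deleted) line counts for one diff."""
    added = changed = deleted = 0
    for line in diff_lines:
        match = CHANGE.match(line.strip())
        if not match:
            continue  # content lines ("< ...", "> ...", "---") are ignored
        left1, left2, op, right1, right2 = match.groups()
        if op == "a":
            added += span(right1, right2)
        elif op == "c":
            changed += span(right1, right2)
        else:
            deleted += span(left1, left2)
    return added, changed, deleted


# Hypothetical sample: 4 lines added, 2 changed, 3 deleted.
sample = ["8a12,15", "> new line", "5,7d3", "< old line", "10c11,12", "---"]
print(count_changes(sample))
```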

Go Ubuntu!!

Posted by jonathan on May 23, 2007

Well, I am posting this from Firefox on Ubuntu, and it looks just great! Ubuntu 7.04 (Feisty Fawn) has been pretty much rock solid so far, and works great out of the box.

I have my nvidia graphics, sound, SATA, file sharing and WINE all working great, and am slowly working through getting a bunch of old documents out of Access 2.0 and Word 2.0 formats and into PDF, which sadly takes a Mac and a copy of Office 2004, though OpenOffice made a valiant attempt. In fact I ended up printing the old Word docs into PDF from Mac Office, straight into a SAMBA share, so I can now grab the unformatted text if I really need it.

Anyway, Ubuntu 7.04 will ship on selected Dell machines as of Thursday 24th May 2007, but you can download it and try it for free from the link above. Go try it! You’ll be impressed!

Time Enough to Crash - Backup 2

Posted by jonathan on December 12, 2006

There’s been a good lull of about 3 months or so since Nathan at the Woodlands of Houston Apple Store replaced the hard drive in my PowerBook G4, and I’ve started getting nervous about it having an ‘accident’ again.

It was only a bad sector, in a single song, but it was enough to jack up my preferred method of backing up, which is to use the disk utility software after booting off the OS Install disc. Generally this is a pretty good way to get a complete image of your drive, every few weeks or so.

Anyway, now that my wife Meredith has her own MacBook I figured it was time to start a proper backup regime. A search on Google revealed a piece of software called SuperDuper.

SuperDuper installed in the usual drag and drop fashion and I was up and running with a backup after about an hour or so. I prefer to do full backups to an external drive, and a 60 Gig HD in the MacBook allows you to partition an external 250 Gig WD drive into four parts, allowing a nice rotation of backups.

I use the same model of WD external hard drive with my PowerBook, but partitioned into only three separate drives.

Name a Day Mon|Tue|Wed and Get Date in Previous Week

Posted by jonathan on October 18, 2006

I use this command-line UNIX utility for generating filenames when I know that, say, an audio stream was generated in the previous week, but I only know the day …

Usage: `datep sun`
prints: 20061015, which in this case is a few days before the current day. Perfect for when you want a filename-friendly date from yesterday or before.

This may be backticked into any other command for generating filenames, etc. Use at your own risk. Use ‘View source’ in your browser to get the source.

#include <stdio.h>
#include <string.h>
#include <strings.h>  // strncasecmp is declared here on POSIX systems
#include <time.h>
 
int main (int argc, const char * argv[]) {
 
# ifdef _DEBUG
  // Print args 0 - command, 1 - 3 letter day
  int i = 0;
  for( i=0 ; i<argc ; ++i ) {
    printf( "arg( %d ) = %s\n", i, argv[i] );
  }
# endif
 
  // Usage
 
  // Too few args
  if( argc < 2 ) {
    printf( "Error: Too few arguments.\nUsage: datep mon|tue|wed|...|sun [\"+format\"]\nSee date for description of format.\n" );
    return -1;
  }
  
  // Check that we got exactly 3 chars for day compare
  if( 3 != strlen( argv[1] ) ) {
    printf( "Please specify a valid day.\nUsage: datep mon|tue|wed|...|sun [\"+format\"]\nSee date for description of format.\n" );
    return -1;
  }
 
  // Check a valid day was passed
  int valid = 0;
  if( !strncasecmp( "sun", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "mon", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "tue", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "wed", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "thu", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "fri", argv[1], 3 ) ) valid = 1;
  if( !strncasecmp( "sat", argv[1], 3 ) ) valid = 1;
 
  // If not valid ...
  if( !valid ) {
    printf( "Please specify a valid day.\nUsage: datep mon|tue|wed|...|sun [\"+format\"]\nSee date for description of format.\n" );
    return -1;
  }
 
  // Get todays date
  time_t cur_time = time(NULL);
  struct tm *loc_time;
  char buffer[255];
  strncpy( buffer, "xxx", 4 );
 
  // ### always skip back by at least one day
  // Skip back by at least 1 day until we find the day the user specified
  while( strncasecmp( argv[1], buffer, 3 )!=0 ) {
    // skip back a day
    cur_time = cur_time - 1*24*60*60;  // one days worth of seconds
 
    // Format TLA for day into buffer
    loc_time = localtime( &cur_time );
    strftime( buffer, 4, "%a", loc_time );
  }
 
  // Print day in standard [20061017] or passed format
  // ### implement using passed format
  strftime( buffer, 255, "%Y%m%d", loc_time );
  printf( "%s", buffer );
  
  return 0;
}
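
For anyone who wants to check the walk-back logic without compiling, here is a rough Python sketch of the same idea. The function name previous_day is made up, and it uses calendar dates rather than subtracting seconds, which is my own substitution and not what the C code does.

```python
from datetime import date, timedelta

# Three-letter day names in the order used by date.weekday() (Monday is 0).
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def previous_day(day_abbrev, today):
    """Return the most recent date before 'today' that falls on the named day."""
    target = DAYS.index(day_abbrev.lower()[:3])  # raises ValueError on a bad name
    current = today - timedelta(days=1)          # always skip back at least one day
    while current.weekday() != target:
        current -= timedelta(days=1)
    return current.strftime("%Y%m%d")


# The example from the post, run as if today were October 18, 2006.
print(previous_day("sun", date(2006, 10, 18)))
```

Asking for the current day of the week returns the date a full week back, which matches the rule of always skipping back at least one day.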
 

How Do I: Fix Downloaded Pictures from a Motorola Razr

Posted by jonathan on September 28, 2006

One minor problem with my Motorola Razr is that the filenames of downloaded images are munged into a somewhat stupid ‘DD-MM-YY_xxxx.jpg’ format. And as dates are not preserved when I load the files across into my Mac, I wind up with a bunch of stupidly named and largely unsortable files.

Ok, this is a job for a command on UNIX (Mac OS X in this case) called AWK. AWK is one of the most venerable of the UNIX command-line utilities. Wikipedia has a page on AWK here.

The easiest way to describe AWK is that it allows you to apply a condition to a line of input, and to then ‘do something’ with lines of input that meet the condition, typically printing a new format, or building a command and piping to some other program.

The following piece of AWK applies all three of these common behaviors to do the following:

  • Pick up a filename
  • Ignore the filename if it does not start with two digits
  • Reformat the filename into a simple ‘mv’ command
  • Pipe the ‘mv’ command to a cshell


$ ls | awk 'substr($0,1,2)+0 > 0{print"mv "$0" 20"substr($0,7,2)"-"substr($0,4,2)"-"substr($0,1,2)"_"substr($0,10,4)".jpg"}' | csh

In this case, we transform ‘dd-mm-yy_xxxx.jpg’ to ‘mv dd-mm-yy_xxxx.jpg yyyy-mm-dd_xxxx.jpg’ and execute the commands in a csh.

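The funny looking part at the start is the condition. Adding 0 to substr($0,1,2) coerces the first two characters to a number, and the test passes only if that number is > 0. A name that does not begin with a digit coerces to 0 and is skipped.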
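
If you would rather preview the commands before letting a shell loose on them, the same transform can be sketched in Python. This is only an illustration: the function name mv_command and the sample filenames are made up, and the pattern here is stricter than the AWK condition because it insists on the whole ‘dd-mm-yy_xxxx.jpg’ shape.

```python
import re

# dd-mm-yy_xxxx.jpg, where xxxx is any four characters
PATTERN = re.compile(r"^(\d{2})-(\d{2})-(\d{2})_(.{4})\.jpg$")


def mv_command(name):
    """Build the 'mv' command for one filename, or None if it does not match."""
    match = PATTERN.match(name)
    if not match:
        return None
    dd, mm, yy, tail = match.groups()
    return "mv {0} 20{1}-{2}-{3}_{4}.jpg".format(name, yy, mm, dd, tail)


# Print the commands instead of piping them to a shell.
for filename in ["28-09-06_1432.jpg", "notes.txt"]:
    command = mv_command(filename)
    if command:
        print(command)
```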

Have fun with AWK.

Mactracker … All Macs, Past and Present!

Posted by jonathan on September 27, 2006

I just got through playing with the fantastic Mactracker application. Mactracker covers the entire history of the Macintosh from Macintosh, through Mac Portable, the first PowerBooks (drool) and the latest Mac Pro and Intel-based Macs.
As well as full specifications, Mactracker includes beautiful icons, a picture and a potted history of each model. For me the most special and fun aspect of Mactracker is the startup-chime button which plays the startup-chime for each model. Ahhh, I was able to relive the sound of hearing my old Performa 6100 reboot for the umpteenth time…
Mactracker is freeware but donations are encouraged.
As an additional note, the design and fit and finish of Mactracker are absolutely exceptional. It’s a great lesson in what makes the Mac UI great.
Highly recommended!

iTunes Mac Happiness and PC Woes 1

Posted by jonathan on September 16, 2006

All three iTunes machines in our house have now been updated to the new iTunes 7 with somewhat mixed results.

Summary

1. No installation problems
2. Slow performance until gapless playback scanning is complete
3. Coverflow makes iTunes much more album oriented
4. You can assign almost anything as artwork
5. iTunes 7 Mac still runs just great on a PowerPC G4 (Pentium III class)
6. iTunes 7 PC is a dog on a 2 GHz Pentium 4 (slow, skips when playing)

Installation

On my G4 Powerbook iTunes installed flawlessly and initially came straight up with no problems. After starting however, I kept seeing iTunes trying to scan my library to identify songs for gapless playback. During this scanning the machine was unresponsive, and because my 2437 song 23Gig music library is stored on a slow USB linked laptop drive, the process took a long time. Eventually, I gave up trying to use the machine and just left it to do its thing. Once this process completes, it should not bother you again.

Albums and Coverflow

The next issue was with my library. Because I have so many separate single songs, sessions etc., which end up grouped into one-song ‘albums’, the coverflow feature is rendered pointless. The remedy is to go through the library very carefully, grouping songs where appropriate and assigning artwork.

One of the most fun things with iTunes 7 is the coverflow feature. iTunes has a new view that shows a 3d sliding image of album covers which is utterly successful in recreating that feeling of flipping through a bunch of LPs looking for something cool to put on. The side effect of this is that since starting using iTunes 7, I’ve been listening to entire albums again. Of course, you better have a bunch of good albums!

Mac Good, PC Bad (iTunes)

Now to the bad news. On my wife’s Dell Pentium 4 PC, iTunes installed just fine, as on the Macs, but unlike the Macs, it just hasn’t settled down. It looks the same, but with any activity on the PC, or even clicking within the iTunes window, the song playing will start skipping and sometimes just won’t stop.

I tried increasing the buffer size, with no real positive effect. I also quit most of the other running processes such as skype, gotomypc, vnc etc., and no improvement.

Final Words

So in summary, I say there aren’t too many issues with the Mac version of iTunes, but you should give the upgrade very serious thought if you are running on a PC. We went from having iTunes 6 run flawlessly on the PC, to having the slow skippy iTunes 7.

Note that none of these criticisms apply to the Mac version which is still great, even on a G4 laptop.

Five Key Rules for Writing Efficient and Maintainable SQL

Posted by jonathan on September 04, 2006

These five key rules have been distilled from several years working on complex commercial databases and help in writing queries that perform quickly, are easy to maintain, and need minimal debugging to provide correct results every time.

I concentrate mainly on SELECT queries here, but the principles are applicable to the other SQL forms.

1. Write a comment with the primary or alternate keys when adding a table to a join

Be conscious of the most appropriate primary or alternate key when adding a table to a join. Write a comment such as /* PK Pipeline_No, Contract_No, Effective_Date_From, Amend_no */ when adding a table to a join. This lets people know that you have thought about join efficiency and gives confirmation that you have used the appropriate fields.

If you have a large join and there’s no suitable unique index, you might want to check if there should be.

Example - Join Dispensing Sheet to Doses to List Patients for a Hospital / Facility / Day

SELECT dose.patient_code, patient.given_name, dose.drug_type
/* PK: hospital_code, facility_code, disp_date */
FROM DispensingSheets disp_sheet
/* IX: hospital_code, facility_code, disp_date */
JOIN Doses dose
  ON disp_sheet.hospital_code = dose.hospital_code
  AND disp_sheet.facility_code = dose.facility_code
  AND disp_sheet.disp_date = dose.disp_date
/* PK: hospital_code, patient_code, eff_date_from */
LEFT JOIN Patients patient
  ON disp_sheet.hospital_code = patient.hospital_code
  AND dose.patient_code = patient.patient_code
  AND disp_sheet.disp_date BETWEEN patient.eff_date_from AND patient.eff_date_to
WHERE disp_sheet.hospital_code = '0001'
  AND disp_sheet.facility_code = 'pharmacy'
  AND disp_sheet.disp_date = to_date( '2006-09-03', 'YYYY-MM-DD' );

2. Always SELECT only the columns you need. Don’t ever use ‘*’.

Always SELECT only the columns you need in a query. If you are lazy and just use ‘*’ in a SELECT clause, it tells other programmers nothing about what you intend the query to return.

For example, look at the example above and consider the extra cost of building the data to return using ‘*’, compared with the few columns we actually need.

3. Use GROUP BY instead of DISTINCT

Most beginning SQL programmers use DISTINCT when they really mean GROUP BY. GROUP BY can be a pest if you are not really sure what you are selecting. However, if you are inserting to a table with a primary key (which should always be true), then the GROUP BY clause can be used to guarantee that you do not have duplicate values that will break the INSERT.

Example - Insert patients from another system

INSERT INTO PH_PATIENT_TABLE (
  hospital_code, 
  patient_code, 
  eff_date_from, 
  eff_date_to,
  given_name
  )
SELECT
  imp.hospital_code,
  imp.patient_code,
  imp.eff_date_from,
  MIN( imp.eff_date_to ),  -- Use MIN/MAX to get one value per group
  MIN( imp.given_name )    -- to match the GROUP BY.
FROM PH_PATIENT_IMPORT imp
/* Filter existing PK: hospital_code, patient_code, eff_date_from */
LEFT JOIN PH_PATIENT_TABLE patient
  ON imp.hospital_code = patient.hospital_code
  AND imp.patient_code = patient.patient_code
  AND imp.eff_date_from = patient.eff_date_from
/* Don't try and insert existing records */
WHERE patient.patient_code IS NULL
GROUP BY 
  imp.hospital_code,
  imp.patient_code,
  imp.eff_date_from

If you need to insert other columns outside the GROUP BY columns that occur in the SELECT, then you can use MIN(), MAX() which pick the minimum or maximum values for columns outside the GROUP BY key.

In most cases, there won’t be multiple values for eff_date_to or given_name, but we could use a CASE expression or a HAVING clause to check if there were multiple values in these fields.
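
To see the GROUP BY and the HAVING check side by side, here is a small sketch that runs against an in-memory SQLite database from Python. SQLite is only a stand-in for whatever database you use, and the imp table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE imp (hospital_code TEXT, patient_code TEXT,
                  eff_date_from TEXT, given_name TEXT);
INSERT INTO imp VALUES ('0001', 'P1', '2006-01-01', 'Ann');
INSERT INTO imp VALUES ('0001', 'P1', '2006-01-01', 'Anne');
INSERT INTO imp VALUES ('0001', 'P2', '2006-01-01', 'Bob');
INSERT INTO imp VALUES ('0001', 'P2', '2006-01-01', 'Bob');
""")

# One row per key, with MIN() picking a single value outside the key.
deduped = conn.execute("""
SELECT hospital_code, patient_code, eff_date_from, MIN(given_name)
FROM imp
GROUP BY hospital_code, patient_code, eff_date_from
ORDER BY patient_code
""").fetchall()

# Keys where MIN() had to choose between different values.
conflicts = conn.execute("""
SELECT patient_code, COUNT(DISTINCT given_name)
FROM imp
GROUP BY hospital_code, patient_code, eff_date_from
HAVING COUNT(DISTINCT given_name) > 1
""").fetchall()

print(deduped)
print(conflicts)
```

The second query is worth running before the INSERT, because it lists the keys where MIN() would otherwise quietly pick one of several different values.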

4. Never SELECT and INSERT to the same table

If you have a large SELECT/INSERT then use a staging table to break the query into two stages. Many databases allow you to look at queries that block. Use these tools to confirm and justify the extra table.

The main exception to this rule is shown above, where we do a LEFT JOIN to avoid trying to insert values in the table that are already present.

5. Look at the query plan as you write the query

If you follow the steps above then your queries should be very efficient. Trust, but use the query plan to verify this. As a side benefit, asking for the plan makes the database parse the query, so typos are picked up before you start testing it.

If you see an expensive operation, try and work out why. Fix it.
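
As a rough illustration of what looking at the plan buys you, the sketch below asks SQLite for its plan before and after adding an index. It is only a stand-in: EXPLAIN QUERY PLAN is SQLite's own statement, other databases have their own tools, and the doses table, the ix_doses index and the plan helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE doses (hospital_code TEXT, facility_code TEXT,
                    disp_date TEXT, patient_code TEXT)
""")


def plan(sql):
    """Return the detail column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = ("SELECT patient_code FROM doses "
         "WHERE hospital_code = '0001' AND facility_code = 'pharmacy'")

before = plan(query)  # no index yet, so expect a full table scan
conn.execute("CREATE INDEX ix_doses "
             "ON doses (hospital_code, facility_code, disp_date)")
after = plan(query)   # the plan should now mention ix_doses

print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, so read it for the switch from a scan to an index search rather than for the precise text.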

Cool Way To Prevent Those Flaming Mac Power Connectors

Posted by jonathan on September 02, 2006

My wife Meredith just came up with a great way to prevent problems with the power cable on her new MacBook. Here’s how:

This is really neat since it uses the clip that’s already on the cable. This should prevent much of the bending that can lead to fraying inside the cable.