Bill’s Software Guide

[My first foray, c. 1997, into thinking about the process of software. –Bill, 10/25/2010]

Infrastructure

Analysis and Specification

Design

Coding

Debugging

Time and Space Costs

References


Infrastructure

Error Tracking

Maintain a personal error log. I do it in HTML with this template:

<hr>
<b>Date: </b>
date
<br><b>Symptom: </b>
the-symptom
<br><b>Cause: </b>
the-cause
<br><b>Fix: </b>
the-fix
<br><b>Lesson: </b>
the-lesson

Each entry then renders like this:

Date: date

Symptom: the-symptom

Cause: the-cause

Fix: the-fix

Lesson: the-lesson

At some point, I’ll go through the log and try to organize the
problems. For example, I’ve noticed that I often introduce bugs when
I reorganize part of a program with a semi-mechanical transformation,
so I need to watch those more carefully. There are existing defect
classification schemes, and I’ll adopt one soon. (Examples: DEC’s
two-letter codes (UE=user error, RN=next release, etc.), Orthogonal
Defect Classification, and others.)

[Original inspiration: Knuth’s article “The errors of TeX”, Software – Practice and Experience, July 1989.]

Directory Organization

There are two types of deliverables: libraries and applications. A
library is a set of objects designed to work together. An
application is a full program.

A library consists of several things: one or more header files, a
library file for linking, and documentation. Libraries will be used
by one or more applications, and should be developed as separate
products. Libraries may depend on other libraries.

An application may consist of several things: an executable
file, data files, and documentation.

This is the normal directory tree for a project:

Name/
        data/
        doc/
        exports/
        lib/
            libr1/
                *.a
                *.h
                doc/
            libr2/
                :
        obj/
        src/
            Makefile
            part1/
                Makefile
                *.cc 
                *.h
                RCS/
            part2/
            :
data/
Data files for the library or application.
doc/
Documentation for this application or library.
exports/
A staging area for deliverables of this library or application.
obj/
Object files. A project normally consists of several relatively independent parts, each a reasonable unit for one programmer to work on, and each with its own object files. For each part there is a corresponding library file, built from that part’s object files and stored in the lib/ subdirectory. Parts do not build directly against each other’s object files; they link against the corresponding library file in lib/.
src/
Source files for each part. There is a top-level makefile and a subdirectory for each part. Each subdirectory contains .h, .cc, and other source files. In addition, if you use a source control system such as RCS, each subdirectory has its own RCS subdirectory.

This structure may be tailored. If there is only one part, the
files may be directly under the src/ and obj/ levels. If there is
only one programmer, there may be a work/ subdirectory.

Individual programmers working on specific parts will
copy/checkout the corresponding part to their own area, where they
can mix source and object files as they wish.

Justification: This structure isolates the dependencies. It
makes clear what belongs to which part. It tries to reduce the
number of situations where one part must grab a specific file out
of another, whether for import, export, or general use. The
separate object directory makes it easy to provide multiple object
directories for different machines, or to let others build an
object directory in their own area while the source is
read-only.

Compiler

  • Compile with ALL error checking on (and fix what it complains
    about!). I can’t believe all the people I’ve seen who haven’t taken
    this step.
  • Integrate from the bottom up. For utility classes, I often
    provide a main routine full of tests. I surround it with “#ifdef
    DEBUG/#endif” so the test routine isn’t compiled in by default.
    Compilers provide a switch that lets you define the macro on the
    command line when you want the tests.
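
As an illustration of the self-test idea, here is a minimal sketch. IntStack is a hypothetical utility class invented for the example, and the build line in the comment assumes a GCC- or Clang-style command line, where -D defines a macro and -Wall -Wextra turn on a broad set of warnings.

```cpp
// stack.cc: a small utility class with a built-in self-test.
// Build the test with:  c++ -Wall -Wextra -DDEBUG stack.cc -o stack_test
// Without -DDEBUG the main() below is not compiled at all.
#include <cassert>
#include <cstddef>
#include <vector>

// IntStack is a hypothetical utility class used only for illustration.
class IntStack {
public:
    void push(int value) { items_.push_back(value); }

    // Removes and returns the top element; the stack must not be empty.
    int pop() {
        assert(!items_.empty());
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};

#ifdef DEBUG
#include <iostream>

// Test driver: only present when the DEBUG macro is defined.
int main() {
    IntStack s;
    assert(s.empty());
    s.push(1);
    s.push(2);
    assert(s.size() == 2);
    assert(s.pop() == 2);
    assert(s.pop() == 1);
    assert(s.empty());
    std::cout << "IntStack: all tests passed\n";
    return 0;
}
#endif
```

Because the tests live in the same file as the class, they are hard to lose track of, and the normal build never sees them.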

Analysis and Specification

  • Know what the program is supposed to do. If there’s no
    specification, how can it be wrong? This applies to test cases,
    too: it’s too easy to convince yourself it works if you don’t
    really know what the right answer should be.

Design

  • Watch the initial design. There should be a clear “high-level”
    design to the system, and it should be fairly clean. (For example,
    the operating systems class has a simulation. The people who drove
    their program by the next input card had a horrible time: they had
    to worry about events occurring all over the place. Those who
    treated it as a simulation, finding the next event (card or
    internal), had a program structure that matched the problem
    structure, and it worked much better.)
  • Use Parnas’ information hiding approach. Basically, this says
    that each module should have a secret (e.g., its true
    representation). No other module should rely on the secret, but
    only the interface. If the secret changes, it should only affect
    one module. This goes to the level of the program specification as
    well: if the spec says “do 100 things”, there should only be one
    place that knows the number is 100. Another interpretation of this
    is: never have the same thing in two places, because that provides
    an opportunity for inconsistency.
  • Insert “points of leverage”. For example, in the simulation
    program above, the “right” structure has only one place where the
    next event is generated. This provides a single place you can look
    at to make sure it’s right.
  • Get everything “const-correct”. If a routine shouldn’t modify
    its arguments, it should say so with “const”. This is especially
    important in C++. (And it’s best to get it right from the beginning
    – trying to fix it causes “ripples”.) C++ lets you say whether
    methods are “const”, which matches up with the
    accessor/mutator/iterator breakdown.
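
The information hiding and const-correctness points above can be sketched together. This is only an illustration: TaskList, kMaxTasks, and remainingCapacity are hypothetical names, and the limit of 100 stands in for the “do 100 things” spec mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// TaskList is a hypothetical module. Its "secret" is how tasks are stored;
// callers see only the interface below.
class TaskList {
public:
    // The one place that knows the spec's limit of 100.
    static constexpr std::size_t kMaxTasks = 100;

    // Mutator: changes the object, so it is not const.
    // Returns false when the list is already full.
    bool add(int taskId) {
        if (ids_.size() >= kMaxTasks) return false;
        ids_.push_back(taskId);
        return true;
    }

    // Accessors: declared const, so they promise not to modify the object.
    std::size_t size() const { return ids_.size(); }

    bool contains(int taskId) const {
        for (int id : ids_) {
            if (id == taskId) return true;
        }
        return false;
    }

private:
    // The secret. Switching to another container would not affect callers.
    std::vector<int> ids_;
};

// Takes the list by const reference: the signature itself says the
// argument will not be modified, and only const methods can be called.
std::size_t remainingCapacity(const TaskList& list) {
    return TaskList::kMaxTasks - list.size();
}

#ifdef DEBUG
// Test driver, compiled only with -DDEBUG.
int main() {
    TaskList list;
    assert(list.add(1));
    assert(list.contains(1));
    assert(remainingCapacity(list) == TaskList::kMaxTasks - 1);
    return 0;
}
#endif
```

If the spec later says 200, only kMaxTasks changes. If the storage changes, only the private section and the method bodies change.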

Debugging

  • The closer the failure occurs to the actual point of error, the
    easier it is to find the problem. Use everything you can to trap it
    sooner rather than later.
  • When you have a problem, find the minimal input that will
    reproduce it. Use “binary split” to reduce large input files: try
    it on the first half of the data, see if it still fails. If so, cut
    the data file in half and try again. This will let you work your
    way down to a much smaller file. If this doesn’t work, try
    eliminating certain classes of input.
  • Use the debugger to see what things are like when they’re
    wrong. If you’re using the PRE/POST/ASSERT/OK definitions, it
    almost certainly failed an assertion. Stop the debugger there:
    check the stack trace and the state of the items.
  • It’s probably NOT the compiler! C-based languages have a lot of
    looseness in how they can handle things (size of basic objects,
    storage layout, results of many operations). If you think it is the
    compiler’s fault, cut the program down to something so small it can
    be checked, and file a bug report. (Not that compilers don’t have
    errors, especially around the dark corners, but too often the
    problem is the user’s.)
  • The debugger usually has facilities for setting breakpoints on
    procedures, and watchpoints on locations. Both can be useful.
    Debuggers often let you change a variable and continue execution;
    this can be helpful too.
  • I’ve heard bug-checking likened to a scientific search where
    you generate and test hypotheses (sorry, no reference). That is
    the attitude you need. (“If I set this to this value, this should
    change; let’s check it.”)
  • Maintain a log of errors. I document each one with
    “Date/Symptom/Cause/Fix/Lesson”. You tend to make the same sorts of
    errors, and the log makes the pattern visible.
  • In C++ in particular, watch out for improper recompilation.
    When I’ve had a mysterious bug, deleting all the object files and
    rebuilding everything has often fixed it. (If you’re using a system
    that doesn’t figure out source dependencies for you, it’s hard to
    get them all right.)
  • When commenting out code, use “#if 0/#endif” to bracket out
    sections of code. This avoids trouble with comments inside the
    disabled section, since “/* */” comments do not nest. (I might
    use “//” to comment out a single line in C++, though.)
  • Run the profiler. Usually you can set it up to give you counts
    of how many times the various routines were called, and you can
    compare this to your expectations.
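
The “binary split” reduction described above can be sketched as a small helper. This is one possible reading of the technique, not a definitive tool: shrinkFailingInput and the fails callback are hypothetical names, and fails stands in for “run the program on these lines and report whether the bug shows up”. The sketch tries the first half, then the second, and gives up when neither half fails on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Lines = std::vector<std::string>;

// Repeatedly halves `input` while one of the halves still makes `fails`
// return true. Stops when a single line is left, or when neither half
// fails on its own (the bug needs lines from both halves).
Lines shrinkFailingInput(Lines input,
                         const std::function<bool(const Lines&)>& fails) {
    while (input.size() > 1) {
        const auto mid =
            input.begin() + static_cast<std::ptrdiff_t>(input.size() / 2);
        Lines first(input.begin(), mid);
        Lines second(mid, input.end());
        if (fails(first)) {
            input = std::move(first);
        } else if (fails(second)) {
            input = std::move(second);
        } else {
            break;  // Time to try eliminating classes of input instead.
        }
    }
    return input;
}

#ifdef DEBUG
// Test driver, compiled only with -DDEBUG. The predicate is a stand-in
// for a real run of the program under test.
int main() {
    Lines data = {"a", "b", "c", "BAD", "e", "f", "g"};
    auto fails = [](const Lines& lines) {
        for (const auto& line : lines) {
            if (line == "BAD") return true;
        }
        return false;
    };
    Lines small = shrinkFailingInput(data, fails);
    assert(small.size() == 1 && small[0] == "BAD");
    return 0;
}
#endif
```

In practice the callback would write the lines to a scratch file and run the failing program on it, which is why it is passed in rather than built into the helper.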

Time and Space Costs

  • Don’t worry about performance (at least right away). I’ve seen
    a lot of people doing bizarre stuff “for performance” in
    non-critical sections of the code. They’ve never even profiled the
    thing, yet they’re introducing bugs in the name of performance.

References

[Knuth, Donald. “The errors of TeX”, Software – Practice and Experience, July 1989.]