Summary: Code that’s tedious to write, but must be changed easily, can often be accommodated with a table-driven design.
Forces
- Code is most easily written “ad-hoc”.
- There’s often a structural aspect that is constant even though the details change.
- Ad-hoc code is bulky because it repeats the “constant” part.
Example
Language grammars undergo numerous small but important changes during compiler development.
Resolution
Make the program be table-driven. Define the parts that change and put them in a table. Write a small routine that uses the table to know what to do. (In effect, this is an interpreter, where the table is the language to be interpreted.)
The table may be static, or linked in at runtime, or read from a file or network, etc. It’s often beneficial if it is separate from the interpreting routine, so you can change the table without affecting the code.
Examples
- Many language processing tools use this strategy: lex, yacc, and others. The grammar is stored in the table.
- Resource bundles in Motif, Java, etc. Rather than hard-coding parameters, they’re stored in a file read at the beginning of runtime.
- State machines are often implemented as a table and an interpreter.
Consequences
This is a classic space-time tradeoff.
- It may run faster. The interpreting routine may be small enough to stay in the code cache, with high locality of (I-space) reference. The original code may be so spread out that it runs slower because of cache issues.
- It may run slower (and often will). Interpretation adds an overhead (managing indexes into tables) that is done implicitly by the program counter in straight-line code.
- There is sometimes less opportunity for optimization. While it may be easier to optimize the interpreting routine than it would the straight-line code, it is hard to take advantage of patterns across multiple rows. Straight-line code appends these together directly, so the compiler can know the exact surrounding circumstances.
- It can make structure explicit. For example, the state of the parse is fairly complex when implemented as code – error recovery can be hard to implement because it is hard to jump around in code, while tables may have auxiliary information that will help in repair. As another example, in recursive descent parsing (which uses mutually recursive routines to match a string against a grammar), the call stack is an implicit part of the state; it must contain addresses, etc. In the table-driven form, the stack may be explicit. It will carry a potentially smaller amount of information (eg., table indexes rather than subroutine addresses).
Coming someday (maybe)
An example.
[Written 9-1-98]