Unix Power Tools
Looking for Closure
by Tim O'Reilly
02/24/2000
A common problem in text processing is making sure that items
that need to occur in pairs actually do so.
Most UNIX text editors include support for making sure that elements
of C syntax such as parentheses and braces are closed properly.
There's much less support for making sure that textual documents,
such as troff source files, have the proper structure.
For example, tables must start with a .TS macro,
and end with .TE.
HTML documents that start a list with <UL> need a closing
</UL>.
UNIX provides a number of tools that might help you to tackle this
problem.
Here's an awk script written by Dale Dougherty that makes sure
that .TS and .TE macros come in pairs:
#! /usr/local/bin/gawk -f
BEGIN {
    inTable = 0
    TSlineno = 0
    TElineno = 0
    prevFile = ""
}
# on reaching a new file, check for an unclosed table in the previous one
FILENAME != prevFile {
    if (inTable)
        printf("found .TS at File %s: %d without .TE before end of file\n",
            prevFile, TSlineno)
    inTable = 0
    prevFile = FILENAME
}
# match .TS and see if we are already in a table
/^\.TS/ {
    if (inTable) {
        printf("%s: nested starts, File %s: line %d and %d\n",
            $0, FILENAME, TSlineno, FNR)
    }
    inTable = 1
    TSlineno = FNR
}
# match .TE and see if there is a table to close
/^\.TE/ {
    if (! inTable)
        printf("%s: too many ends, File %s: line %d and %d\n",
            $0, FILENAME, TElineno, FNR)
    else
        inTable = 0
    TElineno = FNR
}
# this catches end of input
END {
    if (inTable)
        printf("found .TS at File %s: %d without .TE before end of file\n",
            FILENAME, TSlineno)
}

You can adapt this type of script for any place you need to check
for something that has a start and finish.
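
As one illustration of such an adaptation, here is a minimal sketch that
applies the same idea to the HTML <UL> and </UL> tags mentioned earlier.
It is not part of Dale's script: the function name check_ul, the message
wording, and the variable names are all made up for this example. It
also assumes that counting tags line by line is good enough, so it does
not notice a </UL> that comes before a <UL> on the same line, and it
does not try to parse HTML properly.

```shell
# check_ul: hypothetical helper that reports <UL> tags that are never
# closed and </UL> tags that have nothing to close, in the named files.
check_ul() {
    awk '
    # Report any lists still open in the file we just finished.
    function report_open(file) {
        if (depth > 0)
            printf("File %s: %d unclosed <UL>; most recent <UL> at line %d\n",
                   file, depth, openline)
    }

    # First line of a new file: report on the previous file, start afresh.
    FNR == 1 {
        if (NR > 1)
            report_open(prevFile)
        depth = 0
        prevFile = FILENAME
    }

    {
        line = $0
        # gsub returns the number of matches; "&" puts each match back.
        opens  = gsub(/<[Uu][Ll][ \t>]/, "&", line)
        closes = gsub(/<\/[Uu][Ll]>/, "&", line)

        if (opens > 0) {
            depth += opens
            openline = FNR
        }
        while (closes-- > 0) {
            if (depth > 0)
                depth--
            else
                printf("File %s: line %d: </UL> without <UL>\n", FILENAME, FNR)
        }
    }

    # This catches the last file.
    END { report_open(prevFile) }
    ' "$@"
}
```

You might run it as check_ul chapter1.html chapter2.html (the filenames
are only examples). The sketch keeps a depth counter rather than the
single inTable flag, because HTML lists may legitimately be nested,
whereas the table script treats a second .TS before a .TE as an error.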
A more complete syntax checking program could be written with the
help of a lexical analyzer like lex.
lex is normally used by experienced C programmers, but it can be used
profitably by someone who has mastered awk and is just beginning with C,
since it combines an awk-like pattern-matching process using
regular expression syntax with actions written in the more powerful
and flexible C language.
(See O'Reilly & Associates' lex & yacc.)
And of course, this kind of problem could be very easily tackled
in perl.