Searching in Unusual Ways and Places
Æleen Frisch
Sidebar: grep Context Displays
A few weeks ago, I was reading an article that cited some statistics about
how many times various actions were performed in the course of a lifetime: how
many hours a person sleeps, how many miles are driven to work, how much food
is consumed (you get the idea). I started to think about how many times
I've done various things, including how many times I'd run various
UNIX commands. For me, the top two most frequently used commands are ls and
grep. In the course of my career so far, I've run each of them more than
100,000 times.
Clearly, grep is a command I can't live without. I constantly use it on its own and in pipes with other commands. For example:
% ps -aux | egrep 'chavez|PID'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
chavez 14355 0.0 1.6 2556 1792 pts/2 S 10:23 0:00 -tcsh
chavez 18684 89.5 9.6 27680 5280 ? R N Sep25 85:26 /home/j03/l988
I use this command combination often enough with different usernames that
I've defined an alias for it.
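The alias itself is not shown. As a minimal sketch of the same idea for Bourne-style shells, a pair of functions could play that role. The names psfilter and psu are hypothetical, and the filter is split out only so that it can be tried on saved ps output:

```shell
# psfilter USER: keep lines of ps-style output (read from stdin) that mention
# USER, plus the column-header line, which is recognized by the string "PID".
psfilter() {
    grep -E "$1|PID"
}

# psu USER: apply the filter to the live process list.
# "ps aux" is assumed to be accepted by the local ps; adjust as needed.
psu() {
    ps aux | psfilter "$1"
}

# Typical use (not run here): psu chavez
```

Because the pattern is a plain alternation, a username that happens to appear in another user's command line will also match, just as it does with the original pipeline.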
There are times, however, when I want to perform grep-like search operations but grep itself is cumbersome or impossible to use: finding data within network traffic, looking for a software package, locating a specific email message. In these contexts where grep can't be applied easily, I have to turn to other tools (some are open source, others are vendor provided). This article will look at some of them.
Searching Network Packets
Searching network traffic for patterns in real time is a useful technique
for debugging a variety of network problems. It's not easy to apply grep
to this task. It is possible to run a packet-capturing utility like tcpdump
and then search the resulting output with grep, but this can be awkward and
ineffective. What you often want is to examine the entire packet when some part
of its data matches a pattern. Unfortunately, using grep with a packet dump
will return only the lines containing the pattern. There are also times when
this approach is simply too slow.
Fortunately, there is an open source utility that does exactly this job: ngrep. It is developed and maintained by Jordan Ritter, and the project's home page is http://ngrep.sourceforge.net. ngrep has the following general syntax:
ngrep [options] pattern [filter]
where pattern is a regular expression to search for in network packets,
and the optional filter is an expression indicating the sorts of network
packets to which to pay attention. This filter, technically known as a Berkeley
packet filter (BPF), consists of a series of keywords specifying rules for selecting
packets (BPF filters are also used by packet-dumping utilities). These keywords
specify the source and/or destination host, network, protocol, and/or port.
See the manual page for information about constructing BPF expressions. If the
filter is omitted, then all packets are included.
Using a filter in combination with a search string usually makes searching more efficient and ngrep's output more readable. As an example, consider this ngrep command that looks for packets containing output from the finger command among all network traffic:
$ ngrep "No Plan\."
The command searches for a recognizable string from the finger output. If
you run this on most systems and run a finger command within earshot of that
host, you'll see the relevant packet intermixed with lots of number signs
and dot series, and it will often be repeated several times. A number sign is
printed for each packet examined, and dots indicate output interruptions. If
you use the following command instead, then you won't get the ugly output,
and ngrep will do less work:
# ngrep -q "No Plan\." port finger
Now, only packets to or from the finger port (79) are examined. The -q
option suppresses the number signs.
Here is a more complex ngrep command that locates some specific FTP connection operations. It examines FTP-related packets sent from host hamlet to host ophelia and searches their data for the string USER:
# ngrep -q -t "USER" src host hamlet and dst host ophelia and tcp port 21
T 2002/09/24 14:11:15.413069 192.168.9.212:32813 -> 192.168.9.84:21 [AP]
USER chavez..
T 2002/09/24 14:32:07.776476 192.168.9.212:32814 -> 192.168.9.84:21 [AP]
USER amber..
For each packet, the output begins with a letter indicating the protocol (TCP
here), followed by a time stamp (requested with the -t option), and then
the source and destination host and port. The second line displays the packet's
data. A command like this is one way of capturing all FTP connection attempts.
As this example indicates, complex filters can be created by joining clauses
with "and". You can also use the "or" and "not"
logical operators as well as parentheses for grouping (the latter will need
to be escaped to protect them from the shell).
ngrep can be very useful when testing and debugging network services. For example, I found it very useful when I enabled the secure versions of LDAP. Everything seemed to work fine after I was finished, but I wanted to verify that the secure version of the protocol was being used. If it was, then I would not be able to detect any clear text passwords in LDAP traffic. An ngrep command like this one enables me to test this functionality:
# ngrep 'badpassword' src host ldapserver and \( port 636 or port 389 \)
This command watches the ports corresponding to SSL and TLS-secured LDAP.
While this command was running, I ran three queries: one using the normal ldap
protocol, one using the ldaps protocol, and a third using a GUI LDAP client
in TLS mode. All three queries succeeded and displayed the appropriate records.
The ngrep command returned output only for the first one, so it was clear that
the password had been encrypted in the latter two cases. ngrep was a better
choice than a general packet dumper for this job because it limited its scope
to exactly the packets in which I was interested.
This example illustrates that ngrep can be useful for not finding things as well as finding them. In this case, the lack of output was what I hoped for. Those of you who had qualms about the finger example earlier can apply this principle as well: because running a finger daemon is not a good idea in most environments, such an ngrep command can function as a security trap. If finger traffic appears on the network, ngrep will detect it and let you know there is a problem.
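One way to turn such a trap into an alert is to post-process ngrep's output. The following is only a sketch: the function name watch_matches is hypothetical, and it assumes the output format shown in the FTP example, where each matched packet begins with a header line starting with a protocol letter (T for TCP, U for UDP).

```shell
# watch_matches: read ngrep output on stdin and print one timestamped
# ALERT line per packet header; packet data lines are ignored.
watch_matches() {
    while IFS= read -r line; do
        case "$line" in
            "T "*|"U "*)
                printf 'ALERT %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
                ;;
        esac
    done
}

# Intended use, as root (not run here). The -l option asks ngrep for
# line-buffered output so that alerts appear promptly:
#   ngrep -q -l "No Plan\." port finger | watch_matches
```

The printf line could be replaced with a call to mail or logger if the alert should go somewhere other than the terminal.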
ngrep has several other useful options:
-i       Perform a case-insensitive search.
-A n     Display the n packets following each matched packet.
-d dev   Use the specified network device.
-O file  Save matching packets in file in addition to displaying them.
-X       Interpret the search pattern as hexadecimal.
ngrep is useful for a wide variety of tasks ranging from testing network applications to monitoring network traffic. It is also quite useful for debugging specific operations or programs on busy systems because of its ability to extract very narrow ranges of packets for examination.
Searching Mailboxes
At first thought, grep ought to be able to perform a task like searching mailboxes
for specific text. You can search mail files for text, but using grep has at
least two disadvantages. First, you may want to retrieve the entire message(s)
that matches the pattern, and grep only returns matching lines by default. Second,
if any of the mailboxes contain lengthy MIME attachments, searching with grep
can produce voluminous output arising from an unlucky false positive within
the binary attachment.
A better tool for this job is grepmail, an open source utility written by
David Coppit (see http://grepmail.sourceforge.net
for more information). grepmail is designed specifically for searching mail
folders. Here is a simple example of its use:
% grepmail -R -i -l hilton ~/Mail
Mail/conf/acs_w01
I was looking for the phone number of a specific Hilton hotel, which was in
a mail message somewhere, but I couldn't remember where I'd filed
it. This command searches for the string hilton (-i says
to perform a case-insensitive search) in all mail folders under the specified
starting directory (-R means recursive), and lists the names of files
containing messages that match (-l option). The advantage of this approach
is that I can search for the string I remember and find the telephone number
even though the two items may be lines apart in the actual message. This command
yields the phone number:
% grepmail -i hilton `!!` | grep -i telephone
Telephone: 619-231-4040
This grepmail command searches for the same string in the mail folder returned
by the previous command. This time, grepmail will return the entire message
as its output (since -l is omitted). The result is then piped to grep
to isolate the phone number.
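To make the difference from plain grep concrete, here is a rough sketch of the whole-message idea using awk. It is not how grepmail is implemented and handles none of its refinements (MIME, dates, header/body distinction). The function name mbox_grep is hypothetical, and messages are assumed to begin with a "From " separator line, as in the traditional mbox format.

```shell
# mbox_grep PATTERN [FILE...]: print every whole message that contains at
# least one line matching PATTERN (an extended regex, case-insensitive).
mbox_grep() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        # Print the buffered message if any of its lines matched, then reset.
        function emit() { if (hit) printf "%s", msg; msg = ""; hit = 0 }
        /^From / { emit() }                      # a new message begins
        { msg = msg $0 "\n" }                    # buffer the current line
        tolower($0) ~ tolower(pat) { hit = 1 }   # remember any match
        END { emit() }                           # flush the last message
    ' "$@"
}

# Typical use (not run here): mbox_grep hilton ~/Mail/conf/acs_w01 | grep -i telephone
```

Buffering each message until the next separator is what allows the search string and the telephone number to be lines apart.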
Here is a somewhat more complicated command that uses grepmail twice. Its goal is to find messages from user nadia that mention something related to Naples, Italy:
% grepmail -R -h "^From: .*nadia" ~/Mail | grepmail -b -i \
"naples|napoli|neapolit"
The first command searches mail headers (-h) for From lines
including nadia somewhere in their text. The second command searches
only the body (-b) of the matching messages for the specified strings.
grepmail has several other useful options:
-d date  Limit search to messages on the specified date or within the
         specified date range. The date format is very flexible; see the
         manual page for details.
-v       Display only non-matching messages.
-u       Display only unique messages.
-M       Don't search non-text MIME attachments.
-r       Display a report listing each folder searched and the total
         number of matching messages within it.
-m       Add an X-Mailfolder header to displayed messages; the header's
         text will be the path to the message's mail folder.
-H       Display only the headers of matching messages.
It is also very easy to forward a mail message located in this manner. Here is a simple method:
% grepmail -m -u ... | mail -s subject someone@somewhere
Finally, some people prefer to view the search results from a mail client.
This is usually easy to accomplish via a simple script that redirects grepmail's
output to the mailer's default folder. Several have been created for this
purpose:
- pine: grepine by Cristin Pietsch http://www.dfki.de/~pietsch/software
- mutt: grepm by Moritz Barsnick http://www.barsnick.net/sw/grepm.html
- VM: vm-grepmail.el by Robert Fenk http://www.robf.de/Hacking/elisp/vm-grepmail.el
Search Operations for Software Packages
Software packages are another item whose contents are hard to search with
grep. More specifically, I often want to answer questions like these:
- Is a specific package installed?
- What package does a specific file belong to?
- What packages are available on an individual CD (or other media)?
- What is included within a package (installed and not)?
On many systems, one or more of these questions can be answered using the package management tools supplied with the operating system. For example, the following commands can be used to list all currently installed packages on various UNIX systems:
Linux: rpm -q -a
FreeBSD: pkg_info -a -I
Solaris: pkginfo
HP-UX: swlist
AIX: lslpp -l all
You can pipe any of these commands to grep to determine whether a specific
package is present or to find its actual package name. For example, the following
command lists all packages related to LDAP installed on a Linux system:
% rpm -q -a | grep -i ldap
nss_ldap-184-1
openldap-2.0.23-4
openldap-clients-2.0.23-4
openldap-servers-2.0.23-4
This system has the OpenLDAP servers and client utilities installed, as well
as the modules that interface LDAP to PAM and to the name service switch file,
/etc/nsswitch.conf.
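On a mixed network it can be handy to hide these differences behind one command. The following sketch simply maps the operating system name to the listing commands given above. The function names pkg_list_cmd and pkg_grep are hypothetical, the OS strings are those reported by uname -s, and Linux is assumed to mean an RPM-based distribution.

```shell
# pkg_list_cmd OS: print the list-all-packages command for the given OS.
pkg_list_cmd() {
    case "$1" in
        Linux)   echo "rpm -q -a" ;;        # assumes an RPM-based system
        FreeBSD) echo "pkg_info -a -I" ;;
        SunOS)   echo "pkginfo" ;;          # Solaris reports itself as SunOS
        HP-UX)   echo "swlist" ;;
        AIX)     echo "lslpp -l all" ;;
        *)       echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# pkg_grep PATTERN: case-insensitive search of the installed-package list
# on the local system.
pkg_grep() {
    cmd=$(pkg_list_cmd "$(uname -s)") || return 1
    $cmd | grep -i -- "$1"    # $cmd is left unquoted so it splits into words
}

# Typical use (not run here): pkg_grep ldap
```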
It's often useful to find out which package a particular file is part of (e.g., when you delete it accidentally and need to restore it). These command forms will indicate which package installed the specified file:
Linux: rpm -q --whatprovides path
Solaris: pkgchk -l -p path
AIX: lslpp -w path
Here is an example from a Solaris system:
% pkgchk -l -p /etc/init.d/sendmail
Pathname: /etc/init.d/sendmail
Type: editted file
Expected mode: 0744
Expected owner: root
Expected group: sys
Referenced by the following packages:
SUNWsndmr
Current status: installed
When you want to know what is contained in an installed package, use these
commands:
Linux: rpm -q -l name
FreeBSD: pkg_info -L name
Solaris: pkgchk -l name | grep "^Pathname:"
HP-UX: swlist -l file name
AIX: lslpp -f name
Here is an example from a FreeBSD system:
% pkg_info -L grub-0.91_1
Information for grub-0.91_1:
Files:
/usr/local/bin/mbchk
/usr/local/info/grub.info
/usr/local/info/multiboot.info
/usr/local/sbin/grub
...
In general, if you want to list the contents of an uninstalled package, you
can replace the package name with the path to the package file in the preceding
commands. On Linux systems, however, you must also precede the package file's
path with the -p option.
Only HP-UX and AIX have easy-to-use commands for listing the packages available on CDs or other media:
HP-UX: swlist -s path-or-device
AIX: installp -l -d device
On Linux, FreeBSD, and Solaris systems, you must rely on GUI package management
tools to handle this function. On Linux systems, you can use gnorpm and similar
packages (as well as yast2 on SuSE Linux systems). Under FreeBSD systems, you
can use the sysinstall utility and select the Configure=>Packages menu path.
On Solaris systems, the Supplementary Software CD includes a GUI installation
tool that starts automatically when the CD is inserted, and it can be used to
view the contents of the CD as well. On all three systems, you can also examine
the directory containing the package files with ls for a quick listing of what
is available.
Searching Net-SNMP MIBs
The Simple Network Management Protocol (SNMP) can be used to monitor and reconfigure
a wide variety of computer systems and other network devices. The items that
can be queried or set are defined in Management Information Bases (MIBs). A
MIB is a collection of value and property definitions, and the various items
are organized as a tree structure. This hierarchical organizational scheme serves
to group related data together. MIB definitions are stored in files and are
implemented in the software on the actual computers and devices. The MIB does
not hold any data; it is a schema, not a database.
Here is an example MIB item:
iso.org.dod.internet.mgmt.mib-2.system.sysLocation = "Machine Room"
The long string on the left is the setting's name, and its value is the
string to the right of the equals sign. The name is separated into components
by periods, and each corresponds to successive levels of the MIB tree. Thus,
we can see that the sysLocation node is eight levels from the top of the tree.
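The level count is easy to check mechanically. As a small illustration (the function name mib_levels is hypothetical), the dotted name can be split into one numbered component per line:

```shell
# mib_levels NAME: print each component of a dotted MIB name together with
# its depth in the tree. Empty components (from a leading dot) are skipped.
mib_levels() {
    printf '%s\n' "$1" | tr '.' '\n' | awk 'NF { n++; print n, $0 }'
}

mib_levels iso.org.dod.internet.mgmt.mib-2.system.sysLocation
```

The last line of output is numbered 8, matching the depth of sysLocation.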
Although the MIB is organized as a tree, it is not uniformly populated. The top four levels of the standardized MIB tree exist mainly for historical reasons. Given this rather ad hoc structure, searching the MIB tree for specific items is often essential. However, it is not a job for grep.
Most SNMP implementations provide utilities for examining MIBs. The open source SNMP implementation Net-SNMP is used on Linux and FreeBSD systems (and other UNIX systems, if desired). The tool the package provides to examine the MIB structure is snmptranslate. This command provides information about the MIB structure and its items. For example, you can use it to display a MIB subtree, as in this example:
% snmptranslate -Tp .iso.org.dod.internet.mgmt.mib-2.system
+--system(1)
|
+-- -R-- String sysDescr(1)
| Textual Convention: DisplayString
| Size: 0..255
+-- -R-- ObjID sysObjectID(2)
+-- -R-- TimeTicks sysUpTime(3)
+-- -RW- String sysContact(4)
| Textual Convention: DisplayString
| Size: 0..255
...
I've truncated the output after four entries.
snmptranslate can also provide detailed information about a specific MIB item, as in this example using the sysLocation leaf:
% snmptranslate -Td .iso.org.dod.internet.mgmt.mib-2. \
system.sysLocation
1.3.6.1.2.1.1.6
sysLocation OBJECT-TYPE
-- FROM SNMPv2-MIB, RFC1213-MIB
-- TEXTUAL CONVENTION DisplayString
SYNTAX OCTET STRING (0..255)
DISPLAY-HINT "255a"
MAX-ACCESS read-write
STATUS current
DESCRIPTION "The physical location of this
node (e.g., 'telephone closet, 3rd
floor'). If the location is unknown,
the value is the zero-length string."
::= { iso(1) org(3) dod(6) internet(1) \
mgmt(2) mib-2(1) system(1) 6 }
However, the most important searching feature, finding the location
within the tree of a specific leaf, is not provided automatically by snmptranslate.
This command will provide that information for the memTotalReal item:
% snmptranslate -Ts | grep memTotalReal\$
.iso.org.dod.internet.private.enterprises. \
ucdavis.memory.memTotalReal
This item, the total real memory present on a system, is located at the specified
point within the hierarchy. A slightly more complex command can provide both
the full location and a description for a MIB leaf:
% snmptranslate -Td `snmptranslate -Ts | grep memTotalReal\$`
I use it often enough that I've defined an alias for this command:
% alias snmpwhat 'snmptranslate -Td `snmptranslate -Ts | grep \!:1\$`'
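The alias uses C shell history syntax (\!:1) for its argument. For Bourne-style shells, a function can stand in for it. This is a sketch, assuming Net-SNMP's snmptranslate is on the PATH. Unlike the alias, it anchors the match at a component boundary and loops over the results in case more than one name matches.

```shell
# snmpwhat LEAF: show the full location and description of a MIB leaf.
snmpwhat() {
    # -Ts lists every known object by its full dotted name; keep only the
    # names whose final component is exactly LEAF.
    snmptranslate -Ts | grep "\.$1\$" | while IFS= read -r oid; do
        # -Td prints the detailed definition for one object.
        snmptranslate -Td "$oid" < /dev/null
    done
}

# Typical use (not run here): snmpwhat memTotalReal
```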
Unusual Pattern Matching Requirements
I'll conclude this article with a quick look at two searching/pattern
matching topics that can be a bit tricky.
Filtering Foreign Language Email
Like many people, I use procmail to preprocess mail messages, including attempting
to remove spam. My current recipes work reasonably well for mail messages in
Western languages, but they fail for ones in many other languages (e.g., Japanese,
Chinese, Russian). Currently, I get 15-20 such spam messages each day.
Some people deal with this situation by discarding all email from the corresponding countries, but this approach does not work for me as I get legitimate mail from these countries on a regular basis (from unpredictable senders). What I needed was a procmail recipe to identify the foreign characters, which are above the normal ASCII range. The trick here is to get all of these characters into the .procmailrc file. This is easiest to do by entering them on a system/application that supports two-byte characters. The next step is to copy that file in binary mode to the system where procmail is run, where its contents can be pasted into the initialization file.
A quick and dirty procmail recipe will look something like this when viewed with most text editors:
:0BH:
* [\200\201\202...\377][\200\201\202...\377][\200\201\202...\377]
$MAILDIR/foreign_spam
For me, three such characters in a row was a good enough first attempt at
solving this problem. There are many more elegant solutions available on the
Web. One of the best is by Walter Dnes, and it is available at:
http://www.waltdnes.org/email/chinese/index.html
It takes advantage of procmail's weighting capabilities to detect messages
containing more than 5% non-ASCII characters.
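The percentage test can be illustrated outside procmail. The following shell sketch is not the recipe from that page and does not use procmail's scoring. It only shows the underlying measurement: counting the bytes outside the 7-bit ASCII range (octal 200 through 377) in a message read from standard input. The function name nonascii_percent is hypothetical.

```shell
# nonascii_percent: print the integer percentage of bytes on stdin that lie
# outside the 7-bit ASCII range.
nonascii_percent() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                                  # save the message
    total=$(( $(wc -c < "$tmp") ))                # all bytes
    # Delete every ASCII byte; whatever remains is the high-bit count.
    high=$(( $(LC_ALL=C tr -d '\000-\177' < "$tmp" | wc -c) ))
    rm -f "$tmp"
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( 100 * high / total ))
    fi
}

# Typical use (not run here): nonascii_percent < message.txt
```

A result above 5 would correspond to the threshold mentioned above.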
Less Well-Known Regular Expression Constructs
Most people are familiar with the asterisk, plus sign, and question mark modifiers
to regular expression items (match zero or more, one or more, or zero or one
of the item, respectively). However, you can specify how many of each item should
be matched even more precisely using some extended regular expression constructs
(use egrep or grep -E):
Form Meaning
{n} Match exactly n of the preceding item.
{n,} Match n or more of the preceding item.
{n,m} Match at least n and no more than m of the preceding item.
Here are some simple examples:
% grep -E "t{2}" bio
She has written eight books, including
Essential Cultural Studies from Pitt. When
she's not writing
% grep -E "[0-9]{3,}" bio
network of Unix and Windows NT/2000/XP
systems. She
% grep -E "(the ){2,}|(and ){2,}" bio
and and creating murder mystery games. She
you'd like to receive the the free newsletter
The first command searches for double t's; the second command looks for
numbers of three or more digits; and the third command searches for two or more
consecutive instances of the word "the" or of the word "and" (it's a primitive
copy editor). You might be tempted to formulate the final item as:
(the |and ){2,}
However, this won't work, as it will also match "and the", which
is not generally an error.
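The difference between the two forms is easy to demonstrate on a two-line sample. In this sketch the function names are hypothetical; the patterns are the ones given above.

```shell
# doubled_words: print input lines containing a repeated "the" or a
# repeated "and" (each alternative must repeat on its own).
doubled_words() {
    grep -E '(the ){2,}|(and ){2,}'
}

# tempting_but_wrong: the single-group form, which accepts any mixture of
# the two words and therefore also flags "and the".
tempting_but_wrong() {
    grep -E '(the |and ){2,}'
}

# Typical use (not run here): doubled_words < bio
```

Given the lines "salt and the pepper" and "receive the the free newsletter", the first function reports only the second line, while the second function reports both.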
Finally, be aware that the construct {,m}, which might mean "match m or fewer of the preceding item," is not defined.
Æleen Frisch is a systems administrator currently looking after a
pathologically heterogeneous collection of computers. She is also the author
of Essential System Administration, just released in an expanded third
edition, the new System Administration Pocket Reference, as well as several
other books. She can be reached by email at: aefrisch@lorentzian.com.