Deploying Squid, Part 1 of 2
by Jeff Dean
02/07/2000
In this two-part technical tutorial we'll explore the deployment of a web
proxy cache, sometimes referred to as a Web cache or a proxy server, for a small
to medium-sized corporate enterprise. A web proxy cache is surprisingly easy to
implement and maintain, and when built using open-source software it can be
quite economical as well.
Why cache?
Bandwidth on a corporate Internet connection is a valuable and often critical
business resource, and in most cases is also fairly expensive. Unfortunately,
even for small and medium-sized companies that precious lifeline can become
consumed by Web traffic from the company's own internal systems. This leads to a
slow and unresponsive connection during peak work hours. An analysis of the Web
surfing habits of a company's user population will often show a number of "hot"
Web sites, such as competitors, stock tracking sites, and items of personal
interest to employees. Visits by multiple individuals to hot sites lead to
inefficiencies, because each client browser must use the relatively slow
corporate Internet connection to fetch the same data.
Popular browsers help to reduce inefficiencies by locally caching Web
objects. This reduces demand and improves performance for the individual user,
but browser caches aren't shared across an enterprise. The implementation of a
web proxy cache can save additional bandwidth. Just what is a Web cache? Simply
put, it's
an intermediary (or proxy) computer system between Web browsers and Internet Web
servers. Instead of sending requests for Web pages directly to origin servers on
the Internet, browsers instead contact a web proxy cache server on the local
high-speed network, which in turn contacts the origin server on behalf of the
browser. The proxy fetches the object from the Internet and forwards it back to
the browser, but also keeps a copy for itself. Subsequent requests for the same
object from any browser in the enterprise won't require a visit to the origin
server. They can be fulfilled locally from the Web cache. This has the effect of
speeding response time for everyone and reducing bandwidth demands on the
Internet connection. It is not unusual for a cache system to reduce demand by
20-30%.
A variety of web proxy cache products are available. Some are free while
others are very expensive, particularly for large corporate environments. When
per-user licensing fees are evaluated for some of the products, the costs can
become prohibitive. Fortunately, at least one mature, reliable, and popular Open
Source alternative exists: the Squid web proxy cache. Squid is funded by the US
National Science Foundation and is developed through the unpaid contributions of
many volunteers. Squid is free, licensed under the GNU General Public License.
Squid runs on nearly all flavors of Unix, including Linux and FreeBSD.
System Requirements
A web proxy cache requires a generous amount of memory and a fast disk I/O
subsystem. Memory is needed to maintain lists of cached objects, and disks must
be capable of keeping up with a steady flood of random reads and writes.
Typically processor speed is not a limiting factor, and a modest processor can
make a satisfactory proxy server given the appropriate I/O and memory
configuration.
In this tutorial, we'll be configuring Squid for a pair of Intel systems
running Linux and intended to serve up to 2000 client browsers. Since Internet
demand and usage patterns are site-specific, your site may need more or less
hardware as your needs dictate. For the purposes of this example, the following
specifications are adequate:
- Single-processor Intel PentiumPro-200 or better
- 256MB RAM
- Ultra-Wide SCSI Interface
- Three 4GB Ultra-Wide SCSI disks (no RAID)
- Redhat Linux 6.0
- Squid web proxy cache
In our example configuration we'll begin with a working Redhat Linux 6.0
system (including the gcc C compiler) on ultra-wide SCSI disk
/dev/sda. This disk will also hold Squid and its log files.
Two more disks, /dev/sdb and /dev/sdc, will contain
the cached Web objects. To start, the cache disks are assumed to contain unused
ext2 (Linux native) partitions /dev/sdb1 and
/dev/sdc1. By placing the cache on multiple disks, we increase
cache performance. This distributes I/O and takes advantage of Squid's ability
to manage multiple cache disks simultaneously. (If you are configuring Squid for
a small installation, you may choose to cache to your system disk instead.) For
even better performance, we could place the disks on separate SCSI channels.
Note that IDE disk interfaces are not recommended for heavily loaded proxy
servers because of the inherent random nature of the cache I/O.
We'll be installing Squid into its default location,
/usr/local/squid. Make the /usr partition large enough to hold Squid's log
files, which can grow very large on a production server. We will also run Squid
under a special user created for the purpose, appropriately called "squid",
with a special group also called "squid".
A web proxy cache will write a large number of small files in its cache
directories. Therefore, you should create the filesystems for the two cache
disks with a relatively large number of inodes. If the inode configuration is
new to you, don't worry about it at this point - it's easy to reconfigure the
cache disks later if necessary.
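As a rough sizing aid, the number of objects a cache disk might hold can be
estimated by dividing its cache size by an average object size. The sketch
below assumes the 3500 MB per-disk cache size used in Listing 1 and a 13 KB
average object, which is the ratio implied by the "estimated ... objects" line
in Listing 2. Both values, and the function name, are assumptions to adjust for
your own site.

```shell
#!/bin/sh
# Estimate how many objects (and so roughly how many inodes) one cache
# disk might need. Both defaults are assumptions taken from the listings.
CACHE_MB=${CACHE_MB:-3500}          # per-disk cache size from Listing 1
AVG_OBJECT_KB=${AVG_OBJECT_KB:-13}  # assumed average object size

estimate_objects() {
    # $1 = cache size in MB, $2 = average object size in KB
    echo $(( $1 * 1024 / $2 ))
}

objects=$(estimate_objects "$CACHE_MB" "$AVG_OBJECT_KB")
echo "estimated objects per cache disk: $objects"
```

Each cached object occupies one file, so the filesystem needs at least this
many inodes plus some headroom for the directory hierarchy. mke2fs accepts a
bytes-per-inode ratio through its -i option, so a command along the lines of
"mke2fs -i 4096 /dev/sdb1" (which erases the partition) would leave a 4GB disk
with several times more inodes than this estimate.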
Getting and Compiling Squid
While you may find a current precompiled binary package for your system,
we'll compile Squid from source code for this tutorial. Squid compiles easily
and offers complete control over where it is installed.
First, create the squid user (under Redhat Linux, this also creates the squid
group). The user must exist before ownership can be assigned to it:
# useradd -d /usr/local/squid squid
Next, create directories for Squid:
# mkdir -p /usr/local/squid/src
Then set ownership and the SGID permission on the top level and source
directories. This ensures that all new files have the squid group owner,
allowing multiple sysadmins to manage Squid without using root privilege:
# chown -R squid:squid /usr/local/squid
# chmod g+s /usr/local/squid /usr/local/squid/src
Use your browser or FTP client to transfer the Squid source distribution from
the Squid web proxy cache download page. As of February 1, 2000, the latest
version of Squid is known as "2.3.STABLE1," the version we'll use in this
example (you should be able to implement any recent stable release without
difficulty).
The squid source is stored in a compressed tar file, which should be placed
in the new src directory. Unpack the compressed tar file:
# cd /usr/local/squid/src
# tar zxvf squid-2.3.STABLE1-src.tar.gz
This will leave you with the entire source directory tree under
squid-2.3.STABLE1. There are helpful documents in the
doc directory, including a quick-start guide and installation
instructions. It's worth poking around at this point to familiarize yourself
with the version of Squid you're using. Next, build the software:
# cd /usr/local/squid/src/squid-2.3.STABLE1
# ./configure
The automatic configuration process will profile your system to determine
exactly what capabilities exist. You shouldn't have difficulty with this
process, but if issues do arise the error messages from configure should help
you find quick resolutions.
Next, we compile Squid using the supplied Makefile:
# make
The compilation should take between a few minutes and an hour depending on
your system's performance. When the compilation has completed without reporting
errors, install it:
# make install
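The last command creates a directory hierarchy under
/usr/local/squid, including bin (executables like
squid itself and its utilities), etc (configuration), and
logs (Squid log files). Note that there are no cache directories
set up at this point. To create them, we'll need to mount the two disks we set
aside for the task:
# mkdir /usr/local/squid/cache0 /usr/local/squid/cache1
# mount -t ext2 /dev/sdb1 /usr/local/squid/cache0
# mount -t ext2 /dev/sdc1 /usr/local/squid/cache1
Squid will run as the squid user, so that user must be able to write to the
log and cache directories. A freshly mounted filesystem and the directories
installed by root are owned by root, so hand them over:
# chown squid:squid /usr/local/squid/logs /usr/local/squid/cache0 /usr/local/squid/cache1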
We now need to create a configuration file for Squid, stored in
/usr/local/squid/etc/squid.conf. Listing 1 contains a basic file
that you can use to get started. Later, you'll want to customize your
configuration.
After creating squid.conf, you're ready to build your cache directories. The
cached objects are stored in a large hierarchy. Its framework must be created
before launching Squid for the first time. To initiate the cache build use the
-z option to squid:
# /usr/local/squid/bin/squid -z
This will exercise your disks for a while as the hierarchy is created. When
it completes, you're ready to start Squid for the first time:
# /usr/local/squid/bin/squid -Ns &
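Once the -z run has finished, the hierarchy it created can be sanity-checked.
The last two numbers on each cache_dir line in Listing 1 (16 and 256) are the
counts of first-level and second-level directories, so each cache directory
should end up with 16 x 256 second-level directories. The following is a
minimal sketch of that check. The function names are illustrative, not part of
Squid.

```shell
#!/bin/sh
# Compare the number of second-level cache directories on disk with
# the number implied by the cache_dir line (L1 x L2).
L1=16    # first-level directories, from Listing 1
L2=256   # second-level directories under each first-level one

expected_dirs() {
    # $1 = first-level count, $2 = second-level count
    echo $(( $1 * $2 ))
}

count_l2_dirs() {
    # Count the directories exactly two levels below $1.
    n=0
    for d in "$1"/*/*/; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

echo "expected per cache_dir: $(expected_dirs "$L1" "$L2")"
# Example, using a path from this tutorial:
#   count_l2_dirs /usr/local/squid/cache0
```

If the count for a cache directory is lower than expected, the -z run may have
been interrupted or may have lacked write permission there.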
To verify that squid is running, take a look at
/usr/local/squid/logs/cache.log, the file named by cache_log in Listing 1. You
should see something like Listing 2, ending in "Ready to serve requests." Squid
should now be ready to accept requests from browsers.
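The same check can be scripted. This sketch assumes the cache_log path from
Listing 1 and simply searches the log for the "Ready to serve requests" text.
The function name squid_ready is made up for this example.

```shell
#!/bin/sh
# Report whether Squid has logged its "ready" message.
CACHE_LOG=${CACHE_LOG:-/usr/local/squid/logs/cache.log}  # cache_log in Listing 1

squid_ready() {
    # Succeeds if the log file $1 contains the ready message.
    grep -q "Ready to serve requests" "$1" 2>/dev/null
}

if squid_ready "$CACHE_LOG"; then
    echo "Squid reports it is ready"
else
    echo "no ready message found in $CACHE_LOG"
fi
```

Because the log accumulates across restarts, a match may come from an earlier
run. Checking the timestamp on the matching line is a sensible extra step.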
Access Control
Before moving on to the browser side of things, let's stop to
consider some basic security issues involved with using a cache (my
thanks to Michael Alan Dorman for raising this important issue). Your
intended purpose for deploying a cache will imply an intended user
base. In the case of a small to medium sized enterprise, for whom
this tutorial is intended, the users are usually the employees of the
company, who access the Internet from their internal private LAN. A
web cache becomes part of the larger security infrastructure,
including firewalls, mail servers, and other technologies. In many
such cases the web cache can be deployed behind the firewall because
it is intended for access only by users on the LAN. In this
configuration, security for the cache server isn't a significant
concern because only trusted users have access to it.
However, your situation may dictate that you deploy your Squid
system outside your firewall so that it is publicly available on the
Internet. In this scenario, security rises to the top of the priority
list. As Mike Dorman points out, an unsecured web proxy can be unexpectedly
abused by unauthorized outsiders.
To prevent such abuse, you can create an access control methodology
to selectively offer caching services only to users you trust. Squid
offers this capability through administrator-defined Access Control
Lists (ACLs), which can be used to create finely detailed access
control schemes. Limitations can be placed on client addresses,
destination domains, time of day, port numbers, access methods,
browsers, and even users. While a complete treatment of Squid ACLs is
far beyond the scope of this tutorial, a simple client-address ACL
scheme has been included in the Squid configuration shown in Listing 1. The first part of the ACL setup
involves the definition of access groups:
acl all src 0.0.0.0/0.0.0.0
acl mynet src 192.168.1.0/255.255.255.0
The first line defines the group all that includes all
possible IP addresses. The second defines a small subgroup of
addresses called mynet on the private network 192.168.1.0 (this
is just an example - your address configuration will be different).
It is only users from mynet that we wish to allow access to the
cache, which leads us to the second part of the ACL setup:
http_access allow mynet
http_access deny all
Squid checks http_access lines in the order they appear and acts on
the first one that matches, so the order matters. Here, we first
grant access to addresses in mynet, and then deny http access to
every other address as defined in group all. The effect is that
systems coming from addresses outside of mynet will not be able to
access Squid while those inside have full access.
While effective, this ACL configuration only scratches the surface
of Squid's capability. A thorough review of ACL usage is essential
prior to deployment of a publicly available cache.
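Squid evaluates http_access lines from top to bottom and stops at the first
one whose ACL matches, which is why an "allow" rule for a trusted network has
to appear before a catch-all "deny". The toy model below illustrates that
first-match behavior. It is not Squid code: the functions in_acl and decide are
hypothetical, and the address test is a simple pattern match for the example
192.168.1.0 network.

```shell
#!/bin/sh
# Toy model of first-match rule evaluation (illustration only).
# Each rule is written "action aclname" and rules are tried in order.

in_acl() {
    # $1 = group name, $2 = client address
    case "$1" in
        all)   return 0 ;;                                    # every address
        mynet) case "$2" in 192.168.1.*) return 0 ;; esac ;;  # example network
    esac
    return 1
}

decide() {
    # $1 = client address, remaining arguments = rules in order
    addr=$1; shift
    for rule in "$@"; do
        action=${rule%% *}   # text before the first space
        acl=${rule#* }       # text after the first space
        if in_acl "$acl" "$addr"; then
            echo "$action"
            return 0
        fi
    done
    echo deny   # nothing matched; this toy model simply refuses
}

decide 192.168.1.30 "allow mynet" "deny all"   # prints: allow
decide 10.0.0.7     "allow mynet" "deny all"   # prints: deny
decide 192.168.1.30 "deny all" "allow mynet"   # prints: deny
```

The third call shows the pitfall: with "deny all" listed first, even an
address inside mynet is refused because the first rule already matched.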
Browser Configuration
To test Squid, we'll manually configure a browser to use Squid instead of
origin servers. In Netscape Communicator, this is done using the Edit
-> Preferences -> Advanced -> Proxies dialog. Select "Manual
Proxy Configuration" and click on "View". For each protocol, enter the IP
address of your Squid machine and port number 3128, the default port on which
Squid listens for inbound requests. Save your changes and try browsing a site
you're familiar with. If everything is working correctly, you should be able to
browse as before. The difference is that Squid is now acting as an intermediary,
keeping copies of the pages you view in its cache. To see Squid's activity,
watch its access log:
# tail -f /usr/local/squid/logs/access.log
You should see a line in that file for each request from browsers. An example
is given in Listing 3, showing the time (since the Unix epoch), requesting IP
address, URL, etc. Each line also indicates the status of the request with
respect to the cache, such as TCP_HIT, TCP_MISS, or
TCP_MEM_HIT, among others. Status codes that include the word
HIT indicate that the request was served from the cache.
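To get a feel for how much work the cache is doing, the status field can be
tallied. The sketch below assumes the log layout shown in Listing 3, with one
request per line and the cache status in the fourth whitespace-separated
field. The function name hit_ratio is illustrative.

```shell
#!/bin/sh
# Summarize how many logged requests were served from the cache.
hit_ratio() {
    # $1 = path to an access log in the layout shown in Listing 3
    awk '
        NF >= 4 { total++; if ($4 ~ /HIT/) hits++ }
        END {
            if (total == 0) { print "no requests logged"; exit }
            printf "%d of %d requests were hits (%.1f%%)\n",
                   hits, total, 100 * hits / total
        }
    ' "$1"
}

# Example, using the log path from Listing 1:
#   hit_ratio /usr/local/squid/logs/access.log
```

This counts requests rather than bytes, so it only approximates the bandwidth
saved. A byte-weighted figure would sum the size field for hits and misses.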
If everything has gone well up to this point, you should have a functional Squid
configuration that serves requests from multiple browsers.
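A quick check from the command line is also possible where a tool such as
curl is installed. The sketch below is an assumption-laden example rather than
part of the Squid distribution: the proxy address 192.168.1.1 is a placeholder
for your Squid machine, 3128 is the default port mentioned above, and the
function names are made up.

```shell
#!/bin/sh
# Fetch a URL through the proxy and print the HTTP status code.
PROXY_HOST=${PROXY_HOST:-192.168.1.1}  # placeholder; use your Squid machine
PROXY_PORT=${PROXY_PORT:-3128}         # Squid's default HTTP port

proxy_url() {
    echo "http://$PROXY_HOST:$PROXY_PORT"
}

via_proxy() {
    # $1 = URL to request; -x tells curl which proxy to use
    curl -s -o /dev/null -w '%{http_code}\n' -x "$(proxy_url)" "$1"
}

echo "proxy under test: $(proxy_url)"
# Example:
#   via_proxy http://www.oreillynet.com/
```

Requesting the same URL twice and then looking at access.log should show
whether the second request was answered from the cache.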
Next Month
In the second part of this article, we'll complete our enterprise
installation of Squid, including:
- Configuration of automatic startup and shutdown for the Squid daemon.
- Configuration of Squid's Web-based management utility.
- Automation of proxy configuration for browsers.
- Setup of two Squid systems as peers, including the ability to share
their caches.
=========
Listing 1
=========
# squid.conf
#
# a basic configuration file for the Squid Proxy Web Cache
# set logging to the lowest level
debug_options ALL,1
# define group "all" that encompasses all possible IP addresses
# and group "mynet" that represents my class-C network:
acl all src 0.0.0.0/0.0.0.0
acl mynet src 192.168.1.0/255.255.255.0
# define an access control for group "mynet" to allow http access,
# and another for group "all" to deny http access.
#
# Squid acts on the first http_access line that matches, so the
# allow rule must come before the deny rule. The effect is to
# prohibit access to the cache by any address that doesn't satisfy
# the criteria established in group "mynet".
http_access allow mynet
http_access deny all
# set Squid's user and group
cache_effective_user squid
cache_effective_group squid
# set log directories
cache_access_log /usr/local/squid/logs/access.log
cache_log /usr/local/squid/logs/cache.log
# set cache directories of 3.5GB each
cache_dir ufs /usr/local/squid/cache0 3500 16 256
cache_dir ufs /usr/local/squid/cache1 3500 16 256
# set the cache memory target for the Squid process
cache_mem 80 MB
# the mailbox of the sysadmin
cache_mgr root@localhost
=========
Listing 2
=========
2000/02/01 03:12:10| Starting Squid Cache version 2.3.STABLE1 for i686-pc-linux-gnu...
2000/02/01 03:12:10| Process ID 1188
2000/02/01 03:12:10| With 1024 file descriptors available
2000/02/01 03:12:10| Performing DNS Tests...
2000/02/01 03:12:10| Successful DNS name lookup tests...
2000/02/01 03:12:10| DNS Socket created on FD 5
2000/02/01 03:12:10| idnsParseResolvConf: nameserver 209.195.201.3
2000/02/01 03:12:10| idnsAddNameserver: Added nameserver #0: 209.195.201.3
2000/02/01 03:12:10| idnsParseResolvConf: nameserver 209.195.192.3
2000/02/01 03:12:10| idnsAddNameserver: Added nameserver #1: 209.195.192.3
2000/02/01 03:12:10| Unlinkd pipe opened on FD 10
2000/02/01 03:12:10| Swap maxSize 1024000 KB, estimated 78769 objects
2000/02/01 03:12:10| Target number of buckets: 1575
2000/02/01 03:12:10| Using 8192 Store buckets
2000/02/01 03:12:10| Max Mem size: 40960 KB
2000/02/01 03:12:10| Max Swap size: 1024000 KB
2000/02/01 03:12:10| Rebuilding storage in /usr/local/squid/cache0 (CLEAN)
2000/02/01 03:12:10| Rebuilding storage in /usr/local/squid/cache1 (CLEAN)
2000/02/01 03:12:10| Set Current Directory to /usr/local/squid/cache0
2000/02/01 03:12:10| Loaded Icons.
2000/02/01 03:12:10| Accepting HTTP connections at 0.0.0.0, port 3128, FD 14.
2000/02/01 03:12:10| Accepting ICP messages at 0.0.0.0, port 3130, FD 15.
2000/02/01 03:12:10| WCCP Disabled.
2000/02/01 03:12:10| Ready to serve requests.
=========
Listing 3
=========
949393249.739 393 192.168.1.30 TCP_MISS/000 526 GET http://oreilly.linux.com/ - DIRECT/oreilly.linux.com -
949393253.010 19 192.168.1.30 TCP_HIT/200 1699 GET http://www.oreillynet.com/onstyle.css - NONE/- text/css
949393253.837 572 192.168.1.30 TCP_MISS/200 529 GET http://adforce.imgis.com/? - DIRECT/adforce.imgis.com application/x-javascript