How many times have you wondered how the developers at Hotmail or
Yahoo Mail process the attachments to your email? Rest assured that
you are not the only one. Too often, Java Internet developers
concentrate only on processing strings from an HTML form, and when the
boss asks whether they can do file upload, they have to do some
research before they can come back with an answer. Even respected Java
literature rarely discusses file upload.
With the growth of the Internet, file upload now plays a
significant role beyond email applications. Other
Internet/intranet applications such as Web-based document management
systems and the likes of "Secure File Transfer via HTTP" require
uploading files to the server extensively. This article discusses all
you need to know about file upload. But first things first. Before you
jump too excitedly into coding, you need to understand the underlying
theory: the HTTP request. Knowledge of the HTTP request is critical
because when you process an uploaded file, you work with raw data not
obtainable from an HttpServletRequest object's methods
such as getParameter, getParameterNames, or
getParameterValues.
The HTTP Request
Each HTTP request from the Web browser or other Web client
applications consists of three parts:
A line containing the HTTP request method, the Uniform Resource Identifier (URI), and the protocol with its version
HTTP Request headers
The entity body
These three parts are explained in the following sections.
The Request Method, URI and Protocol
The first subpart of the first part, the HTTP request method,
indicates the method used in the HTTP request. In HTTP 1.0, it could
be one of the following three: get, head, or post. HTTP 1.1 adds
several more, including delete, put, trace, and options. Method names
are case-sensitive and appear in uppercase on the request line itself
(GET, POST, and so on). The two methods used most frequently are get
and post. get is the default method. You use it, for example, when
you type a URL such as http://www.onjava.com in the Location or
Address box of your browser to request a page. The post method is
common too. You normally use it as the value of the <form> tag's
method attribute. When uploading a file from an HTML form, you must
use the post method.
The second subpart of the first part, the URI, specifies an Internet
resource. A URI is normally interpreted as being relative to the Web
server's root directory. Thus, it starts with a forward slash
(/) and has the following format.
/virtualRoot/pageName
For example, in a typical JavaServer Pages application the URI
could be the following.
/eshop/login.jsp
More information about URIs can be found in RFC 3986.
The third subpart of the first part is the protocol and the
protocol version understood by the requester (the browser). The
protocol must be HTTP and the version could be 1.0 or 1.1. Most Web
servers understand both versions and so can serve requests sent in
either one. If you are still using an old HTTP 1.0 Web server, you
could be in trouble if your users use modern browsers that send
requests using the HTTP 1.1 protocol.
Combined, the three subparts of the first component of an HTTP
request look like the following.
POST /virtualRoot/pageName HTTP/version
For instance:
POST /eshop/login.jsp HTTP/1.1
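To make the three subparts concrete, here is a minimal Java sketch that splits a request line into its method, URI, and protocol. The class and method names (RequestLineDemo, RequestLine, parse) are made up for this illustration, and the sketch assumes the subparts are separated by single spaces; a real server performs far more validation.

```java
public class RequestLineDemo {

    /** Holds the three subparts of the first line of an HTTP request. */
    public static final class RequestLine {
        public final String method;
        public final String uri;
        public final String protocol;

        RequestLine(String method, String uri, String protocol) {
            this.method = method;
            this.uri = uri;
            this.protocol = protocol;
        }
    }

    /** Splits a request line such as "POST /eshop/login.jsp HTTP/1.1". */
    public static RequestLine parse(String line) {
        // Assumption: exactly one space between the subparts.
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        if (!parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Unexpected protocol: " + parts[2]);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine rl = parse("POST /eshop/login.jsp HTTP/1.1");
        System.out.println("method   = " + rl.method);
        System.out.println("uri      = " + rl.uri);
        System.out.println("protocol = " + rl.protocol);
    }
}
```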
The HTTP Request Headers
The second component of an HTTP request consists of a number of
HTTP headers. There are four types of HTTP headers: general, entity,
request, and response. The first three are summarized in Tables 1, 2
and 3. Response headers are specific to the HTTP response and so are
not discussed here.
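Whatever their type, headers share one shape on the wire: a name, a colon, and a value, one header per line, with a blank line marking the end of the header section. The following Java sketch reads such lines into a map. The class and method names are hypothetical, and the sketch assumes each header fits on a single line.

```java
import java.util.Map;
import java.util.TreeMap;

public class HeaderParseDemo {

    /**
     * Parses "Name: value" lines into a map. Header names are
     * case-insensitive in HTTP, so the map ignores case in its keys.
     */
    public static Map<String, String> parseHeaders(String[] lines) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header: " + line);
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // A header that appears twice is folded into one comma-separated value.
            headers.merge(name, value, (oldValue, newValue) -> oldValue + ", " + newValue);
        }
        return headers;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Pragma: no-cache",
            "Content-Type: text/html",
            "Content-Length: 32345",
            ""
        };
        Map<String, String> headers = parseHeaders(lines);
        // The lookup succeeds even though the case differs from the input.
        System.out.println(headers.get("content-length"));
    }
}
```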
Table 1: HTTP General Headers
Header
Description
Pragma
The Pragma general header carries implementation-specific directives
that may apply to any recipient along the request/response chain. In
other words, pragmas tell the servers that relay this request to
behave in a certain way. The Pragma header may contain multiple
values. For example, the following line informs all proxy servers
that relay this request not to use a cached version of the object but
to download the object from the specified location:
Pragma: no-cache
Date
The Date general header represents the date and time at which the message was originated.
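As an illustration of the Date header, the following Java sketch formats and parses a date value. It assumes the fixed-length RFC 1123 layout expressed in GMT, which is the layout HTTP prefers; the class and method names are invented for this example.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class HttpDateDemo {

    // Assumed layout: RFC 1123 style, fixed length, always in GMT.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    /** Formats a date-time as the value of a Date header, converting to GMT. */
    public static String format(ZonedDateTime when) {
        return HTTP_DATE.format(when);
    }

    /** Parses a Date header value back into a date-time in UTC. */
    public static ZonedDateTime parse(String value) {
        return ZonedDateTime.parse(value, HTTP_DATE);
    }

    public static void main(String[] args) {
        // Print a complete Date header line for the current moment.
        System.out.println("Date: " + format(ZonedDateTime.now()));
    }
}
```

The same formatter can read and write the other date-valued headers, since they use the same layout.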
Table 2: HTTP Entity Headers.
Header
Description
Allow
This header lists the set of methods supported by the resource
identified by the requested URL. The purpose of this field is strictly
to inform the recipient of valid methods associated with the
resource. The Allow header is not permitted in a request using the
post method, and thus should be ignored if it is received as part of
a post entity. For instance,
Allow: GET, HEAD
Content-Encoding
This header is used to describe the type of encoding used on the
entity. When present, its value indicates the decoding mechanism that
must be applied to obtain the media type referenced by the
Content-Type header. For example,
Content-Encoding: x-gzip
Content-Length
This header indicates the size of the entity-body, in decimal number
of octets, sent to the recipient or, in the case of the head method,
the size of the entity-body that would have been sent had the request
been a get. Applications should use this field to indicate the size
of the entity-body to be transferred, regardless of the media type of
the entity. A valid Content-Length field value is required on all
HTTP/1.0 request messages containing an entity-body. Any
Content-Length value greater than or equal to zero is valid. For
example,
Content-Length: 32345
Content-Type
The Content-Type header indicates the media type of the entity-body
sent to the recipient or, in the case of the head method, the media
type that would have been sent had the request been a get. For
example,
Content-Type: text/html
Expires
The Expires header gives the date and time after which the entity
should be considered invalid. This allows information providers to
suggest the volatility of the resource or a date after which the
information may no longer be accurate. Applications must not cache
this entity beyond the date given. The presence of an Expires header
does not imply that the original resource will change or cease to
exist at, before, or after that time. However, information providers
that know or suspect that a resource will change by a certain date
should include an Expires header with that date. For example,
Expires: Thu, 29 Mar 2001 13:34:00 GMT
Last-Modified
The Last-Modified header indicates the date and time at which the
sender believes the resource was last modified. The exact semantics of
this field are defined in terms of how the recipient should interpret
it. If the recipient has a copy of this resource that is older than
the date given by the Last-Modified field, that copy should be
considered stale. For example,
Last-Modified: Thu, 10 Aug 2000 12:12:12 GMT
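The Content-Length entry in Table 2 counts octets, not characters, and the two differ as soon as the body contains non-ASCII text. The following Java sketch shows the difference and builds the matching entity headers. The helper names are hypothetical, and UTF-8 is assumed as the body encoding purely for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ContentLengthDemo {

    /** Returns the number of octets the body occupies when encoded as UTF-8. */
    public static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Builds Content-Type and Content-Length header lines for a body. */
    public static String entityHeaders(String body, String mediaType) {
        return "Content-Type: " + mediaType + "\r\n"
             + "Content-Length: " + contentLength(body) + "\r\n";
    }

    public static void main(String[] args) {
        String body = "name=Zo\u00eb"; // the last character needs two octets in UTF-8
        System.out.println(body.length());       // 8 characters
        System.out.println(contentLength(body)); // 9 octets
        System.out.print(entityHeaders(body, "text/plain"));
    }
}
```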
Table 3: HTTP Request Headers
Header
Description
From
The From header specifies who is taking responsibility for the
request. This field contains the email address of the user submitting
the request. For example,
From: dragonlancer@labsale.com
Accept
This header contains a comma-separated list of MIME representation
schemes that are accepted by the client. The server uses this
information to determine which data types are safe to send to the
client in the HTTP response. Although the Accept field can contain
multiple values, the Accept line itself can also be used more than
once to specify additional accept types (this has the same effect as
specifying multiple accept types on a single line). If the Accept
field is not used in the request header, the default accept types of
text/plain and text/html are assumed.
Accept-Encoding
This header is very similar to the Accept header in syntax. However,
it specifies the content-encoding schemes that are acceptable in the
response. For instance,
Accept-Encoding: x-compress, x-zip
Accept-Language
This header is also similar to the Accept header. It
specifies the preferred response language. The following example
specifies English as the accepted language:
Accept-Language: en
User-Agent
The User-Agent header, if present, specifies the name of the client
browser. The first word should be the name of the software followed by
a slash and an optional version number. Any other product names that
are part of the complete software package may also be included. Each
name/version pair should be separated by white space. This field is
used mostly for statistical purposes. It allows servers to track
software usage and protocol violations. For example,
User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
Referer
This header specifies the URI of the resource that contained the URI
in the request. In HTML, it would be the address of the page that
contained the link to the requested object. Like the User-Agent
header, this header is not required and serves mostly the server's
statistical and tracking purposes. For example,
Referer: http://localhost/Atoms/Details.htm
Authorization
The Authorization header contains authorization information. The first word contained in this header specifies the type of authorization system to use. Then, separated by white space, it should be followed by the authorization information such as a user name, password, and so forth. For example,
Authorization: user ken:dragonlancer
If-Modified-Since
This header is used with the GET method to make it conditional. Basically, if the object hasn't changed since the date and time specified by this header, the object is not sent. A local cached copy of the object is used instead. For example,