Protocol Design: Sessions
by Itamar Shtull-Trauring
January 20, 2004
Many protocols require sending a number of messages or commands
that are connected in one way or another. Sessions, a way of
grouping together multiple messages, occur on all levels of protocol
design, from the low-level parts of the TCP/IP stack to high-level
constructs built on top of other protocols.
To understand the concept of sessions, it's best to start with UDP. UDP is a thin
layer on top of IP, the underlying protocol of the Internet that deals
with delivering messages between different hosts. UDP traffic is composed of
datagrams, self-contained sequences of bytes typically less than 1500
bytes long. If the sender sends 500 bytes, the receiver will receive
those 500 bytes as a single unit, separately from any other
datagrams. UDP datagrams sent from one host to another may get
lost, duplicated, or arrive out of order. As a means of communication,
UDP is somewhat like mailing letters. Sometimes the post office screws
up and the letter never arrives, and the recipient of the letter may
be getting other letters from other people. There isn't necessarily
any way of knowing which of these letters are connected; some
correspondents may write multiple letters about different
subjects.
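To make the datagram behaviour concrete, here is a minimal Python sketch that sends datagrams between two sockets on the local machine. The function name send_and_receive and the use of the loopback address are assumptions made for illustration. On loopback, loss is unlikely, so the timeout is only there to avoid waiting forever.

```python
import socket


def send_and_receive(payloads, timeout=2.0):
    """Send each payload as one UDP datagram over loopback; return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)     # a lost datagram raises a timeout error
        address = receiver.getsockname()
        received = []
        for payload in payloads:
            sender.sendto(payload, address)
            # Each recvfrom call returns exactly one datagram, never a
            # fragment of one or two datagrams glued together.
            data, _source = receiver.recvfrom(2048)
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()
```

Calling send_and_receive([b"a" * 500, b"b" * 300]) should hand back one 500-byte unit and one 300-byte unit, each separate from the other.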
For some simple protocols UDP is a good fit. One good example is
DNS, which is used to turn human-readable domain names into IP
addresses. The DNS client sends a request datagram to a DNS server
asking for information about "www.example.com", and at some point in
the future the client will receive a datagram indicating
"www.example.com" has the IP address 192.0.34.166. Duplicate messages
can be ignored, and if no response is received the client can try
again.
Other protocols have more complex needs. What if the server, in
response to a request, asks for more information (e.g. for a username
and password to authorize the request)? Likewise, how can large
amounts of data be sent over UDP? One partial solution is to have each
datagram contain all necessary data. If a client receives a response
requesting authorization, it will send a new request datagram that
contains all of the original data together with the additional
required information, a username and password. Since UDP packets have
a maximum size, this solution doesn't solve the second problem. Even
if it were possible, sending multiple copies of the same information
is inefficient.
The second solution to these issues is the concept of a
session. Multiple datagrams will be identified as being in the same
session. This can be done based on the source IP and port of the
datagrams, i.e. all datagrams from a specific sender are considered a
session. Another way of grouping datagrams is using a unique session
identifier, included in each datagram. All datagrams that have the
same identifier are in the same session. Once datagrams are in a
session they can refer to the contents of previous datagrams in that
session without having to resend all the information if the protocol
requires this. It's equivalent to having a filing cabinet that stores
old letters sorted based on the sender of the letter. Without the
concept of a session, there'd be no way of knowing to which datagram
another datagram is referring.
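The filing cabinet can be sketched in a few lines of Python. The datagram layout used here (a session identifier, a "|" separator, then the payload) and the SessionTable class are hypothetical, chosen only to illustrate grouping by identifier.

```python
from collections import defaultdict


class SessionTable:
    """Groups datagram payloads by the session identifier they carry."""

    def __init__(self):
        # session id -> payloads received so far, in arrival order
        self._history = defaultdict(list)

    def handle(self, datagram: bytes) -> list:
        """File one datagram and return everything seen in its session."""
        # Hypothetical layout: b"<session id>|<payload>"
        session_id, _separator, payload = datagram.partition(b"|")
        self._history[session_id].append(payload)
        return list(self._history[session_id])
```

With this in place, a follow-up datagram such as b"a1|user=alice" only needs to carry the new information; the earlier request in session a1 is already on file, and datagrams from other sessions are kept apart.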
Sessions don't solve another issue with UDP, the fact that datagram delivery
is unreliable, with possibly undesirable side effects. Datagrams may be
duplicated (requesting the download of a 200MB file twice) or arrive out of
order (a "delete item 1" command may arrive before the "backup item 1"
command). To solve this, a protocol can use a message counter. The first
datagram contains, in addition to its session id, the fact that it is message
1. The second datagram contains the fact that it is message 2 and so on. The
recipient can then re-order received datagrams if they arrive in the wrong
order by sorting them based on their message counter. Datagram loss can be
handled by the recipient asking for missing messages after a certain timeout
("resend message 2, I never got it") or by having the recipient acknowledge all
received messages. If the sender doesn't receive acknowledgment for a specific
message after a certain amount of time, the sender will resend it.
Duplicates are easy to detect as they have duplicate message counters. TFTP
uses the second method, acknowledgments, to transfer files over UDP in a
reliable fashion.
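The receiving side of a message counter scheme might look roughly like the following Python sketch. The OrderedReceiver class and its method names are made up for this example, and it assumes counters start at 1 as in the description above; it is not how TFTP or any particular protocol is implemented.

```python
class OrderedReceiver:
    """Delivers messages in counter order, dropping duplicates."""

    def __init__(self):
        self.next_expected = 1  # counters are assumed to start at 1
        self.pending = {}       # counter -> payload, held until earlier ones arrive
        self.delivered = []     # payloads released to the application, in order

    def receive(self, counter, payload):
        """Accept one datagram; return the counter so the caller can acknowledge it."""
        is_old = counter < self.next_expected
        if not is_old and counter not in self.pending:
            self.pending[counter] = payload
        # Otherwise it is a duplicate: ignore the payload, but still acknowledge.
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return counter

    def missing(self):
        """Counters to ask the sender for ("resend message 2, I never got it")."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [n for n in range(self.next_expected, highest)
                if n not in self.pending]
```

If "delete item 1" arrives as message 2 before "backup item 1" arrives as message 1, the receiver holds the delete back, reports message 1 as missing, and releases both in the right order once the backup command shows up.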
Implementing all of this from scratch for each protocol would be a pretty
wasteful effort, and this is where TCP comes in. Implemented on top of IP,
using more sophisticated versions of these methods, TCP provides connections,
that is, reliable, ordered streams of bytes. A client opens a connection to a
server, sends bytes over the connection and receives bytes from the server, and
then at some point either side may end the connection. Unlike UDP, bytes are
not grouped into datagrams, so if the sender writes 500 bytes and then later
500 bytes, this may arrive on the recipient side as 1000 bytes or as 900 bytes
and then 100 bytes. Implementing sessions on top of TCP is trivial: all that is
required is to match a session to a connection. All messages sent over a single
connection are considered to be a single session. If UDP is similar to the
postal system, a TCP connection is a lot like a telephone conversation: a
continuous stream of information that requires no extra effort to tie
together.
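Because TCP does not preserve the boundaries between writes, a protocol built on it has to mark where each message ends. Line-based protocols such as POP3 end each line with CRLF. The following Python sketch, with a hypothetical LineBuffer class, shows one way to recover complete lines no matter how the bytes happen to be split on arrival.

```python
class LineBuffer:
    """Reassembles CRLF-terminated lines from arbitrarily split chunks."""

    def __init__(self):
        self._buffer = b""  # bytes received that do not yet form a full line

    def feed(self, chunk: bytes) -> list:
        """Add received bytes; return any lines that are now complete."""
        self._buffer += chunk
        # Everything before the last CRLF is complete; the remainder
        # (possibly empty) stays buffered until more bytes arrive.
        *lines, self._buffer = self._buffer.split(b"\r\n")
        return lines
```

Feeding it b"LIST\r\nRE" yields one complete line, LIST, and the later chunk b"TR 1\r\n" completes RETR 1.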
Consider the following transcript of a session in POP3, a TCP-based protocol
used to retrieve email. The "APOP" command is used to authenticate
the user. The other commands, such as "LIST", cannot be run until
the user has authenticated. If the authentication is successful, the session's
commands (i.e. all commands sent over this specific TCP connection) are assumed
to refer to the specific user's mailbox.
S: +OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>
C: LIST
S: -ERR Invalid command
C: APOP mrose c4c9334bac560ecc979e58001b3e22fb
S: +OK mrose's maildrop has 2 messages (320 octets)
C: LIST
S: +OK 2 messages (320 octets)
S: 1 120
S: 2 200
S: .
C: RETR 1
S: +OK 120 octets
S: <the POP3 server sends message 1>
S: .
C: DELE 1
S: +OK message 1 deleted
C: QUIT
S: +OK dewey POP3 server signing off (1 message left)
POP3 is a stateful protocol. The session (tied to the connection) is a
state machine, and the server will support different commands depending on the
state. In the initial state only "APOP" or some other command that
authenticates the user is accepted. Once the user is in an authenticated state,
"LIST", "RETR" and so on will be accepted by the
server.
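A heavily simplified Python sketch of such a state machine follows. The state names AUTHORIZATION and TRANSACTION come from the POP3 specification; the Pop3Session class, the check_credentials callback and the abbreviated response texts are assumptions for illustration, and real command handling (mailbox access, digest verification) is left out.

```python
class Pop3Session:
    """One session per TCP connection; accepted commands depend on the state."""

    TRANSACTION_COMMANDS = {"LIST", "RETR", "DELE"}

    def __init__(self, check_credentials):
        self.state = "AUTHORIZATION"
        self.user = None
        # Hypothetical callback: (name, digest) -> True if the login is valid.
        self._check_credentials = check_credentials

    def handle(self, line: str) -> str:
        """Process one command line and return a simplified response."""
        parts = line.split()
        if not parts:
            return "-ERR Invalid command"
        command, args = parts[0].upper(), parts[1:]
        if command == "QUIT":
            self.state = "CLOSED"
            return "+OK signing off"
        if self.state == "AUTHORIZATION":
            if command == "APOP" and len(args) == 2 and self._check_credentials(*args):
                self.user = args[0]
                self.state = "TRANSACTION"
                return "+OK authenticated"
            return "-ERR Invalid command"
        if self.state == "TRANSACTION" and command in self.TRANSACTION_COMMANDS:
            # Later commands implicitly refer to self.user's mailbox.
            return "+OK"
        return "-ERR Invalid command"
```

The server would create one Pop3Session object per accepted connection, so the state and the authenticated user never leak between connections.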
Just because it's easy to support sessions using TCP doesn't mean it's
obligatory. Unlike POP3, HTTP does not tie the concept of a session to TCP
connections. A POP3 client cannot send an "APOP" command, get a
response, open a new connection and send "LIST". That just won't
work since the new connection will be considered a separate session. HTTP
clients on the other hand can send requests over a single TCP connection or
over multiple connections, and the only real difference will be speed (opening
multiple connections is slower). Further, HTTP servers can't assume that
all requests in the same connection are part of the same session; they might be
arriving from different clients via an HTTP proxy. Without the concept of a
session, HTTP is a stateless protocol, which is to say that it does not change
its protocol level behavior based on previous commands. Of course changes to
the underlying data storage (e.g. an HTTP request causing a database entry on
the server to be deleted) might affect the results of future requests.
When downloading static files, the lack of sessions is not important. Which
file was downloaded before the current one is not going to affect the contents
of the current download. Many HTTP applications, however, do need session support.
Shopping carts, for example, need to remember which items the user added to the
cart, which means matching multiple requests to a single session. In order to
make this possible, the concept of "cookies" was introduced to the
protocol, essentially session IDs sent along with each request. Other
mechanisms for implementing sessions are occasionally used as well. Having rejected
the concept of sessions tied to connections, HTTP developers were forced to
reintroduce sessions in other ways, all with drawbacks (e.g. browsers can
choose to disable cookie support) and all requiring extra effort to use.
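As a rough illustration of cookies acting as session IDs, here is a Python sketch of a shopping cart keyed by a cookie value. The CartServer class and the cookie name session_id are hypothetical; only the standard library's http.cookies and secrets modules are used, and a real application would also need expiry and other safeguards.

```python
import secrets
from http.cookies import SimpleCookie


class CartServer:
    """Matches requests to shopping carts using a session-ID cookie."""

    def __init__(self):
        self.carts = {}  # session id -> list of items

    def add_item(self, cookie_header, item):
        """Handle one "add to cart" request.

        cookie_header is the value of the request's Cookie header (or None).
        Returns (value for a Set-Cookie header or None, current cart contents).
        """
        cookie = SimpleCookie(cookie_header or "")
        morsel = cookie.get("session_id")
        session_id = morsel.value if morsel is not None else None
        set_cookie = None
        if session_id not in self.carts:
            # Unknown or missing ID: start a new session and tell the
            # browser to send this ID along with every later request.
            session_id = secrets.token_hex(16)
            self.carts[session_id] = []
            set_cookie = f"session_id={session_id}"
        self.carts[session_id].append(item)
        return set_cookie, list(self.carts[session_id])
```

Note that nothing here depends on which TCP connection a request arrived on; the cookie alone ties the requests together, which is also why a browser with cookies disabled gets a fresh, empty cart on every request.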
HTTP is an established protocol for web browsing, and changes need to be
within the framework of the existing protocol. New protocols that require tight
coupling, and which can't be modeled using one-off request/response
transactions, would in many cases be better off not using HTTP, nor should such
protocols follow its model. The need to support sessions, required by many
common types of interactions, involves extra
complexity that would not be necessary were a better underlying protocol
chosen. For these protocols, tying the session to the TCP connection is the
natural way to implement the requirement for sessions.
The concept of a session, a series of ordered, connected messages, is
fundamental to protocols that require long-term conversations. Sessions can be
implemented on a number of levels: in the protocol itself or using the
transport layer's concept of sessions, typically TCP. When the transport
layer's session support is not used, or the transport does not support
sessions, extra effort is required to add them, leading to a protocol that is
more complex and harder to implement.
The next installment in this series will discuss a related design
issue: dealing with parallel requests, including their interactions with
sessions.