Cookies and session tracking
Michał Okulewicz, MSc & Maciej Grzenda, PhD Warsaw University of Technology
Faculty of Mathematics and Information Science M.Grzenda@mini.pw.edu.pl
http://www.mini.pw.edu.pl/~grzendam
Identification problem
• HTTP is a stateless protocol
• Thus, it is hard to determine if a request comes from a user who already accessed our web
application
• Why is there a need to identify the user?
– To make sure the user has logged on
– To check the contents of his shopping basket
– To obtain any user-related settings in further requests
– …
Identity maintaining methods
• There are 3 typical methods available when using HTML pages and the HTTP protocol:
– Hidden fields
– URL rewriting
– The use of cookies
• On top of these techniques, a user's session can be tracked
http://www.mini.pw.edu.pl/~okulewiczm
Hidden fields
• Hidden form fields provide a way to support session tracking.
Hidden form fields are not displayed in the browser, but they are sent back to the server when the form is submitted. These fields can contain a session identifier (session id) or just some data to remember.
Advantages
• Universally supported
• Allow anonymous users
Disadvantages
• Only works for a sequence of dynamically generated forms.
• Breaks down with static documents.
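A minimal sketch of how a server-side script might emit hidden fields, using only the Python standard library. The function name `render_hidden_form`, the action `nextStep` and the field values are hypothetical and serve only as an illustration.

```python
from html import escape
from typing import Dict

def render_hidden_form(action: str, hidden: Dict[str, str], label: str) -> str:
    """Build an HTML form whose hidden fields carry data to the next request."""
    lines = [f'<form action="{escape(action, quote=True)}" method="get">']
    for name, value in hidden.items():
        # Escape names and values so they cannot break out of the attribute.
        lines.append(
            f'  <input type="hidden" name="{escape(name, quote=True)}" '
            f'value="{escape(value, quote=True)}">'
        )
    lines.append(f'  <input type="submit" value="{escape(label, quote=True)}">')
    lines.append("</form>")
    return "\n".join(lines)

# Hypothetical usage: carry a session id from one generated form to the next.
form_html = render_hidden_form("nextStep", {"sessionid": "688"}, "Continue")
```

Because the value travels inside the generated form, every page in the sequence has to be generated dynamically, which is the disadvantage listed above.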
Hidden fields and links to scripts – sample interaction scenario
1. A user declares that he wants to find Java books published after 2009. This is done by typing the data into an HTML form containing fields such as: title, min price, max price, published after.
2. An HTTP request is sent to the server:
getBooks?title=Java&year_after=2009&pricemin=&pricemax=
3. An HTML document with a list of books is generated and displayed. Every item contains a link of the form displayBookData?bookid=N, where N is a unique book number.
4. A user clicks on one of the links. An HTTP request is sent to the server:
displayBookData?bookid=1015
5. A document describing one book is generated. It contains:
<input type="hidden" name="productid" value="1015">
<input type="submit" value="Buy!">
6. A user clicks on Buy!. An HTTP request containing the hidden data is sent:
buyItem?productid=1015
This is why it is clear what the user wants to buy.
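The last request of the scenario can be read on the server as sketched below. This is only an illustration under stated assumptions: the helper name `hidden_value` is hypothetical, and a real web framework would normally parse the query string for you.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def hidden_value(request_target: str, field: str) -> Optional[str]:
    """Return the first value of `field` from the query string, if present."""
    query = urlsplit(request_target).query
    # keep_blank_values=True keeps fields that were submitted empty.
    values = parse_qs(query, keep_blank_values=True).get(field)
    return values[0] if values else None

# The request produced by clicking Buy! in the scenario above.
product_id = hidden_value("buyItem?productid=1015", "productid")
missing = hidden_value("buyItem?productid=1015", "sessionid")
```

Here `product_id` holds the value of the hidden field, while `missing` is `None` because the request carries no such parameter.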
URL rewriting
• URLs can be rewritten or encoded to include session
information; URL rewriting usually includes a session id.
• The id can be sent either as extra path information, e.g.:
http://.../servlet/Rewritten/688
• or as an added parameter:
http://.../servlet/Rewritten?sessionid=688
Advantages
• Let the user remain anonymous
• Universally supported
Disadvantages
• Tedious to rewrite all URLs
• Only works for dynamically created documents
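The "added parameter" variant can be sketched with the Python standard library as follows. The helper name `add_session_id` and the host `example.com` are assumptions made for the example (the slides elide the host); only the parameter name `sessionid` and the id 688 come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_session_id(url: str, session_id: str, param: str = "sessionid") -> str:
    """Return `url` with the session id appended as a query parameter."""
    parts = urlsplit(url)
    # Drop any stale session parameter, keep all other parameters in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical host; every link in a generated page would be rewritten this way.
rewritten = add_session_id("http://example.com/servlet/Rewritten?page=2", "688")
```

Every link in every generated document has to pass through such a function, which is why the approach is described as tedious.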
Cookies
• A cookie is a piece of data sent by the server to a client (web browser) so that session-related data can be stored and used in subsequent requests made by the client (e.g. to store login information). The web server sends a cookie name and value to the browser and can later read them back from the browser.
• The process is as follows:
1. The server application sends a cookie with its response to the client.
2. The client saves the cookie (if configured to do so).
3. The client returns the cookie with subsequent requests (subject to rules such as the cookie's domain, path and expiration).
• Typical uses of cookies are related to identifying a user or a session. In general, cookies can save either information or identification.
See http://www.ietf.org/rfc/rfc2109.txt for a broader overview of HTTP state management with cookies.
Notice: the cookie-related API exposes standard HTTP mechanisms, which are available in different scripting technologies (although the APIs do not expose them equally).
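The three steps above map onto two HTTP headers: `Set-Cookie` in the response and `Cookie` in later requests. A minimal sketch using Python's standard `http.cookies` module follows; the cookie name and value are illustrative.

```python
from http.cookies import SimpleCookie

# Step 1: the server attaches a cookie to its response.
response_cookies = SimpleCookie()
response_cookies["sessionid"] = "12445"
# Text that would follow "Set-Cookie: " in the HTTP response.
set_cookie_value = response_cookies["sessionid"].OutputString()

# Steps 2 and 3: the browser stores the cookie and returns it in a
# "Cookie: sessionid=12445" request header, which the server parses.
request_cookies = SimpleCookie()
request_cookies.load("sessionid=12445")
session_id = request_cookies["sessionid"].value
```

Other scripting technologies wrap the same two headers in their own APIs.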
Cookies – sample interaction scenario
1. A user declares that he wants to find Java books published after 2009. This is done by typing the data into an HTML form containing fields such as: title, min price, max price, published after.
2. An HTTP request is sent to the server:
getBooks?title=Java&year_after=2009&pricemin=&pricemax=
3. The server creates a unique id, e.g. 12445, and requests the browser to keep it in a cookie named "sessionid". A cookie {"sessionid","12445"} is stored in a file on the workstation. The data on the user's detailed request(s) is kept on the server.
Data kept on the server: 12445 → Java, after 2009
Cookie kept on the client: sessionid → 12445
4. After e.g. a week, the same user (using the same browser) visits the same website. The browser sends the cookies it holds from this server: {"sessionid","12445"}. The server finds the previous request for session 12445 and generates a list of recent books on Java before the user specifies any query.
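The server side of this scenario can be sketched as an in-memory table keyed by session id. Everything below is an assumption made for illustration: the names `sessions`, `start_session` and `recall_query`, the 30-day lifetime, and the use of a random id instead of a short number such as 12445. A real application would keep the table in a database or in its framework's session objects.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Server-side storage: session id -> data remembered about the user.
sessions: Dict[str, Dict[str, str]] = {}

def start_session(query: Dict[str, str]) -> str:
    """Remember the query and return the value of a Set-Cookie header."""
    session_id = secrets.token_hex(16)  # hard-to-guess unique id
    sessions[session_id] = dict(query)
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["max-age"] = 30 * 24 * 3600  # keep for 30 days
    return cookie["sessionid"].OutputString()

def recall_query(cookie_header: str) -> Optional[Dict[str, str]]:
    """Look up the remembered query from a request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    return sessions.get(morsel.value) if morsel is not None else None

# First visit: the server stores the query and sets the cookie.
set_cookie = start_session({"title": "Java", "year_after": "2009"})
# A later visit: the browser sends back only the name=value pair.
returned = set_cookie.split(";")[0]
remembered = recall_query(returned)
```

Only the identifier travels in the cookie; the query itself never leaves the server.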
Cookies – basics
• New cookies are identified by a name and can store a string value
• A cookie must be added to the response in order to be sent to the client's browser or deleted from it
• Cookies are sent by the browser with each request they apply to
Cookies - remarks
• Notice:
• A browser may accept as few as 20 cookies of 4 kB each from one server (the minimum capabilities listed in RFC 2109)!
• Thus, in general, cookies should not contain real data but rather session identifiers. It is possible to keep all needed data in session-related objects or in a database. Therefore, instead of keeping e.g. a list of selected products in a cookie, use the cookie just for the session id and keep the product list in a shopping basket on your server.
Advantages of cookies
• Very easy to implement.
• Highly customizable.
• Persist across browser shutdowns
• Can span many sessions over a number of days or even months
Disadvantages of cookies
• Sometimes users turn off cookies for privacy or security reasons.
Cookie properties
• Domain
– Specifies the domain within which this cookie should be presented.
– The form of the domain name is specified by RFC 2109. A domain name begins with a dot (e.g. .domain.com) and means that the cookie is visible to servers in the specified Domain Name System (DNS) zone (for example, www.domain.com, but not a.b.domain.com).
– By default, cookies are only returned to the server that sent them. Otherwise privacy would be strongly affected.
This property of cookies is the reason for redirections in big web services (e.g. Google: gmail.com → mail.google.com)
See http://www.ietf.org/rfc/rfc2109.txt for a broader overview of HTTP state management with cookies.
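A minimal sketch of the domain rule as stated above, plus setting the attribute with Python's `http.cookies`. The helper name `domain_allows` is hypothetical, and the check only mirrors the rule as this slide describes it; it is not meant as a description of what any particular browser does.

```python
from http.cookies import SimpleCookie

def domain_allows(host: str, domain: str) -> bool:
    """Check a request host against a cookie Domain such as ".domain.com"."""
    host, domain = host.lower(), domain.lower()
    if not domain.startswith(".") or not host.endswith(domain):
        return False
    # The part of the host in front of the domain must be a single label.
    prefix = host[: -len(domain)]
    return prefix != "" and "." not in prefix

# Setting the attribute on an outgoing cookie.
cookie = SimpleCookie()
cookie["sessionid"] = "12445"
cookie["sessionid"]["domain"] = ".domain.com"
header_value = cookie["sessionid"].OutputString()

visible = domain_allows("www.domain.com", ".domain.com")
hidden = domain_allows("a.b.domain.com", ".domain.com")
```

With these inputs `visible` is true and `hidden` is false, matching the example hosts given above.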
Cookie properties
• Expiration date / Max age
– Sets the maximum age of the cookie in seconds, or its expiration date.
– A positive value (or a date later than now) means that the cookie will expire after that many seconds have passed (or at that date).
– A negative value (or no date) means that the cookie is not stored persistently and will be deleted when the Web browser exits.
– A zero value (or a date earlier than now) causes the cookie to be deleted.
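The three cases can be sketched with Python's `http.cookies` as follows. The helper name `build_cookie` is hypothetical; it models only the "max age" variant, with `None` standing for "no age or date given".

```python
from http.cookies import SimpleCookie
from typing import Optional

def build_cookie(name: str, value: str, max_age: Optional[int] = None) -> str:
    """Return the text that would follow "Set-Cookie: " in a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()

# Persistent cookie: kept by the browser for one week.
persistent = build_cookie("sessionid", "12445", max_age=7 * 24 * 3600)
# No max age: the cookie lives only until the browser exits.
until_exit = build_cookie("sessionid", "12445")
# Max age of zero: asks the browser to delete the cookie.
deleted = build_cookie("sessionid", "", max_age=0)
```

Deleting a cookie is therefore just another response that sets the same cookie name with an age of zero.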
Cookie – properties
• Path
– Specifies a path for the cookie to which the client should return the cookie.
– The cookie is visible to all the pages in the directory you specify, and to all the pages in that directory's subdirectories. A cookie's path must cover the location of the script that set the cookie; for example, /catalog makes the cookie visible to all directories on the server under /catalog.
– See RFC 2109 for more information on setting path names for cookies.
path = "/" can be used to declare a cookie that should be returned to any script on the current server. Notice that this may cause collisions between different web apps running on the same server.
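A minimal sketch of the directory-style visibility described above. The helper name `path_matches` is hypothetical, and the check is an illustration of the idea rather than the exact algorithm of any browser.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Decide whether a cookie with `cookie_path` applies to `request_path`."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Accept only whole directory components: /catalog covers
        # /catalog/books but not /catalogue.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False

in_catalog = path_matches("/catalog/books/1015", "/catalog")
other_app = path_matches("/cart/view", "/catalog")
root_cookie = path_matches("/cart/view", "/")
```

The last call shows why path = "/" can collide: a cookie set by one application is returned to every other application on the same server.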