Understanding Browser Artifacts
Geo-location artifacts demonstrate an interesting concept with regard to browser-based evidence. Among the various browser artifacts, Internet history is a fan favorite because it provides such rich information. There is no easier place to look to identify sites visited by a specific user at a specific time. Browser history is so useful that a critical shortcoming is often ignored: with today’s dynamic web pages, the vast majority of web page requests go unrecorded. When a user visits a website, a multitude of requests are completed in the background to retrieve images and advertisements, populate web analytics, and load content from third parties. The content retrieved by these requests is stored within the cache, and an entry is created in the cache database. While the browser history database may only show the page visited, the cache holds most of the components retrieved to dynamically build that page.
Browser Forensics: History versus Cache Artifacts
To see the difference between what is recorded in browser history and what may be found within the cache, consider a simple location-aware website that calculates the driving distance between any two points and displays a map. The specific page we will visit is www.mileage-charts.com/search/calc.php. In HTML5-compliant browsers, the site offers an option to automatically determine the starting point based on your current location.
For this example, we will use the Firefox browser, which provides easy live access to both the history and cached data. Before accessing the site, we removed all browser artifacts. When we enter our URL into the browser, we see only one entry in our history library (Figure 1). However, when the browser cache is reviewed using the about:cache function in Firefox, we see a total of 116 entries representing 11 separate domains (Figure 2). Each entry in the cache ultimately gives us multiple timestamps, a usage count, and the ability to extract and review the cached data (including the pictures used to generate the Google map). A close look at Figure 2 reveals an entry from maps.google.com containing the coordinates determined by the HTML5 geo-location feature of Mileage-Charts.com. The last modified times of the two entries in the browser cache are our clue that they are related. The maps.google.com entry gives us some ability to say that the device geo-located to the coordinates 40.646062, -111.497972 on August 4, 2011 05:34:19.
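The per-domain tally described above can be reproduced once the cached URLs have been exported from the browser. What follows is only a minimal sketch: it assumes the URLs are already available as a plain list of strings, and the helper name `entries_per_domain` and the sample URLs are hypothetical placeholders, not entries from the actual cache.

```python
from collections import Counter
from urllib.parse import urlsplit


def entries_per_domain(cache_urls):
    """Count how many cache entries belong to each hostname."""
    hosts = (urlsplit(url).hostname for url in cache_urls)
    # Skip anything without a hostname (for example, malformed entries).
    return Counter(host for host in hosts if host)


# Hypothetical sample; a real examination would load the exported cache URLs.
sample_urls = [
    "http://www.mileage-charts.com/search/calc.php",
    "http://maps.google.com/placeholder-a",
    "http://maps.google.com/placeholder-b",
]

counts = entries_per_domain(sample_urls)
print(len(sample_urls), "entries across", len(counts), "domains")
for host, count in counts.most_common():
    print(host, count)
```

Sorting by count puts the busiest third-party domains first, which can help decide which entries to review in detail.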
Tip: When profiling sites on a test system, you might also consider employing a web proxy, such as Paros. A web proxy can capture all inbound and outbound web traffic and give the most complete view of dynamic browser activity, including connections that do not result in cached data.
Geo-location via Mapping Services
How did I know that the “vp=” parameter contained geo-located coordinates? The answer rests with understanding how web-based mapping services work. A vast number of websites use mapping services from Google, Yahoo, and Microsoft to display locations visually. HTML5 geo-location features allow further customization of these maps, and sites are increasingly using them to identify visitor locations. Instead of a hamburger franchise showing all of its locations on a map, it can first determine the visitor’s location and show only the closest ones. Only a few lines of code are required to make this change; hence we should expect to see an increasing number of geo-artifacts during our examinations. If we can find the coordinates used by the site to create its map, then that information can be used to tie the device to a location at a specific time, with some degree of accuracy. The easiest place to find these coordinates is in the URL parameters recorded by the browser cache.
Identifying Google Maps Geo-location Data
Google Maps is the most widely used mapping service so we will use it to demonstrate the analysis process. Similar to reviewing search URLs to identify what a user was searching for, we can identify requests to Google Maps which include location information. This location information is often passed via query string parameters, which are dutifully recorded by the browser cache. As an example, you might find the following:
Query string parameters for this request appear after the question mark. In this case, we see the parameter “ll” used to request a map centered on latitude 40.760779 and longitude -111.891047. Google Maps accepts a multitude of parameters, but Figure 3 contains those I have found most useful for identifying device location (those marked with green checks are particularly useful).
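Pulling coordinates out of cached URLs is easy to automate. The sketch below is illustrative rather than definitive: the parameter names “ll” and “vp” come from the discussion in this article, but the assumption that each value is a simple "latitude,longitude" pair, the function name `extract_coordinates`, and the example URL are mine and should be checked against the entries you actually recover.

```python
from urllib.parse import urlsplit, parse_qs

# Parameter names mentioned in the article; the "lat,lon" value format
# is an assumption and may not hold for every request type.
COORD_PARAMS = ("ll", "vp")


def extract_coordinates(url):
    """Return (parameter, latitude, longitude) tuples found in a URL's query string."""
    query = parse_qs(urlsplit(url).query)
    found = []
    for name in COORD_PARAMS:
        for value in query.get(name, []):
            parts = value.split(",")
            if len(parts) != 2:
                continue
            try:
                lat, lon = float(parts[0]), float(parts[1])
            except ValueError:
                continue
            # Discard values that cannot be valid coordinates.
            if -90 <= lat <= 90 and -180 <= lon <= 180:
                found.append((name, lat, lon))
    return found


# Hypothetical URL built for illustration from the coordinates in the text.
example = "http://maps.google.com/maps?ll=40.760779,-111.891047"
print(extract_coordinates(example))
```

Running a function like this over every URL in the cache gives a quick list of candidate coordinates, each of which still needs to be placed in context before drawing conclusions.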
In an ideal world, we would be able to determine the device location from the initial communication with the geo-location service. In practice, little of this is stored on the host system because the exchange is conducted over HTTPS, and those responses are often not written to the cache. Thus we are required to analyze the by-products of the geo-location: the changes made to the page as a result of the new location information (such as a new map being drawn). The difficult part of this process is determining which requests give information about device location. Consider finding an entry for a Google Maps request using a specific set of coordinates. How can we determine definitively whether those coordinates are a result of the device being geo-located? In general, we can treat the presence of coordinates within URL parameters as an indication of possible geo-location and then test our hypothesis by gathering additional data. Context is extremely important when attempting to identify geo-location artifacts from cached map requests. A search for an address at http://maps.google.com looks much different from a geo-location event triggered by http://twitter.com. A few heuristics:
- Place the map requests in context with the concurrent pages being visited. Do those pages implement a geo-location feature?
- If an explicit location search was performed close in time to the map request (for instance, via the “q=” parameter in Google Maps), it is a good sign that geo-location did not occur.
- Conversely, specific latitude and longitude coordinates used by mapping applications in the absence of a search often indicate the use of a geo-location sensor.
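The second and third heuristics lend themselves to a rough first-pass filter. The sketch below is a triage aid under several assumptions, not a definitive test: the `CacheEntry` structure, the two-minute window, the restriction to maps.google.com, and the sample entries are all hypothetical choices made for illustration. The first heuristic (checking whether the surrounding pages implement geo-location) still requires manual review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlsplit, parse_qs

MAP_HOSTS = {"maps.google.com"}   # assumption: only this host is examined
COORD_PARAMS = ("ll", "vp")       # parameter names discussed in the article
SEARCH_PARAM = "q"                # explicit search parameter


@dataclass
class CacheEntry:
    url: str
    last_modified: datetime


def query_params(url):
    return parse_qs(urlsplit(url).query)


def is_map_request(url):
    return urlsplit(url).hostname in MAP_HOSTS


def flag_possible_geolocation(entries, window=timedelta(minutes=2)):
    """Return map requests that carry coordinates with no explicit search nearby in time."""
    map_entries = [e for e in entries if is_map_request(e.url)]
    searches = [e for e in map_entries if SEARCH_PARAM in query_params(e.url)]
    flagged = []
    for entry in map_entries:
        params = query_params(entry.url)
        if SEARCH_PARAM in params:
            continue  # the request itself is an explicit search
        if not any(name in params for name in COORD_PARAMS):
            continue  # no coordinates to evaluate
        near_search = any(
            abs(entry.last_modified - s.last_modified) <= window for s in searches
        )
        if not near_search:
            flagged.append(entry)
    return flagged


# Hypothetical entries for illustration only.
entries = [
    CacheEntry("http://maps.google.com/placeholder?vp=40.646062,-111.497972",
               datetime(2011, 8, 4, 5, 34, 19)),
    CacheEntry("http://maps.google.com/placeholder?q=hypothetical+address",
               datetime(2011, 8, 4, 6, 0, 0)),
    CacheEntry("http://maps.google.com/placeholder?ll=40.760779,-111.891047",
               datetime(2011, 8, 4, 6, 0, 30)),
]

for entry in flag_possible_geolocation(entries):
    print(entry.last_modified, entry.url)
```

Anything this filter flags is only a candidate. It narrows the list of entries worth testing against the pages visited at the same time.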
Next up in part 3: Profiling location-aware web applications