For a DFIR examiner, poring over internet history records is a well-loathed daily activity. We spend hours looking at these lists trying to find an interesting URL that moves our case one direction or another. Sometimes we can use a filtering mechanism to remove URLs that we know for certain are uninteresting, but keeping a list like this up to date is a manual task. I used Websense to assist with this type of work at my previous job, and I have also had brief experience with Blue Coat.
What I would like to present to you here is a proof-of-concept (PoC) to automate the task of maintaining such a list and filtering internet history against it. The companies that provide this type of service do so for a fee, and they provide their data in a proprietary closed system to protect their intellectual property. I was able to locate a service that provides downloadable category lists that I could easily work with to prove (or disprove) this concept. The lists are not free, but the costs seem very reasonable. For instance, a business downloading updated tables once a month only needs to pony up $6 a month.
The EnScript Alternative
I created an EnScript that can ingest these category lists and place them into a SQLite db for fast and easy lookups. I created a second EnScript that uses this generated SQLite db file to run through the internet history records parsed by EnCase and display them in a categorized view. This should allow you to save time and dig directly into the category that is most pertinent to your case. If a URL is not found in the db, then you fall back to the old mode of poring through the unknown list, only this time it has been significantly reduced by the known URLs.
You can get both EnScripts here. I also included a database that I downloaded in 2012 and converted to the SQLite db for you to play/test with.
I am using the TDurden evidence file here, which Guidance Software provides, for the following screenshots. The only things you need to do are load the evidence into a case and run the evidence processor to parse the internet history.
When you’re ready, run the “Categorize Internet History” EnScript. In my screenshot you may notice that I have 300 records showing for the IE history. When the EnScript window shows up, point it to the SQLite file that you downloaded along with the EnScripts (or a more recent db that you converted).
The process will take a few minutes to sort through all of the records, but when it’s done, you’ll see a window looking something like this:
This EnScript will only show you categories that have records from your evidence file. There were 73 categories at last count.
The history records that can’t be found in the db are listed in the bottom “category,” which I made up and called Unknown. In the TDurden case, it shows 163 records, roughly half of the original uncategorized list. I don’t know about you, but I think a reduction of nearly 50% is a gift from the DFIR gods!
Please download this and try it against your cases. If this helps to reduce your workload at all, please let me know here in the comments or on Twitter. I’m looking to expand the data sources, but I want to prove the concept before spending more time on it. One source I’ll soon look at is the malwaredomainlist.com lists. If this proves useful to enough examiners, it will give me some weight to approach the big guys in this industry to see about exposing some sort of API.
Deeper Tech for Those Who Care
The db design is pretty simple at the moment. There are three tables:
- Categories: List of categories ingested from the lists
- Domains: Domains (and subdomains)
- URLs: Complete URLs for specific pages that are categorized separately from their domain
When checking for a given URL in the db, the EnScript does some normalization, such as https vs. http, trailing slashes, etc. Then it searches the URLs table to see if there is a very specific categorization record for this URL. If one is not found, it then searches the normalized domain of the URL in the domains table.
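The lookup order described above can be sketched in a few lines of Python. This is only an illustration, not the EnScript itself: the table and column names (`categories.name`, `domains.domain`, `urls.url`, `category_id`) and the helper names `normalize_url` and `categorize` are my assumptions, and the exact normalization rules in the EnScript may differ. The sketch also matches the host name exactly and does not walk up from a subdomain to its parent domain.

```python
import sqlite3
from urllib.parse import urlsplit


def normalize_url(url):
    """Return (normalized_url, domain), ignoring scheme, host case, and trailing slashes."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: treat scheme-less entries as http
    parts = urlsplit(url)
    domain = (parts.hostname or "").lower()
    normalized = domain + parts.path.rstrip("/")
    if parts.query:
        normalized += "?" + parts.query
    return normalized, domain


def categorize(conn, url):
    """Check the urls table first, then fall back to the domains table."""
    normalized, domain = normalize_url(url)
    row = conn.execute(
        "SELECT c.name FROM urls u JOIN categories c ON c.id = u.category_id "
        "WHERE u.url = ?", (normalized,)).fetchone()
    if row is None:
        row = conn.execute(
            "SELECT c.name FROM domains d JOIN categories c ON c.id = d.category_id "
            "WHERE d.domain = ?", (domain,)).fetchone()
    return row[0] if row else "Unknown"


# Tiny in-memory database with made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE domains (domain TEXT, category_id INTEGER);
    CREATE TABLE urls (url TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'News'), (2, 'Gambling');
    INSERT INTO domains VALUES ('example.com', 1);
    INSERT INTO urls VALUES ('example.com/poker', 2);
""")

for history_url in ("https://example.com/poker/",
                    "http://EXAMPLE.com/about",
                    "http://unlisted.test/"):
    print(history_url, "->", categorize(conn, history_url))
```

Checking the URLs table first lets a page-level record override the broader category of its domain, and anything that matches neither table lands in Unknown.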
Both the Domains and URLs tables have indexes on their text fields, which makes lookups much faster.
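For anyone who wants to experiment outside of EnCase, here is a rough Python sketch of what such a schema and its indexes might look like. The column and index names are hypothetical; the db shipped with the EnScripts may be laid out differently. The `EXPLAIN QUERY PLAN` statement at the end is a quick way to confirm that SQLite uses the index for a domain lookup rather than scanning the whole table.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the actual db file.
SCHEMA = """
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE domains (domain TEXT NOT NULL,
                      category_id INTEGER REFERENCES categories(id));
CREATE TABLE urls (url TEXT NOT NULL,
                   category_id INTEGER REFERENCES categories(id));
CREATE INDEX idx_domains_domain ON domains(domain);
CREATE INDEX idx_urls_url ON urls(url);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Ask SQLite how it would run a domain lookup; the last column of each
# row is a human-readable description of the chosen strategy.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT category_id FROM domains WHERE domain = ?",
    ("example.com",)).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # should mention idx_domains_domain if the index is used
```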
I intend to expand this db design a bit more. For one, I would like to be able to show which source is providing the categorization data when the time comes to include multiple sources.