Firefox 42, which was released last week, introduced a new feature in its Private Browsing mode: tracking protection.
If you are interested in how this list is put together and then used in Firefox, this post is for you.
Safe Browsing lists
There are many possible ways to download URL lists to the browser and check against that list before loading anything. One of those is already implemented as part of our malware and phishing protection. It uses the Safe Browsing v2.2 protocol.
In a nutshell, the way that this works is that each URL on the block list is
hashed (using SHA-256
) and then that list of hashes is
downloaded by Firefox and stored into a data structure on disk:
~/.cache/mozilla/firefox/XXXX/safebrowsing/mozstd-track*
on Linux~/Library/Caches/Firefox/Profiles/XXXX/safebrowsing/mozstd-track*
on MacC:\Users\XXXX\AppData\Local\mozilla\firefox\profiles\XXXX\safebrowsing\mozstd-track*
on Windows
This sbdbdump script can be used to extract the hashes contained in these files and will output something like this:
$ ~/sbdbdump/dump.py -v .
- Reading sbstore: mozstd-track-digest256
[mozstd-track-digest256] magic 1231AF3B Version 3 NumAddChunk: 1 NumSubChunk: 0 NumAddPrefix: 0 NumSubPrefix: 0 NumAddComplete: 1696 NumSubComplete: 0
[mozstd-track-digest256] AddChunks: 1445465225
[mozstd-track-digest256] SubChunks:
...
[mozstd-track-digest256] addComplete[chunk:1445465225] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
...
[mozstd-track-digest256] MD5: 81a8becb0903de19351427b24921a772
The name of the blocklist being dumped here (mozstd-track-digest256
) is set in the urlclassifier.trackingTable
preference which you can find in about:config
. The most important part of the output shown above is the addComplete
line which contains a hash that we will see again in a later section.
List lookups
Once it's time to load a resource, Firefox hashes the URL, as well as a few variations of it, and then looks for it in the local lists.
If there's no match, then the load proceeds. If there's a match, then we do an additional check against a pairwise allowlist.
The pairwise allowlist (hardcoded in the urlclassifier.trackingWhitelistTable
pref)
is designed to encode what we call "entity relationships". The list groups related domains together for
the purpose of checking whether a load is first or third party (e.g. twitter.com
and
twimg.com
both belong to the same entity).
Entries on this list (named mozstd-trackwhite-digest256
) look like this:
twitter.com/?resource=twimg.com
which translates to "if you're on the twitter.com
site, then don't block
resources from twimg.com
.
If there's a match on the second list, we don't block the load. It's only when we get a match on the first list and not the second one that we go ahead and cancel the network load.
If you visit our test page, you will see tracking protection in action with a shield icon in the URL bar. Opening the developer tool console will expose the URL of the resource that was blocked:
The resource at "https://trackertest.org/tracker.js" was blocked because tracking protection is enabled.
Creating the lists
The blocklist is created by Disconnect according to their definition of tracking.
The Disconnect list is on their Github page, but the copy we use in Firefox is the copy we have in our own repository. Similarly the Disconnect entity list is from here but our copy is in our repository. Should you wish to be notified of any changes to the lists, you can simply subscribe to this Atom feed.
To convert this JSON-formatted list into the binary format needed by the Safe Browsing code, we run a custom list generation script whenever the list changes on GitHub.
If you run that script locally using the same configuration as our server stack, you can see the conversion from the original list to the binary hashes.
Here's a sample entry from the mozstd-track-digest256.log
file:
[m] twimg.com >> twimg.com/
[canonicalized] twimg.com/
[hash] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
and one from mozstd-trackwhite-digest256.log
:
[entity] Twitter >> (canonicalized) twitter.com/?resource=twimg.com, hash a8e9e3456f46dbe49551c7da3860f64393d8f9d96f42b5ae86927722467577df
This in combination with the sbdbdump
script mentioned earlier, will allow you to
audit the contents of the local lists.
Serving the lists
The way that the binary lists are served to Firefox is through a custom server component written by Mozilla: shavar.
Every hour, Firefox requests updates from shavar.services.mozilla.com
. If new data is available, then the whole list is downloaded again. Otherwise, all it receives in return is an empty 204
response.
To replicate how Firefox downloads the list, you can use this download script to ask the server for a copy of the full TP list:
$ ./download-list.py
n:3600
i:mozstd-track-digest256
u:tracking-protection.cdn.mozilla.net/mozstd-track-digest256/1445465225
and then follow the URL redirection to get the actual list payload from the CDN:
$ wget https://tracking-protection.cdn.mozilla.net/mozstd-track-digest256/1445465225
Once you've downloaded that binary file, you can examine its content using this extractor script:
$ ./redirect-response-extractor.py 1445465225
Parsing a 54294-byte response file
Processing control line...
Add chunk 1445465225 contains 54272 bytes of 32-byte hashes
Found 1696 prefixes in 54272 bytes
and dump all of the hashes it contains using the --verbose
argument:
$ ./redirect-response-extractor.py --verbose 1445465225
Parsing a 54294-byte response file
Processing control line...
Add chunk 1445465225 contains 54272 bytes of 32-byte hashes
35e032660edb921c0c0ce59bfa289dc5a84c71b99584b359d74d6b03d00de66f
532239bcc9edf7681023070798bee5ec5e4a6bc7c0bb68e1e8e9099e45fdff94
52c058e95fc8d0e51bb9dd4b72f1364aa471157475a8435daa71e8e1c9533615
...
e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5
...
8a565d247c08ff7fd0950d8a1f37bf2da29eae4a0dd65126d87a0db7cab4b400
ca705fed923ab66d6d8bfe0f65359a4b872981be5bcc1364e29aac69375af323
7fc983ea552f7c8d153fc308d621eb4f52e84aa63ecccf3a735698a11a2a4a8d
Found 1696 prefixes in 54272 bytes
which, as I have highlighted, contains the twimg.com
hash we have seen
earlier.
Should you want to play with the server backend and run your own instance,
follow the
installation instructions
and then go into about:config
to change these preferences to point to your
own instance:
browser.trackingprotection.gethashURL
browser.trackingprotection.updateURL
Note that on Firefox 43 and later, these prefs have been renamed to:
browser.safebrowsing.provider.mozilla.gethashURL
browser.safebrowsing.provider.mozilla.updateURL
Learn more
If you want to learn more about how tracking protection works in Firefox, you can find all of the technical details on the Mozilla wiki or you can ask questions on our mailing list.
Thanks to Tanvi Vyas for reviewing a draft of this post.
First, hats off to FF for doing such a great job for us non-tech citizens. I don't understand why anyone would ever trust a pro-data-use browser like edge or chrome. Happy to have made small annual contributions to Mozilla the last two years. Now, in my Win 10 Pro 64-bit event view error messages, I have an entry that "Name resolution for the name shavar.services.mozilla.com timed out after none of the configured DNS servers responded." But from this article, I suppose there's nothing to worry about because shavar updates every hour, and this one line is probably just a glitch. Some electron somewhere had to run to the bathroom for five minutes? Thanks.
It's hard to say what went wrong, but I wouldn't worry about it because Firefox will keep using the old list as long as it can't reach the server.