Anonymity Browser fingerprinting. How to track users on Network. [PART 2]

Serafim · Feb 28, 2021

HTML5 AppCache

Application Cache lets a site specify which of its parts should be saved to disk and remain accessible even when the user is offline. Everything is controlled by manifests that define the rules for storing and retrieving items in the cache. As with the traditional caching mechanism, AppCache can also store unique, user-specific data, both inside the manifest itself and inside resources that are kept indefinitely (unlike the regular cache, whose resources are deleted after some time). AppCache occupies an intermediate position between the HTML5 data storage mechanisms and the normal browser cache. In some browsers it is cleared when cookies and site data are deleted, while in others it is cleared only when the browsing history and all cached documents are deleted.

Other storage mechanisms

But these are not all the options. With the help of JavaScript and related technologies, you can save and request a unique identifier so that it survives even after the entire browsing history and site data are deleted. One option is to store it in window.name or sessionStorage. Even if the user clears all cookies and site data but does not close the tab where the tracking site was opened, the server will receive the identification token on the next visit, and the user will again be linked to the data already collected about them. The same behavior is observed in JS itself: any open JavaScript context retains its state even if the user deletes the site data. Moreover, such JavaScript need not belong to the displayed site; it can also hide in iframes, web workers, and so on. For example, an ad loaded in an iframe will pay no attention to the deletion of browsing history and site data, and will keep using the ID stored in a local variable in JS.

Protocols

In addition to the mechanisms associated with caching, the use of JS and various plugins, modern browsers have several other network features that allow you to store and retrieve unique identifiers.

Origin Bound Certificates (aka ChannelID) are persistent self-signed certificates that identify the client to the HTTPS server. A separate certificate is created for each new domain and is used for all connections initiated later. Sites can use OBC to track users without taking any action that is visible to the client. As a unique identifier, you can use the cryptographic hash of the certificate provided by the client as part of a legitimate SSL handshake.
Similarly, TLS has two mechanisms, session identifiers and session tickets, which allow clients to resume interrupted HTTPS connections without performing a full handshake. This is achieved by using cached data. These two mechanisms allow servers to identify requests originating from a single client over a short period of time.
Almost all modern browsers implement their own internal DNS cache to speed up name resolution (and in some cases to reduce the risk of DNS rebinding attacks). This cache can easily be used to store small amounts of information. For example, if you have 16 available IP addresses, each cached name encodes 4 bits, so about 8-9 cached names (32-36 bits) will be enough to identify each computer on the Network. However, this approach is limited by the size of the browsers' internal DNS cache and can potentially lead to name resolution conflicts with the provider's DNS.
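
The arithmetic behind the DNS cache trick can be sketched in a few lines of Python. This is only an illustration of the encoding, not a working tracker: the hostnames (slotN.tracker.example), the address pool (taken from the 192.0.2.0/24 documentation range), and the function names are all assumptions made up for this example.

```python
# Sketch only: hostnames, address pool and function names are hypothetical.
ADDRESS_POOL = [f"192.0.2.{i}" for i in range(16)]  # 16 addresses -> 4 bits per name
BITS_PER_NAME = 4  # log2(16)


def encode_id(identifier: int, names: int = 8) -> dict:
    """Map an identifier to {hostname: address}, storing 4 bits per hostname."""
    if not 0 <= identifier < 2 ** (BITS_PER_NAME * names):
        raise ValueError("identifier does not fit in the given number of names")
    records = {}
    for slot in range(names):
        nibble = (identifier >> (BITS_PER_NAME * slot)) & 0xF
        records[f"slot{slot}.tracker.example"] = ADDRESS_POOL[nibble]
    return records


def decode_id(records: dict) -> int:
    """Rebuild the identifier from the addresses each hostname resolved to."""
    identifier = 0
    for hostname, address in records.items():
        slot = int(hostname.split(".")[0][len("slot"):])
        identifier |= ADDRESS_POOL.index(address) << (BITS_PER_NAME * slot)
    return identifier
```

With 8 names this gives 2^32 (about 4.3 billion) distinct identifiers, and with 9 names 2^36, which is where the "8-9 cached names" estimate comes from.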

Machine specifications

All the methods considered so far relied on assigning the user a unique identifier, which was then sent to the server with subsequent requests. There is another, less obvious approach to tracking users that relies on querying or measuring the characteristics of the client machine. Individually, each characteristic yields only a few bits of information, but several of them combined can be enough to uniquely identify a computer on the Internet. Besides being much more difficult to detect and prevent, this technique lets you identify a user who switches between browsers or uses private mode.
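
To see how "a few bits" add up, here is a rough sketch. It assumes the characteristics are statistically independent (in reality they are often correlated, so the true total is lower), and the function names and the population shares in the usage note are invented purely for illustration.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits of identifying information in a trait shared by `share` of all users."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


def combined_bits(shares) -> float:
    """Total bits, ASSUMING the traits are independent of each other."""
    return sum(surprisal_bits(s) for s in shares)


def expected_matches(population: int, shares) -> float:
    """Roughly how many users in the population share all the traits at once."""
    return population / 2 ** combined_bits(shares)
```

For example, four traits shared by 1/16, 1/8, 1/32 and 1/64 of users give 4 + 3 + 5 + 6 = 18 bits, which narrows a population of about a million (2^20) down to roughly 4 candidates.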

Browser's "fingerprints"

The simplest approach to tracking is to build identifiers by combining a set of parameters available in the browser environment, each of which individually is not of any interest, but together they form a unique value for each machine:

User-Agent. Reveals the browser version, OS version, and some of the installed add-ons. If the User-Agent is missing or you want to check its "veracity", you can determine the browser version by checking for certain features implemented or changed between releases.
Clock drift. If the system does not synchronize its clock with a third-party time server, then sooner or later it will start to run slow or fast, which creates a unique difference between real and system time that can be measured with microsecond accuracy using JavaScript. In fact, even when syncing with an NTP server, there will still be small deviations that can also be measured.
Information about CPU and GPU. You can get it either directly (via GL_RENDERER), or through benchmarks and tests implemented using JavaScript.
Monitor resolution and browser window size (including parameters of the second monitor in the case of a multi-monitor system).
A list of fonts installed in the system, obtained, for example, using the getComputedStyle API.
A list of all installed plugins, ActiveX controls, and Browser Helper Objects, including their versions. You can get it by brute-forcing navigator.plugins[] (some plugins also reveal their presence in HTTP headers).
Information about installed extensions and other software. Extensions such as ad blockers make certain changes to the pages viewed, which can be used to determine which extension it is and how it is configured.
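
Once the parameters listed above have been collected, combining them into a single identifier can be as simple as hashing a canonical serialization. A minimal server-side sketch, assuming the attributes arrive as a flat dictionary; the attribute names and values in the usage note are hypothetical, and real fingerprinting libraries are considerably more elaborate.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable hex identifier."""
    # Sorting keys makes the result independent of the order of collection.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For instance, fingerprint({"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080", "fonts": ["Arial", "Verdana"]}) returns the same value on every visit as long as none of the attributes change, and a different value as soon as one of them does. That is also the weakness of a plain hash: a single browser update changes the identifier, so practical trackers tend to compare attributes fuzzily rather than rely on exact equality.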

Network "fingerprints"

A number of other identifying traits come from the architecture of the local network and the configuration of network protocols. Such traits are common to all browsers installed on the client machine, and they cannot simply be hidden with privacy settings or security utilities. These include:

External IP address. For IPv6 addresses, this vector is particularly interesting, since in some cases the last octets are derived from the device's MAC address and are therefore preserved even when the device connects to different networks.
Port numbers for outgoing TCP/IP connections (usually selected sequentially on most operating systems).
Local IP address for users who are behind a NAT or HTTP proxy. Combined with the external IP address, it allows you to uniquely identify most clients.
Information about the proxy servers used by the client, obtained from the HTTP header (X-Forwarded-For). Combined with the client's real address, which can be obtained in several ways that bypass the proxy, it also allows you to identify the user.
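
The IPv6 case from the first item can be illustrated with a short sketch. It assumes the address was autoconfigured using the modified EUI-64 scheme, where ff:fe is inserted in the middle of the MAC address and one bit of the first octet is flipped; the function name is made up for this example.

```python
import ipaddress
from typing import Optional


def mac_from_ipv6(address: str) -> Optional[str]:
    """Return the MAC embedded in a modified EUI-64 interface ID, or None."""
    # The interface identifier is the low 64 bits of the address.
    iid = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    octets = iid.to_bytes(8, "big")
    # EUI-64 identifiers carry the marker ff:fe in the middle two octets.
    if octets[3:5] != b"\xff\xfe":
        return None
    # Undo the flipped universal/local bit and drop the ff:fe marker.
    mac = bytes([octets[0] ^ 0x02]) + octets[1:3] + octets[5:8]
    return ":".join(f"{b:02x}" for b in mac)
```

Both 2001:db8:1::211:22ff:fe33:4455 and 2001:db8:2::211:22ff:fe33:4455 yield 00:11:22:33:44:55, so the same device is recognizable on two different networks. Hosts that use randomized interface identifiers (privacy extensions) do not match the ff:fe pattern, and the function returns None for them.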

Behavioral analysis and habits

Another option is to look in the direction of characteristics that are not tied to the PC, but rather to the end user, such as regional settings and behavior. This method again allows you to identify clients between different browser sessions, profiles, and in the case of private browsing. You can draw conclusions based on the following data, which is always available for study:

Preferred language, default encoding, and time zone (all of this is available in HTTP headers or accessible from JavaScript).
Data in the client's cache and its browsing history. Cache elements can be detected using timing attacks: the tracker can detect long-lived cache elements related to popular resources simply by measuring how long they take to load (and canceling the request if the time exceeds the expected load time from the local cache). You can also extract URLs stored in the browser's browsing history, although in modern browsers such an attack requires a little user interaction.
Mouse gestures, the frequency and duration of keystrokes, and data from the accelerometer - all these parameters are unique for each user.
Any changes to the site's standard fonts and their sizes, zoom level, and use of special features such as text color and size.
The state of certain browser features configured by the client: blocking third-party cookies, DNS prefetching, blocking pop-ups, Flash security settings, and so on (ironically, users who change the default settings actually make their browser much easier to identify).
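
As an illustration of the keystroke item above, two features commonly used to describe typing rhythm are dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). A minimal sketch, assuming the page has already reported key events as (key, press_ms, release_ms) tuples; the event format and function name are assumptions for this example.

```python
from statistics import mean


def keystroke_features(events):
    """Summarize typing rhythm from (key, press_ms, release_ms) tuples.

    Expects at least one event, listed in the order the keys were pressed.
    """
    # Dwell time: how long each key was held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return {
        "mean_dwell_ms": mean(dwell),
        "mean_flight_ms": mean(flight) if flight else 0.0,
    }
```

Two averages alone are far from unique; a real profile would use many more features (for example, timings per key pair) gathered over a long period, and then compare sessions statistically.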

And these are just the obvious options that lie on the surface. If you dig deeper, you can come up with more.

To summarize

As you can see, in practice there are a large number of different ways to track a user. Some of them are the result of implementation errors or omissions and can theoretically be corrected. Others are almost impossible to eradicate without completely changing the principles of computer networks, web applications, and browsers. You can counteract some techniques by clearing the cache, cookies, and other places where unique identifiers can be stored. Others work completely unnoticed by the user, and you are unlikely to be able to protect yourself from them. Therefore, the most important thing when you travel around the Network, even in private browsing mode, is to remember that your movements can still be tracked.
