TrackerSift: Untangling Mixed Tracking and Functional Web Resources

2021 
Trackers typically circumvent the filter lists used by privacy-enhancing content blocking tools by changing the domains or URLs of their resources. Filter list maintainers painstakingly attempt to keep up in the ensuing arms race by frequently updating the filter lists. Trackers have recently started to mix tracking and functional resources, putting content blockers in a bind: they risk breaking legitimate functionality if they act and risk missing privacy-invasive advertising and tracking if they do not. In this paper, we conduct a large-scale measurement study of such mixed (i.e., both tracking and functional) resources on 100K websites. We propose TrackerSift, an approach that progressively classifies and untangles mixed web resources at multiple granularities of analysis (domain, hostname, script, and method). Using TrackerSift, we find that 83% of the domains can be separated as tracking or functional, and the remaining 17% (11.8K) domains are classified as mixed. For the mixed domains, 52% of the hostnames can be separated, and the remaining 48% (12.3K) hostnames are classified as mixed. For the mixed hostnames, 94% of the JavaScript snippets can be separated, and the remaining 6% (21.1K) scripts are classified as mixed. For the mixed scripts, 91% of the JavaScript methods can be separated, and the remaining 9% (5.5K) methods are classified as mixed. Overall, TrackerSift is able to attribute 98% of all requests to tracking or functional resources at the finest level of granularity. Our analysis shows that mixed resources at different granularities are typically served from CDNs or as inlined and bundled scripts. Our results highlight opportunities for fine-grained content blocking to remove mixed resources without breaking legitimate functionality.
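The progressive untangling described above can be illustrated with a small sketch. The abstract does not state the exact classification rule, so this example makes several assumptions, all hypothetical: each request already carries a tracking/functional label (e.g., from a filter list match), a resource is labeled by the base-10 log ratio of its tracking to functional request counts, and a threshold of 2.0 separates "tracking" and "functional" from "mixed". The names `Request`, `classify`, and `sift` are illustrative only. Only requests that fall into mixed resources are carried down to the next, finer granularity.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    domain: str
    hostname: str
    script: str
    method: str
    is_tracking: bool  # assumption: label derived from a filter list match

# Granularities from coarsest to finest, as listed in the abstract.
LEVELS = ("domain", "hostname", "script", "method")

def classify(tracking: int, functional: int, threshold: float = 2.0) -> str:
    """Label one resource from its request counts (hypothetical log-ratio rule)."""
    if functional == 0:
        return "tracking"
    if tracking == 0:
        return "functional"
    ratio = math.log10(tracking / functional)
    if ratio >= threshold:
        return "tracking"
    if ratio <= -threshold:
        return "functional"
    return "mixed"

def sift(requests, threshold: float = 2.0):
    """Progressively classify requests; return per-level labels and leftover mixed requests."""
    remaining = list(requests)
    results = {}
    for i, level in enumerate(LEVELS):
        # Key includes all coarser levels so, e.g., a method is scoped to its script.
        def key(r, depth=i):
            return tuple(getattr(r, name) for name in LEVELS[: depth + 1])

        counts = defaultdict(lambda: [0, 0])  # key -> [tracking, functional]
        for r in remaining:
            counts[key(r)][0 if r.is_tracking else 1] += 1
        labels = {k: classify(t, f, threshold) for k, (t, f) in counts.items()}
        results[level] = labels
        # Only requests belonging to mixed resources descend to the next level.
        remaining = [r for r in remaining if labels[key(r)] == "mixed"]
        if not remaining:
            break
    return results, remaining
```

In this toy setup, a domain that serves both a tracking beacon and a rendering function stays "mixed" through the hostname and script levels and is only separated at the method level, mirroring how the abstract reports ever-smaller mixed fractions at finer granularities.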