N. Demir, M. Große-Kampmann, T. Holz, J. Hörnemann, Norbert Pohlmann (Institut für Internet-Sicherheit), T. Urban, C. Wressnegger: “On the Similarity of Web Measurements Under Different Experimental Setups”. In Proceedings of the 23rd ACM Internet Measurement Conference (2023), Montreal, Canada, October 24–26, 2023
ABSTRACT
Measurement studies are essential for research and industry alike to understand the Web’s inner workings better and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment’s careful design and setup are complex, and many factors might affect the results. However, while several works have independently observed differences in the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this, we build ‘dependency trees’ for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.
INTRODUCTION
Modern websites are intricate and complex software applications that offer a vast array of features. They often rely heavily on third-party services for their construction and operation. These services are embedded to dynamically load additional content, such as ads or fonts, which may only sometimes be under the control of the site operators [22, 25]. This dynamic loading process can introduce a non-deterministic set of objects on a page, potentially affecting commonly studied phenomena such as Web tracking mechanisms [34] or HTTP headers [31]. Consequently, the same webpage could present different objects during Web measurement studies.

Researchers often resort to Web measurements to comprehend various phenomena, like Web tracking, security mechanisms, or the behavior of social media sites, that affect millions of users [1, 12, 15, 17, 23, 29, 35]. However, the dynamic nature of the Web poses a significant challenge to these measurements. Tools such as OpenWPM [19] or custom-built crawlers are used to scale up these experiments. Despite this, the effects of different measurement setups and the root causes of measurement discrepancies are still not adequately understood.

Prior research has indicated that even minor changes in a Web measurement setup can significantly affect the results and conclusions of a study [2, 11, 14, 24, 31]. While previous studies have mainly explored the effects of such practices, they have yet to delve into why results differ. This study bridges that gap by investigating the impact of various measurement setups, providing a detailed illustration of differences in datasets resulting from the respective setups. We examine the similarity of embedded first- and third-party objects across five measurement profiles.
Leveraging these profiles, we conduct a large-scale Web measurement covering nearly 25,000 sites and over 350,000 pages, forming the foundation of our analysis. Afterward, we construct dependency trees for each visited page and cross-compare these trees, enabling us to identify and quantify differences and determine to what extent they exist. This approach aids us in fostering a deeper understanding of the comparability of privacy studies by cross-comparing the similarity of different trees horizontally (i.e., nodes at a specific depth) and vertically (i.e., loading dependencies of an object). Our experiment contributes to establishing more robust measurement setups, ensuring reliable results and reproducibility for future work, and understanding why current measurements lack these aspects. Moreover, it provides insights into the comparability of different works.

In summary, our contributions include:
• Differences in dependency trees. We illustrate that trees obtained from different profiles present notable differences in dimensions, node types, and loading dependencies. These findings suggest that each page visit or measurement introduces a degree of variance, impacting the comparability and reproducibility of a study.
• Causes of differences. We identify the entity loading a node and the resource type of the node as primary factors influencing the differences observed in the trees. Specifically, we detail that a node’s content type (e.g., iframes or images) and its loading context (e.g., third-party) are key drivers in introducing dissimilarities between Web measurements.
• Effects of measurement setups. We examine the differences caused by minor changes in Web measurement setups. Our findings reveal that even identical setups operating in parallel and visiting the same pages can yield significantly different results. Furthermore, we demonstrate that simple design choices (e.g., mimicked user interaction) can produce almost incomparable results.
…
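
The horizontal and vertical tree comparison described in the introduction can be illustrated with a small sketch. It is not the authors' implementation: the excerpt does not name a similarity metric, so Jaccard similarity is used here as an assumption, and the edge-list input format, the function names, and the example URLs are all hypothetical.

```python
from collections import defaultdict


def depth_levels(root, edges):
    """Group the nodes of one dependency tree by depth (breadth-first)."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    levels = {}
    seen = {root}
    frontier = [root]
    depth = 0
    while frontier:
        levels[depth] = set(frontier)
        next_frontier = []
        for node in frontier:
            for child in children[node]:
                if child not in seen:  # guard against repeated or cyclic edges
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
        depth += 1
    return levels


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed metric, not taken from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def horizontal_similarity(levels_a, levels_b):
    """Compare two trees depth by depth: which nodes appear at each level?"""
    depths = sorted(set(levels_a) | set(levels_b))
    return {d: jaccard(levels_a.get(d, set()), levels_b.get(d, set())) for d in depths}


def loading_chain(node, parents):
    """Follow child -> parent links from a node up to the root of its tree."""
    chain = [node]
    while chain[-1] in parents and parents[chain[-1]] not in chain:
        chain.append(parents[chain[-1]])
    return tuple(chain)


def vertical_similarity(edges_a, edges_b):
    """Share of nodes present in both trees that were loaded via the same chain."""
    parents_a = {child: parent for parent, child in edges_a}
    parents_b = {child: parent for parent, child in edges_b}
    shared = set(parents_a) & set(parents_b)
    if not shared:
        return 1.0  # no common non-root nodes, so nothing to disagree about
    same = sum(loading_chain(n, parents_a) == loading_chain(n, parents_b) for n in shared)
    return same / len(shared)


# Hypothetical visits of the same page under two measurement profiles,
# given as (loading parent, loaded object) pairs.
edges_a = [("page", "a.js"), ("page", "img.png"), ("a.js", "tracker.js")]
edges_b = [("page", "a.js"), ("page", "ad.html"), ("ad.html", "tracker.js")]

print(horizontal_similarity(depth_levels("page", edges_a), depth_levels("page", edges_b)))
print(vertical_similarity(edges_a, edges_b))
```

In this toy example the two trees agree fully at depths 0 and 2 but share only one of three nodes at depth 1, and `tracker.js` appears in both trees while being loaded through different parents, so the vertical similarity is 0.5. A real study would also need to decide how to normalize URLs before comparing nodes, which this sketch leaves out.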
Further information on “Web Measurements”
“How Security on the Internet Becomes Measurable – Internet Metrics for More Transparency on Risks and Security Status”
“Fake News in Social Networks – The ‘Participatory Web’ Has (Finally) Lost Its Innocence”
“Our (in)Secure Web: Understanding Update Behavior of Websites and Its Impact on Security”
“DNS over HTTPS (DoH) – Privacy Protection and Security at the Protocol Level”
“Reproducibility and Replicability of Web Measurement Studies”