On the Similarity of Web Measurements Under Different Experimental Setups


N. Demir, M. Große-Kampmann, T. Holz, J. Hörnemann, Norbert Pohlmann (Institut für Internet-Sicherheit), T. Urban, C. Wressnegger:
“On the Similarity of Web Measurements Under Different Experimental Setups”.
In Proceedings of the 23rd ACM Internet Measurement Conference (IMC 2023),
Montréal, Canada, October 24–26, 2023.

ABSTRACT
Measurement studies are essential for research and industry alike to understand the Web’s inner workings better and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment’s careful design and setup are complex, and many factors might affect the results. However, while several works have independently observed differences in
the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this, we build ‘dependency trees’ for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.

INTRODUCTION
Modern websites are intricate and complex software applications that offer a vast array of features. They often rely heavily on third-party services for their construction and operation. These services are embedded to dynamically load additional content, such as ads or fonts, which may not always be under the control of the site operators [22, 25]. This dynamic loading process can introduce a non-deterministic set of objects on a page, potentially affecting commonly studied phenomena such as Web tracking mechanisms [34] or HTTP headers [31]. Consequently, the same webpage could present different objects during Web measurement studies.
Researchers often resort to Web measurements to comprehend various phenomena, like Web tracking, security mechanisms, or the behavior of social media sites, that affect millions of users [1, 12, 15, 17, 23, 29, 35]. However, the dynamic nature of the Web poses a significant challenge to these measurements. Tools such as OpenWPM [19] or custom-built crawlers are used to scale up these experiments.
Despite this, the effects of different measurement setups and the root causes of measurement discrepancies are still not adequately understood. Prior research has indicated that even minor changes in a Web measurement setup can significantly affect the results and conclusions of a study [2, 11, 14, 24, 31]. While previous studies have mainly explored the effects of such practices, they have not examined why results differ.
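To make this kind of setup concrete, the following is a minimal sketch (not taken from the paper) of a single measurement profile. It assumes Playwright for browser automation and records, for each page visit, the requested URLs together with their resource types and the embedding frame, i.e., the raw data from which loading dependencies can later be reconstructed.

```python
# Minimal sketch of one measurement profile (assumption: Playwright is used;
# the study's actual crawling infrastructure may differ).
from playwright.sync_api import sync_playwright


def visit_page(url: str) -> list[dict]:
    """Visit one page and record every request with a coarse loading context."""
    records = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Record each outgoing request: URL, resource type (script, image, ...),
        # and the URL of the frame that issued it (first- vs. third-party context).
        page.on("request", lambda req: records.append({
            "url": req.url,
            "type": req.resource_type,
            "frame": req.frame.url,
        }))
        page.goto(url, wait_until="networkidle")
        browser.close()
    return records


if __name__ == "__main__":
    for entry in visit_page("https://example.org"):
        print(entry["type"], entry["url"])
```

Running the same sketch in two parallel instances against the same URL already hints at the problem the paper studies: the recorded request sets typically do not match exactly, because dynamically loaded third-party content varies between visits.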
This study bridges that gap by investigating the impact of various measurement setups, providing a detailed illustration of differences in datasets resulting from the respective setups.
We examine the similarity of embedded first- and third-party objects across five measurement profiles. Leveraging these profiles, we conduct a large-scale Web measurement covering nearly 25,000 sites and over 350,000 pages, forming the foundation of our analysis. Afterward, we construct dependency trees for each visited page and cross-compare them, enabling us to identify differences and quantify their extent. This approach fosters a deeper understanding of the comparability of privacy studies by cross-comparing the similarity of different trees horizontally (i.e., nodes at a specific depth) and vertically (i.e., loading dependencies of an object). Our experiment contributes to establishing more robust measurement setups, ensuring reliable results and reproducibility for future work, and understanding why current measurements lack these aspects. Moreover, it provides insights into the comparability of different works.
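The sketch below illustrates, under simplified assumptions, what such a horizontal and vertical comparison of dependency trees can look like. It represents a tree as a plain child-to-parent mapping and uses Jaccard similarity for the horizontal view; both are plausible choices for illustration and not necessarily the exact representation or metric used in the paper.

```python
# Sketch of a horizontal/vertical comparison of page dependency trees.
# Assumption: a tree is given as a mapping child -> parent, with the page's
# root document mapped to None. This simplifies the paper's dependency trees,
# which are built from observed loading dependencies.
from collections import defaultdict

Tree = dict[str, str | None]  # node URL -> URL of the node that loaded it


def depth_levels(tree: Tree) -> dict[int, set[str]]:
    """Group nodes by their depth below the root document."""
    def depth(node: str) -> int:
        d = 0
        while tree[node] is not None:
            node = tree[node]
            d += 1
        return d

    levels: dict[int, set[str]] = defaultdict(set)
    for node in tree:
        levels[depth(node)].add(node)
    return levels


def horizontal_similarity(a: Tree, b: Tree, level: int) -> float:
    """Jaccard similarity of the node sets at a given depth (horizontal view)."""
    na, nb = depth_levels(a).get(level, set()), depth_levels(b).get(level, set())
    return len(na & nb) / len(na | nb) if na | nb else 1.0


def loading_chain(tree: Tree, node: str) -> list[str]:
    """Vertical view: the chain of objects that (transitively) loaded a node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = tree[node]
    return chain


# Toy example: the same page measured by two profiles embeds different trackers.
profile_a = {"page.html": None, "cdn.js": "page.html", "tracker1.js": "cdn.js"}
profile_b = {"page.html": None, "cdn.js": "page.html", "tracker2.js": "cdn.js"}
print(horizontal_similarity(profile_a, profile_b, 2))  # 0.0 at depth 2
print(loading_chain(profile_a, "tracker1.js"))         # vertical dependency chain
```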
In summary, our contributions include:
• Differences in dependency trees. We illustrate that trees obtained from different profiles present notable differences in dimensions, node types, and loading dependencies. These findings suggest that each page visit or measurement introduces a degree of variance, impacting the comparability and
reproducibility of a study.
• Causes of differences. We identify the entity loading a node and the resource type of the node as primary factors influencing the differences observed in the trees. Specifically, we detail that a node’s content type (e.g., iframes or images) and its loading context (e.g., third-party) are key drivers in introducing dissimilarities between Web measurements.
• Effects of measurement setups. We examine the differences caused by minor changes in Web measurement setups. Our findings reveal that even identical setups operating in parallel and visiting the same pages can yield significantly different results. Furthermore, we demonstrate that simple
design choices (e.g., mimicked user interaction) can produce almost incomparable results.
