## About this book

This book constitutes the proceedings of the 20th International Conference on Passive and Active Measurement, PAM 2019, held in Puerto Varas, Chile, in March 2019.

The 20 full papers presented were carefully reviewed and selected from 75 submissions. The papers cover a wide range of important networking measurement and analysis topics, from low layers of the network stack up to applications, using measurements at scales large and small, and covering important aspects of the network ecosystem such as routing, DNS, privacy, security, and performance. They are organized in the following topical sections: mobile networks; measurement at Internet scale; measurement at other scales; domain names; failures; security and privacy; and Web.

## Table of Contents

### Leveraging Context-Triggered Measurements to Characterize LTE Handover Performance

Abstract
In cellular networks, handover plays a vital role in supporting mobility and connectivity. Traditionally, handovers in a cellular network focus on maintaining continuous connectivity for legacy voice calls. However, there is a poor understanding of how today’s handover strategies impact the network performance, especially for applications that require reliable Internet connectivity.
In this work, using a newly designed context-triggered measurement framework, we carry out the first comprehensive measurement study in LTE networks on how handover decisions implemented by carriers impact network layer performance. We find that the interruption in connectivity during handover is minimal, but in 43% of cases the end-to-end throughput degrades after the handover. The cause is that the deployed handover policy uses a statically configured signal strength threshold as the key factor to decide handover and focuses on improving signal strength, which by itself is an imperfect metric for performance. We propose that handover decision strategies trigger handover based on predicted performance, considering factors such as cell load along with application preference.
Shichang Xu, Ashkan Nikravesh, Z. Morley Mao

### Measuring Web Quality of Experience in Cellular Networks

Abstract
Measuring and understanding the end-user browsing Quality of Experience (QoE) is crucial for Mobile Network Operators (MNOs) to retain their customers and increase revenue. MNOs often use traffic traces to detect bottlenecks and study their end users' experience. Recent studies show that Above The Fold (ATF) time better approximates the user browsing QoE than traditional metrics such as Page Load Time (PLT). This work focuses on developing a methodology to measure the web browsing QoE over operational Mobile Broadband (MBB) networks. We implemented a web performance measurement tool, WebLAR (Web Latency And Rendering), that measures web Quality of Service (QoS) metrics such as TCP connect time and Time To First Byte (TTFB), and web QoE metrics including PLT and ATF time. We deployed WebLAR on 128 nodes of MONROE (a Europe-wide mobile measurement platform) and conducted a two-week-long (May and July 2018) web measurement campaign towards eight websites from six operational MBB networks. The results show that, in the median case, the TCP connect time and TTFB in Long Term Evolution (LTE) networks are, respectively, 160% and 30% longer than in fixed-line networks. The DNS lookup time and TCP connect time of the websites vary significantly across MNOs. Most of the websites do not show a significant difference in PLT and ATF time across operators. However, Yahoo shows longer ATF time with Norwegian operators than with Swedish operators. Moreover, user mobility has a small impact on the ATF time of the websites. Furthermore, the website design should be taken into consideration when approximating the ATF time.
Alemnew Sheferaw Asrese, Ermias Andargie Walelgne, Vaibhav Bajpai, Andra Lutu, Özgü Alay, Jörg Ott

### Realtime Mobile Bandwidth Prediction Using LSTM Neural Network

Abstract
With the popularity of mobile Internet access and the higher bandwidth demand of mobile applications, user Quality of Experience (QoE) is particularly important. For bandwidth- and delay-sensitive applications, such as Video on Demand (VoD), realtime video calls, and games, estimating the future bandwidth in advance can greatly improve the user QoE. In this paper, we study realtime mobile bandwidth prediction in various mobile networking scenarios, such as subway and bus rides along different routes. The main method used is a Long Short Term Memory (LSTM) recurrent neural network. In specific scenarios, LSTM achieves significant accuracy improvements over state-of-the-art prediction algorithms, such as Recursive Least Squares (RLS). We further analyze the bandwidth patterns in different mobility scenarios using Multi-Scale Entropy (MSE) and discuss its connections to the achieved accuracy.
Lifan Mei, Runchen Hu, Houwei Cao, Yong Liu, Zifa Han, Feng Li, Jin Li

### Hidden Treasures – Recycling Large-Scale Internet Measurements to Study the Internet’s Control Plane

Abstract
Internet-wide scans are a common active measurement approach to study the Internet, e.g., studying security properties or protocol adoption. They involve probing large address ranges (IPv4 or parts of IPv6) for specific ports or protocols. Besides their primary use for probing (e.g., studying protocol adoption), we show that, at the same time, they provide valuable insights into the Internet control plane informed by ICMP responses to these probes, a currently unexplored secondary use. We collect one week of ICMP responses (637.50M messages) to several Internet-wide ZMap scans covering multiple TCP and UDP ports as well as DNS-based scans covering >50% of the domain name space. This perspective enables us to study the Internet’s control plane as a by-product of Internet measurements. We receive ICMP messages from about 171M different IPs in roughly 53K different autonomous systems. Additionally, we uncover multiple control plane problems, e.g., we detect a plethora of outdated and misconfigured routers and uncover the presence of large-scale persistent routing loops in IPv4.
Jan Rüth, Torsten Zimmermann, Oliver Hohlfeld

### Caching the Internet: A View from a Global Multi-tenant CDN

Abstract
Commercial Content Delivery Networks (CDNs) employ a variety of caching policies to achieve fast and reliable delivery in multi-tenant environments with highly variable workloads. In this paper, we explore the efficacy of popular caching policies in a large-scale, global, multi-tenant CDN. We examine the client behaviors observed in a network of over 125 high-capacity Points of Presence (PoPs). Using production data from the Edgecast CDN, we show that for such a large-scale and diverse use case, simpler caching policies dominate. We find that LRU offers the best compromise between hit-rate and disk I/O, providing 60% fewer writes than FIFO, while maintaining high hit-rates. We further observe that at disk sizes used in a large-scale CDN, LRU performs on par with complex policies like S4LRU. We also examine deterministic and probabilistic cache admission policies and quantify their trade-offs between hit-rate and origin traffic. Moreover, we explore the behavior of caches at multiple layers of the CDN and provide recommendations to reduce connections passing through the system’s load balancers by approximately 50%.
Marcel Flores, Harkeerat Bedi
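
The LRU versus FIFO comparison in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' methodology or the Edgecast implementation: it assumes an admit-everything cache where each miss costs exactly one disk write, and the function names `simulate_lru` and `simulate_fifo` are hypothetical.

```python
from collections import OrderedDict, deque


def simulate_lru(requests, capacity):
    """Return (hits, writes) for an LRU cache; every miss counts as one write."""
    cache = OrderedDict()
    hits = writes = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            writes += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits, writes


def simulate_fifo(requests, capacity):
    """Return (hits, writes) for a FIFO cache; a hit does not refresh the entry."""
    cache = set()
    order = deque()  # insertion order, oldest on the left
    hits = writes = 0
    for key in requests:
        if key in cache:
            hits += 1
        else:
            writes += 1
            cache.add(key)
            order.append(key)
            if len(cache) > capacity:
                cache.discard(order.popleft())  # evict the oldest inserted key
    return hits, writes


if __name__ == "__main__":
    # Hypothetical request stream with one popular object (key 1).
    stream = [1, 2, 1, 3, 1, 4, 1]
    print("LRU  (hits, writes):", simulate_lru(stream, capacity=2))
    print("FIFO (hits, writes):", simulate_fifo(stream, capacity=2))
```

On this small stream, FIFO evicts the popular object and has to write it again, while LRU keeps it resident. This is the kind of effect behind a write reduction, although the 60% figure above comes from production traces, not from a toy like this.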

### Sundials in the Shade: An Internet-Wide Perspective on ICMP Timestamps

Abstract
ICMP timestamp request and response packets have been standardized for nearly 40 years, but have no modern practical application, having been superseded by NTP. However, ICMP timestamps are not deprecated, suggesting that while hosts must support them, little attention is paid to their implementation and use. In this work, we perform active measurements and find 2.2 million hosts on the Internet responding to ICMP timestamp requests from over 42,500 unique autonomous systems. We develop a methodology to classify timestamp responses, and find 13 distinct classes of behavior. Not only do these behaviors enable a new fingerprinting vector, but some also leak important information about the host, e.g., OS, kernel version, and local timezone.
Erik C. Rye, Robert Beverly

### Where on Earth Are the Best-50 Time Servers?

Abstract
We present a list of the Best-50 public IPv4 time servers by mining a high-resolution dataset of Stratum-1 servers for Availability, Stratum Constancy, Leap Performance, and Clock Error, broken down by continent. We find that a server with ideal leap performance, high availability, and low stratum variation is often clock error-free, but this is no guarantee. We discuss the relevance and lifetime of our findings, the scalability of our approach, and implications for load balancing and server ranking.
Yi Cao, Darryl Veitch

### Service Traceroute: Tracing Paths of Application Flows

Abstract
Traceroute is often used to help diagnose when users experience issues with Internet applications or services. Unfortunately, probes issued by classic traceroute tools differ from application traffic and hence can be treated differently by routers that perform load balancing and middleboxes within the network. This paper proposes a new traceroute tool, called Service traceroute. Service traceroute leverages the idea from paratrace, which passively listens to application traffic to then issue traceroute probes that pretend to be part of the application flow. We extend this idea to work for modern Internet services with support for identifying the flows to probe automatically, for tracing of multiple concurrent flows, and for UDP flows. We implement command-line and library versions of Service traceroute, which we release as open source. This paper also presents an evaluation of Service traceroute when tracing paths traversed by Web downloads from the top-1000 Alexa websites and by video sessions from Twitch and Youtube. Our evaluation shows that Service traceroute has no negative effect on application flows. Our comparison with Paris traceroute shows that a typical traceroute tool that launches a new flow to the same destination discovers different paths than when embedding probes in the application flow in a significant fraction of experiments (from 40% to 50% of our experiments in PlanetLab Europe).
Ivan Morandi, Francesco Bronzino, Renata Teixeira, Srikanth Sundaresan

### Mapping an Enterprise Network by Analyzing DNS Traffic

Abstract
Enterprise networks are becoming more complex and dynamic, making it a challenge for network administrators to fully track what is potentially exposed to cyber attack. We develop an automated method to identify and classify organizational assets via analysis of just 0.1% of the enterprise traffic volume, specifically corresponding to DNS packets. We analyze live, real-time streams of DNS traffic from two organizations (a large University and a mid-sized Government Research Institute) to: (a) highlight how DNS query and response patterns differ between recursive resolvers, authoritative name servers, web-servers, and regular clients; (b) identify key attributes that can be extracted efficiently in real-time; and (c) develop an unsupervised machine learning model that can classify enterprise assets. Application of our method to the 10 Gbps live traffic streams from the two organizations yielded results that were verified by the respective IT departments, while also revealing new knowledge, attesting to the value provided by our automated system for mapping and tracking enterprise assets.
Minzhao Lyu, Hassan Habibi Gharakheili, Craig Russell, Vijay Sivaraman

### A First Look at QNAME Minimization in the Domain Name System

Abstract
The Domain Name System (DNS) is a critical part of network and Internet infrastructure; DNS lookups precede almost any user request. DNS lookups may contain private information about the sites and services a user contacts, which has spawned efforts to protect privacy of users, such as transport encryption through DNS-over-TLS or DNS-over-HTTPS.
In this work, we provide a first look at the resolver-side technique of query name minimization (qmin), which was standardized in March 2016 as RFC 7816. qmin aims to only send minimal information to authoritative name servers, reducing the number of servers that full DNS query names are exposed to. Using passive and active measurements, we show a slow but steady adoption of qmin on the Internet, with a surprising variety in implementations of the standard. Using controlled experiments in a test-bed, we validate the lookup behavior of various resolvers, and find that qmin increases the number of DNS lookups by up to 26% and leads to up to 5% more failed lookups. We conclude our work with a discussion of qmin’s risks and benefits, and give advice for future use.
Wouter B. de Vries, Quirin Scheitle, Moritz Müller, Willem Toorop, Ralph Dolmans, Roland van Rijswijk-Deij
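
To make the idea of query name minimization concrete, the following minimal sketch lists the names a minimizing resolver would expose step by step while walking down the delegation chain, adding one label at a time. It is an idealized illustration only: the function name `minimized_qnames` is hypothetical, and real resolvers differ (as the abstract notes), for example in query types, in how many labels they add per step, and in fallback behavior.

```python
def minimized_qnames(qname):
    """List the query names sent under idealized qmin, shortest first.

    A non-minimizing resolver would send the full name to every server
    along the delegation chain instead.
    """
    labels = qname.rstrip(".").split(".")
    # Start with the rightmost label (the TLD) and prepend one label per step.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


if __name__ == "__main__":
    for step, name in enumerate(minimized_qnames("www.example.com"), start=1):
        print(step, name)
```

In this idealized walk, the root servers only see `com.` and the `com.` servers only see `example.com.`; the full name is revealed only at the last step. The extra steps also hint at why minimization can increase the number of lookups.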

### Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research

Abstract
Top domain rankings (e.g., Alexa) are commonly used in security research, such as to survey security features or vulnerabilities of “relevant” websites. Due to their central role in selecting a sample of sites to study, an inappropriate choice or use of such domain rankings can introduce unwanted biases into research results. We quantify various characteristics of three top domain lists that have not been reported before. For example, the weekend effect in Alexa and Umbrella causes these rankings to change their geographical diversity between the workweek and the weekend. Furthermore, up to 91% of ranked domains appear in alphabetically sorted clusters containing up to 87k domains of presumably equivalent popularity. We discuss the practical implications of these findings, and propose novel best practices regarding the use of top domain lists in the security community.
Walter Rweyemamu, Tobias Lauinger, Christo Wilson, William Robertson, Engin Kirda

### Funny Accents: Exploring Genuine Interest in Internationalized Domain Names

Abstract
International Domain Names (IDNs) were introduced to support non-ASCII characters in domain names. In this paper, we explore IDNs that hold genuine interest, i.e. that owners of brands with diacritical marks may want to register and use. We generate 15 276 candidate IDNs from the page titles of popular domains, and see that 43% are readily available for registration, allowing for spoofing or phishing attacks. Meanwhile, 9% are not allowed by the respective registry to be registered, preventing brand owners from owning the IDN. Based on WHOIS records, DNS records and a web crawl, we estimate that at least 50% of the 3 189 registered IDNs have the same owner as the original domain, but that 35% are owned by a different entity, mainly domain squatters; malicious activity was not observed. Finally, we see that application behavior toward these IDNs remains inconsistent, hindering user experience and therefore widespread uptake of IDNs, and even uncover a phishing vulnerability in iOS Mail.
Victor Le Pochat, Tom Van Goethem, Wouter Joosen

### BGP Zombies: An Analysis of Beacons Stuck Routes

Abstract
Network operators use the Border Gateway Protocol (BGP) to control the global visibility of their networks. When withdrawing an IP prefix from the Internet, an origin network sends BGP withdraw messages, which are expected to propagate to all BGP routers that hold an entry for that IP prefix in their routing table. Yet network operators occasionally report issues where routers maintain routes to IP prefixes withdrawn by their origin network. We refer to this problem as BGP zombies and characterize their appearance using RIS BGP beacons, a set of prefixes withdrawn every four hours. Across the 27 monitored beacon prefixes, we usually observe more than one zombie outbreak per day. Their presence is highly volatile, however: on average, a monitored peer misses 1.8% of withdrawals for an IPv4 beacon (2.7% for IPv6). We also discovered that BGP zombies can propagate to other ASes; for example, zombies in a transit network inevitably affect its customer networks. We employ a graph-based semi-supervised machine learning technique to estimate the scope of zombie propagation, and find that most of the observed zombie outbreaks are small (i.e. on average 10% of monitored ASes for IPv4 and 17% for IPv6). We also report some large zombie outbreaks with almost all monitored ASes affected.
Romain Fontugne, Esteban Bautista, Colin Petrie, Yutaro Nomura, Patrice Abry, Paulo Goncalves, Kensuke Fukuda, Emile Aben

### How to Find Correlated Internet Failures

Abstract
Even as residential users increasingly rely upon the Internet, connectivity sometimes fails. Characterizing small-scale failures of last mile networks is essential to improving Internet reliability.
In this paper, we develop and evaluate an approach to detect Internet failure events that affect multiple users simultaneously using measurements from the Thunderping project. Thunderping probes addresses across the U.S. when the areas in which they are geo-located are affected by severe weather alerts. It detects a disruption event when an IP address ceases to respond to pings. We focus on simultaneous disruptions of multiple addresses that are related to each other by geography and ISP, and thus are indicative of a shared cause. Using binomial testing, we detect groups of per-IP disruptions that are unlikely to have happened independently. We characterize these dependent disruption events and present results that challenge conventional wisdom on how such outages affect Internet address blocks.
Ramakrishna Padmanabhan, Aaron Schulman, Alberto Dainotti, Dave Levin, Neil Spring
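
The binomial-testing step mentioned in this abstract can be sketched as follows. This is a generic one-sided binomial test, not the authors' exact procedure: the parameters `n` (addresses probed in a group), `k` (addresses disrupted in the same time window), `p` (baseline per-address disruption probability), and the threshold `alpha` are assumptions made for illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p): the chance that at least k of n
    addresses are disrupted together if disruptions were independent."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_dependent_event(n, k, p, alpha=0.01):
    """Flag the group when independence is an implausible explanation."""
    return binomial_tail(n, k, p) < alpha


if __name__ == "__main__":
    # Hypothetical group: 100 probed addresses, 1% baseline disruption rate.
    print(is_dependent_event(n=100, k=20, p=0.01))  # 20 simultaneous disruptions
    print(is_dependent_event(n=100, k=1, p=0.01))   # a single disruption
```

A single disruption among 100 addresses is unremarkable under independence, whereas 20 simultaneous disruptions are far too unlikely to be coincidence, which points to a shared cause.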

### On DNSSEC Negative Responses, Lies, and Zone Size Detection

Abstract
The Domain Name System (DNS) Security Extensions (DNSSEC) introduced additional DNS records (NSEC or NSEC3 records) into negative DNS responses; these records can prove that there is no translation for a queried domain name. We introduce a novel technique to estimate the size of a DNS zone by analyzing the NSEC3 records returned by only a small number of DNS queries. We survey the prevalence of the deployment of different variants of DNSSEC negative responses across a large set of DNSSEC-signed zones in the wild, and identify over 50% as applicable to our measurement technique. Of the applicable zones, we show that 99% are composed of fewer than 40 names.
Jonathan Demke, Casey Deccio

### Spectrum Protection from Micro-transmissions Using Distributed Spectrum Patrolling

Abstract
RF spectrum is a limited natural resource under significant demand and thus must be effectively monitored and protected. Recently, there has been significant interest in the use of inexpensive commodity-grade spectrum sensors for large-scale RF spectrum monitoring. The spectrum sensors are attached to compute devices for signal processing computation and also network and storage support. However, these compute devices have limited computation power that impacts the sensing performance adversely. Thus, the parameter choices for the best performance must be made carefully, taking the hardware limitations into account. In this paper, we demonstrate this using a benchmarking study, where we consider the detection of an unauthorized transmitter that transmits intermittently only for very small durations (micro-transmissions). We characterize the impact of device hardware and critical sensing parameters such as sampling rate, integration size and frequency resolution in detecting such transmissions. We find that in our setup we cannot detect more than 45% of such micro-transmissions on these inexpensive spectrum sensors even with the best possible parameter setting. We explore the use of multiple sensors and sensor fusion as an effective means to counter this problem.
Mallesham Dasari, Muhammad Bershgal Atique, Arani Bhattacharya, Samir R. Das

### Measuring Cookies and Web Privacy in a Post-GDPR World

Abstract
In response, the European Union has adopted the General Data Protection Regulation (GDPR), a legislative framework for data protection empowering individuals to control their data. Since its adoption on May 25th, 2018, its real-world implications are still not fully understood. An often mentioned aspect is Internet browser cookies, used for authentication and session management but also for user tracking and advertisement targeting.
In this paper, we assess the impact of the GDPR on browser cookies in the wild in a threefold way. First, we investigate whether there are differences in cookie setting when accessing Internet services from different jurisdictions. Therefore, we collected cookies from the Alexa Top 100,000 websites and compared their cookie behavior from different vantage points. Second, we assess whether cookie setting behavior has changed over time by comparing today’s results with a data set from 2016. Finally, we discuss challenges caused by these new cookie setting policies for Internet measurement studies and propose ways to overcome them.
Adrian Dabrowski, Georg Merzdovnik, Johanna Ullrich, Gerald Sendera, Edgar Weippl

### The Value of First Impressions: The Impact of Ad-Blocking on Web QoE

Abstract
We present the first detailed analysis of ad-blocking’s impact on user Web quality of experience (QoE). We use the most popular web-based ad-blocker to capture the impact of ad-blocking on QoE for the top Alexa 5,000 websites. We find that ad-blocking reduces the number of objects loaded by 15% in the median case, and that this reduction translates into a 12.5% improvement in page load time (PLT) and a slight worsening of time to first paint (TTFP) of 6.54%. We show the complex relationship between ad-blocking and quality of experience: despite the clear improvements to PLT in the average case, for the bottom 10 percentile, this improvement comes at the cost of a slowdown in the initial responsiveness of websites, with a 19% increase in TTFP. To understand the relative importance of this trade-off on user experience, we run a large, crowd-sourced experiment with 1,000 users in Amazon Turk. For this experiment, users were presented with websites for which ad-blocking results in both a reduction in PLT and a significant increase in TTFP. We find, surprisingly, that 71.5% of the time users show a clear preference for faster first paint over faster page load times, hinting at the importance of first impressions on web QoE.
James Newman, Fabián E. Bustamante

### Web Performance Pitfalls

Abstract
Web performance is widely studied in terms of load times, numbers of objects, object sizes, and total page sizes. However, for all these metrics, there are various definitions, data sources, and measurement tools. These often lead to different results and almost all studies do not provide sufficient details about the definition of metrics and the data sources they use. This hinders reproducibility as well as comparability of the results. This paper revisits the various definitions and quantifies their impact on performance results. To do so we assess Web metrics across a large variety of Web pages.
Amazingly, even for such “obvious” metrics as load times, differences can be huge. For example, for more than 50% of the pages, the load times vary by more than 19.1% and for 10% by more than 47% depending on the exact definition of load time. Among the main culprits for such difference are the in-/exclusion of initial redirects and the choice of data source, e.g., Resource Timings API or HTTP Archive (HAR) files. Even “simpler” metrics such as the number of objects per page have a huge variance. For the Alexa 1000, we observed a difference of more than 67 objects for 10% of the pages with a median of 7 objects. This highlights the importance of precisely specifying all metrics including how and from which data source they are computed.
Theresa Enghardt, Thomas Zinner, Anja Feldmann

### Characterizing Web Pornography Consumption from Passive Measurements

Abstract
Web pornography represents a large fraction of the Internet traffic, with thousands of websites and millions of users. Studying web pornography consumption allows understanding human behaviors and it is crucial for medical and psychological research. However, given the lack of public data, these works typically build on surveys, limited by different factors, e.g., unreliable answers that volunteers may (involuntarily) provide.
In this work, we collect anonymized accesses to pornography websites using HTTP-level passive traces. Our dataset includes about 15,000 broadband subscribers over a period of 3 years. We use it to provide quantitative information about the interactions of users with pornographic websites, focusing on time and frequency of use, habits, and trends. We distribute our anonymized dataset to the community to ease reproducibility and allow further studies.
Andrea Morichetta, Martino Trevisan, Luca Vassio

### Backmatter
