1 Introduction
2 Background information on benchmarking DNS64 servers
2.1 Benchmarking methodology for DNS64 servers
2.2 The operation of dns64perf++ in a nutshell
dns64perf++ is described in detail in our open access paper [11]; therefore, we give only a very short summary of it.
dns64perf++ uses the following namespace: {000..255}-{000..255}-{000..255}-{000..255}.dns64perf.test. During DNS64 tests, these names are resolved to IPv4 addresses. (For example, 000-001-002-003.dns64perf.test is resolved to 0.1.2.3.) A subset of the namespace to be used can be expressed by the IPv4 address range to which the domain names are mapped. E.g. 10.0.0.0/8 means a name space of size 2^24 with the following names: 010-{000..255}-{000..255}-{000..255}.dns64perf.test.
dns64perf++ can send only requests for “AAAA” records. (The domain name in the above example can be mapped to 2001:db8::0.1.2.3.)
The dns64perf++ program performs one elementary test, and a bash shell script is used to perform the binary search and its 20 repetitions.
dns64perf++ used only two threads (one thread for sending the queries and another thread for receiving the replies), and it was capable of testing at rates up to 200,000 qps [11]. When we tested its accuracy, we discovered a bug in its timing algorithm, which made it unreliable above a rate of about 50,000 qps; we corrected the bug and rechecked the accuracy [12]. We have also enabled dns64perf++ for benchmarking the caching performance of DNS64 servers [13], which is an optional test of RFC 8219.
dns64perf++, has made different developments on dns64perf++ so that it can be used for benchmarking up to several million queries per second. The “multiport” feature [14] (latest commit d6fa119 on Oct 8, 2018) includes all the following ones:
pthread_setaffinity_np(), to avoid their wandering among CPU cores. Our modified source code is available from [15].
2.3 Size of the name space
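The mapping between the dns64perf++ name space and IPv4 address ranges, and hence the size of a name space given as an address range, can be illustrated with a minimal Python sketch. It is not taken from dns64perf++ itself: the function names are hypothetical, and the /96 length of the 2001:db8:: prefix is an assumption that is merely consistent with the example address 2001:db8::0.1.2.3.

```python
import ipaddress

SUFFIX = "dns64perf.test"  # domain used by the dns64perf++ name space


def name_for(addr: ipaddress.IPv4Address) -> str:
    """Build the name whose first label encodes the four octets of addr."""
    first_label = "-".join(f"{octet:03d}" for octet in addr.packed)
    return f"{first_label}.{SUFFIX}"


def address_for(name: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from the first label of the name."""
    first_label = name.split(".", 1)[0]
    octets = [int(part) for part in first_label.split("-")]
    return ipaddress.IPv4Address(bytes(octets))


def synthesize(addr: ipaddress.IPv4Address,
               prefix: str = "2001:db8::/96") -> ipaddress.IPv6Address:
    """Embed addr in the last 32 bits of the prefix (assumed to be a /96)."""
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address) | int(addr))


def namespace_size(cidr: str) -> int:
    """Number of names selected by an IPv4 address range such as 10.0.0.0/8."""
    return ipaddress.IPv4Network(cidr).num_addresses


if __name__ == "__main__":
    example = ipaddress.IPv4Address("0.1.2.3")
    print(name_for(example))
    print(address_for("000-001-002-003.dns64perf.test"))
    print(synthesize(example))
    print(namespace_size("10.0.0.0/8"))
```

For 10.0.0.0/8, namespace_size returns 2^24, in agreement with the example above.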
3 DNS implementations and their settings
3.1 BIND
--with-tuning=large option to test their performance difference.
/etc/bind/named.conf.local file:
3.2 YADIFA
/etc/yadifa/yadifad.conf:
/var/lib/yadifa.
3.3 NSD
server-count is set to a value higher than 1 [21].
/etc/nsd/nsd.conf file:
server-count option. Therefore, we always set this value to the number of active CPU cores.
3.4 Knot DNS
/etc/knot/knot.conf:
3.5 FakeDNS
mtd64-ng DNS64 server to eliminate the need for an authoritative DNS server when performing DNS64 benchmarking [24]. FakeDNS does not use a zone file, and it can serve only the name space used by dns64perf++: it simply takes the information from the first label (e.g. 000-001-002-003) to calculate the appropriate IPv4 address (e.g. 0.1.2.3). As it does not use a zone file, it starts very fast and uses very little memory. Similarly to mtd64-ng, FakeDNS is also multi-threaded. As it did not provide the required performance during our preliminary measurements, Dánial Bakai has developed an experimental feature called “moreproc”. It starts a separate process for every single CPU core, and a modified version of iptables is used to distribute the requests among the processes.
/etc/fakedns.conf as follows:
As dns64perf++ sends its requests to port 53, we used a special kernel module and iptables patch prepared by Dánial Bakai, which could rewrite the destination port numbers in the requests and the source port numbers in the replies. The requests were distributed equally among the fakedns processes using the nth mode of the statistic module of iptables.
4 Measurements
4.1 Hardware and software environment
dns64perf++. The CPU clock frequency of the DUT could vary from 1.2 to 2.1 GHz. The CPU clock frequency scaling governor was set to “performance” for all active CPU cores on both computers.
maxcpus=N kernel parameter to activate N CPU cores at boot time. We tested with 1, 2, 4, 8, 16 and 32 active CPU cores.
4.2 Receive-side scaling
4.3 Execution of the measurements
The dns64perf++ program can be used to decide whether the system can serve all the requests at the specified rate during the 60 s long time interval required by RFC 8219. The highest such rate can be determined by a binary search. For DNS64 measurements, RFC 8219 requires executing the binary search at least 20 times. The final result is the median of the 20 results, whereas the first percentile and the 99th percentile are used to express the dispersion; these are the minimum and the maximum if the number of repetitions of the binary search is less than 100.
4.4 Measuring further important quantities
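The measurement procedure described in Section 4.3 (a binary search for the highest rate at which all requests are served, repeated at least 20 times and summarized by the median, the first percentile and the 99th percentile) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the bash script used in our measurements: test_passes is a hypothetical callable standing in for one 60 s elementary test with dns64perf++, the pass/fail outcome is assumed to be monotonic in the rate, and the nearest-rank percentile definition is an assumption, chosen because it yields the minimum and the maximum whenever there are fewer than 100 results.

```python
import statistics
from typing import Callable, List, Tuple


def highest_passing_rate(test_passes: Callable[[int], bool],
                         upper_bound: int) -> int:
    """Binary search for the highest rate in 1..upper_bound that passes.

    test_passes(rate) is a hypothetical stand-in for one elementary test.
    Returns 0 if no rate passes.
    """
    low, high = 0, upper_bound  # low: highest rate known to pass (0 = none yet)
    while low < high:
        mid = (low + high + 1) // 2
        if test_passes(mid):
            low = mid           # mid passed: the answer is mid or higher
        else:
            high = mid - 1      # mid failed: the answer is below mid
    return low


def nearest_rank(sorted_values: List[float], percent: int) -> float:
    """Nearest-rank percentile (assumed definition) of a sorted list."""
    rank = max(1, -(-percent * len(sorted_values) // 100))  # ceiling division
    return sorted_values[rank - 1]


def summarize(results: List[float]) -> Tuple[float, float, float]:
    """Return (median, 1st percentile, 99th percentile) of the results."""
    ordered = sorted(results)
    return (statistics.median(ordered),
            nearest_rank(ordered, 1),
            nearest_rank(ordered, 99))


if __name__ == "__main__":
    # Simulated device that can serve at most 57,670 qps (illustrative only).
    rates = [highest_passing_rate(lambda r: r <= 57670, 200000)
             for _ in range(20)]
    print(summarize(rates))
```

With 20 repetitions, the nearest-rank first and 99th percentiles coincide with the smallest and the largest result, as stated above.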
5 Results
5.1 BIND
/var/log/syslog file. At one and two cores, the number of UDP listeners was equal to the number of active CPU cores; however, from 4 cores, the number of UDP listeners was only half of the number of active CPU cores. Thus, the number of listeners became a bottleneck at four active CPU cores.
--with-tuning=large option is shown in Fig. 4. On the one hand, the single-core performance increased drastically, from 10,564 to 57,670 qps, compared with the previous case; on the other hand, there are problems with scaling up. The fact that the performance did not increase from one to two active CPU cores can be explained by BIND using a single UDP listener in both cases. From four active CPU cores, BIND used one fewer UDP listener than the number of active CPU cores. Unfortunately, the performance decreased from 16 to 32 cores. The large difference between the first percentile and the 99th percentile of the results from 4 to 32 cores is another issue. (We need to consider the first percentile for DNS64 benchmarking.)
5.2 YADIFA
5.3 NSD
5.4 Knot DNS
5.5 FakeDNS
5.6 Performance comparison
5.7 Memory consumption and zone load time
5.8 Recommendation for DNS64 benchmarking
5.9 Discussion of FakeDNS
mtd64-ng, an experimental DNS64 server [24].
5.10 Construction of a high performance DNS64 benchmarking system
6 Plans for future research
mtd64-ng, our tiny DNS64 proxy [24], and its benchmarking, which became feasible thanks to our current results.