Cloudflare DoH: Netflix problems on SmartTV

ftldns-beta
doh

#7

I'm running 4 TVs, all streaming Netflix, and FTLDNS does not crash.


#8

Okay, so this very much sounds like it is not an FTLDNS problem at all, but a problem with Cloudflare’s DoH and/or dnscrypt. FTLDNS doesn’t care (just as dnsmasq didn’t) whether you are sending your queries to 1.1.1.1, 8.8.8.8, 9.9.9.9 or wherever, on whatever port.

Looking at @Jason_A’s log excerpt, it seems like there are no new incoming queries to the Pi-hole after 13:16:43. However, FTLDNS is still running, as you can see in the messages that it stored queries in the database.

  • Does it only stop responding to the Netflix queries of your TV or does it stop responding to all queries for all devices in your network?
    Even if it were FTLDNS’s fault, it isn’t clear to me how FTLDNS should even know the time you switched off your TV. The only thing I could imagine here is that somehow Cloudflare+DoH doesn’t answer, or answers the Netflix queries incorrectly, and your TV might then be sending thousands of queries per second to your Pi-hole, keeping it 100% busy.
  • Can you confirm something like this from the statistics on your dashboard?
  • Also, could you please try using another DNS upstream provider (preferably not 1.1.1.1) to see if you can confirm that it is a problem on their side?
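
If the dashboard is not reachable, the query-flood theory above can also be checked from the query log. This is only a rough sketch: it assumes the dnsmasq log format shown elsewhere in this thread (with or without the extra serial/client prefix), and the helper names `queries_per_second` and `busiest` are made up for illustration.

```python
import re
from collections import Counter

# Matches dnsmasq "query[...]" log lines, with or without the extra
# "<serial> <client>/<port>" prefix that appears in the logs in this thread.
QUERY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+dnsmasq\[\d+\]:\s+"
    r"(?:\d+\s+\S+\s+)?query\[(?P<type>[^\]]+)\]\s+(?P<name>\S+)"
    r"\s+from\s+(?P<client>\S+)"
)


def queries_per_second(lines):
    """Count incoming queries per (timestamp, client) pair."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.match(line)
        if match:
            counts[(match.group("ts"), match.group("client"))] += 1
    return counts


def busiest(lines, top=5):
    """Return the `top` busiest (timestamp, client) pairs with their counts."""
    return queries_per_second(lines).most_common(top)
```

Something like `busiest(open("/var/log/pihole.log"))` would then list the clients with the most queries in a single second (the log path is an assumption about the install). A device hammering the Pi-hole should stand out with counts in the hundreds or thousands.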

@Tntdruid, is one of them also a Samsung device? I have no (Smart)TV, so I cannot test this at all.


#9

@DL6ER Yes, all 4 are Samsung TVs


#10

Okay, but you are not using this DoH or dnscrypt, right? Which upstream DNS servers are you using?


#11

@DL6ER No, I don't use dnscrypt.

PIHOLE_DNS_1=1.1.1.1
PIHOLE_DNS_2=1.0.0.1
PIHOLE_DNS_3=2606:4700:4700::1111
PIHOLE_DNS_4=2606:4700:4700::1001

Cloudflare


#12

@Tntdruid, any chance you could try setting up DoH (as per this guide) and seeing if you experience the same issues?

For the Pi-hole part, just blank out the PIHOLE_DNS_X lines, create /etc/dnsmasq.d/50-cloudflared.conf (containing server=127.0.0.1#5053), and then run pihole -r.

I have done this and have no issues with Netflix, but then again, I am not running a Samsung Smart TV

Edit: I'm starting to see this when using Netflix on an Xbox One.


#13

@PromoFaux Looks like DoH messes stuff up, going to remove it again.


#14

Yeah, probably wise. Dom and I are trying to debug it currently. You can actually still make queries over DoH, it’s just FTLDNS does not like this particular query via DoH for some reason…


#15

OK, it’s not FTLDNS, it’s dnsmasq. I can reproduce the same thing on a system with only dnsmasq.


#16

Confirmed, dnsmasq stalls at

#0  0x00007ffff7b15700 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555555566533 in read_write (fd=fd@entry=13, packet=packet@entry=0x7fffffffe334 "\002[\001", size=size@entry=1, rw=rw@entry=1) at util.c:654
#2  0x000055555557029c in tcp_request (confd=13, now=1523191598, local_addr=0x7fffffffe410, netmask=..., auth_dns=0) at forward.c:1713
#3  0x0000555555575999 in check_dns_listeners (now=1523191598) at dnsmasq.c:1745
#4  0x000055555555d019 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:1061

I will contact Simon Kelly about it - I’m afraid there will be no quick solution to this… Please don’t use DoH meanwhile!
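
For anyone wondering why a stuck read matters: frame #0 is a read and frame #1 asks for a single byte (size=1) on the client connection. A blocking read like that only returns when data arrives or the peer closes the connection. The sketch below illustrates this with a local socket pair standing in for the TCP connection. It is not dnsmasq code, and `read_one_byte` and `demo` are hypothetical names.

```python
import socket


def read_one_byte(conn, timeout=0.5):
    """Try to read one byte; report what happened instead of blocking forever."""
    conn.settimeout(timeout)
    try:
        data = conn.recv(1)
    except socket.timeout:
        return "stalled"  # peer is silent, but the connection is still open
    return "closed" if data == b"" else "data"


def demo():
    # A connected local socket pair stands in for the client/server TCP link.
    server, client = socket.socketpair()
    results = []
    client.sendall(b"\x00")                # peer sends a byte
    results.append(read_one_byte(server))  # read succeeds
    results.append(read_one_byte(server))  # nothing more sent, peer still open
    client.close()                         # peer finally closes its end
    results.append(read_one_byte(server))  # read returns end-of-stream
    server.close()
    return results
```

Calling `demo()` returns `["data", "stalled", "closed"]`. Without the timeout that this sketch adds, the second read would wait indefinitely, and a process handling the connection in its main loop could not answer anyone else in the meantime.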


#17

This might not necessarily help anything… But I have seen no issues with Cloudflare DNS using DNS over TLS (via getdns and stubby). Maybe it would be useful to try the same use case with DNS over TLS to see if the problem persists?

https://dnsprivacy.org/wiki/pages/viewpage.action?pageId=3145786


#18

Do you need to have Netflix to trigger the problem, or does a curl request to a certain Netflix domain trigger the bug as well? I'm asking for reproduction purposes.

It just so happens I’m currently testing DoH on Firefox, which internally makes use of the same Cloudflare solution.

Have you tried enabling DNSSEC and then reproducing it?


#19

The exact point at which dnsmasq crapped out was after these two queries:

Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 query[A] api-global.netflix.com from 192.168.0.6
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 forwarded api-global.netflix.com to 127.0.0.1
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.geo.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.5.237.4
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.44.197.215
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.45.38.64
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.45.118.37
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.15.52
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.22.121
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.174.212
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.242.21
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 query[A] api-global.netflix.com from 192.168.0.6
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.geo.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.242.21
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.174.212
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.22.121
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.15.52
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.45.118.37
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.45.38.64
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.44.197.215
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.5.237.4
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 query[A] ichnaea.netflix.com from 192.168.0.6
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 forwarded ichnaea.netflix.com to 127.0.0.1
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.geo.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.195.89.28
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.197.160.123
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.203.138.40
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.206.109.253
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.81.133.206
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.83.180.65
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.85.134.97
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.87.182.219
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 query[A] ichnaea.netflix.com from 192.168.0.6
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.geo.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.87.182.219
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.85.134.97
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.83.180.65
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.81.133.206
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.206.109.253
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.203.138.40
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.197.160.123
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.195.89.28

#20

Thanks. I’ve changed my DNS to the 5053 provider and ran curl against both addresses to trigger the DNS lookup (I verified via tcpdump that it triggered a response),

but dnsmasq does not seem to mind or crash.

I do run “proxy-dnssec” (as only other config option). Can anyone confirm that curl to those two addresses triggers the problem on their systems?

Edit: I borrowed a laptop with Netflix and ran it across the DNS. I see it run through 15 (?) Netflix domains and a bunch of Akamai stuff, but I can’t seem to trigger the problem.

Edit 2: For me, it seems the Cloudflare app itself crashed that time and dnsmasq reloaded to defaults :) It is strange, but it does not seem to be any single URL on its own.


#21

This happens when Netflix is running on an Xbox on the network, I don’t think a regular URL would cause it. I did some more testing with tcpdump running and found the following - I’m not a network engineer so bear with me:

With the regular DNS servers set (“working” environment set to 1.1.1.1) it seems all the DNS queries and replies from the Xbox involving Netflix are done over UDP. When it is set to one of the “bad” environments, for some reason the Netflix app isn’t satisfied with the first UDP reply so it does another request via TCP instead of UDP. After this TCP reply, dnsmasq hangs.

I thought first that just the TCP request was the problem so I tested doing the same query via TCP from another machine and it didn’t crash it. I think the Netflix app isn’t properly ending the stream - or at least it looks different - compare the following:

Broken (X is Xbox):

  1. X > pi [SYN]
  2. pi > X [SYN, ACK]
  3. X > pi [ACK]
  4. X > pi Standard query A api-global.netflix.com
  5. pi > X [ACK]
  6. pi > X Standard query response
  7. X > pi [ACK]

When I did a similar query from another linux machine (X is linux machine) as dig @pi +tcp api-global.netflix.com it looks like this:

  1. X > pi [SYN]
  2. pi > X [SYN, ACK]
  3. X > pi [ACK]
  4. X > pi Standard query A api-global.netflix.com
  5. pi > X [ACK]
  6. pi > X Standard query response
  7. X > pi [ACK]
  8. X > pi [FIN, ACK]
  9. X > pi [FIN, ACK]
  10. pi > X [ACK]

I’m not sure if that matters or not, but it seemed like one of the only differences.
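
To test whether the missing FIN is what matters, one could mimic the "broken" capture from another machine: send one query over TCP, read the reply, and then leave the connection idle instead of closing it. The following is an untested sketch of that idea. Nothing in this thread confirms that it triggers the hang, and the names `build_tcp_query`, `recv_exact` and `query_and_hold` are made up here.

```python
import socket
import struct
import time


def build_tcp_query(name, query_id=0x1234):
    """Build an A-record query with the 2-byte length prefix used for DNS over TCP."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    message = header + question
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def query_and_hold(server, name, hold_seconds=30.0, port=53):
    """Send one query over TCP, read the reply, then sit idle without closing."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(build_tcp_query(name))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        reply = recv_exact(sock, length)
        time.sleep(hold_seconds)  # no FIN is sent during this window
    return reply  # the socket is only closed once the hold is over
```

For example, `query_and_hold("192.168.0.2", "api-global.netflix.com")`, where the address is a placeholder for your own Pi-hole. If other clients stop getting answers during the hold, that would point at the idle connection rather than at the query itself. Only run this against a test instance you control.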

I also looked at the query before the TCP one, to see if there was something obvious about why it would change from UDP to TCP on the next query. The only thing I can tell is that in the “broken” case the response is larger (647 bytes): the answer section seems to show the entire hostname in the hex view of the packet, while in the “working” case it seems to be some kind of abbreviated response (281 bytes total). The info shown in Wireshark is identical as far as the server names and addresses go.
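
The "abbreviated" form described above sounds like DNS name compression (RFC 1035, section 4.1.4), where a repeated hostname is replaced by a 2-byte pointer to an earlier occurrence. That is an interpretation, not something confirmed in this thread. The sketch below only estimates how much eight A records for the same owner name would shrink with compression. It does not try to reproduce the exact 647 and 281 byte totals, which also include the header, the question and the CNAME records.

```python
def encode_name(name):
    """Encode a hostname in uncompressed DNS wire format (length-prefixed labels)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"  # the root label terminates the name


# Fixed part of an A record after the owner name:
# TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2) + IPv4 address (4)
A_RECORD_FIXED = 2 + 2 + 4 + 2 + 4
POINTER = 2  # a compression pointer always takes 2 bytes


def a_records_size(owner, count, compressed):
    """Bytes used by `count` A records that share the same owner name.

    With compressed=True this assumes the full name already appeared earlier
    in the message (for example as a CNAME target), so every record can use
    a pointer instead of repeating the name.
    """
    name_len = POINTER if compressed else len(encode_name(owner))
    return count * (name_len + A_RECORD_FIXED)


owner = "api-global.latency.prodaa.netflix.com"
full = a_records_size(owner, 8, compressed=False)
short = a_records_size(owner, 8, compressed=True)
print(full, short, full - short)  # prints: 424 128 296
```

So the eight address records alone would account for roughly 300 bytes of difference. If the upstream really returns uncompressed names, a reply that grows past the classic 512-byte UDP limit would be one plausible reason for the client retrying over TCP.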


#22

Very interesting!

@Jason_A Could you try to prevent this TCP request from reaching your Pi-hole? Something like

sudo iptables -A INPUT -i eth0 -p tcp --destination-port 53 -j REJECT
sudo iptables -A INPUT -i eth0 -p udp --destination-port 53 -j ACCEPT

(provided your networking interface is called eth0)


#23

Seems happy with that - so far Netflix is running and I’m seeing the queries in the log, and it’s not breaking.


#24

Confirmed also working for me here running Netflix app on Xbox


#26

Can also confirm that Netflix is working after the iptables rules are added.