Cloudflare DoH: Netflix problems on SmartTV

@DL6ER Yes, all 4 are Samsung TVs

Okay, but you are not using this DoH or dnscrypt, right? Which upstream DNS servers are you using?

@DL6ER No, I don't use dnscrypt.

PIHOLE_DNS_1=1.1.1.1
PIHOLE_DNS_2=1.0.0.1
PIHOLE_DNS_3=2606:4700:4700::1111
PIHOLE_DNS_4=2606:4700:4700::1001

Cloudflare

@Tntdruid, any chance you could try setting up DoH (as per this guide) and seeing if you experience the same issues?

For the Pi-hole part, just blank out the PIHOLE_DNS_X lines, create /etc/dnsmasq.d/50-cloudflared.conf (containing server=127.0.0.1#5053), and then run pihole -r.

I have done this and have no issues with Netflix, but then again, I am not running a Samsung Smart TV

Edit: starting to see this using Netflix on Xbox One.

@PromoFaux Looks like DoH messes stuff up, going to remove it again.

Yeah, probably wise. Dom and I are trying to debug it currently. You can actually still make queries over DoH, it's just FTLDNS does not like this particular query via DoH for some reason...

OK, it's not FTLDNS, it's dnsmasq. Can reproduce the same thing on a system with only dnsmasq

Confirmed, dnsmasq stalls at

#0  0x00007ffff7b15700 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555555566533 in read_write (fd=fd@entry=13, packet=packet@entry=0x7fffffffe334 "\002[\001", size=size@entry=1, rw=rw@entry=1) at util.c:654
#2  0x000055555557029c in tcp_request (confd=13, now=1523191598, local_addr=0x7fffffffe410, netmask=..., auth_dns=0) at forward.c:1713
#3  0x0000555555575999 in check_dns_listeners (now=1523191598) at dnsmasq.c:1745
#4  0x000055555555d019 in main (argc=<optimized out>, argv=<optimized out>) at dnsmasq.c:1061

I will contact Simon Kelly about it - I'm afraid there will be no quick solution to this... Please don't use DoH meanwhile!

This might not necessarily help anything, but I have seen no issues with Cloudflare DNS using DNS over TLS (via getdns and stubby). Maybe it would be useful to try the same use case with DNS over TLS to see if the problem persists?

https://dnsprivacy.org/wiki/pages/viewpage.action?pageId=3145786

Do you need to have Netflix to trigger the problem, or does a curl request to a certain Netflix domain trigger the bug as well? Asking for reproduction purposes.

It just so happens I'm currently testing DoH in Firefox, which internally makes use of the same Cloudflare solution.

Have you tried enabling DNSSEC and then reproducing it?

The exact point at which dnsmasq crapped out was after these two queries:

Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 query[A] api-global.netflix.com from 192.168.0.6
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 forwarded api-global.netflix.com to 127.0.0.1
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.geo.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.5.237.4
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.44.197.215
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.45.38.64
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.45.118.37
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.15.52
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.22.121
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.174.212
Apr  8 13:30:49 dnsmasq[16025]: 3 192.168.0.6/53524 reply api-global.latency.prodaa.netflix.com is 52.54.242.21
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 query[A] api-global.netflix.com from 192.168.0.6
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.geo.netflix.com is <CNAME>
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.242.21
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.174.212
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.22.121
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.54.15.52
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.45.118.37
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.45.38.64
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.44.197.215
Apr  8 13:30:49 dnsmasq[16025]: 4 192.168.0.6/50363 cached api-global.latency.prodaa.netflix.com is 52.5.237.4
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 query[A] ichnaea.netflix.com from 192.168.0.6
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 forwarded ichnaea.netflix.com to 127.0.0.1
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.geo.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.195.89.28
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.197.160.123
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.203.138.40
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 34.206.109.253
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.81.133.206
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.83.180.65
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.85.134.97
Apr  8 13:53:07 dnsmasq[25121]: 119 192.168.0.6/52192 reply ichnaea.latency.prodaa.netflix.com is 54.87.182.219
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 query[A] ichnaea.netflix.com from 192.168.0.6
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.geo.netflix.com is <CNAME>
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.87.182.219
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.85.134.97
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.83.180.65
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 54.81.133.206
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.206.109.253
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.203.138.40
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.197.160.123
Apr  8 13:53:07 dnsmasq[25121]: 120 192.168.0.6/50382 cached ichnaea.latency.prodaa.netflix.com is 34.195.89.28

Thanks. I've changed my DNS to the 5053 provider and ran curl on both addresses to trigger the DNS lookup (verified via tcpdump that it triggered a response),

but dnsmasq does not seem to mind or crash.

I do run "proxy-dnssec" (as the only other config option). Can anyone confirm that curl to those two addresses triggers the problem on their systems?

--Edit: borrowed a laptop with Netflix and ran it across the DNS. I see it run through 15 (?) Netflix domains and a bunch of Akamai stuff, but I can't seem to trigger the problem.

--Edit 2: it seems that for me the actual Cloudflare app crashed that time and dnsmasq reloaded to defaults :) It is strange, but it does not seem to be any single URL on its own.

This happens when Netflix is running on an Xbox on the network, I don't think a regular URL would cause it. I did some more testing with tcpdump running and found the following - I'm not a network engineer so bear with me:

With the regular DNS servers set ("working" environment set to 1.1.1.1) it seems all the DNS queries and replies from the Xbox involving Netflix are done over UDP. When it is set to one of the "bad" environments, for some reason the Netflix app isn't satisfied with the first UDP reply so it does another request via TCP instead of UDP. After this TCP reply, dnsmasq hangs.

I thought first that just the TCP request was the problem so I tested doing the same query via TCP from another machine and it didn't crash it. I think the Netflix app isn't properly ending the stream - or at least it looks different - compare the following:

Broken (X is Xbox):

  1. X > pi [SYN]
  2. pi > X [SYN, ACK]
  3. X > pi [ACK]
  4. X > pi Standard query A api-global.netflix.com
  5. pi > X [ACK]
  6. pi > X Standard query response
  7. X > pi [ACK]

When I did a similar query from another Linux machine (X is the Linux machine) with dig @pi +tcp api-global.netflix.com, it looks like this:

  1. X > pi [SYN]
  2. pi > X [SYN, ACK]
  3. X > pi [ACK]
  4. X > pi Standard query A api-global.netflix.com
  5. pi > X [ACK]
  6. pi > X Standard query response
  7. X > pi [ACK]
  8. X > pi [FIN, ACK]
  9. X > pi [FIN, ACK]
  10. pi > X [ACK]

I'm not sure if that matters or not, but it seemed like one of the only differences.
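For anyone wanting to reproduce the two traces above without an Xbox, here is a minimal Python sketch of a DNS query over TCP. It assumes the difference that matters is whether the client closes the connection after the reply, which is not confirmed in this thread. DNS over TCP prefixes each message with a two-byte length field (RFC 1035); the function names, the fixed query ID, and the keep_open flag are made up for illustration.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # Each label is preceded by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def frame_for_tcp(message):
    """Prepend the two-byte big-endian length that DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def query_tcp(server, name, port=53, keep_open=False, timeout=5.0):
    """Send one query over TCP and return (reply, socket or None).

    keep_open=True (hypothetical flag) leaves the connection open after
    the reply, which is what the "broken" trace appears to show.
    """
    sock = socket.create_connection((server, port), timeout=timeout)
    sock.sendall(frame_for_tcp(build_query(name)))
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    reply = recv_exact(sock, length)
    if keep_open:
        return reply, sock  # caller must close it eventually
    sock.close()  # sends FIN, as in the dig trace
    return reply, None
```

Calling query_tcp("pi", "api-global.netflix.com") should look like the dig trace, since the socket is closed after the reply. With keep_open=True the still-open socket is handed back, so no FIN follows the reply, as in the Xbox trace.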

I also looked at the query before the TCP one, to see if there was something obvious about why it would change from UDP to TCP on the next query. The only thing I can tell is that in the "broken" case the response is larger (647 bytes): the answer section seems to show the entire hostname in the hex view of the packet, while in the "working" case it seems to be some kind of abbreviated response (281 bytes total). The info shown in Wireshark is identical as far as the server names and addresses go.
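The "abbreviated" form is most likely DNS name compression (RFC 1035, section 4.1.4), where a repeated name is replaced by a two-byte pointer to an earlier occurrence in the same message. One possible reading, not verified here, is that the uncompressed 647-byte reply exceeds the classic 512-byte limit for DNS over UDP and that this is why the client retries over TCP. The Python sketch below only illustrates the size difference for a single A record; the TTL and the pointer offset are made-up values.

```python
import struct


def encode_name(name):
    """Encode a hostname in full: length-prefixed labels plus a zero byte."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def pointer(offset):
    """Two-byte compression pointer: top two bits set, 14-bit offset."""
    return struct.pack("!H", 0xC000 | offset)


def a_record(owner, ttl, ip):
    """Build one A record: owner name, TYPE=A, CLASS=IN, TTL, RDLENGTH, RDATA."""
    rdata = bytes(int(part) for part in ip.split("."))
    return owner + struct.pack("!HHIH", 1, 1, ttl, len(rdata)) + rdata


name = "api-global.latency.prodaa.netflix.com"
# TTL of 60 and offset 0x2B are arbitrary values chosen for illustration.
full = a_record(encode_name(name), 60, "52.5.237.4")
compressed = a_record(pointer(0x2B), 60, "52.5.237.4")

print(len(full), len(compressed))  # 53 bytes vs 16 bytes
```

With eight A records per answer, as in the log above, that is 37 bytes saved per record before the CNAME records are even counted.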

Very interesting!

@Jason_A Could you try to prevent this TCP request from reaching your Pi-hole? Something like

sudo iptables -A INPUT -i eth0 -p tcp --destination-port 53 -j REJECT
sudo iptables -A INPUT -i eth0 -p udp --destination-port 53 -j ACCEPT

(provided your networking interface is called eth0)

Seems happy with that - so far Netflix is running and I'm seeing the queries in the log, and it's not breaking.

Confirmed also working for me here running Netflix app on Xbox

Can also confirm that Netflix is working after the iptables is added.

I have modified the DNS resolver slightly. It now spawns child processes for handling individual TCP queries. This way, Netflix (or any other application) shouldn't be able to claim the resolver for itself.

@PromoFaux @vstoms @Jason_A (and anyone else)
I'd be very thankful if you could try again with the most recent version of FTLDNS after removing those iptables rules. There is no guarantee that this will work as I'm not able to reproduce it myself.
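
The idea of handing each TCP connection to its own child process can be sketched in Python as follows. This is not the FTLDNS code (FTLDNS and dnsmasq are written in C); it uses the standard library's socketserver.ForkingTCPServer (Unix only) as a stand-in, and the handler just echoes each message back instead of resolving it. read_framed, QueryHandler, serve, and the default port are hypothetical.

```python
import socketserver
import struct


def read_framed(stream):
    """Read one length-prefixed DNS message; return None on EOF or a short read."""
    prefix = stream.read(2)
    if len(prefix) < 2:
        return None
    (length,) = struct.unpack("!H", prefix)
    body = stream.read(length)
    if len(body) < length:
        return None
    return body


class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # This runs in a forked child. If the client never closes the
        # connection, only this child blocks in read(); the parent keeps
        # accepting other connections.
        while True:
            message = read_framed(self.rfile)
            if message is None:
                break
            reply = message  # placeholder: a real resolver would answer here
            self.wfile.write(struct.pack("!H", len(reply)) + reply)


def serve(host="127.0.0.1", port=5353):
    """Start the forking server (blocks until interrupted)."""
    with socketserver.ForkingTCPServer((host, port), QueryHandler) as server:
        server.serve_forever()
```

The backtrace earlier in the thread shows the single dnsmasq process blocked in a read inside tcp_request; with one process per connection, a client that keeps its connection open would only tie up its own child.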

@DL6ER is this the correct version to test?
Current Pi-hole version is v3.3.1-74-g557372b
Current AdminLTE version is v3.3-92-ga83592b
Current FTL version is vDev-3656ba2