Protecting privacy and free speech through DNS pre-caching

armchair · March 4, 2017, 4:42pm

We all know the DNS system is pretty broken.

Your external DNS resolver (usually your ISP) knows which websites you visit by the DNS requests you make. In a not too distant future they might sell access to this data.
DNS requests aren't encrypted.
More governments are storing your DNS requests at the ISP level, like in the UK.

A solution is to copy large parts of the world's DNS data onto your Raspberry Pi-Hole. Pre-fill your DNS cache as it were.

This has a lot of advantages:

Really quick DNS resolving (it's local)
Privacy from profilers and advertisers that use DNS (like Google)
Protection from governments that log this data now (hello UK!)

And why not? You could fit your country's domain and the .com domain on an 8 GB SD card. There are lots of providers who share this DNS data. Pi-holes could even replicate this data between them. In the real world DNS name data doesn't really change all that often.

Pi-Hole could protect you from tracking in a much more complete way.

Biggest feature EVAR!

DanSchaper · March 4, 2017, 6:07pm

The problem that caching runs into is stale data. With TTLs in the range of seconds for a lot of records, you'd be constantly reading/writing to the local cache to have current data. Otherwise you'd break the TLD you had local. We do deploy a cache, and it's a good medium between having current data and minimizing the traffic flow. If you really, really value your privacy, you don't use an upstream provider, you create your own. (Yes, as has been noted, I run my own upstreams that query the root servers directly and deploy DNSSEC, and the connection between myself and the server is encrypted. The Pi-hole queries that as its upstream.)
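To make the TTL point concrete, here is a minimal sketch of a cache that honours per-record TTLs. It is not how Pi-hole's cache is implemented; the class name `TtlCache`, the injectable `clock` and the example address are assumptions for illustration only.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy DNS-style cache: each answer expires after its own TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable (an assumption of this sketch) so the
        # expiry behaviour can be exercised without actually waiting.
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        # Store the answer together with the moment it stops being valid.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller has to ask upstream again.
            del self._entries[name]
            return None
        return address
```

With a TTL of a few seconds, nearly every `get` after the first ends in a fresh upstream query and another `put`, which is the constant read/write churn described above. Ignoring the TTL avoids the churn but serves answers the zone owner has already declared expired.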

It's not a single program solution, and there really can't be with the nature of the DNS decentralized structure. (And you'd kill an SD card in a month or two with the level of writes and reads required otherwise.)

DL6ER · March 5, 2017, 1:46pm

I second @DanSchaper's assessment but I'm also interested in how you evaluate the size that would be needed.

armchair · March 7, 2017, 8:43pm

I asked the parties involved. The complete basic DNS for my country (domain name -> nameserver IP) is 2.5 GB. I've been researching this a bit.

The storage thing is an issue, indeed. A USB SSD may be an option. I'd probably want a device with 8+ GB of RAM. Who knows what the Raspberry Pi 4 or even 5 will bring.

The TTL is not a problem, I asked about that as well. The number of real changes for my country is measured in hundreds per hour. The websites you're likely to visit will almost never change; it's mostly the smaller websites that do. Updating once a month should be fine.

And if the DNS is wrong you can still ask a DNS provider, as normal. Only I would want that request to be masked by 5 random other requests.

Let's be ghosts.

DanSchaper · March 7, 2017, 9:12pm

How do you know if the DNS is wrong/outdated/spoofed/redirected?

EWouters · March 8, 2017, 6:03am

Isn't it possible to make that a pi-hole feature?

DanSchaper · March 8, 2017, 6:16am

It's not something that you would automate or script. It takes manual installation and configuration tuning to run an authoritative, caching, validating and recursive name server like BIND or Unbound. It's not something that we could add in currently. You need to handle access control, verification of authentication signatures, trust anchors and encryption, with absolute trust in the generator of the keys and certificates. It's something users should do themselves if they are at the point that they need that level of anonymity.

DanSchaper · March 8, 2017, 6:16am

https://wiki.archlinux.org/index.php/unbound
https://calomel.org/unbound_dns.html

armchair · March 10, 2017, 7:16am

@DanSchaper I thought I could solve the rare DNS miss by using a small browser plugin that detects NXDOMAIN DNS errors like "domain not found on this server", and then sends a normal upstream query to a normal external DNS resolver. But it would mask that query by sending out 5 other fake queries at the same time, to create some plausible deniability.

I don't believe spoofing will be a big problem. If someone wanted to do that to you, they would do that anyway. With this system spoofing could be a smaller overall risk, since you don't send out queries in the first place. But I'm no expert, so: what's your opinion?

I think we are at a point where all users could do with that level of anonymity.

Do you see room for a 'third option', a middle ground, where the system offers 80% security at 20% of the effort? How far could you get if you created an automated version of what you have?

DanSchaper · March 10, 2017, 5:53pm

If you could create a browser plug-in that would requery accurately and only on the domains that you wanted, you'd still be leaving a trail. I have thought about masking traffic, but the only thing that gets you is that instead of whoever you believe is watching your traffic seeing you visit one site, they now see you visit six sites. You're not getting rid of that true query, and any semi-professional analytics routine that sees you visiting google.com, aiusne.kl, jaifijaiiena.net and whoopwhoop.theritis is going to know the real from the fake. You're not trying to block your traffic from the local hacker consortium, you're trying to mask your traffic from a government actor, and you lose, every time, all the time.
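As a rough illustration of why decoys are weak, here is one simple way an observer could separate real from fake. This is an assumed technique for the sake of the example, not a description of any actual analytics product: names you really visit recur across batches, while random decoys almost never do. The function name `likely_real` and the `min_repeats` threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def likely_real(batches: Iterable[Iterable[str]], min_repeats: int = 2) -> Set[str]:
    """Return the names that show up in at least `min_repeats` query batches."""
    # Count each name once per batch, so a batch cannot vote twice.
    counts = Counter(name for batch in batches for name in set(batch))
    return {name for name, seen in counts.items() if seen >= min_repeats}


# Each set is one masked lookup: the real query plus random decoys.
observed = [
    {"google.com", "aiusne.kl", "jaifijaiiena.net"},
    {"google.com", "qpwoeiru.org", "zmxncbv.info"},
    {"google.com", "lkjhgf.biz"},
]
print(likely_real(observed))
```

Picking decoys from a list of plausible popular sites would make this particular filter less effective, but the true query is still in every batch.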

Now, let's set that argument aside and look at local caching of an entire TLD. A look for today at http://www.dailychanges.com shows the following for the .COM TLD:

.COM	NEW 136,169	DELETED 130,344	TRANSFERRED 209,116	TOTAL 128,130,769

So, you'd have to account for that. (That's just domain names; it doesn't take into consideration the number of DNS record changes, but that's a really hard number to nail down, so I just went with the number of domain transactions, which would require changes to DNS SOA records.)
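A quick back-of-the-envelope calculation with the figures above shows what a monthly refresh would mean. The 30-day interval, the constant daily churn and the assumption that no domain is counted twice are simplifications of this sketch, not measured facts.

```python
# Figures quoted above for the .COM TLD on one day.
new = 136_169
deleted = 130_344
transferred = 209_116
total = 128_130_769

daily_changes = new + deleted + transferred
daily_fraction = daily_changes / total

# Assumption: the same churn every day across a 30-day refresh interval,
# and no domain is counted more than once.
monthly_changes = daily_changes * 30
monthly_fraction = monthly_changes / total

print(f"daily changes:   {daily_changes:,} ({daily_fraction:.2%})")
print(f"30-day estimate: {monthly_changes:,} ({monthly_fraction:.1%})")
```

That works out to 475,629 transactions a day, roughly 0.37% of the zone, and on the order of 11% of .COM entries touched between monthly updates under these assumptions.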

I totally agree that users need to be in control of their information, and have some say to whom it is provided and released. That's why I donate my time to the Pi-hole project.

Automating the setup of your own upstream isn't something I'm looking to do, as I don't want to contribute to the proliferation of unsecured resolvers ripe for abuse. And doing the encryption and certification properly isn't something you should be scripting. I know there are instructions on how to do it, and there are scripts on how to do it, but it's a big no-no to leave the CA on the same device that is authenticating the clients. Compromise the CA and the whole house falls in on you. Scripts that just have you create a CA without explaining the ramifications of that process are failing the users. A false sense of security is worse than knowing you are running in the open. But if you know what you are doing, and use wise security practices, you could tweak something like jedisct1/dnscrypt-server-docker on GitHub (a Docker image for a non-censoring, non-logging, DNSSEC-capable, DNSCrypt-enabled DNS resolver) and use that as a base for a setup that is as secure as a non-government entity would be able to run.

So, know your enemy. And know as much as you can about their capabilities and then you know how to start to plan your approach. And as always, Pi-hole is NOT a security product, for that you need other software.

EWouters · June 16, 2017, 8:34am

As you have personal experience with all the aspects involved in managing a DNS server, I am probably wrong, but there might be a way to set up a distributed DNS server network. The architecture could be similar to how a BitTorrent DHT works. It tells you which packets are the most scarce so you can download those first in order to keep the torrent alive. Instead of scarce packets, older DNS queries could be prioritized to be requeried and re-added to the distributed DNS network.

This might not make sense, but I'm sure there is a way to set up a distributed DNS network that is updated by the peers and supplies the peers with encrypted DNS records (probably slower than current DNS servers).

DanSchaper · June 16, 2017, 8:28pm

I'm sure there have been attempts to set up a distributed DNS system, but you will need some kind of centralized control over how to get to the information.

In the current DNS system, you have authoritative servers for the TLDs, and they report who has authority to answer for the next level down, and then those servers say who has authority for the subdomains, etc. So you have a chain that you can follow up and down to find the valid response to your query. Add in DNSSEC and you can prove that chain and make sure that the response you get from the final authoritative DNS server is genuine.
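The delegation chain can be sketched with a toy in-memory model. This is only an illustration of the walk from the root down to the answering zone: the `ZONES` table, the `resolve` function and every name and address in it are made up, and there is no networking or DNSSEC validation here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical zone data: each zone either delegates a name to a child zone
# or answers it directly.
ZONES: Dict[str, Dict[str, Tuple[str, str]]] = {
    ".": {"com.": ("delegate", "com.")},
    "com.": {"example.com.": ("delegate", "example.com.")},
    "example.com.": {"www.example.com.": ("answer", "192.0.2.10")},
}


def resolve(name: str) -> Tuple[Optional[str], List[str]]:
    """Walk from the root zone down to the zone that answers for `name`."""
    zone = "."
    path = [zone]
    while True:
        records = ZONES.get(zone, {})
        # Find the record in this zone that covers the queried name.
        match = next(
            (rec for owner, rec in records.items()
             if name == owner or name.endswith("." + owner)),
            None,
        )
        if match is None:
            return None, path  # nobody in the chain knows this name
        kind, value = match
        if kind == "answer":
            return value, path
        zone = value  # follow the delegation one level down
        path.append(zone)


address, path = resolve("www.example.com.")
print(address, path)
```

Each hop only needs to know who is responsible for the next level down, which is why no single server has to hold the whole namespace.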

Distributed DNS would require some scheme to prove the chain of trust, and also a way to know quickly how to get to the server that is authoritative for that domain. Unless you have the entire chain stored locally and then listen for signed updates to verify that any changes to your chain are valid, it gets to be a really slow process that is hard to verify. You would need to have the entire phonebook at all times in order to be able to quickly look up the names and addresses, and to be able to update that phonebook when changes occur. That would become very burdensome very quickly. Any other solution and you're right back to a centralized scheme.