Just sharing my blocklist: dbl.oisd.nl

#1

dbl.oisd.nl | info: https://redd.it/9xwwwy

3 Likes

#5

Step 6 is something that interests me. There are a lot of lists containing dead domains.

So a list of listed but unresolvable domains would help clean the other lists that are imported into Pi-hole, reducing the dead content.

So even the garbage has value in the world of domain blocking.

0 Likes

#6

This takes many days of continuous dig commands to go through the entire list.
Maybe there’s a faster way that I don’t know of.
Still figuring out how often to do this…
I guess this also sets my list apart from others?
Please don’t be upset that I’m not sharing.

0 Likes

#8

If you are scanning for invalid domains, perhaps only do that once before you add a new domain (identify the new domains before you add them to your existing list). Scan those, and the ones that are valid go to the master list. Then scan the master list occasionally to see if any of the domains died.
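
A minimal sketch of that workflow, assuming the lists live in plain text files (master.list and new-domains.txt are hypothetical names; comm needs sorted input):

  #!/usr/bin/env bash
  # Sketch: check only the domains that are new, then merge survivors.
  sort -u master.list -o master.list
  sort -u new-domains.txt -o new-domains.txt

  # Candidates = new domains not already on the master list
  comm -13 master.list new-domains.txt > candidates.txt

  while read -r domain; do
      # Keep the domain unless the lookup comes back NXDOMAIN
      if ! dig +time=2 +tries=1 "$domain" A | grep -q 'status: NXDOMAIN'; then
          echo "$domain" >> master.list
      fi
  done < candidates.txt
  sort -u master.list -o master.list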

0 Likes

#9

Finding those bad domains could be a group effort and it would greatly improve the quality of the lists.

The big advantage would be that instead of several lists, a single central list could clean any list.
Scanning on adding will catch very short-lived domains. Repeat this every two weeks. On an NXDOMAIN/error, re-test those domains 48 hours later before moving them to the final bad-domain list (a sketch follows below).

The group effort would be that lists, or sections of lists, can be checked out and marked as in progress, so the next requester gets a different list/section.

The final bad domain list is then uploaded after processing and added to the central list.
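
A hedged sketch of that two-stage check (file names are placeholders; the second loop would run from cron roughly 48 hours after the first):

  # Pass 1: a first NXDOMAIN hit only makes a domain a suspect
  while read -r domain; do
      dig +time=2 +tries=1 "$domain" A | grep -q 'status: NXDOMAIN' \
          && echo "$domain" >> suspects.txt
  done < section.txt

  # Pass 2 (~48 h later): only repeat offenders reach the final list
  while read -r domain; do
      dig +time=2 +tries=1 "$domain" A | grep -q 'status: NXDOMAIN' \
          && echo "$domain" >> bad-domains.final
  done < suspects.txt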

QNAME would not trigger any DDoS alarms, because you are querying the authoritative DNS servers directly and not one upstream server.

1 Like

#10

That is how I do it.
Well, actually a little different:
as soon as there are new domains, they are added.
The DNS check lags behind by at most 24 hours.
This is by design: I would rather have some dead domains in my blocklist and the latest blocking protection than have no dead domains in it and miss out on the latest protection.

0 Likes

#11

That sounds like a good plan.
However, I lack the time to set this up myself at the moment.

0 Likes

#12

Next brainstorm run.

The returned bad-domain list has to be compared against what was originally handed out, whether a section of a list or a whole list. This prevents advertisers from using the mechanism to ‘whitelist’ themselves off a domain blocklist.

‘Whitelisting’ can also happen on the influx side: an advertiser can poison blocklists with lots of domains that return NXDOMAIN/errors, because they control the authoritative DNS for those domains.
So the bad-domain list can only contain individual domains and cannot be condensed into a regex filter.

What can be put in a regex filter are machine-generated, patterned domains like r1---sn-5hne6nlk.googlevideo.com, and this works best as a static file. Such a list can filter the blocking lists directly on their way into gravity, and also the OD-Domains (Out-Dated Domains) list. The result is the same.

Example:
^r[0-9](\.|---)sn-[a-z0-9\-\.]+\.(googlesyndication|googlevideo)\.com$
googlesyndication would also end up in my regex.list, so it would be blocked completely.
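
A quick way to see what such a pattern would cover is to run it against the list with grep -E, which uses the same POSIX-ERE flavour as Pi-hole’s regex matching; a sketch:

  # Count the gravity.list entries the example pattern matches...
  grep -Ec '^r[0-9](\.|---)sn-[a-z0-9\-\.]+\.(googlesyndication|googlevideo)\.com$' /etc/pihole/gravity.list

  # ...and write a copy of the list with those entries removed
  grep -Ev '^r[0-9](\.|---)sn-[a-z0-9\-\.]+\.(googlesyndication|googlevideo)\.com$' /etc/pihole/gravity.list > gravity.cleaned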

I mentioned QNAME earlier for checking OD-Domains. Unbound could be a good candidate to provide this, and it also has an interface (libunbound) that lets you communicate with it directly.

https://www.nlnetlabs.nl/documentation/unbound/libunbound-tutorial-3/
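
For scripting, the unbound-host tool that ships with Unbound is built on that same libunbound interface, so it does its own recursion rather than leaning on one upstream server. A rough sketch (the exact output wording is an assumption worth verifying):

  # Resolve through libunbound's own recursion and flag NXDOMAIN answers
  unbound-host -t A r1---sn-5hne6nlk.googlevideo.com | grep -q 'NXDOMAIN' \
      && echo "OD-Domain candidate"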

The OD-Domain scanner could be part of Pi-hole, and users could opt in to take part in scanning for those domains. There should also be different interval settings for the OD-Domain checks: at one end a request every 10 seconds, at the other a request every second. For the brave among us there could also be a ‘ludicrous’ mode doing ten requests per second, or even more if you dare.
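
Sketched as a shell loop, the interval setting would simply be the sleep between queries (DELAY=10 for the slow setting, 1 for the fast one, 0.1 for ‘ludicrous’; file names are made up):

  #!/usr/bin/env bash
  DELAY=${1:-10}   # seconds between queries; 0.1 = ludicrous mode
  while read -r domain; do
      dig +time=2 +tries=1 "$domain" A | grep -q 'status: NXDOMAIN' \
          && echo "$domain" >> od-domains.list
      sleep "$DELAY"
  done < gravity.section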

1 Like

#13

Please make a mirror on GitLab! GitHub is owned by Microsoft :frowning:

This regex looks very nice, but I found this same regex on Reddit, and someone there says that after a while YouTube videos break; after removing the regex, YouTube videos work again. What can you tell us about this regex? (I know it was just an example, but it looks nice.)

0 Likes

#16

I ran an NXDOMAIN detection script on the first 500 domains of my gravity list and got 93 hits; that is almost 20%.

I also ran my regex filtering on the dbl.oisd.nl list, which has over 1.4 million entries. My own hand-built regexes already covered almost 450,000 of the domains on that list.

The dynamic optimizer made another 100,000 domains obsolete and replaced those with 820 regex entries.

  [i] Creating regex cleaner file for gravity.list
  [✓] 121 regex filters found
  [i] reducing the number of blocking domains with the default regex.
  [i] 1444898 (sub)domains are checked against default regex.
  [✓] 1046150 (sub)domains left after applying default regex.

  [i] looking for sub.domains that can be used as wildcard when more than 20 and has a bare domain
  [✓] 820 wildcard domains found to be included in regex.list

  [i] Applying regex.clean filtering
  [✓] writing gravity.list with 918618 domains remaining

In the end the list was reduced in size by a third, with the removed domains taken over by 941 regex entries.

1 Like

#17

Awesome.

Now we only need a Pi-hole option to auto-download regex lists and we’re all set.

0 Likes

#18

I don’t see that happening soon.

The script that lists the NXDOMAIN entries is not yet error-free, so more testing and trying is needed. The speed is not great: 500 domains take about 3 minutes.

Update:
Because I apply some optimizing to the final gravity list, I have to sequence things carefully.
The NXDOMAIN filter can only be applied to the final gravity list, and must also be generated from a final gravity.list file.

I generate the NXDOMAIN list with a one-liner consisting of dig/awk/sed/sed calls.

dig reads the file, and awk checks whether there is an NXDOMAIN and then outputs the domain. The two seds remove the leading “;” and trailing “.”.

It was not easy, but after a few hours of running into dead ends I found that awk’s record separator (RS) solved the problem: the whole output of dig can then be split up and handled one answer per record.

1 Like

#19

I have now done 30,000 domains; I still have to tally up, but 20% is a good estimate.

I still have 110K domains to go, and I don’t want to run really big scans anytime soon.

I began with runs of 1,000 and built up to 10K in one run. I could just let it all run, but if something goes wrong I can’t easily find what bogged down and where.

I run this on my Pi-hole:

  # Batch-resolve every domain in the file; keep the names whose answer header
  # contains "status: NXDOMAIN,", stripping the leading ";" and the trailing "."
  # from dig's question section.
  dig A +noadflag -f /etc/pihole/gravity.partX | awk '{RS="->>HEADER<<-";}/NXDOMAIN,/{sub(/\.$/,"",$33); print substr($33,2)}' >> /etc/pihole/od-domains.list

Do not just run your whole gravity list, because your ISP will not like it: it looks like you are running a DDoS on them. Use Unbound with QNAME to spread your requests.
Split your gravity.list into sections named gravity.part1 etc. and run those.
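
One way to create those sections, assuming GNU split and an arbitrary 5,000 domains per chunk:

  # Numeric suffixes give gravity.part00, gravity.part01, ...
  split -l 5000 -d /etc/pihole/gravity.list /etc/pihole/gravity.part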

0 Likes

#20

Completed the run for 140K domains and found over 29K NXDOMAINs; I will run the re-check tomorrow or later.

  [i] Pi-hole cleaner - reducing the number of domains in gravity.list

  [i] Creating regex cleaner file for gravity.list
  [✓] 137 regex filters found
  [i] Applying filter, regex.clean to list.preEventHorizon
  [i] reducing the number of obsolete blocking domains.
  [i] 242865 (sub)domains are checked against regex.wildcard
  [✓] 176302 (sub)domains left after applying filter, regex.clean

  [i] looking for sub.domains that can be used as wildcard when more than 20 of them
  [✓] 258 wildcard domains found to be included in regex.list

  [i] Applying wildcard filtering
  [✓] writing gravity.list with 138770 domains remaining

  [i] Applying NXDOMAIN filtering
  [✓] writing gravity.list with 108914 domains remaining
0 Likes

#21

Plot twist: I’ve checked the domains on my blocklist once before…
However… I still have the full list of raw, unchecked domains: over 4 million. I would be amazed if they’re all really dead.
I can provide you with a temporary link, or mail it to you if you are interested.

0 Likes

#22

That is a huge list, and it would take weeks to check it for NXDOMAIN domains. I use small lists: the list I use has 450,000 entries, and when the conversion is done it is a little over 100,000 without losing any of its blocking effect.

I am now setting up the ecosystem to maintain the list. I have to collect the new domains to be checked, and remove the removed domains from the NXDOMAIN list.

A consequence is that the blocklist cannot vary too much in its number of lines, because that could wipe out the built-up NXDOMAIN list, or produce a huge list to be checked.

I have all the base files present, so testing will be easy… thanks to the utility comm.
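
A sketch of that comm bookkeeping, assuming sorted files and placeholder names:

  # added.txt   = domains that are new and still have to be checked
  # removed.txt = domains that dropped off the blocklist
  comm -13 blocklist.old blocklist.new > added.txt
  comm -23 blocklist.old blocklist.new > removed.txt

  # Drop the removed domains from the NXDOMAIN list as well
  comm -23 nxdomain.list removed.txt > nxdomain.tmp && mv nxdomain.tmp nxdomain.list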

PS: the 820 regex domains mentioned earlier are really heavy on the processor, so another solution is needed. Maybe Mmotti’s way is more suitable for that.

0 Likes

#23

The ecosystem is now also complete: new domains in the blocklist are saved up and can be checked manually, or automatically from a cron job.
Removed domains in the blocklist are also saved up and will be dropped if they are present on the NXDOMAIN list.

I have to do some more testing and make it ‘user-proof’. Like an old pump, you first have to prime it with water before it will pump: you need a seed nxdomain.list so the process does not stall by having to check, say, 4 million domains. :wink:
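
The cron part could be as simple as one line (script name and schedule are made up for the example):

  # Nightly re-check of the saved-up new domains, 03:00 every day
  0 3 * * * /usr/local/bin/check-new-domains.sh >> /var/log/nxdomain-check.log 2>&1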

I will first put it live on my own Pi-hole, and then post it here for those who want to use it.

0 Likes

#24

You can also use the dig command with different DNS servers, like: dig www.google.com @1.1.1.1, dig www.google.com @8.8.4.4, dig www.google.com @8.8.8.8, etc.
In this case the requests are spread.
DNS servers to use: 1.1.1.1, 8.8.8.8, 8.8.4.4, 8.26.56.26 and 8.20.247.20 (Comodo), 208.67.222.222 and 208.67.220.220 (OpenDNS), 199.85.126.10 and 199.85.127.10 (Norton), and OpenNIC (see https://www.opennic.org).
A list of public DNS servers: https://public-dns.info/. Per country you can download a plain text file with all the DNS server IP addresses of that country.

You have to make a routine that chooses one of them from the list before doing each dig, as in the sketch below.
Using this ‘construction’ the Pi can run at full speed.
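
A minimal sketch of such a routine in bash, using the servers listed above (gravity.part1 as input is just an example):

  #!/usr/bin/env bash
  SERVERS=(1.1.1.1 8.8.8.8 8.8.4.4 8.26.56.26 8.20.247.20
           208.67.222.222 208.67.220.220 199.85.126.10 199.85.127.10)

  while read -r domain; do
      srv=${SERVERS[RANDOM % ${#SERVERS[@]}]}   # pick a random resolver
      dig +time=2 +tries=1 "$domain" A @"$srv" | grep -q 'status: NXDOMAIN' \
          && echo "$domain" >> od-domains.list
  done < gravity.part1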

0 Likes