Blocklists, how are those maintained?

Hi,
I was just scrolling through one of the imported hosts files, https://mirror1.malwaredomains.com/files/justdomains, and noticed quite a few domains whose server IP address cannot be retrieved (zeroclan.net, for example).
dig returns output like this:

dig http://kisker.czisza.hu/

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> http://kisker.czisza.hu/
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 898
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;http://kisker.czisza.hu/. IN A

;; AUTHORITY SECTION:
. 3058 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2020101602 1800 900 604800 86400

;; Query time: 21 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Oct 16 23:14:33 CEST 2020
;; MSG SIZE rcvd: 128

I was wondering: are those lists being maintained?
Or is such a non-retrievable address still a threat?

I tested several with pihole enabled and pihole disabled.

Checking whether every domain in the hosts files has an IP address should be easy with a script (which I am unable to write myself).

You are digging for a URL, which will not work. dig is a DNS lookup tool and works at the domain level. The correct command would be:

dig kisker.czisza.hu
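
If you copy addresses out of a browser, something like the following could strip them down to the bare host name before they go to dig. This is only a minimal sketch: url_to_host is a made-up helper name, not part of dig or Pi-hole, and it does not handle IPv6 literals or user:password@ prefixes.

```shell
#!/usr/bin/env bash
# url_to_host: hypothetical helper that reduces a URL to its bare host name
# so the result can be passed to dig.
url_to_host() {
    local host="$1"
    host="${host#*://}"   # drop a leading scheme such as http:// or https://
    host="${host%%/*}"    # drop the path (everything from the first slash)
    host="${host%%:*}"    # drop a port suffix such as :8080
    printf '%s\n' "$host"
}

# Example: dig "$(url_to_host 'http://kisker.czisza.hu/')"
```

A name that is already a bare domain passes through unchanged, so the helper can be applied to every entry without checking first.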

You will likely find the domain blocked by Pi-hole, since the domain is on one of the blocklists that ships with Pi-hole. This will not stop Pi-hole from attempting to load a blocklist at that domain, since Pi-hole bypasses itself for blocklist downloads.

pihole -q kisker.czisza.hu

 Match found in https://mirror1.malwaredomains.com/files/justdomains:
   kisker.czisza.hu

What is the output of this command from your Pi terminal - this will print all your current adlists:

sqlite3 /etc/pihole/gravity.db "SELECT address FROM adlist;"

Hi,
Thanks for digging into my question.
Your correction on how to search is both helpful and not.

dig kisker.czisza.hu

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> kisker.czisza.hu
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12378
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kisker.czisza.hu.              IN      A

;; ANSWER SECTION:
kisker.czisza.hu.       2       IN      A       0.0.0.0

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Oct 17 13:27:11 CEST 2020
;; MSG SIZE  rcvd: 50

pi@RPI3:~ $ pihole disable
  [i] Disabling blocking
  [✓] Flushing DNS cache
  [✓] Pi-hole Disabled
pi@RPI3:~ $ dig kisker.czisza.hu

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> kisker.czisza.hu
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 17715
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;kisker.czisza.hu.              IN      A

;; AUTHORITY SECTION:
czisza.hu.              3600    IN      SOA     ns1.webspacecontrol.com. domreg.dotroll.com. 2020101703 86400 7200 3600000 3600

;; Query time: 152 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Oct 17 13:27:38 CEST 2020
;; MSG SIZE  rcvd: 119

When I dig bing.com, there is an ANSWER SECTION which is filled with existing IP addresses

;; ANSWER SECTION:
bing.com.               3600    IN      A       13.107.21.200
bing.com.               3600    IN      A       204.79.197.200

For the non-existent domain (in this example kisker.czisza.hu), dig does not show an ANSWER SECTION but an AUTHORITY SECTION saying
czisza.hu. 3600 IN SOA ns1.webspacecontrol.com. domreg.dotroll.com. 2020101703 86400 7200 3600000 3600
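
The difference between the two cases is easiest to read from the status field in the header line: NXDOMAIN when the name does not exist, NOERROR when an answer comes back (including the 0.0.0.0 that Pi-hole returns for a blocked name). A small sketch for pulling out just that field, where dig_status is a made-up helper name and the parsing assumes the header layout shown in the outputs above:

```shell
#!/usr/bin/env bash
# dig_status: hypothetical helper that reads dig output on stdin and prints
# the value of the "status:" field from the ->>HEADER<<- line.
dig_status() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Example: dig kisker.czisza.hu | dig_status
```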
PiHole is doing what it should do:

sqlite3 /etc/pihole/gravity.db "SELECT address FROM adlist;"
https://mirror1.malwaredomains.com/files/justdomains
https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts

What I am trying to ask is this:
Apparently kisker.czisza.hu no longer exists, because no matching IP address can be retrieved.
So my conclusion is that this address and many others in the list no longer exist and no longer need to be blocked.
These addresses can be dropped from the list, I think.
Or am I missing something here?

Blocklist content is not being provided or maintained by the Pi-hole team, so that would be a question for the respective list maintainer.

But indeed, at least judging from the blocklist's website, the particular list you mention seems to have been stale since mid-2018.

I have just set my experimental Raspberry Pi Model B to work on this chunk:

wc -l hosts3
55522 hosts3
rpi1@RPI1:~ $ time wget --spider -nd -T 10 -t 2 -v -H --max-redirect 0 -i hosts3 2>&1 | grep  'unable to resolve' > hosts-non-existent

The chunk is the cleaned contents of wget https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts, starting with
n2019cov.000webhostapp.com
and ending with
zy16eoat1w.com

I presume it will take days.
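
Since only name resolution matters for this check, a sketch that asks DNS directly instead of spidering with wget might look like the following. It is a sketch under stated assumptions, not a tested replacement for the run above: the function names, the output file names and the 9.9.9.9 upstream address are placeholders. The upstream has to be a resolver other than the Pi-hole itself, because Pi-hole answers blocked names with 0.0.0.0.

```shell
#!/usr/bin/env bash
# Assumption: UPSTREAM is a resolver that is NOT the Pi-hole itself, so that
# blocked domains are not answered with 0.0.0.0. The address is only an example.
UPSTREAM="${UPSTREAM:-9.9.9.9}"

# resolve: print the IPv4 addresses dig returns for one domain. Output is
# empty for NXDOMAIN, for timeouts and for names without A records, because
# only lines that look like an IPv4 address are kept.
resolve() {
    dig +short +time=5 +tries=2 "@${UPSTREAM}" "$1" A |
        grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# split_by_resolution: read one domain per line from file $1 and append each
# to file $2 (at least one address) or file $3 (no address).
split_by_resolution() {
    local domain
    while IFS= read -r domain; do
        [ -z "$domain" ] && continue
        if [ -n "$(resolve "$domain")" ]; then
            printf '%s\n' "$domain" >> "$2"
        else
            printf '%s\n' "$domain" >> "$3"
        fi
    done < "$1"
}

# Example: split_by_resolution hosts3 hosts-existent hosts-non-existent
```

Note that a name landing in the "no address" file is not proof that the domain is gone for good; a timeout or a temporarily broken name server gives the same result.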

Hi,
My RPI has been checking malwaredomains, which obviously hasn't been maintained lately.
These are the results: malwaredomains-true.txt (205.9 KB), malwaredomains-false.txt (374.8 KB), malwaredomains-all.txt (580.7 KB)
You can check the domain names with dig.

The RPI has checked the StevenBlack list from GitHub as well.
stevenblack-true.txt (804.2 KB), stevenblack-false.txt (272.1 KB), stevenblack-all.txt (1.1 MB)

Both were then combined into hosts-combined.
hosts-combined.txt (1010.1 KB)
This reduced the total number of hosts from 82377 to 51498 sites.
For a computer this is no big deal.
For me it was a nice awk and sed exercise.
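
The exact awk and sed commands are not shown here, so the following is only one possible way to do the combining step, using sort and grep instead. The function name and the file names in the example are made up for illustration; it assumes one domain per line in every file.

```shell
#!/usr/bin/env bash
# combine_lists: merge domain lists, drop duplicates, then drop every domain
# that appears in the "dead" file given as the first argument.
combine_lists() {
    local dead="$1"
    shift
    # -v invert match, -x whole-line match, -F fixed strings, -f patterns from file
    sort -u "$@" | grep -vxFf "$dead"
}

# Example (hypothetical file names):
# combine_lists all-false.txt malwaredomains-all.txt stevenblack-all.txt > hosts-combined.txt
```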

Why do you want to check every domain on your lists for NXDOMAIN responses?

Checking all of them is pretty much a waste of CPU cycles and bandwidth. If the domains (almost all of which you will never attempt to load in actual use of your network) resolve to NXDOMAIN, then the client gets nothing in return. There is no benefit to checking them all and removing those that are NXDOMAIN. This might reduce RAM use by a few percent at most, but the tradeoff is that you had to look up every domain on the list.

What would you like the developers to do with this information?

@jfb & @DanSchaper
Thank you for looking into this.
As I mentioned before, for the RPI it is no big deal to scan 50,000 or 100,000 records, so it's trivial to check and clean those lists.

Because it's Corona time and I was looking at those lists out of curiosity.
What triggered me was that when I picked some sites at random, the majority of them came back negative in a dig lookup.
So that's where I started to sharpen my Linux skills.
It even triggered a warning from my provider.

What I'd like the developers to do with this information is to consider using a list that is maintained on a regular basis, like for instance this one.

Bottom line
In the end, the choices being made are not really important.
On this forum I see several requests for additional hosts files, but those don't add much as long as the hosts files used by PiHole cover the mainstream sites to be blocked.
In fact, I am in favor of the way Privoxy handles filtering, but it's not a real alternative as it turns out to be dead slow.

One could even ask whether PiHole is needed at all (leaving aside the added bonuses), because one can easily import a hosts file on the RPI or any other computer.

So I'm using PiHole because of its ease and sophistication, and I'm grateful to you guys for developing and maintaining it.

We don't maintain, endorse, create or modify lists. That's all up to the user.

This is a good source for reputable blocklists. Add as desired and delete the default blocklist you don't like.

Firebog.net
