Good evening, I'd like to contribute something. I'm not sure whether it will help anyone or whether something similar has already been done, but here it goes: I use Pi-hole not only to remove ads but also to filter content on my network, similar to how SquidGuard works.
Since Pi-hole uses a list-based system, it helps a lot with this task. However, it requires 0.0.0.0 in front of each domain, which I believe makes the domain resolve to that address, effectively blocking it by redirecting it to "null".
With that in mind, I started using the UT1 list, which is updated frequently (almost daily, I'd say). So I had the idea of creating a shell script (with some help from ChatGPT XD) to automate the process for me.
Once a day, the script updates the lists and copies them to a second server running Pi-hole, so both servers use the same list.
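Roughly, such a script might look like the sketch below. The mirror URL, category names, and paths here are illustrative assumptions, not the exact values from my script:

```shell
#!/bin/sh
# Simplified sketch of a daily UT1 update flow. The mirror URL,
# category names, and output paths are illustrative assumptions.
set -eu

UT1_BASE="https://dsi.ut-capitole.fr/blacklists/download"

# to_hosts FILE: convert a UT1 "domains" file (one domain per line)
# into HOSTS format by prefixing each line with 0.0.0.0.
to_hosts() {
    sed 's/^/0.0.0.0 /' "$1"
}

# fetch_category NAME DESTDIR: download and unpack one category archive.
fetch_category() {
    curl -fsSL "$UT1_BASE/$1.tar.gz" | tar -xz -C "$2"
}

# build_blocklist OUTFILE CATEGORY...: assemble one combined HOSTS file
# from the "domains" file of each requested category.
build_blocklist() {
    out=$1; shift
    work=$(mktemp -d)
    : > "$out"
    for cat in "$@"; do
        fetch_category "$cat" "$work"
        to_hosts "$work/$cat/domains" >> "$out"
    done
    rm -rf "$work"
}

# Typical daily run (then install the result and refresh gravity):
# build_blocklist /etc/pihole/ut1.list adult gambling
# pihole -g
```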
You are processing only a subset of the archives published by the University of Toulouse, and they are a mixture of block and allow lists, as the usage file declares.
Note that those archives contain files that are not compatible with Pi-hole.
It seems your script handles this correctly by extracting only the domains file, but it doesn't inspect usage. Users who blindly download all the lists may accidentally block domains that were meant to be allowed (e.g. those in the bank, update, or liste_blanche archives).
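If someone did want to consume more archives safely, a first step could be filtering on the usage file. A minimal sketch, assuming the usage file pairs each category name with "black" or "white" on one line (the real format should be checked against the UT1 site before relying on this):

```shell
#!/bin/sh
# Hypothetical filter on the UT1 "usage" file. The two-column
# "category black|white" layout assumed here is an illustration;
# verify the real file's format first.
set -eu

# select_blacklists USAGE_FILE: print only category names marked "black".
select_blacklists() {
    awk '$2 == "black" { print $1 }' "$1"
}
```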
The above means that you can use the domains file as is, i.e. you do not need to:
# Edit the file by adding "0.0.0.0" at the beginning of each line
sed -i 's/^/0.0.0.0 /' "$var_name"
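Gravity accepts a plain one-domain-per-line file, so the domains file works untouched. As an illustration (this helper is not part of the original script), a quick sanity check that a file is already in that plain form could look like:

```shell
#!/bin/sh
# Illustrative helper: succeed only if FILE is already in the plain
# one-domain-per-line form, i.e. no "0.0.0.0 "-style IP prefixes and
# no URL schemes.
set -eu

is_plain_domains() {
    ! grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[[:space:]]|://' "$1"
}
```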
Your script unconditionally copies files to, and triggers a gravity update on, a remote Pi-hole.
Many users run only a single Pi-hole instance, and those who run several may already sync them by other means.
You should consider removing that part, or making it optional so that it is not triggered by default.
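One way to make the sync opt-in would be to skip the remote step entirely unless the user sets a target host. A sketch, where the SYNC_TARGET variable, the list path, and the ssh user are illustrative assumptions:

```shell
#!/bin/sh
# Illustrative opt-in remote sync. SYNC_TARGET, the list path, and the
# "pi" ssh user are assumptions, not values from the original script.
set -eu

sync_remote() {
    if [ -z "${SYNC_TARGET:-}" ]; then
        echo "SYNC_TARGET not set; skipping remote sync"
        return 0
    fi
    scp /etc/pihole/ut1.list "pi@$SYNC_TARGET:/etc/pihole/ut1.list"
    ssh "pi@$SYNC_TARGET" pihole -g
}
```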