Pre-Parsed Lists For Everyone!


#1

As many are aware, most lists available are not fully compatible with Pi-hole and need to be parsed.

There are several ways to parse these lists.

I parse lists locally in a few different ways.

As a way to give back to the community, I put together a parser that uploads to GitHub, so that others may use the lists I've parsed.
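
For anyone wondering what "parsing" means here: most published lists use adblock/EasyList syntax such as ||example.com^, while Pi-hole wants plain hostnames, one per line. A rough sketch of that idea (this is not the actual parser.sh from the repository, and the file names are made up):

# pull the hostname out of ||domain^ style rules and drop everything else
grep '^||' rawlist.txt \
  | sed -e 's/^||//' -e 's/\^.*$//' \
  | grep -E '^([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$' \
  | sort -u > parsedlist.txt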

Main repository, where you can see how the script works:

https://github.com/deathbybandaid/piholeparser

Pre-parsed lists for anybody to use:

https://github.com/deathbybandaid/piholeparser/tree/master/parsed

All of those lists combined into one (because I know people will ask for it):

https://raw.githubusercontent.com/deathbybandaid/piholeparser/master/parsedall/1111ALLPARSEDLISTS1111.txt

Additional Info

  • I set this up so that lists can be added very easily, and the end result has (mildly) nice filenames.
  • Unparsed lists are mirrored in the mirroredlists directory.
  • Lists that have to be extracted have a methodology as well (see the sketch after this list).
  • This runs daily!
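
For the lists that need extracting, the general approach is just download, decompress, then hand the result to the parser. A hand-wavy example, assuming a gzipped list (the URL and file names are placeholders, not taken from the repo):

# download a compressed list (placeholder URL) and unpack it for parsing
curl --silent -L -o somelist.txt.gz "https://example.com/somelist.txt.gz"
gunzip -f somelist.txt.gz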

I am also fairly certain that this covers more blocked domains than any other collection I have seen around the web.

My wife hasn’t complained about websites being blocked, and I haven’t had any major issues.

EDIT:

To clarify, adding 1111ALLPARSEDLISTS1111 does NOT give you the 3 million domains.


PSA

I missed a line in my installer. The cronjob will still work, but if you want it to update correctly, either reinstall or update the cronjob to:

20 0 * * * sudo bash /etc/updaterunpiholeparser.sh
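
If you go the crontab route instead of reinstalling, something like this does it, assuming the job lives in your user's crontab (the installer may put it elsewhere):

crontab -e
# then replace the old piholeparser entry with the corrected line;
# 20 0 * * * means it fires every day at 00:20
20 0 * * * sudo bash /etc/updaterunpiholeparser.sh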

Look at the comments in the thread; there are lots of good questions and answers.


#2

Thanks for combining them into one list. I will do some testing before I deploy them at work; I don't want 10k angry clients :slight_smile:


#3

That's… that's a lot of clients.


#4

I’d like to see some stats from a typical day.

What hardware do you use? The Pi?


#5

Hi,

nice job.

There are two txt files you have parsed from Easylist-GER, but I don't think that the first, bigger one is parsed from the original Easylist-GER file, or am I wrong? The second file has only 59 entries.
Maybe you have some time to take a look.

thx, Frank


#6

One is easylistgermany, the other is easylistgermany+easylist.

I pulled most of the lists from filterlists.com


#7

I used to run with 2x Raspberry Pi 3; the SD cards died after about 7 days. I was getting over 400 million DNS queries a day :slight_smile:


#8

The newer versions of Raspbian can actually run off of a flash drive, a USB drive, or maybe an SSD!


#9

If you're in an enterprise environment, why not have the logs and lists, the ones that get abused the most, stored on a share or iSCSI target from a corporate NAS?
e.g.

mount -t nfs NAS:/pihole/var/log /var/log
mount -t nfs NAS:/pihole/etc/pihole /etc/pihole

In fstab, of course.
Or use formatted iSCSI targets and mount them similarly.
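
Spelled out as /etc/fstab entries, the NFS variant might look something like this (the NAS hostname and export paths are just placeholders):

# /etc/fstab - mount the heavily written Pi-hole paths from an NFS share
NAS:/pihole/var/log     /var/log      nfs  defaults,_netdev  0  0
NAS:/pihole/etc/pihole  /etc/pihole   nfs  defaults,_netdev  0  0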

Or set up free Xen virtualisation, a bare-metal type 1 hypervisor, on two hosts so you can do failover/balancing and don't depend on SD cards anymore.
The XenServer distro (Citrix-owned) saves you setting up the entire Xen Project hypervisor environment manually, and I believe it has a nice GUI and so on.


#10

Oh, PS: nice job @deathbybandaid :+1:


#11

It's a 404 now? Did you change the URL?


#12

I made some changes to the script yesterday, and it apparently took that file down. I am working on fixing it right now, though.


#13

It's fixed now, and I'm saving a chunk of processing time!


#14

Hi,

deleting all the double/triple… entries will reduce the URLs from 269,448 to 48,791 :wink:
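
That kind of de-duplication is a one-liner at the end of a parsing pipeline, for example (file name is hypothetical):

sort -u parsedlist.txt -o parsedlist.txt   # drop duplicate domains in place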


#15

I knew I forgot something! This should be fixed now; I'm doing a test run.


#16

I think that you also have to add a -L option to the curl calls in parser.sh for a file that has been moved on the website and generates a 3xx response code. I ran into that with one of the files, and with the -L option I could download/process it.
For example: sudo curl --silent -L https://github.com/Ad

See https://curl.haxx.se/docs/manpage.html#-L
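
Without -L, curl just returns the 3xx redirect response and the parser ends up with nothing useful; with it, curl follows the Location header to the new address. Roughly (placeholder URL):

# follow redirects so a moved list still gets downloaded (URL is a placeholder)
curl --silent -L -o list.txt "https://example.com/old-location-of-list.txt"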


#17

I added that tweak, and I'm doing a test run.


#18

If anybody thinks there is a better way to parse them, let me know; the file loop is the strong suit of the script.
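
For reference, the sort of per-file loop being talked about looks roughly like this (a sketch, not the repository's actual code; file names are made up):

# loop over a file of list URLs, download each one, then run the parsing steps on it
while read -r url; do
    curl --silent -L "$url" -o rawlist.txt || continue
    # ... parsing/filtering steps go here ...
done < listofurls.txt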


#19

Just stumbled upon this and really like the look of it.

Couple of questions…

I guess I run it daily in the early hours using cron?

What is the purpose of the mirror files?


#20

This script is a work in progress. The mirroredlists directory is there as a reference for what a list looked like before parsing; it also gives credit where it is deserved.