Pre-Parsed Lists For Everyone!


#1

As many are aware, most lists available are not fully compatible with Pi-hole and need to be parsed.

There are several ways to parse these lists.

I parse lists locally in a few different ways.

As a way to give back to the community, I put together a parser that uploads to GitHub, so that others may use the lists I've parsed.
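
For anyone wondering what "parsing" means here: most published lists use adblock/EasyList syntax such as ||example.com^, while Pi-hole wants plain hostnames, one per line. A rough sketch of that idea (this is not the actual parser.sh from the repository, and the file names are made up):

# pull the hostname out of ||domain^ style rules and drop everything else
grep '^||' rawlist.txt \
  | sed -e 's/^||//' -e 's/\^.*$//' \
  | grep -E '^([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$' \
  | sort -u > parsedlist.txt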

Main repository, where you can see how the script works:

https://github.com/deathbybandaid/piholeparser

Pre-parsed lists for anybody to use:

https://github.com/deathbybandaid/piholeparser/tree/master/parsed

All of those lists combined into one (because I know people will ask for it):

https://raw.githubusercontent.com/deathbybandaid/piholeparser/master/parsedall/1111ALLPARSEDLISTS1111.txt

Additional Info

  • I set this up so that lists can be added very easily, and the end result has (mildly) nice filenames.
  • Unparsed lists are mirrored in the mirroredlists directory.
  • Lists that have to be extracted have a methodology as well (see the sketch after this list).
  • This runs daily!
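
For the lists that need extracting, the general approach is just download, decompress, then hand the result to the parser. A hand-wavy example, assuming a gzipped list (the URL and file names are placeholders, not taken from the repo):

# download a compressed list (placeholder URL) and unpack it for parsing
curl --silent -L -o somelist.txt.gz "https://example.com/somelist.txt.gz"
gunzip -f somelist.txt.gz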

I am also fairly certain that this covers more blocked domains than any other collection I have seen around the web.

My wife hasn’t complained about websites being blocked, and I haven’t had any major issues.

EDIT:

To clarify, adding 1111ALLPARSEDLISTS1111 does NOT give you the 3 million domains.


PSA

I missed a line in my installer. The cronjob will still work, but if you want it to update correctly, either reinstall or update the cronjob to:

20 0 * * * sudo bash /etc/updaterunpiholeparser.sh
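
If you go the crontab route instead of reinstalling, something like this does it, assuming the job lives in your user's crontab (the installer may put it elsewhere):

crontab -e
# then replace the old piholeparser entry with the corrected line;
# 20 0 * * * means it fires every day at 00:20
20 0 * * * sudo bash /etc/updaterunpiholeparser.sh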

Look at the comments in the thread; there are lots of good questions and answers.


#2

Thanks for combining them into one list. I will do some testing before I deploy them at work; I don't want 10k angry clients :slight_smile:


#3

That's… that's a lot of clients.


#4

I’d like to see some stats from a typical day.

What hardware do you use? The Pi?


#5

Hi,

nice job.

There are two txt files you have parsed from Easylist-GER, but I don't think that the first, bigger one is parsed from the original Easylist-GER file, or am I wrong? The second file has only 59 entries.
Maybe you have some time to take a look.

thx, Frank


#6

One is easylistgermany, the other is easylistgermany+easylist.

I pulled most of the lists from filterlists.com


#7

I used to run with 2x Raspberry Pi 3; the SD cards died after about 7 days. I was getting over 400 million DNS queries a day :slight_smile:


#8

The newer versions of Raspbian can actually run off of a flash drive, a USB drive, or maybe an SSD!


#9

If you're in an enterprise environment, why not have the logs and lists, the ones that get abused the most, stored on a share or iSCSI target from a corporate NAS?
e.g.

mount -t nfs NAS:/pihole/var/log /var/log
mount -t nfs NAS:/pihole/etc/pihole /etc/pihole

In fstab, of course.
Or use formatted iSCSI targets and mount them similarly.
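
Spelled out as /etc/fstab entries, the NFS variant might look something like this (the NAS hostname and export paths are just placeholders):

# /etc/fstab - mount the heavily written Pi-hole paths from an NFS share
NAS:/pihole/var/log     /var/log      nfs  defaults,_netdev  0  0
NAS:/pihole/etc/pihole  /etc/pihole   nfs  defaults,_netdev  0  0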

Or set up free Xen virtualisation, a bare-metal type 1 hypervisor, on two hosts so you can do failover/balancing and don't depend on SD cards anymore.
The XenServer distro (Citrix-owned) saves you setting up the entire Xen Project hypervisor environment manually, and I believe it has a nice GUI and so on.


#10

Oh, PS: nice job @deathbybandaid :+1:


#11

It's a 404 now? Did you change the URL?


#12

I made some changes to the script yesterday, and it apparently took that file down. I am working on fixing it right now, though.


#13

It's fixed now, and I'm saving a chunk of processing time!


#14

Hi,

deleting all the double/triple… entries will reduce the URLs from 269,448 to 48,791 :wink:
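
That kind of de-duplication is a one-liner at the end of a parsing pipeline, for example (file name is hypothetical):

sort -u parsedlist.txt -o parsedlist.txt   # drop duplicate domains in place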


#15

I knew I forgot something! This should be fixed now; I'm doing a test run.


#16

I think that you also have to add a -L option to the curl calls in parser.sh for a file that has been moved on the website and generates a 3xx response code. I ran into that with one of the files, and with the -L option I could download/process it.
For example: sudo curl --silent -L https://github.com/Ad

See https://curl.haxx.se/docs/manpage.html#-L
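
Without -L, curl just returns the 3xx redirect response and the parser ends up with nothing useful; with it, curl follows the Location header to the new address. Roughly (placeholder URL):

# follow redirects so a moved list still gets downloaded (URL is a placeholder)
curl --silent -L -o list.txt "https://example.com/old-location-of-list.txt"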


#17

I added that tweak, and I'm doing a test run.


#18

If anybody thinks there is a better way to parse them, let me know; the file loop is the strong suit of the script.
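
For reference, the sort of per-file loop being talked about looks roughly like this (a sketch, not the repository's actual code; file names are made up):

# loop over a file of list URLs, download each one, then run the parsing steps on it
while read -r url; do
    curl --silent -L "$url" -o rawlist.txt || continue
    # ... parsing/filtering steps go here ...
done < listofurls.txt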


#19

Just stumbled upon this and really like the look of it.

Couple of questions…

I guess I run it daily in the early hours using cron?

What is the purpose of the mirror files?


#20

This script is a work in progress. The mirroredlists directory is there as a reference for what a list looked like before parsing; it also gives credit where it is deserved.