Encountered non-critical SQL warnings

Pinging @jpgpi250 and @Vexira on this as we know they have a large collection of lists so they can easily check if my solution is showing any odd behavior like filtering out domains it shouldn't. Also, all the database warnings/errors should be gone now.

pihole checkout core tweak/database_warnings_inspection

Hey I did as, was suggested to do by @DanSchaper and to check all my lists and I shrunk it to just a set that doesn't throw any errors, I've been sifting through the lists to see which ones were clean.

But I'll back up this current install and restore the backup from the old install and test your fix, also thank you for your efforts.

Ohh cool cheers for that I think I found one of his lists is down on fire blog, I'll be sure to find it and let him know

1 Like

From your PR I can see Valid are: a-z, A-Z, 0-9, dot (.), minus (-), underscore (_) and after pihole- g I get

  [i] 28614 of 28619 domains imported (99.9%)
      Sample of invalid domains (showing up to five):
      - банрек.рус
      - укроп-петрушка-огурцы.рф
      - хельга.рф
      - эхх.рф
      - mxtads.com:8040

I don't know if theses cyrillic (?) domains are valid, but as you are checking just for latin letters they might never be imported. Can you extend the check for internationalized domains?

Those are indeed internationalized domain names (IDNA) as resolvable by your browser.

DNS on the other hand never uses these names directly, as it is restricted to a subset of ASCII.

In order for IDNAs to be usable by DNS, they have to be converted to their Punycode equivalent, e.g.

банрек.рус is equivalent to xn--80abnuns.xn--p1acf

Depending on your browser, you might actually see the URL change to its Punycode equivalent when entering and submitting an IDNA URL.

As the lists are meant to be domain names for filtering at DNS level, I'd assume that it would be the list maintainer's job, rather than Pi-hole's, to provide the Punycode names in the first place.

Having Pi-hole do the conversion would mean to mutiply the effort for Punycode conversions by number of Pi-hole installations times building the gravity when using that list.
This would not only be a waste of ressources, it would slow down gravity creation as well.

1 Like

Thanks for the explanation, didn't know that DNS uses a subset of ASCII.

Will contact the list maintainer.

It's still crap, but you want to stuff yourself full of manure, I guess that's your wish.

If this stays then it needs to say

976 of 976 lines importable and that's it. No implying the list is quality, no saying it's 100% correct.

The latest update turned it into

 Target: https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt
  [✓] Status: Retrieval successful
  [i] 28614 of 28619 domains imported (99.9%)
      Sample of invalid domains (showing up to five):
      - банрек.рус
      - укроп-петрушка-огурцы.рф
      - хельга.рф
      - эхх.рф
      - mxtads.com:8040

  [i] Target: https://gitlab.com/ZeroDot1/CoinBlockerLists/raw/master/hosts
  [✓] Status: Retrieval successful
  [i] 46752 of 46752 domains imported (100.0%)

Now that it is a problem, the users will have to clean up their lists, or the list maintainers will have to clean up the lists.

I want to have users know that they are putting garbage in to their DNS servers and learn to not do it anymore. This will help everyone going forward and far in to the future. If we have to monkeypatch in workarounds for bad data then no one will ever change and they will continue to put crap in their DNS servers.

1 Like

So you're saying people will stop using Pi-hole because we tell them that their lists were ineffective before and now they are not valid? Seems that we'd gain more trust from being fully transparent and teaching people what exactly it is they are doing.

You don't tell a diabetic to eat a dozen donuts but just remember to triple the insulin the next time.

There should be no rating. Pi-hole does not and can not "rate" a list. I can make a 100% accepted list that blocks you from accessing anything outside of your local network. Is that Quality 100!? People will now say "Hey, this is a 100% Pi-hole list! It's full of shit, but it's 100%!"

This is a list that will get 100% Quality, note the domains that are whitelisted from it. Do you know that list has approved all these domains for access?

I, for once, did read this as an acknowledgement of DL6ER's latest additions to blacklist import.

What's wrong with that?

Or did I misinterpret something here?

1 Like

This is my stance. There is no rating, no percentage, no implied approval or disapproval. Just the bare facts of how many domains were accepted. Just like we did before with listing the number of domains imported.

2 Likes

The full quote is below. Yesterday I made the statement "Crap in Crap Out". The statement below was a nice little dig at me. Everyone around here thinks I'm the angry, grump guy that is out to destroy Pi-hole and ride off in to the sunset.

it woks perfectly, i can see outputs on broken lists, and which need to be removed.

I agree on the quality being unclever. Showing a net number of what was not imported is an obvious quantity for the "HOSTS likeness" of the list. Quality is misleading here.

Maybe like this?

  [✓] Status: Retrieval successful
  [i] Imported 51223 of 51233 lines, 10 lines invalid
      Sample of invalid domains:
      - банрек.рус
      - укроп-петрушка-огурцы.рф
      - хельга.рф
      - эхх.рф
      - mxtads.com:8040

I'm ready to change it to whatever we agree on, however, I'd obviously prefer to not change this multiple times...

2 Likes