Line 22505 cmp.computerbild.de 1
Line 25890 data-19e0ce8844.computerbild.de 1
Line 82032 ssl.3.damoh.computerbild.de 1
I'm sure it's the same pattern for the other numbers. Pi-hole parses files in HOSTS format. If a line has more than one column, we drop the first and expect the domain in the second column.
#!/usr/bin/env bash
src="${1}" destination="${2}"
# Remove comments and print only the domain name
# Most of the lists downloaded are already in hosts file format but the spacing/formatting is not contiguous
# This helps with that and makes it easier to read
# It also helps with debugging so each stage of the script can be researched more in depth
# 1) Remove carriage returns
# 2) Convert all characters to lowercase
# 3) Remove comments (text starting with "#", include possible spaces before the hash sign)
# 4) Remove lines containing "/"
# 5) Remove leading tabs, spaces, etc.
# 6) Remove empty lines
< "${src}" tr -d '\r' | \
tr '[:upper:]' '[:lower:]' | \
sed 's/\s*#.*//g' | \
sed -r '/(\/).*$/d' | \
sed -r 's/^.*\s+//g' | \
sed '/^$/d' > "${destination}"
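To see what these stages do to a line with a trailing extra column, the same pipeline can be run on a couple of sample lines. This is only a sketch: the function name `extract_domain` and the sample lines are made up for illustration, and `\s` and `-r` assume GNU sed, as in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pi-hole): applies the same stages
# as the script above to whatever arrives on stdin.
extract_domain() {
  tr -d '\r' |
    tr '[:upper:]' '[:lower:]' |
    sed 's/\s*#.*//g' |
    sed -r '/(\/).*$/d' |
    sed -r 's/^.*\s+//g' |
    sed '/^$/d'
}

# Ordinary HOSTS line: the address column is dropped, the domain remains.
printf '0.0.0.0 cmp.computerbild.de\n' | extract_domain

# Line with a trailing extra column: the greedy match removes everything
# up to the last whitespace, so only the final field survives.
printf 'cmp.computerbild.de 1\n' | extract_domain
```

With the pipeline as posted, the second sample is reduced to its last field rather than to the domain, which would explain the bare numbers showing up in the result.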
Put that into a script and run it with ./script.sh input output. Running cat output.txt | grep -E '^2' -n on the output file then gives you a list with line numbers. After that you have to look for the entries by hand.
If you have cleaned up the list further in the meantime, the line numbers may differ.
You should still be able to find the faulty lines by their domains, though.
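As an alternative to searching by hand, the original list can be scanned directly so that each hit is printed together with its domain. This is a rough sketch under the assumption that the faulty entries are the ones whose last column is a bare number, as in the three lines quoted at the top; the function name `find_numeric_entries` and the sample file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "line number: content" for every line whose
# last column is a bare number (assumed to be the faulty entries).
find_numeric_entries() {
  # Strip carriage returns, drop comments, then test the last field.
  tr -d '\r' < "${1}" |
    awk '{ sub(/[ \t]*#.*/, "") } NF >= 1 && $NF ~ /^[0-9]+$/ { print NR ": " $0 }'
}

# Small made-up sample list to try it on.
sample="$(mktemp)"
printf '%s\n' \
  '0.0.0.0 good.example' \
  'cmp.computerbild.de 1' \
  '# a comment line' \
  'data-19e0ce8844.computerbild.de 1' > "${sample}"

find_numeric_entries "${sample}"
rm -f "${sample}"
```

Because the line number and the domain are printed side by side, the entry can still be found by its domain if the numbering has shifted after further cleanup.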