Sorry for the delay, I have been busy trying to move, finish schoolwork, and a lot of other things.
So, I think I found something.
Lines in a hosts file that carry a trailing comment are ignored entirely.
Example:
0.0.0.0 blocked.host.net #Comment Here
I have a file with 15+ million lines like that, and the host count came back as "0".
When I remove the comments, it works fine.
jfb
April 12, 2020, 4:28am
2
We are looking at potential changes to the gravity update routine.
Okay.
Just a thought: I use Perl to parse multiple source files and load them into a SQL database, from which I export a single hosts file.
I use a regex to identify the host names. I don't know whether it is of any use here, but I'll share it anyway.
(([\w\d]+[-.])+\w{2,3})
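For anyone who wants to try that pattern outside Perl, here is a rough sketch using grep. It is an approximation, not the poster's actual script: `extract_hosts` is a made-up helper name, and `\w`/`\d` are replaced with the POSIX class `[[:alnum:]_]` because `\d` is not available in extended regular expressions.

```shell
#!/bin/sh
# extract_hosts: hypothetical helper that prints every host-like token on stdin.
# The pattern is a POSIX ERE approximation of (([\w\d]+[-.])+\w{2,3}).
extract_hosts() {
    grep -oE '([[:alnum:]_]+[-.])+[[:alnum:]_]{2,3}'
}

# The example line from the first post.
printf '%s\n' '0.0.0.0 blocked.host.net #Comment Here' | extract_hosts
# prints: blocked.host.net
```

One thing to watch: the pattern is unanchored and ends in a 2 to 3 character group, so a name with a longer TLD (for example `.info`) would be cut off after three characters.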
The reason is that in v5.0, gravity.sh removes every line that contains an invalid character (# is one of them), no matter where in the line it appears.
  gravity_Blackbody=true
}

total_num=0
parseList() {
  local adlistID="${1}" src="${2}" target="${3}" incorrect_lines
  # This sed does the following things:
  # 1. Remove all domains containing invalid characters. Valid are: a-z, A-Z, 0-9, dot (.), minus (-), underscore (_)
  # 2. Append ,adlistID to every line
  # 3. Ensures there is a newline on the last line
  sed -e "/[^a-zA-Z0-9.\_-]/d;s/$/,${adlistID}/;/.$/a\\" "${src}" >> "${target}"
  # Find (up to) five domains containing invalid characters (see above)
  incorrect_lines="$(sed -e "/[^a-zA-Z0-9.\_-]/!d" "${src}" | head -n 5)"

  local num_lines num_target_lines num_correct_lines num_invalid
  # Get number of lines in source file
  num_lines="$(grep -c "^" "${src}")"
  # Get number of lines in destination file
  num_target_lines="$(grep -c "^" "${target}")"
  num_correct_lines="$(( num_target_lines-total_num ))"
  total_num="$num_target_lines"
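The effect of the first sed expression in parseList() can be reproduced in isolation. This is only a sketch: `drop_invalid` is a made-up wrapper name and the two input lines are invented sample data. Only the sed expression itself is taken from the code quoted above.

```shell
#!/bin/sh
# drop_invalid: hypothetical wrapper around the deletion step of the v5.0 sed call.
# Any line containing a character outside the allowed set is deleted as a whole.
drop_invalid() {
    sed -e '/[^a-zA-Z0-9.\_-]/d'
}

# Sample data (made up): one bare domain, one domain with a trailing comment.
printf '%s\n' 'plain.host.net' 'blocked.host.net #Comment Here' | drop_invalid
# prints: plain.host.net
```

The space and the `#` in the second line are both outside the allowed set, so the whole line is dropped rather than trimmed.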
In v4, gravity.sh handled this differently:
# Parse source files into domains format
gravity_ParseFileIntoDomains() {
  local source="${1}" destination="${2}" firstLine abpFilter

  # Determine if we are parsing a consolidated list
  if [[ "${source}" == "${piholeDir}/${matterAndLight}" ]]; then
    # Remove comments and print only the domain name
    # Most of the lists downloaded are already in hosts file format but the spacing/formating is not contigious
    # This helps with that and makes it easier to read
    # It also helps with debugging so each stage of the script can be researched more in depth
    # Awk -F splits on given IFS, we grab the right hand side (chops trailing #coments and /'s to grab the domain only.
    # Last awk command takes non-commented lines and if they have 2 fields, take the right field (the domain) and leave
    # the left (IP address), otherwise grab the single field.

    < ${source} awk -F '#' '{print $1}' | \
    awk -F '/' '{print $1}' | \
    awk '($1 !~ /^#/) { if (NF>1) {print $2} else {print $1}}' | \
    sed -nr -e 's/\.{2,}/./g' -e '/\./p' > ${destination}
    return 0
  fi
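To see why commented lines survived in v4, the awk/sed pipeline quoted above can be run on its own. This is a sketch, not the full function: `v4_parse` is a made-up name, the input lines are sample data, and the pipeline reads from stdin instead of `${source}`. The `-r` flag is kept as quoted and assumes a sed that accepts it (GNU sed does).

```shell
#!/bin/sh
# v4_parse: hypothetical wrapper around the v4 pipeline, reading from stdin.
v4_parse() {
    awk -F '#' '{print $1}' | \
    awk -F '/' '{print $1}' | \
    awk '($1 !~ /^#/) { if (NF>1) {print $2} else {print $1}}' | \
    sed -nr -e 's/\.{2,}/./g' -e '/\./p'
}

# Sample data (made up): a comment-only line, a hosts-format line with a
# trailing comment, and a bare domain.
printf '%s\n' \
    '# full-line comment' \
    '0.0.0.0 blocked.host.net #Comment Here' \
    'plain.host.net' | v4_parse
# prints:
# blocked.host.net
# plain.host.net
```

The first awk cuts everything from the `#` onward, so the domain is kept and only the comment is lost.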
The best and easiest solution would be to remove the trailing comments yourself, since you already parse your list anyway.
Yes, that is a simple solution for my list. However, many other public and private lists I use are not part of it yet, and many of those are commented as well. This will work for my list but not for the others, and it may be months before I get everything combined into my DB.
DL6ER
April 13, 2020, 8:03am
6
Whoops, I already wrote a fix for this two weeks ago, but, apparently, I have forgotten to open a PR for it. My bad. Thanks for reminding me (indirectly).
pi-hole:release/v5.0
← pi-hole:tweak/hosts-comments
opened 08:06AM - 13 Apr 20 UTC
**By submitting this pull request, I confirm the following:**
- [X] I have read and understood the [contributors guide](https://github.com/pi-hole/pi-hole/blob/master/CONTRIBUTING.md), as well as this entire template.
- [X] I have made only one major change in my proposed changes.
- [X] I have commented my proposed changes within the code.
- [X] I have tested my proposed changes, and have included unit tests where possible.
- [X] I am willing to help maintain this change if there are issues with it later.
- [X] I give this submission freely and claim no ownership.
- [X] It is compatible with the [EUPL 1.2 license](https://opensource.org/licenses/EUPL-1.1)
- [X] I have squashed any insignificant commits. ([`git rebase`](http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html))
---
**What does this PR aim to accomplish?:**
See title
**How does this PR accomplish the above?:**
Strip comments from downloaded lists instead of discarding lines with comments altogether
**What documentation changes (if any) are needed to support this PR?:**
None
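The idea described in the PR (strip the comment, keep the domain) could look roughly like the following. This is a guess at the approach for illustration only, not the code from the PR: `strip_comments` and `drop_invalid` are made-up names, and the input lines are sample data.

```shell
#!/bin/sh
# strip_comments: hypothetical helper. Removes everything from the first '#'
# (plus any whitespace before it), then drops lines that end up empty.
strip_comments() {
    sed -e 's/[[:space:]]*#.*$//' -e '/^$/d'
}

# drop_invalid: the character filter from the v5.0 code quoted earlier in the thread.
drop_invalid() {
    sed -e '/[^a-zA-Z0-9.\_-]/d'
}

# Sample data (made up). With comments stripped first, the commented domain
# now passes the character filter instead of being discarded.
printf '%s\n' \
    '# header comment' \
    'blocked.host.net #Comment Here' \
    'plain.host.net' | strip_comments | drop_invalid
# prints:
# blocked.host.net
# plain.host.net
```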