Faulty regex consumes all memory and causes issues with pihole -q

Nope, marbles are intact! I'll do a short loop over the list as a space delimited array. I'll have a patch up in a few minutes.

Edit: Adding set -x to the top of query.sh is eye opening! :laughing:

1 Like

Phew! :crazy_face:

Okay, seems to work now. Shellcheck doesn't like it but meh...

Branch pushed.

# Scan an array of files for matching strings
scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"

    # Prevent grep from printing file path
    cd "$piholeDir" || exit 1

    # Prevent grep -i matching slowly: http://bit.ly/2xFXtUX
    export LC_CTYPE=C

    # /dev/null forces filename to be printed when only one list has been generated
    case "${type}" in
        "exact" ) grep -i -E -l "(^|(?<!#)\\s)${esc_domain}($|\\s|#)" ${lists} /dev/null 2>/dev/null;;
        # Iterate through each regexp and check whether it matches the domainQuery
        # If it does, print the matching regexp and continue looping
        # Input 1 - regexps | Input 2 - domainQuery
        "regex" ) 
            for list in `echo "${lists}"`; do
                if [[ "${domain}" =~ ${list} ]]; then
                    printf "%b\n" "${list}";
                fi
            done;;
        *       ) grep -i "${esc_domain}" ${lists} /dev/null 2>/dev/null;;
    esac
}

With the fix both domains are found within regex blacklist.

nanopi@nanopi:~$ pihole -q ads.test.bbc.co.uk
 Match found in regex blacklist
   ^(.+[.-])?ad[sxv]?[0-9]*[.-]
   (^|[-_.]+)(m?a(d((vert(s|is(ing|e?ments?))?|im(age|g)s?)|([vx]|s(e?rv(er?|ice)?s?)?)|track(ers?|ing))?|ff(iliat(es?|ion))?|nalytics?)|b((anner|eacon)s?)|count(ers?)?|pixels?|stat(s|istics?)?|t(elemetry|ra(ffic|ck(ers?|ing))))[-_]*[0-9]*[-_.]
   ^(.+[_.-])?ad[sxv]?[0-9]*[_.-]
nanopi@nanopi:~$ pihole -q analytics.test.co
 Match found in regex blacklist
   ^analytics?[.-]
   (^|[-_.]+)(m?a(d((vert(s|is(ing|e?ments?))?|im(age|g)s?)|([vx]|s(e?rv(er?|ice)?s?)?)|track(ers?|ing))?|ff(iliat(es?|ion))?|nalytics?)|b((anner|eacon)s?)|count(ers?)?|pixels?|stat(s|istics?)?|t(elemetry|ra(ffic|ck(ers?|ing))))[-_]*[0-9]*[-_.]

What is the expected output?

yubiuser, can you get me a debug token so I can match your exact lists?

The output is correct. I was testing the domains that @mmotti was trying before.

Edit This post was solely a confirmation that the changes you did with the last update fixed the missing results @mmotti was seeing

Seems to work as expected and it doesn't crap out or seem to increase the time taken to execute or find a result.

Could I make a little suggestion?

# Scan an array of files for matching strings
scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"

    # Prevent grep from printing file path
    cd "$piholeDir" || exit 1

    # Prevent grep -i matching slowly: http://bit.ly/2xFXtUX
    export LC_CTYPE=C

    # /dev/null forces filename to be printed when only one list has been generated
    case "${type}" in
        "exact" ) grep -i -E -l "(^|(?<!#)\\s)${esc_domain}($|\\s|#)" "${lists}" /dev/null 2>/dev/null;;
        # Iterate through each regexp and check whether it matches the domainQuery
        # If it does, print the matching regexp and continue looping
        # Input 1 - regexps | Input 2 - domainQuery
         "regex" ) 
            for regexp in "${lists[@]}"; do
                if [[ "${domain}" =~ ${regexp} ]]; then
                    printf "%b\n" "${regexp}";
                fi
            done;;
        *       ) grep -i "${esc_domain}" "${lists}" /dev/null 2>/dev/null;;
    esac
}

It's a little cleaner than the old ` quotes.

If that is done, the following could be removed:

str_regexList=$(printf '%s\n' "${regexList[@]}")

and change

 mapfile -t regexMatches < <(scanList "${domain}" "${str_regexList}" "regex")

to

 mapfile -t regexMatches < <(scanList "${domain}" "$regexList" "regex")

I haven't been able to test this on my machine, but it would be cleaner to pass the array straight through, and then just iterate through the mapfile array that we already have.

Oh, thanks then! I thought there was something missing or too many hits were being printed.

That only works if lists is already an array. It's not, it has to be built. We can do that and then allow for iteration over the lists[@] but you can't take lists and just consider it to be an array already.

Edit: If the array would be used in multiple places within the function then I think a one time ${2} to an array would be worth it. Just have lists be an array from the start.

Does the mapfile not count as an array? It seems to iterate ok for me?

This could also make sense. However I think that for the regex portion only, it is passed as a mapfile array that is iterable.

Where is the mapfile being given to the scanList function? I don't see that.

scanRegexDatabaseTable() {
    local domain list
    domain="${1}"
    list="${2}"
    type="${3:-}"

    # Query all regex from the corresponding database tables
    mapfile -t regexList < <(sqlite3 "${gravityDBfile}" "SELECT domain FROM domainlist WHERE type = ${type}" 2> /dev/null)

    # If we have regexps to process
    if [[ "${#regexList[@]}" -ne 0 ]]; then
        # Split regexps over a new line
        mapfile -t regexMatches < <(scanList "${domain}" "$regexList" "regex")

This is with part of a tweak I made (as mentioned in my post above) to remove turning it to a multiline string and just pass the mapfile straight to it.

# Scan an array of files for matching strings
scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"
        echo "DEBUG: This is lists[@]"
    for regexp in "${lists[@]}"; do
        echo "DEBUG: regexp ${regexp}"
        echo "DEBUG: -------"
    done
    echo "GUBED"

    # Prevent grep from printing file path
    cd "$piholeDir" || exit 1

    # Prevent grep -i matching slowly: http://bit.ly/2xFXtUX
    export LC_CTYPE=C

    # /dev/null forces filename to be printed when only one list has been generated
    case "${type}" in
        "exact" ) grep -i -E -l "(^|(?<!#)\\s)${esc_domain}($|\\s|#)" ${lists} /dev/null 2>/dev/null;;
        # Create array of regexps
        # Iterate through each regexp and check whether it matches the domainQuery
        # If it does, print the matching regexp and continue looping
        # Input 1 - regexps | Input 2 - domainQuery
        "regex" ) 
            for list in `echo "${lists}"`; do
                if [[ "${domain}" =~ ${list} ]]; then
                    printf "%b\n" "${list}";
                fi
            done;;
        *       ) grep -i "${esc_domain}" ${lists} /dev/null 2>/dev/null;;
    esac
}
pihole -q analytics.test.com
 Match found in regex whitelist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.services\.generalmagic\.com$
   DEBUG: -------
   GUBED
 Match found in regex blacklist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.metric\.gstatic\.com$
   ^analytics?[_.-]
   DEBUG: -------
   GUBED
   ^analytics?[_.-]
 Match found in regex whitelist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.services\.generalmagic\.com$
   DEBUG: -------
   GUBED
   DEBUG2: This is echo (\.|^)*\.services\.generalmagic\.com$
   DEBUG2: regexp (\.|^)*\.services\.generalmagic\.com$
   DEBUG2: -------
   GUBED
 Match found in regex blacklist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.metric\.gstatic\.com$
   ^analytics?[_.-]
   DEBUG: -------
   GUBED
   DEBUG2: This is echo (\.|^)*\.metric\.gstatic\.com$ ^analytics?[_.-]
   DEBUG2: regexp (\.|^)*\.metric\.gstatic\.com$
   DEBUG2: -------
   DEBUG2: regexp ^analytics?[_.-]
   DEBUG2: -------
   GUBED
   ^analytics?[_.-]
scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"
        echo "DEBUG: This is lists[@]"
    for regexp in "${lists[@]}"; do
        echo "DEBUG: regexp ${regexp}"
        echo "DEBUG: -------"
    done
    echo "GUBED"

        echo "DEBUG2: This is echo ${lists}""
    for regexp in `echo "${lists}"`; do
        echo "DEBUG2: regexp ${regexp}"
        echo "DEBUG2: -------"
    done
    echo "GUBED"

Edit: If there is a better option than backticks I welcome it heartily! I only used backticks as a last resort.

Two secs - I am looking at this. I forgot it's /opt/pihole that actually houses the query script.

Edit: I stand corrected. Bash is weird. @DanSchaper only tagging you as this is an edit as opposed to a new post.

So, the current idea is to convert the mapfile to a multiline string and pass that to scanList, which will then use a while loop instead of the for loop with the legacy quotes.

Current issues:
For some reason the printf %s\n is ignored.

mmotti@ubuntu-server:~$ pihole -q analytics.ads.test.com
 Match found in regex blacklist
   ^(.+[_.-])?ad[sxv]?[0-9]*[_.-] ^analytics?[_.-]

scanList

# Scan an array of files for matching strings
scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"

    # Prevent grep from printing file path
    cd "$piholeDir" || exit 1

    # Prevent grep -i matching slowly: http://bit.ly/2xFXtUX
    export LC_CTYPE=C

    # /dev/null forces filename to be printed when only one list has been generated
    case "${type}" in
        "exact" ) grep -i -E -l "(^|(?<!#)\\s)${esc_domain}($|\\s|#)" "${lists}" /dev/null 2>/dev/null;;
        # Iterate through each regexp and check whether it matches the domainQuery
        # If it does, print the matching regexp and continue looping
        # Input 1 - regexps | Input 2 - domainQuery
        "regex" ) \
            while IFS= read -r regexp; do
                if [[ "${domain}" =~ ${regexp} ]]; then
                    printf '%s\n' "${regexp}"
                fi
            done < <(echo "$lists");;
        *       ) grep -i "${esc_domain}" "${lists}" /dev/null 2>/dev/null;;
    esac
}

scanRegexDatabaseTable

scanRegexDatabaseTable() {
    local domain list
    domain="${1}"
    list="${2}"
    type="${3:-}"

    # Query all regex from the corresponding database tables
    mapfile -t regexList < <(sqlite3 "${gravityDBfile}" "SELECT domain FROM domainlist WHERE type = ${type}" 2> /dev/null)

    # If we have regexps to process
    if [[ "${#regexList[@]}" -ne 0 ]]; then
	# Split regexps over a new line
        str_regexList=$(printf '%s\n' "${regexList[@]}")
        # Check domain against regexps
        mapfile -t regexMatches < <(scanList "${domain}" "${str_regexList}" "regex")

Thanks! I think I have a simple solution?
IFS is set to newline, iterate over the lists that is newline delimited.

scanList(){
    # Escape full stops
    local domain="${1}" esc_domain="${1//./\\.}" lists="${2}" type="${3:-}"
        echo "DEBUG: This is lists[@]"
    IFS=$'\n'
    for regexp in ${lists}; do
        echo "DEBUG: regexp ${regexp}"
        echo "DEBUG: -------"
    done
    echo "GUBED"

        echo "DEBUG2: This is echo ${lists}"
    for regexp in `echo "${lists}"`; do
        echo "DEBUG2: regexp ${regexp}"
        echo "DEBUG2: -------"
    done
    echo "GUBED"
pihole -q analytics.test.com
 Match found in regex whitelist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.services\.generalmagic\.com$
   DEBUG: -------
   GUBED
   DEBUG2: This is echo (\.|^)*\.services\.generalmagic\.com$
   DEBUG2: regexp (\.|^)*\.services\.generalmagic\.com$
   DEBUG2: -------
   GUBED
 Match found in regex blacklist
   DEBUG: This is lists[@]
   DEBUG: regexp (\.|^)*\.metric\.gstatic\.com$
   DEBUG: -------
   DEBUG: regexp ^analytics?[_.-]
   DEBUG: -------
   GUBED
   DEBUG2: This is echo (\.|^)*\.metric\.gstatic\.com$
   ^analytics?[_.-]
   DEBUG2: regexp (\.|^)*\.metric\.gstatic\.com$
   DEBUG2: -------
   DEBUG2: regexp ^analytics?[_.-]
   DEBUG2: -------
   GUBED
   ^analytics?[_.-]

Edit: Might not even need to adjust IFS in this case, just use the unquoted lists variable.

I haven't tested this, but if it works, it could be interesting? My preference (if it were my code), would be the while loop to iterate over the list though.

Also - I'm not sure how much of an issue it is, but if changing the IFS, most examples of people I have seen playing around with it take the starting value, change it, and then change it back to the starting value on completion. Might be worth tweaking slightly if you are taking this approach? :slight_smile:

I don't think we even need to adjust IFS now.

I trust your judgement. Bash is bloody weird. I remember entirely why I gave up with it now lol