Garbage response times for many (almost half at times) CNAME answers

Indeed. The way I tried to find these cases is by looking for reply or validation lines and check if there is an earlier query[] line with identical ID and client. If yes, all good. If no, i.e. either the ID or the client doesn't match, all bad.

The script below prints out all log lines with the client for which IDs seem to change, and also prints out all lines containing the offending IDs. There are 197 of such cases among >32'000 queries in my original log file that i uploaded to tricorder (should have expired by now).

There are no cases (no output other than the counter) in today's log file among >15'000 queries. I lost yesterday's to the update, forgot that it overwrites the dnsmasq file again and it didn't have the log-queries=extra anymore. :sweat_smile:

As far as I can tell, the issue is fixed. Or at least the symptoms I could identify.

Thank you, as always, for tracking down the issue and creating a fix so quickly. :heart: :+1:

Script
#!/bin/bash
shopt -s nocasematch

log_file="/home/dax/pihole.old.log"

IP_Port_combos=$(awk '{if(($6 ~ /^[0-9a-f:./]+$/)){print $6}}' ${log_file} | sort -u)
num_combos=$(echo "${IP_Port_combos}" | wc -l)

counter=1
for client in ${IP_Port_combos};
do
        echo -en "\r${counter}/${num_combos}"

        log_lines=$(grep "${client}" "${log_file}")

        # If the current line indicates an incoming query, create an array entry at the index of the ID
        # and the value of the client (IP/Port).
        # If the current line indicates a reply, check the array at index ID for the client value.
        # 1. If the current client is the same as the array value: good
        # 2. If the current client differs from the array value: bad
        result=$(echo "${log_lines}" | awk '
                BEGIN {error=0;array["init"]=0;errIDs=""};
                {
                        queryIDs = queryIDs "\n" $5

                        if($7 ~ /query.*/){
                                array[$5] = $6;
                        }
                        if($7 ~ /reply|validation/ && array[$5] != $6){
                                error = 1;
                                array[$5] = $6;
                        }
                };
                END {
                        if(error == 1){
                                print queryIDs
                        }else{
                                print "ok"
                        }
                }
        ')
        if [ "${result}" != "ok" ]; then
                echo ""
                echo "#####"
                echo "Problematic Client: ${client}"
                echo "${log_lines}"
                for id in $(echo "${result}" | sort -u);
                do
                        echo "Relevant ID: ${id}"
                        grep ": ${id} " "${log_file}"
                done
        fi

        counter=$((counter+1))
done
Sample output for my first log that I linked above
#####
Problematic Client: fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 query[A] aws.duplex.snapchat.com from fddc:704f:99b9:178:a8:d461:21cd:4bb9
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 cached aws.duplex.snapchat.com is <CNAME>
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 1.1.1.1
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 1.1.1.1
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 2606:4700:4700::1111
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 validation aws.duplex.snapchat.com is BOGUS
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 reply error is SERVFAIL
Relevant ID: 11877
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 query[A] aws.duplex.snapchat.com from fddc:704f:99b9:178:a8:d461:21cd:4bb9
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 cached aws.duplex.snapchat.com is <CNAME>
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 1.1.1.1
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 1.1.1.1
Oct 14 07:03:41 dnsmasq[29757]: 11877 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 forwarded aws.duplex.snapchat.com to 2606:4700:4700::1111
Relevant ID: 11882
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/58989 query[AAAA] us-east4-gcp.api.snapchat.com from fddc:704f:99b9:178:a8:d461:21cd:4bb9
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/58989 cached us-east4-gcp.api.snapchat.com is <CNAME>
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/58989 cached us-east4-gcp.api.sc-gw.com is NODATA-IPv6
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 validation aws.duplex.snapchat.com is BOGUS
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64649 reply error is SERVFAIL
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64522 validation aws.duplex.snapchat.com is BOGUS
Oct 14 07:03:41 dnsmasq[29757]: 11882 fddc:704f:99b9:178:a8:d461:21cd:4bb9/64522 reply error is SERVFAIL