FATAL: Trying to access query ID -1 errors in pihole-FTL.log

Expected Behaviour:

No errors in the pihole-FTL.log

Actual Behaviour:

Firstly, my pihole appears to be operating normally. Satisfying both DHCP and DNS requests. I don't often look at the logs, but on doing so today I see many of these:

[2021-05-06 06:00:20.041 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:20.042 28816M]        found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:20.042 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:21.455 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:21.455 28816M]        found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:21.455 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:23.187 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:23.188 28816M]        found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:23.188 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:23.327 28816M] FATAL: Trying to access query ID -1, but maximum is 32768

If have searched the forums and found similar items but nothing specifically matching this condition. The 'maximum' changes and I have seen: 36864, 40960 & 32768, seemingly changing after FTL restart or perhaps pi reboot.

I have ensured that all /dev/shm/FTL-* seem to have the proper permissions. I am also on the latest versions:
Pi-hole v5.3.1
Web Interface v5.5
FTL v5.8.1
I am running unbound as my upstream resolver, have built and am running the latest lighttpd (to gain improvements that the repo stretch version does no offer) and I am running doh-proxy to offer up DoH, but otherwise my set up is fairly 'stock'.

Debug Token:

https://tricorder.pi-hole.net/2amhajbm3n

I did a bit of looking at the code in:FTL/dnsmasq_interface.c at b90ab8b1e78d7aece11c8fee1c3c00817761de52 · pi-hole/FTL · GitHub. I wonder if there is a missing return after lline 2177. Not knowing the full use case here, but it seems that if queryID is < 0, in this case I think -1, and the SHM is unlocked, it does it make sense to continue?

Off the top of my head it looks like we could hit unlock_shm() at line 2177 and then try to call unlock_shm() again at line 2204.

The history of that function shows that there was a return there previously. The function was affected by Update embedded dnsmasq to v2.85 by DL6ER · Pull Request #1071 · pi-hole/FTL · GitHub with the specific return removed at Update embedded dnsmasq to v2.85 by DL6ER · Pull Request #1071 · pi-hole/FTL · GitHub

Would there be any plans to fix this? It's certainly making for a larger log file and seemingly unnecessary disk activity.

First we'd need to see if that's the actual issue.

My apologies on behalf of the entire Pi-hole team on not resolving your issue within 24 hours.

:slight_smile: No worries. I hadn't expected it to be that quick. It's not causing any issue that I can see, so far. I'm happy to test. I don't think building and deploying locally is as simple as it is for other GitHubs, is it?

Building FTL is actually pretty simple and we do have directions if you'd like to give it a go.

https://docs.pi-hole.net/ftldns/compile/

We don't currently have a docker build image with the dependencies included but that idea has been brought up as something that could help users and contributors.

I don't use Docker anyway. I already build many things for my pi (lighttpd and unbound to name a few). I'll look at the instructions and maybe give it a go, dropping a 'return' in after the unlock_shm().

1 Like

I have successfully patched, built and deployed FTL from the github repo. It's been running for a few hours with no errors. After reviewing dnsmasq_interface more and seeing similar code blocks with the return after unlock_shm(), I am fairly confident this patch is needed:

diff --git a/src/dnsmasq_interface.c b/src/dnsmasq_interface.c
index d14b3e18..934d5fcc 100644
--- a/src/dnsmasq_interface.c
+++ b/src/dnsmasq_interface.c
@@ -2175,6 +2175,7 @@ void FTL_query_in_progress(const int id)
        {
                // This may happen e.g. if the original query was an unhandled query type
                unlock_shm();
+               return;
        }
 
        // Get query pointer
1 Like

Would you like to submit a PR with your fix to the FTL repo?

https://github.com/pi-hole/FTL/pulls

OK, I forked and created PR. I'm not that versed in doing them, so I apologize ahead of time for any clumsiness. It is located at: Development by tomporter518 · Pull Request #1130 · pi-hole/FTL · GitHub

1 Like

Yes.

Yes. The history of this function is that is was removed altogether and later re-added. In the last step, apparently, the return fell over while it shouldn't.

With v2.84, we had the in-progress status because new queries for a domain+type that were already forwarded were just added to a list of recipients what would receive the reply once upstream returned. This had the side-effect that you could loose DNS resolution is some edge-cases where upstream replies just never arrive. In this case, no new forwarding ever happened and nobody ever got a reply.

Hence, the proposed behavior with v2.85 was to always re-trigger a submission upstream, even when domain+type was already forwarded. This obsoleted the in-progress status and, hence, the function was removed from FTL. Later on, before v2.85 was finalized, discussions on the mailing list lead to a modification that was a compromise I proposed: Don't re-trigger a forwarding for known domain+type immediately, but only after a short amount of time (I wanted to have this configurable, Simon hard-coded it to 2 seconds). This effectively re-introduced in-progress and I reverted the removal of the function from FTL's code. Apparently, this return slipped through.
So @tomporter518's analysis was spot on.

Fortunately, this is entirely harmless because the omitted return here always triggers the next fail check so the worst thing happening here is that there are these two error messages in the log.


Thank you. There are only three small things:

  1. The PR should have a meaningful title (currently it is just Development). Something like Re-add missing return seems appropriate.
  2. There is a complaint by DCO bot. It gives you instructions how to fix it when you follow the link.
  3. We should squash-merge this PR to avoid adding the merge commit to our repo (you don't have to worry about this, we'll do it)

I have renamed the PR as suggested. I assume I was focused on figuring out how to do one, I didn't see that the name was not very useful.

For the second item, I have tried to follow the bot's instructions but the last instruction:

git push --force-with-lease origin development

gives me errors:

remote: Permission to pi-hole/FTL.git denied to tomporter518.
fatal: unable to access 'https://github.com/pi-hole/FTL.git/': The requested URL returned error: 403

I would take this to mean I don't have access to do the push (which I would expect). I'm not sure how to proceed. I'm not very git-literate ( I'm from the CVS/SVN days :slight_smile: ).

All good, we all started one say and continued from there. It should have still worked because you managed to push the branch before (to your own repo from which the fork comes into ours).

What does

git remote
git remote get-url --all origin
git branch -r

reveal? I'd expect this to point to your repo (tomporter518/FTL) not ours (pi-hole/FTL).

The outputs are:

pi@gyre:~/Projects/FTL $ git remote
origin
pi@gyre:~/Projects/FTL $ git remote get-url --all origin
https://github.com/pi-hole/FTL.git
pi@gyre:~/Projects/FTL $ git branch -r
  origin/HEAD -> origin/master
  origin/development
  origin/fix/cancel_api_threads
  origin/fix/fork-resizing-hangs
  origin/fix/lock_gravityClose
  origin/master
  origin/new/GeoIP
  origin/new/codeql
  origin/new/http
  origin/new/regex_replytype
  origin/new/reply_pihole_hostname
  origin/quickfix/5.8
  origin/release/v5.0
  origin/release/v5.1
  origin/release/v5.1.1
  origin/release/v5.2
  origin/release/v5.3
  origin/release/v5.3.1
  origin/release/v5.3.2
  origin/release/v5.3.3
  origin/release/v5.4
  origin/release/v5.5
  origin/release/v5.6
  origin/release/v5.7
  origin/release/v5.8
  origin/special/CI_development
  origin/special/umbrella
  origin/tweak/audit_statement_and_all_the_things
  origin/tweak/delay_earlier
  origin/tweak/incorrect_hwclock_warning
  origin/tweak/listener_logging
  origin/tweak/more_listening_logging
  origin/tweak/moveloglocation
  origin/update/dnsmasq
  origin/update/dnsmasq-v2.84
  origin/update/dnsmasq-v2.86

I was in pi-hole/FTL because when I tried those git commands in my fork, I get:

fatal: Couldn't find remote ref pull/1130/head

I feel like I may have borked things up a bit, due to my lack of experience. :confused:

In your fork would be correct, what exact command lead to

Was it the git push --force-with-lease origin development ?

If so, try simply

git push -f

That fatal error is on: git fetch origin pull/1130/head:re-add_missing_return when I am in my fork.

I think I may have done the initial PR wrong. When looking at my fork, I do NOT have any PRs listed. When I did the PR yesterday, I initiated it from the pi-hole repo. That's why I can't pull it in mine, I think though a bit of a guess.

Okay, no need to pull the branch you already have. Try

git commit --amend --no-edit --signoff
git push -f

in your fork in the branch you were editing.

Ok, both of those seemed to work. Sorry to be such a bother for a single line change. Is there anything else that I should do?

Contributions are not bothersome, no need to worry.

1 Like