Firstly, my pihole appears to be operating normally. Satisfying both DHCP and DNS requests. I don't often look at the logs, but on doing so today I see many of these:
[2021-05-06 06:00:20.041 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:20.042 28816M] found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:20.042 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:21.455 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:21.455 28816M] found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:21.455 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:23.187 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
[2021-05-06 06:00:23.188 28816M] found in FTL_query_in_progress() (/root/project/src/dnsmasq_interface.c:2181)
[2021-05-06 06:00:23.188 28816M] Failed to unlock SHM lock: Operation not permitted
[2021-05-06 06:00:23.327 28816M] FATAL: Trying to access query ID -1, but maximum is 32768
If have searched the forums and found similar items but nothing specifically matching this condition. The 'maximum' changes and I have seen: 36864, 40960 & 32768, seemingly changing after FTL restart or perhaps pi reboot.
I have ensured that all /dev/shm/FTL-* seem to have the proper permissions. I am also on the latest versions:
Pi-hole v5.3.1
Web Interface v5.5
FTL v5.8.1
I am running unbound as my upstream resolver, have built and am running the latest lighttpd (to gain improvements that the repo stretch version does no offer) and I am running doh-proxy to offer up DoH, but otherwise my set up is fairly 'stock'.
No worries. I hadn't expected it to be that quick. It's not causing any issue that I can see, so far. I'm happy to test. I don't think building and deploying locally is as simple as it is for other GitHubs, is it?
We don't currently have a docker build image with the dependencies included but that idea has been brought up as something that could help users and contributors.
I don't use Docker anyway. I already build many things for my pi (lighttpd and unbound to name a few). I'll look at the instructions and maybe give it a go, dropping a 'return' in after the unlock_shm().
I have successfully patched, built and deployed FTL from the github repo. It's been running for a few hours with no errors. After reviewing dnsmasq_interface more and seeing similar code blocks with the return after unlock_shm(), I am fairly confident this patch is needed:
diff --git a/src/dnsmasq_interface.c b/src/dnsmasq_interface.c
index d14b3e18..934d5fcc 100644
--- a/src/dnsmasq_interface.c
+++ b/src/dnsmasq_interface.c
@@ -2175,6 +2175,7 @@ void FTL_query_in_progress(const int id)
{
// This may happen e.g. if the original query was an unhandled query type
unlock_shm();
+ return;
}
// Get query pointer
Yes. The history of this function is that is was removed altogether and later re-added. In the last step, apparently, the return fell over while it shouldn't.
With v2.84, we had the in-progress status because new queries for a domain+type that were already forwarded were just added to a list of recipients what would receive the reply once upstream returned. This had the side-effect that you could loose DNS resolution is some edge-cases where upstream replies just never arrive. In this case, no new forwarding ever happened and nobody ever got a reply.
Hence, the proposed behavior with v2.85 was to always re-trigger a submission upstream, even when domain+type was already forwarded. This obsoleted the in-progress status and, hence, the function was removed from FTL. Later on, before v2.85 was finalized, discussions on the mailing list lead to a modification that was a compromise I proposed: Don't re-trigger a forwarding for known domain+type immediately, but only after a short amount of time (I wanted to have this configurable, Simon hard-coded it to 2 seconds). This effectively re-introduced in-progress and I reverted the removal of the function from FTL's code. Apparently, this return slipped through.
So @tomporter518's analysis was spot on.
Fortunately, this is entirely harmless because the omitted return here always triggers the next fail check so the worst thing happening here is that there are these two error messages in the log.
Thank you. There are only three small things:
The PR should have a meaningful title (currently it is just Development). Something like Re-add missing return seems appropriate.
There is a complaint by DCO bot. It gives you instructions how to fix it when you follow the link.
We should squash-merge this PR to avoid adding the merge commit to our repo (you don't have to worry about this, we'll do it)
I have renamed the PR as suggested. I assume I was focused on figuring out how to do one, I didn't see that the name was not very useful.
For the second item, I have tried to follow the bot's instructions but the last instruction:
git push --force-with-lease origin development
gives me errors:
remote: Permission to pi-hole/FTL.git denied to tomporter518.
fatal: unable to access 'https://github.com/pi-hole/FTL.git/': The requested URL returned error: 403
I would take this to mean I don't have access to do the push (which I would expect). I'm not sure how to proceed. I'm not very git-literate ( I'm from the CVS/SVN days ).
All good, we all started one say and continued from there. It should have still worked because you managed to push the branch before (to your own repo from which the fork comes into ours).
That fatal error is on: git fetch origin pull/1130/head:re-add_missing_return when I am in my fork.
I think I may have done the initial PR wrong. When looking at my fork, I do NOT have any PRs listed. When I did the PR yesterday, I initiated it from the pi-hole repo. That's why I can't pull it in mine, I think though a bit of a guess.