Automatically restarting Cloudflared when internet is restored

I had an interesting issue this morning. We had a bad storm move through and we lost internet. I have all my routers and the Pi-hole on a UPS and protected, but the cloudflared daemon would not service requests after the internet was restored. I had to manually stop and restart it. Before I restarted it, I was able to confirm that local DNS was being resolved from cache, but anything that had to go upstream would hit cloudflared and never get a response.

I was wondering if anyone has written a script that tests for a loss of internet on their Pi-hole? I was thinking that if I could detect the loss and restoration of internet, I could then restart the cloudflared service.

Thanks..
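
For anyone landing here with the same question, here is a minimal sketch of the kind of watchdog described above. It is not something posted in this thread. It assumes cloudflared runs as a systemd unit named `cloudflared` and that pinging 1.1.1.1 is an acceptable connectivity probe; the `SERVICE`, `PROBE_HOST` and `INTERVAL` variables and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a connectivity watchdog. Assumptions (not from the thread):
#   - cloudflared runs as a systemd unit named "cloudflared"
#   - 1.1.1.1 is a reasonable host to probe
SERVICE="${SERVICE:-cloudflared}"
PROBE_HOST="${PROBE_HOST:-1.1.1.1}"
INTERVAL="${INTERVAL:-30}"

# Print "up" if the probe host answers one ping within 3 seconds, else "down".
check_internet() {
    if ping -c 1 -W 3 "$PROBE_HOST" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Decide what to do given the previous and current state.
# Only a down -> up transition triggers a restart.
next_action() {
    if [ "$1" = "down" ] && [ "$2" = "up" ]; then
        echo restart
    else
        echo none
    fi
}

# Poll forever; restarting the service needs root.
watch_loop() {
    local prev="up" cur
    while true; do
        cur="$(check_internet)"
        if [ "$(next_action "$prev" "$cur")" = "restart" ]; then
            logger -t net-watchdog "internet restored, restarting $SERVICE"
            systemctl restart "$SERVICE"
        fi
        prev="$cur"
        sleep "$INTERVAL"
    done
}

# Run the loop only when asked to, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    watch_loop
fi
```

Saved under a name of your choosing and started as root with `--run`, this restarts the service once each time the connection comes back, rather than on every poll. A single failed ping counts as "down", so a flaky link may cause extra restarts.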

Did you ever find a solution for this? The only thing keeping me on unbound on pfsense is that it always started working again as soon as internet is restored...

Yes, I moved from the cloudflared daemon to dnscrypt-proxy, which uses DNS over HTTPS to reach the Cloudflare service. Since doing this I have never had to restart dnscrypt-proxy to re-establish the connection to Cloudflare.

My Pi-hole has been hands-off for almost 6 months now and I'm really happy.

Hello Ron. I’m having the same issue you had and tried Googling your recommendation in order to do this myself, but I was unable to find any info. Would you please post a link or provide instructions on how exactly you moved the daemon over? With my current setup, I lose all upstream DNS requests after router reboots. When this happens, the Pi also becomes unresponsive and I’m unable to SSH into the Pi-hole. On the rare occasion that I can, rebooting doesn’t help, even though I have cloudflared configured as a service to start on boot.

Edit: Correction

Sounds like different behavior from what I was experiencing. If you cannot SSH into the Pi-hole after the internet is restored, there may be something else going on. When I lost internet, the cloudflared daemon would no longer connect to Cloudflare's backend DNS over HTTPS service. I was still able to SSH into the Pi-hole and restart services without restarting the Pi-hole itself.

When you SSH into the Pi-hole, are you doing it by IP address or DNS name? If by DNS name, I would try the IP address, since you say you also lose connectivity to the Pi-hole and your symptoms are a bit different.

I have always used the IP address for SSH. Cloudflared has a new stable release that I just upgraded to yesterday. It seems to address some of the issues discussed here and has some optimizations. If it ever locks up again, are there any commands I can run to take a look at what’s going on before the crash once I manage to SSH into it again?

You may want to run a test: pull the internet for a few moments and then restore it while the Pi-hole is running, to make sure it reconnects. Keeping my fingers crossed that the new cloudflared daemon is more stable and reconnects properly.

I usually tail the syslog file, since most services log there. So you may want to SSH into the Pi-hole, start a "tail" of the syslog file, then pull your internet connection and watch the log, then reconnect the internet and keep watching. If the daemon is working properly, you should see it renegotiate its connection to Cloudflare.

Here is the tail command:

tail -f /var/log/syslog
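
On a busy system the syslog can be noisy. As a small sketch building on the command above, the output can be narrowed to lines that mention cloudflared. The `filter_cloudflared` name is made up for illustration, and it assumes the daemon's log lines actually contain the word "cloudflared".

```shell
# Keep only log lines that mention cloudflared (case-insensitive).
# --line-buffered makes grep flush each match immediately, which matters
# when it sits behind "tail -f".
filter_cloudflared() {
    grep -i --line-buffered 'cloudflared'
}

# Usage on the Pi-hole:
#   tail -f /var/log/syslog | filter_cloudflared
```

If the daemon runs under systemd, `journalctl -u cloudflared -f` should show similar messages, again assuming the unit is named `cloudflared`.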

Hello again Ron! Thanks for that command. Will keep that in mind when it happens again. Although I updated cloudflared and things have been stable, I believe my issues were most likely tied to a known Raspberry Pi 4 issue when using the reboot or shutdown command. One of the issues was resolved with a bootloader update, but one is hardware/board specific and can’t actually be “resolved” with the current hardware. Something about the 3v3 voltage completely dropping during the reboot process and hanging in that state, even after the Pi has fully rebooted. This explains why I never had these issues with the very same setup for over a year on the Pi 3B+. It causes problems with fan or SATA HATs and explains my hangs and the unreachable Pi during reboots.

I only reboot the Pi when making network changes and when rebooting the router, so I figured it was an issue with cloudflared or the Pi not acquiring proper DNS connectivity after reboot. I can replicate the issue by simply rebooting the Pi. If I safely shut down, pull the power source for a few seconds, then power the Pi back on, I have no connectivity or hanging issues and haven’t lost connectivity at all. They should have pulled this unit from the shelves and offered a revision immediately, as this was exposed by a user over a year ago and was confirmed. But I guess not enough users use HATs on their Pis for them to react, so I got stuck with a bum unit.

The reason I started looking at a hardware issue was that I had this setup working great, hands-off, for over a year with no issues on the 3B+. Also, I have several SD cards with cloned images of older backups and they all behaved the same. So, of course, I suspected the fan HAT as the cause, since it’s the only hardware that was new to the setup. When researching that, I ran into the articles about the Pi 4 and its voltage issues during reboots.
Either way it goes, I’m sure that command will come in handy for future troubleshooting purposes. I appreciate the help!

Do you have a link or some more information on this? It would be good for the Pi-hole team to know more about and to possibly add additional checks during debugging/troubleshooting.

Hello Dan. I’m almost positive it was a discussion on StackExchange, but now that I’m out of work I’ll try using the same keywords on Google and see if I can track that discussion down for you. I’ll chime back in when I do.

The user reported an issue in which the 3v3 voltage would drop to 0.01 V during the reboot process and that rail would never recover full power after the Pi had completely rebooted. The user was using a SATA HAT and reported this was causing issues with the drives and the SATA HAT in general, usually forcing him to pull the plug on the unit to regain functionality. The same was happening to me during reboots and I’m also using a HAT, a fan HAT to be specific. One of the devs from the Pi org chimed in, was able to replicate the problem, and confirmed that an upcoming EEPROM update would help alleviate the issue, but also admitted that the problem will most likely remain, as they had no fix for the current board.
Something told me to copy the link so that I could forward it to Ron, but it slipped my mind. But I’m on the job, I’ll track that discussion down for you.

Hello Dan! I managed to track the discussion down; it was actually on GitHub. So glad I found it, as it wasn’t in my history anymore. Here’s the link: "Raspberry Pi 4B reboot issue, 3.3V pins go to 0V and stay on reboot with HAT attached" (Issue #3065, raspberrypi/linux on GitHub). I hope this helps you guys out.

Hey Ron, quick question. I’ve been having issues with AWS servers and cloudflared. I ran a traceroute to the problematic host, fanc.tmsimg.com, and it looks like the AWS servers get “confused” by the upstream server or the Pi-hole, which causes huge latency. I’m seeing 500+ms response times to AWS servers in Singapore. Is there a way to force cloudflared to use US-based servers? Images/thumbnails on Netflix, Channel’s App, Amazon Video, etc. are very slow to load. The documentation on cloudflared is soooo limited. Couldn’t even find a forum.

Sorry to say, I never tried to force the cloudflared service to a particular region. My understanding is that many of their services automatically pick regional DNS servers via anycast, but I'm not sure how that works for DNS over HTTPS.
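
One way to at least see which Cloudflare data center anycast is handing you is the trace page Cloudflare serves at `/cdn-cgi/trace`, which includes a `colo=` line with an airport-style code. The sketch below only parses that output; `parse_colo` is a made-up helper name, and it assumes cloudflared is pointed at 1.1.1.1 and that the Pi reaches the same data center for DNS over HTTPS as it does for this page.

```shell
# Extract the value of the "colo=" line from Cloudflare trace output on stdin.
parse_colo() {
    awk -F= '$1 == "colo" { print $2 }'
}

# Usage on the Pi-hole (needs curl and a working connection):
#   curl -s https://1.1.1.1/cdn-cgi/trace | parse_colo
```

If the code that comes back is for a location on another continent, that would be consistent with the Singapore latencies described above. This only shows where requests land; it does not change the routing.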

Thanks Ron. I just ended up deleting Cloudflared and installing Unbound. Being that Cloudflare was using servers abroad, it was causing issues with AWS servers. All apps/clients network-wide were affected by this. Images/Thumbnails and other static assets were taking 400-500ms to load. Try running a traceroute to fanc.tmsimg.com and check your results. Since switching to Unbound I’m seeing <1ms response times and 2-3ms response times on cold hits.

Edit: I have to add that I’m a little worried about Unbound, as there is a known exploit (https://nlnetlabs.nl/downloads/unbound/CVE-2019-18934.txt) and I’m not sure if the Debian package has been updated yet.
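
For comparing resolver response times like the ones quoted above, `dig` prints a `Query time` line that can be pulled out with a one-line helper. This is a sketch only: `query_time_ms` is a made-up name, and the usage line assumes it is run on the Pi-hole itself so that 127.0.0.1 is the resolver being measured.

```shell
# Print the query time in milliseconds from dig output on stdin.
# dig reports it as a line of the form ";; Query time: 23 msec".
query_time_ms() {
    awk '/Query time:/ { print $4 }'
}

# Usage on the Pi-hole; run it twice to compare a cold lookup with a cached one:
#   dig @127.0.0.1 fanc.tmsimg.com | query_time_ms
```

Note that this measures the DNS lookup only. How long the image itself takes to download depends on which server the returned address points at, which is what the traceroute looks at.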

Geolocation of your IP address is key to reaching the closest geographic zone hosting the DNS. Yes, sometimes anycast connects you to the wrong host, which can cause latency issues. Even worse, I don't know how they handle Amazon services, which could also be spread across zones.

Also, CDNs "map" their distribution centers directly to the nearest anycast servers to try to keep latency down. There may be an issue with the mapping your CDN provider has set up for Cloudflare. The only way to resolve this is usually to reach out to the CDN's tech support. I had this issue with one of our DNS providers and Akamai as we migrated video onto their platform. West Coast users in the US were being sent to a Dallas, TX CDN node, which caused latency issues as well. It took about a week for Akamai to resolve it with our provider.