Thank you, that will help with users searching the forums.
Regex is a very powerful tool and can be used to customize the user experience. It's not easy to validate regex but I agree that having some kind of check would be helpful. In this case it would not have applied since the input was for wildcard values and should not have contained any regex values at all. This particular issue could be helped with checking for input that should only contain chars found in domain names. Opening a bug report on our core or adminlte repositories would be helpful in tracking that specific case.
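To illustrate the kind of check meant here, a minimal sketch in bash. The function name `is_plain_domain` is made up for this example and is not part of the core or AdminLTE code. It assumes that only letters, digits, hyphens and dots are acceptable, and it does not enforce the 63-character limit per label.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from the Pi-hole code base.
# Accepts only labels made of letters, digits and hyphens, separated by dots.
is_plain_domain() {
    local domain=$1
    local label='[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?'
    local pattern="^${label}(\.${label})*$"
    # A full hostname may not exceed 253 characters.
    (( ${#domain} <= 253 )) || return 1
    # The pattern stays unquoted so bash treats it as a regex.
    [[ $domain =~ $pattern ]]
}

for input in 'test.domain' '*.services.generalmagic.com' '(\.|^)example\.com$'; do
    if is_plain_domain "$input"; then
        printf 'accepted: %s\n' "$input"
    else
        printf 'rejected: %s\n' "$input"
    fi
done
```

With a check like this, anything containing regex metacharacters would be rejected before it could be stored as a wildcard entry.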
I selected "Type: Wildcard Whitelist" but it appeared as "Regex Whitelist". Not sure if it should be like this - it is automatically converted to a regex style. See picture for test.domain - it was added as wildcard whitelist but appears as Regex Whitelist.
As mentioned in the other thread, this has been in the works for quite some time but will likely not make it into v5.0, so as not to push back the release unnecessarily.
Thanks for this, however, we will simply reuse the inbuilt PHP domain validator we already used before. It also checks for the maximum length of subdomains, etc.
I think the glitch may be in something else. As noted on the GitHub issue, calling awk directly doesn't cause a crash for me, and when I use the shell instead everything seems to be okay. I don't think there actually is a problem with using a * in a wildcard entry, as that will always be the first char and will always end up as (\.|^)*. I think that's okay, it just means "zero or more dots or starts of line".
Checking with a small script and using shell regex instead of awk looks hopeful?
```
#!/usr/bin/env bash
domains=(gstatic.com generalmagic.com services.generalmagic.com me.services.generalmagic.com)
regex="(\.|^)*\.services\.generalmagic\.com$"
# Quote the expansions so the * in the regex is never glob-expanded.
for domain in "${domains[@]}"; do
    # $regex must stay unquoted on the right of =~ to be treated as a regex.
    if [[ $domain =~ $regex ]]; then
        printf "Found %s in %b.\n" "$domain" "$regex"
    else
        printf "Did not find %s in %b.\n" "$domain" "$regex"
    fi
done
```
OUT:
```
Did not find gstatic.com in (\.|^)*\.services\.generalmagic\.com$.
Did not find generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
Did not find services.generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
Found me.services.generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
```
Edit:
For completeness and ease of discussion, here's the awk:
I can try to carve out a very small RAM LXC container to check for memory starvation; 1G is the minimum on DigitalOcean and that may not show what we are looking for.
Edit: Tried a 1G RAM DigitalOcean droplet and I can't get it to crash or hang at all. I'll need something to reproduce this with before I can narrow it down.
Hmm - To be completely honest, I'm not sure. My dealings with awk are quite limited and I only started playing around with it for this purpose as it seemed to be much quicker than the alternatives when it came to a significant number of regexps. Particularly running grep in a loop for each regexp and checking the return code, as grep doesn't seem to have the ability to display the pattern that caused a match.
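For reference, the grep-in-a-loop alternative could look roughly like this. It is only a sketch of the idea: the function name `matching_patterns` and the two example patterns are made up for illustration. It starts one grep process per pattern, which is why it gets slow with many regexps.

```bash
#!/usr/bin/env bash
# Print every regex from the argument list that matches the given domain.
# Slow by design: one grep process per pattern.
matching_patterns() {
    local domain=$1
    shift
    local pattern
    for pattern in "$@"; do
        # -E: extended regex, -q: quiet, only the exit status matters.
        if printf '%s\n' "$domain" | grep -Eq -- "$pattern"; then
            printf '%s\n' "$pattern"
        fi
    done
}

matching_patterns 'me.services.generalmagic.com' \
    '(\.|^)gstatic\.com$' \
    '(\.|^)generalmagic\.com$'
```

Because the loop knows which pattern it is currently testing, it can print that pattern itself, which works around grep not reporting it.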
(\.|^)* seems like particularly bad regex, and I wonder if it could be classed as 'evil regex', or regex denial of service. The issue is likely the caret used within a group (not great practice anyway), further complicated by the * quantifier. It could be stuck checking a huge number of possibilities for each domain and thus dying?
I'm away from my laptop at the moment, but does grep or the more generic bash approach have the same outcome when compared against the same number of domains? Is the performance as good running a for loop with bash? Assuming the domain count for the people experiencing issues to be in the millions but haven't read every detail of the thread just yet.
Sadly I'm not sure what to suggest here, particularly with awk. The problem with regex generally is that as long as the pattern is technically compilable, the processor (awk in this case) is at the mercy of the user.
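One way to keep the quantifier out of the generated pattern would be to strip the leading "*." before building the regex. The sketch below assumes the wildcard conversion simply escapes the dots, prefixes (\.|^) and appends $, as the patterns quoted in this thread suggest; `wildcard_to_regex` is a made-up name, not the actual Pi-hole implementation.

```bash
#!/usr/bin/env bash
# Hypothetical conversion, assumed from the patterns quoted in this thread;
# it is not the actual Pi-hole implementation.
wildcard_to_regex() {
    local entry=$1
    # Drop a leading "*." so no quantifier ends up after the group.
    entry=${entry#\*.}
    # Escape the dots so they only match literal dots.
    entry=${entry//./\\.}
    printf '(\\.|^)%s$\n' "$entry"
}

wildcard_to_regex '*.services.generalmagic.com'
wildcard_to_regex 'test.domain'
```

Note that the result is not strictly equivalent: the pattern without the quantifier also matches services.generalmagic.com itself, while (\.|^)*\.services\.generalmagic\.com$ only matches its subdomains.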
With the script everything is fine, running just ~1 sec.
nanopi@nanopi:~$ ./test
Did not find gstatic.com in (\.|^)*\.services\.generalmagic\.com$.
Did not find generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
Did not find services.generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
Found me.services.generalmagic.com in (\.|^)*\.services\.generalmagic\.com$.
But the awk got killed again due to high memory consumption.
As it seems to come down to awk, I checked what was installed:
nanopi@nanopi:~$ apt -s install awk
NOTE: This is only a simulation!
apt needs root privileges for real execution.
Keep also in mind that locking is deactivated,
so don't depend on the relevance to the real current situation!
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package awk is a virtual package provided by:
original-awk:armhf 2012-12-20-6
mawk:armhf 1.3.3-17+b3
gawk:armhf 1:4.2.1+dfsg-1
original-awk 2012-12-20-6
mawk 1.3.3-17+b3
gawk 1:4.2.1+dfsg-1
You should explicitly select one to install.
nanopi@nanopi:~$ sudo apt-show-versions awk
awk not installed (not available)
nanopi@nanopi:~$ apt-show-versions mawk
mawk:arm64 1.3.3-17+b3 installed: No available version in archive
nanopi@nanopi:~$ apt-show-versions gawk
gawk not installed (not available)
nanopi@nanopi:~$ sudo apt install gawk
nanopi@nanopi:~$ gawk 'NR==FNR{regexps[$0];next}{for (r in regexps)if($0 ~ r)print r}' <(echo "(\.|^)*\.services\.generalmagic\.com$") <(echo "me.services.generalmagic.com")
(\.|^)*\.services\.generalmagic\.com$
nanopi@nanopi:~$ pihole -q gstatic
Match found in exact whitelist
fonts.gstatic.com
Match found in https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts:
csi.gstatic.com
Match found in https://hosts-file.net/ad_servers.txt:
csi.gstatic.com
p2-aahhyknavsj2m-wtnlrzkba6lht33q-if-v6exp3-v4.metric.gstatic.com
p2-f6rp6piuxns4u-uzq4vp76bu3w2tso-if-v6exp3-v4.metric.gstatic.com
p2-n3zurhre4jjvk-can5rb2f2a4urcxh-if-v6exp3-v4.metric.gstatic.com
p4-ajvwyt5lpjazy-us7r2dzqcjsqh7pt-164149-i1-v6exp3-v4.metric.gstatic.com
p4-ajvwyt5lpjazy-us7r2dzqcjsqh7pt-164149-i2-v6exp3-ds.metric.gstatic.com
p4-ajvwyt5lpjazy-us7r2dzqcjsqh7pt-164149-s1-v6exp3-v4.metric.gstatic.com
p4-ajvwyt5lpjazy-us7r2dzqcjsqh7pt-if-v6exp3-v4.metric.gstatic.com
p5-lj5aujgj7jl7w-r2pmxqvndsgx2im2-931517-i1-v6exp3-v4.metric.gstatic.com
p5-lj5aujgj7jl7w-r2pmxqvndsgx2im2-931517-i2-v6exp3-ds.metric.gstatic.com
s6.netlogstatic.com
v6exp3-ds.metric.gstatic.com
v6exp3-v4.metric.gstatic.com
Match found in https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardMobileAds.txt:
csi.gstatic.com
Match found in https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/AdguardDNS.txt:
metric.gstatic.com
diagnose.igstatic.com
Match found in https://raw.githubusercontent.com/r-a-y/mobile-hosts/master/EasyPrivacy3rdParty.txt:
csi.gstatic.com
diagnose.igstatic.com
Match found in https://www.github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt:
connectivitycheck.gstatic.comwf.cbarsrv.com
gstaticadssl.l.google.com
metric.gstatic.com
anycast.metric.gstatic.com
anycast-stb.metric.gstatic.com
anycast1.metric.gstatic.com
anycast1-stb.metric.gstatic.com
anycast2.metric.gstatic.com
anycast2-stb.metric.gstatic.com
ds.metric.gstatic.com
s-v6exp1-ds.metric.gstatic.com
s-v6exp1-v4.metric.gstatic.com
stbcast.metric.gstatic.com
stbcast-stb.metric.gstatic.com
stbcast2.metric.gstatic.com
stbcast2-stb.metric.gstatic.com
stbcast3.metric.gstatic.com
stbcast3-stb.metric.gstatic.com
stbcast4.metric.gstatic.com
stbcast4-stb.metric.gstatic.com
stbcast5.metric.gstatic.com
stbcast5-stb.metric.gstatic.com
test-ipv6-dot-com-v6exp3-v4.metric.gstatic.com
unicast.metric.gstatic.com
unicast-stb.metric.gstatic.com
unicast2.metric.gstatic.com
unicast2-stb.metric.gstatic.com
v4.metric.gstatic.com
v6exp3-ds.metric.gstatic.com
v6exp3-v4.metric.gstatic.com
NO ERRORS!
There must be a difference between mawk and gawk resulting in the error we've seen.
Is pihole checking for awk during installation? Maybe extend to gawk?
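A check for which awk implementation is actually in use could be sketched like this. The banner prefixes ("GNU Awk", "mawk") are assumptions based on common builds, `classify_awk` is a made-up name, and this is not what the installer currently does.

```bash
#!/usr/bin/env bash
# Classify an awk implementation from the first line of its version banner.
# The banner prefixes below are assumptions based on common builds.
classify_awk() {
    case $1 in
        "GNU Awk"*) echo gawk ;;
        mawk*)      echo mawk ;;
        *)          echo unknown ;;
    esac
}

# gawk answers --version, older mawk only -W version; try both, keep the first line.
# stdin is redirected so an awk that ignores the option cannot wait for input.
banner=$( { awk --version </dev/null 2>/dev/null \
            || awk -W version </dev/null 2>/dev/null; } | head -n 1 )
printf 'awk resolves to: %s (%s)\n' "$(command -v awk)" "$(classify_awk "$banner")"
```

On Debian-based systems `awk` is usually a symlink managed by update-alternatives, so installing gawk alone may already change what this reports.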