OK, I might be wrong (though I don't think I am)...
How on earth are you going to tell the browser not to make a request on port 80 or 443? Because it will do exactly that. If the block page is listening on any other port, the request from the browser will simply fail, because there is nothing listening for it. Unless I'm missing something extremely obvious here, I think you might be overthinking things somewhat!
Please expand on this; I don't seem to understand what you mean. Assume my browser lives on Android or some other system where I have no deep control over its configuration (like a Smart TV): the browser will request the page on port 80 of the resolved IP address, as specified by the HTTP standard:
The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs.
If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path (section 5.1.2). The use of IP addresses in URLs SHOULD be avoided whenever possible (see RFC 1900). If the abs_path is not present in the URL, it MUST be given as "/" when used as a Request-URI for a resource (section 5.1.2). If a proxy receives a host name which is not a fully qualified domain name, it MAY add its domain to the host name it received. If a proxy receives a fully qualified domain name, the proxy MUST NOT change the host name.
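The default-port rule in the quoted text can be illustrated with Python's standard `urllib.parse`. This is a minimal sketch of how a client derives host, port and Request-URI from a URL; `request_target` and `DEFAULT_PORTS` are hypothetical names, and real browsers apply more rules than this. The point is that a non-default port is only used when it appears in the URL itself, which a URL for a blocked domain never contains.

```python
from typing import Optional
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this thread.
DEFAULT_PORTS = {"http": 80, "https": 443}

def request_target(url: str) -> tuple[Optional[str], int, str]:
    """Return (host, port, request_uri) the way an HTTP client derives them.

    If the URL has no explicit port, the scheme's default port is assumed;
    if it has no path, "/" is used as the Request-URI.
    Schemes other than http/https raise KeyError in this sketch.
    """
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```

For example, `request_target("http://ads.example.com")` gives `("ads.example.com", 80, "/")`, while only an explicit `http://pi.hole:1234/admin` would lead the client to port 1234.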
Let's define some technical terms for this debate:
REQUEST DATA STREAM:
This call is initiated from a user's browser on the workstation. It then triggers a DNS query to the DNS resolver.
RETURN (output) DATA STREAM
The DNS resolver will answer and give an output.
Either the FQDN resolves to its public IP, or it resolves to a fake IP so that the data stream is dropped.
In principle, you are right. Here, you are talking about the REQUEST DATA STREAM.
However, Pi-hole makes a clever change not at the level of the REQUEST DATA STREAM but at the level of the RETURN (output) DATA STREAM, when the outbound connection to the domain in question is returning. Pi-hole traps it and diverts it after the resolution is made, to either the public IP or the fake IP.
Thereafter, the data on the RETURN (output) DATA STREAM is passed to lighttpd. This service receives the incoming data stream together with a command to generate a blocking web page using /var/www/html/pihole/index.php.
Absolutely, I agree. At the time the RETURN (output) DATA STREAM passes by, there is no port mapped to it! Why should there be? These data packets have nothing to do with the browser on the workstation, which actually initiated the call. They sit in a data packet on the server without any reference to ports. The fake IP simply becomes an event trigger that invokes a command to generate a web page for lighttpd.
What I suggested is to modify these data packets at the time of diversion in the server (not at the workstation).
This topic of data packets and data streams causes a lot of headaches, and it is very complicated. So maybe I am wrong. Maybe there is something I am missing, too.
My problem is that I have no concrete solution for what I am saying could be achieved. I only have a feeling that this could be achieved differently than how it is currently implemented. However, even here, I could be making a terrible mistake and could be wrong.
Agreed; however, before we can return data, we have to know that somebody is asking (the REQUEST part of the procedure). Due to the HTTP specification, the browser will request the data on port 80 of our server. If nothing is listening there for the incoming request, how should we give an answer?
Or are you talking about sending the blocking page without the browser even asking for it (which it would do on port 80)?
You gave me bad news. That's one of the tests I was going to make. Of course, I saved my time, but I was disappointed that it would not work, because this was ONE POSSIBILITY.
That's very correct. First, let's make one more thing clear: there are two ports involved, the source port and the destination port. Every outbound HTTP (or HTTPS) call on port 80 (or 443), as in your example, will have
UserSendingSourceIP:WorkstationSourcePort
and
DomainDotCom:53
This will be changed or forwarded to FakeBlockedPiHoleGivenIP:PiHoleDestinationPort. When the data transmission from UserSendingSourceIP starts and no port is given, the browser puts 80 as the default DestinationPort into the TCP/IP packets. This is explained very well here, in No. 19:
This is how the TCP header of a packet is constructed before sending, as seen in the table here:
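As a separate, rough illustration of that header layout, here is a sketch that packs a 20-byte TCP header (no options) with Python's `struct` module. `build_tcp_syn` and `ports_of` are hypothetical helpers, and the checksum is deliberately left at 0 because computing it requires the IP pseudo-header. It shows that source and destination ports are plain 16-bit fields filled in by the sending workstation.

```python
import struct

# TCP header without options: src port, dst port, sequence number,
# acknowledgment number, data-offset byte, flags byte, window,
# checksum, urgent pointer (20 bytes, network byte order).
TCP_HEADER = struct.Struct("!HHIIBBHHH")

def build_tcp_syn(src_port: int, dst_port: int, seq: int = 0) -> bytes:
    """Pack a TCP SYN header; the checksum stays 0 in this illustration."""
    data_offset = 5 << 4   # header length: 5 x 32-bit words = 20 bytes
    syn_flag = 0x02
    window = 65535
    return TCP_HEADER.pack(src_port, dst_port, seq, 0,
                           data_offset, syn_flag, window, 0, 0)

def ports_of(header: bytes) -> tuple[int, int]:
    """Read back the source and destination ports (the first 4 bytes)."""
    return struct.unpack_from("!HH", header, 0)
```

For example, `ports_of(build_tcp_syn(51515, 80))` returns `(51515, 80)`: the 80 was chosen by the client, not by DNS.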
We also know about port forwarding. I am referring to remote port forwarding here:
It says: "Remote port forwarding allows other computers to access applications hosted on remote servers." This means that UserSendingSourceIP wants to reach PiHoleDestinationPort, e.g. 2096, from its WorkstationSourcePort at home. Such a change of ports by proxies is not new. I quickly googled and found this:
Everything I mentioned above is described in different words under the title "A look at port forwarding": "Port forwarding and proxies: It probably won't surprise you to learn that Web proxies use a port-forwarding service."
Have I now made myself clear on port forwarding of data packets, and explained why this idea could be feasible?
Oops, there is one more thing that I forgot to mention in my message above:
The port-forwarding concept I suggested could primarily work with OpenVPN, i.e. after a tunnel has been established. This way, port 80 on the server remains free for normal web services like apache@80 and apache@443.
Yes, I now see what you are aiming at; however, I'm unsure how you envision the actual implementation.
I agree.
The only working scenario I see right now could be: detect, e.g. using iptables, whether the user is requesting an ad domain (let's say pi.hole for now) on port 80. If the user asked for such a domain, we forward the connection to port 1234 (or whatever port lighttpd is listening on). If not (e.g. they requested the IP itself or some user-defined domain), we leave the connection unaltered, and it may reach an already existing apache2 server.
Is this what you are thinking of, @SunderRaj? As of now, I'm unsure whether iptables can do it and how it would affect performance if we load it with several hundred thousand domains to look for.
After thinking about some issues, "I think" the concept differs depending on whether Pi-hole is installed within a home network or in a data center.
A. Concept "pi-Hole in a home network"
If Pi-hole is installed in a home network, the output comes from lighttpd @80 or @443. That's the web server giving a response, and it becomes the main intercepting server. Thus, the concept of port forwarding relates to @80 there: whether the output is served @80 or shifted to @2086 makes no difference. Port 80 DOES NOT BECOME FREE FOR USE BY ANOTHER SERVICE LIKE APACHE.
B. Concept "pi-Hole in a data center"
Here, we want to keep ports 80/443 free. With OpenVPN, the default ports 80/443 remain free for nginx, apache, etc., and we work with the traffic arriving through OpenVPN. This helps with port forwarding, because the data stream does not hit port 80 or 443 directly but arrives through the tunnel, from where it could then be forwarded.
In this case, the apache/nginx web servers could function normally @80 and @443. The tunnel traffic from OpenVPN is then diverted to a different port based on detection of a specific IP for blocked domains.
@DL6ER, do you now understand what I am talking about?
Yes, I see, but I don't like this very much, as it makes OpenVPN an integral part of the story. You are basically talking about either
1. having the firewall intercept packets that arrive over the VPN with destination port 80 and forward them internally to port 2086, or
2. limiting lighttpd to listen only on the OpenVPN interface.
Although 1 and 2 are entirely different approaches, they would in the end result in the same behavior, namely:
nothing is blocking the public port 80
any requests to port 80 coming over OpenVPN are answered by lighttpd
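The first approach (the firewall redirect) can be modelled as a pure function to make the intended behavior explicit. This is a sketch, not packet handling: `Packet` and `prerouting_redirect` are hypothetical, and `tun0` (a common OpenVPN interface name) and port 2086 are assumptions taken from this discussion. On a real system the equivalent would be a NAT rule along the lines of `iptables -t nat -A PREROUTING -i tun0 -p tcp --dport 80 -j REDIRECT --to-ports 2086`, with lighttpd listening on 2086.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    in_iface: str   # interface the packet arrived on
    dst_ip: str
    dst_port: int

# Assumed values: "tun0" for the OpenVPN interface, 2086 for lighttpd.
VPN_IFACE = "tun0"
LIGHTTPD_PORT = 2086

def prerouting_redirect(pkt: Packet) -> Packet:
    """Redirect port-80 traffic arriving over the VPN to the internal lighttpd port.

    Anything else (e.g. public traffic on eth0) is returned unchanged and
    would reach whatever web server listens on port 80.
    """
    if pkt.in_iface == VPN_IFACE and pkt.dst_port == 80:
        return replace(pkt, dst_port=LIGHTTPD_PORT)
    return pkt
```

Both points listed above fall out directly: public port 80 is never touched, and only VPN traffic is rewritten.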
Well, then you may succeed in forwarding it to port 1234. But what advantage have you achieved then? None.
The web server then occupies both @80 and @1234, right? Because you cannot reuse port 80 if it is to be diverted to port 1234. But your example contains one interesting detail: the "let's say pi.hole" domain. This means that one could configure apache @80 for the domain pi.hole with a NameVirtualHost:1234. THEN YOU COULD REUSE PORT 80, forwarded to 1234. Is that what you meant?
No, I meant using the firewall (iptables) to intercept incoming packets and redirect them to port 1234 if they match some criterion we set.
This way, we can leave port 80 untouched altogether (except for a firewall rule, of course) and don't have to care whether a service is running on this port or not.
I can follow you completely and agree with this. It also clarifies some aspects of my approach that were not very clear to me. Thanks for this.
Yes, I know that it is possible to have iptables detect incoming packets based on the host. You may be able to implement your idea of resolving to pi.hole and having iptables do the job. I am not sure whether this would work if Pi-hole is installed in a data center, because you will not be able to resolve pi.hole on public traffic data streams: they are initiated only after dnsmasq has responded and returned the IP. But I think the IP returned would be that of the Pi-hole server, which would mean:
Domain.Com:80 = Pi-HolePublicIP:80
I will have to think on it.
dnsmasq does nothing more than resolve domain.com@53. It is not dnsmasq that sends to port 80. The whole game begins after dnsmasq has done its job and returned the IP to the workstation that asked for the resolution. So when the IP is resolved, the answer comes back to the SourceIP @ workstation, and the data packets containing port 80 are composed at the workstation.
computer requests github.com from its primary DNS server (on port 53)
dnsmasq listens on port 53 to respond to dns requests
dnsmasq sees that computer wants github.com and replies with 192.30.253.112
for the sake of the story, one of the user's blacklists contains github.com
since github.com is in the blacklist it will instead reply 192.168.1.100
computer sees address and goes to it
The computer doesn't know to try any other port, and dnsmasq can't tell it to go to a specific port. The computer will always assume port 80/443.
If lighttpd (or any other webserver) isn't at that IP address listening to port 80/443, it will just give up, as there is no webpage available to display.
dnsmasq resolves only to an IP address, not to a port.
lighttpd (the Pi-hole web UI) is just the pretty face displayed for that IP address when a domain is blocked.
devices connected to the network will always go to port 80/443.
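The points above can be made concrete by looking at the wire format of a DNS answer (RFC 1035, section 4.1.3): an A record carries a name, type, class, TTL, data length and a 4-byte IPv4 address, and has no port field at all. The sketch below packs and parses one such record; the helper names, the TTL value and the compression pointer `0xC00C` (pointing back at the question name) are illustrative choices, and 192.168.1.100 is the blocking address from the example.

```python
import socket
import struct

# Default ports the client adds on its own, based on the URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}

def build_a_answer(ip: str, ttl: int = 60) -> bytes:
    """Pack one DNS answer record of type A, class IN."""
    rdata = socket.inet_aton(ip)   # exactly 4 bytes: the IPv4 address, nothing else
    return struct.pack("!HHHIH", 0xC00C, 1, 1, ttl, len(rdata)) + rdata

def parse_a_answer(record: bytes) -> str:
    """Return the IPv4 address from an A record built as above."""
    _name, rtype, rclass, _ttl, rdlength = struct.unpack_from("!HHHIH", record, 0)
    if rtype != 1 or rclass != 1 or rdlength != 4:
        raise ValueError("not an IN A record")
    return socket.inet_ntoa(record[12:12 + rdlength])

def connect_target(answer: bytes, scheme: str) -> tuple[str, int]:
    """The client combines the resolved IP with the scheme's default port."""
    return parse_a_answer(answer), DEFAULT_PORTS[scheme]
```

`connect_target(build_a_answer("192.168.1.100"), "http")` yields `("192.168.1.100", 80)`: whatever the resolver answers, the port comes from the client.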
Here, it looks like you are saying that dnsmasq is talking directly to lighttpd instead of the client processing dnsmasq's answer and querying lighttpd on the default port (80/443). Since dnsmasq cannot do this, this method is invalid.
Also, port forwarding/proxying would require a middleman between the Pi-hole and the client, which is currently not present in a home system.
Yes. Let's assume my data center server IP is 1.2.3.4 and I request www.ads.com.
dnsmasq will give me 1.2.3.4, and my browser will connect to port 80 of my server in the data center. There, the firewall intercepts the packet and analyzes it.
The trick is that the client sends the hostname at the very beginning, to support servers that serve multiple host names on the same IP address: in plain HTTP this is the Host header, and with HTTPS the name is also sent in the TLS handshake (SNI, Server Name Indication), so the server can pick the right certificate. Accordingly, a plain HTTP request will look like:
GET / HTTP/1.1
Host: www.ads.com
Note that this imposes no limitation for HTTPS either, because the client's first message (ClientHello), which is sent before any authentication or key exchange takes place, is transported in the clear.
The firewall can now decide what to do with the packet: if the Host header matches an ad domain, it can silently forward it to port 1234, where our blocking page is waiting. If, however, the Host header does not match any configured domain, we know that the user wants to connect to the server itself, i.e. to whatever might be waiting on port 80.
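The decision described above can be sketched as follows. This only models the logic: `BLOCKED`, `host_header` and `choose_port` are hypothetical, the blocklist and port 1234 are taken from the example, and it assumes the component making the decision can see the request bytes (on a real TCP connection these only arrive after the handshake). IPv6 literals in the Host header are not handled.

```python
from typing import Optional

# Hypothetical blocklist and ports, taken from the example in this thread.
BLOCKED = {"www.ads.com", "pi.hole"}
BLOCKPAGE_PORT = 1234
DEFAULT_HTTP_PORT = 80

def host_header(request: bytes) -> Optional[str]:
    """Extract the Host header value from the start of a plain-HTTP request."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:           # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            return host.split(":", 1)[0]             # drop an explicit :port
    return None

def choose_port(request: bytes) -> int:
    """Blocked hosts go to the block page; everything else stays on port 80."""
    host = host_header(request)
    return BLOCKPAGE_PORT if host in BLOCKED else DEFAULT_HTTP_PORT
```

A request carrying `Host: www.ads.com` maps to 1234, while `Host: 1.2.3.4` (the server itself) stays on 80.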
Nope. I never said this. In fact, as you can see in my message to @deathbybandaid, I clarified that this is exactly the wrong way of thinking.
dnsmasq uses (or can use) the UDP protocol. After the domain is resolved, the story moves to HTTP over TCP/IP @80/443 (or any other port). Given this change of protocol, how could I ever make such a fundamental mistake?
Oops, did you mean a device? Port forwarding could also be done by configuring existing binaries in a Linux environment.
Yes. After dnsmasq returns the IP address over UDP, the IP is written into the TCP/IP packet, and the "hopping journey" to that destination IP begins.
Well, that's the tricky part of it. This would be "kind of" web content filtering.
To be honest, I have only basic knowledge here, so I need to read up on this. DL6ER, everything you wrote above appears convincing and promising. Thus, I would NOT dismiss this idea without a proper and thorough investigation.
I'm interested in seeing what you may come up with. As already said, my major concern at this point is that I don't know what would happen to iptables performance if we installed tens of thousands of rules (or even more), each requiring packet inspection.
So, although the idea is promising, I'm really keeping my expectations low. I should note that I have had another idea to tackle the issue, but that goes a completely different way (lighttpd would still sit on port 80, and neither OpenVPN nor a firewall would be part of the plan) and needs a lot more research on my side.
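The performance concern about one rule per domain can be illustrated with a simplified model. Rules in an iptables chain are evaluated in order until one matches, so a chain with one rule per blocked domain costs up to one comparison per rule for every inspected packet, while a hash-based lookup costs roughly the same regardless of list size. This sketch only counts comparisons, not real kernel cost; the function names and the generated domain list are hypothetical.

```python
def first_matching_rule(rules: list[str], host: str) -> tuple[int, bool]:
    """Walk the rules in order, as a firewall chain is evaluated.

    Returns (number of rules inspected, whether one matched).
    """
    for inspected, pattern in enumerate(rules, start=1):
        if pattern == host:
            return inspected, True
    return len(rules), False

def hashed_lookup(blocked: frozenset[str], host: str) -> bool:
    """One set-membership test; average cost does not grow with list size."""
    return host in blocked
```

With 100,000 generated domains, a host that is not blocked forces `first_matching_rule` to inspect all 100,000 entries, whereas `hashed_lookup` answers with a single membership test.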