Check for existing webservers during installation

PromoFaux · June 2, 2017, 12:23pm

OK, I might be wrong (Though I don't think I am)...

How on earth are you going to tell the browser not to make a request on port 80 or 443? Because it will do that. If the block page is listening on any other port, then the request from the browser will simply time out, because there is nothing there listening for it. Unless I'm missing something extremely obvious here, I think you might be overthinking things somewhat!

DL6ER · June 2, 2017, 12:29pm

I fully agree.

Please expand on this, I seemingly don't understand what you mean. Assume my browser lives on Android or some other system where I have no deep control of how to configure it (like a SmartTV) - the browser will request the page on port 80 on the determined IP address, as specified by the HTTP standard:

The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs.

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

If the port is empty or not given, port 80 is assumed. The semantics are that the identified resource is located at the server listening for TCP connections on that port of that host, and the Request-URI for the resource is abs_path (section 5.1.2). The use of IP addresses in URLs SHOULD be avoided whenever possible (see RFC 1900). If the abs_path is not present in the URL, it MUST be given as "/" when used as a Request-URI for a resource (section 5.1.2). If a proxy receives a host name which is not a fully qualified domain name, it MAY add its domain to the host name it received. If a proxy receives a fully qualified domain name, the proxy MUST NOT change the host name.

Cited from RFC 2616 (Hypertext Transfer Protocol - HTTP/1.1)

I explicitly refer to the statement:

If the port is empty or not given, port 80 is assumed.

I think you agreed that DNS cannot provide the port to connect to.

Hence, I don't understand how we can tell any device not to use port 80 (but again: I'm eager to learn!).
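To make the RFC statement concrete, here is a minimal sketch of how a client ends up with a port when the URL does not name one. The helper `effective_port` and the host `ads.example.com` are made up for illustration; only `urllib.parse.urlsplit` is a real standard-library call.

```python
from urllib.parse import urlsplit

# Default ports per scheme, as described in the quoted RFC text (80 for http)
# and by convention for https (443).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.port is not None:
        # The URL names a port explicitly, e.g. http://host:8080/
        return parts.port
    # No port in the URL: fall back to the scheme default.
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://ads.example.com/banner.js"))
print(effective_port("https://ads.example.com/"))
print(effective_port("http://ads.example.com:8080/"))
```

The port only changes when it is written into the URL itself, which a DNS answer cannot do.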

No doubt, that would be awesome!

SunderRaj · June 2, 2017, 12:58pm

Hello,

Let's define - for this debate - some technical terms:

REQUEST DATA STREAM:

This call is initiated from a user's browser on the workstation. It then invokes a DNS query at the DNS resolver.

RETURN (output) DATA STREAM

The DNS resolver will answer and give an output.
Either the domain name FQDN has a public IP or it has a false IP to drop the data stream.

In principle, you are right. Here, you are talking about the REQUEST DATA STREAM.

However, the pi-Hole makes an intelligent change at the level of - not REQUEST DATA STREAM but - RETURN (output) DATA STREAM, at the time when the outbound connection to the domain in question is returning. Pi-Hole traps it and makes a diversion after the resolution is made, publicIP or fakeIP.

Thereafter, the data streams on RETURN (output) DATA STREAM are passed to lighttpd. This service receives the incoming data streams with a command to generate a blocking web page with /var/www/html/pihole/index.php.

Absolutely. I agree. At the time when the RETURN (output) DATA STREAM is passing by, there is no port mapped to it! Why should there be? These data packets have nothing to do with the browser at the workstation that actually initiated the call. They are there in a data packet at the server with no reference to any port. The fakeIP simply becomes an EventTrigger to invoke a command to generate a webpage for lighttpd.

What I suggested is to modify these data packets at the time of diversion in the server (not at the workstation).
This chapter of data packets and data streams is a lot of headaches and it is very complicated. So, maybe I am wrong. Maybe there is something I am missing too.

My problem is that I do not have any solution showing how what I am saying could be achieved. I only have a feeling that this could be achieved differently than how it is implemented. However, even here, I could be making a terrible mistake and could be wrong.

DL6ER · June 2, 2017, 1:08pm

Agreed, however before we can return data, you have to be aware that somebody is asking (the REQUEST part of the procedure). Due to the HTTP specifications, the browser will request the data on port 80 of our server. If we don't have anything here that is listening to the incoming request, how should we give an answer?

Or are you talking about sending the blocking page w/o having the browser even asking for it? (what it would be doing on port 80).

PromoFaux · June 2, 2017, 1:11pm

OK, I'm out. I've a feeling this is getting way beyond my levels of understanding!

Slightly off topic, but when you say this, I cannot help but think of this video by the Onion News Network...

(^ Posted with my tongue wedged firmly into my cheek)

deathbybandaid · June 2, 2017, 1:47pm

Just wanted to say thanks for the best laugh that I will have today.

I've been following this topic, and it's an interesting debate/argument.

If the idea works, then that would be really awesome, giving more options for a customized setup.

However, I'm still struggling to see EXACTLY how it will be implemented.

Edit: The only way I can see the custom port working is

Browser requests on Port 80
iptables forwards port 80 to 8080 (or whatever port)

but that isn't a working solution and defeats the purpose.

I'm going to do a test. I'm going to use dnsmasq to do

addn-hosts=/etc/piadvanced/piholetweaks/customRedirect.list

and within that file do

PIHOLE-IP:8080 google.com

Port 8080 for me runs a tomcat guacamole server.

If it works, then google.com should direct to guacamole, right?

Also Edit:

Here's the answer,

bad address at /etc/piadvanced/piholetweaks/customRedirect.list line 1

I don't think dnsmasq is capable of sending queries to a specific port. However, I cannot speak for others such as Unbound or Bind

Another Edit:

I tried

address=/google.com/PiHoleIP:8080

as well, and dnsmasq does not even start
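Both failures are consistent with the hosts-file format being "address, then names", with no field for a port. The sketch below is not dnsmasq's parser; it is a hypothetical `parse_hosts_line` helper that uses Python's standard `ipaddress` module to show why a strict address parser would reject `IP:8080`.

```python
import ipaddress


def parse_hosts_line(line: str):
    """Parse one hosts-style line: an IP address followed by one or more names.

    Hypothetical helper for illustration only; it is not how dnsmasq is implemented.
    """
    address, *names = line.split()
    # ip_address() raises ValueError for anything that is not a plain
    # IPv4 or IPv6 address, including "address:port".
    return ipaddress.ip_address(address), names


print(parse_hosts_line("192.168.1.100 google.com"))

try:
    parse_hosts_line("192.168.1.100:8080 google.com")
except ValueError as err:
    print("rejected:", err)
```

A DNS A record carries only an address, so there is nowhere to put a port even if the parser accepted one.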

SunderRaj · June 2, 2017, 4:01pm

Hello deathbybandaid,

You gave me bad news. That's one of the tests I was going to make. Of course it saved me time, but I am disappointed that it did not work, because this was ONE POSSIBILITY.

That's very correct. First, let's make one more thing clear: There are two ports involved: Source port and Destination port. Every outbound http (or https) call - as in your example - on port 80 (or 443) will have

UserSendingSourceIP:WorkstationSourcePort
and
DomainDotCom:53

This will be changed or forwarded to FakeBlockedPiHoleGivenIP:PiHoleDestinationPort. When the data transmission from UserSendingSourceIP starts, if nothing is given, then the browser sets DestinationPort in the TCP/IP packets to 80 by default. This is explained very well here in No. 19:

This is how tcp packets in the TCP Header are constructed before sending, as seen in the table here:

We know about Port forwarding too. I refer to the Remote port forwarding here:

It mentions: "Remote port forwarding allows other computers to access applications hosted on remote servers." This means that the UserSendingSourceIP wants to access, from his WorkstationSourcePort at home, the PiHoleDestinationPort, e.g. 2096. Such a change of ports by proxies is not new. I quickly googled and found here:

http://whatismyipaddress.com/port-forwarding

Everything I mentioned above is described in different words under the title: "A look at port forwarding." Port forwarding and proxies: It probably won't surprise you to learn that Web proxies use a port-forwarding service.
Did I now make myself clear on port forwarding of data packets and explain that this idea is feasible or could work?

SunderRaj · June 2, 2017, 4:31pm

Oops, there is one more thing that I forgot to mention in my message above:

The idea of port forwarding I have suggested as a concept may/could primarily work with OpenVPN i.e. after a tunnel is constructed. With this, the local port 80 on the server remains free for normal web services like apache@80 and apache@443.

DL6ER · June 2, 2017, 4:35pm

Yes, I now see what you are aiming at; however, I'm unsure how you picture the actual implementation.

I agree.

The only working scenario I see right now could be: Detect if the user is requesting an ad domain (let's say pi.hole for now) on port 80 using, e.g., iptables. If we detect that the user asked for such a domain, we forward the port to 1234 or whatever, where we have lighttpd listening. If not (e.g. the user requested the IP itself or some user-defined domain), we leave the connection unaltered and it may reach an already existing apache2 server.

Is this what you are thinking about @SunderRaj ? As of now, I'm unsure if iptables can do it and how it would affect performance if we load it with several hundred thousand domains to look for.

SunderRaj · June 2, 2017, 4:57pm

Hello DL6ER,

After thinking on some issues, "I think" there is a change in the concept, if one talks about installation of pi-Hole within the home network and pi-Hole installed in a data center.

A. Concept "pi-Hole in a home network"

If pi-Hole is installed in a home network, then one gets the output from lighttpd @80 or @443. That's the web server giving a response. This becomes the main intercepting server. Thus, the concept of port forwarding is related to the @80 in there. Here, whether the output is given @80 or shifted to @2086 is one and the same. The port 80 DOES NOT BECOME FREE FOR ITS USE BY OTHER SERVICE LIKE APACHE.

B. Concept "pi-Hole in a data center"

Here, we want to have the port 80/443 free. By using OpenVPN, the default port 80/443 already remains free for nginx, apache, etc. So, we work with the incoming port through the OpenVPN. This helps in port forwarding because the data stream does not bang immediately @ port @80 or @443 but lands at the tunnel port, which could then be forwarded.

In this case, apache/nginx web servers could function normally @80 and @443. The tunnel traffic from OpenVPN is then diverted to a different port based on detection of a specific IP for blocked domains.

@DL6ER, did you now understand what I am talking about?

DL6ER · June 2, 2017, 5:03pm

Yes, I see, but this does not sound like something I like very much as it makes OpenVPN an integral part of the story. You are basically talking about having either

1. the firewall intercepting packets that arrive over the VPN with destination port 80 and subsequently forwarding them to port 2086 internally, or
2. limiting lighttpd to listen only on the OpenVPN interface.

Albeit 1 and 2 are entirely different approaches, they would in the end result in the same behavior, as in:

nothing is blocking the public port 80
any requests to port 80 coming over OpenVPN are answered by lighttpd

Correct me if I'm still mistaken...

SunderRaj · June 2, 2017, 5:06pm

Well, then you may succeed in forwarding it to port 1234. What did you achieve - as an advantage - then? Nothing.

The entire web server gets occupied on @80 as well as @1234, right? Because you cannot reuse the port 80, if it is to be diverted to port 1234. But in your example, there seems to be one thing: "lets say pi.hole" domain. This means that one could configure http apache @80 with the domain pi.hole with a NameVirtualHost:1234. THEN YOU COULD REUSE THE PORT 80 forwarded to 1234. Is that what you meant?

DL6ER · June 2, 2017, 5:12pm

No, I meant using the firewall (iptables) to intercept incoming packets and redirect them to port 1234 if they match some criterion we set.

By this, we can leave port 80 untouched altogether (except in a firewall rule, of course) and don't have to care about if there is a service running or not on this port.

deathbybandaid · June 2, 2017, 5:18pm

dnsmasq will always send you to port 80.

IPTables is kinda a "hacky" solution to fix the issue.

This typically would only really matter if the device is being accessed externally.

Also, in a corporate environment, you are probably already behind a firewall.

I still believe the best solution is the one I suggested earlier in the thread.

If dnsmasq always sends you to port 80/443, then the best solution is to give pihole its own interface, and use iptools to protect the device.

SunderRaj · June 2, 2017, 5:27pm

I can follow and understand you completely, and I agree with this. It also clarifies a bit and helps me with some aspects of my approach that were not really very clear. Thanks for this.

Yes, I know that it is possible to have iptables detect the incoming packets based on host. You may be able to implement your idea of resolving to pi.hole and have iptables do the job. I am not sure whether this would still work if the pi-Hole is installed in a data center, because you will not be able to resolve pi.hole on public traffic data streams. They are initiated only after dnsmasq has responded and given the IP. But I think the IP returned would be that of the pi-Hole server, which would mean that

Domain.Com:80 = Pi-HolePublicIP:80

I will have to think on it.

dnsmasq does nothing more than resolve the domain.com@53. It is not dnsmasq that sends to port 80. The whole game begins after the dnsmasq function is over and it has given the IP to the workstation calling for the resolution. So when the IP is returned or resolved, the answer comes back to a SourceIP @ Workstation, and the data packets containing port 80 are composed at the workstation.
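The point that the destination port is written into the TCP segment by the client, not delivered by DNS, can be sketched by packing a bare TCP header by hand. This is only an illustration: `build_tcp_header` and `ports_of` are made-up helpers, the source port 51234 is an arbitrary example, and the checksum is left at zero, so these bytes are not a segment a real stack would accept.

```python
import struct


def build_tcp_header(src_port: int, dst_port: int, seq: int = 0) -> bytes:
    """Pack a minimal 20-byte TCP header with only the SYN flag set (illustrative)."""
    data_offset = 5 << 4   # header length: 5 x 32-bit words, no options
    flags = 0x02           # SYN
    window = 65535
    checksum = 0           # not computed in this sketch
    urgent = 0
    # Field order: source port, destination port, sequence number,
    # acknowledgement number, data offset, flags, window, checksum, urgent pointer.
    return struct.pack("!HHIIBBHHH", src_port, dst_port, seq, 0,
                       data_offset, flags, window, checksum, urgent)


def ports_of(header: bytes):
    """Read the source and destination ports back from the first four bytes."""
    return struct.unpack("!HH", header[:4])


header = build_tcp_header(src_port=51234, dst_port=80)
print(len(header), ports_of(header))
```

The resolved IP address goes into the IP header; the two ports above are the only place a port appears, and both are chosen on the client side.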

deathbybandaid · June 2, 2017, 5:42pm

Let's say pihole is running on 192.168.1.100

Order of operations:

computer requests github.com from its primary dns server (on port 53)
dnsmasq listens on port 53 to respond to dns requests
dnsmasq sees that computer wants github.com and replies with 192.30.253.112

for the sake of the story, one of the user's blacklists contains github.com

since github.com is in the blacklist it will instead reply 192.168.1.100
computer sees address and goes to it

The computer doesn't know to go to or try any other port. dnsmasq can't tell the computer to go to a specific port. Computer will always assume port 80/443.

If lighttpd (or any other webserver) isn't at that IP address listening to port 80/443, it will just give up, as there is no webpage available to display.

dnsmasq resolves only to an IP address, not to a port.

lighttpd (Pi-Hole webui), is just the pretty face for that IP address to display, when a domain is blocked.

Devices connected to the network will always go to port 80/443.
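The order of operations above can be sketched as a toy simulation. Nothing here talks to a real DNS server: `resolve` and `connection_target` are hypothetical helpers, and the addresses are the ones used in the story.

```python
PIHOLE_IP = "192.168.1.100"
UPSTREAM = {"github.com": "192.30.253.112"}   # what the real answer would be
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve(name: str, blocklist: set) -> str:
    """Toy stand-in for the DNS step: the answer is an IP address and nothing else."""
    if name in blocklist:
        return PIHOLE_IP          # blocked: answer with the Pi-hole's own address
    return UPSTREAM[name]


def connection_target(scheme: str, name: str, blocklist: set):
    """What the client connects to: the resolved IP plus the scheme's default port."""
    return resolve(name, blocklist), DEFAULT_PORTS[scheme]


print(connection_target("http", "github.com", blocklist=set()))
print(connection_target("http", "github.com", blocklist={"github.com"}))
print(connection_target("https", "github.com", blocklist={"github.com"}))
```

Blocking changes only the first element of the pair; the port comes from the scheme on the client side.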

Mcat12 · June 2, 2017, 5:52pm

Here, it looks like you are saying that dnsmasq is talking directly to lighttpd instead of the client processing dnsmasq's answer and querying lighttpd on the default port (80/443). Since dnsmasq cannot do this, this method is invalid.

Also, port forwarding/proxying would require a middleman between the Pi-hole and the client, which is currently not present in a home system.

DL6ER · June 2, 2017, 6:08pm

Yes. Let's assume my datacenter server IP is 1.2.3.4 and I request www.ads.com

dnsmasq will give me 1.2.3.4 and my browser will connect to port 80 of my server in the datacenter. There, the firewall intercepts that packet and analyzes it.

The trick now is that the hostname is included in the client's initial request: in the Host header for plain HTTP, and in the TLS handshake for HTTPS (SNI, Server Name Indication), which exists to support servers that have multiple host names (with different certificates) on the same IP address. Accordingly, an HTTP request will look like:

GET / HTTP/1.1
Host: www.ads.com

Note that this imposes no limitations when thinking about HTTPS, because the first message from the client (ClientHello), that is before any identification and key exchange is done, is always transported in the clear.

The firewall can now decide what to do with the packet - if the Host record matches an ad domain, it can silently forward it to port 1234 where our blocking page is waiting. If, however, the Host record does not match any configured domain, we know that the user wants to connect to the server itself, i.e. whatever there might be waiting on port 80.
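The decision described above can be sketched in userspace as follows. This is not an iptables rule, and whether iptables can do this match is left open here; `host_from_request` and `choose_port` are hypothetical helpers, the ad-domain set is a stand-in for the real block list, and IPv6 literals in the Host header are not handled.

```python
AD_DOMAINS = {"www.ads.com"}   # stand-in for the configured block list
BLOCK_PAGE_PORT = 1234         # where the blocking page would listen (example value)
DEFAULT_PORT = 80              # whatever service already owns port 80


def host_from_request(raw: bytes):
    """Extract the Host header value (without any port) from a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:        # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            # Drop an optional ":port" suffix; IPv6 literals are out of scope here.
            return value.strip().lower().partition(":")[0]
    return None


def choose_port(raw: bytes) -> int:
    """Pick the internal port based on the requested host name."""
    if host_from_request(raw) in AD_DOMAINS:
        return BLOCK_PAGE_PORT
    return DEFAULT_PORT


print(choose_port(b"GET / HTTP/1.1\r\nHost: www.ads.com\r\n\r\n"))
print(choose_port(b"GET / HTTP/1.1\r\nHost: 1.2.3.4\r\n\r\n"))
```

One caveat for any packet-level version of this: the Host header is only sent after the TCP handshake has completed, so the decision cannot be made on the first packet of a connection.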

SunderRaj · June 2, 2017, 6:34pm

Nope. I never said this. In fact, as you can see in my message to @deathbybandaid, I clarified that this is just the wrong thought.

DNSmasq uses - or could use - UDP protocol. After the domain resolution, the hypertext transfer protocol story changes to a TCP/IP Protocol@80/443 (or any other). Here, when there is a change in the protocol, how could I ever dare to make such a fundamental mistake?

Oops, did you mean to say a device? But port forwarding could be done by some configuration with some binaries existing in a linux env.

Yes. After the IP address is returned by dnsmasq over the UDP protocol, the IP is written in the TCP/IP packet for the "hopping journey" to begin with the defined destination of that IP.

Well, that's the tricky part of it. This would be a "kind of" web content filtering.

To be honest, here, I have only some basic knowledge. So I need to inform myself on this. DL6ER, somehow everything you are writing above appears convincing and promising. Thus, I would NOT neglect this idea without a proper and thorough investigation.

DL6ER · June 2, 2017, 6:40pm

I'm interested in seeing what you may come up with. As already said, my major concern at this point is that I don't know what may happen to the performance of iptables if we install tens of thousands of rules (or even more), each requiring packet inspection.

So, although the idea is promising, I'm really holding back with my expectations. I should note that I have had another idea to tackle the issue, but that goes a whole different way (e.g. lighttpd would still sit on port 80 and neither OpenVPN nor a firewall is part of the plan) and needs a lot more research on my side.