While I'm not a developer on the pi-hole project, I can offer some general information.
In this instance, caching is 'in memory', not 'on disk' (which is the role your SD card plays). You won't 'fill up' anything other than RAM, and even then the cache is limited to a number of entries, 10K by default, I think.
Now to your other questions. Only the domain lookup is cached. That's all Pi-hole is: a DNS server with a bit of snazzy filtering (and DHCP too). These lookups are cached to speed up answering your clients' requests, so they don't have to wait for a potentially longer round trip to your upstream resolver.
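To illustrate the "limited to a number of entries" point, here is a toy Python sketch of a cache with a fixed entry cap. It is not how Pi-hole or its resolver is implemented: the class name is hypothetical, the 10,000 default only mirrors the figure above, and least-recently-used eviction is an assumed policy.

```python
from collections import OrderedDict
from typing import Optional


class BoundedDnsCache:
    """Toy name -> IP cache capped at max_entries (hypothetical sketch)."""

    def __init__(self, max_entries: int = 10_000) -> None:
        # 10_000 mirrors the "10K by default" figure above; treat it as an assumption.
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, name: str) -> Optional[str]:
        ip = self._entries.get(name)
        if ip is not None:
            self._entries.move_to_end(name)  # mark as most recently used
        return ip

    def put(self, name: str, ip: str) -> None:
        self._entries[name] = ip
        self._entries.move_to_end(name)
        if len(self._entries) > self.max_entries:
            # Assumed eviction policy: drop the least recently used name.
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


# Tiny limit so eviction is visible; names and IPs are documentation placeholders.
cache = BoundedDnsCache(max_entries=2)
cache.put("a.example", "192.0.2.1")
cache.put("b.example", "192.0.2.2")
cache.get("a.example")               # a.example becomes most recently used
cache.put("c.example", "192.0.2.3")  # over the limit: b.example is evicted
print(len(cache), cache.get("b.example"))  # 2 None
```

However many domains are looked up, memory use stays bounded by the entry cap, which is why the cache can't grow without limit.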
Domains are cached based on their TTL (time to live), I believe. So part of the information obtained from the upstream resolver is 'how long is this information good for?'.
I would NOT recommend disabling caching unless you have a really good reason. The RAM usage is trivial for the small amount of data involved, and it is driven largely by the number of different domains being looked up on your network. Caching absolutely helps, especially when navigating the same website over and over. Those domain lookups are usually sub-second.
In Pi-hole, only the domain name's IP address is cached. Pi-hole does not see any of the content that is loaded after the domain name is resolved to an IP.
Browsers typically cache some content, but that is independent of Pi-hole.
All resolved IPs are cached by Pi-hole. When the upstream DNS server provides an IP, it includes the Time to Live (TTL) set by the authoritative nameserver. That time is in seconds, and Pi-hole keeps the entry in cache memory for that duration. At the end of the TTL, the entry expires from the cache.
When you run a dig command, the TTL is shown in the answer section. For example, pi-hole.net has a 900-second TTL. Most TTLs are much shorter than this (typically less than 5 minutes), as in the example below, where cnn.com shows a TTL of 59 seconds.
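A rough Python sketch of the TTL-based expiry described above. The class and method names are hypothetical, and a fake clock stands in for real time so the example runs instantly; the 59-second TTL for cnn.com comes from the dig output in this thread.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlDnsCache:
    """Toy cache that keeps each answer only for its TTL, in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (ip, absolute expiry time measured on self._clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, ip: str, ttl: int) -> None:
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # not cached: a real resolver would ask upstream
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL has run out: drop the entry
            return None
        return ip


now = [0.0]  # simulated clock, in seconds
cache = TtlDnsCache(clock=lambda: now[0])
cache.put("cnn.com", "151.101.65.67", ttl=59)

now[0] = 30.0
print(cache.get("cnn.com"))  # 151.101.65.67
now[0] = 60.0
print(cache.get("cnn.com"))  # None
```

Storing an absolute expiry time rather than a countdown means nothing has to tick in the background; an entry is simply treated as gone once the clock passes its deadline.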
dig cnn.com @1.1.1.1
...
;; ANSWER SECTION:
cnn.com. 59 IN A 151.101.65.67
cnn.com. 59 IN A 151.101.129.67
cnn.com. 59 IN A 151.101.193.67
cnn.com. 59 IN A 151.101.1.67
;; Query time: 22 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Aug 12 09:54:39 CDT 2021
;; MSG SIZE rcvd: 100
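If you want to pull the TTLs out programmatically, `dig +noall +answer cnn.com @1.1.1.1` prints only the answer lines. The Python sketch below parses lines in that layout (name, TTL, class, type, data); the function name is hypothetical and the input is copied from the output above rather than fetched live.

```python
from typing import List, Tuple

# Answer-section lines copied from the dig output above.
ANSWER_SECTION = """\
cnn.com. 59 IN A 151.101.65.67
cnn.com. 59 IN A 151.101.129.67
cnn.com. 59 IN A 151.101.193.67
cnn.com. 59 IN A 151.101.1.67
"""


def parse_answers(text: str) -> List[Tuple[str, int, str, str]]:
    """Return (name, ttl_seconds, record_type, data) for each answer line."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # skip blank lines and dig's ';;' comment lines
        name, ttl, _rclass, rtype, data = line.split(None, 4)
        records.append((name.rstrip("."), int(ttl), rtype, data))
    return records


records = parse_answers(ANSWER_SECTION)
shortest = min(ttl for _, ttl, _, _ in records)
print(f"{len(records)} {records[0][2]} records, shortest TTL {shortest} s")
# 4 A records, shortest TTL 59 s
```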
Very little, and it is in memory (RAM), not on the SD card.
No. Caching improves performance. Instead of domain name resolutions taking tens of msecs or more to query an upstream DNS server, Pi-hole can serve the answer from cache in 1 msec or less (essentially instantaneously).
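A back-of-the-envelope Python calculation of what that means on average. It is a simple weighted average, not a model of Pi-hole: the 1 ms cache figure comes from the text above, the 22 ms upstream figure is the single query time from the cnn.com dig example, and the hit ratios are hypothetical.

```python
def average_lookup_ms(hit_ratio: float, cache_ms: float = 1.0, upstream_ms: float = 22.0) -> float:
    """Expected time per lookup when a fraction `hit_ratio` is answered from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * upstream_ms


for ratio in (0.0, 0.5, 0.9):
    print(f"hit ratio {ratio:.0%}: {average_lookup_ms(ratio):.1f} ms per lookup")
# hit ratio 0%: 22.0 ms per lookup
# hit ratio 50%: 11.5 ms per lookup
# hit ratio 90%: 3.1 ms per lookup
```

Each individual saving is only milliseconds, but a single page load can trigger many lookups, so the savings add up.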
Caching in Pi-hole is independent of your chosen upstream DNS server. Pi-hole caches regardless of which upstream DNS server(s) you use, and it does help.
Ah, I see. Since it is in volatile memory, does it get cleared every time I reboot the Pi? If I visit a site regularly and want it to stay cached, does the cache get written to the SD card on shutdown and loaded back into memory on boot, or does the Pi have to rebuild the cache every time it is cleared by a reboot?
So, as I understand it, if I enter google.com, then instead of looking up Google's IP every time, Pi-hole caches google.com's IP for later use. Let me know if I'm correct. And if I am, and resolving the IP takes that much time, then Pi-hole would save me a lot of time... in the long run, I suppose.
Yes. Note that nearly all TTLs are short-lived and expire naturally from the cache. If you are away from your computer for an hour or asleep all night, the Pi-hole cache will be almost empty the next time you use the internet, too.
Wouldn't it be better if it didn't have to rebuild the cache from scratch on boot, i.e., if it saved the cache to the SD card? That would save time on the first domain lookups, although there's not much time to save anyway; we're talking about ms here.
Of course, only a small number of the most frequently visited sites would need to be saved on the SD card.
At best, all that would do is delay requesting upstream resolution until the TTL expires.
At startup, you'd have to check domains from such a persistent cache for expired TTLs anyway, lest you run the risk of using stale data.
Really? Is there nothing in your house that uses the internet while you are asleep? (I'm thinking of cell phones with WiFi enabled, and other devices such as internet radios, etc.)
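A hedged Python sketch of the startup check such a persistent cache would need. This is a hypothetical design for illustration, not something described here as a Pi-hole feature: the JSON layout, file location, function names, and the 90-second reboot time are all assumptions.

```python
import json
import os
import tempfile
import time
from typing import Dict


def save_cache(path: str, entries: Dict[str, dict]) -> None:
    """Write {name: {"ip": str, "expires_at": epoch seconds}} to disk."""
    with open(path, "w") as f:
        json.dump(entries, f)


def load_fresh_entries(path: str, now: float) -> Dict[str, dict]:
    """Reload the saved cache, keeping only entries whose TTL has not run out."""
    with open(path) as f:
        saved = json.load(f)
    return {name: e for name, e in saved.items() if e["expires_at"] > now}


# Wall-clock (epoch) times are stored because a monotonic clock's
# reference point is not preserved across reboots.
path = os.path.join(tempfile.mkdtemp(), "dns-cache.json")  # hypothetical location
saved_at = time.time()
save_cache(path, {
    "cnn.com": {"ip": "151.101.65.67", "expires_at": saved_at + 59},
    "long-ttl.example": {"ip": "192.0.2.10", "expires_at": saved_at + 900},
})

# Pretend the reboot took 90 seconds (an assumed figure).
fresh = load_fresh_entries(path, now=saved_at + 90)
print(sorted(fresh))  # ['long-ttl.example']
```

The 59-second cnn.com entry is already stale by the time the Pi is back up, so only long-TTL entries survive the round trip to disk.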
Saving the cache to the drive wouldn't get you much gain. Most major sites have TTL values (and thus cache time) of 5 minutes or less. The time it takes to write the cache, reboot and then read the cache would be longer than the lifetime of the value in cache.