The Internet Census 2012 is a report on how all the IPv4 addresses are being used on the Internet. It’s also a botnet of epic scale.
The methodology is actually far more interesting than the conclusions of the paper: the author (who did not identify himself beyond his public PGP key… which makes sense, because his methodology is absolutely illegal) didn’t have access to equipment that could port scan the entire Internet himself, so he “borrowed” the tools he needed.
The author wrote software (to some, a worm) that would port scan a pre-defined subset of IP addresses with 128 simultaneous connections at a time. With each scan, the software would do its research (a reverse DNS lookup, ICMP ping, traceroute, etc.), but it would also try to “recruit” the destination address into its benevolent research botnet by trying common passwords like root/root or admin/admin. If the software could break in, it would copy itself onto the destination machine, that machine would become part of the research botnet, and the system would continue to self-perpetuate until the whole IPv4 Internet had been scanned.
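The scan-and-recruit loop can be sketched in a few lines of Python. To be clear, this is not the author’s actual code: the device model, field names, and credential list below are all hypothetical stand-ins, and real hosts are replaced by an in-memory simulation so nothing touches the network.

```python
# Minimal simulation of the scan-and-recruit loop described above.
# Devices are plain dicts standing in for real hosts; all names
# here are illustrative, not taken from the census binary.

COMMON_CREDENTIALS = [("root", "root"), ("admin", "admin"), ("root", "")]

def try_recruit(device):
    """Return True if one of the common credential pairs works."""
    return (device["user"], device["password"]) in COMMON_CREDENTIALS

def scan(devices):
    """Probe each simulated device and recruit the weak ones."""
    botnet = []
    for device in devices:
        # The real scanner would also do a reverse DNS lookup,
        # ICMP ping, and traceroute at this point.
        if device["telnet_open"] and try_recruit(device):
            device["recruited"] = True  # a recruit would now scan too
            botnet.append(device["ip"])
    return botnet

devices = [
    {"ip": "10.0.0.1", "telnet_open": True,  "user": "root",  "password": "root"},
    {"ip": "10.0.0.2", "telnet_open": True,  "user": "alice", "password": "s3cret!"},
    {"ip": "10.0.0.3", "telnet_open": False, "user": "admin", "password": "admin"},
]

print(scan(devices))  # only the first device falls to root/root
```

The self-perpetuating part is the comment inside the loop: each recruited device would run the same scanner against its own slice of the address space, which is what let the census cover all of IPv4.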
As a result of his gray hat hacking, the author was able to port scan the entire Internet in a single night. I’m amazed by the scale and resourcefulness of his approach.
The methodology of this Internet study is very similar to that of the Morris worm, written by Robert Tappan Morris (RTM) back in 1988. RTM wanted to determine the size of the Internet at the time (just like the 2012 Census), but his self-propagation code had a fateful flaw: the worm checked whether the destination computer was already infected, but, to defeat administrators who might fake that signal, it reinfected machines anyway one time in seven. The result was machines buried under copies of the worm and an Internet-wide denial of service. Fun side note: RTM is one of the founding partners of Y Combinator.
The purpose of this post is to shine a light on the amazing methodology in this study. But for context, the image at the top of this post is a picture of one of the conclusions: IP address space is under-utilized. All that black space is reserved addresses that are going unused and for the most part cannot be reclaimed. Hence the push to IPv6.
Given that the purpose of this study was to determine how big the Internet is, I’ll leave you with the author’s closing answer to that question:
So, how big is the Internet?
That depends on how you count. 420 Million pingable IPs + 36 Million more that had one or more ports open, making 450 Million that were definitely in use and reachable from the rest of the Internet. 141 Million IPs were firewalled, so they could count as “in use”. Together this would be 591 Million used IPs. 729 Million more IPs just had reverse DNS records. If you added those, it would make for a total of 1.3 Billion used IP addresses. The other 2.3 Billion addresses showed no sign of usage.
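Tallying the quoted figures (in millions) makes the “depends on how you count” point concrete. The numbers below are taken straight from the census quote above; the variable names are mine:

```python
# Running totals for the census's headline numbers, in millions.
definitely_in_use = 450   # pingable, or with at least one open port
firewalled       = 141    # unreachable, but visibly filtered
reverse_dns_only = 729    # nothing reachable, but a reverse DNS record exists
no_sign_of_use   = 2300

strict_count = definitely_in_use + firewalled   # 591 million "used"
broad_count  = strict_count + reverse_dns_only  # 1,320 million, i.e. ~1.3 billion

print(strict_count, broad_count)  # prints: 591 1320
```

So the honest answer ranges from about 591 million to 1.3 billion used addresses, depending on how generous you are about what counts as “in use”.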
(hat tip to O’Reilly Radar for the source)