Thursday, January 22, 2015

Visualizing Internet with Igraph

Visualizing Countries’ Internet Connections with igraph

After reading this awesome article, I wanted to build a PoC in R to practice what I learned in the Coursera courses.
I assume you know how the Internet is connected and what AS, BGP, etc. mean. If you don’t, I recommend first reading the @ThibaultReuille article.
We are going to use the bgpdump tool and the BGP table from one of the RIPE BGP collectors to obtain the data. This is a simple example. There are many articles about Internet topology, and the CAIDA webpage is particularly interesting if you want to dig deeper into this topic.


First we need to compile the bgpdump tool and check that its regression tests pass:

cd ripencc-bgpdump-f78c0095a2c4/
Running Regression Tests...
      testing bview.20020722.2337.gz...success
      testing bview.20100101.0759.gz...success
      testing bview.20100701.0759.gz...success
      testing rrc07-bview.20100101.0759.gz...success
      testing updates.20020722.2238.gz...success
      testing updates.20071015.1505.gz...success
      testing updates.20100722.2015.gz...success


Recommended reading:
IP to ASN Mapping is NOT a GeoIP service!
The country code, registry, and allocation date are all based on data obtained directly from the regional registries including: ARIN, RIPE, AFRINIC, APNIC, LACNIC. The information returned relating to these categories will only be as accurate as the data present in the RIR databases.
IMPORTANT NOTE: Country codes are likely to vary significantly from actual IP locations, and we must strongly advise that the IP to ASN mapping tool not be used as an IP geolocation (GeoIP) service.
We need to download some files and aggregate the data with a simple bash one-liner:
# Download sources and get the ASN2Country table
# Process files
for i in delegated*; do grep asn $i | grep -v '*' | grep -v reserved >> /tmp/asn2country.txt; done
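As a rough illustration of what the aggregated file contains, here is a minimal R sketch that turns delegated-file records into an ASN-to-country table. It assumes the usual RIR statistics layout (registry|cc|type|start|value|date|status), where for asn records the value field is the number of consecutive ASNs in the block. The helper name parse_delegated_asn and the sample lines are invented for the example.

```r
# Hypothetical helper: expand delegated-file ASN records into one row per ASN.
# Assumed record layout: registry|cc|type|start|value|date|status
parse_delegated_asn <- function(lines) {
  fields <- strsplit(lines, "|", fixed = TRUE)
  # Keep complete ASN records with a real country code
  # (this drops summary lines, which use "*", and records without a country)
  fields <- Filter(function(f) {
    length(f) >= 7 && f[3] == "asn" && nzchar(f[2]) && f[2] != "*"
  }, fields)
  cc    <- vapply(fields, function(f) f[2], character(1))
  start <- as.numeric(vapply(fields, function(f) f[4], character(1)))
  count <- as.numeric(vapply(fields, function(f) f[5], character(1)))
  data.frame(
    # A record with value n covers the ASNs start, start + 1, ..., start + n - 1
    asn = unlist(Map(function(s, n) s + seq_len(n) - 1, start, count)),
    cc  = rep(cc, count),
    stringsAsFactors = FALSE
  )
}

# Invented sample lines using documentation ASNs, not real registry data
sample_lines <- c(
  "ripencc|*|asn|*|2|summary",
  "ripencc|NL|asn|64496|2|20150101|allocated",
  "arin|US|asn|64500|1|20150101|assigned"
)
asn2country <- parse_delegated_asn(sample_lines)
print(asn2country)
```

With the real data you would pass readLines("/tmp/asn2country.txt") instead of the sample vector. Splitting the lines by hand keeps the country code "NA" (Namibia) as a string rather than a missing value.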


The graph and its connections depend on the collector you choose, because of the dynamic nature of the Internet (better explained in this paper). We will use the Amsterdam data for this example.
gunzip latest-bview.gz
bgpdump -m latest-bview -O /tmp/bgptable.txt
We now have two data files. They need to be cleaned and processed, because the AS path field contains some oddities that require care:
  • IANA has reserved AS64512 through AS65534 for use as private ASNs (AS65535 is reserved as well)
  • The states of ASNs are: allocated, assigned, available and reserved
  • Curly braces represent aggregated routes (see Understanding Route Aggregation in BGP)
I wrote a quick-and-dirty Perl script to process the BGP table dump and generate the edges file. Run it and store the output in a file:
./ /tmp/bgptable.txt > /tmp/asnedges.txt
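For readers who prefer to stay in R, the cleaning rules above can be sketched roughly as follows. This is not the Perl script, only a minimal illustration under my own assumptions: as_path_edges is a hypothetical helper, the sample paths are invented, and hops that touch a private ASN or an aggregated AS set are simply dropped instead of being bridged.

```r
# Hypothetical helper (not the Perl script): turn AS_PATH strings into a
# de-duplicated, undirected AS-to-AS edge list.
as_path_edges <- function(paths) {
  if (length(paths) == 0) {
    return(data.frame(from = numeric(0), to = numeric(0)))
  }
  # TRUE for tokens that should not become vertices:
  # unparsable tokens and the private/reserved range at the top of the 16-bit space
  unusable <- function(x) is.na(x) | (x >= 64512 & x <= 65535)

  per_path <- lapply(paths, function(p) {
    tok <- strsplit(trimws(p), "\\s+")[[1]]
    # Plain ASNs parse as numbers; AS sets such as "{64496,64497}" become NA.
    # numeric (not integer) because 4-byte ASNs exceed R's integer range.
    asn  <- suppressWarnings(as.numeric(tok))
    from <- head(asn, -1)
    to   <- tail(asn, -1)
    # Keep a hop only if both ends are usable and differ
    # (from == to means AS path prepending)
    keep <- which(!unusable(from) & !unusable(to) & from != to)
    # Store the smaller ASN first so A-B and B-A count as the same link
    data.frame(from = pmin(from, to)[keep], to = pmax(from, to)[keep])
  })
  edges <- unique(do.call(rbind, per_path))
  rownames(edges) <- NULL
  edges
}

# Invented sample paths using documentation ASNs
paths <- c(
  "64496 64497 64498",              # plain path
  "64496 64497 64497 64497 64499",  # prepending
  "64496 64512 64500",              # private ASN in the middle
  "64496 64501 {64502,64503}"       # aggregated AS set at the end
)
edges <- as_path_edges(paths)
print(edges)

# With the real dump (assumption: in `bgpdump -m` output the AS path is the
# 7th "|"-separated field):
# dump  <- read.table("/tmp/bgptable.txt", sep = "|", quote = "",
#                     comment.char = "", fill = TRUE, stringsAsFactors = FALSE)
# edges <- as_path_edges(unique(dump$V7))
```

Dropping a hop instead of removing the offending ASN avoids inventing a link between two ASs that were never adjacent in the path. On a full table this per-path loop will be slow, so treat it as a readable sketch and not as a tuned implementation.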

Network analysis with R

We now have two datasets: one with the links between ASs and another mapping each AS to its country.
We’ll use igraph to do the data analysis.

Here are some samples for different countries. The quality is far from OpenGraphiti’s, but it is enough to show some interesting things.
The size of each icon is proportional to the number of connected ASs. The circles in red represent regional ASs.
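A per-country graph of this kind could be built along these lines. This is a minimal sketch with toy data, not the exact code behind the pictures: country_graph is a hypothetical helper, the edge and country tables are invented stand-ins for /tmp/asnedges.txt and /tmp/asn2country.txt, and I am assuming that "regional" means registered in the selected country.

```r
library(igraph)

# Toy stand-ins for the two datasets (invented values, documentation ASNs)
edges <- data.frame(
  from = c("64496", "64496", "64497", "64498", "64499"),
  to   = c("64497", "64498", "64500", "64500", "64500"),
  stringsAsFactors = FALSE
)
asn2country <- data.frame(
  asn = c("64496", "64497", "64498", "64499", "64500"),
  cc  = c("ES", "ES", "FR", "US", "US"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: graph of one country's ASs plus their direct neighbours
country_graph <- function(edges, asn2country, country) {
  local_asn <- asn2country$asn[which(asn2country$cc == country)]
  # Keep only links with at least one endpoint registered in the country
  touching <- edges[edges$from %in% local_asn | edges$to %in% local_asn, ]
  # simplify() removes duplicate links and self-loops
  g <- simplify(graph_from_data_frame(touching, directed = FALSE))
  # Red for the country's own ASs, another colour for foreign neighbours
  V(g)$color <- ifelse(V(g)$name %in% local_asn, "red", "lightblue")
  # Vertex size grows with the number of connected ASs (the degree)
  V(g)$size <- 3 + 2 * degree(g)
  g
}

g <- country_graph(edges, asn2country, "ES")
if (interactive()) {
  plot(g, vertex.label.cex = 0.7, main = "AS connections for ES (toy data)")
}
```

The size formula is arbitrary. On real data, with degrees in the thousands, a logarithmic scale will probably give a more readable picture.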


The picture is not sharp because of the number of ASs and links, as the igraph console shows:

# Number of ASs
> vcount(bsk)
[1] 21677
# Number of links
> ecount(bsk)
[1] 42352
# Top 6 connected ASs
> head(sort(igraph::degree(bsk),decreasing = TRUE))
 174 3356 6939 7018 4323  209 
4463 4130 3300 2371 1864 1527 
174 <- COGENT
3356 <- LEVEL3
6939 <- HURRICANE ELECTRIC
7018 <- ATT
4323 <- TW Telecom (acquired by Level3)
209 <- QWEST
You can check these figures against the HE BGP Peer Report, which shows similar results.
Furthermore, you can see that there are a lot of hubs giving connectivity to foreign ASs. In conclusion, as Thibault Reuille said of the Canadian network:
It is big, highly connected and fairly complex.
But I want to show an interesting thing that we can do with igraph:

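# Show the largest "clique".
# A clique is a maximally connected subgraph in which every vertex connects to every other vertex.
# (igraph releases before 1.0 spell these functions largest.cliques / induced.subgraph.)
lc <- largest_cliques(bsk)
# Label each vertex with its AS number
V(bsk)$label <- V(bsk)$name
# Create a new graph from the first of the largest cliques
lc_graph <- induced_subgraph(bsk, lc[[1]])
plot(lc_graph, vertex.label.cex = 2, main = "Largest clique of USA")
As you can see, the most connected ASs are all connected to each other in a mesh.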

Singapore and Ukraine

I made these examples so you can compare them with the OpenGraphiti article. They show how each country is connected to the rest of the Internet.
You can export the graphs to a file and use other tools like Gephi or Cytoscape to manipulate and enhance them, but IMHO they won’t be as good as the OpenGraphiti samples.