Tuesday, February 10, 2015

R netflow analytics I

This is my first attempt at using R for network analytics. For this example I'm going to use the dataset from Raffy's blog post "Cleaning Up Network Traffic Logs - VAST 2013 Challenge".

As you can see in the network topology, the NetFlow collector logs all traffic to and from the internet.

The data.table package loads the data very quickly, as you can see:

Read 15172767 rows and 19 (of 19) columns from 1.777 GB file in 00:01:11
   user  system elapsed 
  47.75    2.56   71.65 
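A timing like the one above can be produced by wrapping data.table's fread() in base R's system.time(). The sketch below is a minimal, hypothetical version: it writes a tiny synthetic CSV to a temp file instead of the real 1.777 GB dataset, and the column names proto, src_port and dst_port are assumptions, not the real VAST 2013 headers.

```r
library(data.table)

# Hypothetical stand-in for the netflow CSV: a few rows, assumed column names
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "proto,src_port,dst_port",
  "TCP,49500,80",
  "UDP,53211,53",
  "ICMP,0,0"
), csv_path)

# fread() auto-detects the separator and column types;
# system.time() reports user/system/elapsed seconds
timing <- system.time(flows <- fread(csv_path))
print(timing)
print(dim(flows))
```

On the real file, only the path changes; fread() prints the "Read N rows and M columns" line itself when verbose progress output is enabled.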

Now that the data is loaded, let's make a couple of graphs to understand it:


As you can see:

  • TCP is by far the most used protocol
  • ICMP gets few responses, which is common on networks whose firewalls block ICMP from the internet
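Both observations can also be checked numerically rather than by eye. This is a hypothetical sketch on a toy data.table: the columns proto and dst_packets (packets seen coming back from the responder) are assumed names. It counts flows per protocol with data.table's .N and computes, per protocol, the share of flows that got no reply.

```r
library(data.table)

# Toy flows; proto and dst_packets are assumed column names
flows <- data.table(
  proto       = c("TCP", "TCP", "TCP", "UDP", "ICMP", "ICMP"),
  dst_packets = c(5L, 3L, 0L, 1L, 0L, 0L)
)

# Flows per protocol, most frequent first
proto_counts <- flows[, .N, by = proto][order(-N)]
print(proto_counts)

# Fraction of flows per protocol with no packets coming back
no_reply <- flows[, .(no_reply_share = mean(dst_packets == 0L)), by = proto]
print(no_reply)
```

A high no_reply_share for ICMP, next to a low one for TCP, is the numeric counterpart of the "few responses" pattern in the graph.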

Other interesting graphs:


Port usage is important because you can often identify an OS by its ephemeral source port selection strategy, as described in Ephemeral Source Port Selection Strategies.
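As a rough illustration, each host's observed source-port range can be compared with the default ephemeral ranges of common stacks. The ranges are assumptions based on typical defaults (recent Linux kernels use 32768-60999, Windows since Vista uses the IANA range 49152-65535, Windows XP used 1025-5000); hosts can be reconfigured and the Linux and Windows ranges overlap, so this is a hint rather than a fingerprint. The helper guess_stack and the columns src_ip and src_port are hypothetical.

```r
library(data.table)

# Hypothetical helper: guess an OS family from the min/max source port of a host.
# Ranges are typical defaults only (assumption):
#   Windows XP 1025-5000, Windows Vista+/IANA 49152-65535, Linux 32768-60999.
# Ports only inside the overlap 49152-60999 stay "unknown".
guess_stack <- function(lo, hi) {
  ifelse(lo >= 1025L & hi <= 5000L, "Windows XP-like",
  ifelse(lo >= 49152L & hi > 60999L, "Windows Vista+/IANA-like",
  ifelse(lo >= 32768L & lo < 49152L & hi <= 60999L, "Linux-like",
         "unknown")))
}

# Toy flows initiated by three hosts (assumed column names)
flows <- data.table(
  src_ip   = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.3"),
  src_port = c(33000L, 45000L, 50000L, 64000L, 1100L)
)

# Observed source-port range per host, then the guess
hosts <- flows[, .(lo = min(src_port), hi = max(src_port)), by = src_ip]
hosts[, stack := guess_stack(lo, hi)]
print(hosts)
```

This only makes sense for flows the host initiated; replies from a server come from its service port, not an ephemeral one.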

Furthermore, most internet services (HTTP, SMTP, POP, …) run on ports below 1024.
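The same idea can be checked against the data. A minimal sketch on a toy table with the hypothetical column dst_port: it computes the share of flows aimed at well-known ports (0-1023) and lists the most common destination ports.

```r
library(data.table)

# Toy flows; dst_port is a hypothetical column name
flows <- data.table(dst_port = c(80L, 443L, 80L, 25L, 110L, 8080L, 53L, 51234L))

# Share of flows aimed at well-known ports (0-1023)
well_known_share <- flows[, mean(dst_port < 1024L)]
cat(sprintf("Flows to ports < 1024: %.1f%%\n", 100 * well_known_share))

# Three most common destination ports
top_ports <- flows[, .N, by = dst_port][order(-N)][1:3]
print(top_ports)
```

For this toy table the share is 75%; on the real dataset, a high share would support the observation above.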

