Network Ports: Categorical or Numerical
Raffael Marty gave me some great advice about how to treat network ports for visualization.
@Itsuugo #ty! Your choice of box plots for ports is not my favorite. Ports should be treated as categorical variables! Keep posting more!
— Raffael Marty (@raffaelmarty) October 6, 2014
Network Ports
Port numbers are used by the transport layer to direct traffic to the right protocol service on a host. The Internet Assigned Numbers Authority (IANA) is responsible for maintaining the official assignments of port numbers for specific uses. More on this here.
We can usually distinguish 3 ranges of ports:
- Well-known ports [0 to 1023]: They are used by system processes that provide widely used types of network services. The most used protocols live here, like HTTP (80), HTTPS (443), SMTP (25), DNS (53), …
- Registered ports [1024 to 49151]: They are assigned by IANA for a specific service upon application by a requesting entity.
- Ephemeral ports [49152 to 65535]: This range is used for custom or temporary purposes and for automatic allocation. Team Cymru has a great compilation of default ephemeral port usage and source port selection strategies known to be used by a variety of systems.
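The three ranges above can be turned into a categorical label with base R's cut(). This is only a minimal sketch: the helper name port_category and the sample ports are made up for illustration.

```r
# Map port numbers to the three IANA ranges described above.
# port_category is a hypothetical helper name, not part of any package.
port_category <- function(port) {
  # cut() intervals are right-closed by default:
  # (-1, 1023], (1023, 49151], (49151, 65535]
  cut(port,
      breaks = c(-1, 1023, 49151, 65535),
      labels = c("Well Known", "Registered", "Ephemeral"))
}

# Illustrative ports, including the boundaries of each range
ports <- c(0, 80, 443, 1023, 1024, 5353, 49151, 49152, 65535)
print(data.frame(port = ports, category = port_category(ports)))
```

Because the intervals are right-closed, the upper bound of each range is the last port of that range, so port 1024 lands in "Registered" rather than "Well Known".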
Types of variables: Numerical or Categorical
I bought the book OpenIntro Statistics to learn R and statistics, and section 1.2.2 explains the types of variables.
Summarizing, a numerical variable can take a wide range of numerical values, and it is sensible to add, subtract, or take averages with those values. On the other hand, average, sum, and difference of categorical variables have no clear meaning.
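To make the distinction concrete, here is a small sketch with made-up destination ports (not the netflow data analyzed in this post). The mean of the port numbers is a value that corresponds to no service at all, while counting the same values as a factor gives a summary you can read directly.

```r
# Made-up destination ports for illustration: mostly HTTP (80) and HTTPS (443)
dst_ports <- c(80, 443, 443, 80, 443, 53, 443)

# Numerical treatment: the mean is a number, but it names no service
print(mean(dst_ports))

# Categorical treatment: count how often each port appears
port_counts <- table(factor(dst_ports))
print(sort(port_counts, decreasing = TRUE))
```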
Test lab
I have collected some netflow data from one host on my home network. If we consider network ports as numerical:
Analyzing Destination Ports:
The destination TCP ports are port 80 (HTTP) and port 443 (HTTPS). The average, sum, and difference of these values have no clear meaning, so this variable can't be numerical.
On the other hand, if we look at Source Ports:
We can see that most source ports in UDP and TCP are between 49000 and 52000. UDP also uses source ports 123 and 5353.
NOTE: Searching Google for UDP ports 123 and 5353 turns up articles about AirPlay, Apple TV, and other Apple technologies. With this information and the Team Cymru table, we can assume that this device runs Apple iOS (it's my iPad).
Conclusion
I think that most of the time, when you are analyzing network flows, you should treat ports as a categorical variable.
Considering only flows originating from the local network, SrcPort is the local port and DestPort is the port being connected to.
    library(data.table)  # data.table syntax, %like%, melt()
    library(vcd)         # mosaic()

    # nf is assumed to be a data.table already loaded with the nfdump output

    # Filter only flows originated from the local network
    # (dots are escaped so that they match literally)
    n <- nf[V4 %like% "192\\.168\\.1\\."]

    # Keep the nfdump columns for source IP, source port, destination port and protocol
    nports <- n[, c(4, 6, 7, 8), with = FALSE]
    setnames(nports, c("SrcIP", "SrcPort", "DestPort", "Proto"))

    # Remove ICMP flows
    nports <- nports[Proto != "ICMP"]

    # Melt and group flows to plot
    n.melt <- melt(nports, measure.vars = 2:3)

    # cut() intervals are right-closed, so 1023 is the last well-known port
    n.melt[, PortCat := cut(value, c(-1, 1023, 49151, 65535),
                            labels = c("Well Known", "Registered", "Ephemeral"))]

    # Table with data
    table(n.melt$PortCat, n.melt$Proto, n.melt$variable)

    ## , ,  = SrcPort
    ##
    ##
    ##                 TCP    UDP
    ##   Well Known      2   2592
    ##   Registered  44502  33690
    ##   Ephemeral   91647  19634
    ##
    ## , ,  = DestPort
    ##
    ##
    ##                 TCP    UDP
    ##   Well Known 134664  31216
    ##   Registered   1459  24699
    ##   Ephemeral      28      1

    mosaic(~ PortCat + Proto + variable, data = n.melt)
In this graph you can see that most TCP connections originate from Ephemeral and Registered ports and go to Well Known ports. UDP connections behave differently: they are spread more widely across the port ranges. ICMP is removed because nfdump uses the ICMP code as the destination port.
Anyway, if you are working with ephemeral ports to detect the OS of a host, you need to treat them as a numerical variable.
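For the numerical case, a minimal sketch: summarize the source ports of one host and check how many fall inside the IANA dynamic range. The port values are made up and the helper name share_in_range is hypothetical. A real analysis would compare the observed range against a reference such as the Team Cymru table.

```r
# Made-up TCP source ports from a single host, for illustration only
src_ports <- c(49210, 49873, 50544, 51020, 51987, 50101, 49655)

# Treated as numbers, the minimum, maximum and quartiles are meaningful:
# they describe the range the host allocates its source ports from
print(summary(src_ports))
print(range(src_ports))

# share_in_range is a hypothetical helper: fraction of ports inside [low, high]
share_in_range <- function(ports, low, high) {
  mean(ports >= low & ports <= high)
}

# Share of source ports inside the IANA dynamic range 49152-65535
iana_share <- share_in_range(src_ports, 49152, 65535)
print(iana_share)
```

Here the order and distance between values carry the information, which is exactly what a categorical treatment would throw away.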