Network Profiling with Randomly Sampled Data
By Doctor Electron
A census of people defines the normal and the unusual. The present net census of computers connected to the internet similarly shows the most common or normal responses to establish a baseline. With the baseline defining the normal, interesting deviations from normal characteristics and extraordinary cases can then be identified. Thus, thousands of computers and networks may be examined in a net census for two basic purposes: to establish a baseline and to describe cases which deviate significantly from the normal.
Broad categories of responses by computers to our survey items are described in this paper. Similar analyses may be performed on networks defined by more focussed IP address prefixes (e.g., x.y/16 or x.y.z/24). A further report in preparation will perform a similar profiling analysis with additional variables based on responses by specific listening ports on surveyed machines.
The present study shows that noteworthy differences among networks may be described and used to develop categories of networks. A profiling scheme is presented that defines nine types of networks.
Methods
Data were collected by random sampling of IP address space, following procedures and response criteria described in detail previously [1, 2]. Four response categories were observed:
(1) ICMP echo reply (Ping),
(2) ICMP error messages elicited by ICMP echo requests (Error),
(3) established TCP connections to common ports such as 21, 22, 23, 25, 80, 113, 139 and 443 (TCP) [RFC 1700] and
(4) TCP connection refused errors (10061).
The author wrote the software used for data collection. Descriptions of recipients of address space allocations in Table 1 were obtained from IANA [IANA].
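The sampling step can be illustrated with a minimal sketch. It is not the author's software, which is not shown in this paper. It assumes addresses are drawn uniformly from the whole 32-bit IPv4 space and tallied by x/8 prefix; the function names and the seed are hypothetical, and no probes are sent.

```python
import ipaddress
import random
from collections import Counter

def sample_ipv4(n, seed=None):
    """Draw n IPv4 addresses uniformly at random from the 32-bit space (assumption)."""
    rng = random.Random(seed)
    return [ipaddress.IPv4Address(rng.getrandbits(32)) for _ in range(n)]

def slash8(addr):
    """Return the x/8 prefix (the first octet) of an address as an integer."""
    return int(addr) >> 24

def tally_by_slash8(addresses):
    """Count how many sampled addresses fall in each x/8 prefix."""
    return Counter(slash8(a) for a in addresses)

# Hypothetical sample size and seed, for illustration only.
sample = sample_ipv4(1000, seed=2002)
counts = tally_by_slash8(sample)
```

A survey tool would then probe each sampled address and record which of the four response categories, if any, was observed.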
Results
The major finding is that almost all networks deviate significantly from average behavior. Thus, the truly average network may be as elusive as the truly average person.
Table 1 shows the ratios of responses observed compared to expected, for each of the four response categories (Ping, Error, TCP and 10061) in IP x/8 address spaces which will be referred to as "networks" (rows). The networks are listed in descending order for each of the most prominent response categories observed (highest ratios).
These ratios permit comparison among the response types and among the address spaces or networks. A ratio greater than 1.0 indicates that more responses were obtained than expected by random distribution of the data. Conversely, ratios less than 1.0 indicate fewer than expected responses. For example, in the first row for 32/8 addresses, 125% and 82% describe the observed Error and TCP response counts compared to expected (100%).
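As a minimal sketch of the ratio computation, the following uses the definition given in the Table 1 legend (Expected = row total x column total / table total). The two networks and their counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the survey data.

```python
def observed_over_expected(table):
    """Map each network to its list of observed/expected ratios.

    table: dict of network -> observed counts per response category.
    Expected = row total * column total / table total.
    """
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    grand_total = sum(col_totals)
    ratios = {}
    for net, row in table.items():
        row_total = sum(row)
        ratios[net] = []
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            # Leave the cell undefined when nothing is expected.
            ratios[net].append(obs / expected if expected > 0 else None)
    return ratios

# Hypothetical counts in the order Ping, Error, TCP, 10061.
counts = {
    "A/8": [10, 40, 100, 50],
    "B/8": [30, 20, 100, 50],
}
ratios = observed_over_expected(counts)
```

In this made-up table, network A/8 returns half the expected number of echo replies and B/8 returns one and a half times the expected number, while both match expectation for TCP and 10061.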
Table 1: Observed/Expected Responses at IPv4/8 Addresses

X/8  Description - Echo Repliers  Ping Error TCP 10061  Total  pP pE pT pR  G
032/8  Norsk Informasjonsteknologi  4.00 1.25 0.82 0.60  21  1.1
057/8  SITA (French)  3.00 3.00 0.00 1.33  13  1.2
213/8  RIPENCC-Europe  2.26 1.55 0.70 0.91  1482  -19 -14 -34  1.2
211/8  APNIC-Pacific Rim  2.24 0.76 0.86 1.13  3232  -39 -10 -16 -3  1.4
217/8  RIPENCC-Europe  2.19 0.90 0.89 0.98  1174  -13 -4  1.1
208/8  ARIN-North America  1.83 1.27 0.96 0.63  1537  -10 -4 -19  1.2
209/8  ARIN-North America  1.83 0.74 1.19 0.48  3042  -20 -12 -31 -90  1.3
210/8  APNIC-Pacific Rim  1.80 1.53 0.83 0.76  2049  -13 -19 -15 -10  1.2
216/8  ARIN-North America  1.73 0.67 1.18 0.61  3799  -20 -26 -32 -53  1.3
207/8  ARIN-North America  1.60 1.38 0.94 0.67  1655  -7 -8 -16  1.2

X/8  Description - ICMP Volunteers  Ping Error TCP 10061  Total  pP pE pT pR  G
010/8  IANA-Private Use  0.00 5.86 0.01 0.00  130  -99 -99  2.2
154/8  Various Registries  0.20 5.08 0.10 0.29  76  -4 -28 -29 -6  2.2
135/8  Various Registries  4.00 0.00 0.00  4  2.2
192/8  Various Reg. - MultiRegional  0.83 3.09 0.49 0.72  593  -53 -43 -4  2.2
038/8  Performance Systems Internat'l  1.50 3.00 0.39 0.75  53  -5 -6  2.2
157/8  Various Registries  0.29 2.77 0.37 1.37  590  -15 -40 -69 -4  2.4
004/8  Bolt Beranek and Newman Inc.  1.00 2.62 0.35 1.35  279  -16 -36 -2  2.4
194/8  RIPENCC-Europe  1.10 2.53 0.64 0.68  1052  -56 -36 -9  2.2
169/8  Various Registries  0.33 2.45 0.54 1.27  133  -3 -6 -8  2.2
202/8  APNIC-Pacific Rim  1.38 2.41 0.65 0.65  1476  -3 -69 -46 -17  2.1
195/8  RIPENCC-Europe  1.23 2.33 0.68 0.69  1603  -68 -41 -14  2.2
193/8  RIPENCC-Europe  1.02 2.22 0.70 0.79  923  -34 -21 -3  2.2
166/8  Various Registries  1.00 2.20 0.57 1.15  178  -7 -9  2.2
206/8  ARIN-North America  1.62 2.09 0.79 0.50  1226  -5 -38 -14 -33  2.1
033/8  DLA Systems Automation Center  0.00 2.00 0.80 1.00  10  2.2
198/8  Various Registries  0.88 1.99 0.80 0.78  808  -21 -9 -3  2.2
145/8  Various Registries  0.18 1.93 0.76 1.11  159  -8 -4 -2  2.2
168/8  Various Registries  0.57 1.93 0.83 0.84  430  -2 -11 -3  2.2
144/8  Various Registries  0.29 1.91 0.89 0.80  694  -18 -16 -2 -2  2.2
012/8  AT&T Bell Laboratories  1.55 1.91 0.37 1.65  918  -3 -21 -99 -18  2.4
139/8  Various Registries  0.38 1.88 1.06 0.39  492  -8 -11 -24  2.2
200/8  ARIN-Central and South America  1.48 1.83 0.69 0.96  1144  -3 -22 -28  2.1
203/8  APNIC-Pacific Rim  1.57 1.79 0.79 0.74  1578  -6 -29 -19 -9  2.1
205/8  ARIN-North America  1.20 1.68 0.97 0.49  716  -10 -20  2.2
212/8  RIPENCC-Europe  1.37 1.67 0.89 0.64  2044  -4 -29 -6 -24  2.1
165/8  Various Registries  0.52 1.64 0.81 1.15  328  -2 -4 -3  2.2
035/8  MERIT Computer Network  0.50 1.60 1.00 0.67  28  2.2
204/8  ARIN-North America  1.35 1.55 0.98 0.53  976  -2 -9 -22  2.1
170/8  Various Registries  0.25 1.53 0.84 1.20  177  -6 -2  2.2
199/8  ARIN-North America  1.37 1.52 0.79 1.00  503  -5 -6  2.2
063/8  ARIN  1.32 1.49 0.82 0.95  1041  -8 -8  2.1
062/8  RIPENCC-Europe  1.06 1.43 0.76 1.22  1006  -6 -15 -15  2.4
148/8  Various Registries  0.64 1.40 0.79 1.30  312  -2 -4 -2  2.4
151/8  Various Registries  0.50 1.26 1.13 0.67  668  -5 -2 -3 -7  2.1
196/8  Various Registries  1.00 1.21 1.02 0.79  173  2.2

X/8  Description - Connect Accept  Ping Error TCP 10061  Total  pP pE pT pR  G
051/8  Dept. of Social Security of UK  2.00  2  3.3
006/8  Army Information Systems Center  0.00 0.00 1.75 0.33  15  -4  3.3
131/8  Various Registries  0.11 0.14 1.73 0.20  3292  -99 -99 -99 -99  3.3
132/8  Various Registries  0.04 0.06 1.68 0.39  4347  -99 -99 -99 -99  3.3
153/8  Various Registries  0.14 0.19 1.65 0.32  319  -23 -34 -56 -22  3.3
171/8  Various Registries vaskapu .hu  0.33 0.27 1.62 0.30  133  -3 -9 -19 -10  3.3
137/8  Various Registries  0.13 0.37 1.59 0.34  1683  -99 -66 -99 -99  3.3
162/8  Various Registries  0.21 0.31 1.57 0.41  398  -16 -21 -46 -17  3.3
147/8  Various Registries  0.04 0.20 1.55 0.60  983  -99 -97 -99 -15  3.3
155/8  Various Registries  0.15 0.22 1.50 0.67  889  -59 -77 -72 -9  3.3
140/8  Various Registries  0.40 0.50 1.45 0.51  717  -10 -15 -45 -19  3.3
158/8  Various Registries  0.10 0.45 1.44 0.65  1026  -99 -27 -62 -12  3.3
129/8  Various Registries  0.42 0.52 1.36 0.69  1309  -16 -22 -48 -11  3.3
142/8  Various Registries  0.53 0.53 1.31 0.77  675  -4 -12 -18 -3  3.3
164/8  Various Registries  0.21 0.93 1.31 0.58  496  -22 -13 -9  3.3
159/8  Various Registries  0.18 0.42 1.30 0.96  571  -30 -17 -15  3.3
134/8  Various Registries  0.52 0.58 1.27 0.82  756  -6 -10 -16 -2  3.3
136/8  Various Registries  0.53 0.32 1.25 1.08  279  -2 -15 -5  3.3
064/8  ARIN  0.94 0.57 1.20 0.87  2453  -31 -26 -3  3.3
138/8  Various Registries  0.27 0.59 1.16 1.15  372  -11 -4 -3  3.3
156/8  Various Registries  0.30 0.71 1.13 1.12  144  -4  3.3
128/8  Various Registries  0.86 0.65 1.11 1.04  1069  -8 -4  3.3

X/8  Description - Connect Refused  Ping Error TCP 10061  Total  pP pE pT pR  G
020/8  Computer Sciences Corporation  0.00 0.00 0.00 4.40  555  -99  4.4
219/8  APNIC-Pacific Rim  1.88 0.05 0.02 3.78  119  -37 -91 -38  4.4
218/8  APNIC-Pacific Rim  1.40 0.30 0.09 3.55  653  -37 -99 -99  4.4
018/8  MIT  0.67 0.50 0.20 3.50  37  -9 -8  4.4
172/8  Various Registries aol.com  1.41 0.55 0.03 3.50  1085  -2 -17 -99 -99  4.1
043/8  Japan Inet  1.40 0.09 0.25 3.33  67  -14 -13 -13  4.4
068/8  ARIN  1.66 0.15 0.32 3.03  503  -2 -66 -72 -75  4.1
081/8  RIPENCC-Europe  0.50 0.22 0.56 2.67  51  -4 -3 -6  4.4
163/8  Various Registries  0.42 0.86 0.51 2.45  342  -4 -23 -27  4.4
024/8  ARIN-Cable Block  2.00 0.21 0.55 2.35  1182  -11 -99 -64 -83  4.1
080/8  RIPENCC-Europe  1.15 0.41 0.62 2.28  594  -19 -22 -39  4.4
061/8  APNIC-Pacific Rim  1.35 0.54 0.61 2.17  1404  -2 -23 -57 -76  4.1
149/8  Various Registries  0.78 1.48 0.39 2.14  126  -14 -7  4.4
133/8  Various Registries  0.26 1.13 0.71 1.82  453  -15 -10 -14  4.4
161/8  Various Registries  0.50 1.05 0.71 1.81  262  -2 -6 -8  4.4
152/8  Various Registries  0.43 1.47 0.60 1.77  531  -6 -4 -22 -14  4.2
067/8  ARIN  0.67 2.16 0.36 1.74  515  -18 -63 -13  4.2
055/8  Boeing Computer Services .mil  0.00 0.00 1.20 1.57  3026  -34 -47  4.3
167/8  Various Registries  0.24 0.52 1.03 1.52  422  -15 -8 -6  4.4
143/8  Various Registries  0.28 0.57 1.06 1.41  412  -11 -5 -4  4.4
150/8  Various Registries  0.31 1.32 0.82 1.39  566  -13 -2 -5 -5  4.2
160/8  Various Registries  0.32 1.25 0.86 1.35  450  -10 -2 -3  4.4
141/8  Various Registries  0.63 1.29 0.80 1.34  383  -4 -2  4.4
065/8  ARIN  1.10 0.87 0.88 1.33  1394  -5 -8  4.4
066/8  ARIN  1.05 0.40 1.05 1.33  1815  -60 -10  4.4
130/8  Various Registries  0.49 0.73 1.01 1.33  1057  -9 -5 -6  4.4
146/8  Various Registries  0.43 0.94 1.06 1.08  581  -7  4.4

Column totals (random sample size): Ping 5619, Error 13706, TCP 43370, 10061 18362, Total 81057

Legend: Ping, ICMP echo replies. Error, ICMP echo request error reports by "volunteer" hosts. TCP, established connections mainly with ports 21, 22, 23, 25, 80, 113, 139 and 443. 10061, connection refused error reports. Each item is the ratio of the observed/expected frequencies. Expected = row total x column total / table total. For the difference of these ratios from expected, pP, pE, pT and pR respectively are p values expressed as a power of ten (e.g., -3 indicates p < .001). G, network group classification code (please see Tables 2 and 3).

The statistical significance of the differences of these ratios from chance (ratio = 1.0) is shown in the pP, pE, pT and pR columns respectively. Blank entries indicate lack of a significant difference and/or insufficient sample size for a statistical evaluation. Since random variation occurs in data collected using random sampling, the statistical significance helps evaluate whether a difference or phenomenon exists. Once we believe a phenomenon has been detected, then the magnitude of the effect can be considered. It is often other concerns, such as theory, which determine the importance or meaning of demonstrated differences. In many fields of research, small, but real, differences may be theoretically important and provide clues for further investigation.
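The power-of-ten notation for p values can be sketched as follows. The paper does not state which statistical test produced the p values, so the single-cell chi-square approximation below is an assumption used only for illustration. The -99 floor and the .01 cutoff for blank entries are likewise assumptions inferred from the range of values shown in Table 1, and both function names are hypothetical.

```python
import math

def cell_p_value(observed, expected):
    """Rough p value for one cell, treating (O - E)^2 / E as chi-square with 1 df.

    This is an assumed test, not necessarily the one used for Table 1.
    """
    chi2 = (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def power_of_ten(p, floor=-99, threshold=0.01):
    """Return k with p <= 10**k (e.g. -3 for p = .0005), or None for a blank entry."""
    if p >= threshold:
        return None   # not significant at the assumed cutoff
    if p <= 0.0:
        return floor  # p underflowed to zero; report the assumed floor
    return max(floor, math.ceil(math.log10(p)))
```

For example, 40 observed responses against 20 expected gives a chi-square of 20 under this approximation, which would be reported as -5.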
The following descriptions include the networks which showed statistically significant differences from expected probabilities of response (defined by the column totals).
(1) ICMP Echo Reply ratios above 1.70 were found in x/8 prefixes 213, 211, 217, 24, 208, 209, 210, 216. It may not be a coincidence that all of these, except 24, are thought to consist of x.y.z/24 networks. Most also showed significantly elevated rates of other response categories: ICMP Error reports (213, 208, 210), TCP connections (209, 216) and connection refused (211, 24).
(2) ICMP Error reports were "volunteered" as the most prominent response category by 35 x/8 address spaces. The leaders were 10, 154, 192, 38, 157, 4, 194, 169, 202, and 195, and showed more than 2.30 times the expected response rates.
(3) Established TCP connections were the favorite response mode for 22 of the IP address samples. The winners in the TCP competition, so to speak, were military (6) and various registries (131, 132, 153, 171, 137, 162, 147, 155, 140, 158 and 129, each above 133% of the expected rate).
(4) TCP connection refused error responses were most prominent in 30 of the x/8 address samples. The champions in this category are Computer Sciences Corporation (20), Pacific Rim (219, 218), MIT (18), aol.com (172), Japan Inet (43), ARIN (68) and Europe (81).
Using the group classification scheme in column G of Table 1, Table 2 summarizes the types of networks found and may help visualize the results.
Table 2: Network Classification by Response Pattern
Legend: The response codes in the G column of Table 1 were used to group the networks where code a.b indicates row a and column b. a is the response type with the highest ratio for the network and b is the second highest ratio if statistically significant (Table 1). Where only one response type showed an elevated ratio, the a.b code of a = b was arbitrarily assigned to show those networks in the diagonal cells (blue).
Table 2 emphasizes that most networks displayed only one elevated response category (the diagonal entries). The off-diagonal networks show the two highest elevated response categories. Off-diagonal transpose cells (e.g., 1.2 and 2.1) are indicated by the color coding and for simplicity will presently be considered as single groups. Of the ten possible groups (4 diagonal cells and 6 pairs of off-diagonal cells), Table 2 shows that nine are populated with one or more networks.
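One reading of the a.b coding rule in the legend can be sketched as follows. The sketch assumes that b is the second-highest ratio only when that ratio is both elevated (above 1.0) and statistically significant, and that a = b otherwise; the function names are hypothetical, and a few rows of Table 1 may follow additional judgment not captured here.

```python
def classify(ratios, significant):
    """Return the a.b group code for one network.

    ratios: observed/expected ratios in the order Ping, Error, TCP, 10061.
    significant: matching booleans, True where the ratio differs significantly from 1.0.
    """
    # Rank the response types from highest to lowest ratio (ties keep column order).
    order = sorted(range(len(ratios)), key=lambda i: ratios[i], reverse=True)
    a, second = order[0], order[1]
    # Assumed rule: b counts only if it is elevated and significant.
    b = second if ratios[second] > 1.0 and significant[second] else a
    return f"{a + 1}.{b + 1}"

def group_key(code):
    """Treat transposed codes such as 1.2 and 2.1 as a single group."""
    return tuple(sorted(code.split(".")))

all_sig = [True, True, True, True]
code_209 = classify([1.83, 0.74, 1.19, 0.48], all_sig)    # 209/8 ratios from Table 1
code_131 = classify([0.11, 0.14, 1.73, 0.20], all_sig)    # 131/8 ratios from Table 1
code_032 = classify([4.00, 1.25, 0.82, 0.60], [False] * 4)  # 032/8, no p values listed
```

With these inputs the sketch reproduces the G codes listed in Table 1 for the three rows: 1.3 for 209/8, 3.3 for 131/8 and 1.1 for 032/8.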
In summary, this simple scheme defines nine mutually exclusive groups of networks, which may be listed by number of network members with tentative group labels (Table 3):

Table 3: Network Types

Group  Codes    n   Description                  Group Label
1      2.2      22  ICMP error reporter          volunteer
2      4.4      22  TCP connection refused       refuser
3      3.3      22  Established TCP connections  acceptor
4      2.1/1.2  13  volunteer and echoer         ICMPer
5      2.4/4.2  8   volunteer and refuser        surfer?
6      4.1/1.4  5   refuser and echoer           surfer?
7      1.1      2   ICMP echo repliers           echoer
8      1.3      2   echoer and acceptor          welcomer
9      4.3      1   acceptor and refuser         TCPer
Classified....  97

The first three groups -- the volunteers, refusers and acceptors -- account for two thirds (66 of 97) of the networks. In contrast, the profiling scheme identified the unusual. Peculiar activity was exhibited by Boeing (military) systems (55), the ninth group. Highlighting these distinct characteristics, the 55/8 network was among the top six most active regarding internet responses (Table 1, row totals).
The most interactive x/8 IP address prefixes were 132 (n = 4347), 216 (n = 3799), 131 (n = 3292), 211 (n = 3232), 209 (n = 3042), 55 (n = 3026), 64 (n = 2453), 210 (n = 2049), 212 (n = 2044), 66 (n = 1815), 137 (n = 1683), 207 (n = 1655), 195 (n = 1603), 203 (n = 1578) and 208 (n = 1537).
Discussion
Previous papers by the author [1, 2] presented raw counts of numbers of computers interacting over the internet categorized by the type of interaction (ICMP, TCP, etc) and by x/8 address spaces. The objective was to begin to account for computers actually connected to the internet in the present period. These may be the only published reports based on random sampling of IP address space. If the reader knows of any references to prior random sampling work, they would be appreciated, studied and possibly cited.
This paper begins to process that raw data to present it in a more interpretable form, with the dual objectives of determining a baseline for several internet response parameters and of profiling address sub-spaces (networks) to characterize special features of their interaction with the internet.
A basic research guideline was applied -- simplify everything as much as possible.
First, both basic and commonly used ICMP and TCP protocols were used as response measures.
Second, in the absence of a published database, most of IPv4 address space was surveyed with random sampling so that population parameters could be estimated. The x/8 networks or address sub-spaces used in this paper are broad and not as fine-grained as x.y/16 or x.y.z/24 networks.
Third, the most elementary methods of statistical analysis were chosen. Featuring only four variables -- the response categories -- simple ratios and probabilities could be presented and a mutually exclusive set of groups could be defined for network profiling and categorization. In contrast, as more variables are added, more complicated analytical procedures such as cluster analysis and factor analysis might be marshaled. Cluster analysis identifies groups of networks by the distances between networks in an abstract vector space where the coordinates are the values of the internet response variables. Factor analysis accounts for variance and covariance of the network response variables by isolating independent dimensions in an abstract vector space. This might sound a bit complicated, but most readers may be familiar with personality tests where questionnaire items are used to create a series of scores on separate aspects of personality. Indeed, the personality and social psychologists were pioneers in this methodology.
Finally, this paper does not explicitly develop a scoring method where each case, a network, could be assigned a score for each of a set of profile variables, such as the groups defined above. In behavioral testing, say, in the measurement of intelligence, separate scores may be computed for types of intelligence, such as mathematical, verbal, spatial, musical, etc. The development of reliable and valid scoring methods in profiling requires fairly extensive effort, some trial and error and repeated random sampling to obtain the required data sets.
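The distance idea behind cluster analysis, mentioned above, can be illustrated with a minimal sketch. It treats each network's four ratios from Table 1 as coordinates and uses plain Euclidean distance, which is only one of several possible distance measures and is an assumption here; no clustering was performed in this paper.

```python
import math

# Ratios from Table 1 in the order Ping, Error, TCP, 10061.
profiles = {
    "131/8": (0.11, 0.14, 1.73, 0.20),
    "132/8": (0.04, 0.06, 1.68, 0.39),
    "010/8": (0.00, 5.86, 0.01, 0.00),
}

def distance(a, b):
    """Euclidean distance between two networks' response profiles."""
    return math.dist(profiles[a], profiles[b])

near = distance("131/8", "132/8")  # two networks from the 3.3 (acceptor) group
far = distance("131/8", "010/8")   # an acceptor versus a 2.2 (volunteer) network
```

The two acceptor networks lie much closer to each other than either does to the volunteer network, which is the kind of structure a clustering procedure would try to recover.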
Given the foregoing simplifications and perhaps limitations in this study, the degree to which the stated objectives were achieved might be a bit unexpected. Although different x/8 address spaces are known to have assignments to entities which obviously use the resource differently, there was a large number of individual responses found to be markedly different than expected both statistically and quantitatively (Table 1). Further, the patterns of these deviations from the average were clearly not random as shown by Tables 2 and 3. Indeed, these patterns could be used to define a preliminary network profiling scheme. The resulting network groups describe both the most common patterns of internet behavior as well as some unique networks like Boeing (military) 55/8.
As profiling methods develop in a particular application, composite variables like the groups defined above interface with the conceptualization of the material. This phase is illustrated by the Group Labels listed in Table 3 and usually involves a concept or some interpretation of the item. As of this writing, the naming of the groups relied on operational definitions (direct reference to the data), some interpretation and some uncertainty.
Many of our readers are systems administrators and security personnel and the author might ask, "In which group is your network?"
References
[1] Doctor Electron, "Computers Connected to IPv4 Address Space", June, 2002.
[2] Doctor Electron, "A TCP Ping Reveals Hosts by Connection Refused Error", August, 2002.
[IANA] Internet Assigned Numbers Authority, "Internet Protocol v4 Address Space", December, 2001.
[RFC 1519] Fuller, V. et al. "Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy", September, 1993.
[RFC 1700] Reynolds, J. K., and J. Postel, "ASSIGNED NUMBERS", October, 1994.
Copyright © 2002 Global Services
Original publication: August 13, 2002
The reader is welcome to contact Global Services for more specific data from our present databases or further data collection regarding specific IP address prefixes.