In the last post on this topic, I concluded that IP addresses are protected information (PII or Personally Identifiable Information). Operators should handle users’ IP addresses according to privacy best practices. But I also concluded that because IP addresses used for forwarding are collected (or carried through the network) only for forwarding, the user cannot reasonably expect the network to forward traffic without collecting and using this information.
But networks don’t just use the IP addresses attached to packets for forwarding. Logging systems also collect these addresses, along with a lot of other information, to monitor network operations and aid in troubleshooting network failures.
What about this logged data? Let’s look through the data privacy dimensions:
- Identifiability
- Centricity
- Accessibility
- Integrity
- Purpose
Identifiability means the degree to which data identifies an individual user. We’ve already determined IP addresses are protected PII, so we know they should be considered identifiable. Logs contain a lot of other information that might be private. Let’s consider a few.
DNS queries contain information about what users are accessing and often contain information (including the IP address!) that can identify each user individually. DNS queries would probably be considered protected information in most jurisdictions.
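One common way to lower the identifiability of a logged address is to zero its host bits before the record is stored. The sketch below is only an illustration: the record layout, the `client_ip` field name, and the /24 and /48 prefix lengths are assumptions, not anything prescribed by a particular DNS server or privacy law. Truncation reduces identifiability but does not remove it, and the query name itself may still point to an individual.

```python
import ipaddress


def truncate_ip(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits so the address names a network rather than a host."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def scrub_dns_record(record: dict) -> dict:
    """Return a copy of a (hypothetical) DNS log record with a truncated client address."""
    cleaned = dict(record)
    cleaned["client_ip"] = truncate_ip(record["client_ip"])
    return cleaned


if __name__ == "__main__":
    # Example record using documentation address space (RFC 5737).
    entry = {"client_ip": "192.0.2.57", "qname": "example.com", "qtype": "A"}
    print(scrub_dns_record(entry))
```

Whether a truncated address still counts as protected information depends on the jurisdiction and on what other fields sit next to it in the log.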
Information about when, where, and how users access the network inevitably contains the user’s location at specific times of the day. Location information can be especially important if users can access the network from their own devices or use company devices from remote locations. That Susan accesses the corporate network from a corner coffee shop every Tuesday morning is protected information.
If you’re capturing packets, those packets are bound to contain usernames, passwords, and other private information (potentially a serious security risk).
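If captured payloads must be kept at all, one mitigation is to mask obvious credentials before the text reaches storage. The following is a minimal sketch under stated assumptions: the parameter names in the pattern (`user`, `password`, `token`, and so on) are hypothetical examples, and a real capture would need patterns tuned to the protocols actually in use. Pattern matching like this will miss credentials it was not written for, so it reduces risk rather than removing it.

```python
import re

# Hypothetical key names; extend this list to match the traffic you actually see.
_SENSITIVE_PARAM = re.compile(
    r"(?i)\b(user(?:name)?|login|pass(?:word|wd)?|token)=([^&\s]+)"
)
# HTTP Authorization header: keep the scheme, mask the credential itself.
_AUTH_HEADER = re.compile(r"(?im)^(authorization:[ \t]*\S+[ \t]+)(\S+)")


def redact(text: str) -> str:
    """Mask values of credential-like parameters and Authorization headers."""
    text = _SENSITIVE_PARAM.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return _AUTH_HEADER.sub(r"\1[REDACTED]", text)


if __name__ == "__main__":
    print(redact("POST /login user=susan&password=hunter2 HTTP/1.1"))
    print(redact("Authorization: Basic dXNlcjpwYXNz"))
```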
Centricity means the extent to which the information remains local to a processing client. While the data carried through the network has low centricity, logged data is written to one of a few hosts and generally stays on those devices for processing. The centricity of logging data depends on how your logging system stores and processes data.
Accessibility means how easily authorized — and unauthorized — users can access the data. The accessibility of log data depends on how you have configured user access — and how strong the user access controls are for your particular logging system.
Integrity is how well the system maintains a reliable state. Logging systems must have high integrity to be useful — you can’t understand or predict the state of a network if you don’t have accurate data to work from.
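One way to make tampering with stored log entries detectable is to chain each entry to the previous one with a hash. This is a generic technique, not something the discussion above prescribes, and the sketch below is only an illustration: the record layout and function names are assumptions, and it detects changes to stored entries without preventing them or checking that the data was accurate when it was collected.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, entry: dict) -> str:
    """Hash the previous digest together with a canonical form of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(chain: list, entry: dict) -> None:
    """Append an entry whose hash covers everything logged before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"entry": entry, "hash": _digest(prev, entry)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for record in chain:
        if record["hash"] != _digest(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"event": "link_down", "interface": "eth0"})
    append_entry(log, {"event": "link_up", "interface": "eth0"})
    print(verify_chain(log))
```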
Purpose, or usage, considers why the information is collected. Logged information used to monitor network performance and troubleshoot failures is being used for its intended purpose.
Sometimes, though, information is captured incidentally, like usernames and passwords. This information is not used to monitor the network’s state or to troubleshoot failures; it just happens to be ‘what’s in the captured packet’.
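One way to keep incidental data out of the logs is to decide in advance which fields each purpose needs and drop everything else before a record is stored. The sketch below assumes a flat dictionary per log record; the purpose names and field names are hypothetical and would differ for any real logging system.

```python
# Hypothetical purposes and the fields each one is assumed to need.
ALLOWED_FIELDS = {
    "performance": {"timestamp", "interface", "bytes", "latency_ms"},
    "troubleshooting": {"timestamp", "interface", "src_prefix", "dst_prefix", "error"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields allowlisted for the stated purpose."""
    allowed = ALLOWED_FIELDS[purpose]  # an unknown purpose raises KeyError by design
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    raw = {
        "timestamp": "2024-01-01T09:00:00Z",
        "interface": "eth0",
        "bytes": 1500,
        "latency_ms": 12,
        "payload": "user=susan&password=hunter2",  # incidental capture
    }
    print(minimize(raw, "performance"))
```

An allowlist fails closed: a field nobody thought about is dropped by default, whereas a blocklist would silently keep it.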
Some systems intend to use logged information for purposes other than network maintenance, such as determining whether a user is ‘doing their job’ or is likely to leave the company. These systems tend to focus on application-level data rather than network data, although network logging can play a role. ‘At work’ monitoring systems are also somewhat controversial.
Network operators should be concerned about managing information captured during routine logging; existing (and likely forthcoming) privacy laws protect this information. In my next post, we’ll start looking at the lifecycle of data from a privacy perspective.
Russ White is a Senior Network Architect at Akamai Technologies.
This post is adapted from a series at Packet Pushers.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.