Here’s a question for network or application folk tasked with protecting user and/or customer privacy — is an IP address Personally Identifiable Information (PII)? While there are many kinds of PII in networks (see the last part of this series for more details), the IP address is just about the most common.
Your IP is PII
PII is commonly defined as anything that can be used to identify an individual user. Does an IP address identify an individual user?
Over the course of a day, an individual user’s device might connect to the network from several locations. Perhaps they’re connected to Wi-Fi in the office at 10:00 (UTC), then to the Wi-Fi outside the cafeteria at 12:15 (UTC), and then through the Internet (across a VPN, for instance) at 22:00 (UTC). Because the user’s IP address can change depending on how and where they are connected to the network, it might seem that the IP address cannot be used to identify an individual user.
But the definition of PII doesn’t stop at information that can, on its own, identify an individual. It also includes any information that can be combined with other information to identify an individual. The ‘combined with’ clause makes things more complicated.
So long as an IP address can be combined with some other information to identify a user, it’s still PII. What other things? Things like which device was connected to the network at a specific time and place and who was using that device.
So long as the IP address can be tied to a device at a given time, and a user can be tied to that same device at that same time, the IP address can form at least part of the information used to identify an individual user’s activity.
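To make the ‘combined with’ step concrete, here is a minimal Python sketch. It assumes two hypothetical logs: a DHCP lease log (which device held which address, and when) and an asset inventory (which user had which device, and when). The record types, the `identify_user` function, and all addresses, MAC addresses, and user names are illustrative inventions, not any real product’s schema. Neither log maps an address to a user on its own; joined on device and time, they do.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical DHCP lease log entry: which device held which IP, and when."""
    ip: str
    mac: str
    start: datetime  # lease start (UTC), inclusive
    end: datetime    # lease end (UTC), exclusive


@dataclass(frozen=True)
class Assignment:
    """Hypothetical asset-inventory entry: which user had which device, and when."""
    mac: str
    user: str
    start: datetime
    end: datetime


def identify_user(ip: str, when: datetime,
                  leases: List[Lease],
                  assignments: List[Assignment]) -> Optional[str]:
    """Combine two logs to tie an IP address at one moment to one user."""
    # Step 1: IP address + time -> device, using the lease log.
    mac = next((lease.mac for lease in leases
                if lease.ip == ip and lease.start <= when < lease.end), None)
    if mac is None:
        return None
    # Step 2: the same device at the same time -> user, using the inventory.
    return next((asg.user for asg in assignments
                 if asg.mac == mac and asg.start <= when < asg.end), None)


# Illustrative data only (all times UTC): 10.1.1.20 is reused by two devices.
LAPTOP_A, LAPTOP_B = "02:00:00:00:00:01", "02:00:00:00:00:02"
leases = [
    Lease("10.1.1.20", LAPTOP_A, datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 12)),
    Lease("10.2.7.45", LAPTOP_A, datetime(2024, 5, 1, 12), datetime(2024, 5, 1, 18)),
    Lease("10.1.1.20", LAPTOP_B, datetime(2024, 5, 1, 12), datetime(2024, 5, 1, 18)),
]
assignments = [
    Assignment(LAPTOP_A, "alice", datetime(2024, 5, 1), datetime(2024, 5, 2)),
    Assignment(LAPTOP_B, "bob", datetime(2024, 5, 1), datetime(2024, 5, 2)),
]

print(identify_user("10.1.1.20", datetime(2024, 5, 1, 10, 0), leases, assignments))   # alice
print(identify_user("10.2.7.45", datetime(2024, 5, 1, 12, 15), leases, assignments))  # alice
print(identify_user("10.1.1.20", datetime(2024, 5, 1, 13, 0), leases, assignments))   # bob
```

The same address, 10.1.1.20, resolves to different users at different times, which is why the time of each observation matters as much as the address itself.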
All of which means the IP address is, in fact, protected information.
What about Network and Port Address Translation (NAT and PAT)? Don’t these remove the IP address from the realm of PII by obfuscating the connection between the IP address and a particular user?
No. Again, we can use the NAT or PAT translation logs to undo that obfuscation for any given moment (unless you’re not keeping those logs). Many privacy regulations explicitly call out IP addresses as one form of protected PII.
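Reversing PAT is essentially a lookup in the translation log. The sketch below assumes a hypothetical log format (the `Translation` record and the `reverse_pat` function are illustrative, not any vendor’s NAT logging schema) and uses documentation address ranges. Given the public address, port, and time observed outside the translator, it recovers the inside address and port, which can then be combined with other logs as described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Translation:
    """Hypothetical PAT log entry: one inside socket mapped to one outside socket."""
    inside_ip: str
    inside_port: int
    outside_ip: str
    outside_port: int
    start: datetime  # mapping created (UTC), inclusive
    end: datetime    # mapping expired (UTC), exclusive


def reverse_pat(outside_ip: str, outside_port: int, when: datetime,
                log: List[Translation]) -> Optional[Tuple[str, int]]:
    """Undo PAT for one observed (address, port, time) using the translation log."""
    for t in log:
        if (t.outside_ip == outside_ip and t.outside_port == outside_port
                and t.start <= when < t.end):
            return (t.inside_ip, t.inside_port)
    return None  # no log entry covers that moment: the mapping cannot be reversed


# Illustrative data only: two inside hosts share the public address 203.0.113.5.
log = [
    Translation("192.168.1.10", 51000, "203.0.113.5", 40001,
                datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 5)),
    Translation("192.168.1.11", 51000, "203.0.113.5", 40002,
                datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 5)),
    # Outside port 40001 is later reused for a different inside host.
    Translation("192.168.1.11", 52000, "203.0.113.5", 40001,
                datetime(2024, 5, 1, 10, 5), datetime(2024, 5, 1, 10, 10)),
]

print(reverse_pat("203.0.113.5", 40001, datetime(2024, 5, 1, 10, 2), log))
print(reverse_pat("203.0.113.5", 40001, datetime(2024, 5, 1, 10, 7), log))
```

In this model, the function returns `None` only when no log entry covers the observed moment, that is, when the translation log was never kept or has been deleted.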
Privacy and forwarding
If you’re a network engineer, you might be feeling a bit of a rising panic right about now. IP addresses, the basic ‘stuff’ of networking, are protected PII? How can the network even work if IP addresses cannot be advertised and used?
Many in the Internet and protocol communities argue that IP addresses shouldn’t be considered PII because they are ubiquitous, and the Internet can’t operate without them. This line of argument won’t get us anywhere, though. Birthdays are ubiquitous (everyone has one) and widely used to differentiate between individuals with the same name (or to determine a person’s age). Yet it’s hard to argue that birthdays shouldn’t be PII; they are widely used precisely because they are one piece of information that identifies a person.
It’s hard to argue that IP addresses should not be PII when they are used because of their identifying properties.
Still … don’t panic — deep breaths. Protected PII can still be widely used (like a birthday) so long as it is used properly. Which leads to this question — what does proper use mean?
To determine whether a particular use of data is proper, consider the privacy dimensions of the data. While there are many official lists of these dimensions, for this series, we’ll use:
Identifiability means the degree to which data identifies an individual user. We’ve already determined IP addresses are protected PII, so we know they should be considered identifiable.
Centricity means the extent to which the information remains local to a processing client. Is the information gathered on one device, stored on another, and combined with other information to be processed on a third? Then the data has low centricity. As more devices ‘touch’ the data, it becomes less centric.
IP addresses are part of IP packets that originate on one computer, are carried through a network, are touched by many different network devices, and consumed on another computer. IP addresses have low network centricity.
Accessibility means how easily authorized and unauthorized users can access the data. IP addresses are more difficult to tie to an individual user than they are to discover; I would rate the accessibility of IP addresses as moderate.
Integrity is how well the system maintains a reliable state; this doesn’t apply to our IP address evaluation.
Purpose, or usage, considers why the information is collected. Is the collected information being used for its primary purpose, or has the data been repurposed to discover something else about the person it can be used to identify?
When IP addresses are used to forward packets, they serve only to carry information from one device to another.
This last property clears the air for IP addresses used to forward traffic. It doesn’t matter that IP addresses have high identifiability, low centricity, and moderate accessibility: they must be unique enough to serve their purpose, and network devices use them for only one purpose, forwarding traffic.
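As a rough illustration of that single purpose, the following sketch models plain destination-based forwarding with Python’s standard `ipaddress` module. The forwarding table, interface names, and `Packet` type are hypothetical, and the model deliberately leaves out features such as access lists or policy routing. The lookup is a longest-prefix match on the destination address alone, so changing the source address or payload does not change the result.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: bytes


# Hypothetical forwarding table: prefix -> outgoing interface.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "uplink",
    ipaddress.ip_network("198.51.100.0/24"): "eth1",
    ipaddress.ip_network("198.51.100.128/25"): "eth2",
}


def next_interface(packet: Packet) -> Optional[str]:
    """Choose the outgoing interface by longest-prefix match on the destination."""
    dst = ipaddress.ip_address(packet.dst)
    # Only the destination address is consulted; packet.src and packet.payload
    # play no part in the forwarding decision.
    matches = [net for net in FIB if dst in net]
    if not matches:
        return None  # no route: the packet would be dropped
    return FIB[max(matches, key=lambda net: net.prefixlen)]


print(next_interface(Packet("192.0.2.1", "198.51.100.200", b"hello")))   # eth2
print(next_interface(Packet("192.0.2.99", "198.51.100.200", b"other")))  # eth2
print(next_interface(Packet("192.0.2.1", "203.0.113.9", b"hello")))      # uplink
```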
However, this doesn’t let network operations off the privacy hook; we’ll continue this conversation in the next instalment.
Russ White is a Network Architect at LinkedIn.
This post is adapted from a series at Packet Pushers.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.