So far in this series, I’ve concluded that IP addresses and other information network operators handle are personally identifiable information (PII) and are covered under privacy and security regulations. I’ve also looked at the data lifecycle and user rights related to private data. What are some best practices network operators can follow to reduce their risk? The simplest way to think about best practices is to think about user rights and risks at each stage of the data lifecycle.
When collecting data, the most important right is the user’s right to know. Do users know what data is being collected, or what can be inferred from that data? Do users know how the data might be used, especially if combined with other information?
Informed consent is one of the most significant gaps in the collection and use of data in network operations.
When commercial operators do notify users about the data they will be collecting, they bury that information in pages-long disclaimers and fair-use documents that broadly discuss other details. These lengthy, complex disclosure statements are often (and unfortunately) intentional.
Noncommercial (‘enterprise’) operators often bury disclosures about information collected in mountains of information users must sign to use the organization’s facilities. These notifications often say only that data is collected, not how the provider collects, uses, or destroys it.
To reduce risk, every network operator, whether commercial or not, should have a clear, concise, and well-organized statement about what data is collected, what can be inferred from network usage, how the data is stored, what the data is used for, and when the data is destroyed. Network engineers can help their legal departments by reading the available notifications and making suggestions to include specific information.
The legal folks might be the experts on how to word a privacy statement, but the network engineer is the expert on data collected through the network.
We often think of disclosure as internal: data is pulled from logs to troubleshoot problems or better understand network usage. Sending data to third parties is also disclosure.
Third-party access is particularly important in the current cloud-based era. Every vendor now offers some cloud-based service to manage network assets — Mist (Juniper), ACI (Cisco), and CloudVision (Arista) are some examples.
Because network operators are responsible for how these services handle private data collected about users, network engineers should ask the provider of any cloud-based service they use the same hard questions about how data is collected, stored, used, and destroyed that they would ask about locally stored data.
Data breaches are also a form of disclosure — unintentional rather than intentional. My next post on privacy will cover data breaches in more detail.
You must retain data collected from the network to use it. How is stored data protected? How long is data stored? How can users gain access to this data?
Is data containing personal information encrypted when in motion and at rest? Encryption should be a baseline protection for all data. There’s no longer any reason for data to be transported or stored in the clear.
Is access to PII carefully controlled? Who has access to this data, and under what circumstances? Zero-trust systems are, of course, preferred over simpler group membership systems. Still, there should always be some control over who has access to PII collected through network operations.
Is there a clear plan to age data out? How long is data collected from the network useful, and what happens to it after this time has passed?
There is more to timing data out than simply destroying it — you can de-identify data to reduce risk (and data handling responsibilities). Do you need host-level information forever, or would subnet-level information suffice to understand traffic patterns? Why not write a script that zeroes out the host address in network logs after a few days (or weeks)?
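A script like that could be sketched as follows. This is a minimal illustration rather than a production tool. It assumes a hypothetical log format of whitespace-separated fields with an ISO 8601 timestamp first, and the prefix lengths and seven-day window are placeholder values, not recommendations.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

# Assumed values for illustration; tune them to your own addressing plan and policy.
V4_PREFIX = 24                # keep the /24, zero the host octet
V6_PREFIX = 64                # keep the /64, zero the interface identifier
MAX_AGE = timedelta(days=7)   # how long host-level detail is kept


def mask_host(field: str) -> str:
    """Zero the host bits of an IP address; return any other field unchanged."""
    try:
        ip = ipaddress.ip_address(field)
    except ValueError:
        return field
    prefix = V4_PREFIX if ip.version == 4 else V6_PREFIX
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def deidentify_line(line: str, now: datetime) -> str:
    """Mask address fields in one log line if the line is older than MAX_AGE."""
    line = line.rstrip("\n")
    fields = line.split()
    if not fields:
        return line
    try:
        stamp = datetime.fromisoformat(fields[0])
    except ValueError:
        return line  # unrecognized timestamp: leave the line for manual review
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # assumption: naive stamps are UTC
    if now - stamp <= MAX_AGE:
        return line
    return " ".join([fields[0]] + [mask_host(f) for f in fields[1:]])
```

Run against each line of an aging log file (with `now` set to something like `datetime.now(timezone.utc)`), this keeps subnet-level traffic patterns while dropping the part of the address that points at a single host.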
Do you really need to store packet contents for troubleshooting applications after a short time, or would de-identified headers still provide enough information to track application performance over time? Can you convert location data to per-application rather than per-user to remove the individual user identities?
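Converting per-user records to per-application records can be as simple as summing over a key that omits the user. The sketch below assumes a hypothetical `Session` record with user, location, application, and byte-count fields; none of these names come from a real logging system.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Session(NamedTuple):
    """Hypothetical per-user record, as a collector might export it."""
    user: str
    location: str      # for example, an access point or building name
    application: str
    bytes_sent: int


def per_application(sessions: Iterable[Session]) -> Dict[Tuple[str, str], int]:
    """Collapse per-user sessions into byte totals per (location, application)."""
    totals: Counter = Counter()
    for session in sessions:
        # The user field is deliberately left out of the aggregation key.
        totals[(session.location, session.application)] += session.bytes_sent
    return dict(totals)
```

The output still shows which applications are used where, but no longer says who used them. One caveat: a total built from a single user's sessions can still point back to that user, so very small groups may need to be merged or suppressed.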
Network data can be de-identified in many ways while still retaining its usefulness — we just don’t think about these techniques due to competing pressures. Reducing risk through de-identification techniques is helpful, especially in the case of a breach.
Finally — what about user access to the data you are holding about them in network logging systems? Is there a defined way for users to gain access to this information? Could you even search for and find information about a specific user if they (or some legal entity) asked for it?
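One way to test whether you could answer such a request is to try it. The sketch below assumes the same kind of hypothetical whitespace-separated log lines and a set of identifiers (username, IP address, MAC address) already known to belong to the user. In a real network, addresses are reassigned over time, so the identifiers would likely need to be tied to a time window using lease or authentication records.

```python
from typing import Iterable, Iterator


def find_user_records(lines: Iterable[str], identifiers: Iterable[str]) -> Iterator[str]:
    """Yield every log line in which some field exactly matches a known identifier."""
    wanted = {identifier.lower() for identifier in identifiers}
    for line in lines:
        fields = {field.lower() for field in line.split()}
        if fields & wanted:
            yield line.rstrip("\n")
```

If a search like this is slow, incomplete, or impossible across your logging systems, that is worth knowing before a user or a court asks.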
Another incentive to de-identify data over time is to reduce the information you need to find and return when asked about a specific user. You don’t need to spend time searching through data that no longer identifies a single user.
You need to destroy data once you’re done using it. Don’t think, ‘the data will be overwritten soon enough’, or ‘erasing these log files off my local hard drive is enough’. Every IT professional knows there are ways to recover data that’s been erased. Encrypting data at rest can be helpful here, because destroying the key leaves any recovered copies unreadable.
It’s tempting to ignore user privacy issues; we’re all busy, and network engineers are not privacy experts. However, a few commonsense methods used to protect user privacy can significantly reduce risk.
Russ White is a Senior Network Architect at Akamai Technologies.
This post is adapted from a series at Packet Pushers.