An article recently appeared on The Register titled ‘Curious tale of broken VPNs, the Year 2038, and certs that expired 100 years ago’. It looks at a problem with Public Key Infrastructure (PKI) certificates. At first the article appears to be about Virtual Private Network (VPN) failures exposing underlying certificate problems. However, it becomes an intriguing detective story about debugging a chain of problems that in the end stem from an old favourite protocol we’ve discussed several times on the blog: Network Time Protocol (NTP).
I haven’t spoiled the article entirely by stating here that the root cause is NTP, because it’s not as simple as it seems. The article speaks to deployment procedures, post-deployment usage, the source of upgrades, and the interaction of complex systems relying on accurate time when the time system faces anomalies.
The reference to the year 2038 flags an old and well-understood UNIX time problem that, like the Y2K problem, has by now (hopefully!) been fixed in current systems, although the risk persists in systems that are infrequently or never upgraded.
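By its very nature, NTP transmits the current time as a numerical value, much as UNIX system calls have traditionally represented ‘time’ as a signed 32-bit count of seconds from the reference point of 00:00 UTC on 1 January 1970. That counter reaches its maximum on 19 January 2038, which is the Year 2038 problem. (NTP’s own 32-bit seconds field counts from 1 January 1900, so it faces a similar rollover.)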
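The 32-bit limit behind the Year 2038 problem can be illustrated with a short Python sketch. It assumes a system that stores UNIX time as a signed 32-bit integer; the helper names `wrap_int32` and `to_utc` are made up for this illustration and are not taken from the article or from any NTP implementation.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def wrap_int32(seconds: int) -> int:
    """Reinterpret the low 32 bits of `seconds` as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the UNIX epoch to a UTC datetime."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


# The last second a signed 32-bit counter can represent.
print(to_utc(INT32_MAX).isoformat())  # 2038-01-19T03:14:07+00:00

# One second later the counter wraps to the most negative value...
wrapped = wrap_int32(INT32_MAX + 1)
print(wrapped)  # -2147483648

# ...which a naive system interprets as a date in 1901.
print(to_utc(wrapped).isoformat())  # 1901-12-13T20:45:52+00:00
```

A clock that jumps this far into the past would make any certificate look ‘not yet valid’, while a clock that jumps far into the future would make it look long expired, which hints at why time anomalies and certificate checks interact so badly.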
While both NTP and public-private key certification systems have been coded to avoid this problem, the intersection of time, sources of time, and systems that depend on time invites bugs to come out of the woodwork. The article discusses one such case, its ramifications, and how it was found.
Hats off to the original author, Bob Zim, who is the detective in question and posted this story to the Mastodon infosec community, where The Register picked it up. This is an amazing example of how to track down an operational problem and work through all the systems to the root causes.
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.
Um no. It’s ALWAYS DNS. (Unfortunately. I would love to solve an issue that’s not DNS)
I agree with Joshua. Most issues come from DNS and so-called IT professionals who do not follow best practices.
NTP is a problem for me when it is not configured, the time skews by more than 5 minutes, and all sorts of authentication issues follow.
TCP space is an issue. I’m looking at the OSI model like the Constitution in the States. It is not immutable; it needs a relook. NTP is a constant requirement that seems to get lost in translation. Maybe embed the pulse some other way in the model.
NTP has quite a few issues, particularly accuracy and synchronisation. Check out the talk provided on this by Darryl Veitch for the Internet Association of Australia: https://internet.asn.au/events/online_convergent_event_2/