The client-server computing model has exploded into a dauntingly complex architecture, now involving distributed processing at network edges and intermediate nodes sprinkled on the traffic path. Web caches that temporarily store and rapidly serve frequently accessed objects are a critical component in this ecosystem.
However, caches also make lucrative targets for wrongdoers.
My research team at Northeastern University, University of Trento, and Akamai previously conducted the first systematic investigation into Web Cache Deception (WCD) attacks that trick caches into erroneously storing sensitive content. Readers may remember my musings on our paper ‘Cached and Confused’ here on the APNIC Blog. Now, we scale things up in our follow-up ‘Web Cache Deception Escalates!’.
Let’s summarize how WCD plays out with the example in Figure 1. The attacker starts with a legitimate URL for a sensitive profile page but appends an invalid path component disguised as a static file, in this case a style sheet. Web application frameworks often remove the invalid component and reroute the request to the profile endpoint instead. The cache fronting the application has no visibility into this path rewriting. Here, application operators have also configured the cache to disregard upstream cache-control headers because they manage the caching rules centrally on the cache server. As a result, the cache misinterprets the response as a cacheable style sheet. Victims who visit this link will have their profile details cached and publicly exposed.
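The interaction described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of any real framework or cache product: the `/account` endpoint, the `origin_app` function, the `NaiveCache` class, and the list of static extensions are all hypothetical names invented for this example.

```python
# Hypothetical cache rule: anything whose URL ends in one of these is "static".
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg")


def origin_app(path, session_user=None):
    """Toy origin that ignores an unknown trailing path segment after /account."""
    if path == "/account" or path.startswith("/account/"):
        if session_user is None:
            # Unauthenticated visitors are sent to a login page.
            return {"status": 302, "headers": {"Location": "/login"}, "body": ""}
        return {
            "status": 200,
            "headers": {"Cache-Control": "no-store", "Content-Type": "text/html"},
            "body": f"<h1>Profile of {session_user}</h1>",
        }
    return {"status": 404, "headers": {}, "body": "not found"}


class NaiveCache:
    """Toy cache that decides by URL extension and disregards Cache-Control."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path, session_user=None):
        if path in self.store:
            # Cache hit: the origin, and its authentication check, is never consulted.
            return self.store[path]
        response = self.origin(path, session_user)
        if response["status"] == 200 and path.endswith(STATIC_EXTENSIONS):
            self.store[path] = response
        return response


cache = NaiveCache(origin_app)
# The victim follows the attacker's link while logged in.
cache.get("/account/nonexistent.css", session_user="alice")
# The attacker requests the same URL without any session.
leaked = cache.get("/account/nonexistent.css")
print(leaked["body"])
```

In this sketch, the second request is answered from the cache with the victim's profile page, even though the origin marked the response `no-store` and would have redirected an unauthenticated visitor.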
Web cache deception escalates
In 2020, we probed 340 websites for WCD and found 37 that were vulnerable. Our methodology was straightforward: Create accounts on sites, plant markers in sensitive fields, and crawl each site with WCD payloads. A successful attack causes a marked page to get cached. A second crawler then accesses the same pages without authentication, checking for markers in the responses.
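As a rough sketch of the marker-based approach, the two helpers below build a WCD payload URL in the style of the Figure 1 example and check unauthenticated responses for planted markers. The function names, the random suffix, and the marker strings are assumptions made for illustration; the payload variations used in the actual study are described in the paper.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit


def make_wcd_payload(url, extension=".css"):
    """Append a random, non-existent, static-looking path segment to a URL."""
    parts = urlsplit(url)
    suffix = secrets.token_hex(4) + extension
    path = parts.path.rstrip("/") + "/" + suffix
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def leaked_markers(markers, unauthenticated_bodies):
    """Map each URL to the planted markers found in its unauthenticated response."""
    findings = {}
    for url, body in unauthenticated_bodies.items():
        hits = [marker for marker in markers if marker in body]
        if hits:
            findings[url] = hits
    return findings


# Example: one marker planted in a profile field, two responses fetched
# by the second (unauthenticated) crawler.
markers = ["MARKER-7f3a"]
responses = {
    "https://example.com/account/ab12cd34.css": "<p>Name: MARKER-7f3a</p>",
    "https://example.com/about/ef56ab78.css": "<p>About us</p>",
}
print(leaked_markers(markers, responses))
```

Any URL reported by `leaked_markers` would indicate that a marked, authenticated page ended up in the cache and was served to a visitor without credentials.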
That was an educational experiment, but with major limitations. Setting up markers was an arduous prerequisite, which also meant we could not test sites that had no avenues for entering or displaying user-supplied input. Overall, our findings were narrowly scoped regarding personal information disclosure — a limitation shared by all prior WCD research.
Our new work addresses these limitations. We propose a methodology that no longer relies on markers or requires authenticating to sites. Instead, we use two heuristics: Page identicality checks to determine whether a page contains dynamic content, and response header checks to determine if a request is served by a cache. When both are triggered, a potentially sensitive dynamic page is served from a cache meant for static objects and we have a WCD vulnerability. The checks are simple; however, combining them into a robust methodology carries more nuance than one might appreciate while sipping their morning coffee.
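The two heuristics can be sketched roughly as follows. This is a deliberately naive illustration and not the methodology from the paper: the function names are hypothetical, the byte-for-byte comparison stands in for the identicality check, and the header list reflects only a few common conventions (`X-Cache`, `CF-Cache-Status`, and the standard `Age` header) that vary by cache vendor.

```python
# Assumed header conventions that commonly signal a cache hit; vendors differ.
CACHE_HIT_HEADERS = ("x-cache", "cf-cache-status")


def looks_dynamic(first_body, second_body):
    """Two fetches of the same URL that differ suggest per-request content."""
    return first_body != second_body


def served_from_cache(headers):
    """Guess from response headers whether a cache answered the request."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    if any("hit" in lowered.get(name, "") for name in CACHE_HIT_HEADERS):
        return True
    age = lowered.get("age", "")
    return age.isdigit() and int(age) > 0


def potential_wcd(first_body, second_body, payload_refetch_headers):
    """Flag a page that looks dynamic and whose payload URL is served from cache."""
    return looks_dynamic(first_body, second_body) and served_from_cache(
        payload_refetch_headers
    )


# Example: the page embeds a per-visit token, and the payload URL comes back as a hit.
print(potential_wcd(b"csrf=abc", b"csrf=xyz", {"X-Cache": "HIT from edge"}))
```

A real implementation has to cope with noise that this sketch ignores, such as timestamps or rotating ads that make static pages look dynamic, and caches that emit no identifying headers at all.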
Equipped with this powerful tool, we performed the largest WCD experiment to date, searching for vulnerabilities among 10,000 sites and finding 1,118 that were impacted. This number is concerning, but I would especially like to emphasize three key insights.
1. WCD impacts unauthenticated pages
Our methodology tests public pages; we do not log into accounts. This is a conscious choice, expanding the research scope beyond personal information disclosure on pages behind authentication gates. Our findings show that public pages still contain sensitive secrets such as Cross-Site Request Forgery (CSRF) tokens, OAuth state parameters, and Content Security Policy (CSP) nonces that enable further attacks if leaked. For instance, we were able to hijack chat sessions on a travel reservation platform by reusing a session token leaked via WCD.
2. WCD leads to cache poisoning
WCD forces sensitive data into a cache. However, the mechanism closely resembles cache poisoning, which instead pollutes the cache with an exploit payload. We found that sites with WCD vulnerabilities but no immediately harmful data to leak may still be exposed to damaging attacks via poisoning. For example, the homepage of a payment processor contained nothing to leak but was impacted by a reflected Cross-Site Scripting (XSS) vulnerability. We repurposed a WCD vector to poison the cache with an XSS payload, escalating the attack to a stored XSS.
3. WCD is a supply chain vulnerability
When third-party services integrated with a site use caches, their cache issues can become supply chain vulnerabilities. We observed a customer support management service with a WCD vulnerability expose a whopping 456 of its clients to attacks. This was not an isolated incident; three other service providers also endangered their clients, albeit at smaller scales.
Making cacheability decisions based solely on whether content is publicly exposed to the Internet is dangerous. Operators should not conflate ‘unauthenticated’ with ‘non-sensitive’, as unauthenticated pages can still contain per-visitor tokens. Caching mishaps should not be overlooked, even when there is no direct path to sensitive content. Caches can impact the application architecture in unpredictable ways.
I reiterate that WCD is a system problem. Security is a property of the system as a whole, emerging from the interactions of components that may be perfectly secure in isolation. Keeping this in mind, operators should strive to analyze the implications of changes to their caching architectures with a broader, systems-centric view.
Curious readers can find the details in the paper.
Kaan Onarlioglu is an Architect with Akamai’s Security Intelligence team.