A survey on securing inter-domain routing: Part 2

By Geoff Huston on 9 Jul 2021

Tags: BGP, IETF, measurement, ROV, security

The Border Gateway Protocol (BGP) is the Internet’s inter-domain routing protocol. And after some thirty years of operation, it’s now one of the more venerable of the Internet’s ‘core’ protocols.

One of the major ongoing concerns related to BGP is its lack of effective security measures, and as a result, the routing infrastructure of the Internet continues to be vulnerable to various forms of attack.

In Part 1, I looked at the design of BGP, the threat model, and the requirements from a security framework for BGP. In Part 2, I will look at the various proposals to add security to the routing environment and also review the current state of the efforts in the Internet Engineering Task Force (IETF) to provide a standard specification of the elements of a secure BGP framework.

The approaches to securing BGP can be further classified in the same fashion as the security requirements: Securing the operation of BGP and securing the integrity of the BGP data.

Securing the operation of BGP sessions

BGP uses a long-held TCP session and the same approaches to securing any TCP session can be used in the context of a BGP session. These approaches fall into two categories:

Those that simply attempt to protect the TCP session from disruption via injection of spurious traffic
Those that also attempt to protect the TCP session from eavesdropping and alteration by encrypting the payload

The generalized TTL security mechanism

The Generalized TTL Security Mechanism (GTSM) was originally described in RFC 3682 and updated in RFC 5082. It is based on the observation that the overwhelming majority of BGP peering sessions are established between routers that are directly connected.

The technique involves configuring each BGP IP packet to be sent with a TTL field value in the IP header of 255, and for the BGP receiver to discard all packets with an inbound TTL of less than a set threshold value. For a direct connection, the inbound TTL value should be 255, so all inbound TCP packets within this session with a TTL of 254 or less can be discarded by the receiver.
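The receive-side check can be sketched in a few lines of Python. This is a minimal illustration rather than a router implementation: the function name `gtsm_accept` and the `max_hops` parameter are assumptions introduced here, with `max_hops=1` standing for a directly connected peer.

```python
MAX_TTL = 255  # the sender is configured to emit every BGP packet with this TTL


def gtsm_accept(received_ttl: int, max_hops: int = 1) -> bool:
    """Return True if the inbound TTL is consistent with a peer at most max_hops away.

    max_hops is a hypothetical configuration knob: 1 means directly connected.
    """
    if not 0 <= received_ttl <= MAX_TTL:
        raise ValueError("TTL must be in the range 0..255")
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    # Each router on the path decrements the TTL by one, so a packet that
    # started at 255 cannot arrive with a TTL below this threshold.
    threshold = MAX_TTL - (max_hops - 1)
    return received_ttl >= threshold
```

With the default setting, a packet arriving with a TTL of 255 is accepted and one arriving with 254 is dropped, matching the directly connected case described above.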

The motivation for this approach is that spoofing the TTL field in an IP header is challenging for an unassisted remote attacker. This TTL packet filter is a lightweight defensive measure intended to add some protection to the BGP session from efforts to intrude into the session using remote attacks. This GTSM approach can be used for multi-hop BGP peer sessions as well as directly connected BGP sessions, but it is not all that robust in terms of its security properties because of the additional variables introduced with TTL changes due to routing changes and the potential to mask the conventional TTL behaviour with tunnelling techniques.

TCP MD5 Signature Option

A more robust approach to protecting the TCP session is using cryptographic protection of the TCP session. While these crypto approaches can be highly resilient to intrusion attempts, they do expose the BGP speaker to potential Denial-of-Service attacks if the processing load of the cryptographic functions to detect bogus packets is sufficiently high. Bogus packets still must be processed by the target just to ascertain that they are bogus.

The TCP MD5 Signature Option uses message authentication codes, which are a class of cryptographic hash algorithms applied to messages of arbitrary length that produce a ‘message digest’ intended to protect the integrity of a message. The desired property of a message digest is that it is infeasible to generate two messages that have the same message digest value and equally infeasible to generate a new message that has a particular message digest value.

The MD5 algorithm is intended for digital signature applications where a message digest is generated over the combination of a message and a secret shared key value. The message and the digest value can be transmitted openly, and the receiver can use a local copy of the secret key and apply the message digest algorithm to the combination of the received message and the key. If the digest value matches the received value, then the receiver can be assured that the message has not been altered in transit and that the message was generated by a party who also has knowledge of the key.

The TCP MD5 Signature Option is a TCP extension where each TCP segment contains a TCP option that contains the 128-bit MD5 digest of the combination of the TCP pseudo-header, the TCP segment payload excluding TCP options, and a connection-specific key. This establishes a cryptographically secure signature of the packet.

Without knowing the key, it is very challenging to construct a TCP segment with a valid signature. It is also not readily possible to alter the packet without causing the signature to be invalidated. The receiver calculates the MD5 digest across the received data, using a locally held copy of the key, and rejects the segment if the digest value fails to match that provided in the packet.
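The sign-and-verify cycle can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are invented here, the byte strings passed in stand for already-serialized header fields, and no real TCP parsing is attempted.

```python
import hashlib
import hmac


def tcp_md5_digest(pseudo_header: bytes, tcp_header_no_options: bytes,
                   payload: bytes, key: bytes) -> bytes:
    """Compute a 128-bit MD5 digest over the segment fields followed by the shared key."""
    h = hashlib.md5()
    for part in (pseudo_header, tcp_header_no_options, payload, key):
        h.update(part)
    return h.digest()


def verify_segment(pseudo_header: bytes, tcp_header_no_options: bytes,
                   payload: bytes, key: bytes, received_digest: bytes) -> bool:
    """Recompute the digest with the locally held key and compare it to the received one."""
    expected = tcp_md5_digest(pseudo_header, tcp_header_no_options, payload, key)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, received_digest)
```

A receiver holding the same key accepts an unmodified segment, while a changed payload or a different key produces a digest mismatch and the segment is rejected.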

In the context of BGP, the TCP session is resistant to various forms of intrusion attack unless the attacker has knowledge of the shared secret key value. The TCP MD5 specification does not specify how the shared key is passed between the two BGP speakers, or how the key value can be changed during the session.

This latter problem is significant because continued use of a key weakens its integrity, and it is conventionally advised that MD5 session keys should be changed every 90 days or so in this type of use context. Without a mechanism for in-band key change, this advice implies the need for a BGP session reset every 90 days, which is counter to conventional operational practice in BGP where sessions are held up for as long as possible. Even with tools such as BGP graceful restart, deliberate BGP session resets are generally avoided in the operational community.

TCP Authentication Option

A somewhat different approach, the TCP Authentication Option, uses a Message Authentication Field in the place of the MD5 message digest.

In this option, the final bit of the length field determines whether a key ID has been appended to the Message Authentication Code or not. The message-digest algorithm, in this case, is specified as HMAC-MD5-96, although other algorithms can be used if configured in advance.

This approach relies on a similar form of out-of-band provisioning as the original MD5 approach, where each end of the conversation must configure a TCP Security Association Database in advance of the use of this mechanism. This database contains a description of the supported TCP connections, the key set, the MAC algorithm, and the MAC length.
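A rough sketch of such a pre-provisioned database, and of how a key ID lets the receiver pick the right key, might look like the following. The class and function names are hypothetical, the option's wire format is not modelled, and the 96-bit MAC is taken as the leftmost 12 bytes of the HMAC output, which is the usual meaning of a "-96" suffix.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityAssociation:
    """One hypothetical entry in a locally configured security association database."""
    key_id: int
    key: bytes
    digest_name: str = "md5"  # HMAC-MD5, as named in the text
    mac_length: int = 12      # 96 bits


def compute_mac(sa: SecurityAssociation, message: bytes) -> bytes:
    """Compute the HMAC and truncate it to the configured MAC length."""
    full = hmac.new(sa.key, message, sa.digest_name).digest()
    return full[:sa.mac_length]


def verify(database: dict, key_id: int, message: bytes, received_mac: bytes) -> bool:
    """Look up the key named by key_id and check the received MAC against it."""
    sa = database.get(key_id)
    if sa is None:
        return False  # no association configured for this key ID
    return hmac.compare_digest(compute_mac(sa, message), received_mac)
```

Because each segment names the key it was protected with, both an old and a new key can be configured at the same time, which is what allows a key change without tearing down the session.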

IPSEC

IPSEC is a suite of protocols that operate at the IP level of the protocol stack to secure all communications between two endpoints.

The functionality of IPSEC includes methods for protection of IP packet headers, methods for protection and encryption of IP payloads and key management services that allow key rollover during long-held sessions. This is an implementation of public/private key cryptography and can ensure the confidentiality and integrity of all IP messages passed between two hosts.

IPSEC can be used to secure BGP sessions, and it provides greater levels of assurance than can be derived from MD5. However, IPSEC is not widely used in the public Internet for the purpose of securing BGP sessions, and no generally accepted profile of IPSEC for BGP has been standardized so far, with earlier efforts along these lines not progressing within the standards process.

The perceived problem with IPSEC relates to the complications for re-keying IKE/IPSEC sessions, and the observation that processing load to detect bogus packets is considerably higher with IPSEC than MD5. This exposes a Denial-of-Service attack where a stream of bogus IPSEC packets directed at a BGP speaker may be capable of exercising the processor into a fully saturated mode of operation, causing other concurrent router functions to be degraded.

More options

As was observed in Part 1 of this survey, there are many alternatives here, including TLS and QUIC. But more choice is not a substitute for better quality.

These session-level encryption approaches used by applications provide no better answer to dynamic re-keying and follow a now well-established Internet tradition of adding more options to divert attention from the observation that the common fundamental problems are inadequately addressed by many, or even all, such options! The design goal of such application-level session approaches is protection for transient short-duration sessions, while the vulnerabilities associated with long-held BGP sessions are somewhat different.

The best advice today is that a combination of the TCP Authentication Option (TCP-AO) and the Generalized TTL Security Mechanism (GTSM) is as good as it gets at present. However, it’s also highly desirable to avoid multi-hop BGP wherever possible and directly attach the two BGP speakers. That way the radius of potential eavesdroppers and attackers is reduced considerably.

Securing the integrity of routing information passed in BGP

Early work: 1988-2000

One of the earlier recognized works that addressed routing security was the 1988 study on Byzantine Robustness by Radia Perlman. In the event of failure or malicious behaviour on the part of one or more entities in the system, all correctly operating entities should reach a mutually consistent decision regarding the validity of each message in finite time. This study was in the area of link-state protocol design and described a protocol that satisfied the properties of Byzantine Robustness. It categorized route validation in three approaches:

Bound or just in time — validation occurs the same moment a route is announced, and appropriate measures are taken immediately. Credentials must be available immediately.
Unbound or just in case — validation occurs only if a new router takes part in the system. Credentials are retrieved on the arrival of this router.
Interrogative or just too late — validation occurs on a sporadic basis, requesting validation or credentials from a remote system when necessary.

While the link-state approach described in this paper does not exactly match the inter-domain routing environment, the concept of validating routing information is a consistent theme in all BGP security architectures.

Subsequent work by Smith and Garcia-Luna-Aceves (Securing the Border Gateway Routing Protocol and Efficient Security Mechanisms for The Border Gateway Routing Protocol), published in 1996, attempted to address session security by modifying the BGP protocol. This work proposed the protection of BGP control messages using message encryption at the BGP level, with session keys exchanged at BGP session establishment time. It also proposed the addition of a message sequence number to protect against replay attacks and message removal.

This approach also proposed a predecessor path attribute that indicated the AS prior to the destination AS for the current route and proposed digitally signing all fixed fields in the UPDATE message. The predecessor attribute is used to construct a means of validation of the AS Path attribute. These proposed changes to the BGP protocol required comprehensive adoption and deployment to be effective, as partial adoption would create gaps in any assurance that a predecessor attribute could provide. Their approach was similar to the earlier IDRP work.

IDRP eschewed the use of TCP and included a reliable flow-controlled transport into the IDRP protocol, also including several message integrity protection options.

A proposal contemporary with the Smith and Garcia-Luna-Aceves approach to securing BGP was based on leaving the BGP protocol unchanged but augmenting the BGP data flow with access to credential information. This additional information was intended to allow a BGP speaker to confirm the authenticity of origination information in BGP UPDATE messages by validating the binding of address prefixes to originating ASes. This proposal, NLRI origin AS verification, used the DNS as the distribution mechanism for origination information, where a BGP speaker could perform a DNS query to validate the prefix size and authorized originating AS information contained in a BGP route object.

Informally, it was intended to allow a DNS query to answer the question: ‘Which ASes have been authorized by the address holder to originate a route for this prefix?’ The proposed framework assumed that the reverse DNS space was securely associated with the holder of the address prefix, and the DNS response was verifiable (using a DNSSEC-signed DNS record and DNSSEC validation, presumably, although this work was concurrent with DNSSEC and did not make use of it in this proposal). This proposal assumed that the performance of DNS queries was within the same order of timescale as the propagation of BGP messages within BGP. It also assumed that there was no circularity, where a DNS recursive resolver or authoritative name server used by the BGP speaker was located within an address prefix that was being validated prior to local acceptance of the route associated with that prefix.
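The shape of that question and its possible answers can be sketched as below. This is only an illustration: a Python dictionary stands in for the DNS, the record contents use documentation prefixes and AS numbers, the function name is invented here, and only exact prefix matches are considered.

```python
import ipaddress
from typing import Optional

# Hypothetical stand-in for DNS answers: prefix -> ASes authorized to originate it.
ORIGIN_RECORDS = {
    ipaddress.ip_network("192.0.2.0/24"): {64500},
    ipaddress.ip_network("198.51.100.0/24"): {64501, 64502},
}


def origin_is_authorized(prefix: str, origin_as: int,
                         records: dict = ORIGIN_RECORDS) -> Optional[bool]:
    """Answer whether origin_as may originate prefix.

    Returns True or False when a record exists for exactly this prefix,
    and None when there is no record at all.
    """
    network = ipaddress.ip_network(prefix)
    authorized = records.get(network)
    if authorized is None:
        return None  # no answer: the lookup cannot say either way
    return origin_as in authorized
```

The `None` case matters in practice: a missing record could mean a bogus prefix or simply that the address holder has not published anything.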

The DNS delegation hierarchy would need to be precisely aligned to the address allocation framework so that the zone administrator of each of these origination authentication zones was in fact the duly delegated holder of the addresses, and this alignment should, preferably, be capable of being validated by third parties. Meeting these requirements would create a digital signature hierarchy embedded in the DNS that would be aligned to the address allocation framework.

The Internet Routing Registry (IRR) pre-dates most other efforts and dates back to the routing work of the early 1990s in the Routing Arbiter project that was part of the US NSFNET, and a project coordinated under the auspices of RIPE in Europe.

The IRR’s objective was to provide a set of routing policy databases populated by the ASes themselves that described the addresses that they intended to announce in the routing system and the routing policies that they intended to apply to these announcements. The Routing Registry was a response to the need described in RFC 1787 for improving global consistency by allowing providers to share routing policies. Each participating AS submits policy data, encoded using the Routing Policy Specification Language (RPSL) (RFC 2622, RFC 4012). Clients may use the registry to determine the stated policies for a particular AS, including what ASes (and possibly prefixes) are suitable for import or export, potentially using the data to populate filter sets on their BGP feeds. Additional information provided to the IRR by an AS could include policy concerning the configuration of BGP communities and the policy responses associated with particular community settings.
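As an illustration of how a client might turn registry data into a filter set, the sketch below parses a pair of simplified RPSL-style route objects and collects the prefixes registered for one origin AS. The sample objects, the `EXAMPLE` source name and the function names are assumptions made for this example, and the parser ignores RPSL features such as continuation lines and repeated attributes.

```python
# Two simplified route objects, using documentation prefixes and AS numbers.
RPSL_TEXT = """\
route:   192.0.2.0/24
origin:  AS64500
source:  EXAMPLE

route:   198.51.100.0/24
origin:  AS64501
source:  EXAMPLE
"""


def parse_objects(text: str) -> list:
    """Split blank-line-separated objects into attribute dictionaries."""
    objects = []
    for block in text.strip().split("\n\n"):
        obj = {}
        for line in block.splitlines():
            # Split on the first colon only, so IPv6 prefixes in values survive.
            key, _, value = line.partition(":")
            obj[key.strip()] = value.strip()
        objects.append(obj)
    return objects


def prefixes_for_origin(objects: list, origin: str) -> list:
    """Return the sorted route prefixes registered with the given origin AS."""
    return sorted(o["route"] for o in objects
                  if o.get("origin") == origin and "route" in o)
```

Calling `prefixes_for_origin(parse_objects(RPSL_TEXT), "AS64500")` yields the single prefix registered for that AS. Nothing in this step checks whether the party that registered the object had any authority over the prefix.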

However, the utility of the IRR for securing routing is quite limited.

First, the IRR does not provide information about current routes, but only about potential routes. Some potential routes may be legal according to the IRR, but undesirable from a more global point of view.

Next, the IRR has many security vulnerabilities concerning the integrity of registry contents and authorization of changes to the registry. There is no intrinsic authority model that constrains which party can publish data about addresses and ASes in an IRR. Moreover, some policy information concerning agreements between peering ASes is not intended for broader public distribution and the IRRs did not normally implement any form of limited disclosure rules.

Efforts to improve the controls over the authority framework in registries and access frameworks (RFC 2725) never really gained traction. The IRR system is a misnomer, in that there is not a single IRR but many IRRs. The contents of these IRRs are not necessarily mutually consistent and there is no clear way to resolve any such conflicts. Not only is there no authority model ensuring that only authorized parties may publish routing policy data about their own address prefixes and ASes, but there is also no way to describe the intended lifetime of the information. Old information that is no longer current or relevant sits alongside current information, and this sits along with contingency information that may never be actually used.

While the overall approach of providing an out-of-band commentary on routing, enumerating all the cases of authorized (or valid) route objects, has been a useful tool for many operational environments, IRR tools are only truly useful for detecting and filtering routing anomalies if the information is verifiable, authentic, current and complete. In other words, IRRs are most useful if they are carefully and continuously managed, and the accuracy and usefulness of the information rapidly declines if the information in the registry is neglected.

Our experience with IRRs suggests that it would be somewhat foolhardy to automatically apply IRR data to populate route filters, given the risks of incorrect outcomes, both positive and negative, and while there have been good counterexamples in some operational communities, the broader judgement for IRRs being capable of supporting a robust whole-of-Internet role for route integrity is somewhat negative.

The common requirements in this space appear to relate to authenticity, currency and completeness.

Digital signatures can provide strong assurance related to the authenticity and currency of information, assuming that there is robust enrollment practice that governs the authority to generate such signatures. Given such a practice, the consequent observation is that whether this digital signature framework is placed into the DNS, via a DNSSEC framework, or placed into a framework of X.509 certificates and an associated PKI is, at one level, an isomorphic transform of the same information. The issue of the choice of DNS (and DNSSEC) or X.509 certificates (and certificate-based validation) is then an issue of the performance requirements of these systems.

Completeness is a more challenging requirement. The identification of invalid routing information in the partial adoption case of this approach is unclear. When a query to an information source has a negative response, it is unclear whether the route object that was the basis of the query is not valid (such as a bogus prefix, or a bogus AS), or whether the database being queried is incomplete.

Let’s now move forward in time to review some more recent proposals to secure BGP.

Secure BGP

Secure BGP (sBGP) offered a relatively complete approach to securing the BGP protocol by placing digital signatures over the address and AS path information contained in routing advertisements and defining an associated PKI for validation of these signatures.

sBGP defines the ‘correct’ operation of a BGP speaker in terms of a set of constraints placed on individual protocol messages, including ensuring that:

All protocol UPDATE messages have not been altered in transit between the BGP peers,
The UPDATE messages were sent by the indicated peer,
The UPDATE messages contain more recent information than has been previously sent to this BGP speaker from the peer,
The UPDATE was intended to be received by this BGP speaker, and
That the peer is authorized to advertise information on behalf of the peer AS.

In addition, for every prefix and its originating AS, the prefix must be a validly allocated prefix, and the prefix’s ‘right-of-use’ holder must have authorized the advertisement of the prefix and must have authorized the originating AS to advertise the prefix.

The basic security framework proposed in sBGP is that of digital signatures, X.509 certificates and PKIs to enable BGP speakers to verify the identities and authorization of other BGP speakers, AS administrators and address prefix owners.

The verification framework for sBGP requires a PKI for address allocations, where every address assignment is reflected in an issued certificate. This PKI provides a means of verification of a ‘right-of-use’ of an address. A second PKI maps the assignment of ASes, where an AS number assignment is reflected in an issued certificate, and the association between an AS number and a BGP speaking router is reflected in a subordinate certificate. In addition, sBGP proposes the use of IPSEC to secure the inter-router communication paths.

sBGP also proposes the use of attestations. An address attestation is produced by an address holder and authorizes a nominated AS to advertise itself as the origin AS for a particular address prefix. A route attestation is produced by an AS holder and attests that a BGP speaker is an authorized member of that AS and that it has received a specified route. The address and AS PKIs, together with these attestations, allow a BGP speaker to verify the origination of a route advertisement and verify that the AS path as specified in the BGP UPDATE is the path taken by the routing UPDATE message via the sequence of nested route attestations.
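The nesting of route attestations can be illustrated with a toy model. Note the substitution: sBGP relies on public-key signatures validated through its PKIs, whereas this sketch uses an HMAC with per-AS secrets purely as a stand-in so that it runs with the standard library alone. The key table, function names and data layout are all assumptions, and the hops are listed in propagation order (origin first), which is the reverse of how the AS path attribute is written.

```python
import hashlib
import hmac

# Hypothetical per-AS secrets; a stand-in for the private keys certified in the AS PKI.
AS_KEYS = {64500: b"secret-64500", 64501: b"secret-64501", 64502: b"secret-64502"}


def attest(prefix, hops_so_far, next_as, previous_sig=b""):
    """The last AS in hops_so_far signs the prefix, the path so far, the AS it is
    sending to, and the previous attestation (this is what nests the chain)."""
    signer = hops_so_far[-1]
    path_text = ",".join(str(asn) for asn in hops_so_far)
    data = f"{prefix}|{path_text}|{next_as}".encode() + previous_sig
    return hmac.new(AS_KEYS[signer], data, hashlib.sha256).digest()


def _next_as(hops, i, receiver):
    return hops[i + 1] if i + 1 < len(hops) else receiver


def build_attestations(prefix, hops, receiver):
    """Produce one attestation per AS as the route propagates towards receiver."""
    sigs, prev = [], b""
    for i in range(len(hops)):
        prev = attest(prefix, hops[: i + 1], _next_as(hops, i, receiver), prev)
        sigs.append(prev)
    return sigs


def verify_attestations(prefix, hops, receiver, sigs):
    """Recompute the chain and check every attestation in order."""
    if len(sigs) != len(hops) or any(asn not in AS_KEYS for asn in hops):
        return False
    prev = b""
    for i in range(len(hops)):
        expected = attest(prefix, hops[: i + 1], _next_as(hops, i, receiver), prev)
        if not hmac.compare_digest(expected, sigs[i]):
            return False
        prev = sigs[i]
    return True
```

Because each attestation covers the next AS and the previous attestation, reordering the path or replaying the UPDATE to a different receiver causes verification to fail.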

Inter-operation and information exchange between sBGP elements is shown in Figure 1.

Figure 1: sBGP. Certificates for each ISP are issued by the regional registries. The ISPs exchange public keys through special repositories. The keys are pushed to the sBGP routers which validate the BGP UPDATE messages.

sBGP proposed to distribute the address attestations and the set of certificates that compose the two PKIs via conventional distribution mechanisms outside of BGP messages. For route attestations, it is necessary to pass these attestations via path attributes of the BGP UPDATE message, as an additional attribute of the UPDATE message.

A number of significant issues have been identified with sBGP, including the computational burden of signature generation and validation, the increased load in BGP session restart, the issue of piecemeal deployment and the completeness of route attestations, and the requirement that the BGP UPDATE message has to traverse the same AS sequence as that contained in the UPDATE message.

Secure Origin BGP

Secure Origin BGP (soBGP) was a response to some of the significant issues that were raised with the sBGP approach, particularly relating to the update processing load when validating the chain of router attestations and the potential overhead of signing every advertised UPDATE with a locally generated router attestation.

The validation questions posed by soBGP also included the notion of explicit authorization from the address holder to the originating AS to advertise the prefix into the routing system. soBGP’s AS path validation is quite different from sBGP, in that soBGP attempts to validate that the AS path, as presented in the UPDATE message, represents a feasible inter-AS path from the BGP speaker to the destination AS. This feasibility test is a weaker validation condition than validating that the UPDATE message actually traversed the AS path described in the message.

soBGP avoids the use of a hierarchical PKI that mirrors the AS number distribution framework and nominates the use of a web of trust, or a reputation mechanism, to validate these certificates. At the time, no Address or AS PKI had been devised or deployed, so this web of trust approach was a pragmatic response to this critical omission. soBGP uses the concept of an AuthCert to bind an address prefix to an originating AS. This AuthCert is not signed by the address holder, but by a private key that is bound to an AS via an EntityCert.

soBGP deliberately avoided the use of a PKI which was derived from the established AS and address distribution framework. This appears to have been a pragmatic consideration, as no such PKI existed at the time, and it was unclear whether the various address registries were in a position to administer such a specialized PKI in any case. This left open the issue of how to establish trust anchors for validation of these signed objects, which was a rather significant deficiency in the validation framework of soBGP.

Instead of sBGP’s route attestations, soBGP used the concept of an ASPolicyCert as the foundation for constructing the data for testing the feasibility of a given AS path. An ASPolicyCert contained a list of the AS’s local peer ASes, signed by the AS’s private key. An AS peering was considered valid only if both ASes list each other in their respective ASPolicyCerts. Figure 2 depicts a possible soBGP peering network.

Figure 2: A possible soBGP peering network. The ASPolicyCert is a self-signed certificate containing routing policies. An UPDATE message originating at AS4 would necessarily take the path {AS4,AS5,AS2,AS1} instead of {AS4,AS3,AS2,AS1} because the connection between AS2 and AS3 would not be regarded as valid.

The overall approach proposed in soBGP represented a different set of design trade-offs to sBGP, where the amount of validated material in a BGP UPDATE message is reduced. This approach was intended to reduce the processing overhead for the validation of UPDATE messages. In soBGP, each local BGP speaker assembles a validated inter-AS topology map as it collects ASPolicyCerts, and each AS path in UPDATE messages is then checked to see if the AS sequence matches a feasible inter-AS path in this map. soBGP proposed to use BGP itself to flood ASPolicyCerts through the network, using a new BGP message type (a Security Message) for this function.
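The feasibility test can be sketched as a small graph check. The peer lists below are an assumption chosen to reproduce the outcome described for Figure 2 (AS3 lists AS2, but AS2 does not list AS3), signature checking of the ASPolicyCerts is omitted, and the function names are invented for this example.

```python
# Hypothetical ASPolicyCert contents: each AS lists the ASes it claims as peers.
POLICY_CERTS = {
    1: {2},
    2: {1, 5},   # AS2 does not list AS3
    3: {2, 4},   # AS3 lists AS2, but the claim is not reciprocated
    4: {3, 5},
    5: {2, 4},
}


def peering_is_valid(a: int, b: int, certs: dict) -> bool:
    """A peering counts only if both ASes list each other."""
    return b in certs.get(a, set()) and a in certs.get(b, set())


def build_topology(certs: dict) -> set:
    """Assemble the set of mutually confirmed inter-AS links."""
    return {frozenset((a, b))
            for a, peers in certs.items()
            for b in peers
            if peering_is_valid(a, b, certs)}


def path_is_feasible(as_path: list, topology: set) -> bool:
    """Check that every adjacent pair of ASes in the path is a confirmed link."""
    return all(frozenset(pair) in topology
               for pair in zip(as_path, as_path[1:]))
```

With this data, the path through AS5 passes the check and the path through AS3 does not. The check says only that the path could exist, not that the UPDATE actually travelled along it.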

The use of Web of Trust and the avoidance of a hierarchical PKI for the validation of AuthCerts and EntityCerts could be considered a weakness in this approach, as the derivation of authority to speak about addresses is very unclear in this model, but this absence was a result of the protocol being developed prior to the completion of the work on the RPKI. It is clear that soBGP could be readily adapted to use the RPKI as its trust and authority framework.

soBGP’s use of BGP itself to flood the security credentials through the network represented an interesting approach to the problem of distributing such credentials, but it also raised questions relating to partial deployment scenarios that were unanswered at the time. Interest in continuing work on soBGP waned in the early 2000s, most likely in recognition that there was an inadequate level of operator demand to sustain the development effort.

Pretty Secure BGP

Pretty Secure BGP (psBGP) puts forward the proposition that the proposals relating to the authentication of the use of an address in a routing context must either rely on the use of signed attestations that need to be validated in the context of a PKI or rely on the authenticity of information contained in IRRs.

The weakness of routing registries is that the commonly used access controls to the registry are insufficient to validate the accuracy or the current authenticity of the information that is represented as being contained in a route registry object. The information may have been accurate at the time the information was entered into the registry, but this may no longer be the case at the time the information is accessed by a relying party.

The psBGP approach was also motivated by the proponent’s opinion that a PKI could not be constructed in a deterministic manner because of the indeterminate nature of some forms of address allocations. This led to the assertion that any approach that relied on trusted sources of comprehensive information about prefix assignments and the identity of current right-of-use holders of address space was not a feasible proposition. Accordingly, psBGP rejected the notion of a hierarchical PKI that could be used to validate assertions about addresses and their use.

Interestingly, although psBGP rejected the notion of a hierarchical address PKI, psBGP assumed the existence of a centralized trust model for AS numbers and the existence of a hierarchical PKI that allowed public keys to be associated with AS numbers in a manner that could be validated in the context of this PKI. This exposed a basic inconsistency in the assumptions that lie behind psBGP, namely that a hierarchical PKI for ASes aligned to the AS distribution framework was assumed to be feasible, but a comparable PKI for addresses was not. Given that the same distribution framework has been used for both resources in the context of the Internet, it is unclear why this distinction between ASes and addresses was necessary or even appropriate.

psBGP used a rating mechanism similar to that used by PGP, but in this case, the rating was used for prefix origination. An AS asserted the prefixes it originated and also could list the prefixes originated by its AS peers in the signed attestation. The ability of an AS to sign an attestation about prefixes originated by a neighbor AS allowed a psBGP speaker to infer AS neighbor relationships from such assertions, allowing the local BGP speaker to construct a local model of inter-AS topology in a fashion analogous to soBGP.

One of the critical differences between psBGP and soBGP was the explicit inclusion of the strict AS Path validation test, namely that it was a goal of psBGP to allow a BGP speaker to verify that the BGP UPDATE message traversed the same sequence of ASes as is asserted in the AS path of the UPDATE message.

The AS Path validation function relies on a sequence of nested digital signatures of each of the ASes in the AS path for trusted validation, using a similar approach to sBGP. However, psBGP allowed for partial path signatures to exist, mapping the validation outcome to a confidence level rather than a more basic sBGP model of accepting an AS path only if the AS path in the BGP UPDATE message was completely verifiable.

The essential approach of psBGP was the use of a reputation scheme in place of a hierarchical address PKI, but the value of this contribution was based on accepting the underlying premise that a hierarchical PKI for addresses was infeasible. It is also noted that the basis of accepting inter-AS ratings in order to construct a local trust value was based on accepting the validity of an AS trust rating, which, in turn, was predicated upon the integrity of the AS hierarchical PKI. psBGP appeared to be needlessly complex and bears many of the characteristics of making a particular solution fit the problem, rather than attempting to craft a solution within the bounds of the problem space.

The use of inter-AS cross-certification with prefix assertion lists introduces considerable complexity in both the treatment of confidence in the assertions and in the resulting assessment of the reliability of the verification of the outcome. psBGP does not consider the alternate case where the trust model relating to addresses is based on a hierarchical PKI that mirrors the address distribution framework. In such a case the calculation of confidence levels would be largely unnecessary. The major contribution of psBGP relates to the case of partial deployment of a security solution in relation to AS Path validation, with the calculation of a confidence rating in the face of partial security information.

Inter-domain Route Validation

The approaches to securing the semantics of BGP described in this section so far all entail changes to the operation of BGP itself and operate most effectively in an environment of universal deployment. In practical terms, this is an unlikely scenario, and the experience with the uptake of modifications to BGP that supported 32-bit AS number values suggests that the public Internet has considerable inertia and is very resistant to adopting changes to BGP. In a system as large as the public Internet, long-term piecemeal deployment is a far more likely scenario.

The approach proposed with Inter-domain Route Validation (IRV) is not to modify the BGP protocol in any way, but to define a companion information distribution protocol. The intent here was to attempt to provide legacy compatibility and incremental deployment capability.

The IRV approach replaced the concept of simultaneously feeding both routing information and associated credentials in BGP with the concept of moving the provision of credentials into a query response framework. In such a framework the receiver of a route object can query the originating AS as to the authenticity of a received route object, or request additional information relating to the object in a similar fashion to the information contained in an IRR (RFC 1786).

In IRV, each AS is responsible for providing an IRV server capable of providing authoritative responses relating to prefixes originated by this AS. IRV is envisaged as being used to provide routing policy information, using the Routing Policy Specification Language (RPSL) (RFC 2622, RFC 4012) structure already used by the IRRs, community configuration information, contact information, a local view of the routing system in terms of received route advertisements and withdrawals and route updates that have been sent to neighbouring ASes.

Assuming that there is a way to reliably query a per-AS IRV server, and receive a response that can be validated, then AS origination validation in the IRV framework is a case of querying the originating AS’s IRV server with the origination query for the prefix in question and verifying the response. In a similar fashion AS Path validation is a case of querying each AS’s IRV server in the AS path, confirming that an advertisement was received from the previous AS in the AS path and that an advertisement has been sent to the next AS in the AS path (Figure 3).

Diagram showing how AS Path Verification works using IRV. — Figure 3 — AS Path verification using IRV.

This approach is midway between a strict AS path test that validates that the UPDATE message was passed along the AS sequence described in the AS path, and AS path plausibility that validates that there is a set of AS peer connections that correspond to the AS sequence. Here the validation test is that each AS in the sequence is currently advertising this prefix to the next AS in sequence.
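The per-hop test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IRV_STATE` table is a hypothetical in-memory stand-in for the per-AS IRV servers, the function names are invented for this example, and the discovery, transport and response authentication issues discussed below are not modelled at all.

```python
# Hypothetical stand-in for per-AS IRV servers: each AS records which
# (prefix, neighbour) advertisements it has received and sent.
IRV_STATE = {
    64500: {"received": set(), "sent": {("192.0.2.0/24", 64501)}},
    64501: {"received": {("192.0.2.0/24", 64500)},
            "sent": {("192.0.2.0/24", 64502)}},
    64502: {"received": {("192.0.2.0/24", 64501)}, "sent": set()},
}


def irv_query(asn, kind, prefix, neighbour):
    """Ask the (simulated) IRV server of `asn` about one advertisement.

    Returns None when no IRV server is known for that AS.
    """
    state = IRV_STATE.get(asn)
    if state is None:
        return None
    return (prefix, neighbour) in state[kind]


def check_as_path(prefix, as_path):
    """Check every AS in the path against its IRV server.

    `as_path` is ordered as in a BGP UPDATE: most recent AS first,
    originating AS last.
    """
    hops = list(reversed(as_path))  # propagation order, origin first
    for i, asn in enumerate(hops):
        # Every AS except the origin must confirm it heard the route
        # from the previous AS in the path.
        if i > 0 and irv_query(asn, "received", prefix, hops[i - 1]) is not True:
            return False
        # Every AS except the last must confirm it is advertising the
        # route to the next AS in the path.
        if i < len(hops) - 1 and irv_query(asn, "sent", prefix, hops[i + 1]) is not True:
            return False
    return True
```

Note that the sketch issues up to two queries per AS in the path for a single route, which gives a feel for the query overhead that the approach would impose on every received UPDATE.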

This IRV architecture has a number of issues that are not completely specified, including IRV discovery, IRV query redirection, authentication of queries and responses, selective responses, transport layer protection and imposed overheads. It is unclear how an IRV response is to be validated, and how the relying party can verify that the received response originated from the IRV server of the AS in question, that the response has not been altered in any way, and that the response represents the actual held state in the queried AS. A similar concern lies in the estimation of additional overhead associated with performing a query to each AS in the AS path for every received BGP UPDATE.

It is also unspecified whether the query and response is a precondition to the local acceptance of a BGP route or not. While making validation of a route a precondition for acceptance of a route would appear to offer a more robust form of security, it is also the case that the IRV associated with the originating AS may only be reachable via the prefix being advertised, in which case the IRV would be unreachable until the route is accepted. It is also unclear to what extent the additional information that the IRV could provide would be useful within strict real-time constraints.

The IRV approach is essentially an extension of the IRR concept that further decentralizes the publication point of routing information to individual ASes. It extends the IRR in a manner that is intended to provide adequate assurance that received responses are responses to the original query, that the response has been formed by the authoritative IRV for an AS, that the response is complete and has not been altered in any way, and that the response is an accurate representation of the state of the remote AS, using DNS-style chained lookups. What is unclear here is whether this decentralization has superior performance and security properties to an alternative approach of further augmentation to the existing IRR framework.

A similar approach within the IRR framework that integrates the concept of an address and AS PKI could make provision for signed responses in a way that allows the IRR client to authenticate that the response is accurate, current, and contains information that has been digitally signed by the AS or prefix holder. In such a model of publication, the relying party can validate the authenticity of the IRR object independently of the manner in which the object was published or the manner in which it has been retrieved.

Secure Path Vector routing for securing BGP

Secure Path Vector routing for securing BGP (SPV) is another proposal that explores the feasibility of using symmetric cryptographic operations to secure the AS path in BGP UPDATE messages using hash chains and trees. The SPV study identified the following classes of path attacks:

Forgery where false paths are associated with routes in order to influence local route selection decisions,
Modification where the path is altered in order to hide the UPDATE from a target AS or in order to influence local route selection decisions,
Denial-of-Service where the attack attempts to overwhelm the intended victim’s resources, and
Worm-holing where colluding adversaries assert false AS-to-AS links.

The first two classes are attacks via BGP, whereas the last two could be more accurately classified as attacks on the routing system itself through multi-party collusion. SPV takes the approach of tree-authenticated hash values and applies this specifically to AS Path validation as an alternative to the nested digital signature structure proposed as the AS Path validation mechanism of sBGP. The paper claims significantly improved processor performance using this technique, based on the difference in computational complexity between asymmetric cryptography and the symmetric cryptography used in hash functions.

This proposal falls into the category of proposals that call for changes to the operation of the BGP protocol. In this case, the significant change is the requirement that all routes must be re-advertised to peers within a fixed time interval. This is the weakest part of the approach in terms of performance evaluation, as much of the leverage in terms of scaling BGP is based on the use of a reliable transport protocol for BGP messages which, in turn, obviated any need for a BGP re-advertisement function. The need to regularly re-advertise the entire routing table to all peers has some adverse implications in terms of the performance of the protocol and its scaling capabilities.

SPV also assumes that the originating AS has knowledge of the private key associated with an address, as distinct from the more logical approach that an originating AS need only be able to produce an authority from the address allowing the AS to originate the advertisement. This approach, while efficient on processing speed, requires more storage, a higher level of time synchronization, higher update rates within the BGP protocol, coupled with some form of loose time synchronization and complex key pair distribution. It has also been observed that SPV does not sufficiently protect against route forgery and eavesdropping or collusion attacks.

Signature amortization and aggregate signatures

If the signature load of sBGP is the problem, then how can this load be reduced? This question has been studied in a number of papers.

One technique shows that it may be possible to amortize the cost of signature validation over many messages. This technique signs a subset of the connected topology over which an UPDATE flows and places a topology description as a vector in an equivalent of an AS connectivity attestation, which is flooded to all relying parties. The AS-Path signing can then be generalized such that the same vector is reproduced in the signed data, with the AS neighbors who were passed the UPDATE messages marked in the bit vector. All AS neighbors can now receive the same UPDATE.

Related work combines the time-efficient approach of signature amortisation with space-efficient techniques of aggregate signatures to propose a set of constructions for aggregated path authentication that improve on sBGP’s requirements for processing throughput and memory space.

Aggregate signatures apply to a collection of UPDATE messages that are to be sent to a peer. Instead of signing each UPDATE separately, the UPDATE messages are hashed into a Merkle hash tree and the root of the tree is signed. The UPDATE and the root of the hash tree are then sent as the signed UPDATE to each peer. This technique improves upon the approach of Boneh et al., which uses bi-linear maps instead of Merkle hash trees.
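To make the Merkle hash tree step concrete, here is a minimal Python sketch of hashing a batch of UPDATE messages into a tree and checking that one message belongs to it. The function names, the domain-separation bytes and the handling of odd-sized levels are choices made for this illustration and are not taken from the papers. The signature over the root is deliberately left out, since the Python standard library has no asymmetric signing primitive.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return every level of the tree, leaf hashes first, root level last."""
    if not leaves:
        raise ValueError("need at least one UPDATE message")
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 marks a leaf hash
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        # 0x01 marks an interior node built from a pair of children
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]


def merkle_proof(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from one UPDATE and its proof."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root
```

In use, a speaker would compute the tree over the queued batch, sign only `levels[-1][0]`, and send each UPDATE together with its proof and the signed root. The receiver then performs one signature check per batch and a handful of hash operations per UPDATE.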

Exploiting path stability

Mitigating the validation overhead can also be achieved by caching validation outcomes and reapplying the outcome if the same update information is received within the cache lifetime. A study by Butler, McDaniel and Aiello noted that across a one-month period less than 2% of advertised prefixes were advertised using more than 10 paths and less than 0.06% of prefixes were advertised with more than 20 paths.

Their paper proposed combining a number of approaches to reduce the AS Path validation workload. The first was the use of hash chains and signature aggregation, where a BGP speaker sends all local viable paths to its peers along with the tokens that represent hash chain anchors, allowing route change to be represented by an authentication token that can be validated by hash operations. The second part of the approach was to use Merkle hash trees to sign across a set of UPDATE messages that are queued awaiting the MRAI Timer. The third part of the approach was to exploit the stability of path advertisements to amortize cryptographic operations over many validations, achieved by caching the cryptographic proofs.

The paper asserted that simulations point to this approach reducing computational costs by as much as 97% compared with existing approaches.

Another approach, termed pretty good BGP (pgBGP), analyses path stability over a longer period of time and builds a local database which is then consulted in order to detect anomalous routes.

The idea is that origin ASes usually do not suddenly change over time for certain prefixes, and that such a sudden change might indicate an attack on the routing system. pgBGP does not provide completely automated security, as it does not eliminate any route advertisements, but rather puts them into quarantine for 24 hours (similar to route flap damping), giving operators the time to decide how to classify the event.

This proposal can be incrementally deployed and imposes little overhead on the routing system. It is a method to mitigate the effects of an attack on the routing system and not an effective mechanism for the prevention of such attacks.
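The quarantine idea can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the class and method names are hypothetical, the first origin seen for a prefix is treated as normal, and an unfamiliar origin is promoted to normal once it has persisted for the full 24 hours. The last point is an assumption of this sketch, since the text above leaves the final classification to the operator.

```python
QUARANTINE_SECONDS = 24 * 3600


class OriginHistory:
    """Local database of the origin ASes normally seen for each prefix."""

    def __init__(self):
        self.known = {}    # prefix -> set of origin ASNs regarded as normal
        self.suspect = {}  # (prefix, origin) -> time the origin was first seen

    def observe(self, prefix, origin, now):
        """Classify an announcement as 'normal' or 'quarantined'.

        `now` is a timestamp in seconds, supplied by the caller.
        """
        origins = self.known.setdefault(prefix, set())
        if not origins or origin in origins:
            origins.add(origin)
            return "normal"
        first_seen = self.suspect.setdefault((prefix, origin), now)
        if now - first_seen >= QUARANTINE_SECONDS:
            # Assumption of this sketch: an origin that survives the
            # quarantine period is added to the history.
            origins.add(origin)
            del self.suspect[(prefix, origin)]
            return "normal"
        return "quarantined"
```

A router using such a database would keep preferring routes classed as normal while an unfamiliar origin sits in quarantine, which is why the scheme mitigates an attack rather than preventing it.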

Detecting prefix hijacking

One special case of routing attacks that is considered a major threat and evokes high interest in the research community is prefix hijacking.

There has been a considerable amount of research undertaken in order to provide security against this single form of attack. Several approaches describe possible methods of detecting prefix hijacking, as well as complete systems and implementations of prefix hijacking detection that make it possible to react to the attack.

These systems rely on existing external route monitoring databases like Route Views or need special routing registries to be deployed to detect prefix hijacking. The quality of such prefix hijack detection systems is strongly dependent on the quality of the route databases, all of which have some level of perspective bias given that all views of the BGP routing system are relative to the location of the collector.

Another method to detect prefix hijacking is to look for a Multiple Origin AS (MOAS), which can be either a sign of multi-homing an AS or a sign of bogus route announcements, thus prefix hijacking.
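A MOAS check is simple to express. The following Python sketch, with a hypothetical function name and input format, scans a set of observed announcements and reports every prefix that is seen with more than one origin AS. As noted above, a hit is only a hint: the check cannot by itself distinguish legitimate multi-homing from a hijack.

```python
from collections import defaultdict


def find_moas(announcements):
    """Return {prefix: origins} for prefixes seen with several origin ASes.

    `announcements` is an iterable of (prefix, as_path) pairs, where
    as_path lists ASNs with the originating AS last.
    """
    origins = defaultdict(set)
    for prefix, as_path in announcements:
        if as_path:
            origins[prefix].add(as_path[-1])
    return {prefix: seen for prefix, seen in origins.items() if len(seen) > 1}
```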

A different approach is presented for ISPY, which tries to detect prefix hijacking by continuously probing known transit ASes in order to detect whether the prefix owned by the probing AS has been hijacked through a path change in the routing fabric to reach the address prefix.

Secure BGP and BGP dynamics

If securing BGP is a case of applying cryptographic operations to BGP UPDATE messages, then the other approach to reducing the security overhead is to exploit the dynamic behaviour of these messages.

In one study into BGP update dynamics, a cache of 10,000 prefixes and AS Path validation outcomes, or less than 5% of the total number of distinct routed entries, was shown to achieve a cache hit rate of between 30% and 50% using a simple least recently used cache replacement algorithm.
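A least recently used cache of validation outcomes might look like the following Python sketch. The class name, the cache key (prefix plus AS path) and the `validate` callback are assumptions made for illustration; the study's own data structures are not described here.

```python
from collections import OrderedDict


class ValidationCache:
    """LRU cache of AS Path validation outcomes, keyed by (prefix, AS path)."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix, as_path, validate):
        """Return the cached outcome, calling `validate` only on a miss."""
        key = (prefix, tuple(as_path))
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        outcome = validate(prefix, as_path)  # the expensive cryptographic step
        self._entries[key] = outcome
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return outcome
```

Every hit avoids one full set of signature checks, so the hit rate translates directly into saved cryptographic work. A production cache would also need to expire entries when the underlying credentials change, which this sketch does not attempt.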

When distance vector algorithms react to a change in prefix reachability a number of UPDATE messages are generally observed before the routing system reaches a stable state. A study of BGP convergence across the global Internet concluded that the severity of path exploration and the convergence speed depends on the relative positions of the event origin and the observer. This study aligned the originator and the observer in terms of the ‘tiering’ of Internet Service Providers and noted that these extended convergence times and larger path exploration events occurred at lower levels of the tiering hierarchy. It was hypothesized that the richer inter-connectivity that was typically prevalent at such lower levels in the tiering hierarchy was a major contributing factor here. Fail-over and new route announcements converge in similar times, while route withdrawals have far longer convergence times.

A similar study on BGP’s path exploration characteristics proposed modifications to the BGP UPDATE message intended to identify and limit the path exploration behaviour of BGP. If a significant level of update load is related to path exploration and a significant level of AS Path security overhead is related to validation of short-term transient routing states associated with path exploration, then another direction in terms of reducing security overheads is to limit path exploration behaviour. An approach to do so by selective damping of BGP updates that are characteristic of BGP path exploration following a withdrawal at source is described in Path Exploration Damping.

Further study of BGP update behaviour has explored the level of determinism that exists in BGP’s route selection process and noted that in the absence of the Multiple Exit Discriminator (MED) and route reflectors, the process can be considered to be a deterministic one. The paper suggests some refinements to BGP that could achieve a similar outcome to MEDs and route reflectors while preserving the deterministic route selection property. The issue this paper raises is that most security proposals view AS Path validation as an ‘after the event’ activity because of the assumed lack of predictability in BGP. This paper questions this basic assumption and raises the possibility of path security as a provisioning activity, which, in turn, raises some interesting performance optimizations for BGP path security as a provisioning exercise rather than a reactive task.

Securing the data plane

Securing BGP is not only a matter of securing the control plane but also of securing the data plane and making sure that the status of the forwarding table is consistent with the advertised BGP routing information.

A study by Mao et al. showed that up to 8% of the paths advertised through the control plane do not match the actual paths in the data plane. The data plane is not only subject to attacks that try to subvert the routing system, but also subject to synthetic BGP announcements from network operators that could enable the theft of carriage capacity. It is, therefore, necessary to provide security for the whole data path, and not only on a Next Hop basis as Stealth Probing intends to, as carriers might span over multiple ASes and synthesize false routing information that spans multiple AS hops.

Proposed approaches mainly focus on probing the full data path through packet injection, trying to detect and isolate malicious routers. In ‘secure traceroute’ a modified traceroute is used to check which path data packets actually take and to compare it with the AS path in the routing table, effectively detecting malicious ASes. Secure traceroute comes with the overhead of a PKI and related key exchange, and offers no scope for piecemeal deployment.

The Faith approach instead focuses on using traffic summary functions and comparing their results with those of other routers, allowing the detection of ASes that provide anomalous values. These traffic summary functions seem to be prone to inaccuracy due to a variety of applications running on routers which might alter the packet flow, and their application appears infeasible in routers with very high packet volumes.

The solution proposed as Listen and Whisper tries to detect inaccuracies in the data plane (the listen part) but focuses also on control plane security (the whisper part) and aims to provide an almost complete BGP security solution, combining both parts.

Compared to sBGP, Listen and Whisper should be classified as a ‘just too late’ solution for BGP security, like many solutions which try to ensure data plane — control plane consistency. Like other data plane security solutions, this approach seems infeasible, as it tries to detect data plane anomalies by analysing individual TCP flows, and scaling this approach to the high-speed core of the Internet presents some practical challenges.

Another approach aims for high performance and possible partial deployment. Its focus is to ensure that the data path always conforms to the announced AS path, which is achieved by probing data paths through injecting tagged IP packets, or by using IP options. Similar to pgBGP, it leaves the decision of which action to take towards a malicious router to the network operator and builds up a small database to detect possible malicious routers. It deploys the roles of verifiers and provers on certain ASes, with the verifier being an AS that wants to verify a certain route, and the prover being an AS that helps the verifier in the process by replying to probe data.

Even though all these approaches intend to provide a certain level of data plane security, and also a certain level of control plane security, none provide comprehensive data plane security. The authenticity of a data path from start to end could easily be forged by two ASes deploying tunnels between them, preventing a third party from effectively verifying the data path.

IETF Activity — RPKI, ROV, BGPSEC and ASPA

Following a number of efforts to make progress in this area, the IETF chartered a Routing Protocol Security Requirements Working Group (RPSEC) in 2002 to develop a common set of security requirements for routing protocols (the activity concluded in 2009).

In terms of the study of inter-domain security requirements, the work stalled on some fundamental and evidently irreconcilable disagreements over the issue of the requirements for AS Path security and the BGP-related working drafts from the RPSEC Working Group were never published as RFCs.

Based on the initial RPSEC work on the security of route origination, the IESG chartered the Secure Inter-Domain Routing Working Group (SIDR) in 2006. The charter for this effort presented some issues, in that it was stalled in assuming security requirements for AS Path validation and had to await results from the RPSEC activity.

Given that RPSEC was unable to agree on a requirement for AS Path security then the initial work in SIDR was concentrated on securing the origination of routing information, rather than its propagation through the inter-domain space. Notably, in retrospect, SIDR was also constrained from making any changes to the BGP protocol, implying that any security framework applied to the operation of BGP was to be positioned as an overlay rather than a basic change to the BGP protocol itself. This turned out to be a very important decision as it precluded some design decisions that would turn out to be critical for the SIDR design work.

The initial SIDR products were a collection of specifications that described a profile for a PKI for IP addresses and AS numbers (the RPKI), as well as a model for publication and maintenance of local cache, discussed earlier in Part 1 of this survey. From this foundation, the SIDR Working Group moved on to Route Origination Validation (ROV).

Route Origination Validation

ROV builds upon the earlier work in the Routing Registry effort, where a prefix holder is able to publish information as to how an address prefix is to be announced into the routing system by nominating the AS number(s) that are permitted to originate a routing announcement for the prefix. In the RPKI framework, this information is published as a signed Route Origin Authorization (ROA) (RFC 6482, RFC 6483).

A ROA is signed by a prefix holder and denotes permission given by the address prefix holder for an AS to originate a route.

There are a number of additional implications associated with publishing a ROA. The first is that no other ASes have permission to announce that prefix when there is a cryptographically valid ROA extant in the RPKI system. If the prefix holder wishes to authorize multiple ASes to originate a route for this prefix, then the prefix holder must generate multiple ROAs. This means that an address holder can declare that a prefix should not be routed at all by issuing a ROA that provides permission to AS0.

Secondly, the ROA denies permission for any AS to originate a prefix that is more specific than the prefix listed in the ROA. There is a MaxLength attribute of a ROA that may be used to define a range of more specific prefix lengths that are permitted by a ROA. Thirdly, there is no acknowledgment of the ROA on the part of the AS. A prefix holder may publish a ROA providing permission to an AS who is unaware of the permission.
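These rules can be condensed into a short origin validation function. The Python sketch below follows the three-state outcome (valid, invalid, not found) used in RFC 6811 and assumes that the ROAs have already been cryptographically validated and reduced to `(prefix, max_length, asn)` tuples. The function name and tuple layout are choices made for this example.

```python
import ipaddress


def rov_state(route_prefix, origin_asn, roas):
    """Classify a route as 'valid', 'invalid' or 'not-found'.

    `roas` is an iterable of (prefix, max_length, asn) tuples taken from
    ROAs that have already passed RPKI validation.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA is relevant only if its prefix covers the route's prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # AS0 never matches, so an AS0 ROA can only make routes invalid.
        if roa_asn != 0 and roa_asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    return "invalid" if covered else "not-found"
```

For example, with a single ROA of `("192.0.2.0/24", 24, 64500)`, an announcement of 192.0.2.0/25 from AS64500 is classed as invalid because it is more specific than the MaxLength allows, while an announcement of a prefix that no ROA covers is classed as not found rather than invalid.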

There is no symmetric instrument in the RPKI framework relating to the AS holder. An AS holder does not have the ability to issue a signed attestation that lists all the prefixes that it intends to originate in the routing system.

There is one more important component of the ROV framework, namely the RPKI To Router protocol (RTR) (RFC 8210). This protocol allows a crypto engine to be removed from a router and operate on a dedicated platform. The result of this local processing of ROA data is expressed in the form of a filter list, and this filter list is implemented as a shared state between an RTR server and one or more RTR client routers. This mechanism offloads most of the RPKI overheads from the router and leaves just a residual filtering function on the router.

BGPsec

The SIDR Working Group commenced work on an extension to BGP that would allow validation of the AS Path attribute in 2011, and the Standards Track specification of BGPsec (RFC 8205) was published in 2017.

Unlike ROV, BGPsec is not implemented in an off-router mode but is implemented through the definition of non-transitive BGP AS Path attributes. These attributes carry the digital signatures produced by the AS that propagates a BGP UPDATE message. These signatures, signed by the AS, provide confidence that every AS listed in the AS Path attribute has handled the propagation of this prefix, that the order in the AS path is the exact order of propagation of the UPDATE message through the inter-domain routing space, and that each AS listed has explicitly authorized the propagation of an UPDATE message to its eBGP peer.

BGPsec appears to be solidly based on the concepts described in the earlier sBGP work (RFC 4301). In essence, each eBGP speaker generates a digital signature that covers the information it received (including that digital signature) and the AS number to whom this UPDATE is to be sent (Figure 4).

Diagram showing how BGPsec handles AS Path Signature structure. — Figure 4 — BGPsec handling of AS Path Signature structure.

There is a wealth of detail behind this simple summary, but it can be summarised by the observation that this mechanism ties the AS path in the UPDATE message to the sequence of ASes that handled the propagation of the route object. See RFC 8374 for a detailed exposition of BGPsec’s design decisions.
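The nesting of signatures can be illustrated with a short Python sketch. One assumption needs to be stated loudly: the sketch uses HMAC with per-AS demonstration keys purely as a stand-in for the per-AS public-key signatures of BGPsec, because the Python standard library has no asymmetric signing primitive. It therefore shows only how each signature covers the previous one and the target AS, not how a third party would verify it against RPKI router certificates. All names and the message encoding are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: shared demo keys replace the private keys that BGPsec
# speakers would hold; real verification uses public keys instead.
DEMO_KEYS = {64500: b"key-64500", 64501: b"key-64501", 64502: b"key-64502"}


def sign_hop(prefix, signer_asn, target_asn, prev_sig):
    """Signature of one AS over the prefix, the target AS and the prior signature."""
    msg = b"|".join([prefix.encode(), str(signer_asn).encode(),
                     str(target_asn).encode(), prev_sig.hex().encode()])
    return hmac.new(DEMO_KEYS[signer_asn], msg, hashlib.sha256).digest()


def propagate(prefix, hops):
    """Build the signature chain for an UPDATE.

    `hops` lists ASNs in propagation order: origin first, receiver last.
    """
    chain, prev = [], b""
    for signer, target in zip(hops, hops[1:]):
        prev = sign_hop(prefix, signer, target, prev)
        chain.append((signer, target, prev))
    return chain


def verify_chain(prefix, chain, receiver_asn):
    """Check that the chain is unbroken and ends at the receiving AS."""
    prev, expected_signer = b"", None
    for signer, target, sig in chain:
        # Each signer must be the AS that the previous hop signed towards.
        if expected_signer is not None and signer != expected_signer:
            return False
        if not hmac.compare_digest(sig, sign_hop(prefix, signer, target, prev)):
            return False
        prev, expected_signer = sig, target
    return expected_signer == receiver_asn
```

Because every signature covers its predecessor, removing or reordering an AS breaks every later signature in the chain, and because the target AS is covered, a signed UPDATE cannot be replayed to a different neighbor.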

Step wise, AS Path validation cannot tolerate AS Sets in this approach, nor AS Confederation Sets, which are in the process of being deprecated in response to this limitation (RFC 6472). In a similar vein, BGP Route Reflectors require special processing, as do private AS numbers.

There are a number of consequences of this design approach.

The first, and perhaps the most important consequence, is that piecemeal incremental deployment is simply not possible in BGPsec. When an UPDATE is passed from a BGPsec BGP speaker to a non-BGPsec BGP speaker all BGPsec attributes are lost. This means that if the UPDATE is further propagated to a BGPsec BGP speaker the initial BGPsec information is unavailable. In today’s Internet, the consequences of this highly constrained deployment scenario are prohibitive factors for adoption.

This approach also places a high crypto processing load on BGPsec-aware BGP speakers. There is some scepticism that this is a feasible impost on the Internet’s routing infrastructure, and this scepticism guided the design of the ROV RTR approach. However, for BGPsec not only are routers expected to process the BGPsec messages but also hold secure private keys to perform signing on the fly for outgoing UPDATE messages.

Thirdly, while this approach can provide some assurance regarding the ‘correct’ operation of the BGP protocol and can detect efforts to tamper with update messages, there is no protection against spurious WITHDRAW messages, no ability to ascertain the alignment of the route object with the network’s forwarding state and no protection of alignment of the UPDATE with the policy state. In other words, route leaks can still occur in BGPsec.

In summary, BGPsec represents a relatively high overhead to pay for a limited set of assurances and a limited protective capability. Furthermore, there is a more extreme view that BGPsec cannot achieve any of the security properties due to the fundamental design principles of BGP and BGPsec. In one research paper, it is asserted that in BGPsec, routes can still be hijacked, and routing loops can still appear. The paper’s authors hope to stimulate further dialogue to rethink the fundamental tenets of BGP and BGPsec designs by publishing their analysis of the observed shortcomings of BGPsec.

Autonomous System Provider Authorization (ASPA)

The issue with the overall SIDR approach to BGP security is that if BGPsec is impractical then we cannot rely on ROV alone. All a determined routing attacker needs to do is tack on the originating AS to a synthesized AS path and any AS sequence can be placed in the AS Path attribute of a synthetic route.

ROV represents a substantial effort to get the infrastructure deployed, but without any form of AS Path protection, the level of protection offered by ROV is minimal at best. The conclusion is that ROV needs to be accompanied by some form of AS Path validation if it is to be useful.

There have been a number of proposals to address this shortfall. An interesting approach is Peer Locking, which is based on the observation that the core of the routed Internet is a small set of Tier 1 ASes, and no customer of an AS should be announcing a route where the AS path includes any of these Tier 1 networks. Secondly, no more than two of these Tier 1 ASes should appear in any AS path, and if there are two such ASes in the AS path they should be adjacent. This approach does not necessarily catch much in the way of deliberate efforts to generate a synthetic AS path, but it can be effective in catching a number of common forms of route leaks, and its implementation is quite simple and very lightweight.
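Both Peer Locking rules reduce to simple checks on the AS path, as the following Python sketch suggests. The `TIER1` set uses documentation-range AS numbers as hypothetical placeholders for a real list of Tier 1 networks, and the function names are invented for this example. Prepended (repeated) AS numbers are collapsed first so that prepending is not mistaken for several distinct ASes.

```python
# Hypothetical Tier 1 list, using AS numbers reserved for documentation.
TIER1 = frozenset({64496, 64497, 64498})


def collapse_prepends(as_path):
    """Remove consecutive repeats of the same ASN (AS path prepending)."""
    out = []
    for asn in as_path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def accept_from_customer(as_path, tier1=TIER1):
    """A route learned from a customer should contain no Tier 1 AS."""
    return not any(asn in tier1 for asn in as_path)


def tier1_path_plausible(as_path, tier1=TIER1):
    """At most two Tier 1 ASes may appear, and two must be adjacent."""
    path = collapse_prepends(as_path)
    positions = [i for i, asn in enumerate(path) if asn in tier1]
    if len(positions) > 2:
        return False
    if len(positions) == 2 and positions[1] - positions[0] != 1:
        return False
    return True
```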

Can we do better?

In what appears to be a replay of the situation from around 2000 when soBGP was proposed as a lighter weight response to the crypto load associated with sBGP in the area of AS Path validation, there has been a proposal to use RPKI-signed AS adjacency attestations as a response to the issues with BGPsec.

There is a slight twist on this, however, that differs from soBGP, in that an element of routing policy is also used in the ASPA proposal. Instead of an AS listing its adjacent ASes in the inter-domain routing space and requiring both ASes to list each other as BGP neighbors before accepting the AS adjacency as valid, the ASPA framework requires an AS to list only its adjacent ASes that act in a transit provider role to the issuing AS. Given that a common criticism of BGPsec, sBGP and soBGP was that these proposals were incapable of identifying route leaks (as route leaks represent a violation of route policy as distinct from a violation of the BGP protocol itself), ASPA provides a means of identifying such route leaks.

The ASPA relationship is a graph fragment in the directed graph which describes the inter-AS topology. The property used by the ASPA proposal is described as ‘valley-free’ AS Paths. All AS Paths can be characterised by zero or more paired relationships from Customer-to-Provider (up), zero or one Peer-to-Peer relationship (flat) and zero or more Provider-to-Customer relationships (down). In other words, all viable AS Paths are a sequence of customer-to-provider (up) AS pairs, then a peer AS pair, then a set of provider-to-customer (down) AS pairs. Any AS sequence that contains a down then an up (or a ‘valley’) represents a customer AS leaking routes learned from one provider to another (Figure 5).

Diagram showing ASPA and route leaks. — Figure 5 — ASPA and route leaks.
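The valley-free property lends itself to a compact check. The Python sketch below is a simplification and not the verification procedure of the ASPA drafts: it assumes that every AS in the path has published a complete provider set, held here in a hypothetical `providers` mapping, and it treats any hop where neither AS lists the other as a provider as a peer-to-peer hop.

```python
def hop_direction(a, b, providers):
    """Classify the hop from AS a to AS b as 'up', 'down' or 'peer'."""
    if b in providers.get(a, set()):
        return "up"    # customer-to-provider
    if a in providers.get(b, set()):
        return "down"  # provider-to-customer
    return "peer"      # assumption: neither lists the other as a provider


def is_valley_free(path, providers):
    """Check a path given in propagation order, origin first.

    Valid shape: zero or more 'up' hops, at most one 'peer' hop, then
    zero or more 'down' hops.
    """
    phase = 0  # 0 = still climbing, 1 = peer hop taken, 2 = descending
    for a, b in zip(path, path[1:]):
        direction = hop_direction(a, b, providers)
        if direction == "up":
            if phase != 0:
                return False  # climbing again after peer or down: a valley
        elif direction == "peer":
            if phase != 0:
                return False  # second peer hop, or peer hop after descending
            phase = 1
        else:
            phase = 2
    return True
```

As an example, if AS64501 lists both AS64510 and AS64511 as its providers, then a route that passes from AS64510 down to AS64501 and back up to AS64511 contains a valley and is flagged as a leak by AS64501.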

ASPA requires that any AS that issues an ASPA object complies with the constraint that the providers listed in an AS’s ASPA are the complete set of providers for that AS.

ASPA still provides some benefit even in scenarios of partial deployment. Once an AS issues an ASPA then a routing attacker can only include this AS in a synthetic AS Path attribute if it also includes an adjacent provider AS, and the synthetic AS pair can only be inserted in the ‘front part’ of the AS path (Customer-to-Provider) if the order is preserved, and in the ‘back part’ of the AS path (Provider-to-Customer) in reverse order. Like soBGP, the use of ASPAs does not necessarily prevent the synthesis of AS paths by a routing attacker, but it limits what can be used to make such synthetic paths, and the greater the use of ASPAs the more it becomes the case that the only AS paths that can be synthesized are viable BGP AS paths in any case. soBGP termed this constraint AS Path Plausibility, and the same condition applies to ASPA.

It’s evidently still early days for ASPA and after three years the work remains a study item in the SIDROPS Working Group of the IETF. Part of the issue here is that the SIDROPS Working Group has turned its attention to the operational aspects of the RPKI, taking on the role of the RPKI operational maintenance working group, and has had its collective attention diverted from the issues of BGP security mechanisms and AS Path validation. And in the area of RPKI operations the topic that is taking up the Working Group’s attention is not the PKI itself, but the ongoing ramifications of the original design decision to use an out-of-band client-pull credential distribution mechanism for RPKI distribution. The emerging observation is that this original design choice is sufficiently flawed that the efforts in the working group to adjust the parameters of this distribution system will in all likelihood be unable to adequately address the operational issues that accompany scaling up the use of the RPKI credential system.

It may be productive at this point in time to re-open the question of how to use BGP itself to perform a just-in-time push-based distribution of BGP security credentials, but within the structure of the IETF, it is difficult for an operationally focussed working group to perform protocol development work. However, it’s equally difficult for the IETF to reopen a protocol design effort on BGP security so soon after the closure of the original SIDR effort. The protracted and painful saga of the DNSSEC development effort in the IETF is one that many participants in the IETF are unwilling to repeat for BGP security.

Open questions on securing BGP

It appears to some observers that no current solution to routing security has found an adequate balance between appropriate security and acceptable deployment overhead, and that’s an observation that I can agree with. We are just not there yet.

Current research on BGP performance is focused on topics related to scalability, convergence times, stability and consistency, while the questions on security research have been focused on the integrity, authenticity, authority and verifiability of routing information. These two fields of research are inherently connected, in that a more stable routing system that was able to provide clear indications when convergence to a stable routing state had been achieved is believed to also provide clear indications of when verification of routing information is appropriate.

In exploring the threat model for BGP it is noted that BGP was designed to support inter-domain routing between trusted networks, while today’s networks operate in a looser confederation that does not exhibit the same mutual trust properties. Not only are the TCP sessions used by BGP vulnerable to attack, and the messages used by BGP vulnerable to alteration in order to disrupt the network’s routing system, but the integrity of the operation of BGP is also threatened by misconfiguration, where incorrect information is injected into the routing system unintentionally, and by router vulnerabilities where a compromised routing system can exploit its trusted role and intentionally inject false information into the routing system.

Some of these attacks are intended to cause a BGP speaker to be overwhelmed and reset, as BGP is a method of directly accessing a router’s processing unit and a saturation attack can cause processor and memory overload. Other attacks are aimed at altering the router’s forwarding state, generating an incorrect or unintended forwarding state for one or more prefixes. Other forms of attack are aimed at causing a BGP speaker to become unstable and thereby disrupt the forwarding function and impact on applications. A BGP session that is being continually reset will cause large local traffic bursts as neighbouring BGP speakers continually resend their routing tables upon each reset, but the continued instability will trigger a flap damping response in other BGP speakers.

The factors that contribute to these vulnerabilities include a lack of BGP message integrity checks, an as yet partial ability to check the authority of an originating AS to actually originate an advertisement for a prefix, and an inability to verify the accuracy, completeness and authenticity of AS Path attributes of a routing advertisement. The use of the RPKI to support address attestations, as in ROAs, provides a very robust means of detecting incorrect origin route objects, as long as the RPKI itself is accurately aligned to the address distribution framework and as long as the RPKI is generally, if not universally, used.
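As an illustration of how ROA-derived data is used to detect an incorrect origin, the following is a minimal Python sketch of the origin validation outcomes described in RFC 6811 (valid, invalid, not found). The `VRP` tuple, the `validate_origin` function and the sample data are hypothetical names and values chosen for this example, and the sketch leaves out details such as cryptographic validation of the ROAs themselves.

```python
import ipaddress
from typing import List, NamedTuple


class VRP(NamedTuple):
    """A validated ROA payload: prefix, maximum prefix length, origin AS."""
    prefix: str
    max_length: int
    asn: int


# Illustrative data using documentation prefixes and AS numbers.
VRPS: List[VRP] = [
    VRP("192.0.2.0/24", 24, 64500),
    VRP("203.0.113.0/24", 26, 64501),
]


def validate_origin(route_prefix: str, origin_asn: int, vrps: List[VRP]) -> str:
    """Return "valid", "invalid" or "not-found" for a route announcement."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # A VRP covers the route if the route lies within the VRP prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if the length and the origin AS both agree.
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    # Covered but never matched means invalid; no covering VRP means not found.
    return "invalid" if covered else "not-found"


print(validate_origin("192.0.2.0/24", 64500, VRPS))
print(validate_origin("192.0.2.0/25", 64500, VRPS))
print(validate_origin("198.51.100.0/24", 64500, VRPS))
```

The ‘not found’ outcome is what makes general use of the RPKI matter: a prefix with no covering ROA cannot be distinguished from a legitimate announcement, so the detection only works for address space whose holders have published ROAs.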

In contrast, robust solutions to the problem of AS Path authentication have been elusive so far. BGPsec provides a robust method of path validation but has been assessed to be significantly expensive in terms of processor and memory cost, detrimental to BGP convergence times, and dependent on comprehensive adoption to be effective. Efforts to substitute AS Path plausibility in place of actual AS Path validity, as is the case with ASPA, offer a different level of robustness that appears to be more practically achievable.

The study of approaches to securing BGP has raised several questions about the behaviour of inter-domain routing and the most effective approach to securing BGP. These questions include consideration of security topics and raise the issue of whether it is possible to secure the routing information to the extent that the routing information being presented is tightly aligned to the associated forwarding state.

Is it possible to secure this association of routing information to the chained forwarding state? Can a BGP speaker validate not only that the AS path presented in a BGP route advertisement matches the propagation path taken by the prefix advertisement, but also that the network’s current forwarding state to reach the address prefix is aligned to this AS path? To put it simply, can a router validate that a route matches the forwarding path? This question is not one that is directly addressed within any of the current set of inter-domain routing security measures.

A related issue concerns the overheads of securing BGP and the scaling properties of BGP. Is BGP too monolithic a protocol even before adding security capabilities? BGP simultaneously performs the functions of exchanging reachable prefixes, maintaining an inter-domain network topology, binding prefixes to paths, and implementing routing policy. Would inter-domain routing be more scalable if these functions were to be performed by separate protocols? Adding security and authentication within BGP, as in the sBGP model, increases the complexity of the protocol and may diminish its long-term prospects for scalability across ever larger and denser inter-domain topologies. At the same time, using a separate mechanism to flood security credentials in a manner that is entirely distinct from BGP itself, as used in the ROV framework, becomes a source of additional operational complexity and potential vulnerability, even though the BGP protocol itself is unaltered.

There are several practical and some more fundamental questions relating to securing BGP.

The first is a practical question relating to the inevitable design trade-off between the level of security and the performance overheads of processing security credentials. The question concerns what aspects of securing BGP should be considered essential and what is simply desirable, but not essential. Our level of understanding as to what aspects of BGP performance and load are critical for the robust operation of network applications and what is not so critical appears to be less than comprehensive. The impact of performance trade-offs in BGP in terms of time to converge, the size of the routing space, the router memory and processing load and scaling capability is not understood well enough for there to be a commonly accepted answer here.

The next question is whether verification of the correct operation of the BGP protocol is sufficient, or whether the policy intent of the routing environment is equally critical. For example, if a stub network were to leak the routes it learned from one transit network to another transit network this route leak would, in the normal situation, be regarded as contrary to routing policies, but there is no violation of the BGP protocol itself. If we want to also include alignment to routing policies then the question arises as to how such policies are to be expressed, who has the authority to express them, and how BGP speakers reconcile local routing policies with external routing policies when the policies differ.
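The stub network example can be expressed as a simple export rule. The sketch below assumes the commonly cited ‘valley-free’ convention, which is not something BGP itself enforces: routes learned from a customer may be passed to any neighbour, while routes learned from a peer or a provider should only be passed to customers. The `Rel` enumeration and the `export_is_leak` function are hypothetical names for this illustration, and locally originated routes are not modelled.

```python
from enum import Enum


class Rel(Enum):
    """The business relationship of a neighbour, as seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def export_is_leak(learned_from: Rel, export_to: Rel) -> bool:
    """Return True if exporting the route would break the valley-free convention."""
    if learned_from is Rel.CUSTOMER:
        # Customer routes can be announced to customers, peers and providers.
        return False
    # Peer and provider routes should only be announced to customers.
    return export_to is not Rel.CUSTOMER


# The stub network case: a route learned from one transit provider
# is announced to another transit provider.
print(export_is_leak(Rel.PROVIDER, Rel.PROVIDER))
```

The check depends entirely on knowing the relationship with each neighbour, which is local policy knowledge that BGP does not carry. This is the gap that the questions about how policies are expressed, and by whom, are pointing to.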

The next question is whether securing the operation of the BGP protocol (securing the control plane) is sufficient in and of itself to adequately mitigate the vulnerabilities in the overall routing system, or whether it is also necessary to include mechanisms that extend the security model to validate that the routing information represents current forwarding state in each routing element in the network (securing the data plane). One perspective on this is that securing one element of the system with multiple components does not necessarily address the underlying vulnerabilities of the entire system. The more common outcome is that such work exposes the residual vulnerabilities in other components and that an effective security system needs to address all components of the routing system. While it may be possible for a BGP speaker to be able to validate that the originating AS did indeed originate the prefix advertisement and that the AS path accurately represents the propagation path of this advertisement through the network, that is not the basic question in terms of the properties of the overall system.

The more basic question here is whether a BGP speaker can verify that, when it forwards a packet to the next hop along the path that the routing system indicates is optimal for a destination, this is indeed the optimal local choice, and that this next hop decision passes the packet ‘closer’ to the destination address.

If a comprehensive security framework is proving to be elusive in terms of deployment considerations, then could a less comprehensive approach offer acceptable outcomes? Many security frameworks demonstrate a profile of diminishing returns, where the incremental cost of deployment of additional security capabilities increases, while the incremental benefit in terms of risk mitigation decreases. In the case of securing BGP, could an approach of reducing the security credential generation and validation workload, through reducing the amount or timeliness of validated information, represent an acceptable trade-off? We see a practical form of this question today, where the capabilities offered by ROV can mitigate some forms of routing incidents but are ineffectual against other forms of route manipulation that preserve the origination data. Practically, is this enough? Or do we need to also deploy some mechanism that allows detection of various forms of AS Path manipulation? A similar question relates to the comparison of the earlier soBGP and sBGP models. Is Path Plausibility sufficient? Did the mechanisms of soBGP exercise sufficient levels of constraint such that any synthesized path is close enough to a viable network path that the difference is of little consequence from a security perspective? This question is being replayed today when we consider the relative merits of the ASPA approach against the heavier weight of BGPsec’s fully signed AS Path attribute.

A final question here concerns the practicalities of deployment. The Internet is now far too large to sustain the concept of a Flag Day for the deployment of any technology. And it is not possible to assume that a technology would be universally adopted without a protracted period of piecemeal deployment as part of a transitional interval. Indeed, as the Internet continues to grow and the diversity within the Internet increases, the anticipated transitional periods become indefinite, and piecemeal deployment becomes a continuing factor rather than a temporary transitional factor. The questions this exposes include whether it is even possible to deploy high integrity security using partial deployment scenarios, and whether the BGP protocol is too incomplete in terms of its information distribution properties to allow robust validation of the intended forwarding state. Does securing forwarding require carrying, in addition to routing information, further information about the coupling of routing and forwarding state that would be entirely impractical to carry in a partial deployment scenario?

Conclusions

BGP has proven surprisingly resilient in terms of its longevity of useful operational life, despite early predictions of its imminent demise in favour of IDRP. BGP-4 has routed the inter-domain Internet since late 1993 and the number of routed elements for the IPv4 Internet ‘default-free zone’ has grown from under 20,000 distinct prefixes to some 1,000,000 distinct prefixes by the middle of 2021, and a further 130,000 prefixes in the IPv6 network.

Despite the changes in the IPv4 address infrastructure due to exhaustion of the registry free pools, the growth in the number of routed IPv4 prefixes appears to continue unabated, and together with the continued deployment of IPv6, these numbers are expected to continue to rise in the coming years.

Due to its extensibility and large installed base, BGP-4 will likely remain the only inter-domain routing protocol in the foreseeable future for the Internet (although the term ‘foreseeable’ is prudently measured in units of years and perhaps not in decades). So far BGP has not changed in any substantive manner, including in its security properties.

There is ample evidence from reports of the use of unregistered addresses or of ‘routing incidents’ that BGP is the subject of various forms of accidental inattention and possibly deliberate forms of abuse.

Current efforts to mitigate these forms of abuse in the inter-domain routing space appear to be less than fully adequate, and the ease with which unauthorized or bogus route objects can be injected into the inter-domain routing system remains a continuing threat to the security, stability and utility of the Internet.

We appear to be getting very comfortable in operating a network that experiences a continuing stream of routing incidents, both intentional and unintentional, and the longer this situation persists the more we are resigned to just accept this as the status quo for the Internet and place the onus on applications and content distribution systems to defend themselves from routing attack. Like many unintended outcomes, it’s not the outcome we would prefer to have, nor is it necessarily the optimal outcome in terms of collective cost and benefit, but it’s the outcome many of us have simply accepted.

All change comes at a price, and the more we resign ourselves to operating networks in the face of a poorly secured routing system the greater the effort required to make the case that the cost of a change to improve this situation will be money and effort wisely spent.

