The adoption call for draft-ymbk-idr-bgp-open-policy has started at the IETF IDR working group. This draft covers the core part of our route leak mitigation initiative: it describes the use of BGP roles and a new non-transitive attribute, iOTC, which should be used to prevent route leaks in a fully automated way.
We invite network engineers to share their support or objections on the IDR mailing list. Depending on this feedback, work on this draft will either continue toward RFC status or be dropped. To provide feedback you need to:
Subscribe to the IDR mailing list at the subscription page;
Share your support or objections on the mailing list at email@example.com;
Please also read the thread in advance to avoid repeated questions.
Hello. My name is Alexander Azimov, I am from Qrator Labs, and today I am going to give you an update on the route leak topic. This topic is not new: it has already been presented here a few times, sometimes by me. However, in case there are newcomers, and I hope there are, I will start with the definition of route leaks and their consequences.
So, for the purpose of this talk, I will speak only about transit route leaks. A route leak is a situation when a prefix is learned from one provider or peer and announced to another provider or peer. So the real question is: why should you care? Why should you care that, for example, one autonomous system accepted your prefix from one provider and announced it to another one?
Unfortunately, you should, because these additional hops typically increase your network delays. They also open the door to man-in-the-middle attacks. And do you want to make your global connectivity and your availability dependent on a network that has already proved to be unmanaged?
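The definition above can be written as a small predicate. This is only a sketch: the `Relation` enum and the `is_transit_leak` function are names made up for illustration and do not come from any BGP implementation.

```python
from enum import Enum


class Relation(Enum):
    """What the neighbor is to us (hypothetical names for illustration)."""
    PROVIDER = "provider"
    PEER = "peer"
    CUSTOMER = "customer"


def is_transit_leak(learned_from: Relation, announced_to: Relation) -> bool:
    """Return True if re-announcing the prefix matches the leak definition."""
    provider_or_peer = {Relation.PROVIDER, Relation.PEER}
    return learned_from in provider_or_peer and announced_to in provider_or_peer


# Learned from one provider, announced to another provider: a leak.
print(is_transit_leak(Relation.PROVIDER, Relation.PROVIDER))  # True
# Learned from a customer, announced to a provider: normal transit.
print(is_transit_leak(Relation.CUSTOMER, Relation.PROVIDER))  # False
```

Routes learned from customers may go anywhere, and anything may be sent to customers; only the provider/peer to provider/peer direction counts as a leak here.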
So, let’s see which networks are affected, and how.
On the slide, you see the statistics that we gathered in April. As you know, there are thousands of route leaks on a daily basis. The set of affected networks changes from day to day, so during April we saw more than 40,000 different prefixes that were in route leaks. However, in reality, the number of affected networks is higher, because a route leak is a double-edged sword. It affects not only the leaked prefixes but also all networks that accept the leaked prefixes. What are these effects? You would be surprised, but they are exactly the same. If you accept leaked prefixes, you redirect your traffic and the traffic of your customers through these unmanaged networks, with just the same results. So, how many networks are affected by this?
Nearly all of us. So, every day, almost every network accepts at least one route leak.
So, the problem is global. We are all affected. So could we ask a real question? Who are the leakers?
I would put deliberately malicious leaks out of scope. They do exist, but the majority of leaks are the result of misunderstanding and mistakes. However, the number of leakers, that is, autonomous systems that seem to have problems with their configuration, is high. During April, there were more than 1,000 ISPs that were seen as the source of leaks.
Also, as you can see from the trend, if we keep counting unique leakers over a longer period, we will find that the number is even higher.
So, what can we do? Okay, we can try to reach those guys, we can try to educate them. And if they have goodwill, they may fix it, or they may not. So we should focus on the technical side of this route leak problem.
So what do we have on the table now?
Of course, we have communities. So if you set a proper community and after this set an appropriate filter, you would prevent your network from leaking. The problems here are ‘if’ and ‘proper’. There is no verification and no built-in support in the protocol, so it is a standard BGP issue: it is too flexible. The result of this flexibility is thousands of leaks.
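A minimal sketch of the community approach, and of how one forgotten line of configuration defeats it. The community value `64500:100`, the `Route` class and both function names are assumptions made up for this illustration; real deployments pick their own values and express this in router policy language.

```python
from dataclasses import dataclass, field

# Hypothetical community meaning "learned from a provider or peer".
FROM_UPSTREAM = "64500:100"


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def on_import(route: Route, neighbor_is: str, tagging_configured: bool = True) -> Route:
    """Tag routes learned from providers and peers, if the operator configured it."""
    if tagging_configured and neighbor_is in ("provider", "peer"):
        route.communities.add(FROM_UPSTREAM)
    return route


def may_export(route: Route, neighbor_is: str) -> bool:
    """Egress filter: tagged routes may go to customers only."""
    if neighbor_is in ("provider", "peer"):
        return FROM_UPSTREAM not in route.communities
    return True


tagged = on_import(Route("192.0.2.0/24"), "provider")
untagged = on_import(Route("198.51.100.0/24"), "provider", tagging_configured=False)
print(may_export(tagged, "peer"))    # False: the filter stops the leak
print(may_export(untagged, "peer"))  # True: the forgotten tag lets the leak through
```

Nothing in the protocol checks that the tag was applied, which is exactly the 'if' and 'proper' problem.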
Another way, of course, is to somehow build ingress filters to detect leaks and stop their propagation. Maybe the best effort here is to use AS sets, but here we have a problem again. Not all IRRs support AS sets. Moreover, not all AS sets are up to date. Not all AS sets are correct, and you need no authorisation to, for example, add DTAG’s list of customer ASes to your own AS set. However, okay, let’s imagine ourselves in a perfect world where all AS sets are up to date and correct.
Still, it would not solve the general problem of route leaks, because if a route leak happens inside your AS cone, the origin of the route would be correct and you have to accept it. All upper-level upstreams would accept it too.
So, what is your last resort? Your last resort is monitoring. Moreover, this is maybe the only option to detect route leaks that are taking place beyond your borders. Let’s draw an interim conclusion.
Today, if we set up a proper community and proper filtering, we would prevent our network from leaking. If we deploy some ingress filters, we can filter out some route leaks. And monitoring is the only option for you to detect route leaks that are happening beyond your network border. Monitoring can also be used to detect route leaks that you are accepting.
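To make the AS cone argument concrete, here is a rough sketch of an origin-based ingress filter. It assumes the filter only checks whether the origin AS belongs to the customer's AS set; the AS numbers are documentation values and the function name is invented for this example.

```python
def origin_filter_accepts(as_path: list, customer_as_set: set) -> bool:
    """Accept a route from a customer if its origin AS (last in the path) is in the AS set."""
    return as_path[-1] in customer_as_set


# Hypothetical, perfectly maintained AS set of our customer 64501.
customer_as_set = {64501, 64502}

# Legitimate: customer 64501 announces a route originated by its own customer 64502.
legit_path = [64501, 64502]
# Leak: 64501 re-announces the same prefix learned via an outside network 64510.
leaked_path = [64501, 64510, 64502]

print(origin_filter_accepts(legit_path, customer_as_set))   # True
print(origin_filter_accepts(leaked_path, customer_as_set))  # True: the leak passes too
```

Both paths end in the same, correct origin, so a filter built from even a perfect AS set cannot tell them apart.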
However, all these options do not provide you a mechanism to heal automatically. If you detect a route leak, you are not able to fix it. From this point of view, the problem seems to be very complicated.
At the same time, if we speak about peering relationships, they are not complicated at all. We have only four of them. Also, maybe the fifth one which is some complex combination of these four. So from my standpoint, it seems like BGP lacks representation of these native relations inside the protocol. So, to solve the problem, we propose to add BGP roles.
A BGP role is a new configuration option with just these four values. It is negotiated at the start of the BGP session using capabilities in OPEN messages. And what does it mean if negotiation fails? It means that your neighbor, or maybe you, is trying to configure the BGP session in a wrong way. So, there is nothing to do but drop the BGP session.
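The check made during session setup might look roughly like the sketch below. The transcript does not list the four role values, so the names here (provider, customer, peer, internal) and the allowed pairs are assumptions, as are the `Role` enum and the `handle_open` function.

```python
from enum import Enum


class Role(Enum):
    # Assumed role names; each value says what the speaker is to its neighbor.
    PROVIDER = "provider"
    CUSTOMER = "customer"
    PEER = "peer"
    INTERNAL = "internal"


# The role the neighbor must announce for each local role (assumed pairing).
EXPECTED_REMOTE = {
    Role.PROVIDER: Role.CUSTOMER,
    Role.CUSTOMER: Role.PROVIDER,
    Role.PEER: Role.PEER,
    Role.INTERNAL: Role.INTERNAL,
}


def handle_open(local_role: Role, remote_role: Role) -> str:
    """Compare the role received in the neighbor's OPEN with our configuration."""
    if EXPECTED_REMOTE[local_role] is remote_role:
        return "establish"
    return "drop"  # one of the two sides is misconfigured


print(handle_open(Role.PROVIDER, Role.CUSTOMER))  # establish
print(handle_open(Role.PROVIDER, Role.PROVIDER))  # drop
```

Because both sides must agree, a wrong role on one end is caught by the other end before any routes are exchanged.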
So, I think that roles are actually native. Roles do not reveal anything to third parties, so there is nothing to worry about. Also, roles have many applications: they could be used to automate several mechanisms that previously had to be configured manually.
First of all, route leak prevention. Once we have set up roles, we add one more attribute, called the “internal only-to-customer” attribute (iOTC). It has zero length, so it is just a flag, and it is set on all routes that are learned from providers and peers. Moreover, we can also set up automatic filters that drop routes carrying this attribute when we are announcing prefixes to other providers and peers.
So, I hope you see — this is just an automated version of communities without any fat fingers inside, with no mistakes.
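As a rough model of the mechanism just described, the flag can be set on import and checked on export purely from the configured role of the neighbor. The `Route` class, the function names and the string role values are assumptions made for this sketch; it is not the code of the BIRD fork.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    iotc: bool = False  # zero-length attribute modeled as a flag


def on_import(route: Route, neighbor_is: str) -> Route:
    """Set the flag on every route learned from a provider or peer."""
    if neighbor_is in ("provider", "peer"):
        route.iotc = True
    return route


def may_export(route: Route, neighbor_is: str) -> bool:
    """Never announce a flagged route to another provider or peer."""
    return not (route.iotc and neighbor_is in ("provider", "peer"))


from_provider = on_import(Route("192.0.2.0/24"), "provider")
from_customer = on_import(Route("198.51.100.0/24"), "customer")

print(may_export(from_provider, "peer"))      # False
print(may_export(from_provider, "customer"))  # True
print(may_export(from_customer, "provider"))  # True
```

Unlike the community version, there is no separate tagging step for the operator to forget: the role alone drives both the marking and the filter.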
So here we can solve the problem of leak prevention in a fully automated way. However, we can also address the problem of leak detection.
Please meet the “external only-to-customer” attribute (eOTC). It is 4 octets long, and its value is the number of the autonomous system that set it. So, if a route is announced to a customer or peer, an autonomous system should set this attribute to its own autonomous system number. This value should not be changed afterwards, and so in this scenario, it helps autonomous system 3 to detect a route leak that was made by autonomous system 2. It also seems to be very simple: it just works.
So, what should we do if we detect a route leak? It seems that we should filter it out, maybe drop the BGP session, but in reality, we should be very careful with the level of our aggression.
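The detection side could be sketched as follows. The setting rule comes from the talk; the exact check on receipt is not spelled out here, so the rule in `looks_like_leak` is an assumption, and so is the small topology (AS 1 is a provider of AS 2, which wrongly re-announces the route to AS 3). Function names are invented for this example.

```python
from typing import Optional


def eotc_on_export(eotc: Optional[int], local_asn: int, neighbor_is: str) -> Optional[int]:
    """Set eOTC to our ASN when announcing to a customer or peer; never overwrite it."""
    if eotc is None and neighbor_is in ("customer", "peer"):
        return local_asn
    return eotc


def looks_like_leak(eotc: Optional[int], neighbor_asn: int, neighbor_is: str) -> bool:
    """Assumed check: a route already sent 'down' must not come back from below or sideways."""
    if eotc is None:
        return False
    if neighbor_is == "customer":
        return True
    if neighbor_is == "peer":
        return eotc != neighbor_asn
    return False


# AS 1 announces to its customer AS 2 and sets eOTC = 1.
eotc = eotc_on_export(None, local_asn=1, neighbor_is="customer")
# AS 2 leaks the route to AS 3; the value stays unchanged.
eotc = eotc_on_export(eotc, local_asn=2, neighbor_is="provider")

# AS 3 receives the route from AS 2, which is its customer or its peer.
print(looks_like_leak(eotc, neighbor_asn=2, neighbor_is="customer"))  # True
print(looks_like_leak(eotc, neighbor_asn=2, neighbor_is="peer"))      # True
```

A route received from a peer that set the attribute itself is not flagged, since the value then equals the peer's own AS number.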
That is because detection is based on a transitive attribute. Like any other transitive path attribute, it could be altered along the way. That is why, instead of filtering or dropping sessions, we should only deprioritize the route by lowering its local preference value. That is all. This would also normally prevent your AS from propagating route leaks.
We have already made an implementation using a fork of the BIRD routing daemon, which you can find on GitHub. So as you can see, there aren’t many lines you need to configure to automatically prevent route leaking and detect route leaks that are made by third parties.
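A small sketch of what deprioritization means for path selection. The penalty value, the `Candidate` class and the function names are hypothetical, and real best-path selection has many more steps than local preference alone.

```python
from dataclasses import dataclass

LEAK_PENALTY = 150  # hypothetical amount subtracted from local preference


@dataclass
class Candidate:
    via: str
    local_pref: int
    suspected_leak: bool = False


def effective_local_pref(c: Candidate) -> int:
    """Lower the preference of a suspected leak instead of rejecting the route."""
    if c.suspected_leak:
        return max(0, c.local_pref - LEAK_PENALTY)
    return c.local_pref


def best_path(candidates: list) -> Candidate:
    """Pick the candidate with the highest effective local preference."""
    return max(candidates, key=effective_local_pref)


candidates = [
    Candidate(via="customer", local_pref=200, suspected_leak=True),
    Candidate(via="provider", local_pref=100),
]
print(best_path(candidates).via)      # provider
print(best_path(candidates[:1]).via)  # customer
```

The suspected leak loses to any clean alternative, but it is still usable when it is the only route, so a falsely set attribute cannot cut off connectivity.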
So, for me, it seemed to be a cool idea. We have a general solution for route leak problem. It is in code. There are no fat fingers inside. It is fully automated. It is also verified by your neighbor, and the open messages guarantee that your role setting is correct. That is why we decided to move to IETF. However, it proved to be an interesting but hard and long road.
I would like to give special thanks here to Randy Bush, because without his help I think I would have already given up. Today, we have two drafts. The first one covers roles and iOTC. The second one covers eOTC. So I hope they will finally be adopted and we will see roles in the release notes of routing software shortly.
There are also several honorable mentions. One of them is the BGP reject draft by Job Snijders, which changes the default behavior of a BGP router: if export or import policies are absent, there’ll be no exchange of announcements.
There is also a competitor of eOTC made by other guys. Also, I would like to highlight that the only document that has RFC status here is an informational document. This document just describes what a route leak is and gives us a classification of route leak types.
So, here is a question: should we blame the IETF for this slow motion? You know, I see here, I think, a hundred people, and it seems that you are all interested in routing, I hope so, but I think not all of you are on the IETF mailing list. So, instead of blaming the IETF, collaborate with the IETF. This is the point.
So what do we have on the table as a result?
In the meantime, you have no other option but to keep your communities properly configured, keep your filters, and monitor your prefixes to reduce the level of problems that route leaks create for your networks.
There is a chance that there’ll be a change in the BGP protocol that would solve the general problem of route leaks.
However, the existence or absence of this change depends on you and your collaboration with IETF. Thank you.