Why adopting a DDoS mitigation service while under attack is too late
Careful, detailed planning is essential if you intend to build a competitive service on the Internet: such a service draws the attention not only of intruders but also of competitors. Last but not least, an unexpected, massive influx of users is no different from a DDoS attack.
If you have not thought about mitigation in advance, the conditions under which you are forced to return to the question may be far from ideal. Below we outline the main difficulties that a service already under attack encounters when trying to adopt a DDoS mitigation service.
If you are going on vacation to a tropical country on another continent, you should probably buy travel insurance and get vaccinated against the most dangerous and common diseases.
Yes, the chance of a mosquito bite is quite small if you wear long sleeves and tuck your pants into your socks, and you might well be fine in the end. But if you do fall ill, recovery can drag on, costing you nerves and, quite possibly, money for doctors and medicine both on the spot and after you return. Getting vaccinated and making sure your insurance covers every case takes only a little time.
What are the advantages of adopting mitigation before an attack?
First and foremost: user behavior. By observing the target resource's traffic before any stress, we learn what normal user behavior looks like and which statistics are natural. Under attack, the service degrades gradually (or quickly, depending on the attacker's actions), and this significantly changes normal user behavior. In some cases the actions of legitimate users begin to resemble the attack and worsen its consequences. A typical example is a user cyclically reloading the same page, trying to get it to render in the browser.
The more the service degrades under attack, the harder it is to distinguish legitimate users from malicious bots. If we have not seen in advance what standard user behavior and legitimate usage statistics look like, it is hard to find the signs that distinguish a real user from a bot sending garbage requests to the server. Moreover, the amount of traffic that corresponds to the service's normal workload would also be unknown.
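The kind of peacetime baseline described above can be illustrated with a minimal sketch. All numbers, names and the threshold here are hypothetical illustrations, not Qrator Labs' actual models: the idea is simply that statistics learned before an attack give you a yardstick that is unavailable if you connect only after the attack starts.

```python
from statistics import mean, stdev

def learn_baseline(samples):
    """Return (mean, stddev) of per-client request rates seen in peacetime."""
    return mean(samples), stdev(samples)

def looks_anomalous(rate, baseline, n_sigmas=3.0):
    """Flag a client whose request rate exceeds mean + n_sigmas * stddev."""
    mu, sigma = baseline
    return rate > mu + n_sigmas * sigma

# Requests per minute observed from ordinary users BEFORE the attack
# (hypothetical sample):
peacetime = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]
baseline = learn_baseline(peacetime)

print(looks_anomalous(6, baseline))    # a normal user -> False
print(looks_anomalous(200, baseline))  # a flood-like rate -> True
```

Without the `peacetime` data, a legitimate user frantically pressing Refresh and a bot sending garbage both just look like "more requests than usual", with no yardstick to separate them.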
If, at the beginning of the attack, the filtering network or equipment was excluded from the mitigation process and the models have not been trained, all the users pressing the "Refresh" button are indistinguishable from attack traffic, so the probability of blocking them is high. Previously collected user behavior data can protect both them and you from such a scenario.
The same goes for natural traffic and its statistics: lacking this information at the moment the attack happens, it is hard to say whether traffic has returned to normal and the attack has been successfully mitigated. A normal user could be blocked merely for behaving slightly differently from the median during service degradation.
Of course, such problems do not occur every time, but they become acute in certain attack scenarios. Depending on the sophistication of the attacking bot and the attacker's intelligence, an attack on the root page can be particularly unpleasant: when the service degrades, the bot's behavior looks very much like that of legitimate users who simply "could not load the page". In other situations there are usually significant differences between legitimate and malicious behavior in terms of data flow.
Imagine a tragic household situation: a burst water pipe, a flood. You were not prepared, so you have no plugs or patches, there are no shut-off valves on the pipe, and so on: the worst-case scenario. You could plug the hole and stop the leak, but toys and furniture are already floating around the house, and guests are knocking on the door. Can you let them in? Hardly.
What are the real disadvantages and possible consequences of connecting while under attack?
Strangely enough, even long-established companies with experienced employees and thousands of customers all over the world do not always think about DDoS mitigation before an attack begins. We would like to change this by delivering adequate information to the audience, but let us harbor no illusions.
Let's take a look at the key difficulties we have to cope with during an attack.
The first thing that can be stated with certainty about connecting a service to a filtration system while under attack is that the service is most likely already partially or completely unavailable to users.
Secondly, if you are under attack, the attacker probably knows not only your domain name but also your IP address. Moreover, that IP address most commonly belongs to your hosting provider, so we can assume the attacker is aware of this too. In short, the attacker most probably has enough information to modify the attack vector.
Quickly moving to another hosting provider (perhaps you are not the target), or switching to another IP transit provider (perhaps you are only an incidental victim of an attack on a larger target), takes time.
Dealing with false positives is not the only problem. Adopting mitigation under attack does not by itself solve the unavailability: if the attack has already started and managed to do harm, then in addition to connecting the resource to protection, damage control is needed, that is, work on the consequences of the attack on the customer's side and premises.
Sometimes the attack "goes well" and causes considerable damage. This means the following: besides clogging the communication channel with junk (a channel still needed to maintain the connection to the mitigation provider while there are no dedicated channels), the border router or the load balancer can "die". Other infrastructure components may also suffer, such as the firewall, a delicate device that does not like distributed denial-of-service attacks. Failure of the firewall threatens the entire network perimeter. Yes, this can be treated by rebooting the failed equipment, but it takes time to reach it and wait for it to return to a consistent state, and all that time the traffic bypasses the filters.
With attacks on the application layer (L7), the channel and the connectivity may be fine while the web application is still sick and depressed: it can, for example, lose its connection to the database, or crash together with it. Returning it to a normal state can take from several hours to several days, depending on the architectural difficulties involved.
There are two most common methods of connecting to a mitigation provider: via DNS or via BGP. Let's consider these two scenarios separately.
Let's suppose that the attack is conducted only on the domain name.
A typical scenario for connecting to the mitigation provider via DNS looks like this: the owner of a network resource represented by a domain name, whose A record contains the current IP address of the attacked web server, turns to us for help. After the formalities, Qrator Labs allocates the client a special IP address, with which he replaces the IP address in his current A record.
At this point you have to consider the possibly high TTL of the DNS record, which can range from a couple of hours to a day (the RFC limit is 2147483647 seconds): during this time the old A record would still exist in the caches of DNS recursors. Therefore, if you realize in advance that an attack is possible, you should set a low TTL on the DNS A record.
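The effect of the TTL on the switchover can be sketched in a few lines. This is a simple model, not a DNS implementation: a recursor that cached the old A record just before the change may keep serving the old, attacked IP address for up to the full TTL.

```python
# Worst-case time before all DNS recursor caches expire and traffic
# reaches the mitigation provider's address (a simple model: every
# cache honors the TTL, none serves stale records beyond it).

RFC_MAX_TTL = 2**31 - 1  # 2147483647 seconds, the protocol maximum

def worst_case_switch_delay(ttl_seconds):
    """A recursor that cached the record just before the A-record change
    may serve the old IP for up to the full TTL (capped by the RFC limit)."""
    return min(ttl_seconds, RFC_MAX_TTL)

# With a day-long TTL the old IP can linger in caches for 24 hours...
print(worst_case_switch_delay(86400))  # -> 86400 (24 h)
# ...while lowering the TTL ahead of the attack shrinks the window to minutes.
print(worst_case_switch_delay(300))    # -> 300 (5 min)
```

This is why the TTL must be lowered before the attack: once the attack has started, the old, high TTL is already sitting in caches and cannot be recalled.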
In some cases even that does not help. After all, the attacker checks the effectiveness of his assault, and seeing that nothing happens, the clever villain quickly switches to attacking the IP address he remembers, bypassing the filtering network.
In this case you need a new IP address, unknown to the attackers and not exposed through the domain name in any way, preferably from a different address block, ideally from another service provider. After all, an attacker with sufficient capacity can always go further and switch the DDoS attack vector to the provider's entire address block (prefix).
Everything looks different with BGP.
What does connecting to the mitigation provider look like if the attacked service has its own address block (prefix) and wants to put its entire infrastructure under full protection, announcing its own prefixes through the mitigation provider?
The autonomous system under attack is added to our AS-SET so that its prefixes can be announced, and from that moment there is a lag of up to 24 hours before all uplinks receive this information and update their prefix lists. In an emergency we try to force this process, but that is not possible in every case and is done manually. This delay is the key stress factor, because the resource needs protection without any delay.
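The AS-SET mentioned above is an RPSL object in an IRR database (RIPE, RADB and the like). A hypothetical fragment, using AS numbers from the documentation range and invented object names rather than real Qrator Labs objects, shows what adding a newly protected autonomous system amounts to:

```
as-set:      AS-MITIGATION-EXAMPLE   # hypothetical AS-SET of the mitigation provider
descr:       Customers announced through the filtering network
members:     AS64500                 # the provider's own AS (example number)
members:     AS64501                 # an existing customer (example)
members:     AS64511                 # the newly protected AS, added under attack
mnt-by:      EXAMPLE-MNT
source:      RIPE
```

The object itself is updated in minutes; the 24-hour lag comes from waiting for every uplink to rebuild its filters from this data.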
If the owner of a prefix prepares for an attack in advance by making a series of preliminary settings, his prefixes can be announced through the Qrator filtering network almost instantly, saving time and the after-hours nerves of the technicians. Mutual integration in this case is not only a technical but also a psychological process.
Your connectivity provider usually buys IP transit from large and reliable operators, at least regional Tier-1. These operators filter the prefixes coming to them from clients against prefix lists, which they build from publicly accessible databases (RIPE, RADB and others). Some IP transit providers update these filters once a day, others only on request. A qualified DDoS mitigation provider has points of presence distributed around the world, which means changes cannot be deployed instantly.
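Upstreams typically regenerate such filters from IRR data with tools such as bgpq4. A hypothetical router fragment of the kind an upstream installs (invented AS-SET name, documentation prefixes from RFC 5737) makes the consequence concrete: until the filter is regenerated, a newly added member's announcements are simply rejected at the border.

```
! Hypothetical Cisco-style prefix filter built from the IRR data for
! the AS-SET AS-MITIGATION-EXAMPLE; a prefix of an AS added to the set
! after this list was generated is not present and will be dropped.
no ip prefix-list AS-MITIGATION-EXAMPLE
ip prefix-list AS-MITIGATION-EXAMPLE permit 192.0.2.0/24
ip prefix-list AS-MITIGATION-EXAMPLE permit 198.51.100.0/24
```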
The most difficult adoption case is a request to protect a complex infrastructure with a variety of equipment: routers, firewalls, load balancers and other technological wonders. In such scenarios even a normal connection to the filtration system is a long process that should be carried out in a measured, systematic manner. It is almost embarrassing to point out that when such a service is attacked, connecting it to the filtration system consumes enormous effort and leaves no time for careful planning, and under attack every second counts. While these problems remain unsolved, the service struggles with availability and continually degrades.
A DDoS attack is often combined with a hacking attack, either before or after the denial of service. The risks in such cases are of a different order: data leakage, root access. Such problems must be solved separately from DDoS mitigation, which in the worst case leads to even longer unavailability or denial of service, and at best to increased delays for those trying to request the needed page.
Dear colleagues, we are glad to announce the following important piece of news: https://www.ietf.org/mail-archive/web/idr/current/msg18258.html
The initiative to upgrade the BGP protocol with an automatic protection mechanism against route leaks, in which Qrator Labs engineers were directly involved from the start, has successfully passed the "adoption" stage and moved to the Interdomain Routing (IDR) working group.
The next step is the finalization of the draft within the IDR and a review by the IESG (Internet Engineering Steering Group, https://www.ietf.org/iesg/). If these stages are passed and completed, the draft will become a new RFC network standard (https://www.ietf.org/rfc.html).
The authors, Alexander Asimov and Evgeny Bogomazov from Qrator Labs, Randy Bush from Internet Initiative Japan, Kotikalapudi Sriram from the US NIST, and Keyur Patel from Arrcus Inc., are acutely aware that the industry has a high demand for the proposed changes. However, haste is unacceptable, so the authors will do their best to make the proposed standard convenient for both transnational operators and small networks.
We are grateful to all the technical specialists who expressed their support during the adoption call. At the same time, a big "Thank you!" goes to those who did not just express an opinion but sent detailed commentaries; we will try to take them into consideration in future changes to this draft.
We would also like to remind interested engineers that you still have the opportunity to express your thoughts and wishes for additions and clarifications via the IETF mailing list (firstname.lastname@example.org) or through the Qrator Labs initiatives website (init.qrator.net).
It is important to note that by the time a final decision is taken, the standard must have two working prototypes. One of them is already available: our fork of the Bird routing daemon, available on GitHub: https://github.com/QratorLabs/bird. We invite vendors and the open-source community to join this process and walk this path together.