Introduction

Purpose

APIs are an integral part of CloudStack usage. Legitimate users of CloudStack can occasionally hammer the server with heavy API request loads that cause undesirable results, such as bringing down the management server or degrading performance for other CloudStack users. API flooding can also become a vector for malicious users to attack the CloudStack service and cause a cloud outage. To prevent this, we will introduce an API request throttling feature that limits the number of API calls each account can place within a given time interval, and blocks further API requests once an account exceeds the limit so that the caller has to retry later.

References

Feature Specifications

  • Requirements
    • There should be a global configuration parameter that defines the number of API calls that can be placed per account within a pre-defined interval, for example, 1 second.
    • It should be possible to turn this API throttling feature on/off through a global configuration parameter, to avoid causing unexpected behavior for existing third-party integrations with CloudStack.
    • For CS, API throttling will be implemented at the account level.
  • Non-requirements
    • Limit the API calls by volume of data through GET or POST.

Architecture and Design description

This API throttling feature will be implemented as an APIChecker adapter as well as a pluggable service that performs an API limit check when each API command is invoked. The adapter is invoked by APIServer when a command is issued, chained with the current ACL access checks but executed before each ACL access checker.

Global Configurations

  • api.throttling.enabled - Enable/Disable API throttling. By default, this setting is false, so api throttling is disabled.
  • api.throttling.interval (in seconds) - Time interval to count number of APIs placed.
  • api.throttling.max - Maximum number of APIs that can be placed within api.throttling.interval period.
  • api.throttling.cachesize - Cache size used for counter tracking, applicable for Ehcache based limit store implementation.
    These parameters can be changed through the Global Settings tab in the UI, but the management server must be restarted for the changes to take effect. An example set of values is shown below.
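
As an illustration (the values below are examples, not mandated defaults), a configuration that allows each account at most 25 API calls per second would look like:

api.throttling.enabled = true
api.throttling.interval = 1
api.throttling.max = 25
api.throttling.cachesize = 50000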

Api Rate Limit Pluggable Service

APIChecker Adapter

This Api Rate Limit pluggable service also implements the APIChecker adapter interface to perform API limit checking on API commands. The adapter is chained before the current ACL access checker adapters to decide whether an API invocation may go through. APIServer invokes this adapter first on each command invocation, so that no resources are wasted on access checking for requests that will be throttled anyway.

public interface APIChecker extends Adapter {
    // Returns true if the user may invoke the named command; otherwise the request is denied.
    boolean checkAccess(User user, String commandName) throws PermissionDeniedException;
}

By default, we will provide an implementation that queries an Ehcache-based rate limit store (in memory) to check whether the given account has exceeded the API limit set in the plugin configuration, and then lets the request through or denies it. In case of denial, we throw a ServerApiException with HTTP error code 429 and an error message that clearly tells the user what happened, for example: "You have reached the API limit per second, please re-try after x seconds". For custom behavior, for example setting different limits for different accounts based on business needs, other custom API rate limit plugins can be written to serve the purpose.
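
A minimal sketch of such a checker is shown below. It is illustrative only, not the shipped plugin code: the field wiring, the import package names, the ApiErrorCode.API_LIMIT_EXCEED constant and the AdapterBase base class are assumptions based on the 4.1 code base, while LimitStore and StoreEntry are the interfaces defined later on this page.

import org.apache.cloudstack.acl.APIChecker;
import org.apache.cloudstack.api.ApiErrorCode;
import org.apache.cloudstack.api.ServerApiException;

import com.cloud.exception.PermissionDeniedException;
import com.cloud.user.User;
import com.cloud.utils.component.AdapterBase;

// Illustrative sketch of the limit check, not the exact plugin implementation.
public class ApiRateLimitCheckerSketch extends AdapterBase implements APIChecker {

    private LimitStore limitStore;     // e.g. the Ehcache-backed store described below, wired at startup
    private boolean enabled;           // api.throttling.enabled
    private int intervalInSecs;        // api.throttling.interval
    private int maxAllowed;            // api.throttling.max

    @Override
    public boolean checkAccess(User user, String commandName) throws PermissionDeniedException {
        if (!enabled) {
            return true;               // throttling switched off, defer to the ACL checkers
        }
        Long accountId = user.getAccountId();
        StoreEntry entry = limitStore.get(accountId);
        if (entry == null || entry.isExpired()) {
            // start a new counting window of api.throttling.interval seconds
            entry = limitStore.create(accountId, intervalInSecs);
        }
        if (entry.incrementAndGet() > maxAllowed) {
            // deny with HTTP 429 and a message telling the caller when to retry
            throw new ServerApiException(ApiErrorCode.API_LIMIT_EXCEED,
                    "You have reached the API limit, please retry after "
                    + entry.getExpireDuration() + " second(s)");
        }
        return true;
    }
}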

New APIs Provided

We will introduce two new APIs related to limit reset and query:

  • resetApiLimitCmd: This is an admin-only API. For the root admin, if the accountId parameter is passed, it resets the counter for that particular account; otherwise it resets all counters.
  • getApiLimitCmd: Shows the number of remaining API calls for the invoking user in the current window.

To allow the UI to retrieve the globally configured api.throttling.interval and api.throttling.max, we also modified the existing listCapabilitiesCmd to return apilimitinterval and apilimitmax.
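
As a usage illustration (authentication parameters and the exact response fields are omitted here, and the endpoint path may vary with your deployment; the command names follow the Cmd classes above), the new APIs are invoked through the usual API endpoint:

GET /client/api?command=getApiLimit&response=json
GET /client/api?command=resetApiLimit&response=json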

Interfaces for Rate Limit Store

We have defined the following Rate Limit Store interface to provide a contract among different implementations of the API limit store. Contributors can provide their own implementations based on different technologies, such as DB, Memcached, Redis, Ehcache, etc. In this pluggable service we provide a sample implementation of this interface using Ehcache; see the details below.

public interface LimitStore {

    // Returns the counter entry tracked for the given account, or null if none exists yet.
    StoreEntry get(Long account);

    // Creates a new counter entry for the given account that is valid for timeToLiveInSecs seconds.
    StoreEntry create(Long account, int timeToLiveInSecs);

    // Clears all counters (used by resetApiLimitCmd).
    void resetCounters();

}

public interface StoreEntry {

    // Number of API calls counted so far in the current window.
    int getCounter();

    // Atomically increments the counter and returns the new value.
    int incrementAndGet();

    // Whether the entry's time window has already elapsed.
    boolean isExpired();

    // Remaining time before the entry expires (used to tell a throttled caller when to retry).
    long getExpireDuration();
}
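
For example, a trivial in-memory implementation of these contracts could look like the sketch below. It is illustrative only (it is not the shipped Ehcache store) and ignores the eviction and sizing concerns that a production store has to handle:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative in-memory sketch of the LimitStore/StoreEntry contracts.
public class InMemoryLimitStore implements LimitStore {

    private final ConcurrentHashMap<Long, SimpleEntry> entries = new ConcurrentHashMap<Long, SimpleEntry>();

    @Override
    public StoreEntry get(Long account) {
        return entries.get(account);
    }

    @Override
    public StoreEntry create(Long account, int timeToLiveInSecs) {
        SimpleEntry entry = new SimpleEntry(timeToLiveInSecs);
        entries.put(account, entry);
        return entry;
    }

    @Override
    public void resetCounters() {
        entries.clear();
    }

    private static class SimpleEntry implements StoreEntry {
        private final AtomicInteger counter = new AtomicInteger(0);
        private final long expiresAtMillis;

        SimpleEntry(int timeToLiveInSecs) {
            this.expiresAtMillis = System.currentTimeMillis() + timeToLiveInSecs * 1000L;
        }

        @Override
        public int getCounter() {
            return counter.get();
        }

        @Override
        public int incrementAndGet() {
            return counter.incrementAndGet();
        }

        @Override
        public boolean isExpired() {
            return System.currentTimeMillis() > expiresAtMillis;
        }

        @Override
        public long getExpireDuration() {
            // remaining time, in seconds, before this entry expires
            return Math.max(0, (expiresAtMillis - System.currentTimeMillis()) / 1000);
        }
    }
}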

Ehcache based Rate Limit Store

With scalability and simplicity in mind, we use Ehcache to keep track of the API limit counters in memory. With Ehcache's time_to_live feature, an item in the cache automatically expires based on the time_to_live value set for each cache element, which saves us from having to reset counters in our business logic. For this release, we implement the counter cache per management server.
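
The per-element time-to-live mechanism works roughly as sketched below (Ehcache 2.x API; the cache name, sizes and values are arbitrary examples):

import java.util.concurrent.atomic.AtomicInteger;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Illustrative only: shows how Ehcache's per-element time-to-live removes the need
// to reset counters explicitly in the business logic.
public class EhcacheTtlExample {
    public static void main(String[] args) throws InterruptedException {
        CacheManager cacheManager = CacheManager.create();
        // name, maxElementsInMemory, overflowToDisk, eternal, timeToLiveSeconds, timeToIdleSeconds
        Cache cache = new Cache("api-limit-cache", 50000, false, false, 0, 0);
        cacheManager.addCache(cache);

        Long accountId = 42L;        // example account id
        int intervalInSecs = 1;      // api.throttling.interval

        // store a counter for the account; the element expires on its own after the interval
        Element counter = new Element(accountId, new AtomicInteger(1));
        counter.setTimeToLive(intervalInSecs);
        cache.put(counter);

        Thread.sleep((intervalInSecs + 1) * 1000L);

        // after the interval has elapsed, get() returns null and a new counting window begins
        Element found = cache.get(accountId);
        System.out.println(found == null ? "window expired" : "count=" + found.getObjectValue());

        cacheManager.shutdown();
    }
}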

The known limitations of this approach are:

  • The cache is not synchronized across multiple management servers, so in a clustered management server setup we may not be able to enforce the exact desired number of API requests going through CS. In the worst case we may allow (#number of MS * #API_limit) requests through, which should be acceptable given that our requirement is to avoid malicious attacks causing denial of service.
  • resetApiLimitCmd and getApiLimitCmd are scoped to the management server where the API is invoked, which should also be acceptable for preventing denial-of-service attempts.

Other implementations of Rate Limit Store (not in this release)

DB-based Rate Limit Store

We had initially thought of using a new DB table to keep track of the number of API calls issued by a given account within the given interval, as below:

  • account_api_count (account_id, count, time_to_live):

    Field         Type        Null  Key      Default
    ------------  ----------  ----  -------  -------
    account_id    bigint(20)  YES   Primary  NULL
    count         bigint(20)  YES            NULL
    time_to_live  bigint(20)  YES            NULL

However, the drawback of this approach is that it involves frequent DB reads/writes for each API request, which may significantly degrade system performance. This is why, for the 4.1 release, we chose not to go this route.

Memcached Rate Limit Store

The ideal implementation of a rate limit store is to use Memcached (see http://simonwillison.net/2009/jan/7/ratelimitcache/), where a Memcached server is set up for counter tracking. The reason we didn't pursue this route is that Memcached is not currently under the Apache license. With our current abstraction of API rate limiting as a pluggable service and a clear definition of the LimitStore interface, this implementation should be straightforward for any contributor once the Memcached license issue is resolved.

Test Plan

We have done the following two kinds of testing during the development cycle:

  • Unit tests to verify the ApiRateLimitService pluggable service interface and the Limit Store interface methods (a sketch of a test in this style appears after this list). These unit test cases are located in plugins/api/rate-limit/test/org/apache/cloudstack/ratelimit/ApiRateLimitTest.java.
  • Integration tests to verify the rate limit feature and the new APIs through a running MS. These integration test cases are located in plugins/api/rate-limit/test/org/apache/cloudstack/ratelimit/integration/RateLimitIntegrationTest.java. This test assumes that a "demo" user account has been created on the running MS.
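
A unit test for the limit store behaviour could be sketched as follows, using the hypothetical InMemoryLimitStore shown earlier; the actual cases live in ApiRateLimitTest.java:

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNull;

import org.junit.Test;

// Illustrative JUnit 4 sketch against the LimitStore contract, not the real test suite.
public class LimitStoreContractTest {

    @Test
    public void counterIncrementsWithinOneWindow() {
        LimitStore store = new InMemoryLimitStore();
        StoreEntry entry = store.create(1L, 1);   // account 1, one-second window
        assertEquals(1, entry.incrementAndGet());
        assertEquals(2, entry.incrementAndGet());
        assertFalse(entry.isExpired());
    }

    @Test
    public void resetClearsAllCounters() {
        LimitStore store = new InMemoryLimitStore();
        store.create(1L, 1).incrementAndGet();
        store.resetCounters();
        assertNull(store.get(1L));
    }
}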

Open Questions

  1. Should we enforce this limit checking for the root admin as well?
    Answer: We will not enforce this limit checking for the root admin.
  2. Should we have a back-off algorithm that automatically retries blocked requests?
    Answer: Not implemented; blocked requests need to be retried manually.