Exploring Communication of Rate Limits in ASP.NET Core With Rate Limit Headers

Rate limiting (sometimes also referred to as throttling) is a key mechanism when it comes to ensuring API responsiveness and availability. By enforcing usage quotas, it can protect an API from issues like:

  • Denial Of Service (DOS) attacks
  • Degraded performance due to traffic spikes
  • Monopolization by a single consumer

Despite its importance, the typical approach to rate limiting is far from perfect when it comes to communicating usage quotas by services and (as a result) respecting those quotas by clients. It shouldn't be a surprise that various services were experimenting with different approaches to solve this problem. The common pattern in the web world is that some of such experiments start to gain traction which results in standardization efforts. This is exactly what is currently happening around communicating services usage quotas with RateLimit Fields for HTTP Internet-Draft. As rate limiting will have built-in support with .NET 7, I thought it might be a good time to take a look at what this potential standard is bringing. But before that, let's recall how HTTP currently supports rate limiting (excluding custom extensions).

Current HTTP Support for Rate Limiting

When it comes to current support for rate limiting in HTTP, it's not much. If a service detects that a client has reached the quota, instead of a regular response it may respond with 429 (Too Many Request) or 503 (Service Unavailable). Additionally, the service may include a Retry-After header in the response to indicate how long the client should wait before making another request. That's it. It means that client can be only reactive. There is no way for a client to get the information about the quota to avoid hitting it.

In general, this works. That said, handling requests which are above the quota still consumes some resources on the service side. Clients would also prefer to be able to understand the quota and adjust their usage patterns instead of handling it as an exceptional situation. So as I said, it's not much.

Proposed Rate Limit Headers

The RateLimit Fields for HTTP Internet-Draft proposes four new headers which aim at enabling a service to communicate usage quotas and policies:

  • RateLimit-Limit to communicate the total quota within a time window.
  • RateLimit-Remaining to communicate the remaining quota within the current time window.
  • RateLimit-Reset to communicate the time (in seconds) remaining in the current time window.
  • RateLimit-Policy to communicate the overall quota policy.

The most interesting one is RateLimit-Policy. It is a list of quota policy items. A quota policy item consists of a quota limit and a single required parameter w which provides a time window in seconds. Custom parameters are allowed and should be treated as comments. Below you can see an example of RateLimit-Policy which informs that client is allowed to make 10 requests per second, 50 requests per minute, 1000 requests per hour, and 5000 per 24 hours.

RateLimit-Policy: 10;w=1, 50;w=60, 1000;w=3600, 5000;w=86400

The only headers which intend to be required are RateLimit-Limit and RateLimit-Reset (RateLimit-Remaining is strongly recommended). So, how an ASP.NET Core based service can server those headers.

Communicating Quotas When Using ASP.NET Core Middleware

As I've already mentioned, built-in support for rate limiting comes to .NET with .NET 7. It brings generic purpose primitives for writing rate limiters as well as a few ready-to-use implementations. It also brings a middleware for ASP.NET Core. The below example shows the definition of a fixed time window policy which allows 5 requests per 10 seconds. The OnRejected callback is also provided to return 429 (Too Many Request) status code and set the Retry-After header value based on provided metadata.

using System.Globalization;
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

var app = builder.Build();

app.UseHttpsRedirection();

app.UseRateLimiter(new RateLimiterOptions
{
    OnRejected = (context, cancellationToken) =>
    {
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter = ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo);
        }

        context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

        return new ValueTask();
    }
}.AddFixedWindowLimiter("fixed-window", new FixedWindowRateLimiterOptions(
    permitLimit: 5,
    queueProcessingOrder: QueueProcessingOrder.OldestFirst,
    queueLimit: 0,
    window: TimeSpan.FromSeconds(10),
    autoReplenishment: true
)));

app.MapGet("/", context => context.Response.WriteAsync("-- Demo.RateLimitHeaders.AspNetCore.RateLimitingMiddleware --"))
    .RequireRateLimiting("fixed-window");

app.Run();

The question is, if and how this can be extended to return the rate limit headers? The answer is, sadly, that there seems to be no way to provide the required ones right now. All the information about rate limit policies is well hidden from public access. It would be possible to provide RateLimit-Limit and RateLimit-Policy as they are a direct result of provided options. It is also possible to provide RateLimit-Remaining, but it requires rewriting a lot of the middleware ecosystem to get the required value. What seems completely impossible to get right now is RateLimit-Reset as timers are managed centrally deep in System.Threading.RateLimiting core without any access to their state. There is an option to provide your own timers, but it would mean rewriting the entire middleware stack and taking a lot of responsibility from System.Threading.RateLimiting. Let's hope that things will improve.

Communicating Quotas When Using AspNetCoreRateLimit Package

That built-in support for rate limiting is something that is just coming to .NET. So far the ASP.NET Core developers were using their own implementations or non-Microsoft packages for this purpose. Arguably, the most popular rate limiting solution for ASP.NET Core is AspNetCoreRateLimit. The example below provides similar functionality to the one from the built-in rate limiting example.

using AspNetCoreRateLimit;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddMemoryCache();

builder.Services.Configure<IpRateLimitOptions>(options =>
{
    options.EnableEndpointRateLimiting = true;
    options.StackBlockedRequests = false;
    options.HttpStatusCode = 429;
    options.GeneralRules = new List<RateLimitRule>
    {
        new RateLimitRule { Endpoint = "*", Period = "10s", Limit = 5 }
    };
});

builder.Services.AddInMemoryRateLimiting();

builder.Services.AddSingleton<IRateLimitConfiguration, RateLimitConfiguration>();

var app = builder.Build();

app.UseHttpsRedirection();

app.UseIpRateLimiting();

app.MapGet("/", context => context.Response.WriteAsync("-- Demo.RateLimitHeaders.AspNetCore.RateLimitPackage --"));

app.Run();

The AspNetCoreRateLimit has its own custom way of communicating quotas with HTTP headers. In the case of the above example, you might receive the following values in response.

X-Rate-Limit-Limit: 10s
X-Rate-Limit-Remaining: 4
X-Rate-Limit-Reset: 2022-07-24T11:30:47.2291052Z

As you can see, they provide potentially useful information but not in a way that RateLimit Fields for HTTP is going for. Luckily, AspNetCoreRateLimit is not as protective about its internal state and algorithms, the information needed can be accessed and served in a different way.

The information about the current state is kept in IRateLimitCounterStore. This is a dependency that could be accessed directly, but the method for generating needed identifiers lives in ProcessingStrategy so it will be better to create an implementation of it dedicated for purpose of just getting the counters state.

internal interface IRateLimitHeadersOnlyProcessingStrategy : IProcessingStrategy
{ }

internal class RateLimitHeadersOnlyProcessingStrategy : ProcessingStrategy, IRateLimitHeadersOnlyProcessingStrategy
{
    private readonly IRateLimitCounterStore _counterStore;
    private readonly IRateLimitConfiguration _config;

    public RateLimitHeadersOnlyProcessingStrategy(IRateLimitCounterStore counterStore, IRateLimitConfiguration config) : base(config)
    {
        _counterStore = counterStore;
        _config = config;
    }

    public override async Task ProcessRequestAsync(ClientRequestIdentity requestIdentity, RateLimitRule rule,
        ICounterKeyBuilder counterKeyBuilder, RateLimitOptions rateLimitOptions, CancellationToken cancellationToken = default)
    {
        string rateLimitCounterId = BuildCounterKey(requestIdentity, rule, counterKeyBuilder, rateLimitOptions);

        RateLimitCounter? rateLimitCounter = await _counterStore.GetAsync(rateLimitCounterId, cancellationToken);
        if (rateLimitCounter.HasValue)
        {
            return new RateLimitCounter
            {
                Timestamp = rateLimitCounter.Value.Timestamp,
                Count = rateLimitCounter.Value.Count
            };
        }
        else
        {
            return new RateLimitCounter
            {
                Timestamp = DateTime.UtcNow,
                Count = _config.RateIncrementer?.Invoke() ?? 1
            };
        }
    }
}

The second thing that is needed are rules which apply to specific endpoint and identity. Those can be retrieved from specific (either based on IP or client identifier) IRateLimitProcessor. The IRateLimitProcessor is also a tunnel to IProcessingStrategy, so it's nice we have a dedicated one. But what about the identity I've just mentioned? The algorithm to retrieve it lies in RateLimitMiddleware, so access to it will be needed. There are two options here. One is to inherit from RateLimitMiddleware and the other is to create an instance of one of its implementation and use it as a dependency. The first case would require hiding the base implementation of Invoke as it can't be overridden. I didn't like that, so I went with keeping an instance as a dependency approach. This led to the following code.

internal class IpRateLimitHeadersMiddleware
{
    private readonly RequestDelegate _next;
    private readonly RateLimitOptions _rateLimitOptions;
    private readonly IpRateLimitProcessor _ipRateLimitProcessor;
    private readonly IpRateLimitMiddleware _ipRateLimitMiddleware;

    public IpRateLimitHeadersMiddleware(RequestDelegate next,
        IRateLimitHeadersOnlyProcessingStrategy processingStrategy, IOptions<IpRateLimitOptions> options, IIpPolicyStore policyStore,
        IRateLimitConfiguration config, ILogger<IpRateLimitMiddleware> logger)
    {
        _next = next;
        _rateLimitOptions = options?.Value;
        _ipRateLimitProcessor = new IpRateLimitProcessor(options?.Value, policyStore, processingStrategy);
        _ipRateLimitMiddleware = new IpRateLimitMiddleware(next, processingStrategy, options, policyStore, config, logger);
    }

    public async Task Invoke(HttpContext context)
    {
        ClientRequestIdentity identity = await _ipRateLimitMiddleware.ResolveIdentityAsync(context);

        if (!_ipRateLimitProcessor.IsWhitelisted(identity))
        {
            var rateLimitRulesWithCounters = new Dictionary<RateLimitRule, RateLimitCounter>();

            foreach (var rateLimitRule in await _ipRateLimitProcessor.GetMatchingRulesAsync(identity, context.RequestAborted))
            {
                rateLimitRulesWithCounters.Add(
                    rateLimitRule,
                    await _ipRateLimitProcessor.ProcessRequestAsync(identity, rateLimitRule, context.RequestAborted)
                 );
            }
        }

        await _next.Invoke(context);

        return;
    }
}

The rateLimitRulesWithCounters contains all the rules applying to the endpoint in the context of the current request. This can be used to calculate the rate limit headers values.

internal class IpRateLimitHeadersMiddleware
{
    private class RateLimitHeadersState
    {
        public HttpContext Context { get; set; }

        public int Limit { get; set; }

        public int Remaining { get; set; }

        public int Reset { get; set; }

        public string Policy { get; set; } = String.Empty;

        public RateLimitHeadersState(HttpContext context)
        {
            Context = context;
        }
    }

    ...

    public async Task Invoke(HttpContext context)
    {
        ...
    }

    private RateLimitHeadersState PrepareRateLimitHeaders(HttpContext context, Dictionary<RateLimitRule, RateLimitCounter> rateLimitRulesWithCounters)
    {
        RateLimitHeadersState rateLimitHeadersState = new RateLimitHeadersState(context);

        var rateLimitHeadersRuleWithCounter = rateLimitRulesWithCounters.OrderByDescending(x => x.Key.PeriodTimespan).FirstOrDefault();
        var rateLimitHeadersRule = rateLimitHeadersRuleWithCounter.Key;
        var rateLimitHeadersCounter = rateLimitHeadersRuleWithCounter.Value;

        rateLimitHeadersState.Limit = (int)rateLimitHeadersRule.Limit;

        rateLimitHeadersState.Remaining = rateLimitHeadersState.Limit - (int)rateLimitHeadersCounter.Count;

        rateLimitHeadersState.Reset = (int)(
            (rateLimitHeadersCounter.Timestamp+ (rateLimitHeadersRule.PeriodTimespan ?? rateLimitHeadersRule.Period.ToTimeSpan())) - DateTime.UtcNow
            ).TotalSeconds;

        rateLimitHeadersState.Policy = String.Join(
            ", ",
            rateLimitRulesWithCounters.Keys.Select(rateLimitRule =>
                $"{(int)rateLimitRule.Limit};w={(int)(rateLimitRule.PeriodTimespan ?? rateLimitRule.Period.ToTimeSpan()
            ).TotalSeconds}")
        );

        return rateLimitHeadersState;
    }
}

The only thing that remains is setting the headers on the response.

internal class IpRateLimitHeadersMiddleware
{
    ...

    public async Task Invoke(HttpContext context)
    {
        ...

        if (!_ipRateLimitProcessor.IsWhitelisted(identity))
        {
            ...

            if (rateLimitRulesWithCounters.Any() && !_rateLimitOptions.DisableRateLimitHeaders)
            {
                context.Response.OnStarting(
                    SetRateLimitHeaders,
                    state: PrepareRateLimitHeaders(context, rateLimitRulesWithCounters)
                );
            }
        }

        ...
    }

    ...

    private Task SetRateLimitHeaders(object state)
    {
        var rateLimitHeadersState = (RateLimitHeadersState)state;

        rateLimitHeadersState.Context.Response.Headers["RateLimit-Limit"] = rateLimitHeadersState.Limit.ToString(CultureInfo.InvariantCulture);
        rateLimitHeadersState.Context.Response.Headers["RateLimit-Remaining"] = rateLimitHeadersState.Remaining.ToString(CultureInfo.InvariantCulture);
        rateLimitHeadersState.Context.Response.Headers["RateLimit-Reset"] = rateLimitHeadersState.Reset.ToString(CultureInfo.InvariantCulture);
        rateLimitHeadersState.Context.Response.Headers["RateLimit-Policy"] = rateLimitHeadersState.Policy;

        return Task.CompletedTask;
    }
}

After registering the RateLimitHeadersOnlyProcessingStrategy and IpRateLimitHeadersMiddleware (I've registered it after the IpRateLimitMiddleware) response will contain values similar to the following ones.

RateLimit-Limit: 5
RateLimit-Remaining: 4
RateLimit-Reset: 9
RateLimit-Policy: 5;w=10
X-Rate-Limit-Limit: 10s
X-Rate-Limit-Remaining: 4
X-Rate-Limit-Reset: 2022-07-25T20:57:32.0746592Z

The code works but certainly isn't perfect, so I've created an issue in hope that AspNetCoreRateLimit will get those headers built in.

Limiting the Number of Outbound Requests in HttpClient

The general rule around rate limit headers is that they should be treated as informative, so the client doesn't have to do anything specific with them. They are also described as generated at response time without any guarantee of consistency between requests. This makes perfect sense. In the simple examples above, multiple clients would be competing for the same quota so received headers values don't exactly tell how many requests a specific client can make within a given window. But, real-life scenarios are usually more specific and complex. It's very common for quotas to be per client or per IP address (this is why AspNetCoreRateLimit has concepts like request identity as a first-class citizen, the ASP.NET Core built-in middleware also enables sophisticated scenarios by using PartitionedRateLimiter at its core). In a such scenario, the client might want to use rate limit headers to avoid making requests which have a high likelihood of being throttled. Let's explore that, below is a simple code that can handle 429 (Too Many Request) responses and utilize the Retry-After header.

HttpClient client = new();
client.BaseAddress = new("http://localhost:5262");

while (true)
{
    Console.Write("{0:hh:mm:ss}: ", DateTime.UtcNow);

    int nextRequestDelay = 1;

    try
    {
        HttpResponseMessage response = await client.GetAsync("/");
        if (response.IsSuccessStatusCode)
        {
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
        else
        {
            Console.Write($"{(int)response.StatusCode}: {await response.Content.ReadAsStringAsync()}");

            string? retryAfter = response.Headers.GetValues("Retry-After").FirstOrDefault();
            if (Int32.TryParse(retryAfter, out nextRequestDelay))
            {
                Console.Write($" | Retry-After: {nextRequestDelay}");
            }

            Console.WriteLine();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }

    await Task.Delay(TimeSpan.FromSeconds(nextRequestDelay));
}

Let's assume that the service is sending all rate limit headers and that they are dedicated to the client. We can rate limit the HttpClient by creating a DelegatingHandler which will read the RateLimit-Policy header value and instantiate a FixedWindowRateLimiter based on it. The FixedWindowRateLimiter will be used to rate limit the outbound requests - if a lease can't be acquired a locally created HttpResponseMessage will be returned.

internal class RateLimitPolicyHandler : DelegatingHandler
{
    private string? _rateLimitPolicy;
    private RateLimiter? _rateLimiter;

    private static readonly Regex RATE_LIMIT_POLICY_REGEX = new Regex(@"(\d+);w=(\d+)", RegexOptions.Compiled);

    public RateLimitPolicyHandler() : base(new HttpClientHandler())
    { }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (_rateLimiter is not null)
        {
            using var rateLimitLease = await _rateLimiter.WaitAsync(1, cancellationToken);
            if (rateLimitLease.IsAcquired)
            {
                return await base.SendAsync(request, cancellationToken);
            }

            var rateLimitResponse = new HttpResponseMessage(HttpStatusCode.TooManyRequests);
            rateLimitResponse.Content = new StringContent($"Service rate limit policy ({_rateLimitPolicy}) exceeded!");

            if (rateLimitLease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
            {
                rateLimitResponse.Headers.Add("Retry-After", ((int)retryAfter.TotalSeconds).ToString(NumberFormatInfo.InvariantInfo));
            }

            return rateLimitResponse;
        }

        var response = await base.SendAsync(request, cancellationToken);

        if (response.Headers.Contains("RateLimit-Policy"))
        {
            _rateLimitPolicy = response.Headers.GetValues("RateLimit-Policy").FirstOrDefault();

            if (_rateLimitPolicy is not null)
            {
                Match rateLimitPolicyMatch = RATE_LIMIT_POLICY_REGEX.Match(_rateLimitPolicy);

                if (rateLimitPolicyMatch.Success)
                {
                    int limit = Int32.Parse(rateLimitPolicyMatch.Groups[1].Value);
                    int window = Int32.Parse(rateLimitPolicyMatch.Groups[2].Value);

                    _rateLimiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions(
                        limit,
                        QueueProcessingOrder.NewestFirst,
                        0,
                        TimeSpan.FromSeconds(window),
                        true
                    ));

                    string? rateLimitRemaining = response.Headers.GetValues("RateLimit-Remaining").FirstOrDefault();
                    if (Int32.TryParse(rateLimitRemaining, out int remaining))
                    {
                        using var rateLimitLease = await _rateLimiter.WaitAsync(limit - remaining, cancellationToken);
                    }
                }
            }
        }

        return response;
    }
}

The above code also uses the RateLimit-Remaining header value to acquire leases for requests which are no longer available in the initial window.

Now depending if the sample code is run with the RateLimitPolicyHandler in the HttpClient pipeline or not, the console output will be different as those 429 will be coming from a different place.

Opinions

The rate limit headers seem like an interesting addition for communicating services usage quotas. Properly used in the right situations might be a useful tool, it is just important not to treat them as guarantees.

Serving rate limit headers from ASP.NET Core right now has its challenges. If they will become a standard and gain popularity, I think this will change.

If you want to play with the samples, they are available on GitHub.