Azure Functions Extensibility - Runtime Scaling

In the past, I've written posts that explored the core aspects of creating Azure Functions extensions.

In this post, I want to dive into how extensions implement support for the Azure Functions scaling model.

The Azure Functions scaling model (as one can expect) has evolved over the years. The first model is incremental scaling, handled by the scale controller (a dedicated Azure Functions component). The scale controller monitors the rate of incoming events and makes magical decisions (following the rule that any sufficiently advanced technology is indistinguishable from magic) about whether to add or remove a worker. The worker count can change only by one instance at a time, at a specific rate. This mechanism has some limitations.

The first problem is the inability to scale in network-isolated deployments. In such a scenario, the scale controller doesn't have access to the services and can't monitor events - only the function app (and, as a consequence, the extensions) can. As a result, toward the end of 2019, the SDK introduced support for extensions to vote in scale-in and scale-out decisions.

The second problem is the clarity and rate of the scaling process. Changing by one worker at a time, driven by complex heuristics, is not perfect. This led to the introduction of target-based scaling at the beginning of 2023. Extensions became able to implement their own algorithms for requesting a specific number of workers and to scale up by up to four instances at a time.

So, how do those two mechanisms work, and how can they be implemented? To answer this question, I'm going to describe them from two perspectives. One will be how it is done by a well-established extension - the Azure Cosmos DB bindings for Azure Functions. The other will be a sample implementation in my own extension, the RethinkDB bindings for Azure Functions, which I've used previously while writing about Azure Functions extensibility.

But first things first. Before we can talk about the actual logic driving the scaling behaviors, we need the right SDK and the right classes in place.

Runtime and Target-Based Scaling Support in Azure WebJobs SDK

As I've mentioned above, support for extensions to participate in the scaling model has been introduced gradually to the Azure WebJobs SDK. At the same time, the team has made an effort to introduce changes in a backward-compatible manner. Support for voting in scaling decisions came in version 3.0.14 and support for target-based scaling in 3.0.36, but you can have an extension that uses version 3.0.0 and it will work on the latest Azure Functions runtime. You can also have an extension that uses the latest SDK and doesn't implement the scalability mechanisms. You need to know that they are there and choose to utilize them.

This is reflected in the official extensions as well. The Azure Cosmos DB extension implemented support for runtime scaling in version 3.0.5 and for target-based scaling in 4.1.0 (you can find tables gathering the version numbers, for example in the section about virtual network triggers). This means that if your function app is using lower versions of the extensions, it won't benefit from these capabilities.

So, regardless of whether you're implementing your own extension or just using an existing one, the versions of your dependencies matter. However, if you're implementing your own extension, you will also need some classes.

Classes Needed to Implement Scaling Support

The Azure Functions scaling model revolves around triggers. After all, it's their responsibility to handle the influx of events. As you may know (for example from my previous post 😉), the heart of a trigger is a listener. This is where all the heavy lifting is done, and this is where we can inform the runtime that our trigger supports scalability features. We can do so by implementing two interfaces: IScaleMonitorProvider and ITargetScalerProvider. They are tied to runtime scaling and target-based scaling respectively. If they are implemented, the runtime will use the methods they define to obtain the actual logic implementations.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly IScaleMonitor<RethinkDbTriggerMetrics> _rethinkDbScaleMonitor;
    private readonly ITargetScaler _rethinkDbTargetScaler;

    ...

    public IScaleMonitor GetMonitor()
    {
        return _rethinkDbScaleMonitor;
    }

    public ITargetScaler GetTargetScaler()
    {
        return _rethinkDbTargetScaler;
    }
}

From the snippet above you can notice one more class - RethinkDbTriggerMetrics. It derives from the ScaleMetrics class and is used for capturing the values on which the scaling decisions are made.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    ...
}

The IScaleMonitor implementation contributes to the runtime scaling part of the scaling model. Its responsibilities are to provide snapshots of the metrics and to vote in the scaling decisions.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    public ScaleMonitorDescriptor Descriptor => ...;

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        ...
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        ...
    }
}

The ITargetScaler implementation is responsible for the target-based scaling. It needs to focus only on one thing - calculating the desired worker count.

internal class RethinkDbTargetScaler : ITargetScaler
{
    public TargetScalerDescriptor TargetScalerDescriptor => ...;

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        ...
    }
}

As both of these implementations are tightly tied to a specific trigger, it is typical to instantiate them in the listener constructor.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(...);
        _rethinkDbTargetScaler = new RethinkDbTargetScaler(...);
    }

    ...
}

We are almost ready to start discussing the details of scaling logic, but before we do that, we need to talk about gathering metrics. The decisions need to be based on something.

Gathering Metrics

Yes, the decisions need to be based on something, and the quality of those decisions will depend on the quality of the metrics you can gather. So a lot depends on how well the service for which the extension is being created is prepared to be monitored.

Azure Cosmos DB is well prepared. The Azure Cosmos DB bindings for Azure Functions implement the trigger on top of the Azure Cosmos DB change feed processor and use the change feed estimator to gather the metrics. This gives access to quite an accurate estimate of the remaining work. As an additional metric, the extension gathers the number of leases for the container that the trigger is observing.
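
For illustration, a metrics class capturing these two values could look more or less like this (a hypothetical shape I'm using for this post; the actual type and property names in the extension may differ):

internal class CosmosDbTriggerMetrics : ScaleMetrics
{
    // Estimated number of items remaining to be processed (coming from the change feed estimator).
    public long RemainingWork { get; set; }

    // Number of leases for the container that the trigger is observing.
    public int PartitionCount { get; set; }
}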

RethinkDB is not so well prepared. It seems like its change feed provides only one usable metric - the buffered items count.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    public int BufferedItemsCount { get; set; }
}

Also, the metric can only be gathered while iterating the change feed. This forces an intermediary between the listener and the scale monitor (well, you could use the scale monitor directly, but that seemed ugly to me).

internal class RethinkDbMetricsProvider
{
    public int CurrentBufferedItemsCount { get; set; }

    public RethinkDbTriggerMetrics GetMetrics()
    {
        return new RethinkDbTriggerMetrics { BufferedItemsCount = CurrentBufferedItemsCount };
    }
}

Now the same instance of this intermediary can be used by the listener to provide the value.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbMetricsProvider = new RethinkDbMetricsProvider();
        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(..., _rethinkDbMetricsProvider);

        ...
    }

    ...

    private async Task ListenAsync(CancellationToken listenerStoppingToken)
    {
        ...

        while (!listenerStoppingToken.IsCancellationRequested
               && (await changefeed.MoveNextAsync(listenerStoppingToken)))
        {
            _rethinkDbMetricsProvider.CurrentBufferedItemsCount = changefeed.BufferedItems.Count;
            ...
        }

        ...
    }
}

And by the scale monitor to read it.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbScaleMonitor(..., RethinkDbMetricsProvider rethinkDbMetricsProvider)
    {
        ...

        _rethinkDbMetricsProvider = rethinkDbMetricsProvider;

        ...
    }

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        return Task.FromResult(_rethinkDbMetricsProvider.GetMetrics());
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        return Task.FromResult((ScaleMetrics)_rethinkDbMetricsProvider.GetMetrics());
    }

    ...
}

We have the metrics now, so it is finally time to make some decisions.

Voting to Scale In or Scale Out

An extension can cast one of three votes: scale out, scale in, or neutral. This is where intimate knowledge of the service that the extension is created for comes into play.

As I've already written, the Azure Cosmos DB change feed processor is well prepared for gathering metrics. It is also well prepared to be consumed in a scalable manner. It can distribute the work among multiple compute instances by balancing the number of leases owned by each instance. It will also adjust the number of leases based on throughput and storage. This is why the Azure Cosmos DB bindings track the number of leases as one of the metrics - it's an upper limit for the number of workers. So the extension utilizes all this knowledge and employs the following algorithm to cast a vote (a simplified sketch follows the list):

  1. If there are no metrics gathered yet, cast a neutral vote.
  2. If the current number of workers is greater than the number of leases, cast a scale-in vote.
  3. If there are fewer than five metric samples, cast a neutral vote.
  4. If the ratio of workers to remaining items is less than 1 per 1000, cast a scale-out vote.
  5. If there are constantly items waiting to be processed and the number of workers is smaller than the number of leases, cast a scale-out vote.
  6. If the trigger source has been empty for a while, cast a scale-in vote.
  7. If there has been a continuous increase across the last five samples in items to be processed, cast a scale-out vote.
  8. If there has been a continuous decrease across the last five samples in items to be processed, cast a scale-in vote.
  9. If none of the above has happened, cast a neutral vote.
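
Translated into code, a simplified sketch of that algorithm (based purely on the list above, reusing the hypothetical CosmosDbTriggerMetrics class from the previous section, and skipping step 5 for brevity - this is not the extension's actual implementation) could look like this:

private ScaleStatus GetScaleStatus(int workerCount, CosmosDbTriggerMetrics[] metrics)
{
    ScaleStatus status = new ScaleStatus { Vote = ScaleVote.None };

    // 1. No metrics gathered yet - neutral vote.
    if (metrics is null || metrics.Length == 0)
    {
        return status;
    }

    CosmosDbTriggerMetrics latestMetrics = metrics[metrics.Length - 1];

    // 2. More workers than leases - scale in.
    if (workerCount > latestMetrics.PartitionCount)
    {
        status.Vote = ScaleVote.ScaleIn;
        return status;
    }

    // 3. Fewer than five samples - neutral vote.
    if (metrics.Length < 5)
    {
        return status;
    }

    // 4. More than 1000 remaining items per worker - scale out.
    if (latestMetrics.RemainingWork > workerCount * 1000)
    {
        status.Vote = ScaleVote.ScaleOut;
        return status;
    }

    // 6. The trigger source has been empty across the recent samples - scale in.
    if (metrics.Skip(metrics.Length - 5).All(m => m.RemainingWork == 0))
    {
        status.Vote = ScaleVote.ScaleIn;
        return status;
    }

    // 7./8. Continuous increase or decrease in remaining work across the last five samples.
    bool remainingWorkIncreasing = true;
    bool remainingWorkDecreasing = true;
    for (int i = metrics.Length - 4; i < metrics.Length; i++)
    {
        remainingWorkIncreasing &= metrics[i].RemainingWork > metrics[i - 1].RemainingWork;
        remainingWorkDecreasing &= metrics[i].RemainingWork < metrics[i - 1].RemainingWork;
    }

    if (remainingWorkIncreasing)
    {
        status.Vote = ScaleVote.ScaleOut;
    }
    else if (remainingWorkDecreasing)
    {
        status.Vote = ScaleVote.ScaleIn;
    }

    // 9. None of the above - neutral vote.
    return status;
}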

RethinkDB is once again lacking here. It looks like its change feed is not meant to be processed in parallel at all. This leads to an interesting edge case, where we never want to scale beyond a single instance.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        return GetScaleStatus(
            context.WorkerCount,
            context.Metrics?.ToArray()
        );
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        return GetScaleStatus(
            context.WorkerCount, 
            context.Metrics?.Cast<RethinkDbTriggerMetrics>().ToArray()
        );
    }

    private ScaleStatus GetScaleStatus(int workerCount, RethinkDbTriggerMetrics[] metrics)
    {
        ScaleStatus status = new ScaleStatus
        {
            Vote = ScaleVote.None
        };

        // RethinkDB change feed is not meant to be processed in parallel.
        if (workerCount > 1)
        {
            status.Vote = ScaleVote.ScaleIn;

            return status;
        }

        return status;
    }
}

Requesting Desired Number of Workers

Being able to tell whether you want more or fewer workers is great; being able to tell how many workers you want is even better. Of course, there is no promise the extension will get the requested number (even in the case of target-based scaling, the scaling happens at a maximum rate of four instances at a time), but it's better than increasing and decreasing by one instance. Extensions can also use this mechanism to participate in dynamic concurrency.

This is exactly what the Azure Cosmos DB extension is doing. It divides the number of remaining items by the value of the MaxItemsPerInvocation trigger setting (the default is 100). The result is capped by the number of leases, and that's the desired number of workers.
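
A rough sketch of such a calculation (again with the hypothetical CosmosDbTriggerMetrics class and made-up names like _metricsProvider and _maxItemsPerInvocation - this is an illustration, not the extension's actual code) could look like this:

public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
{
    CosmosDbTriggerMetrics metrics = _metricsProvider.GetMetrics();

    // Divide the estimated remaining work by the number of items a single invocation can process.
    int targetWorkerCount = (int)Math.Ceiling(metrics.RemainingWork / (double)_maxItemsPerInvocation);

    // Cap the result at the number of leases - additional workers wouldn't have a lease to process.
    targetWorkerCount = Math.Min(targetWorkerCount, metrics.PartitionCount);

    return Task.FromResult(new TargetScalerResult
    {
        TargetWorkerCount = targetWorkerCount
    });
}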

We already know that in the case of RethinkDB, it's even simpler - we always want just one worker.

internal class RethinkDbTargetScaler : ITargetScaler
{
    ...

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        return Task.FromResult(new TargetScalerResult
        {
            TargetWorkerCount = 1
        });
    }
}

That's it. The runtime scaling implementation is now complete.

Consequences of Runtime Scaling Design

Creating extensions for Azure Functions is not something commonly done. Still, there is value in understanding their anatomy and how they work, as it is often relevant when you are creating function apps as well.

The unit of scale for Azure Functions is the entire function app. At the same time, the scaling votes and the desired numbers of workers are delivered at the trigger level. This means that you need to be careful when creating function apps with multiple triggers. If the triggers aim for completely different scaling decisions, it may lead to undesired scenarios. For example, one trigger may constantly want to scale in, while another wants to scale out. As you could notice throughout this post, some triggers have hard limits on how far they can scale so they don't waste resources (even two Azure Cosmos DB triggers may have different upper limits because the lease containers they are attached to will have different numbers of leases available). All of this should be taken into account while designing the function app and trying to foresee how it will scale.