In the past, I've written posts that explored the core aspects of creating Azure Functions extensions.

In this post, I want to dive into how extensions implement support for the Azure Functions scaling model.

The Azure Functions scaling model has (as one can expect) evolved over the years. The first model is incremental scaling handled by the scale controller (a dedicated Azure Functions component). The scale controller monitors the rate of incoming events and makes magical (following the rule that any sufficiently advanced technology is indistinguishable from magic) decisions about whether to add or remove a worker. The worker count can change only by one at a time and only at a specific rate. This mechanism has some limitations.

The first problem is the inability to scale in network-isolated deployments. In such a scenario, the scale controller doesn't have access to the services and can't monitor events - only the function app (and, as a consequence, the extensions) can. As a result, toward the end of 2019, the SDK added support for extensions to vote in scale-in and scale-out decisions.

The second problem is the clarity and rate of the scaling process. Changing by one worker at a time, driven by complex heuristics, is not perfect. This led to the introduction of target-based scaling at the beginning of 2023. Extensions became able to implement their own algorithms for requesting a specific number of workers, and scaling can happen by up to four instances at a time.

So, how do those two mechanisms work, and how can they be implemented? To answer this question, I'm going to describe them from two perspectives. One will be how a well-established extension does it - the Azure Cosmos DB bindings for Azure Functions. The other will be a sample implementation in my own extension, which I've used previously while writing about Azure Functions extensibility - the RethinkDB bindings for Azure Functions.

But first things first. Before we can talk about the actual logic driving the scaling behaviors, we need the right SDK and the right classes in place.

Runtime and Target-Based Scaling Support in Azure WebJobs SDK

As I've mentioned above, support for extensions to participate in the scaling model has been introduced gradually to the Azure WebJobs SDK. At the same time, the team has made an effort to introduce changes in a backward-compatible manner. Support for voting in scaling decisions came in version 3.0.14 and support for target-based scaling in 3.0.36, but you can have an extension that uses version 3.0.0 and it will work on the latest Azure Functions runtime. You can also have an extension that uses the latest SDK and doesn't implement the scalability mechanisms. You need to know that these mechanisms are there and choose to utilize them.

This is reflected in the official extensions as well. The Azure Cosmos DB extension implemented support for runtime scaling in version 3.0.5 and target-based scaling in 4.1.0 (you can find tables gathering these version numbers, for example in the section about virtual network triggers). This means that if your function app uses lower versions of the extensions, it won't benefit from these capabilities.

So, regardless of whether you're implementing your own extension or just using an existing one, the versions of dependencies matter. However, if you're implementing your own extension, you will also need some classes.

Classes Needed to Implement Scaling Support

The Azure Functions scaling model revolves around triggers. After all, it's their responsibility to handle the influx of events. As you may know (for example from my previous post 😉), the heart of a trigger is a listener. This is where all the heavy lifting is done, and this is where we can inform the runtime that our trigger supports scalability features. We can do so by implementing two interfaces: IScaleMonitorProvider and ITargetScalerProvider. They are tied to runtime scaling and target-based scaling respectively. If they are implemented, the runtime will use the methods they define to obtain the actual logic implementations.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly IScaleMonitor<RethinkDbTriggerMetrics> _rethinkDbScaleMonitor;
    private readonly ITargetScaler _rethinkDbTargetScaler;

    ...

    public IScaleMonitor GetMonitor()
    {
        return _rethinkDbScaleMonitor;
    }

    public ITargetScaler GetTargetScaler()
    {
        return _rethinkDbTargetScaler;
    }
}

From the snippet above you can notice one more class - RethinkDbTriggerMetrics. It derives from the ScaleMetrics class and is used for capturing the values on which the decisions are made.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    ...
}

The IScaleMonitor implementation contributes to the runtime scaling part of the scaling model. Its responsibilities are to provide snapshots of the metrics and to vote in the scaling decisions.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    public ScaleMonitorDescriptor Descriptor => ...;

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        ...
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        ...
    }
}

The ITargetScaler implementation is responsible for the target-based scaling. It needs to focus only on one thing - calculating the desired worker count.

internal class RethinkDbTargetScaler : ITargetScaler
{
    public TargetScalerDescriptor TargetScalerDescriptor => ...;

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        ...
    }
}

As both of these implementations are tightly tied to a specific trigger, it is typical to instantiate them in the listener constructor.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(...);
        _rethinkDbTargetScaler = new RethinkDbTargetScaler(...);
    }

    ...
}

We are almost ready to start discussing the details of scaling logic, but before we do that, we need to talk about gathering metrics. The decisions need to be based on something.

Gathering Metrics

Yes, the decisions need to be based on something, and the quality of the decisions will depend on the quality of the metrics you can gather. So a lot depends on how well the service for which the extension is being created is prepared to be monitored.

Azure Cosmos DB is well prepared. The Azure Cosmos DB bindings for Azure Functions implement the trigger on top of the Azure Cosmos DB change feed processor and use the change feed estimator to gather the metrics. This gives access to quite an accurate estimate of the remaining work. As an additional metric, the extension gathers the number of leases for the container that the trigger is observing.
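
To give a rough idea of how such metrics can be gathered, here is a minimal sketch based on the change feed estimator API from the Azure Cosmos DB .NET SDK. It is not the extension's actual code - the processor name and the _monitoredContainer and _leaseContainer fields are assumptions made for the example.

private async Task<(long RemainingWork, int LeaseCount)> GetEstimatedMetricsAsync()
{
    // Sketch only: estimate remaining work with the change feed estimator.
    ChangeFeedEstimator estimator = _monitoredContainer.GetChangeFeedEstimator(
        "cosmosDbTriggerProcessor", _leaseContainer);

    long remainingWork = 0;
    int leaseCount = 0;

    FeedIterator<ChangeFeedProcessorState> iterator = estimator.GetCurrentStateIterator();

    while (iterator.HasMoreResults)
    {
        FeedResponse<ChangeFeedProcessorState> states = await iterator.ReadNextAsync();

        foreach (ChangeFeedProcessorState state in states)
        {
            // Every state corresponds to a lease; EstimatedLag is the estimated
            // number of changes not yet processed for that lease.
            remainingWork += state.EstimatedLag;
            leaseCount++;
        }
    }

    return (remainingWork, leaseCount);
}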

RethinkDB is not so well prepared. It seems that its change feed provides only one metric - the buffered items count.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    public int BufferedItemsCount { get; set; }
}

Also, the metric can only be gathered while iterating the change feed. This forces an intermediary between the listener and the scale monitor (well, you could use the scale monitor directly, but that seemed ugly to me).

internal class RethinkDbMetricsProvider
{
    public int CurrentBufferedItemsCount { get; set; }

    public RethinkDbTriggerMetrics GetMetrics()
    {
        return new RethinkDbTriggerMetrics { BufferedItemsCount = CurrentBufferedItemsCount };
    }
}

Now the same instance of this intermediary can be used by the listener to provide the value.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbMetricsProvider = new RethinkDbMetricsProvider();
        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(..., _rethinkDbMetricsProvider);

        ...
    }

    ...

    private async Task ListenAsync(CancellationToken listenerStoppingToken)
    {
        ...

        while (!listenerStoppingToken.IsCancellationRequested
               && (await changefeed.MoveNextAsync(listenerStoppingToken)))
        {
            _rethinkDbMetricsProvider.CurrentBufferedItemsCount = changefeed.BufferedItems.Count;
            ...
        }

        ...
    }
}

And by the scale monitor to read it.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbScaleMonitor(..., RethinkDbMetricsProvider rethinkDbMetricsProvider)
    {
        ...

        _rethinkDbMetricsProvider = rethinkDbMetricsProvider;

        ...
    }

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        return Task.FromResult(_rethinkDbMetricsProvider.GetMetrics());
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        return Task.FromResult((ScaleMetrics)_rethinkDbMetricsProvider.GetMetrics());
    }

    ...
}

Now that we have the metrics, it is finally time to make some decisions.

Voting to Scale In or Scale Out

An extension can cast one of three votes: to scale out, to scale in, or neutral. This is where the intimate knowledge of the service that the extension is created for comes into play.

As I've already written, the Azure Cosmos DB change feed processor is well prepared for gathering metrics. It is also well prepared to be consumed in a scalable manner. It can distribute the work among multiple compute instances by balancing the number of leases owned by each instance. It will also adjust the number of leases based on throughput and storage. This is why the Azure Cosmos DB bindings track the number of leases as one of the metrics - it's an upper limit for the number of workers. The extension utilizes all this knowledge and employs the following algorithm to cast a vote (a rough sketch of the trend-based part is shown after the list):

  1. If there are no metrics gathered yet, cast a neutral vote.
  2. If the current number of leases is greater than the number of workers, cast a scale-in vote.
  3. If there are fewer than five metric samples, cast a neutral vote.
  4. If the ratio of workers to remaining items is less than 1 per 1000, cast a scale-out vote.
  5. If there are constantly items waiting to be processed and the number of workers is smaller than the number of leases, cast a scale-out vote.
  6. If the trigger source has been empty for a while, cast a scale-in vote.
  7. If there has been a continuous increase across the last five samples in items to be processed, cast a scale-out vote.
  8. If there has been a continuous decrease across the last five samples in items to be processed, cast a scale-in vote.
  9. If none of the above has happened, cast a neutral vote.
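
To give a feel for how such a heuristic translates into code, below is a simplified sketch of the trend-based part of the vote (steps 7 and 8). It is not the actual Azure Cosmos DB extension code - the CosmosDbTriggerMetrics class and its RemainingWork property are stand-ins for the extension's real metrics type.

private ScaleVote GetTrendVote(CosmosDbTriggerMetrics[] metrics)
{
    // Simplified sketch: vote based on the trend across the last five samples
    // of a hypothetical RemainingWork metric.
    if (metrics is null || metrics.Length < 5)
    {
        return ScaleVote.None;
    }

    CosmosDbTriggerMetrics[] lastFiveSamples = metrics.Skip(metrics.Length - 5).ToArray();

    bool continuousIncrease = true;
    bool continuousDecrease = true;

    for (int i = 1; i < lastFiveSamples.Length; i++)
    {
        continuousIncrease &= lastFiveSamples[i].RemainingWork > lastFiveSamples[i - 1].RemainingWork;
        continuousDecrease &= lastFiveSamples[i].RemainingWork < lastFiveSamples[i - 1].RemainingWork;
    }

    if (continuousIncrease)
    {
        return ScaleVote.ScaleOut;
    }

    if (continuousDecrease)
    {
        return ScaleVote.ScaleIn;
    }

    return ScaleVote.None;
}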

RethinkDB is once again lacking here. It looks like its change feed is not meant to be processed in parallel at all. This leads to an interesting edge case, where we never want to scale beyond a single instance.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        return GetScaleStatus(
            context.WorkerCount,
            context.Metrics?.ToArray()
        );
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        return GetScaleStatus(
            context.WorkerCount, 
            context.Metrics?.Cast<RethinkDbTriggerMetrics>().ToArray()
        );
    }

    private ScaleStatus GetScaleStatus(int workerCount, RethinkDbTriggerMetrics[] metrics)
    {
        ScaleStatus status = new ScaleStatus
        {
            Vote = ScaleVote.None
        };

        // RethinkDB change feed is not meant to be processed in parallel.
        if (workerCount > 1)
        {
            status.Vote = ScaleVote.ScaleIn;

            return status;
        }

        return status;
    }
}

Requesting Desired Number of Workers

Being able to tell whether you want more workers or fewer is great; being able to tell how many workers you want is even better. Of course, there is no promise that the extension will get the requested number (even in the case of target-based scaling, scaling happens at a maximum rate of four instances at a time), but it's better than increasing and decreasing by one instance. Extensions can also use this mechanism to participate in dynamic concurrency.

This is exactly what the Azure Cosmos DB extension does. It divides the number of remaining items by the value of the MaxItemsPerInvocation trigger setting (the default is 100). The result is capped by the number of leases, and that's the desired number of workers.
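
A hedged sketch of that calculation could look like the snippet below. The metrics provider, the options type, and the RemainingWork and LeaseCount properties are stand-ins for whatever the real extension uses - only the MaxItemsPerInvocation setting and its default of 100 come from the description above.

public async Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
{
    // Sketch of the described calculation, not the actual extension code.
    CosmosDbTriggerMetrics metrics = await _metricsProvider.GetMetricsAsync();

    int maxItemsPerInvocation = _options.MaxItemsPerInvocation ?? 100;

    // One worker per batch of remaining items...
    int targetWorkerCount = (int)Math.Ceiling(metrics.RemainingWork / (double)maxItemsPerInvocation);

    // ...but never more workers than there are leases to distribute among them.
    targetWorkerCount = Math.Min(targetWorkerCount, metrics.LeaseCount);

    return new TargetScalerResult { TargetWorkerCount = targetWorkerCount };
}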

We already know that in the case of RethinkDB, it's even simpler - we always want just one worker.

internal class RethinkDbTargetScaler : ITargetScaler
{
    ...

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        return Task.FromResult(new TargetScalerResult
        {
            TargetWorkerCount = 1
        });
    }
}

That's it. The runtime scaling implementation is now complete.

Consequences of Runtime Scaling Design

Creating extensions for Azure Functions is not something commonly done. Still, there is value in understanding their anatomy and how they work, as it is often relevant also when you are creating function apps.

The unit of scale for Azure Functions is the entire function app. At the same time, the scaling votes and the desired number of workers are delivered at the trigger level. This means that you need to be careful when creating function apps with multiple triggers. If the triggers aim for completely different scaling decisions, it may lead to undesired scenarios. For example, one trigger may constantly want to scale in, while another will want to scale out. As you could notice throughout this post, some triggers have hard limits on how far they can scale so as not to waste resources (even two Azure Cosmos DB triggers may have different upper limits because the lease containers they are attached to will have different numbers of leases available). All of this should be taken into account while designing the function app and trying to foresee how it will scale.

A year ago I wrote about running a .NET based Spin application on a WASI node pool in AKS. Since then the support for WebAssembly in AKS hasn't changed. We still have the same preview, supporting the same versions of two ContainerD shims: Spin and Slight (a.k.a. SpiderLightning). So why am I coming back to the subject? The broader context has evolved.

With WASI preview 2, the ecosystem is embracing the component model and standardized APIs. When I was experimenting with Spin, I leveraged WAGI (WebAssembly Gateway Interface) which allowed me to be ignorant about the Wasm runtime context. Now I want to change that, cross the barrier and dig into direct interoperability between .NET and Wasm.

Also, regarding the mentioned APIs, one of the emerging ones is wasi-cloud-core which aims to provide a generic way for WASI applications to interact with services. This proposal is not yet standardized but it has an experimental host implementation which happens to be Slight. By running a .NET based Slight application I want to get a taste of what that API might bring.

Last but not least, .NET 8 has brought a new way of building .NET based Wasm applications with a wasi-experimental workload. I want to build something with it and see where the .NET support for WASI is heading.

So, this "experiment" has multiple angles and brings together a bunch of different things. How am I going to start? In the usual way, by creating a project.

Creating a .NET 8 wasi-wasm Project

The wasi-experimental workload is optional, so we need to install it before we can create a project.

dotnet workload install wasi-experimental

It also doesn't bundle the WASI SDK (the .NET 7 version of the workload did), so we have to install it ourselves. The releases of the WASI SDK available on GitHub contain binaries for different platforms. All you need to do is download the one appropriate for yours, extract it, and create a WASI_SDK_PATH environment variable pointing to the output. The version I'll be using here is 20.0.

With the prerequisites in place, we can create the project.

dotnet new wasiconsole -o Demo.Wasm.Slight

Now, if you run dotnet build and inspect the output folder, you will notice that it contains Demo.Wasm.Slight.dll, other managed DLLs, and dotnet.wasm. This is the default output, where dotnet.wasm is responsible for loading the Mono runtime and then loading the functionality from the DLLs. This is not what we want - we want a single file. To achieve that, we need to modify the project file by adding the WasmSingleFileBundle property (in my opinion this should be the default).

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    ...
    <WasmSingleFileBundle>true</WasmSingleFileBundle>
  </PropertyGroup>
</Project>

If you run dotnet build after this modification, you will find Demo.Wasm.Slight.wasm in the output. Exactly what we want.

But, before we can start implementing the actual application, we need to work on the glue between the .NET and WebAssembly so we can interact with the APIs provided by Slight from C#.

From WebAssembly IDL To C#

The imports and exports in WASI APIs are described in the Wasm Interface Type (WIT) format. This format is an IDL that serves as a foundation for the tooling around the WebAssembly Component Model.

WIT aims to be a developer-friendly format, but writing bindings by hand is not something developers expect to do. This is where wit-bindgen comes into the picture. It's a (still young) binding generator for languages that are compiled to WebAssembly. It currently supports languages like Rust, C/C++, or Java. The C# support is being actively worked on, and we can expect that at some point getting C# bindings will be as easy as running a single command (or even simpler, as Steve Sanderson is already experimenting with making it part of the toolchain), but for now it's too limited and we will have to approach things differently. What we can use is the C/C++ support.

There is one more challenge in our way. The current version of wit-bindgen is meant for WASI preview 2. Meanwhile, a lot of existing WIT definitions and WASI tooling around native languages still use WASI preview 1. This is exactly the case with the Slight implementation available in the AKS preview. To handle that, we need an old (and I mean old) version of wit-bindgen - I'm using version v0.2.0. Once you install it, you can generate C/C++ bindings for the desired imports and exports. Slight in version 0.1.0 provides a couple of capabilities, but I've decided to start with just one, the HTTP Server. That means we need imports for http.wit and exports for http-handler.wit.

wit-bindgen c --import ./wit/http.wit --out-dir ./native/
wit-bindgen c --export ./wit/http-handler.wit --out-dir ./native/

Once we have the C/C++ bindings, we can configure the build to include those files in the arguments passed to Clang. For this purpose, we can use the .targets file.

<Project>
  <ItemGroup>
    <_WasmNativeFileForLinking Include="$(MSBuildThisFileDirectory)\..\native\*.c" />
  </ItemGroup>
</Project>

We also need to implement the C# part of the interop. For the most part, it's like any other interop you might have done. You can use generators or ask a friendly AI assistant for help, and it will get you through the code for the types and functions. However, there is one specific catch. The DllImportAttribute requires passing a library name for the P/Invoke generator, but we have no library. The solution is as simple as it is surprising (at least to me): we can provide the name of any library that the P/Invoke generator knows about.

internal static class HttpServer
{
    // Any library name that P/Invoke generator knows
    private const string LIBRARY_NAME = "libSystem.Native";

    [DllImport(LIBRARY_NAME)]
    internal static extern unsafe void http_server_serve(WasiString address,
        uint httpRouterIndex, out WasiExpected<uint> ret0);

    [DllImport(LIBRARY_NAME)]
    internal static extern unsafe void http_server_stop(uint httpServerIndex,
        out WasiExpected<uint> ret0);
}

With all the glue in place, we can start implementing the HTTP server capability.

Implementing the Slight HTTP Server Capability

In order for a Slight application to start accepting HTTP requests, it needs to call the http_server_serve function, providing the address it wants to listen on and the index of a router that defines the supported routes. I've decided to roll out the simplest implementation I could think of to start testing things out - the HttpRouter and HttpServer classes, which only allow for calling the serve function (no support for routing).

internal class HttpRouter
{
    private uint _index;

    public uint Index => _index;

    private HttpRouter(uint index)
    {
        _index = index;
    }

    public static HttpRouter Create()
    {
        http_router_new(out WasiExpected<uint> expected);

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        return new HttpRouter(expected.Result.Value);
    }
}

internal static class HttpServer
{
    private static uint? _index;

    public static void Serve(string address)
    {
        if (_index.HasValue)
        {
            throw new Exception("The server is already running!");
        }

        HttpRouter router = HttpRouter.Create();
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        _index = expected.Result;
    }
}

This allowed me to implement a simplistic application, a one-liner.

HttpServer.Serve("0.0.0.0:80");

It worked! Every request results in a 404, because there is no routing, but it worked. So, how do we add support for routing?

The http_router_* functions for defining routes expect two strings - one for the route and one for the handler. This suggests that the handler should be an exported symbol that Slight will be able to call. I went through the bindings exported for http-handler.wit and found a function that is exported as handle-http. That function seems to be what we are looking for. It performs transformations to/from the request and response objects and calls an http_handler_handle_http function which has only a declaration. So it looks like the http_handler_handle_http implementation is the place for the application logic. To test this theory, I started by implementing a simple route registration method.

internal class HttpRouter
{
    ...

    private static readonly WasiString REQUEST_HANDLER = WasiString.FromString("handle-http");

    ...

    public HttpRouter RegisterRoute(HttpMethod method, string route)
    {
        WasiExpected<uint> expected;

        switch (method)
        {
            case HttpMethod.GET:
                http_router_get(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.PUT:
                http_router_put(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.POST:
                http_router_post(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.DELETE:
                http_router_delete(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            default:
                throw new NotSupportedException($"Method {method} is not supported.");
        }

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        return new HttpRouter(expected.Result.Value);
    }

    ...
}

Next, I registered some catch-all routes and implemented the HandleRequest method as the .NET handler. It would still return a 404, but it would be my 404.

internal static class HttpServer
{
    ...

    public static void Serve(string address)
    {
        ...

        HttpRouter router = HttpRouter.Create()
                            .RegisterRoute(HttpMethod.GET, "/")
                            .RegisterRoute(HttpMethod.GET, "/*");
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        ...
    }

    private static unsafe void HandleRequest(ref HttpRequest request,
        out WasiExpected<HttpResponse> result)
    {
        HttpResponse response = new HttpResponse(404);
        response.SetBody($"Handler Not Found ({request.Method} {request.Uri.AbsolutePath})");

        result = new WasiExpected<HttpResponse>(response);
    }

    ...
}

It's time for some C. I went through the Mono WASI C driver and found two functions that looked like the right tools for the job: lookup_dotnet_method and mono_wasm_invoke_method_ref. The implementation didn't seem overly complicated.

#include <string.h>
#include <wasm/driver.h>
#include "http-handler.h"

MonoMethod* handle_request_method;

void mono_wasm_invoke_method_ref(MonoMethod* method, MonoObject** this_arg_in,
                                 void* params[], MonoObject** _out_exc, MonoObject** out_result);

void http_handler_handle_http(http_handler_request_t* req,
                              http_handler_expected_response_error_t* ret0)
{
    if (!handle_request_method)
    {
        handle_request_method = lookup_dotnet_method(
            "Demo.Wasm.Slight",
            "Demo.Wasm.Slight",
            "HttpServer",
            "HandleRequest",
        -1);
    }

    void* method_params[] = { req, ret0 };
    MonoObject* exception;
    MonoObject* result;
    mono_wasm_invoke_method_ref(handle_request_method, NULL, method_params, &exception, &result);
}

But it didn't work. The thrown exception suggested that the Mono runtime wasn't loaded. I went back to studying Mono to learn how it is loaded. What I've learned is that during compilation a _start() function is generated. This function performs the steps necessary to load the Mono runtime and wraps the entry point to the .NET code. I could call it, but this would mean going through the Main method and retriggering HttpServer.Serve, which was doomed to fail. I needed to go a level lower. By reading the code of the _start() function, I learned that it calls the mono_wasm_load_runtime function. Maybe I could as well?

...

int mono_runtime_loaded = 0;

...

void http_handler_handle_http(http_handler_request_t* req,
                              http_handler_expected_response_error_t* ret0)
{
    if (!mono_runtime_loaded) {
        mono_wasm_load_runtime("", 0);

        mono_runtime_loaded = 1;
    }

    ...
}

Now it worked. But I wasn't out of the woods yet. What I had just learned meant that to provide dedicated handlers for routes, I couldn't rely on registering dedicated methods as part of the Main method flow. I could only register the routes, and the handlers needed to be discoverable later, in a new context, with the static HandleRequest as the entry point. My thoughts went in the direction of a poor man's attribute-based routing, so I started with an attribute for decorating handlers.

internal class HttpHandlerAttribute: Attribute
{
    public HttpMethod Method { get; }

    public string Route { get; }

    public HttpHandlerAttribute(HttpMethod method, string route)
    {
        Method = method;
        Route = route;
    }

    ...
}

A poor man's implementation of attribute-based routing must have an ugly part, and that part is reflection. To register the routes (and later match them with handlers), the types must be scanned for methods with the attribute. In a production solution, it would be necessary to narrow the scan scope, but as this is just a small demo, I've decided to keep it simple and scan the whole assembly for static, public and non-public, methods decorated with the attribute. The code supports adding multiple attributes to a single method just because it's simpler than putting proper protections in place. As you can probably guess, I did override the Equals and GetHashCode implementations in the attribute to ensure it behaves nicely as a dictionary key.

internal class HttpRouter
{
    ...

    private static readonly Type HTTP_HANDLER_ATTRIBUTE_TYPE = typeof(HttpHandlerAttribute);

    private static Dictionary<HttpHandlerAttribute, MethodInfo>? _routes;

    ...

    private static void DiscoverRoutes()
    {
        if (_routes is null)
        {
            _routes = new Dictionary<HttpHandlerAttribute, MethodInfo>();

            foreach (Type type in Assembly.GetExecutingAssembly().GetTypes())
            {
                foreach(MethodInfo method in type.GetMethods(BindingFlags.Static |
                                                             BindingFlags.Public |
                                                             BindingFlags.NonPublic))
                {
                    foreach (object attribute in method.GetCustomAttributes(
                             HTTP_HANDLER_ATTRIBUTE_TYPE, false))
                    {
                        _routes.Add((HttpHandlerAttribute)attribute, method);
                    }
                }
            }
        }
    }

    ...
}

With the reflection stuff (mostly) out of the way, I could implement a method that registers all the discovered routes and a method that invokes a handler for a route. This implementation is not "safe" - I don't do any checks on the reflected MethodInfo to ensure that the method has a proper signature. After all, I can only hurt myself here.

internal class HttpRouter
{
    ...

    internal HttpRouter RegisterRoutes()
    {
        DiscoverRoutes();

        HttpRouter router = this;
        foreach (KeyValuePair<HttpHandlerAttribute, MethodInfo> route in _routes)
        {
            router = router.RegisterRoute(route.Key.Method, route.Key.Route);
        }

        return router;
    }

    internal static HttpResponse? InvokeRouteHandler(HttpRequest request)
    {
        DiscoverRoutes();

        HttpHandlerAttribute attribute = new HttpHandlerAttribute(request.Method,
                                                                  request.Uri.AbsolutePath);
        MethodInfo handler = _routes.GetValueOrDefault(attribute);

        return (handler is null) ? null : (HttpResponse)handler.Invoke(null,
                                                                     new object[] { request });
    }

    ...
}

What remained were small modifications to the HttpServer to use the new methods.

internal static class HttpServer
{
    ...

    public static void Serve(string address)
    {
        ...

        HttpRouter router = HttpRouter.Create()
                            .RegisterRoutes();
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        ...
    }

    private static unsafe void HandleRequest(ref HttpRequest request,
        out WasiExpected<HttpResponse> result)
    {
        HttpResponse? response = HttpRouter.InvokeRouteHandler(request);

        if (!response.HasValue)
        {
            response = new HttpResponse(404);
            response.Value.SetBody(
                $"Handler Not Found ({request.Method} {request.Uri.AbsolutePath})"
            );
        }

        result = new WasiExpected<HttpResponse>(response.Value);
    }

    ...
}

To test this out, I've created two simple handlers.

[HttpHandler(HttpMethod.GET, "/hello")]
internal static HttpResponse HandleHello(HttpRequest request)
{
    HttpResponse response = new HttpResponse(200);
    response.SetHeaders(new[] { KeyValuePair.Create("Content-Type", "text/plain") });
    response.SetBody($"Hello from Demo.Wasm.Slight!");

    return response;
}

[HttpHandler(HttpMethod.GET, "/goodbye")]
internal static HttpResponse HandleGoodbye(HttpRequest request)
{
    HttpResponse response = new HttpResponse(200);
    response.SetHeaders(new[] { KeyValuePair.Create("Content-Type", "text/plain") });
    response.SetBody($"Goodbye from Demo.Wasm.Slight!");

    return response;
}

That's it. Certainly not complete, certainly not optimal, most likely buggy, and potentially leaking memory. But it works and can be deployed to the cloud.

Running a Slight Application in WASM/WASI Node Pool

To deploy our Slight application to the cloud, we need an AKS Cluster with a WASM/WASI node pool. The process of setting it up hasn't changed since my previous post and you can find all the necessary steps there. Here we can start with dockerizing our application.

As we are dockerizing a Wasm application, the final image should be built from scratch. In the case of a Slight application, it should contain two elements: app.wasm and slightfile.toml. The slightfile.toml is a configuration file whose main purpose is to define and provide options for the capabilities needed by the application. In our case, that's just the HTTP Server capability.

specversion = "0.1"

[[capability]]
name = "http"

The app.wasm file is our application. It should have this exact name and be placed at the root of the image. To be able to publish the application, the build stage in our Dockerfile must install the same prerequisites as we did for local development (WASI SDK and wasi-experimental workload).

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build

RUN curl https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz -L --output wasi-sdk-20.0-linux.tar.gz
RUN tar -C /usr/local/lib -xvf wasi-sdk-20.0-linux.tar.gz
ENV WASI_SDK_PATH=/usr/local/lib/wasi-sdk-20.0

RUN dotnet workload install wasi-experimental

WORKDIR /src
COPY . .
RUN dotnet publish --configuration Release

FROM scratch

COPY --from=build /src/bin/Release/net8.0/wasi-wasm/AppBundle/Demo.Wasm.Slight.wasm ./app.wasm
COPY --from=build /src/slightfile.toml .

With the infrastructure in place and the image pushed to the container registry, all that is needed is a deployment manifest for the Kubernetes resources. It is the same as it was for a Spin application; the only difference is the kubernetes.azure.com/wasmtime-slight-v1 node selector.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: "wasmtime-slight-v1"
handler: "slight"
scheduling:
  nodeSelector:
    "kubernetes.azure.com/wasmtime-slight-v1": "true"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slight-with-dotnet-8
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slight-with-dotnet-8
  template:
    metadata:
      labels:
        app: slight-with-dotnet-8
    spec:
      runtimeClassName: wasmtime-slight-v1
      containers:
        - name: slight-with-dotnet-8
          image: crdotnetwasi.azurecr.io/slight-with-dotnet-8:latest
          command: ["/"]
---
apiVersion: v1
kind: Service
metadata:
  name: slight-with-dotnet-8
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: slight-with-dotnet-8
  type: LoadBalancer

After applying the above manifest, you can get the service IP with kubectl get svc or from the Azure Portal and execute some requests.

WebAssembly, Cloud, .NET, and Future?

Everything I toyed with here is either a preview or experimental, but it's a glimpse into where WebAssembly in the Cloud is heading.

If I dare to make a prediction, I think that when the wasi-cloud-core proposal reaches a mature enough phase, we will see it supported on AKS (it will probably replace the Spin and Slight shims available in the current preview). The support for WASI in .NET will also continue to evolve, and we will see a non-experimental SDK once the specification becomes stable.

For now, we can keep experimenting and exploring. If you want to kick-start your own exploration with what I've described in this post, the source code is here.

Postscript

Yes, I know that this post is already a little bit long, but I wanted to mention one more thing.

When I was wrapping up this post, I read "Extending WebAssembly to the Cloud with .NET" and learned that Steve Sanderson has also built some Slight samples. Despite that, I've decided to publish this post. I had two main reasons. First, I dare to think that there is valuable knowledge in this dump of thoughts of mine. Second, those samples take a slightly different direction than mine, and I believe that the fact that you can arrive at different destinations from the same origin is one of the beautiful aspects of software development - something worth sharing.

As developers, we want reassurance that our code functions as expected. We also want that reassurance as fast as possible. This is why we write automated tests. We also want those tests to be easy to run on our machine or, in the worst-case scenario, on the build agent as part of the continuous integration pipeline. This is something that can be challenging to achieve when it comes to Azure Functions.

Depending on the language we are using for our Azure Functions, the challenges can be different. Let's take .NET (I guess I haven't surprised anyone with that choice) as an example. In the case of .NET, we can write any unit tests we want (I will deliberately avoid trying to define what that unit is). But the moment we try to move to integration tests, things get tricky. If our Azure Functions use the in-process model, we have the option of crafting a system under test based on the WebJobs host, which will be good enough for some scenarios. If our Azure Functions use the isolated worker model, there are only two options: accept that our tests will integrate only to a certain level and implement test doubles, or wait for the Azure Functions team to implement a test worker. This is all far from perfect.

To work around at least some of the above limitations, I've adopted with my teams a different approach to Azure Functions integration testing - we've started using Testcontainers. Testcontainers is a framework for defining throwaway, lightweight instances of containers through code, to be used in a test context.

We initially adopted this approach for .NET Azure Functions, but I know that teams creating Azure Functions in different languages also started using it. This is possible because the approach is agnostic to the language in which the functions are written (and the tests can be written using any language/framework supported by Testcontainers).

In this post, I want to share with you the core parts of this approach. It starts with creating a Dockerfile for your Azure Functions.

Creating a Dockerfile for an Azure Functions Container Image

You may already have a Dockerfile for your Azure Functions (for example, if you decided to host them in Azure Container Apps or Kubernetes). In my experience, that's usually not the case, which means you need to create one. There are two options for doing that: you can use Azure Functions Core Tools and call func init with the --docker-only option, or you can create the Dockerfile manually. The Dockerfile is different for every language, so until you gain experience I suggest using the command. Once you are familiar with the structure, you will very likely end up with a modified template that you reuse with small adjustments. The one below is my example for .NET Azure Functions using the isolated worker model.

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS installer-env
ARG RESOURCE_REAPER_SESSION_ID="00000000-0000-0000-0000-000000000000"
LABEL "org.testcontainers.resource-reaper-session"=$RESOURCE_REAPER_SESSION_ID

WORKDIR /src
COPY function-app/ ./function-app/

RUN dotnet publish function-app \
    --output /home/site/wwwroot

FROM mcr.microsoft.com/azure-functions/dotnet-isolated:4-dotnet-isolated7.0

ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
    AzureFunctionsJobHost__Logging__Console__IsEnabled=true

COPY --from=installer-env ["/home/site/wwwroot", "/home/site/wwwroot"]

What's probably puzzling you right now is that label based on the provided argument. This is something very specific to Testcontainers. The above Dockerfile is for a multi-stage build, so it will generate intermediate layers. Testcontainers has a concept of a Resource Reaper, whose job is to remove Docker resources once they are no longer needed. This label is needed for the Resource Reaper to be able to track those intermediate layers.

Once we have the Dockerfile we can create the test context.

Creating a Container Instance in Test Context

The way you create the test context depends on the testing framework you are going to use and the isolation strategy you want for that context. My framework of choice is xUnit. When it comes to the isolation strategy, it depends 😉. That said, the one I'm using most often is the test class. For xUnit, that translates to a class fixture. You can probably guess that there are also requirements when it comes to context lifetime management. After all, we will be spinning up containers, and that takes time. That's why the class fixture must implement IAsyncLifetime to provide support for asynchronous operations.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    ...

    public AzureFunctionsTestcontainersFixture()
    { 
        ...
    }

    public async Task InitializeAsync()
    {
        ...
    }

    public async Task DisposeAsync()
    {
        ...
    }
}

There are a couple of things that we need to do here. The first is creating an image based on our Dockerfile. For this purpose, we can use ImageFromDockerfileBuilder. The minimum we need to provide is the location of the Dockerfile (directory and file name). Testcontainers provides us with some handy helpers for getting the solution, project, or Git directory. We also want to set that RESOURCE_REAPER_SESSION_ID argument.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    private readonly IFutureDockerImage _azureFunctionsDockerImage;

    public AzureFunctionsTestcontainersFixture()
    {
        _azureFunctionsDockerImage = new ImageFromDockerfileBuilder()
            .WithDockerfileDirectory(CommonDirectoryPath.GetSolutionDirectory(), String.Empty)
            .WithDockerfile("AzureFunctions-Testcontainers.Dockerfile")
            .WithBuildArgument(
                 "RESOURCE_REAPER_SESSION_ID",
                 ResourceReaper.DefaultSessionId.ToString("D"))
            .Build();
    }

    public async Task InitializeAsync()
    {
        await _azureFunctionsDockerImage.CreateAsync();

        ...
    }

    ...
}

With the image in place, we can create a container instance. This requires a reference to the image, a port binding, and a wait strategy. Port binding is something that Testcontainers can almost completely handle for us - we just need to tell it which port to bind, and the host port can be assigned randomly. The wait strategy is quite important, as this is how the framework knows that the container instance is available. We have a lot of options here: port availability, a specific message in the log, command completion, file existence, a successful request, or HEALTHCHECK. What works great for Azure Functions is a successful request to its default page.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    private readonly IFutureDockerImage _azureFunctionsDockerImage;

    public IContainer AzureFunctionsContainerInstance { get; private set; }

    ...

    public async Task InitializeAsync()
    {
        await _azureFunctionsDockerImage.CreateAsync();

        AzureFunctionsContainerInstance = new ContainerBuilder()
            .WithImage(_azureFunctionsDockerImage)
            .WithPortBinding(80, true)
            .WithWaitStrategy(
                Wait.ForUnixContainer()
                .UntilHttpRequestIsSucceeded(r => r.ForPort(80)))
            .Build();
        await AzureFunctionsContainerInstance.StartAsync();
    }

    ...
}

The last missing part is the cleanup. We should nicely dispose the container instance and the image.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    private readonly IFutureDockerImage _azureFunctionsDockerImage;

    public IContainer AzureFunctionsContainerInstance { get; private set; }

    ...

    public async Task DisposeAsync()
    {
        await AzureFunctionsContainerInstance.DisposeAsync();

        await _azureFunctionsDockerImage.DisposeAsync();
    }
}

Now we are ready to write some tests.

Implementing Integration Tests

At this point, we can start testing our function. We need a test class using our class fixture.

public class AzureFunctionsTests : IClassFixture<AzureFunctionsTestcontainersFixture>
{
    private readonly AzureFunctionsTestcontainersFixture _azureFunctionsTestcontainersFixture;

    public AzureFunctionsTests(AzureFunctionsTestcontainersFixture azureFunctionsTestcontainersFixture)
    {
        _azureFunctionsTestcontainersFixture = azureFunctionsTestcontainersFixture;
    }

    ...
}

Now for the test itself, let's assume that the function has an HTTP trigger. To build the URL of our function we can use the Hostname provided by the container instance and acquire the host port by calling .GetMappedPublicPort. This means that the test only needs to create an instance of HttpClient, make a request, and assert the desired aspects of the response. The simplest test I could think of was to check for a status code indicating success.

public class AzureFunctionsTests : IClassFixture<AzureFunctionsTestcontainersFixture>
{
    private readonly AzureFunctionsTestcontainersFixture _azureFunctionsTestcontainersFixture;

    ...

    [Fact]
    public async Task Function_Request_ReturnsResponseWithSuccessStatusCode()
    {
        HttpClient httpClient = new HttpClient();
        var requestUri = new UriBuilder(
            Uri.UriSchemeHttp,
            _azureFunctionsTestcontainersFixture.AzureFunctionsContainerInstance.Hostname,
            _azureFunctionsTestcontainersFixture.AzureFunctionsContainerInstance.GetMappedPublicPort(80),
            "api/function"
        ).Uri;

        HttpResponseMessage response = await httpClient.GetAsync(requestUri);

        Assert.True(response.IsSuccessStatusCode);
    }
}

And voilà. This will run on your machine (assuming you have Docker) and in any CI/CD environment whose build agents have Docker pre-installed (for example, Azure DevOps or GitHub).

Adding Dependencies

What I've shown you so far covers the scope of the function itself. This is already beneficial because it allows for verifying whether dependencies are registered properly or whether the middleware pipeline behaves as expected. But Azure Functions rarely exist in a vacuum. There are almost always dependencies, and Testcontainers can help us with those as well. There is a wide set of preconfigured implementations that we can add to our test context. A good example is storage. In the majority of cases, storage is required to run the function itself. For local development, Azure Functions use the Azurite emulator, and we can do the same with Testcontainers as it is available as a ready-to-use module. To add it to the context, you just need to reference the proper NuGet package and add a couple of lines of code.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    ...

    public AzuriteContainer AzuriteContainerInstance { get; private set; }

    ...

    public async Task InitializeAsync()
    {
        AzuriteContainerInstance = new AzuriteBuilder().Build();
        await AzuriteContainerInstance.StartAsync();

        ...
    }

    public async Task DisposeAsync()
    {
        ...

        await AzuriteContainerInstance.DisposeAsync();
    }
}

We also need to point Azure Functions to use this Azurite container by setting the AzureWebJobsStorage parameter.

public class AzureFunctionsTestcontainersFixture : IAsyncLifetime
{
    ...

    public async Task InitializeAsync()
    {
        ...

        AzureFunctionsContainerInstance = new ContainerBuilder()
            ...
            .WithEnvironment("AzureWebJobsStorage", AzuriteContainerInstance.GetConnectionString())
            ...
            .Build();

        ...
    }

    ...
}

That's it. Having Azurite in place also enables testing functions that use triggers and bindings based on Azure Storage. There are also ready-to-use modules for Redis, Azure Cosmos DB, Azure SQL Edge, MS SQL, Kafka, or RabbitMQ, so there is quite good out-of-the-box coverage for potential Azure Functions dependencies. Some other dependencies can be covered by creating the containers yourself (for example, with an unofficial Azure Event Grid simulator). That said, some dependencies can only be satisfied by the real thing (at least for now).

A Powerful Tool in Your Toolbox

Is Testcontainers a solution for every integration testing problem? No. Should Testcontainers be your default choice when thinking about integration tests? Also no. But it is a very powerful tool, and you should be familiar with it so you can use it when appropriate.

The first time I wrote about consuming change feeds from .NET (ASP.NET Core to be more precise) was back in 2018, in the context of RethinkDB. It was always a very powerful concept. Having access to an ordered flow of information about changes to items is a low-entry enabler for various event-driven, stream processing, or data movement scenarios. As a result, over the years, this capability (with various name variations around the words change, stream, and feed) has found its way into many databases and sometimes even other storage services. The list includes (but is not limited to) MongoDB, RavenDB, Cosmos DB, DynamoDB, and Azure Blob Storage (in preview).

As I was cleaning up and updating a demo application that shows how to consume and expose various change feeds from ASP.NET Core, I decided to write down some notes to refresh the content from my previous posts.

IAsyncEnumerable as Universal Change Feed Abstraction

When I started working with change feeds over 5 years ago, I initially didn't put them behind any abstraction. I like to think that I was smart and avoided premature generalization. The abstraction came after a couple of months, when I could clearly see that I was implementing the same concepts through similar components in different projects where teams were using RethinkDB, MongoDB, or Cosmos DB. The abstraction that I started advocating back then usually looked like this.

public interface IChangeFeed<T>
{
    T CurrentChange { get; }

    Task<bool> MoveNextAsync(CancellationToken cancelToken = default(CancellationToken));
}

In retrospect, I'm happy with this abstraction, because around two or more years later, when those teams and projects started to adopt C# 8 and .NET Core 3 (or later versions), refactoring all those implementations was a lot easier. C# 8 brought async streams, a natural programming model for asynchronous streaming data sources. An asynchronous streaming data source is exactly what a change feed is, and modeling it through IAsyncEnumerable results in nice and clean consumption patterns. This is why I currently advocate using IAsyncEnumerable as a universal change feed abstraction. The trick to properly using that abstraction is defining the right change representation to be returned. That should depend on the change feed capabilities and the actual needs in a given context. Not all change feeds are the same. Some of them can provide information on all operations performed on an item, and some only on a subset. Some can provide both the old and the new value, and some only one of them. Your representation of change should consider all that. In the samples ahead, I'm avoiding this problem by reducing the change representation to the changed version of the item.
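
On the consumption side, that abstraction boils down to a simple await foreach loop. A minimal sketch, assuming a hypothetical IChangeFeedService<T> that exposes the FetchFeed method shown in the snippets below, hosted in a BackgroundService:

internal class ChangesConsumerService<T> : BackgroundService
{
    private readonly IChangeFeedService<T> _changeFeedService;

    public ChangesConsumerService(IChangeFeedService<T> changeFeedService)
    {
        _changeFeedService = changeFeedService;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // The change feed is just an asynchronous stream of changed items.
        await foreach (T changedItem in _changeFeedService.FetchFeed(stoppingToken))
        {
            // Handle the changed item (project it, publish it to clients, etc.).
        }
    }
}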

Azure Cosmos DB Change Feed

The Azure Cosmos DB change feed is the second one (after RethinkDB's) that I've written about in the past. It's also the one whose consumption model has seen the most evolution over time.

The first consumption model was quite complicated. It required going through partition key ranges, building document change feed queries for them, and then obtaining enumerators. This whole process required managing its state, which resulted in non-trivial code. It's good that it has been deprecated as part of Azure Cosmos DB .NET SDK V2, and it's going out of support in August 2024.

Azure Cosmos DB .NET SDK V3 brought the second consumption model, based on the change feed processor. The whole inner workings of consuming the change feed have been enclosed within a single class, which reduced the amount of code required. But the change feed processor has its oddities. It requires an additional container - a lease container that deals with the previously described state management. This is beneficial in complex scenarios, as it allows for coordinated processing by multiple workers, but it becomes an unnecessary complication for simple scenarios. It also provides only a push-based programming model - the consumer must provide a delegate to receive changes. Once again, this is great for certain scenarios, but it leads to awkward implementations when you want to abstract the change feed as a stream.
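
For context, a minimal sketch of that push-based model could look like the snippet below. The processor and instance names are placeholders, the container variables are assumed to be already initialized, and HandleChangeAsync stands in for whatever processing you need.

ChangeFeedProcessor processor = container
    .GetChangeFeedProcessorBuilder<T>("changesProcessor",
        async (IReadOnlyCollection<T> changes, CancellationToken cancellationToken) =>
        {
            // The processor pushes batches of changes to this delegate.
            foreach (T item in changes)
            {
                await HandleChangeAsync(item, cancellationToken);
            }
        })
    .WithInstanceName("consumerInstance")
    .WithLeaseContainer(leaseContainer)
    .Build();

await processor.StartAsync();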

The story doesn't end there. Version 3.20.0 of the Azure Cosmos DB .NET SDK introduced the third consumption model, based on the change feed iterator. It provides a pull-based alternative to the change feed processor for scenarios where that is more appropriate. With the change feed iterator, control over the pace of consuming the changes is given back to the consumer. State management is also optional, but it's the consumer's responsibility to persist continuation tokens if necessary. Additionally, the change feed iterator brings the option of obtaining a change feed for a specific partition key.

The below snippet shows a very simple consumer implementation of the change feed iterator model - no state management, just starting the consumption from a certain point in time and waiting one second before polling for new changes.

public async IAsyncEnumerable<T> FetchFeed(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    FeedIterator<T> changeFeedIterator = _container.GetChangeFeedIterator<T>(
        ChangeFeedStartFrom.Time(DateTime.UtcNow),
        ChangeFeedMode.LatestVersion
    );

    while (changeFeedIterator.HasMoreResults && !cancellationToken.IsCancellationRequested)
    {
        FeedResponse<T> changeFeedResponse = await changeFeedIterator
            .ReadNextAsync(cancellationToken);

        if (changeFeedResponse.StatusCode == HttpStatusCode.NotModified)
        {
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
        else
        {
            foreach (T item in changeFeedResponse)
            {
                yield return item;
            }
        }
    }
}

MongoDB Change Feed

MongoDB is probably the most popular NoSQL choice among the teams I've been working with that don't use cloud PaaS databases for their needs. Among its many features, it has quite a powerful change feed (a.k.a. Change Streams) capability.

The incoming change information can cover a wide spectrum of operations which can come from a single collection, database, or entire deployment. If the operation relates to a document, the change feed can provide the current version, the previous version, and the delta. There is also support for resume tokens which can be used to manage state if needed.

One unintuitive thing when it comes to MongoDB change feed is that it's only available when you are running a replica set or a sharded cluster. This doesn't mean that you have to run a cluster. You can run a single instance as a replica set (even in a container), you just need the right configuration (you will find a workflow that handles such a deployment to Azure Container Instances in the demo repository).

The consumption of the MongoDB change feed is available through the Watch and WatchAsync methods on IMongoCollection, IMongoDatabase, and IMongoClient instances. The below snippet watches a single collection and configures the change feed to return the current version of the document. You can also provide a pipeline definition when calling Watch or WatchAsync to filter the change feed (for example, to monitor only specific operation types) - there is a sketch of that right after the snippet.

public async IAsyncEnumerable<T> FetchFeed(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // Watch the collection and ask for the current (post-change) version of the document.
    IAsyncCursor<ChangeStreamDocument<T>> changefeed = await _collection.WatchAsync(
        new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup },
        cancellationToken: cancellationToken
    );

    while (!cancellationToken.IsCancellationRequested)
    {
        while (await changefeed.MoveNextAsync(cancellationToken))
        {
            IEnumerator<ChangeStreamDocument<T>> changefeedCurrentEnumerator = changefeed
                .Current.GetEnumerator();

            while (changefeedCurrentEnumerator.MoveNext())
            {
                // Yield the full document for insert operations.
                if (changefeedCurrentEnumerator.Current.OperationType
                    == ChangeStreamOperationType.Insert)
                {
                    yield return changefeedCurrentEnumerator.Current.FullDocument;
                }

                ...
            }
        }

        // Wait before polling the cursor again.
        await Task.Delay(_moveNextDelay, cancellationToken);
    }
}
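
And here is a hedged sketch of the pipeline-based filtering mentioned above - it pushes the operation type check to the server instead of doing it in the consumer (the filter condition itself is just an example):

// Filter the change stream server-side so only insert operations are returned.
PipelineDefinition<ChangeStreamDocument<T>, ChangeStreamDocument<T>> pipeline =
    new EmptyPipelineDefinition<ChangeStreamDocument<T>>()
        .Match(change => change.OperationType == ChangeStreamOperationType.Insert);

IAsyncCursor<ChangeStreamDocument<T>> changefeed = await _collection.WatchAsync(
    pipeline,
    new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup },
    cancellationToken: cancellationToken);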

Azure Blob Storage Change Feed

Azure Blob Storage is the odd one on this list because it's an object store, not a database. Its change feed provides information about changes to blobs and blob metadata across an entire storage account. Under the hood, the change feed is implemented as a special container (yes, it's visible; yes, you can take a look) which is created once you enable the feature. As it is a container, you should pay attention to the retention period configuration, as it will affect your costs.

There is one more important aspect of the Azure Blob Storage change feed to consider before using it - latency. It's pretty slow; it can take minutes for changes to appear.

From the consumption perspective, it follows the enumerator approach. You obtain the enumerator by calling BlobChangeFeedClient.GetChangesAsync. The enumerator is not infinite - it returns the changes currently available, and once you process them you have to poll for new ones. This makes managing continuation tokens necessary, even if only as local state. What is unique is that you can request changes within a specified time window.
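
A hedged sketch of the time window variant could look like this (the changeFeedClient is the same kind of client as in the snippet below, and the one-hour window is arbitrary):

// Request only the changes that occurred within the last hour.
await foreach (Page<BlobChangeFeedEvent> page in changeFeedClient
    .GetChangesAsync(DateTimeOffset.UtcNow.AddHours(-1), DateTimeOffset.UtcNow)
    .AsPages())
{
    foreach (BlobChangeFeedEvent changeFeedEvent in page.Values)
    {
        // Process the events from this page.
    }
}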

The change feed supports six event types in the latest schema version. In addition to the expected ones, like created or deleted, there are some interesting ones, like tier changed. The event never contains the item itself, which shouldn't be surprising - in the context of object storage, that would be quite risky.

The below snippet streams the change feed while managing the continuation token locally; for changes that represent blob creation, it downloads the current version of the item.

public async IAsyncEnumerable<T> FetchFeed(
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    string? continuationToken = null;

    TokenCredential azureCredential = new DefaultAzureCredential();

    // The change feed is exposed through a dedicated client obtained from the blob service client.
    BlobServiceClient blobServiceClient = new BlobServiceClient(_serviceUri, azureCredential);
    BlobChangeFeedClient changeFeedClient = blobServiceClient.GetChangeFeedClient();

    while (!cancellationToken.IsCancellationRequested)
    {
        IAsyncEnumerator<Page<BlobChangeFeedEvent>> changeFeedEnumerator = changeFeedClient
            .GetChangesAsync(continuationToken)
            .AsPages()
            .GetAsyncEnumerator();

        while (await changeFeedEnumerator.MoveNextAsync())
        {
            foreach (BlobChangeFeedEvent changeFeedEvent in changeFeedEnumerator.Current.Values)
            {
                // React only to blob creation events within the monitored container.
                if ((changeFeedEvent.EventType == BlobChangeFeedEventType.BlobCreated)
                    && changeFeedEvent.Subject.StartsWith($"/blobServices/default/containers/{_container}"))
                {
                    BlobClient createdBlobClient = new BlobClient(
                        changeFeedEvent.EventData.Uri,
                        azureCredential);

                    if (await createdBlobClient.ExistsAsync())
                    {
                        MemoryStream blobContentStream =
                            new MemoryStream((int)changeFeedEvent.EventData.ContentLength);
                        await createdBlobClient.DownloadToAsync(blobContentStream);
                        blobContentStream.Seek(0, SeekOrigin.Begin);

                        yield return JsonSerializer.Deserialize<T>(blobContentStream);
                    }
                }
            }

            // Remember the continuation token so the next poll picks up where this one ended.
            continuationToken = changeFeedEnumerator.Current.ContinuationToken;
        }

        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
    }
}

There Is More

The above samples are in no way exhaustive. They don't show all the features of the given change feeds, and they don't show all the change feeds out there. But they are a good start - that's why I've been evolving them for the past five years.

Containers have become one of the main, if not the main, ways to modularize, isolate, encapsulate, and package applications in the cloud. The sidecar pattern takes this even further by allowing the separation of functionalities like monitoring, logging, or configuration from the business logic. This is why I recommend that teams adopting containers adopt sidecars as well. One of my preferred suggestions is Dapr, which can bring early value by providing abstractions for message broker integration, encryption, observability, secret management, state management, or configuration management.

To my surprise, many conversations that start around adopting sidecars quickly drift to "we should set up a Kubernetes cluster". It's almost like there are only two options out there - you either run a standalone container or you need Kubernetes for anything more complicated. This is not the case. There are multiple ways to run containers, and you should choose the one most suitable for your current context. Many of those options will give you more sophisticated features like sidecars or init containers while your business logic still lives in a single container. Sidecars bring an additional benefit here - they enable later evolution to more complex container hosting options without requiring code changes.

In the case of Azure, a service that enables adopting sidecars at an early stage is Azure Container Instances.

Quick Reminder - Azure Container Instances Can Host Container Groups

Azure Container Instances provides a managed approach to running containers in a serverless manner, without orchestration. What I've learned is that a common misconception is that Azure Container Instances can host only a single container. That is not exactly true - Azure Container Instances can host a container group (if you're using Linux containers 😉).

A container group is a collection of containers scheduled on the same host and sharing lifecycle, resources, or local network. The container group has a single public IP address, but the publicly exposed ports can forward to ports exposed on different containers. At the same time, all the containers within the group can reach each other via localhost. This is what enables the sidecar pattern.

How to create a container group? There are three options:

  • With ARM/Bicep
  • With Azure CLI, by using a YAML file
  • With Azure CLI, by using a Docker Compose file

I'll go with Bicep here. The Microsoft.ContainerInstance namespace contains only a single type - containerGroups. This means that from the ARM/Bicep perspective there is no difference between deploying a standalone container and a container group - there is a containers list as part of the resource properties where you specify the containers.

resource containerGroup 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  name: CONTAINER_GROUP
  location: LOCATION
  ...
  properties: {
    sku: 'Standard'
    osType: 'Linux'
    ...
    containers: [
      ...
    ]
    ...
  }
}

How about a specific example? I've mentioned that Dapr is one of my preferred sidecars, so I'm going to use it here.

Running Dapr in Self-Hosted Mode Within a Container Group

Dapr has several hosting options. It can be self-hosted with Docker, Podman, or without containers. It can be hosted in Kubernetes with first-class integration. It's also available as a serverless offering as part of Azure Container Apps. The option of interest in the context of Azure Container Instances is self-hosted with Docker, but from that list you can already see how Dapr enables easy evolution from Azure Container Instances to Azure Container Apps, Azure Kubernetes Service, or non-Azure Kubernetes clusters.

But before we're ready to deploy the container group, we need some infrastructure around it. We should start with a resource group, a container registry, and a managed identity.

az group create -l $LOCATION -g $RESOURCE_GROUP
az acr create -n $CONTAINER_REGISTRY -g $RESOURCE_GROUP --sku Basic
az identity create -n $MANAGED_IDENTITY -g $RESOURCE_GROUP

We will be using the managed identity for role-based access control where possible, so we should reference it as the identity of the container group in our Bicep template.

resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: MANAGED_IDENTITY
}

resource containerGroup 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  ...
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${managedIdentity.id}': {}
    }
  }
  properties: {
    ...
  }
}

The Dapr sidecar requires a components directory - a folder containing YAML files with component definitions. To provide that folder to the Dapr sidecar container, we have to mount it as a volume. Azure Container Instances supports mounting an Azure file share as a volume, so we have to create one.

az storage account create -n $STORAGE_ACCOUNT -g $RESOURCE_GROUP --sku Standard_LRS
az storage share create -n daprcomponents --account-name $STORAGE_ACCOUNT

The created Azure file share needs to be added to the list of volumes that can be mounted by containers in the group. Sadly, the integration between Azure Container Instances and Azure Files doesn't support role-based access control, so an access key has to be used.

...

resource storageAccount 'Microsoft.Storage/storageAccounts@2022-09-01' existing = {
  name: STORAGE_ACCOUNT
}

resource containerGroup 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  ...
  properties: {
    ...
    volumes: [
      {
        name: 'daprcomponentsvolume'
        azureFile: {
          shareName: 'daprcomponents'
          storageAccountKey: storageAccount.listKeys().keys[0].value
          storageAccountName: storageAccount.name
          readOnly: true
        }
      }
    ]
    ...
  }
}

We also need to assign the AcrPull role to the managed identity so it can access the container registry.

az role assignment create --assignee $MANAGED_IDENTITY_OBJECT_ID \
    --role AcrPull \
    --scope "/subscriptions/$SUBSCRIPTION_ID/resourcegroups/$RESOURCE_GROUP/providers/Microsoft.ContainerRegistry/registries/$CONTAINER_REGISTRY"

I'm skipping the creation of the image for the application with the business logic, pushing it to the container registry, adding its definition to the containers list, and exposing needed ports from the container group - I want to focus on the Dapr sidecar.

In this example, I will be grabbing the daprd image from Docker Hub.

The startup command for the sidecar is ./daprd. We need to provide a --resources-path parameter that points to the path where the daprcomponentsvolume will be mounted. I'm also providing the --app-id parameter. This parameter is mostly used for service invocation (that won't be the case here, and I'm not providing --app-port), but Dapr also uses it in other scenarios (for example, as a partition key for some state stores).

Two ports need to be exposed from this container (not publicly): 3500 is the default HTTP endpoint port and 50001 is the default gRPC endpoint port. Both can be changed through configuration if they need to be taken by some other container.

resource containerGroup 'Microsoft.ContainerInstance/containerGroups@2023-05-01' = {
  ...
  properties: {
    ...
    containers: [
      ...
      {
        name: 'dapr-sidecar'
        properties: {
          image: 'daprio/daprd:1.10.9'
          command: [ './daprd', '--app-id', 'APPLICATION_ID', '--resources-path', './components']
          volumeMounts: [
            {
              name: 'daprcomponentsvolume'
              mountPath: './components'
              readOnly: true
            }
          ]
          ports: [
            { 
              port: 3500
              protocol: 'TCP'
            }
            { 
              port: 50001
              protocol: 'TCP'
            }
          ]
          ...
        }
      }
    ]
    ...
  }
}

I've omitted the resources definition for brevity.

Now the Bicep template can be deployed.

az deployment group create -g $RESOURCE_GROUP -f container-group-with-dapr-sidecar.bicep

The below diagram visualizes the final state after the deployment.

[Diagram: Azure Container Instances hosting a container group with the application container and the Dapr sidecar, integrated with Azure Container Registry and with an Azure file share mounted as a volume holding the Dapr component definitions.]

Configuring a Dapr Component

We have a running Dapr sidecar, but we have yet to make it truly useful. To use the APIs provided by Dapr, we have to supply the component definitions mentioned earlier, which provide the implementations for those APIs. As we already have a storage account as part of our infrastructure, a state store component seems like a good choice. Dapr supports quite an extensive list of stores, two of which are based on Azure Storage: Azure Blob Storage and Azure Table Storage. Let's use the Azure Table Storage one.

First, I'm going to create a table. This is not a required step - the component can do it for us - but let's assume we want to seed some data manually before the deployment.

Second, the more important operation is granting the needed permissions on the storage account. Dapr has very good support for authenticating to Azure, which includes managed identities and role-based access control, so I'm just going to assign the Storage Table Data Contributor role to our managed identity, scoped to the storage account.

az storage table create -n $TABLE_NAME --account-name $STORAGE_ACCOUNT
az role assignment create --assignee $MANAGED_IDENTITY_OBJECT_ID \
    --role "Storage Table Data Contributor" \
    --scope "/subscriptions/$SUBSCRIPTION_ID/resourcegroups/$RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/$STORAGE_ACCOUNT"

The last thing we need is the component definition. The component type we want is state.azure.tablestorage. The name is what we will use when making calls with a Dapr client. As we are going to use a managed identity for authentication, we should provide accountName, tableName, and azureClientId as metadata. I'm additionally setting skipCreateTable because I've created the table earlier and the component would fail on an attempt to create it again.

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: state.table.<TABLE_NAME>
spec:
  type: state.azure.tablestorage
  version: v1
  metadata:
  - name: accountName
    value: <STORAGE_ACCOUNT>
  - name: tableName
    value: <TABLE_NAME>
  - name: azureClientId
    value: <Client ID of MANAGED_IDENTITY>
  - name: skipCreateTable
    value: "true"

The file with the definition needs to be uploaded to the file share which is mounted as the components directory. The container group needs to be restarted for the component to be loaded. We can quickly verify that it has been loaded by taking a look at the logs.

time="2023-08-31T21:25:22.5325911Z"
level=info
msg="component loaded. name: state.table.<TABLE_NAME>, type: state.azure.tablestorage/v1"
app_id=APPLICATION_ID
instance=SandboxHost-638291138933285823
scope=dapr.runtime
type=log
ver=1.10.9

Now you can start managing your state with a Dapr client for your language of choice, or with the HTTP API if a client for your language doesn't exist.
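
For example, with the Dapr .NET client (the Dapr.Client package), saving and reading state through the component defined above could look more or less like this (SampleState and the key are made up for this sketch, and the store name has to match the component name):

using Dapr.Client;

DaprClient daprClient = new DaprClientBuilder().Build();

// The store name must match the name from the component definition.
const string storeName = "state.table.<TABLE_NAME>";

// Save a value under a key...
await daprClient.SaveStateAsync(storeName, "sample-key", new SampleState("Hello from Dapr"));

// ...and read it back.
SampleState state = await daprClient.GetStateAsync<SampleState>(storeName, "sample-key");

public record SampleState(string Value);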

The Power of Abstraction, Decoupling, and Flexibility

As you can see, the needed increase in complexity (compared to a standalone container hosted in Azure Container Instances) is not that significant, while the gain is. Dapr allows us to abstract all the capabilities it provides in the form of building blocks. It also decouples the capabilities provided by the building blocks from the components providing their implementation. We can change Azure Table Storage to Azure Cosmos DB if it better suits our solution, or to AWS DynamoDB if we need to deploy the same application to AWS. We also gain the flexibility to evolve our solution when the time comes for a more sophisticated container offering - we just need to take Dapr with us.
