The amount of transferred data matters. On one hand it often contributes to the cost of running a service, and on the other a lot of clients don't have connections as fast as we would like to believe. This is why response compression is one of the key performance mechanisms in the web world.

There are a number of compression schemes (more or less popular) out there, so clients advertise the ones they support with the Accept-Encoding header.
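
The negotiation itself is just plain HTTP headers. The client lists what it accepts and, if the server decided to compress, the response indicates which encoding was used, for example (exact values vary by browser and server):

GET /resource HTTP/1.1
Accept-Encoding: gzip, deflate, br
...

HTTP/1.1 200 OK
Content-Encoding: br
Vary: Accept-Encoding
...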

Chrome Network Tab - No Response Compression

The above screenshot shows the result of a request from Chrome to the simplest possible ASP.NET Core application.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
    }

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        app.Run(async (context) =>
        {
            await context.Response.WriteAsync("-- Demo.AspNetCore.ResponseCompression.Brotli --");
        });
    }
}

As we can see, the browser has advertised four different options for compressing the response but none has been used. This shouldn't be a surprise as ASP.NET Core is modular by nature and leaves picking the features we want up to us. In order for compression to be supported we need to add the proper middleware.

Enabling response compression

Support for response compression in ASP.NET Core is available through ResponseCompressionMiddleware from the Microsoft.AspNetCore.ResponseCompression package. After referencing the package, all that needs to be done is registering the middleware and related services.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddResponseCompression();
    }

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        app.UseResponseCompression()
            .Run(async (context) =>
            {
                if (!StringValues.IsNullOrEmpty(context.Request.Headers[HeaderNames.AcceptEncoding]))
                    context.Response.Headers.Append(HeaderNames.Vary, HeaderNames.AcceptEncoding);

                context.Response.ContentType = "text/plain";
                await context.Response.WriteAsync("-- Demo.AspNetCore.ResponseCompression.Brotli --");
            });
    }
}

One thing to remember is setting Content-Type, as compression is enabled only for specific MIME types (there is also a separate setting for enabling compression over HTTPS). Additionally I'm adding the Vary: Accept-Encoding header to the response so any cache along the way knows the response needs to be cached per compression type (a future version of the middleware will handle this for us).
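
For completeness, both the MIME type list and the HTTPS setting live on ResponseCompressionOptions. A minimal sketch of tweaking them (the additional MIME type here is just an example):

public void ConfigureServices(IServiceCollection services)
{
    services.AddResponseCompression(options =>
    {
        // Compression over HTTPS is disabled by default (CRIME/BREACH concerns), so it has to be opted into
        options.EnableForHttps = true;

        // Extend the default list of compressible MIME types
        options.MimeTypes = ResponseCompressionDefaults.MimeTypes.Concat(new[] { "image/svg+xml" });
    });
}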

The screenshot below shows the result of the same request as previously, after the modifications.

Chrome Network Tab - Gzip Compression

Now the response has been compressed using gzip. Gzip compression is the only one supported by the middleware, which is "ok" in most cases as it has the widest support among clients. But the web world is constantly evolving and compression algorithms are no different. The latest-greatest seems to be Brotli which can shrink data by an additional 20% to 25%. It would be nice to use it in ASP.NET Core.

Extending response compression with Brotli

The ResponseCompressionMiddleware can be extended with additional compression algorithms by implementing the ICompressionProvider interface. The interface is pretty simple: it has two properties (providing the encoding token and whether flushing is supported) and one method (which should create a stream with compression capabilities). The true challenge is the actual Brotli implementation. I've decided to use a .NET Core build of Brotli.NET. This is in fact a wrapper around the original implementation, so some cross-platform issues might appear and force recompilation. The wrapper exposes the original implementation through BrotliStream, which makes it very easy to use in the context of ICompressionProvider.

public class BrotliCompressionProvider : ICompressionProvider
{
    public string EncodingName => "br";

    public bool SupportsFlush => true;

    public Stream CreateStream(Stream outputStream)
    {
        return new BrotliStream(outputStream, CompressionMode.Compress);
    }
}

The custom provider needs to be added to ResponseCompressionOptions.Providers collection as part of services registration.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddResponseCompression(options =>
        {
            options.Providers.Add<BrotliCompressionProvider>();
        });
    }

    ...
}

Now the demo request can be done once again - it should show that Brotli is being used for compression.

Chrome Network Tab - Brotli Compression

Not every browser (and not always) supports Brotli

Let's take a quick look at the compression support advertised by different browsers:

  • IE11: Accept-Encoding: gzip, deflate
  • Edge: Accept-Encoding: gzip, deflate
  • Firefox: Accept-Encoding: gzip, deflate (HTTP), Accept-Encoding: gzip, deflate, br (HTTPS)
  • Chrome: Accept-Encoding: gzip, deflate, sdch, br
  • Opera: Accept-Encoding:gzip, deflate, sdch, br

So IE and Edge don't support Brotli at all and Firefox supports it only over HTTPS. Checking more detailed information at caniuse we will learn that a couple more browsers don't support Brotli (although Edge already has it in preview, and it is rumored that the final support will be HTTPS-only). The overall support is about 57%, which means we want to keep gzip around as well. In order to do so it needs to be added to the ResponseCompressionOptions.Providers collection too (the moment we start manually registering providers the default one is gone).

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddResponseCompression(options =>
        {
            options.Providers.Add<BrotliCompressionProvider>();
            options.Providers.Add<GzipCompressionProvider>();
        });
    }

    ...
}

If we test this code against various browsers we will see that the chosen compression always ends up being gzip. The reason is the way in which the middleware chooses the provider: it takes the advertised compressions, sorts them by quality value if present, and chooses the first one for which a provider exists. As browsers generally don't provide any quality values (in other words they are equally happy to accept any of the supported encodings), gzip always wins because it is always first on the advertised list. Unfortunately the middleware doesn't provide an option for defining a server-side preference for such cases. In order to work around it I've decided to go the hacky way: if the only way to control provider selection is through quality values, they need to be adjusted before the response compression middleware kicks in. I've put together another middleware to do exactly that. It inspects the request's Accept-Encoding header and, if no quality values are provided, adjusts them.

public class ResponseCompressionQualityMiddleware
{
    private readonly RequestDelegate _next;
    private readonly IDictionary<string, double> _encodingQuality;

    public ResponseCompressionQualityMiddleware(RequestDelegate next, IDictionary<string, double> encodingQuality)
    {
        _next = next;
        _encodingQuality = encodingQuality;
    }

    public async Task Invoke(HttpContext context)
    {
        StringValues encodings = context.Request.Headers[HeaderNames.AcceptEncoding];
        IList<StringWithQualityHeaderValue> encodingsList;

        if (!StringValues.IsNullOrEmpty(encodings)
            && StringWithQualityHeaderValue.TryParseList(encodings, out encodingsList)
            && (encodingsList != null) && (encodingsList.Count > 0))
        {
            string[] encodingsWithQuality = new string[encodingsList.Count];

            for (int encodingIndex = 0; encodingIndex < encodingsList.Count; encodingIndex++)
            {
                // If there is any quality value provided don't change anything
                if (encodingsList[encodingIndex].Quality.HasValue)
                {
                    encodingsWithQuality = null;
                    break;
                }
                else
                {
                    string encodingValue = encodingsList[encodingIndex].Value;
                    encodingsWithQuality[encodingIndex] = (new StringWithQualityHeaderValue(encodingValue,
                        _encodingQuality.ContainsKey(encodingValue) ? _encodingQuality[encodingValue] : 0.1)).ToString();
                }

            }

            if (encodingsWithQuality != null)
                context.Request.Headers[HeaderNames.AcceptEncoding] = new StringValues(encodingsWithQuality);
        }

        await _next(context);
    }
}

This "adjusting" middleware needs to be registered before the response compression middleware and configured with tokens for which a preference is needed.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddResponseCompression(options =>
        {
            options.Providers.Add<BrotliCompressionProvider>();
            options.Providers.Add<GzipCompressionProvider>();
        });
    }

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        app.UseMiddleware<ResponseCompressionQualityMiddleware>(new Dictionary<string, double>
            {
                { "br", 1.0 },
                { "gzip", 0.9 }
            })
            .UseResponseCompression()
            .Run(async (context) =>
            {
                if (!StringValues.IsNullOrEmpty(context.Request.Headers[HeaderNames.AcceptEncoding]))
                    context.Response.Headers.Append(HeaderNames.Vary, HeaderNames.AcceptEncoding);

                context.Response.ContentType = "text/plain";
                await context.Response.WriteAsync("-- Demo.AspNetCore.ResponseCompression.Brotli --");
            });
    }
}

Now the tests in different browsers will give different results. For example, in the case of Edge the response will be compressed with gzip, but in the case of Chrome with Brotli, which is the desired effect.

In my previous post I've shown how HttpClient can be extended with payload encryption capabilities by providing support for the aes128gcm encoding. In this post I'm going to extend the Aes128GcmEncoding class with decoding capabilities.

Decoding at the high level

It shouldn't be a surprise that decoding is mostly about doing the opposite of encoding. This is why the DecodeAsync method is very similar to EncodeAsync.

public static class Aes128GcmEncoding
{
    public static async Task DecodeAsync(Stream source, Stream destination, Func<string, byte[]> keyLocator)
    {
        // Validation skipped for brevity
        ...

        CodingHeader codingHeader = await ReadCodingHeaderAsync(source);

        byte[] pseudorandomKey = HmacSha256(codingHeader.Salt, keyLocator(codingHeader.KeyId));
        byte[] contentEncryptionKey = GetContentEncryptionKey(pseudorandomKey);

        await DecryptContentAsync(source, destination,
            codingHeader.RecordSize, pseudorandomKey, contentEncryptionKey);
    }
}

The keyLocator parameter is a simple way of delegating the key management responsibility to the caller: the implementation expects a method which retrieves the key based on its identifier, without going into any further details. I have also decided to introduce a class for the coding header properties in order to make the code more readable.
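
The CodingHeader class itself isn't shown in the snippets, but based on how it is used it can be as simple as this:

public class CodingHeader
{
    public byte[] Salt { get; set; }

    public int RecordSize { get; set; }

    public string KeyId { get; set; }
}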

Retrieving the coding header

As we already know, the coding header contains three fields with constant length (salt, record size and key identifier length) and one with variable length (zero-length in the extreme case). They can be retrieved one by one. The important thing is to validate the presence and size of every field; for this purpose I've split the reading into several smaller methods. Also the record size must be additionally validated, as this implementation supports a smaller maximum value than the specification allows.
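
The reading methods below also rely on a handful of constants describing the fixed-size parts of the format. They are not shown in the snippets, but based on the specification and the way they are used they boil down to something like this:

public static class Aes128GcmEncoding
{
    private const int SALT_LENGTH = 16;            // the salt is always 16 octets
    private const int RECORD_SIZE_LENGTH = 4;      // the record size is an unsigned 32-bit integer
    private const int RECORD_OVERHEAD_SIZE = 17;   // 16 octets of AES-GCM authentication tag + 1 delimiter octet
    private const byte RECORD_DELIMITER = 1;       // terminates every record except the last one
    private const byte LAST_RECORD_DELIMITER = 2;  // terminates the last record

    ...
}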

public static class Aes128GcmEncoding
{
    private static async Task<byte[]> ReadCodingHeaderBytesAsync(Stream source, int count)
    {
        byte[] bytes = new byte[count];
        int bytesRead = await source.ReadAsync(bytes, 0, count);
        if (bytesRead != count)
            throw new FormatException("Invalid coding header.");

        return bytes;
    }

    private static async Task<int> ReadRecordSizeAsync(Stream source)
    {
        byte[] recordSizeBytes = await ReadCodingHeaderBytesAsync(source, RECORD_SIZE_LENGTH);

        if (BitConverter.IsLittleEndian)
            Array.Reverse(recordSizeBytes);
        uint recordSize = BitConverter.ToUInt32(recordSizeBytes, 0);

        if (recordSize > Int32.MaxValue)
            throw new NotSupportedException($"Maximum supported record size is {Int32.MaxValue}.");

        return (int)recordSize;
    }

    private static async Task<string> ReadKeyId(Stream source)
    {
        string keyId = null;

        int keyIdLength = source.ReadByte();

        if (keyIdLength == -1)
            throw new FormatException("Invalid coding header.");

        if (keyIdLength > 0)
        {
            byte[] keyIdBytes = await ReadCodingHeaderBytesAsync(source, keyIdLength);
            keyId = Encoding.UTF8.GetString(keyIdBytes);
        }

        return keyId;
    }

    private static async Task<CodingHeader> ReadCodingHeaderAsync(Stream source)
    {
        return new CodingHeader
        {
            Salt = await ReadCodingHeaderBytesAsync(source, SALT_LENGTH),
            RecordSize = await ReadRecordSizeAsync(source),
            KeyId = await ReadKeyId(source)
        };
    }
}

With the coding header retrieved the content can be decrypted.

Decrypting the content and retrieving the payload

The pseudorandom key and content encryption key should be calculated in exactly the same way as during encryption. With those the records can be read and decrypted. The operation should be done record by record (as mentioned in the previous post, the nonce guards the order) until the last record is reached, where reaching the last record means not only hitting the end of the content but also being confirmed by that record being delimited with the 0x02 byte.

The tricky part is extracting the data from the record. In order to do that we need to detect the location of the delimiter and make sure it meets all the requirements. All the records must be of equal length (except the last one) but they don't have to contain the same amount of data, as there can be padding consisting of any number of 0x00 bytes at the end. This is something which I haven't included in the encryption implementation but which must be correctly handled here. So the delimiter should be the first byte from the end whose value is not 0x00. As explained in the previous post there are two valid delimiters: 0x01 (for all the records except the last one) and 0x02 (for the last record). Any other delimiter means the record is invalid; a record which contains only padding is also invalid. The method below ensures all those conditions are met.

public static class Aes128GcmEncoding
{
    private static int GetRecordDelimiterIndex(byte[] plainText, int recordDataSize)
    {
        int recordDelimiterIndex = -1;
        for (int plaintTextIndex = plainText.Length - 1; plaintTextIndex >= 0; plaintTextIndex--)
        {
            if (plainText[plaintTextIndex] == 0)
                continue;

            if ((plainText[plaintTextIndex] == RECORD_DELIMITER)
                || (plainText[plaintTextIndex] == LAST_RECORD_DELIMITER))
            {
                recordDelimiterIndex = plaintTextIndex;
            }

            break;
        }

        if ((recordDelimiterIndex == -1)
            || ((plainText[recordDelimiterIndex] == RECORD_DELIMITER)
                && ((plainText.Length -1) != recordDataSize)))
        {
            throw new FormatException("Invalid record delimiter.");
        }

        return recordDelimiterIndex;
    }
}

With this method content decryption can be implemented.

public static class Aes128GcmEncoding
{
    private static async Task DecryptContentAsync(Stream source, Stream destination, int recordSize, byte[] pseudorandomKey, byte[] contentEncryptionKey)
    {
        GcmBlockCipher aes128GcmCipher = new GcmBlockCipher(new AesFastEngine());

        ulong recordSequenceNumber = 0;

        byte[] cipherText = new byte[recordSize];
        byte[] plainText = null;
        int recordDataSize = recordSize - RECORD_OVERHEAD_SIZE;
        int recordDelimiterIndex = 0;

        do
        {
            int cipherTextLength = await source.ReadAsync(cipherText, 0, cipherText.Length);
            if (cipherTextLength == 0)
                throw new FormatException("Invalid records order or missing record(s).");

            aes128GcmCipher.Reset();
            AeadParameters aes128GcmParameters = new AeadParameters(new KeyParameter(contentEncryptionKey),
                128, GetNonce(pseudorandomKey, recordSequenceNumber));
            aes128GcmCipher.Init(false, aes128GcmParameters);

            plainText = new byte[aes128GcmCipher.GetOutputSize(cipherText.Length)];
            int length = aes128GcmCipher.ProcessBytes(cipherText, 0, cipherText.Length, plainText, 0);
            aes128GcmCipher.DoFinal(plainText, length);

            recordDelimiterIndex = GetRecordDelimiterIndex(plainText, recordDataSize);

            if ((plainText[recordDelimiterIndex] == LAST_RECORD_DELIMITER) && (source.ReadByte() != -1))
                throw new FormatException("Invalid records order or missing record(s).");

            await destination.WriteAsync(plainText, 0, recordDelimiterIndex);
        }
        while (plainText[recordDelimiterIndex] != LAST_RECORD_DELIMITER);
    }
}

HttpClient plumbing

With the decoding implementation ready the components required by HttpClient can be prepared. I've decided to reuse the same wrapping pattern as with Aes128GcmEncodedContent.

public sealed class Aes128GcmDecodedContent : HttpContent
{
    private readonly HttpContent _contentToBeDecrypted;
    private readonly Func<string, byte[]> _keyLocator;

    public Aes128GcmDecodedContent(HttpContent contentToBeDecrypted, Func<string, byte[]> keyLocator)
    {
        _contentToBeDecrypted = contentToBeDecrypted;
        _keyLocator = keyLocator;
    }

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        if (!_contentToBeDecrypted.Headers.ContentEncoding.Contains("aes128gcm"))
            throw new NotSupportedException($"Encryption type not supported or stream isn't encrypted.");

        Stream streamToBeDecrypted = await _contentToBeDecrypted.ReadAsStreamAsync();

        await Aes128GcmEncoding.DecodeAsync(streamToBeDecrypted, stream, _keyLocator);
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;

        return false;
    }
}

But this time it is not our code which creates the content object - it comes from the response. In order to wrap the content coming from the response, the HttpClient pipeline needs to be extended with a DelegatingHandler which will take care of that upon detecting the desired Content-Encoding header value. The DelegatingHandler also gives an opportunity for setting the Accept-Encoding header so the other side knows that encrypted content is supported.

public sealed class Aes128GcmEncodingHandler : DelegatingHandler
{
    private readonly Func<string, byte[]> _keyLocator;

    public Aes128GcmEncodingHandler(Func<string, byte[]> keyLocator)
    {
        _keyLocator = keyLocator;
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("aes128gcm"));

        HttpResponseMessage response = await base.SendAsync(request, cancellationToken);

        if (response.Content.Headers.ContentEncoding.Contains("aes128gcm"))
        {
            response.Content = new Aes128GcmDecodedContent(response.Content, _keyLocator);
        }

        return response;
    }
}

With those components in place we can try requesting some encrypted content from server.

Test run

To see decryption in action the HttpClient pipeline needs to be set up to use the components created above (assuming the server will respond with encrypted content).

IDictionary<string, byte[]> _keys = new Dictionary<string, byte[]>
{
    { String.Empty, Convert.FromBase64String("yqdlZ+tYemfogSmv7Ws5PQ==") },
    { "a1", Convert.FromBase64String("BO3ZVPxUlnLORbVGMpbT1Q==") }
};
Func<string, byte[]> keyLocator = (keyId) => _keys[keyId ?? String.Empty];

HttpMessageHandler encryptedContentEncodingPipeline = new HttpClientHandler();
encryptedContentEncodingPipeline = new Aes128GcmEncodingHandler(keyLocator)
{
    InnerHandler = encryptedContentEncodingPipeline
};

using (HttpClient encryptedContentEncodingClient = new HttpClient(encryptedContentEncodingPipeline))
{
    string decryptedContent = encryptedContentEncodingClient.GetStringAsync("<URL>").Result;
}

This gives full support for aes128gcm content encoding in HttpClient. All the code is available here for anybody who would like to play with it.

When you hear HTTP and encryption, typically the first thought goes to SSL. SSL encrypts the entire communication between client and server, but what if there is a need to encrypt the content in such a way that the other side can only store it, and whoever requests it in the future will need the proper key to decrypt it? The "Encrypted Content-Encoding for HTTP" draft (in version 07 at the moment of writing this) aims at providing a standard solution for such scenarios. In this and the next post I'm going to show how it can be used with HttpClient.

The "aes128gcm" encoding

The "Encrypted Content-Encoding for HTTP" draft introduces a new value for the Content-Encoding header - aes128gcm. This encoding allows for transferring encrypted data together with the information necessary to decrypt it by somebody who knows the key. The encoded body consists of a coding header and the encrypted content represented by a number of fixed-size encrypted records (the last record can be smaller than the others, and there is a basic mechanism for preventing removal or reordering of records). For encryption, AES in Galois/Counter Mode with a 128-bit key is used.
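
Schematically the encoded body looks like this (sizes in octets, rs being the record size declared in the header):

coding header:  | salt (16) | record size (4) | keyid length (1) | keyid (0-255) |
content:        | record 1 (rs) | record 2 (rs) | ... | last record (up to rs) |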

Adding encoding capability to HttpClient

What I want to achieve is support for the aes128gcm encoding on top of any content type. From the HttpClient perspective it seems like something that can be easily achieved with a custom HttpContent which would serve as a wrapper over other ones.

public sealed class Aes128GcmEncodedContent : HttpContent
{
    private readonly HttpContent _contentToBeEncrypted;
    private readonly byte[] _key;
    private readonly string _keyId;
    private readonly int _recordSize;

    public Aes128GcmEncodedContent(HttpContent contentToBeEncrypted, byte[] key, string keyId, int recordSize)
    {
        _contentToBeEncrypted = contentToBeEncrypted;
        _key = key;
        _keyId = keyId;
        _recordSize = recordSize;

        Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        Headers.ContentEncoding.Add("aes128gcm");
    }
}

The key, key identifier and record size are parameters which control the encoding algorithm, so they need to be stored. I'm also setting the Content-Encoding header to aes128gcm and Content-Type to application/octet-stream. The latter is suggested by the specification, but there is also the option of skipping the Content-Type header entirely - the goal is to prevent exposure of the original Content-Type. In order to add the encoding I've decided to override the SerializeToStreamAsync method, which should allow me to work over streams whenever possible.

public sealed class Aes128GcmEncodedContent : HttpContent
{
    ...

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        Stream streamToBeEncrypted = await _contentToBeEncrypted.ReadAsStreamAsync();

        await Aes128GcmEncoding.EncodeAsync(streamToBeEncrypted, stream, _key, _keyId, _recordSize);
    }

    protected override bool TryComputeLength(out long length)
    {
        length = 0;

        return false;
    }
}

Implementing the encoding algorithm

At a high level the encoding algorithm consists of just a few steps, so it is simpler to show the implementation first and then describe each of the steps.

public static class Aes128GcmEncoding
{
    public static async Task EncodeAsync(Stream source, Stream destination, byte[] key, string keyId, int recordSize)
    {
        ...

        if ((key == null) || (key.Length != 16))
            throw new ArgumentException(
                $"The '{nameof(key)}' parameter must be 16 octets long.", nameof(key));

        if (recordSize < 18)
            throw new ArgumentException(
                $"The '{nameof(recordSize)}' parameter must be at least 18.", nameof(recordSize));

        byte[] salt = GenerateSalt();

        byte[] pseudorandomKey = HmacSha256(salt, key);
        byte[] contentEncryptionKey = GetContentEncryptionKey(pseudorandomKey);

        await WriteCodingHeaderAsync(destination, salt, keyId, recordSize);

        await EncryptContentAsync(source, destination, recordSize, pseudorandomKey, contentEncryptionKey);
    }
}

First some validations must be performed (I've skipped the null checks for brevity; a sketch of them follows the list below):

  • The key must be provided and must have exactly 128 bits. This comes from the encryption algorithm being used.
  • The record size must be at least 18 bytes. This comes from the fact that the encryption algorithm produces output 16 bytes longer than the source and a delimiter byte is required (so a record size of 18 bytes means we can put exactly 1 byte of data into each record).
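
The skipped null checks are nothing more than the usual argument guards, roughly:

if (source == null)
    throw new ArgumentNullException(nameof(source));

if (destination == null)
    throw new ArgumentNullException(nameof(destination));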

If the provided parameters are valid, the salt can be generated.

public static class Aes128GcmEncoding
{
    ...

    private static readonly SecureRandom _secureRandom = new SecureRandom();

    ...

    private static byte[] GenerateSalt()
    {
        byte[] salt = new byte[16];
        _secureRandom.NextBytes(salt, 0, 16);

        return salt;
    }

    ...
}

The salt will be used to generate the content encryption key, which will serve as the actual key for encryption. This is needed in order to prevent key exposure when different content is encrypted using the same keying material - without the random part this wouldn't be safe. In order to derive the content encryption key from the salt and the key, the HKDF algorithm must be used. The first step is calculating the pseudorandom key, which is the result of hashing the key with HMAC SHA-256 using the salt as the hashing key.

public static class Aes128GcmEncoding
{
    ...

    private static byte[] HmacSha256(byte[] key, byte[] value)
    {
        byte[] hash = null;

        using (HMACSHA256 hasher = new HMACSHA256(key))
        {
            hash = hasher.ComputeHash(value);
        }

        return hash;
    }

    ...
}

With the pseudorandom key in place, the content encryption key can be calculated as the HMAC SHA-256 hash of the content encryption key info parameter (using the pseudorandom key as the hashing key), truncated to 16 bytes. The content encryption key info parameter is the ASCII-encoded string Content-Encoding: aes128gcm terminated by 0x00 and 0x01 bytes.

public static class Aes128GcmEncoding
{
    ...

    private static readonly byte[] _contentEncryptionKeyInfoParameter;

    ...

    static Aes128GcmEncoding()
    {
        _contentEncryptionKeyInfoParameter = GetInfoParameter("Content-Encoding: aes128gcm");
    }

    ...

    private static byte[] GetInfoParameter(string infoParameterString)
    {
        byte[] infoParameter = new byte[infoParameterString.Length + 2];
        Encoding.ASCII.GetBytes(infoParameterString, 0, infoParameterString.Length, infoParameter, 0);

        infoParameter[infoParameter.Length - 1] = 1;

        return infoParameter;
    }

    private static byte[] GetContentEncryptionKey(byte[] pseudorandomKey)
    {
        byte[] contentEncryptionKey = HmacSha256(pseudorandomKey, _contentEncryptionKeyInfoParameter);
        Array.Resize(ref contentEncryptionKey, 16);

        return contentEncryptionKey;
    }

    ...
}

Those are all the prerequisites and now the body of the message can be created. First the header must be written and then the encrypted records.

Writing the coding header

The coding header has four fields: salt, record size, key identifier size and key identifier. The size of the header is variable as it depends on the length of the key identifier, which is why the key identifier size must be included. The size is kept as a single byte, which means that the key identifier can take up to 255 bytes. The key identifier is expected to be a UTF-8 encoded string, so with a simple method we can transform it into an array of bytes and verify the length.

public static class Aes128GcmEncoding
{
    ...

    private static byte[] GetKeyIdBytes(string keyId)
    {
        byte[] keyIdBytes = String.IsNullOrEmpty(keyId) ? new byte[0] : Encoding.UTF8.GetBytes(keyId);
        if (keyIdBytes.Length > Byte.MaxValue)
        {
            throw new ArgumentException($"The '{nameof(keyId)}' parameter is too long.", nameof(keyId));
        }

        return keyIdBytes;
    }

    ...
}

The second thing which needs to be transformed into a byte array is the record size. There are 4 bytes reserved for the record size in the header, which means it can store an unsigned 32-bit integer, but my implementation is limited to a signed integer which makes things simpler (especially as some of the methods are available only over arrays, which can't be bigger than 2GB).

public static class Aes128GcmEncoding
{
    ...

    private static byte[] GetRecordSizeBytes(int recordSize)
    {
        byte[] recordSizeBytes = BitConverter.GetBytes(recordSize);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(recordSizeBytes);
        }

        return recordSizeBytes;
    }

    ...
}

The header needs to be put together and written to the target stream.

public static class Aes128GcmEncoding
{
    ...

    private static async Task WriteCodingHeaderAsync(Stream destination, byte[] salt, string keyId, int recordSize)
    {
        byte[] keyIdBytes = GetKeyIdBytes(keyId);
        byte[] recordSizeBytes = GetRecordSizeBytes(recordSize);

        byte[] codingHeader = new byte[21 + keyIdBytes.Length];

        salt.CopyTo(codingHeader, 0);
        recordSizeBytes.CopyTo(codingHeader, 16);
        codingHeader[20] = (byte)keyIdBytes.Length;
        keyIdBytes.CopyTo(codingHeader, 21);

        await destination.WriteAsync(codingHeader, 0, codingHeader.Length);
    }

    ...
}

Encrypting the content and writing the records

Implementing AES GCM is beyond the scope of a blog post and I'm not even going to attempt it; instead I've chosen to use Bouncy Castle. As mentioned previously, the encrypted content must be represented by fixed-size records (the last one can be shorter), which means that the source data needs to be properly split. Also the last record must be properly detected, because it should be terminated with the 0x02 byte while all the previous ones should be terminated with the 0x01 byte. Detecting the last record is easy if the content doesn't split equally between records, as the final read will return a smaller number of bytes than requested. The situation when the content does split equally is a little bit more tricky - it requires checking whether the stream can be read further before writing the record to the target. The safest way to do that seems to be reading a single byte in advance and then prepending it to the next record. I've started the implementation with a helper method which reads the required number of bytes, prepends them with that "peeked" byte, and initially sets the delimiter based on the number of bytes returned.

public static class Aes128GcmEncoding
{
    ...

    private static async Task<byte[]> GetPlainTextAsync(Stream source, int recordDataSize, byte? peekedByte)
    {
        int readDataSize;
        byte[] plainText = new byte[recordDataSize + 1];

        if (peekedByte.HasValue)
        {
            plainText[0] = peekedByte.Value;
            readDataSize = (await source.ReadAsync(plainText, 1, recordDataSize - 1)) + 1;
        }
        else
        {
            readDataSize = await source.ReadAsync(plainText, 0, recordDataSize);
        }

        if (readDataSize == recordDataSize)
        {
            plainText[plainText.Length - 1] = 1;
        }
        else
        {
            Array.Resize(ref plainText, readDataSize + 1);
            plainText[plainText.Length - 1] = 2;
        }

        return plainText;
    }

    ...
}

The number of bytes to read is calculated from the record size by subtracting the previously mentioned overhead (17 bytes). This allows for putting the core of the encryption routine together.

public static class Aes128GcmEncoding
{
    ...

    private static async Task EncryptContentAsync(Stream source, Stream destination, int recordSize, byte[] pseudorandomKey, byte[] contentEncryptionKey)
    {
        GcmBlockCipher aes128GcmCipher = new GcmBlockCipher(new AesFastEngine());

        ulong recordSequenceNumber = 0;
        int recordDataSize = recordSize - 17;

        byte[] plainText = null;
        int? peekedByte = null;

        do
        {
            plainText = await GetPlainTextAsync(source, recordDataSize, (byte?)peekedByte);

            if (plainText[plainText.Length - 1] != 2)
            {
                peekedByte = source.ReadByte();
                if (peekedByte == -1)
                {
                    plainText[plainText.Length - 1] = 2;
                }
            }

            // TODO: Encrypt and write the record
        }
        while (plainText[plainText.Length - 1] != 2);
    }

    ...
}

AES GCM requires one more parameter which needs to be calculated - the nonce. The aes128gcm encoding additionally uses the nonce for removal and reordering protection by XOR-ing it with the record sequence number as the last step. The first argument for that XOR is the result of the same HKDF function as the one discussed in the context of the content encryption key; the difference is the info parameter (Content-Encoding: nonce) and the length (12 bytes).

public static class Aes128GcmEncoding
{
    ...

    private static readonly byte[] _nonceInfoParameter;

    ...

    static Aes128GcmEncoding()
    {
        ...
        _nonceInfoParameter = GetInfoParameter("Content-Encoding: nonce");
    }

    ...

    private static byte[] GetNonce(byte[] pseudorandomKey, ulong recordSequenceNumber)
    {
        byte[] nonce = HmacSha256(pseudorandomKey, _nonceInfoParameter);
        Array.Resize(ref nonce, 12);

        byte[] recordSequenceNumberBytes = BitConverter.GetBytes(recordSequenceNumber);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(recordSequenceNumberBytes);
        }
        int leadingNullBytesCount = 12 - recordSequenceNumberBytes.Length;

        for (int i = 0; i < leadingNullBytesCount; i++)
        {
            nonce[i] = (byte)(nonce[i] ^ 0);
        }

        for (int i = 0; i < recordSequenceNumberBytes.Length; i++)
        {
            nonce[leadingNullBytesCount + i] =
                (byte)(nonce[leadingNullBytesCount + i] ^ recordSequenceNumberBytes[i]);
        }

        return nonce;
    }

    ...
}

With the nonce calculated, every record can be encrypted and written to the target.

public static class Aes128GcmEncoding
{
    ...

    private static async Task EncryptContentAsync(Stream source, Stream destination, int recordSize, byte[] pseudorandomKey, byte[] contentEncryptionKey)
    {
        GcmBlockCipher aes128GcmCipher = new GcmBlockCipher(new AesFastEngine());

        ulong recordSequenceNumber = 0;
        int recordDataSize = recordSize - 17;

        byte[] plainText = null;
        int? peekedByte = null;

        do
        {
            plainText = await GetPlainTextAsync(source, recordDataSize, (byte?)peekedByte);

            if (plainText[plainText.Length - 1] != 2)
            {
                peekedByte = source.ReadByte();
                if (peekedByte == -1)
                {
                    plainText[plainText.Length - 1] = LAST_RECORD_DELIMITER;
                }
            }

            aes128GcmCipher.Reset();
            AeadParameters aes128GcmParameters = new AeadParameters(new KeyParameter(contentEncryptionKey),
                128, GetNonce(pseudorandomKey, recordSequenceNumber));
            aes128GcmCipher.Init(true, aes128GcmParameters);

            byte[] cipherText = new byte[aes128GcmCipher.GetOutputSize(plainText.Length)];
            int length = aes128GcmCipher.ProcessBytes(plainText, 0, plainText.Length, cipherText, 0);
            aes128GcmCipher.DoFinal(cipherText, length);

            await destination.WriteAsync(cipherText, 0, cipherText.Length);
        }
        while (plainText[plainText.Length - 1] != 2);
    }

    ...
}

Take it for a spin

With all the pieces in place the Aes128GcmEncodedContent can be used to make an actual request.

using (HttpClient encryptedContentEncodingClient = new HttpClient())
{
    HttpContent contentToBeEncrypted = new StringContent("I am the walrus", Encoding.UTF8);

    byte[] key = Convert.FromBase64String("yqdlZ+tYemfogSmv7Ws5PQ==");
    HttpContent encryptedContent = new Aes128GcmEncodedContent(contentToBeEncrypted, key, null, 4096);

    await encryptedContentEncodingClient.PostAsync("<URL>", encryptedContent);
}

Both Aes128GcmEncodedContent and Aes128GcmEncoding can be grabbed directly from here.

In the next post I'm going to focus on the decoding part.

The WebSocket protocol is currently the most popular one for pushing data to browsers, however it's not the only one. Server-Sent Events (SSE) is a very interesting alternative which can provide better performance for specific use cases.

What is SSE

Server-Sent Events is a unidirectional (server to browser) protocol for streaming events. The protocol delivers text-based messages over a long-lived HTTP connection. It also has built-in support for event identification, auto-reconnection (with tracking of the last received event) and notifications through DOM events. Its biggest advantage is high performance, as events can be pushed immediately with minimal overhead (there is an already open HTTP connection waiting which, thanks to the text-based messages, can utilize HTTP compression mechanisms). A considerable limitation is the general lack of support for binary streaming (but JSON or XML will work nicely).

Why use SSE

In general WebSockets can do everything that Server-Sent Events can and more, as they provide bidirectional communication. There is also broader browser support (93%) for WebSockets. So why would one consider SSE (assuming bidirectional communication isn't a requirement, or the client-to-server communication is occasional and can be done in REST style)? The fact that it runs over a long-lived HTTP connection is the game changer here. In the case of WebSockets we are talking about a custom TCP-based protocol which needs to be supported by the server and the entire infrastructure (proxies, firewalls etc.); any legacy element along the way may cause an issue. There are no such issues for SSE: anything that speaks HTTP will speak SSE, and the aspect of browser support (87%) can be addressed with polyfills. Taking this into consideration, along with the notably lower latency, Server-Sent Events is a very compelling choice for scenarios like stock tickers or notifications.

Bringing SSE to ASP.NET Core

One of the key concepts behind ASP.NET Core is the modular HTTP request pipeline which can be extended through middleware, so I'm going to create one for Server-Sent Events. But first some prerequisites are needed.

The middleware will require an abstraction representing a client. As previously stated, SSE runs over a long-lived HTTP connection, which means the channel for communicating with the client is the HttpResponse instance. The abstraction will simply wrap around it.

public class ServerSentEventsClient
{
    private readonly HttpResponse _response;

    internal ServerSentEventsClient(HttpResponse response)
    {
        _response = response;
    }
}

There is also a need for some kind of service which will serve as a bridge between the middleware and the rest of the application. Its primary goal will be managing the collection of connected clients. Below is a simple implementation based on ConcurrentDictionary.

public class ServerSentEventsService
{
    private readonly ConcurrentDictionary<Guid, ServerSentEventsClient> _clients = new ConcurrentDictionary<Guid, ServerSentEventsClient>();

    internal Guid AddClient(ServerSentEventsClient client)
    {
        Guid clientId = Guid.NewGuid();

        _clients.TryAdd(clientId, client);

        return clientId;
    }

    internal void RemoveClient(Guid clientId)
    {
        ServerSentEventsClient client;

        _clients.TryRemove(clientId, out client);
    }
}

With those elements in place the middleware can be created. It will have two responsibilities: establishing the connection and cleaning up when the client closes the connection.

In order to establish the connection the middleware should inspect the Accept header of the incoming request; if its value is text/event-stream it means the client is attempting to open an SSE connection. In such a case the Content-Type response header should be set to text/event-stream, the headers should be sent, and the connection needs to be kept open.

The clean-up part requires detecting that the client has closed the connection. This can be done by waiting on the CancellationToken available through the HttpContext.RequestAborted property. An important thing to note here is that a closed connection can only be detected when sending a new event. This limitation is often solved by sending a dedicated heartbeat event which the client should simply ignore.

public class ServerSentEventsMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ServerSentEventsService _serverSentEventsService;

    public ServerSentEventsMiddleware(RequestDelegate next, ServerSentEventsService serverSentEventsService)
    {
        _next = next;
        _serverSentEventsService = serverSentEventsService;
    }

    public Task Invoke(HttpContext context)
    {
        if (context.Request.Headers["Accept"] == "text/event-stream")
        {
            context.Response.ContentType = "text/event-stream";
            context.Response.Body.Flush();

            ServerSentEventsClient client = new ServerSentEventsClient(context.Response);
            Guid clientId = _serverSentEventsService.AddClient(client);

            context.RequestAborted.WaitHandle.WaitOne();

            _serverSentEventsService.RemoveClient(clientId);

            return Task.FromResult(true);
        }
        else
        {
            return _next(context);
        }
    }
}

With the connection management part in place, the sending part can be added. The message format in SSE is very simple. The basic building blocks of every message are fields, whose general format looks like this: <FieldName>: <FieldValue>\n. There are three types of fields (well, in fact four, as there is an additional one for controlling the client reconnect interval):

  • id - The identifier of the event.
  • event - The type of the event.
  • data - A single line of data (entire payload of message is represented by one or more adjacent data fields).

Only the data field is required, and the entire message is terminated by an additional new line (\n).
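
For example, a message with an identifier, a type and a two-line payload would go over the wire as below (followed by an empty line which terminates the message):

id: 1
event: notification
data: first line of the payload
data: second line of the payload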

public class ServerSentEvent
{
    public string Id { get; set; }

    public string Type { get; set; }

    public IList<string> Data { get; set; }
}

internal static class ServerSentEventsHelper
{
    internal static async Task WriteSseEventAsync(this HttpResponse response, ServerSentEvent serverSentEvent)
    {
        if (!String.IsNullOrWhiteSpace(serverSentEvent.Id))
            await response.WriteSseEventFieldAsync("id", serverSentEvent.Id);

        if (!String.IsNullOrWhiteSpace(serverSentEvent.Type))
            await response.WriteSseEventFieldAsync("event", serverSentEvent.Type);

        if (serverSentEvent.Data != null)
        {
            foreach(string data in serverSentEvent.Data)
                await response.WriteSseEventFieldAsync("data", data);
        }

        await response.WriteSseEventBoundaryAsync();
        response.Body.Flush();
    }

    private static Task WriteSseEventFieldAsync(this HttpResponse response, string field, string data)
    {
        return response.WriteAsync($"{field}: {data}\n");
    }

    private static Task WriteSseEventBoundaryAsync(this HttpResponse response)
    {
        return response.WriteAsync("\n");
    }
}

The above helper can be used in order to expose the send method on the client abstraction.

public class ServerSentEventsClient
{
    ...

    public Task SendEventAsync(ServerSentEvent serverSentEvent)
    {
        return _response.WriteSseEventAsync(serverSentEvent);
    }
}

The last step is exposing a send method at the service level - it should perform the send for all connected clients.

public class ServerSentEventsService
{
    ...

    public Task SendEventAsync(ServerSentEvent serverSentEvent)
    {
        List<Task> clientsTasks = new List<Task>();
        foreach (ServerSentEventsClient client in _clients.Values)
        {
            clientsTasks.Add(client.SendEventAsync(serverSentEvent));
        }

        return Task.WhenAll(clientsTasks);
    }
}

We can say that this gives us what project managers like to call a minimum viable product. After extending the pipeline with the middleware and adding the service to the services collection (as a singleton) we can send events from any desired place in the application, as sketched below. If there is a need to expose more than one endpoint, derived services can be created, added to the services collection and passed to the respective middleware instances during initialization.
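
Assuming the classes above, the wiring can look roughly like this:

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // A single instance shared between the middleware and the rest of the application
        services.AddSingleton<ServerSentEventsService>();
    }

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        app.UseMiddleware<ServerSentEventsMiddleware>();

        ...
    }
}

Anywhere the ServerSentEventsService gets injected, an event can then be pushed to all connected clients (the values below are purely illustrative):

await serverSentEventsService.SendEventAsync(new ServerSentEvent
{
    Type = "notification",
    Data = new List<string> { "Hello from the server" }
});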

I've made an extended version (support for reconnect interval, extensibility point for auto-reconnect and extensions for service and middleware registration) available on GitHub and as a NuGet package.

SSE at work

I've also created a demo application which utilizes the above components; it can be found here. The application exposes two SSE endpoints:

  • /sse-heartbeat which can be "listened to" by navigating to /sse-heartbeat-receiver.html. It sends an event every 5s and is implemented through an ugly background thread.
  • /sse-notifications which can be "listened to" by navigating to /notifications/sse-notifications-receiver. Sending events to this endpoint can be done by navigating to /notifications/sse-notifications-sender.

It might be a good starting point for those who would like to play with what I've shown here.

This is one of those "I've had to explain this a couple of times already, so next time I want something I can redirect people to" kind of posts.

What I want to write about is the difference in behavior between using new() and DbSet.Create() for instantiating new entities. In order to do this I've created a very simple model and context.

public class Planet
{
    public virtual int Id { get; set; }

    public virtual string Name { get; set; }

    ...

    public virtual ICollection<Character> Natives { get; set; }
}

public class Character
{
    public virtual int Id { get; set; }

    public virtual string Name { get; set; }

    ...

    public virtual int HomeworldId { get; set; }

    public virtual Planet Homeworld { get; set; }
}

public interface IStarWarsContext
{
    DbSet<Planet> Planets { get; set; }

    DbSet<Character> Characters { get; set; }

    int SaveChanges();
}

public class StarWarsContext : DbContext, IStarWarsContext
{
    public DbSet<Planet> Planets { get; set; }

    public DbSet<Character> Characters { get; set; }
}

I've also created a very simple view which lists the Characters already present in the database and allows for adding new ones.

@using (Html.BeginForm())
{
    <fieldset>
        <legend>New StarWars Character</legend>
        <div>
            @Html.LabelFor(m => m.Name)
        </div>
        <div>
            @Html.TextBoxFor(m => m.Name)
        </div>
        <div>
            @Html.LabelFor(m => m.Homeworld)
        </div>
        <div>
            @Html.DropDownListFor(m => m.Homeworld, Model.Planets)
        </div>
        ...
        <div>
            <input type="submit" value="Add" />
        </div>
    </fieldset>
}
<table>
    <thead>
        <tr>
            <th>Name</th>
            <th>Homeworld</th>
            ...
        </tr>
    </thead>
    <tbody>
        @foreach (Character character in Model.Characters)
        {
            <tr>
                <td>@character.Name</td>
                <td>@character.Homeworld.Name</td>
                ...
            </tr>
        }
    </tbody>
</table>

The view is powered by the following ViewModel and controller.

public class StarWarsViewModel
{
    public string Name { get; set; }

    ...

    public int Homeworld { get; set; }

    public IEnumerable<SelectListItem> Planets { get; set; }

    public IReadOnlyList<Character> Characters { get; set; }
}

public class StarWarsController : Controller
{
    private IStarWarsContext _startWarsContext;

    public StarWarsController(IStarWarsContext startWarsContext)
    {
        _startWarsContext = startWarsContext;
    }

    [HttpGet]
    public ActionResult Index()
    {
        return View(GetViewModel());
    }

    [HttpPost]
    public ActionResult Index(StarWarsViewModel viewModel)
    {
        AddCharacter(viewModel);

        return View(GetViewModel());
    }

    private StarWarsViewModel GetViewModel()
    {
        return new StarWarsViewModel
        {
            Planets = _startWarsContext.Planets
                .Select(p => new { p.Id, p.Name })
                .ToList()
                .Select(p => new SelectListItem { Value = p.Id.ToString(), Text = p.Name }),
            Characters = _startWarsContext.Characters.ToList()
        };
    }

    private void AddCharacter(StarWarsViewModel viewModel)
    {
        throw new NotImplementedException();
    }
}

The AddCharacter method is the point of interest here. There are two ways to implement it and they result in different behavior.

Creating entity with new()

Following the first Entity Framework tutorial which pops up on Google will result in code similar to the one below.

private void AddCharacter(StarWarsViewModel viewModel)
{
    Character character = new Character();
    character.Name = viewModel.Name;
    ...
    character.HomeworldId = viewModel.Homeworld;

    _startWarsContext.Characters.Add(character);
    _startWarsContext.SaveChanges();
}

Running this code and adding a new Character will result in a NullReferenceException coming from the part of the view which generates the table (to be more exact, from @character.Homeworld.Name). The reason for the exception is that Entity Framework needs to lazy load the Planet entity, but the just-added Character entity is not a dynamic proxy, so lazy loading doesn't work for it. Only Entity Framework can create a dynamic proxy, but in this scenario there is no way for it to do so - the caller already owns the reference to the entity and it cannot be changed to a different class.

Creating entity with DbSet.Create()

In order to be able to create new entities as proper dynamic proxies, the DbSet class provides the Create method. This method returns a new dynamic proxy instance which isn't added or attached to the context. To use it, only a single line of code needs to be changed.

private void AddCharacter(StarWarsViewModel viewModel)
{
    Character character = _startWarsContext.Characters.Create();
    character.Name = viewModel.Name;
    ...
    character.HomeworldId = viewModel.Homeworld;

    _startWarsContext.Characters.Add(character);
    _startWarsContext.SaveChanges();
}

After this simple change the code works as expected - related entities are lazy loaded when needed.
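
One way to actually see the difference is to inspect the runtime types of both instances - a quick sketch (the exact proxy type name will differ, it is generated into the System.Data.Entity.DynamicProxies namespace):

Character plainCharacter = new Character();
Character proxyCharacter = _startWarsContext.Characters.Create();

// Prints "Character"
Console.WriteLine(plainCharacter.GetType().Name);

// Prints something like "Character_0F74C3..." - a dynamically generated proxy derived from Character
Console.WriteLine(proxyCharacter.GetType().Name);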

The takeaway

The sample above is built in a way which highlights the difference between new() and DbSet.Create() (it even has an N+1 selects problem hiding in there for the sake of simplicity). In real life this rarely causes an issue, as there are a couple of other things which can impact the behavior (the related entity already being present in the context, or usage of the Include() method - see the sketch below). But when it does cause an issue it's always unexpected, and I've seen smart people wrapping their heads around what is going on. It is important to understand the difference and use both mechanisms appropriately (sometimes the lack of lazy loading may be desired).
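
For completeness, a sketch of the eager loading alternative mentioned above - with Include() the related Planet is fetched in the same query, so the view doesn't depend on lazy loading (or on how the entity was created) at all. The lambda overload of Include() comes from the System.Data.Entity namespace:

private StarWarsViewModel GetViewModel()
{
    return new StarWarsViewModel
    {
        ...
        // Homeworld is loaded together with each Character in a single query
        Characters = _startWarsContext.Characters
            .Include(c => c.Homeworld)
            .ToList()
    };
}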
