Not so long ago my colleague reached out with a question: "Are there any known issues with ITelemetryInitializer in Azure Functions?". This question started a discussion about the monitoring configuration for C# Azure Functions in the isolated worker model. At some point in that discussion I stated "Alright, you've motivated me, I'll make a sample". When I sat down to do that, I started wondering which scenario I should cover, and my conclusion was that there are several things I should go through... So here I am.

Setting the Scene

Before I start describing how we can monitor an isolated worker model C# Azure Function, allow me to introduce you to the one we are going to use for this purpose. I want to start with as basic a setup as possible. This is why the initial Program.cs will contain only four lines.

var host = new HostBuilder()
    .ConfigureFunctionsWebApplication()
    .Build();

host.Run();

The host.json will also be minimal.

{
    "version": "2.0",
    "logging": {
        "logLevel": {
            "default": "Information"
        }
    }
}

For the function itself, I've decided to go for the Fibonacci sequence implementation as it can easily generate a ton of logs.

public class FibonacciSequence
{
    private readonly ILogger<FibonacciSequence> _logger;

    public FibonacciSequence(ILogger<FibonacciSequence> logger)
    {
        _logger = logger;
    }

    [Function(nameof(FibonacciSequence))]
    public async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "fib/{index:int}")]
        HttpRequestData request,
        int index)
    {
        _logger.LogInformation(
            $"{nameof(FibonacciSequence)} function triggered for {{index}}.",
            index
        );

        var response = request.CreateResponse(HttpStatusCode.OK);
        response.Headers.Add("Content-Type", "text/plain; charset=utf-8");

        await response.WriteStringAsync(FibonacciSequenceRecursive(index).ToString());

        return response;
    }

    private int FibonacciSequenceRecursive(int index)
    {
        int fibonacciNumber = 0;

        _logger.LogInformation("Calculating Fibonacci sequence for {index}.", index);

        if (index <= 0)
        {
            fibonacciNumber = 0;
        }
        else if (index <= 1)
        {
            fibonacciNumber = 1;
        }
        else
        {
            fibonacciNumber = 
              FibonacciSequenceRecursive(index - 1)
              + FibonacciSequenceRecursive(index - 2);
        }

        _logger.LogInformation(
            "Calculated Fibonacci sequence for {index}: {fibonacciNumber}.",
            index,
            fibonacciNumber
        );

        return fibonacciNumber;
    }
}

This is a recursive implementation which in our case has the added benefit of being crashable on demand 😉.

Now we can start capturing the signals this function will produce after deployment.

Simple and Limited Option for Specific Scenarios - File System Logging

What we can capture in a minimal deployment scenario is the logs coming from our function. What do I understand by a minimal deployment scenario? The bare minimum that an Azure Function requires: a storage account, an app service plan, and a function app. The function can push all its logs into that storage account. Is this something I would recommend for a production scenario? Certainly not. It's only logs (and in production you will want metrics) and it works reasonably only for Azure Functions deployed on Windows (for production I would suggest Linux due to better cold start performance or lower pricing for a dedicated plan). But it may be the right option for some development scenarios (when you want to run some work, download the logs, and analyze them locally). So, how do we achieve this? Let's start with some Bicep snippets for the required infrastructure. First, we need to deploy a storage account.

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-05-01' = {
  name: 'stwebjobs${uniqueString(resourceGroup().id)}'
  location: resourceGroup().location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'Storage'
}

We also need an app service plan. It must be a Windows one (the below snippet creates a Windows Consumption Plan).

resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
  name: 'plan-monitored-function'
  location: resourceGroup().location
  sku: {
    name: 'Y1'
    tier: 'Dynamic'
  }
  properties: {
    computeMode: 'Dynamic'
  }
}

And the actual service doing the work, the function app.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  name: 'func-monitored-function'
  location: resourceGroup().location
  kind: 'functionapp'
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      netFrameworkVersion: 'v8.0'
      appSettings: [
        {
          name: 'AzureWebJobsStorage'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${storageAccount.listKeys().keys[0].value}'
        }
        {
          name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${storageAccount.listKeys().keys[0].value}'
        }
        {
          name: 'WEBSITE_CONTENTSHARE'
          value: 'func-monitored-function'
        }
        {
          name: 'FUNCTIONS_EXTENSION_VERSION'
          value: '~4'
        }
        {
          name: 'FUNCTIONS_WORKER_RUNTIME'
          value: 'dotnet-isolated'
        }
      ]
    }
    httpsOnly: true
  }
}

A couple of words about this function app. The kind: 'functionapp' indicates that this is a Windows function app, which creates the requirement of setting netFrameworkVersion to the desired .NET version. To make this an isolated worker model function, FUNCTIONS_EXTENSION_VERSION is set to ~4 and FUNCTIONS_WORKER_RUNTIME to dotnet-isolated. When it comes to all the settings referencing the storage account, your attention should go to WEBSITE_CONTENTAZUREFILECONNECTIONSTRING and WEBSITE_CONTENTSHARE - those indicate the file share that will be created for this function app in the storage account (this is where the logs will go).

After deployment, the resulting infrastructure should be as below.

Infrastructure for Azure Functions with file system logging (storage account, app service plan, and function app)

What remains is configuring the file system logging in the host.json file of our function. The default behavior is to generate log files only when the function is being debugged using the Azure portal. We want them to be generated always.

{
    "version": "2.0",
    "logging": {
        "fileLoggingMode": "always",
        "logLevel": {
            "default": "Information"
        }
    }
}

When you deploy the code and execute some requests, you will be able to find the log files in the defined file share (the path is /LogFiles/Application/Functions/Host/).

2024-11-17T15:33:39.339 [Information] Executing 'Functions.FibonacciSequence' ...
2024-11-17T15:33:39.522 [Information] FibonacciSequence function triggered for 6.
2024-11-17T15:33:39.526 [Information] Calculating Fibonacci sequence for 6.
2024-11-17T15:33:39.526 [Information] Calculating Fibonacci sequence for 5.
...
2024-11-17T15:33:39.527 [Information] Calculated Fibonacci sequence for 5: 5.
...
2024-11-17T15:33:39.528 [Information] Calculated Fibonacci sequence for 4: 3.
2024-11-17T15:33:39.528 [Information] Calculated Fibonacci sequence for 6: 8.
2024-11-17T15:33:39.566 [Information] Executed 'Functions.FibonacciSequence' ...

For Production Scenarios - Entra ID Protected Application Insights

As I said, I wouldn't use file system logging for production. Very often it's not even enough for development. This is why Application Insights is considered the default these days. But before we deploy one, we can use the fact that we no longer need Windows and change the deployment to a Linux-based one. First I'm going to change the app service plan to a Linux Consumption Plan.

resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
  ...
  kind: 'linux'
  properties: {
    reserved: true
  }
}

With a different app service plan, the function app can be changed to a Linux one, which means changing the kind to functionapp,linux and replacing netFrameworkVersion with the corresponding linuxFxVersion.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  kind: 'functionapp,linux'
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      linuxFxVersion: 'DOTNET-ISOLATED|8.0'
      ...
    }
    httpsOnly: true
  }
}

There is also a good chance that you no longer need the file share (although the documentation itself is inconsistent about whether it's needed when running Linux functions on the Elastic Premium plan), so the WEBSITE_CONTENTAZUREFILECONNECTIONSTRING and WEBSITE_CONTENTSHARE settings can simply be removed. No requirement for the file share creates one more improvement opportunity - we can drop the credentials for the blob storage (AzureWebJobsStorage) as this connection can use a managed identity. I'm going to use a user-assigned managed identity because I often prefer them and they usually cause more trouble to set up 😉. To do so, we need to create one and grant it the Storage Blob Data Owner role for the storage account.

resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-monitored-function'
  location: resourceGroup().location
}

resource storageBlobDataOwnerRoleDefinition 'Microsoft.Authorization/roleDefinitions@2022-04-01' existing = {
  name: 'b7e6dc6d-f1e8-4753-8033-0f276bb0955b' // Storage Blob Data Owner
  scope: subscription()
}

resource storageBlobDataOwnerRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: storageAccount
  name: guid(storageAccount.id, managedIdentity.id, storageBlobDataOwnerRoleDefinition.id)
  properties: {
    roleDefinitionId: storageBlobDataOwnerRoleDefinition.id
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

The change to the function app is only about assigning the managed identity and replacing AzureWebJobsStorage with AzureWebJobsStorage__accountName.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...

  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        {
          name: 'AzureWebJobsStorage__accountName'
          value: storageAccount.name
        }
        ...
      ]
    }
    httpsOnly: true
  }
}

Enough detouring (although it wasn't without a purpose 😜) - it's time to deploy Application Insights. As the classic Application Insights are retired, we are going to create a workspace-based instance.

resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
  name: 'log-monitored-function'
  location: resourceGroup().location
  properties: {
    sku: { 
      name: 'PerGB2018' 
    }
  }
}

resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: 'appi-monitored-function'
  location: resourceGroup().location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
    DisableLocalAuth: true
  }
}

You might have noticed that I've set DisableLocalAuth to true. This is a security improvement. It enforces authentication by Entra ID for ingestion and as a result, makes InstrumentationKey a resource identifier instead of a secret. This is nice and we can easily handle it because we already have managed identity in place (I told you it had a purpose 😊). All we need to do is grant the Monitoring Metrics Publisher role to our managed identity.

resource monitoringMetricsPublisherRoleDefinition 'Microsoft.Authorization/roleDefinitions@2022-04-01' existing = {
  name: '3913510d-42f4-4e42-8a64-420c390055eb' // Monitoring Metrics Publisher
  scope: subscription()
}

resource monitoringMetricsPublisherRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: applicationInsights
  name: guid(applicationInsights.id, managedIdentity.id, monitoringMetricsPublisherRoleDefinition.id)
  properties: {
    roleDefinitionId: monitoringMetricsPublisherRoleDefinition.id
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

Adding two application settings definitions to our function app will tie the services together.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
          value: applicationInsights.properties.ConnectionString
        }
        {
          name: 'APPLICATIONINSIGHTS_AUTHENTICATION_STRING'
          value: 'ClientId=${managedIdentity.properties.clientId};Authorization=AAD'
        }
        ...
      ]
    }
    httpsOnly: true
  }
}

Are we done with the infrastructure? Not really. There is one more useful thing that is often forgotten - enabling storage logs. There is some important function app data in there, so it would be nice to monitor it.

resource storageAccountBlobService 'Microsoft.Storage/storageAccounts/blobServices@2023-05-01' existing = {
  name: 'default'
  parent: storageAccount
}

resource storageAccountDiagnosticSettings 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: '${storageAccount.name}-diagnostic'
  scope: storageAccountBlobService
  properties: {
    workspaceId: logAnalyticsWorkspace.id
    logs: [
      {
        category: 'StorageWrite'
        enabled: true
      }
    ]
    metrics: [
      {
        category: 'Transaction'
        enabled: true
      }
    ]
  }
}

Now we are done and our infrastructure should look like below.

Infrastructure for Azure Functions with Application Insights (managed identity, application insights, log workspace, storage account, app service plan, and function app)

Once we deploy our function and make some requests, thanks to the codeless monitoring performed by the host and the relaying of worker logs through the host, we can find the traces in Application Insights.

Relayed worker logs ingested through host codeless monitoring

You may be asking what I mean by "relaying worker logs through the host"? You probably remember that in the case of C# Azure Functions in the isolated worker model, we have two processes: functions host and isolated worker. Azure Functions wants to be helpful and by default, it sends the logs from the worker process to the functions host process, which then sends them to Application Insights.

Azure Functions relaying worker logs through the host

This is nice, but may not be exactly what you want. You may want to split the logs so you can treat them separately (for example by configuring different default log levels for the host and the worker). To achieve that you must explicitly set up Application Insights integration in the worker code.

var host = new HostBuilder()
    .ConfigureFunctionsWebApplication()
    .ConfigureServices(services => {
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

host.Run();

The telemetry from the worker can now be controlled in code, while the telemetry from the host is still controlled through host.json. But if you deploy this, you will find no telemetry from the worker in Application Insights. You can still see your logs in the Log stream. You will also see a lot of those:

Azure.Identity: Service request failed.
Status: 400 (Bad Request)

Content:
{"statusCode":400,"message":"Unable to load the proper Managed Identity.","correlationId":"..."}

That's the Application Insights SDK not being able to use the user-assigned managed identity. The Azure Functions runtime uses the APPLICATIONINSIGHTS_AUTHENTICATION_STRING setting to provide a user-assigned managed identity OAuth token to Application Insights, but none of that (at the time of writing this) happens when we set up the integration explicitly. That said, we can do it ourselves. The parser for the setting is available in the Microsoft.Azure.WebJobs.Logging.ApplicationInsights package, so we can mimic the host implementation.

using TokenCredentialOptions =
    Microsoft.Azure.WebJobs.Logging.ApplicationInsights.TokenCredentialOptions;

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        services.Configure<TelemetryConfiguration>(config =>
        {
            string? authenticationString =
                Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_AUTHENTICATION_STRING");

            if (!String.IsNullOrEmpty(authenticationString))
            {
                var tokenCredentialOptions = TokenCredentialOptions.ParseAuthenticationString(
                    authenticationString
                );
                config.SetAzureTokenCredential(
                    new ManagedIdentityCredential(tokenCredentialOptions.ClientId)
                );
            }
        });
        services.AddApplicationInsightsTelemetryWorkerService();
        services.ConfigureFunctionsApplicationInsights();
    })
    .Build();

host.Run();

The telemetry should now reach the Application Insights without a problem.

Azure Functions emitting worker logs directly

Be cautious. As the Application Insights SDK now controls emitting the logs, you must be aware of its opinions. It so happens that it tries to optimize by default and adds a logging filter that sets the minimum log level to Warning. You may want to get rid of that (be sure to do it after calling ConfigureFunctionsApplicationInsights).

...

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        ...
        services.ConfigureFunctionsApplicationInsights();
        services.Configure<LoggerFilterOptions>(options =>
        {
            LoggerFilterRule? sdkRule = options.Rules.FirstOrDefault(rule =>
                rule.ProviderName == typeof(Microsoft.Extensions.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider).FullName
            );

            if (sdkRule is not null)
            {
                options.Rules.Remove(sdkRule);
            }
        });
    })
    .Build();

host.Run();
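
With the SDK's filter out of the way, the worker's verbosity can be tuned in code, independently of what host.json says for the host. Below is a minimal sketch of what that could look like (the category name is just an example, not something from this sample).

var host = new HostBuilder()
    ...
    .ConfigureLogging(logging =>
    {
        // Worker-side default log level, independent of the host's host.json.
        logging.SetMinimumLevel(LogLevel.Information);

        // Example of a per-category override (hypothetical category name).
        logging.AddFilter("MonitoredFunctions.FibonacciSequence", LogLevel.Warning);
    })
    .Build();

host.Run();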

Modifying Application Map in Application Insights

The default application map is not impressive - it will simply show your function app by its resource name.

Default application map in Application Insights

As function apps rarely exist in isolation, we often want to present them in a more meaningful way on the map by providing a cloud role. The simplest way to do this is through the WEBSITE_CLOUD_ROLENAME setting.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'WEBSITE_CLOUD_ROLENAME'
          value: 'InstrumentedFunctions'
        }
        ...
      ]
    }
    ...
  }
}

This will impact both the host and the worker. With explicit Application Insights integration, we can separate the two (if there is such a need) and change the cloud role for the worker. This is done in the usual way, by registering an ITelemetryInitializer. The only important detail is the place of the registration - it needs to come after ConfigureFunctionsApplicationInsights, as Azure Functions adds its own initializers there.

public class IsolatedWorkerTelemetryInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        telemetry.Context.Cloud.RoleName = "InstrumentedFunctionsIsolatedWorker";
    }
}
...

var host = new HostBuilder()
    ...
    .ConfigureServices(services => {
        ...
        services.ConfigureFunctionsApplicationInsights();
        services.AddSingleton<ITelemetryInitializer, IsolatedWorkerTelemetryInitializer>();
        services.Configure<LoggerFilterOptions>(options =>
        {
            ...
        });
    })
    .Build();

host.Run();

The resulting map will look as below.

Modified application map in Application Insights

Monitoring for Scaling Decisions

This is currently in preview, but you can enable emitting scale controller logs (for a chance to understand the scaling decisions). This is done with a single setting (SCALE_CONTROLLER_LOGGING_ENABLED) that takes a value in the <DESTINATION>:<VERBOSITY> format. The possible values for the destination are Blob and AppInsights, while the verbosity can be None, Warning, or Verbose. Verbose seems to be the most useful one when you are looking for understanding, as it's supposed to provide the reason for changes in the worker count and information about the triggers that impact those decisions.

resource functionApp 'Microsoft.Web/sites@2024-04-01' = {
  ...
  properties: {
    ...
    siteConfig: {
      ...
      appSettings: [
        ...
        {
          name: 'SCALE_CONTROLLER_LOGGING_ENABLED'
          value: 'AppInsights:Verbose'
        }
        ...
      ]
    }
    ...
  }
}

Is This Exhaustive?

Certainly not 🙂. There are other things that you can fiddle with. You can have a very granular configuration of different log levels for different categories. You can fine-tune aggregation and sampling. When you are playing with all those settings, please remember that you're looking for the right balance between the gathered details and the costs. My advice is not to configure monitoring for the worst-case scenario (when you need every detail), as that often isn't financially sustainable. Rather, aim for a lower amount of information and change the settings to gather more if needed (a small hint - you can override host.json settings through application settings, without redeploying).
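
For example (if I remember the convention correctly - the host.json path segments are joined with double underscores), an application setting like the one below should override logging.logLevel.default from host.json on a running function app.

AzureFunctionsJobHost__logging__logLevel__default = "Debug"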

I've been exploring working with asynchronous streaming data sources over HTTP for quite some time on this blog. I've been playing with async streamed JSON and NDJSON. I've been streaming and receiving streams in ASP.NET Core and console applications. But there is one scenario I haven't touched yet - streaming from Blazor WebAssembly. Why? Simply because it wasn't possible.

Streaming objects from Blazor WebAssembly requires two things. The first is support for streaming upload in the browser's Fetch API. The Fetch API has supported streams for response bodies for quite some time (my first post using it is from 2019), but non-experimental support for streams as request bodies came to Chrome, Edge, and Safari in 2022 (and it has yet to come to Firefox as far as I can tell). Still, it's been available for two years and I haven't explored it yet? Well, I did, more than a year ago, with NDJSON and pure JavaScript, where I used the ASP.NET Core endpoint I created long ago. But here we are talking about Blazor WebAssembly, which brings us to the second requirement - the browser API needs to be surfaced in Blazor. This is finally happening with .NET 9 and now I can fully explore it.

Async Streaming NDJSON From Blazor WebAssembly

I've decided to start with NDJSON because I already have my own building blocks for it (the mentioned ASP.NET Core endpoint, and NdjsonAsyncEnumerableContent coming from one of my packages). If things didn't work, I would be able to easily debug them.

I've quickly put together typical code using HttpClient. I just had to make sure I enabled streaming upload by calling SetBrowserRequestStreamingEnabled on the request instance.

private async Task PostWeatherForecastsNdjsonStream()
{
    ...

    try
    {
        ...

        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Post, "api/WeatherForecasts/stream");
        request.Content = new NdjsonAsyncEnumerableContent<WeatherForecast>(
            StreamWeatherForecastsAsync());
        request.SetBrowserRequestStreamingEnabled(true);

        using HttpResponseMessage response = await Http.SendAsync(request, cancellationToken);

        response.EnsureSuccessStatusCode();
    }
    finally
    {
        ...
    }
}

It worked! No additional changes and no issues to resolve. I could see the items being logged by my ASP.NET Core service as they were streamed.

After this initial success, I could move to a potentially more challenging scenario - async streaming JSON.

Async Streaming JSON From Blazor WebAssembly

In theory, async streaming JSON should also just work. .NET has support for IAsyncEnumerable built into JsonSerializer.SerializeAsync, so JsonContent should just use it. Well, there is no better way to find out than to try, so I quickly changed the code.

private async Task PostWeatherForecastsJsonStream()
{
    ...

    try
    {
        ...

        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Post, "api/WeatherForecasts/stream");
        request.Content = JsonContent.Create(StreamWeatherForecastsAsync());
        request.SetBrowserRequestStreamingEnabled(true);

        using HttpResponseMessage response = await Http.SendAsync(request, cancellationToken);

        response.EnsureSuccessStatusCode();
    }
    finally
    {
        ...
    }
}

To my surprise, it also worked! I couldn't test it against my ASP.NET Core service because it didn't support it, but I quickly created a dumb endpoint that would simply dump whatever was incoming.
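
Such a dumb endpoint could look roughly like the sketch below (hypothetical names, assuming an injected _logger) - it just reads the request body as it arrives and logs every chunk.

[HttpPost("dump")]
public async Task<IActionResult> Dump(CancellationToken cancellationToken)
{
    using StreamReader reader = new StreamReader(Request.Body);

    char[] buffer = new char[4096];
    int charsRead;

    while ((charsRead = await reader.ReadAsync(buffer.AsMemory(), cancellationToken)) > 0)
    {
        // Log whatever part of the payload has arrived so far.
        _logger.LogInformation("Received chunk: {Chunk}", new string(buffer, 0, charsRead));
    }

    return Ok();
}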

Now, why was I surprised? Because this is in fact a platform difference - this behavior seems to be unique to the "browser" platform. I did attempt the same in a desktop application earlier and HttpClient would buffer the request - I had to create a simple HttpContent that would wrap the underlying stream and flush on write. I think I like the Blazor WebAssembly behavior better and I'm happy that I didn't have to do the same here.
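
For reference, that desktop workaround could look roughly like this (a sketch with hypothetical names, not the exact code from that experiment): a custom HttpContent that serializes the items one by one and flushes the stream after each of them, so HttpClient sends the data instead of buffering the whole body.

internal class FlushingJsonStreamContent<T> : HttpContent
{
    private readonly IAsyncEnumerable<T> _values;

    public FlushingJsonStreamContent(IAsyncEnumerable<T> values)
    {
        _values = values;

        Headers.ContentType = new MediaTypeHeaderValue("application/json");
    }

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext? context)
    {
        // Write the payload as a JSON array, item by item, flushing after every item.
        stream.WriteByte((byte)'[');

        bool first = true;
        await foreach (T value in _values)
        {
            if (!first)
            {
                stream.WriteByte((byte)',');
            }
            first = false;

            await JsonSerializer.SerializeAsync(stream, value);
            await stream.FlushAsync();
        }

        stream.WriteByte((byte)']');
        await stream.FlushAsync();
    }

    protected override bool TryComputeLength(out long length)
    {
        length = -1;
        return false;
    }
}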

Ok, but if I can't handle the incoming stream nicely in ASP.NET Core, it's not really that useful. So I've tried to achieve that as well.

Receiving Async Streamed JSON in ASP.NET Core

Receiving async streamed JSON is a little bit more tricky. JsonSerializer has a dedicated method (DeserializeAsyncEnumerable) to deserialize an incoming JSON stream into an IAsyncEnumerable instance. The built-in SystemTextJsonInputFormatter doesn't use that method; it simply uses JsonSerializer.DeserializeAsync with the request body as an input. That shouldn't be a surprise, as it is the standard way to deserialize incoming JSON. To support async streamed JSON, a custom formatter is required. What is more, to be sure that this custom formatter will be used, it can't just be added - it has to replace the built-in one. But that would mean that the custom formatter has to have all the functionality of the built-in one (if we want to support not only async streamed JSON). Certainly not something I wanted to reimplement, so I've decided to simply inherit from SystemTextJsonInputFormatter.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        return base.ReadRequestBodyAsync(context);
    }
}

Now, for the actual functionality, the logic has to branch based on the type of the model, as we want to use DeserializeAsyncEnumerable only if the model type is IAsyncEnumerable<T>.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        if (context.ModelType.IsGenericType && context.ModelType.GetGenericTypeDefinition() == typeof(IAsyncEnumerable<>))
        {
            ...
        }

        return base.ReadRequestBodyAsync(context);
    }
}

That's not the end of challenges (and reflection). JsonSerializer.DeserializeAsyncEnumerable is a generic method. That means we have to get the actual type of the values, construct the right generic method based on it, and call it.

internal class SystemTextStreamedJsonInputFormatter : SystemTextJsonInputFormatter
{
    ...

    public override Task<InputFormatterResult> ReadRequestBodyAsync(InputFormatterContext context)
    {
        if (context.ModelType.IsGenericType && context.ModelType.GetGenericTypeDefinition() == typeof(IAsyncEnumerable<>))
        {
            MethodInfo deserializeAsyncEnumerableMethod = typeof(JsonSerializer)
                .GetMethod(
                    nameof(JsonSerializer.DeserializeAsyncEnumerable),
                    [typeof(Stream), typeof(JsonSerializerOptions), typeof(CancellationToken)])
                .MakeGenericMethod(context.ModelType.GetGenericArguments()[0]);

            return Task.FromResult(InputFormatterResult.Success(
                deserializeAsyncEnumerableMethod.Invoke(null, [
                    context.HttpContext.Request.Body,
                    SerializerOptions,
                    context.HttpContext.RequestAborted
                ])
            ));
        }

        return base.ReadRequestBodyAsync(context);
    }
}

The implementation above is intentionally ugly, so it doesn't hide anything and can be understood more easily. You can find a cleaner one, which also performs some constructed methods caching, in the demo repository.

Now that we have the formatter, we can create a setup for MvcOptions that will replace the SystemTextJsonInputFormatter with a custom implementation.

internal class SystemTextStreamedJsonMvcOptionsSetup : IConfigureOptions<MvcOptions>
{
    private readonly IOptions<JsonOptions>? _jsonOptions;
    private readonly ILogger<SystemTextJsonInputFormatter> _inputFormatterLogger;

    public SystemTextStreamedJsonMvcOptionsSetup(
        IOptions<JsonOptions>? jsonOptions,
        ILogger<SystemTextJsonInputFormatter> inputFormatterLogger)
    {
        _jsonOptions = jsonOptions;
        _inputFormatterLogger = inputFormatterLogger;
    }

    public void Configure(MvcOptions options)
    {
        options.InputFormatters.RemoveType<SystemTextJsonInputFormatter>();
        options.InputFormatters.Add(
            new SystemTextStreamedJsonInputFormatter(_jsonOptions?.Value, _inputFormatterLogger)
        );
    }
}

A small convenience method to register that setup.

internal static class SystemTextStreamedJsonMvcBuilderExtensions
{
    public static IMvcBuilder AddStreamedJson(this IMvcBuilder builder)
    {
        ...

        builder.Services.AddSingleton
            <IConfigureOptions<MvcOptions>, SystemTextStreamedJsonMvcOptionsSetup>();

        return builder;
    }
}

And we can configure the application to use it.

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers()
            ...
            .AddStreamedJson();

        ...
    }

    ...
}

It gets the job done. As the formatter for NDJSON expects a different content type (application/x-ndjson), content negotiation also works and I could asynchronously stream both NDJSON and JSON to the same endpoint. Honestly, I think it's cool.
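
For completeness, the action both formatters bind to can be as simple as taking an IAsyncEnumerable<WeatherForecast> parameter. A sketch (assuming the usual WeatherForecast model and an injected _logger) could look like this.

[HttpPost("stream")]
public async Task<IActionResult> PostStream(IAsyncEnumerable<WeatherForecast> weatherForecasts)
{
    await foreach (WeatherForecast weatherForecast in weatherForecasts)
    {
        // Each item becomes available as soon as it has been deserialized from the incoming stream.
        _logger.LogInformation("Received weather forecast for {Date}.", weatherForecast.Date);
    }

    return Ok();
}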

Afterthoughts

One question that arises at this point is "Which one to use?". I prefer NDJSON here. Why? Both require custom formatters and both formatters must use reflection. But in the case of NDJSON, that is isolated only to requests coming with a dedicated content type. To support async streamed JSON, the initial model type check has to happen on every request with a JSON-related content type. Also, I don't feel great about replacing the built-in formatter like that. On top of that, NDJSON enables clean cancellation, as the endpoint is receiving separate JSON objects. In the case of JSON, what arrives is a single JSON collection of objects that will not be terminated properly, which will trip the deserialization.

But this is just my opinion. You can grab the demo and play with it, or use one of my NDJSON packages and experiment with it yourself. That's the best way to choose the right approach for your scenario.

I've been exploring the subject of Azure Functions extensibility on this blog for quite some time. I've touched on subjects directly related to creating extensions and their anatomy, but also some peripheral ones.

I have always written from the perspective of the in-process model, but since 2020 there has been a continued evolution when it comes to the preferred model for .NET-based function apps. The isolated worker model, first introduced with .NET 5, has been gaining parity and becoming the leading vehicle for making new .NET versions available with Azure Functions. In August 2023 Microsoft announced the intention for .NET 8 to be the last LTS release to receive in-process model support. So the question arises: does it invalidate all that knowledge about Azure Functions extensibility? The short answer is no. But before I go into details, I need to cover the common ground.

Isolated Worker Model in a Nutshell

.NET was always a little bit special in Azure Functions. It shouldn't be a surprise. After all, it's Microsoft technology and there was a desire for the integration to be efficient and powerful. So even when Azure Functions v2 brought the separation between the host process and the language worker process, .NET-based function apps were running in the host process. This had performance benefits (no communication between processes) and allowed .NET function apps to leverage the full capabilities of the host, but it started to become a bottleneck when the pace of changes in the .NET ecosystem accelerated. There were more and more conflicts between the assemblies that developers wanted to use in their apps and the ones used by the host. There was a delay in making new .NET versions available because the host had to be updated. Also, there were things that the app couldn't do because it was coupled with the host. Limitations like those were the reasons for bringing Azure Functions .NET Worker to life.

At the same time, Microsoft didn't want to take away all the benefits that .NET developers had when working with Azure Functions. The design had to take performance and developers' experience into account. So how does Azure Functions .NET Worker work? In simplification, it's an ASP.NET Core application that receives inputs and provides outputs to the host over gRPC (which is more performant than the HTTP primitives used in the case of custom handlers).

Azure Functions .NET Worker overview

The request and response payloads are also pretty well hidden. Developers have been given a new binding model with the required attributes available through *.Azure.Functions.Worker.Extensions.* packages. But if the actual binding activity happens in the host, what do those new packages provide? And what is their relation to the *.Azure.WebJobs.Extensions.* packages?

Worker Extensions and WebJobs Extensions

The well-hidden truth is that the worker extension packages are just a bridge to the in-process extension packages. It means that if you want to create a new extension or understand how an existing one works, you should start with an extension for the in-process model. The worker extensions are mapped to the in-process ones through an assembly-level attribute, which takes the name of the package and version to be used as parameters.

[assembly: ExtensionInformation("RethinkDb.Azure.WebJobs.Extensions", "0.6.0")]

The integration is quite seamless. During the build, the Azure Functions tooling will use NuGet to install the needed in-process extension package; it doesn't have to be referenced. Of course, that has its drawbacks (tight coupling to a specific version and more challenges during debugging). So, the final layout of the packages can be represented as below.

Azure Functions .NET Worker Extensions and WebJobs Extensions relation overview

What ensures the cooperation between those two packages running in two different processes are the binding attributes.

Binding Attributes

In the case of the in-process model extensions, we have two types of attributes - one for bindings and one for triggers. In the case of the isolated worker model, there are three - one for input bindings, one for output bindings, and one for triggers.

public class RethinkDbInputAttribute : InputBindingAttribute
{
    ...
}
public sealed class RethinkDbOutputAttribute : OutputBindingAttribute
{
    ...
}
public sealed class RethinkDbTriggerAttribute : TriggerBindingAttribute
{
    ...
}

The isolated worker model attributes are used in two ways. One is for developers, who use them to decorate their functions and provide the needed settings. The other is for the worker, which uses them as data transfer objects. They are serialized and transferred as metadata. On the host side, they are deserialized into the corresponding in-process model extension attributes. The input and output attributes will be deserialized into the binding attribute, and the trigger attribute into the trigger attribute. This means that we need to ensure that the names of the properties we want to support match.
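
To illustrate, a sketch of such a pair could look like this (the property names here are hypothetical - the point is that they have to be spelled the same on both sides).

// Isolated worker model package
public class RethinkDbInputAttribute : InputBindingAttribute
{
    public string DatabaseName { get; set; }

    public string TableName { get; set; }
}

// In-process model (WebJobs) package
public sealed class RethinkDbAttribute : Attribute
{
    public string DatabaseName { get; set; }

    public string TableName { get; set; }
}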

Implementing the attributes and decorating the functions with them is all we need to make it work. This will give us support for POCOs as values (the host and worker will take care of serialization, transfer over gRPC, and deserialization). But what if we want something more than POCO?

Beyond POCO Inputs With Converters

It's quite common for in-process extensions to support binding data provided using types from specific service SDKs (for example CosmosClient in the case of Azure Cosmos DB). That kind of binding data is not supported out-of-the-box by isolated worker extensions, as such values can't be serialized and transferred. But there is a way for isolated worker extensions to go beyond POCOs - input converters.

Input converters are classes that implement the IInputConverter interface. This interface defines a single method, which is supposed to return a conversion result. The conversion result can be one of the following:

  • Unhandled (the converter did not act on the input)
  • Succeeded (conversion was successful and the result is included)
  • Failed (conversion was attempted but failed; the exception is included)

The converter should check if it's being used with an extension it supports (the name that has been used for the isolated extension registration will be provided as part of the model binding data) and if the incoming content is in a supported format. The converter can also be decorated with multiple SupportedTargetType attributes to narrow its scope.

Below is a sample template for an input converter.

[SupportsDeferredBinding]
[SupportedTargetType(typeof(...))]
[SupportedTargetType(typeof(...))]
internal class RethinkDbConverter : IInputConverter
{
    private const string RETHINKDB_EXTENSION_NAME = "RethinkDB";
    private const string JSON_CONTENT_TYPE = "application/json";

    ...

    public RethinkDbConverter(...)
    {
        ...
    }

    public async ValueTask<ConversionResult> ConvertAsync(ConverterContext context)
    {
        ModelBindingData modelBindingData = context?.Source as ModelBindingData;

        if (modelBindingData is null)
        {
            return ConversionResult.Unhandled();
        }

        try
        {
            if (modelBindingData.Source is not RETHINKDB_EXTENSION_NAME)
            {
                throw new InvalidOperationException($"Unexpected binding source.");
            }

            if (modelBindingData.ContentType is not JSON_CONTENT_TYPE)
            {
                throw new InvalidOperationException($"Unexpected content-type.");
            }

            object result = context.TargetType switch
            {
                // Here you can use modelBindingData.Content,
                // any injected services, etc.
                // to prepare the value.
                ...
            };


            return ConversionResult.Success(result);
        }
        catch (Exception ex)
        {
            return ConversionResult.Failed(ex);
        }
    }
}

Input converters can be applied to input and trigger binding attributes by simply decorating them with an attribute (we should also define the fallback behavior policy).

[InputConverter(typeof(RethinkDbConverter))]
[ConverterFallbackBehavior(ConverterFallbackBehavior.Default)]
public class RethinkDbInputAttribute : InputBindingAttribute
{
    ...
}

Adding input converters to the extensions moves some of the logic from the host to the worker. It may mean that the worker will now be establishing connections to the services or performing other operations. This will most likely create a need to register some dependencies, read configuration, and so on. Such things are best done at the function app startup.

Participating in the Function App Startup

Extensions for the isolated worker model can implement a startup hook. It can be done by creating a public class with a parameterless constructor that derives from WorkerExtensionStartup. This class also has to be registered through an assembly-level attribute. Now we can override the Configure method and register services and middlewares. The mechanics are quite similar to the equivalent for in-process extensions.

[assembly: WorkerExtensionStartup(typeof(RethinkDbExtensionStartup))]

namespace Microsoft.Azure.Functions.Worker
{
    public class RethinkDbExtensionStartup : WorkerExtensionStartup
    {
        public override void Configure(IFunctionsWorkerApplicationBuilder applicationBuilder)
        {
            if (applicationBuilder == null)
            {
                throw new ArgumentNullException(nameof(applicationBuilder));
            }

            ...
        }
    }
}

The Conclusion

The isolated worker model doesn't invalidate what we know about Azure Functions extensions; on the contrary, it builds another layer on top of that knowledge. Sadly, there are limitations in the supported data types for bindings, which come from the serialization and transfer of data between the host and the worker. Still, in my opinion, the benefits of the new model can outweigh those limitations.

If you are looking for a working sample to learn and explore, you can find one here.

In the past, I've written posts that explored the core aspects of creating Azure Functions extensions.

In this post, I want to dive into how extensions implement support for the Azure Functions scaling model.

The Azure Functions scaling model (as one can expect) has evolved over the years. The first model is incremental scaling handled by the scale controller (a dedicated Azure Functions component). The scale controller monitors the rate of incoming events and makes magical (following the rule that any sufficiently advanced technology is indistinguishable from magic) decisions about whether to add or remove a worker. The worker count can change only by one and at a specific rate. This mechanism has some limitations.

The first problem is the inability to scale in network-isolated deployments. In such a scenario, the scale controller doesn't have access to the services and can't monitor events - only the function app (and as a consequence the extensions) can. As a result, toward the end of 2019, the SDK gained support for extensions to vote in scale-in and scale-out decisions.

The second problem is the clarity and rate of the scaling process. Changing by one worker at a time, driven by complex heuristics, is not perfect. This led to the introduction of target-based scaling at the beginning of 2023. Extensions became able to implement their own algorithms for requesting a specific number of workers, and scaling up can happen by up to four instances at a time.

So, how do those two mechanisms work and how can they be implemented? To answer this question, I'm going to describe them from two perspectives. One will be how it is done by a well-established extension - the Azure Cosmos DB bindings for Azure Functions. The other will be a sample implementation in my own extension that I've used previously while writing on Azure Functions extensibility - the RethinkDB bindings for Azure Functions.

But first things first. Before we can talk about the actual logic driving the scaling behaviors, we need the right SDK and the right classes in place.

Runtime and Target-Based Scaling Support in Azure WebJobs SDK

As I've mentioned above, support for extensions to participate in the scaling model has been introduced gradually to the Azure WebJobs SDK. At the same time, the team makes an effort to introduce changes in a backward-compatible manner. Support for voting in scaling decisions came in version 3.0.14 and for target-based scaling in 3.0.36, but you can have an extension that is using version 3.0.0 and it will work on the latest Azure Functions runtime. You can also have an extension that uses the latest SDK and doesn't implement the scalability mechanisms. You need to know that they are there and choose to utilize them.

This is reflected in the official extensions as well. The Azure Cosmos DB extension has implemented support for runtime scaling in version 3.0.5 and target-based scaling in 4.1.0 (you can find some tables that gather the version numbers, for example in the section about virtual network triggers). This means that if your function app is using lower versions of extensions, it won't benefit from these capabilities.

So, regardless of whether you're implementing your own extension or just using an existing one, the versions of the dependencies matter. However, if you're implementing your own extension, you will need some classes.

Classes Needed to Implement Scaling Support

The Azure Functions scaling model revolves around triggers. After all, it's their responsibility to handle the influx of events. As you may know (for example from my previous post 😉), the heart of a trigger is a listener. This is where all the heavy lifting is being done and this is where we can inform the runtime that our trigger supports scalability features. We can do so by implementing two interfaces: IScaleMonitorProvider and ITargetScalerProvider. They are respectively tied to runtime scaling and target-based scaling. If they are implemented, the runtime will use the methods they define to obtain the actual logic implementations.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly IScaleMonitor<RethinkDbTriggerMetrics> _rethinkDbScaleMonitor;
    private readonly ITargetScaler _rethinkDbTargetScaler;

    ...

    public IScaleMonitor GetMonitor()
    {
        return _rethinkDbScaleMonitor;
    }

    public ITargetScaler GetTargetScaler()
    {
        return _rethinkDbTargetScaler;
    }
}

From the snippet above you can notice one more class - RethinkDbTriggerMetrics. It derives from the ScaleMetrics class and is used for capturing the values on which the decisions are being made.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    ...
}

The IScaleMonitor implementation contributes to the runtime scaling part of the scaling model. Its responsibilities are to provide snapshots of the metrics and to vote in the scaling decisions.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    public ScaleMonitorDescriptor Descriptor => ...;

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        ...
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        ...
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        ...
    }
}

The ITargetScaler implementation is responsible for the target-based scaling. It needs to focus only on one thing - calculating the desired worker count.

internal class RethinkDbTargetScaler : ITargetScaler
{
    public TargetScalerDescriptor TargetScalerDescriptor => ...;

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        ...
    }
}

As both of these implementations are tightly tied to a specific trigger, it is typical to instantiate them in the listener constructor.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(...);
        _rethinkDbTargetScaler = new RethinkDbTargetScaler(...);
    }

    ...
}

We are almost ready to start discussing the details of scaling logic, but before we do that, we need to talk about gathering metrics. The decisions need to be based on something.

Gathering Metrics

Yes, the decisions need to be based on something, and the quality of the decisions will depend on the quality of the metrics you can gather. So a lot depends on how well the service for which the extension is being created is prepared to be monitored.

Azure Cosmos DB is well prepared. The Azure Cosmos DB bindings for Azure Functions implement the trigger on top of the Azure Cosmos DB change feed processor and use the change feed estimator to gather the metrics. This gives access to quite an accurate estimate of the remaining work. As an additional metric, the extension gathers the number of leases for the container that the trigger is observing.

RethinkDB is not so well prepared. It seems like its change feed provides only one metric - the buffered items count.

internal class RethinkDbTriggerMetrics : ScaleMetrics
{
    public int BufferedItemsCount { get; set; }
}

Also, the metric can only be gathered while iterating the change feed. This forces an intermediary between the listener and the scale monitor (well, you could use the scale monitor directly, but that seemed ugly to me).

internal class RethinkDbMetricsProvider
{
    public int CurrentBufferedItemsCount { get; set; }

    public RethinkDbTriggerMetrics GetMetrics()
    {
        return new RethinkDbTriggerMetrics { BufferedItemsCount = CurrentBufferedItemsCount };
    }
}

Now the same instance of this intermediary can be used by the listener to provide the value.

internal class RethinkDbTriggerListener : IListener, IScaleMonitorProvider, ITargetScalerProvider
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbTriggerListener(...)
    {
        ...

        _rethinkDbMetricsProvider = new RethinkDbMetricsProvider();
        _rethinkDbScaleMonitor = new RethinkDbScaleMonitor(..., _rethinkDbMetricsProvider);

        ...
    }

    ...

    private async Task ListenAsync(CancellationToken listenerStoppingToken)
    {
        ...

        while (!listenerStoppingToken.IsCancellationRequested
               && (await changefeed.MoveNextAsync(listenerStoppingToken)))
        {
            _rethinkDbMetricsProvider.CurrentBufferedItemsCount = changefeed.BufferedItems.Count;
            ...
        }

        ...
    }
}

And the scale monitor to read it.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    private readonly RethinkDbMetricsProvider _rethinkDbMetricsProvider;

    ...

    public RethinkDbScaleMonitor(..., RethinkDbMetricsProvider rethinkDbMetricsProvider)
    {
        ...

        _rethinkDbMetricsProvider = rethinkDbMetricsProvider;

        ...
    }

    public Task<RethinkDbTriggerMetrics> GetMetricsAsync()
    {
        return Task.FromResult(_rethinkDbMetricsProvider.GetMetrics());
    }

    Task<ScaleMetrics> IScaleMonitor.GetMetricsAsync()
    {
        return Task.FromResult((ScaleMetrics)_rethinkDbMetricsProvider.GetMetrics());
    }

    ...
}

We have the metrics now; it is finally time to make some decisions.

Voting to Scale In or Scale Out

An extension can cast one of three votes: to scale out, to scale in, or neutral. This is where the intimate knowledge of the service that the extension is created for comes into play.

As I've already written, the Azure Cosmos DB change feed processor is well prepared when it comes to gathering metrics. It is also well prepared to be consumed in a scalable manner. It can distribute the work among multiple compute instances by balancing the number of leases owned by each compute instance. It will also adjust the number of leases based on throughput and storage. This is why the Azure Cosmos DB bindings track the number of leases as one of the metrics - it's an upper limit for the number of workers. So the extension utilizes all this knowledge and employs the following algorithm to cast a vote:

  1. If there are no metrics gathered yet, cast a neutral vote.
  2. If the current number of leases is greater than the number of workers, cast a scale-in vote.
  3. If there are fewer than five metric samples, cast a neutral vote.
  4. If the ratio of workers to remaining items is less than 1 per 1000, cast a scale-out vote.
  5. If there are constantly items waiting to be processed and the number of workers is smaller than the number of leases, cast a scale-out vote.
  6. If the trigger source has been empty for a while, cast a scale-in vote.
  7. If there has been a continuous increase across the last five samples in items to be processed, cast a scale-out vote.
  8. If there has been a continuous decrease across the last five samples in items to be processed, cast a scale-in vote.
  9. If none of the above has happened, cast a neutral vote.

RethinkDB is once again lacking here. It looks like its change feed is not meant to be processed in parallel at all. This leads to an interesting edge case, where we never want to scale beyond a single instance.

internal class RethinkDbScaleMonitor : IScaleMonitor<RethinkDbTriggerMetrics>
{
    ...

    public ScaleStatus GetScaleStatus(ScaleStatusContext<RethinkDbTriggerMetrics> context)
    {
        return GetScaleStatus(
            context.WorkerCount,
            context.Metrics?.ToArray()
        );
    }

    public ScaleStatus GetScaleStatus(ScaleStatusContext context)
    {
        return GetScaleStatus(
            context.WorkerCount, 
            context.Metrics?.Cast<RethinkDbTriggerMetrics>().ToArray()
        );
    }

    private ScaleStatus GetScaleStatus(int workerCount, RethinkDbTriggerMetrics[] metrics)
    {
        ScaleStatus status = new ScaleStatus
        {
            Vote = ScaleVote.None
        };

        // RethinkDB change feed is not meant to be processed in parallel.
        if (workerCount > 1)
        {
            status.Vote = ScaleVote.ScaleIn;

            return status;
        }

        return status;
    }
}

Requesting Desired Number of Workers

Being able to tell whether you want more workers or fewer is great; being able to tell how many workers you want is even better. Of course, there is no promise the extension will get the requested number (even in the case of target-based scaling, the scaling happens at a maximum rate of four instances at a time), but it's better than increasing and decreasing by one instance. Extensions can also use this mechanism to participate in dynamic concurrency.

This is exactly what the Azure Cosmos DB extension is doing. It divides the number of remaining items by the value of the MaxItemsPerInvocation trigger setting (the default is 100). The result is capped by the number of leases and that's the desired number of workers.
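
Expressed in code, the described calculation could look more or less like this (a sketch with illustrative names, not the actual extension source).

private static int GetTargetWorkerCount(long remainingItemCount, int maxItemsPerInvocation, int leaseCount)
{
    // Desired workers: remaining items divided by the number of items a single invocation will process...
    int targetWorkerCount = (int)Math.Ceiling(remainingItemCount / (double)maxItemsPerInvocation);

    // ...capped by the number of leases, as that's the maximum degree of parallelism for the change feed.
    return Math.Min(targetWorkerCount, leaseCount);
}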

We already know that in the case of RethinkDB, it's even simpler - we always want just one worker.

internal class RethinkDbTargetScaler : ITargetScaler
{
    ...

    public Task<TargetScalerResult> GetScaleResultAsync(TargetScalerContext context)
    {
        return Task.FromResult(new TargetScalerResult
        {
            TargetWorkerCount = 1
        });
    }
}

That's it. The scaling support implementation is now complete.

Consequences of Runtime Scaling Design

Creating extensions for Azure Functions is not something commonly done. Still, there is value in understanding their anatomy and how they work, as it is often relevant also when you are creating function apps.

The unit of scale for Azure Functions is the entire function app. At the same time, the scaling votes and the desired number of workers are delivered at a trigger level. This means that you need to be careful when creating function apps with multiple triggers. If the triggers aim for completely different scaling decisions, it may lead to undesired scenarios. For example, one trigger may constantly want to scale in, while another will want to scale out. As you could notice throughout this post, some triggers have hard limits on how far they can scale so as not to waste resources (even two Azure Cosmos DB triggers may have different upper limits because the lease containers they are attached to will have different numbers of leases available). This all should be taken into account while designing the function app and trying to foresee how it will scale.

A year ago I wrote about running a .NET-based Spin application on a WASI node pool in AKS. Since then, the support for WebAssembly in AKS hasn't changed. We still have the same preview, supporting the same versions of two containerd shims: Spin and Slight (a.k.a. SpiderLightning). So why am I coming back to the subject? The broader context has evolved.

With WASI preview 2, the ecosystem is embracing the component model and standardized APIs. When I was experimenting with Spin, I leveraged WAGI (WebAssembly Gateway Interface) which allowed me to be ignorant about the Wasm runtime context. Now I want to change that, cross the barrier and dig into direct interoperability between .NET and Wasm.

Also, regarding the mentioned APIs, one of the emerging ones is wasi-cloud-core which aims to provide a generic way for WASI applications to interact with services. This proposal is not yet standardized but it has an experimental host implementation which happens to be Slight. By running a .NET based Slight application I want to get a taste of what that API might bring.

Last but not least, .NET 8 has brought a new way of building .NET based Wasm applications with a wasi-experimental workload. I want to build something with it and see where the .NET support for WASI is heading.

So, this "experiment" has multiple angles and brings together a bunch of different things. How am I going to start? In the usual way, by creating a project.

Creating a .NET 8 wasi-wasm Project

The wasi-experimental workload is optional, so we need to install it before we can create a project.

dotnet workload install wasi-experimental

It also doesn't bundle the WASI SDK (the .NET 7 workload did), so we have to install it ourselves. The releases of the WASI SDK available on GitHub contain binaries for different platforms. All you need to do is download the one appropriate for your platform, extract it, and create the WASI_SDK_PATH environment variable pointing to the extracted folder. The version I'll be using here is 20.0.

With the prerequisites in place, we can create the project.

dotnet new wasiconsole -o Demo.Wasm.Slight

Now, if you run dotnet build and inspect the output folder, you will notice that it contains Demo.Wasm.Slight.dll, other managed DLLs, and dotnet.wasm. This is the default output, where dotnet.wasm is responsible for loading the Mono runtime and then the functionality from the DLLs. This is not what we want. We want a single file. To achieve that, we need to modify the project file by adding the WasmSingleFileBundle property (in my opinion, this should be the default).

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    ...
    <WasmSingleFileBundle>true</WasmSingleFileBundle>
  </PropertyGroup>
</Project>

If you run dotnet build after this modification, you will find Demo.Wasm.Slight.wasm in the output. Exactly what we want.

But, before we can start implementing the actual application, we need to work on the glue between the .NET and WebAssembly so we can interact with the APIs provided by Slight from C#.

From WebAssembly IDL To C#

The imports and exports in WASI APIs are described in Wasm Interface Type (WIT) format. This format is an IDL which is a foundation behind tooling for the WebAssembly Component Model.

WIT aims at being a developer-friendly format, but writing bindings by hand is not something that developers expect. This is where wit-bindgen comes into the picture. It's a (still young) binding generator for languages that are compiled into WebAssembly. It currently supports languages like Rust, C/C++, or Java. The C# support is being actively worked on, and we can expect that at some point getting C# bindings will be as easy as running a single command (or even simpler, as Steve Sanderson is already experimenting with making it part of the toolchain), but for now it's too limited and we will have to approach things differently. What we can use is the C/C++ support.

There is one more challenge on our way. The current version of wit-bindgen is meant for WASI preview 2. Meanwhile, a lot of existing WIT definitions and WASI tooling around native languages are using WASI preview 1. This is exactly the case when it comes to the Slight implementation available in the AKS preview. To handle that, we need an old (and I mean old) version of wit-bindgen. I'm using version 0.2.0. Once you install it, you can generate C/C++ bindings for the desired imports and exports. Slight in version 0.1.0 provides a couple of capabilities, but I've decided to start with just one: the HTTP Server. That means we need imports for http.wit and exports for http-handler.wit.

wit-bindgen c --import ./wit/http.wit --out-dir ./native/
wit-bindgen c --export ./wit/http-handler.wit --out-dir ./native/

Once we have the C/C++ bindings, we can configure the build to include those files in the arguments passed to Clang. For this purpose, we can use the .targets file.

<Project>
  <ItemGroup>
    <_WasmNativeFileForLinking Include="$(MSBuildThisFileDirectory)\..\native\*.c" />
  </ItemGroup>
</Project>

We also need to implement the C# part of the interop. For the most part, it's like any other interop you may have done before. You can use generators or ask for help from a friendly AI assistant and it will get you through the code for types and functions. However, there is one specific catch. The DllImportAttribute requires passing a library name for the P/Invoke generator, but we have no library. The solution is as simple as it is surprising (at least to me): we can provide the name of any library that the P/Invoke generator knows about.

internal static class HttpServer
{
    // Any library name that P/Invoke generator knows
    private const string LIBRARY_NAME = "libSystem.Native";

    [DllImport(LIBRARY_NAME)]
    internal static extern unsafe void http_server_serve(WasiString address,
        uint httpRouterIndex, out WasiExpected<uint> ret0);

    [DllImport(LIBRARY_NAME)]
    internal static extern unsafe void http_server_stop(uint httpServerIndex,
        out WasiExpected<uint> ret0);
}
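
The WasiString and WasiExpected<uint> types used in these signatures are thin struct wrappers over the data layout the host expects (WasiExpected<T> wraps a result-or-error union). They are not shown in this post, but as an illustration, a WasiString boils down to a pointer to UTF-8 bytes plus a length. Below is a rough sketch of the shape I have in mind (an assumption of mine, not the exact code from the repository):

using System;
using System.Runtime.InteropServices;
using System.Text;

// Sketch only (assumed shape): a string handed to the Wasm host as a pointer
// into linear memory plus the byte length of its UTF-8 encoding.
[StructLayout(LayoutKind.Sequential)]
internal readonly struct WasiString
{
    private readonly IntPtr _pointer;
    private readonly int _length;

    private WasiString(IntPtr pointer, int length)
    {
        _pointer = pointer;
        _length = length;
    }

    public static WasiString FromString(string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);

        // Copy the UTF-8 bytes into unmanaged memory so the host can read them
        // through the pointer.
        IntPtr pointer = Marshal.AllocHGlobal(bytes.Length);
        Marshal.Copy(bytes, 0, pointer, bytes.Length);

        return new WasiString(pointer, bytes.Length);
    }

    public override string ToString() => Marshal.PtrToStringUTF8(_pointer, _length);
}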

With all the glue in place, we can start implementing the HTTP server capability.

Implementing the Slight HTTP Server Capability

In order for a Slight application to start accepting HTTP requests, it needs to call the http_server_serve function, providing the address it wants to listen on and the index of a router that defines the supported routes. I've decided to roll out the simplest implementation I could think of to start testing things out - the HttpRouter and HttpServer classes below only allow for calling the serve function (no support for routing).

internal class HttpRouter
{
    private uint _index;

    public uint Index => _index;

    private HttpRouter(uint index)
    {
        _index = index;
    }

    public static HttpRouter Create()
    {
        http_router_new(out WasiExpected<uint> expected);

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        return new HttpRouter(expected.Result.Value);
    }
}

internal static class HttpServer
{
    private static uint? _index;

    public static void Serve(string address)
    {
        if (_index.HasValue)
        {
            throw new Exception("The server is already running!");
        }

        HttpRouter router = HttpRouter.Create();
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        _index = expected.Result;
    }
}

This allowed me to implement a simplistic application, a one-liner.

HttpServer.Serve("0.0.0.0:80");

It worked! Every request resulted in a 404, because there was no routing, but it worked. So, how do we add support for routing?

The http_router_* functions for defining routes expect two strings - one for the route and one for the handler. This suggests that the handler should be an exported symbol that Slight will be able to call. I went through the bindings generated for http-handler.wit and found a function that is exported as handle-http. That function seems to be what we are looking for. It performs transformations to/from the request and response objects and calls an http_handler_handle_http function which is only declared, not defined. So it looks like the http_handler_handle_http implementation is the place for the application logic. To test this theory, I've started by implementing a simple route registration method.

internal class HttpRouter
{
    ...

    private static readonly WasiString REQUEST_HANDLER = WasiString.FromString("handle-http");

    ...

    public HttpRouter RegisterRoute(HttpMethod method, string route)
    {
        WasiExpected<uint> expected;

        switch (method)
        {
            case HttpMethod.GET:
                http_router_get(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.PUT:
                http_router_put(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.POST:
                http_router_post(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            case HttpMethod.DELETE:
                http_router_delete(_index, WasiString.FromString(route), REQUEST_HANDLER,
                    out expected);
                break;
            default:
                throw new NotSupportedException($"Method {method} is not supported.");
        }

        if (expected.IsError)
        {
            throw new Exception(expected.Error?.ErrorWithDescription.ToString());
        }

        return new HttpRouter(expected.Result.Value);
    }

    ...
}

Next, I registered some catch-all routes and implemented the HandleRequest method as the .NET handler. It would still return a 404, but it would be my 404.

internal static class HttpServer
{
    ...

    public static void Serve(string address)
    {
        ...

        HttpRouter router = HttpRouter.Create()
                            .RegisterRoute(HttpMethod.GET, "/")
                            .RegisterRoute(HttpMethod.GET, "/*");
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        ...
    }

    private static unsafe void HandleRequest(ref HttpRequest request,
        out WasiExpected<HttpResponse> result)
    {
        HttpResponse response = new HttpResponse(404);
        response.SetBody($"Handler Not Found ({request.Method} {request.Uri.AbsolutePath})");

        result = new WasiExpected<HttpResponse>(response);
    }

    ...
}

It's time for some C. I went through the Mono WASI C driver and found two functions that looked like the right tools for the job: lookup_dotnet_method and mono_wasm_invoke_method_ref. The implementation didn't seem overly complicated.

#include <string.h>
#include <wasm/driver.h>
#include "http-handler.h"

MonoMethod* handle_request_method;

void mono_wasm_invoke_method_ref(MonoMethod* method, MonoObject** this_arg_in,
                                 void* params[], MonoObject** _out_exc, MonoObject** out_result);

void http_handler_handle_http(http_handler_request_t* req,
                              http_handler_expected_response_error_t* ret0)
{
    if (!handle_request_method)
    {
        handle_request_method = lookup_dotnet_method(
            "Demo.Wasm.Slight",
            "Demo.Wasm.Slight",
            "HttpServer",
            "HandleRequest",
        -1);
    }

    void* method_params[] = { req, ret0 };
    MonoObject* exception;
    MonoObject* result;
    mono_wasm_invoke_method_ref(handle_request_method, NULL, method_params, &exception, &result);
}

But it didn't work. The thrown exception suggested that the Mono runtime wasn't loaded. I went back to studying Mono to learn how it is loaded. What I've learned is that during compilation a _start() function is generated. This function performs the steps necessary to load the Mono runtime and wraps the entry point to the .NET code. I could call it, but this would mean going through the Main method and retriggering HttpServer.Serve, which was doomed to fail. I needed to go a level lower. By reading the code of the _start() function I've learned that it calls the mono_wasm_load_runtime function. Maybe I could call it as well?

...

int mono_runtime_loaded = 0;

...

void http_handler_handle_http(http_handler_request_t* req,
                              http_handler_expected_response_error_t* ret0)
{
    if (!mono_runtime_loaded) {
        mono_wasm_load_runtime("", 0);

        mono_runtime_loaded = 1;
    }

    ...
}

Now it worked. But I wasn't out of the woods yet. What I've just learned meant that to provide dedicated handlers for routes I couldn't rely on registering dedicated methods as part of the Main method flow. I could only register the routes and the handlers needed to be discoverable later, in a new context, with static HandleRequest as the entry point. My thoughts went in the direction of a poor man's attribute-based routing, so I've started with an attribute for decorating handlers.

internal class HttpHandlerAttribute: Attribute
{
    public HttpMethod Method { get; }

    public string Route { get; }

    public HttpHandlerAttribute(HttpMethod method, string route)
    {
        Method = method;
        Route = route;
    }

    ...
}

A poor man's implementation of attribute-based routing must have an ugly part, and here it is reflection. To register the routes (and later match them with handlers), the types must be scanned for methods carrying the attribute. In a production solution, it would be necessary to narrow the scan scope, but as this is just a small demo, I've decided to keep it simple and scan the whole assembly for static methods (public and non-public) decorated with the attribute. The code supports adding multiple attributes to a single method just because it's simpler than putting proper protections in place. As you can probably guess, I did override the Equals and GetHashCode implementations in the attribute to ensure it behaves nicely as a dictionary key (a sketch of what those overrides might look like follows the code below).

internal class HttpRouter
{
    ...

    private static readonly Type HTTP_HANDLER_ATTRIBUTE_TYPE = typeof(HttpHandlerAttribute);

    private static Dictionary<HttpHandlerAttribute, MethodInfo>? _routes;

    ...

    private static void DiscoverRoutes()
    {
        if (_routes is null)
        {
            _routes = new Dictionary<HttpHandlerAttribute, MethodInfo>();

            foreach (Type type in Assembly.GetExecutingAssembly().GetTypes())
            {
                foreach(MethodInfo method in type.GetMethods(BindingFlags.Static |
                                                             BindingFlags.Public |
                                                             BindingFlags.NonPublic))
                {
                    foreach (object attribute in method.GetCustomAttributes(
                             HTTP_HANDLER_ATTRIBUTE_TYPE, false))
                    {
                        _routes.Add((HttpHandlerAttribute)attribute, method);
                    }
                }
            }
        }
    }

    ...
}
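
Since the snippets elide them, here is a rough sketch of what the Equals and GetHashCode overrides on the attribute might look like (my assumption; any implementation that compares Method and Route will do the job):

internal class HttpHandlerAttribute : Attribute
{
    ...

    // Two attributes are equal when they describe the same method and route,
    // which is what makes the attribute work as a dictionary key for route lookup.
    public override bool Equals(object? obj) =>
        obj is HttpHandlerAttribute other
        && Method == other.Method
        && Route == other.Route;

    public override int GetHashCode() => HashCode.Combine(Method, Route);
}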

With the reflection stuff (mostly) out of the way, I could implement a method that can be called to register all discovered routes and a method to invoke a handler for a route. This implementation is not "safe". I don't do any checks on the reflected MethodInfo to ensure that the method has a proper signature. After all, I can only hurt myself here.

internal class HttpRouter
{
    ...

    internal HttpRouter RegisterRoutes()
    {
        DiscoverRoutes();

        HttpRouter router = this;
        foreach (KeyValuePair<HttpHandlerAttribute, MethodInfo> route in _routes)
        {
            router = router.RegisterRoute(route.Key.Method, route.Key.Route);
        }

        return router;
    }

    internal static HttpResponse? InvokeRouteHandler(HttpRequest request)
    {
        DiscoverRoutes();

        HttpHandlerAttribute attribute = new HttpHandlerAttribute(request.Method,
                                                                  request.Uri.AbsolutePath);
        MethodInfo handler = _routes.GetValueOrDefault(attribute);

        return (handler is null) ? null : (HttpResponse)handler.Invoke(null,
                                                                     new object[] { request });
    }

    ...
}

All that remained were some small modifications to the HttpServer to use the new methods.

internal static class HttpServer
{
    ...

    public static void Serve(string address)
    {
        ...

        HttpRouter router = HttpRouter.Create()
                            .RegisterRoutes();
        http_server_serve(
            WasiString.FromString(address),
            router.Index,
            out WasiExpected<uint> expected
        );

        ...
    }

    private static unsafe void HandleRequest(ref HttpRequest request,
        out WasiExpected<HttpResponse> result)
    {
        HttpResponse? response = HttpRouter.InvokeRouteHandler(request);

        if (!response.HasValue)
        {
            // Calling SetBody on response.Value would operate on a copy of the struct,
            // so build the fallback response in a local variable first.
            HttpResponse notFoundResponse = new HttpResponse(404);
            notFoundResponse.SetBody(
                $"Handler Not Found ({request.Method} {request.Uri.AbsolutePath})"
            );

            response = notFoundResponse;
        }

        result = new WasiExpected<HttpResponse>(response.Value);
    }

    ...
}

To test this out, I've created two simple handlers.

[HttpHandler(HttpMethod.GET, "/hello")]
internal static HttpResponse HandleHello(HttpRequest request)
{
    HttpResponse response = new HttpResponse(200);
    response.SetHeaders(new[] { KeyValuePair.Create("Content-Type", "text/plain") });
    response.SetBody($"Hello from Demo.Wasm.Slight!");

    return response;
}

[HttpHandler(HttpMethod.GET, "/goodbye")]
internal static HttpResponse HandleGoodbye(HttpRequest request)
{
    HttpResponse response = new HttpResponse(200);
    response.SetHeaders(new[] { KeyValuePair.Create("Content-Type", "text/plain") });
    response.SetBody($"Goodbye from Demo.Wasm.Slight!");

    return response;
}

That's it. Certainly not complete, certainly not optimal, most likely buggy, and potentially leaking memory. But it works and can be deployed to the cloud.

Running a Slight Application in WASM/WASI Node Pool

To deploy our Slight application to the cloud, we need an AKS Cluster with a WASM/WASI node pool. The process of setting it up hasn't changed since my previous post and you can find all the necessary steps there. Here we can start with dockerizing our application.

As we are dockerizing a Wasm application, the final image should be built FROM scratch. In the case of a Slight application, it should contain two elements: app.wasm and slightfile.toml. The slightfile.toml is a configuration file and its main purpose is to define and provide options for the capabilities needed by the application. In our case, that's just the HTTP Server capability.

specversion = "0.1"

[[capability]]
name = "http"

The app.wasm file is our application. It should have this exact name and be placed at the root of the image. To be able to publish the application, the build stage in our Dockerfile must install the same prerequisites as we did for local development (WASI SDK and wasi-experimental workload).

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build

RUN curl https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz -L --output wasi-sdk-20.0-linux.tar.gz
RUN tar -C /usr/local/lib -xvf wasi-sdk-20.0-linux.tar.gz
ENV WASI_SDK_PATH=/usr/local/lib/wasi-sdk-20.0

RUN dotnet workload install wasi-experimental

WORKDIR /src
COPY . .
RUN dotnet publish --configuration Release

FROM scratch

COPY --from=build /src/bin/Release/net8.0/wasi-wasm/AppBundle/Demo.Wasm.Slight.wasm ./app.wasm
COPY --from=build /src/slightfile.toml .

With the infrastructure in place and the image pushed to the container registry, all that is needed is a deployment manifest for the Kubernetes resources. It is the same as it was for a Spin application, the only difference being the kubernetes.azure.com/wasmtime-slight-v1 node selector.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: "wasmtime-slight-v1"
handler: "slight"
scheduling:
  nodeSelector:
    "kubernetes.azure.com/wasmtime-slight-v1": "true"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slight-with-dotnet-8
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slight-with-dotnet-8
  template:
    metadata:
      labels:
        app: slight-with-dotnet-8
    spec:
      runtimeClassName: wasmtime-slight-v1
      containers:
        - name: slight-with-dotnet-8
          image: crdotnetwasi.azurecr.io/slight-with-dotnet-8:latest
          command: ["/"]
---
apiVersion: v1
kind: Service
metadata:
  name: slight-with-dotnet-8
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  selector:
    app: slight-with-dotnet-8
  type: LoadBalancer

After applying the above manifest, you can get the service IP with kubectl get svc or from the Azure Portal and execute some requests.

WebAssembly, Cloud, .NET, and Future?

Everything I toyed with here is either in preview or experimental, but it's a glimpse into where WebAssembly in the Cloud is heading.

If I dare to make a prediction, I think that when the wasi-cloud-core proposal reaches a mature enough phase, we will see it supported on AKS (it will probably replace the Spin and Slight shims available in the current preview). The support for WASI in .NET will also continue to evolve, and we will see a non-experimental SDK once the specification stabilizes.

For now, we can keep experimenting and exploring. If you want to kick-start your own exploration with what I've described in this post, the source code is here.

Postscript

Yes, I know that this post is already a little bit long, but I wanted to mention one more thing.

When I was wrapping up this post, I read "Extending WebAssembly to the Cloud with .NET" and learned that Steve Sanderson has also built some Slight samples. Despite that, I've decided to publish this post. I had two main reasons for that. First, I dare to think that there is valuable knowledge in this dump of thoughts of mine. Second, those samples take a slightly different direction than mine, and I believe that the fact that you can arrive at different destinations from the same origin is one of the beautiful aspects of software development - something worth sharing.
