Server-Sent Events and ASP.NET Core - You may need keepalives

More than a year ago I wrote about supporting Server-Sent Events in ASP.NET Core. Since then I've been maintaining a middleware that provides such support. I wasn't doing that just as a hobby; I've been using this middleware in several applications. As it happens, all of them were deployed with Kestrel on the edge (internal infrastructure etc.).

Only recently did I deploy the first real-life application with this middleware to Azure. After a few days issues started to appear. Users stopped receiving updates after some (non-deterministic) time. There was also no correlation between users' actions and the SSE connection drops. I had to look for the answer in the App Service logs. I found it: a 502.3 coming from IIS. So this was it, but what was it? I had to do further research to discover the root cause. Long story short, I learned that ANCM (the ASP.NET Core Module, which is used to run Kestrel behind IIS) has a request timeout of 120 seconds, measured from the last activity. In other words, if a long-lived connection doesn't send anything for 2 minutes, ANCM will kill it. I needed to mitigate that.

Keepalives to the rescue

There is a known solution for scenarios where we want to prevent a network connection from being torn down due to inactivity: keepalives. The mechanism is simple: the side which wants to keep the connection open sends a control message at predefined intervals. Many protocols have built-in support for keepalives. TCP has an optional keepalive which can be enabled per connection. WebSockets provide ping and pong frames for easy on-demand keepalives. Server-Sent Events don't provide anything dedicated, but a keepalive can be implemented easily.

Implementing keepalives

Keepalives require a long-running background task, which in the case of ASP.NET Core calls for a BackgroundService or IHostedService implementation. I wanted my middleware to target ASP.NET Core 2.0.0, so I went with IHostedService. One special case I needed to handle was having multiple instances of the service. The ServerSentEventsMiddleware can be registered at multiple endpoints, and in such a scenario it uses different implementations of ServerSentEventsService to isolate those endpoints. This means that every implementation of ServerSentEventsService requires its own instance of IHostedService and a separate set of options. Generics to the rescue.

public class ServerSentEventsServiceOptions<TServerSentEventsService>
    where TServerSentEventsService : ServerSentEventsService
{
}
internal class ServerSentEventsKeepaliveService<TServerSentEventsService> : IHostedService, IDisposable
    where TServerSentEventsService : ServerSentEventsService
{
    ...

    private readonly ServerSentEventsServiceOptions<TServerSentEventsService> _options;
    private readonly TServerSentEventsService _serverSentEventsService;

    public ServerSentEventsKeepaliveService(TServerSentEventsService serverSentEventsService,
        IOptions<ServerSentEventsServiceOptions<TServerSentEventsService>> options)
    {
        _options = options?.Value ?? throw new ArgumentNullException(nameof(options));
        _serverSentEventsService = serverSentEventsService;
    }

    ...
}
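
Why the generic parameter is enough to isolate endpoints can be illustrated with a minimal, self-contained sketch. It is not the middleware's code: SketchOptions, NotificationsService, ChatService and OptionsRegistrySketch are hypothetical names, and a plain dictionary stands in for the real options infrastructure. The point is only that each closed generic type is a distinct type, so each service gets its own options instance.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for two ServerSentEventsService implementations.
public class NotificationsService { }
public class ChatService { }

// Hypothetical options class, generic over the service it belongs to.
public class SketchOptions<TService>
{
    public int KeepaliveInterval { get; set; } = 30;
}

// Hypothetical, simplified registry keyed by the closed generic type.
public static class OptionsRegistrySketch
{
    private static readonly Dictionary<Type, object> _store = new Dictionary<Type, object>();

    // Returns the single options instance for TOptions, creating it on first use.
    public static TOptions Get<TOptions>() where TOptions : class, new()
    {
        if (!_store.TryGetValue(typeof(TOptions), out var existing))
        {
            existing = new TOptions();
            _store[typeof(TOptions)] = existing;
        }

        return (TOptions)existing;
    }

    // Applies a configuration delegate to the instance for TOptions only.
    public static void Configure<TOptions>(Action<TOptions> configure) where TOptions : class, new()
    {
        configure(Get<TOptions>());
    }
}
```

Configuring SketchOptions<NotificationsService> in this sketch leaves SketchOptions<ChatService> at its default, because the two closed types map to different dictionary keys.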

The simplest thing which can be sent as a keepalive is a single comment line (for example : KEEPALIVE). This is correct from the protocol's point of view, requires the smallest amount of data to be sent, and will be ignored by the browser.
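
To make the wire format concrete, here is a minimal sketch of how such a comment line could be turned into bytes. It is not the library's ServerSentEventsHelper; SseCommentSketch and BuildCommentLine are hypothetical names, and the sketch assumes UTF-8 encoding and a single line feed as the line terminator.

```csharp
using System.Text;

public static class SseCommentSketch
{
    // In the Server-Sent Events format a line starting with ':' is a comment,
    // which the client parses and discards without raising any event.
    public static byte[] BuildCommentLine(string comment)
    {
        return Encoding.UTF8.GetBytes(": " + comment + "\n");
    }
}
```

Calling SseCommentSketch.BuildCommentLine("KEEPALIVE") yields the 12 bytes of ": KEEPALIVE" followed by a line feed.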

public class ServerSentEventsServiceOptions<TServerSentEventsService>
    where TServerSentEventsService : ServerSentEventsService
{
    // Interval between keepalives, in seconds.
    public int KeepaliveInterval { get; set; } = 30;
}
internal class ServerSentEventsKeepaliveService<TServerSentEventsService> : IHostedService, IDisposable
    where TServerSentEventsService : ServerSentEventsService
{
    ...

    private static readonly ServerSentEventBytes _keepaliveServerSentEventBytes =
        ServerSentEventsHelper.GetCommentBytes("KEEPALIVE");

    ...

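    private async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Keep looping until the host signals shutdown.
        while (!stoppingToken.IsCancellationRequested)
        {
            // Push the keepalive comment through the service.
            await _serverSentEventsService.SendEventAsync(_keepaliveServerSentEventBytes);

            // KeepaliveInterval is in seconds; the delay is cancelled on shutdown.
            await Task.Delay(TimeSpan.FromSeconds(_options.KeepaliveInterval), stoppingToken);
        }
    }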
}

The last thing left to do is tying the ServerSentEventsService registration together with the ServerSentEventsKeepaliveService registration.

public static IServiceCollection AddServerSentEvents<TIServerSentEventsService, TServerSentEventsService>(
    this IServiceCollection services,
    Action<ServerSentEventsServiceOptions<TServerSentEventsService>> configureOptions)
    where TIServerSentEventsService : class, IServerSentEventsService
    where TServerSentEventsService : ServerSentEventsService, TIServerSentEventsService
{
    ...

    services.AddSingleton<TServerSentEventsService>();
    services.AddSingleton<TIServerSentEventsService>(serviceProvider =>
        serviceProvider.GetService<TServerSentEventsService>());

    services.Configure(configureOptions);
    services.AddSingleton<IHostedService, ServerSentEventsKeepaliveService<TServerSentEventsService>>();

    return services;
}

Sending keepalives when behind ANCM

It's nice when libraries have useful defaults. Keepalives are needed when the application runs behind ANCM, but they are not necessary in other scenarios. It would be great if sending them only behind ANCM could be the default behaviour.

public enum ServerSentEventsKeepaliveMode
{
    Always,
    Never,
    BehindAncm
}
public class ServerSentEventsServiceOptions<TServerSentEventsService>
    where TServerSentEventsService : ServerSentEventsService
{
    public ServerSentEventsKeepaliveMode KeepaliveMode { get; set; } = ServerSentEventsKeepaliveMode.BehindAncm;

    ...
}

The only requirement is the ability to detect ANCM. I had a hard time finding out how to do that; luckily David Fowler gave me a hand. This allowed me to add a check to StartAsync.

internal class ServerSentEventsKeepaliveService<TServerSentEventsService> : IHostedService, IDisposable
    where TServerSentEventsService : ServerSentEventsService
{
    ...

    public Task StartAsync(CancellationToken cancellationToken)
    {
        if ((_options.KeepaliveMode == ServerSentEventsKeepaliveMode.Always)
            || ((_options.KeepaliveMode == ServerSentEventsKeepaliveMode.BehindAncm) && IsBehindAncm()))
        {
            _executingTask = ExecuteAsync(_stoppingCts.Token);

            if (_executingTask.IsCompleted)
            {
                return _executingTask;
            }
        }

        return Task.CompletedTask;
    }

    ...

    private static bool IsBehindAncm()
    {
        return !String.IsNullOrEmpty(Environment.GetEnvironmentVariable("ASPNETCORE_PORT"))
            && !String.IsNullOrEmpty(Environment.GetEnvironmentVariable("ASPNETCORE_APPL_PATH"))
            && !String.IsNullOrEmpty(Environment.GetEnvironmentVariable("ASPNETCORE_TOKEN"));
    }
}

This gave me exactly what I wanted and hopefully made my library more useful for others.