WebSocket per-message compression in ASP.NET Core

This is my third post about WebSocket protocol in ASP.NET Core. Previously I've written about subprotocol negotiation and Cross-Site WebSocket Hijacking. The subject I'm focusing on here is the per-message compression which is out of the box supported by Chrome, FireFox and other browsers.

WebSocket Extensions

The WebSocket protocol has a concept of extensions, which can provide new capabilities. An extension can define any additional functionality which is able to work on top of the WebSocket framing layer. The specification reserves three bits of header (RSV1, RSV2 and RSV3) and all opcodes from 3 to 7 and 11 to 15 to be used by extensions (it also allows for using the reserved bits in order to create additional opcodes or even using some of the payload data for that purpose). The extensions (similarly to subprotocol) are being negotiated through dedicated header (Sec-WebSocket-Extensions) as part of the handshake. A client can advertise the supported extensions by putting the list into the header and server can accept one or more in the exactly same way.

There are two extensions I have heard of: A Multiplexing Extension for WebSockets and Compression Extensions for WebSocket. The first one has never gone beyond draft but the second has become a standard and got adopted by several browsers.

WebSocket Per-Message Compression Extensions

The Compression Extensions for WebSocket standard defines two things. First is a framework for adding compression functionality to the WebSocket protocol. The framework is really simple, it states only two things:

  • Per-Message Compression Extension operates only on message data (so compression takes place before spliting into frames and decompression takes place after all frames have been received).
  • Per-Message Compression Extension allocates the RSV1 bit and calls it the Per-Message Compressed bit. The bit is supposed to be set to 1 on first frame of compressed message.

The challenging part is the allocation of the RSV1 bit. It makes it impossible to implement support for per-message compression on top of the WebSocket stack available in ASP.NET Core. Because of that I've decided to roll my own implementation for IHttpWebSocketFeature. It is very similar to one provided by Microsoft.AspNetCore.WebSockets and the underlying WebSocket implementation is based on ManagedWebSocket so closely that needed changes can be described in its context (the key difference is that my implementation is stripped from the client specific logic as it is not needed).

From the public API perspective there must be way to set and get the information that message is compressed. First can be achieved with overload of SendAsync method (or more like extending the current SendAsync with one more parameter and providing overload which doesn't need it).

internal class CompressionWebSocket : WebSocket
{
    public override Task SendAsync(ArraySegment<byte> buffer, WebSocketMessageType messageType,
        bool endOfMessage, CancellationToken cancellationToken)
    {
        return SendAsync(buffer, messageType, false, endOfMessage, cancellationToken);
    }

    public Task SendAsync(ArraySegment<byte> buffer, WebSocketMessageType messageType, bool compressed,
        bool endOfMessage, CancellationToken cancellationToken)
    {
        ...;
    }
}

The information about a received message can be exposed through a delivered WebSocketReceiveResult.

public class CompressionWebSocketReceiveResult : WebSocketReceiveResult
{
    public bool Compressed { get; }

    public CompressionWebSocketReceiveResult(int count, WebSocketMessageType messageType,
        bool compressed, bool endOfMessage)
        : base(count, messageType, endOfMessage)
    {
        Compressed = compressed;
    }

    public CompressionWebSocketReceiveResult(int count, WebSocketMessageType messageType,
        bool endOfMessage, WebSocketCloseStatus? closeStatus, string closeStatusDescription)
        : base(count, messageType, endOfMessage, closeStatus, closeStatusDescription)
    {
        Compressed = false;
    }
}

Next step is adjusting the internals of the WebScoket implementation to properly write and read the RSV1 bit. The writing part is being handled by the WriteHeader method. This method needs to be changed in a way that it sets the RSV1 bit when the messages is compressed and current frame is not a continuation.

private static int WriteHeader(WebSocketMessageOpcode opcode, byte[] sendBuffer,
    ArraySegment<byte> payload, bool compressed, bool endOfMessage)
{
    sendBuffer[0] = (byte)opcode;

    if (compressed && (opcode != WebSocketMessageOpcode.Continuation))
    {
        sendBuffer[0] |= 0x40;
    }

    if (endOfMessage)
    {
        sendBuffer[0] |= 0x80;
    }

    ...
}

After this change all the paths leading to WriteHeader method must be changed to either provide (passed down) value of compressed parameter from SendAsync or false.

The receiving flow has a corresponding method TryParseMessageHeaderFromReceiveBuffer which fills out a MessageHeader struct. A different version of that struct is needed.

[StructLayout(LayoutKind.Auto)]
internal struct CompressionWebSocketMessageHeader
{
    internal WebSocketMessageOpcode Opcode { get; set; }

    internal bool Compressed { get; set; }

    internal bool Fin { get; set; }

    internal long PayloadLength { get; set; }

    internal int Mask { get; set; }
}

The TryParseMessageHeaderFromReceiveBuffer method will require two changes. One will take care of reading the RSV1 bit and the second will change the validation of all RSV bits values (per protocol specification invalid combination of RSV bits must fail the connection).

private bool TryParseMessageHeaderFromReceiveBuffer(out CompressionWebSocketMessageHeader resultHeader)
{
    var header = new CompressionWebSocketMessageHeader();

    header.Opcode = (WebSocketMessageOpcode)(_receiveBuffer[_receiveBufferOffset] & 0xF);
    header.Compressed = (_receiveBuffer[_receiveBufferOffset] & 0x40) != 0;
    header.Fin = (_receiveBuffer[_receiveBufferOffset] & 0x80) != 0;

    bool reservedSet = (_receiveBuffer[_receiveBufferOffset] & 0x70) != 0;
    bool reservedExceptCompressedSet = (_receiveBuffer[_receiveBufferOffset] & 0x30) != 0;

    ...

    bool shouldFail = (!header.Compressed && reservedSet) || reservedExceptCompressedSet;

    ...
}

The last step is to modify InternalReceiveAsync method so it skips UTF-8 validation for compressed messages and properly creates CompressionWebSocketReceiveResult.

private async Task<WebSocketReceiveResult> InternalReceiveAsync(ArraySegment<byte> payloadBuffer,
    CancellationToken cancellationToken)
{
    ...

    try
    {
        while (true)
        {
            ...

            if ((header.Opcode == WebSocketMessageOpcode.Text) && !header.Compressed
                && !TryValidateUtf8(
                    new ArraySegment<byte>(payloadBuffer.Array, payloadBuffer.Offset, bytesToCopy),
                    header.Fin, _utf8TextState))
            {
                await CloseWithReceiveErrorAndThrowAsync(WebSocketCloseStatus.InvalidPayloadData,
                    WebSocketError.Faulted, cancellationToken).ConfigureAwait(false);
            }

            _lastReceiveHeader = header;
            return new CompressionWebSocketReceiveResult(
                bytesToCopy,
                header.Opcode == WebSocketMessageOpcode.Text ?
                    WebSocketMessageType.Text : WebSocketMessageType.Binary,
                header.Compressed,
                bytesToCopy == 0 || (header.Fin && header.PayloadLength == 0));
        }
    }
    catch (Exception ex)
    {
        ...
    }
    finally
    {
        ...
    }
}

With those changes in place the WebSocket implementation has support for per-message compression framework. A support for specific compression extension can be implemented on top of that.

Deflate based PMCE

The second thing which Compression Extensions for WebSocket standard defines is permessage-deflate compression extension. This extension specifies a way of compressesing message payload using the DEFLATE algorithm with help of the byte boundary alignment method. But first it is worth to implement concepts which are shared by all potential compression extensions - receiving and sending the message payload. Methods responsible for handling those operations should be able to properly concatenate or split the message into frames.

public abstract class WebSocketCompressionProviderBase
{
    private readonly int? _sendSegmentSize;

    ...

    protected async Task SendMessageAsync(WebSocket webSocket, byte[] message,
        WebSocketMessageType messageType, bool compressed, CancellationToken cancellationToken)
    {
        if (webSocket.State == WebSocketState.Open)
        {
            if (_sendSegmentSize.HasValue && (_sendSegmentSize.Value < message.Length))
            {
                int messageOffset = 0;
                int messageBytesToSend = message.Length;

                while (messageBytesToSend > 0)
                {
                    int messageSegmentSize = Math.Min(_sendSegmentSize.Value, messageBytesToSend);
                    ArraySegment<byte> messageSegment = new ArraySegment<byte>(message, messageOffset,
                        messageSegmentSize);

                    messageOffset += messageSegmentSize;
                    messageBytesToSend -= messageSegmentSize;

                    await SendAsync(webSocket, messageSegment, messageType, compressed,
                        (messageBytesToSend == 0), cancellationToken);
                }
            }
            else
            {
                ArraySegment<byte> messageSegment = new ArraySegment<byte>(message, 0, message.Length);

                await SendAsync(webSocket, messageSegment, messageType, compressed, true,
                    cancellationToken);
            }
        }
    }

    private Task SendAsync(WebSocket webSocket, ArraySegment<byte> messageSegment,
        WebSocketMessageType messageType, bool compressed, bool endOfMessage,
        CancellationToken cancellationToken)
    {
        if (compressed)
        {
            CompressionWebSocket compressionWebSocket = (webSocket as CompressionWebSocket)
            ?? throw new InvalidOperationException($"Used WebSocket must be CompressionWebSocket.");

            return compressionWebSocket.SendAsync(messageSegment, messageType, true, endOfMessage,
                cancellationToken);
        }
        else
        {
            return webSocket.SendAsync(messageSegment, messageType, endOfMessage, cancellationToken);
        }
    }

    protected async Task<byte[]> ReceiveMessagePayloadAsync(WebSocket webSocket,
        WebSocketReceiveResult webSocketReceiveResult, byte[] receivePayloadBuffer)
    {
        byte[] messagePayload = null;

        if (webSocketReceiveResult.EndOfMessage)
        {
            messagePayload = new byte[webSocketReceiveResult.Count];
            Array.Copy(receivePayloadBuffer, messagePayload, webSocketReceiveResult.Count);
        }
        else
        {
            IEnumerable<byte> webSocketReceivedBytesEnumerable = Enumerable.Empty<byte>();
            webSocketReceivedBytesEnumerable = webSocketReceivedBytesEnumerable
                .Concat(receivePayloadBuffer);

            while (!webSocketReceiveResult.EndOfMessage)
            {
                webSocketReceiveResult = await webSocket.ReceiveAsync(
                    new ArraySegment<byte>(receivePayloadBuffer), CancellationToken.None);
                webSocketReceivedBytesEnumerable = webSocketReceivedBytesEnumerable
                    .Concat(receivePayloadBuffer.Take(webSocketReceiveResult.Count));
            }

            messagePayload = webSocketReceivedBytesEnumerable.ToArray();
        }

        return messagePayload;
    }
}

With this base the permessage-deflate specifics can be implemented. Let's start with the byte boundary alignment method. In practice it boils down to two operations:

  • In case of compression operation the compressed data should end with empty deflate block and last four octets of that block removed.
  • In case of decompression operation last four octets of empty deflate block should be appended to the received payload before decompression.

It looks that in case of compressing with DeflateStream provided by .NET the empty deflate block is always there, so the above can be implemented with two helper methods.

public sealed class WebSocketDeflateCompressionProvider : WebSocketCompressionProviderBase
{
    private static readonly byte[] LAST_FOUR_OCTETS = new byte[] { 0x00, 0x00, 0xFF, 0xFF };
    ...

    private byte[] TrimLastFourOctetsOfEmptyNonCompressedDeflateBlock(byte[] compressedMessagePayload)
    {
        int lastFourOctetsOfEmptyNonCompressedDeflateBlockPosition = 0;
        for (int position = compressedMessagePayload.Length - 1; position >= 4; position--)
        {
            if ((compressedMessagePayload[position - 3] == LAST_FOUR_OCTETS[0])
                && (compressedMessagePayload[position - 2] == LAST_FOUR_OCTETS[1])
                && (compressedMessagePayload[position - 1] == LAST_FOUR_OCTETS[2])
                && (compressedMessagePayload[position] == LAST_FOUR_OCTETS[3]))
            {
                lastFourOctetsOfEmptyNonCompressedDeflateBlockPosition = position - 3;
                break;
            }
        }
        Array.Resize(ref compressedMessagePayload, lastFourOctetsOfEmptyNonCompressedDeflateBlockPosition);

        return compressedMessagePayload;
    }

    private byte[] AppendLastFourOctetsOfEmptyNonCompressedDeflateBlock(byte[] compressedMessagePayload)
    {
        Array.Resize(ref compressedMessagePayload, compressedMessagePayload.Length + 4);

        compressedMessagePayload[compressedMessagePayload.Length - 4] = LAST_FOUR_OCTETS[0];
        compressedMessagePayload[compressedMessagePayload.Length - 3] = LAST_FOUR_OCTETS[1];
        compressedMessagePayload[compressedMessagePayload.Length - 2] = LAST_FOUR_OCTETS[2];
        compressedMessagePayload[compressedMessagePayload.Length - 1] = LAST_FOUR_OCTETS[3];

        return compressedMessagePayload;
    }
}

The second set of helper methods which will be needed is the actual compression and decompression. For simplicity purposes only text messages will be considered from this point.

public sealed class WebSocketDeflateCompressionProvider : WebSocketCompressionProviderBase
{
    ...
    private static readonly Encoding UTF8_WITHOUT_BOM = new UTF8Encoding(false);

    ...

    private async Task<byte[]> CompressTextWithDeflateAsync(string message)
    {
        byte[] compressedMessagePayload = null;

        using (MemoryStream compressedMessagePayloadStream = new MemoryStream())
        {
            using (DeflateStream compressedMessagePayloadCompressStream =
                new DeflateStream(compressedMessagePayloadStream, CompressionMode.Compress))
            {
                using (StreamWriter compressedMessagePayloadCompressWriter =
                    new StreamWriter(compressedMessagePayloadCompressStream, UTF8_WITHOUT_BOM))
                {
                    await compressedMessagePayloadCompressWriter.WriteAsync(message);
                }
            }

            compressedMessagePayload = compressedMessagePayloadStream.ToArray();
        }

        return compressedMessagePayload;
    }

    private async Task<string> DecompressTextWithDeflateAsync(byte[] compressedMessagePayload)
    {
        string message = null;

        using (MemoryStream compressedMessagePayloadStream = new MemoryStream(compressedMessagePayload))
        {
            using (DeflateStream compressedMessagePayloadDecompressStream =
                new DeflateStream(compressedMessagePayloadStream, CompressionMode.Decompress))
            {
                using (StreamReader compressedMessagePayloadDecompressReader =
                    new StreamReader(compressedMessagePayloadDecompressStream, UTF8_WITHOUT_BOM))
                {
                    message = await compressedMessagePayloadDecompressReader.ReadToEndAsync();
                }
            }
        }

        return message;
    }
}

Now the public API can be exposed.

public interface IWebSocketCompressionProvider
{
    Task CompressTextMessageAsync(WebSocket webSocket, string message,
        CancellationToken cancellationToken);

    Task<string> DecompressTextMessageAsync(WebSocket webSocket,
        WebSocketReceiveResult webSocketReceiveResult, byte[] receivePayloadBuffer);
}

public sealed class WebSocketDeflateCompressionProvider :
    WebSocketCompressionProviderBase, IWebSocketCompressionProvider
{
    ...

    public override async Task CompressTextMessageAsync(WebSocket webSocket, string message,
        CancellationToken cancellationToken)
    {
        byte[] compressedMessagePayload = await CompressTextWithDeflateAsync(message);

        compressedMessagePayload =
            TrimLastFourOctetsOfEmptyNonCompressedDeflateBlock(compressedMessagePayload);

        await SendMessageAsync(webSocket, compressedMessagePayload, WebSocketMessageType.Text, true,
            cancellationToken);
    }

    public override async Task<string> DecompressTextMessageAsync(WebSocket webSocket,
        WebSocketReceiveResult webSocketReceiveResult, byte[] receivePayloadBuffer)
    {
        string message = null;

        CompressionWebSocketReceiveResult compressionWebSocketReceiveResult =
            webSocketReceiveResult as CompressionWebSocketReceiveResult;

        if ((compressionWebSocketReceiveResult != null) && compressionWebSocketReceiveResult.Compressed)
        {
            byte[] compressedMessagePayload =
                await ReceiveMessagePayloadAsync(webSocket, webSocketReceiveResult, receivePayloadBuffer);

            compressedMessagePayload =
                AppendLastFourOctetsOfEmptyNonCompressedDeflateBlock(compressedMessagePayload);

            message = await DecompressTextWithDeflateAsync(compressedMessagePayload);
        }
        else
        {
            byte[] messagePayload =
                await ReceiveMessagePayloadAsync(webSocket, webSocketReceiveResult, receivePayloadBuffer);

            message = Encoding.UTF8.GetString(messagePayload);
        }

        return message;
    }

    ...
}

This API makes it easy to plug in a compression provider into typical (SendAsync and ReceiveAsync based) flow for WebSocket by replacing calls to SendAsync with calls to CompressTextMessageAsync and calling DecompressTextMessageAsync whenever the WebSocketReceiveResult acquired from ReceiveAsync indicates a text message. But before this can be done the permessage-deflate extension must be properly negotiated.

Context takeover and LZ77 window size

An important part of compression extension negotiation are parameters. The permessage-deflate defines four of them: server_no_context_takeover, client_no_context_takeover, server_max_window_bits and client_max_window_bits. First two define if server and/or client can reuse the same context (LZ77 sliding window) for subsequent messages. The remaining two allow for limiting the LZ77 sliding window size. The sad truth is that the above implementation is not able to handle most of this parameters properly, so the negotiation process needs to make sure that the acceptable values are beign used or negotiation fails (failing of negotiation doesn't fail the connection, the handshake response simply doesn't contain the extension as accepted one). So what are the acceptable values for this implementation?

The DeflateStream doesn't provide control over LZ77 sliding window size which means that the negotiation must be failed if offer contains server_max_window_bits parameter as it can't be handled. At the same time the presence of client_max_window_bits should be ignored as this is just a hint that client can support this parameter.

When it comes to the context reuse the above implementation creates new DeflateStream for every message which means it always work in "no context takeover" mode. Because of that the negiotation response must prevent client from reusing the context - the client_no_context_takeover must always be included in the response. This also means that server_no_context_takeover send by client in the offer can always be accepted.

I'm skipping the code which handles the negotiation here. Despite being based on NameValueWithParametersHeaderValue class which handles all the parsing it is still quite lengthy (it also must validate the parameters for "technical correctness"). For anyone who is interested the implementation is split between WebSocketCompressionService and WebSocketDeflateCompressionOptions classes which can be found at GitHub (link below).

Trying this out

There was an issue in Microsoft.AspNetCore.WebSockets repository for implementing per-message compression. It's currently closed, so I've made this implementation available independently on GitHub and NuGet. It is also part of my WebSocket demo project if somebody is looking for ready to use playground.