Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscribing to a base message type when the assembly versions dont match results in an unexpected message instance of the base type instead of the type that was published #6779

Open
oskar opened this issue Jun 21, 2023 · 5 comments
Labels

Comments

@oskar
Copy link

oskar commented Jun 21, 2023

Describe the bug

Description

We have a system with many applications with their own messages contract packages (published as nuget packages) which publishes events that other applications subscribe to (consuming the publishing application's contract nuget package to be able to deserialize the messages).

Each time a new version of an application is released (multiple times per day), the assembly version is bumped by the CI pipeline. But only if the contracts are actually changed, we publish a new contract nuget package. All subscribing applications always try to consume the latest available contract nuget packages from the publishers, to avoid message deserialization problems. But in practice, they almost never consume the exact same assembly version for two reasons:

  1. We don't create a new nuget package unless the contracts have changed
  2. The publishing application gets a new version for each deploy even if only internal things have changed

Secondly, a common scenario is that we have a message hierarchy that can look like this:

interface IItemChanged : IEvent { }
interface IPriceChanged : IItemChanged { }
interface ICategoryChanged : IItemChanged { }

The publishing application publishes specific events like IPriceChanged and ICategoryChanged and the consuming application might subscribe to all item changes using IHandleMessages<IItemChanged>

Thirdly, sometimes the consuming application downcasts the messages to the specific types:

var price = message switch
{
    IPriceChanged m => m.NewPrice,
    ICategoryChanged m => m.CategoryBasePrice,
    _ => 0
});

In NServiceBus 7, this scenario has always worked as expected. The subscriber receives an instance of type IPriceChanged__impl or ICategoryChanged__impl to the Handle method so the downcasting works. After upgrading to NServiceBus 8, this behaviour has changed. The Handle method now receives an instance of IItemChanged__impl - making the downcast fail.

Right now we have identified two overlapping workarounds:

  1. Subscribe to "concrete" event types instead of base types (i.e. subscribe to IPriceChanged and ICategoryChanged instead of IItemChanged)
  2. Avoid downcasting if subscribing to a base type (which in practice might mean the same as the first workaround)

We consider this changed behavior a bug in NServiceBus 8, but would like to hear how you interpret the situation. Please see the repro app linked below. In that app we have verified that the behavior works on NSB 7 and also works if the assembly versions are exactly matching.

Steps to reproduce

See this repro app: https://github.com/oskar/NSBBaseSubscribeUnmatchingVersions

Relevant log output

No response

Additional Information

No response

@oskar oskar added the Bug label Jun 21, 2023
@timbussmann
Copy link
Contributor

Thanks for the repro sample, this is very helpful! 👍

IIRC, there have been a few optimizations around resolving message types and assembly scanning which are most likely related to your issue.

It seems the problem is caused by the sub-interfaces not being detected as message types in the beginning since they are convention based and don't use the NServiceBus marker interfaces. Convention-based messages are only detected if they are directly referenced from message handlers which isn't the case here. When it tries to load the type at runtime dynamically, the difference in the assembly version causes this logic to fail as the requested version is higher than the one available (the other way should work).

I'll have to dig a bit deeper into this. A potential workaround would be to create empty message handlers for the desired types. This will cause the messages to be correctly registered and the deserialization logic can find the correct types.

@timbussmann
Copy link
Contributor

Ok so I played around with v8 and v7 a bit and this isn't directly a bug in the logic of NServiceBus but rather related to how assembly loading and type lookups work with .NET:

  • Because you're using convention-based messages, the message types aren't scanned at endpoint startup. This is intentional behavior (and both versions behave the same way) to limit the assembly scanning logic for performance and security reasons. However, the IItemChanged message type is loaded and cached as NServiceBus found a handler for it. This is also why adding a handler for the expected message types works.
  • For message types that don't have a handler directly associated with them, NServiceBus will try to load the type lazily at runtime. It tries to load it using Type.GetType. Since the incoming message specificies a more concrete type than the one known from the message handler, it attempts to load it using Type.GetType, and here's where there difference between the versions happens. Note that the incoming fully qualified assembly name of the message contract is higher than the one that is available on disk for the subscriber.
    • In NServiceBus v7, Type.GetType(...) basically ignores the provided version information in the assembly-qualified name.
    • In NServiceBus v8, Type.GetType(...) does not ignore the version, and since the provided version is higher, it won't find the specified type.
  • The type resolving logic then moves to the next message type identifier in the message header which at some point is the type of the message handler. Since NServiceBus can resolve types by name IF the type is already a known message type, this works on both versions as this logic won't rely on Type.GetType. Therefore it decides to use the more abstract type that has been identified during endpoint startup.

So ultimately, your issue is caused by different behaviors of Type.GetType. This difference of the behavior lies in the target frameworks: NServiceBus v7 targets .NET Standard which causes it to load some assemblies via Assembly.LoadFrom, while NServiceBus v8 uses the newer, .NET (Core) specific assembly loading APIs. Assembly.LoadFrom has some interesting side-effects on the assembly/type resolving logic. I've written a blog post about this quite a while ago in case you want to learn more about it.

That's a lot of details and probably too many details to digest easily. So the workarounds right now would be:

  • Add (empty) message handlers for the message types you expect to receive
  • Use the NServiceBus marker interfaces instead of conventions
  • When starting the application, call Assembly.LoadFrom("NServiceBus.Core.dll"); at the beginning of your application to enable the same type-resolving logic as NServiceBus version 7

@oskar
Copy link
Author

oskar commented Jun 24, 2023

Thanks a lot for the in-depth response! 👍

Before opening the issue, we did a bit of debugging on what might have changed in NServiceBus 8 and found this commit in AssemblyScanner.cs and also this .NET runtime issue by @bording. So it does not come as a surprise that you also point in the direction of this changed behavior in .NET.

Your blog post also does a good job explaining why Assembly.LoadFrom("NServiceBus.Core.dll") is a workaround (while at the same time making it painfully clear that assembly loading in .NET really is non-trivial - I had to read the article a couple of times 😅). It is a workaround we can probably rollout pretty easily (we have a shared infrastructure nuget package that is used for our NServiceBus applications), but I can't shake off the feeling that it is a hack that will come back and bite us again in the future. After all, both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom() on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?

Regarding using the NServiceBus marker interface instead of conventions, it is something I don't really think would fit our current architecture. Our messages contract packages target netstandard2.0 and are also only allowed to contain a reference to a small base package (which defines our custom IEvent, ICommand and IMessage.). Having an NServiceBus 8 reference in our contracts would require us to multitarget net6 and net472, but we actually have some older applications still on NServiceBus 7 which I think are running an even older version of .NET Framework than net472. To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation? Failing message deserialization just because two versions don't match is always worse than trying and failing on an actual breaking change in the message. As soon as the publishing application goes live with an updated message contract, all consumers/subscribers will be in trouble if there is a breaking change anyway. I haven't really experimented with this internally yet, but do you see any potential problems with this approach? Is there anything regarding assembly loading in NServiceBus that might become problematic if all our messages assemblies would contain the same assembly version even while the message types change over time?

@timbussmann
Copy link
Contributor

both your blog post and also the Microsoft employees in the issue linked above recommend avoiding Assembly.LoadFrom() on .NET Core. Would you trust upcoming .NET versions to maintain this behavior?

Microsoft has been extremely careful of breaking behaviors of existing APIs, so I think we can expect the behavior to continue working this way, but assembly loading has always been a tricky area with hard to predict behaviors so who knows.

To summarize, we prefer conventions over the marker contract to avoid having the NServiceBus reference in our contracts.

Totally understandable 👍 , I was just trying to list all the options.

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?

Yes, there is actually one more option that I didn't mention before (because I forgot about it), which is to remove the "Version=x.x.x.x" attribute from the fully qualified assembly name in the enclosed message types header. When not providing a version, .NET will also be more open to resolving types (I'll have to write an additional blog post on this one as the old one was primarily focused on locating assemblies and it didn't take into account versions). This could also be done by a behavior on either publisher or receiver side. Oversimplified, it could look something like this:

class RemoveVersionBehavior : Behavior<IIncomingPhysicalMessageContext>
{
    public override Task Invoke(IIncomingPhysicalMessageContext context, Func<Task> next)
    {
        if (context.Message.Headers.TryGetValue(Headers.EnclosedMessageTypes, out var enclosedMessageTypes))
        {
            context.Message.Headers[Headers.EnclosedMessageTypes] = enclosedMessageTypes.Replace("Version=2.0.0.0,", string.Empty);
        }

        return next();
    }
}

and needs to be registered in the pipeline using configuration.Pipeline.Register(new RemoveVersionBehavior(), "Removes the assembly version from the message types header");

right here, the version is hard-coded and you'd have to write some extra logic to make this more flexible (that's probably why it would be easier on the publisher side).

There is one more possible workaround and that is to stop publishing our contracts nuget packages with an ever changing assembly version. If we assign no version (or just hard code 1.0.0.0) to the assemblies (but still increase the nuget package version of course), I guess we could avoid the situation?

I think that would work too but I haven't tested it. The approach should be no problem as long as you make sure that you're not introducing breaking changes to your message types but instead create a new type containing the breaking change.

@timbussmann
Copy link
Contributor

I wanted to add an additional comment about some potential approaches to "fix" this by making message-type resolution work for this specific scenario. Ultimately, the problem is caused by the desired message type not being registered as a known message type and the runtime resolution logic failing to find a matching type. I've been trying to come up with other scenarios that would run into this issue but I couldn't come up with any right now. This is because there are 3 different conditions necessary, where every combination of just two would be handled by core already I think:

  • The message must be "marked" using message conventions instead of marker interfaces. When using marker interfaces, assembly scanning should always detect all message types of an assembly and register them. Once message types are registered, there is a built-in match-by-name mechanism in case Type.GetType from the message header values would fail.
  • There must be a version number mismatch, where the incoming message type header has a version higher than the assembly loaded in the receiver. If the version would match or would be lower, Type.GetType would find the specified type and register it as a known message type.
  • There must be message inheritance, where message handlers handle a more abstract type than the incoming message type. This is due to the fact that all message types that are directly referenced in message handler (using the IHandleMessages<T> interface) are also registered as message type. All registered message types should also resolve using match-by-name, even when Type.GetType would fail (e.g. due to version mismatch)

Some ideas on how to make this work:

  • Automatically drop the version attribute from the fully qualified assembly name when calling Type.GetType.
  • At startup, scan all assemblies for messages that contain types used in message handler declarations (currently only that specific type is added), since it is likely that the assembly can contain further message types.
  • Add an additional API to allow explicit scanning of assemblies. This would allow to register convention-based assemblies to be scanned at endpoint startup.

Each approach has different tradeoffs, e.g. performance on startup or message-processing performance, security, expected behavior, and so on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants