Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding special value UNRECOGNIZED to the enum type #2725

Open
JurajBegovac opened this issue Nov 28, 2023 · 21 comments
Open

Adding special value UNRECOGNIZED to the enum type #2725

JurajBegovac opened this issue Nov 28, 2023 · 21 comments
Assignees

Comments

@JurajBegovac
Copy link

Are there any plans for Wire to add an UNRECOGNIZED value to enums, similar to the functionality provided by the proto3 compiler for Java?

https://protobuf.dev/reference/java/java-generated/#enum

The protocol buffer compiler will generate a Java enum type called Foo with the same set of values. If you are using proto3, it also adds the special value UNRECOGNIZED to the enum type. 

The context for my query is that I'm working on a project where we're considering switching to Wire for generating Kotlin code from our proto3 schema. However, I've noticed that not all enums in our existing proto schema include a special type in the first position (value 0). This feature of handling unknown enum values automatically is important for our project's compatibility and robustness, and I'm interested in knowing if Wire plans to support something similar for Kotlin. If this feature is not planned, is there a possibility for me to implement it myself using some custom handler or workaround?

Thank you in advance

@oldergod
Copy link
Member

I'm surprised protoc didn't throw on that. Is there a lenient flag you can turn on to get that behaviour?

@oldergod
Copy link
Member

The doc says it adds it all the time? https://protobuf.dev/reference/java/java-generated/#enum Even if the example had a constant with a 0 value which is the default...

What is happening...

@oldergod
Copy link
Member

UNRECOGNIZED(-1),

@JurajBegovac
Copy link
Author

Hi, we don't use any special flags.

The protoc compiler automatically adds UNRECOGNIZED(-1) to all enum types in Protocol Buffers, even if there's an explicitly defined value at the first position. To illustrate, I'll share an actual example from our project (nothing sensitive, of course 😅):

Here’s a part of our proto scheme (luckily here we have that UNKNOWN at the first position, but it's not always the case):

enum CurrencyCodeType {
    CURRENCYCODETYPE_UNKNOWN = 0;
    CURRENCYCODETYPE_EUR = 1;
    CURRENCYCODETYPE_USD = 2;
}

message Price {
    CurrencyCodeType currency_code = 1;
    float amount = 2;
}

And this is the corresponding generated enum in Java (a part of generated code):

public enum CurrencyCodeType implements com.google.protobuf.ProtocolMessageEnum {

  CURRENCYCODETYPE_UNKNOWN(0),
  CURRENCYCODETYPE_EUR(1),
  CURRENCYCODETYPE_USD(2),
  UNRECOGNIZED(-1), // <----- Even with CURRENCYCODETYPE_UNKNOWN(0) defined, this is added
  ;
  
  // other code

  public static CurrencyCodeType forNumber(int value) {
    switch (value) {
      case 0: return CURRENCYCODETYPE_UNKNOWN;
      case 1: return CURRENCYCODETYPE_EUR;
      case 2: return CURRENCYCODETYPE_USD;
      default: return null;
    }
  }
  
  // other code

The generated Price class looks like this (part of it):

public final class Price extends com.google.protobuf.GeneratedMessageV3 implements PriceOrBuilder {
  private Price(com.google.protobuf.GeneratedMessageV3.Builder<?> builder) {
    super(builder);
  }

  private Price() {
    currencyCode_ = 0;
  }
  
  // other code

  @java.lang.Override 
  public com.mypackage.CurrencyCodeType getCurrencyCode() {
    com.mypackage.CurrencyCodeType result = com.mypackage.CurrencyCodeType.valueOf(currencyCode_);
    return result == null ? com.mypackage.CurrencyCodeType.UNRECOGNIZED : result;
  }
  
  // other code

As illustrated, the constructor initializes the default value to 0. However, if a new value is encountered—such as when the backend introduces a new currency type that hasn't yet been implemented on the client side—the system defaults to UNRECOGNIZED instead of the first defined value.
We have been using protoc in this manner since 2019.

@oldergod
Copy link
Member

oldergod commented Nov 29, 2023

Got it. Wire since the beginning has been returning null. This would be breaking for most consumers. Maybe we could put it behind an option for Java and Kotlin generation... 🤔

Java:

    public static CurrencyCodeType fromValue(int value) {
      switch (value) {
        case 0: return CURRENCYCODETYPE_UNKNOWN;
        case 1: return CURRENCYCODETYPE_EUR;
        case 2: return CURRENCYCODETYPE_USD;
        default: return null;
      }
    }

Kotlin:

      public fun fromValue(`value`: Int): CurrencyCodeType? = when (value) {
        0 -> CURRENCYCODETYPE_UNKNOWN
        1 -> CURRENCYCODETYPE_EUR
        2 -> CURRENCYCODETYPE_USD
        else -> null
      }

@JurajBegovac
Copy link
Author

JurajBegovac commented Nov 29, 2023

I don't believe the issue lies with returning null (my example also returns null). The potential problem, as I see it, could arise with the decode function in messages that utilize enum.

Take a look at the following example, which is a snippet from the Price source code generated using Wire:

override fun decode(reader: ProtoReader): Price {
  var currency_code: CurrencyCodeType = CurrencyCodeType.CURRENCYCODETYPE_UNKNOWN
  var amount: Float = 0f
  val unknownFields = reader.forEachTag { tag ->
    when (tag) {
      1 -> try {
        currency_code = CurrencyCodeType.ADAPTER.decode(reader)
      } catch (e: ProtoAdapter.EnumConstantNotFoundException) {
        reader.addUnknownField(tag, FieldEncoding.VARINT, e.value.toLong()) // THIS is crucial because it doesn't change currency_code -> currency_code stays CURRENCYCODETYPE_UNKNOWN
        // currency_code = CurrencyCodeType.UNRECOGNIZED // Adding something like this would align our implementation with that of protoc
      }
      2 -> amount = ProtoAdapter.FLOAT.decode(reader)
      else -> reader.readUnknownField(tag)
    }
  }
  return Price(
    currency_code = currency_code,
    amount = amount,
    unknownFields = unknownFields
  )
}

Or, as you've highlighted, if the fromValue function returns UNRECOGNIZED in the else case.

@oldergod
Copy link
Member

I wonder if protoc-kotlin does it too? I'm surprised Java does it on its own since it's not specified in the specs. Serialisation should work even if the service on the other side doesn't do that.
If we're talking about code consuming Java generated types, switching to Wire would break the call sites anyway so having to treat an enum as null instead of UNRECOGNIZED, or custom read the unknownfield should suffice?
Maybe we could write tests showing why it'd be a problem if Wire were not to do it

@JurajBegovac
Copy link
Author

protoc-kotlin

It's important to note that protoc-kotlin operates similarly to protoc-java because it essentially acts as a Kotlin wrapper over the Java code, as outlined in the Protocol Buffers Kotlin tutorial. This means it doesn't generate enums specifically for Kotlin; they aren't necessary due to this underlying Java implementation.

However, a key distinction with protoc-kotlin is its handling of proto messages. For instance, in the case of our Price message, protoc-kotlin generates an additional Kotlin file (PriceKt.kt in our example). This Kotlin file provides a DSL wrapper around the Java builder. Here’s what the output looks like:

@kotlin.jvm.JvmSynthetic
public inline fun price(block: com.mypackage.PriceKt.Dsl.() -> kotlin.Unit): com.mypackage.Price =
  com.mypackage.PriceKt.Dsl._create(com.mypackage.Price.newBuilder()).apply { block() }._build()
  
  public object PriceKt {
  @kotlin.OptIn(com.google.protobuf.kotlin.OnlyForUseByGeneratedProtoCode::class)
  @com.google.protobuf.kotlin.ProtoDslMarker
  public class Dsl private constructor(
    private val _builder: com.mypackage.Price.Builder
  ) {
    public companion object {
      @kotlin.jvm.JvmSynthetic
      @kotlin.PublishedApi
      internal fun _create(builder: com.mypackage.Price.Builder): Dsl = Dsl(builder)
    }

    @kotlin.jvm.JvmSynthetic
    @kotlin.PublishedApi
    internal fun _build(): com.mypackage.Price = _builder.build()

    /**
     * <code>.CurrencyCodeType currency_code = 1;</code>
     */
    public var currencyCode: com.mypackage.CurrencyCodeType
      @JvmName("getCurrencyCode")
      get() = _builder.getCurrencyCode()
      @JvmName("setCurrencyCode")
      set(value) {
        _builder.setCurrencyCode(value)
      }
    /**
     * <code>.CurrencyCodeType currency_code = 1;</code>
     */
    public fun clearCurrencyCode() {
      _builder.clearCurrencyCode()
    }

    /**
     * <code>float amount = 2;</code>
     */
    public var amount: kotlin.Float
      @JvmName("getAmount")
      get() = _builder.getAmount()
      @JvmName("setAmount")
      set(value) {
        _builder.setAmount(value)
      }
    /**
     * <code>float amount = 2;</code>
     */
    public fun clearAmount() {
      _builder.clearAmount()
    }
  }
}
@kotlin.jvm.JvmSynthetic
public inline fun com.mypackage.Price.copy(block: com.mypackage.PriceKt.Dsl.() -> kotlin.Unit): com.mypackage.Price =
  com.mypackage.PriceKt.Dsl._create(this.toBuilder()).apply { block() }._build()

This DSL essentially provides a Kotlin-friendly way to interact with the underlying Java builder. For example, the currencyCode property in the Kotlin DSL:

public var currencyCode: com.mypackage.CurrencyCodeType
  @JvmName("getCurrencyCode")
  get() = _builder.getCurrencyCode() // <-- This calls the Java code

corresponds to the Java generated code, like so:

@Override
public com.mypackage.CurrencyCodeType getCurrencyCode() {
  @SuppressWarnings("deprecation")
  com.mypackage.CurrencyCodeType result = com.mypackage.CurrencyCodeType.valueOf(currencyCode_);
  return result == null ? com.mypackage.CurrencyCodeType.UNRECOGNIZED : result;
}

In essence, the Kotlin DSL is a layer on top of the Java builder, ultimately calling the same underlying Java functions for properties like currencyCodeType.

In response to the comment about "wire breaking"

switching to Wire would break the call sites anyway

Your point about Wire potentially breaking existing call sites is valid, though I find this primarily affects properties due to their naming patterns. Although it's interesting why Wire uses underscore naming instead of camelCase, that's not our main focus here, so I'll put that topic aside for now.

For enums, Wire does things similarly to protoc-java, so we get the Kotlin versions without having to change any code, which is great because the names stay the same. The only hitch is that Wire skips adding the UNRECOGNIZED value to the enums, unlike protoc-java which always includes it. Swapping that out isn't too tough, but the issue with Wire is it falls back to the first value by default. That's tricky for us because then we can't tell if it's really the first value or some new value that the client doesn't know about yet.

@JurajBegovac
Copy link
Author

Hi @oldergod,

I've forked the repository and submitted a pull request with my proposed solution to address the issue

JurajBegovac#1

@oldergod
Copy link
Member

Sorry I missed your last message, I've been thinking about it and I cannot convince myself that the change is required.
Would you mind explaining what you mean with "but the issue with Wire is it falls back to the first value by default." ? Proto3's specs say we should fall back to 0 if nothing is on the Wire. For proto2, we return null. Where do we fall back to the first value?

@JurajBegovac
Copy link
Author

Oh, right, I just meant that Wire defaults to the value at 0, which is usually the first one listed, hence my 'first value' mix-up 😅.

I get why you might not see the need for a change. The thing is, when our proto models were initially set up (before my time on the project), there was this fallback to -1, which is even mentioned in the proto docs. So, I guess that's why some of our developers didn't always bother to put a special or unknown value in the 0 position.

@oldergod
Copy link
Member

I think I'm starting to understand what's happening.
Protoc does generate the UNRECOGNIZED for Proto3 only (not on proto2).

Why?

With the new no-label field in proto3, an enum cannot be null. It has to be of some value, defaulting to its identity value (0).

The $10 question is: you receive the value 2 when your enum only accepts 0 or 1, what value do you get when you read the value after decoding?

  • Wire sets it to its identity value: 0 (which I think is a bug)
  • Protoc sets it to its generated-under-the-hood UNRECOGNIZED constant: -1. It exposes at the same time two methods: one which returns the constant, the other one returns the value. Here, myEnum() would return UNRECOGNIZED and myEnumValue() returns 2.

So how about this?

  • We add an option to generate UNRECOGNIZED for proto3 enum fields, both java & kotlin.
    • Need to check edge cases:
      • what if -1 is already there? this cannot happen in proto3
      • what if UNRECOGNIZED already exists? looks like protoc crashes.
  • Now, we need to treat UNRECOGNIZED like a special value: It is only ever assigned when decoding, should not be used when encoding. Can be manually set to a message, etc.

I might have time to do that but that'll be Feb 2024. @JurajBegovac if you're willing to do it, why not? We could start on one language at first (the adapter code is shared anyway). I'll be out for a few weeks so won't be able to give feedback for a while.

@JurajBegovac
Copy link
Author

JurajBegovac commented Dec 19, 2023

Awesome news 😄

By the way, you should probably see if this change might affect any existing clients. Perhaps making it optional would be a good idea – like something we can toggle via the Gradle plugin (https://square.github.io/wire/wire_compiler/#kotlin). Or are you thinking it should be a default thing for all proto3 projects? 🤷

No hurry on our end (we're still using protoc, so we're all good for now).

Also, I'm up for working on this. If you've got any tips, ideas, or suggestions on where I should start, that'd be really helpful!

@oldergod
Copy link
Member

This would be a feature one has to opt-in. I'm not sure about the integration. The safe first step would be to add it to KotlinGenerator and/or JavaGenerator, with a boolean in the constructor deciding to generate it or not.

@JurajBegovac
Copy link
Author

I've already added it in to KotlinGenerator in this PR:
JurajBegovac#1

@JurajBegovac
Copy link
Author

Hi @oldergod 😄
Not sure if you had some time to take a look at this...

@oldergod
Copy link
Member

It's on my radar but other priorities right now. I don't have a clear image yet as how to implement it. I don't think the constant -1 makes sense with the current API offered by Wire.

@oldergod
Copy link
Member

oldergod commented Feb 9, 2024

Summary of some digging:

  • opt-in option (default false) to kotlin {} blocks.
  • generate UNRECOGNIZED -1 only for proto3
  • throw at runtime (like protoc) if one sets it to -1.
  • On encoding, we don't send -1 but the actual value.
  • Test roundtrip bytes + JSON between Wire and Protoc.
  • Do we need it on Java right now? @JurajBegovac what do you use?

Steps (PR separation):

  • Only touching KotlinGenerator, add an option defaulted to false which generates UNRECOGNIZED(-1) field for both constructor and builder. This only concerns proto3.
  • Update ProtoAdapter's encoding/decoding to handle that properly. Decoding: if unknown, we set unrecognized. Encoding: if unknown, we read from unknownFields.
  • Build a method which would allow one to read the unrecognized constant's real value. For JVM, that might be something like private fun <M : Message<M, B>, B : Message.Builder<M, B>, P : Enum<P>> Message<M, B>.unknownEnumTag(property: KProperty1<M, P?>): Int? { which at call site would look like myMessage.unknownEnumTag(MyMessage::myEnumGetter). I'd like to have the same signature for native and JS if possible.
  • Write interop tests with Protoc, including JSON, and confirm we match.
  • Update JavaGenerator to generate UNRECOGNIZED(-1) if an opt-in setting is true.
  • Add option to Wire's Gradle plugin.
  • Add options to WireCompiler

@JurajBegovac
Copy link
Author

I guess only kotlin is fine but as you wish 😄

@oldergod oldergod self-assigned this Mar 11, 2024
@JurajBegovac
Copy link
Author

Hello @oldergod, I was wondering if there's been any update on the task. Please let me know if there's anything I can assist with.

@oldergod
Copy link
Member

oldergod commented Apr 8, 2024

@JurajBegovac We started work on it 2 weeks ago. Should be ready by the end of April, or early May at the latest. We update #2725 (comment) as we go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants