[Proposal] - Artemis Item Encoding Standard #2246
Comments
@magicus The gear item encoding sketch is complete, so perhaps now would be the time for you to look at this? I know you had some strong opinions on this before. This is just a draft until the V3 item API is migrated (waiting on Wynn). This gives us time to discuss even major changes, if you don't agree with some parts. As for crafted encoding, I don't plan to work on that until the item encoding itself is fully complete. |
Ok... One thing that was a problem for us before was the order of stats. In legacy, they had a (somewhat arbitrary) way of ordering stat types, and then stats were sent as an ordered list of the values for the stats. This meant that both parties had to agree to the order. I'm not sure how you suggest to solve this. In fact, I don't see that you even address this..? We can keep sending the stat values in order, but then we have to specify it very clearly. Or we can send stats as key-value pairs, so we give each stat kind a numeric id, and then basically, if we have Dexterity +4, we send 0x31:0x04, if 0x31 were the code for Dexterity. Or whatever. This will essentially double the amount of data needed to be transferred, though, so it will cause more issues for vanilla players. |
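A minimal sketch of the two options described above, assuming one byte per stat id and one byte per (non-negative) value. Only the id `0x31` for Dexterity comes from the comment; the other ids, `STAT_ORDER`, and the function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical stat ids; only 0x31 (Dexterity) is taken from the comment above.
STAT_IDS = {"strength": 0x30, "dexterity": 0x31, "agility": 0x34}

# Hypothetical order that both parties would have to agree on in advance.
STAT_ORDER = ["strength", "dexterity", "agility"]


def encode_ordered(stats: dict[str, int]) -> bytes:
    """Values only; the receiver must know STAT_ORDER (and which stats
    the item has) to interpret them."""
    return bytes(stats[name] for name in STAT_ORDER if name in stats)


def encode_key_value(stats: dict[str, int]) -> bytes:
    """One id byte followed by one value byte per stat; self-describing."""
    out = bytearray()
    for name, value in stats.items():
        out += bytes([STAT_IDS[name], value])
    return bytes(out)


stats = {"dexterity": 4, "agility": 7}
ordered = encode_ordered(stats)      # 2 bytes
key_value = encode_key_value(stats)  # 4 bytes, starts with 0x31 0x04
```

Under these assumptions the key-value form is exactly twice as long as the ordered form, which is the size trade-off mentioned above.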
Hm, I was thinking, can we maybe inject some control characters to make it appear less bad for Vanilla players? In the "good old days", you'd have stored a Or, maybe there is some new fancy Unicode stuff we can use. I'm pretty certain there are a lot of control codes meaning "combine the following letters". If we can minimize the visual impact on Vanilla players, I see no real need to keep the string to an absolute minimum. Then it would be better to encode things in a way that is more self-describing and thus stable. |
Byte Based Encoding Proposal:
Basically I was thinking it would be nice if we could work in a more standard format.
An encoded thing is of the following form, a list of blocks:
(hopefully we don't need more than 256 versions...)
Each
where the meaning of the The header specification may also differ between
Item with optional rolls
The
NOTE: we need to decide on a standard item ID map. We also need to decide on a canonical ordering of the stats for any item.
Available block types:
rolls buffer
A byte array, where every byte corresponds to a roll value (30-130). If it is absent, max rolls for every stat are assumed (or base rolls, for fixed stat items).
stars buffer
A packed byte array, where each byte is formatted as follows:
where If it is absent, stars are computed from the rolls buffer.
powder buffer
A binary blob, padded out to the nearest byte.
If it is absent, all powders are assumed to be
shiny data
Of the following format:
If it is absent, we assume there is no shiny data.
rerolls
Of the following format:
If it is absent, we assume 0 rerolls.
wynn api version
Of the following format:
(can we bet on there being less than 16k api updates? I hope so...)
Crafted item (from ingredients)
The
corresponding to ingredients
where
Available block types:
item name
A null-terminated string (any number of nonzero bytes, followed by a zero byte to mark the end of the string).
item lore
Same format as item name.
Complete item description
The
Available block types:
stat ID buffer
A list of the stat IDs for each stat on this item. NOTE: this is not optional!
stat length buffer
A list of the length (in bytes) for encoding each stat. NOTE: this is not optional!
max stats buffer
A binary blob containing data about the max stats for this item. The order is given by the 'stat ID buffer', but each entry can have variable size in bytes. Unlike normal buffers, this buffer stores numbers using two's complement. NOTE: level, attackspeed, hp, max durability/charges, stat req, are all counted as "stats".
NOTE: this is not optional!
min stats buffer
Same format as max stats, but it's optional.
item description
Extra string field to accommodate things like event items.
item texture
TODO
current durability
Single byte, storing the current durability/charges of this item. |
magicus made excellent points. I also like the idea of sending key-value pairs for stats, and if a stat order list is still required, I think it'd be feasible to use the order from the v3 item API, or the order of individual item identifications (assuming they don't change it). |
This should answer the why's and why nots of doing key-value pairs. As for having both parties agree, that is what the hash is used for. It is basically an error-checking system, but not an error-correcting one. |
+1 to the k-v pairs (essentially what my idea has; but like "unrolled" into a few separate buffers. I think this makes it easier to make entries of the mapping optional, or add new entries (if for whatever reason that is needed)). |
To be clear, when encoding for 3rd parties, I am more than happy to include ID keys. But for chat encoding itself, I do think it would be too long and/or too redundant. |
wynnbuilder internally uses implicit order too (defined in an external data file). I think it would be best if we could rely on implicit order as much as possible and use a stable "item ID lookup table" and "stat ID lookup table" to define the ordering. |
The version I see being implemented may just be a third version, combining the good aspects of both proposals. To come to an agreement in a timely manner, there is a really simple first step to take: agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format. I also think that we should first focus only on encoding gear items. This is the easiest case, and gives us valuable info before working on the custom and much more complex items, like crafted and "unique" items. What I like from your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar to it. As for the "common building block", one option is to write it in base 16, as a Unicode character is basically 4 hex digits. However, thinking in hex is much harder than thinking in bytes. Since 16^4 is exactly 2^16, we could make our "common building block" 2 bytes. That would give us a really straightforward way of encoding to both base64 and Unicode. (And it would also allow Wynnbuilder to decode/convert chat items from Unicode, as you would have to do almost nothing to extract the data to a byte format.) What do you think @hppeng-wynn @RawFish69? |
We do have an implicit internal order too. A "legacy" one is used for chat items, and Artemis has 3 custom orders. Any of those could be used for agreeing on a common order or id-key map. |
I chose a single byte because it's basically the default "bit of data" across computers in general
for "just gear items" do you mean like, just the normal rolled items? |
@kristofbolyai We seem to be talking past each other. Your hash check is for the stat values ("ids"). I'm talking about the stat types. Your check would help in a situation where Wynn has e.g. nerfed the base value of a certain stat. But it would not help if there is a misunderstanding in the order of stats. I spent a sh*tload of hours trying to clean up the stat handling from Legacy to Artemis. And the "ordering" of stats was a common pain point. In the end, I had to create a special ordering just to accommodate the old "item chat protocol". And I realized it would be terribly broken for all new stat types that had been introduced since it was created. So, I am very very skeptical towards any idea of "assumed" ordering. If you choose to go down that route, you will basically need to bump the version number each time Wynn adds a new stat type. If, on the other hand, you choose key-value pairs, and have a way to assign numeric ids to the stats (here you can use whatever order you agree on and just enumerate the stat types from that list), then you are safe for the future. If a client receives a number it does not understand, it can just say: "Unknown Stat: 7". |
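A minimal sketch of the fallback described above: a client decoding key-value pairs can still show something sensible for an id it does not know. The id table and the one-byte-per-value layout are assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical id table; a real one would be enumerated from an agreed list.
STAT_NAMES = {0x30: "Strength", 0x31: "Dexterity"}


def decode_key_value(data: bytes) -> list[str]:
    """Decode (id, value) byte pairs, keeping unknown ids readable."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    lines = []
    for i in range(0, len(data), 2):
        stat_id, value = data[i], data[i + 1]
        # An id that is missing from the table does not break decoding.
        name = STAT_NAMES.get(stat_id, f"Unknown Stat: {stat_id}")
        lines.append(f"{name} +{value}")
    return lines
```

For example, `decode_key_value(bytes([0x31, 4, 7, 2]))` yields one known entry and one "Unknown Stat: 7" entry, without any version bump.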
Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:
|
@magicus hhpeng works on/is the creator of Wynnbuilder. This is basically full integration with parts of Wynnbuilder, and an overall format for anyone to encode items in Wynn in the future. |
I am working on a concept that would reduce the length of the encoded strings, even with bytes. |
On the contrary binary format encoded in unicode is probably going to be shorter (if you design it without much padding) since there's much less wastage ex. in @kristofbolyai 's original specification, the rolls + stars are being encoded using 12 bits (1 unicode char, 4096 values); but they really only take up 10 bits (including the 30 offset). So the binary code would be more efficient by nearly 20% |
tbf I think adding new stat types is pretty rare and i'd be OK with bumping the version number when that happens |
minor fixes to my comment proposal:
|
What do you mean? It sounds like you are talking about values, not types? I am still mostly worried about matching the correct type. |
Also:
That sounds great! As I said, I am all in favor of standardizing formats. Not sure how it helps us, but as long as it doesn't pose a problem for us, just go for it. |
It looks good, I only do decode and it shouldn't matter since the concept @hppeng-wynn proposed is similar enough. |
The way wynnbuilder has done it, we basically have a big list of stats in an order, and whenever wynn added new stats to the game, those stats always get appended to the end of the list (talking about the stat mapping, not the stats for any given item.) that ensure that the old stats are always in the same order, and new stats basically get new IDs. That's one way to create "implicit backwards compatibility" without much effort. However it comes at a small cost to readability (for example, the damage stats are not all next to each other in this list.) |
Basically I am thinking of using bytes as smallest data chunk, but with a trick to make it really efficient in chat: Each block type would define (in the standard, not in encoding) their "requested" data size. It would either be 8 bits, 16, 32 or 64. This would work easily with both encoding formats: Unicode characters in the Supplementary Private Use Area-A can encode any value between Blocks would not only define their "integer" size, but their length, so there would not be a need to reserve any characters for block types, and there would be no need to reserve/use a character for separating parts. As for the block headers itself, 1 byte would represent the type, 1 byte would give us the size of the block (divided by the block's data size). I think all of us understand the benefit of having variable sized blocks, but let me state an obvious case. If we support 64-bit integers natively in the standard, there is no black magic needed when encoding such values. Also supporting lower bit sizes, like 8 and 16 allow us to efficiently bundle information like identification key-value pairs. And the best of all of this is that the Unicode representation would be close to being the most efficient it can be (practically, not theoretically). What do you think? If we all agree here, we can go ahead and agree on the standard for encoding normal gear items, and implement that while getting the encoders/decoders written in the process. Once we know encoding/decoding is stable, we can move to working on the "fun" parts. So, if you agree, please react with an emoji :) |
maybe I'm confused now. I was thinking of using the space you had reserved for I don't understand why the blocks need "preferred data size". fundamentally the byte encoding would be like, just running over unicode character boundaries as follows: |
For those following this proposal, I've updated the issue description to reflect the current state of the format. Many discussions happened outside of GitHub, but hopefully all the changes we've agreed upon are implemented in the format now. An update to the format is planned, adding 2 other types: custom items (crafted gear, custom normal items) and crafted items as recipes. |
xxrxxxrxrx
I realize I've never responded to this suggestion. Vanilla players seeing a lot of unknown characters is a problem, but the main reason to keep the encoding short is so multiple items can fit into the relatively short maximum chat length (128 chars) Minecraft sets. |
Here, for the
|
I've thought about this being an issue, but I've shrugged it off, and I've only written the encoding part. A simple solution is to have a "null" byte at the end of the list, or to send the powder count. Both solutions use a single byte. I would lean toward sending a single, which is common in the standard. Do you have a better idea perhaps? |
My first thought was to move the powder block to the end, which would naturally give it a termination, but then I realized this is a terrible solution without any robustness. So, by entropy, the only way is to send one more byte. I prefer sending the count, because this avoids culling the |
For the new standard, check: Wynntils/Artemis#2246 Not compatible with old standard, please lock the version if necessary.
EDIT: The originally proposed format can be found here. The format below is always updated to reflect the current standard.
Artemis Item Encoding Standard
The purpose of this new standard is to provide grounds for a new system, used for encoding Wynncraft items into strings, and vice-versa. Unlike the chat item system, this format is not limited to identified gear items, and can encode items of any (supported) types.
Encoding
Encoding is a 2 layer process. The first layer is responsible for translating between byte arrays and encoded values (UTF-16, base64). The second layer is responsible for translating any kind of game data to an array of bytes.
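A minimal sketch of the first layer for UTF-16, assuming the mapping this standard describes: values `0x0` to `0xFFFD` map into Supplementary Private Use Area-A, and the two remaining values borrow U+100000 and U+100001 from Area-B. The byte order within a pair (high byte first) and the function names are assumptions, and the `0xEE` padding rule for odd lengths is deliberately left out.

```python
AREA_A = 0xF0000   # first code point of Supplementary Private Use Area-A
AREA_B = 0x100000  # first code point of Supplementary Private Use Area-B


def bytes_to_chars(data: bytes) -> str:
    """Map each pair of bytes to one private-use character."""
    if len(data) % 2 != 0:
        raise ValueError("odd lengths need padding, not covered by this sketch")
    chars = []
    for i in range(0, len(data), 2):
        value = (data[i] << 8) | data[i + 1]  # assumed: high byte first
        if value <= 0xFFFD:
            chars.append(chr(AREA_A + value))
        else:
            # 0xFFFE and 0xFFFF borrow U+100000 and U+100001.
            chars.append(chr(AREA_B + (value - 0xFFFE)))
    return "".join(chars)


def chars_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_chars for unpadded input."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if AREA_A <= cp <= AREA_A + 0xFFFD:
            value = cp - AREA_A
        elif cp in (AREA_B, AREA_B + 1):
            value = 0xFFFE + (cp - AREA_B)
        else:
            raise ValueError(f"unexpected code point U+{cp:X}")
        out += value.to_bytes(2, "big")
    return bytes(out)
```

Each resulting character lies outside the Basic Multilingual Plane, so in actual UTF-16 it occupies a surrogate pair (two code units) while still counting as a single character.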
Encoding a byte array to an encoded value
This section describes how the standard encodes integers. This encoding is not to be changed once implemented.
Encoding to UTF-16:
Unicode characters in the Supplementary Private Use Area-A can encode any value between 0x0 and 0xFFFD. If we borrow the first two characters (U+100000-U+100001) from Supplementary Private Use Area-B, we can encode exactly 2 bytes of data into a single character. If our number of bytes needs to be padded, we use the Supplementary Private Use Area-B to encode the first value, with a 0xEE byte for padding, which will be ignored when decoding blocks with lengths.
Encoding to base64:
Format
The encoded string format is represented by different kinds of blocks. The order of the blocks is unspecified, with the exception of the start and end blocks, which must be first and last, respectively. Also note that a "type" block must be present, representing the type of the item being encoded. See the format of each block below.
Stability of the format
The format of the encoding itself is not to be changed once implemented. However, the blocks themselves can change their format between versions and blocks may be added and removed. See more information about versioning below.
Block Formats
A block starts with a unique header id in the range 0-255 (256 possible values, 1 byte). The following bytes are the block's data, which is decoded while reading, since a block's data length is not explicitly encoded. See the versions below for the specific block formats and their unique header ids.
Encoding data
Some blocks use pre-defined ways of encoding data. These are described in this section.
Encoding a string:
Encoding a string is done by encoding the string's ASCII representation into bytes, and terminating with a null byte.
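A minimal sketch of this string encoding, assuming the decoder is handed a buffer and an offset and returns the offset just past the terminator. The function names are hypothetical.

```python
from __future__ import annotations


def encode_string(text: str) -> bytes:
    """ASCII bytes followed by a terminating null byte."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    if b"\x00" in data:
        raise ValueError("string must not contain a null byte")
    return data + b"\x00"


def decode_string(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Read up to the next null byte; return the string and the next offset."""
    end = data.index(b"\x00", offset)  # raises ValueError if unterminated
    return data[offset:end].decode("ascii"), end + 1
```

Returning the next offset matters here because block lengths are not encoded explicitly, so the reader has to know where the following block begins.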
Encoding a variable sized integer:
A variable sized integer is encoded the following way: 7 bits are stored in the first byte. If the data fits in 7 bits, set the highest bit to 0. If the data does not fit in a single byte, set the highest bit to 1 and add a next byte, following the same process. Repeat until the data is encoded into the required number of bytes. Zigzag encoding is used to handle negative values.
Versions
Versions are only incremented if there is a breaking change in the format of 1 or more blocks, and/or if 1 or more blocks are added or removed.
Block Formats - Version 1.0
Start block
ID: 0
Integer size: 8 bits
Data: A single byte, encoding the version of the data.
End block
ID: 255
Type block
ID: 1
Integer size: 8 bits
Data: A single byte, representing the type of item being encoded. This character works as a mapping key, each type has an id character representing it.
Description: Each type represents a single type of item. In most cases these ids are separated in the same way as Artemis' item classes, however exceptions might be allowed in cases where it is logical.
Key to Type Mapping Table
0
1
2
3
4
5
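A minimal sketch of how the start, type and end blocks described so far could frame an encoded item. The block ids are taken from the text; the concrete version byte and type key passed in the example, and the function name, are assumptions.

```python
START_BLOCK_ID = 0
TYPE_BLOCK_ID = 1
END_BLOCK_ID = 255


def build_frame(version: int, item_type: int, other_blocks: bytes = b"") -> bytes:
    """Start block first, end block last, type block always present."""
    for value in (version, item_type):
        if not 0 <= value <= 255:
            raise ValueError("version and type must each fit in one byte")
    return (
        bytes([START_BLOCK_ID, version])    # start block: id + version byte
        + bytes([TYPE_BLOCK_ID, item_type])  # type block: id + type key
        + other_blocks                       # any further blocks, already encoded
        + bytes([END_BLOCK_ID])              # end block: id only
    )
```

For example, `build_frame(1, 0)` produces the five bytes `00 01 01 00 FF`. The type block is placed right after the start block only for convenience, since the order of the inner blocks is unspecified.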
Name block
Header: 2
Data: The name of the item is encoded as an encoded string.
Identifications block
Header: 3
Data: The first byte contains the number of non-pre-identified ids (referred to as N) for the item. The second byte contains the type of the identification info in the following blocks, this is the identification type flag. The following bytes contain all identification info. The size of a single identification info depends on the identification type flag.
Identification Type Flag
The purpose of this flag is to make sure ID encoding fits multiple purposes and to give clients and users control over the stability and size of the encoded data. Choosing the longer, extended id encoding flag allows clients to decode the shared data without any external sources, such as APIs, and makes the encoded data "stable", even if the current item specifications change. Choosing the shorter, normal id encoding flag is preferred in situations where the data only needs to be available for short periods and a shorter encoding matters, such as in-game chat.
Normal encoding
The byte-flag of this encoding is 0.
Extended encoding
The byte-flag of this encoding is 1. The next byte is the number of pre-identified stats.
Powder block
ID: 4
Data: The first byte is the powder slots on the item. The next byte is the number of bytes. The following bytes are a binary blob, padded to fit the nearest 8 bits with 1 bits. A powder is encoded in 5 bits, with the following math: element * 6 + tier. The elements follow an ETWFA order. Five 0 bits are used to represent that no powder is present at the slot. The bits are padded with 0 bits to the nearest byte.
If it is absent, all powder slots are assumed to be unpowdered.
If it is present, but its length does not match the number of powder slots of the item, it is assumed that the rest of the slots are unpowdered.
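A minimal sketch of the 5-bit powder packing, covering only the binary blob and not the two leading bytes. Several details are assumptions: tiers run from 1 to 6, bits are written most significant first, and the trailing padding uses 0 bits (the text mentions both 1 bits and 0 bits, so this is one reading). Function names are hypothetical.

```python
from __future__ import annotations

ELEMENTS = "ETWFA"  # element index order given in the text


def powder_value(element: str, tier: int) -> int:
    """element * 6 + tier, which fits in 5 bits for tiers 1..6."""
    if not 1 <= tier <= 6:
        raise ValueError("tier is assumed to be in 1..6")
    return ELEMENTS.index(element) * 6 + tier


def pack_powders(powders: list[tuple[str, int] | None]) -> bytes:
    """Pack powders (or None for an empty slot) into 5 bits each."""
    bits = ""
    for powder in powders:
        value = 0 if powder is None else powder_value(*powder)
        bits += format(value, "05b")  # five 0 bits mean "no powder"
    bits += "0" * (-len(bits) % 8)    # assumed: pad with 0 bits to a full byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With tiers starting at 1, the value 0 never collides with a real powder, which is what lets five 0 bits stand for an empty slot.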
Rerolls block
ID: 5
Data: A single byte encoding the number of rerolls.
Shiny block
ID: 6
Data: The first byte is the id of the shiny stat (from a single, open-source, shared, mutually agreed upon source). ID 0 is reserved for "Unknown". The next bytes are the encoded variable sized integer of the shiny value.
Custom Gear Type block
ID: 7
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.
Gear Type map
Durability block
ID: 8
Data: The first byte is the overall effectiveness of the identifications (the percentage next to the name for crafted items). The next bytes are the encoded variable sized integer of the maximum value. The next bytes are the encoded variable sized integer of the current value.
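A minimal sketch of a durability block, including the variable sized integer encoding from the "Encoding data" section. It assumes the lowest 7 bits are written first (as in LEB128-style varints) and that the block id is emitted as the first byte; the standard's text does not pin down either detail, and the function names are hypothetical.

```python
DURABILITY_BLOCK_ID = 8


def zigzag(value: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... become 0, 1, 2, 3, ..."""
    return value * 2 if value >= 0 else -value * 2 - 1


def encode_varint(value: int) -> bytes:
    """7 data bits per byte; the highest bit is 1 while more bytes follow."""
    n = zigzag(value)
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte, highest bit 0
            return bytes(out)


def encode_durability_block(effectiveness: int, maximum: int, current: int) -> bytes:
    """Block id, effectiveness byte, then max and current as varints."""
    if not 0 <= effectiveness <= 255:
        raise ValueError("effectiveness must fit in one byte")
    return (
        bytes([DURABILITY_BLOCK_ID, effectiveness])
        + encode_varint(maximum)
        + encode_varint(current)
    )
```

Because of the zigzag step, small positive values use one bit more than in a plain unsigned varint: 63 still fits in one byte, while 64 already needs two.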
Requirements block
ID: 9
Data: The first byte is the level requirement. The second byte is the class requirement, represented with an id. The next byte is the number of skill requirements. A skill requirement is encoded as an id byte, representing the skill (ETFWA order). The next bytes are the encoded variable sized integer of the requirement values.
Class Requirement map
Damage block
ID: 10
Data: The first byte is the id of the attack speed of the item. The next byte is the number of attack damages present on the item. An attack damage is encoded the following way: The first byte is the id of the skill (ETFWAN, where N represents Neutral). The next bytes are the encoded variable sized integer of the minimum damage. The next bytes are the encoded variable sized integer of the maximum damage.
Attack Speed map
Defense block
ID: 11
Data: The first bytes are the encoded variable sized integer of the health value. The next byte is the number of defense stats present on the item. A defense stat is encoded the following way: The first byte is the id of the skill (ETFWA). The next bytes are the encoded variable sized integer of the defense value.
Custom Identifications block
ID: 12
Data: The first byte is the number of identifications. The identifications are encoded the following way: The first byte is the id of the identification. The next bytes are the encoded variable sized integer of the max value. For crafted items, the max values can be used to calculate the minimum values (10% of the maximum, rounded) and the current values (from the overall effectiveness).
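A minimal sketch of decoding this block's data, that is, the bytes after the block id. It assumes the variable sized integers store their lowest 7 bits first and use zigzag for the sign; the function names are hypothetical, and the minimum and current value calculation is left out because the rounding rule is not spelled out here.

```python
from __future__ import annotations


def decode_varint(data: bytes, offset: int) -> tuple[int, int]:
    """Read one zigzag varint; return the value and the next offset."""
    n = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # highest bit 0 marks the last byte
            break
    value = n >> 1 if n % 2 == 0 else -((n + 1) >> 1)  # undo zigzag
    return value, offset


def decode_custom_identifications(payload: bytes) -> list[tuple[int, int]]:
    """Return (identification id, max value) pairs from the block data."""
    count = payload[0]
    offset = 1
    result = []
    for _ in range(count):
        ident_id = payload[offset]
        max_value, offset = decode_varint(payload, offset + 1)
        result.append((ident_id, max_value))
    return result
```

The count byte is what tells the reader where the block ends, since block lengths are not encoded explicitly.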
Custom Consumable Type block
ID: 13
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.
Consumable Type map
Uses block
ID: 14
Data: The first byte is the remaining uses for the item. The second byte is the maximum uses for the item.
Effects block
ID: 15
Data: The first byte is the number of effects. An effect is encoded the following way: The first byte is the id of the effect. The next bytes are the encoded variable sized integer of the effect's value.
Consumable Effect map
Referenced data files
Shiny Stat Table
https://github.com/Wynntils/Static-Storage/blob/main/Data-Storage/shiny_stats.json
Identification ID-map Table
https://github.com/Wynntils/Static-Storage/blob/main/Reference/id_keys.json