[Proposal] - Artemis Item Encoding Standard #2246

Closed
kristofbolyai opened this issue Nov 11, 2023 · 30 comments · Fixed by #2253

@kristofbolyai
Collaborator

kristofbolyai commented Nov 11, 2023

EDIT: The originally proposed format can be found here. The format below is always updated to reflect the current standard.

Artemis Item Encoding Standard

The purpose of this new standard is to provide the basis for a new system for encoding Wynncraft items into strings, and vice versa. Unlike the chat item system, this format is not limited to identified gear items and can encode items of any supported type.

Encoding

Encoding is a 2 layer process. The first layer is responsible for translating between byte arrays and encoded values (UTF-16, base64). The second layer is responsible for translating any kind of game data to an array of bytes.

Encoding a byte array to an encoded value

This section describes how the standard encodes integers. This encoding is not to be changed once implemented.
Encoding to UTF-16:

Encoding to base64:

  • A byte array can be encoded into base64 directly, as base64 is designed to represent bytes as text.
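A minimal sketch of the base64 half of the first layer, assuming Python's standard `base64` module. The function names and the example payload (including the version byte value) are illustrative assumptions, not part of the standard.

```python
import base64


def bytes_to_base64(data: bytes) -> str:
    """Encode a raw byte array as base64 text."""
    return base64.b64encode(data).decode("ascii")


def base64_to_bytes(text: str) -> bytes:
    """Decode base64 text back into the raw byte array."""
    return base64.b64decode(text)


# Hypothetical payload: start block (id 0, version byte 1), then the end block (id 255).
payload = bytes([0, 1, 255])
encoded = bytes_to_base64(payload)
assert base64_to_bytes(encoded) == payload
```

The second layer only ever sees `bytes`, so the same payload could be handed to a UTF-16 encoder instead without changing anything else.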

Format

The encoded string format is made up of different kinds of blocks. The order of the blocks is unspecified, with the exception of the start and end blocks, which must be first and last, respectively. Also note that a "type" block must be present, representing the type of the item being encoded. See the format of each block below.

Stability of the format

The format of the encoding itself is not to be changed once implemented. However, the blocks themselves can change their format between versions and blocks may be added and removed. See more information about versioning below.

Block Formats

A block starts with a unique header id in the range 0-255 (256 possible values, 1 byte). The bytes that follow are the block's data. A block's data length is not explicitly encoded, so the data is decoded while reading. See the versions below for the specific block formats and their header ids.

Encoding data

Some blocks use pre-defined ways of encoding data. These are described in this section.

Encoding a string:

Encoding a string is done by encoding the string's ASCII representation into bytes, and terminating with a null byte.
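As a rough sketch of the string rule above (ASCII bytes plus a terminating null byte), something like the following would work. The function names are made up for illustration, and rejecting embedded null characters is an assumption about how an implementation would want to behave.

```python
def encode_string(text: str) -> bytes:
    """Return the ASCII bytes of text followed by a terminating null byte."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if 0 in raw:
        raise ValueError("string must not contain a null character")
    return raw + b"\x00"


def decode_string(data: bytes, offset: int = 0) -> tuple[str, int]:
    """Read a null-terminated string starting at offset.

    Returns the string and the offset just past the terminator, so the
    caller can keep reading the next field from there.
    """
    end = data.index(0, offset)  # raises ValueError if no terminator is found
    return data[offset:end].decode("ascii"), end + 1
```

Returning the next offset matters here because block lengths are not stored, so each reader has to report where it stopped.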

Encoding a variable sized integer:

A variable sized integer is encoded the following way: each byte stores 7 bits of data. If the remaining data fits in 7 bits, the highest bit of the byte is set to 0. If it does not, the highest bit is set to 1 and the process continues with the next byte. Repeat until all of the data is encoded. Zigzag encoding is used to handle negative values.
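A sketch of the variable sized integer rule, under two assumptions the text does not spell out: the least significant 7-bit group is written first, and zigzag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... (the usual definition). Function names are illustrative.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Zigzag the value, then emit 7 bits per byte, low group first."""
    z = zigzag(value)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # highest bit 1: more bytes follow
        else:
            out.append(low7)  # highest bit 0: last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return the decoded value and the offset just past it."""
    z = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return unzigzag(z), offset
```

With these assumptions, small magnitudes of either sign stay in one byte: -64 through 63 all fit, and 64 is the first value that needs two.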

Versions

Versions are only incremented if there is a breaking change in the format of 1 or more blocks, and/or if 1 or more blocks are added or removed.

Block Formats - Version 1.0

Start block

ID: 0
Integer size: 8 bits
Data: A single byte, encoding the version of the data.

End block

ID: 255

Type block

ID: 1
Integer size: 8 bits
Data: A single byte, representing the type of item being encoded. This character works as a mapping key, each type has an id character representing it.
Description: Each type represents a single type of item. In most cases these ids are separated in the same way as Artemis' item classes, however exceptions might be allowed in cases where it is logical.

Key to Type Mapping Table
Key | Type | Required blocks | Optional blocks
0 | Gear Item | Name | Identifications, Powders, Shiny, Reroll
1 | Tome Item | Name | Identifications
2 | Charm Item | Name | Identifications
3 | Crafted Gear Item | Custom Gear Type, Durability, Requirements | Name, Damage, Defense, Custom Identifications, Powders
4 | Crafted Consumable Item | Custom Consumable Type, Uses, Requirements | Effects, Name, Custom Identifications
5 | Crafted Item from Recipe | TODO | TODO

Name block

ID: 2
Data: The name of the item is encoded as an encoded string.
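To show how the start, type, name and end blocks fit together, here is a small sketch that builds the byte array for a gear item carrying only its name. The block ids and the gear type key come from the definitions above; the function name, the item name and the version byte value are placeholders.

```python
START_ID = 0    # start block, followed by one version byte
TYPE_ID = 1     # type block, followed by one type key byte
NAME_ID = 2     # name block, followed by a null-terminated ASCII string
END_ID = 255    # end block, no data

GEAR_ITEM_TYPE = 0  # "Gear Item" key from the type mapping table


def encode_named_gear(name: str, version: int) -> bytes:
    """Build start | type | name | end for a gear item with only a name."""
    out = bytearray()
    out += bytes([START_ID, version])
    out += bytes([TYPE_ID, GEAR_ITEM_TYPE])
    out += bytes([NAME_ID]) + name.encode("ascii") + b"\x00"
    out.append(END_ID)
    return bytes(out)


# Placeholder item name and version byte, purely for illustration.
blob = encode_named_gear("Example", version=0)
assert blob[0] == START_ID and blob[-1] == END_ID
```

Only the start and end positions are fixed; the type and name blocks could appear in either order and a decoder would still have to accept them.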

Identifications block

ID: 3
Data: The first byte contains the number of non-pre-identified ids (referred to as N) for the item. The second byte contains the type of the identification info in the following blocks, this is the identification type flag. The following bytes contain all identification info. The size of a single identification info depends on the identification type flag.

Identification Type Flag

The purpose of this flag is to make ID encoding fit multiple purposes and to give clients and users control over the stability and size of the encoded data. Choosing the longer, extended id encoding allows clients to decode the shared data without any external sources, such as APIs, and makes the encoded data "stable", even if the current item specifications change. The shorter, normal id encoding is preferred where the data only needs to remain valid for a short period and a compact encoding matters, such as in-game chat.

Normal encoding

The byte-flag of this encoding is 0.

  • Pre-identified stats: Pre-identified stats are not encoded. Injecting them back is an implementation detail for the client.
  • Normal stats: Each identification takes 2 bytes to encode. The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The second byte is the calculated internal roll of the item.

Extended encoding

The byte-flag of this encoding is 1. The next byte is the number of pre-identified stats.

  • Pre-identified stats: The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The following bytes are the encoded variable sized integer of the base value. Internal roll is not sent, as it does not make sense for pre-identified stats.
  • Normal stats: The first byte is the numerical key of the ID (from a single, open-source, shared, mutually agreed upon source). The following bytes are the encoded variable sized integer of the base value. The last byte is the calculated internal roll of the item.
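For the normal encoding (flag 0), the layout works out to: block id, count N, flag, then N key/roll pairs. A sketch under that reading follows; the function name and the numeric keys in the example are invented, since the real keys come from the shared id map.

```python
IDENTIFICATIONS_ID = 3
NORMAL_ENCODING_FLAG = 0


def encode_identifications_normal(rolls: dict[int, int]) -> bytes:
    """Encode {numeric id key: internal roll} using the normal encoding.

    Pre-identified stats are simply left out, as described above.
    """
    out = bytearray([IDENTIFICATIONS_ID, len(rolls), NORMAL_ENCODING_FLAG])
    for key, roll in rolls.items():
        out += bytes([key, roll])  # each value must fit in one byte (0-255)
    return bytes(out)


# Hypothetical keys 5 and 12 with internal rolls 100 and 64.
block = encode_identifications_normal({5: 100, 12: 64})
assert len(block) == 3 + 2 * 2
```

Each extra identification costs exactly two bytes, which is the size argument for using this flag in chat.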

Powder block

ID: 4
Data: The first byte is the number of powder slots on the item. The next byte is the number of bytes. The following bytes are a binary blob, padded with 0 bits to the nearest byte. A powder is encoded in 5 bits, with the following math: element * 6 + tier. The elements follow an ETWFA order. Five 0 bits represent that no powder is present at the slot.
If the block is absent, all powder slots are assumed to be unpowdered.
If it is present, but its length does not match the number of powder slots of the item, the rest of the slots are assumed to be unpowdered.
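The 5-bit packing could be sketched like this. It assumes tiers run from 1 to 6 (so that the value 0 stays free for "no powder"), that the first powder occupies the most significant bits, and that the blob is padded with 0 bits. None of these details are confirmed by the text, and the function names are illustrative.

```python
ELEMENTS = "ETWFA"  # Earth, Thunder, Water, Fire, Air


def powder_value(element: str, tier: int) -> int:
    """element * 6 + tier, with element as its index in ETWFA order."""
    if not 1 <= tier <= 6:
        raise ValueError("tier is assumed to be in the range 1-6")
    return ELEMENTS.index(element) * 6 + tier


def pack_powders(values: list[int]) -> bytes:
    """Pack 5-bit powder values back to back, zero-padded to a whole byte."""
    bits = 0
    for value in values:
        bits = (bits << 5) | (value & 0x1F)
    nbits = 5 * len(values)
    pad = (-nbits) % 8  # number of 0 bits needed to reach a byte boundary
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")


# Earth tier 1 followed by Air tier 6: 00001 11110, then six padding bits.
assert pack_powders([powder_value("E", 1), powder_value("A", 6)]) == b"\x0f\x80"
```

Zero padding has the side effect that any leftover 5-bit group reads as "no powder", which lines up with treating missing slots as unpowdered.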

Rerolls block

ID: 5
Data: A single byte encoding the number of rerolls.

Shiny block

ID: 6
Data: The first byte is the id of the shiny stat (from a single, open-source, shared, mutually agreed upon source). ID 0 is reserved for "Unknown". The next bytes are the encoded variable sized integer of the shiny value.

Custom Gear Type block

ID: 7
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.

Gear Type map
ID Type
0 Spear
1 Wand
2 Dagger
3 Bow
4 Relik
5 Ring
6 Bracelet
7 Necklace
8 Helmet
9 Chestplate
10 Leggings
11 Boots
12 Weapon*
13 Accessory*
* Fallback types

Durability block

ID: 8
Data: The first byte is the overall effectiveness of the identifications (the percentage next to the name for crafted items). The next bytes are the encoded variable sized integer of the maximum value. The next bytes are the encoded variable sized integer of the current value.

Requirements block

ID: 9
Data: The first byte is the level requirement. The second byte is the class requirement, represented with an id. The next byte is the number of skill requirements. A skill requirement is encoded as an id byte representing the skill (ETFWA order), followed by the encoded variable sized integer of the requirement value.

Class Requirement map
ID Type
0 None
1 Mage
2 Archer
3 Warrior
4 Assassin
5 Shaman

Damage block

ID: 10
Data: The first byte is the id of the attack speed of the item. The next byte is the number of attack damages present on the item. An attack damage is encoded the following way: The first byte is the id of the skill (ETFWAN, where N represents Neutral). The next bytes are the encoded variable sized integer of the minimum damage. The next bytes are the encoded variable sized integer of the maximum damage.

Attack Speed map
ID Type
0 Super Slow
1 Very Slow
2 Slow
3 Normal
4 Fast
5 Very Fast
6 Super Fast

Defense block

ID: 11
Data: The first bytes are the encoded variable sized integer of the health value. The next byte is the number of defense stats present on the item. A defense stat is encoded the following way: The first byte is the id of the skill (ETFWA). The next bytes are the encoded variable sized integer of the defense value.

Custom Identifications block

ID: 12
Data: The first byte is the number of identifications. The identifications are encoded the following way: The first byte is the id of the identification. The next bytes are the encoded variable sized integer of the max value. For crafted items, the max values can be used to calculate the minimum values (10% of the maximum, rounded) and the current values (from the overall effectiveness).

Custom Consumable Type block

ID: 13
Data: The data is a single byte, containing the id of the type of the item. See the ID map below.

Consumable Type map
ID Type
0 Potion
1 Food
2 Scroll
3 Consumable*
* Fallback types

Uses block

ID: 14
Data: The first byte is the remaining uses for the item. The second byte is the maximum uses for the item.

Effects block

ID: 15
Data: The first byte is the number of effects. An effect is encoded the following way: The first byte is the id of the effect. The next bytes are the encoded variable sized integer of the effect's value.

Consumable Effect map
ID Type
0 Heal
1 Mana
2 Duration

Referenced data files

Shiny Stat Table

https://github.com/Wynntils/Static-Storage/blob/main/Data-Storage/shiny_stats.json

Identification ID-map Table

https://github.com/Wynntils/Static-Storage/blob/main/Reference/id_keys.json

@kristofbolyai kristofbolyai self-assigned this Nov 11, 2023
@kristofbolyai kristofbolyai pinned this issue Nov 11, 2023
@kristofbolyai
Collaborator Author

@magicus The gear item encoding sketch is complete, perhaps it would be the time for you to look at this? I know you had some stronger opinions on this before..

This is just a draft until the V3 item API is migrated (waiting on Wynn). This gives us time to discuss even major changes, if you don't agree with some parts.

As for crafted encoding, I don't plan to work on that until the item encoding itself is fully complete.

@magicus
Member

magicus commented Nov 15, 2023

Ok... One thing that was a problem for us before was the order of stats. In legacy, they had a (somewhat arbitrary) way of ordering stat types, and then stats were sent as an ordered list of the values for the stats. This meant that both parties had to agree to the order. I'm not sure how you suggest to solve this. In fact, I don't see that you even address this..?

We can keep sending the stat values in order, but then we have to specify it very clearly. Or we can send stats as key-value pairs, so we give each stat kind a numeric id, and then basically, if we have Dexterity +4, we send 0x31:0x04, if 0x31 were the code for Dexterity. Or whatever. This will essentially double the amount of data that needs to be transferred, though, so will cause more issues for vanilla players.

@magicus
Member

magicus commented Nov 15, 2023

Hm, I was thinking, can we maybe inject some control characters to make it appear less bad for Vanilla players? In the "good old days", you'd have stored a ^H (backspace) as every other character, that way all but one of the "unknown squares" characters would have been overwritten. I don't know if that trick is possible in Minecraft chat, but it's worth exploring.

Or, maybe there is some new fancy Unicode stuff we can use. I'm pretty certain there are a lot of control codes meaning "combine the following letters".

If we can minimize the visual impact on Vanilla players, I see no real need to keep the string to an absolute minimum. Then it would be better to encode things in a way that is more self-describing and thus stable.

@hppeng-wynn

hppeng-wynn commented Nov 15, 2023

Byte Based Encoding Proposal:

Basically I was thinking it would be nice if we could work in a more standard format.
This would then be converted to/from unicode (base-4096) and/or base64 (wynnbuilder's "native format") easily using a bit of boilerplate.

| represents concatenation. size of data entries is written in square brackets [].

An encoded thing is of the following form, a list of blocks:

version[1 byte] | itemtype[1 byte] | header | block | block | ... | END

(hopefully we don't need more than 256 versions...)

END is the literal 255, which means each itemtype is allowed 255 legal block types.
Technically the END block is optional if you're just encoding one item, but it's useful for other applications (ex. wynnbuilder full build encoding).

itemtype can be one of the following:

0	item with optional rolls
1	crafted item description (from ingredients)
2	complete item description

Each block has the following format:

type[1 byte] | data[0+ bytes]

where the meaning of the type block depends on the itemtype of the entire encoded object.

The header specification may also differ between itemtypes.
Basically it represents the "required fields" for that kind of item.

Item with optional rolls

The header for this is two bytes:

item_id[2 bytes]

item_id is the numeric ID of the item.

NOTE: we need to decide on a standard item ID map. We also need to decide on a canonical ordering of the stats for any item.

Available block types:

0	rolls buffer
1	stars buffer
8	powder buffer
9	shiny data
10	rerolls
32	wynn api version

rolls buffer

A byte array, where every byte corresponds to a roll value (30-130).
The ordering of item rolls is dependent on an (external) canonical ordering, TBD.
Only the rolls that are actually on this item are stored.
NOTE: Intentionally not storing 0-100 rolls, because in the past wynn has changed this in a janky manner.

If it is absent, max rolls for every stat are assumed (or base rolls, for fixed stat items).
If it is present, its length must match the base item's number of stats.

stars buffer

A packed byte array, where each byte is formatted as follows:

star[2 bits] | star[2 bits] | star[2 bits] | star[2 bits]

where star is a number from 0-3 indicating the number of stars.
The array is right padded with zeros to align with the byte boundary.

If it is absent, stars are computed from the rolls buffer.
If it is present, its length must match the base item's number of stats.
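A possible sketch of the stars buffer packing, assuming the leftmost star in the layout above sits in the most significant bits of each byte. The function names are illustrative.

```python
def pack_stars(stars: list[int]) -> bytes:
    """Pack star counts (0-3) four to a byte, right padded with zeros."""
    if any(s not in range(4) for s in stars):
        raise ValueError("each star count must be 0-3")
    out = bytearray()
    for i in range(0, len(stars), 4):
        group = stars[i:i + 4]
        group += [0] * (4 - len(group))  # pad the last group to 4 entries
        out.append(group[0] << 6 | group[1] << 4 | group[2] << 2 | group[3])
    return bytes(out)


def unpack_stars(data: bytes, count: int) -> list[int]:
    """Unpack count star values; count comes from the base item's stats."""
    stars = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            stars.append((byte >> shift) & 0b11)
    return stars[:count]


assert unpack_stars(pack_stars([3, 0, 1, 2, 1]), 5) == [3, 0, 1, 2, 1]
```

The decoder needs the stat count from the base item, because padding zeros are indistinguishable from real zero-star entries.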

powder buffer

A binary blob, padded out to the nearest byte.
Every 5 bits corresponds to a powder, via the following algorithm:

# Decode a powder number.
# Accepts a number from 0 to 31.
# 0 is the special value for "No Powder".
# 31 is invalid.
# 1-30 represent the 30 powders in Wynncraft.
NULL_POWDER = None
INVALID_POWDER = "invalid"

def decode(powder_num):
	if powder_num == 0:
		return NULL_POWDER
	if powder_num == 31:
		return INVALID_POWDER

	# Shift 1-30 down to 0-29 so the division below lines up.
	powder_num -= 1

	# Element order: ETWFA (index 0-4), tier 1-6
	element = powder_num // 6
	tier = (powder_num % 6) + 1

	# Stand-in for Powder(element, tier)
	return (element, tier)

If it is absent, all powders are assumed to be NULL_POWDER.
If it is present, its length must match the base item's number of powder slots.

shiny data

Of the following format:

type[1 byte] | counter[8 bytes]

counter is a single unsigned 64-bit number indicating the value of the shiny data.

type selects from a table of possible shiny entries:

Table TODO: hpp does not know what shiny stats are like.

If it is absent, we assume there is no shiny data.

rerolls

Of the following format:

rerolls[1 byte]

If it is absent, we assume 0 rerolls.

wynn api version

Of the following format:

wynn_version[2 bytes]

(can we bet on there being fewer than 65,536 api updates? I hope so...)
If it is absent, we assume the latest version of the wynn api.

Crafted item (from ingredients)

The header for this is fifteen (15) bytes:

ing1[2 bytes] | ing2 | ing3 | ing4 | ing5 | ing6 | recipe[2 bytes] | meta [1 byte]

corresponding to ingredients

1 2
3 4
5 6

recipe is a craft recipe ID (from a standard list) that contains information about the craft level, type, and base stats (hp, damage, number of charges).

meta breaks down as follows:

tier1[2 bits] | tier2[2 bits] | unused[1 bit] | atkspd[3 bits]

where tier1 and tier2 are the tiers for material 1 and material 2 (from the recipe), and atkspd is the attack speed (for weapons).
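The meta byte could be packed and unpacked roughly as follows, assuming the leftmost field in the layout occupies the most significant bits. The function names are made up for illustration.

```python
def pack_craft_meta(tier1: int, tier2: int, atkspd: int) -> int:
    """tier1[2 bits] | tier2[2 bits] | unused[1 bit] | atkspd[3 bits]"""
    if not (0 <= tier1 <= 3 and 0 <= tier2 <= 3 and 0 <= atkspd <= 7):
        raise ValueError("field out of range")
    return (tier1 << 6) | (tier2 << 4) | atkspd  # the unused bit stays 0


def unpack_craft_meta(meta: int) -> tuple[int, int, int]:
    """Return (tier1, tier2, atkspd) from a meta byte."""
    return (meta >> 6) & 0b11, (meta >> 4) & 0b11, meta & 0b111


# Material tiers 3 and 2, attack speed 6 (SUPER_FAST in the lookup table).
assert pack_craft_meta(3, 2, 6) == 0xE6
assert unpack_craft_meta(0xE6) == (3, 2, 6)
```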

Available block types:

8	powder buffer
11	item name
12	item lore
32	wynn api version

item name

A null-terminated string. (any number of nonzero bytes, followed by a zero byte to mark the end of the string.)

item lore

Same format as item name.

Complete item description

The header for this is three bytes:

num_ids[1 byte] | meta[2 bytes]

meta is defined as follows:

num_powder_slots[6 bits] | unused[1 bit] | item_type[5 bits] | rarity[4 bits]

item_type table:

0	helmet
1	chestplate
2	leggings
3	boots
4	ring
5	bracelet
6	necklace
7	wand
8	spear
9	bow
10	dagger
11	relik
12	potion
13	scroll
14	food
15	weaponTome
16	armorTome
17	guildTome

rarity table:

0	Normal
1	Unique
2	Rare
3	Legendary
4	Fabled
5	Mythic
6	Set
7	Crafted

Available block types:

1	stars buffer
2	stat ID buffer
3	stat length buffer
4	max stats buffer
5	min stats buffer
8	powder buffer
9	shiny data
10	rerolls
11	item name
12	item lore
13	item description
14	item texture
15	current durability

stars buffer, powder buffer, shiny data and rerolls are identical to parsing an Item with optional rolls.
item name and item lore are identical to parsing a Crafted item (from ingredients).

stat ID buffer

A list of the stat IDs for each stat on this item.
Should contain no duplicates, or else the behavior is undefined.
TODO: need an agreed-upon list. Maybe wynn API if it's stable?

NOTE: this is not optional!

stat length buffer

A list of the length (in bytes) for encoding each stat.
The min and max stats will each be encoded using the same length.
Lengths are packed (4 bits each), and the result is right padded with zero if needed.
(The actual length is one more than the value in the array; since length 0 will never be used. This allows stats to be up to 16 bytes long.)
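One way to read the stat length buffer, sketched here: each length from 1 to 16 is stored as `length - 1` in 4 bits, two per byte, with the first entry in the high nibble (the nibble order is an assumption). Function names are illustrative.

```python
def pack_lengths(lengths: list[int]) -> bytes:
    """Store each length (1-16 bytes) as length - 1 in a 4-bit nibble."""
    if any(not 1 <= n <= 16 for n in lengths):
        raise ValueError("lengths must be in the range 1-16")
    nibbles = [n - 1 for n in lengths]
    if len(nibbles) % 2:
        nibbles.append(0)  # right pad with zero to fill the last byte
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_lengths(data: bytes, count: int) -> list[int]:
    """Recover count lengths; count is num_ids from the header."""
    lengths = []
    for byte in data:
        lengths.append((byte >> 4) + 1)
        lengths.append((byte & 0x0F) + 1)
    return lengths[:count]


assert unpack_lengths(pack_lengths([1, 2, 16]), 3) == [1, 2, 16]
```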

NOTE: this is not optional!

max stats buffer

A binary blob containing data about the max stats for this item.

The order is given by the 'stat ID buffer', but each entry can have variable size in bytes.
The size is given by the 'stat length buffer'.

Unlike normal buffers, this buffer stores numbers using two's complement.
This allows negative numbers to be stored as well!
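For the two's complement entries, Python's `int.to_bytes` and `int.from_bytes` with `signed=True` already do the work. The sketch below assumes big-endian byte order, which the proposal does not specify, and the helper names are made up.

```python
def min_length(value: int) -> int:
    """Smallest byte count that holds value in two's complement."""
    length = 1
    while not -(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1)):
        length += 1
    return length


def encode_stat(value: int, length: int) -> bytes:
    """Two's complement, big-endian (byte order is an assumption)."""
    return value.to_bytes(length, "big", signed=True)


def decode_stat(data: bytes) -> int:
    """Inverse of encode_stat for a slice of the right length."""
    return int.from_bytes(data, "big", signed=True)


assert encode_stat(-5, min_length(-5)) == b"\xfb"
assert decode_stat(b"\xfb") == -5
```

Since the min and max values of a stat share one length, an encoder would take the larger of the two `min_length` results for the stat length buffer.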

NOTE: level, attackspeed, hp, max durability/charges, stat req, are all counted as "stats".
So they can go here.

attackspeed lookup table:

0	SUPER_SLOW
1	VERY_SLOW
2	SLOW
3	NORMAL
4	FAST
5	VERY_FAST
6	SUPER_FAST

NOTE: this is not optional!

min stats buffer

Same format as max stats, but it's optional.
If left out then max stats are used and the item is assumed to be a fixed ID item.

item description

Extra string field to accommodate things like event items.

item texture

TODO

current durability

Single byte, storing the current durability/charges of this item.

@RawFish69

magicus made excellent points, I also like the idea of sending key-value pair for stats, and if a stat order list is still required, I think it'd be feasible to use the order from v3 item API, or the order of individual item identifications. (Assume they don't change it)
The byte based encoding hpp described above is nice, especially for the crafted-item encoding, it's a straightforward way to do it.

@kristofbolyai
Collaborator Author

Identification hash-check: Since the encoded values depend on other factors, mainly the API's identifications list and base values, we can't be sure that the sender and the receiver have the same understanding of the item. In a perfect world, we could send the identification names along with the base values in the encoded message. However, doing this would considerably increase the encoded message's length, making this option impractical.

For this reason, an identification hash character is included in the encoded data. Hash collisions are highly likely to happen in the real world, but this should still give clients a way to catch issues in most cases.

Doing this hash-check on the receiver side is an optional implementation detail, however all senders must include this data.

This should answer the why's and why nots of doing key-value pairs. As for having both parties agree, that is what the hash is used for. It is basically an error-checking system, but not an error-correcting one.

@hppeng-wynn

+1 to the k-v pairs (essentially what my idea has; but like "unrolled" into a few separate buffers. I think this makes it easier to make entries of the mapping optional, or add new entries (if for whatever reason that is needed)).

@kristofbolyai
Collaborator Author

Identification hash-check: Since the encoded values depend on other factors, mainly the API's identifications list and base values, we can't be sure that the sender and the receiver have the same understanding of the item. In a perfect world, we could send the identification names along with the base values in the encoded message. However, doing this would considerably increase the encoded message's length, making this option impractical.
For this reason, an identification hash character is included in the encoded data. Hash collisions are highly likely to happen in the real world, but this should still give clients a way to catch issues in most cases.
Doing this hash-check on the receiver side is an optional implementation detail, however all senders must include this data.

This should answer the why's and why nots of doing key-value pairs. As for having both parties agree, that is what the hash is used for. It is basically an error-checking system, but not an error-correcting one.

To be clear, when encoding for 3rd parties, I am more than happy to include ID keys. But for chat encoding itself, I do think it would be too long and/or too redundant.

@hppeng-wynn

To be clear, when encoding for 3rd parties, I am more than happy to include ID keys. But for chat encoding itself, I do think it would be too long and/or too redundant.

wynnbuilder internally uses implicit order too (defined in an external data file). I think it would be best if we could rely on implicit order as much as possible and use a stable "item ID lookup table" and "stat ID lookup table" to define the ordering.

@kristofbolyai
Collaborator Author

kristofbolyai commented Nov 15, 2023

The version I see being implemented may just be a third version, combining the good aspects of both proposals.

To come to an agreement, in a timely manner, there is a really simple first step to take: Agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format.

I also think that we should first focus only on encoding gear items. This is the easiest case, and gives us valuable info, before working on the custom, and much more complex items, like crafted and "unique" items.

What I like from your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar to it. As for the "common building block", it could be written in base 16, as a unicode character is basically 4 hex digits. However, thinking in hex is much harder than thinking in bytes. Since 16^4 is exactly 2^16, we could make our "common building block" 2 bytes. That would give us a really straightforward way of encoding to both base64 and Unicode. (And it would also allow Wynnbuilder to decode/convert chat items from unicode, as you would have to do almost nothing to extract the data to a byte format).

What do you think @hppeng-wynn @RawFish69?

@kristofbolyai
Collaborator Author

To be clear, when encoding for 3rd parties, I am more than happy to include ID keys. But for chat encoding itself, I do think it would be too long and/or too redundant.

wynnbuilder internally uses implicit order too (defined in an external data file). I think it would be best if we could rely on implicit order as much as possible and use a stable "item ID lookup table" and "stat ID lookup table" to define the ordering.

We do have an implicit internal order too. A "legacy" one is used for chat items, and Artemis has 3 custom orders. Any of those could be used for agreeing on a common order or id-key map.

@hppeng-wynn

hppeng-wynn commented Nov 15, 2023

I chose a single byte because its basically the default "bit of data" across computers in general
2 bytes could also work I guess but there's already a lot of fields that are much smaller than that (most IDs will fit in like 5 bits lol) so I think it would be wasteful

The version I see being implemented may just be a third version, combining the good aspects of both proposals.

To come to an agreement, in a timely manner, there is a really simple first step to take: Agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format.

I also think that we should first focus only on encoding gear items. This is the easiest case, and gives us valuable info, before working on the custom, and much more complex items, like crafted and "unique" items.

What I like from your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar to it. As for the "common building block" it's either should be written in base 16, as unicode encoding is basically 4 hex bits. However, thinking in hex is much harder than bytes. Since 16^4 is exactly 2^16, we could make our "common building block" 2 bytes. That would give us a really straight forward way of encoding to both base64 and Unicode. (And it would also allow Wynnbuilder to decode/convert chat items from unicode, as you would only have to do almost nothing to extract the data to a byte format).

What do you think @hppeng-wynn @RawFish69?

for "just gear items" do you mean like, just the normal rolled items?
i mean thats pretty simple our two encoding proposals are basically identical i guess (though i separated out stars as a separate buffer to make it easier to include as an optional entry for applications that don't need it)

@magicus
Member

magicus commented Nov 15, 2023

@kristofbolyai We seem to talk just past each other. Your hash check is for the stat values ("ids"). I'm talking about the stat types. Your check would help in a situation where Wynn has e.g. nerfed the base value of a certain stat. But it would not help if there is a misunderstanding in the order of stats.

I spent a sh*tload of hours trying to clean up the stat handling from Legacy to Artemis. And the "ordering" of stats was a common pain point. In the end, I had to create a special ordering just to accommodate the old "item chat protocol". And I realized it would be terribly broken for all new stat types that had been introduced since it was created.

So, I am very very skeptical towards any idea of "assumed" ordering. If you choose to go down that route, you will basically need to bump the version number each time Wynn adds a new stat type. If, on the other hand, you choose key-value pairs, and have a way to assign numeric ids to the stats (here you can use whatever order you agree on and just enumerate the stat types from that list), then you are safe for all future. If a client receives a number it does not understand, it can just say: "Unknown Stat: 7".
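To make the key-value idea concrete, here is a small sketch of a decoder that degrades gracefully on unknown keys. The id table is invented for the example (0x31 for Dexterity is the made-up code from earlier in the thread), and values are treated as plain unsigned bytes for simplicity.

```python
# Hypothetical id table; a real one would come from the shared, agreed-upon list.
STAT_NAMES = {0x31: "Dexterity"}


def describe_stats(pairs: bytes) -> list[str]:
    """Decode key/value byte pairs, tolerating keys this client does not know."""
    if len(pairs) % 2:
        raise ValueError("expected an even number of bytes (key/value pairs)")
    lines = []
    for i in range(0, len(pairs), 2):
        key, value = pairs[i], pairs[i + 1]
        if key in STAT_NAMES:
            lines.append(f"{STAT_NAMES[key]} +{value}")
        else:
            lines.append(f"Unknown Stat: {key}")
    return lines


# One known stat (Dexterity +4) and one key an older client has never seen.
assert describe_stats(bytes([0x31, 4, 7, 12])) == ["Dexterity +4", "Unknown Stat: 7"]
```

An older client here still renders every stat it knows correctly, which is the property an assumed ordering cannot give once a new stat type is added.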

@magicus
Member

magicus commented Nov 15, 2023

Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:

  1. it will most certainly lead to longer strings in chat
  2. it is very much unclear what would be gained by anyone of us having a common binary format in the chat...

@kristofbolyai
Collaborator Author

kristofbolyai commented Nov 15, 2023

@magicus hppeng works on/is the creator of Wynnbuilder. This is basically full integration with parts of Wynnbuilder, and an overall format for anyone to encode items in Wynn in the future.

@kristofbolyai
Collaborator Author

kristofbolyai commented Nov 15, 2023

Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:

  1. it will most certainly lead to longer strings in chat

  2. it is very much unclear what would be gained by anyone of us having a common binary format in the chat...

I am working on a concept that would reduce the length of the encoded strings, even with bytes.

@hppeng-wynn

hppeng-wynn commented Nov 15, 2023

Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:

1. it will most certainly lead to longer strings in chat

2. it is very much unclear what would be gained by anyone of us having a common binary format in the chat...

On the contrary binary format encoded in unicode is probably going to be shorter (if you design it without much padding) since there's much less wastage

ex. in @kristofbolyai 's original specification, the rolls + stars are being encoded using 12 bits (1 unicode char, 4096 values); but they really only take up 10 bits (including the 30 offset). So the binary code would be more efficient by nearly 20%

@hppeng-wynn

@kristofbolyai We seem to talk just past each other. Your hash check is for the stat values ("ids"). I'm talking about the stat types. Your check would help in a situation where Wynn has e.g. nerfed the base value of a certain stat. But it would not help if there is a misunderstanding in the order of stats.

I spend a sh*tload of hours trying to clean up the stat handling from Legacy to Artemis. And the "ordering" of stats was a common pain point. In the end, I had to create a special ordering just to accommodate the old "item chat protocol". And I realized it would be terribly broken for all new stat types that had been introduced since it was created.

So, I am very very skeptical towards any idea of "assumed" ordering. If you chose to go down that route, you will basically need to bump the version number each time Wynn adds a new stat type. If, on the other hand, you chose key-value pairs, and have a way to assign numeric ids to the stats (here you can use whatever order you agree on and just enumerate the stat types from that list), then you are safe for all future. If a client receives a number it does not understand, it can just say: "Unknown Stat: 7".

tbf I think adding new stat types is pretty rare and i'd be OK with bumping the version number when that happens
or honestly just make the "stat mapping" append only -- that way older parsers would just ignore the out of bounds stat ID
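A minimal sketch of how key-value pairs and an append-only stat mapping could work together. The stat names, table contents and the `decode_stats` helper are hypothetical and only serve to illustrate the idea; they are not taken from Artemis or Wynnbuilder.

```python
# Hypothetical append-only stat table: the list index is the numeric stat ID.
# The names below are invented for illustration.
OLD_STAT_TABLE = ["health", "mana_regen", "walk_speed"]
NEW_STAT_TABLE = OLD_STAT_TABLE + ["new_stat"]  # new types are only ever appended


def decode_stats(pairs, table):
    """Turn (stat_id, value) pairs into (name, value) pairs.

    IDs outside the table are reported as unknown instead of raising.
    """
    decoded = []
    for stat_id, value in pairs:
        if 0 <= stat_id < len(table):
            name = table[stat_id]
        else:
            name = f"Unknown Stat: {stat_id}"
        decoded.append((name, value))
    return decoded


pairs = [(0, 120), (3, 7)]
print(decode_stats(pairs, NEW_STAT_TABLE))  # an up-to-date parser knows ID 3
print(decode_stats(pairs, OLD_STAT_TABLE))  # an older parser reports it as unknown
```

Because existing IDs never move, an older parser still reads every stat it knows about correctly and only loses the name of the new one.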

@hppeng-wynn

minor fixes to my comment proposal:

  • fix item_type field to be 5 bits (slightly more annoying to parse, but now actually fits all the item types)
  • change the way string parsing is handled (standard null-terminated string, like from C)
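For reference, reading a C-style null-terminated string out of a byte buffer could look roughly like this. The function name, the UTF-8 choice and the example buffer are assumptions made for the sketch; the proposal only says "standard null-terminated string".

```python
def read_c_string(data: bytes, offset: int = 0):
    """Read a null-terminated string starting at `offset`.

    Returns (text, next_offset), where next_offset points just past the 0 byte.
    UTF-8 is assumed here; the proposal does not name a text encoding.
    """
    end = data.index(0, offset)  # raises ValueError if there is no terminator
    return data[offset:end].decode("utf-8"), end + 1


# Hypothetical buffer: a name followed by one more byte of other data.
buf = b"Example Item\x00\x2a"
name, pos = read_c_string(buf)
print(name, pos, buf[pos])  # Example Item 13 42
```

The trade-off compared with a length prefix is that the string itself can never contain a 0 byte.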

@magicus
Member

magicus commented Nov 15, 2023

just make the "stat mapping" append only -- that way older parsers would just ignore the out of bounds stat ID

What do you mean? It sounds like you are talking about values, not types? I am still mostly worried about matching the correct type.

@magicus
Member

magicus commented Nov 15, 2023

Also:

On the contrary, a binary format encoded in unicode is probably going to be shorter (if you design it without much padding), since there's much less wastage

That sounds great! As I said, I am all in favor of standardizing formats. Not sure how it helps us, but as long as it doesn't pose a problem for us, just go for it.

@RawFish69

RawFish69 commented Nov 15, 2023

The version I see being implemented may just be a third version, combining the good aspects of both proposals.

To come to an agreement, in a timely manner, there is a really simple first step to take: Agree on a mutual base class for a "character" / "data block" / "byte", basically the smallest chunk of data we share. Creating this class would give us easy ways of encoding and decoding, in a clear, unit testable and even sharable format.

I also think that we should first focus only on encoding gear items. This is the easiest case, and gives us valuable info, before working on the custom, and much more complex items, like crafted and "unique" items.

What I like from your format is the simplicity of encoding for some parts of the blocks. I would like to use it, or something similar to it. As for the "common building block", it could be written in base 16, as a unicode character is basically 4 hex digits. However, thinking in hex is much harder than thinking in bytes. Since 16^4 is exactly 2^16, we could make our "common building block" 2 bytes. That would give us a really straightforward way of encoding to both base64 and Unicode. (And it would also allow Wynnbuilder to decode/convert chat items from unicode, as you would have to do almost nothing to extract the data to a byte format.)

What do you think @hppeng-wynn @RawFish69?

It looks good. I only do decoding, and it shouldn't matter since the concept @hppeng-wynn proposed is similar enough.
I would rather avoid a common order list that both parties have to agree upon, if possible.
The stat names and list size may vary as the game updates, causing the kind of pain @magicus mentioned above, so it would benefit 3rd party receivers to use alternatives. If there's a reason to keep one, picking an order from the 4 existing ones is also fine, whatever is more convenient in the long run.

@hppeng-wynn

just make the "stat mapping" append only -- that way older parsers would just ignore the out of bounds stat ID

What do you mean? It sounds like you are talking about values, not types? I am still mostly worried about matching the correct type.

The way wynnbuilder has done it, we basically have a big list of stats in an order, and whenever wynn added new stats to the game, those stats always get appended to the end of the list

(talking about the stat mapping, not the stats for any given item.)

that ensures that the old stats are always in the same order, and new stats basically get new IDs.

That's one way to create "implicit backwards compatibility" without much effort. However it comes at a small cost to readability (for example, the damage stats are not all next to each other in this list.)

@kristofbolyai
Collaborator Author

Also, I like the generality behind @hppeng-wynn's proposal, that we can have a common binary format, and then encode that binary string into chat using unicode characters. However:

  1. it will most certainly lead to longer strings in chat
  2. it is very much unclear what would be gained by anyone of us having a common binary format in the chat...

I am working on a concept that would reduce the length of the encoded strings, even with bytes.

Basically I am thinking of using bytes as smallest data chunk, but with a trick to make it really efficient in chat:

Each block type would define (in the standard, not in encoding) their "requested" data size. It would either be 8 bits, 16, 32 or 64. This would work easily with both encoding formats: Unicode characters in the Supplementary Private Use Area-A can encode any value between 0xF0000 and 0xFFFFD (and with some tricks we can encode 0xFFFFE-0xFFFFF too, although I am not exactly sure how at this time). As for the Wynnbuilder base64 encoding, encoding a byte-array is pretty straight forward.
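To make the Unicode side concrete, here is a rough sketch of mapping a byte array onto Supplementary Private Use Area-A code points, two bytes per character. The function names, the big-endian pairing of bytes and the decision to simply reject the values 0xFFFE and 0xFFFF (whose code points are Unicode noncharacters) are all assumptions of this sketch, not something the proposal has settled.

```python
SPUA_A_START = 0xF0000  # first code point of Supplementary Private Use Area-A
# 0xFFFFE and 0xFFFFF are noncharacters, so this sketch rejects the
# 16-bit values 0xFFFE and 0xFFFF instead of inventing a trick for them.
MAX_VALUE = 0xFFFD


def encode_bytes(data: bytes) -> str:
    """Encode an even-length byte array, two bytes per character."""
    if len(data) % 2:
        raise ValueError("this sketch expects an even number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        value = (data[i] << 8) | data[i + 1]  # big-endian pairing (an assumption)
        if value > MAX_VALUE:
            raise ValueError(f"value {value:#x} has no code point in this sketch")
        chars.append(chr(SPUA_A_START + value))
    return "".join(chars)


def decode_string(text: str) -> bytes:
    """Inverse of encode_bytes."""
    out = bytearray()
    for ch in text:
        value = ord(ch) - SPUA_A_START
        if not 0 <= value <= MAX_VALUE:
            raise ValueError("character outside the expected range")
        out += value.to_bytes(2, "big")
    return bytes(out)


encoded = encode_bytes(b"\x84\x21\x05\x7f")
print([hex(ord(c)) for c in encoded])  # ['0xf8421', '0xf057f']
print(decode_string(encoded))
```

With this layout every chat character carries 16 bits, which is where the efficiency claim comes from; the two unencodable values are the open problem mentioned above.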

Blocks would not only define their "integer" size, but also their length, so there would be no need to reserve any characters for block types, and no need to reserve/use a character for separating parts. As for the block header itself, 1 byte would represent the type, and 1 byte would give us the size of the block (divided by the block's data size).
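A sketch of what such a header could look like in practice. The block type numbers, the `WORD_SIZE` table and the helper names are hypothetical; the only things taken from the comment are the two header bytes (type, then length in units of the block's data size).

```python
# Hypothetical mapping of block type -> bytes per value. In the proposal this
# would be fixed by the standard, not transmitted in the encoded data.
WORD_SIZE = {1: 1, 2: 2}


def write_block(block_type: int, payload: bytes) -> bytes:
    """Prefix a payload with a 2-byte header: type, then value count."""
    word = WORD_SIZE[block_type]
    if len(payload) % word:
        raise ValueError("payload is not a whole number of values")
    count = len(payload) // word
    if count > 255:
        raise ValueError("count does not fit in one byte")
    return bytes([block_type, count]) + payload


def read_blocks(data: bytes):
    """Split a byte stream into (type, payload) tuples using the headers."""
    blocks = []
    pos = 0
    while pos < len(data):
        block_type, count = data[pos], data[pos + 1]
        size = count * WORD_SIZE[block_type]
        blocks.append((block_type, data[pos + 2 : pos + 2 + size]))
        pos += 2 + size
    return blocks


stream = write_block(1, b"\x01\x02\x03") + write_block(2, b"\x00\x10\x00\x20")
print(read_blocks(stream))
```

Since the reader always knows how many bytes to skip, no separator character is needed between blocks.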

I think all of us understand the benefit of having variable sized blocks, but let me state an obvious case. If we support 64-bit integers natively in the standard, there is no black magic needed when encoding such values. Also, supporting smaller sizes, like 8 and 16 bits, allows us to efficiently bundle information like identification key-value pairs.

And best of all, the Unicode representation would be close to being the most efficient it can be (practically, not theoretically).

What do you think? If we all agree here, we can go ahead and agree on the standard for encoding normal gear items, and implement that while getting the encoders/decoders written in the process. Once we know encoding/decoding is stable, we can move to working on the "fun" parts.

So, if you agree, please react with an emoji :)

@hppeng-wynn

Each block type would define (in the standard, not in encoding) their "requested" data size. It would either be 8 bits, 16, 32 or 64. This would work easily with both encoding formats: Unicode characters in the Supplementary Private Use Area-A can encode any value between 0xF0000 and 0xFFFFD (and with some tricks we can encode 0xFFFFE-0xFFFFF too, although I am not exactly sure how at this time). As for the Wynnbuilder base64 encoding, encoding a byte-array is pretty straight forward.

maybe I'm confused now. I was thinking of using the space you had reserved for encoded numbers -- is that not good for chat display? if that's the case then this is a much harder problem... why did they give you only 4094 options... technically still doable with like BigInteger or something but that's much much more annoying

I don't understand why the blocks need "preferred data size". fundamentally the byte encoding would be like, just running over unicode character boundaries as follows:
image
there is no need to specify the "external" word size. In fact the byte word size is pretty arbitrary (as mentioned by mahakadema in discord) and honestly a pure binary format might work better. I haven't really measured the inefficiency we incur by using this word size
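The idea of fields simply running across byte (and therefore character) boundaries can be sketched with a small bit packer. The helper names and the example field widths are made up for illustration, and the tail is zero-padded here purely as a choice of this sketch.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs MSB-first into bytes, zero-padding the tail."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8  # bits needed to reach a whole number of bytes
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack_fields(data: bytes, widths):
    """Read back fields of the given bit widths from the start of `data`."""
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    values = []
    pos = 0
    for width in widths:
        shift = total - pos - width
        values.append((acc >> shift) & ((1 << width) - 1))
        pos += width
    return values


# A 5-bit field followed by a 10-bit field: the second one straddles two bytes.
packed = pack_fields([(19, 5), (700, 10)])
print(packed.hex())                    # 9d78
print(unpack_fields(packed, [5, 10]))  # [19, 700]
```

Neither field needs to know about the byte or character size of the transport; only the final padding does.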

@kristofbolyai
Collaborator Author

For those following this proposal, I've updated the issue description to reflect the current state of the format. Many discussions happened outside of GitHub, but hopefully all the changes we've agreed upon are implemented in the format now.

An update to the format is planned, adding 2 other types: custom items (crafted gear, custom normal items) and crafted items as recipes.

@kristofbolyai
Collaborator Author

xxrxxxrxrx

Hm, I was thinking, can we maybe inject some control characters to make it appear less bad for Vanilla players? In the "good old days", you'd have stored a ^H (backspace) as every other character, that way all but one of the "unknown squares" characters would have been overwritten. I don't know if that trick is possible in Minecraft chat, but it's worth exploring.

Or, maybe there is some new fancy Unicode stuff we can use. I'm pretty certain there are a lot of control codes meaning "combine the following letters".

If we can minimize the visual impact on Vanilla players, I see no real need to keep the string to an absolute minimum. Then it would be better to encode things in a way that is more self-describing and thus stable.

I realize I never responded to this suggestion. Vanilla players seeing a lot of unknown characters is a problem, but the main reason to keep the encoding short is so that multiple items can fit into the relatively short maximum chat length (128 chars) Minecraft sets.

@FYWinds

FYWinds commented Dec 17, 2023

Powder block
ID: 4
Data: The data is a binary blob, padded with 1 bits to the nearest multiple of 8 bits. A powder is encoded in 5 bits, with the following math: element * 6 + tier. The elements follow an ETWFA order. Five 0 bits are used to represent that no powder is present at the slot.
If it is absent, all powder slots are assumed to be unpowdered.
If it is present, but its length does not match the number of powder slots of the item, it is assumed that the rest of the slots are unpowdered.

Here, for the Powder block, we are missing information about the number of powders. For a universal standard, where items are not limited to official Wynncraft items (for which the number of slots is known), the only way to determine the end of the powder block is the start of the Rerolls block.
This results in a problem: with the right combination of powders, the byte array will contain a 5 that is part of the powder data rather than the start of the next block. Here is a piece of deliberately fabricated data where the 5 appears:

   W4, W4, W4, W4, T4
-> 16, 16, 16, 16, 10
-> 10000 10000 10000 10000 01010
-> 0b10000100, 0b00100001, 0b00000101, 0b01111111
-> 132, 33, 5, 127
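The steps above can be reproduced with a few lines of code. This sketch follows the quoted rules (ETWFA order, element * 6 + tier, 5 bits per powder, padding with 1 bits); the function names are made up for illustration.

```python
ELEMENTS = "ETWFA"  # element order from the quoted specification


def powder_value(name: str) -> int:
    """Map e.g. 'W4' to element index * 6 + tier."""
    return ELEMENTS.index(name[0]) * 6 + int(name[1:])


def encode_powders(names):
    """Encode powders as 5-bit values, padded to whole bytes with 1 bits."""
    bits = "".join(format(powder_value(n), "05b") for n in names)
    bits += "1" * (-len(bits) % 8)
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


print(encode_powders(["W4", "W4", "W4", "W4", "T4"]))  # [132, 33, 5, 127]
```

The third byte is 5 even though none of the individual powder values is 5, because the 5-bit groups do not line up with byte boundaries.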

@kristofbolyai
Collaborator Author

Powder block

ID: 4

Data: The data is a binary blob, padded with 1 bits to the nearest multiple of 8 bits. A powder is encoded in 5 bits, with the following math: element * 6 + tier. The elements follow an ETWFA order. Five 0 bits are used to represent that no powder is present at the slot.

If it is absent, all powder slots are assumed to be unpowdered.

If it is present, but its length does not match the number of powder slots of the item, it is assumed that the rest of the slots are unpowdered.

Here, for the Powder block, we are missing information about the number of powders. For a universal standard, where items are not limited to official Wynncraft items (for which the number of slots is known), the only way to determine the end of the powder block is the start of the Rerolls block.

This results in a problem: with the right combination of powders, the byte array will contain a 5 that is part of the powder data rather than the start of the next block. Here is a piece of deliberately fabricated data where the 5 appears:

   W4, W4, W4, W4, T4
-> 16, 16, 16, 16, 10
-> 10000 10000 10000 10000 01010
-> 0b10000100, 0b00100001, 0b00000101, 0b01111111
-> 132, 33, 5, 127

I've thought about this being an issue, but I shrugged it off, as I've only written the encoding part so far. A simple solution is to have a "null" byte at the end of the list, or to send the powder count. Both solutions use a single byte. I would lean toward sending the count, which is a common pattern in the standard.

Do you have a better idea perhaps?

@FYWinds

FYWinds commented Dec 17, 2023

Do you have a better idea perhaps?

My first thought was to move the powder block to the end, which would naturally give it a termination, but then I realized this is a terrible solution without any robustness. From an information standpoint, the only way is to send one more byte. I prefer sending the count, because this avoids having to strip the 1 bits used for padding.
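A sketch of the count-prefixed variant discussed here. The layout (one count byte followed by the packed 5-bit values) and the function names are assumptions for illustration; the final layout is whatever the issue description specifies.

```python
def encode_powder_block(values):
    """Encode 5-bit powder values (0 = empty slot) with a leading count byte."""
    if len(values) > 255 or any(not 0 <= v < 32 for v in values):
        raise ValueError("too many powders or value out of 5-bit range")
    bits = "".join(format(v, "05b") for v in values)
    bits += "1" * (-len(bits) % 8)  # pad to a whole byte with 1 bits
    body = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes([len(values)]) + body


def decode_powder_block(data: bytes, offset: int = 0):
    """Return (values, next_offset); the count tells us where the block ends."""
    count = data[offset]
    nbytes = (count * 5 + 7) // 8  # bytes needed for count * 5 bits
    body = data[offset + 1 : offset + 1 + nbytes]
    bits = "".join(format(b, "08b") for b in body)
    values = [int(bits[i * 5:(i + 1) * 5], 2) for i in range(count)]
    return values, offset + 1 + nbytes


block = encode_powder_block([16, 16, 16, 16, 10])
print(list(block))  # [5, 132, 33, 5, 127]
# A trailing byte 5 (e.g. the next block) no longer confuses the decoder.
print(decode_powder_block(block + b"\x05"))  # ([16, 16, 16, 16, 10], 5)
```

With the count known up front, the decoder never has to inspect the padding bits or scan for the next block's ID.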

FYWinds added a commit to FYWinds/WynntilsResolver that referenced this issue Dec 22, 2023
For the new standard, check: Wynntils/Artemis#2246
Not compatible with old standard, please lock the version if necessary.

5 participants