Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compile / decompile emojis #7861

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

compile / decompile emojis #7861

wants to merge 8 commits into from

Conversation

jwunderl
Copy link
Member

@jwunderl jwunderl commented Feb 4, 2021

I like using emojis for examples sometimes but it gets bad when you compile (just some combination of _ and numbers), so this would let allow emojis to persist between compile -> decompile:

shiny shinies

Obviously the generated names aren't great (they're just "E" + emoji's codepoint in base 16 + "X", maybe could use $ to surround instead as that's also valid / uncommonly used anyways) but I don't know it's any worse than just _ / _2 / etc; starting as a draft though as I don't know if we'd want to take it necessarily for this reason

test build with a sample project: https://arcade.makecode.com/app/0610dda95754bed7e6df6ce7485f7d128dd0f4ea-1375337f35#pub:_XLqH792iMbks

Most of the diff is from https://github.com/microsoft/pxt/pull/7861/files#diff-b3b92c2632ab9b3508e5ea8cd2f72f4d5829fbfb8ef35ab61303cc85d61b93aeR30, which I needed to change here as it produces invalid encodings for most emojis (unicode 'astral plane' codepoints) / breaks anything above 0xffff; as far as I can tell html / xml with utf8 encoding should only ever need to encode <>"'& special anyway though so this seems fine / better anyways? e.g. https://www.w3.org/International/questions/qa-escapes#use & https://hsivonen.fi/producing-xml/#noescaping

I messed up switching between branches after my local version of the branch broke and accidentally pulled when I should have checkout'd, so I accidentally merged this into master / reverted it already D:. The commits from when I wrote it are in 8e3181e...bb4281a, and I reverted that in 596e90c, so this pr would... revert that.

edit added support for debugger view as well, and made that show unescaped names (e.g. a variable text list currently shows as text_list in arcade/live, but with this shows without the space escaped):

image

also fixes microsoft/pxt-arcade#2774

@jwunderl jwunderl requested a review from riknoll February 4, 2021 06:13
@@ -15,12 +15,18 @@
<value name="VALUE">
<shadow type="math_number"><field name="NUM">0</field></shadow>
<block type="spritescreate">
<field name="img">img&#38;&#35;96&#59;&#38;&#35;10&#59;0&#38;&#35;10&#59;&#38;&#35;10&#59;&#38;&#35;10&#59;1&#38;&#35;10&#59;2&#38;&#35;10&#59;3&#38;&#35;96&#59;</field>
<field name="img">img`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look right... shouldn't newlines get escaped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that they shouldn't need to be escaped in xml like that re: https://www.w3.org/TR/xml-c14n/#Example-WhitespaceInContent

I believe this example would break on decompiling the image if it weren't being read correctly, right? As the https://arcade.makecode.com/app/0610dda95754bed7e6df6ce7485f7d128dd0f4ea-1375337f35#pub:_Rd0UJmiP0RWq

can always add a few more values to be escaped if we're worried about it, would probably be better to just use https://www.npmjs.com/package/he if we want the full solution though as the current escaping is broken -- e.g. it's causing crashes due to malformed encodings here microsoft/pxt-arcade#2774

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiirc we had bugs around that before. can't remember if it was us or blockly

@riknoll
Copy link
Member

riknoll commented Feb 17, 2021

i wonder if we could do a mapping to the unicode names instead? Might look a bit nicer (but would definitely be more complicated).

@jwunderl
Copy link
Member Author

For using the unicode names, I believe the values should be the same in general -- that is, it's currently encoding U+1f4da as E1f4daX in the name just to make it easier to escape / not run into a case where it tries to swap a users actual intended variable name with an emoji that happens to match. I can make it U1f4da$ or something instead (just want to keep a marker on each side so it's definite which things are part of emoji and which are just letters that are valid hexchars) -- only needs changing like this: db107f0

@riknoll
Copy link
Member

riknoll commented Feb 18, 2021

@jwunderl I meant the name of the emoji. e.g. "grinningFace" and "personInSuitLevitating"

@jwunderl
Copy link
Member Author

jwunderl commented Jul 12, 2022

Okay finally brought this up to date, I'm gonna try and get this in for next arcade release :) here's a test build, everything appears to still be working but I'll poke around a bit more

https://arcade.makecode.com/app/ef36c7fc259238e12d680c626d4af6c6bbf1b771-418c08602e#pub:_Rd0UJmiP0RWq

if we wanted to use the english name for the emoji we'd need to do something like fetch it from https://github.com/unicode-org/cldr which is ... a bit much.

We could also try base-36'ing it, which benefit to that is that it appears to be pretty consistently 3-4 chars per id instead of 4-6 then~

@jwunderl jwunderl marked this pull request as ready for review July 12, 2022 20:55
@jwunderl jwunderl requested a review from riknoll July 12, 2022 20:56
@abchatra
Copy link
Contributor

Is this PR still valid?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emoji break swap from JS back to blocks
3 participants