Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add message entity Unparser #1060

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

YouKnow-sys
Copy link
Contributor

This PR introduces a new utility for unparsing message entities in Telegram messages. The Unparser struct can take a text string and a list of MessageEntity objects, and generate a formatted string with the appropriate HTML or Markdown tags applied based on the entities.

The unparsing process involves iterating through the text and entities, and writing the appropriate tags to a buffer based on the entity types (bold, italic, code, links, etc.). The TagWriter trait defines the tag strings for different formatting types, and the write_tag and write_char methods handle the actual writing to the buffer.

This utility can be useful for when user want to store the received message in its formatted state, previously user had to store a Vec<MessageEntity> beside message text, you can tell this wasn't the most convenient way.

Unit tests have been added to crates/teloxide/src/utils/unparser/mod.rs to ensure the correct behavior of the unparser.

I wrote Unparser with performance in mind, so using this struct at most will cost you only 2 allocation with no reallocation or deallocation.

Questions:

  • should we re-export the Unparser functionality through some helper function in html and markdown modules in utils? as these modules are more commonly used.
  • should we make the TagWriter trait and Unparser::unparse method public so user can create their own implementation of the writer?

TODO:

  • add more tests to make sure everything work as expected
  • comment the code more

@teloxidebot
Copy link
Collaborator

r? @Hirrolot

(teloxidebot has picked a reviewer for you, use r? to override)

@teloxidebot teloxidebot added C-main crate: teloxide S-waiting-on-review Status: Awaiting review from the assignee labels May 7, 2024
Copy link
Member

@WaffleLapkin WaffleLapkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy with this approach, this PR seems to add a lot of entities with no clear benefit.

What is the reason to specifically want to store html/markdown? What is so inconvenient in storing Vec<MessageEntity>?

The direction I wanted to go with formatting and such was to make it easier to work with message entities (e.g. allowing to encode them, handling the utf8->utf16 offset issue).


/// The [`MessageTextUnparser`] trait provides methods to generate HTML and
/// Markdown representations of the text and captions in a Telegram message.
pub trait MessageTextUnparser {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a trait, if it's only implemented for a single type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? I wanted this helper methods on Message for making using the Unparser easier and more convenient but Message is located in teloxide-core and our unparser is in teloxide in order to add these new methods I had to use a trait...
and other then that I wanted this method to be only available only if user wanted too.

@@ -4,5 +4,6 @@ pub mod command;
pub mod html;
pub mod markdown;
pub(crate) mod shutdown_token;
pub mod unparser;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unparser" is not descriptive, we should come up with a better name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but tbh at the time I couldn't find any better name, so I ended up with unparser for the time being...
I'm open to ideas for better names

}

#[derive(Debug, Clone, Eq, PartialEq)]
enum Tag<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be two enums (start/end, kind)

@YouKnow-sys
Copy link
Contributor Author

I'm not happy with this approach, this PR seems to add a lot of entities with no clear benefit.

What is the reason to specifically want to store html/markdown? What is so inconvenient in storing Vec<MessageEntity>?

The direction I wanted to go with formatting and such was to make it easier to work with message entities (e.g. allowing to encode them, handling the utf8->utf16 offset issue).

the thing is its always easier to store one string then storing one string and one entity list (encoded or not), for example if we store them separately editing one can break another...
and another important thing is Html and Markdown are human readable formats, so user would be able to watch and edit the formatted string if we give the ability to store them as simple html or markdown to them...
and in the end this feature is common between all telegram API bot libraries, you could say there is not a single library that don't have this feature built-in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-main crate: teloxide S-waiting-on-review Status: Awaiting review from the assignee
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants