
Controversial tweet that is deleted and republished anew to erase community notes #146

Open
FrancoisLamotte opened this issue Aug 29, 2023 · 8 comments


@FrancoisLamotte

Is your feature request related to a problem? Please describe.

On some controversial subjects, authors who are faced with notes that contradict them simply delete their tweet and republish it afresh.

Even when new notes are added to the republished tweet, they keep doing it.

I'd like to open the discussion to see how this kind of behavior could be handled. Sometimes several notes that provided interesting context have disappeared.

Describe the solution you'd like

I don't have any particular solution, but I'd like to have a discussion to find something simple to implement.

If a tweet is deleted, would it be possible to find the note associated with it?

If a note disappears, could its author be notified (and still have access to their copy)?

One idea might be to indicate in the new tweet's attributes that the original tweet was deleted (with a screenshot of the original showing that it had notes) if an author of the original note comes back to the new tweet. (I know it's a complicated process.)


Additional context

It's a complex subject so discussion would help gather ideas to sort through.

@sp4rkh

sp4rkh commented Aug 29, 2023

Hello, I'm new to this project so what I will say may not be correct.

It looks like the note is linked to the tweet by its ID, so it would be possible to relink a note by adding a field that holds the previous tweet ID: search the "note database" for that ID and, if found, link the note to the new tweet ID.
That way you don't lose the history of the note, and you can add a comment or flag like "previous tweet deleted".
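A minimal sketch of this relinking idea in Python. The `Note` record, its `previous_tweet_ids` history field, and `relink_notes` are hypothetical names for illustration; they are not the actual Community Notes data model.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    # Hypothetical note record; not the real Community Notes schema.
    note_id: str
    tweet_id: str
    text: str
    previous_tweet_ids: list[str] = field(default_factory=list)  # relink history
    flags: set[str] = field(default_factory=set)


def relink_notes(notes: list[Note], previous_tweet_id: str, new_tweet_id: str) -> list[Note]:
    """Move every note on previous_tweet_id to new_tweet_id, keeping the old ID as history."""
    moved = []
    for note in notes:
        if note.tweet_id == previous_tweet_id:
            note.previous_tweet_ids.append(previous_tweet_id)
            note.tweet_id = new_tweet_id
            note.flags.add("previous tweet deleted")
            moved.append(note)
    return moved


# Usage: tweet "111" was deleted and republished as "333".
notes = [Note("n1", "111", "Added context..."), Note("n2", "222", "Unrelated note")]
moved = relink_notes(notes, "111", "333")
```

This sketch does nothing about the cons listed next: the caller still has to supply `previous_tweet_id`, and nothing stops it from pointing at an unrelated tweet.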

The cons of this method are:

  1. You need to remember the previous tweet ID
  2. You could attach the note to another tweet that has no relation to the previous one, possibly even one from another author

To solve the 2nd point, you can save the tweet message and author and compare them with the ones of the new tweet.
I don't know if getting these 2 pieces of information is doable, and it may be illegal to save them in a separate file in some countries; in Europe, for example, you would need to check the GDPR to be sure it doesn't break the law.

Also, I think you will need an algorithm that checks the similarity of the 2 tweets' messages (old and new). If the similarity is 80% or more and the author is the same, the message can be treated as the same and you can link the note; otherwise, the message changed too much and it may be a false link. This helps in case the author slightly changed the tweet's message.
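One way to sketch the author check plus the "80% or more" rule, assuming Python's standard-library `difflib.SequenceMatcher` as the similarity measure (the comment doesn't name an algorithm; any string-similarity metric could stand in). `can_relink` and its parameters are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # the "80% or more" figure from the comment


def text_similarity(old_text: str, new_text: str) -> float:
    """Similarity ratio in [0, 1] between the two texts, ignoring case."""
    return SequenceMatcher(None, old_text.casefold(), new_text.casefold()).ratio()


def can_relink(old_author: str, old_text: str, new_author: str, new_text: str,
               threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Relink only if the author is unchanged and the text is at least `threshold` similar."""
    if old_author != new_author:
        return False
    return text_similarity(old_text, new_text) >= threshold


# A one-character edit by the same author passes; a different author does not.
print(can_relink("alice", "The bridge closed in 2019.", "alice", "The bridge closed in 2019!"))
print(can_relink("alice", "The bridge closed in 2019.", "bob", "The bridge closed in 2019."))
```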

Like I said, I don't know if it's doable; it looks like a lot of work and may be too complex.

@sixChar

sixChar commented Sep 18, 2023

How many of these re-published tweets are slightly changed and how many are exactly the same (i.e. copy pasted)?

If mostly the exact same, would storing a hash of the tweet's content work? Then you wouldn't have to keep track of previous tweets and it would be pretty straightforward to look for matching notes.

This would ignore authorship, but that doesn't seem like a problem. If the content is the same, the note should still apply in most cases. When it doesn't, it should get rated down quickly.
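A minimal sketch of the content-hash lookup, assuming SHA-256 from Python's `hashlib` and an in-memory dict as a stand-in for whatever store holds the notes. The light normalization (Unicode NFC, case folding, collapsing whitespace) is an added assumption so that trivially different copies still hash the same; like the proposal, it ignores authorship.

```python
import hashlib
import unicodedata

# Hypothetical in-memory index: content hash -> IDs of notes written on tweets with that content.
notes_by_hash: dict[str, list[str]] = {}


def content_hash(text: str) -> str:
    """Hash tweet text after normalizing Unicode form, case, and whitespace."""
    normalized = unicodedata.normalize("NFC", text).casefold()
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def register_note(tweet_text: str, note_id: str) -> None:
    notes_by_hash.setdefault(content_hash(tweet_text), []).append(note_id)


def find_matching_notes(tweet_text: str) -> list[str]:
    # Return a copy so callers can't mutate the index by accident.
    return list(notes_by_hash.get(content_hash(tweet_text), []))


register_note("The bridge closed in 2019.", "note-1")
print(find_matching_notes("the bridge  closed in 2019."))
```

A republished copy that differs by even one word gets a completely different hash, so this only covers the copy-pasted case.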

I thought that most of the tweets that get noted would link to an article. However, when I looked at a few, that didn't seem to be the case (9 of the 42 I checked linked an article, although I wasn't super careful and a quote tweet may have slipped in), so just checking for the same link wouldn't solve the problem in most cases.

If I count all the forms of media including images, videos, gifs, and links I get 30/42. Maybe storing hashes of the non-text portions of tweets would help in the case where text is changed slightly on reupload?
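The same idea can be applied to the non-text parts of a tweet. In this sketch each attachment (image, video, GIF, or link URL) is assumed to be available as raw bytes, and `shares_media` is a hypothetical helper. An exact SHA-256 only matches byte-identical files; a re-encoded or resized image would need a perceptual hash instead.

```python
import hashlib


def media_hashes(attachments: list[bytes]) -> set[str]:
    """SHA-256 digest of each attachment's raw bytes (link URLs passed as UTF-8 bytes)."""
    return {hashlib.sha256(item).hexdigest() for item in attachments}


def shares_media(old_attachments: list[bytes], new_attachments: list[bytes]) -> bool:
    """True if the two tweets have at least one byte-identical attachment."""
    return not media_hashes(old_attachments).isdisjoint(media_hashes(new_attachments))


# The reposted text may change, but the same linked article still matches.
old = [b"<image bytes>", "https://example.com/article".encode("utf-8")]
new = ["https://example.com/article".encode("utf-8")]
print(shares_media(old, new))
```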

Personally, I think it would be better to store a vector representing the meaning of a tweet, as extracted by something like BERT (ideally something more efficient) or maybe from user interactions. Then you could automatically attach old notes to new tweets if they seem similar, and let user ratings filter out the bad matches. A stored vector like that could also help with recommendations, but I don't know how much the extra storage would cost.
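A sketch of the vector idea: store one vector per noted tweet and attach its notes to a new tweet whose vector is close enough by cosine similarity. The comment suggests a learned embedding such as BERT; to keep this runnable without a model, `embed` here is a plain bag-of-words count vector, a deliberately crude stand-in. The 0.8 threshold and the `stored` list are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a learned sentence embedding (e.g. a BERT-style model):
    # a bag-of-words count vector, used only so this example runs without a model.
    return Counter(text.casefold().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_notes(new_text: str, stored: list[tuple[Counter, list[str]]],
                    threshold: float = 0.8) -> list[str]:
    """Collect note IDs from every stored tweet vector similar enough to the new tweet."""
    vector = embed(new_text)
    matches: list[str] = []
    for old_vector, note_ids in stored:
        if cosine(vector, old_vector) >= threshold:
            matches.extend(note_ids)
    return matches


# Vectors are computed once, when a tweet first receives a note.
stored = [(embed("the bridge closed in 2019 after the flood"), ["note-1"])]
print(candidate_notes("the bridge closed in 2020 after the flood", stored))
```

The threshold only decides which candidate notes are surfaced; as the comment suggests, user ratings would still be what filters out bad matches.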

@elvey

elvey commented Nov 12, 2023

To address the kind of duplicate posts of the title of this issue, X should be adapting certain well-established, mature, free tools developed to, for example, detect unsolicited bulk email. SimHash, MinHash, Vipul's Razor, Pyzor... en.wikipedia.org/wiki/Fuzzy_hashing#Notable_fuzzy_hashing_tools_and_algorithms ...
Perhaps X is already using some of them, but below the radar, to reduce visibility. Either way, they should be deployed and configured to specifically target this kind of devious content, in addition to spam in general. For this kind of content, X should do more than deboost it.
X has tools that cause certain notes to appear on more than one tweet - IIRC, currently they only work when an image is reposted, but should be expanded. They should always appear on near-identical tweets, whether by the same author or not.
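As an illustration of one of the fuzzy-hashing tools named above, here is a minimal SimHash sketch in Python. It uses whitespace-separated words as features and BLAKE2b (from `hashlib`) as the per-feature hash; both choices, and the 3-bit distance cutoff, are assumptions for illustration rather than how X or any of the listed tools is configured. Near-duplicate texts tend to get fingerprints that differ in only a few bits.

```python
import hashlib

BITS = 64


def _feature_hash(token: str) -> int:
    """Deterministic 64-bit hash of one feature (Python's built-in hash() is randomized)."""
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")


def simhash(text: str) -> int:
    """64-bit SimHash: each bit is the sign of the summed +1/-1 votes from all features."""
    votes = [0] * BITS
    for token in text.casefold().split():
        h = _feature_hash(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def near_duplicate(a: str, b: str, max_distance: int = 3) -> bool:
    return hamming_distance(simhash(a), simhash(b)) <= max_distance


print(near_duplicate("Hello World", "hello   world"))
```

With texts as short as tweets, a single changed word carries a large share of the votes, so in practice the features (e.g. character shingles) and the cutoff would need tuning on real data.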

@armchairancap

On some controversial subjects, authors who are faced with notes that contradict them simply delete their tweet and republish it afresh.

I assume this trick makes more sense for accounts that have enough followers. Reposting could be reported as Spam and the account blocked on those grounds, but I'd be careful about such shortcuts as the owner may be simply trying to create a more accurate version of their post.

Now every X account's profile has Highlights, Media, Likes, etc. Could X add a "Community Notes" tab (it's a bit too long, so maybe come up with something shorter such as "Ratings", "Reputation"?) where one could easily check the account's CN score?
That would make it easy to see if that's likely a habit or an exception. If it's sustained (many Community Notes), then reporting the account for Spam would be fine even if the latest post didn't have any Community Notes attached to it.

@armchairancap

What if posts cannot be deleted after they have been Community Noted?

Seriously?

This repo says Community Notes aim to "create a better informed world, by empowering people on X to add helpful notes to posts that might be misleading". If a noted post is gone, it's no longer misleading.

X isn't supposed to build a Stasi-like dossier of each user's bad posts. If they feel bad about it and remove it, that should be enough.

Then there are various implementation challenges: people may not want to delete their post, but if they're not a Premium member, the only way to fix a misspelling is to delete and repost. But these aren't the main reason why this shouldn't happen, IMO.

In my opinion, it's more than enough to show in each user's stats how many posts they've deleted, and how many of those had Community Notes attached.

@elvey

elvey commented Dec 5, 2023

I think the solution is that when the note associated with a tweet becomes public, it should be shown next to all tweets that contain text or graphics that closely match the tweet, or at least all tweets that have been viewed in the last n weeks. I'm not sure when the comparison should be done, but that's an implementation detail; I presume tweets are already run through similarity hashes like the ones I mentioned above 3 weeks ago and the results used to identify similar tweets.
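The propagation rule in this comment could be sketched as a filter over recently viewed tweets. Here word-set Jaccard similarity stands in for the similarity hashes mentioned above, and `Tweet`, `tweets_to_annotate`, the 4-week window (the comment's "n weeks"), and the 0.8 cutoff are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tweet:
    # Hypothetical minimal tweet record.
    tweet_id: str
    text: str
    last_viewed: datetime


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity, a stand-in for a real similarity hash."""
    words_a, words_b = set(a.casefold().split()), set(b.casefold().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def tweets_to_annotate(noted_text: str, candidates: list[Tweet], now: datetime,
                       weeks: int = 4, threshold: float = 0.8) -> list[str]:
    """IDs of tweets viewed in the last `weeks` weeks that closely match the noted tweet."""
    cutoff = now - timedelta(weeks=weeks)
    return [t.tweet_id for t in candidates
            if t.last_viewed >= cutoff and jaccard(noted_text, t.text) >= threshold]


now = datetime(2024, 1, 31)
noted = "The bridge closed in 2019."
candidates = [
    Tweet("a", "the bridge closed in 2019.", datetime(2024, 1, 20)),  # recent near-copy
    Tweet("b", "The bridge closed in 2019.", datetime(2023, 10, 1)),  # outside the window
    Tweet("c", "Unrelated text here", datetime(2024, 1, 30)),         # recent, different
]
print(tweets_to_annotate(noted, candidates, now))
```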

@armchairancap

Well, that's a different thing from the original ask, which is to handle evasion of notes by deleting posts that have them and republishing the content as new posts (obviously without a note, at least initially).

If it's republished verbatim, then the hash will be the same. It's simple enough to just count the account's deleted posts that had Community Notes. If an account has 2-3 of those, they're probably not being straightforward; if it has 5, it's probably a disinformation agent/bot/account, and X can suspend it until the owner clears it with Support; and if it's 7 or 10, X could permanently suspend or delete it. The numbers/limits are just examples. There's nothing controversial here.
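The scoring described here amounts to counting an account's deleted posts that had Community Notes and mapping the count to an action. The tiers below reuse the comment's example numbers (2, 5, 7), which it explicitly calls examples; how a count of 4 or 6 should be treated is not stated, so this sketch simply applies the highest tier reached. `Post`, `TIERS`, and `account_action` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical minimal post record.
    post_id: str
    deleted: bool
    had_community_note: bool


# (minimum count, action), highest first; numbers are the comment's examples, not policy.
TIERS = [
    (7, "permanent suspension"),
    (5, "suspend until cleared with Support"),
    (2, "flag: probably not straightforward"),
]


def deleted_noted_count(posts: list[Post]) -> int:
    """Deleted posts that had a Community Note when they were deleted."""
    return sum(1 for p in posts if p.deleted and p.had_community_note)


def account_action(posts: list[Post]) -> str:
    count = deleted_noted_count(posts)
    for minimum, action in TIERS:
        if count >= minimum:
            return action
    return "no action"


history = [Post(str(i), deleted=True, had_community_note=True) for i in range(5)]
print(account_action(history))
```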

If the post is changed, then it's a very different ask and not just an implementation issue.
For example a person can post something that doxxes another person. If it gets CN'd, they may delete that one and repost the photo with pixelated facial features that do not doxx anyone. There's no reason to auto-tag this with the same CN despite any similarity.

@elvey

elvey commented May 4, 2024

@twitterbirdwatch : Thoughts?


6 participants
@FrancoisLamotte @elvey @sp4rkh @sixChar @armchairancap and others