Importing contacts results in duplicates #52

Open
tmo1 opened this issue Aug 31, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@tmo1
Owner

tmo1 commented Aug 31, 2022

The jq command returned 279 contacts, as expected. I tried importing them in an Android emulator, and 836 were imported. All the contacts are imported twice: the first copy is correct (with image and all fields), while the second is completely empty, with only the name/title correct.

Originally posted by @thanasistrisp in #50 (comment)

tmo1 added the bug (Something isn't working) label Aug 31, 2022
@tmo1
Owner Author

tmo1 commented Aug 31, 2022

It's going to be difficult to figure this out without being able to reproduce the problem. Android is supposed to "aggregate" "matching" contacts, but I don't see a definition of "matching" or a precise specification for the "aggregation" procedure.

How many contacts are actually present after import? Assuming you import your 279 exported contacts into an empty contacts list (e.g., in a fresh emulator image) and then turn around and export them, how many are reported as exported?

@thanasistrisp

Exporting again from your app reports that 567 contacts were exported.

@tmo1
Owner Author

tmo1 commented Aug 31, 2022

567 is more than twice 279, so it's not just a neat case of each contact appearing twice.
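Spelled out, an exact doubling would give:

$$2 \times 279 = 558, \qquad 567 - 558 = 9$$

so the re-export holds 9 entries beyond what a clean doubling would produce.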

@thanasistrisp

thanasistrisp commented Aug 31, 2022

In the initial export, your app reported 279 contacts exported; when importing, it reported 836, and exporting again reported 567. 279 is the correct number that should have been imported...

@thanasistrisp

thanasistrisp commented Aug 31, 2022

> 567 is more than twice 279, so it's not just a neat case of each contact appearing twice.

As far as I can tell, contacts generally appear twice, but some may be shown three times in some apps.

@tmo1
Owner Author

tmo1 commented Sep 2, 2022

I may have a solution for this, and I've started implementing it in code, but I can't really test or debug it without a contacts collection that exhibits the problem. Would you be willing to post a redacted version of yours? You can do the following:

  1. Create a smaller collection that still has the problem, using the max-records / max_messages preference setting. (The latest commit changed its name from the latter to the former, and enabled it in non-debug builds.)
  2. Redact any information you consider private / personal / sensitive. The following command (where contacts-nnnn-nn-nn.json is the original file exported by the app, and contacts-redacted.json will be the redacted version) will remove much / most of such information:
jq 'walk(if type=="object" then with_entries(if ((.key | startswith("display_name")) or (.key | startswith("sort_key")) or (.key | startswith("data")) or (.key == "account_name")) then .value |= "REDACTED" else . end) else . end)' contacts-nnnn-nn-nn.json > contacts-redacted.json

You should still go through the redacted version to make sure there's nothing you don't want there, and I can take no responsibility for any sensitive information leaking through.

@1Dragoon

1Dragoon commented May 24, 2024

Hey, I've noticed that I get several duplicates from this; I think this should be a good enough sample. Oftentimes I get as many as four duplicates, and I think this is why:

 grep 'account_type' ./contacts-redacted.json | sort | uniq
"account_type": "com.google",
"account_type": "com.google.android.apps.tachyon",
"account_type": "com.whatsapp",
"account_type": "org.thoughtcrime.securesms",

(removed)
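`sort | uniq` lists which account types exist; `uniq -c` would also count them. To get the counts from the parsed JSON rather than from grep'd lines, here is a minimal Python sketch. It assumes only that the export is valid JSON with `account_type` keys somewhere inside it; the nesting of the export format is deliberately not assumed, and the script and function names are hypothetical.

```python
import json
import sys
from collections import Counter
from typing import Any, Iterator


def iter_key_values(node: Any, key: str) -> Iterator[Any]:
    """Recursively yield every value stored under `key`, at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from iter_key_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_key_values(item, key)


def count_account_types(export: Any) -> Counter:
    """Count how many entries in the export carry each account_type value."""
    return Counter(iter_key_values(export, "account_type"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_account_types.py contacts-redacted.json
    with open(sys.argv[1], encoding="utf-8") as f:
        export = json.load(f)
    for account_type, n in count_account_types(export).most_common():
        print(f"{n:6d}  {account_type}")
```

If most contacts show up under two or more of these account types, that would be consistent with the multiple-accounts explanation above.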

My thought is it might be more useful to sha256sum+truncate each field instead of redacting it, but I think I'd need to write some actual code for that as I don't believe jq can do that.
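The hash-and-truncate idea can be sketched in Python, whose standard library includes SHA-256 in `hashlib`. This is a minimal sketch, not the script posted later in this thread: the field selection in `is_sensitive` is an assumption copied from the jq filter earlier in the thread, and the 12-character truncation and per-run HMAC key are illustrative choices.

```python
import hashlib
import hmac
import json
import secrets
import sys
from typing import Any

# Random per-run key: hashes stay consistent within one output file, but
# cannot be reproduced by someone hashing every possible phone number.
RUN_KEY = secrets.token_bytes(32)


def is_sensitive(key: str) -> bool:
    # Assumed selection, mirroring the jq redaction filter in this thread.
    return key.startswith(("display_name", "sort_key", "data")) or key == "account_name"


def short_hash(value: str, length: int = 12) -> str:
    """Keyed SHA-256 (HMAC) of the value, truncated to `length` hex characters."""
    digest = hmac.new(RUN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize(node: Any) -> Any:
    """Return a copy of `node` with string values of sensitive keys hashed.

    Non-string values under sensitive keys are recursed into or kept as-is."""
    if isinstance(node, dict):
        return {
            k: short_hash(v) if isinstance(v, str) and is_sensitive(k) else pseudonymize(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [pseudonymize(item) for item in node]
    return node


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python hash_fields.py contacts-nnnn-nn-nn.json contacts-hashed.json
    with open(sys.argv[1], encoding="utf-8") as f:
        export = json.load(f)
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        json.dump(pseudonymize(export), f, indent=2, ensure_ascii=False)
```

Equal inputs produce equal hashes, so the same phone number stored under two accounts stays visibly linked. A plain unkeyed SHA-256 would preserve that too, but phone numbers have a small enough keyspace to be recovered by hashing every candidate, hence the random key.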

edit: In fact I'll do something better...

edit 2: Something like this works?
contacts-2024-05-23-chirodacted.json

Some useful stuff:

key: account_name value: Meet -> "dunno_0807"
key: account_name value: Signal -> "dunno_0335"
key: account_name value: WhatsApp -> "dunno_0540"

Script: https://github.com/1Dragoon/chirodactor/

Basically, it finds interesting fields and attempts to normalize them, stores them in an ordered, deduplicated array, then replaces each value with its offset in that array, along with a guess of what type of data it is. Not perfect, but it should be good enough to easily determine which contacts are related to each other.
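The two-pass approach described above (normalize, dedupe into an ordered list, substitute each value's position plus a type guess) can be sketched as follows. This illustrates the idea and is not the code in the linked repository: the key selection, the normalization rules, and the `phone` / `email` / `dunno` guesses are assumptions, chosen only to produce labels in the same `dunno_0807` style shown above.

```python
import json
import re
import sys
from typing import Any

# Hypothetical selection of "interesting" keys (prefix match); the linked
# script's own choice of fields may well differ.
INTERESTING = ("display_name", "account_name", "data1")
PHONE_RE = re.compile(r"\+?[\d\s().\-]+")


def looks_like_phone(value: str) -> bool:
    # Only phone-style characters, and at least 5 digits.
    return PHONE_RE.fullmatch(value.strip()) is not None and sum(c.isdigit() for c in value) >= 5


def normalize(value: str) -> str:
    """Collapse formatting differences so equivalent values compare equal."""
    if looks_like_phone(value):
        plus = "+" if value.strip().startswith("+") else ""
        return plus + "".join(c for c in value if c.isdigit())
    return " ".join(value.split()).casefold()


def guess_kind(value: str) -> str:
    if looks_like_phone(value):
        return "phone"
    if "@" in value:
        return "email"
    return "dunno"


def is_interesting(key: str, value: Any) -> bool:
    return isinstance(value, str) and key.startswith(INTERESTING)


def collect(node: Any, seen: set) -> None:
    """First pass: gather the normalized form of every interesting value."""
    if isinstance(node, dict):
        for k, v in node.items():
            if is_interesting(k, v):
                seen.add(normalize(v))
            else:
                collect(v, seen)
    elif isinstance(node, list):
        for item in node:
            collect(item, seen)


def substitute(node: Any, index: dict) -> Any:
    """Second pass: replace each interesting value with '<kind>_<offset>'."""
    if isinstance(node, dict):
        return {
            k: (f"{guess_kind(v)}_{index[normalize(v)]:04d}"
                if is_interesting(k, v) else substitute(v, index))
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [substitute(item, index) for item in node]
    return node


def pseudonymize_by_index(export: Any) -> Any:
    seen: set = set()
    collect(export, seen)
    # Ordered, deduplicated list of values; a value's position is its label.
    index = {value: i for i, value in enumerate(sorted(seen))}
    return substitute(export, index)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python pseudonymize.py contacts-nnnn-nn-nn.json contacts-pseudonymized.json
    with open(sys.argv[1], encoding="utf-8") as f:
        export = json.load(f)
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        json.dump(pseudonymize_by_index(export), f, indent=2, ensure_ascii=False)
```

Because labels come from normalized values, `+1 (555) 010-0100` and `+15550100100` get the same label, which is what lets related contacts be matched across accounts.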

Projects
None yet
Development

No branches or pull requests

3 participants