Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV Conversion #100

Open
tmo1 opened this issue Feb 24, 2023 · 15 comments
Open

CSV Conversion #100

tmo1 opened this issue Feb 24, 2023 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@tmo1
Copy link
Owner

tmo1 commented Feb 24, 2023

Hi. If at all possible, I am looking for help, please. I have a CSV file of old (10-15 years ago, pre-smart phone) SMS messages that I am trying to import to my current SMS app (QKSMS). Each row is as follows (all punctuation as in original):
Row 1 of CSV-
"type","address","body","date"
For the avoidance of doubt, in my file type is 1 or 2 to show whether SMS was received or sent, address is a phone number beginning with either 0 or a +, body is the content of the SMS text, date is Unix time.
So, an example from row 2 downwards is-
2,"00447779877777","No, but I am leaving soon!","1120755000000"

How do I get this in to the correct format (such as, do the quotation marks need removing? or adding to the type?) and convert to a json that the app can read and export back out to my current SMS app? All my attempts have ended in failure. Or should I be using sms-db or something else?
Thanks very much!

Originally posted by @davebeep in #55 (comment)

@tmo1 tmo1 self-assigned this Feb 24, 2023
@tmo1 tmo1 added the enhancement New feature or request label Feb 24, 2023
@tmo1
Copy link
Owner Author

tmo1 commented Feb 26, 2023

This can be done in a single line of Python! Run this script as follows (messages.csv should be your CSV file, and messages.json will contain the SMS I/E compatible JSON):

csv-convert.py < messages.csv > messages.json

@tmo1 tmo1 closed this as completed in 944c6f8 Feb 26, 2023
@davebeep
Copy link

I will give it a go! Thank you

@davebeep
Copy link

davebeep commented Mar 1, 2023

The conversion works but there is obviously something wrong with the formatting or schema in my CSV because when I try to import to SMS I/E I'm getting 'error parsing JSON'.

@tmo1
Copy link
Owner Author

tmo1 commented Mar 1, 2023

Please post some of the (redacted) converted JSON, as per the instructions here.

@tmo1 tmo1 reopened this Mar 1, 2023
@davebeep
Copy link

davebeep commented Mar 4, 2023

Here it is:
[
{
"\u00ef\u00bb\u00bf"""type""": "1,"07777777987","No, but I'll be leaving soon.","1120755000000"",
"address""": null,
"body""": null,
"date"""": null
},
{
"\u00ef\u00bb\u00bf"""type""": "2,"00447787878781","I am not going tomorrow. Wow","1120756200000"",
"address""": null,
"body""": null,
"date"""": null
},
{
"\u00ef\u00bb\u00bf"""type""": "1,"07888888888","This is jess's phone so i only have your number!","1120756500000"",
"address""": null,
"body""": null,
"date"""": null
},
{
"\u00ef\u00bb\u00bf"""type""": "2,"00447777777987","Thanks 4 checking","1120756680000"",
"address""": null,
"body""": null,
"date"""": null
}
]

@tmo1
Copy link
Owner Author

tmo1 commented Mar 5, 2023

Hm. Is your CSV file ASCII, or something else (e.g., UTF encoded Unicode)? If the latter, try saving it as ASCII and seeing if the script produces proper JSON. If it doesn't contain sensitive information, you can post it here and I can take a look at it.

@davebeep
Copy link

davebeep commented Mar 5, 2023

OK, I tried saving in the different format and it seemed promising with no parsing errors. It looks like this:
[
{
""type","address","body","date"": "1,"07777777987","No, but I'll be leaving soon.","1120755000000""
},
{
""type","address","body","date"": "2,"00447787878781","I am not going tomorrow. Wow","1120756200000""
},
{
""type","address","body","date"": "1,"07888888888","This is jess's phone so i only have your number!","1120756500000""
},
{
""type","address","body","date"": "2,"00447777777987","Thanks 4 checking","1120756680000""
}
]
But my SMS app can't read the result. It just shows as 'null' with blank messages. Thanks for all your help and patience!

@tmo1
Copy link
Owner Author

tmo1 commented Mar 6, 2023

At this point, I really need to see your original CSV file. Sensitive data can be redacted, if desired, but I need to see the more or less original version of the file.

@davebeep
Copy link

davebeep commented Mar 6, 2023

At this point, I really need to see your original CSV file. Sensitive data can be redacted, if desired, but I need to see the more or less original version of the file.

This is the test file I used:
test.csv

@tmo1
Copy link
Owner Author

tmo1 commented Mar 6, 2023

Well, there's your problem. In your initial report, you said that your file had lines like these:

"type","address","body","date"
2,"00447779877777","No, but I am leaving soon!","1120755000000"

This is standard CSV, and the conversion script runs correctly on it.

The file you just posted, however, has lines like these, full of extra double quote marks:

"""type"",""address"",""body"",""date"""
"1,""07777777987"",""No, but I'll be leaving soon."",""1120755000000"""

@davebeep
Copy link

davebeep commented Mar 6, 2023

Well, there's your problem. In your initial report, you said that your file had lines like these:

"type","address","body","date"
2,"00447779877777","No, but I am leaving soon!","1120755000000"

This is standard CSV, and the conversion script runs correctly on it.

The file you just posted, however, has lines like these, full of extra double quote marks:

"""type"",""address"",""body"",""date"""
"1,""07777777987"",""No, but I'll be leaving soon."",""1120755000000"""

So if the conversion process is adding those extra quotation marks, I need to remove them from the original CSV file? (Apologies for my slowness)

@tmo1
Copy link
Owner Author

tmo1 commented Mar 6, 2023

So if the conversion process is adding those extra quotation marks, I need to remove them from the original CSV file?

You didn't mention how you converted the file to UTF-8 from the original before feeding it to my CSV-JSON conversion script, but a correct conversion shouldn't be adding extra quotation marks. See here for various conversion tools and methods. I use Linux, so I suppose I'd use iconv. There are also some tools for Windows mentioned in that discussion.

To be clear:

  1. Start with your original, correct CSV file.
  2. Convert it properly to UTF-8 encoding using one of the tools / methods in the thread I linked to.
  3. Feed the converted CSV file, now encoded in UTF-8, to my conversion script.

@davebeep
Copy link

davebeep commented Mar 8, 2023

Hi @tmo1,
I'm really grateful for all your help and wanted to let you know that, with a bit of trial and error, I think I have done it (I'm on Windows and used Notepad++). The only thing that isn't rendered correctly are the letters with diacritics, but I can probably live with that. I tried UTF-8-BOM encoding for this and, although it looks like the diacritics are preserved, once in QKSMS, the characters are corrupted. Thanks for all your hard work with your apps and for taking the time to reply to me.

@tmo1
Copy link
Owner Author

tmo1 commented Mar 9, 2023

I'm glad you got it working! For the diacritics, see if running the conversion script as follows solves the problem:

PYTHONIOENCODING=utf-8 csv-convert.py < messages.csv > messages.json

@davebeep
Copy link

davebeep commented Mar 9, 2023

I'm glad you got it working! For the diacritics, see if running the conversion script as follows solves the problem:

PYTHONIOENCODING=utf-8 csv-convert.py < messages.csv > messages.json

That doesn't seem to work, unfortunately. It says no such file or directory. There is definitely a file in the same directory as when I ran the previous script with the correct name, encoded in UTF-8 as before. But, honestly, I don't expect you to go on answering me forever. It's just me... I'll see if I can find some solution or live with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants