Escape characters in JSON dump in tdb dump #93

Open
wants to merge 1 commit into
base: master
from

knutin
Member

@knutin knutin commented Aug 12, 2016

If a field in a TrailDB contains double quotes ("), tdb dump --json does not escape them with a backslash, so the output is not valid JSON. This patch handles this particular case, but not other characters that need escaping (such as newlines).

I tried to keep the change simple and performant. It only allocates and copies the string if a replacement is needed, which will most likely happen very rarely. If no modification is needed, no allocation or copy is done.
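
As a rough illustration of the allocate-only-when-needed idea described above, here is a minimal sketch. It is not the code from this patch: the name escape_quotes and its signature are hypothetical, and it assumes the value arrives as a pointer plus a length rather than as a NUL-terminated string.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the function from this patch.
 *
 * Returns `src` itself when it contains no double quotes, so the common
 * case costs one scan and no allocation. Otherwise returns a malloc'ed
 * copy in which every '"' is preceded by a backslash. The length of the
 * result is stored in *out_len. Returns NULL if malloc fails.
 *
 * The caller frees the result only when it differs from `src`.
 */
static const char *escape_quotes(const char *src,
                                 uint64_t len,
                                 uint64_t *out_len)
{
    uint64_t i, j, quotes = 0;
    char *dst;

    /* First pass: count the quotes so the copy can be sized exactly. */
    for (i = 0; i < len; i++)
        if (src[i] == '"')
            quotes++;

    *out_len = len + quotes;
    if (!quotes)
        return src;

    if (!(dst = malloc(len + quotes)))
        return NULL;

    /* Second pass: copy, inserting a backslash before each quote. */
    for (i = 0, j = 0; i < len; i++) {
        if (src[i] == '"')
            dst[j++] = '\\';
        dst[j++] = src[i];
    }
    return dst;
}
```

Comparing the returned pointer with the input is one possible answer to the ownership question: free the result only if it is not the original pointer. Like the patch, this sketch leaves backslashes, newlines, and other control characters untouched.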

I'm a bit unsure about the convention around freeing memory so I'm happy to take another pass if there's a better way.

As a side note, there are quite a few ways we could speed up the replacement, and I'd be happy to do another pass to improve performance.

@tuulos
Member

tuulos commented Aug 13, 2016

This is a tricky problem. Currently tdbcli makes no attempt to produce strictly valid CSV or JSON.

Producing valid JSON is especially hard, since the RFC requires that JSON text be encoded in UTF-8 (or UTF-16 or UTF-32). Converting arbitrary binary data to valid Unicode is tedious in C.

One option that I have in mind is to provide an option to urlencode everything. This would guarantee (more) valid JSON and CSV, since urlencoded output is plain ASCII and therefore valid UTF-8, and urlencoding is somewhat human readable.
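
To make the urlencode idea concrete, here is a minimal sketch of percent-encoding an arbitrary byte string. The function name urlencode and its signature are hypothetical, and nothing like it is assumed to exist in tdbcli today. It keeps only the URL "unreserved" characters (letters, digits, '-', '_', '.', '~') and encodes every other byte as %XX.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of tdbcli.
 *
 * Percent-encodes `len` bytes of `src`. The result is a malloc'ed,
 * NUL-terminated, pure-ASCII string that contains no quotes,
 * backslashes, commas, or control characters. Returns NULL if malloc
 * fails. The caller frees the result.
 */
static char *urlencode(const char *src, uint64_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    uint64_t i, j = 0;
    /* Worst case: every byte becomes three characters, plus the NUL. */
    char *dst = malloc(len * 3 + 1);

    if (!dst)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];

        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~') {
            dst[j++] = (char)c;
        } else {
            dst[j++] = '%';
            dst[j++] = hex[c >> 4];
            dst[j++] = hex[c & 15];
        }
    }
    dst[j] = '\0';
    return dst;
}
```

With this encoding the value a"b followed by a newline would be written as a%22b%0A. The trade-off is that, unlike the quote-only approach, every value is allocated and copied, and consumers have to decode the output.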
