Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for complex fields #24

Open
mpapis opened this issue Apr 8, 2024 · 11 comments
Open

Support for complex fields #24

mpapis opened this issue Apr 8, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@mpapis
Copy link

mpapis commented Apr 8, 2024

I have field data in JSON Arrays and simple coma separated list in string, right now whole field is anonymized which "destroys" the array format of it.

@ppomes ppomes added the enhancement New feature or request label Apr 8, 2024
@ppomes
Copy link
Owner

ppomes commented Apr 8, 2024

Yes, json is currently not parsed. Need to implement it.

@ppomes
Copy link
Owner

ppomes commented Apr 8, 2024

@mpapis, just to be sure to work on your dataset, can you provide me a sample dump?

@mpapis
Copy link
Author

mpapis commented Apr 9, 2024

In the example:

  • points.contacts - my use case, this one is important for me
  • points.owner - made up example, I think it might came up earlier or later
  • points.emails - my use case, would be nice, but I can as well migrate to json array.
CREATE TABLE `points` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `contacts` json DEFAULT NULL,
  `owner` json DEFAULT NULL,
  `emails` text,
  `created_at` datetime(6) NOT NULL,
  `updated_at` datetime(6) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `points` VALUES (1,'Shop1','[\"person1@shop1.com\", \"person2@shop1.com\", \"person3@shop1.com\"]','{\"email\": \"owner@shop1.com\", \"last_name\": \"Sawyer\", \"first_name\": \"Tom\"}','person1@shop1.com,person2@shop1.com,person3@shop1.com,owner@shop1.com','2024-04-09 17:58:27.502417','2024-04-09 17:58:27.502417'),(2,'Shop2','[\"person1@shop2.com\", \"person2@shop2.com\", \"person3@shop2.com\"]','{\"email\": \"owner@shop2.com\", \"last_name\": \"Smith\", \"first_name\": \"Will\"}','person1@shop2.com,person2@shop2.com,person3@shop2.com,owner@shop2.com','2024-04-09 17:58:58.337830','2024-04-09 17:58:58.337830');

I've stripped down the dump from things I did not thought were important, let me know if you need the whole thing.

@mpapis
Copy link
Author

mpapis commented Apr 10, 2024

Update: just found out that points.owner is actually my use case, thankfully there is just one level as in the example.

@mpapis
Copy link
Author

mpapis commented Apr 10, 2024

Update2: also found another use cases with such fields:

'{\"emails_changes\":[[\"contact1@shop1.com\"],[\"contact1@shop1.com\",\"contact2@shop2.com\"]]}',
'{\"emails\":[\"contact1@shop1.com\",\"contact2@shop2.com\"]}',
'{\"email_changes\":[\"contact1@shop1.com\",\"contact2@shop2.com\"]}',

@ppomes
Copy link
Owner

ppomes commented Apr 24, 2024

Hi!

Update2: also found another use cases with such fields:

'{\"emails_changes\":[[\"contact1@shop1.com\"],[\"contact1@shop1.com\",\"contact2@shop2.com\"]]}',
'{\"emails\":[\"contact1@shop1.com\",\"contact2@shop2.com\"]}',
'{\"email_changes\":[\"contact1@shop1.com\",\"contact2@shop2.com\"]}',

Just to be sure, do you mean a given field in a table may have different json structures depending on rows ? Or are you talking about different fields?

I started to work on json support, and for the moment, I expect to have the same json structure for a given field.

Let me know!

@mpapis
Copy link
Author

mpapis commented Apr 25, 2024

unfortunately, the structure will be different, we are using one table to keep versions (history) of all other tables, and we store field object -> json dump of the other table state, and field object_changes which includes only the changes.
This leads to:

  • have conditioning of handling the json field based on field type
  • have multiple transformations for the versions table as object/object_change actually includes the whole row of another table as key->value (key->[old_value, new_value]) so multiple parts of the field will have different transformations.

As mentioned earlier the value can be also an array, so for object_changes this would be possible that multiple parts of the json needs to be handled, including nested arrays support.

@ppomes
Copy link
Owner

ppomes commented Apr 25, 2024

Thanks for you reply, I think it will be easier for me.

Just in case, I submitted my current (non complete) work for static json structure on branch issue24.

@ppomes
Copy link
Owner

ppomes commented May 6, 2024

Hi @mpapis,

Partial work for the moment, code in branch issue24_2 should now cover one of your needs (points.owner).

You can have a look at tests/complex.sql and tests/complex.conf

I will now work on your other needs, but in the meanwhile, feedback is welcome.

Thanks!
Pierre

@mpapis
Copy link
Author

mpapis commented May 6, 2024

I had a look at the changes, looks good so far 😄

@ppomes
Copy link
Owner

ppomes commented May 15, 2024

Hi @mpapis, still in the same branch, I added support for fiels separated by a given char. Test case (tests/complex.sql - tests/complex.conf) has been updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants