Feature: add JSON parsing to dump.transformation attribute #60

Open
joao-zanutto opened this issue Apr 7, 2024 · 5 comments

@joao-zanutto
Contributor

Currently, the only way to pass a configuration to dump.transformation is through YAML, which makes a config file mandatory for configuring a transformation.
Adding a JSON parser to this attribute would let users configure Greenmask entirely from environment variables, without mounting any volume or file.

This is especially useful when running Greenmask from a container. Many cloud providers offer container platforms with environment variable and secret management integrated at no additional cost, whereas preparing and mounting a volume requires additional configuration and planning, along with other infrastructure considerations.

@wwoytenko wwoytenko self-assigned this Apr 8, 2024
@wwoytenko wwoytenko added the enhancement New feature or request label Apr 8, 2024
@wwoytenko
Contributor

@joao-zanutto
That's a good point. I suspect it can be solved simply, but I need to make sure it can be done seamlessly.
Thank you!

@wwoytenko
Contributor

I've just checked: a JSON-format config file already works.

{
  "common": {
    "pg_bin_path": "/usr/lib/postgresql/16/bin",
    "tmp_dir": "/tmp"
  },
  "storage": {
    "type": "s3",
    "s3": {
      "endpoint": "http://playground-storage:9000",
      "bucket": "adventureworks",
      "region": "us-east-1",
      "access_key_id": "Q3AM3UQ867SPQQA43P2F",
      "secret_access_key": "zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"
    }
  },
  "validate": null,
  "dump": {
    "pg_dump_options": {
      "dbname": "host=playground-db user=postgres password=example dbname=original",
      "jobs": 10
    },
    "transformation": [
      {
        "schema": "humanresources",
        "name": "employee",
        "transformers": [
          {
            "name": "NoiseDate",
            "params": {
              "ratio": "10 year 9 mon 1 day",
              "column": "birthdate"
            }
          }
        ]
      }
    ]
  },
  "restore": {
    "pg_restore_options": {
      "jobs": 10,
      "dbname": "host=playground-db user=postgres password=example dbname=transformed"
    }
  }
}

greenmask --config config.json validate

But I suspect it might not work correctly with transformer parameters. I will check.

@wwoytenko
Contributor

It is not that simple. Currently, passing the transformation through an environment variable raises an error. I will check the implementation later.

export DUMP_TRANSFORMATION='[ { "schema": "humanresources", "name": "employee", "transformers": [ { "name": "NoiseDate", "params": { "ratio": "10 year 9 mon 1 day", "column": "birthdate" } } ] } ]'
greenmask --config config.yml validate
{"level":"fatal","error":"5 error(s) decoding:\n\n* 'dump.transformation[0]' expected a map, got 'string'\n* 'dump.transformation[1]' expected a map, got 'string'\n* 'dump.transformation[2]' expected a map, got 'string'\n* 'dump.transformation[3]' expected a map, got 'string'\n* 'dump.transformation[4]' expected a map, got 'string'","time":"2024-04-08T16:38:50Z"}

@wwoytenko
Contributor

Looks like it can unmarshal only scalar types, not structures.
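
One plausible reading of the five errors in the log above, assuming the value passes through a string-to-slice decode hook that splits on commas (which, as far as I know, viper enables by default): the raw environment string is cut at every comma, so the decoder receives five string fragments instead of one list of maps. The sketch below reproduces only that splitting step with the standard library. It does not call viper or mapstructure, and `splitOnCommas` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnCommas mimics (as an assumption, not Greenmask's actual code)
// a string-to-slice hook that splits the raw value on "," before the
// decoder tries to map each element onto a table config.
func splitOnCommas(raw string) []string {
	return strings.Split(raw, ",")
}

func main() {
	// Same value as DUMP_TRANSFORMATION in the failing run above.
	raw := `[ { "schema": "humanresources", "name": "employee", "transformers": [ { "name": "NoiseDate", "params": { "ratio": "10 year 9 mon 1 day", "column": "birthdate" } } ] } ]`

	parts := splitOnCommas(raw)
	fmt.Println(len(parts)) // prints 5: four commas yield five fragments

	// Each fragment is a plain string, which matches the shape of the
	// "expected a map, got 'string'" messages for indices 0 through 4.
	for i, p := range parts {
		fmt.Printf("dump.transformation[%d] = %q\n", i, p)
	}
}
```

If this reading is right, the fix has to intercept the string before it is split, rather than after.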

@joao-zanutto
Contributor Author

joao-zanutto commented Apr 8, 2024

@wwoytenko I found a possible solution in the viper repo: spf13/viper#339 (comment)

I took a stab at the issue in PR #61, but it doesn't seem able to unmarshal the internal structures recursively. I'm still not very familiar with the mapstructure package, so I'm trying to figure out a way to make it call itself recursively.

If you manage to find a solution to this, please let me know. (This is the last modification we need to use Greenmask in production.)

joao-personal@joao:~/greenmask$ export DUMP_TRANSFORMATION='[{"schema":"humanresources","name":"employee","transformers":[{"name":"NoiseDate","params":{"ratio":"10year9mon1day","column":"birthdate"}}]}]'
joao-personal@joao:~/greenmask$ STORAGE_DIRECTORY_PATH=. PGHOST=localhost PGDATABASE=original PGPASSWORD=example PGPORT=54316 ./greenmask validate
2024-04-08T13:03:13-07:00 FTL error="unable to build runtime context: cannot validate and build table config: unable to init transformer: error parsing \"ratio\" parameter: unable to scan parameter via Driver: bad interval format"

UPDATE: @wwoytenko I just realized the problem was in my transformation. If you look closely, the ratio parameter is missing the spaces between the interval arguments (year, month, day). Fixing the input and re-running made it work.
