Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yaml emitter error while executing scans concurrently #2064

Open
Electrogenic opened this issue Apr 23, 2024 · 4 comments
Open

Yaml emitter error while executing scans concurrently #2064

Electrogenic opened this issue Apr 23, 2024 · 4 comments

Comments

@Electrogenic
Copy link

Electrogenic commented Apr 23, 2024

Hi Soda team.

Background

We are using the soda-core-trino version 3.3.1 library inside of a Fast API server. The following code works when handling a single request at a time. However, when multiple requests come in at the same time (even just 2), we are running into the Yaml emitter error found in the stacktrace below. I'm wondering if there might be some thread safety issues going on within the ruaml yaml emitter, or in the way it is being used.

scan = Scan()
scan.set_data_source_name('default')
# Trino data source is configured
scan.add_configuration_yaml_str('<REDACTED>')
# I simplified the checks for this example, but I'm using about 240 in actuality.
checks = """
    checks for my_table:
    - schema:
        name: Check that field foo is present
        fail:
            when required column missing:
            - foo
    """
scan.add_sodacl_yaml_str(checks)
soda_scan_status_code = scan.execute()

Stacktrace

   File "/workspace/app/routes/v1/contracts/routes.py", line 419, in execute_soda_scann
     soda_scan_status_code = scan.execute()
   File "/workspace/.venv/lib/python3.10/site-packages/soda/scan.py", line 638, in execute
     self.scan_results = self.build_scan_results()
   File "/workspace/.venv/lib/python3.10/site-packages/soda/scan.py", line 72, in build_scan_results
     checks = [check.get_dict() for check in self._checks if check.outcome is not None and check.archetype is None]
   File "/workspace/.venv/lib/python3.10/site-packages/soda/scan.py", line 72, in <listcomp>
     checks = [check.get_dict() for check in self._checks if check.outcome is not None and check.archetype is None]
   File "/workspace/.venv/lib/python3.10/site-packages/soda/execution/check/check.py", line 363, in get_dict
     "identity": self.create_identity(with_datasource=True, with_filename=True),
   File "/workspace/.venv/lib/python3.10/site-packages/soda/execution/check/check.py", line 234, in create_identity
     identity_source_configurations_yaml = to_yaml_str(identity_source_configurations)
   File "/workspace/.venv/lib/python3.10/site-packages/soda/common/yaml_helper.py", line 13, in to_yaml_str
     return YamlHelper.to_yaml(yaml_object)
   File "/workspace/.venv/lib/python3.10/site-packages/soda/common/yaml_helper.py", line 24, in to_yaml
     cls.__yaml.dump(yaml_object, stream)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/main.py", line 563, in dump
     self._context_manager.dump(data)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/main.py", line 913, in dump
     self._yaml.representer.represent(data)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/representer.py", line 82, in represent
     self.serializer.serialize(node)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/serializer.py", line 109, in serialize
     self.serialize_node(node, None, None)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/serializer.py", line 188, in serialize_node
     self.emitter.emit(
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/emitter.py", line 257, in emit
     self.state()
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/emitter.py", line 643, in expect_first_block_sequence_item
     return self.expect_block_sequence_item(first=True)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/emitter.py", line 663, in expect_block_sequence_item
     self.expect_node(sequence=True)
   File "/workspace/.venv/lib/python3.10/site-packages/ruamel/yaml/emitter.py", line 467, in expect_node
     raise EmitterError('expected NodeEvent, but got {self.event!s}')
 ruamel.yaml.emitter.EmitterError: expected NodeEvent, but got {self.event!s}

Let me know your thoughts, or what else I could do to help. Thank you!

@tools-soda
Copy link

SAS-3303

@bastienboutonnet
Copy link
Contributor

I think the best at this stage would be to follow your intuition and see if indeed you can make it work, then we'll gladly welcome a PR. At this point, using the package in a fastAPI app hasn't really been a use-case we've tested or given much thought about so if you think you're on to something we're open to support you through that PR.

Otherwise, I'll be more than happy to log-in a feature request for you and understand a bit more the use-case and so on.

@Electrogenic
Copy link
Author

Gotchya! Good to know that about FastAPI as a use case. We've only just started the integration. So nothing in prod yet. But happy to say it's been pretty smooth sailing thus far. Save for this one thorny issue.

I'd be happy to take a crack at it though, and see if I can't figure out what's going on here.

@m1n0
Copy link
Contributor

m1n0 commented Apr 24, 2024

Hmm I have seen this before with parallel scan executions - we were trying to parallelize partitioned scans, where we loaded config and checks from files and then passed those into multiple parallel scans. I did not find a quick root-cause fix but I found a way to make it work - I started adding config and checks as strings to scans (i.e. separate object instances) instead of one file stream object. This solved the issue for me, but it seems like you are already doing that. Did you by any chance get the error when you used files instead of yaml strings?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants