The catalog file is not written in full by the time the connector starts #5

Open
zxqfd555-pw opened this issue Mar 16, 2024 · 2 comments


@zxqfd555-pw

Hi!

We're using airbyte-serverless in the Pathway framework as a connector to airbyte sources.
Recently we ran into an issue where the internally serialized catalog file cannot be read back as valid JSON. We're using the Airbyte GitHub connector, but that doesn't seem to be an important detail.

I've analyzed the stack trace we got and found a suspicious spot:

ValueError: Could not read json file /mnt/temp/catalog.json: Expecting ':' delimiter: line 1 column 8192 (char 8191).

So what happens is that the code tries to read the catalog file, which is created here using json.dump, but fails at character 8192 (out of roughly 65K characters; I dumped the catalog locally to estimate the expected size). That position looks like the end of a filesystem block or write-buffer chunk: 8192 bytes is, for example, the value of Python's io.DEFAULT_BUFFER_SIZE.
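The position in such an error is where the JSON parser ran out of valid input. A minimal sketch, using a small hypothetical stand-in for the catalog (not the real one), showing that a document cut off right after a key fails with the same "Expecting ':' delimiter" message at a position equal to the length of the truncated text:

```python
import json


def parse_error(text: str) -> tuple[str, int]:
    """Return (message, position) of the JSONDecodeError raised for `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg, exc.pos
    raise ValueError("text parsed successfully")


# Hypothetical stand-in for the serialized catalog.
full = json.dumps({"streams": [{"name": "issues", "sync_mode": "incremental"}]})

# Cut the document right after the first key, as if only the beginning
# of the file had been flushed to disk when it was read.
cut = full.index('"streams"') + len('"streams"')
truncated = full[:cut]

msg, pos = parse_error(truncated)
print(msg, pos, len(truncated))  # Expecting ':' delimiter 10 10
```

With a truncated file the reported `char` offset therefore points at the end of the data that actually reached the file, which is why a round number like 8191/8192 is a hint that only a prefix was written.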

My guess is that the file is not closed right after writing, so part of its content stays unflushed for some unpredictable amount of time. In rare unlucky cases, the Airbyte connector's Docker image starts before the write completes and reads a truncated file.
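A minimal CPython sketch of the suspected mechanism: data written to a file object that is still open can sit in Python's user-space buffer, so a second reader (standing in here for the connector container) does not see it until the writer is closed. The path and content are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "catalog.json")
    writer = open(path, "w")
    # Write less than the buffer size and do NOT close the file yet.
    writer.write('{"streams": []}')
    # A separate reader only sees what has already been flushed to the OS.
    with open(path) as reader:
        seen_before_close = reader.read()
    writer.close()  # close() flushes Python's buffer to the OS
    with open(path) as reader:
        seen_after_close = reader.read()

print(repr(seen_before_close), repr(seen_after_close))
```

On CPython this prints `'' '{"streams": []}'`: nothing is visible before `close()`. With a larger document, everything beyond the last full buffer flush is missing in the same way.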

If so, closing the file explicitly (or using a context manager) should fix it. Could you please look into the issue and confirm or reject my assumptions? I can send a PR with the proposed fix if that helps.
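A minimal sketch of the context-manager approach, assuming the catalog is a plain dict serialized with json.dump; `write_catalog` and the sample catalog are hypothetical and not airbyte-serverless's actual code:

```python
import json
import os
import tempfile


def write_catalog(catalog: dict, path: str) -> None:
    """Hypothetical helper: write the catalog and close the file before returning."""
    # Leaving the `with` block closes the file, which flushes Python's buffers,
    # so a process started after this call reads the complete document.
    with open(path, "w") as f:
        json.dump(catalog, f)


with tempfile.TemporaryDirectory() as tmp:
    catalog_path = os.path.join(tmp, "catalog.json")
    # Hypothetical catalog, well above 8192 characters once serialized.
    catalog = {"streams": [{"name": f"stream_{i}"} for i in range(5000)]}
    write_catalog(catalog, catalog_path)
    with open(catalog_path) as f:
        text = f.read()

loaded = json.loads(text)
print(len(text) > 8192, loaded == catalog)  # True True
```

The design point is simply that the file is guaranteed to be closed (and flushed) before the function returns, even if `json.dump` raises, instead of relying on when the file object happens to be garbage-collected.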

Thank you in advance!

@unytics
Owner

unytics commented Mar 20, 2024

Thanks a lot, Sergey, for opening this issue.

Being explicit in closing the file being written in any case is a good idea.

If you open a PR on this, I'll merge it.

@zxqfd555-pw
Author

@unytics I've created a PR with a small fix for that: #6.
