Fix step logging when using GCS Artifact Store #2211

Open
1 task
strickvl opened this issue Jan 3, 2024 · 8 comments
Labels: bug, good first issue

Comments

@strickvl
Contributor

strickvl commented Jan 3, 2024

Open Source Contributors Welcomed!

Please comment below if you would like to work on this issue!

Contact Details [Optional]

support@zenml.io

What happened?

There seems to be an issue with StepLogging when using GCS (Google Cloud Storage) as the artifact store. Specifically, only the last parts of the logs appear in the file, which suggests a problem with the log writing or saving mechanism.

Steps to Reproduce

Here's a snippet to reproduce the issue:

import gcsfs
from zenml.client import Client
from zenml.logging.step_logging import StepLogsStorage

client = Client()
_ = client.active_stack

TEST_FILE = "gs://<<your_bucket>>/test_log.log"

log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)
for i in range(11):
    log_storage.write(f"I'm log line #{i}")
log_storage.save_to_file()

fs = gcsfs.GCSFileSystem()
with fs.open(TEST_FILE, 'r') as f:
    all_of_it = f.read()

print(all_of_it)

Expected Behavior

All log lines should be saved and visible in the GCS file, not just the last few.

Potential Solution

Consider using the logging.StreamHandler facility to temporarily write logs to the remote file (GCS, S3, etc.). Here's an example:

import logging
import fsspec

f = fsspec.open("gs://<<my_gcs_bucket>>/test_log.log", "w")
with f as of:
    log_handler = logging.StreamHandler(of)
    logger = logging.getLogger()  # Root logger
    logger.addHandler(log_handler)
    for i in range(0, 5000):
        logger.warning(f"I'm log line #{i}")
    logger.removeHandler(log_handler)

This approach could fit nicely in the StepLogsStorageContext class.
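The handler idea above can be tried locally before involving a bucket. The following is a minimal sketch, not ZenML's implementation: it swaps the remote file for an in-memory `io.StringIO` stream, and the helper name `capture_logs` and the logger name `step_logging_sketch` are made up for illustration. It also wraps the logging calls in `try`/`finally` so the handler is detached even if a step raises.

```python
import io
import logging


def capture_logs(stream, messages):
    """Attach a StreamHandler to a logger, emit messages, then detach it."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))

    # Hypothetical logger name; a dedicated logger keeps the sketch isolated.
    logger = logging.getLogger("step_logging_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        for message in messages:
            logger.info(message)
    finally:
        # Always flush and detach, even if emitting a message fails.
        handler.flush()
        logger.removeHandler(handler)


buffer = io.StringIO()  # stand-in for the file object returned by fsspec.open
capture_logs(buffer, [f"I'm log line #{i}" for i in range(11)])
captured = buffer.getvalue().splitlines()
print(len(captured), captured[0], captured[-1])
```

Because the stream stays open for the whole run, every line lands in the same file object, so no line depends on a later flush preserving earlier content.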

Additional Context

Proper log handling is crucial for debugging and monitoring pipeline performance, especially when dealing with large-scale data processing in cloud environments.

Code of Conduct

  • I agree to follow this project's Code of Conduct
strickvl added the bug and good first issue labels Jan 3, 2024
@adtygan

adtygan commented Jan 10, 2024

Hello @strickvl, I'm trying to reproduce this issue but can't. I made a GCS bucket and tried to run the first snippet and got the following error. Please let me know if you need the traceback.

ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.

The error was raised for the following line,

log_storage = StepLogsStorage(logs_uri=TEST_FILE, max_messages=5)

@strickvl
Contributor Author

I'd loop in @bcdurak here, who I think was most involved with that particular part of the codebase and should be able to help with this. Other things to check:

  • Is gcsfs installed?
  • Does a simpler gcsfs example work? That would rule out a permissions issue or similar. Something like:
import gcsfs

fs = gcsfs.GCSFileSystem()
with fs.open('gs://your-bucket-name/test.txt', 'w') as f:
    f.write('Hello, world!')

with fs.open('gs://your-bucket-name/test.txt', 'r') as f:
    print(f.read())

(Replace 'gs://your-bucket-name/test.txt' with a valid path in your GCS bucket.)

@adtygan

adtygan commented Jan 11, 2024

Thank you for the code you provided. I did have some permission issues, which I resolved after trying this code, and the code provided correctly prints Hello, world!. However, the previous error I got persists even now.

ValueError: No file systems were found for the scheme: gs://. Please make sure that you are using the right path and the all the necessary integrations are properly installed.

EDIT:

I think I understand the source of this error; the traceback is attached below. The code uses fileio to open the URI, which raises the error. At this step, gcsfs would need to be used instead, as in the code you provided.

[image: traceback]

@strickvl
Contributor Author

I think I see what's going on now. Are you running the code with a GCS artifact store configured in your ZenML stack? (fileio will use whatever stack you have configured and set up for ZenML, so if you have a GCS artifact store then it should work).
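As a rough illustration of why the lookup fails without a configured GCS artifact store, here is a toy model of scheme-based filesystem dispatch. It is not ZenML's fileio code: the class `ToyFilesystemRegistry` and its methods are hypothetical, and the error text only mimics the message quoted above.

```python
class ToyFilesystemRegistry:
    """Hypothetical registry mapping URI schemes to filesystem names."""

    def __init__(self):
        self._by_scheme = {}

    def register(self, scheme, filesystem_name):
        self._by_scheme[scheme] = filesystem_name

    def find(self, path):
        # Return the first registered filesystem whose scheme prefixes the path.
        for scheme, name in self._by_scheme.items():
            if path.startswith(scheme):
                return name
        scheme = path.split("://", 1)[0] + "://"
        raise ValueError(f"No file systems were found for the scheme: {scheme}")


registry = ToyFilesystemRegistry()
path = "gs://example-bucket/test_log.log"  # placeholder bucket name

# Nothing is registered for gs:// yet, so the lookup fails.
try:
    registry.find(path)
    lookup_before = "found"
except ValueError as err:
    lookup_before = str(err)

# Registering a handler for the scheme (assumed here to be what configuring
# the artifact store achieves) makes the same lookup succeed.
registry.register("gs://", "gcs")
lookup_after = registry.find(path)
print(lookup_before)
print(lookup_after)
```

In this model, having gcsfs installed is not enough on its own: the scheme also has to be registered with whatever performs the lookup.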

@adtygan

adtygan commented Jan 11, 2024

I see. I tried to set up a GCS artifact store but am running into some errors. I don't understand a few of the steps yet, so I will first familiarize myself with them. Could you please assign me to this issue?

@adtygan

adtygan commented Jan 12, 2024

I was able to reproduce the issue. The output I get for the initial code is

I'm log line #10

I will now work on solving the issue.
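One failure mode that would produce exactly this output is sketched below. It rests on two assumptions that the thread does not confirm: that the buffer is flushed whenever it reaches max_messages, and that each flush rewrites the whole remote object instead of extending it. The classes `InMemoryStore` and `BufferedLogs` are hypothetical stand-ins, not ZenML code.

```python
class InMemoryStore:
    """Toy object store: whole-object writes only, no append."""

    def __init__(self):
        self.objects = {}

    def write(self, uri, text):
        self.objects[uri] = text  # replaces any previous content

    def read(self, uri):
        return self.objects.get(uri, "")


class BufferedLogs:
    """Hypothetical buffered log writer that flushes every max_messages."""

    def __init__(self, store, uri, max_messages, preserve_existing):
        self.store = store
        self.uri = uri
        self.max_messages = max_messages
        self.preserve_existing = preserve_existing
        self.buffer = []

    def write(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.max_messages:
            self.save_to_file()

    def save_to_file(self):
        if not self.buffer:
            return
        chunk = "".join(f"{message}\n" for message in self.buffer)
        if self.preserve_existing:
            # Read back what is already stored and write it again with the chunk.
            chunk = self.store.read(self.uri) + chunk
        self.store.write(self.uri, chunk)
        self.buffer.clear()


def run(preserve_existing):
    store = InMemoryStore()
    uri = "gs://example-bucket/test_log.log"  # placeholder, never contacted
    logs = BufferedLogs(store, uri, max_messages=5, preserve_existing=preserve_existing)
    for i in range(11):
        logs.write(f"I'm log line #{i}")
    logs.save_to_file()
    return store.read(uri).splitlines()


overwritten = run(preserve_existing=False)
preserved = run(preserve_existing=True)
print(overwritten)     # ["I'm log line #10"]
print(len(preserved))  # 11
```

With 11 lines and max_messages=5, the toy writer flushes lines 0 to 4, then 5 to 9, then line 10 on the final save, so an overwriting flush leaves only the last line. Reading the existing content back before each write is one way to keep earlier lines, at the cost of re-uploading the whole file on every flush.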

@adtygan

adtygan commented Jan 15, 2024

@strickvl I have fixed the issue locally and I'm getting the expected output as shown below

[image: expected output]

However, I'm having trouble following the contribution guidelines. Running the command mypy --install-types fails with error: Can't determine which types to install with no files to check (and no cache from previous mypy run). Could you please help with this?

Also, while opening a pull request, I read this prerequisite: I have added tests to cover my changes. To fix the bug I changed src/zenml/logging/step_logging.py, so I think I need to add tests, but I'm not sure how to do this. Could you help with this as well?

@strickvl
Contributor Author

For our cloud integrations, it's enough to demonstrate that you've tested it. We don't currently run integration tests on cloud environments, so basically for something like this it wouldn't be possible to test it locally. Icing on the cake would be to include in the PR instructions on how someone from the core team could reproduce your local test (a code snippet and a reminder of what the stack setup would be), but beyond that I think you're ok.

Also for mypy I think you can ignore that and just make the PR. Any issues will be revealed there.

@strickvl strickvl linked a pull request Feb 5, 2024 that will close this issue
9 tasks
@adtygan adtygan mentioned this issue Feb 14, 2024
9 tasks
@coderabbitai coderabbitai bot mentioned this issue Mar 16, 2024
9 tasks
2 participants