Messages created via concurrent.futures.ProcessPoolExecutor are not being caught #550
Currently facing this exact same issue.

Managed to work around it by using the ProcessPoolExecutor in "spawn" mode. The container executing the Python script runs:

```python
import logging
import google.cloud.logging as glog
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

client = glog.Client()
client.setup_logging()

def logging_something(something):
    logging.info(something)

def main():
    context = get_context("spawn")  # Use spawn to ensure compatibility with GCP libraries
    with ProcessPoolExecutor(mp_context=context, max_workers=2) as executor:  # Pass the context via the mp_context argument
        executor.map(logging_something, ['firstPoolExecutorMessage', 'secondPoolExecutorMessage'])
    logging_something('noPoolExecutorMessage')

if __name__ == '__main__':
    main()
```
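As an alternative to passing `mp_context` to each executor, the start method can also be set process-wide with the standard library's `multiprocessing.set_start_method`. A minimal sketch (the helper name `configure_spawn` is illustrative; `set_start_method` should be called once, early in the program):

```python
import multiprocessing

def configure_spawn():
    # Force all multiprocessing machinery (including ProcessPoolExecutor's
    # default context) to use "spawn" instead of the platform default,
    # which is "fork" on Linux. force=True overrides any previously set
    # start method; without it a second call raises RuntimeError.
    multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()
```

With "spawn", each worker re-imports the module and re-runs `setup_logging()`, rather than inheriting the parent's half-initialized background transport the way "fork" does.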
From my research, the […]. Since there is a work-around (setting […]), […]
Pretty sure this is breaking the Stackdriver provider for Airflow; I am doing some testing now. Basically all logs after […] are dropped in the child process.
Is there anyone who could push up the priority? @daniel-sanche, as reported by @brokenjacobs here, Stackdriver logging in Apache Airflow is broken due to this bug. Thanks!
I see KubernetesExecutor is mentioned in the bug title. If you're deploying on GKE, you should be able to take advantage of the native stdout logging functionality using the StructuredLogHandler, which is unaffected by this issue. If you're not using GKE (or a GCP serverless environment that supports stdout log collection), have you tried the context workaround mentioned earlier in this thread? Any luck there?
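For reference, the structured-stdout approach writes one JSON object per line, which the GKE logging agent parses into Cloud Logging entries. A minimal standard-library sketch of the same idea (an illustration only, not the library's actual StructuredLogHandler; the field names follow Cloud Logging's `severity`/`message` convention):

```python
import json
import logging

class StructuredStdoutFormatter(logging.Formatter):
    """Format each record as single-line JSON, similar in spirit to what
    google-cloud-logging's StructuredLogHandler emits to stdout."""

    def format(self, record):
        return json.dumps({
            "severity": record.levelname,
            "message": record.getMessage(),
            "logging.googleapis.com/sourceLocation": {
                "file": record.pathname,
                "line": record.lineno,
            },
        })
```

Because the formatting happens synchronously in whichever process emits the log, there is no background transport thread to lose across a fork.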
Airflow doesn't use multiprocessing; it uses […]. There are multiple 'remote' logging providers for Airflow. Most of them work by uploading the logs to the logging service after a task has completed. The Stackdriver provider is somewhat unique in that it streams logs to Stackdriver instead of waiting until completion. The Kubernetes handlers are for 'live' logs if you are using the Kubernetes execution engine in Airflow; the 'remote logging' provider is still used for retrieving logs after the pod goes away. This integration is in Python for Airflow's use, not the environment's. Of course any logs sent to stdout from the pod are still readable from the Stackdriver environment, but not in the Airflow web UI. Also, the missing logs are missing from stdout as well when the Stackdriver handler is being used.
Also, the KubernetesExecutor in the bug title is a bit of a red herring. The problem isn't with the Kubernetes execution engine; it's the Stackdriver remote logging provider.
Agreed with @brokenjacobs. Are there any other ways to make it work with Stackdriver, or is there a way to get a fix, @daniel-sanche?
I see. Yeah, unfortunately multiprocessing is not currently supported by the library, and I don't believe support will be added in the near term. I can provide workarounds for getting logs into GCP Cloud Logging, but I'm less familiar with getting logs into the Airflow web UI; I wasn't aware of that being a feature supported by this library. Was that always broken, or did something change recently that broke it?
I'm not sure how well tested or widely deployed the Stackdriver provider for Airflow is. Google Cloud Composer doesn't use this provider; it has another way of getting logs into Stackdriver. The provider appears to work at first glance because you see logs from numerous components. It's only upon trying to use it with task logs that you notice things are missing. As I mentioned above, this provider is a bit unique in that it replaces a full logging handler and does not just upload logs upon completion. I suppose it could be rewritten to do that, but that's not how the provider was originally put together.
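The "upload after completion" model that the other Airflow providers use could be sketched as a buffering handler. This is a hypothetical illustration, not Airflow's actual provider code; `upload_fn` stands in for whatever batch write (e.g. to GCS or Cloud Logging) the provider would perform:

```python
import logging

class UploadOnCompletionHandler(logging.Handler):
    """Buffer formatted records in memory and hand them to an upload
    callback when the handler is closed (i.e. after the task finishes).
    This sidesteps streaming from worker processes entirely."""

    def __init__(self, upload_fn):
        super().__init__()
        self.upload_fn = upload_fn  # hypothetical batch-upload callback
        self.buffer = []

    def emit(self, record):
        # Format synchronously; no background transport thread involved.
        self.buffer.append(self.format(record))

    def close(self):
        if self.buffer:
            self.upload_fn(self.buffer)
            self.buffer = []
        super().close()
```

The trade-off is that logs only become visible after the task ends, which is exactly the "live logs" capability the current streaming provider was built to offer.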
Dear Google Logging Team,
I am currently not able to see logs created while launching parallel tasks. I am using the standard Python library concurrent.futures.
This is not an issue while logging on console or file with python's standard logging library.
My environment
MRE:
Looking forward to your reply.