Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Logs being admitted despite settings causing HA failures #1575

Open
lmeyerov opened this issue Jun 28, 2021 · 3 comments
Open

[BUG] Logs being admitted despite settings causing HA failures #1575

lmeyerov opened this issue Jun 28, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@lmeyerov
Copy link
Collaborator

lmeyerov commented Jun 28, 2021

Describe the bug

Despite setting bc(enable_logs=False, LOGGING_LEVEL='warn', ...), we are seeing debug logs being emitted to algebra.log and blazing_log/pyblazing.log. In a separate setup (dask-cuda workers), we're seeing blazing_log/{RAL.0.log,pyblazing.0.log}.

This is a problem, for example, in HA settings that run periodic healthchecks. In our case, this ended up causing a Denial of Service / Out-of-Memory on a node as we didn't have log rotations setup outside of standard locations.

As an interim workaround, we're setting a periodic watcher to echo "" > *.log

Steps/Code to reproduce bug

Context options:

allocator='existing',
enable_logging=False,
config_options={'protocol': 'tcp'},
LOGGING_LEVEL='warn'

Healthcheck: create table, add some numbers, drop

Expected behavior

Only see warning-or-higher logs. In our case, that would have been no log output.

Environment overview (please complete the following information)

RAPIDS 0.19 in docker

Environment details

Additional context
Add any other context about the problem here.

----For BlazingSQL Developers----
Suspected source of the issue
Where and what are potential sources of the issue

Other design considerations
What components of the engine could be affected by this?

@lmeyerov lmeyerov added bug Something isn't working ? - Needs Triage needs team to review and classify labels Jun 28, 2021
@lmeyerov
Copy link
Collaborator Author

lmeyerov commented Jun 28, 2021

A bit more:

On load, from a clean system, so would expect 0 logs:

Every 1.0s: ./dc exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt                                                                                  leo-UX331UN: Mon Jun 28 10:09:38 2021
+ docker-compose -f compose/production.yml exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt
DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(x=[$0])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, df]], aliases=[[x]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(EXPR$0=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)], agg#1=[COUNT($0)])
    LogicalTableScan(table=[[main, df]])

2021-06-28 17:08:50,016||INFO|||"create_table start for df"||||||
cat: rmm_log.txt: No such file or directory 

After health checks:

+ docker-compose -f compose/production.yml exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt
DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(x=[$0])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, df]], aliases=[[x]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(EXPR$0=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)], agg#1=[COUNT($0)])
    LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(a=[$0], b=[$1])
  LogicalTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]], aliases=[[a, b]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], the_sum=[SUM($0)])
  LogicalProject(b=[$1])
    LogicalTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(the_sum=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], the_sum=[$SUM0($0)], agg#1=[COUNT($0)])
    BindableTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]], projects=[[1]], aliases=[[b]])

2021-06-28 17:08:50,016||INFO|||"create_table start for df"||||||
2021-06-28 17:09:58,488||INFO|||"create_table start for uecf1715992f344648df7558f2fa24b71"||||||
2021-06-28 17:09:58,488||INFO|||"create_table start for uecf1715992f344648df7558f2fa24b71"||||||
cat: rmm_log.txt: No such file or directory

@wmalpica
Copy link
Contributor

wmalpica commented Jul 2, 2021

@lmeyerov
From this issue i think we need to implement a feature that will purge old logs so that they dont over accumulate and be able to turn off all logs. There are some logs that right now you cant turn off, which are the pyblazing logs and algebra logs.
But the other logs you should be able to turn off as follows:

config_options = { 'LOGGING_LEVEL':'err', 'ENABLE_GENERAL_ENGINE_LOGS':False, 'ENABLE_COMMS_LOGS':False, 'ENABLE_TASK_LOGS':False, 'ENABLE_OTHER_ENGINE_LOGS':False}
bc = BlazingContext(enable_logging=False,config_options=config_options)

Note that the enable_logging parameter you pass directly to the BlazingContext is for the allocator logs, which are off by default. The other parameters are set via config_options. The ones relevant to logs are these:

LOGGING_LEVEL: string
                Set the level (as string) to register into the logs
                for the current tool of logging. Log levels have order of priority:
                ``{trace, debug, info, warn, err, critical, off}``. Using ``'trace'`` will
                registers all info.
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``'trace'``
            LOGGING_FLUSH_LEVEL: string
                Set the level (as string) of the flush for
                the current tool of logging. Log levels have order of priority:
                ``{trace, debug, info, warn, err, critical, off}``
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``'warn'``
            ENABLE_GENERAL_ENGINE_LOGS: boolean
                Enables ``'batch_logger'`` logger
                **Default:** ``True``
            ENABLE_COMMS_LOGS: boolean
                Enables ``'output_comms'`` and ``'input_comms'`` logger
                **Default:** ``False``
            ENABLE_TASK_LOGS: boolean
                Enables ``'task_logger'`` logger
                **Default:** ``False``
            ENABLE_OTHER_ENGINE_LOGS: boolean
                Enables ``'queries_logger'``, ``'kernels_logger'``,
                ``'kernels_edges_logger'``, ``'cache_events_logger'`` loggers
                **Default:** ``False``
            LOGGING_MAX_SIZE_PER_FILE: string
                Set the max size in bytes for the log files.
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``1GB``

@lmeyerov
Copy link
Collaborator Author

lmeyerov commented Jul 3, 2021

afaict:

  • purging logs & log rotation are more traditionally the job of whoever runs it
  • ... but the ability to control whether logs are emitted, with what fidelity, & where, are more of the job for the app

So in this case, seems more about controlling file names & fidelity / disabling, and disabling can be the same as sending to /dev/null :)

@wmalpica wmalpica removed the ? - Needs Triage needs team to review and classify label Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants