Notifications via DingTalk service sometimes unsuccessful #467

Open
zhaoyou opened this issue Jan 7, 2024 · 16 comments

Comments

@zhaoyou

zhaoyou commented Jan 7, 2024

Environment (please complete the following information):

  • OS: Ubuntu 16.04.4 LTS
  • EaseProbe Version v1.5.0

Describe the bug
When using the DingTalk notification service, some exception notifications can be sent out and some cannot. The message is as follows:
WARN [2024-01-07T10:12:38+08:00] [dingtalk / Dingtalk alert service / Notification] Retried to send 1/3 - Error response from Dingtalk [%!d(float64= 40035)] - [{"errcode":40035, "errmsg": "Missing parameter json"}]

Expected behavior
Notifications should either all succeed or all fail. The response from the DingTalk service suggests that the request is malformed, but I'm not sure what is wrong.

@zhaoyou zhaoyou changed the title from "s" to "Notifications via DingTalk service sometimes unsuccessful" Jan 7, 2024
@samanhappy
Collaborator

Thank you for submitting this issue. The error message "Missing parameter json" may be caused by special characters that are incompatible with the JSON format.

Could you check if there are any such characters in your configuration file?

Alternatively, please provide the content of your configuration file (anonymized if needed) so that we can identify and resolve the issue.
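
For the character check, a short script can help. The sketch below is not part of EaseProbe; the helper name find_suspicious_chars is hypothetical, and it assumes the configuration has been read as UTF-8 text. It reports control and invisible format characters (for example a zero-width space or a byte order mark), which are easy to miss in an editor.

```python
import unicodedata


def find_suspicious_chars(text):
    """Return (line, column, code point) for control/format characters.

    Tabs and carriage returns are ignored because they normally only
    structure the file; line feeds are consumed by the split.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in "\t\r":
                continue
            # Cc = control characters, Cf = invisible format characters.
            if unicodedata.category(ch) in ("Cc", "Cf"):
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical sample: the second line hides a zero-width space in the URL.
sample = (
    'name: "Dingtalk alert service"\n'
    'webhook: "https://example.invalid/\u200bsend"\n'
)
for hit in find_suspicious_chars(sample):
    print(hit)
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the result to the function. An empty list means no such characters were found.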

@suchen-sci
Contributor

@samanhappy is right, and more information would help. In the meantime, I will run some tests to check for potential bugs or compatibility issues.

@zhaoyou
Author

zhaoyou commented Jan 8, 2024

  1. My configuration file does not contain any such characters.
  2. The notify part of my config:
notify:
  dry: true # dry notification, print the Discord JSON in log(STDOUT)
  timeout: 20s # the timeout send out notification, default: 30s
  retry: # somehow the network is not good and needs to retry.
    times: 3 # default: 3
    interval: 5s # default: 5s

  wecom:
    - name: "Wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx"
  dingtalk:
    - name: "Dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxx"

  log:
    - name: "Local Log"
      file: "/mnt/logs/easeprobe_notify.log"
      dry: true

@samanhappy
Collaborator

Could you please provide the complete configuration so that we can test it in our local environment?

@zhaoyou
Author

zhaoyou commented Jan 8, 2024

# Global settings for all probes and notifiers.
settings:

  # The customized name and icon
  name: "EaseProbe" # the name of the probe: default: "EaseProbe"
  #icon: "https://path/to/icon.png" # the icon of the probe. default: "https://megaease.com/favicon.png"
  # Daemon settings

  # pid file path,  default: $CWD/easeprobe.pid,
  # if set to "", will not create pid file.
  #pid: /var/run/easeprobe.pid
  #timeformat: "2006-01-02 15:04:05"

  # A HTTP Server configuration
  http:
    ip: 127.0.0.1 # the IP address of the server. default:"0.0.0.0"
    port: 8181 # the port of the server. default: 8181
    refresh: 5s # the auto-refresh interval of the server. default: the minimum value of the probes' interval.
    log:
      file: /mnt/logs/easeprobe_http_access.log # access log file. default: Stdout
      # Log Rotate Configuration (optional)
      self_rotate: true # true: self rotate log file. default: true
                        # false: managed by outside  (e.g logrotate)
                        #        the blow settings will be ignored.
      size: 10 # max of access log file size. default: 10m
      age: 7 #  max of access log file age. default: 7 days
      backups: 5 # max of access log file backups. default: 5
      compress: true # compress the access log file. default: true
    # SLA Report schedule
    sla:
       #  daily, weekly (Sunday), monthly (Last Day), none
      schedule : "daily"
      # UTC time, the format is 'hour:min:sec'
      time: "23:59"
      # debug mode
      # - true: send the SLA report every minute
      # - false: send the SLA report in schedule
      debug: false
      # SLA data persistence file path.
      # The default location is `$CWD/data/data.yaml`
      # Use the following to disable SLA data persistence
      # data: "-"
      backups: 5 # max of SLA data file backups. default: 5
               # if set to a negative value, keep all backup files

# HTTP Probe configuration
http:
  - name: 通知-example.com
    url: https://example.com
    timeout: 20s
    interval: 15m
    # HTTP SUCCESS response code range, default is [0, 499]
    success_code:
      - [200,206]
      - [300,308]

  - name: 通知-数据接口1
    url: http://111.26.70.40:8085
    timeout: 20s
    interval: 15m
    # HTTP SUCCESS response code range, default is [0, 499]
    success_code:
      - [200,206]
      - [300,308]

  - name: 通知-数据接口2
    url: http://111.26.70.40:8083
    timeout: 20s
    interval: 5m
    # HTTP SUCCESS response code range, default is [0, 499]
    success_code:
      - [200,200]

  - name: 通知-数据接口3
    url: http://111.26.70.40:8084
    timeout: 20s
    interval: 5m
    # HTTP SUCCESS response code range, default is [0, 499]
    success_code:
      - [200,200]





# TCP Probe Configuration
notify:
  dry: true # dry notification, print the Discord JSON in log(STDOUT)
  timeout: 20s # the timeout send out notification, default: 30s
  retry: # somehow the network is not good and needs to retry.
    times: 3 # default: 3
    interval: 5s # default: 5s

  wecom:
    - name: "Wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx"
  dingtalk:
    - name: "Dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxx"

  log:
    - name: "Local Log"
      file: "/mnt/logs/easeprobe_notify.log"
      dry: true

@samanhappy
Collaborator

Thank you for the thorough response. As you mentioned, the configuration does not include any special characters; I apologize for my hasty speculation.

After checking the code, I found that the notification content incorporates the error message from the probe, which is also logged in the following format:
ERRO[2024-01-08T16:12:33+08:00] [http / 通知-example.com] error making get request: Get "https://example.com1": dial tcp: lookup example.com1: no such host

Could you please check your environment for a similar log entry? I've tested the configuration, but I was unable to reproduce the DingTalk error.
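
Because the probe's error text ends up inside the notification body, it has to be escaped when the JSON is built. The sketch below is not EaseProbe's code. It uses a hypothetical build_payload helper, with field names assumed from DingTalk's custom robot markdown message format, to illustrate why a serializer is safer than string formatting when the message contains double quotes like the log entry above.

```python
import json

# Error text modeled on the log entry above; note the embedded double quotes.
error_text = (
    'error making get request: Get "https://example.com1": '
    "dial tcp: lookup example.com1: no such host"
)


def build_payload(title, text):
    """Serialize a DingTalk-style markdown message (field names assumed)."""
    body = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    # ensure_ascii=False keeps non-ASCII probe names readable in the body.
    return json.dumps(body, ensure_ascii=False)


def is_valid_json(raw):
    """Return True if raw parses as JSON, False otherwise."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


safe = build_payload("通知-example.com", error_text)
# Naive formatting leaves the inner quotes unescaped and breaks the document.
naive = (
    '{"msgtype": "markdown", "markdown": {"title": "t", "text": "%s"}}'
    % error_text
)

print("serialized payload valid:", is_valid_json(safe))
print("hand-formatted payload valid:", is_valid_json(naive))
```

If the logged request body fails a check like is_valid_json, the payload construction is the likely culprit; if it passes, the problem is more likely on the transport or server side.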

@suchen-sci
Contributor

suchen-sci commented Jan 9, 2024

Some online blogs say that error 40035 is related to an invalid JSON payload. But in my tests, EaseProbe produces a valid JSON payload that matches the request format in the official DingTalk documentation.
It is also weird that some exception notifications can be sent while others cannot (perhaps a problem with the DingTalk API server?).

If possible, could you please add a log statement to the SendDingtalkNotification function in notify/dingtalk/dingtalk.go and deploy that build on your local machine? Then, when the error occurs again, you will know what message EaseProbe sent.
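
The same kind of logging can be approximated outside EaseProbe. The following is a rough sketch, not the project's SendDingtalkNotification function: send_with_logging, http_post, and the post parameter are hypothetical names, and the example URL is a placeholder. It logs the exact request body before sending, then reports the errcode from the reply so the two can be compared.

```python
import json
import logging
import urllib.request

log = logging.getLogger("dingtalk-debug")


def http_post(url, body, timeout=20):
    """POST raw JSON bytes and return the response text (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def send_with_logging(url, payload, post=http_post):
    """Log the exact request body, send it, and report a non-zero errcode.

    `post` is injectable so the function can be exercised without a network.
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    log.info("request body: %s", body.decode("utf-8"))
    reply = json.loads(post(url, body))
    if reply.get("errcode", 0) != 0:
        log.warning(
            "rejected: errcode=%s errmsg=%s",
            reply.get("errcode"),
            reply.get("errmsg"),
        )
    return reply


def stub_post(url, body):
    """Stand-in transport that replays the reply seen in this issue."""
    return '{"errcode": 40035, "errmsg": "Missing parameter json"}'


logging.basicConfig(level=logging.INFO)
send_with_logging(
    "https://example.invalid/robot/send",
    {"msgtype": "text", "text": {"content": "test"}},
    post=stub_post,
)
```

Dropping the post=stub_post argument sends the request for real through http_post, so the webhook URL should then be a test robot rather than a production one.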

@zhaoyou
Author

zhaoyou commented Jan 9, 2024

Thank you for your reply. It's strange: I don't know what is wrong with the logging settings in my configuration file, but neither the program's run log nor the notification log appears in the configured directory:

file: "/mnt/logs/easeprobe_notify.log"
log: /mnt/logs/easeprobe_http_access.log 

@zhaoyou
Author

zhaoyou commented Jan 9, 2024

The console displays the following logs, but no log files are found in the corresponding log directory

INFO[0000] Clean data file: data/data.yaml-2023-09-11T10_58_25.259281271Z 
INFO[0000] Load the configuration file successfully!    
INFO[0000] Successfully created the PID file: /home/tbcc/release/easeprobe/easeprobe.pid 
INFO[0000] Application Log File [Stdout] - Self-Rotate  
INFO[0000] Web Access Log File [/mnt/logs/easeprobe_http_access.log] - Self-Rotate 
INFO[2024-01-09T09:38:52+08:00] [Web] Access Log output file: /mnt/logs/easeprobe_http_access.log 
INFO[2024-01-09T09:38:52+08:00] [Web] HTTP server is listening on 127.0.0.1:8181 
INFO[2024-01-09T09:38:52+08:00] Probe [http] - [通知-Thermoberg CCDCC] base options are configured! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Counter <EaseProbe_http_total> is created! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Gauge <EaseProbe_http_duration> is created! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Gauge <EaseProbe_http_status> is created! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Gauge <EaseProbe_http_sla> is created! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Counter <EaseProbe_http_status_code> is created! 
INFO[2024-01-09T09:38:52+08:00] [Metric] Gauge <EaseProbe_http_content_len> is created! 

@suchen-sci
Contributor

Please change dry to false. In dry-run mode, the notification is not actually delivered.

@suchen-sci
Contributor

/mnt/logs/easeprobe_http_access.log is only written when EaseProbe's HTTP server on port 8181 is called, for example with curl http://127.0.0.1:8181/metrics. It records requests made to EaseProbe by other clients.

As for /mnt/logs/easeprobe_notify.log, you need to set dry: false.

@zhaoyou
Author

zhaoyou commented Jan 10, 2024

I have set both of these parameters to false. I'll keep an eye on it for a while.

[Screenshot 2024-01-10 12 46 52]

@suchen-sci
Contributor

suchen-sci commented Jan 10, 2024

notify:
  dry: true # dry notification, print the Discord JSON in log(STDOUT)
  timeout: 20s # the timeout send out notification, default: 30s
  retry: # somehow the network is not good and needs to retry.
    times: 3 # default: 3
    interval: 5s # default: 5s

Based on the manual at https://github.com/megaease/easeprobe/blob/main/docs/Manual.md#72-notification-configuration, notify itself does not have a dry field. The manual says that every notification under notify has a dry parameter; that does not mean notify itself has one.

If you want to set a field like dry for all notifications, set it in settings, as described at https://github.com/megaease/easeprobe/blob/main/docs/Manual.md#73-global-setting-configuration.

@zhaoyou
Author

zhaoyou commented Jan 18, 2024

The error still occurs. I set the log level to info, but the logs don't show anything useful. Is it possible to see the request body of the pushed message in debug mode?

image

@suchen-sci
Contributor

Hi, I will make a PR to do that. Please wait.

@suchen-sci
Contributor

Hi, can you download the newest version of the code from GitHub, compile it with the make command, and try again? I am sure that this time the error message will provide more information.
