Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random "Error opening rules configuration file" when using Routing Rule Engine #166

Open
rdarwish741 opened this issue Apr 4, 2022 · 0 comments · May be fixed by #211
Open

Random "Error opening rules configuration file" when using Routing Rule Engine #166

rdarwish741 opened this issue Apr 4, 2022 · 0 comments · May be fixed by #211

Comments

@rdarwish741
Copy link

I'm getting random "Error opening rules configuration file" when using Routing Rule Engine. It only happens when I have 2 or more clients sending requests to presto-gateway.

ERROR [2022-04-04 21:03:57,026] com.lyft.data.gateway.ha.router.RoutingGroupSelector$Logger: Error opening rules configuration file, using routing group header as default.
! org.yaml.snakeyaml.parser.ParserException: while parsing a block node
!  in 'reader', line 13, column 1:
!     
!     ^
! expected the node content, but found '<block end>'
!  in 'reader', line 13, column 1:
!     
!     ^
! 
! at org.yaml.snakeyaml.parser.ParserImpl.parseNode(ParserImpl.java:482)
! at org.yaml.snakeyaml.parser.ParserImpl.access$1300(ParserImpl.java:117)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockNode.produce(ParserImpl.java:359)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceEntry.produce(ParserImpl.java:506)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceFirstEntry.produce(ParserImpl.java:496)
! at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
! at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:148)
! at org.yaml.snakeyaml.composer.Composer.checkNode(Composer.java:78)
! at org.yaml.snakeyaml.constructor.BaseConstructor.checkData(BaseConstructor.java:123)
! at org.yaml.snakeyaml.Yaml$1.hasNext(Yaml.java:489)
! at org.jeasy.rules.support.reader.YamlRuleDefinitionReader.loadRules(YamlRuleDefinitionReader.java:71)
! at org.jeasy.rules.support.reader.AbstractRuleDefinitionReader.read(AbstractRuleDefinitionReader.java:43)
! at org.jeasy.rules.mvel.MVELRuleFactory.createRules(MVELRuleFactory.java:100)
! at com.lyft.data.gateway.ha.router.RoutingGroupSelector.lambda$byRoutingRulesEngine$1(RoutingGroupSelector.java:42)
! at com.lyft.data.gateway.ha.handler.QueryIdCachingProxyHandler.rewriteTarget(QueryIdCachingProxyHandler.java:123)
! at com.lyft.data.proxyserver.ProxyServletImpl.rewriteTarget(ProxyServletImpl.java:65)
...

Using:
java version "1.8.0_202"
Python 3.7.13
trino-python-client 0.3.11.0

rules.yaml:

---
name: 'airflow'
description: 'if query from airflow, route to etl group'
condition: 'request.getHeader("X-Trino-Source") == "trino-python-client"'
actions:
  - 'result.put("routingGroup", "adhoc")'

python client script:

#!/usr/bin/python

from trino.dbapi import connect
import time

conn = connect(
    host='localhost',
    port=8090,
    user='user1',
    catalog='system',
    schema='runtime',
)
cur = conn.cursor()

while 1 < 2:
  cur.execute('SELECT * from nodes limit 1')
  rows = cur.fetchall()
  print (rows)

Other Observations:

  • When the error is raised, presto-gateway routes request to default RoutingGroup "adhoc". If RoutingGroup "adhoc" does not exist, client request fails.
  • CPU usage keeps increasing over time on the JVM. I stopped my python clients after 30mins, but CPU usage was still at 25%. If I leave it running for days, it will eventually eat up all my CPU cores.

After further investigation, I narrowed it down to MVELRuleFactory class. I'm not a developer so I can't really explain what is happening but the following fixes the issue.

--- a/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
+++ b/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
@@ -34,11 +34,11 @@ public interface RoutingGroupSelector {
    * to determine the right routing group.
    */
   static RoutingGroupSelector byRoutingRulesEngine(String rulesConfigPath) {
-    RulesEngine rulesEngine = new DefaultRulesEngine();
-    MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
 
     return request -> {
       try {
+        RulesEngine rulesEngine = new DefaultRulesEngine();
+        MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
         Rules rules = ruleFactory.createRules(
             new FileReader(rulesConfigPath));
         Facts facts = new Facts();

It ran for hours without any errors and CPU usage was stable.
If anyone can provide more input, It would be appreciated.

deigote added a commit to deigote/presto-gateway that referenced this issue Sep 26, 2023
the previous implementation required to read the routing rules config file for each request, adding unnecessary latency. More importantly, it was done in a non-thread-safe way, as the `rulesEngine` and `ruleFactory` were shared among all requests, but the `Rules` creation was not. \

Since the class MVELRuleFactory is stateful (as it has `RuleDefinitionReader reader` and `ParserContext parserContext` as private members), it should not be shared between concurrent threads. `Rules` OTOH is a read-only class after it's created (it's just iterated over by the `DefaultRulesEngine`). Given that the rules are the same for all requests, and that the `Rules` usage is thread-safe, it makes sense to share it across all of them.\

We have deployed this code in production at Datadog and our APM profiling tool shows a reduction of around ~1-2%, plus we stopped seeing the error described in lyft#166
@deigote deigote linked a pull request Sep 26, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant