
Minion task for hybrid table erroring about a retry attempt exhaustion #12547

Closed

estebanz01 opened this issue Mar 4, 2024 · 8 comments

@estebanz01

estebanz01 commented Mar 4, 2024

Hi! 👋

I'm having some trouble with a Pinot cluster deployed on Kubernetes with minion enabled. I want to move data from the real-time table to the offline table, but it's failing with the following error:

16:18:00.383 [TaskStateModelFactory-task_thread-7] ERROR org.apache.pinot.minion.taskfactory.TaskFactoryRegistry - Caught exception while executing task: Task_RealtimeToOfflineSegmentsTask_8961b037-3c41-47d7-b56f-375ef16dc2fc_1709569080105_0
org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 1 attempts
    at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.common.utils.fetcher.HttpSegmentFetcher.fetchSegmentToLocal(HttpSegmentFetcher.java:62) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocalInternal(SegmentFetcherFactory.java:158) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:152) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocalInternal(SegmentFetcherFactory.java:202) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocal(SegmentFetcherFactory.java:190) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:201) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:77) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:157) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:118) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]

That's the only error I see in the minion pods, and there's nothing else on the other Pinot pods. Any ideas on how to debug this further? Here are the schema and table configs for my hybrid table:

Schema definition
{
  "schemaName": "data_counting",
  "dimensionFieldSpecs": [
    {
      "name": "device_name",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "bytes_sent",
      "dataType": "LONG"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "__key",
      "dataType": "TIMESTAMP",
      "format": "1:MICROSECONDS:EPOCH",
      "granularity": "1:MICROSECONDS"
    },
    {
      "name": "__metadata$eventTime",
      "dataType": "TIMESTAMP",
      "format": "1:MICROSECONDS:EPOCH",
      "granularity": "1:MICROSECONDS"
    }
  ]
}
Table configuration (REALTIME)
{
  "REALTIME": {
    "tableName": "data_counting_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "schemaName": "data_counting",
      "replication": "1",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "15",
      "replicasPerPartition": "1",
      "minimizeDataMovement": false,
      "timeColumnName": "__key"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant",
      "tagOverrideConfig": {}
    },
    "tableIndexConfig": {
      "invertedIndexColumns": [],
      "noDictionaryColumns": [],
      "streamConfigs": {
        "streamType": "pulsar",
        "stream.pulsar.topic.name": "persistent://client/devices/all",
        "stream.pulsar.bootstrap.servers": "pulsar://pulsar-proxy.pulsar.svc.cluster.local:6650",
        "stream.pulsar.prop.auto.offset.reset": "smallest",
        "stream.pulsar.consumer.type": "lowlevel",
        "stream.pulsar.fetch.timeout.millis": "20000",
        "stream.pulsar.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        "stream.pulsar.consumer.factory.class.name": "org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory",
        "realtime.segment.flush.threshold.rows": "10000",
        "realtime.segment.flush.threshold.time": "1h",
        "stream.pulsar.metada.populate": "true",
        "stream.pulsar.metadata.fields": "eventTime"
      },
      "loadMode": "MMAP",
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "rangeIndexColumns": [],
      "rangeIndexVersion": 2,
      "optimizeDictionary": false,
      "optimizeDictionaryForMetrics": false,
      "noDictionarySizeRatioThreshold": 0.85,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [],
      "bloomFilterColumns": []
    },
    "metadata": {},
    "quota": {},
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1h",
          "bufferTimePeriod": "2h",
          "mergeType": "concat",
          "maxNumRecordsPerSegment": "100000",
          "schedule": "0 * * * * ?"
        }
      }
    },
    "routing": {},
    "query": {
      "timeoutMs": 60000
    },
    "ingestionConfig": {
      "continueOnError": false,
      "rowTimeValueCheck": false,
      "segmentTimeValueCheck": true
    },
    "isDimTable": false
  }
}
Table configuration (OFFLINE)
{
  "OFFLINE": {
    "tableName": "data_counting_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "data_counting",
      "replication": "1",
      "replicasPerPartition": "1",
      "timeColumnName": "__key",
      "minimizeDataMovement": false,
      "segmentPushType": "APPEND",
      "segmentPushFrequency": "HOURLY"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "invertedIndexColumns": [],
      "noDictionaryColumns": [],
      "rangeIndexColumns": [],
      "rangeIndexVersion": 2,
      "createInvertedIndexDuringSegmentGeneration": false,
      "autoGeneratedInvertedIndex": false,
      "sortedColumn": [],
      "bloomFilterColumns": [],
      "loadMode": "MMAP",
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "optimizeDictionary": false,
      "optimizeDictionaryForMetrics": false,
      "noDictionarySizeRatioThreshold": 0.85
    },
    "metadata": {},
    "quota": {},
    "routing": {},
    "query": {},
    "ingestionConfig": {
      "continueOnError": false,
      "rowTimeValueCheck": false,
      "segmentTimeValueCheck": true
    },
    "isDimTable": false
  }
}

I'm using Apache Pulsar 3.2.0 and Apache Pinot 1.0.0.

@estebanz01
Author

estebanz01 commented Mar 4, 2024

Additional information from the Pinot controller:

java.lang.IllegalStateException: Failed to move segment file for segment
pinot-controller-2 controller 20:09:56.414 [grizzly-http-server-0] ERROR SegmentCompletionFSM_data_counting__0__3__20240304T2005Z - Caught exception while committing segment file for segment: data_counting__0__3__20240304T2005Z
pinot-controller-2 controller java.lang.IllegalStateException: Failed to move segment file for segment: data_counting_temp__0__3__20240304T2005Z from: file:/var/pinot/controller/data,s3://<bucket-name>/pinot-data/pinot/controller-data/data_counting/data_counting__0__3__20240304T2005Z.tmp.7b041191-d8d0-4df9-bf97-543ca6c0a407 to: file:/var/pinot/controller/data,s3://<bucket-name>/pinot-data/pinot/controller-data/data_counting/data_counting__0__3__20240304T2005Z
pinot-controller-2 controller     at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:854) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.moveSegmentFile(PinotLLCRealtimeSegmentManager.java:1580) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.helix.core.realtime.PinotLLCRealtimeSegmentManager.commitSegmentFile(PinotLLCRealtimeSegmentManager.java:489) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.commitSegment(SegmentCompletionManager.java:1085) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager$SegmentCompletionFSM.segmentCommitEnd(SegmentCompletionManager.java:660) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentCommitEnd(SegmentCompletionManager.java:326) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentCommitEndWithMetadata(LLCSegmentCompletionHandlers.java:444) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at jdk.internal.reflect.GeneratedMethodAccessor333.invoke(Unknown Source) ~[?:?]
pinot-controller-2 controller     at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
pinot-controller-2 controller     at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:134) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:177) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:81) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
pinot-controller-2 controller     at java.lang.Thread.run(Thread.java:829) [?:?]

@Jackie-Jiang
Contributor

Can you also check the WARN logs from the minion?

@snleee @swaminathanmanish Please help take a look

@estebanz01
Author

Sure thing. All WARN output goes to the same place, right? Or is there another specific location I should look into?

@estebanz01
Author

Here's the full log of a fresh minion pod:

minion pod log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-1.0.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-clp-log/pinot-clp-log-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-orc/pinot-orc-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-stream-ingestion/pinot-pulsar/pinot-pulsar-1.0.0-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
ERROR StatusLogger Reconfiguration failed: No configuration found for 'Default' at 'null' in 'null'
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-1.0.0-jar-with-dependencies.jar) to method java.lang.Object.finalize()
WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Mar 06, 2024 1:24:34 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:9514]
Mar 06, 2024 1:24:34 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
13:25:18.815 [TaskStateModelFactory-task_thread-0] ERROR org.apache.pinot.minion.taskfactory.TaskFactoryRegistry - Caught exception while executing task: Task_RealtimeToOfflineSegmentsTask_ddf2fd57-ed8f-4ee8-8c04-1e21137ed566_1709731500049_0
org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 1 attempts
	at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.common.utils.fetcher.HttpSegmentFetcher.fetchSegmentToLocal(HttpSegmentFetcher.java:62) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocalInternal(SegmentFetcherFactory.java:158) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchSegmentToLocal(SegmentFetcherFactory.java:152) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocalInternal(SegmentFetcherFactory.java:202) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.common.utils.fetcher.SegmentFetcherFactory.fetchAndDecryptSegmentToLocal(SegmentFetcherFactory.java:190) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:201) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.plugin.minion.tasks.BaseMultipleSegmentsConversionExecutor.executeTask(BaseMultipleSegmentsConversionExecutor.java:77) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.runInternal(TaskFactoryRegistry.java:157) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.pinot.minion.taskfactory.TaskFactoryRegistry$1.run(TaskFactoryRegistry.java:118) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at org.apache.helix.task.TaskRunner.run(TaskRunner.java:75) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

and the pod description:

Minion pod description
Name:             pinot-minion-0
Namespace:        pinot
Priority:         0
Service Account:  pinot
Node:            <internal-DNS-name>/<internal-IP>
Start Time:       Wed, 06 Mar 2024 08:23:42 -0500
Labels:           app=pinot
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/version=0.2.7
                  component=minion
                  controller-revision-hash=pinot-minion-84bbfbc6f4
                  helm.sh/chart=pinot-0.2.7
                  heritage=Helm
                  release=pinot
                  statefulset.kubernetes.io/pod-name=pinot-minion-0
Annotations:      kubectl.kubernetes.io/restartedAt: 2024-03-06T15:16:59+00:00
                  kubernetes.io/psp: eks.privileged
Status:           Running
IP:               <internal-IP>
IPs:
  IP:           <internal-IP>
Controlled By:  StatefulSet/pinot-minion
Containers:
  minion:
    Container ID:  containerd://e2cfae774017937ba2aa4f217d5f84a20809e4961c8920a82165bed4e290d2bf
    Image:         apachepinot/pinot:release-1.0.0
    Image ID:      docker.io/apachepinot/pinot@sha256:ef93c03cb223a30e2a0eb75452dfb2db1eab05271a59e2913845bff9814556bc
    Port:          9514/TCP
    Host Port:     0/TCP
    Args:
      StartMinion
      -clusterName
      pinot
      -zkAddress
      pinot-zookeeper:2181
      -configFileName
      /var/pinot/minion/config/pinot-minion.conf
    State:          Running
      Started:      Wed, 06 Mar 2024 08:23:49 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     400m
      memory:  1Gi
    Requests:
      cpu:      200m
      memory:   512Mi
    Liveness:   http-get http://:9514/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9514/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      s3-deep-storage-user  Secret  Optional: false
    Environment:
      JAVA_OPTS:            -XX:ActiveProcessorCount=2 -XX:MaxRAMPercentage=70.0 -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-minion.log -Dlog4j2.configurationFile=/opt/pinot/etc/config/pinot-minion-log4j2.xml -Dplugins.dir=/opt/pinot/plugins
      LOG4J_CONSOLE_LEVEL:  info
    Mounts:
      /var/pinot/minion/config from config (rw)
      /var/pinot/minion/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xhbhh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-pinot-minion-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      pinot-minion-config
    Optional:  false
  kube-api-access-xhbhh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  15m   default-scheduler  Successfully assigned pinot/pinot-minion-0 to <internal-dns-name>
  Normal   Pulled     15m   kubelet            Container image "apachepinot/pinot:release-1.0.0" already present on machine
  Normal   Created    15m   kubelet            Created container minion
  Normal   Started    15m   kubelet            Started container minion

@estebanz01
Author

estebanz01 commented Mar 21, 2024

OK, so while reading #12458 I noticed that my S3 bucket was empty, and I found it surprising that minions need S3 to work. I went to look at the controller configuration and discovered that if I specify the controller.data.dir property twice, it merges both values instead of overriding them 🙃. Now I have data in my S3 bucket, but the controller is giving the following error:

pinot-controller-0 controller INFO: [HttpServer] Started.
pinot-controller-0 controller 17:43:02.118 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Server: null returned error: 404
pinot-controller-0 controller 17:43:02.123 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Server: null returned error: 404
pinot-controller-0 controller 17:43:02.125 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Connection error. Details: java.net.UnknownHostException: Controller_null: Name or service not known
pinot-controller-0 controller 17:56:00.608 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Server: null returned error: 404
pinot-controller-0 controller 17:56:00.610 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Server: null returned error: 404
pinot-controller-0 controller 17:56:00.611 [grizzly-http-server-3] ERROR org.apache.pinot.controller.util.CompletionServiceHelper - Connection error. Details: java.net.UnknownHostException: Controller_null: Name or service not known
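
To illustrate the merge pitfall mentioned above: the comma-joined path in the earlier controller trace (`file:/var/pinot/controller/data,s3://...`) is what the merged value looks like. A minimal sketch, with illustrative paths:

```properties
# Pitfall: listing controller.data.dir twice does not override the first
# value; the two values end up merged into a single comma-joined path,
# which then surfaces in "Failed to move segment file" errors.
controller.data.dir=/var/pinot/controller/data
controller.data.dir=s3://<bucket-name>/pinot-data/pinot/controller-data

# Intended: a single deep-store location, plus a separate local temp dir.
controller.data.dir=s3://<bucket-name>/pinot-data/pinot/controller-data
controller.local.temp.dir=/var/pinot/controller/data
```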

Here's the task information, according to the UI:

Task config:
{
  "tableName": "data_counting_REALTIME",
  "configs": {
    "maxNumRecordsPerSegment": "100000",
    "mergeType": "rollup",
    "downloadURL": "http://pinot-controller:9000/segments/data_counting/data_counting__0__50__20240306T0440Z",
    "bufferTimePeriod": "2h",
    "push.mode": "TAR",
    "windowStartMs": "1709730000000",
    "segmentName": "data_counting__0__50__20240306T0440Z",
    "tableName": "data_counting_REALTIME",
    "collectorType": "rollup",
    "schedule": "0 0/5 * * * ?",
    "uploadURL": "http://pinot-controller:9000/segments",
    "push.controllerUri": "http://pinot-controller:9000",
    "__key.aggregationType": "min",
    "bucketTimePeriod": "1h",
    "windowEndMs": "1709733600000",
    "TASK_ID": "Task_RealtimeToOfflineSegmentsTask_4e81b60e-021b-4ba7-8b4c-03fd8f968d1b_1711033800254_0"
  },
  "taskId": "Task_RealtimeToOfflineSegmentsTask_4e81b60e-021b-4ba7-8b4c-03fd8f968d1b_1711033800254_0",
  "taskType": "RealtimeToOfflineSegmentsTask"
}

And here's the ConfigMap that the Pinot controller pods are using:

Name:         pinot-controller-config
Namespace:    pinot
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: pinot
              meta.helm.sh/release-namespace: pinot

Data
====
pinot-controller.conf:
----
controller.helix.cluster.name=pinot
controller.port=9000
controller.vip.host=pinot-controller
controller.vip.port=9000
controller.data.dir=s3://<bucket-name>/pinot-data/pinot/controller-data
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true
controller.task.scheduler.enabled=true
controller.local.temp.dir=/var/pinot/controller/data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=eu-west-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.storage.factory.s3.disableAcl=false

BinaryData
====

Events:  <none>

From what I understand, the controller is trying to fetch segments from either a null hostname or an invalid one, but the hosts are correct, or at least appear to be.

Any ideas on how to make it work after this progress?

@estebanz01
Author

By the way, what does pinot.set.instance.id.to.hostname=true do, and if I set it to false, how can I specify an alternative hostname?

@Jackie-Jiang
Contributor

You may look up the usage of CommonConstants.SET_INSTANCE_ID_TO_HOSTNAME_KEY in the code.
You can use the controller.host key to specify the hostname.
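
A sketch of what that alternative might look like, assuming the same pinot-controller.conf format shown earlier (values are illustrative):

```properties
# Instead of deriving the instance id from the pod hostname, disable the
# flag and name the host explicitly; the controller then registers as
# Controller_<host>_<port> rather than Controller_null_9000.
pinot.set.instance.id.to.hostname=false
controller.host=pinot-controller
controller.port=9000
```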

@estebanz01
Copy link
Author

OK, after lots of trial and error, this is what I did to have a working hybrid table with minion tasks and S3 deep storage:

Controller helm config
controller:
  # We make sure that only this configuration is present, as duplicated configs won't override but merge.
  data:
    dir: s3://<bucket-name>/<custom-path>/controller-data

  # If we don't specify the host and port, a `Controller_null_9000` controller will be seen by Pinot.
  host: pinot-controller
  port: 9000

  # Not sure why a `Controller_null_9000` appears if we have `vip` enabled, but oh well!
  vip:
    enable: true
    host: pinot-controller
    port: 9000

  # ...other configs
  configs: |-
    pinot.set.instance.id.to.hostname=true
    controller.task.scheduler.enabled=true
    # Super important! Data lives here until it's offloaded to S3.
    # (Note: in .properties syntax an inline # is not a comment and would
    # become part of the value, so the comment goes on its own line.)
    controller.local.temp.dir=/var/pinot/controller/data
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-east-1
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.controller.storage.factory.s3.disableAcl=false
Minion helm config
minion:
  # ... other configs
  extra:
    configs: |-
      pinot.set.instance.id.to.hostname=true
      pinot.minion.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
      pinot.minion.storage.factory.s3.region=us-east-1
      pinot.minion.segment.fetcher.protocols=file,http,s3
      pinot.minion.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Basically, we had to configure the S3 filesystem on the controller, server, and minion so the workers can fetch and upload/download data when needed. I'm not sure what this would look like with other deep-storage options, but it seems all three components must keep their storage configs in sync.
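
The server-side config isn't shown above; for completeness, a sketch of what it would presumably look like, mirroring the controller/minion keys (the `pinot.server.*` key names are assumptions following the same pattern, and the region is illustrative):

```properties
# Server-side counterpart to the controller/minion S3 configs above,
# so servers can also read segments from the S3 deep store.
pinot.set.instance.id.to.hostname=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-east-1
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```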

Thanks for the help on this!
