
Allow peerDownload config for minion-tasks #12458

Open · tibrewalpratik17 opened this issue Feb 21, 2024 · 3 comments · May be fixed by #12960
tibrewalpratik17 (Contributor) commented Feb 21, 2024

Currently, minion jobs always use the zkMetadata downloadURI to download segments from the deepstore.

I want to get the community's opinion on adding a new optional task-level config, allowPeerDownload, which would let a minion task fall back to downloading the segment directly from a server peer once the deepstore retries fail. Currently the job fails and does not move forward. This also creates head-of-line blocking for subsequent task runs when tableMaxNumTasks is specified.

PS: this issue specifically discusses the situation where the zkMetadata has a deepstore URI available, i.e. it is not "" (empty).
There can be multiple reasons for a deepstore URI download to fail:

  • Issues with the deepstore connection / timeouts
  • The segment's deepstore copy getting TTL'ed out of the deepstore by other, non-Pinot frameworks (we are seeing this for some of our clusters).
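To make the proposed behavior concrete, here is a minimal, hypothetical sketch of the fallback logic (this is not Pinot code; downloadSegment, fetch, and the URIs are illustrative names, and the real implementation would live in the minion task executor's segment-download path):

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the proposed allowPeerDownload fallback:
// try the deepstore URI first, and only when the flag is enabled,
// try server peers before failing the task.
public class SegmentDownloadSketch {

  static String downloadSegment(String deepstoreUri, List<String> peerUris,
                                boolean allowPeerDownload,
                                Function<String, String> fetch) {
    try {
      return fetch.apply(deepstoreUri);       // normal path: deepstore
    } catch (RuntimeException deepstoreFailure) {
      if (!allowPeerDownload) {
        throw deepstoreFailure;               // current behavior: task fails
      }
      for (String peer : peerUris) {          // proposed: try each peer
        try {
          return fetch.apply(peer);
        } catch (RuntimeException ignored) {
          // this peer cannot serve the segment; try the next one
        }
      }
      throw deepstoreFailure;                 // no peer could serve it either
    }
  }

  public static void main(String[] args) {
    // Simulated fetcher: the deepstore copy was TTL'ed away, but a peer has it.
    Function<String, String> fetch = uri -> {
      if (uri.startsWith("s3://")) {
        throw new RuntimeException("segment TTL'ed from deepstore");
      }
      return "segment-bytes-from-" + uri;
    };
    String result = downloadSegment("s3://bucket/table/seg_0",
        List.of("http://server-1:8097/segments/seg_0"), true, fetch);
    System.out.println(result);
  }
}
```

With allowPeerDownload set to false, the sketch reproduces today's behavior (the deepstore failure propagates and the task fails); with it set to true, the task only fails if neither the deepstore nor any peer can serve the segment.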

For example, we have an upsert-compaction task enabled for a table with the following configs:

"task": {
  "taskTypeConfigsMap": {
    "UpsertCompactionTask": {
      "invalidRecordsThresholdPercent": "30",
      "bufferTimePeriod": "0d",
      "schedule": "0 */5 * * * ?",
      "tableMaxNumTasks": "5"
    }
  }
},

The table has data for more than 60 days, but a 7-day TTL was enforced on the deepstore path.

The following graph shows a drop in row count (red circle) when UpsertCompactionTask first kicked off (at that time I had removed the "tableMaxNumTasks": "5" config). But once I added that config back, no task gets executed (not even for newer segments), because the queue is blocked by a FileNotFoundException while downloading older segments from the deepstore.

[Screenshot, 2024-02-21: row-count graph showing the drop described above]

The table has huge potential for cost savings via compaction, and it seems we can use peer download to unblock the task.
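To illustrate, the task config above might gain one optional key, sketched below (the field name and string-boolean value are illustrative; the final name is still under discussion in this thread):

```json
"task": {
  "taskTypeConfigsMap": {
    "UpsertCompactionTask": {
      "invalidRecordsThresholdPercent": "30",
      "bufferTimePeriod": "0d",
      "schedule": "0 */5 * * * ?",
      "tableMaxNumTasks": "5",
      "allowPeerDownload": "true"
    }
  }
}
```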

Another, parallel discussion:
There is a framework that periodically checks whether the zkMetadata URI is empty and, if so, uploads the segment to the deepstore, but there is no framework that checks whether the path pointed to by the zkMetadata URI is actually present.

Jackie-Jiang (Contributor) commented:
I'm good with the proposal. Maybe rename it to allowDownloadFromServer?

tibrewalpratik17 (Contributor, Author) commented:
Please assign to me. Thanks!

Jackie-Jiang (Contributor) commented:
cc @snleee @swaminathanmanish
