
[HUDI-1517] create marker file for every log file #11187

Merged: 3 commits merged into apache:branch-0.x on May 14, 2024

Conversation

@codope (Member) commented on May 10, 2024

Change Logs

NOTE: This is a revision of #10487 for the 0.x line, with improvements to avoid unnecessary listing. The description is copied from the original PR; however, please check the Testing section below, which lists the coverage and the other error-injection tests we ran for this patch.

We are looking to fix two problems by adding a per-log-file marker.
a.i. MOR data table rollbacks fail to sync the original log files from the failed commit to the MDT.
a.ii. Along the same lines, if a rollback instant is retried multiple times, any log files added by failed rollback attempts should also be synced to the MDT.
b. If spurious log files are created even with successful commits, we need to ensure these spurious log files are also synced to the MDT.

So, to fix all of the above, we are adding a per-log-file marker. Any log file that is added or appended to will create a marker. We don't really need to distinguish between create and append, so we go with the APPEND IOType for markers.
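To make the mechanism concrete, below is a minimal, self-contained sketch in plain Java (not the PR's actual code, which goes through WriteMarkers and a log write callback). It assumes the direct marker layout `.hoodie/.temp/<instantTime>/<partitionPath>/<fileName>.marker.<IOType>`; the class name, method name, and example log file name are illustrative.

```java
// Illustrative sketch only: shows what a per-log-file APPEND marker amounts to
// under the assumed direct marker layout. Hudi's real code path uses
// WriteMarkers and a log write callback instead of this helper.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogFileMarkerSketch {

  static Path createAppendMarker(Path basePath, String instantTime,
                                 String partitionPath, String logFileName) throws IOException {
    // Assumed layout: .hoodie/.temp/<instantTime>/<partitionPath>/<logFileName>.marker.APPEND
    Path markerDir = basePath.resolve(".hoodie").resolve(".temp")
        .resolve(instantTime).resolve(partitionPath);
    Files.createDirectories(markerDir);
    // The marker is an empty file; only its name (log file name + IO type) matters.
    // Create and append are not distinguished: every log file gets an APPEND marker.
    Path marker = markerDir.resolve(logFileName + ".marker.APPEND");
    if (!Files.exists(marker)) {
      Files.createFile(marker);
    }
    return marker;
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("hudi-table");
    // Hypothetical instant, partition, and log file name, just for illustration.
    Path marker = createAppendMarker(base, "20240514103000000", "2024/05/14",
        ".f1e2d3c4_20240514103000000.log.1_0-0-1");
    System.out.println("Created marker: " + marker);
  }
}
```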

Fix for (a.i): Any log file added emits a marker. If the commit of interest fails, Hudi triggers a rollback. During rollback planning, we use the markers to identify the original log files added by the failed commit and track them as part of the rollback plan. This is also tracked in HoodieRollbackMetadata (the schema had to be upgraded for this purpose).
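On the rollback-planning side, a sketch under the same assumptions could look like the following: scan the instant's marker directory for APPEND markers and recover the relative paths of the log files that the failed commit touched. The class and method names are again illustrative; in the patch this logic lives behind the markers API and MarkerBasedRollbackStrategy.

```java
// Illustrative sketch: recover log file paths from APPEND markers during
// rollback planning, assuming the same marker layout as the sketch above.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MarkerScanSketch {

  private static final String APPEND_MARKER_SUFFIX = ".marker.APPEND";

  // Returns relative paths (partitionPath/logFileName) of log files that have an APPEND marker.
  static List<String> logFilesFromMarkers(Path basePath, String instantTime) throws IOException {
    Path markerRoot = basePath.resolve(".hoodie").resolve(".temp").resolve(instantTime);
    if (!Files.isDirectory(markerRoot)) {
      return List.of();
    }
    try (Stream<Path> markers = Files.walk(markerRoot)) {
      return markers
          .filter(p -> p.getFileName().toString().endsWith(APPEND_MARKER_SUFFIX))
          .map(p -> {
            String name = p.getFileName().toString();
            String logFileName = name.substring(0, name.length() - APPEND_MARKER_SUFFIX.length());
            // Rebuild "partitionPath/logFileName" relative to the marker root.
            return markerRoot.relativize(p.getParent()).resolve(logFileName).toString();
          })
          .collect(Collectors.toList());
    }
  }
}
```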

Fix for (a.ii): Whenever a rollback is triggered, Hudi adds a rollback command block. With this patch, we also emit markers for such log files. During rollback execution, apart from adding the log files written by the failed commit to HoodieRollbackMetadata, we also add log files that could have been added by previous attempts at rolling back the same instant.

Fix for (b): During the marker-based reconciliation step, we check the log files found from markers against HoodieCommitMetadata's HoodieWriteStat. For any additional files tracked only by markers (which can happen due to Spark task retries), we add a new HoodieWriteStat and update HoodieCommitMetadata, so that when this syncs to the MDT we don't miss tracking these spurious log files. We will use #9545 to skip such spurious log files on the reader side; on the writer side, we just want to ensure we don't miss tracking any log file created by Hudi.
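A hedged sketch of that reconciliation idea, using a stand-in SimpleWriteStat type instead of the real HoodieWriteStat/HoodieCommitMetadata classes: given the log files (and listed sizes) found via markers and the relative paths already tracked in the commit metadata, produce write stats only for the spurious ones.

```java
// Illustrative sketch: build write stats for log files that appear in markers
// but not in the commit metadata. SimpleWriteStat is a stand-in, not Hudi's
// HoodieWriteStat.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReconcileSketch {

  static class SimpleWriteStat {
    final String partitionPath;
    final String path;
    final long fileSizeInBytes;

    SimpleWriteStat(String partitionPath, String path, long fileSizeInBytes) {
      this.partitionPath = partitionPath;
      this.path = path;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // markerLogFiles: relative log file paths found via markers, with their listed sizes.
  // trackedPaths: relative paths already present in the commit metadata's write stats.
  static List<SimpleWriteStat> statsForSpuriousLogFiles(Map<String, Long> markerLogFiles,
                                                        Set<String> trackedPaths) {
    List<SimpleWriteStat> extraStats = new ArrayList<>();
    for (Map.Entry<String, Long> entry : markerLogFiles.entrySet()) {
      String relativePath = entry.getKey();
      if (trackedPaths.contains(relativePath)) {
        continue; // already accounted for in the commit metadata
      }
      int slash = relativePath.lastIndexOf('/');
      String partition = slash > 0 ? relativePath.substring(0, slash) : "";
      extraStats.add(new SimpleWriteStat(partition, relativePath, entry.getValue()));
    }
    return extraStats;
  }
}
```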

Note: Reconciliation for log files is roughly the opposite of what happens with data files. For data files, any extraneous files are deleted. For extraneous log files, we cannot afford to delete them, since a concurrent reader could be reading the file slice of interest; during execution it may parse the log block header and skip the block if it belongs to a partially failed or inflight commit. In short, we cannot delete log files at any point in time except via the cleaner. So, for any extraneous log files detected, we fix HoodieCommitMetadata to track these additional log files as well.

Notes to assist reviewers:

I will break down the different sets of changes and the classes to review for each.

  1. Adding a per-log-file marker for regular log files: added a callback (AppendLogWriteCallback) for this purpose, since we may not know the log file name upfront (unlike data files). New APIs are introduced in the markers for this purpose.
    Check files: HoodieWriteHandle, HoodieAppendHandle, HoodieLogFormatWriter, HoodieLogFormat, HoodieLogFileWriterCallback,
    DirectWriteMarkers, TimelineServerBasedWriteMarkers, WriteMarkers, RequestHandler.

  2. Schema upgrade for rollback metadata.
    Check file: HoodieRollbackMetadata.avsc

  3. Rollback from the DT when synced to the MDT: with the schema upgrade, we fetch "logFilesFromFailedCommit" from HoodieRollbackMetadata and make a delta commit to the MDT.
    Check files: HoodieBackedTableMetadataWriter and HoodieTableMetadataUtil.

  4. Rollback plan changes: when using the marker-based rollback strategy, we poll markers to find the log files added. Apart from the log file names, we also need the actual sizes, so we do a file system listing to fetch the file lengths. We track these as part of HoodieRollbackRequest.logFilesWithBlocksToRollback. Some files tracked in markers could be missing; we can ignore them, since the process may have crashed right after creating the marker file but before creating the actual log file.
    Classes to check: MarkerBasedRollbackStrategy

  5. New argument to StorageScheme named listStatusUnfriendly. Depending on the storage scheme, the file-system listing used to determine the actual sizes of the log files of interest could change. As per this patch, we have only one way to do this, but a follow-up patch may have to revisit that.

  6. Fixing HoodieCommitMetadata to include a HoodieWriteStat for any extraneous log files detected through markers but missing from the metadata.
    Classes to check: SparkRDDWriteClient.commit and addMissingLogFileIfNeeded(). This also includes the file system listing to fetch the actual sizes of spurious log files, if any.

  7. Rollback execution:
    a. With this patch, we also emit markers for rollback command blocks (log files).
    b. During execution, we also need to fetch any log files added by previous attempts of the rollback and update the HoodieRollbackStat if need be. Note that we can have only one HoodieRollbackPlan per partition.
    Classes: BaseRollbackHelper.

  8. Misc:
    HoodiePairData to add join() support. HoodieListPairData, HoodieJavaPairRDD.
    FSUtils.getFileStatusesUnderPartition to assist in fetching file statuses for some log files of interest (a minimal sketch of this kind of batched lookup follows this list).
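As referenced in item 8, here is a minimal sketch of that kind of batched lookup in plain java.nio (not the actual FSUtils helper): list a partition once and return sizes only for the log files of interest, silently skipping names that have a marker but no actual file, as described in item 4.

```java
// Illustrative sketch: one listing per partition, filtered to the files we care
// about; names with no backing file are simply absent from the result.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

public class PartitionListingSketch {

  static Map<String, Long> lengthsForInterestedLogFiles(Path partitionDir,
                                                        Set<String> interestedFileNames) throws IOException {
    Map<String, Long> result = new HashMap<>();
    if (!Files.isDirectory(partitionDir)) {
      return result; // partition may not exist yet; nothing to report
    }
    try (Stream<Path> files = Files.list(partitionDir)) {
      files.filter(p -> interestedFileNames.contains(p.getFileName().toString()))
           .forEach(p -> {
             try {
               result.put(p.getFileName().toString(), Files.size(p));
             } catch (IOException e) {
               // File disappeared between listing and stat; treat it like a missing log file.
             }
           });
    }
    return result;
  }
}
```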

Testing:

The tests below are covered in this PR. Here, we generate log files with markers, inject failures, and then validate using the file system view.

  1. simple insert directly to log files followed by compaction
  2. rollback after failure before updating MDT
  3. rollback after failure during updating MDT
  4. rollback after failure after updating MDT
  5. insert followed by upsert and then failed rollback
  6. same as above and then reattempt rollback

Apart from these, we have a bunch of other tests covering metadata validation for the general workflows.

Long running error injection tests:

  1. MOR w/ async cleaning, compaction, MDT enabled, no RLI
  2. MOR w/ async cleaning, compaction, inline clustering, MDT enabled, no RLI
  3. MOR w/ async cleaning, compaction, MDT enabled, RLI enabled
  4. MOR w/ async cleaning, compaction, inline clustering, MDT enabled, RLI enabled

Impact

The MDT stays in sync with the file system.

Risk level (write none, low, medium, or high below)

high

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed.
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope changed the title Hudi 1517 log file markers [HUDI-1517] create marker file for every log file May 10, 2024
@codope codope requested a review from nsivabalan May 10, 2024 07:34
@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label May 10, 2024
@yihua yihua assigned nsivabalan and danny0405 and unassigned nsivabalan May 14, 2024
@danny0405 (Contributor)

If we are aiming to revert #9545, how do we make the reader work with spurious log files? I still think the change is huge and it will bring in too much maintenance overhead for the 0.x branch; if we are saying 0.15.0 is the last release of 0.x, I'm fine with this.

@KnightChess (Contributor)

Hello @codope, I have a question about problem (b) that you mentioned: why not delete the extra orphaned files using the marker files and commit metadata, like the COW table does, instead of leaving them and relying on #9545 when reading?

@danny0405 (Contributor)

> but instead leave them to use #9545 when reading.

I think it is because in the 0.x branch, we are appending to existing log files instead of creating new ones.

@KnightChess (Contributor)

@danny0405 Sorry, I don't quite understand. For example, a write task creates a new a_0-0-1 log file, and its speculative task also creates a new a_0-0-2 log file; these two files are the same, each with its own marker log file. The Spark driver will receive the file metadata stat from whichever task finishes first, so the other is an orphaned file that I think we need to delete when resolving the file conflict using the log marker files. Then, at the next commit time, the remaining file can be appended to.

@codope codope force-pushed the hudi-1517-log-file-markers branch from e3d598b to ba291d8 on May 14, 2024
@nsivabalan nsivabalan merged commit c2c7e05 into apache:branch-0.x May 14, 2024
31 checks passed
@danny0405 (Contributor)

> For example, a write task creates a new a_0-0-1 log file, and its speculative task also creates a new a_0-0-2 log file...

The two attempts appended to the same file.

@KnightChess (Contributor)

@danny0405 If a file group only has a base file, the normal task and the speculative task will each create a new log file with a different sequence number, like #10803 (comment) said.

@danny0405 (Contributor)

Yeah, you are right.

Labels: release-0.15.0, size:XL (PR with lines of changes > 1000)