Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Feature】When the cluster capacity is almost full, make the cluster read only #2868

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

liuminjian
Copy link

@liuminjian liuminjian commented Nov 6, 2023

What problem does this PR solve?

Issue Number: #2561

Problem Summary: When the space of a single chunkserver of curvebs is insufficient, chunkserver will down directly

What is changed and how it works?

What's Changed:

1.Heartbeat reports disk full error and mds set copyset availflag false and set disk status error.
2.Copyset node leader set readonly when receive copyset availflag false from heartbeat.
3.If the disk becomes full while writing to the chunk file, the server return no space err and client hangs until space is freed up manually.

How it Works:

1.When the disk is full, the heartbeat uploads the disk status. MDS sets the disk status to error to prevent other copysets from migrating to this disk, and sets the copyset to be unavailable to avoid creating new space from these copysets.
2.When copyset status is unavailable, copysetnode will be set to readonly. when a new write request comes in, a read-only prompt will be returned.
3.If the disk becomes full while writing to the chunk file, the server return no space err and client hangs until space is freed up manually.

Side effects(Breaking backward compatibility? Performance regression?):
Older versions of chunkserver need to add disk limit usage percentage configuration

Check List

  • Relevant documentation/comments is changed or added
  • I acknowledge that all my contributions will be made under the project's license

@liuminjian liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from dc1cee6 to 1041ceb Compare November 10, 2023 04:17
message DiskState {
required uint32 errType = 1;
required ErrorType errType = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using ErrorType instead of uint32 satisfy compatibility?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i have checked them all

}
}
// 等待写操作完成,否则on_apply结束后,异步有写错误无法调用set_error_and_rollback()
concurrentapply_->Flush();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause performance degradation, which is not acceptable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any better ideas. When the on_apply() method completes, last_applied_index will be updated and the Iterator will be destructed, but concurrent tasks may not be completed yet. Calling iterator->set_error_and_callback() may fail when a write error occurs.

@@ -238,6 +238,10 @@ void ClientClosure::Run() {
OnEpochTooOld();
break;

case CHUNK_OP_STATUS::CHUNK_OP_STATUS_READONLY:
OnReadOnly();
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the space is full, the client needs to retry

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@@ -100,13 +101,41 @@ void HeartbeatManager::UpdateChunkServerDiskStatus(
const ChunkServerHeartbeatRequest &request) {
// update ChunkServerState status (disk status)
ChunkServerState state;
if (request.diskstate().errtype() != 0) {

switch (request.diskstate().errtype())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the code style should be consistent with the code repository

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

topology_->SetCopySetAvalFlag(key, false);
}
// 设置disk error,copyset就不会迁移到这个chunkserver
state.SetDiskState(curve::mds::topology::DISKERROR);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a new disk state, maybe DISKFULL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added DISKFULL status

@wuhongsong
Copy link
Contributor

cicheck

@wuhongsong wuhongsong closed this Dec 1, 2023
@wuhongsong wuhongsong reopened this Dec 1, 2023
if (errno == EINTR && retryTimes < MAX_RETYR_TIME) {
++retryTimes;
continue;
} else if (errno == ENOSPC) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改在这里可能不合适,需要返回错误,以阻止client端不停的重试IO导致更多的空间不足

@YunhuiChen
Copy link
Contributor

cicheck

1 similar comment
@YunhuiChen
Copy link
Contributor

cicheck

@YunhuiChen YunhuiChen closed this Dec 2, 2023
@YunhuiChen YunhuiChen reopened this Dec 2, 2023
@YunhuiChen
Copy link
Contributor

cicheck

@liuminjian liuminjian force-pushed the feat/clusterfull branch 3 times, most recently from 3134351 to b9219d6 Compare December 6, 2023 02:35
@wu-hanqing wu-hanqing self-requested a review December 21, 2023 09:45
<< ", request: " << request.ShortDebugString();
}
break;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
};
}

LOG(WARNING) << "write failed: "
<< " data store return: " << ret
<< ", request: " << request_->ShortDebugString();
sleep(WAIT_FOR_DISK_FREED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function may be executed in bthread, it's better to use bthread_usleep

@@ -85,6 +85,8 @@ enum CHUNK_OP_STATUS {
CHUNK_OP_STATUS_BACKWARD = 10; // 请求的版本落后当前chunk的版本
CHUNK_OP_STATUS_CHUNK_EXIST = 11; // chunk已存在
CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12; // request epoch too old
CHUNK_OP_STATUS_READONLY = 13; // copyset其他节点故障,设为只读
CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误
CHUNK_OP_STATUS_NO_SPACE = 14; // 空间不足错误

@@ -85,6 +85,8 @@ enum CHUNK_OP_STATUS {
CHUNK_OP_STATUS_BACKWARD = 10; // 请求的版本落后当前chunk的版本
CHUNK_OP_STATUS_CHUNK_EXIST = 11; // chunk已存在
CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12; // request epoch too old
CHUNK_OP_STATUS_READONLY = 13; // copyset其他节点故障,设为只读
CHUNK_OP_STATUS_ENOSPC = 14; // 空间不足错误
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use English comments

@@ -71,8 +71,13 @@ message CopysetStatistics {
required uint32 writeIOPS = 4;
}

enum ErrorType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse DiskState in topology.proto?

std::vector<CopysetNodePtr> copysets;
copysetMan_->GetAllCopysetNodes(&copysets);

req->set_copysetcount(copysets.size());
int leaders = 0;

for (CopysetNodePtr copyset : copysets) {

// 如果磁盘空间不足设为readonly
if (diskState->errtype() == curve::mds::heartbeat::DISKFULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to call SetReadOnly only if disk state changed

} else if (CSErrorCode::NoSpaceError == ret) {
LOG(ERROR) << "paste chunk failed: "
<< ", request: " << request_->ShortDebugString();
sleep(WAIT_FOR_DISK_FREED);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, use bthread_usleep and it's better to add WAIT_FOR_DISK_FREED into configuration file like chunkfilepool.diskUsagePercentLimit

<< ", request: " << request.ShortDebugString();
}
break;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
};
}

curve::mds::heartbeat::ErrorType errType = request.diskstate().errtype();

if (errType == curve::mds::heartbeat::DISKFULL) {
// 当chunkserver磁盘接近满,需要将copyset availflag设为false,避免新空间从这些copyset分配
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use English comments

tools-v2/go.mod Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caoxianfei1 PTAL~

@liuminjian liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from e4f77ce to 6f09bcd Compare December 23, 2023 05:36
@wu-hanqing
Copy link
Contributor

cicheck

1 similar comment
@wu-hanqing
Copy link
Contributor

cicheck

@wu-hanqing
Copy link
Contributor

cicheck

@wu-hanqing
Copy link
Contributor

cicheck

@liuminjian liuminjian force-pushed the feat/clusterfull branch 14 times, most recently from c4e6aca to 9771cbf Compare December 28, 2023 07:04
…ead only

Signed-off-by: liuminjian <liuminjian@chinatelecom.cn>
@liuminjian
Copy link
Author

cicheck

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants