【Feature】When the cluster capacity is almost full, make the cluster read only #2868

liuminjian · 2023-11-06T10:04:47Z

What problem does this PR solve?

Issue Number: #2561

Problem Summary: When a single CurveBS chunkserver runs out of disk space, the chunkserver goes down directly.

What is changed and how it works?

What's Changed:

1. The heartbeat reports a disk-full error; MDS then sets the copyset availflag to false and sets the disk status to error.
2. The copyset node leader is set to readonly when it receives availflag false for the copyset from the heartbeat.
3. If the disk becomes full while writing to a chunk file, the server returns a no-space error and the client hangs until space is freed up manually.

How it Works:

1. When the disk is full, the heartbeat reports the disk status. MDS sets the disk status to error to prevent other copysets from migrating to this disk, and marks the copysets as unavailable so that no new space is allocated from them.
2. When a copyset is unavailable, its copysetnode is set to readonly. When a new write request comes in, a read-only status is returned.
3. If the disk becomes full while writing to a chunk file, the server returns a no-space error and the client hangs until space is freed up manually.
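
The first two steps above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the PR's code: `ReportedDiskState`, `MdsView`, `CopysetNodeSketch`, and `HandleWrite` are hypothetical names, and the real logic is spread across the chunkserver heartbeat, the MDS heartbeat manager, and the copyset node.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the states named in the PR description.
enum class ReportedDiskState { kNormal, kFull };
enum class WriteStatus { kSuccess, kReadOnly };

// What MDS tracks in this sketch.
struct MdsView {
    std::map<uint32_t, bool> copysetAvailable;  // may new space come from it?
    bool acceptsMigration = true;               // may copysets move to this disk?

    // Step 1: the heartbeat carries the disk state. On "full", MDS stops
    // migrations to the disk and marks the copyset unavailable. The return
    // value is the avail flag sent back in the heartbeat response.
    bool OnHeartbeat(uint32_t copysetId, ReportedDiskState reported) {
        bool ok = (reported == ReportedDiskState::kNormal);
        acceptsMigration = ok;
        copysetAvailable[copysetId] = ok;
        return ok;
    }
};

struct CopysetNodeSketch {
    bool readonly = false;

    // Step 2: an unavailable copyset becomes readonly on the chunkserver.
    void OnHeartbeatResponse(bool available) { readonly = !available; }

    // New writes are rejected with a read-only status while readonly is set.
    WriteStatus HandleWrite() const {
        return readonly ? WriteStatus::kReadOnly : WriteStatus::kSuccess;
    }
};

// Usage example: a full disk turns the copyset readonly.
inline void ReadOnlyFlowExample() {
    MdsView mds;
    CopysetNodeSketch node;
    node.OnHeartbeatResponse(mds.OnHeartbeat(1, ReportedDiskState::kFull));
    assert(node.HandleWrite() == WriteStatus::kReadOnly);
}
```

The sketch collapses the heartbeat round trip into two function calls; in the PR the flag travels through the heartbeat request and response.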

Side effects(Breaking backward compatibility? Performance regression?):
Older versions of chunkserver need to add the disk usage percentage limit configuration.
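
As an illustration of what such a limit could mean, here is a minimal sketch of a threshold check. The function name `DiskUsageOverLimit` and the exact semantics (the disk counts as full once the used percentage reaches the limit) are assumptions for illustration; the PR's actual check may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when used space has reached the configured
// percentage of total capacity (for example, a limit of 95).
bool DiskUsageOverLimit(uint64_t usedBytes, uint64_t totalBytes,
                        uint32_t percentLimit) {
    assert(percentLimit <= 100);
    if (totalBytes == 0) {
        return true;  // no capacity information: conservatively treat as full
    }
    // Compare in floating point to avoid overflowing usedBytes * 100.
    double usedPercent = 100.0 * static_cast<double>(usedBytes) /
                         static_cast<double>(totalBytes);
    return usedPercent >= static_cast<double>(percentLimit);
}
```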

Check List

Relevant documentation/comments is changed or added
I acknowledge that all my contributions will be made under the project's license

xu-chaojie · 2023-11-10T08:27:50Z

proto/heartbeat.proto

 message DiskState {
-    required uint32 errType = 1;
+    required ErrorType errType = 1;


Does using ErrorType instead of uint32 satisfy compatibility?

liuminjian (Nov 15, 2023): Yes, I have checked them all.

xu-chaojie · 2023-11-10T08:30:19Z

src/chunkserver/copyset_node.cpp

        }
    }
+    // Wait for write operations to complete; otherwise, once on_apply returns, an asynchronous write error can no longer call set_error_and_rollback()
+    concurrentapply_->Flush();


This will cause performance degradation, which is not acceptable

liuminjian (Nov 15, 2023): I don't have any better ideas. When the on_apply() method completes, last_applied_index is updated and the Iterator is destructed, but concurrent tasks may not have completed yet. If a write error occurs after that point, calling iterator->set_error_and_rollback() may fail.

xu-chaojie · 2023-11-10T08:36:32Z

src/client/chunk_closure.cpp

@@ -238,6 +238,10 @@ void ClientClosure::Run() {
            OnEpochTooOld();
            break;

+        case CHUNK_OP_STATUS::CHUNK_OP_STATUS_READONLY:
+            OnReadOnly();
+            break;


When the space is full, the client needs to retry
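
A minimal sketch of the retry behavior requested here, assuming the client treats both the read-only and the no-space status as retriable. `OpStatus`, `RetryResult`, and `RetryWhileFull` are hypothetical names, and the attempt cap exists only to keep the sketch bounded; the PR description says the real client waits until space is freed.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status codes; the PR adds READONLY and ENOSPC to CHUNK_OP_STATUS.
enum class OpStatus { kSuccess, kReadOnly, kNoSpace, kOtherError };

struct RetryResult {
    OpStatus status;  // status of the last attempt
    int attempts;     // number of attempts made
};

// Re-send the write while the server reports a space-related status,
// sleeping between attempts so the cluster is not flooded with requests.
RetryResult RetryWhileFull(const std::function<OpStatus()>& sendWrite,
                           int maxAttempts,
                           std::chrono::milliseconds interval) {
    assert(maxAttempts > 0);
    RetryResult result{OpStatus::kOtherError, 0};
    while (result.attempts < maxAttempts) {
        ++result.attempts;
        result.status = sendWrite();
        bool spaceRelated = result.status == OpStatus::kReadOnly ||
                            result.status == OpStatus::kNoSpace;
        if (!spaceRelated) {
            break;  // success or an unrelated error: stop retrying
        }
        if (result.attempts < maxAttempts) {
            std::this_thread::sleep_for(interval);
        }
    }
    return result;
}
```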

xu-chaojie · 2023-11-10T08:38:22Z

src/mds/heartbeat/heartbeat_manager.cpp

@@ -100,13 +101,41 @@ void HeartbeatManager::UpdateChunkServerDiskStatus(
    const ChunkServerHeartbeatRequest &request) {
    // update ChunkServerState status (disk status)
    ChunkServerState state;
-    if (request.diskstate().errtype() != 0) {
+
+    switch (request.diskstate().errtype())


Note that the code style should be consistent with the code repository

xu-chaojie · 2023-11-10T08:45:41Z

src/mds/heartbeat/heartbeat_manager.cpp

+            topology_->SetCopySetAvalFlag(key, false);  
+        }
+        // Set disk error so that copysets will not be migrated to this chunkserver
+        state.SetDiskState(curve::mds::topology::DISKERROR);


add a new disk state, maybe DISKFULL?

liuminjian (Nov 15, 2023): I have added a DISKFULL status.

wuhongsong · 2023-12-01T02:12:48Z

cicheck

xu-chaojie · 2023-12-01T02:57:53Z

src/fs/ext4_filesystem_impl.cpp

+        if (errno == EINTR && retryTimes < MAX_RETYR_TIME) {
+            ++retryTimes;
+            continue;
+        } else if (errno == ENOSPC) {


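Making the change here may not be appropriate. An error needs to be returned, so that the client does not keep retrying IO and make the space shortage worse.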
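
One way to read this suggestion is sketched below: the write path retries only on `EINTR` (up to a bound) and surfaces `ENOSPC`, like any other error, to the caller immediately. `WriteAll` and `kMaxRetryTimes` are hypothetical names, and plain `write(2)` is used here for simplicity; this is not the code in `ext4_filesystem_impl.cpp`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

constexpr int kMaxRetryTimes = 3;  // hypothetical bound on EINTR retries

// Writes len bytes to fd. Returns len on success or -errno on failure.
// ENOSPC is not retried here: it is returned so that upper layers can
// decide how to react instead of looping on a full disk.
ssize_t WriteAll(int fd, const char* buf, size_t len) {
    assert(buf != nullptr);
    size_t done = 0;
    int retryTimes = 0;
    while (done < len) {
        ssize_t n = ::write(fd, buf + done, len - done);
        if (n > 0) {
            done += static_cast<size_t>(n);
            continue;
        }
        if (n == 0) {
            return -EIO;  // no progress: report an error rather than spin
        }
        if (errno == EINTR && retryTimes < kMaxRetryTimes) {
            ++retryTimes;  // interrupted by a signal: safe to retry
            continue;
        }
        return -errno;  // includes ENOSPC
    }
    return static_cast<ssize_t>(done);
}
```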

YunhuiChen · 2023-12-02T03:37:31Z

cicheck

YunhuiChen · 2023-12-02T03:39:43Z

cicheck

YunhuiChen · 2023-12-02T04:38:14Z

cicheck

wu-hanqing · 2023-12-21T09:53:02Z

src/chunkserver/op_request.cpp

+                      << ", request: " << request.ShortDebugString();
+       }
+       break;
+    };


Suggested change

-    };
+    }

wu-hanqing · 2023-12-21T09:54:00Z

src/chunkserver/op_request.cpp

+           LOG(WARNING) << "write failed: "
+                        << " data store return: " << ret
+                        << ", request: " << request_->ShortDebugString();
+           sleep(WAIT_FOR_DISK_FREED);             


This function may be executed in a bthread, so it's better to use bthread_usleep.

wu-hanqing · 2023-12-21T14:12:29Z

proto/chunk.proto

@@ -85,6 +85,8 @@ enum CHUNK_OP_STATUS {
    CHUNK_OP_STATUS_BACKWARD = 10;          // the requested version is behind the chunk's current version
    CHUNK_OP_STATUS_CHUNK_EXIST = 11;       // chunk already exists
    CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12;     // request epoch too old
+    CHUNK_OP_STATUS_READONLY = 13;          // another node in the copyset failed, set to read-only
+    CHUNK_OP_STATUS_ENOSPC = 14;            // insufficient space error


Suggested change

-    CHUNK_OP_STATUS_ENOSPC = 14;            // insufficient space error
+    CHUNK_OP_STATUS_NO_SPACE = 14;          // insufficient space error

wu-hanqing · 2023-12-21T14:13:56Z

proto/chunk.proto

@@ -85,6 +85,8 @@ enum CHUNK_OP_STATUS {
    CHUNK_OP_STATUS_BACKWARD = 10;          // 请求的版本落后当前chunk的版本
    CHUNK_OP_STATUS_CHUNK_EXIST = 11;       // chunk已存在
    CHUNK_OP_STATUS_EPOCH_TOO_OLD = 12;     // request epoch too old
+    CHUNK_OP_STATUS_READONLY = 13;          // copyset其他节点故障，设为只读
+    CHUNK_OP_STATUS_ENOSPC = 14;            // 空间不足错误


Please use English comments

wu-hanqing · 2023-12-21T14:31:36Z

proto/heartbeat.proto

@@ -71,8 +71,13 @@ message CopysetStatistics {
    required uint32 writeIOPS = 4;
 }

+enum ErrorType {


reuse DiskState in topology.proto?

wu-hanqing · 2023-12-21T14:57:55Z

src/chunkserver/heartbeat.cpp

    std::vector<CopysetNodePtr> copysets;
    copysetMan_->GetAllCopysetNodes(&copysets);

    req->set_copysetcount(copysets.size());
    int leaders = 0;

    for (CopysetNodePtr copyset : copysets) {
+
+        // If disk space is insufficient, set the copyset to readonly
+        if (diskState->errtype() == curve::mds::heartbeat::DISKFULL) {


it's better to call SetReadOnly only if disk state changed
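
A minimal sketch of this suggestion: remember the last reported disk state and touch the copysets only on a transition. `DiskStateWatcher` and `FakeCopyset` are hypothetical types for illustration, and the `SetReadOnly(bool)` signature is assumed; the real loop in `heartbeat.cpp` iterates over `CopysetNodePtr` objects.

```cpp
#include <cassert>
#include <vector>

enum class DiskState { kNormal, kFull };

// Hypothetical stand-in for a copyset node; counts calls for illustration.
struct FakeCopyset {
    bool readonly = false;
    int setReadOnlyCalls = 0;
    void SetReadOnly(bool value) {
        readonly = value;
        ++setReadOnlyCalls;
    }
};

class DiskStateWatcher {
 public:
    // Called once per heartbeat. Returns true if the copysets were updated,
    // which happens only when the disk state differs from the previous one.
    bool OnHeartbeat(DiskState current, std::vector<FakeCopyset>* copysets) {
        assert(copysets != nullptr);
        if (current == last_) {
            return false;  // unchanged: skip the per-copyset calls
        }
        last_ = current;
        for (FakeCopyset& copyset : *copysets) {
            copyset.SetReadOnly(current == DiskState::kFull);
        }
        return true;
    }

 private:
    DiskState last_ = DiskState::kNormal;
};
```

With this shape, repeated heartbeats on a full disk cost one comparison instead of one `SetReadOnly` call per copyset.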

wu-hanqing · 2023-12-21T15:00:45Z

src/chunkserver/op_request.cpp

+        } else if (CSErrorCode::NoSpaceError == ret) {
+            LOG(ERROR) << "paste chunk failed: "
+                   << ", request: " << request_->ShortDebugString();
+            sleep(WAIT_FOR_DISK_FREED);             


Ditto: use bthread_usleep. It would also be better to move WAIT_FOR_DISK_FREED into the configuration file, like chunkfilepool.diskUsagePercentLimit.

wu-hanqing · 2023-12-21T15:00:55Z

src/chunkserver/op_request.cpp

+                       << ", request: " << request.ShortDebugString();
+        }
+        break;
+    };


Suggested change

};

}

wu-hanqing · 2023-12-21T15:03:49Z

src/mds/heartbeat/heartbeat_manager.cpp

+    curve::mds::heartbeat::ErrorType errType = request.diskstate().errtype();
+
+    if (errType == curve::mds::heartbeat::DISKFULL) {
+        // 当chunkserver磁盘接近满，需要将copyset availflag设为false，避免新空间从这些copyset分配


Please use English comments

wu-hanqing · 2023-12-21T15:06:32Z

tools-v2/go.mod

@caoxianfei1 PTAL~

wu-hanqing · 2023-12-25T06:52:48Z

cicheck

wu-hanqing · 2023-12-25T08:04:24Z

cicheck

wu-hanqing · 2023-12-25T13:02:53Z

cicheck

wu-hanqing · 2023-12-26T02:48:54Z

cicheck

liuminjian · 2023-12-28T07:42:10Z

cicheck

liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from dc1cee6 to 1041ceb Compare November 10, 2023 04:17

xu-chaojie reviewed Nov 10, 2023

liuminjian force-pushed the feat/clusterfull branch from 1041ceb to d39e3fb Compare November 15, 2023 08:55

wuhongsong closed this Dec 1, 2023

wuhongsong reopened this Dec 1, 2023

xu-chaojie reviewed Dec 1, 2023

YunhuiChen closed this Dec 2, 2023

YunhuiChen reopened this Dec 2, 2023

liuminjian force-pushed the feat/clusterfull branch 3 times, most recently from 3134351 to b9219d6 Compare December 6, 2023 02:35

liuminjian requested a review from xu-chaojie December 8, 2023 08:50

xu-chaojie approved these changes Dec 14, 2023

wu-hanqing self-requested a review December 21, 2023 09:45

wu-hanqing reviewed Dec 21, 2023

liuminjian force-pushed the feat/clusterfull branch 4 times, most recently from e4f77ce to 6f09bcd Compare December 23, 2023 05:36

liuminjian force-pushed the feat/clusterfull branch from 6f09bcd to 46d06b4 Compare December 25, 2023 10:45

liuminjian force-pushed the feat/clusterfull branch from 46d06b4 to fb3b7a4 Compare December 26, 2023 00:58

liuminjian force-pushed the feat/clusterfull branch 14 times, most recently from c4e6aca to 9771cbf Compare December 28, 2023 07:04

Commit 9771cbf: Feature: When the cluster capacity is almost full, make the cluster read only

Signed-off-by: liuminjian <liuminjian@chinatelecom.cn>
