[Bug]: flush failed with error "can not find session: node not found" after etcd pod failure chaos test #33151

Closed
zhuwenxing opened this issue May 20, 2024 · 5 comments
Labels: kind/bug, triage/accepted
Milestone: 2.4.2

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240518-5cc38aa9-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-05-19T19:52:51.035Z] self = <test_data_persistence.TestDataPersistence object at 0x7efd8ed851c0>

[2024-05-19T19:52:51.035Z] db_name = 'default'

[2024-05-19T19:52:51.035Z] 

[2024-05-19T19:52:51.035Z]     @pytest.mark.tags(CaseLabel.L3)

[2024-05-19T19:52:51.035Z]     @pytest.mark.parametrize("db_name", ["default", "prod"])

[2024-05-19T19:52:51.035Z]     def test_milvus_default(self, db_name):

[2024-05-19T19:52:51.035Z]         self._connect()

[2024-05-19T19:52:51.035Z]         # create database if not exist

[2024-05-19T19:52:51.035Z]         dbs, _ = self.database_wrap.list_database()

[2024-05-19T19:52:51.035Z]         log.info(f"all database: {dbs}")

[2024-05-19T19:52:51.035Z]         if db_name not in dbs:

[2024-05-19T19:52:51.035Z]             log.info(f"create database {db_name}")

[2024-05-19T19:52:51.035Z]             self.database_wrap.create_database(db_name)

[2024-05-19T19:52:51.035Z]         self.database_wrap.using_database(db_name)

[2024-05-19T19:52:51.035Z]         # create collection

[2024-05-19T19:52:51.035Z]         name = "Hello_Milvus"

[2024-05-19T19:52:51.035Z]         t0 = time.time()

[2024-05-19T19:52:51.035Z]         collection_w = self.init_collection_wrap(name=name, active_trace=True)

[2024-05-19T19:52:51.035Z]         tt = time.time() - t0

[2024-05-19T19:52:51.035Z]         assert collection_w.name == name

[2024-05-19T19:52:51.035Z] >       entities = collection_w.num_entities

[2024-05-19T19:52:51.035Z] 

[2024-05-19T19:52:51.035Z] testcases/test_data_persistence.py:37: 

[2024-05-19T19:52:51.035Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-05-19T19:52:51.035Z] ../base/collection_wrapper.py:63: in num_entities

[2024-05-19T19:52:51.035Z]     self.flush()

[2024-05-19T19:52:51.035Z] ../utils/wrapper.py:18: in inner_wrapper

[2024-05-19T19:52:51.035Z]     res, result = func(*args, **kwargs)

[2024-05-19T19:52:51.035Z] ../base/collection_wrapper.py:159: in flush

[2024-05-19T19:52:51.035Z]     check_result = ResponseChecker(res, func_name, check_task,

[2024-05-19T19:52:51.035Z] ../check/func_check.py:34: in run

[2024-05-19T19:52:51.035Z]     result = self.assert_succ(self.succ, True)

[2024-05-19T19:52:51.035Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-05-19T19:52:51.035Z] 

[2024-05-19T19:52:51.035Z] self = <check.func_check.ResponseChecker object at 0x7efd8c4c69d0>

[2024-05-19T19:52:51.035Z] actual = False, expect = True

[2024-05-19T19:52:51.035Z] 

[2024-05-19T19:52:51.035Z]     def assert_succ(self, actual, expect):

[2024-05-19T19:52:51.035Z] >       assert actual is expect, f"Response of API {self.func_name} expect {expect}, but got {actual}"

[2024-05-19T19:52:51.035Z] E       AssertionError: Response of API flush expect True, but got False

[2024-05-19T19:52:51.035Z] 

[2024-05-19T19:52:51.035Z] ../check/func_check.py:112: AssertionError

[2024-05-19T19:52:51.035Z] ------------------------------ Captured log setup ------------------------------

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: ################################################################################ (conftest.py:232)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:38)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:44)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:45)

[2024-05-19T19:52:51.035Z] ------------------------------ Captured log call -------------------------------

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.255.255.214', 'port': 19530} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [list_database] args: ['default', None], kwargs: {} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : ['prod', 'default']  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - INFO - ci_test]: all database: ['prod', 'default'] (test_data_persistence.py:26)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [using_database] args: ['default', 'default'], kwargs: {} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['int64', <DataType.INT64: 5>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['varchar', <DataType.VARCHAR: 21>, ''], kwargs: {'max_length': 65535, 'is_primary': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float', <DataType.FLOAT: 10>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['json_field', <DataType.JSON: 23>, ''], kwargs: {'is_primary': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'name': 'json_field', 'description': '', 'type': <DataType.JSON: 23>}  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [FieldSchema] args: ['float_vector', <DataType.FLOAT_VECTOR: 101>, ''], kwargs: {'dim': 128, 'is_primary': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [CollectionSchema] args: [[{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535}}, {'name': 'json_field', 'description': '', 'type': <DataTyp......, kwargs: {'primary_field': 'int64', 'auto_id': False, 'enable_dynamic_field': False} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params......  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : True  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['Hello_Milvus', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type': <DataType.VARC......, kwargs: {'consistency_level': 'Strong'} (api_request.py:62)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_response) : <Collection>:

[2024-05-19T19:52:51.035Z] -------------

[2024-05-19T19:52:51.035Z] <name>: Hello_Milvus

[2024-05-19T19:52:51.035Z] <description>: 

[2024-05-19T19:52:51.035Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'n......  (api_request.py:37)

[2024-05-19T19:52:51.035Z] [2024-05-19 19:46:53 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-05-19T19:52:51.036Z] [2024-05-19 19:49:44 - ERROR - pymilvus.decorators]: RPC error: [flush], <MilvusException: (code=901, message=can not find session: node not found[node=11])>, <Time:{'RPC start': '2024-05-19 19:46:53.095561', 'RPC error': '2024-05-19 19:49:44.191517'}> (decorators.py:139)

[2024-05-19T19:52:51.036Z] [2024-05-19 19:49:44 - ERROR - ci_test]: Traceback (most recent call last):

[2024-05-19T19:52:51.036Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper

[2024-05-19T19:52:51.036Z]     res = func(*args, **_kwargs)

[2024-05-19T19:52:51.036Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request

[2024-05-19T19:52:51.036Z]     return func(*arg, **kwargs)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 319, in flush

[2024-05-19T19:52:51.036Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 140, in handler

[2024-05-19T19:52:51.036Z]     raise e from e

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2024-05-19T19:52:51.036Z]     return func(*args, **kwargs)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 175, in handler

[2024-05-19T19:52:51.036Z]     return func(self, *args, **kwargs)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 115, in handler

[2024-05-19T19:52:51.036Z]     raise e from e

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 86, in handler

[2024-05-19T19:52:51.036Z]     return func(*args, **kwargs)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 1419, in flush

[2024-05-19T19:52:51.036Z]     check_status(response.status)

[2024-05-19T19:52:51.036Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/utils.py", line 63, in check_status

[2024-05-19T19:52:51.036Z]     raise MilvusException(status.code, status.reason, status.error_code)

[2024-05-19T19:52:51.036Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=901, message=can not find session: node not found[node=11])>

[2024-05-19T19:52:51.036Z]  (api_request.py:45)

[2024-05-19T19:52:51.036Z] [2024-05-19 19:49:44 - ERROR - ci_test]: (api_response) : <MilvusException: (code=901, message=can not find session: node not found[node=11])> (api_request.py:46)
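
For triage, the fields that matter in the error line above are the API name, the error code, and the node ID. A minimal sketch for pulling them out of a captured log line, assuming the message keeps the shape shown in this log. The regex patterns and the `parse_rpc_error` helper are hypothetical and not part of pymilvus:

```python
import re

# Error text copied from the pymilvus.decorators line in the log above.
LINE = ("RPC error: [flush], <MilvusException: (code=901, "
        "message=can not find session: node not found[node=11])>")

# Hypothetical patterns for this one message shape; not a pymilvus API.
PATTERN = re.compile(
    r"RPC error: \[(?P<api>\w+)\], <MilvusException: "
    r"\(code=(?P<code>\d+), message=(?P<message>.*?)\)>"
)
NODE = re.compile(r"\[node=(?P<node>\d+)\]")


def parse_rpc_error(line):
    """Return API name, error code, message, and node ID (if present)."""
    match = PATTERN.search(line)
    if match is None:
        return None
    node = NODE.search(match.group("message"))
    return {
        "api": match.group("api"),
        "code": int(match.group("code")),
        "message": match.group("message"),
        "node": int(node.group("node")) if node else None,
    }


parsed = parse_rpc_error(LINE)
print(parsed)
```

Here the parsed result reports API `flush`, code 901, and node 11, which is the node ID to look for in the server logs attached below.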

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/14448/pipeline

log:

artifacts-etcd-pod-failure-14448-server-logs.tar.gz

Anything else?

No response

zhuwenxing added the kind/bug and needs-triage labels on May 20, 2024
@zhuwenxing
Contributor Author

/assign @XuanYang-cn

@zhuwenxing
Contributor Author

There is also another gRPC error in the same run: flush hit DEADLINE_EXCEEDED after the 180 s timeout.


[2024-05-19T19:59:10.906Z] [2024-05-19 19:53:09 - DEBUG - ci_test]: (api_response) : <Collection>:

[2024-05-19T19:59:10.906Z] -------------

[2024-05-19T19:59:10.906Z] <name>: QueryChecker__JyGnax9q

[2024-05-19T19:59:10.906Z] <description>: 

[2024-05-19T19:59:10.906Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT:......  (api_request.py:37)

[2024-05-19T19:59:10.906Z] [2024-05-19 19:53:09 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)

[2024-05-19T19:59:10.906Z] [2024-05-19 19:56:09 - ERROR - pymilvus.decorators]: grpc RpcError: [flush], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-05-19 19:53:09.557337', 'gRPC error': '2024-05-19 19:56:09.558691'}> (decorators.py:150)

[2024-05-19T19:59:10.906Z] [2024-05-19 19:56:09 - ERROR - ci_test]: Traceback (most recent call last):

[2024-05-19T19:59:10.906Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper

[2024-05-19T19:59:10.906Z]     res = func(*args, **_kwargs)

[2024-05-19T19:59:10.906Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request

[2024-05-19T19:59:10.906Z]     return func(*arg, **kwargs)

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 319, in flush

[2024-05-19T19:59:10.906Z]     conn.flush([self.name], timeout=timeout, **kwargs)

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 154, in handler

[2024-05-19T19:59:10.906Z]     raise e from e

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2024-05-19T19:59:10.906Z]     return func(*args, **kwargs)

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 175, in handler

[2024-05-19T19:59:10.906Z]     return func(self, *args, **kwargs)

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 90, in handler

[2024-05-19T19:59:10.906Z]     raise e from e

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 86, in handler

[2024-05-19T19:59:10.906Z]     return func(*args, **kwargs)

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 1418, in flush

[2024-05-19T19:59:10.906Z]     response = future.result()

[2024-05-19T19:59:10.906Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 881, in result

[2024-05-19T19:59:10.906Z]     raise self

[2024-05-19T19:59:10.906Z] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:

[2024-05-19T19:59:10.906Z] 	status = StatusCode.DEADLINE_EXCEEDED

[2024-05-19T19:59:10.906Z] 	details = "Deadline Exceeded"

[2024-05-19T19:59:10.906Z] 	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-05-19T19:56:09.558020013+00:00", grpc_status:4}"

[2024-05-19T19:59:10.906Z] >

[2024-05-19T19:59:10.906Z]  (api_request.py:45)

[2024-05-19T19:59:10.907Z] [2024-05-19 19:56:09 - ERROR - ci_test]: (api_response) : <_MultiThreadedRendezvous of RPC that terminated with:

[2024-05-19T19:59:10.907Z] 	status = StatusCode.DEADLINE_EXCEEDED

[2024-05-19T19:59:10.907Z] 	details = "Deadline Exceeded"

[2024-05-19T19:59:10.907Z] 	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-05-19T19:56:09.558020013+00:00", grpc_status:4}"

[2024-05-19T19:59:10.907Z] > (api_request.py:46)
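
The two flush failures in this issue differ in timing, which can be checked from the `Time` fields that pymilvus logs. A rough sketch, using only the timestamps and the `timeout` value copied from the logs in this issue. The `elapsed_seconds` helper is a hypothetical name:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
TIMEOUT_S = 180  # from kwargs {'timeout': 180} on both Collection.flush calls


def elapsed_seconds(start, end):
    """Seconds between two pymilvus 'RPC start' / error timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# First failure: MilvusException code=901, node not found[node=11].
first = elapsed_seconds("2024-05-19 19:46:53.095561",
                        "2024-05-19 19:49:44.191517")
# Second failure: gRPC DEADLINE_EXCEEDED.
second = elapsed_seconds("2024-05-19 19:53:09.557337",
                         "2024-05-19 19:56:09.558691")

print(f"first:  {first:.1f}s, before timeout: {first < TIMEOUT_S}")
print(f"second: {second:.1f}s, before timeout: {second < TIMEOUT_S}")
```

The first flush came back with an error from the server after about 171.1 s, inside the 180 s timeout. The second ran for about 180.0 s and was cut off by the client-side deadline, so it carries no server error message.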

@yanliang567
Contributor

/unassign

yanliang567 added the triage/accepted label and removed the needs-triage label on May 20, 2024
yanliang567 added this to the 2.4.2 milestone on May 20, 2024
sre-ci-robot pushed a commit that referenced this issue May 22, 2024
See also: #33151, #33149

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
XuanYang-cn added a commit to XuanYang-cn/milvus that referenced this issue May 22, 2024
sre-ci-robot pushed a commit that referenced this issue May 22, 2024
See also: #33151, #33149
pr: #33193

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
@XuanYang-cn
Contributor

/assign @zhuwenxing
/unassign
Please help verify.

@zhuwenxing
Contributor Author

Not reproduced
