[Bug]: flush timeout after datanode pod kill chaos test #33153

Closed
1 task done
zhuwenxing opened this issue May 20, 2024 · 3 comments
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4-20240517-780f3137-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

[2024-05-19T23:31:20.483Z] _____ TestAllCollection.test_milvus_default[HybridSearchChecker__X6citnkc] _____
[2024-05-19T23:31:20.483Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3.8
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z] self = <test_all_collections_after_chaos.TestAllCollection object at 0x7f7f49d42ee0>
[2024-05-19T23:31:20.483Z] collection_name = 'HybridSearchChecker__X6citnkc'
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z]     @pytest.mark.tags(CaseLabel.L1)
[2024-05-19T23:31:20.483Z]     def test_milvus_default(self, collection_name):
[2024-05-19T23:31:20.483Z]         self._connect()
[2024-05-19T23:31:20.483Z]         # create
[2024-05-19T23:31:20.483Z]         name = collection_name if collection_name else cf.gen_unique_str("Checker_")
[2024-05-19T23:31:20.483Z]         t0 = time.time()
[2024-05-19T23:31:20.483Z]         schema = Collection(name=name).schema
[2024-05-19T23:31:20.483Z]         collection_w = self.init_collection_wrap(name=name, schema=schema)
[2024-05-19T23:31:20.483Z]         tt = time.time() - t0
[2024-05-19T23:31:20.483Z]         assert collection_w.name == name
[2024-05-19T23:31:20.483Z]         # get collection info
[2024-05-19T23:31:20.483Z]         schema = collection_w.schema
[2024-05-19T23:31:20.483Z]         dim = cf.get_dim_by_schema(schema=schema)
[2024-05-19T23:31:20.483Z]         int64_field_name = cf.get_int64_field_name(schema=schema)
[2024-05-19T23:31:20.483Z]         float_vector_field_name = cf.get_float_vec_field_name(schema=schema)
[2024-05-19T23:31:20.483Z]         float_vector_field_name_list = cf.get_float_vec_field_name_list(schema=schema)
[2024-05-19T23:31:20.483Z]         # compact collection before getting num_entities
[2024-05-19T23:31:20.483Z] >       collection_w.flush(timeout=180)
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z] testcases/test_all_collections_after_chaos.py:44: 
[2024-05-19T23:31:20.483Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-19T23:31:20.483Z] ../utils/wrapper.py:33: in inner_wrapper
[2024-05-19T23:31:20.483Z]     res, result = func(*args, **kwargs)
[2024-05-19T23:31:20.483Z] ../base/collection_wrapper.py:159: in flush
[2024-05-19T23:31:20.483Z]     check_result = ResponseChecker(res, func_name, check_task,
[2024-05-19T23:31:20.483Z] ../check/func_check.py:34: in run
[2024-05-19T23:31:20.483Z]     result = self.assert_succ(self.succ, True)
[2024-05-19T23:31:20.483Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z] self = <check.func_check.ResponseChecker object at 0x7f7f484d1100>
[2024-05-19T23:31:20.483Z] actual = False, expect = True
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z]     def assert_succ(self, actual, expect):
[2024-05-19T23:31:20.483Z] >       assert actual is expect, f"Response of API {self.func_name} expect {expect}, but got {actual}"
[2024-05-19T23:31:20.483Z] E       AssertionError: Response of API flush expect True, but got False
[2024-05-19T23:31:20.483Z] 
[2024-05-19T23:31:20.483Z] ../check/func_check.py:112: AssertionError
[2024-05-19T23:31:20.483Z] ------------------------------ Captured log setup ------------------------------
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:44)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - INFO - ci_test]: [setup_method] Start setup test case test_milvus_default. (client_base.py:45)
[2024-05-19T23:31:20.483Z] ------------------------------ Captured log call -------------------------------
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.255.252.189', 'port': 19530} (api_request.py:62)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_response) : True  (api_request.py:37)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['HybridSearchChecker__X6citnkc', {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar', 'description': '', 'type......, kwargs: {'consistency_level': 'Strong'} (api_request.py:62)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_response) : <Collection>:
[2024-05-19T23:31:20.483Z] -------------
[2024-05-19T23:31:20.483Z] <name>: HybridSearchChecker__X6citnkc
[2024-05-19T23:31:20.483Z] <description>: 
[2024-05-19T23:31:20.483Z] <schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'int64', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float', 'description': '', 'type': <DataType......  (api_request.py:37)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:28:18 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:31:18 - WARNING - pymilvus.decorators]: Retry timeout: 180s (decorators.py:106)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:31:18 - ERROR - pymilvus.decorators]: RPC error: [flush], <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: HybridSearchChecker__X6citnkc, flusht_ts: 449881387506597901)>, <Time:{'RPC start': '2024-05-19 23:28:18.816618', 'RPC error': '2024-05-19 23:31:18.959539'}> (decorators.py:146)
[2024-05-19T23:31:20.483Z] [2024-05-19 23:31:18 - ERROR - ci_test]: Traceback (most recent call last):
[2024-05-19T23:31:20.484Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
[2024-05-19T23:31:20.484Z]     res = func(*args, **_kwargs)
[2024-05-19T23:31:20.484Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 63, in api_request
[2024-05-19T23:31:20.484Z]     return func(*arg, **kwargs)
[2024-05-19T23:31:20.484Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 319, in flush
[2024-05-19T23:31:20.484Z]     conn.flush([self.name], timeout=timeout, **kwargs)
[2024-05-19T23:31:20.484Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 147, in handler
[2024-05-19T23:31:20.484Z]     raise e from e
[2024-05-19T23:31:20.484Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 143, in handler
[2024-05-19T23:31:20.484Z]     return func(*args, **kwargs)
[2024-05-19T23:31:20.484Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 182, in handler
[2024-05-19T23:31:20.484Z]     return func(self, *args, **kwargs)
[2024-05-19T23:31:20.484Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 107, in handler
[2024-05-19T23:31:20.484Z]     raise MilvusException(
[2024-05-19T23:31:20.484Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: HybridSearchChecker__X6citnkc, flusht_ts: 449881387506597901)>
[2024-05-19T23:31:20.484Z]  (api_request.py:45)
[2024-05-19T23:31:20.484Z] [2024-05-19 23:31:18 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=Retry timeout: 180s, message=wait for flush timeout, collection: HybridSearchChecker__X6citnkc, flusht_ts: 449881387506597901)> (api_request.py:46)
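The log points to a client-side deadline rather than an immediate RPC failure: the flush call started at 23:28:18 and errored at 23:31:18, exactly the 180 s budget, and the error text already carries a `flusht_ts`, which suggests the flush request was accepted and the time was spent waiting for it to complete. The sketch below illustrates that wait-until-deadline pattern under stated assumptions; it is not pymilvus's implementation. `wait_for_flushed`, `FlushWaitTimeout`, and the `is_flushed` predicate are hypothetical names standing in for a real flush-state check.

```python
import time
from typing import Callable


class FlushWaitTimeout(Exception):
    """Hypothetical error type; the real client surfaces a MilvusException instead."""


def wait_for_flushed(is_flushed: Callable[[], bool], timeout: float, interval: float = 0.5) -> int:
    """Poll the hypothetical `is_flushed` check until it returns True or `timeout` seconds pass.

    Returns the number of polls made; raises FlushWaitTimeout once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if is_flushed():
            return polls
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Same general shape as the "wait for flush timeout" error in the log.
            raise FlushWaitTimeout(f"wait for flush timeout after {timeout}s")
        # Sleep between polls, but never past the deadline.
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    # Simulated flush state: reports "flushed" on the third check.
    checks = {"n": 0}

    def fake_is_flushed() -> bool:
        checks["n"] += 1
        return checks["n"] >= 3

    print(wait_for_flushed(fake_is_flushed, timeout=5, interval=0.01))  # → 3

    # A flush that never completes (what the log appears to show) ends in a timeout.
    try:
        wait_for_flushed(lambda: False, timeout=0.05, interval=0.01)
    except FlushWaitTimeout as exc:
        print(exc)
```

Bounding the wait with a monotonic deadline, rather than a fixed retry count, keeps the total wall-clock time close to the configured budget, so a flush that never completes surfaces as a timeout at roughly `timeout` seconds.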

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/13453/pipeline

log:
artifacts-datanode-pod-kill-13453-server-logs.tar.gz

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@yanliang567
Contributor

/assign @XuanYang-cn
/unassign

@yanliang567 yanliang567 added this to the 2.4.2 milestone May 20, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 20, 2024
@XuanYang-cn
Contributor

/assign @zhuwenxing
After the fix in #33258, is this still happening?

@zhuwenxing
Contributor Author

Not reproduced recently.
