[BUG]: How to support Chinese #1285

Closed
1 task done
laroth opened this issue Apr 10, 2024 · 3 comments

Comments

@laroth

laroth commented Apr 10, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When a dataset containing Chinese characters is uploaded, a utf-8 decoding error is raised:
INFO: 154.64.227.60:22531 - "POST /qa/load_data HTTP/1.1" 500 Internal Server Error
2024-04-10 03:17:06,723 | ERROR | load.py | extract_features | 22 | Error with extracting feature from question 'utf-8' codec can't decode byte 0xb9 in position 17: invalid start byte
ERROR: Exception in ASGI application
2024-04-10 03:17:06,723 | ERROR | h11_impl.py | run_asgi | 372 | Exception in ASGI application
Traceback (most recent call last):
File "/app/src/operations/load.py", line 14, in extract_features
data = pd.read_csv(file_dir)
File "/usr/local/lib/python3.8/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 17: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/uvicorn/protocols/http/h11_impl.py", line 369, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/usr/local/lib/python3.8/dist-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
return await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/fastapi/applications.py", line 199, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/applications.py", line 112, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/errors.py", line 159, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 86, in __call__
await self.simple_response(scope, receive, send, request_headers=headers)
File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/cors.py", line 142, in simple_response
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 580, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 241, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 52, in app
response = await func(request)
File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 216, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.8/dist-packages/fastapi/routing.py", line 149, in run_endpoint_function
return await dependant.call(**values)
File "main.py", line 46, in do_load_api
total_num = do_load(table_name, fname_path, MODEL, MILVUS_CLI, MYSQL_CLI)
File "/app/src/operations/load.py", line 39, in do_load
question_data, answer_data, sentence_embeddings = extract_features(file_dir, model)
File "/app/src/operations/load.py", line 23, in extract_features
sys.exit(1)
SystemExit: 1
INFO: 37.128.246.75:12447 - "POST /qa/load_data HTTP/1.1" 500 Internal Server Error

Expected Behavior

No response

Steps To Reproduce

No response

Software version

Milvus: [v2.0.2/milvus-standalone-docker-compose.yml]
Server: [milvusbootcamp/qa-chatbot-server:v1]
Client: [milvusbootcamp/qa-chatbot-client:v1]

Anything else?

No response

@codingjaguar
Collaborator

Which demo did this problem occur in?

@laroth
Author

laroth commented Apr 10, 2024

Which demo did this problem occur in?

Milvus: [v2.0.2/milvus-standalone-docker-compose.yml]
Server: [milvusbootcamp/qa-chatbot-server:v1]
Client: [milvusbootcamp/qa-chatbot-client:v1]
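These are the demo versions in use. The project is at https://github.com/milvus-io/bootcamp/blob/v2.0.1/solutions/question_answering_system/quick_deploy/README.md

The error occurs whenever the .csv file contains Chinese characters.
[Screenshot: Snipaste_2024-04-10_11-57-16]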

@DAAworld

The encoding is wrong when pandas reads the CSV file. Try gbk and gb18030.
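
A minimal sketch of that suggestion, assuming the uploaded file was saved in a GBK-family encoding (for example by Excel on a Chinese-locale Windows machine). Byte 0xb9 can never start a UTF-8 sequence, but it is a valid lead byte in GBK, which fits the error in the traceback. The helper name `detect_encoding` and the candidate list are hypothetical and not part of the bootcamp code; gb18030 is used as the fallback because it is a superset of gbk.

```python
def detect_encoding(path, candidates=("utf-8", "gb18030")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: it reads the file into memory and tries each
    encoding in order, so it only suits reasonably small CSV uploads.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} can decode {path}")


# Possible use at the failing call in load.py (an assumption, not the
# project's actual fix):
#     data = pd.read_csv(file_dir, encoding=detect_encoding(file_dir))
```

A successful decode does not prove the guess is right, since a file in some other legacy encoding may also decode as gb18030 and yield garbled text. If the source of the CSV is under your control, re-saving it as UTF-8 is likely the simpler route.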
