rtp-llm example test issue #56

Open
haic0 opened this issue May 10, 2024 · 1 comment
Comments


haic0 commented May 10, 2024

Hi DevTeam,
Could you give me a hand with this issue? Thanks so much!

After installing the whl package successfully, I followed this guide:
cd rtp-llm

For a cuda12 environment, please use requirements_torch_gpu_cuda12.txt

pip3 install -r ./open_source/deps/requirements_torch_gpu.txt

Use the corresponding whl from the release version. Here's an example for the cuda11 build of version 0.1.0; for the cuda12 whl package, please check the release page.

pip3 install maga_transformer-0.1.9+cuda118-cp310-cp310-manylinux1_x86_64.whl

Start the HTTP service:

cd ../
TOKENIZER_PATH=/path/to/tokenizer CHECKPOINT_PATH=/path/to/model MODEL_TYPE=your_model_type FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
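
(Not part of the guide: once the server comes up, a request like the one below should confirm it is serving. Port 8088 matches the server_port in the log further down; the JSON body follows the request format shown in the project README, so treat it as a sketch rather than the definitive API.)

# Hypothetical smoke test against a running server (assumes default port 8088)
curl -XPOST http://127.0.0.1:8088 -d '{"prompt": "hello, what is your name", "generate_config": {"max_new_tokens": 64}}'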

Issues
When starting the server, it produced the following error; could you give some suggestions?
(rtp-llm) h@acc:/opt/HF-MODEL$ TOKENIZER_PATH=/opt/HF-MODEL/huggingface-model/qwen-7b CHECKPOINT_PATH=/opt/HF-MODEL/huggingface-model/qwen-7b MODEL_TYPE=qwen FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
[process-385289][root][05/10/2024 15:11:35][init.py:():14][INFO] init logger end
[process-385289][root][05/10/2024 15:11:37][init.py:():28][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:37][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385289][root][05/10/2024 15:11:37][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385289][root][05/10/2024 15:11:37][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385289][root][05/10/2024 15:11:37][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385289][root][05/10/2024 15:11:38][init.py:():9][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:38][start_server.py:local_rank_start():30][INFO] start local WorkerInfo: [ip=127.0.1.1 server_port=8088 gang_hb_port=8089 name= info=None ], ParallelInfo:[ tp_size=1 pp_size=1 world_size=1 world_rank=0 local_world_size=1 ]
[process-385289][root][05/10/2024 15:11:38][inference_server.py:_init_controller():87][INFO] CONCURRENCY_LIMIT to 32
[process-385289][root][05/10/2024 15:11:38][gang_server.py:start():173][INFO] world_size==1, do not start gang_server
[process-385289][root][05/10/2024 15:11:38][util.py:copy_gemm_config():131][INFO] not found gemm_config in HIPPO_APP_INST_ROOT, not copy
[process-385289][root][05/10/2024 15:11:38][inference_worker.py:init():51][INFO] starting InferenceWorker
[process-385289][root][05/10/2024 15:11:38][model_factory.py:create_normal_model_config():116][INFO] load model from tokenizer_path: /opt/HF-MODEL/huggingface-model/qwen-7b, ckpt_path: /opt/HF-MODEL/huggingface-model/qwen-7b, lora_infos: {}, ptuning_path: None
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():303][INFO] max_seq_len: 8192
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_config_with_sparse_config():172][INFO] read sparse config from: /opt/HF-MODEL/huggingface-model/qwen-7b/config.json
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:check():64][INFO] sparse config layer_num must not be empty
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():260][INFO] use ptuning from model_config set by env, None
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():267][INFO] load ptuing config from /opt/HF-MODEL/huggingface-model/qwen-7b/config.json
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():274][INFO] read ptuning config, pre_seq_len:0, prefix_projection:False
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():313][INFO] seq_size_per_block: 8
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():315][INFO] max_generate_batch_size: 128
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():317][INFO] max_context_batch_size: 1
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():319][INFO] reserve_runtime_mem_mb: 1024
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():321][INFO] kv_cache_mem_mb: -1
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():323][INFO] pre_allocate_op_mem: True
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():325][INFO] int8_kv_cache: False
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():329][INFO] tp_split_emb_and_lm_head: True
[process-385289][root][05/10/2024 15:11:38][model_weights_loader.py:estimate_load_parallel_num():610][INFO] free_mem: 23.26 model_mem: 14.38, load weights by 2 process
[process-385289][root][05/10/2024 15:11:38][model_weights_loader.py:init():87][INFO] merge lora is enable ? : False
[process-385438][root][05/10/2024 15:11:38][init.py:():14][INFO] init logger end
[process-385437][root][05/10/2024 15:11:38][init.py:():14][INFO] init logger end
[process-385437][root][05/10/2024 15:11:40][init.py:():28][INFO] no internal_source found
[process-385438][root][05/10/2024 15:11:40][init.py:():28][INFO] no internal_source found
[process-385437][root][05/10/2024 15:11:40][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385437][root][05/10/2024 15:11:40][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385437][root][05/10/2024 15:11:40][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385438][root][05/10/2024 15:11:40][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385438][root][05/10/2024 15:11:40][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385438][root][05/10/2024 15:11:40][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385438][root][05/10/2024 15:11:40][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385437][root][05/10/2024 15:11:40][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385438][root][05/10/2024 15:11:41][init.py:():9][INFO] no internal_source found
[process-385437][root][05/10/2024 15:11:41][init.py:():9][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:47][gpt.py:_load_weights():172][INFO] load weights time: 8.23 s
load final_layernorm.gamma to torch.Size([4096])
load final_layernorm.beta to torch.Size([4096])
+------------------------------------------+
| MODEL CONFIG |
+-----------------------+------------------+
| Options | Values |
+-----------------------+------------------+
| model_type | QWen |
| act_type | WEIGHT_TYPE.FP16 |
| weight_type | WEIGHT_TYPE.FP16 |
| max_seq_len | 8192 |
| use_sparse_head | False |
| use_multi_task_prompt | None |
| use_medusa | False |
| lora_infos | {} |
+-----------------------+------------------+
[process-385289][root][05/10/2024 15:11:47][async_model.py:init():28][INFO] first mem info: used:16259481600 free: 9510322176
[process-385289][root][05/10/2024 15:11:47][engine_creator.py:create_engine():46][INFO] executor_type: ExecutorType.Normal
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO][RANK 0][139646433424000][24-05-10 15:11:47] MMHA multi_block_mode is enabled
Segmentation fault (core dumped)

When running the example test, it produced the following error:

(rtp-llm) h@acc:/opt/HF-MODEL/rtp-llm$ python example/test.py
Fetching 24 files: 100%|██████████| 24/24 [00:00<00:00, 26051.58it/s]
load final_layernorm.gamma to torch.Size([2048])
load final_layernorm.beta to torch.Size([2048])
+------------------------------------------+
| MODEL CONFIG |
+-----------------------+------------------+
| Options | Values |
+-----------------------+------------------+
| model_type | QWen |
| act_type | WEIGHT_TYPE.FP16 |
| weight_type | WEIGHT_TYPE.FP16 |
| max_seq_len | 8192 |
| use_sparse_head | False |
| use_multi_task_prompt | None |
| use_medusa | False |
| lora_infos | None |
+-----------------------+------------------+

[WARNING] gemm_config.in is not found; using default GEMM algo

[FT][INFO][RANK 0][140690512618112][24-05-10 14:59:40] MMHA multi_block_mode is enabled
Segmentation fault (core dumped)

dongjiyingdjy (Collaborator) commented May 10, 2024

This problem is caused by installing the cuda118 whl package in a cuda12 environment; please follow the documentation and use the cuda12 whl package.
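
(Based on this reply, a corrected install for a cuda12 environment would look roughly like the commands below. The CUDA-version checks are generic; the whl filename is only a placeholder assumption, so take the real name from the release page.)

# Check which CUDA version the environment actually provides
nvidia-smi
python3 -c "import torch; print(torch.version.cuda)"

# Use the cuda12 requirements file and the cuda12 build of the whl
pip3 install -r ./open_source/deps/requirements_torch_gpu_cuda12.txt
pip3 uninstall -y maga_transformer
pip3 install maga_transformer-<version>+cuda12x-cp310-cp310-manylinux1_x86_64.whl   # placeholder filename, see release page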
