Hi DevTeam,
Could you give me a hand checking this issue? Thanks so much!
After installing the whl package successfully, I followed this guide:
cd rtp-llm
For a cuda12 environment, please use requirements_torch_gpu_cuda12.txt
pip3 install -r ./open_source/deps/requirements_torch_gpu.txt
Use the corresponding whl from the release version; here's an example for the cuda11 version 0.1.0. For the cuda12 whl package, please check the release page.
pip3 install maga_transformer-0.1.9+cuda118-cp310-cp310-manylinux1_x86_64.whl
Start the HTTP service:
cd ../
TOKENIZER_PATH=/path/to/tokenizer CHECKPOINT_PATH=/path/to/model MODEL_TYPE=your_model_type FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
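Before starting the server, it may be worth verifying that the wheel's CUDA tag matches the local CUDA setup, since a mismatch between the wheel's CUDA build and the installed torch/driver stack can cause native crashes at startup. A minimal sketch — the parsing convention for the `+cudaXYZ` tag (all digits but the last are the major version) is my assumption, not something the rtp-llm docs state:

```python
# Hedged sanity check: the wheel filename's local-version tag ("+cuda118")
# encodes the CUDA runtime it was built for. Assumed convention: all digits
# but the last are the major version, the last digit is the minor version.
import re

def wheel_cuda_version(wheel_name: str) -> str:
    """Return the CUDA version encoded in a '+cudaXYZ' wheel tag."""
    m = re.search(r"\+cuda(\d+)", wheel_name)
    if not m:
        return "unknown"
    digits = m.group(1)
    return f"{digits[:-1]}.{digits[-1]}"

wheel = "maga_transformer-0.1.9+cuda118-cp310-cp310-manylinux1_x86_64.whl"
print(wheel_cuda_version(wheel))  # prints: 11.8
```

The printed value should agree with what `nvidia-smi` and `python3 -c "import torch; print(torch.version.cuda)"` report on the host.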
Issues
It produced the following error; could you give some suggestions?
(rtp-llm) h@acc:/opt/HF-MODEL$ TOKENIZER_PATH=/opt/HF-MODEL/huggingface-model/qwen-7b CHECKPOINT_PATH=/opt/HF-MODEL/huggingface-model/qwen-7b MODEL_TYPE=qwen FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
[process-385289][root][05/10/2024 15:11:35][init.py:():14][INFO] init logger end
[process-385289][root][05/10/2024 15:11:37][init.py:():28][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:37][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385289][root][05/10/2024 15:11:37][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385289][root][05/10/2024 15:11:37][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385289][root][05/10/2024 15:11:37][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385289][root][05/10/2024 15:11:38][init.py:():9][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:38][start_server.py:local_rank_start():30][INFO] start local WorkerInfo: [ip=127.0.1.1 server_port=8088 gang_hb_port=8089 name= info=None ], ParallelInfo:[ tp_size=1 pp_size=1 world_size=1 world_rank=0 local_world_size=1 ]
[process-385289][root][05/10/2024 15:11:38][inference_server.py:_init_controller():87][INFO] CONCURRENCY_LIMIT to 32
[process-385289][root][05/10/2024 15:11:38][gang_server.py:start():173][INFO] world_size==1, do not start gang_server
[process-385289][root][05/10/2024 15:11:38][util.py:copy_gemm_config():131][INFO] not found gemm_config in HIPPO_APP_INST_ROOT, not copy
[process-385289][root][05/10/2024 15:11:38][inference_worker.py:init():51][INFO] starting InferenceWorker
[process-385289][root][05/10/2024 15:11:38][model_factory.py:create_normal_model_config():116][INFO] load model from tokenizer_path: /opt/HF-MODEL/huggingface-model/qwen-7b, ckpt_path: /opt/HF-MODEL/huggingface-model/qwen-7b, lora_infos: {}, ptuning_path: None
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():303][INFO] max_seq_len: 8192
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_config_with_sparse_config():172][INFO] read sparse config from: /opt/HF-MODEL/huggingface-model/qwen-7b/config.json
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:check():64][INFO] sparse config layer_num must not be empty
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():260][INFO] use ptuning from model_config set by env, None
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():267][INFO] load ptuing config from /opt/HF-MODEL/huggingface-model/qwen-7b/config.json
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_ptuning_config():274][INFO] read ptuning config, pre_seq_len:0, prefix_projection:False
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():313][INFO] seq_size_per_block: 8
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():315][INFO] max_generate_batch_size: 128
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():317][INFO] max_context_batch_size: 1
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():319][INFO] reserve_runtime_mem_mb: 1024
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():321][INFO] kv_cache_mem_mb: -1
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():323][INFO] pre_allocate_op_mem: True
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():325][INFO] int8_kv_cache: False
[process-385289][root][05/10/2024 15:11:38][gpt_init_model_parameters.py:update_common():329][INFO] tp_split_emb_and_lm_head: True
[process-385289][root][05/10/2024 15:11:38][model_weights_loader.py:estimate_load_parallel_num():610][INFO] free_mem: 23.26 model_mem: 14.38, load weights by 2 process
[process-385289][root][05/10/2024 15:11:38][model_weights_loader.py:init():87][INFO] merge lora is enable ? : False
[process-385438][root][05/10/2024 15:11:38][init.py:():14][INFO] init logger end
[process-385437][root][05/10/2024 15:11:38][init.py:():14][INFO] init logger end
[process-385437][root][05/10/2024 15:11:40][init.py:():28][INFO] no internal_source found
[process-385438][root][05/10/2024 15:11:40][init.py:():28][INFO] no internal_source found
[process-385437][root][05/10/2024 15:11:40][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385437][root][05/10/2024 15:11:40][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385437][root][05/10/2024 15:11:40][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385438][root][05/10/2024 15:11:40][hippo_helper.py:HippoHelper():13][INFO] get container_ip from socket:127.0.1.1
[process-385438][root][05/10/2024 15:11:40][report_worker.py:init():31][INFO] kmonitor report default tags: {}
[process-385438][root][05/10/2024 15:11:40][report_worker.py:init():44][INFO] test mode, kmonitor metrics not reported.
[process-385438][root][05/10/2024 15:11:40][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385437][root][05/10/2024 15:11:40][gpu_util.py:init():30][INFO] detected [4] gpus
[process-385438][root][05/10/2024 15:11:41][init.py:():9][INFO] no internal_source found
[process-385437][root][05/10/2024 15:11:41][init.py:():9][INFO] no internal_source found
[process-385289][root][05/10/2024 15:11:47][gpt.py:_load_weights():172][INFO] load weights time: 8.23 s
load final_layernorm.gamma to torch.Size([4096])
load final_layernorm.beta to torch.Size([4096])
+------------------------------------------+
| MODEL CONFIG |
+-----------------------+------------------+
| Options | Values |
+-----------------------+------------------+
| model_type | QWen |
| act_type | WEIGHT_TYPE.FP16 |
| weight_type | WEIGHT_TYPE.FP16 |
| max_seq_len | 8192 |
| use_sparse_head | False |
| use_multi_task_prompt | None |
| use_medusa | False |
| lora_infos | {} |
+-----------------------+------------------+
[process-385289][root][05/10/2024 15:11:47][async_model.py:init():28][INFO] first mem info: used:16259481600 free: 9510322176
[process-385289][root][05/10/2024 15:11:47][engine_creator.py:create_engine():46][INFO] executor_type: ExecutorType.Normal
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO][RANK 0][139646433424000][24-05-10 15:11:47] MMHA multi_block_mode is enabled
Segmentation fault (core dumped)
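Since the crash happens in native code, the Python traceback is lost. A generic way to narrow it down (a debugging sketch, not rtp-llm-specific) is the standard-library faulthandler module, which dumps each thread's Python stack when the process receives a fatal signal such as SIGSEGV:

```python
# Generic sketch for debugging "Segmentation fault (core dumped)" in a
# Python process: the standard-library faulthandler module dumps the
# Python stack of every thread when a fatal signal (SIGSEGV, SIGABRT,
# ...) arrives, pointing at the Python call that entered native code.
import faulthandler

faulthandler.enable()  # install the fatal-signal handlers
print(faulthandler.is_enabled())  # prints: True
```

Equivalently, the server can be launched with `python3 -X faulthandler -m maga_transformer.start_server` (or with `PYTHONFAULTHANDLER=1` set) so no code change is needed; the dump printed at the crash shows whether the fault occurs during weight loading or inside the attention kernels.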
When running the example test, it produced the same error:
(rtp-llm) h@acc:/opt/HF-MODEL/rtp-llm$ python example/test.py
Fetching 24 files: 100%|██████████| 24/24 [00:00<00:00, 26051.58it/s]
load final_layernorm.gamma to torch.Size([2048])
load final_layernorm.beta to torch.Size([2048])
+------------------------------------------+
| MODEL CONFIG |
+-----------------------+------------------+
| Options | Values |
+-----------------------+------------------+
| model_type | QWen |
| act_type | WEIGHT_TYPE.FP16 |
| weight_type | WEIGHT_TYPE.FP16 |
| max_seq_len | 8192 |
| use_sparse_head | False |
| use_multi_task_prompt | None |
| use_medusa | False |
| lora_infos | None |
+-----------------------+------------------+
[WARNING] gemm_config.in is not found; using default GEMM algo
[FT][INFO][RANK 0][140690512618112][24-05-10 14:59:40] MMHA multi_block_mode is enabled
Segmentation fault (core dumped)