Using vLLM with PyTorch
Last updated on May 20, 2025 am
🧙 Questions
☄️ Ideas
Offline installation of vLLM
# Requires a Python 3.9+ environment
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-vault-8.5.2111.repo
yum clean all && yum makecache
yum remove python3
yum install python39
python3 -m pip install torch
python3 -V
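Beyond `python3 -V`, the interpreter itself can confirm that it meets the 3.9+ requirement and that `torch` is importable. A minimal sketch using only the standard library; the helper names `meets_minimum` and `is_installed` are illustrative and not part of the original notes:

```python
import importlib.util
import sys

MINIMUM = (3, 9)  # the "Python 3.9+" requirement stated in this section


def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def is_installed(module_name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


status = "ok" if meets_minimum(sys.version_info) else "too old"
print("Python", sys.version.split()[0], status)
print("torch installed:", is_installed("torch"))
```

Run it with the same `python3` that pip installs into, so the check reflects the interpreter the later commands will use.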
wget -P /tmp https://github.com/vllm-project/vllm/releases/download/v0.8.4/vllm-0.8.4-cp38-abi3-manylinux1_x86_64.whl
python3 -m pip install /tmp/vllm-0.8.4-cp38-abi3-manylinux1_x86_64.whl
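The commands in this section fetch packages over the network. For a target machine with no network access, one common approach is to collect the wheels on a connected machine with `pip download` and then install from that directory with `pip install --no-index --find-links`. The sketch below only builds and prints the two commands. The directory `/tmp/wheels` and the helper names are assumptions, and the download machine is assumed to have the same OS, CPU architecture, and Python version as the target:

```python
import shlex

PYTHON = "python3"  # interpreter name used throughout this section


def download_command(requirements, wheel_dir):
    """Command for the online machine: fetch the wheels and their dependencies."""
    return [PYTHON, "-m", "pip", "download", "--dest", wheel_dir, *requirements]


def offline_install_command(requirements, wheel_dir):
    """Command for the offline machine: resolve only from the local wheel directory."""
    return [PYTHON, "-m", "pip", "install", "--no-index",
            "--find-links", wheel_dir, *requirements]


# Packages installed in this section; /tmp/wheels is a hypothetical directory
# that would be copied from the online machine to the offline one.
requirements = [
    "/tmp/vllm-0.8.4-cp38-abi3-manylinux1_x86_64.whl",
    "torch",
    "transformers[torch]",
    "fastapi",
    "uvicorn",
]
print(shlex.join(download_command(requirements, "/tmp/wheels")))
print(shlex.join(offline_install_command(requirements, "/tmp/wheels")))
```

Printing rather than executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell on the appropriate machine.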
# git clone https://github.com/huggingface/transformers.git
# cd transformers
# pip3 install -e .
pip3 install torch
pip3 install 'transformers[torch]'
pip3 install fastapi uvicorn
uvicorn api:app --host 0.0.0.0 --port 8000
curl -X POST http://127.0.0.1:8000/chat \
-H "Content-Type: application/json" \
-d '{"prompt": "你是谁?"}'
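The same request can be sent from Python with only the standard library, which is convenient on a host where curl is unavailable. A minimal sketch; the helper names are illustrative, the base URL assumes the port from the `uvicorn` command, and the response is assumed to be JSON:

```python
import json
import urllib.request


def build_chat_request(base_url, prompt):
    """Build a POST request to <base_url>/chat with a JSON body."""
    body = json.dumps({"prompt": prompt}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(base_url, prompt, timeout=60):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the server to be running):
# print(send_chat("http://127.0.0.1:8000", "你是谁?"))
```

Encoding the body as UTF-8 with `ensure_ascii=False` sends the Chinese prompt as-is instead of as `\u` escapes; either form is valid JSON.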
# Output of the pip3 install commands above:
Collecting torch
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/40/bb/feb5644baa621fd8e1e88bf51f6fa38ab3f985d472a764144ff4867ac1d6/torch-2.6.0-cp39-cp39-manylinux1_x86_64.whl (766.7 MB)
Collecting nvidia-nccl-cu12==2.21.5; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/df/99/12cd266d6233f47d00daf3a72739872bdc10267d0383508b0b9c84a18bb6/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
Collecting nvidia-nvtx-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/87/20/199b8713428322a2f22b722c62b8cc278cc53dffa9705d744484b5035ee9/nvidia_nvtx_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (99 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/67/42/f4f60238e8194a3106d06a058d494b18e006c10bb2b915655bd9f6ea4cb1/nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (13.8 MB)
Collecting nvidia-curand-cu12==10.3.5.147; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/8a/6d/44ad094874c6f1b9c654f8ed939590bdc408349f137f9b98a3a23ccec411/nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl (56.3 MB)
Collecting nvidia-cublas-cu12==12.4.5.8; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/ae/71/1c91302526c45ab494c23f61c7a84aa568b8c1f9d196efa5993957faf906/nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl (363.4 MB)
Collecting filelock
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/4d/36/2a115987e2d8c300a974597416d9de88f2444426de9571f4b59b2cca3acc/filelock-3.18.0-py3-none-any.whl (16 kB)
Collecting triton==3.2.0; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/bc/74/9f12bdedeb110242d8bb1bd621f6605e753ee0cbf73cf7f3a62b8173f190/triton-3.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (253.1 MB)
Collecting nvidia-cusparse-cu12==12.3.1.170; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/db/f7/97a9ea26ed4bbbfc2d470994b8b4f338ef663be97b8f677519ac195e113d/nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl (207.5 MB)
Collecting typing-extensions>=4.10.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/8b/54/b1ae86c0973cc6f0210b53d508ca3641fb6d0c56823f288d108bc7ab3cc8/typing_extensions-4.13.2-py3-none-any.whl (45 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/2c/14/91ae57cd4db3f9ef7aa99f4019cfa8d54cb4caa7e00975df6467e9725a9f/nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (24.6 MB)
Collecting sympy==1.13.1; python_version >= "3.9"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b2/fe/81695a1aa331a842b582453b605175f419fe8540355886031328089d840a/sympy-1.13.1-py3-none-any.whl (6.2 MB)
Collecting nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/ea/27/1795d86fe88ef397885f2e580ac37628ed058a92ed2c39dc8eac3adf0619/nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (883 kB)
Collecting jinja2
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl (134 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/9f/fd/713452cd72343f682b1c7b9321e23829f00b842ceaedcda96e742ea0b0b3/nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl (664.8 MB)
Collecting nvidia-cusparselt-cu12==0.6.2; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/78/a8/bcbb63b53a4b1234feeafb65544ee55495e1bb37ec31b999b963cbccfd1d/nvidia_cusparselt_cu12-0.6.2-py3-none-manylinux2014_x86_64.whl (150.1 MB)
Collecting nvidia-cusolver-cu12==11.6.1.9; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3a/e1/5b9089a4b2a4790dfdea8b3a006052cfecff58139d5a4e34cb1a51df8d6f/nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl (127.9 MB)
Collecting nvidia-cufft-cu12==11.2.1.3; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/27/94/3266821f65b92b3138631e9c8e7fe1fb513804ac934485a8d05776e1dd43/nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl (211.5 MB)
Collecting fsspec
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/44/4b/e0cfc1a6f17e990f3e64b7d941ddc4acdc7b19d6edd51abf495f32b1a9e4/fsspec-2025.3.2-py3-none-any.whl (194 kB)
Collecting networkx
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d5/f0/8fbc882ca80cf077f1b246c0e3c3465f7f415439bdea6b899f6b19f61f70/networkx-3.2.1-py3-none-any.whl (1.6 MB)
Collecting nvidia-nvjitlink-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/ff/ff/847841bacfbefc97a00036e0fce5a0f086b640756dc38caea5e1bb002655/nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)
Collecting mpmath<1.4,>=1.1.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
Collecting MarkupSafe>=2.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/53/8f/f339c98a178f3c1e545622206b40986a4c3307fe39f70ccd3d9df9a9e425/MarkupSafe-3.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (20 kB)
Collecting transformers[torch]
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a9/b6/5257d04ae327b44db31f15cce39e6020cc986333c715660b1315a9724d82/transformers-4.51.3-py3-none-any.whl (10.4 MB)
Collecting pyyaml>=5.1
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/3d/32/e7bd8535d22ea2874cef6a81021ba019474ace0d13a4819c2a4bce79bd6a/PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB)
Collecting tokenizers<0.22,>=0.21
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/8a/63/38be071b0c8e06840bc6046991636bcb30c27f6bb1e670f4f4bc87cf49cc/tokenizers-0.21.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Requirement already satisfied: filelock in /usr/local/lib/python3.9/site-packages (from transformers[torch]) (3.18.0)
Collecting numpy>=1.17
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b9/14/78635daab4b07c0930c919d451b8bf8c164774e6a3413aed04a6d95758ce/numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
Collecting tqdm>=4.27
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl (78 kB)
Collecting requests
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl (64 kB)
Collecting huggingface-hub<1.0,>=0.30.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/93/27/1fb384a841e9661faad1c31cbfa62864f59632e876df5d795234da51c395/huggingface_hub-0.30.2-py3-none-any.whl (481 kB)
Collecting packaging>=20.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/20/12/38679034af332785aac8774540895e234f4d07f7545804097de4b666afd8/packaging-25.0-py3-none-any.whl (66 kB)
Collecting safetensors>=0.4.3
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a6/f8/dae3421624fcc87a89d42e1898a798bc7ff72c61f38973a65d60df8f124c/safetensors-0.5.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (471 kB)
Collecting regex!=2019.12.17
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/86/44/2101cc0890c3621b90365c9ee8d7291a597c0722ad66eccd6ffa7f1bcc09/regex-2024.11.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (780 kB)
Collecting accelerate>=0.26.0; extra == "torch"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/63/b1/8198e3cdd11a426b1df2912e3381018c4a4a55368f6d0857ba3ca418ef93/accelerate-1.6.0-py3-none-any.whl (354 kB)
Requirement already satisfied: torch>=2.0; extra == "torch" in /usr/local/lib64/python3.9/site-packages (from transformers[torch]) (2.6.0)
Collecting certifi>=2017.4.17
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/38/fc/bce832fd4fd99766c04d1ee0eead6b0ec6486fb100ae5e74c1d91292b982/certifi-2025.1.31-py3-none-any.whl (166 kB)
Collecting idna<4,>=2.5
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/76/c6/c88e154df9c4e1a2a66ccf0005a88dfb2650c1dffb6f5ce603dfbd452ce3/idna-3.10-py3-none-any.whl (70 kB)
Collecting charset-normalizer<4,>=2
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/5a/6d/e2773862b043dcf8a221342954f375392bb2ce6487bcd9f2c1b34e1d6781/charset_normalizer-3.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (146 kB)
Collecting urllib3<3,>=1.21.1
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/6b/11/cc635220681e93a0183390e26485430ca2c7b5f9d33b15c74c2861cb8091/urllib3-2.4.0-py3-none-any.whl (128 kB)
Collecting psutil
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/bf/b9/b0eb3f3cbcb734d930fdf839431606844a825b23eaf9a6ab371edac8162c/psutil-7.0.0-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (277 kB)
Collecting fastapi
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/50/b3/b51f09c2ba432a576fe63758bddc81f78f0c6309d9e5c10d194313bf021e/fastapi-0.115.12-py3-none-any.whl (95 kB)
Collecting starlette<0.47.0,>=0.40.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/8b/0c/9d30a4ebeb6db2b25a841afbb80f6ef9a854fc3b41be131d249a977b4959/starlette-0.46.2-py3-none-any.whl (72 kB)
Collecting pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b0/1d/407b29780a289868ed696d1616f4aad49d6388e5a77f567dcd2629dcd7b8/pydantic-2.11.3-py3-none-any.whl (443 kB)
Collecting anyio<5,>=3.6.2
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/a1/ee/48ca1a7c89ffec8b6a0c5d02b89c305671d5ffd8d3c94acf8b8c408575bb/anyio-4.9.0-py3-none-any.whl (100 kB)
Collecting typing-inspection>=0.4.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/31/08/aa4fdfb71f7de5176385bd9e90852eaf6b5d622735020ad600f2bab54385/typing_inspection-0.4.0-py3-none-any.whl (14 kB)
Collecting pydantic-core==2.33.1
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/05/a8/fd79111eb5ab9bc4ef98d8fb0b3a2ffdc80107b2c59859a741ab379c96f8/pydantic_core-2.33.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting annotated-types>=0.6.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl (13 kB)
Collecting exceptiongroup>=1.0.2; python_version < "3.11"
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/02/cc/b7e31358aac6ed1ef2bb790a9746ac2c69bcb3c8588b41616914eb106eaf/exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Collecting sniffio>=1.1
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/e9/44/75a9c9421471a6c4805dbf2356f7c181a29c1879239abab1ea2cc8f38b40/sniffio-1.3.1-py3-none-any.whl (10 kB)
Collecting uvicorn
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/b1/4b/4cef6ce21a2aaca9d852a6e84ef4f135d99fcd74fa75105e2fc0c8308acd/uvicorn-0.34.2-py3-none-any.whl (62 kB)
|████████████████████████████████| 62 kB 19.0 MB/s
Requirement already satisfied: typing-extensions>=4.0; python_version < "3.11" in /usr/local/lib/python3.9/site-packages (from uvicorn) (4.13.2)
Collecting click>=7.0
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/7e/d4/7ebdbd03970677812aac39c869717059dbb71a4cfc033ca6e5221787892c/click-8.1.8-py3-none-any.whl (98 kB)
|████████████████████████████████| 98 kB 28.8 MB/s
Collecting h11>=0.8
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/95/04/ff642e65ad6b90db43e668d70ffb6736436c7ce41fcc549f4e9472234127/h11-0.14.0-py3-none-any.whl (58 kB)
|████████████████████████████████| 58 kB 42.1 MB/s
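The log above records the exact wheel pip resolved for each package, which is useful when reproducing the same set on an offline machine. A minimal sketch that turns such a log into `name==version` pins by reading the wheel filenames on the `Downloading` lines. The function name is illustrative, and it assumes every downloaded file is a wheel named according to the standard `name-version-...whl` convention:

```python
import re

# Last path segment of a URL that ends in ".whl"
WHEEL_RE = re.compile(r"/([^/\s]+\.whl)\b")


def pins_from_pip_log(log_text):
    """Return sorted 'name==version' pins for every wheel pip downloaded."""
    pins = {}
    for line in log_text.splitlines():
        if "Downloading" not in line:
            continue
        match = WHEEL_RE.search(line)
        if match is None:
            continue
        # Wheel filenames start with "<name>-<version>-"; the name part
        # contains no hyphens, so the first two fields are enough.
        name, version = match.group(1).split("-")[:2]
        pins[name.replace("_", "-").lower()] = version
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


# Two lines copied from the log above
sample = """\
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/40/bb/feb5644baa621fd8e1e88bf51f6fa38ab3f985d472a764144ff4867ac1d6/torch-2.6.0-cp39-cp39-manylinux1_x86_64.whl (766.7 MB)
Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/df/99/12cd266d6233f47d00daf3a72739872bdc10267d0383508b0b9c84a18bb6/nvidia_nccl_cu12-2.21.5-py3-none-manylinux2014_x86_64.whl (188.7 MB)
"""
print("\n".join(pins_from_pip_log(sample)))
```

The resulting lines can be saved as a requirements file and passed to pip with `-r`, so a later install resolves to the same versions seen in this log.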
🔗 Links
Using vLLM with PyTorch
https://ispong.isxcode.com/pytorch/pytorch/pytorch vllm使用/