Private Deployment of Alibaba's Qwen Large Models
The Qwen Model Family
Overview of the Qwen model family
Qwen2.5 greatly strengthens coding and mathematical ability. It makes significant gains in instruction following, long-text generation (over 8K tokens), understanding structured data (such as tables), and producing structured output (especially JSON format). It is also more resilient to varied system prompts, which improves role-play and condition-setting for chatbots. It supports long contexts of up to 128K tokens, can generate up to 8K tokens, and offers multilingual support for more than 29 languages.
- Qwen: text generation
- Qwen-Coder: code understanding and generation
- Qwen-Math: mathematics
- Qwen-Audio: audio understanding
- Qwen-VL: vision understanding
Deploying with Ollama
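Ollama pulls quantized Qwen builds from its model library and serves them locally over HTTP. A minimal deployment sketch (the `qwen2.5` tag refers to the default 7B build in the Ollama library; adjust to the variant you need):

```shell
# Pull the Qwen2.5 build from the Ollama model library
ollama pull qwen2.5

# Interactive chat in the terminal
ollama run qwen2.5

# Ollama also serves an HTTP API, on port 11434 by default
curl http://localhost:11434/api/generate \
    -d '{"model": "qwen2.5", "prompt": "你好", "stream": false}'
```

The LangChain examples later in these notes talk to this same local endpoint through `ChatOllama`.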


Deploying with Hugging Face TGI (Text Generation Inference)
# Deploy with docker on Linux:
docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -e HF_TOKEN="<secret>" \
    -p 8000:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id Qwen/Qwen2.5-7B-Instruct

# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
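TGI exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request can be issued from Python with only the standard library. A minimal sketch (base URL and model id match the docker command above; the actual call of course requires the server to be running):

```python
import json
from urllib import request

# The same OpenAI-style chat payload the curl example sends
payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}

def chat(base_url="http://localhost:8000"):
    """POST the payload to TGI and return the assistant's reply text."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, any OpenAI SDK pointed at this base URL works as well.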
Calling via LangChain
import json

from langchain_ollama import ChatOllama
from langgraph.prebuilt import create_react_agent


def weather(city: str):
    """Query the weather for a city."""
    if city == '北京':
        return '北京晴朗'
    elif city == '上海':
        return '上海多云'
    else:
        return f'不知道 {city}.'


def test_qwen():
    # Plain chat completion against the local Ollama server
    llm = ChatOllama(model='qwen2.5', temperature=0)
    query = '北京天气如何'
    result = llm.invoke([('user', query)])
    print(result)


def test_qwen_agent():
    # ReAct agent: the model decides when to call the weather tool
    llm = ChatOllama(model='qwen2.5', temperature=0)
    tools = [weather]
    langgraph_agent_executor = create_react_agent(llm, tools)
    query = '北京天气如何'
    result = langgraph_agent_executor.invoke({"messages": [("human", query)]})
    messages = [message.model_dump() for message in result['messages']]
    print(json.dumps(messages, indent=2, ensure_ascii=False))
Qwen agent output
[
  {
    "content": "北京天气如何",
    "additional_kwargs": {},
    "response_metadata": {},
    "type": "human",
    "name": null,
    "id": "347ccc50-4a18-4c5d-98c5-14d20caa04b7",
    "example": false
  },
  {
    "content": "",
    "additional_kwargs": {},
    "response_metadata": {
      "model": "qwen2.5",
      "created_at": "2024-10-31T04:12:17.997999Z",
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "function": {
              "name": "weather",
              "arguments": {
                "city": "北京"
              }
            }
          }
        ]
      },
      "done_reason": "stop",
      "done": true,
      "total_duration": 7989638042,
      "load_duration": 6329320500,
      "prompt_eval_count": 151,
      "prompt_eval_duration": 786551000,
      "eval_count": 19,
      "eval_duration": 869534000
    },
    "type": "ai",
    "name": null,
    "id": "run-169bc6ba-0ef7-4bf1-b48f-94af085031ad-0",
    "example": false,
    "tool_calls": [
      {
        "name": "weather",
        "args": {
          "city": "北京"
        },
        "id": "9e9ce2e0-7982-4be3-8f5d-29d1bd55e871",
        "type": "tool_call"
      }
    ],
    "invalid_tool_calls": [],
    "usage_metadata": {
      "input_tokens": 151,
      "output_tokens": 19,
      "total_tokens": 170
    }
  },
  {
    "content": "北京晴朗",
    "additional_kwargs": {},
    "response_metadata": {},
    "type": "tool",
    "name": "weather",
    "id": "1411793a-45ce-4bea-b112-ceee64008129",
    "tool_call_id": "9e9ce2e0-7982-4be3-8f5d-29d1bd55e871",
    "artifact": null,
    "status": "success"
  },
  {
    "content": "北京现在是晴朗的天气。请注意防晒哦!",
    "additional_kwargs": {},
    "response_metadata": {
      "model": "qwen2.5",
      "created_at": "2024-10-31T04:12:18.699542Z",
      "message": {
        "role": "assistant",
        "content": "北京现在是晴朗的天气。请注意防晒哦!"
      },
      "done_reason": "stop",
      "done": true,
      "total_duration": 693873250,
      "load_duration": 14966208,
      "prompt_eval_count": 190,
      "prompt_eval_duration": 173632000,
      "eval_count": 13,
      "eval_duration": 498376000
    },
    "type": "ai",
    "name": null,
    "id": "run-5666b25b-1a58-48b0-b00f-ecd47ad27082-0",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": [],
    "usage_metadata": {
      "input_tokens": 190,
      "output_tokens": 13,
      "total_tokens": 203
    }
  }
]
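The four messages above trace one ReAct turn: human question, assistant tool call, tool execution result, final answer. A small pure-Python sketch that recovers this flow from trimmed copies of the fields above (only the keys needed here are kept):

```python
import json

# Trimmed copies of the agent output: only type/content/tool_calls survive
messages = [
    {"type": "human", "content": "北京天气如何", "tool_calls": []},
    {"type": "ai", "content": "",
     "tool_calls": [{"name": "weather", "args": {"city": "北京"}}]},
    {"type": "tool", "content": "北京晴朗", "name": "weather"},
    {"type": "ai", "content": "北京现在是晴朗的天气。请注意防晒哦!", "tool_calls": []},
]

def trace(msgs):
    """Summarize each message as a (type, detail) step."""
    steps = []
    for m in msgs:
        calls = m.get("tool_calls") or []
        if calls:
            # An AI message that requests a tool rather than answering
            args = json.dumps(calls[0]["args"], ensure_ascii=False)
            steps.append((m["type"], f"call {calls[0]['name']}{args}"))
        else:
            steps.append((m["type"], m["content"]))
    return steps

for kind, detail in trace(messages):
    print(kind, "->", detail)
```

The step types come out as human, ai, tool, ai: the agent loop always ends with a plain AI message once no further tool calls are emitted.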
Qwen vision-language models
Qwen2-VL capabilities
- Understands images at various resolutions and aspect ratios
- Understands videos longer than 20 minutes
- Can act as an agent to operate devices such as phones and robots
- Multilingual support

Downloading the model
# Enable Git large-file storage
git lfs install
# Download from ModelScope
git clone https://www.modelscope.cn/Qwen/Qwen2-VL-7B-Instruct.git
# Download from Hugging Face
git clone https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
Launching the official web UI
# web_demo_mm.py ships with the official Qwen2-VL GitHub repository
python web_demo_mm.py -c ~/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct/

Calling programmatically
# Load model directly
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info


def test_vl():
    model_path = '/Users/seveniruby/.cache/modelscope/hub/qwen/Qwen2-VL-7B-Instruct/'
    # default: Load the model on the available device(s)
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_path, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_path)
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]

    # Preparation for inference
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )
    # inputs = inputs.to("cuda")
    # Use Apple MPS on macOS:
    # inputs = inputs.to("mps")

    # Inference: Generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=128)
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    print(output_text)
# Output
['The image depicts a serene beach scene with a woman and her dog enjoying a moment together. The woman is sitting on the sandy beach, facing the ocean, and appears to be engaging in a playful activity with her dog. She is wearing a plaid shirt and dark pants, and has long hair that flows down her back. Her expression is one of happiness and contentment as she smiles at her dog.\n\nThe dog, which appears to be a large breed, possibly a Labrador Retriever, is sitting beside her on the sand. The dog is wearing a harness with a leash attached, suggesting that it is under control and not running freely']
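Qwen2-VL takes video through the same chat-message schema; only the content type changes, and the rest of the pipeline (apply_chat_template, process_vision_info, processor) is identical to test_vl above. A minimal sketch of the message structure, where the frame paths and fps value are illustrative placeholders:

```python
# Video uses the same message schema as images; only the content part differs.
# The file paths below are placeholders for illustration.
video_message = {
    "role": "user",
    "content": [
        {
            "type": "video",
            # Either a list of pre-extracted frames or a single video file path
            "video": ["file:///path/frame_0.jpg", "file:///path/frame_1.jpg"],
            "fps": 1.0,
        },
        {"type": "text", "text": "Describe this video."},
    ],
}

content_types = [part["type"] for part in video_message["content"]]
print(content_types)
```

Passing `[video_message]` through `process_vision_info` yields the `video_inputs` that the processor consumes, just as `image_inputs` is produced for images.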
