2. DeepSeek OCR: Production-Grade Local Deployment with vLLM

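## 1. What Is DeepSeek OCR?
DeepSeek OCR is a two-stage Transformer document AI. It first compresses page images into compact
visual tokens, then decodes them with a high-capacity mixture-of-experts language model. Stage one
combines a windowed SAM vision Transformer, a dense CLIP-Large encoder, and a 16× convolutional
compressor. Stage two uses the DeepSeek-3B-MoE decoder (about 570 million parameters activated per
token) to reconstruct text, HTML, and figure annotations with minimal loss.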
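
To make the 16× compressor concrete, the sketch below counts the vision tokens left for a square page image. It assumes the encoder splits the image into non-overlapping 16-pixel patches, which the description above does not state; `vision_tokens` and its parameters are illustrative names, not part of the DeepSeek OCR code.

```python
def vision_tokens(image_side: int, patch_size: int = 16, compression: int = 16) -> int:
    """Estimate the vision tokens left after patching and convolutional compression.

    Assumption: the image is cut into non-overlapping square patches of
    `patch_size` pixels (16 is assumed here, not taken from the text), and the
    compressor then divides the token count by `compression` (16x, as stated).
    """
    if image_side % patch_size != 0:
        raise ValueError("image_side must be a multiple of patch_size")
    patches_per_side = image_side // patch_size
    patch_tokens = patches_per_side ** 2
    return patch_tokens // compression


if __name__ == "__main__":
    for side in (512, 1024):
        print(side, vision_tokens(side))
```

Under these assumptions a 1024×1024 page produces 4096 patch tokens, which the compressor reduces to 256.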

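Training covered 30 million pages of real PDFs plus synthetic charts, formulas, and diagrams, so the
model preserves layout structure, tables, chemical formulas (SMILES), and geometry tasks. Thanks to
its CLIP lineage, multimodal capability is fully retained: captioning and object grounding stay
accurate even after aggressive compression.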

Official documentation: DeepSeek OCR | Next-Generation Document Intelligence

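**Summary**:
- **Core technical breakthrough**:
	- DeepSeek OCR uses optical 2D mapping to turn text into visual images for efficient processing, rather than following the traditional character-recognition path.
	- Extreme compression: a page that takes several thousand tokens with traditional OCR can be compressed to roughly 100 tokens, a compression ratio of up to 10×, while accuracy stays above 97%.
- **Architectural innovation**:
	- DeepSeek OCR uses a two-stage design made up of DeepEncoder (a vision encoder with 380 million parameters) and DeepSeek-3B-MoE (a language decoder with 3 billion parameters). DeepEncoder is the core innovation.
	- Three-step encoding pipeline: local perception (SAM-base) → feature extraction → optical 2D mapping compression, which keeps memory use under control.
- **Revolutionary value**:
	- DeepSeek OCR is built for the era of large language models and addresses the context-window bottleneck that traditional OCR runs into in LLM applications.
	- It gives multimodal AI an efficient information-compression paradigm that brings vision and language together, so that the model "understands" document structure and semantics the way a person does instead of merely recognizing characters. This makes DeepSeek OCR the first OCR system designed specifically for the LLM ecosystem and marks a shift from "character recognition" to "document understanding".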
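
The compression and context-window points above come down to simple arithmetic. The sketch below works only on hypothetical token counts: `compression_ratio` and `pages_in_context` are illustrative helpers, and the 1,000-token page and 8,192-token context window are assumed example values, not measurements.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of the text tokens a page would need to the vision tokens used instead."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def pages_in_context(context_window: int, tokens_per_page: int) -> int:
    """How many whole pages fit in a context window at a given per-page cost."""
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    return context_window // tokens_per_page


if __name__ == "__main__":
    # Hypothetical page: 1,000 text tokens represented by 100 vision tokens.
    print(compression_ratio(1000, 100))   # 10.0
    # Hypothetical 8,192-token context window.
    print(pages_in_context(8192, 1000))   # 8 pages as plain text tokens
    print(pages_in_context(8192, 100))    # 81 pages as vision tokens
```

With these example numbers, the same context window holds about ten times as many pages when each page costs 100 tokens instead of 1,000.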

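## 2. Deploying DeepSeek OCR Locally with vLLM
### 2.1 Create a Cloud Server

Registering through the link below gives you 10 yuan of free compute credit and a lower rental price for cloud servers.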

[https://console.compshare.cn/light-gpu/](https://console.compshare.cn/light-gpu/)

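1. Scan the QR code to register and complete real-name verification
2. Open the console and deploy an instance
3. Connect to the cloud server remotely from VSCode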

### 2.2 Conda Environment Setup

```bash
# Create and activate a dedicated environment
conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr
```

### 2.3 Download the DeepSeek OCR Model

Use the modelscope SDK to download the DeepSeek OCR model.

```bash
pip install modelscope
```

```python
# download.py
from modelscope import snapshot_download

# Download the model snapshot into /root/llms/ and print where it was stored
model_dir = snapshot_download('deepseek-ai/DeepSeek-OCR', cache_dir="/root/llms/")
print(model_dir)
```
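
After the download finishes, it can help to confirm that files actually landed on disk. The sketch below walks a directory and reports the file count and total size using only the standard library. `summarize_dir` is a hypothetical helper, and the path in the usage lines is an assumption based on the `cache_dir` and model ID used above.

```python
from pathlib import Path


def summarize_dir(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under `root`."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    files = [p for p in root_path.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Assumed location: cache_dir from download.py plus the model ID.
    model_dir = "/root/llms/deepseek-ai/DeepSeek-OCR"
    try:
        count, size = summarize_dir(model_dir)
        print(f"{count} files, {size / 1e9:.2f} GB in {model_dir}")
    except FileNotFoundError as exc:
        print(f"Download may be incomplete: {exc}")
```

An empty directory or a suspiciously small total size suggests the download was interrupted and should be rerun.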

### 2.4 Clone DeepSeek OCR Locally

The GitHub repository contains the DeepSeek OCR inference demo.

```bash
git clone https://github.com/deepseek-ai/DeepSeek-OCR
```

### 2.5 Install Dependencies

```bash
# Download the vllm-0.8.5 wheel
wget https://github.com/vllm-project/vllm/releases/download/v0.8.5/vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
# Install PyTorch built for CUDA 11.8
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118
# Install the downloaded vLLM wheel
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
# Run from the cloned DeepSeek-OCR directory, where requirements.txt lives
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
```

**Note: if flash attention compiles for a long time without finishing, download the matching prebuilt wheel from the link below and install it directly.**

https://github.com/mjun0812/flash-attention-prebuild-wheels/releases?page=3

```bash
# Download
wget https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu118torch2.6-cp312-cp312-linux_x86_64.whl
# Install
pip install flash_attn-2.7.4.post1+cu118torch2.6-cp312-cp312-linux_x86_64.whl
```
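
Choosing the "matching" wheel means reading the tags in its filename. The sketch below splits a wheel filename into its standard components and does a simplified CPython version check. `parse_wheel_tags` and `matches_interpreter` are hypothetical helpers, and the check ignores the platform tag and the CUDA/torch markers in the version string, so treat it as a rough sanity check rather than what pip actually does.

```python
import sys


def parse_wheel_tags(filename: str) -> dict[str, str]:
    """Split a wheel filename into its components.

    Layout: {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """Simplified check: does the wheel's CPython tag fit this interpreter?"""
    tags = parse_wheel_tags(filename)
    py_tag = tags["python"]
    if not py_tag.startswith("cp") or not py_tag[2:].isdigit():
        return False  # this sketch only understands tags like cp38 or cp312
    digits = py_tag[2:]
    tag_version = (int(digits[0]), int(digits[1:] or 0))
    current = (version_info[0], version_info[1])
    if tags["abi"] == "abi3":
        # Stable-ABI wheels work on the tagged version and anything newer.
        return current >= tag_version
    return current == tag_version


if __name__ == "__main__":
    wheel = "flash_attn-2.7.4.post1+cu118torch2.6-cp312-cp312-linux_x86_64.whl"
    print(parse_wheel_tags(wheel))
    print("fits this interpreter:", matches_interpreter(wheel))
```

For the Python 3.12 environment created earlier, the `cp312` wheel above fits, and so does the `cp38-abi3` vLLM wheel because it targets the stable ABI.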

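**Note: if you want the vLLM and transformers code to run in the same environment, you do not need to worry about installation errors like the following: vllm 0.8.5+cu118 requires transformers>=4.51.1**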

```text
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.8.5+cu118 requires tokenizers>=0.21.1, but you have tokenizers 0.20.3 which is incompatible.
vllm 0.8.5+cu118 requires transformers>=4.51.1, but you have transformers 4.46.3 which is incompatible.
```
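
Because pip only warns about these conflicts, it can be useful to confirm which versions actually ended up installed. The sketch below compares installed versions against the pins from the install commands using the standard library's `importlib.metadata`. `EXPECTED`, `base_version`, and `check_versions` are hypothetical names, and flash-attn is left out because two different versions appear above.

```python
from importlib import metadata

# Pins taken from the install commands in this section.
EXPECTED = {
    "torch": "2.6.0",
    "torchvision": "0.21.0",
    "torchaudio": "2.6.0",
    "vllm": "0.8.5",
}


def base_version(version: str) -> str:
    """Drop a local suffix such as '+cu118' so '2.6.0+cu118' compares as '2.6.0'."""
    return version.split("+", 1)[0]


def check_versions(expected: dict[str, str], lookup=metadata.version) -> dict[str, str]:
    """Return {package: status}; status is 'ok', 'missing', or the installed version."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if base_version(installed) == wanted else installed
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name:12s} {status}")
```

Run it inside the `deepseek-ocr` environment; any line that is not `ok` shows either a missing package or the version that was installed instead of the pin.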

### 2.6 vLLM Inference Deployment

Note: edit INPUT_PATH/OUTPUT_PATH and the other settings in DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py

```python
MODEL_PATH = '/root/llms/deepseek-ai/DeepSeek-OCR'  # local model path
INPUT_PATH = '/root/DeepSeek-OCR/input/1.jpg'  # input path; the file type must match the script you run
OUTPUT_PATH = '/root/DeepSeek-OCR/output'  # output path
PROMPT = '<image>\n<|grounding|>Convert the document to markdown.'
# PROMPT = '<image>\nFree OCR.'
# TODO commonly used prompts
# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'
# .......
```
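
The `rec` prompt in the comment list takes a target string between `<|ref|>` and `<|/ref|>`. A minimal sketch of filling in that template follows. The `PROMPTS` dictionary and `locate_prompt` are hypothetical conveniences rather than part of `config.py`; the template strings themselves are copied from the comments above.

```python
# Prompt templates copied from the comments in config.py; the keys are made up here.
PROMPTS = {
    "document": "<image>\n<|grounding|>Convert the document to markdown.",
    "other_image": "<image>\n<|grounding|>OCR this image.",
    "free_ocr": "<image>\nFree OCR.",
    "figure": "<image>\nParse the figure.",
    "general": "<image>\nDescribe this image in detail.",
}


def locate_prompt(target: str) -> str:
    """Build the 'rec' prompt that asks the model to locate `target` in the image."""
    target = target.strip()
    if not target:
        raise ValueError("target text must not be empty")
    return f"<image>\nLocate <|ref|>{target}<|/ref|> in the image."


if __name__ == "__main__":
    print(repr(PROMPTS["document"]))
    print(repr(locate_prompt("先天下之忧而忧")))
```

The resulting string can be assigned to `PROMPT` in `config.py` in place of the default document prompt.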

**1. Image parsing**

```bash
# Run from the DeepSeek-OCR-vllm directory, next to config.py
python run_dpsk_ocr_image.py
```

**2. PDF document parsing**

```bash
# Run from the DeepSeek-OCR-vllm directory, next to config.py
python run_dpsk_ocr_pdf.py
```
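
Since the input file type has to match the script being run, a small guard can pick the script from the file extension. This is a sketch under assumptions: `pick_script` is a hypothetical helper, and the extension sets are guesses at commonly used formats, not a list taken from the demo code.

```python
from pathlib import Path

# Assumed extension sets; adjust them to the formats you actually use.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
PDF_EXTS = {".pdf"}


def pick_script(input_path: str) -> str:
    """Return the demo script name that matches the input file type."""
    ext = Path(input_path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "run_dpsk_ocr_image.py"
    if ext in PDF_EXTS:
        return "run_dpsk_ocr_pdf.py"
    raise ValueError(f"unsupported input type: {ext or '(none)'}")


if __name__ == "__main__":
    print(pick_script("/root/DeepSeek-OCR/input/1.jpg"))
```

Calling it with the `INPUT_PATH` value from `config.py` before launching catches a mismatch between the configured file and the chosen script early.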

### 2.7 WebUI Deployment

Reference: [GitHub - neosun100/DeepSeek-OCR-WebUI](https://github.com/neosun100/DeepSeek-OCR-WebUI)

## 3. MinerU

GitHub: [MinerU](https://github.com/opendatalab/MinerU "MinerU")

## 4. PaddleOCR

GitHub: [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)