From development and debugging all the way to production launch, the entire journey takes just one workspace. DevPod redefines the standard for AI engineering: only when development and deployment are no longer disconnected is a model's value truly unlocked.
Introduction
Say goodbye to fragmented development experiences: DevPod delivers a one-stop closed loop from code to service. This article walks step by step through the complete workflow of taking the DeepSeek-OCR model from cloud development and local-style debugging to production deployment on Function Compute FunModel, letting the model truly leave the lab, go live as a service within minutes, and reshaping the path from AI model development to delivery.
Recap: why does DevPod make launching DeepSeek-OCR so easy?
In the first article of this series, "Why Others Launch DeepSeek-OCR in Seconds with DevPod While You're Still Installing Environments?" (《为什么别人用 DevPod 秒启 DeepSeek-OCR,你还在装环境?》), we saw how DevPod compresses the once-tedious environment configuration, dependency installation, and hardware adaptation into under 60 seconds. No more agonizing over CUDA version conflicts, Python environment isolation, or slow model-weight downloads: with its pre-provisioned cloud environments, DevPod lets developers interact with the DeepSeek-OCR model the moment they enter the workspace, delivering a genuinely out-of-the-box AI development experience.
DevPod is a cloud-native AI development tool. It provides a unified workspace with pre-provisioned environments, keeping development, testing, and production consistent and eliminating "environment drift" altogether.
It offers one-click access to GPU resources and supports the full workflow of coding, debugging, model tuning, image packaging, and production deployment, with no switching between platforms or tools.
It also integrates deeply with FunModel, adding performance monitoring, log analysis, online debugging, and rapid iteration, so AI models travel from the lab to production services far more efficiently.
From development to production: DevPod's closed-loop workflow
Launching the model, however, is only the beginning. Real business scenarios also demand model tuning, code debugging, performance testing, service packaging, and production deployment. Done the traditional way, these steps mean switching among multiple platforms and tools, with data and code shuttling between environments, and the classic "it works on my machine" embarrassment is never far away.
With a unified workspace and seamless deployment, DevPod closes the last mile from code to service. Below, we demonstrate this workflow end to end with a hands-on DeepSeek-OCR case study.
1. Development and debugging: a cloud lab with VS Code + GPU acceleration
After starting a DeepSeek-OCR environment instance in DevPod, we immediately get a GPU-equipped cloud VS Code development environment. It is more than a container that runs the model: it is a complete end-to-end research and development platform. Based on the inference example shipped with DeepSeek-OCR-vLLM, we build server.py as the core entry point of the inference service, exposing an efficient, scalable inference API (full source code in the appendix).
/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py
```python
import os
import io
import torch
import uvicorn
import requests
from PIL import Image
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
import tempfile
import fitz
from concurrent.futures import ThreadPoolExecutor
import asyncio

# Set environment variables
if torch.version.cuda == '11.8':
    os.environ["TRITON_PTXAS_PATH"] = "/usr/local/cuda-11.8/bin/ptxas"
os.environ['VLLM_USE_V1'] = '0'
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

from config import MODEL_PATH, CROP_MODE, MAX_CONCURRENCY, NUM_WORKERS
from vllm import LLM, SamplingParams
from vllm.model_executor.models.registry import ModelRegistry
from deepseek_ocr import DeepseekOCRForCausalLM
from process.ngram_norepeat import NoRepeatNGramLogitsProcessor
from process.image_process import DeepseekOCRProcessor

# Register model
ModelRegistry.register_model("DeepseekOCRForCausalLM", DeepseekOCRForCausalLM)

# Initialize model
print("Loading model...")

# ... (model and sampling-parameter initialization omitted, see appendix)

# Initialize FastAPI app
app = FastAPI(title="DeepSeek-OCR API", version="1.0.0")

# ... (request/response models and helper functions omitted, see appendix)

@app.post("/ocr_batch", response_model=ResponseData)
async def ocr_batch_inference(request: RequestData):
    """
    Main OCR batch processing endpoint
    Accepts a list of image URLs and/or PDF URLs for OCR processing
    Returns a list of OCR results corresponding to each input document
    Supports both individual image processing and PDF-to-image conversion
    """
    print(f"Received request data: {request}")
    try:
        input_data = request.input
        prompt = request.prompt  # Get the prompt from the request

        if not input_data.images and not input_data.pdfs:
            raise HTTPException(status_code=400, detail="Either 'images' or 'pdfs' (or both) must be provided as lists.")

        all_batch_inputs = []
        final_output_parts = []

        # Process images if provided
        if input_data.images:
            batch_inputs_images, counts_images = await process_items_async(input_data.images, is_pdf=False, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_images)
            final_output_parts.append(counts_images)

        # Process PDFs if provided
        if input_data.pdfs:
            batch_inputs_pdfs, counts_pdfs = await process_items_async(input_data.pdfs, is_pdf=True, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_pdfs)
            final_output_parts.append(counts_pdfs)

        if not all_batch_inputs:
            raise HTTPException(status_code=400, detail="No valid images or PDF pages were processed from the input URLs.")

        # Run inference on the combined batch
        outputs_list = await run_inference(all_batch_inputs)

        # Reconstruct final output list based on counts
        final_outputs = []
        output_idx = 0
        # Flatten the counts list
        all_counts = [count for sublist in final_output_parts for count in sublist]
        for count in all_counts:
            # Get 'count' number of outputs for this input
            input_outputs = outputs_list[output_idx : output_idx + count]
            output_texts = []
            for output in input_outputs:
                content = output.outputs[0].text
                if '<|end▁of▁sentence|>' in content:
                    content = content.replace('<|end▁of▁sentence|>', '')
                output_texts.append(content)
            # Combine pages if it was a multi-page PDF input (or image treated as PDF)
            if count > 1:
                combined_text = "\n<--- Page Split --->\n".join(output_texts)
                final_outputs.append(combined_text)
            else:
                # Single image or single-page PDF
                final_outputs.append(output_texts[0] if output_texts else "")
            output_idx += count  # Move to the next set of outputs

        return ResponseData(output=final_outputs)
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")

# ... (health-check and root endpoints omitted, see appendix)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)
```

Local testing
```bash
# Start the inference service in one terminal
$ python /workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py

# Open another terminal and send a test request
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  http://127.0.0.1:8000/ocr_batch
```

You can also grab a proxy path from the Quick Access tab, for example https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/, and call and debug the service directly from an external client such as Postman.
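The same request can also be scripted for repeatable debugging. Below is a minimal Python sketch, assuming the proxy path above is reachable and the `requests` package is installed locally; the URL is specific to our DevPod instance, so substitute your own:

```python
import requests

# Proxy path copied from the DevPod Quick Access tab (replace with your own instance's path)
BASE_URL = "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000"

payload = {
    "input": {
        "pdfs": ["https://images.devsapp.cn/test/ocr-test.pdf"],
    },
    "prompt": "<image>\nFree OCR.",
}

# /ocr_batch returns {"output": [...]}, one entry per input document
resp = requests.post(f"{BASE_URL}/ocr_batch", json=payload, timeout=300)
resp.raise_for_status()
for i, text in enumerate(resp.json()["output"]):
    print(f"--- document {i} ---")
    print(text[:500])  # preview the first 500 characters of each result
```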
Test: image
```bash
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "images": [
        "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"
      ]
    },
    "prompt": "<image>\n<|grounding|>Convert the document to markdown."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"
```

Test: PDF
```bash
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"
```
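One detail worth knowing when consuming PDF results: server.py joins the per-page outputs of a multi-page PDF with a fixed page-split marker. A small client-side sketch (the marker string is taken verbatim from server.py) splits them back into pages:

```python
# Separator that server.py inserts between pages of a multi-page PDF result
PAGE_SPLIT = "\n<--- Page Split --->\n"

def split_pages(document_text: str) -> list[str]:
    """Split a combined multi-page OCR result back into per-page texts."""
    return document_text.split(PAGE_SPLIT)
```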
Test: mixed images and PDFs
```bash
$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "pdfs": [
        "https://images.devsapp.cn/test/ocr-test.pdf"
      ],
      "images": [
        "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"
      ]
    },
    "prompt": "<image>\nFree OCR."
  }' \
  "https://devpod-dbbeddba-ngywxigepn.cn-hangzhou.ide.fc.aliyun.com/proxy/8000/ocr_batch"
```

This is where DevPod shines: all dependencies are pre-installed and GPU resources are ready the moment you need them, so developers can focus on algorithm optimization and business logic instead of environment problems.
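Note the ordering for mixed requests: the /ocr_batch handler always processes the images list before the pdfs list, so the output array contains all image results first and then all PDF results, regardless of field order in the JSON body. A short sketch of mapping results back to inputs, assuming the mixed request above against a local server:

```python
import requests

BASE_URL = "http://127.0.0.1:8000"  # or your DevPod proxy path

payload = {
    "input": {
        "pdfs": ["https://images.devsapp.cn/test/ocr-test.pdf"],
        "images": ["https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png"],
    },
    "prompt": "<image>\nFree OCR.",
}
resp = requests.post(f"{BASE_URL}/ocr_batch", json=payload, timeout=600)
resp.raise_for_status()
outputs = resp.json()["output"]

# Image results come first, PDF results after them, so here
# outputs[0] is the image and outputs[1] the (page-joined) PDF.
image_text, pdf_text = outputs[0], outputs[1]
```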
2. Service packaging: one-click conversion into an image artifact
Once the model has passed validation in the development environment, the next step is to package it as an image artifact. In FunModel's DevPod, this takes nothing more than the following steps:
For details, see the documentation on DevPod image building and ACR integration.
3. One-click deployment: from workspace to production
Once the image has been built and pushed, it lives in ACR and can be deployed as a FunModel model service with a single click.
4. Monitoring and iteration: a closed-loop DevOps experience
Deployment is not the finish line. DevPod integrates deeply with FunModel to provide a complete monitoring dashboard:
Performance monitoring: real-time views of GPU utilization, request latency, and throughput
Log analysis: centralized collection of logs from all instances, with keyword search
Deployment history: every configuration change (GPU type, scaling policy, timeout, and so on) is recorded and traceable
Online debugging: quickly exercise the deployed model service
When it is time to optimize the model or fix an issue, developers can:
Spot the problem in the monitoring dashboard
Open DevPod directly and continue developing and debugging
Validate the fix
Build a new image and deploy it with one click
The entire loop happens in one consistent environment, sidestepping the problems caused by environment mismatches and enabling genuinely seamless collaboration between development and operations.
Summary
As this hands-on walkthrough shows, DevPod does more than solve the "hard to get started" problem: it builds a complete closed loop from code to service:
Environment consistency: development, testing, and production environments are identical, eliminating "environment drift"
Elastic resources: GPUs are allocated on demand, modest specs for development and high specs for production
Workflow integration: no hopping between platforms; every step happens in a single workspace
Zero-learning-curve deployment: no need to master K8s, Dockerfiles, or other complex machinery; stay focused on business value
DevFlow1: a seamless closed loop of cloud development and deployment
DevFlow1 depicts the streamlined workflow a developer gets with DevPod:
The developer starts a pre-configured cloud development environment, with all required dependencies and GPU resources built in, and can begin writing and debugging code immediately.
Once the code changes are done, there is no Dockerfile to write and no build pipeline to manage: a single click packages the current development environment and code into a standardized image.
That image can be deployed directly as a production-grade service exposing an API.
When iteration is needed, the developer returns seamlessly to the development environment, makes changes, and rebuilds and updates the online service with another click.
The whole flow automates the chain from development and debugging to deployment and iteration, hiding infrastructure complexity entirely so developers can concentrate on business logic and model optimization.
DevFlow2: an engineering-oriented developer workflow
DevFlow2 suits developers already fluent in containerization and engineering practice:
The developer starts from a designated stable version (commit) in the code repository, spins up a dedicated development environment, and iterates on code, installs dependencies, and runs integration tests.
Once tests pass and the results look right, the developer prepares for deployment: hand-writing or adjusting the Dockerfile, precisely configuring the image build logic, and setting the function entry point or service parameters as needed.
The system then rebuilds the image from that Dockerfile and runs end-to-end tests to guarantee consistent behavior in production (see the test sketch after this list).
Finally, the code and Dockerfile changes are committed to Git together, completing a standard, traceable, and reproducible release.
This flow gives developers fine-grained control over deployment details, a good match for teams that value engineering rigor and long-term maintainability.
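To make the end-to-end test step above concrete, here is a minimal pytest-style sketch that probes a redeployed service's /health and /ocr_batch endpoints (both defined in server.py). The SERVICE_URL is a hypothetical placeholder, not part of the actual workflow:

```python
import requests

# Hypothetical URL of the redeployed service; replace with your FunModel endpoint
SERVICE_URL = "https://your-funmodel-service.example.com"

def test_health():
    """The service should report healthy before traffic is routed to it."""
    resp = requests.get(f"{SERVICE_URL}/health", timeout=10)
    assert resp.status_code == 200
    assert resp.json().get("status") == "healthy"

def test_ocr_batch_smoke():
    """A single-PDF request should return exactly one non-empty OCR result."""
    payload = {
        "input": {"pdfs": ["https://images.devsapp.cn/test/ocr-test.pdf"]},
        "prompt": "<image>\nFree OCR.",
    }
    resp = requests.post(f"{SERVICE_URL}/ocr_batch", json=payload, timeout=300)
    assert resp.status_code == 200
    output = resp.json()["output"]
    assert len(output) == 1 and output[0].strip()
```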
On the Alibaba Cloud FunModel platform, we are watching the AI development paradigm shift from "build the infrastructure first, then develop the model" to "validate the idea first, then scale it up". As the core vehicle of this shift, DevPod lets AI developers return to creating, instead of being shackled by tools and environments.
About FunModel, the Function Compute model service
FunModel is a full-lifecycle management platform for AI model development, deployment, and operations. Supply nothing but the model files (for example, a model repository from a community such as ModelScope or Hugging Face), and FunModel's automated tooling quickly packages and deploys the model service, handing you an inference API that is ready to call. The platform is designed to raise resource efficiency and simplify development and deployment.
Built on Serverless + GPU, FunModel naturally offers a simple, lightweight, zero-barrier way to integrate models: individual developers get a pleasant playground for experimenting with models, and enterprise developers get fast, efficient deployment, operations, and iteration.
On the Alibaba Cloud FunModel platform, developers can expect:
Fast model deployment: model onboarding drops from cycles measured in weeks to 5 minutes, with zero development and no scheduling delays
One-click scaling, so operations stop being a burden: multiple scaling policies track business traffic closely for truly "painless operations"
Technical advantages
| Feature | How FunModel implements it | Notes |
|---|---|---|
| Resource utilization | GPU virtualization and resource pooling. | Multiple tasks share the underlying hardware, raising the overall utilization of compute resources. |
| Instance readiness time | Snapshot-based state restoration. | On startup, an instance restores its running state from a snapshot in milliseconds, keeping creation-to-ready time within seconds. |
| Scale-out responsiveness | A pre-warmed resource pool combined with fast instance restoration. | As load rises, new instances are scheduled and started from the pre-warmed pool, delivering second-level horizontal scaling. |
| Automated deployment time | A one-click build-and-deploy pipeline. | A standard deployment (from code commit to service online) typically completes within 10 minutes. |
Visit the model marketplace (https://fcnext.console.aliyun.com/fun-model/cn-hangzhou/fun-model/model-market) to deploy DeepSeek-OCR in a few clicks.
For more details, refer to the FunModel documentation.
Appendix
Full source code:
/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/server.py
```python
import os
import io
import torch
import uvicorn
import requests
from PIL import Image
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
import tempfile
import fitz
from concurrent.futures import ThreadPoolExecutor
import asyncio

# Set environment variables
if torch.version.cuda == '11.8':
    os.environ["TRITON_PTXAS_PATH"] = "/usr/local/cuda-11.8/bin/ptxas"
os.environ['VLLM_USE_V1'] = '0'
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

from config import MODEL_PATH, CROP_MODE, MAX_CONCURRENCY, NUM_WORKERS
from vllm import LLM, SamplingParams
from vllm.model_executor.models.registry import ModelRegistry
from deepseek_ocr import DeepseekOCRForCausalLM
from process.ngram_norepeat import NoRepeatNGramLogitsProcessor
from process.image_process import DeepseekOCRProcessor

# Register model
ModelRegistry.register_model("DeepseekOCRForCausalLM", DeepseekOCRForCausalLM)

# Initialize model
print("Loading model...")
llm = LLM(
    model=MODEL_PATH,
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    block_size=256,              # Memory block size for KV cache
    enforce_eager=False,         # Do not force eager mode; allow CUDA graph optimization
    trust_remote_code=True,      # Allow execution of code from remote repositories
    max_model_len=8192,          # Maximum sequence length the model can handle
    swap_space=0,                # No swapping to CPU, keeping everything on GPU
    max_num_seqs=max(MAX_CONCURRENCY, 100),  # Maximum number of sequences to process concurrently
    tensor_parallel_size=1,      # Number of GPUs for tensor parallelism (1 = single GPU)
    gpu_memory_utilization=0.9,  # Use 90% of GPU memory for model execution
    disable_mm_preprocessor_cache=True  # Disable cache for multimodal preprocessor to avoid issues
)

# Configure sampling parameters
# NoRepeatNGramLogitsProcessor prevents repetition in generated text by tracking n-gram patterns
logits_processors = [NoRepeatNGramLogitsProcessor(ngram_size=20, window_size=50, whitelist_token_ids={128821, 128822})]
sampling_params = SamplingParams(
    temperature=0.0,             # Deterministic output (greedy decoding)
    max_tokens=8192,             # Maximum number of tokens to generate
    logits_processors=logits_processors,  # Apply the processor to avoid repetitive text
    skip_special_tokens=False,   # Include special tokens in the output
    include_stop_str_in_output=True,  # Include stop strings in the output
)

# Initialize FastAPI app
app = FastAPI(title="DeepSeek-OCR API", version="1.0.0")


class InputData(BaseModel):
    """
    Input data model to define what types of documents to process
    images: Optional list of image URLs to process
    pdfs: Optional list of PDF URLs to process
    Note: At least one of these fields must be provided in a request
    """
    images: Optional[List[str]] = None
    pdfs: Optional[List[str]] = None


class RequestData(BaseModel):
    """
    Main request model that defines the input data and optional prompt
    """
    input: InputData
    # Add prompt as an optional field with a default value
    prompt: str = '<image>\nFree OCR.'  # Default prompt


class ResponseData(BaseModel):
    """
    Response model that returns OCR results for each input document
    """
    output: List[str]


def download_file(url: str) -> bytes:
    """Download file from URL"""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to download file from URL: {str(e)}")


def is_pdf_file(content: bytes) -> bool:
    """Check if the content is a PDF file"""
    return content.startswith(b'%PDF')


def load_image_from_bytes(image_bytes: bytes) -> Image.Image:
    """Load image from bytes"""
    try:
        image = Image.open(io.BytesIO(image_bytes))
        return image.convert('RGB')
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to load image: {str(e)}")


def pdf_to_images(pdf_bytes: bytes, dpi: int = 144) -> list:
    """Convert PDF to images"""
    try:
        images = []
        pdf_document = fitz.open(stream=pdf_bytes, filetype="pdf")
        zoom = dpi / 72.0
        matrix = fitz.Matrix(zoom, zoom)
        for page_num in range(pdf_document.page_count):
            page = pdf_document[page_num]
            pixmap = page.get_pixmap(matrix=matrix, alpha=False)
            img_data = pixmap.tobytes("png")
            img = Image.open(io.BytesIO(img_data))
            images.append(img.convert('RGB'))
        pdf_document.close()
        return images
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Failed to convert PDF to images: {str(e)}")


def process_single_image_sync(image: Image.Image, prompt: str) -> Dict:  # Renamed and made sync
    """Process a single image (synchronous function for CPU-bound work)"""
    try:
        cache_item = {
            "prompt": prompt,
            "multi_modal_data": {
                "image": DeepseekOCRProcessor().tokenize_with_images(
                    images=[image],
                    bos=True,
                    eos=True,
                    cropping=CROP_MODE
                )
            },
        }
        return cache_item
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to process image: {str(e)}")


async def process_items_async(items_urls: List[str], is_pdf: bool, prompt: str) -> tuple[List[Dict], List[int]]:
    """
    Process a list of image or PDF URLs asynchronously.
    Downloads files concurrently, then processes images/PDF pages in a thread pool.
    Returns a tuple: (batch_inputs, num_results_per_input)
    """
    loop = asyncio.get_event_loop()

    # 1. Download all files concurrently
    download_tasks = [loop.run_in_executor(None, download_file, url) for url in items_urls]
    contents = await asyncio.gather(*download_tasks)

    # 2. Prepare arguments for processing (determine if PDF/image, count pages)
    processing_args = []
    num_results_per_input = []
    for idx, (url, content) in enumerate(zip(items_urls, contents)):
        if is_pdf:
            if not is_pdf_file(content):
                raise HTTPException(status_code=400, detail=f"Provided file is not a PDF: {url}")
            images = pdf_to_images(content)
            num_pages = len(images)
            num_results_per_input.append(num_pages)
            # Each page will be processed separately
            processing_args.extend([(img, prompt) for img in images])
        else:  # is image
            if is_pdf_file(content):
                # Handle case where an image URL accidentally points to a PDF
                images = pdf_to_images(content)
                num_pages = len(images)
                num_results_per_input.append(num_pages)
                processing_args.extend([(img, prompt) for img in images])
            else:
                image = load_image_from_bytes(content)
                num_results_per_input.append(1)
                processing_args.append((image, prompt))

    # 3. Process images/PDF pages in parallel using ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        # Submit all processing tasks
        process_tasks = [
            loop.run_in_executor(executor, process_single_image_sync, img, prompt)
            for img, prompt in processing_args
        ]
        # Wait for all to complete
        processed_results = await asyncio.gather(*process_tasks)
    return processed_results, num_results_per_input


async def run_inference(batch_inputs: List[Dict]) -> List:
    """Run inference on batch inputs"""
    if not batch_inputs:
        return []
    try:
        # Run inference on the entire batch
        outputs_list = llm.generate(
            batch_inputs,
            sampling_params=sampling_params
        )
        return outputs_list
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to run inference: {str(e)}")


@app.post("/ocr_batch", response_model=ResponseData)
async def ocr_batch_inference(request: RequestData):
    """
    Main OCR batch processing endpoint
    Accepts a list of image URLs and/or PDF URLs for OCR processing
    Returns a list of OCR results corresponding to each input document
    Supports both individual image processing and PDF-to-image conversion
    """
    print(f"Received request data: {request}")
    try:
        input_data = request.input
        prompt = request.prompt  # Get the prompt from the request

        if not input_data.images and not input_data.pdfs:
            raise HTTPException(status_code=400, detail="Either 'images' or 'pdfs' (or both) must be provided as lists.")

        all_batch_inputs = []
        final_output_parts = []

        # Process images if provided
        if input_data.images:
            batch_inputs_images, counts_images = await process_items_async(input_data.images, is_pdf=False, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_images)
            final_output_parts.append(counts_images)

        # Process PDFs if provided
        if input_data.pdfs:
            batch_inputs_pdfs, counts_pdfs = await process_items_async(input_data.pdfs, is_pdf=True, prompt=prompt)
            all_batch_inputs.extend(batch_inputs_pdfs)
            final_output_parts.append(counts_pdfs)

        if not all_batch_inputs:
            raise HTTPException(status_code=400, detail="No valid images or PDF pages were processed from the input URLs.")

        # Run inference on the combined batch
        outputs_list = await run_inference(all_batch_inputs)

        # Reconstruct final output list based on counts
        final_outputs = []
        output_idx = 0
        # Flatten the counts list
        all_counts = [count for sublist in final_output_parts for count in sublist]
        for count in all_counts:
            # Get 'count' number of outputs for this input
            input_outputs = outputs_list[output_idx : output_idx + count]
            output_texts = []
            for output in input_outputs:
                content = output.outputs[0].text
                if '<|end▁of▁sentence|>' in content:
                    content = content.replace('<|end▁of▁sentence|>', '')
                output_texts.append(content)
            # Combine pages if it was a multi-page PDF input (or image treated as PDF)
            if count > 1:
                combined_text = "\n<--- Page Split --->\n".join(output_texts)
                final_outputs.append(combined_text)
            else:
                # Single image or single-page PDF
                final_outputs.append(output_texts[0] if output_texts else "")
            output_idx += count  # Move to the next set of outputs

        return ResponseData(output=final_outputs)
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")


@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {"status": "healthy"}


@app.get("/")
async def root():
    """Root endpoint"""
    return {"message": "DeepSeek-OCR API is running (Batch endpoint available at /ocr_batch)"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)
```