
Why does extracting JPG images from a PDF with Wand give me a black background behind the text?


Python

哔哔one 2022-01-18 16:17:17

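I have a problem with some PDF files. I need to convert them to JPG images so they can be used for OCR, but for some of them Wand produces JPGs with a black background behind the text. From what I have read, this is a common color space problem. It seems to happen when a Word file has been converted to PDF and the color space becomes CMYK. Tesseract OCR only accepts the RGB color space. I have written a Python script that does the conversion, but I would like to fix this problem. Can you help me? Thanks.

[Images: original PDF page; the page after conversion to JPG]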


2 Answers

波斯汪

Contributed 1811 experience points, received over 4 upvotes

The solution is to set these before calling save:

page = wi(image=img)
page.background_color = Color('white')
page.alpha_channel = 'remove'
page.save(...)

2022-01-18

繁星点点滴滴

Contributed 1803 experience points, received over 3 upvotes

Here is my code:

import os

from wand.image import Image as wi

# PATH_IMAGES (the output root directory) is assumed to be defined elsewhere.


def convert_pdf(pdf_file):
    # Get the file name, with and without its extension
    title = os.path.splitext(os.path.basename(pdf_file))[0]
    basename = os.path.basename(pdf_file)

    pdf = wi(filename=pdf_file, resolution=100)
    pdfImage = pdf.convert("jpg")

    # One output directory per PDF
    outputPath = PATH_IMAGES + "/" + basename
    if not os.path.exists(outputPath):
        os.mkdir(outputPath)

    i = 1
    for img in pdfImage.sequence:
        page = wi(image=img)
        imagePathConverted = outputPath + "/" + title + "(*page=" + str(i) + "*)" + ".jpg"
        page.save(filename=imagePathConverted)
        # Earlier attempt with PIL, left disabled:
        # image = Image.open(imagePathConverted)
        # if image.mode != 'RGB':
        #     rgb_image = image.convert('RGB')
        #     rgb_image.save(imagePathConverted)
        i += 1

    return outputPath

2022-01-18
