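Yesterday we implemented text recognition in ten lines of code. How did it feel? Pretty satisfying, right?
Today we continue with pillow and pytesseract, this time to recognize CAPTCHAs.
1. Environment setup
You need the pillow and pytesseract libraries; both can be installed with pip install. The examples below also import cv2, which is provided by the opencv-python package.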
pip install pillow -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
pip install pytesseract -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
Install Tesseract-OCR.exe.
Configure the pytesseract library: search for pytesseract.py, open that .py file, find tesseract_cmd, and change its value to the path of the tesseract.exe you just installed.
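As an alternative to editing the library file, pytesseract also lets you set tesseract_cmd from your own script. The sketch below is only an illustration: the helper names and the candidate install paths are assumptions, so replace them with wherever Tesseract-OCR actually lives on your machine.

```python
import shutil
from pathlib import Path

# Hypothetical install locations (assumptions, not taken from the article).
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return the first candidate that exists, else whatever is on PATH (or None)."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return shutil.which("tesseract")


def configure_pytesseract(path):
    """Point pytesseract at the given executable without touching pytesseract.py."""
    import pytesseract  # imported lazily so find_tesseract works without it

    pytesseract.pytesseract.tesseract_cmd = path


# Typical use at the top of a script:
#     path = find_tesseract()
#     if path is not None:
#         configure_pytesseract(path)
```

Setting the value at runtime means the change is not lost when pytesseract is upgraded or reinstalled.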
2. CAPTCHA recognition
Before recognizing a CAPTCHA, preprocess the image to remove the lines and noise that hurt recognition accuracy.
Example 1
import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving filter to remove noise
    dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
    # Convert to grayscale
    gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
    # Binarize (inverted; Otsu picks the threshold)
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # Morphology: erode, then dilate
    erode = cv.erode(binary, None, iterations=2)
    dilate = cv.dilate(erode, None, iterations=1)
    cv.imshow('dilate', dilate)
    # Invert so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(dilate, dilate)
    cv.imshow('binary-image', dilate)
    # Recognize
    test_message = Image.fromarray(dilate)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/044.png')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()
Running it displays the intermediate images and prints the recognition result.
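Example 1 relies on cv.THRESH_OTSU to choose the binarization threshold automatically. To make that step less of a black box, here is a minimal pure-Python sketch of the idea: try every threshold and keep the one that maximizes the between-class variance of the two pixel groups. The function names are made up for illustration, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Pick a threshold in 0..255 that best separates dark and bright pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # no bright pixels left
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_inv(pixels, t):
    """Inverted binarization: values above t become 0, the rest become 255."""
    return [0 if p > t else 255 for p in pixels]
```

With dark text on a light background, the inverted binarization turns the text white (255), which is why the examples call cv.bitwise_not before handing the image to Tesseract.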
Example 2
import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving filter to remove noise
    blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
    cv.imshow('dst', blur)
    # Convert to grayscale
    gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
    # Binarize (inverted; Otsu picks the threshold)
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    print(f'Threshold chosen by Otsu: {ret}')
    cv.imshow('binary', binary)
    # Morphology: build a structuring element, then apply an opening
    kernel = cv.getStructuringElement(cv.MORPH_RECT, (3, 2))
    bin1 = cv.morphologyEx(binary, cv.MORPH_OPEN, kernel)
    cv.imshow('bin1', bin1)
    # The shape argument must be a kernel shape; cv.MORPH_ELLIPSE has the
    # same numeric value as cv.MORPH_OPEN, which is an operation type
    kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (2, 3))
    bin2 = cv.morphologyEx(bin1, cv.MORPH_OPEN, kernel)
    cv.imshow('bin2', bin2)
    # Invert so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(bin2, bin2)
    cv.imshow('binary-image', bin2)
    # Recognize
    test_message = Image.fromarray(bin2)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/045.png')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()
Running it displays the intermediate images and prints the recognition result.
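Example 2 uses a morphological opening (cv.MORPH_OPEN) to wipe out specks and thin lines smaller than the kernel. The following pure-Python sketch shows what an opening with a rectangular kernel does on a tiny 0/1 image: a foreground pixel survives only if it is covered by at least one kernel-sized window that lies entirely in the foreground. The function name is illustrative, and border handling here is simplified compared with OpenCV.

```python
def opening(img, kh, kw):
    """Morphological opening of a 0/1 image with a kh x kw rectangular kernel.

    Note that cv.getStructuringElement takes ksize as (width, height),
    so ksize=(3, 2) corresponds to kh=2, kw=3 here.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            window = [(y + dy, x + dx) for dy in range(kh) for dx in range(kw)]
            # Keep the window only if the kernel fits entirely inside the foreground
            if all(img[r][c] for r, c in window):
                for r, c in window:
                    out[r][c] = 1
    return out


noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],  # the lone 1 at the right edge is noise
    [0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy, 2, 3)
```

The 2x3 block is large enough to contain the kernel and is kept, while the isolated pixel is removed. Running two openings with differently shaped kernels, as Example 2 does, removes noise that is thin in either direction.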
Example 3
import cv2 as cv
import pytesseract
from PIL import Image


def recognize_text(image):
    # Edge-preserving filter to remove noise
    blur = cv.pyrMeanShiftFiltering(image, sp=8, sr=60)
    cv.imshow('dst', blur)
    # Convert to grayscale
    gray = cv.cvtColor(blur, cv.COLOR_BGR2GRAY)
    # Binarize with a fixed threshold; with an automatic (Otsu) threshold
    # the yellow 4 would not be extracted
    ret, binary = cv.threshold(gray, 185, 255, cv.THRESH_BINARY_INV)
    print(f'Threshold used for binarization: {ret}')
    cv.imshow('binary', binary)
    # Invert so the background is white and the text is black, which is easier to recognize
    cv.bitwise_not(binary, binary)
    cv.imshow('bg_image', binary)
    # Recognize
    test_message = Image.fromarray(binary)
    text = pytesseract.image_to_string(test_message)
    print(f'Recognition result: {text}')


src = cv.imread(r'./test/045.jpg')
cv.imshow('input image', src)
recognize_text(src)
cv.waitKey(0)
cv.destroyAllWindows()
Running it displays the intermediate images and prints the recognition result.
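The string returned by pytesseract.image_to_string often carries stray whitespace or control characters, and Tesseract may guess symbols that can never appear in a CAPTCHA. One possible refinement, not part of the examples above, is to pass Tesseract a character whitelist and then filter the result. The helper names below are hypothetical; --psm 7 (treat the image as a single text line) and tessedit_char_whitelist are standard Tesseract options, though whitelist support depends on the Tesseract version and engine, so treat this as a sketch to experiment with.

```python
import string

# Assumption: the CAPTCHA contains only ASCII letters and digits.
ALLOWED = string.ascii_letters + string.digits


def build_tesseract_config(allowed=ALLOWED, psm=7):
    """Build a config string for pytesseract.image_to_string(..., config=...)."""
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


def clean_captcha_text(raw, allowed=ALLOWED):
    """Drop whitespace, control characters and anything outside the allowed set."""
    return "".join(ch for ch in raw if ch in allowed)


# Possible use inside recognize_text:
#     raw = pytesseract.image_to_string(test_message, config=build_tesseract_config())
#     text = clean_captcha_text(raw)
```

Filtering in Python as well as in the config keeps the result clean even when the installed Tesseract ignores the whitelist.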