Python image recognition OCR

2020-11-07 20:56:40 Coxhuang

Table of contents

  • Python Image Recognition OCR
    • #1 Requirements
    • #2 Environment
    • #3 Installation
      • #3.1 macOS
      • #3.2 Linux (CentOS)
    • #4 Usage
      • #4.1 Install the pytesseract library for Python
      • #4.2 Python code
    • #5 Online demo

#1 Requirements

  • Recognize the information in an image, such as a QR code

#2 Environment

macOS / Linux
Python 3.7.6

#3 Installation

#3.1 macOS

  1. Install tesseract
# Install tesseract only, without the training tools
brew install tesseract

# Install tesseract together with the training tools
brew install --with-training-tools tesseract

# Install tesseract together with all language packs. The packs are large and take a long time to install, so this is not recommended; install languages as needed instead
brew install --all-languages tesseract

# Install tesseract together with the training tools and all language packs
brew install --all-languages --with-training-tools tesseract

# Note: --with-training-tools and --all-languages are options of older Homebrew formulae; if your Homebrew rejects them, the separate tesseract-lang formula provides the extra languages

  2. Download the language pack

Address: https://github.com/tesseract-ocr/tessdata

Here I installed the Simplified Chinese language pack.

Chinese language pack: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

Then copy the downloaded language pack to the following path (the version segment depends on the tesseract version you installed):

/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata
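
The download-and-copy step can also be scripted. One detail to watch: the chi_sim.traineddata link above is a GitHub "blob" page, so fetching it from a script returns HTML rather than the model file. The sketch below rewrites such a link into GitHub's "raw" form and copies a downloaded file into a tessdata directory. The helper names (blob_to_raw_url, download, install_traineddata) are hypothetical, and the default directory is simply the path from this article, so adjust it to your own install.

```python
import shutil
import urllib.request
from pathlib import Path

# Path taken from this article; the version segment differs between installs.
TESSDATA_DIR = Path("/usr/local/Cellar/tesseract/4.0.0_1/share/tessdata")


def blob_to_raw_url(url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching 'raw' file URL."""
    return url.replace("/blob/", "/raw/", 1)


def download(url: str, dest: Path) -> Path:
    """Download the file behind url to dest (needs network access)."""
    with urllib.request.urlopen(blob_to_raw_url(url)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def install_traineddata(src: Path, tessdata_dir: Path = TESSDATA_DIR) -> Path:
    """Copy a downloaded .traineddata file into an existing tessdata directory."""
    if src.suffix != ".traineddata":
        raise ValueError(f"not a traineddata file: {src}")
    if not tessdata_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {tessdata_dir}")
    return Path(shutil.copy(src, tessdata_dir / src.name))
```

Nothing runs on import; call download(...) and then install_traineddata(...) yourself once the target directory is confirmed.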

  3. List the locally installed language packs

tesseract --list-langs
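
The same check can be done from Python, which is handy before calling OCR with lang='chi_sim'. This is a minimal sketch that shells out to the command above and parses its output; the function names are hypothetical, and it assumes the first output line is a header ("List of available languages ...") followed by one language code per line.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The first line is treated as a header; every following
    non-empty line is taken to be one language code.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines[1:] if line]


def installed_languages() -> List[str]:
    """Run `tesseract --list-langs` and return the language codes."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise RuntimeError("tesseract was not found on PATH")
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=True
    )
    # Fall back to stderr in case a release prints the list there.
    return parse_list_langs(result.stdout or result.stderr)
```

For example, "chi_sim" in installed_languages() tells you whether the Chinese pack was picked up.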

#3.2 Linux(CentOS)

  1. Install dependencies
yum install autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel

  2. Install leptonica

Download: wget http://www.leptonica.org/source/leptonica-1.74.4.tar.gz

Unpack and install:

tar -xzvf leptonica-1.74.4.tar.gz
cd leptonica-1.74.4
./configure --prefix=/usr/local/leptonica
make
sudo make install

  3. Install tesseract-ocr

wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
cd tesseract-3.04/
./autogen.sh
./configure
make && sudo make install
sudo ldconfig

Here I installed the Simplified Chinese language pack.

Chinese language pack: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

Then copy the downloaded language pack to the following path:

/usr/local/share/tessdata
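
If copying into /usr/local/share/tessdata is not an option (no root access, for example), tesseract can instead be pointed at another directory with its --tessdata-dir option, which pytesseract forwards through the config argument. A minimal sketch follows; the helper names and the "~/tessdata" directory in the usage comment are assumptions for illustration, not part of the original setup.

```python
from pathlib import Path


def tessdata_config(tessdata_dir: str) -> str:
    """Build a config string pointing tesseract at a custom tessdata directory."""
    path = Path(tessdata_dir).expanduser()
    return f'--tessdata-dir "{path}"'


def has_language(tessdata_dir: str, lang: str) -> bool:
    """Check that <lang>.traineddata exists in the given directory."""
    return (Path(tessdata_dir).expanduser() / f"{lang}.traineddata").is_file()


# Usage (requires tesseract, pytesseract and Pillow to be installed):
#   import pytesseract
#   from PIL import Image
#   text = pytesseract.image_to_string(
#       Image.open("1.png"), lang="chi_sim",
#       config=tessdata_config("~/tessdata"),
#   )
```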

#4 Usage

#4.1 Install the pytesseract library for Python

pip install pytesseract
pip install Pillow
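
pytesseract is only a wrapper: it still needs the tesseract binary from section #3 on the PATH. A quick way to confirm that all three pieces are in place is sketched below; check_environment is a hypothetical helper name, and it only tests for presence, not for versions.

```python
import importlib.util
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the Python packages and the tesseract binary can be found."""
    return {
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Pillow is imported under the package name PIL.
        "Pillow": importlib.util.find_spec("PIL") is not None,
        "tesseract binary": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```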

#4.2 Python code

from PIL import Image
import pytesseract
 
# Specify the image path and the recognition language
data = pytesseract.image_to_string(Image.open('/Users/Documents/1.png'), lang='chi_sim')
print(data)
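
For anything beyond a one-off test, it may help to wrap the call above in a small function. The sketch below is one possible variant, not the only way to do it: it fails early when the image is missing, converts the image to grayscale (a common but optional preprocessing step), and removes stray spaces that OCR output can contain between Chinese characters. The names ocr_image and clean_ocr_text are hypothetical, and the cleanup only covers the basic CJK block U+4E00 to U+9FFF.

```python
import re
from pathlib import Path

# Whitespace sitting between two CJK ideographs (basic block only).
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])[ \t]+(?=[\u4e00-\u9fff])")


def clean_ocr_text(text: str) -> str:
    """Drop spaces between Chinese characters and remove blank lines."""
    lines = (_CJK_GAP.sub("", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(path: str, lang: str = "chi_sim") -> str:
    """Run tesseract on one image file and return the cleaned text."""
    image_path = Path(path)
    if not image_path.is_file():
        raise FileNotFoundError(f"image not found: {image_path}")

    # Imported here so clean_ocr_text stays usable without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img.convert("L"), lang=lang)
    return clean_ocr_text(text)
```

Usage would look like print(ocr_image('/Users/Documents/1.png')), with the same path and language as in the snippet above.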

#5 Online demo

Address:

http://admin.minhung.me:20420/#/

Copyright notice
This article was written by [Coxhuang]. Please include a link to the original when reposting. Thank you.