[Source] .NET/C#: using Tesseract for OCR text recognition on images

Posted on 2025-4-24 09:20:01
Requirements: Use OCR to recognize the text in an image. If the image contains certain text, the backend treats it as having passed preliminary screening and gives it priority. The requirement is relatively simple.
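
The screening step described above could be sketched roughly as follows. This is only an illustration: the class name `OcrScreening`, the method names and the sample keywords are hypothetical, and stripping whitespace before matching is an assumption made because OCR output for Chinese text often has spaces between characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrScreening
{
    // Removes every whitespace character so that "优 惠 券" and "优惠券" compare equal.
    public static string Normalize(string text)
    {
        return new string((text ?? string.Empty).Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    // Returns true when the recognized text contains at least one keyword
    // (case-insensitive, whitespace ignored). Empty keywords are skipped.
    public static bool PassesScreening(string ocrText, IEnumerable<string> keywords)
    {
        string haystack = Normalize(ocrText);
        return keywords
            .Select(k => Normalize(k))
            .Where(k => k.Length > 0)
            .Any(k => haystack.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

The backend would call `OcrScreening.PassesScreening(recognizedText, keywords)` on the OCR result and give priority to images for which it returns true.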

Review:

.NET/C# uses FastDeploy to deploy OCR models to recognize text
https://www.itsvse.com/thread-10911-1-1.html

.NET Core calls Baidu PaddleOCR to recognize images and texts
https://www.itsvse.com/thread-9590-1-1.html

Tesseract OCR

Tesseract was originally developed between 1985 and 1994 at HP Laboratories in Bristol, UK, and at HP in Greeley, Colorado, USA. In 1996 it was modified further so that it could be ported to Windows, and in 1998 part of the code was migrated to C++. In 2005, HP released Tesseract as open source. From 2006 to November 2018 it was developed by Google.

Tesseract 4 adds a neural network (LSTM) based OCR engine that focuses on line recognition, while still supporting the legacy Tesseract 3 OCR engine, which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by selecting the legacy OCR engine mode (--oem 0). The legacy mode also requires traineddata files that support the older engine, such as the files from the tessdata repository.
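
From C#, the engine mode is usually chosen when the engine object is created. The sketch below assumes the `Tesseract` NuGet package, where `EngineMode.TesseractOnly` corresponds to `--oem 0` and `EngineMode.LstmOnly` to `--oem 1`. The `./tessdata` folder and the `eng` language are placeholders, and the folder is assumed to contain a traineddata file that includes the legacy model.

```csharp
using System;
using Tesseract; // NuGet package "Tesseract" (assumed to be installed)

public static class LegacyEngineExample
{
    public static void Main()
    {
        // Assumption: ./tessdata contains eng.traineddata from the tessdata repository,
        // which ships the legacy model next to the LSTM one.
        // EngineMode.TesseractOnly selects the legacy engine (the equivalent of --oem 0).
        using (var engine = new TesseractEngine("./tessdata", "eng", EngineMode.TesseractOnly))
        {
            Console.WriteLine("Legacy engine initialised.");
        }
    }
}
```

If the traineddata file contains only an LSTM model, initialisation in this mode is expected to fail, which is why the choice of data file matters here.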

C# calls Tesseract

Two libraries are commonly used to call Tesseract from C#: Tesseract and TesseractOCR. TesseractOCR is a secondary development based on the Tesseract library, so the code of the two open source libraries is largely similar. The difference is that TesseractOCR calls the latest version (5.5.0) of the native .dll dynamic link library, so TesseractOCR is recommended.

First, download the Simplified Chinese model (chi_sim.traineddata). (omitted)

The code is as follows:
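
As a rough illustration (not the author's original listing), a minimal recognition call might look like the sketch below. It assumes the `Tesseract` NuGet package with its `TesseractEngine`, `Pix` and `Page` types. The recommended TesseractOCR package follows a similar flow but uses different type names. The image name `test.png` and the `./tessdata` folder holding chi_sim.traineddata are placeholders.

```csharp
using System;
using Tesseract; // NuGet package "Tesseract" (assumed to be installed)

public static class OcrDemo
{
    public static void Main(string[] args)
    {
        // Hypothetical input: the first argument, or test.png next to the executable.
        string imagePath = args.Length > 0 ? args[0] : "test.png";

        // Assumption: ./tessdata contains chi_sim.traineddata.
        using (var engine = new TesseractEngine("./tessdata", "chi_sim", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(image))
        {
            string text = page.GetText();                // recognized text
            float confidence = page.GetMeanConfidence(); // value between 0 and 1

            Console.WriteLine("Mean confidence: {0:P0}", confidence);
            Console.WriteLine(text);
        }
    }
}
```

The engine is relatively expensive to create, so a backend that processes many images would typically keep one engine instance and reuse it, disposing each `Page` before processing the next image.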


For testing, a screenshot was taken from the Internet. The original image is as follows:



The OCR recognition results are as follows:



(End)



