2 changes: 1 addition & 1 deletion .NET/REAMDE.md
@@ -1,7 +1,7 @@
##### Example: .NET

# Perform OCR in .NET
-The [Syncfusion® .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) used to extract text from scanned PDFs and images in .NET application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) used to extract text from scanned PDFs and images in .NET application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on an entire PDF document in a .NET application

4 changes: 2 additions & 2 deletions ASP.NET Core/README.md
@@ -2,7 +2,7 @@

# Perform OCR in ASP.NET Core using C#

-The [Syncfusion® .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from the scanned PDFs and images in the ASP.NET Core application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from the scanned PDFs and images in the ASP.NET Core application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in ASP.NET Core application

@@ -79,4 +79,4 @@ using (OCRProcessor processor = new OCRProcessor("Tesseractbinaries/Windows/"))
```

By executing the program, you will get a PDF document as follows.
-<img src="Perform_OCR_NET_Core/ocr_images/OCR-output-image.png" alt="Convert OCR ASP.NET_Core output" width="100%" Height="Auto"/>
+<img src="Perform_OCR_NET_Core/ocr_images/OCR-output-image.png" alt="Convert OCR ASP.NET_Core output" width="100%" Height="Auto"/>
4 changes: 2 additions & 2 deletions ASP.NET MVC/README.md
@@ -2,7 +2,7 @@

# Perform OCR in ASP.NET MVC using C#

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in ASP.NET MVC application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from scanned PDFs and images in ASP.NET MVC application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in ASP.NET MVC

@@ -74,4 +74,4 @@ using (OCRProcessor processor = new OCRProcessor("TesseractBinaries/3.05/x86/"))
```

By executing the program, you will get a PDF document as follows.
-<img src="Perform_OCR_MVC/ocr_images/OCR-output-image.png" alt="Convert OCR ASP.NET_MVC output" width="100%" Height="Auto"/>
+<img src="Perform_OCR_MVC/ocr_images/OCR-output-image.png" alt="Convert OCR ASP.NET_MVC output" width="100%" Height="Auto"/>
5 changes: 2 additions & 3 deletions AWS Textract/README.md
@@ -1,5 +1,5 @@
# Perform OCR with AWS Textract
-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) supports external engines (AWS Textract) to process the OCR on Images and PDF documents.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) supports external engines (AWS Textract) to process the OCR on Images and PDF documents.

## Steps to perform OCR with AWS Textract
Step 1: Create a new .NET Console application project.
@@ -132,5 +132,4 @@ class AWSExternalOcrEngine : IOcrEngine
```

By executing the program, you will get the PDF document as follows.
-<img src="Perform-OCR-AWS-Textract/OCR_Images/Output.png" alt="Output PDF screenshot" width="100%" Height="Auto"/>
-
+<img src="Perform-OCR-AWS-Textract/OCR_Images/Output.png" alt="Output PDF screenshot" width="100%" Height="Auto"/>
5 changes: 2 additions & 3 deletions Azure Vision/README.md
@@ -1,6 +1,6 @@
# Perform OCR with Azure Vision

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) supports external engines (Azure Computer Vision) to process the OCR on images and PDF documents.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) supports external engines (Azure Computer Vision) to process the OCR on images and PDF documents.

## Steps to perform OCR with Azure Computer Vision
Step 1: Create a new .NET Console application project.
@@ -187,5 +187,4 @@ class AzureExternalOcrEngine : IOcrEngine
```

By executing the program, you will get the PDF document as follows.
-<img src="Perform-OCR-Azure-Vision/OCR_Images/Output.png" alt="Output PDF screenshot" width="100%" Height="Auto"/>
-
+<img src="Perform-OCR-Azure-Vision/OCR_Images/Output.png" alt="Output PDF screenshot" width="100%" Height="Auto"/>
2 changes: 1 addition & 1 deletion Azure/Azure App Services/README.md
@@ -2,7 +2,7 @@

# Perform OCR in Azure App Service using C#

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in Azure with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from scanned PDFs and images in Azure with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in Azure App Service on Windows

3 changes: 1 addition & 2 deletions Azure/Azure Function/README.md
@@ -1,6 +1,6 @@
## Azure Functions

-The Syncfusion&reg; PDF is a [.NET Core PDF library](https://www.syncfusion.com/pdf-framework/net-core/pdf-library?_gl=1*jslmpe*_ga*OTcwNzc5NDkuMTY4MTEwMjEwNA..*_ga_WC4JKKPHH0*MTY5MDI5MjcyMi4zNjUuMS4xNjkwMjk0MjA5LjYwLjAuMA..) that supports OCR by using the Tesseract open-source engine. Using this library, perform OCR for a PDF document in Azure Functions using .NET Core.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) that supports OCR by using the Tesseract open-source engine. Using this library, perform OCR for a PDF document in Azure Functions using .NET Core.

### Steps to perform OCR on the entire PDF document in Azure Functions

@@ -24,7 +24,6 @@ Step 5: Then, set Copy to output directory to give copy always the tessdata fold

<img src="Perform_OCR_Azure_Functions/OCR-Images/Set_Copy_Always.png" alt="Convert OCR Azure Functions Tessdata Store" width="100%" Height="Auto"/>

-
Step 6: Include the following namespaces in the **Function1.cs** file to perform OCR for a PDF document using C#.

{% highlight c# tabtitle="C#" %}
2 changes: 1 addition & 1 deletion Blazor/README.md
@@ -2,7 +2,7 @@

# Perform OCR in Blazor using C#

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in the Blazor application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from scanned PDFs and images in the Blazor application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on the entire PDF document in the Blazor application

6 changes: 2 additions & 4 deletions Docker/README.md
@@ -1,6 +1,6 @@
# Perform OCR in Docker

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from the scanned PDFs and images in the Docker application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from the scanned PDFs and images in the Docker application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in Docker
Step 1: Create a new ASP.NET Core application project.
@@ -113,6 +113,4 @@ Step 12: Build and run the sample in Docker. It will pull the Linux Docker image

By executing the program, you will get a PDF document as follows.

-<img src="Perform_OCR_Docker/Images/OCR-output-image.png" alt="OCR Dockeroutput" width="100%" Height="Auto"/>
-
-
+<img src="Perform_OCR_Docker/Images/OCR-output-image.png" alt="OCR Dockeroutput" width="100%" Height="Auto"/>
9 changes: 4 additions & 5 deletions Linux/README.md
@@ -1,6 +1,6 @@
# Perform OCR in Linux

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in the Linux application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) is used to extract text from scanned PDFs and images in the Linux application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
## Pre-requisites

The following Linux dependencies should be installed where the conversion takes place.
@@ -13,7 +13,6 @@ sudo apt-get install y- libopenjp2-7

```

-
## Steps to convert HTML to PDF in .NET Core application on Linux.

Step 1: Execute the following command in the Linux terminal to create a new .NET Core Console application.
@@ -24,7 +23,7 @@ dotnet new console

```

-<img src="Perform_OCR_Linux/Images/LinuxStep1.png" alt="OCR Linux Step1" width="100%" Height="Auto"/>
+<img src="Perform_OCR_Linux/Images/LinuxStep1.png" alt="OCR Linux Step1" width="100%" Height="Auto"/>

Step 2: Install the [Syncfusion.PDF.OCR.Net](https://www.nuget.org/packages/Syncfusion.PDF.OCR.Net/) NuGet package as a reference to your .NET Core application [NuGet.org](https://www.nuget.org/).

@@ -46,7 +45,7 @@ OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/")

Step 4: Place the Tesseract language data {E.g eng.traineddata} in the local system and provide a path to the OCR processor. Please use the OCR language data for other languages using the following link.

-[Tesseract language data](https://github.com/tesseract-ocr/tessdata)
+[Tesseract language data](https://github.com/tesseract-ocr/tessdata)

```csharp

@@ -125,4 +124,4 @@ dotnet run
<img src="Perform_OCR_Linux/Images/LinuxStep5.png" alt="OCR Linux Step5" width="100%" Height="Auto"/>

By executing the program, you will get the PDF document as follows. The output will be saved in parallel to the program.cs file.
-<img src="Perform_OCR_Linux/Images/Linux-output-image.png" alt="OCR Linux Output" width="100%" Height="Auto"/>
+<img src="Perform_OCR_Linux/Images/Linux-output-image.png" alt="OCR Linux Output" width="100%" Height="Auto"/>
34 changes: 17 additions & 17 deletions README.md
@@ -1,20 +1,20 @@
# OCR-csharp-examples
-The [Syncfusion&reg; .NET Optical Character Recognition (OCR) library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. This repository contains examples of OCR using C# in different platforms.
+The [.NET Optical Character Recognition (OCR)](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) library is used to extract text from scanned PDFs and images. With a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. This repository contains examples of OCR using C# in different platforms.

Platform Name | Description
--- | ---
-[ASP.NET Core]() | Perform OCR on entire PDF document in ASP.NET Core application.
-[ASP.NET MVC]() | Perform OCR on entire PDF document in ASP.NET MVC application.
-[Azure App Service]() | Perform OCR on entire PDF document in Azure App Service.
-[Azure Functions]() | Perform OCR on entire PDF documnet in Azure Functions.
-[AWS Textract]() | Performing OCR on entire PDF document with AWS Texteract.
-[Azure Vision]() | Performing OCR on entire PDF document with Azure Vision.
-[Blazor]() | Perform OCR on entire PDF document in Blazor application.
-[Docker]() | Perform OCR on entire PDF document in Dcoker application.
-[Linux]() | Perform OCR on entire PDF document in Linux OS.
-[Windows Forms]() | Perform OCR on entire PDF document in Windows Forms application.
-[WPF]() | Perform OCR on entire PDF document in WPF application.
-[.NET]() | Perform OCR on entire PDF document in .NET Console application.
+[ASP.NET Core](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/ASP.NET%20Core) | Perform OCR on entire PDF document in ASP.NET Core application.
+[ASP.NET MVC](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/ASP.NET%20MVC) | Perform OCR on entire PDF document in ASP.NET MVC application.
+[Azure App Service](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Azure/Azure%20App%20Services) | Perform OCR on entire PDF document in Azure App Service.
+[Azure Functions](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Azure/Azure%20Function) | Perform OCR on entire PDF documnet in Azure Functions.
+[AWS Textract](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/AWS%20Textract) | Performing OCR on entire PDF document with AWS Texteract.
+[Azure Vision](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Azure%20Vision) | Performing OCR on entire PDF document with Azure Vision.
+[Blazor](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Blazor) | Perform OCR on entire PDF document in Blazor application.
+[Docker](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Docker) | Perform OCR on entire PDF document in Dcoker application.
+[Linux](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Linux) | Perform OCR on entire PDF document in Linux OS.
+[Windows Forms](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/Windows%20Forms) | Perform OCR on entire PDF document in Windows Forms application.
+[WPF](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/WPF) | Perform OCR on entire PDF document in WPF application.
+[.NET](https://github.com/SyncfusionExamples/OCR-csharp-examples/tree/master/.NET) | Perform OCR on entire PDF document in .NET Console application.

# How to run the examples
* Download this project to a location in your disk.
@@ -23,11 +23,11 @@ Platform Name | Description
* Run the application.

# Resources
-* **Product page:** [Syncfusion&reg; PDF Framework](https://www.syncfusion.com/document-processing/pdf-framework/net)
-* **Documentation page:** [Syncfusion&reg; .NET PDF library](https://help.syncfusion.com/file-formats/pdf/overview)
-* **Online demo:** [Syncfusion&reg; .NET PDF library - Online demos](https://ej2.syncfusion.com/aspnetcore/PDF/CompressExistingPDF#/bootstrap5)
+* **Product page:** [Syncfusion&reg; PDF Framework](https://www.syncfusion.com/document-sdk/net-pdf-library)
+* **Documentation page:** [Syncfusion&reg; .NET PDF library](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/overview)
+* **Online demo:** [Syncfusion&reg; .NET PDF library - Online demos](https://document.syncfusion.com/demos/pdf/default#/tailwind)
* **Blog:** [Syncfusion&reg; .NET PDF library - Blog](https://www.syncfusion.com/blogs/category/pdf)
-* **Knowledge Base:** [Syncfusion&reg; .NET PDF library - Knowledge Base](https://www.syncfusion.com/kb/windowsforms/pdf)
+* **Knowledge Base:** [Syncfusion&reg; .NET PDF library - Knowledge Base](https://support.syncfusion.com/kb/web/section/866)
* **EBooks:** [Syncfusion&reg; .NET PDF library - EBooks](https://www.syncfusion.com/succinctly-free-ebooks)
* **FAQ:** [Syncfusion&reg; .NET PDF library - FAQ](https://www.syncfusion.com/faq/)

2 changes: 1 addition & 1 deletion WPF/README.md
@@ -2,7 +2,7 @@

# Perform OCR on WPF

-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) used to extract text from scanned PDFs and images in WPF application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) used to extract text from scanned PDFs and images in WPF application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in WPF

2 changes: 1 addition & 1 deletion Windows Forms/README.md
@@ -1,7 +1,7 @@
##### Example: Windows Forms

# Perform OCR in Windows
-The [Syncfusion&reg; .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) used to extract text from scanned PDFs, and images in Windows Forms application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.
+The [.NET OCR library](https://www.syncfusion.com/document-sdk/net-pdf-library/ocr-process) used to extract text from scanned PDFs, and images in Windows Forms application with the help of Google's [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

## Steps to perform OCR on entire PDF document in Windows Forms
