Optical Character Recognition

OCR (Optical Character Recognition) using MODI (Microsoft® Office Document Imaging)



Table of Contents:

Introduction to OCR

Introduction to MODI

Requirements

Adding MODI components to Project


Introduction to OCR:

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text.

Introduction to MODI:

The Microsoft® Office Document Imaging 2003 (MODI) object model makes it possible to develop custom applications for managing document images (such as scanned and faxed documents) and the recognizable text that they contain.
Overview of the MODI Object Model
The MODI object model consists of the following objects, their members, and dependent objects:
• The Document object represents an ordered collection of pages (images).
• The Image object represents a single page of a document.
• The Layout object exposes the results of optical character recognition (OCR) on a page.
• The MiDocSearch object exposes document search functionality.
• The viewer control (the MiDocView object) is an ActiveX® control that displays the pages of a document.
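The relationship between the Document, Image, and Layout objects can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the project already references the MODI type library (Interop.MODI.dll), and the class name ModiObjectModelSketch and the method ReadAllPages are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

static class ModiObjectModelSketch
{
    // Hypothetical helper: returns the recognized text of every page
    // in the image file at imagePath (for example a multi-page TIFF).
    // Assumes a COM reference to the MODI type library (Interop.MODI.dll).
    public static string ReadAllPages(string imagePath)
    {
        // Document: an ordered collection of pages (images)
        MODI.Document doc = new MODI.Document();
        doc.Create(imagePath);
        try
        {
            // Run OCR on all pages of the document
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            StringBuilder text = new StringBuilder();
            for (int i = 0; i < doc.Images.Count; i++)
            {
                // Image: a single page of the document
                MODI.Image page = (MODI.Image)doc.Images[i];
                // Layout: the OCR results for that page
                text.AppendLine(page.Layout.Text);
            }
            return text.ToString();
        }
        finally
        {
            // Close without saving changes to the image file
            doc.Close(false);
        }
    }
}
```

Layout.Text returns the whole page as one string; the Words collection used in the sample code further down gives access to the individual recognized words instead.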

For more information, see the following URLs:

http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx

http://www.codeproject.com/KB/office/modi.aspx

http://www.dotnetspider.com/forum/184661-c.aspx


Requirements:
A licensed copy of MS Office.

DLL used: Interop.MODI.dll.

• If the MODI component is not installed with MS Office, add it through Control Panel → Add/Remove Programs → MS Office → Add Features → Office Tools.

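Adding MODI components to Project:

MODI is exposed to .NET as a COM component. Perform the following steps to add it to the project:

• In Solution Explorer, right-click the project and choose Add Reference.
• Select the COM tab.
• Select the Microsoft Office Document Imaging type library (for Office 2003 it is listed as "Microsoft Office Document Imaging 11.0 Type Library") and click OK. Visual Studio then generates Interop.MODI.dll for the project.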


Sample code:


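// Save the upload on the server first. PostedFile.FileName is the client-side
// file name and is usually not a valid path on the server.
// Assumption: the ~/App_Data folder exists and is writable by the application.
string imagePath = Server.MapPath("~/App_Data/" + System.IO.Path.GetFileName(FileUpload1.FileName));
FileUpload1.SaveAs(imagePath);

// Create the MODI document object and load the saved image
MODI.Document doc = new MODI.Document();
doc.Create(imagePath);
// Run OCR on the scanned image in the specified language
// (arguments: language, auto-orient image, straighten image)
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, true);
// Layout holds the OCR results for the first page
MODI.Layout layout = ((MODI.Image)doc.Images[0]).Layout;

// Collect the recognized words, separated by single spaces
System.Text.StringBuilder text = new System.Text.StringBuilder();
for (int i = 0; i < layout.Words.Count; i++)
{
    MODI.Word word = (MODI.Word)layout.Words[i];
    if (text.Length > 0)
    {
        text.Append(" ");
    }
    text.Append(word.Text);
}
string str = text.ToString();
// Close the document without saving changes
doc.Close(false);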
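Because MODI is a COM component, a failed OCR call normally surfaces in .NET as a COMException, and the underlying COM object is not released until the runtime collects it. The sketch below shows one possible way to guard the call and release the object explicitly. It is an illustration under assumptions: SafeOcrSketch and TryRecognize are hypothetical names, and returning null on failure is just one design choice.

```csharp
using System;
using System.Runtime.InteropServices;

static class SafeOcrSketch
{
    // Hypothetical helper: returns the text of the first page,
    // or null if MODI could not load or recognize the image.
    public static string TryRecognize(string imagePath)
    {
        MODI.Document doc = null;
        try
        {
            doc = new MODI.Document();
            doc.Create(imagePath);
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, true);
            string text = ((MODI.Image)doc.Images[0]).Layout.Text;
            // Close without saving changes
            doc.Close(false);
            return text;
        }
        catch (COMException ex)
        {
            // Log the HRESULT so the failure can be diagnosed on the target machine
            Console.Error.WriteLine("OCR failed: 0x{0:X8} {1}", ex.ErrorCode, ex.Message);
            return null;
        }
        finally
        {
            // Release the COM object deterministically instead of waiting for the GC
            if (doc != null)
            {
                Marshal.FinalReleaseComObject(doc);
            }
        }
    }
}
```

Logging the error code is useful when the same code behaves differently on another machine, since the message text alone ("OCR running error") says little about the cause.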


Comments

Author: hitesh panchal   26 Nov 2009   Member Level: Bronze   Points: 2

Hi mahantesh,

I have a serious problem with MODI.

I have created a Windows service that converts images to text, reading them from a source folder and writing the results to a destination folder.

It works properly on my development machine as well as on my office's local machine, but when I deploy it to Windows Server 2003 R2 SP2 it gives me the error below:

"OCR running error at MODI.DocumentClass.OCR(MiLANGUAGES LangId, Boolean OCROrientImage, Boolean OCRStraightenImage)"

Please help me.

Thanks in advance.......

regards,
Hitesh Panchal

Author: Jigar   28 Nov 2009   Member Level: Gold   Points: 1


Hi,

Mahantesh

Gr8 Article.

It's very useful to me. I'll be using this in my project now.

Thanks Again. !!!


