Hi there. I want to write a program that takes an image, reads the text in it, and writes that text to an editable format such as a .txt or .doc file. How should I proceed? Any ideas?
You have posted in the wrong forum. Thread moved.
What you want is far from trivial, and most solutions rely on complex machine learning algorithms. The following is the only OCR library I know of that is written entirely in Java, and I have not tried it, so I can't vouch for how well it works:
Java OCR | Ron Cemer's Blog
Also, search for "optical character recognition".
What you need is an OCR (optical character recognition) SDK. As far as I know, there are no free, open-source OCR engines written in pure Java. There are Java APIs that wrap calls to native libraries; for example, Tesseract, one of the most popular open-source OCR engines, has Java wrappers such as tesjeract and Tess4J.
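To make the Tesseract suggestion a bit more concrete, here is a minimal sketch. It does not use tesjeract or Tess4J; it simply shells out to the tesseract command-line program, and it assumes that program is installed and on the PATH. The class TesseractCli and its methods are hypothetical helpers made up for this post, not part of any library. The command-line tool writes the recognized text to <outputBase>.txt on its own, which already gives the .txt file asked for in the first post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Tesseract or any wrapper.
class TesseractCli {

    // Builds the command line: tesseract <image> <outputBase> -l <language>
    // Tesseract appends ".txt" to outputBase by itself.
    static List<String> buildCommand(Path image, Path outputBase, String language) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");          // assumption: binary is on the PATH
        cmd.add(image.toString());
        cmd.add(outputBase.toString());
        cmd.add("-l");
        cmd.add(language);             // e.g. "eng"
        return cmd;
    }

    // Runs tesseract and returns the text it wrote to <outputBase>.txt.
    static String recognize(Path image, Path outputBase, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, outputBase, language))
                .inheritIO()           // show tesseract's own messages on the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        Path textFile = Paths.get(outputBase.toString() + ".txt");
        return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java TesseractCli <image> <outputBase>");
            return;
        }
        System.out.println(recognize(Paths.get(args[0]), Paths.get(args[1]), "eng"));
    }
}
```

Called as "java TesseractCli scan.png result", this would leave the text in result.txt and also print it. Recognition quality depends heavily on the input image, so treat this as a starting point rather than a finished solution.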
However, in my experience open-source engines are rather hard to set up and their recognition quality is often not good enough, so if you are planning commercial software, have a look at ABBYY FineReader Engine. It has a well-organized developer guide, a broad set of image analysis and preprocessing features, and a Java API. It's not free, but in my opinion ABBYY provides the best OCR quality; see, for example, Linux OCR Software Comparison [splitbrain.org], or test it yourself, since it's free to try.
Another option is a cloud service. It requires the end-user application to have an internet connection, but it is independent of your choice of programming language and of local resource limits. Have a look at ABBYY Cloud OCR SDK, a cloud-based OCR SDK recently launched by ABBYY. It's in beta, so for now it's completely free to use.
Last edited by NikolayKhl; November 30th, 2011 at 06:04 AM.
Here is a tool; I hope it solves your problem.
Aspose.OCR for Java is a Java OCR component that lets developers add OCR functionality to their Java web applications, web services and Windows applications. It provides a simple set of classes for controlling character recognition tasks and helps developers work with image files from within their Java applications. It allows developers to extract text from images and read font and style information quickly, saving the time and effort involved in developing an
Here is a related post:
Extract Text from Specific Part of the Image: http://docs.aspose.com/display/ocrne...t+of+the+Image
Hi vaibhav21,
I am working on the same project you were asking about a year ago.
So far I can load an image and binarize it.
Now I am unsure how to extract the text from the image and save it in a .txt file.
You have probably found a solution by now, so please help me out with this.
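For anyone following along, the binarization step mentioned above could look roughly like the sketch below. It is only an illustration and not the poster's actual code: it assumes a single fixed global threshold, and the class name Binarizer and the threshold value are made up. Only the standard java.awt.image classes are used. The resulting black-and-white image is what you would then hand to an OCR engine such as the ones suggested earlier in the thread.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only.
class Binarizer {

    // Converts an image to pure black and white using one global threshold
    // (0-255). Pixels at least as bright as the threshold become white.
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }
}
```

A call such as Binarizer.binarize(ImageIO.read(new File("scan.png")), 128) would be a typical use, with 128 as an arbitrary starting value. A fixed threshold tends to struggle with uneven lighting, so scans with shadows may need an adaptive method instead.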