OCR | Optical Character Recognition, Convert Text | Scan2CAD
https://www.scan2cad.com/blog/tag/ocr/
Intelligent Raster to Vector Conversion
Tue, 08 Jul 2025 05:23:11 +0000

OCR Guide: Converting Images to Searchable Documents
https://www.scan2cad.com/blog/tips/ocr-searchable-documents/
Wed, 06 Feb 2019 09:58:35 +0000

If this isn’t your first time to this blog, you’ll know that we’re forever touting the benefits of converting your images from raster to vector. And for good reason: you simply can’t get the full potential from your technical drawings while they’re in a raster format.

But there are other valuable benefits to converting your images than just editing your drawings. What if your goal is to create a searchable database of the data held within your images? This is where technology like OCR can be a real game-changer.

If you convert text within your imagery to text strings, you can begin to catalog your imagery into a searchable database. Once organized into such a system, one would simply have to search for a text string within the imagery and the relevant image would appear. This level of efficiency is possible when you use conversion software that incorporates the power of OCR.

In this article, we’ll explore the process by which you can transform your images into versatile, editable and, most importantly, searchable documents. Let’s get stuck in!




What is OCR and how does it work?

Optical Character Recognition (OCR) in Scan2CAD

OCR stands for Optical Character Recognition. It is the technology that allows computers to detect and highlight text within an image. You can see it in action in various forms across the globe, as it is put to use by industries with a range of OCR needs. For example, the cameras that police use to track number plates rely on OCR, as does the software that enables law clerks to search for particular legal cases within a giant database.

There are a number of different techniques that OCR utilizes, the two most common of which are pattern recognition and feature extraction. The former involves a computer searching an image and comparing the information within to a collection of fonts, numbers and symbols that it already has stored. While somewhat effective, this approach is limited in the sense that the OCR will only be able to detect common fonts like Times New Roman or its very own OCR-A.
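To make the pattern recognition idea concrete, here is a minimal sketch in Python. It assumes tiny 3x3 glyphs drawn with `#` and `.`, and the `TEMPLATES` library is invented purely for illustration; real OCR engines store far larger bitmaps for each supported font.

```python
# Hypothetical library of stored glyph templates (3x3 grids, '#' = ink).
TEMPLATES = {
    "T": ("###", ".#.", ".#."),
    "L": ("#..", "#..", "###"),
    "I": (".#.", ".#.", ".#."),
}


def match_score(glyph, template):
    """Fraction of pixels that agree between a glyph and a stored template."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def recognize(glyph):
    """Return the best-matching character and its score."""
    scored = ((ch, match_score(glyph, tpl)) for ch, tpl in TEMPLATES.items())
    return max(scored, key=lambda pair: pair[1])
```

A glyph that differs from every stored template (a new font, say) only ever gets a partial score, which is the limitation described above.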

Feature extraction, on the other hand, has vastly improved the accuracy of OCR technology. Instead of matching similar letters, the computer is looking for certain features that it has learned, in combination, form a particular letter or number. It should recognize, for example, that a short horizontal line sitting on top of a longer, vertical line makes a ‘T’. Using this technique, a computer system that can retain multiple neural networks (which allow for deep learning) can even be trained to recognize handwritten text!
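As a rough illustration of the ‘T’ example, the sketch below checks for two hand-written features: a stroke across the top row and a single vertical stem in the centre column. The feature names and rules are assumptions made for this example; a trained system learns its features rather than having them hard-coded.

```python
def features(glyph):
    """Extract two simple structural features from a glyph (rows of '#'/'.')."""
    width = len(glyph[0])
    # A full-width stroke along the top row.
    top_bar = all(c == "#" for c in glyph[0])
    # Columns that are inked in every row count as vertical stems.
    stems = [x for x in range(width) if all(row[x] == "#" for row in glyph)]
    return {"top_bar": top_bar, "stems": stems, "width": width}


def looks_like_t(glyph):
    """A 'T' here is a top bar sitting on exactly one stem in the centre column."""
    f = features(glyph)
    return f["top_bar"] and f["stems"] == [f["width"] // 2]
```

Because the rule describes shape rather than exact pixels, the same function accepts a 3x3 ‘T’ and a 5x5 ‘T’, which a fixed template could not.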


Raster text vs vector text

Comparison of poor quality raster text with vector text string

Raster

Raster images are good for certain purposes. If you want to store high quality photographs, for example, TIFF files are handy because they support a large number of colors and boast lossless compression—allowing images to retain their quality even after editing or compression has taken place. 

The issue when it comes to text, however, is that raster images are made up of pixels. And that’s it. Even if a raster image appears to contain text, for all intents and purposes (in other words, from a computer’s perspective) the text is indistinguishable from the imagery because it’s all just pixels. The text isn’t really text and thus it isn’t possible to search for these details within a raster image.

What’s more, data cannot be attached to particular elements of the file, and zooming in or changing scale will result in a reduction in quality of the overall image. All of this is to say that having textual information stored in a raster format is a bad idea.

 

Vector

Vector images are composed of distinct elements, each of which is defined by a mathematical equation. This means that users can edit or attach data to individual components (including text) of a technical drawing.

As vector text is recognized as such (distinct from the surrounding drawings) you can search through it as you would in any other document. There’s also the option of attaching data to the text elements within vector images. You may, for example, add metadata like ‘page title’ or ‘draft number’ to your drawings. 

Before you can make the most of this potential versatility, however, you need to convert the text in your images using OCR. 


Why make searchable databases from your images?

Patent drawing of the Cameron EVO BOP (a drill part).

Making your images searchable can save a huge amount of time and effort. Imagine you have a large volume of patent drawings, for example. In such a case, storing them as raster images isn’t efficient at all. What you have is just a collection of pixels—the images do not hold any useful information about their contents. How will you ever be able to locate the image that refers to, say, ‘fig. 2’ when needed?

Enter OCR. When you use OCR to convert the pixels in your image into vector text, you are creating a database of information related to the image. This information can then be searched for by users who may be faced with tens of thousands of images to scroll through. 
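Here is a minimal sketch of what such a searchable database could look like, assuming OCR has already produced a list of text strings for each image. The file names and strings are made up for illustration, and a real system would likely use a proper search engine or database rather than an in-memory dictionary.

```python
from collections import defaultdict


def build_index(ocr_results):
    """Map each lower-cased word to the set of image names that contain it."""
    index = defaultdict(set)
    for image_name, strings in ocr_results.items():
        for text in strings:
            for token in text.lower().split():
                index[token].add(image_name)
    return index


def search(index, query):
    """Return the images that contain every word of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


# Hypothetical OCR output for two patent drawings.
ocr_results = {
    "patent_001.tif": ["FIG. 2", "BLOWOUT PREVENTER"],
    "patent_002.tif": ["FIG. 3"],
}
index = build_index(ocr_results)
matches = search(index, "fig. 2")
```

With this data, searching for ‘fig. 2’ returns only the first drawing, without anyone having to open the images.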

On a somewhat more serious note, making your images searchable can also provide protection on a legal basis. Take, for example, designs for products. If your work is patented, this needs to be documented and available for others to see, so that they don’t infringe on your designs. Inventors working for large companies like Nike ensure their patented designs are searchable through large online databases. Interested parties can then find the images by performing a simple search on engines like Google Patents.

Aside from benefits to your workflow like increased efficiency and organisation, making your images searchable can also be a savvy business decision. It’s not just easier for you to locate your work—depending on where you store it, it’s also easier for other people to find. This could be great for promoting your services and getting your name or brand out there.


Why you need more than just OCR

There are many simple OCR solutions available which will convert imagery containing only text to fully editable text strings. However, if the imagery you are converting contains elements other than text, you will hit a multitude of problems. OCR software will attempt to convert whatever it is given, so a key part of the solution is identifying what should and shouldn’t be sent to OCR.

Scan2CAD is focused on solutions for converting technical drawings. Scan2CAD’s technology will identify which elements are likely to be text and ‘send’ these elements to the OCR, while other elements are vectorized into their appropriate vector entities. The result is a much higher level of OCR accuracy.
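The idea of deciding what to send to OCR can be sketched with a simple size filter. To be clear, this is not Scan2CAD’s method, which isn’t described here; the `Box` type, the thresholds and the rule itself are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box of one connected shape in the raster image (pixels)."""
    x: int
    y: int
    width: int
    height: int


def is_text_candidate(box, min_h=6, max_h=40, max_aspect=3.0):
    """Guess that character-sized, roughly compact shapes are text."""
    short_side = max(1, min(box.width, box.height))
    aspect = max(box.width, box.height) / short_side
    return min_h <= box.height <= max_h and aspect <= max_aspect


def split_for_ocr(boxes):
    """Separate shapes into (send to OCR, vectorize as geometry)."""
    text = [b for b in boxes if is_text_candidate(b)]
    geometry = [b for b in boxes if not is_text_candidate(b)]
    return text, geometry
```

A long, thin wall line or a large outline fails the size test and goes to the vectorizer, while a letter-sized shape goes to OCR.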

Performing OCR on patent drawings

Scan2CAD identifies the areas of the image that are likely to contain text and converts only those elements using OCR.


Tips to ensure OCR is successful

Optimize for conversion

 

If you want to end up with a high quality image, your original document needs to be optimized for conversion. This means making sure the raster text is as clean and clear as possible. Manually erase any dirt or smudges, to prevent the software assuming that such flaws are part of the actual image. It’s worth running through Scan2CAD’s raster text quality checklist to ensure that the image you want to use is suitable in the first place.
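If you’d rather not erase every speck by hand, small isolated blobs can be removed automatically. The sketch below is one simple way to do it, assuming a black-and-white image stored as rows of 0s and 1s (1 = ink); the `min_pixels` threshold is an arbitrary choice, and set too high it would also delete punctuation such as full stops.

```python
def despeckle(image, min_pixels=3):
    """Return a copy of a binary image with ink blobs smaller than min_pixels removed."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for start_y in range(height):
        for start_x in range(width):
            if image[start_y][start_x] != 1 or seen[start_y][start_x]:
                continue
            # Flood-fill one blob of ink pixels joined edge to edge.
            stack, blob = [(start_y, start_x)], []
            seen[start_y][start_x] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Erase the blob if it is too small to be part of a character.
            if len(blob) < min_pixels:
                for y, x in blob:
                    cleaned[y][x] = 0
    return cleaned
```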

Please note: if your image contains too many flaws like overlapping characters, text positioned at different orientations, or unusual fonts, successful conversion may not be possible.

Select the right conversion software

As with many things in life, the quality of the end product in text conversion largely depends on the quality of the software you use. Cheap (and even free) conversion programs are available on the internet. We urge you to exercise caution when it comes to these enticing options, though.

If you don’t invest in a legitimate brand, the OCR may not be up to scratch. Issues like text orientation and non-standard fonts can easily stump basic conversion software. This may result in the final product containing exploded text, rather than defined text strings. The former is just a collection of vector lines and curves. In other words, the software has assumed that the letters are mere shapes rather than text. Thus, you will not be able to edit them as text—let alone make them searchable!

So, even when the results produced by cheap online converters initially look promising, closer inspection may reveal otherwise…


Convert your images to searchable documents

 

Using Scan2CAD to convert text and other elements in an image

Scan2CAD is the world’s leading solution for converting technical drawings. Scan2CAD’s powerful OCR capabilities are designed for real-world technical drawings, not simplified text-only images.

Want to give it a try yourself? Learn more about Scan2CAD.

 

OCR API: The Best Options Compared
https://www.scan2cad.com/blog/cad/ocr-api-compared/
Tue, 11 Sep 2018 14:23:16 +0000

Getting computers to recognize text within images can be a tricky business. Machines find it very difficult to separate text from other objects because they, of course, do not interpret letters and numbers in the same way as humans: to a machine, all elements are simply a collection of pixels. To get around this problem, CAD programs largely rely on techniques like pattern recognition and feature extraction to detect text within pictures. These processes are made possible by OCR, the technology that allows computers to extract text from images.

An OCR API can be used to add OCR capabilities to your software. OCR APIs are in high demand because converting text is a common requirement for image processing software, yet developing the capabilities yourself would be an extremely time-consuming and complex project.

There are many OCR API options out there to choose from. Big companies like Amazon and Microsoft provide these services, as well as lesser-known companies that offer free versions. With so many factors to weigh up, it can be hard to know which one you should go for. For this reason, we’ve compared a few of the best-looking options available. Let’s see how they measured up…




What is OCR?

Using Scan2CAD to convert text and other elements in an image

OCR, or Optical Character Recognition, is the technology that enables software to recognize text in an image. Within a CAD context, you may know it as the feature that enables you to convert raster text into vector text—thus allowing you to edit said text with CAD software. 

Initially, OCR relied on the process of pattern recognition to distinguish text from other image elements. The technology would compare objects in an image to a library of figures it already had stored. When it found a match, it would know to regard it as text. This technique was fairly limited, as OCR then only stored well-established fonts, like Times New Roman or its very own font, OCR-A.

As the technology behind OCR has improved, it has increasingly relied on the technique known as feature extraction. This involves a computer associating various features presented in a certain combination with particular letters or figures. For example, a vertical line topped with a smaller, horizontal line is understood to represent the letter ‘T’. Once OCR software can perform feature extraction, it can even be trained to recognize certain handwriting.

OCR API

API stands for Application Programming Interface. It’s a fairly general term that can cover a wide range of technologies. You can think of an API as a tool that allows a distinct piece of software to interact with an established application or program, with the purpose of providing certain methods or properties that the main application lacks.

So, in the case of OCR, an API could be used to detect and extract text from an image that you provide. This is really useful for people working with software that doesn’t offer OCR capabilities. The OCR API then returns the recognized text in a form that can be edited or displayed properly.

What to look for in OCR APIs

There are certain qualities that you should always look out for when shopping around for an OCR API. The most important feature is that the technology should be able to extract data (letters and figures) correctly and with precision. This might sound obvious, but you’d be amazed by how many applications fail to cut the mustard. 

Example of vector text strings. This is the desired result of vectorization because they can be edited and displayed correctly.

If you’re already a pro at converting raster files to vector formats, you’ll know all about the pitfalls of exploded text. In short, when using OCR to extract text from an image, the result you’re looking for is text strings. This means the characters are rendered and presented correctly and can be easily edited.

Exploded text

Example of exploded text. The characters are formed of vector shapes rather than actual text.

Software that lacks precision and accuracy may send you back a file containing exploded text instead. This is not really text, but rather a group of vector shapes that will be almost impossible to edit. Selecting the right OCR API is vital if you want to avoid these annoying flaws.
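The difference is easy to see if you picture how the two results are stored. In this sketch, `TextString` and `Polyline` are hypothetical stand-ins for the entities a vector file might contain, not the types of any particular CAD format.

```python
from dataclasses import dataclass


@dataclass
class TextString:
    """A real text entity: the characters are stored as characters."""
    text: str
    x: float
    y: float
    height: float


@dataclass
class Polyline:
    """A plain shape: just a list of (x, y) points, with no notion of letters."""
    points: list


def find_text(entities, query):
    """Search a drawing's entities for text containing the query."""
    return [
        e for e in entities
        if isinstance(e, TextString) and query.lower() in e.text.lower()
    ]


# The same label stored as a text string, and as exploded shapes.
as_text = [TextString("BEDROOM", x=10.0, y=20.0, height=3.5)]
as_exploded = [Polyline([(10, 20), (10, 23.5)]), Polyline([(10, 23.5), (12, 23.5)])]
```

Searching `as_text` for ‘bedroom’ finds the label; searching `as_exploded` finds nothing, because there are no characters left to search.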

Outside of capability and precision, price and ease of use are aspects to consider before making any software selections. Sometimes you can find efficient services for free; other times, it’s worth shelling out for a top quality product. This is why it’s important to make an informed choice before parting with your cash.

As for ease of use, it’s important to make sure you are providing the software with images that are optimized for OCR. 

Issues that can stump OCR technology

Poor image quality for raster to vector conversion

Images with any of these problems are unlikely to convert successfully.

Yes, OCR technology is very sophisticated and its capabilities increasingly impressive, but you do need to meet it halfway. As is the case with converting raster images to vector images, when using OCR you should make sure your original image is of a high quality.

You can’t expect the technology to be able to detect text in an image that is out of focus and blurry. Similarly, OCR may struggle to separate characters that are very similar (like ‘S’ and ‘5’) or presented in a confusing manner. For the best results, make sure you’re providing a strong starting point.

The best options compared

There are plenty of OCR API options to be found on the internet. From the offerings of the major tech giants, to free online converters, we compare 5 of the best below. 

1 Microsoft Computer Vision

Microsoft logo

Image source: Pixabay

Microsoft Computer Vision, part of the Microsoft Azure platform, offers so much more than OCR capabilities. This software has the ability to analyze video, recognize celebrities and read handwritten text within images (though the latter is still in preview stage).

If basic OCR is all you’re looking for, the free tier option is quite generous. You’re allowed up to 5,000 transactions per month. With the power of Microsoft behind it, you can expect accurate results and a wide range of special features. You can submit images to the OCR endpoint in two ways: as an image file or as a URL.

Price: Free → $2.50 per 1,000 transactions
Top features: Analyzes and extracts text from images, recognizes celebrities and landmarks, video analysis

2 Google Cloud Vision

Google Cloud Platform logo

Image source: Medium

Once again, this is a service that offers much more than OCR. Google Cloud Vision can recognize a wide range of text (including handwritten), detect faces and landmarks, and even extract logos. This OCR API benefits from having the power of Google image search behind it, providing a huge library of brand logos from which the software can perform feature recognition.

The free package allows for up to 1,000 free API calls per month. So, not quite as generous as Microsoft, but with this API you are treated to a slightly larger range of extra capabilities.

Price: Free → $1.50 per 1,000 transactions
Top features: Label detection, handwriting recognition, range of languages supported

3 Amazon Rekognition

Amazon logo

Image source: Wikimedia Commons

This platform is divided into two main features: Rekognition video and Rekognition image. Amazon’s OCR is referred to as ‘Text in Image’, part of the Rekognition image suite.

Amazon Rekognition boasts the ability to locate and extract text from both natural and on-screen scenes. Once analyzed, the text will be returned with a detected text label and a confidence score. The free tier lets you analyze up to 5,000 images per month. 
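Once a response like this comes back, the confidence score lets you decide which results to trust. The sketch below assumes the response is a dictionary with a `TextDetections` list whose items carry `DetectedText`, `Type` and `Confidence` fields; check the current Rekognition documentation before relying on those names. The sample response is invented, and no call to the service is made here.

```python
def confident_lines(response, min_confidence=90.0):
    """Keep whole lines of detected text whose confidence meets the threshold."""
    return [
        item["DetectedText"]
        for item in response.get("TextDetections", [])
        if item.get("Type") == "LINE" and item.get("Confidence", 0.0) >= min_confidence
    ]


# Invented sample shaped like the assumed response format.
sample_response = {
    "TextDetections": [
        {"DetectedText": "BEDROOM", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "BEDROOM", "Type": "WORD", "Confidence": 99.1},
        {"DetectedText": "8ATHR00M", "Type": "LINE", "Confidence": 61.4},
    ]
}
trusted = confident_lines(sample_response)
```

Low-confidence lines, like the garbled second label in the sample, can then be flagged for a manual check instead of going straight into your drawing.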

The technology behind this platform is sophisticated and there is a high emphasis on customer service, should you run into any problems. However, this API option comes up short in terms of extra capabilities, unless you’re working with videos. Another downside is that the technology only works with images and videos stored in Amazon S3.

Price: Free → $1 per 1,000 transactions
Top features: Text recognition, real-time analysis, activity detection

4 Cloudmersive

Cloudmersive logo

Image source: Cloudmersive

This API has the capabilities to convert scanned images, photos of documents and receipts into text. Over 90 different languages are supported and it even has the ability to deskew and rotate text that has been captured at an angle (don’t push this last one too far, though—cleaning up your images is still vital).

Though it doesn’t come with the backing of a giant tech name, hold on to your hats, because Cloudmersive provides an incredibly generous free version. Sign up for an account and you’ll be allowed up to 50,000 calls per month. Do bear in mind, though, that this tool is designed for simple text recognition and extraction. Don’t expect many extra features.

Price: Free → $499.99/month (business package)
Top features: OCR, document and data conversion, image recognition and processing

5 Free OCR

Free OCR API logo

Image source: Free OCR

This API sells itself as a simple way to get text extracted from images and PDF documents. Despite its name, not all versions of this software come without a price. Nevertheless, the free tier is fairly generous—allowing for 25,000 requests a month.

Free OCR is probably a good option for people who only have very basic OCR needs. It’s a no-frills affair and basically does what it says on the tin. We’d recommend using the free tier to test out the quality of the results. Be wary of free online services, as they don’t always provide professional results. On the other hand, such sites are certainly worth a look if you’re just playing around with images for fun.

Price: Free → $49.95/month
Top features: Locates and extracts text from images and PDF documents

OCR API—which to go for?

As you can see, there are plenty of OCR APIs available on the web. Which one you go for will largely depend on how many images you need to process and the extent to which extra capabilities (like face recognition) will be useful to you. It’s probably worth trying out a few free versions before you commit to anything.
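To see how volume affects the choice, here is a rough cost comparison for the three pay-per-use options, using the free allowances and per-1,000 prices quoted in this article. It assumes a single flat rate above the free tier, which is a simplification; real pricing is often tiered and changes over time, so treat the figures as illustrative only.

```python
# (free calls per month, price in USD per 1,000 calls), as quoted in this article.
PLANS = {
    "Microsoft Computer Vision": (5000, 2.50),
    "Google Cloud Vision": (1000, 1.50),
    "Amazon Rekognition": (5000, 1.00),
}


def monthly_cost(calls, free_calls, price_per_1000):
    """Cost of one month's calls, charging only for calls beyond the free tier."""
    billable = max(0, calls - free_calls)
    return billable / 1000 * price_per_1000


def compare(calls):
    """Estimated monthly cost for each plan at a given call volume."""
    return {name: monthly_cost(calls, *plan) for name, plan in PLANS.items()}
```

Calling `compare(20000)` gives an estimate for each service at 20,000 images a month, while anything at or below the free allowance comes out at zero.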

Working with CAD and don’t have time to experiment with different applications? Rather than signing up for an OCR API, you can rely on software like Scan2CAD to serve all of your needs in one place. Basically, it provides the whole conversion package, including OCR and a full raster and vector editing suite, saving you from having to fiddle about with different software providers.

No need to take our word for it: sign up for a free trial below and see why Scan2CAD is the ultimate vectorization software!

Download Scan2CAD Free Trial

OCR for PDF Files—How to Convert Text in Your PDF
https://www.scan2cad.com/blog/cad/convert-text-in-pdf/
Tue, 04 Sep 2018 10:05:18 +0000

PDFs are one of the most popular file formats around, no matter what industry you’re working in. They’re the perfect way to share and exchange all types of documents. A key reason for this is that you can open them using any standard web browser. Additionally, they can contain a wide variety of graphical information and text. As versatile as PDFs are, however, they’re not infallible. If you’ve got text in your PDF, you’ll have a hard time trying to edit it. The answer, then, is to convert text in your PDF.

In this article, we’ll go through everything you need to know about converting text in your PDF files, from why it’s necessary to how the process works. We’ll also delve into the importance of OCR technology.




Video: How to convert text in a PDF

View video transcript

Using OCR on PDF files is a common requirement. PDF files contain images just like TIFF files, BMP files, JPEGs, and so on. But PDF files are a little more complex when it comes to using OCR because they can contain raster elements, vector elements, or both. You can see here that I’ve loaded a PDF into Scan2CAD, and when I switch off raster we can see what disappears, and likewise when we switch off vector, so we can see that this PDF contains both raster and vector elements. So, let’s load that into the canvas and view just the raster elements. Let’s zoom in. So, we can see this is a raster image, i.e. it’s made up of pixels, and we want to use OCR on this image, converting the text using optical character recognition to fully editable vector text strings. If we now change the view to the vector image, we can see that we already have part of this floor plan in vector format, held within the same PDF. We can see, for example, that we have some text here and so on.

So, what we need to do is convert the raster parts of this PDF into vector, and then automatically combine that with the existing vector, creating a final vector PDF file with all the elements. To do that, we’ll use Scan2CAD. We’ll go to the “vectorize” button. We need to vectorize the image as well as use OCR, because there are elements that need to be converted to vectors and there are also elements that need to be converted to text using the OCR. So, I’m going to use the default settings here and we’ll just click “run”, and we can see it’s complete already. We’ll click the “vector color” button to see what kind of results we have here, and we can see that we have pink, which represents the vector text, where we have bedroom and bathroom and so on. So, the results look very good. So, I’m happy with just the defaults we’ve used there. I’m going to click “OK” now, to save that to the canvas and combine it with the existing vector image. Let’s turn off the raster image now in the view and we can see that we have one complete vector image in which we can edit the text, which was previously raster, to whatever we need. Click “OK”. And we now have fully editable vector text strings from the original raster image.

What is a PDF?

PDF Logo

Before we start explaining what OCR is (don’t be too intimidated if it’s an unfamiliar term), we’re going to delve into the PDF file format: what makes it so special and why it can be tricky to work with.

Believe it or not, PDFs have been around since 1993! Yes, they’ve had a pretty lengthy run so far. As we’ve said, the PDF file format is quite easily one of the most versatile and ubiquitous around. With most file formats, there are limitations with compatibility. For example, you might find it difficult to open a TIFF file that someone has sent you. PDFs, by comparison, have the edge—they can be opened on virtually any device. Not only that, but they display documents in the same manner, no matter what you’re opening them with—a far cry from formats like Microsoft Word’s .doc.

One of the most intriguing aspects of the PDF format, however, has to be its ability to support both raster and vector elements. But what exactly is the difference between raster and vector? Let’s take a look…

Raster text

Raster text is made up purely of pixels—tiny squares of color that become more apparent as you change the size of the text. The issue here, then, is that raster text has no structure—it’s just pixels. As such, you’ll find it difficult to edit raster text. To do so, you’d have to use a paint brush or erase specific sections. It’s essentially like painting over an entire canvas—you can’t make changes to individual sections. 

If that wasn’t enough to give you trouble with text in your PDF, raster text comes with an even wider range of issues. You’ll find it impossible to combat pixelation when attempting to zoom into or resize your raster text. Additionally, you won’t be able to attach any data to your text—or edit it within CAD software.

A floorplan saved in .TIFF format with the labels "Bedroom" and "Bathroom"

This raster version of a floorplan is neither editable nor scalable.

Vector text

Vector text, on the other hand, is in a world of its own. If you’re looking to edit individual elements of your text, vector is the way to go.

Each element within vector text is mathematically defined. This means that it’s infinitely scalable—you can zoom in or resize as much as you’d like, with absolutely no impact on the text. As a result, you won’t have to worry about any degradation in quality. 

And that’s not all. Vector text is also incredibly easy to edit. Unlike raster text, you can change individual elements within your vector text. So, if you’ve got a typo or you’re looking to add more text, you can do it quickly and efficiently. Even more handily, you can easily take the elements you like and reuse them in other drawings or PDFs. 

Vector floorplan

Meanwhile, you can edit this vector version of a floorplan in CAD software.
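The scalability difference between the two formats can be shown in a few lines. This is a simplified sketch: the raster is a tiny grid of 0s and 1s enlarged by repeating pixels (nearest-neighbour scaling), and the vector shape is just a list of points; real files are more elaborate, but the principle is the same.

```python
def upscale_raster(bitmap, factor):
    """Enlarge a bitmap by turning each pixel into a factor x factor block."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in bitmap
        for _ in range(factor)
    ]


def scale_vector(points, factor):
    """Enlarge a vector shape by multiplying its coordinates."""
    return [(x * factor, y * factor) for x, y in points]


# A 2x2 diagonal in raster form, and a diagonal line in vector form.
big_raster = upscale_raster([[1, 0], [0, 1]], 2)
big_vector = scale_vector([(0, 0), (1, 1)], 2)
```

The enlarged raster is the same two pixels drawn as bigger squares, which is exactly the blockiness you see when zooming in. The enlarged vector is still a perfectly straight line between two points.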


Raster text vs. Vector text

Choosing whether to convert raster text to vector all depends on what type of PDF drawings you’re using. If, for example, you’re sharing technical drawings in the PDF file format, you’ll probably need them to be editable in CAD software. To do so, you’ll need your raster text in a vector format.

Why you should avoid raster text in your PDF files…

  • It can’t be edited—making it a pain if you discover a typo or you realize you need to add more text.
  • It’s resolution dependent—if you decide to change the scale, you’ll have to deal with pixelation.
  • You can’t attach any additional data.

Why you should use vector text in your PDF files instead…

  • It can be edited—whether it’s to shorten your text or elaborate on a point, it’s as easy as pie!
  • It’s infinitely scalable—you don’t have to worry about your text losing quality. 
  • You can attach additional data to your text object.

Needless to say, using vector text in your PDF files will make your life easier—cutting your workload in half, if you end up needing to make alterations to any of your PDF files. So, if you’ve got a PDF file containing raster text, you’ve come to the right place—we’ll show you how you can use Scan2CAD’s OCR capabilities to convert text in your PDF.

Can’t I just use an online converter?

You’re probably wondering if it might not just be simpler to use an online converter—rather than use Scan2CAD and learn what OCR actually is. We’ll make it short and sweet: you’d be better off avoiding online converters altogether. 

Most conversion software you’ll come across will struggle when it comes to raster text. In most cases, they’re not advanced enough to differentiate between text and images. Your text will instead be converted to simple vector shapes, like lines and arcs. This is otherwise known as exploded text:

Exploded text

Example of exploded text. The characters are formed of vector shapes rather than actual text.

Not only is this not actual text, it’s also incredibly difficult to edit. This makes the entire conversion process redundant. Instead, you want a converter that will give you text strings, as Scan2CAD does: 

Example of vector text strings. This is the desired result of vectorization because they can be edited and displayed correctly.

Scan2CAD is market-leading raster-to-vector conversion software, so you can be sure it has what it takes to convert text in your PDF to vector. Using OCR technology, Scan2CAD can produce impeccable results.


What is OCR?

Optical Character Recognition is industry-leading technology that enables Scan2CAD to detect any raster text in your file and subsequently convert it to vector text. Though it might sound pretty simple, it can oftentimes be a bit more difficult depending on the situation. 

For starters, there are hundreds of font styles out there—making it pretty tricky for some computers to recognize letters. Take the example of the letter ‘g’ below…

Lowercase letter g in six fonts

Easy, right? Wrong. Computers find it difficult to figure out what the image represents. You’ve probably come across this problem before if you’ve ever attempted to convert text with an online converter—the converter, more often than not, will give you a garbled output, as a result of it not recognizing your text. 

And that’s where OCR technology saves the day! OCR works by teaching itself to learn and recognize the shape of each letter. Once it does so, it will be able to detect them when they show up in any image. It’s so advanced that it can work on a feature detection basis. 

Of course, as groundbreaking as OCR is, that’s not to say it can work with any raster text, which is why we created our own raster text quality checklist.


Get the best conversion result

Before we get into the thick of Scan2CAD’s conversion process, we first have to look at the ways in which you can increase your chances of a successful conversion. To put it simply, you need to ensure that your drawing is viable. If your text suffers from any of the following issues, you’re probably going to struggle with getting a decent output…

Poor image quality for raster to vector conversion

Raster images with any of these problems are unlikely to convert successfully.

As incredible as Scan2CAD’s OCR technology is, it still has limitations. As such, you should try to ensure you’re working with a high quality PDF—the raster text should be clean and crisp. Additionally, you need to meet a few conditions…

Is your text legible?

If the raster text in your PDF file is of poor quality, you’re probably better off replacing it entirely. If you’re unable to read it, for example, how do you expect the OCR technology to do so? To remedy this, you can retype your text manually in Scan2CAD. 

Are the text characters touching?

OCR generally can’t recognize text characters that are touching. If you don’t want to have to manually retype your text, you can use software like Scan2CAD, which will automatically split touching characters during the OCR process.
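One simple way to split touching characters is to look for the column with the least ink and cut there. The sketch below is only an illustration of that idea, not a description of how Scan2CAD does it. It assumes the glyph is given as equal-length rows of `#` and `.`, is at least three columns wide, and contains exactly two characters.

```python
def column_ink(rows):
    """Count the inked pixels in each column of the glyph."""
    width = len(rows[0])
    return [sum(row[x] == "#" for row in rows) for x in range(width)]


def split_at_thinnest(rows):
    """Cut a pair of touching characters at the column with the least ink."""
    ink = column_ink(rows)
    # Skip the outermost columns so that both halves keep some pixels.
    cut = min(range(1, len(ink) - 1), key=lambda x: ink[x])
    left = [row[:cut] for row in rows]
    right = [row[cut + 1:] for row in rows]
    return left, right
```

Each half can then be passed to the recognizer on its own. The cut column is discarded, so a little of the joining stroke is lost along with it.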

Is the text written over other drawing elements?

It can be difficult for Scan2CAD to recognize raster text in a PDF if it’s written over drawing elements. Similarly, Scan2CAD will struggle if your text is underlined or inside a box. 

Is the text at more than one orientation?

If you’ve got text displayed in your PDF at different orientations, OCR will be more challenging than for a file in which all of the text is horizontal. Fortunately, good OCR software will support text at multiple orientations.

 


Convert text in your PDF with Scan2CAD

Prepare your image

To ensure you get the best possible output, you should clean up your image before you begin the conversion process. Scan2CAD’s got its own suite of ‘Raster Effects’ that can help you remove any image distortion and clean image noise. 
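
To give a feel for what image clean-up does, here is a small Python sketch of two common steps: thresholding a grayscale image to pure black and white, then removing isolated specks. It is a generic illustration, not the implementation behind Scan2CAD's Raster Effects; the function names, the threshold level of 128 and the sample pixel values are all assumptions.

```python
def threshold(gray, level=128):
    """Turn grayscale values (0-255) into ink (1) or background (0)."""
    return [[1 if value < level else 0 for value in row] for row in gray]


def despeckle(binary):
    """Remove ink pixels that have no ink among their eight neighbours."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c]:
                continue
            # Sum the 3x3 window around (r, c), then subtract the pixel itself.
            neighbours = sum(
                binary[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
            ) - 1
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned


# A dark vertical stroke in column 1, plus one stray dark pixel top right.
scan = [
    [250, 20, 240, 245, 90],
    [245, 30, 235, 250, 240],
    [240, 25, 250, 245, 250],
    [250, 40, 245, 240, 245],
]

clean = despeckle(threshold(scan))
for row in clean:
    print(row)
```

After both steps only the stroke remains, which gives the conversion step far less noise to misread as text or lines.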

Choose conversion settings

To start the process, you’ll need to click the convert (Vectorize) icon.

Once you’ve done so, the Vectorization Settings dialog box will launch. You can choose to select ‘Vectorize‘, ‘OCR‘ or ‘Vectorize and OCR‘, depending on what type of drawing you’re working with. 

If you’re converting text only, then you should select ‘OCR’. If, on the other hand, you’re converting a raster image along with text, then ‘Vectorize and OCR’ is your go-to option. Got a PDF containing vector text and raster text? Not to worry: Scan2CAD can convert the raster text and combine it with existing vector elements. 

No matter which conversion option you pick, you’ll need to look at OCR settings under the ‘OCR’ tab. Here, you need to specify the size and rotation of the text in your image. 

Convert

Once you’re happy with the settings, it’s time to click ‘Run’. Wait for the conversion process to finish and look at the resulting output in the preview window. Not happy? Alter the settings and run the conversion again until you’ve got the output you require, then click ‘OK’ to save your vector text.

And that’s all there is to it!

]]>
OCR Guide: Converting Handwritten Text https://www.scan2cad.com/blog/tips/converting-handwritten-text/ https://www.scan2cad.com/blog/tips/converting-handwritten-text/#comments Thu, 02 Aug 2018 10:06:03 +0000 https://www.scan2cad.com/?p=29041 The technology that enables computers to recognize text–Optical Character Recognition—is constantly evolving, expanding the parameters of what we can convert. It now boasts the ability to convert even handwritten text. This is an impressive feat—human handwriting is, of course, the most random and changeable of fonts. Not only does it differ from person to person, but the handwriting of one individual will not be identical each time they write. That’s a lot of variations for a computer to attempt to detect!

Any kind of raster text is tricky to convert, but handwritten characters take things to a whole new level of complexity. In contrast to established fonts, the latter rarely contain regular or predictable patterns—which is basically what computers are searching for when you instruct them to find text within an image. This means that, if you’re looking to convert handwritten text, you need to use very sophisticated technology. Achieving the desired results depends both on selecting the right software and ensuring your original image is optimized for conversion.

This article lays out the extent to which it is realistically possible to convert handwritten text using OCR. We explore the potential and limits of current technology, and provide advice on how to get the most out of your handwritten work in a CAD context. 

Comparing handwritten text vs OCR vector text

The results of using OCR on handwritten text in Scan2CAD


Table of Contents


What is OCR?

Optical Character Recognition (OCR) in Scan2CAD

Optical Character Recognition, or OCR, is the technology that allows software to recognize text within an image. It thus performs a vital stage in the process of converting raster text to vector text. In fact, OCR’s ability to extract text from graphics or documents makes it an incredibly useful tool across a wide range of industries. Consider security cameras that can pick up car number plates, or digital architectural blueprints containing editable annotations—neither would be possible without OCR.

It comes in particularly handy in the world of CAD. Anyone who’s attempted to manually trace an image with text in order to convert it to a vector format knows that getting a computer to do the job is much easier! Until fairly recently, though, automatic tracing was not recommended if the image to be converted included handwritten text. A computer simply cannot compete with the human eye’s ability to recognize letters and numbers.

With OCR technology, however, certain software can now be trained to recognize a wide range of fonts and convert them accordingly.


How does OCR work?

OCR uses more than one approach when it comes to recognising text. The most basic way the technology distinguishes characters from pictures is through a technique known as pattern recognition. This involves a computer comparing objects within an image to letters already stored within its software. In other words, the software is equipped with a library of characters and the computer will search for the same patterns within your work and recognize when it finds a match.

OCR-A Font Preview

The computer refers to its own catalog of characters to carry out pattern recognition
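
Pattern recognition of this kind can be sketched in a few lines of Python. The example below compares a character bitmap against a small library of stored templates and picks the closest one. The 3x5 templates and function names are invented for illustration; real OCR libraries are far larger and more refined.

```python
# Hypothetical 3x5 templates: '#' is ink, '.' is background.
TEMPLATES = {
    "H": ["#.#",
          "#.#",
          "###",
          "#.#",
          "#.#"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            if g == t:
                agree += 1
    return agree / total


def recognize(glyph, templates=TEMPLATES):
    """Return the best-matching character and its score."""
    best = max(templates, key=lambda ch: match_score(glyph, templates[ch]))
    return best, match_score(glyph, templates[best])


# An 'H' with its top-right pixel missing.
noisy_h = ["#..",
           "#.#",
           "###",
           "#.#",
           "#.#"]

print(recognize(noisy_h))
```

The approach copes with a missing pixel or two, but it only works when the text closely resembles a shape that is already in the library.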

The problem with pattern recognition, at least for our purposes, is that it cannot detect handwritten text. No one writes in Times New Roman, after all. Thankfully, as the technology has become more sophisticated, it increasingly relies on a different tactic known as feature extraction.

Rather than trying to recognize full letters, feature extraction occurs when a computer detects certain features (lines and loops, for example) and understands that they signify a character. The letter ‘H’, for instance, will be picked up by the software whenever it detects two vertical lines joined in the middle by a smaller, horizontal line.
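
As a simplified illustration of feature extraction, the Python sketch below measures a single feature: the number of enclosed loops in a character. It does this by flood-filling the background from outside the character and counting any background regions left over. The function name and bitmaps are assumptions made for the example, and real OCR engines combine many more features than this one.

```python
def count_loops(glyph):
    """Count enclosed background regions (loops) in a character bitmap.

    glyph is a list of equal-length strings; '#' is ink, '.' is background.
    """
    rows, cols = len(glyph), len(glyph[0])
    # Pad with a background border so all outside background is connected.
    grid = [["."] * (cols + 2)]
    grid += [["."] + list(row) + ["."] for row in glyph]
    grid += [["."] * (cols + 2)]
    seen = set()

    def flood(start):
        """Mark every background cell reachable from start (4-connected)."""
        stack = [start]
        seen.add(start)
        while stack:
            r, c = stack.pop()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows + 2 and 0 <= nc < cols + 2
                        and (nr, nc) not in seen and grid[nr][nc] == "."):
                    seen.add((nr, nc))
                    stack.append((nr, nc))

    flood((0, 0))  # everything reached from the corner is outside background
    loops = 0
    for r in range(rows + 2):
        for c in range(cols + 2):
            if grid[r][c] == "." and (r, c) not in seen:
                loops += 1       # unreached background must be enclosed by ink
                flood((r, c))
    return loops


letter_o = ["###", "#.#", "#.#", "#.#", "###"]
digit_8 = ["###", "#.#", "###", "#.#", "###"]
letter_l = ["#..", "#..", "#..", "#..", "###"]
wide_o = [".###.", "#...#", "#...#", "#...#", ".###."]  # a different "font"

for name, glyph in [("O", letter_o), ("8", digit_8), ("L", letter_l), ("wide O", wide_o)]:
    print(name, count_loops(glyph))
```

Both versions of the letter O produce the same loop count even though their pixels differ, which is the property that lets feature-based recognition work across fonts.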

This technique means that a computer’s ability to recognize characters is not constrained to a limited number of fonts. From here, it can be trained to detect even handwritten text. 

Neural networks

Once software is able to perform feature extraction, it may be trained to detect features in handwritten text. Using neural networks, conversion programs like Scan2CAD can train OCR to recognize features from text that the user provides. Once it has learned to recognize a certain style of text from the examples you have supplied, you can use the software to detect the same writing in different pieces of work.

If OCR is trained to recognize a particular individual’s handwriting (perhaps someone who creates technical drawings), it opens up a whole world of possibilities in terms of what they can do with their work.
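
The idea of training on examples can be shown with the simplest possible neural network: a single artificial neuron (a perceptron). The Python sketch below learns to tell a hand-drawn 'I' from an 'L' using four tiny 3x3 samples. It is a teaching example only and makes no claim about the network architecture Scan2CAD uses; the sample bitmaps and function names are invented.

```python
def flatten(glyph):
    """Turn a bitmap of '#' and '.' into a flat list of 1s and 0s."""
    return [1 if ch == "#" else 0 for row in glyph for ch in row]


# Hypothetical training set: label -1 means 'I', label +1 means 'L'.
TRAINING_SET = [
    (flatten([".#.", ".#.", ".#."]), -1),  # 'I'
    (flatten(["#..", "#..", "#.."]), -1),  # 'I' written further left
    (flatten(["#..", "#..", "###"]), +1),  # 'L'
    (flatten([".#.", ".#.", ".##"]), +1),  # a narrower 'L'
]


def score(pixels, weights, bias):
    """Weighted sum of the pixels plus the bias."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias


def train(samples, max_epochs=100):
    """Perceptron learning: nudge the weights after every mistake."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for pixels, label in samples:
            if label * score(pixels, weights, bias) <= 0:
                weights = [w + label * x for w, x in zip(weights, pixels)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break  # every training sample is now classified correctly
    return weights, bias


def classify(glyph, weights, bias):
    """Return 'L' for a positive score, otherwise 'I'."""
    return "L" if score(flatten(glyph), weights, bias) > 0 else "I"


weights, bias = train(TRAINING_SET)

# Two samples the neuron has not seen during training.
print(classify(["#..", "#..", ".##"], weights, bias))
print(classify([".#.", ".#.", "##."], weights, bias))
```

With so few examples the neuron latches onto a single telltale pixel, so it is easy to fool. More varied examples of your handwriting give a trained network a better chance of generalizing.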


Why convert handwritten text?

Fountain pen writing on lined paper

If you’re starting out with handwritten text (either scanned into your computer or written on a tablet), it will be in a raster format. Converting the image to a vector format will make your work more versatile and allow edits to be made by yourself and others.

Problems with raster images

Quality issues

Raster images are made up of pixels. This means that if you attempt to zoom into or rescale the image you’re working on, the overall quality will suffer. In a professional context this is not exactly ideal. Take technical drawings, for example: your work may appear blurry when people zoom in to inspect certain details. Plus, it’s useful to be able to resize an image for different purposes. This is not possible with a raster file without compromising its overall quality.

Vector images, on the other hand, are made up of objects. Each object (be it an arc, path, line,  etc.) is defined by a mathematical equation. As every individual element has its own fixed relative position, re-scaling or zooming will not affect the overall quality of the image. 
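
The difference is easy to demonstrate with a short Python sketch. Below, a diagonal stroke is enlarged first as a raster (each pixel simply becomes a bigger block) and then as a vector (the end points of the line are multiplied by the scale factor). Both functions are simplified illustrations with made-up names.

```python
def upscale_raster(pixels, factor):
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    enlarged = []
    for row in pixels:
        wide_row = "".join(ch * factor for ch in row)
        enlarged.extend([wide_row] * factor)
    return enlarged


def scale_line(line, factor):
    """Scale a vector line, given as ((x1, y1), (x2, y2)), about the origin."""
    (x1, y1), (x2, y2) = line
    return ((x1 * factor, y1 * factor), (x2 * factor, y2 * factor))


# A 3x3 raster of a diagonal stroke, and the same stroke as a vector line.
raster_diagonal = ["#..",
                   ".#.",
                   "..#"]
vector_diagonal = ((0, 0), (3, 3))

for row in upscale_raster(raster_diagonal, 3):
    print(row)

print(scale_line(vector_diagonal, 3))
```

The enlarged raster shows the diagonal as a staircase of 3x3 blocks, while the vector version is still a single line described by two points and can be redrawn cleanly at any size.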

Editing your images with CAD software

Vector files are the ultimate choice if you are looking to edit your work with CAD or CNC software. The objects that comprise a vector image can be edited individually, allowing for a high level of accuracy in the process. Raster files are not compatible with CAD software and even the most basic adjustments will have an impact on the entire image. 

Anyone working in an industry that uses CAD requires vector images to get the most out of their projects. If you are working on an architectural design that includes useful handwritten annotations, for example, you want your collaborators to be able to both read and amend the text where necessary. This level of precision and control is not possible with a raster image.


How to ensure successful conversion

Converting handwritten text, though possible, is by no means a simple task. You need to be realistic about the kind of characters a computer is going to be able to detect. To optimize your chances of success, you need to make sure your original image is viable. If you’re looking for professional results, the image needs to be cleaned up as much as possible. Consult our raster text quality checklist to ensure you have completed this stage.

Poor image quality for raster to vector conversion

Raster images with any of these problems are unlikely to convert successfully.

OCR software still has its limitations. If you find that your handwritten text cannot be converted automatically, it may be best to simply type over it with vector text. 

Image quality

The biggest issue that is flagged up by conversion software is image quality. If you want good quality results, you need to start with a good quality image. Computers are incredibly powerful, but they’re not miracle workers.

If the original file is of a low resolution, for instance, the software will have a hard enough time picking up any details—let alone the handwritten text! Your image should be clean, crisp and contain no overlapping text. It should go without saying, therefore, that joined up handwriting will be impossible for a computer to detect.

Font

There is actually a font specifically designed to be read by OCR technology, handily named OCR-A. It’s commonly used for banking purposes—you’ll recognize it as the font on credit cards and cheques.

Digits in OCR-A font


Generally speaking, for OCR purposes, established fonts like Arial are a suitable choice. This obviously isn’t realistic for what we’re covering here, but it’s a good rule of thumb to remember for general OCR practices. At least try to ensure your handwriting is as neat, consistent and clear as it can be.

As you’ll be using a non-standard font (handwriting), make use of technology like the aforementioned neural networks. If the relevant software is already trained to recognize your writing, you stand a higher chance of success when it comes to conversion. 

The right software

Example of vector text strings. This is the desired result of vectorization because they can be edited and displayed correctly.

Repeat after me: not all conversion software is created equal! This is especially apparent when it comes to converting text, be it handwritten or typed. The result you’re looking for is a text string. If you use a cheap online converter, you may end up with what is known as exploded text.

The latter is not in fact text, but a collection of vector shapes that are basically impossible to edit. Scan2CAD, meanwhile, will ensure that conversion produces text strings—text that is rendered correctly, presented logically and can be edited easily. 


How to convert handwritten text

Once your raster image has been cleaned up and you’ve run through the checklist, it’s time to convert. Scan2CAD allows you to do this with handwritten text, and it works in two stages.

The first stage is font training which, as we’ve previously mentioned, involves using neural networks to train the software to recognize your writing. This is a fairly complex process, but don’t worry—the computer is doing most of the work!

In short, you’ll need to create a new training set, add your text examples, train the neural network to recognize them and then test that it has learned the new training set. For detailed instructions head over to the ‘How to train Scan2CAD to recognize a font‘ section of the user manual. 

Now that your handwriting is detectable by the software, you can carry out the conversion following the instructions under the convert a raster image with text section. 

Once your image is saved in a vector format, you can start really making the most of your work! 

Video: Converting handwritten text with Scan2CAD

View video transcript

In this video, we will be converting handwritten text in this image using OCR to editable vector text strings. So we’re going to be doing this with Scan2CAD. If you don’t know what OCR is, it’s Optical Character Recognition, and that’s the process of converting text in an image, such as this image – we can see the pixels here – to editable vector text strings.

This is different to vectorizing the image. I’ll show you, first, an example of vectorizing if we just rush through the settings here in Scan2CAD to show you a simple conversion. We’ve vectorized the image now, and we’re viewing a vector design that can be edited like so. But these are not editable vector text strings, they’re just polygons in the shape of the original text. So let’s close the vector file we created, and instead, we are going to use OCR.

So we’ll go to the OCR option, go to OCR settings. I’ll turn on the image in the preview to see the image here. I’ll select the character size from the image just by drawing like so. Let’s increase that a little bit to 60. We don’t need to enable vertical text since all the text in this image is horizontal. We have English as the language. And let’s change the drawing type or document type to text. If we had other elements in this image that we wanted to convert and not use OCR on them, for example, if this were a technical drawing that contained lines, arcs and circles as well as text, then we’d use technical. But since this is 100% text, we can use the text option. Click Run to run the process, and it’s complete. And let’s save the results to the canvas by clicking OK. And we’re viewing the vector and raster image here. Let’s go to View Vector Colors so we can view the text in pink. Let’s turn off the raster image for a second underneath.

So we’ve got hand-printed script all correctly recognized. We have hand-printed script down the bottom here, but there’s a couple of little elements we’d want to change. We can do so manually like so. Seems to be an extra space character because in the handwriting, there’s more space than expected. Click OK to save the results. So we can see here, as we compare the raster and the vector, how it’s done pretty well at recognizing this text. This is because the handwriting is isolated, so we have characters that aren’t touching. It’s quite legible. And generally, it’s quite a suitable type of handwriting for OCR.

]]>
https://www.scan2cad.com/blog/tips/converting-handwritten-text/feed/ 4
Choosing OCR Software: Converting Text in Technical Drawings https://www.scan2cad.com/blog/cad/ocr-text-technical-drawings/ https://www.scan2cad.com/blog/cad/ocr-text-technical-drawings/#comments Wed, 20 Jun 2018 11:33:30 +0000 https://www.scan2cad.com/?p=27661 A key benefit of vector images over their raster counterparts is their ability to include editable text. The text in a raster image is nothing more than a collection of pixels. As such, it’s indistinguishable on a technical level from the remainder of the image. A vector image, however, is capable of storing text as a separate, editable entity. This means that, if you have a technical drawing containing text, then converting it from raster to vector is the logical choice.

In this article, we’ll run through everything you need to know about converting text in technical drawings. We’ll start with the reasons why it’s so important to convert your text. We’ll also go into detail about why this process can be quite tricky, and alert you to some of the problems you may encounter. After that, we’ll move onto how to perform the conversion, and provide some pro tips to help you ensure that everything passes off without a hitch. We’ll even show you the best OCR software to use to convert your text. Let’s get started!


Composition of raster text

A floorplan saved in .TIFF format with the labels "Bedroom" and "Bathroom"

This raster version of a floorplan is neither editable nor scalable.

Pixels are the building blocks that make up the entirety of a raster image. Each pixel is no more and no less than a square of color. As such, there is no particular structure to a raster image, and nothing to distinguish one part of an image from another.

With this in mind, any text that features in a raster image is, in a technical sense, nothing more than pixels. Anyone who’s ever had to edit a raster image will be well aware of the problems this causes.

It is impossible, for example, to go back and edit the text in your raster image. If you’re lucky, you might be able to use a paint brush or eraser tool to white out the text in your image—already a cumbersome process. In some cases, however, even this might not be possible.

To put a long story short, raster text is simply unsuitable for editing. Worse still, it comes with a whole host of other common raster image issues. These include pixelation when zooming or scaling an image, and the inability to attach data to your text.


Composition of vector text

Vector floorplan

Meanwhile, you can edit this vector version of a floorplan in CAD software.

Vector text, on the other hand, is an entirely different beast. Unlike the pixels which make up a raster image, all elements within a vector image are distinct.

Each element is mathematically defined, with a fixed relative position within the image. As a result, each element appears the same at any scale, making it possible to zoom into an image without losing quality.

Another upshot of this is that it is possible to easily edit each element within an image. This includes vector text. Say, for example, you noticed a typo, or simply wished to add more information to the text within your image. As long as you’re using vector text, this is a cinch.

It’s also possible to attach information to each element within a vector image. This means that you can add additional specifications to objects and text.


Raster vs. vector text in technical drawings

For many purposes, raster text works perfectly well. Problems arise, however, when you have a technical drawing. To use these drawings to their full potential, they must be fully editable in CAD software. As such, if you’ve got a technical drawing, you need it to be in a vector format.

Why you shouldn’t use raster text…

  • You can’t edit raster text
  • There is no structure to raster text
  • It easily becomes pixelated upon zooming or scaling
  • You can’t attach any additional data to it

…and why you should use vector text

  • It’s easy to edit vector text in CAD software
  • Vector text is a mathematically defined object
  • It retains its quality at any scale
  • You can attach specifications to the text object

The list of pros and cons lays the choice bare: vector text is simply better for technical drawings. However, while this advice is all well and good when creating a new technical drawing, it doesn’t solve the issue of what to do when you already have a technical drawing saved as a raster image.

If this is the case, then you’ll need to move onto converting your raster text to vector text.


How to convert text in technical drawings

Converting an image of an electrical schematic containing text to vector text strings

Editing technical drawings requires CAD software—and to use CAD software, you’ll need to be using a vector image. The key reasons for this are that vector images are editable and versatile.

If you were creating an electrical schematic, for example, then CAD software would enable you to define its components and materials. This, in turn, would make it much easier to produce the design when it comes to the manufacturing stage.

If you’re familiar with converting raster images to vector, you’ll know that the process, which is known as vectorization, is technically highly complex. The reason why vectorization is so tricky comes down to the fact that a raster image has no structure.

This lack of structure means that it’s difficult for software to detect what exactly the image contains. The human eye may be able to tell, for example, that an image contains the word “CAD” in black on a white background. To a machine, however, it’s just a collection of black, white and gray pixels.

In the past, the only way around this was to overlay a vector layer on top of your raster image, and manually type new vector text in place of the old raster text. This isn’t exactly the most elegant solution. Luckily, OCR software now makes it possible to automatically convert text in technical drawings.


What is OCR?

Optical Character Recognition, or OCR, is the technology which lets software detect raster text and convert it to vector text. As such, it’s OCR that enables a computer to convert text in technical drawings.

For OCR to work, it needs to be able to recognize certain letterforms. This is a fairly easy task for the human eye, but a very tricky one for a computer. Part of the reason for this is the sheer difference between how text appears in different fonts. Take, for example, the six letter ‘g’s in the image below.

Lowercase letter g in six fonts

As a human, it’s easy to tell all six of these forms represent the same letter. A computer, however, has no additional information to work off, making it hard to discern what exactly the image represents.

This is where OCR software comes in. OCR software ‘learns’ the shapes of each letter, enabling it to recognize them when they appear in an image. Initially, OCR could only recognize the forms of a single font: OCR-A, which you may recognize on your checkbook. Over time, the technology learned to recognize other common fonts, such as Times New Roman and Helvetica.

Today’s OCR software, however, has much more advanced capabilities. That’s because, instead of trying to spot a letterform in its entirety, it works on the basis of feature detection. For example, OCR may recognize that one straight horizontal line lying perpendicular atop a straight vertical line forms a capital T. This enables OCR to go beyond recognizing specific fonts and allows it to take an ‘omnifont’ approach.


Exploded text vs. vector text

Some raster-to-vector conversion software can’t actually recognize text within a raster image. Instead, they’ll convert the text into vector lines and curves. This is called ‘exploded text’, and, unfortunately, it is useless for practical purposes. You can see an example of exploded text below:

Exploded text

In this image, each letter actually comprises multiple vector lines. Put simply, it isn’t really text. When you convert your image with Scan2CAD, however, you’ll get something that looks more like this:

What you can see above is a text string. It’s actual vector text which you can edit by typing—just as you would edit text in a word processor.
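
To make the distinction concrete, here is a rough Python sketch of how the two results might be represented as data. The class names and fields are hypothetical and do not correspond to any particular CAD file format; the point is only that a text string stores the characters themselves, while exploded text stores nothing but geometry.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextString:
    """A vector text entity: the characters plus where and how big to draw them."""
    text: str
    x: float
    y: float
    height: float


@dataclass(frozen=True)
class Polyline:
    """A plain vector shape: just a sequence of (x, y) points."""
    points: tuple


def contains_word(entities, word):
    """Search a drawing for a word. Only real text entities can match."""
    return any(isinstance(e, TextString) and word in e.text for e in entities)


# A label with a typo, stored as a text string: fixing it is a one-line edit.
label = TextString(text="BEDROM", x=120.0, y=80.0, height=12.0)
fixed = replace(label, text="BEDROOM")

# The same label as exploded text would be a pile of strokes like these
# (only the first letter is sketched here).
exploded = [
    Polyline(((0, 0), (0, 10))),
    Polyline(((0, 10), (5, 8), (0, 5))),
    Polyline(((0, 5), (5, 2), (0, 0))),
]

print(fixed.text)
print(contains_word([fixed], "BEDROOM"))
print(contains_word(exploded, "BEDROOM"))
```

Because exploded text has no record of which letters the strokes represent, it can be neither searched nor corrected by typing.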

In certain circumstances, you might find yourself with exploded text, either due to using sub-par conversion software or having received a file from a colleague containing it. Meanwhile, standard OCR software doesn’t concern itself with exploded text, focusing only on converting from raster to vector.

Fortunately, Scan2CAD is no standard OCR software. In fact, it has the capability to convert exploded text into text strings. This means you can undo any inadvertent errors and truly get the most out of your technical drawing.


Getting the best conversion results

Great OCR software is an essential tool to have in your arsenal when you convert text in technical drawings. However, there are still extra steps you should take to improve the quality of your conversion.

Choose the right raster file type

The first is to choose the right raster file type to begin with. This is of particular importance if you’re scanning in a technical drawing from paper, such as a floor plan. When scanning, Scan2CAD recommends choosing the TIFF file format. TIFF stands ahead of its rival raster formats due to its use of lossless compression. This means that the image won’t lose quality and will keep vital detail. This file format also allows you to include tags. You therefore have the option to save the file as a GeoTIFF, which you can convert for CAD and GIS.

JPG, on the other hand, is a file format to avoid. The reason for this is its use of lossy compression. This compression method compromises image quality in favour of a smaller file size. In certain circumstances, this makes sense—for example, when you need to save thousands of photos on a smartphone. For CAD purposes, however, it’s a bad trade-off.
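
The trade-off can be illustrated with Python's standard library. In the sketch below, zlib stands in for a lossless scheme: the data that comes out is byte-for-byte identical to what went in. A crude rounding function stands in for a lossy scheme. This is only an analogy under stated assumptions: real JPEG compression is considerably more involved, and the row of pixel values is invented.

```python
import zlib

# One hypothetical row of grayscale pixels: white paper, a black line,
# and a short stretch of faint detail.
scanline = bytes([255] * 12 + [0] * 2 + [255] * 6 + [40, 200, 40] + [255] * 9)

# Lossless: compress, decompress, and get exactly the same bytes back.
packed = zlib.compress(scanline)
restored = zlib.decompress(packed)
print(restored == scanline)


def quantize(data, step=64):
    """Crude lossy stand-in: round every value down to a multiple of step."""
    return bytes((value // step) * step for value in data)


# Lossy: the rounded values need fewer distinct levels, but the original
# pixel values cannot be recovered from them.
lossy = quantize(scanline)
print(lossy == scanline)
print(list(scanline[20:23]), "->", list(lossy[20:23]))
```

Fine detail such as thin lines and small text is exactly what a conversion tool relies on, which is why a lossless scan is the safer starting point.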

Unfortunately, it isn’t always possible to decide on a drawing’s file type yourself. In such cases, you don’t have any control over the quality of the image when you open it. You can, however, improve the image’s quality so that it is more suitable for conversion.

Clean up your text

To get the best possible results when you convert text in technical drawings, you’ll need to ensure that the raster text you’re working with is up to scratch. After all, even the best OCR software out there can’t decipher gibberish. You know what they say: garbage in, garbage out!

Here at Scan2CAD, we created a Raster Text Quality Checklist to help you stay on the straight and narrow. Before attempting any conversion, you should ensure that the text characters in your raster image:

  • Are easily legible
  • Do not touch each other
  • Do not touch other elements within your drawing
  • Are not at different orientations
  • Are in a font that Scan2CAD can recognize

We also have a few pro tips at hand to help you turn poor quality text into something that’s ready for vectorization.

  • If you plan on using a non-standard font, be sure to train Scan2CAD’s neural networks first. Proper testing can help ensure that Scan2CAD recognizes the characters present in your image.
  • Sometimes, characters may touch. In these instances, it is difficult for OCR software to tell where one character ends and the next begins. Scan2CAD’s Split tool can help you to separate these characters for better results.
  • In some cases, your text may simply not be legible. If this is the case, even a human might struggle to tell what the text really says—let alone OCR software! Your best bet here may be to simply type over the text.

Use the right software


On the left is text converted using an online converter. On the right is text converted by Scan2CAD. The difference is clear.

This is something we can’t stress enough: choosing the right conversion software is make-or-break for your vector text.

As we’ve noted, some raster-to-vector software simply can’t tell the difference between text and other elements within an image. Using such software will provide you with near-useless exploded text—and a resultant headache.

Poor-quality conversion results are a common pitfall of online file converters. Unfortunately, this is far from the worst problem they can cause. In fact, using an online file converter can endanger both the privacy of your intellectual property and the security of your system. Putting all of this at risk for the sake of sub-par vector text just isn’t worth it.

The smart move here is to use dedicated software for the conversion of technical drawings to vector images. Scan2CAD, therefore, is the natural choice. It excels at converting text and images, with over 20 years at the cutting edge of vectorization. With its neural networks able to understand text of all varieties, it’s one step ahead of the game.


How to convert text in technical drawings using Scan2CAD

With so much technical know-how involved in the creation of great OCR software, you might expect the conversion process itself to be similarly tricky. Thankfully, you’d be wrong: it’s easy as pie to convert text in technical drawings with Scan2CAD.

In the following video we convert a technical drawing which contains text. Notice how the appropriate elements are converted using OCR and the other elements in the image are vectorized. For this we use a process called object identification. After conversion we can directly edit the text strings.

View video transcript

In this video, we’ll be converting this technical drawing, which contains text and a lot of other objects, which you may see in an electrical schematic or other technical drawings. And we’ll be converting this into vector. What we want to do is recognize text in the image, this is a raster image. We want to recognize text in here using OCR, but we also want to convert the other objects which are not characters to their appropriate vector elements. To do that, we need to use Scan2CAD’s object recognition, sending elements in the image that look like text to our OCR and elements that don’t look like text to our vectorization. First, we’ll quickly run a threshold to make sure the image is suitable for conversion. Okay, I’ve just set the threshold level somewhere around there, where it’s okay. I’m not going to continue with any other raster effects. Raster effects are tools for cleaning up the image to make it suitable for conversion. What we’ll do is just go straight into a vectorization now, and I’m gonna choose the technical vectorization, meaning we want to use Scan2CAD’s object recognition, and we’ll choose electrical as the default vectorization options.

This video isn’t intended to be a tutorial for all the options within Scan2CAD, so we won’t go over the object identification options and so on. What we’ll do is turn on vectorize and OCR, and go to the OCR box, turn on the image in the preview, so we can select from image and select the character size that we need to run the OCR on. Okay, I’m happy with everything as it is. We don’t need to enable vertical, ’cause there’s no vertical text in this drawing. So we’ll click Run. This runs the vectorization and the OCR. And it’s now complete. I’m reasonably happy with that by looking at the preview, so I’m gonna click okay to save that to my canvas. What we’re viewing right now is both the raster image and the vector image. I’ll go to view, just out of the view of this video, and click View Vector Colors. That shows us the type of each vector by its color: red represents vector lines, and blue, over here, represents the vector circle objects. We have pink representing vector arc objects, but we also have text. So, I’m gonna turn off the raster image, so we can just view the vector and zoom in. Let’s have a look.

So we can see the vector text now, which is fully editable, if we wanted to, you can just click the Edit Text and edit accordingly and compare it to the raster image. It looks very good and I’m happy with the conversion.

 

]]>
https://www.scan2cad.com/blog/cad/ocr-text-technical-drawings/feed/ 2
Converting Raster Text to Vector Text—A Quick Guide https://www.scan2cad.com/blog/cad/convert-raster-text-to-vector-text/ Mon, 04 Jun 2018 11:13:29 +0000 https://www.scan2cad.com/?p=27506 Converting your designs from raster to vector has many benefits. The quality of the work isn’t affected when re-scaling or re-sizing; it can be edited using CAD software; files tend to be smaller and thus easy to share—the list goes on. But what happens when your designs contain text? As you may have guessed, it’s not quite as straightforward.

Depending on the conversion software you’re using, text is not always converted correctly—or even recognised at all! Thankfully, Scan2CAD includes a special feature that ensures your text won’t be overlooked during the conversion process: OCR technology. In this article, we’ll explore how this feature can elevate your work and how easy it is to get started with vectorizing your text today.


Table of contents


Raster vs. vector: why convert your text?

Whether you’re working on engineering plans, architectural designs or something similar, it’s likely you’ll want your project to be editable and user-friendly. In other words, it’s useful to be able to share your designs across different platforms and allow input from a variety of people.

 

If your work is stored in a raster format, it can suffer from quality issues and may be problematic to share due to large file sizes. Additionally, if a team in another office tries to zoom in on a certain aspect of your design, the image will become blurry because raster graphics are made up of pixels. For the same reason, CAD software cannot edit the individual elements of a raster image.

Vector files, on the other hand, are made up of individual elements (based on mathematical equations) that can be recognised (and thus manipulated) by CAD or CNC programs. Along with their smaller size and high quality, this makes vector formats a popular choice with designers and engineers alike.
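To make the difference concrete, here is a minimal Python sketch. The `Line` class and the `upscale_raster` helper are hypothetical names made up for this illustration and are not part of Scan2CAD or any CAD file format. It shows that scaling a vector element only changes its coordinates, while scaling a raster image just makes the pixels bigger.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A vector line stored as two endpoints (hypothetical minimal entity)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def scaled(self, factor: float) -> Line:
        # Scaling a vector element only multiplies its coordinates,
        # so the line stays exact at any zoom level.
        return Line(self.x1 * factor, self.y1 * factor,
                    self.x2 * factor, self.y2 * factor)


def upscale_raster(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# A 3x3 raster containing a diagonal stroke (1 = ink, 0 = background).
raster = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
line = Line(0, 0, 3, 3)

big_line = line.scaled(4)               # still one exact line
big_raster = upscale_raster(raster, 4)  # a staircase of 4x4 blocks
```

The enlarged raster holds exactly the same information as the original, only blockier, whereas the enlarged line is still a single editable element.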


Common problems with text conversion

Raster quality issues

Poor image quality for raster to vector conversion

Don’t bother converting any raster image that sports any of these problems…

One of the most common problems you may run into when vectorizing text is something that can be fixed before the conversion process has even begun: the quality of your raster image. You need to make sure that the graphic you’re converting is of as high a quality as possible. As is the case with any image about to be vectorized, text should be as clear and sharp as you can make it. 

Even when using Scan2CAD, it’s important that you clean up the designs you want to convert. Bear in mind that your image may look perfect in full screen, but once you zoom in, flaws begin to appear. If you’re struggling with how to go about cleaning up your work, take a look at our raster quality checklist. Scan2CAD includes an image editing suite with all the tools you need to meet these standards.

Text strings vs. exploded text

Some conversion software will fail to differentiate between images and text. In these cases the text is not recognised as such and so is simply converted to vector shapes—like circles and angles. The outcome is known as exploded text. It’s not actually text, often looks wrong, and is very hard (if not impossible) to edit. 

Exploded text

Example of exploded text.

When converting text from raster to vector, you’re aiming to create text strings—actual vector text that can be edited. Thankfully, Scan2CAD has the features required to produce text strings rather than exploded text.

Example of vector text strings.
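The difference between the two outcomes is easiest to see as data. The sketch below is purely illustrative: `Segment`, `TextString` and `searchable` are hypothetical names invented for this example, not Scan2CAD's internal model or any CAD file format. It assumes a text string is stored as one entity carrying its content, while exploded text is only a pile of strokes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """One straight stroke of an exploded character (hypothetical)."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class TextString:
    """A single editable text entity: content plus placement (hypothetical)."""
    content: str
    x: float
    y: float
    height: float


# Exploded text: the letter "L" stored as two unrelated strokes.
exploded_l = [
    Segment(0, 0, 0, 10),  # vertical stroke
    Segment(0, 0, 6, 0),   # horizontal stroke
]

# Text string: the whole label stored as one entity.
label = TextString(content="R12 10K", x=0, y=0, height=10)

# Editing a text string is a one-field change.
fixed = replace(label, content="R12 100K")


def searchable(entities, query):
    """Return the text entities whose content contains the query."""
    return [e for e in entities
            if isinstance(e, TextString) and query in e.content]
```

Searching the drawing for "R12" finds the label, but nothing in the exploded strokes records that they were ever the letter "L", which is why exploded text is so hard to edit or search.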


Using Scan2CAD to convert text

Scan2CAD is a great option to pick if you’re converting raster text to vector. Not only does it provide the means to clean up your work prior to conversion and thus produce the best results, it also offers a comprehensive editing suite that lets you edit the text directly within your designs post-conversion.

Meanwhile, forget the hassle of manual tracing. Scan2CAD boasts a special feature that ensures automatic text conversion with professional results: OCR.

OCR technology

Optical Character Recognition (OCR) in Scan2CAD

Optical Character Recognition (OCR) is technology that draws on its own database of patterns to detect text characters in your work. It can then convert the text in your designs to editable vector text (or text strings). What’s even more helpful is that said text will also be rendered correctly and presented logically—a result not all conversion programs can produce. 

Even with this sophisticated technology at your disposal, though, converting text can be a tricky business. Once you’ve deemed the quality of your raster to be good enough to vectorize, it’s time to have a quick look through the raster text quality checklist. Yes, we at Scan2CAD love a quality checklist—we’re all about professional results, after all!

Here are the basics of the raster text quality checklist:

  • Although OCR recognises most standard font patterns, rarer ones may not be detected. Check out the font training section of the manual for help with resolving this issue.
  • OCR can struggle to read handwritten text. If you’ve got the time, you can actually train Scan2CAD’s neural networks to recognise it, but you’re probably better off typing over handwritten words with vector text.
  • Watch out for overlapping elements—if the text happens to be written over drawing elements, for instance, you should pick another image because OCR won’t be able to detect the text.
  • Avoid having characters that are very close together because OCR will struggle to separate them. To combat this, select OCR > Settings > Split before starting OCR.

If your work meets these conditions, OCR should recognise your raster text and you’ll be good to go!

Step-by-step guide to converting a raster image with text

Converting text in an electrical schematic image with Scan2CAD

Step 1 – Preparing your image for conversion

The first step is to ensure your image is suitable for OCR. Scan2CAD has a suite of tools, which we call ‘Raster Effects’, for cleaning your image and making it suitable for conversion. Use the Raster Effects to remove image distortion and clean any image noise.

Step 2 – Choose your conversion settings

Click the convert (Vectorize) icon.

The Vectorization Settings dialogue will launch. You’ll find the OCR settings under the ‘OCR’ tab. Don’t forget to choose the size of the text in the raster image and the text rotation.

Step 3 – Convert

Hit the ‘Run’ button to convert your design. Wait for the conversion to complete and look at the results in the preview window. If you do not get the results you require, tweak the settings and run the conversion again until you’re happy. Finally, hit ‘OK’ to save the results.

And you’re done!


Once you’ve successfully vectorized your work (text and all!), why not make the most of Scan2CAD’s features and check out our tutorials for converting and editing vector text? They’re great for adding those finishing touches. Whether you’re working with images, text, or both, Scan2CAD is the best option for all of your conversion needs. Take advantage of our 14-day free trial and see the results for yourself!

How does OCR Work? A Short Explanation
https://www.scan2cad.com/blog/tips/how-does-ocr-work/ Wed, 18 May 2016 17:56:52 +0000

Optical Character Recognition (OCR) in Scan2CAD

‘Optical Character Recognition’, or OCR, is a process which allows us to convert text contained in images into editable documents. OCR can extract text from a scanned document or an image of a document; really, any image with text in it.

This technology is employed for a variety of applications, such as data entry of documents, automatic number plate recognition, digitisation of printed documents in Google Books, and even beating CAPTCHA anti-bot systems!

In the CAD world, OCR plays a crucial role in converting raster sketches into editable CAD drawings. In this article, we’ll go behind the scenes to understand how OCR works!

There are two different techniques (or algorithms) in optical character recognition: pattern recognition and feature extraction, and each technique is worth looking at in a little bit more detail.

Pattern recognition

OCR-A Font Preview

The computer matches the text with its dictionary of characters.

Using this technique, the computer tries to recognize the entire character and matches it to the matrix of characters stored in the software. As a result, this technique is also known as pattern matching or matrix matching. The drawback of this technique is that it relies on the input characters and the stored characters being of the same font and same scale. The font pictured here is OCR-A, one of the first fonts designed specifically for OCR in the 1960s, in which every letter has the same width. Fonts like this were printed on documents such as bank forms so that computers could process them reliably!
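As a rough illustration of matrix matching, the Python sketch below compares a small character bitmap against stored templates pixel by pixel and picks the closest one. The 3x5 templates and the function names are assumptions made up for this example; real OCR engines use far larger character sets and more robust scoring.

```python
# Stored "matrix" of known characters: '#' = ink, '.' = background.
TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
}


def similarity(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    if len(glyph) != len(template) or any(
            len(g) != len(t) for g, t in zip(glyph, template)):
        # Matrix matching only works when both bitmaps have the same scale.
        raise ValueError("glyph and template must have the same size")
    total = sum(len(row) for row in template)
    same = sum(g == t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))
    return same / total


def match(glyph):
    """Return the best-matching character and its similarity score."""
    scores = {char: similarity(glyph, tpl) for char, tpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A scanned "L" with one stray speck of noise in the middle row.
noisy_l = ["#..",
           "#..",
           "#.#",
           "#..",
           "###"]

best_char, best_score = match(noisy_l)
```

Here the noisy glyph still agrees with the stored "L" on 14 of its 15 pixels, so it is matched correctly. A glyph at a different size is rejected outright, which is exactly the font and scale limitation described above.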

Scan2CAD applies Neural Networks to the task of pattern matching. Neural networks work in an analogous way to the human brain. They learn to recognize shapes and patterns from a range of examples. Scan2CAD includes a feature allowing the user to train their own Neural Networks to recognize font styles unique to their drawings.

Feature extraction

Feature detection in OCR

Letter A = two angled lines + one horizontal line

This one is a much more sophisticated way of spotting characters. It decomposes characters into “features” like lines, closed loops, line directions and intersections.

Let’s take letter A as an example. If the computer sees two angled lines that meet at the top, and both lines are joined together by a horizontal line in the middle, that’s a letter A.

By using rules like these, the program can identify most capital ‘A’s, regardless of the font that it is written in.
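One of the features mentioned above, closed loops, is easy to sketch in code. The Python example below counts the enclosed background regions in a small character bitmap using a flood fill. It is only a minimal sketch with made-up bitmaps: a real feature extractor would combine loops with strokes, directions and intersections, since loop count alone cannot tell an "A" from an "O".

```python
def count_holes(glyph):
    """Count closed loops (enclosed background regions) in a binary glyph."""
    height, width = len(glyph), len(glyph[0])
    # Pad with a background border so all outside background is connected.
    ink = [[False] * (width + 2)]
    for row in glyph:
        ink.append([False] + [c == "#" for c in row] + [False])
    ink.append([False] * (width + 2))
    seen = [[False] * (width + 2) for _ in range(height + 2)]

    def flood(start_row, start_col):
        # Mark every background pixel reachable from the start pixel.
        stack = [(start_row, start_col)]
        seen[start_row][start_col] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height + 2 and 0 <= nc < width + 2
                        and not ink[nr][nc] and not seen[nr][nc]):
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    flood(0, 0)  # everything reached from the border is "outside"
    holes = 0
    for r in range(height + 2):
        for c in range(width + 2):
            if not ink[r][c] and not seen[r][c]:
                holes += 1  # background not reachable from outside: a loop
                flood(r, c)
    return holes


LETTER_A = [".###.",
            "#...#",
            "#####",
            "#...#",
            "#...#"]

LETTER_B = ["####.",
            "#...#",
            "####.",
            "#...#",
            "####."]

LETTER_C = [".####",
            "#....",
            "#....",
            "#....",
            ".####"]

hole_counts = {name: count_holes(glyph)
               for name, glyph in (("A", LETTER_A), ("B", LETTER_B), ("C", LETTER_C))}
```

Because the rule looks at the shape of the character rather than its exact pixels, the same loop count would come out for an "A" drawn in a different font or at a different size.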

Pre-processing to improve text recognition

In order to recognize text effectively, the software must pre-process the image using techniques such as:

  • De-skew – Tilting the image a few degrees in order to make the lines of text perfectly horizontal or vertical
  • Despeckle – Removing spots and smoothing the edges of the characters
  • Character isolation – Splitting touching characters that may have bled into each other
  • Layout analysis – Identifying text positions, columns and paragraphs
  • Line removal – Removing overlying lines or boxes
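To give a feel for one of these steps, here is a minimal despeckle sketch in Python. It assumes a simple approach, removing any connected group of ink pixels smaller than a threshold, and the `despeckle` function and `min_size` parameter are hypothetical names for this example rather than how Scan2CAD or any particular OCR engine implements it.

```python
def despeckle(image, min_size=2):
    """Remove connected groups of ink pixels smaller than min_size."""
    height, width = len(image), len(image[0])
    ink = [[c == "#" for c in row] for row in image]
    seen = [[False] * width for _ in range(height)]

    for r in range(height):
        for c in range(width):
            if not ink[r][c] or seen[r][c]:
                continue
            # Collect the whole connected component (8-connectivity).
            component = []
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                cr, cc = stack.pop()
                component.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and ink[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
            # Specks are tiny components; erase them.
            if len(component) < min_size:
                for pr, pc in component:
                    ink[pr][pc] = False

    return ["".join("#" if pixel else "." for pixel in row) for row in ink]


# A letter "T" with two isolated specks of scanner noise.
noisy = ["#......",
         "..###..",
         "...#...",
         "...#..#",
         "...#..."]

clean = despeckle(noisy)
```

The threshold is a trade-off: set it too high and small but genuine marks, such as full stops or the dot on an "i", would be erased along with the noise.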

More sophisticated software conducts post-processing steps as well. The software would match the transcribed output to a lexicon (a dictionary of allowed words), or conduct near-neighbor analysis to identify words that are usually seen together (for example, the phrase “living doom” will be automatically corrected to “living room”, since the words “living” and “room” often occur together).
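The two post-processing ideas can be sketched together in a few lines of Python. This is an assumption-laden toy: the lexicon and the word-pair counts are invented for the example, and `difflib.get_close_matches` from the standard library stands in for whatever matching a real OCR engine uses.

```python
import difflib

# Hypothetical lexicon of allowed words.
LEXICON = ["living", "room", "doom", "door", "plan", "floor"]

# Hypothetical counts of how often two words appear next to each other.
PAIR_COUNTS = {
    ("living", "room"): 120,
    ("floor", "plan"): 80,
}


def correct(words):
    """Snap each word to the lexicon, preferring pairs that co-occur often."""
    corrected = []
    for word in words:
        # Lexicon step: find allowed words that look similar, closest first.
        candidates = difflib.get_close_matches(word, LEXICON, n=3, cutoff=0.7)
        if not candidates:
            corrected.append(word)  # nothing similar enough: leave as read
            continue
        previous = corrected[-1] if corrected else None
        # Near-neighbour step: favour the candidate most often seen after
        # the previous word. On a tie, max() keeps the closest match.
        best = max(candidates,
                   key=lambda cand: PAIR_COUNTS.get((previous, cand), 0))
        corrected.append(best)
    return corrected


result = correct(["living", "doom"])
```

Note that "doom" is itself a valid word, so the lexicon check alone would accept it. It is the word-pair information that tips the choice to "room".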

OCR Technology in Scan2CAD

Scan2CAD is a raster-to-vector conversion engine. It converts images into vector drawings, so that they can be edited using other CAD/CAM and CNC software. Since many images contain text, OCR is a vital part of the raster-to-vector conversion process. Unlike many CAD image converters, Scan2CAD converts text in raster images into proper editable vector text strings, instead of constructing it out of individual vector entities (such as lines and arcs).

You can help ensure that the text on your raster image is ready for vectorization by following Scan2CAD’s Raster Text Quality Checklist.

Scan2CAD Convert PDF to DXF - Text conversion using OCR

On the left is CAD text that’s converted using Scan2CAD. On the right is text converted using other software, which hasn’t been reassembled into logical sentences very accurately.

With OCR, there’s no need to manually retype the labels, and these text vectors are easily editable too. In many cases, all you have to do is click “OCR” on the ribbon at the top of your workspace, and voila! Try OCR for yourself using our 14-day FREE trial of Scan2CAD.

