Converting Image PDFs to Optimized, Usable Content
Quite often, I receive new industrial clients who come to Cazbah right after working with another marketing company. In those situations, I go through a process that is similar to bringing on a new client, but rather than designing and creating a new website, I thoroughly review their old website to see what is and isn’t working. More often than not, I see remnants of a very rushed website build process, which typically leaves “low-hanging fruit” opportunities that can have significant value.
With commercial and industrial companies, there are often hundreds to thousands of products, white papers, specifications, etc. These take substantial time to integrate into a website through your ecommerce platform. What I have found is that many of these content items are just image PDFs. That’s the kind of PDF that has little to no value to Google, except maybe the filename and link. Converting them is a great opportunity to create crawlable, relevant, and optimized web content. Additionally, customers often find this easier than writing fresh content. The converted pages can also serve as an example of what properly structured content looks like, making it easier for them to write their own in the future.
This is the process I use for changing image PDFs or infographics into usable web page content.
To determine whether a PDF is an image or crawlable text, open it in your PDF viewer and try to highlight an area of text. If you are able to do that, the document is most likely crawlable by Google bots already. There are ways to optimize those, but today we want to focus on the PDFs where you are not able to highlight any text. In that case, your document is an image saved as a PDF and is providing little if any search value to your website.
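If you have a whole folder of documents to sort through, the highlight test can be scripted. The following is a minimal sketch, not part of my usual process: it assumes the open-source pypdf library is installed, and the function names and the 20-character threshold are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def has_text_layer(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True if the extracted page text looks like real, selectable text.

    min_chars is an arbitrary threshold (an assumption) so that a stray
    page number or watermark does not count as a text layer.
    """
    # Count only non-whitespace characters across all pages
    total = sum(len("".join(text.split())) for text in page_texts)
    return total >= min_chars


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of each page using pypdf (pip install pypdf)."""
    # Imported here so has_text_layer() can be used without pypdf installed
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Pages that are only a scanned image typically yield an empty string
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `has_text_layer(extract_page_texts("spec-sheet.pdf"))` on each downloaded file separates the PDFs that need OCR from the ones that do not. A scanned PDF that someone has already run through OCR carries a hidden text layer, so it will be reported as text even though it looks like an image.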
Organize your project
Create a new folder in Google Drive or on your desktop with the name of the company site, and then download the PDFs that you would like to convert and optimize.
Locate your conversion tool
Once that is completed, you will need to find an OCR (Optical Character Recognition) converter. If you have the Adobe professional package, you can use that, but I typically go to https://www.onlineocr.net/ and use their free tool.
- Select the file in your folder that you’d like to convert
- Click Submit
- Wait a few seconds and voilà, you’ve got text!
I have also recently found a new PDF tool that integrates with Chrome called “Kami” that detects a scanned document and automatically converts it with its OCR capabilities.
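For a large batch, or for documents you would rather not upload to a third-party site, the same conversion can be run locally. This is a rough sketch under stated assumptions, not a description of how the tools above work: it assumes the open-source pytesseract and pdf2image packages (plus the Tesseract and Poppler programs they depend on) are installed, and `ocr_pdf` is a hypothetical helper name.

```python
from typing import Callable, List, Optional


def ocr_pdf(
    pdf_path: str,
    render: Optional[Callable[[str], List[object]]] = None,
    recognize: Optional[Callable[[object], str]] = None,
) -> str:
    """Return the OCR text of every page in pdf_path, pages separated by blank lines.

    render and recognize can be swapped out, which makes it easy to try a
    different OCR engine; the defaults below are assumptions.
    """
    if render is None:
        # pdf2image needs the Poppler utilities installed on the system
        from pdf2image import convert_from_path
        render = convert_from_path
    if recognize is None:
        # pytesseract needs the Tesseract program installed on the system
        import pytesseract
        recognize = pytesseract.image_to_string

    pages = render(pdf_path)  # one image per PDF page
    texts = [recognize(page).strip() for page in pages]
    # Skip pages where nothing was recognized
    return "\n\n".join(text for text in texts if text)
```

With the defaults in place, `ocr_pdf("spec-sheet.pdf")` returns the recognized text as a single string, ready for the editing step.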
Edit your new content
OCR is not a perfect process. There will be a few errors, but it will get you close. Copy and paste the text into your editor of choice (Google Docs, Microsoft Word, etc.), then go through the document and fix the spelling and grammar errors.
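Some of the clean-up is purely mechanical and can be done before you start proofreading. Here is a small sketch, using only Python’s standard library, that rejoins words hyphenated across line breaks, unwraps hard line breaks, and collapses extra spaces. The function name `tidy_ocr_text` is hypothetical, and the rules are deliberately simple.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Apply mechanical clean-up to OCR output before manual proofreading."""
    # Normalize Windows and old Mac line endings
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words that were hyphenated across a line break: "specifi-\ncation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat runs of blank lines as paragraph breaks
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # Unwrap hard line breaks and collapse repeated whitespace
        line = re.sub(r"\s+", " ", paragraph).strip()
        if line:
            cleaned.append(line)
    return "\n\n".join(cleaned)
```

The hyphen rule will also merge a genuinely hyphenated term that happens to break at the end of a line, and the unwrapping will flatten tables and bullet lists, so this does not replace reading through the text yourself.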
Optimize your content
Finally, you will need to add or change your headings where appropriate. This is basic SEO. Assuming that you have done keyword research, this will be quick. Work your important keywords (the ones appropriate to the article) into your page title, then the <h1> heading, and continue on down through your <h2> headings and so on. Follow that with a link or two to external or internal pages as needed, and link back to the image PDF you just converted.
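To make the structure concrete, here is a minimal sketch that assembles a page with a title, a single <h1>, one <h2> per section, and a link back to the original PDF. The function name `build_page`, the link text, and the sample values are placeholders for illustration; most CMS editors generate the title and heading markup for you, in which case only the order of the elements matters.

```python
from html import escape
from typing import List, Tuple


def build_page(title: str, h1: str, sections: List[Tuple[str, str]], pdf_url: str) -> str:
    """Return a bare-bones HTML page: title, one H1, an H2 per section, PDF link."""
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        f"<h1>{escape(h1)}</h1>",  # exactly one H1 per page
    ]
    for heading, body in sections:
        parts.append(f"<h2>{escape(heading)}</h2>")
        parts.append(f"<p>{escape(body)}</p>")
    # Link back to the image PDF that the text was converted from
    parts.append(f'<p><a href="{escape(pdf_url)}">Download the original PDF</a></p>')
    parts.append("</body>")
    parts.append("</html>")
    return "\n".join(parts)


page = build_page(
    title="2-Inch Ball Valve Specifications | Example Co.",
    h1="2-Inch Ball Valve Specifications",
    sections=[("Pressure & Temperature Ratings", "Rated to 150 PSI.")],
    pdf_url="/docs/ball-valve.pdf",
)
```

Printing `page` shows the keyword phrase in the title and the <h1>, with supporting terms in the <h2> headings beneath it.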
Add to your CMS of choice
Once you have completed this process, add the content to your website. I typically do this before I add images because it is more efficient to add images once the content is in the proper web page format. Optimize your images with a title, alt text, and description, then publish your page.
It sounds like it could be time-consuming, but it is fairly quick and easy to do. You now have numerous new, optimized pages of content that are friendly to both users and Google bots. This is by no means a substitute for well-written, well-thought-out content, but sometimes it is just what’s needed to get busy clients started in the content building process.