About Image Files and PDF Documents

Rituraj Kalita
July 2012

Nowadays, digital image files can be obtained in many ways. (1) Photographs taken with digital cameras and many cellphones are such image files. (2) On-paper images (photographs, drawings, paintings, etc.) also give such image files when scanned with a scanner (even text on paper may be scanned, giving images that contain the text as a picture). In both these cases, the image files are generally obtained in the JPEG format (with a .jpg or .jpeg file extension). (3) You may also draw images yourself on a computer, using programs such as Paint, and save them in modern image formats such as .jpg, .tif or .gif (for simple drawings, the last one usually gives the smallest files, which suits websites). (4) Some images might be collected from the Internet or borrowed from your acquaintances. (5) The output of many programs (say, a textual document from Microsoft Word) can also be obtained as an image file by employing a suitable image printer driver such as Microsoft Office Document Image Writer (a part of Microsoft Office 2003), PDFill FREE PDF & Image Writer 9.0 (a wonderful freeware from pdfill.com) or NED Image Printer Driver 1.1 (a wonderful freeware from nedatacorp.com). For a public website that needs to contain text in non-European languages, say in an Indian language, it is very important either to keep the text as images or to include the relevant language font in an auto-downloadable form within the website. This is because the viewer's computer may not have the relevant font installed, in which case he/she won't be able to view the non-English text written and uploaded by the webmaster (that is, by you). (6) Another very important and simple (though less utilised) way of obtaining relevant image files is the Print Screen method, by which any image seen on your computer screen (monitor), whether a program output or a website-originated one, can be obtained immediately as an image file.
For this, while viewing the image, press the Print Screen key on the keyboard (find where it is!). Whatever is being shown on the whole screen at that moment (including, obviously, any text in it) is then captured as an image and copied to the clipboard. From there it may be pasted into an image viewer such as Paint or IrfanView (a wonderful freeware from irfanview.com; on Ubuntu computers, you may instead use the similar freeware nomacs), cut into the required pieces and saved as image files. The last two methods (methods 5 & 6) might be unfamiliar to you and so need some more discussion. We'll come to them later.
Note: It is advisable to download software such as IrfanView from a reputed software-hosting service such as CNET or Softpedia rather than from the websites of individual small-fry developers such as this author, since the former are generally better equipped to keep their setup files free of all kinds of malware.

Whichever of these six sources your image files come from, they require some manipulation before you finally put them on your website. Nowadays, photographs taken with digital cameras and cellphones are of such high resolution that they are very bulky, and their data size needs to be drastically reduced. The same is true of scanned images, where we also need to select and cut out only the required image, leaving out the white paper borders. The images delivered by printer drivers may also be somewhat bulky, while the 'Print Screen' method captures the whole monitor screen rather than just our desired portion! For all these purposes, we need something like IrfanView (Paint is not so useful here). We may open an existing image file in IrfanView, or paste an image (say, the Print Screen image) into it, do the required manipulation and then save the desired image. On an image shown in IrfanView, the required portion may be selected as a rectangle by dragging the mouse from the top-left corner of the desired portion towards the bottom-right (see the following figure, in which a rectangular part of the image around the clenched fist is being selected). The selection is then copied, pasted into a blank IrfanView window and finally saved (say, as a less bulky .gif file).
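The rectangle you drag out in IrfanView is defined by just two corner points. As a rough illustration (a minimal sketch, not IrfanView's own code), the following Python snippet turns a mouse drag between two points into a (left, top, right, bottom) box and reports the pixel size of the cut-out. The function names selection_box and box_size are hypothetical, and pixel coordinates are assumed to grow rightwards and downwards from the top-left corner of the image, as they do in most image programs.

```python
def selection_box(start, end):
    """Return (left, top, right, bottom) for a drag between two (x, y) points.

    Dragging from top-left to bottom-right gives start < end on both axes;
    the min/max calls also accept a drag made in any other direction.
    """
    (x1, y1), (x2, y2) = start, end
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def box_size(box):
    """Width and height, in pixels, of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return right - left, bottom - top


# Example: a drag from (120, 80) to (420, 330) selects a 300 x 250 pixel region.
box = selection_box((120, 80), (420, 330))
print(box, box_size(box))
```

If you happen to use the third-party Pillow library, its Image.crop method accepts a box in this same (left, top, right, bottom) order.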


Selecting a Desired (Rectangular) Portion of an Image in IrfanView

Even if the whole existing image is required, this selection method (with the whole image selected by dragging the mouse) may help in reducing the data size (OK, I'm not so sure about this sentence!). To reduce the data size of our final image, an obvious method for very large images (e.g., modern-day camera photographs) is resizing to fewer megapixels, followed by saving in the less bulky .gif format. But while resizing, take care to reduce the size by a whole-number factor (e.g., 2, 4 or 10 times) by choosing 50%, 25% or 10% of the original size (depending on how large the original is), as this probably gives better output, and to preserve the aspect ratio (i.e., the width-to-height ratio); otherwise the image may look distorted. The following figure shows the option in IrfanView for reducing the size to 25% (available from the Resize/Resample option of its Image menu). The original picture was 2187 x 2354 pixels: as this is too large to be placed in a webpage, we reduce it to 25% of the original (preserving the aspect ratio), turning it into 547 x 589 pixels, a satisfactory size for our webpage. And wow! We thereby also decreased the data size from 3.16 MB (original, .jpg, far too heavy for an Indian website) to just 194 KB (output, .gif), which is rather okay for our site.
Note: You probably know that 1 MB equals 1,024 KB, i.e., nearly one thousand KB.
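The arithmetic behind the above example can be checked with a few lines of Python. This minimal sketch (the function names are hypothetical) scales both sides by the same percentage, which is what preserving the aspect ratio amounts to, rounding half-up to the nearest whole pixel, and then uses 1 MB = 1,024 KB to express the saving as a factor. Note that the factor reflects both the resizing and the change from .jpg to .gif; the code only does the arithmetic and does not compress anything.

```python
KB_PER_MB = 1024


def scaled_size(width, height, percent):
    """Scale both sides by the same whole-number percentage.

    Using one percentage for both sides preserves the aspect ratio.
    Integer arithmetic rounds half-up to the nearest whole pixel.
    """
    new_w = (width * percent + 50) // 100
    new_h = (height * percent + 50) // 100
    return new_w, new_h


def reduction_factor(before_mb, after_kb):
    """How many times smaller the output file is than the input file."""
    return before_mb * KB_PER_MB / after_kb


# The figures quoted in the text: 2187 x 2354 pixels reduced to 25%.
print(scaled_size(2187, 2354, 25))            # (547, 589)
print(round(reduction_factor(3.16, 194), 1))  # 16.7
```

In other words, the output file is roughly one-seventeenth the size of the original.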

Decreasing Image Size as well as File Size in IrfanView

Another method for decreasing the data size as well as the image size is to apply the Print Screen method even to existing images. Instead of manipulating the original image itself, let us open it (say, in IrfanView), zoom it to our desired view and then take a screenshot by pressing the Print Screen key, followed by pasting that screenshot into IrfanView. Next, zoom the screenshot image in IrfanView to our desired size, copy the desired portion and paste it into another IrfanView window, finally saving it as a .gif file. This procedure is more complicated, however, and sometimes gives only a similar reduction in file size to the above method. So, try it only if the above-mentioned method doesn't yield a satisfactory file size.

Let us now learn in detail about the aforesaid 'printer driver' method (the 5th one) for obtaining image files corresponding to your non-English (e.g., Assamese) document files. For a document (or a document page) taller than what you can see on one screen, the simpler Print Screen method obviously cannot capture the whole of it in one image. For such documents, the 'printer driver' method is the best. When installed, printer driver software (e.g., NED Image Printer) adds a hypothetical 'printer' (ha, ha!) to your list of available printers (e.g., HP, Canon, etc.). Then, when you try to print your document from within the program that created it (say, Microsoft Word), you'll be offered the choice of printing it with one of the existing real printers or with one of these hypothetical printers (see figure below). If you have Microsoft Office 2003 on Windows XP, you might already have MODIW, the Microsoft Office Document Image Writer (printer driver), pre-installed (I didn't see it at the beginning, but later somehow got it working; I don't remember exactly how). Now, if you choose an image-printing hypothetical printer (e.g., the NED_Image_Printer seen below) instead of HP, Canon, etc., you will generally be shown some options for the required image file and will finally get the output as an image file (as if it were a printed piece of paper).


Printer Drivers Available While Going to Print a Document

Let us now install the two free image printers, NED and PDFill. Of these, NED is somewhat difficult to install, but once installed it yields image output (.tif format) for all files. For documents (e.g., the one in the above figure) written using non-English fonts such as Aadarsha Ratne Internet (a wonderful, easy-to-master free Assamese font I've downloaded from here), PDFill doesn't produce any image output, whereas NED and MODIW do. Unfortunately, however, NED and MODIW are very difficult to install on Windows 7, and even PDFill refuses to give multi-page .tif image output on Windows 7 (I've got enough reasons to believe that 32-bit Windows XP is the best Microsoft Windows, and Office 2003 is the best Microsoft Office). NED is downloaded as an archive file which, when extracted, yields a folder (e.g., nedip2) with some files in it. Keep that folder somewhere on your hard disk and, out of the various files in it, paste a copy of the NEDPRINT.INI file into your Windows\System32 folder (e.g., C:\Windows\System32). Next, perform this slightly complicated procedure to install your NED Image Printer. Another drawback is that free NED always writes its output image to one particular file, NEDIP.TIF, in the C:\Windows\System32 folder, so you must copy that file to your own folder and rename it (or transfer its image contents) before the next NED printing operation overwrites it! MODIW, if you can get it in place, has no such peculiar problem and is rather a fantastic work of art! PDFill installation, on the other hand, is rather easy: just double-click the downloaded setup file and agree to whatever it asks.
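Because NED reuses NEDIP.TIF for every print job, the copy-and-rename step is easy to forget. A small script can take care of it. The following is a minimal Python sketch, not part of NED: save_ned_output is a hypothetical helper name, and it assumes the output location given above. Run it after each NED print; it copies the file into a folder of your choice under a time-stamped name, so earlier pages are not lost when NED overwrites the original.

```python
import shutil
import time
from pathlib import Path

# Output location stated in the text; adjust if your Windows folder differs.
NED_OUTPUT = Path(r"C:\Windows\System32\NEDIP.TIF")


def save_ned_output(dest_dir, source=NED_OUTPUT, prefix="page"):
    """Copy NED's output file into dest_dir under a new, time-stamped name.

    NED overwrites the same file on every print, so the copy must be made
    before the next print job. Returns the path of the saved copy.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{prefix}-{stamp}.tif"
    counter = 1
    while target.exists():  # two copies within one second: add a counter
        target = dest_dir / f"{prefix}-{stamp}-{counter}.tif"
        counter += 1
    shutil.copy2(source, target)  # copies contents and file timestamps
    return target


# Usage (hypothetical folder): save_ned_output(r"D:\MyImages")
```

Using a time stamp rather than a fixed name means you never have to think up a new file name between print jobs; you can rename the saved copies at leisure afterwards.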

Image printer drivers generally offer options that determine both the beauty and the data size of the output image (unfortunately, the larger the data size, the more beautiful your image!). To access these options for NED, click Properties in the Print dialogue box; in the ensuing second box, you may choose Black & White (OK for text-containing images) or Color, then click Advanced; in the third dialogue box, you may choose the Print Quality (e.g., 300 x 300 dpi, i.e., fine, or 100 x 100 dpi, i.e., ordinary). Click OK after you've chosen your desired options. With any of these drivers, we can get either a one-page image corresponding to one page of the document (if it is not a .gif, it can be converted to .gif using IrfanView, and can then be put directly into your webpage) or a multi-page image in the TIFF (.tif) format. A multi-page image is obviously useless for direct insertion into a webpage; rather, it is the precursor for forming a (multi-page) PDF document by using a PDF printer driver such as PDFill or PrimoPDF (another wonderful, must-have freeware, downloadable from CNET). Such PDF files may be kept on your website and linked from your homepage or other webpages.
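Why do the fine and Color options make files so much bigger? A quick calculation helps. The Python sketch below assumes a US Letter page (8.5 x 11 inches), 1 bit per pixel for Black & White and 24 bits per pixel for Color, and computes the uncompressed data size; real .tif output is usually compressed and so smaller, but the proportions show the trend. Tripling the dpi from 100 to 300 multiplies the number of pixels by 9, and Color needs 24 times as much data per pixel as Black & White. The function names are hypothetical.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page printed at the given dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_kb(width_px, height_px, bits_per_pixel):
    """Uncompressed image data size in KB (8 bits per byte, 1024 bytes per KB)."""
    return width_px * height_px * bits_per_pixel / 8 / 1024


# Assumed page: US Letter, 8.5 x 11 inches.
for dpi in (100, 300):
    w, h = page_pixels(8.5, 11, dpi)
    bw = raw_size_kb(w, h, 1)       # Black & White: 1 bit per pixel
    colour = raw_size_kb(w, h, 24)  # Color: 24 bits per pixel
    print(f"{dpi} dpi: {w} x {h} px, B&W {bw:.0f} KB, colour {colour:.0f} KB")
```

For 100 dpi this gives 850 x 1100 pixels (about 114 KB in Black & White, 2739 KB in Color); for 300 dpi, 2550 x 3300 pixels (about 1027 KB and 24653 KB).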

If a PDF document is formed from a (multi-page) image instead of from the original word-processor document, it won't need any fonts missing from the viewer's computer, and so will always be displayed correctly to the viewer. However, being an image-containing document, it'll be somewhat bulkier than a PDF file produced directly from the original word-processor document; this price has to be paid to communicate your non-English writings to your own-language viewers. For some programs such as Adobe PageMaker, however, there is a way around this difficulty. In the second part, we'll discuss three possible ways of generating PDF files.

Part II
How to Generate the PDF Files in a Manner Suitable for My Website?

PDF printer drivers, whether PDFill or PrimoPDF, become available as hypothetical printers, just as the image printer drivers do (did you notice PrimoPDF showing up as an available printer in the above figure?). So, to generate a PDF, whether starting from a word-processor (e.g., AbiWord), presentation software (e.g., PowerPoint), spreadsheet software (e.g., Excel), an Internet browser (e.g., Firefox) or an image program (e.g., IrfanView), go for printing the file and choose the pages to print, but then select the PDF printer driver instead of a real printer. The output will be a PDF file containing the selected pages instead of printed paper pages. In this way, even particular pages from an existing PDF file can be extracted as another PDF file. And all this for free as well!

PrimoPDF offers easily accessible options for the quality (and hence the file size) of the output PDF files, so here we'll discuss it instead of PDFill. Shortly after you give the (hypothetical) print order, four simple options (Screen, Print, eBook and Prepress) are shown, as in the following figure. For inclusion in websites, the Screen option, which yields the lowest quality and the smallest size, generally works. If better quality is sought, the next higher-quality option, eBook, may be tried (you may judge one output yourself and form the PDF file again accordingly).

The Options Offered by PrimoPDF

Let us now discuss the two major ways in which PrimoPDF may be used to form PDF files. For text written in English, Spanish, etc., the fonts used are available on practically all computers, so you may form the PDF directly from the word-processor document. The PDF then contains actual text, and so has a rather small file size. This was the case with the world-famous Kyrgyz novel (in English translation) stored within this website (it's too old a novel to have copyright restrictions now): though this file contains 39 text pages and a large colourful map image added by me, it is only 618 KB in size. It was formed, using PrimoPDF, directly from a Word document into which I had pasted the novel's (English) text, the author's photograph and the coloured map of Central Asia. On the other hand, this Assamese discussion by me on the macronutrients in the Indian diet has only 11 text pages but a file size of 678 KB. This is because its contents, though looking like Assamese text, are actually images obtained from the Assamese text by employing the image printer driver MODIW (at 200 dpi print quality). This gave a multi-page .tif image file, which was then converted to a PDF file using PrimoPDF (with the higher-quality Print option). As the Assamese font used to write the text was unlikely to be present on every viewer's computer, the text had first to be converted to an image, and the PDF file then made from that .tif image.
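Dividing each file size by its page count makes the difference clearer. This short Python sketch simply uses the figures quoted above (per_page_kb is a hypothetical helper name); note that the novel's 618 KB also includes the large map image, so its text pages alone cost even less per page.

```python
def per_page_kb(pages, size_kb):
    """Average file size per page, in KB."""
    return size_kb / pages


# (pages, file size in KB), as quoted in the text
documents = {
    "Kyrgyz novel (real text, made from Word)": (39, 618),
    "Assamese discussion (text as 200 dpi images)": (11, 678),
}

for name, (pages, size_kb) in documents.items():
    print(f"{name}: {per_page_kb(pages, size_kb):.1f} KB per page")
```

That is roughly 15.8 KB versus 61.6 KB per page, nearly four times as much for the image-based PDF.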

Something similar was done with the scanned university question papers in chemistry stored on another site by this author. There, scanning the question papers resulted in a lot of single-page image files, one for each page of questions (OCR conversion into actual text couldn't be done successfully for some reason). To combine several page images into one file, Word was used to manually paste the page images one by one, resulting in very large Word documents (e.g., 63 MB for a 30-page document). These purely image-containing Word documents were then converted to PDF files, one by one, using PrimoPDF with its (lowest-quality) Screen option, resulting in files of around 2 MB each, which could therefore be meaningfully hosted on that website.

For non-European-language text written in some programs such as Adobe PageMaker, there is a provision to export the text as a PDF file and, besides, to somehow (God knows how!) embed the peculiar fonts used to write the text within the generated PDF file. The text thus remains genuine text and so stays clearer (i.e., not blurred) than in PDF files containing text converted into images. Note that here the conversion of the non-English document into a PDF document becomes a single-step process. The following figure depicts the conversion of an Assamese periodical written in Adobe PageMaker into a PDF file. Note the box against the line Include downloadable fonts, which must remain ticked (as the text was written using peculiar Assamese fonts unlikely to be found on every PC). Note also that one needs something called Acrobat Distiller (I found it lying in a Distillr subfolder of an Acrobat3 folder on my hard disk's C: drive) for this operation to succeed; you'll need to show PageMaker the location of Acrobat Distiller the first time you do this operation.

Adobe PageMaker 6.5 Exporting Text as PDF with the Fonts Embedded

Existing on-paper documents (e.g., old books) may be converted to digital image files by scanning them. Scanning can be done at high or low quality, and the file size varies accordingly. Also, if the paper document was printed in English or certain other supported languages, the scanned image may be converted to text using OCR (optical character recognition) software, so that a corresponding digital text document (e.g., in webpage, Microsoft Word or textual PDF format) with a low file size may finally be obtained. This is a very stimulating state of affairs, and indeed this is how the great worldwide book digitisation projects such as Project Gutenberg and the Google Books initiative are proceeding. In the following lesson, you may learn how to perform the scanning and OCR operations successfully, using free software all along:

Scanning and OCR: Letting documents travel from the paper-world to the cyber-world