Another way to reduce both the data size and the dimensions of an image
is to apply the Print Screen method even to an existing image. Instead of
manipulating the original image itself, open it (say, in IrfanView), zoom
it to the desired view, take a screenshot by pressing the Print Screen key,
and paste that screenshot into IrfanView. Next, zoom the screenshot image
in IrfanView to the desired size, copy the desired portion, paste it into
another IrfanView window, and finally save it as a .gif file. This
procedure is more complicated, and it sometimes gives only a similar
reduction in file size to the above method. So try this method only if the
above-mentioned one doesn't yield a satisfactory file size.
Let us now learn in detail about the aforesaid 'printer driver' method
(the 5th one) for obtaining image files corresponding to your
(non-English, e.g., Assamese) document files. For a document (or a
document page) taller than what you can see on one screen, the simpler
'Print Screen' method obviously cannot produce a corresponding image, so
for such documents the 'printer driver' method is the best. When
installed, a printer driver program (e.g., NED Image Printer) adds a
hypothetical, i.e., virtual, 'printer' (ha, ha!) to your list of available
printers (e.g., HP, Canon, etc.). Next, when you try to print your
document from within the package that created it (say, Microsoft Word),
you'll be offered the option of printing it using one of the existing real
printers or one of these hypothetical printers (see figure below). If you
have Microsoft Office 2003 within Windows XP, you might already have
MODIW, the Microsoft Office Document Image Writer (printer driver),
pre-installed (I didn't see it at the beginning, but later somehow got it
working; I don't remember exactly how). Now, if you choose an
image-printing hypothetical printer (e.g., the NED_Image_Printer seen
below) instead of HP, Canon, etc., you will generally be shown some
options for the required image file and will finally get the output image
file (as if it were a printed piece of paper).
Printer Drivers Available While Going to Print a Document
Let us now install the two free image printers, NED and PDFill. Of the
two, NED is slightly difficult to install, but once installed it yields
image output (in .tif format) for all files. For documents (e.g., the one
in the above figure) written using non-English fonts such as Aadarsha
Ratne Internet (a wonderful and easy-to-master free Assamese font I've
downloaded from here), PDFill doesn't produce any image output, but NED or
MODIW does so successfully. Unfortunately, it is very difficult to install
NED and MODIW in Windows 7, and even PDFill refuses to give multi-page
.tif image output within Windows 7 (I've got enough reasons to believe
that 32-bit Windows XP is the best Microsoft Windows, and Office 2003 is
the best Microsoft Office).
NED is downloaded as an archive file, which must then be extracted to
yield a folder (e.g., nedip2) with some files in it. Keep that folder
somewhere on your hard disk and, out of the various files in it, paste a
copy of the NEDPRINT.INI file into your Windows\System32 folder (e.g.,
C:\Windows\System32). Next, perform this slightly complicated procedure to
install your NED Image Printer.
Unfortunately, the output image of the free NED is always written to one
particular file, NEDIP.TIF, lying within the C:\Windows\System32 folder.
You must copy it to your own folder and rename it (or transfer its image
contents) before it gets overwritten by the next NED printing operation!
MODIW, if you can get it in place, has no such peculiar problem and is
rather a fantastic work of art! PDFill, on the other hand, is rather easy
to install: you just need to double-click the downloaded setup file and
agree to whatever it asks.
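Since NED keeps overwriting the same NEDIP.TIF file, the copy-and-rename
step may be automated. The following is only a minimal Python sketch of
that idea: the function name rescue_ned_output and its parameters are
hypothetical names chosen for illustration, and the source path is simply
the location mentioned above, which may differ on your PC.

```python
import shutil
import time
from pathlib import Path

# Location of NED's fixed output file as described in the text.
# Adjust this if your Windows folder is somewhere else.
NED_OUTPUT = Path(r"C:\Windows\System32\NEDIP.TIF")


def rescue_ned_output(dest_dir, base_name, source=NED_OUTPUT):
    """Copy NED's output into dest_dir under a unique, timestamped name.

    The function name and parameters are hypothetical; they are not part
    of NED itself.
    """
    source = Path(source)
    if not source.is_file():
        raise FileNotFoundError(f"No NED output found at {source}")

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest_dir / f"{base_name}-{stamp}.tif"
    counter = 1
    while target.exists():  # never overwrite an earlier rescued copy
        target = dest_dir / f"{base_name}-{stamp}-{counter}.tif"
        counter += 1

    shutil.copy2(source, target)  # the original NEDIP.TIF is left in place
    return target
```

Run it once after every NED printing operation, e.g.
rescue_ned_output(r"D:\MyImages", "assamese-article"), where the folder
and the base name are again just examples.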
Image printer drivers generally offer options on which the beauty and the
data size of the output image depend (unfortunately, the larger the data
size, the more beautiful the image!). To access these options for NED,
click Properties within the Print dialogue box. In the ensuing 2nd box you
may choose the Black & White option (fine for text-containing images) or
the Color option; then click Advanced and, in the 3rd dialogue box, choose
the Print Quality (300 x 300 dpi, i.e., fine, or 100 x 100 dpi, i.e.,
ordinary, etc.). Click OK after you've chosen your desired options. Using
any of these image printers, we can have either a one-page image (if not
.gif, convertible to .gif by using IrfanView) corresponding to one page of
the document (this one-page image can be put directly into your webpage)
or a multi-page image in the TIFF (.tif) format. A multi-page image is
obviously useless for direct insertion into a webpage; rather, it is the
precursor for forming a (multi-page) PDF document by using a PDF printer
driver such as PDFill or PrimoPDF (another wonderful, must-have freeware,
downloadable from CNET). Such PDF files may be kept within your website
and linked by hyperlinks from your homepage or other webpages.
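To get a feel for why the Print Quality setting matters so much, here is a
rough Python calculation. It assumes an A4 page (about 8.27 x 11.69
inches) and counts raw, uncompressed pixels only; actual .tif files are
usually compressed, so take the byte figures as upper-end estimates rather
than what NED will really produce. The function names are made up for
this sketch.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Width and height in pixels of a page printed at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate raw image size, ignoring compression and row padding."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bits_per_pixel // 8


A4 = (8.27, 11.69)  # assumed page size in inches

for dpi in (100, 300):
    w, h = pixel_dimensions(*A4, dpi)
    bw = uncompressed_bytes(*A4, dpi, bits_per_pixel=1)       # Black & White
    colour = uncompressed_bytes(*A4, dpi, bits_per_pixel=24)  # Color
    print(f"{dpi} dpi: {w} x {h} pixels, "
          f"B&W about {bw // 1024} KB, Color about {colour // 1024} KB")
```

Tripling the dpi multiplies the number of pixels by nine, which is why the
fine setting costs so much more data than the ordinary one.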
If a PDF document is formed from a (multi-page) image instead of from the
original word-processor document, it won't require fonts unavailable on
the viewer's computer, and so will always be displayed correctly. However,
as an image-containing document, it'll be somewhat bulkier than a PDF file
produced directly from the original word-processor document; this price
must be borne to communicate your non-English writings to readers of your
own language. For some word-processors such as Adobe PageMaker, however,
there is a way to get around this difficulty. In the second part, we'll
discuss generating PDF files in three different ways.
Part II
How to Generate the PDF Files in a Manner Suitable for My Website?
PDF printer driver software, whether PDFill or PrimoPDF, becomes available
as a hypothetical printer, just as was the case with the image printer
drivers (did you notice PrimoPDF showing up as an available printer in the
above figure?). So, to generate a PDF, whether starting from a
word-processor (e.g., AbiWord), a presentation program (e.g., PowerPoint),
a spreadsheet program (e.g., Excel), an Internet browser (e.g., Firefox)
or an image program (e.g., IrfanView), go to print the file and choose the
pages to print, but then choose the PDF printer driver instead of a real
printer. The output will be a PDF file containing the pages you selected
for printing, instead of printed paper pages. In this way, even particular
pages from an existing PDF file can be obtained as another PDF file. And
all this for free as well!
PrimoPDF offers easily accessible options for the quality (and hence the
file size) of the output PDF files, so here we'll discuss it instead of
PDFill. Shortly after you give the (hypothetical) print order, four simple
options (Screen, Print, eBook and Prepress) are shown, as in the figure
below. For inclusion in websites, the Screen option, which yields the
lowest quality and the smallest size, generally works. If better quality
is sought, the next option, eBook, may be tried (you may inspect one
output and, if it doesn't satisfy you, form the PDF file again).
The Options Offered by PrimoPDF
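The four option names closely resemble the quality presets of the free
Ghostscript program (/screen, /ebook, /printer and /prepress, given
through its -dPDFSETTINGS switch). Whether PrimoPDF's options correspond
exactly to these presets is my assumption, not something stated in the
dialogue box. With that caveat, the following Python sketch merely builds
a Ghostscript command line for re-saving an existing PDF at a chosen
quality; the function name, the file names and the "gs" executable name
(on Windows it is usually gswin32c or gswin64c) are illustrative.

```python
# ASSUMPTION: PrimoPDF's option names are mapped here onto Ghostscript's
# -dPDFSETTINGS presets purely because the names look alike.
PRESETS = {
    "Screen": "/screen",      # lowest quality, smallest file
    "eBook": "/ebook",        # medium quality
    "Print": "/printer",      # higher quality
    "Prepress": "/prepress",  # highest quality, largest file
}


def ghostscript_command(src_pdf, dst_pdf, option="Screen", gs_executable="gs"):
    """Return the argument list for re-saving src_pdf at the chosen quality.

    This only builds the command; it does not run Ghostscript.
    """
    if option not in PRESETS:
        raise ValueError(f"Unknown option {option!r}; choose from {list(PRESETS)}")
    return [
        gs_executable,
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={PRESETS[option]}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst_pdf}",
        src_pdf,
    ]


print(" ".join(ghostscript_command("big.pdf", "small.pdf", option="Screen")))
```

If Ghostscript is installed, the returned list can be handed to
subprocess.run; as with PrimoPDF, judge the output by eye and move up to
the next preset only if the smallest one looks too poor.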
Let us now discuss the two major ways in which PrimoPDF may be used to
form PDF files. For texts written in English, Spanish, etc., the fonts
used are available on all computers, so you may form the PDF directly from
the word-processor document. The PDF then contains actual text, and so has
a rather small file size. This was the case with the world-famous Kyrgyz
novel (in English translation) stored within this website (it's too old a
novel to have copyright restrictions now): though this file contains 39
text pages and a large colourful map image added by me, it is only 618 KB
in size.
It was formed, using PrimoPDF, directly from a Word document into which I
had pasted the novel's (English) text, the author's photograph and the
coloured map of Central Asia. On the other hand, this Assamese discussion
by me on the macronutrients in the Indian diet has only 11 text pages, yet
has a file size of 678 KB. This is because its contents, though they look
like Assamese text, are actually images obtained from the Assamese text by
employing the image printer driver MODIW (at 200 dpi print quality). This
gave a multi-page .tif image file, which was then converted to a PDF file
by using PrimoPDF (with the higher-quality Print option). As the Assamese
font used to write the text was unlikely to be present on every viewer's
computer, the text had to be converted to an image first, and the PDF file
had to be made from that .tif image only.
Something similar was done with the scanned university question papers in
chemistry stored in another site by this author. There, the scanning of
the question papers resulted in a lot of single-page image files, one for
each page of questions (OCR conversion into actual text couldn't be done
successfully for some reason). To combine several page-images into one
file, Word was employed to paste the page-images manually, one by one,
resulting in very large Word documents (e.g., 63 MB for a 30-page
document). These purely image-containing Word documents were next
converted to PDF files, one by one, by employing PrimoPDF with its
(lowest-quality) Screen option, resulting in files of somewhere around
2 MB each (which could, therefore, be meaningfully hosted in the said
website).
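The three examples above can be compared on a per-page basis with a little
arithmetic. The Python sketch below uses only the figures quoted in the
text; the short labels are mine, the 2 MB figure is approximate, and 1 MB
is taken as 1024 KB.

```python
def kb_per_page(size_kb, pages):
    """Average file size per page, in KB."""
    return size_kb / pages


# (label, total size in KB, number of pages), all taken from the text above
examples = [
    ("Novel, real text (direct PDF)", 618, 39),
    ("Assamese article, text as image", 678, 11),
    ("Scanned question paper, Screen option", 2 * 1024, 30),
]

for label, size_kb, pages in examples:
    print(f"{label}: about {kb_per_page(size_kb, pages):.0f} KB per page")

# Shrinkage achieved by converting the 63 MB Word file to a 2 MB PDF
print(f"Word to PDF reduction: about {63 / 2:.0f} times smaller")
```

So a page of genuine text costs roughly 16 KB here (the map image
included), while a page stored as an image costs roughly 60 to 70 KB,
whether it came from an image printer or from a scanner.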
For non-European language texts written in some word-processors such as
Adobe PageMaker, there is a provision to export the text as a PDF file,
and there is also a provision to somehow (God knows how!) embed the
peculiar fonts used to write the text within the generated PDF file. The
text thus remains genuine text, and so stays clearer (i.e., not blurred)
than in PDF files containing text converted into images. Note that here
the conversion of the non-English word-processor document to a PDF
document becomes a single-step one.
The following figure depicts the process used to convert an Assamese
periodical written in Adobe PageMaker into a PDF file. Note the box
against the line Include downloadable fonts, which must remain ticked (as
the text was written using some peculiar Assamese fonts unlikely to be
found on every PC). Note that one needs something called Acrobat Distiller
(I found it lying within a Distillr subfolder of some Acrobat3 folder on
my hard disk's C-drive) for this operation to be successful; you'll need
to show the location of Acrobat Distiller when you first do this
operation.
Adobe PageMaker 6.5 Exporting Text as PDF with the Fonts Embedded
Existing on-paper documents (e.g., old books) may be converted to digital
image files by scanning them. The scanning can be done at high or low
quality, and the file size varies accordingly. Also, if the paper document
was printed in English or certain other languages, the scanned image may
be successfully converted to text by using OCR (optical character
recognition) software, and so a corresponding digital text document (e.g.,
in webpage, Microsoft Word or textual-PDF format) with a low file size may
finally be obtained. This looks like a very stimulating state of affairs,
and indeed this is the way the great worldwide book digitisation projects
such as Project Gutenberg and the Google Books initiative are proceeding.
In the following lesson, you may learn how to successfully perform the
scanning and the OCR operations, using free software all along: