Re: PDF vs Markup Languages

From: Clinton Jones <clinton_at_TTALK.COM>
Date: Tue, 1 Sep 1998 07:04:06 -0400

Trying to critically evaluate the deficiencies of a PDF or ENVOY (hands up
all those who don't know what this is) document, it is easy to see three
problems. The first is indexing. The second is its relative keyword
unsearchability if it hasn't been produced in a page mill type environment.
The third is the risk of totally unstructured librarianship. I think at this
point it becomes obvious that even the best planned and designed HTML or
XML documents are potential problems too. It would be reasonable to assume
that an archival or library environment would need three things to be in
place.

Development of a standard document format, covering both text and images
Development of a comprehensive index
Cross-platform independence

Standard document format implies that, as with paper vs video vs film vs
diskette vs ...., the 'media' must have a standard. PDF in itself is a
standard, but then so is WPD or EVY, XLS, DOC or WK1. DOC is a little
more generic and could be text, Word or RTF, so maybe DOC is not such a
standard. PDF, WPD and WK1 pretty much are standard: if you have the
viewer/plug-in/application, you can view it. The images within the document
should ideally be embedded, not sourced from an external location. This is
where HTML collapses: it is contextual, and you can't embed the images
within the document unless they are UUENCODED in some way. This may change,
but for the present it is a problem. Images and their significance shouldn't
be underestimated. A biological study of cell structures might be very
reliant upon the visual appearance of those structures, and an image that is
absent from the context of the document may make the document worthless.
So adopt a standard that matches as many criteria as possible.
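
To illustrate what "encoded in some way" could look like, here is a minimal
sketch in Python. It assumes base64 and a data: URI rather than uuencode
(my substitution, not something named above), and the function name and the
placeholder bytes are made up for the example. Whether a given viewer will
actually render such a tag is a separate question.

```python
import base64


def inline_image_tag(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Return an <img> tag that carries the image data inside the HTML itself.

    Hypothetical helper: the bytes are base64-encoded and placed in a
    data: URI, so no external image file is referenced.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{encoded}">'


# Placeholder bytes standing in for the contents of a real image file.
fake_image = b"\x89PNG\r\n\x1a\n"
tag = inline_image_tag(fake_image)
print(tag)
```

The cost is size: base64 makes the encoded image roughly a third larger
than the original file, and the document is no longer readable as plain
text around it.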

A comprehensive index can't be created in a PDF document unless the
document is developed natively, specifically for an indexable PDF
environment. If I were to take a journal from the 1900s I would have to
physically create the index by hand, a painstaking task that would cost time
and money. I could, however, create an index and keywords integrally to
HTML as meta keywords, hidden in the document head where the browser does
not display them. I would be far better off stuffing my documents in a
proper multi-object database though, where a librarian or archivist would
create the indices and keywords relevant to the document. The document
itself could in effect be any generally accepted format. But finding it
would be reliant not upon the document itself but on the correct methods
being used by the archivist or librarian. Just as that journal from the
1900s can't be located if it is misfiled, a badly indexed document won't be
easily retrieved, irrespective of whether it is in HTML or PDF or any other
format.
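
As a rough sketch of the two approaches in this paragraph: the Python below
reads meta keywords out of an HTML page, and keeps a small catalogue in
which the keywords live beside the document rather than inside it. The
table layout, function names and file names are all invented for
illustration, and an in-memory SQLite database stands in for whatever
"proper" database an archive would really use.

```python
import sqlite3
from html.parser import HTMLParser


class MetaKeywords(HTMLParser):
    """Collect the terms listed in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def html_keywords(html_text):
    """Return the meta keywords found in an HTML string."""
    parser = MetaKeywords()
    parser.feed(html_text)
    return parser.keywords


def build_catalogue():
    """Create an empty catalogue (hypothetical two-table schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, path TEXT, format TEXT)")
    db.execute("CREATE TABLE keyword (doc_id INTEGER REFERENCES document(id), term TEXT)")
    return db


def add_document(db, path, fmt, keywords):
    """File a document of any format under the keywords the archivist chose."""
    cur = db.execute("INSERT INTO document (path, format) VALUES (?, ?)", (path, fmt))
    db.executemany(
        "INSERT INTO keyword (doc_id, term) VALUES (?, ?)",
        [(cur.lastrowid, k.lower()) for k in keywords],
    )


def find(db, term):
    """Return the paths of all documents filed under a keyword."""
    rows = db.execute(
        "SELECT d.path FROM document d JOIN keyword k ON k.doc_id = d.id "
        "WHERE k.term = ? ORDER BY d.path",
        (term.lower(),),
    )
    return [r[0] for r in rows]


page = '<html><head><meta name="keywords" content="Cell structure, Biology"></head></html>'
db = build_catalogue()
add_document(db, "study.html", "HTML", html_keywords(page))         # keywords from the page
add_document(db, "journal-1900s.pdf", "PDF", ["biology", "journal"])  # keywords typed in by hand
print(find(db, "biology"))
```

Both documents come back from the same search, whatever their format. The
search is only as good as the keywords that were filed, which is the
misfiling point made above.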

The most important and very topical issue is cross-platform independence.
Macintosh, Amstrad, DOS and WINTEL users should all be able to view and
search the content with ease. A standard interaction platform is critical
irrespective of whether you have ten documents or ten thousand. If you
choose a standard outside of the two cited (PDF and HTML) you run the risk
of not being able to share the document. Although the bandwidth of internet
resources will grow, so will the traffic and the demands users make on the
flexibility and performance of the tools that they use.

In a phrase, I believe that PDF is the solution to the document in itself,
provided it becomes an open standard. HTML, SGML and XML have
inherent contextual deficiencies which make single file management
impossible. PDF will only be useful for the future if it is combined with a
proper relational database support structure that is archived and indexed
using good librarianship practices.
International +27 (82) 6533776
In the US 888 639 4954 ext 206
ICQ : 9937735

You must become the change you want to see in the world
 Aboriginal saying :Charles Handy, The Hungry Spirit, p.103
Received on Tue Aug 25 1998 - 19:17:43 BST
