Back Words Indexing 

  Back-of-the-Book Indexes
     for Publishers and Authors

      "You're going to love the way your book ends!"






Martha Osgood
Back Words Indexing

Since 1996

author of the index in
Inside Indexing:
the Decisionmaking Process

by Sherry Smith & Kari Kells





Indexing ultimately organizes "aboutness" for quick recall. The computer and its software assist, but the human mind alone can speak to the concept of "aboutness". If a term or concept is not specifically articulated on a page, a computer cannot choose it for the index, nor can a search engine find references to it. Neither can the computer reword the entry in a form that aids readers who are unfamiliar with the author's thrust. A paragraph or discussion can be "about" a topic without specifically using those words.


   Indexing software is a tremendous aid to the professional indexer, but it by no means creates indexes "automatically," any more than a spelling or grammar checker can edit a text on its own. Beware of vendors who claim that the services of a professional indexer can be replaced by running a software program on the text of a book. The intellectual and analytical work of indexing is the task of the human brain, and no software program can duplicate it.

   Indexing programs available to professional indexers can help the indexer to produce, sort, and manipulate entries; establish subheading sequences; restyle and amend entries; and keep track of what has been indexed where. On the other hand, the indexing add-ons included with word processors and DTP programs are usually far less efficient as aids to creating a high-quality index.

For example, a discussion of passwords, keys, and locks may really be about security issues and encryption. The words "security" and "encryption" may not appear on that page, yet the index should direct the reader to that page under those headings. It takes a human mind to draw these conclusions.
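The passwords-and-locks example can be tried out with a few lines of Python. This is only a minimal sketch: the page numbers and page text are invented for illustration, and `pages_mentioning` is a hypothetical helper, not part of any real indexing product. It shows a literal word search finding "security" only where the word is printed, while an entry chosen by a human reader covers every page that is about it.

```python
import re

# Hypothetical page text, invented for this illustration.
pages = {
    12: "Choose a long password and never write it down.",
    13: "Keep the key to the server room locked in the office safe.",
    14: "Our security policy requires encryption of all backups.",
}

def pages_mentioning(word, pages):
    """Return the page numbers whose text contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sorted(num for num, text in pages.items() if pattern.search(text))

# What a literal search finds: only the page that prints the word.
literal_hits = pages_mentioning("security", pages)

# What a human indexer might record (an assumption made for this example):
# all three pages are "about" security, whatever words they happen to use.
human_entry = {"security": [12, 13, 14]}

print("literal search:", literal_hits)
print("human entry:   ", human_entry["security"])
```

Pages 12 and 13 never use the word, so the search has nothing to match; the gap between the two results is the "aboutness" that only a reader can supply.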

Furthermore, a computer cannot:

  • determine relationships among words and concepts, and therefore cannot place subentries, synonyms and cross-references properly

  • decide what is and is not a relevant reference -- it can only sort the terms that appear in a document according to certain preprogrammed patterns

  • recognize concepts which are discussed over a range of pages

  • limit the search to relevant entries (vs. every occurrence of a word)

  • function when a word is misspelled (for example, search Google for the word "backwords" and notice how often it is used where the word "backwards" is meant)

  • consider how terms develop varied meanings -- for example, a "key" on pianos, for computers, to unlock doors, to unlock puzzles, for security, or as a geographic feature, as in Key West. At the same time, a computer is unable to recognize an author's use of multiple terms to indicate one concept: for example, in the computer manual field, 'application', 'software', and 'program' are often used interchangeably
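The last point in the list above is easy to see in a bare-bones concordance builder, the kind of word-matching pass that is sometimes mistaken for indexing. The sketch below is an illustration only: the sample pages are invented, and `build_concordance` is a hypothetical function written for this example, not the method of any named product.

```python
import re
from collections import defaultdict

# Hypothetical sample pages, invented for this illustration.
pages = {
    3: "Press any key to launch the application.",
    4: "The program stores the door key code.",
    5: "We flew to Key West to install the software.",
}

def build_concordance(pages):
    """Map every word to the pages it appears on, with no analysis of meaning."""
    concordance = defaultdict(list)
    for num in sorted(pages):
        # set() so a word repeated on one page is listed only once
        for word in set(re.findall(r"[a-z]+", pages[num].lower())):
            concordance[word].append(num)
    return dict(concordance)

concordance = build_concordance(pages)

# Three unrelated senses of "key" end up under one heading...
print("key:", concordance["key"])

# ...while one concept is scattered across three headings.
for term in ("application", "program", "software"):
    print(term + ":", concordance[term])
```

A keyboard key, a door key, and Key West are lumped together, and the three interchangeable terms each get their own entry. Sorting is all the program has done; nothing has been brought together or kept apart on the basis of meaning.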

Thus, without a human being to analyze content and context, automation in either a search process or in creating an index falls short of effectively bringing together relevant topics while avoiding the unrelated. Yes, a computer can think—just like a plane can fly or a car can drive.

For more on why a computer can't index a book, visit http://www.indexers.org.uk/index.php?id=463
For more specifics as to the limits of a full text search, visit www.jalamb.f9.co.uk/Full_text_searches.html


If a computer can be programmed to beat a human at chess, why can't it write an index?

"Given any configuration of chess pieces, there are a finite number of sequences possible. It may be a huge number, but it's finite and (more important) definable. That is, with enough computing power and memory, it's possible for a computer to evaluate every possible sequence of moves to a win or loss, and to make the best move possible. That's not at all true of indexing or any other task that requires true understanding." (David Billick)

There are programs that are sometimes called indexing software, but they may in reality be search engines, or concordance builders, or text mining software. The idea of automatic indexing is different from the computer-assisted indexing that professional back-of-the-book indexers use.

Some systems can be adequate for a specific implementation. NStein, Inxight, Autonomy, Convera, Applied Semantics, Sonar Bookends, and/or Entriev are based on automatically extracting concepts from texts in such diverse applications as indexing public records and processing accounts receivable for trucking firms, but the results are not adequate for creating back-of-the-book indexes.

Even "text-mining" software has problems: "How well computers truly make sense of what they are reading is, of course, highly questionable, and most of those who use text-mining software say that it works best when guided by smart people with knowledge of the particular subject." (New York Times, 10/16/2003) These articles are also useful: http://www.intranetjournal.com/features/humanindex-1.shtml

The difference that the quality of an index (its usability, flexibility, and integrity) makes to the reader, and to the number of users who call your Help Line, can be enormous.

My guess is it will be about 300 years until computers are as good as, say, your local reference librarian in doing a search.

~ Craig Silverstein, Google's director of technology

What is "automatic indexing"?

In The Emperor's New Mind, written in 1989, Roger Penrose discusses why artificial intelligence (AI) will, in his opinion, *never* be able to "understand" what information is truly "about."

Martin Tulic: Wellisch's "Glossary of Terminology in Abstracting, Classification, Indexing, and Thesaurus Construction" defines Automatic Indexing as: "Any [indexing] method by which the [text] of a [documentary unit] is subjected to algorithmic operations in order to extract [terms] or [phrases] that represent [subject], [topics], or [features] of the documentary unit," where square brackets mark terms defined elsewhere in the glossary. By this definition, Automatic Indexing has indisputably been at the center of information retrieval systems ever since people realized the problems inherent in KWIC, KWOC, and similar simple algorithms.
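For readers who have not met KWIC (Key Word In Context), here is a minimal sketch of the general idea: every significant word of a title becomes an entry, shown with the words around it. The stop-word list, the sample titles, and the `kwic` function are all assumptions made for this illustration; they do not reproduce any particular historical system.

```python
# Hypothetical stop-word list; real KWIC systems used their own lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

def kwic(titles, stop_words=STOP_WORDS):
    """Return (keyword, words_before, words_after) tuples sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.lower().strip(".,;:!?\"'")
            if keyword and keyword not in stop_words:
                before = " ".join(words[:i])
                after = " ".join(words[i + 1:])
                entries.append((keyword, before, after))
    return sorted(entries)

# Sample titles invented for this illustration.
titles = ["The Key to the Door", "Piano Key Repair"]

for keyword, before, after in kwic(titles):
    print(f"{before:>16} | {keyword.upper()} | {after}")
```

Every non-stop word is treated as equally worth an entry, and the two senses of "key" file side by side. The only "selection" is the stop-word list, which is the kind of limitation the paragraph above refers to.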

One major difference between the way human indexers work and the way today's automatic indexing systems work is that humans select terms, then arrange and edit them, whereas today's automatic indexing systems select terms, compare them to a thesaurus, and compile their index based on the thesaurus.
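The select, compare, and compile sequence described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of how any commercial system works: the thesaurus entries, the sample pages, and the `thesaurus_index` function are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical thesaurus: each variant term maps to one preferred heading.
THESAURUS = {
    "application": "software",
    "program": "software",
    "software": "software",
    "password": "passwords",
    "passwords": "passwords",
}

def thesaurus_index(pages, thesaurus=THESAURUS):
    """Select words from the text, look each up in the thesaurus,
    and compile page references under the preferred headings."""
    index = defaultdict(set)
    for num, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            heading = thesaurus.get(word)
            if heading is not None:
                index[heading].add(num)
    return {heading: sorted(nums) for heading, nums in sorted(index.items())}

# Sample pages invented for this illustration.
pages = {
    3: "Launch the application.",
    4: "The program asks for a password.",
    5: "Locks and keys protect the server room.",
}

print(thesaurus_index(pages))
```

The thesaurus does gather "application" and "program" under one heading, but only because someone listed them in advance. Page 5 contributes nothing, since none of its words are in the thesaurus, even though a human reader would see at once that it belongs with the discussion of passwords.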

Does the indexer really have to read the whole book?

Oh yes. Several times.