Revision: 1.0
Search is the primary tool for finding information whose location is unknown, or when it is not even clear whether the information can be found at all. It serves as an auxiliary tool when the table of contents does not include the anticipated section, or when an anticipated accessory item, such as a table or figure, is not listed in its index.
The challenge of implementing effective search lies in overcoming two problems. False positives clutter the search results with irrelevant finds that merely look like the information the user is looking for. False negatives are valuable pieces of information that remain undiscovered in response to a legitimate user query.
Detecting false positives is much easier than detecting false negatives. The user can judge whether a reference leads to valuable information, because each reference in the search results should point to a spot in the documentation that explains or describes a feature of the documented product. If the suggested passage tells nothing about the product, the reference is not useful and ought to be excluded from the search results.
Users become aware of false negatives only when they use search to revisit an explanation they have encountered before.
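The asymmetry can be illustrated with plain set arithmetic. The sketch below is hypothetical: it assumes that each reference is identified by a string such as a section anchor, and that the full set of relevant references is known, which in practice is true only for the author. The user sees the returned set and can spot false positives in it; false negatives lie in the relevant set, which the user never sees.

```python
def classify_results(returned: set[str], relevant: set[str]) -> tuple[set[str], set[str]]:
    """Split the shortcomings of one search into false positives and false negatives.

    returned: references the search engine produced for a query.
    relevant: references that actually explain the queried feature
              (hypothetical; in practice only the author knows this set).
    """
    false_positives = returned - relevant  # shown to the user, but useless
    false_negatives = relevant - returned  # useful, but never shown
    return false_positives, false_negatives


# Hypothetical anchors returned for a query such as "proxy".
returned = {"install#proxy", "changelog#1.2", "faq#proxy-errors"}
relevant = {"install#proxy", "faq#proxy-errors", "config#network"}
fp, fn = classify_results(returned, relevant)
print(sorted(fp))  # ['changelog#1.2']
print(sorted(fn))  # ['config#network']
```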
Content and Navigation
Regardless of output format, documentation consists of two key components: content and navigation. Content is the text and accessories that users read in order to understand the product. Navigation enables users to locate valuable segments of content.
Content and navigation complement each other. Their bond is determined by the purpose of documentation, which is to explain the product, so both components serve this very purpose and differ only in the role they play. Search, in turn, complements the rest of the navigation system.
Methods of Navigation
A navigation system is sufficient when it offers three essential methods: navigation by structural units, navigation by accessories, and navigation by search.
For navigation by structural units, the documentation is presented as a hierarchical system of sections and subsections. By themselves, these elements only point to contexts in the content. The user first locates the document context, then selects a section context within it, and finally chooses the context of a subsection within that section.
Navigation by accessories provides indices of objects of the same type, such as tables, diagrams, or code samples. The glossary of terms also belongs to this method. Accessories are listed in a flat structure arranged in a specific order, most commonly alphabetical. Depending on the target audience and the product itself, the documentation might feature additional specialized accessories, such as code samples or demos, while more traditional accessories, such as tables or figures, may be omitted.
Search is the third essential navigation method. It complements navigation by structural units and navigation by accessories by giving direct access to information that matches the user's free-form query. The search results may contain references to sections or subsections as well as to tables, figures, and terms defined in the glossary. Search may also cover product-related terms that are only loosely explained, or merely mentioned, in a context valuable to users.
Support for synonyms in queries makes search even more powerful. The user may enter either a term from the target audience's domain or its product-specific variant. The search engine accepts either of them but returns only a reference to the product-specific definition.
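A synonym table is one way to implement this behavior. The following sketch assumes a hypothetical table that maps audience-domain terms to a single product-specific term, and a hypothetical index of definitions; whichever variant the user enters, only the reference to the product-specific definition comes back.

```python
from typing import Optional

# Hypothetical synonym table: audience-domain term -> product-specific term.
SYNONYMS = {
    "login": "sign-in",
    "log in": "sign-in",
    "logon": "sign-in",
}

# Hypothetical definitions: product-specific term -> reference to its definition.
DEFINITIONS = {
    "sign-in": "glossary#sign-in",
}


def resolve(query: str) -> Optional[str]:
    """Accept an audience term or its product-specific variant; return one reference."""
    term = " ".join(query.lower().split())  # normalize case and spacing
    canonical = SYNONYMS.get(term, term)    # unknown terms are kept as they are
    return DEFINITIONS.get(canonical)       # None when nothing is defined


print(resolve("Log  in"))   # glossary#sign-in
print(resolve("sign-in"))   # glossary#sign-in
print(resolve("password"))  # None
```

Keeping the product-specific term as the single target means the synonyms never produce references of their own, so adding a synonym cannot introduce duplicate results.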
Looking for the Best Search Engine
The freedom to choose a custom search engine is available only when the documentation is written in a plain text documentation format. The documentation generation system takes manuscripts (the author's files, whose content adheres to the chosen format) and builds the output from them.
Structural units and accessories have unambiguous markers in manuscripts, which help authors write well-structured text and enable documentation generation systems to construct the output. In the output, the syntax constructs of the manuscript are converted into the conventional representation of text elements. Navigation by structural units and by accessories appears in the output as the result of analyzing the related syntax constructs.
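To illustrate how such navigation can be derived from syntax constructs, the sketch below builds a table of contents from heading markers. It assumes Markdown-style headings (one to six "#" characters followed by a space); the function name is hypothetical, and for brevity it does not skip code blocks, whose lines might also start with "#".

```python
import re

# Assumption: a heading is 1-6 "#" characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def table_of_contents(manuscript: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs found by analyzing the heading markers."""
    toc = []
    for line in manuscript.splitlines():
        match = HEADING.match(line)
        if match:
            # The number of "#" characters gives the nesting level.
            toc.append((len(match.group(1)), match.group(2)))
    return toc


manuscript = "# Setup\nSome text.\n## Proxy\nMore text.\n# Usage\n"
for level, title in table_of_contents(manuscript):
    print("  " * (level - 1) + title)  # indent subsections under their sections
```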
Good candidates for search queries, however, are less conspicuous, and a practical definition of the best search engine is elusive. If the best search engine is one that finds, in a fraction of a second, every document context containing the characters of the user's query, the search results are likely to include an overwhelming number of irrelevant finds.
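The effect is easy to reproduce. In the hypothetical sketch below, a naive engine returns every context whose text contains the characters of the query. For the query "port", only one of the four matches is about network ports; the rest are false positives produced by words such as "export", "support", and "important".

```python
# Hypothetical document contexts: anchor -> text.
CONTEXTS = {
    "ports": "The server listens on port 8080 by default.",
    "reports": "Export a usage report as CSV.",
    "support": "Contact support if the upgrade fails.",
    "release-notes": "This release is important for all users.",
}


def substring_search(query: str) -> list[str]:
    """Naive engine: every context whose text contains the query characters."""
    q = query.lower()
    return [anchor for anchor, text in CONTEXTS.items() if q in text.lower()]


print(substring_search("port"))  # ['ports', 'reports', 'support', 'release-notes']
```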
If, on the other hand, the best search engine is one that matches only section headings or glossary terms, users will miss valuable information about the product simply because it is described elsewhere in the text or because the query uses a synonym.
The best search engine for documentation finds all valuable information about the product and excludes all false positives. The user may construct a query that does not appear anywhere in the text but is still legitimate: a query with typos, synonyms, incomplete terms, and so on. This leaves only one option for effective navigation by search: the author should explicitly mark search targets in the manuscript.
Grokking Search
Search is the most complex navigation method. While the other navigation methods obtain enough information from the content alone to build their references, search interfaces with both the user and the content.
User input cannot be predicted or restricted. For this reason, search must be forgiving: it must provide reasonable results even when the user enters terms with typos or when the word order does not match the usage in the content. Search must also process queries with synonyms successfully.
Queries with trivial input, that is, queries that contain no product terms in any form and no synonyms of product terminology, cannot produce useful results. In this case, the best option for search is either to offer references to one or more of the other navigation methods or to provide instructions on how to search more effectively.
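A minimal sketch of this fallback, assuming a hypothetical vocabulary of indexed product terms and synonyms and a hypothetical fallback message: a query that contains none of the known terms is treated as trivial and answered with pointers to the other navigation methods. For brevity, it matches whole words exactly and does not attempt typo correction.

```python
import re

# Hypothetical vocabulary: product terms and synonyms the author has indexed.
VOCABULARY = {"proxy", "timeout", "sign-in", "login"}

# Hypothetical fallback that points to the other navigation methods.
FALLBACK = (
    "Your query contains no documented terms. "
    "Try the table of contents or the glossary, or search for a product term."
)


def recognized_terms(query: str) -> list[str]:
    """Return the query words (letters, digits, hyphens) found in the vocabulary."""
    words = re.findall(r"[\w-]+", query.lower())
    return [word for word in words if word in VOCABULARY]


def respond(query: str) -> str:
    terms = recognized_terms(query)
    if not terms:
        # Trivial input: nothing to look up, so suggest other navigation methods.
        return FALLBACK
    return "Searching for: " + ", ".join(terms)


print(respond("How does this work?"))  # prints the fallback message
print(respond("Proxy timeout?"))       # Searching for: proxy, timeout
```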
The quality of the interface with the content is crucial. To implement this interface, the author associates a sequence of terms and synonyms with relevant contexts: sections, subsections, topics, or standalone definitions and explanations. Typos and incorrect data need not be part of these associations: the search engine must detect such problems and either recover the correct terms or discard the query.
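The sketch below shows one possible way to make such a lookup forgiving, assuming a hypothetical set of author-written associations between terms (or synonyms) and contexts. Word order is neutralized by sorting the words of both the query and the indexed terms, and typos are tolerated with `difflib.get_close_matches` from the Python standard library; when no indexed term is similar enough, the query is discarded. The cutoff of 0.8 is an arbitrary illustrative value.

```python
import difflib
from typing import Optional

# Hypothetical associations written by the author: term or synonym -> context.
ASSOCIATIONS = {
    "proxy settings": "config#proxy",
    "network proxy": "config#proxy",
    "connection timeout": "config#timeouts",
    "sign-in": "auth#sign-in",
    "login": "auth#sign-in",
}


def normalize(text: str) -> str:
    """Lower-case the text and sort its words so that word order does not matter."""
    return " ".join(sorted(text.lower().split()))


# Lookup table keyed by the normalized form of every associated term.
INDEX = {normalize(term): context for term, context in ASSOCIATIONS.items()}


def lookup(query: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the associated context, or None to signal that the query is discarded."""
    key = normalize(query)
    if key in INDEX:
        return INDEX[key]
    # Typo tolerance: accept the most similar indexed term if it scores >= cutoff.
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
    return INDEX[close[0]] if close else None


print(lookup("Settings proxy"))    # config#proxy (word order ignored)
print(lookup("proxy setings"))     # config#proxy (typo tolerated)
print(lookup("weather forecast"))  # None (query discarded)
```

Sorting words is a deliberately simple choice: a typo in the first letter of a word can change the sort order and lower the similarity score, so a real engine would likely need a more robust matching strategy.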
It turns out that the workflow in which the author associates segments of content with terms users are likely to search for is the typical routine of building an alphabetic index. The alphabetic index serves the same purpose as search; the two differ only in how they interface with the user. The alphabetic index presents all possible queries in a single list, whereas search on the documentation site offers a more compact, interactive component.
Mature plain text formats provide dedicated syntax constructs for building the alphabetic index. When search is based on the alphabetic index, it gains the advantage the other navigation methods already enjoy: it retrieves information from the content by using unambiguous data.
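The following sketch shows how a single set of author-marked index terms can feed both the alphabetic index and search. The marker syntax `[[index: term]]` is hypothetical and only loosely in the spirit of the index-term markup found in mature plain text formats; sections are assumed to start with a line beginning with "# ", and all function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical manuscript syntax:
#   "# Title"          starts a section
#   "[[index: term]]"  marks a search target inside the text
SECTION = re.compile(r"^#\s+(.+?)\s*$")
INDEX_TERM = re.compile(r"\[\[index:\s*(.+?)\s*\]\]")


def build_index(manuscript: str) -> dict[str, list[str]]:
    """Map every author-marked term to the sections where it is marked."""
    index = defaultdict(list)
    section = "(untitled)"  # used for terms marked before the first heading
    for line in manuscript.splitlines():
        heading = SECTION.match(line)
        if heading:
            section = heading.group(1)
        for term in INDEX_TERM.findall(line):
            sections = index[term.lower()]
            if section not in sections:  # list each section once per term
                sections.append(section)
    return dict(index)


def alphabetic_index(index: dict[str, list[str]]) -> list[str]:
    """Render the index as the reader-facing alphabetic list."""
    return [f"{term}: {', '.join(sections)}" for term, sections in sorted(index.items())]


def search(index: dict[str, list[str]], query: str) -> list[str]:
    """Answer a query from the same data; an unmarked term yields no results."""
    return index.get(" ".join(query.lower().split()), [])


manuscript = """# Installation
[[index: proxy]] Set the proxy before installing.
# Configuration
[[index: Proxy]] [[index: timeout]] Tune the proxy and the timeout.
"""
index = build_index(manuscript)
print(alphabetic_index(index))  # ['proxy: Installation, Configuration', 'timeout: Configuration']
print(search(index, "Proxy"))   # ['Installation', 'Configuration']
```

Because both components read the same data, a term the author marks once appears in the alphabetic index and becomes findable by search, and search results are limited to contexts the author explicitly marked.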