
Searching The Net

 

Using the various search tools on the web is easier if you know how they were designed, and especially if you know the specific rules (all too often quite different) for each tool. I have tried to address both needs. I have arranged the search engines and catalogs in order of usefulness, provided a link to each one in the title of its section, and then spelled out, in short form, the rules for using them. I have provided some short examples, but for detailed examples consult the help documentation available at each site. The goal of this article is to help the new user get the most useful "hits" from the various tools. At the end I have placed a handy little table that summarizes certain common characteristics of the search engines, followed by some general search tips and cross-references to other useful articles.


AltaVista

AltaVista is the premier search engine on the web. It has the largest, most inclusive indices. That does not mean it is the only one you need, or that it is the best one to use in every situation. Because the search engines use different robot and indexing strategies, they return different results for the same search. AltaVista returns consistently useful information, but since no editorial decisions are made about content, it also has the largest "noise to signal" ratio.

AltaVista allows searching of both the web and many Usenet newsgroups. It lets you display result lists in standard, compact, or detailed format. It provides both simple and advanced searches. Advanced searches include all the features of simple ones, and also allow boolean and proximity operators, grouping of terms with parentheses, and results ranking by keyword.

Simple Searches

For an effective search, enter as many search terms or phrases as you can that precisely describe the subject you are interested in. The more exact terms you offer, the better the results.

Case sensitivity: Search terms entered in lowercase letters are case insensitive. Capitalized terms (or terms with accented letters) are case sensitive. HotDog finds only the term spelled with exactly that capitalization; hotdog finds all occurrences of the term, regardless of capitalization. López finds only a word spelled exactly that way.
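The case rule above can be written as a small predicate. This is a minimal Python sketch of the rule as stated here, not AltaVista's own code; the name term_matches is hypothetical, and treating any non-ASCII character as an "accented letter" is a simplifying assumption.

```python
def term_matches(query_term: str, indexed_word: str) -> bool:
    """Apply the case rule to one query term and one indexed word."""
    # Any capital letter or any non-ASCII (e.g. accented) letter makes the
    # term case sensitive: it must then match the indexed word exactly.
    sensitive = query_term != query_term.lower() or not query_term.isascii()
    if sensitive:
        return query_term == indexed_word
    # An all-lowercase ASCII term matches regardless of capitalization.
    return query_term == indexed_word.lower()


for word in ["hotdog", "HotDog", "HOTDOG"]:
    print(word, term_matches("hotdog", word), term_matches("HotDog", word))
```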

Phrases: To group search terms into a phrase, enclose them in double quotes. "Abraham Lincoln" finds occurrences of the name Abraham Lincoln, capitalized in just that way. Another way to link words into a phrase is to insert punctuation between them: Abraham;Lincoln;Gettysburg;Address.

Required Terms: To require that a term appear in every document returned, preface it (the formal term is prepend) with a + symbol: +HotDog. There must not be a space between the + and the term.

Prohibited Terms: To exclude documents that contain a term, prepend it with a - symbol: -mustard. To find references to F. Scott Fitzgerald that do not mention Gatsby: +"F. Scott Fitzgerald" -Gatsby.
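Here is a minimal sketch of how + and - terms filter a set of documents, assuming a tiny in-memory list of page texts. The names search, contains, and tokenize are hypothetical, and the sketch ignores AltaVista's case rules by lowercasing everything. When there are no + terms, it requires at least one plain term to appear; that is an assumption about how plain terms combine.

```python
import re

# An optional + or - sign, then either a "quoted phrase" or a bare word.
QUERY_TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def tokenize(text: str) -> list[str]:
    """Lowercased words; this sketch ignores case entirely."""
    return re.findall(r"\w+", text.lower())


def contains(doc_words: list[str], term: str) -> bool:
    """True if the words of `term` appear consecutively in `doc_words`."""
    target = tokenize(term)
    n = len(target)
    return n > 0 and any(doc_words[i:i + n] == target for i in range(len(doc_words) - n + 1))


def search(query: str, documents: list[str]) -> list[str]:
    required, prohibited, optional = [], [], []
    for sign, phrase, word in QUERY_TOKEN.findall(query):
        term = phrase or word
        if sign == "+":
            required.append(term)
        elif sign == "-":
            prohibited.append(term)
        else:
            optional.append(term)

    hits = []
    for doc in documents:
        words = tokenize(doc)
        if not all(contains(words, t) for t in required):
            continue  # a +term is missing
        if any(contains(words, t) for t in prohibited):
            continue  # a -term is present
        if not required and optional and not any(contains(words, t) for t in optional):
            continue  # assumption: without +terms, one plain term must appear
        hits.append(doc)
    return hits


docs = [
    "F. Scott Fitzgerald wrote The Great Gatsby",
    "F. Scott Fitzgerald and Zelda in Paris",
    "Hemingway in Paris",
]
print(search('+"F. Scott Fitzgerald" -Gatsby', docs))
```

Because the sign must touch the term, in a query like + HotDog the lone + is read as an empty plain term and has no operator effect, which mirrors the no-space rule.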

Wildcards: In simple queries you may put a wildcard character at the end of a word, where it substitutes for a combination of letters. The asterisk (*) is AltaVista's wildcard character. For example, butt* finds occurrences of butt, butts, butter, button, etc. The asterisk cannot be used at the beginning or in the middle of a word, and it substitutes for at most 5 additional lowercase letters.
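The wildcard limits can be expressed as a regular expression: the stem, then zero to five lowercase letters. This is a hypothetical Python sketch of the rule as described above, not AltaVista's implementation; it assumes the indexed words are already lowercase.

```python
import re


def wildcard_pattern(term: str) -> re.Pattern:
    """Compile a trailing-* term into a regex allowing 0 to 5 extra lowercase letters."""
    if "*" in term[:-1]:
        raise ValueError("the * is only allowed at the end of a word")
    if not term.endswith("*"):
        return re.compile(re.escape(term))
    stem = term[:-1]
    if not stem:
        raise ValueError("the * cannot stand alone")
    return re.compile(re.escape(stem) + "[a-z]{0,5}")


def expand(term: str, vocabulary: list[str]) -> list[str]:
    pattern = wildcard_pattern(term)
    # fullmatch: the whole indexed word must fit the pattern
    return [w for w in vocabulary if pattern.fullmatch(w)]


words = ["butt", "butts", "butter", "button", "buttonhole", "rebutt"]
print(expand("butt*", words))
```

The word buttonhole is rejected because it needs six letters after the stem, one more than the limit allows.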

Rankings: AltaVista will assign a confidence ranking to the hits it returns based on the following:

These factors are weighted, and the document with the highest confidence rating is given a score of 1.000. All others are given decimal scores less than 1.000, in order of confidence. This does not mean that the document rated 1.000 is the best source; it only means that it best satisfies the ranking algorithm. The "best" source is rarely ranked first unless you know the specific title of the document you are searching for. For example, to find the document "Mr. William Shakespeare and the Internet," a search for that phrase in double quotes will find the exact web page, but entering the search terms separately, or just searching for "shakespeare," will produce too many non-specific hits.
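The 1.000 score is relative, not absolute. A minimal sketch, assuming each hit already has some raw weighted score (the numbers below are made up): dividing every score by the best one always gives the top hit 1.000, however weak that match may actually be.

```python
def normalize_scores(raw_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Scale raw scores so the best document gets 1.000, then sort best first."""
    if not raw_scores:
        return []
    best = max(raw_scores.values())
    if best <= 0:
        raise ValueError("at least one document needs a positive score")
    scaled = {doc: score / best for doc, score in raw_scores.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw weights standing in for whatever factors the engine combines.
for doc, score in normalize_scores({"page-a": 8.0, "page-b": 4.0, "page-c": 2.0}):
    print(f"{score:.3f}  {doc}")
```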

Another way to search for a document with a known title is to enter the keyword title: in the search window, followed by the title in double quotes: title:"Mr. William Shakespeare and the Internet". AltaVista also allows searching within anchors, applets, hosts, images, links, text, and URLs in the same way. The usage is host:palomar.edu, and so on. See the site help pages for more details.
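A hypothetical sketch of field-qualified searching, assuming each page is stored as a record with named parts such as title and host. The record layout and the field_search name are illustrations only, and matching is a simple case-insensitive substring test.

```python
import re

# field:"quoted value" or field:value
FIELD_QUERY = re.compile(r'(\w+):(?:"([^"]+)"|(\S+))')


def field_search(query: str, pages: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return pages whose named fields contain every field:value pair in the query."""
    conditions = [
        (field.lower(), (quoted or bare).lower())
        for field, quoted, bare in FIELD_QUERY.findall(query)
    ]
    return [
        page for page in pages
        if all(value in page.get(field, "").lower() for field, value in conditions)
    ]


pages = [
    {"title": "Mr. William Shakespeare and the Internet", "host": "www.palomar.edu"},
    {"title": "Shakespeare Quotations", "host": "www.example.com"},
]
print(field_search('title:"Mr. William Shakespeare and the Internet"', pages))
print(field_search("host:palomar.edu", pages))
```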

Since AltaVista's indices are built from whole words of text, the most useful advice for searching with it is to be as precise as possible in describing what you are looking for, while excluding things you are not interested in. "Viet Nam" +Saigon -conflict -war will find information on Viet Nam, and on Saigon in particular, without finding information on the conflict.

Advanced Searches

The same rules for capitalization, phrases, wildcards, and required/prohibited terms apply to advanced queries. In addition, boolean searching, proximity operators, and logical grouping with parentheses are allowed. These features are only available if you select an advanced search from the AltaVista main page.

Boolean and Proximity Searching: AltaVista supports the use of the binary operators AND, OR, NEAR and the unary operator NOT. You may use the following symbols in place of the words: & (AND), | (OR), ~ (NEAR), ! (NOT). It is a very good idea to use the words rather than the symbols, since the words are easier to remember and common to other search engines. You may enter the operators in lower or upper case letters, but it is probably best to use uppercase to make them stand out from ordinary search terms and make the logic of the search more apparent. If these words are part of the terms for which you are searching, they must be enclosed in quotes. It is best to group your terms within parentheses to avoid confusion, but this is not required.

Examples:
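To make the operators concrete, here is a minimal recursive-descent evaluator in Python that accepts both the words and the symbols. It is a sketch, not AltaVista's parser: the operator precedence (NEAR, then NOT, then AND, then OR) and the 10-word NEAR window are assumptions, case is ignored, and adjacent terms with no operator between them are rejected.

```python
import re

OPERATOR_SYMBOLS = {"&": "AND", "|": "OR", "!": "NOT", "~": "NEAR"}
OPERATORS = {"AND", "OR", "NOT", "NEAR"}
NEAR_DISTANCE = 10  # assumption: NEAR means "within 10 words"


def tokenize_query(query: str) -> list[str]:
    """Split a query into parentheses, operators, "quoted phrases", and words."""
    tokens = []
    for tok in re.findall(r'"[^"]+"|[()&|!~]|[^\s()&|!~"]+', query):
        word = OPERATOR_SYMBOLS.get(tok, tok.upper())
        # Operators may be typed in either case; quoted words are never operators.
        tokens.append(word if word in OPERATORS else tok)
    return tokens


def words_of(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class BooleanQuery:
    """Recursive-descent evaluator; assumed precedence: NEAR > NOT > AND > OR."""

    def __init__(self, query: str):
        self.tokens = tokenize_query(query)

    def matches(self, text: str) -> bool:
        self.words = words_of(text)
        self.pos = 0
        result = self._or()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _take_term(self) -> str:
        tok = self._peek()
        if tok is None or tok in OPERATORS or tok in ("(", ")"):
            raise ValueError(f"expected a search term, got {tok!r}")
        self.pos += 1
        return tok

    def _positions(self, term: str) -> list[int]:
        target = words_of(term.strip('"'))
        n = len(target)
        return [i for i in range(len(self.words) - n + 1)
                if n and self.words[i:i + n] == target]

    def _or(self) -> bool:
        value = self._and()
        while self._peek() == "OR":
            self.pos += 1
            rhs = self._and()  # parse the right side even if value is already True
            value = value or rhs
        return value

    def _and(self) -> bool:
        value = self._not()
        while self._peek() == "AND":
            self.pos += 1
            rhs = self._not()
            value = value and rhs
        return value

    def _not(self) -> bool:
        if self._peek() == "NOT":
            self.pos += 1
            return not self._not()
        return self._atom()

    def _atom(self) -> bool:
        if self._peek() == "(":
            self.pos += 1
            value = self._or()
            if self._peek() != ")":
                raise ValueError("missing closing parenthesis")
            self.pos += 1
            return value
        left = self._positions(self._take_term())
        if self._peek() == "NEAR":
            self.pos += 1
            right = self._positions(self._take_term())
            return any(abs(i - j) <= NEAR_DISTANCE for i in left for j in right)
        return bool(left)


print(BooleanQuery("(Lincoln OR Douglas) AND debates").matches("the Douglas debates"))
```

Quoting an operator word, as in "and" AND bread, makes it an ordinary search term, which matches the quoting rule above.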

Results Ranking: With advanced searches you may also specify keywords that AltaVista should use to confidence-rank your results. This is a very powerful feature that lets you control which items appear at the top of the hit list. Before submitting the search, type the terms you want AltaVista to weight more heavily in the Results Ranking Criteria box on the advanced search screen. The set of hits returned will not change, but the ones you are probably most interested in will be listed at the top.
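The key point is that ranking criteria reorder the hits without adding or removing any. A minimal sketch, assuming the weight of a page is simply how often the ranking terms occur in it (a stand-in, not AltaVista's formula); rank_by_criteria is a hypothetical name.

```python
import re


def rank_by_criteria(hits: list[str], criteria: list[str]) -> list[str]:
    """Reorder hits so pages mentioning the ranking terms most often come first.

    The set of hits is unchanged; only the order moves.
    """
    wanted = [c.lower() for c in criteria]

    def weight(page: str) -> int:
        words = re.findall(r"\w+", page.lower())
        return sum(words.count(term) for term in wanted)

    # sorted() is stable, so pages with equal weight keep their original order.
    return sorted(hits, key=weight, reverse=True)


hits = [
    "Lincoln biography",
    "Gettysburg battlefield tours",
    "Gettysburg Address text by Lincoln at Gettysburg",
]
print(rank_by_criteria(hits, ["Gettysburg", "Address"]))
```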


Excite

Excite uses a combination of text and subject indices to search either by keyword or by concept. Concept searches, according to the Excite authors, find documents related to the idea behind your search, not just documents that explicitly contain the search terms you enter. On the initial screen you choose how you would like to search by clicking the keyword or concept radio button; concept is the default. You may search web documents, reviews, Usenet newsgroups, or classifieds. Simple and advanced queries are entered in the same search box. There are no separate entry screens for the two types of search, but advanced features like boolean searching and logical grouping are supported. Unlike some other search engines, Excite does not let you switch the hit list among standard, summary, and detailed formats.

As with all search engines, the more descriptive the search terms you enter, the fewer irrelevant hits you will get. Excite does not handle case sensitivity or words grouped into phrases the way AltaVista does. Because of the way its ranking algorithm works, the more times a word is entered in the search window, the higher documents containing that word are ranked: dog dog dog cat will rank dog pages above cat pages, but find both.
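One simple way to get that behavior is to let each repetition of a query word multiply that word's weight. This is a hypothetical scoring rule for illustration only; the article does not describe Excite's actual formula, and weighted_score is an invented name.

```python
import re
from collections import Counter


def weighted_score(query: str, page: str) -> int:
    """Score a page, counting each query word once per time it was typed."""
    query_weights = Counter(query.lower().split())
    page_counts = Counter(re.findall(r"\w+", page.lower()))
    return sum(weight * page_counts[word] for word, weight in query_weights.items())


pages = ["a dog and a cat", "dog dog", "cat cat cat"]
query = "dog dog dog cat"
for page in sorted(pages, key=lambda p: weighted_score(query, p), reverse=True):
    print(weighted_score(query, page), page)
```

Every page still scores above zero, so both dog pages and cat pages are found; the dog pages simply rise to the top.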

Required and prohibited terms work the same way in Excite as in AltaVista. Precede a required term with a + symbol and a prohibited term with a - symbol: +football -rugby -soccer.

Boolean Searching: Excite supports the binary operators AND, OR, and AND NOT and the unary operator NOT. It also supports grouping terms within parentheses to create complex logic. The default Excite keyword search uses an implicit OR; that is, it searches for documents containing ANY of the search terms specified. The Excite authors describe this as a "fuzzy AND": documents containing all the terms are weighted higher, but any one term qualifies. Booleans and grouping allow for more specific results.

Examples:
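A minimal sketch of the implicit OR with "fuzzy AND" ranking described above, assuming a page's score is just the number of distinct query terms it contains (an illustrative stand-in for Excite's weighting; fuzzy_and is a hypothetical name).

```python
import re


def fuzzy_and(query: str, pages: list[str]) -> list[tuple[int, str]]:
    """Implicit OR retrieval, ranked so pages with more distinct query terms come first."""
    terms = set(query.lower().split())
    scored = []
    for page in pages:
        found = terms & set(re.findall(r"\w+", page.lower()))
        if found:  # any single term is enough to qualify (implicit OR)
            scored.append((len(found), page))
    # Pages holding more of the terms rank higher; ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


print(fuzzy_and("football rugby", ["football scores", "rugby and football", "chess"]))
```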

The use of multiple spellings in the same search window can increase the chances of hits:
Dostoyevski Dostoevski Dostoevsky.

Rankings: Excite ranks its hit lists in order of confidence, with a percentage showing how well it thinks each returned document fits the search terms entered. The document at the top of the list will not necessarily be 100%. As you scan the hit list, look for a document that is very close to the one you want, then click the little button next to its confidence rating. The search will be re-performed using search criteria based on the indexing of that particular document, and a new list will be produced with the document you chose rated 100% and the other hits ranked by their similarity to it.
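The "more like this" re-ranking can be sketched with cosine similarity over word counts, a common way to compare documents. The article does not say what measure Excite used, so treat this as an assumption; by construction, the chosen document scores 100% against itself.

```python
import math
import re
from collections import Counter


def vector(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def more_like_this(chosen: str, pages: list[str]) -> list[tuple[int, str]]:
    """Re-rank pages by similarity to the chosen one, as whole percentages."""
    target = vector(chosen)
    ranked = [(round(100 * cosine(target, vector(p))), p) for p in pages]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


pages = ["dostoevsky novels crime punishment", "tolstoy novels war peace", "baseball scores"]
print(more_like_this(pages[0], pages))
```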


Webcrawler

Webcrawler, now sponsored by America Online, is an outstanding search engine very much in the mold of AltaVista. In fact, it is more powerful than AltaVista in implementing advanced features such as the proximity operators NEAR and ADJ. It also includes a catalog of subjects pre-classified (directory services) by editors at GNN. Like Excite, it offers further searching based on search terms preset for each subject in the catalog. (This feature hides behind the Spidey button. [Sometimes I feel silly writing this stuff.]) Finally, like AltaVista, it is so good in its own right, and associated with such a large company, that it can afford to be less gaudily commercial than Excite or Lycos.

Webcrawler touts "natural language searching," so you can enter a search like "highest mountain in the world." It throws out the noise words and does a fuzzy AND search on the rest, weighting pages that contain all the search terms highest, but including pages that contain only one of them. This is the common strategy among the best search engines. Webcrawler differs in that its definition of "noise" words is rather broad. The term "web," for example, is not indexed.
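A sketch of the noise-word step, assuming a small hypothetical stop list; only the entry "web" comes from the article, and the other entries are illustrative guesses rather than Webcrawler's real list.

```python
import re

# Hypothetical stop list; "web" is the one entry the article confirms.
NOISE_WORDS = {"the", "in", "of", "a", "an", "is", "what", "web"}


def natural_language_terms(question: str) -> list[str]:
    """Drop noise words from a natural-language question, keeping word order."""
    words = re.findall(r"\w+", question.lower())
    return [w for w in words if w not in NOISE_WORDS]


print(natural_language_terms("highest mountain in the world"))
```

The remaining terms would then go into a fuzzy AND search, where pages holding all of them rank highest.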

Display Control: On the initial search screen, above the search box, you may select whether you want to see web titles only, or titles and summaries for each hit. You may also select the number of hits per page: 10, 25 or 100. Summary mode will display a brief abstract of the page, its URL, and a numeric version of the confidence ranking.

Confidence Rankings: Next to each hit a little icon that looks something like a June bug larva is displayed. The fuller the larva, the closer the match between the page and the search terms. When summary display is chosen, you may see a numeric version of the confidence ranking, for what it is worth. The confidence rankings seem to be nothing more than a count of the occurrences of the search terms within a particular document.

Phrases: Like AltaVista, you may enter terms you wish considered as a phrase in double quotes. This means the words must appear next to each other in the resulting document. Combined with single, precise search terms this will yield the best results on the first try: Lincoln "Civil War" "Gettysburg Address" Gettysburg.

Boolean and Proximity Searching: Webcrawler allows entry of the operators AND, OR and NOT in the standard search window. Items may also be grouped within parentheses to create complex logic: Simpson NOT (Homer OR Marge OR Lisa OR Bart OR Maggie).

The real strength of Webcrawler's advanced features is in the implementation of its proximity operators. You may use NEAR/n, where n is the number of words apart the two search terms should be: Shakespeare NEAR/5 Internet. If a range is not entered, NEAR will return hits on documents where the words are next to each other, in either order. For controlling the specific order two words must appear next to each other, you may use the ADJ operator: reverse ADJ osmosis. In this example, reverse must precede osmosis.
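The two proximity operators can be sketched as checks on word positions. This assumes "n words apart" means the positions differ by at most n, which may not be exactly how Webcrawler counted; near and adj are hypothetical names.

```python
import re


def word_positions(text: str) -> dict[str, list[int]]:
    positions: dict[str, list[int]] = {}
    for index, word in enumerate(re.findall(r"\w+", text.lower())):
        positions.setdefault(word, []).append(index)
    return positions


def near(text: str, a: str, b: str, n: int = 1) -> bool:
    """a NEAR/n b: the words are at most n positions apart, in either order."""
    pos = word_positions(text)
    return any(abs(i - j) <= n for i in pos.get(a.lower(), []) for j in pos.get(b.lower(), []))


def adj(text: str, a: str, b: str) -> bool:
    """a ADJ b: b immediately follows a."""
    pos = word_positions(text)
    following = {i + 1 for i in pos.get(a.lower(), [])}
    return any(j in following for j in pos.get(b.lower(), []))


print(near("the Internet and Shakespeare", "Shakespeare", "Internet", 5))
print(adj("reverse osmosis filters", "reverse", "osmosis"))
```

With no range given, near defaults to n = 1, so the words must be next to each other but may come in either order; adj additionally fixes the order.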

Webcrawler does not support the use of required/prohibited terms, or wildcard expanders or limiters.

Subject Categories: Another strength of Webcrawler is its implementation of a subject catalog which you may browse. The catalog (and related reviews of web sites) is created by the editors of Global Network Navigator, and is quite good. A feature, similar to Excite's confidence buttons, is the Spidey button which accompanies subject browse mode. By clicking Spidey, Webcrawler will perform a topical search based on search terms for the area of interest pre-entered by the GNN editors. These are called "similarity queries," and are supposed to create optimal results.

On the whole, Webcrawler excels in ease of use and implements some very nice proximity search features, but its indices do not seem to be as extensive as AltaVista's or Lycos's. It offers some unique special features, such as 'search the web backwards' (to see who is linked to your page) and net statistics.


Lycos

Many of us who have used the Internet for a while have a soft spot for Lycos from its Carnegie Mellon days, when it was truly a godsend. Since the explosion of the web, better search engines have appeared, but Lycos is still good and fast, if not as sophisticated as some of the others. It offers both keyword and subject searching (the subject searches are called directory services), as well as a Point rating system that rates web pages. Its strong points are its speed, its ease of use, and the large size of its indices, which often produce usable results by sheer brute force. Its weakest point is that it does not support boolean searching or any of the more sophisticated searches that can be made with AltaVista, Webcrawler, or Excite.

Display Control: To gain any sort of control over your searches in Lycos, you need to click on the "Enhance your search" link on the Lycos front page. You will be taken to a screen which will allow you to:

Changing the type of search from OR to AND will, of course, result in far fewer hits. The option to match 2, 3, 4, 5, 6, or 7 terms allows a degree of fuzzy matching with variant spellings. An example of matching 2 terms, as distinct from AND, would be: Fyodor Dostoevski Dostoyevski. Documents containing any two of the terms will be returned; they need not contain all three.
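Matching k of n terms is a simple count, which also covers the two ends of the choice: k = 1 behaves like OR and k = n like AND. A minimal sketch with made-up page texts; match_at_least is a hypothetical name, not a Lycos feature name.

```python
import re


def match_at_least(terms: list[str], k: int, pages: list[str]) -> list[str]:
    """Keep pages that contain at least k of the given terms."""
    wanted = {t.lower() for t in terms}
    return [p for p in pages if len(wanted & set(re.findall(r"\w+", p.lower()))) >= k]


pages = [
    "Fyodor Dostoevski, novelist",
    "Dostoyevski and Dostoevski are both used",
    "Fyodor Dostoevski (also spelled Dostoyevski)",
    "Dostoyevski bibliography",
]
print(match_at_least(["Fyodor", "Dostoevski", "Dostoyevski"], 2, pages))
```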

Focusing Your Search: You can change (fine tune) the results of your searches by changing the type of matches Lycos considers a success: loose, fair, good, close, and strong. The stronger the match, the fewer sites returned by Lycos.

Inclusion/Exclusion and Rankings: Lycos does not support the required/prohibited term syntax the way AltaVista and Excite do. You may, however, prepend a search term with a - symbol, meaning that the term will not be weighted in ranking the results: dogs -doberman will still get pages with the term doberman, but those pages will not appear at the top of the list. Lycos ranks each search, rating the best fit as 1.000 and all other hits as less than 1.000. As with the other search engines, the site rated 1.000 is rarely the most useful.
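One way to get the behavior described for dogs -doberman is to keep every page that matches the plain terms, but sort pages containing a - term below the rest. This is a hypothetical sketch of that reading, not Lycos's ranking code.

```python
import re


def lycos_style_rank(query: str, pages: list[str]) -> list[str]:
    """Rank pages for the plain terms; pages with a -term stay in the list but sink."""
    terms = query.lower().split()
    positive = [t for t in terms if not t.startswith("-")]
    demoted = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]

    scored = []
    for page in pages:
        words = re.findall(r"\w+", page.lower())
        score = sum(words.count(t) for t in positive)
        if score == 0:
            continue  # the page must contain at least one plain term
        has_demoted = any(t in words for t in demoted)
        scored.append((has_demoted, -score, page))
    # False sorts before True, so pages without a -term come first,
    # then higher scores come first within each group.
    scored.sort(key=lambda item: (item[0], item[1]))
    return [page for _, _, page in scored]


print(lycos_style_rank("dogs -doberman", ["dogs doberman dogs", "dogs and cats", "cats only"]))
```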

Wildcards: To expand a word with a wildcard, add the $ symbol to the end of the word. For example, use gen$ to get genetic, genesis, general, and so on. Lycos also lets you place a period (.) after a word to prohibit its expansion: gene. will get just gene, and not genetics or general.
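The two suffixes can be sketched as a single matching function. The name lycos_term_matches is hypothetical, and matching an unmarked term exactly is an assumption: the article implies Lycos may expand unmarked terms on its own, which this sketch does not model.

```python
def lycos_term_matches(term: str, word: str) -> bool:
    """Match one indexed word against a term that may end in $ (expand) or . (limit)."""
    term, word = term.lower(), word.lower()
    if term.endswith("$"):
        return word.startswith(term[:-1])  # gen$ -> genetic, genesis, general, ...
    if term.endswith("."):
        return word == term[:-1]  # gene. -> gene only
    # Assumption: an unmarked term is matched exactly in this sketch.
    return word == term


vocabulary = ["gene", "genetics", "genesis", "general", "oxygen"]
print([w for w in vocabulary if lycos_term_matches("gen$", w)])
print([w for w in vocabulary if lycos_term_matches("gene.", w)])
```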


Opentext

Opentext is in a state of flux, having changed considerably since its early days, so the information on its help pages, if you can find them, is no longer accurate. Features and navigation have changed. It is still, however, an excellent search tool.

The default search window is what used to be called the Power Search. It presents 3 search boxes into which a word or phrase can be entered. Each box has a qualifier for where to search (anywhere (default), document summary, title, first heading, or URL) and a boolean connector to the next box (AND (default), OR, BUT NOT, NEAR). It goes like this:

Search for [enter your search term(s)] within [choose where] [boolean option to connect to next search term]. Three terms can be entered and qualified.
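A sketch of how the three-row form might be evaluated, assuming rows are combined strictly left to right with no operator precedence (one reason the linear layout is hard to reason about). The page fields, the function names, and that evaluation order are all assumptions; NEAR is left out to keep the example short.

```python
import re


def contains(text: str, term: str) -> bool:
    """Case-insensitive test for the words of `term`, in order, inside `text`."""
    target = re.findall(r"\w+", term.lower())
    words = re.findall(r"\w+", text.lower())
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


def field_text(page: dict[str, str], where: str) -> str:
    """The part of the page a row searches; 'anywhere' means every field."""
    if where == "anywhere":
        return " ".join(page.values())
    return page.get(where, "")


def power_search(page: dict[str, str], rows: list[tuple[str, str, str]]) -> bool:
    """Evaluate (term, where, connector) rows strictly left to right.

    Each row's connector joins it to the next row; the last connector is unused.
    """
    first_term, first_where, _ = rows[0]
    result = contains(field_text(page, first_where), first_term)
    for (_, _, connector), (term, where, _) in zip(rows, rows[1:]):
        found = contains(field_text(page, where), term)
        if connector == "AND":
            result = result and found
        elif connector == "OR":
            result = result or found
        elif connector == "BUT NOT":
            result = result and not found
        else:
            raise ValueError(f"connector not handled in this sketch: {connector}")
    return result


page = {
    "title": "Gettysburg Address",
    "url": "http://example.edu/lincoln",
    "text": "Four score and seven years ago",
}
rows = [
    ("Gettysburg", "title", "AND"),
    ("four score", "anywhere", "BUT NOT"),
    ("Douglas", "anywhere", "AND"),
]
print(power_search(page, rows))
```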

Opentext does not support a wildcard expansion character, but does handle plurals nicely. Do not enter plural search terms. Opentext will search for plurals automatically, including such plurals as geese.
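Automatic plural handling can be sketched as generating a plural for each singular term. The handful of English rules and the small irregular table below are illustrative assumptions, not Opentext's word list; with_plurals is a hypothetical name.

```python
# A tiny, hypothetical table of irregular plurals; a real engine would need far more.
IRREGULAR = {"goose": "geese", "mouse": "mice", "child": "children", "man": "men"}


def with_plurals(word: str) -> set[str]:
    """Return a singular term together with a guessed plural form."""
    word = word.lower()
    if word in IRREGULAR:
        plural = IRREGULAR[word]
    elif word.endswith(("s", "x", "z", "ch", "sh")):
        plural = word + "es"
    elif word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        plural = word[:-1] + "ies"
    else:
        plural = word + "s"
    return {word, plural}


print(sorted(with_plurals("goose")))
```

This is also why the singular should be entered: the expansion runs from singular to plural, not the other way round.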

Booleans: The effort to make the use of booleans and proximity operators simple has backfired. Entering the actual operators and grouping terms with parentheses is far easier and quicker than selecting from boxes. Understanding the logical interpretation of the operator is also more difficult when laid out in linear fashion like this.

Proximity Operators: Opentext implements both the NEAR operator, with a non-adjustable range of 80 words, and the FOLLOWED BY operator (like Webcrawler's ADJ operator, where word order matters), again with a non-adjustable range of 80 words. Such a large range reduces the usefulness of these operators.

Opentext does not restrict matches to whole words, so a search for the word head will also get hits on headstrong and headline. It will also miss terms if they are entered in the plural rather than the singular. Exact, correct spelling is important with Opentext.

Very good features include the ability to see the terms from the referenced page that caused the hit (the 'see match on page' option at the bottom of each hit summary), and the search refining option to 'find similar pages', also at the bottom of each summary item.


Infoseek

Infoseek was once the only Netscape default search engine. It is not the best available. Its virtues are speed and ease of use. Its defects are a lack of sophistication (booleans are not supported) and a 'teaser' approach to showing the first 100 hits and offering to show more for pay. It is both a search engine, and a searchable subject catalog, with options to search Usenet newsgroups, email addresses and web FAQs.

Searches are quasi-case sensitive. Capitalized words are taken as proper nouns, and the search is limited accordingly. Searching for Babe will find the famous hitter and the famous pig; searching for babe will also find the Sonny and Cher lyrics. Adjacent capitalized words are linked into a phrase, so separate capitalized phrases must be divided with commas: The Great Bambino, Baseball Hall Of Fame. Phrases may also be formed by enclosing the words in double quotes: "i've got you babe". Yet a third way to link words into a phrase is to place hyphens between them: wonderful-life.
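The phrase-forming rules can be sketched as a small query splitter. This is a hypothetical illustration of the rules as stated here (quotes, commas, runs of capitalized words, hyphens), not Infoseek's parser.

```python
import re


def infoseek_terms(query: str) -> list[str]:
    """Split a query into words and phrases following the rules described above."""
    terms = []
    # Quoted phrases are kept whole; everything else is handled piece by piece.
    for quoted, rest in re.findall(r'"([^"]+)"|([^"]+)', query):
        if quoted:
            terms.append(quoted)
            continue
        # Commas separate one capitalized phrase from the next.
        for chunk in rest.split(","):
            run = []  # adjacent capitalized words collected so far
            for word in chunk.split():
                if word[0].isupper():
                    run.append(word)
                    continue
                if run:
                    terms.append(" ".join(run))
                    run = []
                # Hyphens link lowercase words into a phrase: wonderful-life.
                terms.append(word.replace("-", " "))
            if run:
                terms.append(" ".join(run))
    return terms


print(infoseek_terms("The Great Bambino, Baseball Hall Of Fame"))
```

Without the comma, the same words collapse into a single seven-word phrase, which is why the comma matters.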

Required/Prohibited Operators: Prepending a + symbol to a word requires that the term appear in the documents found by the search. Prepending a - symbol excludes documents containing that term from the search results: +Lincoln -automobile. There cannot be a space between the + or - sign and the affected word.

Proximity operator: Placing words in square brackets causes a hit if they are found within 100 words of each other: [immune disease].

To search Infoseek's 'select sites' (their subject catalog) change the search option from World Wide Web to 'Infoseek Select Sites' on the form provided next to the search term window. There are several other options available, including Reuters news stories.


Yahoo!

Yahoo is not a search engine, but strictly a hierarchically arranged subject index. It has developed over a long time, with lots of editorial care, so its quality is very high. Browsing Yahoo is the best way to surf for good sites when you don't know (or perhaps care) exactly where you are going. It is also the best way to find good 'starter' sites, from which you can branch out to more specialized ones.

Using Yahoo is simple. Just enter your search term(s) in the search window and click SEARCH. Yahoo will return three types of information: 1) Yahoo categories that match the search term (so you can explore them for cross referencing); 2) Actual matching end-sites; and 3) The Yahoo categories from which the various pages are indexed--sort of a 'much broader term' cross reference. Though you cannot create very sophisticated searches as with the search engines, you can control:

You may access these controls by clicking the small 'options' link next to the main search window.
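The three kinds of results can be sketched with a toy index in which categories are slash-separated paths and each site is filed under one path. The data and the yahoo_style_search name are hypothetical, and the matching is a plain substring test.

```python
def yahoo_style_search(term: str, categories: list[str], sites: dict[str, str]):
    """Return (matching categories, matching sites, categories those sites are filed under).

    categories: category paths such as "Entertainment/Music/Artists"
    sites: maps each site title to the category path it is filed under
    """
    needle = term.lower()
    matching_categories = [path for path in categories if needle in path.lower()]
    matching_sites = [title for title in sites if needle in title.lower()]
    filed_under = sorted({sites[title] for title in matching_sites})
    return matching_categories, matching_sites, filed_under


categories = ["Entertainment/Music", "Entertainment/Music/Artists", "Science/Biology"]
sites = {
    "Sonny and Cher Fan Page": "Entertainment/Music/Artists",
    "Music Theory Online": "Education/Music",
    "Frog Dissection": "Science/Biology",
}
cats, hits, parents = yahoo_style_search("music", categories, sites)
print(cats, hits, parents, sep="\n")
```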

Yahoo has a couple of other unique features. At the bottom of each results page, links to search engines are provided. By clicking on Yahoo Remote you can open a secondary Netscape window, which you can minimize and then maximize whenever you need to do a quick search.

If the essential search engine is AltaVista, the essential subject catalog is Yahoo! Don't surf without it.


NlightN

NlightN is more along the lines of the classical information/document delivery service, like Ebsco. The difference is that you can use NlightN's Universal Index free, and pay only if you order a document. It indexes not only the web, but reference works, news wires, books, dissertations, and many public and private databases. NlightN bills itself as 'the world's largest table of contents,' but this is the sort of hype one gets used to on the Internet. AltaVista is the largest, though it depends on what you mean by 'table of contents.' Remember, this is a for-profit organization, but you cannot spend money by accident, and you are free to use the databases for research. They ask that you sign up for a free NlightN account, but this is not required. If you do, you gain some searching power, but the call is yours. If you are serious about using the service for pay, get the FAQ available from the help pages.

NlightN's search window is simple. Just enter the term(s) and click FIND. You will be taken to an intermediate screen that shows how often the term(s) occur in:

Choose WWW by clicking on the link. You will find that the web is less thoroughly indexed than other areas, yet you may still find some useful information.

Booleans: The default connector between search terms is AND; that is, there is an implicit AND between each pair of terms. To construct your own boolean searches, use the symbol & for AND, | (the vertical bar, or pipe symbol) for OR, and ^ (the circumflex, or caret) for NOT. For example: (Army & Navy) ^ (Air Force). When a term is grouped in parentheses, as with (Air Force), NlightN considers it both as a phrase and as two search terms. NlightN is an exception among search engines in that entering fewer search terms is better than entering many, and your search will proceed faster.
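A sketch of how NlightN's symbols and implicit AND could be rewritten into the spelled-out operators used by the other engines. The function name and the output notation are assumptions, and the sketch does not model NlightN's treatment of (Air Force) as both a phrase and two terms.

```python
import re

NLIGHTN_SYMBOLS = {"&": "AND", "|": "OR", "^": "NOT"}
NOT_AN_OPERAND_END = {"(", "AND", "OR", "NOT"}


def nlightn_to_words(query: str) -> str:
    """Rewrite an NlightN symbol query with spelled-out operators and explicit ANDs."""
    out: list[str] = []
    for tok in re.findall(r"[()&|^]|[^\s()&|^]+", query):
        word = NLIGHTN_SYMBOLS.get(tok, tok)
        starts_operand = word not in ("AND", "OR", ")")
        # Two operands side by side get the implicit AND made explicit;
        # this also turns a binary ^ into AND NOT.
        if out and starts_operand and out[-1] not in NOT_AN_OPERAND_END:
            out.append("AND")
        out.append(word)
    return " ".join(out).replace("( ", "(").replace(" )", ")")


print(nlightn_to_words("(Army & Navy) ^ (Air Force)"))
```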

What is said above applies before you sign up. After you sign up you will be given a userid and password, and your search window will look different. You will be able to search within fields (such as the traditional author/title/subject fields of a library catalog) and to control the scope of the databases accessed from the search window. You will also have access to LIMIT/FILTER and SEARCH LOG options above the search window, so that you can focus searches or return to your recent searches. If you decide to sign up, get the FAQ, which explains these options more fully.


The Internet Sleuth

This is a very useful tool, but not as inclusive as you might imagine. Its concept is somewhat different from that of the other tools considered here. It indexes a large number of databases and provides a front end from which they may be searched. Therefore, in the opening search box it is best to put as broad a single term as possible, and then be more specific in the resulting search window(s). For example, if I were searching for the lyrics to that timeless classic by Sonny Bono called 'I Got You Babe,' I would search initially on "music." This search would result in a list of 29 searchable databases, such as the CD-ROM Database, Music Colleges, Chicago Concerts, Smithsonian Folkways, and so on. Among the databases (for each of which the Sleuth presents you with a search window) there is one called Lyrics Server. In its search window titled Artist or Title, the terms "sonny and cher" result in a list of two songs: 'I Got You Babe' and 'The Beat Goes On.' (Where are all the other great hits, one wonders.) Clicking on the appropriate title yields the actual lyrics, which, read in the light of subsequent history, are somewhat amusing.

Where the database being searched allows for booleans or wildcards, the Sleuth gives you search hints next to the appropriate search window. Even the Yahoo index can be searched from within the Sleuth.


Magellan

Magellan is not actually a search engine, but rather an on-line guide to the Internet that contains a directory of rated and reviewed sites, along with an index to lots of unreviewed sites. It is like Yahoo, only less inclusive with a more thorough rating system. (One to four stars, rather than Yahoo's shades to indicate a cool site). Magellan's strength is its system of reviews. It is not a good starting place to do a search, but is rather more useful when looking for sites which are tried and true. The emphasis at Magellan is on pop sites (UFOs are one of the main categories on the front page), but if that is what you are looking for the site is great. The only drawback is the inevitable advertising.


Summary of Search Engine Features

The following table summarizes some of the common features of the search engines discussed above. Boldfaced elements mean that, in my opinion, this search engine makes the best implementation of this feature. The Internet Sleuth and Magellan are not included, since their features are so different from those of the others considered here.

Internet Search Engines

Category                        AltaVista  Excite  WebCrawler  Lycos  OpenText  InfoSeek  Yahoo!  NlightN
Case Sensitive?                 Y          N       N           N      N         Y         N       N
Considers Phrases?              Y          N       Y           N      Y         Y         N       Y
Required Term Operator          +          +       N           N      N         +         N       N
Prohibited Term Operator        -          -       N           N      N         -         N       N
Wildcard Expander               *          N       N           $      N         N         N       N
Limiting Character              N          N       N           .      N         N         N       N
Results Ranking?                Y          Y       Y           Y      Y         Y         N       N
Controllable Results Ranking?   Y          N       N           Y      N         N         N       N
Booleans Allowed?               Y          Y       Y           N      Y         N         N       Y
Proximity Operators Allowed?    Y(10)      N       Y(range)    N      Y(80)     Y(100)    N       N
Subject (Directory) Searching?  N          Y       Y           Y      N         Y         Y       N
Refine Based On First Search?   N          Y       N           N      Y         N         N       N
Controllable Display Format?    Y          N       Y           Y      N         N         N       N


General Search Tips

What is the best search tool? It depends on your purposes and why you need the information. If you are just browsing, start at Yahoo, or use the directory services of Webcrawler (GNN) or one of the other subject catalogs. If you are looking for the best of the web, and your interests are "pop," use Magellan. If you need a specialized database, try the Internet Sleuth first. If you are doing "serious" research, start with AltaVista, but be prepared to use the other good search engines too, and follow these general rules of thumb:
