To return suitable results, a search engine must be able to judge whether the content of a page is relevant to a given query. While most people would find it easy to decide whether a text provides useful information on a given topic, the task is far from trivial for a machine, which lacks human intelligence. Nevertheless, there are powerful algorithms that allow search engines to evaluate a website's relevance to a given topic in virtually no time.
Usually, a search engine assesses various aspects of a website (commercial search engines up to hundreds of them)1), which vary in importance. Each aspect is rated and then weighted according to its importance, and finally a single value is calculated to represent the relevance. While the many different search engines differ a great deal in how they rate websites, they still have a lot of criteria in common. A simple yet important criterion is the relative frequency of the search word on the page: it seems common sense that an article dealing with a certain topic will, unavoidably, use the key words belonging to that topic more often than a text about something completely different.
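The rate-then-weight procedure described above can be sketched as a weighted average. The aspect names, ratings, and weights below are purely illustrative and do not reflect any real search engine's algorithm:

```python
# Hypothetical sketch: combining several rated aspects of a page into one
# relevance score via a weighted average. Aspect names and weights are
# invented for illustration; real engines use hundreds of (secret) aspects.

def relevance_score(ratings, weights):
    """Weighted average of aspect ratings (each rating in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(ratings[aspect] * weights[aspect] for aspect in ratings) / total_weight

# Illustrative weights: inbound links count more than keyword placement.
weights = {"keyword_density": 0.3, "keyword_position": 0.2, "inbound_links": 0.5}
ratings = {"keyword_density": 0.8, "keyword_position": 0.6, "inbound_links": 0.4}

score = relevance_score(ratings, weights)  # 0.8*0.3 + 0.6*0.2 + 0.4*0.5 = 0.56
```

Pages would then be ordered in the result list by this aggregate score.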
However, since relevance is a crucial factor in deciding how high a website is placed in the search result list, and since this in turn is often a crucial factor in the economic success of a business, many cases are known in which people have tried to deceive search engines. Good algorithms thus need to tell the difference between high-quality, relevant websites and ones that merely try to get a high ranking without actually offering good content. One example of such "cheating" is keyword stuffing: repeating the key words over and over again on a website. This produces a high frequency of the keywords, but no interesting content, and is considered a form of spam. It is therefore not the absolute frequency but rather the frequency rate (also called keyword density) that is used as a criterion to determine the page's relevance. To make such manipulation harder, the algorithms used by search engines are usually not published.
Most search engines favor a keyword density of 3-8% of the total text within the document body. An analysis of the keyword density of various combinations of search words and websites shows, however, that Google is less strict about this aspect and every now and then returns pages with a keyword density above 15%2).
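Keyword density as described above is simply the share of words in the body text that match the search word. A minimal sketch, ignoring markup, stemming, and multi-word phrases:

```python
# Minimal keyword-density sketch: occurrences of the search word divided
# by the total number of words in the body text. Real engines would also
# strip HTML, normalize word forms, and handle phrases.

def keyword_density(text, keyword):
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

body = "search engines rank pages by relevance and search quality"
density = keyword_density(body, "search")  # 2 of 9 words -> about 0.22
```

By this measure, a keyword-stuffed page stands out immediately: its density far exceeds the 3-8% range that naturally written text tends to produce.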
Besides the keywords themselves, there are, as stated before, many other aspects used in ranking a page's relevance. One is the position of the search term(s) within the text: in journalism, important pieces of information are usually placed at the beginning of a text rather than in the middle section or, even less likely, at the end. An aspect that is typically considered very important is the number of links that point to a given page. Search engines even collect information on which pages people actually click on after they have been displayed as search results: if many people click on a displayed page, this is taken as a sign that it is valuable.
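The position aspect could, for instance, be scored by rewarding terms that occur early in the text. This is an invented illustration of the idea, not a documented ranking formula:

```python
# Illustrative position scoring (invented for this sketch): the earlier
# the search term first occurs, the higher the score, reflecting the
# journalistic habit of front-loading important information.

def position_score(text, keyword):
    words = text.lower().split()
    try:
        first = words.index(keyword.lower())
    except ValueError:
        return 0.0  # term absent: no position credit at all
    # Decay linearly from 1.0 (first word) toward 0.0 (last word).
    return 1.0 - first / len(words)

early = position_score("python tutorial for beginners", "python")   # 1.0
late  = position_score("a short tutorial about python", "python")   # 0.2
```

A score like this would then be fed, as one rating among many, into the weighted combination of aspects described earlier.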