Syllabus SEO Training

Tel: +34 693 475 142

Contact Syllabus

Syllabus SEO Training,
25 Daisy Street, Glasgow,
Scotland, UK.

3.5   How search engines view web pages

As we noted above, this process does not require human intervention. There is nobody telling Google, for example, that ‘cars’ and ‘vehicles’ are related terms. Instead, LSI finds related terms by itself, simply by looking at enough documents.

LSI is simply a statistical and mathematical computation that looks at word patterns across documents. It is not an Artificial Intelligence programme that lets Google actually read documents as humans would. A search engine that uses LSI to index pages remains as stupid as ever, in the sense that it cannot understand even the basic meaning of words.

That is not to say that LSI ignores word meaning. It does, however, ignore many of the words on a page.

In every language, you have two different kinds of word: content words and function words.

In simple terms, content words have some kind of meaning for us (we can visualise what a car is, or understand the concept of liberty), while function words do not have the same kind of meaning (ask yourself: what is the meaning of ‘the’?). In other words, words can be divided into those that carry meaning and those that do not.

LSI works by stripping documents of function words and extraneous terms to focus on terms with semantic content. It is useful to know this, as it is what a search engine will be doing to the words on your web pages when it reads them.

In fact, the search engine uses what is known as a stop list to strip web pages down to a skeleton of content words. A stop list is a list of commonly used words (function words, common verbs, prepositions and so on) that the search engine removes from the page so that it can focus on the words that carry the page’s main meaning. This greatly reduces the ‘noise’ on the page and helps the search engine determine what the page is about.
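A minimal Python sketch of stop-list filtering. The `STOP_WORDS` set below is a tiny, hypothetical list chosen for illustration; real search engines use their own, much longer lists, which this section does not specify.

```python
# Hypothetical stop list; real search engines use their own, longer lists.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "at", "is", "are", "it"}


def remove_stop_words(words: list[str]) -> list[str]:
    """Keep only the words that do not appear on the stop list."""
    return [w for w in words if w.lower() not in STOP_WORDS]


sentence = "The search engine looks at the words on the page"
print(remove_stop_words(sentence.split()))
# ['search', 'engine', 'looks', 'words', 'page']
```

What is left is the skeleton of content words the search engine works with.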

This is all part of a process that the search engine performs on web pages in order to determine the relevance of each page objectively. The process LSI performs on web pages when indexing a document is as follows:

1. The search engine removes all markup tags (i.e. code) from the page so that its content is represented as a plain series of characters. It moves through the page systematically, working from top to bottom and left to right, extracting the content from the tags as it finds it.

2. The search engine strips the page of remaining formatting, such as punctuation and capitalisation.

3. The search engine applies a stop list to remove commonly used words from the document, leaving only content words.

4. The remaining content words are then ‘stemmed’, that is, reduced to common word roots (e.g. ‘techno’ for ‘technology’, ‘technologies’, ‘technological’).

5. The remaining terms are weighted. Weighting is the process of determining how important a term is in a document.
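The indexing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any search engine is actually implemented: the stop list is hypothetical, markup is removed with the standard-library `html.parser` (which, in this sketch, also keeps the text of `<script>` and `<style>` elements), and `crude_stem` is a deliberately naive, hypothetical suffix-stripper rather than a real stemming algorithm such as Porter’s. The final weighting step is left out here.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "at", "for", "is", "are"}

# Suffixes for the naive (hypothetical) stemmer, checked longest first.
SUFFIXES = ("ical", "ies", "ing", "s", "y")


class TextExtractor(HTMLParser):
    """Collects the text found between tags, discarding the tags themselves."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html: str) -> list[str]:
    # Remove markup tags, keeping only the text content in document order.
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any text still buffered by the parser
    text = " ".join(extractor.chunks)
    # Drop capitalisation and punctuation: lower-case, keep letter/digit runs.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Apply the stop list so that only content words remain.
    content_words = [w for w in words if w not in STOP_WORDS]
    # Reduce each content word to a (crude) common root.
    return [crude_stem(w) for w in content_words]


page = "<h1>Technology</h1><p>The technologies of the technological age.</p>"
print(preprocess(page))  # ['technolog', 'technolog', 'technolog', 'age']
```

This crude stemmer produces ‘technolog’ rather than ‘techno’, but the point is the same: all three variants collapse to one root, so the search engine counts them as a single term.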

3.5.1  Term Weighting

By ‘term weighting’ we mean the importance given to terms or words that appear in a document.

A search engine does not see all terms in a document as equally important (the use of a stop list, for instance, shows that the search engine treats common words, function words and non-content words as wholly unimportant). Similarly, the search engine does not treat the content words that remain after it has filtered a document as if they are all equally important.

According to Yu et al., the search engine’s weighting of terms is based on two ‘common sense insights’. Firstly, content words that are repeated within a single page are more likely to be significant than content words that appear only once. Secondly, words that are used rarely across the collection of documents are likely to be more significant than words that are used a lot.

For example, if the word ‘aircraft’ appears a number of times in a single page, it is likely to be fairly significant to that document. Remember that a search engine can’t read, so a recurrence of such terms may just indicate roughly what that page is about.

However, a word that appears in lots of pages (say, a common content word) is treated as less significant, because it would not help the search engine distinguish between those pages in terms of their different content.

There are three types of weighting employed by a search engine:

Normalisation simply refers to the process by which documents of different lengths are made to appear ‘equal’. If this did not happen, longer documents, which naturally contain more occurrences of each keyword, would tend to outweigh or subsume shorter ones.

Local weight refers to the number of times a term appears in a document. A word that features numerous times in a single document will have a greater weight than a word that features only once. This is also known as term frequency (tf).

Global weight refers to the number of documents in the collection that contain the term. It is usually expressed as inverse document frequency (IDF), which gives a term a higher weight the fewer documents it appears in.
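One common textbook way of writing these weights is shown below. The exact formulas are an assumption for illustration (this section does not specify them), and real search engines may use other variants, such as a different logarithm base or extra smoothing terms.

$$
\mathrm{tf}_{t,d} = \frac{f_{t,d}}{|d|}, \qquad \mathrm{IDF}_t = \log \frac{N}{\mathrm{df}_t}
$$

Here $f_{t,d}$ is the number of times term $t$ occurs in document $d$ (the local weight), $|d|$ is the total number of terms in $d$ (dividing by it is one simple way of normalising for document length), $N$ is the number of documents in the collection and $\mathrm{df}_t$ is the number of those documents that contain $t$ (the global weight). A term that appears in every document gets $\mathrm{IDF}_t = \log(N/N) = 0$, so it contributes nothing to telling pages apart.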

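In its simplest form, the weight of a term is calculated with the following equation:

$$w_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{IDF}_t$$

where $\mathrm{tf}_{t,d}$ is the term frequency of term $t$ in document $d$ and $\mathrm{IDF}_t$ is the inverse document frequency of $t$.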
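A minimal Python sketch of the tf × IDF equation, assuming term frequency normalised by document length and IDF = log(N / df) (a common textbook choice, not necessarily what any particular search engine uses). The three toy documents are hypothetical and assumed to have been preprocessed already.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return a {term: weight} dictionary for each preprocessed document."""
    n_docs = len(docs)
    # Global weight: count how many documents contain each term (once per doc).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # local weight: raw occurrences in this document
        length = len(doc)      # used to normalise for document length
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy collection of three already-preprocessed pages.
docs = [
    ["aircraft", "engine", "aircraft", "wing"],
    ["car", "engine", "wheel", "car"],
    ["motorbike", "wheel", "engine"],
]

for term, weight in sorted(tf_idf(docs)[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

On the first page, ‘aircraft’ (repeated, and found on only one page) scores about 0.549, ‘wing’ (once, on one page) about 0.275, and ‘engine’ (found on every page) scores 0, which matches the two ‘common sense insights’ described above.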

3.5.2  Weighting and distribution vs Keyword density

Although this material may seem complex, since it appears to involve advanced linguistics and mathematical formulas, it is worth having a basic grasp of it, as it has ramifications for the SEO process.

Traditionally, SEO professionals have focused on something called keyword density when dealing with term weighting. Keyword density is the number of times your keywords appear on a page divided by the total number of words on that page. For example, if the keyword ‘cars’ appeared three times in a document that contained 100 words, the keyword density for that page would be 0.03, or 3% (3/100).
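The keyword density calculation is simple arithmetic. Here is a minimal Python sketch; `keyword_density` is a hypothetical helper that handles single-word keywords only, and the 100-word page is made up to match the example above.

```python
def keyword_density(words: list[str], keyword: str) -> float:
    """Share of the words on a page that are exactly `keyword` (case-insensitive)."""
    if not words:
        return 0.0  # avoid dividing by zero on an empty page
    hits = sum(1 for word in words if word.lower() == keyword.lower())
    return hits / len(words)


# Hypothetical 100-word page: 'cars' three times plus 97 filler words.
page = ["cars"] * 3 + ["filler"] * 97
print(keyword_density(page, "cars"))  # 0.03, i.e. 3%
```

Note that the result depends only on this one page; nothing about the rest of the document collection enters the calculation.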

Under the keyword density model, the more times a keyword appears on a single page, the more likely it is that the search engine will find you relevant for that keyword. Under this system, optimising your page simply involves increasing its keyword density by mentioning your keywords as many times as you can on a single page.

However, SEO professionals are beginning to realise that this is not how search engines work when they look at keywords or determine the importance of terms to a page. Keyword density only measures the use of keywords on a per-page basis, not across the document collection as a whole. As Dr. E. Garcia points out, modern search engines also have to take into account factors such as the proximity, position and distribution of keywords on a page.

These factors have a direct bearing on what a document is about. For example, if the keywords ‘used’ and ‘cars’ appear close together, i.e. as the phrase ‘used cars’, then that page is more likely to be about used cars. The same goes for where the keywords appear on a page (e.g. do they appear in titles, main headings and so forth?).

The concept of keyword density, by contrast, does not take into account the position of keywords in relation to each other on a page. If search engines actually used keyword density as a measure of the relevance of a page, they could potentially return pages that mention ‘used’ and ‘cars’ enough times no matter where they appeared on a page. For the sake of illustration, we could say that the following phrase might make a page relevant for the keyword search ‘used cars’:

‘I used to cycle to work a lot but most people drive their cars to get there.’ 

As you can see, keyword density takes no account of the proximity or distribution of keywords, both of which have an impact on what the page is about.
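To make the contrast concrete, here is a small Python sketch that counts the exact phrase ‘used cars’ and measures how many word positions separate the two keywords. Both measures are simplified, hypothetical stand-ins for proximity (this section does not say how any search engine actually computes it), and the second sentence is a made-up comparison.

```python
import re
from typing import Optional


def words_of(text: str) -> list[str]:
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_count(words: list[str], phrase: list[str]) -> int:
    """Count how often `phrase` occurs as consecutive words."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def min_distance(words: list[str], a: str, b: str) -> Optional[int]:
    """Smallest gap, in word positions, between any `a` and any `b`."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    if not pos_a or not pos_b:
        return None  # one of the keywords is missing
    return min(abs(i - j) for i in pos_a for j in pos_b)


for text in (
    "I used to cycle to work a lot but most people drive their cars to get there.",
    "We sell used cars at fair prices.",  # hypothetical comparison sentence
):
    words = words_of(text)
    print(phrase_count(words, ["used", "cars"]), min_distance(words, "used", "cars"))
# 0 12
# 1 1
```

Both sentences contain ‘used’ and ‘cars’ once each, but the cycling sentence has no exact phrase match and 12 positions between the keywords, while the second has the phrase once with the keywords adjacent.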

Note: In future units of the course, we will occasionally refer to keyword density, as it is a term still used by SEO professionals and, as a concept, it is still a useful way to get SEO beginners to start using their keywords more frequently on web pages. However, it will pay to remember that search engines use a different system from keyword density to determine the importance of keywords on a page. Bear this in mind when we start showing you how to employ keywords in your own pages.
