Anatomy of Google's Search Engine

Following my previous post on the anatomy of the Linux kernel, here is one on the anatomy of Google's search engine, or more precisely on the paper The Anatomy of a Large-Scale Hypertextual Web Search Engine.
The paper dates from 1997 and was written by Google's two founders, Sergey Brin and Larry Page. It summarizes the anatomy, the inner workings, and the future of Google's search engine under the following section headings:
- Web Search Engines — Scaling Up: 1994 - 2000
- Google: Scaling with the Web
- Design Goals
- PageRank: Bringing Order to the Web
- Anchor Text
- Other Features
- Information Retrieval
- Differences Between the Web and Well Controlled Collections
- Google Architecture Overview
- Major Data Structures
- Crawling the Web
- Indexing the Web
- Searching
- Storage Requirements
- System Performance
- Search Performance
- Future Work
- High Quality Search
- Scalable Architecture
- A Research Tool
Sergey Brin and Larry Page's vision in 1997 can be summed up in these two passages from the paper:
People are still only willing to look at the first few tens of results. Because of this, as the collection size grows, we need tools that have very high precision (number of relevant documents returned, say in the top tens of results). Indeed, we want our notion of “relevant” to only include the very best documents since there may be tens of thousands of slightly relevant documents.
There is quite a bit of recent optimism that the use of more hypertextual information can help improve search and other applications. In particular, link structure and link text provide a lot of information for making relevance judgments and quality filtering. Google makes use of both link structure and anchor text.
The paper also gives a formula for computing PageRank:
We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
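To make the formula concrete, here is a minimal Python sketch that applies it repeatedly to a tiny link graph until the values settle. The function name `pagerank`, the three-page graph, the starting value of 1.0 per page, and the fixed count of 50 iterations are all assumptions made for illustration; the passage quoted above only gives the formula and the usual value d = 0.85, and this is not meant to reflect how Google actually computes it at scale.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a small link graph.

    links maps each page to the list of pages it links to (hypothetical input format).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # inbound[p] holds the pages T1..Tn that point to p.
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)

    # C(T): number of distinct links going out of page T.
    out_count = {p: len(set(links.get(p, ()))) for p in pages}

    # Assumed starting point: every page begins with a rank of 1.0.
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Each pass recomputes every rank from the previous pass's values.
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
    return pr


# Made-up example graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this example C ends up with the highest rank because both A and B point to it, while B is lowest because it only receives half of A's rank.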
The Anatomy of a Large-Scale Hypertextual Web Search Engine answers everything you always wanted to know about Google's engine but never dared to ask.
Via José Duenas's blog























