Feature #837

Adding a robots.txt file to avoid some problems

Added by Hubert Borderiou over 9 years ago. Updated over 9 years ago.

Status: Bug resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 26/03/2010
Due date:
% Done: 100%
Estimated time:
Complexity: Normal
SCRUM pts - complexity: 2

Description

Hi,

I ran into two problems that made me think about adding a robots.txt file to my Chamilo web site.

1. A teacher wanted to have his course open to the whole world, with the Assignment tool visible.
While the course was open, some web crawlers indexed the students' work.
When a student searched Google for the teacher's first name + last name, the first result was a link to all their dissertations, and she was very cross to have her work indexed by a web crawler.

2. The other day my server was very slow, with lots of httpd processes open and a catastrophic load average. It turned out that a Yahoo web crawler was analysing my Chamilo web site. I checked my access log: it was crawling a publicly visible Chamilo course with the Document tool available. The web crawler, as a good web crawler does, tests every URL it finds on the page, and it was calling the <Save (ZIP)> URL of the Documents tool of this course, which has a lot of big documents.
The server aborted the script after 60 seconds with the fatal error "script max time", because the script took too long to execute. The bot tried to call this link 5 times, which really slowed down my server.

As a result, I added a robots.txt file to my site.
The first two Disallow lines address point 1, and the last line addresses point 2.
I've tested this robots.txt with the Google Web Developers tools, and it works fine.

[File robots.txt]
User-Agent: *
Disallow: */work/
Disallow: */work
Disallow: */document.php?*downloadfolder
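
To illustrate how these wildcard rules are meant to match, here is a minimal Python sketch. It assumes Google-style pattern matching, where `*` matches any run of characters, a trailing `$` anchors the end of the URL, and rules are otherwise matched from the start of the path; other crawlers may interpret the patterns differently. The helper names and the sample paths are hypothetical and only serve the illustration.

```python
import re

# The three Disallow patterns from the robots.txt above.
DISALLOW = ["*/work/", "*/work", "*/document.php?*downloadfolder"]


def rule_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL (Google-style matching).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)


if __name__ == "__main__":
    # Hypothetical paths, not taken from a real Chamilo installation.
    samples = [
        "/courses/DEMO/work/essay.pdf",
        "/main/document/document.php?cidReq=DEMO&action=downloadfolder",
        "/main/document/document.php?cidReq=DEMO",
    ]
    for path in samples:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Note that under this interpretation `*/work` also matches any path containing a segment that merely starts with `work`, so it is broader than `*/work/`.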

Hope this helps,
Regards

History

#1

Updated by Julio Montoya over 9 years ago

  • Assignee set to Julio Montoya
  • Target version set to 1.8.7 beta
#2

Updated by Julio Montoya over 9 years ago

  • Status changed from New to Needs more info
  • Assignee changed from Julio Montoya to Hubert Borderiou
  • % Done changed from 0 to 90

Hello Hubert
I added the robots.txt, but with more directories and files:

http://code.google.com/p/chamilo/source/detail?r=faef4135e728bacaa5fda0871e253731193971a3&repo=classic

#3

Updated by Hubert Borderiou over 9 years ago

Hi,
Thanks for your help and the URL of the robots.txt checker.
Googlebots seem to use a different syntax, and I had only tested my file with the Google checker (shame on me :s)

I've found 5 Googlebots: Googlebot, Googlebot-Mobile, Googlebot-Image, Mediapartners-Google, Adsbot-Google

I've changed my robots.txt file to use both the common syntax and the Google one; I hope it's going to be fine...
(my teachers want the document folder of their public courses to be indexed by Googlebot)

User-agent: *
Disallow: /work/
Disallow: /course/
Disallow: /archive/
Disallow: /documentation/
Disallow: /home/
Disallow: /main/
Disallow: /plugin/
Disallow: /tests/
Disallow: /license.txt
Disallow: /README.txt

User-agent: Googlebot*
Disallow: /work/
Disallow: */work
Disallow: */document.php?*downloadfolder
Disallow: */archive/
Disallow: */archive
Disallow: */documentation/
Disallow: */documentation
Disallow: */home/
Disallow: */home
Disallow: */main/
Disallow: */main
Disallow: */plugin/
Disallow: */plugin
Disallow: */tests/
Disallow: */tests
Disallow: /license.txt
Disallow: /README.txt
Disallow: _
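
The common-syntax section above can be checked locally with Python's standard `urllib.robotparser` module. This is only a sketch: the bot name, the host, and the sample paths are hypothetical. To my knowledge the standard-library parser does plain prefix matching and does not interpret `*` inside paths, so it is only suitable for the `User-agent: *` section, not for the Googlebot-specific wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" section of the robots.txt above.
COMMON_RULES = """\
User-agent: *
Disallow: /work/
Disallow: /course/
Disallow: /archive/
Disallow: /documentation/
Disallow: /home/
Disallow: /main/
Disallow: /plugin/
Disallow: /tests/
Disallow: /license.txt
Disallow: /README.txt
"""

parser = RobotFileParser()
parser.parse(COMMON_RULES.splitlines())


def can_crawl(path, agent="ExampleBot"):
    """Return True if the (hypothetical) bot may fetch the given path."""
    return parser.can_fetch(agent, "http://example.org" + path)


if __name__ == "__main__":
    # Hypothetical paths, not taken from a real Chamilo installation.
    for path in ["/main/document/document.php",
                 "/work/essay.pdf",
                 "/courses/DEMO/document/notes.pdf"]:
        print(path, "allowed" if can_crawl(path) else "blocked")
```

Because these rules are prefix matches from the site root, `Disallow: /work/` only covers paths that begin with `/work/`; a work folder nested under another directory is not covered by this section.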

Regards

#4

Updated by Julio Montoya over 9 years ago

  • SCRUM pts - complexity changed from ? to 2
#5

Updated by Yannick Warnier over 9 years ago

  • Target version changed from 1.8.7 beta to 1.8.7 RC1
#6

Updated by Hubert Borderiou over 9 years ago

It's ok
No more "script max time" in my access_log ^^

#7

Updated by Julio Montoya over 9 years ago

  • Status changed from Needs more info to Assigned
  • Assignee changed from Hubert Borderiou to Julio Montoya
  • Target version changed from 1.8.7 RC1 to 1.8.7.1

I already added a robots.txt

#8

Updated by Julio Montoya over 9 years ago

  • Status changed from Assigned to Bug resolved
  • % Done changed from 90 to 100

Feel free to reopen if you want to add something more ...
