Adding a robots.txt file to avoid some problems
I had two problems that made me think about adding a robots.txt file to my Chamilo web site.
1. A teacher wanted to have her course open to the whole world, with the Assignments tool visible.
While the course was open, some web crawlers indexed the students' work.
When a student searched for the teacher's first name + last name on Google, the first result was a link to all the dissertations, and she was very cross to find that work indexed by a web crawler.
2. The other day my server was very, very slow, with lots of httpd processes open and a catastrophic load average... it turned out a Yahoo web crawler bot was analysing my Chamilo web site. I checked my access log: it was crawling a world-visible Chamilo course with the Documents tool available. The crawler, like any good web crawler, tests every URL it finds on a page, and it was calling the <Save (ZIP)> URL of that course's Documents tool, which holds a lot of big documents.
The server aborted the script after 60 seconds with a "maximum execution time exceeded" fatal error, because the script took too long to execute. The bot tried 5 times to call this link... which really slowed down my server.
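The 60-second abort described above corresponds to PHP's script time limit; a sketch of the relevant php.ini directive (the value shown is an assumption inferred from the behaviour, your installation may differ):

```
; php.ini -- maximum time in seconds a script may run before PHP
; aborts it with a "Maximum execution time exceeded" fatal error
max_execution_time = 60
```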
As a result I added a robots.txt file to my site.
The first two Disallow lines address point 1, and the last line addresses point 2.
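A minimal sketch of the kind of rules described above (the exact paths are hypothetical — the real URLs to block depend on your Chamilo version and installation):

```
# Keep crawlers out of the Assignments tool (point 1)
User-agent: *
Disallow: /main/work/
Disallow: /main/work.php
# Keep crawlers away from the "Save (ZIP)" download URL
# of the Documents tool (point 2)
Disallow: /main/document/downloadfolder.php
```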
I've tested this robots.txt with the Google webmaster tools, and it works fine.
Hope this helps,
Updated by Julio Montoya over 9 years ago
- Status changed from New to Needs more info
- Assignee changed from Julio Montoya to Hubert Borderiou
- % Done changed from 0 to 90
I added the robots.txt, but with more directories and files.
Updated by Hubert Borderiou over 9 years ago
Thanks for your help and the URL of the robots.txt checker.
Googlebots seem to use a different syntax, and I had only tested my file with the Google checker (shame on me :s).
I've found 5 Googlebots: Googlebot, Googlebot-Mobile, Googlebot-Image, Mediapartners-Google and AdsBot-Google.
I've changed my robots.txt file to use both the common syntax and the Google one; I hope it's going to be fine...
(my teachers want the document folder of their public courses indexed by Googlebot)
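A sketch of how the common syntax and a Google-specific section can be combined for that — the paths are illustrative assumptions, and note that Allow is a Google extension rather than part of the original robots.txt standard, and that Googlebot ignores the * group once it finds its own User-agent group:

```
# Default rule: keep course documents out of other crawlers' indexes
User-agent: *
Disallow: /main/document/

# Googlebot may index the document folders, but must still stay away
# from the heavy "Save (ZIP)" download URL
User-agent: Googlebot
Allow: /main/document/
Disallow: /main/document/downloadfolder.php
```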