Bug #1450

HTMLPurifier taking too much memory in forum

Added by Yannick Warnier almost 4 years ago. Updated over 3 years ago.

Status:Bug resolvedStart date:03/06/2010
Priority:UrgentDue date:04/06/2010
Assignee:Julio Montoya% Done:

60%

Category:-Estimated time:4.00 hours
Target version:1.8.8 stableSpent time:-
Complexity:Normal SCRUM pts - complexity:13

Description

In Chamilo 1.8.7, on campus.chamilo.org, a memory limit is reached easily when loading the forum homepage (each forum category seems to require at least 20M of memory). On a developed forum, HTMLPurifier is asking for more than 100M just to display the forum homepage.

Find out why and fix.

[Thu Jun 03 20:57:48 2010] [error] [client 200.31.107.182] PHP Fatal error: Allowed memory size of 62914560 bytes exhausted (tried to allocate 76 bytes) in /main/inc/lib/htmlpurifier/library/HTMLPurifier/DefinitionCache/Serializer.php on line 33, referer: http://campus.chamilo.org/courses/FOROSCHAMILO/?id_session=0&autoreg=1


Related issues

related to Chamilo LMS - Feature #1400: Create a new HTML filter for every content added by user New 28/05/2010
related to Chamilo LMS - Feature #1633: Transfering some encoding-related chnges for HTMLPurifier Feature implemented 25/06/2010
related to Chamilo LMS - Feature #1634: Add performance info when chamilo is set as a Test server Bug resolved 25/06/2010
duplicated by Chamilo LMS - Bug #1449: Bugs, when campus is updated from 1.8.6.2 to 1.8.7 Feature implemented 03/06/2010
duplicated by Chamilo LMS - Bug #1436: Forum in campus.chamilo Feature implemented 03/06/2010

Associated revisions

Revision 12000:f49c82641d56
Added by Julio Montoya almost 4 years ago

Replacing security:remove_XXS with intval to reduce htmlpurifier calls see #1450

Revision 52c88df6
Added by Julio Montoya almost 4 years ago

Replacing security:remove_XXS with intval to reduce htmlpurifier calls see #1450

History

#1 Updated by Simon Legner almost 4 years ago

The same happened in the Quiz module to me. It seems to be working after an update to version 4.1.1 of HTMLPurifier [1].

[1] http://htmlpurifier.org/download

#2 Updated by Julio Montoya almost 4 years ago

Some tests with HTMLPurifier with NO chamilo results in Mb

HTMLPurifier 4.1 HTMLPurifier 3.3
No configuration 2.58 2.52
Modifying the construct with Chamilo parameters 2.75 2.68
Adding attributes file (allowed_tags.inc.php) 2.77 2.7

#3 Updated by Julio Montoya almost 4 years ago

Test results with Chamilo 1.8.7

memory_get_usage() added in footer in Mb

1. Test in a forum with 33 posts with some youtube videos + a FLV video (viewthread.php)
2. Test in a forum with 33 posts with some youtube videos + a FLV video (forum/index.php)
3. Test in a empty forum (viewthread.php)
4. Test in a empty forum (forum/index.php)
5. user_portal with 5 course
6. Course home
7. Chamilo index.php

Best results in bold

HTMLPurifier 4.1+safeObject HTMLPurifier3.3 +safeObject HTMLPurifier 4.1 HTMLPurifier 3.3 HTMLPurifier4.1 Standalone+Safeobject HTMLPurifier4.1 Standalone HTMLPurifier4.1 static +safeobject HTMLPurifier4.1 static + NO safeobject
Test.1 27.64 27.54 14.42 15.23 29.58 16.43 15.19 15.07
Test.2 22.25 22.08 13.75 13.73 24.34 15.92 14.52 14.39
Test.3 17.34 17.26 13.67 16.64 19.44 15.85 14.54 14.31
Test.4 20.41 21.05 13.70 13.63 22.51 15.88 14.46 14.34
Test.5 13.84 13.84 13.84 13.85 17.58 No tested 13.84 13.84
Test.6 12.72 12.72 12.72 12.72 16.46 No tested 12.71 12.71
Test.7 10.81 10.81 10.81 10.81 14.55 No tested 10.81 10.81

#4 Updated by Julio Montoya almost 4 years ago

  • % Done changed from 0 to 10

Chamilo memory test as an admin

Scenario:

New file created test.php in the root /var/www/chamilo-classic/test.php

This file have:

require_once './main/inc/global.inc.php';
Display::display_header('Test');
Display::display_footer();

Some results

Commenting or uncommenting the HTMLPurifier
  • 7.67 Mb with the htmlpurifier commented in global.inc.php
  • 7.70 Mb with the htmlpurifier available in globa.inc.php
If we add one security::remove_XSS($var) with 1500 characters in the file it will take:
  • 9.07 Mb with htmlpurifier 4.1 with safeobject, without safe object is the same thing because we don't use the user status parameter.
If we add one security::remove_XSS($var, STUDENT) with 1500 characters it will take:
  • 9.99 Mb with htmlpurifier 4.1 with safeobject

And if we add some "require_once" functions the same as the forum/viewthread.php:

require_once api_get_path(LIBRARY_PATH).'formvalidator/FormValidator.class.php';
require_once api_get_path(LIBRARY_PATH).'groupmanager.lib.php';
require_once 'main/forum/forumconfig.inc.php';
require_once 'main/forum/forumfunction.inc.php';

We will have:

  • 13.24 Mb Require_once added + 1 security::remove_XSS($var);

But if I use the infamous prepare4display function we will have

  • 14.20 Mb Require_once added + prepare4display with a string with 1500 chars
    prepare4display is heavier because it use security::remove_XSS with one more parameter STUDENT
  • 14.20 Mb Require_once added + Security::remove_XSS($var, STUDENT);

As we know when we add the STUDENT parameter it will load the HTML.SafeObject parameter

IF we add 10 security::remove_XSS($var, STUDENT)
  • 19.66Mb
IF we add 30 security::remove_XSS($var, STUDENT)
  • 27.50 Mb
IF we add 30 security::remove_XSS($var);
  • 9.09 Mb
IF we add 10 security::remove_XSS($var);
  • 9.07 Mb

#5 Updated by Julio Montoya almost 4 years ago

I create a thread in their forum http://htmlpurifier.org/phorum/read.php?3,4718

#6 Updated by Julio Montoya almost 4 years ago

  • % Done changed from 10 to 20

Using the require 'HTMLPurifier.includes.php'; file instead of the HTMLPurifier.auto.php does not help at all.
The INSTALL file here http://htmlpurifier.org/live/INSTALL said that is recommended for better performance.
But I test it with the "test 1" and it needs 29 Mb. No need to add a new row.

#7 Updated by Julio Montoya almost 4 years ago

When we see the table is clear that the problem is the HTML.safeobject is incredible that this parameter changes everything and start eating memory.
The Standalone version does not help at all with the memory performance.

#8 Updated by Julio Montoya almost 4 years ago

In 1.8.6.2 This problem doesn't exist because there was no filtering in the content, so no call to the security::remove_XSS and no SafeObject and no problem at all.

#9 Updated by Julio Montoya almost 4 years ago

I see that we call the security::remove_XSS many times more than 20 in this file by example
http://code.google.com/p/chamilo/source/browse/main/forum/viewthread_flat.inc.php?repo=classic

#10 Updated by Ivan Tcholakov almost 4 years ago

Security::remove_XSS(...) now is:

    public static function remove_XSS ($var,$user_status=ANONYMOUS) {
        $purifier = new HTMLPurifier(null,$user_status);
        if (is_array($var)) {
            return $purifier->purifyArray($var);
        } else {
            return $purifier->purify($var);
        }

    }

On every call of this method a new instance of the class HTMLPurifier is created. When the call is completed, the instance of HTMLPurifier is destroyed and I suppose that the PHP's garbage collector releases the correspondent memory not immediately, but with some delay. On the other hand creation/destruction of the same instance of HTMLPurifier 20 times in the mentioned page maybe is not good for performance.

Suggestion: Let us create the needed instance of HTMLPurifier once and destroy it when the page is loaded. I don't know whether this suggestion would help about the problem here, but it worths to be tried. Here is the modification, it is trivial:

    public static function remove_XSS ($var,$user_status=ANONYMOUS) {
        static $purifier = array();
        if (!isset($purifier[$user_status])) {
            $purifier[$user_status] = new HTMLPurifier(null,$user_status);
        }
        if (is_array($var)) {
            return $purifier[$user_status]->purifyArray($var);
        } else {
            return $purifier[$user_status]->purify($var);
        }

    }

#11 Updated by Ivan Tcholakov almost 4 years ago

A notification: I have just opened another task Feature #1633 that is about transfering two small encoding-related customizations from the previous version of HTMLPurifier. It is minor, I am going to implement it.

#12 Updated by Ivan Tcholakov almost 4 years ago

I have read the document http://htmlpurifier.org/live/INSTALL . It is written inside that if the used encoding is non-UTF-8, it should be set in HTMLPurifier's configuration. Since it is possible Chamilo to be set to be non-UTF-8 system, here is the actual proposal:

    public static function remove_XSS($var, $user_status = ANONYMOUS) {
        static $purifier = array();
        if (!isset($purifier[$user_status])) {
            $config = HTMLPurifier_Config::createDefault();
            $config->set('Core.Encoding', api_get_system_encoding());
            $config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
            $purifier[$user_status] = new HTMLPurifier($config, $user_status);
        }
        if (is_array($var)) {
            return $purifier[$user_status]->purifyArray($var);
        } else {
            return $purifier[$user_status]->purify($var);
        }
    }

#13 Updated by Ivan Tcholakov almost 4 years ago

The previous, shorter proposal is better, it should be tried:

    public static function remove_XSS($var, $user_status = ANONYMOUS) {
        static $purifier = array();
        if (!isset($purifier[$user_status])) {
            $purifier[$user_status] = new HTMLPurifier(null, $user_status);
        }
        if (is_array($var)) {
            return $purifier[$user_status]->purifyArray($var);
        } else {
            return $purifier[$user_status]->purify($var);
        }
    }

Why? I saw that there is customized code in the file .../library/HTMLPurifier/HTMLPurifier.php, in the method __construct(). The system encoding and the used document type are set correctly inside __construct(). It is enough the instance of HTMLPurifier just to be created using the simpler way.

Ok, this is from me so far.

#14 Updated by Julio Montoya almost 4 years ago

I'm updating the table above with Ivan's proposal

Ivan Tcholakov wrote:

The previous, shorter proposal is better, it should be tried:

[...]

Why? I saw that there is customized code in the file .../library/HTMLPurifier/HTMLPurifier.php, in the method __construct(). The system encoding and the used document type are set correctly inside __construct(). It is enough the instance of HTMLPurifier just to be created using the simpler way.

Ok, this is from me so far.

#15 Updated by Julio Montoya almost 4 years ago

  • % Done changed from 20 to 50

With Ivan's proposal + the HTML.safeobject it works fine. Between 14 - 15 Mb for memory usage instead of 25Mb (With no static + safeObject)for Test.1

Changing the HTMLPurifier to static will get Chamilo a little bit "heavier" 1mb more (see table results), but when using/calling the function (security::remove_XSS) many times it won't grow up to 25Mb-30Mb.

Thanks Ivan for your proposal.

Ivan Tcholakov wrote:

The previous, shorter proposal is better, it should be tried:

[...]

Why? I saw that there is customized code in the file .../library/HTMLPurifier/HTMLPurifier.php, in the method __construct(). The system encoding and the used document type are set correctly inside __construct(). It is enough the instance of HTMLPurifier just to be created using the simpler way.

Ok, this is from me so far.

#16 Updated by Yannick Warnier almost 4 years ago

OK Julio, nice debugging table. You make me proud :-)

Ambush commander answered your post on the forum. Apparently the idea is to only filter the forum once. He recommends putting the filtered version in the cache, so it isn't processed twice. If this is easy to do (and you can put the cache in the archive/cache/ directory), then I would go for that solution.

#17 Updated by Julio Montoya almost 4 years ago

So we should create a new cache system in Chamilo to manage htmlpurifier. That will be a new feature.
I proposed to close this task changing to static the htmlpurifier object creation and create a new task proposing a caching system for Chamilo.

Yannick Warnier wrote:

OK Julio, nice debugging table. You make me proud :-)

:)

Ambush commander answered your post on the forum. Apparently the idea is to only filter the forum once. He recommends putting the filtered version in the cache, so it isn't processed twice. If this is easy to do (and you can put the cache in the archive/cache/ directory), then I would go for that solution.

#18 Updated by Ivan Tcholakov almost 4 years ago

Without wish to interfere in the decision-making process, here I would like to share my interpretation of Ambush commander's recommendations. He says:

You shouldn't be running HTML Purifier on display: you should cache its results so you only need to call the purifier once. Check out http://htmlpurifier.org/docs/enduser-slow.html

Let us have a look at the document http://htmlpurifier.org/docs/enduser-slow.html . It recommends two ways for speeding up the HTMLPurifier:

1. Using inbound filtering only, if the page gets too slow. This means, when the page is to be displayed, Security::removeXSS() (which calls HTMLPurifier inside) should not be applied. Security::removeXSS() should be applied on all the "dangerous" fields within a form only when user makes changes submits them (clicking on a button "OK", "Save", etc.).

I think, this advice is applicable for the most heavy pages. For the light pages (not so many calls of Security::removeXSS()) the extra-filtering on displaying data (without editing) may stay as it is, just in case.

2. Caching the filtered output. According to my understanding, behind every "controlled" field in the form there should be two fields in the corresponding database record. These two database fields contain the unaltered version of data (which should be used as initial data on editing) and filtered data (which should be used when data is displayed).

Well, I think this advice is not applicabe (and maybe is not relevant everywhere) in our system. This way is too complex and involves many changes, including in the database structure. Noo. :-)


I think, that removing some Security::removeXSS() calls for forum homepage an other identified "heavy" pages which display (not edit) data is enough for closing this task. Opening a separate task is not needed.

#19 Updated by Julio Montoya almost 4 years ago

  • % Done changed from 50 to 60

Well I proposed something like that here #1400
But it will be a problem when doing a migration to the 1.8.8.
Maybe just testing this feature for the forum? It will be a partial fix.

In the forum code there are many remove_XSS function that could be replace with intvals.

Ivan Tcholakov wrote:

Without wish to interfere in the decision-making process, here I would like to share my interpretation of Ambush commander's recommendations. He says:
[...]
Let us have a look at the document http://htmlpurifier.org/docs/enduser-slow.html . It recommends two ways for speeding up the HTMLPurifier:

1. Using inbound filtering only, if the page gets too slow. This means, when the page is to be displayed, Security::removeXSS() (which calls HTMLPurifier inside) should not be applied. Security::removeXSS() should be applied on all the "dangerous" fields within a form only when user makes changes submits them (clicking on a button "OK", "Save", etc.).

I think, this advice is applicable for the most heavy pages. For the light pages (not so many calls of Security::removeXSS()) the extra-filtering on displaying data (without editing) may stay as it is, just in case.

2. Caching the filtered output. According to my understanding, behind every "controlled" field in the form there should be two fields in the corresponding database record. These two database fields contain the unaltered version of data (which should be used as initial data on editing) and filtered data (which should be used when data is displayed).

Well, I think this advice is not applicabe (and maybe is not relevant everywhere) in our system. This way is too complex and involves many changes, including in the database structure. Noo. :-)


I think, that removing some Security::removeXSS() calls for forum homepage an other identified "heavy" pages which display (not edit) data is enough for closing this task. Opening a separate task is not needed.

#20 Updated by Julio Montoya almost 4 years ago

Security::remove_XSS remove here (only for $_GET values forum ids and thread_ids)
http://code.google.com/p/chamilo/source/detail?r=f49c82641d5693c6697c472385996fbcd5e8c7cf&repo=classic

#21 Updated by Yannick Warnier over 3 years ago

OK, let's keep it that way until just after 1.8.7.1. I'm not in favor of database changes just for security purposes, but I'm OK with a heavy upgrade process.

#22 Updated by Yannick Warnier over 3 years ago

  • Target version changed from 1.8.7.1 to 1.8.8 stable

#24 Updated by Julio Montoya over 3 years ago

we can close this task?

#25 Updated by Yannick Warnier over 3 years ago

  • Status changed from Assigned to Bug resolved

Yes, it is pretty much resolved now. Well done!

Also available in: Atom PDF