Ticket #766 (closed defect: fixed)

Opened 4 months ago

Last modified 2 months ago

Integrating HTMLPurifier Library into QCubed

Reported by: vaibhav Owned by:
Priority: important Milestone: 2.1
Component: QControls Version: 2.0.2 Stable
Keywords: HTMLPurifier, HTML, XSS, Content Filtering, Security Cc:

Description (last modified by alex94040) (diff)

Related thread:  http://qcu.be/content/safeguard-your-qcubed-based-app-xss-htmlpurifier-integrated-qcubed

Well, the first thing - I have marked this to the milestone of 3.0.0 because I DO THINK that this is a huge improvement / addition to QCubed (that may be perhaps coz I am really happy :) ). If an admin thinks it should be a bit lower, I would have no problems either.

There are two files attached. One is enumerations file and the other is the TextBox? base class file. Go through them or create a patch to see the differences. I did not create a patch because
1. I usually mess up with that, specially in QCubed.
2. The version of file on the server and mine might differ coz I do not apply all patches from here.

Basic changes I have done are described below:

=== to _enumerations.inc.php ====

[Add two new consts Purify and PurifyCustom? to the class]:

abstract class QCrossScripting {

const Allow = 'Allow';
const HtmlEntities? = 'HtmlEntities?';
const Deny = 'Deny';
const Purify = 'HTMLPurifierXSSProtectionConfig';
const PurifyCustom? = 'HTMLPurifierEmptyConfig';

}

/////////////////////

=== to QTextBoxBase.class.php ====

[create function]:

public function SetPurifierConfig?($Parameter, $Value) {

$this->HTMLPurifierConfig->set($Parameter, $Value);

}

///////////////////////

[Add switch cases to crossscripting in the validator function]:

case QCrossScripting::Purify:

case QCrossScripting::PurifyCustom?:

// let HTMLPurifier do the job! User should have set it up or be happy with the defaults!
$strText = strtolower($this->strText);
$objPurifier = new HTMLPurifier($this->HTMLPurifierConfig);
$this->strText = $objPurifier->purify($this->strText);

/////////////////////////

[Modified the set function's case of CrossScripting?]

case "CrossScripting?":

try {

$this->strCrossScripting = QType::Cast($mixValue, QType::String);
// Protect from XSS to the best we can do with HTMLPurifier.
if($this->strCrossScripting == QCrossScripting::Purify)
{

/**

  • If we are purifying using HTMLPurify, we will need the autoloader be included.
  • We load lazy to make sure that the library is not loaded everytime 'prepend.inc.php'
  • or 'qcubed.inc.php' is inlcuded. HTMLPurifier is a HUGE and SLOW library. Lazy loading
  • keeps it simpler. */


require_once(INCLUDES .'/external_libraries/htmlpurifier/library/HTMLPurifier.auto.php');


/**

  • We configure the default set of tags (elements) and allowed here
  • (in the setter function) so that the rules are applicable the moment
  • CrossScripting? is set to 'Purify'.
  • If you want to allow any of the tags or attributes blocked below,
  • Please use QCrossScripting::PurifyCustom? and decide all for yourself. */

$this->HTMLPurifierConfig = HTMLPurifier_Config::createDefault();
$this->HTMLPurifierConfig->set('HTML.ForbiddenElements?', 'script,applet,embed,style,link,iframe,body,object');
$this->HTMLPurifierConfig->set('HTML.ForbiddenAttributes?', '*@onfocus,*@onblur,*@onkeydown,*@onkeyup,*@onkeypress,*@onmousedown,*@onmouseup,*@onmouseover,*@onmouseout,*@onmousemove,*@onclick');

}
// Just allow the user to use HTMLPurifier. He decides what HTMLPurifier should do.
if($this->strCrossScripting == QCrossScripting::PurifyCustom?)
{

// If we are purifying using HTMLPurify, we will need the autoloader be included.
require_once(INCLUDES .'/external_libraries/htmlpurifier/library/HTMLPurifier.auto.php');


/**

  • We configure nothing, leaving all to the good wish of the user.
  • It will create a blank / default config only. */

$this->HTMLPurifierConfig = HTMLPurifier_Config::createDefault();

}
break;

} catch (QInvalidCastException $objExc) {

$objExc->IncrementOffset?();
throw $objExc;

}

==============

That is all. To use the facilities provided, you need to use the crossscripting enums as:

$this->txtSomeTextBox->CrossScripting? = QCrossScripting::Purify;
OR
$this->txtSomeTextBox->CrossScripting? = QCrossScripting::Purify;

and use the SetPurifierConfig? function to add other configurations you want:

$this->txtSomeTextBox->SetPurifierConfig?('Core.Encoding', 'UTF-8');
$this->txtSomeTextBox->SetPurifierConfig?('AutoFormat?.AutoParagraph?', true);
$this->txtSomeTextBox->SetPurifierConfig?('HTML.Allowed', 'p,b,strong,i,em,u,i,code,img[src]');

====

Attachments

_enumerations.inc.php Download (4.1 KB) - added by vaibhav 4 months ago.
Modified _enumerations.inc.php file. Removed PurifyCustom? enumeration.
purifier.patch Download (2.7 MB) - added by alex94040 4 months ago.
Merged patch
QTextBoxBase.class.php Download (17.5 KB) - added by vaibhav 4 months ago.
Modified QTextBoxBase.class.php file. changed the file path of HTMLPurifier library's autoloader to variable defined in configuration.inc.php
configuration.inc.php Download (12.8 KB) - added by vaibhav 4 months ago.
Modified configuration definition file.

Change History

  Changed 4 months ago by vaibhav

Made a mistake above (copy - paste he he :P ):

To use the facilities provided, you need to use the crossscripting enums as:
$this->txtSomeTextBox->CrossScripting?? = QCrossScripting::Purify;
OR
$this->txtSomeTextBox->CrossScripting?? = QCrossScripting::PurifyCustom?;

and ... {the rest is ok }

  Changed 4 months ago by alex94040

  • status changed from new to in_QA
  • milestone changed from 3.0.0 Development to 2.2

Thanks so much for doing this! One initial thought: by default, the new purifier should be enacted (as opposed to requiring the developer to enable it). This will allow developers to pick the benefits of the new amazing HTML Purification in their existing apps after they upgrade the framework.

  Changed 4 months ago by vakopian

Very nice vaibhav, thank you.
A couple of necessary changes:

  • Please use the existing naming convention. Namely, variables should have a prefix indicating their type. For example, use $strVariable instead of $Variable, $objHTMLPurifierConfig instead of $HTMLPurifierConfig, etc.
  • Please keep the coding style for spaces, for example there should be a space between the "if" statement and the opening "(". The opening "{" should also be on the same line (typically).
  • It would be nice to have PHPDoc on the methods (e.g. SetPurifierConfig).

Now more substantive:

  • I'm not sure I see the need for the second option "PurifyCustom". I think Purify is enough. The method SetPurifierConfig() can still be used to apply any customization the developer needs. The fact that the two branches in the switch statement in ParsePostData() do the same thing is an indication of this. Yet another indication is the fact, that nothing in the current code prevents me from setting the mode to Purify, then calling SetPurifierConfig() to customize as much as I want. Which is exactly the right behaviour, but it shows that PurifyCustom is redundant.
  • I disagree with Alex that the new mode should be the default. First because that would not be backward compatible and could break legitimate uses in existing applications. And second, because as vaibhav mentioned in the forum post, there is a large cost associated with enabling this functionality, which we should not impose on the developers that don't need it.

  Changed 4 months ago by alex94040

Around enabling this by default vs not: well, if we think cross-site scripting verification is too expensive, we should not be implementing it this way. In my opinion, if I can turn this on for every single textbox on my site, there's no point in using it at all (as the attacker can use the one textbox that is not protected).

I'm ready to be convinced otherwise :)

  Changed 4 months ago by alex94040

  • description modified (diff)

  Changed 4 months ago by vakopian

Alex, that's a good point, but I think it has a better solution: we add a static variable in QTextBox called DefaultCrossScriptingMode (which is set to the current default). Then the constructor initialises the instance property:

$this->strCrossScripting = self::$DefaultCrossScriptingMode;

That way we keep backward compatibility, but allow you set the the default for all your textboxes in one shot:

QTextBox::SetDefaultCrossScriptingMode(QCrossScripting::Purify);

What do you think?

  Changed 4 months ago by vaibhav

@Alex: Yes, I think that the new mode should not be the default. My arguments for that:

1. No developer loves it when he upgrades the framework and his whole creation _SIMPLY FALLS APART_ . This new addition has a lot of potential for that. I could have done that in the code but I did not because I do not want that to happen. Example? Well, I do not think an XSS Code is required on a Login input page. SQL Injection countermeasures are already in place and those are all that are needed. Switch XSS protection on there and for someone, it is really gonna get messy.

2. HTMLPurifier IS A HUGE library and the creators themselves say that it is not a fast library because it takes all the input, tears it apart, analyzes bit by bit and reassembles everything back. So that is not very much needed, not at least everywhere.

I think it is just like Microsoft could have converted all the APIs from the standard windowing interface to full AERO based interface and you could have heard all the developers crying for the change. Well, there is a lot of differences between Windows and QCubed but that was just to help realize how a developer feels when APIs change. I really do not recommend the defaults to be changed straight away. Better, let us have a discussion on the forum for that.

@Vakopian:

1. Made the little changes you suggested about coding style.

2. Doc Blocks were already in place. I mean those

/**
* Comment on what the function does.
*/

3. Coming to 'PurifyCustom?'. I did not add this in the beginning but I had to in the end. It appears that the 'HTML.ForbiddenElements?' takes precedence over 'HTML.Allowed' and 'HTML.AllowedElements?'. Same for 'HTML.ForbiddenAttributes?'. Now, if there were only 'Purify' available to the developer, he would be locked in if he wanted to allow the 'script' tag (for example) in his textbox, because HTMLPurifier would not allow him to let those tags get in (and it does not matter in what order you set the HTML.Forbidden* and HTML.Allowed* config parameters, they are evaluated only when the 'purify' mechanism runs. Locking down the options would not really help.

I noticed all that while testing and so decided to add the 'PurifyCustom?' as well. This makes sure that those who do not know too much about XSS can use the built in protection and feel safe and those who know what they are doing could go on with what they want to do.

Hope this helps you understand by design decision.

  Changed 4 months ago by vakopian

vaibhav,

1. Variable names are still not right: Please use $strVariable and $strValue as parameters for SetPurifierConfig. Also HTMLPurifierConfig is not a string, so please use $objHTMLPurifierConfig instead of $strHTMLPurifierConfig.

2. For PHPDoc, I meant @param annotations in SetPurifierConfig, but that's not very important.

3. About PurifyCustom. I may be missing something, but what prevents me from resetting the settings that are set by default when Purify mode is selected. In your example, if I want to allow the 'script' tag, why can't I do this?

$this->txtSometextBox->CrossScripting = QCrossScripting::Purify;
// overwrite the default, omit 'script'
$this->txtSometextBox->SetPurifierConfig('HTML.ForbiddenElements', 'applet,embed,style,link,iframe,body,object');

Doesn't this reset the configuration of 'HTML.ForbiddenElements', and thus allow the 'script' tag?

Changed 4 months ago by vaibhav

Modified _enumerations.inc.php file. Removed PurifyCustom? enumeration.

  Changed 4 months ago by vaibhav

@Vakopian

1. Done

2. Done

3. Yeah, I did not think that way (that someone might just override it)! Removed the enumeration and its effects in the control. Hope that suffices now? :)

  Changed 4 months ago by vakopian

Looks good.
Except I just noticed that before calling purify() you convert the posted data to lower case:

$strText = strtolower($this->strText);
$objPurifier = new HTMLPurifier($this->objHTMLPurifierConfig);
$this->strText = $objPurifier->purify($strText);

With this you can never get anything that has upper cases from the form, which is definitely too limiting and unexpected. Is this required by HTMLPurifier?

Another question. Is the recommended way of purifying to always create a new HTMLPurifier object every time? How expensive is that instantiation? Can that object be re-used?

BTW, HTMLPurifier seems to have a lot of documentation, but I found it very unhelpful to answer even simple questions that arise...

  Changed 4 months ago by vaibhav

Corrected the strtolower thing. Thanks for pointing out. That wasn't required. In fact I had thought to make it work somewhat like the default 'deny' case just below the Purify case. That was a copy paste which I forgot to delete in the end. My bad!

As far as the configuration object is concerned, creating it has minimal overhead. I went to the file where it gets created and I believe it is pretty light. That object can be reused but I read it somewhere on the HTMLPurifier forums once (when I ran into problems) is that its better to create a new config each time you are putting HTMLPurifier to work (I am not very sure how good that is, I never went deep into HTMLPurifier's code). So I decided to go with what was recommended and anyway it wouldn't hurt much for the creation is quite fast. I did not see any difference between old object and a fresh object for configuration when auditing (did with Chrome). You might want to use advacned tools if you notice any delays.

Its the actual purification call that creates the great overhead. More strict the rules in your config, larger the load. Although HTMLPurifier also does HTML validation and outputs a W3 compliant HTML so it is heavy even with normal config.

for the 'unhelpfulness' - yes, it is not too helpful in the beginning and moreover HTMLPurifier isn't perfect either. It still has a few iussues which they are trying to solve. They aren't bugs by the way, just added functionalities. Go through the configuration doc and read it a few number of times (yeah, I don't get most things the first time either :( ) and you would get it :) . I did not find another library with functionality this close so going with HTMLPurifier alone.

  Changed 4 months ago by vakopian

Now there seems to be a small bug in calling purify($strText): you should call purify($this->strText), I don't think $strText is defined in that context any more (since you deleted the strtolower call).
Didn't PDT put a red warning on that? Just kidding...:-)

  Changed 4 months ago by alex94040

QTextBox::SetDefaultCrossScriptingMode?(QCrossScripting::Purify) <--- Outstanding idea!!!

follow-up: ↓ 17   Changed 4 months ago by alex94040

Or, even, instead of the new method, we can just have a static variable in the QTextBox class that the user can override.

  Changed 4 months ago by vaibhav

@Vakopian: I make so many small mistakes that I am feeling like I should kill myself. And PDT does not do the 'red' underline thing. PHPStorm does? :o

  Changed 4 months ago by vakopian

yes PHPStorm does that, and so much more ... :-)
Thanks for the updated version.

in reply to: ↑ 14   Changed 4 months ago by vakopian

Replying to alex94040:

Or, even, instead of the new method, we can just have a static variable in the QTextBox class that the user can override.

If you mean make QCrossScripting::$DefaultCrossScriptingMode public, so that it can be set directly without a setter method, I don't mind that. But if you mean we just let the user change the QTextBox.class.php file to redefine the value, I don't really like it, since I don't think such a simple setting should necessitate a change in the framework files. BTW, maybe this setting should be put in QApplication instead of QTextBox, since it's already used to set up several configuration settings.

  Changed 4 months ago by vaibhav

Agreed. I think using QApplication for keeping the setting is best keeping in mind the backward compatibility and stuff. +1 for that.

Also, I downloaded a copy of PHPStorm from somewhere with a License key, used it for an hour or so and I think PDT does not stand a chance against this one. Really a very very great tool!

Changed 4 months ago by alex94040

Merged patch

  Changed 4 months ago by alex94040

Uploaded a merged patch file that:
1) Includes the previous patches
2) Has an example that explains XSS and talks about Purifier options
3) Includes the actual Purifier library + hooks it up appropriately into QCubed

The patch file is HUGE (because the purifier library is large).

  Changed 4 months ago by alex94040

  • milestone changed from 2.2 to 2.1

I think this is close enough that we can try to push this for 2.1.

  Changed 4 months ago by vakopian

Alex,
On line 490 in QTextBoxBase.class.php the message in the exception still refers to QCrossScripting::PurifyCustom, could you please remove that before committing.
Also, since usually for enum constants we use the constant names for values, it would probably be preferable to set the value of Purify to 'Purify' in _enumerations.inc.php.

I see that you used the string value in QApplicationBase to define the $DefaultCrossScriptingMode, and I assume that's because _enumerations.inc.php is not available in QApplicationBase. If that's the case maybe moving $DefaultCrossScriptingMode back to QTextBoxBase and using the enum constant would be better.

I still don't think Purifier should be the default. Now that setting the default is easy, there is no reason to break backward compatibility, we should just advertise the feature so people are aware of it. In fact perhaps we could put it as part of config_checker with a strong encouragement to use the new feature.

Changed 4 months ago by vaibhav

Modified QTextBoxBase.class.php file. changed the file path of HTMLPurifier library's autoloader to variable defined in configuration.inc.php

Changed 4 months ago by vaibhav

Modified configuration definition file.

  Changed 4 months ago by vaibhav

NOTE:

I have uploaded a new version of QTextBoxBase.class.php wherein the require_once call int he 'setter' function is changed to this:

require_once(HTMLPURIFIER_AUTOLOADER);

The variable is, in turn defined in configuration.inc.php. I think this is going to be more flexible and help a developer change the path of the HTMLPurifier library from the default for some reason he might want, without having to go too deep into internals (which is obviously difficult when you start off with the framework).

hence added these lines to configuration.inc.php:

// Autoloader file for HTMLPurifier

define('HTMLPURIFIER_AUTOLOADER', INCLUDES .'/external_libraries/htmlpurifier/library/HTMLPurifier.auto.php');

Also, I added a line of credit to myself in the QTextBoxBase.class.php at the top, saying this:

HTMLPurifier Integration done to the core by Vaibhav kaushal (jan 15, 2012)

I hope the community does not mind :|

  Changed 4 months ago by vaibhav

Hey, Since we are using configuration.inc.php for all configuration stuff, why don't we define it (QTEXTBOX_PURIFICATION_MODE) just there? Will be easy to find for someone someday (it, after all is a 'configuration' option, and a big one in a way).

  Changed 4 months ago by alex94040

I don't think we should worry about having the developer store the HTML Purifier library separately from QCubed. HTML Purifier now becomes a core part of the QCubed distribution, similarly to SimpleTest?. I don't want an average developer to worry about it at all - I want them to be secure by default.

Defaults: I can agree to using the Legacy mode by default, if we put that front and center into the configuration.inc.php file. I do agree that a config checker warning would be a good idea there, too.

  Changed 4 months ago by vaibhav

Is this ticket to be closed now?

  Changed 4 months ago by alex94040

I don't believe the patch has been applied to the core yet. So don't close it yet please.

  Changed 4 months ago by vakopian

Alex, would you like to add some wording to configuration.inc.php and maybe the warning in the config checker?
I could also commit this as is, and you can add that later. Whichever you prefer.

  Changed 4 months ago by alex94040

Great ideas, Vakopian. Wanna add that wording + config checker verification + check it in?

I'm travelling without my laptop :-(

  Changed 4 months ago by vakopian

  • status changed from in_QA to closed
  • resolution set to fixed

(In [1389]) code by vaibhav, reviewed and modified by alex and vakopian. Fixes #766.

  Changed 4 months ago by vakopian

As you guys may have noticed I made some minor changes (renamed the enum to QCrossScripting::HTMLPurifier etc). As agreed I changed the default mode back to legacy (Deny). I also reworked the example to account for that and added two more cases.

Alex feel free to change the wording in the example and elsewhere as you wish to emphasise the recommended way (either with this or with a separate ticket).

  Changed 4 months ago by vakopian

(In [1390]) missed from ticket 766. Refs #766

  Changed 4 months ago by kmeirlaen

Is it just me who is worried about the license?

HTMLPurifier has "GNU LESSER GENERAL PUBLIC LICENSE", which I believe is not compatible with the QCubed License (MIT License).
We had some discussions before on this (see ticket http://trac.qcu.be/projects/qcubed/ticket/266 as an example).

If we include this in the core, the entire QCubed project becomes GNU.

  Changed 4 months ago by vakopian

Isn't it LGPL? Not GPL?
I think LGPL is compatible with MIT.
But if not, I think we should remove it, and let the users set it up if they want to use it.

  Changed 4 months ago by vaibhav

from  http://www.opensource.org/licenses/LGPL-2.1

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.

Also:

1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

Since we are not modifying HTMLPurifier, point number 2 does not apply. All the rest also seem to be compatible.

Also, I have posted a thread at HTMLPurifier's forum:  http://htmlpurifier.org/phorum/read.php?2,6188 You might just want to check it some time later.

Again, they have upgraded the library to 4.4.0. I think we should upgrade too!

  Changed 4 months ago by kmeirlaen

After some more reading...
Yes, indeed, the HTMLPurifier is LGPL. It seems to me that we can include the code in QCubed without breaking the MIT license for the rest of QCubed.
It does place some restrictions to the developers using QCubed though.
Personally, I would rather see only MIT licensed code in core QCubed.

BTW, in the patch I see a conflict of paths:
HTMLPurifier is included in "includes/qcubed/_core/htmlpurifier/", but configuration.inc.php has

define('HTMLPURIFIER_AUTOLOADER', INCLUDES .'/external_libraries/htmlpurifier/library/HTMLPurifier.auto.php');

(extra external_libraries)

overall, nice job though!

  Changed 4 months ago by vaibhav

I recommend please do not put this in the _core. HTMLPurifier is an 'external_library'. Keeping it in the 'external_libraries' will help people identify and find it easily and modifications (if needed) would be easier.

I am also trying to workout with CKEditor support (it will take time, do not expect it soon) and as such others can be added too. CKEditor can be distributed under three licenses, GPL , LGPL and MPL (mozilla). Keeping libraries separately helps ease the licesing issues as well I believe.

Also, I am going to try out the new version of HTMLPurifier. If things go well, i shall report here and then the new one can be included.

Regards

  Changed 4 months ago by vaibhav

I tested, things are working well with HTMLPurifier 4.4.0 as well. No defaults break.

I think it can be included.

@kmeir: Who does not want to have it all under one hood, in this case 'license' but if things are possible, let's merge them without having to reinvent the wheel :)

  Changed 4 months ago by kmeirlaen

I'm all for including HTMLPurifier in "external_libraries" and not in _core.
Vakopian/Alex?: thoughts?

  Changed 4 months ago by vakopian

Either way is fine with me, with a slight preference for "external_libraries". I think Alex had stronger feelings.

  Changed 4 months ago by vaibhav

I am requesting again (just in case it went unnoticed): upgrade to the new version 4.4. There are security improvements.

Regards

  Changed 3 months ago by vaibhav

We will have to mention in the final release that HTMLPurifier, as a 'sub-component' is LGPL licensed. I think that is not a problem at all.

 http://htmlpurifier.org/phorum/read.php?2,6188

Thanks.

  Changed 3 months ago by vaibhav

please move the htmlpurifier library from the _core to includes/external_libraries/ directory.

  Changed 2 months ago by vakopian

(In [1425]) move htmlpurifier from qcubed/_core to external_libraries. Refs #766

Note: See TracTickets for help on using tickets.