Using AntiSamy to protect your CFM pages from XSS hacks

Posted by Dan on Jan 3, 2008 @ 4:05 PM

I recently posted about a new open source Java project called AntiSamy, which allows you to protect your websites from XSS hacks. I also promised that I'd soon post some code examples showing how you can use AntiSamy within ColdFusion.

I've only tested this code under ColdFusion 8. It should theoretically work on any ColdFusion installation, provided your JDK version supports the precompiled AntiSamy code (which is compiled for Java v1.5).
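
If you're not sure which Java version your ColdFusion server runs on, a quick check like the one below can help. This is a minimal sketch that only reads standard JVM system properties; the variable names are made up for illustration, and it says nothing about whether AntiSamy itself will load.

```cfml
<cfscript>
    // read standard JVM system properties through java.lang.System
    jvm = createObject("java", "java.lang.System");
    javaVersion = jvm.getProperty("java.version");
    javaVendor = jvm.getProperty("java.vendor");
</cfscript>

<cfoutput>
    ColdFusion is running on Java #htmlEditFormat(javaVersion)# (#htmlEditFormat(javaVendor)#)
</cfoutput>
```

The reported version should be 1.5 or higher for the precompiled JAR to be usable.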

Before you can actually use AntiSamy, there are a few quick steps you need to take.

  1. Download AntiSamy from its home on Google Code. You'll want to download the most recent version of the precompiled JAR file. The file name should look something like antisamy-bin.1.0.jar.
  2. You'll also want to download all of the AntiSamy XML policy files listed on the page. They provide good default policies and a solid starting point if you want to tweak your own custom policies. The example code below uses the antisamy-slashdot.xml policy file, so at minimum you will need to download that one.
  3. Copy the antisamy-bin.x.y.jar file to your ColdFusion8\wwwroot\WEB-INF\lib\ folder.
  4. Restart the ColdFusion 8 Application Server service.
  5. Create a sub-folder somewhere on your development server called /AntiSamy.
  6. Create a blank CFM file in the /AntiSamy folder called antisamy.cfm.
  7. Inside the /AntiSamy folder, create a new sub-folder called /policy.
  8. Copy all of the AntiSamy policy files you downloaded in step 2 into the /AntiSamy/policy/ folder.
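
Before going further, it can be worth confirming that ColdFusion can actually find the class. The snippet below is a rough sanity check, not part of AntiSamy: the variables antiSamyLoaded and loadError are made up for illustration. It only confirms that the class can be located; problems with other libraries AntiSamy relies on may not surface until scan() is called.

```cfml
<cfscript>
    // try to load the AntiSamy class; this fails if the JAR is not on
    // ColdFusion's classpath or the service has not been restarted
    try {
        sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");
        antiSamyLoaded = true;
    }
    catch (any e) {
        antiSamyLoaded = false;
        loadError = e.message;
    }
</cfscript>

<cfif antiSamyLoaded>
    <p>AntiSamy class found.</p>
<cfelse>
    <cfoutput><p>AntiSamy could not be loaded: #htmlEditFormat(loadError)#</p></cfoutput>
</cfif>
```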

AntiSamy is now installed and it's ready for use, but how do we use it?

Initializing an instance of the AntiSamy class is very easy; we just need this line:

// get an instance of the AntiSamy class
sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");

There are a number of methods available, but the main one you'll be concerned with is the scan() method. (For a list of all available methods, you'll want to download the AntiSamy Javadocs file.)

In our example, we're going to pass two arguments to the scan() method: the string to scan and the path to the XML policy file.

// sanitize the user's input
results = sanitizer.scan(input, policyFile);

After running the scan() method, the next step is to actually get the "cleaned" results. This will be a string that contains an XSS-free version of the original input.

// get the cleaned output
output = results.getCleanHTML();

So now let's combine all this code into an actual working example. Open up the antisamy.cfm file you created in step 6 with your favorite editor and follow the instructions below.

The first thing we'll need to do is create an HTML string which contains some XSS hacks. The example below contains two very simple examples of XSS hacks. The second XSS example is something that will only work in Internet Explorer. This code doesn't do anything malicious—but it does show off techniques that could be used to do very malicious things.

<!---// create a string containing HTML with XSS hacks //--->
<cfsavecontent variable="sBadInput">
    <script>alert('xss 1');</script>
    <div style="background:url('javascript:alert('xss 2')')">Some bad HTML!</div>
</cfsavecontent>

Now that we have some HTML with some XSS text in it, we can finally get to the good stuff—actually cleaning the code with AntiSamy.

<cfscript>
    // define the AntiSamy XML policy to use for cleaning the output
    sPolicyXml = expandPath("./policy/antisamy-slashdot.xml");
    // get an instance of the AntiSamy class
    sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");
    // sanitize the user's input
    results = sanitizer.scan(sBadInput, sPolicyXml);
    // get the cleaned output
    sSafeHtml = results.getCleanHTML();
</cfscript>

The variable sSafeHtml will contain the cleaned version of the HTML from the sBadInput variable. The next thing we'll want to do is output both variables so we can see the changes.

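<h3>Bad Input</h3>
<cfoutput><pre>#htmlEditFormat(sBadInput)#</pre></cfoutput>
<h3>Cleaned Input using AntiSamy</h3>
<cfoutput><pre>#htmlEditFormat(sSafeHtml)#</pre></cfoutput>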

Now save the file and run the code in your browser. If all goes well, you should see something that looks like:

Bad Input
<script>alert('xss 1');</script>
<div style="background:url('javascript:alert('xss 2')')">Some bad HTML!</div>

Cleaned Input using AntiSamy
<div style="">Some bad HTML!</div>

As you can see, AntiSamy was able to successfully clean out all of the XSS attacks that were embedded into the HTML.

Next, let's add the following code to your template:

<!---// output the bad HTML, this will generate JavaScript alerts //--->
<cfoutput>#sBadInput#</cfoutput>
<!---// now output the clean HTML, which is protected from XSS hacks //--->
<cfoutput>#sSafeHtml#</cfoutput>

If you save the changes and run your template again, you'll now see either 1 or 2 alerts, depending on whether or not you're running Internet Explorer. The alerts come from the raw sBadInput output; the cleaned sSafeHtml output triggers none.

If you want to see what errors were actually caught by AntiSamy, the results object has a method called getErrorMessages(), which returns an array of all the errors reported by the AntiSamy filter.

<cfdump var="#results.getErrorMessages()#" label="AntiSamy Error Messages" />
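
If you'd rather show the messages in a plain list than in a dump, something like the following should work. It's a sketch that assumes the value returned by getErrorMessages() behaves like a standard Java list (so size() and get() are available); the variable names errors and errorCount are made up for illustration.

```cfml
<cfscript>
    // assumption: getErrorMessages() returns a Java list, read via size()/get()
    errors = results.getErrorMessages();
    errorCount = errors.size();
</cfscript>

<cfoutput>
    <p>AntiSamy reported #errorCount# problem(s).</p>
    <ul>
        <cfloop from="1" to="#errorCount#" index="i">
            <!---// Java lists are zero-based, ColdFusion loops are one-based //--->
            <li>#htmlEditFormat(errors.get(javaCast("int", i - 1)))#</li>
        </cfloop>
    </ul>
</cfoutput>
```

The messages are escaped with htmlEditFormat() because they may quote parts of the original, untrusted input.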

While many developers feel like XSS isn't really an issue for them, if you're using any kind of Rich Text Editor on your site (such as FCKEditor, TinyMCE, etc) then your site is at risk for XSS attacks. All it takes is a malicious user to post some raw HTML containing an XSS attack for you to be vulnerable.
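
For a rich text field, the cleaning step could be wrapped in a small helper and applied when the form is submitted. This is only a sketch: sanitizeHtml() and the form field commentBody are hypothetical names, and the policy path assumes the folder layout from the steps above.

```cfml
<cfscript>
    // hypothetical helper: returns the AntiSamy-cleaned version of a string
    function sanitizeHtml(input) {
        var policyFile = expandPath("./policy/antisamy-slashdot.xml");
        var sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");
        var results = sanitizer.scan(arguments.input, policyFile);
        return results.getCleanHTML();
    }

    // form.commentBody is a hypothetical field posted by a rich text editor
    if (structKeyExists(form, "commentBody")) {
        cleanBody = sanitizeHtml(form.commentBody);
    }
</cfscript>
```

What you then do with cleanBody (store it, display it, or both) depends on your application.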

As you can see, AntiSamy provides you a very quick way to protect yourself from these XSS attacks and it works great with ColdFusion!

Categories: HTML/ColdFusion, Java, Source Code


  • Quick question for you....what is the best time to use this? I think it's fantastic...but we wouldn't do this in data inserts that are already cfqueryparamed....right?

    Would it be on any text outputs from a database, or just a form input that displays on the following page....

    Your thoughts? i.e., your idea of how you'd actually use it?

  • @Eric:

    The <cfqueryparam/> tag will protect you from SQL injection, but not from an XSS hack. An XSS attack by definition is a user exploiting holes in a website that allow them to post malicious code that gets rendered by the browser.

    A common example would be a comment form (much like on this site) that displays output supplied by a user. If you do not do something to explicitly prevent the XSS hacks, you're exposing your site to be attacked.

    A common hack would be to use JavaScript to send the contents of the visitor's cookie to another site. So, if you had a site where you stored personal information in a cookie, a malicious XSS attack could transparently send the cookie contents of anyone visiting the page with JS enabled to the attacker's site.

    Does that help?
  • I think so, we otherwise clean some input but not nearly as clean as this, and you are right about XSS.
    My question was geared towards: do we samy every firstname, lastname, address1, address2, etc field since it can be displayed later during searches or admin functions in our apps.  My inclination is yes, and that the performance penalty is slight, considering this probably runs faster and better than what we do.
  • Eric,

    This is a tool you want to use any time you have data that will be output and will not be wrapped with HTMLEditFormat(). If you have simple firstname, lastname, etc, you should be wrapping that output with HTMLEditFormat() to prevent XSS attacks and to present the user the data in the exact format they entered. This also has the bonus of stopping XSS attacks on the cheap.

    Something like AntiSamy only needs to be used when you allow the user to enter HTML tags. This data cannot be wrapped with HTMLEditFormat() and thus has to be cleaned of XSS attacks.

    I typically use HTMLEditFormat() on output to preserve the raw data in the database. In this case, I would use AntiSamy on data input/updates in order to preserve clean, XSS-free data in the database!

    I hope that helps!
  • @Eric:

    As Jeffrey pointed out, the htmlEditFormat() and htmlCodeFormat() functions provide adequate protection from XSS since they escape anything that could be interpreted as HTML--at least I'm not aware of any XSS attacks that work around the encoding these functions do.

    I always wrap user input in the htmlEditFormat() function.

    AntiSamy is really only needed when you allow HTML input from the user--either by allowing them to type tags in manually or when you're using a rich text editor.
  • is there a fix for non-latin encodings?
    I submit utf-8 encoded Cyrillic and I get back ???? marks :-(
  • @Serge:

    In order to get ColdFusion to correctly support UTF-8, you must explicitly set the page encoding:

    <cfprocessingdirective pageEncoding="utf-8" />
  • yes, that's already there. #tt# > correct encoding, #antisamy.protect(tt)# > ??? marks.
  • @Serge:

    I'd direct your questions to an AntiSamy-related group. It's possible that AntiSamy isn't i18n ready and quite frankly I've never tested or looked at the code to see if it was written for UTF-8.
  • Hello Dan,
    Thank you. I have already contacted them. It's a known issue and it will be fixed with 1.1 release. So for the moment v1.0 is not ready for international use.
  • @Serge:

    Thanks for posting that update. I'm glad to know it's a problem they're working on fixing.
  • Hi Dan,

    Very informative. I was planning on using jtidy until I saw your post and thought that antisamy looks better.

    I'm using CF7. I have followed your instructions, and in addition, copied the required libs into the same webroot/web-inf/lib folder. When I try to run sanitizer.scan(), I get an error:

    500 org/apache/xml/serialize/XHTMLSerializer

    and processing just stops.

    Did you come across this at all?

    Thanks, Peter
  • Dan - sorted it. I put the antisamy jar in the wrong place.
  • So I read most of this entry and your original AntiSamy post thinking that you were saying AntiSpamy (notice the 'p'). I kept thinking, "what does this have to do with spam prevention?" Doh! :)
  • Just got pointed to this post ;o)

    An easier option would probably be to use JavaLoader to load up the AntiSamy .jar file, so you can use it anywhere you would like.

    Also makes deployment easier as well.

    Just a thought :D Keep up the interesting posts!
  • @Mark:

    I generally use your JavaLoader, but when blogging I try to keep things to the core steps necessary in order to reduce confusion/complexity. (I just wish Adobe would finally officially support your JavaLoader technique within CF!)
  • Great article, I would also recommend storing the antisamy policy file above the root to prevent public access. You may not be able to set a mapping to access this file - I couldn't - so I set it using the request.scope.
  • I'm using CF9, and am getting back a blank page, no errors, just blank. Anyone using this in CF9?

    By the way your "reset" button below needs to go!
  • Instead of just cleaning the input, I'm wondering why you would not, or do you suggest you abort the input? If the user is trying to hack you, why accept their input?
  • @Roger:

    You can certainly add logic to halt execution if user input looks bad, but that's not always the full picture.

    There are a couple of problems with assuming this approach protects you in all cases:

    1. While it's certainly ideal that code only persists data in a single place in your codebase, the reality is that most applications aren't built this way. There are many avenues by which data can get stored in your persistence layer, and they may not funnel through the same business logic as a form you present to a user. For example, you might have a process to import data from an external source. Unless you're running that process through the exact same business logic, you're open to a problem. This is probably more of a concern when the data you're importing was compromised w/out your client realizing it.

    2. An exploit could be broken up over multiple fields, so the input on any one single user input may not look bad, but if you put them together, the output may generate issues.

    As a very simple example, let's assume you have a form that has a first and last name field. A hacker knows that this information is most likely going to be put together on the screen--and may even be flattened together in a computed column or view. So the hacker does something like:

    First name: Dan <script
    Last name: >alert('here'); //

    This might end up passing your initial filters as being valid, but as soon as you tried to output first & last name together, unless you're displaying the string in a safe way, you've left yourself open to an exploit.

    Just some food for thought.
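
    To make the split-field example above concrete, the sketch below joins the two values and escapes the combined string only at output time. The variable names are made up for illustration.

    ```cfml
    <cfscript>
        // hypothetical values, each harmless-looking on its own
        firstName = "Dan <script";
        lastName = ">alert('here'); //";
        // combined, they form an opening script tag
        fullName = firstName & " " & lastName;
    </cfscript>

    <!---// escaping on output renders the markup as plain text //--->
    <cfoutput>#htmlEditFormat(fullName)#</cfoutput>
    ```

    Outputting #fullName# without htmlEditFormat() would hand the browser the assembled script tag instead.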

Comments for this entry have been disabled.