Protecting your HTML pages from Spam Harvester Bots

Posted by Dan on Jun 12, 2006 @ 10:46 AM

Today I was reading a post on a message list where someone mentioned they were using a function to generate their mailto: links with HTML entities instead of the ASCII characters, in order to prevent spam harvesters from snagging the e-mail address. The problem with this method is that it would be pretty easy for a harvester to decode the HTML entities back into the correct ASCII characters.
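To show how little work that decoding takes, here is a minimal sketch of what a harvester could do. The decodeNumericEntities() name and the a@b.com sample address are made up for illustration, and it only handles numeric entities (decimal and hex), which is what an entity-encoding function would most likely produce.

```javascript
// Hypothetical helper: turns numeric HTML entities (&#97; or &#x61;) back into characters.
function decodeNumericEntities(html) {
    return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, function (match, code) {
        var isHex = code.charAt(0).toLowerCase() === "x";
        var n = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
        return String.fromCharCode(n);
    });
}

// "a@b.com" written as decimal entities, the way an encoding function might emit it.
var encoded = "mailto:&#97;&#64;&#98;&#46;&#99;&#111;&#109;";
var decoded = decodeNumericEntities(encoded);
// decoded is now "mailto:a@b.com"
```

One regular expression is enough to undo the encoding, so entities alone only stop the laziest bots.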

This got me thinking that the best method would involve using JavaScript to write out the link. In order for a spam harvester bot to parse out the e-mail address, it would have to understand the context of the page. This means it would either have to actually parse the page into a DOM object, run the script, and parse the resulting DOM, or it would have to have specific knowledge of the function and reverse engineer the calls to it. Either method is probably more than what most harvesting bots are going to use (although this could always change).

So, I spent a few minutes and whipped up the following code. The spamProtector() JS function takes in an array of ASCII character codes, which will be used to generate the mailto: links. I obfuscated the document.write() statement by breaking the string into chunks in order to throw off any parsers looking for certain strings.

<script type="text/javascript">
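// Writes an e-mail link for the address passed in as an array of character codes.
function spamProtector(chars){
    // Convert the array of character codes back into a string.
    function merge(c2){
        var s = [];
        for(var i=0; i < c2.length; i++ ){
            s[i] = String.fromCharCode(c2[i]);
        }
        return s.join("");
    }
    var s2 = merge(chars);
    // The markup is split into chunks so the link scheme never appears as one literal string.
    document.write("<"+"a hr"+"ef=\"ma"+"ilt"+"o:"+s2+"\">"+s2+"</"+"a"+">");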
}
</script>

The above function would need to be on any page where you might use it to output an e-mail address. It could definitely be improved upon: you could add options for formatting, etc. You could even change the anchor mailto: code to its ASCII character codes to obscure things even more.
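As a rough sketch of those two improvements, the variant below accepts optional link text and builds the link scheme from character codes as well. The buildProtectedLink() name and its label parameter are hypothetical, not part of the original function. It returns the markup instead of writing it, so you would pass the result to document.write() yourself. The label is assumed to be trusted text, since nothing here escapes it.

```javascript
// Hypothetical variant of spamProtector(): returns the markup and accepts optional link text.
function buildProtectedLink(chars, label) {
    // Character codes for the link scheme (m, a, i, l, t, o, colon).
    var scheme = String.fromCharCode(109, 97, 105, 108, 116, 111, 58);
    // Rebuild the address from its character codes.
    var address = String.fromCharCode.apply(null, chars);
    // Fall back to showing the address itself when no label is given.
    var text = (typeof label === "string" && label.length > 0) ? label : address;
    return "<" + "a hr" + "ef=\"" + scheme + address + "\">" + text + "</" + "a" + ">";
}

// Character codes for the sample address "a@b.com".
var link = buildProtectedLink([97, 64, 98, 46, 99, 111, 109], "Email");
// In a page you would then call: document.write(link);
```

Passing a label such as "Email" keeps the address out of the visible link text, so it only shows up inside the href.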

To make the function easier to use, I also wrote a quick little ColdFusion (CF) function that generates the JS code needed to display the mailto: link. It converts each character of the e-mail address you pass in to its ASCII code and then wraps the resulting list in the required <script> tags.

Insert the UDF below into any CFML template in which you want to use the function. It should be very easy to convert to other languages (PHP, ASP, etc.).

<cffunction name="spamProtector" access="public" returntype="string" output="false">
    <cfargument name="email" type="string" required="true" />
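    <!--- var-scope everything, including the loop index, so nothing leaks out of the function --->
    <cfset var sEmail = "" />
    <cfset var iLen = len(arguments.email) />
    <cfset var i = 0 />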

    <cfloop index="i" from="1" to="#iLen#">
        <cfset sEmail = listAppend(sEmail, asc(mid(arguments.email,i,1))) />
    </cfloop>

    <cfreturn "<script>spamProtector([" & sEmail & "]);</script>" />
</cffunction>

To use the CF function to generate your spam-protected mailto: link, simply use the code <cfoutput>#spamProtector("jsmith@emailaddress.com")#</cfoutput>.
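For anyone not on ColdFusion, here is a sketch of the same generator in server-side JavaScript. The spamProtectorTag() name is made up for this example. It mirrors the UDF by turning each character into its code and wrapping the comma-separated list in a call to the client-side spamProtector() function.

```javascript
// Hypothetical JavaScript port of the CF spamProtector() UDF.
function spamProtectorTag(email) {
    var codes = [];
    for (var i = 0; i < email.length; i++) {
        // charCodeAt() plays the role of CF's asc(mid(email, i, 1)).
        codes.push(email.charCodeAt(i));
    }
    // The closing tag is split so this code can also sit inside an inline script block.
    return "<script>spamProtector([" + codes.join(",") + "]);</" + "script>";
}

var tag = spamProtectorTag("a@b.com");
// tag is now "<script>spamProtector([97,64,98,46,99,111,109]);</script>"
```

The output also shows what the CF function sends to the browser: only a list of numbers, never the address itself.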

Categories: JavaScript, HTML/ColdFusion, Source Code

3 Comments

  • This is what I'm looking for. It should be very helpful. My question is, does having the email address visible on the page affect the spam protection? How could the script be altered to just have "Email" as the link?
  • Doug,

    In order for a spam harvesting bot to actually parse the e-mail address, it would have to be able to actually execute the JS and then parse the resulting page.

    This would take an extremely intelligent bot.

    If it can pick up an address obfuscated this way, then it's going to probably pick up just about any technique you can think of.
  • cool
