Domain name siphoning: how to protect a website


Some time ago I was going through our web logs and noticed a number of requests being served under a domain that did not belong to my employer. It turned out the entire website had been indexed by the search engines under this other domain.

It is simple for anyone to pull this off: all it takes is pointing a DNS entry at another website's IP address, and if the target site does not protect itself, all of its pages will appear under that domain. In a sense, the domain name siphons off the content of another website.
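For illustration, the siphoning side needs nothing more than an ordinary A record. In BIND zone file terms it might look something like the following; the domain name and address are placeholders (203.0.113.10 stands in for the target website's real IP):

; Hypothetical zone entry for a siphoning domain: an A record
; pointing the attacker's name at the target website's IP address.
siphon-example.com.    3600    IN    A    203.0.113.10

Any request to siphon-example.com then lands on the target's web server, which happily serves its normal content unless it checks which host name was actually requested.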

It turned out there were a number of domains doing this to my employer's website. I can only guess at the motivation. It might be a free way to build up SEO keyword relevance before the domain is switched over to other content. It might be the build-up to a social engineering attack. Regardless, it is not a situation you want to allow, for several reasons:

  • Your pages may be found in search engines under the siphoning domain, which would be very confusing to your end users.
  • This is a disaster from an SEO perspective, as the siphoning domain is a complete duplicate of your domain's content.
  • The siphoning domain has the potential to use a lot of bandwidth.

There are probably many ways to deal with this issue, but I decided to use rewrite rules and implemented the concept of authorized domains for a website.

Rewrite rules for Apache's mod_rewrite or Helicon Tech's Ape

I've used Helicon Tech's ISAPI Rewrite, and now Ape, under IIS for years, but I've also tested these rules using mod_rewrite on Apache 2.2.

I implemented two things that should make attackers steer clear of targeting your domain: serve a robots.txt that tells search engines not to index any content, and return an HTTP 410 Gone for every other page request. Together these tell any bot or user that nothing exists on a siphoning domain targeting your website. Valid content is only returned from domains you authorize.

I created a rewrite map that lists the authorized domains. For example, assume the domains www.domain-test1.com and images.domain-test1.com are the domains you use for your site. Create a text file named AuthorizedDomains.txt with the following content (the "-" after each domain is just a placeholder value; the rules below only check whether a domain is present in the map):

www.domain-test1.com -
images.domain-test1.com -

Place a file called robotsDisallow.txt in your website's root with the following content.

User-Agent: *
Disallow: /

The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.

The following rewrite rules should be placed in your httpd.conf file.

RewriteMap  lower int:tolower
RewriteMap AuthorizedDomainsMap txt:AuthorizedDomains.txt

# Serve a robots.txt file which tells search engines not to index unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule ^/robots\.txt$ /robotsDisallow.txt [NC,L]

# Return an HTTP 410 Gone for unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule .? - [G]

For Apache, I put the AuthorizedDomains.txt file in the Apache installation folder on Windows, but you can also specify a full path to the file.
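For example, the RewriteMap line can reference the file by full path; the paths below are placeholders for wherever you actually keep it:

# Windows, with a full path to the map file (placeholder path).
RewriteMap AuthorizedDomainsMap txt:C:/webconfig/AuthorizedDomains.txt

# Linux equivalent (placeholder path).
# RewriteMap AuthorizedDomainsMap txt:/etc/apache2/AuthorizedDomains.txt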

A few notes:

  • This may be obvious, but make sure to thoroughly test these rules. You do not want to accidentally muck up your website.
  • I recommend using authorized domains in your development environment to make sure the rules are working as expected.
  • I used a map file because my team uses a lot of domains in our development environment. This makes managing the authorized domains easier and allows the same httpd.conf to be used in both development and production, with different authorized domain configurations.
  • Make sure to put your authorized domain rules after any rules that redirect subdomains you wish to send to your primary domain(s); see the sketch after this list.
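As a rough sketch of that ordering, a redirect that sends, say, the bare domain to www would come before the authorized domain rules, using the same placeholder domains as above:

# 1. Redirects to the primary domain come first, e.g. sending the bare domain to www.
RewriteCond %{HTTP_HOST} ^domain-test1\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain-test1.com$1 [R=301,L]

# 2. The authorized domain rules shown earlier go here, after any such redirects,
#    so the redirects get a chance to fire before the 410 rule does.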

Rewrite rules for IIS's URL Rewrite Module

You can implement a functionally equivalent set of rules using IIS's URL Rewrite Module. Again, I created a map for the authorized domain lookup, but unlike mod_rewrite, the map is not stored in a separate file. As in the previous example, assume the domains www.domain-test1.com and images.domain-test1.com are your authorized domains.

Place a file called robotsDisallow.txt in your website's root with the following content.

User-Agent: *
Disallow: /

The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.

The following rules go in the website's web.config file.


<rewrite>
   <rules>
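        <!-- Both rules look up {SERVER_NAME} in the "Authorized domains" map below.
             Authorized domains map to "-", so with negate="true" the conditions
             only match requests on unauthorized domains. -->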
       <rule name="robots for unauthorized domain" stopProcessing="true">
           <match url="^robots\.txt$" />
           <conditions logicalGrouping="MatchAll">
               <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
           </conditions>
           <action type="Rewrite" url="/robotsDisallow.txt" />
       </rule>
       <rule name="Authorized domain check" stopProcessing="true">
           <match url=".?" />
           <conditions logicalGrouping="MatchAll">
               <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
           </conditions>
           <action type="CustomResponse" statusCode="410" 
              statusReason="Gone" 
              statusDescription="The requested resource is no longer available" />
       </rule>
   </rules>
   <rewriteMaps>
       <rewriteMap name="Authorized domains">
           <add key="www.domain-test1.com" value="-" />
           <add key="images.domain-test1.com" value="-" />
       </rewriteMap>
   </rewriteMaps>
</rewrite>

Like the mod_rewrite rules, these rules serve a robots.txt that tells search engines not to index any content on an unauthorized domain, and return an HTTP 410 Gone for all other page requests on an unauthorized domain.

Another option is to use host headers in IIS to restrict the domains for a website, but I would treat that as a last resort; host headers in IIS have a number of limitations, so avoid them if you can.
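For reference, restricting by host header amounts to giving each of a site's bindings an explicit host name. A sketch of the bindings element in applicationHost.config, using the placeholder domains from above:

<bindings>
    <!-- Only requests whose Host header matches one of these names reach the site. -->
    <binding protocol="http" bindingInformation="*:80:www.domain-test1.com" />
    <binding protocol="http" bindingInformation="*:80:images.domain-test1.com" />
</bindings>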

Wrap up

To check for this situation, review your website's web logs and look for any unfamiliar domain names. If your website has already fallen victim to this attack and is indexed by the search engines under a siphoning domain, you can still implement authorized domains; it just takes time for the pages under the siphoning domain to drop out of the search engines. If I remember correctly, it took a few months before the search engines started dropping the pages.
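You can also spot-check whether your server answers for arbitrary host names by sending it a request with a made-up Host header. A quick sketch, reusing the placeholder domains from above (siphon-example.com stands in for a siphoning domain):

# Request a page from your own server while pretending to be some other domain.
# With the rules in place, this should return 410 Gone instead of your content.
curl -i -H "Host: siphon-example.com" http://www.domain-test1.com/

# A robots.txt request on the unauthorized domain should return the disallow-all file.
curl -i -H "Host: siphon-example.com" http://www.domain-test1.com/robots.txt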

It is surprising how many websites are open to this attack. I checked a number of major websites, and all but one were vulnerable. I'm sure there are legal ways to stop someone from doing this, but a fairly small technical implementation averts it far more cheaply.

If you have any questions please let me know. I hope this helps.

