At times it is desirable to create an absolute URL for an HTML element's href or src attribute. I failed to find a shrink-wrapped mechanism to generate them in ASP.NET MVC, and the methods I found on the web were not quite what I wanted. So I created extension methods for the Uri class that generate absolute URLs relative to the current context's Request.Url property.

The class definition follows:

public static class UriHelperExtensions
{
   // Prepend the provided path with the scheme, host, and port of the request.
   public static string FormatAbsoluteUrl(this Uri url, string path)
   {
      return string.Format( 
         "{0}/{1}", url.FormatUrlStart(), path.TrimStart('/') );
   }

   // Generate a string with the scheme, host, and port if not 80.
   public static string FormatUrlStart(this Uri url)
   {
      return string.Format( "{0}://{1}{2}", url.Scheme, 
         url.Host, url.Port == 80 ? string.Empty : ":" + url.Port );
   }
}
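
One note: because FormatUrlStart only special-cases port 80, a site served over HTTPS on the default port will get ":443" appended to its URLs. A small sketch of an alternative, using the framework's Uri.IsDefaultPort property to omit the default port for any scheme:

// Alternative: Uri.IsDefaultPort omits the default port for both
// http (80) and https (443).
public static string FormatUrlStart(this Uri url)
{
   return string.Format( "{0}://{1}{2}", url.Scheme,
      url.Host, url.IsDefaultPort ? string.Empty : ":" + url.Port );
}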

The following snippet of code from an ASP.NET MVC Razor view demonstrates how to generate an absolute URL given the relative URL /images/img.jpg.

   <img src="@Request.Url.FormatAbsoluteUrl("/images/img.jpg")" alt="Alt text" />

For my blog, the HTML generated for the img element will be:

   <img src="http://www.nathanfox.net/images/img.jpg" alt="Alt text" />

I hope this helps :-)

Technical ASP.NET .NET ASP.NET MVC January 25, 2012

Some time ago I was going through web logs and noticed a number of requests being served on a domain that did not belong to my employer. It turned out the entire website was indexed in the search engines under this other domain.

It is simple for anyone to pull this off: all it takes is pointing a DNS entry at another website's IP address, and all of that site's pages will then appear under the new domain if the target site is not protecting itself. In a sense, the domain name is siphoning off the content of another website.

It turned out there were a number of domains doing this to my employer's website. I can only guess at the motivation: it might be a free way to build up SEO keyword relevance before the domain is switched over to other content, or it might be the build-up to a social engineering attack. Regardless, it is not a situation you want to allow, for several reasons.

  • Your pages may be found in search engines under the siphoning domain, which would be very confusing to your end users.
  • This is a disaster from an SEO perspective, as the siphoning domain is a complete duplicate of your domain's content.
  • The siphoning domain has the potential to use a lot of bandwidth.

There are probably many ways to deal with this issue, but I decided to use rewrite rules to protect against this situation. I implemented the concept of authorized domains for a website.

Rewrite rules for Apache's mod_rewrite or Helicon Tech's Ape

I've used Helicon Tech's ISAPI_Rewrite, and now Ape, under IIS for years, but I've tested these rules with mod_rewrite on Apache 2.2 as well.

I implemented two things that should make attackers steer clear of targeting a domain: serve a robots.txt that tells search engines not to index any content, and return an HTTP 410 Gone for any other page request. Together these tell any bot or user that nothing exists on a siphoning domain targeting your website; valid content is only returned from domains you authorize.

I created a rewrite map which lists authorized domains. For example, assume the domains www.domain-test1.com and images.domain-test1.com are the domains you use for your site. Create a text file named AuthorizedDomains.txt with the following content:

www.domain-test1.com -
images.domain-test1.com -

Place a file called robotsDisallow.txt in your website's root with the following content.

User-Agent: *
Disallow: /

The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.

The following rewrite rules should be placed in your httpd.conf file.

RewriteMap  lower int:tolower
RewriteMap AuthorizedDomainsMap txt:AuthorizedDomains.txt

# Serve a robots.txt file which tells search engines not to index unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule ^/robots\.txt$ /robotsDisallow.txt [NC,L]

# Return an HTTP 410 page for unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule .? - [G]

For Apache on Windows, I put the AuthorizedDomains.txt file in the Apache installation folder, but you can also specify a full path to the file.
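
For example, pointing the map at an absolute path looks like this (the path below is just an illustration):

RewriteMap AuthorizedDomainsMap txt:C:/Websites/config/AuthorizedDomains.txt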

A few notes:

  • This may be obvious, but make sure to thoroughly test these rules. You do not want to accidentally muck up your website.
  • I recommend using authorized domains in your development environment to make sure the rules are working as expected.
  • I used a map file because my team uses a lot of domains in our development environment. This makes managing authorized domains easier, and it lets the same httpd.conf be used for both development and production with different authorized domain configurations.
  • Make sure to put your authorized domain rules after rules that redirect subdomains you wish to send to your primary domain(s).
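
For example, the overall ordering in httpd.conf might look like the following sketch, assuming you redirect other subdomains of domain-test1.com to www.domain-test1.com (the redirect pattern here is illustrative):

# 1. Redirect other subdomains to the primary domain first.
RewriteCond %{HTTP_HOST} (?<!www\.)domain-test1\.com$ [NC]
RewriteRule (.*) http://www.domain-test1.com/$1 [R=301,L]

# 2. Then apply the authorized domain rules.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule ^/robots\.txt$ /robotsDisallow.txt [NC,L]

RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule .? - [G]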

Rewrite rules for IIS's URL Rewrite Module

You can implement a functionally equivalent set of rules using IIS's URL Rewrite Module. Again, I created a map for the authorized domain lookup, but unlike mod_rewrite, the map is not stored in a separate file. As in the previous example, assume the domains www.domain-test1.com and images.domain-test1.com are your authorized domains.

Place a file called robotsDisallow.txt in your website's root with the following content.

User-Agent: *
Disallow: /

The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.

The following rules go in the website's web.config file.


<rewrite>
   <rules>
       <rule name="robots for unauthorized domain" stopProcessing="true">
           <match url="^robots\.txt$" />
           <conditions logicalGrouping="MatchAll">
               <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
           </conditions>
           <action type="Rewrite" url="/robotsDisallow.txt" />
       </rule>
       <rule name="Authorized domain check" stopProcessing="true">
           <match url=".?" />
           <conditions logicalGrouping="MatchAll">
               <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
           </conditions>
           <action type="CustomResponse" statusCode="410" 
              statusReason="Gone" 
              statusDescription="The requested resource is no longer available" />
       </rule>
   </rules>
   <rewriteMaps>
       <rewriteMap name="Authorized domains">
           <add key="www.domain-test1.com" value="-" />
           <add key="images.domain-test1.com" value="-" />
       </rewriteMap>
   </rewriteMaps>
</rewrite>

Like the mod_rewrite rules, these rules serve a robots.txt file that tells search engines not to index any content on an unauthorized domain, and return an HTTP 410 Gone for all other page requests on an unauthorized domain.
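
To verify either rule set, one approach is a quick console check from code. The sketch below assumes unauthorized.example.test is a host name you have pointed at your server (for example via a hosts file entry) and that is not in the authorized domains map:

using System;
using System.Net;

class AuthorizedDomainCheck
{
   static void Main()
   {
      // robots.txt on an unauthorized domain should serve the disallow-all file.
      using ( var client = new WebClient() )
      {
         Console.WriteLine( client.DownloadString(
            "http://unauthorized.example.test/robots.txt" ) );
      }

      // Any other request on an unauthorized domain should come back 410 Gone.
      var request = (HttpWebRequest)WebRequest.Create( "http://unauthorized.example.test/" );
      try
      {
         using ( request.GetResponse() ) { }
         Console.WriteLine( "Unexpected success - the 410 rule did not fire." );
      }
      catch ( WebException ex )
      {
         var response = ex.Response as HttpWebResponse;
         Console.WriteLine( response != null
            ? "Status: " + (int)response.StatusCode  // expect 410
            : "Request failed: " + ex.Message );
      }
   }
}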

Another option is to use host headers in IIS to restrict the domains for a website, but this mechanism should be a last resort. Host headers in IIS have a number of limitations, so avoid them if possible.

Wrap up

To check for this situation, review your website's web logs and look for unfamiliar domain names. If your website has already fallen victim to this attack and is indexed by the search engines under a siphoning domain, you can still implement authorized domains; it will just take a while for your pages to drop out of the search engines under that domain. If I remember correctly, it took a few months for the search engines to start dropping the pages.

It is surprising how many websites are open to this attack. I checked a number of major websites, and all but one were vulnerable. I'm sure there are legal ways to stop someone from doing this, but a fairly small technical implementation averts it far more cheaply.

If you have any questions please let me know. I hope this helps.

Technical January 21, 2012

A common requirement is to redirect all website requests to a single domain. The following rewrite rule will 301 redirect requests on all subdomains other than www to the www subdomain.

RewriteCond %{HTTP_HOST} (?<!www\.)domain\.com$ [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]

I used this rule in IIS 7.5 with Helicon Tech's Ape, but I've tested it with Apache's mod_rewrite as well. It can go in your .htaccess file or in httpd.conf. If put in httpd.conf, it will work as is with RewriteBase /; otherwise, remove the trailing slash from the RewriteRule's substitution string.
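
If you are on IIS with the URL Rewrite Module rather than Ape, a roughly equivalent rule might look like the sketch below. The lookbehind used above may not be supported by URL Rewrite's regular expression flavor, so this version uses a negated condition instead (treat it as a starting point and test it yourself):

<rule name="Redirect to www" stopProcessing="true">
    <match url="(.*)" />
    <conditions logicalGrouping="MatchAll">
        <add input="{HTTP_HOST}" pattern="domain\.com$" />
        <add input="{HTTP_HOST}" pattern="^www\.domain\.com$" negate="true" />
    </conditions>
    <action type="Redirect" url="http://www.domain.com/{R:1}" redirectType="Permanent" />
</rule>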

I hope this helps :-)

Technical January 17, 2012