At times it is desirable to create an absolute URL for an HTML element's href or src attribute. I could not find a shrink-wrapped mechanism to generate them in ASP.NET MVC, and some of the methods I found on the web were not quite what I wanted. So I created extension methods for the Uri class that generate absolute URLs from the scheme, host, and port of the current context's Request.Url property.
The class definition follows:
using System;

public static class UriHelperExtensions
{
    // Prepend the provided path with the scheme, host, and port of the request.
    public static string FormatAbsoluteUrl(this Uri url, string path)
    {
        return string.Format(
            "{0}/{1}", url.FormatUrlStart(), path.TrimStart('/'));
    }

    // Generate a string with the scheme, host, and port. The port is omitted
    // when it is the default for the scheme (80 for http, 443 for https).
    public static string FormatUrlStart(this Uri url)
    {
        return string.Format("{0}://{1}{2}", url.Scheme,
            url.Host, url.IsDefaultPort ? string.Empty : ":" + url.Port);
    }
}
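For comparison, the .NET Uri class can produce the scheme, host, and port prefix on its own through GetLeftPart(UriPartial.Authority), which leaves out the port when it is the default for the scheme. The sketch below is not the approach used in the class above, only an alternative under that assumption. The class name AbsoluteUrlSketch, the method ToAbsolute, and the example.com URLs are made up for illustration.

```csharp
using System;

public static class AbsoluteUrlSketch
{
    // Hypothetical helper: builds an absolute URL from a request URL and a path.
    public static string ToAbsolute(Uri requestUrl, string path)
    {
        // Scheme + host (+ port when it is not the scheme's default).
        string authority = requestUrl.GetLeftPart(UriPartial.Authority);
        return authority + "/" + path.TrimStart('/');
    }

    public static void Main()
    {
        // Default port: no port appears in the result.
        Console.WriteLine(ToAbsolute(
            new Uri("http://www.example.com/blog/post?id=1"), "/images/img.jpg"));

        // Non-default port: the port is kept.
        Console.WriteLine(ToAbsolute(
            new Uri("https://www.example.com:8443/blog"), "images/img.jpg"));
    }
}
```

This prints http://www.example.com/images/img.jpg and https://www.example.com:8443/images/img.jpg.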
The following snippet of code from an ASP.NET MVC Razor view demonstrates how to generate an absolute URL given the relative URL /images/img.jpg.
<img src="@Request.Url.FormatAbsoluteUrl("/images/img.jpg")" alt="Alt text" />
For my blog, the HTML generated for the img element will be:
I hope this helps :-)
Some time ago I was going through web logs and noticed a number of requests being served on a domain that did not belong to my employer. It turned out the entire website was indexed in the search engines under this other domain.
It is simple for anyone to pull this off. All it takes is pointing a DNS entry at some other website's IP address, and all of that site's pages will appear under the new domain if the target site is not protecting itself. In a sense, the domain name is siphoning off the content of another website.
It turned out there were a number of domains doing this to my employer's website. I can only guess at the motivation. It might be a free way to build up SEO keyword relevance before the domain is switched over to other content. It might be the buildup to a social engineering attack. Regardless, it is not a situation you want to allow.
There are probably many ways to deal with this issue, but I decided to use rewrite rules to protect against this situation. I implemented the concept of authorized domains for a website.
I've used Helicon Tech's ISAPI_Rewrite, and now Ape, under IIS for years, and I've tested these rules using mod_rewrite on Apache 2.2 as well.
I implemented two things that should make attackers steer clear of targeting a domain: return a robots.txt that tells the search engines not to index any content, and return an HTTP 410 Gone for any other request. These two things tell any bot or user that nothing exists on a siphoning domain targeting your website. Valid content will only be returned from domains you authorize.
I created a rewrite map which lists authorized domains. For example, assume the domains www.domain-test1.com and images.domain-test1.com are the domains you use for your site. Create a text file named AuthorizedDomains.txt with the following content:
www.domain-test1.com -
images.domain-test1.com -
Place a file called robotsDisallow.txt in your website's root with the following content.
User-Agent: *
Disallow: /
The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.
The following rewrite rules should be placed in your httpd.conf file.
RewriteMap lower int:tolower
RewriteMap AuthorizedDomainsMap txt:AuthorizedDomains.txt
# Serve a robots.txt file which tells search engines not to index unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule ^/robots\.txt$ /robotsDisallow.txt [NC,L]
# Return a HTTP 410 page for unauthorized domains.
RewriteCond ${AuthorizedDomainsMap:${lower:%{SERVER_NAME}}|NOT_FOUND} NOT_FOUND
RewriteRule .? - [G]
For Apache, I put the AuthorizedDomains.txt file in the Apache installation folder on Windows, but you can specify the path to the file as well.
A few notes:
You can implement a functionally equivalent set of rules using IIS's URL Rewrite Module. Again, I created a map for the authorized domain lookup, but unlike mod_rewrite, the map is not stored in a separate file. As in the previous example, assume the domains www.domain-test1.com and images.domain-test1.com are your authorized domains.
Place a file called robotsDisallow.txt in your website's root with the following content.
User-Agent: *
Disallow: /
The robotsDisallow.txt will be served for robots.txt requests on an unauthorized domain.
The following rules go in the website's web.config file.
<rewrite>
  <rules>
    <rule name="robots for unauthorized domain" stopProcessing="true">
      <match url="^robots\.txt$" />
      <conditions logicalGrouping="MatchAll">
        <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
      </conditions>
      <action type="Rewrite" url="/robotsDisallow.txt" />
    </rule>
    <rule name="Authorized domain check" stopProcessing="true">
      <match url=".?" />
      <conditions logicalGrouping="MatchAll">
        <add input="{Authorized domains:{SERVER_NAME}}" pattern="-" negate="true" />
      </conditions>
      <action type="CustomResponse" statusCode="410"
              statusReason="Gone"
              statusDescription="The requested resource is no longer available" />
    </rule>
  </rules>
  <rewriteMaps>
    <rewriteMap name="Authorized domains">
      <add key="www.domain-test1.com" value="-" />
      <add key="images.domain-test1.com" value="-" />
    </rewriteMap>
  </rewriteMaps>
</rewrite>
Like the mod_rewrite rules, these rules return a robots.txt file that tells search engines not to index any content on an unauthorized domain, and return an HTTP 410 Gone for all other requests on an unauthorized domain.
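Both rule sets come down to the same decision for each request. The C# sketch below only illustrates that decision. It is not how either rewrite engine works internally, and the class name AuthorizedDomainSketch, the method Decide, and the returned strings are hypothetical. The domain list mirrors the example domains used in this post.

```csharp
using System;
using System.Collections.Generic;

public static class AuthorizedDomainSketch
{
    // Example domains from this post; host names compare case-insensitively.
    private static readonly HashSet<string> AuthorizedDomains =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "www.domain-test1.com",
            "images.domain-test1.com"
        };

    // Describes how a request for the given host and path would be handled.
    public static string Decide(string host, string path)
    {
        if (AuthorizedDomains.Contains(host))
        {
            return "serve normally";
        }

        // Unauthorized domain asking for robots.txt: tell crawlers to stay away.
        if (string.Equals(path, "/robots.txt", StringComparison.OrdinalIgnoreCase))
        {
            return "serve robotsDisallow.txt";
        }

        // Anything else on an unauthorized domain no longer exists.
        return "410 Gone";
    }

    public static void Main()
    {
        Console.WriteLine(Decide("www.domain-test1.com", "/index.html"));
        Console.WriteLine(Decide("www.siphon-example.com", "/robots.txt"));
        Console.WriteLine(Decide("www.siphon-example.com", "/index.html"));
    }
}
```

The three calls print "serve normally", "serve robotsDisallow.txt", and "410 Gone". The host www.siphon-example.com is a made-up stand-in for a siphoning domain.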
Another option is to use host headers in IIS to restrict the domains for a website, but this mechanism should be used as a last resort. Host headers in IIS have a number of limitations, so avoid them if possible.
To check for this situation, review your website's web logs and look for any unfamiliar domain names. If you find that a website has already fallen victim to this attack and is indexed by the search engines under the siphoning domain, implement authorized domains and then give it time. It can take quite a while for your pages on the siphoning domain to start dropping out of the search engines, but it will take effect eventually. If I remember correctly, it took a few months for the search engines to start dropping the pages.
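Reviewing the logs by hand gets tedious, so here is a rough C# sketch of one way to automate the check. It assumes IIS W3C extended logs with the cs-host field enabled, which may not match your logging setup. The class name HostLogScanSketch, the method CountUnknownHosts, and the sample log lines are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HostLogScanSketch
{
    // Counts requests per host name that is not in the authorized set.
    public static Dictionary<string, int> CountUnknownHosts(
        IEnumerable<string> lines, ISet<string> authorized)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        int hostIndex = -1;

        foreach (string line in lines)
        {
            // The #Fields directive names the columns of the lines that follow.
            if (line.StartsWith("#Fields:", StringComparison.Ordinal))
            {
                string[] fields = line.Substring("#Fields:".Length)
                    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                hostIndex = Array.IndexOf(fields, "cs-host");
                continue;
            }

            // Skip other directives, and data lines seen before a usable #Fields.
            if (line.StartsWith("#", StringComparison.Ordinal) || hostIndex < 0)
            {
                continue;
            }

            string[] parts = line.Split(' ');
            if (parts.Length <= hostIndex)
            {
                continue;
            }

            // Drop a trailing :port if the Host header carried one.
            string host = parts[hostIndex];
            int colon = host.IndexOf(':');
            if (colon >= 0)
            {
                host = host.Substring(0, colon);
            }

            if (authorized.Contains(host))
            {
                continue;
            }

            int seen;
            counts.TryGetValue(host, out seen);
            counts[host] = seen + 1;
        }

        return counts;
    }

    public static void Main()
    {
        var authorized = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "www.domain-test1.com",
            "images.domain-test1.com"
        };

        // Made-up sample lines in W3C extended format.
        var sample = new[]
        {
            "#Fields: date time cs-method cs-uri-stem cs-host sc-status",
            "2011-05-01 10:00:00 GET /index.html www.domain-test1.com 200",
            "2011-05-01 10:00:01 GET /index.html www.siphon-example.com 200",
            "2011-05-01 10:00:02 GET /about.html www.siphon-example.com:80 200",
            "2011-05-01 10:00:03 GET /robots.txt WWW.Domain-Test1.com 200"
        };

        foreach (var pair in CountUnknownHosts(sample, authorized))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

For the sample lines, the only unfamiliar host reported is www.siphon-example.com with two requests. In practice you would feed it the lines of a real log file, for example with File.ReadLines.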
It is surprising how many websites are open to this attack. I checked a number of major websites, and all but one were open to this issue. I'm sure there are legal ways to stop someone from doing this, but a fairly small technical implementation averts it much more cheaply.
If you have any questions please let me know. I hope this helps.
A common desire is to redirect all subdomains to a single domain for all website requests. The following rewrite rule will 301 redirect all subdomains other than www to the www subdomain.
RewriteCond %{HTTP_HOST} (?<!www\.)domain\.com$ [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
I used this rule in IIS 7.5 using Helicon Tech's Ape, but I've tested it with Apache's mod_rewrite as well. It can go in your .htaccess file or httpd.conf. In an .htaccess file it works as is with RewriteBase /. In httpd.conf the matched path keeps its leading slash, so remove the trailing slash from the RewriteRule's substitution string to avoid a double slash in the redirect.
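The negative lookbehind in the RewriteCond is the part that is easiest to get wrong, so here is a small C# sketch that exercises the same pattern against a few host names. It uses the .NET regex engine rather than the one in Apache or Ape, on the assumption that the two treat this particular pattern the same way. The class name SubdomainRedirectSketch and the method NeedsRedirect are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubdomainRedirectSketch
{
    // Same pattern as the RewriteCond; IgnoreCase mirrors the [NC] flag.
    private static readonly Regex NonWwwHost =
        new Regex(@"(?<!www\.)domain\.com$", RegexOptions.IgnoreCase);

    // True when the host would be redirected to www.domain.com.
    public static bool NeedsRedirect(string host)
    {
        return NonWwwHost.IsMatch(host);
    }

    public static void Main()
    {
        string[] hosts =
        {
            "domain.com", "blog.domain.com", "www.domain.com", "WWW.Domain.com"
        };

        foreach (string host in hosts)
        {
            Console.WriteLine("{0} -> {1}", host,
                NeedsRedirect(host) ? "redirect" : "leave");
        }
    }
}
```

The bare domain and blog.domain.com come out as "redirect", while both spellings of www.domain.com come out as "leave". Since the lookbehind only inspects the four characters right before domain.com, a host such as a.www.domain.com is also left alone.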
I hope this helps :-)
Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.
© Copyright 2011, Nathan Fox