How to use HttpWebRequest to simulate Hotmail Login?

Recently one of my clients asked for screen scraping of a web site that required logging in first. My first reaction was, why not? We have built plenty of custom screen scrapers for clients using our HTMLParser.Net library. When I started working on it, I realized that a simple HttpWebRequest call was not returning the output we expected. The HttpWebResponse object returned from the request was not reporting any error either. Then I monitored the request in an HTTP debugging proxy and saw that when I logged in using a browser, the request was being redirected to another URL.

I realized that I needed to simulate the same process. By default, HttpWebRequest automatically follows redirects to the next URL and hides the fact that the original request was redirected. If you want to walk the redirection chain manually, you need to stop HttpWebRequest from redirecting automatically. This is where the AllowAutoRedirect property comes into play. Its default value is true, so the first thing to do is set it to false. And if cookies need to be passed along with every redirection, attach the same CookieContainer object to every request; cookies returned by one response are stored in the container and sent with subsequent requests.
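
As a minimal sketch of these two settings, the helper below builds a request with automatic redirection switched off and a shared cookie container attached. The class and method names (RedirectAwareRequests, Create) are made up for illustration and are not part of the demo project.

```csharp
using System;
using System.Net;

static class RedirectAwareRequests
{
    // Builds a request that will not follow redirects on its own and that
    // reads and writes cookies through the caller's shared container.
    // (Hypothetical helper; the name is an assumption, not from the demo project.)
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false;   // the default is true
        request.CookieContainer = cookies;   // pass the same instance on every hop
        return request;
    }
}
```

Creating one CookieContainer up front and passing it to every call keeps the cookies set during one hop available to the next.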

Now you must be asking: how do I detect that a request has been redirected, and what is the next URL I should send the request to? All this information is in the HttpWebResponse object returned by the GetResponse method. On a redirect, the StatusCode property of the response is set to HttpStatusCode.Found, which corresponds to HTTP status code 302, and the target URL is returned in the Location header of the response. (Servers may also redirect with 301, 303, or 307; the demo checks only for 302.)
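
The check described above could be wrapped up roughly as follows. This is a sketch under assumptions: RedirectInspector and NextUrl are illustrative names, and the relative-URL handling is an addition for cases where the server sends a relative Location value.

```csharp
using System;
using System.Net;

static class RedirectInspector
{
    // Returns the absolute URL to request next, or null when the response
    // is not a 302 or carries no Location header.
    // (Hypothetical helper; not part of the demo project.)
    public static Uri NextUrl(Uri requestUri, HttpStatusCode status, string location)
    {
        if (status != HttpStatusCode.Found || string.IsNullOrEmpty(location))
            return null;
        // new Uri(base, relative) leaves an absolute Location unchanged and
        // resolves a relative one against the URL that was just requested.
        return new Uri(requestUri, location);
    }

    // Convenience overload for a live response obtained with
    // AllowAutoRedirect = false.
    public static Uri NextUrl(HttpWebResponse response)
    {
        return NextUrl(response.ResponseUri, response.StatusCode,
                       response.Headers["Location"]);
    }
}
```

Keeping the decision in a method that takes plain values makes it easy to exercise without touching the network.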

A perfect example of this is the login process for a Hotmail account. Most of us simply click the "Hotmail" button on the MSN home page and do not notice the final destination URL that displays the screen for entering login credentials. The demo project contains a console application that demonstrates this whole process in action. The code walks the target URLs until it reaches a URL that does not require redirection.

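	// Requires: using System; using System.Diagnostics; using System.Net;
	// Assumed declared earlier: string strUrl (start URL), CookieContainer cookies,
	// int hops = 0, HttpWebRequest webReq, HttpStatusCode status, bool bFoundDestination.
	do
	{
		webReq = WebRequest.Create(strUrl) as HttpWebRequest;
		webReq.CookieContainer = cookies;      // same container on every hop
		webReq.AllowAutoRedirect = false;      // follow redirects by hand
		Debug.WriteLine("Hope[" + hops.ToString() + "] - " + strUrl);
		Console.WriteLine("Hope[{0}] - {1}", hops++, strUrl);
		using (HttpWebResponse webResp = webReq.GetResponse() as HttpWebResponse)
		{
			status = webResp.StatusCode;
			if (status == HttpStatusCode.Found)
				strUrl = webResp.Headers["Location"];   // next URL to request
			bFoundDestination = (status == HttpStatusCode.OK);
		}
	} while (status == HttpStatusCode.Found);   // stop once no further redirect is issued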
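
A loop like the one above can run forever if a server redirects in a cycle. The following sketch adds a hop limit and resolves relative Location values. It is one possible arrangement, not the demo project's code: RedirectWalker, Fetch, Walk, OverHttp, and the maxHops parameter are all names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class RedirectWalker
{
    // Hypothetical hook: performs one request with auto-redirect disabled and
    // reports the status code plus the Location header (null if absent).
    public delegate HttpStatusCode Fetch(string url, out string location);

    // Returns every URL visited, in order; the last entry is where the walk stopped.
    public static List<string> Walk(string startUrl, Fetch fetch, int maxHops)
    {
        var visited = new List<string> { startUrl };
        string url = startUrl;
        while (true)
        {
            string location;
            HttpStatusCode status = fetch(url, out location);
            if (status != HttpStatusCode.Found || string.IsNullOrEmpty(location))
                return visited;              // final page, or nothing to follow
            if (visited.Count > maxHops)     // maxHops redirects already followed
                throw new InvalidOperationException("Too many redirects.");
            url = new Uri(new Uri(url), location).AbsoluteUri;
            visited.Add(url);
        }
    }

    // One possible Fetch built on HttpWebRequest and a shared cookie container.
    public static Fetch OverHttp(CookieContainer cookies)
    {
        return delegate (string url, out string location)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.AllowAutoRedirect = false;
            request.CookieContainer = cookies;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                location = response.Headers["Location"];
                return response.StatusCode;
            }
        };
    }
}
```

A call such as RedirectWalker.Walk(startUrl, RedirectWalker.OverHttp(new CookieContainer()), 10) would then return the chain of URLs, where startUrl stands for whatever page begins the login flow.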

The following is the output from running the attached demo project. You can see that it took three redirections before reaching the final URL where the user needs to log in.

Hope[0] -
Hope[1] -
Hope[2] -
Hope[3] -