Wednesday, March 10, 2010

Screen Scraping with HTTP Requests

Hi All,

This post is about screen scraping with the HttpWebRequest object, without using any third-party scraping tool.

So guys, download Fiddler 2 here to get started: http://www.fiddler2.com/fiddler2/

Open Fiddler 2 and click the Launch IE button in the top right corner. When the browser launches, open the URL of the page that needs to be scraped.

Then click the Inspectors tab in the top right pane. There you will see several tabs (Headers, TextView, WebForms) for the generated request.
The bottom right pane shows the response in various formats; select the WebView tab there.

Create a .NET application and write the following code in the MyPage.aspx.cs file:

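// Needs: using System; using System.IO; using System.Net; using System.Text;
string strURL = "http://www.rppsales.com/Properties.aspx";
string startDate = DateTime.Now.ToShortDateString();
string endDate = DateTime.Now.AddDays(1).ToShortDateString();
// Create a POST request for the page to be scraped.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strURL);
request.Method = "POST";
// Attach the cookies the site expects (the name and value here are placeholders).
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(new Uri(strURL), new Cookie("cookieName", "CookieValue"));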


In this piece of code you will notice that I am using a cookie container to add the cookies that are created while requesting the site. You can check the cookies in Fiddler's request Headers tab, under the Cookies / Login node of the tree view.
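
If the site sets more than one cookie, copying them one by one gets tedious. Here is a minimal sketch that takes the whole Cookie header value as copied from Fiddler and loads it into a container. The class name CookieHelper and the method FromHeader are made up for illustration, and it assumes simple name=value pairs separated by semicolons, with no commas or quotes inside the values.

```csharp
using System;
using System.Net;

public static class CookieHelper
{
    // Hypothetical helper: parses a raw Cookie header value such as
    // "a=1; b=2" and adds each pair to a new CookieContainer for the target URI.
    public static CookieContainer FromHeader(string cookieHeader, Uri target)
    {
        CookieContainer container = new CookieContainer();
        foreach (string part in cookieHeader.Split(';'))
        {
            string pair = part.Trim();
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip empty or malformed entries
            string name = pair.Substring(0, eq);
            string value = pair.Substring(eq + 1);
            container.Add(target, new Cookie(name, value));
        }
        return container;
    }
}
```

You could then assign the result to request.CookieContainer instead of adding each cookie by hand.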

Once the cookies are in the container, we need the request data. To get it, open the second tab, TextView, in Fiddler and use its contents in the code like this:

string postData = "__EVENTTARGET=&__EVENTARGUMENT=&__LASTFOCUS="; // Paste the TextView data here, inside the double quotes
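
Pasting the captured string works as long as nothing changes between requests. If you need to change a field value (a date, for example), the body can be built from name/value pairs instead. This is only a sketch: the FormBody class and its Build method are hypothetical names, and it uses Uri.EscapeDataString, which encodes a space as %20 rather than the + that browsers send in form posts (servers generally accept both).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBody
{
    // Hypothetical helper: joins name/value pairs into an
    // application/x-www-form-urlencoded string, escaping each part.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? string.Empty));
        }
        return sb.ToString();
    }
}
```

Passing the three fields from the line above, each with an empty value, gives back the same string that was pasted from Fiddler.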

Then you have to post the data and read the response, using the following piece of code:

byte[] byteArray = Encoding.UTF8.GetBytes(postData);
// Set the ContentType property of the WebRequest.
request.ContentType = "application/x-www-form-urlencoded";
// Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length;
// Get the request stream.
Stream dataStream = request.GetRequestStream();
// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
// Close the Stream object.
dataStream.Close();
// Get the response.
WebResponse response = request.GetResponse();
// Display the status.
this1.Text = ((HttpWebResponse)response).StatusDescription;
// Get the stream containing content returned by the server.
dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
// Display the content.
this1.Text += responseFromServer;
// Console.WriteLine(responseFromServer);
// Clean up the streams.
reader.Close();
dataStream.Close();
response.Close();



At this stage the server's response is in your pocket. Apply a regular expression to extract the data you need, and save it anywhere you want.
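
As a rough idea of what the extraction step might look like, here is a sketch that pulls the contents of every table cell out of the response. The HtmlExtractor class, the ExtractCells method and the choice of td cells are assumptions for illustration; the real pattern depends on the markup of the page you are scraping, and any nested tags inside a cell are left in the result.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlExtractor
{
    // Hypothetical helper: returns the trimmed inner content of every
    // <td>...</td> cell found in the given HTML.
    public static List<string> ExtractCells(string html)
    {
        List<string> cells = new List<string>();
        // Lazy match so each cell is captured separately; Singleline lets
        // '.' span line breaks inside a cell.
        Regex pattern = new Regex(@"<td\b[^>]*>(.*?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in pattern.Matches(html))
        {
            cells.Add(m.Groups[1].Value.Trim());
        }
        return cells;
    }
}
```

Calling HtmlExtractor.ExtractCells(responseFromServer) would give you a list you can loop over and save.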

Happy Coding.

Cheers.
