Build your own search engine using regular expressions

Category: .NET Framework

Do you want to build your own web search engine? This C# sample code shows how to download web pages and use regular expressions to parse all hyperlinks from an HTML source.

Have you ever wanted to develop your own search engine to search the pages of your website?

The first thing you need to do is spider through all the hyperlinks on each of your pages and store the content of those pages in some kind of in-memory collection. For example, you can use a Hashtable in C# to store the URLs and page content as key-value pairs. The search itself is then straightforward: look through the Hashtable and show the URLs whose content matches.
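The index and search idea can be sketched as follows. This is only a minimal illustration: the class name SiteIndex and its Add and Search methods are hypothetical, and the plain case-insensitive substring match is an assumption about what "search in the hashtable" means here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: an in-memory index of URL -> page content.
public static class SiteIndex
{
    // Keys are page URLs, values are the downloaded page contents.
    private static readonly Hashtable Pages = new Hashtable();

    // Adds a page to the index, replacing any earlier entry for the same URL.
    public static void Add(string url, string content)
    {
        Pages[url] = content;
    }

    // Returns the URLs of all pages whose content contains the term
    // (case-insensitive substring match; an assumption, not a ranking scheme).
    public static List<string> Search(string term)
    {
        var matches = new List<string>();
        foreach (DictionaryEntry entry in Pages)
        {
            string content = (string)entry.Value;
            if (content.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add((string)entry.Key);
            }
        }
        return matches;
    }
}
```

A Hashtable does not guarantee any ordering, so the matching URLs come back in no particular order. A Dictionary<string, string> would work the same way and avoids the casts.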

To build your search index, you can retrieve the content of your root page (home page) using the following code:

// The method name and signature are illustrative; pageUrl and
// timeoutSeconds are the variables used by the original snippet.
public static string GetPageHtml(string pageUrl, int timeoutSeconds)
{
    System.Net.WebResponse response = null;

    try
    {
        // Set up our web request (Timeout is in milliseconds)
        System.Net.WebRequest request = System.Net.WebRequest.Create(pageUrl);
        request.Timeout = timeoutSeconds * 1000;

        // Retrieve data from the request
        response = request.GetResponse();

        System.IO.Stream streamReceive = response.GetResponseStream();
        System.Text.Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        System.IO.StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);

        // Return the retrieved HTML
        return streamRead.ReadToEnd();
    }
    catch (System.Exception)
    {
        // An error occurred while grabbing data; return an empty string.
        return "";
    }
    finally
    {
        // If a response exists, close it.
        if (response != null)
        {
            response.Close();
        }
    }
}

This code retrieves the content of an HTML page. Now scan through the page, retrieve all of its hyperlinks, and then retrieve the content of each linked page. Repeat this recursively until you have covered all the pages in your site, skipping pages you have already visited. Add each URL and its content as a key-value pair to your collection (Hashtable).
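The recursive step could look something like the sketch below. The class name SiteCrawler and the two callbacks are hypothetical: fetch is assumed to return the HTML for a URL (or an empty string on failure), and extractLinks is assumed to return absolute URLs that belong to your own site. The Hashtable doubles as the record of pages already visited, which stops the crawl from looping when pages link back to each other.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: recursively crawls a site into a URL -> content index.
public static class SiteCrawler
{
    // url:          page to crawl
    // fetch:        assumed callback returning the HTML for a URL ("" on failure)
    // extractLinks: assumed callback returning absolute, same-site URLs found in the HTML
    // index:        receives URL -> content pairs and marks pages as visited
    public static void Crawl(
        string url,
        Func<string, string> fetch,
        Func<string, IEnumerable<string>> extractLinks,
        Hashtable index)
    {
        // Skip pages that are already in the index to avoid endless loops.
        if (index.ContainsKey(url))
        {
            return;
        }

        string html = fetch(url);
        index[url] = html;

        // Follow every link found on this page.
        foreach (string link in extractLinks(html))
        {
            Crawl(link, fetch, extractLinks, index);
        }
    }
}
```

Plain recursion is fine for a small site. For a site with long chains of links, an explicit queue of pending URLs would avoid deep call stacks.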

You can use the following regular expression to retrieve all hyperlinks from the page content:

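// Regex requires System.Text.RegularExpressions; MessageBox requires System.Windows.Forms.
Regex regex = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Show the captured value of every href attribute found in the page.
for (Match match = regex.Match(html); match.Success; match = match.NextMatch())
{
    MessageBox.Show(match.Groups[1].ToString());
}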
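For crawling, you will probably want the links in a list rather than in message boxes, and many href values are relative (for example "page2.html" or "/about"). The sketch below reuses the regular expression above and resolves each value against the URL of the page it came from. The class name LinkExtractor, the same-host filter, and the decision to drop fragments are assumptions made for illustration. Note that the expression only strips double quotes: an unquoted or single-quoted href may be captured with extra characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts same-site links from a page as absolute URLs.
public static class LinkExtractor
{
    // The regular expression from the article.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // html:    page source to scan
    // pageUrl: absolute URL of the page, used to resolve relative links
    public static List<string> Extract(string html, string pageUrl)
    {
        var baseUri = new Uri(pageUrl);
        var links = new List<string>();

        for (Match match = HrefRegex.Match(html); match.Success; match = match.NextMatch())
        {
            // Resolve relative values such as "page2.html" or "/about".
            Uri resolved;
            if (!Uri.TryCreate(baseUri, match.Groups[1].Value, out resolved))
            {
                continue;
            }

            // Keep only http/https links (skips mailto:, javascript:, and so on).
            if (resolved.Scheme != Uri.UriSchemeHttp && resolved.Scheme != Uri.UriSchemeHttps)
            {
                continue;
            }

            // Keep only links on the same host as the page itself.
            if (!string.Equals(resolved.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            // Drop the #fragment so the same page is not listed twice.
            string url = resolved.GetLeftPart(UriPartial.Query);
            if (!links.Contains(url))
            {
                links.Add(url);
            }
        }

        return links;
    }
}
```

Calling LinkExtractor.Extract(html, pageUrl) on each downloaded page gives the list of URLs to visit next.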

Copyright © SpiderWorks Technologies Pvt Ltd., Kochi, India