Build your own search engine using regular expressions

Do you want to build your own web search engine? This C# sample code shows how to download web pages and use regular expressions to parse all hyperlinks from an HTML source.

Have you ever wanted to develop your own search engine to search the pages of your website?

The first thing you need to do is spider through all the hyperlinks on each of your pages and store the page content in some kind of in-memory collection. You could use a Hashtable in C# to store the URLs and page content as key-value pairs. Searching is then very easy: scan the stored content in the Hashtable and show the URLs of the pages that match.

To build your search index, you can retrieve the content of your root page (homepage) using the following code:
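
As a rough illustration of the lookup step, the sketch below scans such a collection for a search term. It assumes a generic Dictionary<string, string> as a typed stand-in for the Hashtable; the names SiteSearch and Search are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;

static class SiteSearch
{
    // Returns the URLs whose stored page content contains the term.
    // The comparison ignores case. The index maps URL -> page content.
    public static List<string> Search(IDictionary<string, string> index, string term)
    {
        var matches = new List<string>();
        foreach (KeyValuePair<string, string> entry in index)
        {
            if (entry.Value.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(entry.Key);
            }
        }
        return matches;
    }
}
```

A plain substring scan like this visits every stored page on each query, which is probably acceptable for a small site but would need a real index for anything larger.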

// Downloads a page and returns its HTML, or an empty string on failure.
// (The method name and signature are illustrative.)
static string GetPageContent(string pageUrl, int timeoutSeconds)
{
    System.Net.WebResponse response = null;

    try
    {
        // Set up the web request; Timeout is expressed in milliseconds
        System.Net.WebRequest request = System.Net.WebRequest.Create(pageUrl);
        request.Timeout = timeoutSeconds * 1000;

        // Retrieve data from the request
        response = request.GetResponse();

        System.IO.Stream streamReceive = response.GetResponseStream();
        System.Text.Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        System.IO.StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);

        // Return the retrieved HTML
        return streamRead.ReadToEnd();
    }
    catch (System.Exception)
    {
        // An error occurred while grabbing data; return an empty string.
        return "";
    }
    finally
    {
        // Close the response if one was opened.
        if (response != null)
            response.Close();
    }
}

This code retrieves the content of an HTML page. Next, scan the page for hyperlinks and retrieve the content behind each of them. Repeat this recursively until you have covered every page in your site, skipping URLs you have already visited so that pages linking to each other do not cause an endless loop. Add each URL and its content as a key-value pair to your collection (Hashtable).
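
The crawl described above could be sketched roughly as follows. This is only one possible shape: it swaps the recursion for a queue (to avoid deep call stacks on large sites), uses a Dictionary<string, string> in place of the Hashtable, and takes the page download as a delegate so the sketch stays self-contained. The names SiteCrawler, BuildIndex and fetchPage are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SiteCrawler
{
    // Same href pattern the article uses for link extraction.
    static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Crawls every same-host page reachable from root.
    // fetchPage is assumed to return the HTML for a URL ("" on failure).
    public static Dictionary<string, string> BuildIndex(Uri root, Func<string, string> fetchPage)
    {
        var index = new Dictionary<string, string>();
        var pending = new Queue<Uri>();
        pending.Enqueue(root);

        while (pending.Count > 0)
        {
            Uri current = pending.Dequeue();

            // Drop any #fragment so the same page is not stored twice.
            string key = current.GetLeftPart(UriPartial.Query);
            if (index.ContainsKey(key))
                continue; // already visited

            string html = fetchPage(key);
            index[key] = html;

            foreach (Match match in HrefRegex.Matches(html))
            {
                // Resolve relative links against the current page.
                Uri link;
                bool isWebLink = Uri.TryCreate(current, match.Groups[1].Value, out link)
                    && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps);

                // Stay on the site being indexed.
                if (isWebLink && link.Host == root.Host)
                    pending.Enqueue(link);
            }
        }
        return index;
    }
}
```

In practice, fetchPage would wrap the download code shown earlier. Passing it in as a delegate also makes it easy to try the crawler against a few in-memory pages before pointing it at a live site.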

You can use the following regular expression to retrieve all hyperlinks from the page content:

// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Group 1 holds the value of each href attribute that was found
for (Match match = regex.Match(html); match.Success; match = match.NextMatch())
{
    MessageBox.Show(match.Groups[1].ToString());
}
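
Note that the pattern above treats only double-quoted values specially, so a single-quoted or unquoted href may be captured together with the quotes or the closing ">". The sketch below is one possible variant, not part of the original sample: it adds a single-quote branch, stops unquoted values at whitespace or ">", and collects the links into a list instead of showing message boxes. The names LinkExtractor and ExtractLinks are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Variant pattern (an assumption): double-quoted, single-quoted,
    // or unquoted values, tried in that order.
    static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|'(?<1>[^']*)'|(?<1>[^\\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the raw href values in the order they appear in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefRegex.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

A regular expression is still only an approximation of HTML parsing; it will also match href text that appears inside comments or scripts.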

