Using Index Server to Search Your Web Site - Noise Words

by John Peterson

Introduction

Since writing my article on using Microsoft's Index Server from ASP (Part 1, Part 2), I've gotten quite a few questions about why people aren't getting the results they expect. There are a number of reasons why this might happen, but one of the most common is that your query includes one or more words that Index Server considers "noise words". This article explains what noise words are and shows you how to edit the list of words that Index Server treats as noise words.

The Email

You can thank Yu Zhang for finally getting me to write this article. I've answered quite a few questions about noise words, but his came at the issue from a slightly different angle. Here's his email:

Hi John,

Your article "Using Index Server to Search Your Web Site", helped me a lot, but, I'm having a problem with 'reserved words' such as 'i' and 'about'. To deal with this problem, I need to find the list of Index Server's reserved words so I can filter them.

Do you know where to find the list?

Thanks,
Yu

As it turns out, I didn't know where to find the list... so, I found out. Having taken the time to do so, I figured I should share the info with everyone.

What are Noise Words?

Noise words are words that are very common and yet have very little meaning. Words like 'a', 'an', 'the', 'to', 'so', 'with', etc. are found in almost all documents but provide very little information about the actual meaning of the document. Therefore, there is very little value to be gained from knowing that a document contains any of them. Because of this, Index Server is designed to ignore these types of words when it builds an index from a set of documents.

So, to answer Yu's question from above, you can find the list of all the words that Index Server considers noise words in the System32 folder of your Windows directory. There you'll find a bunch of files named noise.xxx, where xxx represents the language in question. For US English, the file name is noise.enu. On my laptop, the complete path to this file is C:\Windows\System32\noise.enu. The file is a plain text file and you can open and edit it using the text editor of your choice (Windows' Notepad works fine).
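For anyone who, like Yu, wants to filter noise words out of a query before it ever reaches Index Server, here is a rough sketch of the idea in Python. It assumes the noise file is plain text with words separated by whitespace or line breaks, and that matching should ignore case. The function names are made up for this example and are not part of Index Server.

```python
from pathlib import Path


def parse_noise_words(text):
    """Return the set of lowercased words found in a noise-word list."""
    # Assumption: words are separated by whitespace or line breaks.
    return {word.lower() for word in text.split()}


def load_noise_words(path):
    """Read a noise file such as C:\\Windows\\System32\\noise.enu."""
    # latin-1 decodes any byte sequence, so odd characters won't raise errors.
    return parse_noise_words(Path(path).read_text(encoding="latin-1"))


def strip_noise_words(query, noise_words):
    """Drop every term of the query that appears in the noise-word set."""
    kept = [term for term in query.split() if term.lower() not in noise_words]
    return " ".join(kept)


if __name__ == "__main__":
    # A tiny stand-in list, not the real contents of noise.enu.
    sample = "a\nabout\nan\ni\nthe\nto\n"
    noise = parse_noise_words(sample)
    print(strip_noise_words("I want to know about the index", noise))
    # prints: want know index
```

If stripping leaves an empty string, the query consisted entirely of noise words, and that's a good moment to tell the user why they're about to get nothing back.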

Editing the List of Noise Words

So why would you want to add or remove a word? Let's say your site is named "ASP 101" and every page title includes the phrase "ASP 101". In that case, searching for "ASP" would be pretty pointless: it would return every single document, which defeats the point of searching. To avoid this problem, we might add "ASP" and "101" to the list of noise words. Index Server would then ignore them while indexing, which produces a smaller index and faster searches, and users who search for "ASP" would no longer get back an unmanageable set of results.

Editing the noise word list is basically as simple as editing the text file. As always, you should make a backup copy before you do so. There are a few other caveats, but they are all discussed in Microsoft's Knowledge Base Article #247561 - How to Edit Index Server Noise-Word Lists, so I won't go into them here.
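If you'd rather script the edit than do it by hand, the backup-then-edit routine might look something like the Python sketch below. It assumes the file holds whitespace-separated words and that appending one word per line is acceptable. The function name and the ".bak" suffix are my own choices for illustration, and the sketch does not handle any of the caveats covered in the Knowledge Base article.

```python
import shutil
from pathlib import Path


def add_noise_words(path, new_words):
    """Back up a noise file, then append any words it doesn't already list.

    Returns the list of words that were actually added.
    """
    path = Path(path)

    # Keep the first backup so repeated runs never overwrite the original copy.
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)

    # Assumption: words are whitespace-separated and compared without case.
    existing = {w.lower() for w in path.read_text(encoding="latin-1").split()}

    added = []
    for word in new_words:
        if word.lower() not in existing:
            added.append(word)
            existing.add(word.lower())

    if added:
        with path.open("a", encoding="latin-1") as handle:
            # The leading newline guards against a file with no trailing newline.
            handle.write("\n" + "\n".join(added) + "\n")
    return added
```

For the "ASP 101" example above, the call would be add_noise_words(r"C:\Windows\System32\noise.enu", ["ASP", "101"]). Try it on a copy of the file first.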

That's All Folks

I hope this article has helped shed some light on the topic of noise words for all of you using Index Server. And, keep on sending in those questions... someone has to tell me what you guys want to read about.

As an aside... I just love how much support Microsoft gives Index Server. Check out all the information at the Index Server Support Center. I realize it's not their flagship product or anything, but come on guys... give us something!

Related Articles

  • Using Index Server to Search Your Web Site - Part I: The Basics
  • Using Index Server to Search Your Web Site - Part II: Linking, Limiting, Filtering, and ASP.NET