lanky Girl

3.8 Information Retrieval

In DITA on January 1, 2010 at 6:40 pm

This week in DITA we were introduced to the concept of information retrieval. Information retrieval is a broad field, but essentially it deals with how we access information, usually in text form, from large collections stored on computers. In contrast to the relational model, which works with structured data, information retrieval deals with unstructured data.

The web has led to an explosion in the amount of information we now have stored (Manning, Raghavan and Schütze, 2009).

During the lab exercise the first task was to do a Boolean search based on an emotional or ASK need. To do a Boolean search you pose a query in the form of a Boolean expression using the operators ‘AND’, ‘OR’ and ‘NOT’. To test this we used the Bing search engine and tried different search expressions to see how they changed the results. The ASK I used was “Places for afternoon tea outside London”.

First search: places for afternoon tea NOT London – the results that came back did not feature London.

Second search: places for afternoon AND tea NOT London – the results that came back featured tea in the afternoon and weren’t in London.
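
To make the operators a bit more concrete, here is a minimal sketch of how AND, OR and NOT can be treated as set operations on the documents that contain each term. The four tiny "documents" and the helper name `matching` are invented for illustration; this is not how Bing works internally, just the basic idea of Boolean retrieval.

```python
# Hypothetical mini-collection: document id -> text (invented for illustration).
docs = {
    1: "afternoon tea in London hotels",
    2: "afternoon tea in Bath",
    3: "coffee shops in London",
    4: "cream tea in Devon every afternoon",
}


def matching(term):
    """Return the set of document ids whose text contains the term as a whole word."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in text.lower().split()}


# afternoon AND tea NOT London:
# AND is set intersection (&), NOT is set difference (-).
result = (matching("afternoon") & matching("tea")) - matching("London")
print(sorted(result))  # [2, 4]

# tea OR coffee: OR is set union (|).
either = matching("tea") | matching("coffee")
print(sorted(either))  # [1, 2, 3, 4]
```

Documents 2 and 4 mention both "afternoon" and "tea" but not "London", which mirrors what the second search above was asking for.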

The second half of the task was to create an inverted file.

Inverted files are used by search engines to speed up the process of retrieving information. An inverted file has a lexicon, which is the list of terms, and for each term a list that shows how many times that term occurs and the positions at which it appears in the text.
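
As a rough illustration of that structure, the sketch below builds a small inverted file in Python. The two sample texts and the function name `build_inverted_file` are my own assumptions, and splitting on whitespace is a simplification; real search engines do much more processing of the text.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to {document id: [word positions where the term appears]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Lowercase and split on whitespace; positions count from 0.
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


# Hypothetical sample texts, invented for illustration.
sample_docs = {
    1: "tea in the afternoon",
    2: "afternoon tea and more tea",
}

index = build_inverted_file(sample_docs)

# Print each term with its total number of occurrences and where it occurs.
for term in sorted(index):
    postings = index[term]
    total = sum(len(positions) for positions in postings.values())
    print(term, total, dict(postings))
```

Looking up "tea" in this index gives the documents and positions directly, so a search does not have to scan every text from start to finish.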

Here is a link to a worked example.

