AlgoDaily - Understanding Underlying Mechanisms


As an experienced developer, you probably already know that a Full-Text Search (FTS) system lets users efficiently search text content for keywords. A notable solution for this purpose is Elasticsearch, a popular open-source FTS engine built on top of the Apache Lucene library.

A full-text search engine works by using an index, typically an inverted index that maps each term to the documents containing it, to quickly look up documents that match the search terms. When building a basic full-text search engine from scratch, the essential part is designing a data structure that stores this index and supports quick lookups.

You can use a data structure like a Trie or a Hash Map, depending on the requirements and constraints of your system. A Hash Map offers fast exact-match lookups of whole terms, while a Trie can be a great choice when the number of search terms is large and you need to perform prefix searches often.
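
To make the Trie option concrete, here is a minimal sketch of a Trie-backed index that supports prefix search. The names `TrieNode`, `Trie`, `insert`, and `search_prefix` are hypothetical and chosen only for illustration, and the sample documents are the same ones used in the dictionary example; a production engine would handle tokenization and memory use far more carefully.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.doc_ids = set()  # documents containing the term that ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term, doc_id):
        """Record that doc_id contains term."""
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def search_prefix(self, prefix):
        """Return the IDs of all documents with a term starting with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        # Collect document IDs from the prefix node and everything below it
        result = set()
        stack = [node]
        while stack:
            current = stack.pop()
            result |= current.doc_ids
            stack.extend(current.children.values())
        return result


docs = [
    'Data Science is an exciting field',
    'AI and Machine Learning are the buzzwords in tech',
    'Python is commonly used in Machine Learning',
]

trie = Trie()
for doc_id, doc in enumerate(docs):
    for word in doc.lower().split():
        trie.insert(word, doc_id)

print(sorted(trie.search_prefix('mach')))  # documents with a word starting with "mach"
```

The trade-off in this sketch is that a prefix query walks every node under the prefix, so very short prefixes touch a large part of the structure, whereas an exact-match lookup in a hash map stays a single probe.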

Now, let's look at a Python example that uses a dictionary (Python's built-in hash map) to create a simple inverted index.

if __name__ == '__main__':
  # Initialize an empty dictionary to hold our inverted index:
  # each word maps to the set of document indices that contain it
  inverted_index = {}

  # Imagine these documents are in our data store
  docs = [
    'Data Science is an exciting field',
    'AI and Machine Learning are the buzzwords in tech',
    'Python is commonly used in Machine Learning'
  ]

  # Loop through the documents and index each lowercased word
  for index, doc in enumerate(docs):
    for word in doc.split():
      word = word.lower()
      # setdefault creates an empty set the first time a word is seen
      inverted_index.setdefault(word, set()).add(index)

  # Let's search for a keyword (lowercased to match the indexed words)
  keyword = 'machine'.lower()
  if keyword in inverted_index:
    print('Keyword found in documents:', inverted_index[keyword])
  else:
    print('No document contains the keyword')
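
The same inverted index extends naturally from one keyword to several. The sketch below assumes AND semantics (a document must contain every term in the query); `build_index` and `search_all` are hypothetical helper names introduced here for illustration, not part of any library.

```python
def build_index(docs):
    """Build an inverted index: lowercased word -> set of document indices."""
    index = {}
    for doc_id, doc in enumerate(docs):
        for word in doc.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search_all(index, query):
    """Return the indices of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # A term missing from the index contributes an empty set,
    # which makes the whole intersection empty
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = [
    'Data Science is an exciting field',
    'AI and Machine Learning are the buzzwords in tech',
    'Python is commonly used in Machine Learning',
]

index = build_index(docs)
print(sorted(search_all(index, 'Python Machine')))  # only the third document has both words
```

Because each term maps to a set of document indices, combining terms is just a set intersection; switching to OR semantics would be a matter of using a union instead.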
