In financial data processing and AI, efficiency is key. By adding indexing to our simplified document-oriented database, we can make searches much faster and bring it closer to MongoDB's capabilities.
Consider the `insert` method in the code below. Besides appending the document to its collection, it also populates an index, `self.index`.
An index in a database works like the index at the back of a book: it is a data structure that lets the database locate matching records without reading every one, which speeds up queries. `self.index` is a nested dictionary in which the first key is a document field (the equivalent of a column in SQL) and the second key is a value of that field. Each second-level key maps to a list of all documents that contain that field-value pair.
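To make the nested structure concrete, here is a small sketch that builds the same kind of field-to-value-to-documents mapping for three hypothetical documents. It uses `dict.setdefault` in place of the explicit membership checks in `insert`; the document contents are made up purely for illustration.

```python
# Hypothetical sample documents, used only to show the shape of the index
documents = [
    {'name': 'Turing Machine', 'category': 'computer'},
    {'name': 'Abacus', 'category': 'calculator'},
    {'name': 'ENIAC', 'category': 'computer'},
]

# field -> value -> list of documents containing that field-value pair
index = {}
for doc in documents:
    for field, value in doc.items():
        # setdefault creates the inner dict or list the first time a key is seen
        index.setdefault(field, {}).setdefault(value, []).append(doc)

print(sorted(index))                                       # ['category', 'name']
print([d['name'] for d in index['category']['computer']])  # ['Turing Machine', 'ENIAC']
```

As with a book index, looking up `index['category']['computer']` jumps straight to the relevant entries instead of reading every document.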
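The `indexed_find` method then uses this index to answer queries. It iterates over the criteria, and for each field-value pair that exists in `self.index`, it adds all documents stored under that pair to the results. With several criteria this returns documents that match any one of the pairs (an OR), and a document matching more than one pair is added once per match, whereas `find` returns only documents that match all of them. For large collections the indexed lookup is much faster: each step is an average-case constant-time dictionary access, while `find` must scan every document in the collection and compare it against the criteria.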
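If you need the index to give the same answers as `find` for multi-field criteria, one option is to intersect the per-pair matches instead of concatenating them. The sketch below is a hypothetical AND variant, not part of the original code: `build_index` and `indexed_find_all` are illustrative names, and documents are tracked by `id()` so each one is returned only once.

```python
def build_index(documents):
    """Build a field -> value -> [documents] index (same shape as self.index)."""
    index = {}
    for doc in documents:
        for field, value in doc.items():
            index.setdefault(field, {}).setdefault(value, []).append(doc)
    return index


def indexed_find_all(index, criteria):
    """Return the documents that match every field-value pair in criteria.

    Candidates are tracked by id(document), so each matching document is
    returned once, in the order it was indexed. Empty criteria return [].
    """
    candidates = None  # id(document) -> document, for docs matching all pairs so far
    for field, value in criteria.items():
        matches = {id(doc): doc for doc in index.get(field, {}).get(value, [])}
        if candidates is None:
            candidates = matches
        else:
            # Intersect: keep only documents that also match this pair
            candidates = {key: doc for key, doc in candidates.items() if key in matches}
        if not candidates:
            return []  # no document can match all pairs
    return list(candidates.values()) if candidates else []


# Hypothetical sample documents
products = [
    {'name': 'Turing Machine', 'category': 'computer'},
    {'name': 'Abacus', 'category': 'calculator'},
    {'name': 'ENIAC', 'category': 'computer'},
]
index = build_index(products)
print(indexed_find_all(index, {'category': 'computer', 'name': 'ENIAC'}))
# [{'name': 'ENIAC', 'category': 'computer'}]
```

Each criterion still costs only a dictionary lookup plus work proportional to the number of matches, so the speed advantage over a full scan is kept.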
Next, we create a `DocumentDB` instance, add some documents to the `'products'` collection, and run both `find` and `indexed_find` to retrieve the documents matching the given criteria.
Python dictionaries are hash tables with average-case constant-time lookups, which is what makes `self.index` fast. The price is extra memory and slower inserts, since every insert must also update the index. Real databases, tabular or document-oriented, face the same trade-off: without indexes, queries over large datasets degrade to full scans, while each additional index makes writes and storage more expensive. In finance and AI applications, where data volumes are large, striking this balance can significantly affect performance.
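To put rough numbers on this trade-off, the sketch below compares a full scan with an index lookup over 10,000 hypothetical documents and also counts how many document references the index stores. `full_scan` is an illustrative helper, not part of the original code.

```python
def full_scan(documents, field, value):
    """Return matching documents and the number of documents examined."""
    examined = 0
    matches = []
    for doc in documents:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


# 10,000 hypothetical documents; every 1,000th one is flagged
documents = [
    {'id': i, 'status': 'flagged' if i % 1000 == 0 else 'ok'} for i in range(10_000)
]

# Building the index stores one reference per field-value occurrence
index = {}
for doc in documents:
    for field, value in doc.items():
        index.setdefault(field, {}).setdefault(value, []).append(doc)

matches, examined = full_scan(documents, 'status', 'flagged')
indexed = index['status']['flagged']  # two average-case O(1) dict lookups
index_entries = sum(len(docs) for values in index.values() for docs in values.values())

print(len(matches), examined)  # 10 10000
print(len(indexed))            # 10
print(index_entries)           # 20000
```

The scan touches all 10,000 documents to find 10 matches, while the index answers with two dictionary lookups; in exchange, the index holds 20,000 extra references (one per field per document) that every insert has to maintain.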
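```python
class DocumentDB:
    def __init__(self):
        self.db = {}     # collection name -> list of documents
        self.index = {}  # field -> value -> list of documents (shared by all collections)

    def insert(self, collection, document):
        # Append the document to its collection, creating the collection if needed
        if collection not in self.db:
            self.db[collection] = []
        self.db[collection].append(document)
        # Record each field-value pair of the document in the index
        for key, value in document.items():
            if key not in self.index:
                self.index[key] = {}
            if value not in self.index[key]:
                self.index[key][value] = []
            self.index[key][value].append(document)

    def find(self, collection, criteria):
        # Full scan: keep documents that match every field-value pair in criteria
        results = []
        for document in self.db.get(collection, []):
            for key, value in criteria.items():
                if document.get(key) != value:
                    break
            else:
                # Runs only if no criterion caused a break
                results.append(document)
        return results

    def indexed_find(self, collection, criteria):
        # Index lookup: add the documents stored under each matching field-value pair.
        # The index is shared by all collections, so `collection` is not used here.
        results = []
        for key, value in criteria.items():
            if key in self.index and value in self.index[key]:
                results.extend(self.index[key][value])
        return results


if __name__ == '__main__':
    db = DocumentDB()
    # Hypothetical sample documents for illustration
    db.insert('products', {'name': 'Turing Machine', 'price': 100})
    db.insert('products', {'name': 'Enigma', 'price': 200})
    print(db.find('products', {'name': 'Turing Machine'}))
    print(db.indexed_find('products', {'name': 'Turing Machine'}))
```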
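In the listing above, `insert` records every document in a single shared `self.index`, regardless of collection, so an index lookup cannot tell which collection a document came from. A minimal sketch of one fix nests the index one level deeper, keyed by collection first. `CollectionIndexedDB` is a hypothetical name, and the query keeps the OR behaviour described earlier.

```python
class CollectionIndexedDB:
    """Hypothetical variant of DocumentDB with one index per collection."""

    def __init__(self):
        self.db = {}     # collection -> list of documents
        self.index = {}  # collection -> field -> value -> list of documents

    def insert(self, collection, document):
        self.db.setdefault(collection, []).append(document)
        collection_index = self.index.setdefault(collection, {})
        for key, value in document.items():
            collection_index.setdefault(key, {}).setdefault(value, []).append(document)

    def indexed_find(self, collection, criteria):
        # Consult only this collection's index; results are the union of the
        # documents stored under each field-value pair in criteria
        collection_index = self.index.get(collection, {})
        results = []
        for key, value in criteria.items():
            results.extend(collection_index.get(key, {}).get(value, []))
        return results


db = CollectionIndexedDB()
# Hypothetical documents in two collections that share a field value
db.insert('products', {'name': 'Turing Machine', 'price': 100})
db.insert('archive', {'name': 'Turing Machine', 'price': 90})
print(db.indexed_find('products', {'name': 'Turing Machine'}))
# [{'name': 'Turing Machine', 'price': 100}]
```

With a shared index, the same query would also return the `'archive'` document; keying by collection first keeps each lookup scoped to the collection that was asked for.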