Indexing in MongoDB


What is indexing?

Indexes provide high-performance read operations for frequently used queries. They work like the index at the back of a book, which lets you find the right pages without reading the whole book.



Indexes support the efficient execution of queries in MongoDB. With an index, MongoDB scans only the documents that the index points to instead of the whole collection. Without an index, it has to scan every document in the collection, which takes much longer on large collections.
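To make the difference concrete, here is a minimal sketch in plain JavaScript. It is a toy in-memory model, not MongoDB code: the collection, the patientname values and the helper names (findWithScan, buildIndex, findWithIndex) are all made up for illustration, and the lookup assumes each key is unique.

```javascript
// Toy model only: an in-memory "collection" and a sorted index on one field.
// MongoDB's real indexes are B-trees managed by the storage engine.

// Hypothetical collection of 1000 patient documents.
const patients = [];
for (let i = 0; i < 1000; i++) {
  patients.push({ _id: i, patientname: "patient" + String(i).padStart(4, "0") });
}

// Without an index: every document is examined.
function findWithScan(docs, name) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined++;
    if (doc.patientname === name) matches.push(doc);
  }
  return { matches, examined };
}

// The "index": (key, position) pairs kept sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// With an index: binary search over the keys, then fetch one document.
// Assumes keys are unique.
function findWithIndex(docs, index, name) {
  let lo = 0;
  let hi = index.length - 1;
  let keysExamined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    keysExamined++;
    if (index[mid].key === name) {
      return { matches: [docs[index[mid].position]], keysExamined };
    }
    if (index[mid].key < name) lo = mid + 1;
    else hi = mid - 1;
  }
  return { matches: [], keysExamined };
}

const nameIndex = buildIndex(patients, "patientname");
const scanResult = findWithScan(patients, "patient0742");
const indexResult = findWithIndex(patients, nameIndex, "patient0742");
console.log(scanResult.examined, indexResult.keysExamined);
```

The scan always touches all 1000 documents, while the indexed lookup needs at most about ten key comparisons for the same result.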

An index is a special data structure that MongoDB stores as a B-tree. Indexes in MongoDB are similar to indexes in other database systems such as MySQL and Oracle. MongoDB defines indexes at the collection level and supports indexes on any field of the documents. MongoDB can use the index to limit the number of documents it must inspect.


What is a B-tree?

A B-tree is a balanced tree structure that keeps keys in sorted order and is used to place and locate records in a database. Because each node holds many keys, a B-tree minimizes the number of reads needed to locate a desired record, thereby speeding up the overall process.
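As a rough illustration of how a B-tree lookup proceeds, the sketch below searches a small hand-built tree and counts how many nodes it visits. The tree contents and the btreeSearch helper are hypothetical, and the sketch covers search only (no insertion or rebalancing).

```javascript
// Each node holds sorted keys; an internal node has keys.length + 1 children.
// A leaf has an empty children array. The values are made up for illustration.
const tree = {
  keys: [40, 80],
  children: [
    { keys: [10, 20, 30], children: [] },
    { keys: [50, 60, 70], children: [] },
    { keys: [90, 100, 110], children: [] },
  ],
};

// Search for a key, counting how many nodes are read along the way.
function btreeSearch(node, key, nodesVisited = 1) {
  // Find the first key in this node that is >= the search key.
  let i = 0;
  while (i < node.keys.length && key > node.keys[i]) i++;

  if (i < node.keys.length && node.keys[i] === key) {
    return { found: true, nodesVisited };
  }
  if (node.children.length === 0) {
    return { found: false, nodesVisited }; // reached a leaf without a match
  }
  // Descend into the child that covers the key's range.
  return btreeSearch(node.children[i], key, nodesVisited + 1);
}

console.log(btreeSearch(tree, 60)); // { found: true, nodesVisited: 2 }
console.log(btreeSearch(tree, 65)); // { found: false, nodesVisited: 2 }
```

Eleven keys are stored here, yet any lookup reads at most two nodes. The same effect keeps lookups cheap when the tree holds millions of keys.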



Index Types

MongoDB provides a number of different types of indexes. In general, you should create indexes that support your common and user-facing queries. Having these indexes will ensure that MongoDB scans the smallest possible number of documents.

  • Single Field Indexes
  • Compound Indexes
  • Multikey Indexes
  • Geospatial Indexes and Queries
  • Text Indexes
  • Hashed Indexes


Index Creation

Indexes are needed to make queries faster. For example, if you need to find records by a field named patientname and that field has an index, the query will be faster than it would be without the index.


collection.createIndex(index[, options], callback) 
db.createIndex(collectionname, index[, options], callback)

  • index – the field or fields to be indexed.
  • options – optional settings, for example {sparse: true} to index only the records that contain the indexed field.
  • callback – receives two parameters: an error object (if an error occurred) and the name of the newly created index.
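A minimal sketch of such a call is shown below. It assumes `collection` is a Collection object from the MongoDB Node.js driver (or any object with the same createIndex method); the helper name createPatientNameIndex is made up for this example. Recent driver versions return a promise that resolves to the index name when no callback is passed, so the sketch simply returns whatever createIndex returns.

```javascript
// `collection` is assumed to be a Node.js driver Collection (or a compatible object).
function createPatientNameIndex(collection) {
  const index = { patientname: 1 };  // 1 = ascending order
  const options = { sparse: true };  // skip documents that lack patientname
  return collection.createIndex(index, options);
}
```

With a real connection this could be used as `createPatientNameIndex(db.collection("pat10wi")).then(name => console.log(name))`. Unless a name is supplied in the options, the index on this field is called patientname_1.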


Ensure indexes with ensureIndex()

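ensureIndex() is similar to createIndex(), with the difference that the index is checked for existence before it is added, to avoid duplicate indexes. Later MongoDB releases deprecate ensureIndex() in favor of createIndex(), which does nothing if an identical index already exists.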




Remove indexes with dropIndex()

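dropIndex() removes a single index, identified by its name. All indexes on a collection (except the mandatory index on _id) can be dropped at once with the dropIndexes command.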
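A short sketch of both calls follows, again assuming `collection` is a Node.js driver Collection or a compatible object. The helper names are hypothetical, and the index name patientname_1 assumes the index was created on patientname in ascending order without a custom name.

```javascript
// Drop a single index by name.
function dropPatientNameIndex(collection) {
  // "patientname_1" is the default name of an ascending index on patientname.
  return collection.dropIndex("patientname_1");
}

// Drop every index on the collection; the _id index is kept by MongoDB.
function dropAllIndexes(collection) {
  return collection.dropIndexes();
}
```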




Get index information with indexInformation()

indexInformation() returns the indexes defined on a collection, including their names and the fields they cover.
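The sketch below formats such a result for printing. It assumes the default (non-full) result shape of the Node.js driver, an object that maps each index name to a list of [field, direction] pairs. The describeIndexes helper and the sample object are made up for illustration.

```javascript
// Turn an indexInformation()-style result into readable lines.
// Assumed input shape: { indexName: [[field, direction], ...], ... }
function describeIndexes(info) {
  return Object.entries(info).map(([name, fields]) => {
    const spec = fields
      .map(([field, direction]) => `${field}: ${direction}`)
      .join(", ");
    return `${name} -> { ${spec} }`;
  });
}

// Hand-written sample, not output captured from a real server.
const sampleInfo = {
  _id_: [["_id", 1]],
  patientname_1: [["patientname", 1]],
};
describeIndexes(sampleInfo).forEach((line) => console.log(line));
```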





The examples below show the performance difference between querying MongoDB with and without an index.

The dataset used for the examples is patient data containing 3 million records.

3 million Records without Indexing

Database name: Patient
Collection name: pat10
Record type: CSV (Patient data)


Command Line Representation


Bar Chart Representation


  • Data size: 3,000,000 records
  • Scanned records: 3,000,000
  • Time taken: 1112 ms
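Figures like these can be read from a query's explain output. The sketch below assumes the executionStats format used by current MongoDB versions (older releases report different field names), and the summarizeExplain helper is hypothetical. The sample object is hand-written to mirror the numbers above; it is not captured server output.

```javascript
// Pull the interesting numbers out of an explain("executionStats") result.
function summarizeExplain(explainOutput) {
  const stats = explainOutput.executionStats;
  return {
    // COLLSCAN means the winning plan read the whole collection.
    usedCollectionScan: explainOutput.queryPlanner.winningPlan.stage === "COLLSCAN",
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    timeMs: stats.executionTimeMillis,
  };
}

// Hand-written sample shaped like explain output for the unindexed query.
const unindexedExplain = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: {
    totalDocsExamined: 3000000,
    totalKeysExamined: 0, // no index keys are read during a collection scan
    executionTimeMillis: 1112,
  },
};

console.log(summarizeExplain(unindexedExplain));
```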

3 million Records with Indexing

Database name: Patient
Collection name: pat10wi
Record type: CSV (Patient data)

Command Line Representation



Bar Chart Representation


  • Data size: 3,000,000 records
  • Scanned records: 5
  • Time taken: 13 ms
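To put the two results side by side, the small calculation below uses only the figures reported in this post. The compareRuns helper is made up for the example.

```javascript
// Figures reported in this post for the two collections.
const withoutIndex = { scanned: 3000000, timeMs: 1112 };
const withIndex = { scanned: 5, timeMs: 13 };

// Ratio of time taken and of records scanned between two runs.
function compareRuns(before, after) {
  return {
    speedup: before.timeMs / after.timeMs,
    scanReduction: before.scanned / after.scanned,
  };
}

const comparison = compareRuns(withoutIndex, withIndex);
console.log(comparison.speedup.toFixed(1)); // 85.5
console.log(comparison.scanReduction);      // 600000
```

In this test the indexed query was roughly 85 times faster and examined 600,000 times fewer records.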



The mongostat utility provides a quick overview of the status of a currently running mongod or mongos instance. mongostat is functionally similar to the UNIX/Linux utility vmstat, but provides data regarding mongod and mongos instances.


  • Lock: shows the write lock time, which becomes visible while inserting. While the lock is held, no other queries will complete until it is given up. A high value is indicative of a large, global operation like a remove() or dropping a collection, and can result in slow performance.
  • qr|qw: when MongoDB gets more queries than it can handle in real time, it queues them up. These columns show the number of clients queued waiting to read (qr) and to write (qw).



Linux top command

The top command displays the processor activity of your Linux box and the tasks managed by the kernel in real time. It shows how processor and memory are being used, along with other information such as the running processes, which may help you take the correct action. The top command is found in UNIX-like operating systems.



mongotop provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second.




Written by Amudhan, Data Engineer @ DataDotz.

DataDotz is a Chennai-based Big Data team primarily focussed on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.