
Indexing in MongoDB


What is indexing?

Indexes provide high-performance read operations for frequently used queries. An index in a database works like the index at the back of a book: it lets you find the right pages without reading the whole book.

 

Introduction

Indexes support the efficient execution of queries in MongoDB. With an index, MongoDB scans only a small subset of the documents in a collection. Without an index, it must scan every document in the collection, which takes much longer on large collections.

An index is a special data structure based on a B-tree. Indexes in MongoDB are similar to indexes in other database systems such as MySQL and Oracle. MongoDB defines indexes at the collection level and supports indexes on any field of the documents. MongoDB can use the index to limit the number of documents it must inspect.
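To make the difference concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not how MongoDB stores data: the "collection" is an array, the "index" is a sorted list of keys searched with binary search as a stand-in for a B-tree, and the patientname values are made up for illustration.

```javascript
// Toy model only: a "collection" is an array of documents and an "index" is
// a sorted list of [key, position] pairs. All names and values are hypothetical.

// Build a collection of n patient documents with zero-padded names.
function makeCollection(n) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({ _id: i, patientname: "patient" + String(i).padStart(7, "0") });
  }
  return docs;
}

// Collection scan: examine documents one by one until a match is found.
function collectionScan(docs, name) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.patientname === name) return { doc, examined };
  }
  return { doc: null, examined };
}

// Build a sorted index on patientname.
function buildIndex(docs) {
  return docs
    .map((doc, pos) => [doc.patientname, pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Index lookup: binary search over the sorted keys, then fetch one document.
function indexLookup(docs, index, name) {
  let lo = 0;
  let hi = index.length - 1;
  let keysExamined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    keysExamined++;
    const key = index[mid][0];
    if (key === name) return { doc: docs[index[mid][1]], keysExamined };
    if (key < name) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, keysExamined };
}

const docs = makeCollection(100000);
const index = buildIndex(docs);
const target = "patient0099999"; // the last document, worst case for the scan

const scan = collectionScan(docs, target);
const lookup = indexLookup(docs, index, target);
console.log("documents examined by scan:", scan.examined);
console.log("index keys examined by lookup:", lookup.keysExamined);
```

The scan has to touch every document to reach the last one, while the indexed lookup needs only a handful of key comparisons before fetching a single document.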

 

What is a B-tree?

A B-tree is a balanced tree data structure that keeps keys in sorted order and is used to place and locate records in a database. Because the tree stays shallow, it minimizes the number of nodes that must be read to locate a desired record, thereby speeding up the overall process.
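As a rough illustration of the lookup, the sketch below searches a small hand-built B-tree and counts how many nodes it visits. The tree, its keys, and the function name btreeSearch are all made up for this example; a real B-tree also handles insertion and node splitting, which are left out here.

```javascript
// Each node holds sorted `keys`; an internal node has keys.length + 1 children,
// a leaf has none. This tree is written by hand for illustration.
const leaf = (keys) => ({ keys, children: [] });

const tree = {
  keys: [30, 60],
  children: [
    { keys: [10, 20], children: [leaf([1, 5]), leaf([12, 15]), leaf([22, 27])] },
    { keys: [40, 50], children: [leaf([32, 35]), leaf([42, 47]), leaf([52, 55])] },
    { keys: [70, 80], children: [leaf([62, 65]), leaf([72, 77]), leaf([82, 90])] },
  ],
};

// Walk from the root towards a leaf, choosing one child per level.
function btreeSearch(root, key) {
  let node = root;
  let nodesVisited = 0;
  while (node) {
    nodesVisited++;
    // Find the position of the first key that is >= the search key.
    let i = 0;
    while (i < node.keys.length && key > node.keys[i]) i++;
    if (i < node.keys.length && node.keys[i] === key) {
      return { found: true, nodesVisited };
    }
    if (node.children.length === 0) {
      return { found: false, nodesVisited }; // reached a leaf without a match
    }
    node = node.children[i];
  }
  return { found: false, nodesVisited };
}

console.log(btreeSearch(tree, 47));
console.log(btreeSearch(tree, 33));
```

The tree holds 26 keys, yet any search visits at most three nodes, one per level.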


 

Index Types

MongoDB provides a number of different types of indexes. In general, you should create indexes that support your common and user-facing queries. Having these indexes will ensure that MongoDB scans the smallest possible number of documents.

  • Single Field Indexes
  • Compound Indexes
  • Multikey Indexes
  • Geospatial Indexes and Queries
  • Text Indexes
  • Hashed Index
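For orientation, the snippet below lists one index key specification per type, in the form that would be passed as the first argument to createIndex(). The field names (patientname, admitted, diagnoses, location, notes, patientid) are hypothetical and not taken from the dataset used later in this post.

```javascript
// One example key specification per index type. Field names are made up.
const indexSpecs = {
  // Ascending index on a single field.
  singleField: { patientname: 1 },
  // Two fields, the second one in descending order.
  compound: { patientname: 1, admitted: -1 },
  // Same syntax as a single-field index; it is multikey because the
  // (assumed) `diagnoses` field holds an array.
  multikey: { diagnoses: 1 },
  // Geospatial index on GeoJSON data.
  geospatial: { location: "2dsphere" },
  // Text index for text search on string content.
  text: { notes: "text" },
  // Hashed index on the hash of the field value.
  hashed: { patientid: "hashed" },
};

for (const [type, spec] of Object.entries(indexSpecs)) {
  console.log(type, JSON.stringify(spec));
}
```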

 

Index Creation

Indexes are needed to make queries faster. For example, if you need to find records by a field named patientname and that field has an index, the query will be faster than it would be without the index.

Syntax

collection.createIndex(index[, options], callback) 
db.createIndex(collectionname, index[, options], callback)

  • index: the field or fields to be indexed, for example {patientname: 1}.
  • options: optional settings, for example {sparse: true} to include only records that have the indexed field.
  • callback: receives two parameters, an error object (if an error occurred) and the name of the newly created index.
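A minimal sketch of the call is shown below. It assumes the promise-returning form of createIndex() found in newer Node.js driver versions rather than the callback form in the syntax above. So that it runs without a server, it uses a stand-in object, fakeCollection, that only records the call; both fakeCollection and the helper createPatientNameIndex are hypothetical names for this example.

```javascript
// `collection` is expected to behave like a Node.js driver Collection object.
// Returns whatever createIndex() returns (a promise for the index name in
// promise-based driver versions).
function createPatientNameIndex(collection) {
  return collection.createIndex(
    { patientname: 1 }, // index: ascending index on patientname
    { sparse: true }    // options: only index documents that have patientname
  );
}

// Stand-in for a real collection (an assumption for this sketch). It records
// the arguments and resolves with the conventional default index name.
const fakeCollection = {
  calls: [],
  createIndex(keys, options) {
    this.calls.push({ keys, options });
    return Promise.resolve("patientname_1");
  },
};

createPatientNameIndex(fakeCollection).then((name) => {
  console.log("created index:", name);
});
```

With a real connection, the same helper would be called with a collection obtained from the driver instead of fakeCollection.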

 

Ensure indexes with ensureIndex()

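ensureIndex() is similar to createIndex(), with the difference that it checks whether the index already exists before adding it, to avoid duplicate indexes. In later MongoDB releases ensureIndex() is deprecated in favour of createIndex(), which leaves an existing identical index untouched.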

Syntax

collection.ensureIndex({patientname:1})

 

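Remove indexes with dropIndex() and dropIndexes()

A single index can be dropped by name with dropIndex(). All indexes on a collection, except the mandatory index on _id, can be dropped at once with the dropIndexes command.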

Syntax

collection.dropIndexes(callback)

 

Get index information with indexInformation()

This can be used to fetch information about the indexes of a collection, such as the name of each index and the fields it covers.

Syntax

collection.indexInformation(callback)
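As a sketch of how the result might be used, the helper below looks for indexes that start with a given field. It assumes the default result shape of the Node.js driver, an object that maps each index name to a list of [field, direction] pairs. The sample object is written by hand for illustration, not captured from a real server, and indexesLeadingWith is a hypothetical helper name.

```javascript
// Hand-written example of an indexInformation() result (assumed shape):
// index name -> list of [field, direction] pairs.
const sampleInfo = {
  _id_: [["_id", 1]],
  patientname_1: [["patientname", 1]],
};

// Return the names of all indexes whose first key is `field`.
function indexesLeadingWith(info, field) {
  return Object.keys(info).filter(
    (name) => info[name].length > 0 && info[name][0][0] === field
  );
}

console.log(indexesLeadingWith(sampleInfo, "patientname"));
console.log(indexesLeadingWith(sampleInfo, "age"));
```

A check like this can tell you whether a query on patientname has an index available before you decide to create one.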

 

INDEX PERFORMANCE

The performance difference between MongoDB with and without an index is shown below with examples.

The dataset used for the examples is patient data with 3 million records.

3 million Records without Indexing

Database name: Patient
Collection name: pat10
Record type: CSV (patient data)

 

Command Line Representation

[Screenshot of the command-line output; image not reproduced]

Bar Chart Representation

Data size: 3,000,000 records
Scanned records: 3,000,000
Time taken: 1112 ms

3 million Records with Indexing

Database name: Patient
Collection name: pat10wi
Record type: CSV (patient data)

Command Line Representation

[Screenshot of the command-line output; image not reproduced]

 

Bar Chart Representation

Data size: 3,000,000 records
Scanned records: 5
Time taken: 13 ms
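The two measurements can be compared with a few lines of JavaScript. The sketch below assumes the numbers are read from the executionStats section of an explain("executionStats") result, using its totalDocsExamined and executionTimeMillis fields. The two objects are written by hand from the figures above rather than copied from real explain output, and mapping "scanned records" to totalDocsExamined is an assumption.

```javascript
// Pull the two figures of interest out of an explain("executionStats") result.
function summarize(explainOutput) {
  const stats = explainOutput.executionStats;
  return {
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// Hand-written stand-ins holding the article's measurements (assumed mapping).
const withoutIndex = {
  executionStats: { totalDocsExamined: 3000000, executionTimeMillis: 1112 },
};
const withIndex = {
  executionStats: { totalDocsExamined: 5, executionTimeMillis: 13 },
};

// Compare the run without an index against the run with one.
function compare(before, after) {
  const b = summarize(before);
  const a = summarize(after);
  return {
    speedup: b.millis / a.millis,
    examinedRatio: b.docsExamined / a.docsExamined,
  };
}

const result = compare(withoutIndex, withIndex);
console.log("speedup:", result.speedup.toFixed(1) + "x");
console.log("documents examined ratio:", result.examinedRatio);
```

On these figures the indexed query is roughly 85 times faster and examines 600,000 times fewer records.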

 

MONITORING MONGO WITH STAT TOOLS

The mongostat utility provides a quick overview of the status of a currently running mongod or mongos instance. mongostat is functionally similar to the UNIX/Linux system utility vmstat, but provides data regarding mongod and mongos instances.


  • Lock: while inserting, this column shows the write lock time. While the lock is held, no other queries will complete until it is released. A high value is indicative of a large, global operation such as a remove() or dropping a collection, and can result in slow performance.
  • qr|qw: the number of queued reads (qr) and queued writes (qw). When MongoDB gets too many queries to handle in real time, it queues them up.


 

The Linux top command

The top command displays the processor activity of your Linux machine and the tasks managed by the kernel in real time. It shows how processor and memory are being used, along with other information such as the running processes, which may help you take corrective action. The top command is available on UNIX-like operating systems.


Mongotop

mongotop provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second.


 

 

Written by Amudhan, Data Engineer @ DataDotz.

DataDotz is a Chennai-based Big Data team primarily focused on consulting and training on technologies such as Apache Hadoop, Apache Spark, NoSQL (HBase, Cassandra, MongoDB), Search and Cloud Computing.