How to Speed Up Mongo Queries

As you might already know, MongoDB is fast at read queries, especially when it's using indexes.

If you are not familiar with indexes in MongoDB, here is a quick summary:

MongoDB has something internally called an index; each index is matched up with an individual collection.

An index is an efficient data structure for looking up sets of records inside that collection.
Rather than checking each record one at a time to see whether it matches the query (a full collection scan), the index lets us go directly to the records we care about.

However, you can very easily write queries that don't match up with an index, or for which no index is available.
In those situations, you can quickly run into serious performance problems in your application.
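To get an intuition for the difference, here is a plain JavaScript analogy (MongoDB's indexes are more sophisticated than this, but the idea is the same):

```javascript
// A plain JavaScript analogy (not how MongoDB actually works internally):
// a full collection scan checks every record, while an index is a
// precomputed structure that jumps straight to the matching record.
const records = [
  { id: 1, email: 'a@example.com' },
  { id: 2, email: 'b@example.com' },
  { id: 3, email: 'c@example.com' },
];

// "Full collection scan": examine records one by one until one matches.
function scanFor(email) {
  return records.find(r => r.email === email); // O(n)
}

// "Index": a lookup structure keyed by the indexed field.
const emailIndex = new Map(records.map(r => [r.email, r]));
function lookup(email) {
  return emailIndex.get(email); // O(1) on average
}

console.log(scanFor('b@example.com').id); // 2
console.log(lookup('b@example.com').id);  // 2
```

With three records the difference is invisible, but over millions of records the scan grows linearly while the indexed lookup stays essentially constant.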

Now there are two ways to solve this performance concern:

1- Creating Multiple Indexes for each collection:

That might seem like a really obvious thing to do.

However, every index you add to a collection slows down writes to it, because each write must also update the indexes.

Also, you might be making queries inside an application where you can't really figure out ahead of time which indexes you need.

2- Caching Data with Redis

Redis is an open-source, in-memory data structure store, used as a database, cache, and message broker.

Here is how Redis works as a cache server:

Any time Mongoose issues a query, it first goes over to Redis, which checks whether that exact query has ever been issued before.

If it hasn't, the server takes the query and sends it over to MongoDB, which executes it.

We then take the results of that query and store them in Redis.

Now, any time the exact same query is issued again, Mongoose sends it over to Redis first; if Redis sees that the query has already been issued before, it doesn't send the query on to MongoDB.

Instead, it takes the response it stored for that query the last time and immediately sends it back over to Mongoose.
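The whole flow can be sketched in a few lines, with a Map standing in for Redis (cachedQuery and fakeQuery are illustrative names, not the real implementation we build below):

```javascript
// A minimal sketch of the caching flow described above, with a Map
// standing in for Redis.
const cache = new Map();

async function cachedQuery(key, runQuery) {
  if (cache.has(key)) {
    return cache.get(key);         // cache hit: never touches MongoDB
  }
  const result = await runQuery(); // cache miss: run the real query
  cache.set(key, result);          // remember the result for next time
  return result;
}

// The second identical "query" is served from the cache:
let dbCalls = 0;
const fakeQuery = async () => { dbCalls += 1; return [{ title: 'Hello' }]; };

(async () => {
  await cachedQuery('blogs:user1', fakeQuery);
  await cachedQuery('blogs:user1', fakeQuery);
  console.log(dbCalls); // 1
})();
```

The rest of this article is about wiring exactly this logic into Mongoose and Redis.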

A few notes before we get started:

Redis is a data store that operates only in memory, so once it gets turned off or restarted, all the data that sits inside it is instantly wiped.

So in practice, we only use Redis for data we are OK with suddenly disappearing into thin air, because who knows, maybe we lose power or the machine gets restarted.

Redis can only store plain strings (and numbers), not JavaScript objects; that's why you'll see a lot of JSON.stringify() and JSON.parse() in the code.
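The round trip looks like this (plain JavaScript, no Redis involved):

```javascript
// Objects go into Redis as strings and come back out as strings,
// so we serialize on the way in and parse on the way out.
const doc = { _id: 'abc123', title: 'My Post', tags: ['node', 'redis'] };

const stored = JSON.stringify(doc);  // what would actually sit in Redis
const restored = JSON.parse(stored); // what we rebuild after reading it back

console.log(typeof stored);  // 'string'
console.log(restored.title); // 'My Post'
```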

We are going to be using Express.js and Mongoose with MongoDB and Redis.

NOW LET’S GET STARTED:

FIRST STEP:

  • Install Redis on your machine, and don't forget to add it to your PATH environment variable (you can google how to do this for your OS).

LET’S START CODING:

You can find all the code in this GitHub repo; feel free to fork/clone it and follow along.

Now, before we go any further, let's discuss how we are really going to implement this caching.

I know this is going to sound kind of crazy but it’s going to work out really well.

We are going to change how Mongoose makes a query and executes it.

Just think about it: our entire caching strategy is based on the idea of somehow stopping Mongoose from making a query over to Mongo.

It is also based on the idea of somehow intercepting the value that comes back from Mongo, so we can store it inside of our cache.

So this entire idea of caching is incredibly tightly coupled with mongoose.

Now, under the services folder, we're gonna create a file named cache.js that will hold all of the caching logic.

Always start with a few imports (don't forget to npm install these):

const mongoose = require('mongoose');
const redis = require('redis');
const util = require('util');

This is how we create a Redis client:

const redis_url = 'redis://127.0.0.1:6379'
const client = redis.createClient(redis_url);

For the redis_url, that's the default address Redis listens on locally.

Here we are using util.promisify only to make the function client.hget return a promise instead of accepting a callback (just to avoid writing a callback every time we call this function):

client.hget = util.promisify(client.hget);

The function hget is used to retrieve nested hash values from Redis, and the function hset is used to store them.

As you can see, without this we would need a callback every time we call hget, and that's really tiring:

hget('spanish', 'red', (err, val) => console.log(val))

After using util.promisify, we can call it like this:

const cacheValue = await client.hget('spanish', 'red')

Now the next step is to hijack the Mongoose query process and monkey-patch Mongoose's exec function, so it checks Redis before sending queries to Mongo.

Before that, we need to monkey-patch a new method called cache:

mongoose.Query.prototype.cache = function (options = {}) {
  this.useCache = true;
  this.hashKey = JSON.stringify(options.key || '');
  return this;
};

This is the function we will use to set the key (the first argument to the hset function) and to choose whether a query gets stored in Redis or not.
The usage of this function will look something like this:

const blogs = await Blog.find({ _user: req.user.id })
  .cache({
    key: req.user.id
  });

Now let’s move to the exec function:

First, we save a reference to the original:

const exec = mongoose.Query.prototype.exec;

Now we monkey-patch it:

mongoose.Query.prototype.exec = async function () {
  if (!this.useCache) {
    return exec.apply(this, arguments);
  }

  const key = JSON.stringify(Object.assign({}, this.getQuery(), {
    collection: this.mongooseCollection.name
  }));

  const cacheValue = await client.hget(this.hashKey, key);

  if (cacheValue) {
    const doc = JSON.parse(cacheValue);

    return Array.isArray(doc)
      ? doc.map(d => new this.model(d))
      : new this.model(doc);
  }

  const result = await exec.apply(this, arguments);

  client.hset(this.hashKey, key, JSON.stringify(result));
  return result;
};

module.exports = {
  clearHash(hashKey) {
    client.del(JSON.stringify(hashKey));
  }
};

Let’s break this down into little pieces :

Here we check whether the cache method was called. If it wasn't, we just let Mongoose do its regular job without interference; otherwise, we start the caching process:

if (!this.useCache) {
  return exec.apply(this, arguments);
}

Here we store the query and the name of the collection it is performed on in an object called key (this becomes the second argument to hget and hset):

const key = JSON.stringify(Object.assign({}, this.getQuery(), {
  collection: this.mongooseCollection.name
}));
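To see why the collection name is folded into the key, here's a small standalone sketch (buildKey is just for illustration; the real code inlines this logic):

```javascript
// The same query on two different collections must not collide in the
// cache, so the collection name is merged into the stringified key.
function buildKey(query, collectionName) {
  return JSON.stringify(Object.assign({}, query, { collection: collectionName }));
}

const a = buildKey({ _user: '42' }, 'blogs');
const b = buildKey({ _user: '42' }, 'users');
const c = buildKey({ _user: '42' }, 'blogs');

console.log(a === b); // false: same query, different collection
console.log(a === c); // true: identical query and collection
```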

Here we try to retrieve the data from Redis using hget:

const cacheValue = await client.hget(this.hashKey, key);

if (cacheValue) {
  const doc = JSON.parse(cacheValue);

  return Array.isArray(doc)
    ? doc.map(d => new this.model(d))
    : new this.model(doc);
}

If we find an exact match for the query, we transform it from a plain JavaScript object (or array of objects) back into Mongoose documents and return the result.
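Here's a standalone sketch of that hydration step, with FakeModel standing in for a Mongoose model:

```javascript
// Cached JSON is plain data, so each object is wrapped back into a
// model instance before being returned to the caller.
class FakeModel {
  constructor(attrs) { Object.assign(this, attrs); }
}

function hydrate(doc, Model) {
  return Array.isArray(doc)
    ? doc.map(d => new Model(d)) // a cached list of documents
    : new Model(doc);            // a single cached document
}

const one = hydrate({ title: 'a' }, FakeModel);
const many = hydrate([{ title: 'a' }, { title: 'b' }], FakeModel);

console.log(one instanceof FakeModel); // true
console.log(many[1].title);            // 'b'
```

Without this step, callers would receive plain objects and any Mongoose document methods they expect (save, populate, and so on) would be missing.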

Otherwise, issue the intended query to Mongo and store the result in Redis:

const result = await exec.apply(this, arguments);

client.hset(this.hashKey, key, JSON.stringify(result));
return result;

This is how we store and retrieve data from Redis

BUT WE STILL NEED TO CLEAR THE REDIS CACHE WHENEVER WE CHANGE THE DATA, OTHERWISE WE WILL KEEP SERVING STALE DATA FROM REDIS

This clearHash function is going to dump all data associated with a given hashKey (the same first argument we were using in hset/hget):

module.exports = {
  clearHash(hashKey) {
    client.del(JSON.stringify(hashKey));
  }
};

Now, in the middlewares folder, we create a file called clearCache.js:

const { clearHash } = require('../services/cache');

module.exports = async (req, res, next) => {
  await next();

  clearHash(req.body.userId);
};

This function will be used as middleware on our POST route, or any other route that changes our data and therefore needs the cache updated.

It will look something like this:

router.post('/', clearCache, async (req, res) => {
  const { userId, title, content } = req.body;

  const user = await User.findById(userId);

  const article = new Article({
    title,
    content,
    user
  });
  await article.save();
  res.json(article);
});

The clearCache middleware calls next(), which runs the route handler we set up; once the handler finishes executing, the middleware resumes and deletes the data it needs to!

We use this pattern so that clearCache does its work only after the route handler has executed as expected; that way, we don't delete data from Redis if the route handler didn't do its job.
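You can see that ordering with plain async functions standing in for Express middleware (a simplified sketch; real Express supplies its own next function):

```javascript
// Why `await next()` matters: the cache clear runs only after the route
// handler has finished, and is skipped entirely if the handler throws.
const log = [];

async function clearCache(req, res, next) {
  await next();              // let the route handler finish first
  log.push('cache cleared'); // only then invalidate the cache
}

async function routeHandler() {
  log.push('document saved'); // stand-in for the real save + response
}

(async () => {
  await clearCache({}, {}, routeHandler);
  console.log(log); // ['document saved', 'cache cleared']
})();
```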

Ouuf! WE FINISHED!

Make sure to check out the code in the GitHub repo.

The app in the GitHub repo is fully functional, and you can test the caching with it if Redis is installed correctly on your machine.

It's pretty straightforward, and you can understand it pretty quickly if you have experience with Express!

Connect with me on LinkedIn and see you in the next article!
