
CouchDB, PouchDB, and Ionic 2: Querying Data with MapReduce

Originally published March 07, 2017 (9 min read)

In the first tutorial of this series I introduced CouchDB at a high level and touched on why you might want to use CouchDB over some other options, and why I think it makes a great companion for Ionic 2 applications.

We mostly talked about CouchDB at a conceptual level, but in this tutorial I will cover how to actually do things with CouchDB in more concrete terms. As I mentioned in the last article, CouchDB can be a little hard to understand, especially if you are used to relational databases.

The way you query data in CouchDB is very different from using standard SQL, so we will be focusing on how you can run queries against your data in CouchDB.

(This is the last theory post, I promise. We'll get into actually building stuff in the next tutorial.)

Querying Data

This is probably one of the bigger conceptual pain points for CouchDB. Once people have data, they generally want to perform operations like:

  • “Get all comments that belong to this post”
  • “Return a count of the total number of comments for this post”
  • “Get all products from this user’s shopping cart”

These types of requests will allow an application to do what it needs to do, but people may also want to retrieve information for the purpose of reports or analytics, like:

  • “Get all users who have signed up in the past week”
  • “Get users who have not posted in the last 60 days”
  • “Get the total number of posts”

The way one would run these sorts of queries against a relational database is pretty well known, e.g.:

SELECT * FROM users WHERE signup_date > CURDATE() - INTERVAL 7 DAY

or perhaps you may also need to join some tables together using their foreign keys. Either way, it's reasonably easy to look at a query like that and see what it does. Some NoSQL databases provide their own query-like structures too; MongoDB, for example, provides operators like $gt, $or, and so on:

db.inventory.find({ status: 'A', qty: { $lt: 30 } });

and then there's Couchbase's N1QL, which provides a syntax similar to standard SQL.

Relational databases have been the de facto standard for a long time, so this stuff is ingrained in a lot of developers, and moving to a NoSQL database that uses a similar approach (like N1QL) might seem less intimidating.

Querying data with CouchDB is a bit of a paradigm shift and will take some getting used to, but if you could wipe all knowledge of databases from your mind, I think CouchDB's method of retrieving data would actually be quite easy to learn (especially if you are already familiar with JavaScript).

Retrieving Data with MapReduce

In order to retrieve data with CouchDB, we use a process called MapReduce to create views. A view contains rows of data that are sorted by each row's key (you might use a date as the key, for example, to sort your data by date). MapReduce is a combination of two concepts: Map and Reduce. This isn't specific to CouchDB at all; in fact, we've already covered similar concepts in the past when we discussed how to map, filter, and reduce arrays in JavaScript.
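As a quick refresher on the array versions of these ideas in plain JavaScript, here is a small sketch (the `people` data is made up for illustration): `filter` narrows the items, `map` transforms each one, and `reduce` folds the results into a single value.

```javascript
// Hypothetical sample data, for illustration only
const people = [
  { name: "Alice", age: 31, income: 52000 },
  { name: "Bob", age: 42, income: 61000 },
  { name: "Carol", age: 34, income: 48000 },
];

// filter: keep only people aged 30 to 35
const inRange = people.filter((p) => p.age >= 30 && p.age <= 35);

// map: transform each remaining person into just their income
const incomes = inRange.map((p) => p.income);

// reduce: fold the list of incomes into a single total
const totalIncome = incomes.reduce((acc, income) => acc + income, 0);

console.log(incomes);     // [ 52000, 48000 ]
console.log(totalIncome); // 100000
```

CouchDB's map step blends the first two ideas: a map function can skip documents (filtering) and also decide what to output for the ones it keeps (transforming).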

Let’s refer to the Wikipedia definition of MapReduce:

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).

I don't think it is important, at least initially, to understand how MapReduce works behind the scenes (i.e. we don't really care whether the MapReduce process runs in parallel across multiple threads to increase performance). What is important is understanding what you need to do: map and reduce. Basically, there are two steps:

  1. First, we map the data, which means we convert the data from one form to another. The map function runs on every single document in the database, and in CouchDB this process results in rows of data. While mapping, we call emit(key, value) for a document whenever we want a row for it to be added to the results. You can use absolutely anything you like as the key, but the key is what you will use to access and sort your data, so make sure it makes sense. If you want to query by date, use the date as the key; if you want to query by something like car make or model, you might use that as the key instead. Keys don't need to be unique: when several rows share the same key, a query for that key returns all of them.
  2. Once we have our list of data, we also have the option to reduce it (this is not required; you can just stick with the list created in the map step). Reducing is more or less what it sounds like: it reduces the result of the map step to something smaller. This doesn't mean that we just exclude certain results (i.e. filter out 3 of the 10 rows returned from the map step); usually, the goal of a reduce function is to end up with a single value. For example, we could write a map function that emits rows with age as the key and income as the value, restricted to people between 30 and 35, and then reduce those rows to the average of all of the incomes. So, at the end of the map step we have a list of incomes of people between 30 and 35, and at the end of the reduce step we just have a single value that represents the average income, e.g. 50,000.
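To make the map step more concrete, here is a small in-memory model of it: the map function is called once per document, each emit(key, value) call adds a row, and the rows are then sorted by key. This is only a sketch for building intuition. The `runMap` and `compareRows` helpers and the sample documents are hypothetical, the map function here receives `emit` as an argument (in CouchDB it is provided globally), and CouchDB itself stores view rows in a B-tree and orders keys with its own collation rules rather than JavaScript string comparison.

```javascript
// Hypothetical documents, loosely modelled on CouchDB docs with an _id field
const docs = [
  { _id: "car1", make: "Toyota", model: "Corolla" },
  { _id: "car2", make: "Honda", model: "Civic" },
  { _id: "car3", make: "Toyota", model: "Yaris" },
];

// Order rows by key, breaking ties by document id (simple string comparison)
function compareRows(a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Run a map function over every document and collect the emitted rows
function runMap(allDocs, mapFn) {
  const collected = [];
  for (const doc of allDocs) {
    // emit() records a row tagged with the id of the document being mapped
    const emit = (key, value) => collected.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return collected.sort(compareRows);
}

// Map step: use the car make as the key and the model as the value
const rows = runMap(docs, (doc, emit) => {
  if (doc.make) {
    emit(doc.make, doc.model);
  }
});

// Keys need not be unique: both Toyota rows are kept
const toyotas = rows.filter((row) => row.key === "Toyota");
console.log(rows.map((row) => row.key));      // [ 'Honda', 'Toyota', 'Toyota' ]
console.log(toyotas.map((row) => row.value)); // [ 'Corolla', 'Yaris' ]
```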

A MapReduce view is created by adding it to the ‘_design’ doc for the database (which we will discuss later). You can also use Futon, which is accessible at the following URL once you have CouchDB set up:

http://127.0.0.1:5984/_utils/

to create a ‘Temporary View’ directly through the interface:

[Image: CouchDB Map]

In the example above, I have a map and a reduce function set up. The result displayed at the bottom of the image is the result of the map step, which contains rows of data with each document's name as the key and the age as the value. We can then enable the reduce step by ticking the reduce box just above the result set.

[Image: CouchDB Reduce]

Now the reduce function is being applied to that result set, and we generate an object that contains the total number of people and their combined age. The values parameter contains the values for many rows at once, not just one, so we are able to sum all of those values together and check how many values there are in total. Whatever we return from this function will be the result of the reduce.
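The code in the screenshots isn't reproduced as text, so the following is only a guess at a map and reduce pair that would produce the output described (name as key, age as value, reduced to a count and a combined age). The `count` and `totalAge` field names are assumptions. The two functions are written as you might paste them into a view; the lines after them are a hypothetical harness that defines `emit` and calls the functions directly, so the example can run outside CouchDB.

```javascript
// Map: one row per document that has a name and a numeric age,
// keyed by name with the age as the value
const map = function (doc) {
  if (doc.name && typeof doc.age === "number") {
    emit(doc.name, doc.age);
  }
};

// Reduce: when rereduce is false, values are raw ages; when it is true,
// they are objects returned by earlier calls to this same function
const reduce = function (keys, values, rereduce) {
  let count = 0;
  let totalAge = 0;
  for (const v of values) {
    if (rereduce) {
      count += v.count;
      totalAge += v.totalAge;
    } else {
      count += 1;
      totalAge += v;
    }
  }
  return { count: count, totalAge: totalAge };
};

// --- Hypothetical harness standing in for CouchDB ---
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

[
  { name: "Sam", age: 25 },
  { name: "Jo", age: 40 },
  { type: "note", text: "no name or age, so nothing is emitted" },
].forEach((doc) => map(doc));

// CouchDB would pass [key, docId] pairs as keys; this reduce ignores them
const result = reduce(rows.map((r) => r.key), rows.map((r) => r.value), false);
console.log(result); // { count: 2, totalAge: 65 }
```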

To give you another example, if we were to actually implement the example I gave about ages and incomes, it might look something like this:

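map

function(doc){
    // Only emit rows for people aged 30 to 35, keyed by age with income as the value
    if(doc.age > 29 && doc.age < 36){
        emit(doc.age, doc.income);
    }
}

reduce

function(keys, values, rereduce){
    // CouchDB may reduce in stages and call this again with rereduce = true,
    // passing in the objects returned by earlier calls. Averaging those averages
    // would be wrong, so carry the running total and count alongside the average.
    var total = 0;
    var count = 0;

    for (var i = 0; i < values.length; i++) {
        if (rereduce) {
            total += values[i].total;
            count += values[i].count;
        } else {
            total += values[i];
            count += 1;
        }
    }

    return { total: total, count: count, average: total / count };
}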
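One subtlety with averages: CouchDB may run a reduce function on separate groups of rows and then call it again with rereduce set to true, passing in its own earlier results. The sketch below shows why that matters. The data and function names are hypothetical, and the split into two chunks is chosen by hand rather than by CouchDB. A reduce that returns only an average gives the wrong answer when its averages are averaged again, while one that carries a running total and count does not.

```javascript
// Naive reduce: returns only the average of whatever values it receives
function naiveAverage(keys, values, rereduce) {
  const total = values.reduce((acc, v) => acc + v, 0);
  return total / values.length;
}

// Rereduce-safe reduce: carries total and count so partial results combine correctly
function safeAverage(keys, values, rereduce) {
  let total = 0;
  let count = 0;
  for (const v of values) {
    if (rereduce) {
      total += v.total;
      count += v.count;
    } else {
      total += v;
      count += 1;
    }
  }
  return { total, count, average: total / count };
}

// Hypothetical incomes, split into two uneven chunks
const chunkA = [40000, 50000, 60000];
const chunkB = [90000];

// Naive: averaging the chunk averages weights each chunk equally
const naive = naiveAverage(null, [
  naiveAverage(null, chunkA, false),
  naiveAverage(null, chunkB, false),
], true);

// Safe: combine the partial totals and counts instead
const safe = safeAverage(null, [
  safeAverage(null, chunkA, false),
  safeAverage(null, chunkB, false),
], true);

console.log(naive);        // 70000 (wrong)
console.log(safe.average); // 60000 (correct: 240000 / 4)
```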

We will go through some examples of actually creating more realistic MapReduce views and adding them to the design doc (rather than just creating temporary views) in a future tutorial, but for now, I just wanted to show you what it might actually look like.
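For a rough idea of what "adding a view to a design doc" means, a design document is itself a JSON document whose `views` field maps each view name to its map function (and optional reduce), with the functions stored as strings. The sketch below builds one as a JavaScript object; the names `people` and `incomes_30_to_35` are hypothetical. Instead of a custom reduce it uses CouchDB's built-in `_stats` reducer, which returns the sum, count, min, max and sum of squares of numeric values, so an average can be computed as sum divided by count.

```javascript
// A hypothetical design document; CouchDB expects view functions as strings
const designDoc = {
  _id: "_design/people",
  views: {
    incomes_30_to_35: {
      map: `function (doc) {
        if (doc.age > 29 && doc.age < 36) {
          emit(doc.age, doc.income);
        }
      }`,
      // Built-in reducer: sum, count, min, max and sumsqr of the emitted values
      reduce: "_stats",
    },
  },
};

// Serialised body that could be sent with PUT /mydb/_design/people
const body = JSON.stringify(designDoc, null, 2);
console.log(body);
```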

When you query a view, CouchDB runs its map function (and its reduce function, if it has one) against every document in the database. On the surface, that sounds like a bad idea, especially if you've got millions of documents. However, it only performs this full pass once, when the view is first built; after that, when data changes it only needs to update the view for the documents that changed (it doesn't need to regenerate the entire view).

To query a view, all you need to do is access its URL, which will look something like this (once you have added it to a design doc):

mydb/_design/nameofdesigndoc/_view/nameofview

You can also supply parameters in the URL to restrict the returned rows, for example by querying a view named by_date with a start_key and an end_key. The process would go something like: "Start at the first row whose key (a date) is equal to or sorts after the start_key, and keep returning rows until you pass the end_key." By default, rows whose key equals the end_key are included. We will go through examples of this in later tutorials.
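As a rough sketch of what such a request looks like, the helper below builds a view URL with start_key and end_key parameters. CouchDB expects key values to be JSON-encoded (so string keys keep their double quotes) and then URL-encoded; it also accepts the equivalent startkey and endkey spellings. The `viewUrl` helper and the `by_date` view are hypothetical, and the database and design doc names are the placeholders used above.

```javascript
// Parameters whose values CouchDB expects as JSON
const JSON_PARAMS = new Set(["key", "start_key", "end_key", "startkey", "endkey"]);

// Build a view URL; key parameters are JSON-encoded, then URL-encoded
function viewUrl(base, db, ddoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => {
      const encoded = JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value);
      return `${encodeURIComponent(name)}=${encodeURIComponent(encoded)}`;
    })
    .join("&");
  const path = `${base}/${db}/_design/${ddoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

// Hypothetical: one week of rows from a by_date view keyed on ISO date strings
const url = viewUrl("http://127.0.0.1:5984", "mydb", "nameofdesigndoc", "by_date", {
  start_key: "2017-03-01",
  end_key: "2017-03-07",
});

console.log(url);
// http://127.0.0.1:5984/mydb/_design/nameofdesigndoc/_view/by_date?start_key=%222017-03-01%22&end_key=%222017-03-07%22
```

The resulting URL could then be requested with `fetch` or any other HTTP client; CouchDB responds with JSON whose `rows` array holds the `id`, `key`, and `value` of each matching row.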

Summary

It's clear that the way things are done in the CouchDB world is very different from how you might do them with a relational database. It can be difficult to make that context switch and learn not to apply relational concepts to a non-relational database, but once you get comfortable with it, I think it's actually quite nice to work with.

At this point, I think we have enough theory behind us to start building some more practical things. In the next tutorial, we will start looking at integrating CouchDB into an Ionic 2 application.
