
· 6 min read

Last week at the Paris MUG, I had a quick chat about security and MongoDB, so I decided to write this post explaining how to configure the security features available out of the box in MongoDB.

You can find all the information about MongoDB security in the following documentation chapter:

In this post, I won't go into detail about how to deploy your database in a secured environment (DMZ/network/IP/location/...).

I will focus on authentication and authorization, and walk you through the steps to secure access to your database and data.

Note that by default, when you install and start MongoDB, security is not enabled, simply to make it easier to work with.

The first part of security is authentication; you have multiple choices, documented here. Let's focus on the "MONGODB-CR" mechanism.

The second part is authorization: selecting what a user can and cannot do once connected to the database. The documentation about authorization is available here.

Let's now see how to:

  1. Create an Administrator User
  2. Create Application Users

For each type of user, I will show how to grant specific permissions.
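As a minimal sketch of these two steps, the Python below builds the createUser command documents available since MongoDB 2.6. The user names, passwords and the myapp database are placeholders; the roles are the built-in userAdminAnyDatabase (administrator) and readWrite (application user) roles. The code only builds dictionaries and does not connect to a server.

```python
def create_user_command(user, pwd, roles):
    """Build a createUser command document.

    MongoDB reads the command name from the first key, and Python dicts
    (3.7+) keep insertion order, so "createUser" is inserted first.
    """
    return {"createUser": user, "pwd": pwd, "roles": roles}


# 1. Administrator: manages users on every database; created in the "admin" database.
admin_cmd = create_user_command(
    "admin", "changeMe",  # placeholder credentials
    [{"role": "userAdminAnyDatabase", "db": "admin"}],
)

# 2. Application user: read/write access limited to one (hypothetical) database "myapp".
app_cmd = create_user_command(
    "appUser", "changeMeToo",  # placeholder credentials
    [{"role": "readWrite", "db": "myapp"}],
)

print(admin_cmd)
print(app_cmd)
```

With PyMongo, these documents could be sent with client.admin.command(admin_cmd) and client["myapp"].command(app_cmd); the mongo shell helper db.createUser() takes the same user, pwd and roles fields. Access control is only enforced once mongod is started with --auth (or with security.authorization: enabled in the configuration file).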

· 7 min read

A few days ago I posted a joke on Twitter.

So I decided to turn it from a simple picture into a real project. Let's look at the two phases of this so-called project:

  • Moving the data from Couchbase to MongoDB
  • Updating the application code to use MongoDB

Look at this screencast to see it in action:

· 6 min read


  • MongoDB & Sage organized an internal Hackathon
  • We used the new X3 Platform, based on MongoDB, Node.js and HTML, to add cool features to the ERP
  • This shows that “any” enterprise can (should) do it to:
    • look differently at software development
    • build strong team spirit
    • have fun!


Like many of you, I have participated in multiple hackathons where developers, designers and entrepreneurs work together to build applications in a few hours or days. As you probably know, more and more companies run such events internally: Facebook and Google, for example, but also ING (banking), AXA (insurance), and many more.

Last week, I participated in the first Sage Hackathon!

In case you do not know, Sage is a 30+-year-old ERP vendor. I have to say that I could not have imagined this coming from such a company… Let me tell you more about it.

· 5 min read

In this article we will see how to create a pub/sub application (messaging, chat, notification) built entirely on MongoDB, without any message broker such as RabbitMQ or JMS.

So, what needs to be done to achieve this?

  • an application "publishes" a message; in our case, we simply save a document into MongoDB
  • another application, or thread, subscribes to these events and receives messages automatically; in our case, this means that the application should automatically receive newly created documents from MongoDB

All this is possible with some very cool MongoDB features: capped collections and tailable cursors.
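To make these two features concrete without a running server, here is a small in-memory Python model of their semantics. It is not MongoDB code and the class name is hypothetical: a capped collection keeps a fixed number of documents in insertion order and drops the oldest ones, and a tailable cursor remembers its position and only returns documents inserted after it.

```python
from collections import deque
from itertools import count


class CappedCollection:
    """In-memory model (not MongoDB code) of a capped collection:
    fixed size, insertion order preserved, oldest documents evicted first."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)  # a full deque drops its oldest entry on append
        self._positions = count()            # increasing insertion position (the cursor's bookmark)

    def insert(self, doc):
        """'Publish': append a document at the end of the collection."""
        self._docs.append((next(self._positions), doc))

    def tail(self):
        """'Subscribe': return a poll function that behaves like a tailable cursor.
        Each call returns only the documents inserted since the previous call."""
        last_seen = -1

        def poll():
            nonlocal last_seen
            new = [(pos, doc) for pos, doc in self._docs if pos > last_seen]
            if new:
                last_seen = new[-1][0]  # remember where the cursor stopped
            return [doc for _, doc in new]

        return poll


messages = CappedCollection(max_docs=3)
cursor = messages.tail()
messages.insert({"msg": "hello"})
messages.insert({"msg": "world"})
print(cursor())  # [{'msg': 'hello'}, {'msg': 'world'}]
print(cursor())  # [] (nothing new yet)
messages.insert({"msg": "again"})
print(cursor())  # [{'msg': 'again'}]
```

In a real application the collection would be created with something like db.create_collection("messages", capped=True, size=100000) in PyMongo (size is in bytes) and read with find(cursor_type=CursorType.TAILABLE_AWAIT), which waits on the server for new documents instead of returning immediately. Unlike this model, a real tailable cursor that falls behind the oldest document still in the collection can be invalidated.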

· 7 min read

In the past 2 years, I have met many developers and architects working on “big data” projects. This sounds amazing, but quite often the reality is not that amazing.

TL;DR: You believe that you have a big data project?

  • Do not start with the installation of a Hadoop cluster -- the "how"
  • Start by talking to business people to understand their problem -- the "why"
  • Understand the data you must process
  • Look at the volume -- very often it is not "that" big
  • Then implement it, and take a simple approach, for example start with MongoDB + Apache Spark

'Big Data'

· 9 min read

This post is a quick and simple introduction to the geospatial features of MongoDB 2.6, using a simple dataset and queries.

Storing Geospatial Information

As you know, you can store any type of data in MongoDB, but if you want to run geospatial queries on it, you need to store coordinates and usually create an index on them. MongoDB supports three types of indexes for geospatial queries:

  • 2d index: uses simple coordinate pairs (longitude, latitude). As stated in the documentation, the 2d index is intended for legacy coordinate pairs used in MongoDB 2.2 and earlier. For this reason, I won't go into detail about it in this post. Just for the record, 2d indexes are used to query data stored as points on a two-dimensional plane.
  • 2dsphere index: supports queries on any geometry on an Earth-like sphere; the data can be stored as GeoJSON or as legacy coordinate pairs (longitude, latitude). For the rest of the article I will use this type of index, focusing on GeoJSON.
  • geoHaystack index: used to query very small areas. It is less used by applications today and I will not describe it in this post. So this article focuses on the 2dsphere index, with the GeoJSON format to store and query documents.

So what is GeoJSON?

You can look at the site; here is a very short explanation. GeoJSON is a format for encoding a variety of geographic data structures in JSON, and supports the following types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection.

The GeoJSON format is quite straightforward: simple geometries are based on two attributes, type and coordinates. Let's look at some examples.

The town where I spent my whole childhood, Pleneuf Val-André, France, has the following coordinates (from Wikipedia):

48° 35′ 30.12″ N, 2° 32′ 48.84″ W

This notation describes a point by its latitude and longitude in the WGS 84 system, written in degrees, minutes and seconds. It is not very easy to use in application code, which is why the exact same point can also be represented with the following latitude and longitude values:

48.5917, -2.5469

This one uses WGS 84 in decimal degrees. These are the coordinates used by most of the applications and APIs you work with as a developer (for example Google Maps/Earth).

GeoJSON and MongoDB use these decimal values, but the coordinates must be stored in longitude, latitude order, so this point looks like this in GeoJSON:

{ "type": "Point", "coordinates": [ -2.5469, 48.5917 ] }
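The conversion from degrees, minutes and seconds to decimal degrees is simple arithmetic: degrees + minutes/60 + seconds/3600, negated for south latitudes and west longitudes. Here is a short Python sketch (the function names are illustrative) that converts the coordinates above and builds the GeoJSON point, longitude first:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation
    return -value if hemisphere in ("S", "W") else value


def geojson_point(latitude, longitude):
    """GeoJSON positions are [longitude, latitude], not the other way around."""
    return {"type": "Point", "coordinates": [longitude, latitude]}


lat = round(dms_to_decimal(48, 35, 30.12, "N"), 4)
lon = round(dms_to_decimal(2, 32, 48.84, "W"), 4)
print(lat, lon)                  # 48.5917 -2.5469
print(geojson_point(lat, lon))   # {'type': 'Point', 'coordinates': [-2.5469, 48.5917]}
```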

This is a simple Point; now let's look at a line, for example a very nice walk on the beach:

{
  "type": "LineString",
  "coordinates": [
    [-2.551082, 48.5955632],
    [-2.551229, 48.594312],
    [-2.551550, 48.593312],
    [-2.552400, 48.592312],
    [-2.553677, 48.590898]
  ]
}

Using the same approach, you can create MultiPoint, MultiLineString, Polygon and MultiPolygon geometries. It is also possible to mix all of these in a single document using a GeometryCollection. The following example is a GeometryCollection of a Polygon and a MultiLineString over Central Park:

{
  "type" : "GeometryCollection",
  "geometries" : [
    {
      "type" : "Polygon",
      "coordinates" : [[
        [ -73.9580, 40.8003 ], [ -73.9498, 40.7968 ], [ -73.9737, 40.7648 ],
        [ -73.9814, 40.7681 ], [ -73.9580, 40.8003 ]
      ]]
    },
    {
      "type" : "MultiLineString",
      "coordinates" : [
        [ [ -73.96943, 40.78519 ], [ -73.96082, 40.78095 ] ],
        [ [ -73.96415, 40.79229 ], [ -73.95544, 40.78854 ] ],
        [ [ -73.97162, 40.78205 ], [ -73.96374, 40.77715 ] ],
        [ [ -73.97880, 40.77247 ], [ -73.97036, 40.76811 ] ]
      ]
    }
  ]
}
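Polygons are where hand-written GeoJSON most often goes wrong: per the GeoJSON specification, each linear ring needs at least four positions and its last position must repeat the first, as the Central Park polygon above does. Here is a small Python check written as an illustration (the function names are hypothetical; MongoDB applies its own validation when it indexes a geometry):

```python
def is_closed_ring(ring):
    """A GeoJSON linear ring: at least 4 positions, last position equal to the first."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def check_polygon(geometry):
    """True if geometry is a Polygon whose rings (outer ring, then any holes) are all closed."""
    return geometry["type"] == "Polygon" and all(
        is_closed_ring(ring) for ring in geometry["coordinates"]
    )


central_park = {
    "type": "Polygon",
    "coordinates": [[
        [-73.9580, 40.8003], [-73.9498, 40.7968], [-73.9737, 40.7648],
        [-73.9814, 40.7681], [-73.9580, 40.8003],
    ]],
}

print(check_polygon(central_park))                      # True
print(is_closed_ring([[0, 0], [1, 0], [1, 1], [0, 1]]))  # False: the ring is not closed
```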

Note: if you want, you can test and visualize these JSON documents using the service.

Now what? Let's store data!

Once you have a GeoJSON object, you just need to store it in your document. For example, to store a document about JFK Airport with its location, you can run the following command:

db.airports.insert({
  "name" : "John F Kennedy Intl",
  "type" : "International",
  "code" : "JFK",
  "loc" : {
    "type" : "Point",
    "coordinates" : [ -73.778889, 40.639722 ]
  }
});

Yes, it is that simple! You just save the GeoJSON as one of the attributes of the document (loc in this example).

Querying Geospatial Information

Now that the data is stored in MongoDB, we can use the geospatial information to run some interesting queries.

For this we need a sample dataset. I have created one using open data found in various places. This dataset contains the following information:

  • airports collection with the list of US airports (Point)
  • states collection with the list of US states (MultiPolygon)

I have created this dataset from various open data sources and used toGeoJSON to convert them into the proper format.

Let's install the dataset:

  1. Download it from here
  2. Unzip the file
  3. Restore the data into your MongoDB instance, using the following command

MongoDB allows applications to do the following types of query on geospatial data:

  • inclusion
  • intersection
  • proximity

Obviously, you can use all the other operators in addition to the geospatial ones. Let's now look at some concrete examples.


Find all the airports in California. For this you need to get the California location (MultiPolygon) and use the $geoWithin operator in the query. From the shell it looks like this:

use geo
var cal = db.states.findOne( {code : "CA"} );
db.airports.find(
  { loc : { $geoWithin : { $geometry : cal.loc } } },
  { name : 1 , type : 1, code : 1, _id: 0 }
);


{ "name" : "Modesto City - County", "type" : "", "code" : "MOD" }
...
{ "name" : "San Francisco Intl", "type" : "International", "code" : "SFO" }
{ "name" : "San Jose International", "type" : "International", "code" : "SJC" }
...

So the query uses the California MultiPolygon and searches the airports collection for all the airports that are inside these polygons. On a map, these are the airports that fall inside the outline of California.

You can use any other query features or criteria; for example, you can limit the query to international airports only, sorted by name:

db.airports.find({  loc : { $geoWithin : { $geometry : cal.loc } },  type : "International"},{ name : 1 , type : 1, code : 1, _id: 0 }).sort({ name : 1 });


{ "name" : "Los Angeles Intl", "type" : "International", "code" : "LAX" }
{ "name" : "Metropolitan Oakland Intl", "type" : "International", "code" : "OAK" }
{ "name" : "Ontario Intl", "type" : "International", "code" : "ONT" }
{ "name" : "San Diego Intl", "type" : "International", "code" : "SAN" }
{ "name" : "San Francisco Intl", "type" : "International", "code" : "SFO" }
{ "name" : "San Jose International", "type" : "International", "code" : "SJC" }
{ "name" : "Southern California International", "type" : "International", "code" : "VCV" }
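To picture what an inclusion query decides for each document, here is the classic ray-casting point-in-polygon test in Python. It is a planar approximation written for illustration, not how MongoDB works internally (the 2dsphere index computes on a sphere). It uses the ring of the Central Park polygon from the GeometryCollection example, the Central Park reservoir point used in the proximity query of this post, and JFK's location from the insert example.

```python
def point_in_ring(point, ring):
    """Planar ray casting: cast a horizontal ray to the east of `point` and
    count how many ring edges it crosses; an odd count means inside."""
    x, y = point  # GeoJSON order: (longitude, latitude)
    inside = False
    # The ring is closed (last position == first), so consecutive pairs cover every edge.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # this edge spans the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:  # the crossing is east of the point
                inside = not inside
    return inside


CENTRAL_PARK_RING = [
    [-73.9580, 40.8003], [-73.9498, 40.7968], [-73.9737, 40.7648],
    [-73.9814, 40.7681], [-73.9580, 40.8003],
]

print(point_in_ring((-73.965355, 40.782865), CENTRAL_PARK_RING))  # True: the reservoir
print(point_in_ring((-73.778889, 40.639722), CENTRAL_PARK_RING))  # False: JFK
```

For a MultiPolygon such as California, a point is included if it is inside one of the polygons (and not inside one of that polygon's holes).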

You may have noticed that we are querying these documents without any index. You can run the query with explain() to see what is going on. The $geoWithin operator does not need an index, but your queries will be more efficient with one, so let's create it:

db.airports.ensureIndex( { "loc" : "2dsphere" } );

Run explain() again and you will see the difference.


Suppose you want to know which states are adjacent to California. For this we just need to search for all the states whose geometry intersects California's. This is done with the following query:

var cal = db.states.findOne( {code : "CA"} );
db.states.find(
  {
    loc : { $geoIntersects : { $geometry : cal.loc } },
    code : { $ne : "CA" }
  },
  { name : 1, code : 1 , _id : 0 }
);


{ "name" : "Oregon", "code" : "OR" }
{ "name" : "Nevada", "code" : "NV" }
{ "name" : "Arizona", "code" : "AZ" }

As before, the $geoIntersects operator does not need an index to work, but it will be more efficient with the following index:

db.states.ensureIndex( { loc : "2dsphere" } );


The last feature that I want to highlight in this post is querying with proximity criteria. Let's find all the international airports located less than 20 km from the reservoir in Central Park, NYC. For this you will use the $near operator; with a GeoJSON point, $maxDistance is expressed in meters.

db.airports.find(
  {
    loc : {
      $near : {
        $geometry : { type : "Point", coordinates : [ -73.965355, 40.782865 ] },
        $maxDistance : 20000
      }
    },
    type : "International"
  },
  { name : 1, code : 1, _id : 0 }
);


{ "name" : "La Guardia", "code" : "LGA" }
{ "name" : "Newark Intl", "code" : "EWR" }

This query returns 2 airports, the closest being La Guardia, since the $near operator sorts the results by distance. It is also important to note that, unlike $geoWithin and $geoIntersects, the $near operator requires a geospatial index.
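To see why JFK is not in the result, you can compute the great-circle distance yourself. This Python sketch uses the haversine formula with a mean Earth radius of 6,371 km (an assumption; MongoDB's spherical model may use a slightly different radius), so treat the result as an approximation:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (assumption; MongoDB may use another value)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) positions."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


reservoir = (-73.965355, 40.782865)  # the $near point of the query above
jfk = (-73.778889, 40.639722)        # JFK location from the insert example

distance = haversine_m(*reservoir, *jfk)
print("%.1f km" % (distance / 1000), "within $maxDistance:", distance <= 20_000)
# roughly 22 km, so JFK is outside the 20 km limit
```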


In this first post about geospatial features you have learned:

  • the basics of GeoJSON
  • how to query documents with inclusion, intersection and proximity criteria.

You can now play more with this, for example by integrating it into an application that exposes the data in a UI, or by looking at how to use the geospatial operators in an aggregation pipeline.
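As a starting point for the aggregation idea, here is a sketch of the proximity query above rewritten as a pipeline whose first stage is $geoNear, built as Python dicts (for example for PyMongo's aggregate()). The helper name and the distance output field name are illustrative choices; $geoNear must be the first stage of the pipeline and requires a geospatial index on the collection.

```python
def geo_near_pipeline(lon, lat, max_meters, airport_type="International"):
    """Aggregation pipeline: airports of one type within max_meters of a point, nearest first."""
    return [
        {
            "$geoNear": {  # must be the first stage; needs a geospatial index
                "near": {"type": "Point", "coordinates": [lon, lat]},  # longitude first
                "distanceField": "distance",  # output field for the computed distance
                "maxDistance": max_meters,    # meters, because "near" is a GeoJSON point
                "query": {"type": airport_type},
                "spherical": True,
            }
        },
        {"$project": {"_id": 0, "name": 1, "code": 1, "distance": 1}},
    ]


pipeline = geo_near_pipeline(-73.965355, 40.782865, 20000)
print(pipeline)
```

With PyMongo this would run as list(db.airports.aggregate(pipeline)). Unlike a find() with $near, $geoNear adds the computed distance to each result document, which is handy for displaying it in a UI.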

· 6 min read

Wow! It has been a while since I posted something on my blog. I have been very busy moving to MongoDB and learning, learning, learning… Finally I can breathe a little and answer some questions.

Last week I helped my colleague Norberto deliver a MongoDB Essentials training in Paris. It was a very nice experience, and I look forward to delivering it on my own. I was happy to see that the audience was well balanced between developers and operations people, mostly DBAs.

What! I still need a DBA?

This is a good opportunity to correct a misconception: using MongoDB, or any other NoSQL datastore, does not mean that you do not need a DBA… As with any project, an administrator is not mandatory, but it is better to have one. So even when MongoDB is pushed by the development team, it is very important to understand how the database works and how to administer and monitor it.

If you are lucky enough to have real operations teams, with good system and database administrators, use them! They are very important for your application.

Most DBAs and system administrators have been maintaining systems in production for many years. They know how to keep your application up and running. Most of the time they have also experienced many “disasters” and recovered from them (I hope).

Who knows, you may encounter big issues with your application, and you will be happy to have them on your side at that moment.

"Great, but the DBA is slowing down my development!"

I sometimes hear this, and I had the same feeling in the past as a developer in a large organization. Is it true?

Today, developers and DBAs do not live in the same world:

  • Developers want to integrate new technologies as soon as possible, not only because it is fun and they can brag about it at meetups and conferences, but because these technologies, most of the time, make them more productive and offer a better service/experience to the consumer.
  • DBAs are here to keep the applications up and running! So every time they do not feel confident about a technology, they will push back. I think this is natural, and I would probably do the same in their position. Like all geeks, they would love to adopt new technologies, but they need to understand and trust them first.

System administrators and DBAs look at technology from a different angle than developers.

Based on this, it is important to involve the operations team as early as possible when the development team wants to integrate MongoDB or any new data store. Having the operations team in the loop early will ease the global adoption of MongoDB in the company.

Personally, and this will show my age, I have seen a big change in the way developers and DBAs are working together.

Back in the '90s, when most applications were built on a client/server architecture, developers and DBAs worked pretty well together, probably because they spoke the same language: SQL was everywhere. I had regular meetings wit

Then, from the mid-2000s, most applications moved to a web-based architecture, with for example Java middleware, and developers stopped working with DBAs. Probably because the data abstraction layer provided by the ORM exposed the database as a "commodity" service that is supposed to just work: "Hey Mr DBA, my application has been written with the best middleware technology on the market, so now deal with the performance and scalability! I am done!"

Yes it is a cliché, but I am sure that some of you will recognize that.

Nevertheless, whenever I can, I push developers to talk more to administrators and to look closely at their database!

A new era for operations and development teams

The fast adoption of MongoDB by developers is a great opportunity to fix what we broke 10 years ago in large information systems:

  • Let's talk again!

MongoDB was built first for developers. The document-oriented approach gives a lot of flexibility to quickly adapt to change. So any time your business users need a new feature, you can implement it, even if the change impacts the data structure. Your data model is now driven and controlled by the application, not the database engine.

However, applications still need to be available 24x7 and to perform well. These topics are managed, and shared, by administrators and developers! This has always been the case but, as I described earlier, it looks like some of us have forgotten it.

Schema design and change velocity are driven by the application, and therefore by the business and development teams, but all this impacts the database, for example:

  • How will storage grow?
  • Which indexes must be created to speed up my application?
  • How to organize my cluster to leverage the infrastructure properly:
    • Replica set organization (and related write concerns, managed by developers)
    • Sharding options
  • And the most important of them: backup/recovery strategies

So many things could be managed by the project team alone, but if you have an operations team with you, it is better to handle them together as a single team.

You, the developer, are convinced that MongoDB is the best database for your projects! Now it is time to work with the ops team and convince them too. You should of course explain why MongoDB is good for you as a developer, but you should also highlight all the benefits for operations, starting with built-in high availability with replica sets and easy scalability with sharding. MongoDB is also here to make the administrator's life easier! In the next paragraph I have shared a list of resources that are interesting for operations people.

Let me repeat it once more: try to involve the operations team as soon as possible, and use that as an opportunity to build, or rebuild, the relationship between developers and system administrators!


You can find many good resources on the site to help operations teams learn about this:

· 7 min read

If you have to deal with a large number of documents when querying a Couchbase cluster, it is important to use pagination to get rows page by page. You can find some information in the "Pagination" chapter of the documentation, but I want to go into more detail, with sample code, in this article.

For this example I will start by creating a simple view based on the beer-sample dataset; the view is used to find breweries by country:

function (doc, meta) {
  if (doc.type == "brewery" && doc.country) {
    emit(doc.country);
  }
}

This view lists all the breweries by country; the index looks like this:

Doc id                        Key             Value
yellowstone_valley_brewing    United States   null
yuengling_son_brewing         United States   null
zea_rotisserie_and_brewery    United States   null
fosters_tien_gang             Viet Nam        null
hue_brewery                   Viet Nam        null

So now you want to navigate in this index with a page size of 5 rows.

Using skip / limit Parameters

The simplest approach is to use the limit and skip parameters, for example:

Page 1 : ?limit=5&skip=0
Page 2 : ?limit=5&skip=5 ... Page x : ?limit=5&skip=(limit*(page-1))

You can obviously use any other parameters you need to do range or key queries (startkey/endkey, key, keys) and sort option (descending).

This is simple but not the most efficient way, since the query engine has to read all the rows that match the query, until the skip value is reached.

Here is some sample Python code that paginates using this view:
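A minimal sketch of that loop follows. To keep it self-contained, a hypothetical query_view() function stands in for the real view query (with the SDK used later in this post it would be cb.query("test", "by_country", limit=..., skip=...)), over the five index rows listed above with a page size of 2. It also counts how many index entries the engine has to walk through, to make the cost of skip visible.

```python
# The five index rows shown above, as (key, doc id), in index order.
ROWS = [
    ("United States", "yellowstone_valley_brewing"),
    ("United States", "yuengling_son_brewing"),
    ("United States", "zea_rotisserie_and_brewery"),
    ("Viet Nam", "fosters_tien_gang"),
    ("Viet Nam", "hue_brewery"),
]


def query_view(limit, skip=0):
    """Hypothetical stand-in for a view query with limit/skip.
    Returns (rows, entries_read): the engine walks over the skipped rows too."""
    rows = ROWS[skip:skip + limit]
    return rows, min(skip, len(ROWS)) + len(rows)


def paginate_skip_limit(page_size):
    pages, entries_read, page = [], 0, 1
    while True:
        # skip = limit * (page - 1)
        rows, read = query_view(limit=page_size, skip=page_size * (page - 1))
        entries_read += read
        if not rows:  # past the end of the index
            break
        pages.append(rows)
        page += 1
    return pages, entries_read


pages, cost = paginate_skip_limit(page_size=2)
for number, rows in enumerate(pages, start=1):
    print("-- Page %d --" % number, [doc_id for _, doc_id in rows])
print("index entries read:", cost)  # 16 for 5 rows
```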

This application loops on all the pages until the end of the index.

As I said before, this is not the best approach, since the system must read all the values until the skip is reached. The following approach is a better way to deal with this.

Using startkey / startkey_docid parameters

To make pagination more efficient, you can take another approach, which uses the startkey and startkey_docid parameters to select the proper documents.

  • The startkey parameter is the value of the key where the query should start reading (the last key of the "previous page").
  • Since a key, for example "Germany", may have one or more ids (documents), you need to tell the Couchbase query engine where to start. For this, use the startkey_docid parameter, and skip this id, since it is the last one of the previous page.

So if we look at the index and add a row number to explain the pagination:

Row num   Doc id                        Key             Value

Query for page 1
Query for page 2

Query for page 3
...       yellowstone_valley_brewing    United States   null
...       yuengling_son_brewing         United States   null
...       zea_rotisserie_and_brewery    United States   null
...       fosters_tien_gang             Viet Nam        null
...       hue_brewery                   Viet Nam        null

So, as you can see in the examples above, each query uses the key of the last row of the previous page as startkey and its document id as startkey_docid, and passes skip=1 to skip that row.

Let's now look at the application code, once again in Python:

from couchbase import Couchbase

cb = Couchbase.connect(bucket='beer-sample')

hasRow = True
rowPerPage = 5
page = 0
currentStartkey = ""
startDocId = ""

while hasRow:
    hasRow = False
    # first page: start at the beginning; next pages: skip the last row of the previous page
    skip = 0 if page == 0 else 1
    page = page + 1
    print("-- Page %s --" % page)
    rows = cb.query("test", "by_country", limit=rowPerPage, skip=skip,
                    startkey=currentStartkey, startkey_docid=startDocId)
    for row in rows:
        hasRow = True
        print("Country: \"%s\" \t Id: '%s'" % (row.key, row.docid))
        # remember the last key and doc id read: the next page starts there
        currentStartkey = row.key
        startDocId = row.docid
    print(" -- -- -- -- \n")

This application loops over all the pages until the end of the index.

With this approach, the application starts reading the index at a specific key (the startkey parameter) and only loops over the necessary entries in the index. This is more efficient than the simple skip approach.
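The difference can be shown with a small in-memory model. In this Python sketch, a hypothetical query_view() seeks directly to (startkey, startkey_docid), as a B-tree index lookup would, and only counts the entries it skips or returns. It uses the five index rows listed earlier and a page size of 2.

```python
from bisect import bisect_left

# The five index rows shown earlier, as (key, doc id), in index order.
ROWS = [
    ("United States", "yellowstone_valley_brewing"),
    ("United States", "yuengling_son_brewing"),
    ("United States", "zea_rotisserie_and_brewery"),
    ("Viet Nam", "fosters_tien_gang"),
    ("Viet Nam", "hue_brewery"),
]


def query_view(limit, skip=0, startkey=None, startkey_docid=""):
    """Hypothetical stand-in for a view query. Seeks to the first row
    >= (startkey, startkey_docid), then skips and returns rows.
    Returns (rows, entries_read); the seek itself is not counted."""
    start = bisect_left(ROWS, (startkey, startkey_docid)) if startkey is not None else 0
    begin = start + skip
    rows = ROWS[begin:begin + limit]
    return rows, min(skip, len(ROWS) - start) + len(rows)


def paginate_keyset(page_size):
    pages, entries_read = [], 0
    startkey, startkey_docid = None, ""
    while True:
        # First page: start at the beginning. Next pages: start at the last row
        # of the previous page and skip it (skip=1), as in the Python loop above.
        skip = 0 if startkey is None else 1
        rows, read = query_view(page_size, skip, startkey, startkey_docid)
        entries_read += read
        if not rows:
            break
        pages.append(rows)
        startkey, startkey_docid = rows[-1]  # (key, doc id) of the last row read
    return pages, entries_read


pages, cost = paginate_keyset(page_size=2)
for number, rows in enumerate(pages, start=1):
    print("-- Page %d --" % number, [doc_id for _, doc_id in rows])
print("index entries read:", cost)  # 8
```

With more pages the gap grows: a skip-based query walks over every earlier page again, while a startkey/startkey_docid query reads roughly one page plus one row.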

Views with Reduce function

When your view uses a reduce function and you want to paginate over the reduced keys only, you need to use the skip and limit parameters.

When you use the startkey_docid parameter with a reduce function, the reduce is calculated only over the subset of document ids that are part of your query.

Couchbase Java SDK Paginator

In the previous examples, I have shown how to paginate using the various query parameters. The Java SDK provides a Paginator object to help developers deal with pagination. The following example uses the same view with the Paginator API.

package com.couchbase.devday;

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.*;
import java.net.URI;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JavaPaginatorSample {

    public static void main(String[] args) {

        configure();
        System.out.println("--------------------------------------------------------------------------");
        System.out.println("\tCouchbase - Paginator");
        System.out.println("--------------------------------------------------------------------------");

        List<URI> uris = new LinkedList<URI>();
        uris.add(URI.create(""));

        CouchbaseClient cb = null;
        try {
            cb = new CouchbaseClient(uris, "beer-sample", "");
            System.out.println("--------------------------------------------------------------------------");
            System.out.println("Breweries (by_name) with docs & JSON parsing");
            View view = cb.getView("test", "by_country");
            Query query = new Query();
            int docsPerPage = 5;

            // Create the Paginator from the view, the query and the page size
            Paginator paginatedQuery = cb.paginatedQuery(view, query, docsPerPage);
            int pageCount = 0;
            while (paginatedQuery.hasNext()) {
                pageCount++;
                System.out.println(" -- Page " + pageCount + " -- ");
                ViewResponse response = paginatedQuery.next();
                for (ViewRow row : response) {
                    System.out.println(row.getKey() + " : " + row.getId());
                }
                System.out.println(" -- -- -- ");
            }

            System.out.println("\n\n");
            cb.shutdown(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            System.err.println("Error connecting to Couchbase: " + e.getMessage());
        }
    }

    // Silence the client's console logging so only the pages are printed
    private static void configure() {
        for (Handler h : Logger.getLogger("com.couchbase.client").getParent().getHandlers()) {
            if (h instanceof ConsoleHandler) {
                h.setLevel(Level.OFF);
            }
        }
        Properties systemProperties = System.getProperties();
        systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.SunLogger");
        System.setProperties(systemProperties);

        Logger logger = Logger.getLogger("com.couchbase.client");
        logger.setLevel(Level.OFF);
        for (Handler h : logger.getParent().getHandlers()) {
            if (h instanceof ConsoleHandler) {
                h.setLevel(Level.OFF);
            }
        }
    }
}

As you can see, you can easily paginate over the results of a query using the Java Paginator:

  • The Paginator is created with cb.paginatedQuery(view, query, docsPerPage), from the view and query objects and a page size.
  • Then you just need to use the hasNext() and next() methods to navigate in the results.

The Java Paginator knows whether the query uses a reduce function, so you can use it with all types of queries: internally, it switches between the skip/limit approach and the startkey_docid approach. You can see how this is done in the Paginator class.

Note that if you want to do this in a web application across HTTP requests, you must keep the Paginator object in the user session, since the current API keeps the current page in its state.


In this blog post you have learned how to deal with pagination in Couchbase views. To summarize:

  • The pagination is based on some specific parameters that you send when executing a query.
  • Java developers can use the Paginator class that simplifies pagination.

I invite you to look at N1QL, the new Couchbase query language, still under development, which will provide more options to developers, including pagination with LIMIT and OFFSET, for example:

SELECT fname, age
FROM tutorial
WHERE age > 30
LIMIT 2
OFFSET 2

If you want to learn more about N1QL:

· 7 min read


Developers often ask me how to "version" documents with Couchbase 2.0. The short answer is: the clients and the server do not expose such a feature, but it is quite easy to implement.

In this article I will use a basic approach, which you will be able to extend depending on your business requirements.