Tug’s Blog

My journey in Big Data, Hadoop, NoSQL and MapR

Introduction to MongoDB Geospatial Feature


This post is a quick and simple introduction to the geospatial features of MongoDB 2.6, using a simple dataset and queries.

Storing Geospatial Information

As you know, you can store any type of data in MongoDB, but if you want to query documents by location you need to store coordinates and create an index on them. MongoDB supports three types of indexes for geospatial queries:

  • 2d Index: uses simple coordinate pairs (longitude, latitude). As stated in the documentation: "The 2d index is intended for legacy coordinate pairs used in MongoDB 2.2 and earlier." For this reason I won't detail it in this post; just for the record, 2d indexes are used to query data stored as points on a two-dimensional plane.
  • 2dsphere Index: supports queries of any geometry on an earth-like sphere; the data can be stored as GeoJSON or legacy coordinate pairs (longitude, latitude). For the rest of the article I will use this type of index, focusing on GeoJSON.
  • Geo Haystack: used to query very small areas. It is rarely used by applications today, and I will not describe it in this post.

So this article focuses on the 2dsphere index, with the GeoJSON format to store and query documents.

So what is GeoJSON?

You can look at the http://geojson.org/ site for the full specification; here is a very short explanation. GeoJSON is a format for encoding a variety of geographic data structures in JSON. It supports the following types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon and GeometryCollection.

The GeoJSON format is quite straightforward: for the simple geometries it is based on two attributes, type and coordinates. Let's take some examples:

The city where I spent all my childhood, Pleneuf Val-André, France, has the following coordinates (from Wikipedia):

48° 35′ 30.12″ N, 2° 32′ 48.84″ W

This notation is a point given as latitude & longitude in the WGS 84 (degrees, minutes, seconds) system. It is not very easy to use in application code, which is why the exact same point can also be represented with the following latitude & longitude values:

48.5917, -2.5469

This one uses the WGS 84 (decimal degrees) system. These are the coordinates you see used in most of the applications/APIs you work with as a developer (Google Maps/Earth, for example).

By default GeoJSON and MongoDB use these decimal values, but the coordinates must be stored in longitude, latitude order, so this point looks like the following in GeoJSON:

{
  "type": "Point",
  "coordinates": [
    -2.5469,
    48.5917
  ]
}

This is a simple "Point"; let's now look at a line, for example a very nice walk on the beach:

{
  "type": "LineString",
  "coordinates": [
    [-2.551082, 48.5955632],
    [-2.551229, 48.594312],
    [-2.551550, 48.593312],
    [-2.552400, 48.592312],
    [-2.553677, 48.590898]
  ]
}

Using the same approach you can create MultiPoint, MultiLineString, Polygon and MultiPolygon geometries. It is also possible to mix all of these in a single document using a GeometryCollection. The following example is a GeometryCollection of a MultiLineString and a Polygon over Central Park:

{
  "type" : "GeometryCollection",
  "geometries" : [
    {
      "type" : "Polygon",
      "coordinates" : [
        [
          [ -73.9580, 40.8003 ],
          [ -73.9498, 40.7968 ],
          [ -73.9737, 40.7648 ],
          [ -73.9814, 40.7681 ],
          [ -73.9580, 40.8003 ]
        ]
      ]
    },
    {
      "type" : "MultiLineString",
      "coordinates" : [
        [ [ -73.96943, 40.78519 ], [ -73.96082, 40.78095 ] ],
        [ [ -73.96415, 40.79229 ], [ -73.95544, 40.78854 ] ],
        [ [ -73.97162, 40.78205 ], [ -73.96374, 40.77715 ] ],
        [ [ -73.97880, 40.77247 ], [ -73.97036, 40.76811 ] ]
      ]
    }
  ]
}

Note: you can test and visualize these GeoJSON documents using the http://geojsonlint.com/ service.

Now what? Let’s store data!

Once you have a GeoJSON document you just need to store it as an attribute of your MongoDB document. For example, if you want to store a document about JFK airport with its location, you can run the following command:

db.airports.insert(
  {
    "name" : "John F Kennedy Intl",
    "type" : "International",
    "code" : "JFK",
    "loc" : {
      "type" : "Point",
      "coordinates" : [ -73.778889, 40.639722 ]
    }
  }
);

Yes, it is that simple! You just save the GeoJSON as one of the attributes of the document (loc in this example).

Querying Geospatial Information

Now that the data is stored in MongoDB, it is possible to use the geospatial information in some interesting queries.

For this we need a sample dataset. I have created one using some open data found in various places. This dataset contains the following collections:

  • airports collection with the list of US airports (Point)
  • states collection with the list of US states (MultiPolygon)

I have created this dataset from various open data sources (http://geocommons.com/, http://catalog.data.gov/dataset) and used toGeoJSON to convert them into the proper format.

Let’s install the dataset:

  1. Download it from here
  2. Unzip the geo.zip file
  3. Restore the data into your MongoDB instance, using the following command:
mongorestore geo.zip

MongoDB allows applications to do the following types of query on geospatial data:

  • inclusion
  • intersection
  • proximity

Obviously, you can use all the other query operators in addition to the geospatial ones. Let's now look at some concrete examples.

Inclusion

Find all the airports in California. For this you need to get the California location (MultiPolygon) and use the $geoWithin operator in the query. From the shell it looks like:

use geo
var cal = db.states.findOne(  {code : "CA"}  );

db.airports.find(
{
  loc : { $geoWithin : { $geometry : cal.loc } }
},
{ name : 1 , type : 1, code : 1, _id: 0 }
);

Result:

{ "name" : "Modesto City - County", "type" : "", "code" : "MOD" }
...
{ "name" : "San Francisco Intl", "type" : "International", "code" : "SFO" }
{ "name" : "San Jose International", "type" : "International", "code" : "SJC" }
...

So the query uses the "California MultiPolygon" and looks in the airports collection to find all the airports that fall inside these polygons. On a map it looks like the following image:

You can use any other query features or criteria; for example, you can limit the query to international airports only, sorted by name:

db.airports.find(
{
  loc : { $geoWithin : { $geometry : cal.loc } },
  type : "International"
},
{ name : 1 , type : 1, code : 1, _id: 0 }
).sort({ name : 1 });

Result:

{ "name" : "Los Angeles Intl", "type" : "International", "code" : "LAX" }
{ "name" : "Metropolitan Oakland Intl", "type" : "International", "code" : "OAK" }
{ "name" : "Ontario Intl", "type" : "International", "code" : "ONT" }
{ "name" : "San Diego Intl", "type" : "International", "code" : "SAN" }
{ "name" : "San Francisco Intl", "type" : "International", "code" : "SFO" }
{ "name" : "San Jose International", "type" : "International", "code" : "SJC" }
{ "name" : "Southern California International", "type" : "International", "code" : "VCV" }

I do not know if you have looked in detail, but we are querying these documents with no index. You can run the query with explain() to see what's going on. The $geoWithin operator does not need an index, but your queries will be more efficient with one, so let's create it:

db.airports.ensureIndex( { "loc" : "2dsphere" } );

Run the explain again and you will see the difference.
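
For example (a sketch; the exact explain() output fields vary between MongoDB versions):

var cal = db.states.findOne( { code : "CA" } );
db.airports.find(
  { loc : { $geoWithin : { $geometry : cal.loc } } }
).explain();
// compare the cursor type and the number of scanned documents
// before and after creating the 2dsphere index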

Intersection

Suppose you want to know which states are adjacent to California. For this we just need to search for all the states that have coordinates that "intersect" with California. This is done with the following query:

var cal = db.states.findOne(  {code : "CA"}  );
db.states.find(
{
  loc : { $geoIntersects : { $geometry : cal.loc  }  } ,
  code : { $ne : "CA"  }
},
{ name : 1, code : 1 , _id : 0 }
);

Result:

{ "name" : "Oregon", "code" : "OR" }
{ "name" : "Nevada", "code" : "NV" }
{ "name" : "Arizona", "code" : "AZ" }

As before, the $geoIntersects operator does not need an index to work, but it will be more efficient with the following index:

db.states.ensureIndex( { loc : "2dsphere" } );

Proximity

The last feature that I want to highlight in this post is querying with proximity criteria. Let's find all the international airports that are located less than 20 km from the Reservoir in New York's Central Park. For this you use the $near operator.

db.airports.find(
{
  loc : {
    $near : {
      $geometry : {
        type : "Point" ,
        coordinates : [-73.965355,40.782865]
      },
      $maxDistance : 20000
    }
  },
  type : "International"
},
{
  name : 1,
  code : 1,
  _id : 0
}
);

Results:

{ "name" : "La Guardia", "code" : "LGA" }
{ "name" : "Newark Intl", "code" : "EWR"}

So this query returns 2 airports, the closest being La Guardia, since the $near operator sorts the results by distance. It is also important to note that the $near operator requires an index.

Conclusion

In this first post about the geospatial features of MongoDB you have learned:

  • the basics of GeoJSON
  • how to query documents with inclusion, intersection and proximity criteria.

You can now play with this further, for example by integrating it into an application that exposes the data in a UI, or by looking at how to use the geospatial operators in an aggregation pipeline.
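
For example, here is a minimal sketch of the proximity query above rewritten as an aggregation pipeline with the $geoNear stage ($geoNear must be the first stage of the pipeline and, like $near, requires the geospatial index):

db.airports.aggregate([
  {
    $geoNear : {
      near : { type : "Point", coordinates : [ -73.965355, 40.782865 ] },
      distanceField : "distance",   // distance in meters, added to each document
      maxDistance : 20000,
      query : { type : "International" },
      spherical : true
    }
  },
  { $project : { _id : 0, name : 1, code : 1, distance : 1 } }
]);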

db.person.find( { 'Role' : 'DBA' } )


Wow! It has been a while since I posted something on my blog. I have been very busy, moving to MongoDB, learning, learning, learning… finally I can breathe a little and answer some questions.

Last week I helped my colleague Norberto deliver a MongoDB Essentials Training in Paris. This was a very nice experience, and I am impatient to deliver it on my own. I was happy to see that the audience was well balanced between developers and operations, mostly DBAs.

What! I still need a DBA?

This is a good opportunity to raise a point, or correct a wrong idea: the fact that you are using MongoDB, or any other NoSQL datastore, does not mean that you do not need a DBA… As on any project, an administrator is not mandatory, but if you have one it is better. So even when MongoDB is pushed by the development team, it is very important to understand how the database works, and how to administer and monitor it.

If you are lucky enough to have real operations teams, with good system and database administrators use them! They are very important for your application.

Most DBAs/system administrators have been maintaining systems in production for many years. They know how to keep your application up and running. Most of them have also experienced many "disasters" and recovered from them (I hope).

Who knows, you may encounter big issues with your application and you will be happy to have them on your side at this moment.

“Great, but the DBA is slowing down my development!”

I hear this sometimes, and I had this feeling in the past too, as a developer in a large organization. Is it true?

Developers and DBAs are today not living in the same world:

  • Developers want to integrate new technologies as soon as possible, not only because it is fun and they can brag about it during meetups/conferences, but because these technologies, most of the time, make them more productive and offer a better service/experience to the consumer.
  • DBAs are here to keep the applications up and running! So every time they do not feel confident about a technology they will push back. I think this is natural, and I would probably do the same in their position. Like all geeks, they would love to adopt new technologies, but they need to understand and trust them first.

System administrators and DBAs look at technology from a different angle than developers.

Based on this assumption, it is important to bring the operation team as early as possible when the development team wants to integrate MongoDB or any new data store. Having the operation team in the loop early will ease the global adoption of MongoDB in the company.

Personally, and this will show my age, I have seen a big change in the way developers and DBAs are working together.

Back in the 90's, when the dominant architecture was client/server, developers and DBAs worked pretty well together, probably because they were speaking the same language: SQL was everywhere. I had regular meetings with them.

Then, since the mid-2000s, most applications have moved to a web-based architecture, with, for example, Java middleware, and developers stopped working with DBAs. Probably because the data abstraction layer provided by the ORM exposed the database as a "commodity" service that is supposed to just work: "Hey Mr DBA, my application has been written with the best middleware technology on the market, so now deal with the performance and scalability! I am done!"

Yes, it is a cliché, but I am sure that some of you will recognize it.

Nevertheless, every time I can, I push developers to talk more to administrators and to look closely at their database!

A new era for operations and development teams

The fast adoption of MongoDB by developers is a great opportunity to fix what we broke 10 years ago in large information systems:

  • Let’s talk again!

MongoDB has been built first for developers. The document-oriented approach gives a lot of flexibility to adapt quickly to change. So any time your business users need a new feature you can implement it, even if the change impacts the data structure. Your data model is now driven and controlled by the application, not the database engine.

However, the applications still need to be available 24x7 and perform well. These topics are managed - and shared - by administrators and developers! This has always been the case, but as I described earlier, it looks like some of us have forgotten it.

Schema design and change velocity are driven by the application, and therefore by the business and development teams, but all of this impacts the database, for example:

  • How will storage grow?
  • Which indexes must be created to speed up my application?
  • How to organize my cluster to leverage the infrastructure properly:
    • Replica set organization (and the related write concerns, managed by the developer)
    • Sharding options
  • And the most important of them all: backup/recovery strategies

All these things could be managed by the project team alone, but if you have an operations team with you, it is better to handle them as a single team.

You, the developer, are convinced that MongoDB is the best database for your projects! Now it is time to work with the ops team and convince them too. So you should certainly explain why MongoDB is good for you as a developer, but you should also highlight all the benefits for operations, starting with built-in high availability with replica sets, and easy scalability with sharding. MongoDB is also here to make the administrator's life easier! I have shared in the next paragraph a list of resources that are interesting for operations people.

Let me repeat it one more time: try to involve the operations team as soon as possible, and use this as an opportunity to build/rebuild the relationship between developers and system administrators!

Resources

You can find many good resources on the site to help operations teams or to learn more about this:

Pagination With Couchbase


If you have to deal with a large number of documents when querying a Couchbase cluster, it is important to use pagination to fetch rows page by page. You can find some information in the documentation, in the "Pagination" chapter, but I want to go into more detail, with sample code, in this article.

For this example I will start by creating a simple view based on the beer-sample dataset; the view is used to find breweries by country:

function (doc, meta) {
  if (doc.type == "brewery" && doc.country){
    emit(doc.country);
  }
}

This view lists all the breweries by country; the index looks like:

Doc id                              | Key           | Value
bersaglier                          | Argentina     | null
cervecera_jerome                    | Argentina     | null
brouwerij_nacional_balashi          | Aruba         | null
australian_brewing_corporation      | Australia     | null
carlton_and_united_breweries        | Australia     | null
coopers_brewery                     | Australia     | null
foster_s_australia_ltd              | Australia     | null
gold_coast_brewery                  | Australia     | null
lion_nathan_australia_hunter_street | Australia     | null
little_creatures_brewery            | Australia     | null
malt_shovel_brewery                 | Australia     | null
matilda_bay_brewing                 | Australia     | null
yellowstone_valley_brewing          | United States | null
yuengling_son_brewing               | United States | null
zea_rotisserie_and_brewery          | United States | null
fosters_tien_gang                   | Viet Nam      | null
hue_brewery                         | Viet Nam      | null

So now you want to navigate this index with a page size of 5 rows.
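
The general idea, shown here as raw view queries (a sketch using the view REST API; the design document name brewery and view name by_country are assumptions, and keys must be URL-encoded JSON):

# first page: just limit the number of rows
brewery/by_country?limit=5

# second page: restart at the last key and document id returned by the
# previous page, and skip that row itself
brewery/by_country?limit=5&startkey="Australia"&startkey_docid=carlton_and_united_breweries&skip=1

With the index above, the first page ends at carlton_and_united_breweries (key "Australia"), so the second query returns the next 5 rows, starting at coopers_brewery. Using startkey/startkey_docid instead of a large skip value matters: skipping many rows forces the server to walk over them, while restarting at a known key is cheap.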

How to Implement Document Versioning With Couchbase


Introduction

Developers often ask me how to "version" documents with Couchbase 2.0. The short answer is: the clients and the server do not expose such a feature, but it is quite easy to implement.

In this article I will use a basic approach, and you will be able to extend it depending on your business requirements.
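
A basic pattern is to keep the current document under its main key and copy each version under a versioned key. Here is a minimal sketch in JavaScript; get, set and incr are placeholders for whatever key-value operations your Couchbase client exposes, not a specific SDK API:

// save a document and keep its history (hypothetical client API)
function saveWithVersion(client, key, doc) {
  // 1. increment a version counter stored under "<key>::version"
  var version = client.incr(key + "::version");
  // 2. store an immutable copy of this version under "<key>::v<n>"
  client.set(key + "::v" + version, doc);
  // 3. overwrite the "current" document, always available under the main key
  client.set(key, doc);
}

Reads of the current document are unchanged, and any older version can be fetched directly by its key.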

Deploy Your Node/Couchbase Application to the Cloud With Clever Cloud


Introduction

Clever Cloud is the first PaaS to provide Couchbase as a service, allowing developers to run applications in a fully managed environment. This article shows how to deploy an existing application to Clever Cloud.

I am using a very simple Node application that I documented in a previous article: "Easy application development with Couchbase, Angular and Node".

Clever Cloud provides support for various databases, MySQL and PostgreSQL, but also, and this is most important for me, Couchbase. Not only does Clever Cloud let you use database services, it also lets you deploy and host your application, which can be developed in the language/technology of your choice: Java, Node, Scala, Python, PHP, … and all this in a secure, scalable and managed environment.

SQL to NoSQL : Copy Your Data From MySQL to Couchbase


TL;DR: Look at the project on Github.

Introduction

During my last interactions with the Couchbase community, I kept getting the question: how can I easily import my data from my current database into Couchbase? And my answer was always the same:

  • Use an ETL such as Talend to do it
  • Just write a small program to copy the data from your RDBMS to Couchbase…

So I have written this small program that allows you to import the content of an RDBMS into Couchbase. This tool can be used as-is, or you can look at the code and adapt it to your application.

The Tool: Couchbase SQL Importer

The Couchbase SQL Importer, available here, allows you to copy all of, or part of, your SQL schema into Couchbase with a simple command line. Before explaining how to run this command, let's see how the data is stored in Couchbase when imported:

  • Each table row is imported as a single JSON document
    • where each table column becomes a JSON attribute
  • Each document has a key made of the name of the table and a counter (increment)

The following concrete example, based on the MySQL World sample database, will help you understand how it works. This database contains 3 tables: City, Country, CountryLanguage. The City table looks like:

+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   |     |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+

The JSON document that matches a row of this table looks like the following:

city:3805
{
  "Name": "San Francisco",
  "District": "California",
  "ID": 3805,
  "Population": 776733,
  "CountryCode": "USA"
}

You can see that I am simply taking all the rows and "moving" them into Couchbase. This is a good first step for playing with your dataset in Couchbase, but it is probably not the final model you want for your application; most of the time you will have to decide when to use embedded documents, lists of values, etc. in your JSON documents.

In addition to the JSON documents, the tool creates views based on the following logic:

  • a view that lists all imported documents, with the name of the "table" (aka type) as key
  • a view for each table, with the primary key columns as index key

View: all/by_type

{
  "rows": [
  {"key": "city", "value": 4079},
  {"key": "country", "value": 239},
  {"key": "countrylanguage", "value": 984}
  ]
}

As you can see, this view allows you to get the number of documents by type with a single Couchbase query.

Also, for each table/document type, a view is created where the key of the index is built from the table's primary key. Let's for example query the "City" documents.
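
The map function generated for such a view is probably as simple as the following (a sketch based on the index content shown below; the generated code may differ):

function (doc, meta) {
  // index only the documents imported from the City table
  if (doc.type == "city") {
    // the index key is the value of the primary key column
    emit(doc.ID);
  }
}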

View: city/by_pk?reduce=false&limit=5

{
  "total_rows": 4079,
  "rows": [
  {"id": "city:1", "key": 1, "value": null},
  {"id": "city:2", "key": 2, "value": null},
  {"id": "city:3", "key": 3, "value": null},
  {"id": "city:4", "key": 4, "value": null},
  {"id": "city:5", "key": 5, "value": null}
  ]
}

The index key matches the value of the City.ID column. When the primary key is made of multiple columns the key looks like:

View: CountryLanguage/by_pk?reduce=false&limit=5

{
  "total_rows": 984,
  "rows": [
  {"id": "countrylanguage:1", "key": ["ABW", "Dutch"], "value": null},
  {"id": "countrylanguage:2", "key": ["ABW", "English"], "value": null},
  {"id": "countrylanguage:3", "key": ["ABW", "Papiamento"], "value": null},
  {"id": "countrylanguage:4", "key": ["ABW", "Spanish"], "value": null},
  {"id": "countrylanguage:5", "key": ["AFG", "Balochi"], "value": null}
  ]
}

This view is built from the CountryLanguage table primary key, made of the CountryLanguage.CountryCode and CountryLanguage.Language columns:

+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3)       | NO   | PRI |         |       |
| Language    | char(30)      | NO   | PRI |         |       |
| IsOfficial  | enum('T','F') | NO   |     | F       |       |
| Percentage  | float(4,1)    | NO   |     | 0.0     |       |
+-------------+---------------+------+-----+---------+-------+

How to use the Couchbase SQL Importer tool?

The importer is a simple Java-based command line utility, and it is easy to use:

  1. Download the CouchbaseSqlImporter.jar file from here. This file contains all the dependencies needed to work with Couchbase: the Couchbase Java Client and GSON.
  2. Download the JDBC driver for the database you are using as the data source. For this example I am using MySQL, and I downloaded the driver from the MySQL site.
  3. Configure the import using a properties file.

## SQL Information ##
sql.connection=jdbc:mysql://192.168.99.19:3306/world
sql.username=root
sql.password=password

## Couchbase Information ##
cb.uris=http://localhost:8091/pools
cb.bucket=default
cb.password=

## Import information
import.tables=ALL
import.createViews=true
import.typefield=type
import.fieldcase=lower

This sample properties file contains three sections:

  • The first two sections configure the connections to your SQL database and to your Couchbase cluster (note that the bucket must be created first)
  • The third section allows you to configure the import itself

  4. Run the tool!

java -cp "./CouchbaseSqlImporter.jar:./mysql-connector-java-5.1.25-bin.jar" com.couchbase.util.SqlImporter import.properties

So you run the Java command with the proper classpath (-cp parameter).

And you are done: your data has been copied from your SQL database into Couchbase.

If you are interested in seeing how it works internally, take a look at the next paragraph.

The Code: How Does It Work?

The main class of the tool, com.couchbase.util.SqlImporter, is really simple; the process is:

  1. Connect to the SQL database
  2. Connect to Couchbase
  3. Get the list of tables
  4. For each table, execute a "select * from table", then:
    4.1. Analyze the ResultSetMetaData to get the list of columns
    4.2. Create a Java map for each row, where the key is the name of the column and the value… is the value
    4.3. Serialize this map into a JSON document using GSON and save it into Couchbase

The code is available in the ImportTable(String table) Java method.

One interesting point is that you can reuse and extend the code to adapt it to your application.

Conclusion

I created this tool quickly to help some people in the community. If you are using it and need new features, let me know via a comment or a pull request.

Create a Couchbase Cluster in Less Than a Minute With Ansible


TL;DR: Look at the Couchbase Ansible Playbook on my Github.

Introduction

When I was looking for a more effective way to create my clusters, I asked some sysadmins which tool I should use. The answer I got during OSDC was not Puppet, nor Chef, but Ansible.

This article shows you how you can easily configure and create a Couchbase cluster deployed on many Linux boxes… and the only thing you need on these boxes is an SSH server!

Thanks to Jan-Piet Mens, who was one of the people who convinced me to use Ansible and who answered the questions I had about it.

You can watch the demonstration below, and/or look at all the details in the next paragraph.

Ansible

Ansible is open-source software that allows administrators to configure and manage many computers over SSH.

I won't go into all the details about the installation; just follow the steps documented in the Getting Started Guide. As you can see from this guide, you just need Python, a few other libraries, and a clone of the Ansible project from Github. So I am expecting that you have Ansible working with the various servers on which you want to deploy Couchbase.

Also, for these first scripts I am using root on my servers to do all the operations. So be sure you have registered the root SSH keys on your administration server, from which you run the Ansible scripts.

Create a Couchbase Cluster

Before going into the details of the Ansible script, it is interesting to explain how you create a Couchbase cluster manually. Here are the 5 steps to create and configure a cluster (see the command sketch after this list):

  1. Install Couchbase on each node of the cluster, as documented here.
  2. Take one of the nodes and "initialize" the cluster, using the cluster-init command.
  3. Add the other nodes to the cluster, using the server-add command.
  4. Rebalance, using the rebalance command.
  5. Create a bucket, using the bucket-create command.
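
Run by hand, steps 2 to 5 map to couchbase-cli calls roughly like these (a sketch: host names and values are examples taken from the configuration used later in this post, and the exact flags may vary between Couchbase versions):

# 2. initialize the cluster on the first node
/opt/couchbase/bin/couchbase-cli cluster-init -c vm1.grallandco.com:8091 --cluster-init-username=Administrator --cluster-init-password=password --cluster-init-ramsize=1024

# 3. add each of the other nodes
/opt/couchbase/bin/couchbase-cli server-add -c vm1.grallandco.com:8091 -u Administrator -p password --server-add=vm2.grallandco.com:8091

# 4. rebalance the cluster
/opt/couchbase/bin/couchbase-cli rebalance -c vm1.grallandco.com:8091 -u Administrator -p password

# 5. create a bucket
/opt/couchbase/bin/couchbase-cli bucket-create -c vm1.grallandco.com:8091 -u Administrator -p password --bucket=ansible --bucket-type=couchbase --bucket-ramsize=512 --bucket-replica=2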

So the goal now is to create an Ansible Playbook that does these steps for you.

Ansible Playbook for Couchbase

The first thing you need is the list of hosts you want to target, so I have created a hosts file that contains all my servers organized in 2 groups:

[couchbase-main]
vm1.grallandco.com

[couchbase-nodes]
vm2.grallandco.com
vm3.grallandco.com

The [couchbase-main] group contains just one of the nodes; it will drive the installation and configuration. As you probably already know, Couchbase does not have any master: all nodes in the cluster are identical.

To ease the configuration of the cluster, I have created another file that contains all the parameters that must be passed to the various commands. This file is located at group_vars/all; see the section Splitting Out Host and Group Specific Data in the documentation.

# Administrator user and password
admin_user: Administrator
admin_password: password

# ram quota for the cluster
cluster_ram_quota: 1024

# bucket and replicas
bucket_name: ansible
bucket_ram_quota: 512
num_replicas: 2

Use this file to configure your cluster.

Let's describe the playbook file:

- name: Couchbase Installation
  hosts: all
  user: root

  tasks:

  - name: download Couchbase package
    get_url: url=http://packages.couchbase.com/releases/2.0.1/couchbase-server-enterprise_x86_64_2.0.1.deb dest=~/.

  - name: Install dependencies
    apt: pkg=libssl0.9.8 state=present

  - name: Install Couchbase .deb file on all machines
    shell: dpkg -i ~/couchbase-server-enterprise_x86_64_2.0.1.deb

As expected, the installation has to be done on all servers as root; then we need to execute 3 tasks:

  1. Download the product; the get_url command will only download the file if it is not already present
  2. Install the dependencies with the apt command; state=present makes the system install the package only if it is not already present
  3. Install Couchbase with a simple shell command (here I am not checking whether Couchbase is already installed)

We have now installed Couchbase on all the nodes. Let's configure the first node and add the others:

 1  - name: Initialize the cluster and add the nodes to the cluster
 2    hosts: couchbase-main
 3    user: root
 4
 5    tasks:
 6    - name: Configure main node
 7      shell: /opt/couchbase/bin/couchbase-cli cluster-init -c 127.0.0.1:8091 --cluster-init-username=${admin_user} --cluster-init-password=${admin_password} --cluster-init-port=8091 --cluster-init-ramsize=${cluster_ram_quota}
 8
 9    - name: Create shell script for configuring main node
10      action: template src=couchbase-add-node.j2 dest=/tmp/addnodes.sh mode=750
11
12    - name: Launch config script
13      action: shell /tmp/addnodes.sh
14
15    - name: Rebalance the cluster
16      shell: /opt/couchbase/bin/couchbase-cli rebalance -c 127.0.0.1:8091 -u ${admin_user} -p ${admin_password}
17
18    - name: create bucket ${bucket_name} with ${num_replicas} replicas
19      shell: /opt/couchbase/bin/couchbase-cli bucket-create -c 127.0.0.1:8091 --bucket=${bucket_name} --bucket-type=couchbase --bucket-port=11211 --bucket-ramsize=${bucket_ram_quota} --bucket-replica=${num_replicas} -u ${admin_user} -p ${admin_password}

Now we need to execute specific tasks on the "main" server:

  • Initialization of the cluster using the Couchbase CLI, on lines 06 and 07

Then the system needs to ask all the other servers to join the cluster. For this, the system needs to get the various IP addresses, and for each IP address execute the server-add command. As far as I know it is not possible to get the IP addresses from the main playbook YAML file, so I ask the system to generate a shell script that adds each node, and then execute that script.

This is done on lines 09 to 13.

To generate the shell script, I use an Ansible template; the template is available in the couchbase-add-node.j2 file.

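The template is essentially a loop over the other nodes; a minimal reconstruction based on the description below (the ansible_default_ipv4 fact lookup and the exact server-add flags are assumptions) looks like:

{% for host in groups['couchbase-nodes'] %}
/opt/couchbase/bin/couchbase-cli server-add -c 127.0.0.1:8091 -u ${admin_user} -p ${admin_password} --server-add={{ hostvars[host]['ansible_default_ipv4']['address'] }}:8091
{% endfor %}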

As you can see, this script loops over each server in the [couchbase-nodes] group and uses its IP address to add the node to the cluster.

Finally, the playbook rebalances the cluster (line 16) and adds a new bucket (line 19).

You are now ready to execute the playbook using the following command:

./bin/ansible-playbook -i ./couchbase/hosts ./couchbase/couchbase.yml -vv

I am adding the -vv parameter so you can see more information about what's happening during the execution of the script.

This will execute all the commands described in the playbook, and after a few seconds you will have a new cluster ready to be used! You can, for example, open a browser, go to the Couchbase Administration Console, and check that your cluster is configured as expected.

As you can see it is really easy and fast to create a new cluster using Ansible.

I have also created a script to uninstall the cluster properly… just launch:

./bin/ansible-playbook -i ./couchbase/hosts ./couchbase/couchbase-uninstall.yml

Six Months as Technical Evangelist at Couchbase


Already 6 months! It has already been 6 months since I joined Couchbase as Technical Evangelist. This is a good opportunity to take some time to look back.

So first of all what is a Developer/Technical Evangelist?

Hmm, it depends on the company/product, but let me tell you what it is for me, inside Couchbase. This is one of the most exciting jobs I have ever had. And I think it is the best job you can have when you are passionate about technology and like to share this passion with others. So my role as Technical Evangelist is to help developers adopt NoSQL technologies in general and, as you can guess, Couchbase in particular.

Let's now see in more detail what I have done during these past six months and why I am so happy about it. I have organized the different activities into three types:

  • Outbound activities: meet the developers
  • Online activities: reach even more developers
  • Inbound activities: make the product better!

Outbound activities: meet the developers!

A large part of my activity during this first semester was made up of conferences and meetups. All these events are great opportunities for me to talk about NoSQL and get more people to use Couchbase Server 2.0. Here is a short list of what I have done:

  • participated in many Couchbase Developer Days in various cities (Portland, Seattle, Vancouver, Oslo, Copenhagen, Stockholm, Munich, Amsterdam, Barcelona, Paris, …); these are one-day workshops where I help developers get their hands dirty with Couchbase
  • participated in CouchConf Berlin and Couchbase [UK], our main European events, where I met many customers and key members of the community
  • submitted talks to conferences, adapted them to each conference, and then spoke at various conferences about NoSQL and Couchbase (33Degree Warsaw, NoSQL & Big Data Israel, Devoxx France, NoSQL Matters, and many others)
  • met many developers during user groups and meetups. I have to say that I have been very active there, and quite happy to see that NoSQL is a very hot topic for developers, in all languages
  • delivered Brown Bag Lunches to various technical teams in companies.

Yes! Being a Technical Evangelist means, at least for me, being on the road. It is very nice to meet developers from various countries, different cultures, languages, and… it also means tasting many different types of food!

Another interesting thing when you work on a database/infrastructure layer is that it is technology agnostic; you can access Couchbase from multiple programming languages: Java, .Net, Javascript/Node, Ruby, PHP, Python, C, … and even Go. So with this job I have met developers with different backgrounds and views about application development. So yes, when I am at a conference or meetup I am supposed to "teach" something to people, but I have also learned a lot of things, and I am still learning.

Online activities: reach even more developers!

Meeting developers during conferences is great, but it is also very important to produce content to reach even more people, so I have:

  • written blog posts about Couchbase usage, most of them based on feedback/questions from the community
  • created sample code to show how it works
  • monitored and answered questions on various sites and mailing lists: the Couchbase discussion forums, mailing lists, Stack Overflow, Quora and others…

This task is quite interesting because it is where you can reach many developers, get feedback from users, and understand how they are using the product. I have to say that I was not as productive as I expected, mainly because I was traveling a lot during this period.

Another important thing about online activities is the "Couchbase Community" itself. Many users of Couchbase are creating content: blog posts, samples, new applications, or features - for example, I am talking with a person who is developing a Dart client for Couchbase - so as Technical Evangelist I also work closely with the most active contributors.

Inbound activities: make the product better!

The ultimate goal of a Technical Evangelist at Couchbase is to "convert" developers to NoSQL/Couchbase and get them to talk about Couchbase. Meeting them online or during events is one way of achieving this, but it is also great to do it directly through the product. This means participating in the "development" of the product or its ecosystem. Here are some of the things I have done on this topic:

  • talked a lot with the development team: core developers, product managers, architects, … It is quite exciting to work with so many smart people and to have access to them. During these discussions I was able to comment on the roadmap and influence features, but it was also a constant opportunity to learn new things about Couchbase - and many other things around architecture and programming languages; take a look for example at this nice post from Damien Katz.
  • contributed some code; yes, remember Couchbase is an open source project and it is quite easy to participate in the development. Obviously, based on my skills, I have only helped a little bit with the Java and JavaScript SDKs. So if, like me, you are interested in contributing to the project, take a look at this page: "Contributing Changes"
  • but the biggest contributions to the product are things like doc reviews, testing and writing bug reports, and this is very important and interesting, since once again it helps a lot with the adoption of the product by developers.

So what?

As you can see, the Technical Evangelist job is quite exciting, and one of the reasons I really love it is simply that it allows me to do many different things, all related to technology. Six months is still a very short period; I still have many things to learn and to do with the team to be successful, such as being more present online (blog, sample code, technical articles, screencasts, …), being accepted at more conferences, and coding a little more (I have to finish, for example, the Couchbase data provider for Hibernate OGM, and many other ideas around the application development experience).

Finally, Couchbase needs you! This is a good opportunity to say that Couchbase is always looking for talent, especially in the Technical/Developer Evangelist team, so do not hesitate to look at the different job openings and join the team!

Screencast : Fun With Couchbase MapReduce and Twitter


I have created this simple screencast to show how you can use Couchbase to do some real-time analysis based on a Twitter feed.

The key steps of this demonstration are:

  1. Inject tweets using a simple program available on my Github: Couchbase-Twitter-Injector
  2. Create views to index and query the Tweets by
    • User name
    • Tags
    • Date

The views that I used in this demonstration are available at the bottom of this post.

Views:
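
For example, the by-user view is probably a simple map function like this (a sketch; the tweet attribute names are assumptions):

function (doc, meta) {
  // index tweets by the author's screen name
  if (doc.type == "tweet" && doc.user) {
    emit(doc.user.screen_name);
  }
}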

Easy Application Development With Couchbase, Angular and Node


Note: this article was written in March 2013; since then, Couchbase and its drivers have changed a lot. I am not working with/for Couchbase anymore and have no time to update the code.

A friend of mine wants to build a simple system to capture ideas and votes. Even if you can find many online services that do that, I think it is a good opportunity to show how easy it is to develop a new application using Couchbase and Node.js.

So how to start?

Some of us will start with the UI, others with the data; in this example I am starting with the model. The basic steps are:

  1. Model your documents
  2. Create Views
  3. Create Services
  4. Create the UI
  5. Improve your application by iteration

The sources of this sample application are available on Github:

https://github.com/tgrall/couchbase-node-ideas

Use the following command to clone the project locally:

git clone https://github.com/tgrall/couchbase-node-ideas.git

Note: my goal is not to provide a complete application, but to describe the key steps to develop an application.
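
For step 1, an idea document can be as simple as the following (an illustrative sketch, not necessarily the exact model used in the repository):

{
  "type" : "idea",
  "title" : "Pagination for the idea list",
  "description" : "Navigate the ideas page by page",
  "user_id" : "tgrall",
  "vote_count" : 3
}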