we're all mad here

nosql for beginners

In beginner, nosql on August 25, 2010 at 2:04 pm

one of the most frequent questions that people ask me about nosql is: what is the best nosql tool to get started with in my programming language (java, .net, php, python, etc.)? it's almost impossible to give a quick answer 'cos it involves many things: data model, durability and usage scenario (single node, several nodes or cloud setup), language bindings and ease of setup.

in this post i'll try to answer this question as succinctly and objectively as i can. the post is divided into 4 sections that will guide you to the answer ;) one important point that i won't cover here is the administration of those tools, but it is definitely something to keep in mind when you select any tool.

note that this post is written for beginners, not for dummies ;)

data model

once we start to talk about data storage (nosql or not) we need to think about our needs… these needs enable us to model our data. and here is the first thing that you need to understand: a nosql tool can be schemaless (in fact they are more schemalast, but that is a theme for another blog post) but you still need to model your data!

data modeling in the nosql space varies a lot. if you're using a document-oriented solution you need to model your documents; if you're going to use a key-value tool, your design is completely different. the same happens with graph or tabular solutions. second tip: the fewer resources the tool gives you, the harder it is to model (i.e. key-value is harder than documents). this tip can be a bit polemic (i'll try to cover this theme in a new blog post).
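
to make the difference a bit more concrete, here is a minimal python sketch of the same (made-up) blog post modeled both ways, with plain dicts standing in for a real store. the field names and the key naming scheme (like `post:1:title`) are assumptions for illustration only, not a convention required by any of the tools.

```python
# the same blog post modeled two ways; plain dicts stand in for a real store.

# document style: one self-contained, nested record per post.
post_document = {
    "_id": "post:1",
    "title": "nosql for beginners",
    "tags": ["beginner", "nosql"],
    "comments": [
        {"author": "alice", "text": "nice post"},
        {"author": "bob", "text": "what about graphs?"},
    ],
}


def to_key_value(doc):
    """Flatten a post document into key/value pairs.

    A key-value store only understands opaque keys and values, so the
    structure has to be encoded in the key names (hypothetical scheme).
    """
    prefix = doc["_id"]
    pairs = {
        f"{prefix}:title": doc["title"],
        f"{prefix}:tags": ",".join(doc["tags"]),
        f"{prefix}:comment_count": str(len(doc["comments"])),
    }
    for i, comment in enumerate(doc["comments"]):
        pairs[f"{prefix}:comment:{i}:author"] = comment["author"]
        pairs[f"{prefix}:comment:{i}:text"] = comment["text"]
    return pairs


post_key_value = to_key_value(post_document)
```

note how the key-value version pushes design decisions (key layout, how to count comments) onto you, which is one way to read the "harder to model" tip.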

data manipulation

at the same time that you are analysing your data to model it, you need to think about your data manipulation needs. if you need a really flexible ad-hoc query engine… key-value is not for you! mongodb would be a better solution.

queries are usually easy to execute on documents or graphs. with key-value you can only look up a key. the most common tabular solutions are also based just on keys (the returned data will be a collection of columns).
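
a rough in-memory sketch of that difference, in python. the `find` and `get` helpers and the sample users are hypothetical stand-ins, not the api of any real tool: `find` mimics an ad-hoc query over documents, `get` mimics the only operation a pure key-value store offers.

```python
# documents: any field can take part in a query (here, a plain filter).
documents = [
    {"_id": "u1", "name": "ana", "city": "lisbon", "age": 31},
    {"_id": "u2", "name": "bruno", "city": "porto", "age": 25},
    {"_id": "u3", "name": "carla", "city": "lisbon", "age": 42},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# key-value: the only question the store can answer is
# "what is stored under this exact key?"
key_value = {d["_id"]: d for d in documents}


def get(store, key):
    """Look up a single value by its exact key; None when the key is absent."""
    return store.get(key)


lisbon_users = find(documents, city="lisbon")  # query by any field
one_user = get(key_value, "u2")                # lookup by key only
```

to answer "who lives in lisbon?" with key lookups alone, you would have to maintain an extra key such as `city:lisbon` yourself, and that is exactly the kind of thing you need to decide while modeling.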

usage scenario 

the third point that you should understand when thinking about a nosql tool is your usage scenario (if you just wanna play with it and are not thinking about delivering anything, you could skip this step, but it's good to understand it anyway).

if you plan to use it on a single machine (what we call single node) you should not choose solutions whose durability is based on data replicas (like mongodb or riak). if you are planning to deploy your solution into the cloud, be careful if you want to use a graph database, because it's really hard to scale out this kind of data structure.

now it's time to choose

after considering the above points, you are almost ready to select your nosql tool. what's missing? you need to know if the tools that you are looking at have a driver for your language (or, if there is no driver, if you can still access them using, for instance, http).
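
as a sketch of the "no driver, just http" route, the python below builds (but does not send) a request that would store a json document over http, using only the standard library. the assumptions are loud ones: a couchdb server listening on its default port 5984 on localhost, an already created database called `blog`, and a made-up document id `post-1`.

```python
import json
import urllib.request


def build_put_document(base_url, database, doc_id, document):
    """Build (but do not send) an http PUT request that stores a json document."""
    url = f"{base_url}/{database}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# assumptions: local couchdb on its default port, existing database "blog".
request = build_put_document(
    "http://localhost:5984", "blog", "post-1", {"title": "nosql for beginners"}
)

# to actually send it (requires a running server):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

nothing here is specific to python: any language that can speak http and json can do the same, which is why the http column in the table matters.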

i tried to create a table that shows the most important/used/tweeted/commented nosql tools available that are easy to set up. there are lots of other tools (i'm probably missing several here) but to start with it's easier to have a small list to choose from.

 

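| type      | name      | durability mode                      | java | ruby | python | php | .net | http |
|-----------|-----------|--------------------------------------|------|------|--------|-----|------|------|
| document  | mongodb   | based on replica                     | yes  | yes  | yes    | yes | yes  | yes  |
| document  | couchdb   | single node                          | yes  | yes  | yes    | yes | yes  | yes  |
| document  | ravendb   | single node                          | no   | no   | no     | no  | yes  | yes  |
| key-value | redis     | in memory, but can serialize on disk | yes  | yes  | yes    | yes | yes  | no   |
| key-value | riak      | based on replica                     | yes  | yes  | yes    | yes | yes  | yes  |
| tabular   | cassandra | based on replica                     | yes  | yes  | yes    | yes | yes  | no   |
| graph     | neo4j     | single node                          | yes  | yes  | yes    | yes | no   | yes  |
| graph     | sones     | single node                          | no   | no   | no     | no  | yes  | yes  |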
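
if you prefer to filter rather than read, the table can be written down as data. the python sketch below copies the rows above as they are; the `candidates` helper and its `single_machine` rule (skip anything whose durability is "based on replica") are a simplification of the advice in this post, not an official recommendation from any of the projects.

```python
# the table above as data:
# (type, name, durability mode, languages with a driver, reachable over http)
ALL_FIVE = {"java", "ruby", "python", "php", ".net"}

TOOLS = [
    ("document", "mongodb", "based on replica", ALL_FIVE, True),
    ("document", "couchdb", "single node", ALL_FIVE, True),
    ("document", "ravendb", "single node", {".net"}, True),
    ("key-value", "redis", "in memory, but can serialize on disk", ALL_FIVE, False),
    ("key-value", "riak", "based on replica", ALL_FIVE, True),
    ("tabular", "cassandra", "based on replica", ALL_FIVE, False),
    ("graph", "neo4j", "single node", ALL_FIVE - {".net"}, True),
    ("graph", "sones", "single node", {".net"}, True),
]


def candidates(language, single_machine=False):
    """Names of the tools that have a driver for `language`.

    With single_machine=True, tools whose durability is based on
    replicas are skipped (a simplified reading of the usage scenario tip).
    """
    names = []
    for _type, name, durability, languages, _http in TOOLS:
        if language not in languages:
            continue
        if single_machine and durability == "based on replica":
            continue
        names.append(name)
    return names


python_tools = candidates("python")
dotnet_single_node = candidates(".net", single_machine=True)
```

remember that the table reflects one driver per tool at the time of writing, so treat the result as a starting list and check the project websites.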

 

another important thing to keep in mind: your nosql tool does not necessarily have to be developed in your preferred language. we don't select mysql, postgres, oracle or sql server 'cos they were written in a specific language, so pick the one that fits your needs. especially you from .net ;) !

i'm writing an article for java magazine that will cover all these aspects more deeply, giving examples of how to model data and how to identify your usage scenarios. that's it! i hope that this post helps you start with nosql ;)

special thanks to my tech reviewer @feuteston ;)

cheers!

note: there are so many drivers for these projects, i just picked one (at random) that you can start with. to check other drivers you can google them or check the project website.