After a brief introduction to the history of database management systems, the different types of NoSQL data stores are characterized. Theoretical background on sharding mechanisms, horizontal scaling and the CAP theorem is explained.
After a comparison of different NoSQL stores, you will know the pros and cons of the different approaches and learn how to choose the database that best fits your project.
31. Key/Value Store Characteristics
Simplest data model
DB does not care about data types
Similar to a persistent hash map
Fast lookups
Easy to distribute
Inspired by the Amazon Dynamo paper
Limited querying capabilities
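To make the "persistent hash map" analogy concrete, here is a minimal sketch in Python. The class `TinyKeyValueStore` and its methods are hypothetical illustrations, not the API of any product named in these slides, and durability is reduced to writing a pickle snapshot on demand. The store never looks inside the values, so the only query it supports is a lookup by exact key.

```python
import pickle
from pathlib import Path
from typing import Optional


class TinyKeyValueStore:
    """Hypothetical minimal key/value store: a hash map that can be persisted.

    Values are opaque bytes: the store never inspects them, so the only
    supported query is a lookup by exact key.
    """

    def __init__(self, path: Optional[Path] = None) -> None:
        self._path = path
        self._data: dict[str, bytes] = {}
        if path is not None and path.exists():
            # Reload the last snapshot written by flush() (trusted local file only).
            self._data = pickle.loads(path.read_bytes())

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # average O(1) hash map insert

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)  # average O(1) lookup, None if absent

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def flush(self) -> None:
        """Write a snapshot of the whole map to disk (optional durability)."""
        if self._path is not None:
            self._path.write_bytes(pickle.dumps(self._data))
```

Anything beyond lookup by key, such as "all users living in London", would require scanning every value in application code, which is what the restricted querying bullet refers to.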
32. Open Source Advanced Key/Value Store
In-Memory Store with optional durability
Knows types like strings, hashes, lists, sets
BSD License
Implemented in C
Very small footprint (20k LOC for rel. 2.2)
APIs for C/C++, C#, Clojure, Lisp, Erlang, Go, Haskell,
Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, ...
Used at Twitter, Instagram, Flickr, Stack Overflow, ...
33. Open Source Key/Value Store
Highly available and fault-tolerant
Basho Technologies
Apache License
Implemented in Erlang
APIs for Java, Erlang, Ruby, PHP, Python, Clojure, C#,
C/C++, HTTP, Node.js, Perl, Scala, Smalltalk, ...
Used at Mozilla, Comcast, AOL
34. Open Source Key/Value Store
Big, distributed, persistent, fault-tolerant hash table
Developed by LinkedIn
Implemented in Java
Apache 2.0 License
Dynamo-style scale-out
Used at LinkedIn
37. Document Store Characteristics
You can query into document structure
You can use natural aggregates as documents
You can retrieve portions of a document
You can update portions of a document
You can have links between documents
Compared to the key/value data model, the document is more
transparent to the DB
No schema / implicit schema
Some queries are a pain in the neck!
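The bullets above can be illustrated with plain Python dictionaries standing in for JSON documents. This is a sketch only: the dotted-path helpers `get_path`, `set_path` and `find` are hypothetical and merely mimic the idea of addressing nested fields; they are not the query API of any particular document store.

```python
from typing import Any

# A hypothetical order document: a natural aggregate with nested structure.
order: dict[str, Any] = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "book-17", "qty": 2},
        {"sku": "pen-3", "qty": 10},
    ],
}


def get_path(doc: Any, path: str) -> Any:
    """Retrieve a portion of a document via a dotted path like 'items.0.sku'."""
    node = doc
    for part in path.split("."):
        # List elements are addressed by index, object fields by name.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def set_path(doc: Any, path: str, value: Any) -> None:
    """Update only the addressed portion of a document, leaving the rest intact."""
    *parents, last = path.split(".")
    node = get_path(doc, ".".join(parents)) if parents else doc
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


def find(docs: list[dict[str, Any]], path: str, value: Any) -> list[dict[str, Any]]:
    """Query into the document structure: docs whose field at `path` equals `value`."""
    matches = []
    for doc in docs:
        try:
            if get_path(doc, path) == value:
                matches.append(doc)
        except (KeyError, IndexError, TypeError, ValueError):
            # Without a fixed schema, some documents simply lack the field.
            continue
    return matches
```

For example, `set_path(order, "items.1.qty", 12)` changes one nested value without rewriting the customer data, and `find(orders, "customer.city", "London")` reaches inside each document.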
38. Open Source Document Store
"Most popular NoSQL database"
Stores JSON-like documents
Implemented in C++
GNU AGPL License
APIs for C/C++, C#, Go, Erlang, Java, JavaScript, Node.js,
Perl, PHP, Python, Ruby, Scala, HTTP/REST
Used at Craigslist, eBay, Foursquare, SourceForge,
NYT, ...
39. Open Source Document Store
Ease of Use
No update locks
Stores JSON-like documents
Implemented in Erlang
Apache License
APIs for JavaScript, MapReduce, HTTP/REST
Used at BBC, Credit Suisse, Meebo, ...
40. Open Source Distributed Document Store
Optimized for interactive applications
Merged from Membase and CouchDB
Implemented in C++, Erlang, C
Apache License / Proprietary
APIs for Java, .NET, PHP, Ruby, Python, C
Used at AOL, Cisco, LinkedIn, Salesforce.com, Zynga, ...
42. Schemaless
Schemaless is one of the main reasons for interest
in NoSQL databases
Schemaless reduces ceremony
Schemaless increases flexibility
BUT...
43. Schemaless means implicit schema
To query specific attributes, you have to know their names
Schema management is shifted from the DB to the code
http://martinfowler.com/articles/schemaless/
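A small sketch of where the implicit schema ends up. The field names `name`, `first_name` and `last_name` are hypothetical: assume older user documents store a single `name`, newer ones split it in two. The database accepts both shapes; only the reading code knows how to interpret them.

```python
from typing import Any


def display_name(user: dict[str, Any]) -> str:
    """Application code that carries the implicit schema of 'user' documents.

    Hypothetical field names: older documents store a single 'name',
    newer ones 'first_name' and 'last_name'.
    """
    if "first_name" in user:  # newer document shape
        return f"{user['first_name']} {user.get('last_name', '')}".strip()
    return user.get("name", "<unknown>")  # older shape, or field missing
```

Every piece of code that reads these documents has to carry the same knowledge, which is what "schema management is shifted to the code" means in practice.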
46. Column Family Characteristics
More complicated data model
Rich structure
Single key (row key)
Easy / fast access to columns and column families in a row
Rows can contain 100s or 1000s of columns
Aggregate oriented
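A rough sketch of the column family data model as nested Python dictionaries: row key, then column family, then column, then value. The table layout and function names are hypothetical, and real wide column stores add much that is not modeled here, such as cell timestamps, sorted on-disk storage and distribution across nodes.

```python
from typing import Optional

# Hypothetical wide-column table: row key -> column family -> column -> value.
Table = dict[str, dict[str, dict[str, str]]]

users: Table = {}


def put(table: Table, row_key: str, family: str, column: str, value: str) -> None:
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value


def get(table: Table, row_key: str, family: str, column: str) -> Optional[str]:
    # A single row-key lookup, then direct access to one column.
    return table.get(row_key, {}).get(family, {}).get(column)


def get_family(table: Table, row_key: str, family: str) -> dict[str, str]:
    # All columns of one family in one row, e.g. a whole user profile.
    return dict(table.get(row_key, {}).get(family, {}))


put(users, "user:1", "profile", "name", "Ada")
put(users, "user:1", "profile", "email", "ada@example.org")
put(users, "user:2", "profile", "name", "Alan")
# Rows are sparse: user:2 simply has no 'email' column and no 'visits' family.
put(users, "user:1", "visits", "2012-06-01", "3")
```

Each row can grow its own set of columns (here one column per visit date), which is how rows end up with hundreds or thousands of columns.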
47. Open Source Wide Column Store
Supports multi-data-center replication
Good for distributed DBs with massive write loads
Implemented in Java
Apache License 2.0
APIs for C#, C++, Clojure, Erlang, Go, Haskell, Java,
JavaScript, Perl, PHP, Python, Ruby, Scala
Used at CERN, Facebook, Netflix, Rackspace,
SoundCloud, Twitter, ...
48. Open Source Column Oriented Database
Part of Hadoop, inspired by Google's Bigtable
Implemented in Java
Apache License 2.0
APIs for RESTful HTTP, Thrift, C/C++, C#, Groovy, Java,
PHP, Python, Scala
Used at Amazon, Adobe, AOL, Cloudspace, eBay,
Facebook, IBM, Last.fm, LinkedIn, Spotify, Yahoo!, ...
52. Graph DB Characteristics
Graph DBs disassemble things into fragments and relations
You can do very interesting queries on graph structures,
things you cannot even think of in SQL
Good for complex graph-structured data
Fast lookups, fast traversal
Whiteboard friendly
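Traversal is the core operation graph DBs are built for. The sketch below is a plain breadth-first search over an in-memory adjacency list with hypothetical "knows" data; it is not any graph database's query language, but it shows the kind of question ("who is within n hops of Ann?") that in SQL typically needs one self-join per hop or a recursive query.

```python
from collections import deque

# Hypothetical "knows" relationships as an adjacency list: node -> neighbors.
knows: dict[str, list[str]] = {
    "ann": ["bob"],
    "bob": ["ann", "cid"],
    "cid": ["bob", "dan"],
    "dan": ["cid"],
}


def reachable_within(graph: dict[str, list[str]], start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal: all nodes reachable from start in 1..max_hops hops."""
    seen = {start}
    result: set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result
```

Friends of friends are then `reachable_within(knows, "ann", 2) - reachable_within(knows, "ann", 1)`, which yields `{"cid"}` for this data.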
53. Open Source Graph Database
Embedded, disk-based, fully transactional
Implemented in Java
GPLv3 and AGPLv3 / commercial
APIs for .NET, Clojure, Go, Groovy, Java, JavaScript,
Perl, PHP, Python, Ruby, Scala
Used at Adobe, Cisco, Telekom, ...
54. Open Source Document Database with Graph-Oriented Extensions
Supports SQL (without joins) as query language
Supports ACID transactions
Implemented in Java
Apache License 2.0
Commercial support available
APIs for HTTP/REST, Java, JavaScript, Scala, PHP,
Ruby, .NET, Clojure, Node.js, Python, ...
Used at SKY, Spielo, UltraDNS...
58. Hashing Problems
Common way of choosing a server:
server = hash(key) mod n
What happens if a server goes down?
n changes, so almost every object
gets hashed to a new location!
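A quick experiment makes the problem measurable. This sketch assumes MD5 as a stable stand-in for `hash(key)` (Python's built-in `hash()` is randomized per process for strings) and hypothetical object names. For a well-mixed hash, a key keeps its server when going from 4 to 3 servers only if hash mod 4 equals hash mod 3, i.e. when hash mod 12 is 0, 1 or 2, so roughly 75% of the objects move.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process for str, so use MD5
    # to get the same value everywhere (assumption: any stable hash would do).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def server_for(key: str, n: int) -> int:
    """The slide's scheme: server = hash(key) mod n."""
    return stable_hash(key) % n


def moved_fraction(keys: list[str], n_before: int, n_after: int) -> float:
    """Share of keys whose server index changes when n changes."""
    moved = sum(server_for(k, n_before) != server_for(k, n_after) for k in keys)
    return moved / len(keys)


keys = [f"object-{i}" for i in range(10_000)]
print(f"{moved_fraction(keys, 4, 3):.1%} of objects change servers when n drops from 4 to 3")
```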
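59. Consistent Hashing
Use the same hash function for both objects and servers
Each object is stored on the next server clockwise on the hash ring
If a server goes down, only its own objects move to the next server
Example: shards A, B, C; objects 1, 2, 3, 4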
http://www.tom-e-white.com/2007/11/consistent-hashing.html
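Here is a compact sketch of the idea in Python, with the shard names A, B, C taken from the slide. Assumptions: MD5 as the shared hash function and 100 "virtual nodes" per server, a common refinement that evens out the load; the class and method names are hypothetical. Removing a server only reassigns the objects it owned, unlike the mod-n scheme.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    # Stable 64-bit position on the ring (MD5 chosen only for illustration).
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Servers and objects are hashed with the same function onto one ring."""

    def __init__(self, servers: list[str], virtual_nodes: int = 100) -> None:
        self.virtual_nodes = virtual_nodes
        self._positions: list[int] = []  # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server gets several points ("virtual nodes") to spread load evenly.
        for i in range(self.virtual_nodes):
            pos = ring_position(f"{server}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            pos = ring_position(f"{server}#{i}")
            del self._positions[bisect.bisect_left(self._positions, pos)]
            del self._owner[pos]

    def server_for(self, key: str) -> str:
        # First server point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

Usage: `ring = ConsistentHashRing(["A", "B", "C"])`, then `ring.server_for("1")`. After `ring.remove("B")`, objects that lived on A or C keep their server; only B's objects move to their next point clockwise.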
63. RDBMS will not die
Use a relational database unless you have good reason not to
64. RDBMS have their limits
Vertical scaling is expensive and has hard limits
Horizontal scaling is not possible / limited
Joins on big, distributed tables are too expensive / too slow
Rigid schema inappropriate for semi-structured/dynamic data (sparse tables)
Consistency is rated higher than availability
65. NoSQL comes to the rescue
Distribution and scalability are fundamental design goals of NoSQL DBs
Trade-off between consistency, availability and partition tolerance
for horizontal scalability (CAP theorem, BASE)
Small footprint in favor of ease of use
Well proven in practice (Google, Amazon, Facebook, LinkedIn, Twitter, ...)
66. There are cons too
The broad spectrum of products is difficult to understand
You have to get used to designing models for Key/Value or Column Family stores
Mostly no ad hoc queries
No standards, no portability
Sometimes poor documentation
Few commercial support offers
67. RDBMS vs. NoSQL
RDBMS | NoSQL
think about data | think about queries
redundancy is bad | redundancy is ok
indexes managed by DB | manage own indexes
query over relations | no joins
always exact results | results may be out of date
SQL | proprietary APIs
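To illustrate the "manage own indexes" row, here is a sketch of an application-maintained secondary index on top of a plain key/value map. All names, including the `city` field, are hypothetical. The index exists only because the query "users in a given city" was known in advance (think about queries), and the application, not the database, must update it on every write. In a distributed store those two writes may not be atomic, which is one way results can be out of date.

```python
from typing import Any

# Primary data in a key/value store: user id -> user document (hypothetical).
users: dict[str, dict[str, Any]] = {}
# Secondary index maintained by the application, not the DB: city -> user ids.
users_by_city: dict[str, set[str]] = {}


def save_user(user_id: str, user: dict[str, Any]) -> None:
    old = users.get(user_id)
    if old is not None:
        # Keep the index consistent: drop the entry for the old city first.
        users_by_city.get(old["city"], set()).discard(user_id)
    users[user_id] = user
    users_by_city.setdefault(user["city"], set()).add(user_id)


def user_ids_in(city: str) -> list[str]:
    """Answer the query from the index instead of scanning all users."""
    return sorted(users_by_city.get(city, set()))
```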
70. Polyglot Persistence
Unlike the OODBMSs in the 1980s, NoSQL will break the relational dominance
RDBMS is no longer the one and only option
Select the storage technology that best fits your current situation
Enterprises will use different storage technologies for different kinds of data
The DB is no longer the integration point
Apps talk via web services and encapsulate their individual data storage technologies
71. NewSQL
Traditional RDBMS vendors' answer to the great success of NoSQL
Improved RDBMS offer more features and better scalability
Oracle launches Oracle NoSQL, its own NoSQL DB based on a revised Berkeley DB
Oracle, Microsoft, Sybase, IBM, Greenplum and Pervasive already have tight Hadoop integration
"Can't fight it? Embrace it!"