With the advent of Big Data, always-connected devices, and the push to move our lives online, it has become increasingly important to evolve the way data is handled. MongoDB (MDB) is a non-relational NoSQL database that stores data as documents rather than as rows in tables.
MDB’s main characteristics are its high performance, high availability, rich query language, and horizontal scaling. MDB is free and open source under the Free Software Foundation’s GNU AGPL v3.0. MDB is currently on version 3.6.
History. MDB was conceived not out of inspiration but out of necessity. In 2007, the founders of MDB set out to create an open source platform as a service (PaaS) for the cloud. For this service, there were no open source databases that had “the ‘cloud computing’ principles you want them to have: elasticity, scalability, and ... easy administration, but also ease of use for developers and operators” (Metz, 2011). After realizing that a new database would need to be built from the ground up, the founders intentionally designed it in a non-relational way, opting for a model that would lend itself to a distributed platform. After completing the build, the founders realized the potential of the database and abandoned the PaaS altogether to focus on launching MDB.
MDB was founded in 2007, open sourced in 2009, reached 20 million downloads in 2016, and has now, in 2017, reached over 30 million downloads.
Data Model. In MDB, data is stored in a flexible schema that does not need to be predetermined; unlike SQL databases, MDB does not enforce a document structure on its collections. MDB uses a document data model that stores BSON documents, a binary representation of JSON documents that adds data types such as int, long, and double. MDB documents consist of field/value pairs.
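As a rough illustration of a field/value document, the sketch below builds one as a plain Python dictionary whose values include a nested document and an array of documents. Every field name and value is invented for this example, and a Python dict only stands in for a BSON document; it does not reproduce MDB’s binary format.

```python
import json

# Hypothetical customer document: all names and values are made up.
# A plain dict stands in for a BSON document of field/value pairs.
customer = {
    "_id": 1001,
    "name": "Ada Example",
    "balance": 250.75,
    # A value can itself be a document (embedded sub-document).
    "address": {"city": "Springfield", "zip": "00000"},
    # A value can also be an array of documents.
    "orders": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def total_items(doc):
    """Sum order quantities using only data held inside this one document."""
    return sum(order["qty"] for order in doc["orders"])


def field_types(doc):
    """Map each top-level field to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in doc.items()}


if __name__ == "__main__":
    print(json.dumps(customer, indent=2))
    print(total_items(customer))
    print(field_types(customer))
```

Because the orders live inside the customer document, `total_items` needs no lookup in a second structure, which is the property the document model relies on.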
Values can be any BSON data type, including embedded sub-documents, arrays, and arrays of other documents. These documents are held in collections, which are the equivalent of tables in SQL databases. MDB’s arrays and embedded documents can eliminate the need for joins by aggregating related data within the same document. Aggregating data this way also increases performance by decreasing the number of reads: the database reads once from a single document instead of performing multiple reads across different tables for the same data. It also allows for higher scalability, because data that is physically stored as one document can be easily distributed across a system.
Key Features.
Horizontal Scaling/Sharding. MDB scales horizontally through sharding. Sharding occurs at the collection level (the RDBMS equivalent of a table).
Each collection is partitioned into “chunks” according to the shard key chosen for that collection. The chunks are then distributed across the shards by a sharded cluster balancer that attempts to spread them evenly across the cluster. When a chunk grows beyond the specified size it is split; this does not affect the data or the shards unless the split leaves the chunks unevenly distributed amongst the shards, in which case they are redistributed. A sharded cluster can continue to operate even if some shards are temporarily down, and the available shards can still be read and updated. MDB offers two sharding options: hashed sharding and ranged sharding. In hashed sharding, a hash of the shard key value is computed, and each document is assigned to a chunk based on the range that the hashed value falls in.
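The difference between the two options can be sketched in a few lines. The example below is a simplified model, not MDB’s implementation: the split points are arbitrary, the function names are hypothetical, and md5 is used only as a deterministic stand-in for whatever hash function the database applies. A chunk is modeled as a half-open range between two sorted split points.

```python
import bisect
import hashlib


def hashed_key(value):
    """Deterministic 32-bit hash of a shard key value (md5 is a stand-in)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def chunk_index(key, boundaries):
    """Return the index of the chunk whose range contains key.

    boundaries is a sorted list of split points. Chunk i covers
    [boundaries[i - 1], boundaries[i]); the first and last chunks are open-ended.
    """
    return bisect.bisect_right(boundaries, key)


def ranged_chunk(value, boundaries):
    """Ranged sharding: place the raw shard key value into a range."""
    return chunk_index(value, boundaries)


def hashed_chunk(value, boundaries):
    """Hashed sharding: place the hash of the shard key value into a range."""
    return chunk_index(hashed_key(value), boundaries)


if __name__ == "__main__":
    user_ids = list(range(100, 110))            # monotonically increasing keys
    ranged_bounds = [1000, 2000, 3000]          # split points on the raw key
    hashed_bounds = [2**30, 2**31, 3 * 2**30]   # split points on the hash space
    print([ranged_chunk(u, ranged_bounds) for u in user_ids])
    print([hashed_chunk(u, hashed_bounds) for u in user_ids])
```

With these split points every key from 100 to 109 falls into the first ranged chunk, whereas hashing would typically scatter neighbouring keys across chunks. That trade-off (locality for range queries versus a more even write distribution) is the usual reason to choose one option over the other.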
In ranged sharding, documents are assigned to chunks based on ranges of the shard key value itself.
Rich Query Language/CRUD Operations. MDB provides support for CRUD (create, read, update, delete) operations, data aggregation, and bulk write operations.
MDB provides methods for inserting documents into a collection; each insert targets a single collection, and every write operation on a single document is atomic. For documents with embedded data, MDB stores all of the related data together as a single document. Read operations retrieve documents from a collection, update operations modify existing documents in a collection, and delete operations remove documents from a single collection.
Bulk write operations allow an application to submit many write operations against a single collection in one call.
High Availability. In MDB, a replica set is a cluster of MDB servers that maintain the same data set through master-slave (primary-secondary) replication. In the replica set, there is one primary node that receives all write operations.
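One way to picture this arrangement is a primary that appends every write to an ordered operations log, which the other members replay in order to reach the same state. The classes and method names below are hypothetical and model the idea only; they are not MDB’s replication protocol and ignore elections, failover, and networking.

```python
class Primary:
    """Toy primary node: applies writes locally and logs each one."""

    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (operation, key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append(("set", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.oplog.append(("delete", key, None))


class Secondary:
    """Toy secondary node: replays the primary's log entries in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def sync_from(self, primary):
        # Replay only the entries this node has not seen yet.
        for operation, key, value in primary.oplog[self.applied:]:
            if operation == "set":
                self.data[key] = value
            elif operation == "delete":
                self.data.pop(key, None)
            self.applied += 1


if __name__ == "__main__":
    primary, secondary = Primary(), Secondary()
    primary.write("a", 1)
    primary.write("b", 2)
    primary.delete("a")
    secondary.sync_from(primary)
    print(secondary.data == primary.data)
```

Tracking `applied` lets a secondary that fell behind catch up by replaying only the tail of the log rather than copying the whole data set.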
The primary node records all changes to its data in its operations log (oplog); secondary nodes then replicate this log and apply the changes to their own data sets.
Customers/Use Cases. MDB is best suited for applications with real-time and low-latency requirements, such as mobile apps, content management, and product catalogs. MDB’s high availability, scalability, performance, and flexible schema have attracted companies such as Expedia, Verizon, ADP, and Intuit.
All of these organizations have applications with millions of daily users where reliability and scalability are critical. With MDB, ADP created its “ADP Mobile Solutions” app. The app allowed employees on different platforms to pull up payroll information, tax information, and other ADP offerings, and MDB also allowed ADP to collect and analyze data from across the app in order to improve and personalize it for those users.
In addition to enabling real-time analytics, MDB drastically cut the design time for the new app; in the past, with a relational database, ADP would have needed to design a new schema and transform the data. At Intuit, the challenge was to create an analytics platform that would let the company leverage 10 years’ worth of data from over 500,000 hosted websites to create insights for lead generation and sales conversions. Intuit chose MDB because of its higher overall performance (writes 2.5 times faster than MySQL) and its ease of deployment compared with a Hadoop system. Intuit was able to build a prototype solution within one week using MDB and, based on that prototype, decided it was the best solution.
Strengths/Weaknesses. MongoDB has many features that position it well for the future of data and applications, but it is not the best fit for every situation. The strongest features of MDB are its flexible document structure, high performance, high scalability, and high availability. The flexible schema cuts down on design and implementation time and also leaves the database able to accept data that it was not originally designed to hold. Sharding allows MDB to be scaled horizontally with ease, and the document model makes replication across multiple nodes straightforward. With embedded documents and arrays, writes are much more efficient and can mostly be done in memory, without joins.
However, if the data is not all contained in the same document or referenced within that document, joins become difficult, and multiple queries are often needed to join the data manually. ACID transactions in MDB are not as powerful or as well supported as in relational databases. Operations are atomic at the document level: a write to a document, including its embedded documents, either succeeds entirely or not at all, but changes that span multiple documents are not applied as a single all-or-nothing unit. MDB is therefore not well suited for applications requiring complex transactions, such as bookkeeping systems, or for legacy systems that have been built around relational databases.
Security. MDB provides several security features, including user authentication, role-based access control, and encryption.
Authentication. For authentication, MDB supports SCRAM-SHA-1, MongoDB Challenge and Response (MONGODB-CR), x.509 certificate authentication, LDAP proxy authentication, and Kerberos.
Role-Based Access Control. Users are defined in an authentication database and are uniquely identified by their name together with the database in which they are defined. Users can have permissions to act on other databases. MDB implements role-based access control, in which a user is assigned roles that allow them to act on a database or access resources; outside of the roles assigned to them, a user has no permissions on or access to the database.
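The deny-by-default behaviour described above can be sketched as a small permission check. This is a simplified model under stated assumptions: the `Role` and `User` classes, the `is_allowed` function, and the role, database, and action names are all illustrative and are not MDB’s API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """Illustrative role: a set of actions permitted on one database."""
    name: str
    database: str
    actions: frozenset


@dataclass
class User:
    """Illustrative user, defined in an authentication database."""
    name: str
    auth_db: str
    roles: list = field(default_factory=list)

    @property
    def identity(self):
        # A user is identified by name plus the database it is defined in.
        return (self.name, self.auth_db)


def is_allowed(user, database, action):
    """Allow an action only if one of the user's roles grants it there."""
    return any(
        role.database == database and action in role.actions
        for role in user.roles
    )


if __name__ == "__main__":
    read_reports = Role("read", "reports", frozenset({"find"}))
    write_sales = Role("readWrite", "sales", frozenset({"find", "insert"}))
    alice = User("alice", "admin", [read_reports, write_sales])
    print(is_allowed(alice, "sales", "insert"))
    print(is_allowed(alice, "reports", "insert"))
```

A user defined in one database can still hold roles on others, as `alice` does here, and a user with no roles is refused everything because `any` over an empty list is false.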
Encryption. For network traffic, MDB uses TLS/SSL to ensure that the data is readable only by the intended recipient. MDB’s implementation of SSL uses the OpenSSL libraries and allows only SSL ciphers with a key length of at least 128 bits for all connections. For data at rest in the database, MDB has two approaches to encryption, application-level encryption and storage encryption; they can be used independently or in tandem. For its default storage encryption, MDB uses the 256-bit Advanced Encryption Standard in Cipher Block Chaining mode via OpenSSL. All data files are fully encrypted, and the data exists unencrypted only in memory and during transmission.
For encryption across multiple nodes, it is possible to use the same key, but this is generally not recommended; individual keys should be used for each node. Encryption can also be applied on a per-field or per-document basis with application-level encryption. For this kind of encryption, custom encryption and decryption routines must be implemented in the application. Used together, transport encryption, storage encryption, and application-level encryption help maintain compliance with several standards, including HIPAA, PCI-DSS, and FERPA.
Conclusions. Overall, MDB is positioned well to take advantage of the swarms of data in today’s mobile-first, web-first, data-driven environment.
The flexible nature of the document-based data model allows efficient storage of new and possibly unknown types of data without needing a change to the schema. The document model also allows for easy scalability, since documents can be sharded across several nodes without losing much performance, because most of the data is stored in or referenced by the document. For the same reasons, data aggregation time can be significantly reduced. For most modern-day applications, now and in the foreseeable future, MDB can check off almost all of the boxes that an organization might need.