# MongoDB Databases, Documents, and Collections: An In-Depth Exploration
No matter what database we are learning, it's essential to grasp its fundamental concepts. In MongoDB, the basic building blocks are documents, collections, and databases. Let's explore them one by one.
The following table will help you better understand some of the key concepts in MongoDB:
| SQL Term/Concept | MongoDB Term/Concept | Explanation |
| --- | --- | --- |
| database | database | Database |
| table | collection | Table (SQL) / collection (MongoDB) |
| row | document | Data record: row / document |
| column | field | Data field: column / field |
| index | index | Index |
| table joins | (no direct equivalent) | MongoDB has no SQL `JOIN`; related data is usually embedded, and the `$lookup` aggregation stage provides join-like lookups |
| primary key | primary key | MongoDB automatically uses the `_id` field as the primary key |
The example diagram below gives a more intuitive picture of some of these MongoDB concepts:
*[Figure: example diagram of MongoDB concepts]*
**Glossary of Terms:**
* **Document**: The basic data unit in MongoDB, typically a JSON-like structure that can contain various data types.
* **Collection**: Similar to a table in relational databases, a collection is a container for a group of documents. In MongoDB, documents within a collection do not need to adhere to a fixed schema.
* **Database**: A named container for collections. A single MongoDB instance can host multiple databases.
* **BSON**: Short for Binary JSON, BSON is the binary-encoded format MongoDB uses to store and transmit documents.
* **Index**: A data structure used to optimize query performance. Indexes can be created on one or more fields within a collection.
* **Sharding**: A method of distributing data across multiple servers (called shards) to handle large datasets and high-throughput applications.
* **Replica Set**: A set of MongoDB servers that maintain identical datasets, providing redundancy and high availability.
* **Primary Node**: The server in a replica set that receives all write operations.
* **Secondary Node**: A server in a replica set that replicates the primary's data. It can serve reads (depending on the read preference) and can be elected primary if the primary fails.
* **MongoDB Shell**: The command-line interface for interacting with MongoDB instances. The examples in this tutorial use the legacy `mongo` shell, which was replaced by `mongosh` and removed in MongoDB 6.0.
* **Aggregation Framework**: A pipeline of processing stages used to perform complex data transformation and aggregation.
* **Map-Reduce**: A programming model for parallel computation over large datasets. It has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline.
* **GridFS**: A specification for storing and retrieving files larger than the 16 MB BSON document size limit.
* **ObjectId**: A 12-byte identifier that is the default type of the `_id` field. It is generated automatically (by the driver or the server) when a document is inserted without an `_id`.
* **CRUD Operations**: Create, Read, Update, Delete operations.
* **Transactions**: Multi-document transactions let a group of operations execute as a single atomic unit. They are supported on replica sets since MongoDB 4.0 and on sharded clusters since 4.2. Operations on a single document are always atomic.
* **Operators**: Keywords that start with `$`, such as `$gt` or `$set`, used in queries, updates, and aggregation pipelines.
* **Joins**: The `$lookup` aggregation stage performs left-outer-join-style lookups between collections, giving MongoDB SQL-like join capability.
* **TTL (Time-To-Live)**: A TTL index on a date field makes MongoDB automatically delete documents once they are older than a specified number of seconds.
* **Storage Engine**: The component that manages how data is stored on disk and in memory. WiredTiger has been the default since MongoDB 3.2; the older MMAPv1 engine was removed in MongoDB 4.2.
* **MongoDB Compass**: A graphical user interface tool for visualizing and managing MongoDB data.
* **MongoDB Atlas**: A cloud service provided by MongoDB that allows hosting MongoDB databases in the cloud.
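The **ObjectId** entry can be made concrete. The sketch below uses a hypothetical helper, `objectIdTimestamp`. It is not a driver API, and it is written in plain JavaScript so that it runs without a MongoDB server. It assumes the standard ObjectId layout: 12 bytes written as 24 hexadecimal characters, where the first 4 bytes are a big-endian Unix timestamp in seconds. The sketch decodes only that timestamp.

```javascript
// Hypothetical helper, not a driver API: extracts the creation time stored in an ObjectId.
// Assumes the standard layout: 12 bytes written as 24 hex characters, where the first
// 4 bytes are a big-endian Unix timestamp in seconds (the remaining 8 bytes are a random
// value and a counter, which this sketch ignores).
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId must be 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes = 8 hex characters
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```

In practice the shell already exposes this as `ObjectId("...").getTimestamp()`. A hand-written decoder like this one is useful only to show where the timestamp lives.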
---
## Databases
A single MongoDB instance can host multiple independent databases, each with its own collections and permissions; different databases are stored in separate files.
If you do not specify a database, the mongo shell uses a database named `test` by default. Like every database, its data lives in the server's data directory (set with `--dbpath`; `/data/db` by default on Linux and macOS).
Starting with MongoDB 4.0, multi-document transactions are supported (on replica sets in 4.0, and on sharded clusters from 4.2).
The `show dbs` command displays a list of all databases.
```bash
$ ./mongo
MongoDB shell version: 3.0.6
connecting to: test
> show dbs
local 0.078GB
test 0.078GB
>
```
Running the `db` command shows the name of the current database.
```bash
$ ./mongo
MongoDB shell version: 3.0.6
connecting to: test
> db
test
>
```
The `use` command switches to the specified database. If the database does not exist yet, MongoDB creates it when you first store data in it.
```bash
> use local
switched to db local
> db
local
>
```
In the example above, `local` is the database being switched to.
In the next section, we will discuss the usage of MongoDB commands in detail.
Databases are identified by their names. A database name can be any UTF-8 string that meets the following criteria:
* It cannot be an empty string (`""`).
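* It must not contain spaces (` `), periods (`.`), dollar signs (`$`), forward slashes (`/`), backslashes (`\`), or null characters (`\0`).
* It should be entirely lowercase. Database names are case-insensitive in MongoDB, so two names that differ only in case cannot coexist.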
* It can be up to 64 bytes long.
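The naming rules above can be expressed as a small check. The sketch below defines a hypothetical `isValidDbName` function in plain JavaScript; it does not involve a MongoDB server or driver. It encodes only the rules in this list, treats the lowercase rule as mandatory, and measures the 64-byte limit in UTF-8 bytes rather than characters. It is an illustration, not the server's own validation logic.

```javascript
// Hypothetical helper, not part of MongoDB or any driver: checks a candidate
// database name against the rules listed above. The server's own checks vary
// by version and operating system.
const FORBIDDEN_IN_DB_NAME = [" ", ".", "$", "/", "\\", "\0"];

function isValidDbName(name) {
  if (typeof name !== "string" || name === "") return false; // rule: not empty
  if (FORBIDDEN_IN_DB_NAME.some((ch) => name.includes(ch))) return false; // rule: forbidden characters
  if (name !== name.toLowerCase()) return false; // rule: lowercase only (treated as mandatory here)
  // The length limit is in bytes, so multi-byte UTF-8 characters count more than once.
  return new TextEncoder().encode(name).length <= 64;
}

console.log(isValidDbName("inventory")); // true
console.log(isValidDbName("my.db")); // false
```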
Some database names are reserved and provide access to special-purpose databases:
* **admin**: From a permissions perspective, this is the "root" database. In early MongoDB versions, adding a user to this database automatically granted that user permissions across all databases. With role-based access control, a user defined in `admin` gets access to all databases only through roles such as `root`. Certain server-side administrative commands can only be run against this database, such as listing all databases or shutting down the server.
* **local**: This database is never replicated and can be used to store any collections restricted to a single local server.
* **config**: When MongoDB is used for sharding, the `config` database is internally used to store shard-related information.
---
## Documents
A document is a set of key-value pairs stored as BSON. MongoDB documents do not require a uniform schema, and the same field can even hold different data types in different documents. This is a major difference from relational databases and a defining feature of MongoDB.
Here's a simple example of a document:
```json
{"site":"www..com", "name":""}
```
The following table compares terms between RDBMS and MongoDB:
| RDBMS | MongoDB |
| --- | --- |
| Database | Database |
| Table | Collection |
| Row | Document |
| Column | Field |
| Table Join | Embedded Documents |
| Primary Key | Primary Key (MongoDB provides `_id` as the primary key) |

Database server and client:
| RDBMS | MongoDB |
| --- | --- |
| mysqld / Oracle | mongod |
| mysql / sqlplus | mongo (replaced by mongosh in current releases) |
Key points to note:
1. Key-value pairs in a document are ordered.
2. Values in a document can be strings enclosed in double quotes, as well as several other data types (even entire embedded documents).
3. MongoDB is both type-sensitive and case-sensitive: `{"count": 1}` and `{"count": "1"}` are different documents, and `count` and `Count` are different keys.
4. MongoDB documents cannot have duplicate keys.
5. Document keys are strings. With a few exceptions, keys can use any UTF-8 character.
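Points 1 and 3 matter whenever you match a whole embedded document. In that case MongoDB requires the same fields, in the same order, with equal values. The hypothetical `exactDocMatch` function below mimics that rule on plain JavaScript objects; it is not a MongoDB API. It assumes ordinary string keys, because JavaScript moves integer-like keys to the front of an object. It also cannot model MongoDB's separate numeric types, since every JavaScript number is a double.

```javascript
// Hypothetical helper, not a MongoDB API: compares two plain objects the way an
// exact embedded-document match does: same keys, same order, same values and types.
function exactDocMatch(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  for (let i = 0; i < keysA.length; i++) {
    if (keysA[i] !== keysB[i]) return false; // same key names (case-sensitive), same order
    const va = a[keysA[i]];
    const vb = b[keysB[i]];
    const bothObjects =
      va !== null && vb !== null && typeof va === "object" && typeof vb === "object";
    if (bothObjects) {
      if (Array.isArray(va) !== Array.isArray(vb)) return false; // array vs. embedded doc
      if (!exactDocMatch(va, vb)) return false; // compare nested values recursively
    } else if (va !== vb) {
      return false; // strict equality, so 1 and "1" differ
    }
  }
  return true;
}

console.log(exactDocMatch({ a: 1, b: 2 }, { b: 2, a: 1 })); // false: field order differs
```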
Document key naming conventions:
* Keys cannot contain `\0` (the null character), which marks the end of a key.
* Periods (`.`) and dollar signs (`$`) have special meanings and should only be used in specific contexts.
* Keys starting with an underscore (`_`) are conventionally reserved (as with `_id`), although this is not strictly enforced.
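These key naming conventions can be checked mechanically. The following hypothetical validator is not part of MongoDB or any driver. It walks a document, including embedded documents and arrays, and lists keys that contain a null character, start with `$`, or contain `.`. It deliberately does not flag leading underscores, because keys such as `_id` are legitimate. Newer MongoDB releases are more permissive about `$` and `.` in field names, so treat this as a conservative lint rather than the server's actual rules.

```javascript
// Hypothetical validator, not part of MongoDB: reports keys that break the
// naming conventions above. Paths use dot notation, e.g. "price.$value".
function findKeyProblems(doc, path = "") {
  const problems = [];
  for (const [key, value] of Object.entries(doc)) {
    const where = path === "" ? key : `${path}.${key}`;
    if (key.includes("\0")) problems.push(`${where}: contains a null character`);
    if (key.startsWith("$")) problems.push(`${where}: starts with "$"`);
    if (key.includes(".")) problems.push(`${where}: contains "."`);
    if (value !== null && typeof value === "object") {
      // Recurse into embedded documents and arrays (array keys are indexes like "0").
      problems.push(...findKeyProblems(value, where));
    }
  }
  return problems;
}

console.log(findKeyProblems({ name: "x", price: { $value: 5 } })); // reports price.$value
```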
---
## Collections
A collection is a group of MongoDB documents, similar to a table in an RDBMS (Relational Database Management System).
Collections exist within a database and have no fixed structure, meaning you can insert documents of varying formats and types into the same collection. In practice, though, the documents in one collection are usually related to each other.
For example, you can insert the following documents into a collection:
```json
{"site":"www.baidu.com"}
{"site":"www.google.com","name":"Google"}
{"site":"www..com","name":"","num":5}
```
When the first document is inserted, the collection is created.
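Two properties from this section can be modeled in memory. First, documents of different shapes can share a collection. Second, a collection comes into existence only when its first document is inserted. The `InMemoryDb` class below is a hypothetical illustration written in plain JavaScript, not MongoDB code. Reading from a missing collection returns nothing and does not create it.

```javascript
// Hypothetical in-memory model, not MongoDB code: schemaless collections that are
// created implicitly on the first insert.
class InMemoryDb {
  constructor() {
    this.collections = new Map(); // collection name -> array of documents
  }

  listCollections() {
    return [...this.collections.keys()];
  }

  find(name) {
    // Reading a missing collection returns no documents and does not create it.
    return this.collections.get(name) ?? [];
  }

  insertOne(name, doc) {
    if (!this.collections.has(name)) {
      this.collections.set(name, []); // implicit creation on first insert
    }
    this.collections.get(name).push(doc); // no schema check: any document shape is accepted
  }
}

const db = new InMemoryDb();
console.log(db.listCollections()); // []
db.insertOne("sites", { site: "www.baidu.com" });
db.insertOne("sites", { site: "www.google.com", name: "Google" });
console.log(db.listCollections()); // [ 'sites' ]
```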
### Valid Collection Names
* A collection name cannot be an empty string `""`.
* A collection name cannot contain `\0` (the null character), which marks the end of the name.
* A collection name cannot start with `system.`, as this prefix is reserved for system collections.
* User-created collection names should avoid reserved characters such as `$`. Some drivers do allow these characters in collection names because certain system-generated collections include them. Unless you are accessing such a system-created collection, do not use `$` in your collection names.
For example, the following shell command reads one document from the collection named `col` in the current database:
```javascript
db.col.findOne()
```
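The collection-name rules listed above can be expressed as a small check. `isValidCollectionName` below is a hypothetical plain-JavaScript helper, not part of MongoDB. It rejects empty names, null characters, the reserved `system.` prefix, and `$`. It accepts dots, which is why names such as GridFS's `fs.files` are fine.

```javascript
// Hypothetical helper, not part of MongoDB: applies the collection-name rules listed above.
function isValidCollectionName(name) {
  if (typeof name !== "string" || name === "") return false; // not empty
  if (name.includes("\0")) return false; // no null character
  if (name.startsWith("system.")) return false; // prefix reserved for system collections
  if (name.includes("$")) return false; // "$" is avoided in user-created collections
  return true; // other characters, including ".", are accepted by this sketch
}

console.log(isValidCollectionName("col")); // true
console.log(isValidCollectionName("system.users")); // false
```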
### Capped Collections
Capped collections are collections with a fixed size.
They offer high performance and age out data by insertion order: once the collection is full, the oldest documents are overwritten first, somewhat like an RRD (round-robin database).
Capped collections efficiently maintain the insertion order of objects. They are ideal for logging and similar use cases. Unlike regular collections, you must explicitly create a capped collection and specify its size in bytes; the storage space is allocated upfront.
Capped collections preserve documents in the order they were inserted, and their physical storage location on disk also follows this sequence. Therefore, when updating documents in a capped collection, the updated document cannot exceed the size of the original, ensuring all documents remain in their original positions on disk.
Because a capped collection places each new document according to insertion order rather than using an index to decide where it goes, inserts are efficient. MongoDB's operation log, the `oplog.rs` collection in the `local` database, is implemented as a capped collection.
Note that the specified storage size includes the database header information.
```javascript
// Create a capped collection limited to 100,000 bytes
db.createCollection("mycoll", {capped:true, size:100000})
```
* You can add new documents to a capped collection.
* Updates are allowed, but the total storage size cannot increase. If it does, the update fails.
* You cannot delete individual documents from a capped collection; to remove its documents, drop the entire collection with the `drop()` method.
* After dropping it, you must explicitly recreate the capped collection; otherwise the next insert creates an ordinary, uncapped collection.
* On 32-bit systems, the maximum storage capacity of a capped collection is 1 billion bytes (1 × 10⁹).
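The rules above can be illustrated with a hypothetical in-memory model in plain JavaScript. It covers a fixed size, insertion order, oldest-first eviction, no size-increasing updates, and no single-document deletes. This is not how MongoDB implements capped collections. A document's size is approximated by the UTF-8 length of its JSON text rather than its BSON size, and the "disk" is just an array.

```javascript
// Hypothetical in-memory model of a capped collection, not MongoDB's implementation.
// Assumption: a document's size is approximated by the UTF-8 length of its JSON text.
const sizeOf = (doc) => new TextEncoder().encode(JSON.stringify(doc)).length;

class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes; // fixed capacity chosen at creation time
    this.docs = []; // documents in insertion order, oldest first
    this.usedBytes = 0;
  }

  insert(doc) {
    const size = sizeOf(doc);
    if (size > this.maxBytes) throw new Error("document is larger than the collection");
    // Evict the oldest documents until the new one fits (round-robin behaviour).
    while (this.usedBytes + size > this.maxBytes) {
      this.usedBytes -= sizeOf(this.docs.shift());
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }

  update(index, newDoc) {
    const oldSize = sizeOf(this.docs[index]);
    const newSize = sizeOf(newDoc);
    // An update may not grow the document, so nothing has to move.
    if (newSize > oldSize) throw new Error("update would increase the document size");
    this.docs[index] = newDoc;
    this.usedBytes += newSize - oldSize;
  }

  deleteOne() {
    // Individual documents cannot be removed; only dropping the whole collection works.
    throw new Error("cannot delete individual documents from a capped collection");
  }
}

const log = new CappedCollection(21); // room for three 7-byte documents such as {"n":1}
for (let n = 1; n <= 4; n++) log.insert({ n });
console.log(log.docs.map((d) => d.n)); // [ 2, 3, 4 ]: the oldest document was evicted
```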
---
## Metadata
Database metadata is stored in specific collections using the system namespace:
`dbname.system.*`
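In MongoDB, namespaces of the form `dbname.system.*` are special collections that hold system-level information, as shown below. The table describes older MongoDB versions: with the WiredTiger storage engine, `system.namespaces` and `system.indexes` are not used (MongoDB 4.2 removed them), and since MongoDB 2.6 user accounts are stored in `admin.system.users`.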
| Namespace | Description |
| --- | --- |
| dbname.system.namespaces | Lists all namespaces. |
| dbname.system.indexes | Lists all indexes. |
| dbname.system.profile | Contains profile information about the database. |
| dbname.system.users | Lists all users who can access the database. |
| local.sources | Contains replication source information and status (legacy master/slave replication). |
There are restrictions on modifying objects within system collections:
You can insert documents into `system.indexes` to create indexes; otherwise this collection is read-only (the special `dropIndexes` command updates it automatically).
`system.users` is editable, while `system.profile` can be deleted.
---
## MongoDB Data Types
The following table lists commonly used data types in MongoDB:
| Data Type | Description |
| --- | --- |
| String | Strings are commonly used to store textual data. In MongoDB, only UTF-8 encoded strings are valid. |
| Integer | Integer values used to store numeric data. BSON has separate 32-bit and 64-bit integer types. |
| Boolean | Boolean values used to store true/false states. |
| Double | Double-precision floating-point numbers used to store decimal values. |
| Min/Max Keys | Special values that compare lower than (MinKey) or higher than (MaxKey) all other BSON values. |
| Array | Used to store arrays, lists, or multiple values under a single key. |
| Timestamp | BSON timestamp, used mainly internally by MongoDB (for example, in the oplog). To record when a document was created or modified, the Date type is normally used. |
| Object | Used for embedding documents. |
| Null | Used to create null values. |
| Symbol | Essentially equivalent to the string type, but distinct; intended for languages that have a dedicated symbol type. |
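To relate JavaScript values to the categories in this table, the hypothetical `describeType` function below sorts a value into one of them. The mapping is an illustrative assumption, not what any particular driver does. It labels integers in the 32-bit range "Integer (32-bit)", `BigInt` values "Integer (64-bit)", and every other number "Double". Real tools differ: the legacy `mongo` shell, for example, stores plain numbers as doubles unless you wrap them in `NumberInt()` or `NumberLong()`.

```javascript
// Hypothetical helper: sorts a JavaScript value into the categories of the table above.
// Illustrative assumption only; drivers and shells apply their own number-mapping rules.
const INT32_MIN = -(2 ** 31);
const INT32_MAX = 2 ** 31 - 1;

function describeType(value) {
  if (value === null) return "Null";
  if (typeof value === "string") return "String";
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "bigint") return "Integer (64-bit)";
  if (typeof value === "number") {
    // A plain JS number is a double; only whole numbers in the int32 range count as Integer here.
    if (Number.isInteger(value) && value >= INT32_MIN && value <= INT32_MAX) {
      return "Integer (32-bit)";
    }
    return "Double";
  }
  if (Array.isArray(value)) return "Array";
  if (typeof value === "object") return "Object"; // embedded document
  return "Unsupported";
}

console.log(describeType(42)); // Integer (32-bit)
console.log(describeType(3.14)); // Double
```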