As a Database Administrator (DBA), one of your primary responsibilities is to ensure the stability and performance of your database system. MongoDB, a leading NoSQL database, provides a powerful feature that can make your job easier and more efficient: Change Streams.
What are MongoDB Change Streams?
MongoDB Change Streams is a feature that allows you to monitor real-time changes to data in your MongoDB database. This feature provides a continuous stream of data change events, making it a valuable tool for DBAs, developers, and system administrators alike.
Previously, real-time updates were implemented by tailing the oplog, a special capped collection that keeps a rolling record of the operations that modify data in a MongoDB database. The oplog can be tailed with a cursor to track changes in real time. However, oplog tailing takes a lot of work to implement and maintain. Change streams offer clear advantages such as access control, and because they are built on the MongoDB aggregation framework, your application can apply filters, projections, or other supported stages in the pipeline. For example, if your application wants to listen only for insert notifications, you can use the $match stage of the aggregation pipeline:
change_stream = client.changestream.collection.watch([{
    '$match': {
        'operationType': { '$in': ['insert'] }
    }
}])
Some other stages that you can use are $project, $addFields, $replaceRoot, and $redact.
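As a minimal sketch of how these stages combine, the helper below builds a pipeline that keeps only selected operation types and trims each event down to a few fields. The name build_pipeline and its parameters are illustrative and not part of any driver API; only the stage documents themselves are MongoDB syntax.

```python
def build_pipeline(operation_types, keep_fields):
    """Build a change stream pipeline: filter by operation type, then project.

    Hypothetical helper for illustration only.
    """
    match_stage = {"$match": {"operationType": {"$in": list(operation_types)}}}
    # $project keeps _id by default; _id holds the resume token and should stay.
    project_stage = {"$project": {field: 1 for field in keep_fields}}
    return [match_stage, project_stage]


pipeline = build_pipeline(["insert", "update"], ["operationType", "fullDocument"])
# The resulting list would be passed to watch(), e.g. collection.watch(pipeline).
```

Building the pipeline as plain data keeps it easy to inspect and test before it is attached to a live stream.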
Why use MongoDB Change streams?
Change streams were introduced in MongoDB 3.6, and the feature has seen many improvements since then. The most popular uses of change streams are in applications built on event-driven architectures, real-time analytics, notifications, and feeding IoT data and data pipelines into other systems.
Change streams are push-based, i.e. change events are pushed to subscribers in real time. The most important advantage of change streams is that applications can consume a very large number of change events from MongoDB in real time. Also, because they follow a plug-and-play architecture, new use cases can be deployed and tested independently.
How do MongoDB Change Streams work?
You can subscribe to real-time data changes in your MongoDB database with the watch() function. Depending on the driver, an application iterates a cursor (as in Java or Python) or registers a callback (as in Node.js) to learn about change events happening in the database. For example, if you call watch() on the 'room_status' collection, as room_status.watch(), each change to the collection returns an event with details such as the type of operation performed (operationType) and, for updates, the fields that were affected (updatedFields).
# Open the stream
stream = db.room_status.watch()
for change in stream:
    # Perform some actions
    print(change)
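The loop above can be wrapped in a small function so that the stream is always closed and the processing logic is passed in. This is a sketch under assumptions: collection is expected to behave like a PyMongo Collection, and consume, handler, and max_events are hypothetical names introduced here.

```python
def consume(collection, handler, max_events=None):
    """Pass each change event from collection.watch() to handler.

    Returns the number of events handled. `max_events` is an illustrative
    parameter that lets the loop stop instead of waiting forever.
    """
    handled = 0
    # The change stream is a context manager, so the cursor is closed on exit.
    with collection.watch() as stream:
        for change in stream:
            handler(change)
            handled += 1
            if max_events is not None and handled >= max_events:
                break
    return handled
```

Against a live replica set this might be called as consume(db.room_status, print); without max_events the loop blocks while it waits for new events.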
Suppose your application inserts a new document indicating the availability of a new room. The change stream, which reads from the oplog, emits an event with the details of the insert:
{
  "_id": {
    "_data": "..."
  },
  "operationType": "insert",
  "clusterTime": {
    "$timestamp": {
      "t": 1652324617,
      "i": 1
    }
  },
  "wallTime": {
    "$date": "2023-10-30T17:06:23.409Z"
  },
  "fullDocument": {
    "_id": {
      "$oid": "<_id value of document>"
    },
    "test": 1
  },
  "ns": {
    "db": "hotels",
    "coll": "room_status"
  },
  "documentKey": {
    "_id": {
      "$oid": "<_id value of document>"
    }
  }
}
Note that in the above example the operationType is insert; it can also be replace, update, or delete, among others.
In the case of an insert and replace, we receive the complete document (fullDocument field).
In the case of an update, the updateDescription field carries the changed fields in updatedFields. If you also want to receive the full document after an update, set the fullDocument option to updateLookup:
const stream = room_status.watch([], { fullDocument: 'updateLookup' });
Also, for updates to more than one document, there will be a separate notification for each updated document.
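Because the shape of an event depends on its operationType, handlers usually branch on that field. The following sketch shows one way to do so; summarize_change is a hypothetical helper, while the field names (fullDocument, updateDescription, updatedFields, removedFields, documentKey) are those of the change event document.

```python
def summarize_change(change):
    """Return a one-line description of a change event document."""
    op = change["operationType"]
    key = change.get("documentKey", {}).get("_id")
    if op in ("insert", "replace"):
        # Insert and replace events carry the complete document.
        return f"{op} {key}: {change['fullDocument']}"
    if op == "update":
        desc = change["updateDescription"]
        updated = sorted(desc.get("updatedFields", {}))
        removed = sorted(desc.get("removedFields", []))
        return f"update {key}: set {updated}, unset {removed}"
    if op == "delete":
        # Delete events identify the document only by its key.
        return f"delete {key}"
    return f"{op} (no handler)"
```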
As long as you are within the oplog window, you can also read past change stream events. Once older oplog entries are overwritten by new ones, the old content is lost and those past events can no longer be read. MongoDB lets you configure the oplog by size or, since MongoDB 4.4, by a minimum retention period.
If events might be missed, for example during a network issue or application downtime, the change stream provides a resume token that lets you pick up from where you left off. Because the stream resumes after the event identified by the token, you also do not receive already-processed events again. The resume token is the _id field of the change stream event document.
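A minimal sketch of resuming, assuming a PyMongo-style collection whose watch() accepts resume_after: the last token is persisted after each event and handed back when the stream is reopened. The callbacks load_token and save_token are hypothetical; they could read and write a file or a separate collection.

```python
def watch_with_resume(collection, handler, load_token, save_token):
    """Process events, resuming after the last saved token if there is one."""
    token = load_token()
    # resume_after is omitted on the first run, when no token has been saved.
    kwargs = {"resume_after": token} if token is not None else {}
    with collection.watch(**kwargs) as stream:
        for change in stream:
            handler(change)
            # Save only after the handler succeeds, so a crash replays the
            # event instead of skipping it.
            save_token(change["_id"])
```

Saving the token after the handler runs is a design choice: if the application crashes between the two steps, the last event is handled again on restart, so the handler should tolerate a replay.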
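However, if an invalidate event closes the stream, you cannot resume it with resumeAfter; from MongoDB 4.2 you can instead open a new stream with the startAfter option and the token of the invalidate event. An invalidate event occurs when an operation renders the change stream invalid, for example dropping or renaming the watched collection, or dropping its database.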
{
  "_id": { <Resume Token> },
  "operationType": "invalidate",
  "clusterTime": <Timestamp>,  // timestamp of the oplog entry
  "wallTime": <ISODate>        // server date and time of the operation
}
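When an application reopens a stream, it has to decide how to pass the last token it saw. The sketch below assumes PyMongo's watch() keyword names, resume_after and start_after (the latter opens a new stream after an invalidate event on MongoDB 4.2 or later); restart_options itself is a hypothetical helper.

```python
def restart_options(last_event):
    """Choose watch() keyword arguments for reopening a stream.

    After an invalidate event the token is passed as start_after;
    for any other event resume_after is used.
    """
    if last_event is None:
        return {}
    token = last_event["_id"]
    if last_event["operationType"] == "invalidate":
        return {"start_after": token}
    return {"resume_after": token}
```

The result can be unpacked into the call, for example collection.watch(**restart_options(last_event)).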
Pre-images and post-images
Capturing the pre- and post-images of documents is a feature introduced in MongoDB 6.0 (2022). It is disabled by default, and you enable it per collection for the collections you want to watch. You can also set how long pre-images are retained with the cluster-level parameter changeStreamOptions.preAndPostImages.expireAfterSeconds. To enable pre- and post-images on a collection, use the collMod command:
db.runCommand({
  collMod: 'collection_name',
  changeStreamPreAndPostImages: { enabled: true }
})
Then, you can open the stream and include the pre-image:
cursor = db.watch([], {fullDocument: "required", fullDocumentBeforeChange: "required"})
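The same steps might look as follows from Python. This is a sketch, assuming a PyMongo-style Database whose command() accepts extra fields as keyword arguments; enable_images and changed_fields are hypothetical helpers. With both images present, an event can be reduced to a before/after comparison.

```python
def enable_images(db, collection_name):
    """Run collMod to turn on pre- and post-images for one collection."""
    return db.command(
        "collMod",
        collection_name,
        changeStreamPreAndPostImages={"enabled": True},
    )


def changed_fields(event):
    """Compare pre- and post-image and return {field: (before, after)}."""
    before = event.get("fullDocumentBeforeChange") or {}
    after = event.get("fullDocument") or {}
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }

# Assumed usage with a recent PyMongo release:
#   collection.watch(full_document="required",
#                    full_document_before_change="required")
```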
New events
MongoDB 6.0 (2022) also adds change events for schema-level operations, such as creating or dropping an index and creating, modifying, or sharding a collection (drop and rename events for collections are available without it). To receive the new events, open the stream with the showExpandedEvents option set to true.
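A minimal sketch of listening for these events, assuming a PyMongo release whose watch() accepts show_expanded_events and a MongoDB 6.0 or later deployment. The function name and the DDL_EVENTS set are illustrative; the set lists only a few of the expanded operation types.

```python
# A few of the operation types reported only when expanded events are enabled.
DDL_EVENTS = {"create", "createIndexes", "dropIndexes", "modify", "shardCollection"}


def watch_schema_changes(collection, handler):
    """Call handler(operation_type, namespace) for each expanded DDL event."""
    with collection.watch(show_expanded_events=True) as stream:
        for change in stream:
            if change["operationType"] in DDL_EVENTS:
                handler(change["operationType"], change["ns"])
```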
Key Benefits for DBAs
Real-Time Monitoring
With Change Streams, DBAs can stay informed about every data modification as it happens. This real-time visibility into database changes is crucial for maintaining data integrity and diagnosing issues promptly.
Automated Alerts
Change Streams enable you to set up automated alerts based on specific data changes. For example, you can configure alerts to trigger when certain documents are inserted, updated, or deleted. This proactive monitoring helps DBAs respond quickly to potential problems.
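One way to express such an alert rule is as a $match stage, so that only relevant events ever reach the alerting code. The sketch below is hypothetical: alert_pipeline is an illustrative helper, and it assumes the watched fields are top-level fields of the document.

```python
def alert_pipeline(watched_fields):
    """Match deletes, plus updates that touch any of the watched fields."""
    if not watched_fields:
        raise ValueError("at least one field is required")
    field_checks = [
        {f"updateDescription.updatedFields.{name}": {"$exists": True}}
        for name in watched_fields
    ]
    return [{
        "$match": {
            "$or": [
                {"operationType": "delete"},
                {"operationType": "update", "$or": field_checks},
            ]
        }
    }]
```

The returned pipeline would be passed to watch(), and every event that arrives can then be forwarded to whatever notification channel is in use.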
Streamlined Debugging
When issues arise, debugging is often a time-consuming task. Change Streams simplify the process by providing a detailed record of data changes as they occur. If the events are stored, for example in a separate collection, this historical context is invaluable for identifying the root cause of problems and for deciding which changes to revert.
Replication and Synchronization
In a distributed database environment, such as a MongoDB replica set or sharded cluster, Change Streams can be used to monitor data synchronization and replication. DBAs can ensure that data consistency is maintained across all nodes.
Integration with External Systems
MongoDB Change Streams can be integrated with external systems, such as message queues or data warehouses, to trigger actions or perform analytics based on data changes. This flexibility enables DBAs to extend their monitoring and management capabilities.
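As a rough sketch of such an integration, the function below serializes events and hands them to a queue. The standard-library queue.Queue stands in for a real message broker, and forward_events and limit are hypothetical names introduced for illustration.

```python
import itertools
import json
import queue


def forward_events(stream, outbox, limit):
    """Serialize up to `limit` events from `stream` onto `outbox`."""
    sent = 0
    # islice stops after `limit` items without pulling an extra event.
    for change in itertools.islice(stream, limit):
        # default=str is a crude fallback for BSON types such as ObjectId
        # and datetime; bson.json_util would preserve them more faithfully.
        outbox.put(json.dumps(change, default=str, sort_keys=True))
        sent += 1
    return sent


outbox = queue.Queue()
events = iter([{"operationType": "insert", "x": 1}, {"operationType": "delete"}])
forward_events(events, outbox, limit=1)
```

In a real deployment the stream argument would be the object returned by watch(), and outbox.put would be replaced by the producer call of the chosen messaging system.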
Practical Use Cases
Here are a few practical use cases where MongoDB Change Streams can benefit DBAs:
Audit Logging: Implementing an audit log that tracks all changes to sensitive data for compliance and security purposes.
Performance Optimization: Identifying and addressing performance bottlenecks by analyzing real-time write patterns and data changes.
Data Replication: Monitoring data replication across multiple MongoDB instances to ensure data consistency and high availability.
Error Detection: Detecting and responding to errors or anomalies in data processing pipelines by alerting DBAs to unexpected changes.
Data Validation: Ensuring data quality by validating incoming data against predefined criteria and taking corrective actions when necessary.
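For the audit logging use case, each change event can be reduced to a compact record before it is stored. The sketch below is illustrative: audit_record and its output field names are assumptions, and it captures what changed rather than who changed it.

```python
from datetime import datetime, timezone


def audit_record(change, recorded_at=None):
    """Turn a change event into a flat audit entry."""
    ns = change["ns"]
    return {
        "namespace": f"{ns['db']}.{ns['coll']}",
        "operation": change["operationType"],
        "document_id": change.get("documentKey", {}).get("_id"),
        # Present only for update events; None otherwise.
        "changes": change.get("updateDescription", {}).get("updatedFields"),
        "recorded_at": recorded_at or datetime.now(timezone.utc),
    }
```

The resulting dictionaries could be inserted into a dedicated audit collection or shipped to an external log store.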
Conclusion
MongoDB Change Streams empower DBAs with real-time visibility into data changes, automated alerts, and streamlined debugging capabilities. By harnessing this feature, DBAs can proactively manage their MongoDB databases, enhance data security, and optimize performance.
In a world where data is constantly evolving, MongoDB Change Streams empower DBAs to stay ahead of the curve and maintain a robust and reliable database infrastructure.