Queryable Encryption in MongoDB

Sep, 2023

Queryable Encryption is a new feature added by MongoDB in April 2023. Although still in its testing phases, Queryable Encryption is set to redefine how we secure the most confidential data, particularly when that data is read, as the name “queryable” suggests.

Data security in MongoDB

MongoDB is secure by default. When you create a new cluster, the authentication and authorization features are already in place, and you do not have to write security-specific code. Your application database needs security at three levels: at rest, in transit, and in use. MongoDB provides encryption at all three levels.

In-transit encryption

MongoDB supports in-transit encryption through Transport Layer Security (TLS), which is enabled by default. This covers data sent to MongoDB clusters as well as data exchanged between the nodes of a cluster.

At-rest encryption

Data stored on disk is protected at the database level by the encrypted storage engine of WiredTiger, which uses AES256-CBC (256-bit Advanced Encryption Standard in Cipher Block Chaining mode) by default. The encrypted storage engine performs its cryptographic operations using the cryptography provider of the underlying operating system. MongoDB also provides application-level encryption, which encrypts data within the application layer, per field or per document.

In-use encryption

MongoDB can encrypt data throughout its lifecycle: on the client before it is sent to the database, and on the way back, where it is decrypted only after it reaches the client. This is done with customer-controlled encryption keys. In-use encryption is particularly useful for highly sensitive data. Developers do not have to write the encryption code in their application, and it helps meet strict data privacy regulations such as CCPA and HIPAA. There are two options for in-use encryption:

Client-side Field Level Encryption (CSFLE) - This feature enables the application to encrypt data before storing it in MongoDB. The data remains encrypted throughout its lifecycle and can be decrypted only on the client side. For fields that must be queried, CSFLE uses deterministic encryption, where a given input always produces the same encrypted output.
Queryable Encryption - A new feature released as a preview with MongoDB 6.0, which lets you run expressive queries on encrypted data without sacrificing performance. Queryable Encryption uses structured encryption schemes.
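
The difference between deterministic and randomized encryption can be sketched with Node's built-in crypto module. This is only an illustration under stated assumptions: the function names and the two locally generated keys are hypothetical, and the construction (AES-256-GCM with an IV derived from the plaintext) is a toy, not the algorithm that CSFLE or Queryable Encryption actually uses.

```javascript
const crypto = require("node:crypto");

// Hypothetical keys generated locally; real deployments fetch keys from a KMS.
const encKey = crypto.randomBytes(32); // encrypts the value
const macKey = crypto.randomBytes(32); // derives the deterministic IV

// Shared helper: AES-256-GCM, output is base64(iv | authTag | ciphertext).
function encryptWithIv(plaintext, iv) {
  const cipher = crypto.createCipheriv("aes-256-gcm", encKey, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Deterministic (toy): the IV is an HMAC of the plaintext, so the same
// input always yields the same ciphertext and can be matched for equality.
function encryptDeterministic(plaintext) {
  const iv = crypto.createHmac("sha256", macKey).update(plaintext).digest().subarray(0, 12);
  return encryptWithIv(plaintext, iv);
}

// Randomized: a fresh random IV per call, so equal inputs give different ciphertexts.
function encryptRandomized(plaintext) {
  return encryptWithIv(plaintext, crypto.randomBytes(12));
}

// Both variants decrypt the same way, because the IV travels with the ciphertext.
function decrypt(encoded) {
  const raw = Buffer.from(encoded, "base64");
  const decipher = crypto.createDecipheriv("aes-256-gcm", encKey, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}
```

Deterministic output makes equality matching on ciphertext possible, but it also reveals which stored values are equal. Randomized output hides that, which is why a scheme built on it needs a separate mechanism to stay queryable.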

Need for multiple security mechanisms

Although data is encrypted at multiple levels, it can also leak at multiple levels: on the client side, during HTTP requests and responses, from files, over point-to-point connections and more. Attackers keep looking for weak points during transactions, so you may need to secure some parts of your data more than others. While data is in use, it is more vulnerable to insider access and database compromise, for example through authorized but compromised admins, DBAs and users, RAM scraping, and process inspection.

Queryable Encryption

Queryable Encryption (QE) is an in-use encryption method where sensitive data is encrypted at all times, whether it is at rest, in use, in transit, in logs, on the server, in the database, in backups or anywhere else in the network. It prevents:

database privilege abuse, for example, stolen credentials or violation of privilege,
network snooping, like packet sniffing or IP/DNS spoofing,
data theft due to file exposure or loss of data,
access to the memory of the database host, as in RAM scraping or memory dump analysis.

Queryable Encryption allows applications to query the database as they would with a regular query and get the expected results. However, only encrypted tokens are visible throughout the transaction: in logs, in backups, and even in the query itself. Only the person or client application holding the end-to-end encryption key can access the sensitive data. QE is well suited to querying highly confidential data such as PII or billing information.

As the diagram above shows, the MongoDB database does not hold the key for the encrypted data, yet it supports querying that data and returns the result in an encrypted format. QE uses fast, searchable encryption schemes based on structured encryption. These encrypt data in such a way that it can be queried privately, and they produce different encrypted output values for the same cleartext input. For example, for a query q and a key k, the encrypted data can be queried using the token qtk. The response can be plain text or encrypted. There is no scope for data leakage (beyond the defined leakage), even through eavesdropping, because the encrypted fields, including those in the queries, are transmitted, stored, processed and retrieved as ciphertext.

CSFLE works well for inserts and for equality matches, but it does not support range, prefix, suffix or substring queries. As of MongoDB 7.0, QE supports equality queries, with range, prefix, suffix and substring queries planned for later releases.

How does Queryable Encryption work?

Queryable Encryption must be specified when the collection is created. To specify which fields are encrypted and how they can be queried, you define a JSON schema with properties such as path, keyId, bsonType and queryType for each field that you want encrypted automatically. For example:

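// Fields to encrypt automatically, and how each one may be queried
const encryptedFieldsObject = {
   fields: [
      {
         path: "personal_id_number",
         keyId: "<unique data encryption key>",
         bsonType: "int",
         queries: { queryType: "equality" }
      }
   ]
};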

You can then pass the fields object to the client application that creates the collection.

const { MongoClient } = require("mongodb");

const client = new MongoClient(uri, {
   autoEncryption: {
      keyVaultNamespace: "<your keyvault namespace>",
      // placeholder: the driver expects an object keyed by provider name here
      kmsProviders: "<your kms provider>",
      extraOptions: {
         cryptSharedLibPath: "<path to FLE Shared Library>"
      },
      // maps "<database>.<collection>" to the fields object defined earlier
      encryptedFieldsMap: {
         "<databaseName.collectionName>": encryptedFieldsObject
      }
   }
});

...

await client.db("<database name>").createCollection("<collection name>");

The code above is adapted from the MongoDB documentation.

Queryable Encryption uses a dynamic, volume-hiding Encrypted Multi-Map (EMM) scheme that stores pairs of labels and values. The labels are pseudorandom function evaluations, such as keyed HMACs (Hash-based Message Authentication Codes). Because the labels are cryptographic hashes, EMM look-ups are fast and efficient.
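
The label and value idea can be sketched as a toy multi-map. Everything here is an assumption made for illustration: the class name ToyEncryptedMultiMap, the counter-based label derivation and the use of AES-256-GCM are hypothetical, the sketch does no volume hiding, and it is not MongoDB's actual construction.

```javascript
const crypto = require("node:crypto");

// Toy encrypted multi-map: illustrates labels and values only.
class ToyEncryptedMultiMap {
  constructor(labelKey, valueKey) {
    this.labelKey = labelKey;  // client secret used to derive labels
    this.valueKey = valueKey;  // client secret used to encrypt values
    this.store = new Map();    // what a server would hold: label -> ciphertext
    this.counters = new Map(); // client state: keyword -> number of stored values
  }

  // Label = HMAC(labelKey, keyword:counter), a pseudorandom function evaluation.
  label(keyword, i) {
    return crypto.createHmac("sha256", this.labelKey).update(`${keyword}:${i}`).digest("hex");
  }

  // Store one more value under the keyword, at the next unused label.
  put(keyword, value) {
    const i = this.counters.get(keyword) ?? 0;
    const iv = crypto.randomBytes(12);
    const cipher = crypto.createCipheriv("aes-256-gcm", this.valueKey, iv);
    const body = Buffer.concat([cipher.update(value, "utf8"), cipher.final()]);
    this.store.set(this.label(keyword, i), Buffer.concat([iv, cipher.getAuthTag(), body]));
    this.counters.set(keyword, i + 1);
  }

  // Recompute labels 0, 1, 2, ... until one is missing; each look-up is a hash-map hit.
  get(keyword) {
    const results = [];
    for (let i = 0; this.store.has(this.label(keyword, i)); i++) {
      const raw = this.store.get(this.label(keyword, i));
      const decipher = crypto.createDecipheriv("aes-256-gcm", this.valueKey, raw.subarray(0, 12));
      decipher.setAuthTag(raw.subarray(12, 28));
      results.push(Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8"));
    }
    return results;
  }
}
```

For example, after `emm.put("pid:1001", "doc-a")` and `emm.put("pid:1001", "doc-b")`, a call to `emm.get("pid:1001")` recovers both values, while the stored map contains only HMAC labels and ciphertext.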

Let us take an example of a query that needs to find details of a user using their personal identification number (pid).

Here is what happens when the find query is fired for an encrypted field:

The query goes to the MongoDB driver, which identifies whether the field is encrypted and gets the encryption key from a customer-provisioned key provider such as Azure Key Vault, Google Cloud KMS, AWS Key Management Service (KMS), or any other KMIP-compliant key provider.
The driver then submits the query to the MongoDB data platform with the encrypted fields as ciphertext. Throughout the process, only the token is visible, never the actual field value.
Queryable Encryption uses the EMM to retrieve the data without learning anything about its contents. Both the data and the query remain encrypted.
The encrypted result is sent to the driver, which uses the key to decrypt it and passes it as plaintext (JSON) to the authenticated client application.
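
The four steps above can be simulated in a few lines. This is a sketch under loud assumptions: ToyDriver and ToyServer are hypothetical stand-ins, not the MongoDB driver or server, the keys are passed in directly instead of being fetched from a KMS, and the token is a plain deterministic HMAC, which is simpler and leaks more than the scheme QE uses.

```javascript
const crypto = require("node:crypto");

// Hypothetical stand-in for the database: stores and matches opaque strings only.
class ToyServer {
  constructor() { this.docs = []; }
  insert(doc) { this.docs.push(doc); }
  find(token) { return this.docs.filter((d) => d.pidToken === token); }
}

// Hypothetical stand-in for the driver: the only component that holds keys.
class ToyDriver {
  constructor(server, tokenKey, dataKey) {
    this.server = server;
    this.tokenKey = tokenKey; // step 1: in practice obtained from the key provider
    this.dataKey = dataKey;
  }

  // Query token: a keyed hash of the field value.
  token(pid) {
    return crypto.createHmac("sha256", this.tokenKey).update(String(pid)).digest("hex");
  }

  encrypt(text) {
    const iv = crypto.randomBytes(12);
    const cipher = crypto.createCipheriv("aes-256-gcm", this.dataKey, iv);
    const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
    return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
  }

  decrypt(encoded) {
    const raw = Buffer.from(encoded, "base64");
    const decipher = crypto.createDecipheriv("aes-256-gcm", this.dataKey, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
  }

  // The server receives only a token and a ciphertext payload.
  insert(user) {
    this.server.insert({ pidToken: this.token(user.pid), payload: this.encrypt(JSON.stringify(user)) });
  }

  // Steps 2 to 4: send the token, let the server match it, decrypt the result locally.
  findByPid(pid) {
    return this.server.find(this.token(pid)).map((d) => JSON.parse(this.decrypt(d.payload)));
  }
}
```

In this sketch the server can answer the query even though neither the pid nor the rest of the document ever reaches it in plaintext.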

MongoDB drivers

MongoDB supports Queryable Encryption on applications written in all major languages like Java, Python, Node.js, C#, C++, C, Ruby, Rust, PHP and Go, using the respective drivers.

Conclusion

Queryable Encryption is a groundbreaking innovation from MongoDB's cryptography research team and can prove extremely useful for the parts of your data that are more sensitive than the rest. It can prevent intentional as well as unintentional data leaks caused by compromises, security breaches and other losses, including those from frequency analysis attacks. This is because the database never needs to see the plaintext value of the field. For example, if you want to find a fraudulent card, you can supply the last few digits of the credit card and ask for the entire record, while the database learns nothing about it. The information is accessible only to the person who holds the encryption key. That’s the power of QE.