Serverless CouchDB/Cloudant attachment de-attacher
If you are using Cloudant or CouchDB and occasionally storing binary attachments inside documents, then detacher may be for you. It is a serverless function that runs in IBM Cloud Functions (based on Apache OpenWhisk) and is invoked whenever a Cloudant document changes. If the document contains attachments, those attachments are copied into Cloud Object Storage or AWS S3 and removed from the document.
This allows the Cloudant database to remain free of binary attachments with no loss of data.
Here is a typical document before processing:
{
  "_id": "7",
  "_rev": "2-920d8da7eb1a1175fcbc10cf6f989d99",
  "first_name": "Glynn",
  "last_name": "Bird",
  "job": "Developer Advocate @ IBM",
  "twitter": "@glynn_bird",
  "_attachments": {
    "headshot.jpg": {
      "content_type": "image/jpeg",
      "revpos": 2,
      "digest": "md5-N0JXExRZxZaOD3sszjMXzA==",
      "length": 46998,
      "stub": true
    }
  }
}
CouchDB/Cloudant stores attached files in an object called _attachments. After processing by detacher, the document is modified to look like this:
{
  "_id": "7",
  "_rev": "3-c3272191e6e94d3bd2a3d72145c7d4fd",
  "first_name": "Glynn",
  "last_name": "Bird",
  "job": "Developer Advocate @ IBM",
  "twitter": "@glynn_bird",
  "attachments": {
    "headshot.jpg": {
      "content_type": "image/jpeg",
      "revpos": 2,
      "digest": "md5-N0JXExRZxZaOD3sszjMXzA==",
      "length": 46998,
      "stub": true,
      "Location": "https://detacher.s3.eu-west-2.amazonaws.com/7-headshot.jpg",
      "Key": "7-headshot.jpg"
    }
  }
}
Notice that the _attachments key is no longer there: Cloudant is not storing the attachment anymore. In its place is attachments (without the underscore), which contains the same data but with an extra Location and Key, which record where in your Object Storage the file is stored.
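The before and after documents above suggest a simple transformation, which can be sketched in Python as follows. This is an illustration only, not detacher's actual code: the function name detach is hypothetical, and the Key format (document id, a hyphen, then the filename) and the Location URL pattern are inferred from the single example above rather than taken from the implementation. The _rev value is left alone here because the database assigns a new one when the modified document is written back.

```python
import copy


def detach(doc, bucket, region):
    """Return a copy of doc with _attachments moved to attachments.

    Hypothetical helper: the Key and Location formats are inferred from
    the README example, not from detacher's source.
    """
    out = copy.deepcopy(doc)
    stubs = out.pop("_attachments", None)
    if not stubs:
        return out  # nothing to detach

    out["attachments"] = {}
    for name, meta in stubs.items():
        key = f"{doc['_id']}-{name}"  # assumed naming: "<_id>-<filename>"
        location = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
        # Keep the original stub metadata and add the two new fields.
        out["attachments"][name] = dict(meta, Location=location, Key=key)
    return out
```

Calling detach on the first document above with the bucket "detacher" and the region "eu-west-2" yields the Key and Location values shown in the second document. Copying and uploading the binary data itself is not covered by this sketch.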
You need:
Ensure you have a new “bucket” in your Object Storage service and a new database in your Cloudant service.
Set up environment variables containing the credentials of your Cloudant service and Object Storage service:
export CLOUDANT_HOST="myhost.cloudant.com"
export CLOUDANT_USERNAME="myusername"
export CLOUDANT_PASSWORD="mypassword"
export CLOUDANT_DATABASE="mydatabase"
export AWS_ACCESS_KEY_ID="ABC123"
export AWS_SECRET_ACCESS_KEY="XYZ987"
export AWS_BUCKET="mybucket"
export AWS_REGION="eu-west-2"
export AWS_ENDPOINT="https://s3.eu-west-2.amazonaws.com"
If you are using Amazon S3, you can omit the AWS_ENDPOINT environment variable. For the IBM Cloud Object Storage service, the endpoints are listed in the IBM Cloud Object Storage documentation.
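Before deploying, it can help to confirm that every variable is set. The following is a minimal sketch of such a pre-flight check; it is not part of detacher, and the function name missing_settings is hypothetical. It treats AWS_ENDPOINT as optional, as described above.

```python
import os

# Variables named in this README; AWS_ENDPOINT is optional for Amazon S3.
REQUIRED = [
    "CLOUDANT_HOST",
    "CLOUDANT_USERNAME",
    "CLOUDANT_PASSWORD",
    "CLOUDANT_DATABASE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_BUCKET",
    "AWS_REGION",
]


def missing_settings(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


# Report on the current shell environment.
missing = missing_settings(os.environ)
print("Missing:", ", ".join(missing) or "none")
```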
Then run the deploy.sh script:
./deploy.sh
You can now add a document to your database and add an attachment to it. In a few moments the document will be updated: it will no longer contain attachments, only references to those files in your object storage.