Using IBM Cloudant

If you have never used IBM Cloudant or NoSQL databases in general, read this introduction and its best practices before you go further. It describes the most important things you need to know about IBM Cloudant and how to use it well. The rest of the documentation assumes that you know these basics.

You can find more information about IBM Cloudant in the following sections:

Connecting to IBM Cloudant

To access IBM Cloudant, you must have an IBM Cloud® account.

HTTP API

All requests to IBM Cloudant go over HTTP, so any system that can speak to the web can speak to IBM Cloudant. The language-specific libraries for IBM Cloudant are wrappers that add convenience and idiomatic niceties on top of a simple API. Many users choose to work with IBM Cloudant through raw HTTP libraries instead.

For more information about how IBM Cloudant uses HTTP, see HTTP in the API reference.

IBM Cloudant supports the following HTTP request methods:

GET
Request the specified item. As with normal HTTP requests, the format of the URL defines what is returned. With IBM Cloudant, this definition can include static items, database documents, and configuration and statistical information. In most cases, the information is returned in the form of a JSON document.
HEAD
The HEAD method retrieves the HTTP header of a GET request without the body of the response.
POST
Upload data. In IBM Cloudant's API, the POST method sets values, uploads documents, sets document values, and starts some administration commands.
PUT
Store a specific resource. In IBM Cloudant's API, PUT creates new objects, including databases, documents, views, and design documents.
DELETE
Deletes the specified resource, including documents, views, and design documents.
COPY
A special method that copies documents and objects.

If the client (such as some web browsers) doesn't support all of these HTTP methods, use POST instead and set the X-HTTP-Method-Override request header to the actual HTTP method.
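The override can be sketched with Python's standard library. This is a minimal illustration, not an official client: the helper name build_override_request and the example URL are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_override_request(url, method, body=None):
    """Wrap an arbitrary HTTP method in a POST for clients limited to GET and POST."""
    data = json.dumps(body).encode("utf-8") if body is not None else b""
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", "application/json")
    # The real method travels in the override header described above.
    req.add_header("X-HTTP-Method-Override", method.upper())
    return req


# Hypothetical host, database, and document names, for illustration only.
req = build_override_request(
    "https://example.invalid/mydb/mydoc", "put", {"name": "Bob"}
)
```

Passing req to urllib.request.urlopen would send it. Authentication is omitted from this sketch.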

Method not allowed error

If you send a request with an HTTP method that the URL doesn't support, a 405 error is returned. The error lists the supported HTTP methods, as shown in the following example.

Example error message in response to an unsupported request

{
    "error":"method_not_allowed",
    "reason":"Only GET,HEAD allowed"
}
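A client can pull the permitted methods out of such a response. The following sketch assumes the reason string always follows the "Only A,B allowed" shape from the example above, which might not hold for every endpoint. The function name allowed_methods is hypothetical.

```python
import json
import re


def allowed_methods(status, body):
    """Return the methods named in a 405 error body, or [] if none can be found."""
    if status != 405:
        return []
    payload = json.loads(body)
    if payload.get("error") != "method_not_allowed":
        return []
    # Assumption: the reason looks like "Only GET,HEAD allowed".
    match = re.search(r"Only\s+([A-Z,\s]+?)\s+allowed", payload.get("reason", ""))
    if not match:
        return []
    return [m.strip() for m in match.group(1).split(",") if m.strip()]


body = '{"error":"method_not_allowed","reason":"Only GET,HEAD allowed"}'
methods = allowed_methods(405, body)
```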

JSON

IBM Cloudant stores documents that use JSON (JavaScript Object Notation) encoding, so anything encoded into JSON can be stored as a document. Files that include media, such as images, videos, and audio, are called BLOBs (Binary Large Objects). BLOBs can be stored as attachments associated with documents.

More information about JSON can be found in the JSON Guide.
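As an illustration, the following sketch builds a JSON document that carries a small binary attachment inline. It assumes the CouchDB-style _attachments layout, with a content_type and base64-encoded data for each file. Check the attachments reference before you rely on that layout. The helper name and the sample bytes are hypothetical.

```python
import base64
import json


def document_with_attachment(doc, filename, content_type, raw_bytes):
    """Return a copy of doc with raw_bytes attached inline as base64 text."""
    result = dict(doc)
    attachments = dict(result.get("_attachments", {}))
    attachments[filename] = {
        "content_type": content_type,
        # JSON cannot carry raw bytes, so the binary data is base64-encoded.
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }
    result["_attachments"] = attachments
    return result


doc = document_with_attachment(
    {"_id": "bob.smith@gmail.com", "type": "user"},
    "avatar.png",
    "image/png",
    b"\x89PNG placeholder bytes",  # stand-in for real image data
)
payload = json.dumps(doc)
```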

Distributed systems

IBM Cloudant's API lets you interact with a group of cooperating machines, called a cluster. The machines in a cluster must be in the same datacenter, but can be in different "pods" within that datacenter. Using different pods helps improve the high availability characteristics of IBM Cloudant.

An advantage of clustering is that when you need more computing capacity, you add more machines. This method is often more cost-effective and fault-tolerant than scaling up or enhancing an existing single machine.

For more information about IBM Cloudant and distributed system concepts, see the CAP Theorem guide.

Replication

Replication is a procedure followed by IBM Cloudant, CouchDB, PouchDB, and other distributed databases. Replication synchronizes the state of two databases so that their contents are identical.

You can replicate continuously. Continuous replication means that a target database updates every time the source database changes. Continuous replication can be used for backups of data, aggregating data across many databases, or for sharing data.

However, continuous replication means continuously checking the source database for changes. These checks require ongoing internal calls, which might affect performance and, for multi-tenant users of IBM Cloudant systems, the cost of using the database. Continuous replication is disabled by default.
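A replication job is usually described by a small JSON document. The sketch below assumes the CouchDB-style replication document, with source, target, and an optional continuous flag, that is written to the _replicator database. The URLs and the helper name are hypothetical, and credentials are left out.

```python
import json


def replication_document(source_url, target_url, continuous=False):
    """Build a replication document; continuous stays off unless requested."""
    doc = {"source": source_url, "target": target_url}
    if continuous:
        # Opt in explicitly, because continuous replication adds ongoing calls.
        doc["continuous"] = True
    return doc


one_off = replication_document(
    "https://example.invalid/orders", "https://backup.example.invalid/orders"
)
ongoing = replication_document(
    "https://example.invalid/orders",
    "https://backup.example.invalid/orders",
    continuous=True,
)
body = json.dumps(ongoing)
```

Leaving continuous off by default mirrors the default behavior that is described above.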

Using the proper tool for the job

IBM Cloudant is a scalable, durable, highly available, operational JSON document store with an HTTP API. It's suitable for the following purposes:

  • Powering your always-on web application.
  • Being the server-side data store for mobile applications.
  • Storing time-series data in time-boxed databases before you archive to object storage and delete the original.
  • Storing application objects as JSON while queries are delivered from secondary indexes.
  • Replicating data sets across geographies for disaster recovery, extra capacity, or moving data nearer to your users.

IBM Cloudant doesn't include the following features:

For more information, see the Best and worst practice blog.

Organizing documents and databases

IBM Cloudant data is organized in a hierarchy of databases and documents. A document is a JSON object with a unique identifier: its _id. A database is a collection of documents with a primary index that allows documents to be retrieved by _id. It also has optional secondary indexes that allow documents to be queried by other attributes in the object.

When developers start a project, they sometimes struggle with the following questions:

  • How much data can I put into a single object?
  • Should I store different document types in the same database, or use one database per document type?

A document should include all the data about an object that your application models, for example, a user, an order, or a product. This practice lets you fetch the entire object from the database in one API call. IBM Cloudant doesn't have joins like a relational database, so data is typically not normalized and can repeat across objects. For example, an order document can include a subset of the data from the product documents that were purchased.

It's common to store several object types in the same database: a convention is that a type attribute is used to denote the object type. This option is a good one if you need to perform queries that return several object types or if a database needs to be replicated to another location altogether. Otherwise, separate databases, for example, users, orders, products, might be better so that secondary indexes are specific to each object type.
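The following sketch shows what these conventions might look like in practice. The field names other than _id and type (for example, items, price_cents, and quantity) are illustrative assumptions, not a required schema.

```python
# A product document and an order document that live in the same database.
product = {
    "_id": "product:9780241265543",
    "type": "product",
    "title": "Example Book",
    "price_cents": 999,
    "description": "A long description that the order does not need.",
}

order = {
    "_id": "order:1001",
    "type": "order",
    "customer": "bob.smith@gmail.com",
    # Only the product fields that the order needs are copied in.
    "items": [
        {
            "product_id": product["_id"],
            "title": product["title"],
            "price_cents": product["price_cents"],
            "quantity": 2,
        }
    ],
}


def by_type(docs, doc_type):
    """Select documents of one type, using the type attribute convention."""
    return [d for d in docs if d.get("type") == doc_type]


def order_total_cents(order_doc):
    """Total an order from its embedded data, without fetching any products."""
    return sum(i["price_cents"] * i["quantity"] for i in order_doc["items"])
```

Because the order repeats the product title and price, it can be displayed and totaled after a single fetch.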

If you're storing arrays of objects within a document, consider whether the array items should really be documents of their own. For example, a product and each of its reviews should be stored in separate documents. Likewise, a user and each of that user's orders should have their own documents.

If you have an ever-growing data set, then you probably don't want to store data in a single, ever-growing database. Data is best stored in time-boxed databases that allow older data to be archived and deleted cleanly. Deleting an IBM Cloudant document leaves a tombstone document behind, so don't rely on deleting documents to recover disk space. Instead, you must rely on deleting whole databases.
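One way to time-box data is to put the month into the database name. The naming scheme below (prefix_YYYY_MM) and both helper names are assumptions made for illustration. Any scheme that sorts chronologically works the same way.

```python
from datetime import date


def monthly_database_name(prefix, day):
    """Name of the time-boxed database that holds data for the month of day."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def databases_to_archive(names, prefix, keep_from):
    """Databases for this prefix that are older than the month of keep_from."""
    cutoff = monthly_database_name(prefix, keep_from)
    # Zero-padded names sort chronologically, so a string comparison is enough.
    return sorted(n for n in names if n.startswith(prefix + "_") and n < cutoff)


current = monthly_database_name("events", date(2024, 5, 17))
old = databases_to_archive(
    ["events_2024_03", "events_2024_05", "events_2023_12", "users"],
    "events",
    date(2024, 4, 1),
)
```

Each database in the returned list can be archived and then deleted as a whole, which avoids leaving tombstones behind.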

JSON doesn't offer a native way to store dates or timestamps. Choose your date format carefully if you intend to query it later.
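Two common choices are an ISO 8601 string in UTC and an integer count of milliseconds since the epoch. Both sort in chronological order, which helps with range queries. The sketch below shows one way to produce each. The helper names are hypothetical, and the functions expect timezone-aware datetime values.

```python
from datetime import datetime, timezone


def to_iso_utc(moment):
    """Format an aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_epoch_millis(moment):
    """Convert an aware datetime to whole milliseconds since 1970-01-01 UTC."""
    return int(moment.timestamp() * 1000)


earlier = datetime(2024, 5, 17, 9, 30, 0, tzinfo=timezone.utc)
later = datetime(2024, 11, 2, 18, 0, 0, tzinfo=timezone.utc)

# Fixed-width ISO strings compare in the same order as the times they encode.
strings_sort_by_time = to_iso_utc(earlier) < to_iso_utc(later)
```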

The maximum document size is 1 MB, but documents should be much smaller than that, typically a few KB.

For more information, see the following blog posts:

Making the most of the primary index

IBM Cloudant has a primary index on the document's _id attribute. This index allows documents to be retrieved by _id (GET /db/id) or a range of _ids (GET /db/_all_docs?startkey="a"&endkey="z"). By storing data in the primary key and ensuring that each _id is unique, the primary index can be used to fetch documents and ranges of documents without secondary indexing. See the following list of ideas:

  • If you have something unique in your object that would be useful to query against, use it as your _id field, for example, bob.smith@gmail.com, isbn9780241265543, or oakland,ca.
  • If your objects contain a hierarchy, model that in your _id: usa:ca:oakland or books:fiction:9780241265543. The hierarchy goes from largest to smallest, so you can use the primary index to find all the cities in usa or all the cities in usa:ca, without secondary indexing.
  • If you're storing time-series data, encoding time at the start of your _id sorts the primary index by time, for example, 001j40Ox1b2c1B2ubbtm4CsuLB4L35wQ.
  • Partitioned databases group documents that share a partition key together. A partition key should have many distinct values and should not create hot spots, so that a large proportion of your application's traffic is not directed to a few partitions.
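The hierarchical _id idea can be sketched as a helper that builds the _all_docs range request for a given prefix. Appending the high code point \ufff0 to the end key is a common CouchDB convention for "everything that starts with this prefix" and is an assumption here, as are the helper name and the cities database.

```python
import json
from urllib.parse import urlencode


def prefix_range_path(database, id_prefix):
    """Build a GET path that selects every _id that starts with id_prefix."""
    params = {
        # Keys are JSON values, so strings must be sent with their quotes.
        "startkey": json.dumps(id_prefix),
        # Assumed convention: \ufff0 sorts after any character used in the ids.
        "endkey": json.dumps(id_prefix + "\ufff0"),
    }
    return f"/{database}/_all_docs?{urlencode(params)}"


# All cities in usa:ca, served from the primary index alone.
path = prefix_range_path("cities", "usa:ca:")
```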

For more information, see the following blog posts:

Querying and secondary indexes

IBM Cloudant queries run against a single database and return an array of matching documents and a bookmark, which gives access to the next block of results. Good query performance depends on your queries being supported by suitable secondary indexes. An index allows the database to answer a query without trawling through every document in the database, which yields much faster performance.

See the following tips:

  • It's sometimes difficult to measure the performance of your queries until your data set is large enough to expose slow operations. Generate enough realistic data so that you can test your indexing and query performance before you get to production.
  • IBM Cloudant might return data to you without an index, but never rely on this behavior for production workloads. If your result set includes the warning "No matching index found. Create an index to optimize query time", then you need to revisit your indexing strategy. Use the explain feature to see which index is selected for each query.
  • With several object types in the same database, many use cases can be serviced by a few indexes on fixed attributes. For more information, see Optimal IBM Cloudant Indexing.
  • Give your indexes meaningful names, and specify the index name at query-time, so that it's obvious which index corresponds to which of your application's queries.
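Several of these tips can be combined in the code that builds a query. The sketch below assumes the IBM Cloudant Query request body (selector, use_index, limit, bookmark) and a response that carries docs, bookmark, and an optional warning field. The index name, selector, and helper names are hypothetical.

```python
def build_find_body(selector, index, limit=25, bookmark=None):
    """Build a query body that names the index it expects to use."""
    body = {"selector": selector, "use_index": index, "limit": limit}
    if bookmark:
        # Pass the bookmark from the previous response to get the next block.
        body["bookmark"] = bookmark
    return body


def index_warning(response):
    """Return the warning text if the query ran without a suitable index."""
    return response.get("warning")


first_page = build_find_body({"type": "order", "customer": "bob"}, "orders-by-customer")

# Hand-written sample response, for illustration only.
sample_response = {
    "docs": [],
    "bookmark": "abc123",
    "warning": "No matching index found. Create an index to optimize query time",
}
next_page = build_find_body(
    {"type": "order", "customer": "bob"},
    "orders-by-customer",
    bookmark=sample_response["bookmark"],
)
```

Logging the result of index_warning during testing is one way to catch unindexed queries before they reach production.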

For more information, see the following blog posts: