IBM Cloud Docs
Creating Views (MapReduce)

Views are used to obtain data that is stored within a database. A view is defined by a map function and, optionally, a reduce function, and the view definition is stored in a design document.

This topic describes the simplest view, map function examples, reduce functions, map and reduce function restrictions, and how to store a view definition, with examples throughout. Views are written in JavaScript.

View concepts

Views are mechanisms for working with document content in databases. A view can selectively filter documents and speed up the search for content. It can be used to pre-process the results before they're returned to the client.

Views are JavaScript functions that are defined within the views field of a design document. The system applies a view's map function to every document in the database to build the view's index, and then updates that index incrementally as documents change. When you run a query by using the view, the results come from this index. Views can be complex. You might choose to define a collection of JavaScript functions to create the overall view that is required.

View index partitioning type

A view index inherits the partitioning type from the options.partitioned field of the design document that contains it.

A simple view

The simplest form of view is a map function. The map function produces output data that represents an analysis (a mapping) of the documents that are stored within the database.

For example, you might want to find out which users completed the online registration and have a verified email address at which you can contact them. You can find this information by inspecting each document, looking for a field called "email_verified", and getting the value of "email". If the field is present and has the value true, the user completed the registration, and you can contact them by email. If the field isn't present or has any value other than true, the user didn't complete the registration.

Calling the emit function in a map function makes it easy to produce a list in response to a query that uses the view. The list consists of key-value pairs, where the key helps you identify the specific document and the value provides just the detail you want. The response also includes metadata, such as the total number of rows in the view index (total_rows) and the offset of the first returned row (offset).

The _id of the document that emitted each row is automatically included, as the id field, in every result row. This makes it easier for the client to work with the results.

See an example of a simple view by using a map function:

function(user) {
  if(user.email_verified === true) {
    emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined});
  }
}

See sample data for demonstrating the simple view example:

[
    {
        "_id":"abc123",
        "name": "Bob Smith",
        "email": "bob.smith@aol.com",
        "email_verified": true,
        "joined": "2019-01-24T10:42:59.000Z"
    },
    {
        "_id":"abc125",
        "name": "Amelie Smith",
        "email": "amelie.smith@aol.com",
        "email_verified": true,
        "joined": "2020-04-24T10:42:59.000Z"
    }
]

See an example response from running the simple view query:

{
  "total_rows": 2,
  "offset": 0,
  "rows": [
    {
      "id": "abc125",
      "key": "amelie.smith@aol.com",
      "value": {
        "name": "Amelie Smith",
        "email_verified": true,
        "joined": "2020-04-24T10:42:59.000Z"
      }
    },
    {
      "id": "abc123",
      "key": "bob.smith@aol.com",
      "value": {
        "name": "Bob Smith",
        "email_verified": true,
        "joined": "2019-01-24T10:42:59.000Z"
      }
    }
  ]
}
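To try a map function before you store it, you can imitate index building locally. The following sketch is a hypothetical test harness, not part of IBM Cloudant or its SDKs: runMap compiles a map function's source with a stand-in emit, applies it to each sample document, and sorts the emitted rows by key and then by document ID. It compares keys with plain JavaScript operators, whereas IBM Cloudant orders keys by Unicode collation rules, so the ordering can differ for non-ASCII or mixed-type keys.

```javascript
// Hypothetical local harness (not a Cloudant API) that imitates building a
// view index so that a map function can be tried against sample documents.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  // Stand-in for the server's emit(): records one row for the current document.
  const emit = (key, value = null) => rows.push({ id: currentId, key, value });
  // Compile the map function source with emit in scope.
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  // Order rows by key, then by ID (plain comparison, not Unicode collation).
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  return { total_rows: rows.length, offset: 0, rows };
}

// The map function from the simple view example, as it would be stored.
const simpleView = `function(user) {
  if (user.email_verified === true) {
    emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined});
  }
}`;

// The sample documents from the example above.
const users = [
  { _id: "abc123", name: "Bob Smith", email: "bob.smith@aol.com",
    email_verified: true, joined: "2019-01-24T10:42:59.000Z" },
  { _id: "abc125", name: "Amelie Smith", email: "amelie.smith@aol.com",
    email_verified: true, joined: "2020-04-24T10:42:59.000Z" },
];

console.log(JSON.stringify(runMap(simpleView, users), null, 2));
```

Running it prints the rows in the same order as the example response: abc125 (amelie.smith@aol.com) before abc123 (bob.smith@aol.com).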

Map function examples

Defining a view within a design document also creates an index based on the emitted keys. Because the index is built ahead of queries, it significantly speeds up searching for and selecting documents through the view.

The following sections describe indexing with simple and complex keys, and reduce functions.

Indexing a field

The following map function checks whether the document has a name field and, if so, emits the value of that field as the key. Because the name is the key, you can query the view by the value of the name field.

See an example of indexing a field:

function(doc) {
  if (doc.name) {
    // Use the name as the key so that queries can select documents by name
    emit(doc.name, null);
  }
}
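The following sketch emits each document's name as the key and then imitates a ?key= query locally, to show why the emitted key is what a view query selects on. The emit stand-in, the sample documents, and the queryByKey helper are hypothetical; they are not IBM Cloudant APIs.

```javascript
// Hypothetical stand-in for the server's emit(): collects rows for the current doc.
const rows = [];
let currentId = null;
function emit(key, value = null) {
  rows.push({ id: currentId, key, value });
}

// Map function that uses the name as the key.
function byName(doc) {
  if (doc.name) {
    emit(doc.name, null);
  }
}

// Hypothetical sample documents; the second one has no name and emits nothing.
const docs = [
  { _id: "u1", name: "Bob Smith" },
  { _id: "u2" },
  { _id: "u3", name: "Amelie Smith" },
];
for (const doc of docs) {
  currentId = doc._id;
  byName(doc);
}
// Keep the index ordered by key, as the server does (plain string comparison here).
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Imitates a ?key="..." query: returns only the rows whose key matches exactly.
function queryByKey(key) {
  return rows.filter((row) => row.key === key);
}

console.log(queryByKey("Bob Smith"));
```

Here, queryByKey("Bob Smith") returns the single row for document u1, and the document without a name never appears in the index.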

An index for a one-to-many relationship

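If the value that is passed to emit is an object with an _id field, a view query with include_docs set to true returns, in each row's doc field, the document with that ID instead of the document that emitted the row.

See an example of indexing a one-to-many relationship, which assumes that doc.friends is an array of document IDs:

function(doc) {
  if (doc.friends) {
    // Emit one row per friend; the _id in the value links to the friend's document
    for (var i = 0; i < doc.friends.length; i++) {
      emit(doc._id, { "_id": doc.friends[i] });
    }
  }
}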
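To see what include_docs does with such an index, the following sketch resolves each row locally: when a row's value is an object with an _id, the document with that ID is attached as doc; otherwise the emitting document is attached. The in-memory database, the emit stand-in, and the queryWithDocs helper are hypothetical, and the map function assumes that doc.friends is an array of document IDs. On IBM Cloudant, the server performs this lookup when you query with include_docs=true.

```javascript
// Hypothetical in-memory "database", keyed by document ID.
const db = {
  alice: { _id: "alice", name: "Alice", friends: ["bob", "carol"] },
  bob: { _id: "bob", name: "Bob" },
  carol: { _id: "carol", name: "Carol" },
};

// Stand-in for emit() that records rows for the document being mapped.
const rows = [];
let currentId = null;
function emit(key, value = null) {
  rows.push({ id: currentId, key, value });
}

// One-to-many map function, assuming doc.friends is an array of IDs.
function friendsMap(doc) {
  if (doc.friends) {
    for (var i = 0; i < doc.friends.length; i++) {
      emit(doc._id, { _id: doc.friends[i] });
    }
  }
}

for (const doc of Object.values(db)) {
  currentId = doc._id;
  friendsMap(doc);
}

// Imitates include_docs=true: a value with an _id links to that document,
// otherwise the row gets the document that emitted it.
function queryWithDocs(key) {
  return rows
    .filter((row) => row.key === key)
    .map((row) => {
      const linkedId =
        row.value && typeof row.value === "object" && row.value._id
          ? row.value._id
          : row.id;
      return { ...row, doc: db[linkedId] || null };
    });
}

console.log(queryWithDocs("alice").map((row) => row.doc.name)); // [ 'Bob', 'Carol' ]
```

Each row still reports alice as its id, because alice's document emitted it, while doc holds the linked friend.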

Complex keys

Keys aren't limited to simple values. You can use arbitrary JSON values to influence sorting.

When the key is an array, view results can be grouped by a leading subset of the key's elements by using the group_level query parameter. For example, if keys have the form [year, month, day], then results can be reduced to a single value, or grouped by year, by year and month, or by year, month, and day.
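The following sketch imitates how group_level combines rows that have [year, month, day] keys: rows whose keys share the same first group_level elements fall into one group, which is then counted as the built-in _count reducer would count it. The rows and the groupRows helper are hypothetical and run locally; on IBM Cloudant, group_level is a query parameter.

```javascript
// Hypothetical index rows with [year, month, day] keys, already in key order.
const rows = [
  { key: [2023, 12, 30], value: 1 },
  { key: [2024, 1, 5], value: 1 },
  { key: [2024, 1, 5], value: 1 },
  { key: [2024, 2, 14], value: 1 },
];

// Imitates a query with group_level=N and a _count reducer: rows are grouped by
// the first N elements of their array keys. Level 0 reduces everything to one row.
function groupRows(rows, level) {
  const groups = new Map();
  for (const row of rows) {
    const groupKey = level === 0 ? null : row.key.slice(0, level);
    const mapKey = JSON.stringify(groupKey);
    const group = groups.get(mapKey) || { key: groupKey, value: 0 };
    group.value += 1; // _count counts the rows in each group
    groups.set(mapKey, group);
  }
  // Map preserves insertion order, so groups stay in key order.
  return [...groups.values()];
}

console.log(groupRows(rows, 1)); // one row per year
console.log(groupRows(rows, 2)); // one row per year and month
```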

For more information, see Using views.

Reduce functions

Design documents with options.partitioned set to true can't contain custom JavaScript reduce functions. Only built-in reducers are allowed.

No reducer

A view definition inside a design document can omit the reduce attribute, in which case no query-time aggregation is performed.

{
    "views": {
        "getVerifiedEmails": {
            "map": "function(user) { if(user.email_verified === true) { emit(user.email); } }"
        }
    }
}

The previous map function generates a secondary index that is suitable for selection only. The index is always ordered by the key (the emit function's first parameter), in this case user.email. This view is ideal for fetching documents by a known user email address or by a range of email addresses.
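Because the index is ordered by key, a range of email addresses is a contiguous slice of it. The following sketch imitates a query with startkey and endkey on a hypothetical local copy of the index. It uses plain string comparison, with the high Unicode character \ufff0 as the upper bound of a prefix match, whereas IBM Cloudant orders keys by Unicode collation rules.

```javascript
// Hypothetical index rows from getVerifiedEmails, sorted by key (email).
// The value is null because that map function emits only a key.
const index = [
  { id: "abc125", key: "amelie.smith@aol.com", value: null },
  { id: "abc130", key: "anna.jones@example.com", value: null },
  { id: "abc123", key: "bob.smith@aol.com", value: null },
  { id: "abc140", key: "carla.diaz@example.com", value: null },
];

// Imitates ?startkey=...&endkey=... (both inclusive) with plain string comparison.
function queryRange(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

// All emails that start with "a": "\ufff0" sorts after ordinary characters.
const startsWithA = queryRange(index, "a", "a\ufff0");
console.log(startsWithA.map((row) => row.key));
```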

Built-in reduce functions

For performance reasons, a few simple reduce functions are built in. Whenever possible, use one of these functions instead of writing your own.

To use one of the built-in functions, put the name into the reduce field of the view object in your design document.

{
    "views": {
        "sumPrices": {
            "map": "function(user) { if(user.email_verified === true) { emit(user.name, user.email); } }",
            "reduce": "_count"
        }
    }
}

The previous MapReduce view creates an index that is keyed on the user's name, with one row for each user whose email address is verified. Because the reducer is _count, the view returns the number of rows in the queried selection. It is suitable for counting the verified users.

The numeric reducers _stats and _sum act on the value (the emit function's second parameter), which can be a number, an array, or an object. Consider the following MapReduce definition on a partitioned products database:

{
    "views": {
        "statsReadingsObject": {
            "map": "function(product) {  emit(product.type, { price: product.price, tax: product.tax }); }",
            "reduce": "_sum"
        }
    }
}

The view is keyed on the type of the product, and the value is an object that contains two fields: price and tax. The _sum reducer totals each field of the objects that it finds. Without grouping, the view returns a single row with a null key:

{"rows":[
    {"key":null,"value":{"price":144.97, "tax":7.32}}
]}

Add ?group=true when you query the view to group the output by unique key, in this case the product type, and sum each group:

{"rows":[
    {"key":"portable","value":{"price":14.99,"tax":1.14}},
    {"key":"product","value":{"price":129.98,"tax":6.18}}
]}
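The following sketch imitates what the _sum reducer does with object values, with and without grouping. The three product documents are hypothetical, chosen only so that their totals match the example output above, and the sumObjects and reduceView helpers are illustrative, not part of IBM Cloudant. Unlike the real reducer, sumObjects handles only flat objects. Floating-point addition can leave tiny rounding differences, so the printed totals may show extra digits.

```javascript
// Hypothetical documents chosen to be consistent with the example totals.
const products = [
  { _id: "p1", type: "portable", price: 14.99, tax: 1.14 },
  { _id: "p2", type: "product", price: 29.99, tax: 1.62 },
  { _id: "p3", type: "product", price: 99.99, tax: 4.56 },
];

// Rows as the statsReadingsObject map function would emit them.
const rows = products.map((p) => ({ key: p.type, value: { price: p.price, tax: p.tax } }));

// Field-by-field sum of flat objects, imitating _sum on object values.
function sumObjects(values) {
  const total = {};
  for (const value of values) {
    for (const [field, n] of Object.entries(value)) {
      total[field] = (total[field] || 0) + n;
    }
  }
  return total;
}

// Without grouping, one row with a null key; with group=true, one row per key.
function reduceView(rows, group) {
  if (!group) return [{ key: null, value: sumObjects(rows.map((r) => r.value)) }];
  const byKey = new Map();
  for (const row of rows) {
    if (!byKey.has(row.key)) byKey.set(row.key, []);
    byKey.get(row.key).push(row.value);
  }
  return [...byKey].map(([key, values]) => ({ key, value: sumObjects(values) }));
}

console.log(JSON.stringify(reduceView(rows, false)));
console.log(JSON.stringify(reduceView(rows, true)));
```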

The numeric reducers also calculate multiple reductions when the value of an index is an array of numbers:

{
    "views": {
        "statsReadingsArray": {
            "map": "function(doc) { emit(doc.type, [doc.price, doc.tax]); }",
            "reduce": "_stats"
        }
    }
}

When queried with ?group=true, the previous definition calculates statistics on the numerical values in the array that is emitted as the index's value. The statistics are returned as an array in the same order as the values that are supplied in the map function:

{"rows":[
    {"key":"portable","value":[
        {"sum":14.99,"count":1,"min":14.99,"max":14.99,"sumsqr":224.7001},
        {"sum":1.14,"count":1,"min":1.14,"max":1.14,"sumsqr":1.2996}
    ]},
    {"key":"product","value":[
        {"sum":129.98,"count":2,"min":29.99,"max":99.99,"sumsqr":10897.4002},
        {"sum":6.18,"count":2,"min":1.62,"max":4.56,"sumsqr":23.418}
    ]}
]}
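To show what each _stats field means, the following sketch computes the same per-position statistics locally for a group of emitted arrays: sum, count, min, max, and sumsqr (the sum of the squares). The statsForArrays helper is illustrative, not part of IBM Cloudant, and the input values are hypothetical, chosen to match the two "product" rows implied by the output above.

```javascript
// Computes _stats-style statistics for each position of equal-length numeric
// arrays, returning them in the same order as the array elements.
function statsForArrays(arrays) {
  return arrays[0].map((_, i) => {
    const column = arrays.map((arr) => arr[i]);
    return {
      sum: column.reduce((acc, x) => acc + x, 0),
      count: column.length,
      min: Math.min(...column),
      max: Math.max(...column),
      sumsqr: column.reduce((acc, x) => acc + x * x, 0),
    };
  });
}

// [price, tax] values emitted for the two "product" documents
// (hypothetical data consistent with the example output).
const productValues = [
  [29.99, 1.62],
  [99.99, 4.56],
];

console.log(statsForArrays(productValues));
```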

The _count reducer simply counts the number of key-value pairs that are emitted into the index.

{"rows":[
    {"key":null,"value":3}
]}

The _approx_count_distinct reducer acts on the key of the index, as opposed to the numeric reducers, which act on the index's value.

{"rows":[
    {"key":null,"value":2}
]}

Table 1. Built-in reduce functions

Function                  Description
_count                    Produces the row count for a specific key. The values can be any valid JSON.
_stats                    Produces a JSON structure that contains the sum, count, min, max, and sum-of-squares (sumsqr) values. The values must be numbers or arrays of numbers.
_sum                      Produces the sum of all values for a key. The values must be numbers, arrays of numbers, or objects whose fields are numeric.
_approx_count_distinct    Approximates the number of distinct keys in a view index by using a variant of the HyperLogLog algorithm.
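The _count and _approx_count_distinct reducers can be illustrated in a few lines. The following sketch counts rows for _count and counts distinct keys exactly by using a Set. The real _approx_count_distinct reducer estimates that number with a HyperLogLog variant, so that it doesn't have to keep every key, which means that its result on large indexes can differ slightly from an exact count. The rows are hypothetical.

```javascript
// Hypothetical index rows keyed by product type.
const rows = [
  { key: "portable", value: null },
  { key: "product", value: null },
  { key: "product", value: null },
];

// _count: the number of rows, whatever their values are.
const count = rows.length;

// Exact distinct-key count; _approx_count_distinct estimates this number instead.
const distinctKeys = new Set(rows.map((row) => JSON.stringify(row.key))).size;

console.log({ count, distinctKeys }); // { count: 3, distinctKeys: 2 }
```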

Custom reduce functions

Most customers find that the built-in reducers are sufficient to aggregate the key-value pairs that their map functions emit. However, for unusual use cases, you can supply a JavaScript reduce function instead of the name of one of the built-in reducers.

Reduce functions are passed three arguments in the following order:

  • keys
  • values
  • rereduce

If a view has a custom JavaScript reduce function, that function produces the aggregate results for the view. A reduce function is passed a set of intermediate values and combines them into a single value. It must accept as input both the results that are emitted by its corresponding map function and the results that are returned by the reduce function itself. The latter case is referred to as a "rereduce".

See the following example of a custom reduce function:

function (keys, values, rereduce) {
  return sum(values);
}

Reduce functions must handle two cases:

  1. When rereduce is false:

    • keys is an array whose elements are arrays of the form [key, id], where key is a key that is emitted by the map function, and id identifies the document from which the key was generated.
    • values is an array of the values that are emitted for the respective elements in keys, for example: reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false).
  2. When rereduce is true:

    • keys is null.
    • values is an array of the values that are returned by previous calls to the reduce function, for example: reduce(null, [intermediate1,intermediate2,intermediate3], true).
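When a reducer's output has a different shape from the map values, the two cases have to be handled separately. The following sketch is a hypothetical averaging reducer, not one from the IBM Cloudant documentation: on the first pass it returns a {count, total} object, and on rereduce it merges those objects, so the average is total / count. The calls at the end imitate the two call shapes that are described above. JavaScript reducers like this one can't be used in partitioned design documents.

```javascript
// Hypothetical custom reducer: keeps a running count and total so that an
// average (total / count) can be computed from the final value.
function averageReduce(keys, values, rereduce) {
  var count = 0;
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    if (rereduce) {
      // values[i] is a {count, total} object returned by an earlier call.
      count += values[i].count;
      total += values[i].total;
    } else {
      // values[i] is a number emitted by the map function.
      count += 1;
      total += values[i];
    }
  }
  return { count: count, total: total };
}

// First pass over two batches of map output, then a rereduce of the results.
const partA = averageReduce([["a", "doc1"], ["b", "doc2"]], [10, 20], false);
const partB = averageReduce([["c", "doc3"]], [30], false);
const final = averageReduce(null, [partA, partB], true);
console.log(final, final.total / final.count); // { count: 3, total: 60 } 20
```

Returning the count and the total, rather than an average, is what makes rereduce possible: averages of unequal batches can't be combined correctly without knowing how many values each one covers.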

Reduce functions must return a single value, suitable for both the value field of the final view, and as a member of the values array that is passed to the reduce function.

Often, reduce functions can be written to handle rereduce calls without any extra code, like the summation function in the earlier example. In such cases, the rereduce argument can be ignored.

By feeding the results of reduce functions back into the reduce function, MapReduce can split up the analysis of huge data sets into discrete, parallel tasks, which can be completed much faster.

When you use the built-in reduce function, if the input is invalid, the builtin_reduce_error error is returned. More detailed information about the failure is provided in the reason field. The original data that caused the error is returned in the caused_by field.

See an example reply:

{
    "rows": [
        {
            "key": null,
            "value": {
                "error": "builtin_reduce_error",
                "reason": "The _sum function requires that map values be numbers, arrays of numbers, or objects. Objects can't be mixed with other data structures. Objects can be arbitrarily nested, if the values for all fields are themselves numbers, arrays of numbers, or objects.",
                "caused_by": [
                    {
                        "a": 1
                    },
                    {
                        "a": 2
                    },
                    {
                        "a": 3
                    },
                    {
                        "a": 4
                    }
                ]
            }
        }
    ]
}

Map and reduce function restrictions

The following restrictions apply to map and reduce functions.

Referential transparency

The map function must be referentially transparent: given the same document, it must always emit the same key-value pairs. Referential transparency means that an expression can be replaced with its value without changing the result. Here, the expression is the map function applied to a document, and the value is the set of key-value pairs that it emits. Because of referential transparency, IBM Cloudant views can be updated incrementally and reindex only the delta since the last update.
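The following sketch contrasts a referentially transparent map function with one that depends on state outside the document. The emit stand-in and the mapOutput helper are hypothetical local checks: mapOutput runs a map function on one document and returns the emitted rows, so running it twice on the same document shows whether the output is stable.

```javascript
// Stand-in for the server's emit(); collects rows from the current map call.
let emitted = [];
function emit(key, value = null) {
  emitted.push([key, value]);
}

// Runs a map function on one document and returns the emitted rows as JSON.
function mapOutput(mapFn, doc) {
  emitted = [];
  mapFn(doc);
  return JSON.stringify(emitted);
}

// Referentially transparent: the output depends only on the document.
function goodMap(doc) {
  emit(doc.type, doc.price);
}

// Not referentially transparent: the output depends on a counter outside the
// document (Date.now() or Math.random() cause the same problem).
let calls = 0;
function badMap(doc) {
  calls += 1;
  emit(doc.type, calls);
}

const doc = { _id: "p1", type: "portable", price: 14.99 };
console.log(mapOutput(goodMap, doc) === mapOutput(goodMap, doc)); // true
console.log(mapOutput(badMap, doc) === mapOutput(badMap, doc)); // false
```

With badMap, a document that is reindexed would get different rows than it did the first time, so an incremental update would no longer match a full rebuild of the index.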

Commutative and associative properties

In addition to referential transparency, the reduce function must be commutative and associative over its input. These properties make it possible for the reduce function to reduce its own output and produce the same result, for example:

f(Key, Values) == f(Key, [ f(Key, Values) ] )

As a result, IBM Cloudant can store intermediate results in the inner nodes of the B-tree indexes. These restrictions also make it possible for indexes to be spread across machines and reduced at query time.
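The property above can be checked locally before you deploy a custom reducer. The following sketch, whose helper names are hypothetical, compares reducing all values in one call with reducing reversed, split chunks and then rereducing the partial results. A summing reducer passes, while a reducer that keeps the last value it sees depends on ordering and fails. A passing check is evidence rather than proof, because it tries only one alternative ordering and split.

```javascript
// Summing reducer: the same code works for map values and for rereduce.
// Both reducers ignore keys, so null is passed for them below.
function sumReduce(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Order-dependent reducer: the result depends on which value comes last.
function lastValueReduce(keys, values, rereduce) {
  return values[values.length - 1];
}

// Reduces all values in one call, then reduces them in reversed, split chunks
// followed by a rereduce, and reports whether both results agree.
function isSafeReducer(reduceFn, values) {
  const whole = reduceFn(null, values, false);
  const reversed = [...values].reverse();
  const mid = Math.ceil(reversed.length / 2);
  const partials = [reversed.slice(0, mid), reversed.slice(mid)]
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => reduceFn(null, chunk, false));
  return whole === reduceFn(null, partials, true);
}

const values = [3, 1, 4, 1, 5];
console.log(isSafeReducer(sumReduce, values)); // true
console.log(isSafeReducer(lastValueReduce, values)); // false
```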

Document partitioning

Due to sharding, IBM Cloudant doesn't guarantee that the map output for any two specific documents is passed to the same reduce call. You must not rely on any ordering. Your reduce function must consider all the values that are passed to it and return the correct answer irrespective of their order. IBM Cloudant is also guaranteed to call your reduce function with rereduce=true at query time, even if it didn't need to do so when it built the index. It's essential that your function works correctly in that case (rereduce=true means that the keys parameter is null and the values array is filled with results from previous reduce function calls).

Reduced value size

IBM Cloudant computes view indexes and the corresponding reduce values, and then caches these values inside each of the B-tree node pointers so that it can reuse reduced values when it updates the B-tree. As a result, you must pay attention to the amount of data that your reduce functions return.

Keep the size of the data that your reduce function returns small, growing no faster than log(num_rows_processed). If you ignore this restriction, IBM Cloudant doesn't automatically throw an error, but B-tree performance degrades dramatically. If your view works correctly with small data sets but stops working when more data is added, your view might violate this growth-rate restriction.
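The following sketch contrasts two hypothetical reducers by the size of what they return. A counting reducer returns one number however many rows it reduces, and the length of that number grows only with its digit count. A reducer that collects every distinct value returns an array that grows with the data, which is the pattern that the growth-rate restriction warns against.

```javascript
// Bounded output: one number regardless of how many rows are reduced.
function countReduce(keys, values, rereduce) {
  return rereduce ? values.reduce((acc, v) => acc + v, 0) : values.length;
}

// Unbounded output: an array of every distinct value, which grows with the data.
function collectReduce(keys, values, rereduce) {
  const seen = new Set(rereduce ? values.flat() : values);
  return [...seen];
}

// Prints the JSON size of each reducer's result for a growing number of rows.
for (const n of [10, 1000, 100000]) {
  const values = Array.from({ length: n }, (_, i) => `user${i}`);
  const countSize = JSON.stringify(countReduce(null, values, false)).length;
  const collectSize = JSON.stringify(collectReduce(null, values, false)).length;
  console.log(n, countSize, collectSize);
}
```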

Execution environment

Your indexing functions work in a memory-constrained environment where the document forms part of the memory used in the environment. Your code's stack and document must fit within the memory. We limit documents to a maximum size of 64 MB.

No JavaScript reducers when options.partitioned is true

Design documents with options.partitioned set to true can't contain JavaScript reduce functions. Only built-in Erlang reducers, such as _stats, are allowed.

Storing the view definition

Each view is a JavaScript function, and views are stored in design documents. To store a view, IBM Cloudant stores the function definition, as a string, within a design document. A design document can be created or updated just like any other document.

To store a view definition, PUT the view definition content into a _design document. As with any other document, updating an existing design document requires its current _rev in the request body.

In the following example, the getVerifiedEmails view is defined as a map function, and is available within the views field of the design document.

Use the PUT method to add a view into a design document:

PUT $SERVICE_URL/$DATABASE/_design/$DDOC HTTP/1.1
Content-Type: application/json

The following sample request body adds a view named getVerifiedEmails to the allusers design document:

{
    "views": {
        "getVerifiedEmails": {
            "map": "function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }"
        }
    }
}
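Because map functions are stored as strings, a syntax error or a reference to an undefined variable (for example, using doc when the parameter is named user) shows up only when IBM Cloudant runs the function. The following sketch is a hypothetical pre-flight check, not part of IBM Cloudant or its SDKs: it compiles each map string locally, runs it on a sample document with a stand-in emit, and then builds the request body with JSON.stringify so that quoting is handled for you. The check exercises only the code paths that the sample document reaches.

```javascript
// Compiles a map function string and runs it on a sample document with a
// stand-in emit(). Throws SyntaxError or ReferenceError if the source is broken.
function checkMap(mapSource, sampleDoc) {
  const rows = [];
  const emit = (key, value = null) => rows.push({ key, value });
  const mapFn = new Function("emit", `return (${mapSource});`)(emit);
  mapFn(sampleDoc);
  return rows;
}

// Builds a design document body after checking every map function.
function buildDesignDoc(views, sampleDoc) {
  for (const [name, view] of Object.entries(views)) {
    try {
      checkMap(view.map, sampleDoc);
    } catch (err) {
      throw new Error(`View ${name} is broken: ${err.message}`);
    }
  }
  return JSON.stringify({ views });
}

const sampleUser = { _id: "abc123", name: "Bob Smith", email: "bob.smith@aol.com",
  email_verified: true, joined: "2019-01-24T10:42:59.000Z" };

const body = buildDesignDoc({
  getVerifiedEmails: {
    map: "function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }",
  },
}, sampleUser);
console.log(body);
```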

See the request examples:

curl:

curl -X PUT "$SERVICE_URL/users/_design/allusers" -H "Content-Type: application/json" --data '{
  "views": {
    "getVerifiedEmails": {
      "map": "function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }"
    }
  }
}'

Java:

import com.ibm.cloud.cloudant.v1.Cloudant;
import com.ibm.cloud.cloudant.v1.model.DesignDocument;
import com.ibm.cloud.cloudant.v1.model.DesignDocumentViewsMapReduce;
import com.ibm.cloud.cloudant.v1.model.DocumentResult;
import com.ibm.cloud.cloudant.v1.model.PutDesignDocumentOptions;

import java.util.Collections;

Cloudant service = Cloudant.newInstance();

DesignDocumentViewsMapReduce emailViewMapReduce =
    new DesignDocumentViewsMapReduce.Builder()
        .map("function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }")
        .build();

DesignDocument designDocument = new DesignDocument();
designDocument.setViews(
        Collections.singletonMap("getVerifiedEmails", emailViewMapReduce));

PutDesignDocumentOptions designDocumentOptions =
    new PutDesignDocumentOptions.Builder()
        .db("users")
        .designDocument(designDocument)
        .ddoc("allusers")
        .build();

DocumentResult response =
    service.putDesignDocument(designDocumentOptions).execute()
        .getResult();

System.out.println(response);

TypeScript:

import { CloudantV1 } from '@ibm-cloud/cloudant';

const service = CloudantV1.newInstance({});

const emailViewMapReduce: CloudantV1.DesignDocumentViewsMapReduce = {
  map: 'function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }'
}

const designDocument: CloudantV1.DesignDocument = {
  views: {'getVerifiedEmails': emailViewMapReduce}
}

service.putDesignDocument({
  db: 'users',
  designDocument: designDocument,
  ddoc: 'allusers'
}).then(response => {
  console.log(response.result);
});

Python:

from ibmcloudant.cloudant_v1 import CloudantV1, DesignDocument, DesignDocumentViewsMapReduce

service = CloudantV1.new_instance()

email_view_map_reduce = DesignDocumentViewsMapReduce(
  map='function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }'
)

design_document = DesignDocument(
  views={'getVerifiedEmails': email_view_map_reduce}
)

response = service.put_design_document(
  db='users',
  design_document=design_document,
  ddoc='allusers'
).get_result()

print(response)

Go:

emailViewMapReduce, err := service.NewDesignDocumentViewsMapReduce(
  "function(user) { if (user.email_verified === true) { emit(user.email, {name: user.name, email_verified: user.email_verified, joined: user.joined}); } }",
)
if err != nil {
  panic(err)
}

designDocument := &cloudantv1.DesignDocument{
  Views: map[string]cloudantv1.DesignDocumentViewsMapReduce{
    "getVerifiedEmails": *emailViewMapReduce,
  },
}

putDesignDocumentOptions := service.NewPutDesignDocumentOptions(
  "users",
  "allusers",
  designDocument,
)

documentResult, _, err := service.PutDesignDocument(putDesignDocumentOptions)
if err != nil {
  panic(err)
}

b, _ := json.MarshalIndent(documentResult, "", "  ")
fmt.Println(string(b))

The previous Go example requires the following import block:

import (
  "encoding/json"
  "fmt"
  "github.com/IBM/cloudant-go-sdk/cloudantv1"
)

All Go examples require the service object to be initialized. For examples, see the Authentication section of the API documentation.