Apache Kafka

Apache Kafka is a distributed event streaming platform. Connect to an Apache Kafka real-time processing server to write streams of events to topics and to read them from topics.

Configure the following details for the Apache Kafka data source:

Register data source
Field	Description
Display name	Enter the data source name to be displayed on the screen.
Hostname	Enter the hostname. You can add multiple hosts. To add another host, click the Add icon. A new row appears where you can enter the hostname and port.
Port	Enter the port number.
SASL connection	Use the toggle switch to enable or disable Simple Authentication and Security Layer (SASL) authentication. If enabled: a. Upload the SSL certificate: i. The Upload SSL certificate (.pem, .crt, .cert, or .cer) link becomes enabled. ii. Click the Upload SSL certificate (.pem, .crt, .cert, or .cer) link. iii. Browse to the SSL certificate and upload it. b. Select one of the following SASL mechanisms: PLAIN, SCRAM SHA-256, or SCRAM SHA-512. c. Enter the Username and API key/Password.
Connection status	Click the Test connection link to test the data source connection. If the data source connection is successful, a success message appears.
Associate catalog	Select the checkbox to associate a catalog with the data source. This catalog is automatically associated with your data source and serves as your query interface to the data stored in it.
Catalog name	Enter the name of the catalog.
Add topics	You can add topics after you create the data source. i. Go to the Infrastructure manager. ii. Click the Apache Kafka data source. iii. Click the Add topics option. iv. Upload the .json definition files. You can either drag the files or use the Click to upload option. Topic names are determined from the definition files. v. Use the Edit option to view and edit the topic files.
Create	Click Create to create the data source.
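
For readers who want to reproduce a comparable connection from a script, the fields above can be sketched as client settings. This is a minimal sketch, not the product's implementation: the function name `build_client_settings` is hypothetical, the key names follow the keyword arguments of the kafka-python `KafkaConsumer`, and it assumes that enabling SASL together with an uploaded certificate corresponds to the `SASL_SSL` security protocol. Note that client libraries spell the SCRAM mechanisms with hyphens.

```python
from typing import Dict, List, Optional, Tuple

# Mechanism names as Kafka clients spell them (the table above lists them
# as PLAIN, SCRAM SHA-256, and SCRAM SHA-512).
SASL_MECHANISMS = {"PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"}


def build_client_settings(
    brokers: List[Tuple[str, int]],
    sasl_mechanism: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> Dict[str, object]:
    """Map the Hostname/Port/SASL fields to kafka-python style settings.

    Hypothetical helper: it only builds a dictionary and does not connect.
    """
    if not brokers:
        raise ValueError("at least one hostname and port is required")

    # Each Hostname/Port row becomes one "host:port" bootstrap entry.
    settings: Dict[str, object] = {
        "bootstrap_servers": [f"{host}:{port}" for host, port in brokers]
    }

    if sasl_mechanism is None:
        return settings  # SASL toggle disabled

    if sasl_mechanism not in SASL_MECHANISMS:
        raise ValueError(f"unsupported SASL mechanism: {sasl_mechanism}")
    if not username or not password:
        raise ValueError("SASL requires a username and an API key or password")

    # Assumption: SASL plus an uploaded certificate means SASL over TLS.
    settings["security_protocol"] = "SASL_SSL"
    settings["sasl_mechanism"] = sasl_mechanism
    settings["sasl_plain_username"] = username
    settings["sasl_plain_password"] = password
    if ca_file:
        settings["ssl_cafile"] = ca_file
    return settings


if __name__ == "__main__":
    example = build_client_settings(
        [("broker1.example.com", 9093), ("broker2.example.com", 9093)],
        sasl_mechanism="SCRAM-SHA-512",
        username="my-user",
        password="my-api-key",
        ca_file="ca.pem",
    )
    print(example["bootstrap_servers"])
```

If kafka-python is installed, the resulting dictionary could be passed as `KafkaConsumer(**settings)`. The hostnames, credentials, and file name shown are placeholders.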

Sample .json definition file

The following is a sample .json definition file that can be uploaded to the Kafka source configuration section for Kafka topics:

{
 "topicName": "customer_orders",
 "tableName": "orders",
 "fileContent": {
     "tableName": "orders",
     "columns": [
         {
             "name": "order_id",
             "type": "INTEGER",
             "primaryKey": true
         }
     ],
     "partitionKey": "customer_id",
     "retentionPeriod": "7 days"
 },
 "contents": {
     "tableName": "orders",
     "topicConfig": {
         "partitions": 1,
         "replicationFactor": 1,
         "retentionMs": 604800000,
         "cleanupPolicy": "delete"
     },
     "schema": {
         "type": "struct",
         "fields": [
             {
                 "name": "order_id",
                 "type": "int64"
             }
         ]
     }
 }
}
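
Because topic names are determined from the definition files, it can help to sanity-check a file before uploading it. The following is a minimal sketch: the function name `check_definition` is hypothetical, and the rules it applies (a non-empty `topicName` and `tableName`, and a matching `tableName` in the nested sections) are assumptions inferred from the sample above, not a documented schema.

```python
import json
from typing import List


def check_definition(text: str) -> List[str]:
    """Return a list of problems found in a topic definition file.

    Hypothetical pre-upload check based only on the shape of the sample file.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems: List[str] = []

    # The sample carries the topic and table names at the top level.
    for key in ("topicName", "tableName"):
        if not isinstance(doc.get(key), str) or not doc[key]:
            problems.append(f"missing or empty string field: {key}")

    # The sample repeats tableName inside both nested sections; flag mismatches.
    table = doc.get("tableName")
    for section in ("fileContent", "contents"):
        nested = doc.get(section)
        if isinstance(nested, dict) and "tableName" in nested:
            if nested["tableName"] != table:
                problems.append(f"{section}.tableName does not match tableName")

    return problems


if __name__ == "__main__":
    sample = '{"topicName": "customer_orders", "tableName": "orders"}'
    print(check_definition(sample))
```

An empty list means that none of these assumed checks failed; it does not guarantee that the upload will be accepted.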

Limitations for SQL statements

For data source-based catalogs, the CREATE SCHEMA, CREATE TABLE, DROP SCHEMA, DROP TABLE, DELETE, DROP VIEW, ALTER TABLE, and ALTER SCHEMA statements are not available in the Data Manager UI.

Limitations for data types

When a field of data type REAL has six or more digits in the decimal part and those digits are predominantly zero, the value is rounded off when queried. The rounding differs depending on the precision of the value. For example, the decimal number 1.654 is unchanged when rounded to three digits after the decimal point. As another example, consider 10.890009 and 10.89000: 10.89000 is rounded to 10.89, whereas 10.890009 is not rounded off. This is an inherent consequence of the representational limitations of binary floating-point formats, and it might have a significant impact when a query involves sorting.
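
The representational limit can be reproduced outside the query engine. The following is a minimal sketch that assumes REAL is stored as a 32-bit IEEE 754 single-precision value; the helper name `as_real` is hypothetical. It round-trips each example value through 32 bits with the standard `struct` module and prints what survives.

```python
import struct


def as_real(value: float) -> float:
    """Round-trip a 64-bit Python float through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    # Values taken from the examples in the paragraph above.
    for text in ("1.654", "10.89000", "10.890009"):
        stored = as_real(float(text))
        # repr() shows the exact value that survived the 32-bit round trip;
        # the .6f format shows how it looks when displayed to six decimals.
        print(f"{text:>10} -> stored {stored!r}, displayed {stored:.6f}")
```

A 32-bit float carries roughly seven significant decimal digits, so two inputs that differ only beyond that precision can be stored as the same value, which is why sort order can be affected.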