Dataristix® Cluster

Dataristix for Kubernetes

This article describes configuration steps and considerations when deploying Dataristix on Kubernetes. We assume that you already have a working cluster.

Dataristix instances

Each Dataristix instance is installed as a singleton in a StatefulSet with an attached persistent volume that holds configuration data, including the identity of the instance. The identity of each instance comprises instance-specific identifiers and certificates. In many applications Dataristix initiates the connection as a client and uses certificates to identify itself to the external service. Dataristix may also act as the server (OPC UA reverse-connect server or MQTT broker), in which case the specific instance needs to be made accessible to external clients; take this into account when configuring your ingress controller. Ingress and external access configuration is not covered in this article; please refer to your ingress controller documentation.

Scaling

Depending on allocated resources, you may be able to process tens of thousands of data points per second with a single instance of Dataristix. To scale out, additional instances may be deployed with their own identity. In simple terms, you can then export the project from the first instance, import the project into the second instance, and then run only half of the tasks on the first instance and the other half on the second instance. Use MQTT for inter-instance communications should it be required.

Redundancy

Redundancy is achieved by attaching a redundant persistent volume to each Dataristix instance. Should a node fail, the failed instance can be reinstated from the persistent volume. Tasks that are configured to start automatically will then begin data processing on the new instance.
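As a hedged illustration of what a redundant volume could look like: if your cluster provides a storage class backed by replicated storage, the volume claim described later in this article could request storage from that class rather than from a node-local hostPath volume. The class name replicated-storage below is a placeholder, not something that Dataristix or Kubernetes provides; substitute a class from your own environment that supports the ReadWriteOncePod access mode.

```yaml
# Sketch only: a claim that requests dynamically provisioned storage.
# "replicated-storage" is a hypothetical StorageClass name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataristix-1-volume-claim
spec:
  storageClassName: replicated-storage
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 1Gi
```

With dynamic provisioning, the provisioner behind the storage class creates the volume for the claim, so a separate PersistentVolume manifest is typically not needed.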

Service configuration

Each instance has its own service configuration. Here we simply call it dataristix-1, anticipating that we may want to add further instances (dataristix-2, dataristix-3, and so forth) in the future.

Create a file dataristix-1-service.yml and edit as follows.

apiVersion: v1
kind: Service
metadata:
  name: dataristix-1
  labels:
    app: dataristix-1
spec:
  ports:
  - port: 8282
  clusterIP: None
  selector:
    app: dataristix-1

Apply to create the service.

kubectl apply -f dataristix-1-service.yml
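A further instance would get its own service following the same pattern. The sketch below assumes a second instance named dataristix-2 whose pod carries the label app: dataristix-2; the file name dataristix-2-service.yml is likewise only a suggestion. Apart from the names, nothing differs from the first instance.

```yaml
# dataristix-2-service.yml (hypothetical file name for a second instance)
apiVersion: v1
kind: Service
metadata:
  name: dataristix-2
  labels:
    app: dataristix-2
spec:
  ports:
  - port: 8282
  clusterIP: None      # headless, as for dataristix-1
  selector:
    app: dataristix-2  # must match the pod labels of the second StatefulSet
```

The second instance also needs its own persistent volume, claim, and StatefulSet so that it keeps a separate identity.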

Persistent volume configuration

Your preferred persistent volume configuration will depend on your environment. Adjust the configuration so that your chosen persistent volume is redundant and secure. In particular, the dataristix-secret volume mount used in Dataristix pods (see below) may contain sensitive data, and the dataristix-data volume mount should have restricted access. In this example, we define the persistent volume in the file dataristix-1-volume.yml with a simple hostPath as follows. Notably, we use the ReadWriteOncePod access mode to ensure that only a single Dataristix instance has access to the volume. This access mode requires Kubernetes 1.22 or later.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: dataristix-1-volume
  labels:
    type: local
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOncePod
  hostPath:
    path: /mnt/data

Apply to create the persistent volume.

kubectl apply -f dataristix-1-volume.yml 

Persistent volume claim configuration

We claim the volume for the single Dataristix instance in file dataristix-1-volume-claim.yml as follows.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataristix-1-volume-claim
spec:
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 1Gi

Apply to create the persistent volume claim.

kubectl apply -f dataristix-1-volume-claim.yml 
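On clusters that define a default StorageClass, a claim without a storageClassName may be satisfied by a dynamically provisioned volume rather than by the dataristix-1-volume defined above. If that is not what you want, the claim can be pinned to the pre-created volume. The following variant is a sketch under that assumption; whether you need it depends on how your cluster is set up.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataristix-1-volume-claim
spec:
  storageClassName: ""              # empty string: do not use the default StorageClass
  volumeName: dataristix-1-volume   # bind to the persistent volume created earlier
  accessModes:
    - ReadWriteOncePod
  resources:
    requests:
      storage: 1Gi
```

You can check the result with kubectl get pvc; the claim should report the status Bound and list dataristix-1-volume as its volume.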

Pod configuration

Dataristix uses multi-container pods, consisting of the Core and Proxy containers plus selected connector module containers. The following example configures all available connectors, but chances are that you only need some. Remove or comment out any connector modules that are not required to save resources. The dataristix-1-pod.yml file contains:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dataristix-1-instance
spec:
  selector:
    matchLabels:
      app: dataristix-1 # has to match .spec.template.metadata.labels
  serviceName: "dataristix-1"
  replicas: 1 # by default is 1
  minReadySeconds: 10 # by default is 0
  template:
    metadata:
      labels:
        app: dataristix-1 # has to match .spec.selector.matchLabels
    spec:
      securityContext:
        runAsNonRoot: true
      terminationGracePeriodSeconds: 10
      volumes:
        - name: dataristix-1-volume
          persistentVolumeClaim:
            claimName: dataristix-1-volume-claim
      containers:
      - name: dataristix-core
        image: docker.io/dataristix/dataristix-core:latest
        # Define connector modules that Dataristix should expect to be available
        # and include corresponding containers in this configuration.
        # Remove module arguments and corresponding containers that are not required:
        args:
        - --modules="CSV, E-Mail, Excel, Google Sheets, MySQL, MQTT, OPC UA, Oracle, PostgreSQL, Power BI, REST, Script, SQL Server, SQLite"
        volumeMounts: &commonVolumeMounts
        - name: dataristix-1-volume
          mountPath: /dataristix-data
        - name: dataristix-1-volume
          mountPath: /dataristix-secret
      - name: dataristix-proxy
        image: docker.io/dataristix/dataristix-proxy:latest
        ports:
        - containerPort: 8282
          name: dataristix-port
        volumeMounts: *commonVolumeMounts
      # CSV
      - name: dataristix-for-csv
        image: docker.io/dataristix/dataristix-for-csv:latest
        volumeMounts: *commonVolumeMounts
      # E-Mail
      - name: dataristix-for-email
        image: docker.io/dataristix/dataristix-for-email:latest
        volumeMounts: *commonVolumeMounts
      # Excel
      - name: dataristix-for-excel
        image: docker.io/dataristix/dataristix-for-excel:latest
        # Map remote RTD server port if required
        # ports:
        # - containerPort: 22783
        #   name: dx-excel-rtd
        volumeMounts: *commonVolumeMounts
      # Google Sheets
      - name: dataristix-for-googlesheets
        image: docker.io/dataristix/dataristix-for-googlesheets:latest
        volumeMounts: *commonVolumeMounts
      # MQTT
      - name: dataristix-for-mqtt
        image: docker.io/dataristix/dataristix-for-mqtt:latest
        ports:
        - containerPort: 1883
          name: dx-mqtt-tcp
        - containerPort: 8883
          name: dx-mqtt-tls
        # add WebSockets ports if required
        volumeMounts: *commonVolumeMounts
      # MySQL
      - name: dataristix-for-mysql
        image: docker.io/dataristix/dataristix-for-mysql:latest
        volumeMounts: *commonVolumeMounts
      # OPC UA
      - name: dataristix-for-opcua
        image: docker.io/dataristix/dataristix-for-opcua:latest
        # Map reverse-connect port if required
        # ports:
        # - containerPort: 7999
        #   name: dx-opcua
        volumeMounts: *commonVolumeMounts
      # Oracle
      - name: dataristix-for-oracle
        image: docker.io/dataristix/dataristix-for-oracle:latest
        volumeMounts: *commonVolumeMounts
      # PostgreSQL
      - name: dataristix-for-postgresql
        image: docker.io/dataristix/dataristix-for-postgresql:latest
        volumeMounts: *commonVolumeMounts
      # Power BI
      - name: dataristix-for-powerbi
        image: docker.io/dataristix/dataristix-for-powerbi:latest
        volumeMounts: *commonVolumeMounts
      # REST
      - name: dataristix-for-rest
        image: docker.io/dataristix/dataristix-for-rest:latest
        volumeMounts: *commonVolumeMounts
      # Script
      - name: dataristix-for-script
        image: docker.io/dataristix/dataristix-for-script:latest
        volumeMounts: *commonVolumeMounts
      # SQL Server
      - name: dataristix-for-sqlserver
        image: docker.io/dataristix/dataristix-for-sqlserver:latest
        volumeMounts: *commonVolumeMounts
      # SQLite
      - name: dataristix-for-sqlite
        image: docker.io/dataristix/dataristix-for-sqlite:latest
        volumeMounts: *commonVolumeMounts

Apply to create the StatefulSet, which in turn creates the pod.

kubectl apply -f dataristix-1-pod.yml
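To illustrate the advice about removing connector modules that are not required, here is a reduced variant of the same StatefulSet. It assumes, purely as an example, that only the MQTT and OPC UA connectors are needed; the --modules argument and the list of containers are trimmed to match, and everything else is taken from the full configuration above.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dataristix-1-instance
spec:
  selector:
    matchLabels:
      app: dataristix-1
  serviceName: "dataristix-1"
  replicas: 1
  template:
    metadata:
      labels:
        app: dataristix-1
    spec:
      securityContext:
        runAsNonRoot: true
      terminationGracePeriodSeconds: 10
      volumes:
        - name: dataristix-1-volume
          persistentVolumeClaim:
            claimName: dataristix-1-volume-claim
      containers:
      - name: dataristix-core
        image: docker.io/dataristix/dataristix-core:latest
        args:
        # List only the modules whose containers are defined below
        - --modules="MQTT, OPC UA"
        volumeMounts: &commonVolumeMounts
        - name: dataristix-1-volume
          mountPath: /dataristix-data
        - name: dataristix-1-volume
          mountPath: /dataristix-secret
      - name: dataristix-proxy
        image: docker.io/dataristix/dataristix-proxy:latest
        ports:
        - containerPort: 8282
          name: dataristix-port
        volumeMounts: *commonVolumeMounts
      # MQTT
      - name: dataristix-for-mqtt
        image: docker.io/dataristix/dataristix-for-mqtt:latest
        ports:
        - containerPort: 1883
          name: dx-mqtt-tcp
        - containerPort: 8883
          name: dx-mqtt-tls
        volumeMounts: *commonVolumeMounts
      # OPC UA
      - name: dataristix-for-opcua
        image: docker.io/dataristix/dataristix-for-opcua:latest
        volumeMounts: *commonVolumeMounts
```

The Core and Proxy containers stay in place in every variant; only the connector containers and the matching module names change.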

Ingress and port forwarding

The Dataristix pod is now available at port 8282. You may already have an ingress controller that is suitable as a reverse proxy to forward requests to the Dataristix service. Note that any proxy must also support WebSockets.
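For orientation only, a minimal Ingress for the web interface might look like the sketch below. It assumes that the ingress-nginx controller is installed under the class name nginx, and it uses the placeholder host name dataristix.example.com and the hypothetical resource name dataristix-1-ingress. The timeout annotations are specific to ingress-nginx and are included because long-lived WebSocket connections may otherwise be closed by the proxy. Other controllers use different settings, and whether a headless service is accepted as a backend depends on the controller.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dataristix-1-ingress        # hypothetical name
  annotations:
    # ingress-nginx specific: keep long-lived WebSocket connections open
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx           # assumption: ingress-nginx is installed
  rules:
  - host: dataristix.example.com    # placeholder host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: dataristix-1
            port:
              number: 8282
```

This sketch covers only the web interface on port 8282; exposing the MQTT broker or OPC UA reverse-connect ports requires TCP-level configuration that is specific to your environment.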

For testing in a local setup (e.g., minikube), you can simply use port forwarding:

kubectl port-forward dataristix-1-instance-0 8282:8282

Browse to http://localhost:8282 to view your Dataristix instance!

Helm charts

We hope to provide helm charts here soon! Stay tuned.

Feedback

We welcome any feedback you may have. Please contact support@rensen.io.

Available connector modules

The following modules are available as containers, for use in Kubernetes or Docker Compose deployments:

Connector for      Container Support
CSV
E-Mail
Excel
MQTT
OPC UA
ODBC               -
ODBC (32-bit)      -
SQL Server®
MySQL®
PostgreSQL®
Oracle®
SQLite
Power BI®
REST
Script
SOAP               -
Google Sheets
IoT Devices
InfluxDB®          TBA

Microsoft, Microsoft Access, Excel, Power BI, and SQL Server are registered trademarks of Microsoft Corporation. Oracle and MySQL are registered trademarks of Oracle. PostgreSQL is a registered trademark of the PostgreSQL Community Association of Canada. SAP, SAP HANA are registered trademarks of SAP. IBM, IBM DB2 are registered trademarks of IBM. InfluxDB is a trademark of InfluxData. All other product names, trademarks and registered trademarks are the property of their respective owners.