# Running on Kubernetes

This document describes how to run HStreamDB on Kubernetes using the specs that we provide. It assumes basic prior knowledge of Kubernetes. By the end of this section, you'll have a fully running HStreamDB cluster on Kubernetes that's ready to receive reads/writes, process data, etc.

## Building your Kubernetes Cluster

The first step is to have a running Kubernetes cluster. You can use a managed cluster (provided by your cloud provider), a self-hosted cluster, or a local Kubernetes cluster created with a tool like minikube. Make sure that kubectl points to whatever cluster you're planning to use.
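Before proceeding, you can confirm which cluster kubectl currently targets:

```sh
# Show the active kubectl context and basic cluster info
kubectl config current-context
kubectl cluster-info
```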

You also need a StorageClass named `hstream-store`. You can create it with kubectl, or through your cloud provider's web console if it offers one.

> **TIP**
>
> For minikube users, you can use the default storage class, called `standard`.
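If your provider doesn't supply a suitable class, a minimal StorageClass sketch is shown below. The provisioner is an assumption (this example uses the AWS EBS CSI driver); substitute whatever CSI driver your cluster actually runs.

```yaml
# storage-class.yaml -- minimal sketch; the provisioner is an assumption
# and must match the CSI driver available in your cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hstream-store
provisioner: ebs.csi.aws.com        # assumption: AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer
```

Apply it with `kubectl apply -f storage-class.yaml`.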

## Install ZooKeeper

HStreamDB depends on ZooKeeper for storing query information and some internal storage configuration, so we need to provision a ZooKeeper ensemble that HStreamDB can access. For this demo, we will use [helm](https://helm.sh) (a package manager for Kubernetes) to install ZooKeeper. After installing helm, run:

```sh
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

helm install zookeeper bitnami/zookeeper \
  --set image.tag=3.6 \
  --set replicaCount=3 \
  --set persistence.storageClass=hstream-store \
  --set persistence.size=20Gi
```
```
NAME: zookeeper
LAST DEPLOYED: Tue Jul  6 10:51:37 2021
NAMESPACE: test
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
** Please be patient while the chart is being deployed **

ZooKeeper can be accessed via port 2181 on the following DNS name from within your cluster:

    zookeeper.svc.cluster.local

To connect to your ZooKeeper server run the following commands:

    export POD_NAME=$(kubectl get pods -l "app.kubernetes.io/name=zookeeper,app.kubernetes.io/instance=zookeeper,app.kubernetes.io/component=zookeeper" -o jsonpath="{.items[0].metadata.name}")
    kubectl exec -it $POD_NAME -- zkCli.sh

To connect to your ZooKeeper server from outside the cluster execute the following commands:

    kubectl port-forward svc/zookeeper 2181:2181 &
    zkCli.sh 127.0.0.1:2181
WARNING: Rolling tag detected (bitnami/zookeeper:3.6), please note that it is strongly recommended to avoid using rolling tags in a production environment.
+info https://docs.bitnami.com/containers/how-to/understand-rolling-tags-containers/
```

This installs a three-node ZooKeeper ensemble by default. Wait until all three pods are marked as ready:

```sh
kubectl get pods
```
```
NAME         READY   STATUS    RESTARTS   AGE
zookeeper-0  1/1     Running   0          22h
zookeeper-1  1/1     Running   0          4d22h
zookeeper-2  1/1     Running   0          16m
```
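Alternatively, you can block until all ZooKeeper pods report ready; the label selector below is taken from the chart notes shown earlier:

```sh
# Wait for every pod of the zookeeper chart to become Ready
kubectl wait --for=condition=Ready --timeout=300s \
  pod -l app.kubernetes.io/name=zookeeper
```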

## Configuring and Starting HStreamDB

Once all the ZooKeeper pods are ready, we can start installing the HStreamDB cluster.

### Fetching The K8s Specs

```sh
git clone git@github.com:hstreamdb/hstream.git
cd hstream/deploy/k8s
```

### Update Configuration

If you installed ZooKeeper in a different way, make sure to update the ZooKeeper connection string in the storage config file `config.json` and in the server service file `hstream-server.yaml`.

It should look something like this:

```sh
$ cat config.json | grep -A 2 zookeeper
  "zookeeper": {
    "zookeeper_uri": "ip://zookeeper-0.zookeeper-headless:2181,zookeeper-1.zookeeper-headless:2181,zookeeper-2.zookeeper-headless:2181",
    "timeout": "30s"
  }

$ cat hstream-server.yaml | grep -A 1 metastore-uri
            - "--metastore-uri"
            - "zk://zookeeper-0.zookeeper-headless:2181,zookeeper-1.zookeeper-headless:2181,zookeeper-2.zookeeper-headless:2181"
```

> **TIP**
>
> The ZooKeeper connection string in the storage config file and the one in the service file can be different, but in the normal scenario they are the same.
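As a convenience, you could rewrite both files in one pass. The sketch below assumes your ensemble's host list replaces the default one verbatim; `ZK_NEW` is a placeholder for your actual connection string:

```sh
# Hypothetical one-liner: swap the default host list for your own in both
# files. ZK_NEW is a placeholder; OLD is the default shipped in the specs.
ZK_NEW="zk-0.example:2181,zk-1.example:2181,zk-2.example:2181"
OLD="zookeeper-0.zookeeper-headless:2181,zookeeper-1.zookeeper-headless:2181,zookeeper-2.zookeeper-headless:2181"
sed -i "s|${OLD}|${ZK_NEW}|g" config.json hstream-server.yaml
```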

By default, this spec installs a three-node HStream server cluster and a four-node storage cluster. If you want a bigger cluster, modify the `hstream-server.yaml` and `logdevice-statefulset.yaml` files and increase the number of replicas to the number of nodes you want. Also, by default we attach 40GB of persistent storage to each node; if you want more, you can change that under the `volumeClaimTemplates` section, as sketched below.
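For orientation, the relevant StatefulSet fields look roughly like this (a sketch only; the surrounding structure and the claim name `data` are assumptions and may differ from the actual spec):

```yaml
# Excerpt-style sketch of the logdevice-statefulset.yaml fields to adjust
spec:
  replicas: 6                      # default is 4; set to your node count
  volumeClaimTemplates:
    - metadata:
        name: data                 # claim name is an assumption
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: hstream-store
        resources:
          requests:
            storage: 80Gi          # default is 40GB; raise as needed
```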

### Starting the Cluster

```sh
kubectl apply -k .
```

When you run `kubectl get pods`, you should see something like this:

```
NAME                                                 READY   STATUS    RESTARTS   AGE
hstream-server-0                                     1/1     Running   0          6d18h
hstream-server-1                                     1/1     Running   0          6d18h
hstream-server-2                                     1/1     Running   0          6d18h
logdevice-0                                          1/1     Running   0          6d18h
logdevice-1                                          1/1     Running   0          6d18h
logdevice-2                                          1/1     Running   0          6d18h
logdevice-3                                          1/1     Running   0          6d18h
logdevice-admin-server-deployment-5c5fb9f8fb-27jlk   1/1     Running   0          6d18h
zookeeper-0                                          1/1     Running   0          6d22h
zookeeper-1                                          1/1     Running   0          10d
zookeeper-2                                          1/1     Running   0          6d
```
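Rather than polling, you can wait for the storage pods by name (names taken from the listing above; adjust the timeout to your environment):

```sh
# Block until every logdevice pod is Ready before bootstrapping
kubectl wait --for=condition=Ready --timeout=600s \
  pod/logdevice-0 pod/logdevice-1 pod/logdevice-2 pod/logdevice-3
```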

## Bootstrapping the Cluster

Once all the logdevice pods are running and ready, you'll need to bootstrap the cluster to enable all the nodes. To do that, run:

```sh
kubectl run hstream-admin -it --rm --restart=Never --image=hstreamdb/hstream:latest -- \
  hadmin store --host logdevice-admin-server-service \
    nodes-config bootstrap --metadata-replicate-across 'node:3'
```

This starts a `hstream-admin` pod that connects to the store admin server and invokes the `nodes-config bootstrap` hadmin store command, setting the metadata replication property of the cluster so that metadata is replicated across three different nodes. On success, you should see something like:

```
Successfully bootstrapped the cluster
pod "hstream-admin" deleted
```

Now you can bootstrap the HStream server by running the following command:

```sh
kubectl run hstream-admin -it --rm --restart=Never --image=hstreamdb/hstream:latest -- \
    hadmin server --host hstream-server-0.hstream-server init
```

On success, you should see something like:

```
Cluster is ready!
pod "hstream-admin" deleted
```

Note that depending on how fast the storage cluster completes its bootstrap, running `hadmin server init` may fail, so you may need to run the command more than once.
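If you'd rather not retry by hand, a simple retry loop around the same command works as a sketch (it assumes `kubectl run` propagates the container's exit code, which recent kubectl versions do):

```sh
# Re-run server init until it succeeds, pausing 5s between attempts
until kubectl run hstream-admin -it --rm --restart=Never \
    --image=hstreamdb/hstream:latest -- \
    hadmin server --host hstream-server-0.hstream-server init; do
  echo "init failed; retrying in 5s..."
  sleep 5
done
```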

## Managing the Storage Cluster

To manage the storage cluster, start an interactive `hstream-admin` pod:

```sh
kubectl run hstream-admin -it --rm --restart=Never --image=hstreamdb/hstream:latest -- bash
```

Now you can run `hadmin store` to manage the cluster:

```sh
hadmin store --help
```

To check the state of the cluster, you can then run:

```sh
hadmin store --host logdevice-admin-server-service status
```
```
+----+-------------+-------+---------------+
| ID |    NAME     | STATE | HEALTH STATUS |
+----+-------------+-------+---------------+
| 0  | logdevice-0 | ALIVE | HEALTHY       |
| 1  | logdevice-1 | ALIVE | HEALTHY       |
| 2  | logdevice-2 | ALIVE | HEALTHY       |
| 3  | logdevice-3 | ALIVE | HEALTHY       |
+----+-------------+-------+---------------+
Took 2.567s
```