
Exploring Kadalu Storage in k3d Cluster - CSI Driver

2021-Mar-25 • Tags: kubernetes, kadalu, csi

In the previous article we set up a k3d cluster and discussed a typical workflow. We'll be utilising those concepts to deploy a CSI Driver on the k3d cluster and perform minimal operations while exploring Kadalu storage as Persistent Storage via CSI.

Even though I'll be concentrating on the Kadalu CSI Driver component in this blog post, it in itself has many moving parts. Because of that, I'll be making cross references rather than re-iterating the details, and add extra context only when it's needed. On that note, let's get started.

Introduction §

In short, Kadalu is a lightweight persistent storage solution for container workloads built around Gluster, and Kadalu storage can be published to various Container Orchestrators (Kubernetes, RKE, OpenShift, MicroK8s).

If you have a running Kubernetes cluster and want to deploy Kadalu storage, please refer to the quick-start in the docs. However, this blog post deals with local testing/development with k3d, and it's a bit involved to deploy any CSI storage on a docker-based environment alone, so please follow along.

You can use devices, directory paths, or persistent volumes to act as the underlying storage for Gluster. We'll reserve all the minute details around the Operator and Gluster storage in containers for a later post and concentrate on the CSI Driver for now.

If you are feeling adventurous and just want a script to set up and tear down a k3d cluster with Kadalu storage, please refer to this script, but it carries a huge disclaimer: do not run it without checking what it does, or else your devices (sdc, sdd, sde) will get formatted. ⚠️

Kindly raise a GitHub issue if any of the processes stated here results in an error.

Kadalu in k3d cluster §

Storage systems in Kubernetes need a bi-directional mount to the underlying host, and in our case we also need a shared directory (for storing secret tokens) mapped from k3d to the host system.

Please create the cluster with the commands below. I strongly recommend going through the previous article to get to know about the local container registry, importing images into the k3d cluster, etc.:

# I'll be using below directories for gluster storage
-> df -h | grep /mnt
/dev/sdc                             10G  104M  9.9G   2% /mnt/sdc
/dev/sdd                             10G  104M  9.9G   2% /mnt/sdd
/dev/sde                             10G  104M  9.9G   2% /mnt/sde

# Make a dir to be used for shared mount
-> mkdir -p /tmp/k3d/kubelet/pods

# My local registry (optional, if not used remove corresponding arg while creating the cluster)
-> bat ~/.k3d/registries.yaml  --plain
mirrors:
  "registry.localhost:5000":
    endpoint:
      - "http://registry.localhost:5000"

# Create a k3d cluster with volume mounts and local registry
-> k3d cluster create test -a 3 -v /tmp/k3d/kubelet/pods:/var/lib/kubelet/pods:shared \
-v /mnt/sdc:/mnt/sdc -v /mnt/sdd:/mnt/sdd -v /mnt/sde:/mnt/sde \
-v ~/.k3d/registries.yaml:/etc/rancher/k3s/registries.yaml
[...]
INFO[0000] Created volume 'k3d-test-images'
INFO[0001] Creating node 'k3d-test-server-0'
[...]
INFO[0044] Starting helpers...
INFO[0044] Starting Node 'k3d-test-serverlb'
[...]
kubectl cluster-info

# Deploy kadalu operator with setting 'verbose' to 'yes'
-> curl -s https://raw.githubusercontent.com/kadalu/kadalu/devel/manifests/kadalu-operator.yaml \
| sed 's/"no"/"yes"/' | kubectl apply -f -

Once the kadalu operator is deployed, it reconciles the state as per the config: it deploys the nodeplugin as a DaemonSet, the provisioner (~controller) as a StatefulSet, and watches the CRD for creating Kadalu storage, among other things.
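
If you want to see those objects for yourself, a quick check along the lines below works (output omitted; exact names can differ slightly between releases):

# Workloads created by the operator in the kadalu namespace
-> kubectl get daemonset,statefulset -n kadalu

# The CRD the operator watches for storage definitions
-> kubectl get crd | grep -i kadalu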

Things to take note of:

  1. You can refer to the above-stated script for importing local docker images into the k3d cluster before deploying the operator.
  2. For installing the operator through helm, please refer to GitHub.
  3. At the time of this writing, HEAD on the devel branch is at commit 9fe6ad4.

Verify that all the pods are deployed and in the Running state in the kadalu namespace. You can install kubectx and kubens for easy navigation across contexts and namespaces.
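
As an aside, once kubens is installed you can switch the default namespace so the -n kadalu flag can be dropped from subsequent commands; purely a convenience, not a requirement:

# Make 'kadalu' the default namespace for kubectl in the current context
-> kubens kadalu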

-> kubectl get pods -n kadalu -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP          NODE                NOMINATED NODE   READINESS GATES
operator-88bd4784c-bkzlt      1/1     Running   0          23m   10.42.0.5   k3d-test-server-0   <none>           <none>
kadalu-csi-nodeplugin-8ttmk   3/3     Running   0          23m   10.42.3.3   k3d-test-agent-2    <none>           <none>
kadalu-csi-nodeplugin-fv57x   3/3     Running   0          23m   10.42.1.5   k3d-test-agent-0    <none>           <none>
kadalu-csi-nodeplugin-ngfm2   3/3     Running   0          23m   10.42.2.4   k3d-test-agent-1    <none>           <none>
kadalu-csi-nodeplugin-7qwhm   3/3     Running   0          23m   10.42.0.6   k3d-test-server-0   <none>           <none>
kadalu-csi-provisioner-0      5/5     Running   0          23m   10.42.3.4   k3d-test-agent-2    <none>           <none>

# Using mounted volumes for creating storage pool
-> bat ../storage-config-path.yaml --plain; kubectl apply -f ../storage-config-path.yaml
---
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: replica3
spec:
  type: Replica3
  storage:
    - node: k3d-test-agent-0
      path: /mnt/sdc
    - node: k3d-test-agent-1
      path: /mnt/sdd
    - node: k3d-test-agent-2
      path: /mnt/sde
kadalustorage.kadalu-operator.storage/replica3 created

# Verify server pods are up and running
-> kubectl get pods -l app.kubernetes.io/component=server
NAME                  READY   STATUS    RESTARTS   AGE
server-replica3-1-0   1/1     Running   0          4m28s
server-replica3-2-0   1/1     Running   0          4m27s
server-replica3-0-0   1/1     Running   0          4m29s

The end: you can follow the official docs for creating PVs and PVCs from the kadalu.replica3 storage class created above and use them in app pods, and comfortably skip what follows next, or continue if you want to know about debugging the Kadalu CSI Driver (or running a debug container in general).
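
For reference, a minimal PVC against the kadalu.replica3 storage class looks roughly like the below; the name and size here simply match the claim used later in this post, adjust them to your needs:

-> kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc2g
spec:
  storageClassName: kadalu.replica3
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
EOF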

Debugging Kadalu CSI Driver §

I read a couple of blog posts discussing debugging a (python) application running in a container, however they didn't fit my needs well (either they are editor dependent or time consuming 😕).

I'm not saying the methods shared here are superior, however they make my workflow a tad easier than the cycle of making changes to source code, committing the docker container and re-deploying, or running a server accessible to the editor and debugging the code that way.

We have one server (master) and three agents (workers) in our k3d cluster. To ease things, you could get away with running a single server node and debugging your application there. However, I'm more interested in simulating a user environment as much as possible, hence the distributed setup.

Prerequisite (or Good to know info) §

It helps to be familiar with the following before proceeding:

  1. CSI volume plugins and how a driver implements them
  2. The Python debugger (pdb / breakpoint())
  3. kubectl cp and kubectl port-forward
  4. Miscellaneous tooling used below (socat, csc, nc)

Alright, now that we've got hold of the basics, on to the problem statement, implementation, and debugging the code.

Note: I can't possibly go through every minute detail; please start a discussion in the comments section if anything needs more context.

Problem Statement §

I've created a PVC and mounted it in a container. We just need to return the PVC volume name with the minimum required fields as per the proto definition to satisfy the ListVolumes RPC.

# PVC which is hosted on above created Kadalu Replica3 storage class
-> kubectl get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pvc2g   Bound    pvc-02072076-468a-43b2-bf40-b33ae6978e19   2Gi        RWX            kadalu.replica3   23h

# Pod which is currently using `pvc2g` claim
-> kubectl describe pvc pvc2g | grep Used
Used By:       pod1

Not that it's tough to implement; just to not lengthen this article, we'll customize our RPC client call to return all the volumes without tokenizing (pagination). Before proceeding with the implementation, knowing how Kadalu storage provisions a PVC would be helpful.
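
To get a feel for it, you can peek inside the provisioner container: the hosting volume is mounted under /mnt/<hostvol> and per-PVC metadata is kept as JSON files under its info directory (replica3 below comes from our storage config; adjust if yours differs):

# List the per-PVC info files on the hosting volume
-> kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- \
   find /mnt/replica3/info -name '*.json'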

Typical Implementation §

Note that most, if not all, of the RPC calls should be idempotent, and all the methods that implement them should internally try to reach the required state or log and fail with an error.

One of the commits de-coupled the code at the process level, which enabled separation of concerns between monitoring state and reconciling the process to the required state, without which the steps followed in the rest of this article would not be possible.

Before we proceed further, let's invoke the ListVolumes method with no code change and then arrive at the solution. We'll deploy socat as a DaemonSet on all k3d agents, which exposes the CSI UDS (Unix domain socket) as a TCP connection, and use csc to connect to the TCP port.
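
If you don't have csc handy, it ships as part of the rexray/gocsi project; one way to install it, assuming a local Go toolchain, is:

# Install the csc CLI (a simple CSI client)
-> go install github.com/rexray/gocsi/csc@latest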

As the provisioner pod uses an emptyDir, we need to access its socket differently: a Deployment with 1 replica, scheduled on the node where the provisioner pod is running.

Important: Thanks to one of the recommended approaches, packaging all services in a single binary, we could get away without the extra Deployment on the provisioner pod's node. The downside is that when we access Controller Services, the log messages end up in the Node Service pods. For clarity, I'm using a separate pod for accessing the provisioner's csi.sock file.

A pod manifest exists in the repo; however, below is a modified form:

-> bat tests/test-csi/sanity-debug.yaml --plain
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: kadalu
  name: sanity-ds
  labels:
    name: sanity-ds
spec:
  selector:
    matchLabels:
      name: sanity-ds
  template:
    metadata:
      labels:
        name: sanity-ds
    spec:
      containers:
        - name: socat
          image: alpine/socat:1.0.5
          args:
            - tcp-listen:10000,fork,reuseaddr
            - unix-connect:/plugin/csi.sock
          volumeMounts:
            - name: csi-sock
              mountPath: /plugin/csi.sock
      volumes:
        - name: csi-sock
          hostPath:
            path: /var/lib/kubelet/plugins_registry/kadalu/csi.sock
            type: Socket
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: kadalu
  name: sanity-dp
  labels:
    name: sanity-dp
spec:
  replicas: 1
  selector:
    matchLabels:
      name: sanity-dp
  template:
    metadata:
      labels:
        name: sanity-dp
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/name
                    operator: In
                    values:
                      - kadalu-csi-provisioner
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: socat
          image: alpine/socat:1.0.5
          args:
            - tcp-listen:10001,fork,reuseaddr
            - unix-connect:/plugin/csi.sock
          volumeMounts:
            - name: csi-sock
              mountPath: /plugin/csi.sock
      volumes:
        - name: csi-sock
          hostPath:
            # UID of the POD should be replaced before deployment
            path: '/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~empty-dir/socket-dir/csi.sock'
            type: Socket

Deploy the pods after replacing POD_UID in the YAML manifest:

# Store Provisioner Pod UID
-> POD_UID=$(kubectl get pods kadalu-csi-provisioner-0 -o jsonpath={'.metadata.uid'})

# Applying and verifying the manifest
-> sed "s/POD_UID/$POD_UID/" tests/test-csi/sanity-debug.yaml | kubectl apply -f -
daemonset.apps/sanity-ds created
deployment.apps/sanity-dp created

# Pods after reaching Ready state (Sanitized output)
-> kubectl get pods --sort-by='{.spec.nodeName}' \
-o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeName' | grep -P 'sanity|csi|NODE'
NAME                          NODE
kadalu-csi-nodeplugin-fv57x   k3d-test-agent-0
sanity-ds-6mxxc               k3d-test-agent-0

sanity-ds-qtz6d               k3d-test-agent-1
kadalu-csi-nodeplugin-ngfm2   k3d-test-agent-1

sanity-ds-z6f5s               k3d-test-agent-2
kadalu-csi-provisioner-0      k3d-test-agent-2
sanity-dp-67cc596d6c-xknf7    k3d-test-agent-2
kadalu-csi-nodeplugin-8ttmk   k3d-test-agent-2

sanity-ds-2khrd               k3d-test-server-0
kadalu-csi-nodeplugin-7qwhm   k3d-test-server-0

You can see from the above output that we have access to csi.sock from every k3d node, exposed via a sanity pod on port 10000 (10001 for accessing the Controller Server). All we have to do is port-forward the exposed port and access it with csc.

Here we are port-forwarding from the pod sanity-dp-67cc596d6c-xknf7 so that we can talk to the controller service deployed in the kadalu-csi-provisioner-0 pod.

# In one pane, run a 'kubectl port-forward'
-> kubectl port-forward pods/sanity-dp-67cc596d6c-xknf7 :10001
Forwarding from 127.0.0.1:41289 -> 10001
Forwarding from [::1]:41289 -> 10001

# In another pane, run `nc` to keep the above port-forwarding alive
-> while true; do nc -vz 127.0.0.1 41289 ; sleep 15 ; done

# In another pane, finally, we can use the TCP connection to talk to our CSI Controller server
-> csc identity plugin-info -e tcp://127.0.0.1:41289
"kadalu"        "devel"

-> csc controller get-capabilities -e tcp://127.0.0.1:41289
&{type:CREATE_DELETE_VOLUME }
&{type:LIST_VOLUMES }
&{type:EXPAND_VOLUME }

# What we want to implement
-> csc controller list-volumes -e tcp://127.0.0.1:41289
Failed to serialize response

Please use -h,--help for more information

# Logs from provisioner when above is run
-> kubectl logs kadalu-csi-provisioner-0 kadalu-provisioner | tail
[2021-03-25 07:09:53,332] ERROR [_common - 88:_transform] - Exception serializing message!
Traceback (most recent call last):
  File "/kadalu/lib/python3.8/site-packages/grpc/_common.py", line 86, in _transform
    return transformer(message)
TypeError: descriptor 'SerializeToString' for 'google.protobuf.pyext._message.CMessage' objects doesn't apply to a 'NoneType' object

A couple of points to take note of from the above: the controller advertises the LIST_VOLUMES capability, but the ListVolumes method isn't implemented yet, so it returns None and gRPC fails to serialize a NoneType response, which is exactly the traceback we see in the provisioner logs.

Note: By the time you read this post, the bug may already be fixed; however, our main aim here is the process of debugging the CSI Driver.

I cloned the Kadalu repo and implemented a quick and dirty method definition for ListVolumes; below is the code snippet:

# csi/controllerserver.py
# ...
def ListVolumes(self, request, context):
    # Return list of all volumes (pvc's) in every hostvol

    errmsg = ''
    pvcs = []

    try:
        # Mount hostvol, walk through directories and return pvcs
        for volume in get_pv_hosting_volumes({}):
            hvol = volume['name']
            mntdir = os.path.join(HOSTVOL_MOUNTDIR, hvol)
            mount_glusterfs(volume, mntdir)
            json_files = glob.glob(os.path.join(mntdir, 'info', '**',
                                                '*.json'),
                                   recursive=True)
            pvcs.extend([
                name[name.find('pvc'):name.find('.json')]
                for name in json_files
            ])
    except Exception as excep:
        errmsg = str(excep)

    if not pvcs or errmsg:
        errmsg = errmsg or "Unable to find pvcs"
        logging.error("ERROR: %s", errmsg)
        context.set_details(errmsg)
        context.set_code(grpc.StatusCode.NOT_FOUND)
        return csi_pb2.ListVolumesResponse()

    logging.info(logf("Got list of volumes", pvcs=pvcs))
    return csi_pb2.ListVolumesResponse(entries=[{
        "volume": {
            "volume_id": pvc
        }
    } for pvc in pvcs])
# ...

Debugging or testing the changes §

Now that we have the method implemented, we'll copy the corresponding source file into the container and kill main.py (which registers all CSI services). The reconciler (start.py) observes the process's absence and re-runs main.py as a subprocess, which picks up our modified Python source file.

# Copy the src file into kadalu-provisioner
-> kubectl cp csi/controllerserver.py kadalu-csi-provisioner-0:/kadalu/controllerserver.py -c kadalu-provisioner

# Processes running in provisioner container
-> kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'ps -ef | grep python'
root           1       0  0 Mar23 ?      00:00:24 python3 /kadalu/start.py
root           8       1  0 Mar23 ?      00:01:13 python3 /kadalu/main.py
root           9       1  0 Mar23 ?      00:00:32 python3 /kadalu/exporter.py
root      246800       0  0 10:33 pts/3  00:00:00 sh -c ps -ef | grep python
root      246808  246800  0 10:33 pts/3  00:00:00 grep python

# Init process is `start.py` and it runs `main.py` and `exporter.py` as subprocesses,
# monitors them and tries its best to keep them running.
# Killing `main.py` will be signalled to `start.py` and it will be re-run again
-> kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'kill 8'

# `main.py` is run again and got a PID 246855; as methods from `csi/controllerserver.py` are
# imported in `main.py`, it'll call the above modified method
-> kubectl exec -it kadalu-csi-provisioner-0 -c kadalu-provisioner -- sh -c 'ps -ef | grep python'
root           1       0  0 Mar23 ?      00:00:24 python3 /kadalu/start.py
root           9       1  0 Mar23 ?      00:00:32 python3 /kadalu/exporter.py
root      246855       1  3 10:33 ?      00:00:00 python3 /kadalu/main.py
root      246897       0  0 10:33 pts/3  00:00:00 sh -c ps -ef | grep python
root      246904  246897  0 10:33 pts/3  00:00:00 grep python

If, in a hurry, you call the csc client again for ListVolumes using the same TCP port, you'll be treated to a 'connection closed' message (because the connection is actually closed upon killing the process):

-> csc identity plugin-info -e tcp://127.0.0.1:41289
connection closed

Please use -h,--help for more information

As we have deployed the socat pods using Deployment and DaemonSet kinds, we can, at worst, delete the pod to be presented with a fresh TCP connection, or we can simply repeat the same steps (port-forward and nc) before using csc again.
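
A rough sketch of that recovery, assuming the labels from the sanity manifest above (the replacement pod gets a new name, and port-forward picks a new local port):

# Recreate the socat pod backing the controller-side socket
-> kubectl delete pod -l name=sanity-dp -n kadalu

# Port-forward again once the replacement pod is Ready
-> kubectl port-forward deploy/sanity-dp :10001 -n kadalu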

# Tada! I did get it correct in first try itself :)
-> csc controller list-volumes -e tcp://127.0.0.1:46171
"pvc-02072076-468a-43b2-bf40-b33ae6978e19"      0

# Validation
-> kubectl get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pvc2g   Bound    pvc-02072076-468a-43b2-bf40-b33ae6978e19   2Gi        RWX            kadalu.replica3   32h

# Logs from provisioner upon calling above RPC method
-> k logs kadalu-csi-provisioner-0 kadalu-provisioner | tail
TypeError: descriptor 'SerializeToString' for 'google.protobuf.pyext._message.CMessage' objects doesn't apply to a 'NoneType' object
Latest consumption on /mnt/replica3/subvol/9a/3a/pvc-02072076-468a-43b2-bf40-b33ae6978e19 : 0
Latest consumption on /mnt/replica3/subvol/9a/3a/pvc-02072076-468a-43b2-bf40-b33ae6978e19 : 0
Latest consumption on /mnt/replica3/subvol/9a/3a/pvc-02072076-468a-43b2-bf40-b33ae6978e19 : 0
[2021-03-25 10:33:39,051] INFO [kadalulib - 369:monitor_proc] - Restarted Process name=csi
[2021-03-25 10:33:39,403] DEBUG [volumeutils - 812:mount_glusterfs] - Already mounted mount=/mnt/replica3
[2021-03-25 10:33:39,404] INFO [main - 36:mount_storage] - Volume is mounted successfully hvol=replica3
[2021-03-25 10:33:39,417] INFO [main - 56:main] - Server started
[2021-03-25 10:36:37,664] DEBUG [volumeutils - 812:mount_glusterfs] - Already mounted mount=/mnt/replica3
[2021-03-25 10:36:37,709] INFO [controllerserver - 345:ListVolumes] - Got list of volumes pvcs=['pvc-02072076-468a-43b2-bf40-b33ae6978e19']

Unfortunately, setting a breakpoint() in a gRPC context results in a bdb.BdbQuit error when attached to the TTY of the container. We'll go through using the breakpoint() feature in a subsequent post which supports it; below is the brief process:

  1. Wherever we want to pause the execution, just introduce the breakpoint() function in the src file, perform the cp and restart of the socat pod, and perform the operation which triggers the breakpoint.
  2. The execution will be paused at the breakpoint; attach to the container of the DaemonSet/StatefulSet/Deployment using a command similar to:
# Target can be 'ds'/'sts'/'deploy' kinds
-> kubectl attach sts/kadalu-csi-provisioner -c kadalu-provisioner -it
Unable to use a TTY - container kadalu-provisioner did not allocate one
If you don't see a command prompt, try pressing enter.

[...]

If we want to test/kill main.py where it is the init process, the container itself will be killed and replaced with a new pod, so the modified code will not come into effect.

In such cases we need to (docker) commit the container after cp'ing the code changes, retag and push it to the local registry (remember, the k3d cluster can access the local registry), and change/edit/patch the image in the YAML manifests. (We'll go through this scenario as well in later posts 😃)
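
A rough sketch of that cycle using the local registry from earlier; the image tag and source container here are purely illustrative, only the kadalu-provisioner container name is taken from the manifests we've been using:

# Commit (or rebuild) the modified image, tag it for the local registry and push
-> docker commit <modified-container-id> registry.localhost:5000/kadalu-csi:debug
-> docker push registry.localhost:5000/kadalu-csi:debug

# Point the provisioner statefulset at the new image; the pod gets recreated with our changes
-> kubectl -n kadalu set image sts/kadalu-csi-provisioner \
   kadalu-provisioner=registry.localhost:5000/kadalu-csi:debug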

Caveats and tips §

Summary §

If you give this post a couple of reads, the gist is simple: expose the CSI driver's Unix socket over TCP with socat, talk to it using csc, and iterate on the driver code by copying modified files into the running container and letting start.py restart the service.

Cleanup of the cluster: If you have followed the previous article and the current post, you can delete the entire k3d cluster without any trace by following the steps below:
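
First, delete the cluster itself (assuming it's still named test, as created above):

# Remove the k3d cluster and its nodes
-> k3d cluster delete test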

# Lazily unmount any kubelet pod mounts left behind on the host (stale entries show up in 'df -ha' but not 'df -h')
-> diff <(df -ha | grep pods | awk '{print $NF}') <(df -h | grep pods | awk '{print $NF}') \
| awk '{print $2}' | xargs umount -l

# Some housekeeping for docker (Don't run these without knowing what they do)
-> docker rmi $(docker images -f "dangling=true" -q)
-> docker volume prune -f
-> docker volume rm $(docker volume ls -qf dangling=true)

As stated earlier, the script for setup and teardown of the k3d cluster is available here; you have been warned, don't run it without checking what it does.

It may seem that we have covered a lot of ground, but I had to intentionally drop some details. I'll continue exploring other components of Kadalu storage in later posts and add any points missed in the current post. Stay tuned 👀

Send an email for any comments. Kudos for making it to the end. Thanks!