Exploring Kadalu Storage in k3d Cluster - Operator
2021-May-03 • Tags: kubernetes, kadalu
We are picking up where we left off on the discussion about CSI Driver component of kadalu storage. After the deployment of CSI Pods it's all user facing, adding storage and provisioning PVs etc.
Today we'll be discussing about where it all starts, namely Kadalu Operator. As always I'll provide references to resources which takes away re-iterating the same concepts.
Introduction §
A kubernetes operator has functional knowledge on how to manage a piece of tech (here, it's bootstrapping glusterfs to be used in kubernetes) among others. Custom Resource Definitions (CRDs) are at the heart of an operator in which we'll be listening to events on CRD object and turn knobs in managed resource.
Below are mandatory if you didn't come across kubernetes operators or how to use kubernetes python api:
Optional:
Since we have already covered a lot of ground in previous post about Kadalu Project and prerequisites, we'll be diving directly into the kadalu operator component.
Deep dive into Operator §
We'll be using a lot of python PDB module abilities and directly follow the actions of operator in container itself and a good understanding of python debugger will be useful to follow along. Python documentation on pdb is easy to follow and self sufficient.
As my workflow is heavily CLI based around tmux and neovim I think below image would be helpful going forward to reference on which pane I'll be performing the tasks.
- Pane1: Any change to the source code will be performed which will be copied into the container
- Pane2: Verify the state of the cluster and to run any other ad hoc commands
- Pane3: Exec into
kadalu-operator
container running onoperator
pod - Pane4: Attach to command prompt of
kadalu-operator
container and make sure not to send anyKeyboardInterrupt
(^C) ever
If you are interested in only knowing about how we can enter into PDB in a container, please refer below and rest of the post talks about various ways leveraging this simple concept to illustrate the workflow that happens in the operator.
-> python -m pdb -c 'import signal' -c 'signal.signal(signal.SIGUSR1, lambda *args: breakpoint())' -cc <pyfile>
How it works?
-m pdb
: invokes python interpreter in trace mode of pdb-c
: provided topdb
as command line argumentsimport signal
: imports signal module from stdlib into pdb (as the starting word is not a pdb reserved keyword, there's no need to start the statement with!
, same for next set of commands)signal.signal(signal.SIGUSR1, lambda *args: breakpoint())
: registering a handler forSIGUSR1
signal, (breakpoint() [in version >=3.7] =~ import pdb; pdb.set_trace()
)c
:continue
, without this pdb prompt wait for user input
Pros:
- After the signal handler is registered we can invoke python debugger by running
kill -USR1 <pid>
on python process asynchronously - We'll have the ability to navigate around last ~10 stack frames and inspect arguments, variables etc.
- In general, getting into debugger of a running process without changing any source code is an added benefit, having said that nothing can replace a good logging system 😄
Cons:
- If the breakpoints are set by dropping into pdb via kill signal, no history/breakpoints will not be restored if the process is restarted with
run/restart
commands 😞
I would say above is a major con however you can evaluate both pros and cons against your use case. In quest of providing resources around pdb I find these articles are good recommendations, series on pdb and pdbrc
Actually the end, the rest of the article explains what happens in the kadalu operator in a greater detail and hang on if you want to see how we can apply above concept for a container running python program. At the time of writing this blog, HEAD
of kadalu repo is at commit f86799f
Creating a k3d
cluster
§
I've a newly created k3d
cluster and all operations are performed on that. I usually follow this script for creating a cluster, you can modify and use the same. HEAD
of forge repo is at commit 73604c
Please don't run without checking the script.
-> bash <(curl -s https://raw.githubusercontent.com/leelavg/forge/master/adhoc/k3d-kadalu.sh) -m -k n
[...]
Process Monitor (Reconciler) §
Every component of Kadalu, i.e., Operator, Server and CSI depends on a core monitor process and runs business logic as subprocess and tries it's best to keep child alive.
More than often, if you are debugging any component make sure this monitor is always stable as it works behind the scenes. These monitor methods are imported into start.py
in every component and triggers a subprocess main.py
among others which are specific to the component.
Overriding docker entrypoint
§
More than often there'll be a confusion between docker entrypoint and kubernetes command directive, please refer this official doc once and for all to understand the difference if you have struggled earlier. At least I did find that confusing for some time 😥
In this method I did override only the entrypoint (python3 /kadalu/start.py
) specified in the docker image of kadalu-operator as below:
-> git diff manifests/kadalu-operator.yaml
diff --git a/manifests/kadalu-operator.yaml b/manifests/kadalu-operator.yaml
index 985a297..9fc372c 100644
--- a/manifests/kadalu-operator.yaml
+++ b/manifests/kadalu-operator.yaml
@@ -271,6 +271,9 @@ spec:
privileged: true
image: docker.io/kadalu/kadalu-operator:devel
imagePullPolicy: IfNotPresent
+ stdin: true
+ tty: true
+ command: ["python3", "-m", "pdb", "-c", "alias reg_sig tbreak /kadalu/kadalulib.py:322;; c;; signal.signal(signal.SIGUSR1, lambda *args:breakpoint());; c;;", "/kadalu/start.py"]
env:
- name: WATCH_NAMESPACE
valueFrom:
@@ -293,4 +296,4 @@ spec:
- name: K8S_DIST
value: "kubernetes"
- name: VERBOSE
- value: "no"
\ No newline at end of file
+ value: "yes"
Setting stdin
and tty
as true
is mandatory to communicate with pdb
, if these aren't set as soon as you hit (Pdb)
prompt, BadQuit
exception is raised and your container goes into failed state after multiple tries.
Here I'm not running any pdb commands but only registering an alias for reg_sig
and if we deploy above manifest it'll wait for input and we can add any breakpoints if we wish to retain them if init process gets restarted manually or automatically.
By looking at the source I know by the time kadalulib.py
is called signal
module is appended to sys.path
, just for convenience I'm registering the signal handler at this specific line. After deploying above manifest I'm just checking the alias and registering the signal handler.
Note:
'tbreak'
is called temporary break and gets cleared after it's hit for the first time,'kubectl attach'
operation is only performed onces until it times out, it's just being shown in the listings to refer to one of above mentioned panes.
-> kubectl apply -f manifests/kadalu-operator.yaml
[...]
-> kubectl logs deploy/operator -c kadalu-operator
> /kadalu/start.py(1)<module>()
-> import os
(Pdb)
-> kubectl attach deploy/operator -c kadalu-operator -it
If you don't see a command prompt, try pressing enter.
(Pdb) alias reg_sig
reg_sig = tbreak /kadalu/kadalulib.py:322;; c;; signal.signal(signal.SIGUSR1, lambda *args:breakpoint());; c;;
(Pdb) reg_sig
Breakpoint 1 at /kadalu/kadalulib.py:322
Deleted breakpoint 1 at /kadalu/kadalulib.py:322
> /kadalu/kadalulib.py(322)__init__()
-> if procs is not None:
<Handlers.SIG_DFL: 0>
[...]
Now we can send a USR1 signal to reach pdb prompt as below and look at stack frames.
You'll always be ending up reaching pdb prompt when the start process is sleeping, however knowing that start process is monitoring on children status I can set a breakpoint accordingly and kill the children to figure out any discrepancies while launching child process (if there are any).
-> kubectl exec deploy/operator -c kadalu-operator -it -- bash
root@operator-67b7866747-gv5tv:/# ps -ef | grep [s]tart.py
root 1 0 0 03:25 pts/0 00:00:00 python3 -m pdb -c alias reg_sig tbreak /kadalu/kadalulib.py:322;; c;; signal.signal(signal.SIGUSR1, lambda *args:breakpoint());; c;; /kadalu/start.py
root@operator-67b7866747-gv5tv:/# kill -USR1 1
root@operator-67b7866747-gv5tv:/#
-> kubectl attach deploy/operator -c kadalu-operator -it
If you don't see a command prompt, try pressing enter.
--Return--
> <stdin>(1)<lambda>()->None
(Pdb) w
/usr/lib/python3.8/runpy.py(194)_run_module_as_main()
-> return _run_code(code, main_globals, None,
/usr/lib/python3.8/runpy.py(87)_run_code()
-> exec(code, run_globals)
/usr/lib/python3.8/pdb.py(1731)<module>()
-> pdb.main()
/usr/lib/python3.8/pdb.py(1704)main()
-> pdb._runscript(mainpyfile)
/usr/lib/python3.8/pdb.py(1573)_runscript()
-> self.run(statement)
/usr/lib/python3.8/bdb.py(580)run()
-> exec(cmd, globals, locals)
<string>(1)<module>()
/kadalu/start.py(19)<module>()
-> main()
/kadalu/start.py(14)main()
-> mon.monitor()
/kadalu/kadalulib.py(388)monitor()
-> time.sleep(1)
> <stdin>(1)<lambda>()->None
(Pdb) u
> /kadalu/kadalulib.py(388)monitor()
-> time.sleep(1)
(Pdb) locals()
{'self': <kadalulib.Monitor object at 0x7f654cdee070>, 'terminating': False, '_': 'operator', 'state': <kadalulib.ProcState object at 0x7f654cc7eac0>}
(Pdb) p terminating
False
(Pdb) c
-> kubectl exec deploy/operator -c kadalu-operator -it -- bash
root@operator-67b7866747-gv5tv:/# ps -ef | grep [p]ython
root 1 0 0 03:25 pts/0 00:00:00 python3 -m pdb -c alias reg_sig tbreak /kadalu/kadalulib.py:322;; c;; signal.signal(signal.SIGUSR1, lambda *args:breakpoint());; c;; /kadalu/start.py
root 7 1 0 06:06 pts/0 00:00:01 python3 /kadalu/main.py
root@operator-67b7866747-gv5tv:/#
Using initContainers
and postStart
§
We are lazy (at least I'm at times 😅
) and don't want to make pdb wait for input we can make use of initContainers in conjunction with pdbrc to run commands on dropping into pdb prompt. On a clean k3d
cluster I've deployed kadalu-operator.yaml with below modifications:
-> git diff manifests/kadalu-operator.yaml
diff --git a/manifests/kadalu-operator.yaml b/manifests/kadalu-operator.yaml
index 985a297..8e93c82 100644
--- a/manifests/kadalu-operator.yaml
+++ b/manifests/kadalu-operator.yaml
@@ -271,6 +271,18 @@ spec:
privileged: true
image: docker.io/kadalu/kadalu-operator:devel
imagePullPolicy: IfNotPresent
+ command: ["python3", "-m", "pdb", "/kadalu/start.py"]
+ stdin: true
+ tty: true
+ lifecycle:
+ postStart:
+ exec:
+ command:
+ - '/usr/bin/sh'
+ - '-c'
+ - 'while ! ps -ef | grep [m]ain.py; do sleep 1;
+ echo Waiting for subprocess to start;
+ done && umount /root/.pdbrc'
env:
- name: WATCH_NAMESPACE
valueFrom:
@@ -293,4 +305,22 @@ spec:
- name: K8S_DIST
value: "kubernetes"
- name: VERBOSE
- value: "no"
\ No newline at end of file
+ value: "yes"
+ volumeMounts:
+ - name: pdbrc
+ mountPath: "/root/.pdbrc"
+ subPath: ".pdbrc"
+ initContainers:
+ - name: pdbrc
+ image: busybox
+ command:
+ - 'sh'
+ - '-c'
+ - 'printf "%s\n%s\n%s\n%s" "tbreak /kadalu/kadalulib.py:322" "continue"
+ "signal.signal(signal.SIGUSR1, lambda *args: breakpoint())" "continue" > /pdbrc/.pdbrc'
+ volumeMounts:
+ - name: pdbrc
+ mountPath: "/pdbrc"
+ volumes:
+ - name: pdbrc
+ emptyDir: {}
After looking at above, you might be of opinion of why to go through all this hassle rather than just import signal
and set a breakpoint()
in command
value itself.
However there might be scenarios where you would want to set an actual breakpoint using break
/tbreak
and ensure that's hit even after restart, in that case you need to use similar manifest as above as setting break
from -c
isn't working (as expected?).
Let's see what's going on in above manifest:
- Using an initContainer in conjunction with
emptyDir
I'm creating apdbrc
file - As
kadalu-operator
container starts after initContainer, I'm making.pdbrc
file available to it usingsubPath
andpdb
reads this rc file
Before explaining why lifecycle
stanza is required let me show you what happens if we miss this stanza, I applied manifest and sent a signal similar to what we have done in above method.
-> kubectl apply -f manifests/kadalu-operator.yaml
[...]
-> kubectl logs deploy/operator -c kadalu-operator
Breakpoint 1 at /kadalu/kadalulib.py:322
Deleted breakpoint 1 at /kadalu/kadalulib.py:322
<Handlers.SIG_DFL: 0>
[...]
-> kubectl exec deploy/operator -c kadalu-operator -it -- bash
root@operator-965866889-grl87:/# ps -ef | grep [p]ython
root 1 0 0 06:06 pts/0 00:00:00 python3 -m pdb /kadalu/start.py
root 7 1 3 06:06 pts/0 00:00:01 python3 /kadalu/main.py
root@operator-965866889-grl87:/# kill -USR1 1
-> kubectl attach deploy/operator -c kadalu-operator -it
If you don't see a command prompt, try pressing enter.
--Return--
Breakpoint 2 at /kadalu/kadalulib.py:322
Because of the presence of .pdbrc
, pdb prompt didn't wait for user input (as last command is continue
) and ran all the commands (can be referred from kubectl logs
), however here's a gotcha you can see whenever we send a kill -USR1 1
signal it's dropping in pdb and reading .pdbrc
and continuing the execution path, for that to not happen we need to perform below and lifecycle
stanza serves the same need:
-> kubectl exec deploy/operator -c kadalu-operator -it -- bash
root@operator-965866889-grl87:/# df -ha | grep pdbrc
tmpfs 3.9G 611M 3.3G 16% /root/.pdbrc
root@operator-965866889-grl87:/# umount /root/.pdbrc
root@operator-965866889-grl87:/# df -ha | grep pdbrc
root@operator-965866889-grl87:/# kill -USR1 1
After deploying the manifest with postStart
, whenever we send a USR1 signal we'll drop into pdb without any issues:
-> kubectl attach deploy/operator -c kadalu-operator -it
If you don't see a command prompt, try pressing enter.
--Return--
> <stdin>(1)<lambda>()->None
(Pdb) w
/usr/lib/python3.8/runpy.py(194)_run_module_as_main()
-> return _run_code(code, main_globals, None,
/usr/lib/python3.8/runpy.py(87)_run_code()
-> exec(code, run_globals)
/usr/lib/python3.8/pdb.py(1731)<module>()
-> pdb.main()
/usr/lib/python3.8/pdb.py(1704)main()
-> pdb._runscript(mainpyfile)
/usr/lib/python3.8/pdb.py(1573)_runscript()
-> self.run(statement)
/usr/lib/python3.8/bdb.py(580)run()
-> exec(cmd, globals, locals)
<string>(1)<module>()
/kadalu/start.py(19)<module>()
-> main()
/kadalu/start.py(14)main()
-> mon.monitor()
/kadalu/kadalulib.py(388)monitor()
-> time.sleep(1)
> <stdin>(1)<lambda>()->None
(Pdb) c
Debugging Kadalu Operator §
If you are wondering whether above methods have any connection with actual operator implementation or not then the answer would be, only a slight connection, it's just that start.py
launches main.py
that performs operator logic which is, the deployment of config map, storage classes, csi pods and watch event stream for any operations on kadalu storage CRD.
Until now, we are not touching python source but only adding few steps to deployment config. However there's limitation that we can't reach main.py
from start.py
due to below style of launching subprocess. Even if we make changes to this code and use send_signal
, it's tough to manipulate multiple child processes.
# sed -n '293,300p' lib/kadalulib.py
def start(self):
"""Start a Process"""
self.subproc = subprocess.Popen(
self.proc.with_args(),
stderr=sys.stderr,
universal_newlines=True,
env=os.environ
)
We'll just use above limitation to our advantage and set a signal handler on main.py
(kubectl cp
to container) itself to reach pdb prompt knowing that if we kill child monitor just restarts the child and automatically it picks up new file (similar to what was stated in previous post).
I applied manifests/kadalu-operator.yaml
without any changes and once all the below resources are created by kadalu-operator
...
-> kubectl get sc,sts,ds; k get cm kadalu-info;
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
storageclass.storage.k8s.io/kadalu.replica3 kadalu Delete Immediate true 8m32s
storageclass.storage.k8s.io/kadalu.replica2 kadalu Delete Immediate true 8m31s
storageclass.storage.k8s.io/kadalu.replica1 kadalu Delete Immediate true 8m30s
storageclass.storage.k8s.io/kadalu kadalu Delete Immediate true 8m29s
NAME READY AGE
statefulset.apps/kadalu-csi-provisioner 1/1 13m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kadalu-csi-nodeplugin 2 2 2 2 2 <none> 13m
NAME DATA AGE
kadalu-info 2 16m
... patch the deploy/operator resource to support stdin
and tty
using below manifest
-> bat /tmp/patch.yaml --plain
spec:
template:
spec:
containers:
- name: kadalu-operator
stdin: true
tty: true
-> kubectl patch deployments.apps operator --patch "$(cat /tmp/patch.yaml)"
[...]
Now, all the things are in place to trace the steps that operator performs, I made below changes and copied the source to kadalu-operator
container
# git diff operator/main.py
diff --git a/operator/main.py b/operator/main.py
index 0a13e27..7d1509d 100644
--- a/operator/main.py
+++ b/operator/main.py
@@ -905,5 +905,8 @@ if __name__ == "__main__":
# ignore these warnings as we know to make calls only inside of
# kubernetes cluster
urllib3.disable_warnings()
-
+ import signal
+ signal.signal(signal.SIGUSR1, lambda *args: breakpoint())
+ from time import sleep
+ sleep(10)
main()
-> kubectl cp operator/main.py $(kubectl get pods -l name=kadalu --no-headers | awk '{print $1}'):/kadalu/main.py
We can delete the resources created by the operator, kill main.py
to pick up changed file and before execution goes past sleep(10)
we need to send a USR1 signal as below
-> kubectl delete sc,sts,ds --all; k delete cm kadalu-info;
storageclass.storage.k8s.io "kadalu.replica3" deleted
storageclass.storage.k8s.io "kadalu.replica2" deleted
storageclass.storage.k8s.io "kadalu.replica1" deleted
storageclass.storage.k8s.io "kadalu" deleted
statefulset.apps "kadalu-csi-provisioner" deleted
daemonset.apps "kadalu-csi-nodeplugin" deleted
configmap "kadalu-info" deleted
-> kubectl get pods
NAME READY STATUS RESTARTS AGE
operator-559755484c-69hfq 1/1 Running 3 70m
-> kubectl exec deploy/operator -c kadalu-operator -it -- bash
root@operator-559755484c-69hfq:/# ps -ef | grep [m]ain.py | awk '{print $2}' | xargs kill
root@operator-559755484c-69hfq:/# ps -ef | grep [m]ain.py | awk '{print $2}' | xargs kill -USR1
Finally we are in pdb prompt and main.py
execution is stopped and refer below listing and inline comments to figure out the steps being performed by the operator
-> kubectl attach deploy/operator -c kadalu-operator -it
If you don't see a command prompt, try pressing enter.
--Return--
> /kadalu/main.py(909)<lambda>()->None
-> signal.signal(signal.SIGUSR1, lambda *args: breakpoint())
(Pdb) w
/kadalu/main.py(911)<module>()
-> sleep(10)
> /kadalu/main.py(909)<lambda>()->None
-> signal.signal(signal.SIGUSR1, lambda *args: breakpoint())
(Pdb) l .
904 # This not advised in general, but in kadalu's operator, it is OK to
905 # ignore these warnings as we know to make calls only inside of
906 # kubernetes cluster
907 urllib3.disable_warnings()
908 import signal
909 -> signal.signal(signal.SIGUSR1, lambda *args: breakpoint())
910 from time import sleep
911 sleep(10)
912 main()
[EOF]
# ----------------------------------------------------
# Setting temporary debug points at important methods
# just before deploying resources
# ----------------------------------------------------
(Pdb) tbreak deploy_config_map
Breakpoint 1 at /kadalu/main.py:806
(Pdb) tbreak deploy_csi_pods
Breakpoint 2 at /kadalu/main.py:773
(Pdb) tbreak deploy_storage_class
Breakpoint 3 at /kadalu/main.py:841
(Pdb) tbreak crd_watch
Breakpoint 4 at /kadalu/main.py:745
(Pdb) c
# -----------------------------------------------------------------
# First, an uid is generated and kadalu-info configmap is deployed
# All yaml files are pre-made using Jinja2 templating and values
# are filled before applying those manifests
# -----------------------------------------------------------------
Deleted breakpoint 1 at /kadalu/main.py:806
> /kadalu/main.py(809)deploy_config_map()
-> configmaps = core_v1_client.list_namespaced_config_map(
(Pdb) a
core_v1_client = <kubernetes.client.api.core_v1_api.CoreV1Api object at 0x7f2b8f93bf10>
(Pdb) tbreak 835
Breakpoint 5 at /kadalu/main.py:835
(Pdb) c
[...]
Deleted breakpoint 5 at /kadalu/main.py:835
> /kadalu/main.py(835)deploy_config_map()
-> if not upgrade:
(Pdb) upgrade
False
(Pdb) filename
'/kadalu/templates/configmap.yaml'
(Pdb) pp open(filename).read()
('# -*- mode: yaml -*-\n'
'---\n'
'kind: ConfigMap\n'
'apiVersion: v1\n'
'metadata:\n'
' name: kadalu-info\n'
' namespace: kadalu\n'
'data:\n'
' volumes: ""\n'
' uid: c00cb7a2-8196-46b3-aafa-7974f3c34666')
(Pdb) c
[...]
# ----------------------
# CSI pods are deployed
# ----------------------
Deleted breakpoint 2 at /kadalu/main.py:773
> /kadalu/main.py(778)deploy_csi_pods()
-> pods = core_v1_client.list_namespaced_pod(
(Pdb) tbreak 802
Breakpoint 6 at /kadalu/main.py:802
(Pdb) c
[...]
Deleted breakpoint 6 at /kadalu/main.py:802
> /kadalu/main.py(802)deploy_csi_pods()
-> lib_execute(KUBECTL_CMD, APPLY_CMD, "-f", filename)
(Pdb) filename
'/kadalu/templates/csi.yaml'
(Pdb) KUBECTL_CMD
'/usr/bin/kubectl'
(Pdb) APPLY_CMD
'apply'
(Pdb) c
[...]
# -------------------------------------------------------------
# All storage classes are deployed, currently Kadalu supports
# Replica1, Replica2, Replica3 natively and Disperse support is
# up for review and already supports external gluster cluster.
# -------------------------------------------------------------
Deleted breakpoint 3 at /kadalu/main.py:841
> /kadalu/main.py(844)deploy_storage_class()
-> api_instance = client.StorageV1Api()
(Pdb) tbreak 861
Breakpoint 7 at /kadalu/main.py:861
(Pdb) c
[...]
Deleted breakpoint 7 at /kadalu/main.py:861
> /kadalu/main.py(861)deploy_storage_class()
-> template(filename, namespace=NAMESPACE, kadalu_version=VERSION)
(Pdb) filename
'/kadalu/templates/storageclass-kadalu.replica3.yaml'
(Pdb) c
[...]
# -----------------------------------------------------------------
# After above resources are deployed operator waits for any
# requests on Kadalu Storage CRD and performs subsequent operations.
# If we didn't provide 'sleep' we'll be ending up at this point
# where operator just waits for request to be served.
# -----------------------------------------------------------------
Deleted breakpoint 4 at /kadalu/main.py:745
> /kadalu/main.py(749)crd_watch()
-> crds = client.CustomObjectsApi(k8s_client)
(Pdb) b
(Pdb) b 764
Breakpoint 8 at /kadalu/main.py:764
(Pdb) b
Num Type Disp Enb Where
8 breakpoint keep yes at /kadalu/main.py:764
Let's add some storage to kadalu and see how they are being handled in brief. I'm using paths on same node just as a demo. As you can see from below above breakpoint is hit when storage is being handed and we just need to watch kubernetes event stream (using python api), it's that simple 😄
-> bat ../storage-config-path-minimal.yaml --plain;
---
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
name: replica3
spec:
type: Replica3
storage:
- node: k3d-test-agent-0
path: /mnt/sdc
- node: k3d-test-agent-0
path: /mnt/sdd
- node: k3d-test-agent-0
path: /mnt/sde
-> kubectl apply -f ../storage-config-path-minimal.yaml
kadalustorage.kadalu-operator.storage/replica3 created
[...]
(Pdb) b
Num Type Disp Enb Where
8 breakpoint keep yes at /kadalu/main.py:764
(Pdb) c
> /kadalu/main.py(764)crd_watch()
-> logging.debug(logf("Event", operation=operation, object=repr(obj)))
(Pdb) operation
'ADDED'
(Pdb) c
[...]
Operator deploys one pod for every storage path specified above which acts as a backend gluster brick, config map is updated with volume (replica3
) info and upon receiving any request for PVC on this backend volume, a subdir corresponding to PVC will be created and served as storage for PV and this connects us back to previous post on CSI Driver
-> kubectl get pods
NAME READY STATUS RESTARTS AGE
operator-559755484c-69hfq 1/1 Running 0 18h
kadalu-csi-nodeplugin-tpxwz 3/3 Running 0 23m
kadalu-csi-nodeplugin-8lzzb 3/3 Running 0 23m
kadalu-csi-provisioner-0 5/5 Running 0 23m
server-replica3-1-0 1/1 Running 0 2m
server-replica3-2-0 1/1 Running 0 119s
server-replica3-0-0 1/1 Running 0 2m1s
-> kubectl describe cm kadalu-info
Name: kadalu-info
Namespace: kadalu
Labels: <none>
Annotations: <none>
Data
====
volumes:
----
# Broken into new lines for readability
replica3.info:
----
{'bricks': [{'brick_device': '',
'brick_device_dir': '',
'brick_index': 0,
'brick_path': '/bricks/replica3/data/brick',
'host_brick_path': '/mnt/sdc',
'kube_hostname': 'k3d-test-agent-0',
'node': 'server-replica3-0-0.replica3',
'node_id': '20c47b4c-ac91-11eb-aeca-3a728c8b6546',
'pvc_name': ''},
{'brick_device': '',
'brick_device_dir': '',
'brick_index': 1,
'brick_path': '/bricks/replica3/data/brick',
'host_brick_path': '/mnt/sdd',
'kube_hostname': 'k3d-test-agent-0',
'node': 'server-replica3-1-0.replica3',
'node_id': '20c480a6-ac91-11eb-aeca-3a728c8b6546',
'pvc_name': ''},
{'brick_device': '',
'brick_device_dir': '',
'brick_index': 2,
'brick_path': '/bricks/replica3/data/brick',
'host_brick_path': '/mnt/sde',
'kube_hostname': 'k3d-test-agent-0',
'node': 'server-replica3-2-0.replica3',
'node_id': '20c485b0-ac91-11eb-aeca-3a728c8b6546',
'pvc_name': ''}],
'kadalu_version': 'devel',
'namespace': 'kadalu',
'options': {},
'type': 'Replica3',
'volname': 'replica3',
'volume_id': '20c46b98-ac91-11eb-aeca-3a728c8b6546'}
uid:
----
c00cb7a2-8196-46b3-aafa-7974f3c34666
Events: <none>
Operator watches event stream for three operations as stated below and dispatches to the method which handles that operation
744 # sed -n '745,770p' operator/main.py
745 def crd_watch(core_v1_client, k8s_client):
746 """
747 Watches the CRD to provision new PV Hosting Volumes
748 """
749 crds = client.CustomObjectsApi(k8s_client)
750 k8s_watch = watch.Watch()
751 resource_version = ""
752 for event in k8s_watch.stream(crds.list_cluster_custom_object,
753 "kadalu-operator.storage",
754 "v1alpha1",
755 "kadalustorages",
756 resource_version=resource_version):
757 obj = event["object"]
758 operation = event['type']
759 spec = obj.get("spec")
760 if not spec:
761 continue
762 metadata = obj.get("metadata")
763 resource_version = metadata['resourceVersion']
764 logging.debug(logf("Event", operation=operation, object=repr(obj)))
765 if operation == "ADDED":
766 handle_added(core_v1_client, obj)
767 elif operation == "MODIFIED":
768 handle_modified(core_v1_client, obj)
769 elif operation == "DELETED":
770 handle_deleted(core_v1_client, obj)
Finally, most if not all the operations that we have seen from pdb can easily be viewed from logs however this'll come handy when you are debugging operator or any python application in general.
We'll be covering how kadalu manages/uses glusterfs underneath and provides a streamlined process for consuming storage in kubernetes workloads in a later post.
Summary §
At times pdb can be hard to manage however if you can pick one of the workflows that suits you and spend some time refining it, you'll be learning a lot and effectively reducing the feedback loop between development and testing cycles. One tip would be, if you are dropped into pdb when the main process is sleeping go up (u
) the stack and step (s
) into the process.
The concepts shared here can be used in any normal python program and I felt it'd be helpful to use these concepts in real-time to breakdown and understand complex piece of software one step at a time.
I hope this helped in cementing some of the operations performed by a typical kubernetes operator and motivate you to contribute to open source projects.
Send an email for any comments. Kudos for making it to the end. Thanks!