
Quick Start

This document explains how to quickly run the AlterShield Operator service.

Getting Started

Before running the AlterShield Operator service, you need to read the following documents:

Running the Local Server

1. Install dependencies:

go mod tidy

2. Register the CRDs with the Kubernetes cluster:

kubectl apply -f config/crd/bases

3. Start the local service with the environment variable ENVIRONMENT set to DEV:

ENVIRONMENT=DEV make run
  • The service has started successfully when the log shows lines like the following:
{"level":"info","ts":"2023-05-10T14:44:33.604+0800","caller":"controller/controller.go:241","msg":"Starting workers","controller":"deployment","controllerGroup":"apps","controllerKind":"Deployment","worker count":5}
{"level":"info","ts":"2023-05-10T14:44:33.604+0800","caller":"controller/controller.go:241","msg":"Starting workers","controller":"changepod","controllerGroup":"app.ops.cloud.alipay.com","controllerKind":"ChangePod","worker count":20}
{"level":"info","ts":"2023-05-10T14:44:33.604+0800","caller":"controller/controller.go:241","msg":"Starting workers","controller":"pod","controllerGroup":"","controllerKind":"Pod","worker count":5}
{"level":"info","ts":"2023-05-10T14:44:33.605+0800","caller":"controller/controller.go:241","msg":"Starting workers","controller":"changeworkload","controllerGroup":"app.ops.cloud.alipay.com","controllerKind":"ChangeWorkload","worker count":5}
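If you want to check the startup automatically rather than by eye, the following Go sketch scans JSON log lines like the ones above and reports which of the four expected controllers (deployment, changepod, pod, changeworkload) have not yet logged "Starting workers". It assumes one JSON object per line, as in the sample output; the helper names startedControllers and missingControllers are hypothetical and not part of AlterShield.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry keeps only the fields of a "Starting workers" line that we need;
// the JSON keys mirror the sample log (note the space in "worker count").
type logEntry struct {
	Msg         string `json:"msg"`
	Controller  string `json:"controller"`
	WorkerCount int    `json:"worker count"`
}

// startedControllers maps each controller that logged "Starting workers"
// to its worker count. Lines that are not valid JSON are skipped.
func startedControllers(logs string) map[string]int {
	started := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal([]byte(sc.Text()), &e); err != nil {
			continue // not a JSON log line
		}
		if e.Msg == "Starting workers" {
			started[e.Controller] = e.WorkerCount
		}
	}
	return started
}

// missingControllers lists the expected controllers that have not started yet.
func missingControllers(logs string, expected []string) []string {
	started := startedControllers(logs)
	var missing []string
	for _, name := range expected {
		if _, ok := started[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Trimmed copies of three of the four startup lines shown above.
	logs := `{"level":"info","msg":"Starting workers","controller":"deployment","worker count":5}
{"level":"info","msg":"Starting workers","controller":"changepod","worker count":20}
{"level":"info","msg":"Starting workers","controller":"pod","worker count":5}`
	expected := []string{"deployment", "changepod", "pod", "changeworkload"}
	fmt.Println("missing:", missingControllers(logs, expected)) // prints: missing: [changeworkload]
}
```

In practice you would feed it the output of `ENVIRONMENT=DEV make run` instead of the embedded sample.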

Start the first test deployment

1. Label the default namespace so that it is controlled by AlterShield

kubectl label namespace default admission-webhook-altershield=enabled

The label has been applied when you see the following output:

namespace/default labeled
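The label opts a namespace in to AlterShield. Assuming the admission webhook is scoped with an equality-based namespaceSelector on this label (an assumption: this page does not show the webhook configuration), the matching rule reduces to the Go sketch below. namespaceSelected is a hypothetical helper, not AlterShield code.

```go
package main

import "fmt"

// Label key and value applied by the kubectl label command above.
const (
	webhookLabelKey   = "admission-webhook-altershield"
	webhookLabelValue = "enabled"
)

// namespaceSelected mimics an equality-based namespaceSelector: a namespace
// is in scope only if it carries the label with exactly the expected value.
// HYPOTHETICAL: an illustration, not the operator's webhook logic.
func namespaceSelected(nsLabels map[string]string) bool {
	return nsLabels[webhookLabelKey] == webhookLabelValue
}

func main() {
	def := map[string]string{"kubernetes.io/metadata.name": "default", webhookLabelKey: "enabled"}
	sys := map[string]string{"kubernetes.io/metadata.name": "kube-system"}
	fmt.Println(namespaceSelected(def), namespaceSelected(sys)) // prints: true false
}
```

Under this assumption, workloads in unlabeled namespaces are never seen by the webhook, which is why the label must be set before creating the test Deployment.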

2. Create a Deployment named sleep (config/samples/sleep.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  replicas: 5
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
        test: "123"
    spec:
      containers:
      - name: sleep
        image: busybox
        command: ["/bin/sleep","infinity"]
        imagePullPolicy: IfNotPresent

  • Apply the manifest:
kubectl apply -f config/samples/sleep.yaml
  • The Deployment has been created when you see the following output:
deployment.apps/sleep created

3. Check whether the deployment succeeded

kubectl get pods
  • The deployment has succeeded when all 5 sleep pods are Running:
NAME                     READY   STATUS    RESTARTS   AGE
sleep-5c698f4449-5m5g4   1/1     Running   0          2m
sleep-5c698f4449-ctfd5   1/1     Running   0          2m
sleep-5c698f4449-jkv5r   1/1     Running   0          2m
sleep-5c698f4449-rjgkn   1/1     Running   0          2m
sleep-5c698f4449-7q9q2   1/1     Running   0          2m
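Pods created by a Deployment are named `<deployment>-<pod-template-hash>-<suffix>`, so the listing can be grouped by ReplicaSet. The Go sketch below parses the default `kubectl get pods` table and counts pods per ReplicaSet and status; it is equally useful for the rollout and rollback steps later on. podKey, replicaSetOf and podCounts are hypothetical helpers. The human-readable table is not a stable interface, so a real tool would rather decode `kubectl get pods -o json`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// podKey groups pods by owning ReplicaSet and by the STATUS column.
type podKey struct {
	ReplicaSet string
	Status     string
}

// replicaSetOf drops the random suffix from a Deployment-managed pod name,
// e.g. "sleep-5c698f4449-5m5g4" -> "sleep-5c698f4449".
func replicaSetOf(pod string) string {
	if i := strings.LastIndex(pod, "-"); i > 0 {
		return pod[:i]
	}
	return pod
}

// podCounts parses the default `kubectl get pods` table and counts pods
// whose name starts with prefix, keyed by ReplicaSet and status.
func podCounts(output, prefix string) map[podKey]int {
	counts := make(map[podKey]int)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		// Columns: NAME READY STATUS RESTARTS AGE; skip header and short lines.
		if len(f) < 3 || f[0] == "NAME" || !strings.HasPrefix(f[0], prefix) {
			continue
		}
		counts[podKey{replicaSetOf(f[0]), f[2]}]++
	}
	return counts
}

func main() {
	out := `NAME                     READY   STATUS         RESTARTS   AGE
sleep-5c698f4449-5m5g4   1/1     Running        0          2m49s
sleep-6c55bbc8d6-m6g64   0/1     ErrImagePull   0          76s`
	// Map iteration order is random, so the lines may print in any order.
	for k, n := range podCounts(out, "sleep-") {
		fmt.Printf("%s %s: %d\n", k.ReplicaSet, k.Status, n)
	}
}
```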

Self-healing rollback

1. Edit sleep.yaml to set an invalid image (busybo instead of busybox) and change the test label to "456"

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
  labels:
    app: sleep
spec:
  replicas: 5
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
        test: "456"
    spec:
      containers:
      - name: sleep
        image: busybo
        command: ["/bin/sleep","infinity"]
        imagePullPolicy: IfNotPresent
  • Reapply sleep.yaml:
kubectl apply -f config/samples/sleep.yaml
  • The updated manifest has been accepted when you see the following output:
deployment.apps/sleep configured

2. Check the pod status of the sleep deployment again

kubectl get pods
  • At this point, 4 pods of the sleep deployment are Running and 3 are failing to pull the image (ErrImagePull or ImagePullBackOff). Because the new template references a nonexistent image and the Deployment uses a rolling update, only some of the pods are replaced before the rollout stalls:
NAME                     READY   STATUS             RESTARTS   AGE
sleep-5c698f4449-5m5g4   1/1     Running            0          2m49s
sleep-5c698f4449-ctfd5   1/1     Running            0          2m49s
sleep-5c698f4449-jkv5r   1/1     Running            0          2m49s
sleep-5c698f4449-rjgkn   1/1     Running            0          2m49s
sleep-6c55bbc8d6-m6g64   0/1     ErrImagePull       0          76s
sleep-6c55bbc8d6-nlzrf   0/1     ImagePullBackOff   0          76s
sleep-6c55bbc8d6-x7wvc   0/1     ErrImagePull       0          76s
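The 4 + 3 split follows from the Deployment's default RollingUpdate strategy (maxSurge: 25%, maxUnavailable: 25%, as shown in the Deployment output at the end of this page). Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down, so with 5 replicas at most 7 pods may exist and at least 4 must stay available. Since no new pod ever becomes ready, the controller stops at 4 old pods and 3 new ones. The Go sketch below reproduces that arithmetic; fenceposts and stalledRollout are hypothetical names, and stalledRollout is a simplified model of the stalled state, not the controller's code.

```go
package main

import "fmt"

// fenceposts resolves percentage maxSurge/maxUnavailable the way the
// Kubernetes Deployment controller does: surge rounds up, unavailable rounds
// down, and if both resolve to 0, maxUnavailable becomes 1.
func fenceposts(replicas, surgePct, unavailPct int) (surge, unavail int) {
	surge = (replicas*surgePct + 99) / 100 // ceil(replicas * surgePct / 100)
	unavail = replicas * unavailPct / 100  // floor(replicas * unavailPct / 100)
	if surge == 0 && unavail == 0 {
		unavail = 1
	}
	return surge, unavail
}

// stalledRollout models a rollout whose new pods never become ready (e.g. a
// bad image): old pods are scaled down only to the availability floor, and
// new pods fill the remaining surge budget, capped at the replica count.
func stalledRollout(replicas, surgePct, unavailPct int) (oldPods, newPods int) {
	surge, unavail := fenceposts(replicas, surgePct, unavailPct)
	maxTotal := replicas + surge
	oldPods = replicas - unavail // minimum available pods
	newPods = maxTotal - oldPods
	if newPods > replicas {
		newPods = replicas
	}
	return oldPods, newPods
}

func main() {
	s, u := fenceposts(5, 25, 25)
	o, n := stalledRollout(5, 25, 25)
	fmt.Printf("maxSurge=%d maxUnavailable=%d -> old=%d new=%d\n", s, u, o, n)
	// prints: maxSurge=2 maxUnavailable=1 -> old=4 new=3
}
```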

3. Observe the rollback status

  • Once pods have stayed abnormal for longer than the threshold (2 minutes by default), the AlterShield Operator automatically rolls the Deployment back to the previous healthy version, if one exists.
kubectl get pods
  • After the threshold has elapsed, all pods of the sleep deployment are Running again:
NAME                     READY   STATUS    RESTARTS   AGE
sleep-5c698f4449-5m5g4   1/1     Running   0          3m49s
sleep-5c698f4449-ctfd5   1/1     Running   0          3m49s
sleep-5c698f4449-jkv5r   1/1     Running   0          3m49s
sleep-5c698f4449-rjgkn   1/1     Running   0          3m49s
sleep-5c698f4449-7q9q2   1/1     Running   0          1m49s

4. Verify the rolled-back version

kubectl get deployment sleep -o yaml
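  • The pod template has been restored: the image is busybox again and the test label is "123". The kubectl.kubernetes.io/last-applied-configuration annotation still records the faulty manifest (image busybo), because that annotation is written by kubectl apply, not by the rollback: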
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"sleep"},"name":"sleep","namespace":"default"},"spec":{"replicas":5,"selector":{"matchLabels":{"app":"sleep"}},"template":{"metadata":{"labels":{"app":"sleep","test":"1233"}},"spec":{"containers":[{"command":["/bin/sleep","infinity"],"image":"busybo","imagePullPolicy":"IfNotPresent","name":"sleep"}]}}}}
  creationTimestamp: "2023-05-10T07:30:40Z"
  generation: 3
  labels:
    admission-webhook-altershield.antgroup.com/version: c6c45d23c098bdf181853a85b60b5d74
    altershield.defense.antgroup.com/defense-status: processed
    app: sleep
  name: sleep
  namespace: default
  resourceVersion: "174364"
  uid: 63e253a2-d18a-4100-b928-38e004263762
spec:
  progressDeadlineSeconds: 600
  replicas: 5
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: sleep
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        admission-webhook-altershield.antgroup.com/version: c6c45d23c098bdf181853a85b60b5d74
        app: sleep
        test: "123"
    spec:
      containers:
      - command:
        - /bin/sleep
        - infinity
        image: busybox
        imagePullPolicy: IfNotPresent
        name: sleep
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 5
  conditions:
  - lastTransitionTime: "2023-05-10T07:30:42Z"
    lastUpdateTime: "2023-05-10T07:30:42Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-10T07:30:40Z"
    lastUpdateTime: "2023-05-10T07:34:15Z"
    message: ReplicaSet "sleep-5c698f4449" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 3
  readyReplicas: 5
  replicas: 5
  updatedReplicas: 5
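The same check can be scripted. `kubectl get deployment sleep -o json` returns this object as JSON, and the Go sketch below decodes only the fields it needs and reports whether the pod template is back to image busybox with test label "123". The struct and the rolledBack helper are hypothetical; the sample input is a trimmed JSON excerpt of the Deployment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deployment models only the fields this check reads from
// `kubectl get deployment sleep -o json`; everything else is ignored.
type deployment struct {
	Spec struct {
		Template struct {
			Metadata struct {
				Labels map[string]string `json:"labels"`
			} `json:"metadata"`
			Spec struct {
				Containers []struct {
					Name  string `json:"name"`
					Image string `json:"image"`
				} `json:"containers"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

// rolledBack reports whether the pod template again carries the expected
// test label and uses wantImage for the named container.
func rolledBack(raw []byte, container, wantImage, wantTest string) (bool, error) {
	var d deployment
	if err := json.Unmarshal(raw, &d); err != nil {
		return false, err
	}
	if d.Spec.Template.Metadata.Labels["test"] != wantTest {
		return false, nil
	}
	for _, c := range d.Spec.Template.Spec.Containers {
		if c.Name == container {
			return c.Image == wantImage, nil
		}
	}
	return false, nil // container not found
}

func main() {
	// Trimmed JSON excerpt of the Deployment shown above.
	raw := []byte(`{"spec":{"template":{"metadata":{"labels":{"app":"sleep","test":"123"}},
"spec":{"containers":[{"name":"sleep","image":"busybox"}]}}}}`)
	ok, err := rolledBack(raw, "sleep", "busybox", "123")
	fmt.Println(ok, err) // prints: true <nil>
}
```

Checking the live template rather than the last-applied-configuration annotation matters here, since that annotation still describes the faulty manifest.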