k8s IPv4/IPv6 dual stack

IPv6 support status in k8s

IPv6 support in k8s is governed by a dedicated feature gate, IPv6DualStack. From v1.15 through v1.20 the feature was alpha and disabled by default. Starting with v1.21 it is beta and enabled by default, supporting dual-stack pod and Service networking. In v1.23 it graduated to stable.

Configuring dual stack

During alpha the feature had to be enabled explicitly by passing --feature-gates="IPv6DualStack=true" to kube-apiserver, kube-controller-manager, kubelet, and kube-proxy; since beta it is on by default. The dual-stack flags for each component are listed below, followed by a kubeadm-based example.

kube-apiserver:

  • --service-cluster-ip-range=<IPv4 CIDR>,<IPv6 CIDR>

kube-controller-manager:

  • --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>
  • --service-cluster-ip-range=<IPv4 CIDR>,<IPv6 CIDR>
  • --node-cidr-mask-size-ipv4|--node-cidr-mask-size-ipv6: defaults to /24 for IPv4 and /64 for IPv6

kube-proxy:

  • --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>
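
When deploying with kubeadm, for instance, the same dual-stack CIDRs can be expressed in the networking section of the ClusterConfiguration. This is a minimal sketch; the CIDR values are illustrative and must be adapted to your environment:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  # the first CIDR of each pair decides the default address family
  serviceSubnet: 10.96.0.0/16,fd00:10:96::/112
  podSubnet: 10.244.0.0/16,fd00:10:244::/56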

Creating a k8s cluster with kind

Since a native IPv6 environment is hard to come by, kind can be used to quickly bring up a local single-node k8s cluster, and it supports IPv6 configuration.

Check whether the current system supports IPv6; an output of 0 means IPv6 is enabled:

$ sysctl -a | grep net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 0

To create an IPv6 single-stack k8s cluster, run:

cat > kind-ipv6.conf <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: ipv6-only
networking:
  ipFamily: ipv6
# networking:
#   apiServerAddress: "127.0.0.1"
#   apiServerPort: 6443
EOF
kind create cluster --config kind-ipv6.conf

To create an IPv4/IPv6 dual-stack cluster, run:

cat > kind-dual-stack.conf <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: dual-stack
networking:
  ipFamily: dual
# networking:
#   apiServerAddress: "127.0.0.1"
#   apiServerPort: 6443
EOF
kind create cluster --config kind-dual-stack.conf
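
Once the dual-stack cluster is up, a quick sanity check is to confirm that the node received both an IPv4 and an IPv6 pod CIDR. The node name below assumes kind's default naming for a cluster named dual-stack; expect output along these lines:

$ kubectl get node dual-stack-control-plane -o jsonpath='{.spec.podCIDRs[*]}'
10.244.0.0/24 fd00:10:244::/64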

Service networking

spec fields

Several fields were added to the k8s Service spec to support dual stack.

.spec.ipFamilies sets the address families. Once the Service has been created this value cannot be modified by hand, but it can be changed indirectly by modifying .spec.ipFamilyPolicy. It is an array that supports the following values:

  • ["IPv4"]
  • ["IPv6"]
  • ["IPv4","IPv6"] (dual stack)
  • ["IPv6","IPv4"] (dual stack)

The first element of this array determines the value of the .spec.clusterIP field.

.spec.clusterIPs: since the value of .spec.clusterIP is a single string, it cannot carry an IPv4 and an IPv6 address at the same time, so the .spec.clusterIPs field was added, whose value is an array of strings. For a headless Service the value of this field is None.

.spec.clusterIP: the value of this field stays consistent with the first element of .spec.clusterIPs.

.spec.ipFamilyPolicy supports the following values:

  • SingleStack: the default. The cluster IP is allocated from the first address family configured in .spec.ipFamilies; if .spec.ipFamilies is not specified, the address is allocated from the first CIDR configured in service-cluster-ip-range.
  • PreferDualStack: allocates cluster IPs for both families on a dual-stack cluster; if .spec.ipFamilies is not set, the cluster's default ipFamily is used.
  • RequireDualStack: allocates both an IPv4 and an IPv6 address. .spec.clusterIP takes the first element of .spec.clusterIPs. See the sketch after this list.
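
As a sketch of the RequireDualStack case (the Service name is made up for illustration, reusing the MyApp selector from the examples below):

apiVersion: v1
kind: Service
metadata:
  name: my-service-require-dual   # hypothetical name
spec:
  ipFamilyPolicy: RequireDualStack
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80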

Service configuration scenarios

First create the following Deployment so that the Services in the experiments below have pods to select.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: MyApp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: MyApp
  template:
    metadata:
      labels:
        app: MyApp
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

No address family specified

For a Service that specifies no address family information, the created Service gets .spec.ipFamilyPolicy: SingleStack, and its address is allocated from the first CIDR configured in service-cluster-ip-range. If that first CIDR is IPv6, the cluster IP allocated to the Service is an IPv6 address. Create the following Service:

apiVersion: v1
kind: Service
metadata:
  name: my-service
  labels:
    app: MyApp
spec:
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80

The following Service is generated:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: MyApp
  name: my-service
  namespace: default
spec:
  clusterIP: 10.96.80.114
  clusterIPs:
  - 10.96.80.114
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: MyApp
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Setting .spec.ipFamilyPolicy to PreferDualStack

Create the following Service:

apiVersion: v1
kind: Service
metadata:
  name: my-service-2
  labels:
    app: MyApp
spec:
  ipFamilyPolicy: PreferDualStack
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80

After submitting it, the following Service is generated. Note that the order of the addresses in .spec.clusterIPs matches the order of ipFamilies.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: MyApp
  name: my-service-2
  namespace: default
spec:
  clusterIP: 10.96.221.70
  clusterIPs:
  - 10.96.221.70
  - fd00:10:96::7d1
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: PreferDualStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: MyApp
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Looking at the Endpoints, the addresses in subsets are the pods' IPv4 addresses: the Endpoints address family follows the first entry of the Service's .spec.ipFamilies array.

apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: MyApp
  name: my-service-2
  namespace: default
subsets:
- addresses:
  - ip: 10.244.0.5
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-vrfps
      namespace: default
      resourceVersion: "16875"
      uid: 30a1d787-f799-4250-8c56-c96564ca9239
  - ip: 10.244.0.6
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-wgz72
      namespace: default
      resourceVersion: "16917"
      uid: 8166d43e-2702-45c6-839e-b3005f44f647
  - ip: 10.244.0.7
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-x4lt5
      namespace: default
      resourceVersion: "16896"
      uid: f9c2968f-ca59-4ba9-a69f-358c202a964b
  ports:
  - port: 80
    protocol: TCP

Next, specify the order of .spec.ipFamilies explicitly and observe the result. Create the following Service:

apiVersion: v1
kind: Service
metadata:
  name: my-service-3
  labels:
    app: MyApp
spec:
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
  - IPv6
  - IPv4
  selector:
    app: MyApp
  ports:
  - protocol: TCP
    port: 80

The Service generated in the cluster is shown below: the first entry of .spec.clusterIPs is the IPv6 address, and .spec.clusterIP is likewise the IPv6 address.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: MyApp
  name: my-service-3
  namespace: default
spec:
  clusterIP: fd00:10:96::c306
  clusterIPs:
  - fd00:10:96::c306
  - 10.96.147.82
  ipFamilies:
  - IPv6
  - IPv4
  ipFamilyPolicy: PreferDualStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: MyApp
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Looking at the Endpoints, the addresses in subsets are the pods' IPv6 addresses: again, the Endpoints address family follows the first entry of the Service's .spec.ipFamilies array.

apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: MyApp
  name: my-service-3
  namespace: default
subsets:
- addresses:
  - ip: fd00:10:244::5
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-vrfps
      namespace: default
      resourceVersion: "16875"
      uid: 30a1d787-f799-4250-8c56-c96564ca9239
  - ip: fd00:10:244::6
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-wgz72
      namespace: default
      resourceVersion: "16917"
      uid: 8166d43e-2702-45c6-839e-b3005f44f647
  - ip: fd00:10:244::7
    nodeName: dual-stack-control-plane
    targetRef:
      kind: Pod
      name: nginx-deployment-65b5dd4c68-x4lt5
      namespace: default
      resourceVersion: "16896"
      uid: f9c2968f-ca59-4ba9-a69f-358c202a964b
  ports:
  - port: 80
    protocol: TCP

Switching between single stack and dual stack

Although the .spec.ipFamilies field cannot be modified directly, .spec.ipFamilyPolicy can be, and that is enough to switch between single stack and dual stack.
To go from single stack to dual stack, change .spec.ipFamilyPolicy from SingleStack to PreferDualStack or RequireDualStack.
To go from dual stack back to single stack, change .spec.ipFamilyPolicy from PreferDualStack or RequireDualStack to SingleStack; .spec.ipFamilies is then automatically reduced to a single element, and .spec.clusterIPs likewise shrinks to one element.
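
A minimal sketch of both directions with kubectl patch, using the my-service Service created earlier:

# single stack -> dual stack
kubectl patch service my-service -p '{"spec":{"ipFamilyPolicy":"PreferDualStack"}}'
# dual stack -> single stack
kubectl patch service my-service -p '{"spec":{"ipFamilyPolicy":"SingleStack"}}'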

LoadBalancer Services

For a LoadBalancer Service on a single stack, the VIP is set in .status.loadBalancer.ingress. If the load balancer hands out dual-stack VIPs, presumably both addresses are written to .status.loadBalancer.ingress, with their order kept consistent with the Service's .spec.ipFamilies.
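
An illustrative sketch of what such a dual-stack status might look like; the VIPs below are documentation addresses, not real output:

status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10    # illustrative IPv4 VIP
    - ip: 2001:db8::10    # illustrative IPv6 VIP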

Pod networking

For pods to get IPv6 addresses, the container network plugin must support it. To expose dual-stack information, the .status.podIPs field was added, listing the IPv4 and IPv6 addresses assigned to the pod. The .status.podIP field stays consistent with the first element of the .status.podIPs array.

apiVersion: v1
kind: Pod
metadata:
  labels:
    k8s-app: kube-dns
    pod-template-hash: 558bd4d5db
  name: coredns-558bd4d5db-b2zbj
  namespace: kube-system
spec:
  containers:
  - args:
    - -conf
    - /etc/coredns/Corefile
    image: k8s.gcr.io/coredns/coredns:v1.8.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: coredns
    ports:
    - containerPort: 53
      name: dns
      protocol: UDP
    - containerPort: 53
      name: dns-tcp
      protocol: TCP
    - containerPort: 9153
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /ready
        port: 8181
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        memory: 170Mi
      requests:
        cpu: 100m
        memory: 70Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_BIND_SERVICE
        drop:
        - all
      readOnlyRootFilesystem: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/coredns
      name: config-volume
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-hgxnc
      readOnly: true
  dnsPolicy: Default
  enableServiceLinks: true
  nodeName: dual-stack-control-plane
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: coredns
  serviceAccountName: coredns
  terminationGracePeriodSeconds: 30
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  ...
  volumes:
  - configMap:
      defaultMode: 420
      items:
      - key: Corefile
        path: Corefile
      name: coredns
    name: config-volume
  - name: kube-api-access-hgxnc
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-01-16T12:51:24Z"
    status: "True"
    type: Initialized
  ...
  containerStatuses:
  - containerID: containerd://6da36ab908291ca1b4141a86d70f8c2bb150a933336d852bcabe2118aa1a3439
    image: k8s.gcr.io/coredns/coredns:v1.8.0
    imageID: sha256:296a6d5035e2d6919249e02709a488d680ddca91357602bd65e605eac967b899
    lastState: {}
    name: coredns
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-01-16T12:51:27Z"
  hostIP: 172.18.0.2
  phase: Running
  podIP: 10.244.0.3
  podIPs:
  - ip: 10.244.0.3
  - ip: fd00:10:244::3
  qosClass: Burstable
  startTime: "2022-01-16T12:51:24Z"

To obtain the IPv4 and IPv6 addresses inside a pod, .status.podIPs can be passed into the container as an environment variable via the downward API; the value read from the environment variable inside the pod has the form: 10.244.1.4,a00:100::4
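
A minimal sketch of that downward API wiring; the pod and environment variable names are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: pod-ips-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.14.2
    env:
    - name: POD_IPS               # hypothetical variable name
      valueFrom:
        fieldRef:
          fieldPath: status.podIPs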

k8s node

On the host, ip addr show eth0 shows the IP addresses assigned to the NIC.

# ip addr show eth0
23: eth0@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::2/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link
       valid_lft forever preferred_lft forever

The IPv6 address on the host NIC is reflected in the k8s Node's .status.addresses: the entries of type InternalIP include both the IPv4 and the IPv6 address, but there is no field here marking whether an address is IPv4 or IPv6.

apiVersion: v1
kind: Node
metadata:
  annotations:
    kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: dual-stack-control-plane
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    node-role.kubernetes.io/master: ""
    node.kubernetes.io/exclude-from-external-load-balancers: ""
  name: dual-stack-control-plane
spec:
  podCIDR: 10.244.0.0/24
  podCIDRs:
  - 10.244.0.0/24
  - fd00:10:244::/64
  providerID: kind://docker/dual-stack/dual-stack-control-plane
status:
  addresses:
  - address: 172.18.0.2
    type: InternalIP
  - address: fc00:f853:ccd:e793::2
    type: InternalIP
  - address: dual-stack-control-plane
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: 41152812Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 15936568Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 41152812Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 15936568Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2022-01-16T15:01:48Z"
    lastTransitionTime: "2022-01-16T12:50:54Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  ...
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - k8s.gcr.io/kube-proxy:v1.21.1
    sizeBytes: 132714699
  ...
  nodeInfo:
    architecture: amd64
    bootID: 7f95abb9-7731-4a8c-9258-4a91cdcfb2ca
    containerRuntimeVersion: containerd://1.5.2
    kernelVersion: 4.18.0-305.an8.x86_64
    kubeProxyVersion: v1.21.1
    kubeletVersion: v1.21.1
    machineID: 8f6a98bffc184893ab6bc260e705421b
    operatingSystem: linux
    osImage: Ubuntu 21.04
    systemUUID: f7928fdb-32be-4b6e-8dfd-260b6820f067
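
Because the family is not marked, the two InternalIP entries can only be told apart by their textual format. One way to list them:

$ kubectl get node dual-stack-control-plane -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
172.18.0.2 fc00:f853:ccd:e793::2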

Ingress

Ingress provides a disable-ipv6 switch to control whether IPv6 is enabled; IPv6 is on by default.

With IPv6 enabled, if the nginx ingress pod itself has no IPv6 address, the generated nginx configuration does not listen on the IPv6 ports:

listen 80  ;
listen 443 ssl http2 ;

If the nginx ingress pod does have an IPv6 address, the nginx configuration looks like this:

listen 80  ;
listen [::]:80 ;
listen 443 ssl http2 ;
listen [::]:443 ssl http2 ;
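
For ingress-nginx this switch is a key in the controller's ConfigMap. Below is a sketch that turns IPv6 off; the ConfigMap name and namespace follow a common ingress-nginx install layout and may differ in yours:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # depends on your install
  namespace: ingress-nginx
data:
  disable-ipv6: "true"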
