The new-generation Kubernetes monitoring model consists of a core metrics pipeline and a third-party (non-core) monitoring pipeline. The core metrics pipeline is made up of the kubelet, metrics-server, and the API exposed through the API server; it supplies core metrics such as cumulative CPU usage, real-time memory usage, Pod resource usage, and container disk usage. The third-party pipeline collects all kinds of metrics from the OS and serves them to end users, storage systems, and components such as the HPA.
A monitoring system collects two kinds of metrics: resource metrics and custom metrics. metrics-server implements the resource metrics API and provides the core metrics — cumulative CPU usage, real-time memory usage, Pod resource usage, and container disk usage. These metrics are gathered by the kubelet and metrics-server and exposed through the API server.
Prometheus is the provider of custom metrics. The data it collects must first be converted by kube-state-metrics and then exposed as a metrics API by k8s-prometheus-adapter before the Kubernetes cluster can consume it. Prometheus gathers all kinds of metrics from the system, processes them, and serves them to end users, storage systems, and the HPA; this data includes the core metrics as well as many non-core metrics.

The resource metrics API collects resource metrics, but it requires extending the API server: the aggregator joins metrics-server with the main API server, extending the API with the resource metrics endpoints (supported in 1.8+). Components such as kubectl top and the HPA depend on the resource metrics API (earlier versions depended on Heapster). The HPA can scale out or in based on CPU, memory, I/O, network connections, and other metrics (the early Heapster could only provide CPU and memory).

一、metrics-server

metrics-server runs as a Pod hosted on the Kubernetes cluster; kube-aggregator merges it with the main API server, thereby extending the API. It is the prerequisite for today's kubectl top and HPA. Deploy metrics-server as follows:

[root@k8s-master-dev metric-v0.3]# cat metrics-server.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.0
        imagePullPolicy: IfNotPresent
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp
---
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: 443
[root@k8s-master-dev metric-v0.3]# kubectl apply -f metrics-server.yaml
[root@k8s-master-dev metric-v0.3]# cd
[root@k8s-master-dev ~]# kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
batch/v1
batch/v1beta1
certificates.k8s.io/v1beta1
custom.metrics.k8s.io/v1beta1
events.k8s.io/v1beta1
extensions/v1beta1
metrics.k8s.io/v1beta1
networking.k8s.io/v1
policy/v1beta1
rbac.authorization.k8s.io/v1
rbac.authorization.k8s.io/v1beta1
scheduling.k8s.io/v1beta1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1
[root@k8s-master-dev ~]#
[root@k8s-master-dev ~]# kubectl top nodes
NAME             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master-dev   299m         3%     1884Mi          11%
k8s-node1-dev    125m         1%     4181Mi          26%
k8s-node2-dev    66m          3%     2736Mi          17%
k8s-node3-dev    145m         1%     2686Mi          34%
[root@k8s-master-dev metric-v0.3]# kubectl top pods
NAME      CPU(cores)   MEMORY(bytes)
mongo-0   12m          275Mi
mongo-1   11m          251Mi
mongo-2   8m           271Mi
[root@k8s-master-dev metric-v0.3]#
Once metrics-server is deployed, the metrics.k8s.io/v1beta1 API group appears in kubectl api-versions as shown above, and kubectl top can report resource usage for nodes and pods.
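kubectl top is just a client of the aggregated metrics.k8s.io API; you can also query that API directly. A quick sanity check (output abbreviated; mongo-0 is one of the pods shown above, assumed here to live in the default namespace):

# List node metrics through the aggregated API (group metrics.k8s.io):
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
# Metrics for a single pod:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/mongo-0"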
To install the latest version instead, clone the upstream repository and apply its 1.8+ manifests:

git clone      # repository URL omitted in the original
cd metrics-server/deploy/1.8+/
kubectl apply -f ./

If the metrics-server Pod starts normally but kubectl top node reports that metrics-server is unavailable, and kubectl logs metrics-server-* -n kube-system shows errors, the likely causes are: the ClusterRole rules in resource-reader.yaml are missing the namespaces permission, and the container spec in metrics-server-deployment.yaml is missing the following arguments, which skip TLS verification against the kubelet:

command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
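With the resource metrics API working, the HPA can consume these metrics end to end. A minimal sketch (myapp is a placeholder Deployment name, not an object from this cluster):

# Scale myapp between 2 and 5 replicas, targeting 80% average CPU:
kubectl autoscale deployment myapp --cpu-percent=80 --min=2 --max=5
# The TARGETS column shows live percentages once metrics start flowing:
kubectl get hpa -w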
二、Prometheus
The architecture diagram is as follows: (figure in the original post)

Prometheus pulls node information from the node_exporter on each node. node_exporter only aggregates node-level data; collecting anything else requires deploying a dedicated exporter. For Pods, Prometheus scrapes each Pod's metrics URL (see the annotation sketch after the namespace step below). Prometheus itself offers a RESTful PromQL interface where users can enter query expressions, but the Kubernetes API server cannot read those values directly, because the default data formats do not match: the data must first be processed and converted by kube-state-metrics, then read by k8s-prometheus-adapter and aggregated onto the API, at which point the cluster's API server can recognize it.

In short: deploy node_exporter on every node; Prometheus collects from each node_exporter, after which any of the data can be queried with PromQL; kube-state-metrics converts the data format; and k8s-prometheus-adapter exposes the converted data as the custom metrics API, aggregated onto the API server for consumers. The schematic is as follows: (figure in the original post)

Deploy Prometheus as follows:

1) Define the namespace

[root@k8s-master-dev prometheus]# cd k8s-prom/
[root@k8s-master-dev k8s-prom]#
[root@k8s-master-dev k8s-prom]# ls
k8s-prometheus-adapter  namespace.yaml  podinfo  README.md
kube-state-metrics      node_exporter   prometheus
[root@k8s-master-dev k8s-prom]# cat namespace.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: prom
[root@k8s-master-dev k8s-prom]# kubectl apply -f namespace.yaml
namespace/prom created
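For Prometheus to discover a Pod's metrics URL, scrape configs built on Kubernetes service discovery usually rely on Pod annotations. A hedged sketch of that convention, assuming the prometheus-cfg.yaml here uses the common prometheus.io/* relabeling rules (these annotation names are a community convention, not a Kubernetes built-in; verify against your own config):

# Fragment of a Pod template that opts in to scraping:
metadata:
  annotations:
    prometheus.io/scrape: "true"    # let Prometheus scrape this Pod
    prometheus.io/port: "9102"      # port serving the metrics (example value)
    prometheus.io/path: "/metrics"  # path of the exposition endpoint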
2) Deploy node_exporter
[root@k8s-master-dev k8s-prom]# cd node_exporter/
[root@k8s-master-dev node_exporter]# ls
node-exporter-ds.yaml  node-exporter-svc.yaml
[root@k8s-master-dev node_exporter]# vim node-exporter-ds.yaml
[root@k8s-master-dev node_exporter]# kubectl apply -f ./
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
[root@k8s-master-dev node_exporter]# kubectl get pods -n prom
NAME                             READY   STATUS    RESTARTS   AGE
prometheus-node-exporter-7729r   1/1     Running   0          17s
prometheus-node-exporter-hhc7f   1/1     Running   0          17s
prometheus-node-exporter-jxjcq   1/1     Running   0          17s
prometheus-node-exporter-pswbb   1/1     Running   0          17s
[root@k8s-master-dev node_exporter]# cd ..
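node_exporter listens on port 9100 of every node (matching the 9100/TCP service above). You can spot-check the raw text format Prometheus will scrape; run this on any node:

# First lines of the node-level metrics exposition:
curl -s http://localhost:9100/metrics | head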
3) Deploy Prometheus
[root@k8s-master-dev k8s-prom]# cd prometheus/
[root@k8s-master-dev prometheus]# ls
prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
[root@k8s-master-dev prometheus]# kubectl apply -f ./
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
[root@k8s-master-dev prometheus]# kubectl get all -n prom
NAME                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-node-exporter-7729r       1/1     Running   0          1m
pod/prometheus-node-exporter-hhc7f       1/1     Running   0          1m
pod/prometheus-node-exporter-jxjcq       1/1     Running   0          1m
pod/prometheus-node-exporter-pswbb       1/1     Running   0          1m
pod/prometheus-server-65f5d59585-5fj6n   1/1     Running   0          33s

NAME                               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
service/prometheus                 NodePort    10.98.96.66   <none>        9090:30090/TCP   34s
service/prometheus-node-exporter   ClusterIP   None          <none>        9100/TCP         1m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   4         4         4       4            4           <none>          1m

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-server   1         1         1            1           34s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-server-65f5d59585   1         1         1       34s
[root@k8s-master-dev prometheus]#
You can then query data with PromQL, as shown below: (screenshot in the original post)
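The web UI sits on NodePort 30090; the same PromQL can also be issued against Prometheus's HTTP API. A small sketch (run from any node; the node_cpu* metric name varies with node_exporter version, so adjust to what your instance actually exposes):

# 'up' is 1 for every target Prometheus scrapes successfully:
curl -s 'http://127.0.0.1:30090/api/v1/query?query=up'
# Instant vector math also works, e.g. non-idle CPU rate per instance:
curl -s 'http://127.0.0.1:30090/api/v1/query' \
  --data-urlencode 'query=sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)'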
4) Deploy kube-state-metrics

[root@k8s-master-dev prometheus]# cd ..
[root@k8s-master-dev k8s-prom]# cd kube-state-metrics/
[root@k8s-master-dev kube-state-metrics]# ls
kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
[root@k8s-master-dev kube-state-metrics]# kubectl apply -f ./
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@k8s-master-dev kube-state-metrics]#
[root@k8s-master-dev kube-state-metrics]# kubectl get all -n prom
NAME                                      READY   STATUS    RESTARTS   AGE
pod/kube-state-metrics-58dffdf67d-j4jdv   0/1     Running   0          34s
pod/prometheus-node-exporter-7729r        1/1     Running   0          3m
pod/prometheus-node-exporter-hhc7f        1/1     Running   0          3m
pod/prometheus-node-exporter-jxjcq        1/1     Running   0          3m
pod/prometheus-node-exporter-pswbb        1/1     Running   0          3m
pod/prometheus-server-65f5d59585-5fj6n    1/1     Running   0          2m

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/kube-state-metrics         ClusterIP   10.108.165.171   <none>        8080/TCP         35s
service/prometheus                 NodePort    10.98.96.66      <none>        9090:30090/TCP   2m
service/prometheus-node-exporter   ClusterIP   None             <none>        9100/TCP         3m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   4         4         4       4            4           <none>          3m

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-state-metrics   1         1         1            0           35s
deployment.apps/prometheus-server    1         1         1            1           2m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/kube-state-metrics-58dffdf67d   1         1         0       35s
replicaset.apps/prometheus-server-65f5d59585    1         1         1       2m
[root@k8s-master-dev kube-state-metrics]# cd ..
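Unlike node_exporter (machine metrics), kube-state-metrics turns Kubernetes object state into kube_* series on port 8080. A quick check from a cluster node, using the ClusterIP shown above (reachable as long as the node can route service IPs):

# Deployment-state series generated from the API objects themselves:
curl -s http://10.108.165.171:8080/metrics | grep '^kube_deployment' | head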
5) Deploy k8s-prometheus-adapter
[root@k8s-master-dev k8s-prom]# cd k8s-prometheus-adapter/
[root@k8s-master-dev k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
custom-metrics-apiserver-deployment.yaml.bak                        custom-metrics-config-map.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml
[root@k8s-master-dev k8s-prometheus-adapter]# grep secretName custom-metrics-apiserver-deployment.yaml
          secretName: cm-adapter-serving-certs

The adapter serves its API over TLS and expects a secret named cm-adapter-serving-certs, so create a serving certificate signed by the cluster CA:

[root@k8s-master-dev k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@k8s-master-dev pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
.....................+++
..........+++
e is 65537 (0x10001)
[root@k8s-master-dev pki]#
[root@k8s-master-dev pki]# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
[root@k8s-master-dev pki]# openssl x509 -req -in serving.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key
[root@k8s-master-dev pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created
[root@k8s-master-dev pki]# kubectl get secret -n prom
NAME                             TYPE                                  DATA   AGE
cm-adapter-serving-certs         Opaque                                2      9s
default-token-w4f44              kubernetes.io/service-account-token   3      8m
kube-state-metrics-token-dfcmf   kubernetes.io/service-account-token   3      4m
prometheus-token-4lb78           kubernetes.io/service-account-token   3      6m
[root@k8s-master-dev pki]#
[root@k8s-master-dev pki]# cd -
/root/manifests/prometheus/k8s-prom/k8s-prometheus-adapter
[root@k8s-master-dev k8s-prometheus-adapter]# ls custom-metrics-config-map.yaml
custom-metrics-config-map.yaml
[root@k8s-master-dev k8s-prometheus-adapter]# cat custom-metrics-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: prom
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container_name!="POD",namespace!="",pod_name!=""}'
      seriesFilters:
      - isNot: ^container_.*_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod_name:
            resource: pod
      name:
        matches: ^container_(.*)$
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_total$
      resources:
        template: <<.Resource>>
      name:
        matches: ""
        as: ""
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters:
      - isNot: .*_seconds_total
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
    - seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
      seriesFilters: []
      resources:
        template: <<.Resource>>
      name:
        matches: ^(.*)_seconds_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[5m])) by (<<.GroupBy>>)
[root@k8s-master-dev k8s-prometheus-adapter]# grep namespace custom-metrics-apiserver-deployment.yaml
  namespace: prom
[root@k8s-master-dev k8s-prometheus-adapter]# kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
deployment.apps/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
serviceaccount/custom-metrics-apiserver created
service/custom-metrics-apiserver created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
configmap/adapter-config created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
[root@k8s-master-dev k8s-prometheus-adapter]# kubectl get cm -n prom
NAME                DATA   AGE
adapter-config      1      21s
prometheus-config   1      21m
[root@k8s-master-dev k8s-prometheus-adapter]# kubectl get all -n prom
NAME                                           READY   STATUS    RESTARTS   AGE
pod/custom-metrics-apiserver-65f545496-2hfvb   1/1     Running   0          40s
pod/kube-state-metrics-58dffdf67d-j4jdv        0/1     Running   0          20m
pod/prometheus-node-exporter-7729r             1/1     Running   0          23m
pod/prometheus-node-exporter-hhc7f             1/1     Running   0          23m
pod/prometheus-node-exporter-jxjcq             1/1     Running   0          23m
pod/prometheus-node-exporter-pswbb             1/1     Running   0          23m
pod/prometheus-server-65f5d59585-5fj6n         1/1     Running   0          22m

NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/custom-metrics-apiserver   ClusterIP   10.100.7.28      <none>        443/TCP          41s
service/kube-state-metrics         ClusterIP   10.108.165.171   <none>        8080/TCP         20m
service/prometheus                 NodePort    10.98.96.66      <none>        9090:30090/TCP   22m
service/prometheus-node-exporter   ClusterIP   None             <none>        9100/TCP         23m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   4         4         4       4            4           <none>          23m

NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/custom-metrics-apiserver   1         1         1            1           42s
deployment.apps/kube-state-metrics         1         1         1            0           20m
deployment.apps/prometheus-server          1         1         1            1           22m

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/custom-metrics-apiserver-65f545496   1         1         1       42s
replicaset.apps/kube-state-metrics-58dffdf67d        1         1         0       20m
replicaset.apps/prometheus-server-65f5d59585         1         1         1       22m
[root@k8s-master-dev k8s-prometheus-adapter]#
[root@k8s-master-dev k8s-prometheus-adapter]# kubectl api-versions | grep custom
custom.metrics.k8s.io/v1beta1
[root@k8s-master-dev k8s-prometheus-adapter]#
三、Grafana
Grafana is a visualization panel with beautiful chart and layout rendering, a full-featured metrics dashboard, and a graph editor. It supports Graphite, Zabbix, InfluxDB, Prometheus, OpenTSDB, Elasticsearch, and more as data sources. It is far more powerful and flexible than Prometheus's built-in charting, with a rich plugin ecosystem. (We queried data with PromQL and displayed it on the Prometheus dashboard, but Prometheus's charting is clearly weak, so a third-party tool such as Grafana is usually used to present the data.)

Deploy Grafana:

[root@k8s-master-dev prometheus]# ls
grafana  k8s-prom
[root@k8s-master-dev prometheus]# cd grafana/
[root@k8s-master-dev grafana]# head -11 grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: prom
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
[root@k8s-master-dev grafana]# tail -2 grafana.yaml
    k8s-app: grafana
  type: NodePort
[root@k8s-master-dev grafana]# kubectl apply -f grafana.yaml
deployment.apps/monitoring-grafana created
service/monitoring-grafana created
[root@k8s-master-dev grafana]# kubectl get pods -n prom
NAME                                       READY   STATUS    RESTARTS   AGE
custom-metrics-apiserver-65f545496-2hfvb   1/1     Running   0          13m
kube-state-metrics-58dffdf67d-j4jdv        1/1     Running   0          32m
monitoring-grafana-ffb4d59bd-w9lg9         0/1     Running   0          8s
prometheus-node-exporter-7729r             1/1     Running   0          35m
prometheus-node-exporter-hhc7f             1/1     Running   0          35m
prometheus-node-exporter-jxjcq             1/1     Running   0          35m
prometheus-node-exporter-pswbb             1/1     Running   0          35m
prometheus-server-65f5d59585-5fj6n         1/1     Running   0          34m
[root@k8s-master-dev grafana]# kubectl get svc -n prom
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
custom-metrics-apiserver   ClusterIP   10.100.7.28      <none>        443/TCP          13m
kube-state-metrics         ClusterIP   10.108.165.171   <none>        8080/TCP         32m
monitoring-grafana         NodePort    10.100.131.108   <none>        80:42690/TCP     22s
prometheus                 NodePort    10.98.96.66      <none>        9090:30090/TCP   34m
prometheus-node-exporter   ClusterIP   None             <none>        9100/TCP         35m
[root@k8s-master-dev grafana]#
Using Grafana: the default username and password are both admin. After logging in, first add a data source. (If the Grafana web UI lets you operate without entering a username and password, GF_AUTH_ANONYMOUS_ENABLED is set to true in grafana.yaml, so anonymous users log in with the admin role; change it to false and run kubectl apply -f grafana.yaml again to fix this.)
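A sketch of the env entry in question inside the grafana.yaml container spec (GF_AUTH_ANONYMOUS_ENABLED is Grafana's standard environment-variable form of the [auth.anonymous] enabled setting):

env:
- name: GF_AUTH_ANONYMOUS_ENABLED
  value: "false"   # require a real login instead of anonymous access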
Specify the PromQL URL from which the Prometheus data source should pull (for example, the NodePort address shown above): (screenshot in the original post)

Then import dashboards (dashboards can be downloaded from https://grafana.com/dashboards).

(Addendum) I downloaded several Kubernetes-related dashboards from the Grafana site, as shown below: (screenshot in the original post) Importing the downloaded "Kubernetes cluster summary" dashboard into this environment's Grafana gives the following result: (screenshot in the original post) If a dashboard does not suit you, you can create or modify dashboards yourself.

Reposted from: https://blog.51cto.com/caiyuanji/2367673