# general
m
Out of curiosity, how do you handle monitoring in your Kubernetes clusters? What kind of tools / setup do you use?
r
I’m pretty happy with the obvious choice for Kubernetes monitoring: Prometheus deployed via prometheus-operator, including kube-state-metrics/node-exporter/prometheus-adapter for cluster info, and `ServiceMonitor`s (instances of a prometheus-operator CRD) to register our custom app metric endpoints. https://github.com/coreos/prometheus-operator or the full-blown setup with https://github.com/coreos/kube-prometheus
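Registering an app’s metrics endpoint is then just one more Pulumi resource. A minimal sketch, assuming a Service labeled `app: my-app` in the `default` namespace with a port named `http-metrics` (all names here are placeholders):
```
import * as k8s from "@pulumi/kubernetes";

// ServiceMonitor is a CRD installed by prometheus-operator, so it maps to
// apiextensions.CustomResource in Pulumi. Labels/ports are placeholders.
const appServiceMonitor = new k8s.apiextensions.CustomResource("my-app-metrics", {
    apiVersion: "monitoring.coreos.com/v1",
    kind: "ServiceMonitor",
    metadata: { namespace: "monitoring" },
    spec: {
        selector: { matchLabels: { app: "my-app" } },   // which Service to scrape
        namespaceSelector: { matchNames: ["default"] }, // where that Service lives
        endpoints: [{ port: "http-metrics", interval: "30s" }],
    },
});
```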
They use jsonnet to generate their manifests, which is somewhat similar to how Pulumi works, but without being a full programming language.
I’m thinking about porting some of that magic to Pulumi definitions so that there is a better option than using the Helm chart or this jsonnet setup for generating plain manifests.
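As a stopgap, Pulumi can at least adopt the jsonnet output directly. A sketch, assuming the kube-prometheus build has already rendered its YAML into a local `manifests/` directory (the path is an assumption):
```
import * as k8s from "@pulumi/kubernetes";

// Feed the generated manifests into Pulumi so it manages their lifecycle,
// even though they were authored by the jsonnet toolchain.
const kubePrometheus = new k8s.yaml.ConfigGroup("kube-prometheus", {
    files: ["manifests/*.yaml"],
});
```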
w
@rhythmic-finland-36256 Care to share the relevant Pulumi bits? I'm playing around with the prometheus-operator Helm chart and trying not to get confused by the messy history of the Helm chart vs. the CoreOS bits.
I want to get the adapter in there too
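I'm guessing the adapter would be its own chart next to the operator. A sketch, assuming the stable/prometheus-adapter chart; the Prometheus service name is a guess based on the `po` release name below:
```
import * as k8s from "@pulumi/kubernetes";

// prometheus-adapter ships as a separate chart in the stable repo.
// The URL points at the Prometheus service the operator creates; the
// exact service name depends on the release name and is an assumption.
const adapterChart = new k8s.helm.v2.Chart("prometheus-adapter", {
    repo: "stable",
    chart: "prometheus-adapter",
    namespace: "monitoring",
    values: {
        prometheus: {
            url: "http://po-prometheus-operator-prometheus.monitoring.svc",
            port: 9090,
        },
    },
});
```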
This is what I have so far:
```
import * as k8s from "@pulumi/kubernetes";

// `provider` (a k8s.Provider for the EKS cluster) and `k8sVersion`
// are defined in the cluster setup code, not shown here.

const monitoringNamespace = new k8s.core.v1.Namespace("monitoring", {
    metadata: { name: "monitoring" },
}, { provider });

// Default any chart resource without an explicit namespace into "monitoring".
function setMonitoringNamespace(obj: any) {
    if (obj.metadata && obj.metadata.namespace === undefined) {
        obj.metadata.namespace = "monitoring";
    }
}

const prometheusOperatorChart = new k8s.helm.v2.Chart("po", {
    repo: "stable",
    chart: "prometheus-operator",
    namespace: "monitoring",
    transformations: [setMonitoringNamespace],
    values: {
        // EKS doesn't expose the control plane, so skip those scrape targets.
        kubeControllerManager: { enabled: false },
        kubeEtcd: { enabled: false },
        kubeScheduler: { enabled: false },
        kubeTargetVersionOverride: k8sVersion,
        // The CRDs are expected to already exist, so the chart shouldn't create them.
        prometheusOperator: { createCustomResource: false },
    },
}, {
    dependsOn: monitoringNamespace,
    provider,
});
```
I'm also running on AWS EKS, which seems to complicate matters.
The last problem with Pulumi is that the grafana-test pod fails and Pulumi waits for it until the timeout hits.
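One possible workaround (a sketch, not verified against this chart: it assumes the test pod's name ends in `-test`, and relies on Pulumi's `pulumi.com/skipAwait` annotation) is another transformation so Pulumi doesn't block on the test pod:
```
// Tell Pulumi not to await readiness of helm test pods.
// The "-test" name suffix is an assumption about the chart's naming.
function skipAwaitOnTestPods(obj: any) {
    if (obj.kind === "Pod" && obj.metadata && obj.metadata.name &&
            obj.metadata.name.endsWith("-test")) {
        obj.metadata.annotations = {
            ...(obj.metadata.annotations || {}),
            "pulumi.com/skipAwait": "true",
        };
    }
}

// Then register it on the chart:
//   transformations: [setMonitoringNamespace, skipAwaitOnTestPods],
```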