
Workflows by John Pranay Kumar Reddy (4 workflows)

Free advanced

# Kubernetes RCA and alerting using Gemini, Loki, Prometheus, Slack

## Summary
This n8n workflow automates Kubernetes root cause analysis (RCA) and incident alerting by integrating Loki, Prometheus, and Slack. It streamlines log collection, cluster health monitoring, and AI-driven RCA with Gemini, saving DevOps teams hours of manual troubleshooting. Designed for production-grade Kubernetes environments, this plug-and-play workflow delivers actionable insights directly to your Slack channels.

## Who’s It For
- 🛠️ DevOps engineers automating Kubernetes monitoring and incident response.
- 🔍 Site reliability engineers (SREs) aiming to reduce mean time to resolution (MTTR).
- 🚀 Teams using n8n, Slack, Loki, and Prometheus for observability and automation.

## What It Does
- Pulls logs from Loki, filtering for critical errors.
- Monitors cluster health, checking for pending, not-ready, or restarting pods.
- Fetches Prometheus metrics for nodes, namespaces, and services.
- Generates AI-driven RCA using Google Gemini to analyze logs and metrics.
- Sends formatted Slack alerts with deduplication and batching for readability.
- Tracks pod lifecycle events, including crashloop states and restarts.

## How It Works
- **Log collection:** An n8n HTTP Request node pulls logs from Loki, filtering for critical errors.
- **Cluster monitoring:** A Kubernetes SSH node checks pod states (terminated or running); terminated pods are excluded from RCA.
- **Metrics fetching:** Prometheus metrics for nodes and services are retrieved via HTTP Request.
- **RCA generation:** Logs and metrics are merged into structured prompts for Google Gemini to produce actionable RCA insights.
- **Kubernetes enrichment:** The RCA from Gemini is passed to this node, which enriches it with Kubernetes documentation and includes the documentation links in the RCA.
- **Slack notifications:** Alerts are batched, deduplicated, and sent to Slack with clear formatting.

## How to Set Up
- **Import the workflow:** Download the .json file and import it into your n8n instance.
### Configure Credentials
- Add your Loki, Prometheus, and Slack API credentials in the respective nodes (use the n8n Credentials Manager if required).
- Set up Google Gemini credentials for RCA generation.
- **Test the workflow:** Run it manually to ensure logs, metrics, and alerts are processed correctly.
- **Schedule execution:** Use n8n’s Schedule Trigger to run the workflow at desired intervals (e.g., every 5 minutes).

## Requirements
- 🌐 n8n Kubernetes node installed (self-hosted only; see the n8n documentation).
- 🔑 Access to your Kubernetes clusters and API.
- 📊 Loki and Prometheus set up for log and metrics collection.
- 💬 Slack workspace with webhook access for notifications.
- 🤖 Google Gemini API key for RCA generation.

## How to Customize the Workflow
- **Adjust filters:** Modify the Loki query in the HTTP Request node to focus on specific error patterns.
- **Change alert frequency:** Update the Schedule Trigger node for different intervals.
- **Add more metrics:** Extend the Prometheus node to include additional metrics like CPU or memory usage.
- **Customize Slack alerts:** Edit the Slack node to change message formatting or add more channels.
- **Enhance RCA:** Adjust the Gemini prompt in the Set Fields node for more specific insights.

## 🗂️ Category
DevOps / Monitoring & Observability / Kubernetes / AI

## 🏷️ Tags
kubernetes, prometheus, slack, alerting, sre, ops, kube-state-metrics, Gemini, AI

## Slack Output
![image.png](fileId:2251)
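The RCA generation step merges logs and metrics into a single structured prompt before handing it to Gemini. A minimal Code-node-style sketch of that merge, assuming hypothetical item shapes (`namespace`/`pod`/`line` for logs, `metric`/`value`/`pod` for metrics) that are illustrative, not the template's actual field names:

```javascript
// Sketch only: merge Loki log lines and Prometheus metrics into one
// structured prompt for the Gemini node. Field names are assumptions.
function buildRcaPrompt(logs, metrics) {
  const logSection = logs
    .map((l) => `[${l.namespace}/${l.pod}] ${l.line}`)
    .join("\n");
  const metricSection = metrics
    .map((m) => `${m.metric}=${m.value} (${m.pod})`)
    .join("\n");
  return [
    "You are a Kubernetes SRE. Produce a root cause analysis.",
    "## Error logs",
    logSection,
    "## Metrics",
    metricSection,
  ].join("\n");
}
```

In the actual workflow, the equivalent prompt assembly lives in the Set Fields node feeding the Gemini node; editing it there is how you steer the RCA output.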

John Pranay Kumar Reddy · AI Summarization · 27 Aug 2025
Free advanced

# Monitor Kubernetes services & pods with Prometheus and send alerts to Slack

#### 🧩 Short Summary
Proactively alerts on service endpoint changes and pod/container issues (Pending, Not Ready, restart spikes) using Prometheus metrics, formatted and sent to Slack.

#### 🗂️ Category
DevOps / Monitoring & Observability

#### 🏷️ Tags
kubernetes, prometheus, slack, alerting, sre, ops, kube-state-metrics

#### ✅ Prerequisites
- Prometheus scraping kube-state-metrics v2.x.
- Slack App or Incoming Webhook (channel access).
- n8n instance with outbound access to Prometheus and Slack.

#### 🔑 Required Credentials in n8n
- Slack: Bot OAuth (chat:write) or Incoming Webhook URL.
- (Optional) Prometheus Basic Auth, if your Prometheus requires it.

#### 🧠 What This Template Does
- Detects pods stuck in Pending (scheduling problems such as taints, affinity, or capacity).
- Detects containers that are Not Ready (readiness probe failures).
- Detects container restart spikes over a sliding window (default 5 minutes).
- Detects service discovery changes (endpoint count diffs, current vs. previous snapshot).
- Sends clean, emoji-enhanced Slack alerts with pod/namespace/service context.
- Outputs a 5-minute summary block to reduce noise.

#### 📣 Slack Message Style (examples)
![image.png](fileId:2165) ![image.png](fileId:2164)
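The service-discovery check compares the current endpoint-count snapshot against the previous one and emits a change event per service. A hedged sketch of that diff, assuming snapshots are plain service-to-count maps (an assumption for illustration, not the template's actual data shape):

```javascript
// Sketch only: diff two endpoint-count snapshots. A service missing
// from the previous snapshot is treated as having had 0 endpoints.
function diffEndpoints(previous, current) {
  const changes = [];
  for (const [service, count] of Object.entries(current)) {
    const before = previous[service] ?? 0;
    if (count !== before) {
      changes.push({ service, before, after: count });
    }
  }
  return changes;
}
```

Each change object maps naturally onto one Slack alert line (service, old count, new count), which is roughly what the template's summary block aggregates.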

John Pranay Kumar Reddy · DevOps · 21 Aug 2025
Free advanced

# Send real-time Kubernetes (EKS/GKE/AKS) CPU spike alerts from Prometheus to Slack

## 🧾 Summary
This workflow monitors Kubernetes pod CPU usage with Prometheus and sends real-time Slack alerts when CPU consumption crosses a threshold (e.g., 0.8 cores). It groups pods by application name to reduce noise and improve clarity, making it ideal for observability across multi-pod deployments such as Argo CD, Loki, and Promtail.

## 👥 Who’s it for
Designed for DevOps, SRE, and platform teams, this workflow is 100% no-code, plug-and-play, and can easily be extended to cover memory, disk, or network spikes. It eliminates the need for Alertmanager by routing critical alerts directly into Slack using native n8n nodes.

## ⚙️ What it does
This n8n workflow polls Prometheus every 5 minutes ⏱️, checks whether any pod's CPU usage crosses a defined threshold (e.g., 0.8 cores) 🚨, groups the offenders by app 🧩, and sends structured alerts to a Slack channel 💬.

## 🛠️ How to set up
- 🔗 Set your Prometheus URL with the required metrics (container_cpu_usage_seconds_total, kube_pod_container_resource_limits).
- 🔐 Add your Slack bot token with the chat:write scope.
- 🧩 Import the workflow and customize the threshold (e.g., 0.8 cores), Slack channel, and cron schedule.

## 📋 Requirements
- A working Prometheus stack with kube-state-metrics
- Slack bot credentials
- n8n instance (self-hosted or cloud)

## 🧑‍💻 How to customize
- 🧠 Adjust threshold values or the query interval.
- 📈 Add memory, disk, or network usage metrics.

💡 This is a plug-and-play Kubernetes alerting template for real-time observability.

## 🏷️ Tags
Prometheus, Slack, Kubernetes, Alert, n8n, DevOps, Observability, CPU Spike, Monitoring

## Prometheus Spike Alert to Slack
![image.png](fileId:1992)
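The threshold-plus-grouping step described above can be sketched as a small Code-node-style function. The replica-suffix regex and the pods' field names here are assumptions for illustration, not the template's actual logic:

```javascript
// Sketch only: keep pods above the CPU threshold and bucket them by
// app name, assumed here to be the pod name minus one or two replica
// suffixes (e.g. "api-7d9f-abc" -> "api"). Shapes are assumptions.
function groupSpikes(pods, threshold) {
  const groups = {};
  for (const { pod, cpu } of pods) {
    if (cpu < threshold) continue; // below threshold: no alert
    const app = pod.replace(/(-[a-z0-9]+){1,2}$/, "");
    (groups[app] ??= []).push({ pod, cpu });
  }
  return groups;
}
```

One Slack message per group (rather than per pod) is what keeps a 20-replica deployment from producing 20 separate alerts.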

John Pranay Kumar Reddy · DevOps · 8 Aug 2025
Free intermediate

# Send deduplicated Kubernetes (EKS/GKE/AKS) error logs from Grafana Loki to Slack

## ✨ Summary
Efficiently monitor Kubernetes environments by sending only unique error logs from Grafana Loki to Slack. Reduces alert fatigue while keeping your team informed about critical log events.

## 🧑‍💻 Who’s it for
- DevOps or SRE engineers running EKS/GKE/AKS
- Anyone using Grafana Loki and Promtail for centralized logging
- Teams that want Slack alerts but hate alert spam

## 🔍 What it does
This n8n workflow queries your Loki logs every 5 minutes, filters only the critical ones (error, timeout, exception, etc.), removes duplicate alerts within the batch, and sends clean alerts to a Slack channel with full metadata (pod, namespace, node, container, log, timestamp).

## 🧠 How it works
#### 🕒 Schedule Trigger
Runs every 5 minutes (customizable).
#### 🌐 Loki HTTP Query
Pulls logs from the last 10 minutes, matching keywords such as error, failed, and oom.
#### 🧹 Log Parsing
Extracts log fields (pod, container, etc.) and skips empty or malformed results.
#### 🧠 Deduplication
Removes repeated error messages within the query window.
#### 📤 Slack Notification
Sends a nicely formatted message to Slack.

## ⚙️ Requirements

| Tool | Notes |
| --- | --- |
| Loki | Exposed internally or externally |
| Slack App | With chat:write OAuth |
| n8n | Cloud or self-hosted |

## 🔧 How to Set It Up
Import the JSON file into n8n, then update:
- Loki API URL (e.g., http://loki-gateway.monitoring.svc.cluster.local)
- Slack Bearer Token (via credentials)
- Target Slack channel (e.g., #k8s-alerts)
- (Optional) Keywords in the query regex

Then activate the workflow and ensure the n8n pod/container has access to your Kubernetes cluster, pods, and namespaces.

## 🛠 How to Customize
- Want more or fewer keywords? Adjust the regex in the Query Loki for Error Logs node.
- Need stricter deduplication? Enhance the Remove Duplicate Alerts node.
- Want 5-log summaries every 5 minutes? Fork this and add a Batch + Slack group sender.

## Grafana Loki Logs to Slack Output
![image.png](fileId:1946)
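The deduplication step is the core of this template. A minimal sketch of what the Remove Duplicate Alerts node does, assuming each alert carries `namespace`, `pod`, and `message` fields (an assumed shape, not the node's exact implementation):

```javascript
// Sketch only: drop repeated error messages within one query window,
// keyed on namespace + pod + message. First occurrence wins.
function dedupe(alerts) {
  const seen = new Set();
  return alerts.filter((a) => {
    const key = `${a.namespace}/${a.pod}:${a.message}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Note the key deliberately includes the pod name, so the same error from two different pods still produces two alerts; loosen the key if you want cluster-wide dedup.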

John Pranay Kumar Reddy · DevOps · 6 Aug 2025