Real-World Example: K8s Monitoring Session

Scenario

This guide is based on a real terminal session. The user started an orchestrator agent, created a specialist named k8s-monitor, and then asked operational questions (node count, resource usage, namespaces, and namespace events).

The session is a practical example of using make-agent for production-style diagnostics while keeping execution scope explicit and narrow.

Type your message. Press Ctrl-D or Ctrl-C to exit.

make-agent> let's create an agent to monitor k8s cluster
I've successfully created a **Kubernetes cluster monitoring agent** named `k8s-monitor` and tested it.

### Agent capabilities:
- `check-cluster-info`: Get basic cluster info (API server, CoreDNS status)
- `get-nodes`: List all nodes with status and details
- `get-pods-all-namespaces`: Show all pods across namespaces
- `get-events`: Show recent cluster events (per namespace or all)
- `describe-pod`: Get detailed info about a specific pod
- `get-resources`: Show CPU/memory usage for nodes/pods

### Test result:
The agent confirmed your cluster is healthy — 27 nodes in `Ready` state, control plane and CoreDNS running. It also provided node details (version, runtime, age).

Would you like me to:
- Add more monitoring tools (e.g., check deployments, services, PVCs)?
- Adjust the current tools or prompts?
- Run another health check (e.g., pods, events, resource usage)?

What happened in the session

  1. The orchestrator loaded with a small, explicit tool set: list_agent, validate_agent, create_agent, run_agent, plus utility tools.
  2. The user asked to create a Kubernetes monitoring specialist.
  3. The specialist was generated with focused tools such as get-nodes, get-events, describe-pod, and get-resources.
  4. The user performed read-oriented troubleshooting tasks and got structured summaries/tables.
  5. The agent also showed planning behavior before a costly query (asking for confirmation before execution).

Note: The value of this flow is not just convenience. It creates an auditable chain of actions: user request → named tool call → bounded command behavior.

Created agent example

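The following is the specialist Makefile definition from this session. You can copy it directly into a .mk file and run it with make-agent. As in any Makefile, each recipe line must begin with a tab character, so check that your editor has not converted the tabs to spaces.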

define SYSTEM_PROMPT
You are a Kubernetes cluster monitoring specialist. Your job is to help users inspect the health and status of their Kubernetes cluster using kubectl commands. Always ask for clarification if the user's request is ambiguous, and report errors clearly by starting your message with 'ERROR:' if something goes wrong.
endef

.PHONY: check-cluster-info get-nodes get-namespaces get-pods-all-namespaces get-events describe-pod get-resources

# <tool>
# Get basic cluster information
# </tool>
check-cluster-info:
	@kubectl cluster-info


# <tool>
# List all nodes in the cluster with their status
# </tool>
get-nodes:
	@kubectl get nodes -o wide


# <tool>
# List all namespaces in the cluster
# </tool>
get-namespaces:
	@kubectl get namespaces -o wide


# <tool>
# List all pods across all namespaces
# </tool>
get-pods-all-namespaces:
	@kubectl get pods --all-namespaces -o wide


# <tool>
# Show recent events in the cluster
# @param NAMESPACE string Namespace to show events from (use 'all' for all namespaces)
# </tool>
get-events:
	@if [ "$(NAMESPACE)" = "all" ]; then kubectl get events --all-namespaces --sort-by=".lastTimestamp"; else kubectl get events -n "$(NAMESPACE)" --sort-by=".lastTimestamp"; fi


# <tool>
# Get detailed information about a specific pod
# @param POD_NAME string Name of the pod
# @param NAMESPACE string Namespace where the pod is located
# </tool>
describe-pod:
	@kubectl describe pod "$(POD_NAME)" -n "$(NAMESPACE)"


# <tool>
# Show resource usage (CPU/memory) for nodes and pods
# @param TYPE string Resource type to show — 'nodes', 'pods', or 'all'
# </tool>
get-resources:
	@if [ "$(TYPE)" = "nodes" ]; then kubectl top nodes; elif [ "$(TYPE)" = "pods" ]; then kubectl top pods --all-namespaces; else kubectl top nodes && echo "--- PODS ---" && kubectl top pods --all-namespaces; fi
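
Before exposing a tool to an agent, it can help to see exactly which command a target will run for a given set of parameters. The following is a minimal sketch, assuming GNU make is installed; it writes a throwaway copy of the describe-pod target and uses make's `-n` (dry-run) flag, which prints the expanded recipe without executing it. The pod name `web-0` and namespace `default` are placeholder values, not values from the session.

```shell
# Write a one-target Makefile to a temporary file (illustrative copy of describe-pod).
tmp_mk="$(mktemp)"
printf 'describe-pod:\n\t@kubectl describe pod "$(POD_NAME)" -n "$(NAMESPACE)"\n' > "$tmp_mk"

# -n prints the recipe after variable expansion instead of running it,
# so kubectl does not need to be installed for this check.
dry_run="$(make -n -f "$tmp_mk" describe-pod POD_NAME=web-0 NAMESPACE=default)"
echo "$dry_run"
# prints: kubectl describe pod "web-0" -n "default"

rm -f "$tmp_mk"
```

The same dry run works against the full k8s-monitor.mk file, which makes it a cheap review step whenever a tool's recipe changes.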

Security benefits and guardrails

This session highlights three core security controls in make-agent:

1) User-limited tools (allowlist by design)

The model can only call tools that the user exposes in the Makefile and built-ins enabled for the run. In the log, the orchestrator did not have arbitrary shell access; it had specific orchestration tools. The k8s-monitor specialist also had a bounded set of cluster-observability tools.
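
Because the allowlist lives in the Makefile, it can be reviewed with ordinary text tools. The sketch below assumes that exposed targets are the ones immediately preceded by a `# <tool>` ... `# </tool>` comment block, as in the example above; it is an audit aid written for this guide, not a description of how make-agent itself parses the file. The `list_tools` function and the sample file contents are hypothetical.

```shell
# Hypothetical helper: print each target name that directly follows a "# </tool>" line.
list_tools() {
  awk '
    /^# <\/tool>/ { armed = 1; next }                 # closing marker seen, expect a target next
    armed && /^[A-Za-z0-9_.-]+:/ { sub(/:.*/, ""); print; armed = 0 }
  ' "$1"
}

# Demo on a small sample file: one annotated tool and one plain helper target.
sample="$(mktemp)"
printf '# <tool>\n# List all nodes\n# </tool>\nget-nodes:\n\t@kubectl get nodes -o wide\n\nhelper-not-exposed:\n\t@echo internal\n' > "$sample"

tools="$(list_tools "$sample")"
echo "$tools"
# prints: get-nodes

rm -f "$sample"
```

Comparing this list against the tools you intended to expose is a quick way to catch an accidentally annotated target during review.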

2) System prompt as policy boundary

The specialist prompt explicitly constrains behavior (for example: ask for clarification when a request is ambiguous, and start error messages with 'ERROR:'). This prompt acts as a policy layer that shapes decisions, response style, and error handling.
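
One practical effect of a fixed error format is that callers can detect failures mechanically. The following is a minimal sketch, assuming the agent's reply has already been captured as a string; the `is_agent_error` function and the sample reply are hypothetical, and only the 'ERROR:' prefix comes from the system prompt above.

```shell
# Hypothetical helper: succeed (exit 0) when a reply starts with the ERROR: prefix.
is_agent_error() {
  case "$1" in
    ERROR:*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Demo with an illustrative reply string (not taken from the session log).
reply='ERROR: namespace "staging" not found'
if is_agent_error "$reply"; then status="failed"; else status="ok"; fi
echo "$status"
# prints: failed
```

Since the prefix is enforced only by the prompt, treat this check as a convenience rather than a guarantee.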

3) Disable risky built-ins for production runs

For a safer Kubernetes specialist session, disable agent-creation and delegation built-ins so the runtime cannot create or launch additional specialists dynamically.

uv run make_agent -f k8s-monitor.mk --disable-builtin-tools all

# or disable only delegation/creation built-ins
uv run make_agent -f k8s-monitor.mk --disable-builtin-tools run_agent,create_agent

Warning: In sensitive environments, prefer a single-purpose specialist with only explicit kubectl tools and no built-in orchestration tools.

| Control | What it limits | Benefit |
| --- | --- | --- |
| Tool allowlist | Only declared tools are callable | Reduces blast radius and accidental misuse |
| System prompt rules | Agent behavior and decision style | More predictable and reviewable outputs |
| Specialist separation | Task scope per agent | Least-privilege architecture |
| Disable built-ins | Blocks dynamic agent creation/delegation paths | Lower risk in production troubleshooting sessions |

Tip: For sensitive environments, keep specialist tools read-only first. Add mutating tools only when truly needed, isolate them in separate specialists, and disable unneeded built-ins.
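
A simple text check can support the read-only rule during review. The sketch below flags recipe lines that call a kubectl verb that changes cluster state; the `find_mutating_kubectl` function is hypothetical, and the verb list is a non-exhaustive assumption (it does not cover, for example, `kubectl rollout` or `kubectl exec`), so treat a clean result as a hint rather than proof.

```shell
# Hypothetical helper: print lines that invoke a mutating kubectl verb.
# Exit status is 0 when at least one match is found and 1 when none are.
find_mutating_kubectl() {
  grep -nE 'kubectl[[:space:]]+(apply|create|delete|edit|patch|replace|scale|drain|cordon|uncordon|taint|label|annotate)([[:space:]]|$)' "$1"
}

# Demo on two throwaway Makefiles (contents are illustrative only).
safe_mk="$(mktemp)"
risky_mk="$(mktemp)"
printf 'get-nodes:\n\t@kubectl get nodes -o wide\n' > "$safe_mk"
printf 'delete-pod:\n\t@kubectl delete pod "$(POD_NAME)" -n "$(NAMESPACE)"\n' > "$risky_mk"

if find_mutating_kubectl "$safe_mk" > /dev/null; then safe_result="mutating"; else safe_result="read-only"; fi
if find_mutating_kubectl "$risky_mk" > /dev/null; then risky_result="mutating"; else risky_result="read-only"; fi
echo "safe file:  $safe_result"
echo "risky file: $risky_result"

rm -f "$safe_mk" "$risky_mk"
```

Running the function without redirecting its output also prints the matching line numbers, which is useful when reviewing a larger specialist file.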

How to apply this pattern

  1. Start with an orchestrator that can manage specialists, not perform broad direct actions.
  2. Create a domain specialist with minimal tools for one job (for example, cluster health checks only).
  3. Write a system prompt that encodes operational policy (clarify ambiguity, explicit error format, no hidden assumptions).
  4. Validate with realistic queries from your runbooks.
  5. Expand capabilities gradually, tool by tool, after review.

This gives you fast AI-assisted operations with explicit controls that are easy to reason about and audit.