commit 17cb564448d90c013f3ffc3fe3b1569358b759a0
Author: lolwierd
Date:   Fri Apr 24 13:46:01 2026 +0530

    init

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..18ea1ca
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,24 @@
+# macOS
+.DS_Store
+
+# Editors
+.vscode/
+.idea/
+*.swp
+*~
+
+# Agent / local installs
+node_modules/
+.npm/
+.claude/
+.agents/
+.augment/
+
+# Local builds / tmp
+*.log
+tmp/
+build/
+
+# Env files (never commit)
+.env
+.env.*
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..8563a68
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,104 @@
+# AGENTS.md
+
+Guidance for AI coding agents (Claude Code, Cursor, Codex, opencode, ...) and human contributors editing this repository. This file is loaded automatically by agents that honour `AGENTS.md` conventions.
+
+## What this repo is
+
+A collection of [agent skills](https://agentskills.io/) for working with Excloud. Skills are `SKILL.md` playbooks that an agent loads on demand; they are prose, not code. They encode Excloud-specific knowledge (auth, safety rails, command syntax, error recovery) that the agent would otherwise have to rediscover each run.
+
+Skills are installed with [`npx skills add`](https://www.npmjs.com/package/skills); see `README.md` for user-facing install instructions.
+
+## Directory layout
+
+```
+skills/
+  /                   # kebab-case, matches the `name:` in SKILL.md frontmatter
+    SKILL.md          # required; the playbook
+    scripts/          # optional; executable helpers the skill references
+      *.sh            # prefer bash; mark chmod +x
+    references/       # optional; long-form docs the skill links to
+      *.md
+```
+
+One skill per directory. Do not put multiple skills' playbooks in the same `SKILL.md`.
+
+## SKILL.md conventions
+
+### Frontmatter
+
+```yaml
+---
+name:
+description:
+---
+```
+
+- `name` must match the parent directory name and be unique across the repo.
+- `description` is what the agent sees before it decides to load the skill — write it as a trigger hint. Name the resource or verb, the pain point it addresses, and ideally a couple of trigger phrases.
+- Optional: `metadata.internal: true` hides the skill from default discovery; users opt in with `INSTALL_INTERNAL_SKILLS=1`.
+
+### Body
+
+Write for an agent that has never seen this surface. The best skills:
+
+- Tell the agent what to **discover before acting** (`list` / `get` / `--help`) rather than hard-coding IDs, flags, or versions.
+- Call out destructive operations explicitly and tell the agent to confirm first.
+- State the auth model once, plus what "not authenticated" looks like.
+- Capture real error strings the CLI or API emits, paired with the actual cause and fix. These are pure gold — they turn agent flailing into one-shot recovery.
+- Document output shapes (table vs. JSON vs. Go-struct dump) so the agent picks the right parser (`awk`, `jq`, or "don't parse this").
+- Prefer "here is the shape you'll see" over "here is the schema" when the surface shifts often.
+- Open with a disclaimer that the installed tool's `--help` is canonical and this file can drift. The agent should trust the tool over the skill when they disagree.
+
+Avoid:
+
+- Hard-coded account / org / resource IDs.
+- Personal paths like `~/Projects/…` or `/Users//…` (those leak into generated commands).
+- Source-tree pointers (consumers of the skill won't have the repo checked out).
+- Any secret material (tokens, keys, passwords) — even in examples.
+- `"last updated"` timestamps or version numbers in the body; they rot fast.
+
+### Length
+
+Keep `SKILL.md` under ~400 lines when you can. If a topic needs more, split it into a `references/.md` and link to it from `SKILL.md`.
+
+## Adding a new skill
+
+1. `mkdir -p skills/` (kebab-case; matches `name:` in frontmatter).
+2. Create `skills//SKILL.md` with the frontmatter above.
+3. Verify the skill loads cleanly:
+   ```bash
+   # list only — verifies the skill is discoverable without touching your agent dirs
+   npx skills add ./ --list
+
+   # real install from the local checkout
+   npx skills add ./ --skill
+   ```
+4. Update the "Available skills" section of `README.md` with a short blurb and the main use-cases.
+5. Open a PR. One skill per PR keeps review easy.
+
+## Editing an existing skill
+
+- Changes to `SKILL.md` should be reviewable as prose — small, focused edits, with the commit message explaining the _intent_ (e.g. "document --download flag on compute scp" beats "update skill").
+- If you verified a new behaviour against a live tool, mention the tool version or the date of verification in the PR description (not in the skill body).
+- If a section becomes stale because the upstream tool's surface shifted, prefer rewriting it to point the agent at `--help` rather than chasing every flag change.
+
+## Testing
+
+There is no build step. A "test" is: does an agent loaded with this skill do the right thing on a representative prompt? A useful flow:
+
+1. Install the branch locally: `npx skills add --skill -a claude-code`.
+2. Ask the agent to do something the skill targets (create a VM, fetch a kubeconfig, etc.).
+3. Watch for: does it discover IDs from `list`? Does it confirm before destructive ops? Does it recognise real error strings?
+
+If all three feel right, the skill is doing its job.
+
+## Style
+
+- Markdown only. Plain prose; lists for flag enumerations; fenced code blocks for commands.
+- Backticks around every flag, file path, env var, and command substring.
+- American or British English: pick one and stay consistent within a skill.
+- No emojis in `SKILL.md` unless the user explicitly asked for a playful tone somewhere.
+
+## Release / publish
+
+There is no package to publish. `npx skills add` clones this repo directly from `git.excloud.in` and reads `SKILL.md` files straight from the default branch, so merging to `main` is the release — no tags, no npm publish step.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..fcd93f8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+# Excloud Agent Skills
+
+A collection of [agent skills](https://agentskills.io/) for working with [Excloud](https://excloud.in) from AI coding agents (Claude Code, Cursor, Codex, opencode, etc.).
+
+Skills are `SKILL.md` playbooks the agent can load on demand — authenticated paths, safety guardrails, command syntax, and error recovery for a specific surface. Install them with the [`skills`](https://www.npmjs.com/package/skills) CLI.
+
+## Install
+
+This repo lives on Excloud's self-hosted Gitea at `git.excloud.in`. Point `npx skills add` at the full clone URL (note the `.git` suffix — without it, the CLI looks for a `.well-known/agent-skills/index.json` manifest rather than cloning):
+
+```bash
+npx skills add https://git.excloud.in/excloud-in/excloud-skills.git
+```
+
+Install a single skill:
+
+```bash
+npx skills add https://git.excloud.in/excloud-in/excloud-skills.git --skill excloud-cli
+```
+
+Install all skills into every supported agent without prompts:
+
+```bash
+npx skills add https://git.excloud.in/excloud-in/excloud-skills.git --all
+```
+
+Target specific agents with repeated `-a` flags (defaults to global, user-level install):
+
+```bash
+npx skills add https://git.excloud.in/excloud-in/excloud-skills.git -a claude-code -a opencode --all
+```
+
+Dry-run — list what the repo offers without installing:
+
+```bash
+npx skills add https://git.excloud.in/excloud-in/excloud-skills.git --list
+```
+
+Private-repo users: set up git auth for `git.excloud.in` first (an HTTPS credential helper / `~/.netrc`, or use the SSH form `git@git.excloud.in:excloud-in/excloud-skills.git`). `npx skills add` shells out to plain `git clone`, so anything `git clone` can read, it can install from.
+
+By default skills land under the agent's standard directory (`~/.claude/skills/`, `.agents/skills/`, `.augment/skills/`, ...). Run `npx skills --help` for the full option list, and `npx skills list` to see what is currently installed.
+
+## Available skills
+
+### `excloud-cli`
+
+Safe end-to-end control of Excloud resources through the `exc` CLI. Covers compute (create / inspect / resize / restart / terminate, delete protection, exec / scp / console), networking (subnets, public IPv4, security groups and rules/bindings), volumes and snapshots, SSH keys, Kubernetes (clusters, workers, kubeconfig fetch / merge), IAM (accounts, service accounts, API keys, policies), billing, quota, serial console logs, and metrics.
+
+**Use when:** the user asks to plan or run `exc` commands, provision / introspect / tear down VMs, attach a public IP, adjust a security group, pull a kubeconfig, debug a stuck boot via serial logs, or exec / scp against a VM.
+
+**Key guidance the skill encodes:**
+
+- Auth precedence (`EXCLOUD_ACCESS_TOKEN` / env tokens / `~/.exc/config`) and what "not authenticated" means.
+- Discovery first: lookup tables (`exc compute instancetype list`, `image list`, `subnet list`, `securitygroup list`, ...) as the source of truth — no hard-coded IDs.
+- Safety guardrails around `terminate`, `publicip release`, `rule delete`, `cluster delete`, `apikey delete`, and destructive shell commands over `exec`.
+- Interactive access patterns: when to use `exec` (one-shot, bash-interpreted), when `scp` (upload/download, symlinks rejected), when `console` (interactive TTY, SSH ↔ WS fallback).
+- Output-format buckets so agents pipe to `awk`/`jq` correctly.
+- A cheat sheet of error messages the CLI actually emits, paired with what each means and how to fix it.
+
+Prerequisites on the user's machine: the `exc` CLI on `$PATH` and an `exc login` session. `k8s cluster kubeconfig merge` additionally needs `kubectl`.
+
+## Repository layout
+
+```
+excloud-skills/
+├── README.md          # this file
+├── AGENTS.md          # contributor / agent guidance for editing skills
+├── .gitignore
+└── skills/
+    └── excloud-cli/
+        └── SKILL.md   # the skill itself
+```
+
+Each skill lives in `skills//` with a single `SKILL.md`. Scripts (`scripts/`) and reference docs (`references/`) can live alongside `SKILL.md` when a skill needs them.
+
+## Contributing a new skill
+
+See [`AGENTS.md`](./AGENTS.md) for conventions (directory name, `SKILL.md` frontmatter, when to add scripts vs. references, how to keep skills discovery-friendly). Open a PR with the new `skills//` directory; once merged, agents install it with `npx skills add https://git.excloud.in/excloud-in/excloud-skills.git --skill `.
+
+## License
+
+TBD.
diff --git a/skills/excloud-cli/SKILL.md b/skills/excloud-cli/SKILL.md
new file mode 100644
index 0000000..4135c68
--- /dev/null
+++ b/skills/excloud-cli/SKILL.md
@@ -0,0 +1,310 @@
+---
+name: excloud-cli
+description: Drive Excloud resources (compute, networking, security groups, volumes, snapshots, public IPs, IAM, billing, Kubernetes) through the `exc` CLI. Use when a user asks to plan or execute `exc` commands - creating / inspecting / updating / deleting VMs, running commands on them via `exec` / `scp` / `console`, managing security groups and public IPs, or pulling Kubernetes kubeconfigs - with safety guardrails and auth checks.
+---
+
+# Excloud CLI
+
+This skill is a _starting guide_, not a spec. The `exc` CLI is generated from a live OpenAPI surface, so commands and flags change. **Whenever a command or flag in this file disagrees with `exc --help`, trust the CLI.** Re-read the relevant `--help` before shaping a real command, and prefer discovering the surface interactively over memorising it from here.
+
+```
+exc --help
+exc --help
+exc --help
+```
+
+Everything below has been observed working at some point; the model should still verify before running anything destructive.
+
+---
+
+## Workflow principles
+
+- Prefer `exc` for all Excloud actions unless the user explicitly asks for direct API / SDK use.
+- Confirm before anything destructive (see Safety).
+- If authentication is missing or expired, tell the user to run `exc login` and stop — do not invent tokens.
+- When flag names or behaviours look odd, run `exc <...> --help` rather than guessing. Generated CLIs evolve between releases.
+- Read `list` / `get` output shapes carefully before trying to parse them; there is no universal `-o json` flag today (see Output formats).
+
+## Authentication
+
+The CLI reads credentials in this precedence order:
+
+1. `EXCLOUD_ACCESS_TOKEN` or `ACCESS_TOKEN` env var.
+2. `EXCLOUD_ID_TOKEN` or `ID_TOKEN` env var.
+3. `~/.exc/config` (JSON) written by `exc login` — contains the default account, default org, default zone, and per-account `id_token` / `access_token` material.
+
+If none of those are present or valid, commands that need a token (`exec`, `scp`, `console`, `k8s cluster kubeconfig get/merge`) fail with `` not authenticated; run `exc login` ``. `exc login` opens a browser flow and serves a callback on `http://localhost:7899/callback`.
+
+`exc me`, `exc org list`, `exc account list`, `exc config list` are useful "where am I?" probes after login.
+
+## Safety guardrails
+
+Require explicit user confirmation before running any of these:
+
+- `exc compute terminate` (especially with `--delete_root_volume`).
+- `exc compute volume delete`, `exc compute snapshot delete`, `exc compute key delete`.
+- `exc compute publicip release`, `exc compute publicip disassociate`.
+- `exc securitygroup delete`, `exc securitygroup rule delete`, `exc securitygroup binding delete`.
+- `exc k8s cluster delete`, `exc k8s cluster worker delete`.
+- `exc account revoke`, `exc serviceaccount delete`, `exc apikey delete`, `exc policy delete`, `exc policy binding delete`.
+
+For shell commands delivered through `exc compute exec` or an `exec` script file, refuse or confirm explicitly before running anything like `shutdown`, `reboot`, `rm -rf`, `mkfs`, `dd`, `wipefs`, rewrites of `/etc/fstab`, bootloader edits, or `systemctl stop ssh*` (the last one will make the VM unreachable over SSH — see Interactive access).
+
+## Discoverability and authoritative lookups
+
+The skill does _not_ hard-code IDs, instance type names, image IDs, subnet IDs, security group IDs, or zone IDs. Those change per account and over time. Before any `create` / `rule create` / `binding create` call, confirm the IDs with the relevant `list` command:
+
+- `exc compute instancetype list` — CPU / memory / disk for each advertised type. Pick the smallest type whose CPU/MEMORY columns cover the workload; default to the cheapest advertised micro for scratch work and step up for real workloads.
+- `exc compute instancetype capacity --instance_type ` — per-zone availability probe (`available=true|false`). Unknown types return `false` gracefully rather than 404, so `true` is the only reliable signal.
+- `exc compute image list` — authoritative image catalog. Image IDs vary per org; do not hard-code them.
+- `exc compute subnet list` + `exc compute subnet get --id ` — check `DISABLE_IPV4_PUBLIC_IP`: subnets with this set cannot take `--allocate_public_ipv4=true` at create time.
+- `exc securitygroup list` + `exc securitygroup rule list --security_group_id ` + `exc securitygroup binding list --security_group_id ` (or `--interface_id `) — confirm what a SG allows and where it's bound before relying on it.
+- `exc compute publicip list` / `exc compute key list` / `exc compute volume list` / `exc compute snapshot list` — authoritative inventories for each resource type.
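A minimal bash sketch of this discovery-first rule: check that an ID actually appears in a `list` table before passing it to `create`. The helper name `require_listed_id` is hypothetical and not part of the `exc` CLI. It assumes the table has exactly one header row and carries the ID in its first column; confirm both against real output first, because shapes differ between commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the exc CLI): succeed only if ID appears
# in the first column of a table printed by an `exc ... list` command.
# Assumes one header row and the ID in column 1; verify on real output.
require_listed_id() {
  local id="$1"
  shift  # the remaining arguments are the list command to run
  if "$@" | awk -v want="$id" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'; then
    return 0
  fi
  echo "ID '$id' not found in the output of: $*" >&2
  return 1
}
```

For example, `require_listed_id "$IMAGE_ID" exc compute image list && exc compute create ...` (with `IMAGE_ID` as a placeholder variable) refuses to create a VM from an image ID the account cannot see. If the list command itself fails, for instance because the session is not authenticated, awk sees no rows and the helper also returns 1.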
+
+If `--help` on the installed CLI shows commands or flags not documented here, prefer `--help`.
+
+## Common VM lifecycle
+
+### Create
+
+Required flags for `exc compute create`:
+
+- `--name ` (lowercase, `[a-z0-9][a-z0-9-]*[a-z0-9]`).
+- `--subnet_id ` (zone of the subnet must match your default zone).
+- `--allocate_public_ipv4=true|false` — the flag must be explicit.
+- `--image_id `
+- `--instance_type `
+- `--root_volume_size_gib `
+
+Useful optional flags (verify via `--help`):
+
+- `--security_group_ids ` — attach one or more SGs to the primary interface at create time. **If you omit this, the VM may come up with no SG attached** — set at least one.
+- `--ssh_pubkey ""` — inline SSH public key string _or_ the `name` of a key managed via `exc compute key`.
+- `--public_ipv4_reservation_id ` — attach an existing reserved public IPv4 instead of allocating a new ephemeral one.
+- `--root_password ` — for console / emergency access only; SSH keys are strongly preferred.
+- `--root_volume_id ` **or** `--root_volume_source_snapshot_id ` (mutually exclusive) — reuse an existing volume or clone from a snapshot for the root disk.
+- `--root_volume_baseline_iops ` / `--root_volume_baseline_throughput_mbps ` — provisioned performance for EBS-backed roots.
+- `--user_data ` or `--user-data-file ` — first-boot script. See User data below.
+
+Do not pass flags the help output does not list; deprecated flags (e.g. `--root_volume_perf_tier`) are removed or hidden and will error or be ignored.
+
+`create` prints a one-row table with at minimum `ID`, `NAME`, `STATE` (usually `STARTING` or `CREATING`), `ZONE`, `SUBNET`, `ROOT_VOLUME_ID`, `PUBLIC_IPV4`, `INTERFACE_IPV4`, `INTERFACE_IPV6`. Note that this row does **not** include `INTERFACE_ID`; fetch that later with `exc compute get --id `.
+
+### Wait for RUNNING (no native `--wait`)
+
+The CLI does not provide a wait primitive. Poll `compute get` and key off the `STATE` column:
+
+```bash
+until [ "$(exc compute get --id | awk 'NR==2 {for (i=1;i<=NF;i++) if ($i ~ /^(CREATING|STARTING|RUNNING|STOPPING|STOPPED|RESTARTING|TERMINATING|TERMINATED)$/) print $i}')" = "RUNNING" ]; do sleep 3; done
+```
+
+(This matches on the state value itself rather than on a fixed column index, because the column ordering in `compute get` has shifted between releases; do not hard-code `$4`.)
+
+Typical progression for a fresh VM: `CREATING` → `STARTING` → `RUNNING` in roughly half a minute, plus another 15–20 seconds before cloud-init finishes and SSH answers. After RUNNING, allow for that extra delay before relying on the first `exc compute exec` or SSH connection.
+
+### Inspect and control
+
+- `exc compute list` — hides `TERMINATED` VMs by default. Use this for "what is alive now".
+- `exc compute instances list` — rich-metadata variant that shows **all** states unless filtered; add `--states running,stopped`, `--created_after `, `--created_before ` as appropriate.
+- `exc compute get --id ` — single VM detail. Shows `INTERFACE_ID` (needed for publicip / SG binding ops) but not `ROOT_VOLUME_ID`.
+- `exc compute rename --vm_id --name `
+- `exc compute resize --vm_id --instance_type ` — generally requires the VM to be STOPPED first.
+- `exc compute start --vm_id `
+- `exc compute stop --vm_id [--reserve_public_ipv4]` — pass `--reserve_public_ipv4` to keep the ephemeral public IPv4 across the stop.
+- `exc compute restart --vm_id ` — a full API-level restart; useful to recover a VM whose SSH stack you broke from `exec`.
+- `exc compute terminate --vm_id [--delete_root_volume]` — without `--delete_root_volume` the root volume is kept and can be reused via `create --root_volume_id `.
+
+### Delete protection
+
+Three commands can change the `delete_protection` flag; all return the updated VM as JSON:
+
+- `exc compute protect --vm-id ` — enable protection.
+- `exc compute unprotect --vm-id ` — disable protection.
+- `exc compute rename --vm_id --name [--delete_protection=true|false]` — rename the VM and, if `--delete_protection` is passed, set protection in the same call. Omitting the flag on `rename` leaves the protection flag untouched, so a bare rename will not accidentally clear it.
+
+While protection is enabled, `exc compute terminate` returns `VM delete protection is enabled. Disable delete protection before terminating this instance.` (exit 1). Run `unprotect` first, then retry `terminate`.
+
+### Termination clean-up
+
+After terminate with `--delete_root_volume`, confirm both with:
+
+```bash
+exc compute get --id # STATE should become TERMINATED in a few seconds
+exc compute volume list # the root volume should disappear / move to DELETING
+```
+
+## User data
+
+- `--user-data-file ` wins over `--user_data ` if both are set (the inline one is ignored with a warning).
+- The CLI is permissive — it only warns when content looks neither like a shell script nor a cloud-init document. Accepted heuristics:
+  - Shebang start: `#!/bin/bash`, `#!/usr/bin/env bash`, `#!/bin/sh`.
+  - First non-empty line begins with `#cloud-` (e.g. `#cloud-config`, `#cloud-boothook`).
+- Prefer real `#!/bin/bash` scripts or `#cloud-config` YAML; other content will run but triggers the warning.
+
+## Interactive access: `connect`, `exec`, `scp`, `console`
+
+`exc compute connect` is the low-level session primitive; `exec`, `scp` and `console` all build on it.
+
+- `exc compute connect --vm_id [--user ubuntu] [--return_private_key]` — returns a short-lived session ID and, when `--return_private_key` is set, a base64-encoded PEM authorised for the VM.
+- `exc compute exec --vm-id (--command "" | --script-file ) [--user ubuntu] [--timeout ]`
+  - `--command` and `--script-file` are mutually exclusive; exactly one is required.
+  - `--script-file` is **interpreted as bash on the VM** (piped into `bash -s`). It is not a plain upload — plain-text files that contain non-command lines will fail with `command not found`. For transferring files verbatim, use `scp`.
+  - `--timeout` has a sensible default (tens of seconds) and a hard backend cap (check `--help`). A timed-out command prints `command timed out` and returns exit 124.
+  - Remote exit codes propagate: `exit 42` on the VM → local exit 42, with `Process exited with status 42` on stderr.
+  - On success and failure alike the command emits `warning: host key not verified` on stderr — that is expected (the CLI trusts the instance-connect key without pinning). Redirect stderr when scripting.
+  - SSH targets are tried in order: public IPv4 → any interface private IPv4 → any interface IPv6. If all SSH targets fail, `exec` automatically falls back to the WebSocket console transport. The fallback uses a unique marker to capture the remote exit code. Whether the WS transport succeeds depends on the compute service — if it rejects the session (`unknown session`) or times out (`Timeout connecting to the instance`), `exec` will fail with a 255 exit. In that case, confirm the VM is actually reachable via its public IPv4 (security group / sshd status) rather than relying on WS.
+- `exc compute scp --vm-id --src --dst [--user ubuntu] [--recursive] [--download] [--timeout ]`
+  - Default direction is **upload** (local → VM). Pass `--download` to pull files from the VM to local.
+  - `--recursive` is required for directory transfers in either direction.
+  - Symlinks are **rejected** — an encountered symlink fails the whole transfer with `symlink entries are not supported: ` (exit 1). Dereference or archive them locally (e.g. `tar -czhf ...`) before calling `scp`.
+  - `scp` does **not** fall back to the WebSocket transport when SSH is unreachable; it errors out. Use `scp` only on VMs whose SSH is reachable.
+  - If the destination requires elevation, upload to a writable path (e.g. `/tmp/...`) and move with `sudo` via `exc compute exec`.
+- `exc compute console --vm-id [--user ubuntu] [--timeout ] [--ssh | --ws]`
+  - Opens an **interactive** shell on the VM. By default it tries SSH first, then falls back to the WebSocket console.
+  - `--ssh` forces SSH only, `--ws` forces WebSocket only.
+  - Requires a real TTY — piping input or running inside a non-interactive shell will fail with `failed to set terminal to raw mode: inappropriate ioctl for device`. For scripted one-shots use `exec`; for interactive work suggest the user run `exc compute console` directly.
+
+### Troubleshooting SSH / exec failures
+
+1. Does the VM have a reachable address? `exc compute get --id ` — check `PUBLIC_IPV4`, `INTERFACE_IPV4`, `INTERFACE_IPV6`.
+2. Is a security group bound and does it permit SSH?
+   - `exc securitygroup binding list --interface_id `
+   - `exc securitygroup binding create --interface_id --security_group_id `
+   - `exc securitygroup rule list --security_group_id `
+3. Is there an ingress rule for port 22 from your source IP? If not, create one:
+   - `exc securitygroup rule create --security_group_id --is_ingress=true --protocol TCPv4 --port_range 22 --cidr "/32"`
+4. Is there an egress rule for the VM to reach the internet? Most setups want a broad egress rule:
+   - `exc securitygroup rule create --security_group_id --is_ingress=false --protocol IPv4 --port_range ANY --cidr 0.0.0.0/0`
+5. If `exec` says `connection refused` on port 22, sshd is likely not running. `exc compute restart --vm_id ` brings it back (the API-level restart does not need SSH).
+
+## Serial console logs
+
+`exc compute seriallogs --id [--boot_id ] [--offset --direction older|newer] [--limit ] [-f]`
+
+- Omitting `--boot_id` returns the latest boot.
+- `--offset` and `--direction` must be set together; the valid directions are `older` and `newer`.
+- `--limit` must be positive when set; typical default is ~200 and the backend has a hard cap.
+- `-f / --follow` polls for newer lines every couple of seconds — not a native stream.
+- Lines are prefixed with `[ offset=]`. Look for `Cloud-init ... finished`, `Reached target ... cloud-init.target`, and the login banner (`Ubuntu X.Y.Z ip-a-b-c-d ttyS0`) to confirm a clean boot.
+
+## Networking
+
+### Subnets
+
+- `exc compute subnet list` — the `DISABLE_IPV4_PUBLIC_IP` column is the gate on whether `--allocate_public_ipv4=true` is legal.
+- `exc compute subnet get --id `
+
+### Public IPv4
+
+- `exc compute publicip list` / `exc compute publicip get --id `
+- `exc compute publicip reserve --name [--interface_id ]` — if `--interface_id` is passed the new reservation is also attached in one step.
+- `exc compute publicip associate --interface_id --reservation_id `
+- `exc compute publicip disassociate --reservation_id `
+- `exc compute publicip rename --reservation_id --name `
+- `exc compute publicip release --reservation_id ` (destructive).
+
+### Local IP check
+
+`exc compute localip --ip ` asks the service whether a given IP falls inside Excloud's local ranges. It returns `{ip, is_local}` and is a backend-defined membership probe — not a "what is my public IP" helper (observed returning `is_local=true` for some clearly non-Excloud addresses, so do not use it as a precise classifier). To learn the caller's public IP, use an external service (e.g. `curl -s https://api.ipify.org`).
+
+## Security groups
+
+- `exc securitygroup create --name [--description "..."]`
+- `exc securitygroup list`
+- `exc securitygroup get --id ` (note: the flag here is `--id`, not `--security_group_id`).
+- `exc securitygroup delete --security_group_id `
+
+### Rules
+
+- `exc securitygroup rule create --security_group_id --is_ingress=true|false --protocol --port_range --cidr [--description "..."]`
+  - `--is_ingress` is **required**. Pass `=true` for ingress, `=false` for egress. Omitting it errors with `required flag(s) "is_ingress" not set`.
+  - `--protocol` takes Excloud family strings such as `TCPv4`, `UDPv4`, `ICMPv4`, `IPv4` — verify current valid values via a successful `rule list` if unsure.
+  - `--port_range` accepts single ports (`22`), ranges (`80-443`), or `ANY`.
+  - Rules are not updatable — to change one, `rule delete` and `rule create` again.
+- `exc securitygroup rule list --security_group_id `
+- `exc securitygroup rule delete --security_group_rule_id ` (destructive).
+
+### Bindings
+
+- `exc securitygroup binding create --interface_id --security_group_id `
+- `exc securitygroup binding list (--interface_id | --security_group_id )` — at least one filter is required.
+- `exc securitygroup binding delete --interface_id --security_group_id `
+
+## Volumes and snapshots
+
+- `exc compute volume list` / `exc compute volume get --id `
+- `exc compute volume create --name --size_gib [--source_snapshot_id ] [--baseline_iops ] [--baseline_throughput_mbps ]` — zone is injected from config; there is no `--zone_id` flag.
+- `exc compute volume rename --volume_id --name `
+- `exc compute volume resize --volume_id --new_size_gib [--baseline_iops ] [--baseline_throughput_mbps ]`
+- `exc compute volume delete --volume_id ` (destructive).
+- `exc compute snapshot list` / `exc compute snapshot create --volume_id ` / `exc compute snapshot delete --snapshot_id `
+
+## SSH key catalog
+
+- `exc compute key list` / `exc compute key get --id `
+- `exc compute key create --name (--ssh-public-key "" | --ssh-public-key-path )`
+- `exc compute key delete --id `
+- The key `name` can be passed to `compute create --ssh_pubkey` in place of a raw public key string.
+
+## Kubernetes
+
+- `exc k8s health`
+- `exc k8s cluster list`
+- `exc k8s cluster create --control_plane_image_id --control_plane_instance_type --subnet_id --root_volume_size_gib [--allocate_public_ipv4] [--security_group_ids ] [--ssh_pubkey ""] [-o ]`
+  - The response contains the admin kubeconfig inline. Passing `-o ` writes it to disk (mode 0600, creating parent dirs) and strips it from stdout — strongly preferred.
+- `exc k8s cluster delete --cluster_id ` (destructive).
+- `exc k8s cluster worker list --cluster_id `
+- `exc k8s cluster worker create --cluster_id --worker_image_id --worker_instance_type --subnet_id --root_volume_size_gib [--allocate_public_ipv4] [--security_group_ids ] [--ssh_pubkey ""]`
+- `exc k8s cluster worker delete --cluster_id --worker_id ` (destructive).
+- `exc k8s cluster kubeconfig get --cluster_id [-o ]` — fetches the current kubeconfig and prints to stdout (or writes to `-o` with mode 0600). Returns a clear 404 if the cluster id is unknown.
+- `exc k8s cluster kubeconfig merge --cluster_id [--kubeconfig ] [--backup=true|false]` — merges into `~/.kube/config` (or `--kubeconfig`) using `kubectl config view --merge --flatten --raw`. Requires `kubectl` on PATH. `--backup` defaults to `true` and writes `.bak`, `.bak1`, ... before overwriting.
+- `exc k8s bootstrap controlplane get --vm_id --x-exc-imds-token ` — operator bootstrap path; the IMDS token must come from inside the VM's IMDS agent, not be invented.
+
+## IAM, billing, quota
+
+- `exc org list`
+- `exc account list` / `exc account invite --email ` / `exc account revoke --email ` (the revoke flag is `--email`, not an invite id).
+- `exc serviceaccount list` / `exc serviceaccount delete --name `
+- `exc apikey list` / `exc apikey create` (prints the new key once — capture it immediately) / `exc apikey delete --hash `
+- `exc policy list` / `exc policy delete --id `
+- `exc policy binding list (--account_id | --service_account_id )` — at least one filter is required; passing neither errors with `either account_id or service_account_id must be provided`.
+- `exc policy binding delete --policy_id (--account_id | --service_account_id )`
+- `exc billing get` / `exc quota`
+
+## Config and misc
+
+- `exc me` / `exc version` / `exc completion `
+- `exc config list` — shows the current default account / org / zone and configured accounts.
+- `exc config set [-a|--account ] [-o|--org ]` — no `--zone` here; default zone is set at login time.
+
+## Output formats
+
+Commands print either a column table (or TSV) or JSON; no command should print raw Go-struct dumps anymore. The one noted exception is `k8s cluster kubeconfig get`, which prints raw kubeconfig YAML. Both main shapes are machine-parseable; pick your tool accordingly.
+
+- **Column tables / TSV** (friendly to `awk`, `cut`, and `awk -F'\t'`): `compute list`, `compute instances list`, `compute get`, `compute create`, `compute terminate` (TSV `vm_id\tstate`), `compute instancetype list` / `capacity`, `compute image list`, `compute subnet list`, `compute volume list`, `compute volume get`, `compute snapshot list`, `compute publicip list`, `compute key list`, `securitygroup list` / `rule list` / `binding list`, `org list`, `account list`, `apikey list`, `policy list`, `config list`, `compute seriallogs`.
+- **JSON** (pipe through `jq`): `me`, `quota`, `billing get`, `compute health` (`{"raw":"OK"}`), `k8s health`, `compute subnet get`, `compute publicip get`, `compute key get`, `securitygroup get`, `compute metrics`, `compute connect`, `serviceaccount list`, `compute protect`, `compute unprotect`, `compute rename`, `k8s cluster kubeconfig get` (raw kubeconfig YAML, not JSON-wrapped), and the inline `kubeconfig` field inside the JSON response from `k8s cluster create` when `-o` is not set.
+
+Before scripting heavy logic against a command, run it once and check the shape. The split between "table" and "JSON" is not always guessable — lists tend to be tables, getters tend to be JSON, but verify.
+
+## Metrics
+
+`exc compute metrics --vm_id --start --end [--family ]`
+
+- Only `cpu` is currently supported. Omitting `--family` defaults to CPU. Any other family (`memory`, `network`, `diskio`, ...) returns `Requested metrics family is not supported for this endpoint.` with exit 1. Re-check `--help` and the above claim if the backend later adds families.
+- Output is JSON: `{"series":[{"family":"cpu","period_seconds":5,"points":[{"timestamp":"...","average":,"max":,"min":}, ...],"unit":"Percent"}]}`. Parse with `jq` (e.g. `jq '.series[0].points[-1].average'`).
+
+## Error messages to recognise
+
+- `` not authenticated; run `exc login` `` — no valid token in env or `~/.exc/config`.
+- `required flag(s) "" not set` — cobra-level enforcement. Read `--help` again.
+- `Could not parse your request!! Are you sure you passed the correct flags?` — generic backend 400. Typically means an unknown ID, a value of the wrong type, or a server-side required field that the CLI accepted as empty. Verify every ID against a `list` before retrying.
+- `Oops could not find the you specified, maybe try checking if the exists?` — backend 404-ish. Trust the hint.
+- `Oops the IP provided is invalid` — syntactic IP validation on `compute localip`.
+- `Something went wrong on our end!!` — backend 500. Observed on `compute connect` for a non-existent VM. Verify the VM exists via `compute get`; do not retry blindly.
+- `VM delete protection is enabled. Disable delete protection before terminating this instance.` — run `exc compute unprotect --vm-id ` first, then retry `terminate`.
+- `At least one field must be provided: name or delete_protection.` — you hit `compute rename` / `compute update` with neither flag set. Pass `--name ` and/or use `protect` / `unprotect` instead of `rename --delete_protection=...` for protection changes.
+- `command timed out` (exit 124) — `exec --timeout` elapsed. Raise the timeout, or launch the work in the background on the VM (`nohup`, systemd unit) and poll with subsequent `exec` calls.
+- `invalid --direction "": must be one of older or newer` / `use --offset and --direction together` / `--limit must be greater than 0` — `seriallogs` argument validation.
+- `either account_id or service_account_id must be provided` — `policy binding list` needs at least one filter.
+- `symlink entries are not supported: ` (exit 1) — `scp --recursive` refuses trees containing symlinks; archive or dereference locally first.
+- `unknown session` / `Timeout connecting to the instance!` from `exec` WS fallback — the server-side console rejected the session. SSH is the only reliable path right now; tell the user to ensure the VM has a reachable SSH address and permissive SG rather than relying on WS fallback.
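The cheat sheet above can be turned into a small triage helper. This is a sketch only. `classify_exc_error` is a hypothetical name, and the match strings are copied from the list above. Matching is by substring, so extra text around a message on the same line does not stop a match. The first matching pattern wins.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: map an exc error message to a suggested next step.
# Substrings come from the "Error messages to recognise" list above.
classify_exc_error() {
  case "$1" in
    *"not authenticated"*)
      echo "auth: ask the user to run exc login" ;;
    *"required flag(s)"*)
      echo "flags: re-read --help" ;;
    *"Could not parse your request"*)
      echo "bad-request: verify every ID against a list command" ;;
    *"delete protection is enabled"*)
      echo "protected: confirm with the user, then exc compute unprotect" ;;
    *"command timed out"*)
      echo "timeout: raise --timeout or background the work" ;;
    *"symlink entries are not supported"*)
      echo "symlink: archive or dereference locally, then scp" ;;
    *"unknown session"* | *"Timeout connecting to the instance"*)
      echo "ws-fallback: fix SSH reachability instead" ;;
    *"Something went wrong on our end"*)
      echo "server: verify the resource exists; do not retry blindly" ;;
    *)
      echo "unknown" ;;
  esac
}
```

One way to feed it is to capture only stderr: `err=$(exc compute get --id "$VM_ID" 2>&1 >/dev/null) || classify_exc_error "$err"`, where `VM_ID` is a placeholder variable. This assumes the CLI writes these messages to stderr; if a message turns up on stdout instead, capture both streams.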