AIStore Observability: Logs

AIStore (AIS) provides comprehensive logging that captures system operations, performance metrics, and error conditions.

Scope. How to configure, collect, and read AIS logs.

AIS logs are the cluster’s ground truth: every proxy or target writes a chronological stream of events, warnings, and periodic performance snapshots. Well‑rotated logs let operators:

Reconstruct incidents (root‑cause analysis)
Correlate client symptoms with internal state changes
Spot long‑running jobs without polling the control plane

Configuring Logging
Severity & Verbosity
Log Format and Structure
Log File Layout & Rotation
Accessing Logs
Common Log Patterns
Key Performance Metrics
Troubleshooting Checklist
Operational Tips
Related Documentation

Configuring Logging

# Show the cluster‑wide logging section (current values)
ais config cluster log

Key	Purpose	Typical prod value
`level`	Info verbosity 0-5 (`3` = normal, `4`/`5` = chatty, `<3` disables info). `W` and `E` are always logged.	`3`
`modules`	Space‑separated list of modules whose info lines are forced to level 5 (e.g., `ec space`). Use `none` to clear the override.	`none`
`max_size`	Rotate when a single file exceeds this size	`32MiB`
`max_total`	Upper bound for the entire directory (oldest files deleted first)	`1GiB`
`flush_time`	How often each daemon flushes its in‑memory buffer	`10s`
`stats_time`	Interval for automatic performance snapshots	`60s`
`to_stderr`	Duplicate log lines to stderr (handy for systemd / kubectl logs)	`false`

Show current values:

ais config cluster log               # cluster‑wide
ais config node   NODE_ID log        # single node (effective)

The new value propagates to every node within a second.

Example (Development Defaults)

"log": {
    "level": "3",
    "modules": "none",
    "max_size": "4MiB",
    "max_total": "128MiB",
    "flush_time": "1m",
    "stats_time": "1m",
    "to_stderr": false
}

Example (Production Configuration)

In production environments, settings are typically adjusted for higher retention and less frequent statistics collection:

$ ais config cluster log
PROPERTY         VALUE
log.level        3
log.max_size     4MiB
log.max_total    512MiB
log.flush_time   1m
log.stats_time   3m
log.to_stderr    false

At startup, AIS logs some of these settings:

I 19:28:24.774518 config:2143 log.dir: "/var/log/ais"; l4.proto: tcp; pub port: 51080; verbosity: 3
I 19:28:24.774523 config:2145 config: "/etc/ais/.ais.conf"; stats_time: 10s; authentication: false; backends: [aws]

Severity & Verbosity

AIS prepends every line with a severity prefix and—in the case of informational messages—an internal numeric level.

Severity Prefixes

Prefix	Meaning	Printed when
`E`	Error – unrecoverable / user‑visible	Always
`W`	Warning – succeeded but suspicious	Always
`I`	Informational	Only if allowed by level

Numeric Levels for I Lines

Level	Typical use (examples)
5	Hot‑path trace, request headers, per‑part
4	Verbose progress, retries, caching stats
3	Startup, shutdown, xaction summaries (default)
2‑0	Progressively quieter; at `≤2` almost silent

Tip. Temporarily crank a node:

ais config node set TARGET log.level 4
ais config cluster log.modules ec xs   # focus on EC & batch jobs (xactions)

Per‑module Overrides (`log.modules`)

log.modules lets you boost just a subset of subsystems to level 5 without flooding the whole cluster.

# Elevate erasure‑coding (ec) and xaction scheduler (xs):
ais config cluster log.modules ec xs

# Revert to normal
ais config cluster log.modules none

Log Format and Structure

AIS logs follow a consistent format:

I 2025‑05‑19 13:42:17.791884 cpu:60 Reducing GOMAXPROCS (prev=256) to 32
│ │            │      │              └─ message
│ │            │      └─ Go file:line inside AIS source
│ │            └─ timestamp (µs precision)
│ └─ severity prefix
└─ 'I' for INFO

Common prefixes:

config: – effective runtime configuration
x-<n>: – extended (batch) action lifecycle
nvmeXnY: – per‑disk I/O snapshot
kvstats: – cluster‑wide key‑value metrics (see below)

Log File Layout & Rotation

Environment	Where logs appear	Notes
Bare‑metal	`/var/log/ais/<node>.log`	One file per daemon (proxy / target)
Kubernetes	container stdout (kubectl logs)	Collected by CRI‑O / containerd

File names include the node ID plus a sequence number (target‑A43c.log.3). Rotation is triggered by max_size; retention is enforced by max_total.

AIS implements automatic log rotation as indicated by the header:

Rotated at 2025/05/14 21:00:38, host ais-target-13, go1.24.3 for linux/amd64

When logs are rotated, new log files are created and old ones are typically compressed or archived according to the retention policy.

Accessing Logs

Via CLI

The AIS CLI provides commands to view and collect logs:

# View logs from a specific node
ais log show [NODE_ID]

# Filter logs by severity
ais log show [NODE_ID] --severity error

# Collect logs from all nodes
ais log get --help

Directly in Kubernetes

In Kubernetes deployments, access logs using kubectl:

kubectl logs -n ais ais-proxy-15
kubectl logs -n ais ais-target-13

Common Log Patterns

Startup Sequence

The startup sequence provides important information about the AIS node configuration:

Started up at 2025/05/13 19:28:24, host ais-proxy-15, go1.24.3 for linux/amd64
W 19:28:24.774364 config:1506 control and data share the same intra-cluster network: ais-proxy-15.ais-proxy.ais.svc.cluster.local
I 19:28:24.774518 config:2143 log.dir: "/var/log/ais"; l4.proto: tcp; pub port: 51080; verbosity: 3
I 19:28:24.774523 config:2145 config: "/etc/ais/.ais.conf"; stats_time: 10s; authentication: false; backends: [aws]
I 19:28:24.774540 daemon:311 Version 3.28.2ec8b22, build 2025-05-13T19:20:12+0000, CPUs(32, runtime=256), containerized

Operation Logs

AIS logs details about operations such as list, put, get:

I 21:00:43.063816 base:211 x-list[ApJcaebM5]-ais://yodas-21:00:03.062994-00:00:00.000000 finished
I 21:00:44.482430 base:211 x-list[J1qpgaWbxG]-ais://yodas-21:00:04.481894-00:00:00.000000 finished

Performance Metrics

AIS regularly logs performance metrics in two formats:

Disk-specific performance:

I 21:00:48.784074 nvme3n1: 54MiB/s, 119KiB, 0B/s, 0B, 26%
I 21:00:48.784078 nvme9n1: 41MiB/s, 119KiB, 0B/s, 0B, 20%
I 21:00:48.784080 nvme10n1: 38MiB/s, 112KiB, 0B/s, 0B, 18%

Comprehensive key-value statistics (at regular intervals defined by stats_time):

I 18:06:18.785011 {aws.head.n:114227,aws.head.ns.total:16799090100532,del.n:109,err.get.n:296108,err.ren.n:1,etl.offline.n:1136785,etl.offline.ns.total:73240613148414,get.bps:219094016,get.n:1006219,get.ns:425545477639,get.ns.total:3041645043551487925,get.redir.ns:3262104,get.size:126049578974910,lcache.evicted.n:1211669,lcache.flush.cold.n:491970,lst.n:8824,put.n:286,put.ns.total:1706491834824,put.size:101717912588,ren.n:103,state.flags:32774,stream.in.n:441,stream.in.size:96717189120,stream.out.n:446,stream.out.size:84119388160,disk.nvme7n1.read.bps:35240346...}

Kubernetes-specific Information

In Kubernetes deployments, AIS logs include pod and cluster-specific details:

I 19:28:24.786264 k8s:93 Pod info: name ais-proxy-15 ,namespace ais ,node 10.49.41.55 ,hostname ais-proxy-15 ,host_network false
I 19:28:24.786281 k8s:101   ais-ais-state (&PersistentVolumeClaimVolumeSource{ClaimName:ais-ais-state-ais-proxy-15,ReadOnly:false,})
I 19:28:24.786304 k8s:103   config-template
I 19:28:24.786306 k8s:103   config-mount

Key Performance Metrics

The key-value statistics contain valuable operational metrics:

Key / pattern	Description
`get.n`	Number of GET operations
`put.n`	Number of PUT operations
`get.size`, `put.size`	Cumulative bytes, GET and PUT respectively
`get.bps`	Bytes per second for GET operations
`aws.<name>.ns.total`	Cumulative latency against a cloud backend (AWS S3, in this example)
`aws.head.n`	Number of HEAD requests to AWS S3
`err.get.n`	Number of GET errors
`disk.<device>.read.bps`	Read throughput for specific disk
`disk.<device>.util`	Device utilization percentage

Troubleshooting Checklist

Scan for E & W lines around the timeframe.
Look for spikes in err.<n>.n counters.
Watch disk util > 80% or sustained read.bps plateaus.
Temporarily raise log.level or log.modules on a single node to capture more detail.

For advanced log analysis, consider forwarding logs to external systems for aggregation and visualization.

Operational Tips

Keep log.level=3 in production; raise to 4 or 5 only while debugging. Lower to 2 or below if you truly need silence.
Raise stats_time (≥ 60s) if logs get noisy on busy systems.
Ship rotated logs off‑host weekly.
Always attach ais cluster download-logs tarball to GitHub issues.

Document	Description
Overview	Introduction to AIS observability
CLI	Command-line monitoring tools
Prometheus	Configuring Prometheus with AIS
Metrics Reference	Complete metrics catalog
Grafana	Visualizing AIS metrics with Grafana
Kubernetes	Working with Kubernetes monitoring stacks

MONITORING-LOGS

AIStore Observability: Logs

Table of Contents

Configuring Logging

Example (Development Defaults)

Example (Production Configuration)

Severity & Verbosity

Severity Prefixes

Numeric Levels for I Lines

Per‑module Overrides (`log.modules`)

Log Format and Structure

Log File Layout & Rotation

Accessing Logs

Via CLI

Directly in Kubernetes

Common Log Patterns

Startup Sequence

Operation Logs

Performance Metrics

Kubernetes-specific Information

Key Performance Metrics

Troubleshooting Checklist

Operational Tips

AIStore Observability: Logs

Table of Contents

Configuring Logging

Example (Development Defaults)

Example (Production Configuration)

Severity & Verbosity

Severity Prefixes

Numeric Levels for I Lines

Per‑module Overrides (log.modules)

Log Format and Structure

Log File Layout & Rotation

Accessing Logs

Via CLI

Directly in Kubernetes

Common Log Patterns

Startup Sequence

Operation Logs

Performance Metrics

Kubernetes-specific Information

Key Performance Metrics

Troubleshooting Checklist

Operational Tips

Related Documentation

Per‑module Overrides (`log.modules`)