JOB
Introduction, background, definitions
Batch operations that run asynchronously and may take seconds (minutes, hours, etc.) to execute are called eXtended actions (xactions).
Internally, `xaction` is an abstraction at the root of the inheritance hierarchy that also contains specific user-visible jobs: `copy-bucket`, `evict-objects`, and more.
For the most recently updated list of all supported jobs and their respective compile-time properties, see the source.
All jobs run asynchronously, have start and stop times, and share common generic statistics.
Further, each job kind has its own display name, access permissions, scope (bucket and/or global), and a number of boolean properties, for example:
| Property | Description |
|---|---|
| `Startable` | true if the user can start this job via the generic job-start API |
| `RefreshCap` | the system must refresh capacity stats upon the job’s completion |
Many kinds of jobs can be started manually via the generic job API (which, in turn, is used by the `ais start` command; see next).
Notable exceptions include electing a new primary and listing objects in a given bucket. In both cases, there is a separate, more convenient and intuitive API that does the job, so to speak.
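The boolean properties act as static, per-kind metadata. The Python sketch below is a rough illustration only. It models a descriptor table and checks whether a kind may go through the generic start path. The `JobDescriptor` type, its fields and the per-kind values are hypothetical and do not reproduce AIStore's source.

```python
from dataclasses import dataclass

# Hypothetical, simplified descriptor: field names echo the property table,
# not AIStore's actual source code.
@dataclass(frozen=True)
class JobDescriptor:
    display_name: str
    scope: str         # "bucket" or "global" (illustrative)
    startable: bool    # may be started via the generic job-start API
    refresh_cap: bool  # capacity stats must be refreshed upon completion

# Illustrative values only (assumed, not copied from the source).
DESCRIPTORS = {
    "copy-bucket": JobDescriptor("copy-bucket", "bucket", startable=True, refresh_cap=True),
    "list": JobDescriptor("list-objects", "bucket", startable=False, refresh_cap=False),
    "elect": JobDescriptor("elect-primary", "global", startable=False, refresh_cap=False),
}

def can_generic_start(kind: str) -> bool:
    """True if `kind` may go through the generic start path (KeyError if unknown)."""
    return DESCRIPTORS[kind].startable
```

In this model, listing objects and electing a primary return `False` because they are served by their own dedicated APIs.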
Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent shorter versions. For instance, `ais job start download` can be expressed as `ais start download`, while `ais wait copy-bucket Z8WkHxwIrr` is the same as `ais wait Z8WkHxwIrr`.
The rest of this document covers starting, stopping, and otherwise managing job kinds and specific job instances. For job monitoring, use the `ais show job` command and its numerous subcommands and options.
See also
- static descriptors (source code)
- `xact` package README
- batch jobs
- CLI: `dsort` (distributed shuffle)
- CLI: `download` from any remote source
- And more:

`ais job` command
The `ais job` command has the following static completions (aka subcommands):
$ ais job <TAB-TAB>
start stop wait rm show
and further:
$ ais job --help
NAME:
ais job - monitor, query, start/stop and manage jobs and eXtended actions (xactions)
USAGE:
ais job command [command options] [arguments...]
COMMANDS:
start run batch job
stop terminate a single batch job or multiple jobs (press <TAB-TAB> to select, '--help' for options)
wait wait for a specific batch job to complete (press <TAB-TAB> to select, '--help' for options)
rm cleanup finished jobs
show show running and finished jobs ('--all' for all, or press <TAB-TAB> to select, '--help' for options)
OPTIONS:
--help, -h show help
Notice, though, that `start`, `stop`, and `wait` (verbs) have shorter versions. For example, `ais start` is a built-in alias for `ais job start`, and so on.
For all configured pre-built and user-defined aliases (aka “shortcuts”), run `ais alias` or `ais alias --help`.
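Conceptually, each short verb expands to its `ais job` form. The Python sketch below is a minimal model of that expansion. `VERB_ALIASES` and `expand_alias` are hypothetical names, and the real CLI's alias handling (see `ais alias`) may differ.

```python
# Hypothetical model of the built-in verb aliases:
# `ais <verb> ...` is treated as `ais job <verb> ...`.
VERB_ALIASES = {
    "start": ["job", "start"],
    "stop": ["job", "stop"],
    "wait": ["job", "wait"],
}

def expand_alias(args: list[str]) -> list[str]:
    """Rewrite e.g. ['start', 'download'] into ['job', 'start', 'download']."""
    if args and args[0] in VERB_ALIASES:
        return VERB_ALIASES[args[0]] + args[1:]
    return list(args)  # not an aliased verb: leave unchanged
```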
Start job
ais start <JOB_NAME> [arguments...]
Start a given job. Some jobs require additional arguments, such as a bucket name, to execute.
Note: `job start download|dsort` have slightly different options. Please see their respective documentation for details.
Examples
Start cluster-wide LRU
Start the LRU xaction on all nodes.
$ ais start lru
Started "lru" xaction.
An administrator may choose to run LRU on a subset of buckets. To do so, use the `--buckets` flag with a comma-separated list of the buckets on which LRU needs to be performed, for instance `--buckets bck1,gcp://bck2`.
Additionally, the `--force` (`-f`) option can be used to override the bucket’s `lru.enabled` property.
Note: to ensure safety, the force flag (`-f`) only works when a list of buckets is provided.
$ ais start lru --buckets ais://buck1,aws://buck2 -f
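The `--buckets` value is a comma-separated list, and entries may carry a provider prefix such as `gcp://` or `aws://`. The sketch below shows one way to parse it and to apply the rule that `-f` only takes effect with an explicit bucket list. It makes two assumptions:
- An entry without a prefix (such as `bck1`) refers to an `ais://` bucket.
- `-f` is silently dropped when no list is given. Whether the CLI ignores or rejects that case is not stated here.

```python
from typing import Optional

def parse_buckets(value: str) -> list[tuple[str, str]]:
    """Split a --buckets value such as 'bck1,gcp://bck2' into (provider, name) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        provider, sep, name = item.partition("://")
        if not sep:
            # Assumption: no "provider://" prefix means an ais:// bucket.
            provider, name = "ais", item
        pairs.append((provider, name))
    return pairs

def effective_lru_args(buckets: Optional[str], force: bool) -> tuple[list[tuple[str, str]], bool]:
    """Return the parsed bucket list and whether -f actually applies.

    Mirrors the safety rule: force only takes effect with an explicit list.
    """
    parsed = parse_buckets(buckets) if buckets else []
    return parsed, force and bool(parsed)
```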
Stop job
Stop a single job or multiple jobs.
$ ais stop --help
NAME:
ais stop - (alias for "job stop") terminate a single batch job or multiple jobs, e.g.:
- 'stop tco-cysbohAGL' - terminate a given (multi-object copy/transform) job identified by its unique ID;
- 'stop copy-listrange' - terminate all multi-object copies;
- 'stop copy-objects' - same as above (using display name);
- 'stop list' - stop all list-objects jobs;
- 'stop ls' - same as above;
- 'stop prefetch-listrange' - stop all prefetch jobs;
- 'stop prefetch' - same as above;
- 'stop g731 --force' - forcefully abort global rebalance g731 (advanced usage only);
- 'stop --all' - terminate all running jobs
press <TAB-TAB> to select, '--help' for more options.
USAGE:
ais stop [command options] [NAME] [JOB_ID] [NODE_ID] [BUCKET]
OPTIONS:
--all all running jobs
--regex value regular expression to select jobs by name, kind, or description, e.g.: --regex "ec|mirror|elect"
--force, -f force execution of the command (caution: advanced usage only)
--yes, -y assume 'yes' to all questions
--help, -h show help
Examples of stopping a single job:
ais stop download JOB_ID
ais stop JOB_ID
ais stop dsort JOB_ID
Examples of stopping multiple jobs:
ais stop download --all  # stop all downloads
ais stop copy-bucket ais://abc --all  # stop all copy-bucket jobs where the destination bucket is ais://abc
ais stop resilver t[rt2erGhbr]  # ask target t[rt2erGhbr] to stop resilvering
And more.
Note: `job stop download|dsort` have slightly different options. Please see their respective documentation for details.
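The `--regex` option of `ais stop` selects jobs by name, kind, or description. The sketch below illustrates that selection over a hypothetical in-memory `Job` record. It assumes two things:
- Matching is unanchored, as with Python's `re.search`.
- Only running jobs are candidates for stopping.

```python
import re
from dataclasses import dataclass

@dataclass
class Job:
    # Hypothetical job record for illustration.
    job_id: str
    kind: str
    name: str
    description: str
    running: bool

def select_by_regex(jobs: list[Job], pattern: str) -> list[Job]:
    """Return running jobs whose name, kind, or description matches `pattern`.

    re.search is unanchored, so "ec" also matches "ec-put".
    """
    rx = re.compile(pattern)
    return [
        j for j in jobs
        if j.running and any(rx.search(field) for field in (j.name, j.kind, j.description))
    ]
```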
More Examples
Stop cluster-wide LRU
Stop the currently running LRU eviction.
$ ais stop lru
Stopped LRU eviction.
Show job statistics
ais show job [NAME] [JOB_ID] [NODE_ID] [BUCKET]
You can show jobs by any combination of the optional (filtering) arguments: NAME, JOB_ID, etc.
Use the `--all` option to include finished (or aborted) jobs.
As usual, press `<TAB-TAB>` to select, and see `--help` for details.
Note: `job show download|dsort` have slightly different options. Please see their respective documentation for details.
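One way to read "any combination" of the optional arguments is as a logical AND. Each argument that is given must match, and omitted arguments match anything. The sketch below models that reading together with the `--all` behavior. The `JobInfo` record and the AND semantics are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobInfo:
    # Hypothetical record; fields mirror NAME, JOB_ID, NODE_ID, BUCKET.
    name: str
    job_id: str
    node_id: str
    bucket: str
    finished: bool  # finished or aborted

def filter_jobs(jobs: list[JobInfo], name: Optional[str] = None, job_id: Optional[str] = None,
                node_id: Optional[str] = None, bucket: Optional[str] = None,
                show_all: bool = False) -> list[JobInfo]:
    """Keep jobs matching every given filter; finished ones only if show_all (like --all)."""
    wanted = {"name": name, "job_id": job_id, "node_id": node_id, "bucket": bucket}
    result = []
    for j in jobs:
        if j.finished and not show_all:
            continue  # hidden unless --all
        if all(v is None or getattr(j, k) == v for k, v in wanted.items()):
            result.append(j)
    return result
```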
Show extended statistics
All jobs show the number of processed objects (column `OBJECTS`) and the total size of the data (column `BYTES`).
Both values are cumulative over the job’s entire lifetime.
Certain kinds of supported jobs provide extended statistics, including:
Show EC Encoding Statistics
The output contains a few extra columns:
- `ERRORS` - the total number of objects that EC failed to encode
- `QUEUE` - the average length of the working queue, i.e., the average number of objects waiting in the queue when a new EC encode request is received. Values close to `0` mean that every object was processed immediately after the request had been received
- `AVG TIME` - the average total processing time for an object, from the moment the object is put into the working queue to the moment the last encoded slice is sent to another target
- `ENC TIME` - the average amount of time spent on encoding an object
The extended statistics may hint at the possible bottleneck:
- high values in `QUEUE`: EC is congested and does not have time to process all incoming requests
- low values in `QUEUE` and `ENC TIME`, but high values in `AVG TIME`: the network may be slow, so a lot of time is spent sending the encoded slices
- low values in `QUEUE`, and `ENC TIME` close to `AVG TIME`: the local hardware may be overloaded (local drives or CPUs)
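The rules of thumb for EC encoding can be encoded as a small decision function. In the sketch below, the `ECEncodeStats` type and every threshold (`queue_high`, `close_ratio`, `net_ratio`) are hypothetical. What counts as "high" or "close" depends on the cluster and workload.

```python
from dataclasses import dataclass

@dataclass
class ECEncodeStats:
    queue: float        # QUEUE: average working-queue length
    enc_time_ms: float  # ENC TIME: average encoding time per object
    avg_time_ms: float  # AVG TIME: average total processing time per object

def ec_bottleneck_hint(s: ECEncodeStats, queue_high: float = 10.0,
                       close_ratio: float = 0.8, net_ratio: float = 0.2) -> str:
    """Map EC encoding stats to a rough bottleneck hint (thresholds are hypothetical)."""
    if s.queue >= queue_high:
        return "EC congested: requests arrive faster than they are processed"
    if s.avg_time_ms <= 0:
        return "not enough data"
    share = s.enc_time_ms / s.avg_time_ms  # fraction of total time spent encoding
    if share >= close_ratio:
        return "local hardware (drives or CPUs) may be overloaded"
    if share <= net_ratio:
        return "network may be slow: most time goes to sending slices"
    return "no clear bottleneck"
```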
Show EC Restoring Statistics
Show information about EC restore requests.
The output contains a few extra columns:
- `ERRORS` - the total number of objects that EC failed to restore
- `QUEUE` - the average length of the working queue, i.e., the average number of objects waiting in the queue when a new EC request is received. Values close to `0` mean that every object was processed immediately after the request had been received
- `AVG TIME` - the average total processing time for an object, from the moment the object is put into the working queue to the moment the last encoded slice is sent to another target
Options
| Flag | Type | Description | Default |
|---|---|---|---|
| `--json` | `bool` | Output details in JSON format | `false` |
| `--all` | `bool` | If set, additionally displays old, finished xactions | `false` |
| `--active` | `bool` | If set, displays only running xactions | `false` |
| `--verbose`, `-v` | `bool` | If set, displays all xaction statistics, including extended ones. If the number of xactions to display is greater than one, the flag is ignored. | `false` |
Certain extended actions have additional CLI commands. In particular, rebalance stats can also be displayed using the following command:
ais show rebalance
Display details about the most recent rebalance xaction.
| Flag | Type | Description | Default |
|---|---|---|---|
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds). Ctrl-C to stop monitoring. | ` ` |
| `--all` | `bool` | If set, show all rebalance xactions | `false` |
The output of this command differs from the generic xaction output.
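The unit suffixes listed for `--refresh` match Go-style duration strings. The sketch below parses only the subset named in the table (`m`, `s`, `ms`) into seconds, including compound values such as `1m30s`. Go's own `time.ParseDuration` accepts more units (for example `h`, `us`, `ns`). `parse_refresh` is a hypothetical helper, not part of the CLI.

```python
import re

# Seconds per unit, limited to the suffixes named for --refresh.
_UNITS = {"m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so that "500ms" is not read as "500m" + "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|m|s)")

def parse_refresh(value: str) -> float:
    """Parse '5s', '500ms', or '1m30s' into seconds; raise ValueError otherwise."""
    pos, total = 0, 0.0
    for m in _TOKEN.finditer(value):
        if m.start() != pos:
            break  # junk between tokens
        total += float(m.group(1)) * _UNITS[m.group(2)]
        pos = m.end()
    if not value or pos != len(value):
        raise ValueError(f"invalid duration: {value!r}")
    return total
```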
Examples
Default compact tabular view:
$ ais show job --all
NODE ID KIND BUCKET OBJECTS BYTES START END STATE
zXZXt8084 FXjl0NWGOU ec-put TESTAISBUCKET-ec-mpaths 5 4.56MiB 12-02 13:04:50 12-02 13:04:50 Aborted
Verbose tabular view:
$ ais show job FXjl0NWGOU --verbose
PROPERTY VALUE
.aborted true
.bck ais://TESTAISBUCKET-ec-mpaths
.end 12-02 13:04:50
.id FXjl0NWGOU
.kind ec-put
.start 12-02 13:04:50
ec.delete.err.n 0
ec.delete.n 0
ec.delete.time 0s
ec.encode.err.n 0
ec.encode.n 5
ec.encode.size 4.56MiB
ec.encode.time 16.964552ms
ec.obj.process.time 17.142239ms
ec.queue.len.n 0
in.obj.n 0
in.obj.size 0
is_idle true
loc.obj.n 5
loc.obj.size 4.56MiB
out.obj.n 0
out.obj.size 0
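For ad-hoc scripting, the two-column verbose view can be turned into key/value pairs. Some values, such as timestamps, contain spaces, so each line is split only on its first run of whitespace. This is a minimal sketch with a hypothetical `parse_verbose` helper. For anything robust, `--json` is the better choice.

```python
def parse_verbose(text: str) -> dict[str, str]:
    """Parse two-column PROPERTY/VALUE output into a dict of strings."""
    props = {}
    for line in text.strip().splitlines():
        parts = line.strip().split(None, 1)  # split on the first whitespace run only
        if len(parts) != 2 or parts[0] == "PROPERTY":
            continue  # skip the header and malformed lines
        key, value = parts
        props[key] = value.strip()
    return props
```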
Wait for job
ais wait [NAME] [JOB_ID] [NODE_ID] [BUCKET]
Wait for the specified job to finish.
Note: `job wait download|dsort` have slightly different options. Please see their respective documentation for details.
Options
| Flag | Type | Description | Default |
|---|---|---|---|
| `--refresh` | `duration` | Refresh interval - time duration between reports. The usual unit suffixes are supported and include `m` (for minutes), `s` (seconds), `ms` (milliseconds) | ` ` |
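Waiting amounts to polling the job's state at every `--refresh` interval until it finishes. The sketch below shows that loop. `is_finished` stands in for whatever status query is used. The `timeout` parameter is a hypothetical addition, not a documented flag of `ais wait`.

```python
import time
from typing import Callable, Optional

def wait_for_job(is_finished: Callable[[], bool], refresh: float = 1.0,
                 timeout: Optional[float] = None) -> bool:
    """Poll is_finished() every `refresh` seconds.

    Returns True once the job has finished, or False if the (hypothetical)
    timeout expires first; with timeout=None, waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_finished():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(refresh)  # the interval that --refresh controls
    return True
```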
Distributed Sort
`ais job start dsort` or `ais start dsort`
Run dSort. Further reference for this command can be found in the dSort documentation.
Downloader
`ais job start download` or `ais start download`
Run the AIS Downloader. Further reference for this command can be found in the AIS Downloader documentation.