Job execution and queuing

About the Schematics job queue

When a user runs a Schematics operation, the resulting jobs are placed on a shared service queue. A job remains in the queue until workers are available in the targeted regions to execute it. Scheduling policies are applied to jobs to ensure that the service is equitably available to all users.

Functioning of job queue

Depending on the number of jobs that a user submits and how long those jobs take to run, the user can experience delays before jobs are executed. The job queue ensures that Schematics is equitably available to all users, regardless of the load generated by any one user.

Example

If user-1 has 20 jobs waiting in the queue and user-2 submits a new job, user-2's job is queued ahead of user-1's 20 jobs so that Schematics is equitably available to both user-1 and user-2.
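
The example can be illustrated with a small sketch. Schematics' actual scheduling policy is not described here, so the following assumes a hypothetical "least-served user first" rule: the next job comes from the waiting user who has had the fewest jobs dispatched so far. The class name `FairJobQueue` and its methods are invented for illustration only.

```python
from collections import defaultdict, deque


class FairJobQueue:
    """Hypothetical fair queue; not the Schematics implementation."""

    def __init__(self):
        # user -> FIFO of that user's waiting jobs (dict keeps submit order)
        self._pending = {}
        # user -> number of jobs already handed to workers
        self._dispatched = defaultdict(int)

    def submit(self, user, job):
        self._pending.setdefault(user, deque()).append(job)

    def next_job(self):
        """Return (user, job) for the next job to run, or None if empty."""
        if not self._pending:
            return None
        # Assumed policy: serve the waiting user who has been served least.
        # Ties go to the user who has been waiting longest.
        user = min(self._pending, key=lambda u: self._dispatched[u])
        job = self._pending[user].popleft()
        if not self._pending[user]:
            del self._pending[user]
        self._dispatched[user] += 1
        return user, job


queue = FairJobQueue()
for i in range(21):
    queue.submit("user-1", f"user-1-job-{i}")

first = queue.next_job()    # one user-1 job starts; 20 remain waiting
queue.submit("user-2", "user-2-job-0")
second = queue.next_job()   # user-2 is served before user-1's 20 waiting jobs
third = queue.next_job()    # user-1's next job follows
```

Under this assumed policy, user-2's single job does not wait behind user-1's backlog, which is the behavior the example describes.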

When does a job enter the pending queue?

A job can enter the pending queue in the following situations. Each situation is followed by a check that you can perform.

Your job requires more time to complete. Check that enough time is specified for the job to execute.
The image that is used by your job run does not exist. Check that the provided image details exist and that the name is specified.
The environment variable parameters that are required by the job are not specified. Check that the environment variables are defined.
The commands or arguments that are passed to the job are not valid. Check that the argument flags specified are correct.

Job time out

Terraform jobs such as plan, apply, and destroy on a workspace should generally not take more than a few hours to provision or deprovision resources. If you are provisioning many resources at once and the job takes many hours, consider splitting the resources across different workspaces. Schematics limits the execution time of a job to 24 hours. After 24 hours, the job is terminated and marked as STOPPED, and the workspace shows ACTIVE or INACTIVE.

After 24 hours, an interrupt signal is sent to stop job execution. A grace period of 10 minutes is given for the running command to finish. If the command does not complete in this time, a kill signal is sent and the job is terminated. A Terraform refresh is performed after the job is stopped to ensure that the state file and other data are collected.
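
The interrupt-then-kill sequence can be sketched as follows. This is a minimal illustration and not the Schematics implementation: the function name `run_with_timeout` and its return values are hypothetical, and it assumes a POSIX system where a child process can be sent SIGINT. Only the 24-hour limit and the 10-minute grace period come from the description above.

```python
import signal
import subprocess
import sys

JOB_LIMIT_S = 24 * 60 * 60   # 24-hour execution limit
GRACE_S = 10 * 60            # 10-minute grace period after the interrupt


def run_with_timeout(cmd, timeout_s=JOB_LIMIT_S, grace_s=GRACE_S):
    """Run cmd; on timeout send an interrupt, then kill after the grace period.

    Returns "completed", "interrupted", or "killed" (hypothetical labels).
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return "completed"
    except subprocess.TimeoutExpired:
        # Time limit reached: ask the command to stop.
        proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=grace_s)
        return "interrupted"
    except subprocess.TimeoutExpired:
        # Grace period exhausted: force termination.
        proc.kill()
        proc.wait()
        return "killed"


# Usage with short limits for experimentation (seconds instead of hours):
# run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 2, 1)
```

Sending the interrupt first gives the command a chance to shut down cleanly and save its state; the kill signal is only the fallback.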

In a job, multiple commands such as terraform init, terraform apply, and terraform refresh are executed. If the job times out during a command, each further command gets only the additional 10 minutes to finish. At the end of the 10 minutes, the command is killed.
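
The time budget across commands can be worked through with a small simulation. This reflects one reading of the rule above, not confirmed service behavior: a command may use whatever remains of the 24-hour limit plus the 10-minute grace period, and once the limit has passed, each later command gets the grace period only. The function `simulate_job` and the sample durations are hypothetical.

```python
JOB_LIMIT_S = 24 * 60 * 60   # 24-hour execution limit
GRACE_S = 10 * 60            # 10-minute grace period


def simulate_job(commands, limit_s=JOB_LIMIT_S, grace_s=GRACE_S):
    """commands: list of (name, seconds_needed). Returns [(name, outcome)]."""
    elapsed = 0
    results = []
    for name, needed in commands:
        # Assumed rule: remaining job time (never negative) plus the grace period.
        allowed = max(limit_s - elapsed, 0) + grace_s
        if needed <= allowed:
            elapsed += needed
            results.append((name, "finished"))
        else:
            elapsed += allowed
            results.append((name, "killed"))
    return results


outcome = simulate_job([
    ("terraform init", 60),
    ("terraform apply", float("inf")),   # stuck forever
    ("terraform refresh", 300),          # needs 5 minutes
])
```

In this run, the stuck apply is killed once the limit and grace period are used up, and the refresh still finishes because it fits inside its own 10-minute allowance.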

Example

If a job is stuck forever on a terraform apply, the command is stopped and a refresh is run. If the refresh is also stuck, a kill is executed after 15 minutes.

The Terraform local-exec and remote-exec provisioners have a time limit of 30 minutes, and job execution is terminated if that limit is exceeded.