INFRA-135: Timeout stalled buildkite jobs

Metadata

Source: INFRA-135
Type: Improvement
Priority: Major
Status: Closed
Resolution: Fixed
Assignee: Avtar Gill
Reporter: Justin Obara
Created: 2018-03-07T14:38:03.300-0500
Updated: 2018-03-28T13:07:15.391-0400
Versions: N/A
Fixed Versions: N/A
Component: N/A

Description

Recently I submitted updates to a PR, and the resulting builds failed to complete their Buildkite steps.

https://buildkite.com/fluid-project/fluid-infusion/builds/65
https://buildkite.com/fluid-project/fluid-infusion/builds/67

It seems that this was related to a missing dependency in one of the test files. However, it showcased an issue with the current setup: a stalled Buildkite job will block all other Buildkite operations.

Comments

  • Avtar Gill commented 2018-03-07T16:23:13.249-0500

    I'm aiming to make a PR for this next week. For reference, there's a way to create a global timeout using the Buildkite web UI, but we're using pipeline configs:

    https://github.com/buildkite/feedback/issues/170

    The workaround will be to add timeout_in_minutes to each step.

  • Giovanni Tirloni commented 2018-03-28T13:07:15.388-0400

    Avtar submitted PR#884 which added timeouts.
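
The workaround mentioned in the comments, adding timeout_in_minutes to each step, could be sketched roughly as follows. This is only an illustration and not the contents of the actual PR: it assumes the pipeline config has already been parsed into a plain dictionary, and the function name add_step_timeouts, the 30-minute default, and the sample step labels and commands are all hypothetical.

```python
from copy import deepcopy

# Hypothetical default; the ticket does not state the value that was chosen.
DEFAULT_TIMEOUT_MINUTES = 30


def add_step_timeouts(pipeline, minutes=DEFAULT_TIMEOUT_MINUTES):
    """Return a copy of the pipeline with timeout_in_minutes set on every
    command step that does not already define one."""
    result = deepcopy(pipeline)
    for step in result.get("steps", []):
        # Only command steps take a timeout; skip bare "wait" strings and
        # any dict steps without a command (for example block steps).
        if isinstance(step, dict) and ("command" in step or "commands" in step):
            step.setdefault("timeout_in_minutes", minutes)
    return result


# Hypothetical pipeline, shown as the dict a YAML parser would produce.
pipeline = {
    "steps": [
        {"label": "lint", "command": "npm run lint"},
        "wait",
        {"label": "test", "command": "npm test", "timeout_in_minutes": 60},
    ]
}

patched = add_step_timeouts(pipeline)
```

With a timeout on each step, a job that stalls is cancelled once its limit is reached instead of holding up later builds. Steps that already set their own timeout_in_minutes keep that value.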