INFRA-55: Implement persistent storage setup for Kubernetes

Metadata

Source
INFRA-55
Type
Technical task
Priority
Major
Status
Closed
Resolution
Won't Do
Assignee
N/A
Reporter
Alan Harnum
Created
2015-11-02T10:30:03.255-0500
Updated
2017-09-21T10:31:35.595-0400
Versions
N/A
Fixed Versions
N/A
Component
N/A

Description

We'll want to be able (now or in the future) to run containerized applications with persistent data. The most viable models for our needs appear to be either NFS or Ceph.

Background reading:
https://github.com/kubernetes/kubernetes/blob/release-1.0/docs/user-guide/volumes.md
https://github.com/kubernetes/kubernetes/blob/release-1.0/docs/user-guide/persistent-volumes.md

Giovanni Tirloni's notes on failure modes of NFS:

  • We need a workload that writes a lot of meaningful data to disk while a crash happens, so that consistency can be checked afterwards (candidates: MySQL, InfluxDB, CouchDB)
  • Failure modes:
    • Server A or B stops receiving the other's heartbeats and no longer knows which one is the master => split-brain situation
    • Server A or B dies, is reinstalled from scratch, and wants to join the cluster (syncing with the remaining healthy node)
    • Server A primary / Server B secondary => Server A shuts down and stays down
    • Server A primary / Server B secondary => Server B shuts down and stays down
    • Server A secondary / Server B primary => Server A shuts down and stays down
    • Server A secondary / Server B primary => Server B shuts down and stays down
    • Server A primary / Server B secondary => Server A hard resets and tries to join cluster as healthy node
    • Server A primary / Server B secondary => Server B hard resets and tries to join cluster as healthy node
    • Server A secondary / Server B primary => Server A hard resets and tries to join cluster as healthy node
    • Server A secondary / Server B primary => Server B hard resets and tries to join cluster as healthy node
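
The last eight failure modes above are the full combination of two role layouts, two failing servers, and two failure kinds. As a rough sketch only (the names `ROLE_LAYOUTS`, `FAILING_SERVERS`, `FAILURE_KINDS`, and `failover_scenarios` are hypothetical and not part of any existing tooling), the matrix could be generated rather than maintained by hand, which makes it easier to confirm that no combination is skipped when testing:

```python
from itertools import product

# Hypothetical names; the values are copied from the failure-mode list above.
ROLE_LAYOUTS = [("primary", "secondary"), ("secondary", "primary")]  # (role of A, role of B)
FAILING_SERVERS = ["A", "B"]
FAILURE_KINDS = [
    "shuts down and stays down",
    "hard resets and tries to join cluster as healthy node",
]


def failover_scenarios():
    """Return every role layout / failing server / failure kind combination."""
    scenarios = []
    # The failure kind varies slowest, so the order matches the list above.
    for kind, (role_a, role_b), failed in product(
        FAILURE_KINDS, ROLE_LAYOUTS, FAILING_SERVERS
    ):
        scenarios.append(
            f"Server A {role_a} / Server B {role_b} => Server {failed} {kind}"
        )
    return scenarios


if __name__ == "__main__":
    for number, scenario in enumerate(failover_scenarios(), start=1):
        print(f"{number}. {scenario}")
```

The split-brain and reinstall-from-scratch cases do not fit this pattern and would still need to be listed separately.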

Comments

  • Giovanni Tirloni commented 2017-09-21T10:31:35.593-0400

    This was created for a demo cluster back in 2015, but we ended up choosing a different solution at the time. Closing this ticket to document that; I will open new ones for the other automation work I'll need to do.