github.com/altoros/juju-vmware@v0.0.0-20150312064031-f19ae857ccca/doc/backup_and_restore.txt

Backup and Restore
===================

Backup of juju's state is a critical feature, not only for juju users
but for use inside juju itself. This is likewise the case for the
ability to restore previous backups. This doc is intended as an
overview of both since changes in juju are prone to break both.

Backup
-------------------

Backing up juju state involves dumping the state database (currently
from mongo) and copying all files that are critical to juju's operation.
All the files are bundled up into an archive file. Effectively the
archive represents a snapshot of juju state.

That snapshot is stored by the state server in such a way that only a
backup ID is needed for restore (no need to upload an archive). Note
that if the state server is not available, restoring with just the ID is
not an option. While that situation will need to be addressed in the
short term, it should not require much additional effort.

We make reasonable efforts to ensure that the archive is consistent with
the snapshot. This includes stopping the database for the length of
time it takes to dump it. There is, however, room for improvement with
regard to ensuring the consistency of the archive.
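As an illustration of "stopping the database for the length of time it
takes to dump it", here is a minimal Go sketch. It is not the juju
implementation: the service type and the withStopped helper are
hypothetical names invented for this example. The point it shows is
that the restart is deferred, so the DB comes back even if the dump
fails.

```go
package main

import (
	"errors"
	"fmt"
)

// service is a hypothetical handle on a system service such as juju-db.
type service struct {
	name    string
	running bool
}

func (s *service) Stop() error  { s.running = false; return nil }
func (s *service) Start() error { s.running = true; return nil }

// withStopped stops svc, runs fn, and always tries to start svc again,
// even when fn fails, so a failed dump does not leave the DB down.
func withStopped(svc *service, fn func() error) (err error) {
	if stopErr := svc.Stop(); stopErr != nil {
		return fmt.Errorf("stopping %s: %v", svc.name, stopErr)
	}
	defer func() {
		// Report a restart failure only if fn itself succeeded.
		if startErr := svc.Start(); startErr != nil && err == nil {
			err = fmt.Errorf("restarting %s: %v", svc.name, startErr)
		}
	}()
	return fn()
}

func main() {
	db := &service{name: "juju-db", running: true}
	err := withStopped(db, func() error {
		fmt.Println("dumping while running =", db.running)
		return errors.New("dump failed") // simulate a failed dump
	})
	fmt.Println("error:", err, "| running again =", db.running)
}
```

The window during which juju commands fail is exactly the time spent
inside fn, which is why the length of the dump matters.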

First of all, running juju commands will fail while the DB is
unavailable (already running services should not be affected). While
this period of time is rather short, we expect that it will grow over
time. Furthermore, the larger an environment's state, the larger the
impact of this downtime. In the long term this makes it less than ideal
to run backup as often as it should be run.

Secondly, state currently does not block for the backup process as a
whole. This means that if we dump the DB first, it may be out of date
by the time we finish gathering the state-related files. In practice
this isn't a big concern since we do not expect the files to change
during the interval that backup is running. However, we do back up some
log files, so there is a small chance they will differ from how they
were when the backup started.

Restore
-------------------

Restore involves reviving the juju state in a new environment by
reversing the steps taken by backup. However, the process is a bit more
complicated than just gathering files and dumping the DB.

If no state server is present, restore will do the following:

1. bootstrap a new node in safe mode (ProvisionerSafeMode reports
   whether the provisioner should not destroy machines it does not know
   about),
2. stop juju-db,
3. stop jujud-machine,
4. load the backed-up db in place of the recently created one,
5. un-tar the fs files onto the root dir of the current machine,
6. run a set of bash scripts that replace the dns/instance names of the
   old machine with those of the new machine in the relevant config
   files and also in the db (if this step is not performed peergrouper
   will kick our machine out of the vote list and fill it with the old
   dead ones),
7. restart all services.

As noted above, restoring via an uploaded archive (rather than by using
an ID) will need to be addressed in the short term, since the existing
restore functionality works this way. However, it shouldn't involve
more than bootstrapping a new environment, uploading the archive to it,
and then requesting restore of that backup. The design in this document
already accommodates doing this.

HA
-------------------

HA is a work in progress. For the moment we have basic support, which is
an extension of the regular backup functionality. Read carefully before
attempting backup/restore on an HA environment.

In the case of HA, the backup process will back up the files and DB for
machine 0; support for "any working state server" is planned for the
near future. We assume, for now, that if you are running restore it is
because you have lost all state server machines. Out of this restore
you will get one functioning state server that you can use to start your
other state machines.

BEWARE: only run restore when you no longer have any working state
servers, since otherwise this will take them offline and possibly
cripple your environment.

Previous Implementation
-------------------

Backup and restore were both implemented as plugins (in cmd/plugins/) to
the juju CLI. The plugins were essentially scripts we sent over SSH to
the state machine and ran there. However, they were definitely distinct
pieces of software.


Implementation
===================

Key Points
-------------------

* Backups are created and then stored on the state machine.
* Each backup archive has an associated metadata document (stored in
  mongo).
* Each backup archive is stored relative to the state machine (currently
  env storage).
* Restore will have access to the state machine where the archive and
  metadata are stored.
* In the common case there is no need to upload or download a backup.
* The backups machinery has its own facade in state/apiserver/backups.
* The choice of mechanism for uploading and downloading backups has not
  been decided yet.
* The backups machinery is divided into 4 layers:
  - state-dependent functionality,
  - state-independent functionality,
  - the state API facade for backups,
  - the juju CLI sub-command for backups.
* The state-independent functionality can be broken down further:
  - a high-level backups interface/implementation,
  - low-level backup/restore functionality,
  - components of the backups machinery.
* Backups depend on the github.com/juju/utils/filestorage package.
* Backups have a special relationship with state (see note at
  beginning of state/backups.go).

Backup Archives
-------------------

Each backup archive is a gzipped tar file (.tar.gz). It has the
following structure:

juju-backup/
    metadata.json - the backup metadata for the archive.
    root.tar      - the bundle of state-related files (excluding mongo).
    dump/         - all the files dumped from the DB (using mongodump).

At present we do not include any sort of manifest/index file in the
archive.

For more information, see:
- state/backups/db/dump.go - how the DB is dumped;
- state/backups/files/files.go - which files are included in root.tar.

File Layout
--------------------

The layering of the backups machinery and divisions of the
state-independent functionality map almost directly to the following
structure in the filesystem. The state API facade for backups is spread
between state/apiserver and state/api.

state/
    backups.go      - state-dependent functionality (basically the
                      interaction with mongo and with env storage)
    backups/        - state-independent functionality
        backups.go  - high-level/public backups interface/implementation
        create.go   - low-level implementation of backing up juju state
        restore.go  - low-level implementation of restoring juju state
        archive/    - an abstraction of a backups archive
        db/         - all stuff related to external interaction with the
                      DB (internal interactions live in state/backups.go)
        files/      - all stuff related to files we back up and restore
        metadata/   - the backups metadata implementation
    apiserver/
        backups/    - the state API facade for backups
            backups.go - facade implementation (not including methods
                         for end-points)
            create.go  - implementation of the Create() end-point
            info.go    - (wraps state/backups/backups.go:Backups.Get)
            list.go
            remove.go
            restore.go - implementation of the Restore() end-point
    api/
        backups.go  - the juju state API client implementation for the
                      backups facade
        params/
            backups.go - the backups-related API arg/result types
cmd/
    juju/
        backups.go  - the juju CLI sub-command implementation

Note that upload/download aren't accommodated in apiserver/backups/ yet.

Layers of Abstraction
--------------------

As noted above, the backups machinery is divided into 4 layers and the
state-independent portion into 3 parts. Here is an example (using
"create") of how those layers interact.

* The juju CLI for backups wraps:
  - the backups facade's Create() method.
* The state API facade wraps:
  - the high-level backups implementation (state/backups/backups.go),
  - the state-backups interactions (state/backups.go).
* The backups implementation wraps:
  - a "filestorage" implementation (../utils/filestorage:FileStorage),
  - the low-level "create" implementation,
  - DB connection info (state/backups/db/info.go),
  - the backup's metadata.
* The "create" implementation makes use of:
  - the code in state/backups/{archive,db,files}.
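The wrapping described above can be sketched in Go as follows. This is
a simplified illustration, not the juju code: all type and function
names (createFunc, backups, facade, createResult, runCreate) are
hypothetical, the storage backend is reduced to a map, and DB
connection info and metadata are left out entirely.

```go
package main

import "fmt"

// createFunc stands in for the low-level "create" implementation.
// The signature is an assumption made for this sketch.
type createFunc func() ([]byte, error)

// backups stands in for the high-level implementation. It wraps a
// storage backend (here just a map) and the low-level create function.
type backups struct {
	storage map[string][]byte
	create  createFunc
	count   int
}

// Create builds an archive, stores it, and returns the new backup ID.
func (b *backups) Create() (string, error) {
	archive, err := b.create()
	if err != nil {
		return "", fmt.Errorf("creating archive: %v", err)
	}
	b.count++
	id := fmt.Sprintf("backup-%d", b.count)
	b.storage[id] = archive
	return id, nil
}

// createResult is a hypothetical, simplified API result type.
type createResult struct {
	ID string
}

// facade stands in for the state API facade; it wraps backups.
type facade struct {
	backups *backups
}

func (f *facade) Create() (createResult, error) {
	id, err := f.backups.Create()
	return createResult{ID: id}, err
}

// runCreate stands in for the CLI layer; it only talks to the facade.
func runCreate(f *facade) (string, error) {
	res, err := f.Create()
	if err != nil {
		return "", err
	}
	return "created backup " + res.ID, nil
}

func main() {
	b := &backups{
		storage: map[string][]byte{},
		create:  func() ([]byte, error) { return []byte("archive bytes"), nil },
	}
	msg, err := runCreate(&facade{backups: b})
	fmt.Println(msg, err)
}
```

Each layer only knows about the one directly beneath it, which is what
lets the low-level create code stay independent of state and the API.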

Backups Interface
--------------------

Backups
    Add(meta Metadata, archive io.ReadCloser) error
    Create() (id string, err error)
    Get(id string) (Metadata, io.ReadCloser, error)
    List() ([]Metadata, error)
    Remove(id string) error
    Restore(id string) error

Note: Restore() makes use of Get().
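For readers who want to experiment with the interface, here is a small
in-memory Go sketch of it. The method set is copied from the listing
above; everything else is assumed for illustration: the Metadata fields,
the memBackups type, the ID format, and the placeholder archive
contents. It requires Go 1.16 or later for io.ReadAll and io.NopCloser.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sort"
)

// Metadata is a hypothetical, trimmed-down stand-in for the real type.
type Metadata struct {
	ID    string
	Notes string
}

// Backups mirrors the interface listed above.
type Backups interface {
	Add(meta Metadata, archive io.ReadCloser) error
	Create() (id string, err error)
	Get(id string) (Metadata, io.ReadCloser, error)
	List() ([]Metadata, error)
	Remove(id string) error
	Restore(id string) error
}

type stored struct {
	meta Metadata
	data []byte
}

// memBackups keeps archives in memory; for illustration only.
type memBackups struct {
	items    map[string]stored
	counter  int
	restored []byte // contents of the last archive passed to Restore
}

func newMemBackups() *memBackups { return &memBackups{items: map[string]stored{}} }

func (m *memBackups) Add(meta Metadata, archive io.ReadCloser) error {
	defer archive.Close()
	data, err := io.ReadAll(archive)
	if err != nil {
		return err
	}
	m.items[meta.ID] = stored{meta: meta, data: data}
	return nil
}

func (m *memBackups) Create() (string, error) {
	m.counter++
	id := fmt.Sprintf("backup-%d", m.counter)
	// A real implementation would dump the DB and gather files here.
	body := io.NopCloser(bytes.NewReader([]byte("archive " + id)))
	return id, m.Add(Metadata{ID: id}, body)
}

func (m *memBackups) Get(id string) (Metadata, io.ReadCloser, error) {
	s, ok := m.items[id]
	if !ok {
		return Metadata{}, nil, fmt.Errorf("backup %q not found", id)
	}
	return s.meta, io.NopCloser(bytes.NewReader(s.data)), nil
}

func (m *memBackups) List() ([]Metadata, error) {
	var out []Metadata
	for _, s := range m.items {
		out = append(out, s.meta)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out, nil
}

func (m *memBackups) Remove(id string) error {
	if _, ok := m.items[id]; !ok {
		return fmt.Errorf("backup %q not found", id)
	}
	delete(m.items, id)
	return nil
}

// Restore fetches the archive through Get, as the note above describes.
func (m *memBackups) Restore(id string) error {
	_, archive, err := m.Get(id)
	if err != nil {
		return err
	}
	defer archive.Close()
	m.restored, err = io.ReadAll(archive)
	return err
}

func main() {
	var b Backups = newMemBackups()
	id, _ := b.Create()
	fmt.Println("created", id, "restore error:", b.Restore(id))
}
```

Routing Restore through Get means a restore by ID fails in the same way
as any other lookup when the backup is missing.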

State API Facade
--------------------

BackupsAPI
    Create(BackupsCreateArgs) BackupsCreateResult
    Info(BackupsInfoArgs) BackupsMetadataResult
    List() BackupsMetadataListResult
    Remove(BackupsRemoveArgs)
    Restore(BackupsRestoreArgs)

Again note that upload and download are not yet included here.

CLI sub-command
--------------------

The juju CLI sub-command for backups is called "backups". Its own
sub-commands have basically a 1-to-1 equivalence with the API client
methods of the same respective names. The essential sub-commands are
exposed via the following options:

- juju backups [--create] [--quiet] [<notes>]
- juju backups --info <ID>
- juju backups --list [--brief]
- juju backups --remove <ID>
- juju backups --restore <ID>

Note: further options may be appropriate for later addition
(e.g. [<filename>] on --restore).
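The option set above can be sketched with Go's standard flag package.
This is an assumption-laden illustration, not the juju CLI code: it
swaps in the standard library flag parser, and the request type, the
parseBackupsArgs function, and the rule that the actions are mutually
exclusive are all invented for the example. It treats --create as the
default action, matching the brackets in the first option line.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"strings"
)

// request is a hypothetical parsed form of a "juju backups" invocation.
type request struct {
	action string // "create", "info", "list", "remove" or "restore"
	id     string // backup ID, for info/remove/restore
	notes  string // free-form notes, for create
	quiet  bool
	brief  bool
}

// parseBackupsArgs parses the arguments that follow "juju backups".
func parseBackupsArgs(args []string) (request, error) {
	fs := flag.NewFlagSet("backups", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	create := fs.Bool("create", false, "create a backup (the default)")
	list := fs.Bool("list", false, "list stored backups")
	quiet := fs.Bool("quiet", false, "print less output on create")
	brief := fs.Bool("brief", false, "print only IDs on list")
	info := fs.String("info", "", "show metadata for the given backup ID")
	remove := fs.String("remove", "", "remove the given backup ID")
	restore := fs.String("restore", "", "restore from the given backup ID")
	if err := fs.Parse(args); err != nil {
		return request{}, err
	}

	req := request{action: "create", quiet: *quiet, brief: *brief}
	chosen := 0
	if *create {
		chosen++
	}
	if *list {
		chosen++
		req.action = "list"
	}
	byID := []struct{ action, id string }{
		{"info", *info}, {"remove", *remove}, {"restore", *restore},
	}
	for _, a := range byID {
		if a.id != "" {
			chosen++
			req.action, req.id = a.action, a.id
		}
	}
	if chosen > 1 {
		return request{}, errors.New("only one action option may be given")
	}
	if req.action == "create" {
		// Remaining positional arguments are taken as the notes.
		req.notes = strings.Join(fs.Args(), " ")
	}
	return req, nil
}

func main() {
	req, err := parseBackupsArgs([]string{"--quiet", "before", "upgrade"})
	fmt.Printf("%+v %v\n", req, err)
}
```

Note that the standard flag package stops parsing at the first
positional argument, so options must come before the notes.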

Other anticipated sub-commands:

- juju backups --download <ID> [<filename>]
- juju backups --upload <filename>

Note that download and upload are only hypothetical, pending support in
the API facade. When we add the ability to restore from an archive
(rather than an ID), --download and --upload (or a --filename option on
restore) will become essential.