$Id: turing.xml,v 1.58 2002/11/09 03:53:26 stevegt Exp $

****** Why Order Matters: Turing Equivalence in Automated Systems Administration ******

Steve Traugott, TerraLuna, LLC -- http://www.stevegt.com
Lance Brown, National Institute of Environmental Health Sciences -- lance@bearcircle.net

Originally accepted for publication in the proceedings of the USENIX Large Installation System Administration conference, Philadelphia, PA Nov 3-8, 2002.
Copyright 2002 Stephen Gordon Traugott, All Rights Reserved

***** Abstract *****

Hosts in a well-architected enterprise infrastructure are self-administered; they perform their own maintenance and upgrades. By definition, self-administered hosts execute self-modifying code. They do not behave according to simple state machine rules, but can incorporate complex feedback loops and evolutionary recursion.

The implications of this behavior are of immediate concern to the reliability, security, and ownership costs of enterprise computing. In retrospect, it appears that the same concerns also apply to manually-administered machines, in which administrators use tools that execute in the context of the target disk to change the contents of the same disk. The self-modifying behavior of both manual and automatic administration techniques helps explain the difficulty and expense of maintaining high availability and security in conventionally-administered infrastructures.

The practice of infrastructure architecture tool design exists to bring order to this self-referential chaos. Conventional systems administration can be greatly improved upon through discipline, culture, and adoption of practices better fitted to enterprise needs. Creating a low-cost maintenance strategy largely remains an art. What can we do to put this art into the hands of relatively junior administrators? We think that part of the answer includes adopting a well-proven strategy for maintenance tools, based in part upon the theoretical properties of computing.

In this paper, we equate self-administered hosts to Turing machines in order to help build a theoretical foundation for understanding this behavior. We discuss some tools that provide mechanisms for reliably managing self-administered hosts, using deterministic ordering techniques.

Based on our findings, it appears that no tool, written in any language, can predictably administer an enterprise infrastructure without maintaining a deterministic, repeatable order of changes on each host. The runtime environment for any tool always executes in the context of the target operating system; changes can affect the behavior of the tool itself, creating circular dependencies. The behavior of these changes may be difficult to predict in advance, so testing is necessary to validate changed hosts. Once changes have been validated in testing they must be replicated in production in the same order in which they were tested, due to these same circular dependencies.

The least-cost method of managing multiple hosts also appears to be deterministic ordering. All other known management methods seem to include either more testing or higher risk for each host managed.

This paper is a living document; revisions and discussion can be found at Infrastructures.Org, a project of TerraLuna, LLC.

***** 1 Foreword *****

...by Steve Traugott

In 1998, Joel Huddleston and I suggested that an entire enterprise infrastructure could be managed as one large "enterprise virtual machine" (EVM) [bootstrap]. That paper briefly described parts of a management toolset, later named ISconf [isconf]. This toolset, based on relatively simple makefiles and shell scripts, did not seem extraordinary at the time. At one point in the paper, we said that we would likely use cfengine [cfengine] the next time around -- I had been following Mark Burgess' progress since 1994.

That 1998 paper spawned a web site and community at Infrastructures.Org. This community in turn helped launch the Infrastructure Architecture (IA) career field. In the intervening years, we've seen the Infrastructures.Org community grow from a few dozen to a few hundred people, and the IA field blossom from obscurity into a major marketing campaign by a leading systems vendor.

Since 1998, Joel and I have both attempted to use other tools, including cfengine version 1. I've also tried to write tools from scratch again several times, with mixed success. We have repeatedly hit indications that our 1998 toolset was more optimized than we had originally thought. It appears that in some ways Joel and I, and the rest of our group at the Bank, were lucky; our toolset protected us from many of the pitfalls that are lying in wait for IAs.

One of these pitfalls concerns deterministic ordering; I never realized how important it was until I tried to use other tools that don't support it. When left without the ability to concisely describe the order of changes to be made on a machine, I've seen a marked decrease in my ability to predict the behavior of those changes, and a large increase in my own time spent monitoring, troubleshooting, and coding for exceptions. These experiences have shown me that loss of order seems to result in lower production reliability and higher labor cost.

The ordered behavior of ISconf was more by accident than design. I needed a quick way to get a grip on 300 machines. I cobbled a prototype together on my HP100LX palmtop one March '94 morning, during the 35-minute train ride into Manhattan. I used 'make' as the state engine because it's available on most UNIX machines. The deterministic behavior 'make' uses when iterating over prerequisite lists is something I didn't think of as important at the time -- I was more concerned with observing known dependencies than creating repeatable order.

Using that toolset and the EVM mindset, we were able to repeatedly respond to the chaotic international banking mergers and acquisitions of the mid-90's. This response included building and rebuilding some of the largest trading floors in the world, launching on schedule each time, often with as little as a few months' notice, each launch cleaner than the last. We knew at the time that these projects were difficult; after trying other tool combinations for more recent projects I think I have a better appreciation for just how difficult they were. The phrase "throwing a truck through the eye of a needle" has crossed my mind more than once. I don't think we even knew the needle was there.

At the invitation of Mark Burgess, I joined his LISA 2001 [lisa] cfengine workshop to discuss what we'd found so far, with possible targets for the cfengine 2.0 feature set.
The ordering requirement seemed to need more work; I found ordering surprisingly difficult to justify to an audience practiced in the use of convergent tools, where ordering is often considered a constraint to be specifically avoided [couch] [eika-sandnes]. Later that week, Lance Brown and I were discussing this over dinner, and he hit on the idea of comparing a UNIX machine to a Turing machine. The result is this paper.

Based on the symptoms we have seen when comparing ISconf to other tools, I suspect that ordering is a keystone principle in automated systems administration. Lance and I, with a lot of help from others, will attempt to offer a theoretical basis for this suspicion. We encourage others to attempt to refute or support this work at will; I think systems administration may be about to find its computer science roots. We have also already accumulated a large FAQ for this paper -- we'll put that on the website. Discussion on this paper as well as related topics is encouraged on the infrastructures mailing list at http://Infrastructures.Org.

***** 2 Why Order Matters *****

There seem to be several major reasons why the order of changes made to machines is important in the administration of an enterprise infrastructure:

A "circular dependency" or control-loop problem exists when an administrative tool executes code that modifies the tool or the tool's own foundations (the underlying host). Automated administration tool designers cannot assume that the users of their tool will always understand the complex behavior of these circular dependencies. In most cases we will never know what dependencies end users might create. See sections (8.40), (8.46).

A test infrastructure is needed to test the behavior of changes before rolling them to production. No tool or language can remove this need, because no testing is capable of validating a change in any conditions other than those tested. This test infrastructure is useless unless there is a way to ensure that production machines will be built and modified in the same way as the test machines. See section (6), 'The_Need_for_Testing'.

It appears that a tool that produces a deterministic order of changes is cheaper to use than one that permits more flexible ordering. The unpredictable behavior resulting from unordered changes to disk is more costly to validate than the predictable behavior produced by deterministic ordering. See section (8.58). Because cost is a significant driver in the decision-making process of most IT organizations, we will discuss this point more in section (3).

Local staff must be able to use administrative tools after a cost-effective (i.e. cheap and quick) turnover phase. While senior infrastructure architects may be well-versed in avoiding the pitfalls of unordered change, we cannot be on the permanent staff of every IT shop on the globe. In order to ensure continued health of machines after rollout of our tools, the tools themselves need to have some reasonable default behavior that is safe if the user lacks this theoretical knowledge. See section (8.54).

This business requirement must be addressed by tool developers. In our own practice, we have been able to successfully turn over enterprise infrastructures to permanent staff many times over the last several years.
Turnover training in our case is relatively simple, because our toolsets have always implemented ordered change by default. Without this default behavior, we would have also needed to attempt to teach the advanced techniques needed for dealing with unordered behavior, such as inspection of code in vendor-supplied binary packages. See section (7.2.2), 'Right_Packages,_Wrong_Order'.

***** 3 A Prediction *****

"Order Matters" when we care about both quality and cost while maintaining an enterprise infrastructure. If the ideas described in this paper are correct, then we can make the following prediction:

    The least-cost way to ensure that the behavior of any two hosts will
    remain completely identical is to always implement the same changes
    in the same order on both hosts.

This sounds very simple, almost intuitive, and for many people it is. But to our knowledge, isconf [isconf] is the only generally-available tool which specifically supports administering hosts this way. There seems to be no prior art describing this principle, and in our own experience we have yet to see it specified in any operational procedure. It is trivially easy to demonstrate in practice, but has at times been surprisingly hard to support in conversation, due to the complexity of theory required for a proof.

Note that this prediction does not apply only to those situations when you want to maintain two or more identical hosts. It applies to any computer-using organization that needs cost-effective, reliable operation. This includes those that have many unique production hosts. See section (6), 'The_Need_for_Testing'. Section (4.3) discusses this further, including single-host rebuilds after a security breach.

This prediction also applies to disaster recovery (DR) and business continuity planning. Any credible DR procedure includes some method of rebuilding lost hosts, often with new hardware, in a new location. Restoring from backups is one way to do this, but making complete backups of multiple hosts is redundant -- the same operating system components must be backed up for each host, when all we really need are the user data and host build procedures (how many copies of /bin/ls do we really need on tape?). It is usually more efficient to have a means to quickly and correctly rebuild each host from scratch. A tool that maintains an ordered record of changes made after install is one way to do this.

This prediction is particularly important for those organizations using what we call self-administered hosts. These are hosts that run an automated configuration or administration tool in the context of their own operating environment. Commercial tools in this category include Tivoli, Opsware, and CenterRun [tivoli] [opsware] [centerrun]. Open-source tools include cfengine, lcfg, pikt, and our own isconf [cfengine] [lcfg] [pikt] [isconf]. We will discuss the fitness of some of these tools later -- not all appear fully suited to the task.

This prediction also applies to those organizations which still use an older practice called "cloning" to create and manage hosts. In cloning, an administrator or tool copies a disk image from one machine to another, then makes the changes needed to make the host unique (at minimum, IP address and hostname).
After these initial changes, the administrator will often make further changes over the life of the machine. These changes may be required for additional functionality or security, but are too minor to justify re-cloning. Unless order is observed, identical changes made to multiple hosts are not guaranteed to behave in a predictable way (8.47). The procedure needed for properly maintaining cloned machines is not substantially different from that described in section (7.1).

This prediction, stated more formally in section (8.58), seems to apply to UNIX, Windows, and any other general-purpose computer with a rewritable disk and modern operating system. More generally, it seems to apply to any von Neumann machine with rewritable nonvolatile storage.

***** 4 Management Methods *****

All computer systems management methods can be classified into one of three categories: divergent, convergent, and congruent.

**** 4.1 Divergence ****

Divergence (figure_4.1.1) generally implies bad management. Experience shows us that virtually all enterprise infrastructures are still divergent today. Divergence is characterized by the configuration of live hosts drifting away from any desired or assumed baseline disk content.

[images/divergence.png]
Figure 4.1.1: Divergence

One quick way to tell if a shop is divergent is to ask how changes are made on production hosts, how those same changes are incorporated into the baseline build for new or replacement hosts, and how they are made on hosts that were down at the time the change was first deployed. If you get different answers, then the shop is divergent.

The symptoms of divergence include unpredictable host behavior, unscheduled downtime, unexpected package and patch installation failure, unclosed security vulnerabilities, significant time spent "firefighting", and high troubleshooting and maintenance costs.

The causes of divergence are generally that class of operations that create non-reproducible change. Divergence can be caused by ad-hoc manual changes, changes implemented by two independent automatic agents on the same host, and other unordered changes. Scripts which drive rdist, rsync, ssh, scp, [rdist] [rsync] [ssh] or other change agents as a push operation [bootstrap] are also a common source of divergence.

**** 4.2 Convergence ****

Convergence (figure_4.2.1) is the process most senior systems administrators first begin when presented with a divergent infrastructure. They tend to start by manually synchronizing some critical files across the diverged machines, then they figure out a way to do that automatically. Convergence is characterized by the configuration of live hosts moving towards an ideal baseline. By definition, all converging infrastructures are still diverged to some degree. (If an infrastructure maintains full compliance with a fully descriptive baseline, then it is congruent according to our definition, not convergent. See section (4.3), 'Congruence'.)

[images/convergence.png]
Figure 4.2.1: Convergence

The baseline description in a converging infrastructure is characteristically an incomplete description of machine state. You can quickly detect convergence in a shop by asking how many files are currently under management control.
If an approximate answer is readily available and is on the order of a few hundred files or less, then the shop is likely converging legacy machines on a file-by-file basis.

A convergence tool is an excellent means of bringing some semblance of order to a chaotic infrastructure. Convergent tools typically work by sampling a small subset of the disk -- via a checksum of one or more files, for example -- and taking some action in response to what they find. The samples and actions are often defined in a declarative or descriptive language that is optimized for this use. This emulates and preempts the firefighting behavior of a reactive human systems administrator -- "see a problem, fix it". Automating this process provides great economies of scale and speed over doing the same thing manually.

Convergence is a feature of Mark Burgess' Computer Immunology principles [immunology]. His cfengine is in our opinion the best tool for this job [cfengine]. Simple file replication tools [sup] [cvsup] [rsync] provide a rudimentary convergence function, but without the other action semantics and fine-grained control that cfengine provides.

Because convergence typically includes an intentional process of managing a specific subset of files, there will always be unmanaged files on each host. Whether current differences between unmanaged files will have an impact on future changes is undecidable, because at any point in time we do not know the entire set of future changes, or what files they will depend on.

It appears that a central problem with convergent administration of an initially divergent infrastructure is that there is no documentation or knowledge as to when convergence is complete. One must treat the whole infrastructure as if the convergence is incomplete, whether it is or not. So without more information, an attempt to converge formerly divergent hosts to an ideal configuration is a never-ending process. By contrast, an infrastructure based upon first loading a known baseline configuration on all hosts, and limited to purely orthogonal and non-interacting sets of changes, implements congruence (4.3). Unfortunately, this is not the way most shops use convergent tools such as cfengine.

The symptoms of a convergent infrastructure include a need to test all changes on all production hosts, in order to detect failures caused by remaining unforeseen differences between hosts. These failures can impact production availability. The deployment process includes iterative adjustment of the configuration tools in response to newly discovered differences, which can cause unexpected delays when rolling out new packages or changes. There may be a higher incidence of failures when deploying changes to older hosts. There may be difficulty eliminating some of the last vestiges of the ad-hoc methods mentioned in section (4.1). Continued use of ad-hoc and manual methods virtually ensures that convergence cannot complete.

With all of these faults, convergence still provides much lower overall maintenance costs and better reliability than what is available in a divergent infrastructure. Convergence features also provide more adaptive self-healing ability than pure congruence, due to a convergence tool's ability to detect when deviations from baseline have occurred. Congruent infrastructures rely on monitoring to detect deviations, and generally call for a rebuild when they have occurred. We discuss the security reasons for this in section (4.3).

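To make the "see a problem, fix it" cycle concrete, the core of a convergent rule can be sketched in a few lines of Bourne shell. This is only a sketch; the managed file and master-copy paths are hypothetical, and a real tool such as cfengine provides far richer sampling and action semantics:

    #!/bin/sh
    # Convergence sketch: sample one managed file, repair it on deviation.
    managed=/etc/ntp.conf              # hypothetical managed file
    master=/masters/etc/ntp.conf       # hypothetical master copy
    if ! cmp -s "$master" "$managed"; then
        cp -p "$master" "$managed"     # see a problem, fix it
        logger "converged $managed to master copy"
    fi

Note that every file without such a rule remains unmanaged; a real convergent configuration is a large collection of these samples and actions, and still covers only a subset of the disk.
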
We have found apparent limits to how far convergence alone can go. We know of no previously divergent infrastructure that, through convergence alone, has reached congruence (4.3). This makes sense; convergence is a process of eliminating differences on an as-needed basis, and the managed disk content will generally be a smaller set than the unmanaged content. In order to prove congruence, we would need to sample all bits on each disk, ignore those that are user data, determine which of the remaining bits are relevant to the operation of the machine, and compare those with the baseline.

In our experience, it is not enough to prove via testing that two hosts currently exhibit the same behavior while ignoring bit differences on disk; we care not only about current behavior, but future behavior as well. Bit differences that are currently deemed not functional, or even those that truly have not been exercised in the operation of the machine, may still affect the viability of future change directives. If we cannot predict the viability of future change actions, we cannot predict the future viability of the machine.

Deciding what bit differences are "functional" is often open to individual interpretation. For instance, do we care about the order of lines and comments in /etc/inetd.conf? We might strip out comments and reorder lines without affecting the current operation of the machine; this might seem like a non-functional change -- until, years later, the lack of comments hampers our ability to correctly understand the infrastructure when designing a new change. This example would seem to indicate that even non-machine-readable bit differences can be meaningful when attempting to prove congruence.

Unless we can prove congruence, we cannot validate the fitness of a machine without thorough testing, due to the uncertainties described in section (8.25). In order to be valid, this testing must be performed on each production host, due to the factors described in section (8.47). This testing itself requires either removing the host from production use or exposing untested code to users. Without this validation, we cannot trust the machine in mission-critical operation.

**** 4.3 Congruence ****

Congruence (figure_4.3.1) is the practice of maintaining production hosts in complete compliance with a fully descriptive baseline (7.1). Congruence is defined in terms of disk state rather than behavior, because disk state can be fully described, while behavior cannot (8.59).

[images/congruence.png]
Figure 4.3.1: Congruence

By definition, divergence from baseline disk state in a congruent environment is symptomatic of a failure of code, administrative procedures, or security. In any of these three cases, we may not be able to assume that we know exactly which disk content was damaged. It is usually safe to handle all three cases as a security breach: correct the root cause, then rebuild.

You can detect congruence in a shop by asking how the oldest, most complex machine in the infrastructure would be rebuilt if destroyed.
If years of sysadmin work can be replayed in an hour, unattended, without resorting to backups, and only user data need be restored from tape, then host management is likely congruent.

Rebuilds in a congruent infrastructure are completely unattended and generally faster than in any other kind; anywhere from 10 minutes for a simple workstation to 2 hours for a node in a complex high-availability server cluster (most of that two hours is spent in blocking sleeps while meeting barrier conditions with other nodes).

Symptoms of a congruent infrastructure include rapid, predictable, "fire-and-forget" deployments and changes. Disaster recovery and production sites can be easily maintained or rebuilt on demand in a bit-for-bit identical state. Changes are not tested for the first time in production, and there are no unforeseen differences between hosts. Unscheduled production downtime is reduced to that caused by hardware and application problems; firefighting activities drop considerably. Old and new hosts are equally predictable and maintainable, and there are fewer host classes to maintain. There are no ad-hoc or manual changes. We have found that congruence makes cost of ownership much lower, and reliability much higher, than any other method.

Our own experience and calculations show that the return-on-investment (ROI) of converting from divergence to congruence is less than 8 months for most organizations. See (figure_4.3.2). This graph assumes an existing divergent infrastructure of 300 hosts and a 2%/month growth rate, followed by adoption of congruent automation techniques. Typical observed values were used for other input parameters. Automation tool rollout began at the 6-month mark in this graph, causing temporarily higher costs; return on this investment comes in 5 months, where the manual and automatic lines cross over at the 11-month mark. Following crossover, we see a rapidly increasing cost savings, continuing over the life of the infrastructure. While this graph is calculated, the results agree with actual enterprise environments that we have converted. There is a CGI generator for this graph at Infrastructures.Org, where you can experiment with your own parameters.

[images/t7a_automation_curve.png]
Figure 4.3.2: Cumulative costs for fully automated (congruent) versus manual administration.

Congruence allows us to validate a change on one host in a class, in an expendable test environment, then deploy that change to production without risk of failure. Note that this is useful even when (or especially when) there may be only one production host in that class.

A congruence tool typically works by maintaining a journal of all changes to be made to each machine, including the initial image installation. The journal entries for a class of machine drive all changes on all machines in that class. The tool keeps a lifetime record, on the machine's local disk, of all changes that have been made on a given machine. In the case of loss of a machine, all changes made can be recreated on a new machine by "replaying" the same journal; likewise for creating multiple, identical hosts. The journal is usually specified in a declarative language that is optimized for expressing ordered sets and subsets. This allows subclassing and easy reuse of code to create new host types. See section (7.1), 'Describing_Disk_State'.

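The heart of such a journal mechanism fits in a few lines of shell. This is only a sketch under assumed conventions -- a hypothetical directory of change scripts named in commit order, and a hypothetical stamp directory for the lifetime record; a production tool adds classing, locking, and self-update on top of this:

    #!/bin/sh
    # Congruence sketch: apply journal entries in strict order, keeping a
    # lifetime record of completed entries so none is ever run twice.
    journal=/var/journal            # 0001.sh, 0002.sh, ... in commit order
    applied=/var/journal/applied    # lifetime record, on local disk
    mkdir -p "$applied"
    for entry in "$journal"/[0-9]*.sh; do
        stamp="$applied/`basename $entry`"
        [ -f "$stamp" ] && continue     # already applied on this host
        sh -e "$entry" || exit 1        # halt on error; never skip ahead
        touch "$stamp"                  # analogous to make's 'touch $@'
    done

Replaying the same journal on fresh hardware rebuilds a lost host; replaying it on many machines yields identical members of a class.
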
There are few tools capable of the ordered lifetime journaling required for congruent behavior. Our own isconf (7.3.1) is the only specifically congruent tool we know of in production use, though cfengine, with some care and extra coding, appears to be usable for administration of congruent environments. We discuss this in more detail in section (7.3.2).

We recognize that congruence may be the only acceptable technique for managing life-critical systems infrastructures, including those that:

* Influence the results of human-subject health and medicine experiments
* Provide command, control, communications, and intelligence (C3I) for battlefield and weapons systems environments
* Support command and telemetry systems for manned aerospace vehicles, including spacecraft and national airspace air traffic control

Our personal experience shows that awareness of the risks of conventional host management techniques has not yet penetrated many of these organizations. This is cause for concern.

***** 5 Ordered Thinking *****

We have found that designers of automated systems administration tools can benefit from a certain mindset:

    Think like a kernel developer, not an application programmer.

A good multitasking operating system is designed to isolate applications (and their bugs) from each other and from the kernel, and produce the illusion of independent execution. Systems administration is all about making sure that users continue to see that illusion.

Modern languages, compilers, and operating systems are designed to isolate applications programmers from "the bare hardware" and the low-level machine code, and enable object-oriented, declarative, and other high-level abstractions. But it is important to remember that the central processing unit(s) on a general-purpose computer only accept machine-code instructions, and these instructions are coded in a procedural language. High-level languages are convenient abstractions, but are dependent on several layers of code to deliver machine language instructions to the CPU.

In reality, on any computer there is only one program; it starts running when the machine finishes power-on self test (POST), and stops when you kill the power. This program is machine language code, dynamically linked at runtime, calling in fragments of code from all over the disk. These "fragments" of code are what we conventionally think of as applications, shared libraries, device drivers, scripts, commands, administrative tools, and the kernel itself -- all of the components that make up the machine's operating environment.

None of these fragments can run standalone on the bare hardware -- they all depend on others. We cannot analyze the behavior of any application-layer tool as if it were a standalone program. Even kernel startup depends on the bootloader, and in some operating systems the kernel runtime characteristics can be influenced by one or more configuration files found elsewhere on disk.

This perspective is opposite from that of an application programmer. An application programmer "sees" the system as an axiomatic underlying support infrastructure, with the application in control, and the kernel and shared libraries providing resources.
A kernel developer, though, is on the other side of the syscall interface; from this perspective, an application is something you load, schedule, confine, and kill if necessary.

On a UNIX machine, systems administration tools are generally ordinary applications that run as root. This means that they, too, are at the mercy of the kernel. The kernel controls them, not the other way around. And yet, we depend on automated systems administration tools to control, modify, and occasionally replace not only that kernel, but any and all other disk content. This presents us with the potential for a circular dependency chain.

A common misconception is that "there is some high-level tool language that will avoid the need to maintain strict ordering of changes on a UNIX machine". This belief requires that the underlying runtime layers obey axiomatic and immutable behavioral laws. When using automated administration tools we cannot consider the underlying layers to be axiomatic; the administration tool itself perturbs those underlying layers. See section (7.2.3), 'Circular_Dependencies'.

Inspection of high-level code alone is not enough. Without considering the entire system and its resulting machine language code, we cannot prove correctness. For example:

    print "hello\n";

This looks like a trivial-enough Perl program; it "obviously" should work. But what if the Perl interpreter is broken? In other words, a conclusion of "simple enough to easily prove" can only be made by analyzing low-level machine language code, and the means by which it is produced.

"Order Matters" because we need to ensure that the machine-language instructions resulting from a set of change actions will execute in the correct order, with the correct operands. Unless we can prove program correctness at this low level, we cannot prove the correctness of any program. It does no good to prove correctness of a higher-level program when we do not know the correctness of the lower runtime layers. If the high-level program can modify those underlying layers, then the behavior of the program can change with each modification. Ordering of those modifications appears to be important to our ability to predict the behavior of the high-level program. (Put simply, it is important to ensure that you can step off of the tree limb before you cut through it.)

***** 6 The Need for Testing *****

Just as we urge tool designers to think like kernel developers (5), we urge systems administrators to think like operating systems vendors -- because they are. Systems administration is actually systems modification; the administrator replaces binaries and alters configuration files, creating a combination which the operating system vendor has never tested. Since many of these modifications are specific to a single site or even a single machine, it is unreasonable to assume that the vendor has done the requisite testing. The systems administrator must perform the role of systems vendor, testing each unique combination -- before the users do.

Due to modern society's reliance on computers, it is unethical (and just plain bad business practice) for an operating system vendor to release untested operating systems without at least noting them as such.
Better system vendors undertake a rigorous and exhaustive series of unit, system, regression, application, stress, and performance testing on each build before release, knowing full well that no amount of testing is ever enough (8.9). They do this in their own labs; it would make little sense to plan to do this testing on customers' production machines.

And yet, IT shops today habitually have no dedicated testing environment for validating changed operating systems. They deploy changes directly to production without prior testing. Our own experience and informal surveys show that greater than 95% of shops still do business this way. It is no wonder that reliability, security, and high availability are still major issues in IT.

We urge systems administrators to create and use dedicated testing environments, not inflict changes on users without prior testing, and consider themselves the operating systems vendors that they really are. We urge IT management organizations to understand and support administrators in these efforts; the return on investment is in the form of lower labor costs and much higher user satisfaction. See section (8.42). Availability of a test environment enables the deployment of automated systems administration tools, bringing major cost savings. See (figure_4.3.2).

A test environment is useless until we have a means to replicate the changes we made in testing onto production machines. "Order matters" when we do this replication; an earlier change will often affect the outcome of a later change. This means that changes made to a test machine must later be "replayed" in the same order on the machine's production counterpart. See section (8.45).

Testing costs can be greatly reduced by limiting the number of unique builds produced; this holds true for both vendors and administrators. This calls for careful management of changes and host classes in an IT environment, with an intent of limiting proliferation of classes. See section (8.41).

Note that use of open-source operating systems does not remove the need for local testing of local modifications. In any reasonably complex infrastructure, there will always be local configuration and non-packaged binary modifications which the community cannot have previously exercised. We prefer open source; we do not expect it to relieve us of our responsibilities, though.

***** 7 Ordering HOWTO *****

Automated systems administration is very straightforward. There is only one way for a user-side administrative tool to change the contents of disk in a running UNIX machine -- the syscall interface. The task of automated administration is simply to make sure that each machine's kernel gets the right system calls, in the right order, to make it be the machine you want it to be.

**** 7.1 Describing Disk State ****

If there are N bits on a disk, then there are 2^N possible disk states. In order to maintain the baseline host description needed for congruent management, we need to have a way to describe any arbitrary disk state in a highly compressed way, preferably in a human-readable configuration file or script. For the purposes of this description, we neglect user data and log files -- we want to be able to describe the root-owned and administered portions of disk.

"Order Matters" whether creating or modifying a disk:

    A concise and reliable way to describe any arbitrary state of a disk
    is to describe the procedure for creating that state.

This procedure will include the initial state (bare-metal build) of the disk, followed by the steps used to change it over time, culminating in the desired state. This procedure must be in writing, preferably in machine-readable form. This entire set of information, for all hosts, constitutes the baseline description of a congruent infrastructure. Each change added to the procedure updates the baseline. See section (4.3), 'Congruence'.

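Written down in machine-readable form, such a procedure can be as plain as an append-only shell script per host class. This is a sketch only; the package names, versions, and server name below are hypothetical. The point is that the initial state plus an ordered list of steps fully describes the administered portion of the disk:

    #!/bin/sh -e
    # Build procedure sketch for one hypothetical host class.
    #
    # Initial state: bare-metal install from a Jumpstart/Kickstart
    # profile kept under revision control (not shown here).
    #
    # Ordered changes -- always append, never edit or reorder:
    apt-get -y install ntp=4.1.0-2                  # versions pinned
    echo "server ntp1.example.com" >> /etc/ntp.conf
    apt-get -y install openssh-server=3.4p1-1
    # ...the next change goes here...
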
There are tools which can help you maintain and execute this procedure. See section (7.3), 'Example_Tools_and_Techniques'.

While it is conceivable that this procedure could be a documented manual process, executing these steps manually is tedious, costly, and generally error-prone. (Though we know of many large mission-critical shops which try.) Manual execution of complex procedures is one of the best methods we know of for generating divergence (4.1).

The starting state (bare-metal install) description of the disk may take the form of a network install tool's configuration file, such as that used for Solaris Jumpstart or RedHat Kickstart. The starting state might instead be a bitstream representing the entire initial content of the disk (usually a snapshot taken right after install from vendor CD). The choice of which of these methods to use is usually dependent on the vendor-supplied install tool -- some will support either method, some require one or the other.

**** 7.2 How to Break an Enterprise ****

A systems administrator, whether a human or a piece of software (8.36), can easily break an enterprise infrastructure by executing the right actions in the wrong order. In this section, we will explore some of the ways this can happen.

*** 7.2.1 Right Commands, Wrong Order ***

First we will cover a trivial but devastating example that is easily avoided. This once happened to a colleague while doing manual operations on a machine. He wanted to clean out the contents of a directory which ordinarily had the development group's source code NFS-mounted over top of it. Here is what he wanted to do:

    umount /apps/src
    cd /apps/src
    rm -rf .
    mount /apps/src

Here's what he actually did:

    umount /apps/src
    ...umount fails, directory in use; while resolving
    this, his pager goes off, he handles the interrupt,
    then...
    cd /apps/src
    rm -rf .

Needless to say, there had also been no backup of the development source tree for quite some time...

In this example, "correct order" includes some concept of sufficient error handling. We show this example because it highlights the importance of a default behavior of "halt on error" for automatic systems administration tools. Not all tools halt on error by default; isconf does (7.3.1).

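The same protection is cheap to get even in ad-hoc shell work. As a minimal sketch, running the intended sequence under the shell's -e flag makes the failed umount abort the whole sequence instead of letting control fall through to the rm:

    #!/bin/sh -e
    # -e: halt on error; a failed step stops everything after it.
    umount /apps/src     # if this fails, nothing below ever runs
    cd /apps/src
    rm -rf .
    mount /apps/src
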
*** 7.2.2 Right Packages, Wrong Order ***

We in the UNIX community have long accused Windows developers of poor library management, due to the fact that various Windows applications often come bundled with differing versions of the same DLLs. It turns out that at least some UNIX and Linux distributions appear to suffer from the same problem.

Jeffrey D'Amelia and John Hart [hart] demonstrated this in the case of RedHat RPMs, both official and contributed. They showed that the order in which you install RPMs can matter, even when there are no applicable dependencies specified in the package. We don't assume that this situation is restricted to RPMs only -- any package management system is likely susceptible to this problem. An interesting study would be to investigate similar overlaps in vendor-supplied packages for commercial UNIX distributions.

Detecting this problem for any set of packages involves extensive analysis by talented persons. In the case of [hart], the authors developed a suite of global analysis tools, and repeatedly downloaded and unpacked thousands of RPMs. They still only saw "the tip of the iceberg" (their words). They intentionally ignored the actions of postinstall scripts, and they had not yet executed any packaged code to look for behavioral interactions.

Avoiding the problem is easier: install the packages, record the order of installation, test as usual, and when satisfied with testing, install the same packages in the same order on production machines.

While we've used packages in this example, we'd like to remind the reader that these considerations apply not only to package installation but to any other change that affects the root-owned portions of disk.

*** 7.2.3 Circular Dependencies ***

There is a "chicken and egg" or bootstrapping problem when updating either an automated systems administration tool (ASAT) or its underlying foundations (8.40). Order is important when changes the tool makes can change the ability of the tool to make changes.

For example, cfengine version 2 includes new directives available for use in configuration files. Before using a new configuration file, the new version of cfengine needs to be installed. The new client is named 'cfagent' rather than 'cfengine', so wrapper scripts and crontab entries will also need to be updated, and so on.

For fully automated operation on hundreds or thousands of machines, we would like to be able to upgrade cfengine under the control of cfengine (8.46). We want to ensure that the following actions will take place on all machines, including those currently down:

1. fetch new configuration file containing the following instructions
2. install new cfagent binary
3. run cfkey to generate key pair
4. fetch new configuration file containing version 2 directives
5. update calling scripts and crontab entries

There are several ordering considerations here. We won't know that we need the new cfagent binary until we do step 1. We shouldn't proceed with step 4 until we know that 2 and 3 were successful. If we do 5 too early, we may break cfengine's ability to operate at all. If we do step 4 too early and try to run the resulting configuration file using the old version of cfengine, it will fail.

While this example may seem straightforward, implementing it in a language which does not by default support deterministic ordering requires much use of conditionals, state chaining, or equivalent. If this is the case, then code flow will not be readily apparent, making inspection and edits error-prone.

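For contrast, here is what the five steps look like when the language itself guarantees written order and halt-on-error. This is only a sketch: the URLs, file names, and distribution mechanics are hypothetical, and a real deployment would run it from the existing tool's own update mechanism:

    #!/bin/sh -e
    # Ordered cfengine 1 -> 2 upgrade sketch. Under -e, any failed step
    # aborts the chain, leaving the version 1 infrastructure in place.
    wget -q -O /tmp/update.conf http://master.example.com/update.conf  # 1
    wget -q -O /tmp/cfengine2.deb http://master.example.com/cfengine2.deb
    dpkg -i /tmp/cfengine2.deb                                         # 2
    cfkey                                                              # 3
    wget -q -O /var/cfengine/inputs/cfagent.conf \
        http://master.example.com/cfagent.conf                         # 4
    cp /masters/cron.d-cfagent /etc/cron.d/cfagent                     # 5

Each step runs only if every earlier step succeeded; the ordering considerations above are satisfied by reading the code top to bottom.
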
Infrastructure automation code runs as root and has the ability to stop work across the entire enterprise; it needs to be simple, short, and easy for humans to read, like security-related code paths in tools such as PGP or ssh.

If the tool's language does not support "halt on error" by default, then it is easy to inadvertently allow later actions to take place when we would have preferred to abort. Going back to our cfengine example, if we can easily abort and leave the cfengine version 1 infrastructure in place, then we can still use version 1 to repair the damage.

*** 7.2.4 Other Sources of Breakage ***

There are many other examples we could show, some including multi-host "barrier" problems. These include:

* Updating ssh to openssh on hundreds of hosts and getting the authorized_keys and/or protocol version configuration out of order. This can greatly hinder further contact with the target hosts. Daniel Hagerty [hagerty] ran into this one; many of us have been bitten by this at some point.

* Reconfiguring network routes or interfaces while communicating with the target device via those same routes or interfaces. Ordering errors can prevent further contact with the target, and often require a physical visit to resolve. This is especially true if the target is a workstation with no remote serial console access. Again, most readers have had this happen to them.

**** 7.3 Example Tools and Techniques ****

While there are many automated systems administration tools (ASATs) available, the two we are most familiar with are cfengine and our own isconf [cfengine] [isconf]. In this section, we will look at these two tools from the perspective of Turing equivalence (8), with a focus on how each can be used deterministically.

In general, some of the techniques that seem to work well for the design and use of most ASATs include:

* Keep the "Turing tape" a finite size by holding the network content constant (8.23), or versioning it using CVS or another version control tool [cvs] [bootstrap]. This helps prevent some of the more insidious behaviors that are possible in self-modifying machines (8.40).

* Continuing in that vein, when using distributed package repositories such as the public Debian [debian] package server infrastructure, always specify version numbers when automating the installation of packages, rather than letting the package installation tool (in Debian's case apt-get) select the latest version. If you do not specify the package version, then you may introduce divergence (4.1). This risk varies, of course, depending on your choice of 'stable' or 'unstable' distribution, though we suspect it still applies in 'stable', especially when using the 'security' packages. It certainly applies in all cases when you need to maintain your own kernel or kernel modules rather than using the distributed packages.

We have experienced this repeatedly -- machines which built correctly the first time with a given package list will not rebuild with the same package list a few weeks later, due to package version changes on the public servers, and resulting unresolved incompatibilities with local conditions and configuration file contents. Remember, your hosts are unique in the world -- there are likely no others like them.
Package maintainers cannot be expected to test every configuration, especially yours. You must retain this responsibility. See section (6), 'The_Need_for_Testing'.

We use Debian in this example because it is a distribution we like a lot; note that other package distribution and installation infrastructures, such as the RedHat up2date system, also have this problem.

* Expect long dependency or sequence chains when building enterprise infrastructures. If an ASAT can easily support encapsulation and ordering of 10, 50, or even 100 complex atomic actions in a single chain, then it is likely capable of fully automated administration of machines, including package, kernel, build, and even rebuild management. If the ASAT is cumbersome to use when chains become only two or three actions deep, then it is likely most suited for configuration file management, not package, binary, or kernel manipulation.

*** 7.3.1 ISconf ***

As we mentioned in section (1), isconf originally began life as a quick hack. Its basic utility has proven itself repeatedly over the last 8 years, and as adoption has grown it is now managing more production infrastructures than we are personally aware of.

While we show some ISconf makefile examples here, we do not show any example of the top-level configuration file which drives the environment and targets for 'make'. It is this top-level configuration file, and the scripts which interpret it, which are the core of ISconf and enable the typing or classing of hosts. These top-level facilities are also what govern the actions ISconf is to take during boot versus cron or other execution contexts. More information and code is available at ISconf.org and Infrastructures.Org.

We also do not show here the network fetch and update portions of ISconf, and the way that it updates its own code and configuration files at the beginning of each run. This default behavior is something that we feel is important in the design of any automated systems administration tool. If the tool does not support it, end-users will have to figure out how to do it themselves, reducing the usability of the tool.

** 7.3.1.1 ISconf Version 2 **

Version 2 of ISconf was a late-90's rewrite to clean up and make portable the lessons learned from version 1. As in version 1, the code used was Bourne shell, and the state engine used was 'make'.

In (listing 1), we show a simplified example of version 2 usage. While examples related to this can be found in [hart] and in our own makefiles, real-world usage is usually much more complex than the example shown here. We've contrived this one for clarity of explanation.

In this contrived example, we install two packages which we have not proven orthogonal. We in fact do not wish to take the time to detect whether or not they are orthogonal, due to the considerations expressed in section (8.58). We may be tool users, rather than tool designers, and may not have the skillset to determine orthogonality, as in section (8.54).

These packages might both affect the same shared library, for instance. Again, according to [hart] and our own experience, it is not unusual for two packages such as these to list neither as prerequisites, so we might gain no ordering guidance from the package headers either.
In other words, all we know is that we installed package 'foo', tested and deployed it to production, and then later installed package 'bar', tested it and deployed. These installs may have been weeks or months apart. All went well throughout, users were happy, and we have no interest in unpacking and analyzing the contents of these packages for possible reordering for any reason; we've gone on to other problems.

Because we know this order works, we wish for these two packages, 'foo' and 'bar', to be installed in the same order on every future machine in this class. This makefile ensures that; the touch $@ command at the end of each stanza will prevent this stanza from being run again. The ISconf code always changes to the timestamps directory before starting 'make' (and takes other measures to constrain the normal behavior of 'make', so that we never try to "rebuild" this target either).

The class name in this case (listing 1) is 'Block12'. You can see that 'Block12' is also made up of many other packages; we don't show the makefile stanzas for these here. These packages are listed as prerequisites to 'Block12', in chronological order. Note that we only want to add items to the end of this list, not the middle, due to the considerations expressed in section (8.49).

In this example, even though we take advantage of the Debian package server infrastructure, we specify the version of each package that we want, as in the introduction to section (7.3). We also use a caching proxy when fetching Debian packages, in order to speed up our own builds and reduce the load on the Debian servers to a minimum.

Note that we get "halt-on-error" behavior from 'make', as we wished for in section (7.2.1). If any of the commands in the 'foo' or 'bar' sections exit with a non-zero return code, then 'make' aborts processing immediately. The 'touch' will not happen, and we normally configure the infrastructure such that the ISconf failure will be noticed by a monitoring tool and escalated for resolution. In practice, these failures very rarely occur in production; we see and fix them in test. Production failures, by the definition of congruence (4.3), usually indicate a systemic, security, or organizational problem; we don't want them fixed without human investigation.

Listing 1: ISconf makefile package ordering example.

    Block12: cvs ntp foo lynx wget serial_console bar sudo mirror_rootvg

    foo:
            apt-get -y install foo=0.17-9
            touch $@

    bar:
            apt-get -y install bar=1.0.2-1
            echo apple pear > /etc/bar.conf
            touch $@

    ...

** 7.3.1.2 ISconf Version 3 **

ISconf version 3 was a rewrite in Perl, by Luke Kanies. This version adds more "lessons learned", including more fine-grained control of actions as applied to target classes and hosts. There are more layers of abstraction between the administrator and the target machines; the tool uses various input files to generate intermediate and final file formats which eventually are fed to 'make'.

One feature in particular is of special interest for this paper. In ISconf version 2, the administrator still had the potential to inadvertently create unordered change by an innocent makefile edit. While it is possible to avoid this with foreknowledge of the problem, version 3 uses timestamps in an intermediate file to prevent it from being an issue.

The problem which version 3 fixes can be reproduced in version 2 as follows: Refer to (listing 1). If both 'foo' and 'bar' have been executed (installed) on production machines, and the administrator then adds 'baz' as a prerequisite to 'bar', this would qualify as "editing prior actions" and create the divergence described in (8.49).

ISconf version 3, rather than using a human-edited makefile, reads other input files which the administrator maintains, and generates intermediate and final files which include timestamps to detect the problem and correct the ordering.

** 7.3.1.3 ISconf Version 4 **

ISconf version 4, currently in prototype, represents a significant architectural change from versions 1 through 3. If the current feature plan is fully implemented, version 4 will enable cross-organizational collaboration for development and use of ordered change actions. A core requirement is decentralized development, storage, and distribution of changes. It will enable authentication and signing, encryption, and other security measures. We are likely to replace 'make' with our own state engine, continuing the migration begun in version 3. See ISconf.Org for the latest information.

** 7.3.1.4 Baseline Management **

In section (4.3), we discussed the concept of maintaining a fully descriptive baseline for congruent management. In (7.1), we discussed in general terms how this might be done. In this section, we will show how we do it in isconf.

First, we install the base disk image as in section (7.1), usually using vendor-supplied network installation tools. We discuss this process more in [bootstrap]. We might name this initial image 'Block00'. Then we use the process we mentioned in (7.3.1.1) to apply changes to the machine over the course of its life. Each change we add updates our concept of what is the 'baseline' for that class of host.

As we add changes, any new machine we build will need to run isconf longer on first boot, to add all of the accumulated changes to the Block00 image. After about forty minutes' worth of changes have built up on top of the initial image, it helps to be able to build one more host that way, set the hostname/IP to 'baseline', cut a disk image of it, and declare that new image to be the new baseline. This infrequent snapshot or checkpoint not only reduces the build time of future hosts, but reduces the rebuild time and chance of error in rebuilding existing hosts -- we always start new builds from the latest baseline image.

In an isconf makefile, this whole process is reflected as in (listing 2). Note that whether we cut a new image and start the next install from that, or if we just pull an old machine off the shelf with a Block00 image and plug it in, we'll still end up with a Block20 image with apache and a 2.2.12 kernel, due to the way the makefile prerequisites are chained.

This example shows a simple, linear build of successive identical hosts with no "branching" for different host classes. Classes add slightly more complexity to the makefile. They require a top-level configuration file to define the classes and target them to the right hosts, and they require wrapper script code to read the config file.

There is a little more complexity needed to deal with things that should happen only at boot, versus things that can happen when cron runs the code every hour or so.
There are examples of all of this in the isconf-2i package available from ISconf.Org.
Listing 2: Baseline Management in an ISconf Makefile

# 01 Feb 97 - Block00 is initial disk install from vendor cd,
# with ntp etc. added later
Block00: ntp cvs lynx ...

# 15 Jul 98 - got tired of waiting for additions to Block00 to build,
# cut new baseline image, later add ssh etc.
Block10: Block00 ssh ...

# 17 Jan 99 - new baseline again, later add apache, rebuild kernel, etc.
Block20: Block10 apache kernel-2.2.12 ...

*** 7.3.2 Cfengine ***
Cfengine is likely the most popular purpose-built tool for automated systems administration today. The cfengine language was optimized for dynamic prerequisite analysis rather than for long, deterministic, ordered sets of actions.
While the cfengine language wasn't specifically optimized for ordered behavior, it is possible to achieve this with extra work. It should be possible to greatly reduce the amount of effort involved by using some tool to generate cfengine configuration files from makefile-like (or equivalent) input files. One good starting point might be Tobias Oetiker's TemplateTree II [oetiker]. Automatic generation of cfengine configuration files appears to be a near-requirement if the tool is to be used to maintain congruent infrastructures; the class and action-type structures tend to get relatively complex rather quickly if congruent ordering, rather than convergence, is the goal.
Further gains might be made from other features of cfengine; we have made progress experimenting with various helper modules, for instance. Another technique that we have put to good use is to implement atomic changes as very small cfengine scripts, each equivalent to an ISconf makefile stanza, and then drive these scripts from within a deterministically ordered framework.
The cfengine version 2 language adds new features, such as the FileExists() evaluated class function, which may reduce the amount of code required. So far, based on our experience in trial attempts over the last few years, it appears that a cfengine configuration file that does the same job as an ISconf makefile would still need two to three times as many lines of code. We consider this an open and evolving effort, though -- check the cfengine.org and Infrastructures.Org websites for the latest information.
***** 8 Brown/Traugott Turing Equivalence *****
    If it should turn out that the basic logics of a machine designed for
    the numerical solution of differential equations coincide with the
    logics of a machine intended to make bills for a department store, I
    would regard this as the most amazing coincidence that I have ever
    encountered. -- Howard Aiken, founder of Harvard's Computer Science
    department and architect of the IBM/Harvard Mark I.
Turing equivalence in host management appears to be a new factor relative to the age of the computing industry. The downsizing of mainframe installations and the distribution of their tasks to midrange and desktop machines by the early 1990's exposed administrative challenges which have taken the better part of a decade for the systems administration community to understand, let alone deal with effectively.
Older computing machinery relied on dedicated hardware rather than software to perform many administrative tasks.
Operating systems were limited 891 in their ability to accept changes on the fly, often requiring recompilation 892 for tasks as simple as adding terminals or changing the time zone. Until 893 recently, the most popular consumer desktop operating system still required a 894 reboot when changing IP address. 895 In the interests of higher uptime, modern versions of UNIX and Linux have 896 eliminated most of these issues; there is very little software or configuration 897 management that cannot be done with the machine "live". We have evolved to a 898 model that is nearly equivalent to that of a Universal Turing Machine, with all 899 of its benefits and pitfalls. To avoid this equivalence, we would need to go 900 back to shutting operating systems down in order to administer them. Rather 901 than go back, we should seek ways to go further forward; understanding Turing 902 equivalence appears to be a good next step. 903 This situation may soon become more critical, with the emergence of "soft 904 hardware". These systems use Field-Programmable Gate Arrays to emulate 905 dedicated processor and peripheral hardware. Newer versions of these devices 906 can be reprogrammed, while running, under control of the software hosted on the 907 device itself [xilinx]. This will bring us the ability to modify, for instance, 908 our own CPU, using high-level automated administration tools. Imagine not only 909 accidentally unconfiguring your Ethernet interface, but deleting the circuitry 910 itself... 911 We have synthesized a thought experiment to demonstrate some of the 912 implications of Turing equivalence in host management, based on our 913 observations over the course of several years. The description we provide here 914 is not as rigorous as the underlying theories, and much of it should be 915 considered as still subject to proof. We do not consider ourselves theorists; 916 it was surprising to find ourselves in this territory. The theories cited here 917 provided inspiration for the thought experiment, but the goal is practical 918 management of UNIX and other machines. We welcome any and all future 919 exploration, pro or con. See section (9), 'Conclusion_and_Critique'. 920 In the following description of this thought experiment, we will develop a 921 model of system administration starting at the level of the Turing machine. We 922 will show how a modern self-administered machine is equivalent to a Turing 923 machine with several tapes, which is in turn equivalent to a single-tape Turing 924 machine. We will construct a Turing machine which is able to update its own 925 program by retrieving new instructions from a network-accessible tape. We will 926 develop the idea of configuration management for this simpler machine model, 927 and show how problems such as circular dependencies and uncertainty about 928 behavior arise naturally from the nature of computation. 929 We will discuss how this Turing machine relates to a modern general-purpose 930 computer running an automatic administration tool. We will introduce the 931 implications of the self-modifying code which this arrangement allows, and the 932 limitations of inspection and testing in understanding the behavior of this 933 machine. We will discuss how ordering of changes affects this behavior, and how 934 deterministically ordered changes can make its behavior more deterministic. 
We will expand beyond single machines into the realm of distributed computing and management of multiple machines, and their associated inspection and testing costs. We will discuss how ordering of changes affects these costs, and how ordered change apparently provides the lowest cost for managing an enterprise infrastructure.
Readers who are interested in applied rather than mathematical or theoretical arguments may want to review (7) or skip to section (9).
8.1 - A Turing machine (figure_8.1.1) reads bits from an infinite tape, interprets them as data according to a hardwired program, and rewrites portions of the tape based on what it finds. It continues this cycle until it reaches a completion state, at which time it halts [turing].
[images/turing.png]
Figure 8.1.1: Turing machine block diagram; the machine reads and writes an infinite tape and updates an internal state variable based on a hardwired or stored ruleset.
8.2 - Because a Turing machine's program is hardwired, it is common practice to say that the program describes or is the machine. A Turing machine's program is stated in a descriptive language which we will call the machine language. Using this language, we describe the actions the machine should take when certain conditions are discovered. We will call each atom of description an instruction. An example instruction might say:
    If the current machine state is 's3', and the tape cell at the
    machine's current head position contains the letter 'W', then change
    to state 's7', overwrite the 'W' with a 'P', and move the tape one
    cell to the right.
Each instruction is commonly represented as a quintuple; it contains the letter and current state to be matched, as well as the letter to be written, the tape movement command, and the new state. The instruction we described above would look like:
    s3,W ⇒ s7,P,r
Note that a Turing machine's language is in no way algorithmic; the order of quintuples in a program listing is unimportant, and there are no branching, conditional, or loop statements in a Turing machine program.
8.3 - The content of a Turing tape is expressed in a language that we will call the input language. A Turing machine's program is said to either accept or reject a given input language, if it halts at all. If our Turing machine halts in an accept state (which might actually be a state named 'accept'), then we know that our program is able to process the data and produce a valid result -- we have validated our input against our machine. If our Turing machine halts because there is no instruction that matches the current combination of state and cell content (8.2), then we know that our program is unable to process this input, so we reject. If we never halt, then we cannot state a result, so we cannot validate the input or the machine.
8.4 - A Universal Turing Machine (UTM) is able to emulate any arbitrary Turing machine. Think of this as running a Turing "virtual machine" (TVM) on top of a host UTM. A UTM's machine language program (8.2) is made up of instructions which are able to read and execute the TVM's machine language instructions. The TVM's machine language instructions are the UTM's input data, written on the input tape of the UTM alongside the TVM's own input data (figure_8.4.1).
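It may help to see the machine model of (8.1)-(8.2) as running code. The following Python sketch is our own illustration, not part of the formal model; the second quintuple is contrived so that the example halts in an accept state:

# (state, letter) -> (new state, letter to write, head move)
program = {
    ("s3", "W"): ("s7", "P", "r"),      # the quintuple from (8.2)
    ("s7", "_"): ("accept", "_", "r"),  # contrived: accept on the blank cell
}

def run(program, tape, state="s3", head=0, limit=1000):
    cells = dict(enumerate(tape))       # sparse stand-in for an infinite tape
    for _ in range(limit):
        key = (state, cells.get(head, "_"))
        if key not in program:          # no matching quintuple: halt
            return state                # 'accept' validates the input (8.3)
        state, letter, move = program[key]
        cells[head] = letter
        head += 1 if move == "r" else -1
    raise RuntimeError("step limit hit; no result, nothing validated (8.3)")

print(run(program, "W"))                # prints 'accept'

Storing the quintuples in an unordered table mirrors the point made in (8.2): the order of quintuples in a program listing is unimportant. Storing the program as ordinary data alongside the tape is exactly the arrangement that makes a UTM -- and, later, self-modification -- possible.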
Any multiple-tape Turing machine can be represented by a single-tape Turing machine, so it is equally valid to think of our Universal Turing Machine as having two tapes: one for the TVM program, and the other for the TVM data.
A Universal Turing Machine appears to be a useful model for analyzing the theoretical behavior of a "real" general-purpose computer; basic computability theory seems to indicate that a UTM can solve any problem that a general-purpose computer can solve [church].
[images/utmtape.png]
Figure 8.4.1: The tape of a Universal Turing Machine (UTM) stores the program and data of a hosted Turing Virtual Machine (TVM).
8.5 - Further work by John von Neumann and others demonstrated one way that machines could be built which were equivalent in ability to Universal Turing Machines, with the exception of the infinite tape size [vonneumann]. The von Neumann architecture is considered to be a foundation of modern general-purpose computers [godfrey].
8.6 - As in von Neumann's "stored program" architecture, the TVM program and data are both stored as rewritable bits on the UTM tape (8.4) (figure_8.4.1). This arrangement allows the TVM to change the machine language instructions which describe the TVM itself. If it does so, our TVM enjoys the advantages (and the pitfalls) of self-modifying code [nordin].
8.7 - There is no algorithm that a Turing machine can use to determine whether another specific Turing machine will halt for a given tape; this is known as the "halting problem". In other words, Turing machines can contain constructions which are difficult to validate. This is not to say that every machine contains such constructions, but that an arbitrary machine and tape chosen at random have some chance of containing one.
8.8 - Note that, since a Turing machine is an imaginary construct [turing], our own brain, a pencil, and a piece of paper are (theoretically) sufficient to work through the tape, producing a result if there is one. In other words, we can inspect the code and determine what it would do. There may be tools and algorithms we can use to assist us in this [laitenberger]. We are not guaranteed to reach a result, though -- in order for us to know that we have a valid machine and valid input, we must halt and reach an accept state. Inspection is generally considered to be a form of testing.
Inspection has a cost (which we will use later):
Cinspect
This cost includes the manual labor required to inspect the code, any machine time required for execution of inspection tools, and the manual labor to examine the tool results.
8.9 - There is no software testing algorithm that is guaranteed to ensure fully reliable program operation across all inputs -- there appears to be no theoretical foundation for one [hamlet]. We suspect that some of the reasons for this may be related to the halting problem (8.7), Gödel's incompleteness theorem [godel], and some classes of computational intractability, such as the Traveling Salesman problem and NP-completeness [greenlaw] [garey] [brookshear] [dewdney].
In practice, we can use multiple test runs to explore the input domain via a parameter study, equivalence partitioning [richardson], cyclomatic complexity analysis [mccabe], pseudo-random input, or other means.
Using any or all of these methods, we may be able to build a confidence level for the predictability of a given program. Note that we can never know when testing is complete, and that testing only proves incorrectness of a program, not correctness.
Testing cost includes the manual labor required to design the test, any machine time required for execution, and the manual labor needed to examine the test results:
Ctest
8.10 - For software testing to be meaningful, we must also ensure code coverage. Code coverage requirements are generally determined through some form of inspection (8.8), with or without the aid of tools. Coverage information is only valid for a fixed program -- even relatively minor code changes can affect code coverage information in unpredictable ways [elbaum]. We must repeat testing (8.9) for every variation of program code.
To ensure code coverage, testing includes the manual labor required to inspect the code, any machine time required for execution of the coverage tools and tests, and the manual labor needed to examine the test results. Because testing for coverage includes code inspection, we know that testing is more expensive than inspection alone:
Ctest > Cinspect
8.11 - Once we have found a UTM tape that produces the result we desire, we can make many copies of that tape and run them through many identical Universal Turing Machines simultaneously. This will produce many simultaneous, identical results. This is not very interesting -- what we really want to be able to do is hold the TVM program portion of the tape constant while changing the TVM data portion, then feed those differing tapes through identical machines. The latter arrangement can give us a form of distributed or parallel computing.
8.12 - Altering the tapes (8.11) presents a problem, though. We cannot know in advance whether these altered tapes will provide valid results, or even reach completion. We can exhaustively test the same program with a wide variety of sample inputs, validating each of these. This is fundamentally a time-consuming, pseudo-statistical process, due to the iterative validations normally required, and it is not a complete solution (8.9).
8.13 - If for some reason we needed to solve slightly different problems with the distributed machines in (8.11), we might decide to use slightly different programs in each machine, rather than add functionality to our original program. But using these unique programs would greatly worsen our testing problem. We would not only need to validate across our range of input data (8.9), but we would also need to repeat the process for each program variant (8.10). We know that testing many unique programs will be more expensive than testing one:
Cmany > Ctest
8.14 - It is easy to imagine a Turing machine that is connected to a network, and which is able to use the net to fetch data from tapes stored remotely, under program control. This is simply a case of a multiple-tape Turing machine, with one or more of the tapes at the other end of a network connection.
8.15 - Building on (8.14), imagine a Turing Virtual Machine (TVM) running on top of a networked Universal Turing Machine (UTM) (8.4).
In this case, we might have three tapes: one for the TVM program, one for the TVM data, and a third for the remote network tape. It is easy to imagine a sequence of TVM operations which involves fetching a small amount of data from the remote tape and storing it on the local program tape as additional and/or replacement TVM instructions (8.6). We will name the old TVM instruction set A. The set of fetched instructions we will name B, and the resulting merger of the two we will name AB. Note that some of the instructions in B may have replaced some of those in A (figure_8.15.1). Before the fetch, our TVM could be described (8.2) as an A machine; after the fetch we have an AB machine -- the TVM's basic functionality has changed. It is no longer the same machine.
[images/ab.png]
Figure 8.15.1: Instruction set B partially overlays instruction set A, creating set AB.
8.16 - Note that, if any of the instructions in set B replace any of those in set A (8.15), then the order of loading these sets is important. A TVM with the instruction set AB will be a different machine than one with set BA (figure 8.16.1).
[images/ba.png]
Figure 8.16.1: Instruction set BA is created by loading B before A; A partially overlays B this time.
8.17 - It is easy to imagine that the TVM in (8.15) could later execute an instruction from set B, which could in turn cause the machine to fetch another set of one or more instructions in a set we will call C, resulting in an ABC machine:
[images/abc.png]
Figure 8.17.1: If instructions from set AB load C, then ABC results.
8.18 - After each fetch described in section (8.17), the local program and data tapes will contain bits from (at least) three sources: the new instruction set just copied over the net, any old instructions still on tape, and the data still on tape from ongoing execution of all previous instructions.
8.19 - The choice of the next instruction set to be fetched from the remote tape in section (8.17) can be calculated by the currently available instructions on the local program tape, based on current tape content (8.18).
8.20 - The behavior of one or more new instructions fetched in (8.17) can (and usually will) be influenced by other content on the local tapes (8.18). With careful inspection and testing we can detect some of the ways content will affect instruction fetches, but due to the indeterminate results of software testing (8.9), we may never know whether we have found all of them.
8.21 - Let us go back to our three TVM instruction sets, A, B, and C (8.17). These were loaded over the net and executed using the procedure described in (8.19). Assume we start with blank local program and data tapes. Assume our UTM is hardwired to fetch set A if the local program tape is found to be blank. If we then run the TVM, A can collect data over the net and begin processing it. At some point later, A can cause set B to be loaded. Our local tapes will now contain the TVM data resulting from execution of A, and the new TVM machine instructions AB. If the TVM later loads C, our program tape will contain ABC.
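The order dependence described in (8.16) is easy to demonstrate. In the following Python sketch (our own contrived instruction sets, not drawn from any real machine), each set is a table of quintuples keyed by (state, letter); loading one set over another replaces any colliding entries, as in figure 8.15.1:

A = {("s0", "x"): ("s1", "y", "r"),
     ("s1", "y"): ("s2", "z", "r")}     # B will replace this quintuple
B = {("s1", "y"): ("s9", "q", "l")}     # collides with A on (s1, y)
C = {("s9", "q"): ("accept", "q", "r")}

def load(machine, fetched):
    # overlay: the set loaded later wins on any collision
    merged = dict(machine)
    merged.update(fetched)
    return merged

AB = load(A, B)
BA = load(B, A)                          # same sets, opposite load order
print(AB[("s1", "y")])                   # ('s9', 'q', 'l') -- B survived
print(BA[("s1", "y")])                   # ('s2', 'z', 'r') -- A overlaid B
ABC = load(AB, C)                        # a B instruction could trigger this fetch (8.17)

One colliding quintuple is enough: AB and BA are different machines, and any set loaded later (C here) inherits whichever machine it lands on.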
8.22 - If the networked UTM machine constructed in (8.21) always starts with the same (blank) local tape content, and the remote tape content does not change, then we can demonstrate that an A TVM will always evolve to an AB, then an ABC machine, before halting and producing a result.
8.23 - Assuming the network-resident data never changes, we can rebuild our networked UTM at any time and restore it to any prior state by clearing the local tapes, resetting the machine state, and restarting execution with the load of A (8.21). The machine will execute and produce the same intermediate and final results as it did before, as in section (8.22).
8.24 - If the network-resident data does change, though, we may not be able to rebuild to an identical state. For example, if someone were to alter the network-resident master copy of the B instruction set after we last fetched it, then it may no longer produce the same intermediate results and may no longer fetch C (8.19). We might instead halt at AB.
8.25 - Without careful (and possibly intractable) inspection (8.8), we cannot prove in advance whether a BCA or CAB machine can produce the same result as an ABC machine. It is possible that these, or other, variations might yield the same result. We can validate the result for a given input (8.3). We would also need to do iterative testing (8.12) to demonstrate that multiple inputs would produce the same result. Our cost of testing multiple or partially ordered sequences is greater than that required to test a single sequence:
Cpartial > Ctest
8.26 - If the behavior of any instruction from B in (8.22) is in any way dependent on other content found on tape (8.18) (8.19) (8.20), then we can expect our TVM to behave differently if we load B before loading A (8.16). We cannot be certain that a UTM loaded with only a B instruction set will accept the input language, or even halt, until after we validate it (8.3).
8.27 - We might want to roll back from the load or execution of a new instruction set. In order to do this, we would need to return the local program and data tapes to their previous content. For example, if machine A executes and loads B, our instruction set will now be AB. We might roll back by replacing our tape with the A copy.
8.28 - Due to (8.26), it is not safe to try to roll back the instruction set of machine AB to recreate machine A by simply removing the B instructions. Some of B may have replaced A. The AB machine, while executing, may have even loaded C already (8.21), in which case you won't end up with A, but with AC. If the AB machine executed for any period of time, it is likely that the input data language now on the data tape is only acceptable to an AB machine -- an A machine might reject it or fail to halt (8.3). The only safe rollback method seems to be something similar to (8.27).
8.29 - It is easy to imagine an automatic process which conducts a rollback. For example, in (8.27), machine AB itself might have the ability to clear its own tapes, reset the machine state, and restart execution at the beginning of A, as in section (8.23).
8.30 - But the system described in (8.29) will loop infinitely. Each time A executes, it will load B, then AB will execute and reset the local tapes again.
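A small Python sketch of this runaway behavior, using our own contrived rules in which running A always loads B, and the AB machine's "rollback" always clears the tapes and restarts at A:

def next_generation(machine):
    if machine == "A":
        return "AB"       # A executes and loads B (8.21)
    if machine == "AB":
        return "A"        # AB "rolls back": clear tapes, restart at A (8.29)

machine, generations = "A", 0
while generations < 10:   # stand-in for a human eventually breaking the loop
    machine = next_generation(machine)
    generations += 1
print(machine, generations)   # still cycling between A and AB; no halt, no result (8.3)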
In practice, a human might detect and break this loop; to represent this interaction, we would need to add a fourth tape, representing the user detection and input data.
8.31 - It is easy to imagine an automatic process which emulates a rollback while avoiding loops, without requiring the user input tape in (8.30). For example, instruction set C might contain the instructions from A that B overlaid. In other words, installing C will "roll back" B. Note that this is not a true rollback; we never return to a tape state that is completely identical to any previous state. Although this is an imperfect solution, it is the best we seem to be able to do without human intervention.
8.32 - The loop in section (8.30) will cause our UTM to never reach completion -- we will not halt, and cannot validate a result (8.3). A method such as (8.31) can prevent a rollback-induced loop, but is not a true rollback -- we never return to an earlier tape content. If these, or similar, methods are the only ones available to us, it appears that program-controlled tape changes must be monotonic -- we cannot go back to a previous tape content under program control, otherwise we loop.
    You are in a maze of twisty little passages, all alike. -- Will
    Crowther's "Adventure"
8.33 - Let us now look at a conventional application program, running as an ordinary user on a correctly configured UNIX host. This program can be loaded from disk into memory and executed. At no time is the program able to modify the "master" copy of itself on disk. An application program typically executes until it has output its results, at which time it either sleeps or halts. This application is equivalent to a fixed-program Turing machine (8.1) in the following ways: Both can be validated for a given input (8.3) to prove that they will produce results in a finite time and that those results are correct. Both can be tested over a range of inputs (8.9) to build confidence in their reliability. Neither can modify its own executable instructions; in the UNIX machine they are protected by filesystem permissions, and in the Turing machine they are hardwired. (We stipulate that there are some ways in which (8.33) and (8.1) are not equivalent -- a Turing machine has a theoretically infinite tape, for instance.)
8.34 - We can say that the application program in (8.33) is running on top of an application virtual machine (AVM). If the application is written in Java, for example, the AVM consists of the Java Virtual Machine. In Perl, the AVM is the Perl bytecode VM. For C programs, the AVM is the kernel system call interface. Low-level code in shared libraries used by a C program uses the same syscall interface to interact with the hardware -- shared libraries are part of the C AVM. A Perl program can load modules -- these become part of the program's AVM. A C or Perl program that uses the system() or exec() calls relies on any executables called -- these other executables, then, are part of the C or Perl program's AVM. Any executables called via exec() or system() may in turn require other executables, shared libraries, or other facilities. Many, if not most, of these components are dependent on one or more configuration files. These components all form an AVM chain of dependency for any given application.
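As a rough illustration, the AVM of (8.34) can be computed as the closure of a dependency relation. The components and edges below are invented for the example, not measured from any real host:

# hypothetical dependency edges: component -> components it relies on
DEPENDS = {
    "report.pl": ["perl"],
    "perl": ["libc.so", "Mail::Send module"],
    "Mail::Send module": ["/usr/bin/sendmail"],        # called at runtime
    "/usr/bin/sendmail": ["libc.so", "/etc/mail/sendmail.cf"],
    "libc.so": ["kernel syscall interface"],
}

def avm(component, seen=None):
    # recursively collect every component the application depends on
    seen = set() if seen is None else seen
    for dep in DEPENDS.get(component, []):
        if dep not in seen:
            seen.add(dep)
            avm(dep, seen)
    return seen

print(sorted(avm("report.pl")))   # every chain bottoms out at the kernel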
Regardless of the size or shape of this chain, all 1220 application programs on a UNIX machine ultimately interact with the hardware 1221 and the outside world via the kernel syscall interface. 1222 8.35 - When we perform system administration actions as root on a running UNIX 1223 machine, we can use tools found on the local disk to cause the machine to 1224 change portions of that same disk. Those changes can include executables, 1225 configuration files, and the kernel itself. Changes can include the system 1226 administration tools themselves, and changed components and configuration files 1227 can influence the fundamental behavior and viability of those same executables 1228 in unforeseen ways, as in section (8.10), as applied to changes in the AVM 1229 chain (8.34). 1230 8.36 - A self-administered UNIX host runs an automatic systems administration 1231 tool (ASAT) periodically and/or at boot. The ASAT is an application program 1232 (8.33), but it runs as root rather than an ordinary user. While executing, the 1233 ASAT is able to modify the "master" copy of itself on disk, as well as the 1234 kernel, shared libraries, filesystem layout, or any other portion of disk, as 1235 in section (8.35). 1236 8.37 - The ASAT described in section (8.36) is equivalent to a Turing Virtual 1237 Machine (8.4) in the ways described in section (8.33). In addition, a self- 1238 administered host running an ASAT is similar to a Universal Turing Machine in 1239 that the ASAT can modify its own program code (8.6). 1240 8.38 - A self-administered UNIX host connected to a network is equivalent to a 1241 network-connected Universal Turing Machine (8.14) in the following ways: The 1242 host's ASAT (8.36) can fetch and execute an arbitrary new program as in section 1243 (8.15). The fetched program can fetch and execute another as in (8.17). 1244 Intermediate results can control which program is fetched next, as in (8.19). 1245 The behavior of each fetched program can be influenced by the results of 1246 previous programs. 1247 8.39 - When we do administration via automated means (8.36), we rely on the 1248 executable portions of disk, controlled by their configuration files, to 1249 rewrite those same executables and configuration files (8.35). Like the 1250 Universal Turing Machine in (8.32), changes made under program control must be 1251 assumed to be monotonic; non-reversible short of "resetting the tape state" by 1252 reformatting the disk. 1253 8.40 - An ASAT (8.36) runs in the context of the host kernel and configuration 1254 files, and depends either directly or indirectly on other executables and 1255 shared libraries on the host's disk (8.26). 1256 The circular dependency of the ASAT AVM dependency tree (8.34) forces us to 1257 assume that, even though we may not ever change the ASAT code itself, we can 1258 unintentionally change its behavior if we change other components of the 1259 operating system. This is similar to the indeterminacy described in (8.20). 1260 It is not enough for an ASAT designer to statically link the ASAT binary and 1261 carefully design it for minimum dependencies. Other executables, their shared 1262 libraries, scripts, and configuration files might be required by ASAT 1263 configuration files written by a system administrator -- the tool's end user. 1264 When designing tools we cannot know whether the system administrator is aware 1265 of the AVM dependency tree (we certainly can't expect them to have read this 1266 paper). 
We must assume that there will be circular dependencies, and we must 1267 assume that the tool designer will never know what these dependencies are. The 1268 tool must support some means of dealing with them by default. We've found over 1269 the last several years that a default paradigm of deterministic ordering will 1270 do this. 1271 8.41 - We cannot always keep all hosts identical; a more practical method, for 1272 instance, is to set up classes of machines, such as "workstation" and "mail 1273 server", and keep the code within a class identical. This reduces the amount of 1274 coverage testing required (8.10). This testing is similar to that described in 1275 section (8.13). 1276 8.42 - The question of whether a particular piece of software is of sufficient 1277 quality for the job remains intractable (8.9). 1278 But in practice, in a mission-critical environment, we still want to try to 1279 find most defects before our users do. The only accurate way to do this is to 1280 duplicate both program and input data, and validate the combination (8.3). In 1281 order for this validation to be useful, the input data would need to be an 1282 exact copy of real-world, production data, as would the program code. Since we 1283 want to be able to not only validate known real-world inputs but also test some 1284 possible future inputs (8.9), we expect to modify and disrupt the data itself. 1285 We cannot do this in production. Application developers and QA engineers tend 1286 to use test environments to do this work. It appears to us that systems 1287 administrators should have the same sort of test facilities available for 1288 testing infrastructure changes, and should make good use of them. 1289 8.43 - Because the ASAT (8.36) is itself a complex, critical application 1290 program, it needs to be tested using the procedure in (8.42). Because the ASAT 1291 can affect the operation of the UNIX kernel and all subsidiary processes, this 1292 testing usually will conflict with ordinary application testing. Because the 1293 ASAT needs to be tested against every class of host (8.41) to be used in 1294 production, this usually requires a different mix of hosts than that required 1295 for testing an ordinary application. 1296 8.44 - The considerations in section (8.43) dictate a need for an 1297 infrastructure test environment for testing automated systems administration 1298 tools and techniques. This environment needs to be separate from production, 1299 and needs to be as identical as possible in terms of user data and host class 1300 mix. 1301 8.45 - Changes made to hosts in the test environment (8.44), once tested 1302 (8.12), need to be transferred to their production counterpart hosts. When 1303 doing so, the ordering precautions in section (8.26) need to be observed. Over 1304 the last several years, we have found that if you observe these precautions, 1305 then you will see the benefits of repeatable results as shown in (8.22). In 1306 other words, if you always make the same changes first in test, then 1307 production, and you always make those changes in the same order on each host, 1308 then changes that worked in test will work in production. 1309 8.46 - Because an ASAT (8.36) installed on many machines must be able to be 1310 updated without manual intervention, it is our standard practice to always have 1311 the tool update itself as well as its own configuration files and scripts. 
This 1312 allows the entire system state to progress through deterministic and repeatable 1313 phases, with the tool, its configuration files, and other possibly dependent 1314 components kept in sync with each other. 1315 By having the ASAT update itself, we know that we are purposely adding another 1316 circular dependency beyond that mentioned in section (8.40). This adds to the 1317 urgency of the need for ordering constraints such as (8.45). 1318 We suspect control loop theory applies here; this circular dependency creates a 1319 potential feedback loop. We need to "break the loop" and prevent runaway 1320 behavior such as oscillation (replacing the same file over and over) or loop 1321 lockup (breaking the tool so that it cannot do anything anymore). 1322 Deterministically ordered changes seem to do the trick, acting as an effective 1323 damper. 1324 We stipulate that this is not standard practice for all ASAT users. But all 1325 tools must be updated at some point; there are always new features or bug fixes 1326 which need to be addressed. If the tool cannot support a clean and predictable 1327 update of its own code, then these very critical updates must be done "out of 1328 band". This defeats the purpose of using an ASAT, and ruins any chance of 1329 reproducible change in an enterprise infrastructure. 1330 8.47 - Due to (8.45), if we allow the order of changes to be A, B, C on some 1331 hosts, and A, C, B on others, then we must test both versions of the resulting 1332 hosts (8.13). We have inadvertently created two host classes (8.41); due to the 1333 risk of unforeseen interactions we must also test both versions of hosts for 1334 all future changes as well, regardless of ordering of those future changes. The 1335 hosts have diverged (4.1). 1336 8.48 - It is tempting to ask "Why don't we just test changes in production, and 1337 rollback if they don't work?" This does not work unless you are able to take 1338 the time to restore from tape, as in section (8.27). There's also the user data 1339 to consider -- if a change has been applied to a production machine, and the 1340 machine has run for any length of time, then the data may no longer be 1341 compatible with the earlier version of code (8.28). When using an ASAT in 1342 particular, it appears that changes should be assumed to be monotonic (8.39). 1343 8.49 - It appears that editing, removing, or otherwise altering the master 1344 description of prior changes (8.24) is harmful if those changes have already 1345 been deployed to production machines. Editing previously-deployed changes is 1346 one cause of divergence (4.1). A better method is to always "roll forward" by 1347 adding new corrective changes, as in section (8.31). 1348 8.50 - It is extremely tempting to try to create a declarative or descriptive 1349 language L that is able to overcome the ordering restrictions in (8.45) and 1350 (8.49). The appeal of this is obvious: "Here are the results I want, go make it 1351 so." 1352 A tool that supports this language would work by sampling subsets of disk 1353 content, similar to the way our Turing machine samples individual tape cells 1354 (8.1). The tool would read some instruction set P, which was written in L by 1355 the sysadmin. While sampling disk content, the tool would keep track of some 1356 internal state S, similar to our Turing machine's state (8.2). 
Upon discovering a state and disk sample that matched one of the instructions in P, the tool could then change state, rewrite some part of the disk, and look at some other part of the disk for something else to do. Assuming a constant instruction set P, and a fixed virtual machine in which to interpret P, this would provide repeatable, validatable results (8.3).
8.51 - Since the tool in section (8.50) is an ASAT (8.36), influenced by the AVM dependency tree (8.34), it is equivalent to a Turing Virtual Machine as in (8.37). This means that it is subject to the ordering constraints of (8.45). If the host is networked, then the behavior shown in (8.15) through (8.20) will be evident.
8.52 - Due to (8.51), there appears to be no language, declarative or imperative, that is able to fully describe the desired content of the root-owned, managed portions of a disk while neglecting ordering and history. This is not a language problem: the behavior of the language interpreter or AVM (8.34) itself is subject to current disk content in unforeseen ways (8.35).
We stipulate that disk content can be completely described in any language by simply stating the complete contents of the disk. This is still a case of ordering -- a case in which there is only one change to be made. Cloning, discussed in section (3), is an applied example of this case. This class of change seems to be free of the circular dependencies of an AVM; the new disk image is usually applied when running from an NFS or ramdisk root partition, not while modifying a live machine.
8.53 - A tool constructed as in section (8.50) is useful for a very well-defined purpose: when hosts have diverged (8.47) beyond any ability to keep track of what changes have already been made. At this point, you have two choices: rebuild the hosts from scratch, using a tool that tracks lifetime ordering; or use a convergence tool to gain some control over them.
8.54 - It is tempting to ask "Does every change really need to be strictly sequenced? Aren't some changes orthogonal?" By orthogonal we mean that the subsystems affected by the changes are fully independent, non-overlapping, cause no conflict, and have no interaction with each other, and therefore are not subject to ordering concerns.
While it is true that some changes will always be orthogonal, we cannot easily prove orthogonality in advance. It might appear that some changes are "obviously unrelated" and therefore not subject to sequencing issues. The problem is, who decides? We stipulate that talent and experience are useful here, and for good reason: it turns out that orthogonality decisions are subject to the same pitfalls as software testing.
For example, inspection (8.8) and testing (8.9) can help detect changes which are not orthogonal. Code coverage information (8.10) can be used to ensure the validity of the testing itself. But in the end, none of these provide assurance that any two changes are orthogonal, and as with other testing, we cannot know when we have tested or inspected for orthogonality enough.
Due to this lack of assurance, the cost of predicting orthogonality needs to accrue the potential cost of any errors that result from a faulty prediction. This error cost includes lost revenue, labor required for recovery, and loss of goodwill.
We may be able to reduce this error cost, but it cannot be zero -- a 1404 zero cost implies that we never make mistakes when analyzing orthogonality. 1405 Because the cost of prediction includes this error cost as well as the cost of 1406 testing, we know that prediction of orthogonality is more expensive than either 1407 the testing or error cost alone: 1408 Cpredict > Cerror 1409 Cpredict > Ctest 1410 8.55 - As a crude negative proof, let us take a look at what would happen if we 1411 were to allow the order of changes to be totally unsequenced on a production 1412 host. First, if we were to do this, it is apparent that some sequences would 1413 not work at all, and probably damage the host (8.26). We would need to have a 1414 way of preventing them from executing, probably by using some sort of exclusion 1415 list. In order to discover the full list of bad sequences, we would need to 1416 test and/or inspect each possible sequence. 1417 This is an intractable problem: the number of possible orderings of M changes 1418 is M!. If each build/test cycle takes an hour, then any number of changes 1419 beyond 7 or 8 becomes impractical -- testing all combinations of 8 changes 1420 would require 4.6 years. In practice, we see change sets much larger than this; 1421 the ISconf version 2i makefile for building HACMP clusters, for instance, has 1422 sequences as long as 121 operations -- that's 121!/24/365, or 9.24*10^196 1423 years. It is easier to avoid unsequenced changes. 1424 The cost of testing and inspection required to enable randomized sequencing 1425 appears to be greater than the cost of testing a subset of all sequences 1426 (8.25), and greater than the testing, inspection, and accrued error of 1427 predicting orthogonality (8.54): 1428 Crandom > Cpredict > Cpartial 1429 8.56 - As a self-administering machine changes its disk contents, it may change 1430 its ability to change its disk contents. A change directive that works now may 1431 not work in the same way on the same machine in the future and vice versa 1432 (8.26). There appears to be a need to constrain the order of change directives 1433 in order to obtain predictable behavior. 1434 8.57 - In contrast to (8.52), a language that supports execution of an ordered 1435 set of changes appears to satisfy (8.56), and appears to have the ability to 1436 fully describe any arbitrary disk content, as in (7.1). 1437 8.58 - In practice, sysadmins tend to make changes to UNIX hosts as they 1438 discover the need for them; in response to user request, security concern, or 1439 bug fix. If the goal is minimum work for maximum reliability, then it would 1440 appear that the "ideal" sequence is the one which is first known to work -- the 1441 sequence in which the changes were created and tested. This sequence carries 1442 the least testing cost. It carries a lower risk than a sequence which has been 1443 partially tested or not tested at all. 1444 The costs in sections (8.8), (8.9), (8.25), (8.54), and (8.55) are related to 1445 each other as shown in (figure_8.58.1). This leads us to these conclusions: 1446 * Validating, inspecting, testing, and deploying a single sequence (Ctest) 1447 appears to be the least-cost host change management technique. 1448 * Adequate testing of partially-ordered sequences (Cpartial) is more 1449 expensive. 1450 * Predicting orthogonality between partial sequences (Cpredict) is yet more 1451 expensive. 
* The testing required to enable random change sequences (Crandom) is more expensive than any other testing, due to the M! combinatorial explosion involved.
[images/costs.png]
Figure 8.58.1: Relationship between costs of various ordering techniques; larger set size means higher cost.
8.59 - The behavioral attributes of a complex host seem to be effectively infinite over all possible inputs, and therefore difficult to fully quantify (8.9). The disk size is finite, so we can completely describe hosts in terms of disk content (7.1), but we cannot completely describe hosts in terms of behavior. We can easily test all disk content, but we do not seem to be able to test all possible behavior.
This point has important implications for the design of management tools -- behavior seems to be a peripheral issue, while disk content seems to play a more central role. It would seem that tools which test only for behavior will always be convergent at best. Tools which test for disk content have the potential to be congruent, but only if they are able to describe the entire disk state. One way to describe the entire disk is to support an initial disk state description followed by ordered changes, as in (7.1).
8.60 - There appears to be a general statement we can make about software systems that run "on top of" others in a "virtual machine" or other software-constructed execution environment (8.34):
    If any virtual machine instruction has the ability to alter the
    virtual machine instruction set, then different instruction execution
    orders can produce different instruction sets. Order of execution of
    these instructions is critical in determining the future instruction
    set of the machine. Faulty order has the potential to remove the
    machine's ability to update its instruction set, or to function at
    all.
This applies to any application, automatic administration tool (8.37), or shared library code executed as root on a UNIX machine (it also applies to other cases on other operating systems). These all interact with hardware and the outside world via the operating system kernel, and have the ability to change that same kernel as well as higher-level elements of their "virtual machine". This statement appears to be independent of the language of the virtual machine instruction set (8.52).
***** 9 Conclusion and Critique *****
One interesting result of automated systems administration efforts might be that, like the term 'computer', the term 'system administrator' may someday evolve to mean a piece of technology rather than a chained human.
Sometime in the last few years, we began to suspect that deterministic ordering of host changes may be the airfoil of automated systems administration. Many other tool designers make use of algorithms that specifically avoid any ordering constraint; we accepted ordering as an axiom.
With this constraint in place, we built and maintained thousands of hosts, in many mission-critical production infrastructures worldwide, with excellent results.
These results included high reliability and security, low cost of ownership, rapid deployments and changes, easy turnover, and excellent longevity -- after several years, some of our first infrastructures are still running and are actively maintained by people we've never met, still using the same toolset. Our attempts to duplicate these results while neglecting ordering have not met these standards as well as we would like.
In this paper, our first attempt at explaining a theoretical reason why these results might be expected, we have not "proven" the connection between ordering and theory in any mathematical sense. We have, however, been able to provide a thought experiment which we hope will help guide future research. Based on this thought experiment, it seems that more in-depth theoretical models may be able to support our practical results.
This work seems to imply that, if hosts are Turing equivalent (with the possible exception of tape size) and if an automated administration tool is Turing equivalent in its use of language, then there may be certain self-referential behaviors which we might want to either avoid or plan for. This in turn would imply that either the order of changes is important, or the host or method of administration needs to be constrained to less than Turing equivalence in order to make order unimportant. The validity of this claim is still an open question. In our deployments we have decided to err on the side of ordering.
On tape size: one addition to our "thought experiment" might be a stipulation that a network-connected host may in fact be fully equivalent to a Universal Turing Machine, including infinite tape size, if the network is the Internet. This is possibly true, due to the fact that the host's own network interface card will always have a lower bandwidth than the growth rate of the Internet itself -- the host cannot ever reach "the end of the tape". We have not explored the implications or validity of this claim. If true, this claim may be especially interesting in light of the recent trend of package management tools which are able to self-select, download, and install packages from arbitrary servers elsewhere on the Internet.
Synthesizing a theoretical basis for why "order matters" has turned out to be surprisingly difficult. The concepts involve the circular dependency chain mentioned in section (5), the dependency trees which conventional package management schemes support, and the interactions between these and more granular changes, such as patches and configuration file edits. Space and accessibility concerns precluded us from providing rigorous proofs for the points made in section (8). Rather than do so, we have tried to express these points as hypotheses, and have provided pointers to some of the foundation theories that we believe to be relevant. We encourage others to attempt to refute or support these points.
One issue we have not adequately covered is the fact that changing the order of actions can not only break machines; the actions themselves may also fail to complete. Altering order often calls for altering the content of the actions themselves if success is to be assured.
There may be useful vulnerabilities or benefits hidden in the structure of section (8).
Even after the many months we have spent poring over it, it is 1545 still certainly more complex than it needs to be, with many intertwined threads 1546 and long chains of assumptions (figure_9.1). One reason for this complexity was 1547 our desire to avoid forward references within that section; we didn't want to 1548 inadvertently base any point on circular logic. A much more readable text could 1549 likely be produced by reworking these threads into a single linear order, 1550 though that would likely require adding the forward references back in. 1551 For further theoretical study, we recommend: 1552 * Gödel Numbers 1553 * Gödel's Incompleteness Theorem 1554 * Chomsky's Hierarchy 1555 * Diagonalization 1556 * The halting problem 1557 * NP completeness and the Traveling Salesman Problem 1558 * Theory of ordered sets 1559 * Closed-loop control theory 1560 Starting points for most of these can be found in [greenlaw] [garey] 1561 [brookshear] [dewdney]. 1562 [images/sref-small.gif] 1563 Figure 9.1: Thread structure of section (8) 1564 ***** 10 Acknowledgments ***** 1565 We'd like to thank all souls who strive to better your organizations' computing 1566 infrastructures, often against active opposition by your own management. You 1567 know that your efforts are not likely to be understood by your own CIO. You do 1568 this for the good of the organization and the global economy; you do this in 1569 order to improve the quality of life of your constituents, often at the cost of 1570 your own health; you do this because you know it is the right thing to do. In 1571 this year of security-related tragedies and corporate accounting scandals, you 1572 know that if the popular media recognized what's going on in our IT departments 1573 there'd be hell to pay. But you know they won't, not for many years, if ever. 1574 Still you try to clean up the mess, alone. You are all heroes. 1575 The debate that was the genesis of this paper began in Mark Burgess' cfengine 1576 workshop, LISA 2001. 1577 Alva Couch provided an invaluable sounding board for the theoretical 1578 foundations of this paper. Paul Anderson endured the intermediate drafts, 1579 providing valuable constructive criticism. Paul's wife, Jessie, confirmed 1580 portability of these principles to other operating systems and provided early 1581 encouragement. Jon Stearley provided excellent last-minute review guidance. 1582 Joel Huddleston responded to our recall with his usual deep interest in any 1583 brain-exploding problem, the messier the better. 1584 The members of the infrastructures list have earned our respect as a group of 1585 very smart, very capable individuals. Their reactions to drafts were as good as 1586 rocket fuel. In addition to those mentioned elsewhere, notable mention goes to 1587 Ryan Nowakowski and Kevin Counts, for their last-minute readthrough of final 1588 drafts. 1589 Steve's wife, Joyce Cao Traugott, made this paper possible. Her sense of 1590 wonder, analytical interest in solving the problem, and unconditional love let 1591 Steve stay immersed far longer than any of us suspected would be necessary. 1592 Thank You, Joyce. 1593 ***** 11 About the Authors ***** 1594 Steve Traugott is a consulting Infrastructure Architect, and publishes tools 1595 and techniques for automated systems administration. His firm, TerraLuna LLC, 1596 is a specialty consulting organization that focuses on enterprise 1597 infrastructure architecture. 
His deployments have ranged from New York trading floors, IBM mainframe UNIX labs, and NASA supercomputers to web farms and growing startups. He can be reached via the Infrastructures.Org, TerraLuna.Com, or stevegt.com web sites.
Lance Brown taught himself Applesoft BASIC in 9th grade by pestering the 11th graders taking Computer Science so much that their teacher gave him a complete copy of all the handouts she used for the entire semester. Three weeks later he asked for more. He graduated college with a BA in Computer Science, attended graduate school, and began a career as a software developer and then systems administrator. He has been the lead Unix sysadmin for central servers at the National Institute of Environmental Health Sciences in Research Triangle Park, North Carolina for the last six years.
***** 12 References *****
[bootstrap] Bootstrapping an Infrastructure, Steve Traugott and Joel Huddleston, Proceedings of the 12th Systems Administration Conference (LISA XII) (USENIX Association: Berkeley, CA), pp. 181, 1998
[brookshear] Computer Science: An Overview (a very accessible text), J. Glenn Brookshear, Addison Wesley, 2000, ISBN 0-201-35747-X
[centerrun] CenterRun Application Management System, http://www.centerrun.com
[cfengine] Cfengine, a configuration engine, http://www.cfengine.org/
[church] Review of Turing 1936, A. Church, Journal of Symbolic Logic, 2, pp. 42-43, 1937
[couch] The Maelstrom: Network Service Debugging via "Ineffective Procedures", Alva Couch and N. Daniels, Proceedings of the Fifteenth Systems Administration Conference (LISA XV) (USENIX Association: Berkeley, CA), pp. 63, 2001
[cvs] Concurrent Versions System, http://www.cvshome.org
[cvsup] CVSup Versioned Software Distribution package, http://www.openbsd.org/cvsup.html
[debian] Debian Linux, http://www.debian.org
[dewdney] The (New) Turing Omnibus -- 66 Excursions in Computer Science, A. K. Dewdney, W. H. Freeman and Company, 1993
[eika-sandnes] Scheduling Partially Ordered Events in a Randomized Framework -- Empirical Results and Implications for Automatic Configuration Management, Frode Eika Sandnes, Proceedings of the Fifteenth Systems Administration Conference (LISA XV) (USENIX Association: Berkeley, CA), 2001
[elbaum] The Impact of Software Evolution on Code Coverage Information, Sebastian G. Elbaum, David Gable, Gregg Rothermel, International Conference on Software Engineering, pp. 170-179, 2001
[garey] Computers and Intractability: A Guide to the Theory of NP-Completeness, Michael R. Garey, David S. Johnson, W. H. Freeman and Company, 2002, ISBN 0-7167-1045-5
[godel] Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, Kurt Gödel, Monatshefte für Mathematik und Physik, 38:173--198, 1931
[godfrey] The Computer as von Neumann Planned It, M.D.
Godfrey, D.F Hendry, 1642 IEEE Annals of the History of Computing, Vol 15, No 1, 1993 1643 [greenlaw] Fundamentals of the Theory of Computation, (includes examples in C 1644 and UNIX shell, detailed references to seminal works, Raymond Greenlaw, H James 1645 Hoover, Morgan Kaufmann, 1998, ISBN 1-55860-474-X 1646 [hagerty] Daniel Hagerty, hag@ai.mit.edu, 2002, personal correspondence 1647 [hamlet] Foundations of Software Testing: Dependability Theory, Dick Hamlet, 1648 Software Engineering Notes v 19, No.5, Proceedings of the Second ACM SIGSOFT 1649 Symposium on Foundations of Software Engineering, pp. 128-139, 1994 1650 [hart] An Analysis of RPM Validation Drift, John Hart and Jeffrey D'Amelia, 1651 Proceedings of the 16th Systems Administration Conference (USENIX Association: 1652 Berkeley, CA), 2002 1653 [immunology] Computer immunology, M. Burgess, Proceedings of the Twelth Systems 1654 Administration Conference (LISA XII) (USENIX Association: Berkeley, CA), pp. 1655 283, 1998 1656 [isconf] ISconf, Infrastructure configuration manager, http://www.isconf.org 1657 and http://www.infrastructures.org 1658 [jiang] Basic Notions in Computational Complexity, Tao Jiang, Ming Li, Bala 1659 Ravikumar, Algorithms and Theory of Computation Handbook p. 24-1, CRC Press, 1660 1999, ISBN 0-8493-2649-4 1661 [laitenberger] An encompassing life cycle centric survey of software 1662 inspection, Oliver Laitenberger and Jean-Marc DeBaud, The Journal of Systems 1663 and Software, vol 50, num 1, pp. 5--31, 2000 1664 [lcfg] LCFG: A large scale UNIX configuration system, http://www.lcfg.org 1665 [lisa] Large Installation Systems Administration Conference, USENIX 1666 Association, Berkeley, CA, http://www.usenix.org 1667 [mccabe] Software Complexity, McCabe, Thomas J. & Watson, Arthur H, Crosstalk, 1668 Journal of Defense Software Engineering 7, 12 (December 1994): 5-9. 1669 [nordin] Evolving Turing-Complete Programs for a Register Machine with Self- 1670 modifying Code, Peter Nordin and Wolfgang Banzhaf, Genetic Algorithms: 1671 Proceedings of the Sixth International Conference (ICGA95), Morgan Kaufmann, L. 1672 Eshelman, pp. 318--325, 1995, 15-19, ISBN 1-55860-370-0 1673 [oetiker] Template Tree II: The Post-Installation Setup Tool, T. Oetiker, 1674 Proceedings of the Fifteenth Systems Administration Conference (LISA XV) 1675 (USENIX Association: Berkeley, CA), pp. 179, 2001 1676 [opsware] Opsware Management System, http://www.opsware.com 1677 [pikt] PIKT: "Problem Informant/Killer Tool", http://www.pikt.org 1678 [rdist] Overhauling Rdist for the '90s, M.A. Cooper, Proceedings of the Sixth 1679 Systems Administration Conference (LISA VI) (USENIX Association: Berkeley, CA), 1680 pp. 175, 1992 1681 [richardson] Partition analysis: a method combining testing and verification, 1682 D. J. Richardson and L. A. Clarke, IEEE Trans. Soft. Eng., 11(12):1477--1490, 1683 1985 1684 [rsync] rsync incremental file transfer utility, http://samba.anu.edu.au/rsync 1685 [ssh] SSH protocol suite of network connectivity tools, http://www.openssh.org 1686 [sup] The SUP Software Upgrade Protocol, Steven Shafer and Mary Thompson, 1989 1687 [tivoli] Tivoli Management Framework, http://www.tivoli.com 1688 [turing] On Computable Numbers, with an Application to the 1689 Entscheidungsproblem, Alan M. Turing, Proceedings of the London Mathematical 1690 Society, Series 2, 42 (1936-37), pp.230-265. 
[vonneumann] First Draft of a Report on the EDVAC, John von Neumann, IEEE
Annals of the History of Computing, Vol 15, No 4, 1993
[xilinx] Xilinx Virtex-II Platform FPGA, http://www.xilinx.com