% The TRAP manual: How to validate MF --- last updated by D E Knuth on 4 Dec 89
\font\eighttt= cmtt8
\font\eightrm= cmr8
\font\titlefont=cmssdc10 at 40pt
\let\mc=\eightrm
\font\logo=manfnt % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\rm
\let\mainfont=\tenrm

\def\.#1{\hbox{\tt#1}}
\def\\#1{\hbox{\it#1\/\hskip.05em}} % italic type for identifiers

\parskip 2pt plus 1pt
\baselineskip 12pt plus .25pt

\def\verbatim#1{\begingroup \frenchspacing
\def\do##1{\catcode`##1=12 } \dospecials
\parskip 0pt \parindent 0pt
\catcode`\ =\active \catcode`\^^M=\active
\tt \def\par{\ \endgraf} \obeylines \obeyspaces
\input #1 \endgroup}
% a blank line will be typeset at the end of the file;
% if you're unlucky it will appear on a page by itself!
{\obeyspaces\global\let =\ }

\output{\shipout\box255\global\advance\pageno by 1} % for the title page only
\null
\vfill
\centerline{\titlefont A Torture Test}
\vskip8pt
\centerline{\titlefont for \logo ()*+,-.*}
\vskip 24pt
\centerline{by Donald E. Knuth}
\centerline{Stanford University}
\vskip 6pt
\centerline{({\sl Version 2, January 1990\/})}
\vfill
\centerline{\vbox{\hsize 4in
\noindent Programs that claim to be implementations of \MF84 are
supposed to be able to process the test routine contained in this
report, producing the outputs contained in this report.}}
\vskip 24pt
{\baselineskip 9pt
\eightrm\noindent
The preparation of this report was supported in part by the National Science
Foundation under grants IST-8201926 and MCS-8300984,
and by the System Development Foundation.
{\logo opqrstuq} is a trademark of Addison-Wesley Publishing Company.


}\pageno=0\eject

\output{\shipout\vbox{ % for subsequent pages
\baselineskip0pt\lineskip0pt
\hbox to\hsize{\strut
\ifodd\pageno \hfil\eightrm\firstmark\hfil
\mainfont\the\pageno
\else\mainfont\the\pageno\hfil
\eightrm\firstmark\hfil\fi}
\vskip 10pt
\box255}
\global\advance\pageno by 1}
\let\runninghead=\mark
\outer\def\section#1.{\noindent{\bf#1.}\quad
\runninghead{\uppercase{#1} }\ignorespaces}

\section Introduction.
People often think that their programs are ``debugged'' when large applications
have been run successfully. But system programmers know that a typical large
application tends to use at most about 50 per cent of the instructions
in a typical compiler. Although the other half of the code---which tends
to be the ``harder half''---might be riddled with errors, the system seems
to be working quite impressively until an unusual case shows up on the
next day. And on the following day another error manifests itself, and so on;
months or years go by before certain parts of the compiler are even
activated, much less tested in combination with other portions of the system,
if user applications provide the only tests.

How then shall we go about testing a compiler? Ideally we would like to
have a formal proof of correctness, certified by a computer.
This would give us a lot of confidence,
although of course the formal verification program might itself be incorrect.
A more serious drawback of automatic verification is that the formal
specifications of the compiler are likely to be wrong, since they aren't
much easier to write than the compiler itself. Alternatively, we can
substitute an informal proof of correctness: The programmer writes his or
her code in a structured manner and checks that appropriate relations
remain invariant, etc.
This helps greatly to reduce errors, but it cannot
be expected to remove them completely; the task of checking a large
system is sufficiently formidable that human beings cannot do it without
making at least a few slips here and there.

Thus, we have seen that test programs are unsatisfactory if they are simply
large user applications; yet some sort of test program is needed because
proofs of correctness aren't adequate either. People have proposed schemes
for constructing test data automatically from a program text, but such
approaches run the risk of circularity, since they cannot assume that a
given program has the right structure.

I have been having good luck with a somewhat different approach,
first used in 1960 to debug an {\mc ALGOL} compiler. The idea is to
construct a test file that is about as different from a typical user
application as could be imagined. Instead of testing things that people
normally want to do, the file tests complicated things that people would
never dare to think of, and it embeds these complexities in still
more arcane constructions. Instead of trying to make the compiler do the
right thing, the goal is to make it fail (until the bugs have all been found).

To write such a fiendish test routine, one simply gets into a nasty frame
of mind and tries to do everything in the unexpected way. Parameters
that are normally positive are set negative or zero; borderline cases
are pushed to the limit; deliberate errors are made in hopes that the
compiler will not be able to recover properly from them.

A user's application tends to exercise 50\%\ of a compiler's logic,
but my first fiendish tests tend to improve this to about 90\%. As the
next step I generally make use of frequency-counting software to identify
the instructions that have still not been called upon.
Then I add ever more
fiendishness to the test routine, until more than 99\%\ of the code
has been used at least once. (The remaining bits are things that
can occur only if the source program is really huge, or if certain
fatal errors are detected; or they are cases so similar to other well-tested
things that there can be little doubt of their validity.)

Of course, this is not guaranteed to work. But my experience in 1960 was
that only two bugs were ever found in that {\mc ALGOL} compiler after it
correctly translated that original fiendish test. And one of those bugs
was actually present in the results of the test; I simply had failed to
notice that the output was incorrect. Similar experiences occurred later
during the 60s and 70s, with respect to a few assemblers, compilers,
and simulators that I wrote.

This method of debugging, combined with the methodology of structured
programming and informal proofs (otherwise known as careful desk checking),
leads to greater reliability of production software than any other
method I know. Therefore I have used it in developing \MF84, and the
main bulk of this report is simply a presentation of the test program
that was used to get the bugs out of \MF.

Such a test file is useful also after a program has been debugged, since
it can be used to give some assurance that subsequent modifications don't
mess things up.

The test file is called \.{TRAP.MF}, because of my warped sense of humor:
\MF's companion system, \TeX, has a similar test file called \.{TRIP}, and I
couldn't help thinking about Billy Goat Gruff and the story of ``trip,
trap, trip, trap.''

The contents of this test file are so remote from what people actually
do with \MF, I feel apologetic if I have to explain the correct
translation of \.{TRAP.MF}; nobody really cares about most of the
nitty-gritty rules that are involved.
Yet I believe \.{TRAP} exemplifies
the sort of test program that has outstanding diagnostic ability, as
explained above.

If somebody claims to have a correct implementation of \MF, I will not
believe it until I see that \.{TRAP.MF} is translated properly.
I propose, in fact, that a program must meet two criteria before it
can justifiably be called \MF: (1)~The person who wrote it must be
happy with the way it works at his or her installation; and (2)~the
program must produce the correct results from \.{TRAP.MF}.

\MF\ is in the public domain, and its algorithms are published;
I've done this since I do not want to discourage its use by placing
proprietary restrictions on the software. However, I don't want
faulty imitations to masquerade as \MF\ processors, since users
want \MF\ to produce identical results on different machines.
Hence I am planning to do whatever I can to suppress any systems that
call themselves \MF\ without meeting conditions (1) and~(2).
I have copyrighted the programs so that I have some chance to forbid
unauthorized copies; I explicitly authorize copying of correct
\MF\ implementations, and not of incorrect ones!

The remainder of this report consists of appendices, whose contents ought
to be described briefly here:

Appendix A explains in detail how to carry out a test of \MF, given
a tape that contains copies of the other appendices.

Appendix B is \.{TRAP.MF}, the fiendish test file that has already
been mentioned. People who think that they understand \MF\ are challenged
to see if they know what \MF\ is supposed to do with this file.
People who know only a little about \MF\ might still find it
interesting to study Appendix~B, just to get some insights into the
methodology advocated here.

Appendix C is \.{TRAPIN.LOG}, a correct transcript file \.{TRAP.LOG}
that results if \.{INIMF} is applied to \.{TRAP.MF}. (\.{INIMF} is
the name of a version of \MF\ that does certain initializations;
this run of \.{INIMF} also creates a binary base file called \.{TRAP.BASE}.)

Appendix D is a correct transcript file \.{TRAP.LOG} that results if
\.{INIMF} or any other version of \MF\ is applied to \.{TRAP.MF}
with base file \.{TRAP.BASE}.

Appendix E is \.{TRAP.TYP}, the symbolic version of a correct output
file \.{TRAP.72270GF} that was produced at the same time as the \.{TRAP.LOG}
file of Appendix~D.

Appendix F is \.{TRAP.PL}, the symbolic version of a correct output
file \.{TRAP.TFM} that was produced at the same time as the \.{TRAP.LOG}
file of Appendix~D.

Appendix G is \.{TRAP.FOT}, an abbreviated version of Appendix D that
appears on the user's terminal during the run that produces \.{TRAP.LOG},
\.{TRAP.72270GF}, and \.{TRAP.TFM}.

The debugging of \MF\ and the testing of the adequacy of \.{TRAP.MF}
could not have been done nearly as well as reported here except for
the magnificent software support provided by my colleague David R. Fuchs.
In particular, he extended our local Pascal compiler so that
frequency counting and a number of other important features were added
to its online debugging abilities.

The method of testing advocated here has one chief difficulty that deserves
comment: I had to verify by hand that \MF\ did the right things
to \.{TRAP.MF}. This took many hours, and perhaps I have missed
something (as I did in 1960); I must confess that I have not checked
every single number in Appendices D, E, and~F. However, I'm willing to pay
$\$$81.92 to the first finder of any remaining bug in \MF, and I will
be surprised if that bug doesn't show up also in one of these appendices.

\vfill\eject

\section Appendix A: How to test \MF.

\item{0.} Let's assume that you have a tape containing \.{TRAP.MF},
\.{TRAPIN.LOG}, \.{TRAP.LOG}, \.{TRAP.TYP}, \.{TRAP.PL}, and \.{TRAP.FOT},
as in Appendices B, C, D, E, F, and~G. Furthermore, let's suppose that you
have a working \.{WEB} system, and that you have working programs
\.{TFtoPL} and \.{GFtype}, as described in the \TeX ware and \MF ware reports.

\item{1.} Prepare a version of \.{INIMF}. (This means that your \.{WEB}
change file should have {\bf init} and {\bf tini} defined to be null.)
The {\bf debug} and {\bf gubed} macros should be null, in order to
activate special printouts that occur when $\\{tracingedges}>1.0$.
The {\bf stat} and {\bf tats} macros should also be null, so that
statistics are kept. Set \\{mem\_top} and \\{mem\_max} to 3000
(or to \\{mem\_min} plus 3000, if \\{mem\_min} isn't zero),
for purposes of this test version.
Also set $\\{error\_line}=64$, $\\{half\_error\_line}=32$,
$\\{max\_print\_line}=72$, $\\{screen\_width}=100$, and
$\\{screen\_depth}=200$; these parameters affect many of the lines of
the test output, so your job will be much easier if you use the same
settings that were used to produce Appendix~E. Also (if possible) set
$\\{gf\_buf\_size}=8$, since this tests more parts of the program.
You probably should also use the ``normal'' settings of other parameters
found in \.{MF.WEB} (e.g., $\\{max\_internal}=100$, $\\{buf\_size}=500$,
etc.), since these show up in a few lines of the test output.
Finally,
change \MF's screen-display routines by putting the following simple lines
in the change file:
$$\vbox{\halign{\tt#\hfil\cr
\char`\@x Screen routines:\cr
begin init\char`\_screen:=false;\cr
\char`\@y\cr
begin init\char`\_screen:=true;
\char`\{screen instructions will be logged\char`\}\cr
\char`\@z\cr}}$$
None of the other screen routines (\\{update\_screen}, \\{blank\_rectangle},
\\{paint\_row}) should be changed in any way; the effect will be to have
\MF's actions recorded in the transcript files instead of on the screen,
in a machine-independent way.

\item{2.} Run the \.{INIMF} prepared in step 1. In response to the first
`\.{**}' prompt, type carriage return (thus getting another `\.{**}').
Then type `\.{\char`\\input trap}'. You should get an output that matches
the file \.{TRAPIN.LOG} (Appendix~C). Don't be alarmed by the error
messages that you see, unless they are different from those in Appendix~C.

\def\sp{{\char'40}}
\item{3.} Run \.{INIMF} again. This time type `\.{\sp\&trap\sp\sp trap\sp}'.
(The spaces in this input help to check certain parts of \MF\ that
aren't otherwise used.) You should get outputs \.{TRAP.LOG}, \.{TRAP.72270GF},
and \.{TRAP.TFM}.
Furthermore, your terminal should receive output that matches \.{TRAP.FOT}
(Appendix~G). During the middle part of this test, however, the terminal
will not be getting output, because \.{batchmode} is being
tested; don't worry if nothing seems to be happening for a while---nothing
is supposed to.

\item{4.} Compare the \.{TRAP.LOG} file from step 3 with the ``master''
\.{TRAP.LOG} file of step~0. (Let's hope you put that master file in a
safe place so that it wouldn't be clobbered.) There should be perfect
agreement between these files except in the following respects:

\itemitem{a)} The dates and possibly the file names will
naturally be different.

\itemitem{b)} If you had different values for \\{stack\_size}, \\{buf\_size},
etc., the corresponding capacity values will be different when they
are printed out at the end.

\itemitem{c)} Help messages may be different; indeed, the author encourages
non-English help messages in versions of \MF\ for people who don't
understand English as well as some other language.

\itemitem{d)} The total number and length of strings at the end and/or
``still untouched'' may well be different.

\itemitem{e)} If your \MF\ uses a different memory allocation or
packing scheme, the memory usage statistics may change.

\itemitem{f)} If you use a different storage allocation scheme, the
capsule numbers will probably be different, but the order of variables
should be unchanged when dependent variables are shown. \MF\ should also
choose the same variables to be dependent.

\itemitem{g)} If your computer handles integer division of negative operands
in a nonstandard way, you may get results that are rounded differently.
Although \TeX\ is careful to be machine-independent in this regard,
\MF\ is not, because integer divisions are present in so many places.

\item{5.} Use \.{GFtype} to convert your file \.{TRAP.72270GF} to a file
\.{TRAP.TYP}. (Both of \.{GFtype}'s options, i.e., mnemonic output and image
output, should be enabled so that you get the maximum amount of output.)
The resulting file should agree with the master \.{TRAP.TYP} file of step~0,
assuming that your \.{GFtype} has the ``normal'' values of compile-time
constants ($\\{top\_pixel}=69$, etc.).

\item{6.} Use \.{TFtoPL} to convert your file \.{TRAP.TFM} to a file
\.{TRAP.PL}. The resulting file should agree with the master \.{TRAP.PL}
file of step~0.

\item{7.} You might also wish to test \.{TRAP} with other versions of
\MF\ (i.e., \.{VIRMF} or a production version with another base file
preloaded). It should work unless \MF's primitives have been redefined in
the base file. However, this step isn't essential, since all the code of
\.{VIRMF} appears in \.{INIMF}; you probably won't catch any more errors
this way, unless they would already become obvious from normal use of
the~system.

\vfill\eject

\section Appendix B: The \.{TRAP.MF} file.
The contents of the test routine are prefixed here with line numbers, for
ease in comparing this file with the error messages printed later; the
line numbers aren't actually present.
\runninghead{APPENDIX B: \.{TRAP.MF} (CONTINUED)}

\vskip 8pt
\begingroup\count255=0
\everypar{\global\advance\count255 by 1
\hbox to 20pt{\sevenrm\hfil\the\count255\ \ }}
\verbatim{trap.mf}
\endgroup
\vfill\eject

\section Appendix C: The \.{TRAPIN.LOG} file.
When \.{INIMF} makes the \.{TRAP.BASE} file, it also creates a file called
\.{TRAP.LOG} that looks like this.
\runninghead{APPENDIX C: \.{TRAPIN.LOG} (CONTINUED)}

\vskip8pt
\verbatim{trapin.log}
\vfill\eject

\section Appendix D: The \.{TRAP.LOG} file.
Here is the major output of the \.{TRAP} test; it is generated by running
\.{INIMF} and loading \.{TRAP.BASE}, then reading \.{TRAP.MF}.
\runninghead{APPENDIX D: \.{TRAP.LOG} (CONTINUED)}

{\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
\vskip8pt
\verbatim{trap.log}
}
\vfill\eject

\section Appendix E: The \.{TRAP.TYP} file.
Here is another major component of the test. It shows the output of \.{GFtype}
applied to the file \.{TRAP.72270GF} that is created at the same time
Appendix D was produced.
\runninghead{APPENDIX E: \.{TRAP.TYP} (CONTINUED)}

{\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
\vskip8pt
\verbatim{trap.typ}
}
\vfill\eject

\section Appendix F: The \.{TRAP.PL} file.
In this case we have the output of \.{TFtoPL}
applied to the file \.{TRAP.TFM} that is created at the same time
Appendix D was produced.
\runninghead{APPENDIX F: \.{TRAP.PL} (CONTINUED)}

{\let\tt=\eighttt\leftskip 1in\baselineskip 9pt plus .1pt minus .1pt
\vskip8pt
\verbatim{trap.pl}
}
\vfill\eject

\section Appendix G: The \.{TRAP.FOT} file.
This shows what appeared on the terminal while Appendix D was being produced.
\runninghead{APPENDIX G: \.{TRAP.FOT} (CONTINUED)}

\vskip8pt
\verbatim{trap.fot}

\vfill\end
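
The comparison described in step 4 of Appendix A (checking a freshly produced TRAP.LOG against the master copy while tolerating the permitted differences) might be automated along the following lines. This is only a minimal sketch and is not part of the TRAP distribution: the function names significant_lines and compare_logs are hypothetical, and the patterns in ALLOWED are assumptions about what the tolerated lines look like, so they should be adjusted after inspecting your own master file. Anything the script reports still has to be judged by hand against exceptions (a) through (g).

```python
import difflib
import re

# Hypothetical patterns for lines that step 4 permits to differ (dates,
# string totals, and so on). These are assumptions for illustration only;
# tune them against your own master TRAP.LOG.
ALLOWED = [
    re.compile(r"^This is METAFONT"),        # banner line carrying the date
    re.compile(r"strings? of total length"),  # string statistics at the end
]


def significant_lines(text, allowed=ALLOWED):
    """Return the lines of text that match none of the allowed patterns."""
    return [line for line in text.splitlines()
            if not any(p.search(line) for p in allowed)]


def compare_logs(master_text, candidate_text, allowed=ALLOWED):
    """Return unified-diff lines between two logs, skipping allowed lines.

    An empty list means the logs agree everywhere except on lines that
    match one of the allowed-difference patterns.
    """
    a = significant_lines(master_text, allowed)
    b = significant_lines(candidate_text, allowed)
    return list(difflib.unified_diff(a, b, "master", "candidate", lineterm=""))
```

A typical use would read both files into strings (for example with open("TRAP.LOG").read() for the new log and a separately stored master copy) and print whatever compare_logs returns. The same helper could be applied to TRAP.TYP and TRAP.PL in steps 5 and 6, with an empty pattern list, since those files are expected to agree with their masters.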