modernc.org/knuth@v0.0.4/tangle/tangle.web (about) 1 % This program by D. E. Knuth is not copyrighted and can be used freely. 2 % Version 0 was released in December, 1981. 3 % Version 1 was released in September, 1982, with version 0 of TeX. 4 % Slight changes were made in October, 1982, for version 0.6 of TeX. 5 % Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982). 6 % Version 1.4 added "history" (February, 1983). 7 % Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983). 8 % Version 1.7 introduced the new change file format (June, 1983). 9 % Version 2.0 was released in July, 1983, with version 0.999 of TeX. 10 % Version 2.5 was released in November, 1983, with version 1.0 of TeX. 11 % Version 2.6 fixed a bug: force-line-break after a constant (August, 1984). 12 % Version 2.7 fixed the definition of check_sum_prime (May, 1985). 13 % Version 2.8 fixed a bug in change_buffer movement (August, 1985). 14 % Version 2.9 allows nonnumeric macros before their def (December, 1988). 15 % Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989). 16 % Version 4 was major change to allow 8-bit input (September, 1989). 17 % Version 4.1 conforms to ANSI standard for-loop rules (September, 1990). 18 % Version 4.2 fixes stat report if phase one dies (March, 1991). 19 % Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991). 20 % Version 4.4 activates debug_help on errors as advertised (February, 1993). 21 % Version 4.5 prevents modno-comments from being split across lines (Dec 2002). 22 % Version 4.6 fixes archaic @@z logic; is again big enough for TeX (Jan 2021). 23 24 % Here is TeX material that gets inserted after \input webmac 25 \def\hang{\hangindent 3em\indent\ignorespaces} 26 \font\ninerm=cmr9 27 \let\mc=\ninerm % medium caps for names like SAIL 28 \def\PASCAL{Pascal} 29 \def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|) 30 \def\v{\.{\char'174}} % vertical (|) in typewriter font 31 \mathchardef\BA="3224 % double arrow 32 \def\({} % kludge for alphabetizing certain module names 33 34 \def\title{TANGLE} 35 \def\contentspagenumber{125} % should be odd 36 \def\topofcontents{\null\vfill 37 \titlefalse % include headline on the contents page 38 \def\rheader{\mainfont Appendix E\hfil \contentspagenumber} 39 \centerline{\titlefont The {\ttitlefont TANGLE} processor} 40 \vskip 15pt 41 \centerline{(Version 4.6)} 42 \vfill} 43 \pageno=\contentspagenumber \advance\pageno by 1 44 45 @* Introduction. 46 This program converts a \.{WEB} file to a \PASCAL\ file. It was written 47 by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had 48 been developed in March, 1979. Since this program describes itself, a 49 bootstrapping process involving hand-translation had to be used to get started. 50 51 For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps 52 all the \PASCAL\ text in memory (in an abbreviated form). The program uses 53 a few features of the local \PASCAL\ compiler that may need to be changed in 54 other installations: 55 56 \yskip\item{1)} Case statements have a default. 57 \item{2)} Input-output routines may need to be adapted for use with a particular 58 character set and/or for printing messages on the user's terminal. 59 60 \yskip\noindent 61 These features are also present in the \PASCAL\ version of \TeX, where they 62 are used in a similar (but more complex) way. System-dependent portions 63 of \.{TANGLE} can be identified by looking at the entries for `system 64 dependencies' in the index below. 65 @!@^system dependencies@> 66 67 The ``banner line'' defined here should be changed whenever \.{TANGLE} 68 is modified. 69 70 @d banner=='This is TANGLE, Version 4.6' 71 72 @ The program begins with a fairly normal header, made up of pieces that 73 @^system dependencies@> 74 will mostly be filled in later. The \.{WEB} input comes from files |web_file| 75 and |change_file|, the \PASCAL\ output goes to file |Pascal_file|, 76 and the string pool output goes to file |pool|. 77 78 If it is necessary to abort the job because of a fatal error, the program 79 calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|. 80 81 @d end_of_TANGLE = 9999 {go here to wrap it up} 82 83 @p @t\4@>@<Compiler directives@>@/ 84 program TANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool); 85 label end_of_TANGLE; {go here to finish} 86 const @<Constants in the outer block@>@/ 87 type @<Types in the outer block@>@/ 88 var @<Globals in the outer block@>@/ 89 @<Error handling procedures@>@/ 90 procedure initialize; 91 var @<Local variables for initialization@>@/ 92 begin @<Set initial values@>@/ 93 end; 94 95 @ Some of this code is optional for use when debugging only; 96 such material is enclosed between the delimiters |debug| and $|gubed|$. 97 Other parts, delimited by |stat| and $|tats|$, are optionally included if 98 statistics about \.{TANGLE}'s memory usage are desired. 99 100 @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 101 @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 102 @f debug==begin 103 @f gubed==end 104 @# 105 @d stat==@{ {change this to `$\\{stat}\equiv\null$' 106 when gathering usage statistics} 107 @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' 108 when gathering usage statistics} 109 @f stat==begin 110 @f tats==end 111 112 @ The \PASCAL\ compiler used to develop this system has ``compiler 113 directives'' that can appear in comments whose first character is a dollar sign. 114 In production versions of \.{TANGLE} these directives tell the compiler that 115 @^system dependencies@> 116 it is safe to avoid range checks and to leave out the extra code it inserts 117 for the \PASCAL\ debugger's benefit, although interrupts will occur if 118 there is arithmetic overflow. 119 120 @<Compiler directives@>= 121 @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 122 @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 123 124 @ Labels are given symbolic names by the following definitions. We insert 125 the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a 126 procedure in which we have used the `|return|' statement defined below; 127 the label `|restart|' is occasionally used at the very beginning of a 128 procedure; and the label `|reswitch|' is occasionally used just prior to 129 a \&{case} statement in which some cases change the conditions and we wish to 130 branch to the newly applicable case. 131 Loops that are set up with the \&{loop} construction defined below are 132 commonly exited by going to `|done|' or to `|found|' or to `|not_found|', 133 and they are sometimes repeated by going to `|continue|'. 134 135 @d exit=10 {go here to leave a procedure} 136 @d restart=20 {go here to start a procedure again} 137 @d reswitch=21 {go here to start a case statement again} 138 @d continue=22 {go here to resume a loop} 139 @d done=30 {go here to exit a loop} 140 @d found=31 {go here when you've found it} 141 @d not_found=32 {go here when you've found something else} 142 143 @ Here are some macros for common programming idioms. 144 145 @d incr(#) == #:=#+1 {increase a variable by unity} 146 @d decr(#) == #:=#-1 {decrease a variable by unity} 147 @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 148 @d do_nothing == {empty statement} 149 @d return == goto exit {terminate a procedure call} 150 @f return == nil 151 @f loop == xclause 152 153 @ We assume that |case| statements may include a default case that applies 154 if no matching label is found. Thus, we shall use constructions like 155 @^system dependencies@> 156 $$\vbox{\halign{#\hfil\cr 157 |case x of|\cr 158 1: $\langle\,$code for $x=1\,\rangle$;\cr 159 3: $\langle\,$code for $x=3\,\rangle$;\cr 160 |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 161 |endcases|\cr}}$$ 162 since most \PASCAL\ compilers have plugged this hole in the language by 163 incorporating some sort of default mechanism. For example, the compiler 164 used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label, 165 and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or 166 `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| 167 and |endcases| should be changed to agree with local conventions. (Of 168 course, if no default mechanism is available, the |case| statements of 169 this program must be extended by listing all remaining cases. The author 170 would have taken the trouble to modify \.{TANGLE} so that such extensions 171 were done automatically, if he had not wanted to encourage \PASCAL\ 172 compiler writers to make this important change in \PASCAL, where it belongs.) 173 174 @d othercases == others: {default for cases not listed explicitly} 175 @d endcases == @+end {follows the default case in an extended |case| statement} 176 @f othercases == else 177 @f endcases == end 178 179 @ The following parameters are set big enough to handle \TeX, so they 180 should be sufficient for most applications of \.{TANGLE}. 181 182 @<Constants...@>= 183 @!buf_size=100; {maximum length of input line} 184 @!max_bytes=45000; {|1/ww| times the number of bytes in identifiers, 185 strings, and module names; must be less than 65536} 186 @!max_toks=65000; {|1/zz| times the number of bytes in compressed \PASCAL\ code; 187 must be less than 65536} 188 @!max_names=4000; {number of identifiers, strings, module names; 189 must be less than 10240} 190 @!max_texts=2000; {number of replacement texts, must be less than 10240} 191 @!hash_size=353; {should be prime} 192 @!longest_name=400; {module names shouldn't be longer than this} 193 @!line_length=72; {lines of \PASCAL\ output have at most this many characters} 194 @!out_buf_size=144; {length of output buffer, should be twice |line_length|} 195 @!stack_size=50; {number of simultaneous levels of macro expansion} 196 @!max_id_length=12; {long identifiers are chopped to this length, which must 197 not exceed |line_length|} 198 @!unambig_length=7; {identifiers must be unique if chopped to this length} 199 {note that 7 is more strict than \PASCAL's 8, but this can be varied} 200 201 @ A global variable called |history| will contain one of four values 202 at the end of every run: |spotless| means that no unusual messages were 203 printed; |harmless_message| means that a message of possible interest 204 was printed but no serious errors were detected; |error_message| means that 205 at least one error was found; |fatal_message| means that the program 206 terminated abnormally. The value of |history| does not influence the 207 behavior of the program; it is simply computed for the convenience 208 of systems that might want to use such information. 209 210 @d spotless=0 {|history| value for normal jobs} 211 @d harmless_message=1 {|history| value when non-serious info was printed} 212 @d error_message=2 {|history| value when an error was noted} 213 @d fatal_message=3 {|history| value when we had to stop prematurely} 214 @# 215 @d mark_harmless==@t@>@+if history=spotless then history:=harmless_message 216 @d mark_error==history:=error_message 217 @d mark_fatal==history:=fatal_message 218 219 @<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?} 220 221 @ @<Set init...@>=history:=spotless; 222 223 @* The character set. 224 One of the main goals in the design of \.{WEB} has been to make it readily 225 portable between a wide variety of computers. Yet \.{WEB} by its very 226 nature must use a greater variety of characters than most computer 227 programs deal with, and character encoding is one of the areas in which 228 existing machines differ most widely from each other. 229 230 To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted 231 to an internal eight-bit code that is essentially standard ASCII, the ``American 232 Standard Code for Information Interchange.'' The conversion is done 233 immediately when each character is read in. Conversely, characters are 234 converted from ASCII to the user's external representation just before 235 they are output. (The original ASCII code was seven bits only; \.{WEB} now 236 allows eight bits in an attempt to keep up with modern times.) 237 238 Such an internal code is relevant to users of \.{WEB} only because it is 239 the code used for preprocessed constants like \.{"A"}. If you are writing 240 a program in \.{WEB} that makes use of such one-character constants, you 241 should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do. 242 Otherwise \.{WEB}'s internal coding scheme does not affect you. 243 @^ASCII code@> 244 245 Here is a table of the standard visible ASCII codes: 246 $$\def\:{\char\count255\global\advance\count255 by 1} 247 \count255='40 248 \vbox{ 249 \hbox{\hbox to 40pt{\it\hfill0\/\hfill}% 250 \hbox to 40pt{\it\hfill1\/\hfill}% 251 \hbox to 40pt{\it\hfill2\/\hfill}% 252 \hbox to 40pt{\it\hfill3\/\hfill}% 253 \hbox to 40pt{\it\hfill4\/\hfill}% 254 \hbox to 40pt{\it\hfill5\/\hfill}% 255 \hbox to 40pt{\it\hfill6\/\hfill}% 256 \hbox to 40pt{\it\hfill7\/\hfill}} 257 \vskip 4pt 258 \hrule 259 \def\^{\vrule height 10.5pt depth 4.5pt} 260 \halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^ 261 \hbox to 40pt{\tt\hfill#\hfill\^}& 262 &\hbox to 40pt{\tt\hfill#\hfill\^}\cr 263 04&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 264 05&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 265 06&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 266 07&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 267 10&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 268 11&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 269 12&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 270 13&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 271 14&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 272 15&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 273 16&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 274 17&\:&\:&\:&\:&\:&\:&\:\cr} 275 \hrule width 280pt}$$ 276 (Actually, of course, code @'040 is an invisible blank space.) Code @'136 277 was once an upward arrow (\.{\char'13}), and code @'137 was 278 once a left arrow (\.^^X), in olden times when the first draft 279 of ASCII code was prepared; but \.{WEB} works with today's standard 280 ASCII in which those codes represent circumflex and underline as shown. 281 282 @<Types...@>= 283 @!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers} 284 285 @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 286 character sets were common, so it did not make provision for lowercase 287 letters. Nowadays, of course, we need to deal with both capital and small 288 letters in a convenient way, so \.{WEB} assumes that it is being used 289 with a \PASCAL\ whose character set contains at least the characters of 290 standard ASCII as listed above. Some \PASCAL\ compilers use the original 291 name |char| for the data type associated with the characters in text files, 292 while other \PASCAL s consider |char| to be a 64-element subrange of a larger 293 data type that has some other name. 294 295 In order to accommodate this difference, we shall use the name |text_char| 296 to stand for the data type of the characters in the input and output 297 files. We shall also assume that |text_char| consists of the elements 298 |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The 299 following definitions should be adjusted if necessary. 300 @^system dependencies@> 301 302 @d text_char == char {the data type of characters in text files} 303 @d first_text_char=0 {ordinal number of the smallest element of |text_char|} 304 @d last_text_char=255 {ordinal number of the largest element of |text_char|} 305 306 @<Types...@>= 307 @!text_file=packed file of text_char; 308 309 @ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and 310 the user's external character set by means of arrays |xord| and |xchr| 311 that are analogous to \PASCAL's |ord| and |chr| functions. 312 313 @<Globals...@>= 314 @!xord: array [text_char] of ASCII_code; 315 {specifies conversion of input characters} 316 @!xchr: array [ASCII_code] of text_char; 317 {specifies conversion of output characters} 318 319 @ If we assume that every system using \.{WEB} is able to read and write the 320 visible characters of standard ASCII (although not necessarily using the 321 ASCII codes to represent them), the following assignment statements initialize 322 most of the |xchr| array properly, without needing any system-dependent 323 changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears 324 in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code 325 on the external medium on which it resides, but \.{TANGLE} will convert from 326 this external code to ASCII and back again. Therefore the assignment 327 statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file, 328 and \PASCAL\ will compile this statement so that |xchr[65]| receives the 329 character \.A in the external (|char|) code. Note that it would be quite 330 incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of 331 type |integer|, not |char|, and because we have $|"A"|=65$ regardless of 332 the external character set. 333 334 @<Set init...@>= 335 xchr[@'40]:=' '; 336 xchr[@'41]:='!'; 337 xchr[@'42]:='"'; 338 xchr[@'43]:='#'; 339 xchr[@'44]:='$'; 340 xchr[@'45]:='%'; 341 xchr[@'46]:='&'; 342 xchr[@'47]:='''';@/ 343 xchr[@'50]:='('; 344 xchr[@'51]:=')'; 345 xchr[@'52]:='*'; 346 xchr[@'53]:='+'; 347 xchr[@'54]:=','; 348 xchr[@'55]:='-'; 349 xchr[@'56]:='.'; 350 xchr[@'57]:='/';@/ 351 xchr[@'60]:='0'; 352 xchr[@'61]:='1'; 353 xchr[@'62]:='2'; 354 xchr[@'63]:='3'; 355 xchr[@'64]:='4'; 356 xchr[@'65]:='5'; 357 xchr[@'66]:='6'; 358 xchr[@'67]:='7';@/ 359 xchr[@'70]:='8'; 360 xchr[@'71]:='9'; 361 xchr[@'72]:=':'; 362 xchr[@'73]:=';'; 363 xchr[@'74]:='<'; 364 xchr[@'75]:='='; 365 xchr[@'76]:='>'; 366 xchr[@'77]:='?';@/ 367 xchr[@'100]:='@@'; 368 xchr[@'101]:='A'; 369 xchr[@'102]:='B'; 370 xchr[@'103]:='C'; 371 xchr[@'104]:='D'; 372 xchr[@'105]:='E'; 373 xchr[@'106]:='F'; 374 xchr[@'107]:='G';@/ 375 xchr[@'110]:='H'; 376 xchr[@'111]:='I'; 377 xchr[@'112]:='J'; 378 xchr[@'113]:='K'; 379 xchr[@'114]:='L'; 380 xchr[@'115]:='M'; 381 xchr[@'116]:='N'; 382 xchr[@'117]:='O';@/ 383 xchr[@'120]:='P'; 384 xchr[@'121]:='Q'; 385 xchr[@'122]:='R'; 386 xchr[@'123]:='S'; 387 xchr[@'124]:='T'; 388 xchr[@'125]:='U'; 389 xchr[@'126]:='V'; 390 xchr[@'127]:='W';@/ 391 xchr[@'130]:='X'; 392 xchr[@'131]:='Y'; 393 xchr[@'132]:='Z'; 394 xchr[@'133]:='['; 395 xchr[@'134]:='\'; 396 xchr[@'135]:=']'; 397 xchr[@'136]:='^'; 398 xchr[@'137]:='_';@/ 399 xchr[@'140]:='`'; 400 xchr[@'141]:='a'; 401 xchr[@'142]:='b'; 402 xchr[@'143]:='c'; 403 xchr[@'144]:='d'; 404 xchr[@'145]:='e'; 405 xchr[@'146]:='f'; 406 xchr[@'147]:='g';@/ 407 xchr[@'150]:='h'; 408 xchr[@'151]:='i'; 409 xchr[@'152]:='j'; 410 xchr[@'153]:='k'; 411 xchr[@'154]:='l'; 412 xchr[@'155]:='m'; 413 xchr[@'156]:='n'; 414 xchr[@'157]:='o';@/ 415 xchr[@'160]:='p'; 416 xchr[@'161]:='q'; 417 xchr[@'162]:='r'; 418 xchr[@'163]:='s'; 419 xchr[@'164]:='t'; 420 xchr[@'165]:='u'; 421 xchr[@'166]:='v'; 422 xchr[@'167]:='w';@/ 423 xchr[@'170]:='x'; 424 xchr[@'171]:='y'; 425 xchr[@'172]:='z'; 426 xchr[@'173]:='{'; 427 xchr[@'174]:='|'; 428 xchr[@'175]:='}'; 429 xchr[@'176]:='~';@/ 430 xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used} 431 432 @ Some of the ASCII codes below @'40 have been given symbolic names in 433 \.{WEAVE} and \.{TANGLE} because they are used with a special meaning. 434 435 @d and_sign=@'4 {equivalent to `\.{and}'} 436 @d not_sign=@'5 {equivalent to `\.{not}'} 437 @d set_element_sign=@'6 {equivalent to `\.{in}'} 438 @d tab_mark=@'11 {ASCII code used as tab-skip} 439 @d line_feed=@'12 {ASCII code thrown away at end of line} 440 @d form_feed=@'14 {ASCII code used at end of page} 441 @d carriage_return=@'15 {ASCII code used at end of line} 442 @d left_arrow=@'30 {equivalent to `\.{:=}'} 443 @d not_equal=@'32 {equivalent to `\.{<>}'} 444 @d less_or_equal=@'34 {equivalent to `\.{<=}'} 445 @d greater_or_equal=@'35 {equivalent to `\.{>=}'} 446 @d equivalence_sign=@'36 {equivalent to `\.{==}'} 447 @d or_sign=@'37 {equivalent to `\.{or}'} 448 449 @ When we initialize the |xord| array and the remaining parts of |xchr|, 450 it will be convenient to make use of an index variable, |i|. 451 452 @<Local variables for init...@>= 453 @!i:0..255; 454 455 @ Here now is the system-dependent part of the character set. 456 If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which 457 only standard ASCII codes will appear in the input and output files, you 458 don't need to make any changes here. But if you have, for example, an extended 459 character set like the one in Appendix~C of {\sl The \TeX book}, the first 460 line of code in this module should be changed to 461 $$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$ 462 \.{WEB}'s character set is essentially identical to \TeX's, even with respect to 463 characters less than @'40. 464 @^system dependencies@> 465 466 Changes to the present module will make \.{WEB} more friendly on computers 467 that have an extended character set, so that one can type things like 468 \.^^Z\ instead of \.{<>}. If you have an extended set of characters that 469 are easily incorporated into text files, you can assign codes arbitrarily 470 here, giving an |xchr| equivalent to whatever characters the users of 471 \.{WEB} are allowed to have in their input files, provided that unsuitable 472 characters do not correspond to special codes like |carriage_return| 473 that are listed above. 474 475 (The present file \.{TANGLE.WEB} does not contain any of the non-ASCII 476 characters, because it is intended to be used with all implementations of 477 \.{WEB}. It was originally created on a Stanford system that has a 478 convenient extended character set, then ``sanitized'' by applying another 479 program that transliterated all of the non-standard characters into 480 standard equivalents.) 481 482 @<Set init...@>= 483 for i:=1 to @'37 do xchr[i]:=' '; 484 for i:=@'200 to @'377 do xchr[i]:=' '; 485 486 @ The following system-independent code makes the |xord| array contain a 487 suitable inverse to the information in |xchr|. 488 489 @<Set init...@>= 490 for i:=first_text_char to last_text_char do xord[chr(i)]:=" "; 491 for i:=1 to @'377 do xord[xchr[i]]:=i; 492 xord[' ']:=" "; 493 494 @* Input and output. 495 The input conventions of this program are intended to be very much like those 496 of \TeX\ (except, of course, that they are much simpler, because much less 497 needs to be done). Furthermore they are identical to those of \.{WEAVE}. 498 Therefore people who need to make modifications to all three systems 499 should be able to do so without too many headaches. 500 501 We use the standard \PASCAL\ input/output procedures in several places that 502 \TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named 503 dynamically by the user, and since there is no input from the terminal. 504 505 @ Terminal output is done by writing on file |term_out|, which is assumed to 506 consist of characters of type |text_char|: 507 @^system dependencies@> 508 509 @d print(#)==write(term_out,#) {`|print|' means write on the terminal} 510 @d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line} 511 @d new_line==write_ln(term_out) {start new line} 512 @d print_nl(#)== {print information starting on a new line} 513 begin new_line; print(#); 514 end 515 516 @<Globals...@>= 517 @!term_out:text_file; {the terminal as an output file} 518 519 @ Different systems have different ways of specifying that the output on a 520 certain file will appear on the user's terminal. Here is one way to do this 521 on the \PASCAL\ system that was used in \.{TANGLE}'s initial development: 522 @^system dependencies@> 523 524 @<Set init...@>= 525 rewrite(term_out,'TTY:'); {send |term_out| output to the terminal} 526 527 @ The |update_terminal| procedure is called when we want 528 to make sure that everything we have output to the terminal so far has 529 actually left the computer's internal buffers and been sent. 530 @^system dependencies@> 531 532 @d update_terminal == break(term_out) {empty the terminal output buffer} 533 534 @ The main input comes from |web_file|; this input may be overridden 535 by changes in |change_file|. (If |change_file| is empty, there are no changes.) 536 537 @<Globals...@>= 538 @!web_file:text_file; {primary input} 539 @!change_file:text_file; {updates} 540 541 @ The following code opens the input files. Since these files were listed 542 in the program header, we assume that the \PASCAL\ runtime system has 543 already checked that suitable file names have been given; therefore no 544 additional error checking needs to be done. 545 @^system dependencies@> 546 547 @p procedure open_input; {prepare to read |web_file| and |change_file|} 548 begin reset(web_file); reset(change_file); 549 end; 550 551 @ The main output goes to |Pascal_file|, and string pool constants are 552 written to the |pool| file. 553 554 @<Globals...@>= 555 @!Pascal_file: text_file; 556 @!pool: text_file; 557 558 @ The following code opens |Pascal_file| and |pool|. 559 Since these files were listed in the program header, we assume that the 560 \PASCAL\ runtime system has checked that suitable external file names have 561 been given. 562 @^system dependencies@> 563 564 @<Set init...@>= 565 rewrite(Pascal_file); rewrite(pool); 566 567 @ Input goes into an array called |buffer|. 568 569 @<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code; 570 571 @ The |input_ln| procedure brings the next line of input from the specified 572 file into the |buffer| array and returns the value |true|, unless the file has 573 already been entirely read, in which case it returns |false|. The conventions 574 of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line 575 of the file are input into |buffer[0]|, |buffer[1]|, \dots, 576 |buffer[limit-1]|; trailing blanks are ignored; 577 and the global variable |limit| is set to the length of the 578 @^system dependencies@> 579 line. The value of |limit| must be strictly less than |buf_size|. 580 581 We assume that none of the |ASCII_code| values 582 of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|, 583 or |carriage_return|. 584 585 @p function input_ln(var f:text_file):boolean; 586 {inputs a line or returns |false|} 587 var final_limit:0..buf_size; {|limit| without trailing blanks} 588 begin limit:=0; final_limit:=0; 589 if eof(f) then input_ln:=false 590 else begin while not eoln(f) do 591 begin buffer[limit]:=xord[f^]; get(f); 592 incr(limit); 593 if buffer[limit-1]<>" " then final_limit:=limit; 594 if limit=buf_size then 595 begin while not eoln(f) do get(f); 596 decr(limit); {keep |buffer[buf_size]| empty} 597 if final_limit>limit then final_limit:=limit; 598 print_nl('! Input line too long'); loc:=0; error; 599 @.Input line too long@> 600 end; 601 end; 602 read_ln(f); limit:=final_limit; input_ln:=true; 603 end; 604 end; 605 606 @* Reporting errors to the user. 607 The \.{TANGLE} processor operates in two phases: first it inputs the source 608 file and stores a compressed representation of the program, then it produces 609 the \PASCAL\ output from the compressed representation. 610 611 The global variable |phase_one| tells whether we are in Phase I or not. 612 613 @<Globals...@>= 614 @!phase_one: boolean; {|true| in Phase I, |false| in Phase II} 615 616 @ If an error is detected while we are debugging, 617 we usually want to look at the contents of memory. 618 A special procedure will be declared later for this purpose. 619 620 @<Error handling...@>= 621 @!debug @+ procedure debug_help; forward;@+ gubed 622 623 @ During the first phase, syntax errors are reported to the user by saying 624 $$\hbox{`|err_print('! Error message')|'},$$ 625 followed by `|jump_out|' if no recovery from the error is provided. 626 This will print the error message followed by an indication of where the error 627 was spotted in the source file. Note that no period follows the error message, 628 since the error routine will automatically supply a period. 629 630 Errors that are noticed during the second phase are reported to the user 631 in the same fashion, but the error message will be 632 followed by an indication of where the error was spotted in the output file. 633 634 The actual error indications are provided by a procedure called |error|. 635 636 @d err_print(#)==begin new_line; print(#); error; 637 end 638 639 @<Error handling...@>= 640 procedure error; {prints '\..' and location of error message} 641 var j: 0..out_buf_size; {index into |out_buf|} 642 @!k,@!l: 0..buf_size; {indices into |buffer|} 643 begin if phase_one then @<Print error location based on input buffer@> 644 else @<Print error location based on output buffer@>; 645 update_terminal; mark_error; 646 @!debug debug_skipped:=debug_cycle; debug_help;@+gubed 647 end; 648 649 @ The error locations during Phase I can be indicated by using the global 650 variables |loc|, |line|, and |changing|, which tell respectively the first 651 unlooked-at position in |buffer|, the current line number, and whether or not 652 the current line is from |change_file| or |web_file|. 653 This routine should be modified on systems whose standard text editor 654 has special line-numbering conventions. 655 @^system dependencies@> 656 657 @<Print error location based on input buffer@>= 658 begin if changing then print('. (change file ')@+else print('. ('); 659 print_ln('l.', line:1, ')'); 660 if loc>=limit then l:=limit else l:=loc; 661 for k:=1 to l do 662 if buffer[k-1]=tab_mark then print(' ') 663 else print(xchr[buffer[k-1]]); {print the characters already read} 664 new_line; 665 for k:=1 to l do print(' '); {space out the next line} 666 for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read} 667 print(' '); {this space separates the message from future asterisks} 668 end 669 670 @ The position of errors detected during the second phase can be indicated 671 by outputting the partially-filled output buffer, which contains |out_ptr| 672 entries. 673 674 @<Print error location based on output...@>= 675 begin print_ln('. (l.',line:1,')'); 676 for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line} 677 print('... '); {indicate that this information is partial} 678 end 679 680 @ The |jump_out| procedure just cuts across all active procedure levels 681 and jumps out of the program. This is the only non-local |goto| statement 682 in \.{TANGLE}. It is used when no recovery from a particular error has 683 been provided. 684 685 Some \PASCAL\ compilers do not implement non-local |goto| statements. 686 @^system dependencies@> 687 In such cases the code that appears at label |end_of_TANGLE| should be 688 copied into the |jump_out| procedure, followed by a call to a system procedure 689 that terminates the program. 690 691 @d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out; 692 end 693 694 @<Error handling...@>= 695 procedure jump_out; 696 begin goto end_of_TANGLE; 697 end; 698 699 @ Sometimes the program's behavior is far different from what it should be, 700 and \.{TANGLE} prints an error message that is really for the \.{TANGLE} 701 maintenance person, not the user. In such cases the program says 702 |confusion('indication of where we are')|. 703 704 @d confusion(#)==fatal_error('! This can''t happen (',#,')') 705 @.This can't happen@> 706 707 @ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough. 708 709 @d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded') 710 @.Sorry, x capacity exceeded@> 711 712 713 @* Data structures. 714 Most of the user's \PASCAL\ code is packed into eight-bit integers 715 in two large arrays called |byte_mem| and |tok_mem|. 716 The |byte_mem| array holds the names of identifiers, strings, and modules; 717 the |tok_mem| array holds the replacement texts 718 for macros and modules. Allocation is sequential, since things are deleted only 719 during Phase II, and only in a last-in-first-out manner. 720 721 Auxiliary arrays |byte_start| and |tok_start| are used as directories to 722 |byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link| 723 arrays give further information about names. These auxiliary arrays 724 consist of sixteen-bit items. 725 726 @<Types...@>= 727 @!eight_bits=0..255; {unsigned one-byte quantity} 728 @!sixteen_bits=0..65535; {unsigned two-byte quantity} 729 730 @ \.{TANGLE} has been designed to avoid the need for indices that are more 731 than sixteen bits wide, so that it can be used on most computers. But 732 there are programs that need more than 65536 tokens, and some programs 733 even need more than 65536 bytes; \TeX\ is one of these. To get around 734 this problem, a slight complication has been added to the data structures: 735 |byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is 736 either 0 or 1 or 2. (For generality, the first index is actually allowed to run 737 between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|, 738 where |ww| and |zz| are set to 2 and~3; the program will work for any 739 positive values of |ww| and |zz|, and it can be simplified in obvious ways 740 if |ww=1| or |zz=1|.) 741 742 @d ww=2 {we multiply the byte capacity by approximately this amount} 743 @d zz=3 {we multiply the token capacity by approximately this amount} 744 745 @<Globals...@>= 746 @!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code; 747 {characters of names} 748 @!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens} 749 @!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|} 750 @!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|} 751 @!link: array [0..max_names] of sixteen_bits; {hash table or tree links} 752 @!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links} 753 @!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names} 754 @!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts} 755 756 @ The names of identifiers are found by computing a hash address |h| and 757 then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|, 758 |link[link[hash[h]]]|, \dots, until either finding the desired name 759 or encountering a zero. 760 761 A `|name_pointer|' variable, which signifies a name, is an index into 762 |byte_start|. The actual sequence of characters in the name pointed to by 763 |p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive, 764 in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when 765 |ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]| 766 and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|. 767 The pointer 0 is used for undefined module names; we don't 768 want to use it for the names of identifiers, since 0 stands for a null 769 pointer in a linked list. 770 771 Strings are treated like identifiers; the first character (a double-quote) 772 distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes 773 strings behave like numeric macros. (A `string' here refers to the 774 strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\ 775 string constants delimited by single-quote marks are not given such special 776 treatment; they simply appear as sequences of characters in the \PASCAL\ 777 texts.) The total number of strings in the string 778 pool is called |string_ptr|, and the total number of names in |byte_mem| 779 is called |name_ptr|. The total number of bytes occupied in 780 |byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|. 781 782 We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]| 783 for |0<=w<ww|, since these are the starting positions for the next |ww| 784 names to be stored in |byte_mem|. 785 786 @d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name} 787 788 @<Types...@>= 789 @!name_pointer=0..max_names; {identifies a name} 790 791 @ @<Global...@>= 792 @!name_ptr:name_pointer; {first unused position in |byte_start|} 793 @!string_ptr:name_pointer; {next number to be given to a string of length |<>1|} 794 @!byte_ptr:array [0..ww-1] of 0..max_bytes; 795 {first unused position in |byte_mem|} 796 @!pool_check_sum:integer; {sort of a hash for the whole string pool} 797 798 @ @<Local variables for init...@>= 799 @!wi: 0..ww-1; {to initialize the |byte_mem| indices} 800 801 @ @<Set init...@>= 802 for wi:=0 to ww-1 do 803 begin byte_start[wi]:=0; byte_ptr[wi]:=0; 804 end; 805 byte_start[ww]:=0; {this makes name 0 of length zero} 806 name_ptr:=1; string_ptr:=256; pool_check_sum:=271828; 807 808 @ Replacement texts are stored in |tok_mem|, using similar conventions. 809 A `|text_pointer|' variable is an index into |tok_start|, and the 810 replacement text that corresponds to |p| runs from positions 811 |tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of 812 |tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the 813 even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the 814 odd-numbered ones appear in |tok_mem[1,@t$*$@>]|. Furthermore, 815 |text_link[p]| is used to connect pieces of text that have the same name, 816 as we shall see later. The pointer 0 is used for undefined replacement 817 texts. 818 819 The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by 820 replacement text is called |tok_ptr[z]|, and the first unused location of 821 |tok_start| is called |text_ptr|. We usually have the identity 822 |tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since 823 these are the starting positions for the next |zz| replacement texts to 824 be stored in |tok_mem|. 825 826 @<Types...@>= 827 @!text_pointer=0..max_texts; {identifies a replacement text} 828 829 @ It is convenient to maintain a variable |z| that is equal to |text_ptr 830 mod zz|, so that we always insert tokens into segment |z| of |tok_mem|. 831 832 @<Glob...@>= 833 @t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|} 834 @t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks; 835 {first unused position in a given segment of |tok_mem|} 836 @t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|} 837 stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks; 838 {largest values assumed by |tok_ptr|} 839 tats 840 841 @ @<Local variables for init...@>= 842 @!zi:0..zz-1; {to initialize the |tok_mem| indices} 843 844 @ @<Set init...@>= 845 for zi:=0 to zz-1 do 846 begin tok_start[zi]:=0; tok_ptr[zi]:=0; 847 end; 848 tok_start[zz]:=0; {this makes replacement text 0 of length zero} 849 text_ptr:=1; z:=1 mod zz; 850 851 @ Four types of identifiers are distinguished by their |ilk|: 852 853 \yskip\hang |normal| identifiers will appear in the \PASCAL\ program as 854 ordinary identifiers since they have not been defined to be macros; the 855 corresponding value in the |equiv| array 856 for such identifiers is a link in a secondary hash table that 857 is used to check whether any two of them agree in their first |unambig_length| 858 characters after underline symbols are removed and lowercase letters are 859 changed to uppercase. 860 861 \yskip\hang |numeric| identifiers have been defined to be numeric macros; 862 their |equiv| value contains the corresponding numeric value plus $2^{15}$. 863 Strings are treated as numeric macros. 864 865 \yskip\hang |simple| identifiers have been defined to be simple macros; 866 their |equiv| value points to the corresponding replacement text. 867 868 \yskip\hang |parametric| identifiers have been defined to be parametric macros; 869 like simple identifiers, their |equiv| value points to the replacement text. 870 871 @d normal=0 {ordinary identifiers have |normal| ilk} 872 @d numeric=1 {numeric macros and strings have |numeric| ilk} 873 @d simple=2 {simple macros have |simple| ilk} 874 @d parametric=3 {parametric macros have |parametric| ilk} 875 876 @ The names of modules are stored in |byte_mem| together 877 with the identifier names, but a hash table is not used for them because 878 \.{TANGLE} needs to be able to recognize a module name when given a prefix of 879 that name. A conventional binary search tree is used to retrieve module names, 880 with fields called |llink| and |rlink| in place of |link| and |ilk|. The 881 root of this tree is |rlink[0]|. If |p| is a pointer to a module name, 882 |equiv[p]| points to its replacement text, just as in simple and parametric 883 macros, unless this replacement text has not yet been defined (in which case 884 |equiv[p]=0|). 885 886 @d llink==link {left link in binary search tree for module names} 887 @d rlink==ilk {right link in binary search tree for module names} 888 889 @<Set init...@>= 890 rlink[0]:=0; {the binary search tree starts out with nothing in it} 891 equiv[0]:=0; {the undefined module has no replacement text} 892 893 @ Here is a little procedure that prints the text of a given name. 894 895 @p procedure print_id(@!p:name_pointer); {print identifier or module name} 896 var k:0..max_bytes; {index into |byte_mem|} 897 @!w:0..ww-1; {segment of |byte_mem|} 898 begin if p>=name_ptr then print('IMPOSSIBLE') 899 else begin w:=p mod ww; 900 for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]); 901 end; 902 end; 903 904 @* Searching for identifiers. 905 The hash table described above is updated by the |id_lookup| procedure, 906 which finds a given identifier and returns a pointer to its index in 907 |byte_start|. If the identifier was not already present, it is inserted with 908 a given |ilk| code; and an error message is printed if the identifier is being 909 doubly defined. 910 911 Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient 912 to let |id_lookup| search for an identifier that is present in the |buffer| 913 array. Two other global variables specify its position in the buffer: the 914 first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|. 915 Furthermore, if the identifier is really a string, the global variable 916 |double_chars| tells how many of the characters in the buffer appear 917 twice (namely \.{@@@@} and \.{""}), since this additional information makes 918 it easy to calculate the true length of the string. The final double-quote 919 of the string is not included in its ``identifier,'' but the first one is, 920 so the string length is |id_loc-id_first-double_chars-1|. 921 922 We have mentioned that |normal| identifiers belong to two hash tables, 923 one for their true names as they appear in the \.{WEB} file and the other 924 when they have been reduced to their first |unambig_length| characters. 925 The hash tables are kept by the method of simple chaining, where the 926 heads of the individual lists appear in the |hash| and |chop_hash| arrays. 927 If |h| is a hash code, the primary hash table list starts at |hash[h]| and 928 proceeds through |link| pointers; the secondary hash table list starts at 929 |chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same 930 identifier will probably have two different values of |h|. 931 932 The |id_lookup| procedure uses an auxiliary array called |chopped_id| to 933 contain up to |unambig_length| characters of the current identifier, if 934 it is necessary to compute the secondary hash code. (This array could be 935 declared local to |id_lookup|, but in general we are making all array 936 declarations global in this program, because some compilers and some machine 937 architectures make dynamic array allocation inefficient.) 938 939 @<Glob...@>= 940 @!id_first:0..buf_size; {where the current identifier begins in the buffer} 941 @!id_loc:0..buf_size; {just after the current identifier in the buffer} 942 @!double_chars:0..buf_size; {correction to length in case of strings} 943 @# 944 @!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists} 945 @!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier} 946 947 @ Initially all the hash lists are empty. 948 949 @<Local variables for init...@>= 950 @!h:0..hash_size; {index into hash-head arrays} 951 952 @ @<Set init...@>= 953 for h:=0 to hash_size-1 do 954 begin hash[h]:=0; chop_hash[h]:=0; 955 end; 956 957 @ Here now is the main procedure for finding identifiers (and strings). 958 The parameter |t| is set to |normal| except when the identifier is 959 a macro name that is just being defined; in the latter case, |t| will be 960 |numeric|, |simple|, or |parametric|. 961 962 @p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier} 963 label found, not_found; 964 var c:eight_bits; {byte being chopped} 965 @!i:0..buf_size; {index into |buffer|} 966 @!h:0..hash_size; {hash code} 967 @!k:0..max_bytes; {index into |byte_mem|} 968 @!w:0..ww-1; {segment of |byte_mem|} 969 @!l:0..buf_size; {length of the given identifier} 970 @!p,@!q:name_pointer; {where the identifier is being sought} 971 @!s:0..unambig_length; {index into |chopped_id|} 972 begin l:=id_loc-id_first; {compute the length} 973 @<Compute the hash code |h|@>; 974 @<Compute the name location |p|@>; 975 if (p=name_ptr)or(t<>normal) then 976 @<Update the tables and check for possible errors@>; 977 id_lookup:=p; 978 end; 979 980 @ A simple hash code is used: If the sequence of 981 ASCII codes is $c_1c_2\ldots c_n$, its hash value will be 982 $$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$ 983 984 @<Compute the hash...@>= 985 h:=buffer[id_first]; i:=id_first+1; 986 while i<id_loc do 987 begin h:=(h+h+buffer[i]) mod hash_size; incr(i); 988 end 989 990 @ If the identifier is new, it will be placed in position |p=name_ptr|, 991 otherwise |p| will point to its existing location. 992 993 @<Compute the name location...@>= 994 p:=hash[h]; 995 while p<>0 do 996 begin if length(p)=l then 997 @<Compare name |p| with current identifier, |goto found| if equal@>; 998 p:=link[p]; 999 end; 1000 p:=name_ptr; {the current identifier is new} 1001 link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list} 1002 found: 1003 1004 @ @<Compare name |p|...@>= 1005 begin i:=id_first; k:=byte_start[p]; w:=p mod ww; 1006 while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do 1007 begin incr(i); incr(k); 1008 end; 1009 if i=id_loc then goto found; {all characters agree} 1010 end 1011 1012 @ @<Update the tables...@>= 1013 begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or 1014 ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then 1015 @<Compute the secondary hash code |h| and put the first characters 1016 into the auxiliary array |chopped_id|@>; 1017 if p<>name_ptr then 1018 @<Give double-definition error, if necessary, and change |p| to type |t|@> 1019 else @<Enter a new identifier into the table at position |p|@>; 1020 end 1021 1022 @ The following routine, which is called into play when it is necessary to 1023 look at the secondary hash table, computes the same hash function as before 1024 (but on the chopped data), and places a zero after the chopped identifier 1025 in |chopped_id| to serve as a convenient sentinel. 1026 1027 @<Compute the secondary...@>= 1028 begin i:=id_first; s:=0; h:=0; 1029 while (i<id_loc)and(s<unambig_length) do 1030 begin if buffer[i]<>"_" then 1031 begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40 1032 else chopped_id[s]:=buffer[i]; 1033 h:=(h+h+chopped_id[s]) mod hash_size; incr(s); 1034 end; 1035 incr(i); 1036 end; 1037 chopped_id[s]:=0; 1038 end 1039 1040 @ If a nonnumeric macro has appeared before it was defined, \.{TANGLE} 1041 will still work all right; after all, such behavior is typical of the 1042 replacement texts for modules, which act very much like macros. 1043 However, an undefined numeric macro may not be used on the right-hand 1044 side of another numeric macro definition, so \.{TANGLE} finds it 1045 simplest to make a blanket rule that numeric macros should be defined 1046 before they are used. The following routine gives an error message and 1047 also fixes up any damage that may have been caused. 1048 1049 @<Give double...@>= {now |p<>name_ptr| and |t<>normal|} 1050 begin if ilk[p]=normal then 1051 begin if t=numeric then err_print('! This identifier has already appeared'); 1052 @.This identifier has already...@> 1053 @<Remove |p| from secondary hash table@>; 1054 end 1055 else err_print('! This identifier was defined before'); 1056 @.This identifier was defined...@> 1057 ilk[p]:=t; 1058 end 1059 1060 @ When we have to remove a secondary hash entry, because a |normal| identifier 1061 is changing to another |ilk|, the hash code |h| and chopped identifier have 1062 already been computed. 1063 1064 @<Remove |p| from secondary...@>= 1065 q:=chop_hash[h]; 1066 if q=p then chop_hash[h]:=equiv[p] 1067 else begin while equiv[q]<>p do q:=equiv[q]; 1068 equiv[q]:=equiv[p]; 1069 end 1070 1071 @ The following routine could make good use of a generalized |pack| procedure 1072 that puts items into just part of a packed array instead of the whole thing. 1073 1074 @<Enter a new identifier...@>= 1075 begin if (t=normal)and(buffer[id_first]<>"""") then 1076 @<Check for ambiguity and update secondary hash@>; 1077 w:=name_ptr mod ww; k:=byte_ptr[w]; 1078 if k+l>max_bytes then overflow('byte memory'); 1079 if name_ptr>max_names-ww then overflow('name'); 1080 i:=id_first; {get ready to move the identifier into |byte_mem|} 1081 while i<id_loc do 1082 begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i); 1083 end; 1084 byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr); 1085 if buffer[id_first]<>"""" then ilk[p]:=t 1086 else @<Define and output a new string of the pool@>; 1087 end 1088 1089 @ @<Check for ambig...@>= 1090 begin q:=chop_hash[h]; 1091 while q<>0 do 1092 begin @<Check if |q| conflicts with |p|@>; 1093 q:=equiv[q]; 1094 end; 1095 equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list} 1096 end 1097 1098 @ @<Check if |q| conflicts...@>= 1099 begin k:=byte_start[q]; s:=0; w:=q mod ww; 1100 while (k<byte_start[q+ww]) and (s<unambig_length) do 1101 begin c:=byte_mem[w,k]; 1102 if c<>"_" then 1103 begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase} 1104 if chopped_id[s]<>c then goto not_found; 1105 incr(s); 1106 end; 1107 incr(k); 1108 end; 1109 if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found; 1110 print_nl('! Identifier conflict with '); 1111 @.Identifier conflict...@> 1112 for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]); 1113 error; q:=0; {only one conflict will be printed, since |equiv[0]=0|} 1114 not_found: 1115 end 1116 1117 @ We compute the string pool check sum by working modulo a prime number 1118 that is large but not so large that overflow might occur. 1119 1120 @d check_sum_prime==@'3777777667 {$2^{29}-73$} 1121 @^preprocessed strings@> 1122 1123 @<Define and output a new string...@>= 1124 begin ilk[p]:=numeric; {strings are like numeric macros} 1125 if l-double_chars=2 then {this string is for a single character} 1126 equiv[p]:=buffer[id_first+1]+@'100000 1127 else begin equiv[p]:=string_ptr+@'100000; 1128 l:=l-double_chars-1; 1129 if l>99 then err_print('! Preprocessed string is too long'); 1130 @.Preprocessed string is too long@> 1131 incr(string_ptr); 1132 write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length} 1133 pool_check_sum:=pool_check_sum+pool_check_sum+l; 1134 while pool_check_sum>check_sum_prime do 1135 pool_check_sum:=pool_check_sum-check_sum_prime; 1136 i:=id_first+1; 1137 while i<id_loc do 1138 begin write(pool,xchr[buffer[i]]); {output characters of string} 1139 pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i]; 1140 while pool_check_sum>check_sum_prime do 1141 pool_check_sum:=pool_check_sum-check_sum_prime; 1142 if (buffer[i]="""") or (buffer[i]="@@") then 1143 i:=i+2 {omit second appearance of doubled character} 1144 else incr(i); 1145 end; 1146 write_ln(pool); 1147 end; 1148 end 1149 1150 @* Searching for module names. 1151 The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the 1152 search tree, after inserting it if necessary, and returns a pointer to 1153 where it was found. 1154 1155 @<Glob...@>= 1156 @!mod_text:array [0..longest_name] of ASCII_code; {name being sought for} 1157 1158 @ According to the rules of \.{WEB}, no module name 1159 should be a proper prefix of another, so a ``clean'' comparison should 1160 occur between any two names. The result of |mod_lookup| is 0 if this 1161 prefix condition is violated. An error message is printed when such violations 1162 are detected during phase two of \.{WEAVE}. 1163 1164 @d less=0 {the first name is lexicographically less than the second} 1165 @d equal=1 {the first name is equal to the second} 1166 @d greater=2 {the first name is lexicographically greater than the second} 1167 @d prefix=3 {the first name is a proper prefix of the second} 1168 @d extension=4 {the first name is a proper extension of the second} 1169 1170 @p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name} 1171 label found; 1172 var c:less..extension; {comparison between two names} 1173 @!j:0..longest_name; {index into |mod_text|} 1174 @!k:0..max_bytes; {index into |byte_mem|} 1175 @!w:0..ww-1; {segment of |byte_mem|} 1176 @!p:name_pointer; {current node of the search tree} 1177 @!q:name_pointer; {father of node |p|} 1178 begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree} 1179 while p<>0 do 1180 begin @<Set \(|c| to the result of comparing the given name to 1181 name |p|@>; 1182 q:=p; 1183 if c=less then p:=llink[q] 1184 else if c=greater then p:=rlink[q] 1185 else goto found; 1186 end; 1187 @<Enter a new module name into the tree@>; 1188 found: if c<>equal then 1189 begin err_print('! Incompatible section names'); p:=0; 1190 @.Incompatible module names@> 1191 end; 1192 mod_lookup:=p; 1193 end; 1194 1195 @ @<Enter a new module name...@>= 1196 w:=name_ptr mod ww; k:=byte_ptr[w]; 1197 if k+l>max_bytes then overflow('byte memory'); 1198 if name_ptr>max_names-ww then overflow('name'); 1199 p:=name_ptr; 1200 if c=less then llink[q]:=p else rlink[q]:=p; 1201 llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0; 1202 for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j]; 1203 byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr); 1204 1205 @ @<Set \(|c|...@>= 1206 begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1; 1207 while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do 1208 begin incr(k); incr(j); 1209 end; 1210 if k=byte_start[p+ww] then 1211 if j>l then c:=equal 1212 else c:=extension 1213 else if j>l then c:=prefix 1214 else if mod_text[j]<byte_mem[w,k] then c:=less 1215 else c:=greater; 1216 end 1217 1218 @ The |prefix_lookup| procedure is supposed to find exactly one module 1219 name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently 1220 accepts also the situation that some module name is a prefix of 1221 |mod_text[1..l]|, because the user who painstakingly typed in more than 1222 necessary probably doesn't want to be told about the wasted effort. 1223 1224 @p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension} 1225 var c:less..extension; {comparison between two names} 1226 @!count:0..max_names; {the number of hits} 1227 @!j:0..longest_name; {index into |mod_text|} 1228 @!k:0..max_bytes; {index into |byte_mem|} 1229 @!w:0..ww-1; {segment of |byte_mem|} 1230 @!p:name_pointer; {current node of the search tree} 1231 @!q:name_pointer; {another place to resume the search after one branch is done} 1232 @!r:name_pointer; {extension found} 1233 begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree} 1234 while p<>0 do 1235 begin @<Set \(|c|...@>; 1236 if c=less then p:=llink[p] 1237 else if c=greater then p:=rlink[p] 1238 else begin r:=p; incr(count); q:=rlink[p]; p:=llink[p]; 1239 end; 1240 if p=0 then 1241 begin p:=q; q:=0; 1242 end; 1243 end; 1244 if count<>1 then 1245 if count=0 then err_print('! Name does not match') 1246 @.Name does not match@> 1247 else err_print('! Ambiguous prefix'); 1248 @.Ambiguous prefix@> 1249 prefix_lookup:=r; {the result will be 0 if there was no match} 1250 end; 1251 1252 @* Tokens. 1253 Replacement texts, which represent \PASCAL\ code in a compressed format, 1254 appear in |tok_mem| as mentioned above. The codes in 1255 these texts are called `tokens'; some tokens occupy two consecutive 1256 eight-bit byte positions, and the others take just one byte. 1257 1258 If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position 1259 of the first eight-bit code of that text. If |text_link[p]=0|, 1260 this is the replacement text for a macro, otherwise it is the replacement 1261 text for a module. In the latter case |text_link[p]| is either equal to 1262 |module_flag|, which means that there is no further text for this module, or 1263 |text_link[p]| points to a 1264 continuation of this replacement text; such links are created when 1265 several modules have \PASCAL\ texts with the same name, and they also 1266 tie together all the \PASCAL\ texts of unnamed modules. 1267 The replacement text pointer for the first unnamed module 1268 appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|. 1269 1270 @d module_flag==max_texts {final |text_link| in module replacement texts} 1271 1272 @<Glob...@>= 1273 @!last_unnamed:text_pointer; {most recent replacement text of unnamed module} 1274 1275 @ @<Set init...@>= last_unnamed:=0; text_link[0]:=0; 1276 1277 @ If the first byte of a token is less than @'200, the token occupies a 1278 single byte. Otherwise we make a sixteen-bit token by combining two consecutive 1279 bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points 1280 to an identifier; if |@'250<=a<@'320|, then 1281 $(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if 1282 |@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module 1283 in which the current replacement text appears. 1284 1285 Codes less than @'200 are 7-bit ASCII codes that represent themselves. 1286 In particular, a single-character identifier like `|x|' will be a one-byte 1287 token, while all longer identifiers will occupy two bytes. 1288 1289 Some of the 7-bit ASCII codes will not be present, however, so we can 1290 use them for special purposes. The following symbolic names are used: 1291 1292 \yskip\hang |param| denotes insertion of a parameter. This occurs only in 1293 the replacement texts of parametric macros, outside of single-quoted strings 1294 in those texts. 1295 1296 \hang |begin_comment| denotes \.{@@\{}, which will become either 1297 \.{\{} or \.{[}. 1298 1299 \hang |end_comment| denotes \.{@@\}}, which will become either 1300 \.{\}} or \.{]}. 1301 1302 \hang |octal| denotes the \.{@@\'} that precedes an octal constant. 1303 1304 \hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant. 1305 1306 \hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool 1307 check sum. 1308 1309 \hang |join| denotes the concatenation of adjacent items with no 1310 space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}). 1311 1312 \hang |double_dot| denotes `\.{..}' in \PASCAL. 1313 1314 \hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string. 1315 The \.{@@>} at the end of such a string is also denoted by |verbatim|. 1316 1317 \hang |force_line| denotes the \.{@@\\} that forces a new line in the 1318 \PASCAL\ output. 1319 @^ASCII code@> 1320 1321 @d param=0 {ASCII null code will not appear} 1322 @d verbatim=@'2 {extended ASCII alpha should not appear} 1323 @d force_line=@'3 {extended ASCII beta should not appear} 1324 @d begin_comment=@'11 {ASCII tab mark will not appear} 1325 @d end_comment=@'12 {ASCII line feed will not appear} 1326 @d octal=@'14 {ASCII form feed will not appear} 1327 @d hex=@'15 {ASCII carriage return will not appear} 1328 @d double_dot=@'40 {ASCII space will not appear except in strings} 1329 @d check_sum=@'175 {will not be confused with right brace} 1330 @d join=@'177 {ASCII delete will not appear} 1331 1332 @ The following procedure is used to enter a two-byte value into 1333 |tok_mem| when a replacement text is being generated. 1334 1335 @p procedure store_two_bytes(@!x:sixteen_bits); 1336 {stores high byte, then low byte} 1337 begin if tok_ptr[z]+2>max_toks then overflow('token'); 1338 tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command} 1339 tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and} 1340 tok_ptr[z]:=tok_ptr[z]+2; 1341 end; 1342 1343 @ When \.{TANGLE} is being operated in debug mode, it has a procedure to display 1344 a replacement text in symbolic form. This procedure has not been spruced up to 1345 generate a real great format, but at least the results are not as bad as 1346 a memory dump. 1347 1348 @p @!debug procedure print_repl(@!p:text_pointer); 1349 var k:0..max_toks; {index into |tok_mem|} 1350 @!a: sixteen_bits; {current byte(s)} 1351 @!zp: 0..zz-1; {segment of |tok_mem| being accessed} 1352 begin if p>=text_ptr then print('BAD') 1353 else begin k:=tok_start[p]; zp:=p mod zz; 1354 while k<tok_start[p+zz] do 1355 begin a:=tok_mem[zp,k]; 1356 if a>=@'200 then @<Display two-byte token starting with |a|@> 1357 else @<Display one-byte token |a|@>; 1358 incr(k); 1359 end; 1360 end; 1361 end; 1362 gubed 1363 1364 @ @<Display two-byte...@>= 1365 begin incr(k); 1366 if a<@'250 then {identifier or string} 1367 begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a); 1368 if byte_mem[a mod ww,byte_start[a]]="""" then print('"') 1369 else print(' '); 1370 end 1371 else if a<@'320 then {module name} 1372 begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]); 1373 print('@@>'); 1374 end 1375 else begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number} 1376 print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace 1377 between \&{debug} and \&{gubed}} 1378 end; 1379 end 1380 1381 @ @<Display one-byte...@>= 1382 case a of 1383 begin_comment: print('@@',xchr["{"]); 1384 end_comment: print('@@',xchr["}"]); {can't use right brace 1385 between \&{debug} and \&{gubed}} 1386 octal: print('@@'''); 1387 hex: print('@@"'); 1388 check_sum: print('@@$'); 1389 param: print('#'); 1390 "@@": print('@@@@'); 1391 verbatim: print('@@='); 1392 force_line: print('@@\'); 1393 othercases print(xchr[a]) 1394 endcases 1395 1396 @* Stacks for output. 1397 Let's make sure that our data structures contain enough information to 1398 produce the entire \PASCAL\ program as desired, by working next on the 1399 algorithms that actually do produce that program. 1400 1401 @ The output process uses a stack to keep track of what is going on at 1402 different ``levels'' as the macros are being expanded. 1403 Entries on this stack have five parts: 1404 1405 \yskip\hang |end_field| is the |tok_mem| location where the replacement 1406 text of a particular level will end; 1407 1408 \hang |byte_field| is the |tok_mem| location from which the next token 1409 on a particular level will be read; 1410 1411 \hang |name_field| points to the name corresponding to a particular level; 1412 1413 \hang |repl_field| points to the replacement text currently being read 1414 at a particular level; 1415 1416 \hang |mod_field| is the module number, or zero if this is a macro. 1417 1418 \yskip\noindent The current values of these five quantities are referred to 1419 quite frequently, so they are stored in a separate place instead of in 1420 the |stack| array. We call the current values |cur_end|, |cur_byte|, 1421 |cur_name|, |cur_repl|, and |cur_mod|. 1422 1423 The global variable |stack_ptr| tells how many levels of output are 1424 currently in progress. The end of all output occurs when the stack is 1425 empty, i.e., when |stack_ptr=0|. 1426 1427 @<Types...@>= 1428 @t\4@>@!output_state=record 1429 @!end_field: sixteen_bits; {ending location of replacement text} 1430 @!byte_field: sixteen_bits; {present location within replacement text} 1431 @!name_field: name_pointer; {|byte_start| index for text being output} 1432 @!repl_field: text_pointer; {|tok_start| index for text being output} 1433 @!mod_field: 0..@'27777; {module number or zero if not a module} 1434 end; 1435 1436 @ @d cur_end==cur_state.end_field {current ending location in |tok_mem|} 1437 @d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|} 1438 @d cur_name==cur_state.name_field {pointer to current name being expanded} 1439 @d cur_repl==cur_state.repl_field {pointer to current replacement text} 1440 @d cur_mod==cur_state.mod_field {current module number being expanded} 1441 1442 @<Globals...@>= 1443 @!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|, 1444 |cur_repl|, |cur_mod|} 1445 @!stack : array [1..stack_size] of output_state; {info for non-current levels} 1446 @!stack_ptr: 0..stack_size; {first unused location in the output state stack} 1447 1448 @ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|. 1449 1450 @<Glob...@>= 1451 @!zo:0..zz-1; {the segment of |tok_mem| from which output is coming} 1452 1453 @ Parameters must also be stacked. They are placed in 1454 |tok_mem| just above the other replacement texts, and dummy parameter 1455 `names' are placed in |byte_start| just after the other names. 1456 The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter 1457 stack pointers during the output phase, so there is no need for a separate 1458 data structure to handle this problem. 1459 1460 @ There is an implicit stack corresponding to meta-comments that are output 1461 via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail, 1462 because we only need to know whether it is empty or not. A global variable 1463 |brace_level| tells how many items would be on this stack if it were present. 1464 1465 @<Globals...@>= 1466 @!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting} 1467 1468 @ To get the output process started, we will perform the following 1469 initialization steps. We may assume that |text_link[0]| is nonzero, since it 1470 points to the \PASCAL\ text in the first unnamed module that generates 1471 code; if there are no such modules, there is nothing to output, and an 1472 error message will have been generated before we do any of the initialization. 1473 1474 @<Initialize the output stacks@>= 1475 stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0]; 1476 zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl]; 1477 cur_end:=tok_start[cur_repl+zz]; cur_mod:=0; 1478 1479 @ When the replacement text for name |p| is to be inserted into the output, 1480 the following subroutine is called to save the old level of output and get 1481 the new one going. 1482 1483 @p procedure push_level(@!p:name_pointer); {suspends the current level} 1484 begin if stack_ptr=stack_size then overflow('stack') 1485 else begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.} 1486 incr(stack_ptr); 1487 cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz; 1488 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1489 cur_mod:=0; 1490 end; 1491 end; 1492 1493 @ When we come to the end of a replacement text, the |pop_level| subroutine 1494 does the right thing: It either moves to the continuation of this replacement 1495 text or returns the state to the most recently stacked level. Part of this 1496 subroutine, which updates the parameter stack, will be given later when we 1497 study the parameter stack in more detail. 1498 1499 @p procedure pop_level; {do this when |cur_byte| reaches |cur_end|} 1500 label exit; 1501 begin if text_link[cur_repl]=0 then {end of macro expansion} 1502 begin if ilk[cur_name]=parametric then 1503 @<Remove a parameter from the parameter stack@>; 1504 end 1505 else if text_link[cur_repl]<module_flag then {link to a continuation} 1506 begin cur_repl:=text_link[cur_repl]; {we will stay on the same level} 1507 zo:=cur_repl mod zz; 1508 cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz]; 1509 return; 1510 end; 1511 decr(stack_ptr); {we will go down to the previous level} 1512 if stack_ptr>0 then 1513 begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz; 1514 end; 1515 exit: end; 1516 1517 @ The heart of the output procedure is the |get_output| routine, which produces 1518 the next token of output that is not a reference to a macro. This procedure 1519 handles all the stacking and unstacking that is necessary. It returns the 1520 value |number| if the next output has a numeric value (the value of a 1521 numeric macro or string), in which case |cur_val| has been set to the 1522 number in question. The procedure also returns the value |module_number| 1523 if the next output begins or ends the replacement text of some module, 1524 in which case |cur_val| is that module's number (if beginning) or the 1525 negative of that value (if ending). And it returns the value |identifier| 1526 if the next output is an identifier of length two or more, in which case 1527 |cur_val| points to that identifier name. 1528 1529 @d number=@'200 {code returned by |get_output| when next output is numeric} 1530 @d module_number=@'201 {code returned by |get_output| for module numbers} 1531 @d identifier=@'202 {code returned by |get_output| for identifiers} 1532 1533 @<Globals...@>= 1534 @!cur_val:integer; {additional information corresponding to output token} 1535 1536 @ If |get_output| finds that no more output remains, it returns the value zero. 1537 1538 @p function get_output:sixteen_bits; {returns next token after macro expansion} 1539 label restart, done, found; 1540 var a:sixteen_bits; {value of current byte} 1541 @!b:eight_bits; {byte being copied} 1542 @!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter} 1543 @!k:0..max_bytes; {index into |byte_mem|} 1544 @!w:0..ww-1; {segment of |byte_mem|} 1545 begin restart: if stack_ptr=0 then 1546 begin a:=0; goto found; 1547 end; 1548 if cur_byte=cur_end then 1549 begin cur_val:=-cur_mod; pop_level; 1550 if cur_val=0 then goto restart; 1551 a:=module_number; goto found; 1552 end; 1553 a:=tok_mem[zo,cur_byte]; incr(cur_byte); 1554 if a<@'200 then {one-byte token} 1555 if a=param then 1556 @<Start scanning current macro parameter, |goto restart|@> 1557 else goto found; 1558 a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte); 1559 if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|} 1560 @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>; 1561 if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|} 1562 @<Expand module |a-@'24000|, |goto restart|@>; 1563 cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val; 1564 found: 1565 @!debug if trouble_shooting then debug_help;@;@+gubed@/ 1566 get_output:=a; 1567 end; 1568 1569 @ The user may have forgotten to give any \PASCAL\ text for a module name, 1570 or the \PASCAL\ text may have been associated with a different name by mistake. 1571 1572 @<Expand module |a-...@>= 1573 begin a:=a-@'24000; 1574 if equiv[a]<>0 then push_level(a) 1575 else if a<>0 then 1576 begin print_nl('! Not present: <'); print_id(a); print('>'); error; 1577 @.Not present: <section name>@> 1578 end; 1579 goto restart; 1580 end 1581 1582 @ @<Expand macro ...@>= 1583 begin case ilk[a] of 1584 normal: begin cur_val:=a; a:=identifier; 1585 end; 1586 numeric: begin cur_val:=equiv[a]-@'100000; a:=number; 1587 end; 1588 simple: begin push_level(a); goto restart; 1589 end; 1590 parametric: begin @<Put a parameter on the parameter stack, 1591 or |goto restart| if error occurs@>; 1592 push_level(a); goto restart; 1593 end; 1594 othercases confusion('output') 1595 endcases;@/ 1596 goto found; 1597 end 1598 1599 @ We come now to the interesting part, the job of putting a parameter on 1600 the parameter stack. First we pop the stack if necessary until getting to 1601 a level that hasn't ended. Then the next character must be a `\.('; 1602 and since parentheses are balanced on each level, the entire parameter must 1603 be present, so we can copy it without difficulty. 1604 1605 @<Put a parameter...@>= 1606 while (cur_byte=cur_end)and(stack_ptr>0) do pop_level; 1607 if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then 1608 begin print_nl('! No parameter given for '); print_id(a); error; 1609 @.No parameter given for macro@> 1610 goto restart; 1611 end; 1612 @<Copy the parameter into |tok_mem|@>; 1613 equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww; 1614 k:=byte_ptr[w]; 1615 @!debug if k=max_bytes then overflow('byte memory'); 1616 byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k; 1617 gubed {this code has set the parameter identifier for debugging printouts} 1618 if name_ptr>max_names-ww then overflow('name'); 1619 byte_start[name_ptr+ww]:=k; incr(name_ptr); 1620 if text_ptr>max_texts-zz then overflow('text'); 1621 text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z]; 1622 incr(text_ptr); 1623 z:=text_ptr mod zz 1624 1625 @ The |pop_level| routine undoes the effect of parameter-pushing when 1626 a parameter macro is finished: 1627 1628 @<Remove a parameter...@>= 1629 begin decr(name_ptr); decr(text_ptr); 1630 z:=text_ptr mod zz; 1631 stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z]; 1632 tats {the maximum value of |tok_ptr| occurs just before parameter popping} 1633 tok_ptr[z]:=tok_start[text_ptr]; 1634 @!debug decr(byte_ptr[name_ptr mod ww]);@+gubed 1635 end 1636 1637 @ When a parameter occurs in a replacement text, we treat it as a simple 1638 macro in position (|name_ptr-1|): 1639 1640 @<Start scanning...@>= 1641 begin push_level(name_ptr-1); goto restart; 1642 end 1643 1644 @ Similarly, a |param| token encountered as we copy a parameter is converted 1645 into a simple macro call for |name_ptr-1|. 1646 Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the 1647 \.{\#} token will have been changed to |param| outside of strings, but we 1648 still must distinguish `real' parentheses from those in strings. 1649 1650 @d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token'); 1651 tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end 1652 1653 @<Copy the parameter...@>= 1654 bal:=1; incr(cur_byte); {skip the opening `\.('} 1655 loop@+ begin b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1656 if b=param then store_two_bytes(name_ptr+@'77777) 1657 else begin if b>=@'200 then 1658 begin app_repl(b); 1659 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1660 end 1661 else case b of 1662 "(": incr(bal); 1663 ")": begin decr(bal); 1664 if bal=0 then goto done; 1665 end; 1666 "'": repeat app_repl(b); 1667 b:=tok_mem[zo,cur_byte]; incr(cur_byte); 1668 until b="'"; {copy string, don't change |bal|} 1669 othercases do_nothing 1670 endcases; 1671 app_repl(b); 1672 end; 1673 end; 1674 done: 1675 1676 @* Producing the output. 1677 The |get_output| routine above handles most of the complexity of output 1678 generation, but there are two further considerations that have a nontrivial 1679 effect on \.{TANGLE}'s algorithms. 1680 1681 First, we want to make sure that the output is broken into lines not 1682 exceeding |line_length| characters per line, where these breaks occur at 1683 valid places (e.g., not in the middle of a string or a constant or an 1684 identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position 1685 where quantities are being joined together). Therefore we assemble the 1686 output into a buffer before deciding where the line breaks will appear. 1687 However, we make very little attempt to make ``logical'' line breaks that 1688 would enhance the readability of the output; people are supposed to read 1689 the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the 1690 tangled-up output. The only concession to readability is that a break after 1691 a semicolon will be made if possible, since commonly used ``pretty 1692 printing'' routines give better results in such cases. 1693 1694 Second, we want to decimalize non-decimal constants, and to combine integer 1695 quantities that are added or subtracted, because \PASCAL\ doesn't allow 1696 constant expressions in subrange types or in case labels. This means we 1697 want to have a procedure that treats a construction like \.{(E-15+17)} 1698 as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and 1699 `\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus 1700 `\.{-15+17..5}'. We shall not combine integers preceding or following 1701 \.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined 1702 to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}' 1703 can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to 1704 @^mod@> 1705 `\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation! 1706 1707 The following solution to these problems has been adopted: An array 1708 |out_buf| contains characters that have been generated but not yet output, 1709 and there are three pointers into this array. One of these, |out_ptr|, is 1710 the number of characters currently in the buffer, and we will have 1711 |1<=out_ptr<=line_length| most of the time. The second is |break_ptr|, 1712 which is the largest value |<=out_ptr| such that we are definitely entitled 1713 to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|; 1714 we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either 1715 zero or the largest known value of a legal break after a semicolon or comment 1716 on the current line; we will always have |semi_ptr<=break_ptr|. 1717 1718 @<Globals...@>= 1719 @!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters} 1720 @!out_ptr: 0..out_buf_size; {first available place in |out_buf|} 1721 @!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|} 1722 @!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|} 1723 1724 @ Besides having those three pointers, 1725 the output process is in one of several states: 1726 1727 \yskip\hang |num_or_id| means that the last item in the buffer is a number or 1728 identifier, hence a blank space or line break must be inserted if the next 1729 item is also a number or identifier. 1730 1731 \yskip\hang |unbreakable| means that the last item in the buffer was followed 1732 by the \.{@@\&} operation that inhibits spaces between it and the next item. 1733 1734 \yskip\hang |sign| means that the last item in the buffer is to be followed 1735 by \.+ or \.-, depending on whether |out_app| is positive or negative. 1736 1737 \yskip\hang |sign_val| means that the decimal equivalent of 1738 $\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|, 1739 or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus 1740 sign. Otherwise it should be preceded by the character |out_sign| unless 1741 |out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}. 1742 1743 \yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.- 1744 afterwards, depending on whether |out_app| is positive or negative. 1745 1746 \yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal 1747 equivalent of |out_app| including its sign, using |last_sign| in case 1748 |out_app=0|. 1749 1750 \yskip\hang |misc| means none of the above. 1751 1752 \yskip\noindent 1753 For example, the output buffer and output state run through the following 1754 sequence as we generate characters from `\.{(x-15+19-2)}': 1755 $$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad 1756 \hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr 1757 output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr 1758 \noalign{\vskip 3pt} 1759 (&\.(&|misc|\cr 1760 x&\.{(x}&|num_or_id|\cr 1761 -&\.{(x}&|sign|&&&$-1$&$-1$\cr 1762 15&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-1$\cr 1763 +&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr 1764 19&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr 1765 -&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr 1766 2&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-1$\cr 1767 )&\.{(x+2)}&|misc|\cr}}$$ 1768 At each stage we have put as much into the buffer as possible without 1769 knowing what is coming next. Examples like `\.{x-0.1}' indicate why 1770 |last_sign| is needed to associate the proper sign with an output of zero. 1771 1772 In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer 1773 lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we 1774 have |break_ptr=out_ptr|. 1775 1776 The numeric values assigned to |num_or_id|, etc., have been chosen to 1777 shorten some of the program logic; for example, the program makes use of 1778 the fact that |sign+2=sign_val_sign|. 1779 1780 @d misc=0 {state associated with special characters} 1781 @d num_or_id=1 {state associated with numbers and identifiers} 1782 @d sign=2 {state associated with pending \.+ or \.-} 1783 @d sign_val=num_or_id+2 {state associated with pending sign and value} 1784 @d sign_val_sign=sign+2 {|sign_val| followed by another pending sign} 1785 @d sign_val_val=sign_val+2 {|sign_val| followed by another pending value} 1786 @d unbreakable=sign_val_val+1 {state associated with \.{@@\&}} 1787 1788 @<Globals...@>= 1789 @!out_state:eight_bits; {current status of partial output} 1790 @!out_val,@!out_app:integer; {pending values} 1791 @!out_sign:ASCII_code; {sign to use if appending |out_val>=0|} 1792 @!last_sign:-1..+1; {sign to use if appending a zero} 1793 1794 @ During the output process, |line| will equal the number of the next line 1795 to be output. 1796 1797 @<Initialize the output buffer@>= 1798 out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1; 1799 1800 @ Here is a routine that is invoked when |out_ptr>line_length| 1801 or when it is time to flush out the final line. The |flush_buffer| procedure 1802 often writes out the line up to the current |break_ptr| position, then moves the 1803 remaining information to the front of |out_buf|. However, it prefers to 1804 write only up to |semi_ptr|, if the residual line won't be too long. 1805 1806 @d check_break==if out_ptr>line_length then flush_buffer 1807 1808 @p procedure flush_buffer; {writes one line to output file} 1809 var k:0..out_buf_size; {index into |out_buf|} 1810 @!b:0..out_buf_size; {value of |break_ptr| upon entry} 1811 begin b:=break_ptr; 1812 if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr; 1813 for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]); 1814 write_ln(Pascal_file); incr(line); 1815 if line mod 100 = 0 then 1816 begin print('.'); 1817 if line mod 500 = 0 then print(line:1); 1818 update_terminal; {progress report} 1819 end; 1820 if break_ptr<out_ptr then 1821 begin if out_buf[break_ptr]=" " then 1822 begin incr(break_ptr); {drop space at break} 1823 if break_ptr>b then b:=break_ptr; 1824 end; 1825 for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k]; 1826 end; 1827 out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0; 1828 if out_ptr>line_length then 1829 begin err_print('! Long line must be truncated'); out_ptr:=line_length; 1830 @.Long line must be truncated@> 1831 end; 1832 end; 1833 1834 @ @<Empty the last line from the buffer@>= 1835 break_ptr:=out_ptr; semi_ptr:=0; flush_buffer; 1836 if brace_level<>0 then 1837 err_print('! Program ended at brace level ',brace_level:1); 1838 @.Program ended at brace level n@> 1839 1840 @ Another simple and useful routine appends the decimal equivalent of 1841 a nonnegative integer to the output buffer. 1842 1843 @d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character} 1844 end 1845 1846 @p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|} 1847 var k:0..out_buf_size; {index into |out_buf|} 1848 begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|} 1849 repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k); 1850 until v=0; 1851 repeat incr(k); app(out_buf[k]+"0"); 1852 until k=out_buf_size; {then we append them, most significant first} 1853 end; 1854 1855 @ The output states are kept up to date by the output routines, which are 1856 called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure 1857 has two parameters: |t| tells the type of information being sent and 1858 |v| contains the information proper. Some information may also be passed 1859 in the array |out_contrib|. 1860 1861 \yskip\hang If |t=misc| then |v| is a character to be output. 1862 1863 \hang If |t=str| then |v| is the length of a string or something like `\.{<>}' 1864 in |out_contrib|. 1865 1866 \hang If |t=ident| then |v| is the length of an identifier in |out_contrib|. 1867 1868 \hang If |t=frac| then |v| is the length of a fraction and/or exponent in 1869 |out_contrib|. 1870 1871 @d str=1 {|send_out| code for a string} 1872 @d ident=2 {|send_out| code for an identifier} 1873 @d frac=3 {|send_out| code for a fraction} 1874 1875 @<Glob...@>= 1876 @!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|} 1877 1878 @ A slightly subtle point in the following code is that the user may ask 1879 for a |join| operation (i.e., \.{@@\&}) following whatever is being sent 1880 out. We will see later that |join| is implemented in part by calling 1881 |send_out(frac,0)|. 1882 1883 @p procedure send_out(@!t:eight_bits; @!v:sixteen_bits); 1884 {outputs |v| of type |t|} 1885 label restart; 1886 var k: 0..line_length; {index into |out_contrib|} 1887 begin @<Get the buffer ready for appending the new information@>; 1888 if t<>misc then for k:=1 to v do app(out_contrib[k]) 1889 else app(v); 1890 check_break; 1891 if (t=misc)and((v=";")or(v="}")) then 1892 begin semi_ptr:=out_ptr; break_ptr:=out_ptr; 1893 end; 1894 if t>=ident then out_state:=num_or_id {|t=ident| or |frac|} 1895 else out_state:=misc {|t=str| or |misc|} 1896 end; 1897 1898 @ Here is where the buffer states for signs and values collapse into simpler 1899 states, because we are about to append something that doesn't combine with 1900 the previous integer constants. 1901 1902 We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have 1903 |","-c=@t sign of $c$@>|, when $\vert c\vert=1$. 1904 1905 @<Get the buffer ready...@>= 1906 restart: case out_state of 1907 num_or_id: if t<>frac then 1908 begin break_ptr:=out_ptr; 1909 if t=ident then app(" "); 1910 end; 1911 sign: begin app(","-out_app); check_break; break_ptr:=out_ptr; 1912 end; 1913 sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>; 1914 out_state:=out_state-2; goto restart; 1915 end; 1916 sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>; 1917 misc: if t<>frac then break_ptr:=out_ptr;@/ 1918 othercases do_nothing {this is for |unbreakable| state} 1919 endcases 1920 1921 @ @<Append \(|out_val|...@>= 1922 if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-") 1923 else if out_sign>0 then app(out_sign); 1924 app_val(abs(out_val)); check_break; 1925 1926 @ @<Reduce |sign_val_val|...@>= 1927 begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then 1928 begin @<Append \(|out_val| to buffer@>; 1929 out_sign:="+"; out_val:=out_app; 1930 end 1931 else out_val:=out_val+out_app; 1932 out_state:=sign_val; goto restart; 1933 end 1934 1935 @ @<Contribution is \.*...@>= 1936 ((t=ident)and(v=3)and@| 1937 (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@| 1938 ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@| 1939 @^uppercase@> 1940 ((t=misc)and((v="*")or(v="/"))) 1941 1942 @ The following routine is called with $v=\pm1$ when a plus or minus sign is 1943 appended to the output. It extends \PASCAL\ to allow repeated signs 1944 (e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message. 1945 The signs following `\.E' in real constants are treated as part of a fraction, 1946 so they are not seen by this routine. 1947 1948 @p procedure send_sign(@!v:integer); 1949 begin case out_state of 1950 sign, sign_val_sign: out_app:=out_app*v; 1951 sign_val:begin out_app:=v; out_state:=sign_val_sign; 1952 end; 1953 sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 1954 out_state:=sign_val_sign; 1955 end; 1956 othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign; 1957 end 1958 endcases;@/ 1959 last_sign:=out_app; 1960 end; 1961 1962 @ When a (signed) integer value is to be output, we call |send_val|. 1963 1964 @d bad_case=666 {this is a label used below} 1965 1966 @p procedure send_val(@!v:integer); {output the (signed) value |v|} 1967 label bad_case, {go here if we can't keep |v| in the output state} 1968 exit; 1969 begin case out_state of 1970 num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>; 1971 out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1972 last_sign:=+1; 1973 end; 1974 misc: begin @<If previous output was \.* or \./, |goto bad_case|@>; 1975 out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr; 1976 last_sign:=+1; 1977 end; 1978 @t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@; 1979 othercases goto bad_case 1980 endcases;@/ 1981 return; 1982 bad_case: @<Append the decimal value of |v|, with parentheses if negative@>; 1983 exit: end; 1984 1985 @ @<Handle cases of |send_val|...@>= 1986 sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v; 1987 end; 1988 sign_val: begin out_state:=sign_val_val; out_app:=v; 1989 err_print('! Two numbers occurred without a sign between them'); 1990 end; 1991 sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v; 1992 end; 1993 sign_val_val: begin out_val:=out_val+out_app; out_app:=v; 1994 err_print('! Two numbers occurred without a sign between them'); 1995 @.Two numbers occurred...@> 1996 end; 1997 1998 @ @<If previous output was \.*...@>= 1999 if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/")) 2000 then goto bad_case 2001 2002 @ @<If previous output was \.{DIV}...@>= 2003 if (out_ptr=break_ptr+3)or 2004 ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then 2005 @^uppercase@> 2006 if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and 2007 (out_buf[out_ptr-1]="V"))or @/ 2008 ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and 2009 (out_buf[out_ptr-1]="D")) then@/ goto bad_case 2010 2011 @ @<Append the decimal value...@>= 2012 if v>=0 then 2013 begin if out_state=num_or_id then 2014 begin break_ptr:=out_ptr; app(" "); 2015 end; 2016 app_val(v); check_break; out_state:=num_or_id; 2017 end 2018 else begin app("("); app("-"); app_val(-v); app(")"); check_break; 2019 out_state:=misc; 2020 end 2021 2022 @* The big output switch. 2023 To complete the output process, we need a routine that takes the results 2024 of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|. 2025 This procedure `|send_the_output|' will be invoked just once, as follows: 2026 2027 @<Phase II: Output the contents of the compressed tables@>= 2028 if text_link[0]=0 then 2029 begin print_nl('! No output was specified.'); mark_harmless; 2030 @.No output was specified@> 2031 end 2032 else begin print_nl('Writing the output file'); update_terminal;@/ 2033 @<Initialize the output stacks@>; 2034 @<Initialize the output buffer@>; 2035 send_the_output;@/ 2036 @<Empty the last line...@>; 2037 print_nl('Done.'); 2038 end 2039 2040 @ A many-way switch is used to send the output: 2041 2042 @d get_fraction=2 {this label is used below} 2043 2044 @p procedure send_the_output; 2045 label get_fraction, {go here to finish scanning a real constant} 2046 reswitch, continue; 2047 var cur_char:eight_bits; {the latest character received} 2048 @!k:0..line_length; {index into |out_contrib|} 2049 @!j:0..max_bytes; {index into |byte_mem|} 2050 @!w:0..ww-1; {segment of |byte_mem|} 2051 @!n:integer; {number being scanned} 2052 begin while stack_ptr>0 do 2053 begin cur_char:=get_output; 2054 reswitch: case cur_char of 2055 0: do_nothing; {this case might arise if output ends unexpectedly} 2056 @t\4@>@<Cases related to identifiers@>@; 2057 @t\4@>@<Cases related to constants, possibly leading to 2058 |get_fraction| or |reswitch|@>@; 2059 "+","-": send_sign(","-cur_char); 2060 @t\4@>@<Cases like \.{<>} and \.{:=}@>@; 2061 "'": @<Send a string, |goto reswitch|@>; 2062 @<Other printable characters@>: send_out(misc,cur_char); 2063 @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@; 2064 join: begin send_out(frac,0); out_state:=unbreakable; 2065 end; 2066 verbatim: @<Send verbatim string@>; 2067 force_line: @<Force a line break@>; 2068 othercases err_print('! Can''t output ASCII code ',cur_char:1) 2069 @.Can't output ASCII code n@> 2070 endcases;@/ 2071 goto continue; 2072 get_fraction: @<Special code to finish real constants@>; 2073 continue: end; 2074 end; 2075 2076 @ @<Cases like \.{<>}...@>= 2077 and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D"; 2078 @^uppercase@> 2079 send_out(ident,3); 2080 end; 2081 not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T"; 2082 send_out(ident,3); 2083 end; 2084 set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N"; 2085 send_out(ident,2); 2086 end; 2087 or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2); 2088 end; 2089 left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2); 2090 end; 2091 not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2); 2092 end; 2093 less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2); 2094 end; 2095 greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="="; 2096 send_out(str,2); 2097 end; 2098 equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="="; 2099 send_out(str,2); 2100 end; 2101 double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2); 2102 end; 2103 2104 @ Please don't ask how all of the following characters can actually get 2105 through \.{TANGLE} outside of strings. It seems that |""""| and |"{"| 2106 cannot actually occur at this point of the program, but they have 2107 been included just in case \.{TANGLE} changes. 2108 2109 If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}' 2110 and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC} 2111 code), one should remove |"["| and |"]"| from this list and put them into 2112 the preceding module in the appropriate way. Similarly, some compilers 2113 want `\.\^' to be converted to `\.{@@}'. 2114 @^system dependencies@>@^EBCDIC@> 2115 2116 @<Other printable characters@>= 2117 "!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?", 2118 "@@","[","\","]","^","_","`","{","|" 2119 2120 @ Single-character identifiers represent themselves, while longer ones 2121 appear in |byte_mem|. All must be converted to uppercase, 2122 with underlines removed. Extremely long identifiers must be chopped. 2123 2124 (Some \PASCAL\ compilers work with lowercase letters instead of 2125 uppercase. If this module of \.{TANGLE} is changed, it's also necessary 2126 to change from uppercase to lowercase in the modules that are 2127 listed in the index under ``uppercase''.) 2128 @^system dependencies@> 2129 @^uppercase@> 2130 2131 @d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14, 2132 #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,# 2133 2134 @<Cases related to identifiers@>= 2135 "A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1); 2136 end; 2137 "a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1); 2138 end; 2139 identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww; 2140 while (k<max_id_length)and(j<byte_start[cur_val+ww]) do 2141 begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j); 2142 if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40 2143 else if out_contrib[k]="_" then decr(k); 2144 end; 2145 send_out(ident,k); 2146 end; 2147 2148 @ After sending a string, we need to look ahead at the next character, in order 2149 to see if there were two consecutive single-quote marks. Afterwards we go to 2150 |reswitch| to process the next character. 2151 2152 @<Send a string...@>= 2153 begin k:=1; out_contrib[1]:="'"; 2154 repeat if k<line_length then incr(k); 2155 out_contrib[k]:=get_output; 2156 until (out_contrib[k]="'")or(stack_ptr=0); 2157 if k=line_length then err_print('! String too long'); 2158 @.String too long@> 2159 send_out(str,k); cur_char:=get_output; 2160 if cur_char="'" then out_state:=unbreakable; 2161 goto reswitch; 2162 end 2163 2164 @ Sending a verbatim string is similar, but we don't have to look ahead. 2165 2166 @<Send verbatim string@>= 2167 begin k:=0; 2168 repeat if k<line_length then incr(k); 2169 out_contrib[k]:=get_output; 2170 until (out_contrib[k]=verbatim)or(stack_ptr=0); 2171 if k=line_length then err_print('! Verbatim string too long'); 2172 @.Verbatim string too long@> 2173 send_out(str,k-1); 2174 end 2175 2176 @ In order to encourage portable software, \.{TANGLE} complains 2177 if the constants get dangerously close to the largest value representable 2178 on a 32-bit computer ($2^{31}-1$). 2179 2180 @d digits=="0","1","2","3","4","5","6","7","8","9" 2181 2182 @<Cases related to constants...@>= 2183 digits: begin n:=0; 2184 repeat cur_char:=cur_char-"0"; 2185 if n>=@'1463146314 then err_print('! Constant too big') 2186 @.Constant too big@> 2187 else n:=10*n+cur_char; 2188 cur_char:=get_output; 2189 until (cur_char>"9")or(cur_char<"0"); 2190 send_val(n); k:=0; 2191 if cur_char="e" then cur_char:="E"; 2192 @^uppercase@> 2193 if cur_char="E" then goto get_fraction 2194 else goto reswitch; 2195 end; 2196 check_sum: send_val(pool_check_sum); 2197 octal: begin n:=0; cur_char:="0"; 2198 repeat cur_char:=cur_char-"0"; 2199 if n>=@'2000000000 then err_print('! Constant too big') 2200 else n:=8*n+cur_char; 2201 cur_char:=get_output; 2202 until (cur_char>"7")or(cur_char<"0"); 2203 send_val(n); goto reswitch; 2204 end; 2205 hex: begin n:=0; cur_char:="0"; 2206 repeat if cur_char>="A" then cur_char:=cur_char+10-"A" 2207 else cur_char:=cur_char-"0"; 2208 if n>=@"8000000 then err_print('! Constant too big') 2209 else n:=16*n+cur_char; 2210 cur_char:=get_output; 2211 until (cur_char>"F")or(cur_char<"0")or@| 2212 ((cur_char>"9")and(cur_char<"A")); 2213 send_val(n); goto reswitch; 2214 end; 2215 number: send_val(cur_val); 2216 ".": begin k:=1; out_contrib[1]:="."; cur_char:=get_output; 2217 if cur_char="." then 2218 begin out_contrib[2]:="."; send_out(str,2); 2219 end 2220 else if (cur_char>="0")and(cur_char<="9") then goto get_fraction 2221 else begin send_out(misc,"."); goto reswitch; 2222 end; 2223 end; 2224 2225 @ The following code appears at label `|get_fraction|', when we want to 2226 scan to the end of a real constant. The first |k| characters of a fraction 2227 have already been placed in |out_contrib|, and |cur_char| is the next character. 2228 2229 @<Special code...@>= 2230 repeat if k<line_length then incr(k); 2231 out_contrib[k]:=cur_char; cur_char:=get_output; 2232 if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then 2233 @^uppercase@> 2234 begin if k<line_length then incr(k); 2235 out_contrib[k]:=cur_char; cur_char:=get_output; 2236 end 2237 else if cur_char="e" then cur_char:="E"; 2238 until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9")); 2239 if k=line_length then err_print('! Fraction too long'); 2240 @.Fraction too long@> 2241 send_out(frac,k); goto reswitch 2242 2243 @ Some \PASCAL\ compilers do not recognize comments in braces, so the 2244 comments must be delimited by `\.{(*}' and `\.{*)}'. 2245 @^system dependencies@> 2246 In such cases the statement `|out_contrib[1]:="{"|' that appears here should 2247 be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*"; 2248 incr(k); end|', and a similar change should be made to 2249 `|out_contrib[k]:="}"|'. 2250 2251 @<Cases involving \.{@@\{} and \.{@@\}}@>= 2252 begin_comment: begin if brace_level=0 then send_out(misc,"{") 2253 else send_out(misc,"["); 2254 incr(brace_level); 2255 end; 2256 end_comment: if brace_level>0 then 2257 begin decr(brace_level); 2258 if brace_level=0 then send_out(misc,"}") 2259 else send_out(misc,"]"); 2260 end 2261 else err_print('! Extra @@}'); 2262 @.Extra \AT!\}@> 2263 module_number: begin k:=2; 2264 if brace_level=0 then out_contrib[1]:="{" 2265 else out_contrib[1]:="["; 2266 if cur_val<0 then 2267 begin out_contrib[k]:=":"; cur_val:=-cur_val; incr(k); 2268 end; 2269 n:=10; 2270 while cur_val>=n do n:=10*n; 2271 repeat n:=n div 10; 2272 out_contrib[k]:="0"+(cur_val div n); cur_val:=cur_val mod n; incr(k); 2273 until n=1; 2274 if out_contrib[2]<>":" then 2275 begin out_contrib[k]:=":"; incr(k); 2276 end; 2277 if brace_level=0 then out_contrib[k]:="}" 2278 else out_contrib[k]:="]"; 2279 send_out(str,k); 2280 end; 2281 2282 @ @<Force a line break@>= 2283 begin send_out(str,0); {normalize the buffer} 2284 while out_ptr>0 do 2285 begin if out_ptr<=line_length then break_ptr:=out_ptr; 2286 flush_buffer; 2287 end; 2288 out_state:=misc; 2289 end 2290 2291 @* Introduction to the input phase. 2292 We have now seen that \.{TANGLE} will be able to output the full 2293 \PASCAL\ program, if we can only get that program into the byte memory in 2294 the proper format. The input process is something like the output process 2295 in reverse, since we compress the text as we read it in and we expand it 2296 as we write it out. 2297 2298 There are three main input routines. The most interesting is the one that gets 2299 the next token of a \PASCAL\ text; the other two are used to scan rapidly past 2300 \TeX\ text in the \.{WEB} source code. One of the latter routines will jump to 2301 the next token that starts with `\.{@@}', and the other skips to the end 2302 of a \PASCAL\ comment. 2303 2304 @ But first we need to consider the low-level routine |get_line| 2305 that takes care of merging |change_file| into |web_file|. The |get_line| 2306 procedure also updates the line numbers for error messages. 2307 2308 @<Globals...@>= 2309 @!ii:integer; {general purpose |for| loop variable in the outer block} 2310 @!line:integer; {the number of the current line in the current file} 2311 @!other_line:integer; {the number of the current line in the input file that 2312 is not currently being read} 2313 @!temp_line:integer; {used when interchanging |line| with |other_line|} 2314 @!limit:0..buf_size; {the last character position occupied in the buffer} 2315 @!loc:0..buf_size; {the next character position to be read from the buffer} 2316 @!input_has_ended: boolean; {if |true|, there is no more input} 2317 @!changing: boolean; {if |true|, the current line is from |change_file|} 2318 2319 @ As we change |changing| from |true| to |false| and back again, we must 2320 remember to swap the values of |line| and |other_line| so that the |err_print| 2321 routine will be sure to report the correct line number. 2322 2323 @d change_changing== 2324 changing := not changing; 2325 temp_line:=other_line; other_line:=line; line:=temp_line 2326 {|line @t$\null\BA\null$@> other_line|} 2327 2328 @ When |changing| is |false|, the next line of |change_file| is kept in 2329 |change_buffer[0..change_limit]|, for purposes of comparison with the next 2330 line of |web_file|. After the change file has been completely input, we 2331 set |change_limit:=0|, so that no further matches will be made. 2332 2333 @<Globals...@>= 2334 @!change_buffer:array[0..buf_size] of ASCII_code; 2335 @!change_limit:0..buf_size; {the last position occupied in |change_buffer|} 2336 2337 @ Here's a simple function that checks if the two buffers are different. 2338 2339 @p function lines_dont_match:boolean; 2340 label exit; 2341 var k:0..buf_size; {index into the buffers} 2342 begin lines_dont_match:=true; 2343 if change_limit<>limit then return; 2344 if limit>0 then 2345 for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return; 2346 lines_dont_match:=false; 2347 exit: end; 2348 2349 @ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation 2350 for the next matching operation. Since blank lines in the change file are 2351 not used for matching, we have |(change_limit=0)and not changing| if and 2352 only if the change file is exhausted. This procedure is called only 2353 when |changing| is true; hence error messages will be reported correctly. 2354 2355 @p procedure prime_the_change_buffer; 2356 label continue, done, exit; 2357 var k:0..buf_size; {index into the buffers} 2358 begin change_limit:=0; {this value will be used if the change file ends} 2359 @<Skip over comment lines in the change file; |return| if end of file@>; 2360 @<Skip to the next nonblank line; |return| if end of file@>; 2361 @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>; 2362 exit: end; 2363 2364 @ While looking for a line that begins with \.{@@x} in the change file, 2365 we allow lines that begin with \.{@@}, as long as they don't begin with 2366 \.{@@y} or \.{@@z} (which would probably indicate that the change file is 2367 fouled up). 2368 2369 @<Skip over comment lines in the change file...@>= 2370 loop@+ begin incr(line); 2371 if not input_ln(change_file) then return; 2372 if limit<2 then goto continue; 2373 if buffer[0]<>"@@" then goto continue; 2374 if (buffer[1]>="X")and(buffer[1]<="Z") then 2375 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2376 if buffer[1]="x" then goto done; 2377 if (buffer[1]="y")or(buffer[1]="z") then 2378 begin loc:=2; err_print('! Where is the matching @@x?'); 2379 @.Where is the match...@> 2380 end; 2381 continue: end; 2382 done: 2383 2384 @ Here we are looking at lines following the \.{@@x}. 2385 2386 @<Skip to the next nonblank line...@>= 2387 repeat incr(line); 2388 if not input_ln(change_file) then 2389 begin err_print('! Change file ended after @@x'); 2390 @.Change file ended...@> 2391 return; 2392 end; 2393 until limit>0; 2394 2395 @ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>= 2396 begin change_limit:=limit; 2397 if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k]; 2398 end 2399 2400 @ The following procedure is used to see if the next change entry should 2401 go into effect; it is called only when |changing| is false. 2402 The idea is to test whether or not the current 2403 contents of |buffer| matches the current contents of |change_buffer|. 2404 If not, there's nothing more to do; but if so, a change is called for: 2405 All of the text down to the \.{@@y} is supposed to match. An error 2406 message is issued if any discrepancy is found. Then the procedure 2407 prepares to read the next line from |change_file|. 2408 2409 @p procedure check_change; {switches to |change_file| if the buffers match} 2410 label exit; 2411 var n:integer; {the number of discrepancies found} 2412 @!k:0..buf_size; {index into the buffers} 2413 begin if lines_dont_match then return; 2414 n:=0; 2415 loop@+ begin change_changing; {now it's |true|} 2416 incr(line); 2417 if not input_ln(change_file) then 2418 begin err_print('! Change file ended before @@y'); 2419 @.Change file ended...@> 2420 change_limit:=0; change_changing; {|false| again} 2421 return; 2422 end; 2423 @<If the current line starts with \.{@@y}, 2424 report any discrepancies and |return|@>; 2425 @<Move |buffer| and |limit|...@>; 2426 change_changing; {now it's |false|} 2427 incr(line); 2428 if not input_ln(web_file) then 2429 begin err_print('! WEB file ended during a change'); 2430 @.WEB file ended...@> 2431 input_has_ended:=true; return; 2432 end; 2433 if lines_dont_match then incr(n); 2434 end; 2435 exit: end; 2436 2437 @ @<If the current line starts with \.{@@y}...@>= 2438 if limit>1 then if buffer[0]="@@" then 2439 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2440 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2441 if (buffer[1]="x")or(buffer[1]="z") then 2442 begin loc:=2; err_print('! Where is the matching @@y?'); 2443 @.Where is the match...@> 2444 end 2445 else if buffer[1]="y" then 2446 begin if n>0 then 2447 begin loc:=2; err_print('! Hmm... ',n:1, 2448 ' of the preceding lines failed to match'); 2449 @.Hmm... n of the preceding...@> 2450 end; 2451 return; 2452 end; 2453 end 2454 2455 @ @<Initialize the input system@>= 2456 open_input; line:=0; other_line:=0;@/ 2457 changing:=true; prime_the_change_buffer; change_changing;@/ 2458 limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false; 2459 2460 @ The |get_line| procedure is called when |loc>limit|; it puts the next 2461 line of merged input into the buffer and updates the other variables 2462 appropriately. A space is placed at the right end of the line. 2463 2464 @p procedure get_line; {inputs the next line} 2465 label restart; 2466 begin restart: if changing then 2467 @<Read from |change_file| and maybe turn off |changing|@>; 2468 if not changing then 2469 begin @<Read from |web_file| and maybe turn on |changing|@>; 2470 if changing then goto restart; 2471 end; 2472 loc:=0; buffer[limit]:=" "; 2473 end; 2474 2475 @ @<Read from |web_file|...@>= 2476 begin incr(line); 2477 if not input_ln(web_file) then input_has_ended:=true 2478 else if change_limit>0 then check_change; 2479 end 2480 2481 @ @<Read from |change_file|...@>= 2482 begin incr(line); 2483 if not input_ln(change_file) then 2484 begin err_print('! Change file ended without @@z'); 2485 @.Change file ended...@> 2486 buffer[0]:="@@"; buffer[1]:="z"; limit:=2; 2487 end; 2488 if limit>1 then {check if the change has ended} 2489 if buffer[0]="@@" then 2490 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 2491 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 2492 if (buffer[1]="x")or(buffer[1]="y") then 2493 begin loc:=2; err_print('! Where is the matching @@z?'); 2494 @.Where is the match...@> 2495 end 2496 else if buffer[1]="z" then 2497 begin prime_the_change_buffer; change_changing; 2498 end; 2499 end; 2500 end 2501 2502 @ At the end of the program, we will tell the user if the change file 2503 had a line that didn't match any relevant line in |web_file|. 2504 2505 @<Check that all changes have been read@>= 2506 if change_limit<>0 then {|changing| is false} 2507 begin for ii:=0 to change_limit-1 do buffer[ii]:=change_buffer[ii]; 2508 limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit; 2509 err_print('! Change file entry did not match'); 2510 @.Change file entry did not match@> 2511 end 2512 2513 @ Important milestones are reached during the input phase when certain 2514 control codes are sensed. 2515 2516 Control codes in \.{WEB} begin with `\.{@@}', and the next character 2517 identifies the code. Some of these are of interest only to \.{WEAVE}, 2518 so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into 2519 internal code numbers by the |control_code| function below. The ordering 2520 of these internal code numbers has been chosen to simplify the program logic; 2521 larger numbers are given to the control codes that denote more significant 2522 milestones. 2523 2524 @d ignore=0 {control code of no interest to \.{TANGLE}} 2525 @d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.} 2526 @d format=@'204 {control code for `\.{@@f}'} 2527 @d definition=@'205 {control code for `\.{@@d}'} 2528 @d begin_Pascal=@'206 {control code for `\.{@@p}'} 2529 @d module_name=@'207 {control code for `\.{@@<}'} 2530 @d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'} 2531 2532 @p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}} 2533 begin case c of 2534 "@@": control_code:="@@"; {`quoted' at sign} 2535 "'": control_code:=octal; {precedes octal constant} 2536 """": control_code:=hex; {precedes hexadecimal constant} 2537 "$": control_code:=check_sum; {string pool check sum} 2538 " ",tab_mark: control_code:=new_module; {beginning of a new module} 2539 "*": begin print('*',module_count+1:1); 2540 update_terminal; {print a progress report} 2541 control_code:=new_module; {beginning of a new module} 2542 end; 2543 "D","d": control_code:=definition; {macro definition} 2544 "F","f": control_code:=format; {format definition} 2545 "{": control_code:=begin_comment; {begin-comment delimiter} 2546 "}": control_code:=end_comment; {end-comment delimiter} 2547 "P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module} 2548 "T","t","^",".",":": control_code:=control_text; {control text to be ignored} 2549 "&": control_code:=join; {concatenate two tokens} 2550 "<": control_code:=module_name; {beginning of a module name} 2551 "=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode} 2552 "\": control_code:=force_line; {force a new line in \PASCAL\ output} 2553 othercases control_code:=ignore {ignore all other cases} 2554 endcases; 2555 end; 2556 2557 @ The |skip_ahead| procedure reads through the input at fairly high speed 2558 until finding the next non-ignorable control code, which it returns. 2559 2560 @p function skip_ahead:eight_bits; {skip to next control code} 2561 label done; 2562 var c:eight_bits; {control code found} 2563 begin loop begin if loc>limit then 2564 begin get_line; 2565 if input_has_ended then 2566 begin c:=new_module; goto done; 2567 end; 2568 end; 2569 buffer[limit+1]:="@@"; 2570 while buffer[loc]<>"@@" do incr(loc); 2571 if loc<=limit then 2572 begin loc:=loc+2; c:=control_code(buffer[loc-1]); 2573 if (c<>ignore)or(buffer[loc-1]=">") then goto done; 2574 end; 2575 end; 2576 done: skip_ahead:=c; 2577 end; 2578 2579 @ The |skip_comment| procedure reads through the input at somewhat high speed 2580 until finding the first unmatched right brace or until coming to the end 2581 of the file. It ignores characters following `\.\\' characters, since all 2582 braces that aren't nested are supposed to be hidden in that way. For 2583 example, consider the process of skipping the first comment below, 2584 where the string containing the right brace has been typed as \.{\`\\.\\\}\'} 2585 in the \.{WEB} file. 2586 2587 @p procedure skip_comment; {skips to next unmatched `\.\}'} 2588 label exit; 2589 var bal:eight_bits; {excess of left braces} 2590 @!c:ASCII_code; {current character} 2591 begin bal:=0; 2592 loop@+ begin if loc>limit then 2593 begin get_line; 2594 if input_has_ended then 2595 begin err_print('! Input ended in mid-comment'); 2596 @.Input ended in mid-comment@> 2597 return; 2598 end; 2599 end; 2600 c:=buffer[loc]; incr(loc); 2601 @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>; 2602 end; 2603 exit:end; 2604 2605 @ @<Do special things when |c="@@"...@>= 2606 if c="@@" then 2607 begin c:=buffer[loc]; 2608 if (c<>" ")and(c<>tab_mark)and(c<>"*") then incr(loc) 2609 else begin err_print('! Section ended in mid-comment'); 2610 @.Section ended in mid-comment@> 2611 decr(loc); return; 2612 end 2613 end 2614 else if (c="\")and(buffer[loc]<>"@@") then incr(loc) 2615 else if c="{" then incr(bal) 2616 else if c="}" then 2617 begin if bal=0 then return; 2618 decr(bal); 2619 end 2620 2621 @* Inputting the next token. 2622 As stated above, \.{TANGLE}'s most interesting input procedure is the 2623 |get_next| routine that inputs the next token. However, the procedure 2624 isn't especially difficult. 2625 2626 In most cases the tokens output by |get_next| have the form used in 2627 replacement texts, except that two-byte tokens are not produced. 2628 An identifier that isn't one letter long is represented by the 2629 output `|identifier|', and in such a case the global variables 2630 |id_first| and |id_loc| will have been set to the appropriate values 2631 needed by the |id_lookup| procedure. A string that begins with a 2632 double-quote is also considered an |identifier|, and in such a case 2633 the global variable |double_chars| will also have been set appropriately. 2634 Control codes produce the corresponding output of the |control_code| 2635 function above; and if that code is |module_name|, the value of |cur_module| 2636 will point to the |byte_start| entry for that module name. 2637 2638 Another global variable, |scanning_hex|, is |true| during the time that 2639 the letters \.A through \.F should be treated as if they were digits. 2640 2641 @<Globals...@>= 2642 @!cur_module: name_pointer; {name of module just scanned} 2643 @!scanning_hex: boolean; {are we scanning a hexadecimal constant?} 2644 2645 @ @<Set init...@>= 2646 scanning_hex:=false; 2647 2648 @ At the top level, |get_next| is a multi-way switch based on the next 2649 character in the input buffer. A |new_module| code is inserted at the 2650 very end of the input file. 2651 2652 @p function get_next:eight_bits; {produces the next input token} 2653 label restart,done,found; 2654 var c:eight_bits; {the current character} 2655 @!d:eight_bits; {the next character} 2656 @!j,@!k:0..longest_name; {indices into |mod_text|} 2657 begin restart: if loc>limit then 2658 begin get_line; 2659 if input_has_ended then 2660 begin c:=new_module; goto found; 2661 end; 2662 end; 2663 c:=buffer[loc]; incr(loc); 2664 if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit, 2665 otherwise set |scanning_hex:=false|@>; 2666 case c of 2667 "A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>; 2668 """": @<Get a preprocessed string@>; 2669 "@@": @<Get control code and possible module name@>; 2670 @t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@; 2671 " ",tab_mark: goto restart; {ignore spaces and tabs} 2672 "{": begin skip_comment; goto restart; 2673 end; 2674 "}": begin err_print('! Extra }'); goto restart; 2675 @.Extra \}@> 2676 end; 2677 othercases if c>=128 then goto restart {ignore nonstandard characters} 2678 else do_nothing 2679 endcases; 2680 found:@!debug if trouble_shooting then debug_help;@;@+gubed@/ 2681 get_next:=c; 2682 end; 2683 2684 @ @<Go to |found| if |c| is a hexadecimal digit...@>= 2685 if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found 2686 else scanning_hex:=false 2687 2688 @ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the 2689 respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used 2690 for \TeX\ comments in \PASCAL\ text. 2691 2692 @d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end 2693 2694 @<Compress two-symbol...@>= 2695 ".": if buffer[loc]="." then compress(double_dot) 2696 else if buffer[loc]=")" then compress("]"); 2697 ":": if buffer[loc]="=" then compress(left_arrow); 2698 "=": if buffer[loc]="=" then compress(equivalence_sign); 2699 ">": if buffer[loc]="=" then compress(greater_or_equal); 2700 "<": if buffer[loc]="=" then compress(less_or_equal) 2701 else if buffer[loc]=">" then compress(not_equal); 2702 "(": if buffer[loc]="*" then compress(begin_comment) 2703 else if buffer[loc]="." then compress("["); 2704 "*": if buffer[loc]=")" then compress(end_comment); 2705 2706 @ We have to look at the preceding character to make sure this isn't part 2707 of a real constant, before trying to find an identifier starting with 2708 `\.e' or `\.E'. 2709 2710 @<Get an identifier@>= 2711 begin if ((c="e")or(c="E"))and(loc>1) then 2712 if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0; 2713 if c<>0 then 2714 begin decr(loc); id_first:=loc; 2715 repeat incr(loc); d:=buffer[loc]; 2716 until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and 2717 (d<>"_"); 2718 if loc>id_first+1 then 2719 begin c:=identifier; id_loc:=loc; 2720 end; 2721 end 2722 else c:="E"; {exponent of a real constant} 2723 end 2724 2725 @ A string that starts and ends with double-quote marks is converted into 2726 an identifier that behaves like a numeric macro by means of the following 2727 piece of the program. 2728 @^preprocessed strings@> 2729 2730 @<Get a preprocessed string@>= 2731 begin double_chars:=0; id_first:=loc-1; 2732 repeat d:=buffer[loc]; incr(loc); 2733 if (d="""")or(d="@@") then 2734 if buffer[loc]=d then 2735 begin incr(loc); d:=0; incr(double_chars); 2736 end 2737 else begin if d="@@" then err_print('! Double @@ sign missing') 2738 @.Double \AT! sign missing@> 2739 end 2740 else if loc>limit then 2741 begin err_print('! String constant didn''t end'); d:=""""; 2742 @.String constant didn't end@> 2743 end; 2744 until d=""""; 2745 id_loc:=loc-1; c:=identifier; 2746 end 2747 2748 @ After an \.{@@} sign has been scanned, the next character tells us 2749 whether there is more work to do. 2750 2751 @<Get control code and possible module name@>= 2752 begin c:=control_code(buffer[loc]); incr(loc); 2753 if c=ignore then goto restart 2754 else if c=hex then scanning_hex:=true 2755 else if c=module_name then 2756 @<Scan the \(module name and make |cur_module| point to it@> 2757 else if c=control_text then 2758 begin repeat c:=skip_ahead; 2759 until c<>"@@"; 2760 if buffer[loc-1]<>">" then 2761 err_print('! Improper @@ within control text'); 2762 @.Improper \AT! within control text@> 2763 goto restart; 2764 end; 2765 end 2766 2767 @ @<Scan the \(module name...@>= 2768 begin @<Put module name into |mod_text[1..k]|@>; 2769 if k>3 then 2770 begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then 2771 cur_module:=prefix_lookup(k-3) 2772 else cur_module:=mod_lookup(k); 2773 end 2774 else cur_module:=mod_lookup(k); 2775 end 2776 2777 @ Module names are placed into the |mod_text| array with consecutive spaces, 2778 tabs, and carriage-returns replaced by single spaces. There will be no 2779 spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate 2780 this, since the |mod_lookup| routine uses |mod_text[1]| as the first 2781 character of the name.) 2782 2783 @<Set init...@>=mod_text[0]:=" "; 2784 2785 @ @<Put module name...@>= 2786 k:=0; 2787 loop@+ begin if loc>limit then 2788 begin get_line; 2789 if input_has_ended then 2790 begin err_print('! Input ended in section name'); 2791 @.Input ended in section name@> 2792 goto done; 2793 end; 2794 end; 2795 d:=buffer[loc]; 2796 @<If end of name, |goto done|@>; 2797 incr(loc); if k<longest_name-1 then incr(k); 2798 if (d=" ")or(d=tab_mark) then 2799 begin d:=" "; if mod_text[k-1]=" " then decr(k); 2800 end; 2801 mod_text[k]:=d; 2802 end; 2803 done: @<Check for overlong name@>; 2804 if (mod_text[k]=" ")and(k>0) then decr(k); 2805 2806 @ @<If end of name,...@>= 2807 if d="@@" then 2808 begin d:=buffer[loc+1]; 2809 if d=">" then 2810 begin loc:=loc+2; goto done; 2811 end; 2812 if (d=" ")or(d=tab_mark)or(d="*") then 2813 begin err_print('! Section name didn''t end'); goto done; 2814 @.Section name didn't end@> 2815 end; 2816 incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again} 2817 end 2818 2819 @ @<Check for overlong name@>= 2820 if k>=longest_name-2 then 2821 begin print_nl('! Section name too long: '); 2822 @.Section name too long@> 2823 for j:=1 to 25 do print(xchr[mod_text[j]]); 2824 print('...'); mark_harmless; 2825 end 2826 2827 @* Scanning a numeric definition. 2828 When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric 2829 macro definition, it calls on the procedure |scan_numeric(p)|, where |p| 2830 points to the name that is to be defined. This procedure evaluates the 2831 right-hand side, which must consist entirely of integer constants and 2832 defined numeric macros connected with \.+ and \.- signs (no parentheses). 2833 It also sets the global variable |next_control| to the control code that 2834 terminated this definition. 2835 2836 A definition ends with the control codes |definition|, |format|, |module_name|, 2837 |begin_Pascal|, and |new_module|, all of which can be recognized 2838 by the fact that they are the largest values |get_next| can return. 2839 2840 @d end_of_definition(#)==(#>=format) 2841 {is |#| a control code ending a definition?} 2842 2843 @<Global...@>= 2844 @!next_control:eight_bits; {control code waiting to be acted upon} 2845 2846 @ The evaluation of a numeric expression makes use of two variables called the 2847 |accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and 2848 |next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied 2849 by the value of that sign. When a numeric value is scanned, it is multiplied by 2850 |next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$. 2851 2852 @d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1; 2853 end 2854 2855 @p procedure scan_numeric(@!p:name_pointer); {defines numeric macros} 2856 label reswitch, done; 2857 var accumulator:integer; {accumulates sums} 2858 @!next_sign:-1..+1; {sign to attach to next value} 2859 @!q:name_pointer; {points to identifiers being evaluated} 2860 @!val:integer; {constants being evaluated} 2861 begin @<Set \(|accumulator| to the value of the right-hand side@>; 2862 if abs(accumulator)>=@'100000 then 2863 begin err_print('! Value too big: ',accumulator:1); accumulator:=0; 2864 @.Value too big@> 2865 end; 2866 equiv[p]:=accumulator+@'100000; {name |p| now is defined to equal |accumulator|} 2867 end; 2868 2869 @ @<Set \(|accumulator| to the value of the right-hand side@>= 2870 accumulator:=0; next_sign:=+1; 2871 loop@+ begin next_control:=get_next; 2872 reswitch: case next_control of 2873 digits: begin @<Set |val| to value of decimal constant, and 2874 set |next_control| to the following token@>; 2875 add_in(val); goto reswitch; 2876 end; 2877 octal: begin @<Set |val| to value of octal constant, and 2878 set |next_control| to the following token@>; 2879 add_in(val); goto reswitch; 2880 end; 2881 hex: begin @<Set |val| to value of hexadecimal constant, and 2882 set |next_control| to the following token@>; 2883 add_in(val); goto reswitch; 2884 end; 2885 identifier: begin q:=id_lookup(normal); 2886 if ilk[q]<>numeric then 2887 begin next_control:="*"; goto reswitch; {leads to error} 2888 end; 2889 add_in(equiv[q]-@'100000); 2890 end; 2891 "+": do_nothing; 2892 "-": next_sign:=-next_sign; 2893 format, definition, module_name, begin_Pascal, new_module: goto done; 2894 ";": err_print('! Omit semicolon in numeric definition'); 2895 @.Omit semicolon in numeric def...@> 2896 othercases @<Signal error, flush rest of the definition@> 2897 endcases; 2898 end; 2899 done: 2900 2901 @ @<Signal error, flush rest...@>= 2902 begin err_print('! Improper numeric definition will be flushed'); 2903 @.Improper numeric definition...@> 2904 repeat next_control:=skip_ahead 2905 until end_of_definition(next_control); 2906 if next_control=module_name then 2907 begin {we want to scan the module name too} 2908 loc:=loc-2; next_control:=get_next; 2909 end; 2910 accumulator:=0; goto done; 2911 end 2912 2913 @ @<Set |val| to value of decimal...@>= 2914 val:=0; 2915 repeat val:=10*val+next_control-"0"; next_control:=get_next; 2916 until (next_control>"9")or(next_control<"0") 2917 2918 @ @<Set |val| to value of octal...@>= 2919 val:=0; next_control:="0"; 2920 repeat val:=8*val+next_control-"0"; next_control:=get_next; 2921 until (next_control>"7")or(next_control<"0") 2922 2923 @ @<Set |val| to value of hex...@>= 2924 val:=0; next_control:="0"; 2925 repeat if next_control>="A" then next_control:=next_control+"0"+10-"A"; 2926 val:=16*val+next_control-"0"; next_control:=get_next; 2927 until (next_control>"F")or(next_control<"0")or@| 2928 ((next_control>"9")and(next_control<"A")) 2929 2930 @* Scanning a macro definition. 2931 The rules for generating the replacement texts corresponding to simple 2932 macros, parametric macros, and \PASCAL\ texts of a module are almost 2933 identical, so a single procedure is used for all three cases. The 2934 differences are that 2935 2936 \yskip\item{a)} The sign |#| denotes a parameter only when it appears 2937 outside of strings in a parametric macro; otherwise it stands for the 2938 ASCII character |#|. (This is not used in standard \PASCAL, but some 2939 \PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.) 2940 2941 \item{b)}Module names are not allowed in simple macros or parametric macros; 2942 in fact, the appearance of a module name terminates such macros and denotes 2943 the name of the current module. 2944 2945 \item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after 2946 module names, while they terminate macro definitions. 2947 2948 @ Therefore there is a procedure |scan_repl| whose parameter |t| specifies 2949 either |simple| or |parametric| or |module_name|. After |scan_repl| has 2950 acted, |cur_repl_text| will point to the replacement text just generated, and 2951 |next_control| will contain the control code that terminated the activity. 2952 2953 @<Globals...@>= 2954 @!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|} 2955 2956 @ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text} 2957 label continue, done, found, reswitch; 2958 var a:sixteen_bits; {the current token} 2959 @!b:ASCII_code; {a character from the buffer} 2960 @!bal:eight_bits; {left parentheses minus right parentheses} 2961 begin bal:=0; 2962 loop@+ begin continue: a:=get_next; 2963 case a of 2964 "(": incr(bal); 2965 ")": if bal=0 then err_print('! Extra )') 2966 @.Extra )@> 2967 else decr(bal); 2968 "'": @<Copy a string from the buffer to |tok_mem|@>; 2969 "#": if t=parametric then a:=param; 2970 @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|, 2971 |module_name|, etc.), either process it and change |a| to a byte 2972 that should be stored, or |goto continue| if |a| should be ignored, 2973 or |goto done| if |a| signals the end of this replacement text@>@; 2974 othercases do_nothing 2975 endcases;@/ 2976 app_repl(a); {store |a| in |tok_mem|} 2977 end; 2978 done: next_control:=a; 2979 @<Make sure the parentheses balance@>; 2980 if text_ptr>max_texts-zz then overflow('text'); 2981 cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z]; 2982 incr(text_ptr); 2983 if z=zz-1 then z:=0@+else incr(z); 2984 end; 2985 2986 @ @<Make sure the parentheses balance@>= 2987 if bal>0 then 2988 begin if bal=1 then err_print('! Missing )') 2989 else err_print('! Missing ',bal:1,' )''s'); 2990 @.Missing n )@> 2991 while bal>0 do 2992 begin app_repl(")"); decr(bal); 2993 end; 2994 end 2995 2996 @ @<In cases that |a| is...@>= 2997 identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200); 2998 a:=a mod @'400; 2999 end; 3000 module_name: if t<>module_name then goto done 3001 else begin app_repl((cur_module div @'400)+@'250); 3002 a:=cur_module mod @'400; 3003 end; 3004 verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>; 3005 definition, format, begin_Pascal: if t<>module_name then goto done 3006 else begin err_print('! @@',xchr[buffer[loc-1]], 3007 @.\AT!p is ignored in Pascal text@> 3008 @.\AT!d is ignored in Pascal text@> 3009 @.\AT!f is ignored in Pascal text@> 3010 ' is ignored in Pascal text'); goto continue; 3011 end; 3012 new_module: goto done; 3013 3014 @ @<Copy a string...@>= 3015 begin b:="'"; 3016 loop@+ begin app_repl(b); 3017 if b="@@" then 3018 if buffer[loc]="@@" then incr(loc) {store only one \.{@@}} 3019 else err_print('! You should double @@ signs in strings'); 3020 @.You should double \AT! signs@> 3021 if loc=limit then 3022 begin err_print('! String didn''t end'); 3023 @.String didn't end@> 3024 buffer[loc]:="'"; buffer[loc+1]:=0; 3025 end; 3026 b:=buffer[loc]; incr(loc); 3027 if b="'" then 3028 begin if buffer[loc]<>"'" then goto found 3029 else begin incr(loc); app_repl("'"); 3030 end; 3031 end; 3032 end; 3033 found: end {now |a| holds the final |"'"| that will be stored} 3034 3035 @ @<Copy verbatim string...@>= 3036 begin app_repl(verbatim); 3037 buffer[limit+1]:="@@"; 3038 reswitch: if buffer[loc]="@@" then 3039 begin if loc<limit then if buffer[loc+1]="@@" then 3040 begin app_repl("@@"); 3041 loc:=loc+2; 3042 goto reswitch; 3043 end; 3044 end 3045 else begin app_repl(buffer[loc]); 3046 incr(loc); 3047 goto reswitch; 3048 end; 3049 if loc>=limit then err_print('! Verbatim string didn''t end') 3050 @.Verbatim string didn't end@> 3051 else if buffer[loc+1]<>">" then 3052 err_print('! You should double @@ signs in verbatim strings'); 3053 @.You should double \AT! signs@> 3054 loc:=loc+2; 3055 end {another |verbatim| byte will be stored, since |a=verbatim|} 3056 3057 @ The following procedure is used to define a simple or parametric macro, 3058 just after the `\.{==}' of its definition has been scanned. 3059 3060 @p procedure define_macro(@!t:eight_bits); 3061 var p:name_pointer; {the identifier being defined} 3062 begin p:=id_lookup(t); scan_repl(t);@/ 3063 equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0; 3064 end; 3065 3066 @* Scanning a module. 3067 The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been 3068 sensed in the input, and it proceeds until the end of that module. It 3069 uses |module_count| to keep track of the current module number; with luck, 3070 \.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules. 3071 3072 @<Globals...@>= 3073 @!module_count:0..@'27777; {the current module number} 3074 3075 @ The top level of |scan_module| is trivial. 3076 @p procedure scan_module; 3077 label continue, done, exit; 3078 var p:name_pointer; {module name for the current module} 3079 begin incr(module_count); 3080 @<Scan the \(definition part of the current module@>; 3081 @<Scan the \PASCAL\ part of the current module@>; 3082 exit: end; 3083 3084 @ @<Scan the \(definition part...@>= 3085 next_control:=0; 3086 loop@+ begin continue: while next_control<=format do 3087 begin next_control:=skip_ahead; 3088 if next_control=module_name then 3089 begin {we want to scan the module name too} 3090 loc:=loc-2; next_control:=get_next; 3091 end; 3092 end; 3093 if next_control<>definition then goto done; 3094 next_control:=get_next; {get identifier name} 3095 if next_control<>identifier then 3096 begin err_print('! Definition flushed, must start with ', 3097 @.Definition flushed...@> 3098 'identifier of length > 1'); goto continue; 3099 end; 3100 next_control:=get_next; {get token after the identifier} 3101 if next_control="=" then 3102 begin scan_numeric(id_lookup(numeric)); goto continue; 3103 end 3104 else if next_control=equivalence_sign then 3105 begin define_macro(simple); goto continue; 3106 end 3107 else @<If the next text is `\.{(\#)==}', call |define_macro| 3108 and |goto continue|@>; 3109 err_print('! Definition flushed since it starts badly'); 3110 @.Definition flushed...@> 3111 end; 3112 done: 3113 3114 @ @<If the next text is `\.{(\#)==}'...@>= 3115 if next_control="(" then 3116 begin next_control:=get_next; 3117 if next_control="#" then 3118 begin next_control:=get_next; 3119 if next_control=")" then 3120 begin next_control:=get_next; 3121 if next_control="=" then 3122 begin err_print('! Use == for macros'); 3123 @.Use == for macros@> 3124 next_control:=equivalence_sign; 3125 end; 3126 if next_control=equivalence_sign then 3127 begin define_macro(parametric); goto continue; 3128 end; 3129 end; 3130 end; 3131 end; 3132 3133 @ @<Scan the \PASCAL...@>= 3134 case next_control of 3135 begin_Pascal:p:=0; 3136 module_name: begin p:=cur_module; 3137 @<Check that |=| or |==| follows this module name, otherwise |return|@>; 3138 end; 3139 othercases return 3140 endcases;@/ 3141 @<Insert the module number into |tok_mem|@>; 3142 scan_repl(module_name); {now |cur_repl_text| points to the replacement text} 3143 @<Update the data structure so that the replacement text is accessible@>; 3144 3145 @ @<Check that |=|...@>= 3146 repeat next_control:=get_next; 3147 until next_control<>"+"; {allow optional `\.{+=}'} 3148 if (next_control<>"=")and(next_control<>equivalence_sign) then 3149 begin err_print('! Pascal text flushed, = sign is missing'); 3150 @.Pascal text flushed...@> 3151 repeat next_control:=skip_ahead; 3152 until next_control=new_module; 3153 return; 3154 end 3155 3156 @ @<Insert the module number...@>= 3157 store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|} 3158 3159 @ @<Update the data...@>= 3160 if p=0 then {unnamed module} 3161 begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text; 3162 end 3163 else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name} 3164 else begin p:=equiv[p]; 3165 while text_link[p]<module_flag do p:=text_link[p]; {find end of list} 3166 text_link[p]:=cur_repl_text; 3167 end; 3168 text_link[cur_repl_text]:=module_flag; 3169 {mark this replacement text as a nonmacro} 3170 3171 @* Debugging. 3172 The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints 3173 to be set, and variables can be read and changed, but procedures cannot be 3174 executed. Therefore a `|debug_help|' procedure has been inserted in the main 3175 loops of each phase of the program; when |ddt| and |dd| are set to appropriate 3176 values, symbolic printouts of various tables will appear. 3177 3178 The idea is to set a breakpoint inside the |debug_help| routine, at the 3179 place of `\ignorespaces|breakpoint:|\unskip' below. Then when 3180 |debug_help| is to be activated, set |trouble_shooting| equal to |true|. 3181 The |debug_help| routine will prompt you for values of |ddt| and |dd|, 3182 discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending 3183 with zero or a negative number. Then control either passes to the 3184 breakpoint, allowing you to look at and/or change variables (if you typed 3185 zero), or to exit the routine (if you typed a negative value). 3186 3187 Another global variable, |debug_cycle|, can be used to skip silently 3188 past calls on |debug_help|. If you set |debug_cycle>1|, the program stops 3189 only every |debug_cycle| times |debug_help| is called; however, 3190 any error stop will set |debug_cycle| to zero. 3191 3192 @<Globals...@>= 3193 @!debug@!trouble_shooting:boolean; {is |debug_help| wanted?} 3194 @!ddt:integer; {operation code for the |debug_help| routine} 3195 @!dd:integer; {operand in procedures performed by |debug_help|} 3196 @!debug_cycle:integer; {threshold for |debug_help| stopping} 3197 @!debug_skipped:integer; {we have skipped this many |debug_help| calls} 3198 @!term_in:text_file; {the user's terminal as an input file} 3199 gubed 3200 3201 @ The debugging routine needs to read from the user's terminal. 3202 @^system dependencies@> 3203 @<Set init...@>= 3204 @!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/ 3205 trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works} 3206 reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|} 3207 gubed 3208 3209 @ @d breakpoint=888 {place where a breakpoint is desirable} 3210 @^system dependencies@> 3211 3212 @p @!debug procedure debug_help; {routine to display various things} 3213 label breakpoint,exit; 3214 var k:integer; {index into various arrays} 3215 begin incr(debug_skipped); 3216 if debug_skipped<debug_cycle then return; 3217 debug_skipped:=0; 3218 loop@+ begin print_nl('#'); update_terminal; {prompt} 3219 read(term_in,ddt); {read a debug-command code} 3220 if ddt<0 then return 3221 else if ddt=0 then 3222 begin goto breakpoint;@\ {go to every label at least once} 3223 breakpoint: ddt:=0;@\ 3224 end 3225 else begin read(term_in,dd); 3226 case ddt of 3227 1: print_id(dd); 3228 2: print_repl(dd); 3229 3: for k:=1 to dd do print(xchr[buffer[k]]); 3230 4: for k:=1 to dd do print(xchr[mod_text[k]]); 3231 5: for k:=1 to out_ptr do print(xchr[out_buf[k]]); 3232 6: for k:=1 to dd do print(xchr[out_contrib[k]]); 3233 othercases print('?') 3234 endcases; 3235 end; 3236 end; 3237 exit:end; 3238 gubed 3239 3240 @* The main program. 3241 We have defined plenty of procedures, and it is time to put the last 3242 pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where 3243 it ends. 3244 @^system dependencies@> 3245 3246 @p begin initialize; 3247 @<Initialize the input system@>; 3248 print_ln(banner); {print a ``banner line''} 3249 @<Phase I: Read all the user's text and compress it into |tok_mem|@>; 3250 stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/ 3251 @<Phase II:...@>; 3252 end_of_TANGLE: 3253 if string_ptr>256 then @<Finish off the string pool file@>; 3254 stat @<Print statistics about memory usage@>;@+tats@;@/ 3255 @t\4\4@>{here files should be closed if the operating system requires it} 3256 @<Print the job |history|@>; 3257 end. 3258 3259 @ @<Phase I:...@>= 3260 phase_one:=true; 3261 module_count:=0; 3262 repeat next_control:=skip_ahead; 3263 until next_control=new_module; 3264 while not input_has_ended do scan_module; 3265 @<Check that all changes have been read@>; 3266 phase_one:=false; 3267 3268 @ @<Finish off the string pool file@>= 3269 begin print_nl(string_ptr-256:1, ' strings written to string pool file.'); 3270 write(pool,'*'); 3271 for ii:=1 to 9 do 3272 begin out_buf[ii]:=pool_check_sum mod 10; 3273 pool_check_sum:=pool_check_sum div 10; 3274 end; 3275 for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]); 3276 write_ln(pool); 3277 end 3278 3279 @ @<Glob...@>= 3280 stat @!wo:0..ww-1; {segment of memory for which statistics are being printed} 3281 tats 3282 3283 @ @<Print statistics about memory usage@>= 3284 print_nl('Memory usage statistics:'); 3285 print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;'); 3286 print_nl(byte_ptr[0]:1); 3287 for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1); 3288 if phase_one then 3289 for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii]; 3290 print(' bytes, ', max_tok_ptr[0]:1); 3291 for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1); 3292 print(' tokens.'); 3293 3294 @ Some implementations may wish to pass the |history| value to the 3295 operating system so that it can be used to govern whether or not other 3296 programs are started. Here we simply report the history to the user. 3297 @^system dependencies@> 3298 3299 @<Print the job |history|@>= 3300 case history of 3301 spotless: print_nl('(No errors were found.)'); 3302 harmless_message: print_nl('(Did you see the warning message above?)'); 3303 error_message: print_nl('(Pardon me, but I think I spotted something wrong.)'); 3304 fatal_message: print_nl('(That was a fatal error, my friend.)'); 3305 end {there are no other cases} 3306 3307 @* System-dependent changes. 3308 This module should be replaced, if necessary, by changes to the program 3309 that are necessary to make \.{TANGLE} work at a particular installation. 3310 It is usually best to design your change file so that all changes to 3311 previous modules preserve the module numbering; then everybody's version 3312 will be consistent with the printed program. More extensive changes, 3313 which introduce new modules, can be inserted here; then only the index 3314 itself will get a new module number. 3315 @^system dependencies@> 3316 3317 @* Index. 3318 Here is a cross-reference table for the \.{TANGLE} processor. 3319 All modules in which an identifier is 3320 used are listed with that identifier, except that reserved words are 3321 indexed only when they appear in format definitions, and the appearances 3322 of identifiers in module names are not indexed. Underlined entries 3323 correspond to where the identifier was declared. Error messages and 3324 a few other things like ``ASCII code'' are indexed here too.