modernc.org/knuth@v0.0.4/weave/weave.web (about) 1 % This program by D. E. Knuth is not copyrighted and can be used freely. 2 % Version 0 was released in December, 1981. 3 % Version 1 was released in September, 1982, with version 0 of TeX. 4 % Slight changes were made in October, 1982, for version 0.6 of TeX. 5 % Version 1.1 changed "_" to "\_" if not within an identifier (November, 1982). 6 % Version 1.2 added @@= and @@\ and marked changed modules (December, 1982). 7 % Version 1.3 marked and indexed changed modules better (January, 1983). 8 % Version 1.4 added "history" (February, 1983). 9 % Version 1.5 conformed to TeX version 0.96 (March, 1983). 10 % Version 1.6 conformed to TeX version 0.98 (May, 1983). 11 % Version 1.7 introduced the new change file format (June, 1983). 12 % Version 2 was released in July, 1983, with version 0.999 of TeX. 13 % Version 2.1 corrected a bug in changed_module reckoning (August, 1983). 14 % Version 2.2 corrected it better (August, 1983). 15 % Version 2.3 starts the output with \input webmac (August, 1983). 16 % Version 2.4 fixed a bug in compress(#) (September, 1983). 17 % Version 2.5 cleared xrefswitch after module names (November, 1983). 18 % Version 2.6 fixed a bug in declaration of trans array (January, 1984). 19 % Version 2.7 fixed a bug in real constants (August, 1984). 20 % Version 2.8 fixed a bug in change_buffer movement (August, 1985). 21 % Version 2.9 increased max_refs and max_toks to 30000 each (January, 1987). 22 % Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989). 23 % Version 3.1 fixed a bug for programs with only one module (April, 1989). 24 % Version 4 was major change to allow 8-bit input (September, 1989). 25 % Version 4.1, for Breitenlohner, avoids English-only output (March, 1990). 26 % Version 4.2 conforms to ANSI standard for-loop rules (September, 1990). 27 % Version 4.3 catches extra } in input (Breitenlohner, September, 1991). 28 % Version 4.4 corrects changed_module logic, %-overflow (January, 1992). 29 % Version 4.5 corrects archaic @@z logic and empty change file (January, 2021). 30 31 % Here is TeX material that gets inserted after \input webmac 32 \def\hang{\hangindent 3em\indent\ignorespaces} 33 \font\ninerm=cmr9 34 \let\mc=\ninerm % medium caps for names like SAIL 35 \def\PASCAL{Pascal} 36 \def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|) 37 \def\v{\.{\char'174}} % vertical (|) in typewriter font 38 \def\dleft{[\![} \def\dright{]\!]} % double brackets 39 \mathchardef\RA="3221 % right arrow 40 \mathchardef\BA="3224 % double arrow 41 \def\({} % kludge for alphabetizing certain module names 42 43 \def\title{WEAVE} 44 \def\contentspagenumber{15} % should be odd 45 \def\topofcontents{\null\vfill 46 \titlefalse % include headline on the contents page 47 \def\rheader{\mainfont Appendix D\hfil \contentspagenumber} 48 \centerline{\titlefont The {\ttitlefont WEAVE} processor} 49 \vskip 15pt 50 \centerline{(Version 4.5)} 51 \vfill} 52 \pageno=\contentspagenumber \advance\pageno by 1 53 54 @* Introduction. 55 This program converts a \.{WEB} file to a \TeX\ file. It was written 56 by D. E. Knuth in October, 1981; a somewhat similar {\mc SAIL} program had 57 been developed in March, 1979, although the earlier program used a top-down 58 parsing method that is quite different from the present scheme. 59 60 The code uses a few features of the local \PASCAL\ compiler that may need 61 to be changed in other installations: 62 63 \yskip\item{1)} Case statements have a default. 64 \item{2)} Input-output routines may need to be adapted for use with a particular 65 character set and/or for printing messages on the user's terminal. 66 67 \yskip\noindent 68 These features are also present in the \PASCAL\ version of \TeX, where they 69 are used in a similar (but more complex) way. System-dependent portions 70 of \.{WEAVE} can be identified by looking at the entries for `system 71 dependencies' in the index below. 72 @!@^system dependencies@> 73 74 The ``banner line'' defined here should be changed whenever \.{WEAVE} 75 is modified. 76 77 @d banner=='This is WEAVE, Version 4.5' 78 79 @ The program begins with a fairly normal header, made up of pieces that 80 @^system dependencies@> 81 will mostly be filled in later. The \.{WEB} input comes from files |web_file| 82 and |change_file|, and the \TeX\ output goes to file |tex_file|. 83 84 If it is necessary to abort the job because of a fatal error, the program 85 calls the `|jump_out|' procedure, which goes to the label |end_of_WEAVE|. 86 87 @d end_of_WEAVE = 9999 {go here to wrap it up} 88 89 @p @t\4@>@<Compiler directives@>@/ 90 program WEAVE(@!web_file,@!change_file,@!tex_file); 91 label end_of_WEAVE; {go here to finish} 92 const @<Constants in the outer block@>@/ 93 type @<Types in the outer block@>@/ 94 var @<Globals in the outer block@>@/ 95 @<Error handling procedures@>@/ 96 procedure initialize; 97 var @<Local variables for initialization@>@/ 98 begin @<Set initial values@>@/ 99 end; 100 101 @ Some of this code is optional for use when debugging only; 102 such material is enclosed between the delimiters |debug| and $|gubed|$. 103 Other parts, delimited by |stat| and $|tats|$, are optionally included 104 if statistics about \.{WEAVE}'s memory usage are desired. 105 106 @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 107 @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 108 @f debug==begin 109 @f gubed==end 110 @# 111 @d stat==@{ {change this to `$\\{stat}\equiv\null$' 112 when gathering usage statistics} 113 @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' 114 when gathering usage statistics} 115 @f stat==begin 116 @f tats==end 117 118 @ The \PASCAL\ compiler used to develop this system has ``compiler 119 directives'' that can appear in comments whose first character is a dollar sign. 120 In production versions of \.{WEAVE} these directives tell the compiler that 121 @^system dependencies@> 122 it is safe to avoid range checks and to leave out the extra code it inserts 123 for the \PASCAL\ debugger's benefit, although interrupts will occur if 124 there is arithmetic overflow. 125 126 @<Compiler directives@>= 127 @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 128 @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 129 130 @ Labels are given symbolic names by the following definitions. We insert 131 the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a 132 procedure in which we have used the `|return|' statement defined below; 133 the label `|restart|' is occasionally used at the very beginning of a 134 procedure; and the label `|reswitch|' is occasionally used just prior to 135 a \&{case} statement in which some cases change the conditions and we wish to 136 branch to the newly applicable case. 137 Loops that are set up with the \&{loop} construction defined below are 138 commonly exited by going to `|done|' or to `|found|' or to `|not_found|', 139 and they are sometimes repeated by going to `|continue|'. 140 141 @d exit=10 {go here to leave a procedure} 142 @d restart=20 {go here to start a procedure again} 143 @d reswitch=21 {go here to start a case statement again} 144 @d continue=22 {go here to resume a loop} 145 @d done=30 {go here to exit a loop} 146 @d found=31 {go here when you've found it} 147 @d not_found=32 {go here when you've found something else} 148 149 @ Here are some macros for common programming idioms. 150 151 @d incr(#) == #:=#+1 {increase a variable by unity} 152 @d decr(#) == #:=#-1 {decrease a variable by unity} 153 @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 154 @d do_nothing == {empty statement} 155 @d return == goto exit {terminate a procedure call} 156 @f return == nil 157 @f loop == xclause 158 159 @ We assume that |case| statements may include a default case that applies 160 if no matching label is found. Thus, we shall use constructions like 161 @^system dependencies@> 162 $$\vbox{\halign{#\hfil\cr 163 |case x of|\cr 164 1: $\langle\,$code for $x=1\,\rangle$;\cr 165 3: $\langle\,$code for $x=3\,\rangle$;\cr 166 |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 167 |endcases|\cr}}$$ 168 since most \PASCAL\ compilers have plugged this hole in the language by 169 incorporating some sort of default mechanism. For example, the compiler 170 used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label, 171 and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or 172 `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| 173 and |endcases| should be changed to agree with local conventions. 174 (Of course, if no default mechanism is available, the |case| statements of 175 this program must be extended by listing all remaining cases.) 176 177 @d othercases == others: {default for cases not listed explicitly} 178 @d endcases == @+end {follows the default case in an extended |case| statement} 179 @f othercases == else 180 @f endcases == end 181 182 @ The following parameters are set big enough to handle \TeX, so they 183 should be sufficient for most applications of \.{WEAVE}. 184 185 @<Constants...@>= 186 @!max_bytes=45000; {|1/ww| times the number of bytes in identifiers, 187 index entries, and module names; must be less than 65536} 188 @!max_names=5000; {number of identifiers, index entries, and module names; 189 must be less than 10240} 190 @!max_modules=2000;{greater than the total number of modules} 191 @!hash_size=353; {should be prime} 192 @!buf_size=100; {maximum length of input line} 193 @!longest_name=400; {module names shouldn't be longer than this} 194 @!long_buf_size=500; {|buf_size+longest_name|} 195 @!line_length=80; {lines of \TeX\ output have at most this many characters, 196 should be less than 256} 197 @!max_refs=30000; {number of cross references; must be less than 65536} 198 @!max_toks=30000; {number of symbols in \PASCAL\ texts being parsed; 199 must be less than 65536} 200 @!max_texts=2000; {number of phrases in \PASCAL\ texts being parsed; 201 must be less than 10240} 202 @!max_scraps=1000; {number of tokens in \PASCAL\ texts being parsed} 203 @!stack_size=200; {number of simultaneous output levels} 204 205 @ A global variable called |history| will contain one of four values 206 at the end of every run: |spotless| means that no unusual messages were 207 printed; |harmless_message| means that a message of possible interest 208 was printed but no serious errors were detected; |error_message| means that 209 at least one error was found; |fatal_message| means that the program 210 terminated abnormally. The value of |history| does not influence the 211 behavior of the program; it is simply computed for the convenience 212 of systems that might want to use such information. 213 214 @d spotless=0 {|history| value for normal jobs} 215 @d harmless_message=1 {|history| value when non-serious info was printed} 216 @d error_message=2 {|history| value when an error was noted} 217 @d fatal_message=3 {|history| value when we had to stop prematurely} 218 @# 219 @d mark_harmless==@t@>@+if history=spotless then history:=harmless_message 220 @d mark_error==history:=error_message 221 @d mark_fatal==history:=fatal_message 222 223 @<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?} 224 225 @ @<Set init...@>=history:=spotless; 226 227 @* The character set. 228 One of the main goals in the design of \.{WEB} has been to make it readily 229 portable between a wide variety of computers. Yet \.{WEB} by its very 230 nature must use a greater variety of characters than most computer 231 programs deal with, and character encoding is one of the areas in which 232 existing machines differ most widely from each other. 233 234 To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is 235 converted to an internal eight-bit code that is essentially standard 236 ASCII, the ``American Standard Code for Information Interchange.'' 237 The conversion is done immediately when each character is read in. 238 Conversely, characters are converted from ASCII to the user's external 239 representation just before they are output. (The original ASCII code 240 was seven bits only; \.{WEB} now allows eight bits in an attempt to 241 keep up with modern times.) 242 243 Such an internal code is relevant to users of \.{WEB} only because it is 244 the code used for preprocessed constants like \.{"A"}. If you are writing 245 a program in \.{WEB} that makes use of such one-character constants, you 246 should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do. 247 Otherwise \.{WEB}'s internal coding scheme does not affect you. 248 @^ASCII code@> 249 250 Here is a table of the standard visible ASCII codes: 251 $$\def\:{\char\count255\global\advance\count255 by 1} 252 \count255='40 253 \vbox{ 254 \hbox{\hbox to 40pt{\it\hfill0\/\hfill}% 255 \hbox to 40pt{\it\hfill1\/\hfill}% 256 \hbox to 40pt{\it\hfill2\/\hfill}% 257 \hbox to 40pt{\it\hfill3\/\hfill}% 258 \hbox to 40pt{\it\hfill4\/\hfill}% 259 \hbox to 40pt{\it\hfill5\/\hfill}% 260 \hbox to 40pt{\it\hfill6\/\hfill}% 261 \hbox to 40pt{\it\hfill7\/\hfill}} 262 \vskip 4pt 263 \hrule 264 \def\^{\vrule height 10.5pt depth 4.5pt} 265 \halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^ 266 \hbox to 40pt{\tt\hfill#\hfill\^}& 267 &\hbox to 40pt{\tt\hfill#\hfill\^}\cr 268 04&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 269 05&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 270 06&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 271 07&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 272 10&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 273 11&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 274 12&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 275 13&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 276 14&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 277 15&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 278 16&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule} 279 17&\:&\:&\:&\:&\:&\:&\:\cr} 280 \hrule width 280pt}$$ 281 (Actually, of course, code @'040 is an invisible blank space.) Code @'136 282 was once an upward arrow (\.{\char'13}), and code @'137 was 283 once a left arrow (\.^^X), in olden times when the first draft 284 of ASCII code was prepared; but \.{WEB} works with today's standard 285 ASCII in which those codes represent circumflex and underline as shown. 286 287 @<Types...@>= 288 @!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers} 289 290 @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 291 character sets were common, so it did not make provision for lowercase 292 letters. Nowadays, of course, we need to deal with both capital and small 293 letters in a convenient way, so \.{WEB} assumes that it is being used 294 with a \PASCAL\ whose character set contains at least the characters of 295 standard ASCII as listed above. Some \PASCAL\ compilers use the original 296 name |char| for the data type associated with the characters in text files, 297 while other \PASCAL s consider |char| to be a 64-element subrange of a larger 298 data type that has some other name. 299 300 In order to accommodate this difference, we shall use the name |text_char| 301 to stand for the data type of the characters in the input and output 302 files. We shall also assume that |text_char| consists of the elements 303 |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The 304 following definitions should be adjusted if necessary. 305 @^system dependencies@> 306 307 @d text_char == char {the data type of characters in text files} 308 @d first_text_char=0 {ordinal number of the smallest element of |text_char|} 309 @d last_text_char=255 {ordinal number of the largest element of |text_char|} 310 311 @<Types...@>= 312 @!text_file=packed file of text_char; 313 314 @ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and 315 the user's external character set by means of arrays |xord| and |xchr| 316 that are analogous to \PASCAL's |ord| and |chr| functions. 317 318 @<Globals...@>= 319 @!xord: array [text_char] of ASCII_code; 320 {specifies conversion of input characters} 321 @!xchr: array [ASCII_code] of text_char; 322 {specifies conversion of output characters} 323 324 @ If we assume that every system using \.{WEB} is able to read and write the 325 visible characters of standard ASCII (although not necessarily using the 326 ASCII codes to represent them), the following assignment statements initialize 327 most of the |xchr| array properly, without needing any system-dependent 328 changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears 329 in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code 330 on the external medium on which it resides, but \.{TANGLE} will convert from 331 this external code to ASCII and back again. Therefore the assignment 332 statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file, 333 and \PASCAL\ will compile this statement so that |xchr[65]| receives the 334 character \.A in the external (|char|) code. Note that it would be quite 335 incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of 336 type |integer|, not |char|, and because we have $|"A"|=65$ regardless of 337 the external character set. 338 339 @<Set init...@>= 340 xchr[@'40]:=' '; 341 xchr[@'41]:='!'; 342 xchr[@'42]:='"'; 343 xchr[@'43]:='#'; 344 xchr[@'44]:='$'; 345 xchr[@'45]:='%'; 346 xchr[@'46]:='&'; 347 xchr[@'47]:='''';@/ 348 xchr[@'50]:='('; 349 xchr[@'51]:=')'; 350 xchr[@'52]:='*'; 351 xchr[@'53]:='+'; 352 xchr[@'54]:=','; 353 xchr[@'55]:='-'; 354 xchr[@'56]:='.'; 355 xchr[@'57]:='/';@/ 356 xchr[@'60]:='0'; 357 xchr[@'61]:='1'; 358 xchr[@'62]:='2'; 359 xchr[@'63]:='3'; 360 xchr[@'64]:='4'; 361 xchr[@'65]:='5'; 362 xchr[@'66]:='6'; 363 xchr[@'67]:='7';@/ 364 xchr[@'70]:='8'; 365 xchr[@'71]:='9'; 366 xchr[@'72]:=':'; 367 xchr[@'73]:=';'; 368 xchr[@'74]:='<'; 369 xchr[@'75]:='='; 370 xchr[@'76]:='>'; 371 xchr[@'77]:='?';@/ 372 xchr[@'100]:='@@'; 373 xchr[@'101]:='A'; 374 xchr[@'102]:='B'; 375 xchr[@'103]:='C'; 376 xchr[@'104]:='D'; 377 xchr[@'105]:='E'; 378 xchr[@'106]:='F'; 379 xchr[@'107]:='G';@/ 380 xchr[@'110]:='H'; 381 xchr[@'111]:='I'; 382 xchr[@'112]:='J'; 383 xchr[@'113]:='K'; 384 xchr[@'114]:='L'; 385 xchr[@'115]:='M'; 386 xchr[@'116]:='N'; 387 xchr[@'117]:='O';@/ 388 xchr[@'120]:='P'; 389 xchr[@'121]:='Q'; 390 xchr[@'122]:='R'; 391 xchr[@'123]:='S'; 392 xchr[@'124]:='T'; 393 xchr[@'125]:='U'; 394 xchr[@'126]:='V'; 395 xchr[@'127]:='W';@/ 396 xchr[@'130]:='X'; 397 xchr[@'131]:='Y'; 398 xchr[@'132]:='Z'; 399 xchr[@'133]:='['; 400 xchr[@'134]:='\'; 401 xchr[@'135]:=']'; 402 xchr[@'136]:='^'; 403 xchr[@'137]:='_';@/ 404 xchr[@'140]:='`'; 405 xchr[@'141]:='a'; 406 xchr[@'142]:='b'; 407 xchr[@'143]:='c'; 408 xchr[@'144]:='d'; 409 xchr[@'145]:='e'; 410 xchr[@'146]:='f'; 411 xchr[@'147]:='g';@/ 412 xchr[@'150]:='h'; 413 xchr[@'151]:='i'; 414 xchr[@'152]:='j'; 415 xchr[@'153]:='k'; 416 xchr[@'154]:='l'; 417 xchr[@'155]:='m'; 418 xchr[@'156]:='n'; 419 xchr[@'157]:='o';@/ 420 xchr[@'160]:='p'; 421 xchr[@'161]:='q'; 422 xchr[@'162]:='r'; 423 xchr[@'163]:='s'; 424 xchr[@'164]:='t'; 425 xchr[@'165]:='u'; 426 xchr[@'166]:='v'; 427 xchr[@'167]:='w';@/ 428 xchr[@'170]:='x'; 429 xchr[@'171]:='y'; 430 xchr[@'172]:='z'; 431 xchr[@'173]:='{'; 432 xchr[@'174]:='|'; 433 xchr[@'175]:='}'; 434 xchr[@'176]:='~';@/ 435 xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used} 436 437 @ Some of the ASCII codes below @'40 have been given symbolic names in 438 \.{WEAVE} and \.{TANGLE} because they are used with a special meaning. 439 440 @d and_sign=@'4 {equivalent to `\.{and}'} 441 @d not_sign=@'5 {equivalent to `\.{not}'} 442 @d set_element_sign=@'6 {equivalent to `\.{in}'} 443 @d tab_mark=@'11 {ASCII code used as tab-skip} 444 @d line_feed=@'12 {ASCII code thrown away at end of line} 445 @d form_feed=@'14 {ASCII code used at end of page} 446 @d carriage_return=@'15 {ASCII code used at end of line} 447 @d left_arrow=@'30 {equivalent to `\.{:=}'} 448 @d not_equal=@'32 {equivalent to `\.{<>}'} 449 @d less_or_equal=@'34 {equivalent to `\.{<=}'} 450 @d greater_or_equal=@'35 {equivalent to `\.{>=}'} 451 @d equivalence_sign=@'36 {equivalent to `\.{==}'} 452 @d or_sign=@'37 {equivalent to `\.{or}'} 453 454 @ When we initialize the |xord| array and the remaining parts of |xchr|, 455 it will be convenient to make use of an index variable, |i|. 456 457 @<Local variables for init...@>= 458 @!i:0..255; 459 460 @ Here now is the system-dependent part of the character set. 461 If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which 462 only standard ASCII codes will appear in the input and output files, you 463 don't need to make any changes here. But if you have, for example, an extended 464 character set like the one in Appendix~C of {\sl The \TeX book}, the first 465 line of code in this module should be changed to 466 $$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$ 467 \.{WEB}'s character set is essentially identical to \TeX's, even with respect to 468 characters less than @'40. 469 @^system dependencies@> 470 471 Changes to the present module will make \.{WEB} more friendly on computers 472 that have an extended character set, so that one can type things like 473 \.^^Z\ instead of \.{<>}. If you have an extended set of characters that 474 are easily incorporated into text files, you can assign codes arbitrarily 475 here, giving an |xchr| equivalent to whatever characters the users of 476 \.{WEB} are allowed to have in their input files, provided that unsuitable 477 characters do not correspond to special codes like |carriage_return| 478 that are listed above. 479 480 (The present file \.{WEAVE.WEB} does not contain any of the non-ASCII 481 characters, because it is intended to be used with all implementations of 482 \.{WEB}. It was originally created on a Stanford system that has a 483 convenient extended character set, then ``sanitized'' by applying another 484 program that transliterated all of the non-standard characters into 485 standard equivalents.) 486 487 @<Set init...@>= 488 for i:=1 to @'37 do xchr[i]:=' '; 489 for i:=@'200 to @'377 do xchr[i]:=' '; 490 491 @ The following system-independent code makes the |xord| array contain a 492 suitable inverse to the information in |xchr|. 493 494 @<Set init...@>= 495 for i:=first_text_char to last_text_char do xord[chr(i)]:=" "; 496 for i:=1 to @'377 do xord[xchr[i]]:=i; 497 xord[' ']:=" "; 498 499 @* Input and output. 500 The input conventions of this program are intended to be very much like those 501 of \TeX\ (except, of course, that they are much simpler, because much less 502 needs to be done). Furthermore they are identical to those of \.{TANGLE}. 503 Therefore people who need to make modifications to all three systems 504 should be able to do so without too many headaches. 505 506 We use the standard \PASCAL\ input/output procedures in several places that 507 \TeX\ cannot, since \.{WEAVE} does not have to deal with files that are named 508 dynamically by the user, and since there is no input from the terminal. 509 510 @ Terminal output is done by writing on file |term_out|, which is assumed to 511 consist of characters of type |text_char|: 512 @^system dependencies@> 513 514 @d print(#)==write(term_out,#) {`|print|' means write on the terminal} 515 @d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line} 516 @d new_line==write_ln(term_out) {start new line} 517 @d print_nl(#)== {print information starting on a new line} 518 begin new_line; print(#); 519 end 520 521 @<Globals...@>= 522 @!term_out:text_file; {the terminal as an output file} 523 524 @ Different systems have different ways of specifying that the output on a 525 certain file will appear on the user's terminal. Here is one way to do this 526 on the \PASCAL\ system that was used in \.{TANGLE}'s initial development: 527 @^system dependencies@> 528 529 @<Set init...@>= 530 rewrite(term_out,'TTY:'); {send |term_out| output to the terminal} 531 532 @ The |update_terminal| procedure is called when we want 533 to make sure that everything we have output to the terminal so far has 534 actually left the computer's internal buffers and been sent. 535 @^system dependencies@> 536 537 @d update_terminal == break(term_out) {empty the terminal output buffer} 538 539 @ The main input comes from |web_file|; this input may be overridden 540 by changes in |change_file|. (If |change_file| is empty, there are no changes.) 541 542 @<Globals...@>= 543 @!web_file:text_file; {primary input} 544 @!change_file:text_file; {updates} 545 546 @ The following code opens the input files. Since these files were listed 547 in the program header, we assume that the \PASCAL\ runtime system has 548 already checked that suitable file names have been given; therefore no 549 additional error checking needs to be done. We will see below that 550 \.{WEAVE} reads through the entire input twice. 551 @^system dependencies@> 552 553 @p procedure open_input; {prepare to read |web_file| and |change_file|} 554 begin reset(web_file); reset(change_file); 555 end; 556 557 @ The main output goes to |tex_file|. 558 559 @<Globals...@>= 560 @!tex_file: text_file; 561 562 @ The following code opens |tex_file|. 563 Since this file was listed in the program header, we assume that the 564 \PASCAL\ runtime system has checked that a suitable external file name has 565 been given. 566 @^system dependencies@> 567 568 @<Set init...@>= 569 rewrite(tex_file); 570 571 @ Input goes into an array called |buffer|. 572 573 @<Globals...@>=@!buffer: array[0..long_buf_size] of ASCII_code; 574 575 @ The |input_ln| procedure brings the next line of input from the specified 576 file into the |buffer| array and returns the value |true|, unless the file has 577 already been entirely read, in which case it returns |false|. The conventions 578 of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line 579 of the file are input into |buffer[0]|, |buffer[1]|, \dots, 580 |buffer[limit-1]|; trailing blanks are ignored; 581 and the global variable |limit| is set to the length of the 582 @^system dependencies@> 583 line. The value of |limit| must be strictly less than |buf_size|. 584 585 We assume that none of the |ASCII_code| values 586 of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|, 587 or |carriage_return|. Since |buf_size| is strictly less than |long_buf_size|, 588 some of \.{WEAVE}'s routines use the fact that it is safe to refer to 589 |buffer[limit+2]| without overstepping the bounds of the array. 590 591 @p function input_ln(var f:text_file):boolean; 592 {inputs a line or returns |false|} 593 var final_limit:0..buf_size; {|limit| without trailing blanks} 594 begin limit:=0; final_limit:=0; 595 if eof(f) then input_ln:=false 596 else begin while not eoln(f) do 597 begin buffer[limit]:=xord[f^]; get(f); 598 incr(limit); 599 if buffer[limit-1]<>" " then final_limit:=limit; 600 if limit=buf_size then 601 begin while not eoln(f) do get(f); 602 decr(limit); {keep |buffer[buf_size]| empty} 603 if final_limit>limit then final_limit:=limit; 604 print_nl('! Input line too long'); loc:=0; error; 605 @.Input line too long@> 606 end; 607 end; 608 read_ln(f); limit:=final_limit; input_ln:=true; 609 end; 610 end; 611 612 @* Reporting errors to the user. 613 The \.{WEAVE} processor operates in three phases: first it inputs the source 614 file and stores cross-reference data, then it inputs the source once again and 615 produces the \TeX\ output file, and finally it sorts and outputs the index. 616 617 The global variables |phase_one| and |phase_three| tell which Phase we are in. 618 619 @<Globals...@>= 620 @!phase_one: boolean; {|true| in Phase I, |false| in Phases II and III} 621 @!phase_three: boolean; {|true| in Phase III, |false| in Phases I and II} 622 623 @ If an error is detected while we are debugging, 624 we usually want to look at the contents of memory. 625 A special procedure will be declared later for this purpose. 626 627 @<Error handling...@>= 628 @!debug@+ procedure debug_help; forward;@+gubed 629 630 @ The command `|err_print('! Error message')|' will report a syntax error to 631 the user, by printing the error message at the beginning of a new line and 632 then giving an indication of where the error was spotted in the source file. 633 Note that no period follows the error message, since the error routine 634 will automatically supply a period. 635 636 The actual error indications are provided by a procedure called |error|. 637 However, error messages are not actually reported during phase one, 638 since errors detected on the first pass will be detected again 639 during the second. 640 641 @d err_print(#)== 642 begin if not phase_one then 643 begin new_line; print(#); error; 644 end; 645 end 646 647 @<Error handling...@>= 648 procedure error; {prints `\..' and location of error message} 649 var@!k,@!l: 0..long_buf_size; {indices into |buffer|} 650 begin @<Print error location based on input buffer@>; 651 update_terminal; mark_error; 652 @!debug debug_skipped:=debug_cycle;debug_help;@+gubed 653 end; 654 655 @ The error locations can be indicated by using the global variables 656 |loc|, |line|, and |changing|, which tell respectively the first 657 unlooked-at position in |buffer|, the current line number, and whether or not 658 the current line is from |change_file| or |web_file|. 659 This routine should be modified on systems whose standard text editor 660 has special line-numbering conventions. 661 @^system dependencies@> 662 663 @<Print error location based on input buffer@>= 664 begin if changing then print('. (change file ')@+else print('. ('); 665 print_ln('l.', line:1, ')'); 666 if loc>=limit then l:=limit else l:=loc; 667 for k:=1 to l do 668 if buffer[k-1]=tab_mark then print(' ') 669 else print(xchr[buffer[k-1]]); {print the characters already read} 670 new_line; 671 for k:=1 to l do print(' '); {space out the next line} 672 for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read} 673 if buffer[limit]="|" then print(xchr["|"]); 674 {end of \PASCAL\ text in module names} 675 print(' '); {this space separates the message from future asterisks} 676 end 677 678 @ The |jump_out| procedure just cuts across all active procedure levels 679 and jumps out of the program. This is the only non-local \&{goto} statement 680 in \.{WEAVE}. It is used when no recovery from a particular error has 681 been provided. 682 683 Some \PASCAL\ compilers do not implement non-local |goto| statements. 684 @^system dependencies@> 685 In such cases the code that appears at label |end_of_WEAVE| should be 686 copied into the |jump_out| procedure, followed by a call to a system procedure 687 that terminates the program. 688 689 @d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out; 690 end 691 692 @<Error handling...@>= 693 procedure jump_out; 694 begin goto end_of_WEAVE; 695 end; 696 697 @ Sometimes the program's behavior is far different from what it should be, 698 and \.{WEAVE} prints an error message that is really for the \.{WEAVE} 699 maintenance person, not the user. In such cases the program says 700 |confusion('indication of where we are')|. 701 702 @d confusion(#)==fatal_error('! This can''t happen (',#,')') 703 @.This can't happen@> 704 705 @ An overflow stop occurs if \.{WEAVE}'s tables aren't large enough. 706 707 @d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded') 708 @.Sorry, x capacity exceeded@> 709 710 @* Data structures. 711 During the first phase of its processing, \.{WEAVE} puts identifier names, 712 index entries, and module names into the large |byte_mem| array, which is 713 packed with eight-bit integers. Allocation is sequential, since names are 714 never deleted. 715 716 An auxiliary array |byte_start| is used as a directory for |byte_mem|, 717 and the |link|, |ilk|, and |xref| arrays give further information about names. 718 These auxiliary arrays consist of sixteen-bit items. 719 720 @<Types...@>= 721 @!eight_bits=0..255; {unsigned one-byte quantity} 722 @!sixteen_bits=0..65535; {unsigned two-byte quantity} 723 724 @ \.{WEAVE} has been designed to avoid the need for indices that are more 725 than sixteen bits wide, so that it can be used on most computers. But 726 there are programs that need more than 65536 bytes; \TeX\ is one of these. 727 To get around this problem, a slight complication has been added to the 728 data structures: |byte_mem| is a two-dimensional array, whose first index 729 is either 0 or 1. (For generality, the first index is actually allowed to 730 run between 0 and |ww-1|, where |ww| is defined to be 2; the program will 731 work for any positive value of |ww|, and it can be simplified in obvious 732 ways if |ww=1|.) 733 734 @d ww=2 {we multiply the byte capacity by approximately this amount} 735 736 @<Globals...@>= 737 @!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code; 738 {characters of names} 739 @!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|} 740 @!link: array [0..max_names] of sixteen_bits; {hash table or tree links} 741 @!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links} 742 @!xref: array [0..max_names] of sixteen_bits; {heads of cross-reference lists} 743 744 @ The names of identifiers are found by computing a hash address |h| and 745 then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|, 746 |link[link[hash[h]]]|, \dots, until either finding the desired name 747 or encountering a zero. 748 749 A `|name_pointer|' variable, which signifies a name, is an index into 750 |byte_start|. The actual sequence of characters in the name pointed to by 751 |p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive, 752 in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when 753 |ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]| 754 and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|. 755 The pointer 0 is used for undefined module names; we don't 756 want to use it for the names of identifiers, since 0 stands for a null 757 pointer in a linked list. 758 759 We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]| 760 for |0<=w<ww|, since these are the starting positions for the next |ww| 761 names to be stored in |byte_mem|. 762 763 @d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name} 764 765 @<Types...@>= 766 @!name_pointer=0..max_names; {identifies a name} 767 768 @ @<Global...@>= 769 @!name_ptr:name_pointer; {first unused position in |byte_start|} 770 @!byte_ptr:array [0..ww-1] of 0..max_bytes; 771 {first unused position in |byte_mem|} 772 773 @ @<Local variables for init...@>= 774 @!wi: 0..ww-1; {to initialize the |byte_mem| indices} 775 776 @ @<Set init...@>= 777 for wi:=0 to ww-1 do 778 begin byte_start[wi]:=0; byte_ptr[wi]:=0; 779 end; 780 byte_start[ww]:=0; {this makes name 0 of length zero} 781 name_ptr:=1; 782 783 @ Several types of identifiers are distinguished by their |ilk|: 784 785 \yskip\hang |normal| identifiers are part of the \PASCAL\ program and 786 will appear in italic type. 787 788 \yskip\hang |roman| identifiers are index entries that appear after 789 \.{@@\^} in the \.{WEB} file. 790 791 \yskip\hang |wildcard| identifiers are index entries that appear after 792 \.{@@:} in the \.{WEB} file. 793 794 \yskip\hang |typewriter| identifiers are index entries that appear after 795 \.{@@.} in the \.{WEB} file. 796 797 \yskip\hang |array_like|, |begin_like|, \dots, |var_like| 798 identifiers are \PASCAL\ reserved words whose |ilk| explains how they are 799 to be treated when \PASCAL\ code is being formatted. 800 801 \yskip\hang Finally, if |c| is an ASCII code, an |ilk| equal to 802 |char_like+c| denotes a reserved word that will be converted to character 803 |c|. 804 805 @d normal=0 {ordinary identifiers have |normal| ilk} 806 @d roman=1 {normal index entries have |roman| ilk} 807 @d wildcard=2 {user-formatted index entries have |wildcard| ilk} 808 @d typewriter=3 {`typewriter type' entries have |typewriter| ilk} 809 @d reserved(#)==(ilk[#]>typewriter) {tells if a name is a reserved word} 810 @d array_like=4 {\&{array}, \&{file}, \&{set}} 811 @d begin_like=5 {\&{begin}} 812 @d case_like=6 {\&{case}} 813 @d const_like=7 {\&{const}, \&{label}, \&{type}} 814 @d div_like=8 {\&{div}, \&{mod}} 815 @d do_like=9 {\&{do}, \&{of}, \&{then}} 816 @d else_like=10 {\&{else}} 817 @d end_like=11 {\&{end}} 818 @d for_like=12 {\&{for}, \&{while}, \&{with}} 819 @d goto_like=13 {\&{goto}, \&{packed}} 820 @d if_like=14 {\&{if}} 821 @d intercal_like=15 {not used} 822 @d nil_like=16 {\&{nil}} 823 @d proc_like=17 {\&{function}, \&{procedure}, \&{program}} 824 @d record_like=18 {\&{record}} 825 @d repeat_like=19 {\&{repeat}} 826 @d to_like=20 {\&{downto}, \&{to}} 827 @d until_like=21 {\&{until}} 828 @d var_like=22 {\&{var}} 829 @d loop_like=23 {\&{loop}, \&{xclause}} 830 @d char_like=24 {\&{and}, \&{or}, \&{not}, \&{in}} 831 832 @ The names of modules are stored in |byte_mem| together 833 with the identifier names, but a hash table is not used for them because 834 \.{WEAVE} needs to be able to recognize a module name when given a prefix of 835 that name. A conventional binary search tree is used to retrieve module names, 836 with fields called |llink| and |rlink| in place of |link| and |ilk|. The 837 root of this tree is |rlink[0]|. 838 839 @d llink==link {left link in binary search tree for module names} 840 @d rlink==ilk {right link in binary search tree for module names} 841 @d root==rlink[0] {the root of the binary search tree for module names} 842 843 @<Set init...@>= 844 root:=0; {the binary search tree starts out with nothing in it} 845 846 @ Here is a little procedure that prints the text of a given name on the 847 user's terminal. 848 849 @p procedure print_id(@!p:name_pointer); {print identifier or module name} 850 var k:0..max_bytes; {index into |byte_mem|} 851 @!w:0..ww-1; {row of |byte_mem|} 852 begin if p>=name_ptr then print('IMPOSSIBLE') 853 else begin w:=p mod ww; 854 for k:=byte_start[p] to byte_start[p+ww]-1 do 855 print(xchr[byte_mem[w,k]]); 856 end; 857 end; 858 859 @ We keep track of the current module number in 860 |module_count|, which is the total number of modules that have started. 861 Modules which have been altered by a change file entry 862 have their |changed_module| flag turned on during the first phase. 863 864 @<Globals...@>= 865 @!module_count:0..max_modules; {the current module number} 866 @!changed_module: packed array [0..max_modules] of boolean; {is it changed?} 867 @!change_exists: boolean; {has any module changed?} 868 869 @ The other large memory area in \.{WEAVE} keeps the cross-reference data. 870 All uses of the name |p| are recorded in a linked list beginning at 871 |xref[p]|, which points into the |xmem| array. Entries in |xmem| consist 872 of two sixteen-bit items per word, called the |num| and |xlink| fields. 873 If |x| is an index into |xmem|, reached from name |p|, the value of |num(x)| 874 is either a module number where |p| is used, or it is |def_flag| plus a 875 module number where |p| is defined; and |xlink(x)| points to the next such 876 cross reference for |p|, if any. This list of cross references is in 877 decreasing order by module number. The current number of cross references 878 is |xref_ptr|. 879 880 The global variable |xref_switch| is set either to |def_flag| or to zero, 881 depending on whether the next cross reference to an identifier is to be 882 underlined or not in the index. This switch is set to |def_flag| when 883 \.{@@!} or \.{@@d} or \.{@@f} is scanned, and it is cleared to zero when 884 the next identifier or index entry cross reference has been made. Similarly, 885 the global variable |mod_xref_switch| is either |def_flag| or zero, depending 886 on whether a module name is being defined or used. 887 888 @d num(#)==xmem[#].num_field 889 @d xlink(#)==xmem[#].xlink_field 890 @d def_flag=10240 {must be strictly larger than |max_modules|} 891 892 @ @<Types...@>= 893 @!xref_number=0..max_refs; 894 895 @ @<Globals...@>= 896 @!xmem:array[xref_number] of packed record@t@>@/ 897 @!num_field: sixteen_bits; {module number plus zero or |def_flag|} 898 @!xlink_field: sixteen_bits; {pointer to the previous cross reference} 899 end; 900 @!xref_ptr:xref_number; {the largest occupied position in |xmem|} 901 @!xref_switch,@!mod_xref_switch:0..def_flag; {either zero or |def_flag|} 902 903 @ @<Set init...@>=xref_ptr:=0; xref_switch:=0; mod_xref_switch:=0; num(0):=0; 904 xref[0]:=0; {cross references to undefined modules} 905 906 @ A new cross reference for an identifier is formed by calling |new_xref|, 907 which discards duplicate entries and ignores non-underlined references 908 to one-letter identifiers or \PASCAL's reserved words. 909 910 @d append_xref(#)==if xref_ptr=max_refs then overflow('cross reference') 911 else begin incr(xref_ptr); num(xref_ptr):=#; 912 end 913 914 @p procedure new_xref(@!p:name_pointer); 915 label exit; 916 var q:xref_number; {pointer to previous cross reference} 917 @!m,@!n: sixteen_bits; {new and previous cross-reference value} 918 begin if (reserved(p)or(byte_start[p]+1=byte_start[p+ww]))and 919 (xref_switch=0) then return; 920 m:=module_count+xref_switch; xref_switch:=0; q:=xref[p]; 921 if q>0 then 922 begin n:=num(q); 923 if (n=m)or(n=m+def_flag) then return 924 else if m=n+def_flag then 925 begin num(q):=m; return; 926 end; 927 end; 928 append_xref(m); xlink(xref_ptr):=q; xref[p]:=xref_ptr; 929 exit: end; 930 931 @ The cross reference lists for module names are slightly different. Suppose 932 that a module name is defined in modules $m_1$, \dots, $m_k$ and used in 933 modules $n_1$, \dots, $n_l$. Then its list will contain $m_1+|def_flag|$, 934 $m_k+|def_flag|$, \dots, $m_2+|def_flag|$, $n_l$, \dots, $n_1$, in 935 this order. After Phase II, however, the order will be 936 $m_1+|def_flag|$, \dots, $m_k+|def_flag|$, $n_1$, \dots, $n_l$. 937 938 @p procedure new_mod_xref(@!p:name_pointer); 939 var q,@!r:xref_number; {pointers to previous cross references} 940 begin q:=xref[p]; r:=0; 941 if q>0 then 942 begin if mod_xref_switch=0 then while num(q)>=def_flag do 943 begin r:=q; q:=xlink(q); 944 end 945 else if num(q)>=def_flag then 946 begin r:=q; q:=xlink(q); 947 end; 948 end; 949 append_xref(module_count+mod_xref_switch); xlink(xref_ptr):=q; 950 mod_xref_switch:=0; 951 if r=0 then xref[p]:=xref_ptr 952 else xlink(r):=xref_ptr; 953 end; 954 955 @ A third large area of memory is used for sixteen-bit `tokens', which appear 956 in short lists similar to the strings of characters in |byte_mem|. Token lists 957 are used to contain the result of \PASCAL\ code translated into \TeX\ form; 958 further details about them will be explained later. A |text_pointer| variable 959 is an index into |tok_start|. 960 961 @<Types...@>= 962 @!text_pointer=0..max_texts; {identifies a token list} 963 964 @ The first position of |tok_mem| 965 that is unoccupied by replacement text is called |tok_ptr|, and the first 966 unused location of |tok_start| is called |text_ptr|. 967 Thus, we usually have |tok_start[text_ptr]=tok_ptr|. 968 969 @<Glob...@>= 970 @t\hskip1em@>@!tok_mem: packed array [0..max_toks] of sixteen_bits; {tokens} 971 @t\hskip1em@>@!tok_start: array [text_pointer] of sixteen_bits; 972 {directory into |tok_mem|} 973 @t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|} 974 @t\hskip1em@>@!tok_ptr:0..max_toks; {first unused position in |tok_mem|} 975 stat@!max_tok_ptr,@!max_txt_ptr:0..max_toks; {largest values occurring} 976 tats 977 978 @ @<Set init...@>= 979 tok_ptr:=1; text_ptr:=1; tok_start[0]:=1; tok_start[1]:=1; 980 stat max_tok_ptr:=1; max_txt_ptr:=1;@+tats 981 982 @* Searching for identifiers. 983 The hash table described above is updated by the |id_lookup| procedure, 984 which finds a given identifier and returns a pointer to its index in 985 |byte_start|. The identifier is supposed to match character by character 986 and it is also supposed to have a given |ilk| code; the same name may be 987 present more than once if it is supposed to appear in the index with 988 different typesetting conventions. 989 If the identifier was not already present, it is inserted into the table. 990 991 Because of the way \.{WEAVE}'s scanning mechanism works, it is most convenient 992 to let |id_lookup| search for an identifier that is present in the |buffer| 993 array. Two other global variables specify its position in the buffer: the 994 first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|. 995 996 @<Glob...@>= 997 @!id_first:0..long_buf_size; {where the current identifier begins in the buffer} 998 @!id_loc:0..long_buf_size; {just after the current identifier in the buffer} 999 @# 1000 @!hash:array [0..hash_size] of sixteen_bits; {heads of hash lists} 1001 1002 @ Initially all the hash lists are empty. 1003 1004 @<Local variables for init...@>= 1005 @!h:0..hash_size; {index into hash-head array} 1006 1007 @ @<Set init...@>= 1008 for h:=0 to hash_size-1 do hash[h]:=0; 1009 1010 @ Here now is the main procedure for finding identifiers (and index 1011 entries). The parameter |t| is set to the desired |ilk| code. The 1012 identifier must either have |ilk=t|, or we must have 1013 |t=normal| and the identifier must be a reserved word. 1014 1015 @p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier} 1016 label found; 1017 var i:0..long_buf_size; {index into |buffer|} 1018 @!h:0..hash_size; {hash code} 1019 @!k:0..max_bytes; {index into |byte_mem|} 1020 @!w:0..ww-1; {row of |byte_mem|} 1021 @!l:0..long_buf_size; {length of the given identifier} 1022 @!p:name_pointer; {where the identifier is being sought} 1023 begin l:=id_loc-id_first; {compute the length} 1024 @<Compute the hash code |h|@>; 1025 @<Compute the name location |p|@>; 1026 if p=name_ptr then @<Enter a new name into the table at position |p|@>; 1027 id_lookup:=p; 1028 end; 1029 1030 @ A simple hash code is used: If the sequence of 1031 ASCII codes is $c_1c_2\ldots c_n$, its hash value will be 1032 $$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$ 1033 1034 @<Compute the hash...@>= 1035 h:=buffer[id_first]; i:=id_first+1; 1036 while i<id_loc do 1037 begin h:=(h+h+buffer[i]) mod hash_size; incr(i); 1038 end 1039 1040 @ If the identifier is new, it will be placed in position |p=name_ptr|, 1041 otherwise |p| will point to its existing location. 1042 1043 @<Compute the name location...@>= 1044 p:=hash[h]; 1045 while p<>0 do 1046 begin if (length(p)=l)and((ilk[p]=t)or((t=normal)and reserved(p))) then 1047 @<Compare name |p| with current identifier, 1048 |goto found| if equal@>; 1049 p:=link[p]; 1050 end; 1051 p:=name_ptr; {the current identifier is new} 1052 link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list} 1053 found: 1054 1055 @ @<Compare name |p|...@>= 1056 begin i:=id_first; k:=byte_start[p]; w:=p mod ww; 1057 while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do 1058 begin incr(i); incr(k); 1059 end; 1060 if i=id_loc then goto found; {all characters agree} 1061 end 1062 1063 @ When we begin the following segment of the program, |p=name_ptr|. 1064 1065 @<Enter a new name...@>= 1066 begin w:=name_ptr mod ww; 1067 if byte_ptr[w]+l>max_bytes then overflow('byte memory'); 1068 if name_ptr+ww>max_names then overflow('name'); 1069 i:=id_first; k:=byte_ptr[w]; {get ready to move the identifier into |byte_mem|} 1070 while i<id_loc do 1071 begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i); 1072 end; 1073 byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr); 1074 ilk[p]:=t; xref[p]:=0; 1075 end 1076 1077 @* Initializing the table of reserved words. 1078 We have to get \PASCAL's reserved words into the hash table, and the 1079 simplest way to do this is to insert them every time \.{WEAVE} is run. 1080 A few macros permit us to do the initialization with a compact program. 1081 1082 @d sid9(#)==buffer[9]:=#;cur_name:=id_lookup 1083 @d sid8(#)==buffer[8]:=#;sid9 1084 @d sid7(#)==buffer[7]:=#;sid8 1085 @d sid6(#)==buffer[6]:=#;sid7 1086 @d sid5(#)==buffer[5]:=#;sid6 1087 @d sid4(#)==buffer[4]:=#;sid5 1088 @d sid3(#)==buffer[3]:=#;sid4 1089 @d sid2(#)==buffer[2]:=#;sid3 1090 @d sid1(#)==buffer[1]:=#;sid2 1091 @d id2==id_first:=8; sid8 1092 @d id3==id_first:=7; sid7 1093 @d id4==id_first:=6; sid6 1094 @d id5==id_first:=5; sid5 1095 @d id6==id_first:=4; sid4 1096 @d id7==id_first:=3; sid3 1097 @d id8==id_first:=2; sid2 1098 @d id9==id_first:=1; sid1 1099 1100 @<Globals...@>= 1101 @!cur_name:name_pointer; {points to the identifier just inserted} 1102 1103 @ The intended use of the macros above might not be immediately obvious, 1104 but the riddle is answered by the following: 1105 1106 @<Store all the reserved words@>= 1107 id_loc:=10;@/ 1108 id3("a")("n")("d")(char_like+and_sign);@/ 1109 id5("a")("r")("r")("a")("y")(array_like);@/ 1110 id5("b")("e")("g")("i")("n")(begin_like);@/ 1111 id4("c")("a")("s")("e")(case_like);@/ 1112 id5("c")("o")("n")("s")("t")(const_like);@/ 1113 id3("d")("i")("v")(div_like);@/ 1114 id2("d")("o")(do_like);@/ 1115 id6("d")("o")("w")("n")("t")("o")(to_like);@/ 1116 id4("e")("l")("s")("e")(else_like);@/ 1117 id3("e")("n")("d")(end_like);@/ 1118 id4("f")("i")("l")("e")(array_like);@/ 1119 id3("f")("o")("r")(for_like);@/ 1120 id8("f")("u")("n")("c")("t")("i")("o")("n")(proc_like);@/ 1121 id4("g")("o")("t")("o")(goto_like);@/ 1122 id2("i")("f")(if_like);@/ 1123 id2("i")("n")(char_like+set_element_sign);@/ 1124 id5("l")("a")("b")("e")("l")(const_like);@/ 1125 id3("m")("o")("d")(div_like);@/ 1126 id3("n")("i")("l")(nil_like);@/ 1127 id3("n")("o")("t")(char_like+not_sign);@/ 1128 id2("o")("f")(do_like);@/ 1129 id2("o")("r")(char_like+or_sign);@/ 1130 id6("p")("a")("c")("k")("e")("d")(goto_like);@/ 1131 id9("p")("r")("o")("c")("e")("d")("u")("r")("e")(proc_like);@/ 1132 id7("p")("r")("o")("g")("r")("a")("m")(proc_like);@/ 1133 id6("r")("e")("c")("o")("r")("d")(record_like);@/ 1134 id6("r")("e")("p")("e")("a")("t")(repeat_like);@/ 1135 id3("s")("e")("t")(array_like);@/ 1136 id4("t")("h")("e")("n")(do_like);@/ 1137 id2("t")("o")(to_like);@/ 1138 id4("t")("y")("p")("e")(const_like);@/ 1139 id5("u")("n")("t")("i")("l")(until_like);@/ 1140 id3("v")("a")("r")(var_like);@/ 1141 id5("w")("h")("i")("l")("e")(for_like);@/ 1142 id4("w")("i")("t")("h")(for_like);@/ 1143 id7("x")("c")("l")("a")("u")("s")("e")(loop_like);@/ 1144 1145 @* Searching for module names. 1146 The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the 1147 search tree, after inserting it if necessary, and returns a pointer to 1148 where it was found. 1149 1150 @<Glob...@>= 1151 @!mod_text:array [0..longest_name] of ASCII_code; {name being sought for} 1152 1153 @ According to the rules of \.{WEB}, no module name 1154 should be a proper prefix of another, so a ``clean'' comparison should 1155 occur between any two names. The result of |mod_lookup| is 0 if this 1156 prefix condition is violated. An error message is printed when such violations 1157 are detected during phase two of \.{WEAVE}. 1158 1159 @d less=0 {the first name is lexicographically less than the second} 1160 @d equal=1 {the first name is equal to the second} 1161 @d greater=2 {the first name is lexicographically greater than the second} 1162 @d prefix=3 {the first name is a proper prefix of the second} 1163 @d extension=4 {the first name is a proper extension of the second} 1164 1165 @p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name} 1166 label found; 1167 var c:less..extension; {comparison between two names} 1168 @!j:0..longest_name; {index into |mod_text|} 1169 @!k:0..max_bytes; {index into |byte_mem|} 1170 @!w:0..ww-1; {row of |byte_mem|} 1171 @!p:name_pointer; {current node of the search tree} 1172 @!q:name_pointer; {father of node |p|} 1173 begin c:=greater; q:=0; p:=root; 1174 while p<>0 do 1175 begin @<Set variable |c| to the result of comparing the given name 1176 to name |p|@>; 1177 q:=p; 1178 if c=less then p:=llink[q] 1179 else if c=greater then p:=rlink[q] 1180 else goto found; 1181 end; 1182 @<Enter a new module name into the tree@>; 1183 found: if c<>equal then 1184 begin err_print('! Incompatible section names'); p:=0; 1185 @.Incompatible section names@> 1186 end; 1187 mod_lookup:=p; 1188 end; 1189 1190 @ @<Enter a new module name...@>= 1191 w:=name_ptr mod ww; k:=byte_ptr[w]; 1192 if k+l>max_bytes then overflow('byte memory'); 1193 if name_ptr>max_names-ww then overflow('name'); 1194 p:=name_ptr; 1195 if c=less then llink[q]:=p else rlink[q]:=p; 1196 llink[p]:=0; rlink[p]:=0; xref[p]:=0; c:=equal; 1197 for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j]; 1198 byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr); 1199 1200 @ @<Set variable |c|...@>= 1201 begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1; 1202 while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do 1203 begin incr(k); incr(j); 1204 end; 1205 if k=byte_start[p+ww] then 1206 if j>l then c:=equal 1207 else c:=extension 1208 else if j>l then c:=prefix 1209 else if mod_text[j]<byte_mem[w,k] then c:=less 1210 else c:=greater; 1211 end 1212 1213 @ The |prefix_lookup| procedure is supposed to find exactly one module 1214 name that has |mod_text[1..l]| as a prefix. Actually the algorithm 1215 silently accepts also the situation that some module name is a prefix of 1216 |mod_text[1..l]|, because the user who painstakingly typed in more than 1217 necessary probably doesn't want to be told about the wasted effort. 1218 1219 Recall that error messages are not printed during phase one. It is 1220 possible that the |prefix_lookup| procedure will fail on the first pass, 1221 because there is no match, yet the second pass might detect no error if a 1222 matching module name has occurred after the offending prefix. In such a 1223 case the cross-reference information will be incorrect and \.{WEAVE} will 1224 report no error. However, such a mistake will be detected by the 1225 \.{TANGLE} processor. 1226 1227 @p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension} 1228 var c:less..extension; {comparison between two names} 1229 @!count:0..max_names; {the number of hits} 1230 @!j:0..longest_name; {index into |mod_text|} 1231 @!k:0..max_bytes; {index into |byte_mem|} 1232 @!w:0..ww-1; {row of |byte_mem|} 1233 @!p:name_pointer; {current node of the search tree} 1234 @!q:name_pointer; {another place to resume the search after one branch is done} 1235 @!r:name_pointer; {extension found} 1236 begin q:=0; p:=root; count:=0; r:=0; {begin search at root of tree} 1237 while p<>0 do 1238 begin @<Set variable |c| to the result of comparing...@>; 1239 if c=less then p:=llink[p] 1240 else if c=greater then p:=rlink[p] 1241 else begin r:=p; incr(count); q:=rlink[p]; p:=llink[p]; 1242 end; 1243 if p=0 then 1244 begin p:=q; q:=0; 1245 end; 1246 end; 1247 if count<>1 then 1248 if count=0 then err_print('! Name does not match') 1249 @.Name does not match@> 1250 else err_print('! Ambiguous prefix'); 1251 @.Ambiguous prefix@> 1252 prefix_lookup:=r; {the result will be 0 if there was no match} 1253 end; 1254 1255 @* Lexical scanning. 1256 Let us now consider the subroutines that read the \.{WEB} source file 1257 and break it into meaningful units. There are four such procedures: 1258 One simply skips to the next `\.{@@\ }' or `\.{@@*}' that begins a 1259 module; another passes over the \TeX\ text at the beginning of a 1260 module; the third passes over the \TeX\ text in a \PASCAL\ comment; 1261 and the last, which is the most interesting, gets the next token of 1262 a \PASCAL\ text. 1263 1264 @ But first we need to consider the low-level routine |get_line| 1265 that takes care of merging |change_file| into |web_file|. The |get_line| 1266 procedure also updates the line numbers for error messages. 1267 1268 @<Globals...@>= 1269 @!ii:integer; {general purpose |for| loop variable in the outer block} 1270 @!line:integer; {the number of the current line in the current file} 1271 @!other_line:integer; {the number of the current line in the input file that 1272 is not currently being read} 1273 @!temp_line:integer; {used when interchanging |line| with |other_line|} 1274 @!limit:0..long_buf_size; {the last character position occupied in the buffer} 1275 @!loc:0..long_buf_size; {the next character position to be read from the buffer} 1276 @!input_has_ended: boolean; {if |true|, there is no more input} 1277 @!changing: boolean; {if |true|, the current line is from |change_file|} 1278 @!change_pending: boolean; {if |true|, the current change is not yet 1279 recorded in |changed_module[module_count]|} 1280 1281 @ As we change |changing| from |true| to |false| and back again, we must 1282 remember to swap the values of |line| and |other_line| so that the |err_print| 1283 routine will be sure to report the correct line number. 1284 1285 @d change_changing== 1286 changing := not changing; 1287 temp_line:=other_line; other_line:=line; line:=temp_line 1288 {|line @t$\null\BA\null$@> other_line|} 1289 1290 @ When |changing| is |false|, the next line of |change_file| is kept in 1291 |change_buffer[0..change_limit]|, for purposes of comparison with the next 1292 line of |web_file|. After the change file has been completely input, we 1293 set |change_limit:=0|, so that no further matches will be made. 1294 1295 @<Globals...@>= 1296 @!change_buffer:array[0..buf_size] of ASCII_code; 1297 @!change_limit:0..buf_size; {the last position occupied in |change_buffer|} 1298 1299 @ Here's a simple function that checks if the two buffers are different. 1300 1301 @p function lines_dont_match:boolean; 1302 label exit; 1303 var k:0..buf_size; {index into the buffers} 1304 begin lines_dont_match:=true; 1305 if change_limit<>limit then return; 1306 if limit>0 then 1307 for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return; 1308 lines_dont_match:=false; 1309 exit: end; 1310 1311 @ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation 1312 for the next matching operation. Since blank lines in the change file are 1313 not used for matching, we have |(change_limit=0)and not changing| if and 1314 only if the change file is exhausted. This procedure is called only 1315 when |changing| is true; hence error messages will be reported correctly. 1316 1317 @p procedure prime_the_change_buffer; 1318 label continue, done, exit; 1319 var k:0..buf_size; {index into the buffers} 1320 begin change_limit:=0; {this value will be used if the change file ends} 1321 @<Skip over comment lines in the change file; |return| if end of file@>; 1322 @<Skip to the next nonblank line; |return| if end of file@>; 1323 @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>; 1324 exit: end; 1325 1326 @ While looking for a line that begins with \.{@@x} in the change file, 1327 we allow lines that begin with \.{@@}, as long as they don't begin with 1328 \.{@@y} or \.{@@z} (which would probably indicate that the change file is 1329 fouled up). 1330 1331 @<Skip over comment lines in the change file...@>= 1332 loop@+ begin incr(line); 1333 if not input_ln(change_file) then return; 1334 if limit<2 then goto continue; 1335 if buffer[0]<>"@@" then goto continue; 1336 if (buffer[1]>="X")and(buffer[1]<="Z") then 1337 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 1338 if buffer[1]="x" then goto done; 1339 if (buffer[1]="y")or(buffer[1]="z") then 1340 begin loc:=2; err_print('! Where is the matching @@x?'); 1341 @.Where is the match...@> 1342 end; 1343 continue: end; 1344 done: 1345 1346 @ Here we are looking at lines following the \.{@@x}. 1347 1348 @<Skip to the next nonblank line...@>= 1349 repeat incr(line); 1350 if not input_ln(change_file) then 1351 begin err_print('! Change file ended after @@x'); 1352 @.Change file ended...@> 1353 return; 1354 end; 1355 until limit>0; 1356 1357 @ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>= 1358 begin change_limit:=limit; 1359 if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k]; 1360 end 1361 1362 @ The following procedure is used to see if the next change entry should 1363 go into effect; it is called only when |changing| is false. 1364 The idea is to test whether or not the current 1365 contents of |buffer| matches the current contents of |change_buffer|. 1366 If not, there's nothing more to do; but if so, a change is called for: 1367 All of the text down to the \.{@@y} is supposed to match. An error 1368 message is issued if any discrepancy is found. Then the procedure 1369 prepares to read the next line from |change_file|. 1370 1371 When a match is found, the current module is marked as changed unless 1372 the first line after the \.{@@x} and after the \.{@@y} both start with 1373 either |'@@*'| or |'@@ '| (possibly preceded by whitespace). 1374 1375 @d if_module_start_then_make_change_pending(#)== 1376 loc:=0; buffer[limit]:="!"; 1377 while (buffer[loc]=" ")or(buffer[loc]=tab_mark) do incr(loc); 1378 buffer[limit]:=" "; 1379 if buffer[loc]="@@" then 1380 if (buffer[loc+1]="*") or 1381 (buffer[loc+1]=" ") or (buffer[loc+1]=tab_mark) then 1382 change_pending:=# 1383 1384 @p procedure check_change; {switches to |change_file| if the buffers match} 1385 label exit; 1386 var n:integer; {the number of discrepancies found} 1387 @!k:0..buf_size; {index into the buffers} 1388 begin if lines_dont_match then return; 1389 change_pending:=false; 1390 if not changed_module[module_count] then 1391 begin if_module_start_then_make_change_pending(true); 1392 if not change_pending then changed_module[module_count]:=true; 1393 end; 1394 n:=0; 1395 loop@+ begin change_changing; {now it's |true|} 1396 incr(line); 1397 if not input_ln(change_file) then 1398 begin err_print('! Change file ended before @@y'); 1399 @.Change file ended...@> 1400 change_limit:=0; change_changing; {|false| again} 1401 return; 1402 end; 1403 @<If the current line starts with \.{@@y}, 1404 report any discrepancies and |return|@>; 1405 @<Move |buffer| and |limit|...@>; 1406 change_changing; {now it's |false|} 1407 incr(line); 1408 if not input_ln(web_file) then 1409 begin err_print('! WEB file ended during a change'); 1410 @.WEB file ended...@> 1411 input_has_ended:=true; return; 1412 end; 1413 if lines_dont_match then incr(n); 1414 end; 1415 exit: end; 1416 1417 @ @<If the current line starts with \.{@@y}...@>= 1418 if limit>1 then if buffer[0]="@@" then 1419 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 1420 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 1421 if (buffer[1]="x")or(buffer[1]="z") then 1422 begin loc:=2; err_print('! Where is the matching @@y?'); 1423 @.Where is the match...@> 1424 end 1425 else if buffer[1]="y" then 1426 begin if n>0 then 1427 begin loc:=2; err_print('! Hmm... ',n:1, 1428 ' of the preceding lines failed to match'); 1429 @.Hmm... n of the preceding...@> 1430 end; 1431 return; 1432 end; 1433 end 1434 1435 @ The |reset_input| procedure, which gets \.{WEAVE} ready to read the 1436 user's \.{WEB} input, is used at the beginning of phases one and two. 1437 1438 @p procedure reset_input; 1439 begin open_input; line:=0; other_line:=0;@/ 1440 changing:=true; prime_the_change_buffer; change_changing;@/ 1441 limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false; 1442 end; 1443 1444 @ The |get_line| procedure is called when |loc>limit|; it puts the next 1445 line of merged input into the buffer and updates the other variables 1446 appropriately. A space is placed at the right end of the line. 1447 1448 @p procedure get_line; {inputs the next line} 1449 label restart; 1450 begin restart:if changing then 1451 @<Read from |change_file| and maybe turn off |changing|@>; 1452 if not changing then 1453 begin @<Read from |web_file| and maybe turn on |changing|@>; 1454 if changing then goto restart; 1455 end; 1456 loc:=0; buffer[limit]:=" "; 1457 end; 1458 1459 @ @<Read from |web_file|...@>= 1460 begin incr(line); 1461 if not input_ln(web_file) then input_has_ended:=true 1462 else if change_limit>0 then check_change; 1463 end 1464 1465 @ @<Read from |change_file|...@>= 1466 begin incr(line); 1467 if not input_ln(change_file) then 1468 begin err_print('! Change file ended without @@z'); 1469 @.Change file ended...@> 1470 buffer[0]:="@@"; buffer[1]:="z"; limit:=2; 1471 end; 1472 if limit>0 then {check if the change has ended} 1473 begin if change_pending then 1474 begin if_module_start_then_make_change_pending(false); 1475 if change_pending then 1476 begin changed_module[module_count]:=true; change_pending:=false; 1477 end; 1478 end; 1479 buffer[limit]:=" "; 1480 if buffer[0]="@@" then 1481 begin if (buffer[1]>="X")and(buffer[1]<="Z") then 1482 buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify} 1483 if (buffer[1]="x")or(buffer[1]="y") then 1484 begin loc:=2; err_print('! Where is the matching @@z?'); 1485 @.Where is the match...@> 1486 end 1487 else if buffer[1]="z" then 1488 begin prime_the_change_buffer; change_changing; 1489 end; 1490 end; 1491 end; 1492 end 1493 1494 @ At the end of the program, we will tell the user if the change file 1495 had a line that didn't match any relevant line in |web_file|. 1496 1497 @<Check that all changes have been read@>= 1498 if change_limit<>0 then {|changing| is false} 1499 begin for ii:=0 to change_limit-1 do buffer[ii]:=change_buffer[ii]; 1500 limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit; 1501 err_print('! Change file entry did not match'); 1502 @.Change file entry did not match@> 1503 end 1504 1505 @ Control codes in \.{WEB}, which begin with `\.{@@}', are converted 1506 into a numeric code designed to simplify \.{WEAVE}'s logic; for example, 1507 larger numbers are given to the control codes that denote more significant 1508 milestones, and the code of |new_module| should be the largest of 1509 all. Some of these numeric control codes take the place of ASCII 1510 control codes that will not otherwise appear in the output of the 1511 scanning routines. 1512 @^ASCII code@> 1513 1514 @d ignore=0 {control code of no interest to \.{WEAVE}} 1515 @d verbatim=@'2 {extended ASCII alpha will not appear} 1516 @d force_line=@'3 {extended ASCII beta will not appear} 1517 @d begin_comment=@'11 {ASCII tab mark will not appear} 1518 @d end_comment=@'12 {ASCII line feed will not appear} 1519 @d octal=@'14 {ASCII form feed will not appear} 1520 @d hex=@'15 {ASCII carriage return will not appear} 1521 @d double_dot=@'40 {ASCII space will not appear except in strings} 1522 @d no_underline=@'175 {this code will be intercepted without confusion} 1523 @d underline=@'176 {this code will be intercepted without confusion} 1524 @d param=@'177 {ASCII delete will not appear} 1525 @d xref_roman=@'203 {control code for `\.{@@\^}'} 1526 @d xref_wildcard=@'204 {control code for `\.{@@:}'} 1527 @d xref_typewriter=@'205 {control code for `\.{@@.}'} 1528 @d TeX_string=@'206 {control code for `\.{@@t}'} 1529 @d check_sum=@'207 {control code for `\.{@@\$}'} 1530 @d join=@'210 {control code for `\.{@@\&}'} 1531 @d thin_space=@'211 {control code for `\.{@@,}'} 1532 @d math_break=@'212 {control code for `\.{@@\char'174}'} 1533 @d line_break=@'213 {control code for `\.{@@/}'} 1534 @d big_line_break=@'214 {control code for `\.{@@\#}'} 1535 @d no_line_break=@'215 {control code for `\.{@@+}'} 1536 @d pseudo_semi=@'216 {control code for `\.{@@;}'} 1537 @d format=@'217 {control code for `\.{@@f}'} 1538 @d definition=@'220 {control code for `\.{@@d}'} 1539 @d begin_Pascal=@'221 {control code for `\.{@@p}'} 1540 @d module_name=@'222 {control code for `\.{@@<}'} 1541 @d new_module=@'223 {control code for `\.{@@\ }' and `\.{@@*}'} 1542 1543 @ Control codes are converted from ASCII to \.{WEAVE}'s internal 1544 representation by the |control_code| routine. 1545 1546 @p function control_code(@!c:ASCII_code):eight_bits; {convert |c| 1547 after \.{@@}} 1548 begin case c of 1549 "@@": control_code:="@@"; {`quoted' at sign} 1550 "'": control_code:=octal; {precedes octal constant} 1551 """": control_code:=hex; {precedes hexadecimal constant} 1552 "$": control_code:=check_sum; {precedes check sum constant} 1553 " ",tab_mark,"*": control_code:=new_module; {beginning of a new module} 1554 "=": control_code:=verbatim; 1555 "\": control_code:=force_line; 1556 "D","d": control_code:=definition; {macro definition} 1557 "F","f": control_code:=format; {format definition} 1558 "{": control_code:=begin_comment; {begin-comment delimiter} 1559 "}": control_code:=end_comment; {end-comment delimiter} 1560 "P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module} 1561 "&": control_code:=join; {concatenate two tokens} 1562 "<": control_code:=module_name; {beginning of a module name} 1563 ">": begin err_print('! Extra @@>'); control_code:=ignore; 1564 @.Extra \AT!>@> 1565 end; {end of module name should not be discovered in this way} 1566 "T","t": control_code:=TeX_string; {\TeX\ box within \PASCAL} 1567 "!": control_code:=underline; {set definition flag} 1568 "?": control_code:=no_underline; {reset definition flag} 1569 "^": control_code:=xref_roman; {index entry to be typeset normally} 1570 ":": control_code:=xref_wildcard; {index entry to be in user format} 1571 ".": control_code:=xref_typewriter; {index entry to be in typewriter type} 1572 ",": control_code:=thin_space; {puts extra space in \PASCAL\ format} 1573 "|": control_code:=math_break; {allows a break in a formula} 1574 "/": control_code:=line_break; {forces end-of-line in \PASCAL\ format} 1575 "#": control_code:=big_line_break; {forces end-of-line and some space besides} 1576 "+": control_code:=no_line_break; {cancels end-of-line down to single space} 1577 ";": control_code:=pseudo_semi; {acts like a semicolon, but is invisible} 1578 @t\4@>@<Special control codes allowed only when debugging@>@; 1579 othercases begin err_print('! Unknown control code'); control_code:=ignore; 1580 @.Unknown control code@> 1581 end 1582 endcases; 1583 end; 1584 1585 @ If \.{WEAVE} is compiled with debugging commands, one can write 1586 \.{@@2}, \.{@@1}, and \.{@@0} to turn tracing fully on, partly on, 1587 and off, respectively. 1588 @.\AT!2@> 1589 @.\AT!1@> 1590 1591 @<Special control codes...@>= 1592 @!debug@t@>@/ 1593 "0","1","2": begin tracing:=c-"0"; control_code:=ignore; 1594 end; 1595 gubed 1596 1597 @ The |skip_limbo| routine is used on the first pass to skip through 1598 portions of the input that are not in any modules, i.e., that precede 1599 the first module. After this procedure has been called, the value of 1600 |input_has_ended| will tell whether or not a new module has 1601 actually been found. 1602 1603 @p procedure skip_limbo; {skip to next module} 1604 label exit; 1605 var c:ASCII_code; {character following \.{@@}} 1606 begin loop if loc>limit then 1607 begin get_line; 1608 if input_has_ended then return; 1609 end 1610 else begin buffer[limit+1]:="@@"; 1611 while buffer[loc]<>"@@" do incr(loc); 1612 if loc<=limit then 1613 begin loc:=loc+2; c:=buffer[loc-1]; 1614 if (c=" ")or(c=tab_mark)or(c="*") then return; 1615 end; 1616 end; 1617 exit: end; 1618 1619 @ The |skip_TeX| routine is used on the first pass to skip through 1620 the \TeX\ code at the beginning of a module. It returns the next 1621 control code or `\v' found in the input. A |new_module| is 1622 assumed to exist at the very end of the file. 1623 1624 @p function skip_TeX: eight_bits; {skip past pure \TeX\ code} 1625 label done; 1626 var c:eight_bits; {control code found} 1627 begin loop begin if loc>limit then 1628 begin get_line; 1629 if input_has_ended then 1630 begin c:=new_module; goto done; 1631 end; 1632 end; 1633 buffer[limit+1]:="@@"; 1634 repeat c:=buffer[loc]; incr(loc); 1635 if c="|" then goto done; 1636 until c="@@"; 1637 if loc<=limit then 1638 begin c:=control_code(buffer[loc]); incr(loc); goto done; 1639 end; 1640 end; 1641 done:skip_TeX:=c; 1642 end; 1643 1644 @ The |skip_comment| routine is used on the first pass to skip 1645 through \TeX\ code in \PASCAL\ comments. The |bal| parameter 1646 tells how many left braces are assumed to have been scanned when 1647 this routine is called, and the procedure returns a corresponding 1648 value of |bal| at the point that scanning has stopped. Scanning 1649 stops either at a `\v' that introduces \PASCAL\ text, 1650 in which case the returned value is positive, or it stops at the 1651 end of the comment, in which case the returned value is zero. 1652 The scanning also stops in anomalous situations when the comment 1653 doesn't end or when it contains an illegal use of \.{@@}. 1654 One should call |skip_comment(1)| when beginning to scan a comment. 1655 1656 @p function skip_comment(@!bal:eight_bits):eight_bits; {skips \TeX\ 1657 code in comments} 1658 label done; 1659 var c:ASCII_code; {the current character} 1660 begin loop begin if loc>limit then 1661 begin get_line; 1662 if input_has_ended then 1663 begin bal:=0; goto done; 1664 end; {an error message will occur in phase two} 1665 end; 1666 c:=buffer[loc]; incr(loc); 1667 if c="|" then goto done; 1668 @<Do special things when |c="@@", "\", "{", "}"|; |goto done| at end@>; 1669 end; 1670 done: skip_comment:=bal; 1671 end; 1672 1673 @ @<Do special things when |c="@@"...@>= 1674 if c="@@" then 1675 begin c:=buffer[loc]; 1676 if (c<>" ")and(c<>tab_mark)and(c<>"*") then incr(loc) 1677 else begin decr(loc); bal:=0; goto done; 1678 end {an error message will occur in phase two} 1679 end 1680 else if (c="\")and(buffer[loc]<>"@@") then incr(loc) 1681 else if c="{" then incr(bal) 1682 else if c="}" then 1683 begin decr(bal); 1684 if bal=0 then goto done; 1685 end 1686 1687 @* Inputting the next token. 1688 As stated above, \.{WEAVE}'s most interesting lexical scanning routine is the 1689 |get_next| function that inputs the next token of \PASCAL\ input. However, 1690 |get_next| is not especially complicated. 1691 1692 The result of |get_next| is either an ASCII code for some special character, 1693 or it is a special code representing a pair of characters (e.g., `\.{:=}' 1694 or `\.{..}'), or it is the numeric value computed by the |control_code| 1695 procedure, or it is one of the following special codes: 1696 1697 \yskip\hang |exponent|: The `\.E' in a real constant. 1698 1699 \yskip\hang |identifier|: In this case the global variables |id_first| 1700 and |id_loc| will have been set to the appropriate values needed by the 1701 |id_lookup| routine. 1702 1703 \yskip\hang |string|: In this case the global variables |id_first| and 1704 |id_loc| will have been set to the beginning and ending-plus-one locations 1705 in the buffer. The string ends with the first reappearance of its initial 1706 delimiter; thus, for example, $$\.{\'This isn\'\'t a single string\'}$$ 1707 will be treated as two consecutive strings, the first being \.{\'This 1708 isn\'}. 1709 1710 \yskip\noindent Furthermore, some of the control codes cause 1711 |get_next| to take additional actions: 1712 1713 \yskip\hang |xref_roman|, |xref_wildcard|, 1714 |xref_typewriter|, |TeX_string|: The values of 1715 |id_first| and |id_loc| will be set so that the string in question appears 1716 in |buffer[id_first..(id_loc-1)]|. 1717 1718 \yskip\hang |module_name|: In this case the global variable |cur_module| will 1719 point to the |byte_start| entry for the module name that has just been scanned. 1720 1721 \yskip\noindent If |get_next| sees `\.{@@!}' or `\.{@@?}', 1722 it sets |xref_switch| to |def_flag| or zero and goes on to the next token. 1723 1724 A global variable called |scanning_hex| is set |true| during the time that 1725 the letters \.A through \.F should be treated as if they were digits. 1726 1727 @d exponent=@'200 {\.E or \.e following a digit} 1728 @d string=@'201 {\PASCAL\ string or \.{WEB} precomputed string} 1729 @d identifier=@'202 {\PASCAL\ identifier or reserved word} 1730 1731 @<Globals...@>= 1732 @!cur_module: name_pointer; {name of module just scanned} 1733 @!scanning_hex: boolean; {are we scanning a hexadecimal constant?} 1734 1735 @ @<Set init...@>= 1736 scanning_hex:=false; 1737 1738 @ As one might expect, |get_next| consists mostly of a big switch 1739 that branches to the various special cases that can arise. 1740 1741 @d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14, 1742 #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,# 1743 1744 @p function get_next:eight_bits; {produces the next input token} 1745 label restart,done,found; 1746 var c:eight_bits; {the current character} 1747 @!d:eight_bits; {the next character} 1748 @!j,@!k:0..longest_name; {indices into |mod_text|} 1749 begin restart: if loc>limit then 1750 begin get_line; 1751 if input_has_ended then 1752 begin c:=new_module; goto found; 1753 end; 1754 end; 1755 c:=buffer[loc]; incr(loc); 1756 if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit, 1757 otherwise set |scanning_hex:=false|@>; 1758 case c of 1759 "A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>; 1760 "'","""": @<Get a string@>; 1761 "@@": @<Get control code and possible module name@>; 1762 @t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@; 1763 " ",tab_mark: goto restart; {ignore spaces and tabs} 1764 "}": begin err_print('! Extra }'); goto restart; 1765 @.Extra \}@> 1766 end; 1767 othercases if c>=128 then goto restart {ignore nonstandard characters} 1768 else do_nothing 1769 endcases; 1770 found:@!debug if trouble_shooting then debug_help;@;@+gubed@/ 1771 get_next:=c; 1772 end; 1773 1774 @ @<Go to |found| if |c| is a hexadecimal digit...@>= 1775 if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found 1776 else scanning_hex:=false 1777 1778 @ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the 1779 respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used 1780 for \TeX\ comments in \PASCAL\ text. 1781 1782 @d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end 1783 1784 @<Compress two-symbol...@>= 1785 ".": if buffer[loc]="." then compress(double_dot) 1786 else if buffer[loc]=")" then compress("]"); 1787 ":": if buffer[loc]="=" then compress(left_arrow); 1788 "=": if buffer[loc]="=" then compress(equivalence_sign); 1789 ">": if buffer[loc]="=" then compress(greater_or_equal); 1790 "<": if buffer[loc]="=" then compress(less_or_equal) 1791 else if buffer[loc]=">" then compress(not_equal); 1792 "(": if buffer[loc]="*" then compress(begin_comment) 1793 else if buffer[loc]="." then compress("["); 1794 "*": if buffer[loc]=")" then compress(end_comment); 1795 1796 @ @<Get an identifier@>= 1797 begin if ((c="E")or(c="e"))and(loc>1) then 1798 if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=exponent; 1799 if c<>exponent then 1800 begin decr(loc); id_first:=loc; 1801 repeat incr(loc); d:=buffer[loc]; 1802 until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z"))and(d<>"_"); 1803 c:=identifier; id_loc:=loc; 1804 end; 1805 end 1806 1807 @ A string that starts and ends with single or double quote marks is 1808 scanned by the following piece of the program. 1809 1810 @<Get a string@>= 1811 begin id_first:=loc-1; 1812 repeat d:=buffer[loc]; incr(loc); 1813 if loc>limit then 1814 begin err_print('! String constant didn''t end'); 1815 @.String constant didn't end@> 1816 loc:=limit; d:=c; 1817 end; 1818 until d=c; 1819 id_loc:=loc; c:=string; 1820 end 1821 1822 @ After an \.{@@} sign has been scanned, the next character tells us 1823 whether there is more work to do. 1824 1825 @<Get control code and possible module name@>= 1826 begin c:=control_code(buffer[loc]); incr(loc); 1827 if c=underline then 1828 begin xref_switch:=def_flag; goto restart; 1829 end 1830 else if c=no_underline then 1831 begin xref_switch:=0; goto restart; 1832 end 1833 else if (c<=TeX_string)and(c>=xref_roman) then 1834 @<Scan to the next \.{@@>}@> 1835 else if c=hex then scanning_hex:=true 1836 else if c=module_name then 1837 @<Scan the module name and make |cur_module| point to it@> 1838 else if c=verbatim then @<Scan a verbatim string@>; 1839 end 1840 1841 @ The occurrence of a module name sets |xref_switch| to zero, 1842 because the module name might (for example) follow \&{var}. 1843 1844 @<Scan the module name...@>= 1845 begin @<Put module name into |mod_text[1..k]|@>; 1846 if k>3 then 1847 begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then 1848 cur_module:=prefix_lookup(k-3) 1849 else cur_module:=mod_lookup(k); 1850 end 1851 else cur_module:=mod_lookup(k); 1852 xref_switch:=0; 1853 end 1854 1855 @ Module names are placed into the |mod_text| array with consecutive spaces, 1856 tabs, and carriage-returns replaced by single spaces. There will be no 1857 spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate 1858 this, since the |mod_lookup| routine uses |mod_text[1]| as the first 1859 character of the name.) 1860 1861 @<Set init...@>=mod_text[0]:=" "; 1862 1863 @ @<Put module name...@>= 1864 k:=0; 1865 loop@+ begin if loc>limit then 1866 begin get_line; 1867 if input_has_ended then 1868 begin err_print('! Input ended in section name'); 1869 @.Input ended in section name@> 1870 loc:=1; goto done; 1871 end; 1872 end; 1873 d:=buffer[loc]; 1874 @<If end of name, |goto done|@>; 1875 incr(loc); if k<longest_name-1 then incr(k); 1876 if (d=" ")or(d=tab_mark) then 1877 begin d:=" "; if mod_text[k-1]=" " then decr(k); 1878 end; 1879 mod_text[k]:=d; 1880 end; 1881 done: @<Check for overlong name@>; 1882 if (mod_text[k]=" ")and(k>0) then decr(k) 1883 1884 @ @<If end of name,...@>= 1885 if d="@@" then 1886 begin d:=buffer[loc+1]; 1887 if d=">" then 1888 begin loc:=loc+2; goto done; 1889 end; 1890 if (d=" ")or(d=tab_mark)or(d="*") then 1891 begin err_print('! Section name didn''t end'); goto done; 1892 @.Section name didn't end@> 1893 end; 1894 incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again} 1895 end 1896 1897 @ @<Check for overlong name@>= 1898 if k>=longest_name-2 then 1899 begin print_nl('! Section name too long: '); 1900 @.Section name too long@> 1901 for j:=1 to 25 do print(xchr[mod_text[j]]); 1902 print('...'); mark_harmless; 1903 end 1904 1905 @ @<Scan to the next...@>= 1906 begin id_first:=loc; buffer[limit+1]:="@@"; 1907 while buffer[loc]<>"@@" do incr(loc); 1908 id_loc:=loc; 1909 if loc>limit then 1910 begin err_print('! Control text didn''t end'); loc:=limit; 1911 @.Control text didn't end@> 1912 end 1913 else begin loc:=loc+2; 1914 if buffer[loc-1]<>">" then 1915 err_print('! Control codes are forbidden in control text'); 1916 @.Control codes are forbidden...@> 1917 end; 1918 end 1919 1920 @ A verbatim \PASCAL\ string will be treated like ordinary strings, but 1921 with no surrounding delimiters. At the present point in the program we 1922 have |buffer[loc-1]=verbatim|; we must set |id_first| to the beginning 1923 of the string itself, and |id_loc| to its ending-plus-one location in the 1924 buffer. We also set |loc| to the position just after the ending delimiter. 1925 1926 @<Scan a verbatim string@>= 1927 begin id_first:=loc; incr(loc); 1928 buffer[limit+1]:="@@"; buffer[limit+2]:=">"; 1929 while (buffer[loc]<>"@@")or(buffer[loc+1]<>">") do incr(loc); 1930 if loc>=limit then err_print('! Verbatim string didn''t end'); 1931 @.Verbatim string didn't end@> 1932 id_loc:=loc; loc:=loc+2; 1933 end 1934 1935 @* Phase one processing. 1936 We now have accumulated enough subroutines to make it possible to carry out 1937 \.{WEAVE}'s first pass over the source file. If everything works right, 1938 both phase one and phase two of \.{WEAVE} will assign the same numbers to 1939 modules, and these numbers will agree with what \.{TANGLE} does. 1940 1941 The global variable |next_control| often contains the most recent output of 1942 |get_next|; in interesting cases, this will be the control code that 1943 ended a module or part of a module. 1944 1945 @<Glob...@>=@!next_control:eight_bits; {control code waiting to be acting upon} 1946 1947 @ The overall processing strategy in phase one has the following 1948 straightforward outline. 1949 1950 @<Phase I: Read all the user's text and store the cross references@>= 1951 phase_one:=true; phase_three:=false; 1952 reset_input; 1953 module_count:=0; changed_module[0]:=false; 1954 skip_limbo; change_exists:=false; 1955 while not input_has_ended do 1956 @<Store cross reference data for the current module@>; 1957 changed_module[module_count]:=change_exists; 1958 {the index changes if anything does} 1959 phase_one:=false; {prepare for second phase} 1960 @<Print error messages about unused or undefined module names@>; 1961 1962 @ @<Store cross reference data...@>= 1963 begin incr(module_count); 1964 if module_count=max_modules then overflow('section number'); 1965 changed_module[module_count]:=changing; 1966 {it will become |true| if any line changes} 1967 if buffer[loc-1]="*" then 1968 begin print('*',module_count:1); 1969 update_terminal; {print a progress report} 1970 end; 1971 @<Store cross references in the \TeX\ part of a module@>; 1972 @<Store cross references in the \(definition part of a module@>; 1973 @<Store cross references in the \PASCAL\ part of a module@>; 1974 if changed_module[module_count] then change_exists:=true; 1975 end 1976 1977 @ The |Pascal_xref| subroutine stores references to identifiers in 1978 \PASCAL\ text material beginning with the current value of |next_control| 1979 and continuing until |next_control| is `\.\{' or `\v', or until the next 1980 ``milestone'' is passed (i.e., |next_control>=format|). If 1981 |next_control>=format| when |Pascal_xref| is called, nothing will happen; 1982 but if |next_control="|"| upon entry, the procedure assumes that this is 1983 the `\v' preceding \PASCAL\ text that is to be processed. 1984 1985 The program uses the fact that our internal code numbers satisfy 1986 the relations |xref_roman=identifier+roman| and |xref_wildcard=identifier 1987 +wildcard| and |xref_typewriter=identifier+ 1988 typewriter| and |normal=0|. An implied `\.{@@!}' is inserted after 1989 \&{function}, \&{procedure}, \&{program}, and \&{var}. 1990 1991 @p procedure Pascal_xref; {makes cross references for \PASCAL\ identifiers} 1992 label exit; 1993 var p:name_pointer; {a referenced name} 1994 begin while next_control<format do 1995 begin if (next_control>=identifier)and 1996 (next_control<=xref_typewriter) then 1997 begin p:=id_lookup(next_control-identifier); new_xref(p); 1998 if (ilk[p]=proc_like)or(ilk[p]=var_like) then 1999 xref_switch:=def_flag; {implied `\.{@@!}'} 2000 end; 2001 next_control:=get_next; 2002 if (next_control="|")or(next_control="{") then return; 2003 end; 2004 exit:end; 2005 2006 @ The |outer_xref| subroutine is like |Pascal_xref| but it begins 2007 with |next_control<>"|"| and ends with |next_control>=format|. Thus, it 2008 handles \PASCAL\ text with embedded comments. 2009 2010 @p procedure outer_xref; {extension of |Pascal_xref|} 2011 var bal:eight_bits; {brace level in comment} 2012 begin while next_control<format do 2013 if next_control<>"{" then Pascal_xref 2014 else begin bal:=skip_comment(1); next_control:="|"; 2015 while bal>0 do 2016 begin Pascal_xref; 2017 if next_control="|" then bal:=skip_comment(bal) 2018 else bal:=0; {an error will be reported in phase two} 2019 end; 2020 end; 2021 end; 2022 2023 @ In the \TeX\ part of a module, cross reference entries are made only for 2024 the identifiers in \PASCAL\ texts enclosed in \pb, or for control texts 2025 enclosed in \.{@@\^}$\,\ldots\,$\.{@@>} or \.{@@.}$\,\ldots\,$\.{@@>} 2026 or \.{@@:}$\,\ldots\,$\.{@@>}. 2027 2028 @<Store cross references in the \T...@>= 2029 repeat next_control:=skip_TeX; 2030 case next_control of 2031 underline: xref_switch:=def_flag; 2032 no_underline: xref_switch:=0; 2033 "|": Pascal_xref; 2034 xref_roman, xref_wildcard, xref_typewriter, module_name: 2035 begin loc:=loc-2; next_control:=get_next; {scan to \.{@@>}} 2036 if next_control<>module_name then 2037 new_xref(id_lookup(next_control-identifier)); 2038 end; 2039 othercases do_nothing 2040 endcases; 2041 until next_control>=format 2042 2043 @ During the definition and \PASCAL\ parts of a module, cross references 2044 are made for all identifiers except reserved words; however, the 2045 identifiers in a format definition are referenced even if they are 2046 reserved. The \TeX\ code in comments is, of course, ignored, except for 2047 \PASCAL\ portions enclosed in \pb; the text of a module name is skipped 2048 entirely, even if it contains \pb\ constructions. 2049 2050 The variables |lhs| and |rhs| point to the respective identifiers involved 2051 in a format definition. 2052 2053 @<Global...@>= 2054 @!lhs,@!rhs:name_pointer; {indices into |byte_start| for format identifiers} 2055 2056 @ When we get to the following code we have |next_control>=format|. 2057 2058 @<Store cross references in the \(d...@>= 2059 while next_control<=definition do {|format| or |definition|} 2060 begin xref_switch:=def_flag; {implied \.{@@!}} 2061 if next_control=definition then next_control:=get_next 2062 else @<Process a format definition@>; 2063 outer_xref; 2064 end 2065 2066 @ Error messages for improper format definitions will be issued in phase 2067 two. Our job in phase one is to define the |ilk| of a properly formatted 2068 identifier, and to fool the |new_xref| routine into thinking that the 2069 identifier on the right-hand side of the format definition is not a 2070 reserved word. 2071 2072 @<Process a form...@>= 2073 begin next_control:=get_next; 2074 if next_control=identifier then 2075 begin lhs:=id_lookup(normal); ilk[lhs]:=normal; new_xref(lhs); 2076 next_control:=get_next; 2077 if next_control=equivalence_sign then 2078 begin next_control:=get_next; 2079 if next_control=identifier then 2080 begin rhs:=id_lookup(normal); 2081 ilk[lhs]:=ilk[rhs]; ilk[rhs]:=normal; new_xref(rhs); 2082 ilk[rhs]:=ilk[lhs]; next_control:=get_next; 2083 end; 2084 end; 2085 end; 2086 end 2087 2088 @ Finally, when the \TeX\ and definition parts have been treated, we have 2089 |next_control>=begin_Pascal|. 2090 2091 @<Store cross references in the \P...@>= 2092 if next_control<=module_name then {|begin_Pascal| or |module_name|} 2093 begin if next_control=begin_Pascal then mod_xref_switch:=0 2094 else mod_xref_switch:=def_flag; 2095 repeat if next_control=module_name then new_mod_xref(cur_module); 2096 next_control:=get_next; outer_xref; 2097 until next_control>module_name; 2098 end 2099 2100 @ After phase one has looked at everything, we want to check that each 2101 module name was both defined and used. 2102 The variable |cur_xref| will point to cross references for the 2103 current module name of interest. 2104 2105 @<Glob...@>=@!cur_xref:xref_number; {temporary cross reference pointer} 2106 2107 @ The following recursive procedure 2108 walks through the tree of module names and prints out anomalies. 2109 @^recursion@> 2110 2111 @p procedure mod_check(@!p:name_pointer); {print anomalies in subtree |p|} 2112 begin if p>0 then 2113 begin mod_check(llink[p]);@/ 2114 cur_xref:=xref[p]; 2115 if num(cur_xref)<def_flag then 2116 begin print_nl('! Never defined: <'); print_id(p); 2117 @.Never defined: <section name>@> 2118 print('>'); mark_harmless; 2119 end; 2120 while num(cur_xref)>=def_flag do cur_xref:=xlink(cur_xref); 2121 if cur_xref=0 then 2122 begin print_nl('! Never used: <'); print_id(p); print('>'); 2123 @.Never used: <section name>@> 2124 mark_harmless; 2125 end; 2126 mod_check(rlink[p]); 2127 end; 2128 end; 2129 2130 @ @<Print error messages about un...@>=@+mod_check(root) 2131 2132 @* Low-level output routines. 2133 The \TeX\ output is supposed to appear in lines at most |line_length| 2134 characters long, so we place it into an output buffer. During the output 2135 process, |out_line| will hold the current line number of the line about to 2136 be output. 2137 2138 @<Glo...@>= 2139 @!out_buf:array[0..line_length] of ASCII_code; {assembled characters} 2140 @!out_ptr:0..line_length; {number of characters in |out_buf|} 2141 @!out_line: integer; {coordinates of next line to be output} 2142 2143 @ The |flush_buffer| routine empties the buffer up to a given breakpoint, 2144 and moves any remaining characters to the beginning of the next line. 2145 If the |per_cent| parameter is |true|, a |"%"| is appended to the line 2146 that is being output; in this case the breakpoint |b| should be strictly 2147 less than |line_length|. If the |per_cent| parameter is |false|, 2148 trailing blanks are suppressed. 2149 The characters emptied from the buffer form a new line of output; 2150 if the |carryover| parameter is true, a |"%"| in that line will be 2151 carried over to the next line (so that \TeX\ will ignore the completion 2152 of commented-out text). 2153 2154 @p procedure flush_buffer(@!b:eight_bits;@!per_cent,@!carryover:boolean); 2155 {outputs |out_buf[1..b]|, where |b<=out_ptr|} 2156 label done,found; 2157 var j,@!k:0..line_length; 2158 begin j:=b; 2159 if not per_cent then {remove trailing blanks} 2160 loop@+ begin if j=0 then goto done; 2161 if out_buf[j]<>" " then goto done; 2162 decr(j); 2163 end; 2164 done: for k:=1 to j do write(tex_file,xchr[out_buf[k]]); 2165 if per_cent then write(tex_file,xchr["%"]); 2166 write_ln(tex_file); incr(out_line); 2167 if carryover then 2168 for k:=1 to j do 2169 if out_buf[k]="%" then 2170 if (k=1)or(out_buf[k-1]<>"\") then {comment mode should be preserved} 2171 begin out_buf[b]:="%"; decr(b); goto found; 2172 end; 2173 found: if (b<out_ptr) then 2174 for k:=b+1 to out_ptr do out_buf[k-b]:=out_buf[k]; 2175 out_ptr:=out_ptr-b; 2176 end; 2177 2178 @ When we are copying \TeX\ source material, we retain line breaks 2179 that occur in the input, except that an empty line is not 2180 output when the \TeX\ source line was nonempty. For example, a line 2181 of the \TeX\ file that contains only an index cross-reference entry 2182 will not be copied. The |finish_line| routine is called just before 2183 |get_line| inputs a new line, and just after a line break token has 2184 been emitted during the output of translated \PASCAL\ text. 2185 2186 @p procedure finish_line; {do this at the end of a line} 2187 label exit; 2188 var k:0..buf_size; {index into |buffer|} 2189 begin if out_ptr>0 then flush_buffer(out_ptr,false,false) 2190 else begin for k:=0 to limit do 2191 if (buffer[k]<>" ")and(buffer[k]<>tab_mark) then return; 2192 flush_buffer(0,false,false); 2193 end; 2194 exit:end; 2195 2196 @ In particular, the |finish_line| procedure is called near the very 2197 beginning of phase two. We initialize the output variables in a slightly 2198 tricky way so that the first line of the output file will be 2199 `\.{\\input webmac}'. 2200 @.\\input webmac@> 2201 @.webmac@> 2202 2203 @<Set init...@>= 2204 out_ptr:=1; out_line:=1; out_buf[1]:="c"; write(tex_file,'\input webma'); 2205 2206 @ When we wish to append the character |c| to the output buffer, we write 2207 `$|out|(c)$'; this will cause the buffer to be emptied if it was already 2208 full. Similarly, `$|out2|(c_1)(c_2)$' appends a pair of characters. 2209 A line break will occur at a space or after a single-nonletter 2210 \TeX\ control sequence. 2211 2212 @d oot(#)==@;@/ 2213 if out_ptr=line_length then break_out; 2214 incr(out_ptr); out_buf[out_ptr]:=#; 2215 @d oot1(#)==oot(#)@+end 2216 @d oot2(#)==oot(#)@,oot1 2217 @d oot3(#)==oot(#)@,oot2 2218 @d oot4(#)==oot(#)@,oot3 2219 @d oot5(#)==oot(#)@,oot4 2220 @d out==@+begin oot1 2221 @d out2==@+begin oot2 2222 @d out3==@+begin oot3 2223 @d out4==@+begin oot4 2224 @d out5==@+begin oot5 2225 2226 @ The |break_out| routine is called just before the output buffer is about 2227 to overflow. To make this routine a little faster, we initialize position 2228 0 of the output buffer to `\.\\'; this character isn't really output. 2229 2230 @<Set init...@>= 2231 out_buf[0]:="\"; 2232 2233 @ A long line is broken at a blank space or just before a backslash that isn't 2234 preceded by another backslash. In the latter case, a |"%"| is output at 2235 the break. 2236 2237 @p procedure break_out; {finds a way to break the output line} 2238 label exit; 2239 var k:0..line_length; {index into |out_buf|} 2240 @!d:ASCII_code; {character from the buffer} 2241 begin k:=out_ptr; 2242 loop@+ begin if k=0 then 2243 @<Print warning message, break the line, |return|@>; 2244 d:=out_buf[k]; 2245 if d=" " then 2246 begin flush_buffer(k,false,true); return; 2247 end; 2248 if (d="\")and(out_buf[k-1]<>"\") then {in this case |k>1|} 2249 begin flush_buffer(k-1,true,true); return; 2250 end; 2251 decr(k); 2252 end; 2253 exit:end; 2254 2255 @ We get to this module only in unusual cases that the entire output line 2256 consists of a string of backslashes followed by a string of nonblank 2257 non-backslashes. In such cases it is almost always safe to break the 2258 line by putting a |"%"| just before the last character. 2259 2260 @<Print warning message...@>= 2261 begin print_nl('! Line had to be broken (output l.',out_line:1); 2262 @.Line had to be broken@> 2263 print_ln('):'); 2264 for k:=1 to out_ptr-1 do print(xchr[out_buf[k]]); 2265 new_line; mark_harmless; 2266 flush_buffer(out_ptr-1,true,true); return; 2267 end 2268 2269 @ Here is a procedure that outputs a module number in decimal notation. 2270 2271 @<Glob...@>=@!dig:array[0..4] of 0..9; {digits to output} 2272 2273 @ The number to be converted by |out_mod| is known to be less than 2274 |def_flag|, so it cannot have more than five decimal digits. If 2275 the module is changed, we output `\.{\\*}' just after the number. 2276 2277 @p procedure out_mod(@!m:integer); {output a module number} 2278 var k:0..5; {index into |dig|} 2279 @!a:integer; {accumulator} 2280 begin k:=0; a:=m; 2281 repeat dig[k]:=a mod 10; a:=a div 10; incr(k); 2282 until a=0; 2283 repeat decr(k); out(dig[k]+"0"); 2284 until k=0; 2285 if changed_module[m] then out2("\")("*"); 2286 @.\\*@> 2287 end; 2288 2289 @ The |out_name| subroutine is used to output an identifier or index 2290 entry, enclosing it in braces. 2291 2292 @p procedure out_name(@!p:name_pointer); {outputs a name} 2293 var k:0..max_bytes; {index into |byte_mem|} 2294 @!w:0..ww-1; {row of |byte_mem|} 2295 begin out("{"); w:=p mod ww; 2296 for k:=byte_start[p] to byte_start[p+ww]-1 do 2297 begin if byte_mem[w,k]="_" then out("\"); 2298 @.\\_@> 2299 out(byte_mem[w,k]); 2300 end; 2301 out("}"); 2302 end; 2303 2304 @* Routines that copy \TeX\ material. 2305 During phase two, we use the subroutines |copy_limbo|, |copy_TeX|, and 2306 |copy_comment| in place of the analogous |skip_limbo|, |skip_TeX|, and 2307 |skip_comment| that were used in phase one. 2308 2309 The |copy_limbo| routine, for example, takes \TeX\ material that is not 2310 part of any module and transcribes it almost verbatim to the output file. 2311 No `\.{@@}' signs should occur in such material except in `\.{@@@@}' 2312 pairs; such pairs are replaced by singletons. 2313 2314 @p procedure copy_limbo; {copy \TeX\ code until the next module begins} 2315 label exit; 2316 var c:ASCII_code; {character following \.{@@} sign} 2317 begin loop if loc>limit then 2318 begin finish_line; get_line; 2319 if input_has_ended then return; 2320 end 2321 else begin buffer[limit+1]:="@@"; 2322 @<Copy up to control code, |return| if finished@>; 2323 end; 2324 exit:end; 2325 2326 @ @<Copy up to control...@>= 2327 while buffer[loc]<>"@@" do 2328 begin out(buffer[loc]); incr(loc); 2329 end; 2330 if loc<=limit then 2331 begin loc:=loc+2; c:=buffer[loc-1]; 2332 if (c=" ")or(c=tab_mark)or(c="*") then return; 2333 out("@@"); 2334 if c<>"@@" then err_print('! Double @@ required outside of sections'); 2335 @.Double \AT! required...@> 2336 end 2337 2338 @ The |copy_TeX| routine processes the \TeX\ code at the beginning of a 2339 module; for example, the words you are now reading were copied in this 2340 way. It returns the next control code or `\v' found in the input. 2341 2342 @p function copy_TeX:eight_bits; {copy pure \TeX\ material} 2343 label done; 2344 var c:eight_bits; {control code found} 2345 begin loop begin if loc>limit then 2346 begin finish_line; get_line; 2347 if input_has_ended then 2348 begin c:=new_module; goto done; 2349 end; 2350 end; 2351 buffer[limit+1]:="@@"; 2352 @<Copy up to `\v' or control code, |goto done| if finished@>; 2353 end; 2354 done:copy_TeX:=c; 2355 end; 2356 2357 @ We don't copy spaces or tab marks into the beginning of a line. This 2358 makes the test for empty lines in |finish_line| work. 2359 2360 @<Copy up to `\v'...@>= 2361 repeat c:=buffer[loc]; incr(loc); 2362 if c="|" then goto done; 2363 if c<>"@@" then 2364 begin out(c); 2365 if (out_ptr=1)and((c=" ")or(c=tab_mark)) then decr(out_ptr); 2366 end; 2367 until c="@@"; 2368 if loc<=limit then 2369 begin c:=control_code(buffer[loc]); incr(loc); 2370 goto done; 2371 end 2372 2373 @ The |copy_comment| uses and returns a brace-balance value, following the 2374 conventions of |skip_comment| above. Instead of copying the \TeX\ material 2375 into the output buffer, this procedure copies it into the token memory. 2376 The abbreviation |app_tok(t)| is used to append token |t| to the current 2377 token list, and it also makes sure that it is possible to append at least 2378 one further token without overflow. 2379 2380 @d app_tok(#)==begin if tok_ptr+2>max_toks then overflow('token'); 2381 tok_mem[tok_ptr]:=#; incr(tok_ptr); 2382 end 2383 2384 @p function copy_comment(@!bal:eight_bits):eight_bits; {copies \TeX\ code in 2385 comments} 2386 label done; 2387 var c:ASCII_code; {current character being copied} 2388 begin loop begin if loc>limit then 2389 begin get_line; 2390 if input_has_ended then 2391 begin err_print('! Input ended in mid-comment'); 2392 @.Input ended in mid-comment@> 2393 loc:=1; @<Clear |bal| and |goto done|@>; 2394 end; 2395 end; 2396 c:=buffer[loc]; incr(loc); 2397 if c="|" then goto done; 2398 app_tok(c); 2399 @<Copy special things when |c="@@", "\", "{", "}"|; 2400 |goto done| at end@>; 2401 end; 2402 done: copy_comment:=bal; 2403 end; 2404 2405 @ @<Copy special things when |c="@@"...@>= 2406 if c="@@" then 2407 begin incr(loc); 2408 if buffer[loc-1]<>"@@" then 2409 begin err_print('! Illegal use of @@ in comment'); 2410 @.Illegal use of \AT!...@> 2411 loc:=loc-2; decr(tok_ptr); @<Clear |bal|...@>; 2412 end; 2413 end 2414 else if (c="\")and(buffer[loc]<>"@@") then 2415 begin app_tok(buffer[loc]); incr(loc); 2416 end 2417 else if c="{" then incr(bal) 2418 else if c="}" then 2419 begin decr(bal); 2420 if bal=0 then goto done; 2421 end 2422 2423 @ When the comment has terminated abruptly due to an error, we output 2424 enough right braces to keep \TeX\ happy. 2425 2426 @<Clear |bal|...@>= 2427 app_tok(" "); {this is done in case the previous character was `\.\\'} 2428 repeat app_tok("}"); decr(bal); 2429 until bal=0; 2430 goto done; 2431 2432 @* Parsing. 2433 The most intricate part of \.{WEAVE} is its mechanism for converting 2434 \PASCAL-like code into \TeX\ code, and we might as well plunge into this 2435 aspect of the program now. A ``bottom up'' approach is used to parse the 2436 \PASCAL-like material, since \.{WEAVE} must deal with fragmentary 2437 constructions whose overall ``part of speech'' is not known. 2438 2439 At the lowest level, the input is represented as a sequence of entities 2440 that we shall call {\it scraps}, where each scrap of information consists 2441 of two parts, its {\it category} and its {\it translation}. The category 2442 is essentially a syntactic class, and the translation is a token list that 2443 represents \TeX\ code. Rules of syntax and semantics tell us how to 2444 combine adjacent scraps into larger ones, and if we are lucky an entire 2445 \PASCAL\ text that starts out as hundreds of small scraps will join 2446 together into one gigantic scrap whose translation is the desired \TeX\ 2447 code. If we are unlucky, we will be left with several scraps that don't 2448 combine; their translations will simply be output, one by one. 2449 2450 The combination rules are given as context-sensitive productions that are 2451 applied from left to right. Suppose that we are currently working on the 2452 sequence of scraps $s_1\,s_2\ldots s_n$. We try first to find the longest 2453 production that applies to an initial substring $s_1\,s_2\ldots\,$; but if 2454 no such productions exist, we try to find the longest production 2455 applicable to the next substring $s_2\,s_3\ldots\,$; and if that fails, we 2456 try to match $s_3\,s_4\ldots\,$, etc. 2457 2458 A production applies if the category codes have a given pattern. For 2459 example, one of the productions is 2460 $$|open|\;|math|\;|semi|\;\RA\;|open|\;|math|$$ 2461 and it means that three consecutive scraps whose respective categories are 2462 |open|, |math|, and |semi| are con\-verted to two scraps whose categories 2463 are |open| and |math|. This production also has an associated rule that 2464 tells how to combine the translation parts: 2465 $$\eqalign{O_2&=O_1\cr 2466 M_2&=M_1\,S\,\.{\\,}\,\hbox{|opt|\thinspace\tt5}\cr}$$ 2467 This means that the |open| scrap has not changed, while the new |math| scrap 2468 has a translation $M_2$ composed of the translation $M_1$ of the original 2469 |math| scrap followed by the translation |S| of the |semi| scrap followed 2470 by `\.{\\,}' followed by `|opt|' followed by `\.5'. (In the \TeX\ file, 2471 this will specify an additional thin space after the semicolon, followed 2472 by an optional line break with penalty 50.) Translation rules use subscripts 2473 to distinguish between translations of scraps whose categories have the 2474 same initial letter; these subscripts are assigned from left to right. 2475 2476 $\.{WEAVE}$ also has the production rule 2477 $$|semi|\;\RA\;|terminator|$$ 2478 (meaning that a semicolon can terminate a \PASCAL\ statement). Since 2479 productions are applied from left to right, this rule will be activated 2480 only if the |semi| is not preceded by scraps that match other productions; 2481 in particular, a |semi| that is preceded by `|open| |math|' will have 2482 disappeared because of the production above, and such semicolons do not 2483 act as statement terminators. This incidentally is how \.{WEAVE} is able 2484 to treat semicolons in two distinctly different ways, the first of which 2485 is intended for semicolons in the parameter list of a procedure 2486 declaration. 2487 2488 The translation rule corresponding to $|semi|\;\RA\;|terminator|$ is 2489 $$T=S$$ 2490 but we shall not mention translation rules in the common case that the 2491 translation of the new scrap on the right-hand side is simply the 2492 concatenation of the disappearing scraps on the left-hand side. 2493 2494 @ Here is a list of the category codes that scraps can have. 2495 2496 @d simp=1 {the translation can be used both in horizontal mode 2497 and in math mode of \TeX} 2498 @d math=2 {the translation should be used only in \TeX\ math mode} 2499 @d intro=3 {a statement is expected to follow this, after a space and 2500 an optional break} 2501 @d open=4 {denotes an incomplete parenthesized quantity to be used in 2502 math mode} 2503 @d beginning=5 {denotes an incomplete compound statement to be used in 2504 horizontal mode} 2505 @d close=6 {ends a parenthesis or compound statement} 2506 @d alpha=7 {denotes the beginning of a clause} 2507 @d omega=8 {denotes the ending of a clause and possible comment following} 2508 @d semi=9 {denotes a semicolon and possible comment following it} 2509 @d terminator=10 {something that ends a statement or declaration} 2510 @d stmt=11 {denotes a statement or declaration including its terminator} 2511 @d cond=12 {precedes an \&{if} clause that might have a matching \&{else}} 2512 @d clause=13 {precedes a statement after which indentation ends} 2513 @d colon=14 {denotes a colon} 2514 @d exp=15 {stands for the E in a floating point constant} 2515 @d proc=16 {denotes a procedure or program or function heading} 2516 @d case_head=17 {denotes a case statement or record heading} 2517 @d record_head=18 {denotes a record heading without indentation} 2518 @d var_head=19 {denotes a variable declaration heading} 2519 @d elsie=20 {\&{else}} 2520 @d casey=21 {\&{case}} 2521 @d mod_scrap=22 {denotes a module name} 2522 2523 @p @!debug procedure print_cat(@!c:eight_bits); 2524 {symbolic printout of a category} 2525 begin case c of 2526 simp: print('simp'); 2527 math: print('math'); 2528 intro: print('intro'); 2529 open: print('open'); 2530 beginning: print('beginning'); 2531 close: print('close'); 2532 alpha: print('alpha'); 2533 omega: print('omega'); 2534 semi: print('semi'); 2535 terminator: print('terminator'); 2536 stmt: print('stmt'); 2537 cond: print('cond'); 2538 clause: print('clause'); 2539 colon: print('colon'); 2540 exp: print('exp'); 2541 proc: print('proc'); 2542 case_head: print('casehead'); 2543 record_head: print('recordhead'); 2544 var_head: print('varhead'); 2545 elsie: print('elsie'); 2546 casey: print('casey'); 2547 mod_scrap: print('module'); 2548 othercases print('UNKNOWN') 2549 endcases; 2550 end; 2551 gubed 2552 2553 @ The token lists for translated \TeX\ output contain some special control 2554 symbols as well as ordinary characters. These control symbols are 2555 interpreted by \.{WEAVE} before they are written to the output file. 2556 2557 \yskip\hang |break_space| denotes an optional line break or an en space; 2558 2559 \yskip\hang |force| denotes a line break; 2560 2561 \yskip\hang |big_force| denotes a line break with additional vertical space; 2562 2563 \yskip\hang |opt| denotes an optional line break (with the continuation 2564 line indented two ems with respect to the normal starting position)---this 2565 code is followed by an integer |n|, and the break will occur with penalty 2566 $10n$; 2567 2568 \yskip\hang |backup| denotes a backspace of one em; 2569 2570 \yskip\hang |cancel| obliterates any |break_space| or |force| or |big_force| 2571 tokens that immediately precede or follow it and also cancels any 2572 |backup| tokens that follow it; 2573 2574 \yskip\hang |indent| causes future lines to be indented one more em; 2575 2576 \yskip\hang |outdent| causes future lines to be indented one less em. 2577 2578 \yskip\noindent All of these tokens are removed from the \TeX\ output that 2579 comes from \PASCAL\ text between \pb\ signs; |break_space| and |force| and 2580 |big_force| become single spaces in this mode. The translation of other 2581 \PASCAL\ texts results in \TeX\ control sequences \.{\\1}, \.{\\2}, 2582 \.{\\3}, \.{\\4}, \.{\\5}, \.{\\6}, \.{\\7} corresponding respectively to 2583 |indent|, |outdent|, |opt|, |backup|, |break_space|, |force|, and 2584 |big_force|. However, a sequence of consecutive `\.\ ', |break_space|, 2585 |force|, and/or |big_force| tokens is first replaced by a single token 2586 (the maximum of the given ones). 2587 2588 The tokens |math_rel|, |math_bin|, |math_op| will be translated into 2589 \.{\\mathrel\{}, \.{\\mathbin\{}, and \.{\\mathop\{}, respectively. 2590 Other control sequences in the \TeX\ output will be `\.{\\\\\{}$\,\ldots\,$\.\}' 2591 surrounding identifiers, `\.{\\\&\{}$\,\ldots\,$\.\}' surrounding 2592 reserved words, `\.{\\.\{}$\,\ldots\,$\.\}' surrounding strings, 2593 `\.{\\C\{}$\,\ldots\,$\.\}$\,$|force|' surrounding comments, and 2594 `\.{\\X$n$:}$\,\ldots\,$\.{\\X}' surrounding module names, where 2595 |n| is the module number. 2596 2597 @d math_bin=@'203 2598 @d math_rel=@'204 2599 @d math_op=@'205 2600 @d big_cancel=@'206 {like |cancel|, also overrides spaces} 2601 @d cancel=@'207 {overrides |backup|, |break_space|, |force|, |big_force|} 2602 @d indent=cancel+1 {one more tab (\.{\\1})} 2603 @d outdent=cancel+2 {one less tab (\.{\\2})} 2604 @d opt=cancel+3 {optional break in mid-statement (\.{\\3})} 2605 @d backup=cancel+4 {stick out one unit to the left (\.{\\4})} 2606 @d break_space=cancel+5 {optional break between statements (\.{\\5})} 2607 @d force=cancel+6 {forced break between statements (\.{\\6})} 2608 @d big_force=cancel+7 {forced break with additional space (\.{\\7})} 2609 @d end_translation=big_force+1 {special sentinel token at end of list} 2610 2611 @ The raw input is converted into scraps according to the following table, 2612 which gives category codes followed by the translations. Sometimes a single 2613 item of input produces more than one scrap. 2614 \def\stars {\.{**}}% 2615 (The symbol `\stars' stands for `\.{\\\&\{{\rm identifier}\}}', 2616 i.e., the identifier itself treated as a reserved word. In a few cases the 2617 category is given as `|@!comment|'; this is not an actual category code, it 2618 means that the translation will be treated as a comment, as explained 2619 below.) 2620 2621 \yskip\halign{\quad#\hfil&\quad#\hfil\cr 2622 \.{<>}&|math:|\.{\\I}\cr 2623 \.{<=}&|math:|\.{\\L}\cr 2624 \.{>=}&|math:|\.{\\G}\cr 2625 \.{:=}&|math:|\.{\\K}\cr 2626 \.{==}&|math:|\.{\\S}\cr 2627 \.{(*}&|math:|\.{\\B}\cr 2628 \.{*)}&|math:|\.{\\T}\cr 2629 \.{(.}&|open:|\.[\cr 2630 \.{.)}&|close:|\.]\cr 2631 \."$\,$string$\,$\."&|simp:|\.{\\.\{"{\rm$\,$modified string$\,$}"\}}\cr 2632 \.\'$\,$string$\,$\.\'&|simp:|\.{\\.\{\\\'{\rm$\,$modified 2633 string$\,$}\\\'\}}\cr 2634 \.{@@=}$\,$string$\,$\.{@@>}&|simp:|\.{\\=\{{\rm$\,$modified string$\,$}\}}\cr 2635 \#&|math:|\.{\\\#}\cr 2636 \.\$&|math:|\.{\\\$}\cr 2637 \.\_&|math:|\.{\\\_}\cr 2638 \.\%&|math:|\.{\\\%}\cr 2639 \.\^&|math:|\.{\\\^}\cr 2640 \.(&|open:|\.(\cr 2641 \.)&|close:|\.)\cr 2642 \.[&|open:|\.[\cr 2643 \.]&|close:|\.]\cr 2644 \.*&|math:|\.{\\ast}\cr 2645 \.,&|math:|\.,|@,opt@,|\.9\cr 2646 \.{..}&|math:|\.{\\to}\cr 2647 \..&|simp:|\..\cr 2648 \.:&|colon:|\.:\cr 2649 \.;&|semi:|\.;\cr 2650 identifier&|simp:|\.{\\\\\{{\rm$\,$identifier$\,$}\}}\cr 2651 \.E in constant&|exp:|\.{\\E\{}\cr 2652 digit $d$&|simp:|$d$\cr 2653 other character $c$&|math:|$c$\cr 2654 \.{and}&|math:|\.{\\W}\cr 2655 \.{array}&|alpha:|\stars\cr 2656 \.{begin}&|beginning:|$|force|\,\stars\,|cancel|$\qquad|intro:|\cr 2657 \.{case}&|casey:|\qquad|alpha:|$|force|\,\stars$\cr 2658 \.{const}&|intro:|$|force|\,|backup|\,\stars$\cr 2659 \.{div}&|math:|$|math_bin|\,\stars\,\.\}$\cr 2660 \.{do}&|omega:|\stars\cr 2661 \.{downto}&|math:|$|math_rel|\,\stars\,\.\}$\cr 2662 \.{else}&|terminator:|\qquad|elsie:|$|force|\,|backup|\,\stars$\cr 2663 \.{end}&|terminator:|\qquad|close:|$|force|\,\stars$\cr 2664 \.{file}&|alpha:|\stars\cr 2665 \.{for}&|alpha:|$|force|\,\stars$\cr 2666 \.{function}&|proc:|$|force|\,|backup|\,\stars\,|cancel|$\qquad 2667 |intro:|$|indent|\,\.{\\\ }$\cr 2668 \.{goto}&|intro:|\stars\cr 2669 \.{if}&|cond:|\qquad|alpha:|$|force|\,\stars$\cr 2670 \.{in}&|math:|\.{\\in}\cr 2671 \.{label}&|intro:|$|force|\,|backup|\,\stars$\cr 2672 \.{mod}&|math:|$|math_bin|\,\stars\,\.\}$\cr 2673 \.{nil}&|simp:|\stars\cr 2674 \.{not}&|math:|\.{\\R}\cr 2675 \.{of}&|omega:|\stars\cr 2676 \.{or}&|math:|\.{\\V}\cr 2677 \.{packed}&|intro:|\stars\cr 2678 \.{procedure}&|proc:|$|force|\,|backup|\,\stars\,|cancel|$\qquad 2679 |intro:|$|indent|\,\.{\\\ }$\cr 2680 \.{program}&|proc:|$|force|\,|backup|\,\stars\,|cancel|$\qquad 2681 |intro:|$|indent|\,\.{\\\ }$\cr 2682 \.{record}&|record_head:|\stars\qquad|intro:|\cr 2683 \.{repeat}&|beginning:|$|force|\,|indent|\,\stars\,|cancel|$\qquad|intro:|\cr 2684 \.{set}&|alpha:|\stars\cr 2685 \.{then}&|omega:|\stars\cr 2686 \.{to}&|math:|$|math_rel|\,\stars\,\.\}$\cr 2687 \.{type}&|intro:|$|force|\,|backup|\,\stars$\cr 2688 \.{until}&|terminator:|\qquad|close:|$|force|\,|backup|\,\stars$\qquad 2689 |clause:|\cr 2690 \.{var}&|var_head:|$|force|\,|backup|\,\stars\,|cancel|$\qquad|intro:|\cr 2691 \.{while}&|alpha:|$|force|\,\stars$\cr 2692 \.{with}&|alpha:|$|force|\,\stars$\cr 2693 \.{xclause}&|alpha:|$|force|\,\.{\\\~}$\qquad|omega:|\stars\cr 2694 \.{@@\'}$\,$const&|simp:|\.{\\O\{}\hbox{const}\.\}\cr 2695 \.{@@"}$\,$const&|simp:|\.{\\H\{}\hbox{const}\.\}\cr 2696 \.{@@\$}&|simp:|\.{\\)}\cr 2697 \.{@@\\}&|simp:|\.{\\]}\cr 2698 \.{@@,}&|math:|\.{\\,}\cr 2699 \.{@@t}$\,$stuff$\,$\.{@@>}&|simp:|\.{\\hbox\{{\rm$\,$stuff$\,$}\}}\cr 2700 \.{@@<}$\,$module$\,$\.{@@>}&|mod_scrap:|\.{\\X$n$:{\rm$\,$module$\,$}\\X}\cr 2701 \.{@@\#}&|comment:||big_force|\cr 2702 \.{@@/}&|comment:||force|\cr 2703 \.{@@\char'174}&|simp:|$|opt|\,\.0$\cr 2704 \.{@@+}&|comment:|$|big_cancel|\,\.{\\\ }\,|big_cancel|$\cr 2705 \.{@@;}&|semi:|\cr 2706 \.{@@\&}&|math:|\.{\\J}\cr 2707 \.{@@\{}&|math:|\.{\\B}\cr 2708 \.{@@\}}&|math:|\.{\\T}\cr} 2709 \yskip\noindent When a string is output, certain characters are preceded by 2710 `\.\\' signs so that they will print properly. 2711 2712 A comment in the input will be combined with the preceding 2713 |omega| or |semi| scrap, or with the following |terminator| scrap, if 2714 possible; otherwise it will be inserted as a separate |terminator| scrap. 2715 An additional ``comment'' is effectively appended at the end of the 2716 \PASCAL\ text, just before translation begins; this consists of a |cancel| 2717 token in the case of \PASCAL\ text in \pb, otherwise it consists of a 2718 |force| token. 2719 2720 From this table it is evident that \.{WEAVE} will parse a lot of non-\PASCAL\ 2721 programs. For example, the reserved words `\.{for}' and `\.{array}' are 2722 treated in an identical way by \.{WEAVE} from a syntactic standpoint, 2723 and semantically they are equivalent except that a forced line break occurs 2724 just before `\&{for}'; \PASCAL\ programmers may well be surprised at this 2725 similarity. The idea is to keep \.{WEAVE}'s rules as simple as possible, 2726 consistent with doing a reasonable job on syntactically correct \PASCAL\ 2727 programs. The production rules below have been formulated in the same 2728 spirit of ``almost anything goes.'' 2729 2730 @ Here is a table of all the productions. The reader can best get a feel for 2731 @^productions, table of@> 2732 how they work by trying them out by hand on small examples; no amount of 2733 explanation will be as effective as watching the rules in action. 2734 Parsing can also be watched by debugging with `\.{@@2}'. 2735 2736 \def\[#1]{\quad$\dleft#1\dright$} 2737 \def\sp{\.{\ }} 2738 \yskip 2739 \halign to\the\hsize{\hfil\it# & 2740 #\hfil\hskip-200pt\tabskip 0pt plus 100pt& 2741 #\hfil\tabskip0pt\cr 2742 &Production categories\[\hbox{translations}]&Remarks\cr 2743 \noalign{\yskip} 2744 1&|alpha@,math@,colon| $\RA$ |alpha@,math|&e.g., |case v:boolean of|\cr 2745 2&|alpha@,math@,omega| $\RA$ |clause|\[C=A\,\sp\,\.\$\,M\,\.\$\,\sp\,|indent|\, 2746 O]&e.g., |while x>0 do|\cr 2747 3&|alpha@,omega| $\RA$ |clause|\[C=A\,\sp\,|indent|\,O]&e.g., |file of|\cr 2748 4&|alpha@,simp| $\RA$ |alpha@,math|&convert to math mode\cr 2749 5&|beginning@,close@,(terminator@t or @>stmt)| $\RA$ |stmt|&compound statement 2750 ends\cr 2751 6&|beginning@,stmt| $\RA$ |beginning|\[B_2=B_1\,|break_space|\,S]&compound 2752 statement grows\cr 2753 7&|case_head@,casey@,clause| $\RA$ |case_head|\[C_4=C_1\,|outdent|\,C_2\,C_3]& 2754 variant records\cr 2755 8&|case_head@,close@,terminator| $\RA$ |stmt|\[S=C_1\,|cancel|\,|outdent|\, 2756 C_2\,T]&end of case statement\cr 2757 9&|case_head@,stmt| $\RA$ |case_head|\[C_2=C_1\,|force|\,S]&case statement 2758 grows\cr 2759 10&|casey@,clause| $\RA$ |case_head|&beginning of case statement\cr 2760 11&|clause@,stmt| $\RA$ |stmt|\[S_2=C\,|break_space|\,S_1\,|cancel|\,|outdent|\, 2761 |force|]&end of controlled statement\cr 2762 12&|cond@,clause@,stmt@,elsie| $\RA$ |clause|\[C_3=C_1\,C_2\,|break_space|\,S\, 2763 E\,\sp\,|cancel|]&complete conditional\cr 2764 13&|cond@,clause@,stmt| $\RA$ |stmt|\cr 2765 &\qquad\[S_2=C_1\,C_2\,|break_space|\,S_1\, 2766 |cancel|\,|outdent|\,|force|]&incomplete conditional\cr 2767 14&|elsie| $\RA$ |intro|&unmatched else\cr 2768 15&|exp@,math@,simp|* $\RA$ |math|\[M_2=E\,M_1\,S\,\.\}]&signed exponent\cr 2769 16&|exp@,simp|* $\RA$ |math|\[M=E\,S\,\.\}]&unsigned exponent\cr 2770 17&|intro@,stmt| $\RA$ |stmt|\[S_2=I\,\sp\,|opt|\,\.7\,|cancel|\,S_1]&labeled 2771 statement, etc.\cr 2772 18&|math@,close| $\RA$ |stmt@,close|\[S=\.\$\,M\,\.\$]&end of field list\cr 2773 19&|math@,colon| $\RA$ |intro|\[I=|force|\,|backup|\,\.\$\,M\,\.\$\,C]&compound 2774 label\cr 2775 20&|math@,math| $\RA$ |math|&simple concatenation\cr 2776 21&|math@,simp| $\RA$ |math|&simple concatenation\cr 2777 22&|math@,stmt| $\RA$ |stmt|\cr 2778 &\qquad\[S_2=\.\$\,M\,\.\$\,|indent|\,|break_space|\, 2779 S_1\,|cancel|\,|outdent|\,|force|]¯o or type definition\cr 2780 23&|math@,terminator| $\RA$ |stmt|\[S=\.\$\,M\,\.\$\,T]&statement involving 2781 math\cr 2782 24&|mod_scrap@,(terminator@t or @>semi)| $\RA$ |stmt|\[S=M\,T\,|force|]&module 2783 like a statement\cr 2784 25&|mod_scrap| $\RA$ |simp|&module unlike a statement\cr 2785 26&|open@,case_head@,close| $\RA$ |math|\[M=O\,\.\$\,|cancel|\,C_1\, 2786 |cancel|\,|outdent|\,\.\$\,C_2]&case in field list\cr 2787 27&|open@,close| $\RA$ |math|\[M=O\,\.\\\,\.,\,C]&empty set |[]|\cr 2788 28&|open@,math@,case_head@,close| $\RA$ |math|\cr 2789 &\qquad\[M_2=O\,M_1\,\.\$\,|cancel|\, 2790 C_1\,|cancel|\,|outdent|\,\.\$\,C_2]&case in field list\cr 2791 29&|open@,math@,close| $\RA$ |math|&parenthesized group\cr 2792 30&|open@,math@,colon| $\RA$ |open@,math|&colon in parentheses\cr 2793 31&|open@,math@,proc@,intro| $\RA$ |open@,math|\[M_2=M_1\,|math_op|\,|cancel|\, 2794 P\,\.\}]&|procedure| in parentheses\cr 2795 32&|open@,math@,semi| $\RA$ |open@,math|\[M_2=M_1\,S\,\.\\\,\.,\,|opt|\, 2796 \.5]&semicolon in parentheses\cr 2797 33&|open@,math@,var_head@,intro| $\RA$ |open@,math|\[M_2=M_1\,|math_op|\, 2798 |cancel|\,V\,\.\}]&|var| in parentheses\cr 2799 34&|open@,proc@,intro| $\RA$ |open@,math|\[M=|math_op|\,|cancel|\, 2800 P\,\.\}]&|procedure| in parentheses\cr 2801 35&|open@,simp| $\RA$ |open@,math|&convert to math mode\cr 2802 36&|open@,stmt@,close| $\RA$ |math|\[M=O\,\.\$\,|cancel|\,S\,|cancel|\, 2803 \.\$\,C]&field list\cr 2804 37&|open@,var_head@,intro| $\RA$ |open@,math|\[M=|math_op|\,|cancel|\,V\, 2805 \.\}]&|var| in parentheses\cr 2806 38&|proc@,beginning@,close@,terminator| $\RA$ |stmt|\[S=P\,|cancel|\, 2807 |outdent|\,B\,C\,T]&end of procedure declaration\cr 2808 39&|proc@,stmt| $\RA$ |proc|\[P_2=P_1\,|break_space|\,S]&procedure declaration 2809 grows\cr 2810 40&|record_head@,intro@,casey| $\RA$ |casey|\[C_2=R\,I\,\sp\,|cancel|\,C_1]& 2811 \&{record case} $\ldots$\cr 2812 41&|record_head| $\RA$ |case_head|\[C=|indent|\,R\,|cancel|]&other \&{record} 2813 structures\cr 2814 42&|semi| $\RA$ |terminator|&semicolon after statement\cr 2815 43&|simp@,close| $\RA$ |stmt@,close|&end of field list\cr 2816 44&|simp@,colon| $\RA$ |intro|\[I=|force|\,|backup|\,S\,C]&simple label\cr 2817 45&|simp@,math| $\RA$ |math|&simple concatenation\cr 2818 46&|simp@,mod_scrap| $\RA$ |mod_scrap|&in emergencies\cr 2819 47&|simp@,simp| $\RA$ |simp|&simple concatenation\cr 2820 48&|simp@,terminator| $\RA$ |stmt|&simple statement\cr 2821 49&|stmt@,stmt| $\RA$ |stmt|\[S_3=S_1\,|break_space|\,S_2]&adjacent 2822 statements\cr 2823 50&|terminator| $\RA$ |stmt|&empty statement\cr 2824 51&|var_head@,beginning| $\RA$ |stmt@,beginning|&end of variable 2825 declarations\cr 2826 52&|var_head@,math@,colon| $\RA$ |var_head@,intro|\[I=\.\$\,M\,\.\$\,C]& 2827 variable declaration\cr 2828 53&|var_head@,simp@,colon| $\RA$ |var_head@,intro|&variable declaration\cr 2829 54&|var_head@,stmt| $\RA$ |var_head|\[V_2=V_1\,|break_space|\,S]&variable 2830 declarations grow\cr} 2831 \yskip\noindent 2832 Translations are not specified here when they are simple concatenations 2833 of the scraps that change. For example, the full translation of 2834 `|open@,math@,colon| $\RA$ |open@,math|' is $O_2=O_1$, $M_2=M_1C$. 2835 2836 The notation `|simp|*', in the |exp|-related productions above, 2837 stands for a |simp| scrap that isn't followed by another |simp|. 2838 2839 @* Implementing the productions. 2840 When \PASCAL\ text is to be processed with the grammar above, we put its 2841 initial scraps $s_1\ldots s_n$ into two arrays |cat[1..n]| and |trans[1..n]|. 2842 The value of |cat[k]| is simply a category code from the list above; the 2843 value of |trans[k]| is a text pointer, i.e., an index into |tok_start|. 2844 Our production rules have the nice property that the right-hand side is never 2845 longer than the left-hand side. Therefore it is convenient to use sequential 2846 allocation for the current sequence of scraps. Five pointers are used to 2847 manage the parsing: 2848 2849 \yskip\hang |pp| (the parsing pointer) is such that we are trying to match 2850 the category codes |cat[pp]@,cat[pp+1]|$\,\ldots\,$ to the left-hand sides 2851 of productions. 2852 2853 \yskip\hang |scrap_base|, |lo_ptr|, |hi_ptr|, and |scrap_ptr| are such that 2854 the current sequence of scraps appears in positions |scrap_base| through 2855 |lo_ptr| and |hi_ptr| through |scrap_ptr|, inclusive, in the |cat| and 2856 |trans| arrays. Scraps located between |scrap_base| and |lo_ptr| have 2857 been examined, while those in positions |>=hi_ptr| have not yet been 2858 looked at by the parsing process. 2859 2860 \yskip\noindent Initially |scrap_ptr| is set to the position of the final 2861 scrap to be parsed, and it doesn't change its value. The parsing process 2862 makes sure that |lo_ptr>=pp+3|, since productions have as many as four terms, 2863 by moving scraps from |hi_ptr| to |lo_ptr|. If there are 2864 fewer than |pp+3| scraps left, the positions up to |pp+3| are filled with 2865 blanks that will not match in any productions. Parsing stops when 2866 |pp=lo_ptr+1| and |hi_ptr=scrap_ptr+1|. 2867 2868 The |trans| array elements are declared to be of type |0..10239| instead 2869 of type |text_pointer|, because the final sorting phase of \.{WEAVE} 2870 uses this array to contain elements of type |name_pointer|. Both 2871 of these types are subranges of |0..10239|. 2872 2873 @<Glo...@>= 2874 @!cat:array[0..max_scraps] of eight_bits; {category codes of scraps} 2875 @!trans:array[0..max_scraps] of 0..10239; {translation texts of scraps} 2876 @!pp:0..max_scraps; {current position for reducing productions} 2877 @!scrap_base:0..max_scraps; {beginning of the current scrap sequence} 2878 @!scrap_ptr:0..max_scraps; {ending of the current scrap sequence} 2879 @!lo_ptr:0..max_scraps; {last scrap that has been examined} 2880 @!hi_ptr:0..max_scraps; {first scrap that has not been examined} 2881 stat@!max_scr_ptr:0..max_scraps; {largest value assumed by |scrap_ptr|} 2882 tats 2883 2884 @ @<Set init...@>= 2885 scrap_base:=1; scrap_ptr:=0; 2886 stat max_scr_ptr:=0; @+tats 2887 2888 @ Token lists in |@!tok_mem| are composed of the following kinds of 2889 items for \TeX\ output. 2890 2891 \yskip\item{$\bullet$}ASCII codes and special codes like |force| and 2892 |math_rel| represent themselves; 2893 2894 \item{$\bullet$}|id_flag+p| represents \.{\\\\\{{\rm identifier $p$}\}}; 2895 2896 \item{$\bullet$}|res_flag+p| represents \.{\\\&\{{\rm identifier $p$}\}}; 2897 2898 \item{$\bullet$}|mod_flag+p| represents module name |p|; 2899 2900 \item{$\bullet$}|tok_flag+p| represents token list number |p|; 2901 2902 \item{$\bullet$}|inner_tok_flag+p| represents token list number |p|, to be 2903 translated without line-break controls. 2904 2905 @d id_flag=10240 {signifies an identifier} 2906 @d res_flag=id_flag+id_flag {signifies a reserved word} 2907 @d mod_flag=res_flag+id_flag {signifies a module name} 2908 @d tok_flag==mod_flag+id_flag {signifies a token list} 2909 @d inner_tok_flag==tok_flag+id_flag {signifies a token list in `\pb'} 2910 @# 2911 @d lbrace==xchr["{"] {this avoids possible \PASCAL\ compiler confusion} 2912 @d rbrace==xchr["}"] {because these braces might occur within comments} 2913 2914 @p @!debug procedure print_text(@!p:text_pointer); {prints a token list} 2915 var j:0..max_toks; {index into |tok_mem|} 2916 @!r:0..id_flag-1; {remainder of token after the flag has been stripped off} 2917 begin if p>=text_ptr then print('BAD') 2918 else for j:=tok_start[p] to tok_start[p+1]-1 do 2919 begin r:=tok_mem[j] mod id_flag; 2920 case tok_mem[j] div id_flag of 2921 1: begin print('\\',lbrace); print_id(r); print(rbrace); 2922 end; {|id_flag|} 2923 2: begin print('\&',lbrace); print_id(r); print(rbrace); 2924 end; {|res_flag|} 2925 3: begin print('<'); print_id(r); print('>'); 2926 end; {|mod_flag|} 2927 4: print('[[',r:1,']]'); {|tok_flag|} 2928 5: print('|[[',r:1,']]|'); {|inner_tok_flag|} 2929 othercases @<Print token |r| in symbolic form@> 2930 endcases; 2931 end; 2932 end; 2933 gubed 2934 2935 @ @<Print token |r|...@>= 2936 case r of 2937 math_bin: print('\mathbin',lbrace); 2938 math_rel: print('\mathrel',lbrace); 2939 math_op: print('\mathop',lbrace); 2940 big_cancel: print('[ccancel]'); 2941 cancel: print('[cancel]'); 2942 indent: print('[indent]'); 2943 outdent: print('[outdent]'); 2944 backup: print('[backup]'); 2945 opt: print('[opt]'); 2946 break_space: print('[break]'); 2947 force: print('[force]'); 2948 big_force: print('[fforce]'); 2949 end_translation: print('[quit]'); 2950 othercases print(xchr[r]) 2951 endcases 2952 2953 @ The production rules listed above are embedded directly into the \.{WEAVE} 2954 program, since it is easier to do this than to write an interpretive system 2955 that would handle production systems in general. Several macros are defined 2956 here so that the program for each production is fairly short. 2957 2958 All of our productions conform to the general notion that some |k| 2959 consecutive scraps starting at some position |j| are to be replaced by a 2960 single scrap of some category |c| whose translation is composed from the 2961 translations of the disappearing scraps. After this production has been 2962 applied, the production pointer |pp| should change by an amount |d|. Such 2963 a production can be represented by the quadruple $(j,k,c,d)$. For example, 2964 the production `|simp@,math| $\RA$ |math|' would be represented by 2965 `$(|pp|,2,|math|,-1)$'; in this case the pointer $pp$ should decrease by 1 2966 after the production has been applied, because some productions with 2967 |math| in their second positions might now match, but no productions have 2968 |math| in the third or fourth position of their left-hand sides. Note that 2969 the value of |d| is determined by the whole collection of productions, not 2970 by an individual one. Consider the further example 2971 `|var_head@,math@,colon| $\RA$ |var_head@,intro|', which is represented by 2972 `$(|pp|+1,2,|intro|,+1)$'; the $+1$ here is deduced by looking at the 2973 grammar and seeing that no matches could possibly occur at positions |<=pp| 2974 after this production has been applied. The determination of |d| has been 2975 done by hand in each case, based on the full set of productions but not on 2976 the grammar of \PASCAL\ or on the rules for constructing the initial 2977 scraps. 2978 2979 We also attach a serial number to each production, so that additional 2980 information is available when debugging. For example, the program below 2981 contains the statement `|reduce(pp+1,2,intro,+1)(52)|' when it implements 2982 the production just mentioned. 2983 2984 Before calling |reduce|, the program should have appended the tokens of 2985 the new translation to the |tok_mem| array. We commonly want to append 2986 copies of several existing translations, and macros are defined to 2987 simplify these common cases. For example, |app2(pp)| will append the 2988 translations of two consecutive scraps, |trans[pp]| and |trans[pp+1]|, to 2989 the current token list. If the entire new translation is formed in this 2990 way, we write `$|squash|(j,k,c,d)$' instead of `$|reduce|(j,k,c,d)$'. For 2991 example, `|squash(pp,2,math,-1)|' is an abbreviation for `|app2(pp); 2992 reduce(pp,2,math,-1)|'. 2993 2994 The code below is an exact translation of the production rules into 2995 \PASCAL, using such macros, and the reader should have no difficulty 2996 understanding the format by comparing the code with the symbolic 2997 productions as they were listed earlier. 2998 2999 {\sl Caution:\/} The macros |app|, |app1|, |app2|, and |app3| are 3000 sequences of statements that are not enclosed with |begin| and $|end|$, 3001 because such delimiters would make the \PASCAL\ program much longer. This 3002 means that it is necessary to write |begin| and |end| explicitly when such 3003 a macro is used as a single statement. Several mysterious bugs in the 3004 original programming of \.{WEAVE} were caused by a failure to remember 3005 this fact. Next time the author will know better. 3006 3007 @d production(#)==@!debug prod(#) gubed; goto found 3008 @d reduce(#)==red(#); production 3009 @d production_end(#)==@!debug prod(#) gubed; goto found; 3010 end 3011 @d squash(#)==begin sq(#); production_end 3012 @d app(#)==tok_mem[tok_ptr]:=#; incr(tok_ptr) {this is like |app_tok|, 3013 but it doesn't test for overflow} 3014 @d app1(#)==tok_mem[tok_ptr]:=tok_flag+trans[#]; incr(tok_ptr) 3015 @d app2(#)==app1(#);app1(#+1) 3016 @d app3(#)==app2(#);app1(#+2) 3017 3018 @ Let us consider the big case statement for productions now, before looking 3019 at its context. We want to design the program so that this case statement 3020 works, so we might as well not keep ourselves in suspense about exactly what 3021 code needs to be provided with a proper environment. 3022 3023 The code here is more complicated than it need be, since some popular 3024 \PASCAL\ compilers are unable to deal with procedures that contain a lot 3025 of program text. The |translate| procedure, which incorporates the |case| 3026 statement here, would become too long for those compilers if we did 3027 not do something to split the cases into parts. Therefore 3028 a separate procedure called |five_cases| has been introduced. 3029 @^split procedures@> 3030 This auxiliary procedure contains approximately half of the program text 3031 that |translate| would otherwise have had. There's also a procedure 3032 called |alpha_cases|, which turned out to be necessary because the best 3033 two-way split wasn't good enough. The procedure could be split further 3034 in an analogous manner, but the present scheme works on all compilers 3035 known to the author. 3036 3037 @<Match a production at |pp|, or increase |pp| if there is no match@>= 3038 if cat[pp]<=alpha then 3039 if cat[pp]<alpha then five_cases@+else alpha_cases 3040 else begin case cat[pp] of 3041 case_head: @<Cases for |case_head|@>; 3042 casey: @<Cases for |casey|@>; 3043 clause: @<Cases for |clause|@>; 3044 cond: @<Cases for |cond|@>; 3045 elsie: @<Cases for |elsie|@>; 3046 exp: @<Cases for |exp|@>; 3047 mod_scrap: @<Cases for |mod_scrap|@>; 3048 proc: @<Cases for |proc|@>; 3049 record_head: @<Cases for |record_head|@>; 3050 semi: @<Cases for |semi|@>; 3051 stmt: @<Cases for |stmt|@>; 3052 terminator: @<Cases for |terminator|@>; 3053 var_head: @<Cases for |var_head|@>; 3054 othercases do_nothing 3055 endcases;@/ 3056 incr(pp); {if no match was found, we move to the right} 3057 found: end 3058 3059 @ Here are the procedures that need to be present for the reason just 3060 explained. 3061 3062 @<Declaration of subprocedures for |translate|@>= 3063 procedure five_cases; {handles almost half of the syntax} 3064 label found; 3065 begin case cat[pp] of 3066 beginning: @<Cases for |beginning|@>; 3067 intro: @<Cases for |intro|@>; 3068 math: @<Cases for |math|@>; 3069 open: @<Cases for |open|@>; 3070 simp: @<Cases for |simp|@>; 3071 othercases do_nothing 3072 endcases;@/ 3073 incr(pp); {if no match was found, we move to the right} 3074 found: end; 3075 @# 3076 procedure alpha_cases; 3077 label found; 3078 begin @<Cases for |alpha|@>; 3079 incr(pp); {if no match was found, we move to the right} 3080 found: end; 3081 3082 @ Now comes the code that tries to match each production starting 3083 with a particular type of scrap. Whenever a match is discovered, 3084 the |squash| or |reduce| macro will cause the appropriate action 3085 to be performed, followed by |goto found|. 3086 3087 @<Cases for |alpha|@>= 3088 if cat[pp+1]=math then 3089 begin if cat[pp+2]=colon then squash(pp+1,2,math,0)(1) 3090 else if cat[pp+2]=omega then 3091 begin app1(pp); app(" "); app("$"); app1(pp+1); 3092 app("$"); app(" "); app(indent); app1(pp+2); 3093 reduce(pp,3,clause,-2)(2); 3094 end; 3095 end 3096 else if cat[pp+1]=omega then 3097 begin app1(pp); app(" "); app(indent); app1(pp+1); 3098 reduce(pp,2,clause,-2)(3); 3099 end 3100 else if cat[pp+1]=simp then squash(pp+1,1,math,0)(4) 3101 3102 @ @<Cases for |beginning|@>= 3103 if cat[pp+1]=close then 3104 begin if (cat[pp+2]=terminator)or(cat[pp+2]=stmt) then 3105 squash(pp,3,stmt,-2)(5); 3106 end 3107 else if cat[pp+1]=stmt then 3108 begin app1(pp); app(break_space); app1(pp+1); 3109 reduce(pp,2,beginning,-1)(6); 3110 end 3111 3112 @ @<Cases for |case_head|@>= 3113 if cat[pp+1]=casey then 3114 begin if cat[pp+2]=clause then 3115 begin app1(pp); app(outdent); app2(pp+1); 3116 reduce(pp,3,case_head,0)(7); 3117 end; 3118 end 3119 else if cat[pp+1]=close then 3120 begin if cat[pp+2]=terminator then 3121 begin app1(pp); app(cancel); app(outdent); app2(pp+1); 3122 reduce(pp,3,stmt,-2)(8); 3123 end; 3124 end 3125 else if cat[pp+1]=stmt then 3126 begin app1(pp); app(force); app1(pp+1); 3127 reduce(pp,2,case_head,0)(9); 3128 end 3129 3130 @ @<Cases for |casey|@>= 3131 if cat[pp+1]=clause then squash(pp,2,case_head,0)(10) 3132 3133 @ @<Cases for |clause|@>= 3134 if cat[pp+1]=stmt then 3135 begin app1(pp); app(break_space); app1(pp+1); 3136 app(cancel); app(outdent); 3137 app(force); reduce(pp,2,stmt,-2)(11); 3138 end 3139 3140 @ @<Cases for |cond|@>= 3141 if (cat[pp+1]=clause)and(cat[pp+2]=stmt) then 3142 if cat[pp+3]=elsie then 3143 begin app2(pp); app(break_space); app2(pp+2); app(" "); 3144 app(cancel); reduce(pp,4,clause,-2)(12); 3145 end 3146 else begin app2(pp); app(break_space); app1(pp+2); app(cancel); 3147 app(outdent); app(force); reduce(pp,3,stmt,-2)(13); 3148 end 3149 3150 @ @<Cases for |elsie|@>= 3151 squash(pp,1,intro,-3)(14) 3152 3153 @ @<Cases for |exp|@>= 3154 if cat[pp+1]=math then 3155 begin if cat[pp+2]=simp then if cat[pp+3]<>simp then 3156 begin app3(pp); app("}"); reduce(pp,3,math,-1)(15); 3157 end; 3158 end 3159 else if cat[pp+1]=simp then if cat[pp+2]<>simp then 3160 begin app2(pp); app("}"); reduce(pp,2,math,-1)(16); 3161 end 3162 3163 @ @<Cases for |intro|@>= 3164 if cat[pp+1]=stmt then 3165 begin app1(pp); app(" "); app(opt); app("7"); 3166 app(cancel); app1(pp+1); reduce(pp,2,stmt,-2)(17); 3167 end 3168 3169 @ @<Cases for |math|@>= 3170 if cat[pp+1]=close then 3171 begin app("$"); app1(pp); app("$"); reduce(pp,1,stmt,-2)(18); 3172 end 3173 else if cat[pp+1]=colon then 3174 begin app(force); app(backup); app("$"); app1(pp); 3175 app("$"); app1(pp+1); reduce(pp,2,intro,-3)(19); 3176 end 3177 else if cat[pp+1]=math then squash(pp,2,math,-1)(20) 3178 else if cat[pp+1]=simp then squash(pp,2,math,-1)(21) 3179 else if cat[pp+1]=stmt then 3180 begin app("$"); app1(pp); app("$"); app(indent); 3181 app(break_space); app1(pp+1); app(cancel); app(outdent); 3182 app(force); reduce(pp,2,stmt,-2)(22); 3183 end 3184 else if cat[pp+1]=terminator then 3185 begin app("$"); app1(pp); app("$"); app1(pp+1); 3186 reduce(pp,2,stmt,-2)(23); 3187 end 3188 3189 @ @<Cases for |mod_scrap|@>= 3190 if (cat[pp+1]=terminator)or(cat[pp+1]=semi) then 3191 begin app2(pp); app(force); reduce(pp,2,stmt,-2)(24); 3192 end 3193 else squash(pp,1,simp,-2)(25) 3194 3195 @ @<Cases for |open|@>= 3196 if (cat[pp+1]=case_head)and(cat[pp+2]=close) then 3197 begin app1(pp); app("$"); app(cancel); app1(pp+1); app(cancel); 3198 app(outdent); app("$"); app1(pp+2); reduce(pp,3,math,-1)(26); 3199 end 3200 else if cat[pp+1]=close then 3201 begin app1(pp); app("\"); app(","); app1(pp+1); 3202 @.\\,@> 3203 reduce(pp,2,math,-1)(27); 3204 end 3205 else if cat[pp+1]=math then @<Cases for |open@,math|@> 3206 else if cat[pp+1]=proc then 3207 begin if cat[pp+2]=intro then 3208 begin app(math_op); app(cancel); app1(pp+1); app("}"); 3209 reduce(pp+1,2,math,0)(34); 3210 end; 3211 end 3212 else if cat[pp+1]=simp then squash(pp+1,1,math,0)(35) 3213 else if (cat[pp+1]=stmt)and(cat[pp+2]=close) then 3214 begin app1(pp); app("$"); app(cancel); app1(pp+1); app(cancel); 3215 app("$"); app1(pp+2); reduce(pp,3,math,-1)(36); 3216 end 3217 else if cat[pp+1]=var_head then 3218 begin if cat[pp+2]=intro then 3219 begin app(math_op); app(cancel); app1(pp+1); app("}"); 3220 reduce(pp+1,2,math,0)(37); 3221 end; 3222 end 3223 3224 @ @<Cases for |open@,math|@>= 3225 begin if (cat[pp+2]=case_head)and(cat[pp+3]=close) then 3226 begin app2(pp); app("$"); app(cancel); app1(pp+2); app(cancel); 3227 app(outdent); app("$"); app1(pp+3); reduce(pp,4,math,-1)(28); 3228 end 3229 else if cat[pp+2]=close then squash(pp,3,math,-1)(29) 3230 else if cat[pp+2]=colon then squash(pp+1,2,math,0)(30) 3231 else if cat[pp+2]=proc then 3232 begin if cat[pp+3]=intro then 3233 begin app1(pp+1); app(math_op); app(cancel); 3234 app1(pp+2); app("}"); reduce(pp+1,3,math,0)(31); 3235 end; 3236 end 3237 else if cat[pp+2]=semi then 3238 begin app2(pp+1); app("\"); app(","); app(opt); app("5"); 3239 @.\\,@> 3240 reduce(pp+1,2,math,0)(32); 3241 end 3242 else if cat[pp+2]=var_head then 3243 begin if cat[pp+3]=intro then 3244 begin app1(pp+1); app(math_op); app(cancel); 3245 app1(pp+2); app("}"); reduce(pp+1,3,math,0)(33); 3246 end; 3247 end; 3248 end 3249 3250 @ @<Cases for |proc|@>= 3251 if cat[pp+1]=beginning then 3252 begin if (cat[pp+2]=close)and(cat[pp+3]=terminator) then 3253 begin app1(pp); app(cancel); app(outdent); app3(pp+1); 3254 reduce(pp,4,stmt,-2)(38); 3255 end; 3256 end 3257 else if cat[pp+1]=stmt then 3258 begin app1(pp); app(break_space); app1(pp+1); 3259 reduce(pp,2,proc,-2)(39); 3260 end 3261 3262 @ @<Cases for |record_head|@>= 3263 if (cat[pp+1]=intro)and(cat[pp+2]=casey) then 3264 begin app2(pp); app(" "); app(cancel); app1(pp+2); 3265 reduce(pp,3,casey,-2)(40); 3266 end 3267 else begin app(indent); app1(pp); app(cancel); 3268 reduce(pp,1,case_head,0)(41); 3269 end 3270 3271 @ @<Cases for |semi|@>= 3272 squash(pp,1,terminator,-3)(42) 3273 3274 @ @<Cases for |simp|@>= 3275 if cat[pp+1]=close then squash(pp,1,stmt,-2)(43) 3276 else if cat[pp+1]=colon then 3277 begin app(force); app(backup); app2(pp); reduce(pp,2,intro,-3)(44); 3278 end 3279 else if cat[pp+1]=math then squash(pp,2,math,-1)(45) 3280 else if cat[pp+1]=mod_scrap then squash(pp,2,mod_scrap,0)(46) 3281 else if cat[pp+1]=simp then squash(pp,2,simp,-2)(47) 3282 else if cat[pp+1]=terminator then squash(pp,2,stmt,-2)(48) 3283 3284 @ @<Cases for |stmt|@>= 3285 if cat[pp+1]=stmt then 3286 begin app1(pp); app(break_space); app1(pp+1); 3287 reduce(pp,2,stmt,-2)(49); 3288 end 3289 3290 @ @<Cases for |terminator|@>= 3291 squash(pp,1,stmt,-2)(50) 3292 3293 @ @<Cases for |var_head|@>= 3294 if cat[pp+1]=beginning then squash(pp,1,stmt,-2)(51) 3295 else if cat[pp+1]=math then 3296 begin if cat[pp+2]=colon then 3297 begin app("$"); app1(pp+1); app("$"); app1(pp+2); 3298 reduce(pp+1,2,intro,+1)(52); 3299 end; 3300 end 3301 else if cat[pp+1]=simp then 3302 begin if cat[pp+2]=colon then squash(pp+1,2,intro,+1)(53); 3303 end 3304 else if cat[pp+1]=stmt then 3305 begin app1(pp); app(break_space); app1(pp+1); 3306 reduce(pp,2,var_head,-2)(54); 3307 end 3308 3309 @ The `|freeze_text|' macro is used to give official status to a token list. 3310 Before saying |freeze_text|, items are appended to the current token list, 3311 and we know that the eventual number of this token list will be the current 3312 value of |text_ptr|. But no list of that number really exists as yet, 3313 because no ending point for the current list has been 3314 stored in the |tok_start| array. After saying |freeze_text|, the 3315 old current token list becomes legitimate, and its number is the current 3316 value of |text_ptr-1| since |text_ptr| has been increased. The new 3317 current token list is empty and ready to be appended to. 3318 Note that |freeze_text| does not check to see that |text_ptr| hasn't gotten 3319 too large, since it is assumed that this test was done beforehand. 3320 3321 @d freeze_text==incr(text_ptr); tok_start[text_ptr]:=tok_ptr 3322 3323 @ The `|reduce|' macro used in our code for productions actually calls on 3324 a procedure named `|red|', which makes the appropriate changes to the 3325 scrap list. 3326 3327 @p procedure red(@!j:sixteen_bits; @!k:eight_bits; @!c:eight_bits; 3328 @!d:integer); 3329 var i:0..max_scraps; {index into scrap memory} 3330 begin cat[j]:=c; trans[j]:=text_ptr; freeze_text; 3331 if k>1 then 3332 begin for i:=j+k to lo_ptr do 3333 begin cat[i-k+1]:=cat[i]; trans[i-k+1]:=trans[i]; 3334 end; 3335 lo_ptr:=lo_ptr-k+1; 3336 end; 3337 @<Change |pp| to $\max(|scrap_base|,|pp+d|)$@>; 3338 end; 3339 3340 @ @<Change |pp| to $\max(|scrap_base|,|pp+d|)$@>= 3341 if pp+d>=scrap_base then pp:=pp+d 3342 else pp:=scrap_base 3343 3344 @ Similarly, the `|squash|' macro invokes a procedure called `|sq|'. This 3345 procedure takes advantage of the simplification that occurs when |k=1|. 3346 3347 @p procedure sq(@!j:sixteen_bits; @!k:eight_bits; @!c:eight_bits; 3348 @!d:integer); 3349 var i:0..max_scraps; {index into scrap memory} 3350 begin if k=1 then 3351 begin cat[j]:=c; @<Change |pp|...@>; 3352 end 3353 else begin for i:=j to j+k-1 do 3354 begin app1(i); 3355 end; 3356 red(j,k,c,d); 3357 end; 3358 end; 3359 3360 @ Here now is the code that applies productions as long as possible. It 3361 requires two local labels (|found| and |done|), as well as a local 3362 variable (|i|). 3363 3364 @<Reduce the scraps using the productions until no more rules apply@>= 3365 loop@+begin @<Make sure the entries |cat[pp..(pp+3)]| are defined@>; 3366 if (tok_ptr+8>max_toks)or(text_ptr+4>max_texts) then 3367 begin stat if tok_ptr>max_tok_ptr then max_tok_ptr:=tok_ptr; 3368 if text_ptr>max_txt_ptr then max_txt_ptr:=text_ptr; 3369 tats@;@/ 3370 overflow('token/text'); 3371 end; 3372 if pp>lo_ptr then goto done; 3373 @<Match a production...@>; 3374 end; 3375 done: 3376 3377 @ If we get to the end of the scrap list, category codes equal to zero are 3378 stored, since zero does not match anything in a production. 3379 3380 @<Make sure the entries...@>= 3381 if lo_ptr<pp+3 then 3382 begin repeat if hi_ptr<=scrap_ptr then 3383 begin incr(lo_ptr);@/ 3384 cat[lo_ptr]:=cat[hi_ptr]; trans[lo_ptr]:=trans[hi_ptr];@/ 3385 incr(hi_ptr); 3386 end; 3387 until (hi_ptr>scrap_ptr)or(lo_ptr=pp+3); 3388 for i:=lo_ptr+1 to pp+3 do cat[i]:=0; 3389 end 3390 3391 @ If \.{WEAVE} is being run in debugging mode, the production numbers and 3392 current stack categories will be printed out when |tracing| is set to 2; 3393 a sequence of two or more irreducible scraps will be printed out when 3394 |tracing| is set to 1. 3395 @.\AT!2@> 3396 @.\AT!1@> 3397 3398 @<Glo...@>= 3399 @!debug@!tracing:0..2; {can be used to show parsing details} 3400 gubed 3401 3402 @ The |prod| procedure is called in debugging mode just after |reduce| or 3403 |squash|; its parameter is the number of the production that has just 3404 been applied. 3405 3406 @p @!debug procedure prod(@!n:eight_bits); {shows current categories} 3407 var k:1..max_scraps; {index into |cat|} 3408 begin if tracing=2 then 3409 begin print_nl(n:1,':'); 3410 for k:=scrap_base to lo_ptr do 3411 begin if k=pp then print('*') @+ else print(' '); 3412 print_cat(cat[k]); 3413 end; 3414 if hi_ptr<=scrap_ptr then print('...'); {indicate that more is coming} 3415 end; 3416 end; 3417 gubed 3418 3419 @ The |translate| function assumes that scraps have been stored in 3420 positions |scrap_base| through |scrap_ptr| of |cat| and |trans|. It 3421 appends a |terminator| scrap and begins to apply productions as much as 3422 possible. The result is a token list containing the translation of 3423 the given sequence of scraps. 3424 3425 After calling |translate|, we will have |text_ptr+3<=max_texts| and 3426 |tok_ptr+6<=max_toks|, so it will be possible to create up to three token 3427 lists with up to six tokens without checking for overflow. Before calling 3428 |translate|, we should have |text_ptr<max_texts| and |scrap_ptr<max_scraps|, 3429 since |translate| might add a new text and a new scrap before it checks 3430 for overflow. 3431 3432 @p @<Declaration of subprocedures for |translate|@>@; 3433 function translate:text_pointer; {converts a sequence of scraps} 3434 label done,found; 3435 var i: 1..max_scraps; {index into |cat|} 3436 @!j:0..max_scraps; {runs through final scraps} 3437 @!debug@!k:0..long_buf_size; {index into |buffer|} 3438 gubed 3439 begin pp:=scrap_base; lo_ptr:=pp-1; hi_ptr:=pp; 3440 @<If tracing, print an indication of where we are@>; 3441 @<Reduce the scraps...@>; 3442 if (lo_ptr=scrap_base)and(cat[lo_ptr]<>math) then translate:=trans[lo_ptr] 3443 else @<Combine the irreducible scraps that remain@>; 3444 end; 3445 3446 @ If the initial sequence of scraps does not reduce to a single scrap, 3447 we concatenate the translations of all remaining scraps, separated by 3448 blank spaces, with dollar signs surrounding the translations of |math| 3449 scraps. 3450 3451 @<Combine the irreducible...@>= 3452 begin @<If semi-tracing, show the irreducible scraps@>; 3453 for j:=scrap_base to lo_ptr do 3454 begin if j<>scrap_base then 3455 begin app(" "); 3456 end; 3457 if cat[j]=math then 3458 begin app("$"); 3459 end; 3460 app1(j); 3461 if cat[j]=math then 3462 begin app("$"); 3463 end; 3464 if tok_ptr+6>max_toks then overflow('token'); 3465 end; 3466 freeze_text; translate:=text_ptr-1; 3467 end 3468 3469 @ @<If semi-tracing, show the irreducible scraps@>= 3470 @!debug if (lo_ptr>scrap_base)and(tracing=1) then 3471 begin print_nl('Irreducible scrap sequence in section ',module_count:1); 3472 print_ln(':'); mark_harmless; 3473 for j:=scrap_base to lo_ptr do 3474 begin print(' '); print_cat(cat[j]); 3475 end; 3476 end; 3477 gubed 3478 3479 @ @<If tracing,...@>= 3480 @!debug if tracing=2 then 3481 begin print_nl('Tracing after l.',line:1,':'); mark_harmless; 3482 if loc>50 then 3483 begin print('...'); 3484 for k:=loc-50 to loc do print(xchr[buffer[k-1]]); 3485 end 3486 else for k:=1 to loc do print(xchr[buffer[k-1]]); 3487 end 3488 gubed 3489 3490 @* Initializing the scraps. 3491 If we are going to use the powerful production mechanism just developed, we 3492 must get the scraps set up in the first place, given a \PASCAL\ text. A table 3493 of the initial scraps corresponding to \PASCAL\ tokens appeared above in the 3494 section on parsing; our goal now is to implement that table. We shall do this 3495 by implementing a subroutine called |Pascal_parse| that is analogous to the 3496 |Pascal_xref| routine used during phase one. 3497 3498 Like |Pascal_xref|, the |Pascal_parse| procedure starts with the current 3499 value of |next_control| and it uses the operation |next_control:=get_next| 3500 repeatedly to read \PASCAL\ text until encountering the next `\v' or 3501 `\.\{', or until |next_control>=format|. The scraps corresponding to what 3502 it reads are appended into the |cat| and |trans| arrays, and |scrap_ptr| 3503 is advanced. 3504 3505 Like |prod|, this procedure has to split into pieces so that each 3506 part is short enough to be handled by \PASCAL\ compilers that discriminate 3507 against long subroutines. This time there are two split-off routines, 3508 called |easy_cases| and |sub_cases|. 3509 @^split procedures@> 3510 3511 After studying |Pascal_parse|, we will look at the sub-procedures 3512 |app_comment|, |app_octal|, and |app_hex| that are used in some of its 3513 branches. 3514 3515 @p @<Declaration of the |app_comment| procedure@>@; 3516 @<Declaration of the |app_octal| and |app_hex| procedures@>@; 3517 @<Declaration of the |easy_cases| procedure@>@; 3518 @<Declaration of the |sub_cases| procedure@>@; 3519 procedure Pascal_parse; {creates scraps from \PASCAL\ tokens} 3520 label reswitch, exit; 3521 var j:0..long_buf_size; {index into |buffer|} 3522 @!p:name_pointer; {identifier designator} 3523 begin while next_control<format do 3524 begin @<Append the scrap appropriate to |next_control|@>; 3525 next_control:=get_next; 3526 if (next_control="|")or(next_control="{") then return; 3527 end; 3528 exit:end; 3529 3530 @ The macros defined here are helpful abbreviations for the operations 3531 needed when generating the scraps. A scrap of category |c| whose 3532 translation has three tokens $t_1$, $t_2$, $t_3$ is generated by 3533 |sc3|$(t_1)(t_2)(t_3)(c)$, etc. 3534 3535 @d s0(#)==incr(scrap_ptr); cat[scrap_ptr]:=#; trans[scrap_ptr]:=text_ptr; 3536 freeze_text; 3537 end 3538 @d s1(#)==app(#);s0 3539 @d s2(#)==app(#);s1 3540 @d s3(#)==app(#);s2 3541 @d s4(#)==app(#);s3 3542 @d sc4==@+begin s4 3543 @d sc3==@+begin s3 3544 @d sc2==@+begin s2 3545 @d sc1==@+begin s1 3546 @d sc0(#)==begin incr(scrap_ptr); cat[scrap_ptr]:=#; trans[scrap_ptr]:=0; 3547 end 3548 @d comment_scrap(#)==begin app(#); app_comment; 3549 end 3550 3551 @ @<Append the scr...@>= 3552 @<Make sure that there is room for at least four more scraps, six more 3553 tokens, and four more texts@>; 3554 reswitch: case next_control of 3555 string,verbatim: @<Append a \(string scrap@>; 3556 identifier: @<Append an identifier scrap@>; 3557 TeX_string: @<Append a \TeX\ string scrap@>; 3558 othercases easy_cases 3559 endcases 3560 3561 @ The |easy_cases| each result in straightforward scraps. 3562 3563 @<Declaration of the |easy_cases| procedure@>= 3564 procedure easy_cases; {a subprocedure of |Pascal_parse|} 3565 begin case next_control of 3566 set_element_sign: sc3("\")("i")("n")(math); 3567 @.\\in@> 3568 double_dot: sc3("\")("t")("o")(math); 3569 @.\\to@> 3570 "#","$","%","^","_": sc2("\")(next_control)(math); 3571 @.\\\#@> 3572 @.\\\$@> 3573 @.\\\%@> 3574 @.\\\^@> 3575 ignore,"|",xref_roman,xref_wildcard,xref_typewriter: do_nothing; 3576 "(","[": sc1(next_control)(open); 3577 ")","]": sc1(next_control)(close); 3578 "*": sc4("\")("a")("s")("t")(math); 3579 @.\\ast@> 3580 ",": sc3(",")(opt)("9")(math); 3581 ".","0","1","2","3","4","5","6","7","8","9": sc1(next_control)(simp); 3582 ";": sc1(";")(semi); 3583 ":": sc1(":")(colon); 3584 @t\4@> @<Cases involving nonstandard ASCII characters@>@; 3585 exponent: sc3("\")("E")("{")(exp); 3586 @.\\E@> 3587 begin_comment: sc2("\")("B")(math); 3588 @.\\B@> 3589 end_comment: sc2("\")("T")(math); 3590 @.\\T@> 3591 octal: app_octal; 3592 hex: app_hex; 3593 check_sum: sc2("\")(")")(simp); 3594 @.\\)@> 3595 force_line: sc2("\")("]")(simp); 3596 @.\\]@> 3597 thin_space: sc2("\")(",")(math); 3598 @.\\,@> 3599 math_break: sc2(opt)("0")(simp); 3600 line_break: comment_scrap(force); 3601 big_line_break: comment_scrap(big_force); 3602 no_line_break: begin app(big_cancel); app("\"); app(" "); 3603 @.\\\ @> 3604 comment_scrap(big_cancel); 3605 end; 3606 pseudo_semi: sc0(semi); 3607 join: sc2("\")("J")(math); 3608 @.\\J@> 3609 othercases sc1(next_control)(math) 3610 endcases; 3611 end; 3612 3613 @ @<Make sure that there is room for at least four...@>= 3614 if (scrap_ptr+4>max_scraps)or(tok_ptr+6>max_toks)or(text_ptr+4>max_texts) then 3615 begin stat if scrap_ptr>max_scr_ptr then max_scr_ptr:=scrap_ptr; 3616 if tok_ptr>max_tok_ptr then max_tok_ptr:=tok_ptr; 3617 if text_ptr>max_txt_ptr then max_txt_ptr:=text_ptr; 3618 tats@;@/ 3619 overflow('scrap/token/text'); 3620 end 3621 3622 @ Some nonstandard ASCII characters may have entered \.{WEAVE} by means of 3623 standard ones. They are converted to \TeX\ control sequences so that it is 3624 possible to keep \.{WEAVE} from stepping beyond standard ASCII. 3625 3626 @<Cases involving nonstandard...@>= 3627 not_equal: sc2("\")("I")(math); 3628 @.\\I@> 3629 less_or_equal: sc2("\")("L")(math); 3630 @.\\L@> 3631 greater_or_equal: sc2("\")("G")(math); 3632 @.\\G@> 3633 equivalence_sign: sc2("\")("S")(math); 3634 @.\\S@> 3635 and_sign: sc2("\")("W")(math); 3636 @.\\W@> 3637 or_sign: sc2("\")("V")(math); 3638 @.\\V@> 3639 not_sign: sc2("\")("R")(math); 3640 @.\\R@> 3641 left_arrow: sc2("\")("K")(math); 3642 @.\\K@> 3643 3644 @ The following code must use |app_tok| instead of |app| in order to 3645 protect against overflow. Note that |tok_ptr+1<=max_toks| after |app_tok| 3646 has been used, so another |app| is legitimate before testing again. 3647 3648 Many of the special characters in a string must be prefixed by `\.\\' so that 3649 \TeX\ will print them properly. 3650 @^special string characters@> 3651 3652 @<Append a \(string scrap@>= 3653 begin app("\"); 3654 if next_control=verbatim then 3655 begin app("="); 3656 @.\\=@> 3657 end 3658 else begin app("."); 3659 @.\\.@> 3660 end; 3661 app("{"); j:=id_first; 3662 while j<id_loc do 3663 begin case buffer[j] of 3664 " ","\","#","%","$","^","'","`","{","}","~","&","_": 3665 begin app("\"); 3666 end; 3667 @.\\\ @> 3668 @.\\\\@> 3669 @.\\\#@> 3670 @.\\\%@> 3671 @.\\\$@> 3672 @.\\\^@> 3673 @.\\\'@> 3674 @.\\\`@> 3675 @.\\\{@> 3676 @.\\\}@> 3677 @.\\\~@> 3678 @.\\\&@> 3679 @.\\_@> 3680 "@@": if buffer[j+1]="@@" then incr(j) 3681 else err_print('! Double @@ should be used in strings'); 3682 @.Double \AT! should be used...@> 3683 othercases do_nothing 3684 endcases;@/ 3685 app_tok(buffer[j]); incr(j); 3686 end; 3687 sc1("}")(simp); 3688 end 3689 3690 @ @<Append a \TeX\ string scrap@>= 3691 begin app("\"); app("h"); app("b"); app("o"); app("x"); 3692 app("{"); 3693 for j:=id_first to id_loc-1 do app_tok(buffer[j]); 3694 sc1("}")(simp); 3695 end 3696 3697 @ @<Append an identifier scrap@>= 3698 begin p:=id_lookup(normal); 3699 case ilk[p] of 3700 normal,array_like,const_like,div_like, 3701 do_like,for_like,goto_like,nil_like,to_like: sub_cases(p); 3702 @t\4@>@<Cases that generate more than one scrap@>@; 3703 othercases begin next_control:=ilk[p]-char_like; goto reswitch; 3704 end {\&{and}, \&{in}, \&{not}, \&{or}} 3705 endcases; 3706 end 3707 3708 @ The |sub_cases| also result in straightforward scraps. 3709 3710 @<Declaration of the |sub_cases| procedure@>= 3711 procedure sub_cases(@!p:name_pointer); {a subprocedure of |Pascal_parse|} 3712 begin case ilk[p] of 3713 normal: sc1(id_flag+p)(simp); {not a reserved word} 3714 array_like: sc1(res_flag+p)(alpha); {\&{array}, \&{file}, \&{set}} 3715 const_like: sc3(force)(backup)(res_flag+p)(intro); 3716 {\&{const}, \&{label}, \&{type}} 3717 div_like: sc3(math_bin)(res_flag+p)("}")(math); {\&{div}, \&{mod}} 3718 do_like: sc1(res_flag+p)(omega); {\&{do}, \&{of}, \&{then}} 3719 for_like: sc2(force)(res_flag+p)(alpha); {\&{for}, \&{while}, \&{with}} 3720 goto_like: sc1(res_flag+p)(intro); {\&{goto}, \&{packed}} 3721 nil_like: sc1(res_flag+p)(simp); {\&{nil}} 3722 to_like: sc3(math_rel)(res_flag+p)("}")(math); {\&{downto}, \&{to}} 3723 end; 3724 end; 3725 3726 @ @<Cases that generate more than one scrap@>= 3727 begin_like: begin sc3(force)(res_flag+p)(cancel)(beginning); sc0(intro); 3728 end; {\&{begin}} 3729 case_like: begin sc0(casey); sc2(force)(res_flag+p)(alpha); 3730 end; {\&{case}} 3731 else_like: begin @<Append |terminator| if not already present@>; 3732 sc3(force)(backup)(res_flag+p)(elsie); 3733 end; {\&{else}} 3734 end_like: begin @<Append |term...@>; 3735 sc2(force)(res_flag+p)(close); 3736 end; {\&{end}} 3737 if_like: begin sc0(cond); sc2(force)(res_flag+p)(alpha); 3738 end; {\&{if}} 3739 loop_like: begin sc3(force)("\")("~")(alpha); 3740 @.\\\~@> 3741 sc1(res_flag+p)(omega); 3742 end; {\&{xclause}} 3743 proc_like: begin sc4(force)(backup)(res_flag+p)(cancel)(proc); 3744 sc3(indent)("\")(" ")(intro); 3745 @.\\\ @> 3746 end; {\&{function}, \&{procedure}, \&{program}} 3747 record_like: begin sc1(res_flag+p)(record_head); sc0(intro); 3748 end; {\&{record}} 3749 repeat_like: begin sc4(force)(indent)(res_flag+p)(cancel)(beginning); 3750 sc0(intro); 3751 end; {\&{repeat}} 3752 until_like: begin @<Append |term...@>; 3753 sc3(force)(backup)(res_flag+p)(close); sc0(clause); 3754 end; {\&{until}} 3755 var_like: begin sc4(force)(backup)(res_flag+p)(cancel)(var_head); sc0(intro); 3756 end; {\&{var}} 3757 3758 @ If a comment or semicolon appears before the reserved words \&{end}, 3759 \&{else}, or \&{until}, the |semi| or |terminator| scrap that is already 3760 present overrides the |terminator| scrap belonging to this reserved word. 3761 3762 @<Append |termin...@>= 3763 if (scrap_ptr<scrap_base)or((cat[scrap_ptr]<>terminator)and 3764 (cat[scrap_ptr]<>semi)) then sc0(terminator) 3765 3766 @ A comment is incorporated into the previous scrap if that scrap is of type 3767 |omega| or |semi| or |terminator|. (These three categories have consecutive 3768 category codes.) Otherwise the comment is entered as a separate scrap 3769 of type |terminator|, and it will combine with a |terminator| scrap that 3770 immediately follows~it. 3771 3772 The |app_comment| procedure takes care of placing a comment at the end of the 3773 current scrap list. When |app_comment| is called, we assume that the current 3774 token list is the translation of the comment involved. 3775 3776 @<Declaration of the |app_comment|...@>= 3777 procedure app_comment; {append a comment to the scrap list} 3778 begin freeze_text; 3779 if (scrap_ptr<scrap_base)or(cat[scrap_ptr]<omega)or 3780 (cat[scrap_ptr]>terminator) then sc0(terminator) 3781 else begin app1(scrap_ptr); {|cat[scrap_ptr]| is 3782 |omega| or |semi| or |terminator|} 3783 end; 3784 app(text_ptr-1+tok_flag); trans[scrap_ptr]:=text_ptr; freeze_text; 3785 end; 3786 3787 @ We are now finished with |Pascal_parse|, except for two relatively 3788 trivial subprocedures that convert constants into tokens. 3789 3790 @<Declaration of the |app_octal| and...@>= 3791 procedure app_octal; 3792 begin app("\"); app("O"); app("{"); 3793 @.\\O@> 3794 while (buffer[loc]>="0")and(buffer[loc]<="7") do 3795 begin app_tok(buffer[loc]); incr(loc); 3796 end; 3797 sc1("}")(simp); 3798 end; 3799 @# 3800 procedure app_hex; 3801 begin app("\"); app("H"); app("{"); 3802 @.\\H@> 3803 while ((buffer[loc]>="0")and(buffer[loc]<="9"))or@| 3804 ((buffer[loc]>="A")and(buffer[loc]<="F")) do 3805 begin app_tok(buffer[loc]); incr(loc); 3806 end; 3807 sc1("}")(simp); 3808 end; 3809 3810 3811 @ When the `\v' that introduces \PASCAL\ text is sensed, a call on 3812 |Pascal_translate| will return a pointer to the \TeX\ translation of 3813 that text. If scraps exist in the |cat| and |trans| arrays, they are 3814 unaffected by this translation process. 3815 3816 @p function Pascal_translate: text_pointer; 3817 var p:text_pointer; {points to the translation} 3818 @!save_base:0..max_scraps; {holds original value of |scrap_base|} 3819 begin save_base:=scrap_base; scrap_base:=scrap_ptr+1; 3820 Pascal_parse; {get the scraps together} 3821 if next_control<>"|" then err_print('! Missing "|" after Pascal text'); 3822 @.Missing "|"...@> 3823 app_tok(cancel); app_comment; {place a |cancel| token as a final ``comment''} 3824 p:=translate; {make the translation} 3825 stat if scrap_ptr>max_scr_ptr then max_scr_ptr:=scrap_ptr;@;@+tats@;@/ 3826 scrap_ptr:=scrap_base-1; scrap_base:=save_base; {scrap the scraps} 3827 Pascal_translate:=p; 3828 end; 3829 3830 @ The |outer_parse| routine is to |Pascal_parse| as |outer_xref| 3831 is to |Pascal_xref|: It constructs a sequence of scraps for \PASCAL\ text 3832 until |next_control>=format|. Thus, it takes care of embedded comments. 3833 3834 @p procedure outer_parse; {makes scraps from \PASCAL\ tokens and comments} 3835 var bal:eight_bits; {brace level in comment} 3836 @!p,@!q:text_pointer; {partial comments} 3837 begin while next_control<format do 3838 if next_control<>"{" then Pascal_parse 3839 else begin @<Make sure that there is room for at least seven more 3840 tokens, three more texts, and one more scrap@>; 3841 app("\"); app("C"); app("{"); 3842 @.\\C@> 3843 bal:=copy_comment(1); next_control:="|"; 3844 while bal>0 do 3845 begin p:=text_ptr; freeze_text; q:=Pascal_translate; 3846 {at this point we have |tok_ptr+6<=max_toks|} 3847 app(tok_flag+p); app(inner_tok_flag+q); 3848 if next_control="|" then bal:=copy_comment(bal) 3849 else bal:=0; {an error has been reported} 3850 end; 3851 app(force); app_comment; {the full comment becomes a scrap} 3852 end; 3853 end; 3854 3855 @ @<Make sure that there is room for at least seven more...@>= 3856 if (tok_ptr+7>max_toks)or(text_ptr+3>max_texts)or(scrap_ptr>=max_scraps) then 3857 begin stat if scrap_ptr>max_scr_ptr then max_scr_ptr:=scrap_ptr; 3858 if tok_ptr>max_tok_ptr then max_tok_ptr:=tok_ptr; 3859 if text_ptr>max_txt_ptr then max_txt_ptr:=text_ptr; 3860 tats@;@/ 3861 overflow('token/text/scrap'); 3862 end 3863 3864 @* Output of tokens. 3865 So far our programs have only built up multi-layered token lists in 3866 \.{WEAVE}'s internal memory; we have to figure out how to get them into 3867 the desired final form. The job of converting token lists to characters in 3868 the \TeX\ output file is not difficult, although it is an implicitly 3869 recursive process. Four main considerations had to be kept in mind when 3870 this part of \.{WEAVE} was designed. (a) There are two modes of output: 3871 |outer| mode, which translates tokens like |force| into line-breaking 3872 control sequences, and |inner| mode, which ignores them except that blank 3873 spaces take the place of line breaks. (b) The |cancel| instruction applies 3874 to adjacent token or tokens that are output, and this cuts across levels 3875 of recursion since `|cancel|' occurs at the beginning or end of a token 3876 list on one level. (c) The \TeX\ output file will be semi-readable if line 3877 breaks are inserted after the result of tokens like |break_space| and 3878 |force|. (d) The final line break should be suppressed, and there should 3879 be no |force| token output immediately after `\.{\\Y\\P}'. 3880 3881 @ The output process uses a stack to keep track of what is going on at 3882 different ``levels'' as the token lists are being written out. Entries on 3883 this stack have three parts: 3884 3885 \yskip\hang |end_field| is the |tok_mem| location where the token list of a 3886 particular level will end; 3887 3888 \yskip\hang |tok_field| is the |tok_mem| location from which the next token 3889 on a particular level will be read; 3890 3891 \yskip\hang |mode_field| is the current mode, either |inner| or |outer|. 3892 3893 \yskip\noindent The current values of these quantities are referred to 3894 quite frequently, so they are stored in a separate place instead of in the 3895 |stack| array. We call the current values |cur_end|, |cur_tok|, and 3896 |cur_mode|. 3897 3898 The global variable |stack_ptr| tells how many levels of output are 3899 currently in progress. The end of output occurs when an |end_translation| 3900 token is found, so the stack is never empty except when we first begin the 3901 output process. 3902 3903 @d inner=0 {value of |mode| for \PASCAL\ texts within \TeX\ texts} 3904 @d outer=1 {value of |mode| for \PASCAL\ texts in modules} 3905 3906 @<Types...@>= 3907 @!mode=inner..outer;@/ 3908 @!output_state=record@!end_field:sixteen_bits; {ending location of token list} 3909 @!tok_field:sixteen_bits; {present location within token list} 3910 @!mode_field:mode; {interpretation of control tokens} 3911 end; 3912 3913 @ @d cur_end==cur_state.end_field {current ending location in |tok_mem|} 3914 @d cur_tok==cur_state.tok_field {location of next output token in |tok_mem|} 3915 @d cur_mode==cur_state.mode_field {current mode of interpretation} 3916 @d init_stack==stack_ptr:=0;cur_mode:=outer {do this to initialize the stack} 3917 3918 @<Glob...@>= 3919 @!cur_state:output_state; {|cur_end|, |cur_tok|, |cur_mode|} 3920 @!stack:array[1..stack_size] of output_state; {info for non-current levels} 3921 @!stack_ptr:0..stack_size; {first unused location in the output state stack} 3922 stat@!max_stack_ptr:0..stack_size; {largest value assumed by |stack_ptr|} 3923 tats 3924 3925 @ @<Set init...@>=stat max_stack_ptr:=0;@+tats 3926 3927 @ To insert token-list |p| into the output, the |push_level| subroutine 3928 is called; it saves the old level of output and gets a new one going. 3929 The value of |cur_mode| is not changed. 3930 3931 @p procedure push_level(@!p:text_pointer); {suspends the current level} 3932 begin if stack_ptr=stack_size then overflow('stack') 3933 else begin if stack_ptr>0 then 3934 stack[stack_ptr]:=cur_state; {save |cur_end|$\,\ldots\,$|cur_mode|} 3935 incr(stack_ptr); 3936 stat if stack_ptr>max_stack_ptr then 3937 max_stack_ptr:=stack_ptr;@;@+tats@;@/ 3938 cur_tok:=tok_start[p]; cur_end:=tok_start[p+1]; 3939 end; 3940 end; 3941 3942 @ Conversely, the |pop_level| routine restores the conditions that were in 3943 force when the current level was begun. This subroutine will never be 3944 called when |stack_ptr=1|. It is so simple, we declare it as a macro: 3945 3946 @d pop_level==begin decr(stack_ptr); cur_state:=stack[stack_ptr]; 3947 end {do this when |cur_tok| reaches |cur_end|} 3948 3949 @ The |get_output| function returns the next byte of output that is not a 3950 reference to a token list. It returns the values |identifier| or |res_word| 3951 or |mod_name| if the next token is to be an identifier (typeset in 3952 italics), a reserved word (typeset in boldface) or a module name (typeset 3953 by a complex routine that might generate additional levels of output). 3954 In these cases |cur_name| points to the identifier or module name in 3955 question. 3956 3957 @d res_word=@'201 {returned by |get_output| for reserved words} 3958 @d mod_name=@'200 {returned by |get_output| for module names} 3959 3960 @p function get_output:eight_bits; {returns the next token of output} 3961 label restart; 3962 var a:sixteen_bits; {current item read from |tok_mem|} 3963 begin restart: while cur_tok=cur_end do pop_level; 3964 a:=tok_mem[cur_tok]; incr(cur_tok); 3965 if a>=@'400 then 3966 begin cur_name:=a mod id_flag; 3967 case a div id_flag of 3968 2: a:=res_word; {|a=res_flag+cur_name|} 3969 3: a:=mod_name; {|a=mod_flag+cur_name|} 3970 4: begin push_level(cur_name); goto restart; 3971 end; {|a=tok_flag+cur_name|} 3972 5: begin push_level(cur_name); cur_mode:=inner; goto restart; 3973 end; {|a=inner_tok_flag+cur_name|} 3974 othercases a:=identifier {|a=id_flag+cur_name|} 3975 endcases; 3976 end; 3977 @!debug if trouble_shooting then debug_help; @+ gubed@/ 3978 get_output:=a; 3979 end; 3980 3981 @ The real work associated with token output is done by |make_output|. 3982 This procedure appends an |end_translation| token to the current token list, 3983 and then it repeatedly calls |get_output| and feeds characters to the output 3984 buffer until reaching the |end_translation| sentinel. It is possible for 3985 |make_output| to 3986 be called recursively, since a module name may include embedded \PASCAL\ 3987 text; however, the depth of recursion never exceeds one level, since 3988 module names cannot be inside of module names. 3989 3990 A procedure called |output_Pascal| does the scanning, translation, and 3991 output of \PASCAL\ text within `\pb' brackets, and this procedure uses 3992 |make_output| to output the current token list. Thus, the recursive call 3993 of |make_output| actually occurs when |make_output| calls |output_Pascal| 3994 while outputting the name of a module. 3995 @^recursion@> 3996 3997 @p procedure make_output; forward; @t\2@>@# 3998 procedure output_Pascal; {outputs the current token list} 3999 var save_tok_ptr,@!save_text_ptr,@!save_next_control:sixteen_bits; 4000 {values to be restored} 4001 p:text_pointer; {translation of the \PASCAL\ text} 4002 begin save_tok_ptr:=tok_ptr; save_text_ptr:=text_ptr; 4003 save_next_control:=next_control; next_control:="|"; p:=Pascal_translate; 4004 app(p+inner_tok_flag); 4005 make_output; {output the list} 4006 stat if text_ptr>max_txt_ptr then max_txt_ptr:=text_ptr; 4007 if tok_ptr>max_tok_ptr then max_tok_ptr:=tok_ptr;@;@+tats@;@/ 4008 text_ptr:=save_text_ptr; tok_ptr:=save_tok_ptr; {forget the tokens} 4009 next_control:=save_next_control; {restore |next_control| to original state} 4010 end; 4011 4012 @ Here is \.{WEAVE}'s major output handler. 4013 4014 @p procedure make_output; {outputs the equivalents of tokens} 4015 label reswitch,exit,found; 4016 var a:eight_bits; {current output byte} 4017 @!b:eight_bits; {next output byte} 4018 @!k,@!k_limit:0..max_bytes; {indices into |byte_mem|} 4019 @!w:0..ww-1; {row of |byte_mem|} 4020 @!j:0..long_buf_size; {index into |buffer|} 4021 @!string_delimiter:ASCII_code; {first and last character of 4022 string being copied} 4023 @!save_loc,@!save_limit:0..long_buf_size; {|loc| and |limit| to be restored} 4024 @!cur_mod_name:name_pointer; {name of module being output} 4025 @!save_mode:mode; {value of |cur_mode| before a sequence of breaks} 4026 begin app(end_translation); {append a sentinel} 4027 freeze_text; push_level(text_ptr-1); 4028 loop@+ begin a:=get_output; 4029 reswitch: case a of 4030 end_translation: return; 4031 identifier,res_word:@<Output an identifier@>; 4032 mod_name:@<Output a module name@>; 4033 math_bin,math_op,math_rel:@<Output a \.{\\math} operator@>; 4034 cancel: begin repeat a:=get_output; 4035 until (a<backup)or(a>big_force); 4036 goto reswitch; 4037 end; 4038 big_cancel: begin repeat a:=get_output; 4039 until ((a<backup)and(a<>" "))or(a>big_force); 4040 goto reswitch; 4041 end; 4042 indent,outdent,opt,backup,break_space,force,big_force:@<Output a 4043 \(control, look ahead in case of line breaks, 4044 possibly |goto reswitch|@>; 4045 othercases out(a) {otherwise |a| is an ASCII character} 4046 endcases; 4047 end; 4048 exit:end; 4049 4050 @ An identifier of length one does not have to be enclosed in braces, and it 4051 looks slightly better if set in a math-italic font instead of a (slightly 4052 narrower) text-italic font. Thus we output `\.{\\\char'174a}' but 4053 `\.{\\\\\{aa\}}'. 4054 4055 @<Output an identifier@>= 4056 begin out("\"); 4057 if a=identifier then 4058 if length(cur_name)=1 then out("|") 4059 @.\\|@> 4060 else out("\") 4061 @.\\\\@> 4062 else out("&"); {|a=res_word|} 4063 @.\\\&@> 4064 if length(cur_name)=1 then out(byte_mem[cur_name mod ww,byte_start[cur_name]]) 4065 else out_name(cur_name); 4066 end 4067 4068 @ @<Output a \....@>= 4069 begin out5("\")("m")("a")("t")("h"); 4070 if a=math_bin then out3("b")("i")("n") 4071 else if a=math_rel then out3("r")("e")("l") 4072 else out2("o")("p"); 4073 out("{"); 4074 end 4075 4076 @ The current mode does not affect the behavior of \.{WEAVE}'s output routine 4077 except when we are outputting control tokens. 4078 4079 @<Output a \(control...@>= 4080 if a<break_space then 4081 begin if cur_mode=outer then 4082 begin out2("\")(a-cancel+"0"); 4083 @.\\1@> 4084 @.\\2@> 4085 @.\\3@> 4086 @.\\4@> 4087 @.\\5@> 4088 @.\\6@> 4089 @.\\7@> 4090 if a=opt then out(get_output) {|opt| is followed by a digit} 4091 end 4092 else if a=opt then b:=get_output {ignore digit following |opt|} 4093 end 4094 else @<Look ahead for strongest line break, |goto reswitch|@> 4095 4096 @ If several of the tokens |break_space|, |force|, |big_force| occur in a 4097 row, possibly mixed with blank spaces (which are ignored), 4098 the largest one is used. A line break also occurs in the output file, 4099 except at the very end of the translation. The very first line break 4100 is suppressed (i.e., a line break that follows `\.{\\Y\\P}'). 4101 4102 @<Look ahead for st...@>= 4103 begin b:=a; save_mode:=cur_mode; 4104 loop@+ begin a:=get_output; 4105 if (a=cancel)or(a=big_cancel) then goto reswitch; 4106 {|cancel| overrides everything} 4107 if ((a<>" ")and(a<break_space))or(a>big_force) then 4108 begin if save_mode=outer then 4109 begin if out_ptr>3 then 4110 if (out_buf[out_ptr]="P")and 4111 (out_buf[out_ptr-1]="\")and 4112 @.\\P@> 4113 @.\\Y@> 4114 (out_buf[out_ptr-2]="Y")and 4115 (out_buf[out_ptr-3]="\") then 4116 goto reswitch; 4117 @.\\1@> 4118 @.\\2@> 4119 @.\\3@> 4120 @.\\4@> 4121 @.\\5@> 4122 @.\\6@> 4123 @.\\7@> 4124 out2("\")(b-cancel+"0"); 4125 if a<>end_translation then finish_line; 4126 end 4127 else if (a<>end_translation)and(cur_mode=inner) then out(" "); 4128 goto reswitch; 4129 end; 4130 if a>b then b:=a; {if |a=" "| we have |a<b|} 4131 end; 4132 end 4133 4134 @ The remaining part of |make_output| is somewhat more complicated. When we 4135 output a module name, we may need to enter the parsing and translation 4136 routines, since the name may contain \PASCAL\ code embedded in 4137 \pb\ constructions. This \PASCAL\ code is placed at the end of the active 4138 input buffer and the translation process uses the end of the active 4139 |tok_mem| area. 4140 4141 @<Output a module name@>= 4142 begin out2("\")("X"); 4143 @.\\X@> 4144 cur_xref:=xref[cur_name]; 4145 if num(cur_xref)>=def_flag then 4146 begin out_mod(num(cur_xref)-def_flag); 4147 if phase_three then 4148 begin cur_xref:=xlink(cur_xref); 4149 while num(cur_xref)>=def_flag do 4150 begin out2(",")(" "); 4151 out_mod(num(cur_xref)-def_flag); 4152 cur_xref:=xlink(cur_xref); 4153 end; 4154 end; 4155 end 4156 else out("0"); {output the module number, or zero if it was undefined} 4157 out(":"); @<Output the text of the module name@>; 4158 out2("\")("X"); 4159 end 4160 4161 @ @<Output the text...@>= 4162 k:=byte_start[cur_name]; w:=cur_name mod ww; k_limit:=byte_start[cur_name+ww]; 4163 cur_mod_name:=cur_name; 4164 while k<k_limit do 4165 begin b:=byte_mem[w,k]; incr(k); 4166 if b="@@" then @<Skip next character, give error if not `\.{@@}'@>; 4167 if b<>"|" then out(b) 4168 else begin @<Copy the \PASCAL\ text into |buffer[(limit+1)..j]|@>; 4169 save_loc:=loc; save_limit:=limit; loc:=limit+2; limit:=j+1; 4170 buffer[limit]:="|"; output_Pascal; 4171 loc:=save_loc; limit:=save_limit; 4172 end; 4173 end 4174 4175 @ @<Skip next char...@>= 4176 begin if byte_mem[w,k]<>"@@" then 4177 begin print_nl('! Illegal control code in section name:'); 4178 @.Illegal control code...@> 4179 print_nl('<'); print_id(cur_mod_name); print('> '); mark_error; 4180 end; 4181 incr(k); 4182 end 4183 4184 @ The \PASCAL\ text enclosed in \pb\ should not contain `\v' characters, 4185 except within strings. We put a `\v' at the front of the buffer, so that an 4186 error message that displays the whole buffer will look a little bit sensible. 4187 The variable |string_delimiter| is zero outside of strings, otherwise it 4188 equals the delimiter that began the string being copied. 4189 4190 @<Copy the \PASCAL\ text into...@>= 4191 j:=limit+1; buffer[j]:="|"; string_delimiter:=0; 4192 loop@+ begin if k>=k_limit then 4193 begin print_nl('! Pascal text in section name didn''t end:'); 4194 @.Pascal text...didn't end@> 4195 print_nl('<'); print_id(cur_mod_name); print('> '); mark_error; 4196 goto found; 4197 end; 4198 b:=byte_mem[w,k]; incr(k); 4199 if b="@@" then @<Copy a control code into the buffer@> 4200 else begin if (b="""")or(b="'") then 4201 if string_delimiter=0 then string_delimiter:=b 4202 else if string_delimiter=b then string_delimiter:=0; 4203 if (b<>"|")or(string_delimiter<>0) then 4204 begin if j>long_buf_size-3 then overflow('buffer'); 4205 incr(j); buffer[j]:=b; 4206 end 4207 else goto found; 4208 end; 4209 end; 4210 found: 4211 4212 @ @<Copy a control code into the buffer@>= 4213 begin if j>long_buf_size-4 then overflow('buffer'); 4214 buffer[j+1]:="@@"; buffer[j+2]:=byte_mem[w,k]; j:=j+2; incr(k); 4215 end 4216 4217 @* Phase two processing. 4218 We have assembled enough pieces of the puzzle in order to be ready to specify 4219 the processing in \.{WEAVE}'s main pass over the source file. Phase two 4220 is analogous to phase one, except that more work is involved because we must 4221 actually output the \TeX\ material instead of merely looking at the 4222 \.{WEB} specifications. 4223 4224 @<Phase II: Read all the text again and translate it to \TeX\ form@>= 4225 reset_input; print_nl('Writing the output file...'); 4226 module_count:=0; 4227 copy_limbo; 4228 finish_line; flush_buffer(0,false,false); {insert a blank line, it looks nice} 4229 while not input_has_ended do @<Translate the \(current module@> 4230 4231 @ The output file will contain the control sequence \.{\\Y} between non-null 4232 sections of a module, e.g., between the \TeX\ and definition parts if both 4233 are nonempty. This puts a little white space between the parts when they are 4234 printed. However, we don't want \.{\\Y} to occur between two definitions 4235 within a single module. The variables |out_line| or |out_ptr| will 4236 change if a section is non-null, so the following macros `|save_position|' 4237 and `|emit_space_if_needed|' are able to handle the situation: 4238 4239 @d save_position==save_line:=out_line; save_place:=out_ptr 4240 @d emit_space_if_needed==if (save_line<>out_line)or(save_place<>out_ptr) then 4241 out2("\")("Y") 4242 @.\\Y@> 4243 4244 @<Glo...@>= 4245 @!save_line:integer; {former value of |out_line|} 4246 @!save_place:sixteen_bits; {former value of |out_ptr|} 4247 4248 @ @<Translate the \(current module@>= 4249 begin incr(module_count);@/ 4250 @<Output the code for the beginning of a new module@>; 4251 save_position;@/ 4252 @<Translate the \TeX\ part of the current module@>; 4253 @<Translate the \(definition part of the current module@>; 4254 @<Translate the \PASCAL\ part of the current module@>; 4255 @<Show cross references to this module@>; 4256 @<Output the code for the end of a module@>; 4257 end 4258 4259 @ Modules beginning with the \.{WEB} control sequence `\.{@@\ }' start in the 4260 output with the \TeX\ control sequence `\.{\\M}', followed by the module 4261 number. Similarly, `\.{@@*}' modules lead to the control sequence `\.{\\N}'. 4262 If this is a changed module, we put \.{*} just before the module number. 4263 4264 @<Output the code for the beginning...@>= 4265 out("\"); 4266 if buffer[loc-1]<>"*" then out("M") 4267 @.\\M@> 4268 else begin out("N"); print('*',module_count:1); 4269 @.\\N@> 4270 update_terminal; {print a progress report} 4271 end; 4272 out_mod(module_count); out2(".")(" ") 4273 4274 @ In the \TeX\ part of a module, we simply copy the source text, except that 4275 index entries are not copied and \PASCAL\ text within \pb\ is translated. 4276 4277 @<Translate the \T...@>= 4278 repeat next_control:=copy_TeX; 4279 case next_control of 4280 "|": begin init_stack; output_Pascal; 4281 end; 4282 "@@": out("@@"); 4283 octal: @<Translate an octal constant appearing in \TeX\ text@>; 4284 hex: @<Translate a hexadecimal constant appearing in \TeX\ text@>; 4285 TeX_string,xref_roman,xref_wildcard,xref_typewriter,module_name: 4286 begin loc:=loc-2; next_control:=get_next; {skip to \.{@@>}} 4287 if next_control=TeX_string then 4288 err_print('! TeX string should be in Pascal text only'); 4289 @.TeX string should be...@> 4290 end; 4291 begin_comment,end_comment,check_sum,thin_space,math_break,line_break, 4292 big_line_break,no_line_break,join,pseudo_semi: 4293 err_print('! You can''t do that in TeX text'); 4294 @.You can't do that...@> 4295 othercases do_nothing 4296 endcases; 4297 until next_control>=format 4298 4299 @ @<Translate an octal constant appearing in \TeX\ text@>= 4300 begin out3("\")("O")("{"); 4301 @.\\O@> 4302 while (buffer[loc]>="0")and(buffer[loc]<="7") do 4303 begin out(buffer[loc]); incr(loc); 4304 end; {since |buffer[limit]=" "|, this loop will end} 4305 out("}"); 4306 end 4307 4308 @ @<Translate a hexadecimal constant appearing in \TeX\ text@>= 4309 begin out3("\")("H")("{"); 4310 @.\\H@> 4311 while ((buffer[loc]>="0")and(buffer[loc]<="9"))or@| 4312 ((buffer[loc]>="A")and(buffer[loc]<="F")) do 4313 begin out(buffer[loc]); incr(loc); 4314 end; 4315 out("}"); 4316 end 4317 @ When we get to the following code we have |next_control>=format|, and 4318 the token memory is in its initial empty state. 4319 4320 @<Translate the \(d...@>= 4321 if next_control<=definition then {definition part non-empty} 4322 begin emit_space_if_needed; save_position; 4323 end; 4324 while next_control<=definition do {|format| or |definition|} 4325 begin init_stack; 4326 if next_control=definition then @<Start a macro definition@> 4327 else @<Start a format definition@>; 4328 outer_parse; finish_Pascal; 4329 end 4330 4331 @ The |finish_Pascal| procedure outputs the translation of the current 4332 scraps, preceded by the control sequence `\.{\\P}' and followed by the 4333 control sequence `\.{\\par}'. It also restores the token and scrap 4334 memories to their initial empty state. 4335 4336 A |force| token is appended to the current scraps before translation 4337 takes place, so that the translation will normally end with \.{\\6} or 4338 \.{\\7} (the \TeX\ macros for |force| and |big_force|). This \.{\\6} or 4339 \.{\\7} is replaced by the concluding \.{\\par} or by \.{\\Y\\par}. 4340 4341 @p procedure finish_Pascal; {finishes a definition or a \PASCAL\ part} 4342 var p:text_pointer; {translation of the scraps} 4343 begin out2("\")("P"); app_tok(force); app_comment; p:=translate; 4344 @.\\P@> 4345 app(p+tok_flag); make_output; {output the list} 4346 if out_ptr>1 then 4347 if out_buf[out_ptr-1]="\" then 4348 @.\\6@> 4349 @.\\7@> 4350 @.\\Y@> 4351 if out_buf[out_ptr]="6" then out_ptr:=out_ptr-2 4352 else if out_buf[out_ptr]="7" then out_buf[out_ptr]:="Y"; 4353 out4("\")("p")("a")("r"); finish_line; 4354 stat if text_ptr>max_txt_ptr then max_txt_ptr:=text_ptr; 4355 if tok_ptr>max_tok_ptr then max_tok_ptr:=tok_ptr; 4356 if scrap_ptr>max_scr_ptr then max_scr_ptr:=scrap_ptr; 4357 tats@;@/ 4358 tok_ptr:=1; text_ptr:=1; scrap_ptr:=0; {forget the tokens and the scraps} 4359 end; 4360 4361 @ @<Start a macro...@>= 4362 begin sc2("\")("D")(intro); {this will produce `\&{define }'} 4363 @.\\D@> 4364 next_control:=get_next; 4365 if next_control<>identifier then err_print('! Improper macro definition') 4366 @.Improper macro definition@> 4367 else sc1(id_flag+id_lookup(normal))(math); 4368 next_control:=get_next; 4369 end 4370 4371 @ @<Start a format...@>= 4372 begin sc2("\")("F")(intro); {this will produce `\&{format }'} 4373 @.\\F@> 4374 next_control:=get_next; 4375 if next_control=identifier then 4376 begin sc1(id_flag+id_lookup(normal))(math); 4377 next_control:=get_next; 4378 if next_control=equivalence_sign then 4379 begin sc2("\")("S")(math); {output an equivalence sign} 4380 @.\\S@> 4381 next_control:=get_next; 4382 if next_control=identifier then 4383 begin sc1(id_flag+id_lookup(normal))(math); 4384 sc0(semi); {insert an invisible semicolon} 4385 next_control:=get_next; 4386 end; 4387 end; 4388 end; 4389 if scrap_ptr<>5 then err_print('! Improper format definition'); 4390 @.Improper format definition@> 4391 end 4392 4393 @ Finally, when the \TeX\ and definition parts have been treated, we have 4394 |next_control>=begin_Pascal|. We will make the global variable |this_module| 4395 point to the current module name, if it has a name. 4396 4397 @<Glob...@>=@!this_module:name_pointer; {the current module name, or zero} 4398 4399 @ @<Translate the \P...@>= 4400 this_module:=0; 4401 if next_control<=module_name then 4402 begin emit_space_if_needed; init_stack; 4403 if next_control=begin_Pascal then next_control:=get_next 4404 else begin this_module:=cur_module; 4405 @<Check that |=| or |==| follows this module name, and 4406 emit the scraps to start the module definition@>; 4407 end; 4408 while next_control<=module_name do 4409 begin outer_parse; 4410 @<Emit the scrap for a module name if present@>; 4411 end; 4412 finish_Pascal; 4413 end 4414 4415 @ @<Check that |=|...@>= 4416 repeat next_control:=get_next; 4417 until next_control<>"+"; {allow optional `\.{+=}'} 4418 if (next_control<>"=")and(next_control<>equivalence_sign) then 4419 err_print('! You need an = sign after the section name') 4420 @.You need an = sign...@> 4421 else next_control:=get_next; 4422 if out_ptr>1 then 4423 if (out_buf[out_ptr]="Y")and(out_buf[out_ptr-1]="\") then 4424 @.\\Y@> 4425 begin app(backup); {the module name will be flush left} 4426 end; 4427 sc1(mod_flag+this_module)(mod_scrap); 4428 cur_xref:=xref[this_module]; 4429 if num(cur_xref)<>module_count+def_flag then 4430 begin sc3(math_rel)("+")("}")(math); 4431 {module name is multiply defined} 4432 this_module:=0; {so we won't give cross-reference info here} 4433 end; 4434 sc2("\")("S")(math); {output an equivalence sign} 4435 @.\\S@> 4436 sc1(force)(semi); {this forces a line break unless `\.{@@+}' follows} 4437 4438 @ @<Emit the scrap...@>= 4439 if next_control<module_name then 4440 begin err_print('! You can''t do that in Pascal text'); 4441 @.You can't do that...@> 4442 next_control:=get_next; 4443 end 4444 else if next_control=module_name then 4445 begin sc1(mod_flag+cur_module)(mod_scrap); next_control:=get_next; 4446 end 4447 4448 @ Cross references relating to a named module are given after the module ends. 4449 4450 @<Show cross...@>= 4451 if this_module>0 then 4452 begin @<Rearrange the list pointed to by |cur_xref|@>; 4453 footnote(def_flag); footnote(0); 4454 end 4455 4456 @ To rearrange the order of the linked list of cross references, we need 4457 four more variables that point to cross reference entries. We'll end up 4458 with a list pointed to by |cur_xref|. 4459 4460 @<Glob...@>= 4461 @!next_xref,@!this_xref,@!first_xref,@!mid_xref:xref_number; 4462 {pointer variables for rearranging a list} 4463 4464 @ We want to rearrange the cross reference list so that all the entries with 4465 |def_flag| come first, in ascending order; then come all the other 4466 entries, in ascending order. There may be no entries in either one or both 4467 of these categories. 4468 4469 @<Rearrange the list...@>= 4470 first_xref:=xref[this_module]; 4471 this_xref:=xlink(first_xref); {bypass current module number} 4472 if num(this_xref)>def_flag then 4473 begin mid_xref:=this_xref; cur_xref:=0; {this value doesn't matter} 4474 repeat next_xref:=xlink(this_xref); xlink(this_xref):=cur_xref; 4475 cur_xref:=this_xref; this_xref:=next_xref; 4476 until num(this_xref)<=def_flag; 4477 xlink(first_xref):=cur_xref; 4478 end 4479 else mid_xref:=0; {first list null} 4480 cur_xref:=0; 4481 while this_xref<>0 do 4482 begin next_xref:=xlink(this_xref); xlink(this_xref):=cur_xref; 4483 cur_xref:=this_xref; this_xref:=next_xref; 4484 end; 4485 if mid_xref>0 then xlink(mid_xref):=cur_xref 4486 else xlink(first_xref):=cur_xref; 4487 cur_xref:=xlink(first_xref) 4488 4489 @ The |footnote| procedure gives cross reference information about 4490 multiply defined module names (if the |flag| parameter is |def_flag|), or about 4491 the uses of a module name (if the |flag| parameter is zero). It assumes that 4492 |cur_xref| points to the first cross-reference entry of interest, and it 4493 leaves |cur_xref| pointing to the first element not printed. Typical outputs: 4494 `\.{\\A101.}'; `\.{\\Us370\\ET1009.}'; `\.{\\As8, 27\\*, 51\\ETs64.}'. 4495 4496 @p procedure footnote(@!flag:sixteen_bits); {outputs module cross-references} 4497 label done,exit; 4498 var q:xref_number; {cross-reference pointer variable} 4499 begin if num(cur_xref)<=flag then return; 4500 finish_line; out("\"); 4501 @.\\A@> 4502 @.\\U@> 4503 if flag=0 then out("U")@+else out("A"); 4504 @<Output all the module numbers on the reference list |cur_xref|@>; 4505 out("."); 4506 exit:end; 4507 4508 @ The following code distinguishes three cases, according as the number 4509 of cross references is one, two, or more than two. Variable |q| points 4510 to the first cross reference, and the last link is a zero. 4511 4512 @<Output all the module numbers...@>= 4513 q:=cur_xref; if num(xlink(q))>flag then out("s"); {plural} 4514 @.\\As@> 4515 @.\\Us@> 4516 loop@+ begin out_mod(num(cur_xref)-flag); 4517 cur_xref:=xlink(cur_xref); {point to the next cross reference to output} 4518 if num(cur_xref)<=flag then goto done; 4519 if num(xlink(cur_xref))>flag then out2(",")(" ") {not the last} 4520 else begin out3("\")("E")("T"); {the last} 4521 @.\\ET@> 4522 if cur_xref<>xlink(q) then out("s"); {the last of more than two} 4523 @.\\ETs@> 4524 end; 4525 end; 4526 done: 4527 4528 @ @<Output the code for the end of a module@>= 4529 out3("\")("f")("i"); finish_line; 4530 flush_buffer(0,false,false); {insert a blank line, it looks nice} 4531 @.\\fi@> 4532 4533 @* Phase three processing. 4534 We are nearly finished! \.{WEAVE}'s only remaining task is to write out the 4535 index, after sorting the identifiers and index entries. 4536 4537 @<Phase III: Output the cross-reference index@>= 4538 phase_three:=true; print_nl('Writing the index...'); 4539 if change_exists then 4540 begin finish_line; @<Tell about changed modules@>; 4541 end; 4542 finish_line; out4("\")("i")("n")("x"); finish_line; 4543 @.\\inx@> 4544 @<Do the first pass of sorting@>; 4545 @<Sort and output the index@>; 4546 out4("\")("f")("i")("n"); finish_line; 4547 @.\\fin@> 4548 @<Output all the module names@>; 4549 out4("\")("c")("o")("n"); finish_line; 4550 @.\\con@> 4551 print('Done.'); 4552 4553 @ Just before the index comes a list of all the changed modules, including 4554 the index module itself. 4555 4556 @<Glob...@>= 4557 @!k_module:0..max_modules; {runs through the modules} 4558 4559 @ @<Tell about changed modules@>= 4560 begin {remember that the index is already marked as changed} 4561 k_module:=1; 4562 out4("\")("c")("h")(" "); 4563 while k_module<module_count do 4564 begin if changed_module[k_module] then 4565 begin out_mod(k_module); out2(",")(" "); 4566 end; 4567 incr(k_module); 4568 end; 4569 out_mod(k_module); 4570 out("."); 4571 end 4572 4573 @ A left-to-right radix sorting method is used, since this makes it easy to 4574 adjust the collating sequence and since the running time will be at worst 4575 proportional to the total length of all entries in the index. We put the 4576 identifiers into 230 different lists based on their first characters. 4577 (Uppercase letters are put into the same list as the corresponding lowercase 4578 letters, since we want to have `$t<\\{TeX}<\&{to}$'.) The 4579 list for character |c| begins at location |bucket[c]| and continues through 4580 the |blink| array. 4581 4582 @<Glob...@>= 4583 @!bucket:array[ASCII_code] of name_pointer; 4584 @!next_name: name_pointer; {successor of |cur_name| when sorting} 4585 @!c:ASCII_code; {index into |bucket|} 4586 @!h:0..hash_size; {index into |hash|} 4587 @!blink:array[0..max_names] of sixteen_bits; {links in the buckets} 4588 4589 @ To begin the sorting, we go through all the hash lists and put each entry 4590 having a nonempty cross-reference list into the proper bucket. 4591 4592 @<Do the first pass...@>= 4593 for c:=0 to 255 do bucket[c]:=0; 4594 for h:=0 to hash_size-1 do 4595 begin next_name:=hash[h]; 4596 while next_name<>0 do 4597 begin cur_name:=next_name; next_name:=link[cur_name]; 4598 if xref[cur_name]<>0 then 4599 begin c:=byte_mem[cur_name mod ww,byte_start[cur_name]]; 4600 if (c<="Z")and(c>="A") then c:=c+@'40; 4601 blink[cur_name]:=bucket[c]; bucket[c]:=cur_name; 4602 end; 4603 end; 4604 end 4605 4606 @ During the sorting phase we shall use the |cat| and |trans| arrays from 4607 \.{WEAVE}'s parsing algorithm and rename them |depth| and |head|. They now 4608 represent a stack of identifier lists for all the index entries that have 4609 not yet been output. The variable |sort_ptr| tells how many such lists are 4610 present; the lists are output in reverse order (first |sort_ptr|, then 4611 |sort_ptr-1|, etc.). The |j|th list starts at |head[j]|, and if the first 4612 |k| characters of all entries on this list are known to be equal we have 4613 |depth[j]=k|. 4614 4615 @d depth==cat {reclaims memory that is no longer needed for parsing} 4616 @d head==trans {ditto} 4617 @d sort_ptr==scrap_ptr {ditto} 4618 @d max_sorts==max_scraps {ditto} 4619 4620 @<Globals...@>= 4621 @!cur_depth:eight_bits; {depth of current buckets} 4622 @!cur_byte:0..max_bytes; {index into |byte_mem|} 4623 @!cur_bank:0..ww-1; {row of |byte_mem|} 4624 @!cur_val:sixteen_bits; {current cross reference number} 4625 stat@!max_sort_ptr:0..max_sorts;@+tats {largest value of |sort_ptr|} 4626 4627 @ @<Set init...@>=stat max_sort_ptr:=0;@+tats 4628 4629 @ The desired alphabetic order is specified by the |collate| array; namely, 4630 |collate[0]<collate[1]<@t$\cdots$@><collate[229]|. 4631 4632 @<Glob...@>=@!collate:array[0..229] of ASCII_code; {collation order} 4633 4634 @ @<Local variables for init...@>= 4635 @!c:ASCII_code; {used to initialize |collate|} 4636 4637 @ We use the order $\hbox{null}<\.\ <\hbox{other characters}<\.\_< 4638 \.A=\.a<\cdots<\.Z=\.z<\.0<\cdots<\.9.$ 4639 4640 @<Set init...@>= 4641 collate[0]:=0; collate[1]:=" "; 4642 for c:=1 to " "-1 do collate[c+1]:=c; 4643 for c:=" "+1 to "0"-1 do collate[c]:=c; 4644 for c:="9"+1 to "A"-1 do collate[c-10]:=c; 4645 for c:="Z"+1 to "_"-1 do collate[c-36]:=c; 4646 collate["_"-36]:="_"+1; 4647 for c:="z"+1 to 255 do collate[c-63]:=c; 4648 collate[193]:="_"; 4649 for c:="a" to "z" do collate[c-"a"+194]:=c; 4650 for c:="0" to "9" do collate[c-"0"+220]:=c; 4651 4652 @ Procedure |unbucket| goes through the buckets and adds nonempty lists 4653 to the stack, using the collating sequence specified in the |collate| array. 4654 The parameter to |unbucket| tells the current depth in the buckets. 4655 Any two sequences that agree in their first 255 character positions are 4656 regarded as identical. 4657 4658 @d infinity=255 {$\infty$ (approximately)} 4659 4660 @p procedure unbucket(@!d:eight_bits); {empties buckets having depth |d|} 4661 var c:ASCII_code; {index into |bucket|} 4662 begin for c:=229 downto 0 do if bucket[collate[c]]>0 then 4663 begin if sort_ptr>max_sorts then overflow('sorting'); 4664 incr(sort_ptr); 4665 stat if sort_ptr>max_sort_ptr then max_sort_ptr:=sort_ptr;@;@+tats@;@/ 4666 if c=0 then depth[sort_ptr]:=infinity else depth[sort_ptr]:=d; 4667 head[sort_ptr]:=bucket[collate[c]]; bucket[collate[c]]:=0; 4668 end; 4669 end; 4670 4671 @ @<Sort and output...@>= 4672 sort_ptr:=0; unbucket(1); 4673 while sort_ptr>0 do 4674 begin cur_depth:=cat[sort_ptr]; 4675 if (blink[head[sort_ptr]]=0)or(cur_depth=infinity) then 4676 @<Output index entries for the list at |sort_ptr|@> 4677 else @<Split the list at |sort_ptr| into further lists@>; 4678 end 4679 4680 @ @<Split the list...@>= 4681 begin next_name:=head[sort_ptr]; 4682 repeat cur_name:=next_name; next_name:=blink[cur_name]; 4683 cur_byte:=byte_start[cur_name]+cur_depth; cur_bank:=cur_name mod ww; 4684 if cur_byte=byte_start[cur_name+ww] then c:=0 {we hit the end of the name} 4685 else begin c:=byte_mem[cur_bank,cur_byte]; 4686 if (c<="Z")and(c>="A") then c:=c+@'40; 4687 end; 4688 blink[cur_name]:=bucket[c]; bucket[c]:=cur_name; 4689 until next_name=0; 4690 decr(sort_ptr); unbucket(cur_depth+1); 4691 end 4692 4693 @ @<Output index...@>= 4694 begin cur_name:=head[sort_ptr]; 4695 @!debug if trouble_shooting then debug_help;@;@+gubed@/ 4696 repeat out2("\")(":"); 4697 @.\\:@> 4698 @<Output the name at |cur_name|@>; 4699 @<Output the cross-references at |cur_name|@>; 4700 cur_name:=blink[cur_name]; 4701 until cur_name=0; 4702 decr(sort_ptr); 4703 end 4704 4705 @ @<Output the name...@>= 4706 case ilk[cur_name] of 4707 normal: if length(cur_name)=1 then out2("\")("|")@+else out2("\")("\"); 4708 @.\\|@> 4709 @.\\\\@> 4710 roman: do_nothing; 4711 wildcard: out2("\")("9"); 4712 @.\\9@> 4713 typewriter: out2("\")("."); 4714 @.\\.@> 4715 othercases out2("\")("&") 4716 @.\\\&@> 4717 endcases;@/ 4718 out_name(cur_name) 4719 4720 @ Section numbers that are to be underlined are enclosed in 4721 `\.{\\[}$\,\ldots\,$\.]'. 4722 4723 @<Output the cross-references...@>= 4724 @<Invert the cross-reference list at |cur_name|, making |cur_xref| the head@>; 4725 repeat out2(",")(" "); cur_val:=num(cur_xref); 4726 if cur_val<def_flag then out_mod(cur_val) 4727 else begin out2("\")("["); out_mod(cur_val-def_flag); out("]"); 4728 @.\\[@> 4729 end; 4730 cur_xref:=xlink(cur_xref); 4731 until cur_xref=0; 4732 out("."); finish_line 4733 4734 @ List inversion is best thought of as popping elements off one stack and 4735 pushing them onto another. In this case |cur_xref| will be the head of 4736 the stack that we push things onto. 4737 4738 @<Invert the cross-reference list at |cur_name|, making |cur_xref| the head@>= 4739 this_xref:=xref[cur_name]; cur_xref:=0; 4740 repeat next_xref:=xlink(this_xref); xlink(this_xref):=cur_xref; 4741 cur_xref:=this_xref; this_xref:=next_xref; 4742 until this_xref=0 4743 4744 @ The following recursive procedure walks through the tree of module names and 4745 prints them. 4746 @^recursion@> 4747 4748 @p procedure mod_print(p:name_pointer); {print all module names in subtree |p|} 4749 begin if p>0 then 4750 begin mod_print(llink[p]);@/ 4751 out2("\")(":");@/ 4752 @.\\:@> 4753 tok_ptr:=1; text_ptr:=1; scrap_ptr:=0; init_stack; 4754 app(p+mod_flag); make_output; 4755 footnote(0); {|cur_xref| was set by |make_output|} 4756 finish_line;@/ 4757 mod_print(rlink[p]); 4758 end; 4759 end; 4760 4761 @ @<Output all the module names@>=@+mod_print(root) 4762 4763 @* Debugging. 4764 The \PASCAL\ debugger with which \.{WEAVE} was developed allows breakpoints 4765 to be set, and variables can be read and changed, but procedures cannot be 4766 executed. Therefore a `|debug_help|' procedure has been inserted in the main 4767 loops of each phase of the program; when |ddt| and |dd| are set to appropriate 4768 values, symbolic printouts of various tables will appear. 4769 4770 The idea is to set a breakpoint inside the |debug_help| routine, at the 4771 place of `\ignorespaces|breakpoint:|\unskip' below. Then when 4772 |debug_help| is to be activated, set |trouble_shooting| equal to |true|. 4773 The |debug_help| routine will prompt you for values of |ddt| and |dd|, 4774 discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending 4775 with zero or a negative number. Then control either passes to the 4776 breakpoint, allowing you to look at and/or change variables (if you typed 4777 zero), or to exit the routine (if you typed a negative value). 4778 4779 Another global variable, |debug_cycle|, can be used to skip silently 4780 past calls on |debug_help|. If you set |debug_cycle>1|, the program stops 4781 only every |debug_cycle| times |debug_help| is called; however, 4782 any error stop will set |debug_cycle| to zero. 4783 4784 @<Globals...@>= 4785 @!debug@!trouble_shooting:boolean; {is |debug_help| wanted?} 4786 @!ddt:integer; {operation code for the |debug_help| routine} 4787 @!dd:integer; {operand in procedures performed by |debug_help|} 4788 @!debug_cycle:integer; {threshold for |debug_help| stopping} 4789 @!debug_skipped:integer; {we have skipped this many |debug_help| calls} 4790 @!term_in:text_file; {the user's terminal as an input file} 4791 gubed 4792 4793 @ The debugging routine needs to read from the user's terminal. 4794 @^system dependencies@> 4795 @<Set init...@>= 4796 @!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0; tracing:=0;@/ 4797 trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works} 4798 reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|} 4799 gubed 4800 4801 @ @d breakpoint=888 {place where a breakpoint is desirable} 4802 @^system dependencies@> 4803 4804 @p @!debug procedure debug_help; {routine to display various things} 4805 label breakpoint,exit; 4806 var k:integer; {index into various arrays} 4807 begin incr(debug_skipped); 4808 if debug_skipped<debug_cycle then return; 4809 debug_skipped:=0; 4810 loop@+ begin print_nl('#'); update_terminal; {prompt} 4811 read(term_in,ddt); {read a debug-command code} 4812 if ddt<0 then return 4813 else if ddt=0 then 4814 begin goto breakpoint;@\ {go to every label at least once} 4815 breakpoint: ddt:=0;@\ 4816 end 4817 else begin read(term_in,dd); 4818 case ddt of 4819 1: print_id(dd); 4820 2: print_text(dd); 4821 3: for k:=1 to dd do print(xchr[buffer[k]]); 4822 4: for k:=1 to dd do print(xchr[mod_text[k]]); 4823 5: for k:=1 to out_ptr do print(xchr[out_buf[k]]); 4824 6: for k:=1 to dd do 4825 begin print_cat(cat[k]); print(' '); 4826 end; 4827 othercases print('?') 4828 endcases; 4829 end; 4830 end; 4831 exit:end; 4832 gubed 4833 4834 @* The main program. 4835 Let's put it all together now: \.{WEAVE} starts and ends here. 4836 @^system dependencies@> 4837 4838 The main procedure has been split into three sub-procedures in order to 4839 keep certain \PASCAL\ compilers from overflowing their capacity. 4840 @^split procedures@> 4841 4842 @p procedure Phase_I; 4843 begin @<Phase I:...@>; 4844 end; 4845 @# 4846 procedure Phase_II; 4847 begin @<Phase II:...@>; 4848 end; 4849 @# 4850 begin initialize; {beginning of the main program} 4851 print_ln(banner); {print a ``banner line''} 4852 @<Store all the reserved words@>; 4853 Phase_I; Phase_II;@/ 4854 @<Phase III:...@>; 4855 @<Check that all changes have been read@>; 4856 end_of_WEAVE: 4857 stat @<Print statistics about memory usage@>;@+tats@;@/ 4858 @t\4\4@>{here files should be closed if the operating system requires it} 4859 @<Print the job |history|@>; 4860 end. 4861 4862 @ @<Print statistics about memory usage@>= 4863 print_nl('Memory usage statistics: ', 4864 name_ptr:1,' names, ', xref_ptr:1,' cross references, ', 4865 byte_ptr[0]:1); 4866 for cur_bank:=1 to ww-1 do print('+',byte_ptr[cur_bank]:1); 4867 print(' bytes;'); 4868 print_nl('parsing required ',max_scr_ptr:1,' scraps, ',max_txt_ptr:1, 4869 ' texts, ',max_tok_ptr:1,' tokens, ', max_stack_ptr:1,' levels;'); 4870 print_nl('sorting required ',max_sort_ptr:1, ' levels.') 4871 4872 @ Some implementations may wish to pass the |history| value to the 4873 operating system so that it can be used to govern whether or not other 4874 programs are started. Here we simply report the history to the user. 4875 @^system dependencies@> 4876 4877 @<Print the job |history|@>= 4878 case history of 4879 spotless: print_nl('(No errors were found.)'); 4880 harmless_message: print_nl('(Did you see the warning message above?)'); 4881 error_message: print_nl('(Pardon me, but I think I spotted something wrong.)'); 4882 fatal_message: print_nl('(That was a fatal error, my friend.)'); 4883 end {there are no other cases} 4884 4885 @* System-dependent changes. 4886 This module should be replaced, if necessary, by changes to the program 4887 that are necessary to make \.{WEAVE} work at a particular installation. 4888 It is usually best to design your change file so that all changes to 4889 previous modules preserve the module numbering; then everybody's version 4890 will be consistent with the printed program. More extensive changes, 4891 which introduce new modules, can be inserted here; then only the index 4892 itself will get a new module number. 4893 @^system dependencies@> 4894 4895 @* Index. 4896 If you have read and understood the code for Phase III above, you know what 4897 is in this index and how it got here. All modules in which an identifier is 4898 used are listed with that identifier, except that reserved words are 4899 indexed only when they appear in format definitions, and the appearances 4900 of identifiers in module names are not indexed. Underlined entries 4901 correspond to where the identifier was declared. Error messages, control 4902 sequences put into the output, and a few 4903 other things like ``recursion'' are indexed here too.