modernc.org/knuth@v0.0.4/web/testdata/ctan.org/tex-archive/systems/knuth/dist/tex/tex.web (about) 1 % This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved. 2 % Unlimited copying and redistribution of this file are permitted as long 3 % as this file is not modified. Modifications are permitted, but only if 4 % the resulting file is not named tex.web. (The WEB system provides 5 % for alterations via an auxiliary file; the master file should stay intact.) 6 % See Appendix H of the WEB manual for hints on how to install this program. 7 % And see Appendix A of the TRIP manual for details about how to validate it. 8 9 % TeX is a trademark of the American Mathematical Society. 10 % METAFONT is a trademark of Addison-Wesley Publishing Company. 11 12 % Version 0 was released in September 1982 after it passed a variety of tests. 13 % Version 1 was released in November 1983 after thorough testing. 14 % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). 15 % Version 1.2 allowed `0' in response to an error, et alia (October 1984). 16 % Version 1.3 made memory allocation more flexible and local (November 1984). 17 % Version 1.4 fixed accents right after line breaks, et alia (April 1985). 18 % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). 19 % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). 20 % Version 2.1 corrected anomalies in discretionary breaks (January 1987). 21 % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). 22 % Version 2.3 avoided incomplete page in premature termination (August 1987). 23 % Version 2.4 fixed \noaligned rules in indented displays (August 1987). 24 % Version 2.5 saved cur_order when expanding tokens (September 1987). 25 % Version 2.6 added 10sp slop when shipping leaders (November 1987). 26 % Version 2.7 improved rounding of negative-width characters (November 1987). 27 % Version 2.8 fixed weird bug if no \patterns are used (December 1987). 28 % Version 2.9 made \csname\endcsname's "relax" local (December 1987). 29 % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). 30 % Version 2.92 fixed \patterns, also file names with complex macros (May 1988). 31 % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). 32 % Version 2.94 kept open_log_file from calling fatal_error (November 1988). 33 % Version 2.95 solved that problem a better way (December 1988). 34 % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). 35 % Version 2.97 corrected blunder in creating 2.95 (February 1989). 36 % Version 2.98 omitted save_for_after at outer level (March 1989). 37 % Version 2.99 caught $$\begingroup\halign..$$ (June 1989). 38 % Version 2.991 caught .5\ifdim.6... (June 1989). 39 % Version 2.992 introduced major changes for 8-bit extensions (September 1989). 40 % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). 41 % Version 3.0 fixed unusual displays; was more \output robust (March 1990). 42 % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). 43 % Version 3.14 fixed unprintable font names and corrected typos (March 1991). 44 % Version 3.141 more of same; reconstituted ligatures better (March 1992). 45 % Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993). 46 % Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995). 47 % Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002). 48 % Version 3.1415926 was a general cleanup with minor fixes (February 2008). 49 % Version 3.14159265 was similar (January 2014). 50 % Version 3.141592653 was similar but more extensive (January 2021). 51 52 % A reward of $327.68 will be paid to the first finder of any remaining bug. 53 54 % Although considerable effort has been expended to make the TeX program 55 % correct and reliable, no warranty is implied; the author disclaims any 56 % obligation or liability for damages, including but not limited to 57 % special, indirect, or consequential damages arising out of or in 58 % connection with the use or performance of this software. This work has 59 % been a ``labor of love'' and the author hopes that users enjoy it. 60 61 % Here is TeX material that gets inserted after \input webmac 62 \def\hang{\hangindent 3em\noindent\ignorespaces} 63 \def\hangg#1 {\hang\hbox{#1 }} 64 \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} 65 \font\ninerm=cmr9 66 \let\mc=\ninerm % medium caps for names like SAIL 67 \def\PASCAL{Pascal} 68 \def\ph{\hbox{Pascal-H}} 69 \def\pct!{{\char`\%}} % percent sign in ordinary text 70 \font\logo=logo10 % font used for the METAFONT logo 71 \def\MF{{\logo META}\-{\logo FONT}} 72 \def\<#1>{$\langle#1\rangle$} 73 \def\section{\mathhexbox278} 74 75 \def\(#1){} % this is used to make section names sort themselves better 76 \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> 77 78 \outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section 79 \def\rhead{PART #2:\uppercase{#3}} % define running headline 80 \message{*\modno} % progress report 81 \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next 82 \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} 83 \let\?=\relax % we want to be able to \write a \? 84 85 \def\title{\TeX82} 86 \def\topofcontents{\hsize 5.5in 87 \vglue 0pt plus 1fil minus 1.5in 88 \def\?##1]{\hbox to 1in{\hfil##1.\ }} 89 } 90 \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} 91 \pageno=3 92 \def\glob{13} % this should be the section number of "<Global...>" 93 \def\gglob{20, 26} % this should be the next two sections of "<Global...>" 94 95 @* \[1] Introduction. 96 This is \TeX, a document compiler intended to produce typesetting of high 97 quality. 98 The \PASCAL\ program that follows is the definition of \TeX82, a standard 99 @:PASCAL}{\PASCAL@> 100 @!@:TeX82}{\TeX82@> 101 version of \TeX\ that is designed to be highly portable so that identical output 102 will be obtainable on a great variety of computers. 103 104 The main purpose of the following program is to explain the algorithms of \TeX\ 105 as clearly as possible. As a result, the program will not necessarily be very 106 efficient when a particular \PASCAL\ compiler has translated it into a 107 particular machine language. However, the program has been written so that it 108 can be tuned to run efficiently in a wide variety of operating environments 109 by making comparatively few changes. Such flexibility is possible because 110 the documentation that follows is written in the \.{WEB} language, which is 111 at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} 112 to \PASCAL\ is able to introduce most of the necessary refinements. 113 Semi-automatic translation to other languages is also feasible, because the 114 program below does not make extensive use of features that are peculiar to 115 \PASCAL. 116 117 A large piece of software like \TeX\ has inherent complexity that cannot 118 be reduced below a certain level of difficulty, although each individual 119 part is fairly simple by itself. The \.{WEB} language is intended to make 120 the algorithms as readable as possible, by reflecting the way the 121 individual program pieces fit together and by providing the 122 cross-references that connect different parts. Detailed comments about 123 what is going on, and about why things were done in certain ways, have 124 been liberally sprinkled throughout the program. These comments explain 125 features of the implementation, but they rarely attempt to explain the 126 \TeX\ language itself, since the reader is supposed to be familiar with 127 {\sl The \TeX book}. 128 @.WEB@> 129 @:TeXbook}{\sl The \TeX book@> 130 131 @ The present implementation has a long ancestry, beginning in the summer 132 of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded 133 a prototype 134 @^Plass, Michael Frederick@> 135 @^Liang, Franklin Mark@> 136 @^Knuth, Donald Ervin@> 137 based on some specifications that the author had made in May of that year. 138 This original proto\TeX\ included macro definitions and elementary 139 manipulations on boxes and glue, but it did not have line-breaking, 140 page-breaking, mathematical formulas, alignment routines, error recovery, 141 or the present semantic nest; furthermore, 142 it used character lists instead of token lists, so that a control sequence 143 like \.{\\halign} was represented by a list of seven characters. A 144 complete version of \TeX\ was designed and coded by the author in late 145 1977 and early 1978; that program, like its prototype, was written in the 146 {\mc SAIL} language, for which an excellent debugging system was 147 available. Preliminary plans to convert the {\mc SAIL} code into a form 148 somewhat like the present ``web'' were developed by Luis Trabb~Pardo and 149 @^Trabb Pardo, Luis Isidoro@> 150 the author at the beginning of 1979, and a complete implementation was 151 created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which 152 @^Zabala Salelles, Ignacio Andr\'es@> 153 was written by the author during the latter part of 1981 and the early 154 part of 1982, also incorporates ideas from the 1979 implementation of 155 @^Guibas, Leonidas Ioannis@> 156 @^Sedgewick, Robert@> 157 @^Wyatt, Douglas Kirk@> 158 \TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, 159 and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred 160 refinements were introduced into \TeX82 based on the experiences gained with 161 the original implementations, so that essentially every part of the system 162 has been substantially improved. After the appearance of ``Version 0'' in 163 September 1982, this program benefited greatly from the comments of 164 many other people, notably David~R. Fuchs and Howard~W. Trickey. 165 A final revision in September 1989 extended the input character set to 166 eight-bit codes and introduced the ability to hyphenate words from 167 different languages, based on some ideas of Michael~J. Ferguson. 168 @^Fuchs, David Raymond@> 169 @^Trickey, Howard Wellington@> 170 @^Ferguson, Michael John@> 171 172 No doubt there still is plenty of room for improvement, but the author 173 is firmly committed to keeping \TeX82 ``frozen'' from now on; stability 174 and reliability are to be its main virtues. 175 176 On the other hand, the \.{WEB} description can be extended without changing 177 the core of \TeX82 itself, and the program has been designed so that such 178 extensions are not extremely difficult to make. 179 The |banner| string defined here should be changed whenever \TeX\ 180 undergoes any modifications, so that it will be clear which version of 181 \TeX\ might be the guilty party when a problem arises. 182 @^extensions to \TeX@> 183 @^system dependencies@> 184 185 If this program is changed, the resulting system should not be called 186 `\TeX'; the official name `\TeX' by itself is reserved 187 for software systems that are fully compatible with each other. 188 A special test suite called the ``\.{TRIP} test'' is available for 189 helping to determine whether a particular implementation deserves to be 190 known as `\TeX' [cf.~Stanford Computer Science report CS1027, 191 November 1984]. 192 193 @d banner=='This is TeX, Version 3.141592653' {printed when \TeX\ starts} 194 195 @ Different \PASCAL s have slightly different conventions, and the present 196 @!@:PASCAL H}{\ph@> 197 program expresses \TeX\ in terms of the \PASCAL\ that was 198 available to the author in 1982. Constructions that apply to 199 this particular compiler, which we shall call \ph, should help the 200 reader see how to make an appropriate interface for other systems 201 if necessary. (\ph\ is Charles Hedrick's modification of a compiler 202 @^Hedrick, Charles Locke@> 203 for the DECsystem-10 that was originally developed at the University of 204 Hamburg; cf.\ {\sl Software---Practice and Experience \bf6} (1976), 205 29--42. The \TeX\ program below is intended to be adaptable, without 206 extensive changes, to most other versions of \PASCAL, so it does not fully 207 use the admirable features of \ph. Indeed, a conscious effort has been 208 made here to avoid using several idiosyncratic features of standard 209 \PASCAL\ itself, so that most of the code can be translated mechanically 210 into other high-level languages. For example, the `\&{with}' and `\\{new}' 211 features are not used, nor are pointer types, set types, or enumerated 212 scalar types; there are no `\&{var}' parameters, except in the case of files; 213 there are no tag fields on variant records; there are no assignments 214 |real:=integer|; no procedures are declared local to other procedures.) 215 216 The portions of this program that involve system-dependent code, where 217 changes might be necessary because of differences between \PASCAL\ compilers 218 and/or differences between 219 operating systems, can be identified by looking at the sections whose 220 numbers are listed under `system dependencies' in the index. Furthermore, 221 the index entries for `dirty \PASCAL' list all places where the restrictions 222 of \PASCAL\ have not been followed perfectly, for one reason or another. 223 @!@^system dependencies@> 224 @!@^dirty \PASCAL@> 225 226 Incidentally, \PASCAL's standard |round| function can be problematical, 227 because it disagrees with the IEEE floating-point standard. 228 Many implementors have 229 therefore chosen to substitute their own home-grown rounding procedure. 230 231 @ The program begins with a normal \PASCAL\ program heading, whose 232 components will be filled in later, using the conventions of \.{WEB}. 233 @.WEB@> 234 For example, the portion of the program called `\X\glob:Global 235 variables\X' below will be replaced by a sequence of variable declarations 236 that starts in $\section\glob$ of this documentation. In this way, we are able 237 to define each individual global variable when we are prepared to 238 understand what it means; we do not have to define all of the globals at 239 once. Cross references in $\section\glob$, where it says ``See also 240 sections \gglob, \dots,'' also make it possible to look at the set of 241 all global variables, if desired. Similar remarks apply to the other 242 portions of the program heading. 243 244 Actually the heading shown here is not quite normal: The |program| line 245 does not mention any |output| file, because \ph\ would ask the \TeX\ user 246 to specify a file name if |output| were specified here. 247 @:PASCAL H}{\ph@> 248 @^system dependencies@> 249 250 @d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} 251 @f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} 252 @f type==true {but `|type|' will not be treated as a reserved word} 253 254 @p @t\4@>@<Compiler directives@>@/ 255 program TEX; {all file names are defined dynamically} 256 label @<Labels in the outer block@>@/ 257 const @<Constants in the outer block@>@/ 258 mtype @<Types in the outer block@>@/ 259 var @<Global variables@>@/ 260 @# 261 procedure initialize; {this procedure gets things started properly} 262 var @<Local variables for initialization@>@/ 263 begin @<Initialize whatever \TeX\ might access@>@; 264 end;@# 265 @t\4@>@<Basic printing procedures@>@/ 266 @t\4@>@<Error handling procedures@>@/ 267 268 @ The overall \TeX\ program begins with the heading just shown, after which 269 comes a bunch of procedure declarations and function declarations. 270 Finally we will get to the main program, which begins with the 271 comment `|start_here|'. If you want to skip down to the 272 main program now, you can look up `|start_here|' in the index. 273 But the author suggests that the best way to understand this program 274 is to follow pretty much the order of \TeX's components as they appear in the 275 \.{WEB} description you are now reading, since the present ordering is 276 intended to combine the advantages of the ``bottom up'' and ``top down'' 277 approaches to the problem of understanding a somewhat complicated system. 278 279 @ Three labels must be declared in the main program, so we give them 280 symbolic names. 281 282 @d start_of_TEX=1 {go here when \TeX's variables are initialized} 283 @d end_of_TEX=9998 {go here to close files and terminate gracefully} 284 @d final_end=9999 {this label marks the ending of the program} 285 286 @<Labels in the out...@>= 287 start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end; 288 {key control points} 289 290 @ Some of the code below is intended to be used only when diagnosing the 291 strange behavior that sometimes occurs when \TeX\ is being installed or 292 when system wizards are fooling around with \TeX\ without quite knowing 293 what they are doing. Such code will not normally be compiled; it is 294 delimited by the codewords `$|debug|\ldots|gubed|$', with apologies 295 to people who wish to preserve the purity of English. 296 297 Similarly, there is some conditional code delimited by 298 `$|stat|\ldots|tats|$' that is intended for use when statistics are to be 299 kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also 300 implements diagnostic information for \.{\\tracingparagraphs}, 301 \.{\\tracingpages}, and \.{\\tracingrestores}. 302 @^debugging@> 303 304 @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 305 @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 306 @f debug==begin 307 @f gubed==end 308 @# 309 @d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering 310 usage statistics} 311 @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering 312 usage statistics} 313 @f stat==begin 314 @f tats==end 315 316 @ This program has two important variations: (1) There is a long and slow 317 version called \.{INITEX}, which does the extra calculations needed to 318 @.INITEX@> 319 initialize \TeX's internal tables; and (2)~there is a shorter and faster 320 production version, which cuts the initialization to a bare minimum. 321 Parts of the program that are needed in (1) but not in (2) are delimited by 322 the codewords `$|init|\ldots|tini|$'. 323 324 @d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} 325 @d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} 326 @f init==begin 327 @f tini==end 328 329 @<Initialize whatever...@>= 330 @<Set initial values of key variables@>@/ 331 @!init @<Initialize table entries (done by \.{INITEX} only)@>@;@+tini 332 333 @ If the first character of a \PASCAL\ comment is a dollar sign, 334 \ph\ treats the comment as a list of ``compiler directives'' that will 335 affect the translation of this program into machine language. The 336 directives shown below specify full checking and inclusion of the \PASCAL\ 337 debugger when \TeX\ is being debugged, but they cause range checking and other 338 redundant code to be eliminated when the production system is being generated. 339 Arithmetic overflow will be detected in all cases. 340 @:PASCAL H}{\ph@> 341 @^system dependencies@> 342 @^overflow in arithmetic@> 343 344 @<Compiler directives@>= 345 @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 346 @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 347 348 @ This \TeX\ implementation conforms to the rules of the {\sl Pascal User 349 @:PASCAL}{\PASCAL@> 350 @^system dependencies@> 351 Manual} published by Jensen and Wirth in 1975, except where system-dependent 352 @^Wirth, Niklaus@> 353 @^Jensen, Kathleen@> 354 code is necessary to make a useful system program, and except in another 355 respect where such conformity would unnecessarily obscure the meaning 356 and clutter up the code: We assume that |case| statements may include a 357 default case that applies if no matching label is found. Thus, we shall use 358 constructions like 359 $$\vbox{\halign{\ignorespaces#\hfil\cr 360 |case x of|\cr 361 1: $\langle\,$code for $x=1\,\rangle$;\cr 362 3: $\langle\,$code for $x=3\,\rangle$;\cr 363 |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 364 |endcases|\cr}}$$ 365 since most \PASCAL\ compilers have plugged this hole in the language by 366 incorporating some sort of default mechanism. For example, the \ph\ 367 compiler allows `|others|:' as a default label, and other \PASCAL s allow 368 syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The 369 definitions of |othercases| and |endcases| should be changed to agree with 370 local conventions. Note that no semicolon appears before |endcases| in 371 this program, so the definition of |endcases| should include a semicolon 372 if the compiler wants one. (Of course, if no default mechanism is 373 available, the |case| statements of \TeX\ will have to be laboriously 374 extended by listing all remaining cases. People who are stuck with such 375 \PASCAL s have, in fact, done this, successfully but not happily!) 376 @:PASCAL H}{\ph@> 377 378 @d othercases == others: {default for cases not listed explicitly} 379 @d endcases == @+end {follows the default case in an extended |case| statement} 380 @f othercases == else 381 @f endcases == end 382 383 @ The following parameters can be changed at compile time to extend or 384 reduce \TeX's capacity. They may have different values in \.{INITEX} and 385 in production versions of \TeX. 386 @.INITEX@> 387 @^system dependencies@> 388 389 @<Constants...@>= 390 @!mem_max=30000; {greatest index in \TeX's internal |mem| array; 391 must be strictly less than |max_halfword|; 392 must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|} 393 @!mem_min=0; {smallest index in \TeX's internal |mem| array; 394 must be |min_halfword| or more; 395 must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|} 396 @!buf_size=500; {maximum number of characters simultaneously present in 397 current lines of open files and in control sequences between 398 \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|} 399 @!error_line=72; {width of context lines on terminal error messages} 400 @!half_error_line=42; {width of first lines of contexts in terminal 401 error messages; should be between 30 and |error_line-15|} 402 @!max_print_line=79; {width of longest text lines output; should be at least 60} 403 @!stack_size=200; {maximum number of simultaneous input sources} 404 @!max_in_open=6; {maximum number of input files and error insertions that 405 can be going on simultaneously} 406 @!font_max=75; {maximum internal font number; must not exceed |max_quarterword| 407 and must be at most |font_base+256|} 408 @!font_mem_size=20000; {number of words of |font_info| for all fonts} 409 @!param_size=60; {maximum number of simultaneous macro parameters} 410 @!nest_size=40; {maximum number of semantic levels simultaneously active} 411 @!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|} 412 @!string_vacancies=8000; {the minimum number of characters that should be 413 available for the user's control sequences and font names, 414 after \TeX's own error messages are stored} 415 @!pool_size=32000; {maximum number of characters in strings, including all 416 error messages and help texts, and the names of all fonts and 417 control sequences; must exceed |string_vacancies| by the total 418 length of \TeX's own strings, which is currently about 23000} 419 @!save_size=600; {space for saving values outside of current group; must be 420 at most |max_halfword|} 421 @!trie_size=8000; {space for hyphenation patterns; should be larger for 422 \.{INITEX} than it is in production versions of \TeX} 423 @!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns} 424 @!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8} 425 @!file_name_size=40; {file names shouldn't be longer than this} 426 @!pool_name='TeXformats:TEX.POOL '; 427 {string of length |file_name_size|; tells where the string pool appears} 428 @.TeXformats@> 429 430 @ Like the preceding parameters, the following quantities can be changed 431 at compile time to extend or reduce \TeX's capacity. But if they are changed, 432 it is necessary to rerun the initialization program \.{INITEX} 433 @.INITEX@> 434 to generate new tables for the production \TeX\ program. 435 One can't simply make helter-skelter changes to the following constants, 436 since certain rather complex initialization 437 numbers are computed from them. They are defined here using 438 \.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to 439 emphasize this distinction. 440 441 @d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX}; 442 must not be less than |mem_min|} 443 @d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX}; 444 must be substantially larger than |mem_bot| 445 and not greater than |mem_max|} 446 @d font_base=0 {smallest internal font number; must not be less 447 than |min_quarterword|} 448 @d hash_size=2100 {maximum number of control sequences; it should be at most 449 about |(mem_max-mem_min)/10|} 450 @d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} 451 @d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions} 452 @^system dependencies@> 453 454 @ In case somebody has inadvertently made bad settings of the ``constants,'' 455 \TeX\ checks them using a global variable called |bad|. 456 457 This is the first of many sections of \TeX\ where global variables are 458 defined. 459 460 @<Glob...@>= 461 @!bad:integer; {is some ``constant'' wrong?} 462 463 @ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|', 464 or something similar. (We can't do that until |max_halfword| has been defined.) 465 466 @<Check the ``constant'' values for consistency@>= 467 bad:=0; 468 if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; 469 if max_print_line<60 then bad:=2; 470 if dvi_buf_size mod 8<>0 then bad:=3; 471 if mem_bot+1100>mem_top then bad:=4; 472 if hash_prime>hash_size then bad:=5; 473 if max_in_open>=128 then bad:=6; 474 if mem_top<256+11 then bad:=7; {we will want |null_list>255|} 475 476 @ Labels are given symbolic names by the following definitions, so that 477 occasional |goto| statements will be meaningful. We insert the label 478 `|exit|' just before the `\ignorespaces|end|\unskip' of a procedure in 479 which we have used the `|return|' statement defined below; the label 480 `|restart|' is occasionally used at the very beginning of a procedure; and 481 the label `|reswitch|' is occasionally used just prior to a |case| 482 statement in which some cases change the conditions and we wish to branch 483 to the newly applicable case. Loops that are set up with the |loop| 484 construction defined below are commonly exited by going to `|done|' or to 485 `|found|' or to `|not_found|', and they are sometimes repeated by going to 486 `|continue|'. If two or more parts of a subroutine start differently but 487 end up the same, the shared code may be gathered together at 488 `|common_ending|'. 489 490 Incidentally, this program never declares a label that isn't actually used, 491 because some fussy \PASCAL\ compilers will complain about redundant labels. 492 493 @d exit=10 {go here to leave a procedure} 494 @d restart=20 {go here to start a procedure again} 495 @d reswitch=21 {go here to start a case statement again} 496 @d continue=22 {go here to resume a loop} 497 @d done=30 {go here to exit a loop} 498 @d done1=31 {like |done|, when there is more than one loop} 499 @d done2=32 {for exiting the second loop in a long block} 500 @d done3=33 {for exiting the third loop in a very long block} 501 @d done4=34 {for exiting the fourth loop in an extremely long block} 502 @d done5=35 {for exiting the fifth loop in an immense block} 503 @d done6=36 {for exiting the sixth loop in a block} 504 @d found=40 {go here when you've found it} 505 @d found1=41 {like |found|, when there's more than one per routine} 506 @d found2=42 {like |found|, when there's more than two per routine} 507 @d not_found=45 {go here when you've found nothing} 508 @d common_ending=50 {go here when you want to merge with another branch} 509 510 @ Here are some macros for common programming idioms. 511 512 @d incr(#) == #:=#+1 {increase a variable by unity} 513 @d decr(#) == #:=#-1 {decrease a variable by unity} 514 @d negate(#) == #:=-# {change the sign of a variable} 515 @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 516 @f loop == xclause 517 {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} 518 @d do_nothing == {empty statement} 519 @d return == goto exit {terminate a procedure call} 520 @f return == nil 521 @d empty=0 {symbolic name for a null constant} 522 523 @* \[2] The character set. 524 In order to make \TeX\ readily portable to a wide variety of 525 computers, all of its input text is converted to an internal eight-bit 526 code that includes standard ASCII, the ``American Standard Code for 527 Information Interchange.'' This conversion is done immediately when each 528 character is read in. Conversely, characters are converted from ASCII to 529 the user's external representation just before they are output to a 530 text file. 531 532 Such an internal code is relevant to users of \TeX\ primarily because it 533 governs the positions of characters in the fonts. For example, the 534 character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets 535 this letter it specifies character number 65 in the current font. 536 If that font actually has `\.A' in a different position, \TeX\ doesn't 537 know what the real position is; the program that does the actual printing from 538 \TeX's device-independent files is responsible for converting from ASCII to 539 a particular font encoding. 540 @^ASCII code@> 541 542 \TeX's internal code also defines the value of constants 543 that begin with a reverse apostrophe; and it provides an index to the 544 \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} 545 tables. 546 547 @ Characters of text that have been converted to \TeX's internal form 548 are said to be of type |ASCII_code|, which is a subrange of the integers. 549 550 @<Types...@>= 551 @!ASCII_code=0..255; {eight-bit numbers} 552 553 @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 554 character sets were common, so it did not make provision for lowercase 555 letters. Nowadays, of course, we need to deal with both capital and small 556 letters in a convenient way, especially in a program for typesetting; 557 so the present specification of \TeX\ has been written under the assumption 558 that the \PASCAL\ compiler and run-time system permit the use of text files 559 with more than 64 distinguishable characters. More precisely, we assume that 560 the character set contains at least the letters and symbols associated 561 with ASCII codes @'40 through @'176; all of these characters are now 562 available on most computer terminals. 563 564 Since we are dealing with more characters than were present in the first 565 \PASCAL\ compilers, we have to decide what to call the associated data 566 type. Some \PASCAL s use the original name |char| for the 567 characters in text files, even though there now are more than 64 such 568 characters, while other \PASCAL s consider |char| to be a 64-element 569 subrange of a larger data type that has some other name. 570 571 In order to accommodate this difference, we shall use the name |text_char| 572 to stand for the data type of the characters that are converted to and 573 from |ASCII_code| when they are input and output. We shall also assume 574 that |text_char| consists of the elements |chr(first_text_char)| through 575 |chr(last_text_char)|, inclusive. The following definitions should be 576 adjusted if necessary. 577 @^system dependencies@> 578 579 @d text_char == char {the data type of characters in text files} 580 @d first_text_char=0 {ordinal number of the smallest element of |text_char|} 581 @d last_text_char=255 {ordinal number of the largest element of |text_char|} 582 583 @<Local variables for init...@>= 584 @!i:integer; 585 586 @ The \TeX\ processor converts between ASCII code and 587 the user's external character set by means of arrays |xord| and |xchr| 588 that are analogous to \PASCAL's |ord| and |chr| functions. 589 590 @<Glob...@>= 591 @!xord: array [text_char] of ASCII_code; 592 {specifies conversion of input characters} 593 @!xchr: array [ASCII_code] of text_char; 594 {specifies conversion of output characters} 595 596 @ Since we are assuming that our \PASCAL\ system is able to read and 597 write the visible characters of standard ASCII (although not 598 necessarily using the ASCII codes to represent them), the following 599 assignment statements initialize the standard part of the |xchr| array 600 properly, without needing any system-dependent changes. On the other 601 hand, it is possible to implement \TeX\ with less complete character 602 sets, and in such cases it will be necessary to change something here. 603 @^system dependencies@> 604 605 @<Set init...@>= 606 xchr[@'40]:=' '; 607 xchr[@'41]:='!'; 608 xchr[@'42]:='"'; 609 xchr[@'43]:='#'; 610 xchr[@'44]:='$'; 611 xchr[@'45]:='%'; 612 xchr[@'46]:='&'; 613 xchr[@'47]:='''';@/ 614 xchr[@'50]:='('; 615 xchr[@'51]:=')'; 616 xchr[@'52]:='*'; 617 xchr[@'53]:='+'; 618 xchr[@'54]:=','; 619 xchr[@'55]:='-'; 620 xchr[@'56]:='.'; 621 xchr[@'57]:='/';@/ 622 xchr[@'60]:='0'; 623 xchr[@'61]:='1'; 624 xchr[@'62]:='2'; 625 xchr[@'63]:='3'; 626 xchr[@'64]:='4'; 627 xchr[@'65]:='5'; 628 xchr[@'66]:='6'; 629 xchr[@'67]:='7';@/ 630 xchr[@'70]:='8'; 631 xchr[@'71]:='9'; 632 xchr[@'72]:=':'; 633 xchr[@'73]:=';'; 634 xchr[@'74]:='<'; 635 xchr[@'75]:='='; 636 xchr[@'76]:='>'; 637 xchr[@'77]:='?';@/ 638 xchr[@'100]:='@@'; 639 xchr[@'101]:='A'; 640 xchr[@'102]:='B'; 641 xchr[@'103]:='C'; 642 xchr[@'104]:='D'; 643 xchr[@'105]:='E'; 644 xchr[@'106]:='F'; 645 xchr[@'107]:='G';@/ 646 xchr[@'110]:='H'; 647 xchr[@'111]:='I'; 648 xchr[@'112]:='J'; 649 xchr[@'113]:='K'; 650 xchr[@'114]:='L'; 651 xchr[@'115]:='M'; 652 xchr[@'116]:='N'; 653 xchr[@'117]:='O';@/ 654 xchr[@'120]:='P'; 655 xchr[@'121]:='Q'; 656 xchr[@'122]:='R'; 657 xchr[@'123]:='S'; 658 xchr[@'124]:='T'; 659 xchr[@'125]:='U'; 660 xchr[@'126]:='V'; 661 xchr[@'127]:='W';@/ 662 xchr[@'130]:='X'; 663 xchr[@'131]:='Y'; 664 xchr[@'132]:='Z'; 665 xchr[@'133]:='['; 666 xchr[@'134]:='\'; 667 xchr[@'135]:=']'; 668 xchr[@'136]:='^'; 669 xchr[@'137]:='_';@/ 670 xchr[@'140]:='`'; 671 xchr[@'141]:='a'; 672 xchr[@'142]:='b'; 673 xchr[@'143]:='c'; 674 xchr[@'144]:='d'; 675 xchr[@'145]:='e'; 676 xchr[@'146]:='f'; 677 xchr[@'147]:='g';@/ 678 xchr[@'150]:='h'; 679 xchr[@'151]:='i'; 680 xchr[@'152]:='j'; 681 xchr[@'153]:='k'; 682 xchr[@'154]:='l'; 683 xchr[@'155]:='m'; 684 xchr[@'156]:='n'; 685 xchr[@'157]:='o';@/ 686 xchr[@'160]:='p'; 687 xchr[@'161]:='q'; 688 xchr[@'162]:='r'; 689 xchr[@'163]:='s'; 690 xchr[@'164]:='t'; 691 xchr[@'165]:='u'; 692 xchr[@'166]:='v'; 693 xchr[@'167]:='w';@/ 694 xchr[@'170]:='x'; 695 xchr[@'171]:='y'; 696 xchr[@'172]:='z'; 697 xchr[@'173]:='{'; 698 xchr[@'174]:='|'; 699 xchr[@'175]:='}'; 700 xchr[@'176]:='~';@/ 701 702 @ Some of the ASCII codes without visible characters have been given symbolic 703 names in this program because they are used with a special meaning. 704 705 @d null_code=@'0 {ASCII code that might disappear} 706 @d carriage_return=@'15 {ASCII code used at end of line} 707 @d invalid_code=@'177 {ASCII code that many systems prohibit in text files} 708 709 @ The ASCII code is ``standard'' only to a certain extent, since many 710 computer installations have found it advantageous to have ready access 711 to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} 712 gives a complete specification of the intended correspondence between 713 characters and \TeX's internal representation. 714 @:TeXbook}{\sl The \TeX book@> 715 716 If \TeX\ is being used 717 on a garden-variety \PASCAL\ for which only standard ASCII 718 codes will appear in the input and output files, it doesn't really matter 719 what codes are specified in |xchr[0..@'37]|, but the safest policy is to 720 blank everything out by using the code shown below. 721 722 However, other settings of |xchr| will make \TeX\ more friendly on 723 computers that have an extended character set, so that users can type things 724 like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can 725 assign codes arbitrarily, giving an |xchr| equivalent to whatever 726 characters the users of \TeX\ are allowed to have in their input files. 727 It is best to make the codes correspond to the intended interpretations as 728 shown in Appendix~C whenever possible; but this is not necessary. For 729 example, in countries with an alphabet of more than 26 letters, it is 730 usually best to map the additional letters into codes less than~@'40. 731 To get the most ``permissive'' character set, change |' '| on the 732 right of these assignment statements to |chr(i)|. 733 @^character set dependencies@> 734 @^system dependencies@> 735 736 @<Set init...@>= 737 for i:=0 to @'37 do xchr[i]:=' '; 738 for i:=@'177 to @'377 do xchr[i]:=' '; 739 740 @ The following system-independent code makes the |xord| array contain a 741 suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| 742 where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be 743 |j| or more; hence, standard ASCII code numbers will be used instead of 744 codes below @'40 in case there is a coincidence. 745 746 @<Set init...@>= 747 for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code; 748 for i:=@'200 to @'377 do xord[xchr[i]]:=i; 749 for i:=0 to @'176 do xord[xchr[i]]:=i; 750 751 @* \[3] Input and output. 752 The bane of portability is the fact that different operating systems treat 753 input and output quite differently, perhaps because computer scientists 754 have not given sufficient attention to this problem. People have felt somehow 755 that input and output are not part of ``real'' programming. Well, it is true 756 that some kinds of programming are more fun than others. With existing 757 input/output conventions being so diverse and so messy, the only sources of 758 joy in such parts of the code are the rare occasions when one can find a 759 way to make the program a little less bad than it might have been. We have 760 two choices, either to attack I/O now and get it over with, or to postpone 761 I/O until near the end. Neither prospect is very attractive, so let's 762 get it over with. 763 764 The basic operations we need to do are (1)~inputting and outputting of 765 text, to or from a file or the user's terminal; (2)~inputting and 766 outputting of eight-bit bytes, to or from a file; (3)~instructing the 767 operating system to initiate (``open'') or to terminate (``close'') input or 768 output from a specified file; (4)~testing whether the end of an input 769 file has been reached. 770 771 \TeX\ needs to deal with two kinds of files. 772 We shall use the term |alpha_file| for a file that contains textual data, 773 and the term |byte_file| for a file that contains eight-bit binary information. 774 These two types turn out to be the same on many computers, but 775 sometimes there is a significant distinction, so we shall be careful to 776 distinguish between them. Standard protocols for transferring 777 such files from computer to computer, via high-speed networks, are 778 now becoming available to more and more communities of users. 779 780 The program actually makes use also of a third kind of file, called a 781 |word_file|, when dumping and reloading base information for its own 782 initialization. We shall define a word file later; but it will be possible 783 for us to specify simple operations on word files before they are defined. 784 785 @<Types...@>= 786 @!eight_bits=0..255; {unsigned one-byte quantity} 787 @!alpha_file=packed file of text_char; {files that contain textual data} 788 @!byte_file=packed file of eight_bits; {files that contain binary data} 789 790 @ Most of what we need to do with respect to input and output can be handled 791 by the I/O facilities that are standard in \PASCAL, i.e., the routines 792 called |get|, |put|, |eof|, and so on. But 793 standard \PASCAL\ does not allow file variables to be associated with file 794 names that are determined at run time, so it cannot be used to implement 795 \TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| 796 is crucial for our purposes. We shall assume that |name_of_file| is a variable 797 of an appropriate type such that the \PASCAL\ run-time system being used to 798 implement \TeX\ can open a file whose external name is specified by 799 |name_of_file|. 800 @^system dependencies@> 801 802 @<Glob...@>= 803 @!name_of_file:packed array[1..file_name_size] of char;@;@/ 804 {on some systems this may be a \&{record} variable} 805 @!name_length:0..file_name_size;@/{this many characters are actually 806 relevant in |name_of_file| (the rest are blank)} 807 808 @ The \ph\ compiler with which the present version of \TeX\ was prepared has 809 extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, 810 we can write 811 $$\vbox{\halign{#\hfil\qquad&#\hfil\cr 812 |reset(f,@t\\{name}@>,'/O')|&for input;\cr 813 |rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ 814 The `\\{name}' parameter, which is of type `{\bf packed array 815 $[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of 816 the external file that is being opened for input or output. 817 Blank spaces that might appear in \\{name} are ignored. 818 819 The `\.{/O}' parameter tells the operating system not to issue its own 820 error messages if something goes wrong. If a file of the specified name 821 cannot be found, or if such a file cannot be opened for some other reason 822 (e.g., someone may already be trying to write the same file), we will have 823 |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows 824 \TeX\ to undertake appropriate corrective action. 825 @:PASCAL H}{\ph@> 826 @^system dependencies@> 827 828 \TeX's file-opening procedures return |false| if no file identified by 829 |name_of_file| could be opened. 830 831 @d reset_OK(#)==erstat(#)=0 832 @d rewrite_OK(#)==erstat(#)=0 833 834 @p function a_open_in(var f:alpha_file):boolean; 835 {open a text file for input} 836 begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f); 837 end; 838 @# 839 function a_open_out(var f:alpha_file):boolean; 840 {open a text file for output} 841 begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f); 842 end; 843 @# 844 function b_open_in(var f:byte_file):boolean; 845 {open a binary file for input} 846 begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f); 847 end; 848 @# 849 function b_open_out(var f:byte_file):boolean; 850 {open a binary file for output} 851 begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f); 852 end; 853 @# 854 function w_open_in(var f:word_file):boolean; 855 {open a word file for input} 856 begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f); 857 end; 858 @# 859 function w_open_out(var f:word_file):boolean; 860 {open a word file for output} 861 begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f); 862 end; 863 864 @ Files can be closed with the \ph\ routine `|close(f)|', which 865 @:PASCAL H}{\ph@> 866 @^system dependencies@> 867 should be used when all input or output with respect to |f| has been completed. 868 This makes |f| available to be opened again, if desired; and if |f| was used for 869 output, the |close| operation makes the corresponding external file appear 870 on the user's area, ready to be read. 871 872 These procedures should not generate error messages if a file is 873 being closed before it has been successfully opened. 874 875 @p procedure a_close(var f:alpha_file); {close a text file} 876 begin close(f); 877 end; 878 @# 879 procedure b_close(var f:byte_file); {close a binary file} 880 begin close(f); 881 end; 882 @# 883 procedure w_close(var f:word_file); {close a word file} 884 begin close(f); 885 end; 886 887 @ Binary input and output are done with \PASCAL's ordinary |get| and |put| 888 procedures, so we don't have to make any other special arrangements for 889 binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. 890 The treatment of text input is more difficult, however, because 891 of the necessary translation to |ASCII_code| values. 892 \TeX's conventions should be efficient, and they should 893 blend nicely with the user's operating environment. 894 895 @ Input from text files is read one line at a time, using a routine called 896 |input_ln|. This function is defined in terms of global variables called 897 |buffer|, |first|, and |last| that will be described in detail later; for 898 now, it suffices for us to know that |buffer| is an array of |ASCII_code| 899 values, and that |first| and |last| are indices into this array 900 representing the beginning and ending of a line of text. 901 902 @<Glob...@>= 903 @!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read} 904 @!first:0..buf_size; {the first unused position in |buffer|} 905 @!last:0..buf_size; {end of the line just input to |buffer|} 906 @!max_buf_stack:0..buf_size; {largest index used in |buffer|} 907 908 @ The |input_ln| function brings the next line of input from the specified 909 file into available positions of the buffer array and returns the value 910 |true|, unless the file has already been entirely read, in which case it 911 returns |false| and sets |last:=first|. In general, the |ASCII_code| 912 numbers that represent the next line of the file are input into 913 |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the 914 global variable |last| is set equal to |first| plus the length of the 915 line. Trailing blanks are removed from the line; thus, either |last=first| 916 (in which case the line was entirely blank) or |buffer[last-1]<>" "|. 917 918 An overflow error is given, however, if the normal actions of |input_ln| 919 would make |last>=buf_size|; this is done so that other parts of \TeX\ 920 can safely look at the contents of |buffer[last+1]| without overstepping 921 the bounds of the |buffer| array. Upon entry to |input_ln|, the condition 922 |first<buf_size| will always hold, so that there is always room for an 923 ``empty'' line. 924 925 The variable |max_buf_stack|, which is used to keep track of how large 926 the |buf_size| parameter must be to accommodate the present job, is 927 also kept up to date by |input_ln|. 928 929 If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get| 930 before looking at the first character of the line; this skips over 931 an |eoln| that was in |f^|. The procedure does not do a |get| when it 932 reaches the end of the line; therefore it can be used to acquire input 933 from the user's terminal as well as from ordinary text files. 934 935 Standard \PASCAL\ says that a file should have |eoln| immediately 936 before |eof|, but \TeX\ needs only a weaker restriction: If |eof| 937 occurs in the middle of a line, the system function |eoln| should return 938 a |true| result (even though |f^| will be undefined). 939 940 Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each 941 character of input comes in at this place---it is wise to reduce system 942 overhead by making use of special routines that read in an entire array 943 of characters at once, if such routines are available. The following 944 code uses standard \PASCAL\ to illustrate what needs to be done, but 945 finer tuning is often possible at well-developed \PASCAL\ sites. 946 @^inner loop@> 947 948 @p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean; 949 {inputs the next line or returns |false|} 950 var last_nonblank:0..buf_size; {|last| with trailing blanks removed} 951 begin if bypass_eoln then if not eof(f) then get(f); 952 {input the first character of the line into |f^|} 953 last:=first; {cf.\ Matthew 19\thinspace:\thinspace30} 954 if eof(f) then input_ln:=false 955 else begin last_nonblank:=first; 956 while not eoln(f) do 957 begin if last>=max_buf_stack then 958 begin max_buf_stack:=last+1; 959 if max_buf_stack=buf_size then 960 @<Report overflow of the input buffer, and abort@>; 961 end; 962 buffer[last]:=xord[f^]; get(f); incr(last); 963 if buffer[last-1]<>" " then last_nonblank:=last; 964 end; 965 last:=last_nonblank; input_ln:=true; 966 end; 967 end; 968 969 @ The user's terminal acts essentially like other files of text, except 970 that it is used both for input and for output. When the terminal is 971 considered an input file, the file variable is called |term_in|, and when it 972 is considered an output file the file variable is |term_out|. 973 @^system dependencies@> 974 975 @<Glob...@>= 976 @!term_in:alpha_file; {the terminal as an input file} 977 @!term_out:alpha_file; {the terminal as an output file} 978 979 @ Here is how to open the terminal files 980 in \ph. The `\.{/I}' switch suppresses the first |get|. 981 @:PASCAL H}{\ph@> 982 @^system dependencies@> 983 984 @d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} 985 @d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} 986 987 @ Sometimes it is necessary to synchronize the input/output mixture that 988 happens on the user's terminal, and three system-dependent 989 procedures are used for this 990 purpose. The first of these, |update_terminal|, is called when we want 991 to make sure that everything we have output to the terminal so far has 992 actually left the computer's internal buffers and been sent. 993 The second, |clear_terminal|, is called when we wish to cancel any 994 input that the user may have typed ahead (since we are about to 995 issue an unexpected error message). The third, |wake_up_terminal|, 996 is supposed to revive the terminal if the user has disabled it by 997 some instruction to the operating system. The following macros show how 998 these operations can be specified in \ph: 999 @:PASCAL H}{\ph@> 1000 @^system dependencies@> 1001 1002 @d update_terminal == break(term_out) {empty the terminal output buffer} 1003 @d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} 1004 @d wake_up_terminal == do_nothing {cancel the user's cancellation of output} 1005 1006 @ We need a special routine to read the first line of \TeX\ input from 1007 the user's terminal. This line is different because it is read before we 1008 have opened the transcript file; there is sort of a ``chicken and 1009 egg'' problem here. If the user types `\.{\\input paper}' on the first 1010 line, or if some macro invoked by that line does such an \.{\\input}, 1011 the transcript file will be named `\.{paper.log}'; but if no \.{\\input} 1012 commands are performed during the first line of terminal input, the transcript 1013 file will acquire its default name `\.{texput.log}'. (The transcript file 1014 will not contain error messages generated by the first line before the 1015 first \.{\\input} command.) 1016 @.texput@> 1017 1018 The first line is even more special if we are lucky enough to have an operating 1019 system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object 1020 program. It's nice to let the user start running a \TeX\ job by typing 1021 a command line like `\.{tex paper}'; in such a case, \TeX\ will operate 1022 as if the first line of input were `\.{paper}', i.e., the first line will 1023 consist of the remainder of the command line, after the part that invoked 1024 \TeX. 1025 1026 The first line is special also because it may be read before \TeX\ has 1027 input a format file. In such cases, normal error messages cannot yet 1028 be given. The following code uses concepts that will be explained later. 1029 (If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the 1030 @^system dependencies@> 1031 statement `|goto final_end|' should be replaced by something that 1032 quietly terminates the program.) 1033 1034 @<Report overflow of the input buffer, and abort@>= 1035 if format_ident=0 then 1036 begin write_ln(term_out,'Buffer size exceeded!'); goto final_end; 1037 @.Buffer size exceeded@> 1038 end 1039 else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1; 1040 overflow("buffer size",buf_size); 1041 @:TeX capacity exceeded buffer size}{\quad buffer size@> 1042 end 1043 1044 @ Different systems have different ways to get started. But regardless of 1045 what conventions are adopted, the routine that initializes the terminal 1046 should satisfy the following specifications: 1047 1048 \yskip\textindent{1)}It should open file |term_in| for input from the 1049 terminal. (The file |term_out| will already be open for output to the 1050 terminal.) 1051 1052 \textindent{2)}If the user has given a command line, this line should be 1053 considered the first line of terminal input. Otherwise the 1054 user should be prompted with `\.{**}', and the first line of input 1055 should be whatever is typed in response. 1056 1057 \textindent{3)}The first line of input, which might or might not be a 1058 command line, should appear in locations |first| to |last-1| of the 1059 |buffer| array. 1060 1061 \textindent{4)}The global variable |loc| should be set so that the 1062 character to be read next by \TeX\ is in |buffer[loc]|. This 1063 character should not be blank, and we should have |loc<last|. 1064 1065 \yskip\noindent(It may be necessary to prompt the user several times 1066 before a non-blank line comes in. The prompt is `\.{**}' instead of the 1067 later `\.*' because the meaning is slightly different: `\.{\\input}' need 1068 not be typed immediately after~`\.{**}'.) 1069 1070 @d loc==cur_input.loc_field {location of first unread character in |buffer|} 1071 1072 @ The following program does the required initialization 1073 without retrieving a possible command line. 1074 It should be clear how to modify this routine to deal with command lines, 1075 if the system permits them. 1076 @^system dependencies@> 1077 1078 @p function init_terminal:boolean; {gets the terminal input started} 1079 label exit; 1080 begin t_open_in; 1081 loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; 1082 @.**@> 1083 if not input_ln(term_in,true) then {this shouldn't happen} 1084 begin write_ln(term_out); 1085 write(term_out,'! End of file on the terminal... why?'); 1086 @.End of file on the terminal@> 1087 init_terminal:=false; return; 1088 end; 1089 loc:=first; 1090 while (loc<last)and(buffer[loc]=" ") do incr(loc); 1091 if loc<last then 1092 begin init_terminal:=true; 1093 return; {return unless the line was all blank} 1094 end; 1095 write_ln(term_out,'Please type the name of your input file.'); 1096 end; 1097 exit:end; 1098 1099 @* \[4] String handling. 1100 Control sequence names and diagnostic messages are variable-length strings 1101 of eight-bit characters. Since \PASCAL\ does not have a well-developed string 1102 mechanism, \TeX\ does all of its string processing by homegrown methods. 1103 1104 Elaborate facilities for dynamic strings are not needed, so all of the 1105 necessary operations can be handled with a simple data structure. 1106 The array |str_pool| contains all of the (eight-bit) ASCII codes in all 1107 of the strings, and the array |str_start| contains indices of the starting 1108 points of each string. Strings are referred to by integer numbers, so that 1109 string number |s| comprises the characters |str_pool[j]| for 1110 |str_start[s]<=j<str_start[s+1]|. Additional integer variables 1111 |pool_ptr| and |str_ptr| indicate the number of entries used so far 1112 in |str_pool| and |str_start|, respectively; locations 1113 |str_pool[pool_ptr]| and |str_start[str_ptr]| are 1114 ready for the next string to be allocated. 1115 1116 String numbers 0 to 255 are reserved for strings that correspond to single 1117 ASCII characters. This is in accordance with the conventions of \.{WEB}, 1118 @.WEB@> 1119 which converts single-character strings into the ASCII code number of the 1120 single character involved, while it converts other strings into integers 1121 and builds a string pool file. Thus, when the string constant \.{"."} appears 1122 in the program below, \.{WEB} converts it into the integer 46, which is the 1123 ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} 1124 into some integer greater than~255. String number 46 will presumably be the 1125 single character `\..'; but some ASCII codes have no standard visible 1126 representation, and \TeX\ sometimes needs to be able to print an arbitrary 1127 ASCII character, so the first 256 strings are used to specify exactly what 1128 should be printed for each of the 256 possibilities. 1129 1130 Elements of the |str_pool| array must be ASCII codes that can actually 1131 be printed; i.e., they must have an |xchr| equivalent in the local 1132 character set. (This restriction applies only to preloaded strings, 1133 not to those generated dynamically by the user.) 1134 1135 Some \PASCAL\ compilers won't pack integers into a single byte unless the 1136 integers lie in the range |-128..127|. To accommodate such systems 1137 we access the string pool only via macros that can easily be redefined. 1138 @^system dependencies@> 1139 1140 @d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|} 1141 @d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|} 1142 1143 @<Types...@>= 1144 @!pool_pointer = 0..pool_size; {for variables that point into |str_pool|} 1145 @!str_number = 0..max_strings; {for variables that point into |str_start|} 1146 @!packed_ASCII_code = 0..255; {elements of |str_pool| array} 1147 1148 @ @<Glob...@>= 1149 @!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters} 1150 @!str_start : array[str_number] of pool_pointer; {the starting pointers} 1151 @!pool_ptr : pool_pointer; {first unused position in |str_pool|} 1152 @!str_ptr : str_number; {number of the current string being created} 1153 @!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} 1154 @!init_str_ptr : str_number; {the starting value of |str_ptr|} 1155 1156 @ Several of the elementary string operations are performed using \.{WEB} 1157 macros instead of \PASCAL\ procedures, because many of the 1158 operations are done quite frequently and we want to avoid the 1159 overhead of procedure calls. For example, here is 1160 a simple macro that computes the length of a string. 1161 @.WEB@> 1162 1163 @d length(#)==(str_start[#+1]-str_start[#]) {the number of characters 1164 in string number \#} 1165 1166 @ The length of the current string is called |cur_length|: 1167 1168 @d cur_length == (pool_ptr - str_start[str_ptr]) 1169 1170 @ Strings are created by appending character codes to |str_pool|. 1171 The |append_char| macro, defined here, does not check to see if the 1172 value of |pool_ptr| has gotten too high; this test is supposed to be 1173 made before |append_char| is used. There is also a |flush_char| 1174 macro, which erases the last character appended. 1175 1176 To test if there is room to append |l| more characters to |str_pool|, 1177 we shall write |str_room(l)|, which aborts \TeX\ and gives an 1178 apologetic error message if there isn't enough room. 1179 1180 @d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} 1181 begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); 1182 end 1183 @d flush_char == decr(pool_ptr) {forget the last character in the pool} 1184 @d str_room(#) == {make sure that the pool hasn't overflowed} 1185 begin if pool_ptr+# > pool_size then 1186 overflow("pool size",pool_size-init_pool_ptr); 1187 @:TeX capacity exceeded pool size}{\quad pool size@> 1188 end 1189 1190 @ Once a sequence of characters has been appended to |str_pool|, it 1191 officially becomes a string when the function |make_string| is called. 1192 This function returns the identification number of the new string as its 1193 value. 1194 1195 @p function make_string : str_number; {current string enters the pool} 1196 begin if str_ptr=max_strings then 1197 overflow("number of strings",max_strings-init_str_ptr); 1198 @:TeX capacity exceeded number of strings}{\quad number of strings@> 1199 incr(str_ptr); str_start[str_ptr]:=pool_ptr; 1200 make_string:=str_ptr-1; 1201 end; 1202 1203 @ To destroy the most recently made string, we say |flush_string|. 1204 1205 @d flush_string==begin decr(str_ptr); pool_ptr:=str_start[str_ptr]; 1206 end 1207 1208 @ The following subroutine compares string |s| with another string of the 1209 same length that appears in |buffer| starting at position |k|; 1210 the result is |true| if and only if the strings are equal. 1211 Empirical tests indicate that |str_eq_buf| is used in such a way that 1212 it tends to return |true| about 80 percent of the time. 1213 1214 @p function str_eq_buf(@!s:str_number;@!k:integer):boolean; 1215 {test equality of strings} 1216 label not_found; {loop exit} 1217 var j: pool_pointer; {running index} 1218 @!result: boolean; {result of comparison} 1219 begin j:=str_start[s]; 1220 while j<str_start[s+1] do 1221 begin if so(str_pool[j])<>buffer[k] then 1222 begin result:=false; goto not_found; 1223 end; 1224 incr(j); incr(k); 1225 end; 1226 result:=true; 1227 not_found: str_eq_buf:=result; 1228 end; 1229 1230 @ Here is a similar routine, but it compares two strings in the string pool, 1231 and it does not assume that they have the same length. 1232 1233 @p function str_eq_str(@!s,@!t:str_number):boolean; 1234 {test equality of strings} 1235 label not_found; {loop exit} 1236 var j,@!k: pool_pointer; {running indices} 1237 @!result: boolean; {result of comparison} 1238 begin result:=false; 1239 if length(s)<>length(t) then goto not_found; 1240 j:=str_start[s]; k:=str_start[t]; 1241 while j<str_start[s+1] do 1242 begin if str_pool[j]<>str_pool[k] then goto not_found; 1243 incr(j); incr(k); 1244 end; 1245 result:=true; 1246 not_found: str_eq_str:=result; 1247 end; 1248 1249 @ The initial values of |str_pool|, |str_start|, |pool_ptr|, 1250 and |str_ptr| are computed by the \.{INITEX} program, based in part 1251 on the information that \.{WEB} has output while processing \TeX. 1252 @.INITEX@> 1253 @^string pool@> 1254 1255 @p @!init function get_strings_started:boolean; {initializes the string pool, 1256 but returns |false| if something goes wrong} 1257 label done,exit; 1258 var k,@!l:0..255; {small indices or counters} 1259 @!m,@!n:text_char; {characters input from |pool_file|} 1260 @!g:str_number; {garbage} 1261 @!a:integer; {accumulator for check sum} 1262 @!c:boolean; {check sum has been checked} 1263 begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0; 1264 @<Make the first 256 strings@>; 1265 @<Read the other strings from the \.{TEX.POOL} file and return |true|, 1266 or give an error message and return |false|@>; 1267 exit:end; 1268 tini 1269 1270 @ @d app_lc_hex(#)==l:=#; 1271 if l<10 then append_char(l+"0")@+else append_char(l-10+"a") 1272 1273 @<Make the first 256...@>= 1274 for k:=0 to 255 do 1275 begin if (@<Character |k| cannot be printed@>) then 1276 begin append_char("^"); append_char("^"); 1277 if k<@'100 then append_char(k+@'100) 1278 else if k<@'200 then append_char(k-@'100) 1279 else begin app_lc_hex(k div 16); app_lc_hex(k mod 16); 1280 end; 1281 end 1282 else append_char(k); 1283 g:=make_string; 1284 end 1285 1286 @ The first 128 strings will contain 95 standard ASCII characters, and the 1287 other 33 characters will be printed in three-symbol form like `\.{\^\^A}' 1288 unless a system-dependent change is made here. Installations that have 1289 an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, 1290 would like string @'32 to be the single character @'32 instead of the 1291 three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, 1292 even people with an extended character set will want to represent string 1293 @'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to 1294 produce visible strings instead of tabs or line-feeds or carriage-returns 1295 or bell-rings or characters that are treated anomalously in text files. 1296 1297 Unprintable characters of codes 128--255 are, similarly, rendered 1298 \.{\^\^80}--\.{\^\^ff}. 1299 1300 The boolean expression defined here should be |true| unless \TeX\ 1301 internal code number~|k| corresponds to a non-troublesome visible 1302 symbol in the local character set. An appropriate formula for the 1303 extended character set recommended in {\sl The \TeX book\/} would, for 1304 example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'. 1305 If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or 1306 |k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46, 1307 @'60..@'71, @'136, @'141..@'146, @'160..@'171]| must be printable. 1308 Thus, at least 80 printable characters are needed. 1309 @:TeXbook}{\sl The \TeX book@> 1310 @^character set dependencies@> 1311 @^system dependencies@> 1312 1313 @<Character |k| cannot be printed@>= 1314 (k<" ")or(k>"~") 1315 1316 @ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB} 1317 description that you are now reading, it outputs the \PASCAL\ program 1318 \.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX} 1319 @.WEB@>@.INITEX@> 1320 program reads the latter file, where each string appears as a two-digit decimal 1321 length followed by the string itself, and the information is recorded in 1322 \TeX's string memory. 1323 1324 @<Glob...@>= 1325 @!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}} 1326 tini 1327 1328 @ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#); 1329 a_close(pool_file); get_strings_started:=false; return; 1330 end 1331 @<Read the other strings...@>= 1332 name_of_file:=pool_name; {we needn't set |name_length|} 1333 if a_open_in(pool_file) then 1334 begin c:=false; 1335 repeat @<Read one string, but return |false| if the 1336 string memory space is getting too tight for comfort@>; 1337 until c; 1338 a_close(pool_file); get_strings_started:=true; 1339 end 1340 else bad_pool('! I can''t read TEX.POOL.') 1341 @.I can't read TEX.POOL@> 1342 1343 @ @<Read one string...@>= 1344 begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.'); 1345 @.TEX.POOL has no check sum@> 1346 read(pool_file,m,n); {read two digits of string length} 1347 if m='*' then @<Check the pool check sum@> 1348 else begin if (xord[m]<"0")or(xord[m]>"9")or@| 1349 (xord[n]<"0")or(xord[n]>"9") then 1350 bad_pool('! TEX.POOL line doesn''t begin with two digits.'); 1351 @.TEX.POOL line doesn't...@> 1352 l:=xord[m]*10+xord[n]-"0"*11; {compute the length} 1353 if pool_ptr+l+string_vacancies>pool_size then 1354 bad_pool('! You have to increase POOLSIZE.'); 1355 @.You have to increase POOLSIZE@> 1356 for k:=1 to l do 1357 begin if eoln(pool_file) then m:=' '@+else read(pool_file,m); 1358 append_char(xord[m]); 1359 end; 1360 read_ln(pool_file); g:=make_string; 1361 end; 1362 end 1363 1364 @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the 1365 end of this \.{TEX.POOL} file; any other value means that the wrong pool 1366 file has been loaded. 1367 @^check sum@> 1368 1369 @<Check the pool check sum@>= 1370 begin a:=0; k:=1; 1371 loop@+ begin if (xord[n]<"0")or(xord[n]>"9") then 1372 bad_pool('! TEX.POOL check sum doesn''t have nine digits.'); 1373 @.TEX.POOL check sum...@> 1374 a:=10*a+xord[n]-"0"; 1375 if k=9 then goto done; 1376 incr(k); read(pool_file,n); 1377 end; 1378 done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.'); 1379 @.TEX.POOL doesn't match@> 1380 c:=true; 1381 end 1382 1383 @* \[5] On-line and off-line printing. 1384 Messages that are sent to a user's terminal and to the transcript-log file 1385 are produced by several `|print|' procedures. These procedures will 1386 direct their output to a variety of places, based on the setting of 1387 the global variable |selector|, which has the following possible 1388 values: 1389 1390 \yskip 1391 \hang |term_and_log|, the normal setting, prints on the terminal and on the 1392 transcript file. 1393 1394 \hang |log_only|, prints only on the transcript file. 1395 1396 \hang |term_only|, prints only on the terminal. 1397 1398 \hang |no_print|, doesn't print at all. This is used only in rare cases 1399 before the transcript file is open. 1400 1401 \hang |pseudo|, puts output into a cyclic buffer that is used 1402 by the |show_context| routine; when we get to that routine we shall discuss 1403 the reasoning behind this curious mode. 1404 1405 \hang |new_string|, appends the output to the current string in the 1406 string pool. 1407 1408 \hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. 1409 1410 \yskip 1411 \noindent The symbolic names `|term_and_log|', etc., have been assigned 1412 numeric codes that satisfy the convenient relations |no_print+1=term_only|, 1413 |no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. 1414 1415 Three additional global variables, |tally| and |term_offset| and 1416 |file_offset|, record the number of characters that have been printed 1417 since they were most recently cleared to zero. We use |tally| to record 1418 the length of (possibly very long) stretches of printing; |term_offset| 1419 and |file_offset|, on the other hand, keep track of how many characters 1420 have appeared so far on the current line that has been output to the 1421 terminal or to the transcript file, respectively. 1422 1423 @d no_print=16 {|selector| setting that makes data disappear} 1424 @d term_only=17 {printing is destined for the terminal only} 1425 @d log_only=18 {printing is destined for the transcript file only} 1426 @d term_and_log=19 {normal |selector| setting} 1427 @d pseudo=20 {special |selector| setting for |show_context|} 1428 @d new_string=21 {printing is deflected to the string pool} 1429 @d max_selector=21 {highest selector setting} 1430 1431 @<Glob...@>= 1432 @!log_file : alpha_file; {transcript of \TeX\ session} 1433 @!selector : 0..max_selector; {where to print a message} 1434 @!dig : array[0..22] of 0..15; {digits in a number being output} 1435 @!tally : integer; {the number of characters recently printed} 1436 @!term_offset : 0..max_print_line; 1437 {the number of characters on the current terminal line} 1438 @!file_offset : 0..max_print_line; 1439 {the number of characters on the current file line} 1440 @!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for 1441 pseudoprinting} 1442 @!trick_count: integer; {threshold for pseudoprinting, explained later} 1443 @!first_count: integer; {another variable for pseudoprinting} 1444 1445 @ @<Initialize the output routines@>= 1446 selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; 1447 1448 @ Macro abbreviations for output to the terminal and to the log file are 1449 defined here for convenience. Some systems need special conventions 1450 for terminal output, and it is possible to adhere to those conventions 1451 by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. 1452 @^system dependencies@> 1453 1454 @d wterm(#)==write(term_out,#) 1455 @d wterm_ln(#)==write_ln(term_out,#) 1456 @d wterm_cr==write_ln(term_out) 1457 @d wlog(#)==write(log_file,#) 1458 @d wlog_ln(#)==write_ln(log_file,#) 1459 @d wlog_cr==write_ln(log_file) 1460 1461 @ To end a line of text output, we call |print_ln|. 1462 1463 @<Basic print...@>= 1464 procedure print_ln; {prints an end-of-line} 1465 begin case selector of 1466 term_and_log: begin wterm_cr; wlog_cr; 1467 term_offset:=0; file_offset:=0; 1468 end; 1469 log_only: begin wlog_cr; file_offset:=0; 1470 end; 1471 term_only: begin wterm_cr; term_offset:=0; 1472 end; 1473 no_print,pseudo,new_string: do_nothing; 1474 othercases write_ln(write_file[selector]) 1475 endcases;@/ 1476 end; {|tally| is not affected} 1477 1478 @ The |print_char| procedure sends one character to the desired destination, 1479 using the |xchr| array to map it into an external character compatible with 1480 |input_ln|. All printing comes through |print_ln| or |print_char|. 1481 1482 @<Basic printing...@>= 1483 procedure print_char(@!s:ASCII_code); {prints a single character} 1484 label exit; 1485 begin if @<Character |s| is the current new-line character@> then 1486 if selector<pseudo then 1487 begin print_ln; return; 1488 end; 1489 case selector of 1490 term_and_log: begin wterm(xchr[s]); wlog(xchr[s]); 1491 incr(term_offset); incr(file_offset); 1492 if term_offset=max_print_line then 1493 begin wterm_cr; term_offset:=0; 1494 end; 1495 if file_offset=max_print_line then 1496 begin wlog_cr; file_offset:=0; 1497 end; 1498 end; 1499 log_only: begin wlog(xchr[s]); incr(file_offset); 1500 if file_offset=max_print_line then print_ln; 1501 end; 1502 term_only: begin wterm(xchr[s]); incr(term_offset); 1503 if term_offset=max_print_line then print_ln; 1504 end; 1505 no_print: do_nothing; 1506 pseudo: if tally<trick_count then trick_buf[tally mod error_line]:=s; 1507 new_string: begin if pool_ptr<pool_size then append_char(s); 1508 end; {we drop characters if the string space is full} 1509 othercases write(write_file[selector],xchr[s]) 1510 endcases;@/ 1511 incr(tally); 1512 exit:end; 1513 1514 @ An entire string is output by calling |print|. Note that if we are outputting 1515 the single standard ASCII character \.c, we could call |print("c")|, since 1516 |"c"=99| is the number of a single-character string, as explained above. But 1517 |print_char("c")| is quicker, so \TeX\ goes directly to the |print_char| 1518 routine when it knows that this is safe. (The present implementation 1519 assumes that it is always safe to print a visible ASCII character.) 1520 @^system dependencies@> 1521 1522 @<Basic print...@>= 1523 procedure print(@!s:integer); {prints string |s|} 1524 label exit; 1525 var j:pool_pointer; {current character code position} 1526 @!nl:integer; {new-line character to restore} 1527 begin if s>=str_ptr then s:="???" {this can't happen} 1528 @.???@> 1529 else if s<256 then 1530 if s<0 then s:="???" {can't happen} 1531 else begin if selector>pseudo then 1532 begin print_char(s); return; {internal strings are not expanded} 1533 end; 1534 if (@<Character |s| is the current new-line character@>) then 1535 if selector<pseudo then 1536 begin print_ln; return; 1537 end; 1538 nl:=new_line_char; new_line_char:=-1; 1539 {temporarily disable new-line character} 1540 j:=str_start[s]; 1541 while j<str_start[s+1] do 1542 begin print_char(so(str_pool[j])); incr(j); 1543 end; 1544 new_line_char:=nl; return; 1545 end; 1546 j:=str_start[s]; 1547 while j<str_start[s+1] do 1548 begin print_char(so(str_pool[j])); incr(j); 1549 end; 1550 exit:end; 1551 1552 @ Control sequence names, file names, and strings constructed with 1553 \.{\\string} might contain |ASCII_code| values that can't 1554 be printed using |print_char|. Therefore we use |slow_print| for them: 1555 1556 @<Basic print...@>= 1557 procedure slow_print(@!s:integer); {prints string |s|} 1558 var j:pool_pointer; {current character code position} 1559 begin if (s>=str_ptr) or (s<256) then print(s) 1560 else begin j:=str_start[s]; 1561 while j<str_start[s+1] do 1562 begin print(so(str_pool[j])); incr(j); 1563 end; 1564 end; 1565 end; 1566 1567 @ Here is the very first thing that \TeX\ prints: a headline that identifies 1568 the version number and format package. The |term_offset| variable is temporarily 1569 incorrect, but the discrepancy is not serious since we assume that this 1570 part of the program is system dependent. 1571 @^system dependencies@> 1572 1573 @<Initialize the output...@>= 1574 wterm(banner); 1575 if format_ident=0 then wterm_ln(' (no format preloaded)') 1576 else begin slow_print(format_ident); print_ln; 1577 end; 1578 update_terminal; 1579 1580 @ The procedure |print_nl| is like |print|, but it makes sure that the 1581 string appears at the beginning of a new line. 1582 1583 @<Basic print...@>= 1584 procedure print_nl(@!s:str_number); {prints string |s| at beginning of line} 1585 begin if ((term_offset>0)and(odd(selector)))or@| 1586 ((file_offset>0)and(selector>=log_only)) then print_ln; 1587 print(s); 1588 end; 1589 1590 @ The procedure |print_esc| prints a string that is preceded by 1591 the user's escape character (which is usually a backslash). 1592 1593 @<Basic print...@>= 1594 procedure print_esc(@!s:str_number); {prints escape character, then |s|} 1595 var c:integer; {the escape character code} 1596 begin @<Set variable |c| to the current escape character@>; 1597 if c>=0 then if c<256 then print(c); 1598 slow_print(s); 1599 end; 1600 1601 @ An array of digits in the range |0..15| is printed by |print_the_digs|. 1602 1603 @<Basic print...@>= 1604 procedure print_the_digs(@!k:eight_bits); 1605 {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} 1606 begin while k>0 do 1607 begin decr(k); 1608 if dig[k]<10 then print_char("0"+dig[k]) 1609 else print_char("A"-10+dig[k]); 1610 end; 1611 end; 1612 1613 @ The following procedure, which prints out the decimal representation of a 1614 given integer |n|, has been written carefully so that it works properly 1615 if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| 1616 to negative arguments, since such operations are not implemented consistently 1617 by all \PASCAL\ compilers. 1618 1619 @<Basic print...@>= 1620 procedure print_int(@!n:integer); {prints an integer in decimal form} 1621 var k:0..23; {index to current digit; we assume that $\vert n\vert<10^{23}$} 1622 @!m:integer; {used to negate |n| in possibly dangerous cases} 1623 begin k:=0; 1624 if n<0 then 1625 begin print_char("-"); 1626 if n>-100000000 then negate(n) 1627 else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; 1628 if m<10 then dig[0]:=m 1629 else begin dig[0]:=0; incr(n); 1630 end; 1631 end; 1632 end; 1633 repeat dig[k]:=n mod 10; n:=n div 10; incr(k); 1634 until n=0; 1635 print_the_digs(k); 1636 end; 1637 1638 @ Here is a trivial procedure to print two digits; it is usually called with 1639 a parameter in the range |0<=n<=99|. 1640 1641 @p procedure print_two(@!n:integer); {prints two least significant digits} 1642 begin n:=abs(n) mod 100; print_char("0"+(n div 10)); 1643 print_char("0"+(n mod 10)); 1644 end; 1645 1646 @ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. 1647 1648 @p procedure print_hex(@!n:integer); 1649 {prints a positive integer in hexadecimal form} 1650 var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$} 1651 begin k:=0; print_char(""""); 1652 repeat dig[k]:=n mod 16; n:=n div 16; incr(k); 1653 until n=0; 1654 print_the_digs(k); 1655 end; 1656 1657 @ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function 1658 is now subsumed by |print|. We retain the old name here as a possible aid to 1659 future software arch\ae ologists. 1660 1661 @d print_ASCII == print 1662 1663 @ Roman numerals are produced by the |print_roman_int| routine. Readers 1664 who like puzzles might enjoy trying to figure out how this tricky code 1665 works; therefore no explanation will be given. Notice that 1990 yields 1666 \.{mcmxc}, not \.{mxm}. 1667 1668 @p procedure print_roman_int(@!n:integer); 1669 label exit; 1670 var j,@!k: pool_pointer; {mysterious indices into |str_pool|} 1671 @!u,@!v: nonnegative_integer; {mysterious numbers} 1672 begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000; 1673 loop@+ begin while n>=v do 1674 begin print_char(so(str_pool[j])); n:=n-v; 1675 end; 1676 if n<=0 then return; {nonpositive input produces no output} 1677 k:=j+2; u:=v div (so(str_pool[k-1])-"0"); 1678 if str_pool[k-1]=si("2") then 1679 begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); 1680 end; 1681 if n+u>=v then 1682 begin print_char(so(str_pool[k])); n:=n+u; 1683 end 1684 else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); 1685 end; 1686 end; 1687 exit:end; 1688 1689 @ The |print| subroutine will not print a string that is still being 1690 created. The following procedure will. 1691 1692 @p procedure print_current_string; {prints a yet-unmade string} 1693 var j:pool_pointer; {points to current character code} 1694 begin j:=str_start[str_ptr]; 1695 while j<pool_ptr do 1696 begin print_char(so(str_pool[j])); incr(j); 1697 end; 1698 end; 1699 1700 @ Here is a procedure that asks the user to type a line of input, 1701 assuming that the |selector| setting is either |term_only| or |term_and_log|. 1702 The input is placed into locations |first| through |last-1| of the 1703 |buffer| array, and echoed on the transcript file if appropriate. 1704 1705 This procedure is never called when |interaction<scroll_mode|. 1706 1707 @d prompt_input(#)==begin wake_up_terminal; print(#); term_input; 1708 end {prints a string and gets a line of input} 1709 1710 @p procedure term_input; {gets a line from the terminal} 1711 var k:0..buf_size; {index into |buffer|} 1712 begin update_terminal; {now the user sees the prompt for sure} 1713 if not input_ln(term_in,true) then fatal_error("End of file on the terminal!"); 1714 @.End of file on the terminal@> 1715 term_offset:=0; {the user's line ended with \<\rm return>} 1716 decr(selector); {prepare to echo the input} 1717 if last<>first then for k:=first to last-1 do print(buffer[k]); 1718 print_ln; incr(selector); {restore previous status} 1719 end; 1720 1721 @* \[6] Reporting errors. 1722 When something anomalous is detected, \TeX\ typically does something like this: 1723 $$\vbox{\halign{#\hfil\cr 1724 |print_err("Something anomalous has been detected");|\cr 1725 |help3("This is the first line of my offer to help.")|\cr 1726 |("This is the second line. I'm trying to")|\cr 1727 |("explain the best way for you to proceed.");|\cr 1728 |error;|\cr}}$$ 1729 A two-line help message would be given using |help2|, etc.; these informal 1730 helps should use simple vocabulary that complements the words used in the 1731 official error message that was printed. (Outside the U.S.A., the help 1732 messages should preferably be translated into the local vernacular. Each 1733 line of help is at most 60 characters long, in the present implementation, 1734 so that |max_print_line| will not be exceeded.) 1735 1736 The |print_err| procedure supplies a `\.!' before the official message, 1737 and makes sure that the terminal is awake if a stop is going to occur. 1738 The |error| procedure supplies a `\..' after the official message, then it 1739 shows the location of the error; and if |interaction=error_stop_mode|, 1740 it also enters into a dialog with the user, during which time the help 1741 message may be printed. 1742 @^system dependencies@> 1743 1744 @ The global variable |interaction| has four settings, representing increasing 1745 amounts of user interaction: 1746 1747 @d batch_mode=0 {omits all stops and omits terminal output} 1748 @d nonstop_mode=1 {omits all stops} 1749 @d scroll_mode=2 {omits error stops} 1750 @d error_stop_mode=3 {stops at every opportunity to interact} 1751 @d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; 1752 print_nl("! "); print(#); 1753 end 1754 1755 @<Glob...@>= 1756 @!interaction:batch_mode..error_stop_mode; {current level of interaction} 1757 1758 @ @<Set init...@>=interaction:=error_stop_mode; 1759 1760 @ \TeX\ is careful not to call |error| when the print |selector| setting 1761 might be unusual. The only possible values of |selector| at the time of 1762 error messages are 1763 1764 \yskip\hang|no_print| (when |interaction=batch_mode| 1765 and |log_file| not yet open); 1766 1767 \hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); 1768 1769 \hang|log_only| (when |interaction=batch_mode| and |log_file| is open); 1770 1771 \hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). 1772 1773 @<Initialize the print |selector| based on |interaction|@>= 1774 if interaction=batch_mode then selector:=no_print@+else selector:=term_only 1775 1776 @ A global variable |deletions_allowed| is set |false| if the |get_next| 1777 routine is active when |error| is called; this ensures that |get_next| 1778 and related routines like |get_token| will never be called recursively. 1779 A similar interlock is provided by |set_box_allowed|. 1780 @^recursion@> 1781 1782 The global variable |history| records the worst level of error that 1783 has been detected. It has four possible values: |spotless|, |warning_issued|, 1784 |error_message_issued|, and |fatal_error_stop|. 1785 1786 Another global variable, |error_count|, is increased by one when an 1787 |error| occurs without an interactive dialog, and it is reset to zero at 1788 the end of every paragraph. If |error_count| reaches 100, \TeX\ decides 1789 that there is no point in continuing further. 1790 1791 @d spotless=0 {|history| value when nothing has been amiss yet} 1792 @d warning_issued=1 {|history| value when |begin_diagnostic| has been called} 1793 @d error_message_issued=2 {|history| value when |error| has been called} 1794 @d fatal_error_stop=3 {|history| value when termination was premature} 1795 1796 @<Glob...@>= 1797 @!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?} 1798 @!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?} 1799 @!history:spotless..fatal_error_stop; {has the source input been clean so far?} 1800 @!error_count:-1..100; {the number of scrolled errors since the 1801 last paragraph ended} 1802 1803 @ The value of |history| is initially |fatal_error_stop|, but it will 1804 be changed to |spotless| if \TeX\ survives the initialization process. 1805 1806 @<Set init...@>= 1807 deletions_allowed:=true; set_box_allowed:=true; 1808 error_count:=0; {|history| is initialized elsewhere} 1809 1810 @ Since errors can be detected almost anywhere in \TeX, we want to declare the 1811 error procedures near the beginning of the program. But the error procedures 1812 in turn use some other procedures, which need to be declared |forward| 1813 before we get to |error| itself. 1814 1815 It is possible for |error| to be called recursively if some error arises 1816 when |get_token| is being used to delete a token, and/or if some fatal error 1817 occurs while \TeX\ is trying to fix a non-fatal one. But such recursion 1818 @^recursion@> 1819 is never more than two levels deep. 1820 1821 @<Error handling...@>= 1822 procedure@?normalize_selector; forward;@t\2@>@/ 1823 procedure@?get_token; forward;@t\2@>@/ 1824 procedure@?term_input; forward;@t\2@>@/ 1825 procedure@?show_context; forward;@t\2@>@/ 1826 procedure@?begin_file_reading; forward;@t\2@>@/ 1827 procedure@?open_log_file; forward;@t\2@>@/ 1828 procedure@?close_files_and_terminate; forward;@t\2@>@/ 1829 procedure@?clear_for_error_prompt; forward;@t\2@>@/ 1830 procedure@?give_err_help; forward;@t\2@>@/ 1831 @t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; 1832 forward;@;@+gubed 1833 1834 @ Individual lines of help are recorded in the array |help_line|, which 1835 contains entries in positions |0..(help_ptr-1)|. They should be printed 1836 in reverse order, i.e., with |help_line[0]| appearing last. 1837 1838 @d hlp1(#)==help_line[0]:=#;@+end 1839 @d hlp2(#)==help_line[1]:=#; hlp1 1840 @d hlp3(#)==help_line[2]:=#; hlp2 1841 @d hlp4(#)==help_line[3]:=#; hlp3 1842 @d hlp5(#)==help_line[4]:=#; hlp4 1843 @d hlp6(#)==help_line[5]:=#; hlp5 1844 @d help0==help_ptr:=0 {sometimes there might be no help} 1845 @d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} 1846 @d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} 1847 @d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} 1848 @d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} 1849 @d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} 1850 @d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} 1851 1852 @<Glob...@>= 1853 @!help_line:array[0..5] of str_number; {helps for the next |error|} 1854 @!help_ptr:0..6; {the number of help lines present} 1855 @!use_err_help:boolean; {should the |err_help| list be shown?} 1856 1857 @ @<Set init...@>= 1858 help_ptr:=0; use_err_help:=false; 1859 1860 @ The |jump_out| procedure just cuts across all active procedure levels and 1861 goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the 1862 whole program. It is used when there is no recovery from a particular error. 1863 1864 Some \PASCAL\ compilers do not implement non-local |goto| statements. 1865 @^system dependencies@> 1866 In such cases the body of |jump_out| should simply be 1867 `|close_files_and_terminate|;\thinspace' followed by a call on some system 1868 procedure that quietly terminates the program. 1869 1870 @<Error hand...@>= 1871 procedure jump_out; 1872 begin goto end_of_TEX; 1873 end; 1874 1875 @ Here now is the general |error| routine. 1876 1877 @<Error hand...@>= 1878 procedure error; {completes the job of error reporting} 1879 label continue,exit; 1880 var c:ASCII_code; {what the user types} 1881 @!s1,@!s2,@!s3,@!s4:integer; 1882 {used to save global variables when deleting tokens} 1883 begin if history<error_message_issued then history:=error_message_issued; 1884 print_char("."); show_context; 1885 if interaction=error_stop_mode then 1886 @<Get user's advice and |return|@>; 1887 incr(error_count); 1888 if error_count=100 then 1889 begin print_nl("(That makes 100 errors; please try again.)"); 1890 @.That makes 100 errors...@> 1891 history:=fatal_error_stop; jump_out; 1892 end; 1893 @<Put help message on the transcript file@>; 1894 exit:end; 1895 1896 @ @<Get user's advice...@>= 1897 loop@+begin continue: if interaction<>error_stop_mode then return; 1898 clear_for_error_prompt; prompt_input("? "); 1899 @.?\relax@> 1900 if last=first then return; 1901 c:=buffer[first]; 1902 if c>="a" then c:=c+"A"-"a"; {convert to uppercase} 1903 @<Interpret code |c| and |return| if done@>; 1904 end 1905 1906 @ It is desirable to provide an `\.E' option here that gives the user 1907 an easy way to return from \TeX\ to the system editor, with the offending 1908 line ready to be edited. But such an extension requires some system 1909 wizardry, so the present implementation simply types out the name of the 1910 file that should be 1911 edited and the relevant line number. 1912 @^system dependencies@> 1913 1914 There is a secret `\.D' option available when the debugging routines haven't 1915 been commented~out. 1916 @^debugging@> 1917 1918 @<Interpret code |c| and |return| if done@>= 1919 case c of 1920 "0","1","2","3","4","5","6","7","8","9": if deletions_allowed then 1921 @<Delete \(c)|c-"0"| tokens and |goto continue|@>; 1922 @t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/ 1923 "E": if base_ptr>0 then if input_stack[base_ptr].name_field>=256 then 1924 begin print_nl("You want to edit file "); 1925 @.You want to edit file x@> 1926 slow_print(input_stack[base_ptr].name_field); 1927 print(" at line "); print_int(line); 1928 interaction:=scroll_mode; jump_out; 1929 end; 1930 "H": @<Print the help information and |goto continue|@>; 1931 "I":@<Introduce new material from the terminal and |return|@>; 1932 "Q","R","S":@<Change the interaction level and |return|@>; 1933 "X":begin interaction:=scroll_mode; jump_out; 1934 end; 1935 othercases do_nothing 1936 endcases;@/ 1937 @<Print the menu of available options@> 1938 1939 @ @<Print the menu...@>= 1940 begin print("Type <return> to proceed, S to scroll future error messages,");@/ 1941 @.Type <return> to proceed...@> 1942 print_nl("R to run without stopping, Q to run quietly,");@/ 1943 print_nl("I to insert something, "); 1944 if base_ptr>0 then if input_stack[base_ptr].name_field>=256 then 1945 print("E to edit your file,"); 1946 if deletions_allowed then 1947 print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); 1948 print_nl("H for help, X to quit."); 1949 end 1950 1951 @ Here the author of \TeX\ apologizes for making use of the numerical 1952 relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings 1953 |batch_mode|, |nonstop_mode|, |scroll_mode|. 1954 @^Knuth, Donald Ervin@> 1955 1956 @<Change the interaction...@>= 1957 begin error_count:=0; interaction:=batch_mode+c-"Q"; 1958 print("OK, entering "); 1959 case c of 1960 "Q":begin print_esc("batchmode"); decr(selector); 1961 end; 1962 "R":print_esc("nonstopmode"); 1963 "S":print_esc("scrollmode"); 1964 end; {there are no other cases} 1965 print("..."); print_ln; update_terminal; return; 1966 end 1967 1968 @ When the following code is executed, |buffer[(first+1)..(last-1)]| may 1969 contain the material inserted by the user; otherwise another prompt will 1970 be given. In order to understand this part of the program fully, you need 1971 to be familiar with \TeX's input stacks. 1972 1973 @<Introduce new material...@>= 1974 begin begin_file_reading; {enter a new syntactic level for terminal input} 1975 {now |state=mid_line|, so an initial blank space will count as a blank} 1976 if last>first+1 then 1977 begin loc:=first+1; buffer[first]:=" "; 1978 end 1979 else begin prompt_input("insert>"); loc:=first; 1980 @.insert>@> 1981 end; 1982 first:=last; 1983 cur_input.limit_field:=last-1; {no |end_line_char| ends this line} 1984 return; 1985 end 1986 1987 @ We allow deletion of up to 99 tokens at a time. 1988 1989 @<Delete \(c)|c-"0"| tokens...@>= 1990 begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state; 1991 align_state:=1000000; OK_to_interrupt:=false; 1992 if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then 1993 c:=c*10+buffer[first+1]-"0"*11 1994 else c:=c-"0"; 1995 while c>0 do 1996 begin get_token; {one-level recursive call of |error| is possible} 1997 decr(c); 1998 end; 1999 cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true; 2000 help2("I have just deleted some text, as you asked.")@/ 2001 ("You can now delete more, or insert, or whatever."); 2002 show_context; goto continue; 2003 end 2004 2005 @ @<Print the help info...@>= 2006 begin if use_err_help then 2007 begin give_err_help; use_err_help:=false; 2008 end 2009 else begin if help_ptr=0 then 2010 help2("Sorry, I don't know how to help in this situation.")@/ 2011 @t\kern1em@>("Maybe you should try asking a human?"); 2012 repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; 2013 until help_ptr=0; 2014 end; 2015 help4("Sorry, I already gave what help I could...")@/ 2016 ("Maybe you should try asking a human?")@/ 2017 ("An error might have occurred before I noticed any problems.")@/ 2018 ("``If all else fails, read the instructions.''");@/ 2019 goto continue; 2020 end 2021 2022 @ @<Put help message on the transcript file@>= 2023 if interaction>batch_mode then decr(selector); {avoid terminal output} 2024 if use_err_help then 2025 begin print_ln; give_err_help; 2026 end 2027 else while help_ptr>0 do 2028 begin decr(help_ptr); print_nl(help_line[help_ptr]); 2029 end; 2030 print_ln; 2031 if interaction>batch_mode then incr(selector); {re-enable terminal output} 2032 print_ln 2033 2034 @ A dozen or so error messages end with a parenthesized integer, so we 2035 save a teeny bit of program space by declaring the following procedure: 2036 2037 @p procedure int_error(@!n:integer); 2038 begin print(" ("); print_int(n); print_char(")"); error; 2039 end; 2040 2041 @ In anomalous cases, the print selector might be in an unknown state; 2042 the following subroutine is called to fix things just enough to keep 2043 running a bit longer. 2044 2045 @p procedure normalize_selector; 2046 begin if log_opened then selector:=term_and_log 2047 else selector:=term_only; 2048 if job_name=0 then open_log_file; 2049 if interaction=batch_mode then decr(selector); 2050 end; 2051 2052 @ The following procedure prints \TeX's last words before dying. 2053 2054 @d succumb==begin if interaction=error_stop_mode then 2055 interaction:=scroll_mode; {no more interaction} 2056 if log_opened then error; 2057 @!debug if interaction>batch_mode then debug_help;@+gubed@;@/ 2058 history:=fatal_error_stop; jump_out; {irrecoverable error} 2059 end 2060 2061 @<Error hand...@>= 2062 procedure fatal_error(@!s:str_number); {prints |s|, and that's it} 2063 begin normalize_selector;@/ 2064 print_err("Emergency stop"); help1(s); succumb; 2065 @.Emergency stop@> 2066 end; 2067 2068 @ Here is the most dreaded error message. 2069 2070 @<Error hand...@>= 2071 procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} 2072 begin normalize_selector; 2073 print_err("TeX capacity exceeded, sorry ["); 2074 @.TeX capacity exceeded ...@> 2075 print(s); print_char("="); print_int(n); print_char("]"); 2076 help2("If you really absolutely need more capacity,")@/ 2077 ("you can ask a wizard to enlarge me."); 2078 succumb; 2079 end; 2080 2081 @ The program might sometime run completely amok, at which point there is 2082 no choice but to stop. If no previous error has been detected, that's bad 2083 news; a message is printed that is really intended for the \TeX\ 2084 maintenance person instead of the user (unless the user has been 2085 particularly diabolical). The index entries for `this can't happen' may 2086 help to pinpoint the problem. 2087 @^dry rot@> 2088 2089 @<Error hand...@>= 2090 procedure confusion(@!s:str_number); 2091 {consistency check violated; |s| tells where} 2092 begin normalize_selector; 2093 if history<error_message_issued then 2094 begin print_err("This can't happen ("); print(s); print_char(")"); 2095 @.This can't happen@> 2096 help1("I'm broken. Please show this to someone who can fix can fix"); 2097 end 2098 else begin print_err("I can't go on meeting you like this"); 2099 @.I can't go on...@> 2100 help2("One of your faux pas seems to have wounded me deeply...")@/ 2101 ("in fact, I'm barely conscious. Please fix it and try again."); 2102 end; 2103 succumb; 2104 end; 2105 2106 @ Users occasionally want to interrupt \TeX\ while it's running. 2107 If the \PASCAL\ runtime system allows this, one can implement 2108 a routine that sets the global variable |interrupt| to some nonzero value 2109 when such an interrupt is signalled. Otherwise there is probably at least 2110 a way to make |interrupt| nonzero using the \PASCAL\ debugger. 2111 @^system dependencies@> 2112 @^debugging@> 2113 2114 @d check_interrupt==begin if interrupt<>0 then pause_for_instructions; 2115 end 2116 2117 @<Global...@>= 2118 @!interrupt:integer; {should \TeX\ pause for instructions?} 2119 @!OK_to_interrupt:boolean; {should interrupts be observed?} 2120 2121 @ @<Set init...@>= 2122 interrupt:=0; OK_to_interrupt:=true; 2123 2124 @ When an interrupt has been detected, the program goes into its 2125 highest interaction level and lets the user have nearly the full flexibility of 2126 the |error| routine. \TeX\ checks for interrupts only at times when it is 2127 safe to do this. 2128 2129 @p procedure pause_for_instructions; 2130 begin if OK_to_interrupt then 2131 begin interaction:=error_stop_mode; 2132 if (selector=log_only)or(selector=no_print) then 2133 incr(selector); 2134 print_err("Interruption"); 2135 @.Interruption@> 2136 help3("You rang?")@/ 2137 ("Try to insert an instruction for me (e.g., `I\showlists'),")@/ 2138 ("unless you just want to quit by typing `X'."); 2139 deletions_allowed:=false; error; deletions_allowed:=true; 2140 interrupt:=0; 2141 end; 2142 end; 2143 2144 @* \[7] Arithmetic with scaled dimensions. 2145 The principal computations performed by \TeX\ are done entirely in terms of 2146 integers less than $2^{31}$ in magnitude; and divisions are done only when both 2147 dividend and divisor are nonnegative. Thus, the arithmetic specified in this 2148 program can be carried out in exactly the same way on a wide variety of 2149 computers, including some small ones. Why? Because the arithmetic 2150 calculations need to be spelled out precisely in order to guarantee that 2151 \TeX\ will produce identical output on different machines. If some 2152 quantities were rounded differently in different implementations, we would 2153 find that line breaks and even page breaks might occur in different places. 2154 Hence the arithmetic of \TeX\ has been designed with care, and systems that 2155 claim to be implementations of \TeX82 should follow precisely the 2156 @:TeX82}{\TeX82@> 2157 calculations as they appear in the present program. 2158 2159 (Actually there are three places where \TeX\ uses |div| with a possibly negative 2160 numerator. These are harmless; see |div| in the index. Also if the user 2161 sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic 2162 information will involve negative-numerator division. The same remarks 2163 apply for |mod| as well as for |div|.) 2164 2165 @ Here is a routine that calculates half of an integer, using an 2166 unambiguous convention with respect to signed odd numbers. 2167 2168 @p function half(@!x:integer):integer; 2169 begin if odd(x) then half:=(x+1) div 2 2170 else half:=x @!div 2; 2171 end; 2172 2173 @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples 2174 of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit 2175 positions from the right end of a binary computer word. 2176 2177 @d unity == @'200000 {$2^{16}$, represents 1.00000} 2178 @d two == @'400000 {$2^{17}$, represents 2.00000} 2179 2180 @<Types...@>= 2181 @!scaled = integer; {this type is used for scaled integers} 2182 @!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$} 2183 @!small_number=0..63; {this type is self-explanatory} 2184 2185 @ The following function is used to create a scaled integer from a given decimal 2186 fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is 2187 given in |dig[i]|, and the calculation produces a correctly rounded result. 2188 2189 @p function round_decimals(@!k:small_number) : scaled; 2190 {converts a decimal fraction} 2191 var a:integer; {the accumulator} 2192 begin a:=0; 2193 while k>0 do 2194 begin decr(k); a:=(a+dig[k]*two) div 10; 2195 end; 2196 round_decimals:=(a+1) div 2; 2197 end; 2198 2199 @ Conversely, here is a procedure analogous to |print_int|. If the output 2200 of this procedure is subsequently read by \TeX\ and converted by the 2201 |round_decimals| routine above, it turns out that the original value will 2202 be reproduced exactly; the ``simplest'' such decimal number is output, 2203 but there is always at least one digit following the decimal point. 2204 2205 The invariant relation in the \&{repeat} loop is that a sequence of 2206 decimal digits yet to be printed will yield the original number if and only if 2207 they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f<s$. 2208 We can stop if and only if $f=0$ satisfies this condition; the loop will 2209 terminate before $s$ can possibly become zero. 2210 2211 @p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five 2212 digits} 2213 var delta:scaled; {amount of allowable inaccuracy} 2214 begin if s<0 then 2215 begin print_char("-"); negate(s); {print the sign, if negative} 2216 end; 2217 print_int(s div unity); {print the integer part} 2218 print_char("."); 2219 s:=10*(s mod unity)+5; delta:=10; 2220 repeat if delta>unity then s:=s+@'100000-50000; {round the last digit} 2221 print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; 2222 until s<=delta; 2223 end; 2224 2225 @ Physical sizes that a \TeX\ user specifies for portions of documents are 2226 represented internally as scaled points. Thus, if we define an `sp' (scaled 2227 @^sp@> 2228 point) as a unit equal to $2^{-16}$ printer's points, every dimension 2229 inside of \TeX\ is an integer number of sp. There are exactly 2230 4,736,286.72 sp per inch. Users are not allowed to specify dimensions 2231 larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 2232 meters); two such quantities can be added without overflow on a 32-bit 2233 computer. 2234 2235 The present implementation of \TeX\ does not check for overflow when 2236 @^overflow in arithmetic@> 2237 dimensions are added or subtracted. This could be done by inserting a 2238 few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then 2239 @t\\{report\_overflow}@>|', but the chance of overflow is so remote that 2240 such tests do not seem worthwhile. 2241 2242 \TeX\ needs to do only a few arithmetic operations on scaled quantities, 2243 other than addition and subtraction, and the following subroutines do most of 2244 the work. A single computation might use several subroutine calls, and it is 2245 desirable to avoid producing multiple error messages in case of arithmetic 2246 overflow; so the routines set the global variable |arith_error| to |true| 2247 instead of reporting errors directly to the user. Another global variable, 2248 |remainder|, holds the remainder after a division. 2249 2250 @<Glob...@>= 2251 @!arith_error:boolean; {has arithmetic overflow occurred recently?} 2252 @!remainder:scaled; {amount subtracted to get an exact division} 2253 2254 @ The first arithmetical subroutine we need computes $nx+y$, where |x| 2255 and~|y| are |scaled| and |n| is an integer. We will also use it to 2256 multiply integers. 2257 2258 @d nx_plus_y(#)==mult_and_add(#,@'7777777777) 2259 @d mult_integers(#)==mult_and_add(#,0,@'17777777777) 2260 2261 @p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled; 2262 begin if n<0 then 2263 begin negate(x); negate(n); 2264 end; 2265 if n=0 then mult_and_add:=y 2266 else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then 2267 mult_and_add:=n*x+y 2268 else begin arith_error:=true; mult_and_add:=0; 2269 end; 2270 end; 2271 2272 @ We also need to divide scaled dimensions by integers. 2273 2274 @p function x_over_n(@!x:scaled;@!n:integer):scaled; 2275 var negative:boolean; {should |remainder| be negated?} 2276 begin negative:=false; 2277 if n=0 then 2278 begin arith_error:=true; x_over_n:=0; remainder:=x; 2279 end 2280 else begin if n<0 then 2281 begin negate(x); negate(n); negative:=true; 2282 end; 2283 if x>=0 then 2284 begin x_over_n:=x div n; remainder:=x mod n; 2285 end 2286 else begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n); 2287 end; 2288 end; 2289 if negative then negate(remainder); 2290 end; 2291 2292 @ Then comes the multiplication of a scaled number by a fraction |n/d|, 2293 where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is 2294 positive. It would be too dangerous to multiply by~|n| and then divide 2295 by~|d|, in separate operations, since overflow might well occur; and it 2296 would be too inaccurate to divide by |d| and then multiply by |n|. Hence 2297 this subroutine simulates 1.5-precision arithmetic. 2298 2299 @p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled; 2300 var positive:boolean; {was |x>=0|?} 2301 @!t,@!u,@!v:nonnegative_integer; {intermediate quantities} 2302 begin if x>=0 then positive:=true 2303 else begin negate(x); positive:=false; 2304 end; 2305 t:=(x mod @'100000)*n; 2306 u:=(x div @'100000)*n+(t div @'100000); 2307 v:=(u mod d)*@'100000 + (t mod @'100000); 2308 if u div d>=@'100000 then arith_error:=true 2309 else u:=@'100000*(u div d) + (v div d); 2310 if positive then 2311 begin xn_over_d:=u; remainder:=v mod d; 2312 end 2313 else begin xn_over_d:=-u; remainder:=-(v mod d); 2314 end; 2315 end; 2316 2317 @ The next subroutine is used to compute the ``badness'' of glue, when a 2318 total~|t| is supposed to be made from amounts that sum to~|s|. According 2319 to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; 2320 however, badness is simply a heuristic, so we need not squeeze out the 2321 last drop of accuracy when computing it. All we really want is an 2322 approximation that has similar properties. 2323 @:TeXbook}{\sl The \TeX book@> 2324 2325 The actual method used to compute the badness is easier to read from the 2326 program than to describe in words. It produces an integer value that is a 2327 reasonably close approximation to $100(t/s)^3$, and all implementations 2328 of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is 2329 treated as infinitely bad, and represented by 10000. 2330 2331 It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s) 2332 >=badness(t,s+1)|}.$$ The badness function defined here is capable of 2333 computing at most 1095 distinct values, but that is plenty. 2334 2335 @d inf_bad = 10000 {infinitely bad value} 2336 2337 @p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|} 2338 var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx 2339 100\cdot2^{18}$} 2340 begin if t=0 then badness:=0 2341 else if s<=0 then badness:=inf_bad 2342 else begin if t<=7230584 then r:=(t*297) div s {$297^3=99.94\times2^{18}$} 2343 else if s>=1663497 then r:=t div (s div 297) 2344 else r:=t; 2345 if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$} 2346 else badness:=(r*r*r+@'400000) div @'1000000; 2347 end; {that was $r^3/2^{18}$, rounded to the nearest integer} 2348 end; 2349 2350 @ When \TeX\ ``packages'' a list into a box, it needs to calculate the 2351 proportionality ratio by which the glue inside the box should stretch 2352 or shrink. This calculation does not affect \TeX's decision making, 2353 so the precise details of rounding, etc., in the glue calculation are not 2354 of critical importance for the consistency of results on different computers. 2355 2356 We shall use the type |glue_ratio| for such proportionality ratios. 2357 A glue ratio should take the same amount of memory as an 2358 |integer| (usually 32 bits) if it is to blend smoothly with \TeX's 2359 other data structures. Thus |glue_ratio| should be equivalent to 2360 |short_real| in some implementations of \PASCAL. Alternatively, 2361 it is possible to deal with glue ratios using nothing but fixed-point 2362 arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the 2363 routines cited there must be modified to allow negative glue ratios.) 2364 @^system dependencies@> 2365 2366 @d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio} 2367 @d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio} 2368 @d float(#) == # {convert from |glue_ratio| to type |real|} 2369 @d unfloat(#) == # {convert from |real| to type |glue_ratio|} 2370 @d float_constant(#) == #.0 {convert |integer| constant to |real|} 2371 2372 @<Types...@>= 2373 @!glue_ratio=real; {one-word representation of a glue expansion factor} 2374 2375 @* \[8] Packed data. 2376 In order to make efficient use of storage space, \TeX\ bases its major data 2377 structures on a |memory_word|, which contains either a (signed) integer, 2378 possibly scaled, or a (signed) |glue_ratio|, or a small number of 2379 fields that are one half or one quarter of the size used for storing 2380 integers. 2381 2382 If |x| is a variable of type |memory_word|, it contains up to four 2383 fields that can be referred to as follows: 2384 $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr 2385 |x|&.|int|&(an |integer|)\cr 2386 |x|&.|sc|\qquad&(a |scaled| integer)\cr 2387 |x|&.|gr|&(a |glue_ratio|)\cr 2388 |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr 2389 |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword 2390 field)\cr 2391 |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt 2392 &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ 2393 This is somewhat cumbersome to write, and not very readable either, but 2394 macros will be used to make the notation shorter and more transparent. 2395 The \PASCAL\ code below gives a formal definition of |memory_word| and 2396 its subsidiary types, using packed variant records. \TeX\ makes no 2397 assumptions about the relative positions of the fields within a word. 2398 2399 Since we are assuming 32-bit integers, a halfword must contain at least 2400 16 bits, and a quarterword must contain at least 8 bits. 2401 @^system dependencies@> 2402 But it doesn't hurt to have more bits; for example, with enough 36-bit 2403 words you might be able to have |mem_max| as large as 262142, which is 2404 eight times as much memory as anybody had during the first four years of 2405 \TeX's existence. 2406 2407 N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled 2408 by a \PASCAL\ that packs all of the |memory_word| variants into 2409 the space of a single integer. This means, for example, that |glue_ratio| 2410 words should be |short_real| instead of |real| on some computers. Some 2411 \PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into 2412 an eight-bit field, but others insist on allocating space for an additional 2413 sign bit; on such systems you can get 256 values into a quarterword only 2414 if the subrange is `|-128..127|'. 2415 2416 The present implementation tries to accommodate as many variations as possible, 2417 so it makes few assumptions. If integers having the subrange 2418 `|min_quarterword..max_quarterword|' can be packed into a quarterword, 2419 and if integers having the subrange `|min_halfword..max_halfword|' 2420 can be packed into a halfword, everything should work satisfactorily. 2421 2422 It is usually most efficient to have |min_quarterword=min_halfword=0|, 2423 so one should try to achieve this unless it causes a severe problem. 2424 The values defined here are recommended for most 32-bit computers. 2425 2426 @d min_quarterword=0 {smallest allowable value in a |quarterword|} 2427 @d max_quarterword=255 {largest allowable value in a |quarterword|} 2428 @d min_halfword==0 {smallest allowable value in a |halfword|} 2429 @d max_halfword==65535 {largest allowable value in a |halfword|} 2430 2431 @ Here are the inequalities that the quarterword and halfword values 2432 must satisfy (or rather, the inequalities that they mustn't satisfy): 2433 2434 @<Check the ``constant''...@>= 2435 init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/ 2436 if (mem_min>mem_bot)or(mem_max<mem_top) then bad:=10; 2437 if (min_quarterword>0)or(max_quarterword<127) then bad:=11; 2438 if (min_halfword>0)or(max_halfword<32767) then bad:=12; 2439 if (min_quarterword<min_halfword)or@| 2440 (max_quarterword>max_halfword) then bad:=13; 2441 if (mem_min<min_halfword)or(mem_max>=max_halfword)or@| 2442 (mem_bot-mem_min>max_halfword+1) then bad:=14; 2443 if (font_base<min_quarterword)or(font_max>max_quarterword) then bad:=15; 2444 if font_max>font_base+256 then bad:=16; 2445 if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17; 2446 if buf_size>max_halfword then bad:=18; 2447 if max_quarterword-min_quarterword<255 then bad:=19; 2448 2449 @ The operation of adding or subtracting |min_quarterword| occurs quite 2450 frequently in \TeX, so it is convenient to abbreviate this operation 2451 by using the macros |qi| and |qo| for input and output to and from 2452 quarterword format. 2453 2454 The inner loop of \TeX\ will run faster with respect to compilers 2455 that don't optimize expressions like `|x+0|' and `|x-0|', if these 2456 macros are simplified in the obvious way when |min_quarterword=0|. 2457 @^inner loop@>@^system dependencies@> 2458 2459 @d qi(#)==#+min_quarterword 2460 {to put an |eight_bits| item into a quarterword} 2461 @d qo(#)==#-min_quarterword 2462 {to take an |eight_bits| item out of a quarterword} 2463 @d hi(#)==#+min_halfword 2464 {to put a sixteen-bit item into a halfword} 2465 @d ho(#)==#-min_halfword 2466 {to take a sixteen-bit item from a halfword} 2467 2468 @ The reader should study the following definitions closely: 2469 @^system dependencies@> 2470 2471 @d sc==int {|scaled| data is equivalent to |integer|} 2472 2473 @<Types...@>= 2474 @!quarterword = min_quarterword..max_quarterword; {1/4 of a word} 2475 @!halfword=min_halfword..max_halfword; {1/2 of a word} 2476 @!two_choices = 1..2; {used when there are two variants in a record} 2477 @!four_choices = 1..4; {used when there are four variants in a record} 2478 @!two_halves = packed record@;@/ 2479 @!rh:halfword; 2480 case two_choices of 2481 1: (@!lh:halfword); 2482 2: (@!b0:quarterword; @!b1:quarterword); 2483 end; 2484 @!four_quarters = packed record@;@/ 2485 @!b0:quarterword; 2486 @!b1:quarterword; 2487 @!b2:quarterword; 2488 @!b3:quarterword; 2489 end; 2490 @!memory_word = record@;@/ 2491 case four_choices of 2492 1: (@!int:integer); 2493 2: (@!gr:glue_ratio); 2494 3: (@!hh:two_halves); 2495 4: (@!qqqq:four_quarters); 2496 end; 2497 @!word_file = file of memory_word; 2498 2499 @ When debugging, we may want to print a |memory_word| without knowing 2500 what type it is; so we print it in all modes. 2501 @^dirty \PASCAL@>@^debugging@> 2502 2503 @p @!debug procedure print_word(@!w:memory_word); 2504 {prints |w| in all ways} 2505 begin print_int(w.int); print_char(" ");@/ 2506 print_scaled(w.sc); print_char(" ");@/ 2507 print_scaled(round(unity*float(w.gr))); print_ln;@/ 2508 @^real multiplication@> 2509 print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":"); 2510 print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/ 2511 print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":"); 2512 print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3); 2513 end; 2514 gubed 2515 2516 @* \[9] Dynamic memory allocation. 2517 The \TeX\ system does nearly all of its own memory allocation, so that it 2518 can readily be transported into environments that do not have automatic 2519 facilities for strings, garbage collection, etc., and so that it can be in 2520 control of what error messages the user receives. The dynamic storage 2521 requirements of \TeX\ are handled by providing a large array |mem| in 2522 which consecutive blocks of words are used as nodes by the \TeX\ routines. 2523 2524 Pointer variables are indices into this array, or into another array 2525 called |eqtb| that will be explained later. A pointer variable might 2526 also be a special flag that lies outside the bounds of |mem|, so we 2527 allow pointers to assume any |halfword| value. The minimum halfword 2528 value represents a null pointer. \TeX\ does not assume that |mem[null]| exists. 2529 2530 @d pointer==halfword {a flag or a location in |mem| or |eqtb|} 2531 @d null==min_halfword {the null pointer} 2532 2533 @<Glob...@>= 2534 @!temp_ptr:pointer; {a pointer variable for occasional emergency use} 2535 2536 @ The |mem| array is divided into two regions that are allocated separately, 2537 but the dividing line between these two regions is not fixed; they grow 2538 together until finding their ``natural'' size in a particular job. 2539 Locations less than or equal to |lo_mem_max| are used for storing 2540 variable-length records consisting of two or more words each. This region 2541 is maintained using an algorithm similar to the one described in exercise 2542 2.5--19 of {\sl The Art of Computer Programming}. However, no size field 2543 appears in the allocated nodes; the program is responsible for knowing the 2544 relevant size when a node is freed. Locations greater than or equal to 2545 |hi_mem_min| are used for storing one-word records; a conventional 2546 \.{AVAIL} stack is used for allocation in this region. 2547 2548 Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part 2549 of preloaded format files, by the \.{INITEX} preprocessor. 2550 @.INITEX@> 2551 Production versions of \TeX\ may extend the memory at both ends in order to 2552 provide more space; locations between |mem_min| and |mem_bot| are always 2553 used for variable-size nodes, and locations between |mem_top| and |mem_max| 2554 are always used for single-word nodes. 2555 2556 The key pointers that govern |mem| allocation have a prescribed order: 2557 $$\advance\thickmuskip-2mu 2558 \hbox{|null<=mem_min<=mem_bot<lo_mem_max< 2559 hi_mem_min<mem_top<=mem_end<=mem_max|.}$$ 2560 2561 Empirical tests show that the present implementation of \TeX\ tends to 2562 spend about 9\pct! of its running time allocating nodes, and about 6\pct! 2563 deallocating them after their use. 2564 2565 @<Glob...@>= 2566 @!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area} 2567 @!lo_mem_max : pointer; {the largest location of variable-size memory in use} 2568 @!hi_mem_min : pointer; {the smallest location of one-word memory in use} 2569 2570 @ In order to study the memory requirements of particular applications, it 2571 is possible to prepare a version of \TeX\ that keeps track of current and 2572 maximum memory usage. When code between the delimiters |@!stat| $\ldots$ 2573 |tats| is not ``commented out,'' \TeX\ will run a bit slower but it will 2574 report these statistics when |tracing_stats| is sufficiently large. 2575 2576 @<Glob...@>= 2577 @!var_used, @!dyn_used : integer; {how much memory is in use} 2578 2579 @ Let's consider the one-word memory region first, since it's the 2580 simplest. The pointer variable |mem_end| holds the highest-numbered location 2581 of |mem| that has ever been used. The free locations of |mem| that 2582 occur between |hi_mem_min| and |mem_end|, inclusive, are of type 2583 |two_halves|, and we write |info(p)| and |link(p)| for the |lh| 2584 and |rh| fields of |mem[p]| when it is of this type. The single-word 2585 free locations form a linked list 2586 $$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$ 2587 terminated by |null|. 2588 2589 @d link(#) == mem[#].hh.rh {the |link| field of a memory word} 2590 @d info(#) == mem[#].hh.lh {the |info| field of a memory word} 2591 2592 @<Glob...@>= 2593 @!avail : pointer; {head of the list of available one-word nodes} 2594 @!mem_end : pointer; {the last one-word node used in |mem|} 2595 2596 @ If memory is exhausted, it might mean that the user has forgotten 2597 a right brace. We will define some procedures later that try to help 2598 pinpoint the trouble. 2599 2600 @p @<Declare the procedure called |show_token_list|@>@/ 2601 @<Declare the procedure called |runaway|@> 2602 2603 @ The function |get_avail| returns a pointer to a new one-word node whose 2604 |link| field is null. However, \TeX\ will halt if there is no more room left. 2605 @^inner loop@> 2606 2607 If the available-space list is empty, i.e., if |avail=null|, 2608 we try first to increase |mem_end|. If that cannot be done, i.e., if 2609 |mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be 2610 done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit. 2611 2612 @p function get_avail : pointer; {single-word node allocation} 2613 var p:pointer; {the new node being got} 2614 begin p:=avail; {get top location in the |avail| stack} 2615 if p<>null then avail:=link(avail) {and pop it off} 2616 else if mem_end<mem_max then {or go into virgin territory} 2617 begin incr(mem_end); p:=mem_end; 2618 end 2619 else begin decr(hi_mem_min); p:=hi_mem_min; 2620 if hi_mem_min<=lo_mem_max then 2621 begin runaway; {if memory is exhausted, display possible runaway text} 2622 overflow("main memory size",mem_max+1-mem_min); 2623 {quit; all one-word nodes are busy} 2624 @:TeX capacity exceeded main memory size}{\quad main memory size@> 2625 end; 2626 end; 2627 link(p):=null; {provide an oft-desired initialization of the new node} 2628 @!stat incr(dyn_used);@+tats@;{maintain statistics} 2629 get_avail:=p; 2630 end; 2631 2632 @ Conversely, a one-word node is recycled by calling |free_avail|. 2633 This routine is part of \TeX's ``inner loop,'' so we want it to be fast. 2634 @^inner loop@> 2635 2636 @d free_avail(#)== {single-word node liberation} 2637 begin link(#):=avail; avail:=#; 2638 @!stat decr(dyn_used);@+tats@/ 2639 end 2640 2641 @ There's also a |fast_get_avail| routine, which saves the procedure-call 2642 overhead at the expense of extra programming. This routine is used in 2643 the places that would otherwise account for the most calls of |get_avail|. 2644 @^inner loop@> 2645 2646 @d fast_get_avail(#)==@t@>@;@/ 2647 begin #:=avail; {avoid |get_avail| if possible, to save time} 2648 if #=null then #:=get_avail 2649 else begin avail:=link(#); link(#):=null; 2650 @!stat incr(dyn_used);@+tats@/ 2651 end; 2652 end 2653 2654 @ The procedure |flush_list(p)| frees an entire linked list of 2655 one-word nodes that starts at position |p|. 2656 @^inner loop@> 2657 2658 @p procedure flush_list(@!p:pointer); {makes list of single-word nodes 2659 available} 2660 var @!q,@!r:pointer; {list traversers} 2661 begin if p<>null then 2662 begin r:=p; 2663 repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/ 2664 until r=null; {now |q| is the last node on the list} 2665 link(q):=avail; avail:=p; 2666 end; 2667 end; 2668 2669 @ The available-space list that keeps track of the variable-size portion 2670 of |mem| is a nonempty, doubly-linked circular list of empty nodes, 2671 pointed to by the roving pointer |rover|. 2672 2673 Each empty node has size 2 or more; the first word contains the special 2674 value |max_halfword| in its |link| field and the size in its |info| field; 2675 the second word contains the two pointers for double linking. 2676 2677 Each nonempty node also has size 2 or more. Its first word is of type 2678 |two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|. 2679 Otherwise there is complete flexibility with respect to the contents 2680 of its other fields and its other words. 2681 2682 (We require |mem_max<max_halfword| because terrible things can happen 2683 when |max_halfword| appears in the |link| field of a nonempty node.) 2684 2685 @d empty_flag == max_halfword {the |link| of an empty variable-size node} 2686 @d is_empty(#) == (link(#)=empty_flag) {tests for empty node} 2687 @d node_size == info {the size field in empty variable-size nodes} 2688 @d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes} 2689 @d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes} 2690 2691 @<Glob...@>= 2692 @!rover : pointer; {points to some node in the list of empties} 2693 2694 @ A call to |get_node| with argument |s| returns a pointer to a new node 2695 of size~|s|, which must be 2~or more. The |link| field of the first word 2696 of this new node is set to null. An overflow stop occurs if no suitable 2697 space exists. 2698 2699 If |get_node| is called with $s=2^{30}$, it simply merges adjacent free 2700 areas and returns the value |max_halfword|. 2701 2702 @p function get_node(@!s:integer):pointer; {variable-size node allocation} 2703 label found,exit,restart; 2704 var p:pointer; {the node currently under inspection} 2705 @!q:pointer; {the node physically after node |p|} 2706 @!r:integer; {the newly allocated node, or a candidate for this honor} 2707 @!t:integer; {temporary register} 2708 begin restart: p:=rover; {start at some free node in the ring} 2709 repeat @<Try to allocate within node |p| and its physical successors, 2710 and |goto found| if allocation was possible@>; 2711 @^inner loop@> 2712 p:=rlink(p); {move to the next node in the ring} 2713 until p=rover; {repeat until the whole list has been traversed} 2714 if s=@'10000000000 then 2715 begin get_node:=max_halfword; return; 2716 end; 2717 if lo_mem_max+2<hi_mem_min then if lo_mem_max+2<=mem_bot+max_halfword then 2718 @<Grow more variable-size memory and |goto restart|@>; 2719 overflow("main memory size",mem_max+1-mem_min); 2720 {sorry, nothing satisfactory is left} 2721 @:TeX capacity exceeded main memory size}{\quad main memory size@> 2722 found: link(r):=null; {this node is now nonempty} 2723 @!stat var_used:=var_used+s; {maintain usage statistics} 2724 tats@;@/ 2725 get_node:=r; 2726 exit:end; 2727 2728 @ The lower part of |mem| grows by 1000 words at a time, unless 2729 we are very close to going under. When it grows, we simply link 2730 a new node into the available-space list. This method of controlled 2731 growth helps to keep the |mem| usage consecutive when \TeX\ is 2732 implemented on ``virtual memory'' systems. 2733 @^virtual memory@> 2734 2735 @<Grow more variable-size memory and |goto restart|@>= 2736 begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000 2737 else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2; 2738 {|lo_mem_max+2<=t<hi_mem_min|} 2739 p:=llink(rover); q:=lo_mem_max; rlink(p):=q; llink(rover):=q;@/ 2740 if t>mem_bot+max_halfword then t:=mem_bot+max_halfword; 2741 rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/ 2742 lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null; 2743 rover:=q; goto restart; 2744 end 2745 2746 @ Empirical tests show that the routine in this section performs a 2747 node-merging operation about 0.75 times per allocation, on the average, 2748 after which it finds that |r>p+1| about 95\pct! of the time. 2749 2750 @<Try to allocate...@>= 2751 q:=p+node_size(p); {find the physical successor} 2752 @^inner loop@> 2753 while is_empty(q) do {merge node |p| with node |q|} 2754 begin t:=rlink(q); 2755 if q=rover then rover:=t; 2756 llink(t):=llink(q); rlink(llink(q)):=t;@/ 2757 q:=q+node_size(q); 2758 end; 2759 r:=q-s; 2760 if r>p+1 then @<Allocate from the top of node |p| and |goto found|@>; 2761 if r=p then if rlink(p)<>p then 2762 @<Allocate entire node |p| and |goto found|@>; 2763 node_size(p):=q-p {reset the size in case it grew} 2764 2765 @ @<Allocate from the top...@>= 2766 begin node_size(p):=r-p; {store the remaining size} 2767 @^inner loop@> 2768 rover:=p; {start searching here next time} 2769 goto found; 2770 end 2771 2772 @ Here we delete node |p| from the ring, and let |rover| rove around. 2773 2774 @<Allocate entire...@>= 2775 begin rover:=rlink(p); t:=llink(p); 2776 llink(rover):=t; rlink(t):=rover; 2777 goto found; 2778 end 2779 2780 @ Conversely, when some variable-size node |p| of size |s| is no longer needed, 2781 the operation |free_node(p,s)| will make its words available, by inserting 2782 |p| as a new empty node just before where |rover| now points. 2783 @^inner loop@> 2784 2785 @p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node 2786 liberation} 2787 var q:pointer; {|llink(rover)|} 2788 begin node_size(p):=s; link(p):=empty_flag; 2789 q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links} 2790 llink(rover):=p; rlink(q):=p; {insert |p| into the ring} 2791 @!stat var_used:=var_used-s;@+tats@;{maintain statistics} 2792 end; 2793 2794 @ Just before \.{INITEX} writes out the memory, it sorts the doubly linked 2795 available space list. The list is probably very short at such times, so a 2796 simple insertion sort is used. The smallest available location will be 2797 pointed to by |rover|, the next-smallest by |rlink(rover)|, etc. 2798 2799 @p @!init procedure sort_avail; {sorts the available variable-size nodes 2800 by location} 2801 var p,@!q,@!r: pointer; {indices into |mem|} 2802 @!old_rover:pointer; {initial |rover| setting} 2803 begin p:=get_node(@'10000000000); {merge adjacent free areas} 2804 p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover; 2805 while p<>old_rover do @<Sort \(p)|p| into the list starting at |rover| 2806 and advance |p| to |rlink(p)|@>; 2807 p:=rover; 2808 while rlink(p)<>max_halfword do 2809 begin llink(rlink(p)):=p; p:=rlink(p); 2810 end; 2811 rlink(p):=rover; llink(rover):=p; 2812 end; 2813 tini 2814 2815 @ The following |while| loop is guaranteed to 2816 terminate, since the list that starts at 2817 |rover| ends with |max_halfword| during the sorting procedure. 2818 2819 @<Sort \(p)|p|...@>= 2820 if p<rover then 2821 begin q:=p; p:=rlink(q); rlink(q):=rover; rover:=q; 2822 end 2823 else begin q:=rover; 2824 while rlink(q)<p do q:=rlink(q); 2825 r:=rlink(p); rlink(p):=rlink(q); rlink(q):=p; p:=r; 2826 end 2827 2828 @* \[10] Data structures for boxes and their friends. 2829 From the computer's standpoint, \TeX's chief mission is to create 2830 horizontal and vertical lists. We shall now investigate how the elements 2831 of these lists are represented internally as nodes in the dynamic memory. 2832 2833 A horizontal or vertical list is linked together by |link| fields in 2834 the first word of each node. Individual nodes represent boxes, glue, 2835 penalties, or special things like discretionary hyphens; because of this 2836 variety, some nodes are longer than others, and we must distinguish different 2837 kinds of nodes. We do this by putting a `|type|' field in the first word, 2838 together with the link and an optional `|subtype|'. 2839 2840 @d type(#) == mem[#].hh.b0 {identifies what kind of node this is} 2841 @d subtype(#) == mem[#].hh.b1 {secondary identification in some cases} 2842 2843 @ A |@!char_node|, which represents a single character, is the most important 2844 kind of node because it accounts for the vast majority of all boxes. 2845 Special precautions are therefore taken to ensure that a |char_node| does 2846 not take up much memory space. Every such node is one word long, and in fact 2847 it is identifiable by this property, since other kinds of nodes have at least 2848 two words, and they appear in |mem| locations less than |hi_mem_min|. 2849 This makes it possible to omit the |type| field in a |char_node|, leaving 2850 us room for two bytes that identify a |font| and a |character| within 2851 that font. 2852 2853 Note that the format of a |char_node| allows for up to 256 different 2854 fonts and up to 256 characters per font; but most implementations will 2855 probably limit the total number of fonts to fewer than 75 per job, 2856 and most fonts will stick to characters whose codes are 2857 less than 128 (since higher codes 2858 are more difficult to access on most keyboards). 2859 2860 Extensions of \TeX\ intended for oriental languages will need even more 2861 than $256\times256$ possible characters, when we consider different sizes 2862 @^oriental characters@>@^Chinese characters@>@^Japanese characters@> 2863 and styles of type. It is suggested that Chinese and Japanese fonts be 2864 handled by representing such characters in two consecutive |char_node| 2865 entries: The first of these has |font=font_base|, and its |link| points 2866 to the second; 2867 the second identifies the font and the character dimensions. 2868 The saving feature about oriental characters is that most of them have 2869 the same box dimensions. The |character| field of the first |char_node| 2870 is a ``\\{charext}'' that distinguishes between graphic symbols whose 2871 dimensions are identical for typesetting purposes. (See the \MF\ manual.) 2872 Such an extension of \TeX\ would not be difficult; further details are 2873 left to the reader. 2874 2875 In order to make sure that the |character| code fits in a quarterword, 2876 \TeX\ adds the quantity |min_quarterword| to the actual code. 2877 2878 Character nodes appear only in horizontal lists, never in vertical lists. 2879 2880 @d is_char_node(#) == (#>=hi_mem_min) 2881 {does the argument point to a |char_node|?} 2882 @d font == type {the font code in a |char_node|} 2883 @d character == subtype {the character code in a |char_node|} 2884 2885 @ An |hlist_node| stands for a box that was made from a horizontal list. 2886 Each |hlist_node| is seven words long, and contains the following fields 2887 (in addition to the mandatory |type| and |link|, which we shall not 2888 mention explicitly when discussing the other node types): The |height| and 2889 |width| and |depth| are scaled integers denoting the dimensions of the 2890 box. There is also a |shift_amount| field, a scaled integer indicating 2891 how much this box should be lowered (if it appears in a horizontal list), 2892 or how much it should be moved to the right (if it appears in a vertical 2893 list). There is a |list_ptr| field, which points to the beginning of the 2894 list from which this box was fabricated; if |list_ptr| is |null|, the box 2895 is empty. Finally, there are three fields that represent the setting of 2896 the glue: |glue_set(p)| is a word of type |glue_ratio| that represents 2897 the proportionality constant for glue setting; |glue_sign(p)| is 2898 |stretching| or |shrinking| or |normal| depending on whether or not the 2899 glue should stretch or shrink or remain rigid; and |glue_order(p)| 2900 specifies the order of infinity to which glue setting applies (|normal|, 2901 |fil|, |fill|, or |filll|). The |subtype| field is not used. 2902 2903 @d hlist_node=0 {|type| of hlist nodes} 2904 @d box_node_size=7 {number of words to allocate for a box node} 2905 @d width_offset=1 {position of |width| field in a box node} 2906 @d depth_offset=2 {position of |depth| field in a box node} 2907 @d height_offset=3 {position of |height| field in a box node} 2908 @d width(#) == mem[#+width_offset].sc {width of the box, in sp} 2909 @d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp} 2910 @d height(#) == mem[#+height_offset].sc {height of the box, in sp} 2911 @d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp} 2912 @d list_offset=5 {position of |list_ptr| field in a box node} 2913 @d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box} 2914 @d glue_order(#) == subtype(#+list_offset) {applicable order of infinity} 2915 @d glue_sign(#) == type(#+list_offset) {stretching or shrinking} 2916 @d normal=0 {the most common case when several cases are named} 2917 @d stretching = 1 {glue setting applies to the stretch components} 2918 @d shrinking = 2 {glue setting applies to the shrink components} 2919 @d glue_offset = 6 {position of |glue_set| in a box node} 2920 @d glue_set(#) == mem[#+glue_offset].gr 2921 {a word of type |glue_ratio| for glue setting} 2922 2923 @ The |new_null_box| function returns a pointer to an |hlist_node| in 2924 which all subfields have the values corresponding to `\.{\\hbox\{\}}'. 2925 (The |subtype| field is set to |min_quarterword|, for historic reasons 2926 that are no longer relevant.) 2927 2928 @p function new_null_box:pointer; {creates a new box node} 2929 var p:pointer; {the new node} 2930 begin p:=get_node(box_node_size); type(p):=hlist_node; 2931 subtype(p):=min_quarterword; 2932 width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null; 2933 glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p)); 2934 new_null_box:=p; 2935 end; 2936 2937 @ A |vlist_node| is like an |hlist_node| in all respects except that it 2938 contains a vertical list. 2939 2940 @d vlist_node=1 {|type| of vlist nodes} 2941 2942 @ A |rule_node| stands for a solid black rectangle; it has |width|, 2943 |depth|, and |height| fields just as in an |hlist_node|. However, if 2944 any of these dimensions is $-2^{30}$, the actual value will be determined 2945 by running the rule up to the boundary of the innermost enclosing box. 2946 This is called a ``running dimension.'' The |width| is never running in 2947 an hlist; the |height| and |depth| are never running in a~vlist. 2948 2949 @d rule_node=2 {|type| of rule nodes} 2950 @d rule_node_size=4 {number of words to allocate for a rule node} 2951 @d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item} 2952 @d is_running(#) == (#=null_flag) {tests for a running dimension} 2953 2954 @ A new rule node is delivered by the |new_rule| function. It 2955 makes all the dimensions ``running,'' so you have to change the 2956 ones that are not allowed to run. 2957 2958 @p function new_rule:pointer; 2959 var p:pointer; {the new node} 2960 begin p:=get_node(rule_node_size); type(p):=rule_node; 2961 subtype(p):=0; {the |subtype| is not used} 2962 width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag; 2963 new_rule:=p; 2964 end; 2965 2966 @ Insertions are represented by |ins_node| records, where the |subtype| 2967 indicates the corresponding box number. For example, `\.{\\insert 250}' 2968 leads to an |ins_node| whose |subtype| is |250+min_quarterword|. 2969 The |height| field of an |ins_node| is slightly misnamed; it actually holds 2970 the natural height plus depth of the vertical list being inserted. 2971 The |depth| field holds the |split_max_depth| to be used in case this 2972 insertion is split, and the |split_top_ptr| points to the corresponding 2973 |split_top_skip|. The |float_cost| field holds the |floating_penalty| that 2974 will be used if this insertion floats to a subsequent page after a 2975 split insertion of the same class. There is one more field, the 2976 |ins_ptr|, which points to the beginning of the vlist for the insertion. 2977 2978 @d ins_node=3 {|type| of insertion nodes} 2979 @d ins_node_size=5 {number of words to allocate for an insertion} 2980 @d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used} 2981 @d ins_ptr(#)==info(#+4) {the vertical list to be inserted} 2982 @d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used} 2983 2984 @ A |mark_node| has a |mark_ptr| field that points to the reference count 2985 of a token list that contains the user's \.{\\mark} text. 2986 This field occupies a full word instead of a halfword, because 2987 there's nothing to put in the other halfword; it is easier in \PASCAL\ to 2988 use the full word than to risk leaving garbage in the unused half. 2989 2990 @d mark_node=4 {|type| of a mark node} 2991 @d small_node_size=2 {number of words to allocate for most node types} 2992 @d mark_ptr(#)==mem[#+1].int {head of the token list for a mark} 2993 2994 @ An |adjust_node|, which occurs only in horizontal lists, 2995 specifies material that will be moved out into the surrounding 2996 vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}' 2997 operation. The |adjust_ptr| field points to the vlist containing this 2998 material. 2999 3000 @d adjust_node=5 {|type| of an adjust node} 3001 @d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list} 3002 3003 @ A |ligature_node|, which occurs only in horizontal lists, specifies 3004 a character that was fabricated from the interaction of two or more 3005 actual characters. The second word of the node, which is called the 3006 |lig_char| word, contains |font| and |character| fields just as in a 3007 |char_node|. The characters that generated the ligature have not been 3008 forgotten, since they are needed for diagnostic messages and for 3009 hyphenation; the |lig_ptr| field points to a linked list of character 3010 nodes for all original characters that have been deleted. (This list 3011 might be empty if the characters that generated the ligature were 3012 retained in other nodes.) 3013 3014 The |subtype| field is 0, plus 2 and/or 1 if the original source of the 3015 ligature included implicit left and/or right boundaries. 3016 3017 @d ligature_node=6 {|type| of a ligature node} 3018 @d lig_char(#)==#+1 {the word where the ligature is to be found} 3019 @d lig_ptr(#)==link(lig_char(#)) {the list of characters} 3020 3021 @ The |new_ligature| function creates a ligature node having given 3022 contents of the |font|, |character|, and |lig_ptr| fields. We also have 3023 a |new_lig_item| function, which returns a two-word node having a given 3024 |character| field. Such nodes are used for temporary processing as ligatures 3025 are being created. 3026 3027 @p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer; 3028 var p:pointer; {the new node} 3029 begin p:=get_node(small_node_size); type(p):=ligature_node; 3030 font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q; 3031 subtype(p):=0; new_ligature:=p; 3032 end; 3033 @# 3034 function new_lig_item(@!c:quarterword):pointer; 3035 var p:pointer; {the new node} 3036 begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null; 3037 new_lig_item:=p; 3038 end; 3039 3040 @ A |disc_node|, which occurs only in horizontal lists, specifies a 3041 ``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text 3042 that starts at |pre_break(p)| will precede the break, the text that starts at 3043 |post_break(p)| will follow the break, and text that appears in the next 3044 |replace_count(p)| nodes will be ignored. For example, an ordinary 3045 discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with 3046 |pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|, 3047 and |replace_count=0|. All three of the discretionary texts must be 3048 lists that consist entirely of character, kern, box, rule, and ligature nodes. 3049 3050 If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this 3051 break. Otherwise the |hyphen_penalty| will be charged. The texts will 3052 actually be substituted into the list by the line-breaking algorithm if it 3053 decides to make the break, and the discretionary node will disappear at 3054 that time; thus, the output routine sees only discretionaries that were 3055 not chosen. 3056 3057 @d disc_node=7 {|type| of a discretionary node} 3058 @d replace_count==subtype {how many subsequent nodes to replace} 3059 @d pre_break==llink {text that precedes a discretionary break} 3060 @d post_break==rlink {text that follows a discretionary break} 3061 3062 @p function new_disc:pointer; {creates an empty |disc_node|} 3063 var p:pointer; {the new node} 3064 begin p:=get_node(small_node_size); type(p):=disc_node; 3065 replace_count(p):=0; pre_break(p):=null; post_break(p):=null; 3066 new_disc:=p; 3067 end; 3068 3069 @ A |whatsit_node| is a wild card reserved for extensions to \TeX. The 3070 |subtype| field in its first word says what `\\{whatsit}' it is, and 3071 implicitly determines the node size (which must be 2 or more) and the 3072 format of the remaining words. When a |whatsit_node| is encountered 3073 in a list, special actions are invoked; knowledgeable people who are 3074 careful not to mess up the rest of \TeX\ are able to make \TeX\ do new 3075 things by adding code at the end of the program. For example, there 3076 might be a `\TeX nicolor' extension to specify different colors of ink, 3077 @^extensions to \TeX@> 3078 and the whatsit node might contain the desired parameters. 3079 3080 The present implementation of \TeX\ treats the features associated with 3081 `\.{\\write}' and `\.{\\special}' as if they were extensions, in order to 3082 illustrate how such routines might be coded. We shall defer further 3083 discussion of extensions until the end of this program. 3084 3085 @d whatsit_node=8 {|type| of special extension nodes} 3086 3087 @ A |math_node|, which occurs only in horizontal lists, appears before and 3088 after mathematical formulas. The |subtype| field is |before| before the 3089 formula and |after| after it. There is a |width| field, which represents 3090 the amount of surrounding space inserted by \.{\\mathsurround}. 3091 3092 @d math_node=9 {|type| of a math node} 3093 @d before=0 {|subtype| for math node that introduces a formula} 3094 @d after=1 {|subtype| for math node that winds up a formula} 3095 3096 @p function new_math(@!w:scaled;@!s:small_number):pointer; 3097 var p:pointer; {the new node} 3098 begin p:=get_node(small_node_size); type(p):=math_node; 3099 subtype(p):=s; width(p):=w; new_math:=p; 3100 end; 3101 3102 @ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|, 3103 |rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|, 3104 |disc_node|, |whatsit_node|, and |math_node| are at the low end of the 3105 type codes, by permitting a break at glue in a list if and only if the 3106 |type| of the previous node is less than |math_node|. Furthermore, a 3107 node is discarded after a break if its type is |math_node| or~more. 3108 3109 @d precedes_break(#)==(type(#)<math_node) 3110 @d non_discardable(#)==(type(#)<math_node) 3111 3112 @ A |glue_node| represents glue in a list. However, it is really only 3113 a pointer to a separate glue specification, since \TeX\ makes use of the 3114 fact that many essentially identical nodes of glue are usually present. 3115 If |p| points to a |glue_node|, |glue_ptr(p)| points to 3116 another packet of words that specify the stretch and shrink components, etc. 3117 3118 Glue nodes also serve to represent leaders; the |subtype| is used to 3119 distinguish between ordinary glue (which is called |normal|) and the three 3120 kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|). 3121 The |leader_ptr| field points to a rule node or to a box node containing the 3122 leaders; it is set to |null| in ordinary glue nodes. 3123 3124 Many kinds of glue are computed from \TeX's ``skip'' parameters, and 3125 it is helpful to know which parameter has led to a particular glue node. 3126 Therefore the |subtype| is set to indicate the source of glue, whenever 3127 it originated as a parameter. We will be defining symbolic names for the 3128 parameter numbers later (e.g., |line_skip_code=0|, |baseline_skip_code=1|, 3129 etc.); it suffices for now to say that the |subtype| of parametric glue 3130 will be the same as the parameter number, plus~one. 3131 3132 In math formulas there are two more possibilities for the |subtype| in a 3133 glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu} 3134 instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}' 3135 feature that cancels the glue node immediately following if it appears 3136 in a subscript. 3137 3138 @d glue_node=10 {|type| of node that points to a glue specification} 3139 @d cond_math_glue=98 {special |subtype| to suppress glue in the next node} 3140 @d mu_glue=99 {|subtype| for math glue} 3141 @d a_leaders=100 {|subtype| for aligned leaders} 3142 @d c_leaders=101 {|subtype| for centered leaders} 3143 @d x_leaders=102 {|subtype| for expanded leaders} 3144 @d glue_ptr==llink {pointer to a glue specification} 3145 @d leader_ptr==rlink {pointer to box or rule node for leaders} 3146 3147 @ A glue specification has a halfword reference count in its first word, 3148 @^reference counts@> 3149 representing |null| plus the number of glue nodes that point to it (less one). 3150 Note that the reference count appears in the same position as 3151 the |link| field in list nodes; this is the field that is initialized 3152 to |null| when a node is allocated, and it is also the field that is flagged 3153 by |empty_flag| in empty nodes. 3154 3155 Glue specifications also contain three |scaled| fields, for the |width|, 3156 |stretch|, and |shrink| dimensions. Finally, there are two one-byte 3157 fields called |stretch_order| and |shrink_order|; these contain the 3158 orders of infinity (|normal|, |fil|, |fill|, or |filll|) 3159 corresponding to the stretch and shrink values. 3160 3161 @d glue_spec_size=4 {number of words to allocate for a glue specification} 3162 @d glue_ref_count(#) == link(#) {reference count of a glue specification} 3163 @d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue} 3164 @d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue} 3165 @d stretch_order == type {order of infinity for stretching} 3166 @d shrink_order == subtype {order of infinity for shrinking} 3167 @d fil=1 {first-order infinity} 3168 @d fill=2 {second-order infinity} 3169 @d filll=3 {third-order infinity} 3170 3171 @<Types...@>= 3172 @!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power} 3173 3174 @ Here is a function that returns a pointer to a copy of a glue spec. 3175 The reference count in the copy is |null|, because there is assumed 3176 to be exactly one reference to the new specification. 3177 3178 @p function new_spec(@!p:pointer):pointer; {duplicates a glue specification} 3179 var q:pointer; {the new spec} 3180 begin q:=get_node(glue_spec_size);@/ 3181 mem[q]:=mem[p]; glue_ref_count(q):=null;@/ 3182 width(q):=width(p); stretch(q):=stretch(p); shrink(q):=shrink(p); 3183 new_spec:=q; 3184 end; 3185 3186 @ And here's a function that creates a glue node for a given parameter 3187 identified by its code number; for example, 3188 |new_param_glue(line_skip_code)| returns a pointer to a glue node for the 3189 current \.{\\lineskip}. 3190 3191 @p function new_param_glue(@!n:small_number):pointer; 3192 var p:pointer; {the new node} 3193 @!q:pointer; {the glue specification} 3194 begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=n+1; 3195 leader_ptr(p):=null;@/ 3196 q:=@<Current |mem| equivalent of glue parameter number |n|@>@t@>; 3197 glue_ptr(p):=q; incr(glue_ref_count(q)); 3198 new_param_glue:=p; 3199 end; 3200 3201 @ Glue nodes that are more or less anonymous are created by |new_glue|, 3202 whose argument points to a glue specification. 3203 3204 @p function new_glue(@!q:pointer):pointer; 3205 var p:pointer; {the new node} 3206 begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=normal; 3207 leader_ptr(p):=null; glue_ptr(p):=q; incr(glue_ref_count(q)); 3208 new_glue:=p; 3209 end; 3210 3211 @ Still another subroutine is needed: This one is sort of a combination 3212 of |new_param_glue| and |new_glue|. It creates a glue node for one of 3213 the current glue parameters, but it makes a fresh copy of the glue 3214 specification, since that specification will probably be subject to change, 3215 while the parameter will stay put. The global variable |temp_ptr| is 3216 set to the address of the new spec. 3217 3218 @p function new_skip_param(@!n:small_number):pointer; 3219 var p:pointer; {the new node} 3220 begin temp_ptr:=new_spec(@<Current |mem| equivalent of glue parameter...@>); 3221 p:=new_glue(temp_ptr); glue_ref_count(temp_ptr):=null; subtype(p):=n+1; 3222 new_skip_param:=p; 3223 end; 3224 3225 @ A |kern_node| has a |width| field to specify a (normally negative) 3226 amount of spacing. This spacing correction appears in horizontal lists 3227 between letters like A and V when the font designer said that it looks 3228 better to move them closer together or further apart. A kern node can 3229 also appear in a vertical list, when its `|width|' denotes additional 3230 spacing in the vertical direction. The |subtype| is either |normal| (for 3231 kerns inserted from font information or math mode calculations) or |explicit| 3232 (for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern| 3233 (for kerns inserted from non-math accents) or |mu_glue| (for kerns 3234 inserted from \.{\\mkern} specifications in math formulas). 3235 3236 @d kern_node=11 {|type| of a kern node} 3237 @d explicit=1 {|subtype| of kern nodes from \.{\\kern} and \.{\\/}} 3238 @d acc_kern=2 {|subtype| of kern nodes from accents} 3239 3240 @ The |new_kern| function creates a kern node having a given width. 3241 3242 @p function new_kern(@!w:scaled):pointer; 3243 var p:pointer; {the new node} 3244 begin p:=get_node(small_node_size); type(p):=kern_node; 3245 subtype(p):=normal; 3246 width(p):=w; 3247 new_kern:=p; 3248 end; 3249 3250 @ A |penalty_node| specifies the penalty associated with line or page 3251 breaking, in its |penalty| field. This field is a fullword integer, but 3252 the full range of integer values is not used: Any penalty |>=10000| is 3253 treated as infinity, and no break will be allowed for such high values. 3254 Similarly, any penalty |<=-10000| is treated as negative infinity, and a 3255 break will be forced. 3256 3257 @d penalty_node=12 {|type| of a penalty node} 3258 @d inf_penalty=inf_bad {``infinite'' penalty value} 3259 @d eject_penalty=-inf_penalty {``negatively infinite'' penalty value} 3260 @d penalty(#) == mem[#+1].int {the added cost of breaking a list here} 3261 3262 @ Anyone who has been reading the last few sections of the program will 3263 be able to guess what comes next. 3264 3265 @p function new_penalty(@!m:integer):pointer; 3266 var p:pointer; {the new node} 3267 begin p:=get_node(small_node_size); type(p):=penalty_node; 3268 subtype(p):=0; {the |subtype| is not used} 3269 penalty(p):=m; new_penalty:=p; 3270 end; 3271 3272 @ You might think that we have introduced enough node types by now. Well, 3273 almost, but there is one more: An |unset_node| has nearly the same format 3274 as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign} 3275 or \.{\\valign} that are not yet in their final form, since the box 3276 dimensions are their ``natural'' sizes before any glue adjustment has been 3277 made. The |glue_set| word is not present; instead, we have a |glue_stretch| 3278 field, which contains the total stretch of order |glue_order| that is 3279 present in the hlist or vlist being boxed. 3280 Similarly, the |shift_amount| field is replaced by a |glue_shrink| field, 3281 containing the total shrink of order |glue_sign| that is present. 3282 The |subtype| field is called |span_count|; an unset box typically 3283 contains the data for |qo(span_count)+1| columns. 3284 Unset nodes will be changed to box nodes when alignment is completed. 3285 3286 @d unset_node=13 {|type| for an unset node} 3287 @d glue_stretch(#)==mem[#+glue_offset].sc {total stretch in an unset node} 3288 @d glue_shrink==shift_amount {total shrink in an unset node} 3289 @d span_count==subtype {indicates the number of spanned columns} 3290 3291 @ In fact, there are still more types coming. When we get to math formula 3292 processing we will see that a |style_node| has |type=14|; and a number 3293 of larger type codes will also be defined, for use in math mode only. 3294 3295 @ Warning: If any changes are made to these data structure layouts, such as 3296 changing any of the node sizes or even reordering the words of nodes, 3297 the |copy_node_list| procedure and the memory initialization code 3298 below may have to be changed. Such potentially dangerous parts of the 3299 program are listed in the index under `data structure assumptions'. 3300 @!@^data structure assumptions@> 3301 However, other references to the nodes are made symbolically in terms of 3302 the \.{WEB} macro definitions above, so that format changes will leave 3303 \TeX's other algorithms intact. 3304 @^system dependencies@> 3305 3306 @* \[11] Memory layout. 3307 Some areas of |mem| are dedicated to fixed usage, since static allocation is 3308 more efficient than dynamic allocation when we can get away with it. For 3309 example, locations |mem_bot| to |mem_bot+3| are always used to store the 3310 specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The 3311 following macro definitions accomplish the static allocation by giving 3312 symbolic names to the fixed positions. Static variable-size nodes appear 3313 in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes 3314 appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is 3315 harmless to let |lig_trick| and |garbage| share the same location of |mem|. 3316 3317 @d zero_glue==mem_bot {specification for \.{0pt plus 0pt minus 0pt}} 3318 @d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}} 3319 @d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}} 3320 @d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}} 3321 @d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}} 3322 @d lo_mem_stat_max==fil_neg_glue+glue_spec_size-1 {largest statically 3323 allocated word in the variable-size |mem|} 3324 @# 3325 @d page_ins_head==mem_top {list of insertion data for current page} 3326 @d contrib_head==mem_top-1 {vlist of items not yet on current page} 3327 @d page_head==mem_top-2 {vlist for current page} 3328 @d temp_head==mem_top-3 {head of a temporary list of some kind} 3329 @d hold_head==mem_top-4 {head of a temporary list of another kind} 3330 @d adjust_head==mem_top-5 {head of adjustment list returned by |hpack|} 3331 @d active==mem_top-7 {head of active list in |line_break|, needs two words} 3332 @d align_head==mem_top-8 {head of preamble list for alignments} 3333 @d end_span==mem_top-9 {tail of spanned-width lists} 3334 @d omit_template==mem_top-10 {a constant token list} 3335 @d null_list==mem_top-11 {permanently empty list} 3336 @d lig_trick==mem_top-12 {a ligature masquerading as a |char_node|} 3337 @d garbage==mem_top-12 {used for scrap information} 3338 @d backup_head==mem_top-13 {head of token list built by |scan_keyword|} 3339 @d hi_mem_stat_min==mem_top-13 {smallest statically allocated word in 3340 the one-word |mem|} 3341 @d hi_mem_stat_usage=14 {the number of one-word nodes always present} 3342 3343 @ The following code gets |mem| off to a good start, when \TeX\ is 3344 initializing itself the slow~way. 3345 3346 @<Local variables for init...@>= 3347 @!k:integer; {index into |mem|, |eqtb|, etc.} 3348 3349 @ @<Initialize table entries...@>= 3350 for k:=mem_bot+1 to lo_mem_stat_max do mem[k].sc:=0; 3351 {all glue dimensions are zeroed} 3352 @^data structure assumptions@> 3353 k:=mem_bot;@+while k<=lo_mem_stat_max do 3354 {set first words of glue specifications} 3355 begin glue_ref_count(k):=null+1; 3356 stretch_order(k):=normal; shrink_order(k):=normal; 3357 k:=k+glue_spec_size; 3358 end; 3359 stretch(fil_glue):=unity; stretch_order(fil_glue):=fil;@/ 3360 stretch(fill_glue):=unity; stretch_order(fill_glue):=fill;@/ 3361 stretch(ss_glue):=unity; stretch_order(ss_glue):=fil;@/ 3362 shrink(ss_glue):=unity; shrink_order(ss_glue):=fil;@/ 3363 stretch(fil_neg_glue):=-unity; stretch_order(fil_neg_glue):=fil;@/ 3364 rover:=lo_mem_stat_max+1; 3365 link(rover):=empty_flag; {now initialize the dynamic memory} 3366 node_size(rover):=1000; {which is a 1000-word available node} 3367 llink(rover):=rover; rlink(rover):=rover;@/ 3368 lo_mem_max:=rover+1000; link(lo_mem_max):=null; info(lo_mem_max):=null;@/ 3369 for k:=hi_mem_stat_min to mem_top do 3370 mem[k]:=mem[lo_mem_max]; {clear list heads} 3371 @<Initialize the special list heads and constant nodes@>; 3372 avail:=null; mem_end:=mem_top; 3373 hi_mem_min:=hi_mem_stat_min; {initialize the one-word memory} 3374 var_used:=lo_mem_stat_max+1-mem_bot; dyn_used:=hi_mem_stat_usage; 3375 {initialize statistics} 3376 3377 @ If \TeX\ is extended improperly, the |mem| array might get screwed up. 3378 For example, some pointers might be wrong, or some ``dead'' nodes might not 3379 have been freed when the last reference to them disappeared. Procedures 3380 |check_mem| and |search_mem| are available to help diagnose such 3381 problems. These procedures make use of two arrays called |free| and 3382 |was_free| that are present only if \TeX's debugging routines have 3383 been included. (You may want to decrease the size of |mem| while you 3384 @^debugging@> 3385 are debugging.) 3386 3387 @<Glob...@>= 3388 @!debug @!free: packed array [mem_min..mem_max] of boolean; {free cells} 3389 @t\hskip10pt@>@!was_free: packed array [mem_min..mem_max] of boolean; 3390 {previously free cells} 3391 @t\hskip10pt@>@!was_mem_end,@!was_lo_max,@!was_hi_min: pointer; 3392 {previous |mem_end|, |lo_mem_max|, and |hi_mem_min|} 3393 @t\hskip10pt@>@!panicking:boolean; {do we want to check memory constantly?} 3394 gubed 3395 3396 @ @<Set initial...@>= 3397 @!debug was_mem_end:=mem_min; {indicate that everything was previously free} 3398 was_lo_max:=mem_min; was_hi_min:=mem_max; 3399 panicking:=false; 3400 gubed 3401 3402 @ Procedure |check_mem| makes sure that the available space lists of 3403 |mem| are well formed, and it optionally prints out all locations 3404 that are reserved now but were free the last time this procedure was called. 3405 3406 @p @!debug procedure check_mem(@!print_locs : boolean); 3407 label done1,done2; {loop exits} 3408 var p,@!q:pointer; {current locations of interest in |mem|} 3409 @!clobbered:boolean; {is something amiss?} 3410 begin for p:=mem_min to lo_mem_max do free[p]:=false; {you can probably 3411 do this faster} 3412 for p:=hi_mem_min to mem_end do free[p]:=false; {ditto} 3413 @<Check single-word |avail| list@>; 3414 @<Check variable-size |avail| list@>; 3415 @<Check flags of unavailable nodes@>; 3416 if print_locs then @<Print newly busy locations@>; 3417 for p:=mem_min to lo_mem_max do was_free[p]:=free[p]; 3418 for p:=hi_mem_min to mem_end do was_free[p]:=free[p]; 3419 {|was_free:=free| might be faster} 3420 was_mem_end:=mem_end; was_lo_max:=lo_mem_max; was_hi_min:=hi_mem_min; 3421 end; 3422 gubed 3423 3424 @ @<Check single-word...@>= 3425 p:=avail; q:=null; clobbered:=false; 3426 while p<>null do 3427 begin if (p>mem_end)or(p<hi_mem_min) then clobbered:=true 3428 else if free[p] then clobbered:=true; 3429 if clobbered then 3430 begin print_nl("AVAIL list clobbered at "); 3431 @.AVAIL list clobbered...@> 3432 print_int(q); goto done1; 3433 end; 3434 free[p]:=true; q:=p; p:=link(q); 3435 end; 3436 done1: 3437 3438 @ @<Check variable-size...@>= 3439 p:=rover; q:=null; clobbered:=false; 3440 repeat if (p>=lo_mem_max)or(p<mem_min) then clobbered:=true 3441 else if (rlink(p)>=lo_mem_max)or(rlink(p)<mem_min) then clobbered:=true 3442 else if not(is_empty(p))or(node_size(p)<2)or@| 3443 (p+node_size(p)>lo_mem_max)or@| (llink(rlink(p))<>p) then clobbered:=true; 3444 if clobbered then 3445 begin print_nl("Double-AVAIL list clobbered at "); 3446 print_int(q); goto done2; 3447 end; 3448 for q:=p to p+node_size(p)-1 do {mark all locations free} 3449 begin if free[q] then 3450 begin print_nl("Doubly free location at "); 3451 @.Doubly free location...@> 3452 print_int(q); goto done2; 3453 end; 3454 free[q]:=true; 3455 end; 3456 q:=p; p:=rlink(p); 3457 until p=rover; 3458 done2: 3459 3460 @ @<Check flags...@>= 3461 p:=mem_min; 3462 while p<=lo_mem_max do {node |p| should not be empty} 3463 begin if is_empty(p) then 3464 begin print_nl("Bad flag at "); print_int(p); 3465 @.Bad flag...@> 3466 end; 3467 while (p<=lo_mem_max) and not free[p] do incr(p); 3468 while (p<=lo_mem_max) and free[p] do incr(p); 3469 end 3470 3471 @ @<Print newly busy...@>= 3472 begin print_nl("New busy locs:"); 3473 for p:=mem_min to lo_mem_max do 3474 if not free[p] and ((p>was_lo_max) or was_free[p]) then 3475 begin print_char(" "); print_int(p); 3476 end; 3477 for p:=hi_mem_min to mem_end do 3478 if not free[p] and 3479 ((p<was_hi_min) or (p>was_mem_end) or was_free[p]) then 3480 begin print_char(" "); print_int(p); 3481 end; 3482 end 3483 3484 @ The |search_mem| procedure attempts to answer the question ``Who points 3485 to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem| 3486 that might not be of type |two_halves|. Strictly speaking, this is 3487 @^dirty \PASCAL@> 3488 undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to 3489 point to |p| purely by coincidence). But for debugging purposes, we want 3490 to rule out the places that do {\sl not\/} point to |p|, so a few false 3491 drops are tolerable. 3492 3493 @p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|} 3494 var q:integer; {current position being searched} 3495 begin for q:=mem_min to lo_mem_max do 3496 begin if link(q)=p then 3497 begin print_nl("LINK("); print_int(q); print_char(")"); 3498 end; 3499 if info(q)=p then 3500 begin print_nl("INFO("); print_int(q); print_char(")"); 3501 end; 3502 end; 3503 for q:=hi_mem_min to mem_end do 3504 begin if link(q)=p then 3505 begin print_nl("LINK("); print_int(q); print_char(")"); 3506 end; 3507 if info(q)=p then 3508 begin print_nl("INFO("); print_int(q); print_char(")"); 3509 end; 3510 end; 3511 @<Search |eqtb| for equivalents equal to |p|@>; 3512 @<Search |save_stack| for equivalents that point to |p|@>; 3513 @<Search |hyph_list| for pointers to |p|@>; 3514 end; 3515 gubed 3516 3517 @* \[12] Displaying boxes. 3518 We can reinforce our knowledge of the data structures just introduced 3519 by considering two procedures that display a list in symbolic form. 3520 The first of these, called |short_display|, is used in ``overfull box'' 3521 messages to give the top-level description of a list. The other one, 3522 called |show_node_list|, prints a detailed description of exactly what 3523 is in the data structure. 3524 3525 The philosophy of |short_display| is to ignore the fine points about exactly 3526 what is inside boxes, except that ligatures and discretionary breaks are 3527 expanded. As a result, |short_display| is a recursive procedure, but the 3528 recursion is never more than one level deep. 3529 @^recursion@> 3530 3531 A global variable |font_in_short_display| keeps track of the font code that 3532 is assumed to be present when |short_display| begins; deviations from this 3533 font will be printed. 3534 3535 @<Glob...@>= 3536 @!font_in_short_display:integer; {an internal font number} 3537 3538 @ Boxes, rules, inserts, whatsits, marks, and things in general that are 3539 sort of ``complicated'' are indicated only by printing `\.{[]}'. 3540 3541 @p procedure short_display(@!p:integer); {prints highlights of list |p|} 3542 var n:integer; {for replacement counts} 3543 begin while p>mem_min do 3544 begin if is_char_node(p) then 3545 begin if p<=mem_end then 3546 begin if font(p)<>font_in_short_display then 3547 begin if (font(p)<font_base)or(font(p)>font_max) then 3548 print_char("*") 3549 @.*\relax@> 3550 else @<Print the font identifier for |font(p)|@>; 3551 print_char(" "); font_in_short_display:=font(p); 3552 end; 3553 print_ASCII(qo(character(p))); 3554 end; 3555 end 3556 else @<Print a short indication of the contents of node |p|@>; 3557 p:=link(p); 3558 end; 3559 end; 3560 3561 @ @<Print a short indication of the contents of node |p|@>= 3562 case type(p) of 3563 hlist_node,vlist_node,ins_node,whatsit_node,mark_node,adjust_node, 3564 unset_node: print("[]"); 3565 rule_node: print_char("|"); 3566 glue_node: if glue_ptr(p)<>zero_glue then print_char(" "); 3567 math_node: print_char("$"); 3568 ligature_node: short_display(lig_ptr(p)); 3569 disc_node: begin short_display(pre_break(p)); 3570 short_display(post_break(p));@/ 3571 n:=replace_count(p); 3572 while n>0 do 3573 begin if link(p)<>null then p:=link(p); 3574 decr(n); 3575 end; 3576 end; 3577 othercases do_nothing 3578 endcases 3579 3580 @ The |show_node_list| routine requires some auxiliary subroutines: one to 3581 print a font-and-character combination, one to print a token list without 3582 its reference count, and one to print a rule dimension. 3583 3584 @p procedure print_font_and_char(@!p:integer); {prints |char_node| data} 3585 begin if p>mem_end then print_esc("CLOBBERED.") 3586 else begin if (font(p)<font_base)or(font(p)>font_max) then print_char("*") 3587 @.*\relax@> 3588 else @<Print the font identifier for |font(p)|@>; 3589 print_char(" "); print_ASCII(qo(character(p))); 3590 end; 3591 end; 3592 @# 3593 procedure print_mark(@!p:integer); {prints token list data in braces} 3594 begin print_char("{"); 3595 if (p<hi_mem_min)or(p>mem_end) then print_esc("CLOBBERED.") 3596 else show_token_list(link(p),null,max_print_line-10); 3597 print_char("}"); 3598 end; 3599 @# 3600 procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node} 3601 begin if is_running(d) then print_char("*") else print_scaled(d); 3602 @.*\relax@> 3603 end; 3604 3605 @ Then there is a subroutine that prints glue stretch and shrink, possibly 3606 followed by the name of finite units: 3607 3608 @p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number); 3609 {prints a glue component} 3610 begin print_scaled(d); 3611 if (order<normal)or(order>filll) then print("foul") 3612 else if order>normal then 3613 begin print("fil"); 3614 while order>fil do 3615 begin print_char("l"); decr(order); 3616 end; 3617 end 3618 else if s<>0 then print(s); 3619 end; 3620 3621 @ The next subroutine prints a whole glue specification. 3622 3623 @p procedure print_spec(@!p:integer;@!s:str_number); 3624 {prints a glue specification} 3625 begin if (p<mem_min)or(p>=lo_mem_max) then print_char("*") 3626 @.*\relax@> 3627 else begin print_scaled(width(p)); 3628 if s<>0 then print(s); 3629 if stretch(p)<>0 then 3630 begin print(" plus "); print_glue(stretch(p),stretch_order(p),s); 3631 end; 3632 if shrink(p)<>0 then 3633 begin print(" minus "); print_glue(shrink(p),shrink_order(p),s); 3634 end; 3635 end; 3636 end; 3637 3638 @ We also need to declare some procedures that appear later in this 3639 documentation. 3640 3641 @p @<Declare procedures needed for displaying the elements of mlists@>@; 3642 @<Declare the procedure called |print_skip_param|@> 3643 3644 @ Since boxes can be inside of boxes, |show_node_list| is inherently recursive, 3645 @^recursion@> 3646 up to a given maximum number of levels. The history of nesting is indicated 3647 by the current string, which will be printed at the beginning of each line; 3648 the length of this string, namely |cur_length|, is the depth of nesting. 3649 3650 Recursive calls on |show_node_list| therefore use the following pattern: 3651 3652 @d node_list_display(#)== 3653 begin append_char("."); show_node_list(#); flush_char; 3654 end {|str_room| need not be checked; see |show_box| below} 3655 3656 @ A global variable called |depth_threshold| is used to record the maximum 3657 depth of nesting for which |show_node_list| will show information. If we 3658 have |depth_threshold=0|, for example, only the top level information will 3659 be given and no sublists will be traversed. Another global variable, called 3660 |breadth_max|, tells the maximum number of items to show at each level; 3661 |breadth_max| had better be positive, or you won't see anything. 3662 3663 @<Glob...@>= 3664 @!depth_threshold : integer; {maximum nesting depth in box displays} 3665 @!breadth_max : integer; {maximum number of items shown at the same list level} 3666 3667 @ Now we are ready for |show_node_list| itself. This procedure has been 3668 written to be ``extra robust'' in the sense that it should not crash or get 3669 into a loop even if the data structures have been messed up by bugs in 3670 the rest of the program. You can safely call its parent routine 3671 |show_box(p)| for arbitrary values of |p| when you are debugging \TeX. 3672 However, in the presence of bad data, the procedure may 3673 @^dirty \PASCAL@>@^debugging@> 3674 fetch a |memory_word| whose variant is different from the way it was stored; 3675 for example, it might try to read |mem[p].hh| when |mem[p]| 3676 contains a scaled integer, if |p| is a pointer that has been 3677 clobbered or chosen at random. 3678 3679 @p procedure show_node_list(@!p:integer); {prints a node list symbolically} 3680 label exit; 3681 var n:integer; {the number of items already printed at this level} 3682 @!g:real; {a glue ratio, as a floating point number} 3683 begin if cur_length>depth_threshold then 3684 begin if p>null then print(" []"); 3685 {indicate that there's been some truncation} 3686 return; 3687 end; 3688 n:=0; 3689 while p>mem_min do 3690 begin print_ln; print_current_string; {display the nesting history} 3691 if p>mem_end then {pointer out of range} 3692 begin print("Bad link, display aborted."); return; 3693 @.Bad link...@> 3694 end; 3695 incr(n); if n>breadth_max then {time to stop} 3696 begin print("etc."); return; 3697 @.etc@> 3698 end; 3699 @<Display node |p|@>; 3700 p:=link(p); 3701 end; 3702 exit: 3703 end; 3704 3705 @ @<Display node |p|@>= 3706 if is_char_node(p) then print_font_and_char(p) 3707 else case type(p) of 3708 hlist_node,vlist_node,unset_node: @<Display box |p|@>; 3709 rule_node: @<Display rule |p|@>; 3710 ins_node: @<Display insertion |p|@>; 3711 whatsit_node: @<Display the whatsit node |p|@>; 3712 glue_node: @<Display glue |p|@>; 3713 kern_node: @<Display kern |p|@>; 3714 math_node: @<Display math node |p|@>; 3715 ligature_node: @<Display ligature |p|@>; 3716 penalty_node: @<Display penalty |p|@>; 3717 disc_node: @<Display discretionary |p|@>; 3718 mark_node: @<Display mark |p|@>; 3719 adjust_node: @<Display adjustment |p|@>; 3720 @t\4@>@<Cases of |show_node_list| that arise in mlists only@>@; 3721 othercases print("Unknown node type!") 3722 endcases 3723 3724 @ @<Display box |p|@>= 3725 begin if type(p)=hlist_node then print_esc("h") 3726 else if type(p)=vlist_node then print_esc("v") 3727 else print_esc("unset"); 3728 print("box("); print_scaled(height(p)); print_char("+"); 3729 print_scaled(depth(p)); print(")x"); print_scaled(width(p)); 3730 if type(p)=unset_node then 3731 @<Display special fields of the unset node |p|@> 3732 else begin @<Display the value of |glue_set(p)|@>; 3733 if shift_amount(p)<>0 then 3734 begin print(", shifted "); print_scaled(shift_amount(p)); 3735 end; 3736 end; 3737 node_list_display(list_ptr(p)); {recursive call} 3738 end 3739 3740 @ @<Display special fields of the unset node |p|@>= 3741 begin if span_count(p)<>min_quarterword then 3742 begin print(" ("); print_int(qo(span_count(p))+1); 3743 print(" columns)"); 3744 end; 3745 if glue_stretch(p)<>0 then 3746 begin print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0); 3747 end; 3748 if glue_shrink(p)<>0 then 3749 begin print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0); 3750 end; 3751 end 3752 3753 @ The code will have to change in this place if |glue_ratio| is 3754 a structured type instead of an ordinary |real|. Note that this routine 3755 should avoid arithmetic errors even if the |glue_set| field holds an 3756 arbitrary random value. The following code assumes that a properly 3757 formed nonzero |real| number has absolute value $2^{20}$ or more when 3758 it is regarded as an integer; this precaution was adequate to prevent 3759 floating point underflow on the author's computer. 3760 @^system dependencies@> 3761 @^dirty \PASCAL@> 3762 3763 @<Display the value of |glue_set(p)|@>= 3764 g:=float(glue_set(p)); 3765 if (g<>float_constant(0))and(glue_sign(p)<>normal) then 3766 begin print(", glue set "); 3767 if glue_sign(p)=shrinking then print("- "); 3768 if abs(mem[p+glue_offset].int)<@'4000000 then print("?.?") 3769 else if abs(g)>float_constant(20000) then 3770 begin if g>float_constant(0) then print_char(">") 3771 else print("< -"); 3772 print_glue(20000*unity,glue_order(p),0); 3773 end 3774 else print_glue(round(unity*g),glue_order(p),0); 3775 @^real multiplication@> 3776 end 3777 3778 @ @<Display rule |p|@>= 3779 begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+"); 3780 print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p)); 3781 end 3782 3783 @ @<Display insertion |p|@>= 3784 begin print_esc("insert"); print_int(qo(subtype(p))); 3785 print(", natural size "); print_scaled(height(p)); 3786 print("; split("); print_spec(split_top_ptr(p),0); 3787 print_char(","); print_scaled(depth(p)); 3788 print("); float cost "); print_int(float_cost(p)); 3789 node_list_display(ins_ptr(p)); {recursive call} 3790 end 3791 3792 @ @<Display glue |p|@>= 3793 if subtype(p)>=a_leaders then @<Display leaders |p|@> 3794 else begin print_esc("glue"); 3795 if subtype(p)<>normal then 3796 begin print_char("("); 3797 if subtype(p)<cond_math_glue then 3798 print_skip_param(subtype(p)-1) 3799 else if subtype(p)=cond_math_glue then print_esc("nonscript") 3800 else print_esc("mskip"); 3801 print_char(")"); 3802 end; 3803 if subtype(p)<>cond_math_glue then 3804 begin print_char(" "); 3805 if subtype(p)<cond_math_glue then print_spec(glue_ptr(p),0) 3806 else print_spec(glue_ptr(p),"mu"); 3807 end; 3808 end 3809 3810 @ @<Display leaders |p|@>= 3811 begin print_esc(""); 3812 if subtype(p)=c_leaders then print_char("c") 3813 else if subtype(p)=x_leaders then print_char("x"); 3814 print("leaders "); print_spec(glue_ptr(p),0); 3815 node_list_display(leader_ptr(p)); {recursive call} 3816 end 3817 3818 @ An ``explicit'' kern value is indicated implicitly by an explicit space. 3819 3820 @<Display kern |p|@>= 3821 if subtype(p)<>mu_glue then 3822 begin print_esc("kern"); 3823 if subtype(p)<>normal then print_char(" "); 3824 print_scaled(width(p)); 3825 if subtype(p)=acc_kern then print(" (for accent)"); 3826 @.for accent@> 3827 end 3828 else begin print_esc("mkern"); print_scaled(width(p)); print("mu"); 3829 end 3830 3831 @ @<Display math node |p|@>= 3832 begin print_esc("math"); 3833 if subtype(p)=before then print("on") 3834 else print("off"); 3835 if width(p)<>0 then 3836 begin print(", surrounded "); print_scaled(width(p)); 3837 end; 3838 end 3839 3840 @ @<Display ligature |p|@>= 3841 begin print_font_and_char(lig_char(p)); print(" (ligature "); 3842 if subtype(p)>1 then print_char("|"); 3843 font_in_short_display:=font(lig_char(p)); short_display(lig_ptr(p)); 3844 if odd(subtype(p)) then print_char("|"); 3845 print_char(")"); 3846 end 3847 3848 @ @<Display penalty |p|@>= 3849 begin print_esc("penalty "); print_int(penalty(p)); 3850 end 3851 3852 @ The |post_break| list of a discretionary node is indicated by a prefixed 3853 `\.{\char'174}' instead of the `\..' before the |pre_break| list. 3854 3855 @<Display discretionary |p|@>= 3856 begin print_esc("discretionary"); 3857 if replace_count(p)>0 then 3858 begin print(" replacing "); print_int(replace_count(p)); 3859 end; 3860 node_list_display(pre_break(p)); {recursive call} 3861 append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call} 3862 end 3863 3864 @ @<Display mark |p|@>= 3865 begin print_esc("mark"); print_mark(mark_ptr(p)); 3866 end 3867 3868 @ @<Display adjustment |p|@>= 3869 begin print_esc("vadjust"); node_list_display(adjust_ptr(p)); {recursive call} 3870 end 3871 3872 @ The recursive machinery is started by calling |show_box|. 3873 @^recursion@> 3874 3875 @p procedure show_box(@!p:pointer); 3876 begin @<Assign the values |depth_threshold:=show_box_depth| and 3877 |breadth_max:=show_box_breadth|@>; 3878 if breadth_max<=0 then breadth_max:=5; 3879 if pool_ptr+depth_threshold>=pool_size then 3880 depth_threshold:=pool_size-pool_ptr-1; 3881 {now there's enough room for prefix string} 3882 show_node_list(p); {the show starts at |p|} 3883 print_ln; 3884 end; 3885 3886 @* \[13] Destroying boxes. 3887 When we are done with a node list, we are obliged to return it to free 3888 storage, including all of its sublists. The recursive procedure 3889 |flush_node_list| does this for us. 3890 3891 @ First, however, we shall consider two non-recursive procedures that do 3892 simpler tasks. The first of these, |delete_token_ref|, is called when 3893 a pointer to a token list's reference count is being removed. This means 3894 that the token list should disappear if the reference count was |null|, 3895 otherwise the count should be decreased by one. 3896 @^reference counts@> 3897 3898 @d token_ref_count(#) == info(#) {reference count preceding a token list} 3899 3900 @p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count 3901 of a token list that is losing one reference} 3902 begin if token_ref_count(p)=null then flush_list(p) 3903 else decr(token_ref_count(p)); 3904 end; 3905 3906 @ Similarly, |delete_glue_ref| is called when a pointer to a glue 3907 specification is being withdrawn. 3908 @^reference counts@> 3909 @d fast_delete_glue_ref(#)==@t@>@;@/ 3910 begin if glue_ref_count(#)=null then free_node(#,glue_spec_size) 3911 else decr(glue_ref_count(#)); 3912 end 3913 3914 @p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification} 3915 fast_delete_glue_ref(p); 3916 3917 @ Now we are ready to delete any node list, recursively. 3918 In practice, the nodes deleted are usually charnodes (about 2/3 of the time), 3919 and they are glue nodes in about half of the remaining cases. 3920 @^recursion@> 3921 3922 @p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|} 3923 label done; {go here when node |p| has been freed} 3924 var q:pointer; {successor to node |p|} 3925 begin while p<>null do 3926 @^inner loop@> 3927 begin q:=link(p); 3928 if is_char_node(p) then free_avail(p) 3929 else begin case type(p) of 3930 hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p)); 3931 free_node(p,box_node_size); goto done; 3932 end; 3933 rule_node: begin free_node(p,rule_node_size); goto done; 3934 end; 3935 ins_node: begin flush_node_list(ins_ptr(p)); 3936 delete_glue_ref(split_top_ptr(p)); 3937 free_node(p,ins_node_size); goto done; 3938 end; 3939 whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>; 3940 glue_node: begin fast_delete_glue_ref(glue_ptr(p)); 3941 if leader_ptr(p)<>null then flush_node_list(leader_ptr(p)); 3942 end; 3943 kern_node,math_node,penalty_node: do_nothing; 3944 ligature_node: flush_node_list(lig_ptr(p)); 3945 mark_node: delete_token_ref(mark_ptr(p)); 3946 disc_node: begin flush_node_list(pre_break(p)); 3947 flush_node_list(post_break(p)); 3948 end; 3949 adjust_node: flush_node_list(adjust_ptr(p)); 3950 @t\4@>@<Cases of |flush_node_list| that arise in mlists only@>@; 3951 othercases confusion("flushing") 3952 @:this can't happen flushing}{\quad flushing@> 3953 endcases;@/ 3954 free_node(p,small_node_size); 3955 done:end; 3956 p:=q; 3957 end; 3958 end; 3959 3960 @* \[14] Copying boxes. 3961 Another recursive operation that acts on boxes is sometimes needed: The 3962 procedure |copy_node_list| returns a pointer to another node list that has 3963 the same structure and meaning as the original. Note that since glue 3964 specifications and token lists have reference counts, we need not make 3965 copies of them. Reference counts can never get too large to fit in a 3966 halfword, since each pointer to a node is in a different memory address, 3967 and the total number of memory addresses fits in a halfword. 3968 @^recursion@> 3969 @^reference counts@> 3970 3971 (Well, there actually are also references from outside |mem|; if the 3972 |save_stack| is made arbitrarily large, it would theoretically be possible 3973 to break \TeX\ by overflowing a reference count. But who would want to do that?) 3974 3975 @d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list} 3976 @d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec} 3977 3978 @ The copying procedure copies words en masse without bothering 3979 to look at their individual fields. If the node format changes---for 3980 example, if the size is altered, or if some link field is moved to another 3981 relative position---then this code may need to be changed too. 3982 @^data structure assumptions@> 3983 3984 @p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the 3985 node list that starts at |p| and returns a pointer to the new list} 3986 var h:pointer; {temporary head of copied list} 3987 @!q:pointer; {previous position in new list} 3988 @!r:pointer; {current node being fabricated for new list} 3989 @!words:0..5; {number of words remaining to be copied} 3990 begin h:=get_avail; q:=h; 3991 while p<>null do 3992 begin @<Make a copy of node |p| in node |r|@>; 3993 link(q):=r; q:=r; p:=link(p); 3994 end; 3995 link(q):=null; q:=link(h); free_avail(h); 3996 copy_node_list:=q; 3997 end; 3998 3999 @ @<Make a copy of node |p|...@>= 4000 words:=1; {this setting occurs in more branches than any other} 4001 if is_char_node(p) then r:=get_avail 4002 else @<Case statement to copy different types and set |words| to the number 4003 of initial words not yet copied@>; 4004 while words>0 do 4005 begin decr(words); mem[r+words]:=mem[p+words]; 4006 end 4007 4008 @ @<Case statement to copy...@>= 4009 case type(p) of 4010 hlist_node,vlist_node,unset_node: begin r:=get_node(box_node_size); 4011 mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words} 4012 list_ptr(r):=copy_node_list(list_ptr(p)); {this affects |mem[r+5]|} 4013 words:=5; 4014 end; 4015 rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size; 4016 end; 4017 ins_node: begin r:=get_node(ins_node_size); mem[r+4]:=mem[p+4]; 4018 add_glue_ref(split_top_ptr(p)); 4019 ins_ptr(r):=copy_node_list(ins_ptr(p)); {this affects |mem[r+4]|} 4020 words:=ins_node_size-1; 4021 end; 4022 whatsit_node:@<Make a partial copy of the whatsit node |p| and make |r| 4023 point to it; set |words| to the number of initial words not yet copied@>; 4024 glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p)); 4025 glue_ptr(r):=glue_ptr(p); leader_ptr(r):=copy_node_list(leader_ptr(p)); 4026 end; 4027 kern_node,math_node,penalty_node: begin r:=get_node(small_node_size); 4028 words:=small_node_size; 4029 end; 4030 ligature_node: begin r:=get_node(small_node_size); 4031 mem[lig_char(r)]:=mem[lig_char(p)]; {copy |font| and |character|} 4032 lig_ptr(r):=copy_node_list(lig_ptr(p)); 4033 end; 4034 disc_node: begin r:=get_node(small_node_size); 4035 pre_break(r):=copy_node_list(pre_break(p)); 4036 post_break(r):=copy_node_list(post_break(p)); 4037 end; 4038 mark_node: begin r:=get_node(small_node_size); add_token_ref(mark_ptr(p)); 4039 words:=small_node_size; 4040 end; 4041 adjust_node: begin r:=get_node(small_node_size); 4042 adjust_ptr(r):=copy_node_list(adjust_ptr(p)); 4043 end; {|words=1=small_node_size-1|} 4044 othercases confusion("copying") 4045 @:this can't happen copying}{\quad copying@> 4046 endcases 4047 4048 @* \[15] The command codes. 4049 Before we can go any further, we need to define symbolic names for the internal 4050 code numbers that represent the various commands obeyed by \TeX. These codes 4051 are somewhat arbitrary, but not completely so. For example, the command 4052 codes for character types are fixed by the language, since a user says, 4053 e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter, 4054 and the command code |math_shift| is equal to~3. Some other codes have 4055 been made adjacent so that |case| statements in the program need not consider 4056 cases that are widely spaced, or so that |case| statements can be replaced 4057 by |if| statements. 4058 4059 At any rate, here is the list, for future reference. First come the 4060 ``catcode'' commands, several of which share their numeric codes with 4061 ordinary commands when the catcode cannot emerge from \TeX's scanning routine. 4062 4063 @d escape=0 {escape delimiter (called \.\\ in {\sl The \TeX book\/})} 4064 @:TeXbook}{\sl The \TeX book@> 4065 @d relax=0 {do nothing ( \.{\\relax} )} 4066 @d left_brace=1 {beginning of a group ( \.\{ )} 4067 @d right_brace=2 {ending of a group ( \.\} )} 4068 @d math_shift=3 {mathematics shift character ( \.\$ )} 4069 @d tab_mark=4 {alignment delimiter ( \.\&, \.{\\span} )} 4070 @d car_ret=5 {end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )} 4071 @d out_param=5 {output a macro parameter} 4072 @d mac_param=6 {macro parameter symbol ( \.\# )} 4073 @d sup_mark=7 {superscript ( \.{\char'136} )} 4074 @d sub_mark=8 {subscript ( \.{\char'137} )} 4075 @d ignore=9 {characters to ignore ( \.{\^\^@@} )} 4076 @d endv=9 {end of \<v_j> list in alignment template} 4077 @d spacer=10 {characters equivalent to blank space ( \.{\ } )} 4078 @d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )} 4079 @d other_char=12 {none of the special character types} 4080 @d active_char=13 {characters that invoke macros ( \.{\char`\~} )} 4081 @d par_end=13 {end of paragraph ( \.{\\par} )} 4082 @d match=13 {match a macro parameter} 4083 @d comment=14 {characters that introduce comments ( \.\% )} 4084 @d end_match=14 {end of parameters to macro} 4085 @d stop=14 {end of job ( \.{\\end}, \.{\\dump} )} 4086 @d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )} 4087 @d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )} 4088 @d max_char_code=15 {largest catcode for individual characters} 4089 4090 @ Next are the ordinary run-of-the-mill command codes. Codes that are 4091 |min_internal| or more represent internal quantities that might be 4092 expanded by `\.{\\the}'. 4093 4094 @d char_num=16 {character specified numerically ( \.{\\char} )} 4095 @d math_char_num=17 {explicit math code ( \.{\\mathchar} )} 4096 @d mark=18 {mark definition ( \.{\\mark} )} 4097 @d xray=19 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)} 4098 @d make_box=20 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)} 4099 @d hmove=21 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )} 4100 @d vmove=22 {vertical motion ( \.{\\raise}, \.{\\lower} )} 4101 @d un_hbox=23 {unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )} 4102 @d un_vbox=24 {unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )} 4103 @d remove_item=25 {nullify last item ( \.{\\unpenalty}, 4104 \.{\\unkern}, \.{\\unskip} )} 4105 @d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)} 4106 @d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)} 4107 @d mskip=28 {math glue ( \.{\\mskip} )} 4108 @d kern=29 {fixed space ( \.{\\kern} )} 4109 @d mkern=30 {math kern ( \.{\\mkern} )} 4110 @d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)} 4111 @d halign=32 {horizontal table alignment ( \.{\\halign} )} 4112 @d valign=33 {vertical table alignment ( \.{\\valign} )} 4113 @d no_align=34 {temporary escape from alignment ( \.{\\noalign} )} 4114 @d vrule=35 {vertical rule ( \.{\\vrule} )} 4115 @d hrule=36 {horizontal rule ( \.{\\hrule} )} 4116 @d insert=37 {vlist inserted in box ( \.{\\insert} )} 4117 @d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )} 4118 @d ignore_spaces=39 {gobble |spacer| tokens ( \.{\\ignorespaces} )} 4119 @d after_assignment=40 {save till assignment is done ( \.{\\afterassignment} )} 4120 @d after_group=41 {save till group is done ( \.{\\aftergroup} )} 4121 @d break_penalty=42 {additional badness ( \.{\\penalty} )} 4122 @d start_par=43 {begin paragraph ( \.{\\indent}, \.{\\noindent} )} 4123 @d ital_corr=44 {italic correction ( \.{\\/} )} 4124 @d accent=45 {attach accent in text ( \.{\\accent} )} 4125 @d math_accent=46 {attach accent in math ( \.{\\mathaccent} )} 4126 @d discretionary=47 {discretionary texts ( \.{\\-}, \.{\\discretionary} )} 4127 @d eq_no=48 {equation number ( \.{\\eqno}, \.{\\leqno} )} 4128 @d left_right=49 {variable delimiter ( \.{\\left}, \.{\\right} )} 4129 @d math_comp=50 {component of formula ( \.{\\mathbin}, etc.~)} 4130 @d limit_switch=51 {diddle limit conventions ( \.{\\displaylimits}, etc.~)} 4131 @d above=52 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)} 4132 @d math_style=53 {style specification ( \.{\\displaystyle}, etc.~)} 4133 @d math_choice=54 {choice specification ( \.{\\mathchoice} )} 4134 @d non_script=55 {conditional math glue ( \.{\\nonscript} )} 4135 @d vcenter=56 {vertically center a vbox ( \.{\\vcenter} )} 4136 @d case_shift=57 {force specific case ( \.{\\lowercase}, \.{\\uppercase}~)} 4137 @d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )} 4138 @d extension=59 {extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)} 4139 @d in_stream=60 {files for reading ( \.{\\openin}, \.{\\closein} )} 4140 @d begin_group=61 {begin local grouping ( \.{\\begingroup} )} 4141 @d end_group=62 {end local grouping ( \.{\\endgroup} )} 4142 @d omit=63 {omit alignment template ( \.{\\omit} )} 4143 @d ex_space=64 {explicit space ( \.{\\\ } )} 4144 @d no_boundary=65 {suppress boundary ligatures ( \.{\\noboundary} )} 4145 @d radical=66 {square root and similar signs ( \.{\\radical} )} 4146 @d end_cs_name=67 {end control sequence ( \.{\\endcsname} )} 4147 @d min_internal=68 {the smallest code that can follow \.{\\the}} 4148 @d char_given=68 {character code defined by \.{\\chardef}} 4149 @d math_given=69 {math code defined by \.{\\mathchardef}} 4150 @d last_item=70 {most recent item ( \.{\\lastpenalty}, 4151 \.{\\lastkern}, \.{\\lastskip} )} 4152 @d max_non_prefixed_command=70 {largest command code that can't be \.{\\global}} 4153 4154 @ The next codes are special; they all relate to mode-independent 4155 assignment of values to \TeX's internal registers or tables. 4156 Codes that are |max_internal| or less represent internal quantities 4157 that might be expanded by `\.{\\the}'. 4158 4159 @d toks_register=71 {token list register ( \.{\\toks} )} 4160 @d assign_toks=72 {special token list ( \.{\\output}, \.{\\everypar}, etc.~)} 4161 @d assign_int=73 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)} 4162 @d assign_dimen=74 {user-defined length ( \.{\\hsize}, etc.~)} 4163 @d assign_glue=75 {user-defined glue ( \.{\\baselineskip}, etc.~)} 4164 @d assign_mu_glue=76 {user-defined muglue ( \.{\\thinmuskip}, etc.~)} 4165 @d assign_font_dimen=77 {user-defined font dimension ( \.{\\fontdimen} )} 4166 @d assign_font_int=78 {user-defined font integer ( \.{\\hyphenchar}, 4167 \.{\\skewchar} )} 4168 @d set_aux=79 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )} 4169 @d set_prev_graf=80 {specify state info ( \.{\\prevgraf} )} 4170 @d set_page_dimen=81 {specify state info ( \.{\\pagegoal}, etc.~)} 4171 @d set_page_int=82 {specify state info ( \.{\\deadcycles}, 4172 \.{\\insertpenalties} )} 4173 @d set_box_dimen=83 {change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )} 4174 @d set_shape=84 {specify fancy paragraph shape ( \.{\\parshape} )} 4175 @d def_code=85 {define a character code ( \.{\\catcode}, etc.~)} 4176 @d def_family=86 {declare math fonts ( \.{\\textfont}, etc.~)} 4177 @d set_font=87 {set current font ( font identifiers )} 4178 @d def_font=88 {define a font file ( \.{\\font} )} 4179 @d register=89 {internal register ( \.{\\count}, \.{\\dimen}, etc.~)} 4180 @d max_internal=89 {the largest code that can follow \.{\\the}} 4181 @d advance=90 {advance a register or parameter ( \.{\\advance} )} 4182 @d multiply=91 {multiply a register or parameter ( \.{\\multiply} )} 4183 @d divide=92 {divide a register or parameter ( \.{\\divide} )} 4184 @d prefix=93 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )} 4185 @d let=94 {assign a command code ( \.{\\let}, \.{\\futurelet} )} 4186 @d shorthand_def=95 {code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)} 4187 @d read_to_cs=96 {read into a control sequence ( \.{\\read} )} 4188 @d def=97 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )} 4189 @d set_box=98 {set a box ( \.{\\setbox} )} 4190 @d hyph_data=99 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )} 4191 @d set_interaction=100 {define level of interaction ( \.{\\batchmode}, etc.~)} 4192 @d max_command=100 {the largest command code seen at |big_switch|} 4193 4194 @ The remaining command codes are extra special, since they cannot get through 4195 \TeX's scanner to the main control routine. They have been given values higher 4196 than |max_command| so that their special nature is easily discernible. 4197 The ``expandable'' commands come first. 4198 4199 @d undefined_cs=max_command+1 {initial state of most |eq_type| fields} 4200 @d expand_after=max_command+2 {special expansion ( \.{\\expandafter} )} 4201 @d no_expand=max_command+3 {special nonexpansion ( \.{\\noexpand} )} 4202 @d input=max_command+4 {input a source file ( \.{\\input}, \.{\\endinput} )} 4203 @d if_test=max_command+5 {conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)} 4204 @d fi_or_else=max_command+6 {delimiters for conditionals ( \.{\\else}, etc.~)} 4205 @d cs_name=max_command+7 {make a control sequence from tokens ( \.{\\csname} )} 4206 @d convert=max_command+8 {convert to text ( \.{\\number}, \.{\\string}, etc.~)} 4207 @d the=max_command+9 {expand an internal quantity ( \.{\\the} )} 4208 @d top_bot_mark=max_command+10 {inserted mark ( \.{\\topmark}, etc.~)} 4209 @d call=max_command+11 {non-long, non-outer control sequence} 4210 @d long_call=max_command+12 {long, non-outer control sequence} 4211 @d outer_call=max_command+13 {non-long, outer control sequence} 4212 @d long_outer_call=max_command+14 {long, outer control sequence} 4213 @d end_template=max_command+15 {end of an alignment template} 4214 @d dont_expand=max_command+16 {the following token was marked by \.{\\noexpand}} 4215 @d glue_ref=max_command+17 {the equivalent points to a glue specification} 4216 @d shape_ref=max_command+18 {the equivalent points to a parshape specification} 4217 @d box_ref=max_command+19 {the equivalent points to a box node, or is |null|} 4218 @d data=max_command+20 {the equivalent is simply a halfword number} 4219 4220 @* \[16] The semantic nest. 4221 \TeX\ is typically in the midst of building many lists at once. For example, 4222 when a math formula is being processed, \TeX\ is in math mode and 4223 working on an mlist; this formula has temporarily interrupted \TeX\ from 4224 being in horizontal mode and building the hlist of a paragraph; and this 4225 paragraph has temporarily interrupted \TeX\ from being in vertical mode 4226 and building the vlist for the next page of a document. Similarly, when a 4227 \.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily 4228 interrupted from working in restricted horizontal mode, and it enters 4229 internal vertical mode. The ``semantic nest'' is a stack that 4230 keeps track of what lists and modes are currently suspended. 4231 4232 At each level of processing we are in one of six modes: 4233 4234 \yskip\hang|vmode| stands for vertical mode (the page builder); 4235 4236 \hang|hmode| stands for horizontal mode (the paragraph builder); 4237 4238 \hang|mmode| stands for displayed formula mode; 4239 4240 \hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox}); 4241 4242 \hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox}); 4243 4244 \hang|-mmode| stands for math formula mode (not displayed). 4245 4246 \yskip\noindent The mode is temporarily set to zero while processing \.{\\write} 4247 texts. 4248 4249 Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that 4250 \TeX's ``big semantic switch'' can select the appropriate thing to 4251 do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current 4252 mode and |cur_cmd| is the current command code. 4253 4254 @d vmode=1 {vertical mode} 4255 @d hmode=vmode+max_command+1 {horizontal mode} 4256 @d mmode=hmode+max_command+1 {math mode} 4257 4258 @p procedure print_mode(@!m:integer); {prints the mode represented by |m|} 4259 begin if m>0 then 4260 case m div (max_command+1) of 4261 0:print("vertical"); 4262 1:print("horizontal"); 4263 2:print("display math"); 4264 end 4265 else if m=0 then print("no") 4266 else case (-m) div (max_command+1) of 4267 0:print("internal vertical"); 4268 1:print("restricted horizontal"); 4269 2:print("math"); 4270 end; 4271 print(" mode"); 4272 end; 4273 4274 @ The state of affairs at any semantic level can be represented by 4275 five values: 4276 4277 \yskip\hang|mode| is the number representing the semantic mode, as 4278 just explained. 4279 4280 \yskip\hang|head| is a |pointer| to a list head for the list being built; 4281 |link(head)| therefore points to the first element of the list, or 4282 to |null| if the list is empty. 4283 4284 \yskip\hang|tail| is a |pointer| to the final node of the list being 4285 built; thus, |tail=head| if and only if the list is empty. 4286 4287 \yskip\hang|prev_graf| is the number of lines of the current paragraph that 4288 have already been put into the present vertical list. 4289 4290 \yskip\hang|aux| is an auxiliary |memory_word| that gives further information 4291 that is needed to characterize the situation. 4292 4293 \yskip\noindent 4294 In vertical mode, |aux| is also known as |prev_depth|; it is the scaled 4295 value representing the depth of the previous box, for use in baseline 4296 calculations, or it is |<=-1000|pt if the next box on the vertical list is to 4297 be exempt from baseline calculations. In horizontal mode, |aux| is also 4298 known as |space_factor| and |clang|; it holds the current space factor used in 4299 spacing calculations, and the current language used for hyphenation. 4300 (The value of |clang| is undefined in restricted horizontal mode.) 4301 In math mode, |aux| is also known as |incompleat_noad|; if 4302 not |null|, it points to a record that represents the numerator of a 4303 generalized fraction for which the denominator is currently being formed 4304 in the current list. 4305 4306 There is also a sixth quantity, |mode_line|, which correlates 4307 the semantic nest with the user's input; |mode_line| contains the source 4308 line number at which the current level of nesting was entered. The negative 4309 of this line number is the |mode_line| at the level of the 4310 user's output routine. 4311 4312 In horizontal mode, the |prev_graf| field is used for initial language data. 4313 4314 The semantic nest is an array called |nest| that holds the |mode|, |head|, 4315 |tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels 4316 below the currently active one. Information about the currently active 4317 level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|, 4318 |aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to 4319 be pushed onto |nest| if necessary. 4320 4321 @d ignore_depth==-65536000 {|prev_depth| value that is ignored} 4322 4323 @<Types...@>= 4324 @!list_state_record=record@!mode_field:-mmode..mmode;@+ 4325 @!head_field,@!tail_field: pointer; 4326 @!pg_field,@!ml_field: integer;@+ 4327 @!aux_field: memory_word; 4328 end; 4329 4330 @ @d mode==cur_list.mode_field {current mode} 4331 @d head==cur_list.head_field {header node of current list} 4332 @d tail==cur_list.tail_field {final node on current list} 4333 @d prev_graf==cur_list.pg_field {number of paragraph lines accumulated} 4334 @d aux==cur_list.aux_field {auxiliary data about the current list} 4335 @d prev_depth==aux.sc {the name of |aux| in vertical mode} 4336 @d space_factor==aux.hh.lh {part of |aux| in horizontal mode} 4337 @d clang==aux.hh.rh {the other part of |aux| in horizontal mode} 4338 @d incompleat_noad==aux.int {the name of |aux| in math mode} 4339 @d mode_line==cur_list.ml_field {source file line number at beginning of list} 4340 4341 @<Glob...@>= 4342 @!nest:array[0..nest_size] of list_state_record; 4343 @!nest_ptr:0..nest_size; {first unused location of |nest|} 4344 @!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing} 4345 @!cur_list:list_state_record; {the ``top'' semantic state} 4346 @!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}} 4347 4348 @ Here is a common way to make the current list grow: 4349 4350 @d tail_append(#)==begin link(tail):=#; tail:=link(tail); 4351 end 4352 4353 @ We will see later that the vertical list at the bottom semantic level is split 4354 into two parts; the ``current page'' runs from |page_head| to |page_tail|, 4355 and the ``contribution list'' runs from |contrib_head| to |tail| of 4356 semantic level zero. The idea is that contributions are first formed in 4357 vertical mode, then ``contributed'' to the current page (during which time 4358 the page-breaking decisions are made). For now, we don't need to know 4359 any more details about the page-building process. 4360 4361 @<Set init...@>= 4362 nest_ptr:=0; max_nest_stack:=0; 4363 mode:=vmode; head:=contrib_head; tail:=contrib_head; 4364 prev_depth:=ignore_depth; mode_line:=0; 4365 prev_graf:=0; shown_mode:=0; 4366 @<Start a new current page@>; 4367 4368 @ When \TeX's work on one level is interrupted, the state is saved by 4369 calling |push_nest|. This routine changes |head| and |tail| so that 4370 a new (empty) list is begun; it does not change |mode| or |aux|. 4371 4372 @p procedure push_nest; {enter a new semantic level, save the old} 4373 begin if nest_ptr>max_nest_stack then 4374 begin max_nest_stack:=nest_ptr; 4375 if nest_ptr=nest_size then overflow("semantic nest size",nest_size); 4376 @:TeX capacity exceeded semantic nest size}{\quad semantic nest size@> 4377 end; 4378 nest[nest_ptr]:=cur_list; {stack the record} 4379 incr(nest_ptr); head:=get_avail; tail:=head; prev_graf:=0; mode_line:=line; 4380 end; 4381 4382 @ Conversely, when \TeX\ is finished on the current level, the former 4383 state is restored by calling |pop_nest|. This routine will never be 4384 called at the lowest semantic level, nor will it be called unless |head| 4385 is a node that should be returned to free memory. 4386 4387 @p procedure pop_nest; {leave a semantic level, re-enter the old} 4388 begin free_avail(head); decr(nest_ptr); cur_list:=nest[nest_ptr]; 4389 end; 4390 4391 @ Here is a procedure that displays what \TeX\ is working on, at all levels. 4392 4393 @p procedure@?print_totals; forward;@t\2@> 4394 procedure show_activities; 4395 var p:0..nest_size; {index into |nest|} 4396 @!m:-mmode..mmode; {mode} 4397 @!a:memory_word; {auxiliary} 4398 @!q,@!r:pointer; {for showing the current page} 4399 @!t:integer; {ditto} 4400 begin nest[nest_ptr]:=cur_list; {put the top level into the array} 4401 print_nl(""); print_ln; 4402 for p:=nest_ptr downto 0 do 4403 begin m:=nest[p].mode_field; a:=nest[p].aux_field; 4404 print_nl("### "); print_mode(m); 4405 print(" entered at line "); print_int(abs(nest[p].ml_field)); 4406 if m=hmode then if nest[p].pg_field <> @'40600000 then 4407 begin print(" (language"); print_int(nest[p].pg_field mod @'200000); 4408 print(":hyphenmin"); print_int(nest[p].pg_field div @'20000000); 4409 print_char(","); print_int((nest[p].pg_field div @'200000) mod @'100); 4410 print_char(")"); 4411 end; 4412 if nest[p].ml_field<0 then print(" (\output routine)"); 4413 if p=0 then 4414 begin @<Show the status of the current page@>; 4415 if link(contrib_head)<>null then 4416 print_nl("### recent contributions:"); 4417 end; 4418 show_box(link(nest[p].head_field)); 4419 @<Show the auxiliary field, |a|@>; 4420 end; 4421 end; 4422 4423 @ @<Show the auxiliary...@>= 4424 case abs(m) div (max_command+1) of 4425 0: begin print_nl("prevdepth "); 4426 if a.sc<=ignore_depth then print("ignored") 4427 else print_scaled(a.sc); 4428 if nest[p].pg_field<>0 then 4429 begin print(", prevgraf "); 4430 print_int(nest[p].pg_field); print(" line"); 4431 if nest[p].pg_field<>1 then print_char("s"); 4432 end; 4433 end; 4434 1: begin print_nl("spacefactor "); print_int(a.hh.lh); 4435 if m>0 then@+ if a.hh.rh>0 then 4436 begin print(", current language "); print_int(a.hh.rh);@+ 4437 end; 4438 end; 4439 2: if a.int<>null then 4440 begin print("this will begin denominator of:"); show_box(a.int);@+ 4441 end; 4442 end {there are no other cases} 4443 4444 @* \[17] The table of equivalents. 4445 Now that we have studied the data structures for \TeX's semantic routines, 4446 we ought to consider the data structures used by its syntactic routines. In 4447 other words, our next concern will be 4448 the tables that \TeX\ looks at when it is scanning 4449 what the user has written. 4450 4451 The biggest and most important such table is called |eqtb|. It holds the 4452 current ``equivalents'' of things; i.e., it explains what things mean 4453 or what their current values are, for all quantities that are subject to 4454 the nesting structure provided by \TeX's grouping mechanism. There are six 4455 parts to |eqtb|: 4456 4457 \yskip\hangg 1) |eqtb[active_base..(hash_base-1)]| holds the current 4458 equivalents of single-character control sequences. 4459 4460 \yskip\hangg 2) |eqtb[hash_base..(glue_base-1)]| holds the current 4461 equivalents of multiletter control sequences. 4462 4463 \yskip\hangg 3) |eqtb[glue_base..(local_base-1)]| holds the current 4464 equivalents of glue parameters like the current baselineskip. 4465 4466 \yskip\hangg 4) |eqtb[local_base..(int_base-1)]| holds the current 4467 equivalents of local halfword quantities like the current box registers, 4468 the current ``catcodes,'' the current font, and a pointer to the current 4469 paragraph shape. 4470 4471 \yskip\hangg 5) |eqtb[int_base..(dimen_base-1)]| holds the current 4472 equivalents of fullword integer parameters like the current hyphenation 4473 penalty. 4474 4475 \yskip\hangg 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents 4476 of fullword dimension parameters like the current hsize or amount of 4477 hanging indentation. 4478 4479 \yskip\noindent Note that, for example, the current amount of 4480 baselineskip glue is determined by the setting of a particular location 4481 in region~3 of |eqtb|, while the current meaning of the control sequence 4482 `\.{\\baselineskip}' (which might have been changed by \.{\\def} or 4483 \.{\\let}) appears in region~2. 4484 4485 @ Each entry in |eqtb| is a |memory_word|. Most of these words are of type 4486 |two_halves|, and subdivided into three fields: 4487 4488 \yskip\hangg 1) The |eq_level| (a quarterword) is the level of grouping at 4489 which this equivalent was defined. If the level is |level_zero|, the 4490 equivalent has never been defined; |level_one| refers to the outer level 4491 (outside of all groups), and this level is also used for global 4492 definitions that never go away. Higher levels are for equivalents that 4493 will disappear at the end of their group. @^global definitions@> 4494 4495 \yskip\hangg 2) The |eq_type| (another quarterword) specifies what kind of 4496 entry this is. There are many types, since each \TeX\ primitive like 4497 \.{\\hbox}, \.{\\def}, etc., has its own special code. The list of 4498 command codes above includes all possible settings of the |eq_type| field. 4499 4500 \yskip\hangg 3) The |equiv| (a halfword) is the current equivalent value. 4501 This may be a font number, a pointer into |mem|, or a variety of other 4502 things. 4503 4504 @d eq_level_field(#)==#.hh.b1 4505 @d eq_type_field(#)==#.hh.b0 4506 @d equiv_field(#)==#.hh.rh 4507 @d eq_level(#)==eq_level_field(eqtb[#]) {level of definition} 4508 @d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent} 4509 @d equiv(#)==equiv_field(eqtb[#]) {equivalent value} 4510 @d level_zero=min_quarterword {level for undefined quantities} 4511 @d level_one=level_zero+1 {outermost level for defined quantities} 4512 4513 @ Many locations in |eqtb| have symbolic names. The purpose of the next 4514 paragraphs is to define these names, and to set up the initial values of the 4515 equivalents. 4516 4517 In the first region we have 256 equivalents for ``active characters'' that 4518 act as control sequences, followed by 256 equivalents for single-character 4519 control sequences. 4520 4521 Then comes region~2, which corresponds to the hash table that we will 4522 define later. The maximum address in this region is used for a dummy 4523 control sequence that is perpetually undefined. There also are several 4524 locations for control sequences that are perpetually defined 4525 (since they are used in error recovery). 4526 4527 @d active_base=1 {beginning of region 1, for active character equivalents} 4528 @d single_base=active_base+256 {equivalents of one-character control sequences} 4529 @d null_cs=single_base+256 {equivalent of \.{\\csname\\endcsname}} 4530 @d hash_base=null_cs+1 {beginning of region 2, for the hash table} 4531 @d frozen_control_sequence=hash_base+hash_size {for error recovery} 4532 @d frozen_protection=frozen_control_sequence {inaccessible but definable} 4533 @d frozen_cr=frozen_control_sequence+1 {permanent `\.{\\cr}'} 4534 @d frozen_end_group=frozen_control_sequence+2 {permanent `\.{\\endgroup}'} 4535 @d frozen_right=frozen_control_sequence+3 {permanent `\.{\\right}'} 4536 @d frozen_fi=frozen_control_sequence+4 {permanent `\.{\\fi}'} 4537 @d frozen_end_template=frozen_control_sequence+5 {permanent `\.{\\endtemplate}'} 4538 @d frozen_endv=frozen_control_sequence+6 {second permanent `\.{\\endtemplate}'} 4539 @d frozen_relax=frozen_control_sequence+7 {permanent `\.{\\relax}'} 4540 @d end_write=frozen_control_sequence+8 {permanent `\.{\\endwrite}'} 4541 @d frozen_dont_expand=frozen_control_sequence+9 4542 {permanent `\.{\\notexpanded:}'} 4543 @d frozen_null_font=frozen_control_sequence+10 4544 {permanent `\.{\\nullfont}'} 4545 @d font_id_base=frozen_null_font-font_base 4546 {begins table of 257 permanent font identifiers} 4547 @d undefined_control_sequence=frozen_null_font+257 {dummy location} 4548 @d glue_base=undefined_control_sequence+1 {beginning of region 3} 4549 4550 @<Initialize table entries...@>= 4551 eq_type(undefined_control_sequence):=undefined_cs; 4552 equiv(undefined_control_sequence):=null; 4553 eq_level(undefined_control_sequence):=level_zero; 4554 for k:=active_base to undefined_control_sequence-1 do 4555 eqtb[k]:=eqtb[undefined_control_sequence]; 4556 4557 @ Here is a routine that displays the current meaning of an |eqtb| entry 4558 in region 1 or~2. (Similar routines for the other regions will appear 4559 below.) 4560 4561 @<Show equivalent |n|, in region 1 or 2@>= 4562 begin sprint_cs(n); print_char("="); print_cmd_chr(eq_type(n),equiv(n)); 4563 if eq_type(n)>=call then 4564 begin print_char(":"); show_token_list(link(equiv(n)),null,32); 4565 end; 4566 end 4567 4568 @ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the 4569 glue parameters defined here. It is important that the ``muskip'' 4570 parameters have larger numbers than the others. 4571 4572 @d line_skip_code=0 {interline glue if |baseline_skip| is infeasible} 4573 @d baseline_skip_code=1 {desired glue between baselines} 4574 @d par_skip_code=2 {extra glue just above a paragraph} 4575 @d above_display_skip_code=3 {extra glue just above displayed math} 4576 @d below_display_skip_code=4 {extra glue just below displayed math} 4577 @d above_display_short_skip_code=5 4578 {glue above displayed math following short lines} 4579 @d below_display_short_skip_code=6 4580 {glue below displayed math following short lines} 4581 @d left_skip_code=7 {glue at left of justified lines} 4582 @d right_skip_code=8 {glue at right of justified lines} 4583 @d top_skip_code=9 {glue at top of main pages} 4584 @d split_top_skip_code=10 {glue at top of split pages} 4585 @d tab_skip_code=11 {glue between aligned entries} 4586 @d space_skip_code=12 {glue between words (if not |zero_glue|)} 4587 @d xspace_skip_code=13 {glue after sentences (if not |zero_glue|)} 4588 @d par_fill_skip_code=14 {glue on last line of paragraph} 4589 @d thin_mu_skip_code=15 {thin space in math formula} 4590 @d med_mu_skip_code=16 {medium space in math formula} 4591 @d thick_mu_skip_code=17 {thick space in math formula} 4592 @d glue_pars=18 {total number of glue parameters} 4593 @d skip_base=glue_base+glue_pars {table of 256 ``skip'' registers} 4594 @d mu_skip_base=skip_base+256 {table of 256 ``muskip'' registers} 4595 @d local_base=mu_skip_base+256 {beginning of region 4} 4596 @# 4597 @d skip(#)==equiv(skip_base+#) {|mem| location of glue specification} 4598 @d mu_skip(#)==equiv(mu_skip_base+#) {|mem| location of math glue spec} 4599 @d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification} 4600 @d line_skip==glue_par(line_skip_code) 4601 @d baseline_skip==glue_par(baseline_skip_code) 4602 @d par_skip==glue_par(par_skip_code) 4603 @d above_display_skip==glue_par(above_display_skip_code) 4604 @d below_display_skip==glue_par(below_display_skip_code) 4605 @d above_display_short_skip==glue_par(above_display_short_skip_code) 4606 @d below_display_short_skip==glue_par(below_display_short_skip_code) 4607 @d left_skip==glue_par(left_skip_code) 4608 @d right_skip==glue_par(right_skip_code) 4609 @d top_skip==glue_par(top_skip_code) 4610 @d split_top_skip==glue_par(split_top_skip_code) 4611 @d tab_skip==glue_par(tab_skip_code) 4612 @d space_skip==glue_par(space_skip_code) 4613 @d xspace_skip==glue_par(xspace_skip_code) 4614 @d par_fill_skip==glue_par(par_fill_skip_code) 4615 @d thin_mu_skip==glue_par(thin_mu_skip_code) 4616 @d med_mu_skip==glue_par(med_mu_skip_code) 4617 @d thick_mu_skip==glue_par(thick_mu_skip_code) 4618 4619 @<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n) 4620 4621 @ Sometimes we need to convert \TeX's internal code numbers into symbolic 4622 form. The |print_skip_param| routine gives the symbolic name of a glue 4623 parameter. 4624 4625 @<Declare the procedure called |print_skip_param|@>= 4626 procedure print_skip_param(@!n:integer); 4627 begin case n of 4628 line_skip_code: print_esc("lineskip"); 4629 baseline_skip_code: print_esc("baselineskip"); 4630 par_skip_code: print_esc("parskip"); 4631 above_display_skip_code: print_esc("abovedisplayskip"); 4632 below_display_skip_code: print_esc("belowdisplayskip"); 4633 above_display_short_skip_code: print_esc("abovedisplayshortskip"); 4634 below_display_short_skip_code: print_esc("belowdisplayshortskip"); 4635 left_skip_code: print_esc("leftskip"); 4636 right_skip_code: print_esc("rightskip"); 4637 top_skip_code: print_esc("topskip"); 4638 split_top_skip_code: print_esc("splittopskip"); 4639 tab_skip_code: print_esc("tabskip"); 4640 space_skip_code: print_esc("spaceskip"); 4641 xspace_skip_code: print_esc("xspaceskip"); 4642 par_fill_skip_code: print_esc("parfillskip"); 4643 thin_mu_skip_code: print_esc("thinmuskip"); 4644 med_mu_skip_code: print_esc("medmuskip"); 4645 thick_mu_skip_code: print_esc("thickmuskip"); 4646 othercases print("[unknown glue parameter!]") 4647 endcases; 4648 end; 4649 4650 @ The symbolic names for glue parameters are put into \TeX's hash table 4651 by using the routine called |primitive|, defined below. Let us enter them 4652 now, so that we don't have to list all those parameter names anywhere else. 4653 4654 @<Put each of \TeX's primitives into the hash table@>= 4655 primitive("lineskip",assign_glue,glue_base+line_skip_code);@/ 4656 @!@:line_skip_}{\.{\\lineskip} primitive@> 4657 primitive("baselineskip",assign_glue,glue_base+baseline_skip_code);@/ 4658 @!@:baseline_skip_}{\.{\\baselineskip} primitive@> 4659 primitive("parskip",assign_glue,glue_base+par_skip_code);@/ 4660 @!@:par_skip_}{\.{\\parskip} primitive@> 4661 primitive("abovedisplayskip",assign_glue,glue_base+above_display_skip_code);@/ 4662 @!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@> 4663 primitive("belowdisplayskip",assign_glue,glue_base+below_display_skip_code);@/ 4664 @!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@> 4665 primitive("abovedisplayshortskip", 4666 assign_glue,glue_base+above_display_short_skip_code);@/ 4667 @!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@> 4668 primitive("belowdisplayshortskip", 4669 assign_glue,glue_base+below_display_short_skip_code);@/ 4670 @!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@> 4671 primitive("leftskip",assign_glue,glue_base+left_skip_code);@/ 4672 @!@:left_skip_}{\.{\\leftskip} primitive@> 4673 primitive("rightskip",assign_glue,glue_base+right_skip_code);@/ 4674 @!@:right_skip_}{\.{\\rightskip} primitive@> 4675 primitive("topskip",assign_glue,glue_base+top_skip_code);@/ 4676 @!@:top_skip_}{\.{\\topskip} primitive@> 4677 primitive("splittopskip",assign_glue,glue_base+split_top_skip_code);@/ 4678 @!@:split_top_skip_}{\.{\\splittopskip} primitive@> 4679 primitive("tabskip",assign_glue,glue_base+tab_skip_code);@/ 4680 @!@:tab_skip_}{\.{\\tabskip} primitive@> 4681 primitive("spaceskip",assign_glue,glue_base+space_skip_code);@/ 4682 @!@:space_skip_}{\.{\\spaceskip} primitive@> 4683 primitive("xspaceskip",assign_glue,glue_base+xspace_skip_code);@/ 4684 @!@:xspace_skip_}{\.{\\xspaceskip} primitive@> 4685 primitive("parfillskip",assign_glue,glue_base+par_fill_skip_code);@/ 4686 @!@:par_fill_skip_}{\.{\\parfillskip} primitive@> 4687 primitive("thinmuskip",assign_mu_glue,glue_base+thin_mu_skip_code);@/ 4688 @!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@> 4689 primitive("medmuskip",assign_mu_glue,glue_base+med_mu_skip_code);@/ 4690 @!@:med_mu_skip_}{\.{\\medmuskip} primitive@> 4691 primitive("thickmuskip",assign_mu_glue,glue_base+thick_mu_skip_code);@/ 4692 @!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@> 4693 4694 @ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>= 4695 assign_glue,assign_mu_glue: if chr_code<skip_base then 4696 print_skip_param(chr_code-glue_base) 4697 else if chr_code<mu_skip_base then 4698 begin print_esc("skip"); print_int(chr_code-skip_base); 4699 end 4700 else begin print_esc("muskip"); print_int(chr_code-mu_skip_base); 4701 end; 4702 4703 @ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'. 4704 4705 @<Initialize table entries...@>= 4706 equiv(glue_base):=zero_glue; eq_level(glue_base):=level_one; 4707 eq_type(glue_base):=glue_ref; 4708 for k:=glue_base+1 to local_base-1 do eqtb[k]:=eqtb[glue_base]; 4709 glue_ref_count(zero_glue):=glue_ref_count(zero_glue)+local_base-glue_base; 4710 4711 @ @<Show equivalent |n|, in region 3@>= 4712 if n<skip_base then 4713 begin print_skip_param(n-glue_base); print_char("="); 4714 if n<glue_base+thin_mu_skip_code then print_spec(equiv(n),"pt") 4715 else print_spec(equiv(n),"mu"); 4716 end 4717 else if n<mu_skip_base then 4718 begin print_esc("skip"); print_int(n-skip_base); print_char("="); 4719 print_spec(equiv(n),"pt"); 4720 end 4721 else begin print_esc("muskip"); print_int(n-mu_skip_base); print_char("="); 4722 print_spec(equiv(n),"mu"); 4723 end 4724 4725 @ Region 4 of |eqtb| contains the local quantities defined here. The 4726 bulk of this region is taken up by five tables that are indexed by eight-bit 4727 characters; these tables are important to both the syntactic and semantic 4728 portions of \TeX. There are also a bunch of special things like font and 4729 token parameters, as well as the tables of \.{\\toks} and \.{\\box} 4730 registers. 4731 4732 @d par_shape_loc=local_base {specifies paragraph shape} 4733 @d output_routine_loc=local_base+1 {points to token list for \.{\\output}} 4734 @d every_par_loc=local_base+2 {points to token list for \.{\\everypar}} 4735 @d every_math_loc=local_base+3 {points to token list for \.{\\everymath}} 4736 @d every_display_loc=local_base+4 {points to token list for \.{\\everydisplay}} 4737 @d every_hbox_loc=local_base+5 {points to token list for \.{\\everyhbox}} 4738 @d every_vbox_loc=local_base+6 {points to token list for \.{\\everyvbox}} 4739 @d every_job_loc=local_base+7 {points to token list for \.{\\everyjob}} 4740 @d every_cr_loc=local_base+8 {points to token list for \.{\\everycr}} 4741 @d err_help_loc=local_base+9 {points to token list for \.{\\errhelp}} 4742 @d toks_base=local_base+10 {table of 256 token list registers} 4743 @d box_base=toks_base+256 {table of 256 box registers} 4744 @d cur_font_loc=box_base+256 {internal font number outside math mode} 4745 @d math_font_base=cur_font_loc+1 {table of 48 math font numbers} 4746 @d cat_code_base=math_font_base+48 4747 {table of 256 command codes (the ``catcodes'')} 4748 @d lc_code_base=cat_code_base+256 {table of 256 lowercase mappings} 4749 @d uc_code_base=lc_code_base+256 {table of 256 uppercase mappings} 4750 @d sf_code_base=uc_code_base+256 {table of 256 spacefactor mappings} 4751 @d math_code_base=sf_code_base+256 {table of 256 math mode mappings} 4752 @d int_base=math_code_base+256 {beginning of region 5} 4753 @# 4754 @d par_shape_ptr==equiv(par_shape_loc) 4755 @d output_routine==equiv(output_routine_loc) 4756 @d every_par==equiv(every_par_loc) 4757 @d every_math==equiv(every_math_loc) 4758 @d every_display==equiv(every_display_loc) 4759 @d every_hbox==equiv(every_hbox_loc) 4760 @d every_vbox==equiv(every_vbox_loc) 4761 @d every_job==equiv(every_job_loc) 4762 @d every_cr==equiv(every_cr_loc) 4763 @d err_help==equiv(err_help_loc) 4764 @d toks(#)==equiv(toks_base+#) 4765 @d box(#)==equiv(box_base+#) 4766 @d cur_font==equiv(cur_font_loc) 4767 @d fam_fnt(#)==equiv(math_font_base+#) 4768 @d cat_code(#)==equiv(cat_code_base+#) 4769 @d lc_code(#)==equiv(lc_code_base+#) 4770 @d uc_code(#)==equiv(uc_code_base+#) 4771 @d sf_code(#)==equiv(sf_code_base+#) 4772 @d math_code(#)==equiv(math_code_base+#) 4773 {Note: |math_code(c)| is the true math code plus |min_halfword|} 4774 4775 @<Put each...@>= 4776 primitive("output",assign_toks,output_routine_loc); 4777 @!@:output_}{\.{\\output} primitive@> 4778 primitive("everypar",assign_toks,every_par_loc); 4779 @!@:every_par_}{\.{\\everypar} primitive@> 4780 primitive("everymath",assign_toks,every_math_loc); 4781 @!@:every_math_}{\.{\\everymath} primitive@> 4782 primitive("everydisplay",assign_toks,every_display_loc); 4783 @!@:every_display_}{\.{\\everydisplay} primitive@> 4784 primitive("everyhbox",assign_toks,every_hbox_loc); 4785 @!@:every_hbox_}{\.{\\everyhbox} primitive@> 4786 primitive("everyvbox",assign_toks,every_vbox_loc); 4787 @!@:every_vbox_}{\.{\\everyvbox} primitive@> 4788 primitive("everyjob",assign_toks,every_job_loc); 4789 @!@:every_job_}{\.{\\everyjob} primitive@> 4790 primitive("everycr",assign_toks,every_cr_loc); 4791 @!@:every_cr_}{\.{\\everycr} primitive@> 4792 primitive("errhelp",assign_toks,err_help_loc); 4793 @!@:err_help_}{\.{\\errhelp} primitive@> 4794 4795 @ @<Cases of |print_cmd_chr|...@>= 4796 assign_toks: if chr_code>=toks_base then 4797 begin print_esc("toks"); print_int(chr_code-toks_base); 4798 end 4799 else case chr_code of 4800 output_routine_loc: print_esc("output"); 4801 every_par_loc: print_esc("everypar"); 4802 every_math_loc: print_esc("everymath"); 4803 every_display_loc: print_esc("everydisplay"); 4804 every_hbox_loc: print_esc("everyhbox"); 4805 every_vbox_loc: print_esc("everyvbox"); 4806 every_job_loc: print_esc("everyjob"); 4807 every_cr_loc: print_esc("everycr"); 4808 othercases print_esc("errhelp") 4809 endcases; 4810 4811 @ We initialize most things to null or undefined values. An undefined font 4812 is represented by the internal code |font_base|. 4813 4814 However, the character code tables are given initial values based on the 4815 conventional interpretation of ASCII code. These initial values should 4816 not be changed when \TeX\ is adapted for use with non-English languages; 4817 all changes to the initialization conventions should be made in format 4818 packages, not in \TeX\ itself, so that global interchange of formats is 4819 possible. 4820 4821 @d null_font==font_base 4822 @d var_code==@'70000 {math code meaning ``use the current family''} 4823 4824 @<Initialize table entries...@>= 4825 par_shape_ptr:=null; eq_type(par_shape_loc):=shape_ref; 4826 eq_level(par_shape_loc):=level_one;@/ 4827 for k:=output_routine_loc to toks_base+255 do 4828 eqtb[k]:=eqtb[undefined_control_sequence]; 4829 box(0):=null; eq_type(box_base):=box_ref; eq_level(box_base):=level_one; 4830 for k:=box_base+1 to box_base+255 do eqtb[k]:=eqtb[box_base]; 4831 cur_font:=null_font; eq_type(cur_font_loc):=data; 4832 eq_level(cur_font_loc):=level_one;@/ 4833 for k:=math_font_base to math_font_base+47 do eqtb[k]:=eqtb[cur_font_loc]; 4834 equiv(cat_code_base):=0; eq_type(cat_code_base):=data; 4835 eq_level(cat_code_base):=level_one;@/ 4836 for k:=cat_code_base+1 to int_base-1 do eqtb[k]:=eqtb[cat_code_base]; 4837 for k:=0 to 255 do 4838 begin cat_code(k):=other_char; math_code(k):=hi(k); sf_code(k):=1000; 4839 end; 4840 cat_code(carriage_return):=car_ret; cat_code(" "):=spacer; 4841 cat_code("\"):=escape; cat_code("%"):=comment; 4842 cat_code(invalid_code):=invalid_char; cat_code(null_code):=ignore; 4843 for k:="0" to "9" do math_code(k):=hi(k+var_code); 4844 for k:="A" to "Z" do 4845 begin cat_code(k):=letter; cat_code(k+"a"-"A"):=letter;@/ 4846 math_code(k):=hi(k+var_code+@"100); 4847 math_code(k+"a"-"A"):=hi(k+"a"-"A"+var_code+@"100);@/ 4848 lc_code(k):=k+"a"-"A"; lc_code(k+"a"-"A"):=k+"a"-"A";@/ 4849 uc_code(k):=k; uc_code(k+"a"-"A"):=k;@/ 4850 sf_code(k):=999; 4851 end; 4852 4853 @ @<Show equivalent |n|, in region 4@>= 4854 if n=par_shape_loc then 4855 begin print_esc("parshape"); print_char("="); 4856 if par_shape_ptr=null then print_char("0") 4857 else print_int(info(par_shape_ptr)); 4858 end 4859 else if n<toks_base then 4860 begin print_cmd_chr(assign_toks,n); print_char("="); 4861 if equiv(n)<>null then show_token_list(link(equiv(n)),null,32); 4862 end 4863 else if n<box_base then 4864 begin print_esc("toks"); print_int(n-toks_base); print_char("="); 4865 if equiv(n)<>null then show_token_list(link(equiv(n)),null,32); 4866 end 4867 else if n<cur_font_loc then 4868 begin print_esc("box"); print_int(n-box_base); print_char("="); 4869 if equiv(n)=null then print("void") 4870 else begin depth_threshold:=0; breadth_max:=1; show_node_list(equiv(n)); 4871 end; 4872 end 4873 else if n<cat_code_base then @<Show the font identifier in |eqtb[n]|@> 4874 else @<Show the halfword code in |eqtb[n]|@> 4875 4876 @ @<Show the font identifier in |eqtb[n]|@>= 4877 begin if n=cur_font_loc then print("current font") 4878 else if n<math_font_base+16 then 4879 begin print_esc("textfont"); print_int(n-math_font_base); 4880 end 4881 else if n<math_font_base+32 then 4882 begin print_esc("scriptfont"); print_int(n-math_font_base-16); 4883 end 4884 else begin print_esc("scriptscriptfont"); print_int(n-math_font_base-32); 4885 end; 4886 print_char("=");@/ 4887 print_esc(hash[font_id_base+equiv(n)].rh); 4888 {that's |font_id_text(equiv(n))|} 4889 end 4890 4891 @ @<Show the halfword code in |eqtb[n]|@>= 4892 if n<math_code_base then 4893 begin if n<lc_code_base then 4894 begin print_esc("catcode"); print_int(n-cat_code_base); 4895 end 4896 else if n<uc_code_base then 4897 begin print_esc("lccode"); print_int(n-lc_code_base); 4898 end 4899 else if n<sf_code_base then 4900 begin print_esc("uccode"); print_int(n-uc_code_base); 4901 end 4902 else begin print_esc("sfcode"); print_int(n-sf_code_base); 4903 end; 4904 print_char("="); print_int(equiv(n)); 4905 end 4906 else begin print_esc("mathcode"); print_int(n-math_code_base); 4907 print_char("="); print_int(ho(equiv(n))); 4908 end 4909 4910 @ Region 5 of |eqtb| contains the integer parameters and registers defined 4911 here, as well as the |del_code| table. The latter table differs from the 4912 |cat_code..math_code| tables that precede it, since delimiter codes are 4913 fullword integers while the other kinds of codes occupy at most a 4914 halfword. This is what makes region~5 different from region~4. We will 4915 store the |eq_level| information in an auxiliary array of quarterwords 4916 that will be defined later. 4917 4918 @d pretolerance_code=0 {badness tolerance before hyphenation} 4919 @d tolerance_code=1 {badness tolerance after hyphenation} 4920 @d line_penalty_code=2 {added to the badness of every line} 4921 @d hyphen_penalty_code=3 {penalty for break after discretionary hyphen} 4922 @d ex_hyphen_penalty_code=4 {penalty for break after explicit hyphen} 4923 @d club_penalty_code=5 {penalty for creating a club line} 4924 @d widow_penalty_code=6 {penalty for creating a widow line} 4925 @d display_widow_penalty_code=7 {ditto, just before a display} 4926 @d broken_penalty_code=8 {penalty for breaking a page at a broken line} 4927 @d bin_op_penalty_code=9 {penalty for breaking after a binary operation} 4928 @d rel_penalty_code=10 {penalty for breaking after a relation} 4929 @d pre_display_penalty_code=11 4930 {penalty for breaking just before a displayed formula} 4931 @d post_display_penalty_code=12 4932 {penalty for breaking just after a displayed formula} 4933 @d inter_line_penalty_code=13 {additional penalty between lines} 4934 @d double_hyphen_demerits_code=14 {demerits for double hyphen break} 4935 @d final_hyphen_demerits_code=15 {demerits for final hyphen break} 4936 @d adj_demerits_code=16 {demerits for adjacent incompatible lines} 4937 @d mag_code=17 {magnification ratio} 4938 @d delimiter_factor_code=18 {ratio for variable-size delimiters} 4939 @d looseness_code=19 {change in number of lines for a paragraph} 4940 @d time_code=20 {current time of day} 4941 @d day_code=21 {current day of the month} 4942 @d month_code=22 {current month of the year} 4943 @d year_code=23 {current year of our Lord} 4944 @d show_box_breadth_code=24 {nodes per level in |show_box|} 4945 @d show_box_depth_code=25 {maximum level in |show_box|} 4946 @d hbadness_code=26 {hboxes exceeding this badness will be shown by |hpack|} 4947 @d vbadness_code=27 {vboxes exceeding this badness will be shown by |vpack|} 4948 @d pausing_code=28 {pause after each line is read from a file} 4949 @d tracing_online_code=29 {show diagnostic output on terminal} 4950 @d tracing_macros_code=30 {show macros as they are being expanded} 4951 @d tracing_stats_code=31 {show memory usage if \TeX\ knows it} 4952 @d tracing_paragraphs_code=32 {show line-break calculations} 4953 @d tracing_pages_code=33 {show page-break calculations} 4954 @d tracing_output_code=34 {show boxes when they are shipped out} 4955 @d tracing_lost_chars_code=35 {show characters that aren't in the font} 4956 @d tracing_commands_code=36 {show command codes at |big_switch|} 4957 @d tracing_restores_code=37 {show equivalents when they are restored} 4958 @d uc_hyph_code=38 {hyphenate words beginning with a capital letter} 4959 @d output_penalty_code=39 {penalty found at current page break} 4960 @d max_dead_cycles_code=40 {bound on consecutive dead cycles of output} 4961 @d hang_after_code=41 {hanging indentation changes after this many lines} 4962 @d floating_penalty_code=42 {penalty for insertions held over after a split} 4963 @d global_defs_code=43 {override \.{\\global} specifications} 4964 @d cur_fam_code=44 {current family} 4965 @d escape_char_code=45 {escape character for token output} 4966 @d default_hyphen_char_code=46 {value of \.{\\hyphenchar} when a font is loaded} 4967 @d default_skew_char_code=47 {value of \.{\\skewchar} when a font is loaded} 4968 @d end_line_char_code=48 {character placed at the right end of the buffer} 4969 @d new_line_char_code=49 {character that prints as |print_ln|} 4970 @d language_code=50 {current hyphenation table} 4971 @d left_hyphen_min_code=51 {minimum left hyphenation fragment size} 4972 @d right_hyphen_min_code=52 {minimum right hyphenation fragment size} 4973 @d holding_inserts_code=53 {do not remove insertion nodes from \.{\\box255}} 4974 @d error_context_lines_code=54 {maximum intermediate line pairs shown} 4975 @d int_pars=55 {total number of integer parameters} 4976 @d count_base=int_base+int_pars {256 user \.{\\count} registers} 4977 @d del_code_base=count_base+256 {256 delimiter code mappings} 4978 @d dimen_base=del_code_base+256 {beginning of region 6} 4979 @# 4980 @d del_code(#)==eqtb[del_code_base+#].int 4981 @d count(#)==eqtb[count_base+#].int 4982 @d int_par(#)==eqtb[int_base+#].int {an integer parameter} 4983 @d pretolerance==int_par(pretolerance_code) 4984 @d tolerance==int_par(tolerance_code) 4985 @d line_penalty==int_par(line_penalty_code) 4986 @d hyphen_penalty==int_par(hyphen_penalty_code) 4987 @d ex_hyphen_penalty==int_par(ex_hyphen_penalty_code) 4988 @d club_penalty==int_par(club_penalty_code) 4989 @d widow_penalty==int_par(widow_penalty_code) 4990 @d display_widow_penalty==int_par(display_widow_penalty_code) 4991 @d broken_penalty==int_par(broken_penalty_code) 4992 @d bin_op_penalty==int_par(bin_op_penalty_code) 4993 @d rel_penalty==int_par(rel_penalty_code) 4994 @d pre_display_penalty==int_par(pre_display_penalty_code) 4995 @d post_display_penalty==int_par(post_display_penalty_code) 4996 @d inter_line_penalty==int_par(inter_line_penalty_code) 4997 @d double_hyphen_demerits==int_par(double_hyphen_demerits_code) 4998 @d final_hyphen_demerits==int_par(final_hyphen_demerits_code) 4999 @d adj_demerits==int_par(adj_demerits_code) 5000 @d mag==int_par(mag_code) 5001 @d delimiter_factor==int_par(delimiter_factor_code) 5002 @d looseness==int_par(looseness_code) 5003 @d time==int_par(time_code) 5004 @d day==int_par(day_code) 5005 @d month==int_par(month_code) 5006 @d year==int_par(year_code) 5007 @d show_box_breadth==int_par(show_box_breadth_code) 5008 @d show_box_depth==int_par(show_box_depth_code) 5009 @d hbadness==int_par(hbadness_code) 5010 @d vbadness==int_par(vbadness_code) 5011 @d pausing==int_par(pausing_code) 5012 @d tracing_online==int_par(tracing_online_code) 5013 @d tracing_macros==int_par(tracing_macros_code) 5014 @d tracing_stats==int_par(tracing_stats_code) 5015 @d tracing_paragraphs==int_par(tracing_paragraphs_code) 5016 @d tracing_pages==int_par(tracing_pages_code) 5017 @d tracing_output==int_par(tracing_output_code) 5018 @d tracing_lost_chars==int_par(tracing_lost_chars_code) 5019 @d tracing_commands==int_par(tracing_commands_code) 5020 @d tracing_restores==int_par(tracing_restores_code) 5021 @d uc_hyph==int_par(uc_hyph_code) 5022 @d output_penalty==int_par(output_penalty_code) 5023 @d max_dead_cycles==int_par(max_dead_cycles_code) 5024 @d hang_after==int_par(hang_after_code) 5025 @d floating_penalty==int_par(floating_penalty_code) 5026 @d global_defs==int_par(global_defs_code) 5027 @d cur_fam==int_par(cur_fam_code) 5028 @d escape_char==int_par(escape_char_code) 5029 @d default_hyphen_char==int_par(default_hyphen_char_code) 5030 @d default_skew_char==int_par(default_skew_char_code) 5031 @d end_line_char==int_par(end_line_char_code) 5032 @d new_line_char==int_par(new_line_char_code) 5033 @d language==int_par(language_code) 5034 @d left_hyphen_min==int_par(left_hyphen_min_code) 5035 @d right_hyphen_min==int_par(right_hyphen_min_code) 5036 @d holding_inserts==int_par(holding_inserts_code) 5037 @d error_context_lines==int_par(error_context_lines_code) 5038 5039 @<Assign the values |depth_threshold:=show_box_depth|...@>= 5040 depth_threshold:=show_box_depth; 5041 breadth_max:=show_box_breadth 5042 5043 @ We can print the symbolic name of an integer parameter as follows. 5044 5045 @p procedure print_param(@!n:integer); 5046 begin case n of 5047 pretolerance_code:print_esc("pretolerance"); 5048 tolerance_code:print_esc("tolerance"); 5049 line_penalty_code:print_esc("linepenalty"); 5050 hyphen_penalty_code:print_esc("hyphenpenalty"); 5051 ex_hyphen_penalty_code:print_esc("exhyphenpenalty"); 5052 club_penalty_code:print_esc("clubpenalty"); 5053 widow_penalty_code:print_esc("widowpenalty"); 5054 display_widow_penalty_code:print_esc("displaywidowpenalty"); 5055 broken_penalty_code:print_esc("brokenpenalty"); 5056 bin_op_penalty_code:print_esc("binoppenalty"); 5057 rel_penalty_code:print_esc("relpenalty"); 5058 pre_display_penalty_code:print_esc("predisplaypenalty"); 5059 post_display_penalty_code:print_esc("postdisplaypenalty"); 5060 inter_line_penalty_code:print_esc("interlinepenalty"); 5061 double_hyphen_demerits_code:print_esc("doublehyphendemerits"); 5062 final_hyphen_demerits_code:print_esc("finalhyphendemerits"); 5063 adj_demerits_code:print_esc("adjdemerits"); 5064 mag_code:print_esc("mag"); 5065 delimiter_factor_code:print_esc("delimiterfactor"); 5066 looseness_code:print_esc("looseness"); 5067 time_code:print_esc("time"); 5068 day_code:print_esc("day"); 5069 month_code:print_esc("month"); 5070 year_code:print_esc("year"); 5071 show_box_breadth_code:print_esc("showboxbreadth"); 5072 show_box_depth_code:print_esc("showboxdepth"); 5073 hbadness_code:print_esc("hbadness"); 5074 vbadness_code:print_esc("vbadness"); 5075 pausing_code:print_esc("pausing"); 5076 tracing_online_code:print_esc("tracingonline"); 5077 tracing_macros_code:print_esc("tracingmacros"); 5078 tracing_stats_code:print_esc("tracingstats"); 5079 tracing_paragraphs_code:print_esc("tracingparagraphs"); 5080 tracing_pages_code:print_esc("tracingpages"); 5081 tracing_output_code:print_esc("tracingoutput"); 5082 tracing_lost_chars_code:print_esc("tracinglostchars"); 5083 tracing_commands_code:print_esc("tracingcommands"); 5084 tracing_restores_code:print_esc("tracingrestores"); 5085 uc_hyph_code:print_esc("uchyph"); 5086 output_penalty_code:print_esc("outputpenalty"); 5087 max_dead_cycles_code:print_esc("maxdeadcycles"); 5088 hang_after_code:print_esc("hangafter"); 5089 floating_penalty_code:print_esc("floatingpenalty"); 5090 global_defs_code:print_esc("globaldefs"); 5091 cur_fam_code:print_esc("fam"); 5092 escape_char_code:print_esc("escapechar"); 5093 default_hyphen_char_code:print_esc("defaulthyphenchar"); 5094 default_skew_char_code:print_esc("defaultskewchar"); 5095 end_line_char_code:print_esc("endlinechar"); 5096 new_line_char_code:print_esc("newlinechar"); 5097 language_code:print_esc("language"); 5098 left_hyphen_min_code:print_esc("lefthyphenmin"); 5099 right_hyphen_min_code:print_esc("righthyphenmin"); 5100 holding_inserts_code:print_esc("holdinginserts"); 5101 error_context_lines_code:print_esc("errorcontextlines"); 5102 othercases print("[unknown integer parameter!]") 5103 endcases; 5104 end; 5105 5106 @ The integer parameter names must be entered into the hash table. 5107 5108 @<Put each...@>= 5109 primitive("pretolerance",assign_int,int_base+pretolerance_code);@/ 5110 @!@:pretolerance_}{\.{\\pretolerance} primitive@> 5111 primitive("tolerance",assign_int,int_base+tolerance_code);@/ 5112 @!@:tolerance_}{\.{\\tolerance} primitive@> 5113 primitive("linepenalty",assign_int,int_base+line_penalty_code);@/ 5114 @!@:line_penalty_}{\.{\\linepenalty} primitive@> 5115 primitive("hyphenpenalty",assign_int,int_base+hyphen_penalty_code);@/ 5116 @!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@> 5117 primitive("exhyphenpenalty",assign_int,int_base+ex_hyphen_penalty_code);@/ 5118 @!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@> 5119 primitive("clubpenalty",assign_int,int_base+club_penalty_code);@/ 5120 @!@:club_penalty_}{\.{\\clubpenalty} primitive@> 5121 primitive("widowpenalty",assign_int,int_base+widow_penalty_code);@/ 5122 @!@:widow_penalty_}{\.{\\widowpenalty} primitive@> 5123 primitive("displaywidowpenalty", 5124 assign_int,int_base+display_widow_penalty_code);@/ 5125 @!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@> 5126 primitive("brokenpenalty",assign_int,int_base+broken_penalty_code);@/ 5127 @!@:broken_penalty_}{\.{\\brokenpenalty} primitive@> 5128 primitive("binoppenalty",assign_int,int_base+bin_op_penalty_code);@/ 5129 @!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@> 5130 primitive("relpenalty",assign_int,int_base+rel_penalty_code);@/ 5131 @!@:rel_penalty_}{\.{\\relpenalty} primitive@> 5132 primitive("predisplaypenalty",assign_int,int_base+pre_display_penalty_code);@/ 5133 @!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@> 5134 primitive("postdisplaypenalty",assign_int,int_base+post_display_penalty_code);@/ 5135 @!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@> 5136 primitive("interlinepenalty",assign_int,int_base+inter_line_penalty_code);@/ 5137 @!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@> 5138 primitive("doublehyphendemerits", 5139 assign_int,int_base+double_hyphen_demerits_code);@/ 5140 @!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@> 5141 primitive("finalhyphendemerits", 5142 assign_int,int_base+final_hyphen_demerits_code);@/ 5143 @!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@> 5144 primitive("adjdemerits",assign_int,int_base+adj_demerits_code);@/ 5145 @!@:adj_demerits_}{\.{\\adjdemerits} primitive@> 5146 primitive("mag",assign_int,int_base+mag_code);@/ 5147 @!@:mag_}{\.{\\mag} primitive@> 5148 primitive("delimiterfactor",assign_int,int_base+delimiter_factor_code);@/ 5149 @!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@> 5150 primitive("looseness",assign_int,int_base+looseness_code);@/ 5151 @!@:looseness_}{\.{\\looseness} primitive@> 5152 primitive("time",assign_int,int_base+time_code);@/ 5153 @!@:time_}{\.{\\time} primitive@> 5154 primitive("day",assign_int,int_base+day_code);@/ 5155 @!@:day_}{\.{\\day} primitive@> 5156 primitive("month",assign_int,int_base+month_code);@/ 5157 @!@:month_}{\.{\\month} primitive@> 5158 primitive("year",assign_int,int_base+year_code);@/ 5159 @!@:year_}{\.{\\year} primitive@> 5160 primitive("showboxbreadth",assign_int,int_base+show_box_breadth_code);@/ 5161 @!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@> 5162 primitive("showboxdepth",assign_int,int_base+show_box_depth_code);@/ 5163 @!@:show_box_depth_}{\.{\\showboxdepth} primitive@> 5164 primitive("hbadness",assign_int,int_base+hbadness_code);@/ 5165 @!@:hbadness_}{\.{\\hbadness} primitive@> 5166 primitive("vbadness",assign_int,int_base+vbadness_code);@/ 5167 @!@:vbadness_}{\.{\\vbadness} primitive@> 5168 primitive("pausing",assign_int,int_base+pausing_code);@/ 5169 @!@:pausing_}{\.{\\pausing} primitive@> 5170 primitive("tracingonline",assign_int,int_base+tracing_online_code);@/ 5171 @!@:tracing_online_}{\.{\\tracingonline} primitive@> 5172 primitive("tracingmacros",assign_int,int_base+tracing_macros_code);@/ 5173 @!@:tracing_macros_}{\.{\\tracingmacros} primitive@> 5174 primitive("tracingstats",assign_int,int_base+tracing_stats_code);@/ 5175 @!@:tracing_stats_}{\.{\\tracingstats} primitive@> 5176 primitive("tracingparagraphs",assign_int,int_base+tracing_paragraphs_code);@/ 5177 @!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@> 5178 primitive("tracingpages",assign_int,int_base+tracing_pages_code);@/ 5179 @!@:tracing_pages_}{\.{\\tracingpages} primitive@> 5180 primitive("tracingoutput",assign_int,int_base+tracing_output_code);@/ 5181 @!@:tracing_output_}{\.{\\tracingoutput} primitive@> 5182 primitive("tracinglostchars",assign_int,int_base+tracing_lost_chars_code);@/ 5183 @!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@> 5184 primitive("tracingcommands",assign_int,int_base+tracing_commands_code);@/ 5185 @!@:tracing_commands_}{\.{\\tracingcommands} primitive@> 5186 primitive("tracingrestores",assign_int,int_base+tracing_restores_code);@/ 5187 @!@:tracing_restores_}{\.{\\tracingrestores} primitive@> 5188 primitive("uchyph",assign_int,int_base+uc_hyph_code);@/ 5189 @!@:uc_hyph_}{\.{\\uchyph} primitive@> 5190 primitive("outputpenalty",assign_int,int_base+output_penalty_code);@/ 5191 @!@:output_penalty_}{\.{\\outputpenalty} primitive@> 5192 primitive("maxdeadcycles",assign_int,int_base+max_dead_cycles_code);@/ 5193 @!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@> 5194 primitive("hangafter",assign_int,int_base+hang_after_code);@/ 5195 @!@:hang_after_}{\.{\\hangafter} primitive@> 5196 primitive("floatingpenalty",assign_int,int_base+floating_penalty_code);@/ 5197 @!@:floating_penalty_}{\.{\\floatingpenalty} primitive@> 5198 primitive("globaldefs",assign_int,int_base+global_defs_code);@/ 5199 @!@:global_defs_}{\.{\\globaldefs} primitive@> 5200 primitive("fam",assign_int,int_base+cur_fam_code);@/ 5201 @!@:fam_}{\.{\\fam} primitive@> 5202 primitive("escapechar",assign_int,int_base+escape_char_code);@/ 5203 @!@:escape_char_}{\.{\\escapechar} primitive@> 5204 primitive("defaulthyphenchar",assign_int,int_base+default_hyphen_char_code);@/ 5205 @!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@> 5206 primitive("defaultskewchar",assign_int,int_base+default_skew_char_code);@/ 5207 @!@:default_skew_char_}{\.{\\defaultskewchar} primitive@> 5208 primitive("endlinechar",assign_int,int_base+end_line_char_code);@/ 5209 @!@:end_line_char_}{\.{\\endlinechar} primitive@> 5210 primitive("newlinechar",assign_int,int_base+new_line_char_code);@/ 5211 @!@:new_line_char_}{\.{\\newlinechar} primitive@> 5212 primitive("language",assign_int,int_base+language_code);@/ 5213 @!@:language_}{\.{\\language} primitive@> 5214 primitive("lefthyphenmin",assign_int,int_base+left_hyphen_min_code);@/ 5215 @!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@> 5216 primitive("righthyphenmin",assign_int,int_base+right_hyphen_min_code);@/ 5217 @!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@> 5218 primitive("holdinginserts",assign_int,int_base+holding_inserts_code);@/ 5219 @!@:holding_inserts_}{\.{\\holdinginserts} primitive@> 5220 primitive("errorcontextlines",assign_int,int_base+error_context_lines_code);@/ 5221 @!@:error_context_lines_}{\.{\\errorcontextlines} primitive@> 5222 5223 @ @<Cases of |print_cmd_chr|...@>= 5224 assign_int: if chr_code<count_base then print_param(chr_code-int_base) 5225 else begin print_esc("count"); print_int(chr_code-count_base); 5226 end; 5227 5228 @ The integer parameters should really be initialized by a macro package; 5229 the following initialization does the minimum to keep \TeX\ from 5230 complete failure. 5231 @^null delimiter@> 5232 5233 @<Initialize table entries...@>= 5234 for k:=int_base to del_code_base-1 do eqtb[k].int:=0; 5235 mag:=1000; tolerance:=10000; hang_after:=1; max_dead_cycles:=25; 5236 escape_char:="\"; end_line_char:=carriage_return; 5237 for k:=0 to 255 do del_code(k):=-1; 5238 del_code("."):=0; {this null delimiter is used in error recovery} 5239 5240 @ The following procedure, which is called just before \TeX\ initializes its 5241 input and output, establishes the initial values of the date and time. 5242 @^system dependencies@> 5243 Since standard \PASCAL\ cannot provide such information, something special 5244 is needed. The program here simply assumes that suitable values appear in 5245 the global variables \\{sys\_time}, \\{sys\_day}, \\{sys\_month}, and 5246 \\{sys\_year} (which are initialized to noon on 4 July 1776, 5247 in case the implementor is careless). 5248 5249 @p procedure fix_date_and_time; 5250 begin sys_time:=12*60; 5251 sys_day:=4; sys_month:=7; sys_year:=1776; {self-evident truths} 5252 time:=sys_time; {minutes since midnight} 5253 day:=sys_day; {day of the month} 5254 month:=sys_month; {month of the year} 5255 year:=sys_year; {Anno Domini} 5256 end; 5257 5258 @ @<Show equivalent |n|, in region 5@>= 5259 begin if n<count_base then print_param(n-int_base) 5260 else if n<del_code_base then 5261 begin print_esc("count"); print_int(n-count_base); 5262 end 5263 else begin print_esc("delcode"); print_int(n-del_code_base); 5264 end; 5265 print_char("="); print_int(eqtb[n].int); 5266 end 5267 5268 @ @<Set variable |c| to the current escape character@>=c:=escape_char 5269 5270 @ @<Character |s| is the current new-line character@>=s=new_line_char 5271 5272 @ \TeX\ is occasionally supposed to print diagnostic information that 5273 goes only into the transcript file, unless |tracing_online| is positive. 5274 Here are two routines that adjust the destination of print commands: 5275 5276 @p procedure begin_diagnostic; {prepare to do some tracing} 5277 begin old_setting:=selector; 5278 if (tracing_online<=0)and(selector=term_and_log) then 5279 begin decr(selector); 5280 if history=spotless then history:=warning_issued; 5281 end; 5282 end; 5283 @# 5284 procedure end_diagnostic(@!blank_line:boolean); 5285 {restore proper conditions after tracing} 5286 begin print_nl(""); 5287 if blank_line then print_ln; 5288 selector:=old_setting; 5289 end; 5290 5291 @ Of course we had better declare a few more global variables, if the previous 5292 routines are going to work. 5293 5294 @<Glob...@>= 5295 @!old_setting:0..max_selector; 5296 @!sys_time,@!sys_day,@!sys_month,@!sys_year:integer; 5297 {date and time supplied by external system} 5298 5299 @ The final region of |eqtb| contains the dimension parameters defined 5300 here, and the 256 \.{\\dimen} registers. 5301 5302 @d par_indent_code=0 {indentation of paragraphs} 5303 @d math_surround_code=1 {space around math in text} 5304 @d line_skip_limit_code=2 {threshold for |line_skip| instead of |baseline_skip|} 5305 @d hsize_code=3 {line width in horizontal mode} 5306 @d vsize_code=4 {page height in vertical mode} 5307 @d max_depth_code=5 {maximum depth of boxes on main pages} 5308 @d split_max_depth_code=6 {maximum depth of boxes on split pages} 5309 @d box_max_depth_code=7 {maximum depth of explicit vboxes} 5310 @d hfuzz_code=8 {tolerance for overfull hbox messages} 5311 @d vfuzz_code=9 {tolerance for overfull vbox messages} 5312 @d delimiter_shortfall_code=10 {maximum amount uncovered by variable delimiters} 5313 @d null_delimiter_space_code=11 {blank space in null delimiters} 5314 @d script_space_code=12 {extra space after subscript or superscript} 5315 @d pre_display_size_code=13 {length of text preceding a display} 5316 @d display_width_code=14 {length of line for displayed equation} 5317 @d display_indent_code=15 {indentation of line for displayed equation} 5318 @d overfull_rule_code=16 {width of rule that identifies overfull hboxes} 5319 @d hang_indent_code=17 {amount of hanging indentation} 5320 @d h_offset_code=18 {amount of horizontal offset when shipping pages out} 5321 @d v_offset_code=19 {amount of vertical offset when shipping pages out} 5322 @d emergency_stretch_code=20 {reduces badnesses on final pass of line-breaking} 5323 @d dimen_pars=21 {total number of dimension parameters} 5324 @d scaled_base=dimen_base+dimen_pars 5325 {table of 256 user-defined \.{\\dimen} registers} 5326 @d eqtb_size=scaled_base+255 {largest subscript of |eqtb|} 5327 @# 5328 @d dimen(#)==eqtb[scaled_base+#].sc 5329 @d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity} 5330 @d par_indent==dimen_par(par_indent_code) 5331 @d math_surround==dimen_par(math_surround_code) 5332 @d line_skip_limit==dimen_par(line_skip_limit_code) 5333 @d hsize==dimen_par(hsize_code) 5334 @d vsize==dimen_par(vsize_code) 5335 @d max_depth==dimen_par(max_depth_code) 5336 @d split_max_depth==dimen_par(split_max_depth_code) 5337 @d box_max_depth==dimen_par(box_max_depth_code) 5338 @d hfuzz==dimen_par(hfuzz_code) 5339 @d vfuzz==dimen_par(vfuzz_code) 5340 @d delimiter_shortfall==dimen_par(delimiter_shortfall_code) 5341 @d null_delimiter_space==dimen_par(null_delimiter_space_code) 5342 @d script_space==dimen_par(script_space_code) 5343 @d pre_display_size==dimen_par(pre_display_size_code) 5344 @d display_width==dimen_par(display_width_code) 5345 @d display_indent==dimen_par(display_indent_code) 5346 @d overfull_rule==dimen_par(overfull_rule_code) 5347 @d hang_indent==dimen_par(hang_indent_code) 5348 @d h_offset==dimen_par(h_offset_code) 5349 @d v_offset==dimen_par(v_offset_code) 5350 @d emergency_stretch==dimen_par(emergency_stretch_code) 5351 5352 @p procedure print_length_param(@!n:integer); 5353 begin case n of 5354 par_indent_code:print_esc("parindent"); 5355 math_surround_code:print_esc("mathsurround"); 5356 line_skip_limit_code:print_esc("lineskiplimit"); 5357 hsize_code:print_esc("hsize"); 5358 vsize_code:print_esc("vsize"); 5359 max_depth_code:print_esc("maxdepth"); 5360 split_max_depth_code:print_esc("splitmaxdepth"); 5361 box_max_depth_code:print_esc("boxmaxdepth"); 5362 hfuzz_code:print_esc("hfuzz"); 5363 vfuzz_code:print_esc("vfuzz"); 5364 delimiter_shortfall_code:print_esc("delimitershortfall"); 5365 null_delimiter_space_code:print_esc("nulldelimiterspace"); 5366 script_space_code:print_esc("scriptspace"); 5367 pre_display_size_code:print_esc("predisplaysize"); 5368 display_width_code:print_esc("displaywidth"); 5369 display_indent_code:print_esc("displayindent"); 5370 overfull_rule_code:print_esc("overfullrule"); 5371 hang_indent_code:print_esc("hangindent"); 5372 h_offset_code:print_esc("hoffset"); 5373 v_offset_code:print_esc("voffset"); 5374 emergency_stretch_code:print_esc("emergencystretch"); 5375 othercases print("[unknown dimen parameter!]") 5376 endcases; 5377 end; 5378 5379 @ @<Put each...@>= 5380 primitive("parindent",assign_dimen,dimen_base+par_indent_code);@/ 5381 @!@:par_indent_}{\.{\\parindent} primitive@> 5382 primitive("mathsurround",assign_dimen,dimen_base+math_surround_code);@/ 5383 @!@:math_surround_}{\.{\\mathsurround} primitive@> 5384 primitive("lineskiplimit",assign_dimen,dimen_base+line_skip_limit_code);@/ 5385 @!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@> 5386 primitive("hsize",assign_dimen,dimen_base+hsize_code);@/ 5387 @!@:hsize_}{\.{\\hsize} primitive@> 5388 primitive("vsize",assign_dimen,dimen_base+vsize_code);@/ 5389 @!@:vsize_}{\.{\\vsize} primitive@> 5390 primitive("maxdepth",assign_dimen,dimen_base+max_depth_code);@/ 5391 @!@:max_depth_}{\.{\\maxdepth} primitive@> 5392 primitive("splitmaxdepth",assign_dimen,dimen_base+split_max_depth_code);@/ 5393 @!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@> 5394 primitive("boxmaxdepth",assign_dimen,dimen_base+box_max_depth_code);@/ 5395 @!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@> 5396 primitive("hfuzz",assign_dimen,dimen_base+hfuzz_code);@/ 5397 @!@:hfuzz_}{\.{\\hfuzz} primitive@> 5398 primitive("vfuzz",assign_dimen,dimen_base+vfuzz_code);@/ 5399 @!@:vfuzz_}{\.{\\vfuzz} primitive@> 5400 primitive("delimitershortfall", 5401 assign_dimen,dimen_base+delimiter_shortfall_code);@/ 5402 @!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@> 5403 primitive("nulldelimiterspace", 5404 assign_dimen,dimen_base+null_delimiter_space_code);@/ 5405 @!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@> 5406 primitive("scriptspace",assign_dimen,dimen_base+script_space_code);@/ 5407 @!@:script_space_}{\.{\\scriptspace} primitive@> 5408 primitive("predisplaysize",assign_dimen,dimen_base+pre_display_size_code);@/ 5409 @!@:pre_display_size_}{\.{\\predisplaysize} primitive@> 5410 primitive("displaywidth",assign_dimen,dimen_base+display_width_code);@/ 5411 @!@:display_width_}{\.{\\displaywidth} primitive@> 5412 primitive("displayindent",assign_dimen,dimen_base+display_indent_code);@/ 5413 @!@:display_indent_}{\.{\\displayindent} primitive@> 5414 primitive("overfullrule",assign_dimen,dimen_base+overfull_rule_code);@/ 5415 @!@:overfull_rule_}{\.{\\overfullrule} primitive@> 5416 primitive("hangindent",assign_dimen,dimen_base+hang_indent_code);@/ 5417 @!@:hang_indent_}{\.{\\hangindent} primitive@> 5418 primitive("hoffset",assign_dimen,dimen_base+h_offset_code);@/ 5419 @!@:h_offset_}{\.{\\hoffset} primitive@> 5420 primitive("voffset",assign_dimen,dimen_base+v_offset_code);@/ 5421 @!@:v_offset_}{\.{\\voffset} primitive@> 5422 primitive("emergencystretch",assign_dimen,dimen_base+emergency_stretch_code);@/ 5423 @!@:emergency_stretch_}{\.{\\emergencystretch} primitive@> 5424 5425 @ @<Cases of |print_cmd_chr|...@>= 5426 assign_dimen: if chr_code<scaled_base then 5427 print_length_param(chr_code-dimen_base) 5428 else begin print_esc("dimen"); print_int(chr_code-scaled_base); 5429 end; 5430 5431 @ @<Initialize table entries...@>= 5432 for k:=dimen_base to eqtb_size do eqtb[k].sc:=0; 5433 5434 @ @<Show equivalent |n|, in region 6@>= 5435 begin if n<scaled_base then print_length_param(n-dimen_base) 5436 else begin print_esc("dimen"); print_int(n-scaled_base); 5437 end; 5438 print_char("="); print_scaled(eqtb[n].sc); print("pt"); 5439 end 5440 5441 @ Here is a procedure that displays the contents of |eqtb[n]| 5442 symbolically. 5443 5444 @p@t\4@>@<Declare the procedure called |print_cmd_chr|@>@;@/ 5445 @!stat procedure show_eqtb(@!n:pointer); 5446 begin if n<active_base then print_char("?") {this can't happen} 5447 else if n<glue_base then @<Show equivalent |n|, in region 1 or 2@> 5448 else if n<local_base then @<Show equivalent |n|, in region 3@> 5449 else if n<int_base then @<Show equivalent |n|, in region 4@> 5450 else if n<dimen_base then @<Show equivalent |n|, in region 5@> 5451 else if n<=eqtb_size then @<Show equivalent |n|, in region 6@> 5452 else print_char("?"); {this can't happen either} 5453 end; 5454 tats 5455 5456 @ The last two regions of |eqtb| have fullword values instead of the 5457 three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary, 5458 but \TeX\ needs to store the |eq_level| information in another array 5459 called |xeq_level|. 5460 5461 @<Glob...@>= 5462 @!eqtb:array[active_base..eqtb_size] of memory_word; 5463 @!xeq_level:array[int_base..eqtb_size] of quarterword; 5464 5465 @ @<Set init...@>= 5466 for k:=int_base to eqtb_size do xeq_level[k]:=level_one; 5467 5468 @ When the debugging routine |search_mem| is looking for pointers having a 5469 given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the 5470 first part of region~4. 5471 5472 @<Search |eqtb| for equivalents equal to |p|@>= 5473 for q:=active_base to box_base+255 do 5474 begin if equiv(q)=p then 5475 begin print_nl("EQUIV("); print_int(q); print_char(")"); 5476 end; 5477 end 5478 5479 @* \[18] The hash table. 5480 Control sequences are stored and retrieved by means of a fairly standard hash 5481 table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C 5482 in {\sl The Art of Computer Programming\/}). Once a control sequence enters the 5483 table, it is never removed, because there are complicated situations 5484 involving \.{\\gdef} where the removal of a control sequence at the end of 5485 a group would be a mistake preventable only by the introduction of a 5486 complicated reference-count mechanism. 5487 5488 The actual sequence of letters forming a control sequence identifier is 5489 stored in the |str_pool| array together with all the other strings. An 5490 auxiliary array |hash| consists of items with two halfword fields per 5491 word. The first of these, called |next(p)|, points to the next identifier 5492 belonging to the same coalesced list as the identifier corresponding to~|p|; 5493 and the other, called |text(p)|, points to the |str_start| entry for 5494 |p|'s identifier. If position~|p| of the hash table is empty, we have 5495 |text(p)=0|; if position |p| is either empty or the end of a coalesced 5496 hash list, we have |next(p)=0|. An auxiliary pointer variable called 5497 |hash_used| is maintained in such a way that all locations |p>=hash_used| 5498 are nonempty. The global variable |cs_count| tells how many multiletter 5499 control sequences have been defined, if statistics are being kept. 5500 5501 A global boolean variable called |no_new_control_sequence| is set to 5502 |true| during the time that new hash table entries are forbidden. 5503 5504 @d next(#) == hash[#].lh {link for coalesced lists} 5505 @d text(#) == hash[#].rh {string number for control sequence name} 5506 @d hash_is_full == (hash_used=hash_base) {test if all positions are occupied} 5507 @d font_id_text(#) == text(font_id_base+#) {a frozen font identifier's name} 5508 5509 @<Glob...@>= 5510 @!hash: array[hash_base..undefined_control_sequence-1] of two_halves; 5511 {the hash table} 5512 @!hash_used:pointer; {allocation pointer for |hash|} 5513 @!no_new_control_sequence:boolean; {are new identifiers legal?} 5514 @!cs_count:integer; {total number of known identifiers} 5515 5516 @ @<Set init...@>= 5517 no_new_control_sequence:=true; {new identifiers are usually forbidden} 5518 next(hash_base):=0; text(hash_base):=0; 5519 for k:=hash_base+1 to undefined_control_sequence-1 do hash[k]:=hash[hash_base]; 5520 5521 @ @<Initialize table entries...@>= 5522 hash_used:=frozen_control_sequence; {nothing is used} 5523 cs_count:=0; 5524 eq_type(frozen_dont_expand):=dont_expand; 5525 text(frozen_dont_expand):="notexpanded:"; 5526 @.notexpanded:@> 5527 5528 @ Here is the subroutine that searches the hash table for an identifier 5529 that matches a given string of length |l>1| appearing in |buffer[j.. 5530 (j+l-1)]|. If the identifier is found, the corresponding hash table address 5531 is returned. Otherwise, if the global variable |no_new_control_sequence| 5532 is |true|, the dummy address |undefined_control_sequence| is returned. 5533 Otherwise the identifier is inserted into the hash table and its location 5534 is returned. 5535 5536 @p function id_lookup(@!j,@!l:integer):pointer; {search the hash table} 5537 label found; {go here if you found it} 5538 var h:integer; {hash code} 5539 @!d:integer; {number of characters in incomplete current string} 5540 @!p:pointer; {index in |hash| array} 5541 @!k:pointer; {index in |buffer| array} 5542 begin @<Compute the hash code |h|@>; 5543 p:=h+hash_base; {we start searching here; note that |0<=h<hash_prime|} 5544 loop@+begin if text(p)>0 then if length(text(p))=l then 5545 if str_eq_buf(text(p),j) then goto found; 5546 if next(p)=0 then 5547 begin if no_new_control_sequence then 5548 p:=undefined_control_sequence 5549 else @<Insert a new control sequence after |p|, then make 5550 |p| point to it@>; 5551 goto found; 5552 end; 5553 p:=next(p); 5554 end; 5555 found: id_lookup:=p; 5556 end; 5557 5558 @ @<Insert a new control...@>= 5559 begin if text(p)>0 then 5560 begin repeat if hash_is_full then overflow("hash size",hash_size); 5561 @:TeX capacity exceeded hash size}{\quad hash size@> 5562 decr(hash_used); 5563 until text(hash_used)=0; {search for an empty location in |hash|} 5564 next(p):=hash_used; p:=hash_used; 5565 end; 5566 str_room(l); d:=cur_length; 5567 while pool_ptr>str_start[str_ptr] do 5568 begin decr(pool_ptr); str_pool[pool_ptr+l]:=str_pool[pool_ptr]; 5569 end; {move current string up to make room for another} 5570 for k:=j to j+l-1 do append_char(buffer[k]); 5571 text(p):=make_string; pool_ptr:=pool_ptr+d; 5572 @!stat incr(cs_count);@+tats@;@/ 5573 end 5574 5575 @ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it 5576 should be a prime number. The theory of hashing tells us to expect fewer 5577 than two table probes, on the average, when the search is successful. 5578 [See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.] 5579 @^Vitter, Jeffrey Scott@> 5580 5581 @<Compute the hash code |h|@>= 5582 h:=buffer[j]; 5583 for k:=j+1 to j+l-1 do 5584 begin h:=h+h+buffer[k]; 5585 while h>=hash_prime do h:=h-hash_prime; 5586 end 5587 5588 @ Single-character control sequences do not need to be looked up in a hash 5589 table, since we can use the character code itself as a direct address. 5590 The procedure |print_cs| prints the name of a control sequence, given 5591 a pointer to its address in |eqtb|. A space is printed after the name 5592 unless it is a single nonletter or an active character. This procedure 5593 might be invoked with invalid data, so it is ``extra robust.'' The 5594 individual characters must be printed one at a time using |print|, since 5595 they may be unprintable. 5596 5597 @<Basic printing...@>= 5598 procedure print_cs(@!p:integer); {prints a purported control sequence} 5599 begin if p<hash_base then {single character} 5600 if p>=single_base then 5601 if p=null_cs then 5602 begin print_esc("csname"); print_esc("endcsname"); print_char(" "); 5603 end 5604 else begin print_esc(p-single_base); 5605 if cat_code(p-single_base)=letter then print_char(" "); 5606 end 5607 else if p<active_base then print_esc("IMPOSSIBLE.") 5608 @.IMPOSSIBLE@> 5609 else print(p-active_base) 5610 else if p>=undefined_control_sequence then print_esc("IMPOSSIBLE.") 5611 else if (text(p)<0)or(text(p)>=str_ptr) then print_esc("NONEXISTENT.") 5612 @.NONEXISTENT@> 5613 else begin print_esc(text(p)); 5614 print_char(" "); 5615 end; 5616 end; 5617 5618 @ Here is a similar procedure; it avoids the error checks, and it never 5619 prints a space after the control sequence. 5620 5621 @<Basic printing procedures@>= 5622 procedure sprint_cs(@!p:pointer); {prints a control sequence} 5623 begin if p<hash_base then 5624 if p<single_base then print(p-active_base) 5625 else if p<null_cs then print_esc(p-single_base) 5626 else begin print_esc("csname"); print_esc("endcsname"); 5627 end 5628 else print_esc(text(p)); 5629 end; 5630 5631 @ We need to put \TeX's ``primitive'' control sequences into the hash 5632 table, together with their command code (which will be the |eq_type|) 5633 and an operand (which will be the |equiv|). The |primitive| procedure 5634 does this, in a way that no \TeX\ user can. The global value |cur_val| 5635 contains the new |eqtb| pointer after |primitive| has acted. 5636 5637 @p @!init procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword); 5638 var k:pool_pointer; {index into |str_pool|} 5639 @!j:small_number; {index into |buffer|} 5640 @!l:small_number; {length of the string} 5641 begin if s<256 then cur_val:=s+single_base 5642 else begin k:=str_start[s]; l:=str_start[s+1]-k; 5643 {we will move |s| into the (empty) |buffer|} 5644 for j:=0 to l-1 do buffer[j]:=so(str_pool[k+j]); 5645 cur_val:=id_lookup(0,l); {|no_new_control_sequence| is |false|} 5646 flush_string; text(cur_val):=s; {we don't want to have the string twice} 5647 end; 5648 eq_level(cur_val):=level_one; eq_type(cur_val):=c; equiv(cur_val):=o; 5649 end; 5650 tini 5651 5652 @ Many of \TeX's primitives need no |equiv|, since they are identifiable 5653 by their |eq_type| alone. These primitives are loaded into the hash table 5654 as follows: 5655 5656 @<Put each of \TeX's primitives into the hash table@>= 5657 primitive(" ",ex_space,0);@/ 5658 @!@:Single-character primitives /}{\quad\.{\\\ }@> 5659 primitive("/",ital_corr,0);@/ 5660 @!@:Single-character primitives /}{\quad\.{\\/}@> 5661 primitive("accent",accent,0);@/ 5662 @!@:accent_}{\.{\\accent} primitive@> 5663 primitive("advance",advance,0);@/ 5664 @!@:advance_}{\.{\\advance} primitive@> 5665 primitive("afterassignment",after_assignment,0);@/ 5666 @!@:after_assignment_}{\.{\\afterassignment} primitive@> 5667 primitive("aftergroup",after_group,0);@/ 5668 @!@:after_group_}{\.{\\aftergroup} primitive@> 5669 primitive("begingroup",begin_group,0);@/ 5670 @!@:begin_group_}{\.{\\begingroup} primitive@> 5671 primitive("char",char_num,0);@/ 5672 @!@:char_}{\.{\\char} primitive@> 5673 primitive("csname",cs_name,0);@/ 5674 @!@:cs_name_}{\.{\\csname} primitive@> 5675 primitive("delimiter",delim_num,0);@/ 5676 @!@:delimiter_}{\.{\\delimiter} primitive@> 5677 primitive("divide",divide,0);@/ 5678 @!@:divide_}{\.{\\divide} primitive@> 5679 primitive("endcsname",end_cs_name,0);@/ 5680 @!@:end_cs_name_}{\.{\\endcsname} primitive@> 5681 primitive("endgroup",end_group,0); 5682 @!@:end_group_}{\.{\\endgroup} primitive@> 5683 text(frozen_end_group):="endgroup"; eqtb[frozen_end_group]:=eqtb[cur_val];@/ 5684 primitive("expandafter",expand_after,0);@/ 5685 @!@:expand_after_}{\.{\\expandafter} primitive@> 5686 primitive("font",def_font,0);@/ 5687 @!@:font_}{\.{\\font} primitive@> 5688 primitive("fontdimen",assign_font_dimen,0);@/ 5689 @!@:font_dimen_}{\.{\\fontdimen} primitive@> 5690 primitive("halign",halign,0);@/ 5691 @!@:halign_}{\.{\\halign} primitive@> 5692 primitive("hrule",hrule,0);@/ 5693 @!@:hrule_}{\.{\\hrule} primitive@> 5694 primitive("ignorespaces",ignore_spaces,0);@/ 5695 @!@:ignore_spaces_}{\.{\\ignorespaces} primitive@> 5696 primitive("insert",insert,0);@/ 5697 @!@:insert_}{\.{\\insert} primitive@> 5698 primitive("mark",mark,0);@/ 5699 @!@:mark_}{\.{\\mark} primitive@> 5700 primitive("mathaccent",math_accent,0);@/ 5701 @!@:math_accent_}{\.{\\mathaccent} primitive@> 5702 primitive("mathchar",math_char_num,0);@/ 5703 @!@:math_char_}{\.{\\mathchar} primitive@> 5704 primitive("mathchoice",math_choice,0);@/ 5705 @!@:math_choice_}{\.{\\mathchoice} primitive@> 5706 primitive("multiply",multiply,0);@/ 5707 @!@:multiply_}{\.{\\multiply} primitive@> 5708 primitive("noalign",no_align,0);@/ 5709 @!@:no_align_}{\.{\\noalign} primitive@> 5710 primitive("noboundary",no_boundary,0);@/ 5711 @!@:no_boundary_}{\.{\\noboundary} primitive@> 5712 primitive("noexpand",no_expand,0);@/ 5713 @!@:no_expand_}{\.{\\noexpand} primitive@> 5714 primitive("nonscript",non_script,0);@/ 5715 @!@:non_script_}{\.{\\nonscript} primitive@> 5716 primitive("omit",omit,0);@/ 5717 @!@:omit_}{\.{\\omit} primitive@> 5718 primitive("parshape",set_shape,0);@/ 5719 @!@:par_shape_}{\.{\\parshape} primitive@> 5720 primitive("penalty",break_penalty,0);@/ 5721 @!@:penalty_}{\.{\\penalty} primitive@> 5722 primitive("prevgraf",set_prev_graf,0);@/ 5723 @!@:prev_graf_}{\.{\\prevgraf} primitive@> 5724 primitive("radical",radical,0);@/ 5725 @!@:radical_}{\.{\\radical} primitive@> 5726 primitive("read",read_to_cs,0);@/ 5727 @!@:read_}{\.{\\read} primitive@> 5728 primitive("relax",relax,256); {cf.\ |scan_file_name|} 5729 @!@:relax_}{\.{\\relax} primitive@> 5730 text(frozen_relax):="relax"; eqtb[frozen_relax]:=eqtb[cur_val];@/ 5731 primitive("setbox",set_box,0);@/ 5732 @!@:set_box_}{\.{\\setbox} primitive@> 5733 primitive("the",the,0);@/ 5734 @!@:the_}{\.{\\the} primitive@> 5735 primitive("toks",toks_register,0);@/ 5736 @!@:toks_}{\.{\\toks} primitive@> 5737 primitive("vadjust",vadjust,0);@/ 5738 @!@:vadjust_}{\.{\\vadjust} primitive@> 5739 primitive("valign",valign,0);@/ 5740 @!@:valign_}{\.{\\valign} primitive@> 5741 primitive("vcenter",vcenter,0);@/ 5742 @!@:vcenter_}{\.{\\vcenter} primitive@> 5743 primitive("vrule",vrule,0);@/ 5744 @!@:vrule_}{\.{\\vrule} primitive@> 5745 5746 @ Each primitive has a corresponding inverse, so that it is possible to 5747 display the cryptic numeric contents of |eqtb| in symbolic form. 5748 Every call of |primitive| in this program is therefore accompanied by some 5749 straightforward code that forms part of the |print_cmd_chr| routine 5750 below. 5751 5752 @<Cases of |print_cmd_chr|...@>= 5753 accent: print_esc("accent"); 5754 advance: print_esc("advance"); 5755 after_assignment: print_esc("afterassignment"); 5756 after_group: print_esc("aftergroup"); 5757 assign_font_dimen: print_esc("fontdimen"); 5758 begin_group: print_esc("begingroup"); 5759 break_penalty: print_esc("penalty"); 5760 char_num: print_esc("char"); 5761 cs_name: print_esc("csname"); 5762 def_font: print_esc("font"); 5763 delim_num: print_esc("delimiter"); 5764 divide: print_esc("divide"); 5765 end_cs_name: print_esc("endcsname"); 5766 end_group: print_esc("endgroup"); 5767 ex_space: print_esc(" "); 5768 expand_after: print_esc("expandafter"); 5769 halign: print_esc("halign"); 5770 hrule: print_esc("hrule"); 5771 ignore_spaces: print_esc("ignorespaces"); 5772 insert: print_esc("insert"); 5773 ital_corr: print_esc("/"); 5774 mark: print_esc("mark"); 5775 math_accent: print_esc("mathaccent"); 5776 math_char_num: print_esc("mathchar"); 5777 math_choice: print_esc("mathchoice"); 5778 multiply: print_esc("multiply"); 5779 no_align: print_esc("noalign"); 5780 no_boundary:print_esc("noboundary"); 5781 no_expand: print_esc("noexpand"); 5782 non_script: print_esc("nonscript"); 5783 omit: print_esc("omit"); 5784 radical: print_esc("radical"); 5785 read_to_cs: print_esc("read"); 5786 relax: print_esc("relax"); 5787 set_box: print_esc("setbox"); 5788 set_prev_graf: print_esc("prevgraf"); 5789 set_shape: print_esc("parshape"); 5790 the: print_esc("the"); 5791 toks_register: print_esc("toks"); 5792 vadjust: print_esc("vadjust"); 5793 valign: print_esc("valign"); 5794 vcenter: print_esc("vcenter"); 5795 vrule: print_esc("vrule"); 5796 5797 @ We will deal with the other primitives later, at some point in the program 5798 where their |eq_type| and |equiv| values are more meaningful. For example, 5799 the primitives for math mode will be loaded when we consider the routines 5800 that deal with formulas. It is easy to find where each particular 5801 primitive was treated by looking in the index at the end; for example, the 5802 section where |"radical"| entered |eqtb| is listed under `\.{\\radical} 5803 primitive'. (Primitives consisting of a single nonalphabetic character, 5804 @!like `\.{\\/}', are listed under `Single-character primitives'.) 5805 @!@^Single-character primitives@> 5806 5807 Meanwhile, this is a convenient place to catch up on something we were unable 5808 to do before the hash table was defined: 5809 5810 @<Print the font identifier for |font(p)|@>= 5811 print_esc(font_id_text(font(p))) 5812 5813 @* \[19] Saving and restoring equivalents. 5814 The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups 5815 in \TeX\ means that |eqtb| entries valid in outer groups should be saved 5816 and restored later if they are overridden inside the braces. When a new |eqtb| 5817 value is being assigned, the program therefore checks to see if the previous 5818 entry belongs to an outer level. In such a case, the old value is placed 5819 on the |save_stack| just before the new value enters |eqtb|. At the 5820 end of a grouping level, i.e., when the right brace is sensed, the 5821 |save_stack| is used to restore the outer values, and the inner ones are 5822 destroyed. 5823 5824 Entries on the |save_stack| are of type |memory_word|. The top item on 5825 this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three 5826 fields called |save_type|, |save_level|, and |save_index|, and it is 5827 interpreted in one of four ways: 5828 5829 \yskip\hangg 1) If |save_type(p)=restore_old_value|, then 5830 |save_index(p)| is a location in |eqtb| whose current value should 5831 be destroyed at the end of the current group and replaced by |save_stack[p-1]|. 5832 Furthermore if |save_index(p)>=int_base|, then |save_level(p)| 5833 should replace the corresponding entry in |xeq_level|. 5834 5835 \yskip\hangg 2) If |save_type(p)=restore_zero|, then |save_index(p)| 5836 is a location in |eqtb| whose current value should be destroyed at the end 5837 of the current group, when it should be 5838 replaced by the value of |eqtb[undefined_control_sequence]|. 5839 5840 \yskip\hangg 3) If |save_type(p)=insert_token|, then |save_index(p)| 5841 is a token that should be inserted into \TeX's input when the current 5842 group ends. 5843 5844 \yskip\hangg 4) If |save_type(p)=level_boundary|, then |save_level(p)| 5845 is a code explaining what kind of group we were previously in, and 5846 |save_index(p)| points to the level boundary word at the bottom of 5847 the entries for that group. 5848 5849 @d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry} 5850 @d save_level(#)==save_stack[#].hh.b1 5851 {saved level for regions 5 and 6, or group code} 5852 @d save_index(#)==save_stack[#].hh.rh 5853 {|eqtb| location or token or |save_stack| location} 5854 @d restore_old_value=0 {|save_type| when a value should be restored later} 5855 @d restore_zero=1 {|save_type| when an undefined entry should be restored} 5856 @d insert_token=2 {|save_type| when a token is being saved for later use} 5857 @d level_boundary=3 {|save_type| corresponding to beginning of group} 5858 5859 @ Here are the group codes that are used to discriminate between different 5860 kinds of groups. They allow \TeX\ to decide what special actions, if any, 5861 should be performed when a group ends. 5862 \def\grp{\.{\char'173...\char'175}} 5863 5864 Some groups are not supposed to be ended by right braces. For example, 5865 the `\.\$' that begins a math formula causes a |math_shift_group| to 5866 be started, and this should be terminated by a matching `\.\$'. Similarly, 5867 a group that starts with \.{\\left} should end with \.{\\right}, and 5868 one that starts with \.{\\begingroup} should end with \.{\\endgroup}. 5869 5870 @d bottom_level=0 {group code for the outside world} 5871 @d simple_group=1 {group code for local structure only} 5872 @d hbox_group=2 {code for `\.{\\hbox}\grp'} 5873 @d adjusted_hbox_group=3 {code for `\.{\\hbox}\grp' in vertical mode} 5874 @d vbox_group=4 {code for `\.{\\vbox}\grp'} 5875 @d vtop_group=5 {code for `\.{\\vtop}\grp'} 5876 @d align_group=6 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'} 5877 @d no_align_group=7 {code for `\.{\\noalign}\grp'} 5878 @d output_group=8 {code for output routine} 5879 @d math_group=9 {code for, e.g., `\.{\char'136}\grp'} 5880 @d disc_group=10 {code for `\.{\\discretionary}\grp\grp\grp'} 5881 @d insert_group=11 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'} 5882 @d vcenter_group=12 {code for `\.{\\vcenter}\grp'} 5883 @d math_choice_group=13 {code for `\.{\\mathchoice}\grp\grp\grp\grp'} 5884 @d semi_simple_group=14 {code for `\.{\\begingroup...\\endgroup}'} 5885 @d math_shift_group=15 {code for `\.{\$...\$}'} 5886 @d math_left_group=16 {code for `\.{\\left...\\right}'} 5887 @d max_group_code=16 5888 5889 @<Types...@>= 5890 @!group_code=0..max_group_code; {|save_level| for a level boundary} 5891 5892 @ The global variable |cur_group| keeps track of what sort of group we are 5893 currently in. Another global variable, |cur_boundary|, points to the 5894 topmost |level_boundary| word. And |cur_level| is the current depth of 5895 nesting. The routines are designed to preserve the condition that no entry 5896 in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|. 5897 5898 @ @<Glob...@>= 5899 @!save_stack : array[0..save_size] of memory_word; 5900 @!save_ptr : 0..save_size; {first unused entry on |save_stack|} 5901 @!max_save_stack:0..save_size; {maximum usage of save stack} 5902 @!cur_level: quarterword; {current nesting level for groups} 5903 @!cur_group: group_code; {current group type} 5904 @!cur_boundary: 0..save_size; {where the current level begins} 5905 5906 @ At this time it might be a good idea for the reader to review the introduction 5907 to |eqtb| that was given above just before the long lists of parameter names. 5908 Recall that the ``outer level'' of the program is |level_one|, since 5909 undefined control sequences are assumed to be ``defined'' at |level_zero|. 5910 5911 @<Set init...@>= 5912 save_ptr:=0; cur_level:=level_one; cur_group:=bottom_level; cur_boundary:=0; 5913 max_save_stack:=0; 5914 5915 @ The following macro is used to test if there is room for up to six more 5916 entries on |save_stack|. By making a conservative test like this, we can 5917 get by with testing for overflow in only a few places. 5918 5919 @d check_full_save_stack==if save_ptr>max_save_stack then 5920 begin max_save_stack:=save_ptr; 5921 if max_save_stack>save_size-6 then overflow("save size",save_size); 5922 @:TeX capacity exceeded save size}{\quad save size@> 5923 end 5924 5925 @ Procedure |new_save_level| is called when a group begins. The 5926 argument is a group identification code like `|hbox_group|'. After 5927 calling this routine, it is safe to put five more entries on |save_stack|. 5928 5929 In some cases integer-valued items are placed onto the 5930 |save_stack| just below a |level_boundary| word, because this is a 5931 convenient place to keep information that is supposed to ``pop up'' just 5932 when the group has finished. 5933 For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt 5934 dimension is stored on |save_stack| just before |new_save_level| is 5935 called. 5936 5937 We use the notation |saved(k)| to stand for an integer item that 5938 appears in location |save_ptr+k| of the save stack. 5939 5940 @d saved(#)==save_stack[save_ptr+#].int 5941 5942 @p procedure new_save_level(@!c:group_code); {begin a new level of grouping} 5943 begin check_full_save_stack; 5944 save_type(save_ptr):=level_boundary; save_level(save_ptr):=cur_group; 5945 save_index(save_ptr):=cur_boundary; 5946 if cur_level=max_quarterword then overflow("grouping levels", 5947 @:TeX capacity exceeded grouping levels}{\quad grouping levels@> 5948 max_quarterword-min_quarterword); 5949 {quit if |(cur_level+1)| is too big to be stored in |eqtb|} 5950 cur_boundary:=save_ptr; incr(cur_level); incr(save_ptr); cur_group:=c; 5951 end; 5952 5953 @ Just before an entry of |eqtb| is changed, the following procedure should 5954 be called to update the other data structures properly. It is important 5955 to keep in mind that reference counts in |mem| include references from 5956 within |save_stack|, so these counts must be handled carefully. 5957 @^reference counts@> 5958 5959 @p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|} 5960 var q:pointer; {|equiv| field of |w|} 5961 begin case eq_type_field(w) of 5962 call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w)); 5963 glue_ref: delete_glue_ref(equiv_field(w)); 5964 shape_ref: begin q:=equiv_field(w); {we need to free a \.{\\parshape} block} 5965 if q<>null then free_node(q,info(q)+info(q)+1); 5966 end; {such a block is |2n+1| words long, where |n=info(q)|} 5967 box_ref: flush_node_list(equiv_field(w)); 5968 othercases do_nothing 5969 endcases; 5970 end; 5971 5972 @ To save a value of |eqtb[p]| that was established at level |l|, we 5973 can use the following subroutine. 5974 5975 @p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|} 5976 begin check_full_save_stack; 5977 if l=level_zero then save_type(save_ptr):=restore_zero 5978 else begin save_stack[save_ptr]:=eqtb[p]; incr(save_ptr); 5979 save_type(save_ptr):=restore_old_value; 5980 end; 5981 save_level(save_ptr):=l; save_index(save_ptr):=p; incr(save_ptr); 5982 end; 5983 5984 @ The procedure |eq_define| defines an |eqtb| entry having specified 5985 |eq_type| and |equiv| fields, and saves the former value if appropriate. 5986 This procedure is used only for entries in the first four regions of |eqtb|, 5987 i.e., only for entries that have |eq_type| and |equiv| fields. 5988 After calling this routine, it is safe to put four more entries on 5989 |save_stack|, provided that there was room for four more entries before 5990 the call, since |eq_save| makes the necessary test. 5991 5992 @p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword); 5993 {new data for |eqtb|} 5994 begin if eq_level(p)=cur_level then eq_destroy(eqtb[p]) 5995 else if cur_level>level_one then eq_save(p,eq_level(p)); 5996 eq_level(p):=cur_level; eq_type(p):=t; equiv(p):=e; 5997 end; 5998 5999 @ The counterpart of |eq_define| for the remaining (fullword) positions in 6000 |eqtb| is called |eq_word_define|. Since |xeq_level[p]>=level_one| for all 6001 |p|, a `|restore_zero|' will never be used in this case. 6002 6003 @p procedure eq_word_define(@!p:pointer;@!w:integer); 6004 begin if xeq_level[p]<>cur_level then 6005 begin eq_save(p,xeq_level[p]); xeq_level[p]:=cur_level; 6006 end; 6007 eqtb[p].int:=w; 6008 end; 6009 6010 @ The |eq_define| and |eq_word_define| routines take care of local definitions. 6011 @^global definitions@> 6012 Global definitions are done in almost the same way, but there is no need 6013 to save old values, and the new value is associated with |level_one|. 6014 6015 @p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword); 6016 {global |eq_define|} 6017 begin eq_destroy(eqtb[p]); 6018 eq_level(p):=level_one; eq_type(p):=t; equiv(p):=e; 6019 end; 6020 @# 6021 procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|} 6022 begin eqtb[p].int:=w; xeq_level[p]:=level_one; 6023 end; 6024 6025 @ Subroutine |save_for_after| puts a token on the stack for save-keeping. 6026 6027 @p procedure save_for_after(@!t:halfword); 6028 begin if cur_level>level_one then 6029 begin check_full_save_stack; 6030 save_type(save_ptr):=insert_token; save_level(save_ptr):=level_zero; 6031 save_index(save_ptr):=t; incr(save_ptr); 6032 end; 6033 end; 6034 6035 @ The |unsave| routine goes the other way, taking items off of |save_stack|. 6036 This routine takes care of restoration when a level ends; everything 6037 belonging to the topmost group is cleared off of the save stack. 6038 6039 @p@t\4@>@<Declare the procedure called |restore_trace|@>@;@/ 6040 procedure@?back_input; forward; @t\2@> 6041 procedure unsave; {pops the top level off the save stack} 6042 label done; 6043 var p:pointer; {position to be restored} 6044 @!l:quarterword; {saved level, if in fullword regions of |eqtb|} 6045 @!t:halfword; {saved value of |cur_tok|} 6046 begin if cur_level>level_one then 6047 begin decr(cur_level); 6048 @<Clear off top level from |save_stack|@>; 6049 end 6050 else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|} 6051 @:this can't happen curlevel}{\quad curlevel@> 6052 end; 6053 6054 @ @<Clear off...@>= 6055 loop@+begin decr(save_ptr); 6056 if save_type(save_ptr)=level_boundary then goto done; 6057 p:=save_index(save_ptr); 6058 if save_type(save_ptr)=insert_token then 6059 @<Insert token |p| into \TeX's input@> 6060 else begin if save_type(save_ptr)=restore_old_value then 6061 begin l:=save_level(save_ptr); decr(save_ptr); 6062 end 6063 else save_stack[save_ptr]:=eqtb[undefined_control_sequence]; 6064 @<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless 6065 |eqtb[p]| holds a global value@>; 6066 end; 6067 end; 6068 done: cur_group:=save_level(save_ptr); cur_boundary:=save_index(save_ptr) 6069 6070 @ A global definition, which sets the level to |level_one|, 6071 @^global definitions@> 6072 will not be undone by |unsave|. If at least one global definition of 6073 |eqtb[p]| has been carried out within the group that just ended, the 6074 last such definition will therefore survive. 6075 6076 @<Store \(s)|save...@>= 6077 if p<int_base then 6078 if eq_level(p)=level_one then 6079 begin eq_destroy(save_stack[save_ptr]); {destroy the saved value} 6080 @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/ 6081 end 6082 else begin eq_destroy(eqtb[p]); {destroy the current value} 6083 eqtb[p]:=save_stack[save_ptr]; {restore the saved value} 6084 @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ 6085 end 6086 else if xeq_level[p]<>level_one then 6087 begin eqtb[p]:=save_stack[save_ptr]; xeq_level[p]:=l; 6088 @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ 6089 end 6090 else begin 6091 @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/ 6092 end 6093 6094 @ @<Declare the procedure called |restore_trace|@>= 6095 @!stat procedure restore_trace(@!p:pointer;@!s:str_number); 6096 {|eqtb[p]| has just been restored or retained} 6097 begin begin_diagnostic; print_char("{"); print(s); print_char(" "); 6098 show_eqtb(p); print_char("}"); 6099 end_diagnostic(false); 6100 end; 6101 tats 6102 6103 @ When looking for possible pointers to a memory location, it is helpful 6104 to look for references from |eqtb| that might be waiting on the 6105 save stack. Of course, we might find spurious pointers too; but this 6106 routine is merely an aid when debugging, and at such times we are 6107 grateful for any scraps of information, even if they prove to be irrelevant. 6108 @^dirty \PASCAL@> 6109 6110 @<Search |save_stack| for equivalents that point to |p|@>= 6111 if save_ptr>0 then for q:=0 to save_ptr-1 do 6112 begin if equiv_field(save_stack[q])=p then 6113 begin print_nl("SAVE("); print_int(q); print_char(")"); 6114 end; 6115 end 6116 6117 @ Most of the parameters kept in |eqtb| can be changed freely, but there's 6118 an exception: The magnification should not be used with two different 6119 values during any \TeX\ job, since a single magnification is applied to an 6120 entire run. The global variable |mag_set| is set to the current magnification 6121 whenever it becomes necessary to ``freeze'' it at a particular value. 6122 6123 @<Glob...@>= 6124 @!mag_set:integer; {if nonzero, this magnification should be used henceforth} 6125 6126 @ @<Set init...@>= 6127 mag_set:=0; 6128 6129 @ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag| 6130 for magnification. 6131 6132 @p procedure prepare_mag; 6133 begin if (mag_set>0)and(mag<>mag_set) then 6134 begin print_err("Incompatible magnification ("); print_int(mag); 6135 @.Incompatible magnification@> 6136 print(");"); print_nl(" the previous value will be retained"); 6137 help2("I can handle only one magnification ratio per job. So I've")@/ 6138 ("reverted to the magnification you used earlier on this run.");@/ 6139 int_error(mag_set); 6140 geq_word_define(int_base+mag_code,mag_set); {|mag:=mag_set|} 6141 end; 6142 if (mag<=0)or(mag>32768) then 6143 begin print_err("Illegal magnification has been changed to 1000");@/ 6144 @.Illegal magnification...@> 6145 help1("The magnification ratio must be between 1 and 32768."); 6146 int_error(mag); geq_word_define(int_base+mag_code,1000); 6147 end; 6148 mag_set:=mag; 6149 end; 6150 6151 @* \[20] Token lists. 6152 A \TeX\ token is either a character or a control sequence, and it is 6153 @^token@> 6154 represented internally in one of two ways: (1)~A character whose ASCII 6155 code number is |c| and whose command code is |m| is represented as the 6156 number $2^8m+c$; the command code is in the range |1<=m<=14|. (2)~A control 6157 sequence whose |eqtb| address is |p| is represented as the number 6158 |cs_token_flag+p|. Here |cs_token_flag=@t$2^{12}-1$@>| is larger than 6159 $2^8m+c$, yet it is small enough that |cs_token_flag+p< max_halfword|; 6160 thus, a token fits comfortably in a halfword. 6161 6162 A token |t| represents a |left_brace| command if and only if 6163 |t<left_brace_limit|; it represents a |right_brace| command if and only if 6164 we have |left_brace_limit<=t<right_brace_limit|; and it represents a |match| or 6165 |end_match| command if and only if |match_token<=t<=end_match_token|. 6166 The following definitions take care of these token-oriented constants 6167 and a few others. 6168 6169 @d cs_token_flag==@'7777 {amount added to the |eqtb| location in a 6170 token that stands for a control sequence; is a multiple of~256, less~1} 6171 @d left_brace_token=@'0400 {$2^8\cdot|left_brace|$} 6172 @d left_brace_limit=@'1000 {$2^8\cdot(|left_brace|+1)$} 6173 @d right_brace_token=@'1000 {$2^8\cdot|right_brace|$} 6174 @d right_brace_limit=@'1400 {$2^8\cdot(|right_brace|+1)$} 6175 @d math_shift_token=@'1400 {$2^8\cdot|math_shift|$} 6176 @d tab_token=@'2000 {$2^8\cdot|tab_mark|$} 6177 @d out_param_token=@'2400 {$2^8\cdot|out_param|$} 6178 @d space_token=@'5040 {$2^8\cdot|spacer|+|" "|$} 6179 @d letter_token=@'5400 {$2^8\cdot|letter|$} 6180 @d other_token=@'6000 {$2^8\cdot|other_char|$} 6181 @d match_token=@'6400 {$2^8\cdot|match|$} 6182 @d end_match_token=@'7000 {$2^8\cdot|end_match|$} 6183 6184 @ @<Check the ``constant''...@>= 6185 if cs_token_flag+undefined_control_sequence>max_halfword then bad:=21; 6186 6187 @ A token list is a singly linked list of one-word nodes in |mem|, where 6188 each word contains a token and a link. Macro definitions, output-routine 6189 definitions, marks, \.{\\write} texts, and a few other things 6190 are remembered by \TeX\ in the form 6191 of token lists, usually preceded by a node with a reference count in its 6192 |token_ref_count| field. The token stored in location |p| is called 6193 |info(p)|. 6194 6195 Three special commands appear in the token lists of macro definitions. 6196 When |m=match|, it means that \TeX\ should scan a parameter 6197 for the current macro; when |m=end_match|, it means that parameter 6198 matching should end and \TeX\ should start reading the macro text; and 6199 when |m=out_param|, it means that \TeX\ should insert parameter 6200 number |c| into the text at this point. 6201 6202 The enclosing \.{\char'173} and \.{\char'175} characters of a macro 6203 definition are omitted, but an output routine 6204 will be enclosed in braces. 6205 6206 Here is an example macro definition that illustrates these conventions. 6207 After \TeX\ processes the text 6208 $$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$ 6209 the definition of \.{\\mac} is represented as a token list containing 6210 $$\def\,{\hskip2pt} 6211 \vbox{\halign{\hfil#\hfil\cr 6212 (reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ , 6213 \.{\\b}, |end_match|,\cr 6214 |out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#, 6215 |other_char|\,\.1,\cr 6216 |out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$ 6217 The procedure |scan_toks| builds such token lists, and |macro_call| 6218 does the parameter matching. 6219 @^reference counts@> 6220 6221 Examples such as 6222 $$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$ 6223 explain why reference counts would be needed even if \TeX\ had no \.{\\let} 6224 operation: When the token list for \.{\\m} is being read, the redefinition of 6225 \.{\\m} changes the |eqtb| entry before the token list has been fully 6226 consumed, so we dare not simply destroy a token list when its 6227 control sequence is being redefined. 6228 6229 If the parameter-matching part of a definition ends with `\.{\#\{}', 6230 the corresponding token list will have `\.\{' just before the `|end_match|' 6231 and also at the very end. The first `\.\{' is used to delimit the parameter; the 6232 second one keeps the first from disappearing. 6233 6234 @ The procedure |show_token_list|, which prints a symbolic form of 6235 the token list that starts at a given node |p|, illustrates these 6236 conventions. The token list being displayed should not begin with a reference 6237 count. However, the procedure is intended to be robust, so that if the 6238 memory links are awry or if |p| is not really a pointer to a token list, 6239 nothing catastrophic will happen. 6240 6241 An additional parameter |q| is also given; this parameter is either null 6242 or it points to a node in the token list where a certain magic computation 6243 takes place that will be explained later. (Basically, |q| is non-null when 6244 we are printing the two-line context information at the time of an error 6245 message; |q| marks the place corresponding to where the second line 6246 should begin.) 6247 6248 For example, if |p| points to the node containing the first \.a in the 6249 token list above, then |show_token_list| will print the string 6250 $$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$ 6251 and if |q| points to the node containing the second \.a, 6252 the magic computation will be performed just before the second \.a is printed. 6253 6254 The generation will stop, and `\.{\\ETC.}' will be printed, if the length 6255 of printing exceeds a given limit~|l|. Anomalous entries are printed in the 6256 form of control sequences that are not followed by a blank space, e.g., 6257 `\.{\\BAD.}'; this cannot be confused with actual control sequences because 6258 a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'. 6259 6260 @<Declare the procedure called |show_token_list|@>= 6261 procedure show_token_list(@!p,@!q:integer;@!l:integer); 6262 label exit; 6263 var m,@!c:integer; {pieces of a token} 6264 @!match_chr:ASCII_code; {character used in a `|match|'} 6265 @!n:ASCII_code; {the highest parameter number, as an ASCII digit} 6266 begin match_chr:="#"; n:="0"; tally:=0; 6267 while (p<>null) and (tally<l) do 6268 begin if p=q then @<Do magic computation@>; 6269 @<Display token |p|, and |return| if there are problems@>; 6270 p:=link(p); 6271 end; 6272 if p<>null then print_esc("ETC."); 6273 @.ETC@> 6274 exit: 6275 end; 6276 6277 @ @<Display token |p|...@>= 6278 if (p<hi_mem_min) or (p>mem_end) then 6279 begin print_esc("CLOBBERED."); return; 6280 @.CLOBBERED@> 6281 end; 6282 if info(p)>=cs_token_flag then print_cs(info(p)-cs_token_flag) 6283 else begin m:=info(p) div @'400; c:=info(p) mod @'400; 6284 if info(p)<0 then print_esc("BAD.") 6285 @.BAD@> 6286 else @<Display the token $(|m|,|c|)$@>; 6287 end 6288 6289 @ The procedure usually ``learns'' the character code used for macro 6290 parameters by seeing one in a |match| command before it runs into any 6291 |out_param| commands. 6292 6293 @<Display the token ...@>= 6294 case m of 6295 left_brace,right_brace,math_shift,tab_mark,sup_mark,sub_mark,spacer, 6296 letter,other_char: print(c); 6297 mac_param: begin print(c); print(c); 6298 end; 6299 out_param: begin print(match_chr); 6300 if c<=9 then print_char(c+"0") 6301 else begin print_char("!"); return; 6302 end; 6303 end; 6304 match: begin match_chr:=c; print(c); incr(n); print_char(n); 6305 if n>"9" then return; 6306 end; 6307 end_match: print("->"); 6308 @.->@> 6309 othercases print_esc("BAD.") 6310 @.BAD@> 6311 endcases 6312 6313 @ Here's the way we sometimes want to display a token list, given a pointer 6314 to its reference count; the pointer may be null. 6315 6316 @p procedure token_show(@!p:pointer); 6317 begin if p<>null then show_token_list(link(p),null,10000000); 6318 end; 6319 6320 @ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in 6321 symbolic form, including the expansion of a macro or mark. 6322 6323 @p procedure print_meaning; 6324 begin print_cmd_chr(cur_cmd,cur_chr); 6325 if cur_cmd>=call then 6326 begin print_char(":"); print_ln; token_show(cur_chr); 6327 end 6328 else if cur_cmd=top_bot_mark then 6329 begin print_char(":"); print_ln; 6330 token_show(cur_mark[cur_chr]); 6331 end; 6332 end; 6333 6334 @* \[21] Introduction to the syntactic routines. 6335 Let's pause a moment now and try to look at the Big Picture. 6336 The \TeX\ program consists of three main parts: syntactic routines, 6337 semantic routines, and output routines. The chief purpose of the 6338 syntactic routines is to deliver the user's input to the semantic routines, 6339 one token at a time. The semantic routines act as an interpreter 6340 responding to these tokens, which may be regarded as commands. And the 6341 output routines are periodically called on to convert box-and-glue 6342 lists into a compact set of instructions that will be sent 6343 to a typesetter. We have discussed the basic data structures and utility 6344 routines of \TeX, so we are good and ready to plunge into the real activity by 6345 considering the syntactic routines. 6346 6347 Our current goal is to come to grips with the |get_next| procedure, 6348 which is the keystone of \TeX's input mechanism. Each call of |get_next| 6349 sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|, 6350 representing the next input token. 6351 $$\vbox{\halign{#\hfil\cr 6352 \hbox{|cur_cmd| denotes a command code from the long list of codes 6353 given above;}\cr 6354 \hbox{|cur_chr| denotes a character code or other modifier of the command 6355 code;}\cr 6356 \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr 6357 \hbox{\qquad if the current token was a control sequence, 6358 otherwise it's zero.}\cr}}$$ 6359 Underlying this external behavior of |get_next| is all the machinery 6360 necessary to convert from character files to tokens. At a given time we 6361 may be only partially finished with the reading of several files (for 6362 which \.{\\input} was specified), and partially finished with the expansion 6363 of some user-defined macros and/or some macro parameters, and partially 6364 finished with the generation of some text in a template for \.{\\halign}, 6365 and so on. When reading a character file, special characters must be 6366 classified as math delimiters, etc.; comments and extra blank spaces must 6367 be removed, paragraphs must be recognized, and control sequences must be 6368 found in the hash table. Furthermore there are occasions in which the 6369 scanning routines have looked ahead for a word like `\.{plus}' but only 6370 part of that word was found, hence a few characters must be put back 6371 into the input and scanned again. 6372 6373 To handle these situations, which might all be present simultaneously, 6374 \TeX\ uses various stacks that hold information about the incomplete 6375 activities, and there is a finite state control for each level of the 6376 input mechanism. These stacks record the current state of an implicitly 6377 recursive process, but the |get_next| procedure is not recursive. 6378 Therefore it will not be difficult to translate these algorithms into 6379 low-level languages that do not support recursion. 6380 6381 @<Glob...@>= 6382 @!cur_cmd: eight_bits; {current command set by |get_next|} 6383 @!cur_chr: halfword; {operand of current command} 6384 @!cur_cs: pointer; {control sequence found here, zero if none found} 6385 @!cur_tok: halfword; {packed representative of |cur_cmd| and |cur_chr|} 6386 6387 @ The |print_cmd_chr| routine prints a symbolic interpretation of a 6388 command code and its modifier. This is used in certain `\.{You can\'t}' 6389 error messages, and in the implementation of diagnostic routines like 6390 \.{\\show}. 6391 6392 The body of |print_cmd_chr| is a rather tedious listing of print 6393 commands, and most of it is essentially an inverse to the |primitive| 6394 routine that enters a \TeX\ primitive into |eqtb|. Therefore much of 6395 this procedure appears elsewhere in the program, 6396 together with the corresponding |primitive| calls. 6397 6398 @d chr_cmd(#)==begin print(#); print_ASCII(chr_code); 6399 end 6400 6401 @<Declare the procedure called |print_cmd_chr|@>= 6402 procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword); 6403 begin case cmd of 6404 left_brace: chr_cmd("begin-group character "); 6405 right_brace: chr_cmd("end-group character "); 6406 math_shift: chr_cmd("math shift character "); 6407 mac_param: chr_cmd("macro parameter character "); 6408 sup_mark: chr_cmd("superscript character "); 6409 sub_mark: chr_cmd("subscript character "); 6410 endv: print("end of alignment template"); 6411 spacer: chr_cmd("blank space "); 6412 letter: chr_cmd("the letter "); 6413 other_char: chr_cmd("the character "); 6414 @t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@/ 6415 othercases print("[unknown command code!]") 6416 endcases; 6417 end; 6418 6419 @ Here is a procedure that displays the current command. 6420 6421 @p procedure show_cur_cmd_chr; 6422 begin begin_diagnostic; print_nl("{"); 6423 if mode<>shown_mode then 6424 begin print_mode(mode); print(": "); shown_mode:=mode; 6425 end; 6426 print_cmd_chr(cur_cmd,cur_chr); print_char("}"); 6427 end_diagnostic(false); 6428 end; 6429 6430 @* \[22] Input stacks and states. 6431 This implementation of 6432 \TeX\ uses two different conventions for representing sequential stacks. 6433 @^stack conventions@>@^conventions for representing stacks@> 6434 6435 \yskip\hangg 1) If there is frequent access to the top entry, and if the 6436 stack is essentially never empty, then the top entry is kept in a global 6437 variable (even better would be a machine register), and the other entries 6438 appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the 6439 semantic stack described above is handled this way, and so is the input 6440 stack that we are about to study. 6441 6442 \yskip\hangg 2) If there is infrequent top access, the entire stack contents 6443 are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack| 6444 is treated this way, as we have seen. 6445 6446 \yskip\noindent 6447 The state of \TeX's input mechanism appears in the input stack, whose 6448 entries are records with six fields, called |state|, |index|, |start|, |loc|, 6449 |limit|, and |name|. This stack is maintained with 6450 convention~(1), so it is declared in the following way: 6451 6452 @<Types...@>= 6453 @!in_state_record = record 6454 @!state_field, @!index_field: quarterword; 6455 @!start_field,@!loc_field, @!limit_field, @!name_field: halfword; 6456 end; 6457 6458 @ @<Glob...@>= 6459 @!input_stack : array[0..stack_size] of in_state_record; 6460 @!input_ptr : 0..stack_size; {first unused location of |input_stack|} 6461 @!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing} 6462 @!cur_input : in_state_record; 6463 {the ``top'' input state, according to convention (1)} 6464 6465 @ We've already defined the special variable |loc==cur_input.loc_field| 6466 in our discussion of basic input-output routines. The other components of 6467 |cur_input| are defined in the same way: 6468 6469 @d state==cur_input.state_field {current scanner state} 6470 @d index==cur_input.index_field {reference for buffer information} 6471 @d start==cur_input.start_field {starting position in |buffer|} 6472 @d limit==cur_input.limit_field {end of current line in |buffer|} 6473 @d name==cur_input.name_field {name of the current file} 6474 6475 @ Let's look more closely now at the control variables 6476 (|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|), 6477 assuming that \TeX\ is reading a line of characters that have been input 6478 from some file or from the user's terminal. There is an array called 6479 |buffer| that acts as a stack of all lines of characters that are 6480 currently being read from files, including all lines on subsidiary 6481 levels of the input stack that are not yet completed. \TeX\ will return to 6482 the other lines when it is finished with the present input file. 6483 6484 (Incidentally, on a machine with byte-oriented addressing, it might be 6485 appropriate to combine |buffer| with the |str_pool| array, 6486 letting the buffer entries grow downward from the top of the string pool 6487 and checking that these two tables don't bump into each other.) 6488 6489 The line we are currently working on begins in position |start| of the 6490 buffer; the next character we are about to read is |buffer[loc]|; and 6491 |limit| is the location of the last character present. If |loc>limit|, 6492 the line has been completely read. Usually |buffer[limit]| is the 6493 |end_line_char|, denoting the end of a line, but this is not 6494 true if the current line is an insertion that was entered on the user's 6495 terminal in response to an error message. 6496 6497 The |name| variable is a string number that designates the name of 6498 the current file, if we are reading a text file. It is zero if we 6499 are reading from the terminal; it is |n+1| if we are reading from 6500 input stream |n|, where |0<=n<=16|. (Input stream 16 stands for 6501 an invalid stream number; in such cases the input is actually from 6502 the terminal, under control of the procedure |read_toks|.) 6503 6504 The |state| variable has one of three values, when we are scanning such 6505 files: 6506 $$\baselineskip 15pt\vbox{\halign{#\hfil\cr 6507 1) |state=mid_line| is the normal state.\cr 6508 2) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr 6509 3) |state=new_line| is the state at the beginning of a line.\cr}}$$ 6510 These state values are assigned numeric codes so that if we add the state 6511 code to the next character's command code, we get distinct values. For 6512 example, `|mid_line+spacer|' stands for the case that a blank 6513 space character occurs in the middle of a line when it is not being 6514 ignored; after this case is processed, the next value of |state| will 6515 be |skip_blanks|. 6516 6517 @d mid_line=1 {|state| code when scanning a line of characters} 6518 @d skip_blanks=2+max_char_code {|state| code when ignoring blanks} 6519 @d new_line=3+max_char_code+max_char_code {|state| code at start of line} 6520 6521 @ Additional information about the current line is available via the 6522 |index| variable, which counts how many lines of characters are present 6523 in the buffer below the current level. We have |index=0| when reading 6524 from the terminal and prompting the user for each line; then if the user types, 6525 e.g., `\.{\\input paper}', we will have |index=1| while reading 6526 the file \.{paper.tex}. However, it does not follow that |index| is the 6527 same as the input stack pointer, since many of the levels on the input 6528 stack may come from token lists. For example, the instruction `\.{\\input 6529 paper}' might occur in a token list. 6530 6531 The global variable |in_open| is equal to the |index| 6532 value of the highest non-token-list level. Thus, the number of partially read 6533 lines in the buffer is |in_open+1|, and we have |in_open=index| 6534 when we are not reading a token list. 6535 6536 If we are not currently reading from the terminal, or from an input 6537 stream, we are reading from the file variable |input_file[index]|. We use 6538 the notation |terminal_input| as a convenient abbreviation for |name=0|, 6539 and |cur_file| as an abbreviation for |input_file[index]|. 6540 6541 The global variable |line| contains the line number in the topmost 6542 open file, for use in error messages. If we are not reading from 6543 the terminal, |line_stack[index]| holds the line number for the 6544 enclosing level, so that |line| can be restored when the current 6545 file has been read. Line numbers should never be negative, since the 6546 negative of the current line number is used to identify the user's output 6547 routine in the |mode_line| field of the semantic nest entries. 6548 6549 If more information about the input state is needed, it can be 6550 included in small arrays like those shown here. For example, 6551 the current page or segment number in the input file might be 6552 put into a variable |@!page|, maintained for enclosing levels in 6553 `\ignorespaces|@!page_stack:array[1..max_in_open] of integer|\unskip' 6554 by analogy with |line_stack|. 6555 @^system dependencies@> 6556 6557 @d terminal_input==(name=0) {are we reading from the terminal?} 6558 @d cur_file==input_file[index] {the current |alpha_file| variable} 6559 6560 @<Glob...@>= 6561 @!in_open : 0..max_in_open; {the number of lines in the buffer, less one} 6562 @!open_parens : 0..max_in_open; {the number of open text files} 6563 @!input_file : array[1..max_in_open] of alpha_file; 6564 @!line : integer; {current line number in the current source file} 6565 @!line_stack : array[1..max_in_open] of integer; 6566 6567 @ Users of \TeX\ sometimes forget to balance left and right braces properly, 6568 and one of the ways \TeX\ tries to spot such errors is by considering an 6569 input file as broken into subfiles by control sequences that 6570 are declared to be \.{\\outer}. 6571 6572 A variable called |scanner_status| tells \TeX\ whether or not to complain 6573 when a subfile ends. This variable has six possible values: 6574 6575 \yskip\hang|normal|, means that a subfile can safely end here without incident. 6576 6577 \yskip\hang|skipping|, means that a subfile can safely end here, but not a file, 6578 because we're reading past some conditional text that was not selected. 6579 6580 \yskip\hang|defining|, means that a subfile shouldn't end now because a 6581 macro is being defined. 6582 6583 \yskip\hang|matching|, means that a subfile shouldn't end now because a 6584 macro is being used and we are searching for the end of its arguments. 6585 6586 \yskip\hang|aligning|, means that a subfile shouldn't end now because we are 6587 not finished with the preamble of an \.{\\halign} or \.{\\valign}. 6588 6589 \yskip\hang|absorbing|, means that a subfile shouldn't end now because we are 6590 reading a balanced token list for \.{\\message}, \.{\\write}, etc. 6591 6592 \yskip\noindent 6593 If the |scanner_status| is not |normal|, the variable |warning_index| points 6594 to the |eqtb| location for the relevant control sequence name to print 6595 in an error message. 6596 6597 @d skipping=1 {|scanner_status| when passing conditional text} 6598 @d defining=2 {|scanner_status| when reading a macro definition} 6599 @d matching=3 {|scanner_status| when reading macro arguments} 6600 @d aligning=4 {|scanner_status| when reading an alignment preamble} 6601 @d absorbing=5 {|scanner_status| when reading a balanced text} 6602 6603 @<Glob...@>= 6604 @!scanner_status : normal..absorbing; {can a subfile end now?} 6605 @!warning_index : pointer; {identifier relevant to non-|normal| scanner status} 6606 @!def_ref : pointer; {reference count of token list being defined} 6607 6608 @ Here is a procedure that uses |scanner_status| to print a warning message 6609 when a subfile has ended, and at certain other crucial times: 6610 6611 @<Declare the procedure called |runaway|@>= 6612 procedure runaway; 6613 var p:pointer; {head of runaway list} 6614 begin if scanner_status>skipping then 6615 begin print_nl("Runaway "); 6616 @.Runaway...@> 6617 case scanner_status of 6618 defining: begin print("definition"); p:=def_ref; 6619 end; 6620 matching: begin print("argument"); p:=temp_head; 6621 end; 6622 aligning: begin print("preamble"); p:=hold_head; 6623 end; 6624 absorbing: begin print("text"); p:=def_ref; 6625 end; 6626 end; {there are no other cases} 6627 print_char("?");print_ln; show_token_list(link(p),null,error_line-10); 6628 end; 6629 end; 6630 6631 @ However, all this discussion about input state really applies only to the 6632 case that we are inputting from a file. There is another important case, 6633 namely when we are currently getting input from a token list. In this case 6634 |state=token_list|, and the conventions about the other state variables 6635 are different: 6636 6637 \yskip\hang|loc| is a pointer to the current node in the token list, i.e., 6638 the node that will be read next. If |loc=null|, the token list has been 6639 fully read. 6640 6641 \yskip\hang|start| points to the first node of the token list; this node 6642 may or may not contain a reference count, depending on the type of token 6643 list involved. 6644 6645 \yskip\hang|token_type|, which takes the place of |index| in the 6646 discussion above, is a code number that explains what kind of token list 6647 is being scanned. 6648 6649 \yskip\hang|name| points to the |eqtb| address of the control sequence 6650 being expanded, if the current token list is a macro. 6651 6652 \yskip\hang|param_start|, which takes the place of |limit|, tells where 6653 the parameters of the current macro begin in the |param_stack|, if the 6654 current token list is a macro. 6655 6656 \yskip\noindent The |token_type| can take several values, depending on 6657 where the current token list came from: 6658 6659 \yskip\hang|parameter|, if a parameter is being scanned; 6660 6661 \hang|u_template|, if the \<u_j> part of an alignment 6662 template is being scanned; 6663 6664 \hang|v_template|, if the \<v_j> part of an alignment 6665 template is being scanned; 6666 6667 \hang|backed_up|, if the token list being scanned has been inserted as 6668 `to be read again'; 6669 6670 \hang|inserted|, if the token list being scanned has been inserted as 6671 the text expansion of a \.{\\count} or similar variable; 6672 6673 \hang|macro|, if a user-defined control sequence is being scanned; 6674 6675 \hang|output_text|, if an \.{\\output} routine is being scanned; 6676 6677 \hang|every_par_text|, if the text of \.{\\everypar} is being scanned; 6678 6679 \hang|every_math_text|, if the text of \.{\\everymath} is being scanned; 6680 6681 \hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned; 6682 6683 \hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned; 6684 6685 \hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned; 6686 6687 \hang|every_job_text|, if the text of \.{\\everyjob} is being scanned; 6688 6689 \hang|every_cr_text|, if the text of \.{\\everycr} is being scanned; 6690 6691 \hang|mark_text|, if the text of a \.{\\mark} is being scanned; 6692 6693 \hang|write_text|, if the text of a \.{\\write} is being scanned. 6694 6695 \yskip\noindent 6696 The codes for |output_text|, |every_par_text|, etc., are equal to a constant 6697 plus the corresponding codes for token list parameters |output_routine_loc|, 6698 |every_par_loc|, etc. The token list begins with a reference count if and 6699 only if |token_type>=macro|. 6700 @^reference counts@> 6701 6702 @d token_list=0 {|state| code when scanning a token list} 6703 @d token_type==index {type of current token list} 6704 @d param_start==limit {base of macro parameters in |param_stack|} 6705 @d parameter=0 {|token_type| code for parameter} 6706 @d u_template=1 {|token_type| code for \<u_j> template} 6707 @d v_template=2 {|token_type| code for \<v_j> template} 6708 @d backed_up=3 {|token_type| code for text to be reread} 6709 @d inserted=4 {|token_type| code for inserted texts} 6710 @d macro=5 {|token_type| code for defined control sequences} 6711 @d output_text=6 {|token_type| code for output routines} 6712 @d every_par_text=7 {|token_type| code for \.{\\everypar}} 6713 @d every_math_text=8 {|token_type| code for \.{\\everymath}} 6714 @d every_display_text=9 {|token_type| code for \.{\\everydisplay}} 6715 @d every_hbox_text=10 {|token_type| code for \.{\\everyhbox}} 6716 @d every_vbox_text=11 {|token_type| code for \.{\\everyvbox}} 6717 @d every_job_text=12 {|token_type| code for \.{\\everyjob}} 6718 @d every_cr_text=13 {|token_type| code for \.{\\everycr}} 6719 @d mark_text=14 {|token_type| code for \.{\\topmark}, etc.} 6720 @d write_text=15 {|token_type| code for \.{\\write}} 6721 6722 @ The |param_stack| is an auxiliary array used to hold pointers to the token 6723 lists for parameters at the current level and subsidiary levels of input. 6724 This stack is maintained with convention (2), and it grows at a different 6725 rate from the others. 6726 6727 @<Glob...@>= 6728 @!param_stack:array [0..param_size] of pointer; 6729 {token list pointers for parameters} 6730 @!param_ptr:0..param_size; {first unused entry in |param_stack|} 6731 @!max_param_stack:integer; 6732 {largest value of |param_ptr|, will be |<=param_size+9|} 6733 6734 @ The input routines must also interact with the processing of 6735 \.{\\halign} and \.{\\valign}, since the appearance of tab marks and 6736 \.{\\cr} in certain places is supposed to trigger the beginning of special 6737 \<v_j> template text in the scanner. This magic is accomplished by an 6738 |align_state| variable that is increased by~1 when a `\.{\char'173}' is 6739 scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state| 6740 is nonzero during the \<u_j> template, after which it is set to zero; the 6741 \<v_j> template begins when a tab mark or \.{\\cr} occurs at a time that 6742 |align_state=0|. 6743 6744 @<Glob...@>= 6745 @!align_state:integer; {group level with respect to current alignment} 6746 6747 @ Thus, the ``current input state'' can be very complicated indeed; there 6748 can be many levels and each level can arise in a variety of ways. The 6749 |show_context| procedure, which is used by \TeX's error-reporting routine to 6750 print out the current input state on all levels down to the most recent 6751 line of characters from an input file, illustrates most of these conventions. 6752 The global variable |base_ptr| contains the lowest level that was 6753 displayed by this procedure. 6754 6755 @<Glob...@>= 6756 @!base_ptr:0..stack_size; {shallowest level shown by |show_context|} 6757 6758 @ The status at each level is indicated by printing two lines, where the first 6759 line indicates what was read so far and the second line shows what remains 6760 to be read. The context is cropped, if necessary, so that the first line 6761 contains at most |half_error_line| characters, and the second contains 6762 at most |error_line|. Non-current input levels whose |token_type| is 6763 `|backed_up|' are shown only if they have not been fully read. 6764 6765 @p procedure show_context; {prints where the scanner is} 6766 label done; 6767 var old_setting:0..max_selector; {saved |selector| setting} 6768 @!nn:integer; {number of contexts shown so far, less one} 6769 @!bottom_line:boolean; {have we reached the final context to be shown?} 6770 @<Local variables for formatting calculations@>@/ 6771 begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input; 6772 {store current state} 6773 nn:=-1; bottom_line:=false; 6774 loop@+begin cur_input:=input_stack[base_ptr]; {enter into the context} 6775 if (state<>token_list) then 6776 if (name>17) or (base_ptr=0) then bottom_line:=true; 6777 if (base_ptr=input_ptr)or bottom_line or(nn<error_context_lines) then 6778 @<Display the current context@> 6779 else if nn=error_context_lines then 6780 begin print_nl("..."); incr(nn); {omitted if |error_context_lines<0|} 6781 end; 6782 if bottom_line then goto done; 6783 decr(base_ptr); 6784 end; 6785 done: cur_input:=input_stack[input_ptr]; {restore original state} 6786 end; 6787 6788 @ @<Display the current context@>= 6789 begin if (base_ptr=input_ptr) or (state<>token_list) or 6790 (token_type<>backed_up) or (loc<>null) then 6791 {we omit backed-up token lists that have already been read} 6792 begin tally:=0; {get ready to count characters} 6793 old_setting:=selector; 6794 if state<>token_list then 6795 begin @<Print location of current line@>; 6796 @<Pseudoprint the line@>; 6797 end 6798 else begin @<Print type of token list@>; 6799 @<Pseudoprint the token list@>; 6800 end; 6801 selector:=old_setting; {stop pseudoprinting} 6802 @<Print two lines using the tricky pseudoprinted information@>; 6803 incr(nn); 6804 end; 6805 end 6806 6807 @ This routine should be changed, if necessary, to give the best possible 6808 indication of where the current line resides in the input file. 6809 For example, on some systems it is best to print both a page and line number. 6810 @^system dependencies@> 6811 6812 @<Print location of current line@>= 6813 if name<=17 then 6814 if terminal_input then 6815 if base_ptr=0 then print_nl("<*>") else print_nl("<insert> ") 6816 else begin print_nl("<read "); 6817 if name=17 then print_char("*")@+else print_int(name-1); 6818 @.*\relax@> 6819 print_char(">"); 6820 end 6821 else begin print_nl("l."); print_int(line); 6822 end; 6823 print_char(" ") 6824 6825 @ @<Print type of token list@>= 6826 case token_type of 6827 parameter: print_nl("<argument> "); 6828 u_template,v_template: print_nl("<template> "); 6829 backed_up: if loc=null then print_nl("<recently read> ") 6830 else print_nl("<to be read again> "); 6831 inserted: print_nl("<inserted text> "); 6832 macro: begin print_ln; print_cs(name); 6833 end; 6834 output_text: print_nl("<output> "); 6835 every_par_text: print_nl("<everypar> "); 6836 every_math_text: print_nl("<everymath> "); 6837 every_display_text: print_nl("<everydisplay> "); 6838 every_hbox_text: print_nl("<everyhbox> "); 6839 every_vbox_text: print_nl("<everyvbox> "); 6840 every_job_text: print_nl("<everyjob> "); 6841 every_cr_text: print_nl("<everycr> "); 6842 mark_text: print_nl("<mark> "); 6843 write_text: print_nl("<write> "); 6844 othercases print_nl("?") {this should never happen} 6845 endcases 6846 6847 @ Here it is necessary to explain a little trick. We don't want to store a long 6848 string that corresponds to a token list, because that string might take up 6849 lots of memory; and we are printing during a time when an error message is 6850 being given, so we dare not do anything that might overflow one of \TeX's 6851 tables. So `pseudoprinting' is the answer: We enter a mode of printing 6852 that stores characters into a buffer of length |error_line|, where character 6853 $k+1$ is placed into \hbox{|trick_buf[k mod error_line]|} if 6854 |k<trick_count|, otherwise character |k| is dropped. Initially we set 6855 |tally:=0| and |trick_count:=1000000|; then when we reach the 6856 point where transition from line 1 to line 2 should occur, we 6857 set |first_count:=tally| and |trick_count:=@tmax@>(error_line, 6858 tally+1+error_line-half_error_line)|. At the end of the 6859 pseudoprinting, the values of |first_count|, |tally|, and 6860 |trick_count| give us all the information we need to print the two lines, 6861 and all of the necessary text is in |trick_buf|. 6862 6863 Namely, let |l| be the length of the descriptive information that appears 6864 on the first line. The length of the context information gathered for that 6865 line is |k=first_count|, and the length of the context information 6866 gathered for line~2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k<=h|, 6867 where |h=half_error_line|, we print |trick_buf[0..k-1]| after the 6868 descriptive information on line~1, and set |n:=l+k|; here |n| is the 6869 length of line~1. If $l+k>h$, some cropping is necessary, so we set |n:=h| 6870 and print `\.{...}' followed by 6871 $$\hbox{|trick_buf[(l+k-h+3)..k-1]|,}$$ 6872 where subscripts of |trick_buf| are circular modulo |error_line|. The 6873 second line consists of |n|~spaces followed by |trick_buf[k..(k+m-1)]|, 6874 unless |n+m>error_line|; in the latter case, further cropping is done. 6875 This is easier to program than to explain. 6876 6877 @<Local variables for formatting...@>= 6878 @!i:0..buf_size; {index into |buffer|} 6879 @!j:0..buf_size; {end of current line in |buffer|} 6880 @!l:0..half_error_line; {length of descriptive information on line 1} 6881 @!m:integer; {context information gathered for line 2} 6882 @!n:0..error_line; {length of line 1} 6883 @!p: integer; {starting or ending place in |trick_buf|} 6884 @!q: integer; {temporary index} 6885 6886 @ The following code sets up the print routines so that they will gather 6887 the desired information. 6888 6889 @d begin_pseudoprint== 6890 begin l:=tally; tally:=0; selector:=pseudo; 6891 trick_count:=1000000; 6892 end 6893 @d set_trick_count== 6894 begin first_count:=tally; 6895 trick_count:=tally+1+error_line-half_error_line; 6896 if trick_count<error_line then trick_count:=error_line; 6897 end 6898 6899 @ And the following code uses the information after it has been gathered. 6900 6901 @<Print two lines using the tricky pseudoprinted information@>= 6902 if trick_count=1000000 then set_trick_count; 6903 {|set_trick_count| must be performed} 6904 if tally<trick_count then m:=tally-first_count 6905 else m:=trick_count-first_count; {context on line 2} 6906 if l+first_count<=half_error_line then 6907 begin p:=0; n:=l+first_count; 6908 end 6909 else begin print("..."); p:=l+first_count-half_error_line+3; 6910 n:=half_error_line; 6911 end; 6912 for q:=p to first_count-1 do print_char(trick_buf[q mod error_line]); 6913 print_ln; 6914 for q:=1 to n do print_char(" "); {print |n| spaces to begin line~2} 6915 if m+n<=error_line then p:=first_count+m else p:=first_count+(error_line-n-3); 6916 for q:=first_count to p-1 do print_char(trick_buf[q mod error_line]); 6917 if m+n>error_line then print("...") 6918 6919 @ But the trick is distracting us from our current goal, which is to 6920 understand the input state. So let's concentrate on the data structures that 6921 are being pseudoprinted as we finish up the |show_context| procedure. 6922 6923 @<Pseudoprint the line@>= 6924 begin_pseudoprint; 6925 if buffer[limit]=end_line_char then j:=limit 6926 else j:=limit+1; {determine the effective end of the line} 6927 if j>0 then for i:=start to j-1 do 6928 begin if i=loc then set_trick_count; 6929 print(buffer[i]); 6930 end 6931 6932 @ @<Pseudoprint the token list@>= 6933 begin_pseudoprint; 6934 if token_type<macro then show_token_list(start,loc,100000) 6935 else show_token_list(link(start),loc,100000) {avoid reference count} 6936 6937 @ Here is the missing piece of |show_token_list| that is activated when the 6938 token beginning line~2 is about to be shown: 6939 6940 @<Do magic computation@>=set_trick_count 6941 6942 @* \[23] Maintaining the input stacks. 6943 The following subroutines change the input status in commonly needed ways. 6944 6945 First comes |push_input|, which stores the current state and creates a 6946 new level (having, initially, the same properties as the old). 6947 6948 @d push_input==@t@> {enter a new input level, save the old} 6949 begin if input_ptr>max_in_stack then 6950 begin max_in_stack:=input_ptr; 6951 if input_ptr=stack_size then overflow("input stack size",stack_size); 6952 @:TeX capacity exceeded input stack size}{\quad input stack size@> 6953 end; 6954 input_stack[input_ptr]:=cur_input; {stack the record} 6955 incr(input_ptr); 6956 end 6957 6958 @ And of course what goes up must come down. 6959 6960 @d pop_input==@t@> {leave an input level, re-enter the old} 6961 begin decr(input_ptr); cur_input:=input_stack[input_ptr]; 6962 end 6963 6964 @ Here is a procedure that starts a new level of token-list input, given 6965 a token list |p| and its type |t|. If |t=macro|, the calling routine should 6966 set |name| and |loc|. 6967 6968 @d back_list(#)==begin_token_list(#,backed_up) {backs up a simple token list} 6969 @d ins_list(#)==begin_token_list(#,inserted) {inserts a simple token list} 6970 6971 @p procedure begin_token_list(@!p:pointer;@!t:quarterword); 6972 begin push_input; state:=token_list; start:=p; token_type:=t; 6973 if t>=macro then {the token list starts with a reference count} 6974 begin add_token_ref(p); 6975 if t=macro then param_start:=param_ptr 6976 else begin loc:=link(p); 6977 if tracing_macros>1 then 6978 begin begin_diagnostic; print_nl(""); 6979 case t of 6980 mark_text:print_esc("mark"); 6981 write_text:print_esc("write"); 6982 othercases print_cmd_chr(assign_toks,t-output_text+output_routine_loc) 6983 endcases;@/ 6984 print("->"); token_show(p); end_diagnostic(false); 6985 end; 6986 end; 6987 end 6988 else loc:=p; 6989 end; 6990 6991 @ When a token list has been fully scanned, the following computations 6992 should be done as we leave that level of input. The |token_type| tends 6993 to be equal to either |backed_up| or |inserted| about 2/3 of the time. 6994 @^inner loop@> 6995 6996 @p procedure end_token_list; {leave a token-list input level} 6997 begin if token_type>=backed_up then {token list to be deleted} 6998 begin if token_type<=inserted then flush_list(start) 6999 else begin delete_token_ref(start); {update reference count} 7000 if token_type=macro then {parameters must be flushed} 7001 while param_ptr>param_start do 7002 begin decr(param_ptr); 7003 flush_list(param_stack[param_ptr]); 7004 end; 7005 end; 7006 end 7007 else if token_type=u_template then 7008 if align_state>500000 then align_state:=0 7009 else fatal_error("(interwoven alignment preambles are not allowed)"); 7010 @.interwoven alignment preambles...@> 7011 pop_input; 7012 check_interrupt; 7013 end; 7014 7015 @ Sometimes \TeX\ has read too far and wants to ``unscan'' what it has 7016 seen. The |back_input| procedure takes care of this by putting the token 7017 just scanned back into the input stream, ready to be read again. This 7018 procedure can be used only if |cur_tok| represents the token to be 7019 replaced. Some applications of \TeX\ use this procedure a lot, 7020 so it has been slightly optimized for speed. 7021 @^inner loop@> 7022 7023 @p procedure back_input; {undoes one token of input} 7024 var p:pointer; {a token list of length one} 7025 begin while (state=token_list)and(loc=null)and(token_type<>v_template) do 7026 end_token_list; {conserve stack space} 7027 p:=get_avail; info(p):=cur_tok; 7028 if cur_tok<right_brace_limit then 7029 if cur_tok<left_brace_limit then decr(align_state) 7030 else incr(align_state); 7031 push_input; state:=token_list; start:=p; token_type:=backed_up; 7032 loc:=p; {that was |back_list(p)|, without procedure overhead} 7033 end; 7034 7035 @ @<Insert token |p| into \TeX's input@>= 7036 begin t:=cur_tok; cur_tok:=p; back_input; cur_tok:=t; 7037 end 7038 7039 @ The |back_error| routine is used when we want to replace an offending token 7040 just before issuing an error message. This routine, like |back_input|, 7041 requires that |cur_tok| has been set. We disable interrupts during the 7042 call of |back_input| so that the help message won't be lost. 7043 7044 @p procedure back_error; {back up one token and call |error|} 7045 begin OK_to_interrupt:=false; back_input; OK_to_interrupt:=true; error; 7046 end; 7047 @# 7048 procedure ins_error; {back up one inserted token and call |error|} 7049 begin OK_to_interrupt:=false; back_input; token_type:=inserted; 7050 OK_to_interrupt:=true; error; 7051 end; 7052 7053 @ The |begin_file_reading| procedure starts a new level of input for lines 7054 of characters to be read from a file, or as an insertion from the 7055 terminal. It does not take care of opening the file, nor does it set |loc| 7056 or |limit| or |line|. 7057 @^system dependencies@> 7058 7059 @p procedure begin_file_reading; 7060 begin if in_open=max_in_open then overflow("text input levels",max_in_open); 7061 @:TeX capacity exceeded text input levels}{\quad text input levels@> 7062 if first=buf_size then overflow("buffer size",buf_size); 7063 @:TeX capacity exceeded buffer size}{\quad buffer size@> 7064 incr(in_open); push_input; index:=in_open; 7065 line_stack[index]:=line; start:=first; state:=mid_line; 7066 name:=0; {|terminal_input| is now |true|} 7067 end; 7068 7069 @ Conversely, the variables must be downdated when such a level of input 7070 is finished: 7071 7072 @p procedure end_file_reading; 7073 begin first:=start; line:=line_stack[index]; 7074 if name>17 then a_close(cur_file); {forget it} 7075 pop_input; decr(in_open); 7076 end; 7077 7078 @ In order to keep the stack from overflowing during a long sequence of 7079 inserted `\.{\\show}' commands, the following routine removes completed 7080 error-inserted lines from memory. 7081 7082 @p procedure clear_for_error_prompt; 7083 begin while (state<>token_list)and terminal_input and@| 7084 (input_ptr>0)and(loc>limit) do end_file_reading; 7085 print_ln; clear_terminal; 7086 end; 7087 7088 @ To get \TeX's whole input mechanism going, we perform the following 7089 actions. 7090 7091 @<Initialize the input routines@>= 7092 begin input_ptr:=0; max_in_stack:=0; 7093 in_open:=0; open_parens:=0; max_buf_stack:=0; 7094 param_ptr:=0; max_param_stack:=0; 7095 first:=buf_size; repeat buffer[first]:=0; decr(first); until first=0; 7096 scanner_status:=normal; warning_index:=null; first:=1; 7097 state:=new_line; start:=1; index:=0; line:=0; name:=0; 7098 force_eof:=false; 7099 align_state:=1000000;@/ 7100 if not init_terminal then goto final_end; 7101 limit:=last; first:=last+1; {|init_terminal| has set |loc| and |last|} 7102 end 7103 7104 @* \[24] Getting the next token. 7105 The heart of \TeX's input mechanism is the |get_next| procedure, which 7106 we shall develop in the next few sections of the program. Perhaps we 7107 shouldn't actually call it the ``heart,'' however, because it really acts 7108 as \TeX's eyes and mouth, reading the source files and gobbling them up. 7109 And it also helps \TeX\ to regurgitate stored token lists that are to be 7110 processed again. 7111 @^eyes and mouth@> 7112 7113 The main duty of |get_next| is to input one token and to set |cur_cmd| 7114 and |cur_chr| to that token's command code and modifier. Furthermore, if 7115 the input token is a control sequence, the |eqtb| location of that control 7116 sequence is stored in |cur_cs|; otherwise |cur_cs| is set to zero. 7117 7118 Underlying this simple description is a certain amount of complexity 7119 because of all the cases that need to be handled. 7120 However, the inner loop of |get_next| is reasonably short and fast. 7121 7122 When |get_next| is asked to get the next token of a \.{\\read} line, 7123 it sets |cur_cmd=cur_chr=cur_cs=0| in the case that no more tokens 7124 appear on that line. (There might not be any tokens at all, if the 7125 |end_line_char| has |ignore| as its catcode.) 7126 7127 @ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity 7128 is needed because a blank line of input is supposed to be exactly equivalent 7129 to the appearance of \.{\\par}; we must set |cur_cs:=par_loc| 7130 when detecting a blank line. 7131 7132 @<Glob...@>= 7133 @!par_loc:pointer; {location of `\.{\\par}' in |eqtb|} 7134 @!par_token:halfword; {token representing `\.{\\par}'} 7135 7136 @ @<Put each...@>= 7137 primitive("par",par_end,256); {cf.\ |scan_file_name|} 7138 @!@:par_}{\.{\\par} primitive@> 7139 par_loc:=cur_val; par_token:=cs_token_flag+par_loc; 7140 7141 @ @<Cases of |print_cmd_chr|...@>= 7142 par_end:print_esc("par"); 7143 7144 @ Before getting into |get_next|, let's consider the subroutine that 7145 is called when an `\.{\\outer}' control sequence has been scanned or 7146 when the end of a file has been reached. These two cases are distinguished 7147 by |cur_cs|, which is zero at the end of a file. 7148 7149 @p procedure check_outer_validity; 7150 var p:pointer; {points to inserted token list} 7151 @!q:pointer; {auxiliary pointer} 7152 begin if scanner_status<>normal then 7153 begin deletions_allowed:=false; 7154 @<Back up an outer control sequence so that it can be reread@>; 7155 if scanner_status>skipping then 7156 @<Tell the user what has run away and try to recover@> 7157 else begin print_err("Incomplete "); print_cmd_chr(if_test,cur_if); 7158 @.Incomplete \\if...@> 7159 print("; all text was ignored after line "); print_int(skip_line); 7160 help3("A forbidden control sequence occurred in skipped text.")@/ 7161 ("This kind of error happens when you say `\if...' and forget")@/ 7162 ("the matching `\fi'. I've inserted a `\fi'; this might work."); 7163 if cur_cs<>0 then cur_cs:=0 7164 else help_line[2]:=@| 7165 "The file ended while I was skipping conditional text."; 7166 cur_tok:=cs_token_flag+frozen_fi; ins_error; 7167 end; 7168 deletions_allowed:=true; 7169 end; 7170 end; 7171 7172 @ An outer control sequence that occurs in a \.{\\read} will not be reread, 7173 since the error recovery for \.{\\read} is not very powerful. 7174 7175 @<Back up an outer control sequence so that it can be reread@>= 7176 if cur_cs<>0 then 7177 begin if (state=token_list)or(name<1)or(name>17) then 7178 begin p:=get_avail; info(p):=cs_token_flag+cur_cs; 7179 back_list(p); {prepare to read the control sequence again} 7180 end; 7181 cur_cmd:=spacer; cur_chr:=" "; {replace it by a space} 7182 end 7183 7184 @ @<Tell the user what has run away...@>= 7185 begin runaway; {print a definition, argument, or preamble} 7186 if cur_cs=0 then print_err("File ended") 7187 @.File ended while scanning...@> 7188 else begin cur_cs:=0; print_err("Forbidden control sequence found"); 7189 @.Forbidden control sequence...@> 7190 end; 7191 print(" while scanning "); 7192 @<Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}', 7193 and insert tokens that should lead to recovery@>; 7194 print(" of "); sprint_cs(warning_index); 7195 help4("I suspect you have forgotten a `}', causing me")@/ 7196 ("to read past where you wanted me to stop.")@/ 7197 ("I'll try to recover; but if the error is serious,")@/ 7198 ("you'd better type `E' or `X' now and fix your file.");@/ 7199 error; 7200 end 7201 7202 @ The recovery procedure can't be fully understood without knowing more 7203 about the \TeX\ routines that should be aborted, but we can sketch the 7204 ideas here: For a runaway definition or a runaway balanced text 7205 we will insert a right brace; for a 7206 runaway preamble, we will insert a special \.{\\cr} token and a right 7207 brace; and for a runaway argument, we will set |long_state| to 7208 |outer_call| and insert \.{\\par}. 7209 7210 @<Print either `\.{definition}' or ...@>= 7211 p:=get_avail; 7212 case scanner_status of 7213 defining:begin print("definition"); info(p):=right_brace_token+"}"; 7214 end; 7215 matching:begin print("use"); info(p):=par_token; long_state:=outer_call; 7216 end; 7217 aligning:begin print("preamble"); info(p):=right_brace_token+"}"; q:=p; 7218 p:=get_avail; link(p):=q; info(p):=cs_token_flag+frozen_cr; 7219 align_state:=-1000000; 7220 end; 7221 absorbing:begin print("text"); info(p):=right_brace_token+"}"; 7222 end; 7223 end; {there are no other cases} 7224 ins_list(p) 7225 7226 @ We need to mention a procedure here that may be called by |get_next|. 7227 7228 @p procedure@?firm_up_the_line; forward; 7229 7230 @ Now we're ready to take the plunge into |get_next| itself. Parts of 7231 this routine are executed more often than any other instructions of \TeX. 7232 @^mastication@>@^inner loop@> 7233 7234 @d switch=25 {a label in |get_next|} 7235 @d start_cs=26 {another} 7236 7237 @p procedure get_next; {sets |cur_cmd|, |cur_chr|, |cur_cs| to next token} 7238 label restart, {go here to get the next input token} 7239 switch, {go here to eat the next character from a file} 7240 reswitch, {go here to digest it again} 7241 start_cs, {go here to start looking for a control sequence} 7242 found, {go here when a control sequence has been found} 7243 exit; {go here when the next input token has been got} 7244 var k:0..buf_size; {an index into |buffer|} 7245 @!t:halfword; {a token} 7246 @!cat:0..max_char_code; {|cat_code(cur_chr)|, usually} 7247 @!c,@!cc:ASCII_code; {constituents of a possible expanded code} 7248 @!d:2..3; {number of excess characters in an expanded code} 7249 begin restart: cur_cs:=0; 7250 if state<>token_list then 7251 @<Input from external file, |goto restart| if no input found@> 7252 else @<Input from token list, |goto restart| if end of list or 7253 if a parameter needs to be expanded@>; 7254 @<If an alignment entry has just ended, take appropriate action@>; 7255 exit:end; 7256 7257 @ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the 7258 current level of braces is the same as the level that was present at the 7259 beginning of that alignment entry; i.e., provided that |align_state| has 7260 returned to the value it had after the \<u_j> template for that entry. 7261 @^inner loop@> 7262 7263 @<If an alignment entry has just ended, take appropriate action@>= 7264 if cur_cmd<=car_ret then if cur_cmd>=tab_mark then if align_state=0 then 7265 @<Insert the \(v)\<v_j> template and |goto restart|@> 7266 7267 @ @<Input from external file, |goto restart| if no input found@>= 7268 @^inner loop@> 7269 begin switch: if loc<=limit then {current line not yet finished} 7270 begin cur_chr:=buffer[loc]; incr(loc); 7271 reswitch: cur_cmd:=cat_code(cur_chr); 7272 @<Change state if necessary, and |goto switch| if the 7273 current character should be ignored, 7274 or |goto reswitch| if the current character 7275 changes to another@>; 7276 end 7277 else begin state:=new_line;@/ 7278 @<Move to next line of file, 7279 or |goto restart| if there is no next line, 7280 or |return| if a \.{\\read} line has finished@>; 7281 check_interrupt; 7282 goto switch; 7283 end; 7284 end 7285 7286 @ The following 48-way switch accomplishes the scanning quickly, assuming 7287 that a decent \PASCAL\ compiler has translated the code. Note that the numeric 7288 values for |mid_line|, |skip_blanks|, and |new_line| are spaced 7289 apart from each other by |max_char_code+1|, so we can add a character's 7290 command code to the state to get a single number that characterizes both. 7291 7292 @d any_state_plus(#) == mid_line+#,skip_blanks+#,new_line+# 7293 7294 @<Change state if necessary...@>= 7295 case state+cur_cmd of 7296 @<Cases where character is ignored@>: goto switch; 7297 any_state_plus(escape): @<Scan a control sequence 7298 and set |state:=skip_blanks| or |mid_line|@>; 7299 any_state_plus(active_char): @<Process an active-character control sequence 7300 and set |state:=mid_line|@>; 7301 any_state_plus(sup_mark): @<If this |sup_mark| starts an expanded character 7302 like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|, 7303 otherwise set |state:=mid_line|@>; 7304 any_state_plus(invalid_char): @<Decry the invalid character and 7305 |goto restart|@>; 7306 @t\4@>@<Handle situations involving spaces, braces, changes of state@>@; 7307 othercases do_nothing 7308 endcases 7309 7310 @ @<Cases where character is ignored@>= 7311 any_state_plus(ignore),skip_blanks+spacer,new_line+spacer 7312 7313 @ We go to |restart| instead of to |switch|, because |state| might equal 7314 |token_list| after the error has been dealt with 7315 (cf.\ |clear_for_error_prompt|). 7316 7317 @<Decry the invalid...@>= 7318 begin print_err("Text line contains an invalid character"); 7319 @.Text line contains...@> 7320 help2("A funny symbol that I can't read has just been input.")@/ 7321 ("Continue, and I'll forget that it ever happened.");@/ 7322 deletions_allowed:=false; error; deletions_allowed:=true; 7323 goto restart; 7324 end 7325 7326 @ @d add_delims_to(#)==#+math_shift,#+tab_mark,#+mac_param, 7327 #+sub_mark,#+letter,#+other_char 7328 7329 @<Handle situations involving spaces, braces, changes of state@>= 7330 mid_line+spacer:@<Enter |skip_blanks| state, emit a space@>; 7331 mid_line+car_ret:@<Finish line, emit a space@>; 7332 skip_blanks+car_ret,any_state_plus(comment): 7333 @<Finish line, |goto switch|@>; 7334 new_line+car_ret:@<Finish line, emit a \.{\\par}@>; 7335 mid_line+left_brace: incr(align_state); 7336 skip_blanks+left_brace,new_line+left_brace: begin 7337 state:=mid_line; incr(align_state); 7338 end; 7339 mid_line+right_brace: decr(align_state); 7340 skip_blanks+right_brace,new_line+right_brace: begin 7341 state:=mid_line; decr(align_state); 7342 end; 7343 add_delims_to(skip_blanks),add_delims_to(new_line): state:=mid_line; 7344 7345 @ When a character of type |spacer| gets through, its character code is 7346 changed to $\.{"\ "}=@'40$. This means that the ASCII codes for tab and space, 7347 and for the space inserted at the end of a line, will 7348 be treated alike when macro parameters are being matched. We do this 7349 since such characters are indistinguishable on most computer terminal displays. 7350 7351 @<Finish line, emit a space@>= 7352 begin loc:=limit+1; cur_cmd:=spacer; cur_chr:=" "; 7353 end 7354 7355 @ The following code is performed only when |cur_cmd=spacer|. 7356 7357 @<Enter |skip_blanks| state, emit a space@>= 7358 begin state:=skip_blanks; cur_chr:=" "; 7359 end 7360 7361 @ @<Finish line, |goto switch|@>= 7362 begin loc:=limit+1; goto switch; 7363 end 7364 7365 @ @<Finish line, emit a \.{\\par}@>= 7366 begin loc:=limit+1; cur_cs:=par_loc; cur_cmd:=eq_type(cur_cs); 7367 cur_chr:=equiv(cur_cs); 7368 if cur_cmd>=outer_call then check_outer_validity; 7369 end 7370 7371 @ Notice that a code like \.{\^\^8} becomes \.x if not followed by a hex digit. 7372 7373 @d is_hex(#)==(((#>="0")and(#<="9"))or((#>="a")and(#<="f"))) 7374 @d hex_to_cur_chr== 7375 if c<="9" then cur_chr:=c-"0" @+else cur_chr:=c-"a"+10; 7376 if cc<="9" then cur_chr:=16*cur_chr+cc-"0" 7377 else cur_chr:=16*cur_chr+cc-"a"+10 7378 7379 @<If this |sup_mark| starts an expanded character...@>= 7380 begin if cur_chr=buffer[loc] then if loc<limit then 7381 begin c:=buffer[loc+1]; @+if c<@'200 then {yes we have an expanded char} 7382 begin loc:=loc+2; 7383 if is_hex(c) then if loc<=limit then 7384 begin cc:=buffer[loc]; @+if is_hex(cc) then 7385 begin incr(loc); hex_to_cur_chr; goto reswitch; 7386 end; 7387 end; 7388 if c<@'100 then cur_chr:=c+@'100 @+else cur_chr:=c-@'100; 7389 goto reswitch; 7390 end; 7391 end; 7392 state:=mid_line; 7393 end 7394 7395 @ @<Process an active-character...@>= 7396 begin cur_cs:=cur_chr+active_base; 7397 cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); state:=mid_line; 7398 if cur_cmd>=outer_call then check_outer_validity; 7399 end 7400 7401 @ Control sequence names are scanned only when they appear in some line of 7402 a file; once they have been scanned the first time, their |eqtb| location 7403 serves as a unique identification, so \TeX\ doesn't need to refer to the 7404 original name any more except when it prints the equivalent in symbolic form. 7405 7406 The program that scans a control sequence has been written carefully 7407 in order to avoid the blowups that might otherwise occur if a malicious 7408 user tried something like `\.{\\catcode\'15=0}'. The algorithm might 7409 look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|. 7410 7411 If expanded characters like `\.{\^\^A}' or `\.{\^\^df}' 7412 appear in or just following 7413 a control sequence name, they are converted to single characters in the 7414 buffer and the process is repeated, slowly but surely. 7415 7416 @<Scan a control...@>= 7417 begin if loc>limit then cur_cs:=null_cs {|state| is irrelevant in this case} 7418 else begin start_cs: k:=loc; cur_chr:=buffer[k]; cat:=cat_code(cur_chr); 7419 incr(k); 7420 if cat=letter then state:=skip_blanks 7421 else if cat=spacer then state:=skip_blanks 7422 else state:=mid_line; 7423 if (cat=letter)and(k<=limit) then 7424 @<Scan ahead in the buffer until finding a nonletter; 7425 if an expanded code is encountered, reduce it 7426 and |goto start_cs|; otherwise if a multiletter control 7427 sequence is found, adjust |cur_cs| and |loc|, and 7428 |goto found|@> 7429 else @<If an expanded code is present, reduce it and |goto start_cs|@>; 7430 cur_cs:=single_base+buffer[loc]; incr(loc); 7431 end; 7432 found: cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7433 if cur_cmd>=outer_call then check_outer_validity; 7434 end 7435 7436 @ Whenever we reach the following piece of code, we will have 7437 |cur_chr=buffer[k-1]| and |k<=limit+1| and |cat=cat_code(cur_chr)|. If an 7438 expanded code like \.{\^\^A} or \.{\^\^df} appears in |buffer[(k-1)..(k+1)]| 7439 or |buffer[(k-1)..(k+2)]|, we 7440 will store the corresponding code in |buffer[k-1]| and shift the rest of 7441 the buffer left two or three places. 7442 7443 @<If an expanded...@>= 7444 begin if buffer[k]=cur_chr then @+if cat=sup_mark then @+if k<limit then 7445 begin c:=buffer[k+1]; @+if c<@'200 then {yes, one is indeed present} 7446 begin d:=2; 7447 if is_hex(c) then @+if k+2<=limit then 7448 begin cc:=buffer[k+2]; @+if is_hex(cc) then incr(d); 7449 end; 7450 if d>2 then 7451 begin hex_to_cur_chr; buffer[k-1]:=cur_chr; 7452 end 7453 else if c<@'100 then buffer[k-1]:=c+@'100 7454 else buffer[k-1]:=c-@'100; 7455 limit:=limit-d; first:=first-d; 7456 while k<=limit do 7457 begin buffer[k]:=buffer[k+d]; incr(k); 7458 end; 7459 goto start_cs; 7460 end; 7461 end; 7462 end 7463 7464 @ @<Scan ahead in the buffer...@>= 7465 begin repeat cur_chr:=buffer[k]; cat:=cat_code(cur_chr); incr(k); 7466 until (cat<>letter)or(k>limit); 7467 @<If an expanded...@>; 7468 if cat<>letter then decr(k); 7469 {now |k| points to first nonletter} 7470 if k>loc+1 then {multiletter control sequence has been scanned} 7471 begin cur_cs:=id_lookup(loc,k-loc); loc:=k; goto found; 7472 end; 7473 end 7474 7475 @ Let's consider now what happens when |get_next| is looking at a token list. 7476 7477 @<Input from token list, |goto restart| if end of list or 7478 if a parameter needs to be expanded@>= 7479 if loc<>null then {list not exhausted} 7480 @^inner loop@> 7481 begin t:=info(loc); loc:=link(loc); {move to next} 7482 if t>=cs_token_flag then {a control sequence token} 7483 begin cur_cs:=t-cs_token_flag; 7484 cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7485 if cur_cmd>=outer_call then 7486 if cur_cmd=dont_expand then 7487 @<Get the next token, suppressing expansion@> 7488 else check_outer_validity; 7489 end 7490 else begin cur_cmd:=t div @'400; cur_chr:=t mod @'400; 7491 case cur_cmd of 7492 left_brace: incr(align_state); 7493 right_brace: decr(align_state); 7494 out_param: @<Insert macro parameter and |goto restart|@>; 7495 othercases do_nothing 7496 endcases; 7497 end; 7498 end 7499 else begin {we are done with this token list} 7500 end_token_list; goto restart; {resume previous level} 7501 end 7502 7503 @ The present point in the program is reached only when the |expand| 7504 routine has inserted a special marker into the input. In this special 7505 case, |info(loc)| is known to be a control sequence token, and |link(loc)=null|. 7506 7507 @d no_expand_flag=257 {this characterizes a special variant of |relax|} 7508 7509 @<Get the next token, suppressing expansion@>= 7510 begin cur_cs:=info(loc)-cs_token_flag; loc:=null;@/ 7511 cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7512 if cur_cmd>max_command then 7513 begin cur_cmd:=relax; cur_chr:=no_expand_flag; 7514 end; 7515 end 7516 7517 @ @<Insert macro parameter...@>= 7518 begin begin_token_list(param_stack[param_start+cur_chr-1],parameter); 7519 goto restart; 7520 end 7521 7522 @ All of the easy branches of |get_next| have now been taken care of. 7523 There is one more branch. 7524 7525 @d end_line_char_inactive == (end_line_char<0)or(end_line_char>255) 7526 7527 @<Move to next line of file, or |goto restart|...@>= 7528 if name>17 then @<Read next line of file into |buffer|, or 7529 |goto restart| if the file has ended@> 7530 else begin if not terminal_input then {\.{\\read} line has ended} 7531 begin cur_cmd:=0; cur_chr:=0; return; 7532 end; 7533 if input_ptr>0 then {text was inserted during error recovery} 7534 begin end_file_reading; goto restart; {resume previous level} 7535 end; 7536 if selector<log_only then open_log_file; 7537 if interaction>nonstop_mode then 7538 begin if end_line_char_inactive then incr(limit); 7539 if limit=start then {previous line was empty} 7540 print_nl("(Please type a command or say `\end')"); 7541 @.Please type...@> 7542 print_ln; first:=start; 7543 prompt_input("*"); {input on-line into |buffer|} 7544 @.*\relax@> 7545 limit:=last; 7546 if end_line_char_inactive then decr(limit) 7547 else buffer[limit]:=end_line_char; 7548 first:=limit+1; 7549 loc:=start; 7550 end 7551 else fatal_error("*** (job aborted, no legal \end found)"); 7552 @.job aborted@> 7553 {nonstop mode, which is intended for overnight batch processing, 7554 never waits for on-line input} 7555 end 7556 7557 @ The global variable |force_eof| is normally |false|; it is set |true| 7558 by an \.{\\endinput} command. 7559 7560 @<Glob...@>= 7561 @!force_eof:boolean; {should the next \.{\\input} be aborted early?} 7562 7563 @ @<Read next line of file into |buffer|, or 7564 |goto restart| if the file has ended@>= 7565 begin incr(line); first:=start; 7566 if not force_eof then 7567 begin if input_ln(cur_file,true) then {not end of file} 7568 firm_up_the_line {this sets |limit|} 7569 else force_eof:=true; 7570 end; 7571 if force_eof then 7572 begin print_char(")"); decr(open_parens); 7573 update_terminal; {show user that file has been read} 7574 force_eof:=false; 7575 end_file_reading; {resume previous level} 7576 check_outer_validity; goto restart; 7577 end; 7578 if end_line_char_inactive then decr(limit) 7579 else buffer[limit]:=end_line_char; 7580 first:=limit+1; loc:=start; {ready to read} 7581 end 7582 7583 @ If the user has set the |pausing| parameter to some positive value, 7584 and if nonstop mode has not been selected, each line of input is displayed 7585 on the terminal and the transcript file, followed by `\.{=>}'. 7586 \TeX\ waits for a response. If the response is simply |carriage_return|, the 7587 line is accepted as it stands, otherwise the line typed is 7588 used instead of the line in the file. 7589 7590 @p procedure firm_up_the_line; 7591 var k:0..buf_size; {an index into |buffer|} 7592 begin limit:=last; 7593 if pausing>0 then if interaction>nonstop_mode then 7594 begin wake_up_terminal; print_ln; 7595 if start<limit then for k:=start to limit-1 do print(buffer[k]); 7596 first:=limit; prompt_input("=>"); {wait for user response} 7597 @.=>@> 7598 if last>first then 7599 begin for k:=first to last-1 do {move line down in buffer} 7600 buffer[k+start-first]:=buffer[k]; 7601 limit:=start+last-first; 7602 end; 7603 end; 7604 end; 7605 7606 @ Since |get_next| is used so frequently in \TeX, it is convenient 7607 to define three related procedures that do a little more: 7608 7609 \yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it 7610 also sets |cur_tok|, a packed halfword version of the current token. 7611 7612 \yskip\hang|get_x_token|, meaning ``get an expanded token,'' is like 7613 |get_token|, but if the current token turns out to be a user-defined 7614 control sequence (i.e., a macro call), or a conditional, 7615 or something like \.{\\topmark} or \.{\\expandafter} or \.{\\csname}, 7616 it is eliminated from the input by beginning the expansion of the macro 7617 or the evaluation of the conditional. 7618 7619 \yskip\hang|x_token| is like |get_x_token| except that it assumes that 7620 |get_next| has already been called. 7621 7622 \yskip\noindent 7623 In fact, these three procedures account for almost every use of |get_next|. 7624 7625 @ No new control sequences will be defined except during a call of 7626 |get_token|, or when \.{\\csname} compresses a token list, because 7627 |no_new_control_sequence| is always |true| at other times. 7628 7629 @p procedure get_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|} 7630 begin no_new_control_sequence:=false; get_next; no_new_control_sequence:=true; 7631 @^inner loop@> 7632 if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7633 else cur_tok:=cs_token_flag+cur_cs; 7634 end; 7635 7636 @* \[25] Expanding the next token. 7637 Only a dozen or so command codes |>max_command| can possibly be returned by 7638 |get_next|; in increasing order, they are |undefined_cs|, |expand_after|, 7639 |no_expand|, |input|, |if_test|, |fi_or_else|, |cs_name|, |convert|, |the|, 7640 |top_bot_mark|, |call|, |long_call|, |outer_call|, |long_outer_call|, and 7641 |end_template|.{\emergencystretch=40pt\par} 7642 7643 The |expand| subroutine is used when |cur_cmd>max_command|. It removes a 7644 ``call'' or a conditional or one of the other special operations just 7645 listed. It follows that |expand| might invoke itself recursively. In all 7646 cases, |expand| destroys the current token, but it sets things up so that 7647 the next |get_next| will deliver the appropriate next token. The value of 7648 |cur_tok| need not be known when |expand| is called. 7649 7650 Since several of the basic scanning routines communicate via global variables, 7651 their values are saved as local variables of |expand| so that 7652 recursive calls don't invalidate them. 7653 @^recursion@> 7654 7655 @p@t\4@>@<Declare the procedure called |macro_call|@>@;@/ 7656 @t\4@>@<Declare the procedure called |insert_relax|@>@;@/ 7657 procedure@?pass_text; forward;@t\2@> 7658 procedure@?start_input; forward;@t\2@> 7659 procedure@?conditional; forward;@t\2@> 7660 procedure@?get_x_token; forward;@t\2@> 7661 procedure@?conv_toks; forward;@t\2@> 7662 procedure@?ins_the_toks; forward;@t\2@> 7663 procedure expand; 7664 var t:halfword; {token that is being ``expanded after''} 7665 @!p,@!q,@!r:pointer; {for list manipulation} 7666 @!j:0..buf_size; {index into |buffer|} 7667 @!cv_backup:integer; {to save the global quantity |cur_val|} 7668 @!cvl_backup,@!radix_backup,@!co_backup:small_number; 7669 {to save |cur_val_level|, etc.} 7670 @!backup_backup:pointer; {to save |link(backup_head)|} 7671 @!save_scanner_status:small_number; {temporary storage of |scanner_status|} 7672 begin cv_backup:=cur_val; cvl_backup:=cur_val_level; radix_backup:=radix; 7673 co_backup:=cur_order; backup_backup:=link(backup_head); 7674 if cur_cmd<call then @<Expand a nonmacro@> 7675 else if cur_cmd<end_template then macro_call 7676 else @<Insert a token containing |frozen_endv|@>; 7677 cur_val:=cv_backup; cur_val_level:=cvl_backup; radix:=radix_backup; 7678 cur_order:=co_backup; link(backup_head):=backup_backup; 7679 end; 7680 7681 @ @<Expand a nonmacro@>= 7682 begin if tracing_commands>1 then show_cur_cmd_chr; 7683 case cur_cmd of 7684 top_bot_mark:@<Insert the \(a)appropriate mark text into the scanner@>; 7685 expand_after:@<Expand the token after the next token@>; 7686 no_expand:@<Suppress expansion of the next token@>; 7687 cs_name:@<Manufacture a control sequence name@>; 7688 convert:conv_toks; {this procedure is discussed in Part 27 below} 7689 the:ins_the_toks; {this procedure is discussed in Part 27 below} 7690 if_test:conditional; {this procedure is discussed in Part 28 below} 7691 fi_or_else:@<Terminate the current conditional and skip to \.{\\fi}@>; 7692 input:@<Initiate or terminate input from a file@>; 7693 othercases @<Complain about an undefined macro@> 7694 endcases; 7695 end 7696 7697 @ It takes only a little shuffling to do what \TeX\ calls \.{\\expandafter}. 7698 7699 @<Expand the token after...@>= 7700 begin get_token; t:=cur_tok; get_token; 7701 if cur_cmd>max_command then expand@+else back_input; 7702 cur_tok:=t; back_input; 7703 end 7704 7705 @ The implementation of \.{\\noexpand} is a bit trickier, because it is 7706 necessary to insert a special `|dont_expand|' marker into \TeX's reading 7707 mechanism. This special marker is processed by |get_next|, but it does 7708 not slow down the inner loop. 7709 7710 Since \.{\\outer} macros might arise here, we must also 7711 clear the |scanner_status| temporarily. 7712 7713 @<Suppress expansion...@>= 7714 begin save_scanner_status:=scanner_status; scanner_status:=normal; 7715 get_token; scanner_status:=save_scanner_status; t:=cur_tok; 7716 back_input; {now |start| and |loc| point to the backed-up token |t|} 7717 if t>=cs_token_flag then 7718 begin p:=get_avail; info(p):=cs_token_flag+frozen_dont_expand; 7719 link(p):=loc; start:=p; loc:=p; 7720 end; 7721 end 7722 7723 @ @<Complain about an undefined macro@>= 7724 begin print_err("Undefined control sequence"); 7725 @.Undefined control sequence@> 7726 help5("The control sequence at the end of the top line")@/ 7727 ("of your error message was never \def'ed. If you have")@/ 7728 ("misspelled it (e.g., `\hobx'), type `I' and the correct")@/ 7729 ("spelling (e.g., `I\hbox'). Otherwise just continue,")@/ 7730 ("and I'll forget about whatever was undefined."); 7731 error; 7732 end 7733 7734 @ The |expand| procedure and some other routines that construct token 7735 lists find it convenient to use the following macros, which are valid only if 7736 the variables |p| and |q| are reserved for token-list building. 7737 7738 @d store_new_token(#)==begin q:=get_avail; link(p):=q; info(q):=#; 7739 p:=q; {|link(p)| is |null|} 7740 end 7741 @d fast_store_new_token(#)==begin fast_get_avail(q); link(p):=q; info(q):=#; 7742 p:=q; {|link(p)| is |null|} 7743 end 7744 7745 @ @<Manufacture a control...@>= 7746 begin r:=get_avail; p:=r; {head of the list of characters} 7747 repeat get_x_token; 7748 if cur_cs=0 then store_new_token(cur_tok); 7749 until cur_cs<>0; 7750 if cur_cmd<>end_cs_name then @<Complain about missing \.{\\endcsname}@>; 7751 @<Look up the characters of list |r| in the hash table, and set |cur_cs|@>; 7752 flush_list(r); 7753 if eq_type(cur_cs)=undefined_cs then 7754 begin eq_define(cur_cs,relax,256); {N.B.: The |save_stack| might change} 7755 end; {the control sequence will now match `\.{\\relax}'} 7756 cur_tok:=cur_cs+cs_token_flag; back_input; 7757 end 7758 7759 @ @<Complain about missing \.{\\endcsname}@>= 7760 begin print_err("Missing "); print_esc("endcsname"); print(" inserted"); 7761 @.Missing \\endcsname...@> 7762 help2("The control sequence marked <to be read again> should")@/ 7763 ("not appear between \csname and \endcsname."); 7764 back_error; 7765 end 7766 7767 @ @<Look up the characters of list |r| in the hash table...@>= 7768 j:=first; p:=link(r); 7769 while p<>null do 7770 begin if j>=max_buf_stack then 7771 begin max_buf_stack:=j+1; 7772 if max_buf_stack=buf_size then 7773 overflow("buffer size",buf_size); 7774 @:TeX capacity exceeded buffer size}{\quad buffer size@> 7775 end; 7776 buffer[j]:=info(p) mod @'400; incr(j); p:=link(p); 7777 end; 7778 if j>first+1 then 7779 begin no_new_control_sequence:=false; cur_cs:=id_lookup(first,j-first); 7780 no_new_control_sequence:=true; 7781 end 7782 else if j=first then cur_cs:=null_cs {the list is empty} 7783 else cur_cs:=single_base+buffer[first] {the list has length one} 7784 7785 @ An |end_template| command is effectively changed to an |endv| command 7786 by the following code. (The reason for this is discussed below; the 7787 |frozen_end_template| at the end of the template has passed the 7788 |check_outer_validity| test, so its mission of error detection has been 7789 accomplished.) 7790 7791 @<Insert a token containing |frozen_endv|@>= 7792 begin cur_tok:=cs_token_flag+frozen_endv; back_input; 7793 end 7794 7795 @ The processing of \.{\\input} involves the |start_input| subroutine, 7796 which will be declared later; the processing of \.{\\endinput} is trivial. 7797 7798 @<Put each...@>= 7799 primitive("input",input,0);@/ 7800 @!@:input_}{\.{\\input} primitive@> 7801 primitive("endinput",input,1);@/ 7802 @!@:end_input_}{\.{\\endinput} primitive@> 7803 7804 @ @<Cases of |print_cmd_chr|...@>= 7805 input: if chr_code=0 then print_esc("input")@+else print_esc("endinput"); 7806 7807 @ @<Initiate or terminate input...@>= 7808 if cur_chr>0 then force_eof:=true 7809 else if name_in_progress then insert_relax 7810 else start_input 7811 7812 @ Sometimes the expansion looks too far ahead, so we want to insert 7813 a harmless \.{\\relax} into the user's input. 7814 7815 @<Declare the procedure called |insert_relax|@>= 7816 procedure insert_relax; 7817 begin cur_tok:=cs_token_flag+cur_cs; back_input; 7818 cur_tok:=cs_token_flag+frozen_relax; back_input; token_type:=inserted; 7819 end; 7820 7821 @ Here is a recursive procedure that is \TeX's usual way to get the 7822 next token of input. It has been slightly optimized to take account of 7823 common cases. 7824 7825 @p procedure get_x_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|, 7826 and expands macros} 7827 label restart,done; 7828 begin restart: get_next; 7829 @^inner loop@> 7830 if cur_cmd<=max_command then goto done; 7831 if cur_cmd>=call then 7832 if cur_cmd<end_template then macro_call 7833 else begin cur_cs:=frozen_endv; cur_cmd:=endv; 7834 goto done; {|cur_chr=null_list|} 7835 end 7836 else expand; 7837 goto restart; 7838 done: if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7839 else cur_tok:=cs_token_flag+cur_cs; 7840 end; 7841 7842 @ The |get_x_token| procedure is essentially equivalent to two consecutive 7843 procedure calls: |get_next; x_token|. 7844 7845 @p procedure x_token; {|get_x_token| without the initial |get_next|} 7846 begin while cur_cmd>max_command do 7847 begin expand; 7848 get_next; 7849 end; 7850 if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7851 else cur_tok:=cs_token_flag+cur_cs; 7852 end; 7853 7854 @ A control sequence that has been \.{\\def}'ed by the user is expanded by 7855 \TeX's |macro_call| procedure. 7856 7857 Before we get into the details of |macro_call|, however, let's consider the 7858 treatment of primitives like \.{\\topmark}, since they are essentially 7859 macros without parameters. The token lists for such marks are kept in a 7860 global array of five pointers; we refer to the individual entries of this 7861 array by symbolic names |top_mark|, etc. The value of |top_mark| is either 7862 |null| or a pointer to the reference count of a token list. 7863 7864 @d top_mark_code=0 {the mark in effect at the previous page break} 7865 @d first_mark_code=1 {the first mark between |top_mark| and |bot_mark|} 7866 @d bot_mark_code=2 {the mark in effect at the current page break} 7867 @d split_first_mark_code=3 {the first mark found by \.{\\vsplit}} 7868 @d split_bot_mark_code=4 {the last mark found by \.{\\vsplit}} 7869 @d top_mark==cur_mark[top_mark_code] 7870 @d first_mark==cur_mark[first_mark_code] 7871 @d bot_mark==cur_mark[bot_mark_code] 7872 @d split_first_mark==cur_mark[split_first_mark_code] 7873 @d split_bot_mark==cur_mark[split_bot_mark_code] 7874 7875 @<Glob...@>= 7876 @!cur_mark:array[top_mark_code..split_bot_mark_code] of pointer; 7877 {token lists for marks} 7878 7879 @ @<Set init...@>= 7880 top_mark:=null; first_mark:=null; bot_mark:=null; 7881 split_first_mark:=null; split_bot_mark:=null; 7882 7883 @ @<Put each...@>= 7884 primitive("topmark",top_bot_mark,top_mark_code); 7885 @!@:top_mark_}{\.{\\topmark} primitive@> 7886 primitive("firstmark",top_bot_mark,first_mark_code); 7887 @!@:first_mark_}{\.{\\firstmark} primitive@> 7888 primitive("botmark",top_bot_mark,bot_mark_code); 7889 @!@:bot_mark_}{\.{\\botmark} primitive@> 7890 primitive("splitfirstmark",top_bot_mark,split_first_mark_code); 7891 @!@:split_first_mark_}{\.{\\splitfirstmark} primitive@> 7892 primitive("splitbotmark",top_bot_mark,split_bot_mark_code); 7893 @!@:split_bot_mark_}{\.{\\splitbotmark} primitive@> 7894 7895 @ @<Cases of |print_cmd_chr|...@>= 7896 top_bot_mark: case chr_code of 7897 first_mark_code: print_esc("firstmark"); 7898 bot_mark_code: print_esc("botmark"); 7899 split_first_mark_code: print_esc("splitfirstmark"); 7900 split_bot_mark_code: print_esc("splitbotmark"); 7901 othercases print_esc("topmark") 7902 endcases; 7903 7904 @ The following code is activated when |cur_cmd=top_bot_mark| and 7905 when |cur_chr| is a code like |top_mark_code|. 7906 7907 @<Insert the \(a)appropriate mark text into the scanner@>= 7908 begin if cur_mark[cur_chr]<>null then 7909 begin_token_list(cur_mark[cur_chr],mark_text); 7910 end 7911 7912 @ Now let's consider |macro_call| itself, which is invoked when \TeX\ is 7913 scanning a control sequence whose |cur_cmd| is either |call|, |long_call|, 7914 |outer_call|, or |long_outer_call|. The control sequence definition 7915 appears in the token list whose reference count is in location |cur_chr| 7916 of |mem|. 7917 7918 The global variable |long_state| will be set to |call| or to |long_call|, 7919 depending on whether or not the control sequence disallows \.{\\par} 7920 in its parameters. The |get_next| routine will set |long_state| to 7921 |outer_call| and emit \.{\\par}, if a file ends or if an \.{\\outer} 7922 control sequence occurs in the midst of an argument. 7923 7924 @<Glob...@>= 7925 @!long_state:call..long_outer_call; {governs the acceptance of \.{\\par}} 7926 7927 @ The parameters, if any, must be scanned before the macro is expanded. 7928 Parameters are token lists without reference counts. They are placed on 7929 an auxiliary stack called |pstack| while they are being scanned, since 7930 the |param_stack| may be losing entries during the matching process. 7931 (Note that |param_stack| can't be gaining entries, since |macro_call| is 7932 the only routine that puts anything onto |param_stack|, and it 7933 is not recursive.) 7934 7935 @<Glob...@>= 7936 @!pstack:array[0..8] of pointer; {arguments supplied to a macro} 7937 7938 @ After parameter scanning is complete, the parameters are moved to the 7939 |param_stack|. Then the macro body is fed to the scanner; in other words, 7940 |macro_call| places the defined text of the control sequence at the 7941 top of\/ \TeX's input stack, so that |get_next| will proceed to read it 7942 next. 7943 7944 The global variable |cur_cs| contains the |eqtb| address of the control sequence 7945 being expanded, when |macro_call| begins. If this control sequence has not been 7946 declared \.{\\long}, i.e., if its command code in the |eq_type| field is 7947 not |long_call| or |long_outer_call|, its parameters are not allowed to contain 7948 the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro 7949 call is aborted, and the \.{\\par} will be rescanned. 7950 7951 @<Declare the procedure called |macro_call|@>= 7952 procedure macro_call; {invokes a user-defined control sequence} 7953 label exit, continue, done, done1, found; 7954 var r:pointer; {current node in the macro's token list} 7955 @!p:pointer; {current node in parameter token list being built} 7956 @!q:pointer; {new node being put into the token list} 7957 @!s:pointer; {backup pointer for parameter matching} 7958 @!t:pointer; {cycle pointer for backup recovery} 7959 @!u,@!v:pointer; {auxiliary pointers for backup recovery} 7960 @!rbrace_ptr:pointer; {one step before the last |right_brace| token} 7961 @!n:small_number; {the number of parameters scanned} 7962 @!unbalance:halfword; {unmatched left braces in current parameter} 7963 @!m:halfword; {the number of tokens or groups (usually)} 7964 @!ref_count:pointer; {start of the token list} 7965 @!save_scanner_status:small_number; {|scanner_status| upon entry} 7966 @!save_warning_index:pointer; {|warning_index| upon entry} 7967 @!match_chr:ASCII_code; {character used in parameter} 7968 begin save_scanner_status:=scanner_status; save_warning_index:=warning_index; 7969 warning_index:=cur_cs; ref_count:=cur_chr; r:=link(ref_count); n:=0; 7970 if tracing_macros>0 then @<Show the text of the macro being expanded@>; 7971 if info(r)<>end_match_token then 7972 @<Scan the parameters and make |link(r)| point to the macro body; but 7973 |return| if an illegal \.{\\par} is detected@>; 7974 @<Feed the macro body and its parameters to the scanner@>; 7975 exit:scanner_status:=save_scanner_status; warning_index:=save_warning_index; 7976 end; 7977 7978 @ Before we put a new token list on the input stack, it is wise to clean off 7979 all token lists that have recently been depleted. Then a user macro that ends 7980 with a call to itself will not require unbounded stack space. 7981 7982 @<Feed the macro body and its parameters to the scanner@>= 7983 while (state=token_list)and(loc=null)and(token_type<>v_template) do 7984 end_token_list; {conserve stack space} 7985 begin_token_list(ref_count,macro); name:=warning_index; loc:=link(r); 7986 if n>0 then 7987 begin if param_ptr+n>max_param_stack then 7988 begin max_param_stack:=param_ptr+n; 7989 if max_param_stack>param_size then 7990 overflow("parameter stack size",param_size); 7991 @:TeX capacity exceeded parameter stack size}{\quad parameter stack size@> 7992 end; 7993 for m:=0 to n-1 do param_stack[param_ptr+m]:=pstack[m]; 7994 param_ptr:=param_ptr+n; 7995 end 7996 7997 @ At this point, the reader will find it advisable to review the explanation 7998 of token list format that was presented earlier, since many aspects of that 7999 format are of importance chiefly in the |macro_call| routine. 8000 8001 The token list might begin with a string of compulsory tokens before the 8002 first |match| or |end_match|. In that case the macro name is supposed to be 8003 followed by those tokens; the following program will set |s=null| to 8004 represent this restriction. Otherwise |s| will be set to the first token of 8005 a string that will delimit the next parameter. 8006 8007 @<Scan the parameters and make |link(r)| point to the macro body...@>= 8008 begin scanner_status:=matching; unbalance:=0; 8009 long_state:=eq_type(cur_cs); 8010 if long_state>=outer_call then long_state:=long_state-2; 8011 repeat link(temp_head):=null; 8012 if (info(r)>match_token+255)or(info(r)<match_token) then s:=null 8013 else begin match_chr:=info(r)-match_token; s:=link(r); r:=s; 8014 p:=temp_head; m:=0; 8015 end; 8016 @<Scan a parameter until its delimiter string has been found; or, if |s=null|, 8017 simply scan the delimiter string@>;@/ 8018 {now |info(r)| is a token whose command code is either |match| or |end_match|} 8019 until info(r)=end_match_token; 8020 end 8021 8022 @ If |info(r)| is a |match| or |end_match| command, it cannot be equal to 8023 any token found by |get_token|. Therefore an undelimited parameter---i.e., 8024 a |match| that is immediately followed by |match| or |end_match|---will 8025 always fail the test `|cur_tok=info(r)|' in the following algorithm. 8026 8027 @<Scan a parameter until its delimiter string has been found; or, ...@>= 8028 continue: get_token; {set |cur_tok| to the next token of input} 8029 if cur_tok=info(r) then 8030 @<Advance \(r)|r|; |goto found| if the parameter delimiter has been 8031 fully matched, otherwise |goto continue|@>; 8032 @<Contribute the recently matched tokens to the current parameter, and 8033 |goto continue| if a partial match is still in effect; 8034 but abort if |s=null|@>; 8035 if cur_tok=par_token then if long_state<>long_call then 8036 @<Report a runaway argument and abort@>; 8037 if cur_tok<right_brace_limit then 8038 if cur_tok<left_brace_limit then 8039 @<Contribute an entire group to the current parameter@> 8040 else @<Report an extra right brace and |goto continue|@> 8041 else @<Store the current token, but |goto continue| if it is 8042 a blank space that would become an undelimited parameter@>; 8043 incr(m); 8044 if info(r)>end_match_token then goto continue; 8045 if info(r)<match_token then goto continue; 8046 found: if s<>null then @<Tidy up the parameter just scanned, and tuck it away@> 8047 8048 @ @<Store the current token, but |goto continue| if it is...@>= 8049 begin if cur_tok=space_token then 8050 if info(r)<=end_match_token then 8051 if info(r)>=match_token then goto continue; 8052 store_new_token(cur_tok); 8053 end 8054 8055 @ A slightly subtle point arises here: When the parameter delimiter ends 8056 with `\.{\#\{}', the token list will have a left brace both before and 8057 after the |end_match|\kern-.4pt. Only one of these should affect the 8058 |align_state|, but both will be scanned, so we must make a correction. 8059 8060 @<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>= 8061 begin r:=link(r); 8062 if (info(r)>=match_token)and(info(r)<=end_match_token) then 8063 begin if cur_tok<left_brace_limit then decr(align_state); 8064 goto found; 8065 end 8066 else goto continue; 8067 end 8068 8069 @ @<Report an extra right brace and |goto continue|@>= 8070 begin back_input; print_err("Argument of "); sprint_cs(warning_index); 8071 @.Argument of \\x has...@> 8072 print(" has an extra }"); 8073 help6("I've run across a `}' that doesn't seem to match anything.")@/ 8074 ("For example, `\def\a#1{...}' and `\a}' would produce")@/ 8075 ("this error. If you simply proceed now, the `\par' that")@/ 8076 ("I've just inserted will cause me to report a runaway")@/ 8077 ("argument that might be the root of the problem. But if")@/ 8078 ("your `}' was spurious, just type `2' and it will go away."); 8079 incr(align_state); long_state:=call; cur_tok:=par_token; ins_error; 8080 goto continue; 8081 end {a white lie; the \.{\\par} won't always trigger a runaway} 8082 8083 @ If |long_state=outer_call|, a runaway argument has already been reported. 8084 8085 @<Report a runaway argument and abort@>= 8086 begin if long_state=call then 8087 begin runaway; print_err("Paragraph ended before "); 8088 @.Paragraph ended before...@> 8089 sprint_cs(warning_index); print(" was complete"); 8090 help3("I suspect you've forgotten a `}', causing me to apply this")@/ 8091 ("control sequence to too much text. How can we recover?")@/ 8092 ("My plan is to forget the whole thing and hope for the best."); 8093 back_error; 8094 end; 8095 pstack[n]:=link(temp_head); align_state:=align_state-unbalance; 8096 for m:=0 to n do flush_list(pstack[m]); 8097 return; 8098 end 8099 8100 @ When the following code becomes active, we have matched tokens from |s| to 8101 the predecessor of |r|, and we have found that |cur_tok<>info(r)|. An 8102 interesting situation now presents itself: If the parameter is to be 8103 delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}', 8104 we want to contribute one `\.a' to the current parameter and resume 8105 looking for a `\.b'. The program must account for such partial matches and 8106 for others that can be quite complex. But most of the time we have |s=r| 8107 and nothing needs to be done. 8108 8109 Incidentally, it is possible for \.{\\par} tokens to sneak in to certain 8110 parameters of non-\.{\\long} macros. For example, consider a case like 8111 `\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed 8112 by an exclamation point. In such situations it does not seem appropriate 8113 to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of 8114 the rules. 8115 8116 @<Contribute the recently matched tokens to the current parameter...@>= 8117 if s<>r then 8118 if s=null then @<Report an improper use of the macro and abort@> 8119 else begin t:=s; 8120 repeat store_new_token(info(t)); incr(m); u:=link(t); v:=s; 8121 loop@+ begin if u=r then 8122 if cur_tok<>info(v) then goto done 8123 else begin r:=link(v); goto continue; 8124 end; 8125 if info(u)<>info(v) then goto done; 8126 u:=link(u); v:=link(v); 8127 end; 8128 done: t:=link(t); 8129 until t=r; 8130 r:=s; {at this point, no tokens are recently matched} 8131 end 8132 8133 @ @<Report an improper use...@>= 8134 begin print_err("Use of "); sprint_cs(warning_index); 8135 @.Use of x doesn't match...@> 8136 print(" doesn't match its definition"); 8137 help4("If you say, e.g., `\def\a1{...}', then you must always")@/ 8138 ("put `1' after `\a', since control sequence names are")@/ 8139 ("made up of letters only. The macro here has not been")@/ 8140 ("followed by the required stuff, so I'm ignoring it."); 8141 error; return; 8142 end 8143 8144 @ @<Contribute an entire group to the current parameter@>= 8145 begin unbalance:=1; 8146 @^inner loop@> 8147 loop@+ begin fast_store_new_token(cur_tok); get_token; 8148 if cur_tok=par_token then if long_state<>long_call then 8149 @<Report a runaway argument and abort@>; 8150 if cur_tok<right_brace_limit then 8151 if cur_tok<left_brace_limit then incr(unbalance) 8152 else begin decr(unbalance); 8153 if unbalance=0 then goto done1; 8154 end; 8155 end; 8156 done1: rbrace_ptr:=p; store_new_token(cur_tok); 8157 end 8158 8159 @ If the parameter consists of a single group enclosed in braces, we must 8160 strip off the enclosing braces. That's why |rbrace_ptr| was introduced. 8161 8162 @<Tidy up the parameter just scanned, and tuck it away@>= 8163 begin if (m=1)and(info(p)<right_brace_limit) then 8164 begin link(rbrace_ptr):=null; free_avail(p); 8165 p:=link(temp_head); pstack[n]:=link(p); free_avail(p); 8166 end 8167 else pstack[n]:=link(temp_head); 8168 incr(n); 8169 if tracing_macros>0 then 8170 begin begin_diagnostic; print_nl(match_chr); print_int(n); 8171 print("<-"); show_token_list(pstack[n-1],null,1000); 8172 end_diagnostic(false); 8173 end; 8174 end 8175 8176 @ @<Show the text of the macro being expanded@>= 8177 begin begin_diagnostic; print_ln; print_cs(warning_index); 8178 token_show(ref_count); end_diagnostic(false); 8179 end 8180 8181 @* \[26] Basic scanning subroutines. 8182 Let's turn now to some procedures that \TeX\ calls upon frequently to digest 8183 certain kinds of patterns in the input. Most of these are quite simple; 8184 some are quite elaborate. Almost all of the routines call |get_x_token|, 8185 which can cause them to be invoked recursively. 8186 @^stomach@> 8187 @^recursion@> 8188 8189 @ The |scan_left_brace| routine is called when a left brace is supposed to be 8190 the next non-blank token. (The term ``left brace'' means, more precisely, 8191 a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to 8192 appear before the |left_brace|. 8193 8194 @p procedure scan_left_brace; {reads a mandatory |left_brace|} 8195 begin @<Get the next non-blank non-relax non-call token@>; 8196 if cur_cmd<>left_brace then 8197 begin print_err("Missing { inserted"); 8198 @.Missing \{ inserted@> 8199 help4("A left brace was mandatory here, so I've put one in.")@/ 8200 ("You might want to delete and/or insert some corrections")@/ 8201 ("so that I will find a matching right brace soon.")@/ 8202 ("(If you're confused by all this, try typing `I}' now.)"); 8203 back_error; cur_tok:=left_brace_token+"{"; cur_cmd:=left_brace; 8204 cur_chr:="{"; incr(align_state); 8205 end; 8206 end; 8207 8208 @ @<Get the next non-blank non-relax non-call token@>= 8209 repeat get_x_token; 8210 until (cur_cmd<>spacer)and(cur_cmd<>relax) 8211 8212 @ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded 8213 by optional spaces; `\.{\\relax}' is not ignored here. 8214 8215 @p procedure scan_optional_equals; 8216 begin @<Get the next non-blank non-call token@>; 8217 if cur_tok<>other_token+"=" then back_input; 8218 end; 8219 8220 @ @<Get the next non-blank non-call token@>= 8221 repeat get_x_token; 8222 until cur_cmd<>spacer 8223 8224 @ In case you are getting bored, here is a slightly less trivial routine: 8225 Given a string of lowercase letters, like `\.{pt}' or `\.{plus}' or 8226 `\.{width}', the |scan_keyword| routine checks to see whether the next 8227 tokens of input match this string. The match must be exact, except that 8228 uppercase letters will match their lowercase counterparts; uppercase 8229 equivalents are determined by subtracting |"a"-"A"|, rather than using the 8230 |uc_code| table, since \TeX\ uses this routine only for its own limited 8231 set of keywords. 8232 8233 If a match is found, the characters are effectively removed from the input 8234 and |true| is returned. Otherwise |false| is returned, and the input 8235 is left essentially unchanged (except for the fact that some macros 8236 may have been expanded, etc.). 8237 @^inner loop@> 8238 8239 @p function scan_keyword(@!s:str_number):boolean; {look for a given string} 8240 label exit; 8241 var p:pointer; {tail of the backup list} 8242 @!q:pointer; {new node being added to the token list via |store_new_token|} 8243 @!k:pool_pointer; {index into |str_pool|} 8244 begin p:=backup_head; link(p):=null; k:=str_start[s]; 8245 while k<str_start[s+1] do 8246 begin get_x_token; {recursion is possible here} 8247 @^recursion@> 8248 if (cur_cs=0)and@| 8249 ((cur_chr=so(str_pool[k]))or(cur_chr=so(str_pool[k])-"a"+"A")) then 8250 begin store_new_token(cur_tok); incr(k); 8251 end 8252 else if (cur_cmd<>spacer)or(p<>backup_head) then 8253 begin back_input; 8254 if p<>backup_head then back_list(link(backup_head)); 8255 scan_keyword:=false; return; 8256 end; 8257 end; 8258 flush_list(link(backup_head)); scan_keyword:=true; 8259 exit:end; 8260 8261 @ Here is a procedure that sounds an alarm when mu and non-mu units 8262 are being switched. 8263 8264 @p procedure mu_error; 8265 begin print_err("Incompatible glue units"); 8266 @.Incompatible glue units@> 8267 help1("I'm going to assume that 1mu=1pt when they're mixed."); 8268 error; 8269 end; 8270 8271 @ The next routine `|scan_something_internal|' is used to fetch internal 8272 numeric quantities like `\.{\\hsize}', and also to handle the `\.{\\the}' 8273 when expanding constructions like `\.{\\the\\toks0}' and 8274 `\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int| 8275 procedure, which calls |scan_something_internal|; on the other hand, 8276 |scan_something_internal| also calls |scan_int|, for constructions like 8277 `\.{\\catcode\`\\\$}' or `\.{\\fontdimen} \.3 \.{\\ff}'. So we 8278 have to declare |scan_int| as a |forward| procedure. A few other 8279 procedures are also declared at this point. 8280 8281 @p procedure@?scan_int; forward; {scans an integer value} 8282 @t\4\4@>@<Declare procedures that scan restricted classes of integers@>@; 8283 @t\4\4@>@<Declare procedures that scan font-related stuff@> 8284 8285 @ \TeX\ doesn't know exactly what to expect when |scan_something_internal| 8286 begins. For example, an integer or dimension or glue value could occur 8287 immediately after `\.{\\hskip}'; and one can even say \.{\\the} with 8288 respect to token lists in constructions like 8289 `\.{\\xdef\\o\{\\the\\output\}}'. On the other hand, only integers are 8290 allowed after a construction like `\.{\\count}'. To handle the various 8291 possibilities, |scan_something_internal| has a |level| parameter, which 8292 tells the ``highest'' kind of quantity that |scan_something_internal| is 8293 allowed to produce. Six levels are distinguished, namely |int_val|, 8294 |dimen_val|, |glue_val|, |mu_val|, |ident_val|, and |tok_val|. 8295 8296 The output of |scan_something_internal| (and of the other routines 8297 |scan_int|, |scan_dimen|, and |scan_glue| below) is put into the global 8298 variable |cur_val|, and its level is put into |cur_val_level|. The highest 8299 values of |cur_val_level| are special: |mu_val| is used only when 8300 |cur_val| points to something in a ``muskip'' register, or to one of the 8301 three parameters \.{\\thinmuskip}, \.{\\medmuskip}, \.{\\thickmuskip}; 8302 |ident_val| is used only when |cur_val| points to a font identifier; 8303 |tok_val| is used only when |cur_val| points to |null| or to the reference 8304 count of a token list. The last two cases are allowed only when 8305 |scan_something_internal| is called with |level=tok_val|. 8306 8307 If the output is glue, |cur_val| will point to a glue specification, and 8308 the reference count of that glue will have been updated to reflect this 8309 reference; if the output is a nonempty token list, |cur_val| will point to 8310 its reference count, but in this case the count will not have been updated. 8311 Otherwise |cur_val| will contain the integer or scaled value in question. 8312 8313 @d int_val=0 {integer values} 8314 @d dimen_val=1 {dimension values} 8315 @d glue_val=2 {glue specifications} 8316 @d mu_val=3 {math glue specifications} 8317 @d ident_val=4 {font identifier} 8318 @d tok_val=5 {token lists} 8319 8320 @<Glob...@>= 8321 @!cur_val:integer; {value returned by numeric scanners} 8322 @!cur_val_level:int_val..tok_val; {the ``level'' of this value} 8323 8324 @ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', `\.{\\skip}', 8325 and `\.{\\muskip}' all having |register| as their command code; they are 8326 distinguished by the |chr_code|, which is either |int_val|, |dimen_val|, 8327 |glue_val|, or |mu_val|. 8328 8329 @<Put each...@>= 8330 primitive("count",register,int_val); 8331 @!@:count_}{\.{\\count} primitive@> 8332 primitive("dimen",register,dimen_val); 8333 @!@:dimen_}{\.{\\dimen} primitive@> 8334 primitive("skip",register,glue_val); 8335 @!@:skip_}{\.{\\skip} primitive@> 8336 primitive("muskip",register,mu_val); 8337 @!@:mu_skip_}{\.{\\muskip} primitive@> 8338 8339 @ @<Cases of |print_cmd_chr|...@>= 8340 register: if chr_code=int_val then print_esc("count") 8341 else if chr_code=dimen_val then print_esc("dimen") 8342 else if chr_code=glue_val then print_esc("skip") 8343 else print_esc("muskip"); 8344 8345 @ OK, we're ready for |scan_something_internal| itself. A second parameter, 8346 |negative|, is set |true| if the value that is found should be negated. 8347 It is assumed that |cur_cmd| and |cur_chr| represent the first token of 8348 the internal quantity to be scanned; an error will be signalled if 8349 |cur_cmd<min_internal| or |cur_cmd>max_internal|. 8350 8351 @d scanned_result_end(#)==cur_val_level:=#;@+end 8352 @d scanned_result(#)==@+begin cur_val:=#;scanned_result_end 8353 8354 @p procedure scan_something_internal(@!level:small_number;@!negative:boolean); 8355 {fetch an internal parameter} 8356 var m:halfword; {|chr_code| part of the operand token} 8357 @!p:0..nest_size; {index into |nest|} 8358 begin m:=cur_chr; 8359 case cur_cmd of 8360 def_code: @<Fetch a character code from some table@>; 8361 toks_register,assign_toks,def_family,set_font,def_font: @<Fetch a token list or 8362 font identifier, provided that |level=tok_val|@>; 8363 assign_int: scanned_result(eqtb[m].int)(int_val); 8364 assign_dimen: scanned_result(eqtb[m].sc)(dimen_val); 8365 assign_glue: scanned_result(equiv(m))(glue_val); 8366 assign_mu_glue: scanned_result(equiv(m))(mu_val); 8367 set_aux: @<Fetch the |space_factor| or the |prev_depth|@>; 8368 set_prev_graf: @<Fetch the |prev_graf|@>; 8369 set_page_int:@<Fetch the |dead_cycles| or the |insert_penalties|@>; 8370 set_page_dimen: @<Fetch something on the |page_so_far|@>; 8371 set_shape: @<Fetch the |par_shape| size@>; 8372 set_box_dimen: @<Fetch a box dimension@>; 8373 char_given,math_given: scanned_result(cur_chr)(int_val); 8374 assign_font_dimen: @<Fetch a font dimension@>; 8375 assign_font_int: @<Fetch a font integer@>; 8376 register: @<Fetch a register@>; 8377 last_item: @<Fetch an item in the current node, if appropriate@>; 8378 othercases @<Complain that \.{\\the} can't do this; give zero result@> 8379 endcases;@/ 8380 while cur_val_level>level do @<Convert \(c)|cur_val| to a lower level@>; 8381 @<Fix the reference count, if any, and negate |cur_val| if |negative|@>; 8382 end; 8383 8384 @ @<Fetch a character code from some table@>= 8385 begin scan_char_num; 8386 if m=math_code_base then scanned_result(ho(math_code(cur_val)))(int_val) 8387 else if m<math_code_base then scanned_result(equiv(m+cur_val))(int_val) 8388 else scanned_result(eqtb[m+cur_val].int)(int_val); 8389 end 8390 8391 @ @<Fetch a token list...@>= 8392 if level<>tok_val then 8393 begin print_err("Missing number, treated as zero"); 8394 @.Missing number...@> 8395 help3("A number should have been here; I inserted `0'.")@/ 8396 ("(If you can't figure out why I needed to see a number,")@/ 8397 ("look up `weird error' in the index to The TeXbook.)"); 8398 @:TeXbook}{\sl The \TeX book@> 8399 back_error; scanned_result(0)(dimen_val); 8400 end 8401 else if cur_cmd<=assign_toks then 8402 begin if cur_cmd<assign_toks then {|cur_cmd=toks_register|} 8403 begin scan_eight_bit_int; m:=toks_base+cur_val; 8404 end; 8405 scanned_result(equiv(m))(tok_val); 8406 end 8407 else begin back_input; scan_font_ident; 8408 scanned_result(font_id_base+cur_val)(ident_val); 8409 end 8410 8411 @ Users refer to `\.{\\the\\spacefactor}' only in horizontal 8412 mode, and to `\.{\\the\\prevdepth}' only in vertical mode; so we put the 8413 associated mode in the modifier part of the |set_aux| command. 8414 The |set_page_int| command has modifier 0 or 1, for `\.{\\deadcycles}' and 8415 `\.{\\insertpenalties}', respectively. The |set_box_dimen| command is 8416 modified by either |width_offset|, |height_offset|, or |depth_offset|. 8417 And the |last_item| command is modified by either |int_val|, |dimen_val|, 8418 |glue_val|, |input_line_no_code|, or |badness_code|. 8419 8420 @d input_line_no_code=glue_val+1 {code for \.{\\inputlineno}} 8421 @d badness_code=glue_val+2 {code for \.{\\badness}} 8422 8423 @<Put each...@>= 8424 primitive("spacefactor",set_aux,hmode); 8425 @!@:space_factor_}{\.{\\spacefactor} primitive@> 8426 primitive("prevdepth",set_aux,vmode);@/ 8427 @!@:prev_depth_}{\.{\\prevdepth} primitive@> 8428 primitive("deadcycles",set_page_int,0); 8429 @!@:dead_cycles_}{\.{\\deadcycles} primitive@> 8430 primitive("insertpenalties",set_page_int,1); 8431 @!@:insert_penalties_}{\.{\\insertpenalties} primitive@> 8432 primitive("wd",set_box_dimen,width_offset); 8433 @!@:wd_}{\.{\\wd} primitive@> 8434 primitive("ht",set_box_dimen,height_offset); 8435 @!@:ht_}{\.{\\ht} primitive@> 8436 primitive("dp",set_box_dimen,depth_offset); 8437 @!@:dp_}{\.{\\dp} primitive@> 8438 primitive("lastpenalty",last_item,int_val); 8439 @!@:last_penalty_}{\.{\\lastpenalty} primitive@> 8440 primitive("lastkern",last_item,dimen_val); 8441 @!@:last_kern_}{\.{\\lastkern} primitive@> 8442 primitive("lastskip",last_item,glue_val); 8443 @!@:last_skip_}{\.{\\lastskip} primitive@> 8444 primitive("inputlineno",last_item,input_line_no_code); 8445 @!@:input_line_no_}{\.{\\inputlineno} primitive@> 8446 primitive("badness",last_item,badness_code); 8447 @!@:badness_}{\.{\\badness} primitive@> 8448 8449 @ @<Cases of |print_cmd_chr|...@>= 8450 set_aux: if chr_code=vmode then print_esc("prevdepth") 8451 @+else print_esc("spacefactor"); 8452 set_page_int: if chr_code=0 then print_esc("deadcycles") 8453 @+else print_esc("insertpenalties"); 8454 set_box_dimen: if chr_code=width_offset then print_esc("wd") 8455 else if chr_code=height_offset then print_esc("ht") 8456 else print_esc("dp"); 8457 last_item: case chr_code of 8458 int_val: print_esc("lastpenalty"); 8459 dimen_val: print_esc("lastkern"); 8460 glue_val: print_esc("lastskip"); 8461 input_line_no_code: print_esc("inputlineno"); 8462 othercases print_esc("badness") 8463 endcases; 8464 8465 @ @<Fetch the |space_factor| or the |prev_depth|@>= 8466 if abs(mode)<>m then 8467 begin print_err("Improper "); print_cmd_chr(set_aux,m); 8468 @.Improper \\spacefactor@> 8469 @.Improper \\prevdepth@> 8470 help4("You can refer to \spacefactor only in horizontal mode;")@/ 8471 ("you can refer to \prevdepth only in vertical mode; and")@/ 8472 ("neither of these is meaningful inside \write. So")@/ 8473 ("I'm forgetting what you said and using zero instead."); 8474 error; 8475 if level<>tok_val then scanned_result(0)(dimen_val) 8476 else scanned_result(0)(int_val); 8477 end 8478 else if m=vmode then scanned_result(prev_depth)(dimen_val) 8479 else scanned_result(space_factor)(int_val) 8480 8481 @ @<Fetch the |dead_cycles| or the |insert_penalties|@>= 8482 begin if m=0 then cur_val:=dead_cycles@+else cur_val:=insert_penalties; 8483 cur_val_level:=int_val; 8484 end 8485 8486 @ @<Fetch a box dimension@>= 8487 begin scan_eight_bit_int; 8488 if box(cur_val)=null then cur_val:=0 @+else cur_val:=mem[box(cur_val)+m].sc; 8489 cur_val_level:=dimen_val; 8490 end 8491 8492 @ Inside an \.{\\output} routine, a user may wish to look at the page totals 8493 that were present at the moment when output was triggered. 8494 8495 @d max_dimen==@'7777777777 {$2^{30}-1$} 8496 8497 @<Fetch something on the |page_so_far|@>= 8498 begin if (page_contents=empty) and (not output_active) then 8499 if m=0 then cur_val:=max_dimen@+else cur_val:=0 8500 else cur_val:=page_so_far[m]; 8501 cur_val_level:=dimen_val; 8502 end 8503 8504 @ @<Fetch the |prev_graf|@>= 8505 if mode=0 then scanned_result(0)(int_val) {|prev_graf=0| within \.{\\write}} 8506 else begin nest[nest_ptr]:=cur_list; p:=nest_ptr; 8507 while abs(nest[p].mode_field)<>vmode do decr(p); 8508 scanned_result(nest[p].pg_field)(int_val); 8509 end 8510 8511 @ @<Fetch the |par_shape| size@>= 8512 begin if par_shape_ptr=null then cur_val:=0 8513 else cur_val:=info(par_shape_ptr); 8514 cur_val_level:=int_val; 8515 end 8516 8517 @ Here is where \.{\\lastpenalty}, \.{\\lastkern}, and \.{\\lastskip} are 8518 implemented. The reference count for \.{\\lastskip} will be updated later. 8519 8520 We also handle \.{\\inputlineno} and \.{\\badness} here, because they are 8521 legal in similar contexts. 8522 8523 @<Fetch an item in the current node...@>= 8524 if cur_chr>glue_val then 8525 begin if cur_chr=input_line_no_code then cur_val:=line 8526 else cur_val:=last_badness; {|cur_chr=badness_code|} 8527 cur_val_level:=int_val; 8528 end 8529 else begin if cur_chr=glue_val then cur_val:=zero_glue@+else cur_val:=0; 8530 cur_val_level:=cur_chr; 8531 if not is_char_node(tail)and(mode<>0) then 8532 case cur_chr of 8533 int_val: if type(tail)=penalty_node then cur_val:=penalty(tail); 8534 dimen_val: if type(tail)=kern_node then cur_val:=width(tail); 8535 glue_val: if type(tail)=glue_node then 8536 begin cur_val:=glue_ptr(tail); 8537 if subtype(tail)=mu_glue then cur_val_level:=mu_val; 8538 end; 8539 end {there are no other cases} 8540 else if (mode=vmode)and(tail=head) then 8541 case cur_chr of 8542 int_val: cur_val:=last_penalty; 8543 dimen_val: cur_val:=last_kern; 8544 glue_val: if last_glue<>max_halfword then cur_val:=last_glue; 8545 end; {there are no other cases} 8546 end 8547 8548 @ @<Fetch a font dimension@>= 8549 begin find_font_dimen(false); font_info[fmem_ptr].sc:=0; 8550 scanned_result(font_info[cur_val].sc)(dimen_val); 8551 end 8552 8553 @ @<Fetch a font integer@>= 8554 begin scan_font_ident; 8555 if m=0 then scanned_result(hyphen_char[cur_val])(int_val) 8556 else scanned_result(skew_char[cur_val])(int_val); 8557 end 8558 8559 @ @<Fetch a register@>= 8560 begin scan_eight_bit_int; 8561 case m of 8562 int_val:cur_val:=count(cur_val); 8563 dimen_val:cur_val:=dimen(cur_val); 8564 glue_val: cur_val:=skip(cur_val); 8565 mu_val: cur_val:=mu_skip(cur_val); 8566 end; {there are no other cases} 8567 cur_val_level:=m; 8568 end 8569 8570 @ @<Complain that \.{\\the} can't do this; give zero result@>= 8571 begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr); 8572 @.You can't use x after ...@> 8573 print("' after "); print_esc("the"); 8574 help1("I'm forgetting what you said and using zero instead."); 8575 error; 8576 if level<>tok_val then scanned_result(0)(dimen_val) 8577 else scanned_result(0)(int_val); 8578 end 8579 8580 @ When a |glue_val| changes to a |dimen_val|, we use the width component 8581 of the glue; there is no need to decrease the reference count, since it 8582 has not yet been increased. When a |dimen_val| changes to an |int_val|, 8583 we use scaled points so that the value doesn't actually change. And when a 8584 |mu_val| changes to a |glue_val|, the value doesn't change either. 8585 8586 @<Convert \(c)|cur_val| to a lower level@>= 8587 begin if cur_val_level=glue_val then cur_val:=width(cur_val) 8588 else if cur_val_level=mu_val then mu_error; 8589 decr(cur_val_level); 8590 end 8591 8592 @ If |cur_val| points to a glue specification at this point, the reference 8593 count for the glue does not yet include the reference by |cur_val|. 8594 If |negative| is |true|, |cur_val_level| is known to be |<=mu_val|. 8595 8596 @<Fix the reference count, if any, ...@>= 8597 if negative then 8598 if cur_val_level>=glue_val then 8599 begin cur_val:=new_spec(cur_val); 8600 @<Negate all three glue components of |cur_val|@>; 8601 end 8602 else negate(cur_val) 8603 else if (cur_val_level>=glue_val)and(cur_val_level<=mu_val) then 8604 add_glue_ref(cur_val) 8605 8606 @ @<Negate all three...@>= 8607 begin negate(width(cur_val)); 8608 negate(stretch(cur_val)); 8609 negate(shrink(cur_val)); 8610 end 8611 8612 @ Our next goal is to write the |scan_int| procedure, which scans anything that 8613 \TeX\ treats as an integer. But first we might as well look at some simple 8614 applications of |scan_int| that have already been made inside of 8615 |scan_something_internal|. 8616 8617 @ @<Declare procedures that scan restricted classes of integers@>= 8618 procedure scan_eight_bit_int; 8619 begin scan_int; 8620 if (cur_val<0)or(cur_val>255) then 8621 begin print_err("Bad register code"); 8622 @.Bad register code@> 8623 help2("A register number must be between 0 and 255.")@/ 8624 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8625 end; 8626 end; 8627 8628 @ @<Declare procedures that scan restricted classes of integers@>= 8629 procedure scan_char_num; 8630 begin scan_int; 8631 if (cur_val<0)or(cur_val>255) then 8632 begin print_err("Bad character code"); 8633 @.Bad character code@> 8634 help2("A character number must be between 0 and 255.")@/ 8635 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8636 end; 8637 end; 8638 8639 @ While we're at it, we might as well deal with similar routines that 8640 will be needed later. 8641 8642 @<Declare procedures that scan restricted classes of integers@>= 8643 procedure scan_four_bit_int; 8644 begin scan_int; 8645 if (cur_val<0)or(cur_val>15) then 8646 begin print_err("Bad number"); 8647 @.Bad number@> 8648 help2("Since I expected to read a number between 0 and 15,")@/ 8649 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8650 end; 8651 end; 8652 8653 @ @<Declare procedures that scan restricted classes of integers@>= 8654 procedure scan_fifteen_bit_int; 8655 begin scan_int; 8656 if (cur_val<0)or(cur_val>@'77777) then 8657 begin print_err("Bad mathchar"); 8658 @.Bad mathchar@> 8659 help2("A mathchar number must be between 0 and 32767.")@/ 8660 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8661 end; 8662 end; 8663 8664 @ @<Declare procedures that scan restricted classes of integers@>= 8665 procedure scan_twenty_seven_bit_int; 8666 begin scan_int; 8667 if (cur_val<0)or(cur_val>@'777777777) then 8668 begin print_err("Bad delimiter code"); 8669 @.Bad delimiter code@> 8670 help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/ 8671 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8672 end; 8673 end; 8674 8675 @ An integer number can be preceded by any number of spaces and `\.+' or 8676 `\.-' signs. Then comes either a decimal constant (i.e., radix 10), an 8677 octal constant (i.e., radix 8, preceded by~\.\'), a hexadecimal constant 8678 (radix 16, preceded by~\."), an alphabetic constant (preceded by~\.\`), or 8679 an internal variable. After scanning is complete, 8680 |cur_val| will contain the answer, which must be at most 8681 $2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to 8682 10, 8, or 16 in the cases of decimal, octal, or hexadecimal constants, 8683 otherwise |radix| is set to zero. An optional space follows a constant. 8684 8685 @d octal_token=other_token+"'" {apostrophe, indicates an octal constant} 8686 @d hex_token=other_token+"""" {double quote, indicates a hex constant} 8687 @d alpha_token=other_token+"`" {reverse apostrophe, precedes alpha constants} 8688 @d point_token=other_token+"." {decimal point} 8689 @d continental_point_token=other_token+"," {decimal point, Eurostyle} 8690 8691 @<Glob...@>= 8692 @!radix:small_number; {|scan_int| sets this to 8, 10, 16, or zero} 8693 8694 @ We initialize the following global variables just in case |expand| 8695 comes into action before any of the basic scanning routines has assigned 8696 them a value. 8697 8698 @<Set init...@>= 8699 cur_val:=0; cur_val_level:=int_val; radix:=0; cur_order:=normal; 8700 8701 @ The |scan_int| routine is used also to scan the integer part of a 8702 fraction; for example, the `\.3' in `\.{3.14159}' will be found by 8703 |scan_int|. The |scan_dimen| routine assumes that |cur_tok=point_token| 8704 after the integer part of such a fraction has been scanned by |scan_int|, 8705 and that the decimal point has been backed up to be scanned again. 8706 8707 @p procedure scan_int; {sets |cur_val| to an integer} 8708 label done; 8709 var negative:boolean; {should the answer be negated?} 8710 @!m:integer; {|@t$2^{31}$@> div radix|, the threshold of danger} 8711 @!d:small_number; {the digit just scanned} 8712 @!vacuous:boolean; {have no digits appeared?} 8713 @!OK_so_far:boolean; {has an error message been issued?} 8714 begin radix:=0; OK_so_far:=true;@/ 8715 @<Get the next non-blank non-sign token; set |negative| appropriately@>; 8716 if cur_tok=alpha_token then @<Scan an alphabetic character code into |cur_val|@> 8717 else if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 8718 scan_something_internal(int_val,false) 8719 else @<Scan a numeric constant@>; 8720 if negative then negate(cur_val); 8721 end; 8722 8723 @ @<Get the next non-blank non-sign token...@>= 8724 negative:=false; 8725 repeat @<Get the next non-blank non-call token@>; 8726 if cur_tok=other_token+"-" then 8727 begin negative := not negative; cur_tok:=other_token+"+"; 8728 end; 8729 until cur_tok<>other_token+"+" 8730 8731 @ A space is ignored after an alphabetic character constant, so that 8732 such constants behave like numeric ones. 8733 8734 @<Scan an alphabetic character code into |cur_val|@>= 8735 begin get_token; {suppress macro expansion} 8736 if cur_tok<cs_token_flag then 8737 begin cur_val:=cur_chr; 8738 if cur_cmd<=right_brace then 8739 if cur_cmd=right_brace then incr(align_state) 8740 else decr(align_state); 8741 end 8742 else if cur_tok<cs_token_flag+single_base then 8743 cur_val:=cur_tok-cs_token_flag-active_base 8744 else cur_val:=cur_tok-cs_token_flag-single_base; 8745 if cur_val>255 then 8746 begin print_err("Improper alphabetic constant"); 8747 @.Improper alphabetic constant@> 8748 help2("A one-character control sequence belongs after a ` mark.")@/ 8749 ("So I'm essentially inserting \0 here."); 8750 cur_val:="0"; back_error; 8751 end 8752 else @<Scan an optional space@>; 8753 end 8754 8755 @ @<Scan an optional space@>= 8756 begin get_x_token; if cur_cmd<>spacer then back_input; 8757 end 8758 8759 @ @<Scan a numeric constant@>= 8760 begin radix:=10; m:=214748364; 8761 if cur_tok=octal_token then 8762 begin radix:=8; m:=@'2000000000; get_x_token; 8763 end 8764 else if cur_tok=hex_token then 8765 begin radix:=16; m:=@'1000000000; get_x_token; 8766 end; 8767 vacuous:=true; cur_val:=0;@/ 8768 @<Accumulate the constant until |cur_tok| is not a suitable digit@>; 8769 if vacuous then @<Express astonishment that no number was here@> 8770 else if cur_cmd<>spacer then back_input; 8771 end 8772 8773 @ @d infinity==@'17777777777 {the largest positive value that \TeX\ knows} 8774 @d zero_token=other_token+"0" {zero, the smallest digit} 8775 @d A_token=letter_token+"A" {the smallest special hex digit} 8776 @d other_A_token=other_token+"A" {special hex digit of type |other_char|} 8777 8778 @<Accumulate the constant...@>= 8779 loop@+ begin if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and 8780 (cur_tok<=zero_token+9) then d:=cur_tok-zero_token 8781 else if radix=16 then 8782 if (cur_tok<=A_token+5)and(cur_tok>=A_token) then d:=cur_tok-A_token+10 8783 else if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then 8784 d:=cur_tok-other_A_token+10 8785 else goto done 8786 else goto done; 8787 vacuous:=false; 8788 if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then 8789 begin if OK_so_far then 8790 begin print_err("Number too big"); 8791 @.Number too big@> 8792 help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")@/ 8793 ("so I'm using that number instead of yours."); 8794 error; cur_val:=infinity; OK_so_far:=false; 8795 end; 8796 end 8797 else cur_val:=cur_val*radix+d; 8798 get_x_token; 8799 end; 8800 done: 8801 8802 @ @<Express astonishment...@>= 8803 begin print_err("Missing number, treated as zero"); 8804 @.Missing number...@> 8805 help3("A number should have been here; I inserted `0'.")@/ 8806 ("(If you can't figure out why I needed to see a number,")@/ 8807 ("look up `weird error' in the index to The TeXbook.)"); 8808 @:TeXbook}{\sl The \TeX book@> 8809 back_error; 8810 end 8811 8812 @ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to 8813 a |scaled| value, i.e., an integral number of sp. One of its main tasks 8814 is therefore to interpret the abbreviations for various kinds of units and 8815 to convert measurements to scaled points. 8816 8817 There are three parameters: |mu| is |true| if the finite units must be 8818 `\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed; 8819 |inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}' 8820 are permitted; and |shortcut| is |true| if |cur_val| already contains 8821 an integer and only the units need to be considered. 8822 8823 The order of infinity that was found in the case of infinite glue is returned 8824 in the global variable |cur_order|. 8825 8826 @<Glob...@>= 8827 @!cur_order:glue_ord; {order of infinity found by |scan_dimen|} 8828 8829 @ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen| 8830 may begin with |scan_int|. This explains why it is convenient to use 8831 |scan_int| also for the integer part of a decimal fraction. 8832 8833 Several branches of |scan_dimen| work with |cur_val| as an integer and 8834 with an auxiliary fraction |f|, so that the actual quantity of interest is 8835 $|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked'' 8836 representation is put into the single word |cur_val|, which suddenly 8837 switches significance from |integer| to |scaled|. 8838 8839 @d attach_fraction=88 {go here to pack |cur_val| and |f| into |cur_val|} 8840 @d attach_sign=89 {go here when |cur_val| is correct except perhaps for sign} 8841 @d scan_normal_dimen==scan_dimen(false,false,false) 8842 8843 @p procedure scan_dimen(@!mu,@!inf,@!shortcut:boolean); 8844 {sets |cur_val| to a dimension} 8845 label done, done1, done2, found, not_found, attach_fraction, attach_sign; 8846 var negative:boolean; {should the answer be negated?} 8847 @!f:integer; {numerator of a fraction whose denominator is $2^{16}$} 8848 @<Local variables for dimension calculations@>@; 8849 begin f:=0; arith_error:=false; cur_order:=normal; negative:=false; 8850 if not shortcut then 8851 begin @<Get the next non-blank non-sign...@>; 8852 if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 8853 @<Fetch an internal dimension and |goto attach_sign|, 8854 or fetch an internal integer@> 8855 else begin back_input; 8856 if cur_tok=continental_point_token then cur_tok:=point_token; 8857 if cur_tok<>point_token then scan_int 8858 else begin radix:=10; cur_val:=0; 8859 end; 8860 if cur_tok=continental_point_token then cur_tok:=point_token; 8861 if (radix=10)and(cur_tok=point_token) then @<Scan decimal fraction@>; 8862 end; 8863 end; 8864 if cur_val<0 then {in this case |f=0|} 8865 begin negative := not negative; negate(cur_val); 8866 end; 8867 @<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there 8868 are |x| sp per unit; |goto attach_sign| if the units are internal@>; 8869 @<Scan an optional space@>; 8870 attach_sign: if arith_error or(abs(cur_val)>=@'10000000000) then 8871 @<Report that this dimension is out of range@>; 8872 if negative then negate(cur_val); 8873 end; 8874 8875 @ @<Fetch an internal dimension and |goto attach_sign|...@>= 8876 if mu then 8877 begin scan_something_internal(mu_val,false); 8878 @<Coerce glue to a dimension@>; 8879 if cur_val_level=mu_val then goto attach_sign; 8880 if cur_val_level<>int_val then mu_error; 8881 end 8882 else begin scan_something_internal(dimen_val,false); 8883 if cur_val_level=dimen_val then goto attach_sign; 8884 end 8885 8886 @ @<Local variables for dimension calculations@>= 8887 @!num,@!denom:1..65536; {conversion ratio for the scanned units} 8888 @!k,@!kk:small_number; {number of digits in a decimal fraction} 8889 @!p,@!q:pointer; {top of decimal digit stack} 8890 @!v:scaled; {an internal dimension} 8891 @!save_cur_val:integer; {temporary storage of |cur_val|} 8892 8893 @ The following code is executed when |scan_something_internal| was 8894 called asking for |mu_val|, when we really wanted a ``mudimen'' instead 8895 of ``muglue.'' 8896 8897 @<Coerce glue to a dimension@>= 8898 if cur_val_level>=glue_val then 8899 begin v:=width(cur_val); delete_glue_ref(cur_val); cur_val:=v; 8900 end 8901 8902 @ When the following code is executed, we have |cur_tok=point_token|, but this 8903 token has been backed up using |back_input|; we must first discard it. 8904 8905 It turns out that a decimal point all by itself is equivalent to `\.{0.0}'. 8906 Let's hope people don't use that fact. 8907 8908 @<Scan decimal fraction@>= 8909 begin k:=0; p:=null; get_token; {|point_token| is being re-scanned} 8910 loop@+ begin get_x_token; 8911 if (cur_tok>zero_token+9)or(cur_tok<zero_token) then goto done1; 8912 if k<17 then {digits for |k>=17| cannot affect the result} 8913 begin q:=get_avail; link(q):=p; info(q):=cur_tok-zero_token; 8914 p:=q; incr(k); 8915 end; 8916 end; 8917 done1: for kk:=k downto 1 do 8918 begin dig[kk-1]:=info(p); q:=p; p:=link(p); free_avail(q); 8919 end; 8920 f:=round_decimals(k); 8921 if cur_cmd<>spacer then back_input; 8922 end 8923 8924 @ Now comes the harder part: At this point in the program, |cur_val| is a 8925 nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1; 8926 we want to multiply the sum of these two quantities by the appropriate 8927 factor, based on the specified units, in order to produce a |scaled| 8928 result, and we want to do the calculation with fixed point arithmetic that 8929 does not overflow. 8930 8931 @<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>= 8932 if inf then @<Scan for \(f)\.{fil} units; |goto attach_fraction| if found@>; 8933 @<Scan for \(u)units that are internal dimensions; 8934 |goto attach_sign| with |cur_val| set if found@>; 8935 if mu then @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>; 8936 if scan_keyword("true") then @<Adjust \(f)for the magnification ratio@>; 8937 @.true@> 8938 if scan_keyword("pt") then goto attach_fraction; {the easy case} 8939 @.pt@> 8940 @<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly; 8941 |goto done| in the case of scaled points@>; 8942 attach_fraction: if cur_val>=@'40000 then arith_error:=true 8943 else cur_val:=cur_val*unity+f; 8944 done: 8945 8946 @ A specification like `\.{filllll}' or `\.{fill L L L}' will lead to two 8947 error messages (one for each additional keyword \.{"l"}). 8948 8949 @<Scan for \(f)\.{fil} units...@>= 8950 if scan_keyword("fil") then 8951 @.fil@> 8952 begin cur_order:=fil; 8953 while scan_keyword("l") do 8954 begin if cur_order=filll then 8955 begin print_err("Illegal unit of measure ("); 8956 @.Illegal unit of measure@> 8957 print("replaced by filll)"); 8958 help1("I dddon't go any higher than filll."); error; 8959 end 8960 else incr(cur_order); 8961 end; 8962 goto attach_fraction; 8963 end 8964 8965 @ @<Scan for \(u)units that are internal dimensions...@>= 8966 save_cur_val:=cur_val; 8967 @<Get the next non-blank non-call...@>; 8968 if (cur_cmd<min_internal)or(cur_cmd>max_internal) then back_input 8969 else begin if mu then 8970 begin scan_something_internal(mu_val,false); @<Coerce glue...@>; 8971 if cur_val_level<>mu_val then mu_error; 8972 end 8973 else scan_something_internal(dimen_val,false); 8974 v:=cur_val; goto found; 8975 end; 8976 if mu then goto not_found; 8977 if scan_keyword("em") then v:=(@<The em width for |cur_font|@>) 8978 @.em@> 8979 else if scan_keyword("ex") then v:=(@<The x-height for |cur_font|@>) 8980 @.ex@> 8981 else goto not_found; 8982 @<Scan an optional space@>; 8983 found:cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000)); 8984 goto attach_sign; 8985 not_found: 8986 8987 @ @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>= 8988 if scan_keyword("mu") then goto attach_fraction 8989 @.mu@> 8990 else begin print_err("Illegal unit of measure ("); print("mu inserted)"); 8991 @.Illegal unit of measure@> 8992 help4("The unit of measurement in math glue must be mu.")@/ 8993 ("To recover gracefully from this error, it's best to")@/ 8994 ("delete the erroneous units; e.g., type `2' to delete")@/ 8995 ("two letters. (See Chapter 27 of The TeXbook.)"); 8996 @:TeXbook}{\sl The \TeX book@> 8997 error; goto attach_fraction; 8998 end 8999 9000 @ @<Adjust \(f)for the magnification ratio@>= 9001 begin prepare_mag; 9002 if mag<>1000 then 9003 begin cur_val:=xn_over_d(cur_val,1000,mag); 9004 f:=(1000*f+@'200000*remainder) div mag; 9005 cur_val:=cur_val+(f div @'200000); f:=f mod @'200000; 9006 end; 9007 end 9008 9009 @ The necessary conversion factors can all be specified exactly as 9010 fractions whose numerator and denominator sum to 32768 or less. 9011 According to the definitions here, $\rm2660\,dd\approx1000.33297\,mm$; 9012 this agrees well with the value $\rm1000.333\,mm$ cited by Bosshard 9013 @^Bosshard, Hans Rudolf@> 9014 in {\sl Technische Grundlagen zur Satzherstellung\/} (Bern, 1980). 9015 9016 @d set_conversion_end(#)== denom:=#; end 9017 @d set_conversion(#)==@+begin num:=#; set_conversion_end 9018 9019 @<Scan for \(a)all other units and adjust |cur_val| and |f|...@>= 9020 if scan_keyword("in") then set_conversion(7227)(100) 9021 @.in@> 9022 else if scan_keyword("pc") then set_conversion(12)(1) 9023 @.pc@> 9024 else if scan_keyword("cm") then set_conversion(7227)(254) 9025 @.cm@> 9026 else if scan_keyword("mm") then set_conversion(7227)(2540) 9027 @.mm@> 9028 else if scan_keyword("bp") then set_conversion(7227)(7200) 9029 @.bp@> 9030 else if scan_keyword("dd") then set_conversion(1238)(1157) 9031 @.dd@> 9032 else if scan_keyword("cc") then set_conversion(14856)(1157) 9033 @.cc@> 9034 else if scan_keyword("sp") then goto done 9035 @.sp@> 9036 else @<Complain about unknown unit and |goto done2|@>; 9037 cur_val:=xn_over_d(cur_val,num,denom); 9038 f:=(num*f+@'200000*remainder) div denom;@/ 9039 cur_val:=cur_val+(f div @'200000); f:=f mod @'200000; 9040 done2: 9041 9042 @ @<Complain about unknown unit...@>= 9043 begin print_err("Illegal unit of measure ("); print("pt inserted)"); 9044 @.Illegal unit of measure@> 9045 help6("Dimensions can be in units of em, ex, in, pt, pc,")@/ 9046 ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")@/ 9047 ("I'll assume that you meant to say pt, for printer's points.")@/ 9048 ("To recover gracefully from this error, it's best to")@/ 9049 ("delete the erroneous units; e.g., type `2' to delete")@/ 9050 ("two letters. (See Chapter 27 of The TeXbook.)"); 9051 @:TeXbook}{\sl The \TeX book@> 9052 error; goto done2; 9053 end 9054 9055 9056 @ @<Report that this dimension is out of range@>= 9057 begin print_err("Dimension too large"); 9058 @.Dimension too large@> 9059 help2("I can't work with sizes bigger than about 19 feet.")@/ 9060 ("Continue and I'll use the largest value I can.");@/ 9061 error; cur_val:=max_dimen; arith_error:=false; 9062 end 9063 9064 @ The final member of \TeX's value-scanning trio is |scan_glue|, which 9065 makes |cur_val| point to a glue specification. The reference count of that 9066 glue spec will take account of the fact that |cur_val| is pointing to~it. 9067 9068 The |level| parameter should be either |glue_val| or |mu_val|. 9069 9070 Since |scan_dimen| was so much more complex than |scan_int|, we might expect 9071 |scan_glue| to be even worse. But fortunately, it is very simple, since 9072 most of the work has already been done. 9073 9074 @p procedure scan_glue(@!level:small_number); 9075 {sets |cur_val| to a glue spec pointer} 9076 label exit; 9077 var negative:boolean; {should the answer be negated?} 9078 @!q:pointer; {new glue specification} 9079 @!mu:boolean; {does |level=mu_val|?} 9080 begin mu:=(level=mu_val); @<Get the next non-blank non-sign...@>; 9081 if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 9082 begin scan_something_internal(level,negative); 9083 if cur_val_level>=glue_val then 9084 begin if cur_val_level<>level then mu_error; 9085 return; 9086 end; 9087 if cur_val_level=int_val then scan_dimen(mu,false,true) 9088 else if level=mu_val then mu_error; 9089 end 9090 else begin back_input; scan_dimen(mu,false,false); 9091 if negative then negate(cur_val); 9092 end; 9093 @<Create a new glue specification whose width is |cur_val|; scan for its 9094 stretch and shrink components@>; 9095 exit:end; 9096 9097 @ @<Create a new glue specification whose width is |cur_val|...@>= 9098 q:=new_spec(zero_glue); width(q):=cur_val; 9099 if scan_keyword("plus") then 9100 @.plus@> 9101 begin scan_dimen(mu,true,false); 9102 stretch(q):=cur_val; stretch_order(q):=cur_order; 9103 end; 9104 if scan_keyword("minus") then 9105 @.minus@> 9106 begin scan_dimen(mu,true,false); 9107 shrink(q):=cur_val; shrink_order(q):=cur_order; 9108 end; 9109 cur_val:=q 9110 9111 @ Here's a similar procedure that returns a pointer to a rule node. This 9112 routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule}; 9113 therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store 9114 the default rule dimensions in the node, then to override them if 9115 `\.{height}' or `\.{width}' or `\.{depth}' specifications are 9116 found (in any order). 9117 9118 @d default_rule=26214 {0.4\thinspace pt} 9119 9120 @p function scan_rule_spec:pointer; 9121 label reswitch; 9122 var q:pointer; {the rule node being created} 9123 begin q:=new_rule; {|width|, |depth|, and |height| all equal |null_flag| now} 9124 if cur_cmd=vrule then width(q):=default_rule 9125 else begin height(q):=default_rule; depth(q):=0; 9126 end; 9127 reswitch: if scan_keyword("width") then 9128 @.width@> 9129 begin scan_normal_dimen; width(q):=cur_val; goto reswitch; 9130 end; 9131 if scan_keyword("height") then 9132 @.height@> 9133 begin scan_normal_dimen; height(q):=cur_val; goto reswitch; 9134 end; 9135 if scan_keyword("depth") then 9136 @.depth@> 9137 begin scan_normal_dimen; depth(q):=cur_val; goto reswitch; 9138 end; 9139 scan_rule_spec:=q; 9140 end; 9141 9142 @* \[27] Building token lists. 9143 The token lists for macros and for other things like \.{\\mark} and \.{\\output} 9144 and \.{\\write} are produced by a procedure called |scan_toks|. 9145 9146 Before we get into the details of |scan_toks|, let's consider a much 9147 simpler task, that of converting the current string into a token list. 9148 The |str_toks| function does this; it classifies spaces as type |spacer| 9149 and everything else as type |other_char|. 9150 9151 The token list created by |str_toks| begins at |link(temp_head)| and ends 9152 at the value |p| that is returned. (If |p=temp_head|, the list is empty.) 9153 9154 @p function str_toks(@!b:pool_pointer):pointer; 9155 {converts |str_pool[b..pool_ptr-1]| to a token list} 9156 var p:pointer; {tail of the token list} 9157 @!q:pointer; {new node being added to the token list via |store_new_token|} 9158 @!t:halfword; {token being appended} 9159 @!k:pool_pointer; {index into |str_pool|} 9160 begin str_room(1); 9161 p:=temp_head; link(p):=null; k:=b; 9162 while k<pool_ptr do 9163 begin t:=so(str_pool[k]); 9164 if t=" " then t:=space_token 9165 else t:=other_token+t; 9166 fast_store_new_token(t); 9167 incr(k); 9168 end; 9169 pool_ptr:=b; str_toks:=p; 9170 end; 9171 9172 @ The main reason for wanting |str_toks| is the next function, 9173 |the_toks|, which has similar input/output characteristics. 9174 9175 This procedure is supposed to scan something like `\.{\\skip\\count12}', 9176 i.e., whatever can follow `\.{\\the}', and it constructs a token list 9177 containing something like `\.{-3.0pt minus 0.5fill}'. 9178 9179 @p function the_toks:pointer; 9180 var old_setting:0..max_selector; {holds |selector| setting} 9181 @!p,@!q,@!r:pointer; {used for copying a token list} 9182 @!b:pool_pointer; {base of temporary string} 9183 begin get_x_token; scan_something_internal(tok_val,false); 9184 if cur_val_level>=ident_val then @<Copy the token list@> 9185 else begin old_setting:=selector; selector:=new_string; b:=pool_ptr; 9186 case cur_val_level of 9187 int_val:print_int(cur_val); 9188 dimen_val:begin print_scaled(cur_val); print("pt"); 9189 end; 9190 glue_val: begin print_spec(cur_val,"pt"); delete_glue_ref(cur_val); 9191 end; 9192 mu_val: begin print_spec(cur_val,"mu"); delete_glue_ref(cur_val); 9193 end; 9194 end; {there are no other cases} 9195 selector:=old_setting; the_toks:=str_toks(b); 9196 end; 9197 end; 9198 9199 @ @<Copy the token list@>= 9200 begin p:=temp_head; link(p):=null; 9201 if cur_val_level=ident_val then store_new_token(cs_token_flag+cur_val) 9202 else if cur_val<>null then 9203 begin r:=link(cur_val); {do not copy the reference count} 9204 while r<>null do 9205 begin fast_store_new_token(info(r)); r:=link(r); 9206 end; 9207 end; 9208 the_toks:=p; 9209 end 9210 9211 @ Here's part of the |expand| subroutine that we are now ready to complete: 9212 9213 @p procedure ins_the_toks; 9214 begin link(garbage):=the_toks; ins_list(link(temp_head)); 9215 end; 9216 9217 @ The primitives \.{\\number}, \.{\\romannumeral}, \.{\\string}, \.{\\meaning}, 9218 \.{\\fontname}, and \.{\\jobname} are defined as follows. 9219 9220 @d number_code=0 {command code for \.{\\number}} 9221 @d roman_numeral_code=1 {command code for \.{\\romannumeral}} 9222 @d string_code=2 {command code for \.{\\string}} 9223 @d meaning_code=3 {command code for \.{\\meaning}} 9224 @d font_name_code=4 {command code for \.{\\fontname}} 9225 @d job_name_code=5 {command code for \.{\\jobname}} 9226 9227 @<Put each...@>= 9228 primitive("number",convert,number_code);@/ 9229 @!@:number_}{\.{\\number} primitive@> 9230 primitive("romannumeral",convert,roman_numeral_code);@/ 9231 @!@:roman_numeral_}{\.{\\romannumeral} primitive@> 9232 primitive("string",convert,string_code);@/ 9233 @!@:string_}{\.{\\string} primitive@> 9234 primitive("meaning",convert,meaning_code);@/ 9235 @!@:meaning_}{\.{\\meaning} primitive@> 9236 primitive("fontname",convert,font_name_code);@/ 9237 @!@:font_name_}{\.{\\fontname} primitive@> 9238 primitive("jobname",convert,job_name_code);@/ 9239 @!@:job_name_}{\.{\\jobname} primitive@> 9240 9241 @ @<Cases of |print_cmd_chr|...@>= 9242 convert: case chr_code of 9243 number_code: print_esc("number"); 9244 roman_numeral_code: print_esc("romannumeral"); 9245 string_code: print_esc("string"); 9246 meaning_code: print_esc("meaning"); 9247 font_name_code: print_esc("fontname"); 9248 othercases print_esc("jobname") 9249 endcases; 9250 9251 @ The procedure |conv_toks| uses |str_toks| to insert the token list 9252 for |convert| functions into the scanner; `\.{\\outer}' control sequences 9253 are allowed to follow `\.{\\string}' and `\.{\\meaning}'. 9254 9255 @p procedure conv_toks; 9256 var old_setting:0..max_selector; {holds |selector| setting} 9257 @!c:number_code..job_name_code; {desired type of conversion} 9258 @!save_scanner_status:small_number; {|scanner_status| upon entry} 9259 @!b:pool_pointer; {base of temporary string} 9260 begin c:=cur_chr; @<Scan the argument for command |c|@>; 9261 old_setting:=selector; selector:=new_string; b:=pool_ptr; 9262 @<Print the result of command |c|@>; 9263 selector:=old_setting; link(garbage):=str_toks(b); ins_list(link(temp_head)); 9264 end; 9265 9266 @ @<Scan the argument for command |c|@>= 9267 case c of 9268 number_code,roman_numeral_code: scan_int; 9269 string_code, meaning_code: begin save_scanner_status:=scanner_status; 9270 scanner_status:=normal; get_token; scanner_status:=save_scanner_status; 9271 end; 9272 font_name_code: scan_font_ident; 9273 job_name_code: if job_name=0 then open_log_file; 9274 end {there are no other cases} 9275 9276 @ @<Print the result of command |c|@>= 9277 case c of 9278 number_code: print_int(cur_val); 9279 roman_numeral_code: print_roman_int(cur_val); 9280 string_code:if cur_cs<>0 then sprint_cs(cur_cs) 9281 else print_char(cur_chr); 9282 meaning_code: print_meaning; 9283 font_name_code: begin print(font_name[cur_val]); 9284 if font_size[cur_val]<>font_dsize[cur_val] then 9285 begin print(" at "); print_scaled(font_size[cur_val]); 9286 print("pt"); 9287 end; 9288 end; 9289 job_name_code: print(job_name); 9290 end {there are no other cases} 9291 9292 @ Now we can't postpone the difficulties any longer; we must bravely tackle 9293 |scan_toks|. This function returns a pointer to the tail of a new token 9294 list, and it also makes |def_ref| point to the reference count at the 9295 head of that list. 9296 9297 There are two boolean parameters, |macro_def| and |xpand|. If |macro_def| 9298 is true, the goal is to create the token list for a macro definition; 9299 otherwise the goal is to create the token list for some other \TeX\ 9300 primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase}, 9301 \.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\write}, or 9302 \.{\\special}. In the latter cases a left brace must be scanned next; this 9303 left brace will not be part of the token list, nor will the matching right 9304 brace that comes at the end. If |xpand| is false, the token list will 9305 simply be copied from the input using |get_token|. Otherwise all expandable 9306 tokens will be expanded until unexpandable tokens are left, except that 9307 the results of expanding `\.{\\the}' are not expanded further. 9308 If both |macro_def| and |xpand| are true, the expansion applies 9309 only to the macro body (i.e., to the material following the first 9310 |left_brace| character). 9311 9312 The value of |cur_cs| when |scan_toks| begins should be the |eqtb| 9313 address of the control sequence to display in ``runaway'' error 9314 messages. 9315 9316 @p function scan_toks(@!macro_def,@!xpand:boolean):pointer; 9317 label found,continue,done,done1,done2; 9318 var t:halfword; {token representing the highest parameter number} 9319 @!s:halfword; {saved token} 9320 @!p:pointer; {tail of the token list being built} 9321 @!q:pointer; {new node being added to the token list via |store_new_token|} 9322 @!unbalance:halfword; {number of unmatched left braces} 9323 @!hash_brace:halfword; {possible `\.{\#\{}' token} 9324 begin if macro_def then scanner_status:=defining 9325 @+else scanner_status:=absorbing; 9326 warning_index:=cur_cs; def_ref:=get_avail; token_ref_count(def_ref):=null; 9327 p:=def_ref; hash_brace:=0; t:=zero_token; 9328 if macro_def then @<Scan and build the parameter part of the macro definition@> 9329 else scan_left_brace; {remove the compulsory left brace} 9330 @<Scan and build the body of the token list; |goto found| when finished@>; 9331 found: scanner_status:=normal; 9332 if hash_brace<>0 then store_new_token(hash_brace); 9333 scan_toks:=p; 9334 end; 9335 9336 @ @<Scan and build the parameter part...@>= 9337 begin loop begin continue: get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|} 9338 if cur_tok<right_brace_limit then goto done1; 9339 if cur_cmd=mac_param then 9340 @<If the next character is a parameter number, make |cur_tok| 9341 a |match| token; but if it is a left brace, store 9342 `|left_brace|, |end_match|', set |hash_brace|, and |goto done|@>; 9343 store_new_token(cur_tok); 9344 end; 9345 done1: store_new_token(end_match_token); 9346 if cur_cmd=right_brace then 9347 @<Express shock at the missing left brace; |goto found|@>; 9348 done: end 9349 9350 @ @<Express shock...@>= 9351 begin print_err("Missing { inserted"); incr(align_state); 9352 @.Missing \{ inserted@> 9353 help2("Where was the left brace? You said something like `\def\a}',")@/ 9354 ("which I'm going to interpret as `\def\a{}'."); error; goto found; 9355 end 9356 9357 @ @<If the next character is a parameter number...@>= 9358 begin s:=match_token+cur_chr; get_token; 9359 if cur_tok<left_brace_limit then 9360 begin hash_brace:=cur_tok; 9361 store_new_token(cur_tok); store_new_token(end_match_token); 9362 goto done; 9363 end; 9364 if t=zero_token+9 then 9365 begin print_err("You already have nine parameters"); 9366 @.You already have nine...@> 9367 help2("I'm going to ignore the # sign you just used,")@/ 9368 ("as well as the token that followed it."); error; goto continue; 9369 end 9370 else begin incr(t); 9371 if cur_tok<>t then 9372 begin print_err("Parameters must be numbered consecutively"); 9373 @.Parameters...consecutively@> 9374 help2("I've inserted the digit you should have used after the #.")@/ 9375 ("Type `1' to delete what you did use."); back_error; 9376 end; 9377 cur_tok:=s; 9378 end; 9379 end 9380 9381 @ @<Scan and build the body of the token list; |goto found| when finished@>= 9382 unbalance:=1; 9383 loop@+ begin if xpand then @<Expand the next part of the input@> 9384 else get_token; 9385 if cur_tok<right_brace_limit then 9386 if cur_cmd<right_brace then incr(unbalance) 9387 else begin decr(unbalance); 9388 if unbalance=0 then goto found; 9389 end 9390 else if cur_cmd=mac_param then 9391 if macro_def then @<Look for parameter number or \.{\#\#}@>; 9392 store_new_token(cur_tok); 9393 end 9394 9395 @ Here we insert an entire token list created by |the_toks| without 9396 expanding it further. 9397 9398 @<Expand the next part of the input@>= 9399 begin loop begin get_next; 9400 if cur_cmd<=max_command then goto done2; 9401 if cur_cmd<>the then expand 9402 else begin q:=the_toks; 9403 if link(temp_head)<>null then 9404 begin link(p):=link(temp_head); p:=q; 9405 end; 9406 end; 9407 end; 9408 done2: x_token 9409 end 9410 9411 @ @<Look for parameter number...@>= 9412 begin s:=cur_tok; 9413 if xpand then get_x_token else get_token; 9414 if cur_cmd<>mac_param then 9415 if (cur_tok<=zero_token)or(cur_tok>t) then 9416 begin print_err("Illegal parameter number in definition of "); 9417 @.Illegal parameter number...@> 9418 sprint_cs(warning_index); 9419 help3("You meant to type ## instead of #, right?")@/ 9420 ("Or maybe a } was forgotten somewhere earlier, and things")@/ 9421 ("are all screwed up? I'm going to assume that you meant ##."); 9422 back_error; cur_tok:=s; 9423 end 9424 else cur_tok:=out_param_token-"0"+cur_chr; 9425 end 9426 9427 @ Another way to create a token list is via the \.{\\read} command. The 9428 sixteen files potentially usable for reading appear in the following 9429 global variables. The value of |read_open[n]| will be |closed| if 9430 stream number |n| has not been opened or if it has been fully read; 9431 |just_open| if an \.{\\openin} but not a \.{\\read} has been done; 9432 and |normal| if it is open and ready to read the next line. 9433 9434 @d closed=2 {not open, or at end of file} 9435 @d just_open=1 {newly opened, first line not yet read} 9436 9437 @<Glob...@>= 9438 @!read_file:array[0..15] of alpha_file; {used for \.{\\read}} 9439 @!read_open:array[0..16] of normal..closed; {state of |read_file[n]|} 9440 9441 @ @<Set init...@>= 9442 for k:=0 to 16 do read_open[k]:=closed; 9443 9444 @ The |read_toks| procedure constructs a token list like that for any 9445 macro definition, and makes |cur_val| point to it. Parameter |r| points 9446 to the control sequence that will receive this token list. 9447 9448 @p procedure read_toks(@!n:integer;@!r:pointer); 9449 label done; 9450 var p:pointer; {tail of the token list} 9451 @!q:pointer; {new node being added to the token list via |store_new_token|} 9452 @!s:integer; {saved value of |align_state|} 9453 @!m:small_number; {stream number} 9454 begin scanner_status:=defining; warning_index:=r; 9455 def_ref:=get_avail; token_ref_count(def_ref):=null; 9456 p:=def_ref; {the reference count} 9457 store_new_token(end_match_token); 9458 if (n<0)or(n>15) then m:=16@+else m:=n; 9459 s:=align_state; align_state:=1000000; {disable tab marks, etc.} 9460 repeat @<Input and store tokens from the next line of the file@>; 9461 until align_state=1000000; 9462 cur_val:=def_ref; scanner_status:=normal; align_state:=s; 9463 end; 9464 9465 @ @<Input and store tokens from the next line of the file@>= 9466 begin_file_reading; name:=m+1; 9467 if read_open[m]=closed then @<Input for \.{\\read} from the terminal@> 9468 else if read_open[m]=just_open then @<Input the first line of |read_file[m]|@> 9469 else @<Input the next line of |read_file[m]|@>; 9470 limit:=last; 9471 if end_line_char_inactive then decr(limit) 9472 else buffer[limit]:=end_line_char; 9473 first:=limit+1; loc:=start; state:=new_line;@/ 9474 loop@+ begin get_token; 9475 if cur_tok=0 then goto done; 9476 {|cur_cmd=cur_chr=0| will occur at the end of the line} 9477 if align_state<1000000 then {unmatched `\.\}' aborts the line} 9478 begin repeat get_token; until cur_tok=0; 9479 align_state:=1000000; goto done; 9480 end; 9481 store_new_token(cur_tok); 9482 end; 9483 done: end_file_reading 9484 9485 @ Here we input on-line into the |buffer| array, prompting the user explicitly 9486 if |n>=0|. The value of |n| is set negative so that additional prompts 9487 will not be given in the case of multi-line input. 9488 9489 @<Input for \.{\\read} from the terminal@>= 9490 if interaction>nonstop_mode then 9491 if n<0 then prompt_input("") 9492 else begin wake_up_terminal; 9493 print_ln; sprint_cs(r); prompt_input("="); n:=-1; 9494 end 9495 else fatal_error("*** (cannot \read from terminal in nonstop modes)") 9496 @.cannot \\read@> 9497 9498 @ The first line of a file must be treated specially, since |input_ln| 9499 must be told not to start with |get|. 9500 @^system dependencies@> 9501 9502 @<Input the first line of |read_file[m]|@>= 9503 if input_ln(read_file[m],false) then read_open[m]:=normal 9504 else begin a_close(read_file[m]); read_open[m]:=closed; 9505 end 9506 9507 @ An empty line is appended at the end of a |read_file|. 9508 @^empty line at end of file@> 9509 9510 @<Input the next line of |read_file[m]|@>= 9511 begin if not input_ln(read_file[m],true) then 9512 begin a_close(read_file[m]); read_open[m]:=closed; 9513 if align_state<>1000000 then 9514 begin runaway; 9515 print_err("File ended within "); print_esc("read"); 9516 @.File ended within \\read@> 9517 help1("This \read has unbalanced braces."); 9518 align_state:=1000000; limit:=0; error; 9519 end; 9520 end; 9521 end 9522 9523 @* \[28] Conditional processing. 9524 We consider now the way \TeX\ handles various kinds of \.{\\if} commands. 9525 9526 @d if_char_code=0 { `\.{\\if}' } 9527 @d if_cat_code=1 { `\.{\\ifcat}' } 9528 @d if_int_code=2 { `\.{\\ifnum}' } 9529 @d if_dim_code=3 { `\.{\\ifdim}' } 9530 @d if_odd_code=4 { `\.{\\ifodd}' } 9531 @d if_vmode_code=5 { `\.{\\ifvmode}' } 9532 @d if_hmode_code=6 { `\.{\\ifhmode}' } 9533 @d if_mmode_code=7 { `\.{\\ifmmode}' } 9534 @d if_inner_code=8 { `\.{\\ifinner}' } 9535 @d if_void_code=9 { `\.{\\ifvoid}' } 9536 @d if_hbox_code=10 { `\.{\\ifhbox}' } 9537 @d if_vbox_code=11 { `\.{\\ifvbox}' } 9538 @d ifx_code=12 { `\.{\\ifx}' } 9539 @d if_eof_code=13 { `\.{\\ifeof}' } 9540 @d if_true_code=14 { `\.{\\iftrue}' } 9541 @d if_false_code=15 { `\.{\\iffalse}' } 9542 @d if_case_code=16 { `\.{\\ifcase}' } 9543 9544 @<Put each...@>= 9545 primitive("if",if_test,if_char_code); 9546 @!@:if_char_}{\.{\\if} primitive@> 9547 primitive("ifcat",if_test,if_cat_code); 9548 @!@:if_cat_code_}{\.{\\ifcat} primitive@> 9549 primitive("ifnum",if_test,if_int_code); 9550 @!@:if_int_}{\.{\\ifnum} primitive@> 9551 primitive("ifdim",if_test,if_dim_code); 9552 @!@:if_dim_}{\.{\\ifdim} primitive@> 9553 primitive("ifodd",if_test,if_odd_code); 9554 @!@:if_odd_}{\.{\\ifodd} primitive@> 9555 primitive("ifvmode",if_test,if_vmode_code); 9556 @!@:if_vmode_}{\.{\\ifvmode} primitive@> 9557 primitive("ifhmode",if_test,if_hmode_code); 9558 @!@:if_hmode_}{\.{\\ifhmode} primitive@> 9559 primitive("ifmmode",if_test,if_mmode_code); 9560 @!@:if_mmode_}{\.{\\ifmmode} primitive@> 9561 primitive("ifinner",if_test,if_inner_code); 9562 @!@:if_inner_}{\.{\\ifinner} primitive@> 9563 primitive("ifvoid",if_test,if_void_code); 9564 @!@:if_void_}{\.{\\ifvoid} primitive@> 9565 primitive("ifhbox",if_test,if_hbox_code); 9566 @!@:if_hbox_}{\.{\\ifhbox} primitive@> 9567 primitive("ifvbox",if_test,if_vbox_code); 9568 @!@:if_vbox_}{\.{\\ifvbox} primitive@> 9569 primitive("ifx",if_test,ifx_code); 9570 @!@:ifx_}{\.{\\ifx} primitive@> 9571 primitive("ifeof",if_test,if_eof_code); 9572 @!@:if_eof_}{\.{\\ifeof} primitive@> 9573 primitive("iftrue",if_test,if_true_code); 9574 @!@:if_true_}{\.{\\iftrue} primitive@> 9575 primitive("iffalse",if_test,if_false_code); 9576 @!@:if_false_}{\.{\\iffalse} primitive@> 9577 primitive("ifcase",if_test,if_case_code); 9578 @!@:if_case_}{\.{\\ifcase} primitive@> 9579 9580 @ @<Cases of |print_cmd_chr|...@>= 9581 if_test: case chr_code of 9582 if_cat_code:print_esc("ifcat"); 9583 if_int_code:print_esc("ifnum"); 9584 if_dim_code:print_esc("ifdim"); 9585 if_odd_code:print_esc("ifodd"); 9586 if_vmode_code:print_esc("ifvmode"); 9587 if_hmode_code:print_esc("ifhmode"); 9588 if_mmode_code:print_esc("ifmmode"); 9589 if_inner_code:print_esc("ifinner"); 9590 if_void_code:print_esc("ifvoid"); 9591 if_hbox_code:print_esc("ifhbox"); 9592 if_vbox_code:print_esc("ifvbox"); 9593 ifx_code:print_esc("ifx"); 9594 if_eof_code:print_esc("ifeof"); 9595 if_true_code:print_esc("iftrue"); 9596 if_false_code:print_esc("iffalse"); 9597 if_case_code:print_esc("ifcase"); 9598 othercases print_esc("if") 9599 endcases; 9600 9601 @ Conditions can be inside conditions, and this nesting has a stack 9602 that is independent of the |save_stack|. 9603 9604 Four global variables represent the top of the condition stack: 9605 |cond_ptr| points to pushed-down entries, if any; |if_limit| specifies 9606 the largest code of a |fi_or_else| command that is syntactically legal; 9607 |cur_if| is the name of the current type of conditional; and |if_line| 9608 is the line number at which it began. 9609 9610 If no conditions are currently in progress, the condition stack has the 9611 special state |cond_ptr=null|, |if_limit=normal|, |cur_if=0|, |if_line=0|. 9612 Otherwise |cond_ptr| points to a two-word node; the |type|, |subtype|, and 9613 |link| fields of the first word contain |if_limit|, |cur_if|, and 9614 |cond_ptr| at the next level, and the second word contains the 9615 corresponding |if_line|. 9616 9617 @d if_node_size=2 {number of words in stack entry for conditionals} 9618 @d if_line_field(#)==mem[#+1].int 9619 @d if_code=1 {code for \.{\\if...} being evaluated} 9620 @d fi_code=2 {code for \.{\\fi}} 9621 @d else_code=3 {code for \.{\\else}} 9622 @d or_code=4 {code for \.{\\or}} 9623 9624 @<Glob...@>= 9625 @!cond_ptr:pointer; {top of the condition stack} 9626 @!if_limit:normal..or_code; {upper bound on |fi_or_else| codes} 9627 @!cur_if:small_number; {type of conditional being worked on} 9628 @!if_line:integer; {line where that conditional began} 9629 9630 @ @<Set init...@>= 9631 cond_ptr:=null; if_limit:=normal; cur_if:=0; if_line:=0; 9632 9633 @ @<Put each...@>= 9634 primitive("fi",fi_or_else,fi_code); 9635 @!@:fi_}{\.{\\fi} primitive@> 9636 text(frozen_fi):="fi"; eqtb[frozen_fi]:=eqtb[cur_val]; 9637 primitive("or",fi_or_else,or_code); 9638 @!@:or_}{\.{\\or} primitive@> 9639 primitive("else",fi_or_else,else_code); 9640 @!@:else_}{\.{\\else} primitive@> 9641 9642 @ @<Cases of |print_cmd_chr|...@>= 9643 fi_or_else: if chr_code=fi_code then print_esc("fi") 9644 else if chr_code=or_code then print_esc("or") 9645 else print_esc("else"); 9646 9647 @ When we skip conditional text, we keep track of the line number 9648 where skipping began, for use in error messages. 9649 9650 @<Glob...@>= 9651 @!skip_line:integer; {skipping began here} 9652 9653 @ Here is a procedure that ignores text until coming to an \.{\\or}, 9654 \.{\\else}, or \.{\\fi} at the current level of $\.{\\if}\ldots\.{\\fi}$ 9655 nesting. After it has acted, |cur_chr| will indicate the token that 9656 was found, but |cur_tok| will not be set (because this makes the 9657 procedure run faster). 9658 9659 @p procedure pass_text; 9660 label done; 9661 var l:integer; {level of $\.{\\if}\ldots\.{\\fi}$ nesting} 9662 @!save_scanner_status:small_number; {|scanner_status| upon entry} 9663 begin save_scanner_status:=scanner_status; scanner_status:=skipping; l:=0; 9664 skip_line:=line; 9665 loop@+ begin get_next; 9666 if cur_cmd=fi_or_else then 9667 begin if l=0 then goto done; 9668 if cur_chr=fi_code then decr(l); 9669 end 9670 else if cur_cmd=if_test then incr(l); 9671 end; 9672 done: scanner_status:=save_scanner_status; 9673 end; 9674 9675 @ When we begin to process a new \.{\\if}, we set |if_limit:=if_code|; then 9676 if\/ \.{\\or} or \.{\\else} or \.{\\fi} occurs before the current \.{\\if} 9677 condition has been evaluated, \.{\\relax} will be inserted. 9678 For example, a sequence of commands like `\.{\\ifvoid1\\else...\\fi}' 9679 would otherwise require something after the `\.1'. 9680 9681 @<Push the condition stack@>= 9682 begin p:=get_node(if_node_size); link(p):=cond_ptr; type(p):=if_limit; 9683 subtype(p):=cur_if; if_line_field(p):=if_line; 9684 cond_ptr:=p; cur_if:=cur_chr; if_limit:=if_code; if_line:=line; 9685 end 9686 9687 @ @<Pop the condition stack@>= 9688 begin p:=cond_ptr; if_line:=if_line_field(p); 9689 cur_if:=subtype(p); if_limit:=type(p); cond_ptr:=link(p); 9690 free_node(p,if_node_size); 9691 end 9692 9693 @ Here's a procedure that changes the |if_limit| code corresponding to 9694 a given value of |cond_ptr|. 9695 9696 @p procedure change_if_limit(@!l:small_number;@!p:pointer); 9697 label exit; 9698 var q:pointer; 9699 begin if p=cond_ptr then if_limit:=l {that's the easy case} 9700 else begin q:=cond_ptr; 9701 loop@+ begin if q=null then confusion("if"); 9702 @:this can't happen if}{\quad if@> 9703 if link(q)=p then 9704 begin type(q):=l; return; 9705 end; 9706 q:=link(q); 9707 end; 9708 end; 9709 exit:end; 9710 9711 @ A condition is started when the |expand| procedure encounters 9712 an |if_test| command; in that case |expand| reduces to |conditional|, 9713 which is a recursive procedure. 9714 @^recursion@> 9715 9716 @p procedure conditional; 9717 label exit,common_ending; 9718 var b:boolean; {is the condition true?} 9719 @!r:"<"..">"; {relation to be evaluated} 9720 @!m,@!n:integer; {to be tested against the second operand} 9721 @!p,@!q:pointer; {for traversing token lists in \.{\\ifx} tests} 9722 @!save_scanner_status:small_number; {|scanner_status| upon entry} 9723 @!save_cond_ptr:pointer; {|cond_ptr| corresponding to this conditional} 9724 @!this_if:small_number; {type of this conditional} 9725 begin @<Push the condition stack@>;@+save_cond_ptr:=cond_ptr;this_if:=cur_chr;@/ 9726 @<Either process \.{\\ifcase} or set |b| to the value of a boolean condition@>; 9727 if tracing_commands>1 then @<Display the value of |b|@>; 9728 if b then 9729 begin change_if_limit(else_code,save_cond_ptr); 9730 return; {wait for \.{\\else} or \.{\\fi}} 9731 end; 9732 @<Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|@>; 9733 common_ending: if cur_chr=fi_code then @<Pop the condition stack@> 9734 else if_limit:=fi_code; {wait for \.{\\fi}} 9735 exit:end; 9736 9737 @ In a construction like `\.{\\if\\iftrue abc\\else d\\fi}', the first 9738 \.{\\else} that we come to after learning that the \.{\\if} is false is 9739 not the \.{\\else} we're looking for. Hence the following curious 9740 logic is needed. 9741 9742 @ @<Skip to \.{\\else} or \.{\\fi}...@>= 9743 loop@+ begin pass_text; 9744 if cond_ptr=save_cond_ptr then 9745 begin if cur_chr<>or_code then goto common_ending; 9746 print_err("Extra "); print_esc("or"); 9747 @.Extra \\or@> 9748 help1("I'm ignoring this; it doesn't match any \if."); 9749 error; 9750 end 9751 else if cur_chr=fi_code then @<Pop the condition stack@>; 9752 end 9753 9754 @ @<Either process \.{\\ifcase} or set |b|...@>= 9755 case this_if of 9756 if_char_code, if_cat_code: @<Test if two characters match@>; 9757 if_int_code, if_dim_code: @<Test relation between integers or dimensions@>; 9758 if_odd_code: @<Test if an integer is odd@>; 9759 if_vmode_code: b:=(abs(mode)=vmode); 9760 if_hmode_code: b:=(abs(mode)=hmode); 9761 if_mmode_code: b:=(abs(mode)=mmode); 9762 if_inner_code: b:=(mode<0); 9763 if_void_code, if_hbox_code, if_vbox_code: @<Test box register status@>; 9764 ifx_code: @<Test if two tokens match@>; 9765 if_eof_code: begin scan_four_bit_int; b:=(read_open[cur_val]=closed); 9766 end; 9767 if_true_code: b:=true; 9768 if_false_code: b:=false; 9769 if_case_code: @<Select the appropriate case 9770 and |return| or |goto common_ending|@>; 9771 end {there are no other cases} 9772 9773 @ @<Display the value of |b|@>= 9774 begin begin_diagnostic; 9775 if b then print("{true}")@+else print("{false}"); 9776 end_diagnostic(false); 9777 end 9778 9779 @ Here we use the fact that |"<"|, |"="|, and |">"| are consecutive ASCII 9780 codes. 9781 @^ASCII code@> 9782 9783 @<Test relation between integers or dimensions@>= 9784 begin if this_if=if_int_code then scan_int@+else scan_normal_dimen; 9785 n:=cur_val; @<Get the next non-blank non-call...@>; 9786 if (cur_tok>=other_token+"<")and(cur_tok<=other_token+">") then 9787 r:=cur_tok-other_token 9788 else begin print_err("Missing = inserted for "); 9789 @.Missing = inserted@> 9790 print_cmd_chr(if_test,this_if); 9791 help1("I was expecting to see `<', `=', or `>'. Didn't."); 9792 back_error; r:="="; 9793 end; 9794 if this_if=if_int_code then scan_int@+else scan_normal_dimen; 9795 case r of 9796 "<": b:=(n<cur_val); 9797 "=": b:=(n=cur_val); 9798 ">": b:=(n>cur_val); 9799 end; 9800 end 9801 9802 @ @<Test if an integer is odd@>= 9803 begin scan_int; b:=odd(cur_val); 9804 end 9805 9806 @ @<Test box register status@>= 9807 begin scan_eight_bit_int; p:=box(cur_val); 9808 if this_if=if_void_code then b:=(p=null) 9809 else if p=null then b:=false 9810 else if this_if=if_hbox_code then b:=(type(p)=hlist_node) 9811 else b:=(type(p)=vlist_node); 9812 end 9813 9814 @ An active character will be treated as category 13 following 9815 \.{\\if\\noexpand} or following \.{\\ifcat\\noexpand}. We use the fact that 9816 active characters have the smallest tokens, among all control sequences. 9817 9818 @d get_x_token_or_active_char==@t@>@; 9819 begin get_x_token; 9820 if cur_cmd=relax then if cur_chr=no_expand_flag then 9821 begin cur_cmd:=active_char; 9822 cur_chr:=cur_tok-cs_token_flag-active_base; 9823 end; 9824 end 9825 9826 @<Test if two characters match@>= 9827 begin get_x_token_or_active_char; 9828 if (cur_cmd>active_char)or(cur_chr>255) then {not a character} 9829 begin m:=relax; n:=256; 9830 end 9831 else begin m:=cur_cmd; n:=cur_chr; 9832 end; 9833 get_x_token_or_active_char; 9834 if (cur_cmd>active_char)or(cur_chr>255) then 9835 begin cur_cmd:=relax; cur_chr:=256; 9836 end; 9837 if this_if=if_char_code then b:=(n=cur_chr)@+else b:=(m=cur_cmd); 9838 end 9839 9840 @ Note that `\.{\\ifx}' will declare two macros different if one is \\{long} 9841 or \\{outer} and the other isn't, even though the texts of the macros are 9842 the same. 9843 9844 We need to reset |scanner_status|, since \.{\\outer} control sequences 9845 are allowed, but we might be scanning a macro definition or preamble. 9846 9847 @<Test if two tokens match@>= 9848 begin save_scanner_status:=scanner_status; scanner_status:=normal; 9849 get_next; n:=cur_cs; p:=cur_cmd; q:=cur_chr; 9850 get_next; if cur_cmd<>p then b:=false 9851 else if cur_cmd<call then b:=(cur_chr=q) 9852 else @<Test if two macro texts match@>; 9853 scanner_status:=save_scanner_status; 9854 end 9855 9856 @ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are 9857 different in examples like this: 9858 $$\vbox{\halign{\.{#}\hfil&\qquad\.{#}\hfil\cr 9859 {}\\def\\a\{\\c\}& 9860 {}\\def\\c\{\}\cr 9861 {}\\def\\b\{\\d\}& 9862 {}\\def\\d\{\}\cr}}$$ 9863 9864 @<Test if two macro texts match@>= 9865 begin p:=link(cur_chr); q:=link(equiv(n)); {omit reference counts} 9866 if p=q then b:=true 9867 else begin while (p<>null)and(q<>null) do 9868 if info(p)<>info(q) then p:=null 9869 else begin p:=link(p); q:=link(q); 9870 end; 9871 b:=((p=null)and(q=null)); 9872 end; 9873 end 9874 9875 @ @<Select the appropriate case and |return| or |goto common_ending|@>= 9876 begin scan_int; n:=cur_val; {|n| is the number of cases to pass} 9877 if tracing_commands>1 then 9878 begin begin_diagnostic; print("{case "); print_int(n); print_char("}"); 9879 end_diagnostic(false); 9880 end; 9881 while n<>0 do 9882 begin pass_text; 9883 if cond_ptr=save_cond_ptr then 9884 if cur_chr=or_code then decr(n) 9885 else goto common_ending 9886 else if cur_chr=fi_code then @<Pop the condition stack@>; 9887 end; 9888 change_if_limit(or_code,save_cond_ptr); 9889 return; {wait for \.{\\or}, \.{\\else}, or \.{\\fi}} 9890 end 9891 9892 @ The processing of conditionals is complete except for the following 9893 code, which is actually part of |expand|. It comes into play when 9894 \.{\\or}, \.{\\else}, or \.{\\fi} is scanned. 9895 9896 @<Terminate the current conditional and skip to \.{\\fi}@>= 9897 if cur_chr>if_limit then 9898 if if_limit=if_code then insert_relax {condition not yet evaluated} 9899 else begin print_err("Extra "); print_cmd_chr(fi_or_else,cur_chr); 9900 @.Extra \\or@> 9901 @.Extra \\else@> 9902 @.Extra \\fi@> 9903 help1("I'm ignoring this; it doesn't match any \if."); 9904 error; 9905 end 9906 else begin while cur_chr<>fi_code do pass_text; {skip to \.{\\fi}} 9907 @<Pop the condition stack@>; 9908 end 9909 9910 @* \[29] File names. 9911 It's time now to fret about file names. Besides the fact that different 9912 operating systems treat files in different ways, we must cope with the 9913 fact that completely different naming conventions are used by different 9914 groups of people. The following programs show what is required for one 9915 particular operating system; similar routines for other systems are not 9916 difficult to devise. 9917 @^fingers@> 9918 @^system dependencies@> 9919 9920 \TeX\ assumes that a file name has three parts: the name proper; its 9921 ``extension''; and a ``file area'' where it is found in an external file 9922 system. The extension of an input file or a write file is assumed to be 9923 `\.{.tex}' unless otherwise specified; it is `\.{.log}' on the 9924 transcript file that records each run of \TeX; it is `\.{.tfm}' on the font 9925 metric files that describe characters in the fonts \TeX\ uses; it is 9926 `\.{.dvi}' on the output files that specify typesetting information; and it 9927 is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX. 9928 The file area can be arbitrary on input files, but files are usually 9929 output to the user's current area. If an input file cannot be 9930 found on the specified area, \TeX\ will look for it on a special system 9931 area; this special area is intended for commonly used input files like 9932 \.{webmac.tex}. 9933 9934 Simple uses of \TeX\ refer only to file names that have no explicit 9935 extension or area. For example, a person usually says `\.{\\input} \.{paper}' 9936 or `\.{\\font\\tenrm} \.= \.{helvetica}' instead of `\.{\\input} 9937 \.{paper.new}' or `\.{\\font\\tenrm} \.= \.{<csd.knuth>test}'. Simple file 9938 names are best, because they make the \TeX\ source files portable; 9939 whenever a file name consists entirely of letters and digits, it should be 9940 treated in the same way by all implementations of \TeX. However, users 9941 need the ability to refer to other files in their environment, especially 9942 when responding to error messages concerning unopenable files; therefore 9943 we want to let them use the syntax that appears in their favorite 9944 operating system. 9945 9946 The following procedures don't allow spaces to be part of 9947 file names; but some users seem to like names that are spaced-out. 9948 System-dependent changes to allow such things should probably 9949 be made with reluctance, and only when an entire file name that 9950 includes spaces is ``quoted'' somehow. 9951 9952 @ In order to isolate the system-dependent aspects of file names, the 9953 @^system dependencies@> 9954 system-independent parts of \TeX\ are expressed in terms 9955 of three system-dependent 9956 procedures called |begin_name|, |more_name|, and |end_name|. In 9957 essence, if the user-specified characters of the file name are $c_1\ldots c_n$, 9958 the system-independent driver program does the operations 9959 $$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n); 9960 \,|end_name|.$$ 9961 These three procedures communicate with each other via global variables. 9962 Afterwards the file name will appear in the string pool as three strings 9963 called |cur_name|\penalty10000\hskip-.05em, 9964 |cur_area|, and |cur_ext|; the latter two are null (i.e., 9965 |""|), unless they were explicitly specified by the user. 9966 9967 Actually the situation is slightly more complicated, because \TeX\ needs 9968 to know when the file name ends. The |more_name| routine is a function 9969 (with side effects) that returns |true| on the calls |more_name|$(c_1)$, 9970 \dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$ 9971 returns |false|; or, it returns |true| and the token following $c_n$ is 9972 something like `\.{\\hbox}' (i.e., not a character). In other words, 9973 |more_name| is supposed to return |true| unless it is sure that the 9974 file name has been completely scanned; and |end_name| is supposed to be able 9975 to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of 9976 whether $|more_name|(c_n)$ returned |true| or |false|. 9977 9978 @<Glob...@>= 9979 @!cur_name:str_number; {name of file just scanned} 9980 @!cur_area:str_number; {file area just scanned, or \.{""}} 9981 @!cur_ext:str_number; {file extension just scanned, or \.{""}} 9982 9983 @ The file names we shall deal with for illustrative purposes have the 9984 following structure: If the name contains `\.>' or `\.:', the file area 9985 consists of all characters up to and including the final such character; 9986 otherwise the file area is null. If the remaining file name contains 9987 `\..', the file extension consists of all such characters from the first 9988 remaining `\..' to the end, otherwise the file extension is null. 9989 @^system dependencies@> 9990 9991 We can scan such file names easily by using two global variables that keep track 9992 of the occurrences of area and extension delimiters: 9993 9994 @<Glob...@>= 9995 @!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any} 9996 @!ext_delimiter:pool_pointer; {the relevant `\..', if any} 9997 9998 @ Input files that can't be found in the user's area may appear in a standard 9999 system area called |TEX_area|. Font metric files whose areas are not given 10000 explicitly are assumed to appear in a standard system area called 10001 |TEX_font_area|. These system area names will, of course, vary from place 10002 to place. 10003 @^system dependencies@> 10004 10005 @d TEX_area=="TeXinputs:" 10006 @.TeXinputs@> 10007 @d TEX_font_area=="TeXfonts:" 10008 @.TeXfonts@> 10009 10010 @ Here now is the first of the system-dependent routines for file name scanning. 10011 @^system dependencies@> 10012 10013 @p procedure begin_name; 10014 begin area_delimiter:=0; ext_delimiter:=0; 10015 end; 10016 10017 @ And here's the second. The string pool might change as the file name is 10018 being scanned, since a new \.{\\csname} might be entered; therefore we keep 10019 |area_delimiter| and |ext_delimiter| relative to the beginning of the current 10020 string, instead of assigning an absolute address like |pool_ptr| to them. 10021 @^system dependencies@> 10022 10023 @p function more_name(@!c:ASCII_code):boolean; 10024 begin if c=" " then more_name:=false 10025 else begin str_room(1); append_char(c); {contribute |c| to the current string} 10026 if (c=">")or(c=":") then 10027 begin area_delimiter:=cur_length; ext_delimiter:=0; 10028 end 10029 else if (c=".")and(ext_delimiter=0) then ext_delimiter:=cur_length; 10030 more_name:=true; 10031 end; 10032 end; 10033 10034 @ The third. 10035 @^system dependencies@> 10036 10037 @p procedure end_name; 10038 begin if str_ptr+3>max_strings then 10039 overflow("number of strings",max_strings-init_str_ptr); 10040 @:TeX capacity exceeded number of strings}{\quad number of strings@> 10041 if area_delimiter=0 then cur_area:="" 10042 else begin cur_area:=str_ptr; 10043 str_start[str_ptr+1]:=str_start[str_ptr]+area_delimiter; incr(str_ptr); 10044 end; 10045 if ext_delimiter=0 then 10046 begin cur_ext:=""; cur_name:=make_string; 10047 end 10048 else begin cur_name:=str_ptr; 10049 str_start[str_ptr+1]:=str_start[str_ptr]+ext_delimiter-area_delimiter-1; 10050 incr(str_ptr); cur_ext:=make_string; 10051 end; 10052 end; 10053 10054 @ Conversely, here is a routine that takes three strings and prints a file 10055 name that might have produced them. (The routine is system dependent, because 10056 some operating systems put the file area last instead of first.) 10057 @^system dependencies@> 10058 10059 @<Basic printing...@>= 10060 procedure print_file_name(@!n,@!a,@!e:integer); 10061 begin slow_print(a); slow_print(n); slow_print(e); 10062 end; 10063 10064 @ Another system-dependent routine is needed to convert three internal 10065 \TeX\ strings 10066 into the |name_of_file| value that is used to open files. The present code 10067 allows both lowercase and uppercase letters in the file name. 10068 @^system dependencies@> 10069 10070 @d append_to_name(#)==begin c:=#; incr(k); 10071 if k<=file_name_size then name_of_file[k]:=xchr[c]; 10072 end 10073 10074 @p procedure pack_file_name(@!n,@!a,@!e:str_number); 10075 var k:integer; {number of positions filled in |name_of_file|} 10076 @!c: ASCII_code; {character being packed} 10077 @!j:pool_pointer; {index into |str_pool|} 10078 begin k:=0; 10079 for j:=str_start[a] to str_start[a+1]-1 do append_to_name(so(str_pool[j])); 10080 for j:=str_start[n] to str_start[n+1]-1 do append_to_name(so(str_pool[j])); 10081 for j:=str_start[e] to str_start[e+1]-1 do append_to_name(so(str_pool[j])); 10082 if k<=file_name_size then name_length:=k@+else name_length:=file_name_size; 10083 for k:=name_length+1 to file_name_size do name_of_file[k]:=' '; 10084 end; 10085 10086 @ A messier routine is also needed, since format file names must be scanned 10087 before \TeX's string mechanism has been initialized. We shall use the 10088 global variable |TEX_format_default| to supply the text for default system areas 10089 and extensions related to format files. 10090 @^system dependencies@> 10091 10092 @d format_default_length=20 {length of the |TEX_format_default| string} 10093 @d format_area_length=11 {length of its area part} 10094 @d format_ext_length=4 {length of its `\.{.fmt}' part} 10095 @d format_extension=".fmt" {the extension, as a \.{WEB} constant} 10096 10097 @<Glob...@>= 10098 @!TEX_format_default:packed array[1..format_default_length] of char; 10099 10100 @ @<Set init...@>= 10101 TEX_format_default:='TeXformats:plain.fmt'; 10102 @.TeXformats@> 10103 @.plain@> 10104 @^system dependencies@> 10105 10106 @ @<Check the ``constant'' values for consistency@>= 10107 if format_default_length>file_name_size then bad:=31; 10108 10109 @ Here is the messy routine that was just mentioned. It sets |name_of_file| 10110 from the first |n| characters of |TEX_format_default|, followed by 10111 |buffer[a..b]|, followed by the last |format_ext_length| characters of 10112 |TEX_format_default|. 10113 10114 We dare not give error messages here, since \TeX\ calls this routine before 10115 the |error| routine is ready to roll. Instead, we simply drop excess characters, 10116 since the error will be detected in another way when a strange file name 10117 isn't found. 10118 @^system dependencies@> 10119 10120 @p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer); 10121 var k:integer; {number of positions filled in |name_of_file|} 10122 @!c: ASCII_code; {character being packed} 10123 @!j:integer; {index into |buffer| or |TEX_format_default|} 10124 begin if n+b-a+1+format_ext_length>file_name_size then 10125 b:=a+file_name_size-n-1-format_ext_length; 10126 k:=0; 10127 for j:=1 to n do append_to_name(xord[TEX_format_default[j]]); 10128 for j:=a to b do append_to_name(buffer[j]); 10129 for j:=format_default_length-format_ext_length+1 to format_default_length do 10130 append_to_name(xord[TEX_format_default[j]]); 10131 if k<=file_name_size then name_length:=k@+else name_length:=file_name_size; 10132 for k:=name_length+1 to file_name_size do name_of_file[k]:=' '; 10133 end; 10134 10135 @ Here is the only place we use |pack_buffered_name|. This part of the program 10136 becomes active when a ``virgin'' \TeX\ is trying to get going, just after 10137 the preliminary initialization, or when the user is substituting another 10138 format file by typing `\.\&' after the initial `\.{**}' prompt. The buffer 10139 contains the first line of input in |buffer[loc..(last-1)]|, where 10140 |loc<last| and |buffer[loc]<>" "|. 10141 10142 @<Declare the function called |open_fmt_file|@>= 10143 function open_fmt_file:boolean; 10144 label found,exit; 10145 var j:0..buf_size; {the first space after the format file name} 10146 begin j:=loc; 10147 if buffer[loc]="&" then 10148 begin incr(loc); j:=loc; buffer[last]:=" "; 10149 while buffer[j]<>" " do incr(j); 10150 pack_buffered_name(0,loc,j-1); {try first without the system file area} 10151 if w_open_in(fmt_file) then goto found; 10152 pack_buffered_name(format_area_length,loc,j-1); 10153 {now try the system format file area} 10154 if w_open_in(fmt_file) then goto found; 10155 wake_up_terminal; 10156 wterm_ln('Sorry, I can''t find that format;',' will try PLAIN.'); 10157 @.Sorry, I can't find...@> 10158 update_terminal; 10159 end; 10160 {now pull out all the stops: try for the system \.{plain} file} 10161 pack_buffered_name(format_default_length-format_ext_length,1,0); 10162 if not w_open_in(fmt_file) then 10163 begin wake_up_terminal; 10164 wterm_ln('I can''t find the PLAIN format file!'); 10165 @.I can't find PLAIN...@> 10166 @.plain@> 10167 open_fmt_file:=false; return; 10168 end; 10169 found:loc:=j; open_fmt_file:=true; 10170 exit:end; 10171 10172 @ Operating systems often make it possible to determine the exact name (and 10173 possible version number) of a file that has been opened. The following routine, 10174 which simply makes a \TeX\ string from the value of |name_of_file|, should 10175 ideally be changed to deduce the full name of file~|f|, which is the file 10176 most recently opened, if it is possible to do this in a \PASCAL\ program. 10177 @^system dependencies@> 10178 10179 This routine might be called after string memory has overflowed, hence 10180 we dare not use `|str_room|'. 10181 10182 @p function make_name_string:str_number; 10183 var k:1..file_name_size; {index into |name_of_file|} 10184 begin if (pool_ptr+name_length>pool_size)or(str_ptr=max_strings)or 10185 (cur_length>0) then 10186 make_name_string:="?" 10187 else begin for k:=1 to name_length do append_char(xord[name_of_file[k]]); 10188 make_name_string:=make_string; 10189 end; 10190 end; 10191 function a_make_name_string(var f:alpha_file):str_number; 10192 begin a_make_name_string:=make_name_string; 10193 end; 10194 function b_make_name_string(var f:byte_file):str_number; 10195 begin b_make_name_string:=make_name_string; 10196 end; 10197 function w_make_name_string(var f:word_file):str_number; 10198 begin w_make_name_string:=make_name_string; 10199 end; 10200 10201 @ Now let's consider the ``driver'' 10202 routines by which \TeX\ deals with file names 10203 in a system-independent manner. First comes a procedure that looks for a 10204 file name in the input by calling |get_x_token| for the information. 10205 10206 @p procedure scan_file_name; 10207 label done; 10208 begin name_in_progress:=true; begin_name; 10209 @<Get the next non-blank non-call...@>; 10210 loop@+begin if (cur_cmd>other_char)or(cur_chr>255) then {not a character} 10211 begin back_input; goto done; 10212 end; 10213 if not more_name(cur_chr) then goto done; 10214 get_x_token; 10215 end; 10216 done: end_name; name_in_progress:=false; 10217 end; 10218 10219 @ The global variable |name_in_progress| is used to prevent recursive 10220 use of |scan_file_name|, since the |begin_name| and other procedures 10221 communicate via global variables. Recursion would arise only by 10222 devious tricks like `\.{\\input\\input f}'; such attempts at sabotage 10223 must be thwarted. Furthermore, |name_in_progress| prevents \.{\\input} 10224 @^recursion@> 10225 from being initiated when a font size specification is being scanned. 10226 10227 Another global variable, |job_name|, contains the file name that was first 10228 \.{\\input} by the user. This name is extended by `\.{.log}' and `\.{.dvi}' 10229 and `\.{.fmt}' in the names of \TeX's output files. 10230 10231 @<Glob...@>= 10232 @!name_in_progress:boolean; {is a file name being scanned?} 10233 @!job_name:str_number; {principal file name} 10234 @!log_opened:boolean; {has the transcript file been opened?} 10235 10236 @ Initially |job_name=0|; it becomes nonzero as soon as the true name is known. 10237 We have |job_name=0| if and only if the `\.{log}' file has not been opened, 10238 except of course for a short time just after |job_name| has become nonzero. 10239 10240 @<Initialize the output...@>= 10241 job_name:=0; name_in_progress:=false; log_opened:=false; 10242 10243 @ Here is a routine that manufactures the output file names, assuming that 10244 |job_name<>0|. It ignores and changes the current settings of |cur_area| 10245 and |cur_ext|. 10246 10247 @d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext) 10248 10249 @p procedure pack_job_name(@!s:str_number); {|s = ".log"|, |".dvi"|, or 10250 |format_extension|} 10251 begin cur_area:=""; cur_ext:=s; 10252 cur_name:=job_name; pack_cur_name; 10253 end; 10254 10255 @ If some trouble arises when \TeX\ tries to open a file, the following 10256 routine calls upon the user to supply another file name. Parameter~|s| 10257 is used in the error message to identify the type of file; parameter~|e| 10258 is the default extension if none is given. Upon exit from the routine, 10259 variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are 10260 ready for another attempt at file opening. 10261 10262 @p procedure prompt_file_name(@!s,@!e:str_number); 10263 label done; 10264 var k:0..buf_size; {index into |buffer|} 10265 begin if interaction=scroll_mode then wake_up_terminal; 10266 if s="input file name" then print_err("I can't find file `") 10267 @.I can't find file x@> 10268 else print_err("I can't write on file `"); 10269 @.I can't write on file x@> 10270 print_file_name(cur_name,cur_area,cur_ext); print("'."); 10271 if e=".tex" then show_context; 10272 print_nl("Please type another "); print(s); 10273 @.Please type...@> 10274 if interaction<scroll_mode then 10275 fatal_error("*** (job aborted, file error in nonstop mode)"); 10276 @.job aborted, file error...@> 10277 clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>; 10278 if cur_ext="" then cur_ext:=e; 10279 pack_cur_name; 10280 end; 10281 10282 @ @<Scan file name in the buffer@>= 10283 begin begin_name; k:=first; 10284 while (buffer[k]=" ")and(k<last) do incr(k); 10285 loop@+ begin if k=last then goto done; 10286 if not more_name(buffer[k]) then goto done; 10287 incr(k); 10288 end; 10289 done:end_name; 10290 end 10291 10292 @ Here's an example of how these conventions are used. Whenever it is time to 10293 ship out a box of stuff, we shall use the macro |ensure_dvi_open|. 10294 10295 @d ensure_dvi_open==if output_file_name=0 then 10296 begin if job_name=0 then open_log_file; 10297 pack_job_name(".dvi"); 10298 while not b_open_out(dvi_file) do 10299 prompt_file_name("file name for output",".dvi"); 10300 output_file_name:=b_make_name_string(dvi_file); 10301 end 10302 10303 @<Glob...@>= 10304 @!dvi_file: byte_file; {the device-independent output goes here} 10305 @!output_file_name: str_number; {full name of the output file} 10306 @!log_name:str_number; {full name of the log file} 10307 10308 @ @<Initialize the output...@>=output_file_name:=0; 10309 10310 @ The |open_log_file| routine is used to open the transcript file and to help 10311 it catch up to what has previously been printed on the terminal. 10312 10313 @p procedure open_log_file; 10314 var old_setting:0..max_selector; {previous |selector| setting} 10315 @!k:0..buf_size; {index into |months| and |buffer|} 10316 @!l:0..buf_size; {end of first input line} 10317 @!months:packed array [1..36] of char; {abbreviations of month names} 10318 begin old_setting:=selector; 10319 if job_name=0 then job_name:="texput"; 10320 @.texput@> 10321 pack_job_name(".log"); 10322 while not a_open_out(log_file) do @<Try to get a different log file name@>; 10323 log_name:=a_make_name_string(log_file); 10324 selector:=log_only; log_opened:=true; 10325 @<Print the banner line, including the date and time@>; 10326 input_stack[input_ptr]:=cur_input; {make sure bottom level is in memory} 10327 print_nl("**"); 10328 @.**@> 10329 l:=input_stack[0].limit_field; {last position of first line} 10330 if buffer[l]=end_line_char then decr(l); 10331 for k:=1 to l do print(buffer[k]); 10332 print_ln; {now the transcript file contains the first line of input} 10333 selector:=old_setting+2; {|log_only| or |term_and_log|} 10334 end; 10335 10336 @ Sometimes |open_log_file| is called at awkward moments when \TeX\ is 10337 unable to print error messages or even to |show_context|. 10338 The |prompt_file_name| routine can result in a |fatal_error|, but the |error| 10339 routine will not be invoked because |log_opened| will be false. 10340 10341 The normal idea of |batch_mode| is that nothing at all should be written 10342 on the terminal. However, in the unusual case that 10343 no log file could be opened, we make an exception and allow 10344 an explanatory message to be seen. 10345 10346 Incidentally, the program always refers to the log file as a `\.{transcript 10347 file}', because some systems cannot use the extension `\.{.log}' for 10348 this file. 10349 10350 @<Try to get a different log file name@>= 10351 begin selector:=term_only; 10352 prompt_file_name("transcript file name",".log"); 10353 end 10354 10355 @ @<Print the banner...@>= 10356 begin wlog(banner); 10357 slow_print(format_ident); print(" "); 10358 print_int(sys_day); print_char(" "); 10359 months:='JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC'; 10360 for k:=3*sys_month-2 to 3*sys_month do wlog(months[k]); 10361 print_char(" "); print_int(sys_year); print_char(" "); 10362 print_two(sys_time div 60); print_char(":"); print_two(sys_time mod 60); 10363 end 10364 10365 @ Let's turn now to the procedure that is used to initiate file reading 10366 when an `\.{\\input}' command is being processed. 10367 Beware: For historic reasons, this code foolishly conserves a tiny bit 10368 of string pool space; but that can confuse the interactive `\.E' option. 10369 @^system dependencies@> 10370 10371 @p procedure start_input; {\TeX\ will \.{\\input} something} 10372 label done; 10373 begin scan_file_name; {set |cur_name| to desired file name} 10374 if cur_ext="" then cur_ext:=".tex"; 10375 pack_cur_name; 10376 loop@+ begin begin_file_reading; {set up |cur_file| and new level of input} 10377 if a_open_in(cur_file) then goto done; 10378 if cur_area="" then 10379 begin pack_file_name(cur_name,TEX_area,cur_ext); 10380 if a_open_in(cur_file) then goto done; 10381 end; 10382 end_file_reading; {remove the level that didn't work} 10383 prompt_file_name("input file name",".tex"); 10384 end; 10385 done: name:=a_make_name_string(cur_file); 10386 if job_name=0 then 10387 begin job_name:=cur_name; open_log_file; 10388 end; {|open_log_file| doesn't |show_context|, so |limit| 10389 and |loc| needn't be set to meaningful values yet} 10390 if term_offset+length(name)>max_print_line-2 then print_ln 10391 else if (term_offset>0)or(file_offset>0) then print_char(" "); 10392 print_char("("); incr(open_parens); slow_print(name); update_terminal; 10393 state:=new_line; 10394 if name=str_ptr-1 then {conserve string pool space (but see note above)} 10395 begin flush_string; name:=cur_name; 10396 end; 10397 @<Read the first line of the new file@>; 10398 end; 10399 10400 @ Here we have to remember to tell the |input_ln| routine not to 10401 start with a |get|. If the file is empty, it is considered to 10402 contain a single blank line. 10403 @^system dependencies@> 10404 @^empty line at end of file@> 10405 10406 @<Read the first line...@>= 10407 begin line:=1; 10408 if input_ln(cur_file,false) then do_nothing; 10409 firm_up_the_line; 10410 if end_line_char_inactive then decr(limit) 10411 else buffer[limit]:=end_line_char; 10412 first:=limit+1; loc:=start; 10413 end 10414 10415 @* \[30] Font metric data. 10416 \TeX\ gets its knowledge about fonts from font metric files, also called 10417 \.{TFM} files; the `\.T' in `\.{TFM}' stands for \TeX, 10418 but other programs know about them too. 10419 @:TFM files}{\.{TFM} files@> 10420 @^font metric files@> 10421 10422 The information in a \.{TFM} file appears in a sequence of 8-bit bytes. 10423 Since the number of bytes is always a multiple of 4, we could 10424 also regard the file as a sequence of 32-bit words, but \TeX\ uses the 10425 byte interpretation. The format of \.{TFM} files was designed by 10426 Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds 10427 @^Ramshaw, Lyle Harold@> 10428 of information in a compact but useful form. 10429 10430 @<Glob...@>= 10431 @!tfm_file:byte_file; 10432 10433 @ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit 10434 integers that give the lengths of the various subsequent portions 10435 of the file. These twelve integers are, in order: 10436 $$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr 10437 |lf|&length of the entire file, in words;\cr 10438 |lh|&length of the header data, in words;\cr 10439 |bc|&smallest character code in the font;\cr 10440 |ec|&largest character code in the font;\cr 10441 |nw|&number of words in the width table;\cr 10442 |nh|&number of words in the height table;\cr 10443 |nd|&number of words in the depth table;\cr 10444 |ni|&number of words in the italic correction table;\cr 10445 |nl|&number of words in the lig/kern table;\cr 10446 |nk|&number of words in the kern table;\cr 10447 |ne|&number of words in the extensible character table;\cr 10448 |np|&number of font parameter words.\cr}}$$ 10449 They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|, 10450 and 10451 $$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$ 10452 Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|), 10453 and as few as 0 characters (if |bc=ec+1|). 10454 10455 Incidentally, when two or more 8-bit bytes are combined to form an integer of 10456 16 or more bits, the most significant bytes appear first in the file. 10457 This is called BigEndian order. 10458 @!@^BigEndian order@> 10459 10460 @ The rest of the \.{TFM} file may be regarded as a sequence of ten data 10461 arrays having the informal specification 10462 $$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2} 10463 \vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr 10464 header&|[0..lh-1]@t\\{stuff}@>|\cr 10465 char\_info&|[bc..ec]char_info_word|\cr 10466 width&|[0..nw-1]fix_word|\cr 10467 height&|[0..nh-1]fix_word|\cr 10468 depth&|[0..nd-1]fix_word|\cr 10469 italic&|[0..ni-1]fix_word|\cr 10470 lig\_kern&|[0..nl-1]lig_kern_command|\cr 10471 kern&|[0..nk-1]fix_word|\cr 10472 exten&|[0..ne-1]extensible_recipe|\cr 10473 param&|[1..np]fix_word|\cr}}$$ 10474 The most important data type used here is a |@!fix_word|, which is 10475 a 32-bit representation of a binary fraction. A |fix_word| is a signed 10476 quantity, with the two's complement of the entire word used to represent 10477 negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the 10478 binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and 10479 the smallest is $-2048$. We will see below, however, that all but two of 10480 the |fix_word| values must lie between $-16$ and $+16$. 10481 10482 @ The first data array is a block of header information, which contains 10483 general facts about the font. The header must contain at least two words, 10484 |header[0]| and |header[1]|, whose meaning is explained below. 10485 Additional header information of use to other software routines might 10486 also be included, but \TeX82 does not need to know about such details. 10487 For example, 16 more words of header information are in use at the Xerox 10488 Palo Alto Research Center; the first ten specify the character coding 10489 scheme used (e.g., `\.{XEROX text}' or `\.{TeX math symbols}'), the next five 10490 give the font identifier (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the 10491 last gives the ``face byte.'' The program that converts \.{DVI} files 10492 to Xerox printing format gets this information by looking at the \.{TFM} 10493 file, which it needs to read anyway because of other information that 10494 is not explicitly repeated in \.{DVI}~format. 10495 10496 \yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into 10497 the \.{DVI} output file. Later on when the \.{DVI} file is printed, 10498 possibly on another computer, the actual font that gets used is supposed 10499 to have a check sum that agrees with the one in the \.{TFM} file used by 10500 \TeX. In this way, users will be warned about potential incompatibilities. 10501 (However, if the check sum is zero in either the font file or the \.{TFM} 10502 file, no check is made.) The actual relation between this check sum and 10503 the rest of the \.{TFM} file is not important; the check sum is simply an 10504 identification number with the property that incompatible fonts almost 10505 always have distinct check sums. 10506 @^check sum@> 10507 10508 \yskip\hang|header[1]| is a |fix_word| containing the design size of 10509 the font, in units of \TeX\ points. This number must be at least 1.0; it is 10510 fairly arbitrary, but usually the design size is 10.0 for a ``10 point'' 10511 font, i.e., a font that was designed to look best at a 10-point size, 10512 whatever that really means. When a \TeX\ user asks for a font 10513 `\.{at} $\delta$ \.{pt}', the effect is to override the design size 10514 and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates 10515 of the points in the font image by a factor of $\delta$ divided by the 10516 design size. {\sl All other dimensions in the\/ \.{TFM} file are 10517 |fix_word|\kern-1pt\ numbers in design-size units}, with the exception of 10518 |param[1]| (which denotes the slant ratio). Thus, for example, the value 10519 of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value 10520 $2^{20}=1.0$, since many fonts have a design size equal to one em. 10521 The other dimensions must be less than 16 design-size units in absolute 10522 value; thus, |header[1]| and |param[1]| are the only |fix_word| 10523 entries in the whole \.{TFM} file whose first byte might be something 10524 besides 0 or 255. 10525 10526 @ Next comes the |char_info| array, which contains one |@!char_info_word| 10527 per character. Each word in this part of the file contains six fields 10528 packed into four bytes as follows. 10529 10530 \yskip\hang first byte: |@!width_index| (8 bits)\par 10531 \hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index| 10532 (4~bits)\par 10533 \hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag| 10534 (2~bits)\par 10535 \hang fourth byte: |@!remainder| (8 bits)\par 10536 \yskip\noindent 10537 The actual width of a character is \\{width}|[width_index]|, in design-size 10538 units; this is a device for compressing information, since many characters 10539 have the same width. Since it is quite common for many characters 10540 to have the same height, depth, or italic correction, the \.{TFM} format 10541 imposes a limit of 16 different heights, 16 different depths, and 10542 64 different italic corrections. 10543 10544 @!@^italic correction@> 10545 The italic correction of a character has two different uses. 10546 (a)~In ordinary text, the italic correction is added to the width only if 10547 the \TeX\ user specifies `\.{\\/}' after the character. 10548 (b)~In math formulas, the italic correction is always added to the width, 10549 except with respect to the positioning of subscripts. 10550 10551 Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]= 10552 \\{italic}[0]=0$ should always hold, so that an index of zero implies a 10553 value of zero. The |width_index| should never be zero unless the 10554 character does not exist in the font, since a character is valid if and 10555 only if it lies between |bc| and |ec| and has a nonzero |width_index|. 10556 10557 @ The |tag| field in a |char_info_word| has four values that explain how to 10558 interpret the |remainder| field. 10559 10560 \yskip\hangg|tag=0| (|no_tag|) means that |remainder| is unused.\par 10561 \hangg|tag=1| (|lig_tag|) means that this character has a ligature/kerning 10562 program starting at position |remainder| in the |lig_kern| array.\par 10563 \hangg|tag=2| (|list_tag|) means that this character is part of a chain of 10564 characters of ascending sizes, and not the largest in the chain. The 10565 |remainder| field gives the character code of the next larger character.\par 10566 \hangg|tag=3| (|ext_tag|) means that this character code represents an 10567 extensible character, i.e., a character that is built up of smaller pieces 10568 so that it can be made arbitrarily large. The pieces are specified in 10569 |@!exten[remainder]|.\par 10570 \yskip\noindent 10571 Characters with |tag=2| and |tag=3| are treated as characters with |tag=0| 10572 unless they are used in special circumstances in math formulas. For example, 10573 the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left} 10574 operation looks for both |list_tag| and |ext_tag|. 10575 10576 @d no_tag=0 {vanilla character} 10577 @d lig_tag=1 {character has a ligature/kerning program} 10578 @d list_tag=2 {character has a successor in a charlist} 10579 @d ext_tag=3 {character is extensible} 10580 10581 @ The |lig_kern| array contains instructions in a simple programming language 10582 that explains what to do for special letter pairs. Each word in this array is a 10583 |@!lig_kern_command| of four bytes. 10584 10585 \yskip\hang first byte: |skip_byte|, indicates that this is the final program 10586 step if the byte is 128 or more, otherwise the next step is obtained by 10587 skipping this number of intervening steps.\par 10588 \hang second byte: |next_char|, ``if |next_char| follows the current character, 10589 then perform the operation and stop, otherwise continue.''\par 10590 \hang third byte: |op_byte|, indicates a ligature step if less than~128, 10591 a kern step otherwise.\par 10592 \hang fourth byte: |remainder|.\par 10593 \yskip\noindent 10594 In a kern step, an 10595 additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted 10596 between the current character and |next_char|. This amount is 10597 often negative, so that the characters are brought closer together 10598 by kerning; but it might be positive. 10599 10600 There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where 10601 $0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is 10602 |remainder| is inserted between the current character and |next_char|; 10603 then the current character is deleted if $b=0$, and |next_char| is 10604 deleted if $c=0$; then we pass over $a$~characters to reach the next 10605 current character (which may have a ligature/kerning program of its own). 10606 10607 If the very first instruction of the |lig_kern| array has |skip_byte=255|, 10608 the |next_char| byte is the so-called boundary character of this font; 10609 the value of |next_char| need not lie between |bc| and~|ec|. 10610 If the very last instruction of the |lig_kern| array has |skip_byte=255|, 10611 there is a special ligature/kerning program for a boundary character at the 10612 left, beginning at location |256*op_byte+remainder|. 10613 The interpretation is that \TeX\ puts implicit boundary characters 10614 before and after each consecutive string of characters from the same font. 10615 These implicit characters do not appear in the output, but they can affect 10616 ligatures and kerning. 10617 10618 If the very first instruction of a character's |lig_kern| program has 10619 |skip_byte>128|, the program actually begins in location 10620 |256*op_byte+remainder|. This feature allows access to large |lig_kern| 10621 arrays, because the first instruction must otherwise 10622 appear in a location |<=255|. 10623 10624 Any instruction with |skip_byte>128| in the |lig_kern| array must satisfy 10625 the condition 10626 $$\hbox{|256*op_byte+remainder<nl|.}$$ 10627 If such an instruction is encountered during 10628 normal program execution, it denotes an unconditional halt; no ligature 10629 or kerning command is performed. 10630 10631 @d stop_flag==qi(128) {value indicating `\.{STOP}' in a lig/kern program} 10632 @d kern_flag==qi(128) {op code for a kern step} 10633 @d skip_byte(#)==#.b0 10634 @d next_char(#)==#.b1 10635 @d op_byte(#)==#.b2 10636 @d rem_byte(#)==#.b3 10637 10638 @ Extensible characters are specified by an |@!extensible_recipe|, which 10639 consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this 10640 order). These bytes are the character codes of individual pieces used to 10641 build up a large symbol. If |top|, |mid|, or |bot| are zero, they are not 10642 present in the built-up result. For example, an extensible vertical line is 10643 like an extensible bracket, except that the top and bottom pieces are missing. 10644 10645 Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box 10646 if the piece isn't present. Then the extensible characters have the form 10647 $TR^kMR^kB$ from top to bottom, for some |k>=0|, unless $M$ is absent; 10648 in the latter case we can have $TR^kB$ for both even and odd values of~|k|. 10649 The width of the extensible character is the width of $R$; and the 10650 height-plus-depth is the sum of the individual height-plus-depths of the 10651 components used, since the pieces are butted together in a vertical list. 10652 10653 @d ext_top(#)==#.b0 {|top| piece in a recipe} 10654 @d ext_mid(#)==#.b1 {|mid| piece in a recipe} 10655 @d ext_bot(#)==#.b2 {|bot| piece in a recipe} 10656 @d ext_rep(#)==#.b3 {|rep| piece in a recipe} 10657 10658 @ The final portion of a \.{TFM} file is the |param| array, which is another 10659 sequence of |fix_word| values. 10660 10661 \yskip\hang|param[1]=slant| is the amount of italic slant, which is used 10662 to help position accents. For example, |slant=.25| means that when you go 10663 up one unit, you also go .25 units to the right. The |slant| is a pure 10664 number; it's the only |fix_word| other than the design size itself that is 10665 not scaled by the design size. 10666 10667 \hang|param[2]=space| is the normal spacing between words in text. 10668 Note that character |" "| in the font need not have anything to do with 10669 blank spaces. 10670 10671 \hang|param[3]=space_stretch| is the amount of glue stretching between words. 10672 10673 \hang|param[4]=space_shrink| is the amount of glue shrinking between words. 10674 10675 \hang|param[5]=x_height| is the size of one ex in the font; it is also 10676 the height of letters for which accents don't have to be raised or lowered. 10677 10678 \hang|param[6]=quad| is the size of one em in the font. 10679 10680 \hang|param[7]=extra_space| is the amount added to |param[2]| at the 10681 ends of sentences. 10682 10683 \yskip\noindent 10684 If fewer than seven parameters are present, \TeX\ sets the missing parameters 10685 to zero. Fonts used for math symbols are required to have 10686 additional parameter information, which is explained later. 10687 10688 @d slant_code=1 10689 @d space_code=2 10690 @d space_stretch_code=3 10691 @d space_shrink_code=4 10692 @d x_height_code=5 10693 @d quad_code=6 10694 @d extra_space_code=7 10695 10696 @ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information 10697 about lots of fonts, it stores most of the data in a large array called 10698 |font_info|. Each item of |font_info| is a |memory_word|; the |fix_word| 10699 data gets converted into |scaled| entries, while everything else goes into 10700 words of type |four_quarters|. 10701 10702 When the user defines \.{\\font\\f}, say, \TeX\ assigns an internal number 10703 to the user's font~\.{\\f}. Adding this number to |font_id_base| gives the 10704 |eqtb| location of a ``frozen'' control sequence that will always select 10705 the font. 10706 10707 @<Types...@>= 10708 @!internal_font_number=font_base..font_max; {|font| in a |char_node|} 10709 @!font_index=0..font_mem_size; {index into |font_info|} 10710 10711 @ Here now is the (rather formidable) array of font arrays. 10712 10713 @d non_char==qi(256) {a |halfword| code that can't match a real character} 10714 @d non_address=0 {a spurious |bchar_label|} 10715 10716 @<Glob...@>= 10717 @!font_info:array[font_index] of memory_word; 10718 {the big collection of font data} 10719 @!fmem_ptr:font_index; {first unused word of |font_info|} 10720 @!font_ptr:internal_font_number; {largest internal font number in use} 10721 @!font_check:array[internal_font_number] of four_quarters; {check sum} 10722 @!font_size:array[internal_font_number] of scaled; {``at'' size} 10723 @!font_dsize:array[internal_font_number] of scaled; {``design'' size} 10724 @!font_params:array[internal_font_number] of font_index; {how many font 10725 parameters are present} 10726 @!font_name:array[internal_font_number] of str_number; {name of the font} 10727 @!font_area:array[internal_font_number] of str_number; {area of the font} 10728 @!font_bc:array[internal_font_number] of eight_bits; 10729 {beginning (smallest) character code} 10730 @!font_ec:array[internal_font_number] of eight_bits; 10731 {ending (largest) character code} 10732 @!font_glue:array[internal_font_number] of pointer; 10733 {glue specification for interword space, |null| if not allocated} 10734 @!font_used:array[internal_font_number] of boolean; 10735 {has a character from this font actually appeared in the output?} 10736 @!hyphen_char:array[internal_font_number] of integer; 10737 {current \.{\\hyphenchar} values} 10738 @!skew_char:array[internal_font_number] of integer; 10739 {current \.{\\skewchar} values} 10740 @!bchar_label:array[internal_font_number] of font_index; 10741 {start of |lig_kern| program for left boundary character, 10742 |non_address| if there is none} 10743 @!font_bchar:array[internal_font_number] of min_quarterword..non_char; 10744 {boundary character, |non_char| if there is none} 10745 @!font_false_bchar:array[internal_font_number] of min_quarterword..non_char; 10746 {|font_bchar| if it doesn't exist in the font, otherwise |non_char|} 10747 10748 @ Besides the arrays just enumerated, we have directory arrays that make it 10749 easy to get at the individual entries in |font_info|. For example, the 10750 |char_info| data for character |c| in font |f| will be in 10751 |font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index| 10752 part of this word (the |b0| field), the width of the character is 10753 |font_info[width_base[f]+w].sc|. (These formulas assume that 10754 |min_quarterword| has already been added to |c| and to |w|, since \TeX\ 10755 stores its quarterwords that way.) 10756 10757 @<Glob...@>= 10758 @!char_base:array[internal_font_number] of integer; 10759 {base addresses for |char_info|} 10760 @!width_base:array[internal_font_number] of integer; 10761 {base addresses for widths} 10762 @!height_base:array[internal_font_number] of integer; 10763 {base addresses for heights} 10764 @!depth_base:array[internal_font_number] of integer; 10765 {base addresses for depths} 10766 @!italic_base:array[internal_font_number] of integer; 10767 {base addresses for italic corrections} 10768 @!lig_kern_base:array[internal_font_number] of integer; 10769 {base addresses for ligature/kerning programs} 10770 @!kern_base:array[internal_font_number] of integer; 10771 {base addresses for kerns} 10772 @!exten_base:array[internal_font_number] of integer; 10773 {base addresses for extensible recipes} 10774 @!param_base:array[internal_font_number] of integer; 10775 {base addresses for font parameters} 10776 10777 @ @<Set init...@>= 10778 for k:=font_base to font_max do font_used[k]:=false; 10779 10780 @ \TeX\ always knows at least one font, namely the null font. It has no 10781 characters, and its seven parameters are all equal to zero. 10782 10783 @<Initialize table...@>= 10784 font_ptr:=null_font; fmem_ptr:=7; 10785 font_name[null_font]:="nullfont"; font_area[null_font]:=""; 10786 hyphen_char[null_font]:="-"; skew_char[null_font]:=-1; 10787 bchar_label[null_font]:=non_address; 10788 font_bchar[null_font]:=non_char; font_false_bchar[null_font]:=non_char; 10789 font_bc[null_font]:=1; font_ec[null_font]:=0; 10790 font_size[null_font]:=0; font_dsize[null_font]:=0; 10791 char_base[null_font]:=0; width_base[null_font]:=0; 10792 height_base[null_font]:=0; depth_base[null_font]:=0; 10793 italic_base[null_font]:=0; lig_kern_base[null_font]:=0; 10794 kern_base[null_font]:=0; exten_base[null_font]:=0; 10795 font_glue[null_font]:=null; font_params[null_font]:=7; 10796 param_base[null_font]:=-1; 10797 for k:=0 to 6 do font_info[k].sc:=0; 10798 10799 @ @<Put each...@>= 10800 primitive("nullfont",set_font,null_font); 10801 @!@:null_font_}{\.{\\nullfont} primitive@> 10802 text(frozen_null_font):="nullfont"; eqtb[frozen_null_font]:=eqtb[cur_val]; 10803 10804 @ Of course we want to define macros that suppress the detail of how font 10805 information is actually packed, so that we don't have to write things like 10806 $$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$ 10807 too often. The \.{WEB} definitions here make |char_info(f)(c)| the 10808 |four_quarters| word of font information corresponding to character 10809 |c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be 10810 the character's width; hence the long formula above is at least 10811 abbreviated to 10812 $$\hbox{|char_width(f)(char_info(f)(c))|.}$$ 10813 Usually, of course, we will fetch |q| first and look at several of its 10814 fields at the same time. 10815 10816 The italic correction of a character will be denoted by 10817 |char_italic(f)(q)|, so it is analogous to |char_width|. But we will get 10818 at the height and depth in a slightly different way, since we usually want 10819 to compute both height and depth if we want either one. The value of 10820 |height_depth(q)| will be the 8-bit quantity 10821 $$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we 10822 will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and 10823 depth of the character |c| for which |q=char_info(f)(c)|. Got that? 10824 10825 The tag field will be called |char_tag(q)|; the remainder byte will be 10826 called |rem_byte(q)|, using a macro that we have already defined above. 10827 10828 Access to a character's |width|, |height|, |depth|, and |tag| fields is 10829 part of \TeX's inner loop, so we want these macros to produce code that is 10830 as fast as possible under the circumstances. 10831 @^inner loop@> 10832 10833 @d char_info_end(#)==#].qqqq 10834 @d char_info(#)==font_info[char_base[#]+char_info_end 10835 @d char_width_end(#)==#.b0].sc 10836 @d char_width(#)==font_info[width_base[#]+char_width_end 10837 @d char_exists(#)==(#.b0>min_quarterword) 10838 @d char_italic_end(#)==(qo(#.b2)) div 4].sc 10839 @d char_italic(#)==font_info[italic_base[#]+char_italic_end 10840 @d height_depth(#)==qo(#.b1) 10841 @d char_height_end(#)==(#) div 16].sc 10842 @d char_height(#)==font_info[height_base[#]+char_height_end 10843 @d char_depth_end(#)==(#) mod 16].sc 10844 @d char_depth(#)==font_info[depth_base[#]+char_depth_end 10845 @d char_tag(#)==((qo(#.b2)) mod 4) 10846 10847 @ The global variable |null_character| is set up to be a word of 10848 |char_info| for a character that doesn't exist. Such a word provides a 10849 convenient way to deal with erroneous situations. 10850 10851 @<Glob...@>= 10852 @!null_character:four_quarters; {nonexistent character information} 10853 10854 @ @<Set init...@>= 10855 null_character.b0:=min_quarterword; null_character.b1:=min_quarterword; 10856 null_character.b2:=min_quarterword; null_character.b3:=min_quarterword; 10857 10858 @ Here are some macros that help process ligatures and kerns. 10859 We write |char_kern(f)(j)| to find the amount of kerning specified by 10860 kerning command~|j| in font~|f|. If |j| is the |char_info| for a character 10861 with a ligature/kern program, the first instruction of that program is either 10862 |i=font_info[lig_kern_start(f)(j)]| or |font_info[lig_kern_restart(f)(i)]|, 10863 depending on whether or not |skip_byte(i)<=stop_flag|. 10864 10865 The constant |kern_base_offset| should be simplified, for \PASCAL\ compilers 10866 that do not do local optimization. 10867 @^system dependencies@> 10868 10869 @d char_kern_end(#)==256*op_byte(#)+rem_byte(#)].sc 10870 @d char_kern(#)==font_info[kern_base[#]+char_kern_end 10871 @d kern_base_offset==256*(128+min_quarterword) 10872 @d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program} 10873 @d lig_kern_restart_end(#)==256*op_byte(#)+rem_byte(#)+32768-kern_base_offset 10874 @d lig_kern_restart(#)==lig_kern_base[#]+lig_kern_restart_end 10875 10876 @ Font parameters are referred to as |slant(f)|, |space(f)|, etc. 10877 10878 @d param_end(#)==param_base[#]].sc 10879 @d param(#)==font_info[#+param_end 10880 @d slant==param(slant_code) {slant to the right, per unit distance upward} 10881 @d space==param(space_code) {normal space between words} 10882 @d space_stretch==param(space_stretch_code) {stretch between words} 10883 @d space_shrink==param(space_shrink_code) {shrink between words} 10884 @d x_height==param(x_height_code) {one ex} 10885 @d quad==param(quad_code) {one em} 10886 @d extra_space==param(extra_space_code) {additional space at end of sentence} 10887 10888 @<The em width for |cur_font|@>=quad(cur_font) 10889 10890 @ @<The x-height for |cur_font|@>=x_height(cur_font) 10891 10892 @ \TeX\ checks the information of a \.{TFM} file for validity as the 10893 file is being read in, so that no further checks will be needed when 10894 typesetting is going on. The somewhat tedious subroutine that does this 10895 is called |read_font_info|. It has four parameters: the user font 10896 identifier~|u|, the file name and area strings |nom| and |aire|, and the 10897 ``at'' size~|s|. If |s|~is negative, it's the negative of a scale factor 10898 to be applied to the design size; |s=-1000| is the normal case. 10899 Otherwise |s| will be substituted for the design size; in this 10900 case, |s| must be positive and less than $2048\rm\,pt$ 10901 (i.e., it must be less than $2^{27}$ when considered as an integer). 10902 10903 The subroutine opens and closes a global file variable called |tfm_file|. 10904 It returns the value of the internal font number that was just loaded. 10905 If an error is detected, an error message is issued and no font 10906 information is stored; |null_font| is returned in this case. 10907 10908 @d bad_tfm=11 {label for |read_font_info|} 10909 @d abort==goto bad_tfm {do this when the \.{TFM} data is wrong} 10910 10911 @p function read_font_info(@!u:pointer;@!nom,@!aire:str_number; 10912 @!s:scaled):internal_font_number; {input a \.{TFM} file} 10913 label done,bad_tfm,not_found; 10914 var k:font_index; {index into |font_info|} 10915 @!file_opened:boolean; {was |tfm_file| successfully opened?} 10916 @!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:halfword; 10917 {sizes of subfiles} 10918 @!f:internal_font_number; {the new font's number} 10919 @!g:internal_font_number; {the number to return} 10920 @!a,@!b,@!c,@!d:eight_bits; {byte variables} 10921 @!qw:four_quarters;@!sw:scaled; {accumulators} 10922 @!bch_label:integer; {left boundary start location, or infinity} 10923 @!bchar:0..256; {boundary character, or 256} 10924 @!z:scaled; {the design size or the ``at'' size} 10925 @!alpha:integer;@!beta:1..16; 10926 {auxiliary quantities used in fixed-point multiplication} 10927 begin g:=null_font;@/ 10928 @<Read and check the font data; |abort| if the \.{TFM} file is 10929 malformed; if there's no room for this font, say so and |goto 10930 done|; otherwise |incr(font_ptr)| and |goto done|@>; 10931 bad_tfm: @<Report that the font won't be loaded@>; 10932 done: if file_opened then b_close(tfm_file); 10933 read_font_info:=g; 10934 end; 10935 10936 @ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert 10937 between the \.{TFM} format and a symbolic property-list format 10938 that can be easily edited. These programs contain extensive 10939 diagnostic information, so \TeX\ does not have to bother giving 10940 precise details about why it rejects a particular \.{TFM} file. 10941 @.TFtoPL@> @.PLtoTF@> 10942 10943 @d start_font_error_message==print_err("Font "); sprint_cs(u); 10944 print_char("="); print_file_name(nom,aire,""); 10945 if s>=0 then 10946 begin print(" at "); print_scaled(s); print("pt"); 10947 end 10948 else if s<>-1000 then 10949 begin print(" scaled "); print_int(-s); 10950 end 10951 10952 @<Report that the font won't be loaded@>= 10953 start_font_error_message; 10954 @.Font x=xx not loadable...@> 10955 if file_opened then print(" not loadable: Bad metric (TFM) file") 10956 else print(" not loadable: Metric (TFM) file not found"); 10957 help5("I wasn't able to read the size data for this font,")@/ 10958 ("so I will ignore the font specification.")@/ 10959 ("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/ 10960 ("You might try inserting a different font spec;")@/ 10961 ("e.g., type `I\font<same font id>=<substitute font name>'."); 10962 error 10963 10964 @ @<Read and check...@>= 10965 @<Open |tfm_file| for input@>; 10966 @<Read the {\.{TFM}} size fields@>; 10967 @<Use size fields to allocate font information@>; 10968 @<Read the {\.{TFM}} header@>; 10969 @<Read character data@>; 10970 @<Read box dimensions@>; 10971 @<Read ligature/kern program@>; 10972 @<Read extensible character recipes@>; 10973 @<Read font parameters@>; 10974 @<Make final adjustments and |goto done|@> 10975 10976 @ @<Open |tfm_file| for input@>= 10977 file_opened:=false; 10978 if aire="" then pack_file_name(nom,TEX_font_area,".tfm") 10979 else pack_file_name(nom,aire,".tfm"); 10980 if not b_open_in(tfm_file) then abort; 10981 file_opened:=true 10982 10983 @ Note: A malformed \.{TFM} file might be shorter than it claims to be; 10984 thus |eof(tfm_file)| might be true when |read_font_info| refers to 10985 |tfm_file^| or when it says |get(tfm_file)|. If such circumstances 10986 cause system error messages, you will have to defeat them somehow, 10987 for example by defining |fget| to be `\ignorespaces|begin get(tfm_file);| 10988 |if eof(tfm_file) then abort; end|\unskip'. 10989 @^system dependencies@> 10990 10991 @d fget==get(tfm_file) 10992 @d fbyte==tfm_file^ 10993 @d read_sixteen(#)==begin #:=fbyte; 10994 if #>127 then abort; 10995 fget; #:=#*@'400+fbyte; 10996 end 10997 @d store_four_quarters(#)==begin fget; a:=fbyte; qw.b0:=qi(a); 10998 fget; b:=fbyte; qw.b1:=qi(b); 10999 fget; c:=fbyte; qw.b2:=qi(c); 11000 fget; d:=fbyte; qw.b3:=qi(d); 11001 #:=qw; 11002 end 11003 11004 @ @<Read the {\.{TFM}} size fields@>= 11005 begin read_sixteen(lf); 11006 fget; read_sixteen(lh); 11007 fget; read_sixteen(bc); 11008 fget; read_sixteen(ec); 11009 if (bc>ec+1)or(ec>255) then abort; 11010 if bc>255 then {|bc=256| and |ec=255|} 11011 begin bc:=1; ec:=0; 11012 end; 11013 fget; read_sixteen(nw); 11014 fget; read_sixteen(nh); 11015 fget; read_sixteen(nd); 11016 fget; read_sixteen(ni); 11017 fget; read_sixteen(nl); 11018 fget; read_sixteen(nk); 11019 fget; read_sixteen(ne); 11020 fget; read_sixteen(np); 11021 if lf<>6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort; 11022 if (nw=0)or(nh=0)or(nd=0)or(ni=0) then abort; 11023 end 11024 11025 @ The preliminary settings of the index-offset variables |char_base|, 11026 |width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be 11027 corrected later by subtracting |min_quarterword| from them; and we will 11028 subtract 1 from |param_base| too. It's best to forget about such anomalies 11029 until later. 11030 11031 @<Use size fields to allocate font information@>= 11032 lf:=lf-6-lh; {|lf| words should be loaded into |font_info|} 11033 if np<7 then lf:=lf+7-np; {at least seven parameters will appear} 11034 if (font_ptr=font_max)or(fmem_ptr+lf>font_mem_size) then 11035 @<Apologize for not loading the font, |goto done|@>; 11036 f:=font_ptr+1; 11037 char_base[f]:=fmem_ptr-bc; 11038 width_base[f]:=char_base[f]+ec+1; 11039 height_base[f]:=width_base[f]+nw; 11040 depth_base[f]:=height_base[f]+nh; 11041 italic_base[f]:=depth_base[f]+nd; 11042 lig_kern_base[f]:=italic_base[f]+ni; 11043 kern_base[f]:=lig_kern_base[f]+nl-kern_base_offset; 11044 exten_base[f]:=kern_base[f]+kern_base_offset+nk; 11045 param_base[f]:=exten_base[f]+ne 11046 11047 @ @<Apologize for not loading...@>= 11048 begin start_font_error_message; 11049 print(" not loaded: Not enough room left"); 11050 @.Font x=xx not loaded...@> 11051 help4("I'm afraid I won't be able to make use of this font,")@/ 11052 ("because my memory for character-size data is too small.")@/ 11053 ("If you're really stuck, ask a wizard to enlarge me.")@/ 11054 ("Or maybe try `I\font<same font id>=<name of loaded font>'."); 11055 error; goto done; 11056 end 11057 11058 @ Only the first two words of the header are needed by \TeX82. 11059 11060 @<Read the {\.{TFM}} header@>= 11061 begin if lh<2 then abort; 11062 store_four_quarters(font_check[f]); 11063 fget; read_sixteen(z); {this rejects a negative design size} 11064 fget; z:=z*@'400+fbyte; fget; z:=(z*@'20)+(fbyte div@'20); 11065 if z<unity then abort; 11066 while lh>2 do 11067 begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header} 11068 end; 11069 font_dsize[f]:=z; 11070 if s<>-1000 then 11071 if s>=0 then z:=s 11072 else z:=xn_over_d(z,-s,1000); 11073 font_size[f]:=z; 11074 end 11075 11076 @ @<Read character data@>= 11077 for k:=fmem_ptr to width_base[f]-1 do 11078 begin store_four_quarters(font_info[k].qqqq); 11079 if (a>=nw)or(b div @'20>=nh)or(b mod @'20>=nd)or 11080 (c div 4>=ni) then abort; 11081 case c mod 4 of 11082 lig_tag: if d>=nl then abort; 11083 ext_tag: if d>=ne then abort; 11084 list_tag: @<Check for charlist cycle@>; 11085 othercases do_nothing {|no_tag|} 11086 endcases; 11087 end 11088 11089 @ We want to make sure that there is no cycle of characters linked together 11090 by |list_tag| entries, since such a cycle would get \TeX\ into an endless 11091 loop. If such a cycle exists, the routine here detects it when processing 11092 the largest character code in the cycle. 11093 11094 @d check_byte_range(#)==begin if (#<bc)or(#>ec) then abort@+end 11095 @d current_character_being_worked_on==k+bc-fmem_ptr 11096 11097 @<Check for charlist cycle@>= 11098 begin check_byte_range(d); 11099 while d<current_character_being_worked_on do 11100 begin qw:=char_info(f)(d); 11101 {N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet} 11102 if char_tag(qw)<>list_tag then goto not_found; 11103 d:=qo(rem_byte(qw)); {next character on the list} 11104 end; 11105 if d=current_character_being_worked_on then abort; {yes, there's a cycle} 11106 not_found:end 11107 11108 @ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents 11109 the number 11110 $$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr 11111 b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=0;\cr 11112 -16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=255.\cr}}\right.$$ 11113 (No other choices of |a| are allowed, since the magnitude of a number in 11114 design-size units must be less than 16.) We want to multiply this 11115 quantity by the integer~|z|, which is known to be less than $2^{27}$. 11116 If $|z|<2^{23}$, the individual multiplications $b\cdot z$, 11117 $c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2, 11118 4, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can 11119 compensate for this later. If |z| has thereby been replaced by 11120 $|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute 11121 $$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z^\prime/\beta\rfloor$$ 11122 if $a=0$, or the same quantity minus $\alpha=2^{4+e}z^\prime$ if $a=255$. 11123 This calculation must be done exactly, in order to guarantee portability 11124 of \TeX\ between computers. 11125 11126 @d store_scaled(#)==begin fget; a:=fbyte; fget; b:=fbyte; 11127 fget; c:=fbyte; fget; d:=fbyte;@/ 11128 sw:=(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta; 11129 if a=0 then #:=sw@+else if a=255 then #:=sw-alpha@+else abort; 11130 end 11131 11132 @<Read box dimensions@>= 11133 begin @<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>; 11134 for k:=width_base[f] to lig_kern_base[f]-1 do 11135 store_scaled(font_info[k].sc); 11136 if font_info[width_base[f]].sc<>0 then abort; {\\{width}[0] must be zero} 11137 if font_info[height_base[f]].sc<>0 then abort; {\\{height}[0] must be zero} 11138 if font_info[depth_base[f]].sc<>0 then abort; {\\{depth}[0] must be zero} 11139 if font_info[italic_base[f]].sc<>0 then abort; {\\{italic}[0] must be zero} 11140 end 11141 11142 @ @<Replace |z|...@>= 11143 begin alpha:=16; 11144 while z>=@'40000000 do 11145 begin z:=z div 2; alpha:=alpha+alpha; 11146 end; 11147 beta:=256 div alpha; alpha:=alpha*z; 11148 end 11149 11150 @ @d check_existence(#)==@t@>@;@/ 11151 begin check_byte_range(#); 11152 qw:=char_info(f)(#); {N.B.: not |qi(#)|} 11153 if not char_exists(qw) then abort; 11154 end 11155 11156 @<Read ligature/kern program@>= 11157 bch_label:=@'77777; bchar:=256; 11158 if nl>0 then 11159 begin for k:=lig_kern_base[f] to kern_base[f]+kern_base_offset-1 do 11160 begin store_four_quarters(font_info[k].qqqq); 11161 if a>128 then 11162 begin if 256*c+d>=nl then abort; 11163 if a=255 then if k=lig_kern_base[f] then bchar:=b; 11164 end 11165 else begin if b<>bchar then check_existence(b); 11166 if c<128 then check_existence(d) {check ligature} 11167 else if 256*(c-128)+d>=nk then abort; {check kern} 11168 if a<128 then if k-lig_kern_base[f]+a+1>=nl then abort; 11169 end; 11170 end; 11171 if a=255 then bch_label:=256*c+d; 11172 end; 11173 for k:=kern_base[f]+kern_base_offset to exten_base[f]-1 do 11174 store_scaled(font_info[k].sc); 11175 11176 @ @<Read extensible character recipes@>= 11177 for k:=exten_base[f] to param_base[f]-1 do 11178 begin store_four_quarters(font_info[k].qqqq); 11179 if a<>0 then check_existence(a); 11180 if b<>0 then check_existence(b); 11181 if c<>0 then check_existence(c); 11182 check_existence(d); 11183 end 11184 11185 @ We check to see that the \.{TFM} file doesn't end prematurely; but 11186 no error message is given for files having more than |lf| words. 11187 11188 @<Read font parameters@>= 11189 begin for k:=1 to np do 11190 if k=1 then {the |slant| parameter is a pure number} 11191 begin fget; sw:=fbyte; if sw>127 then sw:=sw-256; 11192 fget; sw:=sw*@'400+fbyte; fget; sw:=sw*@'400+fbyte; 11193 fget; font_info[param_base[f]].sc:= 11194 (sw*@'20)+(fbyte div@'20); 11195 end 11196 else store_scaled(font_info[param_base[f]+k-1].sc); 11197 if eof(tfm_file) then abort; 11198 for k:=np+1 to 7 do font_info[param_base[f]+k-1].sc:=0; 11199 end 11200 11201 @ Now to wrap it up, we have checked all the necessary things about the \.{TFM} 11202 file, and all we need to do is put the finishing touches on the data for 11203 the new font. 11204 11205 @d adjust(#)==#[f]:=qo(#[f]) 11206 {correct for the excess |min_quarterword| that was added} 11207 11208 @<Make final adjustments...@>= 11209 if np>=7 then font_params[f]:=np@+else font_params[f]:=7; 11210 hyphen_char[f]:=default_hyphen_char; skew_char[f]:=default_skew_char; 11211 if bch_label<nl then bchar_label[f]:=bch_label+lig_kern_base[f] 11212 else bchar_label[f]:=non_address; 11213 font_bchar[f]:=qi(bchar); 11214 font_false_bchar[f]:=qi(bchar); 11215 if bchar<=ec then if bchar>=bc then 11216 begin qw:=char_info(f)(bchar); {N.B.: not |qi(bchar)|} 11217 if char_exists(qw) then font_false_bchar[f]:=non_char; 11218 end; 11219 font_name[f]:=nom; 11220 font_area[f]:=aire; 11221 font_bc[f]:=bc; font_ec[f]:=ec; font_glue[f]:=null; 11222 adjust(char_base); adjust(width_base); adjust(lig_kern_base); 11223 adjust(kern_base); adjust(exten_base); 11224 decr(param_base[f]); 11225 fmem_ptr:=fmem_ptr+lf; font_ptr:=f; g:=f; goto done 11226 11227 @ Before we forget about the format of these tables, let's deal with two 11228 of \TeX's basic scanning routines related to font information. 11229 11230 @<Declare procedures that scan font-related stuff@>= 11231 procedure scan_font_ident; 11232 var f:internal_font_number; 11233 @!m:halfword; 11234 begin @<Get the next non-blank non-call...@>; 11235 if cur_cmd=def_font then f:=cur_font 11236 else if cur_cmd=set_font then f:=cur_chr 11237 else if cur_cmd=def_family then 11238 begin m:=cur_chr; scan_four_bit_int; f:=equiv(m+cur_val); 11239 end 11240 else begin print_err("Missing font identifier"); 11241 @.Missing font identifier@> 11242 help2("I was looking for a control sequence whose")@/ 11243 ("current meaning has been defined by \font."); 11244 back_error; f:=null_font; 11245 end; 11246 cur_val:=f; 11247 end; 11248 11249 @ The following routine is used to implement `\.{\\fontdimen} |n| |f|'. 11250 The boolean parameter |writing| is set |true| if the calling program 11251 intends to change the parameter value. 11252 11253 @<Declare procedures that scan font-related stuff@>= 11254 procedure find_font_dimen(@!writing:boolean); 11255 {sets |cur_val| to |font_info| location} 11256 var f:internal_font_number; 11257 @!n:integer; {the parameter number} 11258 begin scan_int; n:=cur_val; scan_font_ident; f:=cur_val; 11259 if n<=0 then cur_val:=fmem_ptr 11260 else begin if writing and(n<=space_shrink_code)and@| 11261 (n>=space_code)and(font_glue[f]<>null) then 11262 begin delete_glue_ref(font_glue[f]); 11263 font_glue[f]:=null; 11264 end; 11265 if n>font_params[f] then 11266 if f<font_ptr then cur_val:=fmem_ptr 11267 else @<Increase the number of parameters in the last font@> 11268 else cur_val:=n+param_base[f]; 11269 end; 11270 @<Issue an error message if |cur_val=fmem_ptr|@>; 11271 end; 11272 11273 @ @<Issue an error message if |cur_val=fmem_ptr|@>= 11274 if cur_val=fmem_ptr then 11275 begin print_err("Font "); print_esc(font_id_text(f)); 11276 print(" has only "); print_int(font_params[f]); 11277 print(" fontdimen parameters"); 11278 @.Font x has only...@> 11279 help2("To increase the number of font parameters, you must")@/ 11280 ("use \fontdimen immediately after the \font is loaded."); 11281 error; 11282 end 11283 11284 @ @<Increase the number of parameters...@>= 11285 begin repeat if fmem_ptr=font_mem_size then 11286 overflow("font memory",font_mem_size); 11287 @:TeX capacity exceeded font memory}{\quad font memory@> 11288 font_info[fmem_ptr].sc:=0; incr(fmem_ptr); incr(font_params[f]); 11289 until n=font_params[f]; 11290 cur_val:=fmem_ptr-1; {this equals |param_base[f]+font_params[f]|} 11291 end 11292 11293 @ When \TeX\ wants to typeset a character that doesn't exist, the 11294 character node is not created; thus the output routine can assume 11295 that characters exist when it sees them. The following procedure 11296 prints a warning message unless the user has suppressed it. 11297 11298 @p procedure char_warning(@!f:internal_font_number;@!c:eight_bits); 11299 begin if tracing_lost_chars>0 then 11300 begin begin_diagnostic; 11301 print_nl("Missing character: There is no "); 11302 @.Missing character@> 11303 print_ASCII(c); print(" in font "); 11304 slow_print(font_name[f]); print_char("!"); end_diagnostic(false); 11305 end; 11306 end; 11307 11308 @ Here is a function that returns a pointer to a character node for a 11309 given character in a given font. If that character doesn't exist, 11310 |null| is returned instead. 11311 11312 @p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer; 11313 label exit; 11314 var p:pointer; {newly allocated node} 11315 begin if font_bc[f]<=c then if font_ec[f]>=c then 11316 if char_exists(char_info(f)(qi(c))) then 11317 begin p:=get_avail; font(p):=f; character(p):=qi(c); 11318 new_character:=p; return; 11319 end; 11320 char_warning(f,c); 11321 new_character:=null; 11322 exit:end; 11323 11324 @* \[31] Device-independent file format. 11325 The most important output produced by a run of \TeX\ is the ``device 11326 independent'' (\.{DVI}) file that specifies where characters and rules 11327 are to appear on printed pages. The form of these files was designed by 11328 David R. Fuchs in 1979. Almost any reasonable typesetting device can be 11329 @^Fuchs, David Raymond@> 11330 @:DVI_files}{\.{DVI} files@> 11331 driven by a program that takes \.{DVI} files as input, and dozens of such 11332 \.{DVI}-to-whatever programs have been written. Thus, it is possible to 11333 print the output of \TeX\ on many different kinds of equipment, using \TeX\ 11334 as a device-independent ``front end.'' 11335 11336 A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a 11337 series of commands in a machine-like language. The first byte of each command 11338 is the operation code, and this code is followed by zero or more bytes 11339 that provide parameters to the command. The parameters themselves may consist 11340 of several consecutive bytes; for example, the `|set_rule|' command has two 11341 parameters, each of which is four bytes long. Parameters are usually 11342 regarded as nonnegative integers; but four-byte-long parameters, 11343 and shorter parameters that denote distances, can be 11344 either positive or negative. Such parameters are given in two's complement 11345 notation. For example, a two-byte-long distance parameter has a value between 11346 $-2^{15}$ and $2^{15}-1$. As in \.{TFM} files, numbers that occupy 11347 more than one byte position appear in BigEndian order. 11348 11349 A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one 11350 or more ``pages,'' followed by a ``postamble.'' The preamble is simply a 11351 |pre| command, with its parameters that define the dimensions used in the 11352 file; this must come first. Each ``page'' consists of a |bop| command, 11353 followed by any number of other commands that tell where characters are to 11354 be placed on a physical page, followed by an |eop| command. The pages 11355 appear in the order that \TeX\ generated them. If we ignore |nop| commands 11356 and \\{fnt\_def} commands (which are allowed between any two commands in 11357 the file), each |eop| command is immediately followed by a |bop| command, 11358 or by a |post| command; in the latter case, there are no more pages in the 11359 file, and the remaining bytes form the postamble. Further details about 11360 the postamble will be explained later. 11361 11362 Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte 11363 quantities that give the location number of some other byte in the file; 11364 the first byte is number~0, then comes number~1, and so on. For example, 11365 one of the parameters of a |bop| command points to the previous |bop|; 11366 this makes it feasible to read the pages in backwards order, in case the 11367 results are being directed to a device that stacks its output face up. 11368 Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the 11369 first page occupies bytes 100 to 999, say, and if the second 11370 page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000 11371 points to 100 and the |bop| that starts in byte 2000 points to 1000. (The 11372 very first |bop|, i.e., the one starting in byte 100, has a pointer of~$-1$.) 11373 11374 @ The \.{DVI} format is intended to be both compact and easily interpreted 11375 by a machine. Compactness is achieved by making most of the information 11376 implicit instead of explicit. When a \.{DVI}-reading program reads the 11377 commands for a page, it keeps track of several quantities: (a)~The current 11378 font |f| is an integer; this value is changed only 11379 by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page 11380 is given by two numbers called the horizontal and vertical coordinates, 11381 |h| and |v|. Both coordinates are zero at the upper left corner of the page; 11382 moving to the right corresponds to increasing the horizontal coordinate, and 11383 moving down corresponds to increasing the vertical coordinate. Thus, the 11384 coordinates are essentially Cartesian, except that vertical directions are 11385 flipped; the Cartesian version of |(h,v)| would be |(h,-v)|. (c)~The 11386 current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|, 11387 where |w| and~|x| are used for horizontal spacing and where |y| and~|z| 11388 are used for vertical spacing. (d)~There is a stack containing 11389 |(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to 11390 change the current level of operation. Note that the current font~|f| is 11391 not pushed and popped; the stack contains only information about 11392 positioning. 11393 11394 The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up 11395 to 32 bits, including the sign. Since they represent physical distances, 11396 there is a small unit of measurement such that increasing |h| by~1 means 11397 moving a certain tiny distance to the right. The actual unit of 11398 measurement is variable, as explained below; \TeX\ sets things up so that 11399 its \.{DVI} output is in sp units, i.e., scaled points, in agreement with 11400 all the |scaled| dimensions in \TeX's data structures. 11401 11402 @ Here is a list of all the commands that may appear in a \.{DVI} file. Each 11403 command is specified by its symbolic name (e.g., |bop|), its opcode byte 11404 (e.g., 139), and its parameters (if any). The parameters are followed 11405 by a bracketed number telling how many bytes they occupy; for example, 11406 `|p[4]|' means that parameter |p| is four bytes long. 11407 11408 \yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f| 11409 such that the reference point of the character is at |(h,v)|. Then 11410 increase |h| by the width of that character. Note that a character may 11411 have zero or negative width, so one cannot be sure that |h| will advance 11412 after this command; but |h| usually does increase. 11413 11414 \yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127). 11415 Do the operations of |set_char_0|; but use the character whose number 11416 matches the opcode, instead of character~0. 11417 11418 \yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character 11419 number~|c| is typeset. \TeX82 uses this command for characters in the 11420 range |128<=c<256|. 11421 11422 \yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two 11423 bytes long, so it is in the range |0<=c<65536|. \TeX82 never uses this 11424 command, but it should come in handy for extensions of \TeX\ that deal 11425 with oriental languages. 11426 @^oriental characters@>@^Chinese characters@>@^Japanese characters@> 11427 11428 \yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three 11429 bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese 11430 language has this many characters, but this command might prove useful 11431 in some yet unforeseen extension. 11432 11433 \yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four 11434 bytes long. Imagine that. 11435 11436 \yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle 11437 of height~|a| and width~|b|, with its bottom left corner at |(h,v)|. Then 11438 set |h:=h+b|. If either |a<=0| or |b<=0|, nothing should be typeset. Note 11439 that if |b<0|, the value of |h| will decrease even though nothing else happens. 11440 See below for details about how to typeset rules so that consistency with 11441 \MF\ is guaranteed. 11442 11443 \yskip\hang|@!put1| 133 |c[1]|. Typeset character number~|c| from font~|f| 11444 such that the reference point of the character is at |(h,v)|. (The `put' 11445 commands are exactly like the `set' commands, except that they simply put out a 11446 character or a rule without moving the reference point afterwards.) 11447 11448 \yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed. 11449 11450 \yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed. 11451 11452 \yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed. 11453 11454 \yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that 11455 |h| is not changed. 11456 11457 \yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s 11458 may occur between \.{DVI} commands, but a |nop| cannot be inserted between 11459 a command and its parameters or between two parameters. 11460 11461 \yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning 11462 of a page: Set |(h,v,w,x,y,z):=(0,0,0,0,0,0)| and set the stack empty. Set 11463 the current font |f| to an undefined value. The ten $c_i$ parameters hold 11464 the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time 11465 \.{\\shipout} was invoked for this page; they can be used to identify 11466 pages, if a user wants to print only part of a \.{DVI} file. The parameter 11467 |p| points to the previous |bop| in the file; the first 11468 |bop| has $p=-1$. 11469 11470 \yskip\hang|eop| 140. End of page: Print what you have read since the 11471 previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading 11472 programs that drive most output devices will have kept a buffer of the 11473 material that appears on the page that has just ended. This material is 11474 largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by 11475 |h|~coordinate; so it usually needs to be sorted into some order that is 11476 appropriate for the device in question.) 11477 11478 \yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the 11479 top of the stack; do not change any of these values. Note that |f| is 11480 not pushed. 11481 11482 \yskip\hang|pop| 142. Pop the top six values off of the stack and assign 11483 them respectively to |(h,v,w,x,y,z)|. The number of pops should never 11484 exceed the number of pushes, since it would be highly embarrassing if the 11485 stack were empty at the time of a |pop| command. 11486 11487 \yskip\hang|right1| 143 |b[1]|. Set |h:=h+b|, i.e., move right |b| units. 11488 The parameter is a signed number in two's complement notation, |-128<=b<128|; 11489 if |b<0|, the reference point moves left. 11490 11491 \yskip\hang|@!right2| 144 |b[2]|. Same as |right1|, except that |b| is a 11492 two-byte quantity in the range |-32768<=b<32768|. 11493 11494 \yskip\hang|@!right3| 145 |b[3]|. Same as |right1|, except that |b| is a 11495 three-byte quantity in the range |@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11496 11497 \yskip\hang|@!right4| 146 |b[4]|. Same as |right1|, except that |b| is a 11498 four-byte quantity in the range |@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11499 11500 \yskip\hang|w0| 147. Set |h:=h+w|; i.e., move right |w| units. With luck, 11501 this parameterless command will usually suffice, because the same kind of motion 11502 will occur several times in succession; the following commands explain how 11503 |w| gets particular values. 11504 11505 \yskip\hang|w1| 148 |b[1]|. Set |w:=b| and |h:=h+b|. The value of |b| is a 11506 signed quantity in two's complement notation, |-128<=b<128|. This command 11507 changes the current |w|~spacing and moves right by |b|. 11508 11509 \yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long, 11510 |-32768<=b<32768|. 11511 11512 \yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long, 11513 |@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11514 11515 \yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long, 11516 |@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11517 11518 \yskip\hang|x0| 152. Set |h:=h+x|; i.e., move right |x| units. The `|x|' 11519 commands are like the `|w|' commands except that they involve |x| instead 11520 of |w|. 11521 11522 \yskip\hang|x1| 153 |b[1]|. Set |x:=b| and |h:=h+b|. The value of |b| is a 11523 signed quantity in two's complement notation, |-128<=b<128|. This command 11524 changes the current |x|~spacing and moves right by |b|. 11525 11526 \yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long, 11527 |-32768<=b<32768|. 11528 11529 \yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long, 11530 |@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11531 11532 \yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long, 11533 |@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11534 11535 \yskip\hang|down1| 157 |a[1]|. Set |v:=v+a|, i.e., move down |a| units. 11536 The parameter is a signed number in two's complement notation, |-128<=a<128|; 11537 if |a<0|, the reference point moves up. 11538 11539 \yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a 11540 two-byte quantity in the range |-32768<=a<32768|. 11541 11542 \yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a 11543 three-byte quantity in the range |@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11544 11545 \yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a 11546 four-byte quantity in the range |@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11547 11548 \yskip\hang|y0| 161. Set |v:=v+y|; i.e., move down |y| units. With luck, 11549 this parameterless command will usually suffice, because the same kind of motion 11550 will occur several times in succession; the following commands explain how 11551 |y| gets particular values. 11552 11553 \yskip\hang|y1| 162 |a[1]|. Set |y:=a| and |v:=v+a|. The value of |a| is a 11554 signed quantity in two's complement notation, |-128<=a<128|. This command 11555 changes the current |y|~spacing and moves down by |a|. 11556 11557 \yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long, 11558 |-32768<=a<32768|. 11559 11560 \yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long, 11561 |@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11562 11563 \yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long, 11564 |@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11565 11566 \yskip\hang|z0| 166. Set |v:=v+z|; i.e., move down |z| units. The `|z|' commands 11567 are like the `|y|' commands except that they involve |z| instead of |y|. 11568 11569 \yskip\hang|z1| 167 |a[1]|. Set |z:=a| and |v:=v+a|. The value of |a| is a 11570 signed quantity in two's complement notation, |-128<=a<128|. This command 11571 changes the current |z|~spacing and moves down by |a|. 11572 11573 \yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long, 11574 |-32768<=a<32768|. 11575 11576 \yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long, 11577 |@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11578 11579 \yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long, 11580 |@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11581 11582 \yskip\hang|fnt_num_0| 171. Set |f:=0|. Font 0 must previously have been 11583 defined by a \\{fnt\_def} instruction, as explained below. 11584 11585 \yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set 11586 |f:=1|, \dots, \hbox{|f:=63|}, respectively. 11587 11588 \yskip\hang|fnt1| 235 |k[1]|. Set |f:=k|. \TeX82 uses this command for font 11589 numbers in the range |64<=k<256|. 11590 11591 \yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two 11592 bytes long, so it is in the range |0<=k<65536|. \TeX82 never generates this 11593 command, but large font numbers may prove useful for specifications of 11594 color or texture, or they may be used for special fonts that have fixed 11595 numbers in some external coding scheme. 11596 11597 \yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three 11598 bytes long, so it can be as large as $2^{24}-1$. 11599 11600 \yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four 11601 bytes long; this is for the really big font numbers (and for the negative ones). 11602 11603 \yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in 11604 general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading 11605 programs are being used. \TeX82 generates |xxx1| when a short enough 11606 \.{\\special} appears, setting |k| to the number of bytes being sent. It 11607 is recommended that |x| be a string having the form of a keyword followed 11608 by possible parameters relevant to that keyword. 11609 11610 \yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0<=k<65536|. 11611 11612 \yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0<=k<@t$2^{24}$@>|. 11613 11614 \yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously 11615 large. \TeX82 uses |xxx4| when sending a string of length 256 or more. 11616 11617 \yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11618 Define font |k|, where |0<=k<256|; font definitions will be explained shortly. 11619 11620 \yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11621 Define font |k|, where |0<=k<65536|. 11622 11623 \yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11624 Define font |k|, where |0<=k<@t$2^{24}$@>|. 11625 11626 \yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11627 Define font |k|, where |@t$-2^{31}$@><=k<@t$2^{31}$@>|. 11628 11629 \yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|. 11630 Beginning of the preamble; this must come at the very beginning of the 11631 file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below. 11632 11633 \yskip\hang|post| 248. Beginning of the postamble, see below. 11634 11635 \yskip\hang|post_post| 249. Ending of the postamble, see below. 11636 11637 \yskip\noindent Commands 250--255 are undefined at the present time. 11638 11639 @ @d set_char_0=0 {typeset character 0 and move right} 11640 @d set1=128 {typeset a character and move right} 11641 @d set_rule=132 {typeset a rule and move right} 11642 @d put_rule=137 {typeset a rule} 11643 @d nop=138 {no operation} 11644 @d bop=139 {beginning of page} 11645 @d eop=140 {ending of page} 11646 @d push=141 {save the current positions} 11647 @d pop=142 {restore previous positions} 11648 @d right1=143 {move right} 11649 @d w0=147 {move right by |w|} 11650 @d w1=148 {move right and set |w|} 11651 @d x0=152 {move right by |x|} 11652 @d x1=153 {move right and set |x|} 11653 @d down1=157 {move down} 11654 @d y0=161 {move down by |y|} 11655 @d y1=162 {move down and set |y|} 11656 @d z0=166 {move down by |z|} 11657 @d z1=167 {move down and set |z|} 11658 @d fnt_num_0=171 {set current font to 0} 11659 @d fnt1=235 {set current font} 11660 @d xxx1=239 {extension to \.{DVI} primitives} 11661 @d xxx4=242 {potentially long extension to \.{DVI} primitives} 11662 @d fnt_def1=243 {define the meaning of a font number} 11663 @d pre=247 {preamble} 11664 @d post=248 {postamble beginning} 11665 @d post_post=249 {postamble ending} 11666 11667 @ The preamble contains basic information about the file as a whole. As 11668 stated above, there are six parameters: 11669 $$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$ 11670 The |i| byte identifies \.{DVI} format; currently this byte is always set 11671 to~2. (The value |i=3| is currently used for an extended format that 11672 allows a mixture of right-to-left and left-to-right typesetting. 11673 Some day we will set |i=4|, when \.{DVI} format makes another 11674 incompatible change---perhaps in the year 2048.) 11675 11676 The next two parameters, |num| and |den|, are positive integers that define 11677 the units of measurement; they are the numerator and denominator of a 11678 fraction by which all dimensions in the \.{DVI} file could be multiplied 11679 in order to get lengths in units of $10^{-7}$ meters. Since $\rm 7227{pt} = 11680 254{cm}$, and since \TeX\ works with scaled points where there are $2^{16}$ 11681 sp in a point, \TeX\ sets 11682 $|num|/|den|=(254\cdot10^5)/(7227\cdot2^{16})=25400000/473628672$. 11683 @^sp@> 11684 11685 The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the 11686 desired magnification. The actual fraction by which dimensions are 11687 multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \TeX\ 11688 source document does not call for any `\.{true}' dimensions, and if you 11689 change it only by specifying a different \.{\\mag} setting, the \.{DVI} 11690 file that \TeX\ creates will be completely unchanged except for the value 11691 of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow 11692 users to override the |mag|~setting when a \.{DVI} file is being printed.) 11693 11694 Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not 11695 interpreted further. The length of comment |x| is |k|, where |0<=k<256|. 11696 11697 @d id_byte=2 {identifies the kind of \.{DVI} files described here} 11698 11699 @ Font definitions for a given font number |k| contain further parameters 11700 $$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$ 11701 The four-byte value |c| is the check sum that \TeX\ found in the \.{TFM} 11702 file for this font; |c| should match the check sum of the font found by 11703 programs that read this \.{DVI} file. 11704 @^check sum@> 11705 11706 Parameter |s| contains a fixed-point scale factor that is applied to 11707 the character widths in font |k|; font dimensions in \.{TFM} files and 11708 other font files are relative to this quantity, which is called the 11709 ``at size'' elsewhere in this documentation. The value of |s| is 11710 always positive and less than $2^{27}$. It is given in the same units 11711 as the other \.{DVI} dimensions, i.e., in sp when \TeX82 has made the 11712 file. Parameter |d| is similar to |s|; it is the ``design size,'' and 11713 (like~|s|) it is given in \.{DVI} units. Thus, font |k| is to be used 11714 at $|mag|\cdot s/1000d$ times its normal size. 11715 11716 The remaining part of a font definition gives the external name of the font, 11717 which is an ASCII string of length |a+l|. The number |a| is the length 11718 of the ``area'' or directory, and |l| is the length of the font name itself; 11719 the standard local system font area is supposed to be used when |a=0|. 11720 The |n| field contains the area in its first |a| bytes. 11721 11722 Font definitions must appear before the first use of a particular font number. 11723 Once font |k| is defined, it must not be defined again; however, we 11724 shall see below that font definitions appear in the postamble as well as 11725 in the pages, so in this sense each font number is defined exactly twice, 11726 if at all. Like |nop| commands, font definitions can 11727 appear before the first |bop|, or between an |eop| and a |bop|. 11728 11729 @ Sometimes it is desirable to make horizontal or vertical rules line up 11730 precisely with certain features in characters of a font. It is possible to 11731 guarantee the correct matching between \.{DVI} output and the characters 11732 generated by \MF\ by adhering to the following principles: (1)~The \MF\ 11733 characters should be positioned so that a bottom edge or left edge that is 11734 supposed to line up with the bottom or left edge of a rule appears at the 11735 reference point, i.e., in row~0 and column~0 of the \MF\ raster. This 11736 ensures that the position of the rule will not be rounded differently when 11737 the pixel size is not a perfect multiple of the units of measurement in 11738 the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$ 11739 should be equivalent to a \MF-generated character having black pixels in 11740 precisely those raster positions whose \MF\ coordinates satisfy 11741 |0<=x<@t$\alpha$@>b| and |0<=y<@t$\alpha$@>a|, where $\alpha$ is the number 11742 of pixels per \.{DVI} unit. 11743 @:METAFONT}{\MF@> 11744 @^alignment of rules with characters@> 11745 @^rules aligning with characters@> 11746 11747 @ The last page in a \.{DVI} file is followed by `|post|'; this command 11748 introduces the postamble, which summarizes important facts that \TeX\ has 11749 accumulated about the file, making it possible to print subsets of the data 11750 with reasonable efficiency. The postamble has the form 11751 $$\vbox{\halign{\hbox{#\hfil}\cr 11752 |post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr 11753 $\langle\,$font definitions$\,\rangle$\cr 11754 |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$ 11755 Here |p| is a pointer to the final |bop| in the file. The next three 11756 parameters, |num|, |den|, and |mag|, are duplicates of the quantities that 11757 appeared in the preamble. 11758 11759 Parameters |l| and |u| give respectively the height-plus-depth of the tallest 11760 page and the width of the widest page, in the same units as other dimensions 11761 of the file. These numbers might be used by a \.{DVI}-reading program to 11762 position individual ``pages'' on large sheets of film or paper; however, 11763 the standard convention for output on normal size paper is to position each 11764 page so that the upper left-hand corner is exactly one inch from the left 11765 and the top. Experience has shown that it is unwise to design \.{DVI}-to-printer 11766 software that attempts cleverly to center the output; a fixed position of 11767 the upper left corner is easiest for users to understand and to work with. 11768 Therefore |l| and~|u| are often ignored. 11769 11770 Parameter |s| is the maximum stack depth (i.e., the largest excess of 11771 |push| commands over |pop| commands) needed to process this file. Then 11772 comes |t|, the total number of pages (|bop| commands) present. 11773 11774 The postamble continues with font definitions, which are any number of 11775 \\{fnt\_def} commands as described above, possibly interspersed with |nop| 11776 commands. Each font number that is used in the \.{DVI} file must be defined 11777 exactly twice: Once before it is first selected by a \\{fnt} command, and once 11778 in the postamble. 11779 11780 @ The last part of the postamble, following the |post_post| byte that 11781 signifies the end of the font definitions, contains |q|, a pointer to the 11782 |post| command that started the postamble. An identification byte, |i|, 11783 comes next; this currently equals~2, as in the preamble. 11784 11785 The |i| byte is followed by four or more bytes that are all equal to 11786 the decimal number 223 (i.e., @'337 in octal). \TeX\ puts out four to seven of 11787 these trailing bytes, until the total length of the file is a multiple of 11788 four bytes, since this works out best on machines that pack four bytes per 11789 word; but any number of 223's is allowed, as long as there are at least four 11790 of them. In effect, 223 is a sort of signature that is added at the very end. 11791 @^Fuchs, David Raymond@> 11792 11793 This curious way to finish off a \.{DVI} file makes it feasible for 11794 \.{DVI}-reading programs to find the postamble first, on most computers, 11795 even though \TeX\ wants to write the postamble last. Most operating 11796 systems permit random access to individual words or bytes of a file, so 11797 the \.{DVI} reader can start at the end and skip backwards over the 223's 11798 until finding the identification byte. Then it can back up four bytes, read 11799 |q|, and move to byte |q| of the file. This byte should, of course, 11800 contain the value 248 (|post|); now the postamble can be read, so the 11801 \.{DVI} reader can discover all the information needed for typesetting the 11802 pages. Note that it is also possible to skip through the \.{DVI} file at 11803 reasonably high speed to locate a particular page, if that proves 11804 desirable. This saves a lot of time, since \.{DVI} files used in production 11805 jobs tend to be large. 11806 11807 Unfortunately, however, standard \PASCAL\ does not include the ability to 11808 @^system dependencies@> 11809 access a random position in a file, or even to determine the length of a file. 11810 Almost all systems nowadays provide the necessary capabilities, so \.{DVI} 11811 format has been designed to work most efficiently with modern operating systems. 11812 But if \.{DVI} files have to be processed under the restrictions of standard 11813 \PASCAL, one can simply read them from front to back, since the necessary 11814 header information is present in the preamble and in the font definitions. 11815 (The |l| and |u| and |s| and |t| parameters, which appear only in the 11816 postamble, are ``frills'' that are handy but not absolutely necessary.) 11817 11818 @* \[32] Shipping pages out. 11819 After considering \TeX's eyes and stomach, we come now to the bowels. 11820 @^bowels@> 11821 11822 The |ship_out| procedure is given a pointer to a box; its mission is 11823 to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|. 11824 The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left 11825 corner of the box being shipped. 11826 11827 Since boxes can be inside of boxes inside of boxes, the main work of 11828 |ship_out| is done by two mutually recursive routines, |hlist_out| 11829 and |vlist_out|, which traverse the hlists and vlists inside of horizontal 11830 and vertical boxes. 11831 11832 As individual pages are being processed, we need to accumulate 11833 information about the entire set of pages, since such statistics must be 11834 reported in the postamble. The global variables |total_pages|, |max_v|, 11835 |max_h|, |max_push|, and |last_bop| are used to record this information. 11836 11837 The variable |doing_leaders| is |true| while leaders are being output. 11838 The variable |dead_cycles| contains the number of times an output routine 11839 has been initiated since the last |ship_out|. 11840 11841 A few additional global variables are also defined here for use in 11842 |vlist_out| and |hlist_out|. They could have been local variables, but 11843 that would waste stack space when boxes are deeply nested, since the 11844 values of these variables are not needed during recursive calls. 11845 @^recursion@> 11846 11847 @<Glob...@>= 11848 @!total_pages:integer; {the number of pages that have been shipped out} 11849 @!max_v:scaled; {maximum height-plus-depth of pages shipped so far} 11850 @!max_h:scaled; {maximum width of pages shipped so far} 11851 @!max_push:integer; {deepest nesting of |push| commands encountered so far} 11852 @!last_bop:integer; {location of previous |bop| in the \.{DVI} output} 11853 @!dead_cycles:integer; {recent outputs that didn't ship anything out} 11854 @!doing_leaders:boolean; {are we inside a leader box?} 11855 @# 11856 @!c,@!f:quarterword; {character and font in current |char_node|} 11857 @!rule_ht,@!rule_dp,@!rule_wd:scaled; {size of current rule being output} 11858 @!g:pointer; {current glue specification} 11859 @!lq,@!lr:integer; {quantities used in calculations for leaders} 11860 11861 @ @<Set init...@>= 11862 total_pages:=0; max_v:=0; max_h:=0; max_push:=0; last_bop:=-1; 11863 doing_leaders:=false; dead_cycles:=0; cur_s:=-1; 11864 11865 @ The \.{DVI} bytes are output to a buffer instead of being written directly 11866 to the output file. This makes it possible to reduce the overhead of 11867 subroutine calls, thereby measurably speeding up the computation, since 11868 output of \.{DVI} bytes is part of \TeX's inner loop. And it has another 11869 advantage as well, since we can change instructions in the buffer in order to 11870 make the output more compact. For example, a `|down2|' command can be 11871 changed to a `|y2|', thereby making a subsequent `|y0|' command possible, 11872 saving two bytes. 11873 11874 The output buffer is divided into two parts of equal size; the bytes found 11875 in |dvi_buf[0..half_buf-1]| constitute the first half, and those in 11876 |dvi_buf[half_buf..dvi_buf_size-1]| constitute the second. The global 11877 variable |dvi_ptr| points to the position that will receive the next 11878 output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal 11879 to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that 11880 is about to be invaded next is sent to the output and |dvi_limit| is 11881 changed to its other value. Thus, there is always at least a half buffer's 11882 worth of information present, except at the very beginning of the job. 11883 11884 Bytes of the \.{DVI} file are numbered sequentially starting with 0; 11885 the next byte to be generated will be number |dvi_offset+dvi_ptr|. 11886 A byte is present in the buffer only if its number is |>=dvi_gone|. 11887 11888 @<Types...@>= 11889 @!dvi_index=0..dvi_buf_size; {an index into the output buffer} 11890 11891 @ Some systems may find it more efficient to make |dvi_buf| a |packed| 11892 array, since output of four bytes at once may be facilitated. 11893 @^system dependencies@> 11894 11895 @<Glob...@>= 11896 @!dvi_buf:array[dvi_index] of eight_bits; {buffer for \.{DVI} output} 11897 @!half_buf:dvi_index; {half of |dvi_buf_size|} 11898 @!dvi_limit:dvi_index; {end of the current half buffer} 11899 @!dvi_ptr:dvi_index; {the next available buffer address} 11900 @!dvi_offset:integer; {|dvi_buf_size| times the number of times the 11901 output buffer has been fully emptied} 11902 @!dvi_gone:integer; {the number of bytes already output to |dvi_file|} 11903 11904 @ Initially the buffer is all in one piece; we will output half of it only 11905 after it first fills up. 11906 11907 @<Set init...@>= 11908 half_buf:=dvi_buf_size div 2; dvi_limit:=dvi_buf_size; dvi_ptr:=0; 11909 dvi_offset:=0; dvi_gone:=0; 11910 11911 @ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling 11912 |write_dvi(a,b)|. For best results, this procedure should be optimized to 11913 run as fast as possible on each particular system, since it is part of 11914 \TeX's inner loop. It is safe to assume that |a| and |b+1| will both be 11915 multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on 11916 many machines to use efficient methods to pack four bytes per word and to 11917 output an array of words with one system call. 11918 @^system dependencies@> 11919 @^inner loop@> 11920 @^defecation@> 11921 11922 @p procedure write_dvi(@!a,@!b:dvi_index); 11923 var k:dvi_index; 11924 begin for k:=a to b do write(dvi_file,dvi_buf[k]); 11925 end; 11926 11927 @ To put a byte in the buffer without paying the cost of invoking a procedure 11928 each time, we use the macro |dvi_out|. 11929 11930 @d dvi_out(#)==@+begin dvi_buf[dvi_ptr]:=#; incr(dvi_ptr); 11931 if dvi_ptr=dvi_limit then dvi_swap; 11932 end 11933 11934 @p procedure dvi_swap; {outputs half of the buffer} 11935 begin if dvi_limit=dvi_buf_size then 11936 begin write_dvi(0,half_buf-1); dvi_limit:=half_buf; 11937 dvi_offset:=dvi_offset+dvi_buf_size; dvi_ptr:=0; 11938 end 11939 else begin write_dvi(half_buf,dvi_buf_size-1); dvi_limit:=dvi_buf_size; 11940 end; 11941 dvi_gone:=dvi_gone+half_buf; 11942 end; 11943 11944 @ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr| 11945 will be a multiple of~4. 11946 11947 @<Empty the last bytes out of |dvi_buf|@>= 11948 if dvi_limit=half_buf then write_dvi(half_buf,dvi_buf_size-1); 11949 if dvi_ptr>0 then write_dvi(0,dvi_ptr-1) 11950 11951 @ The |dvi_four| procedure outputs four bytes in two's complement notation, 11952 without risking arithmetic overflow. 11953 11954 @p procedure dvi_four(@!x:integer); 11955 begin if x>=0 then dvi_out(x div @'100000000) 11956 else begin x:=x+@'10000000000; 11957 x:=x+@'10000000000; 11958 dvi_out((x div @'100000000) + 128); 11959 end; 11960 x:=x mod @'100000000; dvi_out(x div @'200000); 11961 x:=x mod @'200000; dvi_out(x div @'400); 11962 dvi_out(x mod @'400); 11963 end; 11964 11965 @ A mild optimization of the output is performed by the |dvi_pop| 11966 routine, which issues a |pop| unless it is possible to cancel a 11967 `|push| |pop|' pair. The parameter to |dvi_pop| is the byte address 11968 following the old |push| that matches the new |pop|. 11969 11970 @p procedure dvi_pop(@!l:integer); 11971 begin if (l=dvi_offset+dvi_ptr)and(dvi_ptr>0) then decr(dvi_ptr) 11972 else dvi_out(pop); 11973 end; 11974 11975 @ Here's a procedure that outputs a font definition. Since \TeX82 uses at 11976 most 256 different fonts per job, |fnt_def1| is always used as the command code. 11977 11978 @p procedure dvi_font_def(@!f:internal_font_number); 11979 var k:pool_pointer; {index into |str_pool|} 11980 begin dvi_out(fnt_def1); 11981 dvi_out(f-font_base-1);@/ 11982 dvi_out(qo(font_check[f].b0)); 11983 dvi_out(qo(font_check[f].b1)); 11984 dvi_out(qo(font_check[f].b2)); 11985 dvi_out(qo(font_check[f].b3));@/ 11986 dvi_four(font_size[f]); 11987 dvi_four(font_dsize[f]);@/ 11988 dvi_out(length(font_area[f])); 11989 dvi_out(length(font_name[f])); 11990 @<Output the font name whose internal number is |f|@>; 11991 end; 11992 11993 @ @<Output the font name whose internal number is |f|@>= 11994 for k:=str_start[font_area[f]] to str_start[font_area[f]+1]-1 do 11995 dvi_out(so(str_pool[k])); 11996 for k:=str_start[font_name[f]] to str_start[font_name[f]+1]-1 do 11997 dvi_out(so(str_pool[k])) 11998 11999 @ Versions of \TeX\ intended for small computers might well choose to omit 12000 the ideas in the next few parts of this program, since it is not really 12001 necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|, 12002 |y0|, and |z0| commands. Furthermore, the algorithm that we are about to 12003 describe does not pretend to give an optimum reduction in the length 12004 of the \.{DVI} code; after all, speed is more important than compactness. 12005 But the method is surprisingly effective, and it takes comparatively little 12006 time. 12007 12008 We can best understand the basic idea by first considering a simpler problem 12009 that has the same essential characteristics. Given a sequence of digits, 12010 say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts 12011 $d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits'' 12012 and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same 12013 digit with the subscript $y$, where no $y$'s intervene between the two 12014 appearances, and a $z$-hit is defined similarly. For example, the sequence 12015 above could be decorated with subscripts as follows: 12016 $$3_z\,1_y\,4_d\,1_y\,5_y\,9_d\,2_d\,6_d\,5_y\,3_z\,5_y\,8_d\,9_d.$$ 12017 There are three $y$-hits ($1_y\ldots1_y$ and $5_y\ldots5_y\ldots5_y$) and 12018 one $z$-hit ($3_z\ldots3_z$); there are no $d$-hits, since the two appearances 12019 of $9_d$ have $d$'s between them, but we don't count $d$-hits so it doesn't 12020 matter how many there are. These subscripts are analogous to the \.{DVI} 12021 commands called \\{down}, $y$, and $z$, and the digits are analogous to 12022 different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to 12023 the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file. 12024 12025 \TeX's method of assigning subscripts works like this: Append a new digit, 12026 say $\delta$, to the right of the sequence. Now look back through the 12027 sequence until one of the following things happens: (a)~You see 12028 $\delta_y$ or $\delta_z$, and this was the first time you encountered a 12029 $y$ or $z$ subscript, respectively. Then assign $y$ or $z$ to the new 12030 $\delta$; you have scored a hit. (b)~You see $\delta_d$, and no $y$ 12031 subscripts have been encountered so far during this search. Then change 12032 the previous $\delta_d$ to $\delta_y$ (this corresponds to changing a 12033 command in the output buffer), and assign $y$ to the new $\delta$; it's 12034 another hit. (c)~You see $\delta_d$, and a $y$ subscript has been seen 12035 but not a $z$. Change the previous $\delta_d$ to $\delta_z$ and assign 12036 $z$ to the new $\delta$. (d)~You encounter both $y$ and $z$ subscripts 12037 before encountering a suitable $\delta$, or you scan all the way to the 12038 front of the sequence. Assign $d$ to the new $\delta$; this assignment may 12039 be changed later. 12040 12041 The subscripts $3_z\,1_y\,4_d\ldots\,$ in the example above were, in fact, 12042 produced by this procedure, as the reader can verify. (Go ahead and try it.) 12043 12044 @ In order to implement such an idea, \TeX\ maintains a stack of pointers 12045 to the \\{down}, $y$, and $z$ commands that have been generated for the 12046 current page. And there is a similar stack for \\{right}, |w|, and |x| 12047 commands. These stacks are called the down stack and right stack, and their 12048 top elements are maintained in the variables |down_ptr| and |right_ptr|. 12049 12050 Each entry in these stacks contains four fields: The |width| field is 12051 the amount of motion down or to the right; the |location| field is the 12052 byte number of the \.{DVI} command in question (including the appropriate 12053 |dvi_offset|); the |link| field points to the next item below this one 12054 on the stack; and the |info| field encodes the options for possible change 12055 in the \.{DVI} command. 12056 12057 @d movement_node_size=3 {number of words per entry in the down and right stacks} 12058 @d location(#)==mem[#+2].int {\.{DVI} byte number for a movement command} 12059 12060 @<Glob...@>= 12061 @!down_ptr,@!right_ptr:pointer; {heads of the down and right stacks} 12062 12063 @ @<Set init...@>= 12064 down_ptr:=null; right_ptr:=null; 12065 12066 @ Here is a subroutine that produces a \.{DVI} command for some specified 12067 downward or rightward motion. It has two parameters: |w| is the amount 12068 of motion, and |o| is either |down1| or |right1|. We use the fact that 12069 the command codes have convenient arithmetic properties: |y1-down1=w1-right1| 12070 and |z1-down1=x1-right1|. 12071 12072 @p procedure movement(@!w:scaled;@!o:eight_bits); 12073 label exit,found,not_found,2,1; 12074 var mstate:small_number; {have we seen a |y| or |z|?} 12075 @!p,@!q:pointer; {current and top nodes on the stack} 12076 @!k:integer; {index into |dvi_buf|, modulo |dvi_buf_size|} 12077 begin q:=get_node(movement_node_size); {new node for the top of the stack} 12078 width(q):=w; location(q):=dvi_offset+dvi_ptr; 12079 if o=down1 then 12080 begin link(q):=down_ptr; down_ptr:=q; 12081 end 12082 else begin link(q):=right_ptr; right_ptr:=q; 12083 end; 12084 @<Look at the other stack entries until deciding what sort of \.{DVI} command 12085 to generate; |goto found| if node |p| is a ``hit''@>; 12086 @<Generate a |down| or |right| command for |w| and |return|@>; 12087 found: @<Generate a |y0| or |z0| command in order to reuse a previous 12088 appearance of~|w|@>; 12089 exit:end; 12090 12091 @ The |info| fields in the entries of the down stack or the right stack 12092 have six possible settings: |y_here| or |z_here| mean that the \.{DVI} 12093 command refers to |y| or |z|, respectively (or to |w| or |x|, in the 12094 case of horizontal motion); |yz_OK| means that the \.{DVI} command is 12095 \\{down} (or \\{right}) but can be changed to either |y| or |z| (or 12096 to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed 12097 to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay 12098 \\{down}. 12099 12100 The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to 12101 be distinguished from each other if we were simply solving the 12102 digit-subscripting problem mentioned above. But in \TeX's case there is 12103 a complication because of the nested structure of |push| and |pop| 12104 commands. Suppose we add parentheses to the digit-subscripting problem, 12105 redefining hits so that $\delta_y\ldots \delta_y$ is a hit if all $y$'s between 12106 the $\delta$'s are enclosed in properly nested parentheses, and if the 12107 parenthesis level of the right-hand $\delta_y$ is deeper than or equal to 12108 that of the left-hand one. Thus, `(' and `)' correspond to `|push|' 12109 and `|pop|'. Now if we want to assign a subscript to the final 1 in the 12110 sequence 12111 $$2_y\,7_d\,1_d\,(\,8_z\,2_y\,8_z\,)\,1$$ 12112 we cannot change the previous $1_d$ to $1_y$, since that would invalidate 12113 the $2_y\ldots2_y$ hit. But we can change it to $1_z$, scoring a hit 12114 since the intervening $8_z$'s are enclosed in parentheses. 12115 12116 The program below removes movement nodes that are introduced after a |push|, 12117 before it outputs the corresponding |pop|. 12118 12119 @d y_here=1 {|info| when the movement entry points to a |y| command} 12120 @d z_here=2 {|info| when the movement entry points to a |z| command} 12121 @d yz_OK=3 {|info| corresponding to an unconstrained \\{down} command} 12122 @d y_OK=4 {|info| corresponding to a \\{down} that can't become a |z|} 12123 @d z_OK=5 {|info| corresponding to a \\{down} that can't become a |y|} 12124 @d d_fixed=6 {|info| corresponding to a \\{down} that can't change} 12125 12126 @ When the |movement| procedure gets to the label |found|, the value of 12127 |info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|, 12128 the procedure generates a |y0| command (or a |w0| command), and marks 12129 all |info| fields between |q| and |p| so that |y| is not OK in that range. 12130 12131 @<Generate a |y0| or |z0| command...@>= 12132 info(q):=info(p); 12133 if info(q)=y_here then 12134 begin dvi_out(o+y0-down1); {|y0| or |w0|} 12135 while link(q)<>p do 12136 begin q:=link(q); 12137 case info(q) of 12138 yz_OK: info(q):=z_OK; 12139 y_OK: info(q):=d_fixed; 12140 othercases do_nothing 12141 endcases; 12142 end; 12143 end 12144 else begin dvi_out(o+z0-down1); {|z0| or |x0|} 12145 while link(q)<>p do 12146 begin q:=link(q); 12147 case info(q) of 12148 yz_OK: info(q):=y_OK; 12149 z_OK: info(q):=d_fixed; 12150 othercases do_nothing 12151 endcases; 12152 end; 12153 end 12154 12155 @ @<Generate a |down| or |right|...@>= 12156 info(q):=yz_OK; 12157 if abs(w)>=@'40000000 then 12158 begin dvi_out(o+3); {|down4| or |right4|} 12159 dvi_four(w); return; 12160 end; 12161 if abs(w)>=@'100000 then 12162 begin dvi_out(o+2); {|down3| or |right3|} 12163 if w<0 then w:=w+@'100000000; 12164 dvi_out(w div @'200000); w:=w mod @'200000; goto 2; 12165 end; 12166 if abs(w)>=@'200 then 12167 begin dvi_out(o+1); {|down2| or |right2|} 12168 if w<0 then w:=w+@'200000; 12169 goto 2; 12170 end; 12171 dvi_out(o); {|down1| or |right1|} 12172 if w<0 then w:=w+@'400; 12173 goto 1; 12174 2: dvi_out(w div @'400); 12175 1: dvi_out(w mod @'400); return 12176 12177 @ As we search through the stack, we are in one of three states, 12178 |y_seen|, |z_seen|, or |none_seen|, depending on whether we have 12179 encountered |y_here| or |z_here| nodes. These states are encoded as 12180 multiples of 6, so that they can be added to the |info| fields for quick 12181 decision-making. 12182 @^inner loop@> 12183 12184 @d none_seen=0 {no |y_here| or |z_here| nodes have been encountered yet} 12185 @d y_seen=6 {we have seen |y_here| but not |z_here|} 12186 @d z_seen=12 {we have seen |z_here| but not |y_here|} 12187 12188 @<Look at the other stack entries until deciding...@>= 12189 p:=link(q); mstate:=none_seen; 12190 while p<>null do 12191 begin if width(p)=w then @<Consider a node with matching width; 12192 |goto found| if it's a hit@> 12193 else case mstate+info(p) of 12194 none_seen+y_here: mstate:=y_seen; 12195 none_seen+z_here: mstate:=z_seen; 12196 y_seen+z_here,z_seen+y_here: goto not_found; 12197 othercases do_nothing 12198 endcases; 12199 p:=link(p); 12200 end; 12201 not_found: 12202 12203 @ We might find a valid hit in a |y| or |z| byte that is already gone 12204 from the buffer. But we can't change bytes that are gone forever; ``the 12205 moving finger writes, $\ldots\,\,$.'' 12206 12207 @<Consider a node with matching width...@>= 12208 case mstate+info(p) of 12209 none_seen+yz_OK,none_seen+y_OK,z_seen+yz_OK,z_seen+y_OK:@t@>@;@/ 12210 if location(p)<dvi_gone then goto not_found 12211 else @<Change buffered instruction to |y| or |w| and |goto found|@>; 12212 none_seen+z_OK,y_seen+yz_OK,y_seen+z_OK:@t@>@;@/ 12213 if location(p)<dvi_gone then goto not_found 12214 else @<Change buffered instruction to |z| or |x| and |goto found|@>; 12215 none_seen+y_here,none_seen+z_here,y_seen+z_here,z_seen+y_here: goto found; 12216 othercases do_nothing 12217 endcases 12218 12219 @ @<Change buffered instruction to |y| or |w| and |goto found|@>= 12220 begin k:=location(p)-dvi_offset; 12221 if k<0 then k:=k+dvi_buf_size; 12222 dvi_buf[k]:=dvi_buf[k]+y1-down1; 12223 info(p):=y_here; goto found; 12224 end 12225 12226 @ @<Change buffered instruction to |z| or |x| and |goto found|@>= 12227 begin k:=location(p)-dvi_offset; 12228 if k<0 then k:=k+dvi_buf_size; 12229 dvi_buf[k]:=dvi_buf[k]+z1-down1; 12230 info(p):=z_here; goto found; 12231 end 12232 12233 @ In case you are wondering when all the movement nodes are removed from 12234 \TeX's memory, the answer is that they are recycled just before 12235 |hlist_out| and |vlist_out| finish outputting a box. This restores the 12236 down and right stacks to the state they were in before the box was output, 12237 except that some |info|'s may have become more restrictive. 12238 12239 @p procedure prune_movements(@!l:integer); 12240 {delete movement nodes with |location>=l|} 12241 label done,exit; 12242 var p:pointer; {node being deleted} 12243 begin while down_ptr<>null do 12244 begin if location(down_ptr)<l then goto done; 12245 p:=down_ptr; down_ptr:=link(p); free_node(p,movement_node_size); 12246 end; 12247 done: while right_ptr<>null do 12248 begin if location(right_ptr)<l then return; 12249 p:=right_ptr; right_ptr:=link(p); free_node(p,movement_node_size); 12250 end; 12251 exit:end; 12252 12253 @ The actual distances by which we want to move might be computed as the 12254 sum of several separate movements. For example, there might be several 12255 glue nodes in succession, or we might want to move right by the width of 12256 some box plus some amount of glue. More importantly, the baselineskip 12257 distances are computed in terms of glue together with the depth and 12258 height of adjacent boxes, and we want the \.{DVI} file to lump these 12259 three quantities together into a single motion. 12260 12261 Therefore, \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v| 12262 are the |h| and |v| coordinates corresponding to the commands actually 12263 output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates 12264 corresponding to the current state of the output routines. Coordinate 12265 changes will accumulate in |cur_h| and |cur_v| without being reflected 12266 in the output, until such a change becomes necessary or desirable; we 12267 can call the |movement| procedure whenever we want to make |dvi_h=cur_h| 12268 or |dvi_v=cur_v|. 12269 12270 The current font reflected in the \.{DVI} output is called |dvi_f|; 12271 there is no need for a `\\{cur\_f}' variable. 12272 12273 The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|; 12274 this is essentially the depth of |push| commands in the \.{DVI} output. 12275 12276 @d synch_h==if cur_h<>dvi_h then 12277 begin movement(cur_h-dvi_h,right1); dvi_h:=cur_h; 12278 end 12279 @d synch_v==if cur_v<>dvi_v then 12280 begin movement(cur_v-dvi_v,down1); dvi_v:=cur_v; 12281 end 12282 12283 @<Glob...@>= 12284 @!dvi_h,@!dvi_v:scaled; {a \.{DVI} reader program thinks we are here} 12285 @!cur_h,@!cur_v:scaled; {\TeX\ thinks we are here} 12286 @!dvi_f:internal_font_number; {the current font} 12287 @!cur_s:integer; {current depth of output box nesting, initially $-1$} 12288 12289 @ @<Initialize variables as |ship_out| begins@>= 12290 dvi_h:=0; dvi_v:=0; cur_h:=h_offset; dvi_f:=null_font; 12291 ensure_dvi_open; 12292 if total_pages=0 then 12293 begin dvi_out(pre); dvi_out(id_byte); {output the preamble} 12294 @^preamble of \.{DVI} file@> 12295 dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp} 12296 prepare_mag; dvi_four(mag); {magnification factor is frozen} 12297 old_setting:=selector; selector:=new_string; 12298 print(" TeX output "); print_int(year); print_char("."); 12299 print_two(month); print_char("."); print_two(day); 12300 print_char(":"); print_two(time div 60); 12301 print_two(time mod 60); 12302 selector:=old_setting; dvi_out(cur_length); 12303 for s:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[s])); 12304 pool_ptr:=str_start[str_ptr]; {flush the current string} 12305 end 12306 12307 @ When |hlist_out| is called, its duty is to output the box represented 12308 by the |hlist_node| pointed to by |temp_ptr|. The reference point of that 12309 box has coordinates |(cur_h,cur_v)|. 12310 12311 Similarly, when |vlist_out| is called, its duty is to output the box represented 12312 by the |vlist_node| pointed to by |temp_ptr|. The reference point of that 12313 box has coordinates |(cur_h,cur_v)|. 12314 @^recursion@> 12315 12316 @p procedure@?vlist_out; forward; {|hlist_out| and |vlist_out| are mutually 12317 recursive} 12318 12319 @ The recursive procedures |hlist_out| and |vlist_out| each have local variables 12320 |save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before 12321 entering a new level of recursion. In effect, the values of |save_h| and 12322 |save_v| on \TeX's run-time stack correspond to the values of |h| and |v| 12323 that a \.{DVI}-reading program will push onto its coordinate stack. 12324 12325 @d move_past=13 {go to this label when advancing past glue or a rule} 12326 @d fin_rule=14 {go to this label to finish processing a rule} 12327 @d next_p=15 {go to this label when finished with node |p|} 12328 12329 @p @t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/ 12330 procedure hlist_out; {output an |hlist_node| box} 12331 label reswitch, move_past, fin_rule, next_p; 12332 var base_line: scaled; {the baseline coordinate for this box} 12333 @!left_edge: scaled; {the left coordinate for this box} 12334 @!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to} 12335 @!this_box: pointer; {pointer to containing box} 12336 @!g_order: glue_ord; {applicable order of infinity for glue} 12337 @!g_sign: normal..shrinking; {selects type of glue} 12338 @!p:pointer; {current position in the hlist} 12339 @!save_loc:integer; {\.{DVI} byte location upon entry} 12340 @!leader_box:pointer; {the leader box being replicated} 12341 @!leader_wd:scaled; {width of leader box being replicated} 12342 @!lx:scaled; {extra space between leader boxes} 12343 @!outer_doing_leaders:boolean; {were we doing leaders?} 12344 @!edge:scaled; {left edge of sub-box, or right edge of leader space} 12345 @!glue_temp:real; {glue value before rounding} 12346 @!cur_glue:real; {glue seen so far} 12347 @!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio} 12348 begin cur_g:=0; cur_glue:=float_constant(0); 12349 this_box:=temp_ptr; g_order:=glue_order(this_box); 12350 g_sign:=glue_sign(this_box); p:=list_ptr(this_box); 12351 incr(cur_s); 12352 if cur_s>0 then dvi_out(push); 12353 if cur_s>max_push then max_push:=cur_s; 12354 save_loc:=dvi_offset+dvi_ptr; base_line:=cur_v; left_edge:=cur_h; 12355 while p<>null do @<Output node |p| for |hlist_out| and move to the next node, 12356 maintaining the condition |cur_v=base_line|@>; 12357 prune_movements(save_loc); 12358 if cur_s>0 then dvi_pop(save_loc); 12359 decr(cur_s); 12360 end; 12361 12362 @ We ought to give special care to the efficiency of one part of |hlist_out|, 12363 since it belongs to \TeX's inner loop. When a |char_node| is encountered, 12364 we save a little time by processing several nodes in succession until 12365 reaching a non-|char_node|. The program uses the fact that |set_char_0=0|. 12366 @^inner loop@> 12367 12368 @<Output node |p| for |hlist_out|...@>= 12369 reswitch: if is_char_node(p) then 12370 begin synch_h; synch_v; 12371 repeat f:=font(p); c:=character(p); 12372 if f<>dvi_f then @<Change font |dvi_f| to |f|@>; 12373 if c>=qi(128) then dvi_out(set1); 12374 dvi_out(qo(c));@/ 12375 cur_h:=cur_h+char_width(f)(char_info(f)(c)); 12376 p:=link(p); 12377 until not is_char_node(p); 12378 dvi_h:=cur_h; 12379 end 12380 else @<Output the non-|char_node| |p| for |hlist_out| 12381 and move to the next node@> 12382 12383 @ @<Change font |dvi_f| to |f|@>= 12384 begin if not font_used[f] then 12385 begin dvi_font_def(f); font_used[f]:=true; 12386 end; 12387 if f<=64+font_base then dvi_out(f-font_base-1+fnt_num_0) 12388 else begin dvi_out(fnt1); dvi_out(f-font_base-1); 12389 end; 12390 dvi_f:=f; 12391 end 12392 12393 @ @<Output the non-|char_node| |p| for |hlist_out|...@>= 12394 begin case type(p) of 12395 hlist_node,vlist_node:@<Output a box in an hlist@>; 12396 rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p); 12397 goto fin_rule; 12398 end; 12399 whatsit_node: @<Output the whatsit node |p| in an hlist@>; 12400 glue_node: @<Move right or output leaders@>; 12401 kern_node,math_node:cur_h:=cur_h+width(p); 12402 ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>; 12403 othercases do_nothing 12404 endcases;@/ 12405 goto next_p; 12406 fin_rule: @<Output a rule in an hlist@>; 12407 move_past: cur_h:=cur_h+rule_wd; 12408 next_p:p:=link(p); 12409 end 12410 12411 @ @<Output a box in an hlist@>= 12412 if list_ptr(p)=null then cur_h:=cur_h+width(p) 12413 else begin save_h:=dvi_h; save_v:=dvi_v; 12414 cur_v:=base_line+shift_amount(p); {shift the box down} 12415 temp_ptr:=p; edge:=cur_h; 12416 if type(p)=vlist_node then vlist_out@+else hlist_out; 12417 dvi_h:=save_h; dvi_v:=save_v; 12418 cur_h:=edge+width(p); cur_v:=base_line; 12419 end 12420 12421 @ @<Output a rule in an hlist@>= 12422 if is_running(rule_ht) then rule_ht:=height(this_box); 12423 if is_running(rule_dp) then rule_dp:=depth(this_box); 12424 rule_ht:=rule_ht+rule_dp; {this is the rule thickness} 12425 if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules} 12426 begin synch_h; cur_v:=base_line+rule_dp; synch_v; 12427 dvi_out(set_rule); dvi_four(rule_ht); dvi_four(rule_wd); 12428 cur_v:=base_line; dvi_h:=dvi_h+rule_wd; 12429 end 12430 12431 @ @d billion==float_constant(1000000000) 12432 @d vet_glue(#)== glue_temp:=#; 12433 if glue_temp>billion then 12434 glue_temp:=billion 12435 else if glue_temp<-billion then 12436 glue_temp:=-billion 12437 12438 @<Move right or output leaders@>= 12439 begin g:=glue_ptr(p); rule_wd:=width(g)-cur_g; 12440 if g_sign<>normal then 12441 begin if g_sign=stretching then 12442 begin if stretch_order(g)=g_order then 12443 begin cur_glue:=cur_glue+stretch(g); 12444 vet_glue(float(glue_set(this_box))*cur_glue); 12445 @^real multiplication@> 12446 cur_g:=round(glue_temp); 12447 end; 12448 end 12449 else if shrink_order(g)=g_order then 12450 begin cur_glue:=cur_glue-shrink(g); 12451 vet_glue(float(glue_set(this_box))*cur_glue); 12452 cur_g:=round(glue_temp); 12453 end; 12454 end; 12455 rule_wd:=rule_wd+cur_g; 12456 if subtype(p)>=a_leaders then 12457 @<Output leaders in an hlist, |goto fin_rule| if a rule 12458 or to |next_p| if done@>; 12459 goto move_past; 12460 end 12461 12462 @ @<Output leaders in an hlist...@>= 12463 begin leader_box:=leader_ptr(p); 12464 if type(leader_box)=rule_node then 12465 begin rule_ht:=height(leader_box); rule_dp:=depth(leader_box); 12466 goto fin_rule; 12467 end; 12468 leader_wd:=width(leader_box); 12469 if (leader_wd>0)and(rule_wd>0) then 12470 begin rule_wd:=rule_wd+10; {compensate for floating-point rounding} 12471 edge:=cur_h+rule_wd; lx:=0; 12472 @<Let |cur_h| be the position of the first box, and set |leader_wd+lx| 12473 to the spacing between corresponding parts of boxes@>; 12474 while cur_h+leader_wd<=edge do 12475 @<Output a leader box at |cur_h|, 12476 then advance |cur_h| by |leader_wd+lx|@>; 12477 cur_h:=edge-10; goto next_p; 12478 end; 12479 end 12480 12481 @ The calculations related to leaders require a bit of care. First, in the 12482 case of |a_leaders| (aligned leaders), we want to move |cur_h| to 12483 |left_edge| plus the smallest multiple of |leader_wd| for which the result 12484 is not less than the current value of |cur_h|; i.e., |cur_h| should become 12485 $|left_edge|+|leader_wd|\times\lceil 12486 (|cur_h|-|left_edge|)/|leader_wd|\rceil$. The program here should work in 12487 all cases even though some implementations of \PASCAL\ give nonstandard 12488 results for the |div| operation when |cur_h| is less than |left_edge|. 12489 12490 In the case of |c_leaders| (centered leaders), we want to increase |cur_h| 12491 by half of the excess space not occupied by the leaders; and in the 12492 case of |x_leaders| (expanded leaders) we increase |cur_h| 12493 by $1/(q+1)$ of this excess space, where $q$ is the number of times the 12494 leader box will be replicated. Slight inaccuracies in the division might 12495 accumulate; half of this rounding error is placed at each end of the leaders. 12496 12497 @<Let |cur_h| be the position of the first box, ...@>= 12498 if subtype(p)=a_leaders then 12499 begin save_h:=cur_h; 12500 cur_h:=left_edge+leader_wd*((cur_h-left_edge)@!div leader_wd); 12501 if cur_h<save_h then cur_h:=cur_h+leader_wd; 12502 end 12503 else begin lq:=rule_wd div leader_wd; {the number of box copies} 12504 lr:=rule_wd mod leader_wd; {the remaining space} 12505 if subtype(p)=c_leaders then cur_h:=cur_h+(lr div 2) 12506 else begin lx:=lr div (lq+1); 12507 cur_h:=cur_h+((lr-(lq-1)*lx) div 2); 12508 end; 12509 end 12510 12511 @ The `\\{synch}' operations here are intended to decrease the number of 12512 bytes needed to specify horizontal and vertical motion in the \.{DVI} output. 12513 12514 @<Output a leader box at |cur_h|, ...@>= 12515 begin cur_v:=base_line+shift_amount(leader_box); synch_v; save_v:=dvi_v;@/ 12516 synch_h; save_h:=dvi_h; temp_ptr:=leader_box; 12517 outer_doing_leaders:=doing_leaders; doing_leaders:=true; 12518 if type(leader_box)=vlist_node then vlist_out@+else hlist_out; 12519 doing_leaders:=outer_doing_leaders; 12520 dvi_v:=save_v; dvi_h:=save_h; cur_v:=base_line; 12521 cur_h:=save_h+leader_wd+lx; 12522 end 12523 12524 @ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler. 12525 12526 @p procedure vlist_out; {output a |vlist_node| box} 12527 label move_past, fin_rule, next_p; 12528 var left_edge: scaled; {the left coordinate for this box} 12529 @!top_edge: scaled; {the top coordinate for this box} 12530 @!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to} 12531 @!this_box: pointer; {pointer to containing box} 12532 @!g_order: glue_ord; {applicable order of infinity for glue} 12533 @!g_sign: normal..shrinking; {selects type of glue} 12534 @!p:pointer; {current position in the vlist} 12535 @!save_loc:integer; {\.{DVI} byte location upon entry} 12536 @!leader_box:pointer; {the leader box being replicated} 12537 @!leader_ht:scaled; {height of leader box being replicated} 12538 @!lx:scaled; {extra space between leader boxes} 12539 @!outer_doing_leaders:boolean; {were we doing leaders?} 12540 @!edge:scaled; {bottom boundary of leader space} 12541 @!glue_temp:real; {glue value before rounding} 12542 @!cur_glue:real; {glue seen so far} 12543 @!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio} 12544 begin cur_g:=0; cur_glue:=float_constant(0); 12545 this_box:=temp_ptr; g_order:=glue_order(this_box); 12546 g_sign:=glue_sign(this_box); p:=list_ptr(this_box); 12547 incr(cur_s); 12548 if cur_s>0 then dvi_out(push); 12549 if cur_s>max_push then max_push:=cur_s; 12550 save_loc:=dvi_offset+dvi_ptr; left_edge:=cur_h; cur_v:=cur_v-height(this_box); 12551 top_edge:=cur_v; 12552 while p<>null do @<Output node |p| for |vlist_out| and move to the next node, 12553 maintaining the condition |cur_h=left_edge|@>; 12554 prune_movements(save_loc); 12555 if cur_s>0 then dvi_pop(save_loc); 12556 decr(cur_s); 12557 end; 12558 12559 @ @<Output node |p| for |vlist_out|...@>= 12560 begin if is_char_node(p) then confusion("vlistout") 12561 @:this can't happen vlistout}{\quad vlistout@> 12562 else @<Output the non-|char_node| |p| for |vlist_out|@>; 12563 next_p:p:=link(p); 12564 end 12565 12566 @ @<Output the non-|char_node| |p| for |vlist_out|@>= 12567 begin case type(p) of 12568 hlist_node,vlist_node:@<Output a box in a vlist@>; 12569 rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p); 12570 goto fin_rule; 12571 end; 12572 whatsit_node: @<Output the whatsit node |p| in a vlist@>; 12573 glue_node: @<Move down or output leaders@>; 12574 kern_node:cur_v:=cur_v+width(p); 12575 othercases do_nothing 12576 endcases;@/ 12577 goto next_p; 12578 fin_rule: @<Output a rule in a vlist, |goto next_p|@>; 12579 move_past: cur_v:=cur_v+rule_ht; 12580 end 12581 12582 @ The |synch_v| here allows the \.{DVI} output to use one-byte commands 12583 for adjusting |v| in most cases, since the baselineskip distance will 12584 usually be constant. 12585 12586 @<Output a box in a vlist@>= 12587 if list_ptr(p)=null then cur_v:=cur_v+height(p)+depth(p) 12588 else begin cur_v:=cur_v+height(p); synch_v; 12589 save_h:=dvi_h; save_v:=dvi_v; 12590 cur_h:=left_edge+shift_amount(p); {shift the box right} 12591 temp_ptr:=p; 12592 if type(p)=vlist_node then vlist_out@+else hlist_out; 12593 dvi_h:=save_h; dvi_v:=save_v; 12594 cur_v:=save_v+depth(p); cur_h:=left_edge; 12595 end 12596 12597 @ @<Output a rule in a vlist...@>= 12598 if is_running(rule_wd) then rule_wd:=width(this_box); 12599 rule_ht:=rule_ht+rule_dp; {this is the rule thickness} 12600 cur_v:=cur_v+rule_ht; 12601 if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules} 12602 begin synch_h; synch_v; 12603 dvi_out(put_rule); dvi_four(rule_ht); dvi_four(rule_wd); 12604 end; 12605 goto next_p 12606 12607 @ @<Move down or output leaders@>= 12608 begin g:=glue_ptr(p); rule_ht:=width(g)-cur_g; 12609 if g_sign<>normal then 12610 begin if g_sign=stretching then 12611 begin if stretch_order(g)=g_order then 12612 begin cur_glue:=cur_glue+stretch(g); 12613 vet_glue(float(glue_set(this_box))*cur_glue); 12614 @^real multiplication@> 12615 cur_g:=round(glue_temp); 12616 end; 12617 end 12618 else if shrink_order(g)=g_order then 12619 begin cur_glue:=cur_glue-shrink(g); 12620 vet_glue(float(glue_set(this_box))*cur_glue); 12621 cur_g:=round(glue_temp); 12622 end; 12623 end; 12624 rule_ht:=rule_ht+cur_g; 12625 if subtype(p)>=a_leaders then 12626 @<Output leaders in a vlist, |goto fin_rule| if a rule 12627 or to |next_p| if done@>; 12628 goto move_past; 12629 end 12630 12631 @ @<Output leaders in a vlist...@>= 12632 begin leader_box:=leader_ptr(p); 12633 if type(leader_box)=rule_node then 12634 begin rule_wd:=width(leader_box); rule_dp:=0; 12635 goto fin_rule; 12636 end; 12637 leader_ht:=height(leader_box)+depth(leader_box); 12638 if (leader_ht>0)and(rule_ht>0) then 12639 begin rule_ht:=rule_ht+10; {compensate for floating-point rounding} 12640 edge:=cur_v+rule_ht; lx:=0; 12641 @<Let |cur_v| be the position of the first box, and set |leader_ht+lx| 12642 to the spacing between corresponding parts of boxes@>; 12643 while cur_v+leader_ht<=edge do 12644 @<Output a leader box at |cur_v|, 12645 then advance |cur_v| by |leader_ht+lx|@>; 12646 cur_v:=edge-10; goto next_p; 12647 end; 12648 end 12649 12650 @ @<Let |cur_v| be the position of the first box, ...@>= 12651 if subtype(p)=a_leaders then 12652 begin save_v:=cur_v; 12653 cur_v:=top_edge+leader_ht*((cur_v-top_edge)@!div leader_ht); 12654 if cur_v<save_v then cur_v:=cur_v+leader_ht; 12655 end 12656 else begin lq:=rule_ht div leader_ht; {the number of box copies} 12657 lr:=rule_ht mod leader_ht; {the remaining space} 12658 if subtype(p)=c_leaders then cur_v:=cur_v+(lr div 2) 12659 else begin lx:=lr div (lq+1); 12660 cur_v:=cur_v+((lr-(lq-1)*lx) div 2); 12661 end; 12662 end 12663 12664 @ When we reach this part of the program, |cur_v| indicates the top of a 12665 leader box, not its baseline. 12666 12667 @<Output a leader box at |cur_v|, ...@>= 12668 begin cur_h:=left_edge+shift_amount(leader_box); synch_h; save_h:=dvi_h;@/ 12669 cur_v:=cur_v+height(leader_box); synch_v; save_v:=dvi_v; 12670 temp_ptr:=leader_box; 12671 outer_doing_leaders:=doing_leaders; doing_leaders:=true; 12672 if type(leader_box)=vlist_node then vlist_out@+else hlist_out; 12673 doing_leaders:=outer_doing_leaders; 12674 dvi_v:=save_v; dvi_h:=save_h; cur_h:=left_edge; 12675 cur_v:=save_v-height(leader_box)+leader_ht+lx; 12676 end 12677 12678 @ The |hlist_out| and |vlist_out| procedures are now complete, so we are 12679 ready for the |ship_out| routine that gets them started in the first place. 12680 12681 @p procedure ship_out(@!p:pointer); {output the box |p|} 12682 label done; 12683 var page_loc:integer; {location of the current |bop|} 12684 @!j,@!k:0..9; {indices to first ten count registers} 12685 @!s:pool_pointer; {index into |str_pool|} 12686 @!old_setting:0..max_selector; {saved |selector| setting} 12687 begin if tracing_output>0 then 12688 begin print_nl(""); print_ln; 12689 print("Completed box being shipped out"); 12690 @.Completed box...@> 12691 end; 12692 if term_offset>max_print_line-9 then print_ln 12693 else if (term_offset>0)or(file_offset>0) then print_char(" "); 12694 print_char("["); j:=9; 12695 while (count(j)=0)and(j>0) do decr(j); 12696 for k:=0 to j do 12697 begin print_int(count(k)); 12698 if k<j then print_char("."); 12699 end; 12700 update_terminal; 12701 if tracing_output>0 then 12702 begin print_char("]"); 12703 begin_diagnostic; show_box(p); end_diagnostic(true); 12704 end; 12705 @<Ship box |p| out@>; 12706 if tracing_output<=0 then print_char("]"); 12707 dead_cycles:=0; 12708 update_terminal; {progress report} 12709 @<Flush the box from memory, showing statistics if requested@>; 12710 end; 12711 12712 @ @<Flush the box from memory, showing statistics if requested@>= 12713 @!stat if tracing_stats>1 then 12714 begin print_nl("Memory usage before: "); 12715 @.Memory usage...@> 12716 print_int(var_used); print_char("&"); 12717 print_int(dyn_used); print_char(";"); 12718 end; 12719 tats@/ 12720 flush_node_list(p); 12721 @!stat if tracing_stats>1 then 12722 begin print(" after: "); 12723 print_int(var_used); print_char("&"); 12724 print_int(dyn_used); print("; still untouched: "); 12725 print_int(hi_mem_min-lo_mem_max-1); print_ln; 12726 end; 12727 tats 12728 12729 @ @<Ship box |p| out@>= 12730 @<Update the values of |max_h| and |max_v|; but if the page is too large, 12731 |goto done|@>; 12732 @<Initialize variables as |ship_out| begins@>; 12733 page_loc:=dvi_offset+dvi_ptr; 12734 dvi_out(bop); 12735 for k:=0 to 9 do dvi_four(count(k)); 12736 dvi_four(last_bop); last_bop:=page_loc; 12737 cur_v:=height(p)+v_offset; temp_ptr:=p; 12738 if type(p)=vlist_node then vlist_out@+else hlist_out; 12739 dvi_out(eop); incr(total_pages); cur_s:=-1; 12740 done: 12741 12742 @ Sometimes the user will generate a huge page because other error messages 12743 are being ignored. Such pages are not output to the \.{dvi} file, since they 12744 may confuse the printing software. 12745 12746 @<Update the values of |max_h| and |max_v|; but if the page is too large...@>= 12747 if (height(p)>max_dimen)or@|(depth(p)>max_dimen)or@| 12748 (height(p)+depth(p)+v_offset>max_dimen)or@| 12749 (width(p)+h_offset>max_dimen) then 12750 begin print_err("Huge page cannot be shipped out"); 12751 @.Huge page...@> 12752 help2("The page just created is more than 18 feet tall or")@/ 12753 ("more than 18 feet wide, so I suspect something went wrong."); 12754 error; 12755 if tracing_output<=0 then 12756 begin begin_diagnostic; 12757 print_nl("The following box has been deleted:"); 12758 @.The following...deleted@> 12759 show_box(p); 12760 end_diagnostic(true); 12761 end; 12762 goto done; 12763 end; 12764 if height(p)+depth(p)+v_offset>max_v then max_v:=height(p)+depth(p)+v_offset; 12765 if width(p)+h_offset>max_h then max_h:=width(p)+h_offset 12766 12767 @ At the end of the program, we must finish things off by writing the 12768 post\-amble. If |total_pages=0|, the \.{DVI} file was never opened. 12769 If |total_pages>=65536|, the \.{DVI} file will lie. And if 12770 |max_push>=65536|, the user deserves whatever chaos might ensue. 12771 12772 An integer variable |k| will be declared for use by this routine. 12773 12774 @<Finish the \.{DVI} file@>= 12775 while cur_s>-1 do 12776 begin if cur_s>0 then dvi_out(pop) 12777 else begin dvi_out(eop); incr(total_pages); 12778 end; 12779 decr(cur_s); 12780 end; 12781 if total_pages=0 then print_nl("No pages of output.") 12782 @.No pages of output@> 12783 else begin dvi_out(post); {beginning of the postamble} 12784 dvi_four(last_bop); last_bop:=dvi_offset+dvi_ptr-5; {|post| location} 12785 dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp} 12786 prepare_mag; dvi_four(mag); {magnification factor} 12787 dvi_four(max_v); dvi_four(max_h);@/ 12788 dvi_out(max_push div 256); dvi_out(max_push mod 256);@/ 12789 dvi_out((total_pages div 256) mod 256); dvi_out(total_pages mod 256);@/ 12790 @<Output the font definitions for all fonts that were used@>; 12791 dvi_out(post_post); dvi_four(last_bop); dvi_out(id_byte);@/ 12792 k:=4+((dvi_buf_size-dvi_ptr) mod 4); {the number of 223's} 12793 while k>0 do 12794 begin dvi_out(223); decr(k); 12795 end; 12796 @<Empty the last bytes out of |dvi_buf|@>; 12797 print_nl("Output written on "); slow_print(output_file_name); 12798 @.Output written on x@> 12799 print(" ("); print_int(total_pages); print(" page"); 12800 if total_pages<>1 then print_char("s"); 12801 print(", "); print_int(dvi_offset+dvi_ptr); print(" bytes)."); 12802 b_close(dvi_file); 12803 end 12804 12805 @ @<Output the font definitions...@>= 12806 while font_ptr>font_base do 12807 begin if font_used[font_ptr] then dvi_font_def(font_ptr); 12808 decr(font_ptr); 12809 end 12810 12811 @* \[33] Packaging. 12812 We're essentially done with the parts of \TeX\ that are concerned with 12813 the input (|get_next|) and the output (|ship_out|). So it's time to 12814 get heavily into the remaining part, which does the real work of typesetting. 12815 12816 After lists are constructed, \TeX\ wraps them up and puts them into boxes. 12817 Two major subroutines are given the responsibility for this task: |hpack| 12818 applies to horizontal lists (hlists) and |vpack| applies to vertical lists 12819 (vlists). The main duty of |hpack| and |vpack| is to compute the dimensions 12820 of the resulting boxes, and to adjust the glue if one of those dimensions 12821 is pre-specified. The computed sizes normally enclose all of the material 12822 inside the new box; but some items may stick out if negative glue is used, 12823 if the box is overfull, or if a \.{\\vbox} includes other boxes that have 12824 been shifted left. 12825 12826 The subroutine call |hpack(p,w,m)| returns a pointer to an |hlist_node| 12827 for a box containing the hlist that starts at |p|. Parameter |w| specifies 12828 a width; and parameter |m| is either `|exactly|' or `|additional|'. Thus, 12829 |hpack(p,w,exactly)| produces a box whose width is exactly |w|, while 12830 |hpack(p,w,additional)| yields a box whose width is the natural width plus 12831 |w|. It is convenient to define a macro called `|natural|' to cover the 12832 most common case, so that we can say |hpack(p,natural)| to get a box that 12833 has the natural width of list |p|. 12834 12835 Similarly, |vpack(p,w,m)| returns a pointer to a |vlist_node| for a 12836 box containing the vlist that starts at |p|. In this case |w| represents 12837 a height instead of a width; the parameter |m| is interpreted as in |hpack|. 12838 12839 @d exactly=0 {a box dimension is pre-specified} 12840 @d additional=1 {a box dimension is increased from the natural one} 12841 @d natural==0,additional {shorthand for parameters to |hpack| and |vpack|} 12842 12843 @ The parameters to |hpack| and |vpack| correspond to \TeX's primitives 12844 like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{spread} \.{10pt}'; note 12845 that `\.{\\hbox}' with no dimension following it is equivalent to 12846 `\.{\\hbox} \.{spread} \.{0pt}'. The |scan_spec| subroutine scans such 12847 constructions in the user's input, including the mandatory left brace that 12848 follows them, and it puts the specification onto |save_stack| so that the 12849 desired box can later be obtained by executing the following code: 12850 $$\vbox{\halign{#\hfil\cr 12851 |save_ptr:=save_ptr-2;|\cr 12852 |hpack(p,saved(1),saved(0)).|\cr}}$$ 12853 Special care is necessary to ensure that the special |save_stack| codes 12854 are placed just below the new group code, because scanning can change 12855 |save_stack| when \.{\\csname} appears. 12856 12857 @p procedure scan_spec(@!c:group_code;@!three_codes:boolean); 12858 {scans a box specification and left brace} 12859 label found; 12860 var @!s:integer; {temporarily saved value} 12861 @!spec_code:exactly..additional; 12862 begin if three_codes then s:=saved(0); 12863 if scan_keyword("to") then spec_code:=exactly 12864 @.to@> 12865 else if scan_keyword("spread") then spec_code:=additional 12866 @.spread@> 12867 else begin spec_code:=additional; cur_val:=0; 12868 goto found; 12869 end; 12870 scan_normal_dimen; 12871 found: if three_codes then 12872 begin saved(0):=s; incr(save_ptr); 12873 end; 12874 saved(0):=spec_code; saved(1):=cur_val; save_ptr:=save_ptr+2; 12875 new_save_level(c); scan_left_brace; 12876 end; 12877 12878 @ To figure out the glue setting, |hpack| and |vpack| determine how much 12879 stretchability and shrinkability are present, considering all four orders 12880 of infinity. The highest order of infinity that has a nonzero coefficient 12881 is then used as if no other orders were present. 12882 12883 For example, suppose that the given list contains six glue nodes with 12884 the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill. 12885 Then the total is essentially 2fil; and if a total additional space of 6pt 12886 is to be achieved by stretching, the actual amounts of stretch will be 12887 0pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be 12888 considered. (The `fill' glue is therefore not really stretching infinitely 12889 with respect to `fil'; nobody would actually want that to happen.) 12890 12891 The arrays |total_stretch| and |total_shrink| are used to determine how much 12892 glue of each kind is present. A global variable |last_badness| is used 12893 to implement \.{\\badness}. 12894 12895 @<Glob...@>= 12896 @!total_stretch, @!total_shrink: array[glue_ord] of scaled; 12897 {glue found by |hpack| or |vpack|} 12898 @!last_badness:integer; {badness of the most recently packaged box} 12899 12900 @ If the global variable |adjust_tail| is non-null, the |hpack| routine 12901 also removes all occurrences of |ins_node|, |mark_node|, and |adjust_node| 12902 items and appends the resulting material onto the list that ends at 12903 location |adjust_tail|. 12904 12905 @<Glob...@>= 12906 @!adjust_tail:pointer; {tail of adjustment list} 12907 12908 @ @<Set init...@>=adjust_tail:=null; last_badness:=0; 12909 12910 @ Here now is |hpack|, which contains few if any surprises. 12911 12912 @p function hpack(@!p:pointer;@!w:scaled;@!m:small_number):pointer; 12913 label reswitch, common_ending, exit; 12914 var r:pointer; {the box node that will be returned} 12915 @!q:pointer; {trails behind |p|} 12916 @!h,@!d,@!x:scaled; {height, depth, and natural width} 12917 @!s:scaled; {shift amount} 12918 @!g:pointer; {points to a glue specification} 12919 @!o:glue_ord; {order of infinity} 12920 @!f:internal_font_number; {the font in a |char_node|} 12921 @!i:four_quarters; {font information about a |char_node|} 12922 @!hd:eight_bits; {height and depth indices for a character} 12923 begin last_badness:=0; r:=get_node(box_node_size); type(r):=hlist_node; 12924 subtype(r):=min_quarterword; shift_amount(r):=0; 12925 q:=r+list_offset; link(q):=p;@/ 12926 h:=0; @<Clear dimensions to zero@>; 12927 while p<>null do @<Examine node |p| in the hlist, taking account of its effect 12928 on the dimensions of the new box, or moving it to the adjustment list; 12929 then advance |p| to the next node@>; 12930 if adjust_tail<>null then link(adjust_tail):=null; 12931 height(r):=h; depth(r):=d;@/ 12932 @<Determine the value of |width(r)| and the appropriate glue setting; 12933 then |return| or |goto common_ending|@>; 12934 common_ending: @<Finish issuing a diagnostic message 12935 for an overfull or underfull hbox@>; 12936 exit: hpack:=r; 12937 end; 12938 12939 @ @<Clear dimensions to zero@>= 12940 d:=0; x:=0; 12941 total_stretch[normal]:=0; total_shrink[normal]:=0; 12942 total_stretch[fil]:=0; total_shrink[fil]:=0; 12943 total_stretch[fill]:=0; total_shrink[fill]:=0; 12944 total_stretch[filll]:=0; total_shrink[filll]:=0 12945 12946 @ @<Examine node |p| in the hlist, taking account of its effect...@>= 12947 @^inner loop@> 12948 begin reswitch: while is_char_node(p) do 12949 @<Incorporate character dimensions into the dimensions of 12950 the hbox that will contain~it, then move to the next node@>; 12951 if p<>null then 12952 begin case type(p) of 12953 hlist_node,vlist_node,rule_node,unset_node: 12954 @<Incorporate box dimensions into the dimensions of 12955 the hbox that will contain~it@>; 12956 ins_node,mark_node,adjust_node: if adjust_tail<>null then 12957 @<Transfer node |p| to the adjustment list@>; 12958 whatsit_node:@<Incorporate a whatsit node into an hbox@>; 12959 glue_node:@<Incorporate glue into the horizontal totals@>; 12960 kern_node,math_node: x:=x+width(p); 12961 ligature_node: @<Make node |p| look like a |char_node| 12962 and |goto reswitch|@>; 12963 othercases do_nothing 12964 endcases;@/ 12965 p:=link(p); 12966 end; 12967 end 12968 12969 12970 @ @<Make node |p| look like a |char_node| and |goto reswitch|@>= 12971 begin mem[lig_trick]:=mem[lig_char(p)]; link(lig_trick):=link(p); 12972 p:=lig_trick; goto reswitch; 12973 end 12974 12975 @ The code here implicitly uses the fact that running dimensions are 12976 indicated by |null_flag|, which will be ignored in the calculations 12977 because it is a highly negative number. 12978 12979 @<Incorporate box dimensions into the dimensions of the hbox...@>= 12980 begin x:=x+width(p); 12981 if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p); 12982 if height(p)-s>h then h:=height(p)-s; 12983 if depth(p)+s>d then d:=depth(p)+s; 12984 end 12985 12986 @ The following code is part of \TeX's inner loop; i.e., adding another 12987 character of text to the user's input will cause each of these instructions 12988 to be exercised one more time. 12989 @^inner loop@> 12990 12991 @<Incorporate character dimensions into the dimensions of the hbox...@>= 12992 begin f:=font(p); i:=char_info(f)(character(p)); hd:=height_depth(i); 12993 x:=x+char_width(f)(i);@/ 12994 s:=char_height(f)(hd);@+if s>h then h:=s; 12995 s:=char_depth(f)(hd);@+if s>d then d:=s; 12996 p:=link(p); 12997 end 12998 12999 @ Although node |q| is not necessarily the immediate predecessor of node |p|, 13000 it always points to some node in the list preceding |p|. Thus, we can delete 13001 nodes by moving |q| when necessary. The algorithm takes linear time, and the 13002 extra computation does not intrude on the inner loop unless it is necessary 13003 to make a deletion. 13004 @^inner loop@> 13005 13006 @<Transfer node |p| to the adjustment list@>= 13007 begin while link(q)<>p do q:=link(q); 13008 if type(p)=adjust_node then 13009 begin link(adjust_tail):=adjust_ptr(p); 13010 while link(adjust_tail)<>null do adjust_tail:=link(adjust_tail); 13011 p:=link(p); free_node(link(q),small_node_size); 13012 end 13013 else begin link(adjust_tail):=p; adjust_tail:=p; p:=link(p); 13014 end; 13015 link(q):=p; p:=q; 13016 end 13017 13018 @ @<Incorporate glue into the horizontal totals@>= 13019 begin g:=glue_ptr(p); x:=x+width(g);@/ 13020 o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g); 13021 o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g); 13022 if subtype(p)>=a_leaders then 13023 begin g:=leader_ptr(p); 13024 if height(g)>h then h:=height(g); 13025 if depth(g)>d then d:=depth(g); 13026 end; 13027 end 13028 13029 @ When we get to the present part of the program, |x| is the natural width 13030 of the box being packaged. 13031 13032 @<Determine the value of |width(r)| and the appropriate glue setting...@>= 13033 if m=additional then w:=x+w; 13034 width(r):=w; x:=w-x; {now |x| is the excess to be made up} 13035 if x=0 then 13036 begin glue_sign(r):=normal; glue_order(r):=normal; 13037 set_glue_ratio_zero(glue_set(r)); 13038 return; 13039 end 13040 else if x>0 then @<Determine horizontal glue stretch setting, then |return| 13041 or \hbox{|goto common_ending|}@> 13042 else @<Determine horizontal glue shrink setting, then |return| 13043 or \hbox{|goto common_ending|}@> 13044 13045 @ @<Determine horizontal glue stretch setting...@>= 13046 begin @<Determine the stretch order@>; 13047 glue_order(r):=o; glue_sign(r):=stretching; 13048 if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o]) 13049 @^real division@> 13050 else begin glue_sign(r):=normal; 13051 set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch} 13052 end; 13053 if o=normal then if list_ptr(r)<>null then 13054 @<Report an underfull hbox and |goto common_ending|, if this box 13055 is sufficiently bad@>; 13056 return; 13057 end 13058 13059 @ @<Determine the stretch order@>= 13060 if total_stretch[filll]<>0 then o:=filll 13061 else if total_stretch[fill]<>0 then o:=fill 13062 else if total_stretch[fil]<>0 then o:=fil 13063 else o:=normal 13064 13065 @ @<Report an underfull hbox and |goto common_ending|, if...@>= 13066 begin last_badness:=badness(x,total_stretch[normal]); 13067 if last_badness>hbadness then 13068 begin print_ln; 13069 if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose"); 13070 print(" \hbox (badness "); print_int(last_badness); 13071 @.Underfull \\hbox...@> 13072 @.Loose \\hbox...@> 13073 goto common_ending; 13074 end; 13075 end 13076 13077 @ In order to provide a decent indication of where an overfull or underfull 13078 box originated, we use a global variable |pack_begin_line| that is 13079 set nonzero only when |hpack| is being called by the paragraph builder 13080 or the alignment finishing routine. 13081 13082 @<Glob...@>= 13083 @!pack_begin_line:integer; {source file line where the current paragraph 13084 or alignment began; a negative value denotes alignment} 13085 13086 @ @<Set init...@>= 13087 pack_begin_line:=0; 13088 13089 @ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>= 13090 if output_active then print(") has occurred while \output is active") 13091 else begin if pack_begin_line<>0 then 13092 begin if pack_begin_line>0 then print(") in paragraph at lines ") 13093 else print(") in alignment at lines "); 13094 print_int(abs(pack_begin_line)); 13095 print("--"); 13096 end 13097 else print(") detected at line "); 13098 print_int(line); 13099 end; 13100 print_ln;@/ 13101 font_in_short_display:=null_font; short_display(list_ptr(r)); print_ln;@/ 13102 begin_diagnostic; show_box(r); end_diagnostic(true) 13103 13104 @ @<Determine horizontal glue shrink setting...@>= 13105 begin @<Determine the shrink order@>; 13106 glue_order(r):=o; glue_sign(r):=shrinking; 13107 if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o]) 13108 @^real division@> 13109 else begin glue_sign(r):=normal; 13110 set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink} 13111 end; 13112 if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then 13113 begin last_badness:=1000000; 13114 set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage} 13115 @<Report an overfull hbox and |goto common_ending|, if this box 13116 is sufficiently bad@>; 13117 end 13118 else if o=normal then if list_ptr(r)<>null then 13119 @<Report a tight hbox and |goto common_ending|, if this box 13120 is sufficiently bad@>; 13121 return; 13122 end 13123 13124 @ @<Determine the shrink order@>= 13125 if total_shrink[filll]<>0 then o:=filll 13126 else if total_shrink[fill]<>0 then o:=fill 13127 else if total_shrink[fil]<>0 then o:=fil 13128 else o:=normal 13129 13130 @ @<Report an overfull hbox and |goto common_ending|, if...@>= 13131 if (-x-total_shrink[normal]>hfuzz)or(hbadness<100) then 13132 begin if (overfull_rule>0)and(-x-total_shrink[normal]>hfuzz) then 13133 begin while link(q)<>null do q:=link(q); 13134 link(q):=new_rule; 13135 width(link(q)):=overfull_rule; 13136 end; 13137 print_ln; print_nl("Overfull \hbox ("); 13138 @.Overfull \\hbox...@> 13139 print_scaled(-x-total_shrink[normal]); print("pt too wide"); 13140 goto common_ending; 13141 end 13142 13143 @ @<Report a tight hbox and |goto common_ending|, if...@>= 13144 begin last_badness:=badness(-x,total_shrink[normal]); 13145 if last_badness>hbadness then 13146 begin print_ln; print_nl("Tight \hbox (badness "); print_int(last_badness); 13147 @.Tight \\hbox...@> 13148 goto common_ending; 13149 end; 13150 end 13151 13152 @ The |vpack| subroutine is actually a special case of a slightly more 13153 general routine called |vpackage|, which has four parameters. The fourth 13154 parameter, which is |max_dimen| in the case of |vpack|, specifies the 13155 maximum depth of the page box that is constructed. The depth is first 13156 computed by the normal rules; if it exceeds this limit, the reference 13157 point is simply moved down until the limiting depth is attained. 13158 13159 @d vpack(#)==vpackage(#,max_dimen) {special case of unconstrained depth} 13160 13161 @p function vpackage(@!p:pointer;@!h:scaled;@!m:small_number;@!l:scaled): 13162 pointer; 13163 label common_ending, exit; 13164 var r:pointer; {the box node that will be returned} 13165 @!w,@!d,@!x:scaled; {width, depth, and natural height} 13166 @!s:scaled; {shift amount} 13167 @!g:pointer; {points to a glue specification} 13168 @!o:glue_ord; {order of infinity} 13169 begin last_badness:=0; r:=get_node(box_node_size); type(r):=vlist_node; 13170 subtype(r):=min_quarterword; shift_amount(r):=0; 13171 list_ptr(r):=p;@/ 13172 w:=0; @<Clear dimensions to zero@>; 13173 while p<>null do @<Examine node |p| in the vlist, taking account of its effect 13174 on the dimensions of the new box; then advance |p| to the next node@>; 13175 width(r):=w; 13176 if d>l then 13177 begin x:=x+d-l; depth(r):=l; 13178 end 13179 else depth(r):=d; 13180 @<Determine the value of |height(r)| and the appropriate glue setting; 13181 then |return| or |goto common_ending|@>; 13182 common_ending: @<Finish issuing a diagnostic message 13183 for an overfull or underfull vbox@>; 13184 exit: vpackage:=r; 13185 end; 13186 13187 @ @<Examine node |p| in the vlist, taking account of its effect...@>= 13188 begin if is_char_node(p) then confusion("vpack") 13189 @:this can't happen vpack}{\quad vpack@> 13190 else case type(p) of 13191 hlist_node,vlist_node,rule_node,unset_node: 13192 @<Incorporate box dimensions into the dimensions of 13193 the vbox that will contain~it@>; 13194 whatsit_node:@<Incorporate a whatsit node into a vbox@>; 13195 glue_node: @<Incorporate glue into the vertical totals@>; 13196 kern_node: begin x:=x+d+width(p); d:=0; 13197 end; 13198 othercases do_nothing 13199 endcases; 13200 p:=link(p); 13201 end 13202 13203 @ @<Incorporate box dimensions into the dimensions of the vbox...@>= 13204 begin x:=x+d+height(p); d:=depth(p); 13205 if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p); 13206 if width(p)+s>w then w:=width(p)+s; 13207 end 13208 13209 @ @<Incorporate glue into the vertical totals@>= 13210 begin x:=x+d; d:=0;@/ 13211 g:=glue_ptr(p); x:=x+width(g);@/ 13212 o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g); 13213 o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g); 13214 if subtype(p)>=a_leaders then 13215 begin g:=leader_ptr(p); 13216 if width(g)>w then w:=width(g); 13217 end; 13218 end 13219 13220 @ When we get to the present part of the program, |x| is the natural height 13221 of the box being packaged. 13222 13223 @<Determine the value of |height(r)| and the appropriate glue setting...@>= 13224 if m=additional then h:=x+h; 13225 height(r):=h; x:=h-x; {now |x| is the excess to be made up} 13226 if x=0 then 13227 begin glue_sign(r):=normal; glue_order(r):=normal; 13228 set_glue_ratio_zero(glue_set(r)); 13229 return; 13230 end 13231 else if x>0 then @<Determine vertical glue stretch setting, then |return| 13232 or \hbox{|goto common_ending|}@> 13233 else @<Determine vertical glue shrink setting, then |return| 13234 or \hbox{|goto common_ending|}@> 13235 13236 @ @<Determine vertical glue stretch setting...@>= 13237 begin @<Determine the stretch order@>; 13238 glue_order(r):=o; glue_sign(r):=stretching; 13239 if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o]) 13240 @^real division@> 13241 else begin glue_sign(r):=normal; 13242 set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch} 13243 end; 13244 if o=normal then if list_ptr(r)<>null then 13245 @<Report an underfull vbox and |goto common_ending|, if this box 13246 is sufficiently bad@>; 13247 return; 13248 end 13249 13250 @ @<Report an underfull vbox and |goto common_ending|, if...@>= 13251 begin last_badness:=badness(x,total_stretch[normal]); 13252 if last_badness>vbadness then 13253 begin print_ln; 13254 if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose"); 13255 print(" \vbox (badness "); print_int(last_badness); 13256 @.Underfull \\vbox...@> 13257 @.Loose \\vbox...@> 13258 goto common_ending; 13259 end; 13260 end 13261 13262 @ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>= 13263 if output_active then print(") has occurred while \output is active") 13264 else begin if pack_begin_line<>0 then {it's actually negative} 13265 begin print(") in alignment at lines "); 13266 print_int(abs(pack_begin_line)); 13267 print("--"); 13268 end 13269 else print(") detected at line "); 13270 print_int(line); 13271 print_ln;@/ 13272 end; 13273 begin_diagnostic; show_box(r); end_diagnostic(true) 13274 13275 @ @<Determine vertical glue shrink setting...@>= 13276 begin @<Determine the shrink order@>; 13277 glue_order(r):=o; glue_sign(r):=shrinking; 13278 if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o]) 13279 @^real division@> 13280 else begin glue_sign(r):=normal; 13281 set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink} 13282 end; 13283 if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then 13284 begin last_badness:=1000000; 13285 set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage} 13286 @<Report an overfull vbox and |goto common_ending|, if this box 13287 is sufficiently bad@>; 13288 end 13289 else if o=normal then if list_ptr(r)<>null then 13290 @<Report a tight vbox and |goto common_ending|, if this box 13291 is sufficiently bad@>; 13292 return; 13293 end 13294 13295 @ @<Report an overfull vbox and |goto common_ending|, if...@>= 13296 if (-x-total_shrink[normal]>vfuzz)or(vbadness<100) then 13297 begin print_ln; print_nl("Overfull \vbox ("); 13298 @.Overfull \\vbox...@> 13299 print_scaled(-x-total_shrink[normal]); print("pt too high"); 13300 goto common_ending; 13301 end 13302 13303 @ @<Report a tight vbox and |goto common_ending|, if...@>= 13304 begin last_badness:=badness(-x,total_shrink[normal]); 13305 if last_badness>vbadness then 13306 begin print_ln; print_nl("Tight \vbox (badness "); print_int(last_badness); 13307 @.Tight \\vbox...@> 13308 goto common_ending; 13309 end; 13310 end 13311 13312 @ When a box is being appended to the current vertical list, the 13313 baselineskip calculation is handled by the |append_to_vlist| routine. 13314 13315 @p procedure append_to_vlist(@!b:pointer); 13316 var d:scaled; {deficiency of space between baselines} 13317 @!p:pointer; {a new glue node} 13318 begin if prev_depth>ignore_depth then 13319 begin d:=width(baseline_skip)-prev_depth-height(b); 13320 if d<line_skip_limit then p:=new_param_glue(line_skip_code) 13321 else begin p:=new_skip_param(baseline_skip_code); 13322 width(temp_ptr):=d; {|temp_ptr=glue_ptr(p)|} 13323 end; 13324 link(tail):=p; tail:=p; 13325 end; 13326 link(tail):=b; tail:=b; prev_depth:=depth(b); 13327 end; 13328 13329 @* \[34] Data structures for math mode. 13330 When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an 13331 {\sl mlist}, which is essentially a tree structure representing that 13332 formula. An mlist is a linear sequence of items, but we can regard it as 13333 a tree structure because mlists can appear within mlists. For example, many 13334 of the entries can be subscripted or superscripted, and such ``scripts'' 13335 are mlists in their own right. 13336 13337 An entire formula is parsed into such a tree before any of the actual 13338 typesetting is done, because the current style of type is usually not 13339 known until the formula has been fully scanned. For example, when the 13340 formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell 13341 that `\.{a+b}' will be in script size until `\.{\\over}' has appeared. 13342 13343 During the scanning process, each element of the mlist being built is 13344 classified as a relation, a binary operator, an open parenthesis, etc., 13345 or as a construct like `\.{\\sqrt}' that must be built up. This classification 13346 appears in the mlist data structure. 13347 13348 After a formula has been fully scanned, the mlist is converted to an hlist 13349 so that it can be incorporated into the surrounding text. This conversion is 13350 controlled by a recursive procedure that decides all of the appropriate 13351 styles by a ``top-down'' process starting at the outermost level and working 13352 in towards the subformulas. The formula is ultimately pasted together using 13353 combinations of horizontal and vertical boxes, with glue and penalty nodes 13354 inserted as necessary. 13355 13356 An mlist is represented internally as a linked list consisting chiefly 13357 of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat 13358 similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are 13359 allowed to appear in mlists together with the noads; \TeX\ tells the difference 13360 by means of the |type| field, since a noad's |type| is always greater than 13361 that of a node. An mlist does not contain character nodes, hlist nodes, vlist 13362 nodes, math nodes, ligature nodes, 13363 or unset nodes; in particular, each mlist item appears in the 13364 variable-size part of |mem|, so the |type| field is always present. 13365 13366 @ Each noad is four or more words long. The first word contains the |type| 13367 and |subtype| and |link| fields that are already so familiar to us; the 13368 second, third, and fourth words are called the noad's |nucleus|, |subscr|, 13369 and |supscr| fields. 13370 13371 Consider, for example, the simple formula `\.{\$x\^2\$}', which would be 13372 parsed into an mlist containing a single element called an |ord_noad|. 13373 The |nucleus| of this noad is a representation of `\.x', the |subscr| is 13374 empty, and the |supscr| is a representation of `\.2'. 13375 13376 The |nucleus|, |subscr|, and |supscr| fields are further broken into 13377 subfields. If |p| points to a noad, and if |q| is one of its principal 13378 fields (e.g., |q=subscr(p)|), there are several possibilities for the 13379 subfields, depending on the |math_type| of |q|. 13380 13381 \yskip\hang|math_type(q)=math_char| means that |fam(q)| refers to one of 13382 the sixteen font families, and |character(q)| is the number of a character 13383 within a font of that family, as in a character node. 13384 13385 \yskip\hang|math_type(q)=math_text_char| is similar, but the character is 13386 unsubscripted and unsuperscripted and it is followed immediately by another 13387 character from the same font. (This |math_type| setting appears only 13388 briefly during the processing; it is used to suppress unwanted italic 13389 corrections.) 13390 13391 \yskip\hang|math_type(q)=empty| indicates a field with no value (the 13392 corresponding attribute of noad |p| is not present). 13393 13394 \yskip\hang|math_type(q)=sub_box| means that |info(q)| points to a box 13395 node (either an |hlist_node| or a |vlist_node|) that should be used as the 13396 value of the field. The |shift_amount| in the subsidiary box node is the 13397 amount by which that box will be shifted downward. 13398 13399 \yskip\hang|math_type(q)=sub_mlist| means that |info(q)| points to 13400 an mlist; the mlist must be converted to an hlist in order to obtain 13401 the value of this field. 13402 13403 \yskip\noindent In the latter case, we might have |info(q)=null|. This 13404 is not the same as |math_type(q)=empty|; for example, `\.{\$P\_\{\}\$}' 13405 and `\.{\$P\$}' produce different results (the former will not have the 13406 ``italic correction'' added to the width of |P|, but the ``script skip'' 13407 will be added). 13408 13409 The definitions of subfields given here are evidently wasteful of space, 13410 since a halfword is being used for the |math_type| although only three 13411 bits would be needed. However, there are hardly ever many noads present at 13412 once, since they are soon converted to nodes that take up even more space, 13413 so we can afford to represent them in whatever way simplifies the 13414 programming. 13415 13416 @d noad_size=4 {number of words in a normal noad} 13417 @d nucleus(#)==#+1 {the |nucleus| field of a noad} 13418 @d supscr(#)==#+2 {the |supscr| field of a noad} 13419 @d subscr(#)==#+3 {the |subscr| field of a noad} 13420 @d math_type==link {a |halfword| in |mem|} 13421 @d fam==font {a |quarterword| in |mem|} 13422 @d math_char=1 {|math_type| when the attribute is simple} 13423 @d sub_box=2 {|math_type| when the attribute is a box} 13424 @d sub_mlist=3 {|math_type| when the attribute is a formula} 13425 @d math_text_char=4 {|math_type| when italic correction is dubious} 13426 13427 @ Each portion of a formula is classified as Ord, Op, Bin, Rel, Open, 13428 Close, Punct, or Inner, for purposes of spacing and line breaking. An 13429 |ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|, 13430 |punct_noad|, or |inner_noad| is used to represent portions of the various 13431 types. For example, an `\.=' sign in a formula leads to the creation of a 13432 |rel_noad| whose |nucleus| field is a representation of an equals sign 13433 (usually |fam=0|, |character=@'75|). A formula preceded by \.{\\mathrel} 13434 also results in a |rel_noad|. When a |rel_noad| is followed by an 13435 |op_noad|, say, and possibly separated by one or more ordinary nodes (not 13436 noads), \TeX\ will insert a penalty node (with the current |rel_penalty|) 13437 just after the formula that corresponds to the |rel_noad|, unless there 13438 already was a penalty immediately following; and a ``thick space'' will be 13439 inserted just before the formula that corresponds to the |op_noad|. 13440 13441 A noad of type |ord_noad|, |op_noad|, \dots, |inner_noad| usually 13442 has a |subtype=normal|. The only exception is that an |op_noad| might 13443 have |subtype=limits| or |no_limits|, if the normal positioning of 13444 limits has been overridden for this operator. 13445 13446 @d ord_noad=unset_node+3 {|type| of a noad classified Ord} 13447 @d op_noad=ord_noad+1 {|type| of a noad classified Op} 13448 @d bin_noad=ord_noad+2 {|type| of a noad classified Bin} 13449 @d rel_noad=ord_noad+3 {|type| of a noad classified Rel} 13450 @d open_noad=ord_noad+4 {|type| of a noad classified Open} 13451 @d close_noad=ord_noad+5 {|type| of a noad classified Close} 13452 @d punct_noad=ord_noad+6 {|type| of a noad classified Punct} 13453 @d inner_noad=ord_noad+7 {|type| of a noad classified Inner} 13454 @d limits=1 {|subtype| of |op_noad| whose scripts are to be above, below} 13455 @d no_limits=2 {|subtype| of |op_noad| whose scripts are to be normal} 13456 13457 @ A |radical_noad| is five words long; the fifth word is the |left_delimiter| 13458 field, which usually represents a square root sign. 13459 13460 A |fraction_noad| is six words long; it has a |right_delimiter| field 13461 as well as a |left_delimiter|. 13462 13463 Delimiter fields are of type |four_quarters|, and they have four subfields 13464 called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields 13465 represent variable-size delimiters by giving the ``small'' and ``large'' 13466 starting characters, as explained in Chapter~17 of {\sl The \TeX book}. 13467 @:TeXbook}{\sl The \TeX book@> 13468 13469 A |fraction_noad| is actually quite different from all other noads. Not 13470 only does it have six words, it has |thickness|, |denominator|, and 13471 |numerator| fields instead of |nucleus|, |subscr|, and |supscr|. The 13472 |thickness| is a scaled value that tells how thick to make a fraction 13473 rule; however, the special value |default_code| is used to stand for the 13474 |default_rule_thickness| of the current size. The |numerator| and 13475 |denominator| point to mlists that define a fraction; we always have 13476 $$\hbox{|math_type(numerator)=math_type(denominator)=sub_mlist|}.$$ The 13477 |left_delimiter| and |right_delimiter| fields specify delimiters that will 13478 be placed at the left and right of the fraction. In this way, a 13479 |fraction_noad| is able to represent all of \TeX's operators \.{\\over}, 13480 \.{\\atop}, \.{\\above}, \.{\\overwithdelims}, \.{\\atopwithdelims}, and 13481 \.{\\abovewithdelims}. 13482 13483 @d left_delimiter(#)==#+4 {first delimiter field of a noad} 13484 @d right_delimiter(#)==#+5 {second delimiter field of a fraction noad} 13485 @d radical_noad=inner_noad+1 {|type| of a noad for square roots} 13486 @d radical_noad_size=5 {number of |mem| words in a radical noad} 13487 @d fraction_noad=radical_noad+1 {|type| of a noad for generalized fractions} 13488 @d fraction_noad_size=6 {number of |mem| words in a fraction noad} 13489 @d small_fam(#)==mem[#].qqqq.b0 {|fam| for ``small'' delimiter} 13490 @d small_char(#)==mem[#].qqqq.b1 {|character| for ``small'' delimiter} 13491 @d large_fam(#)==mem[#].qqqq.b2 {|fam| for ``large'' delimiter} 13492 @d large_char(#)==mem[#].qqqq.b3 {|character| for ``large'' delimiter} 13493 @d thickness==width {|thickness| field in a fraction noad} 13494 @d default_code==@'10000000000 {denotes |default_rule_thickness|} 13495 @d numerator==supscr {|numerator| field in a fraction noad} 13496 @d denominator==subscr {|denominator| field in a fraction noad} 13497 13498 @ The global variable |empty_field| is set up for initialization of empty 13499 fields in new noads. Similarly, |null_delimiter| is for the initialization 13500 of delimiter fields. 13501 13502 @<Glob...@>= 13503 @!empty_field:two_halves; 13504 @!null_delimiter:four_quarters; 13505 13506 @ @<Set init...@>= 13507 empty_field.rh:=empty; empty_field.lh:=null;@/ 13508 null_delimiter.b0:=0; null_delimiter.b1:=min_quarterword;@/ 13509 null_delimiter.b2:=0; null_delimiter.b3:=min_quarterword; 13510 13511 @ The |new_noad| function creates an |ord_noad| that is completely null. 13512 13513 @p function new_noad:pointer; 13514 var p:pointer; 13515 begin p:=get_node(noad_size); 13516 type(p):=ord_noad; subtype(p):=normal; 13517 mem[nucleus(p)].hh:=empty_field; 13518 mem[subscr(p)].hh:=empty_field; 13519 mem[supscr(p)].hh:=empty_field; 13520 new_noad:=p; 13521 end; 13522 13523 @ A few more kinds of noads will complete the set: An |under_noad| has its 13524 nucleus underlined; an |over_noad| has it overlined. An |accent_noad| places 13525 an accent over its nucleus; the accent character appears as 13526 |fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad| 13527 centers its nucleus vertically with respect to the axis of the formula; 13528 in such noads we always have |math_type(nucleus(p))=sub_box|. 13529 13530 And finally, we have |left_noad| and |right_noad| types, to implement 13531 \TeX's \.{\\left} and \.{\\right}. The |nucleus| of such noads is 13532 replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces 13533 a |left_noad| such that |delimiter(p)| holds the family and character 13534 codes for all left parentheses. A |left_noad| never appears in an mlist 13535 except as the first element, and a |right_noad| never appears in an mlist 13536 except as the last element; furthermore, we either have both a |left_noad| 13537 and a |right_noad|, or neither one is present. The |subscr| and |supscr| 13538 fields are always |empty| in a |left_noad| and a |right_noad|. 13539 13540 @d under_noad=fraction_noad+1 {|type| of a noad for underlining} 13541 @d over_noad=under_noad+1 {|type| of a noad for overlining} 13542 @d accent_noad=over_noad+1 {|type| of a noad for accented subformulas} 13543 @d accent_noad_size=5 {number of |mem| words in an accent noad} 13544 @d accent_chr(#)==#+4 {the |accent_chr| field of an accent noad} 13545 @d vcenter_noad=accent_noad+1 {|type| of a noad for \.{\\vcenter}} 13546 @d left_noad=vcenter_noad+1 {|type| of a noad for \.{\\left}} 13547 @d right_noad=left_noad+1 {|type| of a noad for \.{\\right}} 13548 @d delimiter==nucleus {|delimiter| field in left and right noads} 13549 @d scripts_allowed(#)==(type(#)>=ord_noad)and(type(#)<left_noad) 13550 13551 @ Math formulas can also contain instructions like \.{\\textstyle} that 13552 override \TeX's normal style rules. A |style_node| is inserted into the 13553 data structure to record such instructions; it is three words long, so it 13554 is considered a node instead of a noad. The |subtype| is either |display_style| 13555 or |text_style| or |script_style| or |script_script_style|. The 13556 second and third words of a |style_node| are not used, but they are 13557 present because a |choice_node| is converted to a |style_node|. 13558 13559 \TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles 13560 |display_style|, \dots, |script_script_style|, and adds~1 to get the 13561 ``cramped'' versions of these styles. This gives a numerical order that 13562 is backwards from the convention of Appendix~G in {\sl The \TeX book\/}; 13563 i.e., a smaller style has a larger numerical value. 13564 @:TeXbook}{\sl The \TeX book@> 13565 13566 @d style_node=unset_node+1 {|type| of a style node} 13567 @d style_node_size=3 {number of words in a style node} 13568 @d display_style=0 {|subtype| for \.{\\displaystyle}} 13569 @d text_style=2 {|subtype| for \.{\\textstyle}} 13570 @d script_style=4 {|subtype| for \.{\\scriptstyle}} 13571 @d script_script_style=6 {|subtype| for \.{\\scriptscriptstyle}} 13572 @d cramped=1 {add this to an uncramped style if you want to cramp it} 13573 13574 @p function new_style(@!s:small_number):pointer; {create a style node} 13575 var p:pointer; {the new node} 13576 begin p:=get_node(style_node_size); type(p):=style_node; 13577 subtype(p):=s; width(p):=0; depth(p):=0; {the |width| and |depth| are not used} 13578 new_style:=p; 13579 end; 13580 13581 @ Finally, the \.{\\mathchoice} primitive creates a |choice_node|, which 13582 has special subfields |display_mlist|, |text_mlist|, |script_mlist|, 13583 and |script_script_mlist| pointing to the mlists for each style. 13584 13585 @d choice_node=unset_node+2 {|type| of a choice node} 13586 @d display_mlist(#)==info(#+1) {mlist to be used in display style} 13587 @d text_mlist(#)==link(#+1) {mlist to be used in text style} 13588 @d script_mlist(#)==info(#+2) {mlist to be used in script style} 13589 @d script_script_mlist(#)==link(#+2) {mlist to be used in scriptscript style} 13590 13591 @p function new_choice:pointer; {create a choice node} 13592 var p:pointer; {the new node} 13593 begin p:=get_node(style_node_size); type(p):=choice_node; 13594 subtype(p):=0; {the |subtype| is not used} 13595 display_mlist(p):=null; text_mlist(p):=null; script_mlist(p):=null; 13596 script_script_mlist(p):=null; 13597 new_choice:=p; 13598 end; 13599 13600 @ Let's consider now the previously unwritten part of |show_node_list| 13601 that displays the things that can only be present in mlists; this 13602 program illustrates how to access the data structures just defined. 13603 13604 In the context of the following program, |p| points to a node or noad that 13605 should be displayed, and the current string contains the ``recursion history'' 13606 that leads to this point. The recursion history consists of a dot for each 13607 outer level in which |p| is subsidiary to some node, or in which |p| is 13608 subsidiary to the |nucleus| field of some noad; the dot is replaced by 13609 `\.\_' or `\.\^' or `\./' or `\.\\' if |p| is descended from the |subscr| 13610 or |supscr| or |denominator| or |numerator| fields of noads. For example, 13611 the current string would be `\.{.\^.\_/}' if |p| points to the |ord_noad| for 13612 |x| in the (ridiculous) formula 13613 `\.{\$\\sqrt\{a\^\{\\mathinner\{b\_\{c\\over x+y\}\}\}\}\$}'. 13614 13615 @<Cases of |show_node_list| that arise...@>= 13616 style_node:print_style(subtype(p)); 13617 choice_node:@<Display choice node |p|@>; 13618 ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad, 13619 radical_noad,over_noad,under_noad,vcenter_noad,accent_noad, 13620 left_noad,right_noad:@<Display normal noad |p|@>; 13621 fraction_noad:@<Display fraction noad |p|@>; 13622 13623 @ Here are some simple routines used in the display of noads. 13624 13625 @<Declare procedures needed for displaying the elements of mlists@>= 13626 procedure print_fam_and_char(@!p:pointer); {prints family and character} 13627 begin print_esc("fam"); print_int(fam(p)); print_char(" "); 13628 print_ASCII(qo(character(p))); 13629 end; 13630 @# 13631 procedure print_delimiter(@!p:pointer); {prints a delimiter as 24-bit hex value} 13632 var a:integer; {accumulator} 13633 begin a:=small_fam(p)*256+qo(small_char(p)); 13634 a:=a*@"1000+large_fam(p)*256+qo(large_char(p)); 13635 if a<0 then print_int(a) {this should never happen} 13636 else print_hex(a); 13637 end; 13638 13639 @ The next subroutine will descend to another level of recursion when a 13640 subsidiary mlist needs to be displayed. The parameter |c| indicates what 13641 character is to become part of the recursion history. An empty mlist is 13642 distinguished from a field with |math_type(p)=empty|, because these are 13643 not equivalent (as explained above). 13644 @^recursion@> 13645 13646 @<Declare procedures needed for displaying...@>= 13647 procedure@?show_info; forward;@t\2@>@?{|show_node_list(info(temp_ptr))|} 13648 procedure print_subsidiary_data(@!p:pointer;@!c:ASCII_code); 13649 {display a noad field} 13650 begin if cur_length>=depth_threshold then 13651 begin if math_type(p)<>empty then print(" []"); 13652 end 13653 else begin append_char(c); {include |c| in the recursion history} 13654 temp_ptr:=p; {prepare for |show_info| if recursion is needed} 13655 case math_type(p) of 13656 math_char: begin print_ln; print_current_string; print_fam_and_char(p); 13657 end; 13658 sub_box: show_info; {recursive call} 13659 sub_mlist: if info(p)=null then 13660 begin print_ln; print_current_string; print("{}"); 13661 end 13662 else show_info; {recursive call} 13663 othercases do_nothing {|empty|} 13664 endcases;@/ 13665 flush_char; {remove |c| from the recursion history} 13666 end; 13667 end; 13668 13669 @ The inelegant introduction of |show_info| in the code above seems better 13670 than the alternative of using \PASCAL's strange |forward| declaration for a 13671 procedure with parameters. The \PASCAL\ convention about dropping parameters 13672 from a post-|forward| procedure is, frankly, so intolerable to the author 13673 of \TeX\ that he would rather stoop to communication via a global temporary 13674 variable. (A similar stoopidity occurred with respect to |hlist_out| and 13675 |vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.) 13676 @^Knuth, Donald Ervin@> 13677 @:PASCAL}{\PASCAL@> 13678 13679 @p procedure show_info; {the reader will kindly forgive this} 13680 begin show_node_list(info(temp_ptr)); 13681 end; 13682 13683 @ @<Declare procedures needed for displaying...@>= 13684 procedure print_style(@!c:integer); 13685 begin case c div 2 of 13686 0: print_esc("displaystyle"); {|display_style=0|} 13687 1: print_esc("textstyle"); {|text_style=2|} 13688 2: print_esc("scriptstyle"); {|script_style=4|} 13689 3: print_esc("scriptscriptstyle"); {|script_script_style=6|} 13690 othercases print("Unknown style!") 13691 endcases; 13692 end; 13693 13694 @ @<Display choice node |p|@>= 13695 begin print_esc("mathchoice"); 13696 append_char("D"); show_node_list(display_mlist(p)); flush_char; 13697 append_char("T"); show_node_list(text_mlist(p)); flush_char; 13698 append_char("S"); show_node_list(script_mlist(p)); flush_char; 13699 append_char("s"); show_node_list(script_script_mlist(p)); flush_char; 13700 end 13701 13702 @ @<Display normal noad |p|@>= 13703 begin case type(p) of 13704 ord_noad: print_esc("mathord"); 13705 op_noad: print_esc("mathop"); 13706 bin_noad: print_esc("mathbin"); 13707 rel_noad: print_esc("mathrel"); 13708 open_noad: print_esc("mathopen"); 13709 close_noad: print_esc("mathclose"); 13710 punct_noad: print_esc("mathpunct"); 13711 inner_noad: print_esc("mathinner"); 13712 over_noad: print_esc("overline"); 13713 under_noad: print_esc("underline"); 13714 vcenter_noad: print_esc("vcenter"); 13715 radical_noad: begin print_esc("radical"); print_delimiter(left_delimiter(p)); 13716 end; 13717 accent_noad: begin print_esc("accent"); print_fam_and_char(accent_chr(p)); 13718 end; 13719 left_noad: begin print_esc("left"); print_delimiter(delimiter(p)); 13720 end; 13721 right_noad: begin print_esc("right"); print_delimiter(delimiter(p)); 13722 end; 13723 end; 13724 if subtype(p)<>normal then 13725 if subtype(p)=limits then print_esc("limits") 13726 else print_esc("nolimits"); 13727 if type(p)<left_noad then print_subsidiary_data(nucleus(p),"."); 13728 print_subsidiary_data(supscr(p),"^"); 13729 print_subsidiary_data(subscr(p),"_"); 13730 end 13731 13732 @ @<Display fraction noad |p|@>= 13733 begin print_esc("fraction, thickness "); 13734 if thickness(p)=default_code then print("= default") 13735 else print_scaled(thickness(p)); 13736 if (small_fam(left_delimiter(p))<>0)or@+ 13737 (small_char(left_delimiter(p))<>min_quarterword)or@| 13738 (large_fam(left_delimiter(p))<>0)or@| 13739 (large_char(left_delimiter(p))<>min_quarterword) then 13740 begin print(", left-delimiter "); print_delimiter(left_delimiter(p)); 13741 end; 13742 if (small_fam(right_delimiter(p))<>0)or@| 13743 (small_char(right_delimiter(p))<>min_quarterword)or@| 13744 (large_fam(right_delimiter(p))<>0)or@| 13745 (large_char(right_delimiter(p))<>min_quarterword) then 13746 begin print(", right-delimiter "); print_delimiter(right_delimiter(p)); 13747 end; 13748 print_subsidiary_data(numerator(p),"\"); 13749 print_subsidiary_data(denominator(p),"/"); 13750 end 13751 13752 @ That which can be displayed can also be destroyed. 13753 13754 @<Cases of |flush_node_list| that arise...@>= 13755 style_node: begin free_node(p,style_node_size); goto done; 13756 end; 13757 choice_node:begin flush_node_list(display_mlist(p)); 13758 flush_node_list(text_mlist(p)); 13759 flush_node_list(script_mlist(p)); 13760 flush_node_list(script_script_mlist(p)); 13761 free_node(p,style_node_size); goto done; 13762 end; 13763 ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad, 13764 radical_noad,over_noad,under_noad,vcenter_noad,accent_noad:@t@>@;@/ 13765 begin if math_type(nucleus(p))>=sub_box then 13766 flush_node_list(info(nucleus(p))); 13767 if math_type(supscr(p))>=sub_box then 13768 flush_node_list(info(supscr(p))); 13769 if math_type(subscr(p))>=sub_box then 13770 flush_node_list(info(subscr(p))); 13771 if type(p)=radical_noad then free_node(p,radical_noad_size) 13772 else if type(p)=accent_noad then free_node(p,accent_noad_size) 13773 else free_node(p,noad_size); 13774 goto done; 13775 end; 13776 left_noad,right_noad: begin free_node(p,noad_size); goto done; 13777 end; 13778 fraction_noad: begin flush_node_list(info(numerator(p))); 13779 flush_node_list(info(denominator(p))); 13780 free_node(p,fraction_noad_size); goto done; 13781 end; 13782 13783 @* \[35] Subroutines for math mode. 13784 In order to convert mlists to hlists, i.e., noads to nodes, we need several 13785 subroutines that are conveniently dealt with now. 13786 13787 Let us first introduce the macros that make it easy to get at the parameters and 13788 other font information. A size code, which is a multiple of 16, is added to a 13789 family number to get an index into the table of internal font numbers 13790 for each combination of family and size. (Be alert: Size codes get 13791 larger as the type gets smaller.) 13792 13793 @d text_size=0 {size code for the largest size in a family} 13794 @d script_size=16 {size code for the medium size in a family} 13795 @d script_script_size=32 {size code for the smallest size in a family} 13796 13797 @<Basic printing procedures@>= 13798 procedure print_size(@!s:integer); 13799 begin if s=text_size then print_esc("textfont") 13800 else if s=script_size then print_esc("scriptfont") 13801 else print_esc("scriptscriptfont"); 13802 end; 13803 13804 @ Before an mlist is converted to an hlist, \TeX\ makes sure that 13805 the fonts in family~2 have enough parameters to be math-symbol 13806 fonts, and that the fonts in family~3 have enough parameters to be 13807 math-extension fonts. The math-symbol parameters are referred to by using the 13808 following macros, which take a size code as their parameter; for example, 13809 |num1(cur_size)| gives the value of the |num1| parameter for the current size. 13810 @^parameters for symbols@> 13811 @^font parameters@> 13812 13813 @d mathsy_end(#)==fam_fnt(2+#)]].sc 13814 @d mathsy(#)==font_info[#+param_base[mathsy_end 13815 @d math_x_height==mathsy(5) {height of `\.x'} 13816 @d math_quad==mathsy(6) {\.{18mu}} 13817 @d num1==mathsy(8) {numerator shift-up in display styles} 13818 @d num2==mathsy(9) {numerator shift-up in non-display, non-\.{\\atop}} 13819 @d num3==mathsy(10) {numerator shift-up in non-display \.{\\atop}} 13820 @d denom1==mathsy(11) {denominator shift-down in display styles} 13821 @d denom2==mathsy(12) {denominator shift-down in non-display styles} 13822 @d sup1==mathsy(13) {superscript shift-up in uncramped display style} 13823 @d sup2==mathsy(14) {superscript shift-up in uncramped non-display} 13824 @d sup3==mathsy(15) {superscript shift-up in cramped styles} 13825 @d sub1==mathsy(16) {subscript shift-down if superscript is absent} 13826 @d sub2==mathsy(17) {subscript shift-down if superscript is present} 13827 @d sup_drop==mathsy(18) {superscript baseline below top of large box} 13828 @d sub_drop==mathsy(19) {subscript baseline below bottom of large box} 13829 @d delim1==mathsy(20) {size of \.{\\atopwithdelims} delimiters 13830 in display styles} 13831 @d delim2==mathsy(21) {size of \.{\\atopwithdelims} delimiters in non-displays} 13832 @d axis_height==mathsy(22) {height of fraction lines above the baseline} 13833 @d total_mathsy_params=22 13834 13835 @ The math-extension parameters have similar macros, but the size code is 13836 omitted (since it is always |cur_size| when we refer to such parameters). 13837 @^parameters for symbols@> 13838 @^font parameters@> 13839 13840 @d mathex(#)==font_info[#+param_base[fam_fnt(3+cur_size)]].sc 13841 @d default_rule_thickness==mathex(8) {thickness of \.{\\over} bars} 13842 @d big_op_spacing1==mathex(9) {minimum clearance above a displayed op} 13843 @d big_op_spacing2==mathex(10) {minimum clearance below a displayed op} 13844 @d big_op_spacing3==mathex(11) {minimum baselineskip above displayed op} 13845 @d big_op_spacing4==mathex(12) {minimum baselineskip below displayed op} 13846 @d big_op_spacing5==mathex(13) {padding above and below displayed limits} 13847 @d total_mathex_params=13 13848 13849 @ We also need to compute the change in style between mlists and their 13850 subsidiaries. The following macros define the subsidiary style for 13851 an overlined nucleus (|cramped_style|), for a subscript or a superscript 13852 (|sub_style| or |sup_style|), or for a numerator or denominator (|num_style| 13853 or |denom_style|). 13854 13855 @d cramped_style(#)==2*(# div 2)+cramped {cramp the style} 13856 @d sub_style(#)==2*(# div 4)+script_style+cramped {smaller and cramped} 13857 @d sup_style(#)==2*(# div 4)+script_style+(# mod 2) {smaller} 13858 @d num_style(#)==#+2-2*(# div 6) {smaller unless already script-script} 13859 @d denom_style(#)==2*(# div 2)+cramped+2-2*(# div 6) {smaller, cramped} 13860 13861 @ When the style changes, the following piece of program computes associated 13862 information: 13863 13864 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>= 13865 begin if cur_style<script_style then cur_size:=text_size 13866 else cur_size:=16*((cur_style-text_style) div 2); 13867 cur_mu:=x_over_n(math_quad(cur_size),18); 13868 end 13869 13870 @ Here is a function that returns a pointer to a rule node having a given 13871 thickness |t|. The rule will extend horizontally to the boundary of the vlist 13872 that eventually contains it. 13873 13874 @p function fraction_rule(@!t:scaled):pointer; 13875 {construct the bar for a fraction} 13876 var p:pointer; {the new node} 13877 begin p:=new_rule; height(p):=t; depth(p):=0; fraction_rule:=p; 13878 end; 13879 13880 @ The |overbar| function returns a pointer to a vlist box that consists of 13881 a given box |b|, above which has been placed a kern of height |k| under a 13882 fraction rule of thickness |t| under additional space of height |t|. 13883 13884 @p function overbar(@!b:pointer;@!k,@!t:scaled):pointer; 13885 var p,@!q:pointer; {nodes being constructed} 13886 begin p:=new_kern(k); link(p):=b; q:=fraction_rule(t); link(q):=p; 13887 p:=new_kern(t); link(p):=q; overbar:=vpack(p,natural); 13888 end; 13889 13890 @ The |var_delimiter| function, which finds or constructs a sufficiently 13891 large delimiter, is the most interesting of the auxiliary functions that 13892 currently concern us. Given a pointer |d| to a delimiter field in some noad, 13893 together with a size code |s| and a vertical distance |v|, this function 13894 returns a pointer to a box that contains the smallest variant of |d| whose 13895 height plus depth is |v| or more. (And if no variant is large enough, it 13896 returns the largest available variant.) In particular, this routine will 13897 construct arbitrarily large delimiters from extensible components, if 13898 |d| leads to such characters. 13899 13900 The value returned is a box whose |shift_amount| has been set so that 13901 the box is vertically centered with respect to the axis in the given size. 13902 If a built-up symbol is returned, the height of the box before shifting 13903 will be the height of its topmost component. 13904 13905 @p@t\4@>@<Declare subprocedures for |var_delimiter|@> 13906 function var_delimiter(@!d:pointer;@!s:small_number;@!v:scaled):pointer; 13907 label found,continue; 13908 var b:pointer; {the box that will be constructed} 13909 @!f,@!g: internal_font_number; {best-so-far and tentative font codes} 13910 @!c,@!x,@!y: quarterword; {best-so-far and tentative character codes} 13911 @!m,@!n: integer; {the number of extensible pieces} 13912 @!u: scaled; {height-plus-depth of a tentative character} 13913 @!w: scaled; {largest height-plus-depth so far} 13914 @!q: four_quarters; {character info} 13915 @!hd: eight_bits; {height-depth byte} 13916 @!r: four_quarters; {extensible pieces} 13917 @!z: small_number; {runs through font family members} 13918 @!large_attempt: boolean; {are we trying the ``large'' variant?} 13919 begin f:=null_font; w:=0; large_attempt:=false; 13920 z:=small_fam(d); x:=small_char(d); 13921 loop@+ begin @<Look at the variants of |(z,x)|; set |f| and |c| whenever 13922 a better character is found; |goto found| as soon as a 13923 large enough variant is encountered@>; 13924 if large_attempt then goto found; {there were none large enough} 13925 large_attempt:=true; z:=large_fam(d); x:=large_char(d); 13926 end; 13927 found: if f<>null_font then 13928 @<Make variable |b| point to a box for |(f,c)|@> 13929 else begin b:=new_null_box; 13930 width(b):=null_delimiter_space; {use this width if no delimiter was found} 13931 end; 13932 shift_amount(b):=half(height(b)-depth(b)) - axis_height(s); 13933 var_delimiter:=b; 13934 end; 13935 13936 @ The search process is complicated slightly by the facts that some of the 13937 characters might not be present in some of the fonts, and they might not 13938 be probed in increasing order of height. 13939 13940 @<Look at the variants of |(z,x)|; set |f| and |c|...@>= 13941 if (z<>0)or(x<>min_quarterword) then 13942 begin z:=z+s+16; 13943 repeat z:=z-16; g:=fam_fnt(z); 13944 if g<>null_font then 13945 @<Look at the list of characters starting with |x| in 13946 font |g|; set |f| and |c| whenever 13947 a better character is found; |goto found| as soon as a 13948 large enough variant is encountered@>; 13949 until z<16; 13950 end 13951 13952 @ @<Look at the list of characters starting with |x|...@>= 13953 begin y:=x; 13954 if (qo(y)>=font_bc[g])and(qo(y)<=font_ec[g]) then 13955 begin continue: q:=char_info(g)(y); 13956 if char_exists(q) then 13957 begin if char_tag(q)=ext_tag then 13958 begin f:=g; c:=y; goto found; 13959 end; 13960 hd:=height_depth(q); 13961 u:=char_height(g)(hd)+char_depth(g)(hd); 13962 if u>w then 13963 begin f:=g; c:=y; w:=u; 13964 if u>=v then goto found; 13965 end; 13966 if char_tag(q)=list_tag then 13967 begin y:=rem_byte(q); goto continue; 13968 end; 13969 end; 13970 end; 13971 end 13972 13973 @ Here is a subroutine that creates a new box, whose list contains a 13974 single character, and whose width includes the italic correction for 13975 that character. The height or depth of the box will be negative, if 13976 the height or depth of the character is negative; thus, this routine 13977 may deliver a slightly different result than |hpack| would produce. 13978 13979 @<Declare subprocedures for |var_delimiter|@>= 13980 function char_box(@!f:internal_font_number;@!c:quarterword):pointer; 13981 var q:four_quarters; 13982 @!hd:eight_bits; {|height_depth| byte} 13983 @!b,@!p:pointer; {the new box and its character node} 13984 begin q:=char_info(f)(c); hd:=height_depth(q); 13985 b:=new_null_box; width(b):=char_width(f)(q)+char_italic(f)(q); 13986 height(b):=char_height(f)(hd); depth(b):=char_depth(f)(hd); 13987 p:=get_avail; character(p):=c; font(p):=f; list_ptr(b):=p; char_box:=b; 13988 end; 13989 13990 @ When the following code is executed, |char_tag(q)| will be equal to 13991 |ext_tag| if and only if a built-up symbol is supposed to be returned. 13992 13993 @<Make variable |b| point to a box for |(f,c)|@>= 13994 if char_tag(q)=ext_tag then 13995 @<Construct an extensible character in a new box |b|, 13996 using recipe |rem_byte(q)| and font |f|@> 13997 else b:=char_box(f,c) 13998 13999 @ When we build an extensible character, it's handy to have the 14000 following subroutine, which puts a given character on top 14001 of the characters already in box |b|: 14002 14003 @<Declare subprocedures for |var_delimiter|@>= 14004 procedure stack_into_box(@!b:pointer;@!f:internal_font_number; 14005 @!c:quarterword); 14006 var p:pointer; {new node placed into |b|} 14007 begin p:=char_box(f,c); link(p):=list_ptr(b); list_ptr(b):=p; 14008 height(b):=height(p); 14009 end; 14010 14011 @ Another handy subroutine computes the height plus depth of 14012 a given character: 14013 14014 @<Declare subprocedures for |var_delimiter|@>= 14015 function height_plus_depth(@!f:internal_font_number;@!c:quarterword):scaled; 14016 var q:four_quarters; 14017 @!hd:eight_bits; {|height_depth| byte} 14018 begin q:=char_info(f)(c); hd:=height_depth(q); 14019 height_plus_depth:=char_height(f)(hd)+char_depth(f)(hd); 14020 end; 14021 14022 @ @<Construct an extensible...@>= 14023 begin b:=new_null_box; 14024 type(b):=vlist_node; 14025 r:=font_info[exten_base[f]+rem_byte(q)].qqqq;@/ 14026 @<Compute the minimum suitable height, |w|, and the corresponding 14027 number of extension steps, |n|; also set |width(b)|@>; 14028 c:=ext_bot(r); 14029 if c<>min_quarterword then stack_into_box(b,f,c); 14030 c:=ext_rep(r); 14031 for m:=1 to n do stack_into_box(b,f,c); 14032 c:=ext_mid(r); 14033 if c<>min_quarterword then 14034 begin stack_into_box(b,f,c); c:=ext_rep(r); 14035 for m:=1 to n do stack_into_box(b,f,c); 14036 end; 14037 c:=ext_top(r); 14038 if c<>min_quarterword then stack_into_box(b,f,c); 14039 depth(b):=w-height(b); 14040 end 14041 14042 @ The width of an extensible character is the width of the repeatable 14043 module. If this module does not have positive height plus depth, 14044 we don't use any copies of it, otherwise we use as few as possible 14045 (in groups of two if there is a middle part). 14046 14047 @<Compute the minimum suitable height, |w|, and...@>= 14048 c:=ext_rep(r); u:=height_plus_depth(f,c); 14049 w:=0; q:=char_info(f)(c); width(b):=char_width(f)(q)+char_italic(f)(q);@/ 14050 c:=ext_bot(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14051 c:=ext_mid(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14052 c:=ext_top(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14053 n:=0; 14054 if u>0 then while w<v do 14055 begin w:=w+u; incr(n); 14056 if ext_mid(r)<>min_quarterword then w:=w+u; 14057 end 14058 14059 @ The next subroutine is much simpler; it is used for numerators and 14060 denominators of fractions as well as for displayed operators and 14061 their limits above and below. It takes a given box~|b| and 14062 changes it so that the new box is centered in a box of width~|w|. 14063 The centering is done by putting \.{\\hss} glue at the left and right 14064 of the list inside |b|, then packaging the new box; thus, the 14065 actual box might not really be centered, if it already contains 14066 infinite glue. 14067 14068 The given box might contain a single character whose italic correction 14069 has been added to the width of the box; in this case a compensating 14070 kern is inserted. 14071 14072 @p function rebox(@!b:pointer;@!w:scaled):pointer; 14073 var p:pointer; {temporary register for list manipulation} 14074 @!f:internal_font_number; {font in a one-character box} 14075 @!v:scaled; {width of a character without italic correction} 14076 begin if (width(b)<>w)and(list_ptr(b)<>null) then 14077 begin if type(b)=vlist_node then b:=hpack(b,natural); 14078 p:=list_ptr(b); 14079 if (is_char_node(p))and(link(p)=null) then 14080 begin f:=font(p); v:=char_width(f)(char_info(f)(character(p))); 14081 if v<>width(b) then link(p):=new_kern(width(b)-v); 14082 end; 14083 free_node(b,box_node_size); 14084 b:=new_glue(ss_glue); link(b):=p; 14085 while link(p)<>null do p:=link(p); 14086 link(p):=new_glue(ss_glue); 14087 rebox:=hpack(b,w,exactly); 14088 end 14089 else begin width(b):=w; rebox:=b; 14090 end; 14091 end; 14092 14093 @ Here is a subroutine that creates a new glue specification from another 14094 one that is expressed in `\.{mu}', given the value of the math unit. 14095 14096 @d mu_mult(#)==nx_plus_y(n,#,xn_over_d(#,f,@'200000)) 14097 14098 @p function math_glue(@!g:pointer;@!m:scaled):pointer; 14099 var p:pointer; {the new glue specification} 14100 @!n:integer; {integer part of |m|} 14101 @!f:scaled; {fraction part of |m|} 14102 begin n:=x_over_n(m,@'200000); f:=remainder;@/ 14103 if f<0 then 14104 begin decr(n); f:=f+@'200000; 14105 end; 14106 p:=get_node(glue_spec_size); 14107 width(p):=mu_mult(width(g)); {convert \.{mu} to \.{pt}} 14108 stretch_order(p):=stretch_order(g); 14109 if stretch_order(p)=normal then stretch(p):=mu_mult(stretch(g)) 14110 else stretch(p):=stretch(g); 14111 shrink_order(p):=shrink_order(g); 14112 if shrink_order(p)=normal then shrink(p):=mu_mult(shrink(g)) 14113 else shrink(p):=shrink(g); 14114 math_glue:=p; 14115 end; 14116 14117 @ The |math_kern| subroutine removes |mu_glue| from a kern node, given 14118 the value of the math unit. 14119 14120 @p procedure math_kern(@!p:pointer;@!m:scaled); 14121 var @!n:integer; {integer part of |m|} 14122 @!f:scaled; {fraction part of |m|} 14123 begin if subtype(p)=mu_glue then 14124 begin n:=x_over_n(m,@'200000); f:=remainder;@/ 14125 if f<0 then 14126 begin decr(n); f:=f+@'200000; 14127 end; 14128 width(p):=mu_mult(width(p)); subtype(p):=explicit; 14129 end; 14130 end; 14131 14132 @ Sometimes it is necessary to destroy an mlist. The following 14133 subroutine empties the current list, assuming that |abs(mode)=mmode|. 14134 14135 @p procedure flush_math; 14136 begin flush_node_list(link(head)); flush_node_list(incompleat_noad); 14137 link(head):=null; tail:=head; incompleat_noad:=null; 14138 end; 14139 14140 @* \[36] Typesetting math formulas. 14141 \TeX's most important routine for dealing with formulas is called 14142 |mlist_to_hlist|. After a formula has been scanned and represented as an 14143 mlist, this routine converts it to an hlist that can be placed into a box 14144 or incorporated into the text of a paragraph. There are three implicit 14145 parameters, passed in global variables: |cur_mlist| points to the first 14146 node or noad in the given mlist (and it might be |null|); |cur_style| is a 14147 style code; and |mlist_penalties| is |true| if penalty nodes for potential 14148 line breaks are to be inserted into the resulting hlist. After 14149 |mlist_to_hlist| has acted, |link(temp_head)| points to the translated hlist. 14150 14151 Since mlists can be inside mlists, the procedure is recursive. And since this 14152 is not part of \TeX's inner loop, the program has been written in a manner 14153 that stresses compactness over efficiency. 14154 @^recursion@> 14155 14156 @<Glob...@>= 14157 @!cur_mlist:pointer; {beginning of mlist to be translated} 14158 @!cur_style:small_number; {style code at current place in the list} 14159 @!cur_size:small_number; {size code corresponding to |cur_style|} 14160 @!cur_mu:scaled; {the math unit width corresponding to |cur_size|} 14161 @!mlist_penalties:boolean; {should |mlist_to_hlist| insert penalties?} 14162 14163 @ The recursion in |mlist_to_hlist| is due primarily to a subroutine 14164 called |clean_box| that puts a given noad field into a box using a given 14165 math style; |mlist_to_hlist| can call |clean_box|, which can call 14166 |mlist_to_hlist|. 14167 @^recursion@> 14168 14169 The box returned by |clean_box| is ``clean'' in the 14170 sense that its |shift_amount| is zero. 14171 14172 @p procedure@?mlist_to_hlist; forward;@t\2@>@/ 14173 function clean_box(@!p:pointer;@!s:small_number):pointer; 14174 label found; 14175 var q:pointer; {beginning of a list to be boxed} 14176 @!save_style:small_number; {|cur_style| to be restored} 14177 @!x:pointer; {box to be returned} 14178 @!r:pointer; {temporary pointer} 14179 begin case math_type(p) of 14180 math_char: begin cur_mlist:=new_noad; mem[nucleus(cur_mlist)]:=mem[p]; 14181 end; 14182 sub_box: begin q:=info(p); goto found; 14183 end; 14184 sub_mlist: cur_mlist:=info(p); 14185 othercases begin q:=new_null_box; goto found; 14186 end 14187 endcases;@/ 14188 save_style:=cur_style; cur_style:=s; mlist_penalties:=false;@/ 14189 mlist_to_hlist; q:=link(temp_head); {recursive call} 14190 cur_style:=save_style; {restore the style} 14191 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14192 found: if is_char_node(q)or(q=null) then x:=hpack(q,natural) 14193 else if (link(q)=null)and(type(q)<=vlist_node)and(shift_amount(q)=0) then 14194 x:=q {it's already clean} 14195 else x:=hpack(q,natural); 14196 @<Simplify a trivial box@>; 14197 clean_box:=x; 14198 end; 14199 14200 @ Here we save memory space in a common case. 14201 14202 @<Simplify a trivial box@>= 14203 q:=list_ptr(x); 14204 if is_char_node(q) then 14205 begin r:=link(q); 14206 if r<>null then if link(r)=null then if not is_char_node(r) then 14207 if type(r)=kern_node then {unneeded italic correction} 14208 begin free_node(r,small_node_size); link(q):=null; 14209 end; 14210 end 14211 14212 @ It is convenient to have a procedure that converts a |math_char| 14213 field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|, 14214 and |cur_i| to the font code, character code, and character information bytes of 14215 a given noad field. It also takes care of issuing error messages for 14216 nonexistent characters; in such cases, |char_exists(cur_i)| will be |false| 14217 after |fetch| has acted, and the field will also have been reset to |empty|. 14218 14219 @p procedure fetch(@!a:pointer); {unpack the |math_char| field |a|} 14220 begin cur_c:=character(a); cur_f:=fam_fnt(fam(a)+cur_size); 14221 if cur_f=null_font then 14222 @<Complain about an undefined family and set |cur_i| null@> 14223 else begin if (qo(cur_c)>=font_bc[cur_f])and(qo(cur_c)<=font_ec[cur_f]) then 14224 cur_i:=char_info(cur_f)(cur_c) 14225 else cur_i:=null_character; 14226 if not(char_exists(cur_i)) then 14227 begin char_warning(cur_f,qo(cur_c)); 14228 math_type(a):=empty; cur_i:=null_character; 14229 end; 14230 end; 14231 end; 14232 14233 @ @<Complain about an undefined family...@>= 14234 begin print_err(""); print_size(cur_size); print_char(" "); 14235 print_int(fam(a)); print(" is undefined (character "); 14236 print_ASCII(qo(cur_c)); print_char(")"); 14237 help4("Somewhere in the math formula just ended, you used the")@/ 14238 ("stated character from an undefined font family. For example,")@/ 14239 ("plain TeX doesn't allow \it or \sl in subscripts. Proceed,")@/ 14240 ("and I'll try to forget that I needed that character."); 14241 error; cur_i:=null_character; math_type(a):=empty; 14242 end 14243 14244 @ The outputs of |fetch| are placed in global variables. 14245 14246 @<Glob...@>= 14247 @!cur_f:internal_font_number; {the |font| field of a |math_char|} 14248 @!cur_c:quarterword; {the |character| field of a |math_char|} 14249 @!cur_i:four_quarters; {the |char_info| of a |math_char|, 14250 or a lig/kern instruction} 14251 14252 @ We need to do a lot of different things, so |mlist_to_hlist| makes two 14253 passes over the given mlist. 14254 14255 The first pass does most of the processing: It removes ``mu'' spacing from 14256 glue, it recursively evaluates all subsidiary mlists so that only the 14257 top-level mlist remains to be handled, it puts fractions and square roots 14258 and such things into boxes, it attaches subscripts and superscripts, and 14259 it computes the overall height and depth of the top-level mlist so that 14260 the size of delimiters for a |left_noad| and a |right_noad| will be known. 14261 The hlist resulting from each noad is recorded in that noad's |new_hlist| 14262 field, an integer field that replaces the |nucleus| or |thickness|. 14263 @^recursion@> 14264 14265 The second pass eliminates all noads and inserts the correct glue and 14266 penalties between nodes. 14267 14268 @d new_hlist(#)==mem[nucleus(#)].int {the translation of an mlist} 14269 14270 @ Here is the overall plan of |mlist_to_hlist|, and the list of its 14271 local variables. 14272 14273 @d done_with_noad=80 {go here when a noad has been fully translated} 14274 @d done_with_node=81 {go here when a node has been fully converted} 14275 @d check_dimensions=82 {go here to update |max_h| and |max_d|} 14276 @d delete_q=83 {go here to delete |q| and move to the next node} 14277 14278 @p@t\4@>@<Declare math construction procedures@> 14279 procedure mlist_to_hlist; 14280 label reswitch, check_dimensions, done_with_noad, done_with_node, delete_q, 14281 done; 14282 var mlist:pointer; {beginning of the given list} 14283 @!penalties:boolean; {should penalty nodes be inserted?} 14284 @!style:small_number; {the given style} 14285 @!save_style:small_number; {holds |cur_style| during recursion} 14286 @!q:pointer; {runs through the mlist} 14287 @!r:pointer; {the most recent noad preceding |q|} 14288 @!r_type:small_number; {the |type| of noad |r|, or |op_noad| if |r=null|} 14289 @!t:small_number; {the effective |type| of noad |q| during the second pass} 14290 @!p,@!x,@!y,@!z: pointer; {temporary registers for list construction} 14291 @!pen:integer; {a penalty to be inserted} 14292 @!s:small_number; {the size of a noad to be deleted} 14293 @!max_h,@!max_d:scaled; {maximum height and depth of the list translated so far} 14294 @!delta:scaled; {offset between subscript and superscript} 14295 begin mlist:=cur_mlist; penalties:=mlist_penalties; 14296 style:=cur_style; {tuck global parameters away as local variables} 14297 q:=mlist; r:=null; r_type:=op_noad; max_h:=0; max_d:=0; 14298 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14299 while q<>null do @<Process node-or-noad |q| as much as possible in preparation 14300 for the second pass of |mlist_to_hlist|, then move to the next 14301 item in the mlist@>; 14302 @<Convert \(a)a final |bin_noad| to an |ord_noad|@>; 14303 @<Make a second pass over the mlist, removing all noads and inserting the 14304 proper spacing and penalties@>; 14305 end; 14306 14307 @ We use the fact that no character nodes appear in an mlist, hence 14308 the field |type(q)| is always present. 14309 14310 @<Process node-or-noad...@>= 14311 begin @<Do first-pass processing based on |type(q)|; |goto done_with_noad| 14312 if a noad has been fully processed, |goto check_dimensions| if it 14313 has been translated into |new_hlist(q)|, or |goto done_with_node| 14314 if a node has been fully processed@>; 14315 check_dimensions: z:=hpack(new_hlist(q),natural); 14316 if height(z)>max_h then max_h:=height(z); 14317 if depth(z)>max_d then max_d:=depth(z); 14318 free_node(z,box_node_size); 14319 done_with_noad: r:=q; r_type:=type(r); 14320 done_with_node: q:=link(q); 14321 end 14322 14323 @ One of the things we must do on the first pass is change a |bin_noad| to 14324 an |ord_noad| if the |bin_noad| is not in the context of a binary operator. 14325 The values of |r| and |r_type| make this fairly easy. 14326 14327 @<Do first-pass processing...@>= 14328 reswitch: delta:=0; 14329 case type(q) of 14330 bin_noad: case r_type of 14331 bin_noad,op_noad,rel_noad,open_noad,punct_noad,left_noad: 14332 begin type(q):=ord_noad; goto reswitch; 14333 end; 14334 othercases do_nothing 14335 endcases; 14336 rel_noad,close_noad,punct_noad,right_noad: begin@t@>@;@/ 14337 @<Convert \(a)a final |bin_noad| to an |ord_noad|@>; 14338 if type(q)=right_noad then goto done_with_noad; 14339 end; 14340 @t\4@>@<Cases for noads that can follow a |bin_noad|@>@; 14341 @t\4@>@<Cases for nodes that can appear in an mlist, after which we 14342 |goto done_with_node|@>@; 14343 othercases confusion("mlist1") 14344 @:this can't happen mlist1}{\quad mlist1@> 14345 endcases;@/ 14346 @<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@> 14347 14348 @ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>= 14349 if r_type=bin_noad then type(r):=ord_noad 14350 14351 @ @<Cases for nodes that can appear in an mlist...@>= 14352 style_node: begin cur_style:=subtype(q); 14353 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14354 goto done_with_node; 14355 end; 14356 choice_node: @<Change this node to a style node followed by the correct choice, 14357 then |goto done_with_node|@>; 14358 ins_node,mark_node,adjust_node, 14359 whatsit_node,penalty_node,disc_node: goto done_with_node; 14360 rule_node: begin if height(q)>max_h then max_h:=height(q); 14361 if depth(q)>max_d then max_d:=depth(q); goto done_with_node; 14362 end; 14363 glue_node: begin @<Convert \(m)math glue to ordinary glue@>; 14364 goto done_with_node; 14365 end; 14366 kern_node: begin math_kern(q,cur_mu); goto done_with_node; 14367 end; 14368 14369 @ @d choose_mlist(#)==begin p:=#(q); #(q):=null;@+end 14370 14371 @<Change this node to a style node...@>= 14372 begin case cur_style div 2 of 14373 0: choose_mlist(display_mlist); {|display_style=0|} 14374 1: choose_mlist(text_mlist); {|text_style=2|} 14375 2: choose_mlist(script_mlist); {|script_style=4|} 14376 3: choose_mlist(script_script_mlist); {|script_script_style=6|} 14377 end; {there are no other cases} 14378 flush_node_list(display_mlist(q)); 14379 flush_node_list(text_mlist(q)); 14380 flush_node_list(script_mlist(q)); 14381 flush_node_list(script_script_mlist(q));@/ 14382 type(q):=style_node; subtype(q):=cur_style; width(q):=0; depth(q):=0; 14383 if p<>null then 14384 begin z:=link(q); link(q):=p; 14385 while link(p)<>null do p:=link(p); 14386 link(p):=z; 14387 end; 14388 goto done_with_node; 14389 end 14390 14391 @ Conditional math glue (`\.{\\nonscript}') results in a |glue_node| 14392 pointing to |zero_glue|, with |subtype(q)=cond_math_glue|; in such a case 14393 the node following will be eliminated if it is a glue or kern node and if the 14394 current size is different from |text_size|. Unconditional math glue 14395 (`\.{\\muskip}') is converted to normal glue by multiplying the dimensions 14396 by |cur_mu|. 14397 @!@:non_script_}{\.{\\nonscript} primitive@> 14398 14399 @<Convert \(m)math glue to ordinary glue@>= 14400 if subtype(q)=mu_glue then 14401 begin x:=glue_ptr(q); 14402 y:=math_glue(x,cur_mu); delete_glue_ref(x); glue_ptr(q):=y; 14403 subtype(q):=normal; 14404 end 14405 else if (cur_size<>text_size)and(subtype(q)=cond_math_glue) then 14406 begin p:=link(q); 14407 if p<>null then if (type(p)=glue_node)or(type(p)=kern_node) then 14408 begin link(q):=link(p); link(p):=null; flush_node_list(p); 14409 end; 14410 end 14411 14412 @ @<Cases for noads that can follow a |bin_noad|@>= 14413 left_noad: goto done_with_noad; 14414 fraction_noad: begin make_fraction(q); goto check_dimensions; 14415 end; 14416 op_noad: begin delta:=make_op(q); 14417 if subtype(q)=limits then goto check_dimensions; 14418 end; 14419 ord_noad: make_ord(q); 14420 open_noad,inner_noad: do_nothing; 14421 radical_noad: make_radical(q); 14422 over_noad: make_over(q); 14423 under_noad: make_under(q); 14424 accent_noad: make_math_accent(q); 14425 vcenter_noad: make_vcenter(q); 14426 14427 @ Most of the actual construction work of |mlist_to_hlist| is done 14428 by procedures with names 14429 like |make_fraction|, |make_radical|, etc. To illustrate 14430 the general setup of such procedures, let's begin with a couple of 14431 simple ones. 14432 14433 @<Declare math...@>= 14434 procedure make_over(@!q:pointer); 14435 begin info(nucleus(q)):=@| 14436 overbar(clean_box(nucleus(q),cramped_style(cur_style)),@| 14437 3*default_rule_thickness,default_rule_thickness); 14438 math_type(nucleus(q)):=sub_box; 14439 end; 14440 14441 @ @<Declare math...@>= 14442 procedure make_under(@!q:pointer); 14443 var p,@!x,@!y: pointer; {temporary registers for box construction} 14444 @!delta:scaled; {overall height plus depth} 14445 begin x:=clean_box(nucleus(q),cur_style); 14446 p:=new_kern(3*default_rule_thickness); link(x):=p; 14447 link(p):=fraction_rule(default_rule_thickness); 14448 y:=vpack(x,natural); 14449 delta:=height(y)+depth(y)+default_rule_thickness; 14450 height(y):=height(x); depth(y):=delta-height(y); 14451 info(nucleus(q)):=y; math_type(nucleus(q)):=sub_box; 14452 end; 14453 14454 @ @<Declare math...@>= 14455 procedure make_vcenter(@!q:pointer); 14456 var v:pointer; {the box that should be centered vertically} 14457 @!delta:scaled; {its height plus depth} 14458 begin v:=info(nucleus(q)); 14459 if type(v)<>vlist_node then confusion("vcenter"); 14460 @:this can't happen vcenter}{\quad vcenter@> 14461 delta:=height(v)+depth(v); 14462 height(v):=axis_height(cur_size)+half(delta); 14463 depth(v):=delta-height(v); 14464 end; 14465 14466 @ According to the rules in the \.{DVI} file specifications, we ensure alignment 14467 @^square roots@> 14468 between a square root sign and the rule above its nucleus by assuming that the 14469 baseline of the square-root symbol is the same as the bottom of the rule. The 14470 height of the square-root symbol will be the thickness of the rule, and the 14471 depth of the square-root symbol should exceed or equal the height-plus-depth 14472 of the nucleus plus a certain minimum clearance~|clr|. The symbol will be 14473 placed so that the actual clearance is |clr| plus half the excess. 14474 14475 @<Declare math...@>= 14476 procedure make_radical(@!q:pointer); 14477 var x,@!y:pointer; {temporary registers for box construction} 14478 @!delta,@!clr:scaled; {dimensions involved in the calculation} 14479 begin x:=clean_box(nucleus(q),cramped_style(cur_style)); 14480 if cur_style<text_style then {display style} 14481 clr:=default_rule_thickness+(abs(math_x_height(cur_size)) div 4) 14482 else begin clr:=default_rule_thickness; clr:=clr + (abs(clr) div 4); 14483 end; 14484 y:=var_delimiter(left_delimiter(q),cur_size,height(x)+depth(x)+clr+ 14485 default_rule_thickness); 14486 delta:=depth(y)-(height(x)+depth(x)+clr); 14487 if delta>0 then clr:=clr+half(delta); {increase the actual clearance} 14488 shift_amount(y):=-(height(x)+clr); 14489 link(y):=overbar(x,clr,height(y)); 14490 info(nucleus(q)):=hpack(y,natural); math_type(nucleus(q)):=sub_box; 14491 end; 14492 14493 @ Slants are not considered when placing accents in math mode. The accenter is 14494 centered over the accentee, and the accent width is treated as zero with 14495 respect to the size of the final box. 14496 14497 @<Declare math...@>= 14498 procedure make_math_accent(@!q:pointer); 14499 label done,done1; 14500 var p,@!x,@!y:pointer; {temporary registers for box construction} 14501 @!a:integer; {address of lig/kern instruction} 14502 @!c:quarterword; {accent character} 14503 @!f:internal_font_number; {its font} 14504 @!i:four_quarters; {its |char_info|} 14505 @!s:scaled; {amount to skew the accent to the right} 14506 @!h:scaled; {height of character being accented} 14507 @!delta:scaled; {space to remove between accent and accentee} 14508 @!w:scaled; {width of the accentee, not including sub/superscripts} 14509 begin fetch(accent_chr(q)); 14510 if char_exists(cur_i) then 14511 begin i:=cur_i; c:=cur_c; f:=cur_f;@/ 14512 @<Compute the amount of skew@>; 14513 x:=clean_box(nucleus(q),cramped_style(cur_style)); w:=width(x); h:=height(x); 14514 @<Switch to a larger accent if available and appropriate@>; 14515 if h<x_height(f) then delta:=h@+else delta:=x_height(f); 14516 if (math_type(supscr(q))<>empty)or(math_type(subscr(q))<>empty) then 14517 if math_type(nucleus(q))=math_char then 14518 @<Swap the subscript and superscript into box |x|@>; 14519 y:=char_box(f,c); 14520 shift_amount(y):=s+half(w-width(y)); 14521 width(y):=0; p:=new_kern(-delta); link(p):=x; link(y):=p; 14522 y:=vpack(y,natural); width(y):=width(x); 14523 if height(y)<h then @<Make the height of box |y| equal to |h|@>; 14524 info(nucleus(q)):=y; 14525 math_type(nucleus(q)):=sub_box; 14526 end; 14527 end; 14528 14529 @ @<Make the height of box |y|...@>= 14530 begin p:=new_kern(h-height(y)); link(p):=list_ptr(y); list_ptr(y):=p; 14531 height(y):=h; 14532 end 14533 14534 @ @<Switch to a larger accent if available and appropriate@>= 14535 loop@+ begin if char_tag(i)<>list_tag then goto done; 14536 y:=rem_byte(i); 14537 i:=char_info(f)(y); 14538 if not char_exists(i) then goto done; 14539 if char_width(f)(i)>w then goto done; 14540 c:=y; 14541 end; 14542 done: 14543 14544 @ @<Compute the amount of skew@>= 14545 s:=0; 14546 if math_type(nucleus(q))=math_char then 14547 begin fetch(nucleus(q)); 14548 if char_tag(cur_i)=lig_tag then 14549 begin a:=lig_kern_start(cur_f)(cur_i); 14550 cur_i:=font_info[a].qqqq; 14551 if skip_byte(cur_i)>stop_flag then 14552 begin a:=lig_kern_restart(cur_f)(cur_i); 14553 cur_i:=font_info[a].qqqq; 14554 end; 14555 loop@+ begin if qo(next_char(cur_i))=skew_char[cur_f] then 14556 begin if op_byte(cur_i)>=kern_flag then 14557 if skip_byte(cur_i)<=stop_flag then s:=char_kern(cur_f)(cur_i); 14558 goto done1; 14559 end; 14560 if skip_byte(cur_i)>=stop_flag then goto done1; 14561 a:=a+qo(skip_byte(cur_i))+1; 14562 cur_i:=font_info[a].qqqq; 14563 end; 14564 end; 14565 end; 14566 done1: 14567 14568 @ @<Swap the subscript and superscript into box |x|@>= 14569 begin flush_node_list(x); x:=new_noad; 14570 mem[nucleus(x)]:=mem[nucleus(q)]; 14571 mem[supscr(x)]:=mem[supscr(q)]; 14572 mem[subscr(x)]:=mem[subscr(q)];@/ 14573 mem[supscr(q)].hh:=empty_field; 14574 mem[subscr(q)].hh:=empty_field;@/ 14575 math_type(nucleus(q)):=sub_mlist; info(nucleus(q)):=x; 14576 x:=clean_box(nucleus(q),cur_style); delta:=delta+height(x)-h; h:=height(x); 14577 end 14578 14579 @ The |make_fraction| procedure is a bit different because it sets 14580 |new_hlist(q)| directly rather than making a sub-box. 14581 14582 @<Declare math...@>= 14583 procedure make_fraction(@!q:pointer); 14584 var p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14585 @!delta,@!delta1,@!delta2,@!shift_up,@!shift_down,@!clr:scaled; 14586 {dimensions for box calculations} 14587 begin if thickness(q)=default_code then thickness(q):=default_rule_thickness; 14588 @<Create equal-width boxes |x| and |z| for the numerator and denominator, 14589 and compute the default amounts |shift_up| and |shift_down| by which they 14590 are displaced from the baseline@>; 14591 if thickness(q)=0 then @<Adjust \(s)|shift_up| and |shift_down| for the case 14592 of no fraction line@> 14593 else @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>; 14594 @<Construct a vlist box for the fraction, according to |shift_up| and 14595 |shift_down|@>; 14596 @<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)| 14597 point to it@>; 14598 end; 14599 14600 @ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>= 14601 x:=clean_box(numerator(q),num_style(cur_style)); 14602 z:=clean_box(denominator(q),denom_style(cur_style)); 14603 if width(x)<width(z) then x:=rebox(x,width(z)) 14604 else z:=rebox(z,width(x)); 14605 if cur_style<text_style then {display style} 14606 begin shift_up:=num1(cur_size); shift_down:=denom1(cur_size); 14607 end 14608 else begin shift_down:=denom2(cur_size); 14609 if thickness(q)<>0 then shift_up:=num2(cur_size) 14610 else shift_up:=num3(cur_size); 14611 end 14612 14613 @ The numerator and denominator must be separated by a certain minimum 14614 clearance, called |clr| in the following program. The difference between 14615 |clr| and the actual clearance is twice |delta|. 14616 14617 @<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>= 14618 begin if cur_style<text_style then clr:=7*default_rule_thickness 14619 else clr:=3*default_rule_thickness; 14620 delta:=half(clr-((shift_up-depth(x))-(height(z)-shift_down))); 14621 if delta>0 then 14622 begin shift_up:=shift_up+delta; 14623 shift_down:=shift_down+delta; 14624 end; 14625 end 14626 14627 @ In the case of a fraction line, the minimum clearance depends on the actual 14628 thickness of the line. 14629 14630 @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>= 14631 begin if cur_style<text_style then clr:=3*thickness(q) 14632 else clr:=thickness(q); 14633 delta:=half(thickness(q)); 14634 delta1:=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta)); 14635 delta2:=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down)); 14636 if delta1>0 then shift_up:=shift_up+delta1; 14637 if delta2>0 then shift_down:=shift_down+delta2; 14638 end 14639 14640 @ @<Construct a vlist box for the fraction...@>= 14641 v:=new_null_box; type(v):=vlist_node; 14642 height(v):=shift_up+height(x); depth(v):=depth(z)+shift_down; 14643 width(v):=width(x); {this also equals |width(z)|} 14644 if thickness(q)=0 then 14645 begin p:=new_kern((shift_up-depth(x))-(height(z)-shift_down)); 14646 link(p):=z; 14647 end 14648 else begin y:=fraction_rule(thickness(q));@/ 14649 p:=new_kern((axis_height(cur_size)-delta)-@|(height(z)-shift_down));@/ 14650 link(y):=p; link(p):=z;@/ 14651 p:=new_kern((shift_up-depth(x))-(axis_height(cur_size)+delta)); 14652 link(p):=y; 14653 end; 14654 link(x):=p; list_ptr(v):=x 14655 14656 @ @<Put the \(f)fraction into a box with its delimiters...@>= 14657 if cur_style<text_style then delta:=delim1(cur_size) 14658 else delta:=delim2(cur_size); 14659 x:=var_delimiter(left_delimiter(q), cur_size, delta); link(x):=v;@/ 14660 z:=var_delimiter(right_delimiter(q), cur_size, delta); link(v):=z;@/ 14661 new_hlist(q):=hpack(x,natural) 14662 14663 @ If the nucleus of an |op_noad| is a single character, it is to be 14664 centered vertically with respect to the axis, after first being enlarged 14665 (via a character list in the font) if we are in display style. The normal 14666 convention for placing displayed limits is to put them above and below the 14667 operator in display style. 14668 14669 The italic correction is removed from the character if there is a subscript 14670 and the limits are not being displayed. The |make_op| 14671 routine returns the value that should be used as an offset between 14672 subscript and superscript. 14673 14674 After |make_op| has acted, |subtype(q)| will be |limits| if and only if 14675 the limits have been set above and below the operator. In that case, 14676 |new_hlist(q)| will already contain the desired final box. 14677 14678 @<Declare math...@>= 14679 function make_op(@!q:pointer):scaled; 14680 var delta:scaled; {offset between subscript and superscript} 14681 @!p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14682 @!c:quarterword;@+@!i:four_quarters; {registers for character examination} 14683 @!shift_up,@!shift_down:scaled; {dimensions for box calculation} 14684 begin if (subtype(q)=normal)and(cur_style<text_style) then 14685 subtype(q):=limits; 14686 if math_type(nucleus(q))=math_char then 14687 begin fetch(nucleus(q)); 14688 if (cur_style<text_style)and(char_tag(cur_i)=list_tag) then {make it larger} 14689 begin c:=rem_byte(cur_i); i:=char_info(cur_f)(c); 14690 if char_exists(i) then 14691 begin cur_c:=c; cur_i:=i; character(nucleus(q)):=c; 14692 end; 14693 end; 14694 delta:=char_italic(cur_f)(cur_i); x:=clean_box(nucleus(q),cur_style); 14695 if (math_type(subscr(q))<>empty)and(subtype(q)<>limits) then 14696 width(x):=width(x)-delta; {remove italic correction} 14697 shift_amount(x):=half(height(x)-depth(x)) - axis_height(cur_size); 14698 {center vertically} 14699 math_type(nucleus(q)):=sub_box; info(nucleus(q)):=x; 14700 end 14701 else delta:=0; 14702 if subtype(q)=limits then 14703 @<Construct a box with limits above and below it, skewed by |delta|@>; 14704 make_op:=delta; 14705 end; 14706 14707 @ The following program builds a vlist box |v| for displayed limits. The 14708 width of the box is not affected by the fact that the limits may be skewed. 14709 14710 @<Construct a box with limits above and below it...@>= 14711 begin x:=clean_box(supscr(q),sup_style(cur_style)); 14712 y:=clean_box(nucleus(q),cur_style); 14713 z:=clean_box(subscr(q),sub_style(cur_style)); 14714 v:=new_null_box; type(v):=vlist_node; width(v):=width(y); 14715 if width(x)>width(v) then width(v):=width(x); 14716 if width(z)>width(v) then width(v):=width(z); 14717 x:=rebox(x,width(v)); y:=rebox(y,width(v)); z:=rebox(z,width(v));@/ 14718 shift_amount(x):=half(delta); shift_amount(z):=-shift_amount(x); 14719 height(v):=height(y); depth(v):=depth(y); 14720 @<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to 14721 account for their presence@>; 14722 new_hlist(q):=v; 14723 end 14724 14725 @ We use |shift_up| and |shift_down| in the following program for the 14726 amount of glue between the displayed operator |y| and its limits |x| and 14727 |z|. The vlist inside box |v| will consist of |x| followed by |y| followed 14728 by |z|, with kern nodes for the spaces between and around them. 14729 14730 @<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>= 14731 if math_type(supscr(q))=empty then 14732 begin free_node(x,box_node_size); list_ptr(v):=y; 14733 end 14734 else begin shift_up:=big_op_spacing3-depth(x); 14735 if shift_up<big_op_spacing1 then shift_up:=big_op_spacing1; 14736 p:=new_kern(shift_up); link(p):=y; link(x):=p;@/ 14737 p:=new_kern(big_op_spacing5); link(p):=x; list_ptr(v):=p; 14738 height(v):=height(v)+big_op_spacing5+height(x)+depth(x)+shift_up; 14739 end; 14740 if math_type(subscr(q))=empty then free_node(z,box_node_size) 14741 else begin shift_down:=big_op_spacing4-height(z); 14742 if shift_down<big_op_spacing2 then shift_down:=big_op_spacing2; 14743 p:=new_kern(shift_down); link(y):=p; link(p):=z;@/ 14744 p:=new_kern(big_op_spacing5); link(z):=p; 14745 depth(v):=depth(v)+big_op_spacing5+height(z)+depth(z)+shift_down; 14746 end 14747 14748 @ A ligature found in a math formula does not create a |ligature_node|, because 14749 there is no question of hyphenation afterwards; the ligature will simply be 14750 stored in an ordinary |char_node|, after residing in an |ord_noad|. 14751 14752 The |math_type| is converted to |math_text_char| here if we would not want to 14753 apply an italic correction to the current character unless it belongs 14754 to a math font (i.e., a font with |space=0|). 14755 14756 No boundary characters enter into these ligatures. 14757 14758 @<Declare math...@>= 14759 procedure make_ord(@!q:pointer); 14760 label restart,exit; 14761 var a:integer; {address of lig/kern instruction} 14762 @!p,@!r:pointer; {temporary registers for list manipulation} 14763 begin restart:@t@>@;@/ 14764 if math_type(subscr(q))=empty then if math_type(supscr(q))=empty then 14765 if math_type(nucleus(q))=math_char then 14766 begin p:=link(q); 14767 if p<>null then if (type(p)>=ord_noad)and(type(p)<=punct_noad) then 14768 if math_type(nucleus(p))=math_char then 14769 if fam(nucleus(p))=fam(nucleus(q)) then 14770 begin math_type(nucleus(q)):=math_text_char; 14771 fetch(nucleus(q)); 14772 if char_tag(cur_i)=lig_tag then 14773 begin a:=lig_kern_start(cur_f)(cur_i); 14774 cur_c:=character(nucleus(p)); 14775 cur_i:=font_info[a].qqqq; 14776 if skip_byte(cur_i)>stop_flag then 14777 begin a:=lig_kern_restart(cur_f)(cur_i); 14778 cur_i:=font_info[a].qqqq; 14779 end; 14780 loop@+ begin @<If instruction |cur_i| is a kern with |cur_c|, attach 14781 the kern after~|q|; or if it is a ligature with |cur_c|, combine 14782 noads |q| and~|p| appropriately; then |return| if the cursor has 14783 moved past a noad, or |goto restart|@>; 14784 if skip_byte(cur_i)>=stop_flag then return; 14785 a:=a+qo(skip_byte(cur_i))+1; 14786 cur_i:=font_info[a].qqqq; 14787 end; 14788 end; 14789 end; 14790 end; 14791 exit:end; 14792 14793 @ Note that a ligature between an |ord_noad| and another kind of noad 14794 is replaced by an |ord_noad|, when the two noads collapse into one. 14795 But we could make a parenthesis (say) change shape when it follows 14796 certain letters. Presumably a font designer will define such 14797 ligatures only when this convention makes sense. 14798 14799 \chardef\?='174 % vertical line to indicate character retention 14800 14801 @<If instruction |cur_i| is a kern with |cur_c|, ...@>= 14802 if next_char(cur_i)=cur_c then if skip_byte(cur_i)<=stop_flag then 14803 if op_byte(cur_i)>=kern_flag then 14804 begin p:=new_kern(char_kern(cur_f)(cur_i)); 14805 link(p):=link(q); link(q):=p; return; 14806 end 14807 else begin check_interrupt; {allow a way out of infinite ligature loop} 14808 case op_byte(cur_i) of 14809 qi(1),qi(5): character(nucleus(q)):=rem_byte(cur_i); {\.{=:\?}, \.{=:\?>}} 14810 qi(2),qi(6): character(nucleus(p)):=rem_byte(cur_i); {\.{\?=:}, \.{\?=:>}} 14811 qi(3),qi(7),qi(11):begin r:=new_noad; {\.{\?=:\?}, \.{\?=:\?>}, \.{\?=:\?>>}} 14812 character(nucleus(r)):=rem_byte(cur_i); 14813 fam(nucleus(r)):=fam(nucleus(q));@/ 14814 link(q):=r; link(r):=p; 14815 if op_byte(cur_i)<qi(11) then math_type(nucleus(r)):=math_char 14816 else math_type(nucleus(r)):=math_text_char; {prevent combination} 14817 end; 14818 othercases begin link(q):=link(p); 14819 character(nucleus(q)):=rem_byte(cur_i); {\.{=:}} 14820 mem[subscr(q)]:=mem[subscr(p)]; mem[supscr(q)]:=mem[supscr(p)];@/ 14821 free_node(p,noad_size); 14822 end 14823 endcases; 14824 if op_byte(cur_i)>qi(3) then return; 14825 math_type(nucleus(q)):=math_char; goto restart; 14826 end 14827 14828 @ When we get to the following part of the program, we have ``fallen through'' 14829 from cases that did not lead to |check_dimensions| or |done_with_noad| or 14830 |done_with_node|. Thus, |q|~points to a noad whose nucleus may need to be 14831 converted to an hlist, and whose subscripts and superscripts need to be 14832 appended if they are present. 14833 14834 If |nucleus(q)| is not a |math_char|, the variable |delta| is the amount 14835 by which a superscript should be moved right with respect to a subscript 14836 when both are present. 14837 @^subscripts@> 14838 @^superscripts@> 14839 14840 @<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>= 14841 case math_type(nucleus(q)) of 14842 math_char, math_text_char: 14843 @<Create a character node |p| for |nucleus(q)|, possibly followed 14844 by a kern node for the italic correction, and set |delta| to the 14845 italic correction if a subscript is present@>; 14846 empty: p:=null; 14847 sub_box: p:=info(nucleus(q)); 14848 sub_mlist: begin cur_mlist:=info(nucleus(q)); save_style:=cur_style; 14849 mlist_penalties:=false; mlist_to_hlist; {recursive call} 14850 @^recursion@> 14851 cur_style:=save_style; @<Set up the values...@>; 14852 p:=hpack(link(temp_head),natural); 14853 end; 14854 othercases confusion("mlist2") 14855 @:this can't happen mlist2}{\quad mlist2@> 14856 endcases;@/ 14857 new_hlist(q):=p; 14858 if (math_type(subscr(q))=empty)and(math_type(supscr(q))=empty) then 14859 goto check_dimensions; 14860 make_scripts(q,delta) 14861 14862 @ @<Create a character node |p| for |nucleus(q)|...@>= 14863 begin fetch(nucleus(q)); 14864 if char_exists(cur_i) then 14865 begin delta:=char_italic(cur_f)(cur_i); p:=new_character(cur_f,qo(cur_c)); 14866 if (math_type(nucleus(q))=math_text_char)and(space(cur_f)<>0) then 14867 delta:=0; {no italic correction in mid-word of text font} 14868 if (math_type(subscr(q))=empty)and(delta<>0) then 14869 begin link(p):=new_kern(delta); delta:=0; 14870 end; 14871 end 14872 else p:=null; 14873 end 14874 14875 @ The purpose of |make_scripts(q,delta)| is to attach the subscript and/or 14876 superscript of noad |q| to the list that starts at |new_hlist(q)|, 14877 given that the subscript and superscript aren't both empty. The superscript 14878 will appear to the right of the subscript by a given distance |delta|. 14879 14880 We set |shift_down| and |shift_up| to the minimum amounts to shift the 14881 baseline of subscripts and superscripts based on the given nucleus. 14882 14883 @<Declare math...@>= 14884 procedure make_scripts(@!q:pointer;@!delta:scaled); 14885 var p,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14886 @!shift_up,@!shift_down,@!clr:scaled; {dimensions in the calculation} 14887 @!t:small_number; {subsidiary size code} 14888 begin p:=new_hlist(q); 14889 if is_char_node(p) then 14890 begin shift_up:=0; shift_down:=0; 14891 end 14892 else begin z:=hpack(p,natural); 14893 if cur_style<script_style then t:=script_size@+else t:=script_script_size; 14894 shift_up:=height(z)-sup_drop(t); 14895 shift_down:=depth(z)+sub_drop(t); 14896 free_node(z,box_node_size); 14897 end; 14898 if math_type(supscr(q))=empty then 14899 @<Construct a subscript box |x| when there is no superscript@> 14900 else begin @<Construct a superscript box |x|@>; 14901 if math_type(subscr(q))=empty then shift_amount(x):=-shift_up 14902 else @<Construct a sub/superscript combination box |x|, with the 14903 superscript offset by |delta|@>; 14904 end; 14905 if new_hlist(q)=null then new_hlist(q):=x 14906 else begin p:=new_hlist(q); 14907 while link(p)<>null do p:=link(p); 14908 link(p):=x; 14909 end; 14910 end; 14911 14912 @ When there is a subscript without a superscript, the top of the subscript 14913 should not exceed the baseline plus four-fifths of the x-height. 14914 14915 @<Construct a subscript box |x| when there is no superscript@>= 14916 begin x:=clean_box(subscr(q),sub_style(cur_style)); 14917 width(x):=width(x)+script_space; 14918 if shift_down<sub1(cur_size) then shift_down:=sub1(cur_size); 14919 clr:=height(x)-(abs(math_x_height(cur_size)*4) div 5); 14920 if shift_down<clr then shift_down:=clr; 14921 shift_amount(x):=shift_down; 14922 end 14923 14924 @ The bottom of a superscript should never descend below the baseline plus 14925 one-fourth of the x-height. 14926 14927 @<Construct a superscript box |x|@>= 14928 begin x:=clean_box(supscr(q),sup_style(cur_style)); 14929 width(x):=width(x)+script_space; 14930 if odd(cur_style) then clr:=sup3(cur_size) 14931 else if cur_style<text_style then clr:=sup1(cur_size) 14932 else clr:=sup2(cur_size); 14933 if shift_up<clr then shift_up:=clr; 14934 clr:=depth(x)+(abs(math_x_height(cur_size)) div 4); 14935 if shift_up<clr then shift_up:=clr; 14936 end 14937 14938 @ When both subscript and superscript are present, the subscript must be 14939 separated from the superscript by at least four times |default_rule_thickness|. 14940 If this condition would be violated, the subscript moves down, after which 14941 both subscript and superscript move up so that the bottom of the superscript 14942 is at least as high as the baseline plus four-fifths of the x-height. 14943 14944 @<Construct a sub/superscript combination box |x|...@>= 14945 begin y:=clean_box(subscr(q),sub_style(cur_style)); 14946 width(y):=width(y)+script_space; 14947 if shift_down<sub2(cur_size) then shift_down:=sub2(cur_size); 14948 clr:=4*default_rule_thickness- 14949 ((shift_up-depth(x))-(height(y)-shift_down)); 14950 if clr>0 then 14951 begin shift_down:=shift_down+clr; 14952 clr:=(abs(math_x_height(cur_size)*4) div 5)-(shift_up-depth(x)); 14953 if clr>0 then 14954 begin shift_up:=shift_up+clr; 14955 shift_down:=shift_down-clr; 14956 end; 14957 end; 14958 shift_amount(x):=delta; {superscript is |delta| to the right of the subscript} 14959 p:=new_kern((shift_up-depth(x))-(height(y)-shift_down)); link(x):=p; link(p):=y; 14960 x:=vpack(x,natural); shift_amount(x):=shift_down; 14961 end 14962 14963 @ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|. 14964 The second pass simply goes through and hooks everything together with the 14965 proper glue and penalties. It also handles the |left_noad| and |right_noad| that 14966 might be present, since |max_h| and |max_d| are now known. Variable |p| points 14967 to a node at the current end of the final hlist. 14968 14969 @<Make a second pass over the mlist, ...@>= 14970 p:=temp_head; link(p):=null; q:=mlist; r_type:=0; cur_style:=style; 14971 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14972 while q<>null do 14973 begin @<If node |q| is a style node, change the style and |goto delete_q|; 14974 otherwise if it is not a noad, put it into the hlist, 14975 advance |q|, and |goto done|; otherwise set |s| to the size 14976 of noad |q|, set |t| to the associated type (|ord_noad.. 14977 inner_noad|), and set |pen| to the associated penalty@>; 14978 @<Append inter-element spacing based on |r_type| and |t|@>; 14979 @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>; 14980 r_type:=t; 14981 delete_q: r:=q; q:=link(q); free_node(r,s); 14982 done: end 14983 14984 @ Just before doing the big |case| switch in the second pass, the program 14985 sets up default values so that most of the branches are short. 14986 14987 @<If node |q| is a style node, change the style...@>= 14988 t:=ord_noad; s:=noad_size; pen:=inf_penalty; 14989 case type(q) of 14990 op_noad,open_noad,close_noad,punct_noad,inner_noad: t:=type(q); 14991 bin_noad: begin t:=bin_noad; pen:=bin_op_penalty; 14992 end; 14993 rel_noad: begin t:=rel_noad; pen:=rel_penalty; 14994 end; 14995 ord_noad,vcenter_noad,over_noad,under_noad: do_nothing; 14996 radical_noad: s:=radical_noad_size; 14997 accent_noad: s:=accent_noad_size; 14998 fraction_noad: s:=fraction_noad_size; 14999 left_noad,right_noad: t:=make_left_right(q,style,max_d,max_h); 15000 style_node: @<Change the current style and |goto delete_q|@>; 15001 whatsit_node,penalty_node,rule_node,disc_node,adjust_node,ins_node,mark_node, 15002 glue_node,kern_node:@t@>@;@/ 15003 begin link(p):=q; p:=q; q:=link(q); link(p):=null; goto done; 15004 end; 15005 othercases confusion("mlist3") 15006 @:this can't happen mlist3}{\quad mlist3@> 15007 endcases 15008 15009 @ The |make_left_right| function constructs a left or right delimiter of 15010 the required size and returns the value |open_noad| or |close_noad|. The 15011 |right_noad| and |left_noad| will both be based on the original |style|, 15012 so they will have consistent sizes. 15013 15014 We use the fact that |right_noad-left_noad=close_noad-open_noad|. 15015 15016 @<Declare math...@>= 15017 function make_left_right(@!q:pointer;@!style:small_number; 15018 @!max_d,@!max_h:scaled):small_number; 15019 var delta,@!delta1,@!delta2:scaled; {dimensions used in the calculation} 15020 begin if style<script_style then cur_size:=text_size 15021 else cur_size:=16*((style-text_style) div 2); 15022 delta2:=max_d+axis_height(cur_size); 15023 delta1:=max_h+max_d-delta2; 15024 if delta2>delta1 then delta1:=delta2; {|delta1| is max distance from axis} 15025 delta:=(delta1 div 500)*delimiter_factor; 15026 delta2:=delta1+delta1-delimiter_shortfall; 15027 if delta<delta2 then delta:=delta2; 15028 new_hlist(q):=var_delimiter(delimiter(q),cur_size,delta); 15029 make_left_right:=type(q)-(left_noad-open_noad); {|open_noad| or |close_noad|} 15030 end; 15031 15032 @ @<Change the current style and |goto delete_q|@>= 15033 begin cur_style:=subtype(q); s:=style_node_size; 15034 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 15035 goto delete_q; 15036 end 15037 15038 @ The inter-element spacing in math formulas depends on an $8\times8$ table that 15039 \TeX\ preloads as a 64-digit string. The elements of this string have the 15040 following significance: 15041 $$\vbox{\halign{#\hfil\cr 15042 \.0 means no space;\cr 15043 \.1 means a conditional thin space (\.{\\nonscript\\mskip\\thinmuskip});\cr 15044 \.2 means a thin space (\.{\\mskip\\thinmuskip});\cr 15045 \.3 means a conditional medium space 15046 (\.{\\nonscript\\mskip\\medmuskip});\cr 15047 \.4 means a conditional thick space 15048 (\.{\\nonscript\\mskip\\thickmuskip});\cr 15049 \.* means an impossible case.\cr}}$$ 15050 This is all pretty cryptic, but {\sl The \TeX book\/} explains what is 15051 supposed to happen, and the string makes it happen. 15052 @:TeXbook}{\sl The \TeX book@> 15053 15054 A global variable |magic_offset| is computed so that if |a| and |b| are 15055 in the range |ord_noad..inner_noad|, then |str_pool[a*8+b+magic_offset]| 15056 is the digit for spacing between noad types |a| and |b|. 15057 15058 If \PASCAL\ had provided a good way to preload constant arrays, this part of 15059 the program would not have been so strange. 15060 @:PASCAL}{\PASCAL@> 15061 15062 @d math_spacing=@;@/ 15063 @t\hskip-35pt@> 15064 "0234000122*4000133**3**344*0400400*000000234000111*1111112341011" 15065 @t$ \hskip-35pt$@> 15066 15067 @<Glob...@>= 15068 @!magic_offset:integer; {used to find inter-element spacing} 15069 15070 @ @<Compute the magic offset@>= 15071 magic_offset:=str_start[math_spacing]-9*ord_noad 15072 15073 @ @<Append inter-element spacing based on |r_type| and |t|@>= 15074 if r_type>0 then {not the first noad} 15075 begin case so(str_pool[r_type*8+t+magic_offset]) of 15076 "0": x:=0; 15077 "1": if cur_style<script_style then x:=thin_mu_skip_code@+else x:=0; 15078 "2": x:=thin_mu_skip_code; 15079 "3": if cur_style<script_style then x:=med_mu_skip_code@+else x:=0; 15080 "4": if cur_style<script_style then x:=thick_mu_skip_code@+else x:=0; 15081 othercases confusion("mlist4") 15082 @:this can't happen mlist4}{\quad mlist4@> 15083 endcases; 15084 if x<>0 then 15085 begin y:=math_glue(glue_par(x),cur_mu); 15086 z:=new_glue(y); glue_ref_count(y):=null; link(p):=z; p:=z;@/ 15087 subtype(z):=x+1; {store a symbolic subtype} 15088 end; 15089 end 15090 15091 @ We insert a penalty node after the hlist entries of noad |q| if |pen| 15092 is not an ``infinite'' penalty, and if the node immediately following |q| 15093 is not a penalty node or a |rel_noad| or absent entirely. 15094 15095 @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>= 15096 if new_hlist(q)<>null then 15097 begin link(p):=new_hlist(q); 15098 repeat p:=link(p); 15099 until link(p)=null; 15100 end; 15101 if penalties then if link(q)<>null then if pen<inf_penalty then 15102 begin r_type:=type(link(q)); 15103 if r_type<>penalty_node then if r_type<>rel_noad then 15104 begin z:=new_penalty(pen); link(p):=z; p:=z; 15105 end; 15106 end 15107 15108 @* \[37] Alignment. 15109 It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because 15110 they cut across so many of the control structures of \TeX. 15111 15112 Therefore the 15113 present page is probably not the best place for a beginner to start reading 15114 this program; it is better to master everything else first. 15115 15116 Let us focus our thoughts on an example of what the input might be, in order 15117 to get some idea about how the alignment miracle happens. The example doesn't 15118 do anything useful, but it is sufficiently general to indicate all of the 15119 special cases that must be dealt with; please do not be disturbed by its 15120 apparent complexity and meaninglessness. 15121 $$\vbox{\halign{\.{#}\hfil\cr 15122 {}\\tabskip 2pt plus 3pt\cr 15123 {}\\halign to 300pt\{u1\#v1\&\cr 15124 \hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr 15125 \hskip 50pt u3\#v3\\cr\cr 15126 \hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr 15127 \hskip 25pt \\noalign\{\\vskip 3pt\}\cr 15128 \hskip 25pt b1\\span b2\\cr\cr 15129 \hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$ 15130 Here's what happens: 15131 15132 \yskip 15133 (0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine 15134 places the 300pt dimension onto the |save_stack|, and an |align_group| 15135 code is placed above it. This will make it possible to complete the alignment 15136 when the matching `\.\}' is found. 15137 15138 (1) The preamble is scanned next. Macros in the preamble are not expanded, 15139 @^preamble@> 15140 except as part of a tabskip specification. For example, if \.{u2} had been 15141 a macro in the preamble above, it would have been expanded, since \TeX\ 15142 must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list'' 15143 is constructed based on the user's preamble; in our case it contains the 15144 following seven items: 15145 $$\vbox{\halign{\.{#}\hfil\qquad&(#)\hfil\cr 15146 {}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr 15147 {}\\alignrecord, width $-\infty$&preamble info for column 1\cr 15148 {}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr 15149 {}\\alignrecord, width $-\infty$&preamble info for column 2\cr 15150 {}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr 15151 {}\\alignrecord, width $-\infty$&preamble info for column 3\cr 15152 {}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$ 15153 These ``alignrecord'' entries have the same size as an |unset_node|, 15154 since they will later be converted into such nodes. However, at the 15155 moment they have no |type| or |subtype| fields; they have |info| fields 15156 instead, and these |info| fields are initially set to the value |end_span|, 15157 for reasons explained below. Furthermore, the alignrecord nodes have no 15158 |height| or |depth| fields; these are renamed |u_part| and |v_part|, 15159 and they point to token lists for the templates of the alignment. 15160 For example, the |u_part| field in the first alignrecord points to the 15161 token list `\.{u1}', i.e., the template preceding the `\.\#' for column~1. 15162 15163 (2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble. 15164 It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to 15165 be read again, and the template `\.{u1}' is fed to the scanner. Just 15166 before reading `\.{u1}', \TeX\ goes into restricted horizontal mode. 15167 Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the 15168 {\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv| 15169 token, indicating the end of a column. At this point an |unset_node| is 15170 created, containing the contents of the current hlist (i.e., `\.{u1a1v1}'). 15171 The natural width of this unset node replaces the |width| field of the 15172 alignrecord for column~1; in general, the alignrecords will record the 15173 maximum natural width that has occurred so far in a given column. 15174 15175 (3) Since `\.{\\omit}' follows the `\.\&', the templates for column~2 15176 are now bypassed. Again \TeX\ goes into restricted horizontal mode and 15177 makes an |unset_node| from the resulting hlist; but this time the 15178 hlist contains simply `\.{a2}'. The natural width of the new unset box 15179 is remembered in the |width| field of the alignrecord for column~2. 15180 15181 (4) A third |unset_node| is created for column 3, using essentially the 15182 mechanism that worked for column~1; this unset box contains `\.{u3\\vrule 15183 v3}'. The vertical rule in this case has running dimensions that will later 15184 extend to the height and depth of the whole first row, since each |unset_node| 15185 in a row will eventually inherit the height and depth of its enclosing box. 15186 15187 (5) The first row has now ended; it is made into a single unset box 15188 comprising the following seven items: 15189 $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15190 {}\\glue 2pt plus 3pt\cr 15191 {}\\unsetbox for 1 column: u1a1v1\cr 15192 {}\\glue 2pt plus 3pt\cr 15193 {}\\unsetbox for 1 column: a2\cr 15194 {}\\glue 1pt plus 1fil\cr 15195 {}\\unsetbox for 1 column: u3\\vrule v3\cr 15196 {}\\glue 1pt plus 1fil\cr}}$$ 15197 The width of this unset row is unimportant, but it has the correct height 15198 and depth, so the correct baselineskip glue will be computed as the row 15199 is inserted into a vertical list. 15200 15201 (6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends 15202 additional material (in this case \.{\\vskip 3pt}) to the vertical list. 15203 While processing this material, \TeX\ will be in internal vertical 15204 mode, and |no_align_group| will be on |save_stack|. 15205 15206 (7) The next row produces an unset box that looks like this: 15207 $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15208 {}\\glue 2pt plus 3pt\cr 15209 {}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr 15210 {}\\glue 1pt plus 1fil\cr 15211 {}\\unsetbox for 1 column: {\rm(empty)}\cr 15212 {}\\glue 1pt plus 1fil\cr}}$$ 15213 The natural width of the unset box that spans columns 1~and~2 is stored 15214 in a ``span node,'' which we will explain later; the |info| field of the 15215 alignrecord for column~1 now points to the new span node, and the |info| 15216 of the span node points to |end_span|. 15217 15218 (8) The final row produces the unset box 15219 $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15220 {}\\glue 2pt plus 3pt\cr 15221 {}\\unsetbox for 1 column: {\rm(empty)}\cr 15222 {}\\glue 2pt plus 3pt\cr 15223 {}\\unsetbox for 2 columns: u2c2v2\cr 15224 {}\\glue 1pt plus 1fil\cr}}$$ 15225 A new span node is attached to the alignrecord for column 2. 15226 15227 (9) The last step is to compute the true column widths and to change all the 15228 unset boxes to hboxes, appending the whole works to the vertical list that 15229 encloses the \.{\\halign}. The rules for deciding on the final widths of 15230 each unset column box will be explained below. 15231 15232 \yskip\noindent 15233 Note that as \.{\\halign} is being processed, we fearlessly give up control 15234 to the rest of \TeX. At critical junctures, an alignment routine is 15235 called upon to step in and do some little action, but most of the time 15236 these routines just lurk in the background. It's something like 15237 post-hypnotic suggestion. 15238 15239 @ We have mentioned that alignrecords contain no |height| or |depth| fields. 15240 Their |glue_sign| and |glue_order| are pre-empted as well, since it 15241 is necessary to store information about what to do when a template ends. 15242 This information is called the |extra_info| field. 15243 15244 @d u_part(#)==mem[#+height_offset].int {pointer to \<u_j> token list} 15245 @d v_part(#)==mem[#+depth_offset].int {pointer to \<v_j> token list} 15246 @d extra_info(#)==info(#+list_offset) {info to remember during template} 15247 15248 @ Alignments can occur within alignments, so a small stack is used to access 15249 the alignrecord information. At each level we have a |preamble| pointer, 15250 indicating the beginning of the preamble list; a |cur_align| pointer, 15251 indicating the current position in the preamble list; a |cur_span| pointer, 15252 indicating the value of |cur_align| at the beginning of a sequence of 15253 spanned columns; a |cur_loop| pointer, indicating the tabskip glue before 15254 an alignrecord that should be copied next if the current list is extended; 15255 and the |align_state| variable, which indicates the nesting of braces so 15256 that \.{\\cr} and \.{\\span} and tab marks are properly intercepted. 15257 There also are pointers |cur_head| and |cur_tail| to the head and tail 15258 of a list of adjustments being moved out from horizontal mode to 15259 vertical~mode. 15260 15261 The current values of these seven quantities appear in global variables; 15262 when they have to be pushed down, they are stored in 5-word nodes, and 15263 |align_ptr| points to the topmost such node. 15264 15265 @d preamble==link(align_head) {the current preamble list} 15266 @d align_stack_node_size=5 {number of |mem| words to save alignment states} 15267 15268 @<Glob...@>= 15269 @!cur_align:pointer; {current position in preamble list} 15270 @!cur_span:pointer; {start of currently spanned columns in preamble list} 15271 @!cur_loop:pointer; {place to copy when extending a periodic preamble} 15272 @!align_ptr:pointer; {most recently pushed-down alignment stack node} 15273 @!cur_head,@!cur_tail:pointer; {adjustment list pointers} 15274 15275 @ The |align_state| and |preamble| variables are initialized elsewhere. 15276 15277 @<Set init...@>= 15278 align_ptr:=null; cur_align:=null; cur_span:=null; cur_loop:=null; 15279 cur_head:=null; cur_tail:=null; 15280 15281 @ Alignment stack maintenance is handled by a pair of trivial routines 15282 called |push_alignment| and |pop_alignment|. 15283 15284 @p procedure push_alignment; 15285 var p:pointer; {the new alignment stack node} 15286 begin p:=get_node(align_stack_node_size); 15287 link(p):=align_ptr; info(p):=cur_align; 15288 llink(p):=preamble; rlink(p):=cur_span; 15289 mem[p+2].int:=cur_loop; mem[p+3].int:=align_state; 15290 info(p+4):=cur_head; link(p+4):=cur_tail; 15291 align_ptr:=p; 15292 cur_head:=get_avail; 15293 end; 15294 @# 15295 procedure pop_alignment; 15296 var p:pointer; {the top alignment stack node} 15297 begin free_avail(cur_head); 15298 p:=align_ptr; 15299 cur_tail:=link(p+4); cur_head:=info(p+4); 15300 align_state:=mem[p+3].int; cur_loop:=mem[p+2].int; 15301 cur_span:=rlink(p); preamble:=llink(p); 15302 cur_align:=info(p); align_ptr:=link(p); 15303 free_node(p,align_stack_node_size); 15304 end; 15305 15306 @ \TeX\ has eight procedures that govern alignments: |init_align| and 15307 |fin_align| are used at the very beginning and the very end; |init_row| and 15308 |fin_row| are used at the beginning and end of individual rows; |init_span| 15309 is used at the beginning of a sequence of spanned columns (possibly involving 15310 only one column); |init_col| and |fin_col| are used at the beginning and 15311 end of individual columns; and |align_peek| is used after \.{\\cr} to see 15312 whether the next item is \.{\\noalign}. 15313 15314 We shall consider these routines in the order they are first used during 15315 the course of a complete \.{\\halign}, namely |init_align|, |align_peek|, 15316 |init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|. 15317 15318 @ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate 15319 mode, \TeX\ calls |init_align|, whose task is to get everything off to a 15320 good start. This mostly involves scanning the preamble and putting its 15321 information into the preamble list. 15322 @^preamble@> 15323 15324 @p @t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/ 15325 procedure@?align_peek; forward;@t\2@>@/ 15326 procedure@?normal_paragraph; forward;@t\2@>@/ 15327 procedure init_align; 15328 label done, done1, done2, continue; 15329 var save_cs_ptr:pointer; {|warning_index| value for error messages} 15330 @!p:pointer; {for short-term temporary use} 15331 begin save_cs_ptr:=cur_cs; {\.{\\halign} or \.{\\valign}, usually} 15332 push_alignment; align_state:=-1000000; {enter a new alignment level} 15333 @<Check for improper alignment in displayed math@>; 15334 push_nest; {enter a new semantic level} 15335 @<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>; 15336 scan_spec(align_group,false);@/ 15337 @<Scan the preamble and record it in the |preamble| list@>; 15338 new_save_level(align_group); 15339 if every_cr<>null then begin_token_list(every_cr,every_cr_text); 15340 align_peek; {look for \.{\\noalign} or \.{\\omit}} 15341 end; 15342 15343 @ In vertical modes, |prev_depth| already has the correct value. But 15344 if we are in |mmode| (displayed formula mode), we reach out to the 15345 enclosing vertical mode for the |prev_depth| value that produces the 15346 correct baseline calculations. 15347 15348 @<Change current mode...@>= 15349 if mode=mmode then 15350 begin mode:=-vmode; prev_depth:=nest[nest_ptr-2].aux_field.sc; 15351 end 15352 else if mode>0 then negate(mode) 15353 15354 @ When \.{\\halign} is used as a displayed formula, there should be 15355 no other pieces of mlists present. 15356 15357 @<Check for improper alignment in displayed math@>= 15358 if (mode=mmode)and((tail<>head)or(incompleat_noad<>null)) then 15359 begin print_err("Improper "); print_esc("halign"); print(" inside $$'s"); 15360 @.Improper \\halign...@> 15361 help3("Displays can use special alignments (like \eqalignno)")@/ 15362 ("only if nothing but the alignment itself is between $$'s.")@/ 15363 ("So I've deleted the formulas that preceded this alignment."); 15364 error; flush_math; 15365 end 15366 15367 @ @<Scan the preamble and record it in the |preamble| list@>= 15368 preamble:=null; cur_align:=align_head; cur_loop:=null; scanner_status:=aligning; 15369 warning_index:=save_cs_ptr; align_state:=-1000000; 15370 {at this point, |cur_cmd=left_brace|} 15371 loop@+ begin @<Append the current tabskip glue to the preamble list@>; 15372 if cur_cmd=car_ret then goto done; {\.{\\cr} ends the preamble} 15373 @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|, 15374 looking for changes in the tabskip glue; append an 15375 alignrecord to the preamble list@>; 15376 end; 15377 done: scanner_status:=normal 15378 15379 @ @<Append the current tabskip glue to the preamble list@>= 15380 link(cur_align):=new_param_glue(tab_skip_code); 15381 cur_align:=link(cur_align) 15382 15383 @ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>= 15384 @<Scan the template \<u_j>, putting the resulting token list in |hold_head|@>; 15385 link(cur_align):=new_null_box; cur_align:=link(cur_align); {a new alignrecord} 15386 info(cur_align):=end_span; width(cur_align):=null_flag; 15387 u_part(cur_align):=link(hold_head); 15388 @<Scan the template \<v_j>, putting the resulting token list in |hold_head|@>; 15389 v_part(cur_align):=link(hold_head) 15390 15391 @ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code, 15392 and with |span_code| as the command modifier. This makes \TeX\ interpret it 15393 essentially the same as an alignment delimiter like `\.\&', yet it is 15394 recognizably different when we need to distinguish it from a normal delimiter. 15395 It also turns out to be useful to give a special |cr_code| to `\.{\\cr}', 15396 and an even larger |cr_cr_code| to `\.{\\crcr}'. 15397 15398 The end of a template is represented by two ``frozen'' control sequences 15399 called \.{\\endtemplate}. The first has the command code |end_template|, which 15400 is |>outer_call|, so it will not easily disappear in the presence of errors. 15401 The |get_x_token| routine converts the first into the second, which has |endv| 15402 as its command code. 15403 15404 @d span_code=256 {distinct from any character} 15405 @d cr_code=257 {distinct from |span_code| and from any character} 15406 @d cr_cr_code=cr_code+1 {this distinguishes \.{\\crcr} from \.{\\cr}} 15407 @d end_template_token==cs_token_flag+frozen_end_template 15408 15409 @<Put each of \TeX's primitives into the hash table@>= 15410 primitive("span",tab_mark,span_code);@/ 15411 @!@:span_}{\.{\\span} primitive@> 15412 primitive("cr",car_ret,cr_code); 15413 @!@:cr_}{\.{\\cr} primitive@> 15414 text(frozen_cr):="cr"; eqtb[frozen_cr]:=eqtb[cur_val];@/ 15415 primitive("crcr",car_ret,cr_cr_code); 15416 @!@:cr_cr_}{\.{\\crcr} primitive@> 15417 text(frozen_end_template):="endtemplate"; text(frozen_endv):="endtemplate"; 15418 @.endtemplate@> 15419 eq_type(frozen_endv):=endv; equiv(frozen_endv):=null_list; 15420 eq_level(frozen_endv):=level_one;@/ 15421 eqtb[frozen_end_template]:=eqtb[frozen_endv]; 15422 eq_type(frozen_end_template):=end_template; 15423 15424 @ @<Cases of |print_cmd_chr|...@>= 15425 tab_mark: if chr_code=span_code then print_esc("span") 15426 else chr_cmd("alignment tab character "); 15427 car_ret: if chr_code=cr_code then print_esc("cr") 15428 else print_esc("crcr"); 15429 15430 @ The preamble is copied directly, except that \.{\\tabskip} causes a change 15431 to the tabskip glue, thereby possibly expanding macros that immediately 15432 follow it. An appearance of \.{\\span} also causes such an expansion. 15433 15434 Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}' 15435 token survives in the preamble and the `\.{\\tabskip}' defines new 15436 tabskip glue (locally). 15437 15438 @<Declare the procedure called |get_preamble_token|@>= 15439 procedure get_preamble_token; 15440 label restart; 15441 begin restart: get_token; 15442 while (cur_chr=span_code)and(cur_cmd=tab_mark) do 15443 begin get_token; {this token will be expanded once} 15444 if cur_cmd>max_command then 15445 begin expand; get_token; 15446 end; 15447 end; 15448 if cur_cmd=endv then 15449 fatal_error("(interwoven alignment preambles are not allowed)"); 15450 @.interwoven alignment preambles...@> 15451 if (cur_cmd=assign_glue)and(cur_chr=glue_base+tab_skip_code) then 15452 begin scan_optional_equals; scan_glue(glue_val); 15453 if global_defs>0 then geq_define(glue_base+tab_skip_code,glue_ref,cur_val) 15454 else eq_define(glue_base+tab_skip_code,glue_ref,cur_val); 15455 goto restart; 15456 end; 15457 end; 15458 15459 @ Spaces are eliminated from the beginning of a template. 15460 15461 @<Scan the template \<u_j>...@>= 15462 p:=hold_head; link(p):=null; 15463 loop@+ begin get_preamble_token; 15464 if cur_cmd=mac_param then goto done1; 15465 if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then 15466 if (p=hold_head)and(cur_loop=null)and(cur_cmd=tab_mark) 15467 then cur_loop:=cur_align 15468 else begin print_err("Missing # inserted in alignment preamble"); 15469 @.Missing \# inserted...@> 15470 help3("There should be exactly one # between &'s, when an")@/ 15471 ("\halign or \valign is being set up. In this case you had")@/ 15472 ("none, so I've put one in; maybe that will work."); 15473 back_error; goto done1; 15474 end 15475 else if (cur_cmd<>spacer)or(p<>hold_head) then 15476 begin link(p):=get_avail; p:=link(p); info(p):=cur_tok; 15477 end; 15478 end; 15479 done1: 15480 15481 @ @<Scan the template \<v_j>...@>= 15482 p:=hold_head; link(p):=null; 15483 loop@+ begin continue: get_preamble_token; 15484 if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then 15485 goto done2; 15486 if cur_cmd=mac_param then 15487 begin print_err("Only one # is allowed per tab"); 15488 @.Only one \# is allowed...@> 15489 help3("There should be exactly one # between &'s, when an")@/ 15490 ("\halign or \valign is being set up. In this case you had")@/ 15491 ("more than one, so I'm ignoring all but the first."); 15492 error; goto continue; 15493 end; 15494 link(p):=get_avail; p:=link(p); info(p):=cur_tok; 15495 end; 15496 done2: link(p):=get_avail; p:=link(p); 15497 info(p):=end_template_token {put \.{\\endtemplate} at the end} 15498 15499 @ The tricky part about alignments is getting the templates into the 15500 scanner at the right time, and recovering control when a row or column 15501 is finished. 15502 15503 We usually begin a row after each \.{\\cr} has been sensed, unless that 15504 \.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates 15505 the alignment. The |align_peek| routine is used to look ahead and do 15506 the right thing; it either gets a new row started, or gets a \.{\\noalign} 15507 started, or finishes off the alignment. 15508 15509 @<Declare the procedure called |align_peek|@>= 15510 procedure align_peek; 15511 label restart; 15512 begin restart: align_state:=1000000; @<Get the next non-blank non-call token@>; 15513 if cur_cmd=no_align then 15514 begin scan_left_brace; new_save_level(no_align_group); 15515 if mode=-vmode then normal_paragraph; 15516 end 15517 else if cur_cmd=right_brace then fin_align 15518 else if (cur_cmd=car_ret)and(cur_chr=cr_cr_code) then 15519 goto restart {ignore \.{\\crcr}} 15520 else begin init_row; {start a new row} 15521 init_col; {start a new column and replace what we peeked at} 15522 end; 15523 end; 15524 15525 @ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'), 15526 we enter a new semantic level, copy the first tabskip glue, and change 15527 from internal vertical mode to restricted horizontal mode or vice versa. 15528 The |space_factor| and |prev_depth| are not used on this semantic level, 15529 but we clear them to zero just to be tidy. 15530 15531 @p @t\4@>@<Declare the procedure called |init_span|@>@t@>@/ 15532 procedure init_row; 15533 begin push_nest; mode:=(-hmode-vmode)-mode; 15534 if mode=-hmode then space_factor:=0 @+else prev_depth:=0; 15535 tail_append(new_glue(glue_ptr(preamble))); 15536 subtype(tail):=tab_skip_code+1;@/ 15537 cur_align:=link(preamble); cur_tail:=cur_head; init_span(cur_align); 15538 end; 15539 15540 @ The parameter to |init_span| is a pointer to the alignrecord where the 15541 next column or group of columns will begin. A new semantic level is 15542 entered, so that the columns will generate a list for subsequent packaging. 15543 15544 @<Declare the procedure called |init_span|@>= 15545 procedure init_span(@!p:pointer); 15546 begin push_nest; 15547 if mode=-hmode then space_factor:=1000 15548 else begin prev_depth:=ignore_depth; normal_paragraph; 15549 end; 15550 cur_span:=p; 15551 end; 15552 15553 @ When a column begins, we assume that |cur_cmd| is either |omit| or else 15554 the current token should be put back into the input until the \<u_j> 15555 template has been scanned. (Note that |cur_cmd| might be |tab_mark| or 15556 |car_ret|.) We also assume that |align_state| is approximately 1000000 at 15557 this time. We remain in the same mode, and start the template if it is 15558 called for. 15559 15560 @p procedure init_col; 15561 begin extra_info(cur_align):=cur_cmd; 15562 if cur_cmd=omit then align_state:=0 15563 else begin back_input; begin_token_list(u_part(cur_align),u_template); 15564 end; {now |align_state=1000000|} 15565 end; 15566 15567 @ The scanner sets |align_state| to zero when the \<u_j> template ends. When 15568 a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state=0|, 15569 the scanner activates the following code, which fires up the \<v_j> template. 15570 We need to remember the |cur_chr|, which is either |cr_cr_code|, |cr_code|, 15571 |span_code|, or a character code, depending on how the column text has ended. 15572 15573 This part of the program had better not be activated when the preamble 15574 to another alignment is being scanned, or when no alignment preamble is active. 15575 15576 @<Insert the \(v)\<v_j>...@>= 15577 begin if (scanner_status=aligning) or (cur_align=null) then 15578 fatal_error("(interwoven alignment preambles are not allowed)"); 15579 @.interwoven alignment preambles...@> 15580 cur_cmd:=extra_info(cur_align); extra_info(cur_align):=cur_chr; 15581 if cur_cmd=omit then begin_token_list(omit_template,v_template) 15582 else begin_token_list(v_part(cur_align),v_template); 15583 align_state:=1000000; goto restart; 15584 end 15585 15586 @ The token list |omit_template| just referred to is a constant token 15587 list that contains the special control sequence \.{\\endtemplate} only. 15588 15589 @<Initialize the special...@>= 15590 info(omit_template):=end_template_token; {|link(omit_template)=null|} 15591 15592 @ When the |endv| command at the end of a \<v_j> template comes through the 15593 scanner, things really start to happen; and it is the |fin_col| routine 15594 that makes them happen. This routine returns |true| if a row as well as a 15595 column has been finished. 15596 15597 @p function fin_col:boolean; 15598 label exit; 15599 var p:pointer; {the alignrecord after the current one} 15600 @!q,@!r:pointer; {temporary pointers for list manipulation} 15601 @!s:pointer; {a new span node} 15602 @!u:pointer; {a new unset box} 15603 @!w:scaled; {natural width} 15604 @!o:glue_ord; {order of infinity} 15605 @!n:halfword; {span counter} 15606 begin if cur_align=null then confusion("endv"); 15607 q:=link(cur_align);@+if q=null then confusion("endv"); 15608 @:this can't happen endv}{\quad endv@> 15609 if align_state<500000 then 15610 fatal_error("(interwoven alignment preambles are not allowed)"); 15611 @.interwoven alignment preambles...@> 15612 p:=link(q); 15613 @<If the preamble list has been traversed, check that the row has ended@>; 15614 if extra_info(cur_align)<>span_code then 15615 begin unsave; new_save_level(align_group);@/ 15616 @<Package an unset box for the current column and record its width@>; 15617 @<Copy the tabskip glue between columns@>; 15618 if extra_info(cur_align)>=cr_code then 15619 begin fin_col:=true; return; 15620 end; 15621 init_span(p); 15622 end; 15623 align_state:=1000000; @<Get the next non-blank non-call token@>; 15624 cur_align:=p; 15625 init_col; fin_col:=false; 15626 exit: end; 15627 15628 @ @<If the preamble list has been traversed, check that the row has ended@>= 15629 if (p=null)and(extra_info(cur_align)<cr_code) then 15630 if cur_loop<>null then @<Lengthen the preamble periodically@> 15631 else begin print_err("Extra alignment tab has been changed to "); 15632 @.Extra alignment tab...@> 15633 print_esc("cr"); 15634 help3("You have given more \span or & marks than there were")@/ 15635 ("in the preamble to the \halign or \valign now in progress.")@/ 15636 ("So I'll assume that you meant to type \cr instead."); 15637 extra_info(cur_align):=cr_code; error; 15638 end 15639 15640 @ @<Lengthen the preamble...@>= 15641 begin link(q):=new_null_box; p:=link(q); {a new alignrecord} 15642 info(p):=end_span; width(p):=null_flag; cur_loop:=link(cur_loop); 15643 @<Copy the templates from node |cur_loop| into node |p|@>; 15644 cur_loop:=link(cur_loop); 15645 link(p):=new_glue(glue_ptr(cur_loop)); 15646 subtype(link(p)):=tab_skip_code+1; 15647 end 15648 15649 @ @<Copy the templates from node |cur_loop| into node |p|@>= 15650 q:=hold_head; r:=u_part(cur_loop); 15651 while r<>null do 15652 begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r); 15653 end; 15654 link(q):=null; u_part(p):=link(hold_head); 15655 q:=hold_head; r:=v_part(cur_loop); 15656 while r<>null do 15657 begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r); 15658 end; 15659 link(q):=null; v_part(p):=link(hold_head) 15660 15661 @ @<Copy the tabskip glue...@>= 15662 tail_append(new_glue(glue_ptr(link(cur_align)))); 15663 subtype(tail):=tab_skip_code+1 15664 15665 @ @<Package an unset...@>= 15666 begin if mode=-hmode then 15667 begin adjust_tail:=cur_tail; u:=hpack(link(head),natural); w:=width(u); 15668 cur_tail:=adjust_tail; adjust_tail:=null; 15669 end 15670 else begin u:=vpackage(link(head),natural,0); w:=height(u); 15671 end; 15672 n:=min_quarterword; {this represents a span count of 1} 15673 if cur_span<>cur_align then @<Update width entry for spanned columns@> 15674 else if w>width(cur_align) then width(cur_align):=w; 15675 type(u):=unset_node; span_count(u):=n;@/ 15676 @<Determine the stretch order@>; 15677 glue_order(u):=o; glue_stretch(u):=total_stretch[o];@/ 15678 @<Determine the shrink order@>; 15679 glue_sign(u):=o; glue_shrink(u):=total_shrink[o];@/ 15680 pop_nest; link(tail):=u; tail:=u; 15681 end 15682 15683 @ A span node is a 2-word record containing |width|, |info|, and |link| 15684 fields. The |link| field is not really a link, it indicates the number of 15685 spanned columns; the |info| field points to a span node for the same 15686 starting column, having a greater extent of spanning, or to |end_span|, 15687 which has the largest possible |link| field; the |width| field holds the 15688 largest natural width corresponding to a particular set of spanned columns. 15689 15690 A list of the maximum widths so far, for spanned columns starting at a 15691 given column, begins with the |info| field of the alignrecord for that 15692 column. 15693 15694 @d span_node_size=2 {number of |mem| words for a span node} 15695 15696 @<Initialize the special list heads...@>= 15697 link(end_span):=max_quarterword+1; info(end_span):=null; 15698 15699 @ @<Update width entry for spanned columns@>= 15700 begin q:=cur_span; 15701 repeat incr(n); q:=link(link(q)); 15702 until q=cur_align; 15703 if n>max_quarterword then confusion("256 spans"); {this can happen, but won't} 15704 @^system dependencies@> 15705 @:this can't happen 256 spans}{\quad 256 spans@> 15706 q:=cur_span; while link(info(q))<n do q:=info(q); 15707 if link(info(q))>n then 15708 begin s:=get_node(span_node_size); info(s):=info(q); link(s):=n; 15709 info(q):=s; width(s):=w; 15710 end 15711 else if width(info(q))<w then width(info(q)):=w; 15712 end 15713 15714 @ At the end of a row, we append an unset box to the current vlist (for 15715 \.{\\halign}) or the current hlist (for \.{\\valign}). This unset box 15716 contains the unset boxes for the columns, separated by the tabskip glue. 15717 Everything will be set later. 15718 15719 @p procedure fin_row; 15720 var p:pointer; {the new unset box} 15721 begin if mode=-hmode then 15722 begin p:=hpack(link(head),natural); 15723 pop_nest; append_to_vlist(p); 15724 if cur_head<>cur_tail then 15725 begin link(tail):=link(cur_head); tail:=cur_tail; 15726 end; 15727 end 15728 else begin p:=vpack(link(head),natural); pop_nest; 15729 link(tail):=p; tail:=p; space_factor:=1000; 15730 end; 15731 type(p):=unset_node; glue_stretch(p):=0; 15732 if every_cr<>null then begin_token_list(every_cr,every_cr_text); 15733 align_peek; 15734 end; {note that |glue_shrink(p)=0| since |glue_shrink==shift_amount|} 15735 15736 @ Finally, we will reach the end of the alignment, and we can breathe a 15737 sigh of relief that memory hasn't overflowed. All the unset boxes will now be 15738 set so that the columns line up, taking due account of spanned columns. 15739 15740 @p procedure@?do_assignments; forward;@t\2@>@/ 15741 procedure@?resume_after_display; forward;@t\2@>@/ 15742 procedure@?build_page; forward;@t\2@>@/ 15743 procedure fin_align; 15744 var @!p,@!q,@!r,@!s,@!u,@!v: pointer; {registers for the list operations} 15745 @!t,@!w:scaled; {width of column} 15746 @!o:scaled; {shift offset for unset boxes} 15747 @!n:halfword; {matching span amount} 15748 @!rule_save:scaled; {temporary storage for |overfull_rule|} 15749 @!aux_save:memory_word; {temporary storage for |aux|} 15750 begin if cur_group<>align_group then confusion("align1"); 15751 @:this can't happen align}{\quad align@> 15752 unsave; {that |align_group| was for individual entries} 15753 if cur_group<>align_group then confusion("align0"); 15754 unsave; {that |align_group| was for the whole alignment} 15755 if nest[nest_ptr-1].mode_field=mmode then o:=display_indent 15756 else o:=0; 15757 @<Go through the preamble list, determining the column widths and 15758 changing the alignrecords to dummy unset boxes@>; 15759 @<Package the preamble list, to determine the actual tabskip glue amounts, 15760 and let |p| point to this prototype box@>; 15761 @<Set the glue in all the unset boxes of the current list@>; 15762 flush_node_list(p); pop_alignment; 15763 @<Insert the \(c)current list into its environment@>; 15764 end;@/ 15765 @t\4@>@<Declare the procedure called |align_peek|@> 15766 15767 @ It's time now to dismantle the preamble list and to compute the column 15768 widths. Let $w_{ij}$ be the maximum of the natural widths of all entries 15769 that span columns $i$ through $j$, inclusive. The alignrecord for column~$i$ 15770 contains $w_{ii}$ in its |width| field, and there is also a linked list of 15771 the nonzero $w_{ij}$ for increasing $j$, accessible via the |info| field; 15772 these span nodes contain the value $j-i+|min_quarterword|$ in their 15773 |link| fields. The values of $w_{ii}$ were initialized to |null_flag|, which 15774 we regard as $-\infty$. 15775 15776 The final column widths are defined by the formula 15777 $$w_j=\max_{1\L i\L j}\biggl( w_{ij}-\sum_{i\L k<j}(t_k+w_k)\biggr),$$ 15778 where $t_k$ is the natural width of the tabskip glue between columns 15779 $k$ and~$k+1$. However, if $w_{ij}=-\infty$ for all |i| in the range 15780 |1<=i<=j| (i.e., if every entry that involved column~|j| also involved 15781 column~|j+1|), we let $w_j=0$, and we zero out the tabskip glue after 15782 column~|j|. 15783 15784 \TeX\ computes these values by using the following scheme: First $w_1=w_{11}$. 15785 Then replace $w_{2j}$ by $\max(w_{2j},w_{1j}-t_1-w_1)$, for all $j>1$. 15786 Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},w_{2j}-t_2-w_2)$ 15787 for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its 15788 value is changed to zero and so is the next tabskip. 15789 15790 @<Go through the preamble list,...@>= 15791 q:=link(preamble); 15792 repeat flush_list(u_part(q)); flush_list(v_part(q)); 15793 p:=link(link(q)); 15794 if width(q)=null_flag then 15795 @<Nullify |width(q)| and the tabskip glue following this column@>; 15796 if info(q)<>end_span then 15797 @<Merge the widths in the span nodes of |q| with those of |p|, 15798 destroying the span nodes of |q|@>; 15799 type(q):=unset_node; span_count(q):=min_quarterword; height(q):=0; 15800 depth(q):=0; glue_order(q):=normal; glue_sign(q):=normal; 15801 glue_stretch(q):=0; glue_shrink(q):=0; q:=p; 15802 until q=null 15803 15804 @ @<Nullify |width(q)| and the tabskip glue following this column@>= 15805 begin width(q):=0; r:=link(q); s:=glue_ptr(r); 15806 if s<>zero_glue then 15807 begin add_glue_ref(zero_glue); delete_glue_ref(s); 15808 glue_ptr(r):=zero_glue; 15809 end; 15810 end 15811 15812 @ Merging of two span-node lists is a typical exercise in the manipulation of 15813 linearly linked data structures. The essential invariant in the following 15814 |repeat| loop is that we want to dispense with node |r|, in |q|'s list, 15815 and |u| is its successor; all nodes of |p|'s list up to and including |s| 15816 have been processed, and the successor of |s| matches |r| or precedes |r| 15817 or follows |r|, according as |link(r)=n| or |link(r)>n| or |link(r)<n|. 15818 15819 @<Merge the widths...@>= 15820 begin t:=width(q)+width(glue_ptr(link(q))); 15821 r:=info(q); s:=end_span; info(s):=p; n:=min_quarterword+1; 15822 repeat width(r):=width(r)-t; u:=info(r); 15823 while link(r)>n do 15824 begin s:=info(s); n:=link(info(s))+1; 15825 end; 15826 if link(r)<n then 15827 begin info(r):=info(s); info(s):=r; decr(link(r)); s:=r; 15828 end 15829 else begin if width(r)>width(info(s)) then width(info(s)):=width(r); 15830 free_node(r,span_node_size); 15831 end; 15832 r:=u; 15833 until r=end_span; 15834 end 15835 15836 @ Now the preamble list has been converted to a list of alternating unset 15837 boxes and tabskip glue, where the box widths are equal to the final 15838 column sizes. In case of \.{\\valign}, we change the widths to heights, 15839 so that a correct error message will be produced if the alignment is 15840 overfull or underfull. 15841 15842 @<Package the preamble list...@>= 15843 save_ptr:=save_ptr-2; pack_begin_line:=-mode_line; 15844 if mode=-vmode then 15845 begin rule_save:=overfull_rule; 15846 overfull_rule:=0; {prevent rule from being packaged} 15847 p:=hpack(preamble,saved(1),saved(0)); overfull_rule:=rule_save; 15848 end 15849 else begin q:=link(preamble); 15850 repeat height(q):=width(q); width(q):=0; q:=link(link(q)); 15851 until q=null; 15852 p:=vpack(preamble,saved(1),saved(0)); 15853 q:=link(preamble); 15854 repeat width(q):=height(q); height(q):=0; q:=link(link(q)); 15855 until q=null; 15856 end; 15857 pack_begin_line:=0 15858 15859 @ @<Set the glue in all the unset...@>= 15860 q:=link(head); s:=head; 15861 while q<>null do 15862 begin if not is_char_node(q) then 15863 if type(q)=unset_node then 15864 @<Set the unset box |q| and the unset boxes in it@> 15865 else if type(q)=rule_node then 15866 @<Make the running dimensions in rule |q| extend to the 15867 boundaries of the alignment@>; 15868 s:=q; q:=link(q); 15869 end 15870 15871 @ @<Make the running dimensions in rule |q| extend...@>= 15872 begin if is_running(width(q)) then width(q):=width(p); 15873 if is_running(height(q)) then height(q):=height(p); 15874 if is_running(depth(q)) then depth(q):=depth(p); 15875 if o<>0 then 15876 begin r:=link(q); link(q):=null; q:=hpack(q,natural); 15877 shift_amount(q):=o; link(q):=r; link(s):=q; 15878 end; 15879 end 15880 15881 @ The unset box |q| represents a row that contains one or more unset boxes, 15882 depending on how soon \.{\\cr} occurred in that row. 15883 15884 @<Set the unset box |q| and the unset boxes in it@>= 15885 begin if mode=-vmode then 15886 begin type(q):=hlist_node; width(q):=width(p); 15887 end 15888 else begin type(q):=vlist_node; height(q):=height(p); 15889 end; 15890 glue_order(q):=glue_order(p); glue_sign(q):=glue_sign(p); 15891 glue_set(q):=glue_set(p); shift_amount(q):=o; 15892 r:=link(list_ptr(q)); s:=link(list_ptr(p)); 15893 repeat @<Set the glue in node |r| and change it from an unset node@>; 15894 r:=link(link(r)); s:=link(link(s)); 15895 until r=null; 15896 end 15897 15898 @ A box made from spanned columns will be followed by tabskip glue nodes and 15899 by empty boxes as if there were no spanning. This permits perfect alignment 15900 of subsequent entries, and it prevents values that depend on floating point 15901 arithmetic from entering into the dimensions of any boxes. 15902 15903 @<Set the glue in node |r|...@>= 15904 n:=span_count(r); t:=width(s); w:=t; u:=hold_head; 15905 while n>min_quarterword do 15906 begin decr(n); 15907 @<Append tabskip glue and an empty box to list |u|, 15908 and update |s| and |t| as the prototype nodes are passed@>; 15909 end; 15910 if mode=-vmode then 15911 @<Make the unset node |r| into an |hlist_node| of width |w|, 15912 setting the glue as if the width were |t|@> 15913 else @<Make the unset node |r| into a |vlist_node| of height |w|, 15914 setting the glue as if the height were |t|@>; 15915 shift_amount(r):=0; 15916 if u<>hold_head then {append blank boxes to account for spanned nodes} 15917 begin link(u):=link(r); link(r):=link(hold_head); r:=u; 15918 end 15919 15920 @ @<Append tabskip glue and an empty box to list |u|...@>= 15921 s:=link(s); v:=glue_ptr(s); link(u):=new_glue(v); u:=link(u); 15922 subtype(u):=tab_skip_code+1; t:=t+width(v); 15923 if glue_sign(p)=stretching then 15924 begin if stretch_order(v)=glue_order(p) then 15925 t:=t+round(float(glue_set(p))*stretch(v)); 15926 @^real multiplication@> 15927 end 15928 else if glue_sign(p)=shrinking then 15929 begin if shrink_order(v)=glue_order(p) then 15930 t:=t-round(float(glue_set(p))*shrink(v)); 15931 end; 15932 s:=link(s); link(u):=new_null_box; u:=link(u); t:=t+width(s); 15933 if mode=-vmode then width(u):=width(s)@+else 15934 begin type(u):=vlist_node; height(u):=width(s); 15935 end 15936 15937 @ @<Make the unset node |r| into an |hlist_node| of width |w|...@>= 15938 begin height(r):=height(q); depth(r):=depth(q); 15939 if t=width(r) then 15940 begin glue_sign(r):=normal; glue_order(r):=normal; 15941 set_glue_ratio_zero(glue_set(r)); 15942 end 15943 else if t>width(r) then 15944 begin glue_sign(r):=stretching; 15945 if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r)) 15946 else glue_set(r):=unfloat((t-width(r))/glue_stretch(r)); 15947 @^real division@> 15948 end 15949 else begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking; 15950 if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r)) 15951 else if (glue_order(r)=normal)and(width(r)-t>glue_shrink(r)) then 15952 set_glue_ratio_one(glue_set(r)) 15953 else glue_set(r):=unfloat((width(r)-t)/glue_shrink(r)); 15954 end; 15955 width(r):=w; type(r):=hlist_node; 15956 end 15957 15958 @ @<Make the unset node |r| into a |vlist_node| of height |w|...@>= 15959 begin width(r):=width(q); 15960 if t=height(r) then 15961 begin glue_sign(r):=normal; glue_order(r):=normal; 15962 set_glue_ratio_zero(glue_set(r)); 15963 end 15964 else if t>height(r) then 15965 begin glue_sign(r):=stretching; 15966 if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r)) 15967 else glue_set(r):=unfloat((t-height(r))/glue_stretch(r)); 15968 @^real division@> 15969 end 15970 else begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking; 15971 if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r)) 15972 else if (glue_order(r)=normal)and(height(r)-t>glue_shrink(r)) then 15973 set_glue_ratio_one(glue_set(r)) 15974 else glue_set(r):=unfloat((height(r)-t)/glue_shrink(r)); 15975 end; 15976 height(r):=w; type(r):=vlist_node; 15977 end 15978 15979 @ We now have a completed alignment, in the list that starts at |head| 15980 and ends at |tail|. This list will be merged with the one that encloses 15981 it. (In case the enclosing mode is |mmode|, for displayed formulas, 15982 we will need to insert glue before and after the display; that part of the 15983 program will be deferred until we're more familiar with such operations.) 15984 15985 In restricted horizontal mode, the |clang| part of |aux| is undefined; 15986 an over-cautious \PASCAL\ runtime system may complain about this. 15987 @^dirty \PASCAL@> 15988 15989 @<Insert the \(c)current list into its environment@>= 15990 aux_save:=aux; p:=link(head); q:=tail; pop_nest; 15991 if mode=mmode then @<Finish an alignment in a display@> 15992 else begin aux:=aux_save; link(tail):=p; 15993 if p<>null then tail:=q; 15994 if mode=vmode then build_page; 15995 end 15996 15997 @* \[38] Breaking paragraphs into lines. 15998 We come now to what is probably the most interesting algorithm of \TeX: 15999 the mechanism for choosing the ``best possible'' breakpoints that yield 16000 the individual lines of a paragraph. \TeX's line-breaking algorithm takes 16001 a given horizontal list and converts it to a sequence of boxes that are 16002 appended to the current vertical list. In the course of doing this, it 16003 creates a special data structure containing three kinds of records that are 16004 not used elsewhere in \TeX. Such nodes are created while a paragraph is 16005 being processed, and they are destroyed afterwards; thus, the other parts 16006 of \TeX\ do not need to know anything about how line-breaking is done. 16007 16008 The method used here is based on an approach devised by Michael F. Plass and 16009 @^Plass, Michael Frederick@> 16010 @^Knuth, Donald Ervin@> 16011 the author in 1977, subsequently generalized and improved by the same two 16012 people in 1980. A detailed discussion appears in {\sl Software---Practice 16013 and Experience \bf11} (1981), 1119--1184, where it is shown that the 16014 line-breaking problem can be regarded as a special case of the problem of 16015 computing the shortest path in an acyclic network. The cited paper includes 16016 numerous examples and describes the history of line breaking as it has been 16017 practiced by printers through the ages. The present implementation adds two 16018 new ideas to the algorithm of 1980: Memory space requirements are considerably 16019 reduced by using smaller records for inactive nodes than for active ones, 16020 and arithmetic overflow is avoided by using ``delta distances'' instead of 16021 keeping track of the total distance from the beginning of the paragraph to the 16022 current point. 16023 16024 @ The |line_break| procedure should be invoked only in horizontal mode; it 16025 leaves that mode and places its output into the current vlist of the 16026 enclosing vertical mode (or internal vertical mode). 16027 There is one explicit parameter: |final_widow_penalty| is the amount of 16028 additional penalty to be inserted before the final line of the paragraph. 16029 16030 There are also a number of implicit parameters: The hlist to be broken 16031 starts at |link(head)|, and it is nonempty. The value of |prev_graf| in the 16032 enclosing semantic level tells where the paragraph should begin in the 16033 sequence of line numbers, in case hanging indentation or \.{\\parshape} 16034 is in use; |prev_graf| is zero unless this paragraph is being continued 16035 after a displayed formula. Other implicit parameters, such as the 16036 |par_shape_ptr| and various penalties to use for hyphenation, etc., appear 16037 in |eqtb|. 16038 16039 After |line_break| has acted, it will have updated the current vlist and the 16040 value of |prev_graf|. Furthermore, the global variable |just_box| will 16041 point to the final box created by |line_break|, so that the width of this 16042 line can be ascertained when it is necessary to decide whether to use 16043 |above_display_skip| or |above_display_short_skip| before a displayed formula. 16044 16045 @<Glob...@>= 16046 @!just_box:pointer; {the |hlist_node| for the last line of the new paragraph} 16047 16048 @ Since |line_break| is a rather lengthy procedure---sort of a small world unto 16049 itself---we must build it up little by little, somewhat more cautiously 16050 than we have done with the simpler procedures of \TeX. Here is the 16051 general outline. 16052 16053 @p@t\4@>@<Declare subprocedures for |line_break|@> 16054 procedure line_break(@!final_widow_penalty:integer); 16055 label done,done1,done2,done3,done4,done5,continue; 16056 var @<Local variables for line breaking@>@; 16057 begin pack_begin_line:=mode_line; {this is for over/underfull box messages} 16058 @<Get ready to start line breaking@>; 16059 @<Find optimal breakpoints@>; 16060 @<Break the paragraph at the chosen breakpoints, justify the resulting lines 16061 to the correct widths, and append them to the current vertical list@>; 16062 @<Clean up the memory by removing the break nodes@>; 16063 pack_begin_line:=0; 16064 end; 16065 16066 @ The first task is to move the list from |head| to |temp_head| and go 16067 into the enclosing semantic level. We also append the \.{\\parfillskip} 16068 glue to the end of the paragraph, removing a space (or other glue node) if 16069 it was there, since spaces usually precede blank lines and instances of 16070 `\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so 16071 it will never be considered as a potential breakpoint. 16072 16073 This code assumes that a |glue_node| and a |penalty_node| occupy the 16074 same number of |mem|~words. 16075 @^data structure assumptions@> 16076 16077 @<Get ready to start...@>= 16078 link(temp_head):=link(head); 16079 if is_char_node(tail) then tail_append(new_penalty(inf_penalty)) 16080 else if type(tail)<>glue_node then tail_append(new_penalty(inf_penalty)) 16081 else begin type(tail):=penalty_node; delete_glue_ref(glue_ptr(tail)); 16082 flush_node_list(leader_ptr(tail)); penalty(tail):=inf_penalty; 16083 end; 16084 link(tail):=new_param_glue(par_fill_skip_code); 16085 init_cur_lang:=prev_graf mod @'200000; 16086 init_l_hyf:=prev_graf div @'20000000; 16087 init_r_hyf:=(prev_graf div @'200000) mod @'100; 16088 pop_nest; 16089 16090 @ When looking for optimal line breaks, \TeX\ creates a ``break node'' for 16091 each break that is {\sl feasible}, in the sense that there is a way to end 16092 a line at the given place without requiring any line to stretch more than 16093 a given tolerance. A break node is characterized by three things: the position 16094 of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|, 16095 or |disc_node|); the ordinal number of the line that will follow this 16096 breakpoint; and the fitness classification of the line that has just 16097 ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|. 16098 16099 @d tight_fit=3 {fitness classification for lines shrinking 0.5 to 1.0 of their 16100 shrinkability} 16101 @d loose_fit=1 {fitness classification for lines stretching 0.5 to 1.0 of their 16102 stretchability} 16103 @d very_loose_fit=0 {fitness classification for lines stretching more than 16104 their stretchability} 16105 @d decent_fit=2 {fitness classification for all other lines} 16106 16107 @ The algorithm essentially determines the best possible way to achieve 16108 each feasible combination of position, line, and fitness. Thus, it answers 16109 questions like, ``What is the best way to break the opening part of the 16110 paragraph so that the fourth line is a tight line ending at such-and-such 16111 a place?'' However, the fact that all lines are to be the same length 16112 after a certain point makes it possible to regard all sufficiently large 16113 line numbers as equivalent, when the looseness parameter is zero, and this 16114 makes it possible for the algorithm to save space and time. 16115 16116 An ``active node'' and a ``passive node'' are created in |mem| for each 16117 feasible breakpoint that needs to be considered. Active nodes are three 16118 words long and passive nodes are two words long. We need active nodes only 16119 for breakpoints near the place in the paragraph that is currently being 16120 examined, so they are recycled within a comparatively short time after 16121 they are created. 16122 16123 @ An active node for a given breakpoint contains six fields: 16124 16125 \yskip\hang|link| points to the next node in the list of active nodes; the 16126 last active node has |link=last_active|. 16127 16128 \yskip\hang|break_node| points to the passive node associated with this 16129 breakpoint. 16130 16131 \yskip\hang|line_number| is the number of the line that follows this 16132 breakpoint. 16133 16134 \yskip\hang|fitness| is the fitness classification of the line ending at this 16135 breakpoint. 16136 16137 \yskip\hang|type| is either |hyphenated| or |unhyphenated|, depending on 16138 whether this breakpoint is a |disc_node|. 16139 16140 \yskip\hang|total_demerits| is the minimum possible sum of demerits over all 16141 lines leading from the beginning of the paragraph to this breakpoint. 16142 16143 \yskip\noindent 16144 The value of |link(active)| points to the first active node on a linked list 16145 of all currently active nodes. This list is in order by |line_number|, 16146 except that nodes with |line_number>easy_line| may be in any order relative 16147 to each other. 16148 16149 @d active_node_size=3 {number of words in active nodes} 16150 @d fitness==subtype {|very_loose_fit..tight_fit| on final line for this break} 16151 @d break_node==rlink {pointer to the corresponding passive node} 16152 @d line_number==llink {line that begins at this breakpoint} 16153 @d total_demerits(#)==mem[#+2].int {the quantity that \TeX\ minimizes} 16154 @d unhyphenated=0 {the |type| of a normal active break node} 16155 @d hyphenated=1 {the |type| of an active node that breaks at a |disc_node|} 16156 @d last_active==active {the active list ends where it begins} 16157 16158 @ @<Initialize the special list heads...@>= 16159 type(last_active):=hyphenated; line_number(last_active):=max_halfword; 16160 subtype(last_active):=0; {the |subtype| is never examined by the algorithm} 16161 16162 @ The passive node for a given breakpoint contains only four fields: 16163 16164 \yskip\hang|link| points to the passive node created just before this one, 16165 if any, otherwise it is |null|. 16166 16167 \yskip\hang|cur_break| points to the position of this breakpoint in the 16168 horizontal list for the paragraph being broken. 16169 16170 \yskip\hang|prev_break| points to the passive node that should precede this 16171 one in an optimal path to this breakpoint. 16172 16173 \yskip\hang|serial| is equal to |n| if this passive node is the |n|th 16174 one created during the current pass. (This field is used only when 16175 printing out detailed statistics about the line-breaking calculations.) 16176 16177 \yskip\noindent 16178 There is a global variable called |passive| that points to the most 16179 recently created passive node. Another global variable, |printed_node|, 16180 is used to help print out the paragraph when detailed information about 16181 the line-breaking computation is being displayed. 16182 16183 @d passive_node_size=2 {number of words in passive nodes} 16184 @d cur_break==rlink {in passive node, points to position of this breakpoint} 16185 @d prev_break==llink {points to passive node that should precede this one} 16186 @d serial==info {serial number for symbolic identification} 16187 16188 @<Glob...@>= 16189 @!passive:pointer; {most recent node on passive list} 16190 @!printed_node:pointer; {most recent node that has been printed} 16191 @!pass_number:halfword; {the number of passive nodes allocated on this pass} 16192 16193 @ The active list also contains ``delta'' nodes that help the algorithm 16194 compute the badness of individual lines. Such nodes appear only between two 16195 active nodes, and they have |type=delta_node|. If |p| and |r| are active nodes 16196 and if |q| is a delta node between them, so that |link(p)=q| and |link(q)=r|, 16197 then |q| tells the space difference between lines in the horizontal list that 16198 start after breakpoint |p| and lines that start after breakpoint |r|. In 16199 other words, if we know the length of the line that starts after |p| and 16200 ends at our current position, then the corresponding length of the line that 16201 starts after |r| is obtained by adding the amounts in node~|q|. A delta node 16202 contains six scaled numbers, since it must record the net change in glue 16203 stretchability with respect to all orders of infinity. The natural width 16204 difference appears in |mem[q+1].sc|; the stretch differences in units of 16205 pt, fil, fill, and filll appear in |mem[q+2..q+5].sc|; and the shrink difference 16206 appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used. 16207 16208 @d delta_node_size=7 {number of words in a delta node} 16209 @d delta_node=2 {|type| field in a delta node} 16210 16211 @ As the algorithm runs, it maintains a set of six delta-like registers 16212 for the length of the line following the first active breakpoint to the 16213 current position in the given hlist. When it makes a pass through the 16214 active list, it also maintains a similar set of six registers for the 16215 length following the active breakpoint of current interest. A third set 16216 holds the length of an empty line (namely, the sum of \.{\\leftskip} and 16217 \.{\\rightskip}); and a fourth set is used to create new delta nodes. 16218 16219 When we pass a delta node we want to do operations like 16220 $$\hbox{\ignorespaces|for 16221 k:=1 to 6 do cur_active_width[k]:=cur_active_width[k]+mem[q+k].sc|};$$ and we 16222 want to do this without the overhead of |for| loops. The |do_all_six| 16223 macro makes such six-tuples convenient. 16224 16225 @d do_all_six(#)==#(1);#(2);#(3);#(4);#(5);#(6) 16226 16227 @<Glob...@>= 16228 @!active_width:array[1..6] of scaled; 16229 {distance from first active node to~|cur_p|} 16230 @!cur_active_width:array[1..6] of scaled; {distance from current active node} 16231 @!background:array[1..6] of scaled; {length of an ``empty'' line} 16232 @!break_width:array[1..6] of scaled; {length being computed after current break} 16233 16234 @ Let's state the principles of the delta nodes more precisely and concisely, 16235 so that the following programs will be less obscure. For each legal 16236 breakpoint~|p| in the paragraph, we define two quantities $\alpha(p)$ and 16237 $\beta(p)$ such that the length of material in a line from breakpoint~|p| 16238 to breakpoint~|q| is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$. 16239 Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from 16240 the beginning of the paragraph to a point ``after'' a break at |p| and to a 16241 point ``before'' a break at |q|; and $\gamma$ is the width of an empty line, 16242 namely the length contributed by \.{\\leftskip} and \.{\\rightskip}. 16243 16244 Suppose, for example, that the paragraph consists entirely of alternating 16245 boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and 16246 let the skips have widths $y_1\ldots y_n$, so that the paragraph can be 16247 represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint 16248 at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)= 16249 x_1+y_1+\cdots+x_i$. To check this, note that the length of material from 16250 $p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5) 16251 -\alpha(p_2)$. 16252 16253 The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and 16254 shrinkability as well as a natural width. If we were to compute $\alpha(p)$ 16255 and $\beta(p)$ for each |p|, we would need multiple precision arithmetic, and 16256 the multiprecise numbers would have to be kept in the active nodes. 16257 \TeX\ avoids this problem by working entirely with relative differences 16258 or ``deltas.'' Suppose, for example, that the active list contains 16259 $a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the |a|'s are active breakpoints 16260 and the $\delta$'s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$ 16261 and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is 16262 currently positioned at some other breakpoint |p|, the |active_width| array 16263 contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through 16264 the list of active nodes and considering a tentative line that runs from 16265 $a_2$ to~|p|, say, the |cur_active_width| array will contain the value 16266 $\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$, 16267 we want to add $\alpha(a_2)-\alpha(a_3)$ to |cur_active_width|; and this 16268 is just $\delta_2$, which appears in the active list between $a_2$ and 16269 $a_3$. The |background| array contains $\gamma$. The |break_width| array 16270 will be used to calculate values of new delta nodes when the active 16271 list is being updated. 16272 16273 @ Glue nodes in a horizontal list that is being paragraphed are not supposed to 16274 include ``infinite'' shrinkability; that is why the algorithm maintains 16275 four registers for stretching but only one for shrinking. If the user tries to 16276 introduce infinite shrinkability, the shrinkability will be reset to finite 16277 and an error message will be issued. A boolean variable |no_shrink_error_yet| 16278 prevents this error message from appearing more than once per paragraph. 16279 16280 @d check_shrinkage(#)==if (shrink_order(#)<>normal)and(shrink(#)<>0) then 16281 begin #:=finite_shrink(#); 16282 end 16283 16284 @<Glob...@>= 16285 @!no_shrink_error_yet:boolean; {have we complained about infinite shrinkage?} 16286 16287 @ @<Declare subprocedures for |line_break|@>= 16288 function finite_shrink(@!p:pointer):pointer; {recovers from infinite shrinkage} 16289 var q:pointer; {new glue specification} 16290 begin if no_shrink_error_yet then 16291 begin no_shrink_error_yet:=false; 16292 @!stat if tracing_paragraphs>0 then end_diagnostic(true);@+tats@; 16293 print_err("Infinite glue shrinkage found in a paragraph"); 16294 @.Infinite glue shrinkage...@> 16295 help5("The paragraph just ended includes some glue that has")@/ 16296 ("infinite shrinkability, e.g., `\hskip 0pt minus 1fil'.")@/ 16297 ("Such glue doesn't belong there---it allows a paragraph")@/ 16298 ("of any length to fit on one line. But it's safe to proceed,")@/ 16299 ("since the offensive shrinkability has been made finite."); 16300 error; 16301 @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@; 16302 end; 16303 q:=new_spec(p); shrink_order(q):=normal; 16304 delete_glue_ref(p); finite_shrink:=q; 16305 end; 16306 16307 @ @<Get ready to start...@>= 16308 no_shrink_error_yet:=true;@/ 16309 check_shrinkage(left_skip); check_shrinkage(right_skip);@/ 16310 q:=left_skip; r:=right_skip; background[1]:=width(q)+width(r);@/ 16311 background[2]:=0; background[3]:=0; background[4]:=0; background[5]:=0;@/ 16312 background[2+stretch_order(q)]:=stretch(q);@/ 16313 background[2+stretch_order(r)]:=@|background[2+stretch_order(r)]+stretch(r);@/ 16314 background[6]:=shrink(q)+shrink(r); 16315 16316 @ A pointer variable |cur_p| runs through the given horizontal list as we look 16317 for breakpoints. This variable is global, since it is used both by |line_break| 16318 and by its subprocedure |try_break|. 16319 16320 Another global variable called |threshold| is used to determine the feasibility 16321 of individual lines: Breakpoints are feasible if there is a way to reach 16322 them without creating lines whose badness exceeds |threshold|. (The 16323 badness is compared to |threshold| before penalties are added, so that 16324 penalty values do not affect the feasibility of breakpoints, except that 16325 no break is allowed when the penalty is 10000 or more.) If |threshold| 16326 is 10000 or more, all legal breaks are considered feasible, since the 16327 |badness| function specified above never returns a value greater than~10000. 16328 16329 Up to three passes might be made through the paragraph in an attempt to find at 16330 least one set of feasible breakpoints. On the first pass, we have 16331 |threshold=pretolerance| and |second_pass=final_pass=false|. 16332 If this pass fails to find a 16333 feasible solution, |threshold| is set to |tolerance|, |second_pass| is set 16334 |true|, and an attempt is made to hyphenate as many words as possible. 16335 If that fails too, we add |emergency_stretch| to the background 16336 stretchability and set |final_pass=true|. 16337 16338 @<Glob...@>= 16339 @!cur_p:pointer; {the current breakpoint under consideration} 16340 @!second_pass:boolean; {is this our second attempt to break this paragraph?} 16341 @!final_pass:boolean; {is this our final attempt to break this paragraph?} 16342 @!threshold:integer; {maximum badness on feasible lines} 16343 16344 @ The heart of the line-breaking procedure is `|try_break|', a subroutine 16345 that tests if the current breakpoint |cur_p| is feasible, by running 16346 through the active list to see what lines of text can be made from active 16347 nodes to~|cur_p|. If feasible breaks are possible, new break nodes are 16348 created. If |cur_p| is too far from an active node, that node is 16349 deactivated. 16350 16351 The parameter |pi| to |try_break| is the penalty associated 16352 with a break at |cur_p|; we have |pi=eject_penalty| if the break is forced, 16353 and |pi=inf_penalty| if the break is illegal. 16354 16355 The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|, 16356 depending on whether or not the current break is at a |disc_node|. The 16357 end of a paragraph is also regarded as `|hyphenated|'; this case is 16358 distinguishable by the condition |cur_p=null|. 16359 16360 @d copy_to_cur_active(#)==cur_active_width[#]:=active_width[#] 16361 @d deactivate=60 {go here when node |r| should be deactivated} 16362 16363 @<Declare subprocedures for |line_break|@>= 16364 procedure try_break(@!pi:integer;@!break_type:small_number); 16365 label exit,done,done1,continue,deactivate; 16366 var r:pointer; {runs through the active list} 16367 @!prev_r:pointer; {stays a step behind |r|} 16368 @!old_l:halfword; {maximum line number in current equivalence class of lines} 16369 @!no_break_yet:boolean; {have we found a feasible break at |cur_p|?} 16370 @<Other local variables for |try_break|@>@; 16371 begin @<Make sure that |pi| is in the proper range@>; 16372 no_break_yet:=true; prev_r:=active; old_l:=0; 16373 do_all_six(copy_to_cur_active); 16374 loop@+ begin continue: r:=link(prev_r); 16375 @<If node |r| is of type |delta_node|, update |cur_active_width|, 16376 set |prev_r| and |prev_prev_r|, then |goto continue|@>; 16377 @<If a line number class has ended, create new active nodes for 16378 the best feasible breaks in that class; then |return| 16379 if |r=last_active|, otherwise compute the new |line_width|@>; 16380 @<Consider the demerits for a line from |r| to |cur_p|; 16381 deactivate node |r| if it should no longer be active; 16382 then |goto continue| if a line from |r| to |cur_p| is infeasible, 16383 otherwise record a new feasible break@>; 16384 end; 16385 exit: @!stat @<Update the value of |printed_node| for 16386 symbolic displays@>@+tats@; 16387 end; 16388 16389 @ @<Other local variables for |try_break|@>= 16390 @!prev_prev_r:pointer; {a step behind |prev_r|, if |type(prev_r)=delta_node|} 16391 @!s:pointer; {runs through nodes ahead of |cur_p|} 16392 @!q:pointer; {points to a new node being created} 16393 @!v:pointer; {points to a glue specification or a node ahead of |cur_p|} 16394 @!t:integer; {node count, if |cur_p| is a discretionary node} 16395 @!f:internal_font_number; {used in character width calculation} 16396 @!l:halfword; {line number of current active node} 16397 @!node_r_stays_active:boolean; {should node |r| remain in the active list?} 16398 @!line_width:scaled; {the current line will be justified to this width} 16399 @!fit_class:very_loose_fit..tight_fit; {possible fitness class of test line} 16400 @!b:halfword; {badness of test line} 16401 @!d:integer; {demerits of test line} 16402 @!artificial_demerits:boolean; {has |d| been forced to zero?} 16403 @!save_link:pointer; {temporarily holds value of |link(cur_p)|} 16404 @!shortfall:scaled; {used in badness calculations} 16405 16406 @ @<Make sure that |pi| is in the proper range@>= 16407 if abs(pi)>=inf_penalty then 16408 if pi>0 then return {this breakpoint is inhibited by infinite penalty} 16409 else pi:=eject_penalty {this breakpoint will be forced} 16410 16411 @ The following code uses the fact that |type(last_active)<>delta_node|. 16412 16413 @d update_width(#)==@| 16414 cur_active_width[#]:=cur_active_width[#]+mem[r+#].sc 16415 16416 @<If node |r|...@>= 16417 @^inner loop@> 16418 if type(r)=delta_node then 16419 begin do_all_six(update_width); 16420 prev_prev_r:=prev_r; prev_r:=r; goto continue; 16421 end 16422 16423 @ As we consider various ways to end a line at |cur_p|, in a given line number 16424 class, we keep track of the best total demerits known, in an array with 16425 one entry for each of the fitness classifications. For example, 16426 |minimal_demerits[tight_fit]| contains the fewest total demerits of feasible 16427 line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]| 16428 points to the passive node for the break before~|cur_p| that achieves such 16429 an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the 16430 active node corresponding to |best_place[tight_fit]|. When no feasible break 16431 sequence is known, the |minimal_demerits| entries will be equal to 16432 |awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|, 16433 keeps track of the smallest value in the |minimal_demerits| array. 16434 16435 @d awful_bad==@'7777777777 {more than a billion demerits} 16436 16437 @<Global...@>= 16438 @!minimal_demerits:array[very_loose_fit..tight_fit] of integer; {best total 16439 demerits known for current line class and position, given the fitness} 16440 @!minimum_demerits:integer; {best total demerits known for current line class 16441 and position} 16442 @!best_place:array[very_loose_fit..tight_fit] of pointer; {how to achieve 16443 |minimal_demerits|} 16444 @!best_pl_line:array[very_loose_fit..tight_fit] of halfword; {corresponding 16445 line number} 16446 16447 @ @<Get ready to start...@>= 16448 minimum_demerits:=awful_bad; 16449 minimal_demerits[tight_fit]:=awful_bad; 16450 minimal_demerits[decent_fit]:=awful_bad; 16451 minimal_demerits[loose_fit]:=awful_bad; 16452 minimal_demerits[very_loose_fit]:=awful_bad; 16453 16454 @ The first part of the following code is part of \TeX's inner loop, so 16455 we don't want to waste any time. The current active node, namely node |r|, 16456 contains the line number that will be considered next. At the end of the 16457 list we have arranged the data structure so that |r=last_active| and 16458 |line_number(last_active)>old_l|. 16459 @^inner loop@> 16460 16461 @<If a line number class...@>= 16462 begin l:=line_number(r); 16463 if l>old_l then 16464 begin {now we are no longer in the inner loop} 16465 if (minimum_demerits<awful_bad)and@| 16466 ((old_l<>easy_line)or(r=last_active)) then 16467 @<Create new active nodes for the best feasible breaks 16468 just found@>; 16469 if r=last_active then return; 16470 @<Compute the new line width@>; 16471 end; 16472 end 16473 16474 @ It is not necessary to create new active nodes having |minimal_demerits| 16475 greater than 16476 |minimum_demerits+abs(adj_demerits)|, since such active nodes will never 16477 be chosen in the final paragraph breaks. This observation allows us to 16478 omit a substantial number of feasible breakpoints from further consideration. 16479 16480 @<Create new active nodes...@>= 16481 begin if no_break_yet then @<Compute the values of |break_width|@>; 16482 @<Insert a delta node to prepare for breaks at |cur_p|@>; 16483 if abs(adj_demerits)>=awful_bad-minimum_demerits then 16484 minimum_demerits:=awful_bad-1 16485 else minimum_demerits:=minimum_demerits+abs(adj_demerits); 16486 for fit_class:=very_loose_fit to tight_fit do 16487 begin if minimal_demerits[fit_class]<=minimum_demerits then 16488 @<Insert a new active node 16489 from |best_place[fit_class]| to |cur_p|@>; 16490 minimal_demerits[fit_class]:=awful_bad; 16491 end; 16492 minimum_demerits:=awful_bad; 16493 @<Insert a delta node to prepare for the next active node@>; 16494 end 16495 16496 @ When we insert a new active node for a break at |cur_p|, suppose this 16497 new node is to be placed just before active node |a|; then we essentially 16498 want to insert `$\delta\,|cur_p|\,\delta^\prime$' before |a|, where 16499 $\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$ 16500 in the notation explained above. The |cur_active_width| array now holds 16501 $\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by 16502 subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)- 16503 \alpha(|cur_p|)$. The latter quantity can be regarded as the length of a 16504 line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|. 16505 16506 The |break_width| is usually negative, since it consists of the background 16507 (which is normally zero) minus the width of nodes following~|cur_p| that are 16508 eliminated after a break. If, for example, node |cur_p| is a glue node, the 16509 width of this glue is subtracted from the background; and we also look 16510 ahead to eliminate all subsequent glue and penalty and kern and math 16511 nodes, subtracting their widths as well. 16512 16513 Kern nodes do not disappear at a line break unless they are |explicit|. 16514 16515 @d set_break_width_to_background(#)==break_width[#]:=background[#] 16516 16517 @<Compute the values of |break...@>= 16518 begin no_break_yet:=false; do_all_six(set_break_width_to_background); 16519 s:=cur_p; 16520 if break_type>unhyphenated then if cur_p<>null then 16521 @<Compute the discretionary |break_width| values@>; 16522 while s<>null do 16523 begin if is_char_node(s) then goto done; 16524 case type(s) of 16525 glue_node:@<Subtract glue from |break_width|@>; 16526 penalty_node: do_nothing; 16527 math_node: break_width[1]:=break_width[1]-width(s); 16528 kern_node: if subtype(s)<>explicit then goto done 16529 else break_width[1]:=break_width[1]-width(s); 16530 othercases goto done 16531 endcases;@/ 16532 s:=link(s); 16533 end; 16534 done: end 16535 16536 @ @<Subtract glue from |break...@>= 16537 begin v:=glue_ptr(s); break_width[1]:=break_width[1]-width(v); 16538 break_width[2+stretch_order(v)]:=break_width[2+stretch_order(v)]-stretch(v); 16539 break_width[6]:=break_width[6]-shrink(v); 16540 end 16541 16542 @ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to 16543 |cur_p|'' has to be defined properly so that the other calculations work out. 16544 Suppose that the pre-break text at |cur_p| has length $l_0$, the post-break 16545 text has length $l_1$, and the replacement text has length |l|. Suppose 16546 also that |q| is the node following the replacement text. Then length of a 16547 line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$, 16548 where $\beta(q)=\beta(|cur_p|)-l_0+l$. The actual length will be the background 16549 plus $l_1$, so the length from |cur_p| to |cur_p| should be $\gamma+l_0+l_1-l$. 16550 If the post-break text of the discretionary is empty, a break may also 16551 discard~|q|; in that unusual case we subtract the length of~|q| and any 16552 other nodes that will be discarded after the discretionary break. 16553 16554 The value of $l_0$ need not be computed, since |line_break| will put 16555 it into the global variable |disc_width| before calling |try_break|. 16556 16557 @<Glob...@>= 16558 @!disc_width:scaled; {the length of discretionary material preceding a break} 16559 16560 @ @<Compute the discretionary |break...@>= 16561 begin t:=replace_count(cur_p); v:=cur_p; s:=post_break(cur_p); 16562 while t>0 do 16563 begin decr(t); v:=link(v); 16564 @<Subtract the width of node |v| from |break_width|@>; 16565 end; 16566 while s<>null do 16567 begin @<Add the width of node |s| to |break_width|@>; 16568 s:=link(s); 16569 end; 16570 break_width[1]:=break_width[1]+disc_width; 16571 if post_break(cur_p)=null then s:=link(v); 16572 {nodes may be discardable after the break} 16573 end 16574 16575 @ Replacement texts and discretionary texts are supposed to contain 16576 only character nodes, kern nodes, ligature nodes, and box or rule nodes. 16577 16578 @<Subtract the width of node |v|...@>= 16579 if is_char_node(v) then 16580 begin f:=font(v); 16581 break_width[1]:=break_width[1]-char_width(f)(char_info(f)(character(v))); 16582 end 16583 else case type(v) of 16584 ligature_node: begin f:=font(lig_char(v));@/ 16585 break_width[1]:=@|break_width[1]- 16586 char_width(f)(char_info(f)(character(lig_char(v)))); 16587 end; 16588 hlist_node,vlist_node,rule_node,kern_node: 16589 break_width[1]:=break_width[1]-width(v); 16590 othercases confusion("disc1") 16591 @:this can't happen disc1}{\quad disc1@> 16592 endcases 16593 16594 @ @<Add the width of node |s| to |b...@>= 16595 if is_char_node(s) then 16596 begin f:=font(s); 16597 break_width[1]:=@|break_width[1]+char_width(f)(char_info(f)(character(s))); 16598 end 16599 else case type(s) of 16600 ligature_node: begin f:=font(lig_char(s)); 16601 break_width[1]:=break_width[1]+ 16602 char_width(f)(char_info(f)(character(lig_char(s)))); 16603 end; 16604 hlist_node,vlist_node,rule_node,kern_node: 16605 break_width[1]:=break_width[1]+width(s); 16606 othercases confusion("disc2") 16607 @:this can't happen disc2}{\quad disc2@> 16608 endcases 16609 16610 @ We use the fact that |type(active)<>delta_node|. 16611 16612 @d convert_to_break_width(#)==@| 16613 mem[prev_r+#].sc:=@|@t\hskip10pt@>mem[prev_r+#].sc 16614 -cur_active_width[#]+break_width[#] 16615 @d store_break_width(#)==active_width[#]:=break_width[#] 16616 @d new_delta_to_break_width(#)==@| 16617 mem[q+#].sc:=break_width[#]-cur_active_width[#] 16618 16619 @<Insert a delta node to prepare for breaks at |cur_p|@>= 16620 if type(prev_r)=delta_node then {modify an existing delta node} 16621 begin do_all_six(convert_to_break_width); 16622 end 16623 else if prev_r=active then {no delta node needed at the beginning} 16624 begin do_all_six(store_break_width); 16625 end 16626 else begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/ 16627 subtype(q):=0; {the |subtype| is not used} 16628 do_all_six(new_delta_to_break_width); 16629 link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q; 16630 end 16631 16632 @ When the following code is performed, we will have just inserted at 16633 least one active node before |r|, so |type(prev_r)<>delta_node|. 16634 16635 @d new_delta_from_break_width(#)==@|mem[q+#].sc:= 16636 cur_active_width[#]-break_width[#] 16637 16638 @<Insert a delta node to prepare for the next active node@>= 16639 if r<>last_active then 16640 begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/ 16641 subtype(q):=0; {the |subtype| is not used} 16642 do_all_six(new_delta_from_break_width); 16643 link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q; 16644 end 16645 16646 @ When we create an active node, we also create the corresponding 16647 passive node. 16648 16649 @<Insert a new active node from |best_place[fit_class]| to |cur_p|@>= 16650 begin q:=get_node(passive_node_size); 16651 link(q):=passive; passive:=q; cur_break(q):=cur_p; 16652 @!stat incr(pass_number); serial(q):=pass_number;@+tats@;@/ 16653 prev_break(q):=best_place[fit_class];@/ 16654 q:=get_node(active_node_size); break_node(q):=passive; 16655 line_number(q):=best_pl_line[fit_class]+1; 16656 fitness(q):=fit_class; type(q):=break_type; 16657 total_demerits(q):=minimal_demerits[fit_class]; 16658 link(q):=r; link(prev_r):=q; prev_r:=q; 16659 @!stat if tracing_paragraphs>0 then 16660 @<Print a symbolic description of the new break node@>; 16661 tats@;@/ 16662 end 16663 16664 @ @<Print a symbolic description of the new break node@>= 16665 begin print_nl("@@@@"); print_int(serial(passive)); 16666 @.\AT!\AT!@> 16667 print(": line "); print_int(line_number(q)-1); 16668 print_char("."); print_int(fit_class); 16669 if break_type=hyphenated then print_char("-"); 16670 print(" t="); print_int(total_demerits(q)); 16671 print(" -> @@@@"); 16672 if prev_break(passive)=null then print_char("0") 16673 else print_int(serial(prev_break(passive))); 16674 end 16675 16676 @ The length of lines depends on whether the user has specified 16677 \.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it 16678 points to a $(2n+1)$-word record in |mem|, where the |info| in the first 16679 word contains the value of |n|, and the other $2n$ words contain the left 16680 margins and line lengths for the first |n| lines of the paragraph; the 16681 specifications for line |n| apply to all subsequent lines. If 16682 |par_shape_ptr=null|, the shape of the paragraph depends on the value of 16683 |n=hang_after|; if |n>=0|, hanging indentation takes place on lines |n+1|, 16684 |n+2|, \dots, otherwise it takes place on lines 1, \dots, $\vert 16685 n\vert$. When hanging indentation is active, the left margin is 16686 |hang_indent|, if |hang_indent>=0|, else it is 0; the line length is 16687 $|hsize|-\vert|hang_indent|\vert$. The normal setting is 16688 |par_shape_ptr=null|, |hang_after=1|, and |hang_indent=0|. 16689 Note that if |hang_indent=0|, the value of |hang_after| is irrelevant. 16690 @^length of lines@> @^hanging indentation@> 16691 16692 @<Glob...@>= 16693 @!easy_line:halfword; {line numbers |>easy_line| are equivalent in break nodes} 16694 @!last_special_line:halfword; {line numbers |>last_special_line| all have 16695 the same width} 16696 @!first_width:scaled; {the width of all lines |<=last_special_line|, if 16697 no \.{\\parshape} has been specified} 16698 @!second_width:scaled; {the width of all lines |>last_special_line|} 16699 @!first_indent:scaled; {left margin to go with |first_width|} 16700 @!second_indent:scaled; {left margin to go with |second_width|} 16701 16702 @ We compute the values of |easy_line| and the other local variables relating 16703 to line length when the |line_break| procedure is initializing itself. 16704 16705 @<Get ready to start...@>= 16706 if par_shape_ptr=null then 16707 if hang_indent=0 then 16708 begin last_special_line:=0; second_width:=hsize; 16709 second_indent:=0; 16710 end 16711 else @<Set line length parameters in preparation for hanging indentation@> 16712 else begin last_special_line:=info(par_shape_ptr)-1; 16713 second_width:=mem[par_shape_ptr+2*(last_special_line+1)].sc; 16714 second_indent:=mem[par_shape_ptr+2*last_special_line+1].sc; 16715 end; 16716 if looseness=0 then easy_line:=last_special_line 16717 else easy_line:=max_halfword 16718 16719 @ @<Set line length parameters in preparation for hanging indentation@>= 16720 begin last_special_line:=abs(hang_after); 16721 if hang_after<0 then 16722 begin first_width:=hsize-abs(hang_indent); 16723 if hang_indent>=0 then first_indent:=hang_indent 16724 else first_indent:=0; 16725 second_width:=hsize; second_indent:=0; 16726 end 16727 else begin first_width:=hsize; first_indent:=0; 16728 second_width:=hsize-abs(hang_indent); 16729 if hang_indent>=0 then second_indent:=hang_indent 16730 else second_indent:=0; 16731 end; 16732 end 16733 16734 @ When we come to the following code, we have just encountered the first 16735 active node~|r| whose |line_number| field contains |l|. Thus we want to 16736 compute the length of the $l\mskip1mu$th line of the current paragraph. Furthermore, 16737 we want to set |old_l| to the last number in the class of line numbers 16738 equivalent to~|l|. 16739 16740 @<Compute the new line width@>= 16741 if l>easy_line then 16742 begin line_width:=second_width; old_l:=max_halfword-1; 16743 end 16744 else begin old_l:=l; 16745 if l>last_special_line then line_width:=second_width 16746 else if par_shape_ptr=null then line_width:=first_width 16747 else line_width:=mem[par_shape_ptr+2*l@,].sc; 16748 end 16749 16750 @ The remaining part of |try_break| deals with the calculation of 16751 demerits for a break from |r| to |cur_p|. 16752 16753 The first thing to do is calculate the badness, |b|. This value will always 16754 be between zero and |inf_bad+1|; the latter value occurs only in the 16755 case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary 16756 width. In such cases, node |r| will be deactivated. 16757 We also deactivate node~|r| when a break at~|cur_p| is forced, since future 16758 breaks must go through a forced break. 16759 16760 @<Consider the demerits for a line from |r| to |cur_p|...@>= 16761 begin artificial_demerits:=false;@/ 16762 @^inner loop@> 16763 shortfall:=line_width-cur_active_width[1]; {we're this much too short} 16764 if shortfall>0 then 16765 @<Set the value of |b| to the badness for stretching the line, 16766 and compute the corresponding |fit_class|@> 16767 else @<Set the value of |b| to the badness for shrinking the line, 16768 and compute the corresponding |fit_class|@>; 16769 if (b>inf_bad)or(pi=eject_penalty) then 16770 @<Prepare to deactivate node~|r|, and |goto deactivate| unless 16771 there is a reason to consider lines of text from |r| to |cur_p|@> 16772 else begin prev_r:=r; 16773 if b>threshold then goto continue; 16774 node_r_stays_active:=true; 16775 end; 16776 @<Record a new feasible break@>; 16777 if node_r_stays_active then goto continue; {|prev_r| has been set to |r|} 16778 deactivate: @<Deactivate node |r|@>; 16779 end 16780 16781 @ When a line must stretch, the available stretchability can be found in the 16782 subarray |cur_active_width[2..5]|, in units of points, fil, fill, and filll. 16783 16784 The present section is part of \TeX's inner loop, and it is most often performed 16785 when the badness is infinite; therefore it is worth while to make a quick 16786 test for large width excess and small stretchability, before calling the 16787 |badness| subroutine. 16788 @^inner loop@> 16789 16790 @<Set the value of |b| to the badness for stretching...@>= 16791 if (cur_active_width[3]<>0)or(cur_active_width[4]<>0)or@| 16792 (cur_active_width[5]<>0) then 16793 begin b:=0; fit_class:=decent_fit; {infinite stretch} 16794 end 16795 else begin if shortfall>7230584 then if cur_active_width[2]<1663497 then 16796 begin b:=inf_bad; fit_class:=very_loose_fit; goto done1; 16797 end; 16798 b:=badness(shortfall,cur_active_width[2]); 16799 if b>12 then 16800 if b>99 then fit_class:=very_loose_fit 16801 else fit_class:=loose_fit 16802 else fit_class:=decent_fit; 16803 done1: 16804 end 16805 16806 @ Shrinkability is never infinite in a paragraph; 16807 we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|. 16808 16809 @<Set the value of |b| to the badness for shrinking...@>= 16810 begin if -shortfall>cur_active_width[6] then b:=inf_bad+1 16811 else b:=badness(-shortfall,cur_active_width[6]); 16812 if b>12 then fit_class:=tight_fit@+else fit_class:=decent_fit; 16813 end 16814 16815 @ During the final pass, we dare not lose all active nodes, lest we lose 16816 touch with the line breaks already found. The code shown here makes sure 16817 that such a catastrophe does not happen, by permitting overfull boxes as 16818 a last resort. This particular part of \TeX\ was a source of several subtle 16819 bugs before the correct program logic was finally discovered; readers 16820 who seek to ``improve'' \TeX\ should therefore think thrice before daring 16821 to make any changes here. 16822 @^overfull boxes@> 16823 16824 @<Prepare to deactivate node~|r|, and |goto deactivate| unless...@>= 16825 begin if final_pass and (minimum_demerits=awful_bad) and@| 16826 (link(r)=last_active) and 16827 (prev_r=active) then 16828 artificial_demerits:=true {set demerits zero, this break is forced} 16829 else if b>threshold then goto deactivate; 16830 node_r_stays_active:=false; 16831 end 16832 16833 @ When we get to this part of the code, the line from |r| to |cur_p| is 16834 feasible, its badness is~|b|, and its fitness classification is |fit_class|. 16835 We don't want to make an active node for this break yet, but we will 16836 compute the total demerits and record them in the |minimal_demerits| array, 16837 if such a break is the current champion among all ways to get to |cur_p| 16838 in a given line-number class and fitness class. 16839 16840 @<Record a new feasible break@>= 16841 if artificial_demerits then d:=0 16842 else @<Compute the demerits, |d|, from |r| to |cur_p|@>; 16843 @!stat if tracing_paragraphs>0 then 16844 @<Print a symbolic description of this feasible break@>; 16845 tats@;@/ 16846 d:=d+total_demerits(r); {this is the minimum total demerits 16847 from the beginning to |cur_p| via |r|} 16848 if d<=minimal_demerits[fit_class] then 16849 begin minimal_demerits[fit_class]:=d; 16850 best_place[fit_class]:=break_node(r); best_pl_line[fit_class]:=l; 16851 if d<minimum_demerits then minimum_demerits:=d; 16852 end 16853 16854 @ @<Print a symbolic description of this feasible break@>= 16855 begin if printed_node<>cur_p then 16856 @<Print the list between |printed_node| and |cur_p|, 16857 then set |printed_node:=cur_p|@>; 16858 print_nl("@@"); 16859 @.\AT!@> 16860 if cur_p=null then print_esc("par") 16861 else if type(cur_p)<>glue_node then 16862 begin if type(cur_p)=penalty_node then print_esc("penalty") 16863 else if type(cur_p)=disc_node then print_esc("discretionary") 16864 else if type(cur_p)=kern_node then print_esc("kern") 16865 else print_esc("math"); 16866 end; 16867 print(" via @@@@"); 16868 if break_node(r)=null then print_char("0") 16869 else print_int(serial(break_node(r))); 16870 print(" b="); 16871 if b>inf_bad then print_char("*")@+else print_int(b); 16872 @.*\relax@> 16873 print(" p="); print_int(pi); print(" d="); 16874 if artificial_demerits then print_char("*")@+else print_int(d); 16875 end 16876 16877 @ @<Print the list between |printed_node| and |cur_p|...@>= 16878 begin print_nl(""); 16879 if cur_p=null then short_display(link(printed_node)) 16880 else begin save_link:=link(cur_p); 16881 link(cur_p):=null; print_nl(""); short_display(link(printed_node)); 16882 link(cur_p):=save_link; 16883 end; 16884 printed_node:=cur_p; 16885 end 16886 16887 @ When the data for a discretionary break is being displayed, we will have 16888 printed the |pre_break| and |post_break| lists; we want to skip over the 16889 third list, so that the discretionary data will not appear twice. The 16890 following code is performed at the very end of |try_break|. 16891 16892 @<Update the value of |printed_node|...@>= 16893 if cur_p=printed_node then if cur_p<>null then if type(cur_p)=disc_node then 16894 begin t:=replace_count(cur_p); 16895 while t>0 do 16896 begin decr(t); printed_node:=link(printed_node); 16897 end; 16898 end 16899 16900 @ @<Compute the demerits, |d|, from |r| to |cur_p|@>= 16901 begin d:=line_penalty+b; 16902 if abs(d)>=10000 then d:=100000000@+else d:=d*d; 16903 if pi<>0 then 16904 if pi>0 then d:=d+pi*pi 16905 else if pi>eject_penalty then d:=d-pi*pi; 16906 if (break_type=hyphenated)and(type(r)=hyphenated) then 16907 if cur_p<>null then d:=d+double_hyphen_demerits 16908 else d:=d+final_hyphen_demerits; 16909 if abs(fit_class-fitness(r))>1 then d:=d+adj_demerits; 16910 end 16911 16912 @ When an active node disappears, we must delete an adjacent delta node if the 16913 active node was at the beginning or the end of the active list, or if it 16914 was surrounded by delta nodes. We also must preserve the property that 16915 |cur_active_width| represents the length of material from |link(prev_r)| 16916 to~|cur_p|. 16917 16918 @d combine_two_deltas(#)==@|mem[prev_r+#].sc:=mem[prev_r+#].sc+mem[r+#].sc 16919 @d downdate_width(#)==@|cur_active_width[#]:=cur_active_width[#]- 16920 mem[prev_r+#].sc 16921 16922 @<Deactivate node |r|@>= 16923 link(prev_r):=link(r); free_node(r,active_node_size); 16924 if prev_r=active then @<Update the active widths, since the first active 16925 node has been deleted@> 16926 else if type(prev_r)=delta_node then 16927 begin r:=link(prev_r); 16928 if r=last_active then 16929 begin do_all_six(downdate_width); 16930 link(prev_prev_r):=last_active; 16931 free_node(prev_r,delta_node_size); prev_r:=prev_prev_r; 16932 end 16933 else if type(r)=delta_node then 16934 begin do_all_six(update_width); 16935 do_all_six(combine_two_deltas); 16936 link(prev_r):=link(r); free_node(r,delta_node_size); 16937 end; 16938 end 16939 16940 @ The following code uses the fact that |type(last_active)<>delta_node|. If the 16941 active list has just become empty, we do not need to update the 16942 |active_width| array, since it will be initialized when an active 16943 node is next inserted. 16944 16945 @d update_active(#)==active_width[#]:=active_width[#]+mem[r+#].sc 16946 16947 @<Update the active widths,...@>= 16948 begin r:=link(active); 16949 if type(r)=delta_node then 16950 begin do_all_six(update_active); 16951 do_all_six(copy_to_cur_active); 16952 link(active):=link(r); free_node(r,delta_node_size); 16953 end; 16954 end 16955 16956 @* \[39] Breaking paragraphs into lines, continued. 16957 So far we have gotten a little way into the |line_break| routine, having 16958 covered its important |try_break| subroutine. Now let's consider the 16959 rest of the process. 16960 16961 The main loop of |line_break| traverses the given hlist, 16962 starting at |link(temp_head)|, and calls |try_break| at each legal 16963 breakpoint. A variable called |auto_breaking| is set to true except 16964 within math formulas, since glue nodes are not legal breakpoints when 16965 they appear in formulas. 16966 16967 The current node of interest in the hlist is pointed to by |cur_p|. Another 16968 variable, |prev_p|, is usually one step behind |cur_p|, but the real 16969 meaning of |prev_p| is this: If |type(cur_p)=glue_node| then |cur_p| is a legal 16970 breakpoint if and only if |auto_breaking| is true and |prev_p| does not 16971 point to a glue node, penalty node, explicit kern node, or math node. 16972 16973 The following declarations provide for a few other local variables that are 16974 used in special calculations. 16975 16976 @<Local variables for line breaking@>= 16977 @!auto_breaking:boolean; {is node |cur_p| outside a formula?} 16978 @!prev_p:pointer; {helps to determine when glue nodes are breakpoints} 16979 @!q,@!r,@!s,@!prev_s:pointer; {miscellaneous nodes of temporary interest} 16980 @!f:internal_font_number; {used when calculating character widths} 16981 16982 @ The `\ignorespaces|loop|\unskip' in the following code is performed at most 16983 thrice per call of |line_break|, since it is actually a pass over the 16984 entire paragraph. 16985 16986 @<Find optimal breakpoints@>= 16987 threshold:=pretolerance; 16988 if threshold>=0 then 16989 begin @!stat if tracing_paragraphs>0 then 16990 begin begin_diagnostic; print_nl("@@firstpass");@+end;@;@+tats@;@/ 16991 second_pass:=false; final_pass:=false; 16992 end 16993 else begin threshold:=tolerance; second_pass:=true; 16994 final_pass:=(emergency_stretch<=0); 16995 @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@; 16996 end; 16997 loop@+ begin if threshold>inf_bad then threshold:=inf_bad; 16998 if second_pass then @<Initialize for hyphenating a paragraph@>; 16999 @<Create an active breakpoint representing the beginning of the paragraph@>; 17000 cur_p:=link(temp_head); auto_breaking:=true;@/ 17001 prev_p:=cur_p; {glue at beginning is not a legal breakpoint} 17002 while (cur_p<>null)and(link(active)<>last_active) do 17003 @<Call |try_break| if |cur_p| is a legal breakpoint; 17004 on the second pass, also try to hyphenate the next 17005 word, if |cur_p| is a glue node; 17006 then advance |cur_p| to the next node of the paragraph 17007 that could possibly be a legal breakpoint@>; 17008 if cur_p=null then 17009 @<Try the final line break at the end of the paragraph, 17010 and |goto done| if the desired breakpoints have been found@>; 17011 @<Clean up the memory by removing the break nodes@>; 17012 if not second_pass then 17013 begin@!stat if tracing_paragraphs>0 then print_nl("@@secondpass");@;@+tats@/ 17014 threshold:=tolerance; second_pass:=true; final_pass:=(emergency_stretch<=0); 17015 end {if at first you don't succeed, \dots} 17016 else begin @!stat if tracing_paragraphs>0 then 17017 print_nl("@@emergencypass");@;@+tats@/ 17018 background[2]:=background[2]+emergency_stretch; final_pass:=true; 17019 end; 17020 end; 17021 done: @!stat if tracing_paragraphs>0 then 17022 begin end_diagnostic(true); normalize_selector; 17023 end;@+tats@/ 17024 17025 @ The active node that represents the starting point does not need a 17026 corresponding passive node. 17027 17028 @d store_background(#)==active_width[#]:=background[#] 17029 17030 @<Create an active breakpoint representing the beginning of the paragraph@>= 17031 q:=get_node(active_node_size); 17032 type(q):=unhyphenated; fitness(q):=decent_fit; 17033 link(q):=last_active; break_node(q):=null; 17034 line_number(q):=prev_graf+1; total_demerits(q):=0; link(active):=q; 17035 do_all_six(store_background);@/ 17036 passive:=null; printed_node:=temp_head; pass_number:=0; 17037 font_in_short_display:=null_font 17038 17039 @ @<Clean...@>= 17040 q:=link(active); 17041 while q<>last_active do 17042 begin cur_p:=link(q); 17043 if type(q)=delta_node then free_node(q,delta_node_size) 17044 else free_node(q,active_node_size); 17045 q:=cur_p; 17046 end; 17047 q:=passive; 17048 while q<>null do 17049 begin cur_p:=link(q); 17050 free_node(q,passive_node_size); 17051 q:=cur_p; 17052 end 17053 17054 @ Here is the main switch in the |line_break| routine, where legal breaks 17055 are determined. As we move through the hlist, we need to keep the |active_width| 17056 array up to date, so that the badness of individual lines is readily calculated 17057 by |try_break|. It is convenient to use the short name |act_width| for 17058 the component of active width that represents real width as opposed to glue. 17059 17060 @d act_width==active_width[1] {length from first active node to current node} 17061 @d kern_break==begin if not is_char_node(link(cur_p)) and auto_breaking then 17062 if type(link(cur_p))=glue_node then try_break(0,unhyphenated); 17063 act_width:=act_width+width(cur_p); 17064 end 17065 17066 @<Call |try_break| if |cur_p| is a legal breakpoint...@>= 17067 begin if is_char_node(cur_p) then 17068 @<Advance \(c)|cur_p| to the node following the present 17069 string of characters@>; 17070 case type(cur_p) of 17071 hlist_node,vlist_node,rule_node: act_width:=act_width+width(cur_p); 17072 whatsit_node: @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>; 17073 glue_node: begin @<If node |cur_p| is a legal breakpoint, call |try_break|; 17074 then update the active widths by including the glue in |glue_ptr(cur_p)|@>; 17075 if second_pass and auto_breaking then 17076 @<Try to hyphenate the following word@>; 17077 end; 17078 kern_node: if subtype(cur_p)=explicit then kern_break 17079 else act_width:=act_width+width(cur_p); 17080 ligature_node: begin f:=font(lig_char(cur_p)); 17081 act_width:=act_width+char_width(f)(char_info(f)(character(lig_char(cur_p)))); 17082 end; 17083 disc_node: @<Try to break after a discretionary fragment, then |goto done5|@>; 17084 math_node: begin auto_breaking:=(subtype(cur_p)=after); kern_break; 17085 end; 17086 penalty_node: try_break(penalty(cur_p),unhyphenated); 17087 mark_node,ins_node,adjust_node: do_nothing; 17088 othercases confusion("paragraph") 17089 @:this can't happen paragraph}{\quad paragraph@> 17090 endcases;@/ 17091 prev_p:=cur_p; cur_p:=link(cur_p); 17092 done5:end 17093 17094 @ The code that passes over the characters of words in a paragraph is 17095 part of \TeX's inner loop, so it has been streamlined for speed. We use 17096 the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph; 17097 it is therefore unnecessary to check if |link(cur_p)=null| when |cur_p| is a 17098 character node. 17099 @^inner loop@> 17100 17101 @<Advance \(c)|cur_p| to the node following the present string...@>= 17102 begin prev_p:=cur_p; 17103 repeat f:=font(cur_p); 17104 act_width:=act_width+char_width(f)(char_info(f)(character(cur_p))); 17105 cur_p:=link(cur_p); 17106 until not is_char_node(cur_p); 17107 end 17108 17109 @ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not 17110 a breakpoint is legal at |cur_p|, as explained above. 17111 17112 @<If node |cur_p| is a legal breakpoint, call...@>= 17113 if auto_breaking then 17114 begin if is_char_node(prev_p) then try_break(0,unhyphenated) 17115 else if precedes_break(prev_p) then try_break(0,unhyphenated) 17116 else if (type(prev_p)=kern_node)and(subtype(prev_p)<>explicit) then 17117 try_break(0,unhyphenated); 17118 end; 17119 check_shrinkage(glue_ptr(cur_p)); q:=glue_ptr(cur_p); 17120 act_width:=act_width+width(q);@| 17121 active_width[2+stretch_order(q)]:=@| 17122 active_width[2+stretch_order(q)]+stretch(q);@/ 17123 active_width[6]:=active_width[6]+shrink(q) 17124 17125 @ The following code knows that discretionary texts contain 17126 only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes. 17127 17128 @<Try to break after a discretionary fragment...@>= 17129 begin s:=pre_break(cur_p); disc_width:=0; 17130 if s=null then try_break(ex_hyphen_penalty,hyphenated) 17131 else begin repeat @<Add the width of node |s| to |disc_width|@>; 17132 s:=link(s); 17133 until s=null; 17134 act_width:=act_width+disc_width; 17135 try_break(hyphen_penalty,hyphenated); 17136 act_width:=act_width-disc_width; 17137 end; 17138 r:=replace_count(cur_p); s:=link(cur_p); 17139 while r>0 do 17140 begin @<Add the width of node |s| to |act_width|@>; 17141 decr(r); s:=link(s); 17142 end; 17143 prev_p:=cur_p; cur_p:=s; goto done5; 17144 end 17145 17146 @ @<Add the width of node |s| to |disc_width|@>= 17147 if is_char_node(s) then 17148 begin f:=font(s); 17149 disc_width:=disc_width+char_width(f)(char_info(f)(character(s))); 17150 end 17151 else case type(s) of 17152 ligature_node: begin f:=font(lig_char(s)); 17153 disc_width:=disc_width+ 17154 char_width(f)(char_info(f)(character(lig_char(s)))); 17155 end; 17156 hlist_node,vlist_node,rule_node,kern_node: 17157 disc_width:=disc_width+width(s); 17158 othercases confusion("disc3") 17159 @:this can't happen disc3}{\quad disc3@> 17160 endcases 17161 17162 @ @<Add the width of node |s| to |act_width|@>= 17163 if is_char_node(s) then 17164 begin f:=font(s); 17165 act_width:=act_width+char_width(f)(char_info(f)(character(s))); 17166 end 17167 else case type(s) of 17168 ligature_node: begin f:=font(lig_char(s)); 17169 act_width:=act_width+ 17170 char_width(f)(char_info(f)(character(lig_char(s)))); 17171 end; 17172 hlist_node,vlist_node,rule_node,kern_node: 17173 act_width:=act_width+width(s); 17174 othercases confusion("disc4") 17175 @:this can't happen disc4}{\quad disc4@> 17176 endcases 17177 17178 @ The forced line break at the paragraph's end will reduce the list of 17179 breakpoints so that all active nodes represent breaks at |cur_p=null|. 17180 On the first pass, we insist on finding an active node that has the 17181 correct ``looseness.'' On the final pass, there will be at least one active 17182 node, and we will match the desired looseness as well as we can. 17183 17184 The global variable |best_bet| will be set to the active node for the best 17185 way to break the paragraph, and a few other variables are used to 17186 help determine what is best. 17187 17188 @<Glob...@>= 17189 @!best_bet:pointer; {use this passive node and its predecessors} 17190 @!fewest_demerits:integer; {the demerits associated with |best_bet|} 17191 @!best_line:halfword; {line number following the last line of the new paragraph} 17192 @!actual_looseness:integer; {the difference between |line_number(best_bet)| 17193 and the optimum |best_line|} 17194 @!line_diff:integer; {the difference between the current line number and 17195 the optimum |best_line|} 17196 17197 @ @<Try the final line break at the end of the paragraph...@>= 17198 begin try_break(eject_penalty,hyphenated); 17199 if link(active)<>last_active then 17200 begin @<Find an active node with fewest demerits@>; 17201 if looseness=0 then goto done; 17202 @<Find the best active node for the desired looseness@>; 17203 if (actual_looseness=looseness)or final_pass then goto done; 17204 end; 17205 end 17206 17207 @ @<Find an active node...@>= 17208 r:=link(active); fewest_demerits:=awful_bad; 17209 repeat if type(r)<>delta_node then if total_demerits(r)<fewest_demerits then 17210 begin fewest_demerits:=total_demerits(r); best_bet:=r; 17211 end; 17212 r:=link(r); 17213 until r=last_active; 17214 best_line:=line_number(best_bet) 17215 17216 @ The adjustment for a desired looseness is a slightly more complicated 17217 version of the loop just considered. Note that if a paragraph is broken 17218 into segments by displayed equations, each segment will be subject to the 17219 looseness calculation, independently of the other segments. 17220 17221 @<Find the best active node...@>= 17222 begin r:=link(active); actual_looseness:=0; 17223 repeat if type(r)<>delta_node then 17224 begin line_diff:=line_number(r)-best_line; 17225 if ((line_diff<actual_looseness)and(looseness<=line_diff))or@| 17226 ((line_diff>actual_looseness)and(looseness>=line_diff)) then 17227 begin best_bet:=r; actual_looseness:=line_diff; 17228 fewest_demerits:=total_demerits(r); 17229 end 17230 else if (line_diff=actual_looseness)and@| 17231 (total_demerits(r)<fewest_demerits) then 17232 begin best_bet:=r; fewest_demerits:=total_demerits(r); 17233 end; 17234 end; 17235 r:=link(r); 17236 until r=last_active; 17237 best_line:=line_number(best_bet); 17238 end 17239 17240 @ Once the best sequence of breakpoints has been found (hurray), we call on the 17241 procedure |post_line_break| to finish the remainder of the work. 17242 (By introducing this subprocedure, we are able to keep |line_break| 17243 from getting extremely long.) 17244 17245 @<Break the paragraph at the chosen...@>= 17246 post_line_break(final_widow_penalty) 17247 17248 @ The total number of lines that will be set by |post_line_break| 17249 is |best_line-prev_graf-1|. The last breakpoint is specified by 17250 |break_node(best_bet)|, and this passive node points to the other breakpoints 17251 via the |prev_break| links. The finishing-up phase starts by linking the 17252 relevant passive nodes in forward order, changing |prev_break| to 17253 |next_break|. (The |next_break| fields actually reside in the same memory 17254 space as the |prev_break| fields did, but we give them a new name because 17255 of their new significance.) Then the lines are justified, one by one. 17256 17257 @d next_break==prev_break {new name for |prev_break| after links are reversed} 17258 17259 @<Declare subprocedures for |line_break|@>= 17260 procedure post_line_break(@!final_widow_penalty:integer); 17261 label done,done1; 17262 var q,@!r,@!s:pointer; {temporary registers for list manipulation} 17263 @!disc_break:boolean; {was the current break at a discretionary node?} 17264 @!post_disc_break:boolean; {and did it have a nonempty post-break part?} 17265 @!cur_width:scaled; {width of line number |cur_line|} 17266 @!cur_indent:scaled; {left margin of line number |cur_line|} 17267 @!t:quarterword; {used for replacement counts in discretionary nodes} 17268 @!pen:integer; {use when calculating penalties between lines} 17269 @!cur_line: halfword; {the current line number being justified} 17270 begin @<Reverse the links of the relevant passive nodes, setting |cur_p| to the 17271 first breakpoint@>; 17272 cur_line:=prev_graf+1; 17273 repeat @<Justify the line ending at breakpoint |cur_p|, and append it to the 17274 current vertical list, together with associated penalties and other 17275 insertions@>; 17276 incr(cur_line); cur_p:=next_break(cur_p); 17277 if cur_p<>null then if not post_disc_break then 17278 @<Prune unwanted nodes at the beginning of the next line@>; 17279 until cur_p=null; 17280 if (cur_line<>best_line)or(link(temp_head)<>null) then 17281 confusion("line breaking"); 17282 @:this can't happen line breaking}{\quad line breaking@> 17283 prev_graf:=best_line-1; 17284 end; 17285 17286 @ The job of reversing links in a list is conveniently regarded as the job 17287 of taking items off one stack and putting them on another. In this case we 17288 take them off a stack pointed to by |q| and having |prev_break| fields; 17289 we put them on a stack pointed to by |cur_p| and having |next_break| fields. 17290 Node |r| is the passive node being moved from stack to stack. 17291 17292 @<Reverse the links of the relevant passive nodes...@>= 17293 q:=break_node(best_bet); cur_p:=null; 17294 repeat r:=q; q:=prev_break(q); next_break(r):=cur_p; cur_p:=r; 17295 until q=null 17296 17297 @ Glue and penalty and kern and math nodes are deleted at the beginning of 17298 a line, except in the anomalous case that the node to be deleted is actually 17299 one of the chosen breakpoints. Otherwise 17300 the pruning done here is designed to match 17301 the lookahead computation in |try_break|, where the |break_width| values 17302 are computed for non-discretionary breakpoints. 17303 17304 @<Prune unwanted nodes at the beginning of the next line@>= 17305 begin r:=temp_head; 17306 loop@+ begin q:=link(r); 17307 if q=cur_break(cur_p) then goto done1; 17308 {|cur_break(cur_p)| is the next breakpoint} 17309 {now |q| cannot be |null|} 17310 if is_char_node(q) then goto done1; 17311 if non_discardable(q) then goto done1; 17312 if type(q)=kern_node then if subtype(q)<>explicit then goto done1; 17313 r:=q; {now |type(q)=glue_node|, |kern_node|, |math_node|, or |penalty_node|} 17314 end; 17315 done1: if r<>temp_head then 17316 begin link(r):=null; flush_node_list(link(temp_head)); 17317 link(temp_head):=q; 17318 end; 17319 end 17320 17321 @ The current line to be justified appears in a horizontal list starting 17322 at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is 17323 a glue node, we reset the glue to equal the |right_skip| glue; otherwise 17324 we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a 17325 discretionary node, we modify the list so that the discretionary break 17326 is compulsory, and we set |disc_break| to |true|. We also append 17327 the |left_skip| glue at the left of the line, unless it is zero. 17328 17329 @<Justify the line ending at breakpoint |cur_p|, and append it...@>= 17330 @<Modify the end of the line to reflect the nature of the break and to include 17331 \.{\\rightskip}; also set the proper value of |disc_break|@>; 17332 @<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>; 17333 @<Call the packaging subroutine, setting |just_box| to the justified box@>; 17334 @<Append the new box to the current vertical list, followed by the list of 17335 special nodes taken out of the box by the packager@>; 17336 @<Append a penalty node, if a nonzero penalty is appropriate@> 17337 17338 @ At the end of the following code, |q| will point to the final node on the 17339 list about to be justified. 17340 17341 @<Modify the end of the line...@>= 17342 q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false; 17343 if q<>null then {|q| cannot be a |char_node|} 17344 if type(q)=glue_node then 17345 begin delete_glue_ref(glue_ptr(q)); 17346 glue_ptr(q):=right_skip; 17347 subtype(q):=right_skip_code+1; add_glue_ref(right_skip); 17348 goto done; 17349 end 17350 else begin if type(q)=disc_node then 17351 @<Change discretionary to compulsory and set 17352 |disc_break:=true|@> 17353 else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0; 17354 end 17355 else begin q:=temp_head; 17356 while link(q)<>null do q:=link(q); 17357 end; 17358 @<Put the \(r)\.{\\rightskip} glue after node |q|@>; 17359 done: 17360 17361 @ @<Change discretionary to compulsory...@>= 17362 begin t:=replace_count(q); 17363 @<Destroy the |t| nodes following |q|, and 17364 make |r| point to the following node@>; 17365 if post_break(q)<>null then @<Transplant the post-break list@>; 17366 if pre_break(q)<>null then @<Transplant the pre-break list@>; 17367 link(q):=r; disc_break:=true; 17368 end 17369 17370 @ @<Destroy the |t| nodes following |q|...@>= 17371 if t=0 then r:=link(q) 17372 else begin r:=q; 17373 while t>1 do 17374 begin r:=link(r); decr(t); 17375 end; 17376 s:=link(r); 17377 r:=link(s); link(s):=null; 17378 flush_node_list(link(q)); replace_count(q):=0; 17379 end 17380 17381 @ We move the post-break list from inside node |q| to the main list by 17382 re\-attaching it just before the present node |r|, then resetting |r|. 17383 17384 @<Transplant the post-break list@>= 17385 begin s:=post_break(q); 17386 while link(s)<>null do s:=link(s); 17387 link(s):=r; r:=post_break(q); post_break(q):=null; post_disc_break:=true; 17388 end 17389 17390 @ We move the pre-break list from inside node |q| to the main list by 17391 re\-attaching it just after the present node |q|, then resetting |q|. 17392 17393 @<Transplant the pre-break list@>= 17394 begin s:=pre_break(q); link(q):=s; 17395 while link(s)<>null do s:=link(s); 17396 pre_break(q):=null; q:=s; 17397 end 17398 17399 @ @<Put the \(r)\.{\\rightskip} glue after node |q|@>= 17400 r:=new_param_glue(right_skip_code); link(r):=link(q); link(q):=r; q:=r 17401 17402 @ The following code begins with |q| at the end of the list to be 17403 justified. It ends with |q| at the beginning of that list, and with 17404 |link(temp_head)| pointing to the remainder of the paragraph, if any. 17405 17406 @<Put the \(l)\.{\\leftskip} glue at the left...@>= 17407 r:=link(q); link(q):=null; q:=link(temp_head); link(temp_head):=r; 17408 if left_skip<>zero_glue then 17409 begin r:=new_param_glue(left_skip_code); 17410 link(r):=q; q:=r; 17411 end 17412 17413 @ @<Append the new box to the current vertical list...@>= 17414 append_to_vlist(just_box); 17415 if adjust_head<>adjust_tail then 17416 begin link(tail):=link(adjust_head); tail:=adjust_tail; 17417 end; 17418 adjust_tail:=null 17419 17420 @ Now |q| points to the hlist that represents the current line of the 17421 paragraph. We need to compute the appropriate line width, pack the 17422 line into a box of this size, and shift the box by the appropriate 17423 amount of indentation. 17424 17425 @<Call the packaging subroutine...@>= 17426 if cur_line>last_special_line then 17427 begin cur_width:=second_width; cur_indent:=second_indent; 17428 end 17429 else if par_shape_ptr=null then 17430 begin cur_width:=first_width; cur_indent:=first_indent; 17431 end 17432 else begin cur_width:=mem[par_shape_ptr+2*cur_line].sc; 17433 cur_indent:=mem[par_shape_ptr+2*cur_line-1].sc; 17434 end; 17435 adjust_tail:=adjust_head; just_box:=hpack(q,cur_width,exactly); 17436 shift_amount(just_box):=cur_indent 17437 17438 @ Penalties between the lines of a paragraph come from club and widow lines, 17439 from the |inter_line_penalty| parameter, and from lines that end at 17440 discretionary breaks. Breaking between lines of a two-line paragraph gets 17441 both club-line and widow-line penalties. The local variable |pen| will 17442 be set to the sum of all relevant penalties for the current line, except 17443 that the final line is never penalized. 17444 17445 @<Append a penalty node, if a nonzero penalty is appropriate@>= 17446 if cur_line+1<>best_line then 17447 begin pen:=inter_line_penalty; 17448 if cur_line=prev_graf+1 then pen:=pen+club_penalty; 17449 if cur_line+2=best_line then pen:=pen+final_widow_penalty; 17450 if disc_break then pen:=pen+broken_penalty; 17451 if pen<>0 then 17452 begin r:=new_penalty(pen); 17453 link(tail):=r; tail:=r; 17454 end; 17455 end 17456 17457 @* \[40] Pre-hyphenation. 17458 When the line-breaking routine is unable to find a feasible sequence of 17459 breakpoints, it makes a second pass over the paragraph, attempting to 17460 hyphenate the hyphenatable words. The goal of hyphenation is to insert 17461 discretionary material into the paragraph so that there are more 17462 potential places to break. 17463 17464 The general rules for hyphenation are somewhat complex and technical, 17465 because we want to be able to hyphenate words that are preceded or 17466 followed by punctuation marks, and because we want the rules to work 17467 for languages other than English. We also must contend with the fact 17468 that hyphens might radically alter the ligature and kerning structure 17469 of a word. 17470 17471 A sequence of characters will be considered for hyphenation only if it 17472 belongs to a ``potentially hyphenatable part'' of the current paragraph. 17473 This is a sequence of nodes $p_0p_1\ldots p_m$ where $p_0$ is a glue node, 17474 $p_1\ldots p_{m-1}$ are either character or ligature or whatsit or 17475 implicit kern nodes, and $p_m$ is a glue or penalty or insertion or adjust 17476 or mark or whatsit or explicit kern node. (Therefore hyphenation is 17477 disabled by boxes, math formulas, and discretionary nodes already inserted 17478 by the user.) The ligature nodes among $p_1\ldots p_{m-1}$ are effectively 17479 expanded into the original non-ligature characters; the kern nodes and 17480 whatsits are ignored. Each character |c| is now classified as either a 17481 nonletter (if |lc_code(c)=0|), a lowercase letter (if 17482 |lc_code(c)=c|), or an uppercase letter (otherwise); an uppercase letter 17483 is treated as if it were |lc_code(c)| for purposes of hyphenation. The 17484 characters generated by $p_1\ldots p_{m-1}$ may begin with nonletters; let 17485 $c_1$ be the first letter that is not in the middle of a ligature. Whatsit 17486 nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the 17487 terminating node $p_m$. All characters that do not have the same font as 17488 $c_1$ will be treated as nonletters. The |hyphen_char| for that font 17489 must be between 0 and 255, otherwise hyphenation will not be attempted. 17490 \TeX\ looks ahead for as many consecutive letters $c_1\ldots c_n$ as 17491 possible; however, |n| must be less than 64, so a character that would 17492 otherwise be $c_{64}$ is effectively not a letter. Furthermore $c_n$ must 17493 not be in the middle of a ligature. In this way we obtain a string of 17494 letters $c_1\ldots c_n$ that are generated by nodes $p_a\ldots p_b$, where 17495 |1<=a<=b+1<=m|. If |n>=l_hyf+r_hyf|, this string qualifies for hyphenation; 17496 however, |uc_hyph| must be positive, if $c_1$ is uppercase. 17497 17498 The hyphenation process takes place in three stages. First, the candidate 17499 sequence $c_1\ldots c_n$ is found; then potential positions for hyphens 17500 are determined by referring to hyphenation tables; and finally, the nodes 17501 $p_a\ldots p_b$ are replaced by a new sequence of nodes that includes the 17502 discretionary breaks found. 17503 17504 Fortunately, we do not have to do all this calculation very often, because 17505 of the way it has been taken out of \TeX's inner loop. For example, when 17506 the second edition of the author's 700-page book {\sl Seminumerical 17507 Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be 17508 @^Knuth, Donald Ervin@> 17509 tried per paragraph, since the line breaking algorithm needed to use two 17510 passes on only about 5 per cent of the paragraphs. 17511 17512 @<Initialize for hyphenating...@>= 17513 begin @!init if trie_not_ready then init_trie;@+tini@;@/ 17514 cur_lang:=init_cur_lang; l_hyf:=init_l_hyf; r_hyf:=init_r_hyf; 17515 end 17516 17517 @ The letters $c_1\ldots c_n$ that are candidates for hyphenation are placed 17518 into an array called |hc|; the number |n| is placed into |hn|; pointers to 17519 nodes $p_{a-1}$ and~$p_b$ in the description above are placed into variables 17520 |ha| and |hb|; and the font number is placed into |hf|. 17521 17522 @<Glob...@>= 17523 @!hc:array[0..65] of 0..256; {word to be hyphenated} 17524 @!hn:0..64; {the number of positions occupied in |hc|; 17525 not always a |small_number|} 17526 @!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result} 17527 @!hf:internal_font_number; {font number of the letters in |hc|} 17528 @!hu:array[0..63] of 0..256; {like |hc|, before conversion to lowercase} 17529 @!hyf_char:integer; {hyphen character of the relevant font} 17530 @!cur_lang,@!init_cur_lang:ASCII_code; {current hyphenation table of interest} 17531 @!l_hyf,@!r_hyf,@!init_l_hyf,@!init_r_hyf:integer; {limits on fragment sizes} 17532 @!hyf_bchar:halfword; {boundary character after $c_n$} 17533 17534 @ Hyphenation routines need a few more local variables. 17535 17536 @<Local variables for line...@>= 17537 @!j:small_number; {an index into |hc| or |hu|} 17538 @!c:0..255; {character being considered for hyphenation} 17539 17540 @ When the following code is activated, the |line_break| procedure is in its 17541 second pass, and |cur_p| points to a glue node. 17542 17543 @<Try to hyphenate...@>= 17544 begin prev_s:=cur_p; s:=link(prev_s); 17545 if s<>null then 17546 begin @<Skip to node |ha|, or |goto done1| if no hyphenation 17547 should be attempted@>; 17548 if l_hyf+r_hyf>63 then goto done1; 17549 @<Skip to node |hb|, putting letters into |hu| and |hc|@>; 17550 @<Check that the nodes following |hb| permit hyphenation and that at least 17551 |l_hyf+r_hyf| letters have been found, otherwise |goto done1|@>; 17552 hyphenate; 17553 end; 17554 done1: end 17555 17556 @ @<Declare subprocedures for |line_break|@>= 17557 @t\4@>@<Declare the function called |reconstitute|@> 17558 procedure hyphenate; 17559 label common_ending,done,found,found1,found2,not_found,exit; 17560 var @<Local variables for hyphenation@>@; 17561 begin @<Find hyphen locations for the word in |hc|, or |return|@>; 17562 @<If no hyphens were found, |return|@>; 17563 @<Replace nodes |ha..hb| by a sequence of nodes that includes 17564 the discretionary hyphens@>; 17565 exit:end; 17566 17567 @ The first thing we need to do is find the node |ha| just before the 17568 first letter. 17569 17570 @<Skip to node |ha|, or |goto done1|...@>= 17571 loop@+ begin if is_char_node(s) then 17572 begin c:=qo(character(s)); hf:=font(s); 17573 end 17574 else if type(s)=ligature_node then 17575 if lig_ptr(s)=null then goto continue 17576 else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q); 17577 end 17578 else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue 17579 else if type(s)=whatsit_node then 17580 begin @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>; 17581 goto continue; 17582 end 17583 else goto done1; 17584 if lc_code(c)<>0 then 17585 if (lc_code(c)=c)or(uc_hyph>0) then goto done2 17586 else goto done1; 17587 continue: prev_s:=s; s:=link(prev_s); 17588 end; 17589 done2: hyf_char:=hyphen_char[hf]; 17590 if hyf_char<0 then goto done1; 17591 if hyf_char>255 then goto done1; 17592 ha:=prev_s 17593 17594 @ The word to be hyphenated is now moved to the |hu| and |hc| arrays. 17595 17596 @<Skip to node |hb|, putting letters...@>= 17597 hn:=0; 17598 loop@+ begin if is_char_node(s) then 17599 begin if font(s)<>hf then goto done3; 17600 hyf_bchar:=character(s); c:=qo(hyf_bchar); 17601 if lc_code(c)=0 then goto done3; 17602 if hn=63 then goto done3; 17603 hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=lc_code(c); hyf_bchar:=non_char; 17604 end 17605 else if type(s)=ligature_node then 17606 @<Move the characters of a ligature node to |hu| and |hc|; 17607 but |goto done3| if they are not all letters@> 17608 else if (type(s)=kern_node)and(subtype(s)=normal) then 17609 begin hb:=s; 17610 hyf_bchar:=font_bchar[hf]; 17611 end 17612 else goto done3; 17613 s:=link(s); 17614 end; 17615 done3: 17616 17617 @ We let |j| be the index of the character being stored when a ligature node 17618 is being expanded, since we do not want to advance |hn| until we are sure 17619 that the entire ligature consists of letters. Note that it is possible 17620 to get to |done3| with |hn=0| and |hb| not set to any value. 17621 17622 @<Move the characters of a ligature node to |hu| and |hc|...@>= 17623 begin if font(lig_char(s))<>hf then goto done3; 17624 j:=hn; q:=lig_ptr(s);@+if q>null then hyf_bchar:=character(q); 17625 while q>null do 17626 begin c:=qo(character(q)); 17627 if lc_code(c)=0 then goto done3; 17628 if j=63 then goto done3; 17629 incr(j); hu[j]:=c; hc[j]:=lc_code(c);@/ 17630 q:=link(q); 17631 end; 17632 hb:=s; hn:=j; 17633 if odd(subtype(s)) then hyf_bchar:=font_bchar[hf]@+else hyf_bchar:=non_char; 17634 end 17635 17636 @ @<Check that the nodes following |hb| permit hyphenation...@>= 17637 if hn<l_hyf+r_hyf then goto done1; {|l_hyf| and |r_hyf| are |>=1|} 17638 loop@+ begin if not(is_char_node(s)) then 17639 case type(s) of 17640 ligature_node: do_nothing; 17641 kern_node: if subtype(s)<>normal then goto done4; 17642 whatsit_node,glue_node,penalty_node,ins_node,adjust_node,mark_node: 17643 goto done4; 17644 othercases goto done1 17645 endcases; 17646 s:=link(s); 17647 end; 17648 done4: 17649 17650 @* \[41] Post-hyphenation. 17651 If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation 17652 procedure will set |hyf[j]| to some small odd number. But before we look 17653 at \TeX's hyphenation procedure, which is independent of the rest of the 17654 line-breaking algorithm, let us consider what we will do with the hyphens 17655 it finds, since it is better to work on this part of the program before 17656 forgetting what |ha| and |hb|, etc., are all about. 17657 17658 @<Glob...@>= 17659 @!hyf:array [0..64] of 0..9; {odd values indicate discretionary hyphens} 17660 @!init_list:pointer; {list of punctuation characters preceding the word} 17661 @!init_lig:boolean; {does |init_list| represent a ligature?} 17662 @!init_lft:boolean; {if so, did the ligature involve a left boundary?} 17663 17664 @ @<Local variables for hyphenation@>= 17665 @!i,@!j,@!l:0..65; {indices into |hc| or |hu|} 17666 @!q,@!r,@!s:pointer; {temporary registers for list manipulation} 17667 @!bchar:halfword; {boundary character of hyphenated word, or |non_char|} 17668 17669 @ \TeX\ will never insert a hyphen that has fewer than 17670 \.{\\lefthyphenmin} letters before it or fewer than 17671 \.{\\righthyphenmin} after it; hence, a short word has 17672 comparatively little chance of being hyphenated. If no hyphens have 17673 been found, we can save time by not having to make any changes to the 17674 paragraph. 17675 17676 @<If no hyphens were found, |return|@>= 17677 for j:=l_hyf to hn-r_hyf do if odd(hyf[j]) then goto found1; 17678 return; 17679 found1: 17680 17681 @ If hyphens are in fact going to be inserted, \TeX\ first deletes the 17682 subsequence of nodes between |ha| and~|hb|. An attempt is made to 17683 preserve the effect that implicit boundary characters and punctuation marks 17684 had on ligatures inside the hyphenated word, by storing a left boundary or 17685 preceding character in |hu[0]| and by storing a possible right boundary 17686 in |bchar|. We set |j:=0| if |hu[0]| is to be part of the reconstruction; 17687 otherwise |j:=1|. 17688 The variable |s| will point to the tail of the current hlist, and 17689 |q| will point to the node following |hb|, so that 17690 things can be hooked up after we reconstitute the hyphenated word. 17691 17692 @<Replace nodes |ha..hb| by a sequence of nodes...@>= 17693 q:=link(hb); link(hb):=null; r:=link(ha); link(ha):=null; bchar:=hyf_bchar; 17694 if is_char_node(ha) then 17695 if font(ha)<>hf then goto found2 17696 else begin init_list:=ha; init_lig:=false; hu[0]:=qo(character(ha)); 17697 end 17698 else if type(ha)=ligature_node then 17699 if font(lig_char(ha))<>hf then goto found2 17700 else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1); 17701 hu[0]:=qo(character(lig_char(ha))); 17702 if init_list=null then if init_lft then 17703 begin hu[0]:=256; init_lig:=false; 17704 end; {in this case a ligature will be reconstructed from scratch} 17705 free_node(ha,small_node_size); 17706 end 17707 else begin {no punctuation found; look for left boundary} 17708 if not is_char_node(r) then if type(r)=ligature_node then 17709 if subtype(r)>1 then goto found2; 17710 j:=1; s:=ha; init_list:=null; goto common_ending; 17711 end; 17712 s:=cur_p; {we have |cur_p<>ha| because |type(cur_p)=glue_node|} 17713 while link(s)<>ha do s:=link(s); 17714 j:=0; goto common_ending; 17715 found2: s:=ha; j:=0; hu[0]:=256; init_lig:=false; init_list:=null; 17716 common_ending: flush_node_list(r); 17717 @<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>; 17718 flush_list(init_list) 17719 17720 @ We must now face the fact that the battle is not over, even though the 17721 {\def\!{\kern-1pt}% 17722 hyphens have been found: The process of reconstituting a word can be nontrivial 17723 because ligatures might change when a hyphen is present. {\sl The \TeX book\/} 17724 discusses the difficulties of the word ``difficult'', and 17725 the discretionary material surrounding a 17726 hyphen can be considerably more complex than that. Suppose 17727 \.{abcdef} is a word in a font for which the only ligatures are \.{b\!c}, 17728 \.{c\!d}, \.{d\!e}, and \.{e\!f}. If this word permits hyphenation 17729 between \.b and \.c, the two patterns with and without hyphenation are 17730 $\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$ and $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$. 17731 Thus the insertion of a hyphen might cause effects to ripple arbitrarily 17732 far into the rest of the word. A further complication arises if additional 17733 hyphens appear together with such rippling, e.g., if the word in the 17734 example just given could also be hyphenated between \.c and \.d; \TeX\ 17735 avoids this by simply ignoring the additional hyphens in such weird cases.} 17736 17737 Still further complications arise in the presence of ligatures that do not 17738 delete the original characters. When punctuation precedes the word being 17739 hyphenated, \TeX's method is not perfect under all possible scenarios, 17740 because punctuation marks and letters can propagate information back and forth. 17741 For example, suppose the original pre-hyphenation pair 17742 \.{*a} changes to \.{*y} via a \.{\?=:} ligature, which changes to \.{xy} 17743 via a \.{=:\?} ligature; if $p_{a-1}=\.x$ and $p_a=\.y$, the reconstitution 17744 procedure isn't smart enough to obtain \.{xy} again. In such cases the 17745 font designer should include a ligature that goes from \.{xa} to \.{xy}. 17746 17747 @ The processing is facilitated by a subroutine called |reconstitute|. Given 17748 a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$ 17749 such that the ``translation'' of $x_j\ldots x_n$ by ligatures and kerning 17750 has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$, 17751 where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and 17752 kern nodes. We call $x_j\ldots x_m$ a ``cut prefix'' of $x_j\ldots x_n$. 17753 For example, if $x_1x_2x_3=\.{fly}$, and if the font contains `fl' as a 17754 ligature and a kern between `fl' and `y', then $m=2$, $t=2$, and $y_1$ will 17755 be a ligature node for `fl' followed by an appropriate kern node~$y_2$. 17756 In the most common case, $x_j$~forms no ligature with $x_{j+1}$ and we 17757 simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on 17758 $x_{m+1}\ldots x_n$ until the entire translation has been found. 17759 17760 The |reconstitute| function returns the integer $m$ and puts the nodes 17761 $y_1\ldots y_t$ into a linked list starting at |link(hold_head)|, 17762 getting the input $x_j\ldots x_n$ from the |hu| array. If $x_j=256$, 17763 we consider $x_j$ to be an implicit left boundary character; in this 17764 case |j| must be strictly less than~|n|. There is a 17765 parameter |bchar|, which is either 256 or an implicit right boundary character 17766 assumed to be present just following~$x_n$. (The value |hu[n+1]| is never 17767 explicitly examined, but the algorithm imagines that |bchar| is there.) 17768 17769 If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]| 17770 is odd and such that the result of |reconstitute| would have been different 17771 if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed| 17772 to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero. 17773 17774 A special convention is used in the case |j=0|: Then we assume that the 17775 translation of |hu[0]| appears in a special list of charnodes starting at 17776 |init_list|; moreover, if |init_lig| is |true|, then |hu[0]| will be 17777 a ligature character, involving a left boundary if |init_lft| is |true|. 17778 This facility is provided for cases when a hyphenated 17779 word is preceded by punctuation (like single or double quotes) that might 17780 affect the translation of the beginning of the word. 17781 17782 @<Glob...@>= 17783 @!hyphen_passed:small_number; {first hyphen in a ligature, if any} 17784 17785 @ @<Declare the function called |reconstitute|@>= 17786 function reconstitute(@!j,@!n:small_number;@!bchar,@!hchar:halfword): 17787 small_number; 17788 label continue,done; 17789 var @!p:pointer; {temporary register for list manipulation} 17790 @!t:pointer; {a node being appended to} 17791 @!q:four_quarters; {character information or a lig/kern instruction} 17792 @!cur_rh:halfword; {hyphen character for ligature testing} 17793 @!test_char:halfword; {hyphen or other character for ligature testing} 17794 @!w:scaled; {amount of kerning} 17795 @!k:font_index; {position of current lig/kern instruction} 17796 begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null; 17797 {at this point |ligature_present=lft_hit=rt_hit=false|} 17798 @<Set up data structures with the cursor following position |j|@>; 17799 continue:@<If there's a ligature or kern at the cursor position, update the data 17800 structures, possibly advancing~|j|; continue until the cursor moves@>; 17801 @<Append a ligature and/or kern to the translation; 17802 |goto continue| if the stack of inserted ligatures is nonempty@>; 17803 reconstitute:=j; 17804 end; 17805 17806 @ The reconstitution procedure shares many of the global data structures 17807 by which \TeX\ has processed the words before they were hyphenated. 17808 There is an implied ``cursor'' between characters |cur_l| and |cur_r|; 17809 these characters will be tested for possible ligature activity. If 17810 |ligature_present| then |cur_l| is a ligature character formed from the 17811 original characters following |cur_q| in the current translation list. 17812 There is a ``ligature stack'' between the cursor and character |j+1|, 17813 consisting of pseudo-ligature nodes linked together by their |link| fields. 17814 This stack is normally empty unless a ligature command has created a new 17815 character that will need to be processed later. A pseudo-ligature is 17816 a special node having a |character| field that represents a potential 17817 ligature and a |lig_ptr| field that points to a |char_node| or is |null|. 17818 We have 17819 $$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr 17820 |qi(hu[j+1])|,&if |lig_stack=null| and |j<n|;\cr 17821 bchar,&if |lig_stack=null| and |j=n|.\cr}$$ 17822 17823 @<Glob...@>= 17824 @!cur_l,@!cur_r:halfword; {characters before and after the cursor} 17825 @!cur_q:pointer; {where a ligature should be detached} 17826 @!lig_stack:pointer; {unfinished business to the right of the cursor} 17827 @!ligature_present:boolean; {should a ligature node be made for |cur_l|?} 17828 @!lft_hit,@!rt_hit:boolean; {did we hit a ligature with a boundary character?} 17829 17830 @ @d append_charnode_to_t(#)== begin link(t):=get_avail; t:=link(t); 17831 font(t):=hf; character(t):=#; 17832 end 17833 @d set_cur_r==begin if j<n then cur_r:=qi(hu[j+1])@+else cur_r:=bchar; 17834 if odd(hyf[j]) then cur_rh:=hchar@+else cur_rh:=non_char; 17835 end 17836 17837 @<Set up data structures with the cursor following position |j|@>= 17838 cur_l:=qi(hu[j]); cur_q:=t; 17839 if j=0 then 17840 begin ligature_present:=init_lig; p:=init_list; 17841 if ligature_present then lft_hit:=init_lft; 17842 while p>null do 17843 begin append_charnode_to_t(character(p)); p:=link(p); 17844 end; 17845 end 17846 else if cur_l<non_char then append_charnode_to_t(cur_l); 17847 lig_stack:=null; set_cur_r 17848 17849 @ We may want to look at the lig/kern program twice, once for a hyphen 17850 and once for a normal letter. (The hyphen might appear after the letter 17851 in the program, so we'd better not try to look for both at once.) 17852 17853 @<If there's a ligature or kern at the cursor position, update...@>= 17854 if cur_l=non_char then 17855 begin k:=bchar_label[hf]; 17856 if k=non_address then goto done@+else q:=font_info[k].qqqq; 17857 end 17858 else begin q:=char_info(hf)(cur_l); 17859 if char_tag(q)<>lig_tag then goto done; 17860 k:=lig_kern_start(hf)(q); q:=font_info[k].qqqq; 17861 if skip_byte(q)>stop_flag then 17862 begin k:=lig_kern_restart(hf)(q); q:=font_info[k].qqqq; 17863 end; 17864 end; {now |k| is the starting address of the lig/kern program} 17865 if cur_rh<non_char then test_char:=cur_rh@+else test_char:=cur_r; 17866 loop@+begin if next_char(q)=test_char then if skip_byte(q)<=stop_flag then 17867 if cur_rh<non_char then 17868 begin hyphen_passed:=j; hchar:=non_char; cur_rh:=non_char; 17869 goto continue; 17870 end 17871 else begin if hchar<non_char then if odd(hyf[j]) then 17872 begin hyphen_passed:=j; hchar:=non_char; 17873 end; 17874 if op_byte(q)<kern_flag then 17875 @<Carry out a ligature replacement, updating the cursor structure 17876 and possibly advancing~|j|; |goto continue| if the cursor doesn't 17877 advance, otherwise |goto done|@>; 17878 w:=char_kern(hf)(q); goto done; {this kern will be inserted below} 17879 end; 17880 if skip_byte(q)>=stop_flag then 17881 if cur_rh=non_char then goto done 17882 else begin cur_rh:=non_char; goto continue; 17883 end; 17884 k:=k+qo(skip_byte(q))+1; q:=font_info[k].qqqq; 17885 end; 17886 done: 17887 17888 @ @d wrap_lig(#)==if ligature_present then 17889 begin p:=new_ligature(hf,cur_l,link(cur_q)); 17890 if lft_hit then 17891 begin subtype(p):=2; lft_hit:=false; 17892 end; 17893 if # then if lig_stack=null then 17894 begin incr(subtype(p)); rt_hit:=false; 17895 end; 17896 link(cur_q):=p; t:=p; ligature_present:=false; 17897 end 17898 @d pop_lig_stack==begin if lig_ptr(lig_stack)>null then 17899 begin link(t):=lig_ptr(lig_stack); {this is a charnode for |hu[j+1]|} 17900 t:=link(t); incr(j); 17901 end; 17902 p:=lig_stack; lig_stack:=link(p); free_node(p,small_node_size); 17903 if lig_stack=null then set_cur_r@+else cur_r:=character(lig_stack); 17904 end {if |lig_stack| isn't |null| we have |cur_rh=non_char|} 17905 17906 @<Append a ligature and/or kern to the translation...@>= 17907 wrap_lig(rt_hit); 17908 if w<>0 then 17909 begin link(t):=new_kern(w); t:=link(t); w:=0; 17910 end; 17911 if lig_stack>null then 17912 begin cur_q:=t; cur_l:=character(lig_stack); ligature_present:=true; 17913 pop_lig_stack; goto continue; 17914 end 17915 17916 @ @<Carry out a ligature replacement, updating the cursor structure...@>= 17917 begin if cur_l=non_char then lft_hit:=true; 17918 if j=n then if lig_stack=null then rt_hit:=true; 17919 check_interrupt; {allow a way out in case there's an infinite ligature loop} 17920 case op_byte(q) of 17921 qi(1),qi(5):begin cur_l:=rem_byte(q); {\.{=:\?}, \.{=:\?>}} 17922 ligature_present:=true; 17923 end; 17924 qi(2),qi(6):begin cur_r:=rem_byte(q); {\.{\?=:}, \.{\?=:>}} 17925 if lig_stack>null then character(lig_stack):=cur_r 17926 else begin lig_stack:=new_lig_item(cur_r); 17927 if j=n then bchar:=non_char 17928 else begin p:=get_avail; lig_ptr(lig_stack):=p; 17929 character(p):=qi(hu[j+1]); font(p):=hf; 17930 end; 17931 end; 17932 end; 17933 qi(3):begin cur_r:=rem_byte(q); {\.{\?=:\?}} 17934 p:=lig_stack; lig_stack:=new_lig_item(cur_r); link(lig_stack):=p; 17935 end; 17936 qi(7),qi(11):begin wrap_lig(false); {\.{\?=:\?>}, \.{\?=:\?>>}} 17937 cur_q:=t; cur_l:=rem_byte(q); ligature_present:=true; 17938 end; 17939 othercases begin cur_l:=rem_byte(q); ligature_present:=true; {\.{=:}} 17940 if lig_stack>null then pop_lig_stack 17941 else if j=n then goto done 17942 else begin append_charnode_to_t(cur_r); incr(j); set_cur_r; 17943 end; 17944 end 17945 endcases; 17946 if op_byte(q)>qi(4) then if op_byte(q)<>qi(7) then goto done; 17947 goto continue; 17948 end 17949 17950 @ Okay, we're ready to insert the potential hyphenations that were found. 17951 When the following program is executed, we want to append the word 17952 |hu[1..hn]| after node |ha|, and node |q| should be appended to the result. 17953 During this process, the variable |i| will be a temporary 17954 index into |hu|; the variable |j| will be an index to our current position 17955 in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary 17956 branch; the variable |r| will point to new nodes being created; and 17957 we need a few new local variables: 17958 17959 @<Local variables for hyph...@>= 17960 @!major_tail,@!minor_tail:pointer; {the end of lists in the main and 17961 discretionary branches being reconstructed} 17962 @!c:ASCII_code; {character temporarily replaced by a hyphen} 17963 @!c_loc:0..63; {where that character came from} 17964 @!r_count:integer; {replacement count for discretionary} 17965 @!hyf_node:pointer; {the hyphen, if it exists} 17966 17967 @ When the following code is performed, |hyf[0]| and |hyf[hn]| will be zero. 17968 17969 @<Reconstitute nodes for the hyphenated word...@>= 17970 repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1; 17971 if hyphen_passed=0 then 17972 begin link(s):=link(hold_head); 17973 while link(s)>null do s:=link(s); 17974 if odd(hyf[j-1]) then 17975 begin l:=j; hyphen_passed:=j-1; link(hold_head):=null; 17976 end; 17977 end; 17978 if hyphen_passed>0 then 17979 @<Create and append a discretionary node as an alternative to the 17980 unhyphenated word, and continue to develop both branches until they 17981 become equivalent@>; 17982 until j>hn; 17983 link(s):=q 17984 17985 @ In this repeat loop we will insert another discretionary if |hyf[j-1]| is 17986 odd, when both branches of the previous discretionary end at position |j-1|. 17987 Strictly speaking, we aren't justified in doing this, because we don't know 17988 that a hyphen after |j-1| is truly independent of those branches. But in almost 17989 all applications we would rather not lose a potentially valuable hyphenation 17990 point. (Consider the word `difficult', where the letter `c' is in position |j|.) 17991 17992 @d advance_major_tail==begin major_tail:=link(major_tail); incr(r_count); 17993 end 17994 17995 @<Create and append a discretionary node as an alternative...@>= 17996 repeat r:=get_node(small_node_size); 17997 link(r):=link(hold_head); type(r):=disc_node; 17998 major_tail:=r; r_count:=0; 17999 while link(major_tail)>null do advance_major_tail; 18000 i:=hyphen_passed; hyf[i]:=0; 18001 @<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>; 18002 @<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this 18003 list and to |major_tail| until synchronization has been achieved@>; 18004 @<Move pointer |s| to the end of the current list, and set |replace_count(r)| 18005 appropriately@>; 18006 hyphen_passed:=j-1; link(hold_head):=null; 18007 until not odd(hyf[j-1]) 18008 18009 @ The new hyphen might combine with the previous character via ligature 18010 or kern. At this point we have |l-1<=i<j| and |i<hn|. 18011 18012 @<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>= 18013 minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char); 18014 if hyf_node<>null then 18015 begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node); 18016 end; 18017 while l<=i do 18018 begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1; 18019 if link(hold_head)>null then 18020 begin if minor_tail=null then pre_break(r):=link(hold_head) 18021 else link(minor_tail):=link(hold_head); 18022 minor_tail:=link(hold_head); 18023 while link(minor_tail)>null do minor_tail:=link(minor_tail); 18024 end; 18025 end; 18026 if hyf_node<>null then 18027 begin hu[i]:=c; {restore the character in the hyphen position} 18028 l:=i; decr(i); 18029 end 18030 18031 @ The synchronization algorithm begins with |l=i+1<=j|. 18032 18033 @<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|...@>= 18034 minor_tail:=null; post_break(r):=null; c_loc:=0; 18035 if bchar_label[hf]<>non_address then {put left boundary at beginning of new line} 18036 begin decr(l); c:=hu[l]; c_loc:=l; hu[l]:=256; 18037 end; 18038 while l<j do 18039 begin repeat l:=reconstitute(l,hn,bchar,non_char)+1; 18040 if c_loc>0 then 18041 begin hu[c_loc]:=c; c_loc:=0; 18042 end; 18043 if link(hold_head)>null then 18044 begin if minor_tail=null then post_break(r):=link(hold_head) 18045 else link(minor_tail):=link(hold_head); 18046 minor_tail:=link(hold_head); 18047 while link(minor_tail)>null do minor_tail:=link(minor_tail); 18048 end; 18049 until l>=j; 18050 while l>j do 18051 @<Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|@>; 18052 end 18053 18054 @ @<Append characters of |hu[j..@,]|...@>= 18055 begin j:=reconstitute(j,hn,bchar,non_char)+1; 18056 link(major_tail):=link(hold_head); 18057 while link(major_tail)>null do advance_major_tail; 18058 end 18059 18060 @ Ligature insertion can cause a word to grow exponentially in size. Therefore 18061 we must test the size of |r_count| here, even though the hyphenated text 18062 was at most 63 characters long. 18063 18064 @<Move pointer |s| to the end of the current list...@>= 18065 if r_count>127 then {we have to forget the discretionary hyphen} 18066 begin link(s):=link(r); link(r):=null; flush_node_list(r); 18067 end 18068 else begin link(s):=r; replace_count(r):=r_count; 18069 end; 18070 s:=major_tail 18071 18072 @* \[42] Hyphenation. 18073 When a word |hc[1..hn]| has been set up to contain a candidate for hyphenation, 18074 \TeX\ first looks to see if it is in the user's exception dictionary. If not, 18075 hyphens are inserted based on patterns that appear within the given word, 18076 using an algorithm due to Frank~M. Liang. 18077 @^Liang, Franklin Mark@> 18078 18079 Let's consider Liang's method first, since it is much more interesting than the 18080 exception-lookup routine. The algorithm begins by setting |hyf[j]| to zero 18081 for all |j|, and invalid characters are inserted into |hc[0]| 18082 and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is 18083 used to see which of a given set of patterns occurs in the word 18084 |hc[0..(hn+1)]|. Each pattern $p_1\ldots p_k$ of length |k| has an associated 18085 sequence of |k+1| numbers $n_0\ldots n_k$; and if the pattern occurs in 18086 |hc[(j+1)..(j+k)]|, \TeX\ will set |hyf[j+i]:=@tmax@>(hyf[j+i],@t$n_i$@>)| for 18087 |0<=i<=k|. After this has been done for each pattern that occurs, a 18088 discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when 18089 |hyf[j]| is odd, as we have already seen. 18090 18091 The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$ 18092 depends, of course, on the language whose words are being hyphenated, and 18093 on the degree of hyphenation that is desired. A method for finding 18094 appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable 18095 hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University, 18096 1983); \TeX\ simply starts with the patterns and works from there. 18097 18098 @ The patterns are stored in a compact table that is also efficient for 18099 retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of 18100 Computer Programming \bf3} (1973), 481--505]. We can find each pattern 18101 $p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language 18102 index and then, for |1<=i<=k|, 18103 setting |@t$z_i$@>:=trie_link@t$(z_{i-1})+p_i$@>|; the pattern will be 18104 identified by the number $z_k$. Since all the pattern information is 18105 packed together into a single |trie_link| array, it is necessary to 18106 prevent confusion between the data from inequivalent patterns, so another 18107 table is provided such that |trie_char@t$(z_i)=p_i$@>| for all |i|. There 18108 is also a table |trie_op|$(z_k)$ to identify the numbers $n_0\ldots n_k$ 18109 associated with $p_1\ldots p_k$. 18110 18111 Comparatively few different number sequences $n_0\ldots n_k$ actually occur, 18112 since most of the |n|'s are generally zero. Therefore the number sequences 18113 are encoded in such a way that |trie_op|$(z_k)$ is only one byte long. 18114 If |trie_op(@t$z_k$@>)<>min_quarterword|, when $p_1\ldots p_k$ has matched 18115 the letters in |hc[(l-k+1)..l@,]| of language |t|, 18116 we perform all of the required operations 18117 for this pattern by carrying out the following little program: Set 18118 |v:=trie_op(@t$z_k$@>)|. Then set |v:=v+op_start[t]|, 18119 |hyf[l-hyf_distance[v]]:=@tmax@>(hyf[l-hyf_distance[v]], hyf_num[v])|, 18120 and |v:=hyf_next[v]|; repeat, if necessary, until |v=min_quarterword|. 18121 18122 @<Types...@>= 18123 @!trie_pointer=0..trie_size; {an index into |trie|} 18124 18125 @ @d trie_link(#)==trie[#].rh {``downward'' link in a trie} 18126 @d trie_char(#)==trie[#].b1 {character matched at this trie location} 18127 @d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location} 18128 18129 @<Glob...@>= 18130 @!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|} 18131 @!hyf_distance:array[1..trie_op_size] of small_number; {position |k-j| of $n_j$} 18132 @!hyf_num:array[1..trie_op_size] of small_number; {value of $n_j$} 18133 @!hyf_next:array[1..trie_op_size] of quarterword; {continuation code} 18134 @!op_start:array[ASCII_code] of 0..trie_op_size; {offset for current language} 18135 18136 @ @<Local variables for hyph...@>= 18137 @!z:trie_pointer; {an index into |trie|} 18138 @!v:integer; {an index into |hyf_distance|, etc.} 18139 18140 @ Assuming that these auxiliary tables have been set up properly, the 18141 hyphenation algorithm is quite short. In the following code we set |hc[hn+2]| 18142 to the impossible value 256, in order to guarantee that |hc[hn+3]| will 18143 never be fetched. 18144 18145 @<Find hyphen locations for the word in |hc|...@>= 18146 for j:=0 to hn do hyf[j]:=0; 18147 @<Look for the word |hc[1..hn]| in the exception table, and |goto found| (with 18148 |hyf| containing the hyphens) if an entry is found@>; 18149 if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|} 18150 hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=256; {insert delimiters} 18151 for j:=0 to hn-r_hyf+1 do 18152 begin z:=trie_link(cur_lang+1)+hc[j]; l:=j; 18153 while hc[l]=qo(trie_char(z)) do 18154 begin if trie_op(z)<>min_quarterword then 18155 @<Store \(m)maximum values in the |hyf| table@>; 18156 incr(l); z:=trie_link(z)+hc[l]; 18157 end; 18158 end; 18159 found: for j:=0 to l_hyf-1 do hyf[j]:=0; 18160 for j:=0 to r_hyf-1 do hyf[hn-j]:=0 18161 18162 @ @<Store \(m)maximum values in the |hyf| table@>= 18163 begin v:=trie_op(z); 18164 repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v]; 18165 if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v]; 18166 v:=hyf_next[v]; 18167 until v=min_quarterword; 18168 end 18169 18170 @ The exception table that is built by \TeX's \.{\\hyphenation} primitive is 18171 organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer 18172 @^Amble, Ole@> @^Knuth, Donald Ervin@> 18173 Journal\/ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and 18174 $\beta$ are words, we will say that $\alpha<\beta$ if $\vert\alpha\vert< 18175 \vert\beta\vert$ or if $\vert\alpha\vert=\vert\beta\vert$ and 18176 $\alpha$ is lexicographically smaller than $\beta$. (The notation $\vert 18177 \alpha\vert$ stands for the length of $\alpha$.) The idea of ordered hashing 18178 is to arrange the table so that a given word $\alpha$ can be sought by computing 18179 a hash address $h=h(\alpha)$ and then looking in table positions |h|, |h-1|, 18180 \dots, until encountering the first word $\L\alpha$. If this word is 18181 different from $\alpha$, we can conclude that $\alpha$ is not in the table. 18182 18183 The words in the table point to lists in |mem| that specify hyphen positions 18184 in their |info| fields. The list for $c_1\ldots c_n$ contains the number |k| if 18185 the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and 18186 $c_{k+1}$. 18187 18188 @<Types...@>= 18189 @!hyph_pointer=0..hyph_size; {an index into the ordered hash table} 18190 18191 @ @<Glob...@>= 18192 @!hyph_word:array[hyph_pointer] of str_number; {exception words} 18193 @!hyph_list:array[hyph_pointer] of pointer; {lists of hyphen positions} 18194 @!hyph_count:hyph_pointer; {the number of words in the exception dictionary} 18195 18196 @ @<Local variables for init...@>= 18197 @!z:hyph_pointer; {runs through the exception dictionary} 18198 18199 @ @<Set init...@>= 18200 for z:=0 to hyph_size do 18201 begin hyph_word[z]:=0; hyph_list[z]:=null; 18202 end; 18203 hyph_count:=0; 18204 18205 @ The algorithm for exception lookup is quite simple, as soon as we have 18206 a few more local variables to work with. 18207 18208 @<Local variables for hyph...@>= 18209 @!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|} 18210 @!k:str_number; {an index into |str_start|} 18211 @!u:pool_pointer; {an index into |str_pool|} 18212 18213 @ First we compute the hash code |h|, then we search until we either 18214 find the word or we don't. Words from different languages are kept 18215 separate by appending the language code to the string. 18216 18217 @<Look for the word |hc[1...@>= 18218 h:=hc[1]; incr(hn); hc[hn]:=cur_lang; 18219 for j:=2 to hn do h:=(h+h+hc[j]) mod hyph_size; 18220 loop@+ begin @<If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|, 18221 |goto not_found|; but if the two strings are equal, 18222 set |hyf| to the hyphen positions and |goto found|@>; 18223 if h>0 then decr(h)@+else h:=hyph_size; 18224 end; 18225 not_found: decr(hn) 18226 18227 @ @<If the string |hyph_word[h]| is less than \(hc)...@>= 18228 k:=hyph_word[h]; if k=0 then goto not_found; 18229 if length(k)<hn then goto not_found; 18230 if length(k)=hn then 18231 begin j:=1; u:=str_start[k]; 18232 repeat if so(str_pool[u])<hc[j] then goto not_found; 18233 if so(str_pool[u])>hc[j] then goto done; 18234 incr(j); incr(u); 18235 until j>hn; 18236 @<Insert hyphens as specified in |hyph_list[h]|@>; 18237 decr(hn); goto found; 18238 end; 18239 done: 18240 18241 @ @<Insert hyphens as specified...@>= 18242 s:=hyph_list[h]; 18243 while s<>null do 18244 begin hyf[info(s)]:=1; s:=link(s); 18245 end 18246 18247 @ @<Search |hyph_list| for pointers to |p|@>= 18248 for q:=0 to hyph_size do 18249 begin if hyph_list[q]=p then 18250 begin print_nl("HYPH("); print_int(q); print_char(")"); 18251 end; 18252 end 18253 18254 @ We have now completed the hyphenation routine, so the |line_break| procedure 18255 is finished at last. Since the hyphenation exception table is fresh in our 18256 minds, it's a good time to deal with the routine that adds new entries to it. 18257 18258 When \TeX\ has scanned `\.{\\hyphenation}', it calls on a procedure named 18259 |new_hyph_exceptions| to do the right thing. 18260 18261 @d set_cur_lang==if language<=0 then cur_lang:=0 18262 else if language>255 then cur_lang:=0 18263 else cur_lang:=language 18264 18265 @p procedure new_hyph_exceptions; {enters new exceptions} 18266 label reswitch, exit, found, not_found; 18267 var n:0..64; {length of current word; not always a |small_number|} 18268 @!j:0..64; {an index into |hc|} 18269 @!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|} 18270 @!k:str_number; {an index into |str_start|} 18271 @!p:pointer; {head of a list of hyphen positions} 18272 @!q:pointer; {used when creating a new node for list |p|} 18273 @!s,@!t:str_number; {strings being compared or stored} 18274 @!u,@!v:pool_pointer; {indices into |str_pool|} 18275 begin scan_left_brace; {a left brace must follow \.{\\hyphenation}} 18276 set_cur_lang; 18277 @<Enter as many hyphenation exceptions as are listed, 18278 until coming to a right brace; then |return|@>; 18279 exit:end; 18280 18281 @ @<Enter as many...@>= 18282 n:=0; p:=null; 18283 loop@+ begin get_x_token; 18284 reswitch: case cur_cmd of 18285 letter,other_char,char_given:@<Append a new letter or hyphen@>; 18286 char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given; 18287 goto reswitch; 18288 end; 18289 spacer,right_brace: begin if n>1 then @<Enter a hyphenation exception@>; 18290 if cur_cmd=right_brace then return; 18291 n:=0; p:=null; 18292 end; 18293 othercases @<Give improper \.{\\hyphenation} error@> 18294 endcases; 18295 end 18296 18297 @ @<Give improper \.{\\hyph...@>= 18298 begin print_err("Improper "); print_esc("hyphenation"); 18299 @.Improper \\hyphenation...@> 18300 print(" will be flushed"); 18301 help2("Hyphenation exceptions must contain only letters")@/ 18302 ("and hyphens. But continue; I'll forgive and forget."); 18303 error; 18304 end 18305 18306 @ @<Append a new letter or hyphen@>= 18307 if cur_chr="-" then @<Append the value |n| to list |p|@> 18308 else begin if lc_code(cur_chr)=0 then 18309 begin print_err("Not a letter"); 18310 @.Not a letter@> 18311 help2("Letters in \hyphenation words must have \lccode>0.")@/ 18312 ("Proceed; I'll ignore the character I just read."); 18313 error; 18314 end 18315 else if n<63 then 18316 begin incr(n); hc[n]:=lc_code(cur_chr); 18317 end; 18318 end 18319 18320 @ @<Append the value |n| to list |p|@>= 18321 begin if n<63 then 18322 begin q:=get_avail; link(q):=p; info(q):=n; p:=q; 18323 end; 18324 end 18325 18326 @ @<Enter a hyphenation exception@>= 18327 begin incr(n); hc[n]:=cur_lang; str_room(n); h:=0; 18328 for j:=1 to n do 18329 begin h:=(h+h+hc[j]) mod hyph_size; 18330 append_char(hc[j]); 18331 end; 18332 s:=make_string; 18333 @<Insert the \(p)pair |(s,p)| into the exception table@>; 18334 end 18335 18336 @ @<Insert the \(p)pair |(s,p)|...@>= 18337 if hyph_count=hyph_size then overflow("exception dictionary",hyph_size); 18338 @:TeX capacity exceeded exception dictionary}{\quad exception dictionary@> 18339 incr(hyph_count); 18340 while hyph_word[h]<>0 do 18341 begin @<If the string |hyph_word[h]| is less than \(or)or equal to 18342 |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>; 18343 if h>0 then decr(h)@+else h:=hyph_size; 18344 end; 18345 hyph_word[h]:=s; hyph_list[h]:=p 18346 18347 @ @<If the string |hyph_word[h]| is less than \(or)...@>= 18348 k:=hyph_word[h]; 18349 if length(k)<length(s) then goto found; 18350 if length(k)>length(s) then goto not_found; 18351 u:=str_start[k]; v:=str_start[s]; 18352 repeat if str_pool[u]<str_pool[v] then goto found; 18353 if str_pool[u]>str_pool[v] then goto not_found; 18354 incr(u); incr(v); 18355 until u=str_start[k+1]; 18356 found:q:=hyph_list[h]; hyph_list[h]:=p; p:=q;@/ 18357 t:=hyph_word[h]; hyph_word[h]:=s; s:=t; 18358 not_found: 18359 18360 @* \[43] Initializing the hyphenation tables. 18361 The trie for \TeX's hyphenation algorithm is built from a sequence of 18362 patterns following a \.{\\patterns} specification. Such a specification 18363 is allowed only in \.{INITEX}, since the extra memory for auxiliary tables 18364 and for the initialization program itself would only clutter up the 18365 production version of \TeX\ with a lot of deadwood. 18366 18367 The first step is to build a trie that is linked, instead of packed 18368 into sequential storage, so that insertions are readily made. 18369 After all patterns have been processed, \.{INITEX} 18370 compresses the linked trie by identifying common subtries. Finally the 18371 trie is packed into the efficient sequential form that the hyphenation 18372 algorithm actually uses. 18373 18374 @<Declare subprocedures for |line_break|@>= 18375 @!init @<Declare procedures for preprocessing hyphenation patterns@>@; 18376 tini 18377 18378 @ Before we discuss trie building in detail, let's consider the simpler 18379 problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays. 18380 18381 Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is 18382 a pattern of length 5, with $n_0\ldots n_5=0\,0\,2\,0\,0\,1$ in the 18383 notation above. We want the corresponding |trie_op| code |v| to have 18384 |hyf_distance[v]=3|, |hyf_num[v]=2|, and |hyf_next[v]=@t$v^\prime$@>|, 18385 where the auxiliary |trie_op| code $v^\prime$ has 18386 |hyf_distance[@t$v^\prime$@>]=0|, |hyf_num[@t$v^\prime$@>]=1|, and 18387 |hyf_next[@t$v^\prime$@>]=min_quarterword|. 18388 18389 \TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine 18390 below, by setting 18391 $$\hbox{|@t$v^\prime$@>:=new_trie_op(0,1,min_quarterword)|,\qquad 18392 |v:=new_trie_op(3,2,@t$v^\prime$@>)|.}$$ 18393 This subroutine looks up its three 18394 parameters in a special hash table, assigning a new value only if these 18395 three have not appeared before for the current language. 18396 18397 The hash table is called |trie_op_hash|, and the number of entries it contains 18398 is |trie_op_ptr|. 18399 18400 @<Glob...@>= 18401 @!init @!trie_op_hash:array[-trie_op_size..trie_op_size] of 0..trie_op_size; 18402 {trie op codes for quadruples} 18403 @!trie_used:array[ASCII_code] of quarterword; 18404 {largest opcode used so far for this language} 18405 @!trie_op_lang:array[1..trie_op_size] of ASCII_code; 18406 {language part of a hashed quadruple} 18407 @!trie_op_val:array[1..trie_op_size] of quarterword; 18408 {opcode corresponding to a hashed quadruple} 18409 @!trie_op_ptr:0..trie_op_size; {number of stored ops so far} 18410 tini 18411 18412 @ It's tempting to remove the |overflow| stops in the following procedure; 18413 |new_trie_op| could return |min_quarterword| (thereby simply ignoring 18414 part of a hyphenation pattern) instead of aborting the job. However, that would 18415 lead to different hyphenation results on different installations of \TeX\ 18416 using the same patterns. The |overflow| stops are necessary for portability 18417 of patterns. 18418 18419 @<Declare procedures for preprocessing hyph...@>= 18420 function new_trie_op(@!d,@!n:small_number;@!v:quarterword):quarterword; 18421 label exit; 18422 var h:-trie_op_size..trie_op_size; {trial hash location} 18423 @!u:quarterword; {trial op code} 18424 @!l:0..trie_op_size; {pointer to stored data} 18425 begin h:=abs(n+313*d+361*v+1009*cur_lang) mod (trie_op_size+trie_op_size) 18426 - trie_op_size; 18427 loop@+ begin l:=trie_op_hash[h]; 18428 if l=0 then {empty position found for a new op} 18429 begin if trie_op_ptr=trie_op_size then 18430 overflow("pattern memory ops",trie_op_size); 18431 u:=trie_used[cur_lang]; 18432 if u=max_quarterword then 18433 overflow("pattern memory ops per language", 18434 max_quarterword-min_quarterword); 18435 incr(trie_op_ptr); incr(u); trie_used[cur_lang]:=u; 18436 hyf_distance[trie_op_ptr]:=d; 18437 hyf_num[trie_op_ptr]:=n; hyf_next[trie_op_ptr]:=v; 18438 trie_op_lang[trie_op_ptr]:=cur_lang; trie_op_hash[h]:=trie_op_ptr; 18439 trie_op_val[trie_op_ptr]:=u; new_trie_op:=u; return; 18440 end; 18441 if (hyf_distance[l]=d)and(hyf_num[l]=n)and(hyf_next[l]=v) 18442 and(trie_op_lang[l]=cur_lang) then 18443 begin new_trie_op:=trie_op_val[l]; return; 18444 end; 18445 if h>-trie_op_size then decr(h)@+else h:=trie_op_size; 18446 end; 18447 exit:end; 18448 18449 @ After |new_trie_op| has compressed the necessary opcode information, 18450 plenty of information is available to unscramble the data into the 18451 final form needed by our hyphenation algorithm. 18452 18453 @<Sort \(t)the hyphenation op tables into proper order@>= 18454 op_start[0]:=-min_quarterword; 18455 for j:=1 to 255 do op_start[j]:=op_start[j-1]+qo(trie_used[j-1]); 18456 for j:=1 to trie_op_ptr do 18457 trie_op_hash[j]:=op_start[trie_op_lang[j]]+trie_op_val[j]; {destination} 18458 for j:=1 to trie_op_ptr do while trie_op_hash[j]>j do 18459 begin k:=trie_op_hash[j];@/ 18460 t:=hyf_distance[k]; hyf_distance[k]:=hyf_distance[j]; hyf_distance[j]:=t;@/ 18461 t:=hyf_num[k]; hyf_num[k]:=hyf_num[j]; hyf_num[j]:=t;@/ 18462 t:=hyf_next[k]; hyf_next[k]:=hyf_next[j]; hyf_next[j]:=t;@/ 18463 trie_op_hash[j]:=trie_op_hash[k]; trie_op_hash[k]:=k; 18464 end 18465 18466 @ Before we forget how to initialize the data structures that have been 18467 mentioned so far, let's write down the code that gets them started. 18468 18469 @<Initialize table entries...@>= 18470 for k:=-trie_op_size to trie_op_size do trie_op_hash[k]:=0; 18471 for k:=0 to 255 do trie_used[k]:=min_quarterword; 18472 trie_op_ptr:=0; 18473 18474 @ The linked trie that is used to preprocess hyphenation patterns appears 18475 in several global arrays. Each node represents an instruction of the form 18476 ``if you see character |c|, then perform operation |o|, move to the 18477 next character, and go to node |l|; otherwise go to node |r|.'' 18478 The four quantities |c|, |o|, |l|, and |r| are stored in four arrays 18479 |trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie 18480 is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie 18481 pointers are represented by zero. To initialize the trie, we simply 18482 set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some 18483 arbitrary value, since the algorithm may access it. 18484 18485 The algorithms maintain the condition 18486 $$\hbox{|trie_c[trie_r[z]]>trie_c[z]|\qquad 18487 whenever |z<>0| and |trie_r[z]<>0|};$$ in other words, sibling nodes are 18488 ordered by their |c| fields. 18489 18490 @d trie_root==trie_l[0] {root of the linked trie} 18491 18492 @<Glob...@>= 18493 @!init @!trie_c:packed array[trie_pointer] of packed_ASCII_code; 18494 {characters to match} 18495 @t\hskip10pt@>@!trie_o:packed array[trie_pointer] of quarterword; 18496 {operations to perform} 18497 @t\hskip10pt@>@!trie_l:packed array[trie_pointer] of trie_pointer; 18498 {left subtrie links} 18499 @t\hskip10pt@>@!trie_r:packed array[trie_pointer] of trie_pointer; 18500 {right subtrie links} 18501 @t\hskip10pt@>@!trie_ptr:trie_pointer; {the number of nodes in the trie} 18502 @t\hskip10pt@>@!trie_hash:packed array[trie_pointer] of trie_pointer; 18503 {used to identify equivalent subtries} 18504 tini 18505 18506 @ Let us suppose that a linked trie has already been constructed. 18507 Experience shows that we can often reduce its size by recognizing common 18508 subtries; therefore another hash table is introduced for this purpose, 18509 somewhat similar to |trie_op_hash|. The new hash table will be 18510 initialized to zero. 18511 18512 The function |trie_node(p)| returns |p| if |p| is distinct from other nodes 18513 that it has seen, otherwise it returns the number of the first equivalent 18514 node that it has seen. 18515 18516 Notice that we might make subtries equivalent even if they correspond to 18517 patterns for different languages, in which the trie ops might mean quite 18518 different things. That's perfectly all right. 18519 18520 @<Declare procedures for preprocessing hyph...@>= 18521 function trie_node(@!p:trie_pointer):trie_pointer; {converts 18522 to a canonical form} 18523 label exit; 18524 var h:trie_pointer; {trial hash location} 18525 @!q:trie_pointer; {trial trie node} 18526 begin h:=abs(trie_c[p]+1009*trie_o[p]+@| 18527 2718*trie_l[p]+3142*trie_r[p]) mod trie_size; 18528 loop@+ begin q:=trie_hash[h]; 18529 if q=0 then 18530 begin trie_hash[h]:=p; trie_node:=p; return; 18531 end; 18532 if (trie_c[q]=trie_c[p])and(trie_o[q]=trie_o[p])and@| 18533 (trie_l[q]=trie_l[p])and(trie_r[q]=trie_r[p]) then 18534 begin trie_node:=q; return; 18535 end; 18536 if h>0 then decr(h)@+else h:=trie_size; 18537 end; 18538 exit:end; 18539 18540 @ A neat recursive procedure is now able to compress a trie by 18541 traversing it and applying |trie_node| to its nodes in ``bottom up'' 18542 fashion. We will compress the entire trie by clearing |trie_hash| to 18543 zero and then saying `|trie_root:=compress_trie(trie_root)|'. 18544 @^recursion@> 18545 18546 @<Declare procedures for preprocessing hyph...@>= 18547 function compress_trie(@!p:trie_pointer):trie_pointer; 18548 begin if p=0 then compress_trie:=0 18549 else begin trie_l[p]:=compress_trie(trie_l[p]); 18550 trie_r[p]:=compress_trie(trie_r[p]); 18551 compress_trie:=trie_node(p); 18552 end; 18553 end; 18554 18555 @ The compressed trie will be packed into the |trie| array using a 18556 ``top-down first-fit'' procedure. This is a little tricky, so the reader 18557 should pay close attention: The |trie_hash| array is cleared to zero 18558 again and renamed |trie_ref| for this phase of the operation; later on, 18559 |trie_ref[p]| will be nonzero only if the linked trie node |p| is the 18560 smallest character 18561 in a family and if the characters |c| of that family have been allocated to 18562 locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that 18563 are in use will have |trie_link=0|, while the unused holes in |trie| 18564 will be doubly linked with |trie_link| pointing to the next larger vacant 18565 location and |trie_back| pointing to the next smaller one. This double 18566 linking will have been carried out only as far as |trie_max|, where 18567 |trie_max| is the largest index of |trie| that will be needed. 18568 To save time at the low end of the trie, we maintain array entries 18569 |trie_min[c]| pointing to the smallest hole that is greater than~|c|. 18570 Another array |trie_taken| tells whether or not a given location is 18571 equal to |trie_ref[p]| for some |p|; this array is used to ensure that 18572 distinct nodes in the compressed trie will have distinct |trie_ref| 18573 entries. 18574 18575 @d trie_ref==trie_hash {where linked trie families go into |trie|} 18576 @d trie_back(#)==trie[#].lh {backward links in |trie| holes} 18577 18578 @<Glob...@>= 18579 @!init @!trie_taken:packed array[1..trie_size] of boolean; 18580 {does a family start here?} 18581 @t\hskip10pt@>@!trie_min:array[ASCII_code] of trie_pointer; 18582 {the first possible slot for each character} 18583 @t\hskip10pt@>@!trie_max:trie_pointer; {largest location used in |trie|} 18584 @t\hskip10pt@>@!trie_not_ready:boolean; {is the trie still in linked form?} 18585 tini 18586 18587 @ Each time \.{\\patterns} appears, it contributes further patterns to 18588 the future trie, which will be built only when hyphenation is attempted or 18589 when a format file is dumped. The boolean variable |trie_not_ready| 18590 will change to |false| when the trie is compressed; this will disable 18591 further patterns. 18592 18593 @<Initialize table entries...@>= 18594 trie_not_ready:=true; trie_root:=0; trie_c[0]:=si(0); trie_ptr:=0; 18595 18596 @ Here is how the trie-compression data structures are initialized. 18597 If storage is tight, it would be possible to overlap |trie_op_hash|, 18598 |trie_op_lang|, and |trie_op_val| with |trie|, |trie_hash|, and |trie_taken|, 18599 because we finish with the former just before we need the latter. 18600 18601 @<Get ready to compress the trie@>= 18602 @<Sort \(t)the hyphenation...@>; 18603 for p:=0 to trie_size do trie_hash[p]:=0; 18604 trie_root:=compress_trie(trie_root); {identify equivalent subtries} 18605 for p:=0 to trie_ptr do trie_ref[p]:=0; 18606 for p:=0 to 255 do trie_min[p]:=p+1; 18607 trie_link(0):=1; trie_max:=0 18608 18609 @ The |first_fit| procedure finds the smallest hole |z| in |trie| such that 18610 a trie family starting at a given node |p| will fit into vacant positions 18611 starting at |z|. If |c=trie_c[p]|, this means that location |z-c| must 18612 not already be taken by some other family, and that |z-c+@t$c^\prime$@>| 18613 must be vacant for all characters $c^\prime$ in the family. The procedure 18614 sets |trie_ref[p]| to |z-c| when the first fit has been found. 18615 18616 @<Declare procedures for preprocessing hyph...@>= 18617 procedure first_fit(@!p:trie_pointer); {packs a family into |trie|} 18618 label not_found,found; 18619 var h:trie_pointer; {candidate for |trie_ref[p]|} 18620 @!z:trie_pointer; {runs through holes} 18621 @!q:trie_pointer; {runs through the family starting at |p|} 18622 @!c:ASCII_code; {smallest character in the family} 18623 @!l,@!r:trie_pointer; {left and right neighbors} 18624 @!ll:1..256; {upper limit of |trie_min| updating} 18625 begin c:=so(trie_c[p]); 18626 z:=trie_min[c]; {get the first conceivably good hole} 18627 loop@+ begin h:=z-c;@/ 18628 @<Ensure that |trie_max>=h+256|@>; 18629 if trie_taken[h] then goto not_found; 18630 @<If all characters of the family fit relative to |h|, then 18631 |goto found|,\30\ otherwise |goto not_found|@>; 18632 not_found: z:=trie_link(z); {move to the next hole} 18633 end; 18634 found: @<Pack the family into |trie| relative to |h|@>; 18635 end; 18636 18637 @ By making sure that |trie_max| is at least |h+256|, we can be sure that 18638 |trie_max>z|, since |h=z-c|. It follows that location |trie_max| will 18639 never be occupied in |trie|, and we will have |trie_max>=trie_link(z)|. 18640 18641 @<Ensure that |trie_max>=h+256|@>= 18642 if trie_max<h+256 then 18643 begin if trie_size<=h+256 then overflow("pattern memory",trie_size); 18644 @:TeX capacity exceeded pattern memory}{\quad pattern memory@> 18645 repeat incr(trie_max); trie_taken[trie_max]:=false; 18646 trie_link(trie_max):=trie_max+1; trie_back(trie_max):=trie_max-1; 18647 until trie_max=h+256; 18648 end 18649 18650 @ @<If all characters of the family fit relative to |h|...@>= 18651 q:=trie_r[p]; 18652 while q>0 do 18653 begin if trie_link(h+so(trie_c[q]))=0 then goto not_found; 18654 q:=trie_r[q]; 18655 end; 18656 goto found 18657 18658 @ @<Pack the family into |trie| relative to |h|@>= 18659 trie_taken[h]:=true; trie_ref[p]:=h; q:=p; 18660 repeat z:=h+so(trie_c[q]); l:=trie_back(z); r:=trie_link(z); 18661 trie_back(r):=l; trie_link(l):=r; trie_link(z):=0; 18662 if l<256 then 18663 begin if z<256 then ll:=z @+else ll:=256; 18664 repeat trie_min[l]:=r; incr(l); 18665 until l=ll; 18666 end; 18667 q:=trie_r[q]; 18668 until q=0 18669 18670 @ To pack the entire linked trie, we use the following recursive procedure. 18671 @^recursion@> 18672 18673 @<Declare procedures for preprocessing hyph...@>= 18674 procedure trie_pack(@!p:trie_pointer); {pack subtries of a family} 18675 var q:trie_pointer; {a local variable that need not be saved on recursive calls} 18676 begin repeat q:=trie_l[p]; 18677 if (q>0)and(trie_ref[q]=0) then 18678 begin first_fit(q); trie_pack(q); 18679 end; 18680 p:=trie_r[p]; 18681 until p=0; 18682 end; 18683 18684 @ When the whole trie has been allocated into the sequential table, we 18685 must go through it once again so that |trie| contains the correct 18686 information. Null pointers in the linked trie will be represented by the 18687 value~0, which properly implements an ``empty'' family. 18688 18689 @<Move the data into |trie|@>= 18690 h.rh:=0; h.b0:=min_quarterword; h.b1:=min_quarterword; {|trie_link:=0|, 18691 |trie_op:=min_quarterword|, |trie_char:=qi(0)|} 18692 if trie_root=0 then {no patterns were given} 18693 begin for r:=0 to 256 do trie[r]:=h; 18694 trie_max:=256; 18695 end 18696 else begin trie_fix(trie_root); {this fixes the non-holes in |trie|} 18697 r:=0; {now we will zero out all the holes} 18698 repeat s:=trie_link(r); trie[r]:=h; r:=s; 18699 until r>trie_max; 18700 end; 18701 trie_char(0):=qi("?"); {make |trie_char(c)<>c| for all |c|} 18702 18703 @ The fixing-up procedure is, of course, recursive. Since the linked trie 18704 usually has overlapping subtries, the same data may be moved several 18705 times; but that causes no harm, and at most as much work is done as it 18706 took to build the uncompressed trie. 18707 @^recursion@> 18708 18709 @<Declare procedures for preprocessing hyph...@>= 18710 procedure trie_fix(@!p:trie_pointer); {moves |p| and its siblings into |trie|} 18711 var q:trie_pointer; {a local variable that need not be saved on recursive calls} 18712 @!c:ASCII_code; {another one that need not be saved} 18713 @!z:trie_pointer; {|trie| reference; this local variable must be saved} 18714 begin z:=trie_ref[p]; 18715 repeat q:=trie_l[p]; c:=so(trie_c[p]); 18716 trie_link(z+c):=trie_ref[q]; trie_char(z+c):=qi(c); trie_op(z+c):=trie_o[p]; 18717 if q>0 then trie_fix(q); 18718 p:=trie_r[p]; 18719 until p=0; 18720 end; 18721 18722 @ Now let's go back to the easier problem, of building the linked 18723 trie. When \.{INITEX} has scanned the `\.{\\patterns}' control 18724 sequence, it calls on |new_patterns| to do the right thing. 18725 18726 @<Declare procedures for preprocessing hyph...@>= 18727 procedure new_patterns; {initializes the hyphenation pattern data} 18728 label done, done1; 18729 var k,@!l:0..64; {indices into |hc| and |hyf|; 18730 not always in |small_number| range} 18731 @!digit_sensed:boolean; {should the next digit be treated as a letter?} 18732 @!v:quarterword; {trie op code} 18733 @!p,@!q:trie_pointer; {nodes of trie traversed during insertion} 18734 @!first_child:boolean; {is |p=trie_l[q]|?} 18735 @!c:ASCII_code; {character being inserted} 18736 begin if trie_not_ready then 18737 begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}} 18738 @<Enter all of the patterns into a linked trie, until coming to a right 18739 brace@>; 18740 end 18741 else begin print_err("Too late for "); print_esc("patterns"); 18742 help1("All patterns must be given before typesetting begins."); 18743 error; link(garbage):=scan_toks(false,false); flush_list(def_ref); 18744 end; 18745 end; 18746 18747 @ Novices are not supposed to be using \.{\\patterns}, so the error 18748 messages are terse. (Note that all error messages appear in \TeX's string 18749 pool, even if they are used only by \.{INITEX}.) 18750 18751 @<Enter all of the patterns into a linked trie...@>= 18752 k:=0; hyf[0]:=0; digit_sensed:=false; 18753 loop@+ begin get_x_token; 18754 case cur_cmd of 18755 letter,other_char:@<Append a new letter or a hyphen level@>; 18756 spacer,right_brace: begin if k>0 then 18757 @<Insert a new pattern into the linked trie@>; 18758 if cur_cmd=right_brace then goto done; 18759 k:=0; hyf[0]:=0; digit_sensed:=false; 18760 end; 18761 othercases begin print_err("Bad "); print_esc("patterns"); 18762 @.Bad \\patterns@> 18763 help1("(See Appendix H.)"); error; 18764 end 18765 endcases; 18766 end; 18767 done: 18768 18769 @ @<Append a new letter or a hyphen level@>= 18770 if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then 18771 begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter} 18772 else begin cur_chr:=lc_code(cur_chr); 18773 if cur_chr=0 then 18774 begin print_err("Nonletter"); 18775 @.Nonletter@> 18776 help1("(See Appendix H.)"); error; 18777 end; 18778 end; 18779 if k<63 then 18780 begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false; 18781 end; 18782 end 18783 else if k<63 then 18784 begin hyf[k]:=cur_chr-"0"; digit_sensed:=true; 18785 end 18786 18787 @ When the following code comes into play, the pattern $p_1\ldots p_k$ 18788 appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots 18789 n_k$ appears in |hyf[0..k]|. 18790 18791 @<Insert a new pattern into the linked trie@>= 18792 begin @<Compute the trie op code, |v|, and set |l:=0|@>; 18793 q:=0; hc[0]:=cur_lang; 18794 while l<=k do 18795 begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true; 18796 while (p>0)and(c>so(trie_c[p])) do 18797 begin q:=p; p:=trie_r[q]; first_child:=false; 18798 end; 18799 if (p=0)or(c<so(trie_c[p])) then 18800 @<Insert a new trie node between |q| and |p|, and 18801 make |p| point to it@>; 18802 q:=p; {now node |q| represents $p_1\ldots p_{l-1}$} 18803 end; 18804 if trie_o[q]<>min_quarterword then 18805 begin print_err("Duplicate pattern"); 18806 @.Duplicate pattern@> 18807 help1("(See Appendix H.)"); error; 18808 end; 18809 trie_o[q]:=v; 18810 end 18811 18812 @ @<Insert a new trie node between |q| and |p|...@>= 18813 begin if trie_ptr=trie_size then overflow("pattern memory",trie_size); 18814 @:TeX capacity exceeded pattern memory}{\quad pattern memory@> 18815 incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0; 18816 if first_child then trie_l[q]:=p@+else trie_r[q]:=p; 18817 trie_c[p]:=si(c); trie_o[p]:=min_quarterword; 18818 end 18819 18820 @ @<Compute the trie op code, |v|...@>= 18821 if hc[1]=0 then hyf[0]:=0; 18822 if hc[k]=0 then hyf[k]:=0; 18823 l:=k; v:=min_quarterword; 18824 loop@+ begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v); 18825 if l>0 then decr(l)@+else goto done1; 18826 end; 18827 done1: 18828 18829 @ Finally we put everything together: Here is how the trie gets to its 18830 final, efficient form. 18831 The following packing routine is rigged so that the root of the linked 18832 tree gets mapped into location 1 of |trie|, as required by the hyphenation 18833 algorithm. This happens because the first call of |first_fit| will 18834 ``take'' location~1. 18835 18836 @<Declare procedures for preprocessing hyphenation patterns@>= 18837 procedure init_trie; 18838 var @!p:trie_pointer; {pointer for initialization} 18839 @!j,@!k,@!t:integer; {all-purpose registers for initialization} 18840 @!r,@!s:trie_pointer; {used to clean up the packed |trie|} 18841 @!h:two_halves; {template used to zero out |trie|'s holes} 18842 begin @<Get ready to compress the trie@>; 18843 if trie_root<>0 then 18844 begin first_fit(trie_root); trie_pack(trie_root); 18845 end; 18846 @<Move the data into |trie|@>; 18847 trie_not_ready:=false; 18848 end; 18849 18850 @* \[44] Breaking vertical lists into pages. 18851 The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation, 18852 is considerably simpler than |line_break| because it doesn't have to 18853 worry about hyphenation, and because its mission is to discover a single 18854 break instead of an optimum sequence of breakpoints. But before we get 18855 into the details of |vsplit|, we need to consider a few more basic things. 18856 18857 @ A subroutine called |prune_page_top| takes a pointer to a vlist and 18858 returns a pointer to a modified vlist in which all glue, kern, and penalty nodes 18859 have been deleted before the first box or rule node. However, the first 18860 box or rule is actually preceded by a newly created glue node designed so that 18861 the topmost baseline will be at distance |split_top_skip| from the top, 18862 whenever this is possible without backspacing. 18863 18864 In this routine and those that follow, we make use of the fact that a 18865 vertical list contains no character nodes, hence the |type| field exists 18866 for each node in the list. 18867 @^data structure assumptions@> 18868 18869 @p function prune_page_top(@!p:pointer):pointer; {adjust top after page break} 18870 var prev_p:pointer; {lags one step behind |p|} 18871 @!q:pointer; {temporary variable for list manipulation} 18872 begin prev_p:=temp_head; link(temp_head):=p; 18873 while p<>null do 18874 case type(p) of 18875 hlist_node,vlist_node,rule_node:@<Insert glue for |split_top_skip| 18876 and set~|p:=null|@>; 18877 whatsit_node,mark_node,ins_node: begin prev_p:=p; p:=link(prev_p); 18878 end; 18879 glue_node,kern_node,penalty_node: begin q:=p; p:=link(q); link(q):=null; 18880 link(prev_p):=p; flush_node_list(q); 18881 end; 18882 othercases confusion("pruning") 18883 @:this can't happen pruning}{\quad pruning@> 18884 endcases; 18885 prune_page_top:=link(temp_head); 18886 end; 18887 18888 @ @<Insert glue for |split_top_skip|...@>= 18889 begin q:=new_skip_param(split_top_skip_code); link(prev_p):=q; link(q):=p; 18890 {now |temp_ptr=glue_ptr(q)|} 18891 if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p) 18892 else width(temp_ptr):=0; 18893 p:=null; 18894 end 18895 18896 @ The next subroutine finds the best place to break a given vertical list 18897 so as to obtain a box of height~|h|, with maximum depth~|d|. 18898 A pointer to the beginning of the vertical list is given, 18899 and a pointer to the optimum breakpoint is returned. The list is effectively 18900 followed by a forced break, i.e., a penalty node with the |eject_penalty|; 18901 if the best break occurs at this artificial node, the value |null| is returned. 18902 18903 An array of six |scaled| distances is used to keep track of the height 18904 from the beginning of the list to the current place, just as in |line_break|. 18905 In fact, we use one of the same arrays, only changing its name to reflect 18906 its new significance. 18907 18908 @d active_height==active_width {new name for the six distance variables} 18909 @d cur_height==active_height[1] {the natural height} 18910 @d set_height_zero(#)==active_height[#]:=0 {initialize the height to zero} 18911 @# 18912 @d update_heights=90 {go here to record glue in the |active_height| table} 18913 18914 @p function vert_break(@!p:pointer; @!h,@!d:scaled):pointer; 18915 {finds optimum page break} 18916 label done,not_found,update_heights; 18917 var prev_p:pointer; {if |p| is a glue node, |type(prev_p)| determines 18918 whether |p| is a legal breakpoint} 18919 @!q,@!r:pointer; {glue specifications} 18920 @!pi:integer; {penalty value} 18921 @!b:integer; {badness at a trial breakpoint} 18922 @!least_cost:integer; {the smallest badness plus penalties found so far} 18923 @!best_place:pointer; {the most recent break that leads to |least_cost|} 18924 @!prev_dp:scaled; {depth of previous box in the list} 18925 @!t:small_number; {|type| of the node following a kern} 18926 begin prev_p:=p; {an initial glue node is not a legal breakpoint} 18927 least_cost:=awful_bad; do_all_six(set_height_zero); prev_dp:=0; 18928 loop@+ begin @<If node |p| is a legal breakpoint, check if this break is 18929 the best known, and |goto done| if |p| is null or 18930 if the page-so-far is already too full to accept more stuff@>; 18931 prev_p:=p; p:=link(prev_p); 18932 end; 18933 done: vert_break:=best_place; 18934 end; 18935 18936 @ A global variable |best_height_plus_depth| will be set to the natural size 18937 of the box that corresponds to the optimum breakpoint found by |vert_break|. 18938 (This value is used by the insertion-splitting algorithm of the page builder.) 18939 18940 @<Glob...@>= 18941 @!best_height_plus_depth:scaled; {height of the best box, without stretching or 18942 shrinking} 18943 18944 @ A subtle point to be noted here is that the maximum depth~|d| might be 18945 negative, so |cur_height| and |prev_dp| might need to be corrected even 18946 after a glue or kern node. 18947 18948 @<If node |p| is a legal breakpoint, check...@>= 18949 if p=null then pi:=eject_penalty 18950 else @<Use node |p| to update the current height and depth measurements; 18951 if this node is not a legal breakpoint, |goto not_found| 18952 or |update_heights|, 18953 otherwise set |pi| to the associated penalty at the break@>; 18954 @<Check if node |p| is a new champion breakpoint; then \(go)|goto done| 18955 if |p| is a forced break or if the page-so-far is already too full@>; 18956 if (type(p)<glue_node)or(type(p)>kern_node) then goto not_found; 18957 update_heights: @<Update the current height and depth measurements with 18958 respect to a glue or kern node~|p|@>; 18959 not_found: if prev_dp>d then 18960 begin cur_height:=cur_height+prev_dp-d; 18961 prev_dp:=d; 18962 end; 18963 18964 @ @<Use node |p| to update the current height and depth measurements...@>= 18965 case type(p) of 18966 hlist_node,vlist_node,rule_node: begin@t@>@;@/ 18967 cur_height:=cur_height+prev_dp+height(p); prev_dp:=depth(p); 18968 goto not_found; 18969 end; 18970 whatsit_node:@<Process whatsit |p| in |vert_break| loop, |goto not_found|@>; 18971 glue_node: if precedes_break(prev_p) then pi:=0 18972 else goto update_heights; 18973 kern_node: begin if link(p)=null then t:=penalty_node 18974 else t:=type(link(p)); 18975 if t=glue_node then pi:=0@+else goto update_heights; 18976 end; 18977 penalty_node: pi:=penalty(p); 18978 mark_node,ins_node: goto not_found; 18979 othercases confusion("vertbreak") 18980 @:this can't happen vertbreak}{\quad vertbreak@> 18981 endcases 18982 18983 @ @d deplorable==100000 {more than |inf_bad|, but less than |awful_bad|} 18984 18985 @<Check if node |p| is a new champion breakpoint; then \(go)...@>= 18986 if pi<inf_penalty then 18987 begin @<Compute the badness, |b|, using |awful_bad| 18988 if the box is too full@>; 18989 if b<awful_bad then 18990 if pi<=eject_penalty then b:=pi 18991 else if b<inf_bad then b:=b+pi 18992 else b:=deplorable; 18993 if b<=least_cost then 18994 begin best_place:=p; least_cost:=b; 18995 best_height_plus_depth:=cur_height+prev_dp; 18996 end; 18997 if (b=awful_bad)or(pi<=eject_penalty) then goto done; 18998 end 18999 19000 @ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>= 19001 if cur_height<h then 19002 if (active_height[3]<>0) or (active_height[4]<>0) or 19003 (active_height[5]<>0) then b:=0 19004 else b:=badness(h-cur_height,active_height[2]) 19005 else if cur_height-h>active_height[6] then b:=awful_bad 19006 else b:=badness(cur_height-h,active_height[6]) 19007 19008 @ Vertical lists that are subject to the |vert_break| procedure should not 19009 contain infinite shrinkability, since that would permit any amount of 19010 information to ``fit'' on one page. 19011 19012 @<Update the current height and depth measurements with...@>= 19013 if type(p)=kern_node then q:=p 19014 else begin q:=glue_ptr(p); 19015 active_height[2+stretch_order(q)]:=@| 19016 active_height[2+stretch_order(q)]+stretch(q);@/ 19017 active_height[6]:=active_height[6]+shrink(q); 19018 if (shrink_order(q)<>normal)and(shrink(q)<>0) then 19019 begin@t@>@;@/ 19020 print_err("Infinite glue shrinkage found in box being split");@/ 19021 @.Infinite glue shrinkage...@> 19022 help4("The box you are \vsplitting contains some infinitely")@/ 19023 ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/ 19024 ("Such glue doesn't belong there; but you can safely proceed,")@/ 19025 ("since the offensive shrinkability has been made finite."); 19026 error; r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q); 19027 glue_ptr(p):=r; q:=r; 19028 end; 19029 end; 19030 cur_height:=cur_height+prev_dp+width(q); prev_dp:=0 19031 19032 @ Now we are ready to consider |vsplit| itself. Most of 19033 its work is accomplished by the two subroutines that we have just considered. 19034 19035 Given the number of a vlist box |n|, and given a desired page height |h|, 19036 the |vsplit| function finds the best initial segment of the vlist and 19037 returns a box for a page of height~|h|. The remainder of the vlist, if 19038 any, replaces the original box, after removing glue and penalties and 19039 adjusting for |split_top_skip|. Mark nodes in the split-off box are used to 19040 set the values of |split_first_mark| and |split_bot_mark|; we use the 19041 fact that |split_first_mark=null| if and only if |split_bot_mark=null|. 19042 19043 The original box becomes ``void'' if and only if it has been entirely 19044 extracted. The extracted box is ``void'' if and only if the original 19045 box was void (or if it was, erroneously, an hlist box). 19046 19047 @p function vsplit(@!n:eight_bits; @!h:scaled):pointer; 19048 {extracts a page of height |h| from box |n|} 19049 label exit,done; 19050 var v:pointer; {the box to be split} 19051 p:pointer; {runs through the vlist} 19052 q:pointer; {points to where the break occurs} 19053 begin v:=box(n); 19054 if split_first_mark<>null then 19055 begin delete_token_ref(split_first_mark); split_first_mark:=null; 19056 delete_token_ref(split_bot_mark); split_bot_mark:=null; 19057 end; 19058 @<Dispense with trivial cases of void or bad boxes@>; 19059 q:=vert_break(list_ptr(v),h,split_max_depth); 19060 @<Look at all the marks in nodes before the break, and set the final 19061 link to |null| at the break@>; 19062 q:=prune_page_top(q); p:=list_ptr(v); free_node(v,box_node_size); 19063 if q=null then box(n):=null {the |eq_level| of the box stays the same} 19064 else box(n):=vpack(q,natural); 19065 vsplit:=vpackage(p,h,exactly,split_max_depth); 19066 exit: end; 19067 19068 @ @<Dispense with trivial cases of void or bad boxes@>= 19069 if v=null then 19070 begin vsplit:=null; return; 19071 end; 19072 if type(v)<>vlist_node then 19073 begin print_err(""); print_esc("vsplit"); print(" needs a "); 19074 print_esc("vbox"); 19075 @:vsplit_}{\.{\\vsplit needs a \\vbox}@> 19076 help2("The box you are trying to split is an \hbox.")@/ 19077 ("I can't split such a box, so I'll leave it alone."); 19078 error; vsplit:=null; return; 19079 end 19080 19081 @ It's possible that the box begins with a penalty node that is the 19082 ``best'' break, so we must be careful to handle this special case correctly. 19083 19084 @<Look at all the marks...@>= 19085 p:=list_ptr(v); 19086 if p=q then list_ptr(v):=null 19087 else loop@+begin if type(p)=mark_node then 19088 if split_first_mark=null then 19089 begin split_first_mark:=mark_ptr(p); 19090 split_bot_mark:=split_first_mark; 19091 token_ref_count(split_first_mark):=@| 19092 token_ref_count(split_first_mark)+2; 19093 end 19094 else begin delete_token_ref(split_bot_mark); 19095 split_bot_mark:=mark_ptr(p); 19096 add_token_ref(split_bot_mark); 19097 end; 19098 if link(p)=q then 19099 begin link(p):=null; goto done; 19100 end; 19101 p:=link(p); 19102 end; 19103 done: 19104 19105 @* \[45] The page builder. 19106 When \TeX\ appends new material to its main vlist in vertical mode, it uses 19107 a method something like |vsplit| to decide where a page ends, except that 19108 the calculations are done ``on line'' as new items come in. 19109 The main complication in this process is that insertions must be put 19110 into their boxes and removed from the vlist, in a more-or-less optimum manner. 19111 19112 We shall use the term ``current page'' for that part of the main vlist that 19113 is being considered as a candidate for being broken off and sent to the 19114 user's output routine. The current page starts at |link(page_head)|, and 19115 it ends at |page_tail|. We have |page_head=page_tail| if this list is empty. 19116 @^current page@> 19117 19118 Utter chaos would reign if the user kept changing page specifications 19119 while a page is being constructed, so the page builder keeps the pertinent 19120 specifications frozen as soon as the page receives its first box or 19121 insertion. The global variable |page_contents| is |empty| when the 19122 current page contains only mark nodes and content-less whatsit nodes; it 19123 is |inserts_only| if the page contains only insertion nodes in addition to 19124 marks and whatsits. Glue nodes, kern nodes, and penalty nodes are 19125 discarded until a box or rule node appears, at which time |page_contents| 19126 changes to |box_there|. As soon as |page_contents| becomes non-|empty|, 19127 the current |vsize| and |max_depth| are squirreled away into |page_goal| 19128 and |page_max_depth|; the latter values will be used until the page has 19129 been forwarded to the user's output routine. The \.{\\topskip} adjustment 19130 is made when |page_contents| changes to |box_there|. 19131 19132 Although |page_goal| starts out equal to |vsize|, it is decreased by the 19133 scaled natural height-plus-depth of the insertions considered so far, and by 19134 the \.{\\skip} corrections for those insertions. Therefore it represents 19135 the size into which the non-inserted material should fit, assuming that 19136 all insertions in the current page have been made. 19137 19138 The global variables |best_page_break| and |least_page_cost| correspond 19139 respectively to the local variables |best_place| and |least_cost| in the 19140 |vert_break| routine that we have already studied; i.e., they record the 19141 location and value of the best place currently known for breaking the 19142 current page. The value of |page_goal| at the time of the best break is 19143 stored in |best_size|. 19144 19145 @d inserts_only=1 19146 {|page_contents| when an insert node has been contributed, but no boxes} 19147 @d box_there=2 {|page_contents| when a box or rule has been contributed} 19148 19149 @<Glob...@>= 19150 @!page_tail:pointer; {the final node on the current page} 19151 @!page_contents:empty..box_there; {what is on the current page so far?} 19152 @!page_max_depth:scaled; {maximum box depth on page being built} 19153 @!best_page_break:pointer; {break here to get the best page known so far} 19154 @!least_page_cost:integer; {the score for this currently best page} 19155 @!best_size:scaled; {its |page_goal|} 19156 19157 @ The page builder has another data structure to keep track of insertions. 19158 This is a list of four-word nodes, starting and ending at |page_ins_head|. 19159 That is, the first element of the list is node |r@t$_1$@>=link(page_ins_head)|; 19160 node $r_j$ is followed by |r@t$_{j+1}$@>=link(r@t$_j$@>)|; and if there are 19161 |n| items we have |r@t$_{n+1}$@>=page_ins_head|. The |subtype| field of 19162 each node in this list refers to an insertion number; for example, `\.{\\insert 19163 250}' would correspond to a node whose |subtype| is |qi(250)| 19164 (the same as the |subtype| field of the relevant |ins_node|). These |subtype| 19165 fields are in increasing order, and |subtype(page_ins_head)= 19166 qi(255)|, so |page_ins_head| serves as a convenient sentinel 19167 at the end of the list. A record is present for each insertion number that 19168 appears in the current page. 19169 19170 The |type| field in these nodes distinguishes two possibilities that 19171 might occur as we look ahead before deciding on the optimum page break. 19172 If |type(r)=inserting|, then |height(r)| contains the total of the 19173 height-plus-depth dimensions of the box and all its inserts seen so far. 19174 If |type(r)=split_up|, then no more insertions will be made into this box, 19175 because at least one previous insertion was too big to fit on the current 19176 page; |broken_ptr(r)| points to the node where that insertion will be 19177 split, if \TeX\ decides to split it, |broken_ins(r)| points to the 19178 insertion node that was tentatively split, and |height(r)| includes also the 19179 natural height plus depth of the part that would be split off. 19180 19181 In both cases, |last_ins_ptr(r)| points to the last |ins_node| 19182 encountered for box |qo(subtype(r))| that would be at least partially 19183 inserted on the next page; and |best_ins_ptr(r)| points to the last 19184 such |ins_node| that should actually be inserted, to get the page with 19185 minimum badness among all page breaks considered so far. We have 19186 |best_ins_ptr(r)=null| if and only if no insertion for this box should 19187 be made to produce this optimum page. 19188 19189 The data structure definitions here use the fact that the |@!height| field 19190 appears in the fourth word of a box node. 19191 @^data structure assumptions@> 19192 19193 @d page_ins_node_size=4 {number of words for a page insertion node} 19194 @d inserting=0 {an insertion class that has not yet overflowed} 19195 @d split_up=1 {an overflowed insertion class} 19196 @d broken_ptr(#)==link(#+1) 19197 {an insertion for this class will break here if anywhere} 19198 @d broken_ins(#)==info(#+1) {this insertion might break at |broken_ptr|} 19199 @d last_ins_ptr(#)==link(#+2) {the most recent insertion for this |subtype|} 19200 @d best_ins_ptr(#)==info(#+2) {the optimum most recent insertion} 19201 19202 @<Initialize the special list heads...@>= 19203 subtype(page_ins_head):=qi(255); 19204 type(page_ins_head):=split_up; link(page_ins_head):=page_ins_head; 19205 19206 @ An array |page_so_far| records the heights and depths of everything 19207 on the current page. This array contains six |scaled| numbers, like the 19208 similar arrays already considered in |line_break| and |vert_break|; and it 19209 also contains |page_goal| and |page_depth|, since these values are 19210 all accessible to the user via |set_page_dimen| commands. The 19211 value of |page_so_far[1]| is also called |page_total|. The stretch 19212 and shrink components of the \.{\\skip} corrections for each insertion are 19213 included in |page_so_far|, but the natural space components of these 19214 corrections are not, since they have been subtracted from |page_goal|. 19215 19216 The variable |page_depth| records the depth of the current page; it has been 19217 adjusted so that it is at most |page_max_depth|. The variable 19218 |last_glue| points to the glue specification of the most recent node 19219 contributed from the contribution list, if this was a glue node; otherwise 19220 |last_glue=max_halfword|. (If the contribution list is nonempty, 19221 however, the value of |last_glue| is not necessarily accurate.) 19222 The variables |last_penalty| and |last_kern| are similar. And 19223 finally, |insert_penalties| holds the sum of the penalties associated with 19224 all split and floating insertions. 19225 19226 @d page_goal==page_so_far[0] {desired height of information on page being built} 19227 @d page_total==page_so_far[1] {height of the current page} 19228 @d page_shrink==page_so_far[6] {shrinkability of the current page} 19229 @d page_depth==page_so_far[7] {depth of the current page} 19230 19231 @<Glob...@>= 19232 @!page_so_far:array [0..7] of scaled; {height and glue of the current page} 19233 @!last_glue:pointer; {used to implement \.{\\lastskip}} 19234 @!last_penalty:integer; {used to implement \.{\\lastpenalty}} 19235 @!last_kern:scaled; {used to implement \.{\\lastkern}} 19236 @!insert_penalties:integer; {sum of the penalties for insertions 19237 that were held over} 19238 19239 @ @<Put each...@>= 19240 primitive("pagegoal",set_page_dimen,0); 19241 @!@:page_goal_}{\.{\\pagegoal} primitive@> 19242 primitive("pagetotal",set_page_dimen,1); 19243 @!@:page_total_}{\.{\\pagetotal} primitive@> 19244 primitive("pagestretch",set_page_dimen,2); 19245 @!@:page_stretch_}{\.{\\pagestretch} primitive@> 19246 primitive("pagefilstretch",set_page_dimen,3); 19247 @!@:page_fil_stretch_}{\.{\\pagefilstretch} primitive@> 19248 primitive("pagefillstretch",set_page_dimen,4); 19249 @!@:page_fill_stretch_}{\.{\\pagefillstretch} primitive@> 19250 primitive("pagefilllstretch",set_page_dimen,5); 19251 @!@:page_filll_stretch_}{\.{\\pagefilllstretch} primitive@> 19252 primitive("pageshrink",set_page_dimen,6); 19253 @!@:page_shrink_}{\.{\\pageshrink} primitive@> 19254 primitive("pagedepth",set_page_dimen,7); 19255 @!@:page_depth_}{\.{\\pagedepth} primitive@> 19256 19257 @ @<Cases of |print_cmd_chr|...@>= 19258 set_page_dimen: case chr_code of 19259 0: print_esc("pagegoal"); 19260 1: print_esc("pagetotal"); 19261 2: print_esc("pagestretch"); 19262 3: print_esc("pagefilstretch"); 19263 4: print_esc("pagefillstretch"); 19264 5: print_esc("pagefilllstretch"); 19265 6: print_esc("pageshrink"); 19266 othercases print_esc("pagedepth") 19267 endcases; 19268 19269 @ @d print_plus_end(#)==print(#);@+end 19270 @d print_plus(#)==if page_so_far[#]<>0 then 19271 begin print(" plus "); print_scaled(page_so_far[#]); print_plus_end 19272 19273 @p procedure print_totals; 19274 begin print_scaled(page_total); 19275 print_plus(2)(""); 19276 print_plus(3)("fil"); 19277 print_plus(4)("fill"); 19278 print_plus(5)("filll"); 19279 if page_shrink<>0 then 19280 begin print(" minus "); print_scaled(page_shrink); 19281 end; 19282 end; 19283 19284 @ @<Show the status of the current page@>= 19285 if page_head<>page_tail then 19286 begin print_nl("### current page:"); 19287 if output_active then print(" (held over for next output)"); 19288 @.held over for next output@> 19289 show_box(link(page_head)); 19290 if page_contents>empty then 19291 begin print_nl("total height "); print_totals; 19292 @:total_height}{\.{total height}@> 19293 print_nl(" goal height "); print_scaled(page_goal); 19294 @.goal height@> 19295 r:=link(page_ins_head); 19296 while r<>page_ins_head do 19297 begin print_ln; print_esc("insert"); t:=qo(subtype(r)); 19298 print_int(t); print(" adds "); 19299 if count(t)=1000 then t:=height(r) 19300 else t:=x_over_n(height(r),1000)*count(t); 19301 print_scaled(t); 19302 if type(r)=split_up then 19303 begin q:=page_head; t:=0; 19304 repeat q:=link(q); 19305 if (type(q)=ins_node)and(subtype(q)=subtype(r)) then incr(t); 19306 until q=broken_ins(r); 19307 print(", #"); print_int(t); print(" might split"); 19308 end; 19309 r:=link(r); 19310 end; 19311 end; 19312 end 19313 19314 @ Here is a procedure that is called when the |page_contents| is changing 19315 from |empty| to |inserts_only| or |box_there|. 19316 19317 @d set_page_so_far_zero(#)==page_so_far[#]:=0 19318 19319 @p procedure freeze_page_specs(@!s:small_number); 19320 begin page_contents:=s; 19321 page_goal:=vsize; page_max_depth:=max_depth; 19322 page_depth:=0; do_all_six(set_page_so_far_zero); 19323 least_page_cost:=awful_bad; 19324 @!stat if tracing_pages>0 then 19325 begin begin_diagnostic; 19326 print_nl("%% goal height="); print_scaled(page_goal); 19327 @.goal height@> 19328 print(", max depth="); print_scaled(page_max_depth); 19329 end_diagnostic(false); 19330 end;@;@+tats@;@/ 19331 end; 19332 19333 @ Pages are built by appending nodes to the current list in \TeX's 19334 vertical mode, which is at the outermost level of the semantic nest. This 19335 vlist is split into two parts; the ``current page'' that we have been 19336 talking so much about already, and the ``contribution list'' that receives 19337 new nodes as they are created. The current page contains everything that 19338 the page builder has accounted for in its data structures, as described 19339 above, while the contribution list contains other things that have been 19340 generated by other parts of \TeX\ but have not yet been 19341 seen by the page builder. 19342 The contribution list starts at |link(contrib_head)|, and it ends at the 19343 current node in \TeX's vertical mode. 19344 19345 When \TeX\ has appended new material in vertical mode, it calls the procedure 19346 |build_page|, which tries to catch up by moving nodes from the contribution 19347 list to the current page. This procedure will succeed in its goal of 19348 emptying the contribution list, unless a page break is discovered, i.e., 19349 unless the current page has grown to the point where the optimum next 19350 page break has been determined. In the latter case, the nodes after the 19351 optimum break will go back onto the contribution list, and control will 19352 effectively pass to the user's output routine. 19353 19354 We make |type(page_head)=glue_node|, so that an initial glue node on 19355 the current page will not be considered a valid breakpoint. 19356 19357 @<Initialize the special list...@>= 19358 type(page_head):=glue_node; subtype(page_head):=normal; 19359 19360 @ The global variable |output_active| is true during the time the 19361 user's output routine is driving \TeX. 19362 19363 @<Glob...@>= 19364 @!output_active:boolean; {are we in the midst of an output routine?} 19365 19366 @ @<Set init...@>= 19367 output_active:=false; insert_penalties:=0; 19368 19369 @ The page builder is ready to start a fresh page if we initialize 19370 the following state variables. (However, the page insertion list is initialized 19371 elsewhere.) 19372 19373 @<Start a new current page@>= 19374 page_contents:=empty; page_tail:=page_head; link(page_head):=null;@/ 19375 last_glue:=max_halfword; last_penalty:=0; last_kern:=0; 19376 page_depth:=0; page_max_depth:=0 19377 19378 @ At certain times box 255 is supposed to be void (i.e., |null|), 19379 or an insertion box is supposed to be ready to accept a vertical list. 19380 If not, an error message is printed, and the following subroutine 19381 flushes the unwanted contents, reporting them to the user. 19382 19383 @p procedure box_error(@!n:eight_bits); 19384 begin error; begin_diagnostic; 19385 print_nl("The following box has been deleted:"); 19386 @.The following...deleted@> 19387 show_box(box(n)); end_diagnostic(true); 19388 flush_node_list(box(n)); box(n):=null; 19389 end; 19390 19391 @ The following procedure guarantees that a given box register 19392 does not contain an \.{\\hbox}. 19393 19394 @p procedure ensure_vbox(@!n:eight_bits); 19395 var p:pointer; {the box register contents} 19396 begin p:=box(n); 19397 if p<>null then if type(p)=hlist_node then 19398 begin print_err("Insertions can only be added to a vbox"); 19399 @.Insertions can only...@> 19400 help3("Tut tut: You're trying to \insert into a")@/ 19401 ("\box register that now contains an \hbox.")@/ 19402 ("Proceed, and I'll discard its present contents."); 19403 box_error(n); 19404 end; 19405 end; 19406 19407 @ \TeX\ is not always in vertical mode at the time |build_page| 19408 is called; the current mode reflects what \TeX\ should return to, after 19409 the contribution list has been emptied. A call on |build_page| should 19410 be immediately followed by `|goto big_switch|', which is \TeX's central 19411 control point. 19412 19413 @d contribute=80 {go here to link a node into the current page} 19414 19415 @p @t\4@>@<Declare the procedure called |fire_up|@>@;@/ 19416 procedure build_page; {append contributions to the current page} 19417 label exit,done,done1,continue,contribute,update_heights; 19418 var p:pointer; {the node being appended} 19419 @!q,@!r:pointer; {nodes being examined} 19420 @!b,@!c:integer; {badness and cost of current page} 19421 @!pi:integer; {penalty to be added to the badness} 19422 @!n:min_quarterword..255; {insertion box number} 19423 @!delta,@!h,@!w:scaled; {sizes used for insertion calculations} 19424 begin if (link(contrib_head)=null)or output_active then return; 19425 repeat continue: p:=link(contrib_head);@/ 19426 @<Update the values of |last_glue|, |last_penalty|, and |last_kern|@>; 19427 @<Move node |p| to the current page; if it is time for a page break, 19428 put the nodes following the break back onto the contribution list, 19429 and |return| to the user's output routine if there is one@>; 19430 until link(contrib_head)=null; 19431 @<Make the contribution list empty by setting its tail to |contrib_head|@>; 19432 exit:end; 19433 19434 @ @d contrib_tail==nest[0].tail_field {tail of the contribution list} 19435 19436 @<Make the contribution list empty...@>= 19437 if nest_ptr=0 then tail:=contrib_head {vertical mode} 19438 else contrib_tail:=contrib_head {other modes} 19439 19440 @ @<Update the values of |last_glue|...@>= 19441 if last_glue<>max_halfword then delete_glue_ref(last_glue); 19442 last_penalty:=0; last_kern:=0; 19443 if type(p)=glue_node then 19444 begin last_glue:=glue_ptr(p); add_glue_ref(last_glue); 19445 end 19446 else begin last_glue:=max_halfword; 19447 if type(p)=penalty_node then last_penalty:=penalty(p) 19448 else if type(p)=kern_node then last_kern:=width(p); 19449 end 19450 19451 @ The code here is an example of a many-way switch into routines that 19452 merge together in different places. Some people call this unstructured 19453 programming, but the author doesn't see much wrong with it, as long as 19454 @^Knuth, Donald Ervin@> 19455 the various labels have a well-understood meaning. 19456 19457 @<Move node |p| to the current page; ...@>= 19458 @<If the current page is empty and node |p| is to be deleted, |goto done1|; 19459 otherwise use node |p| to update the state of the current page; 19460 if this node is an insertion, |goto contribute|; otherwise if this node 19461 is not a legal breakpoint, |goto contribute| or |update_heights|; 19462 otherwise set |pi| to the penalty associated with this breakpoint@>; 19463 @<Check if node |p| is a new champion breakpoint; then \(if)if it is time for 19464 a page break, prepare for output, and either fire up the user's 19465 output routine and |return| or ship out the page and |goto done|@>; 19466 if (type(p)<glue_node)or(type(p)>kern_node) then goto contribute; 19467 update_heights:@<Update the current page measurements with respect to the 19468 glue or kern specified by node~|p|@>; 19469 contribute: @<Make sure that |page_max_depth| is not exceeded@>; 19470 @<Link node |p| into the current page and |goto done|@>; 19471 done1:@<Recycle node |p|@>; 19472 done: 19473 19474 @ @<Link node |p| into the current page and |goto done|@>= 19475 link(page_tail):=p; page_tail:=p; 19476 link(contrib_head):=link(p); link(p):=null; goto done 19477 19478 @ @<Recycle node |p|@>= 19479 link(contrib_head):=link(p); link(p):=null; flush_node_list(p) 19480 19481 @ The title of this section is already so long, it seems best to avoid 19482 making it more accurate but still longer, by mentioning the fact that a 19483 kern node at the end of the contribution list will not be contributed until 19484 we know its successor. 19485 19486 @<If the current page is empty...@>= 19487 case type(p) of 19488 hlist_node,vlist_node,rule_node: if page_contents<box_there then 19489 @<Initialize the current page, insert the \.{\\topskip} glue 19490 ahead of |p|, and |goto continue|@> 19491 else @<Prepare to move a box or rule node to the current page, 19492 then |goto contribute|@>; 19493 whatsit_node: @<Prepare to move whatsit |p| to the current page, 19494 then |goto contribute|@>; 19495 glue_node: if page_contents<box_there then goto done1 19496 else if precedes_break(page_tail) then pi:=0 19497 else goto update_heights; 19498 kern_node: if page_contents<box_there then goto done1 19499 else if link(p)=null then return 19500 else if type(link(p))=glue_node then pi:=0 19501 else goto update_heights; 19502 penalty_node: if page_contents<box_there then goto done1@+else pi:=penalty(p); 19503 mark_node: goto contribute; 19504 ins_node: @<Append an insertion to the current page and |goto contribute|@>; 19505 othercases confusion("page") 19506 @:this can't happen page}{\quad page@> 19507 endcases 19508 19509 @ @<Initialize the current page, insert the \.{\\topskip} glue...@>= 19510 begin if page_contents=empty then freeze_page_specs(box_there) 19511 else page_contents:=box_there; 19512 q:=new_skip_param(top_skip_code); {now |temp_ptr=glue_ptr(q)|} 19513 if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p) 19514 else width(temp_ptr):=0; 19515 link(q):=p; link(contrib_head):=q; goto continue; 19516 end 19517 19518 @ @<Prepare to move a box or rule node to the current page...@>= 19519 begin page_total:=page_total+page_depth+height(p); 19520 page_depth:=depth(p); 19521 goto contribute; 19522 end 19523 19524 @ @<Make sure that |page_max_depth| is not exceeded@>= 19525 if page_depth>page_max_depth then 19526 begin page_total:=@| 19527 page_total+page_depth-page_max_depth;@/ 19528 page_depth:=page_max_depth; 19529 end; 19530 19531 @ @<Update the current page measurements with respect to the glue...@>= 19532 if type(p)=kern_node then q:=p 19533 else begin q:=glue_ptr(p); 19534 page_so_far[2+stretch_order(q)]:=@| 19535 page_so_far[2+stretch_order(q)]+stretch(q);@/ 19536 page_shrink:=page_shrink+shrink(q); 19537 if (shrink_order(q)<>normal)and(shrink(q)<>0) then 19538 begin@t@>@;@/ 19539 print_err("Infinite glue shrinkage found on current page");@/ 19540 @.Infinite glue shrinkage...@> 19541 help4("The page about to be output contains some infinitely")@/ 19542 ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/ 19543 ("Such glue doesn't belong there; but you can safely proceed,")@/ 19544 ("since the offensive shrinkability has been made finite."); 19545 error; 19546 r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q); 19547 glue_ptr(p):=r; q:=r; 19548 end; 19549 end; 19550 page_total:=page_total+page_depth+width(q); page_depth:=0 19551 19552 @ @<Check if node |p| is a new champion breakpoint; then \(if)...@>= 19553 if pi<inf_penalty then 19554 begin @<Compute the badness, |b|, of the current page, 19555 using |awful_bad| if the box is too full@>; 19556 if b<awful_bad then 19557 if pi<=eject_penalty then c:=pi 19558 else if b<inf_bad then c:=b+pi+insert_penalties 19559 else c:=deplorable 19560 else c:=b; 19561 if insert_penalties>=10000 then c:=awful_bad; 19562 @!stat if tracing_pages>0 then @<Display the page break cost@>;@+tats@;@/ 19563 if c<=least_page_cost then 19564 begin best_page_break:=p; best_size:=page_goal; 19565 least_page_cost:=c; 19566 r:=link(page_ins_head); 19567 while r<>page_ins_head do 19568 begin best_ins_ptr(r):=last_ins_ptr(r); 19569 r:=link(r); 19570 end; 19571 end; 19572 if (c=awful_bad)or(pi<=eject_penalty) then 19573 begin fire_up(p); {output the current page at the best place} 19574 if output_active then return; {user's output routine will act} 19575 goto done; {the page has been shipped out by default output routine} 19576 end; 19577 end 19578 19579 @ @<Display the page break cost@>= 19580 begin begin_diagnostic; print_nl("%"); 19581 print(" t="); print_totals;@/ 19582 print(" g="); print_scaled(page_goal);@/ 19583 print(" b="); 19584 if b=awful_bad then print_char("*")@+else print_int(b); 19585 @.*\relax@> 19586 print(" p="); print_int(pi); 19587 print(" c="); 19588 if c=awful_bad then print_char("*")@+else print_int(c); 19589 if c<=least_page_cost then print_char("#"); 19590 end_diagnostic(false); 19591 end 19592 19593 @ @<Compute the badness, |b|, of the current page...@>= 19594 if page_total<page_goal then 19595 if (page_so_far[3]<>0) or (page_so_far[4]<>0) or@| 19596 (page_so_far[5]<>0) then b:=0 19597 else b:=badness(page_goal-page_total,page_so_far[2]) 19598 else if page_total-page_goal>page_shrink then b:=awful_bad 19599 else b:=badness(page_total-page_goal,page_shrink) 19600 19601 @ @<Append an insertion to the current page and |goto contribute|@>= 19602 begin if page_contents=empty then freeze_page_specs(inserts_only); 19603 n:=subtype(p); r:=page_ins_head; 19604 while n>=subtype(link(r)) do r:=link(r); 19605 n:=qo(n); 19606 if subtype(r)<>qi(n) then 19607 @<Create a page insertion node with |subtype(r)=qi(n)|, and 19608 include the glue correction for box |n| in the 19609 current page state@>; 19610 if type(r)=split_up then insert_penalties:=insert_penalties+float_cost(p) 19611 else begin last_ins_ptr(r):=p; 19612 delta:=page_goal-page_total-page_depth+page_shrink; 19613 {this much room is left if we shrink the maximum} 19614 if count(n)=1000 then h:=height(p) 19615 else h:=x_over_n(height(p),1000)*count(n); {this much room is needed} 19616 if ((h<=0)or(h<=delta))and(height(p)+height(r)<=dimen(n)) then 19617 begin page_goal:=page_goal-h; height(r):=height(r)+height(p); 19618 end 19619 else @<Find the best way to split the insertion, and change 19620 |type(r)| to |split_up|@>; 19621 end; 19622 goto contribute; 19623 end 19624 19625 @ We take note of the value of \.{\\skip} |n| and the height plus depth 19626 of \.{\\box}~|n| only when the first \.{\\insert}~|n| node is 19627 encountered for a new page. A user who changes the contents of \.{\\box}~|n| 19628 after that first \.{\\insert}~|n| had better be either extremely careful 19629 or extremely lucky, or both. 19630 19631 @<Create a page insertion node...@>= 19632 begin q:=get_node(page_ins_node_size); link(q):=link(r); link(r):=q; r:=q; 19633 subtype(r):=qi(n); type(r):=inserting; ensure_vbox(n); 19634 if box(n)=null then height(r):=0 19635 else height(r):=height(box(n))+depth(box(n)); 19636 best_ins_ptr(r):=null;@/ 19637 q:=skip(n); 19638 if count(n)=1000 then h:=height(r) 19639 else h:=x_over_n(height(r),1000)*count(n); 19640 page_goal:=page_goal-h-width(q);@/ 19641 page_so_far[2+stretch_order(q)]:=@|page_so_far[2+stretch_order(q)]+stretch(q);@/ 19642 page_shrink:=page_shrink+shrink(q); 19643 if (shrink_order(q)<>normal)and(shrink(q)<>0) then 19644 begin print_err("Infinite glue shrinkage inserted from "); print_esc("skip"); 19645 @.Infinite glue shrinkage...@> 19646 print_int(n); 19647 help3("The correction glue for page breaking with insertions")@/ 19648 ("must have finite shrinkability. But you may proceed,")@/ 19649 ("since the offensive shrinkability has been made finite."); 19650 error; 19651 end; 19652 end 19653 19654 @ Here is the code that will split a long footnote between pages, in an 19655 emergency. The current situation deserves to be recapitulated: Node |p| 19656 is an insertion into box |n|; the insertion will not fit, in its entirety, 19657 either because it would make the total contents of box |n| greater than 19658 \.{\\dimen} |n|, or because it would make the incremental amount of growth 19659 |h| greater than the available space |delta|, or both. (This amount |h| has 19660 been weighted by the insertion scaling factor, i.e., by \.{\\count} |n| 19661 over 1000.) Now we will choose the best way to break the vlist of the 19662 insertion, using the same criteria as in the \.{\\vsplit} operation. 19663 19664 @<Find the best way to split the insertion...@>= 19665 begin if count(n)<=0 then w:=max_dimen 19666 else begin w:=page_goal-page_total-page_depth; 19667 if count(n)<>1000 then w:=x_over_n(w,count(n))*1000; 19668 end; 19669 if w>dimen(n)-height(r) then w:=dimen(n)-height(r); 19670 q:=vert_break(ins_ptr(p),w,depth(p)); 19671 height(r):=height(r)+best_height_plus_depth; 19672 @!stat if tracing_pages>0 then @<Display the insertion split cost@>;@+tats@;@/ 19673 if count(n)<>1000 then 19674 best_height_plus_depth:=x_over_n(best_height_plus_depth,1000)*count(n); 19675 page_goal:=page_goal-best_height_plus_depth; 19676 type(r):=split_up; broken_ptr(r):=q; broken_ins(r):=p; 19677 if q=null then insert_penalties:=insert_penalties+eject_penalty 19678 else if type(q)=penalty_node then insert_penalties:=insert_penalties+penalty(q); 19679 end 19680 19681 @ @<Display the insertion split cost@>= 19682 begin begin_diagnostic; print_nl("% split"); print_int(n); 19683 @.split@> 19684 print(" to "); print_scaled(w); 19685 print_char(","); print_scaled(best_height_plus_depth);@/ 19686 print(" p="); 19687 if q=null then print_int(eject_penalty) 19688 else if type(q)=penalty_node then print_int(penalty(q)) 19689 else print_char("0"); 19690 end_diagnostic(false); 19691 end 19692 19693 @ When the page builder has looked at as much material as could appear before 19694 the next page break, it makes its decision. The break that gave minimum 19695 badness will be used to put a completed ``page'' into box 255, with insertions 19696 appended to their other boxes. 19697 19698 We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The 19699 program uses the fact that |bot_mark<>null| implies |first_mark<>null|; 19700 it also knows that |bot_mark=null| implies |top_mark=first_mark=null|. 19701 19702 The |fire_up| subroutine prepares to output the current page at the best 19703 place; then it fires up the user's output routine, if there is one, 19704 or it simply ships out the page. There is one parameter, |c|, which represents 19705 the node that was being contributed to the page when the decision to 19706 force an output was made. 19707 19708 @<Declare the procedure called |fire_up|@>= 19709 procedure fire_up(@!c:pointer); 19710 label exit; 19711 var p,@!q,@!r,@!s:pointer; {nodes being examined and/or changed} 19712 @!prev_p:pointer; {predecessor of |p|} 19713 @!n:min_quarterword..255; {insertion box number} 19714 @!wait:boolean; {should the present insertion be held over?} 19715 @!save_vbadness:integer; {saved value of |vbadness|} 19716 @!save_vfuzz: scaled; {saved value of |vfuzz|} 19717 @!save_split_top_skip: pointer; {saved value of |split_top_skip|} 19718 begin @<Set the value of |output_penalty|@>; 19719 if bot_mark<>null then 19720 begin if top_mark<>null then delete_token_ref(top_mark); 19721 top_mark:=bot_mark; add_token_ref(top_mark); 19722 delete_token_ref(first_mark); first_mark:=null; 19723 end; 19724 @<Put the \(o)optimal current page into box 255, update |first_mark| and 19725 |bot_mark|, append insertions to their boxes, and put the 19726 remaining nodes back on the contribution list@>; 19727 if (top_mark<>null)and(first_mark=null) then 19728 begin first_mark:=top_mark; add_token_ref(top_mark); 19729 end; 19730 if output_routine<>null then 19731 if dead_cycles>=max_dead_cycles then 19732 @<Explain that too many dead cycles have occurred in a row@> 19733 else @<Fire up the user's output routine and |return|@>; 19734 @<Perform the default output routine@>; 19735 exit:end; 19736 19737 @ @<Set the value of |output_penalty|@>= 19738 if type(best_page_break)=penalty_node then 19739 begin geq_word_define(int_base+output_penalty_code,penalty(best_page_break)); 19740 penalty(best_page_break):=inf_penalty; 19741 end 19742 else geq_word_define(int_base+output_penalty_code,inf_penalty) 19743 19744 @ As the page is finally being prepared for output, 19745 pointer |p| runs through the vlist, with |prev_p| trailing behind; 19746 pointer |q| is the tail of a list of insertions that 19747 are being held over for a subsequent page. 19748 19749 @<Put the \(o)optimal current page into box 255...@>= 19750 if c=best_page_break then best_page_break:=null; {|c| not yet linked in} 19751 @<Ensure that box 255 is empty before output@>; 19752 insert_penalties:=0; {this will count the number of insertions held over} 19753 save_split_top_skip:=split_top_skip; 19754 if holding_inserts<=0 then 19755 @<Prepare all the boxes involved in insertions to act as queues@>; 19756 q:=hold_head; link(q):=null; prev_p:=page_head; p:=link(prev_p); 19757 while p<>best_page_break do 19758 begin if type(p)=ins_node then 19759 begin if holding_inserts<=0 then 19760 @<Either insert the material specified by node |p| into the 19761 appropriate box, or hold it for the next page; 19762 also delete node |p| from the current page@>; 19763 end 19764 else if type(p)=mark_node then @<Update the values of 19765 |first_mark| and |bot_mark|@>; 19766 prev_p:=p; p:=link(prev_p); 19767 end; 19768 split_top_skip:=save_split_top_skip; 19769 @<Break the current page at node |p|, put it in box~255, 19770 and put the remaining nodes on the contribution list@>; 19771 @<Delete \(t)the page-insertion nodes@> 19772 19773 @ @<Ensure that box 255 is empty before output@>= 19774 if box(255)<>null then 19775 begin print_err(""); print_esc("box"); print("255 is not void"); 19776 @:box255}{\.{\\box255 is not void}@> 19777 help2("You shouldn't use \box255 except in \output routines.")@/ 19778 ("Proceed, and I'll discard its present contents."); 19779 box_error(255); 19780 end 19781 19782 @ @<Update the values of |first_mark| and |bot_mark|@>= 19783 begin if first_mark=null then 19784 begin first_mark:=mark_ptr(p); 19785 add_token_ref(first_mark); 19786 end; 19787 if bot_mark<>null then delete_token_ref(bot_mark); 19788 bot_mark:=mark_ptr(p); add_token_ref(bot_mark); 19789 end 19790 19791 @ When the following code is executed, the current page runs from node 19792 |link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail| 19793 are to be placed back at the front of the contribution list. Furthermore 19794 the heldover insertions appear in a list from |link(hold_head)| to |q|; we 19795 will put them into the current page list for safekeeping while the user's 19796 output routine is active. We might have |q=hold_head|; and |p=null| if 19797 and only if |prev_p=page_tail|. Error messages are suppressed within 19798 |vpackage|, since the box might appear to be overfull or underfull simply 19799 because the stretch and shrink from the \.{\\skip} registers for inserts 19800 are not actually present in the box. 19801 19802 @<Break the current page at node |p|, put it...@>= 19803 if p<>null then 19804 begin if link(contrib_head)=null then 19805 if nest_ptr=0 then tail:=page_tail 19806 else contrib_tail:=page_tail; 19807 link(page_tail):=link(contrib_head); 19808 link(contrib_head):=p; 19809 link(prev_p):=null; 19810 end; 19811 save_vbadness:=vbadness; vbadness:=inf_bad; 19812 save_vfuzz:=vfuzz; vfuzz:=max_dimen; {inhibit error messages} 19813 box(255):=vpackage(link(page_head),best_size,exactly,page_max_depth); 19814 vbadness:=save_vbadness; vfuzz:=save_vfuzz; 19815 if last_glue<>max_halfword then delete_glue_ref(last_glue); 19816 @<Start a new current page@>; {this sets |last_glue:=max_halfword|} 19817 if q<>hold_head then 19818 begin link(page_head):=link(hold_head); page_tail:=q; 19819 end 19820 19821 @ If many insertions are supposed to go into the same box, we want to know 19822 the position of the last node in that box, so that we don't need to waste time 19823 when linking further information into it. The |last_ins_ptr| fields of the 19824 page insertion nodes are therefore used for this purpose during the 19825 packaging phase. 19826 19827 @<Prepare all the boxes involved in insertions to act as queues@>= 19828 begin r:=link(page_ins_head); 19829 while r<>page_ins_head do 19830 begin if best_ins_ptr(r)<>null then 19831 begin n:=qo(subtype(r)); ensure_vbox(n); 19832 if box(n)=null then box(n):=new_null_box; 19833 p:=box(n)+list_offset; 19834 while link(p)<>null do p:=link(p); 19835 last_ins_ptr(r):=p; 19836 end; 19837 r:=link(r); 19838 end; 19839 end 19840 19841 @ @<Delete \(t)the page-insertion nodes@>= 19842 r:=link(page_ins_head); 19843 while r<>page_ins_head do 19844 begin q:=link(r); free_node(r,page_ins_node_size); r:=q; 19845 end; 19846 link(page_ins_head):=page_ins_head 19847 19848 @ We will set |best_ins_ptr:=null| and package the box corresponding to 19849 insertion node~|r|, just after making the final insertion into that box. 19850 If this final insertion is `|split_up|', the remainder after splitting 19851 and pruning (if any) will be carried over to the next page. 19852 19853 @<Either insert the material specified by node |p| into...@>= 19854 begin r:=link(page_ins_head); 19855 while subtype(r)<>subtype(p) do r:=link(r); 19856 if best_ins_ptr(r)=null then wait:=true 19857 else begin wait:=false; s:=last_ins_ptr(r); link(s):=ins_ptr(p); 19858 if best_ins_ptr(r)=p then 19859 @<Wrap up the box specified by node |r|, splitting node |p| if 19860 called for; set |wait:=true| if node |p| holds a remainder after 19861 splitting@> 19862 else begin while link(s)<>null do s:=link(s); 19863 last_ins_ptr(r):=s; 19864 end; 19865 end; 19866 @<Either append the insertion node |p| after node |q|, and remove it 19867 from the current page, or delete |node(p)|@>; 19868 end 19869 19870 @ @<Wrap up the box specified by node |r|, splitting node |p| if...@>= 19871 begin if type(r)=split_up then 19872 if (broken_ins(r)=p)and(broken_ptr(r)<>null) then 19873 begin while link(s)<>broken_ptr(r) do s:=link(s); 19874 link(s):=null; 19875 split_top_skip:=split_top_ptr(p); 19876 ins_ptr(p):=prune_page_top(broken_ptr(r)); 19877 if ins_ptr(p)<>null then 19878 begin temp_ptr:=vpack(ins_ptr(p),natural); 19879 height(p):=height(temp_ptr)+depth(temp_ptr); 19880 free_node(temp_ptr,box_node_size); wait:=true; 19881 end; 19882 end; 19883 best_ins_ptr(r):=null; 19884 n:=qo(subtype(r)); 19885 temp_ptr:=list_ptr(box(n)); 19886 free_node(box(n),box_node_size); 19887 box(n):=vpack(temp_ptr,natural); 19888 end 19889 19890 @ @<Either append the insertion node |p|...@>= 19891 link(prev_p):=link(p); link(p):=null; 19892 if wait then 19893 begin link(q):=p; q:=p; incr(insert_penalties); 19894 end 19895 else begin delete_glue_ref(split_top_ptr(p)); 19896 free_node(p,ins_node_size); 19897 end; 19898 p:=prev_p 19899 19900 @ The list of heldover insertions, running from |link(page_head)| to 19901 |page_tail|, must be moved to the contribution list when the user has 19902 specified no output routine. 19903 19904 @<Perform the default output routine@>= 19905 begin if link(page_head)<>null then 19906 begin if link(contrib_head)=null then 19907 if nest_ptr=0 then tail:=page_tail@+else contrib_tail:=page_tail 19908 else link(page_tail):=link(contrib_head); 19909 link(contrib_head):=link(page_head); 19910 link(page_head):=null; page_tail:=page_head; 19911 end; 19912 ship_out(box(255)); box(255):=null; 19913 end 19914 19915 @ @<Explain that too many dead cycles have occurred in a row@>= 19916 begin print_err("Output loop---"); print_int(dead_cycles); 19917 @.Output loop...@> 19918 print(" consecutive dead cycles"); 19919 help3("I've concluded that your \output is awry; it never does a")@/ 19920 ("\shipout, so I'm shipping \box255 out myself. Next time")@/ 19921 ("increase \maxdeadcycles if you want me to be more patient!"); error; 19922 end 19923 19924 @ @<Fire up the user's output routine and |return|@>= 19925 begin output_active:=true; 19926 incr(dead_cycles); 19927 push_nest; mode:=-vmode; prev_depth:=ignore_depth; mode_line:=-line; 19928 begin_token_list(output_routine,output_text); 19929 new_save_level(output_group); normal_paragraph; 19930 scan_left_brace; 19931 return; 19932 end 19933 19934 @ When the user's output routine finishes, it has constructed a vlist 19935 in internal vertical mode, and \TeX\ will do the following: 19936 19937 @<Resume the page builder after an output routine has come to an end@>= 19938 begin if (loc<>null) or 19939 ((token_type<>output_text)and(token_type<>backed_up)) then 19940 @<Recover from an unbalanced output routine@>; 19941 end_token_list; {conserve stack space in case more outputs are triggered} 19942 end_graf; unsave; output_active:=false; insert_penalties:=0;@/ 19943 @<Ensure that box 255 is empty after output@>; 19944 if tail<>head then {current list goes after heldover insertions} 19945 begin link(page_tail):=link(head); 19946 page_tail:=tail; 19947 end; 19948 if link(page_head)<>null then {and both go before heldover contributions} 19949 begin if link(contrib_head)=null then contrib_tail:=page_tail; 19950 link(page_tail):=link(contrib_head); 19951 link(contrib_head):=link(page_head); 19952 link(page_head):=null; page_tail:=page_head; 19953 end; 19954 pop_nest; build_page; 19955 end 19956 19957 @ @<Recover from an unbalanced output routine@>= 19958 begin print_err("Unbalanced output routine"); 19959 @.Unbalanced output routine@> 19960 help2("Your sneaky output routine has problematic {'s and/or }'s.")@/ 19961 ("I can't handle that very well; good luck."); error; 19962 repeat get_token; 19963 until loc=null; 19964 end {loops forever if reading from a file, since |null=min_halfword<=0|} 19965 19966 @ @<Ensure that box 255 is empty after output@>= 19967 if box(255)<>null then 19968 begin print_err("Output routine didn't use all of "); 19969 print_esc("box"); print_int(255); 19970 @.Output routine didn't use...@> 19971 help3("Your \output commands should empty \box255,")@/ 19972 ("e.g., by saying `\shipout\box255'.")@/ 19973 ("Proceed; I'll discard its present contents."); 19974 box_error(255); 19975 end 19976 19977 @* \[46] The chief executive. 19978 We come now to the |main_control| routine, which contains the master 19979 switch that causes all the various pieces of \TeX\ to do their things, 19980 in the right order. 19981 19982 In a sense, this is the grand climax of the program: It applies all the 19983 tools that we have worked so hard to construct. In another sense, this is 19984 the messiest part of the program: It necessarily refers to other pieces 19985 of code all over the place, so that a person can't fully understand what is 19986 going on without paging back and forth to be reminded of conventions that 19987 are defined elsewhere. We are now at the hub of the web, the central nervous 19988 system that touches most of the other parts and ties them together. 19989 @^brain@> 19990 19991 The structure of |main_control| itself is quite simple. There's a label 19992 called |big_switch|, at which point the next token of input is fetched 19993 using |get_x_token|. Then the program branches at high speed into one of 19994 about 100 possible directions, based on the value of the current 19995 mode and the newly fetched command code; the sum |abs(mode)+cur_cmd| 19996 indicates what to do next. For example, the case `|vmode+letter|' arises 19997 when a letter occurs in vertical mode (or internal vertical mode); this 19998 case leads to instructions that initialize a new paragraph and enter 19999 horizontal mode. 20000 20001 The big |case| statement that contains this multiway switch has been labeled 20002 |reswitch|, so that the program can |goto reswitch| when the next token 20003 has already been fetched. Most of the cases are quite short; they call 20004 an ``action procedure'' that does the work for that case, and then they 20005 either |goto reswitch| or they ``fall through'' to the end of the |case| 20006 statement, which returns control back to |big_switch|. Thus, |main_control| 20007 is not an extremely large procedure, in spite of the multiplicity of things 20008 it must do; it is small enough to be handled by \PASCAL\ compilers that put 20009 severe restrictions on procedure size. 20010 @!@^action procedure@> 20011 20012 One case is singled out for special treatment, because it accounts for most 20013 of \TeX's activities in typical applications. The process of reading simple 20014 text and converting it into |char_node| records, while looking for ligatures 20015 and kerns, is part of \TeX's ``inner loop''; the whole program runs 20016 efficiently when its inner loop is fast, so this part has been written 20017 with particular care. 20018 20019 @ We shall concentrate first on the inner loop of |main_control|, deferring 20020 consideration of the other cases until later. 20021 @^inner loop@> 20022 20023 @d big_switch=60 {go here to branch on the next token of input} 20024 @d main_loop=70 {go here to typeset a string of consecutive characters} 20025 @d main_loop_wrapup=80 {go here to finish a character or ligature} 20026 @d main_loop_move=90 {go here to advance the ligature cursor} 20027 @d main_loop_move_lig=95 {same, when advancing past a generated ligature} 20028 @d main_loop_lookahead=100 {go here to bring in another character, if any} 20029 @d main_lig_loop=110 {go here to check for ligatures or kerning} 20030 @d append_normal_space=120 {go here to append a normal space between words} 20031 20032 @p @t\4@>@<Declare action procedures for use by |main_control|@>@; 20033 @t\4@>@<Declare the procedure called |handle_right_brace|@>@; 20034 procedure main_control; {governs \TeX's activities} 20035 label big_switch,reswitch,main_loop,main_loop_wrapup, 20036 main_loop_move,main_loop_move+1,main_loop_move+2,main_loop_move_lig, 20037 main_loop_lookahead,main_loop_lookahead+1, 20038 main_lig_loop,main_lig_loop+1,main_lig_loop+2, 20039 append_normal_space,exit; 20040 var@!t:integer; {general-purpose temporary variable} 20041 begin if every_job<>null then begin_token_list(every_job,every_job_text); 20042 big_switch: get_x_token;@/ 20043 reswitch: @<Give diagnostic information, if requested@>; 20044 case abs(mode)+cur_cmd of 20045 hmode+letter,hmode+other_char,hmode+char_given: goto main_loop; 20046 hmode+char_num: begin scan_char_num; cur_chr:=cur_val; goto main_loop;@+end; 20047 hmode+no_boundary: begin get_x_token; 20048 if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given)or 20049 (cur_cmd=char_num) then cancel_boundary:=true; 20050 goto reswitch; 20051 end; 20052 hmode+spacer: if space_factor=1000 then goto append_normal_space 20053 else app_space; 20054 hmode+ex_space,mmode+ex_space: goto append_normal_space; 20055 @t\4@>@<Cases of |main_control| that are not part of the inner loop@>@; 20056 end; {of the big |case| statement} 20057 goto big_switch; 20058 main_loop:@<Append character |cur_chr| and the following characters (if~any) 20059 to the current hlist in the current font; |goto reswitch| when 20060 a non-character has been fetched@>; 20061 append_normal_space:@<Append a normal inter-word space to the current list, 20062 then |goto big_switch|@>; 20063 exit:end; 20064 20065 @ When a new token has just been fetched at |big_switch|, we have an 20066 ideal place to monitor \TeX's activity. 20067 @^debugging@> 20068 20069 @<Give diagnostic information, if requested@>= 20070 if interrupt<>0 then if OK_to_interrupt then 20071 begin back_input; check_interrupt; goto big_switch; 20072 end; 20073 @!debug if panicking then check_mem(false);@+@;@+gubed 20074 if tracing_commands>0 then show_cur_cmd_chr 20075 20076 @ The following part of the program was first written in a structured 20077 manner, according to the philosophy that ``premature optimization is 20078 the root of all evil.'' Then it was rearranged into pieces of 20079 spaghetti so that the most common actions could proceed with little or 20080 no redundancy. 20081 20082 The original unoptimized form of this algorithm resembles the 20083 |reconstitute| procedure, which was described earlier in connection with 20084 hyphenation. Again we have an implied ``cursor'' between characters 20085 |cur_l| and |cur_r|. The main difference is that the |lig_stack| can now 20086 contain a charnode as well as pseudo-ligatures; that stack is now 20087 usually nonempty, because the next character of input (if any) has been 20088 appended to it. In |main_control| we have 20089 $$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr 20090 |font_bchar[cur_font]|,&otherwise;\cr}$$ 20091 except when |character(lig_stack)=font_false_bchar[cur_font]|. 20092 Several additional global variables are needed. 20093 20094 @<Glob...@>= 20095 @!main_f:internal_font_number; {the current font} 20096 @!main_i:four_quarters; {character information bytes for |cur_l|} 20097 @!main_j:four_quarters; {ligature/kern command} 20098 @!main_k:font_index; {index into |font_info|} 20099 @!main_p:pointer; {temporary register for list manipulation} 20100 @!main_s:integer; {space factor value} 20101 @!bchar:halfword; {boundary character of current font, or |non_char|} 20102 @!false_bchar:halfword; {nonexistent character matching |bchar|, or |non_char|} 20103 @!cancel_boundary:boolean; {should the left boundary be ignored?} 20104 @!ins_disc:boolean; {should we insert a discretionary node?} 20105 20106 @ The boolean variables of the main loop are normally false, and always reset 20107 to false before the loop is left. That saves us the extra work of initializing 20108 each time. 20109 20110 @<Set init...@>= 20111 ligature_present:=false; cancel_boundary:=false; lft_hit:=false; rt_hit:=false; 20112 ins_disc:=false; 20113 20114 @ We leave the |space_factor| unchanged if |sf_code(cur_chr)=0|; otherwise we 20115 set it equal to |sf_code(cur_chr)|, except that it should never change 20116 from a value less than 1000 to a value exceeding 1000. The most common 20117 case is |sf_code(cur_chr)=1000|, so we want that case to be fast. 20118 20119 The overall structure of the main loop is presented here. Some program labels 20120 are inside the individual sections. 20121 @^inner loop@> 20122 20123 @d adjust_space_factor==@t@>@;@/ 20124 main_s:=sf_code(cur_chr); 20125 if main_s=1000 then space_factor:=1000 20126 else if main_s<1000 then 20127 begin if main_s>0 then space_factor:=main_s; 20128 end 20129 else if space_factor<1000 then space_factor:=1000 20130 else space_factor:=main_s 20131 20132 @<Append character |cur_chr|...@>= 20133 adjust_space_factor;@/ 20134 main_f:=cur_font; 20135 bchar:=font_bchar[main_f]; false_bchar:=font_false_bchar[main_f]; 20136 if mode>0 then if language<>clang then fix_language; 20137 fast_get_avail(lig_stack); font(lig_stack):=main_f; cur_l:=qi(cur_chr); 20138 character(lig_stack):=cur_l;@/ 20139 cur_q:=tail; 20140 if cancel_boundary then 20141 begin cancel_boundary:=false; main_k:=non_address; 20142 end 20143 else main_k:=bchar_label[main_f]; 20144 if main_k=non_address then goto main_loop_move+2; {no left boundary processing} 20145 cur_r:=cur_l; cur_l:=non_char; 20146 goto main_lig_loop+1; {begin with cursor after left boundary} 20147 @# 20148 main_loop_wrapup:@<Make a ligature node, if |ligature_present|; 20149 insert a null discretionary, if appropriate@>; 20150 main_loop_move:@<If the cursor is immediately followed by the right boundary, 20151 |goto reswitch|; if it's followed by an invalid character, |goto big_switch|; 20152 otherwise move the cursor one step to the right and |goto main_lig_loop|@>; 20153 main_loop_lookahead:@<Look ahead for another character, or leave |lig_stack| 20154 empty if there's none there@>; 20155 main_lig_loop:@<If there's a ligature/kern command relevant to |cur_l| and 20156 |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|@>; 20157 main_loop_move_lig:@<Move the cursor past a pseudo-ligature, then 20158 |goto main_loop_lookahead| or |main_lig_loop|@> 20159 20160 @ If |link(cur_q)| is nonnull when |wrapup| is invoked, |cur_q| points to 20161 the list of characters that were consumed while building the ligature 20162 character~|cur_l|. 20163 20164 A discretionary break is not inserted for an explicit hyphen when we are in 20165 restricted horizontal mode. In particular, this avoids putting discretionary 20166 nodes inside of other discretionaries. 20167 @^inner loop@> 20168 20169 @d pack_lig(#)== {the parameter is either |rt_hit| or |false|} 20170 begin main_p:=new_ligature(main_f,cur_l,link(cur_q)); 20171 if lft_hit then 20172 begin subtype(main_p):=2; lft_hit:=false; 20173 end; 20174 if # then if lig_stack=null then 20175 begin incr(subtype(main_p)); rt_hit:=false; 20176 end; 20177 link(cur_q):=main_p; tail:=main_p; ligature_present:=false; 20178 end 20179 20180 @d wrapup(#)==if cur_l<non_char then 20181 begin if link(cur_q)>null then 20182 if character(tail)=qi(hyphen_char[main_f]) then ins_disc:=true; 20183 if ligature_present then pack_lig(#); 20184 if ins_disc then 20185 begin ins_disc:=false; 20186 if mode>0 then tail_append(new_disc); 20187 end; 20188 end 20189 20190 @<Make a ligature node, if |ligature_present|;...@>= 20191 wrapup(rt_hit) 20192 20193 @ @<If the cursor is immediately followed by the right boundary...@>= 20194 @^inner loop@> 20195 if lig_stack=null then goto reswitch; 20196 cur_q:=tail; cur_l:=character(lig_stack); 20197 main_loop_move+1:if not is_char_node(lig_stack) then goto main_loop_move_lig; 20198 main_loop_move+2:if(cur_chr<font_bc[main_f])or(cur_chr>font_ec[main_f]) then 20199 begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch; 20200 end; 20201 main_i:=char_info(main_f)(cur_l); 20202 if not char_exists(main_i) then 20203 begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch; 20204 end; 20205 link(tail):=lig_stack; tail:=lig_stack {|main_loop_lookahead| is next} 20206 20207 @ Here we are at |main_loop_move_lig|. 20208 When we begin this code we have |cur_q=tail| and |cur_l=character(lig_stack)|. 20209 20210 @<Move the cursor past a pseudo-ligature...@>= 20211 main_p:=lig_ptr(lig_stack); 20212 if main_p>null then tail_append(main_p); {append a single character} 20213 temp_ptr:=lig_stack; lig_stack:=link(temp_ptr); 20214 free_node(temp_ptr,small_node_size); 20215 main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20216 if lig_stack=null then 20217 if main_p>null then goto main_loop_lookahead 20218 else cur_r:=bchar 20219 else cur_r:=character(lig_stack); 20220 goto main_lig_loop 20221 20222 @ The result of \.{\\char} can participate in a ligature or kern, so we must 20223 look ahead for it. 20224 20225 @<Look ahead for another character...@>= 20226 get_next; {set only |cur_cmd| and |cur_chr|, for speed} 20227 if cur_cmd=letter then goto main_loop_lookahead+1; 20228 if cur_cmd=other_char then goto main_loop_lookahead+1; 20229 if cur_cmd=char_given then goto main_loop_lookahead+1; 20230 x_token; {now expand and set |cur_cmd|, |cur_chr|, |cur_tok|} 20231 if cur_cmd=letter then goto main_loop_lookahead+1; 20232 if cur_cmd=other_char then goto main_loop_lookahead+1; 20233 if cur_cmd=char_given then goto main_loop_lookahead+1; 20234 if cur_cmd=char_num then 20235 begin scan_char_num; cur_chr:=cur_val; goto main_loop_lookahead+1; 20236 end; 20237 if cur_cmd=no_boundary then bchar:=non_char; 20238 cur_r:=bchar; lig_stack:=null; goto main_lig_loop; 20239 main_loop_lookahead+1: adjust_space_factor; 20240 fast_get_avail(lig_stack); font(lig_stack):=main_f; 20241 cur_r:=qi(cur_chr); character(lig_stack):=cur_r; 20242 if cur_r=false_bchar then cur_r:=non_char {this prevents spurious ligatures} 20243 20244 @ Even though comparatively few characters have a lig/kern program, several 20245 of the instructions here count as part of \TeX's inner loop, since a 20246 @^inner loop@> 20247 potentially long sequential search must be performed. For example, tests with 20248 Computer Modern Roman showed that about 40 per cent of all characters 20249 actually encountered in practice had a lig/kern program, and that about four 20250 lig/kern commands were investigated for every such character. 20251 20252 At the beginning of this code we have |main_i=char_info(main_f)(cur_l)|. 20253 20254 @<If there's a ligature/kern command...@>= 20255 if char_tag(main_i)<>lig_tag then goto main_loop_wrapup; 20256 if cur_r=non_char then goto main_loop_wrapup; 20257 main_k:=lig_kern_start(main_f)(main_i); main_j:=font_info[main_k].qqqq; 20258 if skip_byte(main_j)<=stop_flag then goto main_lig_loop+2; 20259 main_k:=lig_kern_restart(main_f)(main_j); 20260 main_lig_loop+1:main_j:=font_info[main_k].qqqq; 20261 main_lig_loop+2:if next_char(main_j)=cur_r then 20262 if skip_byte(main_j)<=stop_flag then 20263 @<Do ligature or kern command, returning to |main_lig_loop| 20264 or |main_loop_wrapup| or |main_loop_move|@>; 20265 if skip_byte(main_j)=qi(0) then incr(main_k) 20266 else begin if skip_byte(main_j)>=stop_flag then goto main_loop_wrapup; 20267 main_k:=main_k+qo(skip_byte(main_j))+1; 20268 end; 20269 goto main_lig_loop+1 20270 20271 @ When a ligature or kern instruction matches a character, we know from 20272 |read_font_info| that the character exists in the font, even though we 20273 haven't verified its existence in the normal way. 20274 20275 This section could be made into a subroutine, if the code inside 20276 |main_control| needs to be shortened. 20277 20278 \chardef\?='174 % vertical line to indicate character retention 20279 20280 @<Do ligature or kern command...@>= 20281 begin if op_byte(main_j)>=kern_flag then 20282 begin wrapup(rt_hit); 20283 tail_append(new_kern(char_kern(main_f)(main_j))); goto main_loop_move; 20284 end; 20285 if cur_l=non_char then lft_hit:=true 20286 else if lig_stack=null then rt_hit:=true; 20287 check_interrupt; {allow a way out in case there's an infinite ligature loop} 20288 case op_byte(main_j) of 20289 qi(1),qi(5):begin cur_l:=rem_byte(main_j); {\.{=:\?}, \.{=:\?>}} 20290 main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20291 end; 20292 qi(2),qi(6):begin cur_r:=rem_byte(main_j); {\.{\?=:}, \.{\?=:>}} 20293 if lig_stack=null then {right boundary character is being consumed} 20294 begin lig_stack:=new_lig_item(cur_r); bchar:=non_char; 20295 end 20296 else if is_char_node(lig_stack) then {|link(lig_stack)=null|} 20297 begin main_p:=lig_stack; lig_stack:=new_lig_item(cur_r); 20298 lig_ptr(lig_stack):=main_p; 20299 end 20300 else character(lig_stack):=cur_r; 20301 end; 20302 qi(3):begin cur_r:=rem_byte(main_j); {\.{\?=:\?}} 20303 main_p:=lig_stack; lig_stack:=new_lig_item(cur_r); 20304 link(lig_stack):=main_p; 20305 end; 20306 qi(7),qi(11):begin wrapup(false); {\.{\?=:\?>}, \.{\?=:\?>>}} 20307 cur_q:=tail; cur_l:=rem_byte(main_j); 20308 main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20309 end; 20310 othercases begin cur_l:=rem_byte(main_j); ligature_present:=true; {\.{=:}} 20311 if lig_stack=null then goto main_loop_wrapup 20312 else goto main_loop_move+1; 20313 end 20314 endcases; 20315 if op_byte(main_j)>qi(4) then 20316 if op_byte(main_j)<>qi(7) then goto main_loop_wrapup; 20317 if cur_l<non_char then goto main_lig_loop; 20318 main_k:=bchar_label[main_f]; goto main_lig_loop+1; 20319 end 20320 20321 @ The occurrence of blank spaces is almost part of \TeX's inner loop, 20322 @^inner loop@> 20323 since we usually encounter about one space for every five non-blank characters. 20324 Therefore |main_control| gives second-highest priority to ordinary spaces. 20325 20326 When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will 20327 see to it later that the corresponding glue specification is precisely 20328 |zero_glue|, not merely a pointer to some specification that happens 20329 to be full of zeroes. Therefore it is simple to test whether a glue parameter 20330 is zero or~not. 20331 20332 @<Append a normal inter-word space...@>= 20333 if space_skip=zero_glue then 20334 begin @<Find the glue specification, |main_p|, for 20335 text spaces in the current font@>; 20336 temp_ptr:=new_glue(main_p); 20337 end 20338 else temp_ptr:=new_param_glue(space_skip_code); 20339 link(tail):=temp_ptr; tail:=temp_ptr; 20340 goto big_switch 20341 20342 @ Having |font_glue| allocated for each text font saves both time and memory. 20343 If any of the three spacing parameters are subsequently changed by the 20344 use of \.{\\fontdimen}, the |find_font_dimen| procedure deallocates the 20345 |font_glue| specification allocated here. 20346 20347 @<Find the glue specification...@>= 20348 begin main_p:=font_glue[cur_font]; 20349 if main_p=null then 20350 begin main_p:=new_spec(zero_glue); main_k:=param_base[cur_font]+space_code; 20351 width(main_p):=font_info[main_k].sc; {that's |space(cur_font)|} 20352 stretch(main_p):=font_info[main_k+1].sc; {and |space_stretch(cur_font)|} 20353 shrink(main_p):=font_info[main_k+2].sc; {and |space_shrink(cur_font)|} 20354 font_glue[cur_font]:=main_p; 20355 end; 20356 end 20357 20358 @ @<Declare act...@>= 20359 procedure app_space; {handle spaces when |space_factor<>1000|} 20360 var@!q:pointer; {glue node} 20361 begin if (space_factor>=2000)and(xspace_skip<>zero_glue) then 20362 q:=new_param_glue(xspace_skip_code) 20363 else begin if space_skip<>zero_glue then main_p:=space_skip 20364 else @<Find the glue specification...@>; 20365 main_p:=new_spec(main_p); 20366 @<Modify the glue specification in |main_p| according to the space factor@>; 20367 q:=new_glue(main_p); glue_ref_count(main_p):=null; 20368 end; 20369 link(tail):=q; tail:=q; 20370 end; 20371 20372 @ @<Modify the glue specification in |main_p| according to the space factor@>= 20373 if space_factor>=2000 then width(main_p):=width(main_p)+extra_space(cur_font); 20374 stretch(main_p):=xn_over_d(stretch(main_p),space_factor,1000); 20375 shrink(main_p):=xn_over_d(shrink(main_p),1000,space_factor) 20376 20377 @ Whew---that covers the main loop. We can now proceed at a leisurely 20378 pace through the other combinations of possibilities. 20379 20380 @d any_mode(#)==vmode+#,hmode+#,mmode+# {for mode-independent commands} 20381 20382 @<Cases of |main_control| that are not part of the inner loop@>= 20383 any_mode(relax),vmode+spacer,mmode+spacer,mmode+no_boundary:do_nothing; 20384 any_mode(ignore_spaces): begin @<Get the next non-blank non-call...@>; 20385 goto reswitch; 20386 end; 20387 vmode+stop: if its_all_over then return; {this is the only way out} 20388 @t\4@>@<Forbidden cases detected in |main_control|@>@+@,any_mode(mac_param): 20389 report_illegal_case; 20390 @<Math-only cases in non-math modes, or vice versa@>: insert_dollar_sign; 20391 @t\4@>@<Cases of |main_control| that build boxes and lists@>@; 20392 @t\4@>@<Cases of |main_control| that don't depend on |mode|@>@; 20393 @t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@; 20394 20395 @ Here is a list of cases where the user has probably gotten into or out of math 20396 mode by mistake. \TeX\ will insert a dollar sign and rescan the current token. 20397 20398 @d non_math(#)==vmode+#,hmode+# 20399 20400 @<Math-only cases in non-math modes...@>= 20401 non_math(sup_mark), non_math(sub_mark), non_math(math_char_num), 20402 non_math(math_given), non_math(math_comp), non_math(delim_num), 20403 non_math(left_right), non_math(above), non_math(radical), 20404 non_math(math_style), non_math(math_choice), non_math(vcenter), 20405 non_math(non_script), non_math(mkern), non_math(limit_switch), 20406 non_math(mskip), non_math(math_accent), 20407 mmode+endv, mmode+par_end, mmode+stop, mmode+vskip, mmode+un_vbox, 20408 mmode+valign, mmode+hrule 20409 20410 @ @<Declare action...@>= 20411 procedure insert_dollar_sign; 20412 begin back_input; cur_tok:=math_shift_token+"$"; 20413 print_err("Missing $ inserted"); 20414 @.Missing \$ inserted@> 20415 help2("I've inserted a begin-math/end-math symbol since I think")@/ 20416 ("you left one out. Proceed, with fingers crossed."); ins_error; 20417 end; 20418 20419 @ When erroneous situations arise, \TeX\ usually issues an error message 20420 specific to the particular error. For example, `\.{\\noalign}' should 20421 not appear in any mode, since it is recognized by the |align_peek| routine 20422 in all of its legitimate appearances; a special error message is given 20423 when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate 20424 error message is simply that the user is not allowed to do what he or she 20425 has attempted. For example, `\.{\\moveleft}' is allowed only in vertical mode, 20426 and `\.{\\lower}' only in non-vertical modes. Such cases are enumerated 20427 here and in the other sections referred to under `See also \dots.' 20428 20429 @<Forbidden cases...@>= 20430 vmode+vmove,hmode+hmove,mmode+hmove,any_mode(last_item), 20431 20432 @ The `|you_cant|' procedure prints a line saying that the current command 20433 is illegal in the current mode; it identifies these things symbolically. 20434 20435 @<Declare action...@>= 20436 procedure you_cant; 20437 begin print_err("You can't use `"); 20438 @.You can't use x in y mode@> 20439 print_cmd_chr(cur_cmd,cur_chr); 20440 print("' in "); print_mode(mode); 20441 end; 20442 20443 @ @<Declare act...@>= 20444 procedure report_illegal_case; 20445 begin you_cant; 20446 help4("Sorry, but I'm not programmed to handle this case;")@/ 20447 ("I'll just pretend that you didn't ask for it.")@/ 20448 ("If you're in the wrong mode, you might be able to")@/ 20449 ("return to the right one by typing `I}' or `I$' or `I\par'.");@/ 20450 error; 20451 end; 20452 20453 @ Some operations are allowed only in privileged modes, i.e., in cases 20454 that |mode>0|. The |privileged| function is used to detect violations 20455 of this rule; it issues an error message and returns |false| if the 20456 current |mode| is negative. 20457 20458 @<Declare act...@>= 20459 function privileged:boolean; 20460 begin if mode>0 then privileged:=true 20461 else begin report_illegal_case; privileged:=false; 20462 end; 20463 end; 20464 20465 @ Either \.{\\dump} or \.{\\end} will cause |main_control| to enter the 20466 endgame, since both of them have `|stop|' as their command code. 20467 20468 @<Put each...@>= 20469 primitive("end",stop,0);@/ 20470 @!@:end_}{\.{\\end} primitive@> 20471 primitive("dump",stop,1);@/ 20472 @!@:dump_}{\.{\\dump} primitive@> 20473 20474 @ @<Cases of |print_cmd_chr|...@>= 20475 stop:if chr_code=1 then print_esc("dump")@+else print_esc("end"); 20476 20477 @ We don't want to leave |main_control| immediately when a |stop| command 20478 is sensed, because it may be necessary to invoke an \.{\\output} routine 20479 several times before things really grind to a halt. (The output routine 20480 might even say `\.{\\gdef\\end\{...\}}', to prolong the life of the job.) 20481 Therefore |its_all_over| is |true| only when the current page 20482 and contribution list are empty, and when the last output was not a 20483 ``dead cycle.'' 20484 20485 @<Declare act...@>= 20486 function its_all_over:boolean; {do this when \.{\\end} or \.{\\dump} occurs} 20487 label exit; 20488 begin if privileged then 20489 begin if (page_head=page_tail)and(head=tail)and(dead_cycles=0) then 20490 begin its_all_over:=true; return; 20491 end; 20492 back_input; {we will try to end again after ejecting residual material} 20493 tail_append(new_null_box); 20494 width(tail):=hsize; 20495 tail_append(new_glue(fill_glue)); 20496 tail_append(new_penalty(-@'10000000000));@/ 20497 build_page; {append \.{\\hbox to \\hsize\{\}\\vfill\\penalty-'10000000000}} 20498 end; 20499 its_all_over:=false; 20500 exit:end; 20501 20502 @* \[47] Building boxes and lists. 20503 The most important parts of |main_control| are concerned with \TeX's 20504 chief mission of box-making. We need to control the activities that put 20505 entries on vlists and hlists, as well as the activities that convert 20506 those lists into boxes. All of the necessary machinery has already been 20507 developed; it remains for us to ``push the buttons'' at the right times. 20508 20509 @ As an introduction to these routines, let's consider one of the simplest 20510 cases: What happens when `\.{\\hrule}' occurs in vertical mode, or 20511 `\.{\\vrule}' in horizontal mode or math mode? The code in |main_control| 20512 is short, since the |scan_rule_spec| routine already does most of what is 20513 required; thus, there is no need for a special action procedure. 20514 20515 Note that baselineskip calculations are disabled after a rule in vertical 20516 mode, by setting |prev_depth:=ignore_depth|. 20517 20518 @<Cases of |main_control| that build...@>= 20519 vmode+hrule,hmode+vrule,mmode+vrule: begin tail_append(scan_rule_spec); 20520 if abs(mode)=vmode then prev_depth:=ignore_depth 20521 else if abs(mode)=hmode then space_factor:=1000; 20522 end; 20523 20524 @ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly 20525 more complicated. But the code in |main_control| is very short, since 20526 it simply calls on the action routine |append_glue|. Similarly, \.{\\kern} 20527 activates |append_kern|. 20528 20529 @<Cases of |main_control| that build...@>= 20530 vmode+vskip,hmode+hskip,mmode+hskip,mmode+mskip: append_glue; 20531 any_mode(kern),mmode+mkern: append_kern; 20532 20533 @ The |hskip| and |vskip| command codes are used for control sequences 20534 like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}. 20535 The difference is in the value of |cur_chr|. 20536 20537 @d fil_code=0 {identifies \.{\\hfil} and \.{\\vfil}} 20538 @d fill_code=1 {identifies \.{\\hfill} and \.{\\vfill}} 20539 @d ss_code=2 {identifies \.{\\hss} and \.{\\vss}} 20540 @d fil_neg_code=3 {identifies \.{\\hfilneg} and \.{\\vfilneg}} 20541 @d skip_code=4 {identifies \.{\\hskip} and \.{\\vskip}} 20542 @d mskip_code=5 {identifies \.{\\mskip}} 20543 20544 @<Put each...@>= 20545 primitive("hskip",hskip,skip_code);@/ 20546 @!@:hskip_}{\.{\\hskip} primitive@> 20547 primitive("hfil",hskip,fil_code); 20548 @!@:hfil_}{\.{\\hfil} primitive@> 20549 primitive("hfill",hskip,fill_code);@/ 20550 @!@:hfill_}{\.{\\hfill} primitive@> 20551 primitive("hss",hskip,ss_code); 20552 @!@:hss_}{\.{\\hss} primitive@> 20553 primitive("hfilneg",hskip,fil_neg_code);@/ 20554 @!@:hfil_neg_}{\.{\\hfilneg} primitive@> 20555 primitive("vskip",vskip,skip_code);@/ 20556 @!@:vskip_}{\.{\\vskip} primitive@> 20557 primitive("vfil",vskip,fil_code); 20558 @!@:vfil_}{\.{\\vfil} primitive@> 20559 primitive("vfill",vskip,fill_code);@/ 20560 @!@:vfill_}{\.{\\vfill} primitive@> 20561 primitive("vss",vskip,ss_code); 20562 @!@:vss_}{\.{\\vss} primitive@> 20563 primitive("vfilneg",vskip,fil_neg_code);@/ 20564 @!@:vfil_neg_}{\.{\\vfilneg} primitive@> 20565 primitive("mskip",mskip,mskip_code);@/ 20566 @!@:mskip_}{\.{\\mskip} primitive@> 20567 primitive("kern",kern,explicit); 20568 @!@:kern_}{\.{\\kern} primitive@> 20569 primitive("mkern",mkern,mu_glue);@/ 20570 @!@:mkern_}{\.{\\mkern} primitive@> 20571 20572 @ @<Cases of |print_cmd_chr|...@>= 20573 hskip: case chr_code of 20574 skip_code:print_esc("hskip"); 20575 fil_code:print_esc("hfil"); 20576 fill_code:print_esc("hfill"); 20577 ss_code:print_esc("hss"); 20578 othercases print_esc("hfilneg") 20579 endcases; 20580 vskip: case chr_code of 20581 skip_code:print_esc("vskip"); 20582 fil_code:print_esc("vfil"); 20583 fill_code:print_esc("vfill"); 20584 ss_code:print_esc("vss"); 20585 othercases print_esc("vfilneg") 20586 endcases; 20587 mskip: print_esc("mskip"); 20588 kern: print_esc("kern"); 20589 mkern: print_esc("mkern"); 20590 20591 @ All the work relating to glue creation has been relegated to the 20592 following subroutine. It does not call |build_page|, because it is 20593 used in at least one place where that would be a mistake. 20594 20595 @<Declare action...@>= 20596 procedure append_glue; 20597 var s:small_number; {modifier of skip command} 20598 begin s:=cur_chr; 20599 case s of 20600 fil_code: cur_val:=fil_glue; 20601 fill_code: cur_val:=fill_glue; 20602 ss_code: cur_val:=ss_glue; 20603 fil_neg_code: cur_val:=fil_neg_glue; 20604 skip_code: scan_glue(glue_val); 20605 mskip_code: scan_glue(mu_val); 20606 end; {now |cur_val| points to the glue specification} 20607 tail_append(new_glue(cur_val)); 20608 if s>=skip_code then 20609 begin decr(glue_ref_count(cur_val)); 20610 if s>skip_code then subtype(tail):=mu_glue; 20611 end; 20612 end; 20613 20614 @ @<Declare act...@>= 20615 procedure append_kern; 20616 var s:quarterword; {|subtype| of the kern node} 20617 begin s:=cur_chr; scan_dimen(s=mu_glue,false,false); 20618 tail_append(new_kern(cur_val)); subtype(tail):=s; 20619 end; 20620 20621 @ Many of the actions related to box-making are triggered by the appearance 20622 of braces in the input. For example, when the user says `\.{\\hbox} 20623 \.{to} \.{100pt\{$\langle\,\hbox{\rm hlist}\,\rangle$\}}' in vertical mode, 20624 the information about the box size (100pt, |exactly|) is put onto |save_stack| 20625 with a level boundary word just above it, and |cur_group:=adjusted_hbox_group|; 20626 \TeX\ enters restricted horizontal mode to process the hlist. The right 20627 brace eventually causes |save_stack| to be restored to its former state, 20628 at which time the information about the box size (100pt, |exactly|) is 20629 available once again; a box is packaged and we leave restricted horizontal 20630 mode, appending the new box to the current list of the enclosing mode 20631 (in this case to the current list of vertical mode), followed by any 20632 vertical adjustments that were removed from the box by |hpack|. 20633 20634 The next few sections of the program are therefore concerned with the 20635 treatment of left and right curly braces. 20636 20637 @ If a left brace occurs in the middle of a page or paragraph, it simply 20638 introduces a new level of grouping, and the matching right brace will not have 20639 such a drastic effect. Such grouping affects neither the mode nor the 20640 current list. 20641 20642 @<Cases of |main_control| that build...@>= 20643 non_math(left_brace): new_save_level(simple_group); 20644 any_mode(begin_group): new_save_level(semi_simple_group); 20645 any_mode(end_group): if cur_group=semi_simple_group then unsave 20646 else off_save; 20647 20648 @ We have to deal with errors in which braces and such things are not 20649 properly nested. Sometimes the user makes an error of commission by 20650 inserting an extra symbol, but sometimes the user makes an error of omission. 20651 \TeX\ can't always tell one from the other, so it makes a guess and tries 20652 to avoid getting into a loop. 20653 20654 The |off_save| routine is called when the current group code is wrong. It tries 20655 to insert something into the user's input that will help clean off 20656 the top level. 20657 20658 @<Declare act...@>= 20659 procedure off_save; 20660 var p:pointer; {inserted token} 20661 begin if cur_group=bottom_level then 20662 @<Drop current token and complain that it was unmatched@> 20663 else begin back_input; p:=get_avail; link(temp_head):=p; 20664 print_err("Missing "); 20665 @<Prepare to insert a token that matches |cur_group|, 20666 and print what it is@>; 20667 print(" inserted"); ins_list(link(temp_head)); 20668 help5("I've inserted something that you may have forgotten.")@/ 20669 ("(See the <inserted text> above.)")@/ 20670 ("With luck, this will get me unwedged. But if you")@/ 20671 ("really didn't forget anything, try typing `2' now; then")@/ 20672 ("my insertion and my current dilemma will both disappear."); 20673 error; 20674 end; 20675 end; 20676 20677 @ At this point, |link(temp_head)=p|, a pointer to an empty one-word node. 20678 20679 @<Prepare to insert a token that matches |cur_group|...@>= 20680 case cur_group of 20681 semi_simple_group: begin info(p):=cs_token_flag+frozen_end_group; 20682 print_esc("endgroup"); 20683 @.Missing \\endgroup inserted@> 20684 end; 20685 math_shift_group: begin info(p):=math_shift_token+"$"; print_char("$"); 20686 @.Missing \$ inserted@> 20687 end; 20688 math_left_group: begin info(p):=cs_token_flag+frozen_right; link(p):=get_avail; 20689 p:=link(p); info(p):=other_token+"."; print_esc("right."); 20690 @.Missing \\right\hbox{.} inserted@> 20691 @^null delimiter@> 20692 end; 20693 othercases begin info(p):=right_brace_token+"}"; print_char("}"); 20694 @.Missing \} inserted@> 20695 end 20696 endcases 20697 20698 @ @<Drop current token and complain that it was unmatched@>= 20699 begin print_err("Extra "); print_cmd_chr(cur_cmd,cur_chr); 20700 @.Extra x@> 20701 help1("Things are pretty mixed up, but I think the worst is over.");@/ 20702 error; 20703 end 20704 20705 @ The routine for a |right_brace| character branches into many subcases, 20706 since a variety of things may happen, depending on |cur_group|. Some 20707 types of groups are not supposed to be ended by a right brace; error 20708 messages are given in hopes of pinpointing the problem. Most branches 20709 of this routine will be filled in later, when we are ready to understand 20710 them; meanwhile, we must prepare ourselves to deal with such errors. 20711 20712 @<Cases of |main_control| that build...@>= 20713 any_mode(right_brace): handle_right_brace; 20714 20715 @ @<Declare the procedure called |handle_right_brace|@>= 20716 procedure handle_right_brace; 20717 var p,@!q:pointer; {for short-term use} 20718 @!d:scaled; {holds |split_max_depth| in |insert_group|} 20719 @!f:integer; {holds |floating_penalty| in |insert_group|} 20720 begin case cur_group of 20721 simple_group: unsave; 20722 bottom_level: begin print_err("Too many }'s"); 20723 @.Too many \}'s@> 20724 help2("You've closed more groups than you opened.")@/ 20725 ("Such booboos are generally harmless, so keep going."); error; 20726 end; 20727 semi_simple_group,math_shift_group,math_left_group: extra_right_brace; 20728 @t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers 20729 a delayed action@>@; 20730 othercases confusion("rightbrace") 20731 @:this can't happen rightbrace}{\quad rightbrace@> 20732 endcases; 20733 end; 20734 20735 @ @<Declare act...@>= 20736 procedure extra_right_brace; 20737 begin print_err("Extra }, or forgotten "); 20738 @.Extra \}, or forgotten x@> 20739 case cur_group of 20740 semi_simple_group: print_esc("endgroup"); 20741 math_shift_group: print_char("$"); 20742 math_left_group: print_esc("right"); 20743 end;@/ 20744 help5("I've deleted a group-closing symbol because it seems to be")@/ 20745 ("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/ 20746 ("you forgot something else, as in `\hbox{$x}'. In such cases")@/ 20747 ("the way to recover is to insert both the forgotten and the")@/ 20748 ("deleted material, e.g., by typing `I$}'."); error; 20749 incr(align_state); 20750 end; 20751 20752 @ Here is where we clear the parameters that are supposed to revert to their 20753 default values after every paragraph and when internal vertical mode is entered. 20754 20755 @<Declare act...@>= 20756 procedure normal_paragraph; 20757 begin if looseness<>0 then eq_word_define(int_base+looseness_code,0); 20758 if hang_indent<>0 then eq_word_define(dimen_base+hang_indent_code,0); 20759 if hang_after<>1 then eq_word_define(int_base+hang_after_code,1); 20760 if par_shape_ptr<>null then eq_define(par_shape_loc,shape_ref,null); 20761 end; 20762 20763 @ Now let's turn to the question of how \.{\\hbox} is treated. We actually 20764 need to consider also a slightly larger context, since constructions like 20765 `\.{\\setbox3=}\penalty0\.{\\hbox...}' and 20766 `\.{\\leaders}\penalty0\.{\\hbox...}' and 20767 `\.{\\lower3.8pt\\hbox...}' 20768 are supposed to invoke quite 20769 different actions after the box has been packaged. Conversely, 20770 constructions like `\.{\\setbox3=}' can be followed by a variety of 20771 different kinds of boxes, and we would like to encode such things in an 20772 efficient way. 20773 20774 In other words, there are two problems: to represent the context of a box, 20775 and to represent its type. 20776 20777 The first problem is solved by putting a ``context code'' on the |save_stack|, 20778 just below the two entries that give the dimensions produced by |scan_spec|. 20779 The context code is either a (signed) shift amount, or it is a large 20780 integer |>=box_flag|, where |box_flag=@t$2^{30}$@>|. Codes |box_flag| through 20781 |box_flag+255| represent `\.{\\setbox0}' through `\.{\\setbox255}'; 20782 codes |box_flag+256| through |box_flag+511| represent `\.{\\global\\setbox0}' 20783 through `\.{\\global\\setbox255}'; 20784 code |box_flag+512| represents `\.{\\shipout}'; and codes |box_flag+513| 20785 through |box_flag+515| represent `\.{\\leaders}', `\.{\\cleaders}', 20786 and `\.{\\xleaders}'. 20787 20788 The second problem is solved by giving the command code |make_box| to all 20789 control sequences that produce a box, and by using the following |chr_code| 20790 values to distinguish between them: |box_code|, |copy_code|, |last_box_code|, 20791 |vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|, where 20792 the latter two are used to denote \.{\\vbox} and \.{\\hbox}, respectively. 20793 20794 @d box_flag==@'10000000000 {context code for `\.{\\setbox0}'} 20795 @d ship_out_flag==box_flag+512 {context code for `\.{\\shipout}'} 20796 @d leader_flag==box_flag+513 {context code for `\.{\\leaders}'} 20797 @d box_code=0 {|chr_code| for `\.{\\box}'} 20798 @d copy_code=1 {|chr_code| for `\.{\\copy}'} 20799 @d last_box_code=2 {|chr_code| for `\.{\\lastbox}'} 20800 @d vsplit_code=3 {|chr_code| for `\.{\\vsplit}'} 20801 @d vtop_code=4 {|chr_code| for `\.{\\vtop}'} 20802 20803 @<Put each...@>= 20804 primitive("moveleft",hmove,1); 20805 @!@:move_left_}{\.{\\moveleft} primitive@> 20806 primitive("moveright",hmove,0);@/ 20807 @!@:move_right_}{\.{\\moveright} primitive@> 20808 primitive("raise",vmove,1); 20809 @!@:raise_}{\.{\\raise} primitive@> 20810 primitive("lower",vmove,0); 20811 @!@:lower_}{\.{\\lower} primitive@> 20812 @# 20813 primitive("box",make_box,box_code); 20814 @!@:box_}{\.{\\box} primitive@> 20815 primitive("copy",make_box,copy_code); 20816 @!@:copy_}{\.{\\copy} primitive@> 20817 primitive("lastbox",make_box,last_box_code); 20818 @!@:last_box_}{\.{\\lastbox} primitive@> 20819 primitive("vsplit",make_box,vsplit_code); 20820 @!@:vsplit_}{\.{\\vsplit} primitive@> 20821 primitive("vtop",make_box,vtop_code);@/ 20822 @!@:vtop_}{\.{\\vtop} primitive@> 20823 primitive("vbox",make_box,vtop_code+vmode); 20824 @!@:vbox_}{\.{\\vbox} primitive@> 20825 primitive("hbox",make_box,vtop_code+hmode);@/ 20826 @!@:hbox_}{\.{\\hbox} primitive@> 20827 primitive("shipout",leader_ship,a_leaders-1); {|ship_out_flag=leader_flag-1|} 20828 @!@:ship_out_}{\.{\\shipout} primitive@> 20829 primitive("leaders",leader_ship,a_leaders); 20830 @!@:leaders_}{\.{\\leaders} primitive@> 20831 primitive("cleaders",leader_ship,c_leaders); 20832 @!@:c_leaders_}{\.{\\cleaders} primitive@> 20833 primitive("xleaders",leader_ship,x_leaders); 20834 @!@:x_leaders_}{\.{\\xleaders} primitive@> 20835 20836 @ @<Cases of |print_cmd_chr|...@>= 20837 hmove: if chr_code=1 then print_esc("moveleft")@+else print_esc("moveright"); 20838 vmove: if chr_code=1 then print_esc("raise")@+else print_esc("lower"); 20839 make_box: case chr_code of 20840 box_code: print_esc("box"); 20841 copy_code: print_esc("copy"); 20842 last_box_code: print_esc("lastbox"); 20843 vsplit_code: print_esc("vsplit"); 20844 vtop_code: print_esc("vtop"); 20845 vtop_code+vmode: print_esc("vbox"); 20846 othercases print_esc("hbox") 20847 endcases; 20848 leader_ship: if chr_code=a_leaders then print_esc("leaders") 20849 else if chr_code=c_leaders then print_esc("cleaders") 20850 else if chr_code=x_leaders then print_esc("xleaders") 20851 else print_esc("shipout"); 20852 20853 @ Constructions that require a box are started by calling |scan_box| with 20854 a specified context code. The |scan_box| routine verifies 20855 that a |make_box| command comes next and then it calls |begin_box|. 20856 20857 @<Cases of |main_control| that build...@>= 20858 vmode+hmove,hmode+vmove,mmode+vmove: begin t:=cur_chr; 20859 scan_normal_dimen; 20860 if t=0 then scan_box(cur_val)@+else scan_box(-cur_val); 20861 end; 20862 any_mode(leader_ship): scan_box(leader_flag-a_leaders+cur_chr); 20863 any_mode(make_box): begin_box(0); 20864 20865 @ The global variable |cur_box| will point to a newly made box. If the box 20866 is void, we will have |cur_box=null|. Otherwise we will have 20867 |type(cur_box)=hlist_node| or |vlist_node| or |rule_node|; the |rule_node| 20868 case can occur only with leaders. 20869 20870 @<Glob...@>= 20871 @!cur_box:pointer; {box to be placed into its context} 20872 20873 @ The |box_end| procedure does the right thing with |cur_box|, if 20874 |box_context| represents the context as explained above. 20875 20876 @<Declare act...@>= 20877 procedure box_end(@!box_context:integer); 20878 var p:pointer; {|ord_noad| for new box in math mode} 20879 begin if box_context<box_flag then @<Append box |cur_box| to the current list, 20880 shifted by |box_context|@> 20881 else if box_context<ship_out_flag then @<Store \(c)|cur_box| in a box register@> 20882 else if cur_box<>null then 20883 if box_context>ship_out_flag then @<Append a new leader node that 20884 uses |cur_box|@> 20885 else ship_out(cur_box); 20886 end; 20887 20888 @ The global variable |adjust_tail| will be non-null if and only if the 20889 current box might include adjustments that should be appended to the 20890 current vertical list. 20891 20892 @<Append box |cur_box| to the current...@>= 20893 begin if cur_box<>null then 20894 begin shift_amount(cur_box):=box_context; 20895 if abs(mode)=vmode then 20896 begin append_to_vlist(cur_box); 20897 if adjust_tail<>null then 20898 begin if adjust_head<>adjust_tail then 20899 begin link(tail):=link(adjust_head); tail:=adjust_tail; 20900 end; 20901 adjust_tail:=null; 20902 end; 20903 if mode>0 then build_page; 20904 end 20905 else begin if abs(mode)=hmode then space_factor:=1000 20906 else begin p:=new_noad; 20907 math_type(nucleus(p)):=sub_box; 20908 info(nucleus(p)):=cur_box; cur_box:=p; 20909 end; 20910 link(tail):=cur_box; tail:=cur_box; 20911 end; 20912 end; 20913 end 20914 20915 @ @<Store \(c)|cur_box| in a box register@>= 20916 if box_context<box_flag+256 then 20917 eq_define(box_base-box_flag+box_context,box_ref,cur_box) 20918 else geq_define(box_base-box_flag-256+box_context,box_ref,cur_box) 20919 20920 @ @<Append a new leader node ...@>= 20921 begin @<Get the next non-blank non-relax...@>; 20922 if ((cur_cmd=hskip)and(abs(mode)<>vmode))or@| 20923 ((cur_cmd=vskip)and(abs(mode)=vmode)) then 20924 begin append_glue; subtype(tail):=box_context-(leader_flag-a_leaders); 20925 leader_ptr(tail):=cur_box; 20926 end 20927 else begin print_err("Leaders not followed by proper glue"); 20928 @.Leaders not followed by...@> 20929 help3("You should say `\leaders <box or rule><hskip or vskip>'.")@/ 20930 ("I found the <box or rule>, but there's no suitable")@/ 20931 ("<hskip or vskip>, so I'm ignoring these leaders."); back_error; 20932 flush_node_list(cur_box); 20933 end; 20934 end 20935 20936 @ Now that we can see what eventually happens to boxes, we can consider 20937 the first steps in their creation. The |begin_box| routine is called when 20938 |box_context| is a context specification, |cur_chr| specifies the type of 20939 box desired, and |cur_cmd=make_box|. 20940 20941 @<Declare act...@>= 20942 procedure begin_box(@!box_context:integer); 20943 label exit, done; 20944 var @!p,@!q:pointer; {run through the current list} 20945 @!m:quarterword; {the length of a replacement list} 20946 @!k:halfword; {0 or |vmode| or |hmode|} 20947 @!n:eight_bits; {a box number} 20948 begin case cur_chr of 20949 box_code: begin scan_eight_bit_int; cur_box:=box(cur_val); 20950 box(cur_val):=null; {the box becomes void, at the same level} 20951 end; 20952 copy_code: begin scan_eight_bit_int; cur_box:=copy_node_list(box(cur_val)); 20953 end; 20954 last_box_code: @<If the current list ends with a box node, delete it from 20955 the list and make |cur_box| point to it; otherwise set |cur_box:=null|@>; 20956 vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>; 20957 othercases @<Initiate the construction of an hbox or vbox, then |return|@> 20958 endcases;@/ 20959 box_end(box_context); {in simple cases, we use the box immediately} 20960 exit:end; 20961 20962 @ Note that the condition |not is_char_node(tail)| implies that |head<>tail|, 20963 since |head| is a one-word node. 20964 20965 @<If the current list ends with a box node, delete it...@>= 20966 begin cur_box:=null; 20967 if abs(mode)=mmode then 20968 begin you_cant; help1("Sorry; this \lastbox will be void."); error; 20969 end 20970 else if (mode=vmode)and(head=tail) then 20971 begin you_cant; 20972 help2("Sorry...I usually can't take things from the current page.")@/ 20973 ("This \lastbox will therefore be void."); error; 20974 end 20975 else begin if not is_char_node(tail) then 20976 if (type(tail)=hlist_node)or(type(tail)=vlist_node) then 20977 @<Remove the last box, unless it's part of a discretionary@>; 20978 end; 20979 end 20980 20981 @ @<Remove the last box...@>= 20982 begin q:=head; 20983 repeat p:=q; 20984 if not is_char_node(q) then if type(q)=disc_node then 20985 begin for m:=1 to replace_count(q) do p:=link(p); 20986 if p=tail then goto done; 20987 end; 20988 q:=link(p); 20989 until q=tail; 20990 cur_box:=tail; shift_amount(cur_box):=0; 20991 tail:=p; link(p):=null; 20992 done:end 20993 20994 @ Here we deal with things like `\.{\\vsplit 13 to 100pt}'. 20995 20996 @<Split off part of a vertical box, make |cur_box| point to it@>= 20997 begin scan_eight_bit_int; n:=cur_val; 20998 if not scan_keyword("to") then 20999 @.to@> 21000 begin print_err("Missing `to' inserted"); 21001 @.Missing `to' inserted@> 21002 help2("I'm working on `\vsplit<box number> to <dimen>';")@/ 21003 ("will look for the <dimen> next."); error; 21004 end; 21005 scan_normal_dimen; 21006 cur_box:=vsplit(n,cur_val); 21007 end 21008 21009 @ Here is where we enter restricted horizontal mode or internal vertical 21010 mode, in order to make a box. 21011 21012 @<Initiate the construction of an hbox or vbox, then |return|@>= 21013 begin k:=cur_chr-vtop_code; saved(0):=box_context; 21014 if k=hmode then 21015 if (box_context<box_flag)and(abs(mode)=vmode) then 21016 scan_spec(adjusted_hbox_group,true) 21017 else scan_spec(hbox_group,true) 21018 else begin if k=vmode then scan_spec(vbox_group,true) 21019 else begin scan_spec(vtop_group,true); k:=vmode; 21020 end; 21021 normal_paragraph; 21022 end; 21023 push_nest; mode:=-k; 21024 if k=vmode then 21025 begin prev_depth:=ignore_depth; 21026 if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text); 21027 end 21028 else begin space_factor:=1000; 21029 if every_hbox<>null then begin_token_list(every_hbox,every_hbox_text); 21030 end; 21031 return; 21032 end 21033 21034 @ @<Declare act...@>= 21035 procedure scan_box(@!box_context:integer); 21036 {the next input should specify a box or perhaps a rule} 21037 begin @<Get the next non-blank non-relax...@>; 21038 if cur_cmd=make_box then begin_box(box_context) 21039 else if (box_context>=leader_flag)and((cur_cmd=hrule)or(cur_cmd=vrule)) then 21040 begin cur_box:=scan_rule_spec; box_end(box_context); 21041 end 21042 else begin@t@>@;@/ 21043 print_err("A <box> was supposed to be here");@/ 21044 @.A <box> was supposed to...@> 21045 help3("I was expecting to see \hbox or \vbox or \copy or \box or")@/ 21046 ("something like that. So you might find something missing in")@/ 21047 ("your output. But keep trying; you can fix this later."); back_error; 21048 end; 21049 end; 21050 21051 @ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or 21052 \.{\\vtop} construction, the |package| routine comes into action. We might 21053 also have to finish a paragraph that hasn't ended. 21054 21055 @<Cases of |handle...@>= 21056 hbox_group: package(0); 21057 adjusted_hbox_group: begin adjust_tail:=adjust_head; package(0); 21058 end; 21059 vbox_group: begin end_graf; package(0); 21060 end; 21061 vtop_group: begin end_graf; package(vtop_code); 21062 end; 21063 21064 @ @<Declare action...@>= 21065 procedure package(@!c:small_number); 21066 var h:scaled; {height of box} 21067 @!p:pointer; {first node in a box} 21068 @!d:scaled; {max depth} 21069 begin d:=box_max_depth; unsave; save_ptr:=save_ptr-3; 21070 if mode=-hmode then cur_box:=hpack(link(head),saved(2),saved(1)) 21071 else begin cur_box:=vpackage(link(head),saved(2),saved(1),d); 21072 if c=vtop_code then @<Readjust the height and depth of |cur_box|, 21073 for \.{\\vtop}@>; 21074 end; 21075 pop_nest; box_end(saved(0)); 21076 end; 21077 21078 @ The height of a `\.{\\vtop}' box is inherited from the first item on its list, 21079 if that item is an |hlist_node|, |vlist_node|, or |rule_node|; otherwise 21080 the \.{\\vtop} height is zero. 21081 21082 21083 @<Readjust the height...@>= 21084 begin h:=0; p:=list_ptr(cur_box); 21085 if p<>null then if type(p)<=rule_node then h:=height(p); 21086 depth(cur_box):=depth(cur_box)-h+height(cur_box); height(cur_box):=h; 21087 end 21088 21089 @ A paragraph begins when horizontal-mode material occurs in vertical mode, 21090 or when the paragraph is explicitly started by `\.{\\indent}' or 21091 `\.{\\noindent}'. 21092 21093 @<Put each...@>= 21094 primitive("indent",start_par,1); 21095 @!@:indent_}{\.{\\indent} primitive@> 21096 primitive("noindent",start_par,0); 21097 @!@:no_indent_}{\.{\\noindent} primitive@> 21098 21099 @ @<Cases of |print_cmd_chr|...@>= 21100 start_par: if chr_code=0 then print_esc("noindent")@+ else print_esc("indent"); 21101 21102 @ @<Cases of |main_control| that build...@>= 21103 vmode+start_par: new_graf(cur_chr>0); 21104 vmode+letter,vmode+other_char,vmode+char_num,vmode+char_given, 21105 vmode+math_shift,vmode+un_hbox,vmode+vrule, 21106 vmode+accent,vmode+discretionary,vmode+hskip,vmode+valign, 21107 vmode+ex_space,vmode+no_boundary:@t@>@;@/ 21108 begin back_input; new_graf(true); 21109 end; 21110 21111 @ @<Declare act...@>= 21112 function norm_min(@!h:integer):small_number; 21113 begin if h<=0 then norm_min:=1@+else if h>=63 then norm_min:=63@+ 21114 else norm_min:=h; 21115 end; 21116 @# 21117 procedure new_graf(@!indented:boolean); 21118 begin prev_graf:=0; 21119 if (mode=vmode)or(head<>tail) then 21120 tail_append(new_param_glue(par_skip_code)); 21121 push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang; 21122 prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min)) 21123 *@'200000+cur_lang; 21124 if indented then 21125 begin tail:=new_null_box; link(head):=tail; width(tail):=par_indent;@+ 21126 end; 21127 if every_par<>null then begin_token_list(every_par,every_par_text); 21128 if nest_ptr=1 then build_page; {put |par_skip| glue on current page} 21129 end; 21130 21131 @ @<Cases of |main_control| that build...@>= 21132 hmode+start_par,mmode+start_par: indent_in_hmode; 21133 21134 @ @<Declare act...@>= 21135 procedure indent_in_hmode; 21136 var p,@!q:pointer; 21137 begin if cur_chr>0 then {\.{\\indent}} 21138 begin p:=new_null_box; width(p):=par_indent; 21139 if abs(mode)=hmode then space_factor:=1000 21140 else begin q:=new_noad; math_type(nucleus(q)):=sub_box; 21141 info(nucleus(q)):=p; p:=q; 21142 end; 21143 tail_append(p); 21144 end; 21145 end; 21146 21147 @ A paragraph ends when a |par_end| command is sensed, or when we are in 21148 horizontal mode when reaching the right brace of vertical-mode routines 21149 like \.{\\vbox}, \.{\\insert}, or \.{\\output}. 21150 21151 @<Cases of |main_control| that build...@>= 21152 vmode+par_end: begin normal_paragraph; 21153 if mode>0 then build_page; 21154 end; 21155 hmode+par_end: begin if align_state<0 then off_save; {this tries to 21156 recover from an alignment that didn't end properly} 21157 end_graf; {this takes us to the enclosing mode, if |mode>0|} 21158 if mode=vmode then build_page; 21159 end; 21160 hmode+stop,hmode+vskip,hmode+hrule,hmode+un_vbox,hmode+halign: head_for_vmode; 21161 21162 @ @<Declare act...@>= 21163 procedure head_for_vmode; 21164 begin if mode<0 then 21165 if cur_cmd<>hrule then off_save 21166 else begin print_err("You can't use `"); 21167 print_esc("hrule"); print("' here except with leaders"); 21168 @.You can't use \\hrule...@> 21169 help2("To put a horizontal rule in an hbox or an alignment,")@/ 21170 ("you should use \leaders or \hrulefill (see The TeXbook)."); 21171 error; 21172 end 21173 else begin back_input; cur_tok:=par_token; back_input; token_type:=inserted; 21174 end; 21175 end; 21176 21177 @ @<Declare act...@>= 21178 procedure end_graf; 21179 begin if mode=hmode then 21180 begin if head=tail then pop_nest {null paragraphs are ignored} 21181 else line_break(widow_penalty); 21182 normal_paragraph; 21183 error_count:=0; 21184 end; 21185 end; 21186 21187 @ Insertion and adjustment and mark nodes are constructed by the following 21188 pieces of the program. 21189 21190 @<Cases of |main_control| that build...@>= 21191 any_mode(insert),hmode+vadjust,mmode+vadjust: begin_insert_or_adjust; 21192 any_mode(mark): make_mark; 21193 21194 @ @<Forbidden...@>= 21195 vmode+vadjust, 21196 21197 @ @<Declare act...@>= 21198 procedure begin_insert_or_adjust; 21199 begin if cur_cmd=vadjust then cur_val:=255 21200 else begin scan_eight_bit_int; 21201 if cur_val=255 then 21202 begin print_err("You can't "); print_esc("insert"); print_int(255); 21203 @.You can't \\insert255@> 21204 help1("I'm changing to \insert0; box 255 is special."); 21205 error; cur_val:=0; 21206 end; 21207 end; 21208 saved(0):=cur_val; incr(save_ptr); 21209 new_save_level(insert_group); scan_left_brace; normal_paragraph; 21210 push_nest; mode:=-vmode; prev_depth:=ignore_depth; 21211 end; 21212 21213 @ @<Cases of |handle...@>= 21214 insert_group: begin end_graf; q:=split_top_skip; add_glue_ref(q); 21215 d:=split_max_depth; f:=floating_penalty; unsave; decr(save_ptr); 21216 {now |saved(0)| is the insertion number, or 255 for |vadjust|} 21217 p:=vpack(link(head),natural); pop_nest; 21218 if saved(0)<255 then 21219 begin tail_append(get_node(ins_node_size)); 21220 type(tail):=ins_node; subtype(tail):=qi(saved(0)); 21221 height(tail):=height(p)+depth(p); ins_ptr(tail):=list_ptr(p); 21222 split_top_ptr(tail):=q; depth(tail):=d; float_cost(tail):=f; 21223 end 21224 else begin tail_append(get_node(small_node_size)); 21225 type(tail):=adjust_node;@/ 21226 subtype(tail):=0; {the |subtype| is not used} 21227 adjust_ptr(tail):=list_ptr(p); delete_glue_ref(q); 21228 end; 21229 free_node(p,box_node_size); 21230 if nest_ptr=0 then build_page; 21231 end; 21232 output_group: @<Resume the page builder...@>; 21233 21234 @ @<Declare act...@>= 21235 procedure make_mark; 21236 var p:pointer; {new node} 21237 begin p:=scan_toks(false,true); p:=get_node(small_node_size); 21238 type(p):=mark_node; subtype(p):=0; {the |subtype| is not used} 21239 mark_ptr(p):=def_ref; link(tail):=p; tail:=p; 21240 end; 21241 21242 @ Penalty nodes get into a list via the |break_penalty| command. 21243 @^penalties@> 21244 21245 @<Cases of |main_control| that build...@>= 21246 any_mode(break_penalty): append_penalty; 21247 21248 @ @<Declare action...@>= 21249 procedure append_penalty; 21250 begin scan_int; tail_append(new_penalty(cur_val)); 21251 if mode=vmode then build_page; 21252 end; 21253 21254 @ The |remove_item| command removes a penalty, kern, or glue node if it 21255 appears at the tail of the current list, using a brute-force linear scan. 21256 Like \.{\\lastbox}, this command is not allowed in vertical mode (except 21257 internal vertical mode), since the current list in vertical mode is sent 21258 to the page builder. But if we happen to be able to implement it in 21259 vertical mode, we do. 21260 21261 @<Cases of |main_control| that build...@>= 21262 any_mode(remove_item): delete_last; 21263 21264 @ When |delete_last| is called, |cur_chr| is the |type| of node that 21265 will be deleted, if present. 21266 21267 @<Declare action...@>= 21268 procedure delete_last; 21269 label exit; 21270 var @!p,@!q:pointer; {run through the current list} 21271 @!m:quarterword; {the length of a replacement list} 21272 begin if (mode=vmode)and(tail=head) then 21273 @<Apologize for inability to do the operation now, 21274 unless \.{\\unskip} follows non-glue@> 21275 else begin if not is_char_node(tail) then if type(tail)=cur_chr then 21276 begin q:=head; 21277 repeat p:=q; 21278 if not is_char_node(q) then if type(q)=disc_node then 21279 begin for m:=1 to replace_count(q) do p:=link(p); 21280 if p=tail then return; 21281 end; 21282 q:=link(p); 21283 until q=tail; 21284 link(p):=null; flush_node_list(tail); tail:=p; 21285 end; 21286 end; 21287 exit:end; 21288 21289 @ @<Apologize for inability to do the operation...@>= 21290 begin if (cur_chr<>glue_node)or(last_glue<>max_halfword) then 21291 begin you_cant; 21292 help2("Sorry...I usually can't take things from the current page.")@/ 21293 ("Try `I\vskip-\lastskip' instead."); 21294 if cur_chr=kern_node then help_line[0]:= 21295 ("Try `I\kern-\lastkern' instead.") 21296 else if cur_chr<>glue_node then help_line[0]:=@| 21297 ("Perhaps you can make the output routine do it."); 21298 error; 21299 end; 21300 end 21301 21302 @ @<Put each...@>= 21303 primitive("unpenalty",remove_item,penalty_node);@/ 21304 @!@:un_penalty_}{\.{\\unpenalty} primitive@> 21305 primitive("unkern",remove_item,kern_node);@/ 21306 @!@:un_kern_}{\.{\\unkern} primitive@> 21307 primitive("unskip",remove_item,glue_node);@/ 21308 @!@:un_skip_}{\.{\\unskip} primitive@> 21309 primitive("unhbox",un_hbox,box_code);@/ 21310 @!@:un_hbox_}{\.{\\unhbox} primitive@> 21311 primitive("unhcopy",un_hbox,copy_code);@/ 21312 @!@:un_hcopy_}{\.{\\unhcopy} primitive@> 21313 primitive("unvbox",un_vbox,box_code);@/ 21314 @!@:un_vbox_}{\.{\\unvbox} primitive@> 21315 primitive("unvcopy",un_vbox,copy_code);@/ 21316 @!@:un_vcopy_}{\.{\\unvcopy} primitive@> 21317 21318 @ @<Cases of |print_cmd_chr|...@>= 21319 remove_item: if chr_code=glue_node then print_esc("unskip") 21320 else if chr_code=kern_node then print_esc("unkern") 21321 else print_esc("unpenalty"); 21322 un_hbox: if chr_code=copy_code then print_esc("unhcopy") 21323 else print_esc("unhbox"); 21324 un_vbox: if chr_code=copy_code then print_esc("unvcopy") 21325 else print_esc("unvbox"); 21326 21327 @ The |un_hbox| and |un_vbox| commands unwrap one of the 256 current boxes. 21328 21329 @<Cases of |main_control| that build...@>= 21330 vmode+un_vbox,hmode+un_hbox,mmode+un_hbox: unpackage; 21331 21332 @ @<Declare act...@>= 21333 procedure unpackage; 21334 label exit; 21335 var p:pointer; {the box} 21336 @!c:box_code..copy_code; {should we copy?} 21337 begin c:=cur_chr; scan_eight_bit_int; p:=box(cur_val); 21338 if p=null then return; 21339 if (abs(mode)=mmode)or((abs(mode)=vmode)and(type(p)<>vlist_node))or@| 21340 ((abs(mode)=hmode)and(type(p)<>hlist_node)) then 21341 begin print_err("Incompatible list can't be unboxed"); 21342 @.Incompatible list...@> 21343 help3("Sorry, Pandora. (You sneaky devil.)")@/ 21344 ("I refuse to unbox an \hbox in vertical mode or vice versa.")@/ 21345 ("And I can't open any boxes in math mode.");@/ 21346 error; return; 21347 end; 21348 if c=copy_code then link(tail):=copy_node_list(list_ptr(p)) 21349 else begin link(tail):=list_ptr(p); box(cur_val):=null; 21350 free_node(p,box_node_size); 21351 end; 21352 while link(tail)<>null do tail:=link(tail); 21353 exit:end; 21354 21355 @ @<Forbidden...@>=vmode+ital_corr, 21356 21357 @ Italic corrections are converted to kern nodes when the |ital_corr| command 21358 follows a character. In math mode the same effect is achieved by appending 21359 a kern of zero here, since italic corrections are supplied later. 21360 21361 @<Cases of |main_control| that build...@>= 21362 hmode+ital_corr: append_italic_correction; 21363 mmode+ital_corr: tail_append(new_kern(0)); 21364 21365 @ @<Declare act...@>= 21366 procedure append_italic_correction; 21367 label exit; 21368 var p:pointer; {|char_node| at the tail of the current list} 21369 @!f:internal_font_number; {the font in the |char_node|} 21370 begin if tail<>head then 21371 begin if is_char_node(tail) then p:=tail 21372 else if type(tail)=ligature_node then p:=lig_char(tail) 21373 else return; 21374 f:=font(p); 21375 tail_append(new_kern(char_italic(f)(char_info(f)(character(p))))); 21376 subtype(tail):=explicit; 21377 end; 21378 exit:end; 21379 21380 @ Discretionary nodes are easy in the common case `\.{\\-}', but in the 21381 general case we must process three braces full of items. 21382 21383 @<Put each...@>= 21384 primitive("-",discretionary,1); 21385 @!@:Single-character primitives -}{\quad\.{\\-}@> 21386 primitive("discretionary",discretionary,0); 21387 @!@:discretionary_}{\.{\\discretionary} primitive@> 21388 21389 @ @<Cases of |print_cmd_chr|...@>= 21390 discretionary: if chr_code=1 then 21391 print_esc("-")@+else print_esc("discretionary"); 21392 21393 @ @<Cases of |main_control| that build...@>= 21394 hmode+discretionary,mmode+discretionary: append_discretionary; 21395 21396 @ The space factor does not change when we append a discretionary node, 21397 but it starts out as 1000 in the subsidiary lists. 21398 21399 @<Declare act...@>= 21400 procedure append_discretionary; 21401 var c:integer; {hyphen character} 21402 begin tail_append(new_disc); 21403 if cur_chr=1 then 21404 begin c:=hyphen_char[cur_font]; 21405 if c>=0 then if c<256 then pre_break(tail):=new_character(cur_font,c); 21406 end 21407 else begin incr(save_ptr); saved(-1):=0; new_save_level(disc_group); 21408 scan_left_brace; push_nest; mode:=-hmode; space_factor:=1000; 21409 end; 21410 end; 21411 21412 @ The three discretionary lists are constructed somewhat as if they were 21413 hboxes. A~subroutine called |build_discretionary| handles the transitions. 21414 (This is sort of fun.) 21415 21416 @<Cases of |handle...@>= 21417 disc_group: build_discretionary; 21418 21419 @ @<Declare act...@>= 21420 procedure build_discretionary; 21421 label done,exit; 21422 var p,@!q:pointer; {for link manipulation} 21423 @!n:integer; {length of discretionary list} 21424 begin unsave; 21425 @<Prune the current list, if necessary, until it contains only 21426 |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|, 21427 and |ligature_node| items; set |n| to the length of the list, 21428 and set |q| to the list's tail@>; 21429 p:=link(head); pop_nest; 21430 case saved(-1) of 21431 0:pre_break(tail):=p; 21432 1:post_break(tail):=p; 21433 2:@<Attach list |p| to the current list, and record its length; 21434 then finish up and |return|@>; 21435 end; {there are no other cases} 21436 incr(saved(-1)); new_save_level(disc_group); scan_left_brace; 21437 push_nest; mode:=-hmode; space_factor:=1000; 21438 exit:end; 21439 21440 @ @<Attach list |p| to the current...@>= 21441 begin if (n>0)and(abs(mode)=mmode) then 21442 begin print_err("Illegal math "); print_esc("discretionary"); 21443 @.Illegal math \\disc...@> 21444 help2("Sorry: The third part of a discretionary break must be")@/ 21445 ("empty, in math formulas. I had to delete your third part."); 21446 flush_node_list(p); n:=0; error; 21447 end 21448 else link(tail):=p; 21449 if n<=max_quarterword then replace_count(tail):=n 21450 else begin print_err("Discretionary list is too long"); 21451 @.Discretionary list is too long@> 21452 help2("Wow---I never thought anybody would tweak me here.")@/ 21453 ("You can't seriously need such a huge discretionary list?"); 21454 error; 21455 end; 21456 if n>0 then tail:=q; 21457 decr(save_ptr); return; 21458 end 21459 21460 @ During this loop, |p=link(q)| and there are |n| items preceding |p|. 21461 21462 @<Prune the current list, if necessary...@>= 21463 q:=head; p:=link(q); n:=0; 21464 while p<>null do 21465 begin if not is_char_node(p) then if type(p)>rule_node then 21466 if type(p)<>kern_node then if type(p)<>ligature_node then 21467 begin print_err("Improper discretionary list"); 21468 @.Improper discretionary list@> 21469 help1("Discretionary lists must contain only boxes and kerns.");@/ 21470 error; 21471 begin_diagnostic; 21472 print_nl("The following discretionary sublist has been deleted:"); 21473 @.The following...deleted@> 21474 show_box(p); 21475 end_diagnostic(true); 21476 flush_node_list(p); link(q):=null; goto done; 21477 end; 21478 q:=p; p:=link(q); incr(n); 21479 end; 21480 done: 21481 21482 @ We need only one more thing to complete the horizontal mode routines, namely 21483 the \.{\\accent} primitive. 21484 21485 @<Cases of |main_control| that build...@>= 21486 hmode+accent: make_accent; 21487 21488 @ The positioning of accents is straightforward but tedious. Given an accent 21489 of width |a|, designed for characters of height |x| and slant |s|; 21490 and given a character of width |w|, height |h|, and slant |t|: We will shift 21491 the accent down by |x-h|, and we will insert kern nodes that have the effect of 21492 centering the accent over the character and shifting the accent to the 21493 right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$. If either character is 21494 absent from the font, we will simply use the other, without shifting. 21495 21496 @<Declare act...@>= 21497 procedure make_accent; 21498 var s,@!t: real; {amount of slant} 21499 @!p,@!q,@!r:pointer; {character, box, and kern nodes} 21500 @!f:internal_font_number; {relevant font} 21501 @!a,@!h,@!x,@!w,@!delta:scaled; {heights and widths, as explained above} 21502 @!i:four_quarters; {character information} 21503 begin scan_char_num; f:=cur_font; p:=new_character(f,cur_val); 21504 if p<>null then 21505 begin x:=x_height(f); s:=slant(f)/float_constant(65536); 21506 @^real division@> 21507 a:=char_width(f)(char_info(f)(character(p)));@/ 21508 do_assignments;@/ 21509 @<Create a character node |q| for the next character, 21510 but set |q:=null| if problems arise@>; 21511 if q<>null then @<Append the accent with appropriate kerns, 21512 then set |p:=q|@>; 21513 link(tail):=p; tail:=p; space_factor:=1000; 21514 end; 21515 end; 21516 21517 @ @<Create a character node |q| for the next...@>= 21518 q:=null; f:=cur_font; 21519 if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given) then 21520 q:=new_character(f,cur_chr) 21521 else if cur_cmd=char_num then 21522 begin scan_char_num; q:=new_character(f,cur_val); 21523 end 21524 else back_input 21525 21526 @ The kern nodes appended here must be distinguished from other kerns, lest 21527 they be wiped away by the hyphenation algorithm or by a previous line break. 21528 21529 The two kerns are computed with (machine-dependent) |real| arithmetic, but 21530 their sum is machine-independent; the net effect is machine-independent, 21531 because the user cannot remove these nodes nor access them via \.{\\lastkern}. 21532 21533 @<Append the accent with appropriate kerns...@>= 21534 begin t:=slant(f)/float_constant(65536); 21535 @^real division@> 21536 i:=char_info(f)(character(q)); 21537 w:=char_width(f)(i); h:=char_height(f)(height_depth(i)); 21538 if h<>x then {the accent must be shifted up or down} 21539 begin p:=hpack(p,natural); shift_amount(p):=x-h; 21540 end; 21541 delta:=round((w-a)/float_constant(2)+h*t-x*s); 21542 @^real multiplication@> 21543 @^real addition@> 21544 r:=new_kern(delta); subtype(r):=acc_kern; link(tail):=r; link(r):=p; 21545 tail:=new_kern(-a-delta); subtype(tail):=acc_kern; link(p):=tail; p:=q; 21546 end 21547 21548 @ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner 21549 into |main_control|, it might be that the user has foolishly inserted 21550 one of them into something that has nothing to do with alignment. But it is 21551 far more likely that a left brace or right brace has been omitted, since 21552 |get_next| takes actions appropriate to alignment only when `\.{\\cr}' 21553 or `\.{\\span}' or tab marks occur with |align_state=0|. The following 21554 program attempts to make an appropriate recovery. 21555 21556 @<Cases of |main_control| that build...@>= 21557 any_mode(car_ret), any_mode(tab_mark): align_error; 21558 any_mode(no_align): no_align_error; 21559 any_mode(omit): omit_error; 21560 21561 @ @<Declare act...@>= 21562 procedure align_error; 21563 begin if abs(align_state)>2 then 21564 @<Express consternation over the fact that no alignment is in progress@> 21565 else begin back_input; 21566 if align_state<0 then 21567 begin print_err("Missing { inserted"); 21568 @.Missing \{ inserted@> 21569 incr(align_state); cur_tok:=left_brace_token+"{"; 21570 end 21571 else begin print_err("Missing } inserted"); 21572 @.Missing \} inserted@> 21573 decr(align_state); cur_tok:=right_brace_token+"}"; 21574 end; 21575 help3("I've put in what seems to be necessary to fix")@/ 21576 ("the current column of the current alignment.")@/ 21577 ("Try to go on, since this might almost work."); ins_error; 21578 end; 21579 end; 21580 21581 @ @<Express consternation...@>= 21582 begin print_err("Misplaced "); print_cmd_chr(cur_cmd,cur_chr); 21583 @.Misplaced \&@> 21584 @.Misplaced \\span@> 21585 @.Misplaced \\cr@> 21586 if cur_tok=tab_token+"&" then 21587 begin help6("I can't figure out why you would want to use a tab mark")@/ 21588 ("here. If you just want an ampersand, the remedy is")@/ 21589 ("simple: Just type `I\&' now. But if some right brace")@/ 21590 ("up above has ended a previous alignment prematurely,")@/ 21591 ("you're probably due for more error messages, and you")@/ 21592 ("might try typing `S' now just to see what is salvageable."); 21593 end 21594 else begin help5("I can't figure out why you would want to use a tab mark")@/ 21595 ("or \cr or \span just now. If something like a right brace")@/ 21596 ("up above has ended a previous alignment prematurely,")@/ 21597 ("you're probably due for more error messages, and you")@/ 21598 ("might try typing `S' now just to see what is salvageable."); 21599 end; 21600 error; 21601 end 21602 21603 @ The help messages here contain a little white lie, since \.{\\noalign} 21604 and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'. 21605 21606 @<Declare act...@>= 21607 procedure no_align_error; 21608 begin print_err("Misplaced "); print_esc("noalign"); 21609 @.Misplaced \\noalign@> 21610 help2("I expect to see \noalign only after the \cr of")@/ 21611 ("an alignment. Proceed, and I'll ignore this case."); error; 21612 end; 21613 procedure omit_error; 21614 begin print_err("Misplaced "); print_esc("omit"); 21615 @.Misplaced \\omit@> 21616 help2("I expect to see \omit only after tab marks or the \cr of")@/ 21617 ("an alignment. Proceed, and I'll ignore this case."); error; 21618 end; 21619 21620 @ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}. 21621 Let's take a look at what happens when they are used correctly. 21622 21623 @<Cases of |main_control| that build...@>= 21624 vmode+halign,hmode+valign:init_align; 21625 mmode+halign: if privileged then 21626 if cur_group=math_shift_group then init_align 21627 else off_save; 21628 vmode+endv,hmode+endv: do_endv; 21629 21630 @ An |align_group| code is supposed to remain on the |save_stack| 21631 during an entire alignment, until |fin_align| removes it. 21632 21633 A devious user might force an |endv| command to occur just about anywhere; 21634 we must defeat such hacks. 21635 21636 @<Declare act...@>= 21637 procedure do_endv; 21638 begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input; 21639 while (input_stack[base_ptr].index_field<>v_template) and 21640 (input_stack[base_ptr].loc_field=null) and 21641 (input_stack[base_ptr].state_field=token_list) do decr(base_ptr); 21642 if (input_stack[base_ptr].index_field<>v_template) or 21643 (input_stack[base_ptr].loc_field<>null) or 21644 (input_stack[base_ptr].state_field<>token_list) then 21645 fatal_error("(interwoven alignment preambles are not allowed)"); 21646 @.interwoven alignment preambles...@> 21647 if cur_group=align_group then 21648 begin end_graf; 21649 if fin_col then fin_row; 21650 end 21651 else off_save; 21652 end; 21653 21654 @ @<Cases of |handle_right_brace|...@>= 21655 align_group: begin back_input; cur_tok:=cs_token_flag+frozen_cr; 21656 print_err("Missing "); print_esc("cr"); print(" inserted"); 21657 @.Missing \\cr inserted@> 21658 help1("I'm guessing that you meant to end an alignment here."); 21659 ins_error; 21660 end; 21661 21662 @ @<Cases of |handle_right_brace|...@>= 21663 no_align_group: begin end_graf; unsave; align_peek; 21664 end; 21665 21666 @ Finally, \.{\\endcsname} is not supposed to get through to |main_control|. 21667 21668 @<Cases of |main_control| that build...@>= 21669 any_mode(end_cs_name): cs_error; 21670 21671 @ @<Declare act...@>= 21672 procedure cs_error; 21673 begin print_err("Extra "); print_esc("endcsname"); 21674 @.Extra \\endcsname@> 21675 help1("I'm ignoring this, since I wasn't doing a \csname."); 21676 error; 21677 end; 21678 21679 @* \[48] Building math lists. 21680 The routines that \TeX\ uses to create mlists are similar to those we have 21681 just seen for the generation of hlists and vlists. But it is necessary to 21682 make ``noads'' as well as nodes, so the reader should review the 21683 discussion of math mode data structures before trying to make sense out of 21684 the following program. 21685 21686 Here is a little routine that needs to be done whenever a subformula 21687 is about to be processed. The parameter is a code like |math_group|. 21688 21689 @<Declare act...@>= 21690 procedure push_math(@!c:group_code); 21691 begin push_nest; mode:=-mmode; incompleat_noad:=null; new_save_level(c); 21692 end; 21693 21694 @ We get into math mode from horizontal mode when a `\.\$' (i.e., a 21695 |math_shift| character) is scanned. We must check to see whether this 21696 `\.\$' is immediately followed by another, in case display math mode is 21697 called for. 21698 21699 @<Cases of |main_control| that build...@>= 21700 hmode+math_shift:init_math; 21701 21702 @ @<Declare act...@>= 21703 procedure init_math; 21704 label reswitch,found,not_found,done; 21705 var w:scaled; {new or partial |pre_display_size|} 21706 @!l:scaled; {new |display_width|} 21707 @!s:scaled; {new |display_indent|} 21708 @!p:pointer; {current node when calculating |pre_display_size|} 21709 @!q:pointer; {glue specification when calculating |pre_display_size|} 21710 @!f:internal_font_number; {font in current |char_node|} 21711 @!n:integer; {scope of paragraph shape specification} 21712 @!v:scaled; {|w| plus possible glue amount} 21713 @!d:scaled; {increment to |v|} 21714 begin get_token; {|get_x_token| would fail on \.{\\ifmmode}\thinspace!} 21715 if (cur_cmd=math_shift)and(mode>0) then @<Go into display math mode@> 21716 else begin back_input; @<Go into ordinary math mode@>; 21717 end; 21718 end; 21719 21720 @ @<Go into ordinary math mode@>= 21721 begin push_math(math_shift_group); eq_word_define(int_base+cur_fam_code,-1); 21722 if every_math<>null then begin_token_list(every_math,every_math_text); 21723 end 21724 21725 @ We get into ordinary math mode from display math mode when `\.{\\eqno}' or 21726 `\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or~1, respectively; 21727 the value of |cur_chr| is placed onto |save_stack| for safe keeping. 21728 21729 @<Cases of |main_control| that build...@>= 21730 mmode+eq_no: if privileged then 21731 if cur_group=math_shift_group then start_eq_no 21732 else off_save; 21733 21734 @ @<Put each...@>= 21735 primitive("eqno",eq_no,0); 21736 @!@:eq_no_}{\.{\\eqno} primitive@> 21737 primitive("leqno",eq_no,1); 21738 @!@:leq_no_}{\.{\\leqno} primitive@> 21739 21740 @ When \TeX\ is in display math mode, |cur_group=math_shift_group|, 21741 so it is not necessary for the |start_eq_no| procedure to test for 21742 this condition. 21743 21744 @<Declare act...@>= 21745 procedure start_eq_no; 21746 begin saved(0):=cur_chr; incr(save_ptr); 21747 @<Go into ordinary math mode@>; 21748 end; 21749 21750 @ @<Cases of |print_cmd_chr|...@>= 21751 eq_no:if chr_code=1 then print_esc("leqno")@+else print_esc("eqno"); 21752 21753 @ @<Forbidden...@>=non_math(eq_no), 21754 21755 @ When we enter display math mode, we need to call |line_break| to 21756 process the partial paragraph that has just been interrupted by the 21757 display. Then we can set the proper values of |display_width| and 21758 |display_indent| and |pre_display_size|. 21759 21760 @<Go into display math mode@>= 21761 begin if head=tail then {`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'} 21762 begin pop_nest; w:=-max_dimen; 21763 end 21764 else begin line_break(display_widow_penalty);@/ 21765 @<Calculate the natural width, |w|, by which the characters of the 21766 final line extend to the right of the reference point, 21767 plus two ems; or set |w:=max_dimen| if the non-blank information 21768 on that line is affected by stretching or shrinking@>; 21769 end; 21770 {now we are in vertical mode, working on the list that will contain the display} 21771 @<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>; 21772 push_math(math_shift_group); mode:=mmode; 21773 eq_word_define(int_base+cur_fam_code,-1);@/ 21774 eq_word_define(dimen_base+pre_display_size_code,w); 21775 eq_word_define(dimen_base+display_width_code,l); 21776 eq_word_define(dimen_base+display_indent_code,s); 21777 if every_display<>null then begin_token_list(every_display,every_display_text); 21778 if nest_ptr=1 then build_page; 21779 end 21780 21781 @ @<Calculate the natural width, |w|, by which...@>= 21782 v:=shift_amount(just_box)+2*quad(cur_font); w:=-max_dimen; 21783 p:=list_ptr(just_box); 21784 while p<>null do 21785 begin @<Let |d| be the natural width of node |p|; 21786 if the node is ``visible,'' |goto found|; 21787 if the node is glue that stretches or shrinks, set |v:=max_dimen|@>; 21788 if v<max_dimen then v:=v+d; 21789 goto not_found; 21790 found: if v<max_dimen then 21791 begin v:=v+d; w:=v; 21792 end 21793 else begin w:=max_dimen; goto done; 21794 end; 21795 not_found: p:=link(p); 21796 end; 21797 done: 21798 21799 @ @<Let |d| be the natural width of node |p|...@>= 21800 reswitch: if is_char_node(p) then 21801 begin f:=font(p); d:=char_width(f)(char_info(f)(character(p))); 21802 goto found; 21803 end; 21804 case type(p) of 21805 hlist_node,vlist_node,rule_node: begin d:=width(p); goto found; 21806 end; 21807 ligature_node:@<Make node |p| look like a |char_node|...@>; 21808 kern_node,math_node: d:=width(p); 21809 glue_node:@<Let |d| be the natural width of this glue; if stretching 21810 or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders@>; 21811 whatsit_node: @<Let |d| be the width of the whatsit |p|@>; 21812 othercases d:=0 21813 endcases 21814 21815 @ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set| 21816 values, since such values are subject to system-dependent rounding. 21817 System-dependent numbers are not allowed to infiltrate parameters like 21818 |pre_display_size|, since \TeX82 is supposed to make the same decisions on all 21819 machines. 21820 21821 @<Let |d| be the natural width of this glue...@>= 21822 begin q:=glue_ptr(p); d:=width(q); 21823 if glue_sign(just_box)=stretching then 21824 begin if (glue_order(just_box)=stretch_order(q))and@| 21825 (stretch(q)<>0) then 21826 v:=max_dimen; 21827 end 21828 else if glue_sign(just_box)=shrinking then 21829 begin if (glue_order(just_box)=shrink_order(q))and@| 21830 (shrink(q)<>0) then 21831 v:=max_dimen; 21832 end; 21833 if subtype(p)>=a_leaders then goto found; 21834 end 21835 21836 @ A displayed equation is considered to be three lines long, so we 21837 calculate the length and offset of line number |prev_graf+2|. 21838 21839 @<Calculate the length, |l|, ...@>= 21840 if par_shape_ptr=null then 21841 if (hang_indent<>0)and@| 21842 (((hang_after>=0)and(prev_graf+2>hang_after))or@| 21843 (prev_graf+1<-hang_after)) then 21844 begin l:=hsize-abs(hang_indent); 21845 if hang_indent>0 then s:=hang_indent@+else s:=0; 21846 end 21847 else begin l:=hsize; s:=0; 21848 end 21849 else begin n:=info(par_shape_ptr); 21850 if prev_graf+2>=n then p:=par_shape_ptr+2*n 21851 else p:=par_shape_ptr+2*(prev_graf+2); 21852 s:=mem[p-1].sc; l:=mem[p].sc; 21853 end 21854 21855 @ Subformulas of math formulas cause a new level of math mode to be entered, 21856 on the semantic nest as well as the save stack. These subformulas arise in 21857 several ways: (1)~A left brace by itself indicates the beginning of a 21858 subformula that will be put into a box, thereby freezing its glue and 21859 preventing line breaks. (2)~A subscript or superscript is treated as a 21860 subformula if it is not a single character; the same applies to 21861 the nucleus of things like \.{\\underline}. (3)~The \.{\\left} primitive 21862 initiates a subformula that will be terminated by a matching \.{\\right}. 21863 The group codes placed on |save_stack| in these three cases are 21864 |math_group|, |math_group|, and |math_left_group|, respectively. 21865 21866 Here is the code that handles case (1); the other cases are not quite as 21867 trivial, so we shall consider them later. 21868 21869 @<Cases of |main_control| that build...@>= 21870 mmode+left_brace: begin tail_append(new_noad); 21871 back_input; scan_math(nucleus(tail)); 21872 end; 21873 21874 @ Recall that the |nucleus|, |subscr|, and |supscr| fields in a noad are 21875 broken down into subfields called |math_type| and either |info| or 21876 |(fam,character)|. The job of |scan_math| is to figure out what to place 21877 in one of these principal fields; it looks at the subformula that 21878 comes next in the input, and places an encoding of that subformula 21879 into a given word of |mem|. 21880 21881 @d fam_in_range==((cur_fam>=0)and(cur_fam<16)) 21882 21883 @<Declare act...@>= 21884 procedure scan_math(@!p:pointer); 21885 label restart,reswitch,exit; 21886 var c:integer; {math character code} 21887 begin restart:@<Get the next non-blank non-relax...@>; 21888 reswitch:case cur_cmd of 21889 letter,other_char,char_given: begin c:=ho(math_code(cur_chr)); 21890 if c=@'100000 then 21891 begin @<Treat |cur_chr| as an active character@>; 21892 goto restart; 21893 end; 21894 end; 21895 char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given; 21896 goto reswitch; 21897 end; 21898 math_char_num: begin scan_fifteen_bit_int; c:=cur_val; 21899 end; 21900 math_given: c:=cur_chr; 21901 delim_num: begin scan_twenty_seven_bit_int; c:=cur_val div @'10000; 21902 end; 21903 othercases @<Scan a subformula enclosed in braces and |return|@> 21904 endcases;@/ 21905 math_type(p):=math_char; character(p):=qi(c mod 256); 21906 if (c>=var_code)and fam_in_range then fam(p):=cur_fam 21907 else fam(p):=(c div 256) mod 16; 21908 exit:end; 21909 21910 @ An active character that is an |outer_call| is allowed here. 21911 21912 @<Treat |cur_chr|...@>= 21913 begin cur_cs:=cur_chr+active_base; 21914 cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 21915 x_token; back_input; 21916 end 21917 21918 @ The pointer |p| is placed on |save_stack| while a complex subformula 21919 is being scanned. 21920 21921 @<Scan a subformula...@>= 21922 begin back_input; scan_left_brace;@/ 21923 saved(0):=p; incr(save_ptr); push_math(math_group); return; 21924 end 21925 21926 @ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are 21927 generated. The next simplest cases involve a single character, e.g., 21928 `\.{\$x\$}'. Even though such cases may not seem to be very interesting, 21929 the reader can perhaps understand how happy the author was when `\.{\$x\$}' 21930 was first properly typeset by \TeX. The code in this section was used. 21931 @^Knuth, Donald Ervin@> 21932 21933 @<Cases of |main_control| that build...@>= 21934 mmode+letter,mmode+other_char,mmode+char_given: 21935 set_math_char(ho(math_code(cur_chr))); 21936 mmode+char_num: begin scan_char_num; cur_chr:=cur_val; 21937 set_math_char(ho(math_code(cur_chr))); 21938 end; 21939 mmode+math_char_num: begin scan_fifteen_bit_int; set_math_char(cur_val); 21940 end; 21941 mmode+math_given: set_math_char(cur_chr); 21942 mmode+delim_num: begin scan_twenty_seven_bit_int; 21943 set_math_char(cur_val div @'10000); 21944 end; 21945 21946 @ The |set_math_char| procedure creates a new noad appropriate to a given 21947 math code, and appends it to the current mlist. However, if the math code 21948 is sufficiently large, the |cur_chr| is treated as an active character and 21949 nothing is appended. 21950 21951 @<Declare act...@>= 21952 procedure set_math_char(@!c:integer); 21953 var p:pointer; {the new noad} 21954 begin if c>=@'100000 then 21955 @<Treat |cur_chr|...@> 21956 else begin p:=new_noad; math_type(nucleus(p)):=math_char; 21957 character(nucleus(p)):=qi(c mod 256); 21958 fam(nucleus(p)):=(c div 256) mod 16; 21959 if c>=var_code then 21960 begin if fam_in_range then fam(nucleus(p)):=cur_fam; 21961 type(p):=ord_noad; 21962 end 21963 else type(p):=ord_noad+(c div @'10000); 21964 link(tail):=p; tail:=p; 21965 end; 21966 end; 21967 21968 @ Primitive math operators like \.{\\mathop} and \.{\\underline} are given 21969 the command code |math_comp|, supplemented by the noad type that they 21970 generate. 21971 21972 @<Put each...@>= 21973 primitive("mathord",math_comp,ord_noad); 21974 @!@:math_ord_}{\.{\\mathord} primitive@> 21975 primitive("mathop",math_comp,op_noad); 21976 @!@:math_op_}{\.{\\mathop} primitive@> 21977 primitive("mathbin",math_comp,bin_noad); 21978 @!@:math_bin_}{\.{\\mathbin} primitive@> 21979 primitive("mathrel",math_comp,rel_noad); 21980 @!@:math_rel_}{\.{\\mathrel} primitive@> 21981 primitive("mathopen",math_comp,open_noad); 21982 @!@:math_open_}{\.{\\mathopen} primitive@> 21983 primitive("mathclose",math_comp,close_noad); 21984 @!@:math_close_}{\.{\\mathclose} primitive@> 21985 primitive("mathpunct",math_comp,punct_noad); 21986 @!@:math_punct_}{\.{\\mathpunct} primitive@> 21987 primitive("mathinner",math_comp,inner_noad); 21988 @!@:math_inner_}{\.{\\mathinner} primitive@> 21989 primitive("underline",math_comp,under_noad); 21990 @!@:underline_}{\.{\\underline} primitive@> 21991 primitive("overline",math_comp,over_noad);@/ 21992 @!@:overline_}{\.{\\overline} primitive@> 21993 primitive("displaylimits",limit_switch,normal); 21994 @!@:display_limits_}{\.{\\displaylimits} primitive@> 21995 primitive("limits",limit_switch,limits); 21996 @!@:limits_}{\.{\\limits} primitive@> 21997 primitive("nolimits",limit_switch,no_limits); 21998 @!@:no_limits_}{\.{\\nolimits} primitive@> 21999 22000 @ @<Cases of |print_cmd_chr|...@>= 22001 math_comp: case chr_code of 22002 ord_noad: print_esc("mathord"); 22003 op_noad: print_esc("mathop"); 22004 bin_noad: print_esc("mathbin"); 22005 rel_noad: print_esc("mathrel"); 22006 open_noad: print_esc("mathopen"); 22007 close_noad: print_esc("mathclose"); 22008 punct_noad: print_esc("mathpunct"); 22009 inner_noad: print_esc("mathinner"); 22010 under_noad: print_esc("underline"); 22011 othercases print_esc("overline") 22012 endcases; 22013 limit_switch: if chr_code=limits then print_esc("limits") 22014 else if chr_code=no_limits then print_esc("nolimits") 22015 else print_esc("displaylimits"); 22016 22017 @ @<Cases of |main_control| that build...@>= 22018 mmode+math_comp: begin tail_append(new_noad); 22019 type(tail):=cur_chr; scan_math(nucleus(tail)); 22020 end; 22021 mmode+limit_switch: math_limit_switch; 22022 22023 @ @<Declare act...@>= 22024 procedure math_limit_switch; 22025 label exit; 22026 begin if head<>tail then if type(tail)=op_noad then 22027 begin subtype(tail):=cur_chr; return; 22028 end; 22029 print_err("Limit controls must follow a math operator"); 22030 @.Limit controls must follow...@> 22031 help1("I'm ignoring this misplaced \limits or \nolimits command."); error; 22032 exit:end; 22033 22034 @ Delimiter fields of noads are filled in by the |scan_delimiter| routine. 22035 The first parameter of this procedure is the |mem| address where the 22036 delimiter is to be placed; the second tells if this delimiter follows 22037 \.{\\radical} or not. 22038 22039 @<Declare act...@>= 22040 procedure scan_delimiter(@!p:pointer;@!r:boolean); 22041 begin if r then scan_twenty_seven_bit_int 22042 else begin @<Get the next non-blank non-relax...@>; 22043 case cur_cmd of 22044 letter,other_char: cur_val:=del_code(cur_chr); 22045 delim_num: scan_twenty_seven_bit_int; 22046 othercases cur_val:=-1 22047 endcases; 22048 end; 22049 if cur_val<0 then @<Report that an invalid delimiter code is being changed 22050 to null; set~|cur_val:=0|@>; 22051 small_fam(p):=(cur_val div @'4000000) mod 16; 22052 small_char(p):=qi((cur_val div @'10000) mod 256); 22053 large_fam(p):=(cur_val div 256) mod 16; 22054 large_char(p):=qi(cur_val mod 256); 22055 end; 22056 22057 @ @<Report that an invalid delimiter...@>= 22058 begin print_err("Missing delimiter (. inserted)"); 22059 @.Missing delimiter...@> 22060 help6("I was expecting to see something like `(' or `\{' or")@/ 22061 ("`\}' here. If you typed, e.g., `{' instead of `\{', you")@/ 22062 ("should probably delete the `{' by typing `1' now, so that")@/ 22063 ("braces don't get unbalanced. Otherwise just proceed.")@/ 22064 ("Acceptable delimiters are characters whose \delcode is")@/ 22065 ("nonnegative, or you can use `\delimiter <delimiter code>'."); 22066 back_error; cur_val:=0; 22067 end 22068 22069 @ @<Cases of |main_control| that build...@>= 22070 mmode+radical:math_radical; 22071 22072 @ @<Declare act...@>= 22073 procedure math_radical; 22074 begin tail_append(get_node(radical_noad_size)); 22075 type(tail):=radical_noad; subtype(tail):=normal; 22076 mem[nucleus(tail)].hh:=empty_field; 22077 mem[subscr(tail)].hh:=empty_field; 22078 mem[supscr(tail)].hh:=empty_field; 22079 scan_delimiter(left_delimiter(tail),true); scan_math(nucleus(tail)); 22080 end; 22081 22082 @ @<Cases of |main_control| that build...@>= 22083 mmode+accent,mmode+math_accent:math_ac; 22084 22085 @ @<Declare act...@>= 22086 procedure math_ac; 22087 begin if cur_cmd=accent then 22088 @<Complain that the user should have said \.{\\mathaccent}@>; 22089 tail_append(get_node(accent_noad_size)); 22090 type(tail):=accent_noad; subtype(tail):=normal; 22091 mem[nucleus(tail)].hh:=empty_field; 22092 mem[subscr(tail)].hh:=empty_field; 22093 mem[supscr(tail)].hh:=empty_field; 22094 math_type(accent_chr(tail)):=math_char; 22095 scan_fifteen_bit_int; 22096 character(accent_chr(tail)):=qi(cur_val mod 256); 22097 if (cur_val>=var_code)and fam_in_range then fam(accent_chr(tail)):=cur_fam 22098 else fam(accent_chr(tail)):=(cur_val div 256) mod 16; 22099 scan_math(nucleus(tail)); 22100 end; 22101 22102 @ @<Complain that the user should have said \.{\\mathaccent}@>= 22103 begin print_err("Please use "); print_esc("mathaccent"); 22104 print(" for accents in math mode"); 22105 @.Please use \\mathaccent...@> 22106 help2("I'm changing \accent to \mathaccent here; wish me luck.")@/ 22107 ("(Accents are not the same in formulas as they are in text.)"); 22108 error; 22109 end 22110 22111 @ @<Cases of |main_control| that build...@>= 22112 mmode+vcenter: begin scan_spec(vcenter_group,false); normal_paragraph; 22113 push_nest; mode:=-vmode; prev_depth:=ignore_depth; 22114 if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text); 22115 end; 22116 22117 @ @<Cases of |handle...@>= 22118 vcenter_group: begin end_graf; unsave; save_ptr:=save_ptr-2; 22119 p:=vpack(link(head),saved(1),saved(0)); pop_nest; 22120 tail_append(new_noad); type(tail):=vcenter_noad; 22121 math_type(nucleus(tail)):=sub_box; info(nucleus(tail)):=p; 22122 end; 22123 22124 @ The routine that inserts a |style_node| holds no surprises. 22125 22126 @<Put each...@>= 22127 primitive("displaystyle",math_style,display_style); 22128 @!@:display_style_}{\.{\\displaystyle} primitive@> 22129 primitive("textstyle",math_style,text_style); 22130 @!@:text_style_}{\.{\\textstyle} primitive@> 22131 primitive("scriptstyle",math_style,script_style); 22132 @!@:script_style_}{\.{\\scriptstyle} primitive@> 22133 primitive("scriptscriptstyle",math_style,script_script_style); 22134 @!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@> 22135 22136 @ @<Cases of |print_cmd_chr|...@>= 22137 math_style: print_style(chr_code); 22138 22139 @ @<Cases of |main_control| that build...@>= 22140 mmode+math_style: tail_append(new_style(cur_chr)); 22141 mmode+non_script: begin tail_append(new_glue(zero_glue)); 22142 subtype(tail):=cond_math_glue; 22143 end; 22144 mmode+math_choice: append_choices; 22145 22146 @ The routine that scans the four mlists of a \.{\\mathchoice} is very 22147 much like the routine that builds discretionary nodes. 22148 22149 @<Declare act...@>= 22150 procedure append_choices; 22151 begin tail_append(new_choice); incr(save_ptr); saved(-1):=0; 22152 push_math(math_choice_group); scan_left_brace; 22153 end; 22154 22155 @ @<Cases of |handle_right_brace|...@>= 22156 math_choice_group: build_choices; 22157 22158 @ @<Declare act...@>= 22159 @t\4@>@<Declare the function called |fin_mlist|@>@t@>@;@/ 22160 procedure build_choices; 22161 label exit; 22162 var p:pointer; {the current mlist} 22163 begin unsave; p:=fin_mlist(null); 22164 case saved(-1) of 22165 0:display_mlist(tail):=p; 22166 1:text_mlist(tail):=p; 22167 2:script_mlist(tail):=p; 22168 3:begin script_script_mlist(tail):=p; decr(save_ptr); return; 22169 end; 22170 end; {there are no other cases} 22171 incr(saved(-1)); push_math(math_choice_group); scan_left_brace; 22172 exit:end; 22173 22174 @ Subscripts and superscripts are attached to the previous nucleus by the 22175 @^superscripts@>@^subscripts@> 22176 action procedure called |sub_sup|. We use the facts that |sub_mark=sup_mark+1| 22177 and |subscr(p)=supscr(p)+1|. 22178 22179 @<Cases of |main_control| that build...@>= 22180 mmode+sub_mark,mmode+sup_mark: sub_sup; 22181 22182 @ @<Declare act...@>= 22183 procedure sub_sup; 22184 var t:small_number; {type of previous sub/superscript} 22185 @!p:pointer; {field to be filled by |scan_math|} 22186 begin t:=empty; p:=null; 22187 if tail<>head then if scripts_allowed(tail) then 22188 begin p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|} 22189 t:=math_type(p); 22190 end; 22191 if (p=null)or(t<>empty) then @<Insert a dummy noad to be sub/superscripted@>; 22192 scan_math(p); 22193 end; 22194 22195 @ @<Insert a dummy...@>= 22196 begin tail_append(new_noad); 22197 p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|} 22198 if t<>empty then 22199 begin if cur_cmd=sup_mark then 22200 begin print_err("Double superscript"); 22201 @.Double superscript@> 22202 help1("I treat `x^1^2' essentially like `x^1{}^2'."); 22203 end 22204 else begin print_err("Double subscript"); 22205 @.Double subscript@> 22206 help1("I treat `x_1_2' essentially like `x_1{}_2'."); 22207 end; 22208 error; 22209 end; 22210 end 22211 22212 @ An operation like `\.{\\over}' causes the current mlist to go into a 22213 state of suspended animation: |incompleat_noad| points to a |fraction_noad| 22214 that contains the mlist-so-far as its numerator, while the denominator 22215 is yet to come. Finally when the mlist is finished, the denominator will 22216 go into the incompleat fraction noad, and that noad will become the 22217 whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}' 22218 delimiters. 22219 22220 @d above_code=0 { `\.{\\above}' } 22221 @d over_code=1 { `\.{\\over}' } 22222 @d atop_code=2 { `\.{\\atop}' } 22223 @d delimited_code=3 { `\.{\\abovewithdelims}', etc.} 22224 22225 @<Put each...@>= 22226 primitive("above",above,above_code);@/ 22227 @!@:above_}{\.{\\above} primitive@> 22228 primitive("over",above,over_code);@/ 22229 @!@:over_}{\.{\\over} primitive@> 22230 primitive("atop",above,atop_code);@/ 22231 @!@:atop_}{\.{\\atop} primitive@> 22232 primitive("abovewithdelims",above,delimited_code+above_code);@/ 22233 @!@:above_with_delims_}{\.{\\abovewithdelims} primitive@> 22234 primitive("overwithdelims",above,delimited_code+over_code);@/ 22235 @!@:over_with_delims_}{\.{\\overwithdelims} primitive@> 22236 primitive("atopwithdelims",above,delimited_code+atop_code); 22237 @!@:atop_with_delims_}{\.{\\atopwithdelims} primitive@> 22238 22239 @ @<Cases of |print_cmd_chr|...@>= 22240 above: case chr_code of 22241 over_code:print_esc("over"); 22242 atop_code:print_esc("atop"); 22243 delimited_code+above_code:print_esc("abovewithdelims"); 22244 delimited_code+over_code:print_esc("overwithdelims"); 22245 delimited_code+atop_code:print_esc("atopwithdelims"); 22246 othercases print_esc("above") 22247 endcases; 22248 22249 @ @<Cases of |main_control| that build...@>= 22250 mmode+above: math_fraction; 22251 22252 @ @<Declare act...@>= 22253 procedure math_fraction; 22254 var c:small_number; {the type of generalized fraction we are scanning} 22255 begin c:=cur_chr; 22256 if incompleat_noad<>null then 22257 @<Ignore the fraction operation and complain about this ambiguous case@> 22258 else begin incompleat_noad:=get_node(fraction_noad_size); 22259 type(incompleat_noad):=fraction_noad; 22260 subtype(incompleat_noad):=normal; 22261 math_type(numerator(incompleat_noad)):=sub_mlist; 22262 info(numerator(incompleat_noad)):=link(head); 22263 mem[denominator(incompleat_noad)].hh:=empty_field; 22264 mem[left_delimiter(incompleat_noad)].qqqq:=null_delimiter; 22265 mem[right_delimiter(incompleat_noad)].qqqq:=null_delimiter;@/ 22266 link(head):=null; tail:=head; 22267 @<Use code |c| to distinguish between generalized fractions@>; 22268 end; 22269 end; 22270 22271 @ @<Use code |c|...@>= 22272 if c>=delimited_code then 22273 begin scan_delimiter(left_delimiter(incompleat_noad),false); 22274 scan_delimiter(right_delimiter(incompleat_noad),false); 22275 end; 22276 case c mod delimited_code of 22277 above_code: begin scan_normal_dimen; 22278 thickness(incompleat_noad):=cur_val; 22279 end; 22280 over_code: thickness(incompleat_noad):=default_code; 22281 atop_code: thickness(incompleat_noad):=0; 22282 end {there are no other cases} 22283 22284 @ @<Ignore the fraction...@>= 22285 begin if c>=delimited_code then 22286 begin scan_delimiter(garbage,false); scan_delimiter(garbage,false); 22287 end; 22288 if c mod delimited_code=above_code then scan_normal_dimen; 22289 print_err("Ambiguous; you need another { and }"); 22290 @.Ambiguous...@> 22291 help3("I'm ignoring this fraction specification, since I don't")@/ 22292 ("know whether a construction like `x \over y \over z'")@/ 22293 ("means `{x \over y} \over z' or `x \over {y \over z}'."); 22294 error; 22295 end 22296 22297 @ At the end of a math formula or subformula, the |fin_mlist| routine is 22298 called upon to return a pointer to the newly completed mlist, and to 22299 pop the nest back to the enclosing semantic level. The parameter to 22300 |fin_mlist|, if not null, points to a |right_noad| that ends the 22301 current mlist; this |right_noad| has not yet been appended. 22302 22303 @<Declare the function called |fin_mlist|@>= 22304 function fin_mlist(@!p:pointer):pointer; 22305 var q:pointer; {the mlist to return} 22306 begin if incompleat_noad<>null then @<Compleat the incompleat noad@> 22307 else begin link(tail):=p; q:=link(head); 22308 end; 22309 pop_nest; fin_mlist:=q; 22310 end; 22311 22312 @ @<Compleat...@>= 22313 begin math_type(denominator(incompleat_noad)):=sub_mlist; 22314 info(denominator(incompleat_noad)):=link(head); 22315 if p=null then q:=incompleat_noad 22316 else begin q:=info(numerator(incompleat_noad)); 22317 if type(q)<>left_noad then confusion("right"); 22318 @:this can't happen right}{\quad right@> 22319 info(numerator(incompleat_noad)):=link(q); 22320 link(q):=incompleat_noad; link(incompleat_noad):=p; 22321 end; 22322 end 22323 22324 @ Now at last we're ready to see what happens when a right brace occurs 22325 in a math formula. Two special cases are simplified here: Braces are effectively 22326 removed when they surround a single Ord without sub/superscripts, or when they 22327 surround an accent that is the nucleus of an Ord atom. 22328 22329 @<Cases of |handle...@>= 22330 math_group: begin unsave; decr(save_ptr);@/ 22331 math_type(saved(0)):=sub_mlist; p:=fin_mlist(null); info(saved(0)):=p; 22332 if p<>null then if link(p)=null then 22333 if type(p)=ord_noad then 22334 begin if math_type(subscr(p))=empty then 22335 if math_type(supscr(p))=empty then 22336 begin mem[saved(0)].hh:=mem[nucleus(p)].hh; 22337 free_node(p,noad_size); 22338 end; 22339 end 22340 else if type(p)=accent_noad then if saved(0)=nucleus(tail) then 22341 if type(tail)=ord_noad then @<Replace the tail of the list by |p|@>; 22342 end; 22343 22344 @ @<Replace the tail...@>= 22345 begin q:=head; while link(q)<>tail do q:=link(q); 22346 link(q):=p; free_node(tail,noad_size); tail:=p; 22347 end 22348 22349 @ We have dealt with all constructions of math mode except `\.{\\left}' and 22350 `\.{\\right}', so the picture is completed by the following sections of 22351 the program. 22352 22353 @<Put each...@>= 22354 primitive("left",left_right,left_noad); 22355 @!@:left_}{\.{\\left} primitive@> 22356 primitive("right",left_right,right_noad); 22357 @!@:right_}{\.{\\right} primitive@> 22358 text(frozen_right):="right"; eqtb[frozen_right]:=eqtb[cur_val]; 22359 22360 @ @<Cases of |print_cmd_chr|...@>= 22361 left_right: if chr_code=left_noad then print_esc("left") 22362 else print_esc("right"); 22363 22364 @ @<Cases of |main_control| that build...@>= 22365 mmode+left_right: math_left_right; 22366 22367 @ @<Declare act...@>= 22368 procedure math_left_right; 22369 var t:small_number; {|left_noad| or |right_noad|} 22370 @!p:pointer; {new noad} 22371 begin t:=cur_chr; 22372 if (t=right_noad)and(cur_group<>math_left_group) then 22373 @<Try to recover from mismatched \.{\\right}@> 22374 else begin p:=new_noad; type(p):=t; 22375 scan_delimiter(delimiter(p),false); 22376 if t=left_noad then 22377 begin push_math(math_left_group); link(head):=p; tail:=p; 22378 end 22379 else begin p:=fin_mlist(p); unsave; {end of |math_left_group|} 22380 tail_append(new_noad); type(tail):=inner_noad; 22381 math_type(nucleus(tail)):=sub_mlist; 22382 info(nucleus(tail)):=p; 22383 end; 22384 end; 22385 end; 22386 22387 @ @<Try to recover from mismatch...@>= 22388 begin if cur_group=math_shift_group then 22389 begin scan_delimiter(garbage,false); 22390 print_err("Extra "); print_esc("right"); 22391 @.Extra \\right.@> 22392 help1("I'm ignoring a \right that had no matching \left."); 22393 error; 22394 end 22395 else off_save; 22396 end 22397 22398 @ Here is the only way out of math mode. 22399 22400 @<Cases of |main_control| that build...@>= 22401 mmode+math_shift: if cur_group=math_shift_group then after_math 22402 else off_save; 22403 22404 @ @<Declare act...@>= 22405 procedure after_math; 22406 var l:boolean; {`\.{\\leqno}' instead of `\.{\\eqno}'} 22407 @!danger:boolean; {not enough symbol fonts are present} 22408 @!m:integer; {|mmode| or |-mmode|} 22409 @!p:pointer; {the formula} 22410 @!a:pointer; {box containing equation number} 22411 @<Local variables for finishing a displayed formula@>@; 22412 begin danger:=false; 22413 @<Check that the necessary fonts for math symbols are present; 22414 if not, flush the current math lists and set |danger:=true|@>; 22415 m:=mode; l:=false; p:=fin_mlist(null); {this pops the nest} 22416 if mode=-m then {end of equation number} 22417 begin @<Check that another \.\$ follows@>; 22418 cur_mlist:=p; cur_style:=text_style; mlist_penalties:=false; 22419 mlist_to_hlist; a:=hpack(link(temp_head),natural); 22420 unsave; decr(save_ptr); {now |cur_group=math_shift_group|} 22421 if saved(0)=1 then l:=true; 22422 danger:=false; 22423 @<Check that the necessary fonts for math symbols are present; 22424 if not, flush the current math lists and set |danger:=true|@>; 22425 m:=mode; p:=fin_mlist(null); 22426 end 22427 else a:=null; 22428 if m<0 then @<Finish math in text@> 22429 else begin if a=null then @<Check that another \.\$ follows@>; 22430 @<Finish displayed math@>; 22431 end; 22432 end; 22433 22434 @ @<Check that the necessary fonts...@>= 22435 if (font_params[fam_fnt(2+text_size)]<total_mathsy_params)or@| 22436 (font_params[fam_fnt(2+script_size)]<total_mathsy_params)or@| 22437 (font_params[fam_fnt(2+script_script_size)]<total_mathsy_params) then 22438 begin print_err("Math formula deleted: Insufficient symbol fonts");@/ 22439 @.Math formula deleted...@> 22440 help3("Sorry, but I can't typeset math unless \textfont 2")@/ 22441 ("and \scriptfont 2 and \scriptscriptfont 2 have all")@/ 22442 ("the \fontdimen values needed in math symbol fonts."); 22443 error; flush_math; danger:=true; 22444 end 22445 else if (font_params[fam_fnt(3+text_size)]<total_mathex_params)or@| 22446 (font_params[fam_fnt(3+script_size)]<total_mathex_params)or@| 22447 (font_params[fam_fnt(3+script_script_size)]<total_mathex_params) then 22448 begin print_err("Math formula deleted: Insufficient extension fonts");@/ 22449 help3("Sorry, but I can't typeset math unless \textfont 3")@/ 22450 ("and \scriptfont 3 and \scriptscriptfont 3 have all")@/ 22451 ("the \fontdimen values needed in math extension fonts."); 22452 error; flush_math; danger:=true; 22453 end 22454 22455 @ The |unsave| is done after everything else here; hence an appearance of 22456 `\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these 22457 particular \.\$'s. This is consistent with the conventions of 22458 `\.{\$\$...\$\$}', since `\.{\\abovedisplayskip}' inside a display affects the 22459 space above that display. 22460 22461 @<Finish math in text@>= 22462 begin tail_append(new_math(math_surround,before)); 22463 cur_mlist:=p; cur_style:=text_style; mlist_penalties:=(mode>0); mlist_to_hlist; 22464 link(tail):=link(temp_head); 22465 while link(tail)<>null do tail:=link(tail); 22466 tail_append(new_math(math_surround,after)); 22467 space_factor:=1000; unsave; 22468 end 22469 22470 @ \TeX\ gets to the following part of the program when the first `\.\$' ending 22471 a display has been scanned. 22472 22473 @<Check that another \.\$ follows@>= 22474 begin get_x_token; 22475 if cur_cmd<>math_shift then 22476 begin print_err("Display math should end with $$"); 22477 @.Display math...with \$\$@> 22478 help2("The `$' that I just saw supposedly matches a previous `$$'.")@/ 22479 ("So I shall assume that you typed `$$' both times."); 22480 back_error; 22481 end; 22482 end 22483 22484 @ We have saved the worst for last: The fussiest part of math mode processing 22485 occurs when a displayed formula is being centered and placed with an optional 22486 equation number. 22487 22488 @<Local variables for finishing...@>= 22489 @!b:pointer; {box containing the equation} 22490 @!w:scaled; {width of the equation} 22491 @!z:scaled; {width of the line} 22492 @!e:scaled; {width of equation number} 22493 @!q:scaled; {width of equation number plus space to separate from equation} 22494 @!d:scaled; {displacement of equation in the line} 22495 @!s:scaled; {move the line right this much} 22496 @!g1,@!g2:small_number; {glue parameter codes for before and after} 22497 @!r:pointer; {kern node used to position the display} 22498 @!t:pointer; {tail of adjustment list} 22499 22500 @ At this time |p| points to the mlist for the formula; |a| is either 22501 |null| or it points to a box containing the equation number; and we are in 22502 vertical mode (or internal vertical mode). 22503 22504 @<Finish displayed math@>= 22505 cur_mlist:=p; cur_style:=display_style; mlist_penalties:=false; 22506 mlist_to_hlist; p:=link(temp_head);@/ 22507 adjust_tail:=adjust_head; b:=hpack(p,natural); p:=list_ptr(b); 22508 t:=adjust_tail; adjust_tail:=null;@/ 22509 w:=width(b); z:=display_width; s:=display_indent; 22510 if (a=null)or danger then 22511 begin e:=0; q:=0; 22512 end 22513 else begin e:=width(a); q:=e+math_quad(text_size); 22514 end; 22515 if w+q>z then 22516 @<Squeeze the equation as much as possible; if there is an equation 22517 number that should go on a separate line by itself, 22518 set~|e:=0|@>; 22519 @<Determine the displacement, |d|, of the left edge of the equation, with 22520 respect to the line size |z|, assuming that |l=false|@>; 22521 @<Append the glue or equation number preceding the display@>; 22522 @<Append the display and perhaps also the equation number@>; 22523 @<Append the glue or equation number following the display@>; 22524 resume_after_display 22525 22526 @ @<Declare act...@>= 22527 procedure resume_after_display; 22528 begin if cur_group<>math_shift_group then confusion("display"); 22529 @:this can't happen display}{\quad display@> 22530 unsave; prev_graf:=prev_graf+3; 22531 push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang; 22532 prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min)) 22533 *@'200000+cur_lang; 22534 @<Scan an optional space@>; 22535 if nest_ptr=1 then build_page; 22536 end; 22537 22538 @ The user can force the equation number to go on a separate line 22539 by causing its width to be zero. 22540 22541 @<Squeeze the equation as much as possible...@>= 22542 begin if (e<>0)and((w-total_shrink[normal]+q<=z)or@| 22543 (total_shrink[fil]<>0)or(total_shrink[fill]<>0)or 22544 (total_shrink[filll]<>0)) then 22545 begin free_node(b,box_node_size); 22546 b:=hpack(p,z-q,exactly); 22547 end 22548 else begin e:=0; 22549 if w>z then 22550 begin free_node(b,box_node_size); 22551 b:=hpack(p,z,exactly); 22552 end; 22553 end; 22554 w:=width(b); 22555 end 22556 22557 @ We try first to center the display without regard to the existence of 22558 the equation number. If that would make it too close (where ``too close'' 22559 means that the space between display and equation number is less than the 22560 width of the equation number), we either center it in the remaining space 22561 or move it as far from the equation number as possible. The latter alternative 22562 is taken only if the display begins with glue, since we assume that the 22563 user put glue there to control the spacing precisely. 22564 22565 @<Determine the displacement, |d|, of the left edge of the equation...@>= 22566 d:=half(z-w); 22567 if (e>0)and(d<2*e) then {too close} 22568 begin d:=half(z-w-e); 22569 if p<>null then if not is_char_node(p) then if type(p)=glue_node then d:=0; 22570 end 22571 22572 @ If the equation number is set on a line by itself, either before or 22573 after the formula, we append an infinite penalty so that no page break will 22574 separate the display from its number; and we use the same size and 22575 displacement for all three potential lines of the display, even though 22576 `\.{\\parshape}' may specify them differently. 22577 22578 @<Append the glue or equation number preceding the display@>= 22579 tail_append(new_penalty(pre_display_penalty));@/ 22580 if (d+s<=pre_display_size)or l then {not enough clearance} 22581 begin g1:=above_display_skip_code; g2:=below_display_skip_code; 22582 end 22583 else begin g1:=above_display_short_skip_code; 22584 g2:=below_display_short_skip_code; 22585 end; 22586 if l and(e=0) then {it follows that |type(a)=hlist_node|} 22587 begin shift_amount(a):=s; append_to_vlist(a); 22588 tail_append(new_penalty(inf_penalty)); 22589 end 22590 else tail_append(new_param_glue(g1)) 22591 22592 @ @<Append the display and perhaps also the equation number@>= 22593 if e<>0 then 22594 begin r:=new_kern(z-w-e-d); 22595 if l then 22596 begin link(a):=r; link(r):=b; b:=a; d:=0; 22597 end 22598 else begin link(b):=r; link(r):=a; 22599 end; 22600 b:=hpack(b,natural); 22601 end; 22602 shift_amount(b):=s+d; append_to_vlist(b) 22603 22604 @ @<Append the glue or equation number following the display@>= 22605 if (a<>null)and(e=0)and not l then 22606 begin tail_append(new_penalty(inf_penalty)); 22607 shift_amount(a):=s+z-width(a); 22608 append_to_vlist(a); 22609 g2:=0; 22610 end; 22611 if t<>adjust_head then {migrating material comes after equation number} 22612 begin link(tail):=link(adjust_head); tail:=t; 22613 end; 22614 tail_append(new_penalty(post_display_penalty)); 22615 if g2>0 then tail_append(new_param_glue(g2)) 22616 22617 @ When \.{\\halign} appears in a display, the alignment routines operate 22618 essentially as they do in vertical mode. Then the following program is 22619 activated, with |p| and |q| pointing to the beginning and end of the 22620 resulting list, and with |aux_save| holding the |prev_depth| value. 22621 22622 @<Finish an alignment in a display@>= 22623 begin do_assignments; 22624 if cur_cmd<>math_shift then @<Pontificate about improper alignment in display@> 22625 else @<Check that another \.\$ follows@>; 22626 pop_nest; 22627 tail_append(new_penalty(pre_display_penalty)); 22628 tail_append(new_param_glue(above_display_skip_code)); 22629 link(tail):=p; 22630 if p<>null then tail:=q; 22631 tail_append(new_penalty(post_display_penalty)); 22632 tail_append(new_param_glue(below_display_skip_code)); 22633 prev_depth:=aux_save.sc; resume_after_display; 22634 end 22635 22636 @ @<Pontificate...@>= 22637 begin print_err("Missing $$ inserted"); 22638 @.Missing {\$\$} inserted@> 22639 help2("Displays can use special alignments (like \eqalignno)")@/ 22640 ("only if nothing but the alignment itself is between $$'s."); 22641 back_error; 22642 end 22643 22644 @* \[49] Mode-independent processing. 22645 The long |main_control| procedure has now been fully specified, except for 22646 certain activities that are independent of the current mode. These activities 22647 do not change the current vlist or hlist or mlist; if they change anything, 22648 it is the value of a parameter or the meaning of a control sequence. 22649 22650 Assignments to values in |eqtb| can be global or local. Furthermore, a 22651 control sequence can be defined to be `\.{\\long}' or `\.{\\outer}', and 22652 it might or might not be expanded. The prefixes `\.{\\global}', `\.{\\long}', 22653 and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric 22654 codes, making it possible to accumulate the union of all specified prefixes 22655 by adding the corresponding codes. (\PASCAL's |set| operations could also 22656 have been used.) 22657 22658 @<Put each...@>= 22659 primitive("long",prefix,1); 22660 @!@:long_}{\.{\\long} primitive@> 22661 primitive("outer",prefix,2); 22662 @!@:outer_}{\.{\\outer} primitive@> 22663 primitive("global",prefix,4); 22664 @!@:global_}{\.{\\global} primitive@> 22665 primitive("def",def,0); 22666 @!@:def_}{\.{\\def} primitive@> 22667 primitive("gdef",def,1); 22668 @!@:gdef_}{\.{\\gdef} primitive@> 22669 primitive("edef",def,2); 22670 @!@:edef_}{\.{\\edef} primitive@> 22671 primitive("xdef",def,3); 22672 @!@:xdef_}{\.{\\xdef} primitive@> 22673 22674 @ @<Cases of |print_cmd_chr|...@>= 22675 prefix: if chr_code=1 then print_esc("long") 22676 else if chr_code=2 then print_esc("outer") 22677 else print_esc("global"); 22678 def: if chr_code=0 then print_esc("def") 22679 else if chr_code=1 then print_esc("gdef") 22680 else if chr_code=2 then print_esc("edef") 22681 else print_esc("xdef"); 22682 22683 @ Every prefix, and every command code that might or might not be prefixed, 22684 calls the action procedure |prefixed_command|. This routine accumulates 22685 a sequence of prefixes until coming to a non-prefix, then it carries out 22686 the command. 22687 22688 @<Cases of |main_control| that don't...@>= 22689 any_mode(toks_register), 22690 any_mode(assign_toks), 22691 any_mode(assign_int), 22692 any_mode(assign_dimen), 22693 any_mode(assign_glue), 22694 any_mode(assign_mu_glue), 22695 any_mode(assign_font_dimen), 22696 any_mode(assign_font_int), 22697 any_mode(set_aux), 22698 any_mode(set_prev_graf), 22699 any_mode(set_page_dimen), 22700 any_mode(set_page_int), 22701 any_mode(set_box_dimen), 22702 any_mode(set_shape), 22703 any_mode(def_code), 22704 any_mode(def_family), 22705 any_mode(set_font), 22706 any_mode(def_font), 22707 any_mode(register), 22708 any_mode(advance), 22709 any_mode(multiply), 22710 any_mode(divide), 22711 any_mode(prefix), 22712 any_mode(let), 22713 any_mode(shorthand_def), 22714 any_mode(read_to_cs), 22715 any_mode(def), 22716 any_mode(set_box), 22717 any_mode(hyph_data), 22718 any_mode(set_interaction):prefixed_command; 22719 22720 @ If the user says, e.g., `\.{\\global\\global}', the redundancy is 22721 silently accepted. 22722 22723 @<Declare act...@>= 22724 @t\4@>@<Declare subprocedures for |prefixed_command|@>@t@>@;@/ 22725 procedure prefixed_command; 22726 label done,exit; 22727 var a:small_number; {accumulated prefix codes so far} 22728 @!f:internal_font_number; {identifies a font} 22729 @!j:halfword; {index into a \.{\\parshape} specification} 22730 @!k:font_index; {index into |font_info|} 22731 @!p,@!q:pointer; {for temporary short-term use} 22732 @!n:integer; {ditto} 22733 @!e:boolean; {should a definition be expanded? or was \.{\\let} not done?} 22734 begin a:=0; 22735 while cur_cmd=prefix do 22736 begin if not odd(a div cur_chr) then a:=a+cur_chr; 22737 @<Get the next non-blank non-relax...@>; 22738 if cur_cmd<=max_non_prefixed_command then 22739 @<Discard erroneous prefixes and |return|@>; 22740 end; 22741 @<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>; 22742 @<Adjust \(f)for the setting of \.{\\globaldefs}@>; 22743 case cur_cmd of 22744 @t\4@>@<Assignments@>@; 22745 othercases confusion("prefix") 22746 @:this can't happen prefix}{\quad prefix@> 22747 endcases; 22748 done: @<Insert a token saved by \.{\\afterassignment}, if any@>; 22749 exit:end; 22750 22751 @ @<Discard erroneous...@>= 22752 begin print_err("You can't use a prefix with `"); 22753 @.You can't use a prefix with x@> 22754 print_cmd_chr(cur_cmd,cur_chr); print_char("'"); 22755 help1("I'll pretend you didn't say \long or \outer or \global."); 22756 back_error; return; 22757 end 22758 22759 @ @<Discard the prefixes...@>= 22760 if (cur_cmd<>def)and(a mod 4<>0) then 22761 begin print_err("You can't use `"); print_esc("long"); print("' or `"); 22762 print_esc("outer"); print("' with `"); 22763 @.You can't use \\long...@> 22764 print_cmd_chr(cur_cmd,cur_chr); print_char("'"); 22765 help1("I'll pretend you didn't say \long or \outer here."); 22766 error; 22767 end 22768 22769 @ The previous routine does not have to adjust |a| so that |a mod 4=0|, 22770 since the following routines test for the \.{\\global} prefix as follows. 22771 22772 @d global==(a>=4) 22773 @d define(#)==if global then geq_define(#)@+else eq_define(#) 22774 @d word_define(#)==if global then geq_word_define(#)@+else eq_word_define(#) 22775 22776 @<Adjust \(f)for the setting of \.{\\globaldefs}@>= 22777 if global_defs<>0 then 22778 if global_defs<0 then 22779 begin if global then a:=a-4; 22780 end 22781 else begin if not global then a:=a+4; 22782 end 22783 22784 @ When a control sequence is to be defined, by \.{\\def} or \.{\\let} or 22785 something similar, the |get_r_token| routine will substitute a special 22786 control sequence for a token that is not redefinable. 22787 22788 @<Declare subprocedures for |prefixed_command|@>= 22789 procedure get_r_token; 22790 label restart; 22791 begin restart: repeat get_token; 22792 until cur_tok<>space_token; 22793 if (cur_cs=0)or(cur_cs>frozen_control_sequence) then 22794 begin print_err("Missing control sequence inserted"); 22795 @.Missing control...@> 22796 help5("Please don't say `\def cs{...}', say `\def\cs{...}'.")@/ 22797 ("I've inserted an inaccessible control sequence so that your")@/ 22798 ("definition will be completed without mixing me up too badly.")@/ 22799 ("You can recover graciously from this error, if you're")@/ 22800 ("careful; see exercise 27.2 in The TeXbook."); 22801 @:TeXbook}{\sl The \TeX book@> 22802 if cur_cs=0 then back_input; 22803 cur_tok:=cs_token_flag+frozen_protection; ins_error; goto restart; 22804 end; 22805 end; 22806 22807 @ @<Initialize table entries...@>= 22808 text(frozen_protection):="inaccessible"; 22809 @.inaccessible@> 22810 22811 @ Here's an example of the way many of the following routines operate. 22812 (Unfortunately, they aren't all as simple as this.) 22813 22814 @<Assignments@>= 22815 set_font: define(cur_font_loc,data,cur_chr); 22816 22817 @ When a |def| command has been scanned, 22818 |cur_chr| is odd if the definition is supposed to be global, and 22819 |cur_chr>=2| if the definition is supposed to be expanded. 22820 22821 @<Assignments@>= 22822 def: begin if odd(cur_chr)and not global and(global_defs>=0) then a:=a+4; 22823 e:=(cur_chr>=2); get_r_token; p:=cur_cs; 22824 q:=scan_toks(true,e); define(p,call+(a mod 4),def_ref); 22825 end; 22826 22827 @ Both \.{\\let} and \.{\\futurelet} share the command code |let|. 22828 22829 @<Put each...@>= 22830 primitive("let",let,normal);@/ 22831 @!@:let_}{\.{\\let} primitive@> 22832 primitive("futurelet",let,normal+1);@/ 22833 @!@:future_let_}{\.{\\futurelet} primitive@> 22834 22835 @ @<Cases of |print_cmd_chr|...@>= 22836 let: if chr_code<>normal then print_esc("futurelet")@+else print_esc("let"); 22837 22838 @ @<Assignments@>= 22839 let: begin n:=cur_chr; 22840 get_r_token; p:=cur_cs; 22841 if n=normal then 22842 begin repeat get_token; 22843 until cur_cmd<>spacer; 22844 if cur_tok=other_token+"=" then 22845 begin get_token; 22846 if cur_cmd=spacer then get_token; 22847 end; 22848 end 22849 else begin get_token; q:=cur_tok; get_token; back_input; 22850 cur_tok:=q; back_input; {look ahead, then back up} 22851 end; {note that |back_input| doesn't affect |cur_cmd|, |cur_chr|} 22852 if cur_cmd>=call then add_token_ref(cur_chr); 22853 define(p,cur_cmd,cur_chr); 22854 end; 22855 22856 @ A \.{\\chardef} creates a control sequence whose |cmd| is |char_given|; 22857 a \.{\\mathchardef} creates a control sequence whose |cmd| is |math_given|; 22858 and the corresponding |chr| is the character code or math code. A \.{\\countdef} 22859 or \.{\\dimendef} or \.{\\skipdef} or \.{\\muskipdef} creates a control 22860 sequence whose |cmd| is |assign_int| or \dots\ or |assign_mu_glue|, and the 22861 corresponding |chr| is the |eqtb| location of the internal register in question. 22862 22863 @d char_def_code=0 {|shorthand_def| for \.{\\chardef}} 22864 @d math_char_def_code=1 {|shorthand_def| for \.{\\mathchardef}} 22865 @d count_def_code=2 {|shorthand_def| for \.{\\countdef}} 22866 @d dimen_def_code=3 {|shorthand_def| for \.{\\dimendef}} 22867 @d skip_def_code=4 {|shorthand_def| for \.{\\skipdef}} 22868 @d mu_skip_def_code=5 {|shorthand_def| for \.{\\muskipdef}} 22869 @d toks_def_code=6 {|shorthand_def| for \.{\\toksdef}} 22870 22871 @<Put each...@>= 22872 primitive("chardef",shorthand_def,char_def_code);@/ 22873 @!@:char_def_}{\.{\\chardef} primitive@> 22874 primitive("mathchardef",shorthand_def,math_char_def_code);@/ 22875 @!@:math_char_def_}{\.{\\mathchardef} primitive@> 22876 primitive("countdef",shorthand_def,count_def_code);@/ 22877 @!@:count_def_}{\.{\\countdef} primitive@> 22878 primitive("dimendef",shorthand_def,dimen_def_code);@/ 22879 @!@:dimen_def_}{\.{\\dimendef} primitive@> 22880 primitive("skipdef",shorthand_def,skip_def_code);@/ 22881 @!@:skip_def_}{\.{\\skipdef} primitive@> 22882 primitive("muskipdef",shorthand_def,mu_skip_def_code);@/ 22883 @!@:mu_skip_def_}{\.{\\muskipdef} primitive@> 22884 primitive("toksdef",shorthand_def,toks_def_code);@/ 22885 @!@:toks_def_}{\.{\\toksdef} primitive@> 22886 22887 @ @<Cases of |print_cmd_chr|...@>= 22888 shorthand_def: case chr_code of 22889 char_def_code: print_esc("chardef"); 22890 math_char_def_code: print_esc("mathchardef"); 22891 count_def_code: print_esc("countdef"); 22892 dimen_def_code: print_esc("dimendef"); 22893 skip_def_code: print_esc("skipdef"); 22894 mu_skip_def_code: print_esc("muskipdef"); 22895 othercases print_esc("toksdef") 22896 endcases; 22897 char_given: begin print_esc("char"); print_hex(chr_code); 22898 end; 22899 math_given: begin print_esc("mathchar"); print_hex(chr_code); 22900 end; 22901 22902 @ We temporarily define |p| to be |relax|, so that an occurrence of |p| 22903 while scanning the definition will simply stop the scanning instead of 22904 producing an ``undefined control sequence'' error or expanding the 22905 previous meaning. This allows, for instance, `\.{\\chardef\\foo=123\\foo}'. 22906 22907 @<Assignments@>= 22908 shorthand_def: begin n:=cur_chr; get_r_token; p:=cur_cs; define(p,relax,256); 22909 scan_optional_equals; 22910 case n of 22911 char_def_code: begin scan_char_num; define(p,char_given,cur_val); 22912 end; 22913 math_char_def_code: begin scan_fifteen_bit_int; define(p,math_given,cur_val); 22914 end; 22915 othercases begin scan_eight_bit_int; 22916 case n of 22917 count_def_code: define(p,assign_int,count_base+cur_val); 22918 dimen_def_code: define(p,assign_dimen,scaled_base+cur_val); 22919 skip_def_code: define(p,assign_glue,skip_base+cur_val); 22920 mu_skip_def_code: define(p,assign_mu_glue,mu_skip_base+cur_val); 22921 toks_def_code: define(p,assign_toks,toks_base+cur_val); 22922 end; {there are no other cases} 22923 end 22924 endcases; 22925 end; 22926 22927 @ @<Assignments@>= 22928 read_to_cs: begin scan_int; n:=cur_val; 22929 if not scan_keyword("to") then 22930 @.to@> 22931 begin print_err("Missing `to' inserted"); 22932 @.Missing `to'...@> 22933 help2("You should have said `\read<number> to \cs'.")@/ 22934 ("I'm going to look for the \cs now."); error; 22935 end; 22936 get_r_token; 22937 p:=cur_cs; read_toks(n,p); define(p,call,cur_val); 22938 end; 22939 22940 @ The token-list parameters, \.{\\output} and \.{\\everypar}, etc., receive 22941 their values in the following way. (For safety's sake, we place an 22942 enclosing pair of braces around an \.{\\output} list.) 22943 22944 @<Assignments@>= 22945 toks_register,assign_toks: begin q:=cur_cs; 22946 if cur_cmd=toks_register then 22947 begin scan_eight_bit_int; p:=toks_base+cur_val; 22948 end 22949 else p:=cur_chr; {|p=every_par_loc| or |output_routine_loc| or \dots} 22950 scan_optional_equals; 22951 @<Get the next non-blank non-relax non-call token@>; 22952 if cur_cmd<>left_brace then @<If the right-hand side is a token parameter 22953 or token register, finish the assignment and |goto done|@>; 22954 back_input; cur_cs:=q; q:=scan_toks(false,false); 22955 if link(def_ref)=null then {empty list: revert to the default} 22956 begin define(p,undefined_cs,null); free_avail(def_ref); 22957 end 22958 else begin if p=output_routine_loc then {enclose in curlies} 22959 begin link(q):=get_avail; q:=link(q); 22960 info(q):=right_brace_token+"}"; 22961 q:=get_avail; info(q):=left_brace_token+"{"; 22962 link(q):=link(def_ref); link(def_ref):=q; 22963 end; 22964 define(p,call,def_ref); 22965 end; 22966 end; 22967 22968 @ @<If the right-hand side is a token parameter...@>= 22969 begin if cur_cmd=toks_register then 22970 begin scan_eight_bit_int; cur_cmd:=assign_toks; cur_chr:=toks_base+cur_val; 22971 end; 22972 if cur_cmd=assign_toks then 22973 begin q:=equiv(cur_chr); 22974 if q=null then define(p,undefined_cs,null) 22975 else begin add_token_ref(q); define(p,call,q); 22976 end; 22977 goto done; 22978 end; 22979 end 22980 22981 @ Similar routines are used to assign values to the numeric parameters. 22982 22983 @<Assignments@>= 22984 assign_int: begin p:=cur_chr; scan_optional_equals; scan_int; 22985 word_define(p,cur_val); 22986 end; 22987 assign_dimen: begin p:=cur_chr; scan_optional_equals; 22988 scan_normal_dimen; word_define(p,cur_val); 22989 end; 22990 assign_glue,assign_mu_glue: begin p:=cur_chr; n:=cur_cmd; scan_optional_equals; 22991 if n=assign_mu_glue then scan_glue(mu_val)@+else scan_glue(glue_val); 22992 trap_zero_glue; 22993 define(p,glue_ref,cur_val); 22994 end; 22995 22996 @ When a glue register or parameter becomes zero, it will always point to 22997 |zero_glue| because of the following procedure. (Exception: The tabskip 22998 glue isn't trapped while preambles are being scanned.) 22999 23000 @<Declare subprocedures for |prefixed_command|@>= 23001 procedure trap_zero_glue; 23002 begin if (width(cur_val)=0)and(stretch(cur_val)=0)and(shrink(cur_val)=0) then 23003 begin add_glue_ref(zero_glue); 23004 delete_glue_ref(cur_val); cur_val:=zero_glue; 23005 end; 23006 end; 23007 23008 @ The various character code tables are changed by the |def_code| commands, 23009 and the font families are declared by |def_family|. 23010 23011 @<Put each...@>= 23012 primitive("catcode",def_code,cat_code_base); 23013 @!@:cat_code_}{\.{\\catcode} primitive@> 23014 primitive("mathcode",def_code,math_code_base); 23015 @!@:math_code_}{\.{\\mathcode} primitive@> 23016 primitive("lccode",def_code,lc_code_base); 23017 @!@:lc_code_}{\.{\\lccode} primitive@> 23018 primitive("uccode",def_code,uc_code_base); 23019 @!@:uc_code_}{\.{\\uccode} primitive@> 23020 primitive("sfcode",def_code,sf_code_base); 23021 @!@:sf_code_}{\.{\\sfcode} primitive@> 23022 primitive("delcode",def_code,del_code_base); 23023 @!@:del_code_}{\.{\\delcode} primitive@> 23024 primitive("textfont",def_family,math_font_base); 23025 @!@:text_font_}{\.{\\textfont} primitive@> 23026 primitive("scriptfont",def_family,math_font_base+script_size); 23027 @!@:script_font_}{\.{\\scriptfont} primitive@> 23028 primitive("scriptscriptfont",def_family,math_font_base+script_script_size); 23029 @!@:script_script_font_}{\.{\\scriptscriptfont} primitive@> 23030 23031 @ @<Cases of |print_cmd_chr|...@>= 23032 def_code: if chr_code=cat_code_base then print_esc("catcode") 23033 else if chr_code=math_code_base then print_esc("mathcode") 23034 else if chr_code=lc_code_base then print_esc("lccode") 23035 else if chr_code=uc_code_base then print_esc("uccode") 23036 else if chr_code=sf_code_base then print_esc("sfcode") 23037 else print_esc("delcode"); 23038 def_family: print_size(chr_code-math_font_base); 23039 23040 @ The different types of code values have different legal ranges; the 23041 following program is careful to check each case properly. 23042 23043 @<Assignments@>= 23044 def_code: begin @<Let |n| be the largest legal code value, based on |cur_chr|@>; 23045 p:=cur_chr; scan_char_num; p:=p+cur_val; scan_optional_equals; 23046 scan_int; 23047 if ((cur_val<0)and(p<del_code_base))or(cur_val>n) then 23048 begin print_err("Invalid code ("); print_int(cur_val); 23049 @.Invalid code@> 23050 if p<del_code_base then print("), should be in the range 0..") 23051 else print("), should be at most "); 23052 print_int(n); 23053 help1("I'm going to use 0 instead of that illegal code value.");@/ 23054 error; cur_val:=0; 23055 end; 23056 if p<math_code_base then define(p,data,cur_val) 23057 else if p<del_code_base then define(p,data,hi(cur_val)) 23058 else word_define(p,cur_val); 23059 end; 23060 23061 @ @<Let |n| be the largest...@>= 23062 if cur_chr=cat_code_base then n:=max_char_code 23063 else if cur_chr=math_code_base then n:=@'100000 23064 else if cur_chr=sf_code_base then n:=@'77777 23065 else if cur_chr=del_code_base then n:=@'77777777 23066 else n:=255 23067 23068 @ @<Assignments@>= 23069 def_family: begin p:=cur_chr; scan_four_bit_int; p:=p+cur_val; 23070 scan_optional_equals; scan_font_ident; define(p,data,cur_val); 23071 end; 23072 23073 @ Next we consider changes to \TeX's numeric registers. 23074 23075 @<Assignments@>= 23076 register,advance,multiply,divide: do_register_command(a); 23077 23078 @ We use the fact that |register<advance<multiply<divide|. 23079 23080 @<Declare subprocedures for |prefixed_command|@>= 23081 procedure do_register_command(@!a:small_number); 23082 label found,exit; 23083 var l,@!q,@!r,@!s:pointer; {for list manipulation} 23084 @!p:int_val..mu_val; {type of register involved} 23085 begin q:=cur_cmd; 23086 @<Compute the register location |l| and its type |p|; but |return| if invalid@>; 23087 if q=register then scan_optional_equals 23088 else if scan_keyword("by") then do_nothing; {optional `\.{by}'} 23089 @.by@> 23090 arith_error:=false; 23091 if q<multiply then @<Compute result of |register| or 23092 |advance|, put it in |cur_val|@> 23093 else @<Compute result of |multiply| or |divide|, put it in |cur_val|@>; 23094 if arith_error then 23095 begin print_err("Arithmetic overflow"); 23096 @.Arithmetic overflow@> 23097 help2("I can't carry out that multiplication or division,")@/ 23098 ("since the result is out of range."); 23099 if p>=glue_val then delete_glue_ref(cur_val); 23100 error; return; 23101 end; 23102 if p<glue_val then word_define(l,cur_val) 23103 else begin trap_zero_glue; define(l,glue_ref,cur_val); 23104 end; 23105 exit: end; 23106 23107 @ Here we use the fact that the consecutive codes |int_val..mu_val| and 23108 |assign_int..assign_mu_glue| correspond to each other nicely. 23109 23110 @<Compute the register location |l| and its type |p|...@>= 23111 begin if q<>register then 23112 begin get_x_token; 23113 if (cur_cmd>=assign_int)and(cur_cmd<=assign_mu_glue) then 23114 begin l:=cur_chr; p:=cur_cmd-assign_int; goto found; 23115 end; 23116 if cur_cmd<>register then 23117 begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr); 23118 @.You can't use x after ...@> 23119 print("' after "); print_cmd_chr(q,0); 23120 help1("I'm forgetting what you said and not changing anything."); 23121 error; return; 23122 end; 23123 end; 23124 p:=cur_chr; scan_eight_bit_int; 23125 case p of 23126 int_val: l:=cur_val+count_base; 23127 dimen_val: l:=cur_val+scaled_base; 23128 glue_val: l:=cur_val+skip_base; 23129 mu_val: l:=cur_val+mu_skip_base; 23130 end; {there are no other cases} 23131 end; 23132 found: 23133 23134 @ @<Compute result of |register| or |advance|...@>= 23135 if p<glue_val then 23136 begin if p=int_val then scan_int@+else scan_normal_dimen; 23137 if q=advance then cur_val:=cur_val+eqtb[l].int; 23138 end 23139 else begin scan_glue(p); 23140 if q=advance then @<Compute the sum of two glue specs@>; 23141 end 23142 23143 @ @<Compute the sum of two glue specs@>= 23144 begin q:=new_spec(cur_val); r:=equiv(l); 23145 delete_glue_ref(cur_val); 23146 width(q):=width(q)+width(r); 23147 if stretch(q)=0 then stretch_order(q):=normal; 23148 if stretch_order(q)=stretch_order(r) then stretch(q):=stretch(q)+stretch(r) 23149 else if (stretch_order(q)<stretch_order(r))and(stretch(r)<>0) then 23150 begin stretch(q):=stretch(r); stretch_order(q):=stretch_order(r); 23151 end; 23152 if shrink(q)=0 then shrink_order(q):=normal; 23153 if shrink_order(q)=shrink_order(r) then shrink(q):=shrink(q)+shrink(r) 23154 else if (shrink_order(q)<shrink_order(r))and(shrink(r)<>0) then 23155 begin shrink(q):=shrink(r); shrink_order(q):=shrink_order(r); 23156 end; 23157 cur_val:=q; 23158 end 23159 23160 @ @<Compute result of |multiply| or |divide|...@>= 23161 begin scan_int; 23162 if p<glue_val then 23163 if q=multiply then 23164 if p=int_val then cur_val:=mult_integers(eqtb[l].int,cur_val) 23165 else cur_val:=nx_plus_y(eqtb[l].int,cur_val,0) 23166 else cur_val:=x_over_n(eqtb[l].int,cur_val) 23167 else begin s:=equiv(l); r:=new_spec(s); 23168 if q=multiply then 23169 begin width(r):=nx_plus_y(width(s),cur_val,0); 23170 stretch(r):=nx_plus_y(stretch(s),cur_val,0); 23171 shrink(r):=nx_plus_y(shrink(s),cur_val,0); 23172 end 23173 else begin width(r):=x_over_n(width(s),cur_val); 23174 stretch(r):=x_over_n(stretch(s),cur_val); 23175 shrink(r):=x_over_n(shrink(s),cur_val); 23176 end; 23177 cur_val:=r; 23178 end; 23179 end 23180 23181 @ The processing of boxes is somewhat different, because we may need 23182 to scan and create an entire box before we actually change the value of the old 23183 one. 23184 23185 @<Assignments@>= 23186 set_box: begin scan_eight_bit_int; 23187 if global then n:=256+cur_val@+else n:=cur_val; 23188 scan_optional_equals; 23189 if set_box_allowed then scan_box(box_flag+n) 23190 else begin print_err("Improper "); print_esc("setbox"); 23191 @.Improper \\setbox@> 23192 help2("Sorry, \setbox is not allowed after \halign in a display,")@/ 23193 ("or between \accent and an accented character."); error; 23194 end; 23195 end; 23196 23197 @ The |space_factor| or |prev_depth| settings are changed when a |set_aux| 23198 command is sensed. Similarly, |prev_graf| is changed in the presence of 23199 |set_prev_graf|, and |dead_cycles| or |insert_penalties| in the presence of 23200 |set_page_int|. These definitions are always global. 23201 23202 When some dimension of a box register is changed, the change isn't exactly 23203 global; but \TeX\ does not look at the \.{\\global} switch. 23204 23205 @<Assignments@>= 23206 set_aux:alter_aux; 23207 set_prev_graf:alter_prev_graf; 23208 set_page_dimen:alter_page_so_far; 23209 set_page_int:alter_integer; 23210 set_box_dimen:alter_box_dimen; 23211 23212 @ @<Declare subprocedures for |prefixed_command|@>= 23213 procedure alter_aux; 23214 var c:halfword; {|hmode| or |vmode|} 23215 begin if cur_chr<>abs(mode) then report_illegal_case 23216 else begin c:=cur_chr; scan_optional_equals; 23217 if c=vmode then 23218 begin scan_normal_dimen; prev_depth:=cur_val; 23219 end 23220 else begin scan_int; 23221 if (cur_val<=0)or(cur_val>32767) then 23222 begin print_err("Bad space factor"); 23223 @.Bad space factor@> 23224 help1("I allow only values in the range 1..32767 here."); 23225 int_error(cur_val); 23226 end 23227 else space_factor:=cur_val; 23228 end; 23229 end; 23230 end; 23231 23232 @ @<Declare subprocedures for |prefixed_command|@>= 23233 procedure alter_prev_graf; 23234 var p:0..nest_size; {index into |nest|} 23235 begin nest[nest_ptr]:=cur_list; p:=nest_ptr; 23236 while abs(nest[p].mode_field)<>vmode do decr(p); 23237 scan_optional_equals; scan_int; 23238 if cur_val<0 then 23239 begin print_err("Bad "); print_esc("prevgraf"); 23240 @.Bad \\prevgraf@> 23241 help1("I allow only nonnegative values here."); 23242 int_error(cur_val); 23243 end 23244 else begin nest[p].pg_field:=cur_val; cur_list:=nest[nest_ptr]; 23245 end; 23246 end; 23247 23248 @ @<Declare subprocedures for |prefixed_command|@>= 23249 procedure alter_page_so_far; 23250 var c:0..7; {index into |page_so_far|} 23251 begin c:=cur_chr; scan_optional_equals; scan_normal_dimen; 23252 page_so_far[c]:=cur_val; 23253 end; 23254 23255 @ @<Declare subprocedures for |prefixed_command|@>= 23256 procedure alter_integer; 23257 var c:0..1; {0 for \.{\\deadcycles}, 1 for \.{\\insertpenalties}} 23258 begin c:=cur_chr; scan_optional_equals; scan_int; 23259 if c=0 then dead_cycles:=cur_val 23260 else insert_penalties:=cur_val; 23261 end; 23262 23263 @ @<Declare subprocedures for |prefixed_command|@>= 23264 procedure alter_box_dimen; 23265 var c:small_number; {|width_offset| or |height_offset| or |depth_offset|} 23266 @!b:eight_bits; {box number} 23267 begin c:=cur_chr; scan_eight_bit_int; b:=cur_val; scan_optional_equals; 23268 scan_normal_dimen; 23269 if box(b)<>null then mem[box(b)+c].sc:=cur_val; 23270 end; 23271 23272 @ Paragraph shapes are set up in the obvious way. 23273 23274 @<Assignments@>= 23275 set_shape: begin scan_optional_equals; scan_int; n:=cur_val; 23276 if n<=0 then p:=null 23277 else begin p:=get_node(2*n+1); info(p):=n; 23278 for j:=1 to n do 23279 begin scan_normal_dimen; 23280 mem[p+2*j-1].sc:=cur_val; {indentation} 23281 scan_normal_dimen; 23282 mem[p+2*j].sc:=cur_val; {width} 23283 end; 23284 end; 23285 define(par_shape_loc,shape_ref,p); 23286 end; 23287 23288 @ Here's something that isn't quite so obvious. It guarantees that 23289 |info(par_shape_ptr)| can hold any positive~|n| for which |get_node(2*n+1)| 23290 doesn't overflow the memory capacity. 23291 23292 @<Check the ``constant''...@>= 23293 if 2*max_halfword<mem_top-mem_min then bad:=41; 23294 23295 @ New hyphenation data is loaded by the |hyph_data| command. 23296 23297 @<Put each...@>= 23298 primitive("hyphenation",hyph_data,0); 23299 @!@:hyphenation_}{\.{\\hyphenation} primitive@> 23300 primitive("patterns",hyph_data,1); 23301 @!@:patterns_}{\.{\\patterns} primitive@> 23302 23303 @ @<Cases of |print_cmd_chr|...@>= 23304 hyph_data: if chr_code=1 then print_esc("patterns") 23305 else print_esc("hyphenation"); 23306 23307 @ @<Assignments@>= 23308 hyph_data: if cur_chr=1 then 23309 begin @!init new_patterns; goto done;@;@+tini@/ 23310 print_err("Patterns can be loaded only by INITEX"); 23311 @.Patterns can be...@> 23312 help0; error; 23313 repeat get_token; until cur_cmd=right_brace; {flush the patterns} 23314 return; 23315 end 23316 else begin new_hyph_exceptions; goto done; 23317 end; 23318 23319 @ All of \TeX's parameters are kept in |eqtb| except the font information, 23320 the interaction mode, and the hyphenation tables; these are strictly global. 23321 23322 @<Assignments@>= 23323 assign_font_dimen: begin find_font_dimen(true); k:=cur_val; 23324 scan_optional_equals; scan_normal_dimen; font_info[k].sc:=cur_val; 23325 end; 23326 assign_font_int: begin n:=cur_chr; scan_font_ident; f:=cur_val; 23327 scan_optional_equals; scan_int; 23328 if n=0 then hyphen_char[f]:=cur_val@+else skew_char[f]:=cur_val; 23329 end; 23330 23331 @ @<Put each...@>= 23332 primitive("hyphenchar",assign_font_int,0); 23333 @!@:hyphen_char_}{\.{\\hyphenchar} primitive@> 23334 primitive("skewchar",assign_font_int,1); 23335 @!@:skew_char_}{\.{\\skewchar} primitive@> 23336 23337 @ @<Cases of |print_cmd_chr|...@>= 23338 assign_font_int: if chr_code=0 then print_esc("hyphenchar") 23339 else print_esc("skewchar"); 23340 23341 @ Here is where the information for a new font gets loaded. 23342 23343 @<Assignments@>= 23344 def_font: new_font(a); 23345 23346 @ @<Declare subprocedures for |prefixed_command|@>= 23347 procedure new_font(@!a:small_number); 23348 label common_ending; 23349 var u:pointer; {user's font identifier} 23350 @!s:scaled; {stated ``at'' size, or negative of scaled magnification} 23351 @!f:internal_font_number; {runs through existing fonts} 23352 @!t:str_number; {name for the frozen font identifier} 23353 @!old_setting:0..max_selector; {holds |selector| setting} 23354 @!flushable_string:str_number; {string not yet referenced} 23355 begin if job_name=0 then open_log_file; 23356 {avoid confusing \.{texput} with the font name} 23357 @.texput@> 23358 get_r_token; u:=cur_cs; 23359 if u>=hash_base then t:=text(u) 23360 else if u>=single_base then 23361 if u=null_cs then t:="FONT"@+else t:=u-single_base 23362 else begin old_setting:=selector; selector:=new_string; 23363 print("FONT"); print(u-active_base); selector:=old_setting; 23364 @.FONTx@> 23365 str_room(1); t:=make_string; 23366 end; 23367 define(u,set_font,null_font); scan_optional_equals; scan_file_name; 23368 @<Scan the font size specification@>; 23369 @<If this font has already been loaded, set |f| to the internal 23370 font number and |goto common_ending|@>; 23371 f:=read_font_info(u,cur_name,cur_area,s); 23372 common_ending: equiv(u):=f; eqtb[font_id_base+f]:=eqtb[u]; font_id_text(f):=t; 23373 end; 23374 23375 @ @<Scan the font size specification@>= 23376 name_in_progress:=true; {this keeps |cur_name| from being changed} 23377 if scan_keyword("at") then @<Put the \(p)(positive) `at' size into |s|@> 23378 @.at@> 23379 else if scan_keyword("scaled") then 23380 @.scaled@> 23381 begin scan_int; s:=-cur_val; 23382 if (cur_val<=0)or(cur_val>32768) then 23383 begin print_err("Illegal magnification has been changed to 1000");@/ 23384 @.Illegal magnification...@> 23385 help1("The magnification ratio must be between 1 and 32768."); 23386 int_error(cur_val); s:=-1000; 23387 end; 23388 end 23389 else s:=-1000; 23390 name_in_progress:=false 23391 23392 @ @<Put the \(p)(positive) `at' size into |s|@>= 23393 begin scan_normal_dimen; s:=cur_val; 23394 if (s<=0)or(s>=@'1000000000) then 23395 begin print_err("Improper `at' size ("); 23396 print_scaled(s); print("pt), replaced by 10pt"); 23397 @.Improper `at' size...@> 23398 help2("I can only handle fonts at positive sizes that are")@/ 23399 ("less than 2048pt, so I've changed what you said to 10pt."); 23400 error; s:=10*unity; 23401 end; 23402 end 23403 23404 @ When the user gives a new identifier to a font that was previously loaded, 23405 the new name becomes the font identifier of record. Font names `\.{xyz}' and 23406 `\.{XYZ}' are considered to be different. 23407 23408 @<If this font has already been loaded...@>= 23409 flushable_string:=str_ptr-1; 23410 for f:=font_base+1 to font_ptr do 23411 if str_eq_str(font_name[f],cur_name)and str_eq_str(font_area[f],cur_area) then 23412 begin if cur_name=flushable_string then 23413 begin flush_string; cur_name:=font_name[f]; 23414 end; 23415 if s>0 then 23416 begin if s=font_size[f] then goto common_ending; 23417 end 23418 else if font_size[f]=xn_over_d(font_dsize[f],-s,1000) then 23419 goto common_ending; 23420 end 23421 23422 @ @<Cases of |print_cmd_chr|...@>= 23423 set_font:begin print("select font "); slow_print(font_name[chr_code]); 23424 if font_size[chr_code]<>font_dsize[chr_code] then 23425 begin print(" at "); print_scaled(font_size[chr_code]); 23426 print("pt"); 23427 end; 23428 end; 23429 23430 @ @<Put each...@>= 23431 primitive("batchmode",set_interaction,batch_mode); 23432 @!@:batch_mode_}{\.{\\batchmode} primitive@> 23433 primitive("nonstopmode",set_interaction,nonstop_mode); 23434 @!@:nonstop_mode_}{\.{\\nonstopmode} primitive@> 23435 primitive("scrollmode",set_interaction,scroll_mode); 23436 @!@:scroll_mode_}{\.{\\scrollmode} primitive@> 23437 primitive("errorstopmode",set_interaction,error_stop_mode); 23438 @!@:error_stop_mode_}{\.{\\errorstopmode} primitive@> 23439 23440 @ @<Cases of |print_cmd_chr|...@>= 23441 set_interaction: case chr_code of 23442 batch_mode: print_esc("batchmode"); 23443 nonstop_mode: print_esc("nonstopmode"); 23444 scroll_mode: print_esc("scrollmode"); 23445 othercases print_esc("errorstopmode") 23446 endcases; 23447 23448 @ @<Assignments@>= 23449 set_interaction: new_interaction; 23450 23451 @ @<Declare subprocedures for |prefixed_command|@>= 23452 procedure new_interaction; 23453 begin print_ln; 23454 interaction:=cur_chr; 23455 @<Initialize the print |selector| based on |interaction|@>; 23456 if log_opened then selector:=selector+2; 23457 end; 23458 23459 @ The \.{\\afterassignment} command puts a token into the global 23460 variable |after_token|. This global variable is examined just after 23461 every assignment has been performed. 23462 23463 @<Glob...@>= 23464 @!after_token:halfword; {zero, or a saved token} 23465 23466 @ @<Set init...@>= 23467 after_token:=0; 23468 23469 @ @<Cases of |main_control| that don't...@>= 23470 any_mode(after_assignment):begin get_token; after_token:=cur_tok; 23471 end; 23472 23473 @ @<Insert a token saved by \.{\\afterassignment}, if any@>= 23474 if after_token<>0 then 23475 begin cur_tok:=after_token; back_input; after_token:=0; 23476 end 23477 23478 @ Here is a procedure that might be called `Get the next non-blank non-relax 23479 non-call non-assignment token'. 23480 23481 @<Declare act...@>= 23482 procedure do_assignments; 23483 label exit; 23484 begin loop begin @<Get the next non-blank non-relax...@>; 23485 if cur_cmd<=max_non_prefixed_command then return; 23486 set_box_allowed:=false; prefixed_command; set_box_allowed:=true; 23487 end; 23488 exit:end; 23489 23490 @ @<Cases of |main_control| that don't...@>= 23491 any_mode(after_group):begin get_token; save_for_after(cur_tok); 23492 end; 23493 23494 @ Files for \.{\\read} are opened and closed by the |in_stream| command. 23495 23496 @<Put each...@>= 23497 primitive("openin",in_stream,1); 23498 @!@:open_in_}{\.{\\openin} primitive@> 23499 primitive("closein",in_stream,0); 23500 @!@:close_in_}{\.{\\closein} primitive@> 23501 23502 @ @<Cases of |print_cmd_chr|...@>= 23503 in_stream: if chr_code=0 then print_esc("closein") 23504 else print_esc("openin"); 23505 23506 @ @<Cases of |main_control| that don't...@>= 23507 any_mode(in_stream): open_or_close_in; 23508 23509 @ @<Declare act...@>= 23510 procedure open_or_close_in; 23511 var c:0..1; {1 for \.{\\openin}, 0 for \.{\\closein}} 23512 @!n:0..15; {stream number} 23513 begin c:=cur_chr; scan_four_bit_int; n:=cur_val; 23514 if read_open[n]<>closed then 23515 begin a_close(read_file[n]); read_open[n]:=closed; 23516 end; 23517 if c<>0 then 23518 begin scan_optional_equals; scan_file_name; 23519 if cur_ext="" then cur_ext:=".tex"; 23520 pack_cur_name; 23521 if a_open_in(read_file[n]) then read_open[n]:=just_open; 23522 end; 23523 end; 23524 23525 @ The user can issue messages to the terminal, regardless of the 23526 current mode. 23527 23528 @<Cases of |main_control| that don't...@>= 23529 any_mode(message):issue_message; 23530 23531 @ @<Put each...@>= 23532 primitive("message",message,0); 23533 @!@:message_}{\.{\\message} primitive@> 23534 primitive("errmessage",message,1); 23535 @!@:err_message_}{\.{\\errmessage} primitive@> 23536 23537 @ @<Cases of |print_cmd_chr|...@>= 23538 message: if chr_code=0 then print_esc("message") 23539 else print_esc("errmessage"); 23540 23541 @ @<Declare act...@>= 23542 procedure issue_message; 23543 var old_setting:0..max_selector; {holds |selector| setting} 23544 @!c:0..1; {identifies \.{\\message} and \.{\\errmessage}} 23545 @!s:str_number; {the message} 23546 begin c:=cur_chr; link(garbage):=scan_toks(false,true); 23547 old_setting:=selector; selector:=new_string; 23548 token_show(def_ref); selector:=old_setting; 23549 flush_list(def_ref); 23550 str_room(1); s:=make_string; 23551 if c=0 then @<Print string |s| on the terminal@> 23552 else @<Print string |s| as an error message@>; 23553 flush_string; 23554 end; 23555 23556 @ @<Print string |s| on the terminal@>= 23557 begin if term_offset+length(s)>max_print_line-2 then print_ln 23558 else if (term_offset>0)or(file_offset>0) then print_char(" "); 23559 slow_print(s); update_terminal; 23560 end 23561 23562 @ If \.{\\errmessage} occurs often in |scroll_mode|, without user-defined 23563 \.{\\errhelp}, we don't want to give a long help message each time. So we 23564 give a verbose explanation only once. 23565 23566 @<Glob...@>= 23567 @!long_help_seen:boolean; {has the long \.{\\errmessage} help been used?} 23568 23569 @ @<Set init...@>=long_help_seen:=false; 23570 23571 @ @<Print string |s| as an error message@>= 23572 begin print_err(""); slow_print(s); 23573 if err_help<>null then use_err_help:=true 23574 else if long_help_seen then help1("(That was another \errmessage.)") 23575 else begin if interaction<error_stop_mode then long_help_seen:=true; 23576 help4("This error message was generated by an \errmessage")@/ 23577 ("command, so I can't give any explicit help.")@/ 23578 ("Pretend that you're Hercule Poirot: Examine all clues,")@/ 23579 @^Poirot, Hercule@> 23580 ("and deduce the truth by order and method."); 23581 end; 23582 error; use_err_help:=false; 23583 end 23584 23585 @ The |error| routine calls on |give_err_help| if help is requested from 23586 the |err_help| parameter. 23587 23588 @p procedure give_err_help; 23589 begin token_show(err_help); 23590 end; 23591 23592 @ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by 23593 building a token list and then changing the cases of the letters in it. 23594 23595 @<Cases of |main_control| that don't...@>= 23596 any_mode(case_shift):shift_case; 23597 23598 @ @<Put each...@>= 23599 primitive("lowercase",case_shift,lc_code_base); 23600 @!@:lowercase_}{\.{\\lowercase} primitive@> 23601 primitive("uppercase",case_shift,uc_code_base); 23602 @!@:uppercase_}{\.{\\uppercase} primitive@> 23603 23604 @ @<Cases of |print_cmd_chr|...@>= 23605 case_shift:if chr_code=lc_code_base then print_esc("lowercase") 23606 else print_esc("uppercase"); 23607 23608 @ @<Declare act...@>= 23609 procedure shift_case; 23610 var b:pointer; {|lc_code_base| or |uc_code_base|} 23611 @!p:pointer; {runs through the token list} 23612 @!t:halfword; {token} 23613 @!c:eight_bits; {character code} 23614 begin b:=cur_chr; p:=scan_toks(false,false); p:=link(def_ref); 23615 while p<>null do 23616 begin @<Change the case of the token in |p|, if a change is appropriate@>; 23617 p:=link(p); 23618 end; 23619 back_list(link(def_ref)); free_avail(def_ref); {omit reference count} 23620 end; 23621 23622 @ When the case of a |chr_code| changes, we don't change the |cmd|. 23623 We also change active characters, using the fact that 23624 |cs_token_flag+active_base| is a multiple of~256. 23625 @^data structure assumptions@> 23626 23627 @<Change the case of the token in |p|, if a change is appropriate@>= 23628 t:=info(p); 23629 if t<cs_token_flag+single_base then 23630 begin c:=t mod 256; 23631 if equiv(b+c)<>0 then info(p):=t-c+equiv(b+c); 23632 end 23633 23634 @ We come finally to the last pieces missing from |main_control|, namely the 23635 `\.{\\show}' commands that are useful when debugging. 23636 23637 @<Cases of |main_control| that don't...@>= 23638 any_mode(xray): show_whatever; 23639 23640 @ @d show_code=0 { \.{\\show} } 23641 @d show_box_code=1 { \.{\\showbox} } 23642 @d show_the_code=2 { \.{\\showthe} } 23643 @d show_lists_code=3 { \.{\\showlists} } 23644 23645 @<Put each...@>= 23646 primitive("show",xray,show_code); 23647 @!@:show_}{\.{\\show} primitive@> 23648 primitive("showbox",xray,show_box_code); 23649 @!@:show_box_}{\.{\\showbox} primitive@> 23650 primitive("showthe",xray,show_the_code); 23651 @!@:show_the_}{\.{\\showthe} primitive@> 23652 primitive("showlists",xray,show_lists_code); 23653 @!@:show_lists_code_}{\.{\\showlists} primitive@> 23654 23655 @ @<Cases of |print_cmd_chr|...@>= 23656 xray: case chr_code of 23657 show_box_code:print_esc("showbox"); 23658 show_the_code:print_esc("showthe"); 23659 show_lists_code:print_esc("showlists"); 23660 othercases print_esc("show") 23661 endcases; 23662 23663 @ @<Declare act...@>= 23664 procedure show_whatever; 23665 label common_ending; 23666 var p:pointer; {tail of a token list to show} 23667 begin case cur_chr of 23668 show_lists_code: begin begin_diagnostic; show_activities; 23669 end; 23670 show_box_code: @<Show the current contents of a box@>; 23671 show_code: @<Show the current meaning of a token, then |goto common_ending|@>; 23672 othercases @<Show the current value of some parameter or register, 23673 then |goto common_ending|@> 23674 endcases;@/ 23675 @<Complete a potentially long \.{\\show} command@>; 23676 common_ending: if interaction<error_stop_mode then 23677 begin help0; decr(error_count); 23678 end 23679 else if tracing_online>0 then 23680 begin@t@>@;@/ 23681 help3("This isn't an error message; I'm just \showing something.")@/ 23682 ("Type `I\show...' to show more (e.g., \show\cs,")@/ 23683 ("\showthe\count10, \showbox255, \showlists)."); 23684 end 23685 else begin@t@>@;@/ 23686 help5("This isn't an error message; I'm just \showing something.")@/ 23687 ("Type `I\show...' to show more (e.g., \show\cs,")@/ 23688 ("\showthe\count10, \showbox255, \showlists).")@/ 23689 ("And type `I\tracingonline=1\show...' to show boxes and")@/ 23690 ("lists on your terminal as well as in the transcript file."); 23691 end; 23692 error; 23693 end; 23694 23695 @ @<Show the current meaning of a token...@>= 23696 begin get_token; 23697 if interaction=error_stop_mode then wake_up_terminal; 23698 print_nl("> "); 23699 if cur_cs<>0 then 23700 begin sprint_cs(cur_cs); print_char("="); 23701 end; 23702 print_meaning; goto common_ending; 23703 end 23704 23705 @ @<Cases of |print_cmd_chr|...@>= 23706 undefined_cs: print("undefined"); 23707 call: print("macro"); 23708 long_call: print_esc("long macro"); 23709 outer_call: print_esc("outer macro"); 23710 long_outer_call: begin print_esc("long"); print_esc("outer macro"); 23711 end; 23712 end_template: print_esc("outer endtemplate"); 23713 23714 @ @<Show the current contents of a box@>= 23715 begin scan_eight_bit_int; begin_diagnostic; 23716 print_nl("> \box"); print_int(cur_val); print_char("="); 23717 if box(cur_val)=null then print("void") 23718 else show_box(box(cur_val)); 23719 end 23720 23721 @ @<Show the current value of some parameter...@>= 23722 begin p:=the_toks; 23723 if interaction=error_stop_mode then wake_up_terminal; 23724 print_nl("> "); token_show(temp_head); 23725 flush_list(link(temp_head)); goto common_ending; 23726 end 23727 23728 @ @<Complete a potentially long \.{\\show} command@>= 23729 end_diagnostic(true); print_err("OK"); 23730 @.OK@> 23731 if selector=term_and_log then if tracing_online<=0 then 23732 begin selector:=term_only; print(" (see the transcript file)"); 23733 selector:=term_and_log; 23734 end 23735 23736 @* \[50] Dumping and undumping the tables. 23737 After \.{INITEX} has seen a collection of fonts and macros, it 23738 can write all the necessary information on an auxiliary file so 23739 that production versions of \TeX\ are able to initialize their 23740 memory at high speed. The present section of the program takes 23741 care of such output and input. We shall consider simultaneously 23742 the processes of storing and restoring, 23743 so that the inverse relation between them is clear. 23744 @.INITEX@> 23745 23746 The global variable |format_ident| is a string that is printed right 23747 after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this 23748 string says simply `\.{ (INITEX)}'; for other versions of \TeX\ it says, 23749 for example, `\.{ (preloaded format=plain 1982.11.19)}', showing the year, 23750 month, and day that the format file was created. We have |format_ident=0| 23751 before \TeX's tables are loaded. 23752 23753 @<Glob...@>= 23754 @!format_ident:str_number; 23755 23756 @ @<Set init...@>= 23757 format_ident:=0; 23758 23759 @ @<Initialize table entries...@>= 23760 format_ident:=" (INITEX)"; 23761 23762 @ @<Declare act...@>= 23763 @!init procedure store_fmt_file; 23764 label found1,found2,done1,done2; 23765 var j,@!k,@!l:integer; {all-purpose indices} 23766 @!p,@!q: pointer; {all-purpose pointers} 23767 @!x: integer; {something to dump} 23768 @!w: four_quarters; {four ASCII codes} 23769 begin @<If dumping is not allowed, abort@>; 23770 @<Create the |format_ident|, open the format file, 23771 and inform the user that dumping has begun@>; 23772 @<Dump constants for consistency check@>; 23773 @<Dump the string pool@>; 23774 @<Dump the dynamic memory@>; 23775 @<Dump the table of equivalents@>; 23776 @<Dump the font information@>; 23777 @<Dump the hyphenation tables@>; 23778 @<Dump a couple more things and the closing check word@>; 23779 @<Close the format file@>; 23780 end; 23781 tini 23782 23783 @ Corresponding to the procedure that dumps a format file, we have a function 23784 that reads one in. The function returns |false| if the dumped format is 23785 incompatible with the present \TeX\ table sizes, etc. 23786 23787 @d bad_fmt=6666 {go here if the format file is unacceptable} 23788 @d too_small(#)==begin wake_up_terminal; 23789 wterm_ln('---! Must increase the ',#); 23790 @.Must increase the x@> 23791 goto bad_fmt; 23792 end 23793 23794 @p @t\4@>@<Declare the function called |open_fmt_file|@>@; 23795 function load_fmt_file:boolean; 23796 label bad_fmt,exit; 23797 var j,@!k:integer; {all-purpose indices} 23798 @!p,@!q: pointer; {all-purpose pointers} 23799 @!x: integer; {something undumped} 23800 @!w: four_quarters; {four ASCII codes} 23801 begin @<Undump constants for consistency check@>; 23802 @<Undump the string pool@>; 23803 @<Undump the dynamic memory@>; 23804 @<Undump the table of equivalents@>; 23805 @<Undump the font information@>; 23806 @<Undump the hyphenation tables@>; 23807 @<Undump a couple more things and the closing check word@>; 23808 load_fmt_file:=true; return; {it worked!} 23809 bad_fmt: wake_up_terminal; 23810 wterm_ln('(Fatal format file error; I''m stymied)'); 23811 @.Fatal format file error@> 23812 load_fmt_file:=false; 23813 exit:end; 23814 23815 @ The user is not allowed to dump a format file unless |save_ptr=0|. 23816 This condition implies that |cur_level=level_one|, hence 23817 the |xeq_level| array is constant and it need not be dumped. 23818 23819 @<If dumping is not allowed, abort@>= 23820 if save_ptr<>0 then 23821 begin print_err("You can't dump inside a group"); 23822 @.You can't dump...@> 23823 help1("`{...\dump}' is a no-no."); succumb; 23824 end 23825 23826 @ Format files consist of |memory_word| items, and we use the following 23827 macros to dump words of different types: 23828 23829 @d dump_wd(#)==begin fmt_file^:=#; put(fmt_file);@+end 23830 @d dump_int(#)==begin fmt_file^.int:=#; put(fmt_file);@+end 23831 @d dump_hh(#)==begin fmt_file^.hh:=#; put(fmt_file);@+end 23832 @d dump_qqqq(#)==begin fmt_file^.qqqq:=#; put(fmt_file);@+end 23833 23834 @<Glob...@>= 23835 @!fmt_file:word_file; {for input or output of format information} 23836 23837 @ The inverse macros are slightly more complicated, since we need to check 23838 the range of the values we are reading in. We say `|undump(a)(b)(x)|' to 23839 read an integer value |x| that is supposed to be in the range |a<=x<=b|. 23840 System error messages should be suppressed when undumping. 23841 @^system dependencies@> 23842 23843 @d undump_wd(#)==begin get(fmt_file); #:=fmt_file^;@+end 23844 @d undump_int(#)==begin get(fmt_file); #:=fmt_file^.int;@+end 23845 @d undump_hh(#)==begin get(fmt_file); #:=fmt_file^.hh;@+end 23846 @d undump_qqqq(#)==begin get(fmt_file); #:=fmt_file^.qqqq;@+end 23847 @d undump_end_end(#)==#:=x;@+end 23848 @d undump_end(#)==(x>#) then goto bad_fmt@+else undump_end_end 23849 @d undump(#)==begin undump_int(x); if (x<#) or undump_end 23850 @d undump_size_end_end(#)==too_small(#)@+else undump_end_end 23851 @d undump_size_end(#)==if x># then undump_size_end_end 23852 @d undump_size(#)==begin undump_int(x); 23853 if x<# then goto bad_fmt; undump_size_end 23854 23855 @ The next few sections of the program should make it clear how we use the 23856 dump/undump macros. 23857 23858 @<Dump constants for consistency check@>= 23859 dump_int(@$);@/ 23860 dump_int(mem_bot);@/ 23861 dump_int(mem_top);@/ 23862 dump_int(eqtb_size);@/ 23863 dump_int(hash_prime);@/ 23864 dump_int(hyph_size) 23865 23866 @ Sections of a \.{WEB} program that are ``commented out'' still contribute 23867 strings to the string pool; therefore \.{INITEX} and \TeX\ will have 23868 the same strings. (And it is, of course, a good thing that they do.) 23869 @.WEB@> 23870 @^string pool@> 23871 23872 @<Undump constants for consistency check@>= 23873 x:=fmt_file^.int; 23874 if x<>@$ then goto bad_fmt; {check that strings are the same} 23875 undump_int(x); 23876 if x<>mem_bot then goto bad_fmt; 23877 undump_int(x); 23878 if x<>mem_top then goto bad_fmt; 23879 undump_int(x); 23880 if x<>eqtb_size then goto bad_fmt; 23881 undump_int(x); 23882 if x<>hash_prime then goto bad_fmt; 23883 undump_int(x); 23884 if x<>hyph_size then goto bad_fmt 23885 23886 @ @d dump_four_ASCII== 23887 w.b0:=qi(so(str_pool[k])); w.b1:=qi(so(str_pool[k+1])); 23888 w.b2:=qi(so(str_pool[k+2])); w.b3:=qi(so(str_pool[k+3])); 23889 dump_qqqq(w) 23890 23891 @<Dump the string pool@>= 23892 dump_int(pool_ptr); 23893 dump_int(str_ptr); 23894 for k:=0 to str_ptr do dump_int(str_start[k]); 23895 k:=0; 23896 while k+4<pool_ptr do 23897 begin dump_four_ASCII; k:=k+4; 23898 end; 23899 k:=pool_ptr-4; dump_four_ASCII; 23900 print_ln; print_int(str_ptr); print(" strings of total length "); 23901 print_int(pool_ptr) 23902 23903 @ @d undump_four_ASCII== 23904 undump_qqqq(w); 23905 str_pool[k]:=si(qo(w.b0)); str_pool[k+1]:=si(qo(w.b1)); 23906 str_pool[k+2]:=si(qo(w.b2)); str_pool[k+3]:=si(qo(w.b3)) 23907 23908 @<Undump the string pool@>= 23909 undump_size(0)(pool_size)('string pool size')(pool_ptr); 23910 undump_size(0)(max_strings)('max strings')(str_ptr); 23911 for k:=0 to str_ptr do undump(0)(pool_ptr)(str_start[k]); 23912 k:=0; 23913 while k+4<pool_ptr do 23914 begin undump_four_ASCII; k:=k+4; 23915 end; 23916 k:=pool_ptr-4; undump_four_ASCII; 23917 init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr 23918 23919 @ By sorting the list of available spaces in the variable-size portion of 23920 |mem|, we are usually able to get by without having to dump very much 23921 of the dynamic memory. 23922 23923 We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid 23924 information even when it has not been gathering statistics. 23925 23926 @<Dump the dynamic memory@>= 23927 sort_avail; var_used:=0; 23928 dump_int(lo_mem_max); dump_int(rover); 23929 p:=mem_bot; q:=rover; x:=0; 23930 repeat for k:=p to q+1 do dump_wd(mem[k]); 23931 x:=x+q+2-p; var_used:=var_used+q-p; 23932 p:=q+node_size(q); q:=rlink(q); 23933 until q=rover; 23934 var_used:=var_used+lo_mem_max-p; dyn_used:=mem_end+1-hi_mem_min;@/ 23935 for k:=p to lo_mem_max do dump_wd(mem[k]); 23936 x:=x+lo_mem_max+1-p; 23937 dump_int(hi_mem_min); dump_int(avail); 23938 for k:=hi_mem_min to mem_end do dump_wd(mem[k]); 23939 x:=x+mem_end+1-hi_mem_min; 23940 p:=avail; 23941 while p<>null do 23942 begin decr(dyn_used); p:=link(p); 23943 end; 23944 dump_int(var_used); dump_int(dyn_used); 23945 print_ln; print_int(x); 23946 print(" memory locations dumped; current usage is "); 23947 print_int(var_used); print_char("&"); print_int(dyn_used) 23948 23949 @ @<Undump the dynamic memory@>= 23950 undump(lo_mem_stat_max+1000)(hi_mem_stat_min-1)(lo_mem_max); 23951 undump(lo_mem_stat_max+1)(lo_mem_max)(rover); 23952 p:=mem_bot; q:=rover; 23953 repeat for k:=p to q+1 do undump_wd(mem[k]); 23954 p:=q+node_size(q); 23955 if (p>lo_mem_max)or((q>=rlink(q))and(rlink(q)<>rover)) then goto bad_fmt; 23956 q:=rlink(q); 23957 until q=rover; 23958 for k:=p to lo_mem_max do undump_wd(mem[k]); 23959 if mem_min<mem_bot-2 then {make more low memory available} 23960 begin p:=llink(rover); q:=mem_min+1; 23961 link(mem_min):=null; info(mem_min):=null; {we don't use the bottom word} 23962 rlink(p):=q; llink(rover):=q;@/ 23963 rlink(q):=rover; llink(q):=p; link(q):=empty_flag; 23964 node_size(q):=mem_bot-q; 23965 end; 23966 undump(lo_mem_max+1)(hi_mem_stat_min)(hi_mem_min); 23967 undump(null)(mem_top)(avail); mem_end:=mem_top; 23968 for k:=hi_mem_min to mem_end do undump_wd(mem[k]); 23969 undump_int(var_used); undump_int(dyn_used) 23970 23971 @ @<Dump the table of equivalents@>= 23972 @<Dump regions 1 to 4 of |eqtb|@>; 23973 @<Dump regions 5 and 6 of |eqtb|@>; 23974 dump_int(par_loc); dump_int(write_loc);@/ 23975 @<Dump the hash table@> 23976 23977 @ @<Undump the table of equivalents@>= 23978 @<Undump regions 1 to 6 of |eqtb|@>; 23979 undump(hash_base)(frozen_control_sequence)(par_loc); 23980 par_token:=cs_token_flag+par_loc;@/ 23981 undump(hash_base)(frozen_control_sequence)(write_loc);@/ 23982 @<Undump the hash table@> 23983 23984 @ The table of equivalents usually contains repeated information, so we dump it 23985 in compressed form: The sequence of $n+2$ values $(n,x_1,\ldots,x_n,m)$ in the 23986 format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra 23987 copies of $x_n$, namely $(x_1,\ldots,x_n,x_n,\ldots,x_n)$. 23988 23989 @<Dump regions 1 to 4 of |eqtb|@>= 23990 k:=active_base; 23991 repeat j:=k; 23992 while j<int_base-1 do 23993 begin if (equiv(j)=equiv(j+1))and(eq_type(j)=eq_type(j+1))and@| 23994 (eq_level(j)=eq_level(j+1)) then goto found1; 23995 incr(j); 23996 end; 23997 l:=int_base; goto done1; {|j=int_base-1|} 23998 found1: incr(j); l:=j; 23999 while j<int_base-1 do 24000 begin if (equiv(j)<>equiv(j+1))or(eq_type(j)<>eq_type(j+1))or@| 24001 (eq_level(j)<>eq_level(j+1)) then goto done1; 24002 incr(j); 24003 end; 24004 done1:dump_int(l-k); 24005 while k<l do 24006 begin dump_wd(eqtb[k]); incr(k); 24007 end; 24008 k:=j+1; dump_int(k-l); 24009 until k=int_base 24010 24011 @ @<Dump regions 5 and 6 of |eqtb|@>= 24012 repeat j:=k; 24013 while j<eqtb_size do 24014 begin if eqtb[j].int=eqtb[j+1].int then goto found2; 24015 incr(j); 24016 end; 24017 l:=eqtb_size+1; goto done2; {|j=eqtb_size|} 24018 found2: incr(j); l:=j; 24019 while j<eqtb_size do 24020 begin if eqtb[j].int<>eqtb[j+1].int then goto done2; 24021 incr(j); 24022 end; 24023 done2:dump_int(l-k); 24024 while k<l do 24025 begin dump_wd(eqtb[k]); incr(k); 24026 end; 24027 k:=j+1; dump_int(k-l); 24028 until k>eqtb_size 24029 24030 @ @<Undump regions 1 to 6 of |eqtb|@>= 24031 k:=active_base; 24032 repeat undump_int(x); 24033 if (x<1)or(k+x>eqtb_size+1) then goto bad_fmt; 24034 for j:=k to k+x-1 do undump_wd(eqtb[j]); 24035 k:=k+x; 24036 undump_int(x); 24037 if (x<0)or(k+x>eqtb_size+1) then goto bad_fmt; 24038 for j:=k to k+x-1 do eqtb[j]:=eqtb[k-1]; 24039 k:=k+x; 24040 until k>eqtb_size 24041 24042 @ A different scheme is used to compress the hash table, since its lower 24043 region is usually sparse. When |text(p)<>0| for |p<=hash_used|, we output 24044 two words, |p| and |hash[p]|. The hash table is, of course, densely packed 24045 for |p>=hash_used|, so the remaining entries are output in a~block. 24046 24047 @<Dump the hash table@>= 24048 dump_int(hash_used); cs_count:=frozen_control_sequence-1-hash_used; 24049 for p:=hash_base to hash_used do if text(p)<>0 then 24050 begin dump_int(p); dump_hh(hash[p]); incr(cs_count); 24051 end; 24052 for p:=hash_used+1 to undefined_control_sequence-1 do dump_hh(hash[p]); 24053 dump_int(cs_count);@/ 24054 print_ln; print_int(cs_count); print(" multiletter control sequences") 24055 24056 @ @<Undump the hash table@>= 24057 undump(hash_base)(frozen_control_sequence)(hash_used); p:=hash_base-1; 24058 repeat undump(p+1)(hash_used)(p); undump_hh(hash[p]); 24059 until p=hash_used; 24060 for p:=hash_used+1 to undefined_control_sequence-1 do undump_hh(hash[p]); 24061 undump_int(cs_count) 24062 24063 @ @<Dump the font information@>= 24064 dump_int(fmem_ptr); 24065 for k:=0 to fmem_ptr-1 do dump_wd(font_info[k]); 24066 dump_int(font_ptr); 24067 for k:=null_font to font_ptr do 24068 @<Dump the array info for internal font number |k|@>; 24069 print_ln; print_int(fmem_ptr-7); print(" words of font info for "); 24070 print_int(font_ptr-font_base); print(" preloaded font"); 24071 if font_ptr<>font_base+1 then print_char("s") 24072 24073 @ @<Undump the font information@>= 24074 undump_size(7)(font_mem_size)('font mem size')(fmem_ptr); 24075 for k:=0 to fmem_ptr-1 do undump_wd(font_info[k]); 24076 undump_size(font_base)(font_max)('font max')(font_ptr); 24077 for k:=null_font to font_ptr do 24078 @<Undump the array info for internal font number |k|@> 24079 24080 @ @<Dump the array info for internal font number |k|@>= 24081 begin dump_qqqq(font_check[k]); 24082 dump_int(font_size[k]); 24083 dump_int(font_dsize[k]); 24084 dump_int(font_params[k]);@/ 24085 dump_int(hyphen_char[k]); 24086 dump_int(skew_char[k]);@/ 24087 dump_int(font_name[k]); 24088 dump_int(font_area[k]);@/ 24089 dump_int(font_bc[k]); 24090 dump_int(font_ec[k]);@/ 24091 dump_int(char_base[k]); 24092 dump_int(width_base[k]); 24093 dump_int(height_base[k]);@/ 24094 dump_int(depth_base[k]); 24095 dump_int(italic_base[k]); 24096 dump_int(lig_kern_base[k]);@/ 24097 dump_int(kern_base[k]); 24098 dump_int(exten_base[k]); 24099 dump_int(param_base[k]);@/ 24100 dump_int(font_glue[k]);@/ 24101 dump_int(bchar_label[k]); 24102 dump_int(font_bchar[k]); 24103 dump_int(font_false_bchar[k]);@/ 24104 print_nl("\font"); print_esc(font_id_text(k)); print_char("="); 24105 print_file_name(font_name[k],font_area[k],""); 24106 if font_size[k]<>font_dsize[k] then 24107 begin print(" at "); print_scaled(font_size[k]); print("pt"); 24108 end; 24109 end 24110 24111 @ @<Undump the array info for internal font number |k|@>= 24112 begin undump_qqqq(font_check[k]);@/ 24113 undump_int(font_size[k]); 24114 undump_int(font_dsize[k]); 24115 undump(min_halfword)(max_halfword)(font_params[k]);@/ 24116 undump_int(hyphen_char[k]); 24117 undump_int(skew_char[k]);@/ 24118 undump(0)(str_ptr)(font_name[k]); 24119 undump(0)(str_ptr)(font_area[k]);@/ 24120 undump(0)(255)(font_bc[k]); 24121 undump(0)(255)(font_ec[k]);@/ 24122 undump_int(char_base[k]); 24123 undump_int(width_base[k]); 24124 undump_int(height_base[k]);@/ 24125 undump_int(depth_base[k]); 24126 undump_int(italic_base[k]); 24127 undump_int(lig_kern_base[k]);@/ 24128 undump_int(kern_base[k]); 24129 undump_int(exten_base[k]); 24130 undump_int(param_base[k]);@/ 24131 undump(min_halfword)(lo_mem_max)(font_glue[k]);@/ 24132 undump(0)(fmem_ptr-1)(bchar_label[k]); 24133 undump(min_quarterword)(non_char)(font_bchar[k]); 24134 undump(min_quarterword)(non_char)(font_false_bchar[k]); 24135 end 24136 24137 @ @<Dump the hyphenation tables@>= 24138 dump_int(hyph_count); 24139 for k:=0 to hyph_size do if hyph_word[k]<>0 then 24140 begin dump_int(k); dump_int(hyph_word[k]); dump_int(hyph_list[k]); 24141 end; 24142 print_ln; print_int(hyph_count); print(" hyphenation exception"); 24143 if hyph_count<>1 then print_char("s"); 24144 if trie_not_ready then init_trie; 24145 dump_int(trie_max); 24146 for k:=0 to trie_max do dump_hh(trie[k]); 24147 dump_int(trie_op_ptr); 24148 for k:=1 to trie_op_ptr do 24149 begin dump_int(hyf_distance[k]); 24150 dump_int(hyf_num[k]); 24151 dump_int(hyf_next[k]); 24152 end; 24153 print_nl("Hyphenation trie of length "); print_int(trie_max); 24154 @.Hyphenation trie...@> 24155 print(" has "); print_int(trie_op_ptr); print(" op"); 24156 if trie_op_ptr<>1 then print_char("s"); 24157 print(" out of "); print_int(trie_op_size); 24158 for k:=255 downto 0 do if trie_used[k]>min_quarterword then 24159 begin print_nl(" "); print_int(qo(trie_used[k])); 24160 print(" for language "); print_int(k); 24161 dump_int(k); dump_int(qo(trie_used[k])); 24162 end 24163 24164 @ Only ``nonempty'' parts of |op_start| need to be restored. 24165 24166 @<Undump the hyphenation tables@>= 24167 undump(0)(hyph_size)(hyph_count); 24168 for k:=1 to hyph_count do 24169 begin undump(0)(hyph_size)(j); 24170 undump(0)(str_ptr)(hyph_word[j]); 24171 undump(min_halfword)(max_halfword)(hyph_list[j]); 24172 end; 24173 undump_size(0)(trie_size)('trie size')(j); @+init trie_max:=j;@+tini 24174 for k:=0 to j do undump_hh(trie[k]); 24175 undump_size(0)(trie_op_size)('trie op size')(j); @+init trie_op_ptr:=j;@+tini 24176 for k:=1 to j do 24177 begin undump(0)(63)(hyf_distance[k]); {a |small_number|} 24178 undump(0)(63)(hyf_num[k]); 24179 undump(min_quarterword)(max_quarterword)(hyf_next[k]); 24180 end; 24181 init for k:=0 to 255 do trie_used[k]:=min_quarterword;@+tini@;@/ 24182 k:=256; 24183 while j>0 do 24184 begin undump(0)(k-1)(k); undump(1)(j)(x);@+init trie_used[k]:=qi(x);@+tini@;@/ 24185 j:=j-x; op_start[k]:=qo(j); 24186 end; 24187 @!init trie_not_ready:=false @+tini 24188 24189 @ We have already printed a lot of statistics, so we set |tracing_stats:=0| 24190 to prevent them from appearing again. 24191 24192 @<Dump a couple more things and the closing check word@>= 24193 dump_int(interaction); dump_int(format_ident); dump_int(69069); 24194 tracing_stats:=0 24195 24196 @ @<Undump a couple more things and the closing check word@>= 24197 undump(batch_mode)(error_stop_mode)(interaction); 24198 undump(0)(str_ptr)(format_ident); 24199 undump_int(x); 24200 if (x<>69069)or eof(fmt_file) then goto bad_fmt 24201 24202 @ @<Create the |format_ident|...@>= 24203 selector:=new_string; 24204 print(" (preloaded format="); print(job_name); print_char(" "); 24205 print_int(year); print_char("."); 24206 print_int(month); print_char("."); print_int(day); print_char(")"); 24207 if interaction=batch_mode then selector:=log_only 24208 else selector:=term_and_log; 24209 str_room(1); 24210 format_ident:=make_string; 24211 pack_job_name(format_extension); 24212 while not w_open_out(fmt_file) do 24213 prompt_file_name("format file name",format_extension); 24214 print_nl("Beginning to dump on file "); 24215 @.Beginning to dump...@> 24216 slow_print(w_make_name_string(fmt_file)); flush_string; 24217 print_nl(""); slow_print(format_ident) 24218 24219 @ @<Close the format file@>= 24220 w_close(fmt_file) 24221 24222 @* \[51] The main program. 24223 This is it: the part of \TeX\ that executes all those procedures we have 24224 written. 24225 24226 Well---almost. Let's leave space for a few more routines that we may 24227 have forgotten. 24228 24229 @p @<Last-minute procedures@> 24230 24231 @ We have noted that there are two versions of \TeX82. One, called \.{INITEX}, 24232 @.INITEX@> 24233 has to be run first; it initializes everything from scratch, without 24234 reading a format file, and it has the capability of dumping a format file. 24235 The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs 24236 @.VIRTEX@> 24237 to input a format file in order to get started. \.{VIRTEX} typically has 24238 more memory capacity than \.{INITEX}, because it does not need the space 24239 consumed by the auxiliary hyphenation tables and the numerous calls on 24240 |primitive|, etc. 24241 24242 The \.{VIRTEX} program cannot read a format file instantaneously, of course; 24243 the best implementations therefore allow for production versions of \TeX\ that 24244 not only avoid the loading routine for \PASCAL\ object code, they also have 24245 a format file pre-loaded. This is impossible to do if we stick to standard 24246 \PASCAL; but there is a simple way to fool many systems into avoiding the 24247 initialization, as follows:\quad(1)~We declare a global integer variable 24248 called |ready_already|. The probability is negligible that this 24249 variable holds any particular value like 314159 when \.{VIRTEX} is first 24250 loaded.\quad(2)~After we have read in a format file and initialized 24251 everything, we set |ready_already:=314159|.\quad(3)~Soon \.{VIRTEX} 24252 will print `\.*', waiting for more input; and at this point we 24253 interrupt the program and save its core image in some form that the 24254 operating system can reload speedily.\quad(4)~When that core image is 24255 activated, the program starts again at the beginning; but now 24256 |ready_already=314159| and all the other global variables have 24257 their initial values too. The former chastity has vanished! 24258 24259 In other words, if we allow ourselves to test the condition 24260 |ready_already=314159|, before |ready_already| has been 24261 assigned a value, we can avoid the lengthy initialization. Dirty tricks 24262 rarely pay off so handsomely. 24263 @^dirty \PASCAL@> 24264 @^system dependencies@> 24265 24266 On systems that allow such preloading, the standard program called \.{TeX} 24267 should be the one that has \.{plain} format preloaded, since that agrees 24268 with {\sl The \TeX book}. Other versions, e.g., \.{AmSTeX}, should also 24269 @:TeXbook}{\sl The \TeX book@> 24270 @.AmSTeX@> 24271 @.plain@> 24272 be provided for commonly used formats. 24273 24274 @<Glob...@>= 24275 @!ready_already:integer; {a sacrifice of purity for economy} 24276 24277 @ Now this is really it: \TeX\ starts and ends here. 24278 24279 The initial test involving |ready_already| should be deleted if the 24280 \PASCAL\ runtime system is smart enough to detect such a ``mistake.'' 24281 @^system dependencies@> 24282 24283 @p begin @!{|start_here|} 24284 history:=fatal_error_stop; {in case we quit during initialization} 24285 t_open_out; {open the terminal for output} 24286 if ready_already=314159 then goto start_of_TEX; 24287 @<Check the ``constant'' values...@>@; 24288 if bad>0 then 24289 begin wterm_ln('Ouch---my internal constants have been clobbered!', 24290 '---case ',bad:1); 24291 @.Ouch...clobbered@> 24292 goto final_end; 24293 end; 24294 initialize; {set global variables to their starting values} 24295 @!init if not get_strings_started then goto final_end; 24296 init_prim; {call |primitive| for each primitive} 24297 init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr; fix_date_and_time; 24298 tini@/ 24299 ready_already:=314159; 24300 start_of_TEX: @<Initialize the output routines@>; 24301 @<Get the first line of input and prepare to start@>; 24302 history:=spotless; {ready to go!} 24303 main_control; {come to life} 24304 final_cleanup; {prepare for death} 24305 end_of_TEX: close_files_and_terminate; 24306 final_end: ready_already:=0; 24307 end. 24308 24309 @ Here we do whatever is needed to complete \TeX's job gracefully on the 24310 local operating system. The code here might come into play after a fatal 24311 error; it must therefore consist entirely of ``safe'' operations that 24312 cannot produce error messages. For example, it would be a mistake to call 24313 |str_room| or |make_string| at this time, because a call on |overflow| 24314 might lead to an infinite loop. 24315 @^system dependencies@> 24316 (Actually there's one way to get error messages, via |prepare_mag|; 24317 but that can't cause infinite recursion.) 24318 @^recursion@> 24319 24320 If |final_cleanup| is bypassed, this program doesn't bother to close 24321 the input files that may still be open. 24322 24323 @<Last-minute...@>= 24324 procedure close_files_and_terminate; 24325 var k:integer; {all-purpose index} 24326 begin @<Finish the extensions@>; new_line_char:=-1; 24327 @!stat if tracing_stats>0 then @<Output statistics about this job@>;@;@+tats@/ 24328 wake_up_terminal; @<Finish the \.{DVI} file@>; 24329 if log_opened then 24330 begin wlog_cr; a_close(log_file); selector:=selector-2; 24331 if selector=term_only then 24332 begin print_nl("Transcript written on "); 24333 @.Transcript written...@> 24334 slow_print(log_name); print_char("."); 24335 end; 24336 end; 24337 end; 24338 24339 @ The present section goes directly to the log file instead of using 24340 |print| commands, because there's no need for these strings to take 24341 up |str_pool| memory when a non-{\bf stat} version of \TeX\ is being used. 24342 24343 @<Output statistics...@>= 24344 if log_opened then 24345 begin wlog_ln(' '); 24346 wlog_ln('Here is how much of TeX''s memory',' you used:'); 24347 @.Here is how much...@> 24348 wlog(' ',str_ptr-init_str_ptr:1,' string'); 24349 if str_ptr<>init_str_ptr+1 then wlog('s'); 24350 wlog_ln(' out of ', max_strings-init_str_ptr:1);@/ 24351 wlog_ln(' ',pool_ptr-init_pool_ptr:1,' string characters out of ', 24352 pool_size-init_pool_ptr:1);@/ 24353 wlog_ln(' ',lo_mem_max-mem_min+mem_end-hi_mem_min+2:1,@| 24354 ' words of memory out of ',mem_end+1-mem_min:1);@/ 24355 wlog_ln(' ',cs_count:1,' multiletter control sequences out of ', 24356 hash_size:1);@/ 24357 wlog(' ',fmem_ptr:1,' words of font info for ', 24358 font_ptr-font_base:1,' font'); 24359 if font_ptr<>font_base+1 then wlog('s'); 24360 wlog_ln(', out of ',font_mem_size:1,' for ',font_max-font_base:1);@/ 24361 wlog(' ',hyph_count:1,' hyphenation exception'); 24362 if hyph_count<>1 then wlog('s'); 24363 wlog_ln(' out of ',hyph_size:1);@/ 24364 wlog_ln(' ',max_in_stack:1,'i,',max_nest_stack:1,'n,',@| 24365 max_param_stack:1,'p,',@| 24366 max_buf_stack+1:1,'b,',@| 24367 max_save_stack+6:1,'s stack positions out of ',@| 24368 stack_size:1,'i,', 24369 nest_size:1,'n,', 24370 param_size:1,'p,', 24371 buf_size:1,'b,', 24372 save_size:1,'s'); 24373 end 24374 24375 @ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has 24376 been scanned and |its_all_over|\kern-2pt. 24377 24378 @<Last-minute...@>= 24379 procedure final_cleanup; 24380 label exit; 24381 var c:small_number; {0 for \.{\\end}, 1 for \.{\\dump}} 24382 begin c:=cur_chr; if c<>1 then new_line_char:=-1; 24383 if job_name=0 then open_log_file; 24384 while input_ptr>0 do 24385 if state=token_list then end_token_list@+else end_file_reading; 24386 while open_parens>0 do 24387 begin print(" )"); decr(open_parens); 24388 end; 24389 if cur_level>level_one then 24390 begin print_nl("("); print_esc("end occurred "); 24391 print("inside a group at level "); 24392 @:end_}{\.{(\\end occurred...)}@> 24393 print_int(cur_level-level_one); print_char(")"); 24394 end; 24395 while cond_ptr<>null do 24396 begin print_nl("("); print_esc("end occurred "); 24397 print("when "); print_cmd_chr(if_test,cur_if); 24398 if if_line<>0 then 24399 begin print(" on line "); print_int(if_line); 24400 end; 24401 print(" was incomplete)"); 24402 if_line:=if_line_field(cond_ptr); 24403 cur_if:=subtype(cond_ptr); temp_ptr:=cond_ptr; 24404 cond_ptr:=link(cond_ptr); free_node(temp_ptr,if_node_size); 24405 end; 24406 if history<>spotless then 24407 if ((history=warning_issued)or(interaction<error_stop_mode)) then 24408 if selector=term_and_log then 24409 begin selector:=term_only; 24410 print_nl("(see the transcript file for additional information)"); 24411 @.see the transcript file...@> 24412 selector:=term_and_log; 24413 end; 24414 if c=1 then 24415 begin @!init for c:=top_mark_code to split_bot_mark_code do 24416 if cur_mark[c]<>null then delete_token_ref(cur_mark[c]); 24417 if last_glue<>max_halfword then delete_glue_ref(last_glue); 24418 store_fmt_file; return;@+tini@/ 24419 print_nl("(\dump is performed only by INITEX)"); return; 24420 @:dump_}{\.{\\dump...only by INITEX}@> 24421 end; 24422 exit:end; 24423 24424 @ @<Last-minute...@>= 24425 @!init procedure init_prim; {initialize all the primitives} 24426 begin no_new_control_sequence:=false; 24427 @<Put each...@>; 24428 no_new_control_sequence:=true; 24429 end; 24430 tini 24431 24432 @ When we begin the following code, \TeX's tables may still contain garbage; 24433 the strings might not even be present. Thus we must proceed cautiously to get 24434 bootstrapped in. 24435 24436 But when we finish this part of the program, \TeX\ is ready to call on the 24437 |main_control| routine to do its work. 24438 24439 @<Get the first line...@>= 24440 begin @<Initialize the input routines@>; 24441 if (format_ident=0)or(buffer[loc]="&") then 24442 begin if format_ident<>0 then initialize; {erase preloaded format} 24443 if not open_fmt_file then goto final_end; 24444 if not load_fmt_file then 24445 begin w_close(fmt_file); goto final_end; 24446 end; 24447 w_close(fmt_file); 24448 while (loc<limit)and(buffer[loc]=" ") do incr(loc); 24449 end; 24450 if end_line_char_inactive then decr(limit) 24451 else buffer[limit]:=end_line_char; 24452 fix_date_and_time;@/ 24453 @<Compute the magic offset@>; 24454 @<Initialize the print |selector|...@>; 24455 if (loc<limit)and(cat_code(buffer[loc])<>escape) then start_input; 24456 {\.{\\input} assumed} 24457 end 24458 24459 @* \[52] Debugging. 24460 Once \TeX\ is working, you should be able to diagnose most errors with 24461 the \.{\\show} commands and other diagnostic features. But for the initial 24462 stages of debugging, and for the revelation of really deep mysteries, you 24463 can compile \TeX\ with a few more aids, including the \PASCAL\ runtime 24464 checks and its debugger. An additional routine called |debug_help| 24465 will also come into play when you type `\.D' after an error message; 24466 |debug_help| also occurs just before a fatal error causes \TeX\ to succumb. 24467 @^debugging@> 24468 @^system dependencies@> 24469 24470 The interface to |debug_help| is primitive, but it is good enough when used 24471 with a \PASCAL\ debugger that allows you to set breakpoints and to read 24472 variables and change their values. After getting the prompt `\.{debug \#}', you 24473 type either a negative number (this exits |debug_help|), or zero (this 24474 goes to a location where you can set a breakpoint, thereby entering into 24475 dialog with the \PASCAL\ debugger), or a positive number |m| followed by 24476 an argument |n|. The meaning of |m| and |n| will be clear from the 24477 program below. (If |m=13|, there is an additional argument, |l|.) 24478 @.debug \#@> 24479 24480 @d breakpoint=888 {place where a breakpoint is desirable} 24481 24482 @<Last-minute...@>= 24483 @!debug procedure debug_help; {routine to display various things} 24484 label breakpoint,exit; 24485 var k,@!l,@!m,@!n:integer; 24486 begin clear_terminal; 24487 loop begin wake_up_terminal; 24488 print_nl("debug # (-1 to exit):"); update_terminal; 24489 @.debug \#@> 24490 read(term_in,m); 24491 if m<0 then return 24492 else if m=0 then 24493 begin goto breakpoint;@/ {go to every declared label at least once} 24494 breakpoint: m:=0; @{'BREAKPOINT'@}@/ 24495 end 24496 else begin read(term_in,n); 24497 case m of 24498 @t\4@>@<Numbered cases for |debug_help|@>@; 24499 othercases print("?") 24500 endcases; 24501 end; 24502 end; 24503 exit:end; 24504 gubed 24505 24506 @ @<Numbered cases...@>= 24507 1: print_word(mem[n]); {display |mem[n]| in all forms} 24508 2: print_int(info(n)); 24509 3: print_int(link(n)); 24510 4: print_word(eqtb[n]); 24511 5: print_word(font_info[n]); 24512 6: print_word(save_stack[n]); 24513 7: show_box(n); 24514 {show a box, abbreviated by |show_box_depth| and |show_box_breadth|} 24515 8: begin breadth_max:=10000; depth_threshold:=pool_size-pool_ptr-10; 24516 show_node_list(n); {show a box in its entirety} 24517 end; 24518 9: show_token_list(n,null,1000); 24519 10: slow_print(n); 24520 11: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|} 24521 12: search_mem(n); {look for pointers to |n|} 24522 13: begin read(term_in,l); print_cmd_chr(n,l); 24523 end; 24524 14: for k:=0 to n do print(buffer[k]); 24525 15: begin font_in_short_display:=null_font; short_display(n); 24526 end; 24527 16: panicking:=not panicking; 24528 24529 @* \[53] Extensions. 24530 The program above includes a bunch of ``hooks'' that allow further 24531 capabilities to be added without upsetting \TeX's basic structure. 24532 Most of these hooks are concerned with ``whatsit'' nodes, which are 24533 intended to be used for special purposes; whenever a new extension to 24534 \TeX\ involves a new kind of whatsit node, a corresponding change needs 24535 to be made to the routines below that deal with such nodes, 24536 but it will usually be unnecessary to make many changes to the 24537 other parts of this program. 24538 24539 In order to demonstrate how extensions can be made, we shall treat 24540 `\.{\\write}', `\.{\\openout}', `\.{\\closeout}', `\.{\\immediate}', 24541 `\.{\\special}', and `\.{\\setlanguage}' as if they were extensions. 24542 These commands are actually primitives of \TeX, and they should 24543 appear in all implementations of the system; but let's try to imagine 24544 that they aren't. Then the program below illustrates how a person 24545 could add them. 24546 24547 Sometimes, of course, an extension will require changes to \TeX\ itself; 24548 no system of hooks could be complete enough for all conceivable extensions. 24549 The features associated with `\.{\\write}' are almost all confined to the 24550 following paragraphs, but there are small parts of the |print_ln| and 24551 |print_char| procedures that were introduced specifically to \.{\\write} 24552 characters. Furthermore one of the token lists recognized by the scanner 24553 is a |write_text|; and there are a few other miscellaneous places where we 24554 have already provided for some aspect of \.{\\write}. The goal of a \TeX\ 24555 extender should be to minimize alterations to the standard parts of the 24556 program, and to avoid them completely if possible. He or she should also 24557 be quite sure that there's no easy way to accomplish the desired goals 24558 with the standard features that \TeX\ already has. ``Think thrice before 24559 extending,'' because that may save a lot of work, and it will also keep 24560 incompatible extensions of \TeX\ from proliferating. 24561 @^system dependencies@> 24562 @^extensions to \TeX@> 24563 24564 @ First let's consider the format of whatsit nodes that are used to represent 24565 the data associated with \.{\\write} and its relatives. Recall that a whatsit 24566 has |type=whatsit_node|, and the |subtype| is supposed to distinguish 24567 different kinds of whatsits. Each node occupies two or more words; the 24568 exact number is immaterial, as long as it is readily determined from the 24569 |subtype| or other data. 24570 24571 We shall introduce five |subtype| values here, corresponding to the 24572 control sequences \.{\\openout}, \.{\\write}, \.{\\closeout}, \.{\\special}, and 24573 \.{\\setlanguage}. The second word of I/O whatsits has a |write_stream| field 24574 that identifies the write-stream number (0 to 15, or 16 for out-of-range and 24575 positive, or 17 for out-of-range and negative). 24576 In the case of \.{\\write} and \.{\\special}, there is also a field that 24577 points to the reference count of a token list that should be sent. In the 24578 case of \.{\\openout}, we need three words and three auxiliary subfields 24579 to hold the string numbers for name, area, and extension. 24580 24581 @d write_node_size=2 {number of words in a write/whatsit node} 24582 @d open_node_size=3 {number of words in an open/whatsit node} 24583 @d open_node=0 {|subtype| in whatsits that represent files to \.{\\openout}} 24584 @d write_node=1 {|subtype| in whatsits that represent things to \.{\\write}} 24585 @d close_node=2 {|subtype| in whatsits that represent streams to \.{\\closeout}} 24586 @d special_node=3 {|subtype| in whatsits that represent \.{\\special} things} 24587 @d language_node=4 {|subtype| in whatsits that change the current language} 24588 @d what_lang(#)==link(#+1) {language number, in the range |0..255|} 24589 @d what_lhm(#)==type(#+1) {minimum left fragment, in the range |1..63|} 24590 @d what_rhm(#)==subtype(#+1) {minimum right fragment, in the range |1..63|} 24591 @d write_tokens(#) == link(#+1) {reference count of token list to write} 24592 @d write_stream(#) == info(#+1) {stream number (0 to 17)} 24593 @d open_name(#) == link(#+1) {string number of file name to open} 24594 @d open_area(#) == info(#+2) {string number of file area for |open_name|} 24595 @d open_ext(#) == link(#+2) {string number of file extension for |open_name|} 24596 24597 @ The sixteen possible \.{\\write} streams are represented by the |write_file| 24598 array. The |j|th file is open if and only if |write_open[j]=true|. The last 24599 two streams are special; |write_open[16]| represents a stream number 24600 greater than 15, while |write_open[17]| represents a negative stream number, 24601 and both of these variables are always |false|. 24602 24603 @<Glob...@>= 24604 @!write_file:array[0..15] of alpha_file; 24605 @!write_open:array[0..17] of boolean; 24606 24607 @ @<Set init...@>= 24608 for k:=0 to 17 do write_open[k]:=false; 24609 24610 @ Extensions might introduce new command codes; but it's best to use 24611 |extension| with a modifier, whenever possible, so that |main_control| 24612 stays the same. 24613 24614 @d immediate_code=4 {command modifier for \.{\\immediate}} 24615 @d set_language_code=5 {command modifier for \.{\\setlanguage}} 24616 24617 @<Put each...@>= 24618 primitive("openout",extension,open_node);@/ 24619 @!@:open_out_}{\.{\\openout} primitive@> 24620 primitive("write",extension,write_node); write_loc:=cur_val;@/ 24621 @!@:write_}{\.{\\write} primitive@> 24622 primitive("closeout",extension,close_node);@/ 24623 @!@:close_out_}{\.{\\closeout} primitive@> 24624 primitive("special",extension,special_node);@/ 24625 @!@:special_}{\.{\\special} primitive@> 24626 primitive("immediate",extension,immediate_code);@/ 24627 @!@:immediate_}{\.{\\immediate} primitive@> 24628 primitive("setlanguage",extension,set_language_code);@/ 24629 @!@:set_language_}{\.{\\setlanguage} primitive@> 24630 24631 @ The variable |write_loc| just introduced is used to provide an 24632 appropriate error message in case of ``runaway'' write texts. 24633 24634 @<Glob...@>= 24635 @!write_loc:pointer; {|eqtb| address of \.{\\write}} 24636 24637 @ @<Cases of |print_cmd_chr|...@>= 24638 extension: case chr_code of 24639 open_node:print_esc("openout"); 24640 write_node:print_esc("write"); 24641 close_node:print_esc("closeout"); 24642 special_node:print_esc("special"); 24643 immediate_code:print_esc("immediate"); 24644 set_language_code:print_esc("setlanguage"); 24645 othercases print("[unknown extension!]") 24646 endcases; 24647 24648 @ When an |extension| command occurs in |main_control|, in any mode, 24649 the |do_extension| routine is called. 24650 24651 @<Cases of |main_control| that are for extensions...@>= 24652 any_mode(extension):do_extension; 24653 24654 @ @<Declare act...@>= 24655 @t\4@>@<Declare procedures needed in |do_extension|@>@; 24656 procedure do_extension; 24657 var i,@!j,@!k:integer; {all-purpose integers} 24658 @!p,@!q,@!r:pointer; {all-purpose pointers} 24659 begin case cur_chr of 24660 open_node:@<Implement \.{\\openout}@>; 24661 write_node:@<Implement \.{\\write}@>; 24662 close_node:@<Implement \.{\\closeout}@>; 24663 special_node:@<Implement \.{\\special}@>; 24664 immediate_code:@<Implement \.{\\immediate}@>; 24665 set_language_code:@<Implement \.{\\setlanguage}@>; 24666 othercases confusion("ext1") 24667 @:this can't happen ext1}{\quad ext1@> 24668 endcases; 24669 end; 24670 24671 @ Here is a subroutine that creates a whatsit node having a given |subtype| 24672 and a given number of words. It initializes only the first word of the whatsit, 24673 and appends it to the current list. 24674 24675 @<Declare procedures needed in |do_extension|@>= 24676 procedure new_whatsit(@!s:small_number;@!w:small_number); 24677 var p:pointer; {the new node} 24678 begin p:=get_node(w); type(p):=whatsit_node; subtype(p):=s; 24679 link(tail):=p; tail:=p; 24680 end; 24681 24682 @ The next subroutine uses |cur_chr| to decide what sort of whatsit is 24683 involved, and also inserts a |write_stream| number. 24684 24685 @<Declare procedures needed in |do_ext...@>= 24686 procedure new_write_whatsit(@!w:small_number); 24687 begin new_whatsit(cur_chr,w); 24688 if w<>write_node_size then scan_four_bit_int 24689 else begin scan_int; 24690 if cur_val<0 then cur_val:=17 24691 else if cur_val>15 then cur_val:=16; 24692 end; 24693 write_stream(tail):=cur_val; 24694 end; 24695 24696 @ @<Implement \.{\\openout}@>= 24697 begin new_write_whatsit(open_node_size); 24698 scan_optional_equals; scan_file_name;@/ 24699 open_name(tail):=cur_name; open_area(tail):=cur_area; open_ext(tail):=cur_ext; 24700 end 24701 24702 @ When `\.{\\write 12\{...\}}' appears, we scan the token list `\.{\{...\}}' 24703 without expanding its macros; the macros will be expanded later when this 24704 token list is rescanned. 24705 24706 @<Implement \.{\\write}@>= 24707 begin k:=cur_cs; new_write_whatsit(write_node_size);@/ 24708 cur_cs:=k; p:=scan_toks(false,false); write_tokens(tail):=def_ref; 24709 end 24710 24711 @ @<Implement \.{\\closeout}@>= 24712 begin new_write_whatsit(write_node_size); write_tokens(tail):=null; 24713 end 24714 24715 @ When `\.{\\special\{...\}}' appears, we expand the macros in the token 24716 list as in \.{\\xdef} and \.{\\mark}. 24717 24718 @<Implement \.{\\special}@>= 24719 begin new_whatsit(special_node,write_node_size); write_stream(tail):=null; 24720 p:=scan_toks(false,true); write_tokens(tail):=def_ref; 24721 end 24722 24723 @ Each new type of node that appears in our data structure must be capable 24724 of being displayed, copied, destroyed, and so on. The routines that we 24725 need for write-oriented whatsits are somewhat like those for mark nodes; 24726 other extensions might, of course, involve more subtlety here. 24727 24728 @<Basic printing...@>= 24729 procedure print_write_whatsit(@!s:str_number;@!p:pointer); 24730 begin print_esc(s); 24731 if write_stream(p)<16 then print_int(write_stream(p)) 24732 else if write_stream(p)=16 then print_char("*") 24733 @.*\relax@> 24734 else print_char("-"); 24735 end; 24736 24737 @ @<Display the whatsit...@>= 24738 case subtype(p) of 24739 open_node:begin print_write_whatsit("openout",p); 24740 print_char("="); print_file_name(open_name(p),open_area(p),open_ext(p)); 24741 end; 24742 write_node:begin print_write_whatsit("write",p); 24743 print_mark(write_tokens(p)); 24744 end; 24745 close_node:print_write_whatsit("closeout",p); 24746 special_node:begin print_esc("special"); 24747 print_mark(write_tokens(p)); 24748 end; 24749 language_node:begin print_esc("setlanguage"); 24750 print_int(what_lang(p)); print(" (hyphenmin "); 24751 print_int(what_lhm(p)); print_char(","); 24752 print_int(what_rhm(p)); print_char(")"); 24753 end; 24754 othercases print("whatsit?") 24755 endcases 24756 24757 @ @<Make a partial copy of the whatsit...@>= 24758 case subtype(p) of 24759 open_node: begin r:=get_node(open_node_size); words:=open_node_size; 24760 end; 24761 write_node,special_node: begin r:=get_node(write_node_size); 24762 add_token_ref(write_tokens(p)); words:=write_node_size; 24763 end; 24764 close_node,language_node: begin r:=get_node(small_node_size); 24765 words:=small_node_size; 24766 end; 24767 othercases confusion("ext2") 24768 @:this can't happen ext2}{\quad ext2@> 24769 endcases 24770 24771 @ @<Wipe out the whatsit...@>= 24772 begin case subtype(p) of 24773 open_node: free_node(p,open_node_size); 24774 write_node,special_node: begin delete_token_ref(write_tokens(p)); 24775 free_node(p,write_node_size); goto done; 24776 end; 24777 close_node,language_node: free_node(p,small_node_size); 24778 othercases confusion("ext3") 24779 @:this can't happen ext3}{\quad ext3@> 24780 endcases;@/ 24781 goto done; 24782 end 24783 24784 @ @<Incorporate a whatsit node into a vbox@>=do_nothing 24785 24786 @ @<Incorporate a whatsit node into an hbox@>=do_nothing 24787 24788 @ @<Let |d| be the width of the whatsit |p|@>=d:=0 24789 24790 @ @d adv_past(#)==@+if subtype(#)=language_node then 24791 begin cur_lang:=what_lang(#); l_hyf:=what_lhm(#); r_hyf:=what_rhm(#);@+end 24792 24793 @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>=@+ 24794 adv_past(cur_p) 24795 24796 @ @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>=@+ 24797 adv_past(s) 24798 24799 @ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>= 24800 goto contribute 24801 24802 @ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>= 24803 goto not_found 24804 24805 @ @<Output the whatsit node |p| in a vlist@>= 24806 out_what(p) 24807 24808 @ @<Output the whatsit node |p| in an hlist@>= 24809 out_what(p) 24810 24811 @ After all this preliminary shuffling, we come finally to the routines 24812 that actually send out the requested data. Let's do \.{\\special} first 24813 (it's easier). 24814 24815 @<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24816 procedure special_out(@!p:pointer); 24817 var old_setting:0..max_selector; {holds print |selector|} 24818 @!k:pool_pointer; {index into |str_pool|} 24819 begin synch_h; synch_v;@/ 24820 old_setting:=selector; selector:=new_string; 24821 show_token_list(link(write_tokens(p)),null,pool_size-pool_ptr); 24822 selector:=old_setting; 24823 str_room(1); 24824 if cur_length<256 then 24825 begin dvi_out(xxx1); dvi_out(cur_length); 24826 end 24827 else begin dvi_out(xxx4); dvi_four(cur_length); 24828 end; 24829 for k:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[k])); 24830 pool_ptr:=str_start[str_ptr]; {erase the string} 24831 end; 24832 24833 @ To write a token list, we must run it through \TeX's scanner, expanding 24834 macros and \.{\\the} and \.{\\number}, etc. This might cause runaways, 24835 if a delimited macro parameter isn't matched, and runaways would be 24836 extremely confusing since we are calling on \TeX's scanner in the middle 24837 of a \.{\\shipout} command. Therefore we will put a dummy control sequence as 24838 a ``stopper,'' right after the token list. This control sequence is 24839 artificially defined to be \.{\\outer}. 24840 @:end_write_}{\.{\\endwrite}@> 24841 24842 @<Initialize table...@>= 24843 text(end_write):="endwrite"; eq_level(end_write):=level_one; 24844 eq_type(end_write):=outer_call; equiv(end_write):=null; 24845 24846 @ @<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24847 procedure write_out(@!p:pointer); 24848 var old_setting:0..max_selector; {holds print |selector|} 24849 @!old_mode:integer; {saved |mode|} 24850 @!j:small_number; {write stream number} 24851 @!q,@!r:pointer; {temporary variables for list manipulation} 24852 begin @<Expand macros in the token list 24853 and make |link(def_ref)| point to the result@>; 24854 old_setting:=selector; j:=write_stream(p); 24855 if write_open[j] then selector:=j 24856 else begin {write to the terminal if file isn't open} 24857 if (j=17)and(selector=term_and_log) then selector:=log_only; 24858 print_nl(""); 24859 end; 24860 token_show(def_ref); print_ln; 24861 flush_list(def_ref); selector:=old_setting; 24862 end; 24863 24864 @ The final line of this routine is slightly subtle; at least, the author 24865 didn't think about it until getting burnt! There is a used-up token list 24866 @^Knuth, Donald Ervin@> 24867 on the stack, namely the one that contained |end_write_token|. (We 24868 insert this artificial `\.{\\endwrite}' to prevent runaways, as explained 24869 above.) If it were not removed, and if there were numerous writes on a 24870 single page, the stack would overflow. 24871 24872 @d end_write_token==cs_token_flag+end_write 24873 24874 @<Expand macros in the token list and...@>= 24875 q:=get_avail; info(q):=right_brace_token+"}";@/ 24876 r:=get_avail; link(q):=r; info(r):=end_write_token; ins_list(q);@/ 24877 begin_token_list(write_tokens(p),write_text);@/ 24878 q:=get_avail; info(q):=left_brace_token+"{"; ins_list(q); 24879 {now we're ready to scan 24880 `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endwrite}'} 24881 old_mode:=mode; mode:=0; 24882 {disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}, \.{\\prevgraf}} 24883 cur_cs:=write_loc; q:=scan_toks(false,true); {expand macros, etc.} 24884 get_token;@+if cur_tok<>end_write_token then 24885 @<Recover from an unbalanced write command@>; 24886 mode:=old_mode; 24887 end_token_list {conserve stack space} 24888 24889 @ @<Recover from an unbalanced write command@>= 24890 begin print_err("Unbalanced write command"); 24891 @.Unbalanced write...@> 24892 help2("On this page there's a \write with fewer real {'s than }'s.")@/ 24893 ("I can't handle that very well; good luck."); error; 24894 repeat get_token; 24895 until cur_tok=end_write_token; 24896 end 24897 24898 @ The |out_what| procedure takes care of outputting whatsit nodes for 24899 |vlist_out| and |hlist_out|\kern-.3pt. 24900 24901 @<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24902 procedure out_what(@!p:pointer); 24903 var j:small_number; {write stream number} 24904 begin case subtype(p) of 24905 open_node,write_node,close_node:@<Do some work that has been queued up 24906 for \.{\\write}@>; 24907 special_node:special_out(p); 24908 language_node:do_nothing; 24909 othercases confusion("ext4") 24910 @:this can't happen ext4}{\quad ext4@> 24911 endcases; 24912 end; 24913 24914 @ We don't implement \.{\\write} inside of leaders. (The reason is that 24915 the number of times a leader box appears might be different in different 24916 implementations, due to machine-dependent rounding in the glue calculations.) 24917 @^leaders@> 24918 24919 @<Do some work that has been queued up...@>= 24920 if not doing_leaders then 24921 begin j:=write_stream(p); 24922 if subtype(p)=write_node then write_out(p) 24923 else begin if write_open[j] then a_close(write_file[j]); 24924 if subtype(p)=close_node then write_open[j]:=false 24925 else if j<16 then 24926 begin cur_name:=open_name(p); cur_area:=open_area(p); 24927 cur_ext:=open_ext(p); 24928 if cur_ext="" then cur_ext:=".tex"; 24929 pack_cur_name; 24930 while not a_open_out(write_file[j]) do 24931 prompt_file_name("output file name",".tex"); 24932 write_open[j]:=true; 24933 end; 24934 end; 24935 end 24936 24937 @ The presence of `\.{\\immediate}' causes the |do_extension| procedure 24938 to descend to one level of recursion. Nothing happens unless \.{\\immediate} 24939 is followed by `\.{\\openout}', `\.{\\write}', or `\.{\\closeout}'. 24940 @^recursion@> 24941 24942 @<Implement \.{\\immediate}@>= 24943 begin get_x_token; 24944 if (cur_cmd=extension)and(cur_chr<=close_node) then 24945 begin p:=tail; do_extension; {append a whatsit node} 24946 out_what(tail); {do the action immediately} 24947 flush_node_list(tail); tail:=p; link(p):=null; 24948 end 24949 else back_input; 24950 end 24951 24952 @ The \.{\\language} extension is somewhat different. 24953 We need a subroutine that comes into play when a character of 24954 a non-|clang| language is being appended to the current paragraph. 24955 24956 @<Declare action...@>= 24957 procedure fix_language; 24958 var @!l:ASCII_code; {the new current language} 24959 begin if language<=0 then l:=0 24960 else if language>255 then l:=0 24961 else l:=language; 24962 if l<>clang then 24963 begin new_whatsit(language_node,small_node_size); 24964 what_lang(tail):=l; clang:=l;@/ 24965 what_lhm(tail):=norm_min(left_hyphen_min); 24966 what_rhm(tail):=norm_min(right_hyphen_min); 24967 end; 24968 end; 24969 24970 @ @<Implement \.{\\setlanguage}@>= 24971 if abs(mode)<>hmode then report_illegal_case 24972 else begin new_whatsit(language_node,small_node_size); 24973 scan_int; 24974 if cur_val<=0 then clang:=0 24975 else if cur_val>255 then clang:=0 24976 else clang:=cur_val; 24977 what_lang(tail):=clang; 24978 what_lhm(tail):=norm_min(left_hyphen_min); 24979 what_rhm(tail):=norm_min(right_hyphen_min); 24980 end 24981 24982 @ @<Finish the extensions@>= 24983 for k:=0 to 15 do if write_open[k] then a_close(write_file[k]) 24984 24985 @* \[54] System-dependent changes. 24986 This section should be replaced, if necessary, by any special 24987 modifications of the program 24988 that are necessary to make \TeX\ work at a particular installation. 24989 It is usually best to design your change file so that all changes to 24990 previous sections preserve the section numbering; then everybody's version 24991 will be consistent with the published program. More extensive changes, 24992 which introduce new sections, can be inserted here; then only the index 24993 itself will get a new section number. 24994 @^system dependencies@> 24995 24996 @* \[55] Index. 24997 Here is where you can find all uses of each identifier in the program, 24998 with underlined entries pointing to where the identifier was defined. 24999 If the identifier is only one letter long, however, you get to see only 25000 the underlined entries. {\sl All references are to section numbers instead of 25001 page numbers.} 25002 25003 This index also lists error messages and other aspects of the program 25004 that you might want to look up some day. For example, the entry 25005 for ``system dependencies'' lists all sections that should receive 25006 special attention from people who are installing \TeX\ in a new 25007 operating environment. A list of various things that can't happen appears 25008 under ``this can't happen''. Approximately 40 sections are listed under 25009 ``inner loop''; these account for about 60\pct! of \TeX's running time, 25010 exclusive of input and output.