modernc.org/knuth@v0.0.4/tangle/tangle.web (about)

     1  % This program by D. E. Knuth is not copyrighted and can be used freely.
     2  % Version 0 was released in December, 1981.
     3  % Version 1 was released in September, 1982, with version 0 of TeX.
     4  % Slight changes were made in October, 1982, for version 0.6 of TeX.
     5  % Version 1.2 introduced {:nnn} comments, added @@= and @@\ (December, 1982).
     6  % Version 1.4 added "history" (February, 1983).
     7  % Version 1.5 conformed to TeX version 0.96 and fixed @@\ (March, 1983).
     8  % Version 1.7 introduced the new change file format (June, 1983).
     9  % Version 2.0 was released in July, 1983, with version 0.999 of TeX.
    10  % Version 2.5 was released in November, 1983, with version 1.0 of TeX.
    11  % Version 2.6 fixed a bug: force-line-break after a constant (August, 1984).
    12  % Version 2.7 fixed the definition of check_sum_prime (May, 1985).
    13  % Version 2.8 fixed a bug in change_buffer movement (August, 1985).
    14  % Version 2.9 allows nonnumeric macros before their def (December, 1988).
    15  % Version 3, for Sewell's book, fixed long-line bug in input_ln (March, 1989).
    16  % Version 4 was major change to allow 8-bit input (September, 1989).
    17  % Version 4.1 conforms to ANSI standard for-loop rules (September, 1990).
    18  % Version 4.2 fixes stat report if phase one dies (March, 1991).
    19  % Version 4.3 fixes @@ bug in verbatim, catches extra } (September, 1991).
    20  % Version 4.4 activates debug_help on errors as advertised (February, 1993).
    21  % Version 4.5 prevents modno-comments from being split across lines (Dec 2002).
    22  % Version 4.6 fixes archaic @@z logic; is again big enough for TeX (Jan 2021).
    23  
    24  % Here is TeX material that gets inserted after \input webmac
    25  \def\hang{\hangindent 3em\indent\ignorespaces}
    26  \font\ninerm=cmr9
    27  \let\mc=\ninerm % medium caps for names like SAIL
    28  \def\PASCAL{Pascal}
    29  \def\pb{$\.|\ldots\.|$} % Pascal brackets (|...|)
    30  \def\v{\.{\char'174}} % vertical (|) in typewriter font
    31  \mathchardef\BA="3224 % double arrow
    32  \def\({} % kludge for alphabetizing certain module names
    33  
    34  \def\title{TANGLE}
    35  \def\contentspagenumber{125} % should be odd
    36  \def\topofcontents{\null\vfill
    37    \titlefalse % include headline on the contents page
    38    \def\rheader{\mainfont Appendix E\hfil \contentspagenumber}
    39    \centerline{\titlefont The {\ttitlefont TANGLE} processor}
    40    \vskip 15pt
    41    \centerline{(Version 4.6)}
    42    \vfill}
    43  \pageno=\contentspagenumber \advance\pageno by 1
    44  
    45  @* Introduction.
    46  This program converts a \.{WEB} file to a \PASCAL\ file. It was written
    47  by D. E. Knuth in September, 1981; a somewhat similar {\mc SAIL} program had
    48  been developed in March, 1979. Since this program describes itself, a
    49  bootstrapping process involving hand-translation had to be used to get started.
    50  
    51  For large \.{WEB} files one should have a large memory, since \.{TANGLE} keeps
    52  all the \PASCAL\ text in memory (in an abbreviated form). The program uses
    53  a few features of the local \PASCAL\ compiler that may need to be changed in
    54  other installations:
    55  
    56  \yskip\item{1)} Case statements have a default.
    57  \item{2)} Input-output routines may need to be adapted for use with a particular
    58  character set and/or for printing messages on the user's terminal.
    59  
    60  \yskip\noindent
    61  These features are also present in the \PASCAL\ version of \TeX, where they
    62  are used in a similar (but more complex) way. System-dependent portions
    63  of \.{TANGLE} can be identified by looking at the entries for `system
    64  dependencies' in the index below.
    65  @!@^system dependencies@>
    66  
    67  The ``banner line'' defined here should be changed whenever \.{TANGLE}
    68  is modified.
    69  
    70  @d banner=='This is TANGLE, Version 4.6'
    71  
    72  @ The program begins with a fairly normal header, made up of pieces that
    73  @^system dependencies@>
    74  will mostly be filled in later. The \.{WEB} input comes from files |web_file|
    75  and |change_file|, the \PASCAL\ output goes to file |Pascal_file|,
    76  and the string pool output goes to file |pool|.
    77  
    78  If it is necessary to abort the job because of a fatal error, the program
    79  calls the `|jump_out|' procedure, which goes to the label |end_of_TANGLE|.
    80  
    81  @d end_of_TANGLE = 9999 {go here to wrap it up}
    82  
    83  @p @t\4@>@<Compiler directives@>@/
    84  program TANGLE(@!web_file,@!change_file,@!Pascal_file,@!pool);
    85  label end_of_TANGLE; {go here to finish}
    86  const @<Constants in the outer block@>@/
    87  type @<Types in the outer block@>@/
    88  var @<Globals in the outer block@>@/
    89  @<Error handling procedures@>@/
    90  procedure initialize;
    91    var @<Local variables for initialization@>@/
    92    begin @<Set initial values@>@/
    93    end;
    94  
    95  @ Some of this code is optional for use when debugging only;
    96  such material is enclosed between the delimiters |debug| and $|gubed|$.
    97  Other parts, delimited by |stat| and $|tats|$, are optionally included if
    98  statistics about \.{TANGLE}'s memory usage are desired.
    99  
   100  @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging}
   101  @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging}
   102  @f debug==begin
   103  @f gubed==end
   104  @#
   105  @d stat==@{ {change this to `$\\{stat}\equiv\null$'
   106    when gathering usage statistics}
   107  @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$'
   108    when gathering usage statistics}
   109  @f stat==begin
   110  @f tats==end
   111  
   112  @ The \PASCAL\ compiler used to develop this system has ``compiler
   113  directives'' that can appear in comments whose first character is a dollar sign.
   114  In production versions of \.{TANGLE} these directives tell the compiler that
   115  @^system dependencies@>
   116  it is safe to avoid range checks and to leave out the extra code it inserts
   117  for the \PASCAL\ debugger's benefit, although interrupts will occur if
   118  there is arithmetic overflow.
   119  
   120  @<Compiler directives@>=
   121  @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead}
   122  @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging}
   123  
   124  @ Labels are given symbolic names by the following definitions. We insert
   125  the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a
   126  procedure in which we have used the `|return|' statement defined below;
   127  the label `|restart|' is occasionally used at the very beginning of a
   128  procedure; and the label `|reswitch|' is occasionally used just prior to
   129  a \&{case} statement in which some cases change the conditions and we wish to
   130  branch to the newly applicable case.
   131  Loops that are set up with the \&{loop} construction defined below are
   132  commonly exited by going to `|done|' or to `|found|' or to `|not_found|',
   133  and they are sometimes repeated by going to `|continue|'.
   134  
   135  @d exit=10 {go here to leave a procedure}
   136  @d restart=20 {go here to start a procedure again}
   137  @d reswitch=21 {go here to start a case statement again}
   138  @d continue=22 {go here to resume a loop}
   139  @d done=30 {go here to exit a loop}
   140  @d found=31 {go here when you've found it}
   141  @d not_found=32 {go here when you've found something else}
   142  
   143  @ Here are some macros for common programming idioms.
   144  
   145  @d incr(#) == #:=#+1 {increase a variable by unity}
   146  @d decr(#) == #:=#-1 {decrease a variable by unity}
   147  @d loop == @+ while true do@+ {repeat over and over until a |goto| happens}
   148  @d do_nothing == {empty statement}
   149  @d return == goto exit {terminate a procedure call}
   150  @f return == nil
   151  @f loop == xclause
   152  
   153  @ We assume that |case| statements may include a default case that applies
   154  if no matching label is found. Thus, we shall use constructions like
   155  @^system dependencies@>
   156  $$\vbox{\halign{#\hfil\cr
   157  |case x of|\cr
   158  1: $\langle\,$code for $x=1\,\rangle$;\cr
   159  3: $\langle\,$code for $x=3\,\rangle$;\cr
   160  |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr
   161  |endcases|\cr}}$$
   162  since most \PASCAL\ compilers have plugged this hole in the language by
   163  incorporating some sort of default mechanism. For example, the compiler
   164  used to develop \.{WEB} and \TeX\ allows `|others|:' as a default label,
   165  and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or
   166  `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases|
   167  and |endcases| should be changed to agree with local conventions. (Of
   168  course, if no default mechanism is available, the |case| statements of
   169  this program must be extended by listing all remaining cases. The author
   170  would have taken the trouble to modify \.{TANGLE} so that such extensions
   171  were done automatically, if he had not wanted to encourage \PASCAL\
   172  compiler writers to make this important change in \PASCAL, where it belongs.)
   173  
   174  @d othercases == others: {default for cases not listed explicitly}
   175  @d endcases == @+end {follows the default case in an extended |case| statement}
   176  @f othercases == else
   177  @f endcases == end
   178  
   179  @ The following parameters are set big enough to handle \TeX, so they
   180  should be sufficient for most applications of \.{TANGLE}.
   181  
   182  @<Constants...@>=
   183  @!buf_size=100; {maximum length of input line}
   184  @!max_bytes=45000; {|1/ww| times the number of bytes in identifiers,
   185    strings, and module names; must be less than 65536}
   186  @!max_toks=65000; {|1/zz| times the number of bytes in compressed \PASCAL\ code;
   187    must be less than 65536}
   188  @!max_names=4000; {number of identifiers, strings, module names;
   189    must be less than 10240}
   190  @!max_texts=2000; {number of replacement texts, must be less than 10240}
   191  @!hash_size=353; {should be prime}
   192  @!longest_name=400; {module names shouldn't be longer than this}
   193  @!line_length=72; {lines of \PASCAL\ output have at most this many characters}
   194  @!out_buf_size=144; {length of output buffer, should be twice |line_length|}
   195  @!stack_size=50; {number of simultaneous levels of macro expansion}
   196  @!max_id_length=12; {long identifiers are chopped to this length, which must
   197    not exceed |line_length|}
   198  @!unambig_length=7; {identifiers must be unique if chopped to this length}
   199    {note that 7 is more strict than \PASCAL's 8, but this can be varied}
   200  
   201  @ A global variable called |history| will contain one of four values
   202  at the end of every run: |spotless| means that no unusual messages were
   203  printed; |harmless_message| means that a message of possible interest
   204  was printed but no serious errors were detected; |error_message| means that
   205  at least one error was found; |fatal_message| means that the program
   206  terminated abnormally. The value of |history| does not influence the
   207  behavior of the program; it is simply computed for the convenience
   208  of systems that might want to use such information.
   209  
   210  @d spotless=0 {|history| value for normal jobs}
   211  @d harmless_message=1 {|history| value when non-serious info was printed}
   212  @d error_message=2 {|history| value when an error was noted}
   213  @d fatal_message=3 {|history| value when we had to stop prematurely}
   214  @#
   215  @d mark_harmless==@t@>@+if history=spotless then history:=harmless_message
   216  @d mark_error==history:=error_message
   217  @d mark_fatal==history:=fatal_message
   218  
   219  @<Glob...@>=@!history:spotless..fatal_message; {how bad was this run?}
   220  
   221  @ @<Set init...@>=history:=spotless;
   222  
   223  @* The character set.
   224  One of the main goals in the design of \.{WEB} has been to make it readily
   225  portable between a wide variety of computers. Yet \.{WEB} by its very
   226  nature must use a greater variety of characters than most computer
   227  programs deal with, and character encoding is one of the areas in which
   228  existing machines differ most widely from each other.
   229  
   230  To resolve this problem, all input to \.{WEAVE} and \.{TANGLE} is converted
   231  to an internal eight-bit code that is essentially standard ASCII, the ``American
   232  Standard Code for Information Interchange.''  The conversion is done
   233  immediately when each character is read in. Conversely, characters are
   234  converted from ASCII to the user's external representation just before
   235  they are output. (The original ASCII code was seven bits only; \.{WEB} now
   236  allows eight bits in an attempt to keep up with modern times.)
   237  
   238  Such an internal code is relevant to users of \.{WEB} only because it is
   239  the code used for preprocessed constants like \.{"A"}. If you are writing
   240  a program in \.{WEB} that makes use of such one-character constants, you
   241  should convert your input to ASCII form, like \.{WEAVE} and \.{TANGLE} do.
   242  Otherwise \.{WEB}'s internal coding scheme does not affect you.
   243  @^ASCII code@>
   244  
   245  Here is a table of the standard visible ASCII codes:
   246  $$\def\:{\char\count255\global\advance\count255 by 1}
   247  \count255='40
   248  \vbox{
   249  \hbox{\hbox to 40pt{\it\hfill0\/\hfill}%
   250  \hbox to 40pt{\it\hfill1\/\hfill}%
   251  \hbox to 40pt{\it\hfill2\/\hfill}%
   252  \hbox to 40pt{\it\hfill3\/\hfill}%
   253  \hbox to 40pt{\it\hfill4\/\hfill}%
   254  \hbox to 40pt{\it\hfill5\/\hfill}%
   255  \hbox to 40pt{\it\hfill6\/\hfill}%
   256  \hbox to 40pt{\it\hfill7\/\hfill}}
   257  \vskip 4pt
   258  \hrule
   259  \def\^{\vrule height 10.5pt depth 4.5pt}
   260  \halign{\hbox to 0pt{\hskip -24pt\O{#0}\hfill}&\^
   261  \hbox to 40pt{\tt\hfill#\hfill\^}&
   262  &\hbox to 40pt{\tt\hfill#\hfill\^}\cr
   263  04&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   264  05&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   265  06&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   266  07&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   267  10&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   268  11&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   269  12&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   270  13&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   271  14&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   272  15&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   273  16&\:&\:&\:&\:&\:&\:&\:&\:\cr\noalign{\hrule}
   274  17&\:&\:&\:&\:&\:&\:&\:\cr}
   275  \hrule width 280pt}$$
   276  (Actually, of course, code @'040 is an invisible blank space.)  Code @'136
   277  was once an upward arrow (\.{\char'13}), and code @'137 was
   278  once a left arrow (\.^^X), in olden times when the first draft
   279  of ASCII code was prepared; but \.{WEB} works with today's standard
   280  ASCII in which those codes represent circumflex and underline as shown.
   281  
   282  @<Types...@>=
   283  @!ASCII_code=0..255; {eight-bit numbers, a subrange of the integers}
   284  
   285  @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit
   286  character sets were common, so it did not make provision for lowercase
   287  letters. Nowadays, of course, we need to deal with both capital and small
   288  letters in a convenient way, so \.{WEB} assumes that it is being used
   289  with a \PASCAL\ whose character set contains at least the characters of
   290  standard ASCII as listed above. Some \PASCAL\ compilers use the original
   291  name |char| for the data type associated with the characters in text files,
   292  while other \PASCAL s consider |char| to be a 64-element subrange of a larger
   293  data type that has some other name.
   294  
   295  In order to accommodate this difference, we shall use the name |text_char|
   296  to stand for the data type of the characters in the input and output
   297  files.  We shall also assume that |text_char| consists of the elements
   298  |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The
   299  following definitions should be adjusted if necessary.
   300  @^system dependencies@>
   301  
   302  @d text_char == char {the data type of characters in text files}
   303  @d first_text_char=0 {ordinal number of the smallest element of |text_char|}
   304  @d last_text_char=255 {ordinal number of the largest element of |text_char|}
   305  
   306  @<Types...@>=
   307  @!text_file=packed file of text_char;
   308  
   309  @ The \.{WEAVE} and \.{TANGLE} processors convert between ASCII code and
   310  the user's external character set by means of arrays |xord| and |xchr|
   311  that are analogous to \PASCAL's |ord| and |chr| functions.
   312  
   313  @<Globals...@>=
   314  @!xord: array [text_char] of ASCII_code;
   315    {specifies conversion of input characters}
   316  @!xchr: array [ASCII_code] of text_char;
   317    {specifies conversion of output characters}
   318  
   319  @ If we assume that every system using \.{WEB} is able to read and write the
   320  visible characters of standard ASCII (although not necessarily using the
   321  ASCII codes to represent them), the following assignment statements initialize
   322  most of the |xchr| array properly, without needing any system-dependent
   323  changes. For example, the statement \.{xchr[@@\'101]:=\'A\'} that appears
   324  in the present \.{WEB} file might be encoded in, say, {\mc EBCDIC} code
   325  on the external medium on which it resides, but \.{TANGLE} will convert from
   326  this external code to ASCII and back again. Therefore the assignment
   327  statement \.{XCHR[65]:=\'A\'} will appear in the corresponding \PASCAL\ file,
   328  and \PASCAL\ will compile this statement so that |xchr[65]| receives the
   329  character \.A in the external (|char|) code. Note that it would be quite
   330  incorrect to say \.{xchr[@@\'101]:="A"}, because |"A"| is a constant of
   331  type |integer|, not |char|, and because we have $|"A"|=65$ regardless of
   332  the external character set.
   333  
   334  @<Set init...@>=
   335  xchr[@'40]:=' ';
   336  xchr[@'41]:='!';
   337  xchr[@'42]:='"';
   338  xchr[@'43]:='#';
   339  xchr[@'44]:='$';
   340  xchr[@'45]:='%';
   341  xchr[@'46]:='&';
   342  xchr[@'47]:='''';@/
   343  xchr[@'50]:='(';
   344  xchr[@'51]:=')';
   345  xchr[@'52]:='*';
   346  xchr[@'53]:='+';
   347  xchr[@'54]:=',';
   348  xchr[@'55]:='-';
   349  xchr[@'56]:='.';
   350  xchr[@'57]:='/';@/
   351  xchr[@'60]:='0';
   352  xchr[@'61]:='1';
   353  xchr[@'62]:='2';
   354  xchr[@'63]:='3';
   355  xchr[@'64]:='4';
   356  xchr[@'65]:='5';
   357  xchr[@'66]:='6';
   358  xchr[@'67]:='7';@/
   359  xchr[@'70]:='8';
   360  xchr[@'71]:='9';
   361  xchr[@'72]:=':';
   362  xchr[@'73]:=';';
   363  xchr[@'74]:='<';
   364  xchr[@'75]:='=';
   365  xchr[@'76]:='>';
   366  xchr[@'77]:='?';@/
   367  xchr[@'100]:='@@';
   368  xchr[@'101]:='A';
   369  xchr[@'102]:='B';
   370  xchr[@'103]:='C';
   371  xchr[@'104]:='D';
   372  xchr[@'105]:='E';
   373  xchr[@'106]:='F';
   374  xchr[@'107]:='G';@/
   375  xchr[@'110]:='H';
   376  xchr[@'111]:='I';
   377  xchr[@'112]:='J';
   378  xchr[@'113]:='K';
   379  xchr[@'114]:='L';
   380  xchr[@'115]:='M';
   381  xchr[@'116]:='N';
   382  xchr[@'117]:='O';@/
   383  xchr[@'120]:='P';
   384  xchr[@'121]:='Q';
   385  xchr[@'122]:='R';
   386  xchr[@'123]:='S';
   387  xchr[@'124]:='T';
   388  xchr[@'125]:='U';
   389  xchr[@'126]:='V';
   390  xchr[@'127]:='W';@/
   391  xchr[@'130]:='X';
   392  xchr[@'131]:='Y';
   393  xchr[@'132]:='Z';
   394  xchr[@'133]:='[';
   395  xchr[@'134]:='\';
   396  xchr[@'135]:=']';
   397  xchr[@'136]:='^';
   398  xchr[@'137]:='_';@/
   399  xchr[@'140]:='`';
   400  xchr[@'141]:='a';
   401  xchr[@'142]:='b';
   402  xchr[@'143]:='c';
   403  xchr[@'144]:='d';
   404  xchr[@'145]:='e';
   405  xchr[@'146]:='f';
   406  xchr[@'147]:='g';@/
   407  xchr[@'150]:='h';
   408  xchr[@'151]:='i';
   409  xchr[@'152]:='j';
   410  xchr[@'153]:='k';
   411  xchr[@'154]:='l';
   412  xchr[@'155]:='m';
   413  xchr[@'156]:='n';
   414  xchr[@'157]:='o';@/
   415  xchr[@'160]:='p';
   416  xchr[@'161]:='q';
   417  xchr[@'162]:='r';
   418  xchr[@'163]:='s';
   419  xchr[@'164]:='t';
   420  xchr[@'165]:='u';
   421  xchr[@'166]:='v';
   422  xchr[@'167]:='w';@/
   423  xchr[@'170]:='x';
   424  xchr[@'171]:='y';
   425  xchr[@'172]:='z';
   426  xchr[@'173]:='{';
   427  xchr[@'174]:='|';
   428  xchr[@'175]:='}';
   429  xchr[@'176]:='~';@/
   430  xchr[0]:=' '; xchr[@'177]:=' '; {these ASCII codes are not used}
   431  
   432  @ Some of the ASCII codes below @'40 have been given symbolic names in
   433  \.{WEAVE} and \.{TANGLE} because they are used with a special meaning.
   434  
   435  @d and_sign=@'4 {equivalent to `\.{and}'}
   436  @d not_sign=@'5 {equivalent to `\.{not}'}
   437  @d set_element_sign=@'6 {equivalent to `\.{in}'}
   438  @d tab_mark=@'11 {ASCII code used as tab-skip}
   439  @d line_feed=@'12 {ASCII code thrown away at end of line}
   440  @d form_feed=@'14 {ASCII code used at end of page}
   441  @d carriage_return=@'15 {ASCII code used at end of line}
   442  @d left_arrow=@'30 {equivalent to `\.{:=}'}
   443  @d not_equal=@'32 {equivalent to `\.{<>}'}
   444  @d less_or_equal=@'34 {equivalent to `\.{<=}'}
   445  @d greater_or_equal=@'35 {equivalent to `\.{>=}'}
   446  @d equivalence_sign=@'36 {equivalent to `\.{==}'}
   447  @d or_sign=@'37 {equivalent to `\.{or}'}
   448  
   449  @ When we initialize the |xord| array and the remaining parts of |xchr|,
   450  it will be convenient to make use of an index variable, |i|.
   451  
   452  @<Local variables for init...@>=
   453  @!i:0..255;
   454  
   455  @ Here now is the system-dependent part of the character set.
   456  If \.{WEB} is being implemented on a garden-variety \PASCAL\ for which
   457  only standard ASCII codes will appear in the input and output files, you
   458  don't need to make any changes here. But if you have, for example, an extended
   459  character set like the one in Appendix~C of {\sl The \TeX book}, the first
   460  line of code in this module should be changed to
   461  $$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$
   462  \.{WEB}'s character set is essentially identical to \TeX's, even with respect to
   463  characters less than @'40.
   464  @^system dependencies@>
   465  
   466  Changes to the present module will make \.{WEB} more friendly on computers
   467  that have an extended character set, so that one can type things like
   468  \.^^Z\ instead of \.{<>}. If you have an extended set of characters that
   469  are easily incorporated into text files, you can assign codes arbitrarily
   470  here, giving an |xchr| equivalent to whatever characters the users of
   471  \.{WEB} are allowed to have in their input files, provided that unsuitable
   472  characters do not correspond to special codes like |carriage_return|
   473  that are listed above.
   474  
   475  (The present file \.{TANGLE.WEB} does not contain any of the non-ASCII
   476  characters, because it is intended to be used with all implementations of
   477  \.{WEB}.  It was originally created on a Stanford system that has a
   478  convenient extended character set, then ``sanitized'' by applying another
   479  program that transliterated all of the non-standard characters into
   480  standard equivalents.)
   481  
   482  @<Set init...@>=
   483  for i:=1 to @'37 do xchr[i]:=' ';
   484  for i:=@'200 to @'377 do xchr[i]:=' ';
   485  
   486  @ The following system-independent code makes the |xord| array contain a
   487  suitable inverse to the information in |xchr|.
   488  
   489  @<Set init...@>=
   490  for i:=first_text_char to last_text_char do xord[chr(i)]:=" ";
   491  for i:=1 to @'377 do xord[xchr[i]]:=i;
   492  xord[' ']:=" ";
   493  
   494  @* Input and output.
   495  The input conventions of this program are intended to be very much like those
   496  of \TeX\ (except, of course, that they are much simpler, because much less
   497  needs to be done). Furthermore they are identical to those of \.{WEAVE}.
   498  Therefore people who need to make modifications to all three systems
   499  should be able to do so without too many headaches.
   500  
   501  We use the standard \PASCAL\ input/output procedures in several places that
   502  \TeX\ cannot, since \.{TANGLE} does not have to deal with files that are named
   503  dynamically by the user, and since there is no input from the terminal.
   504  
   505  @ Terminal output is done by writing on file |term_out|, which is assumed to
   506  consist of characters of type |text_char|:
   507  @^system dependencies@>
   508  
   509  @d print(#)==write(term_out,#) {`|print|' means write on the terminal}
   510  @d print_ln(#)==write_ln(term_out,#) {`|print|' and then start new line}
   511  @d new_line==write_ln(term_out) {start new line}
   512  @d print_nl(#)==  {print information starting on a new line}
   513    begin new_line; print(#);
   514    end
   515  
   516  @<Globals...@>=
   517  @!term_out:text_file; {the terminal as an output file}
   518  
   519  @ Different systems have different ways of specifying that the output on a
   520  certain file will appear on the user's terminal. Here is one way to do this
   521  on the \PASCAL\ system that was used in \.{TANGLE}'s initial development:
   522  @^system dependencies@>
   523  
   524  @<Set init...@>=
   525  rewrite(term_out,'TTY:'); {send |term_out| output to the terminal}
   526  
   527  @ The |update_terminal| procedure is called when we want
   528  to make sure that everything we have output to the terminal so far has
   529  actually left the computer's internal buffers and been sent.
   530  @^system dependencies@>
   531  
   532  @d update_terminal == break(term_out) {empty the terminal output buffer}
   533  
   534  @ The main input comes from |web_file|; this input may be overridden
   535  by changes in |change_file|. (If |change_file| is empty, there are no changes.)
   536  
   537  @<Globals...@>=
   538  @!web_file:text_file; {primary input}
   539  @!change_file:text_file; {updates}
   540  
   541  @ The following code opens the input files.  Since these files were listed
   542  in the program header, we assume that the \PASCAL\ runtime system has
   543  already checked that suitable file names have been given; therefore no
   544  additional error checking needs to be done.
   545  @^system dependencies@>
   546  
   547  @p procedure open_input; {prepare to read |web_file| and |change_file|}
   548  begin reset(web_file); reset(change_file);
   549  end;
   550  
   551  @ The main output goes to |Pascal_file|, and string pool constants are
   552  written to the |pool| file.
   553  
   554  @<Globals...@>=
   555  @!Pascal_file: text_file;
   556  @!pool: text_file;
   557  
   558  @ The following code opens |Pascal_file| and |pool|.
   559  Since these files were listed in the program header, we assume that the
   560  \PASCAL\ runtime system has checked that suitable external file names have
   561  been given.
   562  @^system dependencies@>
   563  
   564  @<Set init...@>=
   565  rewrite(Pascal_file); rewrite(pool);
   566  
   567  @ Input goes into an array called |buffer|.
   568  
   569  @<Globals...@>=@!buffer: array[0..buf_size] of ASCII_code;
   570  
   571  @ The |input_ln| procedure brings the next line of input from the specified
   572  file into the |buffer| array and returns the value |true|, unless the file has
   573  already been entirely read, in which case it returns |false|. The conventions
   574  of \TeX\ are followed; i.e., |ASCII_code| numbers representing the next line
   575  of the file are input into |buffer[0]|, |buffer[1]|, \dots,
   576  |buffer[limit-1]|; trailing blanks are ignored;
   577  and the global variable |limit| is set to the length of the
   578  @^system dependencies@>
   579  line. The value of |limit| must be strictly less than |buf_size|.
   580  
   581  We assume that none of the |ASCII_code| values
   582  of |buffer[j]| for |0<=j<limit| is equal to 0, @'177, |line_feed|, |form_feed|,
   583  or |carriage_return|.
   584  
   585  @p function input_ln(var f:text_file):boolean;
   586    {inputs a line or returns |false|}
   587  var final_limit:0..buf_size; {|limit| without trailing blanks}
   588  begin limit:=0; final_limit:=0;
   589  if eof(f) then input_ln:=false
   590  else  begin while not eoln(f) do
   591      begin buffer[limit]:=xord[f^]; get(f);
   592      incr(limit);
   593      if buffer[limit-1]<>" " then final_limit:=limit;
   594      if limit=buf_size then
   595        begin while not eoln(f) do get(f);
   596        decr(limit); {keep |buffer[buf_size]| empty}
   597        if final_limit>limit then final_limit:=limit;
   598        print_nl('! Input line too long'); loc:=0; error;
   599  @.Input line too long@>
   600        end;
   601      end;
   602    read_ln(f); limit:=final_limit; input_ln:=true;
   603    end;
   604  end;
   605  
   606  @* Reporting errors to the user.
   607  The \.{TANGLE} processor operates in two phases: first it inputs the source
   608  file and stores a compressed representation of the program, then it produces
   609  the \PASCAL\ output from the compressed representation.
   610  
   611  The global variable |phase_one| tells whether we are in Phase I or not.
   612  
   613  @<Globals...@>=
   614  @!phase_one: boolean; {|true| in Phase I, |false| in Phase II}
   615  
   616  @ If an error is detected while we are debugging,
   617  we usually want to look at the contents of memory.
   618  A special procedure will be declared later for this purpose.
   619  
   620  @<Error handling...@>=
   621  @!debug @+ procedure debug_help; forward;@+ gubed
   622  
   623  @ During the first phase, syntax errors are reported to the user by saying
   624  $$\hbox{`|err_print('! Error message')|'},$$
   625  followed by `|jump_out|' if no recovery from the error is provided.
   626  This will print the error message followed by an indication of where the error
   627  was spotted in the source file. Note that no period follows the error message,
   628  since the error routine will automatically supply a period.
   629  
   630  Errors that are noticed during the second phase are reported to the user
   631  in the same fashion, but the error message will be
   632  followed by an indication of where the error was spotted in the output file.
   633  
   634  The actual error indications are provided by a procedure called |error|.
   635  
   636  @d err_print(#)==begin new_line; print(#); error;
   637    end
   638  
   639  @<Error handling...@>=
   640  procedure error; {prints '\..' and location of error message}
   641  var j: 0..out_buf_size; {index into |out_buf|}
   642  @!k,@!l: 0..buf_size; {indices into |buffer|}
   643  begin if phase_one then @<Print error location based on input buffer@>
   644  else @<Print error location based on output buffer@>;
   645  update_terminal; mark_error;
   646  @!debug debug_skipped:=debug_cycle; debug_help;@+gubed
   647  end;
   648  
   649  @ The error locations during Phase I can be indicated by using the global
   650  variables |loc|, |line|, and |changing|, which tell respectively the first
   651  unlooked-at position in |buffer|, the current line number, and whether or not
   652  the current line is from |change_file| or |web_file|.
   653  This routine should be modified on systems whose standard text editor
   654  has special line-numbering conventions.
   655  @^system dependencies@>
   656  
   657  @<Print error location based on input buffer@>=
   658  begin if changing then print('. (change file ')@+else print('. (');
   659  print_ln('l.', line:1, ')');
   660  if loc>=limit then l:=limit else l:=loc;
   661  for k:=1 to l do
   662    if buffer[k-1]=tab_mark then print(' ')
   663    else print(xchr[buffer[k-1]]); {print the characters already read}
   664  new_line;
   665  for k:=1 to l do print(' '); {space out the next line}
   666  for k:=l+1 to limit do print(xchr[buffer[k-1]]); {print the part not yet read}
   667  print(' '); {this space separates the message from future asterisks}
   668  end
   669  
   670  @ The position of errors detected during the second phase can be indicated
   671  by outputting the partially-filled output buffer, which contains |out_ptr|
   672  entries.
   673  
   674  @<Print error location based on output...@>=
   675  begin print_ln('. (l.',line:1,')');
   676  for j:=1 to out_ptr do print(xchr[out_buf[j-1]]); {print current partial line}
   677  print('... '); {indicate that this information is partial}
   678  end
   679  
   680  @ The |jump_out| procedure just cuts across all active procedure levels
   681  and jumps out of the program. This is the only non-local |goto| statement
   682  in \.{TANGLE}. It is used when no recovery from a particular error has
   683  been provided.
   684  
   685  Some \PASCAL\ compilers do not implement non-local |goto| statements.
   686  @^system dependencies@>
   687  In such cases the code that appears at label |end_of_TANGLE| should be
   688  copied into the |jump_out| procedure, followed by a call to a system procedure
   689  that terminates the program.
   690  
   691  @d fatal_error(#)==begin new_line; print(#); error; mark_fatal; jump_out;
   692    end
   693  
   694  @<Error handling...@>=
   695  procedure jump_out;
   696  begin goto end_of_TANGLE;
   697  end;
   698  
   699  @ Sometimes the program's behavior is far different from what it should be,
   700  and \.{TANGLE} prints an error message that is really for the \.{TANGLE}
   701  maintenance person, not the user. In such cases the program says
   702  |confusion('indication of where we are')|.
   703  
   704  @d confusion(#)==fatal_error('! This can''t happen (',#,')')
   705  @.This can't happen@>
   706  
   707  @ An overflow stop occurs if \.{TANGLE}'s tables aren't large enough.
   708  
   709  @d overflow(#)==fatal_error('! Sorry, ',#,' capacity exceeded')
   710  @.Sorry, x capacity exceeded@>
   711  
   712  
   713  @* Data structures.
   714  Most of the user's \PASCAL\ code is packed into eight-bit integers
   715  in two large arrays called |byte_mem| and |tok_mem|.
   716  The |byte_mem| array holds the names of identifiers, strings, and modules;
   717  the |tok_mem| array holds the replacement texts
   718  for macros and modules. Allocation is sequential, since things are deleted only
   719  during Phase II, and only in a last-in-first-out manner.
   720  
   721  Auxiliary arrays |byte_start| and |tok_start| are used as directories to
   722  |byte_mem| and |tok_mem|, and the |link|, |ilk|, |equiv|, and |text_link|
   723  arrays give further information about names. These auxiliary arrays
   724  consist of sixteen-bit items.
   725  
   726  @<Types...@>=
   727  @!eight_bits=0..255; {unsigned one-byte quantity}
   728  @!sixteen_bits=0..65535; {unsigned two-byte quantity}
   729  
   730  @ \.{TANGLE} has been designed to avoid the need for indices that are more
   731  than sixteen bits wide, so that it can be used on most computers. But
   732  there are programs that need more than 65536 tokens, and some programs
   733  even need more than 65536 bytes; \TeX\ is one of these.  To get around
   734  this problem, a slight complication has been added to the data structures:
   735  |byte_mem| and |tok_mem| are two-dimensional arrays, whose first index is
   736  either 0 or 1 or 2. (For generality, the first index is actually allowed to run
   737  between 0 and |ww-1| in |byte_mem|, or between 0 and |zz-1| in |tok_mem|,
   738  where |ww| and |zz| are set to 2 and~3; the program will work for any
   739  positive values of |ww| and |zz|, and it can be simplified in obvious ways
   740  if |ww=1| or |zz=1|.)
   741  
   742  @d ww=2 {we multiply the byte capacity by approximately this amount}
   743  @d zz=3 {we multiply the token capacity by approximately this amount}
   744  
   745  @<Globals...@>=
   746  @!byte_mem: packed array [0..ww-1,0..max_bytes] of ASCII_code;
   747    {characters of names}
   748  @!tok_mem: packed array [0..zz-1,0..max_toks] of eight_bits; {tokens}
   749  @!byte_start: array [0..max_names] of sixteen_bits; {directory into |byte_mem|}
   750  @!tok_start: array [0..max_texts] of sixteen_bits; {directory into |tok_mem|}
   751  @!link: array [0..max_names] of sixteen_bits; {hash table or tree links}
   752  @!ilk: array [0..max_names] of sixteen_bits; {type codes or tree links}
   753  @!equiv: array [0..max_names] of sixteen_bits; {info corresponding to names}
   754  @!text_link: array [0..max_texts] of sixteen_bits; {relates replacement texts}
   755  
   756  @ The names of identifiers are found by computing a hash address |h| and
   757  then looking at strings of bytes signified by |hash[h]|, |link[hash[h]]|,
   758  |link[link[hash[h]]]|, \dots, until either finding the desired name
   759  or encountering a zero.
   760  
   761  A `|name_pointer|' variable, which signifies a name, is an index into
   762  |byte_start|. The actual sequence of characters in the name pointed to by
   763  |p| appears in positions |byte_start[p]| to |byte_start[p+ww]-1|, inclusive,
   764  in the segment of |byte_mem| whose first index is |p mod ww|. Thus, when
   765  |ww=2| the even-numbered name bytes appear in |byte_mem[0,@t$*$@>]|
   766  and the odd-numbered ones appear in |byte_mem[1,@t$*$@>]|.
   767  The pointer 0 is used for undefined module names; we don't
   768  want to use it for the names of identifiers, since 0 stands for a null
   769  pointer in a linked list.
   770  
   771  Strings are treated like identifiers; the first character (a double-quote)
   772  distinguishes a string from an alphabetic name, but for \.{TANGLE}'s purposes
   773  strings behave like numeric macros. (A `string' here refers to the
   774  strings delimited by double-quotes that \.{TANGLE} processes. \PASCAL\
   775  string constants delimited by single-quote marks are not given such special
   776  treatment; they simply appear as sequences of characters in the \PASCAL\
   777  texts.)  The total number of strings in the string
   778  pool is called |string_ptr|, and the total number of names in |byte_mem|
   779  is called |name_ptr|. The total number of bytes occupied in
   780  |byte_mem[w,@t$*$@>]| is called |byte_ptr[w]|.
   781  
   782  We usually have |byte_start[name_ptr+w]=byte_ptr[(name_ptr+w) mod ww]|
   783  for |0<=w<ww|, since these are the starting positions for the next |ww|
   784  names to be stored in |byte_mem|.
   785  
   786  @d length(#)==byte_start[#+ww]-byte_start[#] {the length of a name}
   787  
   788  @<Types...@>=
   789  @!name_pointer=0..max_names; {identifies a name}
   790  
   791  @ @<Global...@>=
   792  @!name_ptr:name_pointer; {first unused position in |byte_start|}
   793  @!string_ptr:name_pointer; {next number to be given to a string of length |<>1|}
   794  @!byte_ptr:array [0..ww-1] of 0..max_bytes;
   795    {first unused position in |byte_mem|}
   796  @!pool_check_sum:integer; {sort of a hash for the whole string pool}
   797  
   798  @ @<Local variables for init...@>=
   799  @!wi: 0..ww-1; {to initialize the |byte_mem| indices}
   800  
   801  @ @<Set init...@>=
   802  for wi:=0 to ww-1 do
   803    begin byte_start[wi]:=0; byte_ptr[wi]:=0;
   804    end;
   805  byte_start[ww]:=0; {this makes name 0 of length zero}
   806  name_ptr:=1; string_ptr:=256; pool_check_sum:=271828;
   807  
   808  @ Replacement texts are stored in |tok_mem|, using similar conventions.
   809  A `|text_pointer|' variable is an index into |tok_start|, and the
   810  replacement text that corresponds to |p| runs from positions
   811  |tok_start[p]| to |tok_start[p+zz]-1|, inclusive, in the segment of
   812  |tok_mem| whose first index is |p mod zz|. Thus, when |zz=2| the
   813  even-numbered replacement texts appear in |tok_mem[0,@t$*$@>]| and the
   814  odd-numbered ones appear in |tok_mem[1,@t$*$@>]|.  Furthermore,
   815  |text_link[p]| is used to connect pieces of text that have the same name,
   816  as we shall see later. The pointer 0 is used for undefined replacement
   817  texts.
   818  
   819  The first position of |tok_mem[z,@t$*$@>]| that is unoccupied by
   820  replacement text is called |tok_ptr[z]|, and the first unused location of
   821  |tok_start| is called |text_ptr|.  We usually have the identity
   822  |tok_start[text_ptr+z]=tok_ptr[(text_ptr+z) mod zz]|, for |0<=z<zz|, since
   823  these are the starting positions for the next |zz| replacement texts to
   824  be stored in |tok_mem|.
   825  
   826  @<Types...@>=
   827  @!text_pointer=0..max_texts; {identifies a replacement text}
   828  
   829  @ It is convenient to maintain a variable |z| that is equal to |text_ptr
   830  mod zz|, so that we always insert tokens into segment |z| of |tok_mem|.
   831  
   832  @<Glob...@>=
   833  @t\hskip1em@>@!text_ptr:text_pointer; {first unused position in |tok_start|}
   834  @t\hskip1em@>@!tok_ptr:array[0..zz-1] of 0..max_toks;
   835    {first unused position in a given segment of |tok_mem|}
   836  @t\hskip1em@>@!z:0..zz-1; {current segment of |tok_mem|}
   837  stat @!max_tok_ptr:array[0..zz-1] of 0..max_toks;
   838    {largest values assumed by |tok_ptr|}
   839  tats
   840  
   841  @ @<Local variables for init...@>=
   842  @!zi:0..zz-1; {to initialize the |tok_mem| indices}
   843  
   844  @ @<Set init...@>=
   845  for zi:=0 to zz-1 do
   846    begin tok_start[zi]:=0; tok_ptr[zi]:=0;
   847    end;
   848  tok_start[zz]:=0; {this makes replacement text 0 of length zero}
   849  text_ptr:=1; z:=1 mod zz;
   850  
   851  @ Four types of identifiers are distinguished by their |ilk|:
   852  
   853  \yskip\hang |normal| identifiers will appear in the \PASCAL\ program as
   854  ordinary identifiers since they have not been defined to be macros; the
   855  corresponding value in the |equiv| array
   856  for such identifiers is a link in a secondary hash table that
   857  is used to check whether any two of them agree in their first |unambig_length|
   858  characters after underline symbols are removed and lowercase letters are
   859  changed to uppercase.
   860  
   861  \yskip\hang |numeric| identifiers have been defined to be numeric macros;
   862  their |equiv| value contains the corresponding numeric value plus $2^{15}$.
   863  Strings are treated as numeric macros.
   864  
   865  \yskip\hang |simple| identifiers have been defined to be simple macros;
   866  their |equiv| value points to the corresponding replacement text.
   867  
   868  \yskip\hang |parametric| identifiers have been defined to be parametric macros;
   869  like simple identifiers, their |equiv| value points to the replacement text.
   870  
   871  @d normal=0 {ordinary identifiers have |normal| ilk}
   872  @d numeric=1 {numeric macros and strings have |numeric| ilk}
   873  @d simple=2 {simple macros have |simple| ilk}
   874  @d parametric=3 {parametric macros have |parametric| ilk}
   875  
   876  @ The names of modules are stored in |byte_mem| together
   877  with the identifier names, but a hash table is not used for them because
   878  \.{TANGLE} needs to be able to recognize a module name when given a prefix of
   879  that name. A conventional binary search tree is used to retrieve module names,
   880  with fields called |llink| and |rlink| in place of |link| and |ilk|. The
   881  root of this tree is |rlink[0]|. If |p| is a pointer to a module name,
   882  |equiv[p]| points to its replacement text, just as in simple and parametric
   883  macros, unless this replacement text has not yet been defined (in which case
   884  |equiv[p]=0|).
   885  
   886  @d llink==link {left link in binary search tree for module names}
   887  @d rlink==ilk {right link in binary search tree for module names}
   888  
   889  @<Set init...@>=
   890  rlink[0]:=0; {the binary search tree starts out with nothing in it}
   891  equiv[0]:=0; {the undefined module has no replacement text}
   892  
   893  @ Here is a little procedure that prints the text of a given name.
   894  
   895  @p procedure print_id(@!p:name_pointer); {print identifier or module name}
   896  var k:0..max_bytes; {index into |byte_mem|}
   897  @!w:0..ww-1; {segment of |byte_mem|}
   898  begin if p>=name_ptr then print('IMPOSSIBLE')
   899  else  begin w:=p mod ww;
   900    for k:=byte_start[p] to byte_start[p+ww]-1 do print(xchr[byte_mem[w,k]]);
   901    end;
   902  end;
   903  
   904  @* Searching for identifiers.
   905  The hash table described above is updated by the |id_lookup| procedure,
   906  which finds a given identifier and returns a pointer to its index in
   907  |byte_start|. If the identifier was not already present, it is inserted with
   908  a given |ilk| code; and an error message is printed if the identifier is being
   909  doubly defined.
   910  
   911  Because of the way \.{TANGLE}'s scanning mechanism works, it is most convenient
   912  to let |id_lookup| search for an identifier that is present in the |buffer|
   913  array. Two other global variables specify its position in the buffer: the
   914  first character is |buffer[id_first]|, and the last is |buffer[id_loc-1]|.
   915  Furthermore, if the identifier is really a string, the global variable
   916  |double_chars| tells how many of the characters in the buffer appear
   917  twice (namely \.{@@@@} and \.{""}), since this additional information makes
   918  it easy to calculate the true length of the string. The final double-quote
   919  of the string is not included in its ``identifier,'' but the first one is,
   920  so the string length is |id_loc-id_first-double_chars-1|.
   921  
   922  We have mentioned that |normal| identifiers belong to two hash tables,
   923  one for their true names as they appear in the \.{WEB} file and the other
   924  when they have been reduced to their first |unambig_length| characters.
   925  The hash tables are kept by the method of simple chaining, where the
   926  heads of the individual lists appear in the |hash| and |chop_hash| arrays.
   927  If |h| is a hash code, the primary hash table list starts at |hash[h]| and
   928  proceeds through |link| pointers; the secondary hash table list starts at
   929  |chop_hash[h]| and proceeds through |equiv| pointers. Of course, the same
   930  identifier will probably have two different values of |h|.
   931  
   932  The |id_lookup| procedure uses an auxiliary array called |chopped_id| to
   933  contain up to |unambig_length| characters of the current identifier, if
   934  it is necessary to compute the secondary hash code. (This array could be
   935  declared local to |id_lookup|, but in general we are making all array
   936  declarations global in this program, because some compilers and some machine
   937  architectures make dynamic array allocation inefficient.)
   938  
   939  @<Glob...@>=
   940  @!id_first:0..buf_size; {where the current identifier begins in the buffer}
   941  @!id_loc:0..buf_size; {just after the current identifier in the buffer}
   942  @!double_chars:0..buf_size; {correction to length in case of strings}
   943  @#
   944  @!hash,@!chop_hash:array [0..hash_size] of sixteen_bits; {heads of hash lists}
   945  @!chopped_id:array [0..unambig_length] of ASCII_code; {chopped identifier}
   946  
   947  @ Initially all the hash lists are empty.
   948  
   949  @<Local variables for init...@>=
   950  @!h:0..hash_size; {index into hash-head arrays}
   951  
   952  @ @<Set init...@>=
   953  for h:=0 to hash_size-1 do
   954    begin hash[h]:=0; chop_hash[h]:=0;
   955    end;
   956  
   957  @ Here now is the main procedure for finding identifiers (and strings).
   958  The parameter |t| is set to |normal| except when the identifier is
   959  a macro name that is just being defined; in the latter case, |t| will be
   960  |numeric|, |simple|, or |parametric|.
   961  
   962  @p function id_lookup(@!t:eight_bits):name_pointer; {finds current identifier}
   963  label found, not_found;
   964  var c:eight_bits; {byte being chopped}
   965  @!i:0..buf_size; {index into |buffer|}
   966  @!h:0..hash_size; {hash code}
   967  @!k:0..max_bytes; {index into |byte_mem|}
   968  @!w:0..ww-1; {segment of |byte_mem|}
   969  @!l:0..buf_size; {length of the given identifier}
   970  @!p,@!q:name_pointer; {where the identifier is being sought}
   971  @!s:0..unambig_length; {index into |chopped_id|}
   972  begin l:=id_loc-id_first; {compute the length}
   973  @<Compute the hash code |h|@>;
   974  @<Compute the name location |p|@>;
   975  if (p=name_ptr)or(t<>normal) then
   976    @<Update the tables and check for possible errors@>;
   977  id_lookup:=p;
   978  end;
   979  
   980  @ A simple hash code is used: If the sequence of
   981  ASCII codes is $c_1c_2\ldots c_n$, its hash value will be
   982  $$(2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\,\bmod\,|hash_size|.$$
   983  
   984  @<Compute the hash...@>=
   985  h:=buffer[id_first]; i:=id_first+1;
   986  while i<id_loc do
   987    begin h:=(h+h+buffer[i]) mod hash_size; incr(i);
   988    end
   989  
   990  @ If the identifier is new, it will be placed in position |p=name_ptr|,
   991  otherwise |p| will point to its existing location.
   992  
   993  @<Compute the name location...@>=
   994  p:=hash[h];
   995  while p<>0 do
   996    begin if length(p)=l then
   997        @<Compare name |p| with current identifier, |goto found| if equal@>;
   998    p:=link[p];
   999    end;
  1000  p:=name_ptr; {the current identifier is new}
  1001  link[p]:=hash[h]; hash[h]:=p; {insert |p| at beginning of hash list}
  1002  found:
  1003  
  1004  @ @<Compare name |p|...@>=
  1005  begin i:=id_first; k:=byte_start[p]; w:=p mod ww;
  1006  while (i<id_loc)and(buffer[i]=byte_mem[w,k]) do
  1007    begin incr(i); incr(k);
  1008    end;
  1009  if i=id_loc then goto found; {all characters agree}
  1010  end
  1011  
  1012  @ @<Update the tables...@>=
  1013  begin if ((p<>name_ptr)and(t<>normal)and(ilk[p]=normal)) or
  1014      ((p=name_ptr)and(t=normal)and(buffer[id_first]<>"""")) then
  1015    @<Compute the secondary hash code |h| and put the first characters
  1016    into the auxiliary array |chopped_id|@>;
  1017  if p<>name_ptr then
  1018    @<Give double-definition error, if necessary, and change |p| to type |t|@>
  1019  else @<Enter a new identifier into the table at position |p|@>;
  1020  end
  1021  
  1022  @ The following routine, which is called into play when it is necessary to
  1023  look at the secondary hash table, computes the same hash function as before
  1024  (but on the chopped data), and places a zero after the chopped identifier
  1025  in |chopped_id| to serve as a convenient sentinel.
  1026  
  1027  @<Compute the secondary...@>=
  1028  begin i:=id_first; s:=0; h:=0;
  1029  while (i<id_loc)and(s<unambig_length) do
  1030    begin if buffer[i]<>"_" then
  1031      begin if buffer[i]>="a" then chopped_id[s]:=buffer[i]-@'40
  1032      else chopped_id[s]:=buffer[i];
  1033      h:=(h+h+chopped_id[s]) mod hash_size; incr(s);
  1034      end;
  1035    incr(i);
  1036    end;
  1037  chopped_id[s]:=0;
  1038  end
  1039  
  1040  @ If a nonnumeric macro has appeared before it was defined, \.{TANGLE}
  1041  will still work all right; after all, such behavior is typical of the
  1042  replacement texts for modules, which act very much like macros.
  1043  However, an undefined numeric macro may not be used on the right-hand
  1044  side of another numeric macro definition, so \.{TANGLE} finds it
  1045  simplest to make a blanket rule that numeric macros should be defined
  1046  before they are used. The following routine gives an error message and
  1047  also fixes up any damage that may have been caused.
  1048  
  1049  @<Give double...@>= {now |p<>name_ptr| and |t<>normal|}
  1050  begin if ilk[p]=normal then
  1051    begin if t=numeric then err_print('! This identifier has already appeared');
  1052  @.This identifier has already...@>
  1053    @<Remove |p| from secondary hash table@>;
  1054    end
  1055  else err_print('! This identifier was defined before');
  1056  @.This identifier was defined...@>
  1057  ilk[p]:=t;
  1058  end
  1059  
  1060  @ When we have to remove a secondary hash entry, because a |normal| identifier
  1061  is changing to another |ilk|, the hash code |h| and chopped identifier have
  1062  already been computed.
  1063  
  1064  @<Remove |p| from secondary...@>=
  1065  q:=chop_hash[h];
  1066  if q=p then chop_hash[h]:=equiv[p]
  1067  else  begin while equiv[q]<>p do q:=equiv[q];
  1068    equiv[q]:=equiv[p];
  1069    end
  1070  
  1071  @ The following routine could make good use of a generalized |pack| procedure
  1072  that puts items into just part of a packed array instead of the whole thing.
  1073  
  1074  @<Enter a new identifier...@>=
  1075  begin if (t=normal)and(buffer[id_first]<>"""") then
  1076    @<Check for ambiguity and update secondary hash@>;
  1077  w:=name_ptr mod ww; k:=byte_ptr[w];
  1078  if k+l>max_bytes then overflow('byte memory');
  1079  if name_ptr>max_names-ww then overflow('name');
  1080  i:=id_first; {get ready to move the identifier into |byte_mem|}
  1081  while i<id_loc do
  1082    begin byte_mem[w,k]:=buffer[i]; incr(k); incr(i);
  1083    end;
  1084  byte_ptr[w]:=k; byte_start[name_ptr+ww]:=k; incr(name_ptr);
  1085  if buffer[id_first]<>"""" then ilk[p]:=t
  1086  else @<Define and output a new string of the pool@>;
  1087  end
  1088  
  1089  @ @<Check for ambig...@>=
  1090  begin q:=chop_hash[h];
  1091  while q<>0 do
  1092    begin @<Check if |q| conflicts with |p|@>;
  1093    q:=equiv[q];
  1094    end;
  1095  equiv[p]:=chop_hash[h]; chop_hash[h]:=p; {put |p| at front of secondary list}
  1096  end
  1097  
  1098  @ @<Check if |q| conflicts...@>=
  1099  begin k:=byte_start[q]; s:=0; w:=q mod ww;
  1100  while (k<byte_start[q+ww]) and (s<unambig_length) do
  1101    begin c:=byte_mem[w,k];
  1102    if c<>"_" then
  1103      begin if c>="a" then c:=c-@'40; {merge lowercase with uppercase}
  1104      if chopped_id[s]<>c then goto not_found;
  1105      incr(s);
  1106      end;
  1107    incr(k);
  1108    end;
  1109  if (k=byte_start[q+ww])and(chopped_id[s]<>0) then goto not_found;
  1110  print_nl('! Identifier conflict with ');
  1111  @.Identifier conflict...@>
  1112  for k:=byte_start[q] to byte_start[q+ww]-1 do print(xchr[byte_mem[w,k]]);
  1113  error; q:=0; {only one conflict will be printed, since |equiv[0]=0|}
  1114  not_found:
  1115  end
  1116  
  1117  @ We compute the string pool check sum by working modulo a prime number
  1118  that is large but not so large that overflow might occur.
  1119  
  1120  @d check_sum_prime==@'3777777667 {$2^{29}-73$}
  1121  @^preprocessed strings@>
  1122  
  1123  @<Define and output a new string...@>=
  1124  begin ilk[p]:=numeric; {strings are like numeric macros}
  1125  if l-double_chars=2 then {this string is for a single character}
  1126    equiv[p]:=buffer[id_first+1]+@'100000
  1127  else  begin equiv[p]:=string_ptr+@'100000;
  1128    l:=l-double_chars-1;
  1129    if l>99 then err_print('! Preprocessed string is too long');
  1130  @.Preprocessed string is too long@>
  1131    incr(string_ptr);
  1132    write(pool,xchr["0"+l div 10],xchr["0"+l mod 10]); {output the length}
  1133    pool_check_sum:=pool_check_sum+pool_check_sum+l;
  1134    while pool_check_sum>check_sum_prime do
  1135      pool_check_sum:=pool_check_sum-check_sum_prime;
  1136    i:=id_first+1;
  1137    while i<id_loc do
  1138      begin write(pool,xchr[buffer[i]]); {output characters of string}
  1139      pool_check_sum:=pool_check_sum+pool_check_sum+buffer[i];
  1140      while pool_check_sum>check_sum_prime do
  1141        pool_check_sum:=pool_check_sum-check_sum_prime;
  1142      if (buffer[i]="""") or (buffer[i]="@@") then
  1143        i:=i+2 {omit second appearance of doubled character}
  1144      else incr(i);
  1145      end;
  1146    write_ln(pool);
  1147    end;
  1148  end
  1149  
  1150  @* Searching for module names.
  1151  The |mod_lookup| procedure finds the module name |mod_text[1..l]| in the
  1152  search tree, after inserting it if necessary, and returns a pointer to
  1153  where it was found.
  1154  
  1155  @<Glob...@>=
  1156  @!mod_text:array [0..longest_name] of ASCII_code; {name being sought for}
  1157  
  1158  @ According to the rules of \.{WEB}, no module name
  1159  should be a proper prefix of another, so a ``clean'' comparison should
  1160  occur between any two names. The result of |mod_lookup| is 0 if this
  1161  prefix condition is violated. An error message is printed when such violations
  1162  are detected during phase two of \.{WEAVE}.
  1163  
  1164  @d less=0 {the first name is lexicographically less than the second}
  1165  @d equal=1 {the first name is equal to the second}
  1166  @d greater=2 {the first name is lexicographically greater than the second}
  1167  @d prefix=3 {the first name is a proper prefix of the second}
  1168  @d extension=4 {the first name is a proper extension of the second}
  1169  
  1170  @p function mod_lookup(@!l:sixteen_bits):name_pointer; {finds module name}
  1171  label found;
  1172  var c:less..extension; {comparison between two names}
  1173  @!j:0..longest_name; {index into |mod_text|}
  1174  @!k:0..max_bytes; {index into |byte_mem|}
  1175  @!w:0..ww-1; {segment of |byte_mem|}
  1176  @!p:name_pointer; {current node of the search tree}
  1177  @!q:name_pointer; {father of node |p|}
  1178  begin c:=greater; q:=0; p:=rlink[0]; {|rlink[0]| is the root of the tree}
  1179  while p<>0 do
  1180    begin @<Set \(|c| to the result of comparing the given name to
  1181      name |p|@>;
  1182    q:=p;
  1183    if c=less then p:=llink[q]
  1184    else if c=greater then p:=rlink[q]
  1185    else goto found;
  1186    end;
  1187  @<Enter a new module name into the tree@>;
  1188  found: if c<>equal then
  1189    begin err_print('! Incompatible section names'); p:=0;
  1190  @.Incompatible module names@>
  1191    end;
  1192  mod_lookup:=p;
  1193  end;
  1194  
  1195  @ @<Enter a new module name...@>=
  1196  w:=name_ptr mod ww; k:=byte_ptr[w];
  1197  if k+l>max_bytes then overflow('byte memory');
  1198  if name_ptr>max_names-ww then overflow('name');
  1199  p:=name_ptr;
  1200  if c=less then llink[q]:=p else rlink[q]:=p;
  1201  llink[p]:=0; rlink[p]:=0; c:=equal; equiv[p]:=0;
  1202  for j:=1 to l do byte_mem[w,k+j-1]:=mod_text[j];
  1203  byte_ptr[w]:=k+l; byte_start[name_ptr+ww]:=k+l; incr(name_ptr);
  1204  
  1205  @ @<Set \(|c|...@>=
  1206  begin k:=byte_start[p]; w:=p mod ww; c:=equal; j:=1;
  1207  while (k<byte_start[p+ww]) and (j<=l) and (mod_text[j]=byte_mem[w,k]) do
  1208    begin incr(k); incr(j);
  1209    end;
  1210  if k=byte_start[p+ww] then
  1211    if j>l then c:=equal
  1212    else c:=extension
  1213  else if j>l then c:=prefix
  1214  else if mod_text[j]<byte_mem[w,k] then c:=less
  1215  else c:=greater;
  1216  end
  1217  
  1218  @ The |prefix_lookup| procedure is supposed to find exactly one module
  1219  name that has |mod_text[1..l]| as a prefix. Actually the algorithm silently
  1220  accepts also the situation that some module name is a prefix of
  1221  |mod_text[1..l]|, because the user who painstakingly typed in more than
  1222  necessary probably doesn't want to be told about the wasted effort.
  1223  
  1224  @p function prefix_lookup(@!l:sixteen_bits):name_pointer; {finds name extension}
  1225  var c:less..extension; {comparison between two names}
  1226  @!count:0..max_names; {the number of hits}
  1227  @!j:0..longest_name; {index into |mod_text|}
  1228  @!k:0..max_bytes; {index into |byte_mem|}
  1229  @!w:0..ww-1; {segment of |byte_mem|}
  1230  @!p:name_pointer; {current node of the search tree}
  1231  @!q:name_pointer; {another place to resume the search after one branch is done}
  1232  @!r:name_pointer; {extension found}
  1233  begin q:=0; p:=rlink[0]; count:=0; r:=0; {begin search at root of tree}
  1234  while p<>0 do
  1235    begin @<Set \(|c|...@>;
  1236    if c=less then p:=llink[p]
  1237    else if c=greater then p:=rlink[p]
  1238    else  begin r:=p; incr(count); q:=rlink[p]; p:=llink[p];
  1239      end;
  1240    if p=0 then
  1241      begin p:=q; q:=0;
  1242      end;
  1243    end;
  1244  if count<>1 then
  1245    if count=0 then err_print('! Name does not match')
  1246  @.Name does not match@>
  1247    else err_print('! Ambiguous prefix');
  1248  @.Ambiguous prefix@>
  1249  prefix_lookup:=r; {the result will be 0 if there was no match}
  1250  end;
  1251  
  1252  @* Tokens.
  1253  Replacement texts, which represent \PASCAL\ code in a compressed format,
  1254  appear in |tok_mem| as mentioned above. The codes in
  1255  these texts are called `tokens'; some tokens occupy two consecutive
  1256  eight-bit byte positions, and the others take just one byte.
  1257  
  1258  If $p>0$ points to a replacement text, |tok_start[p]| is the |tok_mem| position
  1259  of the first eight-bit code of that text. If |text_link[p]=0|,
  1260  this is the replacement text for a macro, otherwise it is the replacement
  1261  text for a module. In the latter case |text_link[p]| is either equal to
  1262  |module_flag|, which means that there is no further text for this module, or
  1263  |text_link[p]| points to a
  1264  continuation of this replacement text; such links are created when
  1265  several modules have \PASCAL\ texts with the same name, and they also
  1266  tie together all the \PASCAL\ texts of unnamed modules.
  1267  The replacement text pointer for the first unnamed module
  1268  appears in |text_link[0]|, and the most recent such pointer is |last_unnamed|.
  1269  
  1270  @d module_flag==max_texts {final |text_link| in module replacement texts}
  1271  
  1272  @<Glob...@>=
  1273  @!last_unnamed:text_pointer; {most recent replacement text of unnamed module}
  1274  
  1275  @ @<Set init...@>= last_unnamed:=0; text_link[0]:=0;
  1276  
  1277  @ If the first byte of a token is less than @'200, the token occupies a
  1278  single byte. Otherwise we make a sixteen-bit token by combining two consecutive
  1279  bytes |a| and |b|. If |@'200<=a<@'250|, then $(a-@'200)\times2^8+b$ points
  1280  to an identifier; if |@'250<=a<@'320|, then
  1281  $(a-@'250)\times2^8+b$ points to a module name; otherwise, i.e., if
  1282  |@'320<=a<@'400|, then $(a-@'320)\times2^8+b$ is the number of the module
  1283  in which the current replacement text appears.
  1284  
  1285  Codes less than @'200 are 7-bit ASCII codes that represent themselves.
  1286  In particular, a single-character identifier like `|x|' will be a one-byte
  1287  token, while all longer identifiers will occupy two bytes.
  1288  
  1289  Some of the 7-bit ASCII codes will not be present, however, so we can
  1290  use them for special purposes. The following symbolic names are used:
  1291  
  1292  \yskip\hang |param| denotes insertion of a parameter. This occurs only in
  1293  the replacement texts of parametric macros, outside of single-quoted strings
  1294  in those texts.
  1295  
  1296  \hang |begin_comment| denotes \.{@@\{}, which will become either
  1297  \.{\{} or \.{[}.
  1298  
  1299  \hang |end_comment| denotes \.{@@\}}, which will become either
  1300  \.{\}} or \.{]}.
  1301  
  1302  \hang |octal| denotes the \.{@@\'} that precedes an octal constant.
  1303  
  1304  \hang |hex| denotes the \.{@@"} that precedes a hexadecimal constant.
  1305  
  1306  \hang |check_sum| denotes the \.{@@\char'44} that denotes the string pool
  1307  check sum.
  1308  
  1309  \hang |join| denotes the concatenation of adjacent items with no
  1310  space or line breaks allowed between them (the \.{@@\&} operation of \.{WEB}).
  1311  
  1312  \hang |double_dot| denotes `\.{..}' in \PASCAL.
  1313  
  1314  \hang |verbatim| denotes the \.{@@=} that begins a verbatim \PASCAL\ string.
  1315  The \.{@@>} at the end of such a string is also denoted by |verbatim|.
  1316  
  1317  \hang |force_line| denotes the \.{@@\\} that forces a new line in the
  1318  \PASCAL\ output.
  1319  @^ASCII code@>
  1320  
  1321  @d param=0 {ASCII null code will not appear}
  1322  @d verbatim=@'2 {extended ASCII alpha should not appear}
  1323  @d force_line=@'3 {extended ASCII beta should not appear}
  1324  @d begin_comment=@'11 {ASCII tab mark will not appear}
  1325  @d end_comment=@'12 {ASCII line feed will not appear}
  1326  @d octal=@'14 {ASCII form feed will not appear}
  1327  @d hex=@'15 {ASCII carriage return will not appear}
  1328  @d double_dot=@'40 {ASCII space will not appear except in strings}
  1329  @d check_sum=@'175 {will not be confused with right brace}
  1330  @d join=@'177 {ASCII delete will not appear}
  1331  
  1332  @ The following procedure is used to enter a two-byte value into
  1333  |tok_mem| when a replacement text is being generated.
  1334  
  1335  @p procedure store_two_bytes(@!x:sixteen_bits);
  1336    {stores high byte, then low byte}
  1337  begin if tok_ptr[z]+2>max_toks then overflow('token');
  1338  tok_mem[z,tok_ptr[z]]:=x div@'400; {this could be done by a shift command}
  1339  tok_mem[z,tok_ptr[z]+1]:=x mod@'400; {this could be done by a logical and}
  1340  tok_ptr[z]:=tok_ptr[z]+2;
  1341  end;
  1342  
  1343  @ When \.{TANGLE} is being operated in debug mode, it has a procedure to display
  1344  a replacement text in symbolic form. This procedure has not been spruced up to
  1345  generate a real great format, but at least the results are not as bad as
  1346  a memory dump.
  1347  
  1348  @p @!debug procedure print_repl(@!p:text_pointer);
  1349  var k:0..max_toks; {index into |tok_mem|}
  1350  @!a: sixteen_bits; {current byte(s)}
  1351  @!zp: 0..zz-1; {segment of |tok_mem| being accessed}
  1352  begin if p>=text_ptr then print('BAD')
  1353  else  begin k:=tok_start[p]; zp:=p mod zz;
  1354    while k<tok_start[p+zz] do
  1355      begin a:=tok_mem[zp,k];
  1356      if a>=@'200 then @<Display two-byte token starting with |a|@>
  1357      else @<Display one-byte token |a|@>;
  1358      incr(k);
  1359      end;
  1360    end;
  1361  end;
  1362  gubed
  1363  
  1364  @ @<Display two-byte...@>=
  1365  begin incr(k);
  1366  if a<@'250 then {identifier or string}
  1367    begin a:=(a-@'200)*@'400+tok_mem[zp,k]; print_id(a);
  1368    if byte_mem[a mod ww,byte_start[a]]="""" then print('"')
  1369    else print(' ');
  1370    end
  1371  else if a<@'320 then {module name}
  1372    begin print('@@<'); print_id((a-@'250)*@'400+tok_mem[zp,k]);
  1373    print('@@>');
  1374    end
  1375  else  begin a:=(a-@'320)*@'400+tok_mem[zp,k]; {module number}
  1376    print('@@',xchr["{"],a:1,'@@',xchr["}"]); {can't use right brace
  1377      between \&{debug} and \&{gubed}}
  1378    end;
  1379  end
  1380  
  1381  @ @<Display one-byte...@>=
  1382  case a of
  1383  begin_comment: print('@@',xchr["{"]);
  1384  end_comment: print('@@',xchr["}"]); {can't use right brace
  1385      between \&{debug} and \&{gubed}}
  1386  octal: print('@@''');
  1387  hex: print('@@"');
  1388  check_sum: print('@@$');
  1389  param: print('#');
  1390  "@@": print('@@@@');
  1391  verbatim: print('@@=');
  1392  force_line: print('@@\');
  1393  othercases print(xchr[a])
  1394  endcases
  1395  
  1396  @* Stacks for output.
  1397  Let's make sure that our data structures contain enough information to
  1398  produce the entire \PASCAL\ program as desired, by working next on the
  1399  algorithms that actually do produce that program.
  1400  
  1401  @ The output process uses a stack to keep track of what is going on at
  1402  different ``levels'' as the macros are being expanded.
  1403  Entries on this stack have five parts:
  1404  
  1405  \yskip\hang |end_field| is the |tok_mem| location where the replacement
  1406  text of a particular level will end;
  1407  
  1408  \hang |byte_field| is the |tok_mem| location from which the next token
  1409  on a particular level will be read;
  1410  
  1411  \hang |name_field| points to the name corresponding to a particular level;
  1412  
  1413  \hang |repl_field| points to the replacement text currently being read
  1414  at a particular level;
  1415  
  1416  \hang |mod_field| is the module number, or zero if this is a macro.
  1417  
  1418  \yskip\noindent The current values of these five quantities are referred to
  1419  quite frequently, so they are stored in a separate place instead of in
  1420  the |stack| array. We call the current values |cur_end|, |cur_byte|,
  1421  |cur_name|, |cur_repl|, and |cur_mod|.
  1422  
  1423  The global variable |stack_ptr| tells how many levels of output are
  1424  currently in progress. The end of all output occurs when the stack is
  1425  empty, i.e., when |stack_ptr=0|.
  1426  
  1427  @<Types...@>=
  1428  @t\4@>@!output_state=record
  1429    @!end_field: sixteen_bits; {ending location of replacement text}
  1430    @!byte_field: sixteen_bits; {present location within replacement text}
  1431    @!name_field: name_pointer; {|byte_start| index for text being output}
  1432    @!repl_field: text_pointer; {|tok_start| index for text being output}
  1433    @!mod_field: 0..@'27777; {module number or zero if not a module}
  1434    end;
  1435  
  1436  @ @d cur_end==cur_state.end_field {current ending location in |tok_mem|}
  1437  @d cur_byte==cur_state.byte_field {location of next output byte in |tok_mem|}
  1438  @d cur_name==cur_state.name_field {pointer to current name being expanded}
  1439  @d cur_repl==cur_state.repl_field {pointer to current replacement text}
  1440  @d cur_mod==cur_state.mod_field {current module number being expanded}
  1441  
  1442  @<Globals...@>=
  1443  @!cur_state : output_state; {|cur_end|, |cur_byte|, |cur_name|,
  1444    |cur_repl|, |cur_mod|}
  1445  @!stack : array [1..stack_size] of output_state; {info for non-current levels}
  1446  @!stack_ptr: 0..stack_size; {first unused location in the output state stack}
  1447  
  1448  @ It is convenient to keep a global variable |zo| equal to |cur_repl mod zz|.
  1449  
  1450  @<Glob...@>=
  1451  @!zo:0..zz-1; {the segment of |tok_mem| from which output is coming}
  1452  
  1453  @ Parameters must also be stacked. They are placed in
  1454  |tok_mem| just above the other replacement texts, and dummy parameter
  1455  `names' are placed in |byte_start| just after the other names.
  1456  The variables |text_ptr| and |tok_ptr[z]| essentially serve as parameter
  1457  stack pointers during the output phase, so there is no need for a separate
  1458  data structure to handle this problem.
  1459  
  1460  @ There is an implicit stack corresponding to meta-comments that are output
  1461  via \.{@@\{} and \.{@@\}}. But this stack need not be represented in detail,
  1462  because we only need to know whether it is empty or not. A global variable
  1463  |brace_level| tells how many items would be on this stack if it were present.
  1464  
  1465  @<Globals...@>=
  1466  @!brace_level: eight_bits; {current depth of $\.{@@\{}\ldots\.{@@\}}$ nesting}
  1467  
  1468  @ To get the output process started, we will perform the following
  1469  initialization steps. We may assume that |text_link[0]| is nonzero, since it
  1470  points to the \PASCAL\ text in the first unnamed module that generates
  1471  code; if there are no such modules, there is nothing to output, and an
  1472  error message will have been generated before we do any of the initialization.
  1473  
  1474  @<Initialize the output stacks@>=
  1475  stack_ptr:=1; brace_level:=0; cur_name:=0; cur_repl:=text_link[0];
  1476  zo:=cur_repl mod zz; cur_byte:=tok_start[cur_repl];
  1477  cur_end:=tok_start[cur_repl+zz]; cur_mod:=0;
  1478  
  1479  @ When the replacement text for name |p| is to be inserted into the output,
  1480  the following subroutine is called to save the old level of output and get
  1481  the new one going.
  1482  
  1483  @p procedure push_level(@!p:name_pointer); {suspends the current level}
  1484  begin if stack_ptr=stack_size then overflow('stack')
  1485  else  begin stack[stack_ptr]:=cur_state; {save |cur_end|, |cur_byte|, etc.}
  1486    incr(stack_ptr);
  1487    cur_name:=p; cur_repl:=equiv[p]; zo:=cur_repl mod zz;
  1488    cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
  1489    cur_mod:=0;
  1490    end;
  1491  end;
  1492  
  1493  @ When we come to the end of a replacement text, the |pop_level| subroutine
  1494  does the right thing: It either moves to the continuation of this replacement
  1495  text or returns the state to the most recently stacked level. Part of this
  1496  subroutine, which updates the parameter stack, will be given later when we
  1497  study the parameter stack in more detail.
  1498  
  1499  @p procedure pop_level; {do this when |cur_byte| reaches |cur_end|}
  1500  label exit;
  1501  begin if text_link[cur_repl]=0 then {end of macro expansion}
  1502    begin if ilk[cur_name]=parametric then
  1503      @<Remove a parameter from the parameter stack@>;
  1504    end
  1505  else if text_link[cur_repl]<module_flag then {link to a continuation}
  1506    begin cur_repl:=text_link[cur_repl]; {we will stay on the same level}
  1507    zo:=cur_repl mod zz;
  1508    cur_byte:=tok_start[cur_repl]; cur_end:=tok_start[cur_repl+zz];
  1509    return;
  1510    end;
  1511  decr(stack_ptr); {we will go down to the previous level}
  1512  if stack_ptr>0 then
  1513    begin cur_state:=stack[stack_ptr]; zo:=cur_repl mod zz;
  1514    end;
  1515  exit: end;
  1516  
  1517  @ The heart of the output procedure is the |get_output| routine, which produces
  1518  the next token of output that is not a reference to a macro. This procedure
  1519  handles all the stacking and unstacking that is necessary. It returns the
  1520  value |number| if the next output has a numeric value (the value of a
  1521  numeric macro or string), in which case |cur_val| has been set to the
  1522  number in question. The procedure also returns the value |module_number|
  1523  if the next output begins or ends the replacement text of some module,
  1524  in which case |cur_val| is that module's number (if beginning) or the
  1525  negative of that value (if ending). And it returns the value |identifier|
  1526  if the next output is an identifier of length two or more, in which case
  1527  |cur_val| points to that identifier name.
  1528  
  1529  @d number=@'200 {code returned by |get_output| when next output is numeric}
  1530  @d module_number=@'201 {code returned by |get_output| for module numbers}
  1531  @d identifier=@'202 {code returned by |get_output| for identifiers}
  1532  
  1533  @<Globals...@>=
  1534  @!cur_val:integer; {additional information corresponding to output token}
  1535  
  1536  @ If |get_output| finds that no more output remains, it returns the value zero.
  1537  
  1538  @p function get_output:sixteen_bits; {returns next token after macro expansion}
  1539  label restart, done, found;
  1540  var a:sixteen_bits; {value of current byte}
  1541  @!b:eight_bits; {byte being copied}
  1542  @!bal:sixteen_bits; {excess of \.( versus \.) while copying a parameter}
  1543  @!k:0..max_bytes; {index into |byte_mem|}
  1544  @!w:0..ww-1; {segment of |byte_mem|}
  1545  begin restart: if stack_ptr=0 then
  1546    begin a:=0; goto found;
  1547    end;
  1548  if cur_byte=cur_end then
  1549    begin cur_val:=-cur_mod; pop_level;
  1550    if cur_val=0 then goto restart;
  1551    a:=module_number; goto found;
  1552    end;
  1553  a:=tok_mem[zo,cur_byte]; incr(cur_byte);
  1554  if a<@'200 then {one-byte token}
  1555    if a=param then
  1556        @<Start scanning current macro parameter, |goto restart|@>
  1557    else goto found;
  1558  a:=(a-@'200)*@'400+tok_mem[zo,cur_byte]; incr(cur_byte);
  1559  if a<@'24000 then {|@'24000=(@'250-@'200)*@'400|}
  1560    @<Expand macro |a| and |goto found|, or |goto restart| if no output found@>;
  1561  if a<@'50000 then {|@'50000=(@'320-@'200)*@'400|}
  1562    @<Expand module |a-@'24000|, |goto restart|@>;
  1563  cur_val:=a-@'50000; a:=module_number; cur_mod:=cur_val;
  1564  found:
  1565  @!debug if trouble_shooting then debug_help;@;@+gubed@/
  1566  get_output:=a;
  1567  end;
  1568  
  1569  @ The user may have forgotten to give any \PASCAL\ text for a module name,
  1570  or the \PASCAL\ text may have been associated with a different name by mistake.
  1571  
  1572  @<Expand module |a-...@>=
  1573  begin a:=a-@'24000;
  1574  if equiv[a]<>0 then push_level(a)
  1575  else if a<>0 then
  1576    begin print_nl('! Not present: <'); print_id(a); print('>'); error;
  1577  @.Not present: <section name>@>
  1578    end;
  1579  goto restart;
  1580  end
  1581  
  1582  @ @<Expand macro ...@>=
  1583  begin case ilk[a] of
  1584  normal: begin cur_val:=a; a:=identifier;
  1585    end;
  1586  numeric: begin cur_val:=equiv[a]-@'100000; a:=number;
  1587    end;
  1588  simple: begin push_level(a); goto restart;
  1589    end;
  1590  parametric: begin @<Put a parameter on the parameter stack,
  1591    or |goto restart| if error occurs@>;
  1592    push_level(a); goto restart;
  1593    end;
  1594  othercases confusion('output')
  1595  endcases;@/
  1596  goto found;
  1597  end
  1598  
  1599  @ We come now to the interesting part, the job of putting a parameter on
  1600  the parameter stack. First we pop the stack if necessary until getting to
  1601  a level that hasn't ended. Then the next character must be a `\.(';
  1602  and since parentheses are balanced on each level, the entire parameter must
  1603  be present, so we can copy it without difficulty.
  1604  
  1605  @<Put a parameter...@>=
  1606  while (cur_byte=cur_end)and(stack_ptr>0) do pop_level;
  1607  if (stack_ptr=0)or(tok_mem[zo,cur_byte]<>"(") then
  1608    begin print_nl('! No parameter given for '); print_id(a); error;
  1609  @.No parameter given for macro@>
  1610    goto restart;
  1611    end;
  1612  @<Copy the parameter into |tok_mem|@>;
  1613  equiv[name_ptr]:=text_ptr; ilk[name_ptr]:=simple; w:=name_ptr mod ww;
  1614  k:=byte_ptr[w];
  1615  @!debug if k=max_bytes then overflow('byte memory');
  1616  byte_mem[w,k]:="#"; incr(k); byte_ptr[w]:=k;
  1617  gubed {this code has set the parameter identifier for debugging printouts}
  1618  if name_ptr>max_names-ww then overflow('name');
  1619  byte_start[name_ptr+ww]:=k; incr(name_ptr);
  1620  if text_ptr>max_texts-zz then overflow('text');
  1621  text_link[text_ptr]:=0; tok_start[text_ptr+zz]:=tok_ptr[z];
  1622  incr(text_ptr);
  1623  z:=text_ptr mod zz
  1624  
  1625  @ The |pop_level| routine undoes the effect of parameter-pushing when
  1626  a parameter macro is finished:
  1627  
  1628  @<Remove a parameter...@>=
  1629  begin decr(name_ptr); decr(text_ptr);
  1630  z:=text_ptr mod zz;
  1631  stat if tok_ptr[z]>max_tok_ptr[z] then max_tok_ptr[z]:=tok_ptr[z];
  1632  tats {the maximum value of |tok_ptr| occurs just before parameter popping}
  1633  tok_ptr[z]:=tok_start[text_ptr];
  1634  @!debug decr(byte_ptr[name_ptr mod ww]);@+gubed
  1635  end
  1636  
  1637  @ When a parameter occurs in a replacement text, we treat it as a simple
  1638  macro in position (|name_ptr-1|):
  1639  
  1640  @<Start scanning...@>=
  1641  begin push_level(name_ptr-1); goto restart;
  1642  end
  1643  
  1644  @ Similarly, a |param| token encountered as we copy a parameter is converted
  1645  into a simple macro call for |name_ptr-1|.
  1646  Some care is needed to handle cases like \\{macro}|(#; print('#)'))|; the
  1647  \.{\#} token will have been changed to |param| outside of strings, but we
  1648  still must distinguish `real' parentheses from those in strings.
  1649  
  1650  @d app_repl(#)==begin if tok_ptr[z]=max_toks then overflow('token');
  1651    tok_mem[z,tok_ptr[z]]:=#; incr(tok_ptr[z]); end
  1652  
  1653  @<Copy the parameter...@>=
  1654  bal:=1; incr(cur_byte); {skip the opening `\.('}
  1655  loop@+  begin b:=tok_mem[zo,cur_byte]; incr(cur_byte);
  1656    if b=param then store_two_bytes(name_ptr+@'77777)
  1657    else  begin if b>=@'200 then
  1658        begin app_repl(b);
  1659        b:=tok_mem[zo,cur_byte]; incr(cur_byte);
  1660        end
  1661      else   case b of
  1662        "(": incr(bal);
  1663        ")":  begin decr(bal);
  1664          if bal=0 then goto done;
  1665          end;
  1666        "'": repeat app_repl(b);
  1667          b:=tok_mem[zo,cur_byte]; incr(cur_byte);
  1668          until b="'"; {copy string, don't change |bal|}
  1669        othercases do_nothing
  1670        endcases;
  1671      app_repl(b);
  1672      end;
  1673    end;
  1674  done:
  1675  
  1676  @* Producing the output.
  1677  The |get_output| routine above handles most of the complexity of output
  1678  generation, but there are two further considerations that have a nontrivial
  1679  effect on \.{TANGLE}'s algorithms.
  1680  
  1681  First, we want to make sure that the output is broken into lines not
  1682  exceeding |line_length| characters per line, where these breaks occur at
  1683  valid places (e.g., not in the middle of a string or a constant or an
  1684  identifier, not between `\.<' and `\.>', not at a `\.{@@\&}' position
  1685  where quantities are being joined together). Therefore we assemble the
  1686  output into a buffer before deciding where the line breaks will appear.
  1687  However, we make very little attempt to make ``logical'' line breaks that
  1688  would enhance the readability of the output; people are supposed to read
  1689  the input of \.{TANGLE} or the \TeX ed output of \.{WEAVE}, but not the
  1690  tangled-up output. The only concession to readability is that a break after
  1691  a semicolon will be made if possible, since commonly used ``pretty
  1692  printing'' routines give better results in such cases.
  1693  
  1694  Second, we want to decimalize non-decimal constants, and to combine integer
  1695  quantities that are added or subtracted, because \PASCAL\ doesn't allow
  1696  constant expressions in subrange types or in case labels. This means we
  1697  want to have a procedure that treats a construction like \.{(E-15+17)}
  1698  as equivalent to `\.{(E+2)}', while also leaving `\.{(1E-15+17)}' and
  1699  `\.{(E-15+17*y)}' untouched. Consider also `\.{-15+17.5}' versus
  1700  `\.{-15+17..5}'. We shall not combine integers preceding or following
  1701  \.*, \./, \.{div}, \.{mod}, or \.{@@\&}. Note that if |y| has been defined
  1702  to equal $-2$, we must expand `\.{x*y}' into `\.{x*(-2)}'; but `\.{x-y}'
  1703  can expand into `\.{x+2}' and we can even change `\.{x - y mod z}' to
  1704  @^mod@>
  1705  `\.{x + 2 mod z}' because \PASCAL\ has a nonstandard \&{mod} operation!
  1706  
  1707  The following solution to these problems has been adopted: An array
  1708  |out_buf| contains characters that have been generated but not yet output,
  1709  and there are three pointers into this array. One of these, |out_ptr|, is
  1710  the number of characters currently in the buffer, and we will have
  1711  |1<=out_ptr<=line_length| most of the time. The second is |break_ptr|,
  1712  which is the largest value |<=out_ptr| such that we are definitely entitled
  1713  to end a line by outputting the characters |out_buf[1..(break_ptr-1)]|;
  1714  we will always have |break_ptr<=line_length|. Finally, |semi_ptr| is either
  1715  zero or the largest known value of a legal break after a semicolon or comment
  1716  on the current line; we will always have |semi_ptr<=break_ptr|.
  1717  
  1718  @<Globals...@>=
  1719  @!out_buf: array [0..out_buf_size] of ASCII_code; {assembled characters}
  1720  @!out_ptr: 0..out_buf_size; {first available place in |out_buf|}
  1721  @!break_ptr: 0..out_buf_size; {last breaking place in |out_buf|}
  1722  @!semi_ptr: 0..out_buf_size; {last semicolon breaking place in |out_buf|}
  1723  
  1724  @ Besides having those three pointers,
  1725  the output process is in one of several states:
  1726  
  1727  \yskip\hang |num_or_id| means that the last item in the buffer is a number or
  1728  identifier, hence a blank space or line break must be inserted if the next
  1729  item is also a number or identifier.
  1730  
  1731  \yskip\hang |unbreakable| means that the last item in the buffer was followed
  1732  by the \.{@@\&} operation that inhibits spaces between it and the next item.
  1733  
  1734  \yskip\hang |sign| means that the last item in the buffer is to be followed
  1735  by \.+ or \.-, depending on whether |out_app| is positive or negative.
  1736  
  1737  \yskip\hang |sign_val| means that the decimal equivalent of
  1738  $\vert|out_val|\vert$ should be appended to the buffer. If |out_val<0|,
  1739  or if |out_val=0| and |last_sign<0|, the number should be preceded by a minus
  1740  sign. Otherwise it should be preceded by the character |out_sign| unless
  1741  |out_sign=0|; the |out_sign| variable is either 0 or \.{"\ "} or \.{"+"}.
  1742  
  1743  \yskip\hang |sign_val_sign| is like |sign_val|, but also append \.+ or \.-
  1744  afterwards, depending on whether |out_app| is positive or negative.
  1745  
  1746  \yskip\hang |sign_val_val| is like |sign_val|, but also append the decimal
  1747  equivalent of |out_app| including its sign, using |last_sign| in case
  1748  |out_app=0|.
  1749  
  1750  \yskip\hang |misc| means none of the above.
  1751  
  1752  \yskip\noindent
  1753  For example, the output buffer and output state run through the following
  1754  sequence as we generate characters from `\.{(x-15+19-2)}':
  1755  $$\vbox{\halign{$\hfil#\hfil$\quad&#\hfil&\quad\hfil#\hfil&\quad
  1756  \hfil#\hfil&\quad\hfil#\hfil&\quad\hfil#\hfil\quad&\hfil#\hfil\cr
  1757  output&|out_buf|&|out_state|&|out_sign|&|out_val|&|out_app|&|last_sign|\cr
  1758  \noalign{\vskip 3pt}
  1759  (&\.(&|misc|\cr
  1760  x&\.{(x}&|num_or_id|\cr
  1761  -&\.{(x}&|sign|&&&$-1$&$-1$\cr
  1762  15&\.{(x}&|sign_val|&\.{"+"}&$-15$&&$-1$\cr
  1763  +&\.{(x}&|sign_val_sign|&\.{"+"}&$-15$&$+1$&$+1$\cr
  1764  19&\.{(x}&|sign_val_val|&\.{"+"}&$-15$&$+19$&$+1$\cr
  1765  -&\.{(x}&|sign_val_sign|&\.{"+"}&$+4$&$-1$&$-1$\cr
  1766  2&\.{(x}&|sign_val_val|&\.{"+"}&$+4$&$-2$&$-1$\cr
  1767  )&\.{(x+2)}&|misc|\cr}}$$
  1768  At each stage we have put as much into the buffer as possible without
  1769  knowing what is coming next. Examples like `\.{x-0.1}' indicate why
  1770  |last_sign| is needed to associate the proper sign with an output of zero.
  1771  
  1772  In states |num_or_id|, |unbreakable|, and |misc| the last item in the buffer
  1773  lies between |break_ptr| and |out_ptr-1|, inclusive; in the other states we
  1774  have |break_ptr=out_ptr|.
  1775  
  1776  The numeric values assigned to |num_or_id|, etc., have been chosen to
  1777  shorten some of the program logic; for example, the program makes use of
  1778  the fact that |sign+2=sign_val_sign|.
  1779  
  1780  @d misc=0 {state associated with special characters}
  1781  @d num_or_id=1 {state associated with numbers and identifiers}
  1782  @d sign=2 {state associated with pending \.+ or \.-}
  1783  @d sign_val=num_or_id+2 {state associated with pending sign and value}
  1784  @d sign_val_sign=sign+2 {|sign_val| followed by another pending sign}
  1785  @d sign_val_val=sign_val+2 {|sign_val| followed by another pending value}
  1786  @d unbreakable=sign_val_val+1 {state associated with \.{@@\&}}
  1787  
  1788  @<Globals...@>=
  1789  @!out_state:eight_bits; {current status of partial output}
  1790  @!out_val,@!out_app:integer; {pending values}
  1791  @!out_sign:ASCII_code; {sign to use if appending |out_val>=0|}
  1792  @!last_sign:-1..+1; {sign to use if appending a zero}
  1793  
  1794  @ During the output process, |line| will equal the number of the next line
  1795  to be output.
  1796  
  1797  @<Initialize the output buffer@>=
  1798  out_state:=misc; out_ptr:=0; break_ptr:=0; semi_ptr:=0; out_buf[0]:=0; line:=1;
  1799  
  1800  @ Here is a routine that is invoked when |out_ptr>line_length|
  1801  or when it is time to flush out the final line. The |flush_buffer| procedure
  1802  often writes out the line up to the current |break_ptr| position, then moves the
  1803  remaining information to the front of |out_buf|. However, it prefers to
  1804  write only up to |semi_ptr|, if the residual line won't be too long.
  1805  
  1806  @d check_break==if out_ptr>line_length then flush_buffer
  1807  
  1808  @p procedure flush_buffer; {writes one line to output file}
  1809  var k:0..out_buf_size; {index into |out_buf|}
  1810  @!b:0..out_buf_size; {value of |break_ptr| upon entry}
  1811  begin b:=break_ptr;
  1812  if (semi_ptr<>0)and(out_ptr-semi_ptr<=line_length) then break_ptr:=semi_ptr;
  1813  for k:=1 to break_ptr do write(Pascal_file,xchr[out_buf[k-1]]);
  1814  write_ln(Pascal_file); incr(line);
  1815  if line mod 100 = 0 then
  1816    begin print('.');
  1817    if line mod 500 = 0 then print(line:1);
  1818    update_terminal; {progress report}
  1819    end;
  1820  if break_ptr<out_ptr then
  1821    begin if out_buf[break_ptr]=" " then
  1822      begin incr(break_ptr); {drop space at break}
  1823      if break_ptr>b then b:=break_ptr;
  1824      end;
  1825    for k:=break_ptr to out_ptr-1 do out_buf[k-break_ptr]:=out_buf[k];
  1826    end;
  1827  out_ptr:=out_ptr-break_ptr; break_ptr:=b-break_ptr; semi_ptr:=0;
  1828  if out_ptr>line_length then
  1829    begin err_print('! Long line must be truncated'); out_ptr:=line_length;
  1830  @.Long line must be truncated@>
  1831    end;
  1832  end;
  1833  
  1834  @ @<Empty the last line from the buffer@>=
  1835  break_ptr:=out_ptr; semi_ptr:=0; flush_buffer;
  1836  if brace_level<>0 then
  1837    err_print('! Program ended at brace level ',brace_level:1);
  1838  @.Program ended at brace level n@>
  1839  
  1840  @ Another simple and useful routine appends the decimal equivalent of
  1841  a nonnegative integer to the output buffer.
  1842  
  1843  @d app(#)==begin out_buf[out_ptr]:=#; incr(out_ptr); {append a single character}
  1844    end
  1845  
  1846  @p procedure app_val(@!v:integer); {puts |v| into buffer, assumes |v>=0|}
  1847  var k:0..out_buf_size; {index into |out_buf|}
  1848  begin k:=out_buf_size; {first we put the digits at the very end of |out_buf|}
  1849  repeat out_buf[k]:=v mod 10; v:=v div 10; decr(k);
  1850  until v=0;
  1851  repeat incr(k); app(out_buf[k]+"0");
  1852  until k=out_buf_size; {then we append them, most significant first}
  1853  end;
  1854  
  1855  @ The output states are kept up to date by the output routines, which are
  1856  called |send_out|, |send_val|, and |send_sign|. The |send_out| procedure
  1857  has two parameters: |t| tells the type of information being sent and
  1858  |v| contains the information proper. Some information may also be passed
  1859  in the array |out_contrib|.
  1860  
  1861  \yskip\hang If |t=misc| then |v| is a character to be output.
  1862  
  1863  \hang If |t=str| then |v| is the length of a string or something like `\.{<>}'
  1864  in |out_contrib|.
  1865  
  1866  \hang If |t=ident| then |v| is the length of an identifier in |out_contrib|.
  1867  
  1868  \hang If |t=frac| then |v| is the length of a fraction and/or exponent in
  1869  |out_contrib|.
  1870  
  1871  @d str=1 {|send_out| code for a string}
  1872  @d ident=2 {|send_out| code for an identifier}
  1873  @d frac=3 {|send_out| code for a fraction}
  1874  
  1875  @<Glob...@>=
  1876  @!out_contrib:array[1..line_length] of ASCII_code; {a contribution to |out_buf|}
  1877  
  1878  @ A slightly subtle point in the following code is that the user may ask
  1879  for a |join| operation (i.e., \.{@@\&}) following whatever is being sent
  1880  out.  We will see later that |join| is implemented in part by calling
  1881  |send_out(frac,0)|.
  1882  
  1883  @p procedure send_out(@!t:eight_bits; @!v:sixteen_bits);
  1884    {outputs |v| of type |t|}
  1885  label restart;
  1886  var k: 0..line_length; {index into |out_contrib|}
  1887  begin @<Get the buffer ready for appending the new information@>;
  1888  if t<>misc then for k:=1 to v do app(out_contrib[k])
  1889  else app(v);
  1890  check_break;
  1891  if (t=misc)and((v=";")or(v="}")) then
  1892    begin semi_ptr:=out_ptr; break_ptr:=out_ptr;
  1893    end;
  1894  if t>=ident then out_state:=num_or_id {|t=ident| or |frac|}
  1895  else out_state:=misc {|t=str| or |misc|}
  1896  end;
  1897  
  1898  @ Here is where the buffer states for signs and values collapse into simpler
  1899  states, because we are about to append something that doesn't combine with
  1900  the previous integer constants.
  1901  
  1902  We use an ASCII-code trick: Since |","-1="+"| and |","+1="-"|, we have
  1903  |","-c=@t sign of $c$@>|, when $\vert c\vert=1$.
  1904  
  1905  @<Get the buffer ready...@>=
  1906  restart: case out_state of
  1907  num_or_id: if t<>frac then
  1908    begin break_ptr:=out_ptr;
  1909    if t=ident then app(" ");
  1910    end;
  1911  sign: begin app(","-out_app); check_break; break_ptr:=out_ptr;
  1912    end;
  1913  sign_val,sign_val_sign: begin @<Append \(|out_val| to buffer@>;
  1914    out_state:=out_state-2; goto restart;
  1915    end;
  1916  sign_val_val: @<Reduce |sign_val_val| to |sign_val| and |goto restart|@>;
  1917  misc: if t<>frac then break_ptr:=out_ptr;@/
  1918  othercases do_nothing {this is for |unbreakable| state}
  1919  endcases
  1920  
  1921  @ @<Append \(|out_val|...@>=
  1922  if (out_val<0)or((out_val=0)and(last_sign<0)) then app("-")
  1923  else if out_sign>0 then app(out_sign);
  1924  app_val(abs(out_val)); check_break;
  1925  
  1926  @ @<Reduce |sign_val_val|...@>=
  1927  begin if (t=frac)or(@<Contribution is \.* or \./ or \.{DIV} or \.{MOD}@>) then
  1928    begin @<Append \(|out_val| to buffer@>;
  1929    out_sign:="+"; out_val:=out_app;
  1930    end
  1931  else out_val:=out_val+out_app;
  1932  out_state:=sign_val; goto restart;
  1933  end
  1934  
  1935  @ @<Contribution is \.*...@>=
  1936  ((t=ident)and(v=3)and@|
  1937   (((out_contrib[1]="D")and(out_contrib[2]="I")and(out_contrib[3]="V")) or@|
  1938   ((out_contrib[1]="M")and(out_contrib[2]="O")and(out_contrib[3]="D")) ))or@|
  1939  @^uppercase@>
  1940   ((t=misc)and((v="*")or(v="/")))
  1941  
  1942  @ The following routine is called with $v=\pm1$ when a plus or minus sign is
  1943  appended to the output. It extends \PASCAL\ to allow repeated signs
  1944  (e.g., `\.{--}' is equivalent to `\.+'), rather than to give an error message.
  1945  The signs following `\.E' in real constants are treated as part of a fraction,
  1946  so they are not seen by this routine.
  1947  
  1948  @p procedure send_sign(@!v:integer);
  1949  begin case out_state of
  1950  sign, sign_val_sign: out_app:=out_app*v;
  1951  sign_val:begin out_app:=v; out_state:=sign_val_sign;
  1952    end;
  1953  sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
  1954    out_state:=sign_val_sign;
  1955    end;
  1956  othercases begin break_ptr:=out_ptr; out_app:=v; out_state:=sign;
  1957    end
  1958  endcases;@/
  1959  last_sign:=out_app;
  1960  end;
  1961  
  1962  @ When a (signed) integer value is to be output, we call |send_val|.
  1963  
  1964  @d bad_case=666 {this is a label used below}
  1965  
  1966  @p procedure send_val(@!v:integer); {output the (signed) value |v|}
  1967  label bad_case, {go here if we can't keep |v| in the output state}
  1968    exit;
  1969  begin case out_state of
  1970  num_or_id: begin @<If previous output was \.{DIV} or \.{MOD}, |goto bad_case|@>;
  1971    out_sign:=" "; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
  1972    last_sign:=+1;
  1973    end;
  1974  misc: begin @<If previous output was \.* or \./, |goto bad_case|@>;
  1975    out_sign:=0; out_state:=sign_val; out_val:=v; break_ptr:=out_ptr;
  1976    last_sign:=+1;
  1977    end;
  1978  @t\4@>@<Handle cases of |send_val| when |out_state| contains a sign@>@;
  1979  othercases goto bad_case
  1980  endcases;@/
  1981  return;
  1982  bad_case: @<Append the decimal value of |v|, with parentheses if negative@>;
  1983  exit: end;
  1984  
  1985  @ @<Handle cases of |send_val|...@>=
  1986  sign: begin out_sign:="+"; out_state:=sign_val; out_val:=out_app*v;
  1987    end;
  1988  sign_val: begin out_state:=sign_val_val; out_app:=v;
  1989    err_print('! Two numbers occurred without a sign between them');
  1990    end;
  1991  sign_val_sign: begin out_state:=sign_val_val; out_app:=out_app*v;
  1992    end;
  1993  sign_val_val: begin out_val:=out_val+out_app; out_app:=v;
  1994    err_print('! Two numbers occurred without a sign between them');
  1995  @.Two numbers occurred...@>
  1996    end;
  1997  
  1998  @ @<If previous output was \.*...@>=
  1999  if (out_ptr=break_ptr+1)and((out_buf[break_ptr]="*")or(out_buf[break_ptr]="/"))
  2000    then goto bad_case
  2001  
  2002  @ @<If previous output was \.{DIV}...@>=
  2003  if (out_ptr=break_ptr+3)or
  2004   ((out_ptr=break_ptr+4)and(out_buf[break_ptr]=" ")) then
  2005  @^uppercase@>
  2006    if ((out_buf[out_ptr-3]="D")and(out_buf[out_ptr-2]="I")and
  2007      (out_buf[out_ptr-1]="V"))or @/
  2008       ((out_buf[out_ptr-3]="M")and(out_buf[out_ptr-2]="O")and
  2009      (out_buf[out_ptr-1]="D")) then@/ goto bad_case
  2010  
  2011  @ @<Append the decimal value...@>=
  2012  if v>=0 then
  2013    begin if out_state=num_or_id then
  2014      begin break_ptr:=out_ptr; app(" ");
  2015      end;
  2016    app_val(v); check_break; out_state:=num_or_id;
  2017    end
  2018  else  begin app("("); app("-"); app_val(-v); app(")"); check_break;
  2019    out_state:=misc;
  2020    end
  2021  
  2022  @* The big output switch.
  2023  To complete the output process, we need a routine that takes the results
  2024  of |get_output| and feeds them to |send_out|, |send_val|, or |send_sign|.
  2025  This procedure `|send_the_output|' will be invoked just once, as follows:
  2026  
  2027  @<Phase II: Output the contents of the compressed tables@>=
  2028  if text_link[0]=0 then
  2029    begin print_nl('! No output was specified.'); mark_harmless;
  2030  @.No output was specified@>
  2031    end
  2032  else  begin print_nl('Writing the output file'); update_terminal;@/
  2033    @<Initialize the output stacks@>;
  2034    @<Initialize the output buffer@>;
  2035    send_the_output;@/
  2036    @<Empty the last line...@>;
  2037    print_nl('Done.');
  2038    end
  2039  
  2040  @ A many-way switch is used to send the output:
  2041  
  2042  @d get_fraction=2 {this label is used below}
  2043  
  2044  @p procedure send_the_output;
  2045  label get_fraction, {go here to finish scanning a real constant}
  2046    reswitch, continue;
  2047  var cur_char:eight_bits; {the latest character received}
  2048    @!k:0..line_length; {index into |out_contrib|}
  2049    @!j:0..max_bytes; {index into |byte_mem|}
  2050    @!w:0..ww-1; {segment of |byte_mem|}
  2051    @!n:integer; {number being scanned}
  2052  begin while stack_ptr>0 do
  2053    begin cur_char:=get_output;
  2054    reswitch: case cur_char of
  2055    0: do_nothing; {this case might arise if output ends unexpectedly}
  2056    @t\4@>@<Cases related to identifiers@>@;
  2057    @t\4@>@<Cases related to constants, possibly leading to
  2058      |get_fraction| or |reswitch|@>@;
  2059    "+","-": send_sign(","-cur_char);
  2060    @t\4@>@<Cases like \.{<>} and \.{:=}@>@;
  2061    "'": @<Send a string, |goto reswitch|@>;
  2062    @<Other printable characters@>: send_out(misc,cur_char);
  2063    @t\4@>@<Cases involving \.{@@\{} and \.{@@\}}@>@;
  2064    join: begin send_out(frac,0); out_state:=unbreakable;
  2065      end;
  2066    verbatim: @<Send verbatim string@>;
  2067    force_line: @<Force a line break@>;
  2068    othercases err_print('! Can''t output ASCII code ',cur_char:1)
  2069  @.Can't output ASCII code n@>
  2070    endcases;@/
  2071    goto continue;
  2072    get_fraction: @<Special code to finish real constants@>;
  2073    continue: end;
  2074  end;
  2075  
  2076  @ @<Cases like \.{<>}...@>=
  2077  and_sign: begin out_contrib[1]:="A"; out_contrib[2]:="N"; out_contrib[3]:="D";
  2078  @^uppercase@>
  2079    send_out(ident,3);
  2080    end;
  2081  not_sign: begin out_contrib[1]:="N"; out_contrib[2]:="O"; out_contrib[3]:="T";
  2082    send_out(ident,3);
  2083    end;
  2084  set_element_sign: begin out_contrib[1]:="I"; out_contrib[2]:="N";
  2085    send_out(ident,2);
  2086    end;
  2087  or_sign: begin out_contrib[1]:="O"; out_contrib[2]:="R"; send_out(ident,2);
  2088    end;
  2089  left_arrow: begin out_contrib[1]:=":"; out_contrib[2]:="="; send_out(str,2);
  2090    end;
  2091  not_equal: begin out_contrib[1]:="<"; out_contrib[2]:=">"; send_out(str,2);
  2092    end;
  2093  less_or_equal: begin out_contrib[1]:="<"; out_contrib[2]:="="; send_out(str,2);
  2094    end;
  2095  greater_or_equal: begin out_contrib[1]:=">"; out_contrib[2]:="=";
  2096    send_out(str,2);
  2097    end;
  2098  equivalence_sign: begin out_contrib[1]:="="; out_contrib[2]:="=";
  2099    send_out(str,2);
  2100    end;
  2101  double_dot: begin out_contrib[1]:="."; out_contrib[2]:="."; send_out(str,2);
  2102    end;
  2103  
  2104  @ Please don't ask how all of the following characters can actually get
  2105  through \.{TANGLE} outside of strings. It seems that |""""| and |"{"|
  2106  cannot actually occur at this point of the program, but they have
  2107  been included just in case \.{TANGLE} changes.
  2108  
  2109  If \.{TANGLE} is producing code for a \PASCAL\ compiler that uses `\.{(.}'
  2110  and `\.{.)}' instead of square brackets (e.g., on machines with {\mc EBCDIC}
  2111  code), one should remove |"["| and |"]"| from this list and put them into
  2112  the preceding module in the appropriate way. Similarly, some compilers
  2113  want `\.\^' to be converted to `\.{@@}'.
  2114  @^system dependencies@>@^EBCDIC@>
  2115  
  2116  @<Other printable characters@>=
  2117  "!","""","#","$","%","&","(",")","*",",","/",":",";","<","=",">","?",
  2118  "@@","[","\","]","^","_","`","{","|"
  2119  
  2120  @ Single-character identifiers represent themselves, while longer ones
  2121  appear in |byte_mem|. All must be converted to uppercase,
  2122  with underlines removed. Extremely long identifiers must be chopped.
  2123  
  2124  (Some \PASCAL\ compilers work with lowercase letters instead of
  2125  uppercase. If this module of \.{TANGLE} is changed, it's also necessary
  2126  to change from uppercase to lowercase in the modules that are
  2127  listed in the index under ``uppercase''.)
  2128  @^system dependencies@>
  2129  @^uppercase@>
  2130  
  2131  @d up_to(#)==#-24,#-23,#-22,#-21,#-20,#-19,#-18,#-17,#-16,#-15,#-14,
  2132    #-13,#-12,#-11,#-10,#-9,#-8,#-7,#-6,#-5,#-4,#-3,#-2,#-1,#
  2133  
  2134  @<Cases related to identifiers@>=
  2135  "A",up_to("Z"): begin out_contrib[1]:=cur_char; send_out(ident,1);
  2136    end;
  2137  "a",up_to("z"): begin out_contrib[1]:=cur_char-@'40; send_out(ident,1);
  2138    end;
  2139  identifier: begin k:=0; j:=byte_start[cur_val]; w:=cur_val mod ww;
  2140    while (k<max_id_length)and(j<byte_start[cur_val+ww]) do
  2141      begin incr(k); out_contrib[k]:=byte_mem[w,j]; incr(j);
  2142      if out_contrib[k]>="a" then out_contrib[k]:=out_contrib[k]-@'40
  2143      else if out_contrib[k]="_" then decr(k);
  2144      end;
  2145    send_out(ident,k);
  2146    end;
  2147  
  2148  @ After sending a string, we need to look ahead at the next character, in order
  2149  to see if there were two consecutive single-quote marks. Afterwards we go to
  2150  |reswitch| to process the next character.
  2151  
  2152  @<Send a string...@>=
  2153  begin k:=1; out_contrib[1]:="'";
  2154  repeat if k<line_length then incr(k);
  2155  out_contrib[k]:=get_output;
  2156  until (out_contrib[k]="'")or(stack_ptr=0);
  2157  if k=line_length then err_print('! String too long');
  2158  @.String too long@>
  2159  send_out(str,k); cur_char:=get_output;
  2160  if cur_char="'" then out_state:=unbreakable;
  2161  goto reswitch;
  2162  end
  2163  
  2164  @ Sending a verbatim string is similar, but we don't have to look ahead.
  2165  
  2166  @<Send verbatim string@>=
  2167  begin k:=0;
  2168  repeat if k<line_length then incr(k);
  2169  out_contrib[k]:=get_output;
  2170  until (out_contrib[k]=verbatim)or(stack_ptr=0);
  2171  if k=line_length then err_print('! Verbatim string too long');
  2172  @.Verbatim string too long@>
  2173  send_out(str,k-1);
  2174  end
  2175  
  2176  @ In order to encourage portable software, \.{TANGLE} complains
  2177  if the constants get dangerously close to the largest value representable
  2178  on a 32-bit computer ($2^{31}-1$).
  2179  
  2180  @d digits=="0","1","2","3","4","5","6","7","8","9"
  2181  
  2182  @<Cases related to constants...@>=
  2183  digits: begin n:=0;
  2184    repeat cur_char:=cur_char-"0";
  2185    if n>=@'1463146314 then err_print('! Constant too big')
  2186  @.Constant too big@>
  2187    else n:=10*n+cur_char;
  2188    cur_char:=get_output;
  2189    until (cur_char>"9")or(cur_char<"0");
  2190    send_val(n); k:=0;
  2191    if cur_char="e" then cur_char:="E";
  2192  @^uppercase@>
  2193    if cur_char="E" then goto get_fraction
  2194    else goto reswitch;
  2195    end;
  2196  check_sum: send_val(pool_check_sum);
  2197  octal: begin n:=0; cur_char:="0";
  2198    repeat cur_char:=cur_char-"0";
  2199    if n>=@'2000000000 then err_print('! Constant too big')
  2200    else n:=8*n+cur_char;
  2201    cur_char:=get_output;
  2202    until (cur_char>"7")or(cur_char<"0");
  2203    send_val(n); goto reswitch;
  2204    end;
  2205  hex: begin n:=0; cur_char:="0";
  2206    repeat if cur_char>="A" then cur_char:=cur_char+10-"A"
  2207    else cur_char:=cur_char-"0";
  2208    if n>=@"8000000 then err_print('! Constant too big')
  2209    else n:=16*n+cur_char;
  2210    cur_char:=get_output;
  2211    until (cur_char>"F")or(cur_char<"0")or@|
  2212      ((cur_char>"9")and(cur_char<"A"));
  2213    send_val(n); goto reswitch;
  2214    end;
  2215  number: send_val(cur_val);
  2216  ".":  begin k:=1; out_contrib[1]:="."; cur_char:=get_output;
  2217    if cur_char="." then
  2218      begin out_contrib[2]:="."; send_out(str,2);
  2219      end
  2220    else if (cur_char>="0")and(cur_char<="9") then goto get_fraction
  2221    else  begin send_out(misc,"."); goto reswitch;
  2222      end;
  2223    end;
  2224  
  2225  @ The following code appears at label `|get_fraction|', when we want to
  2226  scan to the end of a real constant. The first |k| characters of a fraction
  2227  have already been placed in |out_contrib|, and |cur_char| is the next character.
  2228  
  2229  @<Special code...@>=
  2230  repeat if k<line_length then incr(k);
  2231  out_contrib[k]:=cur_char; cur_char:=get_output;
  2232  if (out_contrib[k]="E")and((cur_char="+")or(cur_char="-")) then
  2233  @^uppercase@>
  2234    begin if k<line_length then incr(k);
  2235    out_contrib[k]:=cur_char; cur_char:=get_output;
  2236    end
  2237  else if cur_char="e" then cur_char:="E";
  2238  until (cur_char<>"E")and((cur_char<"0")or(cur_char>"9"));
  2239  if k=line_length then err_print('! Fraction too long');
  2240  @.Fraction too long@>
  2241  send_out(frac,k); goto reswitch
  2242  
  2243  @ Some \PASCAL\ compilers do not recognize comments in braces, so the
  2244  comments must be delimited by `\.{(*}' and `\.{*)}'.
  2245  @^system dependencies@>
  2246  In such cases the statement `|out_contrib[1]:="{"|' that appears here should
  2247  be replaced by `\ignorespaces|begin out_contrib[1]:="("; out_contrib[2]:="*";
  2248  incr(k); end|', and a similar change should be made to
  2249  `|out_contrib[k]:="}"|'.
  2250  
  2251  @<Cases involving \.{@@\{} and \.{@@\}}@>=
  2252  begin_comment: begin if brace_level=0 then send_out(misc,"{")
  2253    else send_out(misc,"[");
  2254    incr(brace_level);
  2255    end;
  2256  end_comment: if brace_level>0 then
  2257      begin decr(brace_level);
  2258      if brace_level=0 then send_out(misc,"}")
  2259      else send_out(misc,"]");
  2260      end
  2261    else err_print('! Extra @@}');
  2262  @.Extra \AT!\}@>
  2263  module_number: begin k:=2;
  2264    if brace_level=0 then out_contrib[1]:="{"
  2265    else out_contrib[1]:="[";
  2266    if cur_val<0 then
  2267      begin out_contrib[k]:=":"; cur_val:=-cur_val; incr(k);
  2268      end;
  2269    n:=10;
  2270    while cur_val>=n do n:=10*n;
  2271    repeat n:=n div 10;
  2272      out_contrib[k]:="0"+(cur_val div n); cur_val:=cur_val mod n; incr(k);
  2273    until n=1;
  2274    if out_contrib[2]<>":" then
  2275      begin out_contrib[k]:=":"; incr(k);
  2276      end;
  2277    if brace_level=0 then out_contrib[k]:="}"
  2278    else out_contrib[k]:="]";
  2279    send_out(str,k);
  2280    end;
  2281  
  2282  @ @<Force a line break@>=
  2283  begin send_out(str,0); {normalize the buffer}
  2284  while out_ptr>0 do
  2285    begin if out_ptr<=line_length then break_ptr:=out_ptr;
  2286    flush_buffer;
  2287    end;
  2288  out_state:=misc;
  2289  end
  2290  
  2291  @* Introduction to the input phase.
  2292  We have now seen that \.{TANGLE} will be able to output the full
  2293  \PASCAL\ program, if we can only get that program into the byte memory in
  2294  the proper format. The input process is something like the output process
  2295  in reverse, since we compress the text as we read it in and we expand it
  2296  as we write it out.
  2297  
  2298  There are three main input routines. The most interesting is the one that gets
  2299  the next token of a \PASCAL\ text; the other two are used to scan rapidly past
  2300  \TeX\ text in the \.{WEB} source code. One of the latter routines will jump to
  2301  the next token that starts with `\.{@@}', and the other skips to the end
  2302  of a \PASCAL\ comment.
  2303  
  2304  @ But first we need to consider the low-level routine |get_line|
  2305  that takes care of merging |change_file| into |web_file|. The |get_line|
  2306  procedure also updates the line numbers for error messages.
  2307  
  2308  @<Globals...@>=
  2309  @!ii:integer; {general purpose |for| loop variable in the outer block}
  2310  @!line:integer; {the number of the current line in the current file}
  2311  @!other_line:integer; {the number of the current line in the input file that
  2312    is not currently being read}
  2313  @!temp_line:integer; {used when interchanging |line| with |other_line|}
  2314  @!limit:0..buf_size; {the last character position occupied in the buffer}
  2315  @!loc:0..buf_size; {the next character position to be read from the buffer}
  2316  @!input_has_ended: boolean; {if |true|, there is no more input}
  2317  @!changing: boolean; {if |true|, the current line is from |change_file|}
  2318  
  2319  @ As we change |changing| from |true| to |false| and back again, we must
  2320  remember to swap the values of |line| and |other_line| so that the |err_print|
  2321  routine will be sure to report the correct line number.
  2322  
  2323  @d change_changing==
  2324    changing := not changing;
  2325    temp_line:=other_line; other_line:=line; line:=temp_line
  2326      {|line @t$\null\BA\null$@> other_line|}
  2327  
  2328  @ When |changing| is |false|, the next line of |change_file| is kept in
  2329  |change_buffer[0..change_limit]|, for purposes of comparison with the next
  2330  line of |web_file|. After the change file has been completely input, we
  2331  set |change_limit:=0|, so that no further matches will be made.
  2332  
  2333  @<Globals...@>=
  2334  @!change_buffer:array[0..buf_size] of ASCII_code;
  2335  @!change_limit:0..buf_size; {the last position occupied in |change_buffer|}
  2336  
  2337  @ Here's a simple function that checks if the two buffers are different.
  2338  
  2339  @p function lines_dont_match:boolean;
  2340  label exit;
  2341  var k:0..buf_size; {index into the buffers}
  2342  begin lines_dont_match:=true;
  2343  if change_limit<>limit then return;
  2344  if limit>0 then
  2345    for k:=0 to limit-1 do if change_buffer[k]<>buffer[k] then return;
  2346  lines_dont_match:=false;
  2347  exit: end;
  2348  
  2349  @ Procedure |prime_the_change_buffer| sets |change_buffer| in preparation
  2350  for the next matching operation. Since blank lines in the change file are
  2351  not used for matching, we have |(change_limit=0)and not changing| if and
  2352  only if the change file is exhausted. This procedure is called only
  2353  when |changing| is true; hence error messages will be reported correctly.
  2354  
  2355  @p procedure prime_the_change_buffer;
  2356  label continue, done, exit;
  2357  var k:0..buf_size; {index into the buffers}
  2358  begin change_limit:=0; {this value will be used if the change file ends}
  2359  @<Skip over comment lines in the change file; |return| if end of file@>;
  2360  @<Skip to the next nonblank line; |return| if end of file@>;
  2361  @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>;
  2362  exit: end;
  2363  
  2364  @ While looking for a line that begins with \.{@@x} in the change file,
  2365  we allow lines that begin with \.{@@}, as long as they don't begin with
  2366  \.{@@y} or \.{@@z} (which would probably indicate that the change file is
  2367  fouled up).
  2368  
  2369  @<Skip over comment lines in the change file...@>=
  2370  loop@+  begin incr(line);
  2371    if not input_ln(change_file) then return;
  2372    if limit<2 then goto continue;
  2373    if buffer[0]<>"@@" then goto continue;
  2374    if (buffer[1]>="X")and(buffer[1]<="Z") then
  2375      buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
  2376    if buffer[1]="x" then goto done;
  2377    if (buffer[1]="y")or(buffer[1]="z") then
  2378      begin loc:=2; err_print('! Where is the matching @@x?');
  2379  @.Where is the match...@>
  2380      end;
  2381  continue: end;
  2382  done:
  2383  
  2384  @ Here we are looking at lines following the \.{@@x}.
  2385  
  2386  @<Skip to the next nonblank line...@>=
  2387  repeat incr(line);
  2388    if not input_ln(change_file) then
  2389      begin err_print('! Change file ended after @@x');
  2390  @.Change file ended...@>
  2391      return;
  2392      end;
  2393  until limit>0;
  2394  
  2395  @ @<Move |buffer| and |limit| to |change_buffer| and |change_limit|@>=
  2396  begin change_limit:=limit;
  2397  if limit>0 then for k:=0 to limit-1 do change_buffer[k]:=buffer[k];
  2398  end
  2399  
  2400  @ The following procedure is used to see if the next change entry should
  2401  go into effect; it is called only when |changing| is false.
  2402  The idea is to test whether or not the current
  2403  contents of |buffer| matches the current contents of |change_buffer|.
  2404  If not, there's nothing more to do; but if so, a change is called for:
  2405  All of the text down to the \.{@@y} is supposed to match. An error
  2406  message is issued if any discrepancy is found. Then the procedure
  2407  prepares to read the next line from |change_file|.
  2408  
  2409  @p procedure check_change; {switches to |change_file| if the buffers match}
  2410  label exit;
  2411  var n:integer; {the number of discrepancies found}
  2412  @!k:0..buf_size; {index into the buffers}
  2413  begin if lines_dont_match then return;
  2414  n:=0;
  2415  loop@+  begin change_changing; {now it's |true|}
  2416    incr(line);
  2417    if not input_ln(change_file) then
  2418      begin err_print('! Change file ended before @@y');
  2419  @.Change file ended...@>
  2420      change_limit:=0;  change_changing; {|false| again}
  2421      return;
  2422      end;
  2423    @<If the current line starts with \.{@@y},
  2424      report any discrepancies and |return|@>;
  2425    @<Move |buffer| and |limit|...@>;
  2426    change_changing; {now it's |false|}
  2427    incr(line);
  2428    if not input_ln(web_file) then
  2429      begin err_print('! WEB file ended during a change');
  2430  @.WEB file ended...@>
  2431      input_has_ended:=true; return;
  2432      end;
  2433    if lines_dont_match then incr(n);
  2434    end;
  2435  exit: end;
  2436  
  2437  @ @<If the current line starts with \.{@@y}...@>=
  2438  if limit>1 then if buffer[0]="@@" then
  2439    begin if (buffer[1]>="X")and(buffer[1]<="Z") then
  2440      buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
  2441    if (buffer[1]="x")or(buffer[1]="z") then
  2442      begin loc:=2; err_print('! Where is the matching @@y?');
  2443  @.Where is the match...@>
  2444      end
  2445    else if buffer[1]="y" then
  2446      begin if n>0 then
  2447        begin loc:=2; err_print('! Hmm... ',n:1,
  2448          ' of the preceding lines failed to match');
  2449  @.Hmm... n of the preceding...@>
  2450        end;
  2451      return;
  2452      end;
  2453    end
  2454  
  2455  @ @<Initialize the input system@>=
  2456  open_input; line:=0; other_line:=0;@/
  2457  changing:=true; prime_the_change_buffer; change_changing;@/
  2458  limit:=0; loc:=1; buffer[0]:=" "; input_has_ended:=false;
  2459  
  2460  @ The |get_line| procedure is called when |loc>limit|; it puts the next
  2461  line of merged input into the buffer and updates the other variables
  2462  appropriately. A space is placed at the right end of the line.
  2463  
  2464  @p procedure get_line; {inputs the next line}
  2465  label restart;
  2466  begin restart: if changing then
  2467    @<Read from |change_file| and maybe turn off |changing|@>;
  2468  if not changing then
  2469    begin @<Read from |web_file| and maybe turn on |changing|@>;
  2470    if changing then goto restart;
  2471    end;
  2472  loc:=0; buffer[limit]:=" ";
  2473  end;
  2474  
  2475  @ @<Read from |web_file|...@>=
  2476  begin incr(line);
  2477  if not input_ln(web_file) then input_has_ended:=true
  2478  else if change_limit>0 then check_change;
  2479  end
  2480  
  2481  @ @<Read from |change_file|...@>=
  2482  begin incr(line);
  2483  if not input_ln(change_file) then
  2484    begin err_print('! Change file ended without @@z');
  2485  @.Change file ended...@>
  2486    buffer[0]:="@@"; buffer[1]:="z"; limit:=2;
  2487    end;
  2488  if limit>1 then {check if the change has ended}
  2489    if buffer[0]="@@" then
  2490      begin if (buffer[1]>="X")and(buffer[1]<="Z") then
  2491        buffer[1]:=buffer[1]+"z"-"Z"; {lowercasify}
  2492      if (buffer[1]="x")or(buffer[1]="y") then
  2493        begin loc:=2; err_print('! Where is the matching @@z?');
  2494  @.Where is the match...@>
  2495        end
  2496      else if buffer[1]="z" then
  2497        begin prime_the_change_buffer; change_changing;
  2498        end;
  2499      end;
  2500  end
  2501  
  2502  @ At the end of the program, we will tell the user if the change file
  2503  had a line that didn't match any relevant line in |web_file|.
  2504  
  2505  @<Check that all changes have been read@>=
  2506  if change_limit<>0 then {|changing| is false}
  2507    begin for ii:=0 to change_limit-1 do buffer[ii]:=change_buffer[ii];
  2508    limit:=change_limit; changing:=true; line:=other_line; loc:=change_limit;
  2509    err_print('! Change file entry did not match');
  2510  @.Change file entry did not match@>
  2511    end
  2512  
  2513  @ Important milestones are reached during the input phase when certain
  2514  control codes are sensed.
  2515  
  2516  Control codes in \.{WEB} begin with `\.{@@}', and the next character
  2517  identifies the code. Some of these are of interest only to \.{WEAVE},
  2518  so \.{TANGLE} ignores them; the others are converted by \.{TANGLE} into
  2519  internal code numbers by the |control_code| function below. The ordering
  2520  of these internal code numbers has been chosen to simplify the program logic;
  2521  larger numbers are given to the control codes that denote more significant
  2522  milestones.
  2523  
  2524  @d ignore=0 {control code of no interest to \.{TANGLE}}
  2525  @d control_text=@'203 {control code for `\.{@@t}', `\.{@@\^}', etc.}
  2526  @d format=@'204 {control code for `\.{@@f}'}
  2527  @d definition=@'205 {control code for `\.{@@d}'}
  2528  @d begin_Pascal=@'206 {control code for `\.{@@p}'}
  2529  @d module_name=@'207 {control code for `\.{@@<}'}
  2530  @d new_module=@'210 {control code for `\.{@@\ }' and `\.{@@*}'}
  2531  
  2532  @p function control_code(@!c:ASCII_code):eight_bits; {convert |c| after \.{@@}}
  2533  begin case c of
  2534  "@@": control_code:="@@"; {`quoted' at sign}
  2535  "'": control_code:=octal; {precedes octal constant}
  2536  """": control_code:=hex; {precedes hexadecimal constant}
  2537  "$": control_code:=check_sum; {string pool check sum}
  2538  " ",tab_mark: control_code:=new_module; {beginning of a new module}
  2539  "*": begin print('*',module_count+1:1);
  2540    update_terminal; {print a progress report}
  2541    control_code:=new_module; {beginning of a new module}
  2542    end;
  2543  "D","d": control_code:=definition; {macro definition}
  2544  "F","f": control_code:=format; {format definition}
  2545  "{": control_code:=begin_comment; {begin-comment delimiter}
  2546  "}": control_code:=end_comment; {end-comment delimiter}
  2547  "P","p": control_code:=begin_Pascal; {\PASCAL\ text in unnamed module}
  2548  "T","t","^",".",":": control_code:=control_text; {control text to be ignored}
  2549  "&": control_code:=join; {concatenate two tokens}
  2550  "<": control_code:=module_name; {beginning of a module name}
  2551  "=": control_code:=verbatim; {beginning of \PASCAL\ verbatim mode}
  2552  "\": control_code:=force_line; {force a new line in \PASCAL\ output}
  2553  othercases control_code:=ignore {ignore all other cases}
  2554  endcases;
  2555  end;
  2556  
  2557  @ The |skip_ahead| procedure reads through the input at fairly high speed
  2558  until finding the next non-ignorable control code, which it returns.
  2559  
  2560  @p function skip_ahead:eight_bits; {skip to next control code}
  2561  label done;
  2562  var c:eight_bits; {control code found}
  2563  begin loop begin if loc>limit then
  2564      begin get_line;
  2565      if input_has_ended then
  2566        begin c:=new_module; goto done;
  2567        end;
  2568      end;
  2569    buffer[limit+1]:="@@";
  2570    while buffer[loc]<>"@@" do incr(loc);
  2571    if loc<=limit then
  2572      begin loc:=loc+2; c:=control_code(buffer[loc-1]);
  2573      if (c<>ignore)or(buffer[loc-1]=">") then goto done;
  2574      end;
  2575    end;
  2576  done: skip_ahead:=c;
  2577  end;
  2578  
  2579  @ The |skip_comment| procedure reads through the input at somewhat high speed
  2580  until finding the first unmatched right brace or until coming to the end
  2581  of the file. It ignores characters following `\.\\' characters, since all
  2582  braces that aren't nested are supposed to be hidden in that way. For
  2583  example, consider the process of skipping the first comment below,
  2584  where the string containing the right brace has been typed as \.{\`\\.\\\}\'}
  2585  in the \.{WEB} file.
  2586  
  2587  @p procedure skip_comment; {skips to next unmatched `\.\}'}
  2588  label exit;
  2589  var bal:eight_bits; {excess of left braces}
  2590  @!c:ASCII_code; {current character}
  2591  begin bal:=0;
  2592  loop@+  begin if loc>limit then
  2593      begin get_line;
  2594      if input_has_ended then
  2595        begin err_print('! Input ended in mid-comment');
  2596  @.Input ended in mid-comment@>
  2597        return;
  2598        end;
  2599      end;
  2600    c:=buffer[loc]; incr(loc);
  2601    @<Do special things when |c="@@", "\", "{", "}"|; |return| at end@>;
  2602    end;
  2603  exit:end;
  2604  
  2605  @ @<Do special things when |c="@@"...@>=
  2606  if c="@@" then
  2607    begin c:=buffer[loc];
  2608    if (c<>" ")and(c<>tab_mark)and(c<>"*") then incr(loc)
  2609    else  begin err_print('! Section ended in mid-comment');
  2610  @.Section ended in mid-comment@>
  2611      decr(loc); return;
  2612      end
  2613    end
  2614  else if (c="\")and(buffer[loc]<>"@@") then incr(loc)
  2615  else if c="{" then incr(bal)
  2616  else if c="}" then
  2617    begin if bal=0 then return;
  2618    decr(bal);
  2619    end
  2620  
  2621  @* Inputting the next token.
  2622  As stated above, \.{TANGLE}'s most interesting input procedure is the
  2623  |get_next| routine that inputs the next token. However, the procedure
  2624  isn't especially difficult.
  2625  
  2626  In most cases the tokens output by |get_next| have the form used in
  2627  replacement texts, except that two-byte tokens are not produced.
  2628  An identifier that isn't one letter long is represented by the
  2629  output `|identifier|', and in such a case the global variables
  2630  |id_first| and |id_loc| will have been set to the appropriate values
  2631  needed by the |id_lookup| procedure. A string that begins with a
  2632  double-quote is also considered an |identifier|, and in such a case
  2633  the global variable |double_chars| will also have been set appropriately.
  2634  Control codes produce the corresponding output of the |control_code|
  2635  function above; and if that code is |module_name|, the value of |cur_module|
  2636  will point to the |byte_start| entry for that module name.
  2637  
  2638  Another global variable, |scanning_hex|, is |true| during the time that
  2639  the letters \.A through \.F should be treated as if they were digits.
  2640  
  2641  @<Globals...@>=
  2642  @!cur_module: name_pointer; {name of module just scanned}
  2643  @!scanning_hex: boolean; {are we scanning a hexadecimal constant?}
  2644  
  2645  @ @<Set init...@>=
  2646  scanning_hex:=false;
  2647  
  2648  @ At the top level, |get_next| is a multi-way switch based on the next
  2649  character in the input buffer. A |new_module| code is inserted at the
  2650  very end of the input file.
  2651  
  2652  @p function get_next:eight_bits; {produces the next input token}
  2653  label restart,done,found;
  2654  var c:eight_bits; {the current character}
  2655  @!d:eight_bits; {the next character}
  2656  @!j,@!k:0..longest_name; {indices into |mod_text|}
  2657  begin restart: if loc>limit then
  2658    begin get_line;
  2659    if input_has_ended then
  2660      begin c:=new_module; goto found;
  2661      end;
  2662    end;
  2663  c:=buffer[loc]; incr(loc);
  2664  if scanning_hex then @<Go to |found| if |c| is a hexadecimal digit,
  2665    otherwise set |scanning_hex:=false|@>;
  2666  case c of
  2667  "A",up_to("Z"),"a",up_to("z"): @<Get an identifier@>;
  2668  """": @<Get a preprocessed string@>;
  2669  "@@": @<Get control code and possible module name@>;
  2670  @t\4@>@<Compress two-symbol combinations like `\.{:=}'@>@;
  2671  " ",tab_mark: goto restart; {ignore spaces and tabs}
  2672  "{": begin skip_comment; goto restart;
  2673    end;
  2674  "}": begin err_print('! Extra }'); goto restart;
  2675  @.Extra \}@>
  2676    end;
  2677  othercases if c>=128 then goto restart {ignore nonstandard characters}
  2678    else do_nothing
  2679  endcases;
  2680  found:@!debug if trouble_shooting then debug_help;@;@+gubed@/
  2681  get_next:=c;
  2682  end;
  2683  
  2684  @ @<Go to |found| if |c| is a hexadecimal digit...@>=
  2685  if ((c>="0")and(c<="9"))or((c>="A")and(c<="F")) then goto found
  2686  else scanning_hex:=false
  2687  
  2688  @ Note that the following code substitutes \.{@@\{} and \.{@@\}} for the
  2689  respective combinations `\.{(*}' and `\.{*)}'. Explicit braces should be used
  2690  for \TeX\ comments in \PASCAL\ text.
  2691  
  2692  @d compress(#)==begin if loc<=limit then begin c:=#; incr(loc); end; end
  2693  
  2694  @<Compress two-symbol...@>=
  2695  ".": if buffer[loc]="." then compress(double_dot)
  2696    else if buffer[loc]=")" then compress("]");
  2697  ":": if buffer[loc]="=" then compress(left_arrow);
  2698  "=": if buffer[loc]="=" then compress(equivalence_sign);
  2699  ">": if buffer[loc]="=" then compress(greater_or_equal);
  2700  "<": if buffer[loc]="=" then compress(less_or_equal)
  2701    else if buffer[loc]=">" then compress(not_equal);
  2702  "(": if buffer[loc]="*" then compress(begin_comment)
  2703    else if buffer[loc]="." then compress("[");
  2704  "*": if buffer[loc]=")" then compress(end_comment);
  2705  
  2706  @ We have to look at the preceding character to make sure this isn't part
  2707  of a real constant, before trying to find an identifier starting with
  2708  `\.e' or `\.E'.
  2709  
  2710  @<Get an identifier@>=
  2711  begin if ((c="e")or(c="E"))and(loc>1) then
  2712    if (buffer[loc-2]<="9")and(buffer[loc-2]>="0") then c:=0;
  2713  if c<>0 then
  2714    begin decr(loc); id_first:=loc;
  2715    repeat incr(loc); d:=buffer[loc];
  2716    until ((d<"0")or((d>"9")and(d<"A"))or((d>"Z")and(d<"a"))or(d>"z")) and
  2717      (d<>"_");
  2718    if loc>id_first+1 then
  2719      begin c:=identifier; id_loc:=loc;
  2720      end;
  2721    end
  2722  else c:="E"; {exponent of a real constant}
  2723  end
  2724  
  2725  @ A string that starts and ends with double-quote marks is converted into
  2726  an identifier that behaves like a numeric macro by means of the following
  2727  piece of the program.
  2728  @^preprocessed strings@>
  2729  
  2730  @<Get a preprocessed string@>=
  2731  begin double_chars:=0; id_first:=loc-1;
  2732  repeat d:=buffer[loc]; incr(loc);
  2733    if (d="""")or(d="@@") then
  2734      if buffer[loc]=d then
  2735        begin incr(loc); d:=0; incr(double_chars);
  2736        end
  2737      else  begin if d="@@" then err_print('! Double @@ sign missing')
  2738  @.Double \AT! sign missing@>
  2739        end
  2740    else if loc>limit then
  2741      begin err_print('! String constant didn''t end'); d:="""";
  2742  @.String constant didn't end@>
  2743      end;
  2744  until d="""";
  2745  id_loc:=loc-1; c:=identifier;
  2746  end
  2747  
  2748  @ After an \.{@@} sign has been scanned, the next character tells us
  2749  whether there is more work to do.
  2750  
  2751  @<Get control code and possible module name@>=
  2752  begin c:=control_code(buffer[loc]); incr(loc);
  2753  if c=ignore then goto restart
  2754  else if c=hex then scanning_hex:=true
  2755  else if c=module_name then
  2756    @<Scan the \(module name and make |cur_module| point to it@>
  2757  else if c=control_text then
  2758    begin repeat c:=skip_ahead;
  2759    until c<>"@@";
  2760    if buffer[loc-1]<>">" then
  2761      err_print('! Improper @@ within control text');
  2762  @.Improper \AT! within control text@>
  2763    goto restart;
  2764    end;
  2765  end
  2766  
  2767  @ @<Scan the \(module name...@>=
  2768  begin @<Put module name into |mod_text[1..k]|@>;
  2769  if k>3 then
  2770    begin if (mod_text[k]=".")and(mod_text[k-1]=".")and(mod_text[k-2]=".") then
  2771      cur_module:=prefix_lookup(k-3)
  2772    else cur_module:=mod_lookup(k);
  2773    end
  2774  else cur_module:=mod_lookup(k);
  2775  end
  2776  
  2777  @ Module names are placed into the |mod_text| array with consecutive spaces,
  2778  tabs, and carriage-returns replaced by single spaces. There will be no
  2779  spaces at the beginning or the end. (We set |mod_text[0]:=" "| to facilitate
  2780  this, since the |mod_lookup| routine uses |mod_text[1]| as the first
  2781  character of the name.)
  2782  
  2783  @<Set init...@>=mod_text[0]:=" ";
  2784  
  2785  @ @<Put module name...@>=
  2786  k:=0;
  2787  loop@+  begin if loc>limit then
  2788      begin get_line;
  2789      if input_has_ended then
  2790        begin err_print('! Input ended in section name');
  2791  @.Input ended in section name@>
  2792        goto done;
  2793        end;
  2794      end;
  2795    d:=buffer[loc];
  2796    @<If end of name, |goto done|@>;
  2797    incr(loc); if k<longest_name-1 then incr(k);
  2798    if (d=" ")or(d=tab_mark) then
  2799      begin d:=" "; if mod_text[k-1]=" " then decr(k);
  2800      end;
  2801    mod_text[k]:=d;
  2802    end;
  2803  done: @<Check for overlong name@>;
  2804  if (mod_text[k]=" ")and(k>0) then decr(k);
  2805  
  2806  @ @<If end of name,...@>=
  2807  if d="@@" then
  2808    begin d:=buffer[loc+1];
  2809    if d=">" then
  2810      begin loc:=loc+2; goto done;
  2811      end;
  2812    if (d=" ")or(d=tab_mark)or(d="*") then
  2813      begin err_print('! Section name didn''t end'); goto done;
  2814  @.Section name didn't end@>
  2815      end;
  2816    incr(k); mod_text[k]:="@@"; incr(loc); {now |d=buffer[loc]| again}
  2817    end
  2818  
  2819  @ @<Check for overlong name@>=
  2820  if k>=longest_name-2 then
  2821    begin print_nl('! Section name too long: ');
  2822  @.Section name too long@>
  2823    for j:=1 to 25 do print(xchr[mod_text[j]]);
  2824    print('...'); mark_harmless;
  2825    end
  2826  
  2827  @* Scanning a numeric definition.
  2828  When \.{TANGLE} looks at the \PASCAL\ text following the `\.=' of a numeric
  2829  macro definition, it calls on the procedure |scan_numeric(p)|, where |p|
  2830  points to the name that is to be defined. This procedure evaluates the
  2831  right-hand side, which must consist entirely of integer constants and
  2832  defined numeric macros connected with \.+ and \.- signs (no parentheses).
  2833  It also sets the global variable |next_control| to the control code that
  2834  terminated this definition.
  2835  
  2836  A definition ends with the control codes |definition|, |format|, |module_name|,
  2837  |begin_Pascal|, and |new_module|, all of which can be recognized
  2838  by the fact that they are the largest values |get_next| can return.
  2839  
  2840  @d end_of_definition(#)==(#>=format)
  2841    {is |#| a control code ending a definition?}
  2842  
  2843  @<Global...@>=
  2844  @!next_control:eight_bits; {control code waiting to be acted upon}
  2845  
  2846  @ The evaluation of a numeric expression makes use of two variables called the
  2847  |accumulator| and the |next_sign|. At the beginning, |accumulator| is zero and
  2848  |next_sign| is $+1$. When a \.+ or \.- is scanned, |next_sign| is multiplied
  2849  by the value of that sign. When a numeric value is scanned, it is multiplied by
  2850  |next_sign| and added to the |accumulator|, then |next_sign| is reset to $+1$.
  2851  
  2852  @d add_in(#)==begin accumulator:=accumulator+next_sign*(#); next_sign:=+1;
  2853    end
  2854  
  2855  @p procedure scan_numeric(@!p:name_pointer); {defines numeric macros}
  2856  label reswitch, done;
  2857  var accumulator:integer; {accumulates sums}
  2858  @!next_sign:-1..+1; {sign to attach to next value}
  2859  @!q:name_pointer; {points to identifiers being evaluated}
  2860  @!val:integer; {constants being evaluated}
  2861  begin @<Set \(|accumulator| to the value of the right-hand side@>;
  2862  if abs(accumulator)>=@'100000 then
  2863    begin err_print('! Value too big: ',accumulator:1); accumulator:=0;
  2864  @.Value too big@>
  2865    end;
  2866  equiv[p]:=accumulator+@'100000; {name |p| now is defined to equal |accumulator|}
  2867  end;
  2868  
  2869  @ @<Set \(|accumulator| to the value of the right-hand side@>=
  2870  accumulator:=0; next_sign:=+1;
  2871  loop@+  begin next_control:=get_next;
  2872    reswitch: case next_control of
  2873    digits: begin @<Set |val| to value of decimal constant, and
  2874        set |next_control| to the following token@>;
  2875      add_in(val); goto reswitch;
  2876      end;
  2877    octal: begin @<Set |val| to value of octal constant, and
  2878        set |next_control| to the following token@>;
  2879      add_in(val); goto reswitch;
  2880      end;
  2881    hex: begin @<Set |val| to value of hexadecimal constant, and
  2882        set |next_control| to the following token@>;
  2883      add_in(val); goto reswitch;
  2884      end;
  2885    identifier: begin q:=id_lookup(normal);
  2886      if ilk[q]<>numeric then
  2887        begin next_control:="*"; goto reswitch; {leads to error}
  2888        end;
  2889      add_in(equiv[q]-@'100000);
  2890      end;
  2891    "+": do_nothing;
  2892    "-": next_sign:=-next_sign;
  2893    format, definition, module_name, begin_Pascal, new_module: goto done;
  2894    ";": err_print('! Omit semicolon in numeric definition');
  2895  @.Omit semicolon in numeric def...@>
  2896    othercases @<Signal error, flush rest of the definition@>
  2897    endcases;
  2898    end;
  2899  done:
  2900  
  2901  @ @<Signal error, flush rest...@>=
  2902  begin err_print('! Improper numeric definition will be flushed');
  2903  @.Improper numeric definition...@>
  2904  repeat next_control:=skip_ahead
  2905  until end_of_definition(next_control);
  2906  if next_control=module_name then
  2907    begin {we want to scan the module name too}
  2908    loc:=loc-2; next_control:=get_next;
  2909    end;
  2910  accumulator:=0; goto done;
  2911  end
  2912  
  2913  @ @<Set |val| to value of decimal...@>=
  2914  val:=0;
  2915  repeat val:=10*val+next_control-"0"; next_control:=get_next;
  2916  until (next_control>"9")or(next_control<"0")
  2917  
  2918  @ @<Set |val| to value of octal...@>=
  2919  val:=0; next_control:="0";
  2920  repeat val:=8*val+next_control-"0"; next_control:=get_next;
  2921  until (next_control>"7")or(next_control<"0")
  2922  
  2923  @ @<Set |val| to value of hex...@>=
  2924  val:=0; next_control:="0";
  2925  repeat if next_control>="A" then next_control:=next_control+"0"+10-"A";
  2926  val:=16*val+next_control-"0"; next_control:=get_next;
  2927  until (next_control>"F")or(next_control<"0")or@|
  2928    ((next_control>"9")and(next_control<"A"))
  2929  
  2930  @* Scanning a macro definition.
  2931  The rules for generating the replacement texts corresponding to simple
  2932  macros, parametric macros, and \PASCAL\ texts of a module are almost
  2933  identical, so a single procedure is used for all three cases. The
  2934  differences are that
  2935  
  2936  \yskip\item{a)} The sign |#| denotes a parameter only when it appears
  2937  outside of strings in a parametric macro; otherwise it stands for the
  2938  ASCII character |#|. (This is not used in standard \PASCAL, but some
  2939  \PASCAL s allow, for example, `\.{/\#}' after a certain kind of file name.)
  2940  
  2941  \item{b)}Module names are not allowed in simple macros or parametric macros;
  2942  in fact, the appearance of a module name terminates such macros and denotes
  2943  the name of the current module.
  2944  
  2945  \item{c)}The symbols \.{@@d} and \.{@@f} and \.{@@p} are not allowed after
  2946  module names, while they terminate macro definitions.
  2947  
  2948  @ Therefore there is a procedure |scan_repl| whose parameter |t| specifies
  2949  either |simple| or |parametric| or |module_name|. After |scan_repl| has
  2950  acted, |cur_repl_text| will point to the replacement text just generated, and
  2951  |next_control| will contain the control code that terminated the activity.
  2952  
  2953  @<Globals...@>=
  2954  @!cur_repl_text:text_pointer; {replacement text formed by |scan_repl|}
  2955  
  2956  @ @p procedure scan_repl(@!t:eight_bits); {creates a replacement text}
  2957  label continue, done, found, reswitch;
  2958  var a:sixteen_bits; {the current token}
  2959  @!b:ASCII_code; {a character from the buffer}
  2960  @!bal:eight_bits; {left parentheses minus right parentheses}
  2961  begin bal:=0;
  2962  loop@+  begin continue: a:=get_next;
  2963    case a of
  2964    "(": incr(bal);
  2965    ")": if bal=0 then err_print('! Extra )')
  2966  @.Extra )@>
  2967      else decr(bal);
  2968    "'": @<Copy a string from the buffer to |tok_mem|@>;
  2969    "#": if t=parametric then a:=param;
  2970    @t\4@>@<In cases that |a| is a non-ASCII token (|identifier|,
  2971    |module_name|, etc.), either process it and change |a| to a byte
  2972    that should be stored, or |goto continue| if |a| should be ignored,
  2973    or |goto done| if |a| signals the end of this replacement text@>@;
  2974    othercases do_nothing
  2975    endcases;@/
  2976    app_repl(a); {store |a| in |tok_mem|}
  2977    end;
  2978  done: next_control:=a;
  2979  @<Make sure the parentheses balance@>;
  2980  if text_ptr>max_texts-zz then overflow('text');
  2981  cur_repl_text:=text_ptr; tok_start[text_ptr+zz]:=tok_ptr[z];
  2982  incr(text_ptr);
  2983  if z=zz-1 then z:=0@+else incr(z);
  2984  end;
  2985  
  2986  @ @<Make sure the parentheses balance@>=
  2987  if bal>0 then
  2988    begin if bal=1 then err_print('! Missing )')
  2989    else err_print('! Missing ',bal:1,' )''s');
  2990  @.Missing n )@>
  2991    while bal>0 do
  2992      begin app_repl(")"); decr(bal);
  2993      end;
  2994    end
  2995  
  2996  @ @<In cases that |a| is...@>=
  2997  identifier: begin a:=id_lookup(normal); app_repl((a div @'400)+@'200);
  2998    a:=a mod @'400;
  2999    end;
  3000  module_name: if t<>module_name then goto done
  3001    else  begin app_repl((cur_module div @'400)+@'250);
  3002      a:=cur_module mod @'400;
  3003      end;
  3004  verbatim: @<Copy verbatim string from the buffer to |tok_mem|@>;
  3005  definition, format, begin_Pascal: if t<>module_name then goto done
  3006    else  begin err_print('! @@',xchr[buffer[loc-1]],
  3007  @.\AT!p is ignored in Pascal text@>
  3008  @.\AT!d is ignored in Pascal text@>
  3009  @.\AT!f is ignored in Pascal text@>
  3010        ' is ignored in Pascal text'); goto continue;
  3011      end;
  3012  new_module: goto done;
  3013  
  3014  @ @<Copy a string...@>=
  3015  begin b:="'";
  3016  loop@+  begin app_repl(b);
  3017    if b="@@" then
  3018      if buffer[loc]="@@" then incr(loc) {store only one \.{@@}}
  3019      else err_print('! You should double @@ signs in strings');
  3020  @.You should double \AT! signs@>
  3021    if loc=limit then
  3022      begin err_print('! String didn''t end');
  3023  @.String didn't end@>
  3024      buffer[loc]:="'"; buffer[loc+1]:=0;
  3025      end;
  3026    b:=buffer[loc]; incr(loc);
  3027    if b="'" then
  3028      begin if buffer[loc]<>"'" then goto found
  3029      else  begin incr(loc); app_repl("'");
  3030        end;
  3031      end;
  3032    end;
  3033  found: end {now |a| holds the final |"'"| that will be stored}
  3034  
  3035  @ @<Copy verbatim string...@>=
  3036  begin app_repl(verbatim);
  3037  buffer[limit+1]:="@@";
  3038  reswitch: if buffer[loc]="@@" then
  3039    begin if loc<limit then if buffer[loc+1]="@@" then
  3040      begin app_repl("@@");
  3041      loc:=loc+2;
  3042      goto reswitch;
  3043      end;
  3044    end
  3045  else begin app_repl(buffer[loc]);
  3046    incr(loc);
  3047    goto reswitch;
  3048    end;
  3049  if loc>=limit then err_print('! Verbatim string didn''t end')
  3050  @.Verbatim string didn't end@>
  3051  else if buffer[loc+1]<>">" then
  3052    err_print('! You should double @@ signs in verbatim strings');
  3053  @.You should double \AT! signs@>
  3054  loc:=loc+2;
  3055  end {another |verbatim| byte will be stored, since |a=verbatim|}
  3056  
  3057  @ The following procedure is used to define a simple or parametric macro,
  3058  just after the `\.{==}' of its definition has been scanned.
  3059  
  3060  @p procedure define_macro(@!t:eight_bits);
  3061  var p:name_pointer; {the identifier being defined}
  3062  begin p:=id_lookup(t); scan_repl(t);@/
  3063  equiv[p]:=cur_repl_text; text_link[cur_repl_text]:=0;
  3064  end;
  3065  
  3066  @* Scanning a module.
  3067  The |scan_module| procedure starts when `\.{@@\ }' or `\.{@@*}' has been
  3068  sensed in the input, and it proceeds until the end of that module.  It
  3069  uses |module_count| to keep track of the current module number; with luck,
  3070  \.{WEAVE} and \.{TANGLE} will both assign the same numbers to modules.
  3071  
  3072  @<Globals...@>=
  3073  @!module_count:0..@'27777; {the current module number}
  3074  
  3075  @ The top level of |scan_module| is trivial.
  3076  @p procedure scan_module;
  3077  label continue, done, exit;
  3078  var p:name_pointer; {module name for the current module}
  3079  begin incr(module_count);
  3080  @<Scan the \(definition part of the current module@>;
  3081  @<Scan the \PASCAL\ part of the current module@>;
  3082  exit: end;
  3083  
  3084  @ @<Scan the \(definition part...@>=
  3085  next_control:=0;
  3086  loop@+  begin continue: while next_control<=format do
  3087      begin next_control:=skip_ahead;
  3088      if next_control=module_name then
  3089        begin {we want to scan the module name too}
  3090        loc:=loc-2; next_control:=get_next;
  3091        end;
  3092      end;
  3093    if next_control<>definition then goto done;
  3094    next_control:=get_next; {get identifier name}
  3095    if next_control<>identifier then
  3096      begin err_print('! Definition flushed, must start with ',
  3097  @.Definition flushed...@>
  3098        'identifier of length > 1'); goto continue;
  3099      end;
  3100    next_control:=get_next; {get token after the identifier}
  3101    if next_control="=" then
  3102      begin scan_numeric(id_lookup(numeric)); goto continue;
  3103      end
  3104    else if next_control=equivalence_sign then
  3105      begin define_macro(simple); goto continue;
  3106      end
  3107    else @<If the next text is `\.{(\#)==}', call |define_macro|
  3108      and |goto continue|@>;
  3109    err_print('! Definition flushed since it starts badly');
  3110  @.Definition flushed...@>
  3111    end;
  3112  done:
  3113  
  3114  @ @<If the next text is `\.{(\#)==}'...@>=
  3115  if next_control="(" then
  3116    begin next_control:=get_next;
  3117    if next_control="#" then
  3118      begin next_control:=get_next;
  3119      if next_control=")" then
  3120        begin next_control:=get_next;
  3121        if next_control="=" then
  3122          begin err_print('! Use == for macros');
  3123  @.Use == for macros@>
  3124          next_control:=equivalence_sign;
  3125          end;
  3126        if next_control=equivalence_sign then
  3127          begin define_macro(parametric); goto continue;
  3128          end;
  3129        end;
  3130      end;
  3131    end;
  3132  
  3133  @ @<Scan the \PASCAL...@>=
  3134  case next_control of
  3135  begin_Pascal:p:=0;
  3136  module_name: begin p:=cur_module;
  3137    @<Check that |=| or |==| follows this module name, otherwise |return|@>;
  3138    end;
  3139  othercases return
  3140  endcases;@/
  3141  @<Insert the module number into |tok_mem|@>;
  3142  scan_repl(module_name); {now |cur_repl_text| points to the replacement text}
  3143  @<Update the data structure so that the replacement text is accessible@>;
  3144  
  3145  @ @<Check that |=|...@>=
  3146  repeat next_control:=get_next;
  3147  until next_control<>"+"; {allow optional `\.{+=}'}
  3148  if (next_control<>"=")and(next_control<>equivalence_sign) then
  3149    begin err_print('! Pascal text flushed, = sign is missing');
  3150  @.Pascal text flushed...@>
  3151    repeat next_control:=skip_ahead;
  3152    until next_control=new_module;
  3153    return;
  3154    end
  3155  
  3156  @ @<Insert the module number...@>=
  3157  store_two_bytes(@'150000+module_count); {|@'150000=@'320*@'400|}
  3158  
  3159  @ @<Update the data...@>=
  3160  if p=0 then {unnamed module}
  3161    begin text_link[last_unnamed]:=cur_repl_text; last_unnamed:=cur_repl_text;
  3162    end
  3163  else if equiv[p]=0 then equiv[p]:=cur_repl_text {first module of this name}
  3164  else  begin p:=equiv[p];
  3165    while text_link[p]<module_flag do p:=text_link[p]; {find end of list}
  3166    text_link[p]:=cur_repl_text;
  3167    end;
  3168  text_link[cur_repl_text]:=module_flag;
  3169    {mark this replacement text as a nonmacro}
  3170  
  3171  @* Debugging.
  3172  The \PASCAL\ debugger with which \.{TANGLE} was developed allows breakpoints
  3173  to be set, and variables can be read and changed, but procedures cannot be
  3174  executed. Therefore a `|debug_help|' procedure has been inserted in the main
  3175  loops of each phase of the program; when |ddt| and |dd| are set to appropriate
  3176  values, symbolic printouts of various tables will appear.
  3177  
  3178  The idea is to set a breakpoint inside the |debug_help| routine, at the
  3179  place of `\ignorespaces|breakpoint:|\unskip' below.  Then when
  3180  |debug_help| is to be activated, set |trouble_shooting| equal to |true|.
  3181  The |debug_help| routine will prompt you for values of |ddt| and |dd|,
  3182  discontinuing this when |ddt<=0|; thus you type $2n+1$ integers, ending
  3183  with zero or a negative number. Then control either passes to the
  3184  breakpoint, allowing you to look at and/or change variables (if you typed
  3185  zero), or to exit the routine (if you typed a negative value).
  3186  
  3187  Another global variable, |debug_cycle|, can be used to skip silently
  3188  past calls on |debug_help|. If you set |debug_cycle>1|, the program stops
  3189  only every |debug_cycle| times |debug_help| is called; however,
  3190  any error stop will set |debug_cycle| to zero.
  3191  
  3192  @<Globals...@>=
  3193  @!debug@!trouble_shooting:boolean; {is |debug_help| wanted?}
  3194  @!ddt:integer; {operation code for the |debug_help| routine}
  3195  @!dd:integer; {operand in procedures performed by |debug_help|}
  3196  @!debug_cycle:integer; {threshold for |debug_help| stopping}
  3197  @!debug_skipped:integer; {we have skipped this many |debug_help| calls}
  3198  @!term_in:text_file; {the user's terminal as an input file}
  3199  gubed
  3200  
  3201  @ The debugging routine needs to read from the user's terminal.
  3202  @^system dependencies@>
  3203  @<Set init...@>=
  3204  @!debug trouble_shooting:=true; debug_cycle:=1; debug_skipped:=0;@/
  3205  trouble_shooting:=false; debug_cycle:=99999; {use these when it almost works}
  3206  reset(term_in,'TTY:','/I'); {open |term_in| as the terminal, don't do a |get|}
  3207  gubed
  3208  
  3209  @ @d breakpoint=888 {place where a breakpoint is desirable}
  3210  @^system dependencies@>
  3211  
  3212  @p @!debug procedure debug_help; {routine to display various things}
  3213  label breakpoint,exit;
  3214  var k:integer; {index into various arrays}
  3215  begin incr(debug_skipped);
  3216  if debug_skipped<debug_cycle then return;
  3217  debug_skipped:=0;
  3218  loop@+  begin print_nl('#'); update_terminal; {prompt}
  3219    read(term_in,ddt); {read a debug-command code}
  3220    if ddt<0 then return
  3221    else if ddt=0 then
  3222      begin goto breakpoint;@\ {go to every label at least once}
  3223      breakpoint: ddt:=0;@\
  3224      end
  3225    else  begin read(term_in,dd);
  3226      case ddt of
  3227      1: print_id(dd);
  3228      2: print_repl(dd);
  3229      3: for k:=1 to dd do print(xchr[buffer[k]]);
  3230      4: for k:=1 to dd do print(xchr[mod_text[k]]);
  3231      5: for k:=1 to out_ptr do print(xchr[out_buf[k]]);
  3232      6: for k:=1 to dd do print(xchr[out_contrib[k]]);
  3233      othercases print('?')
  3234      endcases;
  3235      end;
  3236    end;
  3237  exit:end;
  3238  gubed
  3239  
  3240  @* The main program.
  3241  We have defined plenty of procedures, and it is time to put the last
  3242  pieces of the puzzle in place. Here is where \.{TANGLE} starts, and where
  3243  it ends.
  3244  @^system dependencies@>
  3245  
  3246  @p begin initialize;
  3247  @<Initialize the input system@>;
  3248  print_ln(banner); {print a ``banner line''}
  3249  @<Phase I: Read all the user's text and compress it into |tok_mem|@>;
  3250  stat for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];@+tats@;@/
  3251  @<Phase II:...@>;
  3252  end_of_TANGLE:
  3253  if string_ptr>256 then @<Finish off the string pool file@>;
  3254  stat @<Print statistics about memory usage@>;@+tats@;@/
  3255  @t\4\4@>{here files should be closed if the operating system requires it}
  3256  @<Print the job |history|@>;
  3257  end.
  3258  
  3259  @ @<Phase I:...@>=
  3260  phase_one:=true;
  3261  module_count:=0;
  3262  repeat next_control:=skip_ahead;
  3263  until next_control=new_module;
  3264  while not input_has_ended do scan_module;
  3265  @<Check that all changes have been read@>;
  3266  phase_one:=false;
  3267  
  3268  @ @<Finish off the string pool file@>=
  3269  begin print_nl(string_ptr-256:1, ' strings written to string pool file.');
  3270  write(pool,'*');
  3271  for ii:=1 to 9 do
  3272    begin out_buf[ii]:=pool_check_sum mod 10;
  3273    pool_check_sum:=pool_check_sum div 10;
  3274    end;
  3275  for ii:=9 downto 1 do write(pool,xchr["0"+out_buf[ii]]);
  3276  write_ln(pool);
  3277  end
  3278  
  3279  @ @<Glob...@>=
  3280  stat @!wo:0..ww-1; {segment of memory for which statistics are being printed}
  3281  tats
  3282  
  3283  @ @<Print statistics about memory usage@>=
  3284  print_nl('Memory usage statistics:');
  3285  print_nl(name_ptr:1, ' names, ', text_ptr:1, ' replacement texts;');
  3286  print_nl(byte_ptr[0]:1);
  3287  for wo:=1 to ww-1 do print('+',byte_ptr[wo]:1);
  3288  if phase_one then
  3289    for ii:=0 to zz-1 do max_tok_ptr[ii]:=tok_ptr[ii];
  3290  print(' bytes, ', max_tok_ptr[0]:1);
  3291  for ii:=1 to zz-1 do print('+',max_tok_ptr[ii]:1);
  3292  print(' tokens.');
  3293  
  3294  @ Some implementations may wish to pass the |history| value to the
  3295  operating system so that it can be used to govern whether or not other
  3296  programs are started. Here we simply report the history to the user.
  3297  @^system dependencies@>
  3298  
  3299  @<Print the job |history|@>=
  3300  case history of
  3301  spotless: print_nl('(No errors were found.)');
  3302  harmless_message: print_nl('(Did you see the warning message above?)');
  3303  error_message: print_nl('(Pardon me, but I think I spotted something wrong.)');
  3304  fatal_message: print_nl('(That was a fatal error, my friend.)');
  3305  end {there are no other cases}
  3306  
  3307  @* System-dependent changes.
  3308  This module should be replaced, if necessary, by changes to the program
  3309  that are necessary to make \.{TANGLE} work at a particular installation.
  3310  It is usually best to design your change file so that all changes to
  3311  previous modules preserve the module numbering; then everybody's version
  3312  will be consistent with the printed program. More extensive changes,
  3313  which introduce new modules, can be inserted here; then only the index
  3314  itself will get a new module number.
  3315  @^system dependencies@>
  3316  
  3317  @* Index.
  3318  Here is a cross-reference table for the \.{TANGLE} processor.
  3319  All modules in which an identifier is
  3320  used are listed with that identifier, except that reserved words are
  3321  indexed only when they appear in format definitions, and the appearances
  3322  of identifiers in module names are not indexed. Underlined entries
  3323  correspond to where the identifier was declared. Error messages and
  3324  a few other things like ``ASCII code'' are indexed here too.