modernc.org/knuth@v0.0.4/web/testdata/ctan.org/tex-archive/systems/knuth/dist/texware/tftopl.web (about) 1 % This program by D. E. Knuth is not copyrighted and can be used freely. 2 % Version 0 was implemented in January 1982. 3 % In February 1982 a new restriction on ligature steps was added. 4 % In June 1982 the routines were divided into smaller pieces for IBM people, 5 % and the result was designated "Version 1" in September 1982. 6 % Slight changes were made in October, 1982, for version 0.6 of TeX. 7 % Version 2 (July 1983) was released with TeX version 0.999. 8 % Version 2.1 (September 1983) changed TEXINFO to FONTDIMEN. 9 % Version 2.2 (February 1984) simplified decimal fraction output. 10 % Version 2.3 (May 1984) fixed a bug when lh=17. 11 % Version 2.4 (July 1984) fixed a bug involving unused ligature code. 12 % Version 2.5 (September 1985) updated the standard codingscheme names. 13 % Version 3 (October 1989) introduced new ligature capabilities. 14 % Version 3.1 (November 1989) renamed z[] to lig_z[] for better portability. 15 % Version 3.2 (February 2008) added a newline after a warning message. 16 % Version 3.3 (January 2014) added a space to an error message (Breitenlohner), 17 % and tests nl>lig_size not 4*lig_size (C. M. Connelly, Melissa O'Neill). 18 19 % Here is TeX material that gets inserted after \input webmac 20 \def\hang{\hangindent 3em\indent\ignorespaces} 21 \font\ninerm=cmr9 22 \let\mc=\ninerm % medium caps for names like SAIL 23 \def\PASCAL{Pascal} 24 25 \def\(#1){} % this is used to make section names sort themselves better 26 \def\9#1{} % this is used for sort keys in the index 27 28 \def\title{TF\lowercase{to}PL} 29 \def\contentspagenumber{201} 30 \def\topofcontents{\null 31 \titlefalse % include headline on the contents page 32 \def\rheader{\mainfont\hfil \contentspagenumber} 33 \vfill 34 \centerline{\titlefont The {\ttitlefont TFtoPL} processor} 35 \vskip 15pt 36 \centerline{(Version 3.3, January 2014)} 37 \vfill} 38 \def\botofcontents{\vfill 39 \centerline{\hsize 5in\baselineskip9pt 40 \vbox{\ninerm\noindent 41 The preparation of this report 42 was supported in part by the National Science 43 Foundation under grants IST-8201926 and MCS-8300984, 44 and by the System Development Foundation. `\TeX' is a 45 trademark of the American Mathematical Society.}}} 46 \pageno=\contentspagenumber \advance\pageno by 1 47 48 @* Introduction. 49 The \.{TFtoPL} utility program converts \TeX\ font metric (``\.{TFM}'') 50 files into equivalent property-list (``\.{PL}'') files. It also 51 makes a thorough check of the given \.{TFM} file, using essentially the 52 same algorithm as \TeX. Thus if \TeX\ complains that a \.{TFM} 53 file is ``bad,'' this program will pinpoint the source or sources of 54 badness. A \.{PL} file output by this program can be edited with 55 a normal text editor, and the result can be converted back to \.{TFM} 56 format using the companion program \.{PLtoTF}. 57 58 The first \.{TFtoPL} program was designed by Leo Guibas in the summer of 59 1978. Contributions by Frank Liang, Doug Wyatt, and Lyle Ramshaw 60 also had a significant effect on the evolution of the present code. 61 62 Extensions for an enhanced ligature mechanism were added by the author in 1989. 63 64 The |banner| string defined here should be changed whenever \.{TFtoPL} 65 gets modified. 66 67 @d banner=='This is TFtoPL, Version 3.3' {printed when the program starts} 68 69 @ This program is written entirely in standard \PASCAL, except that 70 it occasionally has lower case letters in strings that are output. 71 Such letters can be converted to upper case if necessary. The input is read 72 from |tfm_file|, and the output is written on |pl_file|; error messages and 73 other remarks are written on the |output| file, which the user may 74 choose to assign to the terminal if the system permits it. 75 @^system dependencies@> 76 77 The term |print| is used instead of |write| when this program writes on 78 the |output| file, so that all such output can be easily deflected. 79 80 @d print(#)==write(#) 81 @d print_ln(#)==write_ln(#) 82 83 @p program TFtoPL(@!tfm_file,@!pl_file,@!output); 84 label @<Labels in the outer block@>@/ 85 const @<Constants in the outer block@>@/ 86 type @<Types in the outer block@>@/ 87 var @<Globals in the outer block@>@/ 88 procedure initialize; {this procedure gets things started properly} 89 begin print_ln(banner);@/ 90 @<Set initial values@>@/ 91 end; 92 93 @ If the program has to stop prematurely, it goes to the 94 `|final_end|'. 95 96 @d final_end=9999 {label for the end of it all} 97 98 @<Labels...@>=final_end; 99 100 @ The following parameters can be changed at compile time to extend or 101 reduce \.{TFtoPL}'s capacity. 102 103 @<Constants...@>= 104 @!tfm_size=30000; {maximum length of |tfm| data, in bytes} 105 @!lig_size=5000; {maximum length of |lig_kern| program, in words} 106 @!hash_size=5003; {preferably a prime number, a bit larger than the number 107 of character pairs in lig/kern steps} 108 109 @ Here are some macros for common programming idioms. 110 111 @d incr(#) == #:=#+1 {increase a variable by unity} 112 @d decr(#) == #:=#-1 {decrease a variable by unity} 113 @d do_nothing == {empty statement} 114 115 @* Font metric data. 116 The idea behind \.{TFM} files is that typesetting routines like \TeX\ 117 need a compact way to store the relevant information about several 118 dozen fonts, and computer centers need a compact way to store the 119 relevant information about several hundred fonts. \.{TFM} files are 120 compact, and most of the information they contain is highly relevant, 121 so they provide a solution to the problem. 122 123 The information in a \.{TFM} file appears in a sequence of 8-bit bytes. 124 Since the number of bytes is always a multiple of 4, we could 125 also regard the file as a sequence of 32-bit words; but \TeX\ uses the 126 byte interpretation, and so does \.{TFtoPL}. Note that the bytes 127 are considered to be unsigned numbers. 128 129 @<Glob...@>= 130 @!tfm_file:packed file of 0..255; 131 132 @ On some systems you may have to do something special to read a 133 packed file of bytes. For example, the following code didn't work 134 when it was first tried at Stanford, because packed files have to be 135 opened with a special switch setting on the \PASCAL\ that was used. 136 @^system dependencies@> 137 138 @<Set init...@>= 139 reset(tfm_file); 140 141 @ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit 142 integers that give the lengths of the various subsequent portions 143 of the file. These twelve integers are, in order: 144 $$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr 145 |@!lf|&length of the entire file, in words;\cr 146 |@!lh|&length of the header data, in words;\cr 147 |@!bc|&smallest character code in the font;\cr 148 |@!ec|&largest character code in the font;\cr 149 |@!nw|&number of words in the width table;\cr 150 |@!nh|&number of words in the height table;\cr 151 |@!nd|&number of words in the depth table;\cr 152 |@!ni|&number of words in the italic correction table;\cr 153 |@!nl|&number of words in the lig/kern table;\cr 154 |@!nk|&number of words in the kern table;\cr 155 |@!ne|&number of words in the extensible character table;\cr 156 |@!np|&number of font parameter words.\cr}}$$ 157 They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|, 158 |ne<=256|, and 159 $$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$ 160 Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|), 161 and as few as 0 characters (if |bc=ec+1|). 162 163 Incidentally, when two or more 8-bit bytes are combined to form an integer of 164 16 or more bits, the most significant bytes appear first in the file. 165 This is called BigEndian order. 166 167 @<Glob...@>= 168 @!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:0..@'77777; 169 {subfile sizes} 170 171 @ The rest of the \.{TFM} file may be regarded as a sequence of ten data 172 arrays having the informal specification 173 $$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2} 174 \vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr 175 header&|[0..lh-1]stuff|\cr 176 char\_info&|[bc..ec]char_info_word|\cr 177 width&|[0..nw-1]fix_word|\cr 178 height&|[0..nh-1]fix_word|\cr 179 depth&|[0..nd-1]fix_word|\cr 180 italic&|[0..ni-1]fix_word|\cr 181 lig\_kern&|[0..nl-1]lig_kern_command|\cr 182 kern&|[0..nk-1]fix_word|\cr 183 exten&|[0..ne-1]extensible_recipe|\cr 184 param&|[1..np]fix_word|\cr}}$$ 185 The most important data type used here is a |@!fix_word|, which is 186 a 32-bit representation of a binary fraction. A |fix_word| is a signed 187 quantity, with the two's complement of the entire word used to represent 188 negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the 189 binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and 190 the smallest is $-2048$. We will see below, however, that all but one of 191 the |fix_word| values will lie between $-16$ and $+16$. 192 193 @ The first data array is a block of header information, which contains 194 general facts about the font. The header must contain at least two words, 195 and for \.{TFM} files to be used with Xerox printing software it must 196 contain at least 18 words, allocated as described below. When different 197 kinds of devices need to be interfaced, it may be necessary to add further 198 words to the header block. 199 200 \yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into the 201 \.{DVI} output file whenever it uses the font. Later on when the \.{DVI} 202 file is printed, possibly on another computer, the actual font that gets 203 used is supposed to have a check sum that agrees with the one in the 204 \.{TFM} file used by \TeX. In this way, users will be warned about 205 potential incompatibilities. (However, if the check sum is zero in either 206 the font file or the \.{TFM} file, no check is made.) The actual relation 207 between this check sum and the rest of the \.{TFM} file is not important; 208 the check sum is simply an identification number with the property that 209 incompatible fonts almost always have distinct check sums. 210 @^check sum@> 211 212 \yskip\hang|header[1]| is a |fix_word| containing the design size of the 213 font, in units of \TeX\ points (7227 \TeX\ points = 254 cm). This number 214 must be at least 1.0; it is fairly arbitrary, but usually the design size 215 is 10.0 for a ``10 point'' font, i.e., a font that was designed to look 216 best at a 10-point size, whatever that really means. When a \TeX\ user 217 asks for a font `\.{at} $\delta$ \.{pt}', the effect is to override the 218 design size and replace it by $\delta$, and to multiply the $x$ and~$y$ 219 coordinates of the points in the font image by a factor of $\delta$ 220 divided by the design size. {\sl All other dimensions in the\/\ \.{TFM} 221 file are |fix_word|\kern-1pt\ numbers in design-size units.} Thus, for example, 222 the value of |param[6]|, one \.{em} or \.{\\quad}, is often the |fix_word| 223 value $2^{20}=1.0$, since many fonts have a design size equal to one em. 224 The other dimensions must be less than 16 design-size units in absolute 225 value; thus, |header[1]| and |param[1]| are the only |fix_word| entries in 226 the whole \.{TFM} file whose first byte might be something besides 0 or 227 255. @^design size@> 228 229 \yskip\hang|header[2..11]|, if present, contains 40 bytes that identify 230 the character coding scheme. The first byte, which must be between 0 and 231 39, is the number of subsequent ASCII bytes actually relevant in this 232 string, which is intended to specify what character-code-to-symbol 233 convention is present in the font. Examples are \.{ASCII} for standard 234 ASCII, \.{TeX text} for fonts like \.{cmr10} and \.{cmti9}, \.{TeX math 235 extension} for \.{cmex10}, \.{XEROX text} for Xerox fonts, \.{GRAPHIC} for 236 special-purpose non-alphabetic fonts, \.{UNSPECIFIED} for the default case 237 when there is no information. Parentheses should not appear in this name. 238 (Such a string is said to be in {\mc BCPL} format.) 239 @^coding scheme@> 240 241 \yskip\hang|header[12..16]|, if present, contains 20 bytes that name the 242 font family (e.g., \.{CMR} or \.{HELVETICA}), in {\mc BCPL} format. 243 This field is also known as the ``font identifier.'' 244 @^family name@> 245 @^font identifier@> 246 247 \yskip\hang|header[17]|, if present, contains a first byte called the 248 |seven_bit_safe_flag|, then two bytes that are ignored, and a fourth byte 249 called the |face|. If the value of the fourth byte is less than 18, it has 250 the following interpretation as a ``weight, slope, and expansion'': Add 0 251 or 2 or 4 (for medium or bold or light) to 0 or 1 (for roman or italic) to 252 0 or 6 or 12 (for regular or condensed or extended). For example, 13 is 253 0+1+12, so it represents medium italic extended. A three-letter code 254 (e.g., \.{MIE}) can be used for such |face| data. 255 256 \yskip\hang|header[18..@twhatever@>]| might also be present; the individual 257 words are simply called |header[18]|, |header[19]|, etc., at the moment. 258 259 @ Next comes the |char_info| array, which contains one |char_info_word| 260 per character. Each |char_info_word| contains six fields packed into 261 four bytes as follows. 262 263 \yskip\hang first byte: |width_index| (8 bits)\par 264 \hang second byte: |height_index| (4 bits) times 16, plus |depth_index| 265 (4~bits)\par 266 \hang third byte: |italic_index| (6 bits) times 4, plus |tag| 267 (2~bits)\par 268 \hang fourth byte: |remainder| (8 bits)\par 269 \yskip\noindent 270 The actual width of a character is |width[width_index]|, in design-size 271 units; this is a device for compressing information, since many characters 272 have the same width. Since it is quite common for many characters 273 to have the same height, depth, or italic correction, the \.{TFM} format 274 imposes a limit of 16 different heights, 16 different depths, and 275 64 different italic corrections. 276 277 Incidentally, the relation |width[0]=height[0]=depth[0]=italic[0]=0| 278 should always hold, so that an index of zero implies a value of zero. 279 The |width_index| should never be zero unless the character does 280 not exist in the font, since a character is valid if and only if it lies 281 between |bc| and |ec| and has a nonzero |width_index|. 282 283 @ The |tag| field in a |char_info_word| has four values that explain how to 284 interpret the |remainder| field. 285 286 \yskip\hang|tag=0| (|no_tag|) means that |remainder| is unused.\par 287 \hang|tag=1| (|lig_tag|) means that this character has a ligature/kerning 288 program starting at |lig_kern[remainder]|.\par 289 \hang|tag=2| (|list_tag|) means that this character is part of a chain of 290 characters of ascending sizes, and not the largest in the chain. The 291 |remainder| field gives the character code of the next larger character.\par 292 \hang|tag=3| (|ext_tag|) means that this character code represents an 293 extensible character, i.e., a character that is built up of smaller pieces 294 so that it can be made arbitrarily large. The pieces are specified in 295 |exten[remainder]|.\par 296 297 @d no_tag=0 {vanilla character} 298 @d lig_tag=1 {character has a ligature/kerning program} 299 @d list_tag=2 {character has a successor in a charlist} 300 @d ext_tag=3 {character is extensible} 301 302 @ The |lig_kern| array contains instructions in a simple programming language 303 that explains what to do for special letter pairs. Each word is a 304 |lig_kern_command| of four bytes. 305 306 \yskip\hang first byte: |skip_byte|, indicates that this is the final program 307 step if the byte is 128 or more, otherwise the next step is obtained by 308 skipping this number of intervening steps.\par 309 \hang second byte: |next_char|, ``if |next_char| follows the current character, 310 then perform the operation and stop, otherwise continue.''\par 311 \hang third byte: |op_byte|, indicates a ligature step if less than~128, 312 a kern step otherwise.\par 313 \hang fourth byte: |remainder|.\par 314 \yskip\noindent 315 In a kern step, an 316 additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted 317 between the current character and |next_char|. This amount is 318 often negative, so that the characters are brought closer together 319 by kerning; but it might be positive. 320 321 There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where 322 $0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is 323 |remainder| is inserted between the current character and |next_char|; 324 then the current character is deleted if $b=0$, and |next_char| is 325 deleted if $c=0$; then we pass over $a$~characters to reach the next 326 current character (which may have a ligature/kerning program of its own). 327 328 Notice that if $a=0$ and $b=1$, the current character is unchanged; if 329 $a=b$ and $c=1$, the current character is changed but the next character is 330 unchanged. \.{TFtoPL} will check to see that infinite loops are avoided. 331 332 If the very first instruction of the |lig_kern| array has |skip_byte=255|, 333 the |next_char| byte is the so-called right boundary character of this font; 334 the value of |next_char| need not lie between |bc| and~|ec|. 335 If the very last instruction of the |lig_kern| array has |skip_byte=255|, 336 there is a special ligature/kerning program for a left boundary character, 337 beginning at location |256*op_byte+remainder|. 338 The interpretation is that \TeX\ puts implicit boundary characters 339 before and after each consecutive string of characters from the same font. 340 These implicit characters do not appear in the output, but they can affect 341 ligatures and kerning. 342 343 If the very first instruction of a character's |lig_kern| program has 344 |skip_byte>128|, the program actually begins in location 345 |256*op_byte+remainder|. This feature allows access to large |lig_kern| 346 arrays, because the first instruction must otherwise 347 appear in a location |<=255|. 348 349 Any instruction with |skip_byte>128| in the |lig_kern| array must have 350 |256*op_byte+remainder<nl|. If such an instruction is encountered during 351 normal program execution, it denotes an unconditional halt; no ligature 352 command is performed. 353 354 @d stop_flag=128 {value indicating `\.{STOP}' in a lig/kern program} 355 @d kern_flag=128 {op code for a kern step} 356 357 @ Extensible characters are specified by an |extensible_recipe|, 358 which consists of four bytes called |top|, |mid|, 359 |bot|, and |rep| (in this order). These bytes are the character codes 360 of individual pieces used to build up a large symbol. 361 If |top|, |mid|, or |bot| are zero, 362 they are not present in the built-up result. For example, an extensible 363 vertical line is like an extensible bracket, except that the top and 364 bottom pieces are missing. 365 366 367 @ The final portion of a \.{TFM} file is the |param| array, which is another 368 sequence of |fix_word| values. 369 370 \yskip\hang|param[1]=@!slant| is the amount of italic slant, which is used 371 to help position accents. For example, |slant=.25| means that when you go 372 up one unit, you also go .25 units to the right. The |slant| is a pure 373 number; it's the only |fix_word| other than the design size itself that is 374 not scaled by the design size. 375 376 \hang|param[2]=space| is the normal spacing between words in text. 377 Note that character |" "| in the font need not have anything to do with 378 blank spaces. 379 380 \hang|param[3]=space_stretch| is the amount of glue stretching between words. 381 382 \hang|param[4]=space_shrink| is the amount of glue shrinking between words. 383 384 \hang|param[5]=x_height| is the height of letters for which accents don't 385 have to be raised or lowered. 386 387 \hang|param[6]=quad| is the size of one em in the font. 388 389 \hang|param[7]=extra_space| is the amount added to |param[2]| at the 390 ends of sentences. 391 392 When the character coding scheme is \.{TeX math symbols}, the font is 393 supposed to have 15 additional parameters called |num1|, |num2|, |num3|, 394 |denom1|, |denom2|, |sup1|, |sup2|, |sup3|, |sub1|, |sub2|, |supdrop|, 395 |subdrop|, |delim1|, |delim2|, and |axis_height|, respectively. When the 396 character coding scheme is \.{TeX math extension}, the font is supposed to 397 have six additional parameters called |default_rule_thickness| and 398 |big_op_spacing1| through |big_op_spacing5|. 399 400 @ So that is what \.{TFM} files hold. The next question is, ``What about 401 \.{PL} files?'' A complete answer to that question appears in the 402 documentation of the companion program, \.{PLtoTF}, so it will not 403 be repeated here. Suffice it to say that a \.{PL} file is an ordinary 404 \PASCAL\ text file, and that the output of \.{TFtoPL} uses only a 405 subset of the possible constructions that might appear in a \.{PL} file. 406 Furthermore, hardly anybody really wants to look at the formal 407 definition of \.{PL} format, because it is almost self-explanatory when 408 you see an example or two. 409 410 @<Glob...@>= 411 @!pl_file:text; 412 413 @ @<Set init...@>= 414 rewrite(pl_file); 415 416 @* Unpacked representation. 417 The first thing \.{TFtoPL} does is read the entire |tfm_file| into an array of 418 bytes, |tfm[0..(4*lf-1)]|. 419 420 @<Types...@>= 421 @!byte=0..255; {unsigned eight-bit quantity} 422 @!index=0..tfm_size; {address of a byte in |tfm|} 423 424 @ @<Glob...@>= 425 @!tfm:array [-1000..tfm_size] of byte; {the input data all goes here} 426 {the negative addresses avoid range checks for invalid characters} 427 428 @ The input may, of course, be all screwed up and not a \.{TFM} file 429 at all. So we begin cautiously. 430 431 @d abort(#)==begin print_ln(#); 432 print_ln('Sorry, but I can''t go on; are you sure this is a TFM?'); 433 goto final_end; 434 end 435 436 @<Read the whole input file@>= 437 read(tfm_file,tfm[0]); 438 if tfm[0]>127 then abort('The first byte of the input file exceeds 127!'); 439 @.The first byte...@> 440 if eof(tfm_file) then abort('The input file is only one byte long!'); 441 @.The input...one byte long@> 442 read(tfm_file,tfm[1]); lf:=tfm[0]*@'400+tfm[1]; 443 if lf=0 then 444 abort('The file claims to have length zero, but that''s impossible!'); 445 @.The file claims...@> 446 if 4*lf-1>tfm_size then abort('The file is bigger than I can handle!'); 447 @.The file is bigger...@> 448 for tfm_ptr:=2 to 4*lf-1 do 449 begin if eof(tfm_file) then 450 abort('The file has fewer bytes than it claims!'); 451 @.The file has fewer bytes...@> 452 read(tfm_file,tfm[tfm_ptr]); 453 end; 454 if not eof(tfm_file) then 455 begin print_ln('There''s some extra junk at the end of the TFM file,'); 456 @.There's some extra junk...@> 457 print_ln('but I''ll proceed as if it weren''t there.'); 458 end 459 460 @ After the file has been read successfully, we look at the subfile sizes 461 to see if they check out. 462 463 @d eval_two_bytes(#)==begin if tfm[tfm_ptr]>127 then 464 abort('One of the subfile sizes is negative!'); 465 @.One of the subfile sizes...@> 466 #:=tfm[tfm_ptr]*@'400+tfm[tfm_ptr+1]; 467 tfm_ptr:=tfm_ptr+2; 468 end 469 470 @<Set subfile sizes |lh|, |bc|, \dots, |np|@>= 471 begin tfm_ptr:=2;@/ 472 eval_two_bytes(lh); 473 eval_two_bytes(bc); 474 eval_two_bytes(ec); 475 eval_two_bytes(nw); 476 eval_two_bytes(nh); 477 eval_two_bytes(nd); 478 eval_two_bytes(ni); 479 eval_two_bytes(nl); 480 eval_two_bytes(nk); 481 eval_two_bytes(ne); 482 eval_two_bytes(np); 483 if lh<2 then abort('The header length is only ',lh:1,'!'); 484 @.The header length...@> 485 if nl>lig_size then 486 abort('The lig/kern program is longer than I can handle!'); 487 @.The lig/kern program...@> 488 if (bc>ec+1)or(ec>255) then abort('The character code range ', 489 @.The character code range...@> 490 bc:1,'..',ec:1,' is illegal!'); 491 if (nw=0)or(nh=0)or(nd=0)or(ni=0) then 492 abort('Incomplete subfiles for character dimensions!'); 493 @.Incomplete subfiles...@> 494 if ne>256 then abort('There are ',ne:1,' extensible recipes!'); 495 @.There are ... recipes@> 496 if lf<>6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then 497 abort('Subfile sizes don''t add up to the stated total!'); 498 @.Subfile sizes don't add up...@> 499 end 500 501 @ Once the input data successfully passes these basic checks, 502 \.{TFtoPL} believes that it is a \.{TFM} file, and the conversion 503 to \.{PL} format will take place. Access to the various subfiles 504 is facilitated by computing the following base addresses. For example, 505 the |char_info| for character |c| will start in location 506 |4*(char_base+c)| of the |tfm| array. 507 508 @<Globals...@>= 509 @!char_base,@!width_base,@!height_base,@!depth_base,@!italic_base, 510 @!lig_kern_base,@!kern_base,@!exten_base,@!param_base:integer; 511 {base addresses for the subfiles} 512 513 @ @<Compute the base addresses@>= 514 begin char_base:=6+lh-bc; 515 width_base:=char_base+ec+1; 516 height_base:=width_base+nw; 517 depth_base:=height_base+nh; 518 italic_base:=depth_base+nd; 519 lig_kern_base:=italic_base+ni; 520 kern_base:=lig_kern_base+nl; 521 exten_base:=kern_base+nk; 522 param_base:=exten_base+ne-1; 523 end 524 525 @ Of course we want to define macros that suppress the detail of how the 526 font information is actually encoded. Each word will be referred to by 527 the |tfm| index of its first byte. For example, if |c| is a character 528 code between |bc| and |ec|, then |tfm[char_info(c)]| will be the 529 first byte of its |char_info|, i.e., the |width_index|; furthermore 530 |width(c)| will point to the |fix_word| for |c|'s width. 531 532 @d check_sum=24 533 @d design_size=check_sum+4 534 @d scheme=design_size+4 535 @d family=scheme+40 536 @d random_word=family+20 537 @d char_info(#)==4*(char_base+#) 538 @d width_index(#)==tfm[char_info(#)] 539 @d nonexistent(#)==((#<bc)or(#>ec)or(width_index(#)=0)) 540 @d height_index(#)==(tfm[char_info(#)+1] div 16) 541 @d depth_index(#)==(tfm[char_info(#)+1] mod 16) 542 @d italic_index(#)==(tfm[char_info(#)+2] div 4) 543 @d tag(#)==(tfm[char_info(#)+2] mod 4) 544 @d reset_tag(#)==tfm[char_info(#)+2]:=4*italic_index(#)+no_tag 545 @d remainder(#)==tfm[char_info(#)+3] 546 @d width(#)==4*(width_base+width_index(#)) 547 @d height(#)==4*(height_base+height_index(#)) 548 @d depth(#)==4*(depth_base+depth_index(#)) 549 @d italic(#)==4*(italic_base+italic_index(#)) 550 @d exten(#)==4*(exten_base+remainder(#)) 551 @d lig_step(#)==4*(lig_kern_base+(#)) 552 @d kern(#)==4*(kern_base+#) {here \#\ is an index, not a character} 553 @d param(#)==4*(param_base+#) {likewise} 554 555 @ One of the things we would like to do is take cognizance of fonts whose 556 character coding scheme is \.{TeX math symbols} or \.{TeX math extension}; 557 we will set the |font_type| variable to one of the three choices 558 |vanilla|, |mathsy|, or |mathex|. 559 560 @d vanilla=0 {not a special scheme} 561 @d mathsy=1 {\.{TeX math symbols} scheme} 562 @d mathex=2 {\.{TeX math extension} scheme} 563 564 @<Glob...@>= 565 @!font_type:vanilla..mathex; {is this font special?} 566 567 @* Basic output subroutines. 568 Let us now define some procedures that will reduce the rest of \.{TFtoPL}'s 569 work to a triviality. 570 571 First of all, it is convenient to have an abbreviation for output to the 572 \.{PL} file: 573 574 @d out(#)==write(pl_file,#) 575 576 @ In order to stick to standard \PASCAL, we use three strings called 577 |ASCII_04|, |ASCII_10|, and |ASCII_14|, in terms of which we can do the 578 appropriate conversion of ASCII codes. Three other little strings are 579 used to produce |face| codes like \.{MIE}. 580 581 @<Glob...@>= 582 @!ASCII_04,@!ASCII_10,@!ASCII_14: packed array [1..32] of char; 583 {strings for output in the user's external character set} 584 @!MBL_string,@!RI_string,@!RCE_string:packed array [1..3] of char; 585 {handy string constants for |face| codes} 586 587 @ @<Set init...@>= 588 ASCII_04:=' !"#$%&''()*+,-./0123456789:;<=>?';@/ 589 ASCII_10:='@@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_';@/ 590 ASCII_14:='`abcdefghijklmnopqrstuvwxyz{|}~ ';@/ 591 MBL_string:='MBL'; RI_string:='RI '; RCE_string:='RCE'; 592 593 @ The array |dig| will hold a sequence of digits to be output. 594 595 @<Glob...@>= 596 @!dig:array[0..11] of 0..9; 597 598 @ Here, in fact, are two procedures that output |dig[j-1]|$\,\ldots\,$|dig[0]|, 599 given $j>0$. 600 601 @p procedure out_digs(j:integer); {outputs |j| digits} 602 begin repeat decr(j); out(dig[j]:1); 603 until j=0; 604 end; 605 @# 606 procedure print_digs(j:integer); {prints |j| digits} 607 begin repeat decr(j); print(dig[j]:1); 608 until j=0; 609 end; 610 611 @ The |print_octal| procedure indicates how |print_digs| can be used. 612 Since this procedure is used only to print character codes, it always 613 produces three digits. 614 615 @p procedure print_octal(c:byte); {prints octal value of |c|} 616 var j:0..2; {index into |dig|} 617 begin print(''''); {an apostrophe indicates the octal notation} 618 for j:=0 to 2 do 619 begin dig[j]:=c mod 8; c:=c div 8; 620 end; 621 print_digs(3); 622 end; 623 624 @ A \.{PL} file has nested parentheses, and we want to format the output 625 so that its structure is clear. The |level| variable keeps track of the 626 depth of nesting. 627 628 @<Glob...@>= 629 @!level:0..5; 630 631 @ @<Set init...@>= 632 level:=0; 633 634 @ Three simple procedures suffice to produce the desired structure in the 635 output. 636 637 @p procedure out_ln; {finishes one line, indents the next} 638 var l:0..5; 639 begin write_ln(pl_file); 640 for l:=1 to level do out(' '); 641 end; 642 @# 643 procedure left; {outputs a left parenthesis} 644 begin incr(level); out('('); 645 end; 646 @# 647 procedure right; {outputs a right parenthesis and finishes a line} 648 begin decr(level); out(')'); out_ln; 649 end; 650 651 @ The value associated with a property can be output in a variety of 652 ways. For example, we might want to output a {\mc BCPL} string that 653 begins in |tfm[k]|: 654 655 @p procedure out_BCPL(@!k:index); {outputs a string, preceded by a blank space} 656 var l:0..39; {the number of bytes remaining} 657 begin out(' '); l:=tfm[k]; 658 while l>0 do 659 begin incr(k); decr(l); 660 case tfm[k] div @'40 of 661 1: out(ASCII_04[1+(tfm[k] mod @'40)]); 662 2: out(ASCII_10[1+(tfm[k] mod @'40)]); 663 3: out(ASCII_14[1+(tfm[k] mod @'40)]); 664 end; 665 end; 666 end; 667 668 @ The property value might also be a sequence of |l| bytes, beginning 669 in |tfm[k]|, that we would like to output in octal notation. 670 The following procedure assumes that |l<=4|, but larger values of |l| 671 could be handled easily by enlarging the |dig| array and increasing 672 the upper bounds on |b| and |j|. 673 674 @p procedure out_octal(@!k,@!l:index); {outputs |l| bytes in octal} 675 var a:0..@'1777; {accumulator for bits not yet output} 676 @!b:0..32; {the number of significant bits in |a|} 677 @!j:0..11; {the number of digits of output} 678 begin out(' O '); {specify octal format} 679 a:=0; b:=0; j:=0; 680 while l>0 do @<Reduce \(1)|l| by one, preserving the invariants@>; 681 while (a>0)or(j=0) do 682 begin dig[j]:=a mod 8; a:=a div 8; incr(j); 683 end; 684 out_digs(j); 685 end; 686 687 @ @<Reduce \(1)|l|...@>= 688 begin decr(l); 689 if tfm[k+l]<>0 then 690 begin while b>2 do 691 begin dig[j]:=a mod 8; a:=a div 8; b:=b-3; incr(j); 692 end; 693 case b of 694 0: a:=tfm[k+l]; 695 1:a:=a+2*tfm[k+l]; 696 2:a:=a+4*tfm[k+l]; 697 end; 698 end; 699 b:=b+8; 700 end 701 702 @ The property value may be a character, which is output in octal 703 unless it is a letter or a digit. This procedure is the only place 704 where a lowercase letter will be output to the \.{PL} file. 705 @^system dependencies@> 706 707 @p procedure out_char(@!c:byte); {outputs a character} 708 begin if font_type>vanilla then 709 begin tfm[0]:=c; out_octal(0,1) 710 end 711 else if (c>="0")and(c<="9") then 712 out(' C ',c-"0":1) 713 else if (c>="A")and(c<="Z") then 714 out(' C ',ASCII_10[c-"A"+2]) 715 else if (c>="a")and(c<="z") then 716 out(' C ',ASCII_14[c-"a"+2]) 717 else begin tfm[0]:=c; out_octal(0,1); 718 end; 719 end; 720 721 @ The property value might be a ``face'' byte, which is output in the 722 curious code mentioned earlier, provided that it is less than 18. 723 724 @p procedure out_face(@!k:index); {outputs a |face|} 725 var s:0..1; {the slope} 726 @!b:0..8; {the weight and expansion} 727 begin if tfm[k]>=18 then out_octal(k,1) 728 else begin out(' F '); {specify face-code format} 729 s:=tfm[k] mod 2; b:=tfm[k] div 2; 730 out(MBL_string[1+(b mod 3)]); 731 out(RI_string[1+s]); 732 out(RCE_string[1+(b div 3)]); 733 end; 734 end; 735 736 @ And finally, the value might be a |fix_word|, which is output in 737 decimal notation with just enough decimal places for \.{PLtoTF} 738 to recover every bit of the given |fix_word|. 739 740 All of the numbers involved in the intermediate calculations of 741 this procedure will be nonnegative and less than $10\cdot2^{24}$. 742 743 @p procedure out_fix(@!k:index); {outputs a |fix_word|} 744 var a:0..@'7777; {accumulator for the integer part} 745 @!f:integer; {accumulator for the fraction part} 746 @!j:0..12; {index into |dig|} 747 @!delta:integer; {amount if allowable inaccuracy} 748 begin out(' R '); {specify real format} 749 a:=(tfm[k]*16)+(tfm[k+1] div 16); 750 f:=((tfm[k+1] mod 16)*@'400+tfm[k+2])*@'400+tfm[k+3]; 751 if a>@'3777 then @<Reduce \(2)negative to positive@>; 752 @<Output the integer part, |a|, in decimal notation@>; 753 @<Output the fraction part, $|f|/2^{20}$, in decimal notation@>; 754 end; 755 756 @ The following code outputs at least one digit even if |a=0|. 757 758 @<Output the integer...@>= 759 begin j:=0; 760 repeat dig[j]:=a mod 10; a:=a div 10; incr(j); 761 until a=0; 762 out_digs(j); 763 end 764 765 @ And the following code outputs at least one digit to the right 766 of the decimal point. 767 768 @<Output the fraction...@>= 769 begin out('.'); f:=10*f+5; delta:=10; 770 repeat if delta>@'4000000 then f:=f+@'2000000-(delta div 2); 771 out(f div @'4000000:1); f:=10*(f mod @'4000000); delta:=delta*10; 772 until f<=delta; 773 end; 774 775 @ @<Reduce \(2)negative to positive@>= 776 begin out('-'); a:=@'10000-a; 777 if f>0 then 778 begin f:=@'4000000-f; decr(a); 779 end; 780 end 781 782 @* Doing it. 783 \TeX\ checks the information of a \.{TFM} file for validity as the 784 file is being read in, so that no further checks will be needed when 785 typesetting is going on. And when it finds something wrong, it just 786 calls the file ``bad,'' without identifying the nature of the problem, 787 since \.{TFM} files are supposed to be good almost all of the time. 788 789 Of course, a bad file shows up every now and again, and that's where 790 \.{TFtoPL} comes in. This program wants to catch at least as many errors as 791 \TeX\ does, and to give informative error messages besides. 792 All of the errors are corrected, so that the \.{PL} output will 793 be correct (unless, of course, the \.{TFM} file was so loused up 794 that no attempt is being made to fathom it). 795 796 @ Just before each character is processed, its code is printed in octal 797 notation. Up to eight such codes appear on a line; so we have a variable 798 to keep track of how many are currently there. We also keep track of 799 whether or not any errors have had to be corrected. 800 801 @<Glob...@>= 802 @!chars_on_line:0..8; {the number of characters printed on the current line} 803 @!perfect:boolean; {was the file free of errors?} 804 805 @ @<Set init...@>= 806 chars_on_line:=0;@/ 807 perfect:=true; {innocent until proved guilty} 808 809 @ Error messages are given with the help of the |bad| and |range_error| 810 and |bad_char| macros: 811 812 @d bad(#)==begin perfect:=false; if chars_on_line>0 then print_ln(' '); 813 chars_on_line:=0; print_ln('Bad TFM file: ',#); 814 end 815 @.Bad TFM file@> 816 @d range_error(#)==begin perfect:=false; print_ln(' '); 817 print(#,' index for character '); 818 print_octal(c); print_ln(' is too large;'); 819 print_ln('so I reset it to zero.'); 820 end 821 @d bad_char_tail(#)==print_octal(#); print_ln('.'); 822 end 823 @d bad_char(#)==begin perfect:=false; if chars_on_line>0 then print_ln(' '); 824 chars_on_line:=0; print('Bad TFM file: ',#,' nonexistent character '); 825 bad_char_tail 826 @d correct_bad_char_tail(#)==print_octal(tfm[#]); print_ln('.'); tfm[#]:=bc; 827 end 828 @d correct_bad_char(#)== begin perfect:=false; 829 if chars_on_line>0 then print_ln(' '); 830 chars_on_line:=0; print('Bad TFM file: ',#,' nonexistent character '); 831 correct_bad_char_tail 832 833 @<Glob...@>= 834 @!i:0..@'77777; {an index to words of a subfile} 835 @!c:0..256; {a random character} 836 @!d:0..3; {byte number in a word} 837 @!k:index; {a random index} 838 @!r:0..65535; {a random two-byte value} 839 @!count:0..127; {for when we need to enumerate a small set} 840 841 @ There are a lot of simple things to do, and they have to be done one 842 at a time, so we might as well get down to business. The first things 843 that \.{TFtoPL} will put into the \.{PL} file appear in the header part. 844 845 @<Do the header@>= 846 begin font_type:=vanilla; 847 if lh>=12 then 848 begin @<Set the true |font_type|@>; 849 if lh>=17 then 850 begin @<Output the family name@>; 851 if lh>=18 then @<Output the rest of the header@>; 852 end; 853 @<Output the character coding scheme@>; 854 end; 855 @<Output the design size@>; 856 @<Output the check sum@>; 857 @<Output the |seven_bit_safe_flag|@>; 858 end 859 860 @ @<Output the check sum@>= 861 left; out('CHECKSUM'); out_octal(check_sum,4); 862 right 863 864 @ Incorrect design sizes are changed to 10 points. 865 866 @d bad_design(#)==begin bad('Design size ',#,'!'); 867 @.Design size wrong@> 868 print_ln('I''ve set it to 10 points.'); 869 out(' D 10'); 870 end 871 872 @ @<Output the design size@>= 873 left; out('DESIGNSIZE'); 874 if tfm[design_size]>127 then bad_design('negative') 875 else if (tfm[design_size]=0)and(tfm[design_size+1]<16) then 876 bad_design('too small') 877 else out_fix(design_size); 878 right; 879 out('(COMMENT DESIGNSIZE IS IN POINTS)'); out_ln; 880 out('(COMMENT OTHER SIZES ARE MULTIPLES OF DESIGNSIZE)'); out_ln 881 @.DESIGNSIZE IS IN POINTS@> 882 883 @ Since we have to check two different {\mc BCPL} strings for validity, 884 we might as well write a subroutine to make the check. 885 886 @p procedure check_BCPL(@!k,@!l:index); {checks a string of length |<l|} 887 var j:index; {runs through the string} 888 @!c:byte; {character being checked} 889 begin if tfm[k]>=l then 890 begin bad('String is too long; I''ve shortened it drastically.'); 891 @.String is too long...@> 892 tfm[k]:=1; 893 end; 894 for j:=k+1 to k+tfm[k] do 895 begin c:=tfm[j]; 896 if (c="(")or(c=")") then 897 begin bad('Parenthesis in string has been changed to slash.'); 898 @.Parenthesis...changed to slash@> 899 tfm[j]:="/"; 900 end 901 else if (c<" ")or(c>"~") then 902 begin bad('Nonstandard ASCII code has been blotted out.'); 903 @.Nonstandard ASCII code...@> 904 tfm[j]:="?"; 905 end 906 else if (c>="a")and(c<="z") then tfm[j]:=c+"A"-"a"; {upper-casify letters} 907 end; 908 end; 909 910 @ The |font_type| starts out |vanilla|; possibly we need to reset it. 911 912 @<Set the true |font_type|@>= 913 begin check_BCPL(scheme,40); 914 if (tfm[scheme]>=11)and@|(tfm[scheme+1]="T")and@| 915 (tfm[scheme+2]="E")and@|(tfm[scheme+3]="X")and@| 916 (tfm[scheme+4]=" ")and@|(tfm[scheme+5]="M")and@| 917 (tfm[scheme+6]="A")and@|(tfm[scheme+7]="T")and@| 918 (tfm[scheme+8]="H")and@|(tfm[scheme+9]=" ") then 919 begin if (tfm[scheme+10]="S")and(tfm[scheme+11]="Y") then font_type:=mathsy 920 else if (tfm[scheme+10]="E")and(tfm[scheme+11]="X") then font_type:=mathex; 921 end; 922 end 923 924 @ @<Output the character coding scheme@>= 925 left; out('CODINGSCHEME'); 926 out_BCPL(scheme); 927 right 928 929 @ @<Output the family name@>= 930 left; out('FAMILY'); 931 check_BCPL(family,20); 932 out_BCPL(family); 933 right 934 935 @ @<Output the rest of the header@>= 936 begin left; out('FACE'); out_face(random_word+3); right; 937 for i:=18 to lh-1 do 938 begin left; out('HEADER D ',i:1); 939 out_octal(check_sum+4*i,@,4); right; 940 end; 941 end 942 943 @ This program does not check to see if the |seven_bit_safe_flag| has the 944 correct setting, i.e., if it really reflects the seven-bit-safety of 945 the \.{TFM} file; the stated value is merely put into the \.{PL} file. 946 The \.{PLtoTF} program will store a correct value and give a warning 947 message if a file falsely claims to be safe. 948 949 @<Output the |seven_bit_safe_flag|@>= 950 if (lh>17) and (tfm[random_word]>127) then 951 begin left; out('SEVENBITSAFEFLAG TRUE'); right; 952 end 953 954 @ The next thing to take care of is the list of parameters. 955 956 @<Do the parameters@>= 957 if np>0 then 958 begin left; out('FONTDIMEN'); out_ln; 959 for i:=1 to np do @<Check and output the $i$th parameter@>; 960 right; 961 end; 962 @<Check to see if |np| is complete for this font type@>; 963 964 @ @<Check to see if |np|...@>= 965 if (font_type=mathsy)and(np<>22) then 966 print_ln('Unusual number of fontdimen parameters for a math symbols font (', 967 @.Unusual number of fontdimen...@> 968 np:1,' not 22).') 969 else if (font_type=mathex)and(np<>13) then 970 print_ln('Unusual number of fontdimen parameters for an extension font (', 971 np:1,' not 13).') 972 973 @ All |fix_word| values except the design size and the first parameter 974 will be checked to make sure that they are less than 16.0 in magnitude, 975 using the |check_fix| macro: 976 977 @d check_fix_tail(#)==bad(#,' ',i:1,' is too big;'); 978 print_ln('I have set it to zero.'); 979 end 980 @d check_fix(#)==if (tfm[#]>0)and(tfm[#]<255) then 981 begin tfm[#]:=0; tfm[(#)+1]:=0; tfm[(#)+2]:=0; tfm[(#)+3]:=0; 982 check_fix_tail 983 984 @<Check and output the $i$th parameter@>= 985 begin left; 986 if i=1 then out('SLANT') {this parameter is not checked} 987 else begin check_fix(param(i))('Parameter');@/ 988 @.Parameter n is too big@> 989 @<Output the name of parameter $i$@>; 990 end; 991 out_fix(param(i)); right; 992 end 993 994 @ @<Output the name...@>= 995 if i<=7 then case i of 996 2:out('SPACE');@+3:out('STRETCH');@+4:out('SHRINK'); 997 5:out('XHEIGHT');@+6:out('QUAD');@+7:out('EXTRASPACE')@+end 998 else if (i<=22)and(font_type=mathsy) then case i of 999 8:out('NUM1');@+9:out('NUM2');@+10:out('NUM3'); 1000 11:out('DENOM1');@+12:out('DENOM2'); 1001 13:out('SUP1');@+14:out('SUP2');@+15:out('SUP3'); 1002 16:out('SUB1');@+17:out('SUB2'); 1003 18:out('SUPDROP');@+19:out('SUBDROP'); 1004 20:out('DELIM1');@+21:out('DELIM2'); 1005 22:out('AXISHEIGHT')@+end 1006 else if (i<=13)and(font_type=mathex) then 1007 if i=8 then out('DEFAULTRULETHICKNESS') 1008 else out('BIGOPSPACING',i-8:1) 1009 else out('PARAMETER D ',i:1) 1010 1011 @ We need to check the range of all the remaining |fix_word| values, 1012 and to make sure that |width[0]=0|, etc. 1013 1014 @d nonzero_fix(#)==(tfm[#]>0)or(tfm[#+1]>0)or(tfm[#+2]>0)or(tfm[#+3]>0) 1015 1016 @<Check the |fix_word| entries@>= 1017 if nonzero_fix(4*width_base) then bad('width[0] should be zero.'); 1018 @.should be zero@> 1019 if nonzero_fix(4*height_base) then bad('height[0] should be zero.'); 1020 if nonzero_fix(4*depth_base) then bad('depth[0] should be zero.'); 1021 if nonzero_fix(4*italic_base) then bad('italic[0] should be zero.'); 1022 for i:=0 to nw-1 do check_fix(4*(width_base+i))('Width'); 1023 @.Width n is too big@> 1024 for i:=0 to nh-1 do check_fix(4*(height_base+i))('Height'); 1025 @.Height n is too big@> 1026 for i:=0 to nd-1 do check_fix(4*(depth_base+i))('Depth'); 1027 @.Depth n is too big@> 1028 for i:=0 to ni-1 do check_fix(4*(italic_base+i))('Italic correction'); 1029 @.Italic correction n is too big@> 1030 if nk>0 then for i:=0 to nk-1 do check_fix(kern(i))('Kern'); 1031 @.Kern n is too big@> 1032 1033 @ The ligature/kerning program comes next. Before we can put it out in 1034 \.{PL} format, we need to make a table of ``labels'' that will be inserted 1035 into the program. For each character |c| whose |tag| is |lig_tag| and 1036 whose starting address is |r|, we will store the pair |(c,r)| in the 1037 |label_table| array. If there's a boundary-char program starting at~|r|, 1038 we also store the pair |(256,r)|. 1039 This array is sorted by its second components, using the 1040 simple method of straight insertion. 1041 1042 @<Glob...@>= 1043 @!label_table:array[0..258] of record@t@>@/@!cc:0..256;@!rr:0..lig_size;end; 1044 @!label_ptr: 0..257; {the largest entry in |label_table|} 1045 @!sort_ptr:0..257; {index into |label_table|} 1046 @!boundary_char:0..256; {boundary character, or 256 if none} 1047 @!bchar_label:0..@'77777; {beginning of boundary character program} 1048 1049 @ @<Set init...@>= 1050 boundary_char:=256; bchar_label:=@'77777;@/ 1051 label_ptr:=0; label_table[0].rr:=0; {a sentinel appears at the bottom} 1052 1053 @ We'll also identify and remove inaccessible program steps, using the 1054 |activity| array. 1055 1056 @d unreachable=0 {a program step not known to be reachable} 1057 @d pass_through=1 {a program step passed through on initialization} 1058 @d accessible=2 {a program step that can be relevant} 1059 1060 @<Glob...@>= 1061 @!activity:array[0..lig_size] of unreachable..accessible; 1062 @!ai,@!acti:0..lig_size; {indices into |activity|} 1063 1064 @ @<Do the ligatures and kerns@>= 1065 if nl>0 then 1066 begin for ai:=0 to nl-1 do activity[ai]:=unreachable; 1067 @<Check for a boundary char@>; 1068 end; 1069 @<Build the label table@>; 1070 if nl>0 then 1071 begin left; out('LIGTABLE'); out_ln;@/ 1072 @<Compute the |activity| array@>; 1073 @<Output and correct the ligature/kern program@>; 1074 right; 1075 @<Check for ligature cycles@>; 1076 end 1077 1078 @ We build the label table even when |nl=0|, because this catches errors 1079 that would not otherwise be detected. 1080 1081 @<Build...@>= 1082 for c:=bc to ec do if tag(c)=lig_tag then 1083 begin r:=remainder(c); 1084 if r<nl then 1085 begin if tfm[lig_step(r)]>stop_flag then 1086 begin r:=256*tfm[lig_step(r)+2]+tfm[lig_step(r)+3]; 1087 if r<nl then if activity[remainder(c)]=unreachable then 1088 activity[remainder(c)]:=pass_through; 1089 end; 1090 end; 1091 if r>=nl then 1092 begin perfect:=false; print_ln(' '); 1093 print('Ligature/kern starting index for character '); print_octal(c); 1094 print_ln(' is too large;'); print_ln('so I removed it.'); reset_tag(c); 1095 @.Ligature/kern starting index...@> 1096 end 1097 else @<Insert |(c,r)| into |label_table|@>; 1098 end; 1099 label_table[label_ptr+1].rr:=lig_size; {put ``infinite'' sentinel at the end} 1100 1101 @ @<Insert |(c,r)|...@>= 1102 begin sort_ptr:=label_ptr; {there's a hole at position |sort_ptr+1|} 1103 while label_table[sort_ptr].rr>r do 1104 begin label_table[sort_ptr+1]:=label_table[sort_ptr]; 1105 decr(sort_ptr); {move the hole} 1106 end; 1107 label_table[sort_ptr+1].cc:=c; 1108 label_table[sort_ptr+1].rr:=r; {fill the hole} 1109 incr(label_ptr); activity[r]:=accessible; 1110 end 1111 1112 @ @<Check for a bound...@>= 1113 if tfm[lig_step(0)]=255 then 1114 begin left; out('BOUNDARYCHAR'); 1115 boundary_char:=tfm[lig_step(0)+1]; out_char(boundary_char); right; 1116 activity[0]:=pass_through; 1117 end; 1118 if tfm[lig_step(nl-1)]=255 then 1119 begin r:=256*tfm[lig_step(nl-1)+2]+tfm[lig_step(nl-1)+3]; 1120 if r>=nl then 1121 begin perfect:=false; print_ln(' '); 1122 print('Ligature/kern starting index for boundarychar is too large;'); 1123 print_ln('so I removed it.'); 1124 @.Ligature/kern starting index...@> 1125 end 1126 else begin label_ptr:=1; label_table[1].cc:=256; label_table[1].rr:=r; 1127 bchar_label:=r; activity[r]:=accessible; 1128 end; 1129 activity[nl-1]:=pass_through; 1130 end 1131 1132 @ @<Compute the |activity| array@>= 1133 for ai:=0 to nl-1 do if activity[ai]=accessible then 1134 begin r:=tfm[lig_step(ai)]; 1135 if r<stop_flag then 1136 begin r:=r+ai+1; 1137 if r>=nl then 1138 begin bad('Ligature/kern step ',ai:1,' skips too far;'); 1139 @.Lig...skips too far@> 1140 print_ln('I made it stop.'); tfm[lig_step(ai)]:=stop_flag; 1141 end 1142 else activity[r]:=accessible; 1143 end; 1144 end 1145 1146 @ We ignore |pass_through| items, which don't need to be mentioned in 1147 the \.{PL} file. 1148 1149 @<Output and correct the ligature...@>= 1150 sort_ptr:=1; {point to the next label that will be needed} 1151 for acti:=0 to nl-1 do if activity[acti]<>pass_through then 1152 begin i:=acti; @<Take care of commenting out unreachable steps@>; 1153 @<Output any labels for step $i$@>; 1154 @<Output step $i$ of the ligature/kern program@>; 1155 end; 1156 if level=2 then right {the final step was unreachable} 1157 1158 @ @<Output any labels...@>= 1159 while i=label_table[sort_ptr].rr do 1160 begin left; out('LABEL'); 1161 if label_table[sort_ptr].cc=256 then out(' BOUNDARYCHAR') 1162 else out_char(label_table[sort_ptr].cc); 1163 right; incr(sort_ptr); 1164 end 1165 1166 @ @<Take care of commenting out...@>= 1167 if activity[i]=unreachable then 1168 begin if level=1 then 1169 begin left; out('COMMENT THIS PART OF THE PROGRAM IS NEVER USED!'); out_ln; 1170 end 1171 end 1172 else if level=2 then right 1173 1174 @ @<Output step $i$...@>= 1175 begin k:=lig_step(i); 1176 if tfm[k]>stop_flag then 1177 begin if 256*tfm[k+2]+tfm[k+3]>=nl then 1178 bad('Ligature unconditional stop command address is too big.'); 1179 @.Ligature unconditional stop...@> 1180 end 1181 else if tfm[k+2]>=kern_flag then @<Output a kern step@> 1182 else @<Output a ligature step@>; 1183 if tfm[k]>0 then 1184 if level=1 then @<Output either \.{SKIP} or \.{STOP}@>; 1185 end 1186 1187 @ The \.{SKIP} command is a bit tricky, because we will be omitting all 1188 inaccessible commands. 1189 1190 @<Output either...@>= 1191 begin if tfm[k]>=stop_flag then out('(STOP)') 1192 else begin count:=0; 1193 for ai:=i+1 to i+tfm[k] do if activity[ai]=accessible then incr(count); 1194 out('(SKIP D ',count:1,')'); {possibly $count=0$, so who cares} 1195 end; 1196 out_ln; 1197 end 1198 1199 @ @<Output a kern step@>= 1200 begin if nonexistent(tfm[k+1]) then if tfm[k+1]<>boundary_char then 1201 correct_bad_char('Kern step for')(k+1); 1202 @.Kern step for nonexistent...@> 1203 left; out('KRN'); out_char(tfm[k+1]); 1204 r:=256*(tfm[k+2]-kern_flag)+tfm[k+3]; 1205 if r>=nk then 1206 begin bad('Kern index too large.'); 1207 @.Kern index too large@> 1208 out(' R 0.0'); 1209 end 1210 else out_fix(kern(r)); 1211 right; 1212 end 1213 1214 @ @<Output a ligature step@>= 1215 begin if nonexistent(tfm[k+1]) then if tfm[k+1]<>boundary_char then 1216 correct_bad_char('Ligature step for')(k+1); 1217 @.Ligature step for nonexistent...@> 1218 if nonexistent(tfm[k+3]) then 1219 correct_bad_char('Ligature step produces the')(k+3); 1220 @.Ligature step produces...@> 1221 left; r:=tfm[k+2]; 1222 if (r=4)or((r>7)and(r<>11)) then 1223 begin print_ln('Ligature step with nonstandard code changed to LIG'); 1224 r:=0; tfm[k+2]:=0; 1225 end; 1226 if r mod 4>1 then out('/'); 1227 out('LIG'); 1228 if odd(r) then out('/'); 1229 while r>3 do 1230 begin out('>'); r:=r-4; 1231 end; 1232 out_char(tfm[k+1]); out_char(tfm[k+3]); right; 1233 end 1234 1235 @ The last thing on \.{TFtoPL}'s agenda is to go through the 1236 list of |char_info| and spew out the information about each individual 1237 character. 1238 1239 @<Do the characters@>= 1240 sort_ptr:=0; {this will suppress `\.{STOP}' lines in ligature comments} 1241 for c:=bc to ec do if width_index(c)>0 then 1242 begin if chars_on_line=8 then 1243 begin print_ln(' '); chars_on_line:=1; 1244 end 1245 else begin if chars_on_line>0 then print(' '); 1246 incr(chars_on_line); 1247 end; 1248 print_octal(c); {progress report} 1249 left; out('CHARACTER'); out_char(c); out_ln; 1250 @<Output the character's width@>; 1251 if height_index(c)>0 then @<Output the character's height@>; 1252 if depth_index(c)>0 then @<Output the character's depth@>; 1253 if italic_index(c)>0 then @<Output the italic correction@>; 1254 case tag(c) of 1255 no_tag: do_nothing; 1256 lig_tag: @<Output the applicable part of the ligature/kern 1257 program as a comment@>; 1258 list_tag: @<Output the character link unless there is a problem@>; 1259 ext_tag: @<Output an extensible character recipe@>; 1260 end; {there are no other cases} 1261 right; 1262 end 1263 1264 @ @<Output the character's width@>= 1265 begin left; out('CHARWD'); 1266 if width_index(c)>=nw then range_error('Width') 1267 else out_fix(width(c)); 1268 right; 1269 end 1270 1271 @ @<Output the character's height@>= 1272 if height_index(c)>=nh then range_error('Height') 1273 @.Height index for char...@> 1274 else begin left; out('CHARHT'); out_fix(height(c)); right; 1275 end 1276 1277 @ @<Output the character's depth@>= 1278 if depth_index(c)>=nd then range_error('Depth') 1279 @.Depth index for char@> 1280 else begin left; out('CHARDP'); out_fix(depth(c)); right; 1281 end 1282 1283 @ @<Output the italic correction@>= 1284 if italic_index(c)>=ni then range_error('Italic correction') 1285 @.Italic correction index for char...@> 1286 else begin left; out('CHARIC'); out_fix(italic(c)); right; 1287 end 1288 1289 @ @<Output the applicable part of the ligature...@>= 1290 begin left; out('COMMENT'); out_ln;@/ 1291 i:=remainder(c); r:=lig_step(i); 1292 if tfm[r]>stop_flag then i:=256*tfm[r+2]+tfm[r+3]; 1293 repeat @<Output step...@>; 1294 if tfm[k]>=stop_flag then i:=nl 1295 else i:=i+1+tfm[k]; 1296 until i>=nl; 1297 right; 1298 end 1299 1300 @ We want to make sure that there is no cycle of characters linked together 1301 by |list_tag| entries, since \TeX\ doesn't want to risk endless loops. 1302 If such a cycle exists, the routine here detects it when processing 1303 the largest character code in the cycle. 1304 1305 @<Output the character link unless there is a problem@>= 1306 begin r:=remainder(c); 1307 if nonexistent(r) then 1308 begin bad_char('Character list link to')(r); reset_tag(c); 1309 @.Character list link...@> 1310 end 1311 else begin while (r<c)and(tag(r)=list_tag) do r:=remainder(r); 1312 if r=c then 1313 begin bad('Cycle in a character list!'); 1314 @.Cycle in a character list@> 1315 print('Character '); print_octal(c); 1316 print_ln(' now ends the list.'); 1317 reset_tag(c); 1318 end 1319 else begin left; out('NEXTLARGER'); out_char(remainder(c)); 1320 right; 1321 end; 1322 end; 1323 end 1324 1325 @ @<Output an extensible character recipe@>= 1326 if remainder(c)>=ne then 1327 begin range_error('Extensible'); reset_tag(c); 1328 @.Extensible index for char@> 1329 end 1330 else begin left; out('VARCHAR'); out_ln; 1331 @<Output the extensible pieces that exist@>; 1332 right; 1333 end 1334 1335 @ @<Output the extensible pieces that...@>= 1336 for k:=0 to 3 do if (k=3)or(tfm[exten(c)+k]>0) then 1337 begin left; 1338 case k of 1339 0:out('TOP');@+1:out('MID');@+2:out('BOT');@+3:out('REP')@+end; 1340 if nonexistent(tfm[exten(c)+k]) then out_char(c) 1341 else out_char(tfm[exten(c)+k]); 1342 right; 1343 end 1344 1345 @ Some of the extensible recipes may not actually be used, but \TeX\ will 1346 complain about them anyway if they refer to nonexistent characters. 1347 Therefore \.{TFtoPL} must check them too. 1348 1349 @<Check the extensible recipes@>= 1350 if ne>0 then for c:=0 to ne-1 do for d:=0 to 3 do 1351 begin k:=4*(exten_base+c)+d; 1352 if (tfm[k]>0)or(d=3) then 1353 begin if nonexistent(tfm[k]) then 1354 begin bad_char('Extensible recipe involves the')(tfm[k]); 1355 @.Extensible recipe involves...@> 1356 if d<3 then tfm[k]:=0; 1357 end; 1358 end; 1359 end 1360 1361 @* Checking for ligature loops. 1362 We have programmed almost everything but the most interesting calculation of 1363 all, which has been saved for last as a special treat. \TeX's extended ligature 1364 mechanism allows unwary users to specify sequences of ligature replacements 1365 that never terminate. For example, the pair of commands 1366 $$\.{(/LIG $x$ $y$) (/LIG $y$ $x$)}$$ 1367 alternately replaces character $x$ by character $y$ and vice versa. A similar 1368 loop occurs if \.{(LIG/ $z$ $y$)} occurs in the program for $x$ and 1369 \.{(LIG/ $z$ $x$)} occurs in the program for $y$. 1370 1371 More complicated loops are also possible. For example, suppose the ligature 1372 programs for $x$ and $y$ are 1373 $$\vcenter{\halign{#\hfil\cr 1374 \.{(LABEL $x$)(/LIG/ $z$ $w$)(/LIG/> $w$ $y$)} \dots,\cr 1375 \.{(LABEL $y$)(LIG $w$ $x$)} \dots;\cr}}$$ 1376 then the adjacent characters $xz$ change to $xwz$, $xywz$, $xxz$, $xxwz$, 1377 \dots, ad infinitum. 1378 1379 @ To detect such loops, \.{TFtoPL} attempts to evaluate the function 1380 $f(x,y)$ for all character pairs $x$ and~$y$, where $f$ is defined as 1381 follows: If the current character is $x$ and the next character is 1382 $y$, we say the ``cursor'' is between $x$ and $y$; when the cursor 1383 first moves past $y$, the character immediately to its left is 1384 $f(x,y)$. This function is defined if and only if no infinite loop is 1385 generated when the cursor is between $x$ and~$y$. 1386 1387 The function $f(x,y)$ can be defined recursively. It turns out that all pairs 1388 $(x,y)$ belong to one of five classes. The simplest class has $f(x,y)=y$; this 1389 happens if there's no ligature between $x$ and $y$, or in the cases 1390 \.{LIG/>} and \.{/LIG/>>}. Another simple class arises when there's a 1391 \.{LIG} or \.{/LIG>} between $x$ and~$y$, generating the character~$z$; 1392 then $f(x,y)=z$. Otherwise we always have $f(x,y)$ equal to 1393 either $f(x,z)$ or $f(z,y)$ or $f(f(x,z),y)$, where $z$ is the inserted 1394 ligature character. 1395 1396 The first two of these classes can be merged; we can also consider 1397 $(x,y)$ to belong to the simple class when $f(x,y)$ has been evaluated. 1398 For technical reasons we allow $x$ to be 256 (for the boundary character 1399 at the left) or 257 (in cases when an error has been detected). 1400 1401 For each pair $(x,y)$ having a ligature program step, we store 1402 $(x,y)$ in a hash table from which the values $z$ and $class$ can be read. 1403 1404 @d simple=0 {$f(x,y)=z$} 1405 @d left_z=1 {$f(x,y)=f(z,y)$} 1406 @d right_z=2 {$f(x,y)=f(x,z)$} 1407 @d both_z=3 {$f(x,y)=f(f(x,z),y)$} 1408 @d pending=4 {$f(x,y)$ is being evaluated} 1409 1410 @<Glob...@>= 1411 @!hash:array[0..hash_size] of 0..66048; {$256x+y+1$ for $x\le257$ and $y\le255$} 1412 @!class:array[0..hash_size] of simple..pending; 1413 @!lig_z:array[0..hash_size] of 0..257; 1414 @!hash_ptr:0..hash_size; {the number of nonzero entries in |hash|} 1415 @!hash_list:array[0..hash_size] of 0..hash_size; {list of those nonzero entries} 1416 @!h,@!hh:0..hash_size; {indices into the hash table} 1417 @!x_lig_cycle,@!y_lig_cycle:0..256; {problematic ligature pair} 1418 1419 @ @<Check for ligature cycles@>= 1420 hash_ptr:=0; y_lig_cycle:=256; 1421 for hh:=0 to hash_size do hash[hh]:=0; {clear the hash table} 1422 for c:=bc to ec do if tag(c)=lig_tag then 1423 begin i:=remainder(c); 1424 if tfm[lig_step(i)]>stop_flag then 1425 i:=256*tfm[lig_step(i)+2]+tfm[lig_step(i)+3]; 1426 @<Enter data for character $c$ starting at location |i| in the hash table@>; 1427 end; 1428 if bchar_label<nl then 1429 begin c:=256; i:=bchar_label; 1430 @<Enter data for character $c$ starting at location |i| in the hash table@>; 1431 end; 1432 if hash_ptr=hash_size then 1433 begin print_ln('Sorry, I haven''t room for so many ligature/kern pairs!'); 1434 @.Sorry, I haven't room...@> 1435 goto final_end; 1436 end; 1437 for hh:=1 to hash_ptr do 1438 begin r:=hash_list[hh]; 1439 if class[r]>simple then {make sure $f$ is defined} 1440 r:=f(r,(hash[r]-1)div 256,(hash[r]-1)mod 256); 1441 end; 1442 if y_lig_cycle<256 then 1443 begin print('Infinite ligature loop starting with '); 1444 @.Infinite ligature loop...@> 1445 if x_lig_cycle=256 then print('boundary')@+else print_octal(x_lig_cycle); 1446 print(' and '); print_octal(y_lig_cycle); print_ln('!'); 1447 out('(INFINITE LIGATURE LOOP MUST BE BROKEN!)'); goto final_end; 1448 end 1449 1450 @ @<Enter data for character $c$...@>= 1451 repeat hash_input; k:=tfm[lig_step(i)]; 1452 if k>=stop_flag then i:=nl 1453 else i:=i+1+k; 1454 until i>=nl 1455 1456 @ We use an ``ordered hash table'' with linear probing, because such a table 1457 is efficient when the lookup of a random key tends to be unsuccessful. 1458 1459 @p procedure hash_input; {enter data for character |c| and command |i|} 1460 label 30; {go here for a quick exit} 1461 var @!cc:simple..both_z; {class of data being entered} 1462 @!zz:0..255; {function value or ligature character being entered} 1463 @!y:0..255; {the character after the cursor} 1464 @!key:integer; {value to be stored in |hash|} 1465 @!t:integer; {temporary register for swapping} 1466 begin if hash_ptr=hash_size then goto 30; 1467 @<Compute the command parameters |y|, |cc|, and |zz|@>; 1468 key:=256*c+y+1; h:=(1009*key) mod hash_size; 1469 while hash[h]>0 do 1470 begin if hash[h]<=key then 1471 begin if hash[h]=key then goto 30; {unused ligature command} 1472 t:=hash[h]; hash[h]:=key; key:=t; {do ordered-hash-table insertion} 1473 t:=class[h]; class[h]:=cc; cc:=t; {namely, do a swap} 1474 t:=lig_z[h]; lig_z[h]:=zz; zz:=t; 1475 end; 1476 if h>0 then decr(h)@+else h:=hash_size; 1477 end; 1478 hash[h]:=key; class[h]:=cc; lig_z[h]:=zz; 1479 incr(hash_ptr); hash_list[hash_ptr]:=h; 1480 30:end; 1481 1482 @ We must store kern commands as well as ligature commands, because the former 1483 might make the latter inapplicable. 1484 1485 @<Compute the command param...@>= 1486 k:=lig_step(i); y:=tfm[k+1]; t:=tfm[k+2]; cc:=simple; zz:=tfm[k+3]; 1487 if t>=kern_flag then zz:=y 1488 else begin case t of 1489 0,6:do_nothing; {\.{LIG},\.{/LIG>}} 1490 5,11:zz:=y; {\.{LIG/>}, \.{/LIG/>>}} 1491 1,7:cc:=left_z; {\.{LIG/}, \.{/LIG/>}} 1492 2:cc:=right_z; {\.{/LIG}} 1493 3:cc:=both_z; {\.{/LIG/}} 1494 end; {there are no other cases} 1495 end 1496 1497 @ Evaluation of $f(x,y)$ is handled by two mutually recursive procedures. 1498 Kind of a neat algorithm, generalizing a depth-first search. 1499 1500 @p function f(@!h,@!x,@!y:index):index; forward;@t\2@> 1501 {compute $f$ for arguments known to be in |hash[h]|} 1502 function eval(@!x,@!y:index):index; {compute $f(x,y)$ with hashtable lookup} 1503 var @!key:integer; {value sought in hash table} 1504 begin key:=256*x+y+1; h:=(1009*key) mod hash_size; 1505 while hash[h]>key do 1506 if h>0 then decr(h)@+else h:=hash_size; 1507 if hash[h]<key then eval:=y {not in ordered hash table} 1508 else eval:=f(h,x,y); 1509 end; 1510 1511 @ Pascal's beastly convention for |forward| declarations prevents us from 1512 saying |function f(h,x,y:index):index| here. 1513 1514 @p function f; 1515 begin case class[h] of 1516 simple: do_nothing; 1517 left_z: begin class[h]:=pending; lig_z[h]:=eval(lig_z[h],y); class[h]:=simple; 1518 end; 1519 right_z: begin class[h]:=pending; lig_z[h]:=eval(x,lig_z[h]); class[h]:=simple; 1520 end; 1521 both_z: begin class[h]:=pending; lig_z[h]:=eval(eval(x,lig_z[h]),y); 1522 class[h]:=simple; 1523 end; 1524 pending: begin x_lig_cycle:=x; y_lig_cycle:=y; lig_z[h]:=257; class[h]:=simple; 1525 end; {the value 257 will break all cycles, since it's not in |hash|} 1526 end; {there are no other cases} 1527 f:=lig_z[h]; 1528 end; 1529 1530 @* The main program. 1531 The routines sketched out so far need to be packaged into separate procedures, 1532 on some systems, since some \PASCAL\ compilers place a strict limit on the 1533 size of a routine. The packaging is done here in an attempt to avoid some 1534 system-dependent changes. 1535 1536 First comes the |organize| procedure, which reads the input data and 1537 gets ready for subsequent events. If something goes wrong, the routine 1538 returns |false|. 1539 1540 @p function organize:boolean; 1541 label final_end, 30; 1542 var tfm_ptr:index; {an index into |tfm|} 1543 begin @<Read the whole input file@>;@/ 1544 @<Set subfile sizes |lh|, |bc|, \dots, |np|@>;@/ 1545 @<Compute the base addresses@>;@/ 1546 organize:=true; goto 30; 1547 final_end: organize:=false; 1548 30: end; 1549 1550 @ Next we do the simple things. 1551 1552 @p procedure do_simple_things; 1553 var i:0..@'77777; {an index to words of a subfile} 1554 begin @<Do the header@>;@/ 1555 @<Do the parameters@>;@/ 1556 @<Check the |fix_word| entries@>@/ 1557 end; 1558 1559 @ And then there's a routine for individual characters. 1560 1561 @p procedure do_characters; 1562 var @!c:byte; {character being done} 1563 @!k:index; {a random index} 1564 @!ai:0..lig_size; {index into |activity|} 1565 begin @<Do the characters@>;@/ 1566 end; 1567 1568 @ Here is where \.{TFtoPL} begins and ends. 1569 @p begin initialize;@/ 1570 if not organize then goto final_end; 1571 do_simple_things;@/ 1572 @<Do the ligatures and kerns@>; 1573 @<Check the extensible recipes@>; 1574 do_characters; print_ln('.');@/ 1575 if level<>0 then print_ln('This program isn''t working!'); 1576 @.This program isn't working@> 1577 if not perfect then 1578 begin out('(COMMENT THE TFM FILE WAS BAD, SO THE DATA HAS BEEN CHANGED!)'); 1579 @.THE TFM FILE WAS BAD...@> 1580 write_ln(pl_file); 1581 end; 1582 final_end:end. 1583 1584 @* System-dependent changes. 1585 This section should be replaced, if necessary, by changes to the program 1586 that are necessary to make \.{TFtoPL} work at a particular installation. 1587 It is usually best to design your change file so that all changes to 1588 previous sections preserve the section numbering; then everybody's version 1589 will be consistent with the printed program. More extensive changes, 1590 which introduce new sections, can be inserted here; then only the index 1591 itself will get a new section number. 1592 @^system dependencies@> 1593 1594 @* Index. 1595 Pointers to error messages appear here together with the section numbers 1596 where each ident\-i\-fier is used.