     1  Go, from C to Go
     2  
     3  GopherCon
     4  25 Apr 2014
     5  
     6  Russ Cox
     7  Google
     8  
     9  http://golang.org/
    10  
    11  
    12  * Video
    13  
    14  A video of this talk was recorded at GopherCon in Denver.
    15  
    16  .link https://www.youtube.com/watch?v=QIE5nV5fDwA Watch the talk on YouTube
    17  
    18  
    19  * Go Compiler
    20  
    21  * Go Compiler
    22  
    23  80,000+ lines of C.
    24  
    25  * Problem
    26  
    27  Programming in Go is fun.
    28  
    29  Programming in C is not.
    30  
    31  * Problem
    32  
    33  Writing a Go compiler requires Go expertise.
    34  
    35  Writing any program in C requires C expertise.
    36  
    37  Writing a Go compiler in C requires Go and C expertise.
    38  
    39  * Solution
    40  
    41  Write the Go compiler in Go.
    42  
    43  * Past
    44  
    45  Why not write the Go compiler in Go on day one?
    46  
    47  - Go did not exist.
    48  
    49  - Go was unstable.
    50  
    51  - Go is not targeting compiler writers.
    52  
    53  * Present
    54  
    55  Why do it today?
    56  
    57  - Go does exist.
    58  
    59  - Go is stable.
    60  
    61  - Go is a great general purpose language.
    62  
    63  * How?
    64  
    65  Crazy idea: mechanical conversion.
    66  
    67  “One big gofix.”
    68  
    69  * C
    70  
    71  * C
    72  
    73  - First creative burst in 1972 at Bell Labs
    74  
    75  - Ritchie, [[http://cm.bell-labs.com/who/dmr/chist.html][The Development of the C Language]], HOPL 1993
    76  
    77  - “C is quirky, flawed, and an enormous success...”
    78  
    79  * C Data Model
    80  
    81  - Original target: PDP-11 with 24 kB of memory.
    82  
    83  - Programmer is in charge of memory.
    84  
    85  - “Off-stack, dynamically-allocated storage is provided only by a library routine and the burden of managing it is placed on the programmer: C is hostile to automatic garbage collection.”
    86  
    87  - Types are there to help but not enforced.
    88  
    89  * C Control Flow
    90  
    91  - `do...while`, `for`, `switch`, `while`
    92  
    93  - the much maligned `goto`
    94  
    95  * C Program Model
    96  
    97  - Per-file compilation.
    98  
    99  - Headers vs code.
   100  
   101  - `#define`, `#include`
   102  
   103  * Conversion
   104  
   105  * Challenges for Converting C to Go
   106  
   107  - minor: unions, #define, comments
   108  
   109  - goto
   110  
   111  - type mapping
   112  
   113  * Goal
   114  
   115  Automated conversion of our C code to Go.
   116  
   117  Target: _our_ C code, not _all_ C code.
   118  
   119  - Want generated code to be maintainable.
   120  - Want automatic translation for 99%+ of the code.
   121  - No need to solve general problem for tiny number of cases.
   122  - Special cases in converter are okay.
   123  - Annotations in source code are okay.
   124  
   125  * Warmups
   126  
   127  * Unions
   128  
   129  go/src/cmd/gc/go.h
   130  
   131  	struct  Val
   132  	{
   133  	    short   ctype;
   134  	    union
   135  	    {
   136  	        short   reg;        // OREGISTER
   137  	        short   bval;       // bool value CTBOOL
   138  	        Mpint*  xval;       // int CTINT, rune CTRUNE
   139  	        Mpflt*  fval;       // float CTFLT
   140  	        Mpcplx* cval;       // float CTCPLX
   141  	        Strlit* sval;       // string CTSTR
   142  	    } u;
   143  	};
   144  
   145  * Unions
   146  
   147  go/include/link.h
   148  
   149  	struct  Addr
   150  	{
   151  	    short type;
   152  	    union
   153  	    {
   154  	        char    sval[8];
   155  	        float64 dval;
   156  	        Prog*   branch; // for 5g, 6g, 8g
   157  	    } u;
   158  	    
   159  	    ...
   160  	};
   161  
   162  * Unions
   163  
   164  `#define` `struct` `union` `/*` `Great` `space` `saver` `*/`
   165  
   166  * Unions
   167  
   168  `#define` `union` `struct` `/*` `legal` `in` `C!` `*/`
   169  
   170  And anyway, there are only two.
   171  
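As a rough illustration of why "union as struct" is acceptable here, the `Val` type might come out in Go along these lines. The field names, the placeholder types, and the `CTSTR` value are assumptions made for this sketch, not the converter's actual output.

```go
package main

import "fmt"

// Placeholder types standing in for the compiler's real definitions.
type (
	Mpint  struct{}
	Mpflt  struct{}
	Mpcplx struct{}
	Strlit struct{ S string }
)

// CTSTR is a hypothetical tag value used only in this sketch.
const CTSTR = 1

// Val mirrors the C struct with the union flattened into a plain struct:
// every alternative gets its own field, and Ctype says which one is valid.
type Val struct {
	Ctype int16
	U     struct {
		Reg  int16
		Bval int16
		Xval *Mpint
		Fval *Mpflt
		Cval *Mpcplx
		Sval *Strlit
	}
}

func main() {
	var v Val
	v.Ctype = CTSTR
	v.U.Sval = &Strlit{S: "hello"}
	fmt.Println(v.Ctype == CTSTR, v.U.Sval.S)
}
```

The cost is a few extra words per value, which matters little when only two unions exist.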
   172  * #define
   173  
   174  Can't just expand during parsing.
   175  
   176  * #define
   177  
   178  Not many. 
   179  
   180  	/*
   181  	 * defined macros
   182  	 *    you need super-gopher-guru privilege
   183  	 *    to add this list.
   184  	 */
   185  	#define nelem(x)    (sizeof(x)/sizeof((x)[0]))
   186  	#define nil         ((void*)0)
   187  	...
   188  
   189  Extend parser to recognize special cases.
   190  
   191  * #define
   192  
   193  Annotate some.
   194  
   195  	#define	BOM	0xFEFF
   196  	/*c2go enum { BOM = 0xFEFF }; */
   197  
   198  Rewrite others.
   199  
   200  	enum {
   201  	    BOM = 0xFEFF,
   202  	};
   203  
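On the Go side, both kinds of macro have natural targets. A minimal sketch, assuming the converter maps the `enum` to a constant and `nelem` to the built-in `len` (the `table` variable is made up for the example):

```go
package main

import "fmt"

// BOM replaces the C macro and enum; an untyped constant stays usable
// in any integer context.
const BOM = 0xFEFF

func main() {
	table := [4]int{1, 2, 3, 4}
	// nelem(table) in C becomes the built-in len in Go.
	fmt.Println(len(table), BOM == 0xFEFF)
}
```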
   204  * Comments
   205  
   206  Can't just discard during parsing.
   207  
   208  	/*
   209  	 * If the new process paused because it was
   210  	 * swapped out, set the stack level to the last call
   211  	 * to savu(u_ssav).  This means that the return
   212  	 * which is executed immediately after the call to aretu
   213  	 * actually returns from the last routine which did
   214  	 * the savu.
   215  	 *
   216  	 * You are not expected to understand this.
   217  	 */
   218  	if(rp->p_flag&SSWAP) {
   219  	    rp->p_flag =& ~SSWAP;
   220  	    aretu(u.u_ssav);
   221  	}
   222  
   223  * Comments
   224  
   225  Record precise source locations.
   226  
   227  	case OMAPLIT:
   228  	    n->esc = EscNone;  // until proven otherwise
   229  	    e->noesc = list(e->noesc, n);
   230  	    n->escloopdepth = e->loopdepth;
   231  
   232  	    // Keys and values make it to memory, lose track.
   233  	    for(ll=n->list; ll; ll=ll->next) {
   234  	        escassign(e, &e->theSink, ll->n->left);
   235  	        escassign(e, &e->theSink, ll->n->right);
   236  	    }
   237  	    break;
   238  
   239  Whole-line comments attach to syntax immediately following (or EOF).
   240  
   241  Suffix comments attach to syntax immediately before.
   242  
   243  Syntax carries comments if it moves.
   244  
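Go's standard library has a comparable mechanism: `ast.NewCommentMap` in `go/ast` associates each comment group with a syntax node using similar (not identical) rules. The sketch below only shows the idea on Go source; it is not the converter's implementation, and `commentOwners` is a name invented for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const src = `package p

func f(n int) int {
	n = n + 1 // suffix comment

	// whole-line comment
	return n
}
`

// commentOwners parses src and reports, for each comment, the kind of
// syntax node the comment map attached it to.
func commentOwners(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	cmap := ast.NewCommentMap(fset, file, file.Comments)
	owners := make(map[string]string)
	for node, groups := range cmap {
		for _, g := range groups {
			owners[strings.TrimSpace(g.Text())] = fmt.Sprintf("%T", node)
		}
	}
	return owners, nil
}

func main() {
	owners, err := commentOwners(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for text, node := range owners {
		fmt.Printf("%q is attached to %s\n", text, node)
	}
}
```

Once comments hang off nodes, moving a node moves its comments with it.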
   245  * Goto
   246  
   247  * C Goto
   248  
   249  “27. Horrors! goto’s and labels
   250  
   251  C has a goto statement and labels, so you can branch about the way you used to. But most of the time goto’s aren’t needed... The code can almost always be more clearly expressed by for/while, if/else, and compound statements.
   252  
   253  * C Goto
   254  
   255  One use of goto’s with some legitimacy is in a program which contains a long loop, where a while(1) would be too extended. Then you might write
   256  
   257  	mainloop:
   258  		...
   259  		goto mainloop;
   260  
   261  Another use is to implement a break out of more than one level of for or while. goto’s can only branch to labels within the same function.”
   262  
   263  — Kernighan, [[http://cm.bell-labs.com/who/dmr/ctut.pdf][Programming in C – A Tutorial]]
   264  
   265  * Go Goto Restrictions
   266  
   267  - Cannot jump over a variable declaration in target scope.
   268  
   269  	.   if x {
   270  	        goto Done
   271  	    }
   272  	    
   273  	    y := f()
   274  	    print(y)
   275  	
   276  	Done:
   277  	    close(c)
   278  	    return
   279  
   280  * Go Goto Restrictions
   281  
   282  - Cannot jump over a variable declaration in target scope.
   283  
   284  	.   var y int
   285  
   286  	    if x {
   287  	        goto Done
   288  	    }
   289  	    
   290  	    y = f()
   291  	    print(y)
   292  	
   293  	Done:
   294  	    close(c)
   295  	    return
   296  
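A compilable version of the hoisted form, as a sketch. The function name `process`, its parameters, and its results are invented so the fragment can stand alone.

```go
package main

import "fmt"

// process declares y before the goto, so the jump to Done does not skip
// a variable declaration in the label's scope.
func process(skip bool, f func() int) (result int, closed bool) {
	var y int

	if skip {
		goto Done
	}

	y = f()
	result = y

Done:
	closed = true // stands in for close(c) on the slide
	return
}

func main() {
	r, closed := process(false, func() int { return 7 })
	fmt.Println(r, closed)
}
```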
   297  * Go Goto Restrictions
   298  
   299  - Cannot jump into a new scope (into a { } block).
   300  
   301  	if bad {
   302  	Bad:
   303  	    printError()
   304  	    return err
   305  	}
   306  	
   307  	...
   308  	
   309  	if otherBadThing {
   310  	    goto Bad
   311  	}
   312  
   313  * Go Goto Restrictions
   314  
   315  - Cannot jump into a new scope (into a { } block or switch case).
   316  
   317  	switch x {
   318  	case 1:
   319  	    F()
   320  	    goto Common
   321  	case 2:
   322  	    G()
   323  	    goto Common
   324  	case 3:
   325  	Common:
   326  	    H()
   327  	}
   328  
   329  * Goto in Go compiler
   330  
   331  1032 goto statements
   332  241 labels
   333  
   334  * Goto in Go compiler
   335  
   336  35 indented labels
   337  
   338  18 switch case
   339  6 multilevel break/continue
   340  5 ‘else’ statement
   341  4 cleanup/error labels
   342  1 loop
   343  1 difficult to explain
   344  
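The multilevel break/continue category maps directly onto Go's labeled `break` and `continue`. A minimal sketch of the pattern, using a made-up search function rather than compiler code:

```go
package main

import "fmt"

// find reports the position of the first occurrence of want in grid.
func find(grid [][]int, want int) (row, col int, ok bool) {
Search:
	for r, line := range grid {
		for c, v := range line {
			if v == want {
				row, col, ok = r, c, true
				break Search // leaves both loops, as a C "goto found" would
			}
		}
	}
	return
}

func main() {
	fmt.Println(find([][]int{{1, 2}, {3, 4}}, 3))
}
```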
   345  * Refactor switch case goto
   346  
   347  	switch(r->op) {
   348  	case OINDEXMAP:
   349  	    n->op = OAS2MAPR;
   350  	    goto common;
   351  	case ORECV:
   352  	    n->op = OAS2RECV;
   353  	    goto common;
   354  	case ODOTTYPE:
   355  	    n->op = OAS2DOTTYPE;
   356  	    r->op = ODOTTYPE2;
   357  	common:
   358  	    ...
   359  	}
   360  	
   361  * Refactor switch case goto
   362  
   363  	switch r.op {
   364  	case OINDEXMAP, ORECV, ODOTTYPE:
   365  	    switch r.op {
   366  	    case OINDEXMAP:
   367  	        n.op = OAS2MAPR
   368  	    case ORECV:
   369  	        n.op = OAS2RECV
   370  	    case ODOTTYPE:
   371  	        n.op = OAS2DOTTYPE
   372  	        r.op = ODOTTYPE2
   373  	    }
   374  	    ...
   375  	}
   376  
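To check that the nested-switch form behaves like the original, here is a runnable sketch. The `Op` values and the `Node` type are stand-ins, and the elided shared code is reduced to setting a flag.

```go
package main

import "fmt"

type Op int

// Stand-in opcodes; the values are arbitrary.
const (
	OINDEXMAP Op = iota
	ORECV
	ODOTTYPE
	ODOTTYPE2
	OAS2MAPR
	OAS2RECV
	OAS2DOTTYPE
	OOTHER
)

type Node struct{ op Op }

// rewrite applies the case-specific assignment, then the shared tail.
func rewrite(n, r *Node) (common bool) {
	switch r.op {
	case OINDEXMAP, ORECV, ODOTTYPE:
		switch r.op {
		case OINDEXMAP:
			n.op = OAS2MAPR
		case ORECV:
			n.op = OAS2RECV
		case ODOTTYPE:
			n.op = OAS2DOTTYPE
			r.op = ODOTTYPE2
		}
		common = true // stands in for the elided shared code
	}
	return common
}

func main() {
	n, r := &Node{op: OOTHER}, &Node{op: ODOTTYPE}
	fmt.Println(rewrite(n, r), n.op == OAS2DOTTYPE, r.op == ODOTTYPE2)
}
```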
   377  * General solution
   378  
   379  Baker, [[http://dl.acm.org/citation.cfm?id=321999][An Algorithm for Structuring Flowgraphs]], JACM 1977
   380  
   381  But we don't need it.
   382  
   383  Handle trivial rewrites in converter.
   384  Rewrite problematic gotos by hand.
   385  
   386  * Type Mapping
   387  
   388  * Type Mapping
   389  
   390  General question: what type to use in the Go translation?
   391  
   392  - C allows implicit conversion between int, long, char and so on. Go must use one consistently.
   393  
   394  - C uses pointers for what Go calls pointers _and_ slices.
   395  
   396  * Type Mapping
   397  
   398  Build graph of “assigned” value flow and extract clusters.
   399  
   400  	x = y;
   401  	
   402  	int f(void) {
   403  	    return x;
   404  	}
   405  	w = f();
   406  	
   407  	void g(int z);
   408  	g(x);
   409  	g(y);
   410  
   411  Apply to entire compiler (all files). Exclude some functions.
   412  
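One way to extract clusters from such a graph is union-find: every assignment, return, or argument pass merges two values into the same set. The sketch below is an assumption about how this could be done, not a description of the converter; the value names (`"return f"`, `"g.z"`) are labels chosen for the example.

```go
package main

import "fmt"

// clusters is a tiny union-find over named values.
type clusters struct{ parent map[string]string }

func newClusters() *clusters {
	return &clusters{parent: map[string]string{}}
}

// find returns the representative of v's cluster, adding v if unseen.
func (c *clusters) find(v string) string {
	p, ok := c.parent[v]
	if !ok {
		c.parent[v] = v
		return v
	}
	if p == v {
		return v
	}
	root := c.find(p)
	c.parent[v] = root // path compression
	return root
}

// assign records that a value flows from src to dst, merging their clusters.
func (c *clusters) assign(dst, src string) {
	d, s := c.find(dst), c.find(src)
	c.parent[d] = s
}

// buildExample encodes the C fragment shown on the slide.
func buildExample() *clusters {
	c := newClusters()
	c.assign("x", "y")        // x = y;
	c.assign("return f", "x") // return x; inside f
	c.assign("w", "return f") // w = f();
	c.assign("g.z", "x")      // g(x);
	c.assign("g.z", "y")      // g(y);
	return c
}

func main() {
	c := buildExample()
	fmt.Println(c.find("w") == c.find("g.z"))
}
```

Here x, y, w, the result of f, and g's parameter z all land in one cluster, so they must share one Go type.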
   413  * Type Mapping
   414  
   415  	int
   416  	islvalue(Node *n)
   417  	{
   418  	    switch(n->op) {
   419  	    case OINDEX:
   420  	        if(isfixedarray(n->left->type))
   421  	            return islvalue(n->left);
   422  	        if(n->left->type != T && n->left->type->etype == TSTRING)
   423  	            return 0;
   424  	        // fall through
   425  	    case OIND:
   426  	    case ODOTPTR:
   427  	    case OCLOSUREVAR:
   428  	        return 1;
   429  	    case ODOT:
   430  	        return islvalue(n->left);
   431  	    case ONAME:
   432  	        if(n->class == PFUNC)
   433  	            return 0;
   434  	        return 1;
   435  	    }
   436  	    return 0;
   437  	}
   438  
   439  * Type Mapping
   440  
   441  	int
   442  	islvalue(Node *n)
   443  	{
   444  	    ...
   445  	    return islvalue(n->left);
   446  	    ...
   447  	    return 0;
   448  	    ...
   449  	    return 1;
   450  	    ...
   451  	    return islvalue(n->left);
   452  	    ...
   453  	    return 0;
   454  	    ...
   455  	    return 1;
   456  	    ...
   457  	    return 0;
   458  	}
   459  
   460  * Type Mapping
   461  
   462  	cluster
   463  	    types: int
   464  	    values:
   465  	        return from islvalue
   466  	        0
   467  	        1
   468  	        islvalue(n)
   469  	        islvalue(n->left)
   470  	        islvalue(n->right)
   471  	    contexts:
   472  	        bool condition
   473  	        /* if(islvalue(n)), if(!islvalue(n)), ... */
   474  
   475  Translation: bool.
   476  
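With the cluster translated to bool, `islvalue` might read as follows in Go. This is a hand-written sketch: the `Node` and `Type` definitions, the constant values, and the body of `isfixedarray` are simplified stand-ins for the compiler's own.

```go
package main

import "fmt"

type Op int

// Stand-in opcodes, type kinds, and class; values are arbitrary.
const (
	OINDEX Op = iota
	OIND
	ODOTPTR
	OCLOSUREVAR
	ODOT
	ONAME
	OLITERAL
)

const (
	TARRAY = iota
	TSTRING
	TINT
)

const PFUNC = 1

type Type struct{ etype int }

type Node struct {
	op    Op
	left  *Node
	typ   *Type
	class int
}

func isfixedarray(t *Type) bool { return t != nil && t.etype == TARRAY }

// islvalue returns bool where the C version returned 0 or 1.
func islvalue(n *Node) bool {
	switch n.op {
	case OINDEX:
		if isfixedarray(n.left.typ) {
			return islvalue(n.left)
		}
		if n.left.typ != nil && n.left.typ.etype == TSTRING {
			return false
		}
		fallthrough
	case OIND, ODOTPTR, OCLOSUREVAR:
		return true
	case ODOT:
		return islvalue(n.left)
	case ONAME:
		if n.class == PFUNC {
			return false
		}
		return true
	}
	return false
}

func main() {
	str := &Node{op: ONAME, typ: &Type{etype: TSTRING}}
	fmt.Println(islvalue(str), islvalue(&Node{op: OINDEX, left: str}))
}
```

Callers change with it: `if(islvalue(n))` stays a plain condition, with no comparison against zero.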
   477  * Type Mapping
   478  
   479  	cluster
   480  	    types: int
   481  	    values:
   482  	        return from checksliceconst
   483  	        0
   484  	        -1
   485  	    contexts:
   486  	        checksliceconst(lo, hi) < 0
   487  	        checksliceconst(lo, mid) < 0
   488  	        checksliceconst(mid, hi) < 0
   489  
   490  Translation: bool or error.
   491  
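The error variant could look like this. The slides do not show what `checksliceconst` actually checks, so the body below is invented purely to illustrate the change of result type; it works on plain ints and uses a Go-style name.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvertedSlice = errors.New("inverted slice range")

// checkSliceConst is a simplified stand-in: nil replaces 0, and a
// non-nil error replaces -1.
func checkSliceConst(lo, hi int) error {
	if lo > hi {
		return errInvertedSlice
	}
	return nil
}

func main() {
	// The C test "checksliceconst(lo, hi) < 0" becomes "!= nil".
	if err := checkSliceConst(5, 2); err != nil {
		fmt.Println("rejected:", err)
	}
}
```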
   492  * Type Mapping
   493  
   494  	cluster
   495  	    types: Val*
   496  	    values:
   497  	        var Val *v
   498  	        va_arg(fp->args, Val*)
   499  	    contexts:
   500  	        v->ctype
   501  	        v->u
   502  
   503  Translation: pointer.
   504  
   505  * Type Mapping
   506  
   507  	cluster
   508  	    types: long*
   509  	    values:
   510  	        var long* a1
   511  	        &a->a[0]
   512  	    contexts:
   513  	        *a1
   514  	        a1++
   515  
   516  Translation: slice.
   517  
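In a slice translation, the two contexts have direct equivalents: `*a1` becomes `a1[0]` and `a1++` becomes `a1 = a1[1:]`. A minimal sketch, assuming `int64` for the C `long` and a made-up `sum` function:

```go
package main

import "fmt"

// sum walks the values the way C code walks a long* with *a1 and a1++.
func sum(a []int64) int64 {
	var total int64
	for a1 := a; len(a1) > 0; a1 = a1[1:] { // a1++ becomes re-slicing
		total += a1[0] // *a1 becomes a1[0]
	}
	return total
}

func main() {
	fmt.Println(sum([]int64{1, 2, 3}))
}
```

Unlike the pointer, the slice carries its length, so the loop needs no separate end marker.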
   518  * Type Mapping
   519  
   520  Cluster statistics
   521  
   522  - 1,703 clusters in Go compiler
   523  - median cluster size 4 values
   524  - max cluster size 16,592 values
   525  
   526  Clustering does not rely on C type information at all.
   527  
   528  * Conversion status
   529  
   530  - Still prototyping, but looks good.
   531  - Aiming at Go 1.4, but no promises.
   532  
   533  By the way, please try the Go 1.3 beta!
   534  
   535  * Go from C to Go!
   536  
   537  - Practical
   538  - Applicable to other code bases?
   539  - Applicable to other languages?
   540  - Applicable to program understanding tools?
   541