= Design Notes

== Problems

Translating C to Go is harder than it looks.

Jan says: It's impossible in the general case to turn C char* into Go
[]byte. It is often possible for concrete C code, depending in part on
the author's C coding style. The first problem this
runs into is that Go does not guarantee that the backing array will
keep its address stable, because Go stacks are movable. C expects the
opposite: a pointer never magically modifies itself, so some code will
fail.

INSERT CODE EXAMPLES ILLUSTRATING THE PROBLEM HERE

== How the parser works

There are no comment nodes in the C AST. Instead every cc.Token has a
Sep field: https://godoc.org/modernc.org/cc/v3#Token

It captures, when configured to do so, all white space preceding the
token, combined, including comments, if any. So we have all white
space/comment information for every token in the AST. A final white
space/comment, preceding EOF, is available as field TrailingSeperator
in the AST: https://godoc.org/modernc.org/cc/v3#AST.

To get the lexically first white space/comment for any node, use
tokenSeparator():
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1476

The same, with a default value, is comment():
https://gitlab.com/cznic/ccgo/-/blob/6551e2544a758fdc265c8fac71fb2587fb3e1042/v3/go.go#L1467

== Looking forward

Eric says: In my visualization of how the translator would work, the
output of a ccgo translation of a module at any given time is a file
of pseudo-Go code in which some sections may be enclosed by a Unicode
bracketing character (presently using the guillemet quotes U+00AB and
U+00BB) meaning "this is not Go yet" that intentionally makes the Go
compiler barf. This expresses a color on the AST nodes.

So, for example, if I'm translating hello.c with a ruleset that does not
include printf -> fmt.Printf, this:

---------------------------------------------------------
#include <stdio.h>

/* an example comment */

int main(int argc, char *argv[])
{
    printf("Hello, World!\n");
}
---------------------------------------------------------

becomes this without any explicit rules at all:

---------------------------------------------------------
«#include <stdio.h>»

/* an example comment */

func main() {
	«printf(»"Hello, World!\n"«)»
}
---------------------------------------------------------

Then, when the rule printf -> fmt.Printf is added, it becomes:

---------------------------------------------------------
import (
	"fmt"
)

/* an example comment */

func main() {
	fmt.Printf("Hello, World!\n")
}
---------------------------------------------------------

because with that rule the AST node corresponding to the printf
call can be translated and colored "Go". This implies an import
of fmt. We observe that there are no longer C-colored spans
and drop the #includes.

// end