# Contributing

Read the tutorial first: https://nevalang.org/docs/tutorial

See [ARCHITECTURE.md](./ARCHITECTURE.md) and [Makefile](./Makefile).

## Requirements

- Go: https://go.dev/doc/install
- Make: https://www.gnu.org/software/make/#download
- NodeJS and NPM: https://docs.npmjs.com/downloading-and-installing-node-js-and-npm/
- Antlr: `pip install antlr4-tools`
- Tygo: `go install github.com/gzuidhof/tygo@latest`

### VSCode

These extensions are not required, but they are recommended if you use VSCode:

- [nevalang](https://marketplace.visualstudio.com/items?itemName=nevalang.vscode-nevalang)
- [antlr4](https://marketplace.visualstudio.com/items?itemName=mike-lischke.vscode-antlr4)
- [tmlanguage](https://marketplace.visualstudio.com/items?itemName=pedro-w.tmlanguage)
- [markdown-mermaid](https://marketplace.visualstudio.com/items?itemName=bierner.markdown-mermaid)

## Development

## ANTLR Grammar

Don't forget to open the `neva.g4` file before debugging with VSCode.

## VSCode Extension

The VSCode extension depends on types defined in the `sourcecode` and `typesystem` packages, so renaming them is dangerous. If you are going to do so, make sure you didn't break TS type generation.

Check out [tygo.yaml](./tygo.yaml) and `CONTRIBUTING.md` in the "vscode-neva" repo.

## Learning Resources

### FBP/DataFlow

- [Flow-Based Programming: A New Approach to Application Development](https://jpaulmorrison.com/fbp/1stedchaps.html)
- [Dataflow and Reactive Programming Systems: A Practical Guide](https://www.amazon.com/Dataflow-Reactive-Programming-Systems-Practical/dp/1497422442)

### Golang

Advanced Go knowledge is required, especially an understanding of concurrency.

- [Concurrency is not parallelism](https://go.dev/blog/waza-talk)
- [Share Memory By Communicating](https://go.dev/blog/codelab-share)
- [Go Concurrency Patterns: Timing out, moving on](https://go.dev/blog/concurrency-timeouts)
- [Go Concurrency Patterns: Context](https://go.dev/blog/context)
- [Go Concurrency Patterns: Pipelines and cancellation](https://go.dev/blog/pipelines)

## Community

Check out https://nevalang.org/community to find out where you can get help and stay in touch.

## Design Principles

Nevalang is built on a set of principles. They were derived naturally from the development process rather than artificially created beforehand.

> WARNING: The language is under heavy development. These principles are not guarantees we can give you at the moment, but rather guiding stars that keep us moving in the right direction.

### Program must fail at startup or never

The idea is that most errors must be caught by the compiler at compile time. The rest, which are hard to catch without sacrificing the compiler's simplicity, are checked by the runtime at startup.

If no errors were caught at compile time or at startup, then the program is correct and must run successfully. Any (non-logical) error that occurs after startup must be treated as a compiler bug.

### Runtime must be fast, flexible and unsafe

The runtime won't do any checks after startup. The program that the runtime consumes must be correct, and its correctness must be ensured by the compiler. If there's a bug in the compiler and the runtime consumes an invalid program, bad things can happen: deadlocks, memory leaks, freezes and crashes.

### Compiler directives must not be required

The language must allow implementing everything without the use of compiler directives.

**Compiler directives are not always safe** (the analyzer won't always validate their usage, because that would make the implementation more complicated) and thus should be used by language/stdlib developers or _by users who know what they are doing_.

It's still good for users to understand what compiler directives are and how syntax sugar uses them under the hood.

### There is interpreter (backend can be slow)

The compiler must be fast up to the point where it generates IR. After that comes target code generation (e.g. generating Go and then generating machine code with the Go compiler). That part (the "backend") doesn't have to be fast; it's more important to keep it simple.

The reason is that we have an interpreter that internally uses the compiler (it's impossible to generate IR from an invalid program due to the lack of type information), but not the whole thing, only the part up to IR generation. That is the part of the compiler used for development and debugging, and that's where we need to be fast.

### There is visual programming

Once we build a good enough tool for visual programming, we will switch away from the text-based approach, and text will become a supporting tool. To achieve this we must always keep in mind that whatever we do with the language must be easy to visualize in a graph environment.

## Internal Implementation Q&A

### Why structures are not represented as Go structures?

It would require generating Go types dynamically, which means either using reflection or code generation (which makes interpreter mode impossible). Maps have their overhead, but they are easy to work with.

### Why nested structures are not represented as flat maps?

Indeed, it's possible to represent `{ foo { bar int } }` as `{ "foo/bar": 42 }`. The problem arises when we access a whole field. Take this example:

```
types {
    User {
        pet {
            name str
        }
    }
}

...

$u.pet -> foo.bar
```

What will `foo.bar` actually receive? This design makes it impossible to send structures around and only allows operating on non-structured data.

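The difference can be sketched in Go, the implementation language. This is only an illustration under assumptions: the function names and the value `"Rex"` are hypothetical, and the real runtime message types may look different. With nested maps the `pet` field is a value that can be sent as-is; with a flat map there is no value stored under `pet`, so the sub-structure has to be rebuilt by scanning keys.

```go
package main

import (
	"fmt"
	"strings"
)

// nestedUser models the message as nested maps: the `pet` field is itself a value.
func nestedUser() map[string]any {
	return map[string]any{
		"pet": map[string]any{"name": "Rex"}, // "Rex" is made-up sample data
	}
}

// flatUser models the same message as one flat map with path-like keys.
func flatUser() map[string]any {
	return map[string]any{"pet/name": "Rex"}
}

// selectNested returns a whole field with a single lookup.
func selectNested(msg map[string]any, field string) (any, bool) {
	v, ok := msg[field]
	return v, ok
}

// selectFlat has to scan every key and rebuild the sub-structure by prefix.
func selectFlat(msg map[string]any, field string) map[string]any {
	out := map[string]any{}
	prefix := field + "/"
	for k, v := range msg {
		if strings.HasPrefix(k, prefix) {
			out[strings.TrimPrefix(k, prefix)] = v
		}
	}
	return out
}

func main() {
	pet, _ := selectNested(nestedUser(), "pet")
	fmt.Println(pet) // the pet structure, ready to be sent to foo.bar

	_, ok := flatUser()["pet"]
	fmt.Println(ok) // false: the flat map has no value for `pet` itself

	fmt.Println(selectFlat(flatUser(), "pet")) // rebuilt by scanning all keys
}
```
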
### Why Go?

It's a perfect match. Go has built-in green threads, a scheduler and a garbage collector. More than that, it has goroutines and channels that map 1-1 onto FBP's ports and connections. Last but not least, it's a pretty fast compiled language. Having Go as a compile target allows us to reuse its state-of-the-art standard library and get performance improvements for free just by updating the underlying compiler.

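A minimal sketch of that mapping, assuming one goroutine per component and one channel per connection. The `double` component and the `run` helper are hypothetical names used for illustration; this is not how the Nevalang runtime is actually structured.

```go
package main

import "fmt"

// double is a hypothetical component: it reads from its inport,
// doubles each message and writes the result to its outport.
func double(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * 2
	}
	close(out)
}

// run wires a sender to the component and collects everything it emits.
func run(inputs []int) []int {
	in := make(chan int)  // connection: sender -> double
	out := make(chan int) // connection: double -> receiver

	go double(in, out) // the component runs as its own goroutine

	go func() {
		for _, v := range inputs {
			in <- v
		}
		close(in)
	}()

	var results []int
	for v := range out {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(run([]int{1, 2, 3})) // [2 4 6]
}
```
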
### Why compiler operates on multi-module graph (build) and not just turns everything into one big module?

Imagine you have `foo.bar` in your code. How does the compiler figure out what that actually is? To do that it needs to _resolve_ that _reference_. This is how _reference resolution_ works:

First, find out what `foo` is. Look at the `import` section in the current file. Let's say we see something like:

```neva
import {
    github.com/nevalang/x/foo
}
```

This is how we know that `foo` is actually the imported package `github.com/nevalang/x/foo`. Cool, but which version of `github.com/nevalang/x` should we use? To figure that out we need to look at the current _module_'s _manifest_ file. There we can find something like:

```yaml
deps:
  - github.com/nevalang/x 0.0.1
```

Cool, now we know what _exactly_ `foo` is: the `foo` package inside version `0.0.1` of the `github.com/nevalang/x` module. So what's the point of operating on a nested multi-module graph instead of having one giant module?

Now let's consider another example. In addition to depending on `github.com/nevalang/x` directly, your code depends on a `submod` sub-module, and that sub-module itself depends on `github.com/nevalang/x`.

You still have that `foo.bar` in your code, and your module still depends on `github.com/nevalang/x`. But your module depends on version `0.0.1` of `github.com/nevalang/x`, while `submod` depends on `1.0.0`.

Now we have a problem. When the compiler sees `foo.bar` in some file, it does an import lookup, sees `github.com/nevalang/x` and... does not know which version to use. To solve this we need to look up the manifest of the current module and check which version of `github.com/nevalang/x` _this current module_ uses. To do that we need to preserve the multi-module structure of the program.

One might ask: can't we simply import things like this?

```neva
import {
    github.com/nevalang/x@0.0.1
}
```

That could actually solve the issue. The problem is that now we have to update the source code _each time we update a dependency_. That's a bad solution: we simply made programming harder to avoid working on the compiler. We can do better.

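The manifest-based lookup described above can be sketched in Go. All types and names here (`ModuleRef`, `Module`, `Build`, `Resolve`, the `entry` module, the `0.1.0` version of `submod`) are hypothetical simplifications, not the compiler's actual data structures, and the import path is assumed to be already split into a module path and a package name. The point is that the same import resolves to different versions depending on which module the file belongs to, which only works if modules are kept separate.

```go
package main

import (
	"errors"
	"fmt"
)

// ModuleRef identifies one exact version of a module.
type ModuleRef struct {
	Path    string
	Version string
}

// Module is a simplified module: its manifest deps (module path -> version)
// and the set of packages it contains.
type Module struct {
	Deps     map[string]string
	Packages map[string]bool
}

// Build keeps every module separately instead of merging them into one.
type Build map[ModuleRef]Module

// Resolve finds which module version provides pkg of modPath, as seen from
// the module `current`. The answer depends on the manifest of `current`.
func (b Build) Resolve(current ModuleRef, modPath, pkg string) (ModuleRef, error) {
	mod, ok := b[current]
	if !ok {
		return ModuleRef{}, errors.New("unknown module: " + current.Path)
	}
	version, ok := mod.Deps[modPath]
	if !ok {
		return ModuleRef{}, errors.New("dependency not in manifest: " + modPath)
	}
	target := ModuleRef{Path: modPath, Version: version}
	if !b[target].Packages[pkg] {
		return ModuleRef{}, errors.New("package not found: " + pkg)
	}
	return target, nil
}

// exampleBuild models the situation from the text: the entry module uses
// x@0.0.1 while submod uses x@1.0.0.
func exampleBuild() (Build, ModuleRef, ModuleRef) {
	const x = "github.com/nevalang/x"
	entry := ModuleRef{Path: "entry"}
	submod := ModuleRef{Path: "submod", Version: "0.1.0"}
	build := Build{
		entry:  {Deps: map[string]string{x: "0.0.1", "submod": "0.1.0"}},
		submod: {Deps: map[string]string{x: "1.0.0"}},
		{Path: x, Version: "0.0.1"}: {Packages: map[string]bool{"foo": true}},
		{Path: x, Version: "1.0.0"}: {Packages: map[string]bool{"foo": true}},
	}
	return build, entry, submod
}

func main() {
	build, entry, submod := exampleBuild()
	fromEntry, _ := build.Resolve(entry, "github.com/nevalang/x", "foo")
	fromSubmod, _ := build.Resolve(submod, "github.com/nevalang/x", "foo")
	fmt.Println(fromEntry.Version, fromSubmod.Version) // 0.0.1 1.0.0
}
```
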
### Why `#bind` does not accept literals?

Indeed it would be handy to be able to do stuff like this:

```neva
nodes {
    #bind(str "hello world!")
    const Const<str>
}
```

This would make the desugarer much simpler (no need to create all these virtual constants), not just for const senders but for struct selectors too.

However, to implement this we need to be able to parse literals inside `irgen`. Right now we already introduce a dependency for parsing entity references, but for arbitrary expressions we would need the whole parser.

Of course, it's possible to hide the actual parser implementation behind some kind of interface defined by `irgen`, but that would make the code more complicated. Besides, the very idea of having a parser inside the code generator sounds bad. Parsing references, on the other hand, is an acceptable compromise.

### Why Analyzer knows about stdlib? Isn't it bad design?

At first there was an attempt to implement the analyzer so that it only knows about the core of the language.

But it turns out that some components in stdlib (especially in the `builtin` package, especially the ones that use the `#extern` and `#bind` directives) are actually part of the core of the language.

E.g. when a user uses a struct selector like `foo.bar/baz -> ...` and the desugarer replaces it with `foo.bar -> structSelectorNode("baz") -> ...` (this is pseudocode), we must ensure that the type of `bar` 1) is a `struct`, 2) has a field `baz` and 3) that `baz` is compatible with whatever `...` is. _This is static semantic analysis_, and that is the analyzer's job.

Actually, every time we use a compiler directive we depend on an implicit contract that cannot be expressed in terms of the language itself (unless we introduce abstractions for that, which would make the language more complicated). That's why we have to analyze such things by injecting knowledge about stdlib.

Designing the language so that the analyzer has zero knowledge of stdlib is possible in theory, but it would make the language more complicated and would take much more time.

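The three struct-selector checks listed above could look roughly like this in Go. The `Type` representation and `checkSelector` are hypothetical, and compatibility is reduced to name equality as a stand-in for the real subtyping rules; the actual analyzer works on the `typesystem` package's own types.

```go
package main

import (
	"errors"
	"fmt"
)

// Type is a heavily simplified type expression.
type Type struct {
	Name   string          // e.g. "int", "str", "struct"
	Fields map[string]Type // only set when Name == "struct"
}

// checkSelector validates `foo.bar/baz -> receiver`:
// 1) the outport type is a struct, 2) it has the field,
// 3) the field type is compatible with the receiver.
func checkSelector(outport Type, field string, receiver Type) error {
	if outport.Name != "struct" {
		return errors.New("selector used on non-struct type " + outport.Name)
	}
	fieldType, ok := outport.Fields[field]
	if !ok {
		return errors.New("struct has no field " + field)
	}
	// Assumption: name equality stands in for real compatibility checking.
	if fieldType.Name != receiver.Name {
		return errors.New("field " + field + " is not compatible with receiver")
	}
	return nil
}

func main() {
	bar := Type{Name: "struct", Fields: map[string]Type{"baz": {Name: "int"}}}
	fmt.Println(checkSelector(bar, "baz", Type{Name: "int"}))               // valid selector
	fmt.Println(checkSelector(bar, "qux", Type{Name: "int"}))               // missing field
	fmt.Println(checkSelector(Type{Name: "int"}, "baz", Type{Name: "int"})) // not a struct
}
```
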
### Why desugarer comes after analyzer in compiler's pipeline?

Two reasons:

1. The analyzer should operate on the original "sugared" program so it can find errors in the user's source code. Otherwise the errors found can relate to the desugarer implementation (compiler internals), which is not a compilation error but debug info for compiler developers. It's also much easier to make end-user errors readable and user-friendly this way.
2. A desugarer that comes before analysis must duplicate some validation, because it's unsafe to desugar some constructs before ensuring they are valid, e.g. desugaring struct selectors without knowing for sure that the outport's type is a valid structure. Also, many desugaring transformations are only possible on an analyzed program with all type expressions resolved.

In practice, the desugarer cannot run entirely before analysis. It would be possible to have two desugarers, one before and one after, but that would make the compiler much more complicated without visible benefits.

### Why union types are allowed for constants at syntax level?

You can indeed declare `const foo int | string = 42`, and that won't make much sense. The problem is that it's not enough to restrict this at the root level: you also have to recursively check every complex type like `struct`, `list` or `map`. That is impossible to do at the syntax level and requires work in the analyzer. This could be done in the future, once we have covered more important cases.

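A sketch of what such a recursive check might look like, assuming a simplified `TypeExpr` tree in which type references are already resolved. `TypeExpr` and `containsUnion` are hypothetical names, not part of the existing analyzer. A grammar rule could only reject the union at the root; finding one nested inside a `list` or `struct` needs a traversal like this.

```go
package main

import "fmt"

// TypeExpr is a simplified type expression.
type TypeExpr struct {
	Name  string     // "int", "str", "list", "map", "struct", "union", ...
	Elems []TypeExpr // list/map element, struct field types or union members
}

// containsUnion reports whether a union appears anywhere inside t.
func containsUnion(t TypeExpr) bool {
	if t.Name == "union" {
		return true
	}
	for _, e := range t.Elems {
		if containsUnion(e) {
			return true
		}
	}
	return false
}

func main() {
	intOrStr := TypeExpr{Name: "union", Elems: []TypeExpr{{Name: "int"}, {Name: "str"}}}
	listOfUnion := TypeExpr{Name: "list", Elems: []TypeExpr{intOrStr}}
	listOfInt := TypeExpr{Name: "list", Elems: []TypeExpr{{Name: "int"}}}

	fmt.Println(containsUnion(intOrStr), containsUnion(listOfUnion), containsUnion(listOfInt))
	// true true false
}
```
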
### Why we have special syntax for union?

We don't have sugar for `maybe<T>` and `list<T>`, so why would we have it for unions? The reason is that union is special for the type system: it's handled differently at the level of compatibility checking and resolving.

However, it's not like `struct`, where we _technically_ have to have some "literal" syntax. It's possible in theory to have just `union<T1, T2, ... Tn>`, like e.g. in Python, but that would require the _type system_ to know about the `union` name and handle such reference expressions very differently. In fact this would only make the design more complicated, because we would _pretend_ that it's a regular type instantiation consisting of a reference and arguments when in fact it's not.

Lastly, it's just common to have `|` syntax for unions.

### Why type system supports arrays?

Because the type system is a public package that can be used by others to implement languages (or something else constraint-based).

Since there are no arrays at the syntax and internal representation levels, there's no performance overhead. Also, arrays are not the most complicated part of the type system, so removing them wouldn't save us much.

### Why isn't Nevalang self-hosted?

- The runtime will never be written in Nevalang itself because of the overhead of an FBP runtime on top of Go's runtime. Go provides exactly the level of control we need to implement an FBP runtime for Nevalang.
- The compiler will someday be rewritten in Nevalang itself, but we need several years of active usage of the language before that.

There are 2 reasons why we don't rewrite the compiler in Nevalang right now:

1. The language is incredibly unstable. Stdlib and even the core are changing massively these days. The compiler would be even more unstable and hard to maintain if we rewrote it before Nevalang is more or less stable.
2. Languages that are mostly used for writing compilers eventually become better suited for that purpose. While it's good to be able to write a compiler in Nevalang without much effort, creating a language for compilers is not the goal. Writing compilers is a good thing, but it's not a very common task for programmers; it's actually incredibly rare to write compilers at work. We want Nevalang to be a good language for many programmers.