diff --git a/semantics.md b/semantics.md
new file mode 100644
index 0000000..3843a51
--- /dev/null
+++ b/semantics.md
@@ -0,0 +1,90 @@
+# i swear this is what fuzzy actually does
+
+## the stack
+
+fuzzy works on a 16-bit cell-width, zero-page data stack indexed with the x register, as documented in Garth Wilson's [stack treatise](https://wilsonminesco.com/stacks/virtualstacks.html)
+
+to push a 16-bit cell onto the data stack (low byte at the lower address), we just:
+
+```asm
+    dex                 ; make room for one cell: the stack grows down,
+    dex                 ; so decrement the stack pointer once per byte
+    lda some_value      ; load the low byte of the value into a
+    sta 0, x            ; low byte goes at the lower address
+    lda some_value+1    ; load the high byte
+    sta 1, x            ; high byte goes at the next address up
+```
+
+and to pop a cell off it:
+
+```asm
+    lda 0, x            ; pop the low byte of the top of stack into a
+    ldy 1, x            ; and the high byte into y
+    inx                 ; drop the cell by incrementing the stack pointer
+    inx                 ; once per byte
+```
+
+## types
+
+these are used in word definitions, and refer to the type of an individual stack cell:
+
+| type                   | desc                                                        |
+| ---------------------- | ----------------------------------------------------------- |
+| **bool**               | a boolean value, represented by $0000 or $ffff              |
+| **nat**                | an unsigned 16-bit integer                                  |
+| **int**                | a signed 16-bit integer                                     |
+| **char**               | an 8-bit george-ascii character, padded with leading zeroes |
+| **string**             | a 16-bit pointer to a string in memory                      |
+| **word** _`dangerous`_ | a 16-bit pointer to a fuzzy word or quotation               |
+
+## operators
+
+- `!` NOT: applies NOT to tos
+- `&` AND: pops 2 off the stack and pushes the AND'ed result
+- `|` OR: pops 2 off the stack and pushes the OR'ed result
+- `+` add: pops 2 off the stack and pushes the sum
+- `-` subtract: pops 2 off the stack and pushes the difference
+- `*` multiply: pops 2 off the stack and pushes the result, truncating if it's >$FFFF
+- `/` divide: pops 2 off the stack and pushes the remainder and quotient
+- `=` equality: pushes true/false if the top 2 stack cells do/don't match
+- `>` greater than: pushes true/false if tos-1 is/isn't greater than tos
+- `<` less than: pushes true/false if tos-1 is/isn't less than tos
+- `#` quote _`dangerous`_: pops tos and pushes a word that produces its value
+
+### supported types (this will need to be more clearly laid out later)
+
+| operator | input type               | output type              | notes                                                                                                                                                               |
+| -------- | ------------------------ | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `!`      | `bool`, `nat`, `int`     | `bool`, `nat`, `int`     |                                                                                                                                                                     |
+| `&`      | `bool`, `nat`, `int`     | `bool`, `nat`, `int`     |                                                                                                                                                                     |
+| `\|`     | `bool`, `nat`, `int`     | `bool`, `nat`, `int`     |                                                                                                                                                                     |
+| `+`      | `nat` `nat`, `int` `int` | `nat`, `int`             |                                                                                                                                                                     |
+| `-`      | `nat` `nat`, `int` `int` | `nat`, `int`             | subtracting two `nat`s                                                                                                                                              |
+| `*`      | `nat` `nat`, `int` `int` | `nat`, `int`             | most products will be truncated, since most 16-bit multiplications result in a >16-bit product, but in practice that shouldn't matter cause we're not doing science |
+| `/`      | `nat` `nat`, `int` `int` | `nat` `nat`, `int` `int` | produces two cells, the remainder and quotient                                                                                                                      |
+| `=`      | any any                  | `bool`                   | equality/order is checked based on stack cell value, not type (e.g. a `word` pointing to $abcd and a `nat` with the value $abcd are equivalent)                     |
+| `>`      | any any                  | `bool`                   | see above                                                                                                                                                           |
+| `<`      | any any                  | `bool`                   | see above                                                                                                                                                           |
+| `#`      | any                      | `word`                   | _`dangerous`_                                                                                                                                                       |
+
+## `danger!`
+
+the `danger!` keyword marks a word as being _`dangerous`_. certain language features can only be used in dangerous words, such as:
+
+- inline assembly
+- quotations
+  - typechecking quotations is a difficult problem & probably too complex to implement on george if we ever want to fully self-host fuzzy
+- unchecked operator usage
+  - applying `+` to two chars, applying `&` to two strings, etc
+  - this does not mean that _dangerous_ words are untyped! the result of an unchecked operation is simply asserted to have the word's declared result type
+  - `danger! dangerous_word nat nat is char: +` can't be used on a `nat char` stack, and any words used after `dangerous_word` treat the top of the stack as a `char`, without caring that it was made from two `nat`s
+
+the program body cannot use any _dangerous_ features, which keeps _dangerous_ behavior contained to specific words.
+
+## memory layout
+
+| start  | end    | use                          |
+| ------ | ------ | ---------------------------- |
+| `$200` | `$300` |                              |
+|        |        | core language implementation |
+|        |        | core language implementation |
diff --git a/syntax.md b/syntax.md
new file mode 100644
index 0000000..0ed41b3
--- /dev/null
+++ b/syntax.md
@@ -0,0 +1,78 @@
+# fuzzy syntax in a well-defined grammar so i don't lose my mind
+
+## notation
+
+| notation | meaning                                       |
+| -------- | --------------------------------------------- |
+| abc      | syntactical production                        |
+| :        | maps production to children (products?)       |
+| ()       | groups items                                  |
+| ʕ·ᴥ·ʔ    | any 8-bit georgesci character                 |
+| `abc`    | exact character(s)                            |
+| \x       | an escape character                           |
+| x?       | optional                                      |
+| x\*      | zero or more of x                             |
+| x+       | one or more of x                              |
+| x+y      | y or more of x                                |
+| x.y      | y repetitions of x                            |
+| \|       | one or another                                |
+| [-]      | any characters in range (>=1 ranges accepted) |
+
+(adapted from the rust reference cause i like how simple they do it)
+
+## grammar
+
+the only semantically significant whitespace is `\n+2` (two or more newlines) after a word definition.
+
+otherwise, assume tokens are delimited by an arbitrary amount of whitespace (short of `\n+2`), including no whitespace at all, e.g. the colon in `hello is: "hello"`
+
+also, order is significant! if `value` tried `word` first, reserved words like `true` and `false` would parse as word references.
+
+```syntax
+george: defs? body
+
+defs: (def \n+2)*
+body: values
+
+def: signature `:` values
+signature: `danger!`? word typedef
+
+values: (value | op)*
+
+typedef: pop? `is` push? effects?
+
+pop: type*
+
+push: type*
+
+effects: effect*
+
+type: `bool` | `nat` | `int` | `char` | `string` | `word`
+
+effect: `paint` | `sing` | `store`
+
+value: bool | num | char | string | word
+
+op: `!` | `&` | `|` | `+` | `-` | `*` | `/` | `=` | `>` | `<` | `#`
+
+quote: `[` values `]`
+
+bool: `true` | `false`
+
+word: [a-z A-Z]+
+
+num: hexnum | binarynum
+
+binarynum: binarydigit+
+binarydigit: [0-9]
+hexnum: (`$` hexdigit+)
+hexdigit: [0-9 a-f A-F]
+
+char: `'` ʕ·ᴥ·ʔ `'`
+
+string: `"` ʕ·ᴥ·ʔ* `"`
+```
+
+## notes
+
+fuzzy assumes the source text is encoded in [georgesci](#), which is nearly ascii-compatible and should only cause minor headaches <3
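to sanity-check the operator list in semantics.md, here's a tiny python model of a 16-bit cell stack. it's a sketch, not the 6502 implementation, and it bakes in some assumptions the doc doesn't pin down yet: every binary operator (including `=`) pops both cells, in `a b op` the cell `a` is tos-1 and `b` is tos, `>`/`<` compare the raw unsigned cell values (my reading of "based on stack cell value, not type"), and `/` pushes the remainder first so the quotient ends up on top. `/` only models unsigned `nat` division, and dividing by zero just raises. `Stack`, `run_op`, `MASK`, `TRUE` and `FALSE` are made-up names.

```python
MASK = 0xFFFF                 # every cell is 16 bits wide
TRUE, FALSE = 0xFFFF, 0x0000  # bool encoding from the types table


class Stack:
    """Hypothetical model of the fuzzy data stack; tos is the end of the list."""

    def __init__(self):
        self.cells = []

    def push(self, value):
        # keep only the low 16 bits, like storing two bytes on the real stack
        self.cells.append(value & MASK)

    def pop(self):
        return self.cells.pop()


def as_bool(flag):
    return TRUE if flag else FALSE


def run_op(stack, op):
    """Apply one operator. For binary ops, a is tos-1 and b is tos (assumed)."""
    if op == "!":
        stack.push(~stack.pop())  # bitwise NOT, so $ffff <-> $0000
        return
    b = stack.pop()
    a = stack.pop()
    if op == "&":
        stack.push(a & b)
    elif op == "|":
        stack.push(a | b)
    elif op == "+":
        stack.push(a + b)  # wraps modulo $10000
    elif op == "-":
        stack.push(a - b)  # wraps too, so 0 1 - gives $ffff
    elif op == "*":
        stack.push(a * b)  # only the low 16 bits of the product survive
    elif op == "/":
        # assumption: remainder pushed first, quotient ends up on top
        stack.push(a % b)
        stack.push(a // b)
    elif op == "=":
        stack.push(as_bool(a == b))
    elif op == ">":
        stack.push(as_bool(a > b))  # assumption: raw unsigned compare
    elif op == "<":
        stack.push(as_bool(a < b))
    else:
        raise ValueError(f"unknown operator {op!r}")
```

under these assumptions `7 2 /` leaves `[1, 3]` (remainder 1, quotient 3 on top), and `0 1 -` wraps to `$ffff`, which is the exact same cell as `true`. that's the "cells don't know their type" rule from the table in action.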
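the `danger!` section says a dangerous word is still typed at the call site: the stack has to match its declared inputs, and later words only see its declared outputs, whatever the body actually did. here's a minimal python sketch of that call-site check, assuming types match exactly (no implicit `nat`/`int` conversion), the type stack is a list from bottom to top, and effects are ignored. `apply_signature` is a hypothetical helper, not something fuzzy defines.

```python
def apply_signature(type_stack, pops, pushes):
    """Check a word's `pop is push` signature against a type stack, then apply it.

    type_stack lists cell types bottom to top, e.g. ["nat", "nat"].
    Hypothetical helper: exact type matches only, effects are not modeled.
    """
    depth = len(pops)
    top = type_stack[len(type_stack) - depth:]
    if len(type_stack) < depth or top != list(pops):
        raise TypeError(
            f"expected {' '.join(pops)} on top of {' '.join(type_stack) or 'an empty stack'}"
        )
    # the word's body is not inspected here: for a danger! word the result
    # is simply asserted to have the declared types
    return type_stack[: len(type_stack) - depth] + list(pushes)
```

for a word declared `nat nat is char`, `apply_signature(["nat", "nat"], ["nat", "nat"], ["char"])` gives `["char"]`, while calling it on `["nat", "char"]` raises `TypeError`, matching the behavior described for `dangerous_word`.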
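the grammar's `values` rule and the "order is significant" note can be checked with a small tokenizer. this python sketch tries the alternatives in the same order as `value`, then `op`, so `true`/`false` lex as `bool` before `word` gets a chance. assumptions: it only handles a run of `values` (no definitions, signatures, `\n+2` separators or quotations), strings can't contain `"` since the grammar defines no escape, and `tokenize_values` / `TOKEN_SPEC` are made-up names.

```python
import re

# Alternatives in grammar order: `value` (bool | num | char | string | word), then `op`.
# bool comes before word, so `true`/`false` never become word references.
TOKEN_SPEC = [
    ("bool", r"true|false"),
    ("hexnum", r"\$[0-9a-fA-F]+"),
    ("binarynum", r"[0-9]+"),   # named as in the grammar, though it accepts [0-9]
    ("char", r"'.'"),
    ("string", r'"[^"]*"'),     # assumption: no `"` inside strings (no escapes defined)
    ("word", r"[a-zA-Z]+"),
    ("op", r"[!&|+\-*/=><#]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # a char literal may be any 8-bit character, newline included
)


def tokenize_values(source):
    """Split a run of `values` into (kind, text) pairs; whitespace between tokens is optional."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # simplification: \n+2 definition separators are not handled
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected {source[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

python's `re` alternation takes the first alternative that matches rather than the longest one, so the list order does the same job as the ordered choice in the grammar. for example, `tokenize_values('true 1 $ff+ "hi" x')` splits `$ff+` into a `hexnum` and an `op` with no whitespace between them.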