fuzzy spec :)

august kline 2024-10-06 21:54:35 -04:00
parent ed8e20f0db
commit cbc7bff7f7
2 changed files with 163 additions and 0 deletions

semantics.md

@@ -0,0 +1,85 @@
# i swear this is what fuzzy actually does
## the stack
fuzzy works on a 16-bit cell-width, zero-page data stack indexed with the x register, as documented in Garth Wilson's [stack treatise](https://wilsonminesco.com/stacks/virtualstacks.html)
to push a byte onto the data stack, we just:
```asm
dex ; decrement the stack pointer
lda some_value ; load the byte we want on the stack into a
sta 0, x ; put the byte on the stack!
```
and to pop a byte off it:
```asm
lda 0, x ; pop the top of stack off into a
inx ; increment the stack pointer
```
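
the snippets above move a single byte, but a cell is 16 bits wide, so a full cell takes two of them. here's a small python model of that, purely as a sketch: the byte order (low byte at `0, x`, high byte at `1, x`), the starting value of x, and the names `ZeroPageStack`, `push_cell` and `pop_cell` are assumptions for illustration, not something this spec pins down.

```python
class ZeroPageStack:
    """Toy model of a zero-page data stack indexed by the 6502 x register."""

    def __init__(self, x=0x00):
        self.zp = bytearray(256)  # the 256 bytes of zero page
        self.x = x                # stack pointer (assumed starting value)

    def push_byte(self, value):
        self.x = (self.x - 1) & 0xFF    # dex (x wraps like an 8-bit register)
        self.zp[self.x] = value & 0xFF  # sta 0, x

    def pop_byte(self):
        value = self.zp[self.x]         # lda 0, x
        self.x = (self.x + 1) & 0xFF    # inx
        return value

    def push_cell(self, value):
        # assumption: high byte goes on first so the low byte ends up at 0, x
        self.push_byte(value >> 8)
        self.push_byte(value)

    def pop_cell(self):
        lo = self.pop_byte()
        hi = self.pop_byte()
        return (hi << 8) | lo
```

pushing `$abcd` with `push_cell` moves x down by two and leaves `$cd` at `0, x` and `$ab` at `1, x`; `pop_cell` undoes exactly that.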
## types
these are used in word definitions, and refer to the type of an individual stack cell:
| type | desc |
| ---------------------- | ----------------------------------------------------------- |
| **bool** | a boolean value, represented by $0000 or $ffff |
| **nat** | an unsigned 16-bit integer |
| **int** | a signed 16-bit integer |
| **char**               | an 8-bit georgesci character, padded with leading zeroes    |
| **string** | a 16-bit pointer to a string in memory |
| **word** _`dangerous`_ | a 16-bit pointer to a fuzzy word or quotation |
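
to make the table concrete, here's a rough python sketch of how each type could sit in a 16-bit cell. two's complement for `int` is an assumption (the table only says "signed"), and the helper names are made up for illustration.

```python
CELL_MASK = 0xFFFF


def encode_bool(b):
    # bool: $ffff for true, $0000 for false
    return 0xFFFF if b else 0x0000


def encode_nat(n):
    # nat: unsigned 16-bit integer, stored as-is
    if not 0 <= n <= 0xFFFF:
        raise ValueError("nat out of range")
    return n


def encode_int(i):
    # int: signed 16-bit integer (assumption: two's complement)
    if not -0x8000 <= i <= 0x7FFF:
        raise ValueError("int out of range")
    return i & CELL_MASK


def decode_int(cell):
    # reverse of encode_int under the same two's complement assumption
    return cell - 0x10000 if cell & 0x8000 else cell


def encode_char(code):
    # char: 8-bit character code, high byte left as zero
    if not 0 <= code <= 0xFF:
        raise ValueError("char out of range")
    return code
```

under these assumptions `true` and the `int` -1 share the cell value `$ffff`, which is the kind of overlap the value-based `=` comparison further down cares about.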
## operators
- `!` NOT: applies NOT to tos
- `&` AND: pops 2 off the stack and pushes the AND'ed result
- `|` OR: pops 2 off the stack and pushes the OR'ed result
- `+` add: pops 2 off the stack and pushes the sum
- `-` subtract: pops 2 off the stack and pushes the difference
- `*` multiply: pops 2 off the stack and pushes the result, truncating if it's >$FFFF
- `/` divide: pops 2 off the stack and pushes the remainder and quotient
- `=` equality: pushes true/false if the top 2 stack cells do/don't match
- `>` greater than: pushes true/false if tos-1 is/isn't greater than tos
- `<` less than: pushes true/false if tos-1 is/isn't less than tos
- `#` quote _`dangerous`_: pops tos and pushes a word that produces its value
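
a python sketch of the operators on a list of 16-bit cells (tos is the last element). several details here are assumptions and not stated in this spec: `+` and `-` wrap around at 16 bits, `-` and `/` compute tos-1 op tos, `/` pushes the quotient first and the remainder on top, comparisons are unsigned and pop both operands. `#` is left out because it needs words, and `apply_op` is a made-up name.

```python
MASK = 0xFFFF
TRUE, FALSE = 0xFFFF, 0x0000


def flag(cond):
    # booleans are $ffff / $0000
    return TRUE if cond else FALSE


def apply_op(op, stack):
    """Apply one operator to a list of 16-bit cells; tos is stack[-1]."""
    if op == "!":
        stack.append(~stack.pop() & MASK)  # bitwise NOT, so $0000 <-> $ffff
        return stack
    if op == "#":
        raise NotImplementedError("quote needs words, not modelled here")
    b = stack.pop()  # tos
    a = stack.pop()  # tos-1
    if op == "&":
        stack.append(a & b)
    elif op == "|":
        stack.append(a | b)
    elif op == "+":
        stack.append((a + b) & MASK)   # assumption: wraps at 16 bits
    elif op == "-":
        stack.append((a - b) & MASK)   # assumption: tos-1 minus tos, wraps
    elif op == "*":
        stack.append((a * b) & MASK)   # truncated if the product is > $ffff
    elif op == "/":
        # assumption: unsigned tos-1 / tos, quotient pushed first, remainder on top
        # (dividing by zero raises in python; the spec doesn't say what fuzzy does)
        q, r = divmod(a, b)
        stack.extend([q, r])
    elif op == "=":
        stack.append(flag(a == b))
    elif op == ">":
        stack.append(flag(a > b))
    elif op == "<":
        stack.append(flag(a < b))
    else:
        raise ValueError(f"unknown operator {op!r}")
    return stack
```

for example `apply_op("*", [0x0100, 0x0100])` leaves `[0x0000]`, since the real product `$10000` doesn't fit in a cell.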
### supported types (this will need to be more clearly laid out later)
| operator | input type | output type | notes |
| -------- | ------------------------ | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `!` | `bool`, `nat`, `int` | `bool`, `nat`, `int` | |
| `&` | `bool`, `nat`, `int` | `bool`, `nat`, `int` | |
| `\|` | `bool`, `nat`, `int` | `bool`, `nat`, `int` | |
| `+` | `nat` `nat`, `int` `int` | `nat`, `int` | |
| `-` | `nat` `nat`, `int` `int` | `nat`, `int` | subtracting two `nat`s |
| `*` | `nat` `nat`, `int` `int` | `nat`, `int` | most products will be truncated, since most 16 bit multiplications result in a >16 bit product, but in practice that shouldn't matter cause we're not doing science |
| `/` | `nat` `nat`, `int` `int` | `nat` `nat`, `int` `int` | produces two cells, the quotient and remainder |
| `=` | any any | `bool` | equality/order is checked based on stack cell value, not type (e.g. a `word` pointing to $abcd and a `nat` with the value $abcd are equivalent) |
| `>` | any any | `bool` | see above |
| `<` | any any | `bool` | see above |
| `#` | any | `word` | _`dangerous`_ |
## `danger!`
the `danger!` keyword marks a word as being _`dangerous`_. certain language features can only be used in dangerous words, such as:
- inline assembly
- quotations
- typechecking quotations is a difficult problem & probably too complex to implement on george if we ever want to fully self-host fuzzy
- unchecked operator usage
- applying `+` to two chars, applying `&` to two strings, etc
- this does not mean that _dangerous_ words are untyped! it just means the result of an unchecked operation is asserted to have the word's declared result type
- `danger! dangerous_word nat nat is char: +` can't be used on a `nat char` stack, and any words used after `dangerous_word` treat the top of the stack as a `char` and don't care that it was made from two `nat`s
the program body cannot use any _dangerous_ features. this makes it so that _dangerous_ behavior is contained to specific words.
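
here's a toy python sketch of that rule, not the real checker: it only knows `+`, uses `nat` operands, and the names `CHECKED_ADD`, `check_word` and `check_call` are hypothetical. the point is that a dangerous word's body isn't checked against the operator table, but its signature is still enforced at every call site.

```python
# checked result types for `+`, taken from the supported-types table
CHECKED_ADD = {("nat", "nat"): "nat", ("int", "int"): "int"}


def check_word(dangerous, pops, pushes, body):
    """Return the type stack a word leaves behind, or raise TypeError."""
    if dangerous:
        # unchecked operator usage: the declared result is asserted, not derived
        return list(pushes)
    stack = list(pops)
    for op in body:
        if op != "+":
            raise ValueError(f"toy checker only knows `+`, got {op!r}")
        if len(stack) < 2:
            raise TypeError("stack underflow")
        b, a = stack.pop(), stack.pop()
        if (a, b) not in CHECKED_ADD:
            raise TypeError(f"`+` is not defined for {a} {b}")
        stack.append(CHECKED_ADD[(a, b)])
    if stack != list(pushes):
        raise TypeError(f"body leaves {stack}, signature says {list(pushes)}")
    return stack


def check_call(stack, pops, pushes):
    """Apply a word's signature to the caller's type stack."""
    keep = len(stack) - len(pops)
    if keep < 0 or stack[keep:] != list(pops):
        raise TypeError(f"expected {list(pops)} on top of {stack}")
    return stack[:keep] + list(pushes)
```

so `check_word(True, ["nat", "nat"], ["char"], ["+"])` is accepted, the same word without `danger!` is rejected because `+` on two `nat`s yields a `nat`, and `check_call(["nat", "char"], ["nat", "nat"], ["char"])` fails either way.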
## memory layout
| start | end | use |
| ------ | ------ | ---------------------------- |
| `$200` | `$300` | |
| | | core language implementation |
| | | core language implementation |

syntax.md

@@ -0,0 +1,78 @@
# fuzzy syntax in a well-defined grammar so i don't lose my mind
## notation
| notation | meaning |
| -------- | --------------------------------------------- |
| abc | syntactical production |
| : | maps production to children (products?) |
| () | groups items |
| ʕ·ᴥ·ʔ | any 8-bit georgesci character |
| `abc` | exact character(s) |
| \x | an escape character |
| x? | optional |
| x\* | zero or more of x |
| x+ | one or more of x |
| x+y | y or more of x |
| x.y | y repetitions of x |
| \| | one or another |
| [-] | any characters in range (>=1 ranges accepted) |
(adapted from the rust reference cause i like how simple they do it)
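
if it helps to read the repetition notation, most of it lines up with regex quantifiers. a small python sketch of that mapping (the constant names are made up, and this is only an analogy, not how fuzzy is parsed):

```python
import re

# notation on the right, rough python regex equivalent on the left
TWO_OR_MORE_NEWLINES = re.compile(r"\n{2,}")  # \n+2  (x+y: y or more of x)
EXACTLY_THREE_X = re.compile(r"x{3}")         # x.3   (x.y: y repetitions of x)
OPTIONAL_DANGER = re.compile(r"(danger!)?")   # `danger!`?
HEXDIGIT = re.compile(r"[0-9a-fA-F]")         # [0-9 a-f A-F]
ONE_OR_MORE_HEX = re.compile(r"[0-9a-fA-F]+") # hexdigit+
```

e.g. `TWO_OR_MORE_NEWLINES.fullmatch("\n\n")` matches while a single `"\n"` does not.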
## grammar
the only semantically significant whitespace is \n+2 after a word definition.
otherwise, assume tokens are delimited by an arbitrary amount of (not \n+2) whitespace, including no whitespace, e.g. the colon in `hello is: "hello"`
also order is significant! if `value` produced `word` first, it would make reserved words like `true` and `false` parse into word references.
```syntax
george: defs? body
defs: (def \n+2)*
body: values
def: signature `:` values
signature: `danger!`? word typedef
values: (value | op)*
typedef: pop? `is` push? effects?
pop: type*
push: type*
effects: effect*
type: `bool` | `nat` | `int` | `char` | `string` | `word`
effect: `paint` | `sing` | `store`
value: bool | num | char | string | quote | word
op: `!` | `&` | `|` | `+` | `-` | `*` | `/` | `=` | `>` | `<` | `#`
quote: `[` values `]`
bool: `true` | `false`
word: [a-z A-Z]+
num: hexnum | binarynum
binarynum: binarydigit+
binarydigit: [0-9]
hexnum: (`$` hexdigit+)
hexdigit: [0-9 a-f A-F]
char: `'` ʕ·ᴥ·ʔ `'`
string: `"` ʕ·ᴥ·ʔ* `"`
```
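
the "order is significant" point can be sketched in python by trying the alternatives of `value` in the order the grammar lists them. this is a rough illustration on already-split tokens, not a real lexer: `classify` and the pattern tables are hypothetical, the `num` pattern copies the digit ranges exactly as written above, and any 8-bit code stands in for a georgesci character.

```python
import re

# alternatives in the same order as the `value` production: bool before word
VALUE_PATTERNS = [
    ("bool", re.compile(r"true|false")),
    ("num", re.compile(r"\$[0-9a-fA-F]+|[0-9]+")),  # hexnum | binarynum as written
    ("char", re.compile(r"'[\x00-\xff]'")),
    ("string", re.compile(r'"[\x00-\xff]*"')),
    ("word", re.compile(r"[a-zA-Z]+")),
]

# the wrong order, with `word` tried first
WORD_FIRST = [VALUE_PATTERNS[-1]] + VALUE_PATTERNS[:-1]


def classify(token, patterns=VALUE_PATTERNS):
    """Return the name of the first production that matches the whole token."""
    for name, pattern in patterns:
        if pattern.fullmatch(token):
            return name
    raise ValueError(f"not a value: {token!r}")
```

with the grammar's order `classify("true")` gives `"bool"`; with `WORD_FIRST` the same token comes back as `"word"`, which is exactly the problem the ordering avoids.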
## notes
fuzzy assumes the source text to be encoded in [georgesci](#), which is nearly ascii-compatible and should only cause minor headaches <3