79 lines
2.2 KiB
Markdown
# fuzzy syntax in a well-defined grammar so i don't lose my mind

## notation

| notation | meaning                                       |
| -------- | --------------------------------------------- |
| abc      | syntactical production                        |
| :        | maps production to children (products?)       |
| ()       | groups items                                  |
| ʕ·ᴥ·ʔ    | any 8-bit georgesci character                 |
| `abc`    | exact character(s)                            |
| \x       | an escape character                           |
| x?       | optional                                      |
| x\*      | zero or more of x                             |
| x+       | one or more of x                              |
| x+y      | y or more of x                                |
| x.y      | y repetitions of x                            |
| \|       | one or another                                |
| [-]      | any characters in range (>=1 ranges accepted) |

(adapted from the rust reference cause i like how simple they do it)
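if the repetition rows are hard to keep apart, here is a rough sketch of how they could map onto python regexes, with x = `a` and y = 2. the `EQUIVALENTS` table and the `accepts` helper are made up for illustration and are not part of fuzzy.

```python
import re

# rough regex stand-ins for the repetition rows, with x = `a` and y = 2.
# (hypothetical names; fuzzy itself does not define these.)
EQUIVALENTS = {
    "x?": r"a?",      # optional
    "x*": r"a*",      # zero or more of x
    "x+": r"a+",      # one or more of x
    "x+y": r"a{2,}",  # y or more of x
    "x.y": r"a{2}",   # y repetitions of x
}


def accepts(notation, text):
    """true if the whole text matches the regex standing in for notation."""
    return re.fullmatch(EQUIVALENTS[notation], text) is not None


print(accepts("x+y", "aaa"))  # True
print(accepts("x+y", "a"))    # False
print(accepts("x.y", "aa"))   # True
print(accepts("x.y", "aaa"))  # False
```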
## grammar

the only semantically significant whitespace is \n+2 (two or more newlines) after a word definition.

otherwise, assume tokens are delimited by any amount of whitespace short of \n+2, including none at all, e.g. the colon in `hello is: "hello"`

also, order is significant! if `value` produced `word` first, reserved words like `true` and `false` would parse as word references.
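a minimal python sketch of why the order matters. `classify`, `VALUE_ORDER` and `WORD_FIRST` are hypothetical names, and the regexes are rough ascii-only stand-ins for the productions, not fuzzy's real lexer. it tries the `value` alternatives in order and reports the first one that matches the whole token.

```python
import re

# alternatives of `value`, in the order the grammar lists them.
# the regexes are rough ascii-only stand-ins for the productions.
VALUE_ORDER = [
    ("bool", re.compile(r"true|false")),
    ("num", re.compile(r"\$[0-9a-fA-F]+|[0-9]+")),
    ("char", re.compile(r"'[^']'")),
    ("string", re.compile(r'"[^"]*"')),
    ("word", re.compile(r"[a-zA-Z]+")),
]


def classify(token, order=VALUE_ORDER):
    """return the name of the first alternative matching the whole token."""
    for name, pattern in order:
        if pattern.fullmatch(token):
            return name
    return None


# same alternatives, but with `word` moved to the front
WORD_FIRST = [VALUE_ORDER[-1]] + VALUE_ORDER[:-1]

print(classify("true"))              # bool
print(classify("true", WORD_FIRST))  # word
print(classify("hello"))             # word
```

with `word` tried first, `true` is swallowed as a plain word, which is exactly the failure described above.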
```syntax
george: defs? body

defs: (def \n+2)*
body: values

def: signature `:` values
signature: `danger!`? word typedef

values: (value | op)*

typedef: pop? `is` push? effects?

pop: type*

push: type*

effects: effect*

type: `bool` | `nat` | `int` | `char` | `string` | `word`

effect: `paint` | `sing` | `store`

value: bool | num | char | string | word

op: `!` | `&` | `|` | `+` | `-` | `*` | `/` | `=` | `>` | `<` | `#`

quote: `[` values `]`

bool: `true` | `false`

word: [a-z A-Z]+

num: hexnum | binarynum

binarynum: binarydigit+
binarydigit: [0-9]
hexnum: (`$` hexdigit+)
hexdigit: [0-9 a-f A-F]

char: `'` ʕ·ᴥ·ʔ `'`

string: `"` ʕ·ᴥ·ʔ* `"`
```
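here is one way the whitespace rules could be turned into code, as a hedged python sketch. `split_program`, `tokenize` and `TOKEN` are hypothetical names, the patterns are ascii-only stand-ins for the productions, and the sketch assumes string literals never contain a blank line. anything the pattern does not recognise is silently skipped, which a real lexer would want to report instead.

```python
import re

# one token per match; quoted literals come first so their contents stay whole,
# and `danger!` comes before word so it is not split into `danger` and `!`.
TOKEN = re.compile(
    r"""
      "[^"]*"               # string
    | '[^']'                # char
    | danger!               # the marker on a signature
    | \$[0-9a-fA-F]+        # hexnum
    | [0-9]+                # binarynum, as the grammar spells it
    | [a-zA-Z]+             # word, bool, `is`, type and effect names
    | [!&|+\-*/=><\#:\[\]]  # ops, the def colon, quote brackets
    """,
    re.VERBOSE,
)


def split_program(source):
    """split source into (defs, body) on runs of two or more newlines."""
    chunks = re.split(r"\n{2,}", source)
    # `defs: (def \n+2)*` puts a blank line after every def,
    # so whatever follows the last blank line is the body.
    return chunks[:-1], chunks[-1]


def tokenize(text):
    """return the tokens in text, ignoring any whitespace between them."""
    return TOKEN.findall(text)


defs, body = split_program('hello is: "hello"\n\nhello hello')
print(defs)               # ['hello is: "hello"']
print(tokenize(defs[0]))  # ['hello', 'is', ':', '"hello"']
print(tokenize(body))     # ['hello', 'hello']
```

the colon comes out as its own token even though nothing separates it from `is`, which is the "including no whitespace" case from the prose above.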
## notes

fuzzy assumes the source text to be encoded in [georgesci](#), which is nearly ascii-compatible and should only cause minor headaches <3