Hello! Thanks for your potential interest in contributing to Porffor :)
This document hopes to help you understand Porffor-specific TS, specifically for writing built-ins (inside compiler/builtins/*.ts
eg btoa
, String.prototype.trim
, ...). This guide isn't really meant for modifying the compiler itself yet (eg compiler/codegen.js
), as built-ins are ~easier to implement and more useful at the moment.
I mostly presume decent JS knowledge, with some basic TS too but nothing complicated. Knowing low-level stuff generally (pointers, etc) and/or Wasm (bytecode) is also a plus but hopefully not required.
If you have any questions you can ask in the Porffor Discord, please feel free to ask anything if you get stuck :)
Please read this entire document before beginning as there are important things throughout.
- Clone the repo and enter the repo (
git clone https://github.com/CanadaHonk/porffor.git
) npm install
The repo comes with easy alias scripts for Unix and Windows, which you can use like so:
- Unix:
./porf path/to/script.js
- Windows:
.\porf path/to/script.js
You can also swap out node
in the alias to use another runtime like Deno (deno run -A ...
) or Bun (bun ...
), or just use it yourself (eg node runner/index.js ...
, bun runner/index.js ...
). Node, Deno, Bun should work.
If you update any file inside compiler/builtins
you will need to do this for it to update inside Porffor otherwise your changes will have no effect. Run ./porf precompile
to precompile. It may error during this, if so, you might have an error in your code or there could be a compiler error with Porffor (feel free to ask for help as soon as you encounter any errors with it).
Porffor has usual JS types (or at least the ones it supports), but also internal types for various reasons.
The most important and widely used internal type is ByteString. Regular strings in Porffor are UTF-16 encoded, so each character uses 2 bytes. ByteStrings are special strings which are used when the characters in a string only use ASCII/LATIN-1 characters, so the lower byte of the UTF-16 characters are unused. Instead of wasting memory with all the unused memory, ByteStrings instead use 1 byte per character. This halves memory usage of such strings and also makes operating on them faster. The downside is that many Porffor built-ins have to be written twice, slightly different, for both String
and ByteString
types.
This is complicated internally but essentially, only use it for pointers. (This is not signed or unsigned, instead it is the Wasm valtype i32
so the signage is ~instruction dependant).
Pointers are the main (and most difficult) unique feature you ~need to understand when dealing with objects (arrays, strings, ...).
We'll explain things per common usage you will likely need to know:
Porffor.wasm`local.get ${foobar}`
Gets the pointer to the variable foobar
. You don't really need to worry about how it works in detail, but essentially it gets the pointer as a number (type) instead of as the object it is.
Porffor.wasm.i32.store8(pointer, characterCode, 0, 4)
Stores the character code characterCode
at the pointer pointer
for a ByteString.1
Porffor.wasm.i32.store16(pointer, characterCode, 0, 4)
Stores the character code characterCode
at the pointer pointer
for a String.1
Porffor.wasm.i32.load8_u(pointer, 0, 4)
Loads the character code at the pointer pointer
for a ByteString.1
Porffor.wasm.i32.load16_u(pointer, 0, 4)
Loads the character code at the pointer pointer
for a String.1
Porffor.wasm.i32.store(pointer, length, 0, 0)
Stores the length length
at pointer pointer
, setting the length of an object. This is mostly unneeded today as you can just do obj.length = length
. (The 0, 4
args are necessary for the Wasm instruction, but you don't need to worry about them (0
alignment, 0
byte offset).
Here is the code for ByteString.prototype.toUpperCase()
:
export const __ByteString_prototype_toUpperCase = (_this: bytestring) => {
const len: i32 = _this.length;
let out: bytestring = '';
Porffor.wasm.i32.store(out, len, 0, 0);
let i: i32 = Porffor.wasm`local.get ${_this}`,
j: i32 = Porffor.wasm`local.get ${out}`;
const endPtr: i32 = i + len;
while (i < endPtr) {
let chr: i32 = Porffor.wasm.i32.load8_u(i++, 0, 4);
if (chr >= 97) if (chr <= 122) chr -= 32;
Porffor.wasm.i32.store8(j++, chr, 0, 4);
}
return out;
};
Now let's go through it section by section:
export const __ByteString_prototype_toUpperCase = (_this: bytestring) => {
Here we define a built-in for Porffor. Notably:
- We do not use
a.b.c
, instead we use__a_b_c
- We use a
_this
argument, asthis
does not exist in Porffor yet - We use an arrow function
- We do not set a return type as prototype methods cannot use them currently or errors can happen.
const len: i32 = _this.length;
let out: bytestring = '';
Porffor.wasm.i32.store(out, len, 0, 0);
This sets up the out
variable we are going to write to for the output of this function. We set the length in advance to be the same as _this
, as foo.length == foo.toLowerCase().length
, because we will later be manually writing to it using Wasm intrinsics, which will not update the length themselves.
let i: i32 = Porffor.wasm`local.get ${_this}`,
j: i32 = Porffor.wasm`local.get ${out}`;
Get the pointers for _this
and out
as i32
s (~number
s).
const endPtr: i32 = i + len;
while (i < endPtr) {
Set up an end target pointer as the pointer variable for _this
plus the length of it. Loop below until that pointer reaches the end target, so we iterate through the entire string.
let chr: i32 = Porffor.wasm.i32.load8_u(i++, 0, 4);
Read the character (code) from the current _this
pointer variable, and increment it so next iteration it reads the next character, etc.
if (chr >= 97) if (chr <= 122) chr -= 32;
If the character code is >= 97 (a
) and <= 122 (z
), decrease it by 32, making it an upper case character. eg: 97 (a
) - 32 = 65 (A
).
Porffor.wasm.i32.store8(j++, chr, 0, 4);
Store the character code into the out
pointer variable, and increment it.
- For declaring variables, you must use explicit type annotations currently (eg
let a: number = 1
, notlet a = 1
). - You might spot
Porffor.fastOr
/Porffor.fastAnd
, these are non-short circuiting versions of||
/&&
, taking any number of conditions as arguments. You shouldn't don't need to use or worry about these. - There are ~no objects, you cannot use them.
- Attempt to avoid string/array-heavy code and use more variables instead if possible, easier on memory and CPU/perf.
- Do not set a return type for prototype methods, it can cause errors/unexpected results.
- You cannot use other functions in the file not exported, or variables not inside the current function.
if (...)
uses a fast truthy implementation which is not spec-compliant as most conditions should be strictly checked. To use spec-compliant behavior, useif (Boolean(...))
.- For object (string/array/etc) literals, you must use a variable eg
const out: bytestring = 'foobar'; console.log(out);
instead ofconsole.log('foobar')
due to precompile's allocator constraints. - Generally prefer/use non-strict equality ops (
==
/!=
).
There is 0 setup for this (right now). You can try looking through the other built-ins files but do not worry about it a lot, I honestly do not mind going through and cleaning up after a PR as long as the code itself is good :^)
You should ideally have one commit per notable change (using amend/force push). Commit messages should be like ${file}: ${description}
. Don't be afraid to use long titles if needed, but try and be short if possible. Bonus points for detail in commit description. Gold star for jokes in description too.
Examples:
builtins/date: impl toJSON
builtins/date: fix ToIntegerOrInfinity returning -0
codegen: fix inline wasm for unreachable
builtins/array: wip toReversed
builtins/tostring_number: impl radix
For the first time, ensure you run ./test262/setup.sh
.
Run node test262
to run all the tests and get an output of total overall test results.
Warning: this will consume 1-6GB of memory and ~90% of all CPU cores while running (depending on thread count), it should take 15-120s depending on machine. You can specify how many threads with --threads=N
, it will use the number of CPU threads by default.
The main thing you want to pay attention to is the emoji summary (lol):
🧪 50005 | 🤠 7007 (-89) | ❌ 1914 (-32) | 💀 13904 (-61) | 📝 23477 (-120) | ⏰ 2 | 🏗 2073 (+302) | 💥 1628
To break this down: 🧪 total 🤠 pass ❌ fail 💀 runtime error 📝 todo (error) ⏰ timeout 🏗️ wasm compile error 💥 compile error
The diff compared to the last commit (with test262 data) is shown in brackets. Basically, you want passes 🤠 up, and errors 💀📝🏗💥 down. It is fine if some errors change balance/etc, as long as they are not new failures.
It will also log new passes/fails. Be careful as sometimes the overall passes can increase, but other files have also regressed into failures which you might miss. Also keep in mind some tests may have been false positives before, but we can investigate the diff together :)
- Use
node test262 path/to/tests
to run specific test262 dirs/files (egnode test262 built-ins/Date
). - Use
--log-errors
to log the errors of individual tests. - Use
--debug-asserts
to log expected/actual of assertion failures (experimental).