/
startup.mu4
1819 lines (1325 loc) · 61.6 KB
/
startup.mu4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
token ( .forth. (unlinked-name)
<:> ] (lit) [ token ) drop c@ , ] parse 2drop ^ [ show
( Phew! now we can have comments!)
( This file is part of muforth: https://muforth.dev/
Copyright 2002-2024 David Frech. (Read the LICENSE for details.)
( This file is muforth/mu/startup.mu4. It contains high-level Forth code
necessary to the useful execution of muforth. This file is loaded and
interpreted every time muforth starts up.
The idea is to move as much code as possible *out* of muforth's C kernel
and instead implement it in Forth. Hence the name "muforth": mu is the
Greek letter often used in engineering to represent "micro".
However, this "micro-ness" only refers to the C kernel; once everything
in this file is loaded, muforth has over 500 words defined!
This file exemplifies a Forth strength - shared by Lisp and Smalltalk,
among other interpretive/compiled languages - that I like to call
"writing the reader"; the reader being, in this case, the Forth
interpreter/compiler.
As defined in the kernel, the interpreter/compiler is very simple; it
only knows how to do the following things:
1. parse a whitespace-delimited token out of the input stream;
2. look up a token in the dictionary, and complain if it is not found;
3. execute the code that is associated with a token;
4. compile a "call" to the code that is associated with a token, by
appending its execution address to the end of the current dictionary
entry;
5. create a new dictionary entry.
That's it! No numbers, no control structures, and no error reporting
other than "xyz isn't defined".
In this file, in Forth, we need to extend the interpreter/compiler to do
the following:
1. compile control structures: if/then, for/next, begin/while/repeat;
2. compile data structures: variables, constants, create/does words;
3. read and write numbers - an interesting exercise since muforth starts
life not even knowing the constants 0 or 1;
4. read and write strings;
5. create return stack exception frames - for error handling and "fluid
binding" of global variables - and unwind through these frames when
errors occur.
Once these are complete we will have a useful Forth for doing real work.
The order of business will sometimes seem haphazard; words can only be
defined after the words they depend on have been defined, so we end up
jumping around a bit in the "semantics" of the language.
Hopefully the reader will find this an interesting exercise in
bootstrapping, which was precisely my intention.
So, here goes; now we start extending the language, bit by bit.)
( The bit of inscrutable poetry at the beginning of this file creates the
word ( which uses parse to look for a closing ) ( character, and
throws away the parsed text.)
( We want to create the current variable, but we have no defining words
to help us. We need to create a create/does word completely by hand. We
do it in two phases.)
( First, create an empty does> clause, and leave on the stack the IP
pointing to the ^/EXIT/UNNEST.)
here ] ^ [
( Next, create in the .forth. chain a <does> word named current, point it
to the empty does> clause above, and copy in the address of the .forth.
chain as its initial value.)
token current .forth. (unlinked-name) show
<does> addr, ( ip of empty does> clause)
.forth. , ( initial value)
( With current defined we can now create the word new which we will use
from now on to create new names in the dictionary. Instead of using
.forth. as above, we create new names in the chain pointed to by current .)
( But before we define new let's create another ( word, this time in the
.compiler. chain, so we can comment the following code. It's a little
complicated, since they have the same name. So we will first find the forth
( and then compile it by hand.)
token ( .compiler. (unlinked-name)
<:> token ( .forth. find huh? compile, ] ^ [ show ( call the forth ( )
token new .forth. (unlinked-name) ( new is defined in the .forth. chain)
<:> ]
token
nope ( placeholder for code we paste in later!)
current @ (unlinked-name) ( link new word to current chain, not .forth.)
^ [ show
( Finally, we can create the word : which we will use to make colon
definitions. It creates a new word, compiles the code field for colon
words, and then uses ] to switch to compiling state.)
new :
<:> ]
new <:> ]
^ [ show
( We don't have ; so let's define it. It lives in the compiler chain, and
when executed, it compiles ^ and then executes [ to switch back to
interpret state from compiling state. Since [ lives in the .compiler.
chain, we have to search for it in a complicated way.)
.compiler. current ! ( make .compiler. the current chain)
: ; compile ^
[ token [ .compiler. find huh? compile, ] show ^ [ show
.forth. current ! ( switch back to .forth.)
( By convention, names of dictionary chains start and end with dots. When a
dictionary chain word is executed, it pushes the address of its link
pointer - which in turns points to the last word defined on the chain.
To make it the current chain, we store this pointer into current. Let's
create a meaningful name for that.)
: definitions current ! ;
( If we want to make a chain the "current" one that receives all new
definitions, we execute the chain, and then execute "definitions". But
that's a lot to type, and we do this a lot, so in general when I create a
new chain - with the dots in its name - I also create a word *without* the
dots, that simply calls the chain word and then calls definitions.
Let's do that for our three existing chains.)
: forth .forth. definitions ;
: compiler .compiler. definitions ;
: runtime .runtime. definitions ;
( Create a literal in the word currently being defined.)
: literal compile (lit) , ;
( End a bracketed computation and use the result to create a literal.)
: #] literal ] ;
( !! NOTE !! Do -NOT- change this part of the file without thinking VERY
hard first. Make changes below the line marked `Add changes below this
line', otherwise it may be difficult to diagnose problems added by new
code.)
( Stack manipulations.)
: rot >r swap r> swap ; ( a b c - b c a) ( fig!)
: -rot swap >r swap r> ; ( a b c - c a b)
: nip swap drop ; ( a b - b)
: tuck swap over ; ( a b - b a b)
( Back up over last token parsed from the input so we can re-parse it.
NOTE: We have to be careful about line numbers. If the token was followed
immediately by a newline, line will have been incremented. When backing
up over the token, we need to reset line to its previous value, otherwise
parsing the token again will increment line *again* and it will no longer
refer to the correct line!
Happily, @line captures the value of line at the *beginning* of a token.
Let's reset line to @line when we back up over the token.)
: untoken parsed drop first addr! @line line ! ;
( Let's define some words that are useful for searching specific dictionary
chains and compiling words from them.)
( Roll tokenizing and searching into one.)
: token' token rot find ; ( chain - a u F | body T)
( Compiling from specific chains. Note that `\' is an elaboration of the
basic scheme of `\chain'. These words will be handy in the assembler
and target compiler.)
( Tick)
: chain' token' huh? ;
: \chain chain' compile, ;
( 28-apr-2000. Do we ever -really- want to search anything other than .forth.?)
: ' .forth. chain' ;
( : ' current @ chain' ; ( XXX)
compiler
( XXX: should this and ' do the same thing?)
( : ['] .forth. chain' literal ;)
( XXX should this search .runtime. rather than .forth. ??)
: ['] ' literal ;
( XXX: is this useful? Here? Maybe in a target compiler...)
: \f .runtime. \chain ;
: \c .compiler. \chain ; ( until we have \ ; we need this for "if")
forth
( We don't even have any constants yet! So we make the easiest one first...)
: 0 [ dup dup xor #] ;
: -1 [ 0 invert #] ;
: 1 [ -1 negate #] ;
: 2 [ 1 2* #] ;
( On and off)
: on -1 swap ! ;
: off 0 swap ! ;
: bl [ 2 2* 2* 2* 2* #] ; ( space character)
: char token drop c@ ; ( grab the first character of the following token)
: ctrl char [ bl 2* ( 64) #] xor ; ( how you get ^? = 127.)
compiler
: char \f char literal ;
: ctrl \f ctrl literal ;
forth
( Before I figured out the trick above, which yields the correct answer for
ctrl ?, I defined ctrl thus:)
( : ctrl char [ bl 1- #] and ; ( 31 and)
( Some useful tidbits.)
: - negate + ;
: u+ ( a b c - a+c b) rot + swap ; ( "under-plus")
: v+ ( x1 y1 x2 y2 - x1+x2 y1+y2) push u+ pop + ; ( add 2-vectors)
: 1+ 1 + ; ( these are common)
: 1- -1 + ;
( addrs are platform-specific - either 32 or 64 bits.)
: addr [ 1 addrs #] ;
: addr+ [ addr #] + ;
: addr- [ addr negate #] + ;
( cells are always 64 bits - 8 bytes.)
: cell [ 1 cells #] ;
: cell+ [ cell #] + ;
: cell- [ cell negate #] + ;
( For fetching and storing a series of bytes.)
: c@+ ( a - b a+1) dup c@ swap 1+ ;
: c!+ ( b a - a+1) tuck c! 1+ ;
( For fetching and storing a series of cells.)
: @+ ( a - n a+) dup @ swap cell+ ;
: !+ ( n a - a+) tuck ! cell+ ;
( Occasionally useful, but not for use by mere mortals. ;-)
: addr@+ ( a - n a+) dup addr@ swap addr+ ;
( Double-length words.)
: 2@ @+ @ swap ; ( cell at lower address to TOP)
: 2! !+ ! ;
: 2dup ( a b - a b a b) over over ;
: 2swap ( a b c d - c d a b) rot push rot pop ;
: 2over ( a b c d - a b c d a b) [ 2 1+ #] nth [ 2 1+ #] nth ;
: 2tuck ( a b c d - c d a b c d) 2swap 2over ;
: = xor 0= ;
: not 0= ; ( warning! this is NOT 1's complement)
: bic invert and ;
: @execute @ execute ;
( jump allows jumping thru a table of addresses; you are responsible for
making sure the index is within range! It must be used at the end of a
word. Common usage looks like this: jump nope do1 do2 do3 [
That example assumes the top of stack has a number from 0 to 3.
Since no UNNEST needs to be compiled, use of [ rather than ; to end the
word is common.)
runtime
: jump ( which) addrs pop + addr@ execute ;
forth
( Mark a branch source for later fixup.)
: mark ( - src) here 0 addr, ;
( Resolve a forward or backward jump, from src to dest.)
( When using absolute branch addresses, this is easy: just store dest at src.)
: <resolve ( dest src) addr! ;
: >resolve ( src dest) swap <resolve ;
compiler
: then ( src) here >resolve ;
: =if ( - src) compile (=0branch) mark ;
: ?if ( - src) compile (?0branch) mark ;
: if ( - src) compile (0branch) mark ;
: again ( dest) compile (branch) mark <resolve ;
: else ( src0 - src1) compile (branch) mark swap \c then ;
: begin ( - dest) here ;
: =until ( dest) \c =if <resolve ;
: ?until ( dest) \c ?if <resolve ;
: until ( dest) \c if <resolve ;
: =while ( dest - src dest) \c =if swap ;
: ?while ( dest - src dest) \c ?if swap ;
: while ( dest - src dest) \c if swap ;
: repeat ( src dest) \c again \c then ;
( n for .. next goes n times; 0 if n=0 )
: for ( - src dest) \c ?if compile push \c begin ;
: next ( dest) compile (next) mark <resolve \c then ;
( do, loop, +loop)
: do ( - src dest) compile (do) mark \c begin ;
: loop ( src dest) compile (loop) mark <resolve \c then ;
: +loop ( src dest) compile (+loop) mark <resolve \c then ;
( make \ more like ANS-Forth's POSTPONE)
( Now, the confusion happens because we need to write code _in this word_
that will compile the above code into _other_ words. How about that?)
( Read a token out of the input stream. If the token is on the compiler
chain, postpone its execution until the word we're compiling executes. If
the token is on the runtime or forth chains, postpone its compilation
until the word that we're compiling executes. Got that? ;-)
: \ .compiler. token' if compile, ^ then
.runtime. find huh? compile compile compile, ;
forth
( A nice way to do full-line comments with no trailing delimiter. It throws
away the rest of the line, scanning for a newline, but only if there was
a space after the -- . Without this test, -- followed directly by a
newline will throw away the _following_ line, which is a bit mystifying.
;-)
: -- trailing if c@ bl = if ctrl J parse 2drop then then ;
compiler
: -- \f -- ;
forth
( Defining words are next. Right now we only `know' how to make `colon'
definitions. We need some structural help first.)
( I wanted to gain a little of the clarity that Chuck Moore's colorForth
gains by getting rid of "[ <calculate something here> ] literal". He
replaces the whole construct with colored words that are executed or
compiled depending on their color, but with a little added twist: when
switching from executed to compiled words -- yellow to green --
colorForth assumes that the yellow words calculated a literal; just
before starting to compile the first green word after the transition,
colorForth compiles a literal.
Even though we don't have color in muforth, we can make things a bit
cleaner by creating a new word -- #] -- that compiles a literal *before*
restarting the colon compiler.
We retain the normal Forth behaviour that ] simply restarts the colon
compiler, doing no other work.)
( Dictionary structure words. Link fields point to link fields. Roughly, a
dictionary entry is the following addr-sized things: suffix, link, code;
where suffix is the last 3 [32-bit] or 7 [64-bit] characters of the name,
followed by its byte-sized length. If length is zero, the word is
*hidden*.)
: link>name ( 'link - a u) 1- dup c@ ( 'len len) tuck - swap ;
( These words all assume we're calculating to or from a code field
address.)
: >link ( 'code - 'link) addr- ;
: link> ( 'link - 'code) addr+ ;
: >name ( 'code - a u) >link link>name ;
: >ip ( 'code - 'ip) addr+ ;
: ip> ( 'ip - 'code) addr- ;
: >body ( 'code - 'body) >ip addr+ ;
: body> ( 'body - 'code) addr- ip> ;
( Undefine a word by zeroing out the length byte of the name.)
: undef token current @ find if >link 1- 0 swap c! ^ then complain ;
( create and does>. Everything old is new again. ;-)
( 2010-nov-30. After many iterations, I have finally arrived at fig-forth's
implementation of create/does>. The only difference is the names of the
words.)
( In fig, there are several kinds of words:
* CODE words, whose code field points to machine code
* COLON words, whose code field points to docolon, and whose bodies
contain a list of execution tokens
* CONSTANTS, whose code field points to doconst, and whose body
contains a value
* VARIABLES, whose code field points to dovar, and whose body contains
a value
* DOES words, whose code field points to dodoes, and whose body
contains an IP pointer, followed optionally by data.
In muforth there are only three kinds of words:
* CODE words - primitives defined in C whose code field points to the C
code implementation
* COLON words, whose code field points to docolon; body contains a list
of execution tokens
* DOES words, whose code field points to dodoes; body starts with IP
pointer - to parent's Forth code - followed optionally by data.
fig and muforth share this inefficient but simple implementation. In the
case of fig, it was because they didn't know any better. In my case, I
knew better but in the interest of avoiding machine-code dependencies -
the efficient way of compiling does> words essentially being a form of
DTC [direct-threaded code] - I had no choice.
If you want a threaded-code implementation using only pure pointers, you
need two pointers in each "child" word defined with create/does: one to
point to C [dodoes] and one to point to Forth [the body of the parent
defining word].)
( last-created contains the address of the ip address slot of the last
<does> word defined.
We make the variable by hand, since we are about to create the variable
and constant defining words, but we don't have them yet!)
current body> >ip addr@ ( re-use current's IP pointer to empty does> body)
new last-created show
<does> addr, ( ip of empty does> clause)
0 , ( initial value)
( does> fixes up the does ip of the last <does> word to point to the code
after "does>" in the caller.)
: does> pop last-created @ addr! ;
( The underlying engine for all "create/does>" words. It allocates no data
space in the word. This version does not consume a token from the input
stream, instead expecting one on the stack. This is particularly useful
if we are meta-compiling another Forth and want to compile heads into the
image, since we have to process the token - the name of the new word -
twice.)
( Everything is defined in terms of `create-hidden'.)
: create-hidden
new <does>
here last-created ! 0 addr, ( placeholder for does ip)
does> ; ( make the does ip point *somewhere*)
( We'll use this version of "create" for all host words, since they will be
*data* words. But for the target compiler we are going to have a *target*
colon compiler, which will have the same hide/show problem that we have on
the host. So we need to be able to create hidden target words, and we will
use create-hidden for this.)
: create create-hidden show ; ( always immediately show host create'd words)
: constant ( value)
create , ( compile the constant) does> @ ;
: 2constant ( v1 v2)
create , , ( compile the constants) does> 2@ ;
( An array with every cell set to a default value.)
: defarray ( default cells) create for dup , next drop ;
: array ( cells) 0 swap defarray ;
( A byte array; length is rounded up to cell boundary.)
: buffer ( bytes) aligned ( round up) cell/ array ;
( A self-indexing array with every cell set to a default value.)
: defarray+ ( default cells) defarray does> ( i - a) swap cells + ;
: array+ ( cells) 0 swap defarray+ ;
: variable create 0 , ;
: 2variable variable 0 , ;
( To bracket comments in a flexible way. If you've bracketed some text
using comment, changing "comment" to "uncomment" will interpret the
bracketed text - the delimiter becomes a noop.)
: comment
token ( the comment end token to match)
begin 2dup token =while string= until 2drop ^ then
2drop 2drop 2drop ;
: uncomment new <:> \ ^ ; ( create a noop word)
( How about a really cool word that makes self-parsing comment words? In
other words, like using "comment" - defined above - but instead of having
to say "comment **foobar** <commented text> **foobar**", you define
**foobar** to skip tokens until it comes to a matching **foobar**!!)
comment no-self-comments
: make-comment create does> drop untoken comment ;
( Here is one to get you started - good for block comments. It's 75
characters long:)
make-comment
===========================================================================
no-self-comments
( I guess we can have deferred words, even though they are, in some ways,
inelegant. The alternative - creating a variable and a colon word that
calls through that variable, for _every_ deferred word - is also in some
ways inelegant - and clumsy.
Actually, the way we define this is exactly equivalent to what we would
have to do with variables; the difference is that instead of two named
objects - the variable and the colon word that calls thru it - we have
one - the deferred word - and we need an extra mechanism to get to its
value to change it.
The main argument _against_ deferred words is that they aren't orthogonal
w.r.t. _user_ variables. The way we are defining them here they are
implemented using a global, system variable. On muforth, we don't care,
because we don't _have_ user variables; but on a properly multithreaded
target machine things are different. There we probably wouldn't implement
deferred words at all, using instead the "<variable> @execute" idiom; or,
indeed, we could have all deferred use _user_ variables instead of
globals. But that's what the fuss is.
That and that "vectoring" them isn't strictly postfix. And it requires
architecture-specific code!)
variable undeferred ' nope undeferred !
variable last-deferred-executed
: defer create undeferred @ ,
does> dup last-deferred-executed ! @execute ;
( Syntactic sugar - from Rod Crawford's 4ARM.)
: now ' ;
: is ' >body ! ; ( as in `now host-interpret is interpret')
compiler
: now ' literal ;
: is ' >body literal \ ! ;
forth
( Defining new dictionary chains.)
( These used to be in an array but are now independent of each other. They
are structures, created in the body of a does word, that look just like a
name entry in the dictionary - a name-suffix followed by a link field.
The name entry is always the string "muchain" followed by a zero length
byte. This is exactly 8 bytes long - the length of a suffix now that
muforth is 64-bit. The name identifies the word as the head of a dictionary
chain.
The name is hidden - by setting the length to zero - so that dictionary
searches and word listings won't see it.
The link field points to the link field within the name entry of the last
word defined on the chain.
We create new chains by reusing the code field and the "muchain" name
field from .forth. - an existing chain that is created by C code in
src/dict.c that is executed at startup.)
: chain ( anchor-link)
new [ ' .forth. addr@ #] addr, ( code field)
[ .forth. cell- @ #] , ( hidden "muchain" name field)
addr, ( anchor-link) show ;
: sealed 0 chain ; ( create an independent vocab chain)
: chained current @ chain ; ( chain to the current vocab)
( It's also possible to chain to an -arbitrary- vocab by simply doing this:
.arbitrary. chain .new-is-chained-to-arbitrary. )
( When executed, a chain pushes the address of the link field following the
fake "muchain" name. To print out the name of a chain, execute it - or
fetch current to get the current chain - and then execute
>chain-name type
)
( cell- skips backward over "muchain"; addr- skips backward to point to the
code field; from there >name gets us to the name!)
: >chain-name ( 'chain - a u) cell- addr- >name ;
( Conditional compilation.)
sealed .conditional.
: conditional .conditional. definitions ;
( eat consumes tokens until it either consumes all the input - in which
case the while loop will exit - or an execute'd word returns _true_ to
exit the containing loop. ?toss processes each token. If it exists in
.conditional. , it executes it; otherwise, it throws it away.)
: ?toss .conditional. find if execute ^ then 2drop 0 ;
: eat 0 ( nesting) begin token =while ?toss until drop ( nesting) ^
then 2drop ( token) drop ( nesting) ;
compiler
: .if 0= if eat then ;
: .else eat ;
: .then ;
( Consume a token, search a chain, and return only the "found or not" flag.)
: .contains ( chain - found) token' nip =if ^ then nip ;
: .def .forth. \ .contains ;
: .ndef \ .def 0= ;
: .ifdef \ .def \ .if ;
: .ifndef \ .ndef \ .if ;
conditional
( nesting - nesting exitflag)
: .if 1+ 0 ; ( .if nests, never exits)
: .else dup 0= ; ( .else doesn't nest, exits if nesting at 0)
: .then 1- dup 0< ; ( .then unnests, exits if nesting -was- at 0)
: .ifdef 1+ 0 ; ( these are like .if)
: .ifndef 1+ 0 ;
forth
: .if \ .if ;
: .else \ .else ;
: .then ;
: .def \ .def ;
: .ndef \ .ndef ;
: .ifdef \ .ifdef ;
: .ifndef \ .ifndef ;
: .contains \ .contains ;
: .and and ;
: .or or ;
: .not 0= ;
-- -----------------------------------------------------------------------
-- Schleisiek-style return stack words.
-- -----------------------------------------------------------------------
( Trying out, after all these years, the techniques that Klaus Schleisiek
presented in 1984 [at FORML] and that I read about in 1993.
The basic idea is that, in addition to return address pointers [saved
IPs], there are stack frames on the return stack. These can be for any
purpose, but we're interested here the following: local variable storage,
"fluid" rebinding of variables - aka dynamic scoping, and
cleanup-on-return - eg, to close a file that we opened.)
( Here is a picture of the return stack, with high memory towards the top of
the page, and low memory further down:
^ | |
| +--------------------+
| | prev return addr |
| +--------------------+
| | ... | several cells could be here; depends on the
| +--------------------+ type of frame
| | ... |
| +--------------------+
| | cfa of cleanup |
| +--------------------+
+---+ prev frame |<--- fp
+--------------------+
| ip of remove |<--- rp remove calls unlink
+--------------------+ )
runtime
variable fp ( the "top" - most recently pushed - frame)
( fp points to a frame ptr, which pts to a frame ptr...)
( link creates a new frame. It fetches the cfa of the following word and
pushes it onto the return stack. This is the cleanup routine. Then it
links this frame into the list rooted at fp, and then returns to its
caller, skipping the following cfa. link is called by a word that builds
a new stack frame.)
: link r> addr@+ swap >r ( fetch & skip following cfa & push to r)
fp @ >r rp@ fp ! ( link this frame to previous)
>r ( restore return address) ;
( unlink undoes what link did. It unlinks the frame from the list rooted at
fp, and then runs the cleanup routine, which will do whatever is
necessary to de-allocate the frame and undo any state changes made by the
word that called link.)
: unlink r> ( save return address)
fp @ rp! r> fp ! ( unlink frame)
r> execute ( execute cleanup word)
>r ( restore return address) ;
create remove ] unlink ; ( remove pushes IP when executed!)
( Now some interesting applications.)
-- -----------------------------------------------------------------------
-- Catch and throw
-- -----------------------------------------------------------------------
variable cf ( catch frame pointer)
( These don't save or restore SP.)
: catch
r> addr@+ >r ( fetch & skip following cfa)
cf @ >r ( push prev catch frame pointer)
rp@ cf ! ( now point to this frame)
execute
r> cf ! ( restore prev catch frame pointer)
0 ;
( throw returns to word after catch. It is up to this code to unwind the
stack!)
: throw ( error)
?if
cf @ cell+ @ >r ( pretend to return from catch!)
then ;
( unwind is useful in the context of exceptions. It starts at fp and
unlinks each frame in turn until fp is zero or points to a frame above
the current catch frame.)
( XXX Right now we are using unwinding as an on/off toggle, but in the
future we could have different bits that could be tested by the various
cleanup routines.)
variable unwinding
: unwind ( unwind-flags)
unwinding !
r> ( ra)
( While fp non-zero and pushed frames are below last catch frame, unlink them.)
begin fp @ dup cf @ u< and while unlink repeat
cf @ rp!
r> cf ! ( restore prev catch frame pointer)
rdrop ( discard return address from catch - we've already executed it!)
>r ( ra)
unwinding off ;
-- -----------------------------------------------------------------------
-- Fluid binding (dynamically-scoped variables)
-- -----------------------------------------------------------------------
( Restore saved value of a normal cell-sized variable.)
: restore
r> ( ra) r> r> ( value addr) ! >r ( ra) ;
( Preserve the value of a variable for the duration of the execution of the
calling word.)
: preserve ( addr) ( address of variable)
r> ( ra)
over ( addr) >r swap @ ( value) >r
link restore ( push cleanup)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
( Sometimes we need to save and restore values from C code - the
interpreter is the main example of this - and since the C values are
pointer-sized - ie addrs - we can't use @ and ! to save and restore the
value, so we need special copies of restore and preserve for this!)
( Restore saved value of an addr-sized variable.)
: addr-restore
r> ( ra) r> r> ( value addr) addr! >r ( ra) ;
( Preserve the value of an addr-sized variable for the duration of the
execution of the calling word.)
: addr-preserve ( addr) ( address of addr-sized variable)
r> ( ra)
over ( addr) >r swap addr@ ( value) >r
link addr-restore ( push cleanup)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
-- -----------------------------------------------------------------------
-- Cleanup on return
-- -----------------------------------------------------------------------
: cleanup
r> ( ra) r> ( value) r> ( cfa) execute >r ( ra) ;
( Push value and following cfa to R stack; on exit or unwind, execute cfa
with value on the stack.)
: on-exit ( value)
r> ( ra)
addr@+ swap >r ( fetch & skip following cfa & push to r)
swap >r ( push value)
link cleanup ( push code to undo whatever needs undoing)
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
-- -----------------------------------------------------------------------
-- Local variable frames
-- -----------------------------------------------------------------------
( Deallocate local variables.)
: unroom
r> ( ra)
r> ( #cells) rp+! ( rp+! takes cell count!)
>r ( ra) ;
( Allocate space for local variables.)
( NOTE: do -not- try to use a for loop to push cells! It doesn't work! The
return stack is being used to store the loop index, but you're busy
pushing stuff there! All hell breaks loose! If you absolutely want to
zero locals as they are allocated, do a begin/until loop with the count
on the data stack.)
: room ( #cells)
r> ( ra)
( choose one! mark, zero, allocate)
-- swap dup begin "55aa55aa >r 1- dup 0= until drop ( mark)
-- swap dup begin 0 >r 1- dup 0= until drop ( zero)
swap dup negate rp+! ( allocate)
( #cells) >r
link unroom
remove >r ( normal return - unlink and cleanup)
>r ( ra) ;
forth
-- -----------------------------------------------------------------------
-- End of fancy R-stack goodies, and back to pedestrian Forth.
-- -----------------------------------------------------------------------
( Number input)
variable dpl ( location of last . ) dpl on ( -1)
variable radix
: radixer constant does> @ radix ! ;
2 2* 2* dup 2* ( 16!) radixer hex
dup ( 8!) radixer octal
2 + ( 10!) radixer decimal
2 radixer binary
decimal
( Punctuation in numbers: sign, radix, decimal point, separators.)
( NOTE WELL: This code - the number parsing code - has been a thorn in my
side for ever. You'll see, as you read the following code and comments,
that over the years I have made changes, but it has never been as simple
or as elegant as I would like. It needs a really good whacking.)
( 2006-mar-26. Ok, so this *totally* sucks. The presence of these bits of
punctuation can mask a word not being found in the dictionary. A bare /,
for instance, with no digits to keep it company, is happily parsed as a
number. The number? 0. Urgh.)
: punct ( a u ch - a' u' matched)
over if ( still chars to process) swap push over c@ xor if
( no match) pop 0 ^ then
( match) pop 1 -1 v+ -1 ^ then
( end of input) drop 0 ;
: ?sign ( a u - a' u' neg) char - punct if -1 ^ then 0 ;
( I wanted to add Michael Pruemm's '0' as a hex specifier, but it's not as
simple as adding it to this list. It will match a bare 0, which won't be
matched as a number.)
: ?radix ( a u - a' u')
( char 0 punct if hex ^ then )
char " punct if hex ^ then ( " for hex and ' for octal are Donald Knuthisms)
char ' punct if octal ^ then
char $ punct if hex ^ then ( $ for hex is a time-worn convention)
char # punct if decimal ^ then
char % punct if binary ^ then ;
( . resets dpl; others leave it unchanged; this means that embedding . in a
number causes dpl to be set to the count of digits _after_ the _last_ .
in the number.)
: dot? ( a u - a' u' matched)
char . punct if dpl off -1 ^ then
char , punct if -1 ^ then
char - punct if -1 ^ then
char / punct if -1 ^ then
char : punct if -1 ^ then
char _ punct if -1 ^ then 0 ;
( This is scary. We need a bunch of literals for `digit>'.)
: digit> ( ch - digit | junk)
char 0 - [ 2 2* 2* 1+ #] ( 9) over u< if ( !decimal)
[ 2 2* 2* 2* 1+ #] ( 17) -
[ 2 1+ 2* 2* 2* 1+ #] ( 25) over u< if ( !hex, UPPERCASE)
[ 2 2* 2* 2* 2* #] ( 32) -
[ 2 1+ 2* 2* 2* 1+ #] ( 25) over u< if ( !hex, lowercase)
( junk) ^
then then ( hex) [ 2 2* 1+ 2* #] ( 10) + then ( decimal) ;
: digit? ( ch - digit T | junk F) digit> dup radix @ u< ;
: @digit? ( a - a digit T | a junk F) dup c@ digit? ;
: *digit ( accum a digit - accum*base+digit a)
rot radix @ * + swap dpl @ 0< 1+ dpl +! ;
( 2002-mar-23. I still don't like how number parsing works. On the one
hand, we know ahead of time exactly how many characters we have [in the
token we are trying to convert]; on the other, the way the prefix [sign
and radix] and embedded [. , - : /] characters work, we can't simply put
them in a loop: there should be at most one sign and one radix at the
beginning. Right now I have >number [which converts digits] and punct
words _both_ checking if there are any characters left to process. This
seems clumsy.
And that "dpl!" in ?dot bugs me, too.)
( ANS compatible! - or at least it was when it converting with double numbers.)
( If >number finds a non-digit, it pops the return stack - which contains
the for loop counter - and returns this value, which is number of
characters left in the token.)
: >number ( accum a u - accum' a' u') ( a' is first unconvertible char)
for @digit? 0= if drop pop ^ then *digit 1+ next 0 ;
: digits ( accum a u - accum' a' u' #converted)
dup push ( chars left) >number pop over - ;
( XXX 2009-sep-01. The following doesn't make sense, and it's a lie as
well, since 'number,' doesn't exist any more:
Now some help for the colon compiler. Note that the colon compiler now
calls `number,' to convert-and-compile and calls `number' when interpreting.
This is so that `number,' or `number' can reset dpl when they're done. We do
this so that constants don't screw up fixed-point arithmetic conversion.
Without this code, if you were to use a fixed-point number, 3.1415 eg, dpl
would be set to 4. Then `0' pushes 0 on the stack but doesn't affect dpl,
so Forth tries to convert it, and BOOM.)
: ?bad-number ( sign accum a u good - sign accum a u | a u 0)
if ^ then 2push 2drop 2pop 0 shunt ;
: number? ( a u - n -1 | a' u' 0)
radix preserve ( always reset the radix, even in case of error)
?radix ?sign -rot dpl on 0 -rot ( sign accum a u)
begin digits ?bad-number =while ( still chars to parse)
dot? ?bad-number repeat
2drop swap if negate then -1 ;
: number number? huh? ;
( Ok, folks, now that we have number parsing code we can redefine the
interpreter and compiler, which up till this point have simply complained
if they saw something not in the dictionary.)
( First we need to re-define [ and ] . We will define these as
interpreter "modes", which consist of a word to display a prompt string
and a token consumer function. Since we don't currently have a way to
compile strings,we will use nope until we re-define the prompts later.)
variable state ( interpret or compile -- or whatever!)
: mode create ( prompt token-consumer) , , does> state ! ;
( Redefine the forth "consumer" to try to convert numbers after failing to
find a token in the .forth. chain. Since we don't yet have a way to
create nameless colon definitions, do it by hand!)
' nope ( null prompt for interpret mode)
here <:> ] ( make a nameless colon word)
( interpret one token)
.forth. find if execute ^ then number ;
compiler