forked from Forth-Standard/forth200x
/
text-substitution.txt
381 lines (308 loc) · 11.2 KB
/
text-substitution.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
From: stephenXXX@mpeforth.com (Stephen Pelc)
Newsgroups: comp.lang.forth
Subject: RfD Substitute v3
Organization: MPE
Message-ID: <4ab11e22.84952232@192.168.0.50>
Date: Wed, 16 Sep 2009 17:20:21 GMT
Text substitution proposal
Revised: 16 September 2009
Author:
Stephen Pelc, MicroProcessor Engineering, stephen@mpeforth.com
Input from:
Peter Knaggs, Leon Wagner, Bernd Paysan, Anton Ertl
Contact:
Stephen Pelc
MicroProcessor Engineering
133 Hill Lane
Southampton SO15 5AF
England
Tel: +44 (0)23 80631441
Fax: +44 (0)23 80339691
Net: stephen@mpeforth.com
Web: http://www.mpeforth.com
Rationale
=========
Why do this?
------------
Text substitution is useful for a number of applications,
including:
1)Directory handling
2)Configuration management
3)Localisation
Many applications need to be able to perform text substitution,
for example:
Your balance at <time> on <date> is <currencyvalue>.
We can provide for these requirements by defining a text
substitution facility. For example, we can provide an initial
string in the form:
Your balance at %time% on %date% is %currencyvalue%.
In this example, % is used as delimiters for the substitution
name. The text “time”, "date” and “currencyvalue” are text
substitutions, where “time” and “date” insert the current time and
date, and “currencyvalue” inserts the top item on the stack as a
string in the active currency.
Implementation
--------------
Implementation of SUBSTITUTE may be considered as being equivalent
to a wordlist which is searched. If the substitution name is
found, the word is executed to return a string that replaces the
substitution name and delimiters. Such words can be deferred or
multiple wordlists can be used. The implementation techniques
required are similar to those many implementers have used with
ENVIRONMENT?. The range of functions required can be handled by
standardising substitution names. Practical use indicates that
implemention with a wordlist eases adding substitutions that
perform run-time manipulations, e.g. time and date.
This proposal is derived from an implementation used in a
Forth system for more than ten years. The % character was
chosen as the default delimiter because it is an illegal character
in DOS and Windows file and path names. By choosing an illegal
character, substitutions in file name handling can very easily be
implemented, even at the level of OPEN-FILE and CREATE-FILE.
There is no provision for changing the delimiter character because
it causes difficulties when porting code and can have dangerous
side effects at run-time, especially when delimiter usage varies
in different source libraries. Similarly, use of separate leading
and trailing delimiter characters increases the complexity of some
substitutions. The simplest choice is to use one fixed character.
Whatever character is chosen, it will cause problems somewhere. As
with the whole standard document, nothing in this proposal
prohibits a system from providing system-specific extensions.
Usage
-----
Translation of a sentence or message from one language to another
may result in changes to the displayed parameter order. For
example, the substitution in
Your balance at %time% on %date% is %currencyvalue%.
requires a different order in Russian.
When an application provides substitution of parameters on the
data stack by SUBSTITUTE and uses substitutions to access these
items, it is recommended that items not be destroyed but instead
be removed after execution of SUBSTITUTE. For example, the
previous string could be redone as:
Your balance at %$4% on %$2% is %i0%.
where the substitution $x takes the caddr/len string at stack
depth x, and the substitution ix takes a signed integer. The
depths are defined after the parameters to SUBSTITUTE itself have
been removed. The word SUBSTITUTE would then be called as:
$time $tlen $date $dlen val src srclen dest dlen SUBSTITUTE
Proposal
========
The following two words are defined to handle substitution of
text, and output of application data such as date, time, currency,
path and so on.
17.6.2.aaaa REPLACES "replaces" STRING EXT
Execution: c-addr1 len1 c-addr2 len2 --
Set the string defined by c-addr1 and len1 as the text to
substitute for the substitution named by c-addr2 and len2. If the
substitution does not exist it is created. The program may then
reuse the buffer c-addr1/len1 without affecting the definition of
the substitution. Ambiguous conditions occur as follows
1)The substitution cannot be created.
2)The name of a substitution contains a delimiter character.
17.6.2.bbbb SUBSTITUTE "substitute" STRING EXT
Execution: c-addr1 len1 c-addr2 len2 -- c-addr2 len3 n
Perform substitution on the string at c-addr1 and len1 placing the
result at string c-addr2 and len2, returning c-addr2 and len3, the
length of the resulting string. An ambiguous condition occurs if
the resulting string will not fit into c-addr2/len2 or if c-addr2
is the same as c-addr1. The return value n is positive (0..+n) on
success and indicates the number of substitutions made. A negative
value for n indicates that an error occurred, leaving c-addr2 and
len3 undefined. Substitution occurs from the start of c-addr1 and
len1 in one pass and is non-recursive.
When a substitution name surrounded by '%' (ASCII 0x25) delimiters is
encountered by SUBSTITUTE, the following occurs:
1)If the name is null, a single delimiter character is
substituted, i.e. %% is replaced by %.
2)If the name is a valid substitution name, the leading and
trailing delimiter characters and the enclosed substitution name
are replaced by the substitution text.
3)If the name is not a valid substitution name, the name with
leading and trailing delimiters is passed unchanged to the output.
17.6.2.cccc UNESCAPE "unescape" STRING EXT
Execution: c-addr1 len1 c-addr2 -- c-addr2 len2
Replace each '%' character in the input string c-addr1/len1 by two
'%' characters. The output is represented by c-addr2/len2. The
buffer at c-addr2 must be big enough to hold the unescaped string.
Reference implementation
========================
decimal
[undefined] bounds [if]
: bounds \ addr len -- addr+len addr
over + swap ;
[then]
[undefined] -rot [if]
: -rot \ a b c -- c a b
rot rot ;
[then]
[undefined] place [if]
: place \ c-addr1 u c-addr2 --
\ Copy the string described by c-addr1 u as a counted string at
\ the memory address described by c-addr2.
2dup 2>r
1 chars + swap move
2r> c! ;
[then]
char % constant delim
\ Character used as the substitution name delimiter.
wordlist constant wid-subst
\ Wordlist ID of the wordlist used to hold substitution names
\ and replacement text.
256 buffer: Name \ -- addr
\ Scratch buffer to hold substitution name as a counted string.
variable DestLen \ -- addr
\ Maximum length of the destination buffer.
2variable Dest \ -- addr
\ Holds destination string current length and address.
variable SubstErr \ -- addr
\ Holds zero or an error code.
[defined] VFXforth [if] \ VFX Forth
: makeSubst \ caddr len -- caddr
\ Given a name string create a substution and storage space.
\ Return the address of the buffer for the substitution text.
\ This word requires carnal knowledge of the host Forth.
\ Some systems may need to perform case conversion here.
get-current >r wid-subst set-current
($create) \ like CREATE but takes caddr/len
r> set-current
here 256 allot 0 over c! \ create buffer space
;
[then]
[defined] (WID-CREATE) [if] \ SwiftForth
: makeSubst \ caddr len -- caddr
wid-subst (WID-CREATE) \ like CREATE but takes caddr/len/wid
LAST @ >CREATE !
here 256 allot 0 over c! \ create buffer space
;
[then]
: findSubst \ caddr len -- xt flag | 0
\ Given a name string, find the substitution. Return xt and flag
\ if found, or just zero if not found. Some systems may need to
\ perform case conversion here.
wid-subst search-wordlist
;
: replaces \ text tlen name nlen --
\ Define the string text/tlen as the text to substitute for the
\ substitution named name/nlen. If the substitution does not
\ exist it is created.
2dup findSubst if
nip nip execute \ get buffer address
else
makeSubst
then
place \ copy as counted string
;
: addDest \ char --
\ Add the character to the destination string.
Dest @ DestLen @ < if
Dest 2@ + c! 1 chars Dest +!
else
drop -1 SubstErr !
then
;
: formName \ caddr len -- caddr' len'
\ Given a source string pointing at a leading delimiter, place
\ the name string in the name buffer.
1 /string 2dup delim scan >r drop \ find length of residue
2dup r> - dup >r Name place \ save name in buffer
r> 1 chars + /string \ step over name and trailing %
;
: >dest \ caddr len --
\ Add a string to the output string.
bounds
?do i c@ addDest 1 chars +loop
;
: processName \ -- flag
\ Process the last substitution name. Return true if found,
\ 0 if not found.
Name count findSubst dup >r if
execute count >dest
else
delim addDest Name count >dest delim addDest
then
r>
;
: substitute \ src slen dest dlen -- dest dlen' n
\ Expand the source string using substitutions. Note that this
\ version is simplistic, performs no error checking, and requires
\ a global buffer and global variables.
Destlen ! 0 Dest 2! 0 -rot \ -- 0 src slen
0 SubstErr !
begin
dup 0 >
while
over c@ delim <> if \ character not %
over c@ addDest 1 /string
else
over 1 chars + c@ delim = if \ %% for one output %
delim addDest 2 /string \ add one % to output
else
formName processName
if rot 1+ -rot then \ count substitutions
then
then
repeat
2drop Dest 2@ rot SubstErr @
if drop SubstErr @ then
;
: unescape \ c-addr1 len1 c-addr2 -- c-addr2 len2
\ Replace each '%' character in the input string c-addr1/len1 by
\ two '%' characters. The output is represented by caddr2/len2.
\ If you pass a string through UNESCAPE and then SUBSTITUTE,
\ you get the original string.
dup 2swap over + swap ?do
i c@ [char] % =
if [char] % over c! 1+ then
i c@ over c! 1+
loop
over -
;
Tests
=====
create tb 256 allot \ -- addr
\ Buffer for text.
create db 256 allot \ -- addr
\ destination buffer for text.
: >tb \ caddr len -- caddr' len
\ Place string in TB, and return the string. Done
\ this way to avoid problems with transient regions.
tb place tb count
;
: .sub \ caddr len n --
\ Display the result of a substitution.
cr . ." Substitutions, result:" type ." :"
;
: tsub \ caddr len --
\ Run the substitution text and display the results.
db 256 substitute .sub
;
s" hello" >tb s" hl" replaces
s" world" >tb s" wld" replaces
s" Start: %hl%,%wld%! :End" tsub
s" Hello, world!" tsub
s" aaa%foobar%bbb" tsub
s" aaa%%bbb" tsub
Change history
==============
16 September 2009
Revised definition of SUBSTITUTE to reduce the number of
ambiguous definitions.
Added more test cases.
9 September 2009
Incorporated changes from Leon Wagner to UNESCAPE.
4 September 2009
Added UNESCAPE.
6 April 2009
Reworked rationale.
Specified delimiter in SUBSTITUTE.
2 April 2009
Added reference implementation and test code
26 March 2009
Extracted from the internationalisation proposal.
--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads