Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLIP MQTT protocol definition for FORTH client implementation #430

Open
wolfgangr opened this issue Jan 27, 2019 · 101 comments
Open

SLIP MQTT protocol definition for FORTH client implementation #430

wolfgangr opened this issue Jan 27, 2019 · 101 comments

Comments

@wolfgangr
Copy link

Inspired by this hack
https://hackaday.com/2017/02/13/hacking-on-the-weirdest-esp-module/#comment-5820116
I tried something similiar:
MQTT from an STM32 Bluepill running mecrisp FORTH, connected to ESP-Link.

However, when I try to issue test strings, the communication between ESP-LINK and the STM32 breaks down. I have to reset both the µC at http://192.168.1.88/console.html and ESP-Link at http://192.168.1.88/log.html using the HTML-Buttons at those pages.

I suspect that some wrong command string confuses ESP-LINK and renders it in a non-responsive state.

I assume that the communication ESP-LINK <-> µC is neither pure MQTT nor SLIP but a dedicated protocol, correct?

Unfortunately, there is no MQTT in https://embello.jeelabs.org/flib/ yet.

I have repeatedly searched the issues and all docs for some spec, but the only source I found is the
https://github.com/jeelabs/el-client/blob/master/ELClient/examples/mqtt/mqtt.ino
However, assembling test strings from reverse engineering this very error prone, too.

It would be great to find a couple of preassembled test strings to ensure that the whole setup is working.
So I could work down to more complicated functions by small changes.
Just some "hello" appearing on my mqtt would be great.

Details from my setup:

  • esp-link v2.2.3 - 2016-06-21 21:58:48 - 1bcdc62, prebuilt stock flash
  • WEMOS D1 Mini with ESP12 module
  • https://embello.jeelabs.org/hardware/bluepill.html
  • mecrisp-Stellaris RA 2.4.8 for STM32F103 by Matthias Koch
  • reset GPIO12, UART swapped
  • folie <-> ESP-Link <-> UART, accompanied by web console
  • mosquitto 1.4.10-3+deb9u2 on debian 9.6
  • I see esp-link {"rssi":-60, "heap_free":19992} on my mosquitto_pub,
    so the ESP-link <-> mosquitto is working
  • syslog running, but neither in syslog nor in the weblog I see any traces from futile commands

I'd plan to rework the whole thing using string buffer (hope to find such in mecrisp) instead of blowing the stack.
Would be glad to share, of course.

@wolfgangr
Copy link
Author

Maybe I have a pointer alignment problem - among many others.
This is no surprise when apps are ported from 8 bit to 32 bit platform.
And obviously, the align-word in my mecrisp is dysfunct.

This produced different hard to debug effects, among them the infamous
Unhandled Interrupt 00000003 ! crashes (see also here https://jeelabs.org/article/1619a/ )
But I also had obviusly simple definitions that did not work, and irrespossive connections between ESP-Link and mecrisp.

grepping through my source trees, I found
flib/spi/rf69ook.fs:113:OOK.RSSI.QLEN 2 rshift 1+ 2 lshift buffer: ook.rssi.q \ don't trust align.
So it seems to be a known problem.

aligned seems to work, calign obvioulsy not.
My current workaround is

: myalign here dup aligned swap - allot ; 
: align myalign ;

and after possibly uneven c, sequences, I call

calign
myalign

I'll try to figure out whether it is a problem rooted in mecrisp or in the emmbello/flib I include, and will rise an issue in the proper place then.

I hope this finding brings me a bit closer to a succesful FORTH-MQTT-link.
Nevertheless, test sequences and/ or an API documentation for the SLIP-MQTT interface would still be appreciated.
I'd propose to write some nice text, when I get the proper informations / assistance to do so.

@wolfgangr
Copy link
Author

just replaced my reset-wire by a diode as outlined here
https://jeelabs.org/article/1619a/
and can do proper soft resets again. Nice :-)

This is not the primary cause, of the issue at hand, just a aggravation of symptoms, but may help to track down the cause of unplanned soft resets - at least I can read the last console printings just before a crash.

@wolfgangr
Copy link
Author

hacked a simple string lib to proceed
would prefer to test and polish it before I publish it

@wolfgangr
Copy link
Author

Making progress - hitting visible walls already.

I manage it to get a message parsed by ESP-Link as supposed-to-be slip.
At least I can see a related message in the web log.
Maybe the mqtt stuff is not yet OK, but before I have to get the CRC right.
This my sending string buffer as visible in FORTH
The first two bytes resemble the counter, the next two the max length of the allocated string.
So, the data start at ....085C

SLIP-message stringbuf-dump  
Start at 20000858 
Last byte is 089d, containing C0 aka SLIP_END

                                   |-pos-|-len-|-data-> 

20000850   FF F7 33 FF 00 BD 00 00   42 00 80 00 00 00 00 01   ..3..... B.......
20000860   00 00 00 89 C0 00 00 00   0B 00 05 00 00 00 00 00   ........ ........
20000870   14 00 68 6F 6D 65 2F 62   61 73 65 6D 65 6E 74 2F   ..home/b asement/
20000880   77 61 73 00 00 00 02 00   6F 6E 00 00 02 00 02 00   was..... on......
20000890   00 00 01 00 00 00 00 00   01 00 00 43 F4 C0 00 00   ........ ...C....
200008A0   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........

This is what I read from the ESP-Link log.
The start and the count suggest that only the bytes between the C0 aka SLIP_END are counted.
However, the end of the messages differ - even in size

16530> SLIP: start or end len=9 inpkt=0
916531> SLIP: start or end len=56 inpkt=1
916531> SLIP: bad CRC, crc=b6dc rcv=f443 len=56
916531> \00\00\00\0B\00\05\00\00\00\00\00\14\00home/basement/was\00\00\00\02\00on\00\00\02\00\02\00\00\00\01\00\00\00\00\00\01\00\00C\F4

How is the CRC to be generated?
I compared my hackaday example code with the source and identified this stanza:
https://github.com/jeelabs/esp-link/blob/fe4f565fe83f05e402cc8d8ca3ceefbc39692a1f/serial/crc16.c

unsigned short
crc16_add(unsigned char b, unsigned short acc)
  acc ^= b;
  acc  = (acc >> 8) | (acc << 8);
  acc ^= (acc & 0xff00) << 4;
  acc ^= (acc >> 8) >> 4;
  acc ^= (acc & 0xff00) >> 5;
return acc;

this is my implementation in forth:

: crc+ ( old_running_sum new_byte -- new_running_sum ) 
  xor
  dup 8 rshift swap 8 lshift or
  $ffff and
  dup $ff00 and 4 lshift xor
  $ffff and
  dup 8 rshift 4 rshift xor 
  dup $ff00 and 5 rshift xor
;

Compared to the template, I added two $ffff and masks to keep calcuation in the 16-bit-width.
I suppose that unsigned short in C is supposed to be 16 bit as well, OK?
What else can be wrong?
I don't find any initialisation in the C-code, so I assume the accumulator will start with zero.
I found a CRC-Calculator on the web, but I coulld not match their algorithm descriptions.
https://crccalc.com/?crc=0+1&method=ascii

Any pointer?

@wolfgangr
Copy link
Author

could it be a serial communication issue?
ran the examples at 460800
switching back to 115200 , ruined my test example, so I have not exactly sam strings.
but.... what puzzles me is that the trail of the strings are responded differntly in both cases
I list below

  • last 6 bytes sent
  • crc reported as bad by ESP-Link log
  • last bytes as responded by ESP-Link log
    But obviously, the CRC arrived correctly at the ESP-Link - this would not point t a communicatin issue
 01 00 00 43 F4 C0
rcv=f443
\01\00\00C\F4
01 00 00 A3   C4 C0 
 rcv=c4a3 
\01\00\00\A3\C4

@Blimpyway
Copy link

Blimpyway commented Jan 29, 2019 via email

@wolfgangr
Copy link
Author

I'm just thinking into endianness regarding the last step - putting the CRC result to the message string.
But then I might just get two bytes swapped - easy to spot.

Both the C and my FORTH code are implemented in integer math.
shouldn't assembly level bare metal prob's be covered by the compilers?
The links you pointed me refer both to C Compiles and problem of some endianness macro not included correctly.
So either my FORTH or ESP-Link may be compiled wrong and nobody realized?
I'll try to figure that ....

@wolfgangr
Copy link
Author

The forth side is easy to examine.
just a little loop, shifting a bit through a 32 bit number.
Used the same lshift word as in the crc implementation above
#430 (comment)
I think it behaves as expected - no jumping bits, bit just falls off at the left end

: shifter cr 1 40 1 DO i . 1 lshift dup hex. loop cr ; Redefine shifter.  ok.
  ok.
shifter 
1 00000002 2 00000004 3 00000008 4 00000010 5 00000020 6 00000040 7 00000080 8 00000100 
9 00000200 10 00000400 11 00000800 12 00001000 13 00002000 14 00004000 15 00008000 16 00010000 
17 00020000 18 00040000 19 00080000 20 00100000 21 00200000 22 00400000 23 00800000 24 01000000 
25 02000000 26 04000000 27 08000000 28 10000000 29 20000000 30 40000000 31 80000000 32 00000000 
33 00000000 34 00000000 35 00000000 36 00000000 37 00000000 38 00000000 39 00000000 

How can I inject a test code into ESP-Link implementation?
I'd like to see some online CRC calculator like this one to compare
https://crccalc.com/?crc=0+1&method=ascii

@wolfgangr
Copy link
Author

wolfgangr commented Jan 29, 2019

Are there examples of just one crc step, so that I could cross check my incremental bytewise updater?
Hm, looks like I have to dig for a C hello world an paste the code in?
But then I would just explore the C-build environment of my Workstation.

I think I'll try to untwine the SLIP wrapper layer from the MQTT inside.
Just a test string with a few bytes inside, easy to compare.

But for the moment, live comes into the way....

@Blimpyway
Copy link

Blimpyway commented Jan 29, 2019 via email

@tve
Copy link
Member

tve commented Jan 29, 2019

Your forth CRC code looks correct to me. You need to make sure that the new_byte parameter really only has 8 bits.
Sadly I inherited a lot of that low-level stuff from some other project, so it's not documented, grr..
In terms of slip, this is where it happens: https://github.com/jeelabs/esp-link/blob/master/serial/slip.c
Each packet starts and ends with a SLIP_END (0xC0). The last two bytes are the computed CRC value, which is calculated over all the bytes except the last two. Note that any 0xC0 or 0xDB value needs to be escaped on the wire and is unescaped before calculating the checksum.
A packet consists of a 16-bit command followed by a 16-bit argument count and a 32-bit "callback value" (really an opaque token). This is all little endian.
Your best bet is probably to start with the SYNC command whose code is 0x0001 and which takes one arg in the value field, which must be non-zero, e.g., { 0xC0, 0x01, 0x00, 0x01, 0x00, 0x11, 0x11, 0x11, 0x11, crc-lo, crc-hi, 0xC0 }
You should get a response back { 0xC0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, crc-lo, crc-hi, 0xC0 }

@wolfgangr
Copy link
Author

I found my error.
Silly oversight in the caller to the CRC:
What I wanted: DO I c@ crc+ LOOP
What I did: DO I crc+ LOOP
... in a loop cycling over string adresses. So I calculated the CRC of the adresses, not the content.

Now I got the message:

 99215> SLIP: start or end len=9 inpkt=0
 99215> SLIP: start or end len=56 inpkt=1
 99216> cmdParsePacket: cmd=0 argc=2816 value=1280
 99216> cmdExec: Dispatching cmd=NULL

I interpret it that way that I have got the SLIP thing right and can proceed with MQTT now.
To be sure, I changed the endiannes of my CRC added and got :
SLIP: bad CRC, crc=b6dc rcv=dcb6 len=56
Lesson learned: screw your CRC if you need to debug the SLIP content :-O ;-)

@wolfgangr
Copy link
Author

Making progress: first succesful MQTT :-)
This is what I send: (fist byte at ...0934, last byte at ...0975 )

20000930   42 00 80 00 01 00 00 00   89 02 00 00 C0 0B 00 05   B....... ........
20000940   00 00 00 00 00 14 00 68   6F 6D 65 2F 62 61 73 65   .......h ome/base
20000950   6D 65 6E 74 2F 77 61 73   68 65 72 02 00 6F 6E 00   ment/was her..on.
20000960   00 02 00 02 00 00 00 01   00 00 00 00 00 01 00 00   ........ ........
20000970   00 00 00 C1 7E C0 00 00   00 00 00 00 00 00 00 00   ....~... ........

ESP-link debug log:
As we see, there is another MQTT pub, starting at 860810 sending the esp-link status

855163> SLIP: start or end len=9 inpkt=0
855164> SLIP: start or end len=56 inpkt=1
855164> cmdParsePacket: cmd=11 argc=5 value=0
855164> cmdExec: Dispatching cmd=MQTT_PUB
855164> MQTT: MQTTCMD_Publish topic=home/basement/washer, data_len=2, qos=0, retain=0
855164> MQTT: Publish, topic: "home/basement/washer", length: 26
855165> MQTT: Send type=PUBLISH id=0000 len=26
860810> MQTT: Publish, topic: "esp-link", length: 45
860810> MQTT: Send type=PUBLISH id=00A9 len=45
860813> MQTT: Recv type=PUBACK id=04A9 len=4; Pend type=PUBLISH id=4A9

and this is my console running :~$ mosquitto_sub -h 192.168.X.Y -v -t '#'
both my test string and the esp-link status:

home/basement/washer on
esp-link {"rssi":-60, "heap_free":20008}

@wolfgangr
Copy link
Author

@tve

so it's not documented, grr..

That's a common issue in the FOSS world.
I respect that if people give away their work for free, they don't feel obliged to do boring stuff ;-)

So let me add my 3 cents here.
(Or should I better start a Wiki page ... ?)

A packet consists of a 16-bit command followed by a 16-bit argument count and a 32-bit "callback value" (really an opaque token). This is all little endian.
Your best bet is probably to start with the SYNC command whose code is 0x0001 and which takes one arg in the value field, which must be non-zero, e.g., { 0xC0, 0x01, 0x00, 0x01, 0x00, 0x11, 0x11, 0x11, 0x11, crc-lo, crc-hi, 0xC0 }

so this breaks up as follows?

      END      0xC0, 
      SYNC command 0x0001  ~~>  in little endian = 0x01, 0x00, 
      arbitrary non-zero dummy arg  0x0001 ~~>  0x01, 0x00, 
      arbitrary dummy callback value   0x11, 0x11, 0x11, 0x11, 
      checksum         crc-lo, crc-hi, 
      END       0xC0

You should get a response back { 0xC0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, crc-lo, crc-hi, 0xC0 }

Have not yet screwed my forth interpreter to process responses coming along the same line I use to talk to it. But should not be that difficult, I expect.
What I see at the console is a which I can expand to 0xEF at my forth console
Presumably from the checksum?

When I process the answer, is it ok to consider anything enclose between two C0 as SLIP (well, if not excaped...)
Can I rely that any SLIP rests on one single line?
Or do line breaks loose their meaning between C0 ?

Note that any 0xC0 or 0xDB value needs to be escaped on the wire and is unescaped before calculating the checksum.

Good to know.
So this should go into the routine which assembles the command string.
If I did it in the definition of literals, it would be in the checksum and thus screw it.
I found $DB constant SLIP_ESC compiled from my inherited template.
So this is the escape character?
Is this the correct way to do it then?

    0xDB -> 0xDB 0xDB
    0xC0 -> 0xDB 0xC0

And I suppose I have to remove it from responses as well, OK?

I'd like to keep things simple. Can I avoid nonprintble characters alltogehter?

@wolfgangr
Copy link
Author

OK, next let's dissect my first succesful washer test string.

This is how it looks in memory:

20000930   42 00 80 00 01 00 00 00   89 02 00 00 C0 0B 00 05   B....... ........
20000940   00 00 00 00 00 14 00 68   6F 6D 65 2F 62 61 73 65   .......h ome/base
20000950   6D 65 6E 74 2F 77 61 73   68 65 72 02 00 6F 6E 00   ment/was her..on.
20000960   00 02 00 02 00 00 00 01   00 00 00 00 00 01 00 00   ........ ........
20000970   00 00 00 C1 7E C0 00 00   00 00 00 00 00 00 00 00   ....~... ........

Ouh, should I push my git to github to refer to the code that did this? anyway...

My code template says 'Command to sync up the esp-link '
01 00 00 00 89 02 00 00 and calls it 'sync'
Is it the sync you referred to? It's not enclosed in "END", and there is no CRC?
Or is it just bullshit an silently ignored?

C0 according to wikipedia article in SLIP stands for 'frame end'.
Is this required to start the following MQTT, or to close the (maybe broken) sync before?

0B 00 05 00 00 00 00 00 is called mqtt.pramble in my template, commented as '5 arguments, no callback'. So expanding your explanation, this is acutally a ESP-link SLIP command - to switch it to MQTT mode now - not part of the MQTT standard - correct?
I see 0x000B as command an 0x0005 as argument(s?).
Is it an argument, or a counter of such?
If an argument, what's the meaning?
If a counter, what is counted then?

Next is a string literal, consisting of a 16-bit length 0x0014 = dec 20 and a 20 byte long home/basement/washer no frills ascii string.
I think the slashes refer to MQTT subtopics, but the mosquitto manual is still on my to-be-read list.

14 00 
68 6F 6D 65 2F 62 61 73 65 6D 65 6E 74 2F 77 61 73 68 65 72  

The next one 02 00 6F 6E 00 00 02 00 02 00 00 00
I just copied from the template, which I paste here including the comments there:

: message.on 
	02 00  6f 6e  00 00  \ "on"
	02 00  02 00  00 00  \ len of data  
;

There is another one (not used in my first example)
I see the string format similiar to above
Is there 16-bit padding required by the MQTT protocol?
And I see the 3rd bit in the second line increased - copy of the string len?

: message.off 
	03 00  6f 66  66 00  \ "off"
	02 00  03 00  00 00  \ len of data  
;

The last one I simply copied without any clue.
Well, I can google MQTT & quos & retain, of course
01 00 00 00 00 00 01 00 00 00 00 00

: qos.and.retain 
	01 00  00 00  00 00  \ qos = 0
	01 00  00 00  00 00  \ retain = 0
;

I really would appreciate some simple manual on the standard.
The standard itself in my eyes is just a 100 page collection of might and could.
Where can I find answers such as regarding the assembly of those strings?

C1 7E is the CRC checsum (0x7ec1) calculated as discussed above, in little endian notation
C0 is the frame end token again.

@wolfgangr
Copy link
Author

Ouh, should I push my git to github to refer to the code that did this? anyway...

voila:
https://github.com/wolfgangr/forthMQTT3pktControl

preliminary hack of a buffered string library
https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/stringbuf.fs

the constants I referred to:
https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/mmq-const.fs

the hot spot - work in progress while I write in this issue
https://github.com/wolfgangr/forthMQTT3pktControl/blob/master/mmq-tools.fs

@uzi18
Copy link
Contributor

uzi18 commented Jan 30, 2019

@wolfgangr nice work, but why FORTH?
@tve any hints for this implementation?

@wolfgangr
Copy link
Author

wolfgangr commented Jan 30, 2019

why FORTH?

FORTH provides - to my knowledge - the largest spread between extremes on a couple of dimensions:

  • hands on high level interactive development vs close to metal hacks
  • low ressource footprint vs expressive power
  • people who love it vs people who never ever will ike it :-)

So on this question you could blow every thread, and there are enough of these discussions around.

@wolfgangr
Copy link
Author

back to start again? - Well, hitting a different wall, now.
Wrote some words to customize contents (topic, value)
Looks basicaly the same as the working one above - just different string lengts.

20000BA0   70 47 00 00 44 00 80 00   01 00 00 00 89 02 00 00   pG..D... ........
20000BB0   C0 0B 00 05 00 00 00 00   00 15 00 68 65 61 74 69   ........ ...heati
20000BC0   6E 67 2F 70 72 65 73 73   75 72 65 2F 70 75 6D 70   ng/press ure/pump
20000BD0   00 02 00 6F 6E 00 00 02   00 02 00 00 00 01 00 00   ...on... ........
20000BE0   00 00 00 01 00 00 00 00   00 DC AE C0 00 00 00 00   ........ ........
20000BF0   00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........

But does not work.
No message received on the MQTT client.
The ESP-Link console:

804117> SLIP: start or end len=8 inpkt=0
804117> SLIP: start or end len=58 inpkt=1
804117> cmdParsePacket: cmd=11 argc=5 value=0
804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

So obviously the command got processed by the SLIP layer and handed over to the MQTT_PUP layer.
But then it threw some kind of an error E:M 28280
looks like a long night of RTFS....

@wolfgangr
Copy link
Author

For reference, again the log in the case of success

855164> cmdParsePacket: cmd=11 argc=5 value=0
855164> cmdExec: Dispatching cmd=MQTT_PUB
MQTT: MQTTCMD_Publish topic=home/basement/washer, data_len=2, qos=0, retain=0

... and of current fail:

804117> cmdParsePacket: cmd=11 argc=5 value=0
804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

we have a
https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c
line 100:

void ICACHE_FLASH_ATTR
MQTTCMD_Publish(CmdPacket *cmd) {

line 140:

  DBG("MQTT: MQTTCMD_Publish topic=%s, data_len=%d, qos=%d, retain=%d\n",
    topic, data_len, qos, retain);

MQTT_Publish(client, (char*)topic, (char*)data, data_len, qos%3, retain&1);

So I think this stanza generates the successful debug message and just after calls the handover to the MQTT server.

However, I can't see where the fail message E:M 28280 is assembled.
Looks I have to search one wrapping layer above.

@wolfgangr
Copy link
Author

wolfgangr commented Jan 31, 2019

@tve

Sadly I inherited a lot of that low-level stuff from some other project, so it's not documented,

Is there any hope that this other project has some docu we could build upon?

I see this (C)
https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_msg.h

Copyright (c) 2014, Stephen Robinson

but only referring the outgoing calls (ESP-link -> MQTT-Server)
No (C) in the serial parsers.

--- edit ---
https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt.c

*  Protocol: http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html
* Copyright (c) 2014-2015, Tuan PM <tuanpm at live dot com>
* Modified by Thorsten von Eicken to make it fully callback based

OK ? - I found the correct standard aka haystack and just heve to look for the needle...

@wolfgangr
Copy link
Author

https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c
contains some checks
can't identify the current point of failure, but keep them in the log as relevant details for the whole picture:

105: if (cmdGetArgc(&req) != 5) return;
So the cmd-len param has always to be 5, no need to implement variations on that

113: if (len > 128) return; /
max size of topic + data + decoration, we may match this limit at the client size

120: uint8_t *data = (uint8_t*)os_zalloc(len+1); .... if (!data)
may I refer to that as 'out of memory error'

my ESP-link keeps sending mqtt state messages like
esp-link {"rssi":-56, "heap_free":20384}
So I can't see a reason why we can' allocate 128 bytes in 20 k ram yet.

I see no premature returns after this last check and the succesful debug message.
We find repeated calls to cmdPopArg(
referring to some of the parameters displayed in the DBG("MQTT: MQTTCMD_Publish
Is there some backdoor error handler in the cmdPopArg that might override out flow of command?

@wolfgangr
Copy link
Author

wolfgangr commented Jan 31, 2019

Is there some backdoor error handler in the cmdPopArg that might override out flow of command?

// Copy next arg from request into the data pointer, returns 0 on success, -1 on error

int32_t cmdPopArg(CmdRequest *req, void *data, uint16_t len);

cmdPopArg(CmdRequest *req, void *data, uint16_t len) {

I don't see such backdoor.
There is a os_memcpy but I suppose this cannot fail on allocated buffers.
It just returns 0 (success) or -1 (not enough args?? ) , so has to be checked by the caller

@wolfgangr
Copy link
Author

wolfgangr commented Jan 31, 2019

https://github.com/jeelabs/el-client

A prelimenary documentation for the library is available on ELClient API Doc.

404: Page not found
The way back machine just shows an empty frame.

It refers to doygen. So the docu lives in here?
https://github.com/jeelabs/el-client/blob/master/ELClient/ELClientMqtt.cpp

hm... a first glance tells me that is a layer beyond what I'm looking for.
I don't want to call those C-functions, but mimic their behaviour (maybe a tiny subset thereof)

@tve
I think this were right what I try to reinvent here.
Are you aware of any other copy of this prelim API?
Maybe somewhere on your personal storage?

@wolfgangr
Copy link
Author

wolfgangr commented Jan 31, 2019

OK - hope disappointed - back to track.

who is printing E:M ?
github says:

We couldn’t find any code matching 'E:M' in jeelabs/esp-link

who is calling MQTTCMD_Publish ?

{CMD_MQTT_PUBLISH, "MQTT_PUB", MQTTCMD_Publish},

in Version 2.2.3 this line sits at 41: - 33 lines less
Looks like a major rewrite. Does this relate to my question?
Is it OK to continue my work on stable V2.2.3 ?

Thorsten, @tve
could you please provide a quick answer on that?

'==============

this is a // Command dispatch table.
So, who is processing that?

line 76: (in v2.2.3) : cmdGetCbByName(char* name)
which I read as get Callback by Name
no other hit in git seach for MQTT_PUB
... hm ... puzzling?
well, you don't call by strings within C, do you?

we can search for the integer index CMD_MQTT_PUBLISH on our command table as well .... and find....

CMD_MQTT_PUBLISH, // publish a message

@wolfgangr
Copy link
Author

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.h
distill the context:

typedef enum {
       .....
  CMD_MQTT_PUBLISH, // publish a message
       .....
} CmdName;

@wolfgangr
Copy link
Author

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.c#L83

static void ICACHE_FLASH_ATTR
cmdExec(const CmdList *scp, CmdPacket *packet) {
  // Iterate through the command table and call the appropriate function
  while (scp->sc_function != NULL) {
    if(scp->sc_name == packet->cmd) {
      DBG("cmdExec: Dispatching cmd=%s\n", scp->sc_text);
      // call command function
      scp->sc_function(packet);
      return;
    }
    scp++;
  }
  DBG("cmdExec: cmd=%d not found\n", packet->cmd);
}

Let' recall our log:

804117> cmdExec: Dispatching cmd=MQTT_PUB
804118> E:M 28280

OK, so we have found the point BEFORE the error encurred.
further enquire scp->sc_function(packet);
we have the function argument cmdExec(const CmdList *scp, ...
defined here
https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.h#L55

typedef struct {
  CmdName   sc_name;     // name as CmdName enum
  char      *sc_text;    // name as string
  cmdfunc_t sc_function; // pointer to function
} CmdList;

hm .... Looping closed and overlooked the point?

@wolfgangr
Copy link
Author

wolfgangr commented Jan 31, 2019

Looping closed and overlooked the point?

OK, down again the rabbit hole and collect the crumbs on the way.
https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/handlers.c#L41
{CMD_MQTT_PUBLISH, "MQTT_PUB", MQTTCMD_Publish},

https://github.com/jeelabs/esp-link/blob/v2.2.3/cmd/cmd.c#L89

      DBG("cmdExec: Dispatching cmd=%s\n", scp->sc_text);
      // call command function
      scp->sc_function(packet);

This resembles our last valid debug print, so we know for sure (+-) that MQTTCMD_Publish is called on our packet.

So somewhere between
https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L101
MQTTCMD_Publish(CmdPacket *cmd) {
and ( the debug print in success case we are missing here)
https://github.com/jeelabs/esp-link/blob/v2.2.3/mqtt/mqtt_cmd.c#L140
DBG("MQTT: MQTTCMD_Publish topic=%s, data_len=%d, qos=%d, retain=%d\n",
our problem must occur.

@uzi18
Copy link
Contributor

uzi18 commented Jan 31, 2019

E:M means out of memory for alloc

@uzi18
Copy link
Contributor

uzi18 commented Jan 31, 2019

need to find some readings on forth for uC. any link?

@uzi18
Copy link
Contributor

uzi18 commented Feb 1, 2019

I see forth is not so easy ;)

@wolfgangr
Copy link
Author

OK, after som googlin & RTFS, I try this hack

hook-key @ Constant old-key 
$80 stringbuffer constant mirror  
: ?mirrline mirror  stringbuf-wheretowrite 1- c@ $0d = IF mirror dup stringbuf-dump stringbuf-clear THEN ;
: mykey old-key execute dup mirror stringbuf-byte-app ?mirrline ; 
' mykey hook-key !
  • safe the current key handler
  • allocate a line buffer
  • if last character is 0xd, hexdump the line buffer and clear it
  • call old key handler, store in buffer and print/flush at end of line
  • activate this beast

@wolfgangr
Copy link
Author

added sharkand unshark words to switch this input line mirror hexdumper on and off.
This is what I see after I send the MQTT setup sequence

20001290   17 00 80 00 6D 71 74 74   2D 6D 65 73 73 61 67 65   ....mqtt -message
200012A0   20 6D 71 74 74 2D 73 65   6E 64 0D 00 00 00 00 00    mqtt-se nd......
 � ok.
  not found.
^unshark
Start at 20001290 
           |-pos-|-len-|-data-> 

20001290   0F 00 80 00 C0 03 00 00   00 0A 5E 75 6E 73 68 61   ........ ..^unsha
200012A0   72 6B 0D 00 00 00 00 00   00 00 00 00 00 00 00 00   rk...... ........

After normal response, this sequence is inserted: C0 03 00 00 00 0A 5E
after that, normal serial stream resumes.
We can see, that the call to unshark is spoiled.

compare what the debug says:
cmdResponse: cmd=3 val=540106272 argc=0
Looks like my callback is shifted 2 bytes? ... let' look further....

I don't see any immediate SLIP'ed response after submitting a MQTT subscribe.

This are responses to subscribed messages
I would expect topic and text, too - hm....

           |-pos-|-len-|-data-> 
20001290   1B 00 80 00 C0 03 00 02   00 0A 5E 33 12 C0 C0 03   ........ ..^3....
200012A0   00 02 00 0A 5E C0 03 00   02 00 0A 5E 12 C0 0D 00   ....^... ...^....

For the moment let's focus on the format of the messages:

           |-pos-|-len-|-data-> 
20001290   1B 00 80 00 
C0 03 00 02 00 0A 5E 33 12 C0
C0 03 00 02 00 0A 5E 
C0 03 00 02 00 0A 5E 12 C0 0D 
00   ....^... ...^....

This is really weir'd....
Is my code too slow so that bytes get lost?
does it relate to the work of the 0x0a character?

@wolfgangr
Copy link
Author

does it relate to the work of the 0x0a character?

obviously the internals of ESP-link or the serial line itself get screwed by 0a bytes.
changed them to a printable asterisk * 0x2a

The response to setup now reads:

20001290   16 00 80 00 C0 03 00 00   00 2A 5E 31 2A F7 6B C0   ........ .*^1*.k.
200012A0   20 20 75 6E 73 68 61 72   6B 0D 00 00 00 00 00 00     unshar k.......

which I'd dissassemble as

C0 - 03 00 - 00 00 - 2A 5E 31 2A - F7 6B   -  C0 

Now the subscription callbacks read

20001290   6D 00 80 00 C0 03 00 02   00 2A 5E 34 2A 07 00 66   m....... .*^4*..f
200012A0   6F 6F 2F 62 61 72 00 00   00 08 00 20 74 72 61 6C   oo/bar.. ... tral
200012B0   61 6C 61 00 00 1A 97 C0   C0 03 00 02 00 2A 5E 34   ala..... .....*^4
200012C0   2A 07 00 66 6F 6F 2F 62   61 72 00 00 00 08 00 20   *..foo/b ar..... 
200012D0   74 72 61 6C 61 6C 61 00   00 1A 97 C0 C0 03 00 02   tralala. ........
200012E0   00 2A 5E 34 2A 07 00 66   6F 6F 2F 62 61 72 00 00   .*^4*..f oo/bar..
200012F0   00 08 00 20 74 72 61 6C   61 6C 61 00 00 1A 97 C0   ... tral ala.....
20001300   0D 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00   ........ ........

manually regrouped as follows

C0 - 03 00 02 00 2A 5E 34 2A 
   - 07 00 - 66  6F 6F 2F 62 61 72 - 00 00 00 
   - 08 00 - 20 74 72 61 6C 61 6C 61 - 00 00 
   - 1A 97 - C0  
C0 - 03 00 02 00 2A 5E 34 2A 
   - 07 00 - 66 6F 6F 2F 62 61 72 - 00 00 00 
   - 08 00 - 20 74 72 61 6C 61 6C 61 00 00 
   - 1A 97 - C0 
C0 - 03 00 02 00 2A 5E 34 2A 
   - 07 00 - 66 6F 6F 2F 62 61 72 - 00 00 00 
   - 08 00 20 74 72 61 6C 61 6C 61 00 00 
   - 1A 97 - C0   
0D 
  • SLIP_END
  • CMD_RESP_CB, /**< Response with a callback */
  • two data fields
  • callback 'pointer' aka 0x2a345e2a aka *^4*
  • topic string in 2 byte length + string + pad-to-4
  • data string in 2 byte length + string + pad-to-4
  • checksum
  • SLIP_END

which is not new to us any more

I can remember that with 0x0a in the callback, the ESPL-Log displays "cmd=3"
and the ^3 aka publishedCb which is not correct, instead of the correct cmd=4 and ^4.
Well, at least this is consisten with eating up topic and data...
So it's ESP-link that is disturbed by nonprintable bytes in a supposed-to-be hex address?
Let's avoid that quirk...

@wolfgangr
Copy link
Author

So we need a two layer parser:

  • SLIP ... switch to slip at SLIP_END , unescape SLIP_ESC .... until SLIP_END
  • check checksum
  • MQTT: process the slip encoded chunk
  • match pattern either over first 8 bytes or callback only
  • extract topic and data in case of subscription callback

special case: empty data field:
mosquitto_pub -h 192.168.X.Y -t "foo/bar" -m ""

20001290   1D 00 80 00 C0 03 00 02   00 2A 5E 34 2A 07 00 66   ........ .*^4*..f
200012A0   6F 6F 2F 62 61 72 00 00   00 00 00 00 00 2E 25 C0   oo/bar.. ......%.

we see that

  • nevertheless 2 data fields are announce
  • the data NUL-string consists of just 4 zeroes

@wolfgangr
Copy link
Author

here #430 (comment) I found

obviously the internals of ESP-link or the serial line itself get screwed by 0a bytes.

here #430 (comment) I knew

My code template says 'Command to sync up the esp-link '
01 00 00 00 89 02 00 00 and calls it 'sync'

I still prepend this to any of my commands.
May this cause ESP-Links misbehaviour?
Maybe this is broken?
Maybe I should enclose this (or a correct sync) in 0xC00 aka SLIP_END ?

let's recall the Master's words:

@tve wrote here #430 (comment)

In terms of slip, this is where it happens: https://github.com/jeelabs/esp-link/blob/master/serial/slip.c

Each packet starts and ends with a SLIP_END (0xC0).

The last two bytes are the computed CRC value, which is calculated over all the bytes except the last two.

Note that any 0xC0 or 0xDB value needs to be escaped on the wire
and is unescaped before calculating the checksum.

A packet consists of a 16-bit command
followed by a 16-bit argument count
and a 32-bit "callback value" (really an opaque token).
This is all little endian.

Your best bet is probably to start with the SYNC command
whose code is 0x0001 and
which takes one arg in the value field, which must be non-zero,
e.g., { 0xC0, 0x01, 0x00, 0x01, 0x00, 0x11, 0x11, 0x11, 0x11, crc-lo, crc-hi, 0xC0 }

You should get a response back
{ 0xC0, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, crc-lo, crc-hi, 0xC0 }

Have I learned to understand all that words now?
My sync template seems match the prescribed format, but misses CRC and SYNC_END enclosing
OK, lets do that:

  • fix my snyc
  • correctly escape and unescape
  • correctly parse responses

@wolfgangr
Copy link
Author

wolfgangr commented Feb 2, 2019

@uzi18

I see forth is not so easy ;)

Well, yes, it's more complicated than a pipe symbol | in linux

hook-key @ Constant old-key 
: mykey old-key execute dup shark-the-wire ;
' mykey hook-key ! 

@wolfgangr
Copy link
Author

wolfgangr commented Feb 2, 2019

@tve
Could you please confirm / correct the escape mechanism µC -> ESP-lib:

  • $C0 constant SLIP_END
  • $DB constant SLIP_ESC
  • every 0xC0 becomes 0xDB 0xC0
  • every 0xDB becomes 0xDB 0xDB
  • every ordinary non-SLIP transfer outside SLIP_END brackets has to be encoded
  • the content of a SLIP-command has to be encoded
  • the checksum of a SLIP-command has to be encoded
  • only the SLIP_END enclosing a SLIP-command have not to be encoded
  • a line end aka 0x0D resets slip encoding ?

It's like " and \ on the linux shell, I suppose?

@wolfgangr
Copy link
Author

Spent a hard night with stalling reads.
I get 2 characters from the SLIP-string - that's it.

I hoped I might get closer to the metal,
but when I look into the implementation fo my FORTH's terminal I find
serial-key

   ldr r2, =Terminal_USART_DR
   ldrb tos, [r2]         @ Fetch the character

... and serial-key?

   ldr r0, =Terminal_USART_SR
   ldr r1, [r0]     @ Fetch status
   movs r0, #RXNE
   ands r1, r0

Well, I think this is bare metal...

So what?
Play with the USART configuration in the STM?
In my first trial, I just kept pressing 'ENTER' to get the data.
So maybe I try to write 0x0d to the ESP-link before I throw a timeout?

O do we have some kind of flow control?
I haven't connected any extra wires, so maybe there is XON/XOFF available?
Or does this create problems instead of solving them?
Should I send XON?
or RTFM?

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

neither sending 0x0d nor xon aka 0x11 helps.
I see we have no RTS/CTS
#102 (comment)
#53 (comment)

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

here a transcript of debug data:
any char received is logged twice, enclosed in ' ~ ' aka 0x7E

Start at 20001734 
                       |-pos-|-len-|-data-> 

20001730   61 72 00 BD 9B 00 00 01   7E 45 45 7E 53 53 7E 50   ar...... ~EE~SS~P
20001740   50 7E 4C 4C 7E 2D 2D 7E   73 73 7E 79 79 7E 6E 6E   P~LL~--~ ss~yy~nn
20001750   7E 63 63 7E 20 20 7E 6D   6D 7E 65 65 7E 6D 6D 7E   ~cc~  ~m m~ee~mm~
20001760   73 73 7E 74 74 7E 72 72   7E 2D 2D 7E 63 63 7E 6F   ss~tt~rr ~--~cc~o
20001770   6F 7E 75 75 7E 6E 6E 7E   74 74 7E 65 65 7E 64 64   o~uu~nn~ tt~ee~dd
20001780   7E 20 20 7E 73 73 7E 6C   6C 7E 69 69 7E 70 70 7E   ~  ~ss~l l~ii~pp~
20001790   2D 2D 7E 73 73 7E 65 65   7E 6E 6E 7E 64 64 7E 20   --~ss~ee ~nn~dd~ 
200017A0   20 7E 0D 0D 7E C0 C0 7E   02 02 7E 00 00 7E 00 00    ~..~..~ ..~..~..
200017B0   7E 00 00 7E 89 89 7E 02   02 7E 00 00 7E 00 00 7E   ~..~..~. .~..~..~
200017C0   DA DA 7E 6A 6A 7E C0 C0   7E C0 C0 7E 0D 0D 7E 0D   ..~jj~.. ~..~..~.
200017D0   0D 7E 0D 00 00 00 00 00   00 00 00 00 00 00 00 00   .~...... ........

in plain and with explanation:
...send + 0x20 0x0d end of echoed command,
'---------- here our response to that command begins:
0xC0 = SLIP_END
02 00 00 00 89 02 00 00
DA 6A crc
0xC0 = SLIP_END
'---------- so far , so good
0xC0 = SLIP_END .... still possible, since log tells it sends two commands as answer to sync
0x0D 0x0D 0x0D this is BULLSHITT ???
Two subsequent 0XC0 , followed by several 0x0D, ????
As I see it, Transmission stalls after the SLIP-END that starts the second response.
after some seconds, I press {Enter} ( µC hangs anyway, since it is in SLIP mode and doesn't forward chars to the parser) and get these intermingled with the message I'd suppose to parse.

Do I have a echo problem?
IF so, how could I solve it?
generate my echo locally after stripping SLIP?

lets have a look at the ESP-debug log:

  • recieved a well formed sync
  • responsed a 'wifiCb'->0x289
  • pretends to response a cmd=3
    ah, I see, the line starting with cmdAddCb: is missing
    is this a clue?
822055> SLIP: start or end len=0 inpkt=0
822055> SLIP: start or end len=10 inpkt=1
822055> cmdParsePacket: cmd=1 argc=0 value=649
822055> cmdExec: Dispatching cmd=SYNC
822056> cmdResponse: cmd=2 val=649 argc=0
822056> cmdAddCb: 'wifiCb'->0x289 added at 0
822056> cmdWifiCb: wifiStatus=2
822056> cmdGetCbByName: cb wifiCb found at index 0
822056> cmdResponse: cmd=3 val=649 argc=1

Hm... I'd expected ESP-link to either complete SLIP packages or not start them at all.
How can I sync to broken messages?

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

buffer overrun?
: s ESPL-sync memstr-counted slip-send ;
so I just call 's' for the same sequence of commands.
The response pattern ends identical:
.... C0 C0 DA
If it'd hit a limit of fixed size, I'd expected a longer tail when I cut down the head.

So no blocklength, but a timing issue?
Do I need a interrupt driven UART buffer?
Do I want to reinvent the wheel?

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

Do I want to reinvent the wheel?

of course not, if I do not have to.
There is a interrupt-based USART2 with input ring buffer template - great.
16 Lines of forth - less than my current endeavour.

code snippets related to hardware adresses:

  • $E000E104 constant NVIC-EN1R \ IRQ 32 to 63 Set Enable Register
  • USART2-DR
  • ['] uart-irq-handler irq-usart2 !
  • 6 bit NVIC-EN1R ! \ enable USART2 interrupt 38
  • 5 bit USART2-CR1 bis! \ set RXNEIE

So let's hope that USART1 is hardware equivalent and has the same registers.
ah, here:
http://hightechdoc.net/mecrisp-stellaris/_build/html/interrupts.html?highlight=interrupt#table-b-interrupt-and-exception-vectors

and of course in the datasheet of the manufacturer
pg: 38 memory mapped base address of different components including UARTS:

0x40004400  USART2
0x40013800  USART1

Hints for the hook addresses for forth code as interrupt handlers we can check on the console:

irq-usart2 @ .$ 
0000_478C 
irq-usart1 @ .$ 
0000_478C 

4780 30 dump 
00004780   00 00 09 75 6E 68 61 6E   64 6C 65 64 00 B5 FD F7   ...unhan dled....
00004790   AF FF 14 55 6E 68 61 6E   64 6C 65 64 20 49 6E 74   ...Unhan dled Int
000047A0   65 72 72 75 70 74 20 00   47 F8 04 6D EF F3 05 86   errupt . G..m....

Obviously both interrupts are unassigned and wait for our actions.

There is this RXNEIE thing above which appears to be a per-UART-setting.
We might try just to copy it and read the manual if stuff does not work.

And there is the NVIC-EN1R, which obviously has single bits for different I/O units to enable interrupts. There is a table in the mecrisp-stellaris documentation, referring to something similiar.
However, this refers to Cortex M4 architecture and lists bit 23 prio 35 for USART2, which does not match above stanza, where USART2 is counted as number 38.

BluePill's STM32F103C8T3, in contrast, is labelled as Cortex-M3 by STM.
A pdf search reveals 2.3.5 Nested vectored interrupt controller (NVIC)
Neither search for USART1 nor NVIC reveals the secret of this magic figure :-(

So let's RTFI..nternet and find
http://www.st.com/stonline/products/literature/rm/13902.pdf

on page 199/1134 (admittedly haven't read them all, but I know ESP programmers would be happy with even a tiny subset of that ...) we find on a 'Table 61':

pos prio type of prio acronym       description            address
37   44   settable     USART1  USART1 global interrupt   0x0000_00D4 
38   45   settable     USART2  USART2 global interrupt   0x0000_00D8

This matches the USART2 settings from embello, whom we trust, of course, so we hope we can trust this table, too.

Cleaning up my browser windows, I'd like to save this link:
http://hightechdoc.net/mecrisp-stellaris/_build/html/repl.html?highlight=interrupt
https://sourceforge.net/p/mecrisp/discussion/general/thread/e29549a8d5/?limit=25#4afd/6b1e

emit, emit?, key and key? are the UART handlers.
....
Communication is done from/to buffer variables driven by timers/interrupts

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

first draft:
https://github.com/wolfgangr/forthMQTT3pktControl/blob/327577bda876c8b911f1118d02a0b70fcf2de4cf/uart1-irq.fs
... some basic debugging and we can:

uart1. 
SR 0000 DR 000D BRR 0271 CR1 200C CR2 0000 CR3 0000 GPTR 0000 ok.

@wolfgangr
Copy link
Author

wolfgangr commented Feb 3, 2019

This is weird. just hacked

\ switch running REPL to use irq buffered input queue
: uart1-irq_ulize  
  ['] uart1-irq-key? hook-key? !
  ['] uart1-irq-key  hook-key  !
  uart1-irq-init
;

and run it on the console:

uart1-irq_ulize   ok.
  ok.
  ok.
hook-key @ hex. 200008C0  ok.
' uart1-irq-key hex. 200008C0  ok.

Can't believe it. Hacked. Works. Full stop :-)
https://github.com/wolfgangr/forthMQTT3pktControl/blob/irq_ulize/uart1-irq.fs

#430 (comment)

nice work, but why FORTH?

@uzi18 Therefore :-)

@uzi18
Copy link
Contributor

uzi18 commented Feb 3, 2019

nice, but have no time to understand ;) http://hightechdoc.net/mecrisp-stellaris/_build/html/index.html
what was a problem with bad data recv?

@wolfgangr
Copy link
Author

I think that there is no input buffering in stock mecrisp, since the ASM-primitves read directly from USART registers. The forth-compiler aka REPL seems to be fast enough to read that without loss, but obviously not my approach to extract SLIP-encapsulated MQTT-frames from the input stream.

The STM32 hardware would implement RTS/CTS flow control, but on the ESP-link side, this is still on @tve 's to do list (not on the very top, I'm afraid...)

I have now (configurable) 128 byte of ring buffer, so I hope I can read data as fast as it arrives and have enough time to process then.

@uzi18
Copy link
Contributor

uzi18 commented Feb 3, 2019

so add buffering/dma auto fill

@wolfgangr
Copy link
Author

wolfgangr commented Feb 4, 2019

nice work, but why FORTH?

Actually I had expected that this all had long be implemented by @jcw .
He's been doing a lot of great stuff in forth and for forth.
And as far as I can see, ESP-log at least is living at his guthub-premises, and carries his logo, right?

And on the embello github I find credits for both mecris-stellaris-Wizard Matthias Koch and ESP-link chief maintainer @tve Thorsten van Eicken.

I can only hope that @jcw is not going to make a living with stuff I'm going to publish right now....

@wolfgangr
Copy link
Author

so add buffering/dma auto fill

I hope that this is what I have done.
I'm going to rethink the timing of input-SLIP processing.
In my current code, it's hooked to the REPL interpreter's key wait routine call.

This is a bit of the the upside-down-wrong way around, made it hard to code/understand/debug.

I could hook into the USART-IRQ handler, but I've learned that's bad programming practice - IRQ handlers are supposed to stay as short as possible.

I could use the multitasking framework, but this I'd like to kee open for application development.
The SLIP-Decoder has to handle all user input. If it stalls, the REPL stalls and forth gets unresponsive.
I don't want to fiddle with that all the time.

But I think there are 1-ms-systick interrupts as well.
I think it is a good point to hook SLIP-processing to that.
I think that' also a good occasion to check for protocol errors, buffer overruns and other stuff that might quest for an error handler to reset system's responsiveness - both REPL and MQTT in case of whatever bad things may happen. Maybe sort of a watchdog, actually...

@wolfgangr
Copy link
Author

... watchdog ....

?
There is a irq-watchdog available in mecrisp-stellaris.
But I think that's STM hardware and supposed to do hard resets - clear all ram as well.

When I have REPL or SLIP stalls due to communication quirks, I'd like to keep ram state and just do a simple REPL 'quit'. Clear stack and resume user prompt.

Any errorneous SLIP-message might be silently ignored.

Well, maybe it's a good idea to create a syslog entry.
Is there a SLIP-command for sending arbitrary debug text from the guest-µC to the syslog server ?
I hope so. Have to look. If not, I'll raise an issue... ;-)

@wolfgangr
Copy link
Author

Plan:

UART-RX-irq 
   |
serial input buffer (FIFO)
   |
   SLIP-decoder
   |          |
   |     SLIP-Message-Buffer ( n lines x m char )
   |          |
   |     MQTT-Message-Parser
   |          |
   |          +---> MQTT-triggered event-handlers 
   |
command buffer (FIFO)
   |
REPL aka forth outer interpreter aka compiler

Preliminary considerations regarding buffer size:

  1. serial input buffer
  • 115200 bps / 8 = 14400 bytes per second (actually less)
  • If we call the SLIP decoder at 1 ms intervals, we get <= 15 chars in this time
  • At 460800 baud this figure quadruples to 60 chars
  • speed - for what - vs RAM ...
  • do we want to misuse this buffer for backlog downstream processing?
    I don't think so, because that might cause mutual interlocks between REPL and MQTT
  1. REPL-FIFO
    without SLIP, I did not even realize that I had unbuffered input.
    So the REPL can read char's at that speed, but presumably it does so into a line buffer
    To avoid cross stalling to MQTT, the REPL FIFO had to store as many chars as there may be unprocessed between two adjacent SLIP frames.
    I don't know, but given 80 chars as a normal line size I think 128 byte will be a good start

  2. SLIP Message buffer

  • line size
    somewhere in the sources (ESPL? Client?) I found a figure of 128 byte.
    shall we make that shorter? MQTT-Topic + data + some overhead?
    I'd leave it there for the moment
    Keep it configurable and have the figure in mind when we run out of ram.
    My system says ramfree . 16368 - no reason to be stingy yet
  • line number
    depends on the task structure.
    If we have only one task to process MQTT messages then we need only one buffer line.
    But is this a good idea?
    If it takes time, it will backlog into the input buffer and create cross stalling.
    So we have to decide on the task regime first before we can decide on the buffer structure
  • FIFO ring buffer + start pointers?
    might be a more efficient way to store lines than with a rectangular fixed-length-scheme
  • dynamically allocated lines?
    we have no garbage collector, which is one of the great advantages and disadvantages of FORTH. If we want dynamically allocated strings, we should go for micropython or LUA or ....
  1. MQTT-Task-handler regimes
    My simple control switch app used the multitask feature - so we will want to support this.
    What options do we have?
  • Create a new task for every new command?
    There is a limited number of task slots in the scheme
  • have one task to consume the events
  • leave it to the app
  • don't bother the task manger at all - at any ms we have a new task anyway we might keep alive
    At the moment, I think that sounds good.

But there are still too many open quetions for a final decision

@wolfgangr
Copy link
Author

wolfgangr commented Feb 4, 2019

Let' start at my example
https://github.com/wolfgangr/forthMQTT3pktControl/blob/m014/3pkt.fs#L27
Two tasks I've timed there to run periodically (call-every) in background, independent from each other and from the REPL continuing to accept user prompts in the foreground:

' pv-status. 2000 1 call-every 
' pv-check pv-loop-delay @ 0 call-every 
  • pv-status : report the status of the system
  • pv-check : perform one loop turn of the control task:
    ADC read, some calculation, set outputs

Similiar tasks may typically be implemented as MQTT events, triggered by some MQTT subcription message, instead of a fixed periodic schedule.

So let's dig deeper into the anatomy of the mecrisp-stellaris multitasking framework:

We find that multi-fs is a round-robin cooperative multitasking framework, where any task is coded as running forever, but handing control over to the next task in the list ('suspending') at wait states or some other programmed situations using a pausestatement.

timed extends this to single shot tasks which are called by systicked triggers and supposed to finish quickly each call.

multi-irq just adds calls to flag a task to wake or sleep at the next round robin turn, but itself returning immediately - typically to be called in irq-handlers.

To pin down the concepts with as little distractive complexity. we refer to Matthias Koch's "blinky" examples

All methods have their merits, whose pro's and con's are beyond consideration at this point.
We'd just like to keep them all open as options for the implementation of MQTT event handlers.

However, all of them require some tiny code whith the message as parameter kicked off at successful parsing completion of a SLIP message.

So my idea were:

  • hook the SLIP-parser to systick interrupt and let it process as much of the instream each time as possible
  • allow for a configurable hook for processors to be called when a message parse has completed
  • allow the processing to be done in cooperative switching tasks as well
  • allow the processing to be done in tasks waked up at the end of the systick interrupt

The cooerative task mechanisms then require some kind of message passing that might be triggered at the end of a systick routine that completes a message parse run.
To avoid redundancy, we may leave its details to any later implementation and provide here just the interrupt-triggered kickoff.
Quick finishing tasks (far below the 1 ms-limit) can be implemented right away.
Longer or stalling tasks, or tasks that might require to maintain state or some larger data areas, may be handed over to one of the other task implementation variants.

The interrupt handler will then only leave the required message chunks and maybe some flag in some kind of message box. Any stalling / blocking / long runnig routines are forbidden as they may end up in system instability. Detail can be found in any decent multitasking tutorial.

A simple one we find here - search for "Multitasking on the Quick"

@wolfgangr
Copy link
Author

======================== draft 2019/02/05 08:10:27 ===================
Error handling
Brainstorm what might go wrong and what could be good idea to do then

  • input buffer full
    overwrite oldest byte and provide some debug status info - increment counter e.g.

  • command buffer full
    calculate a timeout, clear buffer and call quit to restart REPL
    maybe provide some debug print of dropped command pipeline

  • slip message buffer full
    similiar timeout approach

  • slip protocol error
    -- crc overrun
    discard errorneous message, maybe debug hexdump
    -- SLIP_END timeout
    discard errorneous message, maybe debug hexdump
    what is a proper timeout value? 100 ms?
    -- premature line end
    this is not a real SLIP error condition, since 0x0D is not escaped and may occur within the line
    However, if closing SLIP_END gets lost, we'll expect a timeout

  • output buffer overrun
    not in our current plan, but might happen if we print debug messages e.g. hexdump of a message
    If we can trigger syslog by SLIP we'll desperately need to address this
    Do we need fifo buffered output as well?
    How is output handled at the moment?

@wolfgangr
Copy link
Author

wolfgangr commented Feb 5, 2019

so add buffering/dma auto fill

no doubt, DMA would have the great advantage that once it runs, we didn' have to care whith IRQ and timing and task management interference issues any more at a later point in time.

I see that I get distracted from by first goal quite a lot, but may be this is skd of nasty work that simply has to be done...

So I just tried to read into the basic nuts'n bolts of STM32 DMA
we can attach both UART RX and TX to DMA, we have a ring buffer, we have byte size transfer - all great stuff.

What puzzles me is the first sentence on pg 278:
"transfer addresses (in the current internal peripheral/memory address register)
are not accessible by software."

That's a sad thing, because that's what would have made a ring buffer implementation quite straightforward.

My dry swimmers considerations for workarounds:

For RX,
we might a use a buffer of size 2^n and let DMA_CNDTRx simply overrun.
If me mask this counter by n bit (log2 of buffer size), we get the
pointer offset to the last byte received - so we are done.

For TX,
we must keep track of a write counter our selves.
That's the writing part of the ring buffer, anyway.
DMA_CNDTRx provides us with a "backlog" of bytes still to be
transferred.

write-pointer - DMA_CNDTRx = read-pointer (sorry, no RPL...)

We have to increment this value every time we add to the
buffer, and it gets decremented by DMA each time the UART is pulling
data off.

========================

After scanning the STM manual sections for DMA and USART, I think

  • it might work as sketched out for the RX side.
    Blocking / unblocking is triggererd by the incoming side which is hardware and integrated into the DMA framework.
    The DMA_CNDTRx is reset to zero afte reaching the top and thus can resemble a ring buffer input pointer.

  • I don't see that it is that simple on the TX side.
    The DMA_CNDTRx is not only for counting the backlog, but also to manage buffer size.
    I don't see a way to controll offset from memory buffer start into a certain location.
    It seems that this offset is zero at enabling DMA and increased by one every transfer step, along with a decrease of DMA_CNDTRx
    This mechanics would be straighforward to write out a linear ( adr - len -- ) type buffer.
    However, we'd to make sure that the app does not overwrite the buffer until DMA controlled TX has finished. Either copy the buffer or deeply rewrite the apps.

  • RX and TX share the same interrupt. We have to check flags in the IRQ handler.

Essence:

  • There is no easy way to get rid of USART interrupts by using DMA
  • it might be simple for the RX side , so i might give it a try with limited effort
  • shared interrupts for RX and TX appears manageable

@wolfgangr
Copy link
Author

Live comes into the way.
Customers are threatening whith orders (not for bits'n bytes, different business)
Chance for another one not-yet-finished project under my desk :-\\

Refocus:

Top goal:
get a working gadget that can run callbacks triggered by the receipt of previously configured MQTT subscription messages

required for basic funtionality:

  • IRQ TX
  • SLIP-decoder in the input stream
  • message handler dispatcher
  • debug to REPL command line

required for quality functionaltiy:

nice add non feature:

  • DMA RX/TX

@wolfgangr
Copy link
Author

wolfgangr commented Feb 7, 2019

IRQ TX

looks good: both RX and TX are handled over IRQ-driven ring buffer:
https://github.com/wolfgangr/forthMQTT3pktControl/blob/RX-TX-irq_ulize/uart1-irq.fs

next step is 'SLIP-decoder in the input stream'
above I considered to run this once every ms from systick.
Hower, irq-priority of systick is much higher than of UART, So I'd block reading during this processing and thus loose chars - bad idea.
I could trigger at task in the task framework.
But can this block my reading to the REPL?
There are pause clauses in the keywaits, so this might trigger input procession in some correct sequence.
And if a task is misbehaving? Then the REPL will freeze anyway.
But if I wan't to switch do singletask, my SLIP is not processed - bad idea.

Can I include the slip-decoder in the IRQ-handler?
This would save one buffer as well and thus precious ram.
Do we have enough time? let's see....

@wolfgangr
Copy link
Author

wolfgangr commented Feb 10, 2019

draftet some kind of state machine:
https://github.com/wolfgangr/forthMQTT3pktControl/blob/e995135c562b4de3467b94137152f6cad1f735c8/slip-handler.fs#L90

  • planned to be plugged as serial RX interrupt handler
  • handle SLIP_ESC unescaping
  • handle SLIP_END frame unpacking
  • handle SLIP errors by fallback after configurable timeout and message length
  • write ordinary terminal stream to RX ring buffer for FORTH REPL
  • write SLIP frames to .... hm, well, back again on this question now.

hav been there a week ago :-(((
#430 (comment)

@wolfgangr
Copy link
Author

wolfgangr commented Feb 10, 2019

Have an eye on the reference implementation:

https://github.com/jeelabs/el-client/blob/master/ELClient/ELClient.h#L100

    //== Responses
    // Process the input stream, call this in loop() to dispatch call-back based responses.
    // Callbacks on FP are invoked with an ElClientResponse pointer as argument.
    // Returns the ELClientPacket if a non-callback response was received, typically this is
    // used to create an ELClientResponse. Returns NULL if no response needs to be processed.
ELClientPacket *Process(void);

and here I i see that this is their state machine:
https://github.com/jeelabs/el-client/blob/master/ELClient/ELClient.cpp#L115

So they delay the untwining of SLIP/non-SLIP to the loop(), prepending a ring buffer ( I suppose that's what a 'stream' attached to a serial line boils down to)

Do they have to provide for a situation where SLIP command processing stalls while ohter chars have to be processed?

https://github.com/jeelabs/el-client/blob/master/ELClient/ELClient.h#L105

    // Busy wait for a response with a timeout in milliseconds, returns an ELClientPacket
    // if a response was recv'd and NULL otherwise. The ELClientPacket is typically used to
    // create an ELClientResponse.

OK, forget it.
If they cann afford busy wait for Client response, they don't care about something else happening anyway...

Well .... does ESP-Link end us commands at all ????
Le't pretend we don't know....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants