Is there a maximum script length? #25

Open
tillig opened this issue Jan 23, 2019 · 16 comments

Comments

@tillig
Contributor

tillig commented Jan 23, 2019

I've got the ws2812svr starting at Raspberry Pi / Raspbian startup via an /etc/init.d script like this:

#! /bin/sh
# /etc/init.d/ws2812svr

### BEGIN INIT INFO
# Provides:  ws2812svr
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop:
# Short-Description: Starts the ws2812svr LED controller.
# Description: Runs the ws2812svr listener on port 9999 to handle LED control.
### END INIT INFO
case "$1" in
  start)
    echo "Starting ws2812svr"
    /usr/bin/ws2812svr -tcp 9999 &
    ;;
  stop)
    echo "Stopping ws2812svr"
    killall ws2812svr
    ;;
  *)
    echo "Usage: /etc/init.d/ws2812svr {start|stop}"
    exit 1
    ;;
esac
exit 0

I have a web app connecting to port 9999 and sending scripts to the lights which include the setup, like:

setup 1,540,3,0,64;init;thread_start;fill 1,ff0000;render;thread_end;

I am able to send any number of short scripts like this with no issue. I can send, like, 10 in a row, no issues.

However, I did notice that if I send a script that exceeds around 1000 characters the whole Raspberry Pi crashes, or at least it hangs so hard it won't respond to any network requests anymore including SSH. I end up having to hard reset it by power cycling.

I thought it was #20, but since I can send the setup any number of times with smaller scripts, I figured I might be running into something else, like a buffer overflow.

Possibly related, possibly a separate thing... when I was trying to set the server up like this...

/usr/bin/ws2812svr -i "setup 1,540,3,0,64;init;" -tcp 9999 &

...I was only able to send a single script (which didn't include setup/init) before my lights stopped responding. I wasn't sure how to determine what was going on. I do know that when I removed the inline setup and went back to sending setup/init with the scripts things started working predictably again. Except, of course, in the case I had a longer script.

@tom-2015
Owner

Yes, there is a default maximum size for one command (a single command, not the entire script), defined here:
#define DEFAULT_COMMAND_LINE_SIZE 1024
There is a check that prevents overflow, but maybe something is wrong with it. You can change the number and recompile; maybe that solves the problem.
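As an illustration of what such a length check involves, here is a minimal C sketch that splits a script on `;` and refuses any single command that would not fit in a fixed buffer. It is not the ws2812svr code: `split_commands`, `command_fn` and `MAX_COMMAND_LEN` are hypothetical names, with the limit borrowed from `DEFAULT_COMMAND_LINE_SIZE`. The point is that the length is compared against the buffer size before anything is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-command limit, mirroring DEFAULT_COMMAND_LINE_SIZE. */
#define MAX_COMMAND_LEN 1024

/* Hypothetical callback type: receives one NUL-terminated command. */
typedef void (*command_fn)(const char *cmd, void *ctx);

/*
 * Split `script` on ';' and pass each non-empty command to `fn`
 * (which may be NULL to just count commands). Returns the number of
 * commands, or -1 as soon as a command is too long to fit, with its
 * NUL terminator, in a MAX_COMMAND_LEN buffer. Commands before the
 * oversized one have already been dispatched at that point.
 */
int split_commands(const char *script, command_fn fn, void *ctx)
{
    char buf[MAX_COMMAND_LEN];
    int count = 0;

    assert(script != NULL);
    while (*script != '\0') {
        const char *end = strchr(script, ';');
        size_t len = end ? (size_t)(end - script) : strlen(script);

        if (len >= sizeof buf)
            return -1;              /* check BEFORE memcpy: no overflow */
        if (len > 0) {
            memcpy(buf, script, len);
            buf[len] = '\0';
            if (fn != NULL)
                fn(buf, ctx);
            count++;
        }
        script += len;
        if (*script == ';')
            script++;               /* skip the separator */
    }
    return count;
}
```

In this sketch a trailing newline (as in `render;\n`) would come through as its own one-character command; a real parser would also trim whitespace.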

If you use the thread_start/end commands, the buffer starts at a default size of 32768 characters and is supposed to grow until all RAM is used. But something could be wrong with this as well; I'll have to check...
#define DEFAULT_BUFFER_SIZE 32768
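"Grow until all RAM is used" is typically implemented by doubling a heap buffer with `realloc` whenever an append would not fit. Below is a minimal sketch of that pattern; the `growbuf` type and its functions are hypothetical, and only the 32768 starting size comes from the snippet above. A failed `realloc` leaves the old block valid, so the code only replaces the pointer once the call succeeds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_BUFFER_SIZE 32768   /* mirrors DEFAULT_BUFFER_SIZE */

typedef struct {
    char  *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} growbuf;

int growbuf_init(growbuf *b)
{
    assert(b != NULL);
    b->data = malloc(INITIAL_BUFFER_SIZE);
    if (b->data == NULL)
        return -1;
    b->data[0] = '\0';
    b->len = 0;
    b->cap = INITIAL_BUFFER_SIZE;
    return 0;
}

/* Append n bytes, doubling capacity as needed. Returns 0 or -1. */
int growbuf_append(growbuf *b, const char *src, size_t n)
{
    assert(b != NULL && b->data != NULL && src != NULL);
    if (n > SIZE_MAX - b->len - 1)
        return -1;                       /* would overflow size_t */
    size_t need = b->len + n + 1;        /* +1 for the NUL */
    if (need > b->cap) {
        size_t cap = b->cap;
        while (cap < need)
            cap = (cap > SIZE_MAX / 2) ? need : cap * 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;                   /* old buffer is still valid */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

void growbuf_free(growbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```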

Maybe you can try changing one of these default buffer sizes and see if that solves the problem?

@tillig
Contributor Author

tillig commented Jan 24, 2019

I increased DEFAULT_COMMAND_LINE_SIZE to 32768 for testing. It didn't seem to make a difference with respect to hanging. I have a strip of 540 LEDs and tried doing 54 evenly-spaced chasing lights: basically a string of fill 1,ff0000,X,1 commands to fill the initial chasers, followed by a loop to rotate. Here's a shorter version of the script with 20 chasers:

setup 1,540,3,0,64;init;thread_start;fill 1,ff0000,0,1;fill 1,ff0000,27,1;fill 1,ff0000,54,1;fill 1,ff0000,81,1;fill 1,ff0000,108,1;fill 1,ff0000,135,1;fill 1,ff0000,162,1;fill 1,ff0000,189,1;fill 1,ff0000,216,1;fill 1,ff0000,243,1;fill 1,ff0000,270,1;fill 1,ff0000,297,1;fill 1,ff0000,324,1;fill 1,ff0000,351,1;fill 1,ff0000,378,1;fill 1,ff0000,405,1;fill 1,ff0000,432,1;fill 1,ff0000,459,1;fill 1,ff0000,486,1;fill 1,ff0000,513,1;render;do;delay 10;rotate;render;loop;thread_stop;
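For reference, the fill positions in the scripts posted here are what you get by stepping through the strip in increments of `leds / chasers` with integer division: 540 / 20 = 27 gives 0, 27, ..., 513, and 540 / 54 = 10 gives 0, 10, ..., 530. Because 540 / 40 truncates to 13, the "40 chaser" script further down actually contains 42 fill commands (0 through 533). A minimal C sketch of such a generator follows, assuming that is roughly how the app builds its scripts; `build_chaser_script` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical generator for a "chaser" script in the ws2812svr command
 * syntax used in this thread: one lit LED every leds/chasers positions,
 * then rotate forever inside a thread. Setup parameters other than the
 * LED count (1, 3, 0, 64) are copied verbatim from the posted scripts.
 * Writes into `out` (capacity `cap`, including the NUL) and returns the
 * script length, or -1 if it does not fit.
 */
int build_chaser_script(char *out, size_t cap, int leds, int chasers,
                        const char *color, int delay_ms)
{
    assert(out != NULL && color != NULL && leds > 0 && chasers > 0);
    int step = leds / chasers;          /* integer division truncates */
    if (step < 1)
        step = 1;

    int n = snprintf(out, cap, "setup 1,%d,3,0,64;init;thread_start;", leds);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;

    for (int pos = 0; pos < leds; pos += step) {
        n = snprintf(out + used, cap - used, "fill 1,%s,%d,1;", color, pos);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used,
                 "render;do;delay %d;rotate;render;loop;thread_stop;", delay_ms);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

With 540 LEDs, 54 chasers, color 800000 and delay 100, this produces the same commands as the 54-chaser script posted later in the thread, 1157 characters without a trailing newline.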

I wasn't able to get 54 chasers going, it hung and dropped the wifi connection.

I was able to get 20 chasers going (the script is a bit under 500 characters) but I did see it freeze in the middle of execution and, again, drop the wifi connection. I wasn't sending anything to it, just letting it run.

I'm not sure if this is a hardware limitation, like if I'm expecting the Raspberry Pi 3B to do too much or what. Still researching.

@tillig
Contributor Author

tillig commented Jan 24, 2019

More testing with DEFAULT_COMMAND_LINE_SIZE set to 32768: I got 40 chasers working, but 50 consistently hangs. The 40-chaser script is 988 characters long.

setup 1,540,3,0,64;init;thread_start;fill 1,ff0000,0,1;fill 1,ff0000,13,1;fill 1,ff0000,26,1;fill 1,ff0000,39,1;fill 1,ff0000,52,1;fill 1,ff0000,65,1;fill 1,ff0000,78,1;fill 1,ff0000,91,1;fill 1,ff0000,104,1;fill 1,ff0000,117,1;fill 1,ff0000,130,1;fill 1,ff0000,143,1;fill 1,ff0000,156,1;fill 1,ff0000,169,1;fill 1,ff0000,182,1;fill 1,ff0000,195,1;fill 1,ff0000,208,1;fill 1,ff0000,221,1;fill 1,ff0000,234,1;fill 1,ff0000,247,1;fill 1,ff0000,260,1;fill 1,ff0000,273,1;fill 1,ff0000,286,1;fill 1,ff0000,299,1;fill 1,ff0000,312,1;fill 1,ff0000,325,1;fill 1,ff0000,338,1;fill 1,ff0000,351,1;fill 1,ff0000,364,1;fill 1,ff0000,377,1;fill 1,ff0000,390,1;fill 1,ff0000,403,1;fill 1,ff0000,416,1;fill 1,ff0000,429,1;fill 1,ff0000,442,1;fill 1,ff0000,455,1;fill 1,ff0000,468,1;fill 1,ff0000,481,1;fill 1,ff0000,494,1;fill 1,ff0000,507,1;fill 1,ff0000,520,1;fill 1,ff0000,533,1;render;do;delay 100;rotate;render;loop;thread_stop;

I can consistently get it to hang on 54 chasers, an 1158-character script. The first time I send it, it seems to work fine. The second time, say with a different color, it will consistently hang the whole RPi.

setup 1,540,3,0,64;init;thread_start;fill 1,800000,0,1;fill 1,800000,10,1;fill 1,800000,20,1;fill 1,800000,30,1;fill 1,800000,40,1;fill 1,800000,50,1;fill 1,800000,60,1;fill 1,800000,70,1;fill 1,800000,80,1;fill 1,800000,90,1;fill 1,800000,100,1;fill 1,800000,110,1;fill 1,800000,120,1;fill 1,800000,130,1;fill 1,800000,140,1;fill 1,800000,150,1;fill 1,800000,160,1;fill 1,800000,170,1;fill 1,800000,180,1;fill 1,800000,190,1;fill 1,800000,200,1;fill 1,800000,210,1;fill 1,800000,220,1;fill 1,800000,230,1;fill 1,800000,240,1;fill 1,800000,250,1;fill 1,800000,260,1;fill 1,800000,270,1;fill 1,800000,280,1;fill 1,800000,290,1;fill 1,800000,300,1;fill 1,800000,310,1;fill 1,800000,320,1;fill 1,800000,330,1;fill 1,800000,340,1;fill 1,800000,350,1;fill 1,800000,360,1;fill 1,800000,370,1;fill 1,800000,380,1;fill 1,800000,390,1;fill 1,800000,400,1;fill 1,800000,410,1;fill 1,800000,420,1;fill 1,800000,430,1;fill 1,800000,440,1;fill 1,800000,450,1;fill 1,800000,460,1;fill 1,800000,470,1;fill 1,800000,480,1;fill 1,800000,490,1;fill 1,800000,500,1;fill 1,800000,510,1;fill 1,800000,520,1;fill 1,800000,530,1;render;do;delay 100;rotate;render;loop;thread_stop;

Again, unclear if there's a hardware limitation or what. I'm not sure how to do much diagnosis or see if there are errors cropping up. I have ws2812svr running at startup on a headless system and when it hangs it drops the connection so I can't really see if there's an error popping up.

@tom-2015
Owner

I will take a look at it as soon as I have time; I think it must be some kind of buffer overflow...

@tom-2015
Owner

tom-2015 commented Feb 6, 2019

I've tried your code and it seems to work fine on my Raspberry Pi 3B. In the beginning there was a bit of lag on SSH, but no total crash. You are running it as a service in the background, right? What happens if you start it manually from SSH?

Also: after the final thread_stop; there needs to be an ENTER (a newline), or the command will not be processed when the connection is closed.
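Because the server only acts on a script once it sees that ENTER, a client that builds scripts by joining commands with `;` has to append the newline itself. Here is a small sketch that does so without doubling an existing newline; `terminate_script` is a hypothetical helper, not part of ws2812svr.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy `script` into `out` (capacity `cap`) and make sure it ends with
 * '\n'. Returns the resulting length, or -1 if `out` is too small.
 */
int terminate_script(char *out, size_t cap, const char *script)
{
    assert(out != NULL && script != NULL);
    size_t len = strlen(script);
    int needs_nl = (len == 0 || script[len - 1] != '\n');
    size_t total = len + (needs_nl ? 1 : 0);

    if (total + 1 > cap)            /* +1 for the terminating NUL */
        return -1;
    memcpy(out, script, len);
    if (needs_nl)
        out[len] = '\n';
    out[total] = '\0';
    return (int)total;
}
```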

@tillig
Contributor Author

tillig commented Feb 6, 2019

I haven't noticed a difference between running it interactively and as a service, but I'll try again to validate. I do know I have not been sending any \n at the end of the command string: I've been joining the commands with semicolons and I've ensured there's a semicolon at the end, but I didn't add the newline. I'll add that too.

@tillig
Contributor Author

tillig commented Feb 8, 2019

I added a '\n' and it didn't seem to make a difference. That said, I was able to get the above script to run once and then the second time I tried it using a different color - hang. I rebooted the Pi and tried again - I was able to run the long script twice and it hung on the third time.

I saw this behavior both as a background service and interactive.

I'm starting the service using sudo ./ws2812svr -tcp 9999 and passing the setup in as part of the script. I did it that way so I could store configuration settings in my app rather than having to change it in both the service startup and in the app. (The app uses things like number of lights in the strand to do math and generate scripts.)

When it hangs during interactive runs, it appears I don't get to the point where it says "Client connected.":

pi@pilights:~ $ sudo /usr/bin/ws2812svr -tcp 9999
Listening on 9999.
Waiting for client to connect.
Client connected.
Waiting for client to connect.
Client connected.
Waiting for client to connect.

In the above scenario, I ran the long script first with red, then with green... and the third try I was going for blue but that's where you see Waiting for client to connect but no actual connection. My app thinks it connected correctly to the socket and sent the data but since it's kind of fire-and-forget I can't really tell if the data was properly received. It appears it's not.

Do you have a test script somewhere that you use to send TCP commands to your server? I could try my command string using your test script and see if maybe I'm doing something wrong in my client app's TCP communication. I mean, it should be straightforward, but who knows.

@tom-2015
Owner

tom-2015 commented Feb 8, 2019

Which Raspberry board / Raspbian version are you using? Maybe something changed recently?

Can you try running the server directly on the Pi (attach a keyboard and monitor)? Maybe it will print something you cannot see over SSH, because at that point SSH is already dead.

Command line option -d will also enable debugging output. This will print some things about what it is processing but not sure if it will help.

@tillig
Contributor Author

tillig commented Feb 8, 2019

I'm on a Raspberry Pi 3B (not 3B+).

I can try running it locally, but not with the lights connected. My lights are mounted around my ceiling along with the power supply and Pi, and I can't get cables up there. I'm not sure that would be of much value.

Will try the -d when I get a chance.

@tom-2015
Owner

tom-2015 commented Feb 8, 2019

It should not make a difference unless it is a power supply problem. The communication is one-way, so the program never knows how many LEDs are attached.

@tillig
Contributor Author

tillig commented Feb 9, 2019

I hooked up a keyboard and monitor locally and ran:

sudo /usr/bin/ws2812svr -d -tcp 9999

I then ran my app and sent in the long script. A lot of text shot out and then when I tried to send the script again (this time with a different color), the hang occurred - or, at least, where I assume the hang would be, because the text stopped scrolling and the new script didn't take effect.

The tail of the log, including my attempt to stop and start the server again, looks like this:

Rotate 0 1 1 0
Render (null)
loop 1106
Rotate 0 1 1 0
Render (null)
loop 1106
Rotate 0 1 1 0
Render (null)
Exit thread.
Client connected.
^C
pi@pilights:~ $ sudo /usr/bin/ws2812svr -d -tcp 9999
Listening on 9999.
ERROR on binding.
pi@pilights:~ $ sudo killall ws2812svr
ws2812svr: no process found
pi@pilights:~ $ sudo /usr/bin/ws2812svr -d -tcp 9999
Listening on 9999.
ERROR on binding.

That last "Client connected." is where I tried to send the script the second time. My app thinks it sent the script and disconnected; the server doesn't seem to register that and just sits there. As you can see, I exited the server with Ctrl+C, which worked, but when I tried to re-run it, it looked like the port was still bound to something.

netstat doesn't show anything bound to port 9999. Only SSH is listening on any TCP port that I can see.
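One common reason for `ERROR on binding.` right after restarting a TCP server is that the port is still tied up by the previous connection, for example in the TIME_WAIT state. Such sockets only appear in `netstat` when non-listening sockets are included (e.g. `netstat -tan`), so a listening-only view would show nothing on 9999. Servers commonly set `SO_REUSEADDR` before `bind()` to avoid this. Whether that is what happened here, and whether ws2812svr already sets the option, is not established; the sketch below only shows the general pattern, and `open_listener` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP listener on all IPv4 interfaces.
 * `port` 0 lets the kernel choose. SO_REUSEADDR is set before bind()
 * so a restart is not blocked by an old connection on the same port
 * that is still in TIME_WAIT. Returns the listening fd, or -1.
 */
int open_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;

    assert(backlog > 0);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0
        || bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0
        || listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```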

After about 10 minutes of looking at netstat data, I tried running the server again and port 9999 was available again. It came up with

Listening on 9999.
Waiting for client to connect.

However, when I tried to connect to port 9999 via my app, my app hung on the connect command - I never could connect again.

I exited the server with Ctrl+C and did sudo reboot. Interestingly enough, it also hung during reboot when trying to stop all the network interfaces.

[ OK ] Stopped target Network.
       Stopping Raise network interfaces...
       Stopping dhcpcd on all interfaces...
[ OK ] Stopped Raise network interfaces.

And that's where it hung. I had to hard power off/on to get it to reboot.

Now, here's where it gets weird. I decided to step through the connect/send/disconnect in my app to see where things get hung up... and when I was stepping through I couldn't reproduce the issue.

I now find that if I introduce a half-second of sleep time between connecting to the server and sending the new script that things pretty consistently work. While I can't prove it at the moment, it feels as though the thread from the previous script needs time to stop before the new script starts with a new thread; and the sleep time between the connect and the send is giving it just enough time to be properly handled. Or maybe I had connected but the server wasn't quite ready to receive everything? I dunno. Regardless, here's what it looks like in C#:

socket.Connect(endpoint);
// Adding this sleep time seems to be the magic.
Thread.Sleep(500);
socket.Send(msg, 0, msg.Length, SocketFlags.None);
socket.Shutdown(SocketShutdown.Both);
socket.Close();

I'll keep messing with it, but with that sleep time on my side I'm getting pretty consistent success, at least when running with the keyboard/monitor and disconnected from the lights. I'm going to hook it all back up for real and try some more.
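For comparison, here is the same connect, pause, send, shutdown sequence as a C sketch. `send_script` and its `delay_ms` parameter are hypothetical; the pause simply mirrors the 200 to 500 ms workaround found here and is not a fix for whatever the server is doing. It loops on `send()` because a single POSIX `send()` call may write fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on partial writes. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Connect to host:port, wait `delay_ms` (not retried if interrupted by
 * a signal), send the script, then shut down and close the socket.
 * Returns 0 on success, -1 on any failure.
 */
int send_script(const char *host, const char *port,
                const char *script, unsigned delay_ms)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(host != NULL && port != NULL && script != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    struct timespec ts = { .tv_sec = delay_ms / 1000,
                           .tv_nsec = (long)(delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);               /* the "magic" pause */

    int rc = send_all(fd, script, strlen(script));
    shutdown(fd, SHUT_RDWR);
    close(fd);
    return rc;
}
```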

@tillig
Contributor Author

tillig commented Feb 9, 2019

Verified - I've been messing with this for an hour, sending script after script at it, and if I have the sleep in place it seems to work without fail. I'm not sure what the minimum sleep value needs to be, but I've slowly been reducing it and I'm down to 200ms without any issue, which for my purposes is plenty enough "real time script change."

@tom-2015
Owner

When you close the connection, the server needs a little bit of time to go back to listen mode. It seems it can't handle a new connection request while switching from connected back to listen mode. That's strange, though, because Raspbian should take care of all this and not crash the network interfaces...
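For reference, a TCP server does not have to be sitting in `accept()` for a client's connect to succeed: after `listen()`, the kernel completes incoming handshakes and queues them, up to the backlog passed to `listen()`, until the process calls `accept()`. The sketch below demonstrates that in a single process. It is a general illustration only; whether ws2812svr keeps its listening socket open between clients is not verified here, and `demo_backlog` is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Show that a client can connect while the server is NOT in accept():
 * the kernel queues the completed connection on the listening socket.
 * Returns 0 if the queued connection is accepted afterwards, else -1.
 */
int demo_backlog(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = -1, cfd = -1, sfd = -1, rc = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* kernel picks a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0
        || listen(lfd, 4) < 0
        || getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* "Server busy": no accept() yet, but the client connects anyway. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    /* Later the server calls accept() and finds the queued connection. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd >= 0)
        rc = 0;

out:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```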

@tillig
Contributor Author

tillig commented Feb 10, 2019

I don't know enough about how it gets handled at the OS level, but to be clear, I'm not hammering it with rapid-fire connects and disconnects; it's more like: connect, send, disconnect, watch the lights for a few seconds. It's the subsequent connect and send that gets stuck, so it seems more like something needs time between connect and receive. Many of my scripts have thread start/end; any chance something is hanging while the thread ends? The thread ends on connect, right?

@tom-2015
Owner

Yes, it could be that the program hangs somewhere, but that should not mess up your network; SSH should still continue to work.

@CHBeeblebrox

Inserting 'ipv6.disable=1' in '/boot/cmdline.txt' solved the problem for me.


3 participants