This repository has been archived by the owner on Jan 5, 2019. It is now read-only.

Respawn usb_cam_node after crash caused by usb port reconnection improve get_topic_data message and error handling #339

Open: goruck wants to merge 6 commits into base: develop

goruck (Contributor) commented Sep 2, 2017

The usb_cam_node driver crashes frequently because the USB port loses connection with the camera and the driver tries to read from it before the port gets automatically reconnected by the Raspberry Pi's OS.

You'll see an error like the following:

TRACE> arduino_handler serial read: >0,93.80,26.80,452,30.50,0,0,3.88,0.08<
[swscaler @ 0x11f3860] No accelerated colorspace conversion found from yuv422p to rgb24.
[ INFO] [1503837690.772792494]: Saved image /var/www/html/img.jpg
[ERROR] [1503837690.774624353]: Videre INI format can only save calibrations using the plumb bob distortion model. Use the YAML format instead.
	distortion_model = '', expected 'plumb_bob'
	D.size() = 0, expected 5
[ERROR] [1503837691.449486717]: VIDIOC_DQBUF error 19, No such device
[DEBUG] [WallTime: 1503837691.491640] 
TRACE> arduino_handler serial write 89 bytes: >0,0.0,0.0,False,False,False,False,False,False,0.5,True,True,True,0.0,0.0,0.0,False,False<
[WARN] [WallTime: 1503837691.493645] (5, 'Input/output error')
[WARN] [WallTime: 1503837691.503485] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837691.707030] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837691.908940] No serial device found on system in /dev/serial/by-id
[environments/environment_1/aerial_image-18] process has died [pid 8221, exit code 1, cmd /home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node __name:=aerial_image __log:=/home/pi/.ros/log/cb12cffa-8b1f-11e7-92a4-b827eb6991c6/environments-environment_1-aerial_image-18.log].
log file: /home/pi/.ros/log/cb12cffa-8b1f-11e7-92a4-b827eb6991c6/environments-environment_1-aerial_image-18*.log
[WARN] [WallTime: 1503837692.110899] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837692.312769] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837692.514701] No serial device found on system in /dev/serial/by-id

Note that the USB port is reconnected automatically, but not before the driver attempts a read, which causes it to crash.

The VIDIOC_DQBUF error above is generated in the usb_cam ROS driver here:

case IO_METHOD_MMAP:
      CLEAR(buf);

      buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      buf.memory = V4L2_MEMORY_MMAP;

      if (-1 == xioctl(fd_, VIDIOC_DQBUF, &buf))
      {
        switch (errno)
        {
          case EAGAIN:
            return 0;

          case EIO:
            /* Could ignore EIO, see spec. */

            /* fall through */

          default:
            errno_exit("VIDIOC_DQBUF");
        }
      }

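Error 19 is ENODEV ("No such device"), which the switch does not handle explicitly, so it reaches the default branch and calls errno_exit, which terminates the process: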

static void errno_exit(const char * s)
{
  ROS_ERROR("%s error %d, %s", s, errno, strerror(errno));
  exit(EXIT_FAILURE);
}
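
To make the failure path easier to follow, here is a minimal Python sketch of the same dispatch. It is not the driver's code, and the helper name classify_dqbuf_errno is hypothetical. It mirrors the switch quoted above: EAGAIN means "try again later", while everything else, including EIO via the fall-through, is treated as fatal.

```python
import errno


def classify_dqbuf_errno(err):
    """Mirror the switch in the quoted driver code (hypothetical helper).

    Returns "retry" where the driver would return 0 and try again later,
    and "fatal" where it would call errno_exit() and terminate.
    """
    if err == errno.EAGAIN:
        return "retry"
    # EIO falls through to the default branch in the C++ code,
    # so it is handled exactly like every other error.
    return "fatal"


if __name__ == "__main__":
    # Error 19 in the log above is ENODEV on Linux.
    for name in ("EAGAIN", "EIO", "ENODEV"):
        code = getattr(errno, name)
        print(name, code, classify_dqbuf_errno(code))
```

Since ENODEV lands in the fatal branch, a camera that briefly disappears from the bus is enough to end the node.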

The simple solution is to set the respawn="true" attribute on the usb_cam_node entry in the launch file, so that ROS automatically restarts the node if it crashes.

This seems to work well as evidenced by the roslaunch log file:

[roslaunch][ERROR] 2017-09-02 12:55:32,524: [environments/environment_1/aerial_image-18] process has died [pid 20145, exit code 1, cmd /home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node __name:=aerial_image __log:=/home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18.log].
log file: /home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18*.log
[roslaunch][INFO] 2017-09-02 12:55:32,526: [environments/environment_1/aerial_image-18] restarting process
[roslaunch][INFO] 2017-09-02 12:55:32,527: process[environments/environment_1/aerial_image-18]: restarting os process
[roslaunch][INFO] 2017-09-02 12:55:32,528: process[environments/environment_1/aerial_image-18]: start w/ args [[u'/home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node', u'__name:=aerial_image', u'__log:=/home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18.log']]
[roslaunch][INFO] 2017-09-02 12:55:32,529: process[environments/environment_1/aerial_image-18]: cwd will be [/home/pi/.ros]
[roslaunch][INFO] 2017-09-02 12:55:32,547: process[environments/environment_1/aerial_image-18]: started with pid [20212]
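
For reference, the launch-file change described above might look roughly like the fragment embedded in the sketch below. The project's actual launch file is not shown in this thread, so the pkg, type, and name values are assumptions inferred from the command line in the log. Python's ElementTree is used here only to check that the attribute is set; the helper respawn_enabled is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed launch fragment: the node attributes are inferred from the log
# output, not copied from the project's real launch file.
LAUNCH_FRAGMENT = """
<launch>
  <node pkg="usb_cam" type="usb_cam_node" name="aerial_image" respawn="true"/>
</launch>
"""


def respawn_enabled(launch_xml, node_name):
    """Return True if the named <node> element has respawn="true"."""
    root = ET.fromstring(launch_xml)
    for node in root.iter("node"):
        if node.get("name") == node_name:
            # A missing attribute is treated as "false" in this sketch.
            return node.get("respawn", "false").lower() == "true"
    raise KeyError(node_name)


if __name__ == "__main__":
    print(respawn_enabled(LAUNCH_FRAGMENT, "aerial_image"))
```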

It's not clear why the USB ports disconnect in the first place, but since they are reconnected automatically, this change mitigates the crash of the USB camera driver. We should still look for the root cause of the USB disconnects on the Raspberry Pi.

sp4ghet previously approved these changes Sep 3, 2017

goruck (Contributor, Author) commented Sep 24, 2017

I needed to add the respawn_delay="30" attribute to the usb_cam_node entry in the launch file, since the reconnection may fail if attempted too soon after a crash. With this, the camera always reconnects for me, and as an added bonus it stays on the same port, so the top camera remains mapped to /dev/video0, assuming it's physically plugged into the right USB port on the Raspberry Pi.
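
The effect of restarting with a delay can be modelled in plain Python. This is only a sketch of the behavior, not roslaunch's implementation: run_with_respawn and max_restarts are hypothetical, and a child process that always exits with code 1 stands in for the crashing usb_cam_node.

```python
import subprocess
import sys
import time


def run_with_respawn(cmd, respawn_delay=30.0, max_restarts=3):
    """Model of restart-after-delay (hypothetical helper, not roslaunch code).

    Runs cmd and, each time it exits, waits respawn_delay seconds before
    starting it again. max_restarts exists only so the sketch terminates.
    Returns the exit codes in the order they were observed.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        if attempt > 0:
            # Give the USB device time to re-enumerate before retrying.
            time.sleep(respawn_delay)
        exit_codes.append(subprocess.call(cmd))
    return exit_codes


if __name__ == "__main__":
    # Stand-in for the camera node: a process that always fails.
    crashing_node = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_respawn(crashing_node, respawn_delay=0.1, max_restarts=2))
```

The delay matters because an immediate restart would hit the same missing device and fail again.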

goruck changed the title from "Respawn usb_cam_node after crash caused by usb port reconnection." to "Respawn usb_cam_node after crash caused by usb port reconnection improve get_topic_data message and error handling" on Oct 29, 2017

goruck (Contributor, Author) commented Oct 29, 2017

The get_topic_data function in nodes/api.py wasn't really useful beyond a few specific messages. I've rewritten it to be more general and more robust, and I've added these changes to this PR. Note that get_topic_data depends on rospy_message_converter, so I added that to scripts/generate_rosinstall. I'll also open an issue and link it to this PR in case someone tries to use it and runs into problems.
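
The general idea of turning an arbitrary message into plain data can be sketched without ROS installed. This is not the code in nodes/api.py, and it is not rospy_message_converter itself; it is a stand-in that assumes messages list their fields in __slots__, as generated ROS Python message classes do. The FakeHeader and FakeReading classes and the message_to_dict helper are hypothetical.

```python
def message_to_dict(msg):
    """Recursively convert a slot-based message object into plain Python data."""
    if hasattr(msg, "__slots__"):
        # Treat every slot as a field and convert its value in turn.
        return {name: message_to_dict(getattr(msg, name)) for name in msg.__slots__}
    if isinstance(msg, (list, tuple)):
        return [message_to_dict(item) for item in msg]
    # Primitive values (numbers, strings, booleans) pass through unchanged.
    return msg


class FakeHeader(object):
    """Hypothetical stand-in for a nested message."""
    __slots__ = ["seq", "frame_id"]

    def __init__(self, seq, frame_id):
        self.seq = seq
        self.frame_id = frame_id


class FakeReading(object):
    """Hypothetical stand-in for a sensor message with a nested header."""
    __slots__ = ["header", "data"]

    def __init__(self, header, data):
        self.header = header
        self.data = data


if __name__ == "__main__":
    reading = FakeReading(FakeHeader(1, "aerial_image"), 26.8)
    print(message_to_dict(reading))
```

Because the conversion does not depend on any particular message type, a handler built this way does not need a special case per topic.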

dukrat left a comment

Tested as non-breaking.


3 participants