Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lidar performance #4505

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

alonfaraj
Copy link
Contributor

@alonfaraj alonfaraj commented Apr 28, 2022

Fixes

Fixes: #4474

About

Lidar raycast make the calculation using line trace on single thread.
This PR take advantage of ParallelFor to improve the performance.

How Has This Been Tested?

Lidar config tested

{
  "SettingsVersion": 1.2,
  "SimMode": "Car",
  "EnableRpc": true,
  "ViewMode": "Fpv",
  "LogMessagesVisible": true,
  "EngineSound": false,

  "Vehicles": {
    "PhysXCar": {
      "VehicleType": "PhysXCar",
      "DefaultVehicleState": "",
      "AutoCreate": true,
      "PawnPath": "",
      "EnableCollisionPassthrogh": false,
      "EnableCollisions": true,
      "RC": {
        "RemoteControlID": 0
      },
      "Sensors": {
        "LidarCustom": {
            "SensorType": 6,
            "Enabled" : true,
            "NumberOfChannels": 16,
            "PointsPerSecond": 32000,
            "Range": 40.0,
            "X": 2.0, "Y": 0, "Z": -0.7657162547111511,
            "Pitch": 0.0, "Roll": 0.0, "Yaw": 0.0,
            "HorizontalFOVStart": -180.0,
            "HorizontalFOVEnd": 180.0,
            "VerticalFOVUpper": 45.0,
            "VerticalFOVLower": -10.0,     
            "DrawDebugPoints": true
        }
      }
    }
  }
}

Benchmark:

Before:
LogTemp: Warning: code executed in 0.023679 seconds.
LogTemp: Warning: code executed in 0.007037 seconds.
LogTemp: Warning: code executed in 0.007560 seconds.
LogTemp: Warning: code executed in 0.007134 seconds.
LogTemp: Warning: code executed in 0.008667 seconds.
LogTemp: Warning: code executed in 0.007389 seconds.
LogTemp: Warning: code executed in 0.007219 seconds.
LogTemp: Warning: code executed in 0.008543 seconds.
LogTemp: Warning: code executed in 0.007057 seconds.
LogTemp: Warning: code executed in 0.009526 seconds.
LogTemp: Warning: code executed in 0.008571 seconds.
LogTemp: Warning: code executed in 0.007375 seconds.
LogTemp: Warning: code executed in 0.007126 seconds.
LogTemp: Warning: code executed in 0.008792 seconds.
LogTemp: Warning: code executed in 0.006901 seconds.
LogTemp: Warning: code executed in 0.007225 seconds.
LogTemp: Warning: code executed in 0.008875 seconds.
LogTemp: Warning: code executed in 0.007040 seconds.
LogTemp: Warning: code executed in 0.007565 seconds.
LogTemp: Warning: code executed in 0.008645 seconds.
LogTemp: Warning: code executed in 0.008770 seconds.
LogTemp: Warning: code executed in 0.007121 seconds.
LogTemp: Warning: code executed in 0.008888 seconds.
LogTemp: Warning: code executed in 0.009491 seconds.
LogTemp: Warning: code executed in 0.008847 seconds.
LogTemp: Warning: code executed in 0.008941 seconds.
LogTemp: Warning: code executed in 0.007436 seconds.
LogTemp: Warning: code executed in 0.009542 seconds.
LogTemp: Warning: code executed in 0.009030 seconds.
LogTemp: Warning: code executed in 0.008122 seconds.
LogTemp: Warning: code executed in 0.009491 seconds.
LogTemp: Warning: code executed in 0.008235 seconds.
LogTemp: Warning: code executed in 0.010218 seconds.
LogTemp: Warning: code executed in 0.008488 seconds.
LogTemp: Warning: code executed in 0.010495 seconds.
LogTemp: Warning: code executed in 0.009934 seconds.
LogTemp: Warning: code executed in 0.014051 seconds.

After:
LogTemp: Warning: code executed in 0.002308 seconds.
LogTemp: Warning: code executed in 0.001656 seconds.
LogTemp: Warning: code executed in 0.001599 seconds.
LogTemp: Warning: code executed in 0.001483 seconds.
LogTemp: Warning: code executed in 0.001626 seconds.
LogTemp: Warning: code executed in 0.001331 seconds.
LogTemp: Warning: code executed in 0.001581 seconds.
LogTemp: Warning: code executed in 0.001809 seconds.
LogTemp: Warning: code executed in 0.001552 seconds.
LogTemp: Warning: code executed in 0.001292 seconds.
LogTemp: Warning: code executed in 0.001149 seconds.
LogTemp: Warning: code executed in 0.002029 seconds.
LogTemp: Warning: code executed in 0.001522 seconds.
LogTemp: Warning: code executed in 0.001314 seconds.
LogTemp: Warning: code executed in 0.001266 seconds.
LogTemp: Warning: code executed in 0.001196 seconds.
LogTemp: Warning: code executed in 0.002130 seconds.
LogTemp: Warning: code executed in 0.001464 seconds.
LogTemp: Warning: code executed in 0.001435 seconds.
LogTemp: Warning: code executed in 0.001353 seconds.
LogTemp: Warning: code executed in 0.001157 seconds.
LogTemp: Warning: code executed in 0.002004 seconds.
LogTemp: Warning: code executed in 0.001739 seconds.
LogTemp: Warning: code executed in 0.001274 seconds.
LogTemp: Warning: code executed in 0.001429 seconds.
LogTemp: Warning: code executed in 0.002372 seconds.
LogTemp: Warning: code executed in 0.002305 seconds.
LogTemp: Warning: code executed in 0.001499 seconds.
LogTemp: Warning: code executed in 0.001063 seconds.
LogTemp: Warning: code executed in 0.001904 seconds.
LogTemp: Warning: code executed in 0.001390 seconds.
LogTemp: Warning: code executed in 0.001310 seconds.
LogTemp: Warning: code executed in 0.001293 seconds.
LogTemp: Warning: code executed in 0.001710 seconds.
LogTemp: Warning: code executed in 0.001767 seconds.
LogTemp: Warning: code executed in 0.001296 seconds.
LogTemp: Warning: code executed in 0.001814 seconds.
LogTemp: Warning: code executed in 0.001432 seconds.
LogTemp: Warning: code executed in 0.001237 seconds.
LogTemp: Warning: code executed in 0.001295 seconds.
LogTemp: Warning: code executed in 0.001241 seconds.
LogTemp: Warning: code executed in 0.001401 seconds.
LogTemp: Warning: code executed in 0.001784 seconds.
LogTemp: Warning: code executed in 0.001396 seconds.

You can see the execution time is about 8 times faster, which obviously depends on your CPU.

UPDATE:
After some tests, I have found the point cloud generated by the multithread method is not the same as the original.
Fixed here 497cf06

Also, it's pretty complicated to verify that the clouds are exactly the same so I have added a test to check they are identical.
I ran both the original method the and new method, compared the point cloud arrays and made sure I didn't hit the breakpoint in case they are different.

To test it yourself, you can override the UnrealLidarSensor::getPointCloud method with the following

UnrealLidarSensor::getPointCloud

// returns a point-cloud for the tick
void UnrealLidarSensor::getPointCloud(const msr::airlib::Pose& lidar_pose, const msr::airlib::Pose& vehicle_pose,
                                      const msr::airlib::TTimeDelta delta_time, msr::airlib::vector<msr::airlib::real_T>& point_cloud, msr::airlib::vector<int>& segmentation_cloud)
{
    point_cloud.clear();
    segmentation_cloud.clear();

    const msr::airlib::LidarSimpleParams params = getParams();
    const auto number_of_lasers = params.number_of_channels;

    // cap the points to scan via ray-tracing; this is currently needed for car/Unreal tick scenarios
    // since SensorBase mechanism uses the elapsed clock time instead of the tick delta-time.
    constexpr float MAX_POINTS_IN_SCAN = 1e+5f;
    uint32 total_points_to_scan = FMath::RoundHalfFromZero(params.points_per_second * delta_time);
    if (total_points_to_scan > MAX_POINTS_IN_SCAN) {
        total_points_to_scan = MAX_POINTS_IN_SCAN;
        UAirBlueprintLib::LogMessageString("Lidar: ", "Capping number of points to scan", LogDebugLevel::Failure);
    }

    // calculate number of points needed for each laser/channel
    const uint32 points_to_scan_with_one_laser = FMath::RoundHalfFromZero(total_points_to_scan / float(number_of_lasers));
    if (points_to_scan_with_one_laser <= 0) {
        //UAirBlueprintLib::LogMessageString("Lidar: ", "No points requested this frame", LogDebugLevel::Failure);
        return;
    }

    // calculate needed angle/distance between each point
    const float angle_distance_of_tick = params.horizontal_rotation_frequency * 360.0f * delta_time;
    const float angle_distance_of_laser_measure = angle_distance_of_tick / points_to_scan_with_one_laser;

    // normalize FOV start/end
    const float laser_start = std::fmod(360.0f + params.horizontal_FOV_start, 360.0f);
    const float laser_end = std::fmod(360.0f + params.horizontal_FOV_end, 360.0f);

    std::vector<float> point_cloud_single_thread;
    // Single threaded
    {
        // shoot lasers
        for (auto laser = 0u; laser < number_of_lasers; ++laser) {
            const float vertical_angle = laser_angles_[laser];

            for (auto i = 0u; i < points_to_scan_with_one_laser; ++i) {
                const float horizontal_angle = std::fmod(current_horizontal_angle_ + angle_distance_of_laser_measure * i, 360.0f);

                // check if the laser is outside the requested horizontal FOV
                if (!VectorMath::isAngleBetweenAngles(horizontal_angle, laser_start, laser_end))
                    continue;

                Vector3r point;
                int segmentationID = -1;
                // shoot laser and get the impact point, if any
                if (shootLaser(lidar_pose, vehicle_pose, horizontal_angle, vertical_angle, params, point, segmentationID)) {
                    point_cloud_single_thread.emplace_back(point.x());
                    point_cloud_single_thread.emplace_back(point.y());
                    point_cloud_single_thread.emplace_back(point.z());
                    //segmentation_cloud.emplace_back(segmentationID);
                }
            }
        }
    }

    // Multi threaded
    const uint32 total_jobs = number_of_lasers * points_to_scan_with_one_laser;

    point_cloud.assign(total_jobs * 3, FLT_MAX);
    segmentation_cloud.assign(total_jobs, -1);
    {
        ParallelFor(total_jobs, [&](int32 idx) {
            int32 laser_idx = (idx / points_to_scan_with_one_laser) % number_of_lasers;
            const float vertical_angle = laser_angles_[laser_idx];
            const float horizontal_angle = std::fmod(current_horizontal_angle_ + angle_distance_of_laser_measure * (idx % points_to_scan_with_one_laser), 360.0f);

            // check if the laser is outside the requested horizontal FOV
            if (VectorMath::isAngleBetweenAngles(horizontal_angle, laser_start, laser_end)) {
                Vector3r point;
                int segmentationID = -1;
                // shoot laser and get the impact point, if any
                if (shootLaser(lidar_pose, vehicle_pose, horizontal_angle, vertical_angle, params, point, segmentationID)) {
                    point_cloud[idx * 3] = point.x();
                    point_cloud[idx * 3 + 1] = point.y();
                    point_cloud[idx * 3 + 2] = point.z();
                    segmentation_cloud[idx] = segmentationID;
                }
            }
        });
    }

    // erase–remove idiom to handle non-valid elements
    point_cloud.erase(std::remove(point_cloud.begin(), point_cloud.end(), FLT_MAX), point_cloud.end());
    segmentation_cloud.erase(std::remove(segmentation_cloud.begin(), segmentation_cloud.end(), -1), segmentation_cloud.end());

    // compare
    if (point_cloud_single_thread != point_cloud) {
        UAirBlueprintLib::LogMessageString("Lidar test not equal ", "", LogDebugLevel::Failure);
    }

    current_horizontal_angle_ = std::fmod(current_horizontal_angle_ + angle_distance_of_tick, 360.0f);

    return;
}

Copy link
Contributor

@rajat2004 rajat2004 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big performance improvement, thanks a lot for this!

@rajat2004
Copy link
Contributor

rajat2004 commented May 8, 2022

Some more things which might also help in extracting more performance -

for (int i = 0; i < meshComponents.Num(); i++) {
segmentationID = segmentationID == -1 ? meshComponents[i]->CustomDepthStencilValue : segmentationID;
}

This is just getting the first component with id != -1, instead of continuing to iterate, we can break early -

for (auto* comp : meshCompoenents)
{
    segmentationID = comp->CustomDepthStencilValue;
    if (segmentationID != -1)
        break;
}

bool UnrealLidarSensor::shootLaser(const msr::airlib::Pose& lidar_pose, const msr::airlib::Pose& vehicle_pose,
const uint32 laser, const float horizontal_angle, const float vertical_angle,
const msr::airlib::LidarSimpleParams params, Vector3r& point, int& segmentationID)

Here laser is unused in the method body, should be removed. Also, const msr::airlib::LidarSimpleParams params is getting copied every time, that's also a performance hit

if (params.data_frame == AirSimSettings::kVehicleInertialFrame) {

This is a string comparison in the fast path, data_frame type should be changed to an Enum instead, it's used in very few places only hence am suggesting cleaning this up as well

msr::airlib::LidarSimpleParams params = getParams();

This could be made a const ref

@rajat2004
Copy link
Contributor

Another thing which I just wanted to note, from my (limited) experience with parallel programming, it was often better to have lesser number of parallel tasks each doing more work, rather than large number of small tasks. This helped in reducing core contention and improving cache locality as well.
Rather than doing ParallelFor for total_points_to_scan, it might be worth doing some batching, like total_points_to_scan / batch_size, and it would work on batch_size number of points. It would complicate the code a bit though, just something which might be interesting, doesn't need to be done in this PR. If trying this out, making it a configurable param would help in finding appropriate sizes which work well

@alonfaraj
Copy link
Contributor Author

Some more things which might also help in extracting more performance -

for (int i = 0; i < meshComponents.Num(); i++) {
segmentationID = segmentationID == -1 ? meshComponents[i]->CustomDepthStencilValue : segmentationID;
}

This is just getting the first component with id != -1, instead of continuing to iterate, we can break early -

for (auto* comp : meshCompoenents)
{
    segmentationID = comp->CustomDepthStencilValue;
    if (segmentationID != -1)
        break;
}

bool UnrealLidarSensor::shootLaser(const msr::airlib::Pose& lidar_pose, const msr::airlib::Pose& vehicle_pose,
const uint32 laser, const float horizontal_angle, const float vertical_angle,
const msr::airlib::LidarSimpleParams params, Vector3r& point, int& segmentationID)

Here laser is unused in the method body, should be removed. Also, const msr::airlib::LidarSimpleParams params is getting copied every time, that's also a performance hit

if (params.data_frame == AirSimSettings::kVehicleInertialFrame) {

This is a string comparison in the fast path, data_frame type should be changed to an Enum instead, it's used in very few places only hence am suggesting cleaning this up as well

msr::airlib::LidarSimpleParams params = getParams();

This could be made a const ref

Thanks @rajat2004!
Good points. I took care of the things you mentioned.

@wouter-heerwegh
Copy link

Since this project is going to be archived, could this pull request be merged in the Colosseum fork?

@xxEoD2242
Copy link

Since this project is going to be archived, could this pull request be merged in the Colosseum fork?

Hey @wouter-heerwegh,

I'm happy to merge this in. Give me a minute to figure out how to pull this PR into the forked repo and I will.

@xxEoD2242
Copy link

Adding this to Colosseum via PR #1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants