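While debugging autoware.ai, a third-party module sometimes exits abnormally and leaves neither a log nor a coredump file. This article explains how to track down that kind of problem.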
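I. Enter the Docker container as root
autoware.ai is built on Ubuntu 18 by default, and many people run it in Docker. When you enter the container as the default autoware user, sudo does not work and reports "sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set", so we first log in as root.
Write the open-docker script: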
vim open-docker.sh
# run docker
cd $HOME/autoware.ai/generic/
./run.sh --cuda on -b $HOME/autoware.ai/Autoware,$HOME/autoware.ai/Robot -i yanjingang/autoware
Write the exec-docker script:
vim exec-docker.sh
#!/bin/bash
set -x
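# newgrp docker
# Take the first container listed by docker ps (row 2, right after the header)
CONTAINER_ID=$(docker ps | awk 'NR==2{print $1}')
# Default to the autoware user; pass another user (e.g. root) as the first argument
CONTAINER_USER=autoware
if [ $# -ge 1 ]; then
    CONTAINER_USER=$1
fi
docker exec --user "$CONTAINER_USER" -it "$CONTAINER_ID" /bin/bash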
Start the container:
./open-docker.sh
Log in to the already running container as root:
./exec-docker.sh root
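The exec-docker script above takes whichever container `docker ps` lists first, which can be the wrong one when several containers are running. A minimal sketch of a more selective lookup follows. `pick_container` is a hypothetical helper (not part of Docker or Autoware) that filters the "ID IMAGE" pairs printed by `docker ps --format` for an image name.

```shell
#!/bin/bash
# Hypothetical helper: print the ID of the first container whose image
# name contains the given string.
# Input on stdin: lines of "<container-id> <image>", as printed by
#   docker ps --format '{{.ID}} {{.Image}}'
pick_container() {
    local image_pattern="$1"
    awk -v pat="$image_pattern" 'index($2, pat) > 0 { print $1; exit }'
}

# Usage (needs a running Docker daemon):
#   CONTAINER_ID=$(docker ps --format '{{.ID}} {{.Image}}' | pick_container yanjingang/autoware)
```

If nothing matches, the function prints nothing, so the caller can test for an empty CONTAINER_ID before running docker exec.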
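II. Enable coredump file generation
1. Allow coredump files to be generated
# Check the current settings
cat /etc/security/limits.conf|grep core
ulimit -c # "unlimited" means core dumps are enabled
# Enable (the limit only applies to the current shell and its children, so it is best added to the command that starts the service)
ulimit -c unlimited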
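Because the limit is per shell, it can help to have the start script check the result instead of assuming that `ulimit -c unlimited` succeeded (it fails when the hard limit is lower). The sketch below is one possible way to do that. The function name `enable_core_dumps` is an assumption made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: raise the soft core-size limit as far as allowed
# and report the result. Returns 1 if core dumps remain disabled.
enable_core_dumps() {
    # Try "unlimited" first; this fails if the hard limit is lower.
    if ! ulimit -S -c unlimited 2>/dev/null; then
        # Fall back to whatever the hard limit permits.
        ulimit -S -c "$(ulimit -H -c)" 2>/dev/null
    fi
    local current
    current=$(ulimit -S -c)
    if [ "$current" = "0" ]; then
        echo "core dumps disabled"
        return 1
    fi
    echo "core limit: $current"
}
```

Call it directly in the start script (for example `enable_core_dumps && roslaunch ...`), not inside `$(...)`, since a command substitution runs in a subshell and the new limit would be lost.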
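2. Change where coredump files are written and how they are named
# Check the current pattern
cat /proc/sys/kernel/core_pattern
# Change the location and naming rule. Write the value with bash -c 'echo xxx'; the file cannot be edited with vi.
# A pattern without a leading / is relative to the crashing process's working directory, so nodes started by roslaunch write to ~/.ros/*.core
bash -c 'echo "%e-%p-%t.core" > /proc/sys/kernel/core_pattern'
# Verify that the change took effect
cat /proc/sys/kernel/core_pattern
# Format specifiers: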
%p - insert pid into filename
%u - insert current uid into filename
%g - insert current gid into filename
%s - insert signal that caused the coredump into the filename
%t - insert UNIX time that the coredump occurred into filename
%h - insert hostname where the coredump happened into filename
%e - insert coredumping executable name into filename
# QNX has no /proc/sys/kernel/core_pattern file; there the save directory has to be specified with a command
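To preview the file name a pattern will produce, the specifiers can be expanded by hand. The function below is only an illustrative sketch: `expand_core_pattern` is a hypothetical name, and it handles just %e, %p, %t and %%, leaving every other specifier untouched.

```shell
#!/bin/bash
# Hypothetical helper: expand %e, %p, %t and %% in a core_pattern string.
# Arguments: pattern, executable name, pid, UNIX timestamp.
expand_core_pattern() {
    local pattern="$1" exe="$2" pid="$3" ts="$4"
    local out="" i=0 ch next
    while [ "$i" -lt "${#pattern}" ]; do
        ch="${pattern:$i:1}"
        if [ "$ch" = "%" ]; then
            next="${pattern:$((i + 1)):1}"
            case "$next" in
                e) out+="$exe" ;;
                p) out+="$pid" ;;
                t) out+="$ts" ;;
                %) out+="%" ;;
                *) out+="%$next" ;;  # not handled in this sketch
            esac
            i=$((i + 2))
        else
            out+="$ch"
            i=$((i + 1))
        fi
    done
    printf '%s\n' "$out"
}

# Example:
#   expand_core_pattern '%e-%p-%t.core' test_coredump 68009 1764667443
#   prints test_coredump-68009-1764667443.core
```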
3. Install gdb
apt install -y gdb
4. Save the changes to the image
# Get the container ID
CONTAINER_ID=`docker ps|awk 'NR==2{print $1}'`
# Commit the container as an image
docker commit -a "yanjingang" -m "autoware.ai" $CONTAINER_ID yanjingang/autoware:latest-melodic-base-cuda
# Check the image tags
docker image ls
# Save the image as a tar archive
docker save -o autoware-latest-melodic-base-cuda.tar yanjingang/autoware:latest-melodic-base-cuda
# Push the local image to Docker Hub
docker login
docker push yanjingang/autoware:latest-melodic-base-cuda
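III. Build the ROS package in Debug mode
1. CMakeLists.txt configuration
Add the following commands to the CMakeLists.txt of the ROS node you want to debug, so that the resulting executable contains gdb debugging information.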
SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall")
CMAKE_BUILD_TYPE is a CMake variable. Its possible values are:
Debug
Release
RelWithDebInfo
MinSizeRel
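When this variable is Debug, CMake generates the Makefile using the strings in CMAKE_CXX_FLAGS_DEBUG and CMAKE_C_FLAGS_DEBUG as compile options, which is why the options are set there.
Options in CMAKE_CXX_FLAGS_DEBUG:
- -O0 disables code optimization
- -Wall enables most warnings
- -g includes debugging information
- -ggdb includes debugging information in the executable in a form intended for gdb
Options in CMAKE_CXX_FLAGS_RELEASE:
- -O3 enables level-3 optimization, which adds further optimizations on top of -O2, such as more aggressive function inlining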
2. Build
Here we changed only the CMakeLists.txt of the PNC pure_pursuit module and then built the Debug version:
cd ~/Autoware
AUTOWARE_COMPILE_WITH_CUDA=1 colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Debug --packages-select=pure_pursuit
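After the build, it is worth confirming that the executable really carries debug symbols before waiting for the next crash. One way, sketched below, is to look for a .debug_info section with readelf (from binutils). The function name `has_debug_info` is an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the ELF file contains a .debug_info
# section, i.e. it was compiled with -g.
has_debug_info() {
    local binary="$1"
    [ -f "$binary" ] || return 1
    readelf -S "$binary" 2>/dev/null | grep -q '\.debug_info'
}

# Usage (the path is whatever `find . -name pure_pursuit -type f` reports):
#   has_debug_info path/to/pure_pursuit && echo "debug symbols present"
```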
IV. Verify with coredump + gdb
1. Write a test program
cd ~/Autoware/src
catkin_create_pkg test_coredump
TEST_DIR=~/Autoware/src/test_coredump
mkdir -p $TEST_DIR/src && mkdir -p $TEST_DIR/launch
# Test program
vim $TEST_DIR/src/test_coredump.cpp
#include <ros/ros.h>
#include <iostream>
int main(int argc, char **argv){
    ros::init(argc, argv, "core_dump_test");
    ros::NodeHandle n;
    int *p = NULL;
    std::cout<<*p<<std::endl; // dereference a null pointer to force a SIGSEGV
    ros::AsyncSpinner spinner(1);
    spinner.start();
    ros::waitForShutdown();
}
2. Write the build configuration for the test
# Build configuration
vim $TEST_DIR/CMakeLists.txt
cmake_minimum_required(VERSION 3.0.2)
project(test_coredump)
SET(CMAKE_BUILD_TYPE "Debug")
SET(CMAKE_CXX_FLAGS_DEBUG "$ENV{CXXFLAGS} -O0 -Wall -g -ggdb")
SET(CMAKE_CXX_FLAGS_RELEASE "$ENV{CXXFLAGS} -O3 -Wall")
find_package(autoware_build_flags REQUIRED)
find_package(
catkin REQUIRED COMPONENTS
roscpp
)
catkin_package(
INCLUDE_DIRS
)
include_directories(
include
${catkin_INCLUDE_DIRS}
)
add_executable(${PROJECT_NAME} src/test_coredump.cpp)
add_dependencies(${PROJECT_NAME} ${${PROJECT_NAME}_EXPORTED_TARGETS} ${catkin_EXPORTED_TARGETS})
target_link_libraries(${PROJECT_NAME}
${catkin_LIBRARIES}
)
install(TARGETS ${PROJECT_NAME}
RUNTIME DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)
install(
DIRECTORY launch/
DESTINATION ${CATKIN_PACKAGE_SHARE_DESTINATION}/launch
)
# Dependency configuration
vim $TEST_DIR/package.xml
<?xml version="1.0"?>
<package format="2">
<name>test_coredump</name>
<version>0.0.0</version>
<description>The test_coredump package</description>
<maintainer email="autoware@todo.todo">autoware</maintainer>
<license>TODO</license>
<depend>roscpp</depend>
<buildtool_depend>catkin</buildtool_depend>
</package>
3. Write the launch file
vim $TEST_DIR/launch/test_coredump.launch
<launch>
<node pkg="test_coredump" type="test_coredump" name="test_coredump" output="log"></node>
</launch>
4. Build
# Build
cd ~/Autoware
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Debug --packages-select=test_coredump
find . -type f -name test_coredump|xargs chmod +x
5. Launch and check the coredump file
cd ~/Autoware
source ~/Autoware/install/setup.bash
# Launch
roslaunch test_coredump test_coredump.launch
....
[test_coredump-2] process has died [pid 67385, exit code -11
# List the coredump files
ll /home/autoware/.ros/test_coredump-* -rt
~/.ros/test_coredump-68009-1764667443.core
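When a node has crashed several times, ~/.ros accumulates many core files. The sketch below picks the newest one for a given node, assuming the "%e-%p-%t.core" naming configured earlier. `latest_core` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the most recently modified core file for a
# node, assuming files are named <name>-<pid>-<time>.core.
latest_core() {
    local dir="$1" name="$2"
    # ls -t sorts newest first; errors are silenced when nothing matches
    ls -t "$dir"/"$name"-*.core 2>/dev/null | head -n 1
}

# Usage:
#   CORE_FILE=$(latest_core ~/.ros test_coredump)
```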
6. Locate the crash with gdb
# Find the node's executable
find . -name test_coredump -type f
./build/test_coredump/devel/lib/test_coredump/test_coredump
# Inspect the core's call stack with gdb
gdb ~/Autoware/build/test_coredump/devel/lib/test_coredump/test_coredump ~/.ros/test_coredump-68009-1764667443.core
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/autoware/Autoware/build/test_coredump/devel/lib/test_coredump/test_coredump...done.
[New LWP 68009]
[New LWP 68015]
[New LWP 68018]
[New LWP 68017]
[New LWP 68016]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/autoware/Autoware/install/test_coredump/lib/test_coredump/test_coredump _'.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 0x0000564662c02d92 in main (argc=1, argv=0x7fffe50673d8) at /home/autoware/Autoware/src/test_coredump/src/test_coredump.cpp:9
9 std::cout<<*p<<std::endl; // dereference a null pointer to force a SIGSEGV
[Current thread is 1 (Thread 0x7fc2f2fe3d40 (LWP 68009))]
(gdb) q
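The interactive session above can also be run non-interactively, which is handy for collecting backtraces in a script. The sketch below uses gdb's -batch and -ex options. The wrapper name `core_backtrace` and its argument check are assumptions added for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the backtrace of every thread in a core file
# without entering the interactive gdb prompt.
core_backtrace() {
    local exe="$1" core="$2"
    if [ ! -f "$exe" ] || [ ! -f "$core" ]; then
        echo "usage: core_backtrace <executable> <core-file>" >&2
        return 2
    fi
    # -batch makes gdb exit after running the commands given with -ex
    gdb -batch -ex "bt" -ex "thread apply all bt" "$exe" "$core"
}

# Usage:
#   core_backtrace ~/Autoware/build/test_coredump/devel/lib/test_coredump/test_coredump \
#       ~/.ros/test_coredump-68009-1764667443.core
```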
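The figure below shows a real case: while the Kalman-filter tracker was tracking lidar cluster obstacles, the check of whether a clustered point cloud is a car hit an out-of-range vector index:
Add a guard: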
vim Autoware/src/autoware/core_perception/lidar_kf_contour_track/nodes/lidar_kf_contour_track/lidar_kf_contour_track_core.cpp
...
bool ContourTracker::IsCar(const PlannerHNS::DetectedObject& obj, const PlannerHNS::WayPoint& currState, PlannerHNS::RoadNetwork& map)
{
if(bMap)
{
bool bOnLane = false;
// std::cout << "Debug Obj: " << obj.id << ", Closest Lane: " << m_ClosestLanesList.size() << std::endl;
for(unsigned int i =0 ; i < m_ClosestLanesList.size(); i++)
{
PlannerHNS::RelativeInfo info;
PlannerHNS::PlanningHelpers::GetRelativeInfoLimited(m_ClosestLanesList.at(i)->points, obj.center, info);
// std::cout << " info.iFront: " << info.iFront << " points.size: " << m_ClosestLanesList.at(i)->points.size() << std::endl;
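// yan 26.12.2 When points.size() == 2, GetRelativeInfoLimited does a temporary interpolation, so info.iFront can occasionally be 2, which is past the last valid index of points. Two points cannot be a car anyway, so simply skip this lane.
if (info.iFront < 0 || static_cast<size_t>(info.iFront) >= m_ClosestLanesList.at(i)->points.size()) // guard added; also safe when points is empty
continue;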
PlannerHNS::WayPoint wp = m_ClosestLanesList.at(i)->points.at(info.iFront);
double direct_d = hypot(wp.pos.y - obj.center.pos.y, wp.pos.x - obj.center.pos.x);
// std::cout << "- Distance To Car: " << obj.distance_to_center << ", PerpD: " << info.perp_distance << ", DirectD: " << direct_d << ", bAfter: " << info.bAfter << ", bBefore: " << info.bBefore << std::endl;
...
}
The above is just one example; debug whichever module you need in the same way.
yan 2025.12.2
References:
Docker container permission error [cannot use sudo]: sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
