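Investigating a crash (core dump) while decoding the Carlife H.264 video stream
■Symptom
Carlife shows a black screen.
■Steps to reproduce
1. Update the Carlife app on an iPhone to version 6.0 or later (the problem does not occur with version 5.9).
2. Use a head unit from the A7 project that has the Carlife feature.
(So far, every A7 head unit tried from other projects shows the problem as well. Projects on the I6 platform, and PC software playing the same Carlife H.264 data, do not.)
3. After Carlife connects successfully, press the phone's Home button to return to the home screen. The Carlife screen on the head unit goes black (the program has crashed).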
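■Root cause
Direct cause:
Capturing the core file and debugging it with gdb showed that the OMX code calls into GStreamer, and that the call chain crashes once it reaches __memcpy_neon.
The source address passed to memcpy is invalid.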
L127: memcpy (dinfo.data, sinfo.data + offset, size);
(gdb) p sinfo
$1 = {memory = 0xaf704da0, flags = GST_MAP_READ,
data = 0xac167000 <error: Cannot access memory at address 0xac167000>, ★ invalid address
size = 559104, maxsize = 559104, user_data = {0xb6fceab8, 0xb65a019b,
0xb683fda0, 0xb659c0e0}, _gst_reserved = {0x1, 0x7270e000, 0xaca77688,
0x0}}
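Trigger:
While decoding the H.264 stream that Carlife sends to the head unit,
we found that the level-id field (level_idc in the H.264 syntax) of the SPS changes depending on whether the Carlife app is in the foreground or in the background.
Specifically:
SPS while the Carlife app is in the foreground: 00 00 00 01 27 42 00 1F <- (level-id: 1F)
SPS while the Carlife app is in the background: 00 00 00 01 27 42 00 20 <- (level-id: 20)
From discussions with Qualcomm we learned that Level 3.1 (level-id: 1F) and Level 3.2 (level-id: 20)
cover different ranges of video resolution, and that a decoder generally reconfigures itself based on this value.
Switching from Level 3.1 to Level 3.2 while the resolution stays the same is bad for decoding stability.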
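To watch for this kind of change on the receiving side, the level byte can be read directly from the stream. The following is only a minimal sketch: the function names find_sps, sps_level_idc and level_changed are made up for illustration, and the code assumes an Annex B byte stream in which the three bytes after the SPS NAL header contain no emulation-prevention byte (true for the two headers quoted above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first SPS NAL header byte (nal_unit_type 7)
 * that follows an Annex B start code, or -1 if there is none. */
long find_sps(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 3 < len; i++) {
        /* Matches 00 00 01; the 4-byte start code 00 00 00 01 ends with it. */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1 &&
            (buf[i + 3] & 0x1F) == 7)
            return (long)(i + 3);
    }
    return -1;
}

/* Returns level_idc (0x1F = 31 means Level 3.1), or -1 on failure.
 * Bytes after the NAL header: profile_idc, constraint flags, level_idc. */
int sps_level_idc(const uint8_t *buf, size_t len)
{
    long sps = find_sps(buf, len);
    if (sps < 0 || (size_t)sps + 3 >= len)
        return -1;
    return buf[sps + 3];
}

/* Returns 1 if both buffers carry an SPS and their levels differ. */
int level_changed(const uint8_t *prev, size_t prev_len,
                  const uint8_t *next, size_t next_len)
{
    int a = sps_level_idc(prev, prev_len);
    int b = sps_level_idc(next, next_len);
    return a >= 0 && b >= 0 && a != b;
}
```

Fed the two headers above, sps_level_idc yields 31 for the foreground stream and 32 for the background stream, so level_changed reports a change even though nothing else in the quoted bytes differs.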
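Root cause:
After investigating, Qualcomm acknowledged a bug in its decoder, which does not handle the decoder being reconfigured in the middle of playback, and provided a workaround.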
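■Investigation details
A debugging detour
When I first loaded the core dump into gdb, bt printed nothing but question marks. The obvious explanation was that our program had been built without the -g option and therefore had no symbols.
After rebuilding with -g, the full stack was displayed, but still without line numbers. Oddly, gdb's own messages indicated that the symbols had been loaded correctly.
After many attempts, I replaced the gdb that someone else had copied onto the head unit with one I had built myself for the A7 platform, and that finally showed everything I wanted.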
Starting the debugging
Now for the real work:
gdb /usr/app/carlife/CarlifeDaemon ./TinyVpuDec\:src_14988678226_2254.core
Core was generated by `/usr/app/carlife/CarlifeDaemon -i'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_neon () at ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S:568
568 ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S: No such file or directory.
(gdb) bt
#0  __memcpy_neon () at ../ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S:568
#1  0xb6855ff2 in _fallback_mem_copy (mem=0xaf704da0, offset=0, size=559104)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstallocator.c:127
#2  0xb685e3a6 in gst_buffer_copy_into (dest=0xaf711020, src=0xaf705ca0,
    flags=(GST_BUFFER_COPY_FLAGS | GST_BUFFER_COPY_TIMESTAMPS | GST_BUFFER_COPY_META | GST_BUFFER_COPY_MEMORY | GST_BUFFER_COPY_DEEP), offset=0, size=559104)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstbuffer.c:498
#3  0xb685e5bc in gst_buffer_copy_with_flags (buffer=0xaf705ca0,
    flags=(GST_BUFFER_COPY_FLAGS | GST_BUFFER_COPY_TIMESTAMPS | GST_BUFFER_COPY_META | GST_BUFFER_COPY_MEMORY | GST_BUFFER_COPY_DEEP))
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstbuffer.c:579
#4  0xb65b4cbc in gst_base_sink_drain (basesink=0x15ff3f8)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/libs/gst/base/gstbasesink.c:4918
#5  0xb65b8e78 in gst_base_sink_default_query (basesink=0x15ff3f8,
    query=0xaf70bc90)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/libs/gst/base/gstbasesink.c:4941
#6  0xb688a152 in gst_pad_query (pad=pad@entry=0x15e2960,
    query=query@entry=0xaf70bc90)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstpad.c:3831
#7  0xb688a624 in gst_pad_peer_query (pad=0x15e2810,
    query=query@entry=0xaf70bc90)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstpad.c:3962
#8  0xb03d1242 in gst_video_decoder_negotiate_pool (decoder=0x15f6158,
    caps=<optimized out>)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0-plugins-base/1.6-r0/git/gst-libs/gst/video/gstvideodecoder.c:3696
#9  0xb03d585a in gst_video_decoder_negotiate (decoder=decoder@entry=0x15f6158)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0-plugins-base/1.6-r0/git/gst-libs/gst/video/gstvideodecoder.c:3872
#10 0xb032cd92 in gst_omx_video_dec_reconfigure_output_port (
    crop_rect=0xb0111168, crop_rect=0xb0111168, self=0x15f6158)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gst-omx/1.2-r0/git/omx/gstomxvideodec.c:1722
#11 gst_omx_video_dec_loop (self=0x15f6158)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gst-omx/1.2-r0/git/omx/gstomxvideodec.c:1956
#12 0xb68ab9ce in gst_task_func (task=0xb01136b8)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gsttask.c:331
#13 0xb6d06410 in g_thread_pool_thread_proxy (data=<optimized out>)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/cortexa7hf-vfp-poky-linux-gnueabi/glib-2.0/1_2.40.0-r0/glib-2.40.0/glib/gthreadpool.c:307
#14 0xb6d05d02 in g_thread_proxy (data=0xaf702b80)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/cortexa7hf-vfp-poky-linux-gnueabi/glib-2.0/1_2.40.0-r0/glib-2.40.0/glib/gthread.c:764
#15 0xb6b5145e in start_thread (arg=0xaca78910) at pthread_create.c:314
#16 0xb699dd9c in ?? ()
    at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92
   from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1  0xb6855ff2 in _fallback_mem_copy (mem=0xaf704da0, offset=0, size=559104)
    at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstallocator.c:127
127 /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstallocator.c: No such file or directory.
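Reading this together with the source shows that the program crashed while executing L127: memcpy (dinfo.data, sinfo.data + offset, size); The backtrace also shows that the copy was reached from gst_omx_video_dec_reconfigure_output_port, that is, while the decoder's output port was being reconfigured.
Next we need to see where the pc was pointing when the crash happened, so we look at the registers.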
(gdb) info r
r0 0xac1f0058 2887712856
r1 0xac167000 2887151616
r2 0x88800 559104
r3 0x0 0
r4 0x88800 559104
r5 0xac1f0008 2887712776
r6 0xaf704da0 2943372704
r7 0xb69064b4 3062916276
r8 0x88800 559104
r9 0xaca775f0 2896655856
r10 0x0 0
r11 0xaca77624 2896655908
r12 0xac1f0058 2887712856
sp 0xaca775b8 0xaca775b8
lr 0xb6855ff3 -1232773133
pc 0xb6855ff2 0xb6855ff2 <_fallback_mem_copy+118>
cpsr 0x200e0030 537788464
See which assembly instruction the pc points to
(gdb) disassemble r
0xb68e52b4 <+372>: strne r3, [r12], #4
0xb68e52b8 <+376>: lsls r8, r8, #31
0xb68e52bc <+380>: ldrhcs r3, [r1], #2
0xb68e52c0 <+384>: ldrbne r8, [r1]
0xb68e52c4 <+388>: strhcs r3, [r12], #2
0xb68e52c8 <+392>: strbne r8, [r12]
0xb68e52cc <+396>: pop {r8} ; (ldr r8, [sp], #4)
0xb68e52d0 <+400>: bx lr
=> 0xb68e52d4 <+404>: vldr d3, [r1]
0xb68e52d8 <+408>: vldr d4, [r1, #64] ; 0x40
0xb68e52dc <+412>: vldr d5, [r1, #128] ; 0x80
0xb68e52e0 <+416>: vldr d6, [r1, #192] ; 0xc0
0xb68e52e4 <+420>: vldr d7, [r1, #256] ; 0x100
0xb68e52e8 <+424>: vldr d0, [r1, #8]
0xb68e52ec <+428>: vldr d1, [r1, #16]
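The assembly shows that the program crashed while loading from the address held in r1, which is memory it cannot access. Under the ARM calling convention r1 carries the second argument of memcpy, the source pointer, and r2 = 0x88800 = 559104 is the size.
Looking again at L127: memcpy (dinfo.data, sinfo.data + offset, size); and at where the copy reads from, it is clear that sinfo.data is the address that cannot be accessed.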
(gdb) p sinfo
$1 = {memory = 0xaf704da0, flags = GST_MAP_READ,
data = 0xac167000 <error: Cannot access memory at address 0xac167000>,
size = 559104, maxsize = 559104, user_data = {0xb6fceab8, 0xb65a019b,
0xb683fda0, 0xb659c0e0}, _gst_reserved = {0x1, 0x7270e000, 0xaca77688,
0x0}}
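While chasing a pointer like this on a live process, it can help to check whether the range is mapped at all before it is handed to memcpy. The sketch below is only a diagnostic aid and not a fix. It assumes Linux, where mincore() fails with ENOMEM if any page in the range is unmapped, and range_is_mapped is a name invented here. A mapped page is not necessarily readable, so a result of 1 rules out only one kind of bad pointer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every page of [addr, addr + len) is mapped in this process,
 * 0 if at least one page is not mapped, and -1 on any other error.
 * Does not guard against addr + len wrapping around. */
int range_is_mapped(const void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (len == 0)
        return -1;

    uintptr_t mask = (uintptr_t)page - 1;
    uintptr_t p = (uintptr_t)addr & ~mask;   /* round down to a page start */
    uintptr_t end = (uintptr_t)addr + len;

    /* Probe one page at a time so that no heap buffer is needed. */
    for (; p < end; p += (uintptr_t)page) {
        unsigned char resident;
        if (mincore((void *)p, (size_t)page, &resident) != 0)
            return errno == ENOMEM ? 0 : -1;
    }
    return 1;
}
```

Applied to the values above, the call would be range_is_mapped(sinfo.data + offset, size), made just before the memcpy on L127 in a debug build.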
Make the output a little easier to read
(gdb) set print pretty on
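Go back to frame 1. Why frame 1? Frame 0 is memcpy itself, where we cannot print the parameter we want to look at, mem.
And why look at mem? Simple: reading the code shows that sinfo is filled in from mem.
(The addresses in the output below differ from the ones above, so it was evidently captured from a different occurrence of the same crash.)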
(gdb) frame 1
#1  0xb67ebff2 in _fallback_mem_copy (mem=0xaf604da8, offset=0, size=559104)
at /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstallocator.c:127
127 /Build_Directory/workspace/ZS11_Gerrit/A7_MR3/poky/build/tmp/work/atlas7_arm-poky-linux-gnueabi/gstreamer1.0/1.6-r0/git/gst/gstallocator.c: No such file or directory.
(gdb) info r
r0 0xac175058 2887209048
r1 0xac0ec000 2886647808
r2 0x88800 559104
r3 0x0 0
r4 0x88800 559104
r5 0xac175008 2887208968
r6 0xaf604da8 2942324136
r7 0xb689c4b4 3062482100
r8 0x88800 559104
r9 0xac9fc5f0 2896152048
r10 0x0 0
r11 0xac9fc624 2896152100
r12 0xac175058 2887209048
sp 0xac9fc5b8 0xac9fc5b8
lr 0xb67ebff3 -1233207309
pc 0xb67ebff2 0xb67ebff2 <_fallback_mem_copy+118>
cpsr 0x200e0030 537788464
(gdb) p sinfo
$4 = {
  memory = 0xaf604da8,
  flags = GST_MAP_READ,
  data = 0xac0ec000 <error: Cannot access memory at address 0xac0ec000>,
  size = 559104,
  maxsize = 559104,
  user_data = {0xb6f64ab8, 0xb653619b, 0xb67d5da0, 0xb65320e0},
  _gst_reserved = {0x1, 0x8583b000, 0xac9fc688, 0x0}
}
(gdb) p *(GstMemorySystem *)(mem)
$6 = {
  mem = {
    mini_object = {
      type = 1980632,
      refcount = 3,
      lockstate = 131329,
      flags = 1,
      copy = 0xb6817489 <_gst_memory_copy>,
      dispose = 0xb0280055,
      free = 0xb6816f49 <_gst_memory_free>,
      n_qdata = 0,
      qdata = 0x0
    },
    allocator = 0xaf60f048,
    parent = 0x0,
    maxsize = 559104,
    align = 0,
    offset = 0,
    size = 559104
  },
  slice_size = 0,
  data = 0xaf610398 "\001",
  user_data = 0xac0ec000,
  notify = 0xffffffff,
  map_mode = 0
}
Now look at the code around gstallocator.c:127. The call gst_memory_map (mem, &sinfo, GST_MAP_READ) is where sinfo gets filled in. How exactly? To find out we need to go into the gst_memory_map function.
gboolean
gst_memory_map (GstMemory * mem, GstMapInfo * info, GstMapFlags flags)
{
  g_return_val_if_fail (mem != NULL, FALSE);
  g_return_val_if_fail (info != NULL, FALSE);
  if (!gst_memory_lock (mem, (GstLockFlags) flags))
    goto lock_failed;
  info->flags = flags;
  info->memory = mem;
  info->size = mem->size;
  info->maxsize = mem->maxsize - mem->offset;
  /* The allocator decides, through a function pointer, how the memory is mapped. */
  if (mem->allocator->mem_map_full)
    info->data = mem->allocator->mem_map_full (mem, info, mem->maxsize);
  else
    info->data = mem->allocator->mem_map (mem, mem->maxsize, flags); // ★ the call in question
  if (G_UNLIKELY (info->data == NULL))
    goto error;
  info->data = info->data + mem->offset;
  return TRUE;
  /* ERRORS */
lock_failed:
  {
    GST_CAT_DEBUG (GST_CAT_MEMORY, "mem %p: lock %d failed", mem, flags);
    memset (info, 0, sizeof (GstMapInfo));
    return FALSE;
  }
error:
  {
    /* something went wrong, restore the original state again */
    GST_CAT_ERROR (GST_CAT_MEMORY, "mem %p: subclass map failed", mem);
    gst_memory_unlock (mem, (GstLockFlags) flags);
    memset (info, 0, sizeof (GstMapInfo));
    return FALSE;
  }
}
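At first I assumed that mem_map ended up in _sysmem_map in gstallocator.c, but it turned out not to. On reflection, this is simply how polymorphism is implemented in C. How did I find out that it was a different function? By debugging with breakpoints.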
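(gdb) attach <pid of CarlifeDaemon>
(gdb) b gstallocator.c:106
Reproduce the problem, then continue to the breakpoint
(gdb) c
Lock the scheduler to the current thread: the functions in gstmemory.c are called from many places, so without the lock gdb would stop in other threads
(gdb) set scheduler-locking on
(gdb) b gstmemory.c:309
(gdb) c
To step into the mem_map function I set a breakpoint on _sysmem_map, but it was never hit no matter what I tried. So I went step by step and printed every member of mem->allocator to see the address stored in mem_map.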
(gdb) p *(GstAllocator*)mem->allocator
$4 = {
  object = {
    object = {
      g_type_instance = {
        g_class = 0xaf6138d0
      },
      ref_count = 17,
      qdata = 0x2
    },
    lock = {
      p = 0xaf60a600,
      i = {2942346752, 0}
    },
    name = 0xaf60ccb0 "TinyVideoSink:pool:sink:allocator",
    parent = 0x0,
    flags = 66060304,
    control_bindings = 0x0,
    control_rate = 100000000,
    last_sync = 18446744073709551615,
    _gst_reserved = 0x0
  },
  mem_type = 0xb02bafb8 "V4l2Memory",
  mem_map = 0xb029efe9, # the function address to look at
  mem_unmap = 0xb029ef7d,
  mem_copy = 0xb6766f7d <_fallback_mem_copy>,
  mem_share = 0xb029fcc5,
  mem_is_span = 0xb029ef0d,
  mem_map_full = 0x0,
  mem_unmap_full = 0x0,
  _gst_reserved = {0x0, 0x0},
  priv = 0xaf60d040
}
No function contains specified address.
(gdb) x 0xb029efe9
0xb029efe9: 0x924b146c
The same member at other call sites
(gdb) p *(GstAllocator*)mem->allocator
$6 = {
  object = {
    object = {
      g_type_instance = {
        g_class = 0x1c81ac8
      },
      ref_count = 7,
      qdata = 0x2
    },
    lock = {
      p = 0x1c81b78,
      i = {29891448, 0}
    },
    name = 0x1c80950 "allocatorsysmem0",
    parent = 0x0,
    flags = 0,
    control_bindings = 0x0,
    control_rate = 100000000,
    last_sync = 18446744073709551615,
    _gst_reserved = 0x0
  },
  mem_type = 0xb67db1d4 "SystemMemory",
  mem_map = 0xb67667dd <_sysmem_map>, # the function address to look at
  mem_unmap = 0xb67667e1 <_sysmem_unmap>,
  mem_copy = 0xb6766a49 <_sysmem_copy>,
  mem_share = 0xb6766b65 <_sysmem_share>,
  mem_is_span = 0xb67667e5 <_sysmem_is_span>,
  mem_map_full = 0x0,
  mem_unmap_full = 0x0,
  _gst_reserved = {0x0, 0x0},
  priv = 0x1c83c18
}
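The same function pointer points to different implementations. We do not have the source for the V4l2Memory one, however, so there is no way to tell how it produces its value.
All we know is that sinfo.data = 0xac0ec000 is the user_data of (GstMemorySystem *)(mem).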
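The dispatch pattern itself is easy to reproduce in isolation. Below is a minimal sketch of it, not GStreamer's code: the types Memory and Allocator and the functions sysmem_map, devmem_map and memory_map are invented for illustration, and devmem_map returning user_data is only an assumption modelled on the value seen in gdb, since the real V4l2Memory implementation is not available to us.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Memory Memory;

/* Table of operations; each allocator supplies its own functions. */
typedef struct {
    const char *mem_type;
    void *(*mem_map)(Memory *mem, size_t maxsize);
} Allocator;

struct Memory {
    const Allocator *allocator; /* selects the implementation at run time */
    size_t offset;
    size_t maxsize;
    void *data;                 /* used by the system-memory allocator */
    void *user_data;            /* used by the device allocator in this sketch */
};

/* "SystemMemory": the bytes live in ordinary process memory. */
void *sysmem_map(Memory *mem, size_t maxsize)
{
    (void)maxsize;
    return mem->data;
}

/* Stand-in for a device allocator: hands back an address it stored earlier.
 * If the buffer behind that address has gone away, the caller gets a
 * dangling pointer and only notices when it dereferences it. */
void *devmem_map(Memory *mem, size_t maxsize)
{
    (void)maxsize;
    return mem->user_data;
}

const Allocator sysmem_allocator = {"SystemMemory", sysmem_map};
const Allocator devmem_allocator = {"DeviceMemory", devmem_map};

/* Same shape as the tail of gst_memory_map: dispatch, then apply the offset. */
void *memory_map(Memory *mem)
{
    assert(mem != NULL && mem->allocator != NULL);
    unsigned char *p = mem->allocator->mem_map(mem, mem->maxsize);
    return p == NULL ? NULL : p + mem->offset;
}
```

The caller of memory_map never names sysmem_map or devmem_map, which is why a breakpoint on one of them is never hit when the memory belongs to the other allocator.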