<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>My favorite bug: segfaults in Java — Luke T. Shumaker</title>
  <link rel="stylesheet" href="assets/style.css">
  <link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/>
</head>
<body>
<header><a href="/">Luke T. Shumaker</a> » <a href=/blog>blog</a> » java-segfault</header>
<article>
<h1 id="my-favorite-bug-segfaults-in-java">My favorite bug: segfaults in
Java</h1>
<blockquote>
<p>Update: Two years later, I wrote a more detailed version of this
article: <a href="./java-segfault-redux.html">My favorite bug: segfaults
in Java (redux)</a>.</p>
</blockquote>
<p>I’ve told this story orally a number of times, but realized that I
have never written it down. This is my favorite bug story; it might not
be my hardest bug, but it is the one I most like to tell.</p>
<h2 id="the-context">The context</h2>
<p>In 2012, I was a Senior programmer on the FIRST Robotics Competition
team 1024. For the unfamiliar, the relevant part of the setup is that
matches last 2 minutes and 15 seconds, and in them you have a 120-pound
robot that sometimes runs autonomously, and sometimes is controlled over
WiFi by a person at a laptop running stock “driver station” software
and modifiable “dashboard” software.</p>
<p>That year, we mostly used the dashboard software to allow the human
driver and operator to monitor sensors on the robot, one of them being a
video feed from a web-cam mounted on it. This was really easy because
the new standard dashboard program had a click-and-drag interface to add
stock widgets; you just had to make sure the code on the robot was
actually sending the data.</p>
<p>That was great until, while we were debugging things, the dashboard would
suddenly vanish. If it was run manually from a terminal (instead of
letting the driver station software launch it), you would see a core
dump indicating a segmentation fault.</p>
<p>This wasn’t just us, either; I spoke with people on other teams, and
everyone who was streaming video had this issue. But because it only
happened every couple of minutes, and the dashboard only had to survive
a 2:15 match, they just crossed their fingers and hoped it didn’t happen
during a match.</p>
<p>The dashboard was written in Java, and the source was available
(under a 3-clause BSD license), so I dove in, hunting for the bug. Now,
the program did use the Java Native Interface to talk to OpenCV, which the
video ran through; so I figured that it must be a bug in the C/C++ code
that was being called. It was especially a pain to track down the
pointers that were causing the issue, because it was hard with native
debuggers to see through all of the JVM stuff to the OpenCV code, and
the OpenCV stuff is opaque to Java debuggers.</p>
<p>Eventually the issue led me back into the Java code: there was a
native pointer being stored in a Java variable; Java code called the
native routine to <code>free()</code> the structure, but then tried to
feed it to another routine later. This led to difficulty again: tracking
objects with Java debuggers was hard because they don’t expect the
program to suddenly segfault; it’s Java code, Java doesn’t segfault, it
throws exceptions!</p>
<p>With the help of <code>println()</code> I was eventually able to see
that some code was executing in an order that straight didn’t make
sense.</p>
<h2 id="the-bug">The bug</h2>
<p>The issue was that Java was making an unsafe optimization (I never
bothered to figure out whether it was the compiler or the JVM making the
mistake; I was satisfied once I had a work-around).</p>
<p>Java was doing something similar to tail-call optimization with
regard to garbage collection. You see, if it is waiting for the return
value of a method <code>m()</code> of object <code>o</code>, and code in
<code>m()</code> that is yet to be executed doesn’t access any other
methods or properties of <code>o</code>, then it will go ahead and
consider <code>o</code> eligible for garbage collection before
<code>m()</code> has finished running.</p>
<p>That is normally a safe optimization to make… except for when a
destructor method (<code>finalize()</code>) is defined for the object;
the destructor can have side effects, and Java has no way to know
whether it is safe for them to happen before <code>m()</code> has
finished running.</p>
<h2 id="the-work-around">The work-around</h2>
<p>The routine that the segmentation fault was occurring in was
something like:</p>
<pre><code>public type1 getFrame() {
    type2 child = this.getChild();
    type3 var = this.something();
    // `this` may now be garbage collected
    return child.somethingElse(var); // segfault comes here
}</code></pre>
<p>Where the destructor method of <code>this</code> calls a method that
will <code>free()</code> native memory that is also accessed by
<code>child</code>; if <code>this</code> is garbage collected before
<code>child.somethingElse()</code> runs, the backing native code will
try to access memory that has been <code>free()</code>ed, and receive a
segmentation fault. That usually didn’t happen, as the routines were
pretty fast. However, running 30 times a second, eventually bad luck
with the garbage collector happens, and the program crashes.</p>
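<p>To make that arrangement concrete, here is a minimal sketch of it in
pure Java. Everything in it is hypothetical: <code>NativeBuffer</code>,
<code>Owner</code>, and <code>Child</code> are stand-ins, not the
dashboard’s real classes, and the native memory is simulated with a flag
so that a use-after-free throws an exception instead of segfaulting.</p>
<pre><code>import java.util.concurrent.atomic.AtomicBoolean;

public class PrematureFinalizeSketch {
    // Stand-in for a block of native memory; the flag plays the role of free().
    static final class NativeBuffer {
        private final AtomicBoolean freed = new AtomicBoolean(false);

        void free() { freed.set(true); }

        int read() {
            // Real native code would segfault here; the sketch throws instead.
            if (freed.get()) {
                throw new IllegalStateException("use after free");
            }
            return 42;
        }
    }

    // Hypothetical child object that shares the owner's native buffer.
    static final class Child {
        private final NativeBuffer buffer;

        Child(NativeBuffer buffer) { this.buffer = buffer; }

        int somethingElse(int var) { return buffer.read() + var; }
    }

    // Hypothetical owner whose finalizer frees the shared buffer.
    static final class Owner {
        private final NativeBuffer buffer = new NativeBuffer();
        private final Child child = new Child(buffer);

        Child getChild() { return child; }

        int something() { return 1; }

        int getFrame() {
            Child c = this.getChild();
            int var = this.something();
            // `this` is not used again below, so the JVM may treat it as
            // unreachable and run finalize() while the next line is executing.
            return c.somethingElse(var);
        }

        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() {
            buffer.free();
        }
    }

    public static void main(String[] args) {
        Owner owner = new Owner();
        System.out.println(owner.getFrame());
    }
}</code></pre>
<p>Whether the finalizer actually runs early depends on the JIT and the
garbage collector, so a single call like this will almost always work;
the failure only shows up after many, many calls.</p>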
<p>The work-around was to insert a bogus method call on <code>this</code> to keep
<code>this</code> around until after we were also done with
<code>child</code>:</p>
<pre><code>public type1 getFrame() {
    type2 child = this.getChild();
    type3 var = this.something();
    type1 ret = child.somethingElse(var);
    this.getSize(); // bogus call to keep `this` around
    return ret;
}</code></pre>
<p>Yeah. After spending weeks wading through thousands of lines
of Java, C, and C++, a bogus call to a method I didn’t care about was
the fix.</p>
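<p>For what it’s worth, Java 9 and later provide
<code>java.lang.ref.Reference.reachabilityFence()</code> for this kind
of situation; it did not exist in 2012. A minimal sketch of the same
method using it, assuming Java 9 or newer and with hypothetical stand-in
classes (<code>Owner</code>, <code>Child</code>) rather than the real
dashboard code:</p>
<pre><code>import java.lang.ref.Reference;

public class ReachabilityFenceSketch {
    // Hypothetical child; in the real program it would touch native memory.
    static final class Child {
        int somethingElse(int var) { return var + 1; }
    }

    // Hypothetical owner; imagine a finalizer here that frees native memory.
    static final class Owner {
        private final Child child = new Child();

        Child getChild() { return child; }

        int something() { return 41; }

        int getFrame() {
            try {
                Child c = this.getChild();
                int var = this.something();
                return c.somethingElse(var);
            } finally {
                // Keeps `this` strongly reachable until this point, so it
                // cannot be finalized while somethingElse() is still running.
                Reference.reachabilityFence(this);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new Owner().getFrame()); // prints 42
    }
}</code></pre>
<p>The <code>try</code>/<code>finally</code> shape keeps the fence as the
last thing the method does, even if <code>somethingElse()</code> throws;
it plays the same role as the bogus <code>getSize()</code> call, but
says what it is for.</p>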
</article>
<footer>
<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@lukeshu.com">Luke T. Shumaker</a>.</p>
<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> license.</p>
</footer>
</body>
</html>