Diffstat (limited to 'public/java-segfault.html')
-rw-r--r-- | public/java-segfault.html | 121 |
1 files changed, 121 insertions, 0 deletions
diff --git a/public/java-segfault.html b/public/java-segfault.html new file mode 100644 index 0000000..4da6dec --- /dev/null +++ b/public/java-segfault.html @@ -0,0 +1,121 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta charset="utf-8"> + <title>My favorite bug: segfaults in Java — Luke T. Shumaker</title> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <link rel="stylesheet" href="assets/style.css"> + <link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/> +</head> +<body> +<header><a href="/">Luke T. Shumaker</a> » <a href=/blog>blog</a> » java-segfault</header> +<article> +<h1 id="my-favorite-bug-segfaults-in-java">My favorite bug: segfaults in +Java</h1> +<blockquote> +<p>Update: Two years later, I wrote a more detailed version of this +article: <a href="./java-segfault-redux.html">My favorite bug: segfaults +in Java (redux)</a>.</p> +</blockquote> +<p>I’ve told this story orally a number of times, but realized that I +have never written it down. This is my favorite bug story; it might not +be my hardest bug, but it is the one I most like to tell.</p> +<h2 id="the-context">The context</h2> +<p>In 2012, I was a Senior programmer on the FIRST Robotics Competition +team 1024. For the unfamiliar, the relevant part of the setup is that +there are 2 minute and 15 second matches in which you have a 120 pound +robot that sometimes runs autonomously, and sometimes is controlled over +WiFi from a person at a laptop running stock “driver station” software +and modifiable “dashboard” software.</p> +<p>That year, we mostly used the dashboard software to allow the human +driver and operator to monitor sensors on the robot, one of them being a +video feed from a web-cam mounted on it. 
This was really easy because +the new standard dashboard program had a click-and-drag interface to add +stock widgets; you just had to make sure the code on the robot was +actually sending the data.</p> +<p>That’s great, until when debugging things, the dashboard would +suddenly vanish. If it was run manually from a terminal (instead of +letting the driver station software launch it), you would see a core +dump indicating a segmentation fault.</p> +<p>This wasn’t just us either; I spoke with people on other teams, and +everyone who was streaming video had this issue. But, because it only +happened every couple of minutes, and a match is only 2:15, it didn’t +need to run very long; they just crossed their fingers and hoped it +didn’t happen during a match.</p> +<p>The dashboard was written in Java, and the source was available +(under a 3-clause BSD license), so I dove in, hunting for the bug. Now, +the program did use Java Native Interface to talk to OpenCV, which the +video ran through; so I figured that it must be a bug in the C/C++ code +that was being called. It was especially a pain to track down the +pointers that were causing the issue, because it was hard with native +debuggers to see through all of the JVM stuff to the OpenCV code, and +the OpenCV stuff is opaque to Java debuggers.</p> +<p>Eventually the issue led me back into the Java code: there was a +native pointer being stored in a Java variable; Java code called the +native routine to <code>free()</code> the structure, but then tried to +feed it to another routine later.
This led to difficulty again: tracking +objects with Java debuggers was hard because they don’t expect the +program to suddenly segfault; it’s Java code, Java doesn’t segfault, it +throws exceptions!</p> +<p>With the help of <code>println()</code> I was eventually able to see +that some code was executing in an order that straight didn’t make +sense.</p> +<h2 id="the-bug">The bug</h2> +<p>The issue was that Java was making an unsafe optimization (I never +bothered to figure out if it is the compiler or the JVM making the +mistake; I was satisfied once I had a work-around).</p> +<p>Java was doing something similar to tail-call optimization with +regard to garbage collection. You see, if it is waiting for the return +value of a method <code>m()</code> of object <code>o</code>, and code in +<code>m()</code> that is yet to be executed doesn’t access any other +methods or properties of <code>o</code>, then it will go ahead and +consider <code>o</code> eligible for garbage collection before +<code>m()</code> has finished running.</p> +<p>That is normally a safe optimization to make… except for when a +destructor method (<code>finalize()</code>) is defined for the object; +the destructor can have side effects, and Java has no way to know +whether it is safe for them to happen before <code>m()</code> has +finished running.</p> +<h2 id="the-work-around">The work-around</h2> +<p>The routine that the segmentation fault was occurring in was +something like:</p> +<pre><code>public type1 getFrame() { + type2 child = this.getChild(); + type3 var = this.something(); + // `this` may now be garbage collected + return child.somethingElse(var); // segfault comes here +}</code></pre> +<p>Where the destructor method of <code>this</code> calls a method that +will <code>free()</code> native memory that is also accessed by +<code>child</code>; if <code>this</code> is garbage collected before +<code>child.somethingElse()</code> runs, the backing native code will +try to access memory that
has been <code>free()</code>ed, and receive a +segmentation fault. That usually didn’t happen, as the routines were +pretty fast. However, running 30 times a second, eventually bad luck +with the garbage collector happens, and the program crashes.</p> +<p>The work-around was to insert a bogus call on <code>this</code> to keep +<code>this</code> around until after we were also done with +<code>child</code>:</p> +<pre><code>public type1 getFrame() { + type2 child = this.getChild(); + type3 var = this.something(); + type1 ret = child.somethingElse(var); + this.getSize(); // bogus call to keep `this` around + return ret; +}</code></pre> +<p>Yeah. After spending weeks wading through thousands of lines +of Java, C, and C++, a bogus call to a method I didn’t care about was +the fix.</p> + +</article> +<footer> + <aside class="sponsor"><p>I'd love it if you <a class="em" + href="/sponsor/">sponsored me</a>. It will allow me to continue + <a class="em" href="/imworkingon/">my work</a> on the GNU/Linux + ecosystem. Thanks!</p></aside> + +<p>The content of this page is Copyright © 2014 <a href="mailto:lukeshu@lukeshu.com">Luke T. Shumaker</a>.</p> +<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> license.</p> +</footer> +</body> +</html>