author     Luke Shumaker <lukeshu@lukeshu.com>    2023-07-21 23:41:41 -0600
committer  Luke Shumaker <lukeshu@lukeshu.com>    2023-07-21 23:41:57 -0600
commit     9f1f26adbd35e5b968f3a561bc04fb7664322f55 (patch)
tree       ba66322ca7494e3df0a62ff1917ac069ad6d9392 /public
parent     3250a2386d3111a4ec51b37f42218c90b69ed341 (diff)
parent     997b21c0c3aa7aee8fb56ed94d0419e46816e8ac (diff)
make: btrfs-rec.md: Improve the markup
Diffstat (limited to 'public')
-rw-r--r--  public/assets/style.css   22
-rw-r--r--  public/btrfs-rec.html    613
-rw-r--r--  public/btrfs-rec.md      122
-rw-r--r--  public/index.atom        613
4 files changed, 750 insertions, 620 deletions
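Reflowed for readability, the net effect of the style.css hunk in this patch is approximately the following post-patch rules. (This is a sketch reconstructed from the diff below, not the verbatim file; note that the committed patch's second selector list ends with a stray trailing comma, `kbd, code, samp, tt, {`, which is dropped here since it makes the selector invalid.)

```css
kbd, code, samp, tt, pre {
  background: #DDDDFF;
}
kbd, code, samp, tt {
  white-space: pre-wrap;
}
@media screen {
  /* Firefox for Android does weird hacks with font-size; in
   * particular `white-space: pre` is broken there unless an
   * absolute font-size is set. So instead of `pre`, emulate it:
   * let the <pre> wrap and scroll, but force the inner <code>
   * to be at least as wide as its longest line. */
  pre {
    white-space: pre-wrap;
    overflow-x: auto;
  }
  pre > code {
    white-space: pre-wrap;
    display: block;
    min-width: max-content;
  }
}
@media print {
  pre, pre > code {
    white-space: pre-wrap;
  }
}
```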
diff --git a/public/assets/style.css b/public/assets/style.css index e653c21..1dbdb3d 100644 --- a/public/assets/style.css +++ b/public/assets/style.css @@ -30,10 +30,28 @@ li p + ul { margin-top: -1em; } kbd, code, samp, tt, pre { background: #DDDDFF; - white-space: pre; +} +kbd, code, samp, tt, { + white-space: pre-wrap; +} +@media screen { + /* Firefox for Android does weird hacks with font-size. In + * particular, `white-space: pre` is broken on Firefox for + * Android if you don't set an absolute font-size. So this is + * a hack to get reasonable behavior without setting + * `white-space: pre`. */ + pre { + white-space: pre-wrap; + overflow-x: auto; + } + pre > code { + white-space: pre-wrap; + display: block; + min-width: max-content; + } } @media print { - kbd, code, samp, tt, pre { + pre, pre > code { white-space: pre-wrap; } } diff --git a/public/btrfs-rec.html b/public/btrfs-rec.html index e28b9f2..3012538 100644 --- a/public/btrfs-rec.html +++ b/public/btrfs-rec.html @@ -16,8 +16,8 @@ btrfs-rec: Recover (data from) a broken btrfs filesystem</h1> <p>I originally sent this email on 2023-07-10, but it has been caught by their bogofilter. Yes, it was <a href="https://git.lukeshu.com/btrfs-progs-ng/tree/README.md?id=18e6066c241cf3d252b6521150843ffc858d8434">plaintext</a>. -No, I didn’t use GMail. Yes, I’ve successfully participated in vger -lists in the past. Yes, I’ve reached out to postmaster; no, I haven’t +No, I didn't use GMail. Yes, I've successfully participated in vger +lists in the past. 
Yes, I've reached out to postmaster; no, I haven't received a reply yet (as of 2023-07-14).</p> </blockquote> <div style="font-family: monospace"> @@ -28,17 +28,17 @@ a broken btrfs filesystem<br/> Date: Mon, 10 Jul 2023 21:23:41 <87jzv7uo5e.wl-lukeshu@lukeshu.com><br/></p> </div> <p>Inspired by a mis-typed <code>dd</code> command, for the last year -I’ve been working on a tool for recovering corrupt btrfs filesystems; at +I've been working on a tool for recovering corrupt btrfs filesystems; at first idly here and there, but more actively in the last few months. I hope to get it incorporated into btrfs-progs, though perhaps that is -problematic for a few reasons I’ll get to. If the code can’t be +problematic for a few reasons I'll get to. If the code can't be incorporated into btrfs-progs, at least the ideas and algorithms should be.</p> <p><a href="https://git.lukeshu.com/btrfs-progs-ng/">https://git.lukeshu.com/btrfs-progs-ng/</a></p> <p>Highlights:</p> <ul> -<li><p>In general, it’s more tolerant of corrupt filesystems than +<li><p>In general, it's more tolerant of corrupt filesystems than <code>btrfs check --repair</code>, <code>btrfs rescue</code> or <code>btrfs restore</code>.</p></li> <li><p><code>btrfs-rec inspect rebuild-mappings</code> is a better @@ -48,8 +48,8 @@ branches to broken B+ trees.</p></li> <li><p><code>btrfs-rec inspect mount</code> is a read-only FUSE implementation of btrfs. This is conceptually a replacement for <code>btrfs restore</code>.</p></li> -<li><p>It’s entirely written in Go. I’m not saying that’s a good thing, -but it’s an interesting thing.</p></li> +<li><p>It's entirely written in Go. I'm not saying that's a good thing, +but it's an interesting thing.</p></li> </ul> <p>Hopefully some folks will find it useful, or at least neat!</p> <ul> @@ -162,25 +162,26 @@ to try to get past those errors, only to get a different set of errors. Some of these patches I am separately submitting to btrfs-progs.)</p> <h1 id="overview-of-use">2. 
Overview of use</h1> <p>There are two <code>btrfs-rec</code> sub-command groups: -<code>btrfs-rec inspect SUBCMD</code> and -<code>btrfs-rec repair SUBCMD</code>, and you can find out about various +<code>btrfs-rec inspect <var>SUBCMD</var></code> and <code>btrfs-rec +repair <var>SUBCMD</var></code>, and you can find out about various sub-commands with <code>btrfs-rec help</code>. These are both told about devices or images with the <code>--pv</code> flag.</p> -<p><code>btrfs-rec inspect SUBCMD</code> commands open the filesystem -read-only, and (generally speaking) write extracted or rebuilt -information to stdout. <code>btrfs-rec repair SUBCMD</code> commands -open the filesystem read+write, and consume information from -<code>btrfs-rec inspect SUBCMD</code> commands to actually repair the -filesystem (except I haven’t actually implemented any -<code>repair</code> commands yet… despite the lack of -<code>repair</code> commands, I believe that <code>btrfs-rec</code> is -already a useful because of the <code>btrfs-rec inspect mount</code> -command to get data out of the broken filesystem). This split allows you -to try things without being scared by WARNINGs about not using these -tools unless you’re an expert or have been told to by a developer.</p> +<p><code>btrfs-rec inspect <var>SUBCMD</var></code> commands open the +filesystem read-only, and (generally speaking) write extracted or +rebuilt information to stdout. <code>btrfs-rec repair +<var>SUBCMD</var></code> commands open the filesystem read+write, and +consume information from <code>btrfs-rec inspect +<var>SUBCMD</var></code> commands to actually repair the filesystem +(except I haven't actually implemented any <code>repair</code> commands +yet... despite the lack of <code>repair</code> commands, I believe that +<code>btrfs-rec</code> is already a useful because of the +<code>btrfs-rec inspect mount</code> command to get data out of the +broken filesystem). 
This split allows you to try things without being +scared by WARNINGs about not using these tools unless you're an expert +or have been told to by a developer.</p> <p>In the broken <code>dump-zero.1.img</code> example above (which has a perfectly intact superblock, but a totally broken -<code>CHUNK_TREE</code>), to “repair” it I’d:</p> +<code>CHUNK_TREE</code>), to "repair" it I'd:</p> <ol type="1"> <li><p>Start by using <code>btrfs-rec inspect rebuild-mappings</code> to rebuild the broken chunk/dev/blockgroup trees:</p> @@ -188,7 +189,7 @@ rebuild the broken chunk/dev/blockgroup trees:</p> --pv=dump-zero.1.img \ > mappings-1.json</code></pre></li> <li><p>If it only mostly succeeds, but on stderr tells us about a few -regions of the image that it wasn’t able to figure out the chunks for. +regions of the image that it wasn't able to figure out the chunks for. Using some human-level knowledge, you can write those yourself, inserting them into the generated <code>mappings.json</code>, and ask <code>rebuild-mappings</code> to normalize what you wrote:</p> @@ -216,17 +217,17 @@ $ sudo btrfs-rec inspect mount \ ./mnt</code></pre></li> </ol> <p>This example is fleshed out more (and the manual edits to -<code>mappings.json</code> explained more) in -<code>./examples/main.sh</code>.</p> +<code>mappings.json</code> explained more) in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./examples/main.sh</code></a>.</p> <h1 id="prior-art">3. 
Prior art</h1> <p>Comparing <code>btrfs-rec inspect mount</code> with the existing https://github.com/adam900710/btrfs-fuse project:</p> <ul> <li>Again, mine has better fault tolerance</li> <li>Mine is read-only</li> -<li>Mine supports xattrs (“TODO” in Adam’s)</li> -<li>Mine supports separate inode address spaces for subvolumes; Adam’s -doesn’t due to limitations in FUSE, mine works around this by lazily +<li>Mine supports xattrs ("TODO" in Adam's)</li> +<li>Mine supports separate inode address spaces for subvolumes; Adam's +doesn't due to limitations in FUSE, mine works around this by lazily setting up separate mountpoints for each subvolume (though this does mean that the process needs to run as root, which is a bummer).</li> </ul> @@ -234,33 +235,55 @@ mean that the process needs to run as root, which is a bummer).</li> <h2 id="overview-of-the-source-tree-layout">4.1. Overview of the source tree layout</h2> <ul> -<li><p><code>examples/</code> has example scripts showing how to use -<code>btrfs-rec</code>.</p></li> -<li><p><code>lib/btrfs/</code> is the core btrfs -implementation.</p></li> -<li><p><code>lib/btrfscheck/</code> and <code>lib/btrfsutil/</code> are -libraries for “btrfs-progs” type programs, that are userland-y things -that I thought should be separate from the core implementation; -something that frustrated me about libbtrfs was having to figure out “is +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>examples/</code></a> +has example scripts showing how to use <code>btrfs-rec</code>.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfs/</code></a> +is the core btrfs implementation.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfscheck/</code></a> +and <a 
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfsutil/</code></a> +are libraries for "btrfs-progs" type programs, that are userland-y +things that I thought should be separate from the core implementation; +something that frustrated me about libbtrfs was having to figure out "is this thing here in support of btrfs bits-on-disk, or in support of a -higher-level ‘how btrfs-progs wants to think about things’?”</p></li> -<li><p><code>cmd/btrfs-rec/</code> is where the command implementations -live. If a sub-command fits in a single file, it’s -<code>cmd/btrfs-rec/inspect_SUBCMD.go</code>, otherwise, it’s in a -separate <code>cmd/btrfs-rec/inspect/SUBCMD/</code> package.</p></li> -<li><p><code>lib/textui/</code> is reasonably central to how the -commands implement a text/CLI user-interface.</p></li> -<li><p><code>lib/binstruct/</code>, <code>lib/diskio/</code>, and -<code>lib/streamio/</code> are non-btrfs-specific libraries related to -the problem domain.</p></li> -<li><p><code>lib/containers/</code>, <code>lib/fmtutil/</code>, -<code>lib/maps/</code>, <code>lib/slices/</code>, and -<code>lib/profile/</code> are all generic Go libraries that have nothing -to do with btrfs or the problem domain, but weren’t in the Go standard -library and I didn’t find/know-of exiting implementations that I liked. -Of these, all but <code>containers</code> are pretty simple utility -libraries. Also, some of these things have been added to the standard -library since I started the project.</p></li> +higher-level 'how btrfs-progs wants to think about things'?"</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>cmd/btrfs-rec/</code></a> +is where the command implementations live. 
If a sub-command fits in a +single file, it's +<code>cmd/btrfs-rec/inspect_<var>SUBCMD</var>.go</code>, otherwise, it's +in a separate <code>cmd/btrfs-rec/inspect/<var>SUBCMD</var>/</code> +package.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/textui/</code></a> +is reasonably central to how the commands implement a text/CLI +user-interface.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/binstruct?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/binstruct/</code></a>, +<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/diskio?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/diskio/</code></a>, +and <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/streamio?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/streamio/</code></a> +are non-btrfs-specific libraries related to the problem domain.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/containers?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/containers/</code></a>, +<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/fmtutil?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/fmtutil/</code></a>, +<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/maps?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/maps/</code></a>, +<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/slices?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/slices/</code></a>, +and <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/profile?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/profile/</code></a> +are all generic Go libraries that have nothing to do with btrfs or the +problem domain, but weren't in the Go standard library and I didn't +find/know-of exiting implementations that I liked. Of these, all but +<code>containers</code> are pretty simple utility libraries. 
Also, some +of these things have been added to the standard library since I started +the project.</p></li> </ul> <h2 id="base-decisions-cli-structure-go-json">4.2. Base decisions: CLI structure, Go, JSON</h2> @@ -273,8 +296,8 @@ decided I should just focus on learning btrfs-bits-on-disk.</p></li> <li><p>writing a new thing: It was becoming increasingly apparent to me that it was going to be an uphill-fight of having recovery-tools share the same code as the main-tools, as the routines used by the main-tools -rightly have validity checks, where recovery-tools want to say “yes, I -know it’s invalid, can you give it to me anyway?”.</p></li> +rightly have validity checks, where recovery-tools want to say "yes, I +know it's invalid, can you give it to me anyway?".</p></li> <li><p>writing it in not-C: I love me some C, but higher level languages are good for productivity. And I was trying to write a whole lot of code at once, I needed a productivity boost.</p></li> @@ -286,7 +309,7 @@ bits-on-disk.</p></li> Go, so I had Go swapped into my brain. And Go still feels close to C but provides <em>a lot</em> of niceness and safety over C.</p></li> </ul> -<p>It turned out that Go was perhaps not the best choice, but we’ll come +<p>It turned out that Go was perhaps not the best choice, but we'll come back to that.</p> <p>I wanted to separate things into a pipeline. For instance: Instead of <code>btrfs rescue chunk-recover</code> trying to do everything to @@ -299,31 +322,32 @@ of JSON. Then I can feed that JSON to rebuilds the mappings in the chunk tree, and dumps them as JSON. And then other commands can consume that <code>mappings.json</code> to use that instead of trying to read the chunk tree from the actual FS, so -that you don’t have to make potentially destructive writes to inspect an +that you don't have to make potentially destructive writes to inspect an FS with a broken chunk tree, and can inspect it more forensically. 
Or -then use <code>btrfs-rec repair SOME_SUBCMD_I_HAVENT_WRITTEN_YET</code> -to write that chunk tree in <code>mappings.json</code> back to the -filesystem.</p> +then use <code>btrfs-rec repair +<var>SOME_SUBCMD_I_HAVENT_WRITTEN_YET</var></code> to write that chunk +tree in <code>mappings.json</code> back to the filesystem.</p> <p>(But also, the separate steps thing was useful just so I could iterate on the algorithms of <code>rebuild-mappings process</code> separately from having to scan the entire FS)</p> -<p>So, I made the decision that <code>btrfs-rec inspect SUBCMD</code> -commands should all only open the FS read-only, and output their work to -a separate file; that writing that info back to the FS should be -separate in <code>btrfs-rec repair SUBCMD</code>.</p> +<p>So, I made the decision that <code>btrfs-rec inspect +<var>SUBCMD</var></code> commands should all only open the FS read-only, +and output their work to a separate file; that writing that info back to +the FS should be separate in <code>btrfs-rec repair +<var>SUBCMD</var></code>.</p> <p>For connecting those parts of the pipeline, I chose JSON, for a few reasons:</p> <ul> <li><p>I wanted something reasonably human-readable, so that I could debug it easier.</p></li> <li><p>I wanted something reasonably human-readable, so that human -end-users could make manual edits; for example, in -<code>examples/main.sh</code> I have an example of manually editing -<code>mappings.json</code> to resolve a region that the algorithm -couldn’t figure out, but with knowledge of what caused the corruption a -human can.</p></li> -<li><p>I didn’t want to invent my own DSL and have to handle writing a -parser. (This part didn’t pay off! 
See below.)</p></li> +end-users could make manual edits; for example, in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>examples/main.sh</code></a> +I have an example of manually editing <code>mappings.json</code> to +resolve a region that the algorithm couldn't figure out, but with +knowledge of what caused the corruption a human can.</p></li> +<li><p>I didn't want to invent my own DSL and have to handle writing a +parser. (This part didn't pay off! See below.)</p></li> <li><p>I wanted something that I thought would have good support in a variety of languages, so that if Go is problematic for getting things merged upstream it could be rewritten in C (or maybe Rust?) piece-meal @@ -334,13 +358,14 @@ where each subcommand can be rewritten one at a time.</p></li> <ul> <li><p>I spent a lot of time getting the garbage collector to not just kill performance.</p></li> -<li><p>The <code>btrfs-rec inspect rebuild-mappings SUBCMD</code> -subcommands all throw a lot of data through the JSON encoder/decoder, -and I learned that the Go stdlib <code>encoding/json</code> package has -memory use that grows O(n^2) (-ish? I didn’t study the implementation, -but that’s what the curve looks like just observing it) on the size of -the data being shoved through it, so I had to go take a break and go -write https://pkg.go.dev/git.lukeshu.com/go/lowmemjson which is a +<li><p>The <code>btrfs-rec inspect rebuild-mappings +<var>SUBCMD</var></code> subcommands all throw a lot of data through the +JSON encoder/decoder, and I learned that the Go stdlib +<code>encoding/json</code> package has memory use that grows O(n^2) +(-ish? 
I didn't study the implementation, but that's what the curve +looks like just observing it) on the size of the data being shoved +through it, so I had to go take a break and go write +https://pkg.go.dev/git.lukeshu.com/go/lowmemjson which is a mostly-drop-in-replacement that tries to be as close-as possible to O(1) memory use. So I did end up having to write my own parser anyway :(</p></li> @@ -348,7 +373,7 @@ memory use. So I did end up having to write my own parser anyway <h2 id="algorithms">4.3. Algorithms</h2> <p>There are 3 algorithms of note in <code>btrfs-rec</code>, that I think are worth getting into mainline btrfs-progs even if the code of -<code>btrfs-rec</code> doesn’t get in:</p> +<code>btrfs-rec</code> doesn't get in:</p> <ol type="1"> <li><p>The <code>btrfs-rec inspect rebuild-mappings</code> algoritithm to rebuild information from the <code>CHUNK_TREE</code>, @@ -362,8 +387,10 @@ re-attach lost branches to broken B+ trees.</p></li> <code>rebuild-mappings</code> algorithm</h3> <p>(This step-zero scan is <code>btrfs-rec inspect rebuild-mappings scan</code>, and principally -lives in <code>./lib/btrfsutil/scan.go</code> and -<code>./cmd/btrfs-rec/inspect/rebuidmappings/scan.go</code>)</p> +lives in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/scan.go</code></a> +and <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/scan.go</code></a>)</p> <ol start="0" type="1"> <li>Similar to <code>btrfs rescue chunk-recover</code>, scan each device for things that look like nodes; keep track of: @@ -379,22 +406,22 @@ the generation.</li> <p>Create a bucket of the data from Chunks, DevExtents, and BlockGroups; since these are mostly a Chunk and a DevExtent+BlockGroup store pretty much the same information; we can use one to 
reconstruct the other. How -we “merge” these and handle conflicts is in -<code>./lib/btrfs/btrfsvol/lvm.go:addMapping()</code>, I don’t think -this part is particularly clever, but given that +we "merge" these and handle conflicts is in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs/btrfsvol/lvm.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n121"><code>./lib/btrfs/btrfsvol/lvm.go:addMapping()</code></a>, +I don't think this part is particularly clever, but given that <code>btrfs rescue chunk-recover</code> crashes if it encounters two overlapping chunks, I suppose I should spell it out:</p> <ul> -<li><p>A “mapping” is represented as a group of 4 things:</p> +<li><p>A "mapping" is represented as a group of 4 things:</p> <ul> <li>logical address</li> <li>a list of 1 or more physical addresses (device ID and offset)</li> -<li>size, and a Boolean indicator of whether the size is “locked”</li> +<li>size, and a Boolean indicator of whether the size is "locked"</li> <li>block group flags, and a Boolean presence-indicator</li> </ul></li> <li><p>Mappings must be merged if their logical or physical regions overlap.</p></li> -<li><p>If a mapping has a “locked” size, then when merging it may +<li><p>If a mapping has a "locked" size, then when merging it may subsume smaller mappings with unlocked sizes, but its size cannot be changed; trying to merge a locked-size mapping with another mapping that is not for a subset region should return an error.</p></li> @@ -402,47 +429,48 @@ is not for a subset region should return an error.</p></li> not be changed; it may only be merged with another mapping that does not have flags present, or has identical flags.</p></li> <li><p>When returning an error because of overlapping non-mergeable -mappings, just log an error on stderr and keep going. That’s an +mappings, just log an error on stderr and keep going. 
That's an important design thing that is different than normal filesystem code; if -there’s an error, yeah, detect and notify about it, <strong>but don’t +there's an error, yeah, detect and notify about it, <strong>but don't bail out of the whole routine</strong>. Just skip that one item or whatever.</p></li> </ul> -<p>Now that we know how to “add a mapping”, let’s do that:</p> +<p>Now that we know how to "add a mapping", let's do that:</p> <p>(The following main-steps are <code>btrfs-rec inspect rebuild-mappings process</code>, and principally -live in -<code>./cmd/btrfs-rec/inspect/rebuidmappings/process.go</code>)</p> +live in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/process.go</code></a>)</p> <ol type="1"> <li><p>Add all found Chunks.</p></li> <li><p>Add all found DevExtents.</p></li> <li><p>Add a phyical:logical mapping of length nodesize for each node that was found.</p></li> <li><p>Any mappings from steps 2 or 3 that are missing blockgroup flags -(that is: they weren’t able to be merged with a mapping from step 1), +(that is: they weren't able to be merged with a mapping from step 1), use the found BlockGroups to fill in those flags.</p></li> -<li><p>Now we’ll merge all found CSum items into a map of the sums of +<li><p>Now we'll merge all found CSum items into a map of the sums of the logical address space. Sort all of the csum items by generation, then by address. Loop over them in that order, inserting their sums into the map. If two csum items overlap, but agree about the sums of the -overlapping region, that’s fine, just take their union. For overlaps +overlapping region, that's fine, just take their union. For overlaps that disagree, items with a newer generation kick out items with an -older generation. 
If disagreeing items have the same generation… I don’t -think that can happen except by a filesystem bug (i.e. not by a failing -drive or other external corruption), so I wasn’t too concerned about it, -so I just log an error on stderr and skip the later-processed item. See -<code>./cmd/btrfs-rec/inspect/rebuidmappings/process_sums_logical.go</code>.</p> +older generation. If disagreeing items have the same generation... I +don't think that can happen except by a filesystem bug (i.e. not by a +failing drive or other external corruption), so I wasn't too concerned +about it, so I just log an error on stderr and skip the later-processed +item. See <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go</code></a>.</p> <p>Look at regions of the logical address space that meet all the 3 criteria:</p> <ul> <li>we have CSum items for them</li> <li>we have a BlockGroup for them</li> -<li>we don’t have a Chunk/DevExtent mapping them to the pysical address +<li>we don't have a Chunk/DevExtent mapping them to the pysical address space.</li> </ul> <p>Pair those CSums up with BlockGroups, and for each BlockGroup, search the list of checksums of physical blocks to try to find a physical -region that matches the logical csums (and isn’t already mapped to a +region that matches the logical csums (and isn't already mapped to a different logical region). I used a Knuth-Morris-Pratt search, modified to handle holes in the logical csum list as wildcards.</p> <p>Insert any found mappings into our bucket of mappings.</p></li> @@ -456,34 +484,35 @@ second best is less than a 50% match, then I add the best match. In my experience, the best match is >90% (or at whatever the maximum percent is for how much of the BlockGroup has logical sums), and the second best is 0% or 1%. 
The point of tracking both is that if there -isn’t a clear-cut winner, I don’t want it to commit to a potentially +isn't a clear-cut winner, I don't want it to commit to a potentially wrong choice.</p></li> </ol> <h3 id="the---rebuild-algorithm">4.3.2. The <code>--rebuild</code> algorithm</h3> <p>The <code>--rebuild</code> flag is implied by the <code>--trees=trees.json</code> flag, and triggers an algorithm that -allows “safely” reading from a broken B+ tree, rather than the usual B+ +allows "safely" reading from a broken B+ tree, rather than the usual B+ tree lookup and search functions. I probably should have tried to understand the <code>btrfs restore</code> algorithm, maybe I reinvented -the wheel…</p> +the wheel...</p> <p>This algorithm requires a list of all nodes on the filesystem; we -find these using the same scan as above -(<code>./lib/btrfsutil/scan.go</code>), the same procedure as -<code>btrfs rescue chunk-recover</code>.</p> +find these using the same scan as above (<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/scan.go</code></a>), +the same procedure as <code>btrfs rescue chunk-recover</code>.</p> <p>We walk all of those nodes, and build a reasonably lightweight -in-memory graph of all nodes (<code>./lib/btrfsutil/graph.go</code>), +in-memory graph of all nodes (<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/graph.go</code></a>), tracking</p> <ul> -<li>each node’s +<li>each node's <ul> <li>logical address</li> <li>level</li> <li>generation</li> <li>tree</li> -<li>each item’s key and size</li> +<li>each item's key and size</li> </ul></li> -<li>each keypointer’s +<li>each keypointer's <ul> <li>source node</li> <li>source slot within the node</li> @@ -494,20 +523,22 @@ tracking</p> <li>destination generation</li> </ul></li> <li>logical addresses and error 
messages for nodes that are pointed to -by a keypointer or the superblock, but can’t be read (because that -logical address isn’t mapped, or it doesn’t look like a node, or…)</li> +by a keypointer or the superblock, but can't be read (because that +logical address isn't mapped, or it doesn't look like a node, +or...)</li> <li>an index such that for a given node we can quickly list both keypointers both originating at that node and pointing to that node.</li> </ul> <h4 id="rebuilt-forrest-behavior-looking-up-trees">4.3.2.1. rebuilt forrest behavior (looking up trees)</h4> -<p>(see: <code>./lib/btrfsutil/rebuilt_forrest.go</code>)</p> +<p>(see: <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_forrest.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/rebuilt_forrest.go</code></a>)</p> <ul> <li>The <code>ROOT_TREE</code>, <code>CHUNK_TREE</code>, <code>TREE_LOG</code>, and <code>BLOCK_GROUP_TREE</code> (the trees -pointed to directy by the superblock) work as you’d expect.</li> -<li>For other trees, we (as you’d expect) look up the root item in the +pointed to directy by the superblock) work as you'd expect.</li> +<li>For other trees, we (as you'd expect) look up the root item in the rebuilt <code>ROOT_TREE</code>, and then (if rootitem.ParentUUID is non-zero) eagerly also look up the parent tree (recursing on ourself). We try to use the <code>UUID_TREE</code> tree to help with this, but @@ -516,7 +547,7 @@ If we fail to look up the parent tree (or its parent, or a more distant ancestor), then (depending on a flag) we either make a note of that, or error out and fail to look up the child tree. 
For <code>--rebuild</code> and <code>--trees=trees.json</code> we are permissive of this error, and -just make note of it; but we’ll re-use this algorithm in the +just make note of it; but we'll re-use this algorithm in the <code>rebuild-trees</code> algorithm below, and it needs the more strict handling.</li> <li>When creating the rebuilt individual tree, we start by adding the @@ -528,15 +559,16 @@ additional root nodes grafted on to the tree by the </ul> <h4 id="rebuilt-individual-tree-behavior">4.3.2.2. rebuilt individual tree behavior</h4> -<p>(see: <code>./lib/btrfsutil/rebuilt_tree.go</code>)</p> +<p>(see: <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_tree.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/rebuilt_tree.go</code></a>)</p> <p>In order to read from a tree, we first have to build a few indexes. We store these indexes in an Adaptive Replacement Cache; they are all -re-buildable based on the tree’s list of roots and the above graph; if -we have a bunch of trees we don’t need to keep all of this in memory at -once. Note that this is done 100% with the in-memory graph, we don’t +re-buildable based on the tree's list of roots and the above graph; if +we have a bunch of trees we don't need to keep all of this in memory at +once. Note that this is done 100% with the in-memory graph, we don't need to read anything from the filesystem during these procedures.</p> <ul> -<li><p>The first index we build is the “node index”. This is an index +<li><p>The first index we build is the "node index". This is an index that for every node tells us what root(s) the tree would need to have in order for the tree to include that node, and also what the highest item key would be acceptable in the node if the tree includes that root. We @@ -545,12 +577,12 @@ case the tree is real broken and there are multiple paths from the root to the node; as these different paths may imply different max-item constraints. 
Put more concretely, the type of the index is:</p> <pre><code>map[ nodeID → map[ rootNodeID → {loMaxItem, hiMaxItem} ] ]</code></pre> -<p>We’ll do a loop over the graph, using dynamic-programming memoization +<p>We'll do a loop over the graph, using dynamic-programming memoization to figure out ordering and avoid processing the same node twice; for -each node we’ll</p> +each node we'll</p> <ul> -<li><p>Check whether the owner-tree is this tree or one of this tree’s -ancestors (and if it’s an ancestor, that the node’s generation isn’t +<li><p>Check whether the owner-tree is this tree or one of this tree's +ancestors (and if it's an ancestor, that the node's generation isn't after the point that the child tree was forked from the parent tree). If not, we are done processing that node (record an empty/nil set of roots for it).</p></li> @@ -558,27 +590,27 @@ for it).</p></li> {<code>loMaxItem</code>, <code>hiMaxItem</code>}.</p></li> <li><p>Look at each keypointer that that points at the node and:</p> <ul> -<li><p>Skip the keypointer if its expectations of the node aren’t met: -if the level, generation, and min-key constraints don’t match up. If the -keypointer isn’t in the last slot in the source node, we also go ahead -and include checking that the destination node’s max-key is under the -min-key of the keypointer in the next slot, since that’s cheap to do +<li><p>Skip the keypointer if its expectations of the node aren't met: +if the level, generation, and min-key constraints don't match up. 
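The index type written out above can be sketched in Go with illustrative stand-in types (not the real btrfs-progs-ng declarations):

```go
package main

type NodeID uint64

// Key is a btrfs item key; ordering is objectID, then itemType, then offset.
type Key struct {
	ObjectID uint64
	ItemType uint8
	Offset   uint64
}

// maxItem carries the {loMaxItem, hiMaxItem} pair: the tightest and
// loosest acceptable last-item key for a node under a given root,
// since multiple paths from a root may imply different constraints.
type maxItem struct {
	Lo, Hi Key
}

// nodeIndex: for every node, the set of roots that would cause the
// tree to include that node, and the max-item constraint each root
// implies for it.
type nodeIndex map[NodeID]map[NodeID]maxItem
```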
If the
+keypointer isn't in the last slot in the source node, we also go ahead
+and include checking that the destination node's max-key is under the
+min-key of the keypointer in the next slot, since that's cheap to do
now.</p></li>
-<li><p>Skip the keypointer if its source node’s owner-tree isn’t this
-tree or one of this tree’s ancestors (and if it’s an ancestor, that the
-node’s generation isn’t after the point that the child tree was forked
+<li><p>Skip the keypointer if its source node's owner-tree isn't this
+tree or one of this tree's ancestors (and if it's an ancestor, that the
+node's generation isn't after the point that the child tree was forked
from the parent tree).</p></li>
-<li><p>dynamic-programming recurse and index the keypointer’s source
+<li><p>dynamic-programming recurse and index the keypointer's source
node.</p></li>
-<li><p>for every root that would result in the keypointer’s source node
+<li><p>for every root that would result in the keypointer's source node
being included in the tree:</p>
<p>. If the keypointer is in the last slot, look at what the
-source node’s last-item constraints would be if that root is included,
+source node's last-item constraints would be if that root is included,
and can now check the max-item of our destination node. We check against
the <code>hiMaxItem</code>; as if there is any valid path from the root
to this node, then we want to be permissive and include it. If that
-check fails, then we’re done with this keypointer. Also, make node of
-those <code>loMaxItem</code> and <code>hiMaxItem</code> values, we’ll
+check fails, then we're done with this keypointer. Also, make note of
+those <code>loMaxItem</code> and <code>hiMaxItem</code> values, we'll
use them again in just a moment.</p>
<p>. 
Otherwise, set both <code>loMaxItem</code> and <code>hiMaxItem</code> to 1-under the min-item of the keypointer in the @@ -600,14 +632,14 @@ be a (potential) root, and insert <code>rootID=thisNode</code> -> <code>hiMaxItem</code>} map and insert it into the index as the entry for this node.</p></li> </ul></li> -<li><p>The next index we build is the “item index”. This is a “sorted -map” (implemented as a red-black tree, supporting sub-range iteration) +<li><p>The next index we build is the "item index". This is a "sorted +map" (implemented as a red-black tree, supporting sub-range iteration) of <code>key</code> → {<code>nodeID</code>, <code>slotNumber</code>}; a map that for each key tells us where to find the item with that key.</p> <ul> <li><p>Loop over the node index, and for each node check if both (a) it has <code>level==0</code> (is a leaf node containing items), and (b) its -set of roots that would include it has any overlap with the tree’s set +set of roots that would include it has any overlap with the tree's set of roots.</p></li> <li><p>Loop over each of those included leaf nodes, and loop over the items in each node. Insert the <code>key</code> → {<code>nodeId</code>, @@ -617,10 +649,10 @@ that key, decide which one wins by:</p> <li><p>Use the one from the node with the owner-tree that is closer to this tree; node with owner=thisTree wins over a node with owner=thisTree.parent, which would win over a node with -owner.thisTree.parent.parent. If that’s a tie, then…</p></li> -<li><p>Use the one from the node with the higher generation. If that’s a -tie, then…</p></li> -<li><p>I don’t know, I have the code <code>panic</code>:</p> +owner.thisTree.parent.parent. If that's a tie, then...</p></li> +<li><p>Use the one from the node with the higher generation. 
If that's a +tie, then...</p></li> +<li><p>I don't know, I have the code <code>panic</code>:</p> <pre><code>// TODO: This is a panic because I'm not really sure what the // best way to handle this is, and so if this happens I want the // program to crash and force me to figure out how to handle it. @@ -633,27 +665,27 @@ panic(fmt.Errorf("dup nodes in tree=%v: old=%v=%v ; new=%v=%v", <p>Note that this algorithm means that for a given node we may use a few items from that node, while having other items from that same node be overridden by another node.</p></li> -<li><p>The final index we build is the “error index”. This is an index +<li><p>The final index we build is the "error index". This is an index of what errors correspond to which range of keys, so that we can report -them, and give an idea of “there may be entries missing from this -directory” and similar.</p> -<p>For each error, we’ll track the min-key and max-key of the range it -applies to, the node it came from, and what the error string is. We’ll +them, and give an idea of "there may be entries missing from this +directory" and similar.</p> +<p>For each error, we'll track the min-key and max-key of the range it +applies to, the node it came from, and what the error string is. We'll store these into an interval tree keyed on that min-key/max-key range.</p> <ul> <li><p>Create an empty set <code>nodesToProcess</code>. Now populate it:</p> <ul> -<li><p>Once again, we’ll loop over the node index, but this time we’ll -only check that there’s overlap between the set of roots that would -include the node and the tree’s set of roots. The nodes that are +<li><p>Once again, we'll loop over the node index, but this time we'll +only check that there's overlap between the set of roots that would +include the node and the tree's set of roots. 
For the nodes that are
included in this tree, insert both that node itself and all node IDs
that it has keypointers pointing to into the <code>nodesToProcess</code>
set.</p></li>
-<li><p>Also insert all of the tree’s roots into
+<li><p>Also insert all of the tree's roots into
<code>nodesToProcess</code>; this is in case the superblock/root-item
-points to an invalid node that we couldn’t read.</p></li>
+points to an invalid node that we couldn't read.</p></li>
</ul></li>
<li><p>Now loop over <code>nodesToProcess</code>. For each node, create
an empty list of errors. Use the keypointers pointing to and the min
@@ -662,17 +694,17 @@ expectations for the node; this should be reasonably straight-forward,
given:</p>
<ul>
<li><p>If different keypointers have disagreeing levels, insert an error
-in to the list, and don’t bother with checking the node’s
+into the list, and don't bother with checking the node's
level.</p></li>
<li><p>If different keypointers have disagreeing generations, insert an
-error in to the list, and don’t bother with checking the node’s
+error into the list, and don't bother with checking the node's
generation.</p></li>
<li><p>If different keypointers have different min-item expectations,
use the max of them.</p></li>
</ul>
<p>Then:</p>
<ul>
-<li>If the node is a “bad node” in the graph, insert the error message
+<li>If the node is a "bad node" in the graph, insert the error message
associated with it. Otherwise, check those expectations against the
node in the graph.</li>
</ul>
@@ -688,22 +720,23 @@ then set the range to the zero-key through the max-key.</p></li>
operations using those indexes; exact-lookup using the item index, and
range-lookups and walks using the item index together with the error
index. 
Efficiently searching the <code>CSUM_TREE</code> requires knowing
-item sizes, so that’s why we recorded the item sizes into the graph.</p>
+item sizes, so that's why we recorded the item sizes into the graph.</p>
<h3 id="the-rebuild-trees-algorithm">4.3.3. The
<code>rebuild-trees</code> algorithm</h3>
<p>The <code>btrfs inspect rebuild-trees</code> algorithm finds nodes to
-attach as extra roots to trees. I think that conceptually it’s the the
+attach as extra roots to trees. I think that conceptually it's the
simplest of the 3 algorithms, but turned out to be the hardest to get
-right. So… maybe more than the others reference the source code too
-(<code>./cmd/btrfs-rec/inspect/rebuildtrees/</code>) because I might
-forget some small but important detail.</p>
+right. So... maybe more than the others reference the source code too
+(<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/</code></a>)
+because I might forget some small but important detail.</p>
-<p>The core idea here is that we’re just going to walk each tree,
+<p>The core idea here is that we're just going to walk each tree,
inspecting each item in the tree, and checking for any items that are
implied by other items (e.g.: a dir entry item implies the existence of
an inode item for the inode that it points at). If an implied item is not
in the tree, but is in some other node, then we look at which potential
roots we could add to the tree that would add that other node. Then,
-after we’ve processed all of the items in the filesystem, we go add
+after we've processed all of the items in the filesystem, we go add
those various roots to the various trees, keeping track of which items
are added or updated. 
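The "items implied by other items" check at the heart of that walk can be pictured as a table-driven function. A minimal sketch, with made-up stand-in types rather than the real table:

```go
package main

// ItemType and Item are illustrative stand-ins; the real table covers
// many more item types.
type ItemType string

const (
	DirEntry  ItemType = "DIR_ITEM"
	InodeItem ItemType = "INODE_ITEM"
)

type Item struct {
	Type     ItemType
	ObjectID uint64 // for DirEntry: the inode number it points at (simplified)
}

// impliedItems returns the items whose existence is implied by the
// given item; e.g. a directory entry implies the inode it points at.
func impliedItems(it Item) []Item {
	switch it.Type {
	case DirEntry:
		return []Item{{Type: InodeItem, ObjectID: it.ObjectID}}
	default:
		return nil
	}
}
```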
If any of those added/updated items have a
version with a newer generation on a different node, see what roots we
could add
@@ -713,16 +746,17 @@ version of each item has been added, loop back and inspect all
added/updated items for implied items, keeping track of roots we could
add. Repeat until a steady-state is reached.</p>
<p>There are lots of little details in that process, some of which are
-for correctness, and some of which are for “it should run in hours
-instead of weeks.”</p>
+for correctness, and some of which are for "it should run in hours
+instead of weeks."</p>
<h4 id="initialization">4.3.3.1. initialization</h4>
-<p>First up, we’re going to build and in-memory graph, same as above.
-But this time, while we’re reading the nodes to do that, we’re also
+<p>First up, we're going to build an in-memory graph, same as above.
+But this time, while we're reading the nodes to do that, we're also
going to watch for some specific items and record a few things about
them.</p>
-<p>(see: <code>./cmd/btrfs-rec/inspect/rebuildtrees/scan.go</code>)</p>
+<p>(see: <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/scan.go</code></a>)</p>
<p>For each {<code>nodeID</code>, <code>slotNumber</code>} pair that
-matches one of these item types, we’re going to record:</p>
+matches one of these item types, we're going to record:</p>
<ul>
<li>flags:
<ul>
@@ -731,17 +765,17 @@ matches one of these item types, we’re going to record:</p>
</ul></li>
<li>names:
<ul>
-<li><code>DIR_INDEX</code> items: the file’s name</li>
+<li><code>DIR_INDEX</code> items: the file's name</li>
</ul></li>
<li>sizes:
<ul>
<li><code>EXTENT_CSUM</code> items: the number of bytes that this is a
-sum for (i.e. the item size over the checksum size, times the block
+sum for (i.e. 
the item size over the checksum size, times the block size)</li> <li><code>EXTENT_DATA</code> items: the number of bytes in this extent -(i.e. either the item size minus +(i.e. either the item size minus <code>offsetof(btrfs_file_extent_item.disk_bytenr)</code> if -<code>FILE_EXTENT_INLINE</code>, or else the item’s +<code>FILE_EXTENT_INLINE</code>, or else the item's <code>num_bytes</code>).</li> </ul></li> <li>data backrefs: @@ -756,19 +790,19 @@ at.</li> </ul></li> </ul> <h4 id="the-main-loop">4.3.3.2. the main loop</h4> -<p>(see: -<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go</code>)</p> +<p>(see: <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go</code></a>)</p> <p>Start with that scan data (graph + info about items), and also a rebuilt forrest from the above algorithm, but with:</p> <ul> -<li><p>the flag set so that it refuses to look up a tree if it can’t -look up all of that tree’s ancestors</p></li> -<li><p>an additional “potential-item index” that is similar to the item +<li><p>the flag set so that it refuses to look up a tree if it can't +look up all of that tree's ancestors</p></li> +<li><p>an additional "potential-item index" that is similar to the item index. 
It is generated the same way and can cache/evict the same way; the difference is that we invert the check for if the set of roots for a -node has overlap with the tree’s set of roots; we’re looking for +node has overlap with the tree's set of roots; we're looking for <em>potential</em> nodes that we could add to this tree.</p></li> -<li><p>some callbacks; we’ll get to what we do in these callbacks in a +<li><p>some callbacks; we'll get to what we do in these callbacks in a bit, but for now, what the callbacks are:</p> <ul> <li><p>a callback that is called for each added/updated item when we add @@ -779,10 +813,10 @@ a root.</p></li> ID.</p></li> </ul></li> </ul> -<p>(The callbacks are in -<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go</code>)</p> -<p>We have 5 unordered queues (“work lists”?); these are sets that when -it’s time to drain them we’ll sort the members and process them in that +<p>(The callbacks are in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go</code></a>)</p> +<p>We have 5 unordered queues ("work lists"?); these are sets that when +it's time to drain them we'll sort the members and process them in that order.</p> <ol type="1"> <li>the tree queue: a list of tree IDs that we need to crawl</li> @@ -791,7 +825,7 @@ should re-process if we add a root to that tree</li> <li>the added-item queue: a set of key/tree pairs identifying items that have been added by adding a root to a tree</li> <li>the settled-item-queue: a set of key/tree pairs that have have not -just been added by adding a root, but we’ve also verified that they are +just been added by adding a root, but we've also verified that they are the newest-generation item with that key that we could add to the tree.</li> <li>the augment queue: for each item that we want to add to a tree, the @@ -799,19 +833,19 @@ list of 
roots that we could add to get that item.</li>
</ol>
<p>The queues all start out empty, except for the tree queue, which we
seed with the <code>ROOT_TREE</code>, the <code>CHUNK_TREE</code>, and
-the <code>BLOCK_GROUP_TREE</code> (It is a “TODO” task that it should
+the <code>BLOCK_GROUP_TREE</code> (It is a "TODO" task that it should
probably also be seeded with the <code>TREE_LOG</code>, but as I will
-say below in the “future work” section, I don’t actually understand the
-<code>TREE_LOG</code>, so I couldn’t implement it).</p>
-<p>Now we’re going to loop until the tree queue, added-item queue,
+say below in the "future work" section, I don't actually understand the
+<code>TREE_LOG</code>, so I couldn't implement it).</p>
+<p>Now we're going to loop until the tree queue, added-item queue,
settled-item queue, and augment queue are all empty (all queues except
-for the retry-item queue). Each loop “pass” has 3 substeps:</p>
+for the retry-item queue). Each loop "pass" has 3 substeps:</p>
<ol type="1">
<li><p>Crawl the trees (drain the tree queue, fill the added-item
queue).</p></li>
<li><p>Either:</p>
<ol type="a">
-<li><p>if the added-item queue is non-empty: “settle” those items (drain
+<li><p>if the added-item queue is non-empty: "settle" those items (drain
the added-item queue, fill the augment queue and the settled-item
queue).</p></li>
<li><p>otherwise: process items (drain the settled-item queue, fill the
@@ -820,7 +854,7 @@ augment queue and the tree queue)</p></li>
<li><p>Apply augments (drain the augment queue and maybe the retry-item
queue, fill the added-item queue).</p></li>
</ol>
-<p>OK, let’s look at those 3 substeps in more detail:</p>
+<p>OK, let's look at those 3 substeps in more detail:</p>
<ol type="1">
<li><p>Crawl the trees; drain the tree queue, fill the added-item
queue.</p>
@@ -833,34 +867,34 @@ callback for each item in one of the added nodes. Our callback inserts
each item into the added-item queue. 
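One detail worth making concrete: the potential-item index mentioned earlier differs from the ordinary item index only in an inverted set-overlap check on roots. A minimal sketch of that check (illustrative types):

```go
package main

type NodeID uint64
type rootSet map[NodeID]struct{}

// overlaps reports whether the two sets share any member.
func overlaps(a, b rootSet) bool {
	for k := range a {
		if _, ok := b[k]; ok {
			return true
		}
	}
	return false
}

// A leaf node shows up in the item index when its root-set overlaps
// the tree's roots; the potential-item index simply inverts the check,
// looking for nodes we *could* add to this tree.
func inItemIndex(nodeRoots, treeRoots rootSet) bool      { return overlaps(nodeRoots, treeRoots) }
func inPotentialIndex(nodeRoots, treeRoots rootSet) bool { return !overlaps(nodeRoots, treeRoots) }
```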
The forrest also calls our root-added callback, but because of the way this algorithm works, that turns out to be a no-op at this step.</p> -<p>I mentioned that we added callbacks to intercept the forrest’s -looking up of root items and resolving UUIDs; we override the forrest’s -“lookup root item” routine and “resolve UUID” routine to instead of +<p>I mentioned that we added callbacks to intercept the forrest's +looking up of root items and resolving UUIDs; we override the forrest's +"lookup root item" routine and "resolve UUID" routine to instead of doing normal lookups on the <code>ROOT_TREE</code> and -<code>UUID_TREE</code>, use the above <code>WantXXX</code> routines that -we’ll define below in the “graph callbacks” section.</p> -<p>It shouldn’t matter what order this queue is processed in, but I sort +<code>UUID_TREE</code>, use the above <code>Want<var>XXX</var></code> +routines that we'll define below in the "graph callbacks" section.</p> +<p>It shouldn't matter what order this queue is processed in, but I sort tree IDs numerically.</p> -<p>The crawling is fairly fast because it’s just in-memory, the only +<p>The crawling is fairly fast because it's just in-memory, the only accesses to disk are looking up root items and resolving UUIDs.</p></li> <li><p>Either:</p> <ol type="a"> <li><p>Settle items from the added-item queue to the settled-item queue (and fill the augment queue).</p> -<p>For each item in the queue, we look in the tree’s item index to get -the {node, slot} pair for it, then we do the same in the tree’s +<p>For each item in the queue, we look in the tree's item index to get +the {node, slot} pair for it, then we do the same in the tree's potential-item index. If the potential-item index contains an entry for -the item’s key, then we check if the potential-item’s node should “win” -over the queue item’s node, deciding the “winner” using the same routine -as when building the item index. 
If the potential-item’s node wins, then -we add the potential node’s set of roots to the augment queue. If the -queue-item’s node wins, then we add the item to the settled-item queue +the item's key, then we check if the potential-item's node should "win" +over the queue item's node, deciding the "winner" using the same routine +as when building the item index. If the potential-item's node wins, then +we add the potential node's set of roots to the augment queue. If the +queue-item's node wins, then we add the item to the settled-item queue (except, as an optimization, if the item is of a type that cannot possibly imply the existence of another item, then we just drop it and -don’t add it to the settled-item queue).</p> -<p>It shouldn’t matter what order this queue is processed in, but I sort +don't add it to the settled-item queue).</p> +<p>It shouldn't matter what order this queue is processed in, but I sort it numerically by treeID and then by item key.</p> -<p>This step is fairly fast because it’s entirely in-memory, making no +<p>This step is fairly fast because it's entirely in-memory, making no accesses to disk.</p></li> <li><p>Process items from the settled-item queue (drain the settled-item queue, fill the augment queue and the tree queue).</p> @@ -870,7 +904,7 @@ patterns cache-friendly. For the most part, we just sort each queue item by tree, then by key. But, we have special handling for <code>EXTENT_ITEM</code>s, <code>METADATA_ITEM</code>s, and <code>EXTENT_DATA_REF</code> items: We break <code>EXTENT_ITEM</code>s -and <code>METADATA_ITEM</code>s in to “sub-items”, treating each ref +and <code>METADATA_ITEM</code>s in to "sub-items", treating each ref embedded in them as a separate item. 
For those embedded items that are <code>EXTENT_DATA_REF</code>s, and for stand-alone <code>EXTENT_DATA_REF</code> items, we sort them not with the @@ -882,99 +916,99 @@ that we may visit/read an <code>EXTENT_ITEM</code> or <code>METADATA_ITEM</code> multiple times as we process the queue, but to do otherwise is to solve MinLA, which is NP-hard and also an optimal MinLA solution I still think would perform worse than this; there is a -reasonably lengthy discussion of this in a comment in -<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()</code>.</p> +reasonably lengthy discussion of this in a comment in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n251"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()</code></a>.</p> <p>Now we loop over that sorted queue. In the code, this loop is deceptively simple. Read the item, then pass it to a function that tells us what other items are implied by it. That function is large, but -simple; it’s just a giant table. The trick is how it tells us about +simple; it's just a giant table. The trick is how it tells us about implied items; we give it set of callbacks that it calls to tell us -these things; the real complexity is in the callbacks. These “graph -callbacks” will be discussed in detail below, but as an illustrative +these things; the real complexity is in the callbacks. 
These "graph +callbacks" will be discussed in detail below, but as an illustrative example: It may call <code>.WantOff()</code> with a tree ID, object ID, item type, and offset to specify a precise item that it believes should exist.</p> <p>If we encounter a <code>ROOT_ITEM</code>, add the tree described by that item to the tree queue.</p></li> </ol> -<p>(Both the “can this item even imply the existence of another item” -check and the “what items are implied by this item” routine are in -<code>./lib/btrfscheck/graph.go</code>)</p></li> +<p>(Both the "can this item even imply the existence of another item" +check and the "what items are implied by this item" routine are in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfscheck/graph.go</code></a>)</p></li> <li><p>Apply augments; drain the augment queue (and maybe the retry-item queue), fill the added-item queuee.</p> -<p>It is at this point that I call out that the augment queue isn’t +<p>It is at this point that I call out that the augment queue isn't implemented as a simple map/set like the others, the <code>treeAugmentQueue struct</code> has special handling for sets of different sizes; optimizing the space for empty and len()==1 sized sets, and falling back to normal the usual implementation for larger sets; this is important because those small sets are the overwhelming -majority, and otherwise there’s no way the program would be able to run +majority, and otherwise there's no way the program would be able to run on my 32GB RAM laptop. Now that I think about it, I bet it would even be worth it to add optimized storage for len()==2 sized sets.</p> -<p>The reason is that each “want” from above is tracked in the queue +<p>The reason is that each "want" from above is tracked in the queue separately; if we were OK merging them, then this optimized storage -wouldn’t be nescessary. 
But we keep them separate, so that:</p>
<ul>
-<li><p>For all “wants”, including ones with empty sets, graph callbacks
+<li><p>For all "wants", including ones with empty sets, graph callbacks
can check if a want has already been processed; avoiding re-doing any
work (see the description of the graph callbacks below).</p></li>
-<li><p>For “wants” with non-empty sets, we can see how many different
-“wants” could be satisfied with a given root, in order to decide which
+<li><p>For "wants" with non-empty sets, we can see how many different
+"wants" could be satisfied with a given root, in order to decide which
root to choose.</p></li>
</ul>
<p>Anyway, we loop over the trees in the augment queue. For each tree we
-look at that tree’s augment queue and look at all the choices of root
+look at that tree's augment queue and look at all the choices of root
nodes to add (below), and decide on a list to add. Then we add each of
those roots to the tree; the adding of each root triggers several calls
to our item-added callback (filling the added-item queue), and our
root-added callback. The root-added callback moves any items from the
retry-item queue for this tree to the added-item queue.</p>
-<p>How do we decide between choices of root nodes to add?
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()</code>
-has a good comment explaining the criteria we’d like to optimize for,
+<p>How do we decide between choices of root nodes to add? 
<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n528"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()</code></a>
+has a good comment explaining the criteria we'd like to optimize for,
and then code that does an OK-ish job of actually optimizing for
that:</p>
<ul>
<li><p>It loops over the augment queue for that tree, building a list of
possible roots, for each possible root making note of 3 things:</p>
<ol type="a">
-<li><p>how many “wants” that root satisfies,</p></li>
-<li><p>how far from treee the root’s owner is (owner=tree is a distance
+<li><p>how many "wants" that root satisfies,</p></li>
+<li><p>how far from tree the root's owner is (owner=tree is a distance
of 0, owner=tree.parent is a distance of 1, owner=tree.parent.parent is
a distance of 2, and so on), and</p></li>
<li><p>what the generation of that root is.</p></li>
</ol></li>
<li><p>We sort that list first by highest-count-first, then by
lowest-distance-first, then by highest-generation-first.</p></li>
-<li><p>We create a “return” set and an “illegal” set. We loop over the
+<li><p>We create a "return" set and an "illegal" set. 
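The sort-and-select shape of that procedure can be sketched roughly like this (illustrative types and names; the real logic, with more careful tie-breaking, is in `resolveTreeAugments()`):

```go
package main

import "sort"

type NodeID uint64

// candidate describes one possible root to graft on, with the 3
// things noted for it (names are mine, for illustration).
type candidate struct {
	Root       NodeID
	WantsMet   int    // (a) how many "wants" this root satisfies
	Distance   int    // (b) 0 = owner is this tree, 1 = parent, ...
	Generation uint64 // (c)
}

// pick sorts candidates by the stated criteria, then greedily selects
// roots; once a want is satisfied, every alternative root for that
// want becomes illegal.  wants maps each want to its satisfying roots.
func pick(cands []candidate, wants map[string][]NodeID) []NodeID {
	sort.Slice(cands, func(i, j int) bool {
		a, b := cands[i], cands[j]
		if a.WantsMet != b.WantsMet {
			return a.WantsMet > b.WantsMet // highest count first
		}
		if a.Distance != b.Distance {
			return a.Distance < b.Distance // lowest distance first
		}
		return a.Generation > b.Generation // highest generation first
	})
	var ret []NodeID
	illegal := map[NodeID]bool{}
	for _, c := range cands {
		if illegal[c.Root] {
			continue
		}
		ret = append(ret, c.Root)
		for _, roots := range wants {
			if contains(roots, c.Root) {
				for _, r := range roots {
					illegal[r] = true
				}
			}
		}
	}
	return ret
}

func contains(s []NodeID, id NodeID) bool {
	for _, x := range s {
		if x == id {
			return true
		}
	}
	return false
}
```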
We loop over the
sorted list; for each possible root if it is in the illegal set, we skip
-it, otherwise we insert it into the return set and for each “want” that
+it, otherwise we insert it into the return set and for each "want" that
includes this root we add all roots that satisfy that want to the
illegal list.</p></li>
</ul></li>
</ol>
<p>It is important that the rebuilt forrest have the flag set so that it
-refuses to look up a tree if it can’t look up all of that tree’s
+refuses to look up a tree if it can't look up all of that tree's
ancestors; otherwise the potential-items index would be garbage as we
-wouldn’t have a good idea of which nodes are OK to consider; but this
-does have the downside that it won’t even attempt to improve a tree with
+wouldn't have a good idea of which nodes are OK to consider; but this
+does have the downside that it won't even attempt to improve a tree with
a missing parent. Perhaps the algorithm should flip the flag once the
loop terminates, and then re-seed the tree queue with each
<code>ROOT_ITEM</code> from the <code>ROOT_TREE</code>?</p>
<h4 id="graph-callbacks">4.3.3.3. graph callbacks</h4>
-<p>(see:
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go</code>)</p>
+<p>(see: <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go</code></a>)</p>
<p>The graph callbacks are what tie the above together.</p>
<p>For each of these callbacks, whenever I say that it looks up
-something in a tree’s item index or potential-item index, that implies
+something in a tree's item index or potential-item index, that implies
looking the tree up from the forrest; if the forrest cannot look up that
tree, then the callback returns early, after either:</p>
<ul>
<li><p>if we are in substep 1 and are processing a tree: we add the tree
that is being processed to the tree queue. 
(TODO: Wait, this assumes
that an augment will be applied to the <code>ROOT_TREE</code> before the
-next pass… if that isn’t the case, this will result in the loop never
-terminating… I guess I need to add a separate retry-tree
+next pass... if that isn't the case, this will result in the loop never
+terminating... I guess I need to add a separate retry-tree
queue?)</p></li>
<li><p>if we are in substep 2 and are processing an item: we add the
item that is being processed to the retry-item queue for the tree that
@@ -983,15 +1017,15 @@ cannot be looked up</p></li>
<p>The 6 methods in the <code>btrfscheck.GraphCallbacks</code> interface
are:</p>
<ol type="1">
-<li><p><code>FSErr()</code>: There’s an error with the filesystem; this
+<li><p><code>FSErr()</code>: There's an error with the filesystem; this
callback just spits it out on stderr. I mention such a trivial matter
-because, again, for a recovery tool I think it’s worth putting care in
+because, again, for a recovery tool I think it's worth putting care in
to how you handle errors and where you expect them: We expect them here,
so we have to check for them to avoid reading invalid data or whatever,
-but we don’t actually need to do anything other than watch our
+but we don't actually need to do anything other than watch our
step.</p></li>
<li><p><code>Want()</code>: We want an item in a given tree with a given
-object ID and item type, but we don’t care about what the item’s offset
+object ID and item type, but we don't care about what the item's offset
is.</p>
<p>The callback works by searching the item index to see if it can find
such an item; if so, it has nothing else to do and returns. 
Otherwise, @@ -1003,13 +1037,13 @@ augment queue (even if that set is still empty).</p></li> <li><p><code>WantOff()</code>: The same, but we want a specific offset.</p></li> <li><p><code>WantDirIndex()</code>: We want a <code>DIR_INDEX</code> -item for a given inode and filename, but we don’t know what the offset +item for a given inode and filename, but we don't know what the offset of that item is.</p> <p>First we scan over the item index, looking at all <code>DIR_INDEX</code> items for that inode number. For each item, we can check the scan data to see what the filename in that <code>DIR_INDEX</code> is, so we can see if the item satisfies this want -without accessing the disk. If there’s a match, then there is nothing +without accessing the disk. If there's a match, then there is nothing else to do, so we return. Otherwise, we do that same search over the potential-item index; if we find any matches, then we build the set of roots to add to the augment queue the same as in @@ -1018,71 +1052,73 @@ roots to add to the augment queue the same as in <code>DATA_EXTENT</code> items in the given tree for the given inode, and we want them to cover from 0 to a given size bytes of that file.</p> <p>First we walk that range in the item index, to build a list of the -gaps that we need to fill (“Step 1” in -<code>rebuild_wantcb.go:_wantRange()</code>). This walk -(<code>rebuild_wantcb.go:_walkRange()</code>) requires knowing the size -of each file extent; so doing this quickly without hitting disk is why -we recorded the size of each file extent in our initialization step.</p> -<p>Then (“Step 2” in <code>_wantRange()</code>) we iterate over each of +gaps that we need to fill ("Step 1" in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n260"><code>rebuild_wantcb.go:_wantRange()</code></a>). 
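That Step-1 gap-building walk can be sketched in Go (illustrative types; the real `_walkRange()` operates on the item index with keys and item sizes, not a plain slice):

```go
package main

// extent is a covered byte range of a file, half-open [Beg, End).
type extent struct{ Beg, End int64 }

// findGaps walks the already-present extents (assumed sorted by Beg)
// over the range [0, size) and returns the sub-ranges not covered by
// any extent: the gaps that still need to be filled.
func findGaps(size int64, have []extent) []extent {
	var gaps []extent
	pos := int64(0)
	for _, e := range have {
		if e.Beg > pos {
			gaps = append(gaps, extent{pos, e.Beg})
		}
		if e.End > pos {
			pos = e.End
		}
	}
	if pos < size {
		gaps = append(gaps, extent{pos, size})
	}
	return gaps
}
```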
+This walk (<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n195"><code>rebuild_wantcb.go:_walkRange()</code></a>)
+requires knowing the size of each file extent; so doing this quickly
+without hitting disk is why we recorded the size of each file extent in
+our initialization step.</p>
+<p>Then ("Step 2" in <code>_wantRange()</code>) we iterate over each of
the gaps, and for each gap do a very similar walk (again, by calling
<code>_walkRange()</code>), but this time over the potential-item index.
For each file extent we find that is entirely within the gap, we
-“want” that extent, and move the beginning of of the gap forward to the
+"want" that extent, and move the beginning of the gap forward to the
end of that extent. This algorithm is dumb and greedy, potentially
making sub-optimal selections; and so could probably stand to be
-improved; but in my real-world use, it seems to be “good
-enough”.</p></li>
+improved; but in my real-world use, it seems to be "good
+enough".</p></li>
<li><p><code>WantCSum()</code>: We want 1 or more
<code>EXTENT_CSUM</code> items to cover the half-open interval
[<code>lo_logical_addr</code>, <code>hi_logical_addr</code>). Well,
maybe. It also takes a subvolume ID and an inode number; and looks up in
the scan data whether that inode has the <code>INODE_NODATASUM</code>
flag set; if it does have the flag set, then it returns early without
-looking for any <code>EXTENT_CSUM</code> items. If it doesn’t return
+looking for any <code>EXTENT_CSUM</code> items. 
If it doesn't return early, then it performs the same want-range routine as <code>WantFileExt</code>, but with the appropriate tree, object ID, and item types for csums as opposed to data extents.</p></li> </ol> -<p>For each of these callbacks, we generate a “wantKey”, a tuple +<p>For each of these callbacks, we generate a "wantKey", a tuple representing the function and its arguments; we check the augment-queue -to see if we’ve already enqueued a set of roots for that want, and if +to see if we've already enqueued a set of roots for that want, and if so, that callback can return early without checking the potential-item index.</p> <h1 id="future-work">5. Future work</h1> -<p>It’s in a reasonably useful place, I think; and so now I’m going to -take a break from it for a while. But there’s still lots of work to +<p>It's in a reasonably useful place, I think; and so now I'm going to +take a break from it for a while. But there's still lots of work to do:</p> <ul> -<li><p>RAID almost certainly doesn’t work.</p></li> +<li><p>RAID almost certainly doesn't work.</p></li> <li><p>Encryption is not implemented.</p></li> -<li><p>It doesn’t understand (ignores) the <code>TREE_LOG</code> -(because I don’t understand the <code>TREE_LOG</code>).</p></li> -<li><p><code>btrfs-rec inspect mount</code> should add “lost+found” -directories for inodes that are included in the subvolume’s tree but -aren’t reachable from the tree’s root inode</p></li> -<li><p>I still need to implement <code>btrfs-rec repair SUBCMD</code> -subcommands to write rebuilt-information from +<li><p>It doesn't understand (ignores) the <code>TREE_LOG</code> +(because I don't understand the <code>TREE_LOG</code>).</p></li> +<li><p><code>btrfs-rec inspect mount</code> should add "lost+found" +directories for inodes that are included in the subvolume's tree but +aren't reachable from the tree's root inode</p></li> +<li><p>I still need to implement <code>btrfs-rec repair +<var>SUBCMD</var></code> subcommands to 
write rebuilt-information from
<code>btrfs-rec inspect</code> back to the filesystem.</p></li>
<li><p>I need to figure out the error handling/reporting story for
<code>mount</code>.</p></li>
<li><p>It needs a lot more tests</p>
<ul>
-<li>I’d like to get the existing btrfs-progs fsck tests to run on
+<li>I'd like to get the existing btrfs-progs fsck tests to run on
it.</li>
</ul></li>
<li><p>In the process of writing this email, I realized that I probably
-need to add a retry-tree queue; see the “graph callbacks” section in the
+need to add a retry-tree queue; see the "graph callbacks" section in the
description of the <code>rebuild-trees</code> algorithm above.</p></li>
-<li><p>Shere are a number of “TODO” comments or panics in the code:</p>
+<li><p>There are a number of "TODO" comments or panics in the code:</p>
<ul>
<li><p>Some of them definitely need done.</p></li>
<li><p>Some of them are <code>panic("TODO")</code> on the basis that if
-it’s seeing something on the filesystem that it doesn’t recognize, it’s
-probably that I didn’t get to implementing that thing/situation, but
-it’s possible that the thing is just corrupt. This should only be for
+it's seeing something on the filesystem that it doesn't recognize, it's
+probably that I didn't get to implementing that thing/situation, but
+it's possible that the thing is just corrupt. 
This should only be for
situations where the node passed the checksum test, so it being corrupt
would have to be caused by a bug in btrfs rather than a failing drive or
-other corruption; I wasn’t too worried about btrfs bugs.</p></li>
+other corruption; I wasn't too worried about btrfs bugs.</p></li>
</ul></li>
<li><p><code>btrfs-rec inspect rebuild-trees</code> is slow, and can
probably be made a lot faster.</p>
@@ -1094,12 +1130,13 @@ steps on my ThinkPad E15 for a 256GB disk image are as follows:</p>
btrfs-rec inspect rebuild-trees : 1h 4m 55s
btrfs-rec inspect ls-files : 29m 55s
btrfs-rec inspect ls-trees : 8m 40s</code></pre>
-<p>For the most part, it’s all single-threaded (with the main exception
+<p>For the most part, it's all single-threaded (with the main exception
that in several places I/O has been moved to a separate thread from the
main CPU-heavy thread), but a lot of the algorithms could be
parallelized.</p></li>
-<li><p>There are a lot of “tunable” values that I haven’t really spent
-time tuning. These are all annotated with <code>textui.Tunable()</code>.
+<li><p>There are a lot of "tunable" values that I haven't really spent
+time tuning. These are all annotated with <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui/tunable.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>textui.Tunable()</code></a>.
I sort-of intended for them to be adjustable on the CLI.</p></li>
<li><p>Perhaps the <code>btrfs inspect rebuild-trees</code> algorithm
could be adjusted to also try to rebuild trees with missing parents; see
@@ -1108,20 +1145,20 @@ the above discussion of the algorithm.</p></li>
<h1 id="problems-for-merging-this-code-into-btrfs-progs">6. 
Problems for merging this code into btrfs-progs</h1>
<ul>
-<li><p>It’s written in Go, not C.</p></li>
-<li><p>It’s effectively GPLv3+ (not GPLv2-only or GPLv2+) because of use
+<li><p>It's written in Go, not C.</p></li>
+<li><p>It's effectively GPLv3+ (not GPLv2-only or GPLv2+) because of use
of some code under the Apache 2.0 license (2 files in the codebase
itself that are based off of Apache-licensed code, and use of unmodified
3rd-party libraries).</p></li>
<li><p>It uses ARC (Adaptive Replacement Cache), which is patented by
-IBM, and the patent doesn’t expire for another 7 months. An important
+IBM, and the patent doesn't expire for another 7 months. An important
property of ARC over LRU is that it is scan-resistant; the above
algorithms do a lot of scanning. On that note, now that RedHat is owned
by IBM: who in the company do we need to get to talk to each other so
that we can get ARC into the Linux kernel before then?</p></li>
</ul>
<div style="font-family: monospace">
-<p>– <br/> Happy hacking,<br/> ~ Luke Shumaker<br/></p>
+<p>-- <br/> Happy hacking,<br/> ~ Luke Shumaker<br/></p>
</div>
</article>
diff --git a/public/btrfs-rec.md b/public/btrfs-rec.md
index 8519969..f5d8dc6 100644
--- a/public/btrfs-rec.md
+++ b/public/btrfs-rec.md
@@ -2,6 +2,7 @@ Announcing: btrfs-rec: Recover (data from) a broken btrfs filesystem
====================================================================
---
date: "2023-07-10"
+markdown_options: "-smart"
---
> I originally sent this email on 2023-07-10, but it has been caught
@@ -170,14 +171,14 @@ btrfs-progs.)
# 2. Overview of use
There are two `btrfs-rec` sub-command groups:
-`btrfs-rec inspect SUBCMD` and `btrfs-rec repair SUBCMD`, and you can
+<code>btrfs-rec inspect <var>SUBCMD</var></code> and <code>btrfs-rec repair <var>SUBCMD</var></code>, and you can
find out about various sub-commands with `btrfs-rec help`. These are
both told about devices or images with the `--pv` flag. 
-`btrfs-rec inspect SUBCMD` commands open the filesystem read-only, and
+<code>btrfs-rec inspect <var>SUBCMD</var></code> commands open the filesystem read-only, and
(generally speaking) write extracted or rebuilt information to stdout.
-`btrfs-rec repair SUBCMD` commands open the filesystem read+write, and
-consume information from `btrfs-rec inspect SUBCMD` commands to
+<code>btrfs-rec repair <var>SUBCMD</var></code> commands open the filesystem read+write, and
+consume information from <code>btrfs-rec inspect <var>SUBCMD</var></code> commands to
actually repair the filesystem (except I haven't actually implemented
any `repair` commands yet... despite the lack of `repair` commands, I
believe that `btrfs-rec` is already useful because of the `btrfs-rec
@@ -231,7 +232,7 @@ I'd:
./mnt
This example is fleshed out more (and the manual edits to
-`mappings.json` explained more) in `./examples/main.sh`.
+`mappings.json` explained more) in [`./examples/main.sh`].
# 3. Prior art
@@ -251,30 +252,30 @@ https://github.com/adam900710/btrfs-fuse project:
## 4.1. Overview of the source tree layout
- - `examples/` has example scripts showing how to use `btrfs-rec`.
+ - [`examples/`] has example scripts showing how to use `btrfs-rec`.
- - `lib/btrfs/` is the core btrfs implementation.
+ - [`lib/btrfs/`] is the core btrfs implementation.
- - `lib/btrfscheck/` and `lib/btrfsutil/` are libraries for
+ - [`lib/btrfscheck/`] and [`lib/btrfsutil/`] are libraries for
"btrfs-progs" type programs, that are userland-y things that I
thought should be separate from the core implementation; something
that frustrated me about libbtrfs was having to figure out "is this
thing here in support of btrfs bits-on-disk, or in support of a
higher-level 'how btrfs-progs wants to think about things'?"
- - `cmd/btrfs-rec/` is where the command implementations live. If a
+ - [`cmd/btrfs-rec/`] is where the command implementations live. 
If a
sub-command fits in a single file, it's
- `cmd/btrfs-rec/inspect_SUBCMD.go`, otherwise, it's in a separate
- `cmd/btrfs-rec/inspect/SUBCMD/` package.
+ <code>cmd/btrfs-rec/inspect_<var>SUBCMD</var>.go</code>, otherwise, it's in a separate
+ <code>cmd/btrfs-rec/inspect/<var>SUBCMD</var>/</code> package.
- - `lib/textui/` is reasonably central to how the commands implement a
+ - [`lib/textui/`] is reasonably central to how the commands implement a
text/CLI user-interface.
- - `lib/binstruct/`, `lib/diskio/`, and `lib/streamio/` are
+ - [`lib/binstruct/`], [`lib/diskio/`], and [`lib/streamio/`] are
non-btrfs-specific libraries related to the problem domain.
- - `lib/containers/`, `lib/fmtutil/`, `lib/maps/`, `lib/slices/`, and
- `lib/profile/` are all generic Go libraries that have nothing to do
+ - [`lib/containers/`], [`lib/fmtutil/`], [`lib/maps/`], [`lib/slices/`], and
+ [`lib/profile/`] are all generic Go libraries that have nothing to do
with btrfs or the problem domain, but weren't in the Go standard
library and I didn't find/know-of existing implementations that I
liked. Of these, all but `containers` are pretty simple utility
@@ -324,17 +325,17 @@ can consume that `mappings.json` to use that instead of trying to read
the chunk tree from the actual FS, so that you don't have to make
potentially destructive writes to inspect an FS with a broken chunk
tree, and can inspect it more forensically. Or then use
-`btrfs-rec repair SOME_SUBCMD_I_HAVENT_WRITTEN_YET` to write that
+<code>btrfs-rec repair <var>SOME_SUBCMD_I_HAVENT_WRITTEN_YET</var></code> to write that
chunk tree in `mappings.json` back to the filesystem. 
(But also, the separate steps thing was useful just so I could iterate on the algorithms of `rebuild-mappings process` separately from having to scan the entire FS) -So, I made the decision that `btrfs-rec inspect SUBCMD` commands +So, I made the decision that <code>btrfs-rec inspect <var>SUBCMD</var></code> commands should all only open the FS read-only, and output their work to a separate file; that writing that info back to the FS should be -separate in `btrfs-rec repair SUBCMD`. +separate in <code>btrfs-rec repair <var>SUBCMD</var></code>. For connecting those parts of the pipeline, I chose JSON, for a few reasons: @@ -344,7 +345,7 @@ reasons: - I wanted something reasonably human-readable, so that human end-users could make manual edits; for example, in - `examples/main.sh` I have an example of manually editing + [`examples/main.sh`] I have an example of manually editing `mappings.json` to resolve a region that the algorithm couldn't figure out, but with knowledge of what caused the corruption a human can. @@ -364,7 +365,7 @@ OK, so: Go and/or JSON maybe being mistakes: - I spent a lot of time getting the garbage collector to not just kill performance. - - The `btrfs-rec inspect rebuild-mappings SUBCMD` subcommands all + - The <code>btrfs-rec inspect rebuild-mappings <var>SUBCMD</var></code> subcommands all throw a lot of data through the JSON encoder/decoder, and I learned that the Go stdlib `encoding/json` package has memory use that grows O(n^2) (-ish? I didn't study the implementation, but that's @@ -394,8 +395,8 @@ doesn't get in: ### 4.3.1. The `rebuild-mappings` algorithm (This step-zero scan is `btrfs-rec inspect rebuild-mappings scan`, and -principally lives in `./lib/btrfsutil/scan.go` and -`./cmd/btrfs-rec/inspect/rebuidmappings/scan.go`) +principally lives in [`./lib/btrfsutil/scan.go`] and +[`./cmd/btrfs-rec/inspect/rebuildmappings/scan.go`]) 0. 
Similar to `btrfs rescue chunk-recover`, scan each device for things that look like nodes; keep track of: @@ -410,7 +411,7 @@ Create a bucket of the data from Chunks, DevExtents, and BlockGroups; since these are mostly a Chunk and a DevExtent+BlockGroup store pretty much the same information; we can use one to reconstruct the other. How we "merge" these and handle conflicts is in -`./lib/btrfs/btrfsvol/lvm.go:addMapping()`, I don't think this +[`./lib/btrfs/btrfsvol/lvm.go:addMapping()`], I don't think this part is particularly clever, but given that `btrfs rescue chunk-recover` crashes if it encounters two overlapping chunks, I suppose I should spell it out: @@ -445,7 +446,7 @@ Now that we know how to "add a mapping", let's do that: (The following main-steps are `btrfs-rec inspect rebuild-mappings process`, and principally live in -`./cmd/btrfs-rec/inspect/rebuidmappings/process.go`) +[`./cmd/btrfs-rec/inspect/rebuildmappings/process.go`]) 1. Add all found Chunks. @@ -469,7 +470,7 @@ process`, and principally live in by a filesystem bug (i.e. not by a failing drive or other external corruption), so I wasn't too concerned about it, so I just log an error on stderr and skip the later-processed item. See - `./cmd/btrfs-rec/inspect/rebuidmappings/process_sums_logical.go`. + [`./cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go`]. Look at regions of the logical address space that meet all the 3 criteria: @@ -510,11 +511,11 @@ probably should have tried to understand the `btrfs restore` algorithm, maybe I reinvented the wheel... This algorithm requires a list of all nodes on the filesystem; we find -these using the same scan as above (`./lib/btrfsutil/scan.go`), the +these using the same scan as above ([`./lib/btrfsutil/scan.go`]), the same procedure as `btrfs rescue chunk-recover`. 
We walk all of those nodes, and build a reasonably lightweight
-in-memory graph of all nodes (`./lib/btrfsutil/graph.go`), tracking
+in-memory graph of all nodes ([`./lib/btrfsutil/graph.go`]), tracking
- each node's
+ logical address
@@ -540,7 +541,7 @@ in-memory graph of all nodes (`./lib/btrfsutil/graph.go`), tracking
#### 4.3.2.1. rebuilt forrest behavior (looking up trees)
-(see: `./lib/btrfsutil/rebuilt_forrest.go`)
+(see: [`./lib/btrfsutil/rebuilt_forrest.go`])
- The `ROOT_TREE`, `CHUNK_TREE`, `TREE_LOG`, and `BLOCK_GROUP_TREE`
(the trees pointed to directly by the superblock) work as you'd
@@ -564,7 +565,7 @@ in-memory graph of all nodes (`./lib/btrfsutil/graph.go`), tracking
#### 4.3.2.2. rebuilt individual tree behavior
-(see: `./lib/btrfsutil/rebuilt_tree.go`)
+(see: [`./lib/btrfsutil/rebuilt_tree.go`])
In order to read from a tree, we first have to build a few indexes. We
store these indexes in an Adaptive Replacement Cache; they are all
@@ -748,7 +749,7 @@ The `btrfs inspect rebuild-trees` algorithm finds nodes to attach as
extra roots to trees. I think that conceptually it's the simplest
of the 3 algorithms, but turned out to be the hardest to get right.
So... maybe more than the others reference the source code too
-(`./cmd/btrfs-rec/inspect/rebuildtrees/`) because I might forget some
+([`./cmd/btrfs-rec/inspect/rebuildtrees/`]) because I might forget some
small but important detail.
The core idea here is that we're just going to walk each tree,
@@ -778,7 +779,7 @@ But this time, while we're reading the nodes to do that, we're also
going to watch for some specific items and record a few things about
them.
-(see: `./cmd/btrfs-rec/inspect/rebuildtrees/scan.go`)
+(see: [`./cmd/btrfs-rec/inspect/rebuildtrees/scan.go`])
For each {`nodeID`, `slotNumber`} pair that matches one of these item
types, we're going to record:
@@ -804,7 +805,7 @@ types, we're going to record:
#### 4.3.3.2. 
the main loop -(see: `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go`) +(see: [`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go`]) Start with that scan data (graph + info about items), and also a rebuilt forrest from the above algorithm, but with: @@ -831,7 +832,7 @@ rebuilt forrest from the above algorithm, but with: + a callback that intercepts resolving an UUID to an object ID. (The callbacks are in - `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go`) + [`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go`]) We have 5 unordered queues ("work lists"?); these are sets that when it's time to drain them we'll sort the members and process them in @@ -892,7 +893,7 @@ OK, let's look at those 3 substeps in more detail: looking up of root items and resolving UUIDs; we override the forrest's "lookup root item" routine and "resolve UUID" routine to instead of doing normal lookups on the `ROOT_TREE` and - `UUID_TREE`, use the above `WantXXX` routines that we'll define + `UUID_TREE`, use the above <code>Want<var>XXX</var></code> routines that we'll define below in the "graph callbacks" section. It shouldn't matter what order this queue is processed in, but I @@ -948,7 +949,7 @@ OK, let's look at those 3 substeps in more detail: to solve MinLA, which is NP-hard and also an optimal MinLA solution I still think would perform worse than this; there is a reasonably lengthy discussion of this in a comment in - `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()`. + [`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()`]. Now we loop over that sorted queue. In the code, this loop is deceptively simple. Read the item, then pass it to a function @@ -967,7 +968,7 @@ OK, let's look at those 3 substeps in more detail: (Both the "can this item even imply the existence of another item" check and the "what items are implied by this item" routine are in - `./lib/btrfscheck/graph.go`) + [`./lib/btrfscheck/graph.go`]) 3. 
Apply augments; drain the augment
queue (and maybe the retry-item queue), fill the added-item queue.
@@ -1007,7 +1008,7 @@ OK, let's look at those 3 substeps in more detail:
this tree to the added-item queue.
How do we decide between choices of root nodes to add?
- `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()`
+ [`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()`]
has a good comment explaining the criteria we'd like to optimize
for, and then code that does an OK-ish job of actually optimizing
for that:
@@ -1044,7 +1045,7 @@ once the loop terminates, and then re-seed the tree queue with each
#### 4.3.3.3. graph callbacks
-(see: `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go`)
+(see: [`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go`])
The graph callbacks are what tie the above together.
@@ -1106,8 +1107,8 @@ The 6 methods in the `brfscheck.GraphCallbacks` interface are:
First we walk that range in the item index, to build a list of the
gaps that we need to fill ("Step 1" in
- `rebuild_wantcb.go:_wantRange()`). This walk
- (`rebuild_wantcb.go:_walkRange()`) requires knowing the size of
+ [`rebuild_wantcb.go:_wantRange()`]). This walk
+ ([`rebuild_wantcb.go:_walkRange()`]) requires knowing the size of
each file extent; so doing this quickly without hitting disk is
why we recorded the size of each file extent in our
initialization step.
@@ -1155,7 +1156,7 @@ do:
inodes that are included in the subvolume's tree but aren't
reachable from the tree's root inode
- - I still need to implement `btrfs-rec repair SUBCMD` subcommands to
+ - I still need to implement <code>btrfs-rec repair <var>SUBCMD</var></code> subcommands to
write rebuilt-information from `btrfs-rec inspect` back to the
filesystem.
@@ -1204,7 +1205,7 @@ do:
could be parallelized.
- There are a lot of "tunable" values that I haven't really spent
- time tuning. These are all annotated with `textui.Tunable()`. I
+ time tuning. 
These are all annotated with [`textui.Tunable()`]. I sort-of intended for them to be adjustable on the CLI. - Perhaps the `btrfs inspect rebuild-trees` algorithm could be @@ -1232,3 +1233,40 @@ do: Happy hacking,<br/> ~ Luke Shumaker<br/> </div> + +[`./examples/main.sh`]: https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`examples/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/examples?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/btrfs/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/btrfscheck/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/btrfsutil/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`cmd/btrfs-rec/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/textui/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/binstruct/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/binstruct?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/diskio/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/diskio?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/streamio/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/streamio?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/containers/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/containers?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/fmtutil/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/fmtutil?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/maps/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/maps?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/slices/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/slices?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`lib/profile/`]: 
https://git.lukeshu.com/btrfs-progs-ng/tree/lib/profile?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`examples/main.sh`]: https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfsutil/scan.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildmappings/scan.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfs/btrfsvol/lvm.go:addMapping()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs/btrfsvol/lvm.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n121 +[`./cmd/btrfs-rec/inspect/rebuildmappings/process.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfsutil/scan.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfsutil/graph.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfsutil/rebuilt_forrest.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_forrest.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./lib/btrfsutil/rebuilt_tree.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_tree.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/scan.go`]: 
https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n251 +[`./lib/btrfscheck/graph.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n528 +[`./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434 +[`rebuild_wantcb.go:_wantRange()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n260 +[`rebuild_wantcb.go:_walkRange()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n195 +[`textui.Tunable()`]: https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui/tunable.go?id=18e6066c241cf3d252b6521150843ffc858d8434 diff --git a/public/index.atom b/public/index.atom index d1d7786..3612db1 100644 --- a/public/index.atom +++ b/public/index.atom @@ -24,8 +24,8 @@ btrfs-rec: Recover (data from) a broken btrfs 
filesystem</h1> <p>I originally sent this email on 2023-07-10, but it has been caught by their bogofilter. Yes, it was <a href="https://git.lukeshu.com/btrfs-progs-ng/tree/README.md?id=18e6066c241cf3d252b6521150843ffc858d8434">plaintext</a>. -No, I didn’t use GMail. Yes, I’ve successfully participated in vger -lists in the past. Yes, I’ve reached out to postmaster; no, I haven’t +No, I didn't use GMail. Yes, I've successfully participated in vger +lists in the past. Yes, I've reached out to postmaster; no, I haven't received a reply yet (as of 2023-07-14).</p> </blockquote> <div style="font-family: monospace"> @@ -36,17 +36,17 @@ a broken btrfs filesystem<br/> Date: Mon, 10 Jul 2023 21:23:41 &lt;87jzv7uo5e.wl-lukeshu@lukeshu.com&gt;<br/></p> </div> <p>Inspired by a mis-typed <code>dd</code> command, for the last year -I’ve been working on a tool for recovering corrupt btrfs filesystems; at +I've been working on a tool for recovering corrupt btrfs filesystems; at first idly here and there, but more actively in the last few months. I hope to get it incorporated into btrfs-progs, though perhaps that is -problematic for a few reasons I’ll get to. If the code can’t be +problematic for a few reasons I'll get to. If the code can't be incorporated into btrfs-progs, at least the ideas and algorithms should be.</p> <p><a href="https://git.lukeshu.com/btrfs-progs-ng/">https://git.lukeshu.com/btrfs-progs-ng/</a></p> <p>Highlights:</p> <ul> -<li><p>In general, it’s more tolerant of corrupt filesystems than +<li><p>In general, it's more tolerant of corrupt filesystems than <code>btrfs check --repair</code>, <code>btrfs rescue</code> or <code>btrfs restore</code>.</p></li> <li><p><code>btrfs-rec inspect rebuild-mappings</code> is a better @@ -56,8 +56,8 @@ branches to broken B+ trees.</p></li> <li><p><code>btrfs-rec inspect mount</code> is a read-only FUSE implementation of btrfs. 
This is conceptually a replacement for <code>btrfs restore</code>.</p></li> -<li><p>It’s entirely written in Go. I’m not saying that’s a good thing, -but it’s an interesting thing.</p></li> +<li><p>It's entirely written in Go. I'm not saying that's a good thing, +but it's an interesting thing.</p></li> </ul> <p>Hopefully some folks will find it useful, or at least neat!</p> <ul> @@ -170,25 +170,26 @@ to try to get past those errors, only to get a different set of errors. Some of these patches I am separately submitting to btrfs-progs.)</p> <h1 id="overview-of-use">2. Overview of use</h1> <p>There are two <code>btrfs-rec</code> sub-command groups: -<code>btrfs-rec inspect SUBCMD</code> and -<code>btrfs-rec repair SUBCMD</code>, and you can find out about various +<code>btrfs-rec inspect <var>SUBCMD</var></code> and <code>btrfs-rec +repair <var>SUBCMD</var></code>, and you can find out about various sub-commands with <code>btrfs-rec help</code>. These are both told about devices or images with the <code>--pv</code> flag.</p> -<p><code>btrfs-rec inspect SUBCMD</code> commands open the filesystem -read-only, and (generally speaking) write extracted or rebuilt -information to stdout. <code>btrfs-rec repair SUBCMD</code> commands -open the filesystem read+write, and consume information from -<code>btrfs-rec inspect SUBCMD</code> commands to actually repair the -filesystem (except I haven’t actually implemented any -<code>repair</code> commands yet… despite the lack of -<code>repair</code> commands, I believe that <code>btrfs-rec</code> is -already a useful because of the <code>btrfs-rec inspect mount</code> -command to get data out of the broken filesystem). 
This split allows you
-to try things without being scared by WARNINGs about not using these
-tools unless you’re an expert or have been told to by a developer.</p>
+<p><code>btrfs-rec inspect <var>SUBCMD</var></code> commands open the
+filesystem read-only, and (generally speaking) write extracted or
+rebuilt information to stdout. <code>btrfs-rec repair
+<var>SUBCMD</var></code> commands open the filesystem read+write, and
+consume information from <code>btrfs-rec inspect
+<var>SUBCMD</var></code> commands to actually repair the filesystem
+(except I haven't actually implemented any <code>repair</code> commands
+yet... despite the lack of <code>repair</code> commands, I believe that
+<code>btrfs-rec</code> is already useful because of the
+<code>btrfs-rec inspect mount</code> command to get data out of the
+broken filesystem). This split allows you to try things without being
+scared by WARNINGs about not using these tools unless you're an expert
+or have been told to by a developer.</p>
<p>In the broken <code>dump-zero.1.img</code> example above (which has a
perfectly intact superblock, but a totally broken
-<code>CHUNK_TREE</code>), to “repair” it I’d:</p>
+<code>CHUNK_TREE</code>), to "repair" it I'd:</p>
<ol type="1">
<li><p>Start by using <code>btrfs-rec inspect rebuild-mappings</code> to
rebuild the broken chunk/dev/blockgroup trees:</p>
@@ -196,7 +197,7 @@ rebuild the broken chunk/dev/blockgroup trees:</p>
--pv=dump-zero.1.img \
&gt; mappings-1.json</code></pre></li>
<li><p>If it only mostly succeeds, but on stderr tells us about a few
-regions of the image that it wasn’t able to figure out the chunks for. 
Using some human-level knowledge, you can write those yourself, inserting them into the generated <code>mappings.json</code>, and ask <code>rebuild-mappings</code> to normalize what you wrote:</p> @@ -224,17 +225,17 @@ $ sudo btrfs-rec inspect mount \ ./mnt</code></pre></li> </ol> <p>This example is fleshed out more (and the manual edits to -<code>mappings.json</code> explained more) in -<code>./examples/main.sh</code>.</p> +<code>mappings.json</code> explained more) in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./examples/main.sh</code></a>.</p> <h1 id="prior-art">3. Prior art</h1> <p>Comparing <code>btrfs-rec inspect mount</code> with the existing https://github.com/adam900710/btrfs-fuse project:</p> <ul> <li>Again, mine has better fault tolerance</li> <li>Mine is read-only</li> -<li>Mine supports xattrs (“TODO” in Adam’s)</li> -<li>Mine supports separate inode address spaces for subvolumes; Adam’s -doesn’t due to limitations in FUSE, mine works around this by lazily +<li>Mine supports xattrs ("TODO" in Adam's)</li> +<li>Mine supports separate inode address spaces for subvolumes; Adam's +doesn't due to limitations in FUSE, mine works around this by lazily setting up separate mountpoints for each subvolume (though this does mean that the process needs to run as root, which is a bummer).</li> </ul> @@ -242,33 +243,55 @@ mean that the process needs to run as root, which is a bummer).</li> <h2 id="overview-of-the-source-tree-layout">4.1. 
Overview of the source tree layout</h2> <ul> -<li><p><code>examples/</code> has example scripts showing how to use -<code>btrfs-rec</code>.</p></li> -<li><p><code>lib/btrfs/</code> is the core btrfs -implementation.</p></li> -<li><p><code>lib/btrfscheck/</code> and <code>lib/btrfsutil/</code> are -libraries for “btrfs-progs” type programs, that are userland-y things -that I thought should be separate from the core implementation; -something that frustrated me about libbtrfs was having to figure out “is +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>examples/</code></a> +has example scripts showing how to use <code>btrfs-rec</code>.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfs/</code></a> +is the core btrfs implementation.</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfscheck/</code></a> +and <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/btrfsutil/</code></a> +are libraries for "btrfs-progs" type programs, that are userland-y +things that I thought should be separate from the core implementation; +something that frustrated me about libbtrfs was having to figure out "is this thing here in support of btrfs bits-on-disk, or in support of a -higher-level ‘how btrfs-progs wants to think about things’?”</p></li> -<li><p><code>cmd/btrfs-rec/</code> is where the command implementations -live. 
If a sub-command fits in a single file, it’s -<code>cmd/btrfs-rec/inspect_SUBCMD.go</code>, otherwise, it’s in a -separate <code>cmd/btrfs-rec/inspect/SUBCMD/</code> package.</p></li> -<li><p><code>lib/textui/</code> is reasonably central to how the -commands implement a text/CLI user-interface.</p></li> -<li><p><code>lib/binstruct/</code>, <code>lib/diskio/</code>, and -<code>lib/streamio/</code> are non-btrfs-specific libraries related to -the problem domain.</p></li> -<li><p><code>lib/containers/</code>, <code>lib/fmtutil/</code>, -<code>lib/maps/</code>, <code>lib/slices/</code>, and -<code>lib/profile/</code> are all generic Go libraries that have nothing -to do with btrfs or the problem domain, but weren’t in the Go standard -library and I didn’t find/know-of exiting implementations that I liked. -Of these, all but <code>containers</code> are pretty simple utility -libraries. Also, some of these things have been added to the standard -library since I started the project.</p></li> +higher-level 'how btrfs-progs wants to think about things'?"</p></li> +<li><p><a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>cmd/btrfs-rec/</code></a> +is where the command implementations live. 
If a sub-command fits in a
+single file, it's
+<code>cmd/btrfs-rec/inspect_<var>SUBCMD</var>.go</code>, otherwise, it's
+in a separate <code>cmd/btrfs-rec/inspect/<var>SUBCMD</var>/</code>
+package.</p></li>
+<li><p><a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/textui/</code></a>
+is reasonably central to how the commands implement a text/CLI
+user-interface.</p></li>
+<li><p><a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/binstruct?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/binstruct/</code></a>,
+<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/diskio?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/diskio/</code></a>,
+and <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/streamio?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/streamio/</code></a>
+are non-btrfs-specific libraries related to the problem domain.</p></li>
+<li><p><a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/containers?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/containers/</code></a>,
+<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/fmtutil?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/fmtutil/</code></a>,
+<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/maps?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/maps/</code></a>,
+<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/slices?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/slices/</code></a>,
+and <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/profile?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>lib/profile/</code></a>
+are all generic Go libraries that have nothing to do with btrfs or the
+problem domain, but weren't in the Go standard library and I didn't
+find/know-of existing implementations that I liked. Of these, all but
+<code>containers</code> are pretty simple utility libraries. 
Also, some +of these things have been added to the standard library since I started +the project.</p></li> </ul> <h2 id="base-decisions-cli-structure-go-json">4.2. Base decisions: CLI structure, Go, JSON</h2> @@ -281,8 +304,8 @@ decided I should just focus on learning btrfs-bits-on-disk.</p></li> <li><p>writing a new thing: It was becoming increasingly apparent to me that it was going to be an uphill-fight of having recovery-tools share the same code as the main-tools, as the routines used by the main-tools -rightly have validity checks, where recovery-tools want to say “yes, I -know it’s invalid, can you give it to me anyway?”.</p></li> +rightly have validity checks, where recovery-tools want to say "yes, I +know it's invalid, can you give it to me anyway?".</p></li> <li><p>writing it in not-C: I love me some C, but higher level languages are good for productivity. And I was trying to write a whole lot of code at once, I needed a productivity boost.</p></li> @@ -294,7 +317,7 @@ bits-on-disk.</p></li> Go, so I had Go swapped into my brain. And Go still feels close to C but provides <em>a lot</em> of niceness and safety over C.</p></li> </ul> -<p>It turned out that Go was perhaps not the best choice, but we’ll come +<p>It turned out that Go was perhaps not the best choice, but we'll come back to that.</p> <p>I wanted to separate things into a pipeline. For instance: Instead of <code>btrfs rescue chunk-recover</code> trying to do everything to @@ -307,31 +330,32 @@ of JSON. Then I can feed that JSON to rebuilds the mappings in the chunk tree, and dumps them as JSON. And then other commands can consume that <code>mappings.json</code> to use that instead of trying to read the chunk tree from the actual FS, so -that you don’t have to make potentially destructive writes to inspect an +that you don't have to make potentially destructive writes to inspect an FS with a broken chunk tree, and can inspect it more forensically. 
Or -then use <code>btrfs-rec repair SOME_SUBCMD_I_HAVENT_WRITTEN_YET</code> -to write that chunk tree in <code>mappings.json</code> back to the -filesystem.</p> +then use <code>btrfs-rec repair +<var>SOME_SUBCMD_I_HAVENT_WRITTEN_YET</var></code> to write that chunk +tree in <code>mappings.json</code> back to the filesystem.</p> <p>(But also, the separate steps thing was useful just so I could iterate on the algorithms of <code>rebuild-mappings process</code> separately from having to scan the entire FS)</p> -<p>So, I made the decision that <code>btrfs-rec inspect SUBCMD</code> -commands should all only open the FS read-only, and output their work to -a separate file; that writing that info back to the FS should be -separate in <code>btrfs-rec repair SUBCMD</code>.</p> +<p>So, I made the decision that <code>btrfs-rec inspect +<var>SUBCMD</var></code> commands should all only open the FS read-only, +and output their work to a separate file; that writing that info back to +the FS should be separate in <code>btrfs-rec repair +<var>SUBCMD</var></code>.</p> <p>For connecting those parts of the pipeline, I chose JSON, for a few reasons:</p> <ul> <li><p>I wanted something reasonably human-readable, so that I could debug it easier.</p></li> <li><p>I wanted something reasonably human-readable, so that human -end-users could make manual edits; for example, in -<code>examples/main.sh</code> I have an example of manually editing -<code>mappings.json</code> to resolve a region that the algorithm -couldn’t figure out, but with knowledge of what caused the corruption a -human can.</p></li> -<li><p>I didn’t want to invent my own DSL and have to handle writing a -parser. (This part didn’t pay off! 
See below.)</p></li> +end-users could make manual edits; for example, in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/examples/main.sh?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>examples/main.sh</code></a> +I have an example of manually editing <code>mappings.json</code> to +resolve a region that the algorithm couldn't figure out, but with +knowledge of what caused the corruption a human can.</p></li> +<li><p>I didn't want to invent my own DSL and have to handle writing a +parser. (This part didn't pay off! See below.)</p></li> <li><p>I wanted something that I thought would have good support in a variety of languages, so that if Go is problematic for getting things merged upstream it could be rewritten in C (or maybe Rust?) piece-meal @@ -342,13 +366,14 @@ where each subcommand can be rewritten one at a time.</p></li> <ul> <li><p>I spent a lot of time getting the garbage collector to not just kill performance.</p></li> -<li><p>The <code>btrfs-rec inspect rebuild-mappings SUBCMD</code> -subcommands all throw a lot of data through the JSON encoder/decoder, -and I learned that the Go stdlib <code>encoding/json</code> package has -memory use that grows O(n^2) (-ish? I didn’t study the implementation, -but that’s what the curve looks like just observing it) on the size of -the data being shoved through it, so I had to go take a break and go -write https://pkg.go.dev/git.lukeshu.com/go/lowmemjson which is a +<li><p>The <code>btrfs-rec inspect rebuild-mappings +<var>SUBCMD</var></code> subcommands all throw a lot of data through the +JSON encoder/decoder, and I learned that the Go stdlib +<code>encoding/json</code> package has memory use that grows O(n^2) +(-ish? 
I didn't study the implementation, but that's what the curve
+looks like just observing it) on the size of the data being shoved
+through it, so I had to go take a break and go write
+https://pkg.go.dev/git.lukeshu.com/go/lowmemjson which is a
mostly-drop-in-replacement that tries to be as close as possible to O(1)
memory use. So I did end up having to write my own parser anyway
:(</p></li>
@@ -356,7 +381,7 @@ memory use. So I did end up having to write my own parser
<h2 id="algorithms">4.3. Algorithms</h2>
<p>There are 3 algorithms of note in <code>btrfs-rec</code>, that I
think are worth getting into mainline btrfs-progs even if the code of
-<code>btrfs-rec</code> doesn’t get in:</p>
+<code>btrfs-rec</code> doesn't get in:</p>
<ol type="1">
<li><p>The <code>btrfs-rec inspect rebuild-mappings</code>
algorithm to rebuild information from the <code>CHUNK_TREE</code>,
@@ -370,8 +395,10 @@ re-attach lost branches to broken B+ trees.</p></li>
<code>rebuild-mappings</code> algorithm</h3>
<p>(This step-zero scan is
<code>btrfs-rec inspect rebuild-mappings scan</code>, and principally
-lives in <code>./lib/btrfsutil/scan.go</code> and
-<code>./cmd/btrfs-rec/inspect/rebuidmappings/scan.go</code>)</p>
+lives in <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/scan.go</code></a>
+and <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/scan.go</code></a>)</p>
<ol start="0" type="1">
<li>Similar to <code>btrfs rescue chunk-recover</code>, scan each
device for things that look like nodes; keep track of:
@@ -387,22 +414,22 @@ the generation.</li>
<p>Create a bucket of the data from Chunks, DevExtents, and
BlockGroups; since a Chunk and a DevExtent+BlockGroup store pretty much
the same information, we can use one to 
reconstruct the other. How -we “merge” these and handle conflicts is in -<code>./lib/btrfs/btrfsvol/lvm.go:addMapping()</code>, I don’t think -this part is particularly clever, but given that +we "merge" these and handle conflicts is in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfs/btrfsvol/lvm.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n121"><code>./lib/btrfs/btrfsvol/lvm.go:addMapping()</code></a>, +I don't think this part is particularly clever, but given that <code>btrfs rescue chunk-recover</code> crashes if it encounters two overlapping chunks, I suppose I should spell it out:</p> <ul> -<li><p>A “mapping” is represented as a group of 4 things:</p> +<li><p>A "mapping" is represented as a group of 4 things:</p> <ul> <li>logical address</li> <li>a list of 1 or more physical addresses (device ID and offset)</li> -<li>size, and a Boolean indicator of whether the size is “locked”</li> +<li>size, and a Boolean indicator of whether the size is "locked"</li> <li>block group flags, and a Boolean presence-indicator</li> </ul></li> <li><p>Mappings must be merged if their logical or physical regions overlap.</p></li> -<li><p>If a mapping has a “locked” size, then when merging it may +<li><p>If a mapping has a "locked" size, then when merging it may subsume smaller mappings with unlocked sizes, but its size cannot be changed; trying to merge a locked-size mapping with another mapping that is not for a subset region should return an error.</p></li> @@ -410,47 +437,48 @@ is not for a subset region should return an error.</p></li> not be changed; it may only be merged with another mapping that does not have flags present, or has identical flags.</p></li> <li><p>When returning an error because of overlapping non-mergeable -mappings, just log an error on stderr and keep going. That’s an +mappings, just log an error on stderr and keep going. 
That's an
important design thing that is different than normal filesystem code; if
-there’s an error, yeah, detect and notify about it, <strong>but don’t
+there's an error, yeah, detect and notify about it, <strong>but don't
bail out of the whole routine</strong>. Just skip that one item or
whatever.</p></li>
</ul>
-<p>Now that we know how to “add a mapping”, let’s do that:</p>
+<p>Now that we know how to "add a mapping", let's do that:</p>
<p>(The following main-steps are
<code>btrfs-rec inspect rebuild-mappings process</code>, and principally
-live in
-<code>./cmd/btrfs-rec/inspect/rebuidmappings/process.go</code>)</p>
+live in <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/process.go</code></a>)</p>
<ol type="1">
<li><p>Add all found Chunks.</p></li>
<li><p>Add all found DevExtents.</p></li>
<li><p>Add a physical:logical mapping of length nodesize for each node
that was found.</p></li>
<li><p>Any mappings from steps 2 or 3 that are missing blockgroup flags
-(that is: they weren’t able to be merged with a mapping from step 1),
+(that is: they weren't able to be merged with a mapping from step 1),
use the found BlockGroups to fill in those flags.</p></li>
-<li><p>Now we’ll merge all found CSum items into a map of the sums of
+<li><p>Now we'll merge all found CSum items into a map of the sums of
the logical address space. Sort all of the csum items by generation,
then by address. Loop over them in that order, inserting their sums into
the map. If two csum items overlap, but agree about the sums of the
-overlapping region, that’s fine, just take their union. For overlaps
+overlapping region, that's fine, just take their union. For overlaps
that disagree, items with a newer generation kick out items with an
-older generation. 
If disagreeing items have the same generation… I don’t
-think that can happen except by a filesystem bug (i.e. not by a failing
-drive or other external corruption), so I wasn’t too concerned about it,
-so I just log an error on stderr and skip the later-processed item. See
-<code>./cmd/btrfs-rec/inspect/rebuidmappings/process_sums_logical.go</code>.</p>
+older generation. If disagreeing items have the same generation... I
+don't think that can happen except by a filesystem bug (i.e. not by a
+failing drive or other external corruption), so I wasn't too concerned
+about it, so I just log an error on stderr and skip the later-processed
+item. See <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildmappings/process_sums_logical.go</code></a>.</p>
<p>Look at regions of the logical address space that meet all the 3
criteria:</p>
<ul>
<li>we have CSum items for them</li>
<li>we have a BlockGroup for them</li>
-<li>we don’t have a Chunk/DevExtent mapping them to the pysical address
+<li>we don't have a Chunk/DevExtent mapping them to the physical address
space.</li>
</ul>
<p>Pair those CSums up with BlockGroups, and for each BlockGroup,
search the list of checksums of physical blocks to try to find a
physical
-region that matches the logical csums (and isn’t already mapped to a
+region that matches the logical csums (and isn't already mapped to a
different logical region). I used a Knuth-Morris-Pratt search, modified
to handle holes in the logical csum list as wildcards.</p>
<p>Insert any found mappings into our bucket of mappings.</p></li>
@@ -464,34 +492,35 @@ second best is less than a 50% match, then I add the best match. In my
experience, the best match is &gt;90% (or at whatever the maximum
percent is for how much of the BlockGroup has logical sums), and the
second best is 0% or 1%. 
The point of tracking both is that if there -isn’t a clear-cut winner, I don’t want it to commit to a potentially +isn't a clear-cut winner, I don't want it to commit to a potentially wrong choice.</p></li> </ol> <h3 id="the---rebuild-algorithm">4.3.2. The <code>--rebuild</code> algorithm</h3> <p>The <code>--rebuild</code> flag is implied by the <code>--trees=trees.json</code> flag, and triggers an algorithm that -allows “safely” reading from a broken B+ tree, rather than the usual B+ +allows "safely" reading from a broken B+ tree, rather than the usual B+ tree lookup and search functions. I probably should have tried to understand the <code>btrfs restore</code> algorithm, maybe I reinvented -the wheel…</p> +the wheel...</p> <p>This algorithm requires a list of all nodes on the filesystem; we -find these using the same scan as above -(<code>./lib/btrfsutil/scan.go</code>), the same procedure as -<code>btrfs rescue chunk-recover</code>.</p> +find these using the same scan as above (<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/scan.go</code></a>), +the same procedure as <code>btrfs rescue chunk-recover</code>.</p> <p>We walk all of those nodes, and build a reasonably lightweight -in-memory graph of all nodes (<code>./lib/btrfsutil/graph.go</code>), +in-memory graph of all nodes (<a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/graph.go</code></a>), tracking</p> <ul> -<li>each node’s +<li>each node's <ul> <li>logical address</li> <li>level</li> <li>generation</li> <li>tree</li> -<li>each item’s key and size</li> +<li>each item's key and size</li> </ul></li> -<li>each keypointer’s +<li>each keypointer's <ul> <li>source node</li> <li>source slot within the node</li> @@ -502,20 +531,22 @@ tracking</p> <li>destination generation</li> </ul></li> <li>logical addresses and error 
messages for nodes that are pointed to
-by a keypointer or the superblock, but can’t be read (because that
-logical address isn’t mapped, or it doesn’t look like a node, or…)</li>
+by a keypointer or the superblock, but can't be read (because that
+logical address isn't mapped, or it doesn't look like a node,
+or...)</li>
<li>an index such that for a given node we can quickly list both
keypointers originating at that node and pointing to that
node.</li>
</ul>
<h4 id="rebuilt-forrest-behavior-looking-up-trees">4.3.2.1. rebuilt
forrest behavior (looking up trees)</h4>
-<p>(see: <code>./lib/btrfsutil/rebuilt_forrest.go</code>)</p>
+<p>(see: <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_forrest.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/rebuilt_forrest.go</code></a>)</p>
<ul>
<li>The <code>ROOT_TREE</code>, <code>CHUNK_TREE</code>,
<code>TREE_LOG</code>, and <code>BLOCK_GROUP_TREE</code> (the trees
-pointed to directy by the superblock) work as you’d expect.</li>
-<li>For other trees, we (as you’d expect) look up the root item in the
+pointed to directly by the superblock) work as you'd expect.</li>
+<li>For other trees, we (as you'd expect) look up the root item in the
rebuilt <code>ROOT_TREE</code>, and then (if rootitem.ParentUUID is
non-zero) eagerly also look up the parent tree (recursing on ourself).
We try to use the <code>UUID_TREE</code> tree to help with this, but
@@ -524,7 +555,7 @@ If we fail to look up the parent tree (or its parent, or a more distant
ancestor), then (depending on a flag) we either make a note of that, or
error out and fail to look up the child tree. 
For <code>--rebuild</code> and <code>--trees=trees.json</code> we are permissive of this error, and -just make note of it; but we’ll re-use this algorithm in the +just make note of it; but we'll re-use this algorithm in the <code>rebuild-trees</code> algorithm below, and it needs the more strict handling.</li> <li>When creating the rebuilt individual tree, we start by adding the @@ -536,15 +567,16 @@ additional root nodes grafted on to the tree by the </ul> <h4 id="rebuilt-individual-tree-behavior">4.3.2.2. rebuilt individual tree behavior</h4> -<p>(see: <code>./lib/btrfsutil/rebuilt_tree.go</code>)</p> +<p>(see: <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfsutil/rebuilt_tree.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfsutil/rebuilt_tree.go</code></a>)</p> <p>In order to read from a tree, we first have to build a few indexes. We store these indexes in an Adaptive Replacement Cache; they are all -re-buildable based on the tree’s list of roots and the above graph; if -we have a bunch of trees we don’t need to keep all of this in memory at -once. Note that this is done 100% with the in-memory graph, we don’t +re-buildable based on the tree's list of roots and the above graph; if +we have a bunch of trees we don't need to keep all of this in memory at +once. Note that this is done 100% with the in-memory graph, we don't need to read anything from the filesystem during these procedures.</p> <ul> -<li><p>The first index we build is the “node index”. This is an index +<li><p>The first index we build is the "node index". This is an index that for every node tells us what root(s) the tree would need to have in order for the tree to include that node, and also what the highest item key would be acceptable in the node if the tree includes that root. We @@ -553,12 +585,12 @@ case the tree is real broken and there are multiple paths from the root to the node; as these different paths may imply different max-item constraints. 
Put more concretely, the type of the index is:</p>
<pre><code>map[ nodeID → map[ rootNodeID → {loMaxItem, hiMaxItem} ] ]</code></pre>
-<p>We’ll do a loop over the graph, using dynamic-programming memoization
+<p>We'll do a loop over the graph, using dynamic-programming memoization
to figure out ordering and avoid processing the same node twice; for
-each node we’ll</p>
+each node we'll</p>
<ul>
-<li><p>Check whether the owner-tree is this tree or one of this tree’s
-ancestors (and if it’s an ancestor, that the node’s generation isn’t
+<li><p>Check whether the owner-tree is this tree or one of this tree's
+ancestors (and if it's an ancestor, that the node's generation isn't
after the point that the child tree was forked from the parent tree). If
not, we are done processing that node (record an empty/nil set of roots
for it).</p></li>
@@ -566,27 +598,27 @@ for it).</p></li>
{<code>loMaxItem</code>, <code>hiMaxItem</code>}.</p></li>
<li><p>Look at each keypointer that points at the node and:</p>
<ul>
-<li><p>Skip the keypointer if its expectations of the node aren’t met:
-if the level, generation, and min-key constraints don’t match up. If the
-keypointer isn’t in the last slot in the source node, we also go ahead
-and include checking that the destination node’s max-key is under the
-min-key of the keypointer in the next slot, since that’s cheap to do
+<li><p>Skip the keypointer if its expectations of the node aren't met:
+if the level, generation, and min-key constraints don't match up. 
If the
+keypointer isn't in the last slot in the source node, we also go ahead
+and include checking that the destination node's max-key is under the
+min-key of the keypointer in the next slot, since that's cheap to do
now.</p></li>
-<li><p>Skip the keypointer if its source node’s owner-tree isn’t this
-tree or one of this tree’s ancestors (and if it’s an ancestor, that the
-node’s generation isn’t after the point that the child tree was forked
+<li><p>Skip the keypointer if its source node's owner-tree isn't this
+tree or one of this tree's ancestors (and if it's an ancestor, that the
+node's generation isn't after the point that the child tree was forked
from the parent tree).</p></li>
-<li><p>dynamic-programming recurse and index the keypointer’s source
+<li><p>dynamic-programming recurse and index the keypointer's source
node.</p></li>
-<li><p>for every root that would result in the keypointer’s source
+<li><p>for every root that would result in the keypointer's source node
being included in the tree:</p>
<p>. If the keypointer is in the last slot, look at what the
-source node’s last-item constraints would be if that root is included,
+source node's last-item constraints would be if that root is included,
and can now check the max-item of our destination node. We check against
the <code>hiMaxItem</code>; as if there is any valid path from the root
to this node, then we want to be permissive and include it. If that
-check fails, then we’re done with this keypointer. Also, make node of
-those <code>loMaxItem</code> and <code>hiMaxItem</code> values, we’ll
+check fails, then we're done with this keypointer. Also, make note of
+those <code>loMaxItem</code> and <code>hiMaxItem</code> values, we'll
use them again in just a moment.</p>
<p>. 
Otherwise, set both <code>loMaxItem</code> and
<code>hiMaxItem</code> to 1-under the min-item of the keypointer in the
@@ -608,14 +640,14 @@ be a (potential) root, and insert <code>rootID=thisNode</code>
-&amp; <code>hiMaxItem</code>} map and insert it into the index as the entry
for this node.</p></li>
</ul></li>
-<li><p>The next index we build is the “item index”. This is a “sorted
-map” (implemented as a red-black tree, supporting sub-range iteration)
+<li><p>The next index we build is the "item index". This is a "sorted
+map" (implemented as a red-black tree, supporting sub-range iteration)
of <code>key</code> → {<code>nodeID</code>, <code>slotNumber</code>}; a
map that for each key tells us where to find the item with that
key.</p>
<ul>
<li><p>Loop over the node index, and for each node check if both (a) it
has <code>level==0</code> (is a leaf node containing items), and (b) its
set of roots that would include it has any overlap with the tree's set
of roots.</p></li>
<li><p>Loop over each of those included leaf nodes, and loop over the
items in each node. Insert the <code>key</code> → {<code>nodeId</code>,
@@ -625,10 +657,10 @@ that key, decide which one wins by:</p>
<li><p>Use the one from the node with the owner-tree that is closer to
this tree; node with owner=thisTree wins over a node with
owner=thisTree.parent, which would win over a node with
-owner.thisTree.parent.parent. If that’s a tie, then…</p></li>
-<li><p>Use the one from the node with the higher generation. If that’s a
-tie, then…</p></li>
-<li><p>I don’t know, I have the code <code>panic</code>:</p>
+owner=thisTree.parent.parent. If that's a tie, then...</p></li>
+<li><p>Use the one from the node with the higher generation. 
If that's a +tie, then...</p></li> +<li><p>I don't know, I have the code <code>panic</code>:</p> <pre><code>// TODO: This is a panic because I&#39;m not really sure what the // best way to handle this is, and so if this happens I want the // program to crash and force me to figure out how to handle it. @@ -641,27 +673,27 @@ panic(fmt.Errorf(&quot;dup nodes in tree=%v: old=%v=%v ; new=%v=%v&quot; <p>Note that this algorithm means that for a given node we may use a few items from that node, while having other items from that same node be overridden by another node.</p></li> -<li><p>The final index we build is the “error index”. This is an index +<li><p>The final index we build is the "error index". This is an index of what errors correspond to which range of keys, so that we can report -them, and give an idea of “there may be entries missing from this -directory” and similar.</p> -<p>For each error, we’ll track the min-key and max-key of the range it -applies to, the node it came from, and what the error string is. We’ll +them, and give an idea of "there may be entries missing from this +directory" and similar.</p> +<p>For each error, we'll track the min-key and max-key of the range it +applies to, the node it came from, and what the error string is. We'll store these into an interval tree keyed on that min-key/max-key range.</p> <ul> <li><p>Create an empty set <code>nodesToProcess</code>. Now populate it:</p> <ul> -<li><p>Once again, we’ll loop over the node index, but this time we’ll -only check that there’s overlap between the set of roots that would -include the node and the tree’s set of roots. The nodes that are +<li><p>Once again, we'll loop over the node index, but this time we'll +only check that there's overlap between the set of roots that would +include the node and the tree's set of roots. 
For each node that is
included in this tree, insert both that node itself and all node IDs
that it has keypointers pointing to into the <code>nodesToProcess</code>
set.</p></li>
-<li><p>Also insert all of the tree’s roots into
+<li><p>Also insert all of the tree's roots into
<code>nodesToProcess</code>; this is in case the superblock/root-item
-points to an invalid node that we couldn’t read.</p></li>
+points to an invalid node that we couldn't read.</p></li>
</ul></li>
<li><p>Now loop over <code>nodesToProcess</code>. For each node, create
an empty list of errors. Use the keypointers pointing to and the min
@@ -670,17 +702,17 @@ expectations for the node; this should be reasonably
straight-forward, given:</p>
<ul>
<li><p>If different keypointers have disagreeing levels, insert an error
-into the list, and don’t bother with checking the node’s
+into the list, and don't bother with checking the node's
level.</p></li>
<li><p>If different keypointers have disagreeing generations, insert an
-error into the list, and don’t bother with checking the node’s
+error into the list, and don't bother with checking the node's
generation.</p></li>
<li><p>If different keypointers have different min-item expectations,
use the max of them.</p></li>
</ul>
<p>Then:</p>
<ul>
-<li>If the node is a “bad node” in the graph, insert the error message
+<li>If the node is a "bad node" in the graph, insert the error message
associated with it. Otherwise, check those expectations against the node
in the graph.</li>
</ul>
@@ -696,22 +728,23 @@ then set the range to the zero-key through the
max-key.</p></li>
operations using those indexes; exact-lookup using the item index, and
range-lookups and walks using the item index together with the error
index. 
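As an illustration, the duplicate-key tie-break described earlier (the node whose owner-tree is closer to this tree wins; on a tie, the higher generation wins; on a further tie the code panics) can be sketched in Go. The `nodeInfo` type and `pickWinner` name are invented for this sketch and are not the real API in btrfs-progs-ng:

```go
package main

import "fmt"

// Hypothetical, simplified record of what the tie-break needs to know
// about a node; the real code tracks this with richer types.
type nodeInfo struct {
	ID         int
	Generation int
	OwnerDist  int // 0 = owned by this tree, 1 = by tree.parent, 2 = by tree.parent.parent, ...
}

// pickWinner decides which of two nodes holding the same key wins:
// closer owner-tree first, then higher generation; a full tie is the
// "I don't know" case, which (like the real code) panics.
func pickWinner(old, cand nodeInfo) nodeInfo {
	if old.OwnerDist != cand.OwnerDist {
		if old.OwnerDist < cand.OwnerDist {
			return old
		}
		return cand
	}
	if old.Generation != cand.Generation {
		if old.Generation > cand.Generation {
			return old
		}
		return cand
	}
	panic(fmt.Errorf("dup nodes: old=%v new=%v", old.ID, cand.ID))
}

func main() {
	a := nodeInfo{ID: 1, Generation: 10, OwnerDist: 1}
	b := nodeInfo{ID: 2, Generation: 12, OwnerDist: 1}
	// Equal owner distance, so the higher generation wins.
	fmt.Println(pickWinner(a, b).ID)
}
```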
Efficiently searching the <code>CSUM_TREE</code> requires knowing
-item sizes, so that’s why we recorded the item sizes into the graph.</p>
+item sizes, so that's why we recorded the item sizes into the graph.</p>
<h3 id="the-rebuild-trees-algorithm">4.3.3. The
<code>rebuild-trees</code> algorithm</h3>
<p>The <code>btrfs inspect rebuild-trees</code> algorithm finds nodes to
-attach as extra roots to trees. I think that conceptually it’s the
+attach as extra roots to trees. I think that conceptually it's the
simplest of the 3 algorithms, but turned out to be the hardest to get
-right. So… maybe more than the others reference the source code too
-(<code>./cmd/btrfs-rec/inspect/rebuildtrees/</code>) because I might
-forget some small but important detail.</p>
+right. So... maybe more than the others reference the source code too
+(<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/</code></a>)
+because I might forget some small but important detail.</p>
-<p>The core idea here is that we’re just going to walk each tree,
+<p>The core idea here is that we're just going to walk each tree,
inspecting each item in the tree, and checking for any items that are
implied by other items (e.g.: a dir entry item implies the existence of
an inode item for the inode that it points at). If an implied item is not
in the tree, but is in some other node, then we look at which potential
roots we could add to the tree that would add that other node. Then,
-after we’ve processed all of the items in the filesystem, we go add
+after we've processed all of the items in the filesystem, we go add
those various roots to the various trees, keeping track of which items
are added or updated. 
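The inspect-then-augment fixed point described here can be sketched as follows. All names (`item`, `impliedRoots`, `itemsAddedBy`, `rebuild`) are invented stand-ins for illustration; the real loop lives in `./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go` and is considerably more involved:

```go
package main

import "fmt"

// item is a stand-in for a (tree, key) pair worth of scan data.
type item struct{ key, node int }

// impliedRoots stands in for "this item implies another item that is
// missing from the tree; these potential roots would supply it".
func impliedRoots(it item) []int {
	if it.node == 100 {
		return []int{200} // pretend node 100's items imply something only root 200 has
	}
	return nil
}

// itemsAddedBy stands in for "attaching this root to the tree added
// these items".
func itemsAddedBy(root int) []item {
	return []item{{key: 2, node: root}}
}

// rebuild repeats inspect-and-augment passes until a steady state is
// reached, and returns how many extra roots were attached.
func rebuild(start []item) int {
	toProcess := start
	addedRoots := map[int]bool{}
	for len(toProcess) > 0 {
		var added []item
		for _, it := range toProcess {
			for _, root := range impliedRoots(it) {
				if !addedRoots[root] {
					addedRoots[root] = true
					// Items added by the new root must themselves be
					// inspected on the next pass.
					added = append(added, itemsAddedBy(root)...)
				}
			}
		}
		toProcess = added // empty once nothing new was implied: steady state
	}
	return len(addedRoots)
}

func main() {
	fmt.Println(rebuild([]item{{key: 1, node: 100}}))
}
```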
If any
of those added/updated items have a version with a newer generation on a
different node, see what roots we could add
@@ -721,16 +754,17 @@ version of each item has been added, loop back and
inspect all
added/updated items for implied items, keeping track of roots we could
add. Repeat until a steady-state is reached.</p>
<p>There are lots of little details in that process, some of which are
-for correctness, and some of which are for “it should run in hours
-instead of weeks.”</p>
+for correctness, and some of which are for "it should run in hours
+instead of weeks."</p>
<h4 id="initialization">4.3.3.1. initialization</h4>
-<p>First up, we’re going to build an in-memory graph, same as above.
-But this time, while we’re reading the nodes to do that, we’re also
+<p>First up, we're going to build an in-memory graph, same as above.
+But this time, while we're reading the nodes to do that, we're also
going to watch for some specific items and record a few things about
them.</p>
-<p>(see: <code>./cmd/btrfs-rec/inspect/rebuildtrees/scan.go</code>)</p>
+<p>(see: <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/scan.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/scan.go</code></a>)</p>
<p>For each {<code>nodeID</code>, <code>slotNumber</code>} pair that
-matches one of these item types, we’re going to record:</p>
+matches one of these item types, we're going to record:</p>
<ul>
<li>flags:
<ul>
@@ -739,17 +773,17 @@ matches one of these item types, we’re going to
record:</p>
</ul></li>
<li>names:
<ul>
-<li><code>DIR_INDEX</code> items: the file’s name</li>
+<li><code>DIR_INDEX</code> items: the file's name</li>
</ul></li>
<li>sizes:
<ul>
<li><code>EXTENT_CSUM</code> items: the number of bytes that this is a
sum for (i.e. 
the item size over the checksum size, times the block size)</li> <li><code>EXTENT_DATA</code> items: the number of bytes in this extent -(i.e. either the item size minus +(i.e. either the item size minus <code>offsetof(btrfs_file_extent_item.disk_bytenr)</code> if -<code>FILE_EXTENT_INLINE</code>, or else the item’s +<code>FILE_EXTENT_INLINE</code>, or else the item's <code>num_bytes</code>).</li> </ul></li> <li>data backrefs: @@ -764,19 +798,19 @@ at.</li> </ul></li> </ul> <h4 id="the-main-loop">4.3.3.2. the main loop</h4> -<p>(see: -<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go</code>)</p> +<p>(see: <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go</code></a>)</p> <p>Start with that scan data (graph + info about items), and also a rebuilt forrest from the above algorithm, but with:</p> <ul> -<li><p>the flag set so that it refuses to look up a tree if it can’t -look up all of that tree’s ancestors</p></li> -<li><p>an additional “potential-item index” that is similar to the item +<li><p>the flag set so that it refuses to look up a tree if it can't +look up all of that tree's ancestors</p></li> +<li><p>an additional "potential-item index" that is similar to the item index. 
It is generated the same way and can cache/evict the same way;
the difference is that we invert the check for if the set of roots for a
-node has overlap with the tree’s set of roots; we’re looking for
+node has overlap with the tree's set of roots; we're looking for
<em>potential</em> nodes that we could add to this tree.</p></li>
-<li><p>some callbacks; we’ll get to what we do in these callbacks in a
+<li><p>some callbacks; we'll get to what we do in these callbacks in a
bit, but for now, what the callbacks are:</p>
<ul>
<li><p>a callback that is called for each added/updated item when we add
@@ -787,10 +821,10 @@ a root.</p></li>
ID.</p></li>
</ul></li>
</ul>
-<p>(The callbacks are in
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go</code>)</p>
-<p>We have 5 unordered queues (“work lists”?); these are sets that when
-it’s time to drain them we’ll sort the members and process them in that
+<p>(The callbacks are in <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_treecb.go</code></a>)</p>
+<p>We have 5 unordered queues ("work lists"?); these are sets that when
+it's time to drain them we'll sort the members and process them in that
order.</p>
<ol type="1">
<li>the tree queue: a list of tree IDs that we need to crawl</li>
@@ -799,7 +833,7 @@ should re-process if we add a root to that tree</li>
<li>the added-item queue: a set of key/tree pairs identifying items that
have been added by adding a root to a tree</li>
<li>the settled-item queue: a set of key/tree pairs that have not
-just been added by adding a root, but we’ve also verified that they are
+just been added by adding a root, but we've also verified that they are
the newest-generation item with that key that we could add to the
tree.</li>
<li>the augment queue: for each item that we want to add to a tree, the
@@ -807,19 +841,19 @@ list of 
roots that we could add to get that item.</li>
</ol>
<p>The queues all start out empty, except for the tree queue, which we
seed with the <code>ROOT_TREE</code>, the <code>CHUNK_TREE</code>, and
-the <code>BLOCK_GROUP_TREE</code> (It is a “TODO” task that it should
+the <code>BLOCK_GROUP_TREE</code> (It is a "TODO" task that it should
probably also be seeded with the <code>TREE_LOG</code>, but as I will
-say below in the “future work” section, I don’t actually understand the
-<code>TREE_LOG</code>, so I couldn’t implement it).</p>
-<p>Now we’re going to loop until the tree queue, added-item queue,
+say below in the "future work" section, I don't actually understand the
+<code>TREE_LOG</code>, so I couldn't implement it).</p>
+<p>Now we're going to loop until the tree queue, added-item queue,
settled-item queue, and augment queue are all empty (all queues except
-for the retry-item queue). Each loop “pass” has 3 substeps:</p>
+for the retry-item queue). Each loop "pass" has 3 substeps:</p>
<ol type="1">
<li><p>Crawl the trees (drain the tree queue, fill the added-item
queue).</p></li>
<li><p>Either:</p>
<ol type="a">
-<li><p>if the added-item queue is non-empty: “settle” those items (drain
+<li><p>if the added-item queue is non-empty: "settle" those items (drain
the added-item queue, fill the augment queue and the settled-item
queue).</p></li>
<li><p>otherwise: process items (drain the settled-item queue, fill the
@@ -828,7 +862,7 @@ augment queue and the tree queue)</p></li>
<li><p>Apply augments (drain the augment queue and maybe the retry-item
queue, fill the added-item queue).</p></li>
</ol>
-<p>OK, let’s look at those 3 substeps in more detail:</p>
+<p>OK, let's look at those 3 substeps in more detail:</p>
<ol type="1">
<li><p>Crawl the trees; drain the tree queue, fill the added-item
queue.</p>
@@ -841,34 +875,34 @@ callback for each item in one of the added nodes.
Our callback inserts
each item into the added-item queue. 
The forrest also calls our root-added callback, but because of the way this algorithm works, that turns out to be a no-op at this step.</p> -<p>I mentioned that we added callbacks to intercept the forrest’s -looking up of root items and resolving UUIDs; we override the forrest’s -“lookup root item” routine and “resolve UUID” routine to instead of +<p>I mentioned that we added callbacks to intercept the forrest's +looking up of root items and resolving UUIDs; we override the forrest's +"lookup root item" routine and "resolve UUID" routine to instead of doing normal lookups on the <code>ROOT_TREE</code> and -<code>UUID_TREE</code>, use the above <code>WantXXX</code> routines that -we’ll define below in the “graph callbacks” section.</p> -<p>It shouldn’t matter what order this queue is processed in, but I sort +<code>UUID_TREE</code>, use the above <code>Want<var>XXX</var></code> +routines that we'll define below in the "graph callbacks" section.</p> +<p>It shouldn't matter what order this queue is processed in, but I sort tree IDs numerically.</p> -<p>The crawling is fairly fast because it’s just in-memory, the only +<p>The crawling is fairly fast because it's just in-memory, the only accesses to disk are looking up root items and resolving UUIDs.</p></li> <li><p>Either:</p> <ol type="a"> <li><p>Settle items from the added-item queue to the settled-item queue (and fill the augment queue).</p> -<p>For each item in the queue, we look in the tree’s item index to get -the {node, slot} pair for it, then we do the same in the tree’s +<p>For each item in the queue, we look in the tree's item index to get +the {node, slot} pair for it, then we do the same in the tree's potential-item index. If the potential-item index contains an entry for -the item’s key, then we check if the potential-item’s node should “win” -over the queue item’s node, deciding the “winner” using the same routine -as when building the item index. 
If the potential-item’s node wins, then
-we add the potential node’s set of roots to the augment queue. If the
-queue-item’s node wins, then we add the item to the settled-item queue
+the item's key, then we check if the potential-item's node should "win"
+over the queue item's node, deciding the "winner" using the same routine
+as when building the item index. If the potential-item's node wins, then
+we add the potential node's set of roots to the augment queue. If the
+queue-item's node wins, then we add the item to the settled-item queue
(except, as an optimization, if the item is of a type that cannot
possibly imply the existence of another item, then we just drop it and
-don’t add it to the settled-item queue).</p>
-<p>It shouldn’t matter what order this queue is processed in, but I sort
+don't add it to the settled-item queue).</p>
+<p>It shouldn't matter what order this queue is processed in, but I sort
it numerically by treeID and then by item key.</p>
-<p>This step is fairly fast because it’s entirely in-memory, making no
+<p>This step is fairly fast because it's entirely in-memory, making no
accesses to disk.</p></li>
<li><p>Process items from the settled-item queue (drain the settled-item
queue, fill the augment queue and the tree queue).</p>
@@ -878,7 +912,7 @@ patterns cache-friendly. For the most part, we just
sort each queue item
by tree, then by key. But, we have special handling for
<code>EXTENT_ITEM</code>s, <code>METADATA_ITEM</code>s, and
<code>EXTENT_DATA_REF</code> items: We break <code>EXTENT_ITEM</code>s
-and <code>METADATA_ITEM</code>s into “sub-items”, treating each ref
+and <code>METADATA_ITEM</code>s into "sub-items", treating each ref
embedded in them as a separate item. 
For those embedded items that are
<code>EXTENT_DATA_REF</code>s, and for stand-alone
<code>EXTENT_DATA_REF</code> items, we sort them not with the
@@ -890,99 +924,99 @@ that we may visit/read an <code>EXTENT_ITEM</code>
or
<code>METADATA_ITEM</code> multiple times as we process the queue, but
to do otherwise is to solve MinLA, which is NP-hard and also an optimal
MinLA solution I still think would perform worse than this; there is a
-reasonably lengthy discussion of this in a comment in
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()</code>.</p>
+reasonably lengthy discussion of this in a comment in <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n251"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:sortSettledItemQueue()</code></a>.</p>
<p>Now we loop over that sorted queue. In the code, this loop is
deceptively simple. Read the item, then pass it to a function that tells
us what other items are implied by it. That function is large, but
-simple; it’s just a giant table. The trick is how it tells us about
+simple; it's just a giant table. The trick is how it tells us about
implied items; we give it a set of callbacks that it calls to tell us
-these things; the real complexity is in the callbacks. These “graph
-callbacks” will be discussed in detail below, but as an illustrative
+these things; the real complexity is in the callbacks. 
These "graph
+callbacks" will be discussed in detail below, but as an illustrative
example: It may call <code>.WantOff()</code> with a tree ID, object ID,
item type, and offset to specify a precise item that it believes should
exist.</p>
<p>If we encounter a <code>ROOT_ITEM</code>, add the tree described by
that item to the tree queue.</p></li>
</ol>
-<p>(Both the “can this item even imply the existence of another item”
-check and the “what items are implied by this item” routine are in
-<code>./lib/btrfscheck/graph.go</code>)</p></li>
+<p>(Both the "can this item even imply the existence of another item"
+check and the "what items are implied by this item" routine are in <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/btrfscheck/graph.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./lib/btrfscheck/graph.go</code></a>)</p></li>
<li><p>Apply augments; drain the augment queue (and maybe the retry-item
queue), fill the added-item queue.</p>
-<p>It is at this point that I call out that the augment queue isn’t
+<p>It is at this point that I call out that the augment queue isn't
implemented as a simple map/set like the others, the
<code>treeAugmentQueue struct</code> has special handling for sets of
different sizes; optimizing the space for empty and len()==1 sized sets,
and falling back to the usual implementation for larger sets;
this is important because those small sets are the overwhelming
-majority, and otherwise there’s no way the program would be able to run
+majority, and otherwise there's no way the program would be able to run
on my 32GB RAM laptop. Now that I think about it, I bet it would even be
worth it to add optimized storage for len()==2 sized sets.</p>
-<p>The reason is that each “want” from above is tracked in the queue
+<p>The reason is that each "want" from above is tracked in the queue
separately; if we were OK merging them, then this optimized storage
-wouldn’t be necessary. 
But we keep them separate, so that:</p>
<ul>
-<li><p>For all “wants”, including ones with empty sets, graph callbacks
+<li><p>For all "wants", including ones with empty sets, graph callbacks
can check if a want has already been processed; avoiding re-doing any
work (see the description of the graph callbacks below).</p></li>
-<li><p>For “wants” with non-empty sets, we can see how many different
-“wants” could be satisfied with a given root, in order to decide which
+<li><p>For "wants" with non-empty sets, we can see how many different
+"wants" could be satisfied with a given root, in order to decide which
root to choose.</p></li>
</ul>
<p>Anyway, we loop over the trees in the augment queue. For each tree we
-look at that tree’s augment queue and look at all the choices of root
+look at that tree's augment queue and look at all the choices of root
nodes to add (below), and decide on a list to add. Then we add each of
those roots to the tree; the adding of each root triggers several calls
to our item-added callback (filling the added-item queue), and our
root-added callback. The root-added callback moves any items from the
retry-item queue for this tree to the added-item queue.</p>
-<p>How do we decide between choices of root nodes to add?
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()</code>
-has a good comment explaining the criteria we’d like to optimize for,
+<p>How do we decide between choices of root nodes to add? 
<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n528"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild.go:resolveTreeAugments()</code></a>
+has a good comment explaining the criteria we'd like to optimize for,
and then code that does an OK-ish job of actually optimizing for
that:</p>
<ul>
<li><p>It loops over the augment queue for that tree, building a list of
possible roots, for each possible root making note of 3 things:</p>
<ol type="a">
-<li><p>how many “wants” that root satisfies,</p></li>
-<li><p>how far from the tree the root’s owner is (owner=tree is a distance
+<li><p>how many "wants" that root satisfies,</p></li>
+<li><p>how far from the tree the root's owner is (owner=tree is a distance
of 0, owner=tree.parent is a distance of 1, owner=tree.parent.parent is
a distance of 2, and so on), and</p></li>
<li><p>what the generation of that root is.</p></li>
</ol></li>
<li><p>We sort that list first by highest-count-first, then by
lowest-distance-first, then by highest-generation-first.</p></li>
-<li><p>We create a “return” set and an “illegal” set. We loop over the
+<li><p>We create a "return" set and an "illegal" set. 
We loop over the
sorted list; for each possible root if it is in the illegal set, we skip
-it, otherwise we insert it into the return set and for each “want” that
+it, otherwise we insert it into the return set and for each "want" that
includes this root we add all roots that satisfy that want to the
illegal list.</p></li>
</ul></li>
</ol>
<p>It is important that the rebuilt forrest have the flag set so that it
-refuses to look up a tree if it can’t look up all of that tree’s
+refuses to look up a tree if it can't look up all of that tree's
ancestors; otherwise the potential-items index would be garbage as we
-wouldn’t have a good idea of which nodes are OK to consider; but this
-does have the downside that it won’t even attempt to improve a tree with
+wouldn't have a good idea of which nodes are OK to consider; but this
+does have the downside that it won't even attempt to improve a tree with
a missing parent. Perhaps the algorithm should flip the flag once the
loop terminates, and then re-seed the tree queue with each
<code>ROOT_ITEM</code> from the <code>ROOT_TREE</code>?</p>
<h4 id="graph-callbacks">4.3.3.3. graph callbacks</h4>
-<p>(see:
-<code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go</code>)</p>
+<p>(see: <a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>./cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go</code></a>)</p>
<p>The graph callbacks are what tie the above together.</p>
<p>For each of these callbacks, whenever I say that it looks up
-something in a tree’s item index or potential-item index, that implies
+something in a tree's item index or potential-item index, that implies
looking the tree up from the forrest; if the forrest cannot look up that
tree, then the callback returns early, after either:</p>
<ul>
<li><p>if we are in substep 1 and are processing a tree: we add the tree
that is being processed to the tree queue. 
(TODO: Wait, this assumes
that an augment will be applied to the <code>ROOT_TREE</code> before the
-next pass… if that isn’t the case, this will result in the loop never
-terminating… I guess I need to add a separate retry-tree
+next pass... if that isn't the case, this will result in the loop never
+terminating... I guess I need to add a separate retry-tree
queue?)</p></li>
<li><p>if we are in substep 2 and are processing an item: we add the
item that is being processed to the retry-item queue for the tree that
@@ -991,15 +1025,15 @@ cannot be looked up</p></li>
<p>The 6 methods in the <code>btrfscheck.GraphCallbacks</code> interface
are:</p>
<ol type="1">
-<li><p><code>FSErr()</code>: There’s an error with the filesystem; this
+<li><p><code>FSErr()</code>: There's an error with the filesystem; this
callback just spits it out on stderr. I say such a trivial matter
-because, again, for a recovery tool I think it’s worth putting care
+because, again, for a recovery tool I think it's worth putting care
into how you handle errors and where you expect them: We expect them here,
so we have to check for them to avoid reading invalid data or whatever,
-but we don’t actually need to do anything other than watch our
+but we don't actually need to do anything other than watch our
step.</p></li>
<li><p><code>Want()</code>: We want an item in a given tree with a given
-object ID and item type, but we don’t care about what the item’s offset
+object ID and item type, but we don't care about what the item's offset
is.</p>
<p>The callback works by searching the item index to see if it can find
such an item; if so, it has nothing else to do and returns. 
Otherwise, @@ -1011,13 +1045,13 @@ augment queue (even if that set is still empty).</p></li> <li><p><code>WantOff()</code>: The same, but we want a specific offset.</p></li> <li><p><code>WantDirIndex()</code>: We want a <code>DIR_INDEX</code> -item for a given inode and filename, but we don’t know what the offset +item for a given inode and filename, but we don't know what the offset of that item is.</p> <p>First we scan over the item index, looking at all <code>DIR_INDEX</code> items for that inode number. For each item, we can check the scan data to see what the filename in that <code>DIR_INDEX</code> is, so we can see if the item satisfies this want -without accessing the disk. If there’s a match, then there is nothing +without accessing the disk. If there's a match, then there is nothing else to do, so we return. Otherwise, we do that same search over the potential-item index; if we find any matches, then we build the set of roots to add to the augment queue the same as in @@ -1026,71 +1060,73 @@ roots to add to the augment queue the same as in <code>DATA_EXTENT</code> items in the given tree for the given inode, and we want them to cover from 0 to a given size bytes of that file.</p> <p>First we walk that range in the item index, to build a list of the -gaps that we need to fill (“Step 1” in -<code>rebuild_wantcb.go:_wantRange()</code>). This walk -(<code>rebuild_wantcb.go:_walkRange()</code>) requires knowing the size -of each file extent; so doing this quickly without hitting disk is why -we recorded the size of each file extent in our initialization step.</p> -<p>Then (“Step 2” in <code>_wantRange()</code>) we iterate over each of +gaps that we need to fill ("Step 1" in <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n260"><code>rebuild_wantcb.go:_wantRange()</code></a>). 
+This walk (<a
+href="https://git.lukeshu.com/btrfs-progs-ng/tree/cmd/btrfs-rec/inspect/rebuildtrees/rebuild_wantcb.go?id=18e6066c241cf3d252b6521150843ffc858d8434#n195"><code>rebuild_wantcb.go:_walkRange()</code></a>)
+requires knowing the size of each file extent; so doing this quickly
+without hitting disk is why we recorded the size of each file extent in
+our initialization step.</p>
-<p>Then (“Step 2” in <code>_wantRange()</code>) we iterate over each of
+<p>Then ("Step 2" in <code>_wantRange()</code>) we iterate over each of
the gaps, and for each gap do a very similar walk (again, by calling
<code>_walkRange()</code>), but this time over the potential-item index.
For each file extent we find that is entirely within the gap, we
-“want” that extent, and move the beginning of the gap forward to the
+"want" that extent, and move the beginning of the gap forward to the
end of that extent. This algorithm is dumb and greedy, potentially
making sub-optimal selections; and so could probably stand to be
-improved; but in my real-world use, it seems to be “good
-enough”.</p></li>
+improved; but in my real-world use, it seems to be "good
+enough".</p></li>
<li><p><code>WantCSum()</code>: We want 1 or more
<code>EXTENT_CSUM</code> items to cover the half-open interval
[<code>lo_logical_addr</code>, <code>hi_logical_addr</code>). Well,
maybe. It also takes a subvolume ID and an inode number; and looks up in
the scan data whether that inode has the <code>INODE_NODATASUM</code>
flag set; if it does have the flag set, then it returns early without
looking for any <code>EXTENT_CSUM</code> items. 
If it doesn't return early, then it performs the same want-range routine as <code>WantFileExt</code>, but with the appropriate tree, object ID, and item types for csums as opposed to data extents.</p></li> </ol> -<p>For each of these callbacks, we generate a “wantKey”, a tuple +<p>For each of these callbacks, we generate a "wantKey", a tuple representing the function and its arguments; we check the augment-queue -to see if we’ve already enqueued a set of roots for that want, and if +to see if we've already enqueued a set of roots for that want, and if so, that callback can return early without checking the potential-item index.</p> <h1 id="future-work">5. Future work</h1> -<p>It’s in a reasonably useful place, I think; and so now I’m going to -take a break from it for a while. But there’s still lots of work to +<p>It's in a reasonably useful place, I think; and so now I'm going to +take a break from it for a while. But there's still lots of work to do:</p> <ul> -<li><p>RAID almost certainly doesn’t work.</p></li> +<li><p>RAID almost certainly doesn't work.</p></li> <li><p>Encryption is not implemented.</p></li> -<li><p>It doesn’t understand (ignores) the <code>TREE_LOG</code> -(because I don’t understand the <code>TREE_LOG</code>).</p></li> -<li><p><code>btrfs-rec inspect mount</code> should add “lost+found” -directories for inodes that are included in the subvolume’s tree but -aren’t reachable from the tree’s root inode</p></li> -<li><p>I still need to implement <code>btrfs-rec repair SUBCMD</code> -subcommands to write rebuilt-information from +<li><p>It doesn't understand (ignores) the <code>TREE_LOG</code> +(because I don't understand the <code>TREE_LOG</code>).</p></li> +<li><p><code>btrfs-rec inspect mount</code> should add "lost+found" +directories for inodes that are included in the subvolume's tree but +aren't reachable from the tree's root inode</p></li> +<li><p>I still need to implement <code>btrfs-rec repair +<var>SUBCMD</var></code> subcommands to 
write rebuilt-information from
<code>btrfs-rec inspect</code> back to the filesystem.</p></li>
<li><p>I need to figure out the error handling/reporting story for
<code>mount</code>.</p></li>
<li><p>It needs a lot more tests</p>
<ul>
-<li>I’d like to get the existing btrfs-progs fsck tests to run on
+<li>I'd like to get the existing btrfs-progs fsck tests to run on
it.</li>
</ul></li>
<li><p>In the process of writing this email, I realized that I probably
-need to add a retry-tree queue; see the “graph callbacks” section in the
+need to add a retry-tree queue; see the "graph callbacks" section in the
description of the <code>rebuild-trees</code> algorithm above.</p></li>
-<li><p>There are a number of “TODO” comments or panics in the code:</p>
+<li><p>There are a number of "TODO" comments or panics in the code:</p>
<ul>
<li><p>Some of them definitely need to be done.</p></li>
<li><p>Some of them are <code>panic("TODO")</code> on the basis that if
-it’s seeing something on the filesystem that it doesn’t recognize, it’s
-probably that I didn’t get to implementing that thing/situation, but
-it’s possible that the thing is just corrupt. 
This should only be for situations that the node passed the checksum test, so it being corrupt would have to be caused by a bug in btrfs rather than a failing drive or -other corruption; I wasn’t too worried about btrfs bugs.</p></li> +other corruption; I wasn't too worried about btrfs bugs.</p></li> </ul></li> <li><p><code>btrfs-rec inspect rebuild-trees</code> is slow, and can probably be made a lot faster.</p> @@ -1102,12 +1138,13 @@ steps on my ThinkPad E15 for a 256GB disk image are as follows:</p> btrfs-rec inspect rebuild-trees : 1h 4m 55s btrfs-rec inspect ls-files : 29m 55s btrfs-rec inspect ls-trees : 8m 40s</code></pre> -<p>For the most part, it’s all single-threaded (with the main exception +<p>For the most part, it's all single-threaded (with the main exception that in several places I/O has been moved to a separate thread from the main CPU-heavy thread), but a lot of the algorithms could be parallelized.</p></li> -<li><p>There are a lot of “tunable” values that I haven’t really spent -time tuning. These are all annotated with <code>textui.Tunable()</code>. +<li><p>There are a lot of "tunable" values that I haven't really spent +time tuning. These are all annotated with <a +href="https://git.lukeshu.com/btrfs-progs-ng/tree/lib/textui/tunable.go?id=18e6066c241cf3d252b6521150843ffc858d8434"><code>textui.Tunable()</code></a>. I sort-of intended for them to be adjustable on the CLI.</p></li> <li><p>Perhaps the <code>btrfs inspect rebuild-trees</code> algorithm could be adjusted to also try to rebuild trees with missing parents; see @@ -1116,20 +1153,20 @@ the above discussion of the algorithm.</p></li> <h1 id="problems-for-merging-this-code-into-btrfs-progs">6. 
Problems
for merging this code into btrfs-progs</h1>
<ul>
-<li><p>It’s written in Go, not C.</p></li>
-<li><p>It’s effectively GPLv3+ (not GPLv2-only or GPLv2+) because of use
+<li><p>It's written in Go, not C.</p></li>
+<li><p>It's effectively GPLv3+ (not GPLv2-only or GPLv2+) because of use
of some code under the Apache 2.0 license (2 files in the codebase
itself that are based off of Apache-licensed code, and use of unmodified
3rd-party libraries).</p></li>
<li><p>It uses ARC (Adaptive Replacement Cache), which is patented by
-IBM, and the patent doesn’t expire for another 7 months. An important
+IBM, and the patent doesn't expire for another 7 months. An important
property of ARC over LRU is that it is scan-resistant; the above
algorithms do a lot of scanning. On that note, now that RedHat is owned
by IBM: who in the company do we need to get to talk to each other so
that we can get ARC into the Linux kernel before then?</p></li>
</ul>
<div style="font-family: monospace">
-<p>– <br/> Happy hacking,<br/> ~ Luke Shumaker<br/></p>
+<p>-- <br/> Happy hacking,<br/> ~ Luke Shumaker<br/></p>
</div>
</content>
<author><name>Luke Shumaker</name><uri>https://lukeshu.com/</uri><email>lukeshu@sbcglobal.net</email></author> |