author | Luke Shumaker <lukeshu@lukeshu.com> | 2023-07-14 15:25:03 -0700 |
---|---|---|
committer | Luke Shumaker <lukeshu@lukeshu.com> | 2023-07-14 15:25:18 -0700 |
commit | 3250a2386d3111a4ec51b37f42218c90b69ed341 | |
tree | 32ac6edd81e791d2c3338c1f11e67f40b0cbe007 /public/bash-arrays.html | |
parent | 8c99fadac68cb05b4aaa08cab7a55c7fbfe5e364 | |
parent | c045654a862bc1119fa4e7584fff9d2a965192ea |
make: Add the btrfs-rec email
This isn't quite a verbatim check-in of the email like the one I did in
btrfs-progs-ng.git; I fussed with it a bit to get my blog engine to do
sane things with it.
Diffstat (limited to 'public/bash-arrays.html')
-rw-r--r-- | public/bash-arrays.html | 172 |
1 files changed, 137 insertions, 35 deletions
diff --git a/public/bash-arrays.html b/public/bash-arrays.html index 8e424bb..a02e60c 100644 --- a/public/bash-arrays.html +++ b/public/bash-arrays.html @@ -10,24 +10,61 @@ <header><a href="/">Luke Shumaker</a> » <a href=/blog>blog</a> » bash-arrays</header> <article> <h1 id="bash-arrays">Bash arrays</h1> -<p>Way too many people don’t understand Bash arrays. Many of them argue that if you need arrays, you shouldn’t be using Bash. If we reject the notion that one should never use Bash for scripting, then thinking you don’t need Bash arrays is what I like to call “wrong”. I don’t even mean real scripting; even these little stubs in <code>/usr/bin</code>:</p> +<p>Way too many people don’t understand Bash arrays. Many of them argue +that if you need arrays, you shouldn’t be using Bash. If we reject the +notion that one should never use Bash for scripting, then thinking you +don’t need Bash arrays is what I like to call “wrong”. I don’t even mean +real scripting; even these little stubs in <code>/usr/bin</code>:</p> <pre><code>#!/bin/sh java -jar /…/something.jar $* # WRONG!</code></pre> -<p>Command line arguments are exposed as an array, that little <code>$*</code> is accessing it, and is doing the wrong thing (for the lazy, the correct thing is <code>-- "$@"</code>). Arrays in Bash offer a safe way preserve field separation.</p> -<p>One of the main sources of bugs (and security holes) in shell scripts is field separation. That’s what arrays are about.</p> +<p>Command line arguments are exposed as an array, that little +<code>$*</code> is accessing it, and is doing the wrong thing (for the +lazy, the correct thing is <code>-- "$@"</code>). Arrays in Bash offer a +safe way preserve field separation.</p> +<p>One of the main sources of bugs (and security holes) in shell scripts +is field separation. That’s what arrays are about.</p> <h2 id="what-field-separation">What? Field separation?</h2> -<p>Field separation is just splitting a larger unit into a list of “fields”. 
The most common case is when Bash splits a “simple command” (in the Bash manual’s terminology) into a list of arguments. Understanding how this works is an important prerequisite to understanding arrays, and even why they are important.</p> -<p>Dealing with lists is something that is very common in Bash scripts; from dealing with lists of arguments, to lists of files; they pop up a lot, and each time, you need to think about how the list is separated. In the case of <code>$PATH</code>, the list is separated by colons. In the case of <code>$CFLAGS</code>, the list is separated by whitespace. In the case of actual arrays, it’s easy, there’s no special character to worry about, just quote it, and you’re good to go.</p> +<p>Field separation is just splitting a larger unit into a list of +“fields”. The most common case is when Bash splits a “simple command” +(in the Bash manual’s terminology) into a list of arguments. +Understanding how this works is an important prerequisite to +understanding arrays, and even why they are important.</p> +<p>Dealing with lists is something that is very common in Bash scripts; +from dealing with lists of arguments, to lists of files; they pop up a +lot, and each time, you need to think about how the list is separated. +In the case of <code>$PATH</code>, the list is separated by colons. In +the case of <code>$CFLAGS</code>, the list is separated by whitespace. +In the case of actual arrays, it’s easy, there’s no special character to +worry about, just quote it, and you’re good to go.</p> <h2 id="bash-word-splitting">Bash word splitting</h2> -<p>When Bash reads a “simple command”, it splits the whole thing into a list of “words”. “The first word specifies the command to be executed, and is passed as argument zero. 
The remaining words are passed as arguments to the invoked command.” (to quote <code>bash(1)</code>)</p> -<p>It is often hard for those unfamiliar with Bash to understand when something is multiple words, and when it is a single word that just contains a space or newline. To help gain an intuitive understanding, I recommend using the following command to print a bullet list of words, to see how Bash splits them up:</p> +<p>When Bash reads a “simple command”, it splits the whole thing into a +list of “words”. “The first word specifies the command to be executed, +and is passed as argument zero. The remaining words are passed as +arguments to the invoked command.” (to quote <code>bash(1)</code>)</p> +<p>It is often hard for those unfamiliar with Bash to understand when +something is multiple words, and when it is a single word that just +contains a space or newline. To help gain an intuitive understanding, I +recommend using the following command to print a bullet list of words, +to see how Bash splits them up:</p> <pre><code>printf ' -> %s\n' <var>words…</var><hr> -> word one -> multiline word -> third word </code></pre> -<p>In a simple command, in absence of quoting, Bash separates the “raw” input into words by splitting on spaces and tabs. In other places, such as when expanding a variable, it uses the same process, but splits on the characters in the <code>$IFS</code> variable (which has the default value of space/tab/newline). This process is, creatively enough, called “word splitting”.</p> -<p>In most discussions of Bash arrays, one of the frequent criticisms is all the footnotes and “gotchas” about when to quote things. That’s because they usually don’t set the context of word splitting. <strong>Double quotes (<code>"</code>) inhibit Bash from doing word splitting.</strong> That’s it, that’s all they do. 
Arrays are already split into words; without wrapping them in double quotes Bash re-word splits them, which is almost <em>never</em> what you want; otherwise, you wouldn’t be working with an array.</p> +<p>In a simple command, in absence of quoting, Bash separates the “raw” +input into words by splitting on spaces and tabs. In other places, such +as when expanding a variable, it uses the same process, but splits on +the characters in the <code>$IFS</code> variable (which has the default +value of space/tab/newline). This process is, creatively enough, called +“word splitting”.</p> +<p>In most discussions of Bash arrays, one of the frequent criticisms is +all the footnotes and “gotchas” about when to quote things. That’s +because they usually don’t set the context of word splitting. +<strong>Double quotes (<code>"</code>) inhibit Bash from doing word +splitting.</strong> That’s it, that’s all they do. Arrays are already +split into words; without wrapping them in double quotes Bash re-word +splits them, which is almost <em>never</em> what you want; otherwise, +you wouldn’t be working with an array.</p> <h2 id="normal-array-syntax">Normal array syntax</h2> <table> <caption> @@ -49,7 +86,9 @@ word </tr> </tbody> </table> -<p>Now, for accessing the array. The most important things to understanding arrays is to quote them, and understanding the difference between <code>@</code> and <code>*</code>.</p> +<p>Now, for accessing the array. 
The most important things to +understanding arrays is to quote them, and understanding the difference +between <code>@</code> and <code>*</code>.</p> <table> <caption> <h1>Getting an entire array</h1> @@ -74,8 +113,10 @@ word </tr> </tbody> </table> -<p>It’s really that simple—that covers most usages of arrays, and most of the mistakes made with them.</p> -<p>To help you understand the difference between <code>@</code> and <code>*</code>, here is a sample of each:</p> +<p>It’s really that simple—that covers most usages of arrays, and most +of the mistakes made with them.</p> +<p>To help you understand the difference between <code>@</code> and +<code>*</code>, here is a sample of each:</p> <table> <tbody> <tr><th><code>@</code></th><th><code>*</code></th></tr> @@ -99,8 +140,10 @@ done</code></pre></td> </tr> </tbody> </table> -<p>In most cases, <code>@</code> is what you want, but <code>*</code> comes up often enough too.</p> -<p>To get individual entries, the syntax is <code>${array[<var>n</var>]}</code>, where <var>n</var> starts at 0.</p> +<p>In most cases, <code>@</code> is what you want, but <code>*</code> +comes up often enough too.</p> +<p>To get individual entries, the syntax is +<code>${array[<var>n</var>]}</code>, where <var>n</var> starts at 0.</p> <table> <caption> <h1>Getting a single entry from an array</h1> @@ -142,7 +185,8 @@ done</code></pre></td> </tr> </tbody> </table> -<p>Notice that <code>"${array[@]}"</code> is equivalent to <code>"${array[@]:0}"</code>.</p> +<p>Notice that <code>"${array[@]}"</code> is equivalent to +<code>"${array[@]:0}"</code>.</p> <table> <caption> <h1>Getting the length of an array</h1> @@ -165,8 +209,14 @@ done</code></pre></td> </tbody> </table> <h2 id="argument-array-syntax">Argument array syntax</h2> -<p>Accessing the arguments is mostly that simple, but that array doesn’t actually have a variable name. It’s special. 
Instead, it is exposed through a series of special variables (normal variables can only start with letters and underscore), that <em>mostly</em> match up with the normal array syntax.</p> -<p>Setting the arguments array, on the other hand, is pretty different. That’s fine, because setting the arguments array is less useful anyway.</p> +<p>Accessing the arguments is mostly that simple, but that array doesn’t +actually have a variable name. It’s special. Instead, it is exposed +through a series of special variables (normal variables can only start +with letters and underscore), that <em>mostly</em> match up with the +normal array syntax.</p> +<p>Setting the arguments array, on the other hand, is pretty different. +That’s fine, because setting the arguments array is less useful +anyway.</p> <table> <caption> <h1>Accessing the arguments array</h1> @@ -204,7 +254,9 @@ done</code></pre></td> <tr><td><code>array=("${array[0]}" "${array[@]:<var>n+1</var>}")</code></td><td><code>shift <var>n</var></code></td></tr> </tbody> </table> -<p>Did you notice what was inconsistent? The variables <code>$*</code>, <code>$@</code>, and <code>$#</code> behave like the <var>n</var>=0 entry doesn’t exist.</p> +<p>Did you notice what was inconsistent? The variables <code>$*</code>, +<code>$@</code>, and <code>$#</code> behave like the <var>n</var>=0 +entry doesn’t exist.</p> <table> <caption> <h1>Inconsistencies</h1> @@ -233,11 +285,27 @@ done</code></pre></td> </tr> </tbody> </table> -<p>These make sense because argument 0 is the name of the script—we almost never want that when parsing arguments. You’d spend more code getting the values that it currently gives you.</p> -<p>Now, for an explanation of setting the arguments array. You cannot set argument <var>n</var>=0. 
The <code>set</code> command is used to manipulate the arguments passed to Bash after the fact—similarly, you could use <code>set -x</code> to make Bash behave like you ran it as <code>bash -x</code>; like most GNU programs, the <code>--</code> tells it to not parse any of the options as flags. The <code>shift</code> command shifts each entry <var>n</var> spots to the left, using <var>n</var>=1 if no value is specified; and leaving argument 0 alone.</p> -<h2 id="but-you-mentioned-gotchas-about-quoting">But you mentioned “gotchas” about quoting!</h2> -<p>But I explained that quoting simply inhibits word splitting, which you pretty much never want when working with arrays. If, for some odd reason, you do what word splitting, then that’s when you don’t quote. Simple, easy to understand.</p> -<p>I think possibly the only case where you do want word splitting with an array is when you didn’t want an array, but it’s what you get (arguments are, by necessity, an array). For example:</p> +<p>These make sense because argument 0 is the name of the script—we +almost never want that when parsing arguments. You’d spend more code +getting the values that it currently gives you.</p> +<p>Now, for an explanation of setting the arguments array. You cannot +set argument <var>n</var>=0. The <code>set</code> command is used to +manipulate the arguments passed to Bash after the fact—similarly, you +could use <code>set -x</code> to make Bash behave like you ran it as +<code>bash -x</code>; like most GNU programs, the <code>--</code> tells +it to not parse any of the options as flags. The <code>shift</code> +command shifts each entry <var>n</var> spots to the left, using +<var>n</var>=1 if no value is specified; and leaving argument 0 +alone.</p> +<h2 id="but-you-mentioned-gotchas-about-quoting">But you mentioned +“gotchas” about quoting!</h2> +<p>But I explained that quoting simply inhibits word splitting, which +you pretty much never want when working with arrays. 
If, for some odd +reason, you do what word splitting, then that’s when you don’t quote. +Simple, easy to understand.</p> +<p>I think possibly the only case where you do want word splitting with +an array is when you didn’t want an array, but it’s what you get +(arguments are, by necessity, an array). For example:</p> <pre><code># Usage: path_ls PATH1 PATH2… # Description: # Takes any number of PATH-style values; that is, @@ -253,13 +321,22 @@ path_ls() { find -L "${dirs[@]}" -maxdepth 1 -type f -executable \ -printf '%f\n' 2>/dev/null | sort -u }</code></pre> -<p>Logically, there shouldn’t be multiple arguments, just a single <code>$PATH</code> value; but, we can’t enforce that, as the array can have any size. So, we do the robust thing, and just act on the entire array, not really caring about the fact that it is an array. Alas, there is still a field-separation bug in the program, with the output.</p> -<h2 id="i-still-dont-think-i-need-arrays-in-my-scripts">I still don’t think I need arrays in my scripts</h2> +<p>Logically, there shouldn’t be multiple arguments, just a single +<code>$PATH</code> value; but, we can’t enforce that, as the array can +have any size. So, we do the robust thing, and just act on the entire +array, not really caring about the fact that it is an array. Alas, there +is still a field-separation bug in the program, with the output.</p> +<h2 id="i-still-dont-think-i-need-arrays-in-my-scripts">I still don’t +think I need arrays in my scripts</h2> <p>Consider the common code:</p> <pre><code>ARGS=' -f -q' … command $ARGS # unquoted variables are a bad code-smell anyway</code></pre> -<p>Here, <code>$ARGS</code> is field-separated by <code>$IFS</code>, which we are assuming has the default value. This is fine, as long as <code>$ARGS</code> is known to never need an embedded space; which you do as long as it isn’t based on anything outside of the program. 
But wait until you want to do this:</p> +<p>Here, <code>$ARGS</code> is field-separated by <code>$IFS</code>, +which we are assuming has the default value. This is fine, as long as +<code>$ARGS</code> is known to never need an embedded space; which you +do as long as it isn’t based on anything outside of the program. But +wait until you want to do this:</p> <pre><code>ARGS=' -f -q' … if [[ -f "$filename" ]]; then @@ -267,7 +344,10 @@ if [[ -f "$filename" ]]; then fi … command $ARGS</code></pre> -<p>Now you’re hosed if <code>$filename</code> contains a space! More than just breaking, it could have unwanted side effects, such as when someone figures out how to make <code>filename='foo --dangerous-flag'</code>.</p> +<p>Now you’re hosed if <code>$filename</code> contains a space! More +than just breaking, it could have unwanted side effects, such as when +someone figures out how to make +<code>filename='foo --dangerous-flag'</code>.</p> <p>Compare that with the array version:</p> <pre><code>ARGS=(-f -q) … @@ -277,21 +357,43 @@ fi … command "${ARGS[@]}"</code></pre> <h2 id="what-about-portability">What about portability?</h2> -<p>Except for the little stubs that call another program with <code>"$@"</code> at the end, trying to write for multiple shells (including the ambiguous <code>/bin/sh</code>) is not a task for mere mortals. If you do try that, your best bet is probably sticking to POSIX. Arrays are not POSIX; except for the arguments array, which is; though getting subset arrays from <code>$@</code> and <code>$*</code> is not (tip: use <code>set --</code> to re-purpose the arguments array).</p> -<p>Writing for various versions of Bash, though, is pretty do-able. 
Everything here works all the way back in bash-2.0 (December 1996), with the following exceptions:</p> +<p>Except for the little stubs that call another program with +<code>"$@"</code> at the end, trying to write for multiple shells +(including the ambiguous <code>/bin/sh</code>) is not a task for mere +mortals. If you do try that, your best bet is probably sticking to +POSIX. Arrays are not POSIX; except for the arguments array, which is; +though getting subset arrays from <code>$@</code> and <code>$*</code> is +not (tip: use <code>set --</code> to re-purpose the arguments +array).</p> +<p>Writing for various versions of Bash, though, is pretty do-able. +Everything here works all the way back in bash-2.0 (December 1996), with +the following exceptions:</p> <ul> <li><p>The <code>+=</code> operator wasn’t added until Bash 3.1.</p> <ul> -<li>As a work-around, use <code>array[${#array[*]}]=<var>word</var></code> to append a single element.</li> +<li>As a work-around, use +<code>array[${#array[*]}]=<var>word</var></code> to append a single +element.</li> </ul></li> -<li><p>Accessing subset arrays of the arguments array is inconsistent if <var>pos</var>=0 in <code>${@:<var>pos</var>:<var>len</var>}</code>.</p> +<li><p>Accessing subset arrays of the arguments array is inconsistent if +<var>pos</var>=0 in <code>${@:<var>pos</var>:<var>len</var>}</code>.</p> <ul> -<li>In Bash 2.x and 3.x, it works as expected, except that argument 0 is silently missing. For example <code>${@:0:3}</code> gives arguments 1 and 2; where <code>${@:1:3}</code> gives arguments 1, 2, and 3. This means that if <var>pos</var>=0, then only <var>len</var>-1 arguments are given back.</li> -<li>In Bash 4.0, argument 0 can be accessed, but if <var>pos</var>=0, then it only gives back <var>len</var>-1 arguments. 
So, <code>${@:0:3}</code> gives arguments 0 and 1.</li> -<li>In Bash 4.1 and higher, it works in the way described in the main part of this document.</li> +<li>In Bash 2.x and 3.x, it works as expected, except that argument 0 is +silently missing. For example <code>${@:0:3}</code> gives arguments 1 +and 2; where <code>${@:1:3}</code> gives arguments 1, 2, and 3. This +means that if <var>pos</var>=0, then only <var>len</var>-1 arguments are +given back.</li> +<li>In Bash 4.0, argument 0 can be accessed, but if <var>pos</var>=0, +then it only gives back <var>len</var>-1 arguments. So, +<code>${@:0:3}</code> gives arguments 0 and 1.</li> +<li>In Bash 4.1 and higher, it works in the way described in the main +part of this document.</li> </ul></li> </ul> -<p>Now, Bash 1.x doesn’t have arrays at all. <code>$@</code> and <code>$*</code> work, but using <code>:</code> to select a range of elements from them doesn’t. Good thing most boxes have been updated since 1996!</p> +<p>Now, Bash 1.x doesn’t have arrays at all. <code>$@</code> and +<code>$*</code> work, but using <code>:</code> to select a range of +elements from them doesn’t. Good thing most boxes have been updated +since 1996!</p> </article> <footer> |
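The word-splitting behaviour described in the article touched by this diff can be checked with a short sketch. The helper names `show_words`, `forward_unquoted`, and `forward_quoted` are hypothetical, invented here for illustration; the `ARGS` and `filename` values mirror the article's own example. It assumes Bash 3.1 or later (for `+=`) and the default `$IFS`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the article): print each argument as one bullet,
# using the same printf idiom the article recommends.
show_words() {
    printf ' -> %s\n' "$@"
}

# Hypothetical wrappers contrasting unquoted $* with quoted "$@".
forward_unquoted() {
    show_words $*      # WRONG: every argument is re-split on $IFS
}
forward_quoted() {
    show_words "$@"    # each argument stays a single word
}

# Build an argument list as an array, as in the article's ARGS example.
filename='foo --dangerous-flag'
ARGS=(-f -q)
ARGS+=("$filename")    # the += operator needs Bash 3.1+, per the article

forward_unquoted 'word one' 'third word'   # prints 4 bullets
forward_quoted 'word one' 'third word'     # prints 2 bullets
show_words "${ARGS[@]}"                    # prints 3 bullets; the filename stays whole
```

The unquoted wrapper turns two arguments into four words, while the quoted one and the array expansion keep every element intact, which is the field-separation point the article makes.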