Notes on email message threading
================================
---
date: "2024-06-08"
markdown_options: "-smart"
---

> I sent an email to Jamie Zawinski with feedback on his venerable
> email threading algorithm.  Perhaps my commentary will be a useful
> reference to others implementing email threading.
>
> You can see my implementation of his algorithm at
> <https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread_alg.go>
> (and a use of it at
> <https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread.go>).

<div style="font-family: monospace">
To: [Jamie Zawinski] [<jwz@jwz.org>]<br/>
Subject: message threading<br/>
Date: Sat, 08 Jun 2024 22:34:41 -0600<br/>
Message-ID: <87tti2ybry.wl-lukeshu@lukeshu.com>
</div>

Hi,

I'm implementing message threading, and have been referencing both
your document [<https://www.jwz.org/doc/threading.html>] and
[RFC 5256].  I'm not sure whether you're interested in updating a
document that's more than 25 years old, but if you are, I hope you
find the following feedback valuable.

You write that the algorithm in RFC 5256 is merely a
<q>restating</q> of your algorithm, but I noticed 3 (minor)
differences:

 1. In your step 1.C, the RFC says to check whether this would create
    a loop, and if it would, to skip creating the link; your version
    only says to perform this check in step 1.B.

 2. The RFC says to sort the messages by date between your steps 4
    and 5; that is: when grouping by subject, containers in the root
    set should be processed in date-order (you do not specify an
    order), and if a container in the root set is empty then the
    subject should be taken from the earliest-date child (you say to
    use an arbitrary child).

 3. The RFC precisely states how to trim a subject down to a "base
    subject," rather than simply saying <q>Strip \`\`Re:'', \`\`RE:'',
    \`\`RE[5]:'', \`\`Re: Re[4]: Re:'' and so on.</q>

Additionally, there are two minor points on which I found their
version to be clearer:

 1. The RFC specifies how to handle messages without a Message-Id or
    with a duplicate Message-Id (on [page 9]), as well as how to
    normalize a Message-Id (by referring to [RFC 2822]).  This is
    perhaps out of scope for your algorithm document, but I feel that
    it would be worth mentioning in your background or definitions
    section.

 2. In your step 1.B, I did not understand what <q>If they are already
    linked, don't change the existing links</q> meant until I read the
    RFC, which words it as <q>If a message already has a parent, don't
    change the existing link.</q>  It was unclear to me what
    <q>they</q> was referring to in your version.

<div style="font-family: monospace">
-- <br/>
Happy hacking,<br/>
~ Luke T. Shumaker<br/>
</div>

[Jamie Zawinski]: https://www.jwz.org/
[<jwz@jwz.org>]: https://www.jwz.org/about.html
[<https://www.jwz.org/doc/threading.html>]: https://www.jwz.org/doc/threading.html
[RFC 5256]: https://datatracker.ietf.org/doc/html/rfc5256
[RFC 2822]: https://datatracker.ietf.org/doc/html/rfc2822
[page 9]: https://datatracker.ietf.org/doc/html/rfc5256#page-9
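
For anyone implementing this, the loop check discussed in the first difference can be sketched roughly as follows. This is only an illustration under assumptions, not the Go code linked at the top: the `Container` class and the `would_create_loop` and `link` functions are hypothetical names, and a real container would also hold the message itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare containers by identity, not by field values
class Container:
    """Hypothetical stand-in for a threading container."""
    message_id: str
    parent: Container | None = None
    children: list[Container] = field(default_factory=list)


def would_create_loop(parent: Container, child: Container) -> bool:
    """Report whether making `child` a child of `parent` would form a cycle.

    A cycle appears exactly when `child` is `parent` itself or one of
    `parent`'s ancestors, so walking up from `parent` is enough.
    """
    node: Container | None = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def link(parent: Container, child: Container) -> bool:
    """Link `child` under `parent`; skip the link if it would create a loop."""
    if would_create_loop(parent, child):
        return False
    if child.parent is not None:
        # Detach from the previous parent before re-linking.
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)
    return True
```

The sketch applies the same check no matter which step calls `link`, which is the behavior the RFC asks for. As I read the two documents, step 1.B would also leave alone any child that already has a parent, so a caller there would test `child.parent` before calling `link`.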