Notes on email message threading
================================
---
date: "2024-06-08"
markdown_options: "-smart"
---

> I sent an email to Jamie Zawinski with feedback on his venerable
> email threading algorithm.  Perhaps my commentary will be a useful
> reference to others implementing email threading.
>
> You can see my implementation of his algorithm at
>
> (and a use of it at
> ).

To: [Jamie Zawinski] [<jwz@jwz.org>]
Subject: message threading
Date: Sat, 08 Jun 2024 22:34:41 -0600
Message-ID: <87tti2ybry.wl-lukeshu@lukeshu.com>

Hi,

I'm implementing message threading, and have been referencing both
your document [<https://www.jwz.org/doc/threading.html>] and
[RFC 5256].  I'm not sure whether you're interested in updating a
document that's more than 25 years old, but if you are: I hope you
find the following feedback valuable.

You write that the algorithm in RFC 5256 is merely a restating of
your algorithm, but I noticed 3 (minor) differences:

 1. In your step 1.C, the RFC says to check whether this would create
    a loop, and if it would, to skip creating the link; your version
    only says to perform this check in step 1.B.

 2. The RFC says to sort the messages by date between your steps 4
    and 5; that is: when grouping by subject, containers in the root
    set should be processed in date-order (you do not specify an
    order), and if a container in the root set is empty then the
    subject should be taken from the earliest-date child (you say to
    use an arbitrary child).

 3. The RFC precisely states how to trim a subject down to a "base
    subject," rather than simply saying

    > Strip \`\`Re:'', \`\`RE:'', \`\`RE[5]:'', \`\`Re: Re[4]: Re:''
    > and so on.

Additionally, there are two minor points on which I found their
version to be clearer:

 1. The RFC specifies how to handle messages without a Message-Id or
    with a duplicate Message-Id (on [page 9]), as well as how to
    normalize a Message-Id (by referring to [RFC 2822]).  This is
    perhaps out-of-scope of your algorithm document, but I feel that
    it would be worth mentioning in your background or definitions
    section.

 2. In your step 1.B, I did not understand what

    > If they are already linked, don't change the existing links

    meant until I read the RFC, which words it as

    > If a message already has a parent, don't change the existing
    > link.

    It was unclear to me what "they" was referring to in your
    version.

--
Happy hacking,
~ Luke T. Shumaker
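
For readers implementing this, here is a minimal Python sketch of the two linking rules discussed in the email: the "already has a parent" rule from step 1.B and the loop check that the RFC also applies in step 1.C. The names `Container`, `is_ancestor`, `link`, and `set_parent` are hypothetical and come from neither document. The choice to leave an existing link untouched when the new link would create a loop is an assumption of this sketch, not something either document states.

```python
from __future__ import annotations


class Container:
    """Hypothetical container: one per Message-ID, as built in step 1."""

    def __init__(self, message_id: str) -> None:
        self.message_id = message_id
        self.parent: Container | None = None
        self.children: list[Container] = []


def is_ancestor(candidate: Container, node: Container) -> bool:
    """Return True if `candidate` is `node` itself or one of its ancestors."""
    cur: Container | None = node
    while cur is not None:
        if cur is candidate:
            return True
        cur = cur.parent
    return False


def link(parent: Container, child: Container) -> bool:
    """Step 1.B: link two adjacent References entries, if that is safe."""
    if child.parent is not None:
        return False  # message already has a parent: keep the existing link
    if is_ancestor(child, parent):
        return False  # child is parent, or sits above it: would make a loop
    child.parent = parent
    parent.children.append(child)
    return True


def set_parent(child: Container, new_parent: Container | None) -> bool:
    """Step 1.C: the message's own last reference replaces any earlier guess."""
    if new_parent is not None and is_ancestor(child, new_parent):
        return False  # assumption: on a would-be loop, change nothing at all
    if child.parent is not None:
        # Break the old link, which may have come from a truncated References.
        child.parent.children.remove(child)
        child.parent = None
    if new_parent is not None:
        child.parent = new_parent
        new_parent.children.append(child)
    return True
```

With containers `a`, `b`, and `c`, calling `link(a, b)` and then `link(b, c)` builds a chain; a later `link(c, a)` is refused because it would close a loop, while `set_parent(c, a)` moves `c` directly under `a`.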

[Jamie Zawinski]: https://www.jwz.org/
[<jwz@jwz.org>]: https://www.jwz.org/about.html
[<https://www.jwz.org/doc/threading.html>]: https://www.jwz.org/doc/threading.html
[RFC 5256]: https://datatracker.ietf.org/doc/html/rfc5256
[RFC 2822]: https://datatracker.ietf.org/doc/html/rfc2822
[page 9]: https://datatracker.ietf.org/doc/html/rfc5256#page-9