Notes on email message threading
================================
---
date: "2024-06-08"
markdown_options: "-smart"
---

> I sent an email to Jamie Zawinski with feedback on his venerable
> email threading algorithm.  Perhaps my commentary will be a useful
> reference to others implementing email threading.
>
> You can see my implementation of his algorithm at
> <https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread_alg.go>
> (and a use of it at
> <https://git.lukeshu.com/www/tree/cmd/generate/mailstuff/thread.go>).

<div style="font-family: monospace">
To: [Jamie Zawinski] [&lt;jwz@jwz.org&gt;]<br/>
Subject: message threading<br/>
Date: Sat, 08 Jun 2024 22:34:41 -0600<br/>
Message-ID: &lt;87tti2ybry.wl-lukeshu@lukeshu.com&gt;
</div>

Hi,

I'm implementing message threading, and have been referencing both
your document [&lt;https://www.jwz.org/doc/threading.html&gt;] and [RFC 5256].
I'm not sure whether you're interested in updating a document that's
more than 25 years old, but if you are, I hope you find the following
feedback valuable.

You write that the algorithm in RFC 5256 is merely a <q>restating</q> of
your algorithm, but I noticed 3 (minor) differences:
   
1. In your step 1.C, the RFC says to check whether this would create a
   loop and, if it would, to skip creating the link; your version only
   says to perform this check in step 1.B.
   
2. The RFC says to sort the messages by date between your steps 4 and
   5; that is, when grouping by subject, containers in the root set
   should be processed in date order (you do not specify an order),
   and if a container in the root set is empty, its subject should be
   taken from the earliest-dated child (you say to use an arbitrary
   child).

3. The RFC precisely states how to trim a subject down to a "base
   subject," rather than simply saying <q>Strip \`\`Re:'', \`\`RE:'',
   \`\`RE[5]:'', \`\`Re: Re[4]: Re:'' and so on.</q>
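
As a minimal sketch of the first difference, the following Go code
performs the step 1.C link with the RFC's loop check added.  The
`Container` type and the `reachable` and `setParent` names are
assumptions made for illustration; they are not taken from either
document.

```go
package main

import "fmt"

// Container is a hypothetical node type for this sketch; the field
// names are illustrative and not taken from either document.
type Container struct {
	ID       string
	Parent   *Container
	Children []*Container
}

// reachable reports whether target can be reached from c by following
// child links (c itself counts as reachable).
func reachable(c, target *Container) bool {
	if c == target {
		return true
	}
	for _, child := range c.Children {
		if reachable(child, target) {
			return true
		}
	}
	return false
}

// setParent makes parent the parent of child, replacing any existing
// parent (as in step 1.C), unless doing so would create a loop.
// It reports whether the link was made.
func setParent(child, parent *Container) bool {
	if reachable(child, parent) {
		return false // parent is child itself or one of its descendants
	}
	if old := child.Parent; old != nil {
		// Unlink child from its previous parent.
		for i, c := range old.Children {
			if c == child {
				old.Children = append(old.Children[:i], old.Children[i+1:]...)
				break
			}
		}
	}
	child.Parent = parent
	parent.Children = append(parent.Children, child)
	return true
}

func main() {
	a := &Container{ID: "a"}
	b := &Container{ID: "b"}
	fmt.Println(setParent(b, a)) // b becomes a child of a
	fmt.Println(setParent(a, b)) // refused: b is already a descendant of a
}
```

Skipping the link (rather than breaking an older one) means the first
relationship seen is kept when the References data is contradictory.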

Additionally, there are two minor points on which I found their
version to be clearer:

1. The RFC specifies how to handle messages without a Message-Id or
   with a duplicate Message-Id (on [page 9]), as well as how to
   normalize a Message-Id (by referring to [RFC 2822]).  This is perhaps
   out of scope for your algorithm document, but I feel that it would
   be worth mentioning in your background or definitions section.

2. In your step 1.B, I did not understand what <q>If they are already
   linked, don't change the existing links</q> meant until I read the
   RFC, which words it as <q>If a message already has a parent, don't
   change the existing link.</q>  It was unclear to me what <q>they</q> was
   referring to in your version.
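
Both points can be sketched in a few lines of Go.  This is only an
illustration under stated assumptions: the `Message` and `Container`
types, the function names, and the `synthetic-N` ID format are all
hypothetical, and the ID rule assumed here is that the first message
with a given Message-Id keeps it while ID-less messages and later
duplicates are assigned fresh unique IDs.

```go
package main

import "fmt"

// Message is a hypothetical type; ID holds the normalized Message-Id,
// or "" if the header is missing or invalid.
type Message struct {
	ID string
}

// assignIDs ensures every message has a unique ID.  The first message
// with a given ID keeps it; ID-less messages and later duplicates get
// a synthetic ID (the "synthetic-N" format is an assumption).
func assignIDs(msgs []*Message) {
	seen := make(map[string]bool)
	next := 0
	for _, m := range msgs {
		for m.ID == "" || seen[m.ID] {
			m.ID = fmt.Sprintf("synthetic-%d", next)
			next++
		}
		seen[m.ID] = true
	}
}

// Container is a hypothetical node type; only the parent link is
// modeled, to keep the sketch short.
type Container struct {
	ID     string
	Parent *Container
}

// isAncestor reports whether a is c itself or one of c's ancestors.
func isAncestor(a, c *Container) bool {
	for ; c != nil; c = c.Parent {
		if c == a {
			return true
		}
	}
	return false
}

// linkChain links each reference as the parent of the next one (step
// 1.B), leaving any container that already has a parent untouched.
func linkChain(refs []*Container) {
	for i := 0; i+1 < len(refs); i++ {
		parent, child := refs[i], refs[i+1]
		if child.Parent != nil {
			continue // already has a parent: don't change the existing link
		}
		if isAncestor(child, parent) {
			continue // linking would create a loop
		}
		child.Parent = parent
	}
}

func main() {
	msgs := []*Message{{ID: "x@example"}, {ID: ""}, {ID: "x@example"}}
	assignIDs(msgs)
	for _, m := range msgs {
		fmt.Println(m.ID)
	}

	a, b, c := &Container{ID: "a"}, &Container{ID: "b"}, &Container{ID: "c"}
	linkChain([]*Container{a, b}) // b's parent becomes a
	linkChain([]*Container{c, b}) // b already has a parent, so it keeps a
	fmt.Println(b.Parent.ID)
}
```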

<div style="font-family: monospace">
-- <br/>
Happy hacking,<br/>
~ Luke T. Shumaker<br/>
</div>

[Jamie Zawinski]: https://www.jwz.org/
[&lt;jwz@jwz.org&gt;]: https://www.jwz.org/about.html
[&lt;https://www.jwz.org/doc/threading.html&gt;]: https://www.jwz.org/doc/threading.html
[RFC 5256]: https://datatracker.ietf.org/doc/html/rfc5256
[RFC 2822]: https://datatracker.ietf.org/doc/html/rfc2822
[page 9]: https://datatracker.ietf.org/doc/html/rfc5256#page-9