I've been working on Thunderbird Conversations for more than a year now, and I've learned a lot about Thunderbird internals over the past months. I thought I'd share some thoughts on the design of Thunderbird Conversations, in the hope that it helps would-be extension authors better grasp the design and the relationship between Gloda, libmime, the message headers, and the message database.

This is the first blog post in a series. In this post, I'll talk about how the various parts of Thunderbird interact. In another post, I'll talk about the Thunderbird Conversations design.

Thunderbird fellows, if I'm talking nonsense in this post, please make the right amendments in the comments. Thanks!

An introduction

Merely displaying a message involves a complex pipeline that's responsible for everything from fetching the message to rendering it on the user's screen. I'll focus on a small portion of that pipeline, the one I believe most extension authors will be interested in.

I'll detail the following situations:

  • displaying a message,
  • examining the structure of a message,
  • indexing of a message by Gloda,
  • manipulating a Gloda message.

The basics

At the core of Thunderbird is the message store. These are the big 4GB+ files in your "Mail" or "ImapMail" folders in your profile directory. As the name implies, they store all the messages that you have chosen to keep on your computer (this depends on your synchronization and storage settings). Presently, we have one big file for each folder.

Each folder has a summary file (some kind of index): these are the .msf files that sit next to each of the big files mentioned above. When you choose to open a given folder in Thunderbird, we do not re-read every single message in the folder. The point of the summary file is that it contains just enough information to build the message list, that is, the part of the screen that lists the contents of the selected folder. Summary files are usually rather small.

The summary files contain only basic information: sender, recipients, date, and so on. Programmatically speaking, they translate into nsIMsgDbHdr instances: all the information that's stored in the .msf files is reflected onto nsIMsgDbHdrs.
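To make this concrete, here is a minimal sketch of reading those basic fields off a header. The property names (mime2DecodedAuthor, mime2DecodedSubject, date) are the ones nsIMsgDbHdr exposes as far as I know; the describeHeader helper and the fake header object are made up for illustration, so that the snippet also runs outside Thunderbird.

```javascript
// Hypothetical helper: formats the basic fields that a summary file exposes
// through an nsIMsgDbHdr. Nothing here touches the message body.
function describeHeader(msgHdr) {
  // nsIMsgDbHdr.date is a PRTime, i.e. microseconds since the epoch,
  // whereas a JS Date expects milliseconds.
  let date = new Date(msgHdr.date / 1000);
  return msgHdr.mime2DecodedAuthor + " | " +
    msgHdr.mime2DecodedSubject + " | " +
    date.toISOString();
}

// Stand-in object for illustration; in Thunderbird you would pass
// gFolderDisplay.selectedMessage instead.
let fakeHdr = {
  mime2DecodedAuthor: "Jane Doe <jane@example.com>",
  mime2DecodedSubject: "Hello",
  date: 1300000000 * 1000 * 1000, // PRTime, in microseconds
};
console.log(describeHeader(fakeHdr));
```

Everything printed here comes straight from the summary file: no message gets streamed or downloaded.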

There are different execution pipelines, depending on what happens. Here's the simple case where the user chooses to display a message.

Standard "message display" pipeline.

Gloda schema

Fig. 1. A big schema that you should read alongside the explanations!

How do we display the currently selected message in the message list?

What Thunderbird initially holds is a message header (extension authors, this is gFolderDisplay.selectedMessage, an nsIMsgDbHdr). The message store is queried to figure out whether we have this message locally, or if we need to download it. A download can happen if the message just arrived (think about IMAP), or if the user chose not to synchronize the folder the message is in. The download is partial: we don't fetch heavy attachments, but seek so that we only read their associated metadata. Once we have downloaded the message body and all the attachments that are to be displayed alongside the message (text/plain attachments, for instance), libmime kicks in.

libmime is an ancient piece of software that lies at the core of Thunderbird. It is responsible for two completely unrelated tasks: parsing a message, and at the same time outputting the corresponding HTML for rendering. Because the two tasks are logically distinct, they are naturally horribly entangled in the libmime code ;-). Yay!

Libmime will parse the message structure, decode attachments, take care of various message encodings, output <hr>s to separate inline attachments, use <fieldset>s for attachment headers, and so on.

Finally, the output of libmime is fed into a XUL <browser> element whose task is to render HTML. This is the message reader, that sits in the bottom right corner of Thunderbird, and this is the part that's responsible for rendering the final HTML.

I am an extension, give me a representation of the message

If you are an extension author, there's a good chance you'll want to examine a message at some point. What you hold is usually an nsIMsgDbHdr. There's absolutely no way you can disguise yourself as a libmime consumer and talk to libmime directly: the ancient API is really hard to work with, and the data is not presented in an easily understandable way. What you'll probably do is use a module called mimemsg.js, which abstracts the libmime API away and offers you a nice, hierarchical view of a message.

The pipeline goes like this: you call MsgHdrToMimeMessage, a function that, as the name implies, translates a nsIMsgDbHdr into a MimeMessage instance. The really nice thing is, mimemsg.js will talk to libmime for you, and will build a hierarchical representation of the message. Once it's done (this is asynchronous), the callback you passed to MsgHdrToMimeMessage will be called with the resulting MimeMessage instance as its second parameter.

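// Assumption: this is the module path on recent Thunderbird versions; adjust if yours differs.
Components.utils.import("resource:///modules/gloda/mimemsg.js");

let msgHdr = gFolderDisplay.selectedMessage;
MsgHdrToMimeMessage(msgHdr, null, function (aMsgHdr, aMimeMessage) {
  // This callback runs asynchronously, once the message has been streamed.
  dump(aMimeMessage.coerceBodyToPlainText() + "\n");
  dump(aMimeMessage.allUserAttachments.length + "\n");
  dump(aMimeMessage.size + "\n");
}, true); // true: allow the message to be downloaded if it isn't available offline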

Fig. 2. MsgHdrToMimeMessage example. Read the documentation here.

MimeMessage, MimeBody, MimeMessageAttachment and MimeUnknown are all JavaScript classes that represent parts of a message. They all have interesting properties: for instance, MimeMessage has allAttachments and allUserAttachments properties, a parts property, and a body property. MimeMessageAttachment has a url property, a size property, an isReal property, and so on. mimemsg.js is modern JS, really well commented, and easily readable, so if you are interested, you should definitely take a look.
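Since the representation is hierarchical, walking it is a plain recursion over the parts property. Here's a minimal sketch that prints an outline of a message. The outlineParts helper is made up, the tree below is hand-built so that the snippet runs on its own, and I'm assuming that the parts expose contentType and name properties (check mimemsg.js for the exact fields on each class).

```javascript
// Hypothetical helper: returns one line per part, indented by depth.
function outlineParts(part, depth) {
  depth = depth || 0;
  // Assumption: parts carry a contentType; fall back gracefully if not.
  let label = part.contentType || "(no content type)";
  if (part.name)
    label += " \"" + part.name + "\"";
  let lines = ["  ".repeat(depth) + label];
  // Leaf parts (bodies, attachments) have no children of their own.
  for (let child of part.parts || [])
    lines = lines.concat(outlineParts(child, depth + 1));
  return lines;
}

// Hand-built stand-in for what MsgHdrToMimeMessage would hand you.
let fakeMessage = {
  parts: [{
    contentType: "multipart/mixed",
    parts: [
      { contentType: "text/plain" },
      { contentType: "image/png", name: "cat.png" },
    ],
  }],
};
console.log(outlineParts(fakeMessage).join("\n"));
```

In a real extension, you would call outlineParts(aMimeMessage) from inside the MsgHdrToMimeMessage callback.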

So I don't need Gloda, right? What is Gloda anyway?

Gloda is a global index, kind of like the .msf files, except that it is Thunderbird-wide, uses a modern storage format, and can be searched efficiently. Gloda runs in the background and indexes messages as it goes: that is to say, it reads the interesting properties (sender, recipient, attachments, body) and stores them in a SQLite database that the user can search later.

Internally, when performing an indexing task, Gloda also starts with message headers. It requests a download of the message if not present in the offline store, obtains a MimeMessage representation of the message, and extracts relevant information from it. It inserts the information into a file called "global-messages.sqlite", and goes on indexing other messages. Indexing happens when new messages arrive, or in the background, in the case of an initial indexing.

I'm an extension author, why would I be interested?

Gloda allows you to perform complex searches: this is the spine of Thunderbird Conversations. Gloda allows us to find out about the entire conversation even though only one message of it is currently available (because, say, the rest of the conversation is in other folders). Besides messages, Gloda also allows you to find contacts.

This is not the only reason you might want to use gloda. Gloda contains information that's much richer than what's stored in the message headers. For instance, the allUserAttachments property of MimeMessages is also stored on Gloda messages (it's the attachmentInfos noun). That kind of information is not available with a bare nsIMsgDbHdr. Finally, while asking for a MimeMessage might require the message to be downloaded, querying Gloda is all done locally.

So now let's assume you hold a nsIMsgDbHdr and you want more information than what's stored in the .msf file (remember, that's what the nsIMsgDbHdr corresponds to). Your first reflex should be to find the corresponding GlodaMessage if present. If that doesn't work, you can re-trigger a download of the message (this will be slower) by requesting a MimeMessage representation. This should be your fallback plan.

Gloda.getMessageCollectionForHeaders([gFolderDisplay.selectedMessage], {
  onItemsAdded: function (aItems) {},
  onItemsModified: function () {},
  onItemsRemoved: function () {},
  onQueryCompleted: function (aCollection) {
    let [glodaMsg] = aCollection.items;
    dump("This message has "+glodaMsg.attachmentInfos.length+" attachments\n");
    let domNode = myDocument.getElementsByTagName("img")[0];
    dump("We assume the first attachment is an image so we can display it\n");
    domNode.setAttribute("src", glodaMsg.attachmentInfos[0].url);
  },
}, null);

Fig. 3. Obtaining a GlodaMessage starting with a nsIMsgDbHdr

EDIT: as Andrew kindly pointed out, I'm being over-optimistic since I assume the message has been indexed already. The user might be running with indexing disabled, or the indexer might be busy doing some background task, and if the message just arrived, it might fail to index it in a timely manner. Here's a complete version with a fallback plan:

Gloda.getMessageCollectionForHeaders([gFolderDisplay.selectedMessage], {
  onItemsAdded: function (aItems) {},
  onItemsModified: function () {},
  onItemsRemoved: function () {},
  onQueryCompleted: function (aCollection) {
    if (aCollection.items.length) {
      let [glodaMsg] = aCollection.items;
      dump("This message has "+glodaMsg.attachmentInfos.length+" attachments\n");
      let domNode = myDocument.getElementsByTagName("img")[0];
      dump("We assume the first attachment is an image so we can display it\n");
      domNode.setAttribute("src", glodaMsg.attachmentInfos[0].url);
    } else {
      MsgHdrToMimeMessage(gFolderDisplay.selectedMessage, null, function (aMsgHdr, aMimeMsg) {
        // do the same thing with the MimeMessage API
        dump("This message has "+aMimeMsg.allUserAttachments.length+" attachments\n");
        let domNode = myDocument.getElementsByTagName("img")[0];
        dump("We assume the first attachment is an image so we can display it\n");
        domNode.setAttribute("src", aMimeMsg.allUserAttachments[0].url);
      }, true); // true means force the message into being downloaded... this might take some time!
    }
  },
}, null);

Fig. 4. Complete sample that first tries to find the requested information with Gloda, and falls back to a complete streaming of the message with MimeMessage if the message isn't found in Gloda.
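Both branches of the sample assume that the first attachment exists and is an image. A slightly more careful version could share one small helper between the two branches. This is only a sketch: firstImageUrl is a made-up name, and I'm assuming that both Gloda attachments and MimeMessage attachments expose a contentType string next to their url.

```javascript
// Hypothetical helper: works on glodaMsg.attachmentInfos as well as on
// aMimeMsg.allUserAttachments, since both are lists of objects with a url.
// Assumption: each attachment also exposes a contentType string.
function firstImageUrl(attachments) {
  for (let att of attachments || []) {
    if (att.contentType && att.contentType.startsWith("image/"))
      return att.url;
  }
  return null; // no image attachment: the caller decides what to do
}

// Stand-in data for illustration; the urls are placeholders.
let fakeAttachments = [
  { contentType: "application/pdf", url: "placeholder-url-1" },
  { contentType: "image/png", url: "placeholder-url-2" },
];
console.log(firstImageUrl(fakeAttachments));
console.log(firstImageUrl([]));
```

You would then only call domNode.setAttribute("src", ...) when the helper returns something other than null.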

Gloda doesn't have the information I'm interested in

I can hear astute readers telling me that manipulating MimeMessages will always be better than manipulating GlodaMessages: although it might be slower to obtain a MimeMessage, it has all the information: all the data, all the headers... GlodaMessages only have what's needed for indexing and searching.

Surprise! You can write a Gloda plugin that will attach the custom information you want in the Gloda index. That way, you'll never have to re-stream a message to find out about an X-Whatever header: you just have to make sure you insert the X-Whatever information in the Gloda index, and voilà, you're good to go.

For instance, Thunderbird Conversations packages a Gloda plugin that searches for Bugzilla information and attaches it to the Gloda message. The plugin is passed the MimeMessage representation Gloda obtained, extracts relevant information from it, and attaches it to the Gloda message that's to be stored in the database.
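The extraction step of such a plugin can be sketched as follows. This is not the plugin Thunderbird Conversations ships: the provider object, its simplified process signature and the xWhatever attribute are all made up for illustration, the registration with Gloda is left out entirely, and I'm assuming that the MimeMessage exposes a headers object that maps lower-cased header names to lists of values.

```javascript
// Hypothetical provider object. In a real plugin, this would be registered
// with Gloda (registration is omitted here), and Gloda would call it with
// the Gloda message being built plus the MimeMessage it obtained.
let xWhateverProvider = {
  // Assumption: mimeMsg.headers maps lower-cased header names to lists
  // of the values seen for that header.
  process: function (glodaMsg, mimeMsg) {
    let values = (mimeMsg.headers || {})["x-whatever"] || [];
    // Made-up attribute name; null means "header not present".
    glodaMsg.xWhatever = values.length ? values[0] : null;
    return glodaMsg;
  },
};

// Stand-in objects for illustration.
let fakeGlodaMsg = {};
let fakeMimeMsg = { headers: { "x-whatever": ["some value"] } };
console.log(xWhateverProvider.process(fakeGlodaMsg, fakeMimeMsg).xWhatever);
```

The point is that the header is read once, at indexing time, and later lookups only need the Gloda message.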

I want complex information about the message, and I also want to display it

You'll have to use Gloda or mimemsg.js to obtain the metadata you're interested in, and then trigger the streaming again into an iframe. This might result in the message being streamed twice if you go the MimeMessage way, but there's no easy way to work around this right now. Walking the MimeMessage and synthesizing the HTML to be rendered yourself is error-prone, so for the time being, you're better off leaving it all to libmime (remember, libmime not only decodes the raw messages but also decides how to display them).

I walked into the gloda land, and I want to display one of the messages I just found

GlodaMessage instances have a folderMessage property that represents the original nsIMsgDbHdr if, say, you want to display it. The property can be null: Gloda sometimes remembers messages that no longer exist.
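In practice, this means checking folderMessage before trying to display anything. A minimal sketch, with a made-up helper name and stand-in objects in place of real Gloda messages:

```javascript
// Hypothetical helper: keeps only the Gloda messages that still map to a
// real header, and returns those headers (nsIMsgDbHdrs in Thunderbird).
function displayableHeaders(glodaMessages) {
  return glodaMessages
    .map(function (glodaMsg) { return glodaMsg.folderMessage; })
    .filter(function (msgHdr) { return msgHdr != null; });
}

// Stand-in objects for illustration: the second one is a "dead" message.
let fakeGlodaMessages = [
  { folderMessage: { subject: "still here" } },
  { folderMessage: null },
];
console.log(displayableHeaders(fakeGlodaMessages).length);
```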

That's all folks. I'll try to write another blog post detailing the issues I've faced and the design choices I've made when writing Thunderbird Conversations. Thanks for reading!