So for a long time I’ve been thinking about archiving my Gmail account, as I have been a customer of Google’s since around 2004. Recently Gmail had a bit of a blackout and I worried that the interwebs apocalypse was upon us. It really made me think: what if my mail is missing when the service returns? In truth my life is in that ickle 10.9GB mailbox and I, of all people, didn’t have a backup. I spend my life telling others to back up, back up, back up!
Anyway, with a little bit of help from Matt Cutts’s wonderful blog I was able to implement a simple backup. Here is a link to the blog post: http://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/ Matt is a software engineer at Google and currently heads up the Webspam team. Anyway, enough on Matt. For your convenience I have copied Matt’s post below for you to read, but I don’t take any credit for the work.
How to back up your Gmail on Linux in four easy steps by Matt Cutts.
I really like Gmail, but I also like having backups of my data just in case. Here’s how to use a simple program called getmail on Unix to back up your Gmail or Google Apps email. We’ll break this into four steps.
Step 0: Why getmail?
If you browse around on the web, you’ll find several options to help you download and back up your email. Here are a few:
- If you use Windows, you can back up your email using Thunderbird or you can use Outlook to back up your email.
- If you run Mac OS X (Leopard), you can back up your Gmail using Apple Mail. For the rest of this post, I’ll assume you’re running a flavor of Linux such as Ubuntu.
- If you need a ton of flexible power or run your own mail server, fetchmail could be a good choice.
- If you want something really fast, retchmail might fit your needs.
- If you want a nice mix of simple configuration and flexibility, I really recommend a Python program called getmail. That’s what we’ll be using in this post.
Step 1: Install getmail
On Ubuntu 7.10 (Gutsy Gibbon), you would type
sudo apt-get install getmail4
at a terminal window. Hey, that wasn’t so bad, right? If you use a different flavor of Linux, you can download getmail and install it with a few commands like this:
cd /tmp
[Note: wget the tarball download link found at http://pyropus.ca/software/getmail/#download ]
tar xzvf getmail*.tar.gz
cd (the directory that was created)
sudo python setup.py install
Step 2: Configure Gmail and getmail
First, turn on POP in your Gmail account. Because you want a copy of all your mail, I recommend that you choose the “Enable POP for all mail” option. On the “When messages are accessed with POP” option, I would choose “Keep Gmail’s copy in the Inbox” so that Gmail still keeps your email after you back up your email.
For this example, let’s assume that your username is bob@gmail.com and your password is bobpassword. Let’s also assume that you want to back up your email into a directory called gmail-archive and that your home directory is located at /home/bob/.
I have to describe a little about how mail is stored in Unix. There are a couple of well-known methods to store email: mbox and Maildir. When mail is stored in mbox format, all your mail is concatenated together in one huge file. In the Maildir format, each email is stored in a separate file. Needless to say, each method has different strengths and weaknesses. For the time being, let’s assume that you want your email in one big file (the mbox format) and work through an example.
Example with mbox format
- Make a directory called “.getmail” in your home directory with the command “mkdir ~/.getmail”. This directory will store your configuration data and the debugging logs that getmail generates.
- Make a directory called gmail-archive with the command “mkdir ~/gmail-archive”. This directory will store your email.
- Make a file ~/.getmail/getmail.gmail and put the following text in it:
[retriever]
type = SimplePOP3SSLRetriever
server = pop.gmail.com
username = bob@gmail.com
password = bobpassword

[destination]
type = Mboxrd
path = ~/gmail-archive/gmail-backup.mbox

[options]
# print messages about each action (verbose = 2)
# Other options:
# 0 prints only warnings and errors
# 1 prints messages about retrieving and deleting messages only
verbose = 2
message_log = ~/.getmail/gmail.log
- Added: Run the command “touch ~/gmail-archive/gmail-backup.mbox”. If you change the path in the file above, touch whatever filename you used. This command creates an empty file that getmail can then append to.
The file format should be pretty self-explanatory. You’re telling getmail to fetch your email from pop.gmail.com via a POP3 connection over SSL (which prevents people from seeing your email as it passes between Gmail and your computer). The [destination] section tells where to save your email, and in what format. The “Mboxrd” is a flavor of the mbox format — read this page on mbox formats if you’re really interested. Finally, we set options so that getmail generates a verbose log file that will help in case there are any snags.
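The mbox steps above can be collected into one small script. This is only a sketch, not part of the original post: the function name setup_mbox_backup is made up, the credentials are the example placeholders used here, and the chmod calls are an extra precaution added because the config file holds a password in plain text.

```shell
#!/bin/sh
# Sketch: create the directories and files from the mbox example.
# setup_mbox_backup is a hypothetical helper name, not part of getmail.
setup_mbox_backup() {
    # Paths inside the config use ~, so base should normally be $HOME.
    base=${1:-$HOME}
    rc="$base/.getmail/getmail.gmail"
    mbox="$base/gmail-archive/gmail-backup.mbox"

    mkdir -p "$base/.getmail" "$base/gmail-archive"
    touch "$mbox"    # empty file that getmail can append to

    # Do not overwrite a config file that already exists.
    if [ ! -e "$rc" ]; then
        cat > "$rc" <<'EOF'
[retriever]
type = SimplePOP3SSLRetriever
server = pop.gmail.com
username = bob@gmail.com
password = bobpassword

[destination]
type = Mboxrd
path = ~/gmail-archive/gmail-backup.mbox

[options]
verbose = 2
message_log = ~/.getmail/gmail.log
EOF
    fi

    # Extra precaution (an addition to the original steps): the file
    # contains a plain-text password, so keep it private to your user.
    chmod 700 "$base/.getmail"
    chmod 600 "$rc"
}

# Typical use: setup_mbox_backup "$HOME"
```

After running it, edit the username and password lines in ~/.getmail/getmail.gmail before moving on to Step 3.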
Example with Maildir format
Suppose you prefer Maildir instead? You’d still run “mkdir ~/.getmail” and “mkdir ~/gmail-archive”. But the Maildir format uses three directories (tmp, new, and cur). We need to make those directories, so type “mkdir ~/gmail-archive/tmp ~/gmail-archive/new ~/gmail-archive/cur” as well. In addition, change the [destination] section to say
[destination]
type = Maildir
path = ~/gmail-archive/
Otherwise your configuration file is the same.
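As a small sketch of the Maildir setup described above, the three subdirectories can be created in one loop. The helper name make_maildir is hypothetical; the default path is the gmail-archive directory from this example.

```shell
#!/bin/sh
# Sketch: create the three subdirectories a Maildir needs.
# make_maildir is a hypothetical helper name.
make_maildir() {
    dir=${1:-$HOME/gmail-archive}
    for sub in tmp new cur; do
        mkdir -p "$dir/$sub"    # -p also creates $dir itself if missing
    done
}

# Typical use: make_maildir "$HOME/gmail-archive"
```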
Step 3: Run getmail
The good news is that step 2 was the hard part. Run getmail with a command like “getmail -r /home/bob/.getmail/getmail.gmail” (use the path to the config file that you made in Step 2). With any luck, you’ll see something like
getmail version 4.6.5
Copyright (C) 1998-2006 Charles Cazabon. Licensed under the GNU GPL version 2.
SimplePOP3SSLRetriever:bob@gmail.com@pop.gmail.com:995:
msg 1/99 (7619 bytes) from <info@example.com> delivered to Mboxrd /home/bob/gmail-archive/gmail-backup.mbox
msg 2/99 (6634 bytes) from <sales@example.com> delivered to Mboxrd /home/bob/gmail-archive/gmail-backup.mbox
…
99 messages retrieved, 0 skipped
Summary:
Retrieved 99 messages from SimplePOP3SSLRetriever:bob@gmail.com@pop.gmail.com:995
Hooray! It works! But wait, I have over 99 messages, you say. Why did it only download 99 messages? The short answer is that Gmail will only let you download a few hundred emails at a time. You can repeat the command (let getmail finish each time before you run it again) until all of your email is downloaded.
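If repeating the command by hand gets tedious, a loop can do it. The sketch below is an illustration rather than part of the original post: it reruns a fetch command until the mbox file stops growing, on the assumption that a run which adds no messages means there is nothing left to download. The names count_messages and fetch_until_stable and the cap of 200 runs are made up for this example.

```shell
#!/bin/sh
# Sketch: rerun a fetch command until the mbox stops growing.
# count_messages and fetch_until_stable are hypothetical helper names.

# In the mboxrd format every message starts with a line beginning "From ",
# and body lines that begin with "From " are quoted with ">", so counting
# those lines counts messages.
count_messages() {
    if [ -f "$1" ]; then
        grep -c '^From ' "$1" || true    # grep exits 1 when the count is 0
    else
        echo 0
    fi
}

# usage: fetch_until_stable MBOX MAX_RUNS COMMAND [ARGS...]
fetch_until_stable() {
    mbox=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        before=$(count_messages "$mbox")
        "$@" || return 1                 # stop if the fetch command fails
        runs=$((runs + 1))
        after=$(count_messages "$mbox")
        echo "run $runs: $((after - before)) new messages ($after total)"
        if [ "$after" -eq "$before" ]; then
            break                        # nothing new arrived: assume done
        fi
    done
    return 0
}

# Typical use, with the paths from this post and an arbitrary cap of 200 runs:
# fetch_until_stable "$HOME/gmail-archive/gmail-backup.mbox" 200 \
#     getmail -r "$HOME/.getmail/getmail.gmail"
```

Each run finishes before the next one starts, which matches the advice to let getmail finish before running it again.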
Step 4: Download new email automatically
A backup is a snapshot of your email at one point in time, but it’s even better if you download and save new email automatically. (This step will also come in handy if you have a ton of Gmail and don’t want to run the command from Step 3 over and over again for hours to download all your mail.)
We’re going to make a simple cron job that runs periodically to download new email and preserve it. First, make a very short file called /home/bob/fetch-email.sh and put the following text in the file:
#!/bin/bash
# Note: -q means fetch quietly so that this program is silent
/usr/bin/getmail -q -r /home/bob/.getmail/getmail.gmail
Make sure that the file is readable/executable with the command “chmod u+rx /home/bob/fetch-email.sh”. If you want to make sure the program works, run the command “/home/bob/fetch-email.sh”. The program should execute without generating any output, but if there’s new email waiting for you it will be downloaded. This script needs to be silent or else you’ll get warnings when you run the script using cron.
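Cron typically mails whatever a job prints to the owner of the crontab, which is why silence matters here. One way to check it, sketched with a hypothetical helper called is_silent, is to capture both output streams and test whether anything came out. Note that checking fetch-email.sh this way really does run it, so any waiting mail gets downloaded.

```shell
#!/bin/sh
# Sketch: report whether a command prints anything on stdout or stderr.
# is_silent is a hypothetical helper name.
is_silent() {
    output=$("$@" 2>&1)
    [ -z "$output" ]
}

# Typical use:
# if is_silent /home/bob/fetch-email.sh; then
#     echo "no output: fine for cron"
# else
#     echo "the script printed something"
# fi
```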
Now type the command “crontab -e” and add the following entry to your crontab:
# Every 10 minutes (at 7 minutes past the hour), fetch my email
7,17,27,37,47,57 * * * * /home/bob/fetch-email.sh
This crontab entry tells cron “Every 10 minutes, run the script fetch-email.sh”. If you wanted to check less often (maybe once an hour), change “7,17,27,37,47,57” to “7” and the cron job will run at 7 minutes after every hour. That’s it: you’re done! Enjoy the feeling of having a Gmail backup in case your net connection goes down.
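With a large mailbox, one fetch could still be running when cron starts the next one ten minutes later. A possible refinement, which is not part of the original setup and may be unnecessary if overlapping runs cause no trouble in practice, is to wrap the fetch in a simple lock. The sketch below uses mkdir because creating a directory either succeeds or fails atomically; run_locked and the lock path are hypothetical names.

```shell
#!/bin/sh
# Sketch: skip a run if the previous one is still going.
# run_locked and the default lock path are hypothetical.
LOCK_DIR="${LOCK_DIR:-/tmp/fetch-email.lock}"

run_locked() {
    # mkdir fails if the lock directory already exists.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        return 0    # another run is active; stay silent for cron
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"
    return "$status"
}

# Limitation: if the script is killed mid-run, the lock directory is left
# behind and must be removed by hand.

# Typical use inside fetch-email.sh (path from Step 2):
# run_locked /usr/bin/getmail -q -r /home/bob/.getmail/getmail.gmail
```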
Bonus info: Back up in both mail formats at once!
As I mentioned, mbox and Maildir have different advantages. The mbox format is convenient because you only need to keep track of one file, but editing/deleting email from that huge file is a pain. And when one program is trying to write new email while another program is trying to edit the file, things can sometimes go wrong unless both programs are careful. Maildir is more robust, but it chews through inodes because each email is a separate file. It also can be harder to process Maildir files with regular Unix command-line tools, just because there are so many email files.
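To make the difference concrete, here is a sketch of how you might count the archived messages in each format with ordinary command-line tools. The helper names count_mbox and count_maildir are made up. The mbox count relies on the “From ” separator line that starts each message, which is exact for the Mboxrd flavor because body lines starting with “From ” are quoted there.

```shell
#!/bin/sh
# Sketch: count archived messages in each format.
# count_mbox and count_maildir are hypothetical helper names.

# mbox: one file; each message starts with a "From " separator line.
count_mbox() {
    grep -c '^From ' "$1" || true    # grep exits 1 when the count is 0
}

# Maildir: one file per message, stored under new/ and cur/.
count_maildir() {
    # The arithmetic expansion strips any padding that wc adds.
    echo $(( $(find "$1/new" "$1/cur" -type f | wc -l) ))
}

# Typical use:
# count_mbox "$HOME/gmail-archive/gmail-backup.mbox"
# count_maildir "$HOME/gmail-archive"
```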
Why not archive your email in both formats just to be safe? The getmail program can easily support this. Just change your [destination] information to look like this:
[destination]
type = MultiDestination
destinations = ('[mboxrd-destination]', '[maildir-destination]')

[mboxrd-destination]
type = Mboxrd
path = ~/gmail-archive/gmail-backup.mbox

[maildir-destination]
type = Maildir
path = ~/gmail-archive/
Note that you’ll still have to run all the “mkdir” commands to make the “gmail-archive” directory, as well as the tmp, new, and cur directories under the gmail-archive directory.
Bonus reading!
What, you’re still here? Okay, if you’re still reading, here are a few pointers you might be interested in:
- The main getmail site includes a page with lots of getmail examples of configuration files. The getmail website has a ton of great documentation, too. Major props to Charles Cazabon for his getmail program.
- This write-up from about a year ago covers how to back up Gmail as well.
- The author of getmail seems to hang out quite a bit on this getmail mailing list. See the main site for directions on signing up for the list.
- If you’re interested in a more powerful setup (e.g. using Gmail + getmail + procmail), this is a useful page.
- For the truly sadistic, learn the difference between a Mail User Agent (MUA) and a Mail Transfer Agent (MTA) and how email really gets delivered in Unix.
- I’ve been meaning to write all this down for months. Jeff Atwood’s recent post finally pushed me over the edge. Jeff describes a program that offers to “archive your Gmail” for $29.95, but when you give the program your username/password it secretly mails your username/password to the program’s creator. That’s pretty much pure evil in my book. And the G-Archiver program isn’t even needed! Because Gmail will export your email for free using POP or IMAP, it’s not hard to archive your Gmail. So I wrote up how I back up my Gmail in case it helps anyone else. Enjoy!
Added March 16, 2008: Several people have added helpful comments. One of my favorites led me to a post by commenter Peng about how to back up Gmail with IMAP using getmail. Peng describes how to back up the email by label as well. He mentions that you could use the search “after:2007/1/1 before:2007/3/31” and assign the label FY07Q1 to the search results, for example. Then you can back up that single label/mailbox by making the getmail config file look like this:
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = username
password = password
mailboxes = ("FY07Q1",)

[destination]
type = Mboxrd
path = ~/.getmail/gmail-backup-FY07Q1.mbox
Peng also mentions a nice bonus: since you’re backing up via IMAP instead of POP, there’s no download limit. That means that you don’t have to run the getmail program repeatedly. Thanks for mentioning that, Peng!
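If you end up backing up several labels this way, you might generate one config file per label instead of editing copies by hand. The sketch below is an illustration under stated assumptions: the helper write_label_rc and the file name getmail.gmail-LABEL are invented here, the username and password are the same placeholders as above, and it only suits labels without spaces or slashes because the label is reused in file names.

```shell
#!/bin/sh
# Sketch: write one getmail config file per Gmail label.
# write_label_rc and the getmail.gmail-LABEL file name are hypothetical.
write_label_rc() {
    label=$1
    dir=${2:-$HOME/.getmail}
    mkdir -p "$dir"
    cat > "$dir/getmail.gmail-$label" <<EOF
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = username
password = password
mailboxes = ("$label",)

[destination]
type = Mboxrd
path = ~/.getmail/gmail-backup-$label.mbox
EOF
    # Extra precaution: the file holds a plain-text password.
    chmod 600 "$dir/getmail.gmail-$label"
}

# Typical use: write_label_rc FY07Q1
# As in the mbox example, touch the mbox file before the first run.
```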