Migrating Gmail Between Accounts
Using OfflineIMAP to migrate email between Gmail accounts while retaining labels and message dates.
TL;DR
You can just jump ahead to view the results or the configuration used.
Leading a Double Life
I’ve been leading a double life for 13 years.
I’ve had my own domain since April 2000, and ran my own mail server. In February 2005 I signed up for a Gmail account when it was a beta service. This means that I’ve had two email accounts in parallel.
In January 2012 I moved my domain to Google Apps (now G Suite), so I have two Gmail accounts each with their accumulated emails.
I decided that it was finally time to try and move across and use my own domain as the primary account.
I decided to use OfflineIMAP to move the email messages across. I simply installed version 7.0.2 from the Debian stable repository.
I have used OfflineIMAP in the past. I think it was so I could read/compose email using Mutt while offline in the days when WiFi was not ubiquitous.
Approach
I searched the Internet to see if this had been done already, read the documentation, and looked through the example configuration file.
When I needed to understand some of the internal details of OfflineIMAP (normally when my configuration had triggered an exception) I had the source code available: I examined the installed Python scripts locally, although I could equally have cloned the source repository.
I would pull from the Gmail folders [Gmail]/All Mail and [Gmail]/Sent Mail, and then push to the Gmail folders Xfer/All Mail and Xfer/Sent Mail.
I discovered that OfflineIMAP would synchronise the labels, so I would not lose the benefit of the extensive filtering applied on the source account over the years. The migrated messages would gain an extra label, Xfer/All Mail or Xfer/Sent Mail, but that could be useful to know in the future.
I configured OfflineIMAP with two accounts. One would be used to pull from the source Gmail account to a local folder, and the second would be used to push from the local folder to the target Gmail account.
This had the bonus benefit of being able to run first with the Pull account and view the messages, and then run the Push account to see the result of the upload.
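As a rough illustration of the folder mapping described above, here is a minimal Python sketch of how a source folder name ends up as a target folder name. The expressions mirror the folderfilter and nametrans settings in the configuration later in this post; the function names themselves are hypothetical and are not part of OfflineIMAP.

```python
# Hypothetical helper names; only the string expressions come from the
# folderfilter/nametrans settings in the configuration.
SOURCE_FOLDERS = ['[Gmail]/All Mail', '[Gmail]/Sent Mail']


def source_to_local(folder):
    """Pull step: strip the '[Gmail]/' prefix for the local folder name."""
    return folder.replace('[Gmail]/', '', 1)


def local_to_target(folder):
    """Push step: add the 'Xfer/' prefix for the target folder name."""
    return 'Xfer/' + folder


def migrated_name(folder):
    """Return the target folder name, or None if the folder is filtered out."""
    if folder not in SOURCE_FOLDERS:
        return None
    return local_to_target(source_to_local(folder))


for name in SOURCE_FOLDERS + ['INBOX']:
    print(name, '->', migrated_name(name))
```

Folders outside the two selected ones (INBOX in this example) are simply never synchronised.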
I started by specifying maxage so that I could limit my experiments to recent messages.
Tweaks
My first attempt was up and running quickly. I pulled down messages from the source, and pushed them up to the target. The messages had the original labels, and they retained the “Unread”, “Flagged”, and “Imported” status.
Message Dates
I noticed that the date displayed against the message (in the list view, and in message view) was the time I ran the import. That had to change.
Date Header
I updated the configuration, and added utime_from_header = yes. This sets the file modification time on the message file from the “Date” header when downloaded, and then that timestamp is used as the received timestamp when the message is uploaded.
This passed a quick inspection, but I came across a message that hadn’t had the timestamp set. Viewing the message revealed the date header in the email was:
Date: Fri, 30 3 7 15:4:26 -4
It was no wonder that this date had failed to be parsed.
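To see why that header is a problem, here is a small Python sketch using the standard library's email.utils parser. It is only an illustration and not necessarily the code path OfflineIMAP takes; header_to_epoch is a hypothetical name.

```python
from email.utils import mktime_tz, parsedate_tz


def header_to_epoch(value):
    """Return seconds since the epoch for a date header, or None if unparseable."""
    parsed = parsedate_tz(value)
    if parsed is None:
        return None
    return mktime_tz(parsed)


well_formed = 'Sun, 1 Apr 2007 08:17:24 -0700'
malformed = 'Fri, 30 3 7 15:4:26 -4'

print(header_to_epoch(well_formed))  # an integer timestamp
print(header_to_epoch(malformed))    # None: "3" is not a month name
```

A parser that returns nothing for the malformed value leaves the file modification time at the download time, which matches what I saw.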
Received Header
This led me to a closer inspection of the results. There were some minor discrepancies between the date displayed on the source account and the target account.
The date displayed by Gmail is not from the email message, but is the “Created on” date (use “Show original” to see). This means you see when the message arrived in the mailbox, not the time it claimed to be (given incorrect clocks or delivery delays).
Fortunately, Gmail correctly adds “Received” headers, so this information is available in the message.
Received: by 10.100.91.11 with SMTP id o11cs412060anb;
Sun, 1 Apr 2007 08:17:24 -0700 (PDT)
I wrote the helper script set-received-mtime, which extracts the date from the first “Received” header or, if none is present, from the “Date” header. This date is used to set the modification time on the message file. This is combined with presynchook and postsynchook to set the modification time on the newly downloaded messages.
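For readers more comfortable with Python than sed, the same idea can be sketched with the standard library. This is an approximation of the helper, not a replacement for it: the function names are hypothetical, and unlike the shell script it also falls back to the “Date” header when the “Received” date cannot be parsed.

```python
import email
import os
from email.utils import mktime_tz, parsedate_tz


def arrival_epoch(raw_message):
    """Return the arrival time of a raw message (bytes) as epoch seconds, or None."""
    msg = email.message_from_bytes(raw_message)
    candidates = []
    received = msg.get_all('Received') or []
    if received:
        # The first header in the file is the most recent hop; its timestamp
        # follows the last ';' of the (possibly folded) header value.
        candidates.append(str(received[0]).rsplit(';', 1)[-1])
    if msg['Date'] is not None:
        candidates.append(str(msg['Date']))
    for text in candidates:
        # Collapse folding whitespace before handing the text to the parser.
        parsed = parsedate_tz(' '.join(text.split()))
        if parsed is not None:
            return mktime_tz(parsed)
    return None


def set_received_mtime(path):
    """Set the modification time of a message file from its headers."""
    with open(path, 'rb') as handle:
        epoch = arrival_epoch(handle.read())
    if epoch is not None:
        # Keep the access time and change only the modification time.
        os.utime(path, (os.stat(path).st_atime, epoch))
    return epoch
```

The sketch reads the whole message into memory, whereas the sed version stops at the end of the headers, so the shell helper is likely the lighter choice for a large mailbox.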
Running
Customise the configuration for your accounts.
If you are using 2-Step Verification to sign into your Google account you will need to create App passwords. Otherwise you will need to turn on access for less secure apps.
Check the configuration:
$ offlineimap -c offlineimaprc --info
Run a download and check message contents and timestamps:
$ offlineimap -c offlineimaprc -o -a Pull
Run a push and verify uploaded messages:
$ offlineimap -c offlineimaprc -o -a Push
To verify on a smaller dataset before the full migration, set maxage in the [Account Pull] section. When you are ready for the full migration, remove the metadata from ~/.offlineimap and the downloaded folders All Mail and Sent Mail.
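The clean-up between a trial run and the full migration could be scripted along these lines. This is a cautious sketch under assumptions: reset_trial_run is a hypothetical name, it treats “remove the metadata” as deleting the whole metadata directory, and it defaults to a dry run that only reports what it would delete.

```python
import shutil
from pathlib import Path


def reset_trial_run(metadata_dir, local_root,
                    folders=('All Mail', 'Sent Mail'), dry_run=True):
    """Return the directories left over from a trial run; delete them unless dry_run."""
    targets = [Path(metadata_dir).expanduser()]
    targets += [Path(local_root).expanduser() / name for name in folders]
    existing = [path for path in targets if path.exists()]
    if not dry_run:
        for path in existing:
            shutil.rmtree(path)
    return existing
```

Calling reset_trial_run('~/.offlineimap', '~/GmailMigration') lists the candidates; passing dry_run=False actually removes them, so check the list first.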
Results
How Long Did It Take?
I downloaded a total of ~120,000 messages totalling ~4GB in around 6½ hours.
The upload took considerably longer. I ran it over a couple of days, and I had to restart the process many times due to an exception being thrown (often the dreaded “Too many read 0”). Fortunately, OfflineIMAP is designed to pick up where it left off when it is next run. I think the actual execution time was ~40 hours.
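I restarted the upload by hand, but the restarts could be automated with a small wrapper. The following is a sketch only: run_until_success is a hypothetical name, and it assumes that offlineimap exits with a non-zero status when it aborts on an exception, which I have not verified for every failure mode.

```python
import subprocess
import time

# The push command from the "Running" section.
PUSH_COMMAND = ['offlineimap', '-c', 'offlineimaprc', '-o', '-a', 'Push']


def run_until_success(command, max_attempts=50, delay_seconds=30,
                      runner=subprocess.call, sleep=time.sleep):
    """Re-run command until it exits with status 0; return the attempt count.

    Assumption: a non-zero exit status means the run was interrupted and
    is safe to repeat, because OfflineIMAP resumes where it left off.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # brief pause before reconnecting
    raise RuntimeError('gave up after %d attempts' % max_attempts)
```

Calling run_until_success(PUSH_COMMAND) would then keep the upload going unattended; the runner and sleep parameters exist only so the loop can be exercised without touching the network.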
Making an incremental run to fetch any newer messages takes just a couple of minutes (but see limitations below).
Tweaking
There was some minor tweaking to do following the upload.
I’d used nested labels, and they became flattened. So instead of the ‘group’ label having the nested labels ‘item1’ and ‘item2’, I had two labels ‘group/item1’ and ‘group/item2’. This was easily rectified by manually creating the ‘group’ label.
I needed to update the label settings so that “Show in label list” (“show”, “hide”, “show if unread”) and “Show in message list” (“show”, “hide”) matched the source account.
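With only a few label groups, creating the parent labels by hand was quick. For a larger label set, the missing parents could be worked out from the list of label names, as in this sketch (missing_parent_labels is a hypothetical helper and does not talk to Gmail):

```python
def missing_parent_labels(labels):
    """Return the parent labels implied by 'a/b' style names but not present."""
    existing = set(labels)
    missing = set()
    for label in labels:
        parts = label.split('/')
        # Every proper prefix of a nested label is a parent that must exist.
        for depth in range(1, len(parts)):
            parent = '/'.join(parts[:depth])
            if parent not in existing:
                missing.add(parent)
    return sorted(missing)


print(missing_parent_labels(['group/item1', 'group/item2', 'receipts']))
# prints ['group']
```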
Limitations
You can run the synchronisation again, but it only transfers new messages; it does not update the flags or labels of messages that have already been transferred. This is due to the message filename being changed to match the destination Gmail account UID, so it no longer matches the source Gmail account. Fortunately, the combination of OfflineIMAP’s ‘FMD5’ and the readonly = True configuration meant that source messages were neither deleted nor endlessly duplicated.
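To make the filename issue more concrete, this sketch picks apart a local message filename. It rests on my understanding that OfflineIMAP embeds ",U=<uid>,FMD5=<md5 of the folder name>" in the filename; treat that layout, and the helper names, as assumptions rather than a documented format.

```python
import hashlib
import re

# Assumed layout: "...,U=<uid>,FMD5=<32 hex digits>..." inside the filename.
_NAME_PATTERN = re.compile(r',U=(?P<uid>\d+),FMD5=(?P<fmd5>[0-9a-f]{32})')


def folder_md5(folder_name):
    """MD5 hex digest of a folder name, as assumed for the FMD5 field."""
    return hashlib.md5(folder_name.encode('utf-8')).hexdigest()


def parse_message_name(filename):
    """Return (uid, fmd5) from a message filename, or None if absent."""
    match = _NAME_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group('uid')), match.group('fmd5')


def belongs_to(filename, folder_name):
    """True if the filename carries the FMD5 of the given folder."""
    parsed = parse_message_name(filename)
    return parsed is not None and parsed[1] == folder_md5(folder_name)
```

Once the push rewrites the U= value to the UID assigned by the target account, nothing in the filename ties the file back to the source UID, which is why a later pull cannot match it up to update flags or labels.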
Future Work
OfflineIMAP
It should be possible to incorporate the parsing of the “Received” date into OfflineIMAP, and then this timestamp could be used by the utime_from_header configuration.
If ‘Gmail’ were supported as a local repository (by introducing ‘MappedGmailRepository’) then a direct Gmail to Gmail synchronisation could be possible.
imaplib2
I believe the “Too many read 0” exception thrown by imaplib2 arises because it assumes that a socket reported as ready for reading will always yield readable data, and that assumption is violated when the socket is a wrapped SSL socket.
I think a simple script connecting to Gmail, with an accompanying Wireshark capture, could confirm or refute my theory.
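The core of such a test script might look like the sketch below. It relies on documented behaviour of Python's ssl module: on a non-blocking wrapped socket, select() can report the underlying socket as readable while recv() raises ssl.SSLWantReadError because no complete application data is available yet. Whether this is what actually trips imaplib2 is still only my theory, and read_if_ready is a hypothetical name.

```python
import select
import ssl


def read_if_ready(sock, timeout=1.0, size=4096):
    """Read from a non-blocking (SSL) socket once select() reports it readable.

    Returns the bytes read, b'' if the peer closed the connection, or None
    when the socket was not ready or "ready" yielded no application data.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    try:
        return sock.recv(size)
    except ssl.SSLWantReadError:
        # The OS-level socket had bytes, but not enough for the SSL layer
        # to hand back decrypted data; this is not an end-of-file.
        return None
```

A reader that counts every empty result after a “ready” notification as a failed read would eventually give up on a perfectly healthy connection, which is the behaviour I suspect.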
Configuration
My solution requires an offlineimaprc configuration file, and a helper script set-received-mtime.
This worked successfully for me, but YMMV. Take some time to understand it as OfflineIMAP is a powerful tool.
offlineimaprc
This is the configuration file used by OfflineIMAP.
Place this into the directory created for performing the migration, and configure the value of localfolders in the [DEFAULT] section to match.
Set the correct credentials for the [Repository Source] and [Repository Target].
If you want to limit the email pulled back for initial testing, set the value of maxage in the [Account Pull] section.
[DEFAULT]
localfolders = ~/GmailMigration
timestamp = %(localfolders)s/syncstart.timestamp
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
auth_mechanisms = PLAIN
synclabels = yes
[general]
accounts = Pull, Push
socktimeout = 60
[Account Pull]
remoterepository = Source
localrepository = In
presynchook = [ -f %(timestamp)s ] || touch %(timestamp)s
postsynchook = \
    find %(localfolders)s -newer %(timestamp)s -name '*,FMD5=*' -print0 \
    | xargs -r0 %(localfolders)s/set-received-mtime \
    && rm %(timestamp)s
# maxage = 2018-09-01
[Account Push]
remoterepository = Target
localrepository = Out
[Repository Source]
type = Gmail
remoteuser = myaccount@gmail.com
remotepass = mysecretpassword
folderfilter = lambda folder: folder in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
nametrans = lambda folder: folder.replace('[Gmail]/', '', 1)
readonly = True
[Repository In]
type = GmailMaildir
[Repository Out]
type = Maildir
nametrans = lambda x: 'Xfer/' + x
readonly = True
[Repository Target]
type = Gmail
remoteuser = me@example.com
remotepass = mysecretpassword
folderfilter = lambda folder: folder in ['Xfer/All Mail', 'Xfer/Sent Mail']
nametrans = lambda folder: folder.replace('Xfer/', '', 1)
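The presynchook and postsynchook pair in the configuration above works by touching a timestamp file before the sync and then selecting message files modified after it. A rough Python equivalent of that selection step, as a sketch with a hypothetical function name:

```python
import os


def newly_downloaded(root, timestamp_file):
    """Return message files under root modified after timestamp_file.

    Mirrors: find ROOT -newer TIMESTAMP -name '*,FMD5=*'
    (restricted here to regular files).
    """
    cutoff = os.stat(timestamp_file).st_mtime
    found = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            # OfflineIMAP message files carry ',FMD5=' in their names.
            if ',FMD5=' in name and os.stat(path).st_mtime > cutoff:
                found.append(path)
    return sorted(found)
```

Because the hook removes the timestamp file only after a successful pass, an interrupted run keeps the older timestamp and the next run picks up the files it missed.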
set-received-mtime
Place this script in the migration directory, and make it executable.
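#! /bin/sh -e

# Print the date from the first "Received" header: join any folded
# continuation lines, then keep the text after the last ';'.
receivedHeader() {
    < "$1" sed -n -e '
        /^Received:/{
            h
            : o
            n
            /^[ \t]/ {
                H
                b o
            }
            x
            s/\n//g
            s/Received:\s*//
            s/.*;\s*//
            p
            q
        }
        /^$/q'
}

# Print the value of the "Date" header, if there is one.
dateHeader() {
    < "$1" sed -n -e '
        /^Date:/{
            s/Date:\s*//
            p
            q
        }
        /^$/q'
}

# Set the modification time of each message file given as an argument.
for message
do
    mtime="$(receivedHeader "$message")"
    [ -z "$mtime" ] && mtime="$(dateHeader "$message")"
    [ -n "$mtime" ] && touch -m -d "$mtime" "$message"
done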