About

I created this application out of a personal need. Over the years I haven't been great about maintaining my address book, and I've lost touch with many acquaintances I used to communicate with casually. I realized that their email addresses were trapped deep in the bowels of my Gmail account; if only there were some way to extract them. I soon realized that, using Gmail's newly released IMAP support, I could probe every message and extract the email addresses from it, and in some cases additional data such as the first and last name. I started playing around with some scripting and came up with what I have now.

This tool connects to the Gmail IMAP server and downloads the message headers of every email stored in my Gmail account (except those in the Spam folder). It pulls the addresses into a master list, along with the first and last name if available. After all of the addresses have been extracted, it calculates some basic statistics, including frequency of occurrence, which it uses to sort the list. All of this information is then exported to CSV files that are compatible with many different applications.
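To give a feel for the extraction step, here is a minimal Python sketch. It is not the script behind this site. It assumes the standard library's imaplib and email modules, and the function names (fetch_headers, collect_addresses), the choice of header fields, the "[Gmail]/All Mail" mailbox name (which Gmail keeps separate from Spam, but which can vary with account language), and the simple first/last name split are all illustrative assumptions.

```python
import imaplib
from collections import Counter
from email import message_from_bytes
from email.utils import getaddresses

# Header fields scanned for addresses (an assumed selection).
ADDRESS_HEADERS = ("From", "To", "Cc", "Reply-To")


def fetch_headers(user, password, mailbox='"[Gmail]/All Mail"'):
    """Yield the raw header bytes of every message in one mailbox.

    Assumes password-style login is enabled for the account.
    """
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    try:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)
        _, data = conn.search(None, "ALL")
        for num in data[0].split():
            # BODY.PEEK fetches headers without marking the message as read.
            _, parts = conn.fetch(num, "(BODY.PEEK[HEADER])")
            if parts and isinstance(parts[0], tuple):
                yield parts[0][1]
    finally:
        conn.logout()


def collect_addresses(raw_headers):
    """Return (email, first, last, count) rows, most frequent first."""
    counts = Counter()
    names = {}
    for raw in raw_headers:
        msg = message_from_bytes(raw)
        fields = []
        for header in ADDRESS_HEADERS:
            fields.extend(msg.get_all(header, []))
        for display_name, address in getaddresses(fields):
            address = address.strip().lower()
            if "@" not in address:
                continue
            counts[address] += 1
            # Keep the first display name seen; split on the first space.
            if display_name and address not in names:
                first, _, last = display_name.strip().partition(" ")
                names[address] = (first, last.strip())
    return [
        (addr, *names.get(addr, ("", "")), n)
        for addr, n in counts.most_common()
    ]
```

Calling collect_addresses(fetch_headers(user, password)) would then produce the frequency-sorted master list in one pass.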

While doing this I discovered an additional use for this data. Many social networking sites, such as Facebook and LinkedIn, let you import a contact list file and find everyone you already know who has registered for the service. This turned out to be a real killer app for the extract. The only challenge was that they limit uploads to about 2,000 contacts at a time. I added functionality to my extract that "chunks" the master file into several files, each with up to 2,000 email addresses in it. This allowed me to upload my newly discovered contacts a bit at a time, which worked very well.
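The chunking itself is simple. As a rough Python sketch (the function names and the CSV column names are assumptions made for illustration, not the format the real extract uses), it amounts to slicing the sorted list and writing each slice out as its own CSV:

```python
import csv
import io


def chunk(rows, size=2000):
    """Split a list of rows into consecutive slices of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_csv(rows):
    """Render rows as CSV text with a header line (column names are assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["email", "first_name", "last_name", "count"])
    writer.writerows(rows)
    return buf.getvalue()
```

Because the slices are taken in order, a list that is already sorted by frequency stays sorted across the chunk files.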

When you run this application against your Gmail account, you will get an email that contains several things. First, you will get some statistics, including the number of emails, the number of unique email addresses, and the time taken to run the extract. In addition, you will receive a zip file attachment that contains several CSV files. There is a "master" CSV file that contains all of the information found, plus the "chunked" versions. There will be as many of these as are required for the number of unique email addresses in your account. For example, I have 65,339 emails and 15,134 unique email addresses. Breaking these into "chunks" of 2,000 email addresses results in 8 files (seven full files and one with the remaining 1,134 addresses). The files are ordered by email address frequency, which means that the 2,000 addresses in the first file occur more often than those in the second file, and so on.
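The file count in that example is just a ceiling division, and bundling the CSVs into one attachment can be done in memory. The Python sketch below illustrates both. The function names and the file names inside the archive (master.csv, chunk_01.csv, and so on) are made up for the example and may not match what the real attachment contains.

```python
import io
import math
import zipfile


def chunk_count(unique_addresses, chunk_size=2000):
    """Number of chunk files needed to hold every unique address."""
    return math.ceil(unique_addresses / chunk_size)


def build_zip(master_csv, chunk_csvs):
    """Pack the master CSV and the chunk CSVs into zip bytes, in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("master.csv", master_csv)
        for index, text in enumerate(chunk_csvs, start=1):
            archive.writestr(f"chunk_{index:02d}.csv", text)
    return buf.getvalue()


print(chunk_count(15134))  # 8
```

The bytes returned by build_zip could then be attached to the results email without writing anything to disk.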

As you can imagine, this service takes a considerable amount of CPU power to run. I'm running the back-end processing servers on Amazon EC2, and I need your donations to cover the costs of keeping this site up. If you found this application useful, please consider donating a few bucks. Thank you in advance.

If you have any additional questions please feel free to email me at jason@vallery.net.