I have recently been doing some research on open-source email and data extraction utilities, and ran across one (PEDALS) that the Library of Congress uses for archiving.
I tested it and was pleased with how it performed. My testing was done on Microsoft .PST files: PEDALS parses each .PST file, enumerates every .MSG (email), and extracts the associated attachments. The email body and metadata are stored in an XML file.
To make this output useful for litigation review platforms, you need to convert the XML to a .CSV file and store each email body as extracted text (e.g., a .txt or rich text file). At this point you could cull the data with the desktop version of dtSearch, or you could convert all of the extracted text and attachments to the target media (e.g., searchable PostScript-based PDFs with optimized compression for web-based download) and load them into the litigation review software you plan to use.
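As a rough illustration of the XML-to-CSV step, here is a minimal Python sketch using only the standard library. It assumes a hypothetical XML layout (a root element containing <Message> elements with <From>, <To>, <Subject>, <Date>, and <Body> children); the real element names PEDALS emits may differ, so treat the tag names, the EMAIL000001-style control numbers, and the function name xml_to_csv as placeholders to adapt.

```python
import csv
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical metadata tags; map these to whatever PEDALS actually writes.
FIELDS = ["From", "To", "Subject", "Date"]


def xml_to_csv(xml_text, csv_path, text_dir):
    """Write one CSV row of metadata per <Message> and save each body as a .txt file."""
    root = ET.fromstring(xml_text)
    text_dir = Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for i, msg in enumerate(root.iter("Message"), start=1):
        doc_id = f"EMAIL{i:06d}"  # hypothetical control-number scheme
        row = {"DocID": doc_id}
        row.update({f: (msg.findtext(f) or "").strip() for f in FIELDS})

        # Store the email body as extracted text, one file per message.
        txt_path = text_dir / f"{doc_id}.txt"
        txt_path.write_text(msg.findtext("Body") or "", encoding="utf-8")
        row["TextPath"] = str(txt_path)
        rows.append(row)

    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["DocID"] + FIELDS + ["TextPath"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Hypothetical sample input in the assumed layout.
SAMPLE = """<Messages>
  <Message>
    <From>alice@example.com</From>
    <To>bob@example.com</To>
    <Subject>Q3 budget</Subject>
    <Date>2010-05-01T09:30:00</Date>
    <Body>Please review the attached budget.</Body>
  </Message>
</Messages>"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        rows = xml_to_csv(SAMPLE, Path(out) / "metadata.csv", Path(out) / "text")
        print(rows[0]["DocID"], rows[0]["Subject"])
```

Assigning sequential control numbers rather than reusing message IDs keeps the text file names predictable and safe for the file system, which also makes the CSV easy to join back to the extracted text later.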
Of course, this process would be easier with some accompanying utilities that make those conversions automatically and generate the target load file.
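For the load file itself, the sketch below writes rows in the Concordance-style .DAT convention many review platforms accept: ASCII 20 as the field separator, þ (ASCII 254) as the text qualifier, and ® (ASCII 174) in place of embedded line breaks. This is an assumption about your target platform; check its import specification (delimiters, encoding, required fields) before relying on it. The function name to_dat and the DocID/Subject fields are hypothetical.

```python
# Concordance-style delimiters (assumed; confirm against your platform's spec).
FIELD_SEP = "\x14"  # ASCII 20
QUOTE = "\xfe"      # þ, ASCII 254
NEWLINE = "\xae"    # ®, ASCII 174, stands in for line breaks inside a field


def clean(value):
    """Drop stray qualifier characters and replace embedded line breaks."""
    text = str(value).replace(QUOTE, "")
    return text.replace("\r\n", NEWLINE).replace("\n", NEWLINE).replace("\r", NEWLINE)


def to_dat(rows, fields):
    """Return a .DAT load file (header line plus one line per row) as a string."""
    def fmt(values):
        return FIELD_SEP.join(f"{QUOTE}{clean(v)}{QUOTE}" for v in values)

    lines = [fmt(fields)]
    for row in rows:
        lines.append(fmt(row.get(f, "") for f in fields))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    dat = to_dat([{"DocID": "EMAIL000001", "Subject": "Q3\nbudget"}], ["DocID", "Subject"])
    print(repr(dat))
```

The function returns a string so the caller can choose the output encoding; some older tools expect a Windows code page rather than UTF-8.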
Fear not: eDiscoverySquad will be working on making this process easier, so be sure to watch our blog for updates.