Hadoop + EC2 + S3 = Super alternatives for researchers (& real people too!)
I recently discovered, and was inspired by, a real-world and non-trivial (in space and in time) application of Hadoop (the open source implementation of Google's MapReduce) combined with the Amazon Simple Storage Service (Amazon S3) and the Amazon Elastic Compute Cloud (Amazon EC2). The project was to convert pre-1922 New York Times articles, stored as scanned TIFF images, into PDFs of the articles:

Recipe: 4 TB of data loaded to S3 (TIFF images) + Hadoop (+ Java Advanced Imaging and various glue) + 100 EC2 instances + 24 hours = 11 million PDFs, 1.5 TB on S3

Unfortunately, the developer (Derek Gottfrid) did not say how much this cost the NYT. But here is my back-of-the-envelope calculation (using the Amazon S3/EC2 FAQ):

EC2: $0.10 per instance-hour x 100 instances x 24 hrs = $240
S3: $0.15 per GB-month x 4500 GB x ~1.5/31 months = ~$33
  + $0.10 per GB of data transferred in x 4000 GB = $400
  + $0.13 per GB of data transferred out x 1500 GB = $195
Total: ~$868

Not unre...
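
For anyone who wants to rerun the back-of-the-envelope numbers with their own job size, here is a minimal Python sketch of the same arithmetic. The rates and quantities are the ones quoted in this post (not current Amazon pricing), the 4500 GB storage figure is taken as given, and the function and parameter names are my own invention for illustration.

```python
# Rates as quoted in the post (from the Amazon S3/EC2 FAQ at the time of writing);
# they are assumptions for this sketch, not current pricing.
EC2_PER_INSTANCE_HOUR = 0.10   # dollars per instance-hour
S3_PER_GB_MONTH = 0.15         # dollars per GB stored per month
TRANSFER_IN_PER_GB = 0.10      # dollars per GB transferred in
TRANSFER_OUT_PER_GB = 0.13     # dollars per GB transferred out


def estimate_cost(instances=100, hours=24, stored_gb=4500,
                  storage_months=1.5 / 31, gb_in=4000, gb_out=1500):
    """Return a per-item cost breakdown (in dollars) for one batch job.

    The defaults reproduce the figures used in the post; estimate_cost
    and its parameters are hypothetical names, not part of any AWS API.
    """
    costs = {
        "ec2": EC2_PER_INSTANCE_HOUR * instances * hours,
        "s3_storage": S3_PER_GB_MONTH * stored_gb * storage_months,
        "transfer_in": TRANSFER_IN_PER_GB * gb_in,
        "transfer_out": TRANSFER_OUT_PER_GB * gb_out,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    for item, dollars in estimate_cost().items():
        print(f"{item:>12}: ${dollars:,.2f}")
```

With the default arguments the total comes to roughly $868, matching the estimate above. Changing `instances`, `hours`, or the data volumes gives a quick feel for how the bill scales.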