Newseum Page Grabber Script

2010-02-07 09:08:00

Newseum archives the front pages of over 500 newspapers from around the world. If you know the IDs of the papers you want to see, you can use this simple Python program to download the JPG of each paper's front page to your local system. Edit the CITIES list to set the IDs of the papers to be grabbed.

#!/usr/bin/env python
"""
    Quick Newseum Frontpage Grabber script
    Copyright 2009 by Brian C. Lane
    Imp Software
    All Rights Reserved

    Modify the CITIES list below to add the city designators (as seen in the
    URLs at http://www.newseum.org/todaysfrontpages/default.asp)
"""
import urllib2
import re
import os
import urlparse

# Add more cities here
CITIES = ["AL_AS", "AL_MA"]

NEWSEUM_URL = "http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=%s"
NEWSEUM_IMG = "http://www.newseum.org"

def fetchNewseumImage(city):
    """
    Fetch the image for a city
    """
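    print "Parsing the page for %s" % (city)
    page_url = NEWSEUM_URL % city
    page = urllib2.urlopen(page_url).read()

    # Quick and dirty grep for the image name; the non-greedy group stops
    # at the first closing quote instead of running on to a later ' alt='
    match = re.search('<img class="tfp_lrg_img" src="(.*?)" alt=', page)
    if match:
        # Resolve the src against the page URL. os.path.abspath works on
        # local filesystem paths and would prepend the current directory
        # to a relative src.
        img_url = urlparse.urljoin(page_url, match.group(1))
        print "Saving the image for %s" % (city)
        image = urllib2.urlopen(img_url).read()
        # Save under the image's own file name and close the file promptly
        with open(os.path.basename(match.group(1)), "wb") as f:
            f.write(image)
    else:
        print "No front page image found for %s" % (city)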

def main():
    """
    Fetch the front page image for every city in CITIES
    """
    for city in CITIES:
        fetchNewseumImage(city)

if __name__ == '__main__':
    main()
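
The script above targets Python 2 (urllib2 and urlparse). For readers on Python 3, the image lookup step can be sketched roughly as follows. This is only an illustration, not part of the original script: the helper name find_image_url and the sample markup are made up here, and the real Newseum page may look different. It runs without touching the network so the regex and URL handling can be tried in isolation.

```python
import re
from urllib.parse import urljoin

# Same URL template as the script above
NEWSEUM_URL = "http://www.newseum.org/todaysfrontpages/hr.asp?fpVname=%s"

# Non-greedy group so the match stops at the first closing quote
IMG_RE = re.compile(r'<img class="tfp_lrg_img" src="(.*?)" alt=')


def find_image_url(page_url, html):
    """Return the absolute image URL found in html, or None if absent.

    Hypothetical helper for illustration; not in the original script.
    """
    match = IMG_RE.search(html)
    if match is None:
        return None
    # urljoin resolves relative and root-relative src values against the page
    return urljoin(page_url, match.group(1))


# Made-up markup for illustration; the real page may differ
SAMPLE_HTML = (
    '<img class="tfp_lrg_img" src="../media/lg/AL_AS.jpg" alt="front page">'
)

page_url = NEWSEUM_URL % "AL_AS"
print(find_image_url(page_url, SAMPLE_HTML))
```

Resolving the src with urljoin rather than string concatenation means the same code handles a relative path, a root-relative path, or a full URL in the img tag.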

The source is also hosted on GitHub.