SoFunction
Updated on 2024-11-19

Python Basics Tutorial, Project Four: News Aggregation

This is the fourth exercise in the Python Basics book: a news-gathering agent. The program collects news items from specified sources (here, a Usenet newsgroup and a web page) and delivers them to specified destinations (here in two forms: plain text printed to the terminal, and an HTML file). Using it is somewhat similar to using today's blog-subscription tools or RSS readers.

Let's start with the code, then analyze it piece by piece:

from nntplib import NNTP
from time import strftime,time,localtime
from email import message_from_string
from urllib import urlopen
import textwrap
import re
day = 24*60*60
def wrap(string,max=70):
    '''
    Wraps a string so that its lines are no longer than max characters.
    '''
    return '\n'.join(textwrap.wrap(string,max)) + '\n'
class NewsAgent:
    '''
    An object that distributes news items from news sources
    to news destinations.
    '''
    def __init__(self):
        self.sources = []
        self.destinations = []
    def addSource(self,source):
        self.sources.append(source)
    def addDestination(self,dest):
        self.destinations.append(dest)
    def distribute(self):
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)
class NewsItem:
    def __init__(self,title,body):
        self.title = title
        self.body = body
class NNTPSource:
    def __init__(self,servername,group,window):
        self.servername = servername
        self.group = group
        self.window = window
    def getItems(self):
        start = localtime(time() - self.window*day)
        date = strftime('%y%m%d',start)
        hour = strftime('%H%M%S',start)
        server = NNTP(self.servername)
        ids = server.newnews(self.group,date,hour)[1]
        for id in ids:
            lines = server.article(id)[3]
            message = message_from_string('\n'.join(lines))
            title = message['subject']
            body = message.get_payload()
            if message.is_multipart():
                body = body[0]
            yield NewsItem(title,body)
        server.quit()
class SimpleWebSource:
    def __init__(self,url,titlePattern,bodyPattern):
        self.url = url
        self.titlePattern = re.compile(titlePattern)
        self.bodyPattern = re.compile(bodyPattern)
    def getItems(self):
        text = urlopen(self.url).read()
        titles = self.titlePattern.findall(text)
        bodies = self.bodyPattern.findall(text)
        for title,body in zip(titles,bodies):
            yield NewsItem(title,wrap(body))
class PlainDestination:
    def receiveItems(self,items):
        for item in items:
            print item.title
            print '-'*len(item.title)
            print item.body
class HTMLDestination:
    def __init__(self,filename):
        self.filename = filename
    def receiveItems(self,items):
        out = open(self.filename,'w')
        print >> out,'''
        <html>
        <head>
         <title>Today's News</title>
        </head>
        <body>
        <h1>Today's News</h1>
        '''
        print >> out, '<ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<li><a href="#%i">%s</a></li>' % (id,item.title)
        print >> out, '</ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<h2><a name="%i">%s</a></h2>' % (id,item.title)
            print >> out, '<pre>%s</pre>' % item.body
        print >> out, '''
        </body>
        </html>
        '''
def runDefaultSetup():
    agent = NewsAgent()
    # A SimpleWebSource that retrieves news from the BBC text-only page:
    bbc_url = 'http://news.bbc.co.uk/text_only.stm'
    bbc_title = r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>'
    bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<'
    bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
    agent.addSource(bbc)
    # An NNTPSource that retrieves news from comp.lang.python.announce:
    clpa_server = 'news.example.com' # insert a real NNTP server name here
    clpa_group = 'comp.lang.python.announce'
    clpa_window = 1
    clpa = NNTPSource(clpa_server,clpa_group,clpa_window)
    agent.addSource(clpa)
    # Add a plain-text destination and an HTML destination:
    agent.addDestination(PlainDestination())
    agent.addDestination(HTMLDestination('news.html'))
    # Distribute the news items:
    agent.distribute()
if __name__ == '__main__':
    runDefaultSetup()

Analyzing the program as a whole: the centerpiece is NewsAgent, which stores the news sources and the destination addresses, then asks each source class (NNTPSource and SimpleWebSource) for items and hands them to each class that writes the news out (PlainDestination and HTMLDestination). From this it is clear that NNTPSource specializes in fetching messages from a news (NNTP) server, while SimpleWebSource scrapes data from a URL. The roles of PlainDestination and HTMLDestination are equally obvious: the former prints the collected items to the terminal, and the latter writes them to an HTML file.
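To make SimpleWebSource's scraping concrete, here is a small self-contained sketch (Python 3 syntax) that runs patterns in the same style as the bbc_title/bbc_body regexes against a stand-in HTML snippet; the snippet and pattern details are hypothetical, for illustration only:

```python
import re

# A stand-in HTML snippet (hypothetical data) shaped like the markup the
# bbc_title/bbc_body patterns in runDefaultSetup() expect.
text = '''<a href="/story1"> <b> First headline </b></a> <br/> First summary <hr>
<a href="/story2"> <b> Second headline </b></a> <br/> Second summary <hr>'''

# Titles: the bold text inside each link; bodies: the text after the <br/>.
title_pattern = re.compile(r'(?s)<a href="[^"]*">\s*<b>\s*(.*?)\s*</b>')
body_pattern = re.compile(r'(?s)</a>\s*<br/>\s*(.*?)\s*<')

titles = title_pattern.findall(text)
bodies = body_pattern.findall(text)
for title, body in zip(titles, bodies):
    print('%s: %s' % (title, body))
```

Pairing `findall` results with `zip`, as `getItems` does, silently assumes that titles and bodies appear in matched order on the page, which is the main fragility of regex-based scraping.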

With that analysis done, the main program is short: it simply registers the sources and the output destinations with a NewsAgent and calls distribute().
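Note that the listing follows the book and targets Python 2 (print statements, `from urllib import urlopen`, `print >> out`). As a rough sketch of the Python 3 equivalents of the helper pieces (and note that nntplib itself was removed from the standard library in Python 3.13):

```python
from urllib.request import urlopen   # Python 3 home of urlopen
import textwrap

def wrap(string, max=70):
    # textwrap.wrap breaks a long string into a list of lines,
    # each at most `max` characters wide.
    return '\n'.join(textwrap.wrap(string, max)) + '\n'

wrapped = wrap('one two three four five six seven', max=10)
print(wrapped)
```

The `print >> out, ...` statements in HTMLDestination would likewise become `print(..., file=out)` calls under Python 3.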

It is really a simple program, but it demonstrates a layered design: sources only need to provide getItems(), and destinations only need to provide receiveItems(), so either side can be extended without touching the other.
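To see that layering concretely, here is a self-contained sketch (Python 3 syntax) that plugs a new source and a new destination into the same agent protocol; StaticSource and TitleListDestination are hypothetical stand-ins, while NewsAgent and NewsItem are repeated unchanged from the listing above:

```python
class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body

class NewsAgent:
    # Same agent as in the article: it relies only on the
    # getItems()/receiveItems() interface, not on concrete classes.
    def __init__(self):
        self.sources = []
        self.destinations = []
    def addSource(self, source):
        self.sources.append(source)
    def addDestination(self, dest):
        self.destinations.append(dest)
    def distribute(self):
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)

class StaticSource:
    # Hypothetical source: yields canned items instead of hitting the network.
    def __init__(self, pairs):
        self.pairs = pairs
    def getItems(self):
        for title, body in self.pairs:
            yield NewsItem(title, body)

class TitleListDestination:
    # Hypothetical destination: records titles instead of printing or writing HTML.
    def __init__(self):
        self.titles = []
    def receiveItems(self, items):
        self.titles = [item.title for item in items]

agent = NewsAgent()
agent.addSource(StaticSource([('Release 1.0', 'Out now.'),
                              ('Release 1.1', 'Bug fixes.')]))
dest = TitleListDestination()
agent.addDestination(dest)
agent.distribute()
print(dest.titles)  # ['Release 1.0', 'Release 1.1']
```

Neither new class required any change to NewsAgent, which is exactly the point of separating sources from destinations.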
