Reading RSS Feeds


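RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news sites, weblogs, and other online publishers syndicate their content as an RSS feed to anyone who wants it. In Python, the following package can be used to read and process these feeds.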

pip install feedparser

Feed Structure

In the example below, we retrieve the structure of a feed entry so that we can decide which parts of the feed to process further.

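import feedparser

# Download and parse the feed
NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")
# Take the second entry (index 1)
entry = NewsFeed.entries[1]

# List the keys available on this entry
print(list(entry.keys()))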

Running the example above produces the following result:

['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
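
Not every feed provides every key listed above, so it can help to read keys defensively. The following is a minimal sketch rather than part of the original example: `pick_fields` and `missing_fields` are hypothetical helper names, and the code assumes only that an entry behaves like a dictionary. A plain dict stands in for a parsed entry so the snippet runs without network access.

```python
def pick_fields(entry, wanted=("title", "link", "published", "summary")):
    """Return the wanted keys from a feed entry, using None for missing ones."""
    # Entries behave like dictionaries, so .get() avoids a KeyError
    # when a feed omits a field.
    return {key: entry.get(key) for key in wanted}


def missing_fields(entry, wanted=("title", "link", "published", "summary")):
    """List the wanted keys that the entry does not provide."""
    return [key for key in wanted if key not in entry]


# Hypothetical stand-in data; with feedparser, pass NewsFeed.entries[1] instead.
sample_entry = {"title": "Example post", "link": "https://example.com/post"}
print(pick_fields(sample_entry))
print(missing_fields(sample_entry))
```

Checking for missing keys first makes it easier to decide which fields are safe to rely on for a particular feed.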

Feed Title and Posts

In the example below, we read the number of posts in the RSS feed and the title of one post.

import feedparser

NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")

# Number of entries in the feed
print('Number of RSS posts :', len(NewsFeed.entries))

# Title of the second entry (index 1)
entry = NewsFeed.entries[1]
print('Post Title :', entry.title)

Running the example above produces the following result:

Number of RSS posts : 5
Post Title : Cong-JD(S) in SC over choice of pro tem speaker
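
The same access works for every entry, so the whole feed can be listed in a loop. This is a minimal sketch, assuming each entry exposes a `title` key as in the structure shown earlier; `list_titles` is a hypothetical helper, and plain dictionaries stand in for parsed entries so the snippet runs offline.

```python
def list_titles(entries):
    """Return numbered titles for a sequence of feed entries."""
    lines = []
    for position, entry in enumerate(entries, start=1):
        # Fall back to a placeholder if an entry has no title.
        title = entry.get("title", "(untitled)")
        lines.append(f"{position}. {title}")
    return lines


# Hypothetical stand-in data; with feedparser, pass NewsFeed.entries.
sample_entries = [{"title": "First post"}, {"title": "Second post"}, {}]
for line in list_titles(sample_entries):
    print(line)
```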

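Feed Details

Based on the entry structure above, a Python program can extract the necessary details from the feed, as shown below. Because each entry is a dictionary, its keys can be used to retrieve the required values.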

import feedparser

NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms")

# Take the second entry (index 1)
entry = NewsFeed.entries[1]

# Publication date, summary text, and article link
print(entry.published)
print("******")
print(entry.summary)
print("------News Link--------")
print(entry.link)

Running the program above produces the following output:

Fri, 18 May 2018 20:13:13 GMT
******
Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today.
------News Link--------
https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms
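
The `published` value above is a plain string. The key list shown earlier also includes `published_parsed`, which feedparser documents as a standard Python time tuple in UTC. The sketch below shows one way to turn that tuple into a timezone-aware `datetime`, which is easier to sort and compare. `published_datetime` is a hypothetical helper, and the sample entry is hand-built from the date printed above rather than fetched from the feed.

```python
import time
from datetime import datetime, timezone


def published_datetime(entry):
    """Convert an entry's published_parsed value to an aware UTC datetime."""
    parsed = entry.get("published_parsed")
    if parsed is None:
        # The feed did not supply a parseable publication date.
        return None
    # The first six fields are year, month, day, hour, minute, second.
    return datetime(*parsed[:6], tzinfo=timezone.utc)


# Hypothetical stand-in for a parsed entry: Fri, 18 May 2018 20:13:13 GMT.
sample_entry = {
    "published_parsed": time.struct_time((2018, 5, 18, 20, 13, 13, 4, 138, 0))
}
print(published_datetime(sample_entry).isoformat())
```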