Item objects behave like regular Python dictionaries. Field values can be read and written with the following syntax:
>>> item = YiibaiItem()
>>> item['title'] = 'sample title'
>>> item['title']
'sample title'
Add the code above to the spider, as in the following example:
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy
from first_scrapy.items import YiibaiItem
class firstSpider(scrapy.Spider):
    name = "first"
    allowed_domains = ["tw511.com"]
    start_urls = [
        "/5/59/1786.htmlscrapy_create_project.html",
        "/5/59/1787.html"
    ]

    def parse(self, response):
        # All tutorial names and links ...
        for sel in response.xpath('//ul/li'):
            item = YiibaiItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            yield item
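To see what the three XPath expressions in parse() pick out of each list item without running a crawl, the following is a minimal sketch that imitates them with the standard library's xml.etree.ElementTree instead of Scrapy selectors. The SAMPLE markup and the li_records helper are hypothetical stand-ins, not part of the original project, and ElementTree only accepts well-formed markup, so this illustrates the selection logic rather than replacing the spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for the crawled page.
SAMPLE = """<html><body><ul>
<li>
<a href="/3/39/1360.html">Python</a>
</li>
<li>
<a href="https://www.tw511.com/svn/">SVN</a>
</li>
<li>plain text only</li>
</ul></body></html>"""


def li_records(markup):
    """Imitate the spider's three queries for every <li> under a <ul>."""
    root = ET.fromstring(markup)
    records = []
    for li in root.iterfind(".//ul/li"):
        anchors = li.findall("a")  # direct <a> children, like 'a/...'
        # 'text()' selects the text nodes directly under <li>: the text before
        # the first child plus the tail text that follows each child element.
        texts = [li.text] + [child.tail for child in li]
        records.append({
            "title": [a.text for a in anchors if a.text is not None],
            "link": [a.get("href") for a in anchors if a.get("href") is not None],
            "desc": [t for t in texts if t],
        })
    return records


records = li_records(SAMPLE)
for record in records:
    print(record)
```

Because the link text lives inside the a element, the text nodes directly under each li are usually just the line breaks around the anchor, which is why desc can come back as a list of whitespace-only strings.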
Running the spider named "first" produces, in part, the following output:
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/3/39/1360.html'],
 'title': [u'Python3\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/3/36/1261.html7/'],
 'title': [u'PHP7\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/65/2006.html'],
 'title': [u'Excel\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/19/157/4594.html/uml/'],
 'title': [u'UML']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/76/2333.html/'],
 'title': [u'Socket\u7f16\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/74/2301.html/'],
 'title': [u'Radius\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/nodejs/'],
 'title': [u'Node.js\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/svn/'],
 'title': [u'SVN\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/67/2082.html'],
 'title': [u'Git\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/makefile/'],
 'title': [u'Makefile']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/79/2378.html'],
 'title': [u'Unix']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/79/2378.html_commands/'],
 'title': [u'Linux/Unix\u547d\u4ee4']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/79/2378.html_system_calls/'],
 'title': [u'Unix/Linux\u7cfb\u7edf\u8c03\u7528']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/6/75/2304.html'],
 'title': [u'Shell']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/drools/'],
 'title': [u'Drools\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/linq/'],
 'title': [u'LinQ\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/wcf/'],
 'title': [u'WCF\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/142/4124.html'],
 'title': [u'MySQL\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/144/4212.html'],
 'title': [u'PL/SQL\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/145/4235.html'],
 'title': [u'PostgreSQL\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/141/4072.html'],
 'title': [u'MongoDB\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/149/4377.htmlite'],
 'title': [u'SQLite\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/137/3953.html'],
 'title': [u'DB2\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/146/4288.html'],
 'title': [u'Redis\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/140/4054.html'],
 'title': [u'Memcached\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/134/3881.html'],
 'title': [u'Access\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/149/4377.html'],
 'title': [u'SQL\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/149/4377.html_server/'],
 'title': [u'SQL Server\u6559\u7a0b']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/20/206/8011.html'],
 'title': [u'Java']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/3/39/1360.html'],
 'title': [u'Python']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'/18/142/4124.html'],
 'title': [u'MySQL']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/articles'],
 'title': [u'\u6700\u65b0\u6587\u7ae0']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'],
 'link': [u'https://www.tw511.com/login/byqq'],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n',
          u'\n',
          u'\n',
          u'\n',
          u'\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n', u'\n'], 'link': [], 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n'], 'link': [], 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5\xa0', u'&amd64\n'],
 'link': [u'http://sourceforge.net/projects/pywin32/'],
 'title': [u'pywin32']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5 Python2.7.9 \u4ee5\u4e0b\u7684\xa0',
          u'\xa0\u6216\u8005\u4e0b\u8f7d\u5730\u5740\uff1a\xa0',
          u'\n'],
 'link': [u'https://pip.pypa.io/en/latest/installing/',
          u'https://pypi.python.org/pypi/setuptools#files',
          u'https://pypi.python.org/pypi/setuptools#files'],
 'title': [u'pip', u'https://pypi.python.org/pypi/setuptools#files', u'\xa0']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u60a8\u53ef\u4ee5\u901a\u8fc7\u4f7f\u7528\u4ee5\u4e0b\u547d\u4ee4\u6765\u68c0\u67e5 pip \u7248\u672c\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5twisted\uff0c\u4e0b\u8f7d\u5730\u5740 -',
          u'\n'],
 'link': [u'https://pypi.python.org/packages/2.7/T/Twisted/Twisted-13.0.0.win32-py2.7.msi#md5=c2d453a344f56cf6f77204c5769288c0'],
 'title': [u'https://pypi.python.org/packages/2.7/T/Twisted/Twisted-13.0.0.win32-py2.7.msi#md5=c2d453a344f56cf6f77204c5769288c0']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5\xa0zope \u63a5\u53e3\uff1a',
          u'\xa0\u9009\u62e9\u5012\u6570\u7b2c\u4e8c\u4e2a\xa0',
          u'\xa0',
          u'\n'],
 'link': [u'https://pypi.python.org/pypi/zope.interface/4.1.0',
          u'https://pypi.python.org/packages/2.7/z/zope.interface/zope.interface-4.1.0.win32-py2.7.exe#md5=c0100a3cd6de6ecc3cd3b4d678ec7931'],
 'title': [u'https://pypi.python.org/pypi/zope.interface/4.1.0',
           u'zope.interface-4.1.0.win32-py2.7.exe']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5 lxml \uff0c\u7248\u672c\u8981\u9009\u5bf9\u5e94\u7cfb\u7edf\uff0c\u9519\u8bef\u7684\u662f\u7528\u4e0d\u4e86\u7684\u3002\u4e0b\u8f7d\u5730\u5740\uff1a\xa0',
          u'\n'],
 'link': [u'https://pypi.python.org/pypi/lxml/3.2.3'],
 'title': [u'https://pypi.python.org/pypi/lxml/3.2.3']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u8981\u5b89\u88c5scrapy\uff0c\u8fd0\u884c\u4ee5\u4e0b\u547d\u4ee4\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n', u'\n'], 'link': [], 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n', u'\n'], 'link': [], 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n', u'\n', u'\n'], 'link': [], 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u5b89\u88c5', u'\n'],
 'link': [u'http://brew.sh/'],
 'title': [u'homebrew']}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u8bbe\u7f6e\u73af\u5883\u53d8\u91cf PATH \u6307\u5b9a\xa0homebrew\xa0\u5305\u5728\u7cfb\u7edf\u8f6f\u4ef6\u5305\u524d\u4f7f\u7528\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u53d8\u66f4\u5b8c\u6210\u540e\uff0c\u91cd\u65b0\u52a0\u8f7d .bashrc \u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u63a5\u4e0b\u6765\uff0c\u4f7f\u7528\u4e0b\u9762\u7684\u547d\u4ee4\u5b89\u88c5\xa0Python\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] DEBUG: Scraped from <200 /5/59/1787.html>
{'desc': [u'\n\u63a5\u4e0b\u6765\uff0c\u5b89\u88c5scrapy\uff1a\n',
          u'\n'],
 'link': [],
 'title': []}
2016-10-03 13:11:06 [scrapy] INFO: Closing spider (finished)
2016-10-03 13:11:06 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 709,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 15401,
'downloader/response_count': 3,
'downloader/response_status_count/200': 3,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 10, 3, 5, 11, 6, 478000),
'item_scraped_count': 210,
'log_count/DEBUG': 214,
'log_count/INFO': 7,
'response_received_count': 3,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2016, 10, 3, 5, 11, 5, 197000)}
2016-10-03 13:11:06 [scrapy] INFO: Spider closed (finished)
D:\first_scrapy>
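Every field in the log is a list, because extract() returns all matches. Many desc values hold only line breaks, and most links are relative. One possible way to tidy such records afterwards is sketched below. The clean_record helper and the BASE_URL value are assumptions made for illustration (the base is inferred from the absolute links in the log), not part of the original project.

```python
from urllib.parse import urljoin

# Assumed base URL, inferred from the absolute links that appear in the log.
BASE_URL = "https://www.tw511.com/"


def clean_record(record, base_url=BASE_URL):
    """Flatten one scraped record: first title, absolute link, non-blank text."""
    titles = [t.strip() for t in record.get("title", []) if t.strip()]
    links = record.get("link", [])
    desc = [d.strip() for d in record.get("desc", []) if d.strip()]
    return {
        "title": titles[0] if titles else None,
        "link": urljoin(base_url, links[0]) if links else None,
        "desc": " ".join(desc),
    }


# Two records copied from the log. In Python 3 the u'' prefix is optional, and
# "\u6559\u7a0b" is simply the escaped spelling of two Chinese characters.
sample = [
    {"desc": ["\n", "\n"], "link": ["/3/39/1360.html"], "title": ["Python3\u6559\u7a0b"]},
    {"desc": ["\n", "\n", "\n"], "link": [], "title": []},
]
cleaned = [clean_record(r) for r in sample]
for row in cleaned:
    print(row)
```

In a real project this kind of cleanup would more naturally live in an item pipeline or in the spider itself. The standalone function only shows the transformation.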