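If you work on a UNIX platform, it is best to install IPython. If IPython is not available, you can use bpython instead. To choose the shell explicitly, set the SCRAPY_PYTHON_SHELL environment variable or add the following to scrapy.cfg:

    [settings]
    shell = bpython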
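
How a shell preference gets resolved can be pictured with a small standard-library sketch. It assumes the SCRAPY_PYTHON_SHELL environment variable takes priority over the scrapy.cfg entry and that the plain Python shell is the fallback. The helper name pick_shell and the inline config text are hypothetical, and this is an illustration rather than Scrapy's own code.

```python
import configparser


def pick_shell(cfg_text, environ):
    """Return the preferred shell name (hypothetical helper, not Scrapy's API)."""
    # Assumption: the environment variable takes priority over scrapy.cfg.
    env_value = environ.get("SCRAPY_PYTHON_SHELL")
    if env_value:
        return env_value.strip().lower()

    # Otherwise look for "shell" in the [settings] section of the config text.
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("settings", "shell"):
        return parser.get("settings", "shell").strip().lower()

    # Fall back to the standard Python shell when nothing is configured.
    return "python"


CFG_TEXT = "[settings]\nshell = bpython\n"
print(pick_shell(CFG_TEXT, {}))
print(pick_shell(CFG_TEXT, {"SCRAPY_PYTHON_SHELL": "ipython"}))
```

Passing the environment as a plain dictionary keeps the sketch easy to try without touching the real process environment.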
Launch the Scrapy shell with the following command, where <url> is the page you want to scrape:

    scrapy shell <url>
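S.N. | Shortcut | Description |
---|---|---|
1 | shelp() | Prints help listing the available objects and shortcuts. |
2 | fetch(request_or_url) | Fetches a new response from the given request or URL and updates the related objects accordingly. |
3 | view(response) | Opens the given response in your local browser. A base tag is added to the response body so that external links, such as images and style sheets, display correctly. |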
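
The base tag mentioned for view(response) can be illustrated with a rough standard-library sketch. The function add_base_tag and the sample page are hypothetical, and the regular expression is only an approximation of the idea, not the code Scrapy runs.

```python
import re


def add_base_tag(html, url):
    """Insert <base href=url> right after the opening <head> tag, if absent."""
    if re.search(r"<base\s", html, flags=re.IGNORECASE):
        return html  # the page already declares a base URL
    return re.sub(
        r"(<head(?:\s[^>]*)?>)",
        # A function replacement avoids escape handling in the URL text.
        lambda m: m.group(1) + '<base href="%s">' % url,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


page = '<html><head><title>Demo</title></head><body><img src="/logo.png"></body></html>'
print(add_base_tag(page, "http://scrapy.org"))
```

With the base tag in place, a relative path such as /logo.png resolves against the original site even though the page is opened from a local file.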
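S.N. | Object | Description |
---|---|---|
1 | crawler | The current Crawler object. |
2 | spider | The spider that handles the current URL, or a new Spider object if no spider is defined for that URL. |
3 | request | The Request object of the last fetched page. |
4 | response | The Response object of the last fetched page. |
5 | settings | The current Scrapy settings. |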
    scrapy shell 'http://scrapy.org' --nolog

    [s] Available Scrapy objects:
    [s]   crawler
    [s]   item       {}
    [s]   request
    [s]   response   <200 http://scrapy.org>
    [s]   settings
    [s]   spider
    [s] Useful shortcuts:
    [s]   shelp()             Provides available objects and shortcuts with help option
    [s]   fetch(req_or_url)   Collects the response from the request or URL and associated objects will get update
    [s]   view(response)      View the response for the given request
    >> response.xpath('//title/text()').extract_first()
    u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
    >> fetch("http://reddit.com")
    [s] Available Scrapy objects:
    [s]   crawler
    [s]   item       {}
    [s]   request
    [s]   response   <200 https://www.reddit.com/>
    [s]   settings
    [s]   spider
    [s] Useful shortcuts:
    [s]   shelp()             Shell help (print this help)
    [s]   fetch(req_or_url)   Fetch request (or URL) and update local objects
    [s]   view(response)      View response in a browser
    >> response.xpath('//title/text()').extract()
    [u'reddit: the front page of the internet']
    >> request = request.replace(method="POST")
    >> fetch(request)
    [s] Available Scrapy objects:
    [s]   crawler
    ...
    import scrapy


    class SpiderDemo(scrapy.Spider):
        name = "spiderdemo"
        start_urls = [
            "https://www.tw511.com",
            "http://yiibai.org",
            "http://yiibai.net",
        ]

        def parse(self, response):
            # You can inspect one specific response
            if ".net" in response.url:
                from scrapy.shell import inspect_response
                inspect_response(response, self)
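The spider above calls scrapy.shell.inspect_response from inside parse() to open the shell for the matching response. Running the spider produces output like the following: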
    2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) (referer: None)
    2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) (referer: None)
    2016-02-08 18:15:20-0400 [scrapy] DEBUG: Crawled (200) (referer: None)
    [s] Available Scrapy objects:
    [s]   crawler
    ...
    >> response.url
    'http://yiibai.net'
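You can check whether an extraction expression works against this response:

    >> response.xpath('//div[@class="val"]')
    []

The empty list means the expression matched nothing. To see what the spider actually received, open the response in a browser:

    >> view(response)
    True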
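
An empty result like the one above usually means the page does not contain the class name being queried. The sketch below reproduces that situation offline with xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it is a stand-in for Scrapy selectors rather than an equivalent. The sample page is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; a real response body may be messier.
SAMPLE = """<html><body>
<div class="value">42</div>
<div class="note">n/a</div>
</body></html>"""

root = ET.fromstring(SAMPLE)

# No div carries class="val", so the result is an empty list.
print(root.findall(".//div[@class='val']"))

# Listing the class names that do occur shows what to query instead.
print([div.get("class") for div in root.iter("div")])
```

Listing the attribute values that are actually present is often quicker than guessing at selectors one at a time.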