
Is it possible to run another spider from a Scrapy spider?
This article looks at whether it is possible to run another spider from a Scrapy spider.

Problem description


For now I have two spiders; what I would like to do is:

1. Spider 1 goes to url1 and, if url2 appears, calls spider 2 with url2. It also saves the content of url1 using a pipeline.
2. Spider 2 goes to url2 and does something.

Due to the complexity of both spiders, I would like to keep them separate.
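
Roughly, spider 1's parse would look something like the sketch below (the selector and the item fields are hypothetical placeholders, not my real code):

    import scrapy


    class Spider1(scrapy.Spider):
        # Hypothetical sketch of the intended flow, not the actual spider.
        name = "spider1"
        start_urls = ["http://example.com/url1"]  # stands in for url1

        def parse(self, response):
            # Save the content of url1 through the item pipeline.
            yield {"url": response.url, "body": response.text}

            # If url2 appears on the page, this is where spider 2 should
            # somehow be called with url2.
            url2 = response.css("a.detail::attr(href)").get()  # hypothetical selector
            if url2:
                # ??? how do I launch spider 2 with url2 from here?
                pass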

My attempt using scrapy crawl:

                  def parse(self, response):
                      p = multiprocessing.Process(
                          target=self.testfunc())
                      p.join()
                      p.start()
                  
                  def testfunc(self):
                      settings = get_project_settings()
                      crawler = CrawlerRunner(settings)
                      crawler.crawl(<spidername>, <arguments>)
                  

It does load the settings but doesn't crawl:

                  2015-08-24 14:13:32 [scrapy] INFO: Enabled extensions: CloseSpider, LogStats, CoreStats, SpiderState
                  2015-08-24 14:13:32 [scrapy] INFO: Enabled downloader middlewares: DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, HttpAuthMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
                  2015-08-24 14:13:32 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
                  2015-08-24 14:13:32 [scrapy] INFO: Spider opened
                  2015-08-24 14:13:32 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
                  

The documentation has an example of launching a crawl from a script, but what I'm trying to do is launch another spider while using the scrapy crawl command.

Full code:

                  from scrapy.crawler import CrawlerRunner
                  from scrapy.utils.project import get_project_settings
                  from twisted.internet import reactor
                  from multiprocessing import Process
                  import scrapy
                  import os
                  
                  
                  def info(title):
                      print(title)
                      print('module name:', __name__)
                      if hasattr(os, 'getppid'):  # only available on Unix
                          print('parent process:', os.getppid())
                      print('process id:', os.getpid())
                  
                  
                  class TestSpider1(scrapy.Spider):
                      name = "test1"
                      start_urls = ['http://www.google.com']
                  
                      def parse(self, response):
                          info('parse')
                          a = MyClass()
                          a.start_work()
                  
                  
                  class MyClass(object):
                  
                      def start_work(self):
                          info('start_work')
                          p = Process(target=self.do_work)
                          p.start()
                          p.join()
                  
                      def do_work(self):
                  
                          info('do_work')
                          settings = get_project_settings()
                          runner = CrawlerRunner(settings)
                          runner.crawl(TestSpider2)
                          d = runner.join()
                          d.addBoth(lambda _: reactor.stop())
                          reactor.run()
                          return
                  
                  class TestSpider2(scrapy.Spider):
                  
                      name = "test2"
                      start_urls = ['http://www.google.com']
                  
                      def parse(self, response):
                          info('testspider2')
                          return
                  

What I hope for is something like this:

1. scrapy crawl test1 (for example, when response.status_code is 200)
2. Inside test1, call scrapy crawl test2

Recommended answer

I won't go in depth since this question is really old, but I'll go ahead and drop this snippet from the official Scrapy docs... You are very close! lol

                  import scrapy
                  from scrapy.crawler import CrawlerProcess
                  
                  class MySpider1(scrapy.Spider):
                      # Your first spider definition
                      ...
                  
                  class MySpider2(scrapy.Spider):
                      # Your second spider definition
                      ...
                  
                  process = CrawlerProcess()
                  process.crawl(MySpider1)
                  process.crawl(MySpider2)
                  process.start() # the script will block here until all crawling jobs are finished
                  

https://doc.scrapy.org/en/latest/topics/practices.html

And then, using callbacks, you can pass items between your spiders to do whatever logic functions you're talking about.
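
To chain them conditionally the way you described, here's a rough, untested sketch (the spider bodies, the CSS selector, and the start_url argument are made-up placeholders) using the sequential CrawlerRunner pattern from that same docs page:

    import scrapy
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings
    from twisted.internet import defer, reactor

    found_urls = []  # url2 values collected by the first spider


    class MySpider1(scrapy.Spider):
        name = "spider1"
        start_urls = ["http://example.com/"]  # stands in for url1

        def parse(self, response):
            # Saved by your pipeline as usual.
            yield {"url": response.url, "body": response.text}
            # Hypothetical selector for url2.
            url2 = response.css("a.detail::attr(href)").get()
            if url2:
                found_urls.append(response.urljoin(url2))


    class MySpider2(scrapy.Spider):
        name = "spider2"

        def __init__(self, start_url=None, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.start_urls = [start_url] if start_url else []

        def parse(self, response):
            self.logger.info("spider2 got %s", response.url)


    runner = CrawlerRunner(get_project_settings())


    @defer.inlineCallbacks
    def crawl():
        # Run spider 1 first; only when it has finished, and only if url2
        # actually appeared, schedule spider 2 with that url.
        yield runner.crawl(MySpider1)
        for url in found_urls:
            yield runner.crawl(MySpider2, start_url=url)
        reactor.stop()


    crawl()
    reactor.run()  # blocks here until both crawls are finished

This keeps the two spiders separate, runs them from one script instead of two scrapy crawl invocations, and only fires the second spider when the first one actually found url2.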
