Problem Description
For now I have 2 spiders, and what I would like to do is:
- Spider 1 goes to url1 and, if url2 appears, calls spider 2 with url2. It also saves the content of url1 by using a pipeline.
- Spider 2 goes to url2 and does something.
由于兩種蜘蛛的復(fù)雜性,我想將它們分開.
Due to the complexities of both spiders I would like to have them separated.
My attempt, using scrapy crawl:
def parse(self, response):
    p = multiprocessing.Process(
        target=self.testfunc())
    p.join()
    p.start()

def testfunc(self):
    settings = get_project_settings()
    crawler = CrawlerRunner(settings)
    crawler.crawl(<spidername>, <arguments>)
It does load the settings but doesn't crawl:
2015-08-24 14:13:32 [scrapy] INFO: Enabled extensions: CloseSpider, LogStats, CoreStats, SpiderState
2015-08-24 14:13:32 [scrapy] INFO: Enabled downloader middlewares: DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, HttpAuthMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-08-24 14:13:32 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-08-24 14:13:32 [scrapy] INFO: Spider opened
2015-08-24 14:13:32 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
The documentation has an example about launching from a script, but what I'm trying to do is launch another spider while using the scrapy crawl command.
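For reference, the launch-from-a-script recipe in the Scrapy docs looks roughly like the sketch below (the spider class, its name, and its start URL are placeholders here):

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from twisted.internet import reactor

class MySpider(scrapy.Spider):
    name = "myspider"                    # placeholder spider
    start_urls = ['http://example.com']

    def parse(self, response):
        pass

configure_logging()
runner = CrawlerRunner()
d = runner.crawl(MySpider)               # schedule the crawl
d.addBoth(lambda _: reactor.stop())      # stop the reactor when it finishes
reactor.run()                            # blocks until the crawl is done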
Full code
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor
from multiprocessing import Process
import scrapy
import os

def info(title):
    print(title)
    print('module name:', __name__)
    if hasattr(os, 'getppid'):  # only available on Unix
        print('parent process:', os.getppid())
    print('process id:', os.getpid())

class TestSpider1(scrapy.Spider):
    name = "test1"
    start_urls = ['http://www.google.com']

    def parse(self, response):
        info('parse')
        a = MyClass()
        a.start_work()

class MyClass(object):

    def start_work(self):
        info('start_work')
        p = Process(target=self.do_work)
        p.start()
        p.join()

    def do_work(self):
        info('do_work')
        settings = get_project_settings()
        runner = CrawlerRunner(settings)
        runner.crawl(TestSpider2)
        d = runner.join()
        d.addBoth(lambda _: reactor.stop())
        reactor.run()
        return

class TestSpider2(scrapy.Spider):
    name = "test2"
    start_urls = ['http://www.google.com']

    def parse(self, response):
        info('testspider2')
        return
What I would like is something like this:
- scrapy crawl test1 (and, for example, when response.status_code is 200:)
- within test1, call scrapy crawl test2
Recommended Answer
I won't go into depth since this question is really old, but I'll go ahead and drop this snippet from the official Scrapy docs... You are very close! lol
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start()  # the script will block here until all crawling jobs are finished
https://doc.scrapy.org/en/latest/topics/practices.html
And then, using callbacks, you can pass items between your spiders to do whatever logic functions you're talking about.
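As a rough illustration of that idea (not the answerer's code), here is a minimal sketch that runs the two spiders sequentially with CrawlerRunner, following the pattern on the same docs page: spider 1 records the discovered URL in a shared list passed in as a spider argument, and spider 2 is only scheduled if something was found. The spider classes, the selector, and the found/start_url arguments are all hypothetical.

import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import defer, reactor

class Spider1(scrapy.Spider):
    # hypothetical first spider: records url2 if it shows up while parsing url1
    name = "spider1"
    start_urls = ['http://example.com/url1']

    def __init__(self, found=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.found = found if found is not None else []

    def parse(self, response):
        link = response.css('a.next::attr(href)').get()  # hypothetical selector for url2
        if link:
            self.found.append(response.urljoin(link))
        yield {'page': response.url}  # items still go through the project pipeline

class Spider2(scrapy.Spider):
    # hypothetical second spider: crawls whatever URL it is given
    name = "spider2"

    def __init__(self, start_url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = [start_url] if start_url else []

    def parse(self, response):
        yield {'title': response.css('title::text').get()}

configure_logging()
runner = CrawlerRunner(get_project_settings())
found = []

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(Spider1, found=found)              # run spider 1 first
    if found:                                             # only if url2 appeared
        yield runner.crawl(Spider2, start_url=found[0])
    reactor.stop()

crawl()
reactor.run()  # blocks until both crawls have finished

Keyword arguments passed to runner.crawl() are forwarded to the spider's constructor, which is how the shared list and the discovered URL get into the spiders.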
這篇關(guān)于是否可以從 Scrapy spider 運行另一個蜘蛛?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!