Problem description
I have a problem and am having a hard time finding the ideal solution. To explain it better, I'll describe my scenario here.
I have a server that will receive orders from several clients. Each client will submit a set of recurring tasks that should be executed at specified intervals, e.g. client A submits task AA, which should be executed every minute between 2009-12-31 and 2010-12-31. If my math is right, that's about 525,600 operations in a year (365 × 24 × 60). With more clients and tasks it would be infeasible to let the server process all of them, so I came up with the idea of worker machines. The server will be developed in PHP.
Worker machines are just regular, cheap Windows-based computers that I'll host at home or at my workplace. Each worker will have a dedicated Internet connection (with a dynamic IP) and a UPS to ride out power outages. Each worker will query the server every 30 seconds or so via web service calls, fetch the next pending job and process it. Once the job is completed, the worker will submit the output to the server and request a new job, and so on ad infinitum. If the system needs to scale, I should only have to set up a new worker and the whole thing should keep running seamlessly. The worker client will be developed in PHP or Python.
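To make the polling cycle concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fetch_job`, `process` and `submit` are hypothetical callables standing in for the web service calls and the actual work, and a job is assumed to be a dict with an `id` key. Because the worker always initiates the connection, its dynamic IP does not matter to the server.

```python
import time
from typing import Callable, Optional


def run_worker(
    fetch_job: Callable[[], Optional[dict]],
    process: Callable[[dict], dict],
    submit: Callable[[int, dict], None],
    poll_interval: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for jobs until max_polls is reached; return the number of jobs done.

    fetch_job, process and submit are hypothetical stand-ins for the web
    service calls; max_polls=None means "run forever".
    """
    done = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()
        if job is None:
            # Nothing pending: wait before asking the server again.
            sleep(poll_interval)
            continue
        # Process the job, submit its output, then ask for the next one
        # right away instead of sleeping.
        submit(job["id"], process(job))
        done += 1
    return done
```

Sleeping only when the queue is empty keeps a busy worker saturated while an idle one costs the server one request per interval.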
At any given time my clients should be able to log on to the server and check the status of the tasks they ordered.
Now the tricky part:
- I must be able to rebuild the already processed tasks if for some reason the server goes down.
- Workers are not client-specific; one worker should process jobs for any number of clients.
I have some doubts regarding the general database design and which technologies to use.
Originally I thought of using several SQLite databases and joining them all on the server but I can't figure out how I would group by clients to generate the job reports.
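For comparison, here is a sketch of what the per-client report could look like if all jobs lived in one central table instead of several SQLite databases. That is a different design from the one described above, and the table and column names (`jobs`, `client_id`, `task_id`, `run_at`, `status`) are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        id        INTEGER PRIMARY KEY,
        client_id TEXT NOT NULL,
        task_id   TEXT NOT NULL,
        run_at    INTEGER NOT NULL,   -- Unix timestamp the job was due
        status    TEXT NOT NULL       -- 'pending', 'done' or 'failed'
    )
    """
)
# Made-up sample rows, only here to give the report something to count.
conn.executemany(
    "INSERT INTO jobs (client_id, task_id, run_at, status) VALUES (?, ?, ?, ?)",
    [
        ("A", "AA", 60, "done"),
        ("A", "AA", 120, "done"),
        ("A", "AA", 180, "pending"),
        ("B", "BA", 60, "failed"),
    ],
)


def job_report(conn):
    """Return {client_id: {status: count}} for every client."""
    report = {}
    rows = conn.execute(
        "SELECT client_id, status, COUNT(*) FROM jobs GROUP BY client_id, status"
    )
    for client_id, status, count in rows:
        report.setdefault(client_id, {})[status] = count
    return report
```

With a `client_id` column on every job row, grouping by client is a single `GROUP BY`, and workers stay client-agnostic because they only ever see job rows.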
I've never actually worked with any of the following technologies: memcached, CouchDB, Hadoop and the like. I would like to know whether any of these is suitable for my problem and, if so, which one you would recommend for a newbie in "distributed computing" (or is this parallel?) like me. Please keep in mind that the workers have dynamic IPs.
As I said before, I'm also having trouble with the general database design, partly because I still haven't chosen a particular R(D)DBMS. One issue I have, which I think is independent of the DBMS I choose, concerns the queuing system. Should I precalculate all the absolute timestamps for a specific job, keep that large set of timestamps, and execute and flag them as complete in ascending order? Or should I have a more clever system like "when timestamp modulus 60 == 0 -> execute"? The problem with this "clever" system is that some jobs will not be executed in the order they should be, because some workers could be sitting idle while others are overloaded. What do you suggest?
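One possible middle ground between the two options, sketched in Python with SQLite: store only the next due timestamp per task and advance it by the interval each time a worker claims a run. This is an illustration under stated assumptions, not a recommendation from anyone in this thread; the `tasks` table, its columns and the `claim_next_due` helper are all hypothetical names.

```python
import sqlite3


def make_db():
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions can be opened and closed explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        """
        CREATE TABLE tasks (
            id         TEXT PRIMARY KEY,
            interval_s INTEGER NOT NULL,  -- seconds between runs
            next_run   INTEGER NOT NULL,  -- Unix timestamp of the next due run
            end_at     INTEGER NOT NULL   -- no runs are due after this
        )
        """
    )
    return conn


def claim_next_due(conn, now):
    """Claim the most overdue run; return (task_id, due_timestamp) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, interval_s, next_run FROM tasks "
            "WHERE next_run <= ? AND next_run <= end_at "
            "ORDER BY next_run, id LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            task_id, interval_s, due = row
            # Advance the schedule so the next caller gets a different run.
            conn.execute(
                "UPDATE tasks SET next_run = ? WHERE id = ?",
                (due + interval_s, task_id),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], row[2])
```

Each task needs one row rather than one row per minute, and overdue runs are handed out oldest first, so an idle worker picks up whatever a busy one has not reached yet. Recording each claimed run in a separate results table would still be needed for status reports and for rebuilding after a failure.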
PS: I'm not sure whether the title and tags of this question properly reflect my problem and what I'm trying to do; if not, please edit them accordingly.
Thanks for your input!
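@timdev:
- The input will be a very small JSON-encoded string; the output will also be a JSON-encoded string, but a bit larger (around 1-5 KB).
- The output will be computed using several resources available on the Web, so the main bottleneck will probably be bandwidth. Database writes may also be one, depending on the R(D)DBMS.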
Recommended answer
It looks like you're on the verge of recreating Gearman. Here's the introduction for Gearman:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
You can write both your client and the back-end worker code in PHP.
Re your question about a Gearman Server compiled for Windows: I don't think it's available in a neat package pre-built for Windows. Gearman is still a fairly young project and they may not have matured to the point of producing ready-to-run distributions for Windows.
Sun/MySQL employees Eric Day and Brian Aker gave a tutorial for Gearman at OSCON in July 2009, but their slides mention only Linux packages.
Here's a link to the Perl CPAN Testers project that indicates that Gearman-Server can be built on Win32 using the Microsoft C compiler (cl.exe), and it passes tests: http://www.nntp.perl.org/group/perl.cpan.testers/2009/10/msg5521569.html But I'd guess you have to download the source code and build it yourself.