Problem description
I'm working on a web app that is somewhere between an email service and a social network. I feel it has the potential to grow really big in the future, so I'm concerned about scalability.
Instead of using one centralized MySQL/InnoDB database and then partitioning it when that time comes, I've decided to create a separate SQLite database for each active user: one active user per 'shard'.
That way backing up the database would be as easy as copying each user's small database file to a remote location once a day.
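One caveat with a plain file copy: if a write is in progress while the file is being copied, the copy can capture an inconsistent state. A minimal sketch of a safer per-user backup, using SQLite's online backup API through Python's `sqlite3` module, might look like this. The function name `backup_user_db` and the directory layout are assumptions made for illustration, not part of the original design.

```python
import sqlite3
from pathlib import Path


def backup_user_db(src_path, dest_dir):
    """Copy one user's database into dest_dir via SQLite's online backup API.

    Hypothetical helper: the name and the one-file-per-user layout are
    illustrative assumptions.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_path = dest_dir / Path(src_path).name

    src = sqlite3.connect(str(src_path))
    dst = sqlite3.connect(str(dest_path))
    try:
        # Connection.backup copies the database page by page and produces
        # a consistent snapshot even if other connections are using it.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return dest_path
```

The resulting file in `dest_dir` is an ordinary SQLite database, so it can then be shipped to a remote location with whatever file transfer tool is already in use.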
Scaling up will be as easy as adding extra hard disks to store the new files.
When the app grows beyond a single server I can link the servers together at the filesystem level using GlusterFS and run the app unchanged, or rig up a simple SQLite proxy system that will allow each server to manipulate sqlite files in adjacent servers.
Concurrency issues will be minimal because each HTTP request will only touch one or two database files at a time, out of thousands, and SQLite's locking is per database file anyway.
I'm betting that this approach will allow my app to scale gracefully and support lots of cool and unique features. Am I betting wrong? Am I missing anything?
UPDATE I decided to go with a less extreme solution, which is working fine so far. I'm using a fixed number of shards - 256 sqlite databases, to be precise. Each user is assigned and bound to a random shard by a simple hash function.
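The post does not say which hash function is used, so the following is only a sketch of how a user could be mapped to one of 256 shards. SHA-256 is chosen here because Python's built-in `hash()` is randomized per process for strings and would not give a stable assignment. The names `shard_for_user` and `shard_path`, and the `shard_NNN.db` file naming, are assumptions.

```python
import hashlib
import os

NUM_SHARDS = 256  # fixed shard count, as described in the update


def shard_for_user(user_id):
    """Map a user id to a stable shard number in the range 0..NUM_SHARDS-1."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 4 bytes as an integer; the same id always lands on
    # the same shard, across processes and machines.
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def shard_path(root, user_id):
    """Return the (hypothetical) database file path for a user's shard."""
    return os.path.join(root, "shard_%03d.db" % shard_for_user(user_id))
```

Because the assignment depends on `NUM_SHARDS`, changing the shard count later would move most users to a different shard, which is one reason to fix the count up front.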
Most features of my app require access to just one or two shards per request, but there is one in particular that requires the execution of a simple query on 10 to 100 different shards out of 256, depending on the user. Tests indicate it would take about 0.02 seconds, or less, if all the data is cached in RAM. I think I can live with that!
UPDATE 2.0 I ported the app to MySQL/InnoDB and was able to get about the same performance for regular requests, but for that one request that requires shard walking, InnoDB is 4-5 times faster. For this reason, and others, I'm dropping this architecture, but I hope someone somewhere finds a use for it... thanks.
Recommended answer
The place where this will fail is if you have to do what's called "shard walking" - which is finding out all the data across a bunch of different users. That particular kind of "query" will have to be done programmatically, asking each of the SQLite databases in turn - and will very likely be the slowest aspect of your site. It's a common issue in any system where data has been "sharded" into separate databases.
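To make the cost concrete, here is a rough sketch of what shard walking looks like in application code: the same query is run against each shard file in turn and the results are merged by hand. The function name `walk_shards` is an assumption, and the sketch assumes every shard shares the same schema.

```python
import sqlite3


def walk_shards(shard_paths, sql, params=()):
    """Run the same query against each shard in turn and merge the rows.

    Hypothetical helper: each shard costs one connection and one query,
    so the total time grows with the number of shards visited.
    """
    rows = []
    for path in shard_paths:
        conn = sqlite3.connect(path)
        try:
            rows.extend(conn.execute(sql, params).fetchall())
        finally:
            conn.close()
    return rows
```

Anything that a single database would do in one statement across all users, such as `ORDER BY`, `LIMIT`, or aggregates, has to be reapplied to the merged list in the application, for example by calling `sorted()` on the result.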
If all of the data is self-contained to the user, then this should scale pretty well. The key to making this an effective design is to know how the data is likely to be used and whether data from one person will interact with data from another (in your context).
You may also need to watch out for file system resources - SQLite is great, awesome, fast, etc - but you do get some caching and writing benefits when using a "standard database" (i.e. MySQL, PostgreSQL, etc) because of how they're designed. In your proposed design, you'll be missing out on some of that.