Problem description
I am using MaxMind free databases to do IP lookups. I convert the data to the following table:
CREATE TABLE [dbo].[GeoBlocks](
[StartIPNum] [varchar](50) NULL,
[EndIPNumb] [varchar](50) NULL,
[LocationNum] [varchar](50) NULL,
[PostalCode] [varchar](50) NULL,
[Latitude] [varchar](50) NULL,
[Longitude] [varchar](50) NULL)
There are about 3.5M records in this lookup table.
My goal is to determine the LocationNum for an IP address (in decimal form) by finding the record where the IP is between StartIPNum and EndIPNumb.
My stored procedure looks like this (parameter: @DecimalIP bigint):
select GeoBlocks.StartIPNum ,@DecimalIP as DecimalIp
,GeoBlocks.PostalCode ,GeoBlocks.Latitude as Latitude
,GeoBlocks.Longitude as Longitude
from GeoBlocks
where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb
I have created unique indexes on StartIPNum and EndIPNumb.
However, when I run this, SQL Server does a table scan for the WHERE portion of the query. This query takes 650-750 ms. (Most queries on my server take 0-2 ms.)
How can I speed up this query?
Adding sample data:
StartIPNum  EndIPNumb   LocationNum  PostalCode  Latitude  Longitude
1350218632  1350218639  2782113                  48.2000   16.3667
1350218640  1350218655  2782113                  48.2000   16.3667
1350218656  1350218687  2782113                  48.2000   16.3667
1350218688  1350218751  2782113                  48.2000   16.3667
1350218752  1350218783  2782113                  48.2000   16.3667
Recommended answer
Update:
To summarize information scattered among various comments:
The IP address columns are VarChar(50) strings containing decimal values without left padding. An index on those columns sorts them alphabetically, not numerically; e.g. "10" < "2". (With left padding, the sort is numerically correct as well: "10" > "02".)
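A quick way to see the difference is to sort the same decimal values as strings and as integers. The Python sketch below only illustrates string versus numeric ordering; the padding width of 10 passed to `zfill` is an assumption, chosen because the largest IPv4 value, 4294967295, has ten digits.

```python
# Decimal IP values stored as unpadded strings, as in the VarChar(50) columns.
values = ["10", "2", "1350218632", "999"]

# String (alphabetical) order compares character by character: "10" < "2".
alphabetical = sorted(values)

# Numeric order is what an IP range lookup actually needs.
numeric = sorted(values, key=int)

# Left-padding every value to the same width (10 digits covers 4294967295,
# the largest IPv4 address) makes alphabetical order match numeric order.
padded = sorted(v.zfill(10) for v in values)

print(alphabetical)  # ['10', '1350218632', '2', '999']
print(numeric)       # ['2', '10', '999', '1350218632']
print(padded)        # ['0000000002', '0000000010', '0000000999', '1350218632']
```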
The WHERE clause (where @DecimalIP between GeoBlocks.StartIPNum and GeoBlocks.EndIPNumb) uses mixed data types: @DecimalIP is a BIGINT, while the two columns are VarChar(50). SQL Server handles operations among mixed data types by applying its data type precedence rules (see "Data Type Precedence" in the SQL Server documentation). Because BIGINT has higher precedence than VarChar, the IP addresses in each row are converted from strings to BIGINT values. The comparison is therefore done numerically and the "expected" results are returned, but at considerable cost: the conversion has to be applied to every row, so the indexes are (all but) useless in this case.
Changing the columns to BIGINT will allow an index to be used, improving performance and ensuring that comparisons are done numerically rather than alphabetically. A single index containing both the StartIPNum and EndIPNumb columns will greatly improve performance. Note that if overlapping address ranges are not allowed, the index will effectively be unique on StartIPNum, and for performance it could be replaced with an index on StartIPNum that has EndIPNumb as an included column.
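The Python sketch below uses SQLite (through the standard `sqlite3` module) as a stand-in for SQL Server, with the question's sample rows stored as integers. It assumes that ranges do not overlap. Instead of BETWEEN, it looks for the last range whose StartIPNum is less than or equal to the address, which an index on StartIPNum can answer with a single seek, and then checks EndIPNumb to reject addresses that fall into a gap. This query shape is a commonly used technique rather than something stated in the answer; in T-SQL it would be written along the lines of `select top (1) ... where StartIPNum <= @DecimalIP order by StartIPNum desc`. The `lookup` helper and the index name are hypothetical.

```python
import sqlite3

# In-memory stand-in for the GeoBlocks table, with the IP columns as integers
# (SQLite's INTEGER plays the role of SQL Server's BIGINT here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GeoBlocks (StartIPNum INTEGER, EndIPNumb INTEGER,"
    " LocationNum INTEGER, Latitude REAL, Longitude REAL)"
)
# Hypothetical index on StartIPNum only; EndIPNumb is read from the matched row.
conn.execute("CREATE UNIQUE INDEX IX_GeoBlocks_Start ON GeoBlocks (StartIPNum)")

# Sample rows from the question (PostalCode omitted because it is empty).
rows = [
    (1350218632, 1350218639, 2782113, 48.2000, 16.3667),
    (1350218640, 1350218655, 2782113, 48.2000, 16.3667),
    (1350218656, 1350218687, 2782113, 48.2000, 16.3667),
    (1350218688, 1350218751, 2782113, 48.2000, 16.3667),
    (1350218752, 1350218783, 2782113, 48.2000, 16.3667),
]
conn.executemany("INSERT INTO GeoBlocks VALUES (?, ?, ?, ?, ?)", rows)


def lookup(decimal_ip):
    """Return (StartIPNum, LocationNum) for the range containing decimal_ip, or None.

    Assumes non-overlapping ranges: the only candidate is the row with the
    largest StartIPNum <= decimal_ip.
    """
    row = conn.execute(
        "SELECT StartIPNum, EndIPNumb, LocationNum FROM GeoBlocks"
        " WHERE StartIPNum <= ? ORDER BY StartIPNum DESC LIMIT 1",
        (decimal_ip,),
    ).fetchone()
    # Reject addresses that lie past the end of the candidate range.
    if row is None or row[1] < decimal_ip:
        return None
    return row[0], row[2]


print(lookup(1350218645))  # (1350218640, 2782113)
print(lookup(1350218800))  # None: past the end of the last sample range
```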
Original answer:
If you are using IPv4 addresses in dotted notation, e.g. "192.168.0.42", you can convert the strings into BIGINT values with this UDF:
create function [dbo].[IntegerIPV4Address]( @IPV4Address VarChar(16) )
returns BigInt
with SchemaBinding -- Schema binding lets SQL Server treat the function as deterministic.
as
begin
  -- Positions of the three dots in "a.b.c.d".
  declare @Dot1 as Int = CharIndex( '.', @IPV4Address );
  declare @Dot2 as Int = CharIndex( '.', @IPV4Address, @Dot1 + 1 );
  declare @Dot3 as Int = CharIndex( '.', @IPV4Address, @Dot2 + 1 );
  -- Weight the four octets by 256^3, 256^2, 256 and 1.
  return Cast( Substring( @IPV4Address, 1, @Dot1 - 1 ) as BigInt ) * 16777216 +
    Cast( Substring( @IPV4Address, @Dot1 + 1, @Dot2 - @Dot1 - 1 ) as BigInt ) * 65536 +
    Cast( Substring( @IPV4Address, @Dot2 + 1, @Dot3 - @Dot2 - 1 ) as BigInt ) * 256 +
    Cast( Substring( @IPV4Address, @Dot3 + 1, Len( @IPV4Address ) - @Dot3 ) as BigInt );
end
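To check the arithmetic outside SQL Server, here is a Python mirror of the conversion above. The name `integer_ipv4_address` is hypothetical; like the UDF, it does not validate its input. The standard `ipaddress` module is used only to cross-check the result.

```python
import ipaddress


def integer_ipv4_address(dotted):
    """Python mirror of the IntegerIPV4Address UDF: 'a.b.c.d' -> integer."""
    # Split on the three dots, as the UDF does with CharIndex/Substring.
    a, b, c, d = (int(part) for part in dotted.split("."))
    # Octet weights 256**3, 256**2, 256 and 1, as in the UDF.
    return a * 16777216 + b * 65536 + c * 256 + d


value = integer_ipv4_address("192.168.0.42")
print(value)  # 3232235562

# Cross-check against the standard library's conversion.
assert value == int(ipaddress.IPv4Address("192.168.0.42"))
```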
You can either store the integer values or create an index on a computed column based on the function's result. Note that you need to change your query to reference the integer column in the WHERE clause.
If you store the values as integers, the following function will convert them back to normalized strings in which each part of the address is three digits. These values can be used in comparisons since they sort the same way both alphabetically and numerically.
create function [dbo].[NormalizedIPV4Address]( @IntegerIPV4Address as BigInt )
returns VarChar(16)
with SchemaBinding -- Deterministic function.
begin
-- Casting to VarBinary(4) keeps the four low-order bytes, one byte per octet.
declare @BinaryAddress as VarBinary(4) = Cast( @IntegerIPV4Address as VarBinary(4) );
return Right( '00' + Cast( Cast( Substring( @BinaryAddress, 1, 1 ) as Int ) as VarChar(3) ), 3 ) +
'.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 2, 1 ) as Int ) as VarChar(3) ), 3 ) +
'.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 3, 1 ) as Int ) as VarChar(3) ), 3 ) +
'.' + Right( '00' + Cast( Cast( Substring( @BinaryAddress, 4, 1 ) as Int ) as VarChar(3) ), 3 )
end
You could round-trip the string values in your table to get them all into "normalized" form so that they sort correctly by using both functions. Not an ideal solution since it requires that all future inserts and updates be normalized, but it may help for the moment.
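The Python sketch below mirrors both UDFs to show the round trip on a few made-up addresses. The function names are hypothetical, and `to_bytes(8, "big")[-4:]` is assumed to match SQL Server's behavior of keeping the rightmost bytes when a BIGINT is cast to VarBinary(4); it also assumes 0 <= value < 2**63.

```python
def integer_ipv4_address(dotted):
    """'a.b.c.d' -> integer, mirroring the IntegerIPV4Address UDF."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return a * 16777216 + b * 65536 + c * 256 + d


def normalized_ipv4_address(value):
    """Integer -> 'aaa.bbb.ccc.ddd', mirroring the NormalizedIPV4Address UDF."""
    # Keep the low four bytes, most significant first.
    octets = value.to_bytes(8, "big")[-4:]
    # Right('00' + ..., 3) in the UDF amounts to three-digit zero padding.
    return ".".join(f"{octet:03d}" for octet in octets)


addresses = ["10.0.0.1", "9.255.255.255", "192.168.0.42", "192.168.0.5"]

# Unnormalized strings sort incorrectly: ".42" comes before ".5", "9." comes last.
print(sorted(addresses))
# ['10.0.0.1', '192.168.0.42', '192.168.0.5', '9.255.255.255']

# Round-trip each string through the integer form to normalize it.
normalized = [normalized_ipv4_address(integer_ipv4_address(a)) for a in addresses]
print(normalized)
# ['010.000.000.001', '009.255.255.255', '192.168.000.042', '192.168.000.005']

# Normalized strings sort in the same order as the integers they represent.
assert sorted(normalized) == [
    normalized_ipv4_address(n) for n in sorted(integer_ipv4_address(a) for a in addresses)
]
```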
這篇關(guān)于在搜索之間加速的文章就介紹到這了,希望我們推薦的答案對(duì)大家有所幫助,也希望大家多多支持html5模板網(wǎng)!