HBase RowKey Characteristics and Design

Ordered set

Lexicographic order from the leading byte

scan 'test:demo'
1                                          column=d:v, timestamp=1452261353304, value=1
10                                         column=d:v, timestamp=1452261353304, value=10
11                                         column=d:v, timestamp=1452261353304, value=11
2                                          column=d:v, timestamp=1452261353304, value=2
3                                          column=d:v, timestamp=1452261353304, value=3
4                                          column=d:v, timestamp=1452261353304, value=4
5                                          column=d:v, timestamp=1452261353304, value=5
6                                          column=d:v, timestamp=1452261353304, value=6
7                                          column=d:v, timestamp=1452261353304, value=7
8                                          column=d:v, timestamp=1452261353304, value=8
9                                          column=d:v, timestamp=1452261353304, value=9
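
The ordering above can be reproduced outside HBase. This is a minimal Python sketch, assuming RowKeys are compared as raw byte strings from the leftmost byte; the variable names are illustrative only.

```python
# RowKeys are compared byte by byte, starting from the leftmost byte.
keys = [str(i).encode("ascii") for i in range(1, 12)]  # b"1" .. b"11"

# Python compares bytes objects lexicographically, the same way the
# scan output above is ordered.
ordered = sorted(keys)
print([k.decode() for k in ordered])
# ['1', '10', '11', '2', '3', '4', '5', '6', '7', '8', '9']
```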

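Left padding

For example, str_pad($i, 4, '0', STR_PAD_LEFT). The width must be estimated in advance and the padding wastes some space, but the keys remain readable.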

0001                                       column=d:v, timestamp=1452261782358, value=1
0002                                       column=d:v, timestamp=1452261782358, value=2
0003                                       column=d:v, timestamp=1452261782358, value=3
0004                                       column=d:v, timestamp=1452261782358, value=4
0005                                       column=d:v, timestamp=1452261782358, value=5
0006                                       column=d:v, timestamp=1452261782358, value=6
0007                                       column=d:v, timestamp=1452261782358, value=7
0008                                       column=d:v, timestamp=1452261782358, value=8
0009                                       column=d:v, timestamp=1452261782358, value=9
0010                                       column=d:v, timestamp=1452261782358, value=10
0011                                       column=d:v, timestamp=1452261782358, value=11
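
A rough Python equivalent of the padding step, as a sketch only. The width of 4 and the helper name padded_key are assumptions for illustration; the helper rejects values that would overflow the chosen width, because such keys would break the ordering.

```python
WIDTH = 4  # assumed maximum number of digits; must be chosen up front

def padded_key(i: int, width: int = WIDTH) -> bytes:
    """Left-pad a non-negative integer with zeros, like str_pad in PHP."""
    s = str(i).zfill(width)
    if len(s) > width:
        raise ValueError(f"{i} does not fit in {width} digits")
    return s.encode("ascii")

keys = [padded_key(i) for i in range(1, 12)]
assert sorted(keys) == keys  # byte order now matches numeric order
print(keys[0], keys[-1])     # b'0001' b'0011'
```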

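Byte array

For example, pack('N', $i) produces a fixed 4-byte, big-endian unsigned integer. The keys sort numerically but are hard to read. Four bytes cover 0 to 4294967295 only, not PHP_INT_MAX = 9223372036854775807: a larger value keeps just its low 32 bits, as the \xFF\xFF\xFF\xFF row below shows. An 8-byte key needs pack('J', $i), available since PHP 5.6.3.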

\x00\x00\x00\x01                           column=d:v, timestamp=1452262267583, value=1
\x00\x00\x00\x02                           column=d:v, timestamp=1452262267583, value=2
\x00\x00\x00\x03                           column=d:v, timestamp=1452262267583, value=3
\x00\x00\x00\x04                           column=d:v, timestamp=1452262267583, value=4
\x00\x00\x00\x05                           column=d:v, timestamp=1452262267583, value=5
\x00\x00\x00\x06                           column=d:v, timestamp=1452262267583, value=6
\x00\x00\x00\x07                           column=d:v, timestamp=1452262267583, value=7
\x00\x00\x00\x08                           column=d:v, timestamp=1452262267583, value=8
\x00\x00\x00\x09                           column=d:v, timestamp=1452262267583, value=9
\x00\x00\x00\x0A                           column=d:v, timestamp=1452262267583, value=10
\x00\x00\x00\x0B                           column=d:v, timestamp=1452262267583, value=11
\xFF\xFF\xFF\xFF                           column=d:v, timestamp=1452262267583, value=9223372036854775807
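
The same encoding can be sketched in Python with the standard struct module, where ">I" is a 4-byte big-endian unsigned integer (the layout of PHP's 'N') and ">Q" is the 8-byte variant. The helper names key32 and key64 are illustrative. Unlike the PHP call, struct.pack raises an error for out-of-range values, so the truncation is made explicit with a mask.

```python
import struct

def key32(i: int) -> bytes:
    """4-byte big-endian unsigned key, the same layout as PHP pack('N', $i)."""
    return struct.pack(">I", i)  # raises struct.error outside 0..2**32-1

def key64(i: int) -> bytes:
    """8-byte big-endian unsigned key for values beyond 32 bits."""
    return struct.pack(">Q", i)

keys = [key32(i) for i in range(1, 12)]
assert sorted(keys) == keys          # byte order matches numeric order
print(key32(10))                     # b'\x00\x00\x00\n' (0x0A prints as \n)

big = 9223372036854775807            # 2**63 - 1
print(key32(big & 0xFFFFFFFF))       # b'\xff\xff\xff\xff' (low 32 bits only)
print(key64(big))                    # b'\x7f\xff\xff\xff\xff\xff\xff\xff'
```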

Queries

// get
HBase shell: get 'test:demo', "\x00\x00\x00\x05"
PHP: $client->get($table, pack('N', 5), 'd:v', array());
// scan
scan 'test:demo', {LIMIT=>3, STARTROW=>"\x00\x00\x00\x05"}
scan 'test:demo', {LIMIT=>3, FILTER=>"RowFilter(=, 'binary:\x00\x00\x00\x0A')"}
PHP:

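Prefix matching

String RowKeys can be matched by prefix with PrefixFilter; binary RowKeys can be matched over a range with comparison operators in RowFilter.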

scan 'test:demo'
10                                         column=d:v, timestamp=1452264128708, value=10
11                                         column=d:v, timestamp=1452264128708, value=11
12                                         column=d:v, timestamp=1452264128708, value=12
13                                         column=d:v, timestamp=1452264128708, value=13
20                                         column=d:v, timestamp=1452264128708, value=20
21                                         column=d:v, timestamp=1452264128708, value=21
22                                         column=d:v, timestamp=1452264128708, value=22
23                                         column=d:v, timestamp=1452264128708, value=23
30                                         column=d:v, timestamp=1452264128708, value=30
scan 'test:demo', {FILTER=>"PrefixFilter('2')"}
20                                         column=d:v, timestamp=1452264128708, value=20
21                                         column=d:v, timestamp=1452264128708, value=21
22                                         column=d:v, timestamp=1452264128708, value=22
23                                         column=d:v, timestamp=1452264128708, value=23
scan 'test:demo'
\x00\x00\x00\x0A                           column=d:v, timestamp=1452264274718, value=10
\x00\x00\x00\x0B                           column=d:v, timestamp=1452264274718, value=11
\x00\x00\x00\x0C                           column=d:v, timestamp=1452264274718, value=12
\x00\x00\x00\x0D                           column=d:v, timestamp=1452264274718, value=13
\x00\x00\x00\x14                           column=d:v, timestamp=1452264274718, value=20
\x00\x00\x00\x15                           column=d:v, timestamp=1452264274718, value=21
\x00\x00\x00\x16                           column=d:v, timestamp=1452264274718, value=22
\x00\x00\x00\x17                           column=d:v, timestamp=1452264274718, value=23
\x00\x00\x00\x1E                           column=d:v, timestamp=1452264274718, value=30
scan 'test:demo', {FILTER=>"RowFilter(>=, 'binary:\x00\x00\x00\x14') AND RowFilter(<, 'binary:\x00\x00\x00\x1E')"}
\x00\x00\x00\x14                                     column=d:v, timestamp=1452475162689, value=20
\x00\x00\x00\x15                                     column=d:v, timestamp=1452475162689, value=21
\x00\x00\x00\x16                                     column=d:v, timestamp=1452475162689, value=22
\x00\x00\x00\x17                                     column=d:v, timestamp=1452475162689, value=23
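
To make the two approaches concrete, the sketch below emulates them in plain Python over in-memory keys. It does not talk to HBase; prefix_scan and range_scan are hypothetical helpers that only mimic what PrefixFilter and the pair of RowFilter comparisons select.

```python
import struct

def prefix_scan(keys, prefix: bytes):
    """Emulate PrefixFilter: keep keys that start with the given bytes."""
    return [k for k in sorted(keys) if k.startswith(prefix)]

def range_scan(keys, start: bytes, stop: bytes):
    """Emulate RowFilter(>=, start) AND RowFilter(<, stop) on byte order."""
    return [k for k in sorted(keys) if start <= k < stop]

values = [10, 11, 12, 13, 20, 21, 22, 23, 30]
string_keys = [str(v).encode("ascii") for v in values]
binary_keys = [struct.pack(">I", v) for v in values]

print(prefix_scan(string_keys, b"2"))
# [b'20', b'21', b'22', b'23']
hits = range_scan(binary_keys, struct.pack(">I", 20), struct.pack(">I", 30))
print([struct.unpack(">I", k)[0] for k in hits])
# [20, 21, 22, 23]
```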

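Reverse timestamp

RowKey = PHP_INT_MAX - timestamp, so the newest rows sort first. The keys below correspond to the 32-bit value PHP_INT_MAX = 2147483647; for example, 2147483647 - 1452553200 = 694930447 for 2016-01-11 23:00:00.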

scan 'test:demo'
694930447                                            column=d:v, timestamp=1452494595766, value=2016-01-11 23:00:00
694934047                                            column=d:v, timestamp=1452494595766, value=2016-01-11 22:00:00
694937647                                            column=d:v, timestamp=1452494595766, value=2016-01-11 21:00:00
694941247                                            column=d:v, timestamp=1452494595766, value=2016-01-11 20:00:00
694944847                                            column=d:v, timestamp=1452494595766, value=2016-01-11 19:00:00
694948447                                            column=d:v, timestamp=1452494595766, value=2016-01-11 18:00:00
695016847                                            column=d:v, timestamp=1452494595766, value=2016-01-10 23:00:00
695020447                                            column=d:v, timestamp=1452494595766, value=2016-01-10 22:00:00
695103247                                            column=d:v, timestamp=1452494595766, value=2016-01-09 23:00:00
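
The arithmetic behind these keys can be checked with a short Python sketch. The constant 2147483647 is inferred from the keys shown above; reverse_key is an illustrative name.

```python
MAX_TS = 2147483647  # 2**31 - 1, the constant implied by the keys above

def reverse_key(ts: int) -> bytes:
    """Newer timestamps get smaller keys, so they sort first."""
    return str(MAX_TS - ts).encode("ascii")

# Unix timestamps for 21:00, 22:00 and 23:00 on 2016-01-11 (UTC)
stamps = [1452546000, 1452549600, 1452553200]
keys = sorted(reverse_key(ts) for ts in stamps)
print(keys)
# [b'694930447', b'694934047', b'694937647']
```

Decimal keys like these only sort correctly while they all have the same number of digits; otherwise they would need left padding or a fixed-width binary encoding.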

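RowKey hashing

Hashing the key makes RowKeys effectively random. This is a good sharding strategy: writes spread evenly across regions and random reads work well, but range queries are no longer possible.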


scan 'test:demo'                                                                                                                                          
 05dd7456c96cfe634ddde71643005aea                     column=d:v, timestamp=1452494789054, value=2016-01-11 19:00:00                                                                                               
 73ba01bbd958dc1ee21a77f90c093201                     column=d:v, timestamp=1452494789054, value=2016-01-11 22:00:00                                                                                               
 997911d73a22d9b66bba1b1eaf57585b                     column=d:v, timestamp=1452494789054, value=2016-01-10 22:00:00                                                                                               
 a9b8bc97eb1857ec093729461bc82e39                     column=d:v, timestamp=1452494789054, value=2016-01-11 18:00:00                                                                                               
 af5a500358b045e0cb108706df74c277                     column=d:v, timestamp=1452494789054, value=2016-01-11 20:00:00                                                                                               
 b23a99f57d2f1ce4b2b38e61fc1ae297                     column=d:v, timestamp=1452494789054, value=2016-01-11 23:00:00                                                                                               
 cd2c5eecb54e873116f6a4f5cae597ca                     column=d:v, timestamp=1452494789054, value=2016-01-10 23:00:00                                                                                               
 df0ac01d686792a2084feaed87ef7da8                     column=d:v, timestamp=1452494789054, value=2016-01-11 21:00:00                                                                                               
 e3e1fcf4a89dfc0ec4657f507d8a651e                     column=d:v, timestamp=1452494789054, value=2016-01-09 23:00:00
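
A minimal sketch of hashed keys in Python, assuming an MD5 hex digest of the original key is used as the RowKey (the 32-character keys above have that shape, but the source does not name the hash function). hashed_key is an illustrative helper.

```python
import hashlib

def hashed_key(key: str) -> str:
    """Hex MD5 digest of the original key, used as the RowKey."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()

stamps = ["1452546000", "1452549600", "1452553200"]
for ts in stamps:
    # Adjacent inputs produce unrelated 32-character keys, so a scan
    # over a time range can no longer be expressed as a key range.
    print(hashed_key(ts), ts)
```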

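Salted hashing

Prefix the key with a salt: RowKey = (md5(key) % splitNum) . '#' . key. Rows are spread over splitNum buckets while staying ordered inside each bucket, so a range query becomes one scan per bucket. Note that PHP's md5() returns a hex string, and applying % to it directly coerces only its leading decimal digits to an integer (0 when the digest starts with a letter); this would explain why six of the nine rows below share bucket 0. Converting part of the digest first, for example hexdec(substr(md5($key), 0, 7)) % $splitNum, avoids the skew.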

scan 'test:demo'                                                                                                                                    
 0#1452380400                                         column=d:v, timestamp=1452494968030, value=2016-01-09 23:00:00                                                                                               
 0#1452466800                                         column=d:v, timestamp=1452494968030, value=2016-01-10 23:00:00                                                                                               
 0#1452535200                                         column=d:v, timestamp=1452494968030, value=2016-01-11 18:00:00                                                                                               
 0#1452542400                                         column=d:v, timestamp=1452494968030, value=2016-01-11 20:00:00                                                                                               
 0#1452546000                                         column=d:v, timestamp=1452494968030, value=2016-01-11 21:00:00                                                                                               
 0#1452553200                                         column=d:v, timestamp=1452494968030, value=2016-01-11 23:00:00                                                                                               
 1#1452463200                                         column=d:v, timestamp=1452494968030, value=2016-01-10 22:00:00                                                                                               
 3#1452549600                                         column=d:v, timestamp=1452494968030, value=2016-01-11 22:00:00                                                                                               
 5#1452538800                                         column=d:v, timestamp=1452494968030, value=2016-01-11 19:00:00
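
One way to compute a salt is sketched below in Python. It assumes 10 buckets and takes the first 32 bits of the MD5 digest as an integer before applying the modulo. Both choices, and the helper names, are assumptions for illustration, so the resulting prefixes will not necessarily match the output above.

```python
import hashlib

SPLIT_NUM = 10  # assumed number of buckets (e.g. pre-split regions)

def salted_key(key: str, split_num: int = SPLIT_NUM) -> str:
    """Prefix the key with a bucket number derived from its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % split_num  # first 32 bits of the digest
    return f"{bucket}#{key}"

def bucket_scan_starts(start_key: str, split_num: int = SPLIT_NUM):
    """A range read must fan out: one scan start row per bucket."""
    return [f"{b}#{start_key}" for b in range(split_num)]

print(salted_key("1452553200"))
print(bucket_scan_starts("1452500000")[:3])
# ['0#1452500000', '1#1452500000', '2#1452500000']
```

The trade-off is that writes are spread across buckets, while a time-range read costs one scan per bucket and the results have to be merged on the client.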