RGW S3 Data Verification
Shen Zhiwei, Ceph Open Source Community
Preface
With the arrival of the cloud era, demand for massive-scale storage has exploded, and AWS S3 stands at the forefront. For many small and medium vendors, an S3 service built on Ceph's RGW is the obvious choice. In this article I will briefly discuss data verification and multipart upload in S3.
Upload verification
Many readers will immediately think of using a hash to verify data integrity, and MD5 is the simplest and most common choice. In S3, each object carries an ETag metadata entry holding the object's MD5 (in non-multipart upload mode).
Generate a 1 MB file and compute its MD5:
```
[root@hz-ceph-01 mnt]# dd if=/dev/urandom of=1M.txt bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.103442 s, 10.1 MB/s
[root@hz-ceph-01 mnt]# md5sum 1M.txt
a16adff9e15406d413ccc5751a49d735  1M.txt
```
Upload the file to the test bucket with the s3cmd tool:
```
[root@hz-ceph-01 mnt]# s3cmd put 1M.txt s3://test
upload: '1M.txt' -> 's3://test/1M.txt'  [1 of 1]
 1048576 of 1048576   100% in    0s     4.80 MB/s  done
```
Inspect this object's metadata via radosgw-admin:
```
[root@hz-ceph-01 mnt]# radosgw-admin object stat --bucket=test --object=1M.txt
{
    "name": "1M.txt",
    "size": 1048576,
    "policy": {
        "acl": {
            "acl_user_map": [ { "user": "test", "acl": 15 } ],
            "acl_group_map": [],
            "grant_map": [
                {
                    "id": "test",
                    "grant": {
                        "type": { "type": 0 },
                        "id": "test",
                        "email": "",
                        "permission": { "flags": 15 },
                        "name": "just test",
                        "group": 0
                    }
                }
            ]
        },
        "owner": { "id": "test", "display_name": "just test" }
    },
    "etag": "a16adff9e15406d413ccc5751a49d735\u0000",        ## here it is ##
    "tag": "7473751f-f731-411d-ac91-0bc835035560.14099.138\u0000",
    "manifest": {
        "objs": [],
        "obj_size": 1048576,
        "explicit_objs": "false",
        "head_obj": {
            "bucket": {
                "name": "test",
                "pool": "default.rgw.buckets.data",
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_pool": "default.rgw.buckets.index",
                "marker": "7473751f-f731-411d-ac91-0bc835035560.4201.1",
                "bucket_id": "7473751f-f731-411d-ac91-0bc835035560.4201.1",
                "tenant": ""
            },
            "key": "",
            "ns": "",
            "object": "1M.txt",
            "instance": "",
            "orig_obj": "1M.txt"
        },
        "head_size": 524288,
        "max_head_size": 524288,
        "prefix": ".Ysz22Hik0lKyuDDpvN-rJwUP9xsw8td_",
        "tail_bucket": {
            "name": "test",
            "pool": "default.rgw.buckets.data",
            "data_extra_pool": "default.rgw.buckets.non-ec",
            "index_pool": "default.rgw.buckets.index",
            "marker": "7473751f-f731-411d-ac91-0bc835035560.4201.1",
            "bucket_id": "7473751f-f731-411d-ac91-0bc835035560.4201.1",
            "tenant": ""
        },
        "rules": [
            {
                "key": 0,
                "val": {
                    "start_part_num": 0,
                    "start_ofs": 524288,
                    "part_size": 0,
                    "stripe_max_size": 4194304,
                    "override_prefix": ""
                }
            }
        ],
        "tail_instance": ""
    },
    "attrs": {
        "user.rgw.content_type": "application\/octet-stream\u0000",
        "user.rgw.pg_ver": "�\u0000\u0000\u0000\u0000\u0000\u0000\u0000",
        "user.rgw.source_zone": "\u0000\u0000\u0000\u0000",
        "user.rgw.x-amz-date": "Tue, 14 Mar 2017 15:07:48 +0000\u0000",
        "user.rgw.x-amz-meta-s3cmd-attrs": "uid:0\/gname:root\/uname:root\/gid:0\/mode:33188\/mtime:1489503916\/atime:1489503969\/md5:a16adff9e15406d413ccc5751a49d735\/ctime:1489503916\u0000",
        "user.rgw.x-amz-storage-class": "STANDARD\u0000"
    }
}
```

The output can be filtered with jq, like this:

```
radosgw-admin object stat --bucket=test --object=1M.txt | jq .etag
```
Comparing the two, they are essentially identical:

```
"etag": "a16adff9e15406d413ccc5751a49d735\u0000"
a16adff9e15406d413ccc5751a49d735  1M.txt
```

As for what the trailing \u0000 on the ETag means, I do not know yet; if any reader does, please help explain.
Next, let's look at verification in multipart uploads. For the detailed principle and flow of S3 multipart upload, please refer to the AWS S3 documentation.
Here is my brief summary of the upload flow:
1. When s3cmd put uploads a file larger than 15 MB, it uses multipart upload by default.
2. The client asks the S3 server for a multipart upload id; uploading parts, listing parts, and completing the upload all require this id.
3. Each part is uploaded with a part number (numbering starts at 1): the client slices the local file into chunks of the part size and PUTs each chunk to S3.
4. When a part finishes uploading, S3 returns a response whose HTTP headers carry the part's metadata, i.e. its ETag.
5. The client compares that ETag with the part's MD5 to decide whether the part arrived intact.
6. Steps 3, 4, and 5 are repeated until every part has been uploaded.
7. Finally the client sends a complete-upload request containing every part number and MD5; the S3 server then assembles the parts into a single object stored in the target bucket.
Note: steps 3, 4, and 5 can be run concurrently with multiple threads, but you have to implement that yourself.
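For readers who prefer an SDK over raw REST calls, the same flow maps directly onto boto3's multipart API. The following is a minimal sketch, not s3cmd's internals; the endpoint, credentials, bucket and key are placeholders, and the 15 MB part size simply matches the s3cmd default mentioned above:

```python
import boto3

# Placeholders: substitute your own RGW endpoint, credentials and names.
s3 = boto3.client('s3', endpoint_url='http://s3.example.com',
                  aws_access_key_id='ACCESS', aws_secret_access_key='SECRET')

bucket, key, part_size = 'test', 'etag', 15 * 1024 * 1024

# Step 2: ask the server for an upload id.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']

parts = []
with open('etag', 'rb') as f:
    number = 1
    while True:
        data = f.read(part_size)              # step 3: slice the file locally
        if not data:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=number,
                              UploadId=upload_id, Body=data)
        parts.append({'PartNumber': number,   # step 4: record the returned ETag
                      'ETag': resp['ETag']})
        number += 1

# Step 7: ask the server to assemble the parts into one object.
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={'Parts': parts})
```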
Multipart upload example
Create a 20 MB file named etag:
```
[root@hz-ceph-01 ~]# dd if=/dev/urandom of=etag bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 1.53815 s, 13.6 MB/s
```
Check the MD5 of etag:
```
[root@hz-ceph-01 ~]# md5sum etag
cbacae177df231e8b3714bf662a433d9  etag
```
Upload the etag file:
```
[root@hz-ceph-01 ~]# s3cmd put etag s3://test/
upload: 'etag' -> 's3://test/etag'  [part 1 of 2, 15MB] [1 of 1]    ## default part size is 15 MB ##
 15728640 of 15728640   100% in    2s     6.76 MB/s  done
upload: 'etag' -> 's3://test/etag'  [part 2 of 2, 5MB] [1 of 1]
 5242880 of 5242880   100% in    1s     3.74 MB/s  done
```
Fetch the object's metadata (just the etag field):
```
[root@hz-ceph-01 ~]# radosgw-admin object stat --bucket=test --object=etag | jq .etag
"c05bea71ce5d2b6eceaf46e9c347b22e-2\u0000"
```
Now a question arises: why is this ETag no longer the MD5 of the uploaded file? What is it, then? Let's unravel it step by step.
This ETag is the result of converting each part's MD5 to binary, concatenating them, and running MD5 over the result.
Below, we reproduce this 'etag' value by hand.
Split the etag file into 15 MB slices (the last slice takes whatever remains: 20 MB = 15 MB + 5 MB):
```
[root@hz-ceph-01 ~]# dd if=etag of=01 bs=15M count=1
1+0 records in
1+0 records out
15728640 bytes (16 MB) copied, 0.07331 s, 215 MB/s
[root@hz-ceph-01 ~]# dd if=etag of=02 skip=15 bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 0.00639034 s, 820 MB/s
```
Check the sizes of the slices:
```
[root@hz-ceph-01 ~]# du -sh *
15M     01
5.0M    02
4.0K    anaconda-ks.cfg
20M     etag
```
Compute each slice's MD5 and redirect it into a text file:
```
[root@hz-ceph-01 ~]# md5sum 01 | awk '{print $1}' > checksums.txt
[root@hz-ceph-01 ~]# md5sum 02 | awk '{print $1}' >> checksums.txt
```
View the file's contents:
```
[root@hz-ceph-01 ~]# cat checksums.txt
38b979773d839d45670fd962d4e3eff3
3345b99b221433862a415e636098e98d
```
Convert the hex digests to binary and run MD5 over them:
```
[root@hz-ceph-01 ~]# xxd -r -p checksums.txt | md5sum
c05bea71ce5d2b6eceaf46e9c347b22e  -
```
Compare this result with the ETag of the object we uploaded earlier:
```
"c05bea71ce5d2b6eceaf46e9c347b22e-2\u0000"
```
Seeing this, everything should suddenly make sense: the -2 suffix on the ETag means the object was uploaded in 2 parts.
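The same calculation can be scripted instead of juggling dd and xxd. A small sketch follows; the file name and the 15 MB part size match the s3cmd defaults used above, and the helper name is my own:

```python
import hashlib

def multipart_etag(path, part_size=15 * 1024 * 1024):
    # MD5 each part, concatenate the raw (binary) digests,
    # MD5 the concatenation, then append "-<number of parts>".
    digests = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).digest())
    return '%s-%d' % (hashlib.md5(b''.join(digests)).hexdigest(), len(digests))

print(multipart_etag('etag'))   # -> c05bea71ce5d2b6eceaf46e9c347b22e-2
```

Note that this value depends on the part size, so it only reproduces the server-side ETag when you split the file exactly as the client did.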
Now re-upload the etag file with debugging enabled and watch what happens:
```
[root@hz-ceph-01 ~]# s3cmd put etag s3://test -d
DEBUG: s3cmd version 1.6.1
DEBUG: ConfigParser: Reading file '/root/.s3cfg'
DEBUG: ConfigParser: access_key->PY...17_chars...P
...<omitted>...
DEBUG: CHECK: etag                       ## check the local etag file ##
DEBUG: PASS: u'etag'
INFO: Running stat() and reading/calculating MD5 values on 1 files, this may take some time...
DEBUG: DeUnicodising u'etag' using UTF-8
DEBUG: doing file I/O to read md5 of etag
DEBUG: DeUnicodising u'etag' using UTF-8
INFO: Summary: 1 local files to upload
## attach metadata ##
DEBUG: attr_header: {'x-amz-meta-s3cmd-attrs': 'uid:0/gname:root/uname:root/gid:0/mode:33188/mtime:1489554799/atime:1489554908/md5:cbacae177df231e8b3714bf662a433d9/ctime:1489554799'}
## attach metadata ##
DEBUG: DeUnicodising u'etag' using UTF-8
DEBUG: DeUnicodising u'etag' using UTF-8
DEBUG: DeUnicodising u'etag' using UTF-8
DEBUG: String 'etag' encoded to 'etag'
DEBUG: CreateRequest: resource[uri]=/etag?uploads
DEBUG: Using signature v2
## sign the headers ##
DEBUG: SignHeaders: 'POST\n\napplication/octet-stream\n\nx-amz-date:Wed, 15 Mar 2017 05:41:45 +0000\nx-amz-meta-s3cmd-attrs:uid:0/gname:root/uname:root/gid:0/mode:33188/mtime:1489554799/atime:1489554908/md5:cbacae177df231e8b3714bf662a433d9/ctime:1489554799\nx-amz-storage-class:STANDARD\n/test/etag?uploads'
## sign the headers ##
DEBUG: Processing request, please wait...
DEBUG: get_hostname(test): test.s3.liudong.com
## request the upload id ##
DEBUG: format_uri(): /etag?uploads
DEBUG: Sending request method_string='POST', uri='/etag?uploads', headers={'x-amz-meta-s3cmd-attrs': 'uid:0/gname:root/uname:root/gid:0/mode:33188/mtime:1489554799/atime:1489554908/md5:cbacae177df231e8b3714bf662a433d9/ctime:1489554799', 'content-type': 'application/octet-stream', 'Authorization': 'AWS PY59Q1KZG848ZU1RJIEP:nOc7HrZh1AqCqXOTS9q1D880FaI=', 'x-amz-date': 'Wed, 15 Mar 2017 05:41:45 +0000', 'x-amz-storage-class': 'STANDARD'}, body=(0 bytes)
## response, containing the upload id ##
DEBUG: Response: {'status': 200, 'headers': {'transfer-encoding': 'chunked', 'server': 'openresty/1.9.15.1', 'connection': 'keep-alive', 'x-amz-request-id': 'tx000000000000000000096-0058c8d419-3713-default', 'date': 'Wed, 15 Mar 2017 05:41:45 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Bucket>test</Bucket><Key>etag</Key><UploadId>2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup</UploadId></InitiateMultipartUploadResult>'}
DEBUG: get_hostname(test): test.s3.liudong.com
DEBUG: ConnMan.get(): re-using connection: http://test.s3.liudong.com#1
## upload part 1 ##
DEBUG: format_uri(): /etag?partNumber=1&uploadId=2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup
 65536 of 15728640     0% in    0s   418.08 kB/s
DEBUG: ConnMan.put(): connection put back to pool (http://test.s3.liudong.com#2)
## part uploaded, server response ##
DEBUG: Response: {'status': 200, 'headers': {'content-length': '0', 'accept-ranges': 'bytes', 'server': 'openresty/1.9.15.1', 'connection': 'keep-alive', 'etag': '"38b979773d839d45670fd962d4e3eff3"', 'x-amz-request-id': 'tx000000000000000000097-0058c8d419-3713-default', 'date': 'Wed, 15 Mar 2017 05:41:47 GMT'}, 'reason': 'OK', 'data': '', 'size': 15728640L}
 15728640 of 15728640   100% in    2s     6.09 MB/s  done
## local comparison of the computed md5 and the returned etag ##
DEBUG: MD5 sums: computed=38b979773d839d45670fd962d4e3eff3, received="38b979773d839d45670fd962d4e3eff3"
...<part 2 upload omitted>...
DEBUG: MultiPart: Upload finished: 2 parts
DEBUG: MultiPart: Completing upload: 2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup
DEBUG: String 'etag' encoded to 'etag'
DEBUG: CreateRequest: resource[uri]=/etag?uploadId=2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup
DEBUG: Using signature v2
DEBUG: SignHeaders: 'POST\n\n\n\nx-amz-date:Wed, 15 Mar 2017 05:41:49 +0000\n/test/etag?uploadId=2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup'
DEBUG: Processing request, please wait...
DEBUG: get_hostname(test): test.s3.liudong.com
DEBUG: ConnMan.get(): re-using connection: http://test.s3.liudong.com#3
## POST to the server to complete the upload ##
DEBUG: format_uri(): /etag?uploadId=2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup
DEBUG: Sending request method_string='POST', uri='/etag?uploadId=2~1SdbLkiHRWfXMXzcqiw5VaiqeH5HLup', headers={'content-length': '223', 'Authorization': 'AWS PY59Q1KZG848ZU1RJIEP:77yWJrG4+NZqSQM+Fx2ew8J2hFs=', 'x-amz-date': 'Wed, 15 Mar 2017 05:41:49 +0000'}, body=(223 bytes)
## server returns info about the uploaded object ##
DEBUG: Response: {'status': 200, 'headers': {'transfer-encoding': 'chunked', 'server': 'openresty/1.9.15.1', 'connection': 'keep-alive', 'x-amz-request-id': 'tx000000000000000000099-0058c8d41d-3713-default', 'date': 'Wed, 15 Mar 2017 05:41:49 GMT', 'content-type': 'application/xml'}, 'reason': 'OK', 'data': '<?xml version="1.0" encoding="UTF-8"?><CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Location>test.s3.liudong.com</Location><Bucket>test</Bucket><Key>etag</Key><ETag>c05bea71ce5d2b6eceaf46e9c347b22e-2</ETag></CompleteMultipartUploadResult>'}
DEBUG: ConnMan.put(): connection put back to pool (http://test.s3.liudong.com#4)
```
From the output above, the whole multipart upload flow is clearly visible.
Implementing multipart upload in code
The most commonly used SDK is boto, but to keep the demonstration simple I will use the REST API directly here. Since request signing is fairly tedious, I will lean on an existing helper: python-requests-aws, a nice library on GitHub that lets you sign S3 requests with requests.
We will again use the etag file mentioned above, with 01 and 02 as its parts, and walk through the process step by step.
- Generate the `upload id`

```python
# -*- coding: UTF-8 -*-
# filename: uploadid.py
import requests, sys
import xmltodict
from xmltodict import parse, unparse, OrderedDict
from awsauth import S3Auth
import time

host = 's3.liudong.com'
access_key = 'PY59Q1KZG848ZU1RJIEP'
secret_key = 'tlCP8M2d9b7rsOI7z6rE7TjJJyYPv36FGVXmTenB'
bucketname = 'test'
objectname = 'etag'

# Request an uploadId
def getid():
    cmd = '/%s/%s?uploads' % (bucketname, objectname)
    url = 'http://%s%s' % (host, cmd)
    response = requests.post(url, auth=S3Auth(access_key, secret_key, service_url=host))
    UploadId = xmltodict.parse(response.content)['InitiateMultipartUploadResult']['UploadId']
    return UploadId

if __name__ == '__main__':
    print(getid())
```
- Run the script to obtain the `upload id`
```
[root@hz-ceph-01 ~]# python uploadid.py
2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW
```
- Check the in-progress multipart upload's `upload id` with s3cmd
```
[root@hz-ceph-01 ~]# s3cmd multipart s3://test
s3://test/
Initiated       Path    Id
2017-03-16T02:32:59.303Z        s3://test/etag  2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW
```
- Upload the parts
```python
# -*- coding: UTF-8 -*-
# filename: uploadpart.py
import requests, sys, os
import xmltodict
from xmltodict import parse, unparse, OrderedDict
from awsauth import S3Auth
import time

host = 's3.liudong.com'
access_key = 'PY59Q1KZG848ZU1RJIEP'
secret_key = 'tlCP8M2d9b7rsOI7z6rE7TjJJyYPv36FGVXmTenB'
bucketname = 'test'
objectname = 'etag'

# Upload one part
def upload_part(UploadId, part, PartNumber):
    cmd = '/%s/%s?partNumber=%s&uploadId=%s' % (bucketname, objectname, PartNumber, UploadId)
    url = 'http://%s%s' % (host, cmd)
    with open(part, 'rb') as f:
        data = f.read()
    r = requests.put(url, auth=S3Auth(access_key, secret_key, service_url=host), data=data)
    print("status: %s" % r.status_code)

if __name__ == '__main__':
    if len(sys.argv) == 4:
        uploadid = sys.argv[1]
        part = sys.argv[2]
        number = sys.argv[3]
    else:
        print('usage: %s [uploadid] [partfile] [partnumber]' % sys.argv[0])
        sys.exit(2)

    if os.path.exists(part):
        upload_part(uploadid, part, number)
    else:
        print("partfile does not exist")
        sys.exit(2)
```
- Run the script to upload the parts
```
[root@hz-ceph-01 ~]# python uploadpart.py 2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW 01 1
status: 200
[root@hz-ceph-01 ~]# python uploadpart.py 2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW 02 2
status: 200
```
Traces of these parts can be found in the corresponding pool:

```
[root@hz-ceph-01 ~]# rados ls -p default.rgw.buckets.data
7473751f-f731-411d-ac91-0bc835035560.4201.1__multipart_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.2
7473751f-f731-411d-ac91-0bc835035560.4201.1__shadow_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.1_3
7473751f-f731-411d-ac91-0bc835035560.4201.1__shadow_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.1_1
7473751f-f731-411d-ac91-0bc835035560.4201.1__shadow_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.1_2
7473751f-f731-411d-ac91-0bc835035560.4201.1__shadow_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.2_1
7473751f-f731-411d-ac91-0bc835035560.4201.1__multipart_etag.2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW.1
```

> - Note: the relationship between these `shadow` and `multipart` objects still needs further study.

- Finally, send the request to complete the upload (a sketch follows below)
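The completion step itself is not shown above. A rough sketch in the same style as the two scripts earlier could look like the following; it reuses the same host, credentials, upload id and the part ETags from checksums.txt, and the body is the standard CompleteMultipartUpload XML. Treat it as an illustration, not the omitted original:

```python
# -*- coding: UTF-8 -*-
# filename: completeupload.py  (illustrative sketch)
import requests
from awsauth import S3Auth

host = 's3.liudong.com'
access_key = 'PY59Q1KZG848ZU1RJIEP'
secret_key = 'tlCP8M2d9b7rsOI7z6rE7TjJJyYPv36FGVXmTenB'
bucketname = 'test'
objectname = 'etag'

def complete_upload(upload_id, etags):
    # Standard CompleteMultipartUpload body: one <Part> per uploaded part,
    # carrying its number and the ETag the server returned for it.
    parts = ''.join(
        '<Part><PartNumber>%d</PartNumber><ETag>%s</ETag></Part>' % (n, e)
        for n, e in etags)
    body = '<CompleteMultipartUpload>%s</CompleteMultipartUpload>' % parts
    url = 'http://%s/%s/%s?uploadId=%s' % (host, bucketname, objectname, upload_id)
    r = requests.post(url, data=body,
                      auth=S3Auth(access_key, secret_key, service_url=host))
    print(r.status_code)
    print(r.content)

if __name__ == '__main__':
    # Part numbers and the ETags returned when each part was uploaded.
    complete_upload('2~AeLmBWjMBpYRDFqEHyT6BHaZ8-y9oxW',
                    [(1, '38b979773d839d45670fd962d4e3eff3'),
                     (2, '3345b99b221433862a415e636098e98d')])
```

After this request succeeds, the object appears in the bucket with the multipart-style ETag discussed earlier.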
Due to WeChat's 2,000-character limit, part of the content is not shown here; please read the original post for the full version.
Afterword
The above is a bit of what I have accumulated while learning and using S3. Please point out anything I have overlooked. You are welcome to study Ceph together with me; more articles will follow, so stay tuned.
More posts: https://github.com/jaywayjayway/blog