Configuring radosgw to behave like Amazon S3
Posted on Mon 09 July 2012 in hints-and-kinks • 3 min read
If you’ve heard of Ceph, you’ve surely heard of radosgw, a RESTful gateway interface to the RADOS object store. You’ve probably also heard that it provides a front-end interface that is compatible with Amazon’s S3 API.
The question remains, if you have an S3 client that always assumes it can find objects at http://bucket.s3.amazonaws.com, how can you use such a client to interact, unmodified, with your radosgw host (or hosts)?
Pulling this off is actually remarkably simple, if you can control what nameserver your clients use to resolve DNS names. Which should be a given in the private cloud space.
First, of course, you’ll need an installed and configured Ceph cluster with one or several radosgw nodes. The Ceph documentation is an excellent reference for setting up radosgw.
Configuring radosgw to support virtual hosts
Then, you make sure you have the following entry in your Ceph configuration (normally in /etc/ceph/ceph.conf):
[client.radosgw.charlie] rgw dns name = s3.amazonaws.com
Substitute charlie with whatever name you want to use for your radosgw client when you interact with Ceph. What the rgw dns name option specifies is that radosgw will answer queries also for URLs like http://bucket.hostname/object, as opposed to just http://hostname/bucket/object.
Configuring Apache to respond to S3 host names
Also, add a wildcard record to the ServerAlias directive in the web server configuration for your radosgw host. For example:
<VirtualHost *:80> ServerName radosgw.example.com ServerAlias s3.amazonaws.com ServerAlias *.amazonaws.com
Configuring your DNS server
Then, set up your DNS server with a wildcard record in the s3.amazonaws.com zone, and have nameserver respond to requests in that zone. The zone file (for BIND9, in this case) could look like this:
$TTL 604800 @ IN SOA alice.example.com. root.alice.example.com. ( 2 ; Serial 604800 ; Refresh 86400 ; Retry 2419200 ; Expire 604800 ) ; Negative Cache TTL ; @ IN NS alice.example.com. @ IN A 192.168.122.113 * IN CNAME @
In this zone, the A record s3.amazonaws.com resolves to 192.168.122.113, and any sub-domain (like mybucket.s3.amazonaws.com) also resolves to that same address via a CNAME record.
Using your RADOS store with S3 clients
And then you just configure your client hosts to resolve DNS names via that nameserver, and use your preferred client application to interact with it.
For example, for a user that you’ve created with radosgw-admin, which
uses the access key 12345 with a secret of 67890, and Mark Atwood’s
Net::Amazon::S3::Tools toolkit, here’s how you can interact
with your RADOS objects:
# export AWS_ACCESS_KEY_ID=12345 # export AWS_ACCESS_KEY_SECRET=67890 # s3mkbucket mymostawesomebucket # s3ls mymostawesomebucket # s3put mymostawesomebucket/foobar <<< "hello world" # s3ls mymostawesomebucket foobar # s3get mymostawesomebucket/foobar hello world
Simple enough. You can add one more nifty feature.
Adding load balancing
radosgw can scale horizontally, and all you need to do to make this work is to duplicate your radosgw and Apache configuration onto a different host, and then add a second record to your DNS zone:
$TTL 604800 @ IN SOA alice.example.com. root.alice.example.com. ( 3 ; Serial 604800 ; Refresh 86400 ; Retry 2419200 ; Expire 604800 ) ; Negative Cache TTL ; @ IN NS alice.example.com. @ IN A 192.168.122.112 @ IN A 192.168.122.113 * IN CNAME @
Then, as you access more buckets, you’ll hit the A records in a round-robin fashion, meaning your requests will be balanced across the servers. Add as many as you like.
Obviously, the above steps will not work for HTTPS connections to the REST API. And really, making that work would amount to some pretty terrible SSL certificate authority and client trust hackery, so just don’t do it.
This article originally appeared on the
hastexo.com website (now defunct).