Hosting a web site in radosgw
Posted on Tue 26 January 2016 in hints-and-kinks • 6 min read
If you’re familiar with web site hosting on Amazon S3, which is a simple and cheap way to host a static web site, you might be wondering whether or not you can do the same in Ceph radosgw.
The short answer is you can’t. Bucket Website is listed as Not Supported in the radosgw S3 API support matrix, and radosgw doesn’t have index document support either.
But the longer answer is that you can, provided you use radosgw in
combination with a front-end load-balancer — which, as it happens,
can add a few more bells and whistles, as well. You could probably do
the same thing with nginx, Varnish, or Apache in a
mod_proxy_balancer
balancer setup, but in this example
configuration, we’ll use HAProxy.
Getting started: the radosgw basics
Let’s take look at a simple radosgw configuration with virtual host
support, such that you can access your buckets as either
http://ceph.example.com/bucketname
or
http://bucketname.ceph.example.com
:
[client.rgw.radosgw01]
rgw_frontends = civetweb port=7480
rgw_dns_name = ceph.example.com
rgw_resolve_cname = True
Suppose we use s3cmd
to upload an HTML file to this bucket, setting
a public ACL:
s3cmd mb s3://testwebsite
s3cmd put --acl-public index.html s3://testwebsite/
Then if you exposed your radosgw to the web, any client (without
authentication) would be able to retrieve
http://testwebsite.ceph.example.com:7480/index.html
with a web
browser, or any other HTTP client application (such as curl
or
wget
):
curl -I http://testwebsite.ceph.example.com:7480/index.html
Which would then return something like:
HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a88-0056a7b7eb-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:16:11 GMT
Introducing HAProxy
Now let’s start out with putting HAproxy in between. Nothing special there: radosgw listens on the conventional 7480 port, and we simply hand HAproxy traffic through there, and bind HAProxy itself to port 80.
global
log /dev/log local0
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats level admin
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/haproxy/ssl
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL).
ssl-default-bind-ciphers HIGH
tune.ssl.default-dh-param 2048
defaults
log global
mode http
option httplog
option dontlognull
retries 3
timeout queue 1000
timeout connect 1000
timeout client 30000
timeout server 30000
option forwardfor
frontend ceph_front
bind 0.0.0.0:80
default_backend ceph_back
backend ceph_back
balance source
server radosgw01 127.0.0.1:7480 check
Index documents
So, the first thing we’ll need to add is support for index
documents. We’d like to make sure that when we retrieve
https://testwebsite.ceph.example.com/
, what’s actually fetched from
the backend is /index.html
. We can do that by adding an HAproxy ACL
that matches for the trailing slash in the path, and an http-request
set-path
directive that appends the index document name:
frontend ceph_front
bind 0.0.0.0:80
acl path_ends_in_slash path_end -i /
# Append index document (index.html) to any path
# ending in "/".
http-request set-path %[path]index.html if path_ends_in_slash
default_backend ceph_back
Now, that’s fine in terms of getting the index document correctly:
curl -I http://testwebsite.ceph.example.com/index.html
HTTP/1.1 200 OK
Content-Length: 18050
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 21:28:47 GMT
ETag: "b03130a4a1fc24df0f9f336f2b6d1d90"
x-amz-request-id: tx000000000000000005a94-0056a7b9e3-312df-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:24:35 GMT
However, it of course breaks uploads and even bucket listings, or in other words, anything that uses the S3 API. Now you could test for some S3-specific headers in the request, but really, you should just check whether the request is authorized, and only apply the index document logic if it isn’t, like so:
frontend ceph_front
bind 0.0.0.0:80
acl path_ends_in_slash path_end -i /
acl auth_header hdr(Authorization) -m found
# Append index document (index.html) to any path
# ending in "/", unless the request has an auth header
http-request set-path %[path]index.html if path_ends_in_slash !auth_header
default_backend ceph_back
Great. Now we can upload using full paths without mangling, and on any
un-authenticated requests, we substitute /index.html
for any trailing
/
. In case you’re wondering: yes, this works for any path, not just
the root path.
Directory paths
However, you may also want something else, which is the ability to
correctly handle a request like
http://testwebsite.ceph.example.com/my/sub/directory
, where of
course you want the path /my/sub/directory
translated into
/my/sub/directory/index.html
, which means we want to append a slash
and an index document name to the request path.
So let’s do that:
frontend ceph_front
bind 0.0.0.0:80
acl path_has_dot path_sub -i .
acl path_ends_in_slash path_end -i /
acl auth_header hdr(Authorization) -m found
http-request set-path %[path]index.html if path_ends_in_slash !auth_header
# Append trailing slash if necessary.
http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
default_backend ceph_back
Note that what we’re doing here is somewhat crude. We’re assuming that
any actual file that we want to retrieve looks like name.ext
,
meaning it has a dot (period, full stop) character in it. The
path_sub -i .
expression in the path_has_dot
ACL simply matches
any path with .
in it, and we’re assuming that if a path has a dot
then it points to a file, if it doesn’t then it points to a directory.
You could be a little more clever here and use path_regex
instead of
path_sub
for a full regular expression match. But regex lookups are
slower than simple substring matches, so if the substring match works
for you, go for it.
So now, we can do this:
s3cmd put --acl-public index.html s3://testwebsite/my/sub/directory/
And then:
# Note omitted trailing slash
curl -I http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bc83-0056a7bd25-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:38:29 GMT
HTTPS support
So, what else might you want to do? One obvious thing that you can use
HAproxy for is SSL termination. The radosgw embedded civetweb
webserver can do that for you, but that feature is currently mildly
broken in a rather curious
way. So in order to allow HTTPS
access to all your content via HAproxy instead, you would add:
frontend ceph_front_ssl
bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
reqadd X-Forwarded-Proto:\ https
acl path_has_dot path_sub -i .
acl path_ends_in_slash path_end -i /
acl auth_header hdr(Authorization) -m found
http-request set-path %[path]index.html if path_ends_in_slash !auth_header
http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
default_backend ceph_back
But maybe you’d like to force, not merely allow, HTTPS
access. redirect
to the rescue:
frontend ceph_front
bind 0.0.0.0:80
reqadd X-Forwarded-Proto:\ http
redirect scheme https code 301 if !{ ssl_fc }
frontend ceph_front_ssl
bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
reqadd X-Forwarded-Proto:\ https
acl path_has_dot path_sub -i .
acl path_ends_in_slash path_end -i /
acl auth_header hdr(Authorization) -m found
http-request set-path %[path]index.html if path_ends_in_slash !auth_header
http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
default_backend ceph_back
And here we go:
# Note HTTP
curl -IL http://testwebsite.ceph.example.com/my/sub/directory
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://testwebsite.ceph.example.com/my/sub/directory
Connection: close
HTTP/1.1 200 OK
Content-Length: 24235
Accept-Ranges: bytes
Last-Modified: Mon, 25 Jan 2016 23:57:04 GMT
ETag: "fecd005b33c0f6bfdee61b787cf54cb0"
x-amz-request-id: tx00000000000000000bdeb-0056a7bf9b-312cd-default
Content-type: text/html
Date: Tue, 26 Jan 2016 18:48:59 GMT
Compression
And finally, maybe you’d like to speed up access to the stuff on your site. Why not add gzip on-the-fly-compression? It’s supported by every browser worth its salt, and will make your users happier. You’ll want to restrict compression to specific MIME types though. In the configuration below, we enable compression for plain text, HTML, XML, CSS, JavaScript, and SVG images.
frontend ceph_front
bind 0.0.0.0:80
reqadd X-Forwarded-Proto:\ http
redirect scheme https code 301 if !{ ssl_fc }
frontend ceph_front_ssl
bind 0.0.0.0:443 ssl crt ceph.pem no-sslv3 no-tls-tickets
reqadd X-Forwarded-Proto:\ https
acl path_has_dot path_sub -i .
acl path_ends_in_slash path_end -i /
acl auth_header hdr(Authorization) -m found
http-request set-path %[path]index.html if path_ends_in_slash !auth_header
http-request set-path %[path]/index.html if !path_has_dot !path_ends_in_slash !auth_header
compression algo gzip
compression type text/html text/xml text/plain text/css application/javascript image/svg+xml
default_backend ceph_back
Let’s see how that helps us. Do a request without gzip encoding
support, and observe that its total download size matches the
document’s Content-Length
:
curl https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 24235 100 24235 0 0 94565 0 --:--:-- --:--:-- --:--:-- 94299
Now, add an Accept-Encoding
header:
curl -H 'Accept-Encoding: gzip' https://testwebsite.ceph.example.com/my/sub/directory > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 5237 0 5237 0 0 19243 0 --:--:-- --:--:-- --:--:-- 19324
There. Actual download size goes from 24KB down to just 5KB.
Where to go from here
There’s a few additional features to be added here. You could enable CORS or HSTS, for example, and of course you could add more backends. But if you read this far, you surely get the idea.
And you’re welcome to examine the headers you can pull from this page you’re reading, wink wink. :)
This article originally appeared on the hastexo.com
website (now defunct).