Cloud Server Downgrade


Since I am still unemployed and depend on my blog as a means of both learning and mental stimulation, I decided to make the funds I still have remaining with my cloud provider, Vultr, last a little bit longer. I was not really using much of the disk space and RAM I had allocated, so I figured I could downgrade from the $10.00 per month plan to the $5.00 tier. Because the new cloud server has a smaller overall footprint, I needed to convert from Apache to hosting with OpenBSD’s httpd. The conversion itself was relatively trivial; it just took a long time. The hardest part was figuring out how to make OpenBSD’s relayd do some additional filtering so that I could continue to block the badly behaving web crawlers. It turns out it was not too difficult. Below is my relayd.conf file, which has an include statement for an additional rules file, relayd.rules, dedicated to blocking the user agent strings belonging to the bad bots.

prefork 5

http protocol "https" {
        match request header set "X-Forwarded-For" \
                value "$REMOTE_ADDR:$REMOTE_PORT"
        match request header set "X-Forwarded-By" \
                value "$SERVER_ADDR:$SERVER_PORT"
        match request header set "X-Forwarded-Proto" \
                value "https"
        match request header set "Keep-Alive" \
                value "$TIMEOUT"
        
        match response header remove "Server"
        match response header set "Strict-Transport-Security" \
                value "max-age=31556952"
        match response header set "X-Frame-Options" \
                value "SAMEORIGIN"
        match response header set "X-XSS-Protection" \
                value "1; mode=block"
        match response header set "X-Content-Type-Options" \
                value "nosniff"
        match response header set "Referrer-Policy" \
                value "no-referrer-when-downgrade"
        match response header set "Feature-Policy" \
                value "camera 'none'; microphone 'none'; geolocation 'none'"

        include "/etc/relayd.rules"

        tcp {nodelay, sack, backlog 10}
        tls keypair "goblackcat.com"
}

relay "www" {
        listen on egress port 443 tls
        protocol "https"
        forward with tls to 127.0.0.1 port 443
}

relay "www6" {
        listen on 2001:19f0:5:3208::1 port 443 tls
        protocol "https"
        forward with tls to ::1 port 443
}
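Before relying on any of this, it is worth making sure relayd actually accepts the configuration. The lines below are a minimal sketch of how I check and apply it, using the stock OpenBSD tools and the default file locations.

# check relayd.conf (and the included relayd.rules) for syntax errors
relayd -n -f /etc/relayd.conf

# enable the daemon at boot and restart it so the new protocol takes effect
rcctl enable relayd
rcctl restart relayd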

Below is the additional set of rules that I created to keep the bad web crawling robots at bay. It does not stop them completely, because they still try periodically; however, they are always greeted with a 403 Forbidden error!

#relayd.rules
block request header "User-agent" value "*Alpha*"
block request header "User-agent" value "*Ahref*"
block request header "User-agent" value "*Baidu*"
block request header "User-agent" value "*Sogou*"
block request header "User-agent" value "*Semrush*"
block request header "User-agent" value "*Yandex*"
block request header "User-agent" value "*YaK*"
block request header "User-agent" value "*PaperLi*"
block request header "User-agent" value "*Buck*"
block request header "User-agent" value "*masscan*"
block request header "User-agent" value "*SearchAtlas*"
block request header "User-agent" value "*SMUrl*"
block request header "User-agent" value "*Twingly*"
return error
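A quick sanity check, assuming curl is available and with example.com standing in for the real hostname: a request carrying one of the blocked user agent strings should come back with the 403, while an ordinary request should still get through.

# a blocked user agent string should be greeted with 403 Forbidden
curl -sI -A "SemrushBot" https://example.com/ | head -n 1

# a normal request should still return 200 OK
curl -sI https://example.com/ | head -n 1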

In this case the relayd.rules file, for all intents and purposes, mimics the blocking that my Apache .htaccess file used to do with its rewrite rules. Then I set up OpenBSD’s httpd with the httpd.conf file below. In the server definition for the TLS side, I have added some rewrite rules which also mimic a piece of what .htaccess used to do for me, since Googlebot, msnbot, and bingbot keep trying to grab my RSS feed at the location that used to be associated with my old WordPress site. I reckon one day they will learn.

server "example.com" {
        listen on egress port 80
        alias "www.example.com"
        block return 302 "https://$SERVER_NAME$REQUEST_URI"

        log {
                style forwarded
                access "access-www.log"
                error "error-www.log"
        }
}

server "example.com" {
        listen on localhost tls port 443
        alias "www.example.com"
        root "/htdocs/example"
        directory index "index.html"

        log {
                style forwarded
                access "access-www.log"
                error "error-www.log"
        }

        tcp {nodelay, sack, backlog 10}

        tls {
                certificate "/etc/ssl/example.com.crt"
                key "/etc/ssl/private/example.com.key"
        }

        location "/feed*" {
                request rewrite "/index.xml"
        }
		
		location "/posts/feed" {
				request rewrite "/index.xml"
		}
}
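As with relayd, httpd has a configuration test mode, and both daemons need to be enabled so they come back after a reboot. A minimal sketch, again with the default file locations and example.com standing in for the real hostname:

# check httpd.conf for syntax errors
httpd -n -f /etc/httpd.conf

# enable both daemons at boot and start them
rcctl enable httpd
rcctl enable relayd
rcctl start httpd
rcctl start relayd

# the old WordPress feed path should now be served from /index.xml
curl -sI https://example.com/feed | head -n 1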

And that is, as they say, a wrap. The only thing I am going to miss about Apache is custom error pages, a feature OpenBSD’s httpd does not have. Maybe one day it will get implemented, but it is probably not high on the developers’ list of priorities. I don’t fault them, because one of httpd’s goals is avoiding featuritis, the unnecessary extra features that just make a piece of software bloated. Custom error pages are more of a want than a need.
