concrete5 pages with .html extension in URL

This article covers one very specific issue and is a bit hacky, but if you care a lot about your search engine ranking, it might be a good idea to look at it.
If you move an existing site that has been on the internet for a while to concrete5, a number of websites will already link to it. Depending on the old CMS, if there was one, the URLs might look different from common concrete5 URLs. What choices do you have in such a situation?

  1. Do nothing and wait till Google updates its index, but keep in mind that other websites might never update their links unless someone tells them to do so.
  2. Add a proper 301 redirect that forwards visitors to the new page. There’s an add-on for this written by ScottC that handles this task very nicely: http://www.concrete5.org/marketplace/addons/url-director/
  3. Make sure the old links still work by changing the concrete5 URLs.

In this article we’re going to look at the last option. Before we start, you should know what a page path in concrete5 looks like. You probably do, but just in case:

  • /about/
  • /help/faq/

Sometimes the trailing slash is missing but that doesn’t matter in our case. Let’s assume we’ve got an existing non-concrete5 site where the page paths look like this:

  • /about.html
  • /help/faq.html

We’re going to make a few minor changes that make it possible to use these addresses without having to add a redirect. Neither Google nor any visitor will see a change unless they look at the source code of the website.
To achieve this, we need to modify the .htaccess file. If you aren’t familiar with it, in short: it’s an Apache-specific file that lets you add certain configuration settings without having to touch the central configuration file. Before you continue, make sure you’ve got pretty URLs activated. To do this, type “pretty urls” in the intelligent search box in concrete5 and tick the checkbox. If pretty URLs work, the change described here should work as well.

Open the .htaccess file. It might be hidden depending on your FTP client, so make sure you’ve enabled the option to show hidden files. Then make it look like this:

RewriteEngine On
RewriteBase /
 
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteRule (.*)\.html$ index.php [E=HTML_URL:/$1/]
 
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}/index.html !-f
RewriteCond %{REQUEST_FILENAME}/index.php !-f
RewriteRule . index.php [L]

The interesting part is the first RewriteRule. It tries to match every request that has “.html” at the end. If it finds one, it extracts everything before the “.html”. The “.*” is the pattern that matches everything, and the parentheses around it make sure the match gets saved as a variable, $1 in our case. The E=HTML_URL flag sets an environment variable called HTML_URL with the value of our path. Because a rewrite in .htaccess triggers an internal redirect, Apache adds the prefix REDIRECT_ to the name, so PHP sees the variable as REDIRECT_HTML_URL.
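To see what the first rule captures, the same pattern can be tried out in plain PHP. This is only a sketch: the function name htmlUrlToPagePath is made up for this example and is not part of concrete5 or Apache. It assumes the path is passed without the leading slash, which is how Apache hands it to rules in a .htaccess file.

```php
<?php
// Hypothetical helper, for illustration only: mimics what the first
// RewriteRule does with the request path. In .htaccess context Apache
// strips the leading slash before matching, so pass "about.html".
function htmlUrlToPagePath($requestPath) {
    // Same pattern as in the RewriteRule: capture everything before ".html".
    if (preg_match('#(.*)\.html$#', $requestPath, $matches)) {
        // Same substitution as in E=HTML_URL:/$1/
        return '/' . $matches[1] . '/';
    }
    // No ".html" ending: the first rule does not apply.
    return null;
}

echo htmlUrlToPagePath('about.html'), "\n";     // /about/
echo htmlUrlToPagePath('help/faq.html'), "\n";  // /help/faq/
var_dump(htmlUrlToPagePath('about'));           // NULL
```

The returned value corresponds to what ends up in REDIRECT_HTML_URL, and it has the same shape as a regular concrete5 page path.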

Now that we have an additional variable that holds the correct page path whenever a URL ends with “.html”, we have to tell concrete5 to process that new variable. To do this, open config/site.php and insert a new line that defines SERVER_PATH_VARIABLE. Here’s a complete example, but note that most values are probably different on your server. Only the last line matters here:

<?php 
define('DB_SERVER', 'localhost');
define('DB_USERNAME', 'codeblog');
define('DB_PASSWORD', 'sdf.sdfDD*çsdf');
define('DB_DATABASE', 'codeblog');
define('BASE_URL', 'http://www.codeblog.ch');
define('DIR_REL', '');
define('PASSWORD_SALT', 'yc8tSlHjZaGpADJ');
define('SERVER_PATH_VARIABLE', 'REDIRECT_HTML_URL');

That’s all it takes. After you’ve made these changes, you can go on and use .html at the end of your URLs.
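If the .html addresses don’t work right away, it can help to check which values Apache actually passes to PHP. The following is a rough debugging sketch, not part of concrete5: the function describePathVariables is a made-up name, and the idea of placing it temporarily at the end of config/site.php is an assumption you should adapt to your setup.

```php
<?php
// Hypothetical debugging helper, not part of concrete5: collects the
// server variables that are relevant for resolving the page path.
function describePathVariables(array $server) {
    $names = array('REDIRECT_HTML_URL', 'PATH_INFO', 'REDIRECT_URL', 'REQUEST_URI');
    $result = array();
    foreach ($names as $name) {
        $result[$name] = isset($server[$name]) ? $server[$name] : '(not set)';
    }
    return $result;
}

// Write the values for the current request to the PHP error log.
foreach (describePathVariables($_SERVER) as $name => $value) {
    error_log($name . ' = ' . $value);
}
```

When you request /about.html, REDIRECT_HTML_URL should show up with the value /about/. Remove the snippet again once everything works.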

How does it work?

For those who’d like to understand why this works: first, we made sure there’s a variable that holds the correct path, even if the URL has a different ending. We then changed the variable concrete5 uses to find the correct page. If you open /concrete/core/libraries/request.php you can find this code:

public static function get() {
    static $req;
    if (!isset($req) || C5_ENVIRONMENT_ONLY) {
        $path = false;
        if (defined('SERVER_PATH_VARIABLE')) {
            $path = Request::parsePathFromRequest(SERVER_PATH_VARIABLE);
        }
        if (!$path) {
            $path = Request::parsePathFromRequest('PATH_INFO');
        }
        if (!$path) {
            $path = Request::parsePathFromRequest('REDIRECT_URL');
        }
        if (!$path) {
            $path = Request::parsePathFromRequest('REQUEST_URI');
        }
        if (!$path) {
            $path = Request::parsePathFromRequest('ORIG_PATH_INFO');
        }
        if (!$path) {
            $path = Request::parsePathFromRequest('SCRIPT_NAME');
        }
        $req = new Request($path);
    }
    return $req;
}

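The first check looks for SERVER_PATH_VARIABLE, which is exactly what we’re using. If the variable it names is empty, which is the case for every request that doesn’t end with .html, concrete5 falls back to the other server variables, so the original page paths like /about still work.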
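The fallback order can be illustrated with a small standalone function. This is a simplified stand-in, not the concrete5 implementation: resolvePath is a made-up name, the server variables are passed in as an array, and any cleanup the real parsePathFromRequest performs on the value is left out.

```php
<?php
// Simplified stand-in for the fallback chain in Request::get().
// $customVariable plays the role of SERVER_PATH_VARIABLE.
function resolvePath(array $server, $customVariable = null) {
    $candidates = array('PATH_INFO', 'REDIRECT_URL', 'REQUEST_URI', 'ORIG_PATH_INFO', 'SCRIPT_NAME');
    if ($customVariable !== null) {
        // The custom variable is always checked first.
        array_unshift($candidates, $customVariable);
    }
    foreach ($candidates as $name) {
        if (!empty($server[$name])) {
            return $server[$name];
        }
    }
    return false;
}

// Request for /about.html: the custom variable wins.
$htmlRequest = array('REDIRECT_HTML_URL' => '/about/', 'REDIRECT_URL' => '/about.html');
echo resolvePath($htmlRequest, 'REDIRECT_HTML_URL'), "\n"; // /about/

// Request for /about: the custom variable is missing, REDIRECT_URL is used.
$plainRequest = array('REDIRECT_URL' => '/about');
echo resolvePath($plainRequest, 'REDIRECT_HTML_URL'), "\n"; // /about
```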

Generate different URLs in concrete5

Now that we can process requests ending with .html you might wonder what we can do to create such addresses. This requires a bit more hacking but it’s also possible. In most cases where you have a link, concrete5 uses a helper called “Navigation”, here’s the official documentation about it: http://www.concrete5.org/documentation/developers/helpers/navigation/. We can override this helper and add “.html” to the links generated by it.

Create a new file called “navigation.php” in the “helpers” directory in the root of your site. On a default installation, this directory will be empty. Once you’ve created that file, insert this content:

<?php
defined('C5_EXECUTE') or die('Access Denied.');
class NavigationHelper extends Concrete5_Helper_Navigation {
    public function getLinkToCollection(&$cObj, $appendBaseURL = false, $ignoreUrlRewriting = false) {
        $link = parent::getLinkToCollection($cObj, $appendBaseURL, $ignoreUrlRewriting);
        if (URL_REWRITING && !$ignoreUrlRewriting) {
            $relativeLink = rtrim(str_replace(BASE_URL, '', $link), '/');
            if ($relativeLink != '') {
                $link = rtrim($link, '/') . '.html';
            }
        }
        return $link;
    }
}

This hacky code will change your URLs and thus make sure you’ll have .html in your internal links.
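To check what the helper above does to a link without running concrete5, the string handling can be pulled out into a plain function. This is only a sketch: appendHtmlExtension is a made-up name, www.example.com is a placeholder, and the base URL is passed as a parameter instead of reading the BASE_URL constant.

```php
<?php
// Standalone copy of the string logic used in the helper override.
// $baseUrl stands in for the BASE_URL constant.
function appendHtmlExtension($link, $baseUrl) {
    // Strip the base URL and trailing slashes to detect the home page.
    $relativeLink = rtrim(str_replace($baseUrl, '', $link), '/');
    if ($relativeLink != '') {
        // Any page except the home page gets ".html" instead of the slash.
        $link = rtrim($link, '/') . '.html';
    }
    return $link;
}

$base = 'http://www.example.com';
echo appendHtmlExtension('http://www.example.com/about/', $base), "\n"; // http://www.example.com/about.html
echo appendHtmlExtension('/help/faq/', $base), "\n";                    // /help/faq.html
echo appendHtmlExtension('http://www.example.com/', $base), "\n";       // http://www.example.com/
```

The home page is left alone on purpose, otherwise its link would turn into something like http://www.example.com.html.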

As mentioned at the beginning, this solution is pretty hacky but might work in case you must keep the existing URLs. Let me know if there are things that I’ve missed.



