SEO: Eliminating duplicate content on eZ Publish sites
By: Xavier Cousin | May 4, 2012 | Web solutions
A common task in website search engine optimization is to set up Google Webmaster Tools and go through its HTML suggestions. Many of the suggestions center around duplicate content -- pages at different URLs that share the same HTML title, meta description, body content, and more. This post summarizes a couple of common duplicate content scenarios within an eZ Publish site and how to solve them.
"kernel (1)" errors: returning a more appropriate HTTP header
In eZ Publish, pages that require user login in order to be viewed -- that is, pages that show the default "kernel (1)" error message to anonymous (or otherwise unauthorized) users -- return an HTTP status code of 200. This might be appropriate depending on your site setup, but search engines treat such pages as indexable content, and by default the content of all such pages is almost identical. A more appropriate response is the 401 status code, which the HTTP specification names "Unauthorized"; this post uses the label "Authorization Required" for it.
For one of our clients, thousands of duplicate content errors were caused by forum links that let users send private messages to each other. A user who is not logged in is prompted to log in before sending such a message. This resulted in as many duplicate content errors as there were forum topics, since search engines only ever see the login page.
eZ Publish supports custom HTTP headers per error type; you can configure this in an override of error.ini as shown below:
    <?php /* #?ini charset="utf8"?

    [ErrorSettings-kernel]
    HTTPError[1]=401

    # If, as in this case, you're throwing an HTTP response code that isn't
    # defined by default in the error.ini, you must declare and name it here
    [HTTPError-401]
    HTTPName=Authorization Required

    */ ?>
Once this setting is in place, every "kernel (1)" error is returned with a 401 HTTP status code, which tells Google and other search engines that the page is intended for logged-in users and should not be indexed.
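One way to confirm that the override took effect is to request a few protected URLs anonymously and look at the status codes. The following is a minimal Python sketch and not part of eZ Publish; the function names and the example URL in the comment are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the final HTTP status code of an anonymous GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx responses;
        # the status code is available on the exception.
        return error.code


def still_indexable(statuses):
    """Given a {url: status} mapping, return the URLs that still answer 200."""
    return sorted(url for url, status in statuses.items() if status == 200)


# Usage sketch (hypothetical URL; replace with login-protected pages of your site):
#   protected = ["http://yoursite.com/forum/private-message/1"]
#   print(still_indexable({url: http_status(url) for url in protected}))
```

An empty list from still_indexable() would suggest that none of the checked login pages is still being served with a 200 status.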
Of course, there are alternative solutions and slightly different scenarios. For example, you could specify URL patterns for search engines to ignore in robots.txt rather than specifying a blanket status for all "kernel (1)" errors. Or, you might want search engines to index content behind the login form using something like First Click Free.
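If you take the robots.txt route, it can help to test a rule before deploying it. Here is a small Python sketch using the standard library's robots.txt parser; the Disallow path and the URLs are assumptions made up for illustration, not paths from the client site. Note that this parser matches rules by path prefix only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: the path prefix is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/private-message/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Usage sketch:
#   is_crawlable(ROBOTS_TXT, "http://yoursite.com/forum/private-message/42")
#   is_crawlable(ROBOTS_TXT, "http://yoursite.com/forum/topic/42")
```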
Sorting and pagination: the same content displayed in different ways
Sorting and other similar filters
Many sites have sections where users can choose multiple ways to display the same content. For example, you might have a listing of events where the events can be sorted alphabetically, by date, by location, and so on. Or, you might provide users with drilldown navigation, so they can apply filters in different ways to display the content they are looking for. Often you will accomplish this by using view parameters, in a URL pattern such as http://yoursite.com/yourpageurl/(filter_by)/value1/(another_filter)/value2/(sort_by)/name. Be aware that every combination of view parameters is a separate URL, which creates a lot of potential for duplicate content, duplicate HTML titles, and duplicate meta descriptions.
In order to indicate to search engines the "right" or "base" URL, you can use the canonical link tag. There are a couple of ways to specify this within an eZ Publish context, one of which would be to check for the existence of certain view parameters in your pagelayout.tpl template:
    {def $canonical=false()}
    ...
    {if or( is_set( $module_result.view_parameters.filter_param1 ),
            is_set( $module_result.view_parameters.filter_param2 ) )}
        {set $canonical = true()}
    {/if}
    <head>
    ...
    {if $canonical}
        {* add a rel="canonical" and set the href (the right page) to the url without the view parameters *}
        <link rel="canonical" href={$requested_uri_string|ezurl()} />
    {/if}
    ...
    </head>
In the case above, the $requested_uri_string variable is automatically available in the pagelayout, and excludes any view parameters. A slight variation on that framework would be to delegate the logic to the module result, letting the full view templates handle the logic of determining and/or overriding the canonical link. We won't show that solution in this post, but it would use the ezpagedata() and ezpagedata_set() template operators similar to what is discussed below regarding pagination.
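To check which crawled URLs should collapse onto the same canonical URL, you can strip the view parameters outside of eZ Publish as well. The Python sketch below assumes that view parameters always appear as "/(name)/value" pairs, as in the URL pattern shown earlier; the function name is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a view parameter is a "/(name)/value" pair in the URL path.
_VIEW_PARAMETER = re.compile(r"/\([^/()]+\)/[^/]*")


def strip_view_parameters(url):
    """Return url with every "/(name)/value" pair removed from its path."""
    parts = urlsplit(url)
    path = _VIEW_PARAMETER.sub("", parts.path) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Usage sketch:
#   strip_view_parameters(
#       "http://yoursite.com/yourpageurl/(filter_by)/value1/(sort_by)/name")
```

URLs from a crawl report that map to the same stripped URL are candidates for sharing one canonical link.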
"prev" and "next" links for pagination
Displaying one big piece of content across multiple paginated pages can cause duplicate HTML title and meta description issues. One of our clients' sites had rel="prev" (for previous page) and rel="next" (for next page) attributes directly on the previous and next page links in the pagination. However, Google's documentation on the topic suggests that you place this information in <link> tags within the HTML header instead.
To do so, your pagination template (loaded in full view templates) needs to send information to the pagelayout about whether the current page has previous and/or next pagination links. You can use the ezpagedata_set() template operator in the pagination template...
    {* PAGINATION TEMPLATE *}
    ...
    {switch match=$:item_previous|lt(0) }
        {case match=0}
            {* there is a previous page; let the pagelayout know about it *}
            {ezpagedata_set( 'previous_page', $previous_page_url|ezurl() )}
            <a href={$previous_page_url|ezurl()} class="prev">&lt; {"prev"|i18n("design/standard/navigator")}</a>
        {/case}
        {case match=1}
            <span class="prev">&lt; {"prev"|i18n("design/standard/navigator")}</span>
        {/case}
    {/switch}
    ...
    {switch match=$:item_next|lt($item_count)}
        {case match=1}
            {* there is a next page; let the pagelayout know about it *}
            {ezpagedata_set( 'next_page', $next_page_url|ezurl() )}
            <a href={$next_page_url|ezurl()} class="next" rel="next">{"next"|i18n("design/standard/navigator")} &gt;</a>
        {/case}
        {case}
            <span class="next">{"next"|i18n("design/standard/navigator")} &gt;</span>
        {/case}
    {/switch}
    ...
... and then test for this information using the ezpagedata() template operator in the pagelayout:
    {* PAGELAYOUT *}
    {def $pagedata = ezpagedata()}
    <head>
    ...
    {* the stored values were passed through ezurl(), so they already include the surrounding quotes *}
    {if is_set( $pagedata.persistent_variable.previous_page )}
        <link rel="prev" href={$pagedata.persistent_variable.previous_page} />
    {/if}
    {if is_set( $pagedata.persistent_variable.next_page )}
        <link rel="next" href={$pagedata.persistent_variable.next_page} />
    {/if}
    ...
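The rule for when each tag should appear can be sketched independently of the templates. The Python example below is only an illustration of that logic, assuming pages are addressed with an (offset) view parameter and that the first page has no offset in its URL; the function name and the base URL are hypothetical.

```python
from html import escape


def pagination_links(base_url, offset, limit, total):
    """Return the <link> tags for the page starting at offset."""

    def page_url(page_offset):
        # Assumption: the first page carries no (offset) view parameter.
        if page_offset == 0:
            return base_url
        return f"{base_url}/(offset)/{page_offset}"

    links = []
    if offset > 0:
        # Every page except the first has a previous page.
        previous_url = escape(page_url(max(offset - limit, 0)))
        links.append(f'<link rel="prev" href="{previous_url}" />')
    if offset + limit < total:
        # There is a next page while items remain beyond this one.
        next_url = escape(page_url(offset + limit))
        links.append(f'<link rel="next" href="{next_url}" />')
    return links


# Usage sketch: the middle page of 25 items shown 10 per page.
#   pagination_links("http://yoursite.com/articles", 10, 10, 25)
```

The first page gets only a "next" tag and the last page only a "prev" tag, which mirrors the two is_set() checks in the pagelayout.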
Do you have any common sources of duplicate content errors in an eZ Publish context? Share them with us by leaving a comment.