Injecting content into rich text fields at the template level in eZ Publish / eZ Platform
By: Benjamin Kroll | January 1, 2018 | eZ Publish development tips, Web solutions, twig, template operator, ez platform, and ez publish legacy
On content-heavy sites, it can be useful to inject snippets of code into CMS data before display (for example, into the body copy of an article). These snippets could be automatically generated glossaries, tables of contents, or ad units placed mid-content.
Let's explore a general approach to getting such snippets into place.
The use case: injecting mid-content ad units
To be able to support ad units mid-content on long articles, it doesn't make much sense to rely on editors to place custom tags or literal snippets directly into the content they create. That approach is unreliable, prone to human error, and a nightmare to support should requirements change.
A more reliable and maintainable approach is to work out a set of rules and automatically inject the ad unit code at the template level based on those rules, before it is rendered. These rules should be content-based and ideally have a fallback or two, since not all content is structured the same way.
If your articles generally have at least a certain number of paragraphs of text and several sub-headings, you could base the rules on that information. For example: Place the ad unit snippet before the second sub-heading; if there is no second sub-heading, fall back and try to place it after the third paragraph.
Solution outline
To support the use case above, a filter method is used to process and update the article's rich text content. For the purposes of this article, "rich text" refers to eZ Publish's "XML block" datatype. On eZ Platform, a custom Twig filter would be used to call this filter method, while in eZ Publish legacy, a template operator can be used to achieve the same result.
The filter method will need to:
- Parse the source HTML into a DOM document
- Search for the target location(s) specified using XPath queries and fall back until one is found or no targets remain
- Inject the injection HTML given, if a target location is found
- Return the source HTML (modified or as-is)
The filter method
public static function injectIntoHTML( $html, $injectContent, $injectPositionXPaths, $mode = 'insertAfter' )
{
    $DOM = new \DOMDocument;

    try {
        // force encoding by prepending an XML declaration to ensure data doesn't get mangled
        $DOM->loadHTML( '<?xml encoding="utf-8" ?>' . $html );
        $xpath = new \DOMXPath( $DOM );
        $parent = $xpath->query( '//body' );

        // accept a single XPath query string as well as an array of them
        $injectPositionXPaths = ( $injectPositionXPaths && is_string( $injectPositionXPaths ) )
            ? array( $injectPositionXPaths )
            : $injectPositionXPaths;

        // try to find the injection target based on the path given
        // the first match wins, others are fallback
        foreach ( $injectPositionXPaths as $path ) {
            $injectTarget = $xpath->query( $path );

            if ( $injectTarget && $injectTarget->item(0) ) {
                $newElement = $DOM->createElement( 'inject' );

                switch ( $mode ) {
                    case 'insertBefore':
                        $parent->item(0)->insertBefore( $newElement, $injectTarget->item(0) );
                        break;

                    case 'insertAfter':
                    default:
                        $parent->item(0)->insertBefore( $newElement, $injectTarget->item(0)->nextSibling );
                        break;
                }

                // we're only interested in the body content after this point
                // i.e. the original HTML passed in, including the injection placeholder
                preg_match( '/<body>(.*)<\/body>/ims', $DOM->saveHTML( $parent->item(0) ), $matches );

                if ( $matches && isset( $matches[1] ) ) {
                    // replacing injection placeholder with the content to be injected;
                    // a literal replace keeps "$1" or "\1" in the content from being read as backreferences
                    $html = str_replace( '<inject></inject>', $injectContent, $matches[1] );
                }

                // exit after an injection target was found & content was injected successfully
                break;
            }
        }
    } catch ( \Exception $e ) {
        // handle any exceptions as needed
        // (note the leading backslash: inside a namespace, a bare "Exception" would never match)
    } finally {
        // always return
        return $html;
    }
}
Solution breakdown
Parameters
The injection method accepts three required parameters and one optional parameter, and returns a modified version of the HTML passed in.
- $html (string): source HTML to inject into
- $injectContent (string): content to inject
- $injectPositionXPaths (string|array): XPath query string or array of XPath query strings describing where to inject the content. The first match wins; the others are used as fallbacks. Note: XPath query strings need to start with //body, as the HTML is placed in a wrapper node during the injection process.
- $mode (string, optional): injection mode, either insertBefore or insertAfter (default)
Creating a DOM document from the HTML
The injection method creates a DOM document from the HTML using loadHTML(), a method of PHP's DOMDocument class. Because the source HTML is a fragment rather than a full HTML document, it carries no encoding information of its own; prepending an XML declaration forces the desired encoding and prevents the text from getting mangled.
When working with the HTML output of an eZ Publish rich text (XML block) field, you have some assurance that the supplied HTML is valid. Content in eZ Publish is stored separately from design; content that is created through eZ Publish's rich text editor passes through validator methods before being stored.
A reference to the common parent element is stored, which is used during the injection process. The body node is created by DOMDocument's loadHTML() method and wraps the child nodes created from the HTML.
$DOM = new \DOMDocument;

try {
    // force encoding by prepending an XML declaration to ensure data doesn't get mangled
    $DOM->loadHTML( '<?xml encoding="utf-8" ?>' . $html );
    $xpath = new \DOMXPath( $DOM );
    $parent = $xpath->query( '//body' );
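To see what loadHTML() does with a fragment, here is a minimal sketch that loads a short snippet with and without the encoding prefix. The sample markup and variable names are assumptions for illustration and are not part of the original method.

```php
<?php
// Sample fragment (an assumption for illustration), saved as UTF-8.
$html = '<h2>Café</h2><p>Crème brûlée</p>';

// With the prefix, libxml reads the input as UTF-8.
$withPrefix = new DOMDocument;
$withPrefix->loadHTML( '<?xml encoding="utf-8" ?>' . $html );

// Without it, no encoding is declared and non-ASCII text may be misread.
$withoutPrefix = new DOMDocument;
$withoutPrefix->loadHTML( $html );

$xpath = new DOMXPath( $withPrefix );

// loadHTML() wraps the fragment in html and body nodes, so the
// fragment's top-level elements become children of the body node.
$body    = $xpath->query( '//body' );
$heading = $xpath->query( '//body/h2[1]' )->item(0);

echo 'body nodes found: ', $body->length, "\n";
echo 'heading with prefix: ', $heading->textContent, "\n";

// Compare the same heading as read without the prefix.
$plainHeading = ( new DOMXPath( $withoutPrefix ) )->query( '//body/h2[1]' )->item(0);
echo 'same text without prefix? ';
var_dump( $plainHeading->textContent === $heading->textContent );
```

This is also why the XPath queries passed to the method are written relative to //body.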
Finding and processing injection targets
The method loops over the injection target XPath queries and creates a new placeholder element as soon as a query matches. It stops at the first match, or once every query has been tried without success.
The new element is created either before or after the target element, based on the injection mode chosen.
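Since the DOMNode $parent->item(0) does not have an insertAfter() method available, the code looks up the target's next sibling node instead and inserts before it, resulting in the desired placement. If the target is the last child, nextSibling is null and insertBefore() appends the new element to the parent, which still places it after the target. (The call does fail if the target is not a direct child of the body node, which is why the queries need to start with //body; this could be improved by inserting relative to the target's own parentNode.)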
$injectPositionXPaths = ( $injectPositionXPaths && is_string( $injectPositionXPaths ) )
    ? array( $injectPositionXPaths )
    : $injectPositionXPaths;

foreach ( $injectPositionXPaths as $path ) {
    $injectTarget = $xpath->query( $path );

    if ( $injectTarget && $injectTarget->item(0) ) {
        $newElement = $DOM->createElement( 'inject' );

        switch ( $mode ) {
            case 'insertBefore':
                $parent->item(0)->insertBefore( $newElement, $injectTarget->item(0) );
                break;

            case 'insertAfter':
            default:
                $parent->item(0)->insertBefore( $newElement, $injectTarget->item(0)->nextSibling );
                break;
        }
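The "insert after" behaviour can be isolated in a small helper. The sketch below is a hypothetical function, not part of the original method; it works from the target's own parent node, so it does not depend on the target being a direct child of the body node.

```php
<?php
/**
 * Hypothetical helper (not part of the original method): insert $newNode
 * directly after $target, whatever the target's parent is.
 */
function insertAfter( DOMNode $newNode, DOMNode $target )
{
    // A null reference node makes insertBefore() append to the parent,
    // so this also covers a target that is the last child.
    return $target->parentNode->insertBefore( $newNode, $target->nextSibling );
}

// Sample fragment (an assumption for illustration).
$DOM = new DOMDocument;
$DOM->loadHTML( '<?xml encoding="utf-8" ?><p>One</p><p>Two</p>' );
$xpath = new DOMXPath( $DOM );

// Target the last paragraph, which has no next sibling.
$last        = $xpath->query( '//body/p[2]' )->item(0);
$placeholder = insertAfter( $DOM->createElement( 'inject' ), $last );

echo 'placeholder follows: ', $placeholder->previousSibling->textContent, "\n";
echo 'placeholder parent: ', $placeholder->parentNode->nodeName, "\n";
```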
Injecting the HTML
The body node is converted back to HTML using DOMDocument's built-in saveHTML() method, and a regular expression match extracts only the contents of the body element.
If the match was successful, the placeholder element created in the XML processing above is then replaced with the injection HTML, updating the original HTML passed to the filter.
preg_match( '/<body>(.*)<\/body>/ims', $DOM->saveHTML( $parent->item(0) ), $matches );

if ( $matches && isset( $matches[1] ) ) {
    // replacing injection placeholder with the content to be injected;
    // a literal replace keeps "$1" or "\1" in the content from being read as backreferences
    $html = str_replace( '<inject></inject>', $injectContent, $matches[1] );
}
In case the match is unsuccessful, the unmodified HTML is returned. This should not be an issue if the XPath query targets were chosen carefully and good fallbacks are supplied.
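One detail worth checking when swapping the placeholder for the injected markup: a regular-expression replacement treats sequences such as $1 or \1 in the replacement string as backreferences, which can silently alter injected content. The sketch below contrasts that with a literal replacement; the sample strings are assumptions for illustration.

```php
<?php
// Illustrative values (assumptions): body HTML with a placeholder, and
// injected markup that contains a dollar sign followed by digits.
$bodyHtml      = '<p>Intro</p><inject></inject><p>More</p>';
$injectContent = '<div class="ad-unit">From $10 a month</div>';

// Regular-expression replacement: "$10" is read as a reference to capture
// group 10, which does not exist, so it is replaced with an empty string.
$viaRegex = preg_replace( '/<inject><\/inject>/', $injectContent, $bodyHtml );

// Literal replacement: the injected markup is inserted exactly as given.
$viaLiteral = str_replace( '<inject></inject>', $injectContent, $bodyHtml );

echo $viaRegex, "\n";
echo $viaLiteral, "\n";
```

Ad unit code often contains script fragments, so a literal replacement is the safer default unless pattern matching is actually needed.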
Applying the filter
The Twig filter call in the template would look something like this:
{{ my_xmlblock_data_output_html|inject_into_html(
    '<div class="ad-unit">My ad unit code here</div>',
    [
        '//body/h2[2]',
        '//body/h2[1]',
        '//body/p[15]',
        '//body/p[10]',
        '//body/p[5]'
    ],
    'insertBefore'
) }}
The Twig filter is applied to the rich text data's HTML output, passing in the HTML to inject, a number of injection targets (as an array of XPath queries), and the injection mode.
The example above targets an injection:
- Before the second or first heading level 2 tag; or
- Before the 15th, 10th, or 5th paragraph tag
Note: Each query starts with //body as the method called by inject_into_html places the rich text field's HTML contents passed to the filter into a wrapper node as part of the injection process.
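When tuning a fallback list like this, it helps to know which query actually wins for a given article. The sketch below uses a hypothetical debugging helper, firstMatchingPath(), which is not part of the original method; the sample articles are also assumptions.

```php
<?php
/**
 * Hypothetical debugging helper (not part of the original method):
 * returns the first XPath query that matches, or null if none do.
 */
function firstMatchingPath( $html, array $paths )
{
    $DOM = new DOMDocument;
    $DOM->loadHTML( '<?xml encoding="utf-8" ?>' . $html );
    $xpath = new DOMXPath( $DOM );

    foreach ( $paths as $path ) {
        $nodes = $xpath->query( $path );

        if ( $nodes && $nodes->item(0) ) {
            return $path;
        }
    }

    return null;
}

$paths = [ '//body/h2[2]', '//body/h2[1]', '//body/p[15]', '//body/p[10]', '//body/p[5]' ];

// Sample articles (assumptions for illustration).
$oneHeading = '<p>Intro</p><h2>Section</h2>' . str_repeat( '<p>Text</p>', 6 );
$noHeadings = str_repeat( '<p>Text</p>', 6 );
$tooShort   = '<p>Only one paragraph</p>';

var_dump( firstMatchingPath( $oneHeading, $paths ) ); // string(12) "//body/h2[1]"
var_dump( firstMatchingPath( $noHeadings, $paths ) ); // string(11) "//body/p[5]"
var_dump( firstMatchingPath( $tooShort, $paths ) );   // NULL
```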
Handling errors
If the HTML source you're injecting into contains markup that DOMDocument's parser does not recognise, loadHTML() reports the parser errors as PHP warnings, and the injection can fail, particularly where warnings are converted to exceptions. If you know these errors shouldn't stop the injection, you can switch libxml's error handling to internal and work through the collected errors as needed. Whether the injection then succeeds depends on the type and severity of the error.
The error handling can be adjusted by adding a libxml_use_internal_errors call right after the DOMDocument instantiation.
$DOM = new \DOMDocument;
libxml_use_internal_errors(true);
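A slightly fuller sketch of that pattern follows: it collects the parser errors, prints them for inspection, and restores the previous error-handling setting afterwards. The sample HTML is an assumption, and whether a given element triggers an error depends on the libxml version in use.

```php
<?php
// Sample input (an assumption for illustration); HTML5 elements such as
// <section> are a common source of parser complaints.
$html = '<section><p>Text</p></section>';

// Collect parser errors internally instead of raising PHP warnings,
// remembering the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors( true );

$DOM = new DOMDocument;
$DOM->loadHTML( '<?xml encoding="utf-8" ?>' . $html );

// Inspect whatever the parser reported, then clear the error buffer.
$errors = libxml_get_errors();
foreach ( $errors as $error ) {
    echo trim( $error->message ), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();
libxml_use_internal_errors( $previous );

// The document is still usable for XPath queries.
$xpath = new DOMXPath( $DOM );
echo 'paragraphs found: ', $xpath->query( '//p' )->length, "\n";
```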
A flexible solution
The solution above covers the requirements of the use case and is easy to apply on any type of rich text content. If ad requirements change, it should be as simple as updating a number of templates.
The solution is also flexible enough to be extended to cover additional use cases such as:
- Injecting at multiple locations
- New injection modes (inside-before, inside-after, wrap...)
- Injection mode per rule