eZ Find: How to return specific fields of indexed data
By: Benjamin Kroll | January 11, 2017 | eZ Publish add-ons, eZ Publish development tips, Web solutions, ezfind, search, eZ Find, and Solr
When working with eZ Find fetches, you may want to return only a specific subset of data for each search result, rather than the whole content object.
You can do that by using the eZ Find 'search' fetch's 'fields_to_return' parameter.
Why use eZ Find 'search' fetch's 'fields_to_return'
Accessing index data not available on the content object by default
Each Solr result document holds information that is not part of the content object's attribute or meta information.
This includes some additional meta data, as well as custom data added during indexing e.g. via eZ Find's Index Time Plugin mechanism.
That information can be used to filter searches, but is not returned in the result object unless 'fields_to_return' is used.
Performance
It makes no sense to fetch whole content objects when you only need a few fields.
- Do you want to show 100 or 200 or 300 event titles with event dates (both of these being data_map attributes in this case) per page? No problem.¹
- Do you want to get a large number of pre-filtered content object IDs to use in another fetch? No problem.
- Do you want to create a large CSV set or data table quickly? No problem. (There is a much better way for the former; see "... but wait! There's more" below.)
¹ This still depends on server memory and processing speed, of course.
Depending on which information you need, you may be able to simply set the 'as_objects' parameter to false() by itself. This returns a lighter version of the results object, which still contains a lot of commonly used information, e.g. name, published, is_invisible and main_url_alias. Fetching these light results also lets you return a much larger number of results at a time without worrying too much about the memory cost.
If you need attribute data or custom indexed data, add 'fields_to_return' to the fetch. This returns a similarly light version of the results object, with the requested fields added.
There is still some overhead here, as each result returned contains a number of meta fields and other information returned by default. Unless you really want to push it, using the standard fetch with 'fields_to_return' should be fine.
If you do want to push it, you can always go further using the 'rawSolrRequest' fetch.
How to use eZ Find 'search' fetch's 'fields_to_return'
Please note: The examples below are based on eZ Find LS 5.3, which uses Solr 4.7.
Older versions of eZ Find using Solr 3.x feature an older version of the admin interface, accessible via http://<hostname>:8983/solr/admin.
The query tool is available in all versions of the admin interface and the query syntax remains the same as well.
Here is an example fetch in an eZ Publish template:
{* Return only two Solr fields per result; 'fields_to_return' requires 'as_objects' set to false() *}
{def $search_hash = hash( 'query', '*',
                          'class_id', 'secondary_content',
                          'fields_to_return', array( 'meta_owner_name_t', 'attr_main_title_t' ),
                          'as_objects', false(),
                          'limit', 10 )
     $search = fetch( 'ezfind', 'search', $search_hash )}
{$search.SearchResult.0|dump( show, 2 )}
The result dump shows:
Attribute | Type | Value |
---|---|---|
guid | string | 'abc123' |
installation_id | string | 'xyz987' |
installation_url | string | 'http://solr_dev/' |
name | string | 'About Us and Generic Content Page' |
language_code | string | 'eng-CA' |
owner_name | string | 'Administrator User' |
id | integer | 123 |
main_node_id | integer | 124 |
published | string | '2015-11-23T04:06:26Z' |
path_string | array | Array(1) |
>0 | string | '/1/2/124/' |
is_invisible | array | Array(1) |
>0 | boolean | false |
main_url_alias | string | 'About-Us-and-Generic-Content-Page' |
main_path_string | string | '/1/2/124/' |
fields | array | Array(1) |
>attr_main_title_t | string | 'About Us and Generic Content Page' |
highlight | string | '' |
elevated | boolean | false |
There are a few things to note about the fetch code, as well as the data returned by it:
- To be able to use the 'fields_to_return' parameter, you also need to set 'as_objects' to false().
- Field names for 'fields_to_return' are the actual Solr field names. Note: If you are unsure which fields are available, use the Query tool of the Solr Admin interface, which can be reached via http://<your_hostname>:8983/solr/#/ezp-default/query on a default installation. Enter a wildcard (*) in the 'fl' field to see all available fields for a query. You can also use the wildcard in partial field names, e.g. meta_*.
- meta_* fields are accessible via keys on the search result array, but without the meta_ prefix or the field type suffix (e.g. meta_owner_name_t becomes owner_name).
- as_* fields (binary data) are accessible via keys of the data_map array on the search result array, again without the as_ prefix or the field type suffix.
- All other fields end up in the fields array on the search result array. Unlike the meta_ and as_ fields, however, the key for these fields is the actual field name used in the 'fields_to_return' parameter.
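To make the three key-naming rules above concrete, here is a small Python model of them. It is only an illustration, not eZ Find's own PHP code (buildResultObjects(), referenced below, is where the real work happens), and it assumes that the "field type suffix" is everything after the last underscore, which holds for names like meta_owner_name_t.

```python
def place_returned_field(result, solr_field, value):
    """Put one returned Solr field into a search-result-like dict.

    Simplified model of the naming rules described above; the field type
    suffix is assumed to be the text after the last underscore.
    """
    if solr_field.startswith("meta_"):
        # meta_owner_name_t -> result["owner_name"]
        key = solr_field[len("meta_"):].rsplit("_", 1)[0]
        result[key] = value
    elif solr_field.startswith("as_"):
        # as_<name>_<suffix> -> result["data_map"]["<name>"]
        key = solr_field[len("as_"):].rsplit("_", 1)[0]
        result.setdefault("data_map", {})[key] = value
    else:
        # Any other field keeps its full Solr name under "fields"
        result.setdefault("fields", {})[solr_field] = value
    return result


result = {}
place_returned_field(result, "meta_owner_name_t", "Administrator User")
place_returned_field(result, "attr_main_title_t", "About Us and Generic Content Page")
print(result)
# {'owner_name': 'Administrator User', 'fields': {'attr_main_title_t': 'About Us and Generic Content Page'}}
```

This matches the owner_name key and the fields array in the dump above.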
For further insight into how the eZ Find fetches work, you can refer to:
- extension/ezfind/modules/ezfind/function_definition.php to see all available eZ Find fetch parameters
- search() in extension/ezfind/classes/ezfmodulefunctioncollection.php to see how the fetch parameters are handled
- buildResultObjects() in extension/ezfind/search/plugins/ezsolr/ezsolr.php to see how the search results object is created
Why isn't this documented?
I don't have an answer for that, but it brings up two important aspects of development in general:
- As a project maintainer: Keep your code documentation up to date, always. This applies to both inline and external docs.
- As a developer: Read the source code!
The first is arguably more important, but we all know good/complete documentation is, sadly, not common. The source code, however, gives you some insight into what's going on behind the scenes and, more importantly, will reveal to you functionality that you may not be aware of (yet).
Going further using rawSolrRequest
As the documentation states, the 'rawSolrRequest' fetch function "Allows for “raw” Solr requests (not for normal use, but for example to search “foreign” Solr or Lucene indexes)."
You should keep that in mind, but don't let it stop you.
You will likely only need 'rawSolrRequest' as an exception or for debugging. It shouldn't be your go-to solution.
The standard 'search' fetch in combination with 'as_objects' and 'fields_to_return' will get you similar data, while making use of the standard attribute filter and sort syntax, as well as the CMS permissions and visibility layers, with only minor overhead. With the standard fetches you also don't have to deal with authentication and request parameter configuration, as the fetch handles that for you.
{* Build the Solr base URL from solr.ini, adding credentials if authentication is enabled *}
{def $use_auth = ezini( 'SolrBase', 'SearchServerAuthentication', 'solr.ini' )|eq( 'enabled' )
     $auth_prefix = cond( $use_auth, concat( ezini( 'SolrBase', 'SearchServerUserPass', 'solr.ini' ), '@' ),
                          true(), '' )
     $raw_base_url = ezini( 'SolrBase', 'SearchServerURI', 'solr.ini' )|explode( '://' )|implode( concat( '://', $auth_prefix ) )
     $query = '*'
     {* Raw Solr select request returning only the two requested fields *}
     $raw_hash = hash( 'baseURL', $raw_base_url,
                       'request', '/select',
                       'parameters', hash( 'q', $query,
                                           'rows', 10,
                                           'fq', 'meta_class_identifier_ms:secondary_content',
                                           'fl', 'meta_owner_name_t,attr_main_title_t' ) )
     $raw = fetch( 'ezfind', 'rawSolrRequest', $raw_hash )}
{$raw.response.docs|dump( show, 3 )}
The result doc dump shows:
Attribute | Type | Value |
---|---|---|
0 | array | Array(2) |
>meta_owner_name_t | string | 'Administrator User' |
>attr_main_title_t | string | 'About Us and Generic Content Page' |
As you can see, all the default fields present in the standard 'search' fetch are gone now, giving you only the fields requested.
It's important to note that the 'rawSolrRequest' fetch returns a different result structure than the standard fetch. Dump $raw to get an overview of what's available.
The fetch example starts off by checking whether Solr uses authentication, based on the settings in eZ Find's solr.ini. It then builds the base URL for the fetch by inserting the configured credentials (if any) right after the protocol of the SearchServerURI:
$use_auth = ezini( 'SolrBase', 'SearchServerAuthentication', 'solr.ini' )|eq( 'enabled' )
$auth_prefix = cond( $use_auth, concat( ezini( 'SolrBase', 'SearchServerUserPass', 'solr.ini' ), '@' ),
                     true(), '' )
$raw_base_url = ezini( 'SolrBase', 'SearchServerURI', 'solr.ini' )|explode( '://' )|implode( concat( '://', $auth_prefix ) )
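The explode/implode step is easy to misread, so here is the same string manipulation as a Python sketch. The URI and the user:password pair in the usage lines are hypothetical example values standing in for SearchServerURI and SearchServerUserPass.

```python
def build_raw_base_url(search_server_uri, auth_enabled, user_pass):
    """Mirror of the template's explode( '://' )|implode( ... ) trick."""
    # Empty prefix when SearchServerAuthentication is not 'enabled'
    auth_prefix = f"{user_pass}@" if auth_enabled else ""
    # Split on '://' and re-join with the credentials placed after the protocol
    return ("://" + auth_prefix).join(search_server_uri.split("://"))


# Hypothetical settings values
print(build_raw_base_url("http://localhost:8983/solr/ezp-default", True, "solr:secret"))
# http://solr:secret@localhost:8983/solr/ezp-default
```

Like the template code, this does not URL-encode the credentials, so a password containing characters such as @ or / would need escaping first.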
The fetch itself takes three parameters, 'baseURL', 'request' and 'parameters':
- 'baseURL' is the URL described above, i.e. the Solr URL eZ Find sends its request to
- 'request' is the Solr request handler to call, e.g. '/select'
- 'parameters' is a hash of Solr request parameters²
  - 'q' is the query string
  - 'start' is equivalent to 'offset' in the 'search' fetch
  - 'rows' is equivalent to 'limit' in the 'search' fetch
  - 'fq' is equivalent to 'filter' in the 'search' fetch
  - 'fl' is the field list, equivalent to 'fields_to_return' in the 'search' fetch
² For a full list of parameters, check the Solr Admin interface's Query tool.
$raw_hash = hash( 'baseURL', $raw_base_url,
                  'request', '/select',
                  'parameters', hash( 'q', $query,
                                      'start', 0,
                                      'rows', 10,
                                      'fq', 'meta_class_identifier_ms:secondary_content',
                                      'fl', 'meta_owner_name_t,attr_main_title_t' ) )
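The parameter correspondence listed above can be written down as a lookup table. This Python sketch only renames keys and joins the field list into the comma-separated form 'fl' expects; it does not reproduce how the 'search' fetch translates its own filter syntax or applies permissions and visibility, which is exactly what you give up with the raw fetch. The mapping of 'query' to 'q' is an assumption based on 'q' being the query string.

```python
# 'search' fetch parameter names and their raw Solr counterparts
SEARCH_TO_SOLR = {
    "query": "q",
    "offset": "start",
    "limit": "rows",
    "filter": "fq",
    "fields_to_return": "fl",
}


def to_raw_parameters(search_params):
    """Rename 'search'-style parameters to raw Solr ones (names only)."""
    raw = {}
    for name, value in search_params.items():
        solr_name = SEARCH_TO_SOLR[name]
        if solr_name == "fl" and isinstance(value, (list, tuple)):
            # 'fields_to_return' is an array; 'fl' is a comma-separated string
            value = ",".join(value)
        raw[solr_name] = value
    return raw


params = to_raw_parameters({
    "query": "*",
    "offset": 0,
    "limit": 10,
    "filter": "meta_class_identifier_ms:secondary_content",
    "fields_to_return": ["meta_owner_name_t", "attr_main_title_t"],
})
print(params["fl"])  # meta_owner_name_t,attr_main_title_t
```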
Note: 'rawSolrRequest' does not support the 'wt' parameter, which Solr uses to select the response format.
rawSolrRequest for debugging
The Solr admin interface is the best tool to debug your data and query problems, but you may not always have access to it for security reasons.
In such cases, 'rawSolrRequest' lets you quickly see what data is available and spot problems with the query used to retrieve it. As long as you have access to a template in which to put your debug query, you're set.
... but wait! There's more
Results of eZ Find fetches are returned as PHP arrays. Solr, however, is able to return result data in a number of different formats: JSON, XML, Python, Ruby, PHP, and CSV.
The result data type is controlled via the 'wt' parameter on a Solr select request.
Run via the Solr Admin interface, our example request URL would look like this:
http://<your_hostname>:8983/solr/ezp-default/select?q=*&wt=csv&fq=meta_class_identifier_ms%3Asecondary_content&start=0&rows=10&fl=meta_owner_name_t%2Cattr_main_title_t
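If you build such URLs in a script, the standard library's urlencode produces the same query string. In this sketch, localhost stands in for <your_hostname>, and safe="*" is used only so the q=* wildcard stays readable; ':' and ',' are still percent-encoded as in the URL above.

```python
from urllib.parse import urlencode


def build_select_url(core_url, params):
    """Build a Solr /select URL from a parameter dict (insertion order is kept)."""
    # Leave '*' unescaped; everything else is encoded by quote_plus
    return f"{core_url}/select?{urlencode(params, safe='*')}"


url = build_select_url(
    "http://localhost:8983/solr/ezp-default",  # assumed host and default core
    {
        "q": "*",
        "wt": "csv",  # ask Solr for CSV instead of the default format
        "fq": "meta_class_identifier_ms:secondary_content",
        "start": 0,
        "rows": 10,
        "fl": "meta_owner_name_t,attr_main_title_t",
    },
)
print(url)
```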
And would return:
meta_owner_name_t,attr_main_title_t
Administrator User,About Us and Generic Content Page
Not too surprising or exciting, until you urgently need to create large reports on your data structure as CSVs, which this type of request handles with ease. A few thousand rows at a time!
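If those CSV reports are pulled by a script rather than a browser, the response body can be read directly with Python's csv module. The sketch below parses the sample output shown above and computes 'start' offsets for paging through a large export a few thousand rows at a time; the total row count and batch size are made-up example values, and the HTTP request itself is left out.

```python
import csv
import io


def parse_solr_csv(text):
    """Turn a Solr wt=csv response body into a list of dicts keyed by field name."""
    # The first line holds the field names requested via 'fl'
    return list(csv.DictReader(io.StringIO(text)))


def batch_starts(total_rows, rows_per_request):
    """'start' offsets for paging through total_rows in fixed-size batches."""
    return list(range(0, total_rows, rows_per_request))


body = "meta_owner_name_t,attr_main_title_t\nAdministrator User,About Us and Generic Content Page\n"
rows = parse_solr_csv(body)
print(rows[0]["attr_main_title_t"])  # About Us and Generic Content Page
print(batch_starts(7500, 3000))      # [0, 3000, 6000]
```

Each offset would be sent as 'start' together with 'rows' set to the batch size.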