How to control sorting for nodes with missing values in the sort field in Apache Solr

The other day while hanging out in IRC, I was pinged by katbailey, the Lady of the Lovely Voice (I could listen to her talk for hours) with a question about sorting in Solr when a sort field doesn't contain a value.  In particular, how can you control whether nodes without a value in the sort field show up at the beginning or end of the search results?  In her particular case, there was a Price field that was being sorted on, but not all nodes had a Price value, and the ones without Price were showing up at the beginning of the list.

I hadn't dealt with that before, but Peter Wolanin (aka pwolanin), one of the Solr Gurus, piped up with the answer. It lies in schema.xml, one of the Solr configuration files. In this case, Katherine was using a field with the "fs_" prefix (see my previous post for more info on dynamic fields and how they work).  In the config file, it is configured like this:

<dynamicField name="fs_*" type="sfloat" indexed="true" stored="true" multiValued="false"/>

This is a field of type sfloat, so we need to look at the configuration for that field type. 

<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>

The secret sauce is the sortMissingLast attribute.  It is explained in more detail in comments just above it in the config file.

 <!-- The optional sortMissingLast and sortMissingFirst attributes are
        currently supported on types that are sorted internally as strings.
        - If sortMissingLast="true", then a sort on this field will cause documents
        without the field to come after documents with the field,
        regardless of the requested sort order (asc or desc).
        - If sortMissingFirst="true", then a sort on this field will cause documents
        without the field to come before documents with the field,
        regardless of the requested sort order.
        - If sortMissingLast="false" and sortMissingFirst="false" (the default),
        then default lucene sorting will be used which places docs without the
        field first in an ascending sort and last in a descending sort.
       -->
 

So in this particular case, the field type is configured so that when Katherine sorts by the Price field, any nodes without a Price value are placed after all of the nodes that have one. If she wanted to place those items before the nodes with a Price value, she would replace

sortMissingLast="true"

with

sortMissingFirst="true"
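
As a rough sketch, the complete field type definition with that change might look like the line below. Everything except the swapped attribute is copied from the sfloat definition shown earlier; I haven't tested this exact line, so treat it as an illustration rather than a drop-in config.

```xml
<!-- Sketch: the same sfloat type as above, with sortMissingLast swapped for sortMissingFirst -->
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingFirst="true" omitNorms="true"/>
```

Solr reads schema.xml when it starts up, so a change like this typically needs a restart (or a core reload) before it takes effect.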

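And if she wanted the placement to vary depending on whether the sort was ascending or descending, she would set both attributes to false (or leave them out, since false is the default). Nodes without a Price value would then come first in an ascending sort and last in a descending sort.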
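
Spelled out, the "vary with sort direction" configuration might look like the sketch below. It reuses the sfloat definition from earlier and sets both attributes explicitly, which is an assumption about how you'd want to write it; the schema comment describes false/false as the default, so omitting both attributes should behave the same way.

```xml
<!-- Sketch: both attributes explicitly false, so default Lucene sorting applies -->
<fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="false" sortMissingFirst="false" omitNorms="true"/>
```

With this in place, per the comment quoted above, documents without the field come first in an ascending sort and last in a descending sort.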

Now, as a side note, the sharp-eyed among you might have noticed that in Katherine's case, the field was already configured the way she wanted: sortMissingLast was set to true for that field type, so the items without a Price value should have been displayed at the end of the list, not the beginning. As it turned out, the problem in her case was that the nodes she thought did not have a value actually had a value of 0, which put them at the top of the list. So, wielding her Solr fu, she added a line to her hook_apachesolr_update_index() function to only index that field if it has a non-zero value.

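function mymodule_apachesolr_update_index(&$document, $node) {
  // $fields is assumed to have been populated before this point
  // (that code is not shown here).
  // The sale_price entry will not have been set if the value was 0,
  // so zero-priced nodes get no Price value in the index.
  if (isset($fields['sale_price'])) {
    $document->fs_cck_field_sale_price = $fields['sale_price'];
  }
}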

This keeps the Price field from being indexed for nodes whose value is 0, so they are placed at the end of the search results when the results are sorted by Price.

Thus ends today's lesson in Solr. It should be noted that even though the issue turned out to be something different, all was not lost, because we got to learn how Solr sorts documents that have no value in the sort field. A win-win all the way around, don't you think?
