Apache Solr - Group by date proximity

Posted April 27, 2016

The company I currently work for sells camping vacations. In the travel business, it is common to have fixed days for arrival and departure, usually Saturday.

We've opted for a different model: customers select their preferred arrival and departure days, as long as the stay fits within the planning (that is, it doesn't leave any one-night gaps). All our availability is indexed in a Solr document database.
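The one-night-gap rule can be written as a small check. This is a minimal sketch, not our production code: it assumes availability is modelled as a free window [free_start, free_end) per unit, and that a stay is rejected when it leaves exactly one free night before or after it. The function name and the dates are hypothetical.

```python
from datetime import date


def leaves_one_night_gap(free_start: date, free_end: date,
                         arrival: date, departure: date) -> bool:
    """Return True if booking [arrival, departure) inside the free window
    [free_start, free_end) would strand exactly one free night on either side."""
    if not (free_start <= arrival < departure <= free_end):
        raise ValueError("stay must fit inside the free window")
    gap_before = (arrival - free_start).days  # free nights left before the stay
    gap_after = (free_end - departure).days    # free nights left after the stay
    return gap_before == 1 or gap_after == 1


window = (date(2016, 6, 4), date(2016, 6, 18))
# Arriving one day after the window opens strands the night of 4 June.
print(leaves_one_night_gap(*window, date(2016, 6, 5), date(2016, 6, 12)))  # True
# Arriving on the first free day leaves no gap before and 6 nights after.
print(leaves_one_night_gap(*window, date(2016, 6, 4), date(2016, 6, 12)))  # False
```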

The relevant part of our schema looks like this:

/var/solr/data/instance/conf/schema.xml

<field name="unit_type_id" type="int"/>
<field name="arrival_date" type="tdate"/>
<field name="departure_date" type="tdate"/>
<field name="nights" type="int" />
<field name="price" type="double" />
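Assuming, as the schema suggests, one document per bookable arrival/departure combination, a document could be assembled like this. Only the field names come from the schema; the helper names and the example price are hypothetical. Dates are written as UTC midnight timestamps, the ISO-8601 form Solr date fields accept, and nights is derived from the two dates so they can never disagree.

```python
from datetime import date


def solr_date(d: date) -> str:
    """Format a date as a UTC midnight timestamp, e.g. 2016-06-06T00:00:00Z."""
    return d.strftime("%Y-%m-%dT00:00:00Z")


def make_availability_doc(unit_type_id: int, arrival: date,
                          departure: date, price: float) -> dict:
    """Build one availability document with the schema's field names."""
    nights = (departure - arrival).days
    if nights < 1:
        raise ValueError("departure must be after arrival")
    return {
        "unit_type_id": unit_type_id,
        "arrival_date": solr_date(arrival),
        "departure_date": solr_date(departure),
        "nights": nights,
        "price": price,
    }


# Example values only: a 7-night stay for unit type 155.
doc = make_availability_doc(155, date(2016, 6, 6), date(2016, 6, 13), 499.0)
print(doc)
```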

Current implementation and issues

Besides the price grid (described at the end of this post), our site has several search features and widgets that present prices in a more commercially attractive way. These widgets search for a specific date. To show alternatives when there is no exact match, the site offers an option to search with a few days' flexibility. This is done by searching around the requested date; for a requested arrival on 6 June 2016:

arrival_date:[2016-06-06T00:00:00Z/DAY-3DAYS TO 2016-06-06T00:00:00Z/DAY+3DAYS]
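The range can be generated for any requested date with Solr date math: /DAY rounds down to midnight, and -3DAYS / +3DAYS shift the bounds. A minimal sketch; the function name and the flex_days parameter are hypothetical.

```python
from datetime import date


def flexible_arrival_filter(requested: date, flex_days: int = 3) -> str:
    """Build an arrival_date range query covering requested +/- flex_days."""
    anchor = requested.strftime("%Y-%m-%dT00:00:00Z") + "/DAY"
    return (f"arrival_date:[{anchor}-{flex_days}DAYS "
            f"TO {anchor}+{flex_days}DAYS]")


print(flexible_arrival_filter(date(2016, 6, 6)))
# arrival_date:[2016-06-06T00:00:00Z/DAY-3DAYS TO 2016-06-06T00:00:00Z/DAY+3DAYS]
```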

We then group the results by 'unit_type_id' for display. Our search has worked this way for a while, but it always bugged me because it has one major drawback: if there is a match in the 3 days before the requested date, that is the one displayed, even if the requested date itself is available. Every document in the range matches the query equally well, so nothing makes Solr prefer the requested date within a group.
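The drawback can be reproduced without Solr. This sketch assumes the grouped response keeps one document per group (Solr's default group.limit of 1) and that, with equal scores, the order of documents within a unit type has nothing to do with the requested date (here it is simply the list order). The result list and the second unit type id are hypothetical.

```python
# Hypothetical results for a requested arrival of 2016-06-06. All documents
# matched the same range query, so they carry the same score.
results = [
    {"unit_type_id": 155, "arrival_date": "2016-06-03T00:00:00Z"},
    {"unit_type_id": 155, "arrival_date": "2016-06-06T00:00:00Z"},
    {"unit_type_id": 210, "arrival_date": "2016-06-07T00:00:00Z"},
]


def first_per_group(docs, field):
    """Mimic grouping with group.limit=1: keep the first document seen
    for each value of `field` (dicts preserve insertion order)."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], doc)
    return list(groups.values())


for doc in first_per_group(results, "unit_type_id"):
    print(doc["unit_type_id"], doc["arrival_date"])
# Unit type 155 shows 3 June, although 6 June is available too.
```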

I started looking for a way to display more relevant results and dove into the Apache Solr documentation. A StackOverflow question I asked got me started by pointing me to the boost feature.

Here are some other good articles that helped me along:

Using Boost to group dates by proximity and relevancy

To use the boost feature to group by date proximity, we'll use the defType and boost parameters. In the boost function, we apply recip to ms on the arrival_date, which lets us weigh the results; ms(a,b) returns the difference a - b in milliseconds. There is a really nice explanation of the recip function on StackOverflow.

recip, short for 'reciprocal function', takes four arguments (x,m,a,b) and computes a/(m*x+b). m, a and b are constants; x is any numeric field or the result of another function.
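To see how recip and ms combine, here is a Python mirror of the two applied to arrival dates a few days either side of the requested date, with m, a and b all set to 1 as in the boost used in this post. It assumes ms(a,b) returns a - b in milliseconds, so ms(arrival_date, requested) is negative for earlier dates. With the signed difference, recip yields small negative values for earlier dates, and an arrival 3 days early outranks one 1 day early. Wrapping the difference in abs(), as the nights example does with sub(), ranks every date by distance alone.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Python mirror of Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds

# Candidate arrival dates, as offsets in days from the requested date.
offsets = [-3, -1, 0, 1, 3]

# Signed: mirrors recip(ms(arrival_date, requested), 1, 1, 1).
signed = {d: recip(d * DAY_MS, 1, 1, 1) for d in offsets}
# Distance: mirrors recip(abs(ms(arrival_date, requested)), 1, 1, 1).
distance = {d: recip(abs(d * DAY_MS), 1, 1, 1) for d in offsets}

# Highest boost first (sorted() is stable, so ties keep list order).
signed_order = sorted(offsets, key=signed.get, reverse=True)
distance_order = sorted(offsets, key=distance.get, reverse=True)
print(signed_order)    # [0, 1, 3, -3, -1]
print(distance_order)  # [0, -1, 1, -3, 3]
```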

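Because we only use the boost to order the search results, the absolute values don't matter, so we can pick 1 for m, a and b. recip(x,1,1,1) is then 1/(x+1): 1 when x is 0, shrinking as x grows. An example from our PHP code, which favours stays whose length is closest to the requested number of nights ($nights):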

recip(abs(sub(nights,' . $nights . ')),1,1,1)
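The nights boost can be checked by hand with a Python mirror of the same expression. A sketch only; the function names are hypothetical and requested = 7 stands in for whatever $nights holds.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Python mirror of Solr's recip(x,m,a,b) = a / (m*x + b)."""
    return a / (m * x + b)


def nights_boost(nights: int, requested_nights: int) -> float:
    """Mirror of recip(abs(sub(nights,requested)),1,1,1): 1.0 for an exact
    match, 1/2 one night off, 1/3 two nights off, and so on."""
    return recip(abs(nights - requested_nights), 1, 1, 1)


requested = 7  # hypothetical value of $nights
for n in (5, 6, 7, 8, 10):
    print(n, nights_boost(n, requested))
```

Because of abs(), a stay one night shorter and one night longer get the same boost, so neither direction is favoured.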

Important gotcha: put the query in the q.alt parameter, not in q, to get the desired results.

You can try the full search feature, with the filtering described here, on ViaLora.nl, or the campsite-level filter on Rent-a-Tent.nl; both sites run on the same platform.

URL-decoded and indented for legibility, the full URL becomes:

https://HOST:PORT/solr/base/select
?df=
	unit_type_id,
	arrival_date
&defType=
	edismax
&q.alt=
	unit_type_id: 155
	AND
	arrival_date:[2016-06-06T00:00:00Z/DAY-3DAYS TO 2016-06-06T00:00:00Z/DAY+3DAYS]
&boost=
	recip(abs(ms(arrival_date,2016-06-06T00:00:00Z)),1,1,1)
&group=
	true
&group.field=
	unit_type_id
&wt=
	json
&indent=
	true
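The same request can be assembled and URL-encoded with the standard library. This is a sketch, not our Solarium code: the function name is hypothetical, HOST:PORT is the placeholder from the post, and it leaves out df and indent, which the fully fielded q.alt query and the JSON output do not depend on. The boost wraps ms() in abs() so that dates before and after the requested date are scored by distance alone.

```python
from urllib.parse import urlencode

BASE = "https://HOST:PORT/solr/base/select"  # placeholder host and core


def build_select_url(unit_type_id: int, requested: str, flex_days: int = 3) -> str:
    """Assemble the grouped, date-boosted select request.
    `requested` is a UTC timestamp such as 2016-06-06T00:00:00Z."""
    params = {
        "defType": "edismax",
        # The filtering query goes in q.alt, not q.
        "q.alt": (f"unit_type_id:{unit_type_id} AND "
                  f"arrival_date:[{requested}/DAY-{flex_days}DAYS "
                  f"TO {requested}/DAY+{flex_days}DAYS]"),
        # Closest arrival date gets the highest multiplicative boost.
        "boost": f"recip(abs(ms(arrival_date,{requested})),1,1,1)",
        "group": "true",
        "group.field": "unit_type_id",
        "wt": "json",
    }
    # urlencode percent-encodes brackets, colons, commas and literal '+'.
    return BASE + "?" + urlencode(params)


print(build_select_url(155, "2016-06-06T00:00:00Z"))
```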

P.s. PHP Solarium & Debugging

In our project we use Solarium PHP, and debugging the query Solarium constructs can be tricky: I haven't found a way to output it. This can be worked around by var_dump-ing it in Solarium's Curl adapter.

P.p.s. Flexible pricing grid

We provide full transparency about our pricing and availability, letting users browse a searchable price grid. After some struggles to find a workable design, we settled on a two-axis grid, controlled by a date-picker, containing all prices. This data is fed directly from Solr.
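The post doesn't spell out the grid's axes. Assuming they are arrival date and number of nights, the availability documents for one unit type could be pivoted into such a grid as below. The function name, the axes and the prices are all hypothetical; a missing cell simply means nothing is bookable for that combination.

```python
from collections import defaultdict


def build_price_grid(docs, unit_type_id):
    """Pivot availability documents for one unit type into
    grid[arrival_date][nights] = price."""
    grid = defaultdict(dict)
    for doc in docs:
        if doc["unit_type_id"] != unit_type_id:
            continue  # other unit types get their own grid
        grid[doc["arrival_date"]][doc["nights"]] = doc["price"]
    return dict(grid)


# Example documents with made-up prices, shaped like the schema fields.
docs = [
    {"unit_type_id": 155, "arrival_date": "2016-06-06T00:00:00Z", "nights": 7, "price": 499.0},
    {"unit_type_id": 155, "arrival_date": "2016-06-06T00:00:00Z", "nights": 3, "price": 259.0},
    {"unit_type_id": 155, "arrival_date": "2016-06-08T00:00:00Z", "nights": 5, "price": 379.0},
    {"unit_type_id": 210, "arrival_date": "2016-06-06T00:00:00Z", "nights": 7, "price": 620.0},
]

grid = build_price_grid(docs, 155)
for arrival in sorted(grid):
    print(arrival, grid[arrival])
```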