out of time
03 March 2010
Sunspot 1.0
The moment has arrived: after three release candidates and lots of great feedback from Sunspot users and contributors, Sunspot 1.0 and Sunspot::Rails 1.0 are out. It’s got a bunch of exciting new features, and I’m going to tell you all about them. You know how I do.
But first
There’s been a bit of confusion regarding the status of the Sunspot::Rails project. I shall clarify in the form of a brief bulleted list of facts:
- Sunspot and Sunspot::Rails now live in the same git repository.
- Going forward, Sunspot and Sunspot::Rails will be released in tandem, with matching versions. Sunspot::Rails will always have a dependency on its corresponding Sunspot version.
- Sunspot and Sunspot::Rails are still separate gems.
Hope that clears things up.
Upgrading to Sunspot 1.0
Sunspot 1.0 uses the totally awesome Solr 1.4 release; it’s the first Sunspot version that does. This opens up a lot of new functionality, which we’ll get to later, but it also means that you’ll need to upgrade any production Solr instances you’re running to Solr 1.4.
Sunspot 1.0 introduces the sunspot-installer
executable, which modifies an
existing Solr installation’s configuration files to work with Sunspot. Its main
purpose is to ease the pain of upgrading – if you’ve got a schema that’s built
for an older version of Sunspot, then simply run
sunspot-installer my/solr/home
and it’ll work with the latest and greatest.
For instance, if you are using Sunspot::Rails, upgrading is as easy as this:
If you’ve got a fresh, default Solr installation that you’re planning on using
with Sunspot, pass the -f
flag to sunspot-installer
, which will simply copy
Sunspot’s schema and solrconfig over the packaged ones; this works better
because Solr’s example schema uses naming conventions that conflict with
Sunspot’s.
And lastly, once you’ve modified your search configuration to take advantage of Sunspot 1.0 and Solr 1.4’s cool new capabilities, you’ll almost definitely need to reindex. Nobody said life was easy.
What’s new in Sunspot 1.0
Multiselect Facets
This is easily the most exciting new feature in Solr 1.4, which is why it’s first. Imagine the following situation: every Post belongs to a Category. You would like to build a search UI where your users can select one or more categories, along with some other filtering options. With the faceting functionality we all know and love, this simply isn’t possible: once the user selects a category, the categories facet will only show values that match the search results, which in this case is a single value: the category they have already selected.
What we’d really like to do is, for the purposes of calculating the available categories in the categories facet, ignore the fact that a category is already selected. And that’s exactly what multiselect faceting lets us do. Here’s how:
What’s going on here is that the with
method now has a meaningful return
value – essentially a handle to the filter it generates (it doesn’t matter what
the object it returns actually is). That handle can then be passed to the new
:exclude
option in the facet
method, which instructs Solr to ignore that
filter for the purposes of calculating that facet’s rows. And there you have it:
multiselect faceting.
Named Field Facets
Field facets can have lots of options applied to them, not least the awesome one we just talked about. And especially now that we have that last option, one can easily imagine situations in which you’d want to build a facet on the same row more than once, with different options.
Enter named field facets. They’re pretty simple to use:
In this case, we’re constructing two different facets on the :category_id
field: the first is a normal facet giving us all the categories that match our
search, and the second gives us all the categories that match our search across
all blogs. The :name
parameter is also what you use to retrieve the facet from
the search: search.facet(:all_blog_category_id)
Assumed Inconsistency
It’s well-known that Solr commits are very expensive and shouldn’t be done on a frequent basis. That means that if you have an application whose data changes frequently, keeping the Solr index completely up-to-date all the time is more or less impossible. Sunspot 1.0 eases the pain of that by introducing “assumed inconsistency”; in particular, if the search results reference an object that doesn’t actually exist in the primary data store, Solr will just quietly drop it from the results.
This behavior is automatic when using the Search#results
method. If you use
the hits
method, by default the data store isn’t queried at all, so Sunspot
has no way to double-checke that the referenced results actually exist. Tell
Sunspot to do that check by passing the :verify => true
parameter into the
hits
method.
The same logic applies for instantiated facets – by default, the referenced
instances aren’t loaded until the first time you ask for one, after the
collection of rows has already been built. Just pass :verify => true
into
the rows
method to make sure that all of your instantiated facets reference
actual instances.
New Field Types
Sunspot 1.0 introduces several new field types – some of them were available in Solr 1.3 and just not supported by Sunspot, and some are new in Solr 1.4.
First, the not-so-new types: long
and double
. They’re what they sound like
– bigger versions of integer
and float
.
In the department of more exciting features, Solr 1.4 and Sunspot 1.0 introduce
new “Trie” field types.
These are numeric types – they store integers, floats, or times – but they
index them in such a way that range queries are much faster. So, if you’ve got
numeric or date fields that you do range searches (which for our purpose
includes greater_than
and less_than
as well as between
), supercharge your
efficiency by defining them like so:
Note that in Sunspot 1.0, lat/lngs are stored in Trie fields as well, so you’ll definitely want to reindex your data if you’re using geo search. And speaking of geo search…
Out with LocalSolr, in with solr-spatial-light
Previous versions of Sunspot have not allowed searches to be performed with both a fulltext component and a local component, because LocalSolr clobbers various Solr features in such a way that, as far as I can tell, it’s not possible. After trying out the trunk version of LocalSolr, which did fix one problem but then introduced another, I decided to just do it myself. The result is solr-spatial-light, a very small Solr plugin that exposes the lucene-spatial library in Solr.
From Sunspot’s standpoint, the API has changed a bit (although it’s still mostly backward-compatible):
Aside from playing nice with keyword search, another advantage of
solr-spatial-light is that you can sort by distance without limiting results
to a certain radius; thus, both the :distance
and :sort
options are
optional. Of course, if you don’t pass in either, you won’t do much other than
make your search slower.
New Session Proxies
The SessionProxy pattern, first appearing in a recent Sunspot::Rails release, now gets first-class support in the core Sunspot library. A SessionProxy is simply an object that presents the same API as Sunspot::Session, adding behavior to the core Session functionality. There are lots of potential applications for this pattern (some of which you’ll probably see in future releases), but this release ships with three:
ThreadLocalSessionProxy
MasterSlaveSessionProxy
ShardingSessionProxy
Both ThreadLocalSessionProxy
and MasterSlaveSessionProxy
are automatically
injected by Sunspot::Rails (the latter only if config/sunspot.yml
contains a
master solr configuration). If you wish to manually inject a session proxy,
simply use the Sunspot.session=
method:
Support for class reloading
Sunspot now explicitly supports class reloading of the type done in Rails, Sinatra, etc., for classes that are set up for search. As well as yielding more consistent behavior with class reloading, this fixes a development-only memory leak.
Deletion by query
I can’t really think of a use case for this, but it seemed like a cool feature: you can now remove documents from Solr using a query; just use the same DSL as you would in a search:
What’s new in Sunspot::Rails 1.0
No more shelling out to start/stop Solr
Sunspot 1.0 introduces a real Sunspot::Server
class, which manages starting
and daemonizing the embedded Solr instance. Sunspot::Rails::Server
now simply
subclasses Sunspot::Server
, which means it doesn’t have to shell out to the
sunspot-solr
executable. Big plus for environments like Bundler where the
gem executables aren’t necessarily in the PATH
.
Different logic for spec support
Recent versions of Sunspot::Rails have automatically disconnected Solr during specs unless it was specifically enabled; Sunspot::Rails 1.0 reverses this, allowing explicit disabling of Solr in specs. Another change here is that if Solr is disabled in specs, all Solr operations are stubbed, including searches.
Here’s how to disable Sunspot in your specs:
Logging of Solr requests in Rails log
You can now log all Solr requests in the Rails log, with pretty coloring and
everything. Do do this, put require "sunspot/rails/solr_logging"
in an
initializer.
New Searchable::index method
The reindex
class method on ActiveRecord models is joined by index
, which
re-adds/updates all of the instances of that model to Solr, but doesn’t clear
out the index first.
That’s all for now
I’m very excited to have Sunspot 1.0 out there, but this certainly isn’t the end of the road. Some ideas that have been bouncing around the mailing list and may make appearances in future 1.x releases:
- Spell checking
- MoreLikeThis
- Facet prefixes
- Search auto-suggest
And much more. Ideas, questions, comments? Shout out to the mailing list. Bugs? Be a star and file us a ticket.