out of time

03 March 2010

Sunspot 1.0

The moment has arrived: after three release candidates and lots of great feedback from Sunspot users and contributors, Sunspot 1.0 and Sunspot::Rails 1.0 are out. It’s got a bunch of exciting new features, and I’m going to tell you all about them. You know how I do.

But first

There’s been a bit of confusion regarding the status of the Sunspot::Rails project. I shall clarify in the form of a brief bulleted list of facts:

Hope that clears things up.

Upgrading to Sunspot 1.0

Sunspot 1.0 uses the totally awesome Solr 1.4 release; it’s the first Sunspot version that does. This opens up a lot of new functionality, which we’ll get to later, but it also means that you’ll need to upgrade any production Solr instances you’re running to Solr 1.4.

Sunspot 1.0 introduces the sunspot-installer executable, which modifies an existing Solr installation’s configuration files to work with Sunspot. Its main purpose is to ease the pain of upgrading – if you’ve got a schema that’s built for an older version of Sunspot, then simply run sunspot-installer my/solr/home and it’ll work with the latest and greatest. For instance, if you are using Sunspot::Rails, upgrading is as easy as this:

$ rake sunspot:solr:stop
$ sudo gem install sunspot_rails -v '1.0.0'
$ sunspot-installer -v ./solr
$ rake sunspot:solr:start

If you’ve got a fresh, default Solr installation that you’re planning on using with Sunspot, pass the -f flag to sunspot-installer, which will simply copy Sunspot’s schema and solrconfig over the packaged ones; this works better because Solr’s example schema uses naming conventions that conflict with Sunspot’s.

And lastly, once you’ve modified your search configuration to take advantage of Sunspot 1.0 and Solr 1.4’s cool new capabilities, you’ll almost definitely need to reindex. Nobody said life was easy.

What’s new in Sunspot 1.0

Multiselect Facets

This is easily the most exciting new feature in Solr 1.4, which is why it’s first. Imagine the following situation: every Post belongs to a Category. You would like to build a search UI where your users can select one or more categories, along with some other filtering options. With the faceting functionality we all know and love, this simply isn’t possible: once the user selects a category, the categories facet will only show values that match the search results, which in this case is a single value: the category they have already selected.

What we’d really like to do is, for the purposes of calculating the available categories in the categories facet, ignore the fact that a category is already selected. And that’s exactly what multiselect faceting lets us do. Here’s how:

Sunspot.search(Post) do
  keywords('some keywords')
  with(:blog_id, 1)
  category_filter = with(:category_id, 3)
  facet(:category_id, :exclude => category_filter)
end

What’s going on here is that the with method now has a meaningful return value – essentially a handle to the filter it generates (it doesn’t matter what the object it returns actually is). That handle can then be passed to the new :exclude option in the facet method, which instructs Solr to ignore that filter for the purposes of calculating that facet’s rows. And there you have it: multiselect faceting.
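If it helps to see the idea without Solr in the picture, here is a minimal in-memory sketch of what excluding a filter from a facet means. Everything in it (the Post struct, the filters hash, the category_facet helper) is a hypothetical stand-in for illustration, not Sunspot's or Solr's implementation:

```ruby
# Hypothetical in-memory stand-in for indexed posts; not Sunspot's API.
Post = Struct.new(:id, :blog_id, :category_id)

posts = [
  Post.new(1, 1, 3),
  Post.new(2, 1, 3),
  Post.new(3, 1, 4),
  Post.new(4, 1, 5),
  Post.new(5, 2, 4)
]

# Each filter is a named predicate, so a facet can skip one by name.
filters = {
  blog:     ->(post) { post.blog_id == 1 },
  category: ->(post) { post.category_id == 3 }
}

# Count category_id values among posts that pass every filter
# except the ones listed in `exclude`.
def category_facet(posts, filters, exclude: [])
  active = filters.reject { |name, _| exclude.include?(name) }.values
  matching = posts.select { |post| active.all? { |filter| filter.call(post) } }
  matching.group_by(&:category_id).transform_values(&:size)
end

plain_facet       = category_facet(posts, filters)
multiselect_facet = category_facet(posts, filters, exclude: [:category])
```

The plain facet only ever sees category 3, because the category filter has already thrown everything else away. The multiselect facet skips that one filter, so the other categories in blog 1 show up again with their counts.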

Named Field Facets

Field facets can have lots of options applied to them, not least the awesome one we just talked about. And especially now that we have that last option, one can easily imagine situations in which you’d want to build a facet on the same field more than once, with different options.

Enter named field facets. They’re pretty simple to use:

Sunspot.search(Post) do
  keywords('some keywords')
  blog_filter = with(:blog_id, 1)
  facet(:category_id)
  facet(:category_id, :name => :all_blog_category_id, :exclude => blog_filter)
end

In this case, we’re constructing two different facets on the :category_id field: the first is a normal facet giving us all the categories that match our search, and the second gives us all the categories that match our search across all blogs. The :name parameter is also what you use to retrieve the facet from the search: search.facet(:all_blog_category_id)

Assumed Inconsistency

It’s well-known that Solr commits are very expensive and shouldn’t be done on a frequent basis. That means that if you have an application whose data changes frequently, keeping the Solr index completely up-to-date all the time is more or less impossible. Sunspot 1.0 eases the pain of that by introducing “assumed inconsistency”; in particular, if the search results reference an object that doesn’t actually exist in the primary data store, Sunspot will just quietly drop it from the results.

This behavior is automatic when using the Search#results method. If you use the hits method, by default the data store isn’t queried at all, so Sunspot has no way to double-check that the referenced results actually exist. Tell Sunspot to do that check by passing the :verify => true parameter into the hits method.

The same logic applies for instantiated facets – by default, the referenced instances aren’t loaded until the first time you ask for one, after the collection of rows has already been built. Just pass :verify => true into the rows method to make sure that all of your instantiated facets reference actual instances.
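Conceptually, the verification step boils down to something like the sketch below. The Hit struct, the hash standing in for the primary data store, and the verified_results helper are all assumptions made up for illustration; Sunspot's real classes look different:

```ruby
# Hypothetical stand-ins: a Hit as returned by the index, and a hash
# acting as the primary data store. Neither is Sunspot's actual class.
Hit = Struct.new(:class_name, :primary_key)

data_store = { 1 => 'First post', 3 => 'Third post' }

# The index is stale: it still references record 2, which was deleted
# from the data store after the last commit.
hits = [Hit.new('Post', 1), Hit.new('Post', 2), Hit.new('Post', 3)]

# Look up every referenced record and silently drop hits whose
# record no longer exists.
def verified_results(hits, store)
  hits.map { |hit| store[hit.primary_key] }.compact
end

results = verified_results(hits, data_store)
```

The stale hit for record 2 simply disappears from the results instead of raising or leaving a nil in the collection.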

New Field Types

Sunspot 1.0 introduces several new field types – some of them were available in Solr 1.3 and just not supported by Sunspot, and some are new in Solr 1.4.

First, the not-so-new types: long and double. They’re what they sound like – bigger versions of integer and float.

In the department of more exciting features, Solr 1.4 and Sunspot 1.0 introduce new “Trie” field types. These are numeric types – they store integers, floats, or times – but they index them in such a way that range queries are much faster. So, if you’ve got numeric or date fields that you do range searches on (which for our purposes includes greater_than and less_than as well as between), supercharge your efficiency by defining them like so:

Sunspot.setup(Post) do
  integer :comment_count, :trie => true
  float :average_rating, :trie => true
  time :published_at, :trie => true
end

Note that in Sunspot 1.0, lat/lngs are stored in Trie fields as well, so you’ll definitely want to reindex your data if you’re using geo search. And speaking of geo search…

Out with LocalSolr, in with solr-spatial-light

Previous versions of Sunspot have not allowed searches to be performed with both a fulltext component and a local component, because LocalSolr clobbers various Solr features in such a way that, as far as I can tell, it’s not possible. After trying out the trunk version of LocalSolr, which did fix one problem but then introduced another, I decided to just do it myself. The result is solr-spatial-light, a very small Solr plugin that exposes the lucene-spatial library in Solr.

From Sunspot’s standpoint, the API has changed a bit (although it’s still mostly backward-compatible):

Sunspot.search(Post) do
  near([40.5, -72.3], :distance => 5, :sort => true)
end

Aside from playing nice with keyword search, another advantage of solr-spatial-light is that you can sort by distance without limiting results to a certain radius; thus, both the :distance and :sort options are optional. Of course, if you don’t pass in either, you won’t do much other than make your search slower.
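To make the "both options are optional" point concrete, here is a rough pure-Ruby sketch of how a radius filter and a distance sort can be applied independently. The near helper, the places hash, and the choice of miles as the unit are all assumptions for illustration; none of this is how solr-spatial-light computes distances:

```ruby
EARTH_RADIUS_MILES = 3958.8

# Great-circle distance between two [lat, lng] pairs given in degrees.
def haversine_miles(from, to)
  lat1, lng1, lat2, lng2 = [*from, *to].map { |deg| deg * Math::PI / 180 }
  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lng2 - lng1) / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# Hypothetical helper: optionally restrict to a radius, optionally
# sort by distance. With neither option it returns everything as-is.
def near(places, origin, distance: nil, sort: false)
  measured = places.map { |name, coords| [name, haversine_miles(origin, coords)] }
  measured = measured.select { |_, miles| miles <= distance } if distance
  measured = measured.sort_by { |_, miles| miles } if sort
  measured.map(&:first)
end

places = {
  'Far'    => [41.5, -72.3],
  'Close'  => [40.5, -72.3],
  'Nearby' => [40.52, -72.3]
}
origin = [40.5, -72.3]

sorted_only   = near(places, origin, sort: true)
filtered_only = near(places, origin, distance: 5)
neither       = near(places, origin)
```

With only the sort option, nothing is excluded and the results come back nearest-first. With only the distance option, the far-away place is dropped but the order is left alone. With neither, all the distance math is wasted work.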

New Session Proxies

The SessionProxy pattern, first appearing in a recent Sunspot::Rails release, now gets first-class support in the core Sunspot library. A SessionProxy is simply an object that presents the same API as Sunspot::Session, adding behavior to the core Session functionality. There are lots of potential applications for this pattern (some of which you’ll probably see in future releases), but this release ships with three:

ThreadLocalSessionProxy
Use a different Session object for each thread. Since this proxy spawns an indefinite number of new Session objects, it's initialized with a Configuration.
MasterSlaveSessionProxy
Encapsulate two Session objects, one of which points to a master Solr instance, and one of which points to a slave. Write operations go to the master, and reads to the slave.
ShardingSessionProxy
This is actually an abstract class, which relies on concrete subclasses to determine which shard a given object belongs on. It is initialized with a single search Session, which is used for cross-shard read operations; subclasses will generally also be initialized with a set of shard sessions for write operations. Two concrete implementations are provided -- ClassShardingSessionProxy and IdShardingSessionProxy, which determine shard based on the hash of the object's class and the object's ID respectively.

Both ThreadLocalSessionProxy and MasterSlaveSessionProxy are automatically injected by Sunspot::Rails (the latter only if config/sunspot.yml contains a master solr configuration). If you wish to manually inject a session proxy, simply use the Sunspot.session= method:

Sunspot.session = ThreadLocalSessionProxy.new(Sunspot.configuration)
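For a feel of what a session proxy actually does, here is a stripped-down sketch of the master/slave idea. FakeSession and SketchMasterSlaveProxy are hypothetical names invented for this example, and the three methods shown are only a small slice of what a real session exposes:

```ruby
require 'forwardable'

# Hypothetical stand-in for a session; it only records the calls it receives.
class FakeSession
  attr_reader :name, :calls

  def initialize(name)
    @name = name
    @calls = []
  end

  def index(*objects)
    @calls << [:index, objects]
  end

  def commit
    @calls << [:commit]
  end

  def search(*types)
    @calls << [:search, types]
    "results from #{name}"
  end
end

# Sketch of the master/slave idea: same interface as a session,
# with writes delegated to one session and reads to the other.
class SketchMasterSlaveProxy
  extend Forwardable

  def_delegators :@master, :index, :commit
  def_delegators :@slave, :search

  def initialize(master, slave)
    @master = master
    @slave = slave
  end
end

master = FakeSession.new('master')
slave  = FakeSession.new('slave')
proxy  = SketchMasterSlaveProxy.new(master, slave)

proxy.index(:some_post)
proxy.commit
answer = proxy.search(:post)
```

Calling code talks to the proxy exactly as it would to a plain session, and never needs to know that two sessions are hiding behind it.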

Support for class reloading

Sunspot now explicitly supports class reloading of the type done in Rails, Sinatra, etc., for classes that are set up for search. As well as yielding more consistent behavior with class reloading, this fixes a development-only memory leak.

Deletion by query

I can’t really think of a use case for this, but it seemed like a cool feature: you can now remove documents from Solr using a query; just use the same DSL as you would in a search:

Sunspot.remove(Post) do
  with(:blog_id, 1)
end

What’s new in Sunspot::Rails 1.0

No more shelling out to start/stop Solr

Sunspot 1.0 introduces a real Sunspot::Server class, which manages starting and daemonizing the embedded Solr instance. Sunspot::Rails::Server now simply subclasses Sunspot::Server, which means it doesn’t have to shell out to the sunspot-solr executable. Big plus for environments like Bundler where the gem executables aren’t necessarily in the PATH.

Different logic for spec support

Recent versions of Sunspot::Rails have automatically disconnected Solr during specs unless it was specifically enabled; Sunspot::Rails 1.0 reverses this, allowing explicit disabling of Solr in specs. Another change here is that if Solr is disabled in specs, all Solr operations are stubbed, including searches.

Here’s how to disable Sunspot in your specs:

describe Post do
  describe "without search" do
    disconnect_sunspot

    it "should work awesome" do
      #...
    end
  end
end

Logging of Solr requests in Rails log

You can now log all Solr requests in the Rails log, with pretty coloring and everything. To do this, put require "sunspot/rails/solr_logging" in an initializer.

New Searchable::index method

The reindex class method on ActiveRecord models is joined by index, which re-adds/updates all of the instances of that model to Solr, but doesn’t clear out the index first.
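The difference is easiest to see with a stale index. In this toy sketch, plain hashes stand in for the database table and the Solr index, and sketch_index / sketch_reindex are hypothetical helpers rather than the real class methods:

```ruby
# Hypothetical stand-ins: `records` plays the database table and
# `solr_index` plays the Solr index, both keyed by primary key.
def sketch_index(solr_index, records)
  # Add or update every current record; leave everything else alone.
  records.each { |id, title| solr_index[id] = title }
  solr_index
end

def sketch_reindex(solr_index, records)
  # Clear out the index first, then add every current record.
  solr_index.clear
  sketch_index(solr_index, records)
end

records = { 1 => 'Hello (edited)', 2 => 'World' }
stale   = { 1 => 'Hello', 99 => 'Deleted long ago' }

after_index   = sketch_index(stale.dup, records)
after_reindex = sketch_reindex(stale.dup, records)
```

After the index-style pass, the document for the long-deleted record 99 is still sitting there; after the reindex-style pass, it's gone.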

That’s all for now

I’m very excited to have Sunspot 1.0 out there, but this certainly isn’t the end of the road. Some ideas that have been bouncing around the mailing list and may make appearances in future 1.x releases:

And much more. Ideas, questions, comments? Shout out to the mailing list. Bugs? Be a star and file us a ticket.