out of time

03 March 2010

Sunspot 1.0

The moment has arrived: after three release candidates and lots of great feedback from Sunspot users and contributors, Sunspot 1.0 and Sunspot::Rails 1.0 are out. It’s got a bunch of exciting new features, and I’m going to tell you all about them. You know how I do.

But first

There’s been a bit of confusion regarding the status of the Sunspot::Rails project. I shall clarify in the form of a brief bulleted list of facts:

  • Sunspot and Sunspot::Rails now live in the same git repository.
  • Going forward, Sunspot and Sunspot::Rails will be released in tandem, with matching versions. Sunspot::Rails will always have a dependency on its corresponding Sunspot version.
  • Sunspot and Sunspot::Rails are still separate gems.

Hope that clears things up.

Upgrading to Sunspot 1.0

Sunspot 1.0 uses the totally awesome Solr 1.4 release; it’s the first Sunspot version that does. This opens up a lot of new functionality, which we’ll get to later, but it also means that you’ll need to upgrade any production Solr instances you’re running to Solr 1.4.

Sunspot 1.0 introduces the sunspot-installer executable, which modifies an existing Solr installation’s configuration files to work with Sunspot. Its main purpose is to ease the pain of upgrading – if you’ve got a schema that’s built for an older version of Sunspot, then simply run sunspot-installer my/solr/home and it’ll work with the latest and greatest. For instance, if you are using Sunspot::Rails, upgrading is as easy as this:

$ rake sunspot:solr:stop
$ sudo gem install sunspot_rails -v '1.0.0'
$ sunspot-installer -v ./solr
$ rake sunspot:solr:start

If you’ve got a fresh, default Solr installation that you’re planning on using with Sunspot, pass the -f flag to sunspot-installer, which will simply copy Sunspot’s schema and solrconfig over the packaged ones; this works better because Solr’s example schema uses naming conventions that conflict with Sunspot’s.

And lastly, once you’ve modified your search configuration to take advantage of Sunspot 1.0 and Solr 1.4’s cool new capabilities, you’ll almost definitely need to reindex. Nobody said life was easy.

What’s new in Sunspot 1.0

Multiselect Facets

This is easily the most exciting new feature in Solr 1.4, which is why it’s first. Imagine the following situation: every Post belongs to a Category. You would like to build a search UI where your users can select one or more categories, along with some other filtering options. With the faceting functionality we all know and love, this simply isn’t possible: once the user selects a category, the categories facet will only show values that match the search results, which in this case is a single value: the category they have already selected.

What we’d really like to do is, for the purposes of calculating the available categories in the categories facet, ignore the fact that a category is already selected. And that’s exactly what multiselect faceting lets us do. Here’s how:

Sunspot.search(Post) do
  keywords('some keywords')
  with(:blog_id, 1)
  category_filter = with(:category_id, 3)
  facet(:category_id, :exclude => category_filter)
end

What’s going on here is that the with method now has a meaningful return value – essentially a handle to the filter it generates (it doesn’t matter what the object it returns actually is). That handle can then be passed to the new :exclude option in the facet method, which instructs Solr to ignore that filter for the purposes of calculating that facet’s rows. And there you have it: multiselect faceting.
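To make the effect of excluding a filter concrete, here is a small pure-Ruby sketch that involves neither Solr nor Sunspot. The Post struct, the sample data, and the category_facet helper are all made up for illustration; the only idea carried over is that the facet is counted with one named filter switched off.

```ruby
# Pure-Ruby illustration; nothing here talks to Solr or Sunspot.
Post = Struct.new(:blog_id, :category_id)

POSTS = [
  Post.new(1, 3), Post.new(1, 3), Post.new(1, 4),
  Post.new(1, 5), Post.new(2, 3)
].freeze

# Named filters, playing the role of the handles returned by `with`.
FILTERS = {
  blog:     ->(post) { post.blog_id == 1 },
  category: ->(post) { post.category_id == 3 }
}.freeze

# Count category values over the posts that match every filter except `exclude`.
def category_facet(posts, filters, exclude: nil)
  active = filters.reject { |name, _| name == exclude }.values
  matching = posts.select { |post| active.all? { |filter| filter.call(post) } }
  matching.group_by(&:category_id).transform_values(&:size)
end

plain       = category_facet(POSTS, FILTERS)
multiselect = category_facet(POSTS, FILTERS, exclude: :category)
```

With every filter applied, the facet can only report the category that is already selected; with the category filter excluded, the other categories in blog 1 show up again.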

Named Field Facets

Field facets can have lots of options applied to them, not least the awesome one we just talked about. And especially now that we have that last option, one can easily imagine situations in which you’d want to build a facet on the same field more than once, with different options.

Enter named field facets. They’re pretty simple to use:

Sunspot.search(Post) do
  keywords('some keywords')
  blog_filter = with(:blog_id, 1)
  facet(:category_id)
  facet(:category_id, :name => :all_blog_category_id, :exclude => blog_filter)
end

In this case, we’re constructing two different facets on the :category_id field: the first is a normal facet giving us all the categories that match our search, and the second gives us all the categories that match our search across all blogs. The :name parameter is also what you use to retrieve the facet from the search: search.facet(:all_blog_category_id)

Assumed Inconsistency

It’s well-known that Solr commits are very expensive and shouldn’t be done on a frequent basis. That means that if you have an application whose data changes frequently, keeping the Solr index completely up-to-date all the time is more or less impossible. Sunspot 1.0 eases the pain of that by introducing “assumed inconsistency”; in particular, if the search results reference an object that doesn’t actually exist in the primary data store, Sunspot will just quietly drop it from the results.

This behavior is automatic when using the Search#results method. If you use the hits method, by default the data store isn’t queried at all, so Sunspot has no way to double-check that the referenced results actually exist. Tell Sunspot to do that check by passing the :verify => true parameter into the hits method.

The same logic applies for instantiated facets – by default, the referenced instances aren’t loaded until the first time you ask for one, after the collection of rows has already been built. Just pass :verify => true into the rows method to make sure that all of your instantiated facets reference actual instances.
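The idea behind the verification step can be sketched in a few lines of plain Ruby. This is not Sunspot’s implementation: the Hit struct, the STORE hash standing in for the primary data store, and the verified helper are assumptions made up for the example.

```ruby
# Hypothetical stand-ins: `hits` is what the index returned,
# STORE plays the role of the primary data store.
Hit = Struct.new(:primary_key)

STORE = { 1 => "First post", 3 => "Third post" }.freeze # record 2 has been deleted
hits  = [Hit.new(1), Hit.new(2), Hit.new(3)]

# Do one batch lookup, then drop every hit that has no matching record.
def verified(hits, store)
  found = store.slice(*hits.map(&:primary_key))
  hits.select { |hit| found.key?(hit.primary_key) }
end

results = verified(hits, STORE).map { |hit| STORE[hit.primary_key] }
```

The stale hit for record 2 simply disappears from the results instead of raising an error or producing a nil entry.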

New Field Types

Sunspot 1.0 introduces several new field types – some of them were available in Solr 1.3 and just not supported by Sunspot, and some are new in Solr 1.4.

First, the not-so-new types: long and double. They’re what they sound like – bigger versions of integer and float.

In the department of more exciting features, Solr 1.4 and Sunspot 1.0 introduce new “Trie” field types. These are numeric types – they store integers, floats, or times – but they index them in such a way that range queries are much faster. So, if you’ve got numeric or date fields that you run range searches against (which for our purposes includes greater_than and less_than as well as between), supercharge your efficiency by defining them like so:

Sunspot.setup(Post) do
  integer :comment_count, :trie => true
  float :average_rating, :trie => true
  time :published_at, :trie => true
end

Note that in Sunspot 1.0, lat/lngs are stored in Trie fields as well, so you’ll definitely want to reindex your data if you’re using geo search. And speaking of geo search…

Out with LocalSolr, in with solr-spatial-light

Previous versions of Sunspot have not allowed searches to be performed with both a fulltext component and a local component, because LocalSolr clobbers various Solr features in such a way that, as far as I can tell, it’s not possible. After trying out the trunk version of LocalSolr, which did fix one problem but then introduced another, I decided to just do it myself. The result is solr-spatial-light, a very small Solr plugin that exposes the lucene-spatial library in Solr.

From Sunspot’s standpoint, the API has changed a bit (although it’s still mostly backward-compatible):

  Sunspot.search(Post) do
    near([40.5, -72.3], :distance => 5, :sort => true)
  end

Aside from playing nice with keyword search, another advantage of solr-spatial-light is that you can sort by distance without limiting results to a certain radius; thus, both the :distance and :sort options are optional. Of course, if you don’t pass in either, you won’t do much other than make your search slower.

New Session Proxies

The SessionProxy pattern, first appearing in a recent Sunspot::Rails release, now gets first-class support in the core Sunspot library. A SessionProxy is simply an object that presents the same API as Sunspot::Session, adding behavior to the core Session functionality. There are lots of potential applications for this pattern (some of which you’ll probably see in future releases), but this release ships with three:

  • ThreadLocalSessionProxy – Uses a different Session object for each thread. Since this proxy spawns an indefinite number of new Session objects, it's initialized with a Configuration.
  • MasterSlaveSessionProxy – Encapsulates two Session objects, one of which points to a master Solr instance, and one of which points to a slave. Write operations go to the master, and reads to the slave.
  • ShardingSessionProxy – This is actually an abstract class, which relies on concrete subclasses to determine which shard a given object should be read from. It is initialized with a single search Session, which is used for cross-shard read operations; subclasses will generally also be initialized with a set of shard sessions for write operations. Two concrete implementations are provided – ClassShardingSessionProxy and IdShardingSessionProxy, which determine shard based on the hash of the object's class and the object's ID respectively.

Both ThreadLocalSessionProxy and MasterSlaveSessionProxy are automatically injected by Sunspot::Rails (the latter only if config/sunspot.yml contains a master solr configuration). If you wish to manually inject a session proxy, simply use the Sunspot.session= method:

Sunspot.session = ThreadLocalSessionProxy.new(Sunspot.configuration)
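For a feel of what “presents the same API as the session, adding behavior” means, here is a minimal pure-Ruby sketch of the read/write split. FakeSession and ReadWriteSplitProxy are hypothetical classes invented for this example; the real Sunspot::Session API is much larger, and this is not how the bundled proxies are written.

```ruby
# Hypothetical minimal session that just records what it was asked to do.
class FakeSession
  attr_reader :log

  def initialize
    @log = []
  end

  def index(object)
    @log << [:index, object]
  end

  def search(query)
    @log << [:search, query]
  end
end

# Exposes the same public methods as the session, but routes each call.
class ReadWriteSplitProxy
  def initialize(master, slave)
    @master = master
    @slave = slave
  end

  # Writes go to the master.
  def index(object)
    @master.index(object)
  end

  # Reads go to the slave.
  def search(query)
    @slave.search(query)
  end
end

master = FakeSession.new
slave  = FakeSession.new
proxy  = ReadWriteSplitProxy.new(master, slave)
proxy.index("post-1")
proxy.search("pizza")
```

Because calling code only ever talks to the proxy, swapping routing strategies is a matter of assigning a different object, which is what makes the pattern pluggable.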

Support for class reloading

Sunspot now explicitly supports class reloading of the type done in Rails, Sinatra, etc., for classes that are set up for search. As well as yielding more consistent behavior with class reloading, this fixes a development-only memory leak.

Deletion by query

I can’t really think of a use case for this, but it seemed like a cool feature: you can now remove documents from Solr using a query; just use the same DSL as you would in a search:

Sunspot.remove(Post) do
  with(:blog_id, 1)
end

What’s new in Sunspot::Rails 1.0

No more shelling out to start/stop Solr

Sunspot 1.0 introduces a real Sunspot::Server class, which manages starting and daemonizing the embedded Solr instance. Sunspot::Rails::Server now simply subclasses Sunspot::Server, which means it doesn’t have to shell out to the sunspot-solr executable. Big plus for environments like Bundler where the gem executables aren’t necessarily in the PATH.

Different logic for spec support

Recent versions of Sunspot::Rails have automatically disconnected Solr during specs unless it was specifically enabled; Sunspot::Rails 1.0 reverses this, allowing explicit disabling of Solr in specs. Another change here is that if Solr is disabled in specs, all Solr operations are stubbed, including searches.

Here’s how to disable Sunspot in your specs:

describe Post do
  describe "without search" do
    disconnect_sunspot

    it "should work awesome" do
      # exercise Post here; all Solr operations are stubbed
    end
  end
end
Logging of Solr requests in Rails log

You can now log all Solr requests in the Rails log, with pretty coloring and everything. To do this, put require "sunspot/rails/solr_logging" in an initializer.

New Searchable::index method

The reindex class method on ActiveRecord models is joined by index, which re-adds/updates all of the instances of that model to Solr, but doesn’t clear out the index first.

That’s all for now

I’m very excited to have Sunspot 1.0 out there, but this certainly isn’t the end of the road. Some ideas that have been bouncing around the mailing list and may make appearances in future 1.x releases:

  • Spell checking
  • MoreLikeThis
  • Facet prefixes
  • Search auto-suggest

And much more. Ideas, questions, comments? Shout out to the mailing list. Bugs? Be a star and file us a ticket.

15 October 2009

Sunspot 0.10 released

Late breaking news: Sunspot 0.10 was released about a week ago. Version 0.10 has a lot of great new features, including support for geographical search using LocalSolr, keyword highlighting, and lots of new DisMax features for high-precision relevance tuning.


Much like all gems, Sunspot is no longer released on GitHub. You can install it from RubyForge or Gemcutter:

sudo gem install sunspot

If you’re using Sunspot::Rails, be sure to install the latest version, as it has some changes for compatibility with argument changes to the sunspot-solr executable.

If you’re running a Solr instance besides the one shipped with the sunspot-solr executable - including using rake sunspot:solr:start with a separate solr/ directory in Sunspot::Rails - now might be a good time to skip down to the installing LocalSolr in your Solr instance section. We’ll see you back up here when you’re done.

Geographical Search using LocalSolr

LocalSolr is an extension to Solr that provides geographical search functionality. As anyone who works on mobile or local-heavy applications can tell you, this is pretty cool. Sunspot 0.10 has support for geographical search and indexing, and the Solr instance that ships with the gem now has LocalSolr and its dependencies already installed.

To index geographical data for your model, just specify the coordinates field in the setup:

Sunspot.setup Post do
  coordinates :lat_lng
end

The models’ value for the coordinates should have one of the following pairs of attributes:

  • first, last (e.g., an Array)
  • lat, lng
  • lat, lon
  • lat, long
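The list above amounts to duck typing on the coordinates value. A rough sketch of how such a value could be reduced to a latitude/longitude pair follows; extract_lat_lng and the Place struct are hypothetical names for illustration, not part of Sunspot.

```ruby
# Hypothetical helper: reduce any supported value to a [lat, lng] pair.
def extract_lat_lng(value)
  # Prefer an explicit lat reader, otherwise fall back to the first element.
  lat = value.respond_to?(:lat) ? value.lat : value.first
  # Accept any of the three longitude spellings, otherwise use the last element.
  lng_reader = %i[lng lon long].find { |name| value.respond_to?(name) }
  lng = lng_reader ? value.public_send(lng_reader) : value.last
  [lat, lng]
end

Place = Struct.new(:lat, :lon)

from_array  = extract_lat_lng([40, -70])
from_object = extract_lat_lng(Place.new(40.7, -74.0))
```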

Once you’ve got your geographical data indexed, you can use the near method to search within a given radius:

Sunspot.search(Post) do
  near [40, -70], 5
end

This will search for posts within 5 miles of the coordinates <40, -70>. The first argument takes the same form as the coordinates value above; the second argument is always a number of miles. Unfortunately, it does not appear that LocalSolr can handle a distance of less than one mile, so hopefully you’re not running a CIA satellite or anything.
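Conceptually, a radius search keeps the points whose great-circle distance from the center is within the limit. The sketch below does that with the haversine formula in plain Ruby; it says nothing about how LocalSolr computes or indexes distances internally, and the helper names and the Earth-radius constant are choices made for this example.

```ruby
EARTH_RADIUS_MILES = 3958.8 # mean radius, an approximation

def to_rad(degrees)
  degrees * Math::PI / 180
end

# Great-circle distance in miles between two [lat, lng] pairs.
def haversine_miles(a, b)
  lat1, lng1 = a
  lat2, lng2 = b
  dlat = to_rad(lat2 - lat1)
  dlng = to_rad(lng2 - lng1)
  h = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h))
end

# Keep only the points within `miles` of `center`.
def near(points, center, miles)
  points.select { |point| haversine_miles(center, point) <= miles }
end

points = [[40.0, -70.0], [40.05, -70.0], [41.0, -70.0]]
nearby = near(points, [40, -70], 5)
```

The second point is roughly three and a half miles north of the center and is kept; the third is about 69 miles away and is dropped.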

One other big gotcha with LocalSolr: unfortunately, the current stable release supports neither filter queries nor subqueries; this means that there is no way (that I know of) to use both regular boolean filters and a dismax query, which is what Sunspot uses for keyword search. So, Sunspot will fail fast if you try to do a query using both a fulltext and a local component. I’ve heard that the trunk version of LocalSolr does support filter queries; I will definitely be investigating and I hope to release a future version of Sunspot without this limitation.

Fine-tuning fulltext relevance with more dismax parameters

One big focus of this release is giving access to all of Solr’s powerful dismax features. In order to do so, Sunspot 0.10 introduces a fulltext block, which presents a DSL for fine-tuning fulltext queries.

This block is invoked with the fulltext method, which is the awesome new name for the keywords method (don’t worry; keywords is still aliased).

The fields method allows you to specify which fields you wish to perform fulltext search on, optionally giving a specific boost to each field:

Sunspot.search(Post) do
  fulltext 'boost control' do
    fields :title, :body => 0.75
  end
end

The above will search only the title and body fields, applying a boost of 0.75 to the body field (title will have a default boost).

To set per-field boost without restricting which fields are searched (i.e., search all configured text fields), just use the boost_fields method:

Sunspot.search(Post) do
  fulltext 'boost control' do
    boost_fields :title => 2.0, :body => 0.75
  end
end

Phrase fields add an extra boost when all the fulltext keywords appear in the given field - it’s great for titles and other high-relevance fields:

Sunspot.search(Post) do
  fulltext 'phrase fields' do
    phrase_fields :title => 2.5
  end
end

Boost queries allow extra boost to be applied to documents which match an arbitrary set of conditions:

Sunspot.search(Post) do
  fulltext 'boost query' do
    boost 2.0 do
      with :featured, true
    end
  end
end

The above will apply a boost of 2.0 to featured posts.

Fulltext highlighting

What’s better than giving your users the most relevant results for their keyword searches? Showing them just what in the documents matched the search, of course. Solr comes with built-in keyword highlighting; you can get a full explanation of the highlighting features here: http://wiki.apache.org/solr/HighlightingParameters

Basic highlighting can be activated by passing :highlight => true as an option to the keywords method:

Sunspot.search(Post) do
  keywords 'great pizza', :highlight => true
end

If you’d like to choose specific fields to highlight, pass an array of field names instead of true:

Sunspot.search(Post) do
  keywords 'great pizza', :highlight => %w(title body)
end

More advanced highlighting options can be passed to the highlight method inside the keywords DSL block:

Sunspot.search(Post) do
  keywords 'great pizza' do
    highlight :title, :body, :max_snippets => 3, :fragment_size => 200
  end
end

The highlight method accepts the following options:

The maximum number of highlighted snippets to return per field.
The maximum size of a text fragment to consider for highlighting
If two highlighted fragments are adjacent to one another, merge them into a single fragment.
From the Solr wiki: "Use SpanScorer to highlight phrase terms only when they appear within the query phrase in the document. Default is false." Whatever that means.
Require that the field actually matched the query (instead of simply containing the words being searched). Requires :phrase_highlighter to be true.

Using highlights

If you’ve performed your search with highlights, you access them using the highlights method of the Sunspot::Search::Hit object. highlights can take a field name as an argument, in which case it will only return highlights for the specified field; otherwise, it will return all highlights for the given hit.

The objects returned by the highlights method allow deferred formatting, which is to say your view layer can decide how to format the highlights, when it’s time to display them:

<div class="results">
  <% @search.hits.each do |hit| %>
    <div class="result">
      <h3><%= hit.instance.title %></h3>
      <%= hit.highlights(:body).first.format { |phrase| "<span class=\"highlight\">#{phrase}</span>" } %>
    </div>
  <% end %>
</div>

Note that in order for highlighting to work, the highlighted field needs to be a stored text field (pass :stored => true in the field definition).
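Deferred formatting boils down to storing the fragment with neutral markers around the matched phrases and only applying markup when the caller supplies a block. Here is a minimal sketch of that idea; the Highlight class and the [[...]] marker syntax are assumptions for illustration and are not Sunspot’s actual internals.

```ruby
# Hypothetical highlight object: keeps the fragment with neutral markers
# and applies the caller's formatting only when asked to.
class Highlight
  MARKED = /\[\[(.*?)\]\]/

  def initialize(marked_text)
    @marked_text = marked_text
  end

  # Yields each highlighted phrase and substitutes the block's return value.
  def format
    @marked_text.gsub(MARKED) { yield Regexp.last_match(1) }
  end
end

highlight = Highlight.new("The [[great]] [[pizza]] debate")
html = highlight.format { |phrase| "<em>#{phrase}</em>" }
```

The same highlight object could be rendered as HTML in one view and as plain text with asterisks in an email, since nothing about the presentation is decided at search time.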

Default search-time boost

While index-time boost is useful, it means that any change to field boost requires a reindex of your data. An alternative is to set a default search-time boost in the setup:

Sunspot.setup(Post) do
  text :title, :default_boost => 2.0
end

This means that a boost of 2.0 will be applied to the title field in all searches, unless the boost is specified in the search itself. This will, of course, only occur for searches issued with Sunspot.

Prefix queries

By popular demand, Sunspot 0.10 supports prefix queries, using the starts_with method in the DSL:

Sunspot.search(Post) do
  with(:author_name).starts_with('Jo') # field name and prefix are illustrative
end

Restrict field facet to a list of interesting values

Let’s say I’m faceting by category, but I’m not interested in all categories; I just want to show the top-level ones. Requesting a field facet for category_id will waste resources both at the Solr level and potentially at the Sunspot level (particularly if you’re using reference facets) loading rows you’re not interested in. Sunspot 0.10 introduces an :only option to the facet method, which only returns facets for the values you want. Use it like this:

Sunspot.search(Post) do
  facet :category_ids, :only => Category.top_level.map { |category| category.id }
end

Under the hood, this doesn’t actually issue a field facet request at all - instead it constructs a set of query facets, which are built so that, from the perspective of the Sunspot API, they act exactly like a field facet. This is one of the rare places where Sunspot actually extends Solr’s functionality, instead of simply encapsulating it. I hope to build more of these in the future.

Query facets support all facet options

Query facets now support all the options that you’re used to for field facets. The difference here is that the options are applied after the search is run, while building the Sunspot::Facet object. The end result, however, is the same.
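Applying the options after the fact is straightforward once the counts are in hand. The following sketch shows one way it could look in plain Ruby; the Row struct, the apply_facet_options helper, and its default values are assumptions for the example rather than Sunspot’s code.

```ruby
Row = Struct.new(:value, :total)

# Apply facet options in Ruby, after the counts have come back from the search.
def apply_facet_options(rows, sort: :count, limit: nil, minimum_count: 1)
  rows = rows.select { |row| row.total >= minimum_count }
  rows = case sort
         when :count then rows.sort_by { |row| -row.total }
         when :index then rows.sort_by { |row| row.value.to_s }
         else rows
         end
  limit ? rows.first(limit) : rows
end

rows     = [Row.new("b", 2), Row.new("a", 5), Row.new("c", 0)]
top      = apply_facet_options(rows, limit: 1)
by_index = apply_facet_options(rows, sort: :index, minimum_count: 0)
```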

Scope by text fields

In possibly my least favorite feature of Sunspot 0.10, it is now possible to apply scope to text fields, using the text_fields block. This works exactly like normal scope, except that the field names passed refer to text fields, instead of attribute fields. Since text fields are tokenized, the behavior here is not always intuitive; be sure to read up on tokenization, and expect that your mileage may vary:

Sunspot.search(Post) do
  text_fields do
    with(:body, 'Short body')
  end
end

Other enhancements

  • You can safely execute a search multiple times, with or without modification.
  • The sunspot-solr executable accepts -l (log level) and --log-file options, which control Solr logging output.
  • Using a field in a search requires only that it exists in at least one type under search, and that it has a consistent configuration for all the searched types that have it. Before, Sunspot required that all searched types had the field, which was unnecessarily restrictive.
  • Sunspot no longer depends on the Haml or Optiflag gems. Ruby built-ins FTW.

Installing LocalSolr in your Solr instance

Add the LocalSolr libraries

In the solr home directory (the one that contains the conf/ directory), create a directory called lib/, if there isn’t one already. Copy the contents of /path/to/your/gems/sunspot-0.10.2/solr/solr/lib into that directory.

Add extra handlers to solrconfig.xml

Add the following lines somewhere inside the config node:

<updateRequestProcessorChain>
  <processor class='com.pjaol.search.solr.update.LocalUpdateProcessorFactory'>
    <str name='latField'>lat</str>
    <str name='lngField'>long</str>
    <int name='startTier'>9</int>
    <int name='endTier'>16</int>
  </processor>
  <processor class='solr.RunUpdateProcessorFactory'></processor>
  <processor class='solr.LogUpdateProcessorFactory'></processor>
</updateRequestProcessorChain>
<searchComponent class='com.pjaol.search.solr.component.LocalSolrQueryComponent' name='localsolr'>
  <str name='latField'>lat</str>
  <str name='lngField'>long</str>
</searchComponent>
<requestHandler class='org.apache.solr.handler.component.SearchHandler' name='geo'>
  <arr name='components'>
    <str>localsolr</str>
    <!-- list the remaining search components your handler needs here -->
  </arr>
</requestHandler>

You’re doing great. One more step.

Add extra fields to your schema

Add this inside the types node:

<fieldtype name="sdouble" class="solr.SortableDoubleField" omitNorms="true"/>

Then add this inside the fields node:

<field name="lat"        type="sdouble" indexed="true" stored="true"  multiValued="false" />
<field name="long"       type="sdouble" indexed="true" stored="true"  multiValued="false" />
<dynamicField name="_local*" type="sdouble" indexed="true" stored="false" multiValued="false" />

Great job! You’re done.

To the future!

Well, that’s all for Sunspot 0.10. I’m hoping the next release will be 1.0; the focus will be working out any bugs and inconsistencies that come up, and making the experience of using and managing Sunspot and Solr as smooth as possible. Here are a few things I have in mind:

  • A tool to read an existing schema and solrconfig, and make the minimum changes to make them compatible with Sunspot’s needs.
  • A framework for testing code that uses Sunspot - basically, methods to ask a search whether a given set of search parameters has been applied.
  • Local search without limitations.

Up next, though, is a big new release of Sunspot::Rails - lots of great patches from great committers have been coming in, and I’m very excited to get them all into a release. Stay tuned!

21 July 2009

Sunspot 0.9 Released

If you haven’t read the front page of this morning’s Times, then you heard it first here: Sunspot 0.9 is out. Here’s what I wrote about the upcoming version in my last post about Sunspot, on the occasion of the 0.8 release:

Sunspot 0.9 is up next; the main goal for that version is to replace solr-ruby with RSolr as the low-level Solr interface, which will open the door to more features in future versions (query-based faceting, LocalSolr support, etc.), but probably won’t have much effect on the API for that version (other than supporting use of the faster Curb library for the HTTP communication with Solr).

Turns out that was completely wrong: 0.9 introduces lots and lots of new features, inspired by requests from users, anticipated needs in my company’s application, and a close reading of the Solr wiki to find out more about what it’s capable of. Read on for the juicy details.

But first, this post is really long, so here’s the first table of contents I’ve ever put in a blog post:

If this is the first you’ve heard of Sunspot, I’d recommend checking out the home page and the README before reading on.

The new version introduces several improvements to how fulltext search is performed, giving you a lot more control over how it works and how relevance is calculated.

Dismax queries

Fulltext search in Sunspot 0.9 is performed using Solr’s dismax handler, an awesome feature that I had managed to be unaware of until fairly recently. You can read all about it in the Solr API docs, but the upshot is that Solr parses fulltext queries under the assumption that they are coming from user input. It provides a circumscribed subset of the usual Lucene query syntax: in particular, well-matched quotes can be used to demarcate a phrase, and the +/- modifiers work as usual. All other Lucene query syntax is escaped, and non-well-matched quotation marks are ignored.

As well as providing user-input-safe query parsing, the use of dismax queries opens up a few more features. Read on.

Field and document boosting

Probably the most requested feature for Sunspot is boosting. Sunspot now supports boosting at both the document level and the field level. Document boosts can be dynamic (i.e., evaluate a method or block for each indexed object to determine the boost) or static; field boosts are always static.

Some examples:

Sunspot.setup(Post) do
  boost 2.0 # All Posts will have a document-level boost of 2.0
  text :title, :boost => 1.5 # The title field will have a boost of 1.5
  text :body # Body will have the default boost of 1.0
end

Sunspot.setup(User) do
  boost do # featured users get a big boost
    if featured?
      2.0 # illustrative value
    else
      1.0
    end
  end
end

In Sunspot 0.8, fulltext search always searched all of the text fields. In 0.9, you can specify which fields you’d like to search:

Sunspot.search(Post) do
  keywords 'pizza restaurant', :fields => [:title, :abstract]
end

If you don’t specify which fields to search, the search will of course apply to all indexed text fields. Note that when searching for multiple types, the set of available text fields is the union of text fields configured for the types under search, not the intersection as in attribute field search.

Index multiple values in text fields

Sunspot 0.8 didn’t allow the indexing of multiple values for text fields. In 0.9, all text fields allow multiple values. The reasoning for this is that the main reason to disallow multiple values is that multi-valued fields cannot be used for sorting; but sorting by tokenized text fields is nonsense anyway. So this is fine:

Sunspot.setup(Post) do
  text :comment_bodies do
    comments.map { |comment| comment.body }
  end
end

Search API

The new release also adds several enhancements to the general search API, increasing the information available from results as well as enhancing the power and ease of use of building queries.

It’s a hit

The Search class now implements the #hits method, which returns objects encapsulating result data coming directly from Solr. #hits is an enhanced version of the #raw_results method available in 0.8; #raw_results is still aliased and the objects returned are backward-compatible.

As in 0.8, Hit objects give access to the class name and primary key of the result object. They also give access to the keyword relevance score, if they’re coming from a keyword search. You can call #instance to load the actual result instance - the first time you call that method on a Hit, all the Hit objects will have their instances populated, so don’t worry about losing batch data retrieval.

Finally, Hit objects give access to stored fields, another new feature in v0.9. Stored fields can be configured in the indexer setup:

Sunspot.setup(Post) do
  string :title, :stored => true
end

Then here’s how to get data out of the Hit object:

search = Sunspot.search(Post) { keywords 'pizza' }
hit = search.hits.first
hit.class_name #=> "Post"
hit.primary_key #=> "12"
hit.score #=> 8.27
hit.stored(:title) #=> "Best pizza joints in town"
hit.instance #=> #<Post:0xb7d4c0d0>

Stored fields are most useful if you store a few crucial fields that you’d like to be able to display without making the round trip to persistent storage to retrieve the data.

Smarter shorthand restrictions

Sunspot 0.9 expands the types that can be passed as a value into the short-form #with method:

  • Passing a scalar value will scope to results where the field contains that value (this is not new).
  • Passing an Array will scope to results where the field contains any of the values in the array.
  • Passing a Range will scope to results where the field’s value is in the range.

For example:

Sunspot.search(Post) { with(:blog_id, 1) } # Find all posts with blog_id 1
Sunspot.search(Post) { with(:category_ids, [1, 3, 5]) } # Find all posts whose
                                                        # category_id is 1, 3,
                                                        # or 5
Sunspot.search(Post) { with(:average_rating, 3.0..5.0) } # Find all posts whose
                                                         # average rating is
                                                         # between 3.0 and 5.0
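The three cases map naturally onto a dispatch on the value’s class. Below is a small sketch of that dispatch as plain Ruby predicates; restriction_for is a hypothetical helper for illustration and does not generate Solr query syntax the way Sunspot does.

```ruby
# Hypothetical: turn a shorthand value into a predicate over a field's value(s).
def restriction_for(value)
  case value
  when Range then ->(field) { Array(field).any? { |v| value.cover?(v) } }
  when Array then ->(field) { (Array(field) & value).any? }
  else            ->(field) { Array(field).include?(value) }
  end
end

scalar_match = restriction_for(1).call(1)
array_match  = restriction_for([1, 3, 5]).call([2, 3])
range_match  = restriction_for(3.0..5.0).call(4.2)
range_miss   = restriction_for(3.0..5.0).call(2.9)
```

Wrapping the field in Array() lets the same predicate handle both single-valued and multi-valued fields.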

Have your cake OR (eat it too AND enjoy it)

The query DSL now supports the #any_of and #all_of methods, which group the enclosed restrictions into disjunctions and conjunctions respectively. One good use case is if you have an expiry time field; you’d like to get results whose expiry is either in the future, or nil:

Sunspot.search(Post) do
  any_of do
    with(:expires_at, nil)
    with(:expires_at).greater_than(Time.now)
  end
end

If you’d like to AND together restrictions inside an OR, you can nest an #all_of block:

Sunspot.search(Post) do
  any_of do
    all_of do
      with(:featured, true)
      with(:published_at).greater_than(Time.now - 2.weeks)
    end
  end
end

Note that using #all_of at the top level of a query block is a no-op, since query restrictions are already combined using AND semantics.
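The grouping semantics are the usual boolean combinators. A sketch over plain lambdas and hashes follows; the any_of and all_of methods below are stand-ins written for this example that only mimic the meaning of the DSL blocks, not their implementation.

```ruby
# Combinators over plain lambdas; stand-ins for the DSL's grouping blocks.
def any_of(*conditions)
  ->(doc) { conditions.any? { |condition| condition.call(doc) } }
end

def all_of(*conditions)
  ->(doc) { conditions.all? { |condition| condition.call(doc) } }
end

now = Time.now
never_expires = ->(doc) { doc[:expires_at].nil? }
still_valid   = ->(doc) { !doc[:expires_at].nil? && doc[:expires_at] > now }
visible       = any_of(never_expires, still_valid)

docs = [
  { id: 1, expires_at: nil },
  { id: 2, expires_at: now + 3600 },
  { id: 3, expires_at: now - 3600 }
]
visible_ids = docs.select(&visible).map { |doc| doc[:id] }
```

Wrapping a single condition in all_of changes nothing, which mirrors why a top-level #all_of is a no-op.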

Random ordering

By popular request, Sunspot now supports random ordering, which makes use of Solr’s RandomSortField:

Sunspot.search(Post) do
  order_by_random
end


One of the biggest and most exciting changes in the new release is far fuller support for Solr’s faceting capabilities. While 0.8 supported basic field facets, I think it’s safe to say that 0.9 supports pretty much all of Solr’s built-in faceting features.

More facet control

The call to #facet inside the query DSL now takes the following options:

  • :sort - How the facet rows should be sorted. Options are :count, which orders by the number of results matching the row's value, and :index, which sorts the values lexically.
  • :limit - Maximum number of facet rows to return.
  • :minimum_count - The minimum count a facet row must have to be returned.
  • :zeros - Whether to return facet rows that match no documents in the scope. Default is false; setting to true is the same as setting :minimum_count to 0.

So, for example:

Sunspot.search(Post) do
  facet :author_name, :sort => :index
  facet :category_ids, :sort => :count, :limit => 5
end

Time Facets

Solr has special support for faceting over a time range, with a given interval to which rows should apply. The new release adds an API for this type of facet; simply provide the :time_range key to use this type of faceting. Note that time faceting only works with time type fields - Sunspot will fail fast if you try to use it with another field type.

Available options for time faceting are:

  • :time_range - A Range object of Times. This is the full range over which times are returned. Specifying this field also enables time faceting.
  • :time_interval - Interval that each row should cover, in seconds. The default is 1 day.
  • :time_other - Times outside the range that should be returned as facet rows. Allowed values are :before, :after, :between, :none, and :all. The default is :none.

For example:

Sunspot.search(Post) do
  facet :published_at, :time_range => 1.year.ago..Time.now,
                       :time_interval => 1.month
end

This will return facets covering each month that a publish date can fall into, for the last year. The facet rows returned in the results will have Range values containing the Time range for that particular row.

See the Solr Wiki for more information on date faceting.
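
To make the shape of the returned rows concrete, here is a rough pure-Ruby sketch of the bucketing idea: split a range into interval-sized rows and count the times that land in each. The time_facet_rows method is a made-up name for illustration; Solr does the real work server-side, and its handling of the range's end may differ from this simplification.

```ruby
# Illustrative only: split a time range into consecutive interval-sized
# rows and count how many timestamps fall into each row.
def time_facet_rows(times, range, interval)
  rows = []
  row_start = range.begin
  while row_start < range.end
    # Clamp the last row so it never extends past the end of the range.
    row_end = [row_start + interval, range.end].min
    row_range = row_start...row_end
    rows << [row_range, times.count { |time| row_range.cover?(time) }]
    row_start = row_end
  end
  rows
end

day = 24 * 60 * 60
start = Time.utc(2009, 6, 1)
published = [start + 3600, start + day + 60, start + day + 120]

rows = time_facet_rows(published, start..(start + 3 * day), day)
rows.each { |range, count| puts "#{range.begin} to #{range.end}: #{count}" }
```

Each row pairs a Range of Times with a count, which is the same general shape as the facet rows described above.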

Query facets

Field and date facets are useful, but the real ultimate power lies in Solr’s query faceting. This allows you to specify an arbitrary set of conditions for each row, making the possibilities pretty much endless. Sunspot 0.9 supports building query facets using the same DSL that is used for building normal search scope:

search = Sunspot.search(Post) do
  facet :rating_ranges do
    row 1.0..2.0 do
      with :average_rating, 1.0..2.0
    end
    row 2.0..3.0 do
      with :average_rating, 2.0..3.0
    end
    row 3.0..4.0 do
      with :average_rating, 3.0..4.0
    end
    row 4.0..5.0 do
      with :average_rating, 4.0..5.0
    end
  end
end

A few things to point out about the above. First, the concept of grouping the various rows into a single “facet” is introduced by Sunspot; Solr itself simply accepts an undifferentiated set of query facets, with no grouping. I decided to introduce the grouping as it seems more intuitive to me, and helps keep the API consistent when retrieving facets from the search results. Also, the arguments to the #facet and #row methods are not passed on to Solr; they’re simply there to make it easy to make sense of the results. In particular, the argument passed to #facet should be a symbol, and it’s used to retrieve the facet from the Search#facet method. The argument to #row can be whatever you like; it becomes the #value associated with that facet row in the results. So, in the results from the previous example, we’d see:

ratings_facet = search.facet(:rating_ranges)
ratings_facet.rows.first.value #=> 3.0..4.0

Note that the field facet options aren’t supported by query facets; they’re always ordered by count, zeros are always returned, and there’s no limit. If there’s demand, I’d be happy to support those options in a post-processing stage in a later version.
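
In the meantime, nothing stops an application from doing that post-processing itself. Below is a minimal sketch, assuming the rows have been copied into plain value/count pairs; FacetRow and postprocess_rows are hypothetical names for this example, not part of Sunspot.

```ruby
# Hypothetical stand-in for a facet row: just a value and a count.
FacetRow = Struct.new(:value, :count)

# Apply field-facet-style options to rows after the search has returned.
def postprocess_rows(rows, sort: :count, limit: nil, minimum_count: 0)
  kept = rows.select { |row| row.count >= minimum_count }
  sorted =
    if sort == :index
      kept.sort_by { |row| row.value.to_s }   # lexical order of the values
    else
      kept.sort_by { |row| -row.count }       # highest count first
    end
  limit ? sorted.first(limit) : sorted
end

rows = [
  FacetRow.new(1.0..2.0, 2),
  FacetRow.new(2.0..3.0, 0),
  FacetRow.new(3.0..4.0, 7),
  FacetRow.new(4.0..5.0, 5)
]

top = postprocess_rows(rows, limit: 2, minimum_count: 1)
top.each { |row| puts "#{row.value}: #{row.count}" }
```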

Instantiated Facets

It’s common to index database foreign keys in Solr; the new release adds explicit recognition of that fact where faceting is concerned, allowing you to specify that a field references a particular class, and then populate the facet row with the instance referenced by the row’s value. Instantiated facets are lazy-loaded, but when you request any facet row’s instance, all of the instances for the facet’s rows are loaded, so batch loading is still taken advantage of.

To specify that a field references a persisted class, just add the :references option to the field definition:

Sunspot.setup(Post) do
  integer :blog_id, :references => Blog
end

Then when you facet by the :blog_id field, you’ll have access to the #instance method on the rows:

search = Sunspot.search(Post) do
  facet :blog_id
end
search.facet(:blog_id).rows.first.instance #=> #<Blog:0xb7e1cd0c>
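
The lazy-but-batched loading is easier to see in isolation. This is a sketch of the general pattern, not Sunspot's actual code: FakeBlogStore and InstantiatedFacet are invented for the example, and the store simply counts how many "queries" it receives.

```ruby
# Stand-in for a persistence layer; counts how many queries are issued.
class FakeBlogStore
  attr_reader :query_count

  def initialize(records)
    @records = records
    @query_count = 0
  end

  # Loads every requested id in a single "query".
  def find_all(ids)
    @query_count += 1
    ids.map { |id| [id, @records.fetch(id)] }.to_h
  end
end

class InstantiatedFacet
  Row = Struct.new(:facet, :value, :count) do
    # Asking one row for its instance triggers loading for all rows.
    def instance
      facet.instance_for(value)
    end
  end

  attr_reader :rows

  def initialize(store, counts)
    @store = store
    @rows = counts.map { |value, count| Row.new(self, value, count) }
  end

  def instance_for(value)
    # Nothing is loaded until the first request; then everything is.
    @instances ||= @store.find_all(@rows.map(&:value))
    @instances[value]
  end
end

store = FakeBlogStore.new({ 1 => "Blog one", 2 => "Blog two" })
facet = InstantiatedFacet.new(store, { 1 => 10, 2 => 4 })
facet.rows.each { |row| puts "#{row.instance}: #{row.count}" }
puts "queries issued: #{store.query_count}"
```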

Facet by class

If you’re performing a search on multiple object types, you may want to facet based on the class of the documents. Sunspot now adds the :class field to all index setups, and allows faceting on it. The facet row values are Class objects:

search = Sunspot.search(Post, Comment) do
  keywords 'great pizza'
  facet :class
end
search.facet(:class).rows.first.value #=> Post

New features that don’t fit into a group

Batch indexing

In my company’s production application, we perform complex operations that initiate Solr indexing from disparate places within the application code. However, it’s more efficient to send all adds/updates as part of a single request; the Sunspot.batch method makes that simple:

Sunspot.batch do

When the batch block exits, Sunspot will send all of the indexed documents in a single HTTP request.
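
The idea behind batching can be sketched without Solr at all. BatchingIndexer below is a hypothetical class written for this illustration (it is not how Sunspot implements Sunspot.batch): documents indexed inside the block are collected, and a single "request" is recorded when the block exits.

```ruby
# Hypothetical sketch: each entry in #requests stands for one HTTP request.
class BatchingIndexer
  attr_reader :requests

  def initialize
    @requests = []
    @batch = nil
  end

  def index(document)
    if @batch
      @batch << document        # inside a batch: just collect
    else
      @requests << [document]   # outside a batch: one request per document
    end
  end

  def batch
    @batch = []
    yield
    @requests << @batch unless @batch.empty?
  ensure
    @batch = nil                # always leave batch mode, even on error
  end
end

indexer = BatchingIndexer.new
indexer.batch do
  indexer.index("post 1")
  indexer.index("comment 1")
end
puts "requests sent: #{indexer.requests.length}"
```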

Date field type

Java doesn’t have a built-in type that contains date information without time information, like Ruby’s Date does; neither does Solr. For convenience, the new release creates a new date type, which indexes Ruby Date objects. Internally, the dates are stored as a time, with the time portion at midnight UTC. Facet values and stored values are returned as Ruby Date objects as expected.
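
The conversion itself is simple enough to show in plain Ruby. This is a sketch of the round trip described above; the method names are made up for the example and are not Sunspot's internals.

```ruby
require 'date'

# Represent a Date as a Time at midnight UTC on that day.
def date_to_indexed_time(date)
  Time.utc(date.year, date.month, date.day)
end

# Recover the Date from a stored Time, reading it in UTC.
def indexed_time_to_date(time)
  utc = time.getutc
  Date.new(utc.year, utc.month, utc.day)
end

date = Date.new(2009, 7, 4)
stored = date_to_indexed_time(date)
puts stored.utc?                            # the stored time is in UTC
puts indexed_time_to_date(stored) == date   # the round trip preserves the date
```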

Access to data accessors

Let’s say you’re running a Solr search against objects that are persisted with ActiveRecord; wouldn’t it be nice to be able to specify :include arguments for the database query? Toward this end, Sunspot now allows you to access the accessor for a given class from inside the query DSL; accessors can implement any methods they’d like to inform how data should be pulled from persistent storage.

For instance, let’s say your ActiveRecord adapter’s data accessor has an #includes= method, which tells it to pass the arguments into ActiveRecord’s :include option when performing the query. You can access that functionality like so:

Sunspot.search(Post, Comment) do
  adapter_for(Post).includes = [:blog, :comments]
end

Note that even if Post and Comment use the same adapter class (i.e., an ActiveRecord adapter), Sunspot will use a separate adapter instance for each, so you can safely set different options for each.

Easily configure your Solr installation with sunspot-configure-solr

While using the packaged Solr installation is great for development, I don’t recommend using it in production. The new release includes a new executable called sunspot-configure-solr, which writes a schema.xml file to the Solr installation of your choice, backing up the old schema.xml if it exists. sunspot-configure-solr includes a few options for areas where you can safely customize your schema:

--tokenizer - The tokenizer class to use for fulltext field tokenization; the default is solr.StandardTokenizerFactory.
--filters - Comma-separated list of extra filters to apply to fulltext fields. These will be applied after the default solr.StandardFilterFactory and solr.LowerCaseFilterFactory.
--dir - Solr home directory in which to install the schema file. This directory should contain a conf directory (it will be created if not). The default is the working directory from which the command is issued.

The tokenizer and filter classes can be specified with a shorthand: if the name passed is unqualified (i.e., doesn’t have any periods), it will be prefixed with “solr.” and suffixed with “TokenizerFactory” or “FilterFactory” respectively:

$ sunspot-configure-solr --dir /var/solr --tokenizer com.myapp.MyTokenizerFactory --filters EnglishPorter,com.myapp.MyFilterFactory

This will set the tokenizer to com.myapp.MyTokenizerFactory and add the extra filters solr.EnglishPorterFilterFactory and com.myapp.MyFilterFactory. Note that more advanced Solr users will want to work with the schema file directly; just don’t change the naming scheme for the dynamic typed fields.
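
The shorthand rule boils down to a one-liner. Here is a sketch of it in Ruby, using the class names from the command above; expand_solr_class is a hypothetical helper, not the executable's actual code.

```ruby
# Qualified names (containing a period) pass through untouched;
# unqualified names get the "solr." prefix and the given suffix.
def expand_solr_class(name, suffix)
  name.include?(".") ? name : "solr.#{name}#{suffix}"
end

tokenizer = expand_solr_class("com.myapp.MyTokenizerFactory", "TokenizerFactory")
filters = "EnglishPorter,com.myapp.MyFilterFactory".split(",").map do |name|
  expand_solr_class(name, "FilterFactory")
end

puts tokenizer
puts filters
```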

RSolr replaces solr-ruby

solr-ruby has been the de facto low-level Solr interaction layer for several years; RSolr is a newer library that has several advantages over solr-ruby:

  • It’s more actively maintained.
  • It passes queries directly to Solr without interpreting or modifying the parameters; this means that it implicitly supports any query parameters that are supported by Solr (or any Solr extensions that are installed).
  • It gives you the choice between using Net::HTTP, which is slow, and curb, which is a Ruby interface to libcurl, and is fast. Sunspot uses Net::HTTP for HTTP interaction by default for maximum compatibility, but applications can easily switch to curb by setting Sunspot.config.http_client = :curb (do this before initiating any interaction with Solr).

Remove accidental ActiveSupport dependency

Sunspot 0.8 required WillPaginate into the spec suite by default, which in turn loaded ActiveSupport. Because of this, a few places in the code were inadvertently using ActiveSupport extensions, and the specs still passed even though they shouldn’t have. I modified the spec suite to only load WillPaginate if an environment variable is passed, and fixed the broken specs.

Toward the future

So, what’s next, you may wonder! Perhaps you have a few ideas of your own. Perhaps they are:


Keyword highlighting

Solr supports keyword highlighting. This has never been a big priority for me, but I have heard from other Sunspot users that it would be a nice thing to have, so I’m hoping to get support for that in a future version.

LocalSolr support

LocalSolr is a Solr extension that brings geographic searching to Solr; in particular, results can be restricted and sorted by distance from a given lat/long. Do want.

Query facet abstraction

I’ve just begun giving this thought, but it seems pretty clear from the query faceting example above that certain common use cases for query facets could be abstracted into a more concise API. For instance, wouldn’t it be nice to write that example as:

Sunspot.search(Post) do
  range_facet :average_rating, 1.0..2.0, 2.0..3.0, 3.0..4.0, 4.0..5.0
end

Install it.

$ sudo gem install outoftime-sunspot --source=http://gems.github.com

Be in touch.

My goal for Sunspot has always been for it to become the de facto Solr abstraction library for Rubyists. I’m always happy to get feature requests, bug reports, and especially patches.

  • If you notice some missing functionality in Sunspot or have a sweet idea for a new feature, please shoot a message to the Sunspot mailing list.
  • Found a bug? Submit a ticket on Lighthouse
  • Either of the above, and have a patch? Shoot me a pull request on GitHub.
26 May 2009

Sunspot 0.8 is out

On Friday, I released the next milestone in Sunspot, version 0.8. This version doesn’t add to or change any of the basic functionality, but does add some advanced features which the app I work on for my day job happens to demand. Here’s a rundown:

Direct access to the Query API

Users of Sunspot will doubtless be familiar with Sunspot’s search DSL, which gives an English-like interface for constructing search parameters. In some cases, however, such a DSL is actually counterproductive, particularly when searches are being built by an intermediate object, and thus not necessarily all in one place. So, the new methods Sunspot.new_search() and Search#query() are exposed, and the Sunspot::Query class itself is now part of the public API. What I have in mind in particular here is an application of the Go4 Builder pattern, along with ActiveRecord’s hash-initializer pattern, to elegantly translate web query parameters into a Sunspot search. Here’s a stripped-down example of what I think the code will look like to do that:

class EventSearchBuilder
  attr_reader :search

  def initialize(options = {})
    @search = Sunspot.new_search(Event)
    options.each_pair do |attr, value|
      if respond_to?("#{attr}=")
        send("#{attr}=", value)
      end
    end
  end

  def when=(day_string)
    case day_string
    when 'future'
      @search.query.add_restriction(:start_time, :greater_than, Time.now)
    when 'past'
      @search.query.add_restriction(:start_time, :less_than, Time.now)
    else
      date_time = Date.parse(day_string).to_time
      @search.query.add_restriction(:start_time, :between, date_time..(date_time + 1.day))
    end
  end

  def page=(page)
    # ...
  end

  def sort=(field)
    # ...
  end
end

Then in controller code, it’s as simple as:

def search
  @search = EventSearchBuilder.new(params).search
end

Dynamic Fields

I wouldn’t be surprised if I’m the only person who ever uses this feature of Sunspot, but just in case, let’s look at a real-world example. Let’s say part of my data model uses free-form key-value pairs, which use a constrained (but user-definable) set of keys and free-form values. I’ll call my model KeyValuePair.

The trick I would like to pull here is that I would like to treat each key as a separate field in search, so that I can constrain, order, facet, etc. on the values for one key without them being affected by other keys. Since the keys are user-defined, I can’t just set up normal fields at build time; they need to be defined at index time. Enter Sunspot’s dynamic fields (we’ll use Sunspot::Rails’s wrapper API here):

class Business < ActiveRecord::Base
  has_many :key_value_pairs

  searchable do
    dynamic_string :key_value_pairs do
      key_value_pairs.inject({}) do |hash, pair|
        hash.merge(pair.key.to_sym => pair.value)
      end
    end
  end
end

This sets up a dynamic field which is populated using the given block. What’s important there is that the field is populated using a hash - the keys of the hash become individual dynamic fields, and the values populate those fields in the index. The “base name” of the field is key_value_pairs, which is used to namespace the dynamic names that come out of the hash.
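
To see what that block produces, here is the same inject run against a couple of stand-in records in plain Ruby. The KeyValuePair struct is a placeholder for the ActiveRecord model, and the "base:key" naming in namespaced_fields is purely illustrative; it is an assumption for this sketch, not necessarily the field names Sunspot writes to the index.

```ruby
# Placeholder for the ActiveRecord model in the example above.
KeyValuePair = Struct.new(:key, :value)

pairs = [
  KeyValuePair.new("cuisine", "Sushi"),
  KeyValuePair.new("atmosphere", "Casual")
]

# The same fold as in the searchable block: one hash entry per pair.
field_values = pairs.inject({}) do |hash, pair|
  hash.merge(pair.key.to_sym => pair.value)
end

# Illustrative namespacing only: prefix each key with the base name.
def namespaced_fields(base_name, values)
  values.map { |key, value| ["#{base_name}:#{key}", value] }.to_h
end

fields = namespaced_fields(:key_value_pairs, field_values)
fields.each { |name, value| puts "#{name} = #{value}" }
```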

Working with dynamic fields is a lot like working with regular ones, except in the query, calls are wrapped in a dynamic block:

Business.search do
  dynamic :key_value_pairs do
    with(:cuisine, 'Sushi')
  end
end

Naturally, those field names (:cuisine, :atmosphere) wouldn’t be hard-coded in a real application, since they would not be known at build time.

Dirty Sessions

Sessions now track whether any operations have been performed since the last time a commit was issued. The Session#dirty? method answers that question, and the Session#commit_if_dirty method does exactly what it sounds like. These are useful if you want to keep your commits to a minimum (you do) but various parts of your code issue Sunspot operations without any central knowledge on the part of your application.
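
The bookkeeping involved is tiny. Here is a sketch of the pattern with a hypothetical TrackedSession class; it only mimics the dirty flag and counts commits, and is not Sunspot's Session.

```ruby
# Hypothetical session that tracks whether anything changed since the
# last commit, and counts how many commits were actually issued.
class TrackedSession
  attr_reader :commit_count

  def initialize
    @dirty = false
    @commit_count = 0
  end

  def index(_document)
    @dirty = true
  end

  def remove(_document)
    @dirty = true
  end

  def dirty?
    @dirty
  end

  def commit
    @commit_count += 1
    @dirty = false
  end

  # Only commit when there is something to commit.
  def commit_if_dirty
    commit if dirty?
  end
end

session = TrackedSession.new
session.commit_if_dirty          # nothing indexed yet, so no commit
session.index("post 1")
session.commit_if_dirty          # one commit
session.commit_if_dirty          # already clean, so no second commit
puts "commits issued: #{session.commit_count}"
```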

That’s all for now

Sunspot 0.9 is up next; the main goal for that version is to replace solr-ruby with RSolr as the low-level Solr interface, which will open the door to more features in future versions (query-based faceting, LocalSolr support, etc.), but probably won’t have much effect on the API for that version (other than supporting use of the faster Curb library for the HTTP communication with Solr).

21 May 2009

Installing alternate Ruby versions as optional packages

As a developer of Ruby libraries and applications, I’d like to make sure my code works in all of the major Ruby implementations, but I’ve also got my “main” Ruby, the one that has been with me through thick and thin and happens to be the version installed on our production servers. The other Rubies need a place on my machine, but I’d like that place to be out of the way and have no chance of conflicting with my main Ruby installation or anything else I’ve got installed.

Fortunately, the omniscient beings who created the Filesystem Hierarchy Standard anticipated this need of mine, and in their wisdom created the /opt directory for this purpose. Unlike a normal package installation, which installs files in various places across your file system - /usr/bin, /usr/lib, /etc, /var, and the like - optional package installations put everything into a single subdirectory of /opt, where they’re fairly isolated from the rest of the system.

So, here’s how I installed YARV and JRuby as optional packages. This should work for anyone using Linux or Mac OS X¹:

Download the packages

Find a nice directory for downloads.

$ wget ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p129.tar.gz
$ wget http://dist.codehaus.org/jruby/1.2.0/jruby-bin-1.2.0.tar.gz

Install YARV

$ sudo mkdir -pv /opt/ruby-1.9.1-p129
$ tar xzvf ruby-1.9.1-p129.tar.gz
$ cd ruby-1.9.1-p129
$ ./configure --prefix=/opt/ruby-1.9.1-p129
$ make
$ sudo make install

Install JRuby

$ sudo tar -C /opt -xzvf jruby-bin-1.2.0.tar.gz
$ sudo rm -v /opt/jruby-1.2.0/bin/*.bat

You can also remove most of the directories in the /opt/jruby-1.2.0/lib/native directory, except the one that corresponds to your architecture. If in doubt, leaving them all in won’t hurt.

Installing Gems

Assuming you’ve got RubyGems installed in your main Ruby installation, you don’t need to install it for your other installations - you can simply run the existing gem script using the various binaries, and it’ll work the way you want (installing the gems inside those optional package directories). For example:

$ sudo /opt/ruby-1.9.1-p129/bin/ruby -S gem install rake
$ sudo /opt/jruby-1.2.0/bin/jruby -S gem install rake

Using the small rubies script I covered in this post makes the process of installing gems (and doing anything else) in your various Ruby versions considerably less painful.

¹ If you use MacPorts, which you probably do, you've got a bunch of software installed in a standard hierarchy inside of the `/opt/local` directory. That isn't really the [way it was intended to be used](http://www.pathname.com/fhs/pub/fhs-2.3.html#OPTADDONAPPLICATIONSOFTWAREPACKAGES), but it won't conflict with the installations covered in this post.