Sorting, Faceting And Schema Design In Solr
I was recently with a client doing a Best Practices assessment when I came across a common source of confusion related to sorting, faceting and schema design.
As background, Solr provides a schema that describes the Fields and Field Types (FT) that are used by an application. Field Types describe how Solr should handle the information contained in a Field. For instance, the integer FT tells Solr to treat the contents of any Field of type integer as, you guessed it, an integer. By integer, I mean good old-fashioned Java ints.
Solr provides other FTs like long, double, float, string and date, as well as Text (which can be associated with Lucene's analysis process). Additionally, Solr provides several sortable FTs such as sint, slong, sdouble and sfloat. Therein lies the confusion. I suspect that developers hear the word sortable and assume they should use the sortable FT for any field they want to sort results by.
However, there is some subtlety here. Namely, sortable FTs encode the content so that its lexicographic (string) order is the same as its numeric order, which is what matters during search. Sortables are therefore really meant for things like range queries (e.g. price:[2 TO 100]), not for sorting the results you return. Because of this encoding, sortables take up more space in the index (and in memory) than their non-sortable compadres.
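To see why an order-preserving encoding matters, remember that a range query compares indexed terms as strings. The Java sketch below uses a hypothetical encoding (flip the sign bit, then write eight zero-padded hex digits); it is not Solr's actual sint format, only an illustration of the same idea. With plain decimal strings, the range price:[2 TO 100] matches nothing, because "2" sorts after "100"; with the encoded terms it matches the numeric range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortableIntDemo {

    // Hypothetical order-preserving encoding (NOT Solr's actual sint format).
    // XOR with Integer.MIN_VALUE flips the sign bit, mapping the signed range
    // onto 0x00000000..0xFFFFFFFF in unsigned order; %08x prints those 32 bits
    // as eight zero-padded hex digits, so string order equals numeric order.
    static String encode(int value) {
        return String.format("%08x", value ^ Integer.MIN_VALUE);
    }

    // Collects the values whose term falls in [lo, hi] when terms are
    // compared as strings, the way a term range query compares them.
    static List<Integer> rangeHits(int[] values, int lo, int hi, boolean encoded) {
        String lower = encoded ? encode(lo) : Integer.toString(lo);
        String upper = encoded ? encode(hi) : Integer.toString(hi);
        List<Integer> hits = new ArrayList<>();
        for (int v : values) {
            String term = encoded ? encode(v) : Integer.toString(v);
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] prices = {-5, 2, 9, 10, 55, 100, 250};

        // Plain decimal strings sort lexicographically, not numerically.
        List<String> plain = new ArrayList<>();
        for (int p : prices) {
            plain.add(Integer.toString(p));
        }
        Collections.sort(plain);
        System.out.println("plain term order: " + plain); // [-5, 10, 100, 2, 250, 55, 9]

        // price:[2 TO 100] over plain strings vs. encoded terms.
        System.out.println("plain hits:   " + rangeHits(prices, 2, 100, false)); // []
        System.out.println("encoded hits: " + rangeHits(prices, 2, 100, true));  // [2, 9, 10, 55, 100]
    }
}
```

The fixed-width hex form is chosen purely for simplicity: every term has the same length, so plain string comparison is enough to get numeric order.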
What's this got to do with schema design? Well, this client had three fields, all defined with the sortable integer FT:
1. fieldOriginal - The source of the content. This was the main field used for sorting.
2. fieldSearch - A copy field of fieldOriginal, but rounded to the nearest 100. This was the main field used for searching.
3. fieldFacet - A copy field of fieldOriginal, but rounded based on a percentage of the original value to provide a sliding scale for faceting. This was the main field used for faceting.
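The rounding in fieldSearch is what keeps its number of unique terms small. The Java sketch below rounds values to the nearest 100 and counts distinct values before and after; the input data is hypothetical random numbers, and it covers only the fieldSearch rounding, since the exact percentage scheme used for fieldFacet is not specified here.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class RoundedCopyFieldDemo {

    // Rounds to the nearest multiple of 100, as described for fieldSearch.
    // Math.round rounds halves toward positive infinity, so 150 -> 200.
    static int roundToNearest100(int value) {
        return (int) (Math.round(value / 100.0) * 100);
    }

    // Counts distinct indexed values, with or without the rounding applied.
    static int uniqueCount(int[] values, boolean rounded) {
        Set<Integer> terms = new HashSet<>();
        for (int v : values) {
            terms.add(rounded ? roundToNearest100(v) : v);
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: one million random values in [0, 1,000,000).
        Random random = new Random(42);
        int[] original = new int[1_000_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = random.nextInt(1_000_000);
        }
        System.out.println("unique original values: " + uniqueCount(original, false));
        System.out.println("unique rounded values:  " + uniqueCount(original, true));
    }
}
```

For inputs in this range the rounded field can hold at most 10,001 distinct values (0, 100, ..., 1,000,000), no matter how many documents are indexed.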
In this case, the client was using fieldOriginal for sorting, fieldSearch for searching and fieldFacet for faceting. They were not doing any range queries, so they did not need fieldSearch (or, for that matter, fieldOriginal) to be sortable. Furthermore, fieldOriginal had over 1 million unique terms, so sorting on it as a sortable field was taking up a good chunk of memory and disk space. The other two fields had far fewer unique terms because of the rounding, so the cost of using sortables there was not that big of a deal. Finally, this field pattern was replicated for several other fields as well, some of which also had a significant number of unique terms.
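To get a feel for why the number of unique terms matters when sorting, here is a back-of-the-envelope Java sketch under a simplified, assumed model: sorting on a plain int field caches one 4-byte int per document, while sorting on a string-encoded (sortable) field caches one 4-byte ordinal per document plus every unique term as a Java String. The per-term cost (ASSUMED_BYTES_PER_STRING_TERM) and the document count are hypothetical; substitute your own numbers.

```java
public class SortCacheEstimate {

    // ASSUMPTION: rough cost of caching one term as a Java String (object
    // headers plus char data). Real numbers depend on the JVM and term length.
    static final long ASSUMED_BYTES_PER_STRING_TERM = 64;

    // Simplified numeric sort cache model: one 4-byte int per document.
    static long numericCacheBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    // Simplified string sort cache model: one 4-byte ordinal per document
    // plus every unique term held in memory as a String.
    static long stringCacheBytes(long maxDoc, long uniqueTerms) {
        return maxDoc * Integer.BYTES + uniqueTerms * ASSUMED_BYTES_PER_STRING_TERM;
    }

    public static void main(String[] args) {
        long maxDoc = 5_000_000L;      // hypothetical index size
        long uniqueTerms = 1_000_000L; // "over 1 million unique terms"
        long mb = 1024L * 1024L;
        System.out.printf("numeric sort cache: ~%d MB%n", numericCacheBytes(maxDoc) / mb);
        System.out.printf("string sort cache:  ~%d MB%n", stringCacheBytes(maxDoc, uniqueTerms) / mb);
    }
}
```

Under these assumptions the extra cost grows linearly with the number of unique terms, which is why a field with over 1 million of them dominates while the rounded fields barely register.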
Thus, simply by switching these Fields to the plain integer FT where appropriate, we significantly reduced the memory footprint and the disk space required by this client's application.
So, as always, pay close attention to your schema design. While the Solr example schema is pretty good out of the box, you shouldn't take it as gospel, either. Spending some time thinking about your needs during design will likely save you a lot of time later when debugging and testing your application.
To learn more about Solr applications and faceted search, check out the Lucid Imagination website: www.lucidimagination.com
by: Lucid Imagination