Multi-table Full Text Search Postgres with a somewhat nonstandard approach to search Often search will be done with external service we’ll use the tool already responsible for storing and finding data

@calebthompson calebthompson.io * Tuesday Hug * Close your laptops, there’s a lot of code here.s

I don’t want to jump into my credentials the point of credentials is to get you to listen, and you’re already here. So I’ll have more about me at the end but I do have one claim-to-fame

My birthday, so you know I’m an expert.

I'm going to talk to you about

A real-life feature. iterate, explore options, and optimize

Full Text Search What is it, how can it help us

Multi-table Full Text Search Postgres with You probably could have guessed that from the title

Views database views

Performance performance implications of using this approach and ways to mitigate them

Materialize d Views a somewhat recent feature

Gems that can help us out

Other Options for search in your application

Let's search for Articles We're going to look at a classic example

Article.where( "body LIKE %?%" , query)

Exact substrings That'll get us articles that contain an exact string

but that's not very useful

Article.where("body ILIKE %?%", query) That's better, at least we ignore case

Search on title What about the title?

Article.where(

"body ILIKE %?% OR title ILIKE %?%" , query, query) I'm sure you've seen this

Search by Author’s name Okay, now we want to search by author name too

Scopes def self.search(query)

joins(:user) .where(<<-SQL, query, query, query) articles.body ILIKE %?% OR articles.title ILIKE %?% OR users.name ILIKE %?% SQL

end We could get crazy with scopes

Query object class Search def self.for(query) [

Article.where("title ILIKE %?%", query), Article.where("body ILIKE %?%", query), Article.joins(:user) .where("users.name ILIKE %?%", query),

].flatten.uniq end end We could try a query object

Poor results And the results really aren't all that great Only matches for case-insensitive substrings

Full Text Search Enter query for rows based on natural language searching

natural language searching

remove stop words and, the, also, they, would

eliminate casing “Factory” and “factory” should return the same results

synonyms hunger and hungrily

stemming “try” and “trying”, “tries”, and “tried” will be recorded in the index under the single concept word "tri."

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Example of making that same query

SELECT DISTINCT(id) FROM ( SELECT id AS id,

title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text Grab the text we want to search on operator

FROM articles UNION SELECT articles.id AS id,

authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Grab the text we want to search on operator

SELECT DISTINCT(id) FROM ( SELECT

id AS id, title || ' ' || body AS the_text

FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors Also get the id for the article

title || ' ' || body AS the_text 

FROM articles UNION SELECT

articles.id AS id, authors.name AS the_text

FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Also get the id for the article

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT uniq as with query

That’s a lot of SQL. Where do we put it

We could throw that into our query object class Search def self.for(query) <<-SQL SQL end end

Scopes def self.search(query) where(<<-SQL, query) SQL end But SQL doesn’t belong in .rb files

Postgres has our answer

Views

Partial queries

Stored in the Database

Can be SELECT e d from

Return set of columns Compose the set of fields to be searched with views

Multi- source Can be pulled from multiple tables

Complete the query then perform a WHERE or other query on it when you need the results

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; unique users, all rows, active_at column from activities active over the past week

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Syntax to create a view

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Unique users

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id)

users.*,

   activities.created_at AS active_at 

FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; All rows from user

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*,

activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; plus an active_at row from activities’ created_at

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; only those active over the past week

Querying a view looks just like querying a table SELECT * FROM users_with_recent_activity WHERE id IN (1,2,3…) ORDER BY active_at DESC

ActiveRecord can use a view as its backend

So we can create a fairly vanilla model

class UserWithRecentActivity
< ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

class UserWithRecentActivity
< ActiveRecord::Base

def self.table_name "users_with_recent_activity" end def readonly?

class UserWithRecentActivity
< ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

def self.table_name "users_with_recent_activity" end def readonly? true end end Not necessary; some views can be written/deleted

Will it work with full text search?

⚲ Textacular

Takes care of the Full Text Search portions of queries

Search over every text field on a record

Variant search options like basic_search, fuzzy search, and advanced search

Game .basic_search('Sonic') most generally useful for a search

Game .basic_search (

title: 'Mario', system: 'Nintendo' ) next simplest useful thing

So let's go back and look at the search we wrote

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION

SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id;

Now our search result is really simple in the Rails side

class Search < ActiveRecord::Base

include Textacular belongs_to :article end

class Search < ActiveRecord::Base

include Textacular belongs_to :article end

Search. basic_search("Sandi") .map(&:article)

Search. basic_search("Sandi") . map(&:article)

class SearchResult

include Enumerable

def initialize( query )

@results = Search.basic_search(query)

end def each @results.each end end

CREATE Migration We create the view in a migration

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"--The CREATE VIEW sql from before" )

end def down

ActiveRecord::Base.connection.execute( 'DROP VIEW users_with_recent_activity' )

end end

How resistant to change is it?

Let's find out - time for some feature creep!

Articles whose comments match the query

Searching on • Article (title, body) • Author (name) • Comments (body)

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION

SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION

SELECT article_id, body AS the_text FROM comments

articles.id AS article_id, 
authors.name AS the_text 

FROM authors JOIN articles ON authors.id = articles.author_id

UNION

SELECT article_id, body AS the_text FROM comments

UPDATE Migration Look at an update view migration

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"CREATE OR REPLACE VIEW searches AS --The sql from the new view" )

end def down

ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the old view" )

end end the down migration has to have the entire SQL of the previous view

UPDATE Migration Unfortunately, can’t (always) update can’t remove columns

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"DROP VIEW searches; CREATE VIEW searches AS --The sql from the new view" )

end def down

ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the old view" )

end end DROP first

Can’t be dumped to db/schema.rb

config.database_format = :structure db/structure.sql

Scenic Adds view methods to migrations allows views to be dumped to db/schema.rb attempts to make views easier to work with

[show how scenic gets migrations from view.sql definitions] show how scenic gets migrations from view.sql definitions syntax highlighting in .sql

[show that it has versions] show that it has versions

Model generator $ rails generate scenic:model search

create app/models/search.rb create db/views/searches_v01.sql create db/migrate/..._create_searches.rb Can be a lot to remember - generators

$ cat app/views/searches_v01.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

View generator $ rails generate scenic:view search create db/views/searches_v02.sql create db/migrate/..._update_searches_to_version_2.rb re-running scenic:view creates new version

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id And the new sql file includes the previous definition

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments Because most times you’ll have a small change to make and that’s easier.

Performance

This query is pretty slow

it has to search across three tables to get us all of the results we need

Full Text Search ~400 ms ILIKE ~40 ms Full Text Search vs ILIKE

Views can’t be indexed

Underlying tables can

Add

indices

There are several types of indices

btree Usually you'll have used btree

GIN & GIST The ones that apply to full text search are GIN and GIST Generalized Inverted iNdex Generalized Inverted Search Tree

GIN index lookups are about three times faster than GiST

GIN indexes take about three times longer to build than GiST

GIN indexes are moderately slower to update than GiST indexes in some cases can be much slower depending on database settings (fast-update)

GIN indexes are two-to-three times larger than GiST indexes

Use GIN • You don’t have hundreds of thousands of rows • You’re not concerned about longer writes blocking the db • You’re adding the index late in the game • You don’t care about disk space • You want fast lookups (read-heavy)

Use GiST • Very large tables • You have performance concerns • For some reason disk space is important • Your table is write-heavy

add_index :articles, :body, using: :gin add_index :articles, :title, using: :gin add_index :authors, :name, using: :gin add_index :comments, :body, using: :gin

Materialized Views

Pre-populate the results of the view by running the partial query the view provides and storing it in a temporary table

Query against result set Those results are queried against when you query the view, rather than performing the full query for each search

Full Text Search ~400 ms ILIKE ~40 ms Materialized ~ 5.394 ms

REFRESH MATERIALIZED VIEW searches Materialized views need to be updated whenever the underlying data changes

after_commit :refresh def refresh

class.connection.execute( "REFRESH MATERIALIZED VIEW #{table_name}" ) end This can be hooked into ActiveRecord callbacks

What about pre-built solutions? There are a lot of good options out there.

ElasticSearch via Tire or elasticsearch-rails or Chewy require 'elasticsearch/model' class Article < ActiveRecord::Base include Elasticsearch::Model include Elasticsearch::Model::Callbacks end

Solr via Sunspot class Post < ActiveRecord::Base searchable do text :title, :body text :comments do comments.map { |comment| comment.body } end boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end end end

text :title, :body text :comments do comments.map {|comment| comment.body} end

boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at

string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end

sphinx with ThinkingSphinx ThinkingSphinx::Index.define :article, :with => :active_record do indexes subject, :sortable => true indexes content indexes author.name, :as => :author, :sortable => true has author_id, created_at, updated_at end

These services excel at faceted search

These services excel at faceted search More di f ficult with full text search It can be done, but you’d need to work a bit harder

Run on your

development machine .

Run on your

production machine .

Needs to be faked in tests

Some of these have lots of

cru f t in models.

Remove a data

concern from your database

Arcane syntax

:( they make me sadface

By combining

materialized views

full text search

Rails magic ✨

we have a pretty cool search feature

that doesn't require any new

dependencies

:)

Thank you

Bibliography • http://shisaa.jp/postset/postgresql-full-text-search-part-1.html

• http://blog.lostpropertyhq.com/postgres-full-text-search-is-good- enough/

• http://www.postgresql.org/docs/9.3/static/textsearch.html

• http://linuxgazette.net/164/sephton.html

• http://www.postgresql.org/docs/current/static/sql-createview.html

• http://www.postgresql.org/docs/current/static/indexes-examine.html

• http://www.postgresql.org/docs/current/static/textsearch-indexes.html

• http://en.wikipedia.org/wiki/Gin_and_tonic

http://calebthompson.io/talks/search.html

Come say hi That’s why I do this