1 WildSpeed - fast wildcard search for LIKE operator
3 Wildspeed extension provides GIN index support for wildcard search
6 Online version of this document is available
7 http://www.sai.msu.su/~megera/wiki/wildspeed
11 * Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
12 * Teodor Sigaev <teodor@sigaev.ru>, Moscow, Moscow University,Russia
16 Stable version, included into PostgreSQL distribution, released under
17 BSD license. Development version, available from this site, released
18 under the GNU General Public License, version 2 (June 1991)
22 Stable version of wildspeed is available from
23 http://www.sigaev.ru/cvsweb/cvsweb.cgi/wildspeed/
28 % tar xzvf wildspeed.tar.gz
32 % psql DB < wildspeed.sql
34 Wildspeed provides opclass (wildcard_ops) and uses partial match
35 feature of GIN, available since 8.4. Also, it supports full index scan.
37 The size of index can be very big, since it contains entries for all
38 permutations of the original word, see [1] for details. For example,
39 word hello will be indexed as well as its all permutations:
40 =# select permute('hello');
42 --------------------------------------
43 {hello$,ello$h,llo$he,lo$hel,o$hell}
45 Notice, symbol '$' is used only for visualization, in actual
46 implementation null-symbol '\0' is used.
48 Search query rewritten as prefix search:
53 For example, search for 'hel*o' will be rewritten as 'o$hel'.
55 Special function permute(TEXT), which returns all permutations of
56 argument, provided for test purposes.
58 Performance of wildspeed depends on search pattern. Basically,
59 wildspeed is less effective than btree index with text_pattern_ops for
60 prefix search (the difference is greatly reduced for long prefixes) and
61 much faster for wildcard search.
63 Wildspeed by default uses optimization (skip short patterns if there
64 are long one), which can be turned off in Makefile by removing define
65 -DOPTIMIZE_WILDCARD_QUERY.
69 1. http://www.cs.wright.edu/~tkprasad/courses/cs499/L05TolerantIR.ppt, see also,
70 http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html
74 Table words contains 747358 records, w1 and w2 columns contains the
75 same data in order to test performance of Btree (w1) and GIN (w2)
78 Column | Type | Modifiers
79 --------+------+-----------
83 words=# create index bt_idx on words using btree (w1 text_pattern_ops);
86 words=# create index gin_idx on words using gin (w2 wildcard_ops);
93 words=# select pg_relation_size('words');
98 words=# select pg_relation_size('gin_idx');
104 words=# select pg_relation_size('bt_idx');
111 words=# select count(*) from words where w1 like 'a%';
118 words=# select count(*) from words where w2 like 'a%';
127 words=# select count(*) from words where w1 like '%asd%';
134 words=# select count(*) from words where w2 like '%asd%';
143 words=# set enable_seqscan to off;
144 words=# explain analyze select count(*) from words where w2 like '%';
147 --------------------------------------------------------------------------------------------------------------
148 -----------------------------
149 Aggregate (cost=226274.98..226274.99 rows=1 width=0) (actual time=2218.709..2218.709 rows=1 loops=1)
150 -> Bitmap Heap Scan on words (cost=209785.73..224406.77 rows=747283 width=0) (actual time=1510.516..1913.
151 430 rows=747358 loops=1)
152 Filter: (w2 ~~ '%'::text)
153 -> Bitmap Index Scan on gin_idx (cost=0.00..209598.91 rows=747283 width=0) (actual time=1509.358..1
154 509.358 rows=747358 loops=1)
155 Index Cond: (w2 ~~ '%'::text)
156 Total runtime: 2218.747 ms