float4 smlar(anyarray, anyarray)
    - computes the similarity of two arrays. The arrays should be of the same type.

float4 smlar(anyarray, anyarray, bool useIntersect)
    - computes the similarity of two arrays of composite types. The composite type looks like:
        CREATE TYPE type_name AS (element_name anytype, weight_name FLOAT4);
      The useIntersect option makes the function use only the intersecting elements in the denominator.
      See the examples in sql/composite_int4.sql and sql/composite_text.sql.

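A minimal sketch of the composite form. The type name weighted_int and its field names are hypothetical, chosen only for illustration; the bundled sql/composite_*.sql files remain the authoritative examples.

```sql
-- Hypothetical composite type: an int4 element plus a float4 weight.
CREATE TYPE weighted_int AS (id int4, w float4);

-- Compare two weighted arrays; the third argument is useIntersect.
SELECT smlar(
    '{"(1,0.5)","(4,1.0)","(6,0.2)"}'::weighted_int[],
    '{"(5,0.3)","(4,1.0)","(6,0.7)"}'::weighted_int[],
    true
);
```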
float4 smlar(anyarray a, anyarray b, text formula)
    - computes the similarity of two arrays by the given formula. The arrays should
      be of the same type.
      Predefined variables in the formula:
        N.i - number of common elements in both arrays (intersection)
        N.a - number of unique elements in the first array
        N.b - number of unique elements in the second array
      Example:
        smlar('{1,4,6}'::int[], '{5,4,6}')
        smlar('{1,4,6}'::int[], '{5,4,6}', 'N.i / sqrt(N.a * N.b)')
      These two calls are equivalent.

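Because the formula is passed as text, other set-similarity measures can be written with the same three variables. As a sketch, assuming the formula accepts ordinary arithmetic as in the example above, a Jaccard-style coefficient could look like this:

```sql
-- For these arrays N.i = 2, N.a = 3 and N.b = 3,
-- so the expression works out to 2 / (3 + 3 - 2).
SELECT smlar('{1,4,6}'::int[], '{5,4,6}'::int[], 'N.i / (N.a + N.b - N.i)');
```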
    - the % operation returns true if the similarity of the two arrays is greater than the limit

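A sketch of how the % operation might be used to filter rows. The table docs and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table holding one int4 array per document.
CREATE TABLE docs (id serial PRIMARY KEY, tags int4[]);
INSERT INTO docs (tags) VALUES ('{1,4,6}'), ('{5,4,6}'), ('{7,8,9}');

-- Keep only the rows that % considers similar to the query array,
-- and report the similarity value next to each of them.
SELECT id, tags, smlar(tags, '{1,4,6}'::int4[]) AS similarity
FROM docs
WHERE tags % '{1,4,6}'::int4[]
ORDER BY similarity DESC;
```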
float4 show_smlar_limit() - deprecated
    - shows the limit for the % operation

float4 set_smlar_limit(float4) - deprecated
    - sets the limit for the % operation

Instead of show_smlar_limit/set_smlar_limit, use the GUC variable
smlar.threshold (see below).

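For a single session, the GUC variable can be changed and inspected with the standard SET and SHOW commands. A minimal sketch, with 0.6 as an arbitrary example value:

```sql
SET smlar.threshold = 0.6;   -- takes the place of set_smlar_limit(0.6)
SHOW smlar.threshold;        -- takes the place of show_smlar_limit()
```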
text[] tsvector2textarray(tsvector)
    - transforms a tsvector into a text array

anyarray array_unique(anyarray)
    - sorts the array and removes duplicate elements

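A short sketch of the two helper functions. The input values are made up, and the 'simple' text search configuration is used only to keep the example independent of language-specific dictionaries.

```sql
-- Turn the lexemes of a tsvector into a text array.
SELECT tsvector2textarray(to_tsvector('simple', 'the cat sat on the mat'));

-- Sort an array and drop its repeated elements.
SELECT array_unique('{3,1,3,2,1}'::int4[]);
```

The resulting text array can then be passed to smlar() or to the % operation like any other text[] value.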
float4 inarray(anyarray, anyelement)
    - returns zero if the second argument is not present in the first one,
      and 1.0 otherwise

float4 inarray(anyarray, anyelement, float4, float4)
    - returns the fourth argument if the second argument is not present in
      the first one, and the third argument otherwise

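The calls below illustrate both forms of inarray with made-up values. The explicit ::float4 casts are a precaution rather than something the function is known to require.

```sql
-- Two-argument form: reports membership as 1.0 or zero.
SELECT inarray('{1,4,6}'::int4[], 4);   -- 4 is present in the array
SELECT inarray('{1,4,6}'::int4[], 7);   -- 7 is absent from the array

-- Four-argument form: the third argument is returned when the element
-- is present, the fourth when it is absent.
SELECT inarray('{1,4,6}'::int4[], 4, 2.0::float4, 0.5::float4);
SELECT inarray('{1,4,6}'::int4[], 7, 2.0::float4, 0.5::float4);
```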
GUC configuration variables:

smlar.threshold FLOAT
    Arrays with similarity lower than the threshold are not considered similar
    by the % operation.

smlar.persistent_cache BOOL
    The cache of global statistics is stored in transaction-independent memory.

smlar.type STRING
    Type of similarity formula: cosine (default), tfidf, overlap

smlar.stattable STRING
    Name of the table that stores set-wide statistics. The table should be
    defined as
        CREATE TABLE table_name (
            value data_type UNIQUE,
            ndoc int4 (or bigint) NOT NULL CHECK (ndoc>0)
        );
    The row with a null value holds the total number of documents.
    See the examples in the sql/*g.sql files.
    Note: used only for smlar.type = 'tfidf'

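One possible way to build and register such a statistics table is sketched below. The tables docs and docs_stat are hypothetical, and depending on how the module registers each variable, some SET commands may require elevated privileges.

```sql
-- Assumes a hypothetical table docs(id int4, tags int4[]) already exists.
CREATE TABLE docs_stat (
    value int4 UNIQUE,
    ndoc  int4 NOT NULL CHECK (ndoc > 0)
);

-- One row per distinct element: the number of documents containing it.
INSERT INTO docs_stat (value, ndoc)
SELECT t.tag, count(DISTINCT d.id)
FROM docs AS d, unnest(d.tags) AS t(tag)
WHERE t.tag IS NOT NULL
GROUP BY t.tag;

-- The row with a NULL value carries the total number of documents.
INSERT INTO docs_stat (value, ndoc)
SELECT NULL, count(*) FROM docs;

-- Point the module at the table and switch to the tfidf formula.
SET smlar.type = 'tfidf';
SET smlar.stattable = 'docs_stat';
```

The statistics are a snapshot: when docs changes substantially, docs_stat has to be rebuilt or updated by whatever means fits the application.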
smlar.tf_method STRING
    Calculation method for term frequency. Values:
        "n"     - simple counting of entries (default)
        "const" - TF is equal to 1
    Note: used only for smlar.type = 'tfidf'

smlar.idf_plus_one BOOL
    If false (default), idf is calculated as log(d/df);
    if true, as log(1+d/df).
    Note: used only for smlar.type = 'tfidf'

The module provides several GUC variables, such as smlar.threshold. It is highly
recommended to add the following to postgresql.conf:
    custom_variable_classes = 'smlar'  # list of custom variable class names
    smlar.threshold = 0.6              # or any other value > 0 and < 1
together with any other smlar.* variables you need.

GiST/GIN support for % and && operations for:
  Array Type   |   GIN operator class | GiST operator class
---------------+----------------------+----------------------
 bit[]         | _bit_sml_ops         |
 bytea[]       | _bytea_sml_ops       | _bytea_sml_ops
 char[]        | _char_sml_ops        | _char_sml_ops
 cidr[]        | _cidr_sml_ops        | _cidr_sml_ops
 date[]        | _date_sml_ops        | _date_sml_ops
 float4[]      | _float4_sml_ops      | _float4_sml_ops
 float8[]      | _float8_sml_ops      | _float8_sml_ops
 inet[]        | _inet_sml_ops        | _inet_sml_ops
 int2[]        | _int2_sml_ops        | _int2_sml_ops
 int4[]        | _int4_sml_ops        | _int4_sml_ops
 int8[]        | _int8_sml_ops        | _int8_sml_ops
 interval[]    | _interval_sml_ops    | _interval_sml_ops
 macaddr[]     | _macaddr_sml_ops     | _macaddr_sml_ops
 money[]       | _money_sml_ops       |
 numeric[]     | _numeric_sml_ops     | _numeric_sml_ops
 oid[]         | _oid_sml_ops         | _oid_sml_ops
 text[]        | _text_sml_ops        | _text_sml_ops
 time[]        | _time_sml_ops        | _time_sml_ops
 timestamp[]   | _timestamp_sml_ops   | _timestamp_sml_ops
 timestamptz[] | _timestamptz_sml_ops | _timestamptz_sml_ops
 timetz[]      | _timetz_sml_ops      | _timetz_sml_ops
 varbit[]      | _varbit_sml_ops      |
 varchar[]     | _varchar_sml_ops     | _varchar_sml_ops
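
An operator class from the table is named explicitly when the index is created. A minimal sketch for an int4[] column follows; the table docs and the index names are hypothetical, and in practice one of the two index types is usually enough.

```sql
-- Assumes a hypothetical table docs(id int4, tags int4[]) already exists.
CREATE INDEX docs_tags_gin  ON docs USING gin  (tags _int4_sml_ops);
CREATE INDEX docs_tags_gist ON docs USING gist (tags _int4_sml_ops);

-- Queries using % on the indexed column can then be served by the index.
SELECT id FROM docs WHERE tags % '{1,4,6}'::int4[];
```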