PostgreSQL 9.0beta2 Documentation, Appendix F. Additional Supplied Modules
unaccent removes accents (diacritic signs) from a lexeme. It is a filtering dictionary, which means that its output is always passed to the next dictionary (if any), unlike the normal behavior of dictionaries. Currently, it supports the most important accents from European languages.
Limitation: The current implementation of the unaccent dictionary cannot be used as a normalizing dictionary for the thesaurus dictionary.
An unaccent dictionary accepts the following options:
RULES is the base name of the file containing the list of translation rules. This file must be stored in $SHAREDIR/tsearch_data/ (where $SHAREDIR means the PostgreSQL installation's shared-data directory). Its name must end in .rules (which is not to be included in the RULES parameter).
The rules file has the following format:
Each line represents a pair, character_with_accent followed by character_without_accent. The first is translated into the second:
À A
Á A
 A
à A
Ä A
Å A
Æ A
Look at unaccent.rules, which is installed in $SHAREDIR/tsearch_data/, for an example.
Running the installation script creates a text search template unaccent and a dictionary unaccent based on it, with default parameters. You can alter the parameters, for example
=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
or create new dictionaries based on the template.
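As a minimal sketch of the second option, the statements below create a separate dictionary from the unaccent template instead of altering the default one. The names my_unaccent and my_rules are hypothetical, and the sketch assumes that a file my_rules.rules already exists in $SHAREDIR/tsearch_data/.

```sql
-- Hypothetical names: my_unaccent (dictionary), my_rules (rules file base name).
-- Assumes $SHAREDIR/tsearch_data/my_rules.rules exists and follows the
-- "character_with_accent character_without_accent" format described above.
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);

-- Check the new dictionary; the result depends on the contents of the rules file.
SELECT ts_lexize('my_unaccent', 'Hôtel');
```

Keeping the default unaccent dictionary untouched means that existing configurations that reference it continue to behave as before.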
To test the dictionary, you can try
=# select ts_lexize('unaccent','Hôtel');
ts_lexize
-----------
{Hotel}
(1 row)
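Filtering dictionaries are useful for the ts_headline function to work correctly: with unaccent placed before the stemmer in a configuration, a query written without accents still matches, and is highlighted in, the accented original text.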
=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
=# ALTER TEXT SEARCH CONFIGURATION fr
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, french_stem;
=# select to_tsvector('fr','Hôtels de la Mer');
to_tsvector
-------------------
'hotel':1 'mer':4
(1 row)
=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
?column?
----------
t
(1 row)
=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
ts_headline
------------------------
<b>Hôtel</b> de la Mer
(1 row)
The unaccent function removes accents (diacritic signs) from the argument string. Basically, it is a wrapper around the unaccent dictionary.
unaccent([dictionary, ] string) returns text
SELECT unaccent('unaccent','Hôtel');
SELECT unaccent('Hôtel');
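Because the function returns plain text, it can also be used for accent-insensitive comparisons. The query below is a small sketch of that idea; the inline VALUES list stands in for a real table and its contents are made up for illustration. It assumes the default unaccent dictionary and rules file are installed.

```sql
-- Illustrative data only: the VALUES list stands in for a real table.
-- With the default rules, rows whose name differs from 'Hôtel' only by
-- accents should be selected.
SELECT name
FROM (VALUES ('Hôtel'::text), ('Hotel'::text), ('Motel'::text)) AS t(name)
WHERE unaccent(name) = unaccent('Hôtel');
```

Applying unaccent to both sides of the comparison keeps the match symmetric, so it does not matter whether the stored value or the search term carries the accent.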