From: | psql-mail(at)freeuk(dot)com |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Tsearch2 custom dictionaries |
Date: | 2003-08-07 15:14:18 |
Message-ID: | E19kmT4-000OdB-00@buckaroo.freeuk.net |
Lists: | pgsql-general |
> On Thu, 7 Aug 2003 psql-mail(at)freeuk(dot)com wrote:
>
> > Part1.
> >
> > I have created a dictionary called 'webwords' which checks all words
> > and curtails them to 300 chars (for now)
> >
> > after running
> > make
> > make install
> >
> > I then copied the lib_webwords.so into my $libdir
> >
> > I have run
> >
> > psql mybd < dict_webwords.sql
> >
> Once you did 'psql mybd < dict_webwords.sql' you should be able to use it :)
> Test it :
> select lexize('webwords','some_web_word');
I did test it with
select lexize('webwords','some_web_word');
lexize
-------
{some_web_word}
select lexize('webwords','some_400char_web_word');
lexize
--------
{some_shortened_web_word}
so that bit works, but then I tried
SELECT to_tsvector( 'webwords', 'my words' );
Error: No tsearch config
> Did you read http://www.sai.msu.su/~megera/oddmuse/index.cgi/Gendict
Yeah, I did read it - it's good!
Should I run:
update pg_ts_cfgmap set dict_name='{webwords}';
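The error above is consistent with to_tsvector taking a configuration name, not a dictionary name, as its first argument. Also, an UPDATE with no WHERE clause would rewrite the mapping for every configuration and token type. A narrower sketch follows. It assumes the stock tsearch2 tables pg_ts_cfg and pg_ts_cfgmap, an existing 'default' configuration, and the 'lword' token alias; the choice of configuration and alias is an assumption for illustration, not something confirmed in this thread.

```sql
-- Assumption: 'default' is an existing configuration in pg_ts_cfg.
-- Inspect the current mappings before changing anything.
SELECT ts_name, tok_alias, dict_name
FROM pg_ts_cfgmap
WHERE ts_name = 'default';

-- Route only Latin words (tok_alias 'lword') through the webwords dictionary.
UPDATE pg_ts_cfgmap
SET dict_name = '{webwords}'
WHERE ts_name = 'default'
  AND tok_alias = 'lword';

-- The first argument here is the configuration name, not the dictionary.
SELECT to_tsvector('default', 'my words');
```

Other token types (hosts, URLs, numbers and so on) keep their existing dictionaries unless their rows are updated too.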
> > Part2.
<snip>
> > As the text can be multilingual I don't think stemming is possible?
>
> You're right. I'm afraid you need UTF database, but tsearch2 isn't
> utf-8 compatible :(
My database was created as Unicode - does this mean I cannot use
tsearch?!
> > I also need to include many non-standard words in the index, such as
> > URLs and message IDs contained in the text.
> >
>
> What's a message ID? An integer? It's already recognized by the parser.
>
> try
> select * from token_type();
>
> Also, the latest version of tsearch2 (for 7.3 grab it from
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/,
> for 7.4 it is available from CVS)
> has a rather useful function - ts_debug
>
> apod=# select * from ts_debug('http://www.sai.msu.su/~megera');
>  ts_name | tok_type | description | token          | dict_name | tsvector
> ---------+----------+-------------+----------------+-----------+------------------
>  simple  | host     | Host        | www.sai.msu.su | {simple}  | 'www.sai.msu.su'
>  simple  | lword    | Latin word  | megera         | {simple}  | 'megera'
> (2 rows)
>
>
>
> > I get the feeling that building these indexes will by no means be an
> > easy task so any suggestions will be gratefully received!
> >
>
> You may write your own parser, as a last resort. Some info about the parser API:
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_in_Brief
Parser writing...scary stuff :-)
Thanks!