Resource Integration Service
IMPORTANT NOTE: This file
has not been updated yet.
What follows is an internal RFC document that describes the
search engine syntax.
SEARCH ENGINE SYNTAX AND TIPS
We believe that the search engine behaviour should reflect today's
typical user web experience, i.e. it should resemble Google rather
than classical librarian search screens.
Here are extensions to Google's syntax that we implemented to enable
complex structured search.
1. The typical Google behaviour is preserved, as described at
[http://www.google.com/help/refinesearch.html]. This means that:
"ellis muon" ... matches all records that contain both the word
'ellis' and the word 'muon' anywhere.
By default, then, a 'logical AND' kind of search is done,
not a phrase search as in ALEPH.
"ellis and muon" ... ditto, syntactic sugar
"+ellis +muon" ... ditto, syntactic sugar
"ellis not muon" ... matches all records that contain the word
'ellis' but that do not contain the word 'muon'
"ellis -muon" ... ditto, syntactic sugar
"ellis or muon" ... matches all records that contain at least one
of the words
"ellis |muon" ... ditto, syntactic sugar
2. Logical operations are automatically chained from left to right
(parentheses are not supported at the moment):
"ellis muon or kaon" ... means "(ellis and muon) or kaon". [Hmm,
Google seems to have a preference for the
"OR" branch, so we may want to change this.]
"muon or kaon ellis" ... means "(muon or kaon) and ellis"
3. Truncation is done via the '*' character (truncates at any word
beginning or ending) and via the '?' character (truncates at any
phrase beginning or ending, i.e. including whitespace, commas etc).
"muon*" ... matches records that contain words like 'muon', 'muons', etc.
"muon?" ... matches records that start with the letters 'muon', e.g.
'Muon (g-2) in model...', 'Muon and Muon Neutrino
Fluxes...', 'Muon Anomalous Magnetic Moment...', etc.
(This is good for ACC style of searches.)
The difference is important: e.g. (1) if you search for "muon
anomalous*" within title, you will do a WRD style of search for the
word 'muon' and for words that start with the letters 'anomalous',
and then combine the two with a logical AND. (2) If you search for "muon
anomalous?" within title, you will do an ACC style of search, so that
only records that start with the letters 'muon anomalous' are retrieved.
The wildcards work in both the prefix and the postfix mode:
"*ism" ... matches records that contain words that end with the letters
'ism'
"*o*" ... matches records that contain words that contain the
letter 'o' anywhere in the word (kinda slow!)
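The difference between word truncation ('*') and phrase truncation ('?') can be sketched as follows. This is an illustration only: the function names are hypothetical, matching is assumed to be case-insensitive, and words are assumed to be runs of alphanumeric characters.

```python
import re


def word_matches(pattern, word):
    """Match one pattern word, with optional leading/trailing '*', against a word."""
    core = pattern.strip("*")
    if pattern.startswith("*") and pattern.endswith("*"):
        return core in word
    if pattern.startswith("*"):
        return word.endswith(core)
    if pattern.endswith("*"):
        return word.startswith(core)
    return word == core


def field_matches(pattern, value):
    """WRD-style match for '*' patterns, ACC-style match for '?' patterns."""
    pattern, value = pattern.lower(), value.lower()
    if pattern.startswith("?") or pattern.endswith("?"):
        # phrase truncation: compare against the whole field value
        core = pattern.strip("?")
        if pattern.startswith("?") and pattern.endswith("?"):
            return core in value
        if pattern.startswith("?"):
            return value.endswith(core)
        return value.startswith(core)
    # word truncation: every pattern word must match some word of the value
    words = re.findall(r"\w+", value)
    return all(any(word_matches(p, w) for w in words) for p in pattern.split())


title = "Anomalous magnetic moment of the muon"
print(field_matches("muon anomalous*", title))  # word search, order is irrelevant
print(field_matches("muon anomalous?", title))  # phrase search, anchored at the start
```

The first call succeeds because both words occur somewhere in the title; the second fails because the title does not begin with 'muon anomalous'.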
4. Searching within specific fields is supported via a Google "site:"-like
syntax. For example:
"author:ellis" ... matches records containing the word 'ellis'
anywhere within author fields (100 $a, 700 $a)
"author:ellis, j?" ... matches records that start with 'ellis, j'
inside author fields (100 $a, 700 $a)
"title:muon*" ... matches records with the words like 'muon',
'muonic', 'muons', etc anywhere within the title
The old syntax is also accepted, e.g. "wab=trigger" is silently
changed into "abstract:trigger".
Power users may use MARC-21 fields directly:
"100__a:ellis?" ... matches records that have a '100 $a' subfield
starting with the letters 'ellis'
Note that all MARC-21 fields are searchable in this manner, for
example:
"909C0e:WA9?" ... matches records that have a '909 C0 $e' subfield
starting with the letters 'WA9'
You can search several 'near' MARC-21 fields in one go:
"24*:the cms?" ... matches records that have '24x xx $x' subfields
(where 'x' stands for anything) starting with the
letters 'the cms'
"24:the cms?" ... ditto, syntactic sugar
"24:?the cms?" ... ditto, but the letters 'the cms' can be found
anywhere within these fields (not only at their
beginning)
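The field syntax of item 4 can be sketched as a small parsing step. Everything here is an assumption made for illustration: the function names are hypothetical, the legacy table contains only the one mapping mentioned above ("wab" to "abstract"), MARC tags are written as six characters with '_' for a blank indicator, and shortened tags such as "24" or "24*" are treated as plain prefixes.

```python
import re

# Only the mapping named in the text; any further entries would be guesses.
LEGACY_FIELDS = {"wab": "abstract"}

# Two or three digits, up to three indicator/subfield characters, optional '*'.
MARC_SPEC = re.compile(r"\d{2,3}[0-9A-Za-z_]{0,3}\*?")


def parse_unit(unit):
    """Split 'field:pattern' (or legacy 'field=pattern') into (field, pattern)."""
    legacy = re.fullmatch(r"(\w+)=(.*)", unit)
    if legacy and legacy.group(1) in LEGACY_FIELDS:
        return LEGACY_FIELDS[legacy.group(1)], legacy.group(2)
    field, colon, pattern = unit.partition(":")
    if not colon:
        return None, unit  # no field given: search anywhere
    return field, pattern


def is_marc_spec(field):
    """Tell MARC-21 tag specifications apart from names such as 'author'."""
    return MARC_SPEC.fullmatch(field) is not None


def marc_tag_matches(spec, tag):
    """Treat a (possibly shortened) tag specification as a prefix of a full tag."""
    return tag.startswith(spec.rstrip("*"))


print(parse_unit("wab=trigger"))
print(parse_unit("24:?the cms?"))
print(marc_tag_matches("24*", "245__a"), marc_tag_matches("24", "100__a"))
```

Treating "24" and "24*" as the same prefix is one simple way to get the "syntactic sugar" behaviour described above; the real engine may resolve tags differently.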
5. Phrase search is done when single or double quotes are used.
The search is done by searching the MARC-21 tags directly. The phrase
search is in fact a substring search, so that:
"author:'ellis, j'" ... matches authors like 'Ellis, John' or
'Ellis, J' but also 'Nellis, Jim' etc.
So the difference between "author:ellis, j?" and
"author:'ellis, j'" is that in the former case the start of the
field is fixed by the letter 'e', while in the latter case any text
may precede it.
"title:'high precision'" ... matches titles 'The high precision
measurements...', or '...rendered with
high precisions', etc.
"abstract:'f'" ... matches records that contain the letter 'f'
inside the abstract (extremely slow!). This is
because the phrase search is implemented as a
substring search, i.e. word boundaries are not
required nor checked.
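The contrast between the quoted phrase search and the '?' truncation can be shown in a few lines. This is a sketch under the assumption that matching is case-insensitive; the function names and the sample author list are hypothetical.

```python
def phrase_match(phrase, value):
    """Quoted phrase search: a plain substring test, no word boundaries."""
    return phrase.lower() in value.lower()


def anchored_match(prefix, value):
    """'?' truncation at the end: the value must start with the given letters."""
    return value.lower().startswith(prefix.lower())


authors = ["Ellis, John", "Ellis, J", "Nellis, Jim", "Ellison, K"]

# author:'ellis, j'  (substring search)
substring_hits = [a for a in authors if phrase_match("ellis, j", a)]
# author:ellis, j?   (anchored at the start of the field)
anchored_hits = [a for a in authors if anchored_match("ellis, j", a)]

print(substring_hits)
print(anchored_hits)
```

Only the substring search returns 'Nellis, Jim', because nothing fixes the start of the field in that case.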
6. All the syntax mentioned above can be combined, e.g.
"author:ellis -muon* +abstract:'additional data' +909C0:199*", which
would match records that have the word 'ellis' inside author
fields, that do not contain words like 'muon', 'muonic' etc in any
field, that contain the phrase (or the substring, to be more
precise) 'additional data' inside abstract fields, and that have a
'909 C0 $x' field (where 'x' stands for anything) starting with
'199'.
(Of course, our typical users would just type something like:
"ellis -muon* +199* 'additional data'" instead, which will give
them approximately the same number of hits and about ~10 times
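A combined query such as the one in item 6 first has to be split into units without breaking quoted phrases. The following is a rough sketch, not the actual parser: the function name split_query and the (operator, field, pattern) tuples are hypothetical, only single quotes are handled, and the word operators 'and', 'or', 'not' as well as patterns containing spaces outside quotes (e.g. "ellis, j?") are deliberately left out.

```python
import re

# A unit is a run of non-space characters in which quoted parts may contain spaces.
UNIT = re.compile(r"(?:[^\s']+|'[^']*')+")


def split_query(query):
    """Split a query into (operator, field, pattern) tuples."""
    parsed = []
    for unit in UNIT.findall(query):
        operator = "+"  # default: logical AND
        if unit[0] in "+-|" and len(unit) > 1:
            operator, unit = unit[0], unit[1:]
        field, colon, pattern = unit.partition(":")
        if not colon or unit.startswith("'"):
            # no field prefix (or a bare quoted phrase): search in any field
            field, pattern = None, unit
        parsed.append((operator, field, pattern))
    return parsed


full = split_query("author:ellis -muon* +abstract:'additional data' +909C0:199*")
short = split_query("ellis -muon* +199* 'additional data'")
for item in full:
    print(item)
```

Each tuple could then be handed to whatever per-field matching the engine uses; the short form yields the same operators and patterns with no field restriction.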
Please comment on this RFC as soon as possible.