Searching for Text in Amazon CloudSearch
You can search both text and literal fields for a text string:
- Text and text-array fields are always searchable. You can search for individual terms as well as phrases. Searches within text and text-array fields are not case-sensitive.
- Literal and literal-array fields can only be searched if they are search enabled in the domain's indexing options. You can search for an exact match of your search string. Searches in literal fields are case-sensitive.
If you use the simple query parser or do not specify a field when searching with the structured query parser, by default all text and text-array fields are searched. Literal fields are not searched by default. You can specify which fields you want to search with the q.options parameter.
You can search the unique document ID field like any text field. To reference the document ID field in a search request, you use the field name _id. Document IDs are always returned in the search results.
Searching for Individual Terms in Amazon CloudSearch
When you search text and text-array fields for individual terms, Amazon CloudSearch finds all documents that contain the search terms anywhere within the specified field, in any order. For example, in the sample movie data, the title field is configured as a text field. If you search the title field for star, you will find all of the movies that contain star anywhere in the title field, such as star, star wars, and a star is born. This differs from searching literal fields, where the field value must be identical to the search string to be considered a match.
The simple query parser provides an easy way to search text and text-array fields for one or more terms. The simple query parser is used by default unless you use the q.parser parameter to specify a different query parser.
For example, to search for katniss, specify katniss in the query string. By default, Amazon CloudSearch includes all return enabled fields in the search results. Use the return parameter to specify which fields you want returned.
https://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search?q=katniss&return=title
By default, the response is returned in JSON:
{ "status": { "rid": "rd+5+r0oMAo6swY=", "time-ms": 9 }, "hits": { "found": 3, "start": 0, "hit": [ { "id": "tt1951265", "fields": { "title": "The Hunger Games: Mockingjay - Part 1" } }, { "id": "tt1951264", "fields": { "title": "The Hunger Games: Catching Fire" } }, { "id": "tt1392170", "fields": { "title": "The Hunger Games" } } ] } }
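As a rough sketch, the request and response above could be handled in Python along the following lines. The endpoint is the same placeholder used in the URL above, and the helper names build_search_url and extract_titles are illustrative only; they are not part of any AWS SDK.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint copied from the example URL; substitute your own domain's.
ENDPOINT = "https://search-domainname-domainid.us-east-1.cloudsearch.amazonaws.com"
API_VERSION = "2013-01-01"

# The JSON response shown above, reproduced so the parsing step can run offline.
SAMPLE_RESPONSE = """
{ "status": { "rid": "rd+5+r0oMAo6swY=", "time-ms": 9 },
  "hits": { "found": 3, "start": 0, "hit": [
    { "id": "tt1951265", "fields": { "title": "The Hunger Games: Mockingjay - Part 1" } },
    { "id": "tt1951264", "fields": { "title": "The Hunger Games: Catching Fire" } },
    { "id": "tt1392170", "fields": { "title": "The Hunger Games" } } ] } }
"""


def build_search_url(query, return_fields=None, endpoint=ENDPOINT):
    """Build a search URL for the default (simple) query parser."""
    params = {"q": query}
    if return_fields:
        # Multiple return fields are passed as a comma-separated list.
        params["return"] = ",".join(return_fields)
    return f"{endpoint}/{API_VERSION}/search?{urlencode(params)}"


def extract_titles(response_text):
    """Return the title field of every hit in a JSON search response."""
    response = json.loads(response_text)
    return [hit["fields"]["title"] for hit in response["hits"]["hit"]]


url = build_search_url("katniss", ["title"])
titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

Sending the request itself (for example with urllib.request.urlopen) is left out so that the sketch runs without network access or a real domain.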
To specify multiple terms, separate the terms with a space. For example: star wars. When you specify multiple search terms, by default documents must contain all of the terms to be considered a match. The terms can occur anywhere within the text field, in any order.
By default, all text and text-array fields are searched when you use the simple query parser. You can specify which fields you want to search by specifying the q.options parameter. For example, this query constrains the search to the title and description fields and boosts the importance of matches in the title field over matches in the description field.
q=star wars&q.options={fields: ['title^5','description']}
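The following is a minimal sketch of how the q.options value above might be generated and URL-encoded in Python. It assumes the service accepts the options as a strict JSON object with double-quoted strings, which differs slightly from the shorthand shown above; build_field_options is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode


def build_field_options(weights):
    """Turn {field: boost or None} into a q.options JSON string.

    A boost is appended to the field name with ^, as in title^5.
    """
    fields = [
        name if boost is None else f"{name}^{boost}"
        for name, boost in weights.items()
    ]
    # Compact separators keep the encoded query string short.
    return json.dumps({"fields": fields}, separators=(",", ":"))


options = build_field_options({"title": 5, "description": None})
query_string = urlencode({"q": "star wars", "q.options": options})

print(options)
print(query_string)
```

Building the value with json.dumps rather than string concatenation avoids quoting mistakes when field names come from configuration.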
When you use the simple query parser, you can use the following prefixes to designate individual terms as required, optional, or to be excluded from the search results:
- + (required): matching documents must contain the term. This is the default; separating terms with a space is equivalent to preceding them with the + prefix.
- - (excluded): exclude documents that contain the term from the search results. The - operator only applies to individual terms. For example, to exclude documents that contain the term star in the default search field, specify -star. Searching for search?q=-star wars retrieves all documents that do not contain the term star, but do contain the term wars.
- | (optional): include documents that contain the term in the search results, even if they don't contain the other terms. The | operator only applies to individual terms. For example, to include documents that contain either of two terms, specify term1 |term2. Searching for search?q=star wars |trek includes documents that contain both star and wars, or the term trek.
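The prefixes described above can be assembled programmatically. This is a small illustrative helper (build_simple_query is a made-up name) that rejects multi-word terms, since the prefixes apply only to individual terms.

```python
def _check_term(term):
    """Prefixes apply to single terms only, so reject empty or multi-word input."""
    if not term or any(ch.isspace() for ch in term):
        raise ValueError(f"not a single term: {term!r}")
    return term


def build_simple_query(required=(), excluded=(), optional=()):
    """Compose a simple-parser query string from single terms.

    Required terms are written bare, which is equivalent to the + prefix.
    """
    parts = [f"-{_check_term(t)}" for t in excluded]
    parts += [_check_term(t) for t in required]
    parts += [f"|{_check_term(t)}" for t in optional]
    return " ".join(parts)


print(build_simple_query(required=["wars"], excluded=["star"]))
print(build_simple_query(required=["star", "wars"], optional=["trek"]))
```

The resulting string still needs to be URL-encoded before it is placed in the q parameter.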
These prefixes only apply to individual terms in a simple query. To construct compound queries, you need to use the structured query parser, rather than the simple query parser. For example, to search for the terms star and wars using the structured query parser you would specify:
(and 'star' 'wars')
Note that this query matches documents that contain each of the terms in any of the fields being searched. The terms do not have to be in the same field to be considered a match. If, however, you specify (and 'star wars' 'luke'), star and wars must occur within the same field, and luke can occur in any of the fields.
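A minimal sketch of generating the and expressions above from Python follows. The helper names are hypothetical, and the escaping rule (a backslash before single quotes and backslashes inside a string) is an assumption that should be checked against the structured query syntax reference.

```python
def quote_term(value):
    """Wrap a search string in single quotes for a structured query.

    Assumption: embedded backslashes and single quotes are escaped
    with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def and_query(*search_strings):
    """Build (and 'a' 'b' ...) from one or more search strings."""
    if not search_strings:
        raise ValueError("and_query needs at least one search string")
    return "(and " + " ".join(quote_term(s) for s in search_strings) + ")"


print(and_query("star", "wars"))       # terms may match in different fields
print(and_query("star wars", "luke"))  # star and wars must share a field
```

Remember to pass q.parser=structured alongside the generated expression, because the simple query parser is the default.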
If you don't specify any fields when you use the structured query parser, all text and text-array fields are searched by default, just like with the simple query parser. Similarly, you can use the q.options parameter to control which fields are searched and to boost the importance of selected fields. For more information, see Constructing Compound Queries.
You can also perform fuzzy searches with the simple query parser. To perform a fuzzy search, append the ~ operator and a value that indicates how much terms can differ from the user query string and still be considered a match. For example, specifying planit~1 searches for the term planit and allows matches to differ by up to one character, which means the results will include hits for planet.
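To make "differ by up to one character" concrete, the sketch below computes the Levenshtein edit distance between two terms. Treating the fuzzy value as a Levenshtein distance is an assumption made for illustration; the matching itself happens inside Amazon CloudSearch, and this code only shows why planet falls within a distance of 1 of planit.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_term(term, max_difference):
    """Format a fuzzy term for the simple query parser, e.g. planit~1."""
    return f"{term}~{max_difference}"


print(fuzzy_term("planit", 1))
print(edit_distance("planit", "planet"))
```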
Searching for Phrases in Amazon CloudSearch
When you search for a phrase, Amazon CloudSearch finds all documents that contain the complete phrase in the order specified. You can also perform sloppy phrase searches where the terms appear within the specified distance of one another.
To match a complete phrase rather than the individual terms in the phrase when you search with the simple query parser, enclose the phrase in double quotes. For example, the following query searches for the phrase with love.
q="with love"
To perform a sloppy phrase search with the simple query parser, append the ~ operator and a distance value. The distance value specifies the maximum number of words that can separate the words in the phrase. For example, the following query searches for the terms with love within three words of one another.
q="with love"~3
In a compound query, you use the phrase operator to specify the phrase you want to match; for example:
(phrase field=title 'star wars')
To perform a sloppy phrase search in a compound query, you use the near operator. The near operator enables you to specify the phrase you are looking for and how far apart the terms can be within a field and still be considered a match. For example, the following query matches documents that have the terms star and wars no more than three words apart in the title field.
(near field=title distance=3 'star wars')
For more information, see Constructing Compound Queries.
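The phrase forms described in this section can be summarized with a few small formatting helpers. The function names are illustrative, and for brevity the sketch assumes the phrase contains no quote characters that would need escaping.

```python
def simple_phrase(phrase, distance=None):
    """Phrase query for the simple parser; a distance makes it sloppy."""
    quoted = f'"{phrase}"'
    return quoted if distance is None else f"{quoted}~{distance}"


def structured_phrase(field, phrase):
    """Exact phrase match in one field (structured parser)."""
    return f"(phrase field={field} '{phrase}')"


def structured_near(field, phrase, distance):
    """Sloppy phrase match in one field (structured parser)."""
    return f"(near field={field} distance={distance} '{phrase}')"


print(simple_phrase("with love"))
print(simple_phrase("with love", 3))
print(structured_phrase("title", "star wars"))
print(structured_near("title", "star wars", 3))
```

The two structured forms require q.parser=structured in the request; the simple forms work with the default parser.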
Searching for Literal Strings in Amazon CloudSearch
When you search a literal field for a string, Amazon CloudSearch returns only those documents that contain an exact match for the complete search string in the specified field, including case. For example, if the title field is configured as a literal field and you search for Star, the value of the title field must be Star to be considered a match; star, star wars, and a star is born will not be included in the search results. This differs from text fields, where searches are not case-sensitive and the specified search terms can appear anywhere within the field in any order.
To search a literal field, prefix the search string with the name of the literal field you want to search, followed by a colon. The search string must be enclosed in single quotes. For example, the following query searches for the literal string Sci-Fi.
genres:'Sci-Fi'
This example searches the genres field of each document and matches all documents whose genres field contains the value Sci-Fi. To be a match, the field value must be an exact match for the search string, including case. For example, documents that contain the value Sci-Fi in the genres field will not be included in the search results if you search for sci-fi or young adult sci-fi.
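The difference between literal and text matching can be modeled locally with a deliberately simplified sketch. It is not how Amazon CloudSearch is implemented: real text fields are also tokenized and stemmed according to the analysis scheme, which this toy model ignores, and both function names are invented for the illustration.

```python
def literal_matches(field_value, search_string):
    """Literal fields: the whole value must equal the search string, case included."""
    return field_value == search_string


def text_matches(field_value, search_string):
    """Text fields (toy model): every search term appears somewhere in the
    field, in any order, ignoring case."""
    field_terms = set(field_value.lower().split())
    return all(term in field_terms for term in search_string.lower().split())


print(literal_matches("Sci-Fi", "Sci-Fi"))       # exact match
print(literal_matches("Sci-Fi", "sci-fi"))       # case differs
print(text_matches("A Star Is Born", "star"))    # term found anywhere, any case
print(literal_matches("a star is born", "Star")) # not the complete value
```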
In a compound query, you use the term operator syntax to search literal fields. For example, (term field=genres 'Sci-Fi'). For more information, see Constructing Compound Queries.
You can use literal fields in conjunction with faceting to enable users to drill down into the results according to the faceted attributes. For more information about faceting, see Getting and Using Facet Information in Amazon CloudSearch.
Searching for Prefixes in Amazon CloudSearch
You can search text, text-array, literal, and literal-array fields for a prefix rather than for a complete term. This matches results that contain the prefix followed by zero or more characters. You must specify at least one character as the prefix. (To match all documents, use the matchall operator in a structured query.) In general, you should use a prefix that contains at least two characters to avoid matching an excessive number of documents.
When you search a text or text-array field, terms that match the prefix can occur anywhere within the contents of the field. When you search literal fields, the entire search string, up to and including the prefix characters, must match exactly.
- Simple query parser: use the * (asterisk) wildcard operator to search for a prefix, for example, pre*.
- Structured query parser: use the prefix operator to search for a prefix, for example, prefix 'pre'.
For example, the following query searches for the prefix oce in the title field and returns the title of each hit:
q=oce*&q.options={fields:['title']}&return=title
If you perform this search against the sample movie data, it returns Ocean's Eleven and Ocean's Twelve:
{ "status": { "rid": "hIbIxb8oRAo6swY=", "time-ms": 2 }, "hits": { "found": 2, "start": 0, "hit": [ { "id": "tt0240772", "fields": { "title": "Ocean's Eleven" } }, { "id": "tt0349903", "fields": { "title": "Ocean's Twelve" } } ] } }
In a compound query, you use the prefix operator to search for prefixes. For example, to search the title field for the prefix oce, you specify:
q.parser=structured&q=(prefix field%3Dtitle 'oce')
Note the URL encoding. For more information, see Constructing Compound Queries.
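Rather than encoding characters by hand, the query string can be produced with Python's standard library, as sketched below. Note that urlencode escapes more characters than the example above (parentheses, spaces, and quotes as well as the equals sign); the encoded forms are equivalent once decoded, which the round trip through parse_qs demonstrates.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {
    "q.parser": "structured",
    "q": "(prefix field=title 'oce')",
}

# quote_via=quote encodes spaces as %20 instead of +.
query_string = urlencode(params, quote_via=quote)
print(query_string)

# Decoding recovers the original expression unchanged.
decoded = parse_qs(query_string)
print(decoded["q"][0])
```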
Note
When performing wildcard searches on text fields, keep in mind that Amazon CloudSearch tokenizes the text fields during indexing and performs stemming according to the analysis scheme configured for the field. Normally, Amazon CloudSearch performs the same text processing on the search query. However, when you search for a prefix with the wildcard operator (*) or prefix operator, no stemming is performed on the prefix. This means that a search for a prefix that ends in s won't match the singular version of the term. This can happen for any term that ends in s, not just plurals. For example, if you search the actor field in the sample movie data for Anders, there are three matching movies. If you search for Ander*, you get those movies as well as several others. However, if you search for Anders*, there are no matches. This is because the term is stored in the index as ander; anders does not appear in the index. For more information about how Amazon CloudSearch processes text and how it can affect searches, see Text Processing in Amazon CloudSearch.
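The behavior in this note can be reproduced with a toy model. Everything here is a stand-in: toy_stem is a crude rule that strips one trailing s, not the stemmer used by any analysis scheme, and the two indexed names are made up for the illustration.

```python
def toy_stem(term):
    """Crude stand-in for stemming: lowercase and strip one trailing 's'."""
    term = term.lower()
    return term[:-1] if term.endswith("s") else term


# Hypothetical index: terms are stemmed when documents are indexed.
INDEX_TERMS = {toy_stem(name) for name in ["Anders", "Anderton"]}


def term_search(term):
    """Ordinary term search: the query term is stemmed like the indexed text."""
    return toy_stem(term) in INDEX_TERMS


def prefix_search(prefix):
    """Prefix search: the prefix is lowercased but NOT stemmed."""
    return sorted(t for t in INDEX_TERMS if t.startswith(prefix.lower()))


print(term_search("Anders"))     # stemmed to ander, which is in the index
print(prefix_search("Ander"))    # matches ander and anderton
print(prefix_search("Anders"))   # no indexed term starts with anders
```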