Description: Drupal 7 note: The development version for Drupal 7.x is working! If you don't see the development version available below, check back in about 24 hours (or check it out from HEAD in CVS). Check issue #674064: Drupal 7 version of Porter Stemmer to monitor progress of the Drupal 7 version.
This module implements the Porter stemming algorithm to improve English-language searching with the Drupal built-in Search module.
The process of stemming reduces each word in the search index to its basic root or stem (e.g. 'blogging' to 'blog') so that variations on a word ('blogs', 'blogger', 'blogging', 'blog') are considered equivalent when searching. This generally results in more relevant search results.
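To make the idea concrete, here is a minimal sketch of how stemming makes word variations equivalent. It uses a deliberately tiny toy suffix stripper, not the Porter algorithm and not this module's code; the `toy_stem` function and its `SUFFIXES` list are hypothetical names invented for illustration only.

```python
# Toy suffix stripper for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "er", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word):
    """Strip one known suffix, then collapse a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words such as "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Example: "blogging" -> "blogg" -> "blog"
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


variants = ["blogs", "blogger", "blogging", "blog"]
stems = {toy_stem(w) for w in variants}
print(stems)  # all four variants collapse to the single stem 'blog'
```

Because the index and the search terms are reduced in the same way, a search for any one of these variants can match content containing any of the others. The real algorithm uses a much larger and more careful rule set, so its stems will differ from this toy's for many words.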
Porter Stemmer version 6.x-2.0 and later versions use Version 2 of the Porter stemming algorithm, which is the version that Porter currently recommends using for live applications. Older versions of Porter Stemmer (including all 5.x versions) use the original Porter stemming algorithm.
Note that although the Porter stemming algorithm is not specific to American English, some British spellings will not be fully stemmed. Most notably, -ise word endings are not stemmed as well as -ize, due to technical issues in the algorithm.
After installing and enabling this module (in the usual way), you will need to rebuild the search index. To do this:
1. Visit Administer > Site configuration > Search settings, and click on "Re-index site".
2. Ensure that cron has run enough times that the Search settings page shows the site as 100% indexed. You can run cron manually by visiting Reports > Status report and clicking on the "Run cron manually" link.
Limitations and Notes
* The Porter stemming algorithm has a few parts that work better with American English than British English, so some British spellings will not be stemmed correctly. It is also definitely English-specific, and non-English content will not be stemmed correctly.
* The core Search module does not currently provide a way for a stemming module (such as Porter Stemmer) to know the language of content or search terms during searching or search indexing. So, if you have a multi-lingual site and enable the Porter Stemmer module, it will unfortunately try to apply its stemming algorithm to all the content on your site, regardless of language. See this issue for details: #363336: Porter-stemmer should only stem english or language neutral content for a multi-language site.
* The Porter stemming algorithm attempts to reduce words to their linguistic root words -- it does not do general substring matching. So, for instance, it should make "walk", "walking", "walked", and "walks" all match in searching, but it will not make "walking" a match for "king".
* There is currently an issue with excerpts in Porter Stemmer (see: #437084: Excerpt fails to find stemmed keyword). For example, if a page contains the word "walking" and someone searches for "walk", that page will be included in the search results, but the search excerpt will not display the portion of text containing "walking" (it will probably just display the first paragraph of text on that page).
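The last two points can be illustrated with a rough sketch. It assumes that index matching compares stems while the excerpt code looks for the literal search keyword as a whole word; that second assumption is a guess at the cause of the excerpt issue, not a description of the actual Search module code. The functions `toy_stem`, `matches_index`, and `find_for_excerpt` are hypothetical, and `toy_stem` is a simplified stand-in rather than the Porter algorithm.

```python
import re


def toy_stem(word):
    """Hypothetical stand-in for a stemmer: strips a few endings only."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters, so "king" is not cut down to "k".
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_index(query, text):
    """Index-style match: compare the query's stem with each word's stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return toy_stem(query) in {toy_stem(w) for w in words}


def find_for_excerpt(query, text):
    """Excerpt-style match (assumed): literal keyword as a whole word."""
    return re.search(r"\b" + re.escape(query) + r"\b", text, re.IGNORECASE)


page = "She was walking to the office."
print(matches_index("walk", page))     # True: "walking" and "walk" share a stem
print(matches_index("king", page))     # False: stems differ, no substring match
print(find_for_excerpt("walk", page))  # None: "walk" never appears as a whole word
```

Under these assumptions the page is found by the search, because the stems agree, but the excerpt step has nothing to highlight, because the unstemmed keyword is not present as a word in the text.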
O/S: BSD, Linux, Solaris, Mac OS X
File Size: 163.8 KB