Finding words and word boundaries (MySQL, Perl, PHP)

If you're searching for the word "mile", you probably don't want the page that tells you that Sally Smiled at Harry. But you may want to find a Milestone, even if it is within quotes.

Regular Expressions are your friends!

In Perl-style regular expressions (which also work in Python, and in PHP with the preg functions), the \b anchor (a 'zero-width assertion') matches at a word boundary. In other words, it lets you find positions in your text string where a non-word character is followed by a word character (a letter, digit or underscore), or where a word character is followed by a non-word character. It also matches at the very beginning or very end of the string if the string starts or ends with a word character.

example: /\bmile/i matches - ignoring case - words starting with mile.
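
As a quick check of that example, here is a minimal sketch in Python, whose re module accepts the same \b syntax. The sample strings and variable names are made up for illustration.

```python
import re

# Case-insensitive search for words that START with "mile".
starts_with_mile = re.compile(r"\bmile", re.IGNORECASE)

# Hypothetical sample strings, not taken from any real data set.
samples = [
    "Sally Smiled at Harry",    # "mile" sits inside "Smiled": no boundary before it
    'We passed a "Milestone"',  # the quote is a non-word character, so \b matches
    "mile after mile",          # \b also matches at the very start of the string
]

matches = [bool(starts_with_mile.search(text)) for text in samples]
print(matches)
```

The first string is rejected because the "m" of "Smiled" follows another word character, while the quoted "Milestone" is still found.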

In MySQL regular expressions (used with REGEXP and RLIKE matches), you have two different anchors. [[:<:]] matches at the start of a word and [[:>:]] matches at the end of a word. The syntax is slightly longer and more complex, but it is probably a little quicker to run.

My personal suggestion, if you are searching, is to anchor the start of the search term to a word boundary but not the end. That way, you find all the "es", "ed" and "ing" words (end, ended, ending), but it does not send you round the bend with lots of spurious hits.
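
To see the effect of each anchoring choice side by side, here is a small Python sketch. The sentence is invented for the purpose, and the \w* added around the search term is only there so that findall reports the whole word that was hit.

```python
import re

# Hypothetical test sentence built from the words used in the text above.
text = "The end: it ended, ending round the bend; send no more."

# Anchored at the start only: the suggested approach.
start_only = re.findall(r"\bend\w*", text, re.IGNORECASE)

# Anchored at both ends: whole word only, misses the "ed" and "ing" forms.
whole_word = re.findall(r"\bend\b", text, re.IGNORECASE)

# Not anchored at all: also picks up "bend" and "send".
unanchored = re.findall(r"\w*end\w*", text, re.IGNORECASE)

print(start_only)
print(whole_word)
print(unanchored)
```

Anchoring only the start keeps the inflected forms and drops the words that merely contain the term.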

Our blog archive at http://www.wellho.net/mouth/ adds a column of "related short articles" down the right-hand side to help you navigate to similar subjects. Until yesterday, we reported an article as similar if any word from the subject of the current blog appeared anywhere within the subject line of the other article. It was done that way from the early days of the blog to get a good spread of links to extra articles. However, that list was getting long, so I updated the script to use a MySQL regular expression. Now we have a list that (in most cases) has been trimmed back to a manageable size and has heightened relevance. For example, the first page went from 17 further links down to 12.

In other words - the clause
where entry_title like '%$word%'
has been replaced by
where entry_title rlike '[[:<:]]$word'
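
The difference between the two clauses can be tried out without a MySQL server. The sketch below is only an approximation: it uses Python's built-in sqlite3 module with a user-defined REGEXP function, and the Perl-style \b in place of MySQL's [[:<:]]. The table contents are invented; only the column name entry_title comes from the clauses above. The word is passed as a bound parameter and escaped with re.escape, which is a precaution added here rather than something the original script is known to do. Note also that MySQL 8.0.4 and later use a different regular expression library in which the [[:<:]] markers are no longer supported and \b is used instead.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    if value is None:
        return 0
    return 1 if re.search(pattern, value, re.IGNORECASE) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE entries (entry_title TEXT)")

# Hypothetical article titles, for illustration only.
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [
        ("End of the line",),
        ("Round the bend",),
        ("Ending a loop early",),
        ("Send mail from PHP",),
    ],
)

word = "end"

# Old approach: substring match anywhere in the title.
like_hits = [row[0] for row in conn.execute(
    "SELECT entry_title FROM entries WHERE entry_title LIKE ? ORDER BY rowid",
    ("%" + word + "%",),
)]

# New approach: the word must begin at a word boundary.
regexp_hits = [row[0] for row in conn.execute(
    "SELECT entry_title FROM entries WHERE entry_title REGEXP ? ORDER BY rowid",
    (r"\b" + re.escape(word),),
)]

print(len(like_hits), like_hits)
print(len(regexp_hits), regexp_hits)
```

With these made-up titles the substring match returns all four rows, while the start-anchored match returns only the two titles containing a word that begins with "end".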

As an aside - we also eliminate a few common words from the page matching - here's the regular expression used on each word in a list from the title.

'^(the|from|and|you|our|why|they|via|that|can|all|'.
'use|your|big|how|etc|for|one|two|not|after|work|but|get|are)$'
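
Here is a sketch of how such a filter might be applied to the words of a title, written in Python rather than the original script's language. The keywords function, the way the title is split into words, and the case-insensitive flag are all assumptions made for this illustration; only the word list itself comes from the expression above.

```python
import re

# The common-word list from above, anchored so only whole words match.
# re.IGNORECASE is an assumption; the original may have lower-cased words first.
COMMON_WORDS = re.compile(
    r"^(the|from|and|you|our|why|they|via|that|can|all|"
    r"use|your|big|how|etc|for|one|two|not|after|work|but|get|are)$",
    re.IGNORECASE,
)

def keywords(title):
    """Split a title into words and drop the common ones (hypothetical helper)."""
    words = re.findall(r"\w+", title)
    return [w for w in words if not COMMON_WORDS.match(w)]

# Made-up title for demonstration.
result = keywords("How to get the best from your regular expressions")
print(result)
```

The ^ and $ anchors matter here: without them "the" would also knock out words such as "other" or "theme".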

(written 2008-08-03 10:59:46)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2008: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 0800 043 8225 or 01225 708225 • FAX: 0845 8382 405 or 01225 707126 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho