Title case function in PHP based on grammatical rules for capitalizing the words in a title

Creating a Title Case Function with PHP Based on Grammatical Rules

Most programmers implement title case the simplest way: they capitalize the first letter of every word. But what about titles like the one above, where "on" and "with" are not capitalized? This is called "Headline Case." There are many articles on the web about using PHP to title case a string, but most don't try to encompass the more complicated grammatical rules that govern title case. I have written a function that follows most of the grammatical rules of Headline Case.

For titles that need exceptions to these rules, the function supports <no_parse> tags. When you want to take over formatting and override the parser, wrap the text in these tags to exempt it from title casing. This way roughly 90% of situations are covered automatically, and you only have to tweak the few remaining entries by hand, which takes much of the grunt work out of data entry.

You may notice that the New York Times is inconsistent about which style rules it uses for title case (at least it was at the time of this writing). An algorithm like mine would standardize them automatically. I suggest you use the following function to keep your content titles standardized, and let your authors, who are the experts in writing, opt out of the title case rules with the <no_parse> tags at their discretion.

I may have missed special cases. I encourage you to use your GitHub account to fork the code below and improve it, should you find yourself more grammatically knowledgeable than I am.
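For reference, the simple approach described above corresponds to PHP's built-in ucwords(). Here is a quick sketch of what it produces for this article's own title (the variable name $title is only for illustration):

```php
<?php
// ucwords() uppercases the first letter of every whitespace-separated word
// and leaves all other characters untouched.
$title = "creating a title case function with php based on grammatical rules";

echo ucwords($title), "\n";
// Creating A Title Case Function With Php Based On Grammatical Rules
```

Headline Case would keep "a", "with", and "on" lowercase here, which is the gap the function below tries to close.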


// Converts a string to Title Case based on one set of title case rules.
// Put <no_parse></no_parse> around content that you don't want parsed by the title case rules.
// Test the function:
// echo titleCase("this is a test <no_parse>do you not believe me</no_parse> here is the rest of the string");
// Which yields the following output:
// This Is a Test do you not believe me Here Is the Rest of the String
function titleCase($string) {
    // Words that stay lowercase when they are other than the first or last word
    $small_words = 'of|a|an|the|but|or|nor|not|yet|at|on|in|over|above|under|below|behind|next to|beside|by|among|between|till|since|during|for|throughout|to|and';
    // What may come directly before and after a special-case word (rule 5)
    $before = '(^|\s|"|\')';
    $after = '(\s|,|\.|"|\'|:|!|\?|\*|$)';
    // Callback that lowercases whatever the pattern matched
    $lower = function ($m) { return strtolower($m[0]); };
    // Split on the no_parse tags: even indexes are parsed, odd indexes are left untouched
    $string_array = preg_split('/(<no_parse>|<\/no_parse>)+/i', $string);
    for ($k = 0; $k < count($string_array); $k = $k + 2) {
        $string = $string_array[$k];
        // If the entire string is upper case, don't perform any title casing on it
        if ($string !== strtoupper($string)) {
            // TITLE CASE RULES:
            // 1.) Uppercase the first char in every word
            $new = preg_replace_callback('/(^|\s|\'|"|-)([a-z])/i', function ($m) {
                return $m[1] . strtoupper($m[2]);
            }, $string);
            // 2.) Lowercase words exempt from title case:
            // all articles, coordinate conjunctions ("and", "or", "nor"), and prepositions
            // regardless of length, when they are other than the first or last word.
            // Lowercase the "to" in an infinitive. This rule is only approximated, since it is context sensitive.
            // The lookarounds leave the neighboring characters unconsumed, so adjacent
            // exempt words such as "of the" are both matched in a single pass.
            $new = preg_replace_callback('/(?<=\W)(?:' . $small_words . ')(?=\W)/i', $lower, $new);
            // 3.) Fix capitalized letters after apostrophes ("Don'T" => "Don't")
            $new = preg_replace_callback('/(?<=\w\')[A-Z]/', $lower, $new);
            // 4.) Always capitalize the first letter in the string
            $new = ucfirst($new);
            // 5.) Replace special cases
            // You will add to this as you find case-specific problems
            $new = preg_replace('/\sin-/i', ' In-', $new);
            $new = preg_replace('/' . $before . 'ph' . $after . '/i', '${1}pH${2}', $new);
            $new = preg_replace('/' . $before . '&amp;' . $after . '/i', '${1}and${2}', $new);
            $new = preg_replace('/' . $before . 'groundwater' . $after . '/i', '${1}Ground Water${2}', $new);
            // Standardize words that should be hyphenated.
            // Example: cross connection => cross-connection
            //$new = preg_replace('/(\W|^)(cross)\s(connection)(\W|$)/i', '${1}${2}-${3}${4}', $new);
            $new = preg_replace('/' . $before . 'vs\.' . $after . '/i', '${1}Vs.${2}', $new);
            $new = preg_replace('/' . $before . 'on-off' . $after . '/i', '${1}On-Off${2}', $new);
            $new = preg_replace('/' . $before . 'on-site' . $after . '/i', '${1}On-Site${2}', $new);
            // Special cases like Class A Fires or Class C Regulations
            $new = preg_replace_callback('/' . $before . '(class\s)(\w)' . $after . '/i', function ($m) {
                return $m[1] . $m[2] . strtoupper($m[3]) . $m[4];
            }, $new);
            $string_array[$k] = $new;
        }
    }
    // Glue the parsed and unparsed segments back together
    return implode('', $string_array);
}
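
The opt-out mechanism relies on how preg_split() orders its output: splitting on the tags yields segments that alternate between text outside and text inside a <no_parse> block. Here is a minimal standalone sketch of that idea. The helper name transformOutsideNoParse is hypothetical and not part of the function above, and the sketch assumes the tags are balanced and not nested.

```php
<?php
// Hypothetical helper for illustration: applies $transform only to the text
// outside <no_parse>...</no_parse> blocks and leaves the inside untouched.
function transformOutsideNoParse(string $text, callable $transform): string
{
    // Splitting on either tag yields alternating segments:
    // even indexes sit outside the tags, odd indexes sit inside them.
    $segments = preg_split('/<\/?no_parse>/i', $text);

    foreach ($segments as $index => $segment) {
        if ($index % 2 === 0) {
            $segments[$index] = $transform($segment);
        }
    }

    // The tags themselves were consumed by the split, so joining the
    // segments returns the text without them.
    return implode('', $segments);
}

echo transformOutsideNoParse('ph levels <no_parse>iPhone</no_parse> and more', 'strtoupper'), "\n";
// PH LEVELS iPhone AND MORE
```

Any callable that takes and returns a string can be passed in, so the same wrapper could protect author-formatted text from other transformations as well.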