One of the major benefits of using FAST Search Server 2010 for SharePoint (FS4SP) is the ability to extend the item processing pipeline and programmatically modify existing crawled properties, or populate new ones, for each document. This may sound complicated at first, but in reality it’s not that hard. In this blog post I’m going to show how to integrate a C# console application into the processing pipeline and use custom logic to populate an additional crawled property for each item in the search index.
Let’s say we have a number of SharePoint project sites, each containing information about a different digital camera model, and we’d like to tag every document in those sites with the project name (the camera model) in the search index without adding any extra metadata to the SharePoint items.
To accomplish that, we are going to populate a custom crawled property called Project by extracting the project name from site URLs that match a specific pattern:
- http://intranet/sites/sp2010pillars/Projects/M300/
- http://intranet/sites/sp2010pillars/Projects/M400/
- http://intranet/sites/sp2010pillars/Projects/M500/
- http://intranet/sites/sp2010pillars/Projects/X200/
- http://intranet/sites/sp2010pillars/Projects/X250/
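To make the extraction concrete, here is a minimal sketch of how a regular expression could pull the project name out of such URLs. It assumes the host name shown in the list above; the `ProjectNameSketch` class, the `ExtractProjectName` helper, and the two extra sample URLs (a document inside a project site and a page outside one) are hypothetical and only serve as illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ProjectNameSketch
{
    // Hypothetical helper: returns the path segment that follows /Projects/,
    // or null when the url is not inside a project site.
    public static string ExtractProjectName(string url)
    {
        // The lookbehind anchors on the project site prefix; [^/]+ takes everything up to the next slash.
        Match match = Regex.Match(
            url,
            "(?<=http://intranet/sites/sp2010pillars/Projects/)[^/]+",
            RegexOptions.IgnoreCase);
        return match.Success ? match.Value : null;
    }

    static void Main()
    {
        string[] urls =
        {
            "http://intranet/sites/sp2010pillars/Projects/M300/",
            // Assumed example of a document stored inside a project site
            "http://intranet/sites/sp2010pillars/Projects/X250/Shared Documents/spec.docx",
            // Assumed example of a page outside any project site
            "http://intranet/sites/sp2010pillars/Pages/default.aspx"
        };

        foreach (string url in urls)
        {
            Console.WriteLine("{0} -> {1}", url, ExtractProjectName(url) ?? "(no match)");
        }
        // Prints M300, X250 and (no match) for the three urls above.
    }
}
```

Because the match stops at the first slash after the project segment, every document below a project site resolves to the same project name.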
First of all, we need to create a new crawled property to be populated. It is good practice to create a new crawled property category so that custom crawled properties don’t get mixed up with SharePoint or any other properties in the search index schema. Since crawled property categories are uniquely identified by a GUID, we need to generate a new one. One option is to use Visual Studio 2010 for that (Tools -> Create GUID).
Next we’ll use PowerShell to create the new category called Custom and add the new Project crawled property to it. In the next blog post I’m planning to show how to add a new refiner to the FAST Search Center based on the values of the Project crawled property, so let’s also create a new managed property and map the crawled property to it.
# Load the FAST Search snap-in if it is not already loaded
Add-PSSnapin Microsoft.FASTSearch.Powershell -ErrorAction SilentlyContinue
# Property set GUID that identifies the new Custom category
$guid = "{21FDF551-3231-49C3-A04C-A258052C4B68}"
New-FASTSearchMetadataCategory -Name Custom -Propset $guid
# Variant type 31 is a string (VT_LPWSTR)
$crawledproperty = New-FASTSearchMetadataCrawledProperty -Name Project -Propset $guid -Varianttype 31
# Managed property type 1 is Text
$managedproperty = New-FASTSearchMetadataManagedProperty -Name Project -type 1 -description "Project name extracted from the SharePoint site url"
# Enable refinement so the property can be used as a refiner later
Set-FASTSearchMetadataManagedProperty -ManagedProperty $managedproperty -Refinement 1
# Map the crawled property to the managed property
New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $managedproperty -CrawledProperty $crawledproperty
Now we are ready to create the console application that contains our custom logic.
The following code reads the url input crawled property value, checks whether it matches our project site URL pattern, and extracts the project name from the URL if it’s a match.
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

namespace Contoso.ProjectNameExtractor
{
    class Program
    {
        // Special property set GUID that contains the url crawled property
        public static readonly Guid PROPERTYSET_SPECIAL = new Guid("11280615-f653-448f-8ed8-2915008789f2");

        // Custom crawled property category GUID that contains the Project crawled property
        public static readonly Guid PROPERTYSET_CUSTOM = new Guid("21FDF551-3231-49C3-A04C-A258052C4B68");

        // Name of the crawled property to be populated
        public const string PROPERTYNAME_PROJECT = "Project";

        static void Main(string[] args)
        {
            // args[0] is the input file path and args[1] is the output file path
            XDocument inputDoc = XDocument.Load(args[0]);

            // Retrieve the url input property value (null if the item has no url property)
            string url = (from cp in inputDoc.Descendants("CrawledProperty")
                          where new Guid(cp.Attribute("propertySet").Value).Equals(PROPERTYSET_SPECIAL) &&
                                cp.Attribute("propertyName").Value == "url" &&
                                cp.Attribute("varType").Value == "31"
                          select cp.Value).FirstOrDefault();

            XElement outputElement = new XElement("Document");

            // Project site url regex: captures the path segment that follows /Projects/.
            // Adjust the host name to match your environment.
            Match urlMatch = Regex.Match(url ?? string.Empty,
                "(?<=http://intranet/sites/sp2010pillars/Projects/)[^/]+",
                RegexOptions.IgnoreCase);

            if (urlMatch.Success)
            {
                // Populate the custom Project crawled property
                outputElement.Add(
                    new XElement("CrawledProperty",
                        new XAttribute("propertySet", PROPERTYSET_CUSTOM),
                        new XAttribute("propertyName", PROPERTYNAME_PROJECT),
                        new XAttribute("varType", 31),
                        urlMatch.Value)
                );
            }

            // Always write the output file, even when no property was added
            outputElement.Save(args[1]);
        }
    }
}
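Before deploying, it can help to try the executable against a hand-made input file. The sketch below writes a sample file in the shape the code above expects (a Document root with CrawledProperty children carrying propertySet, varType and propertyName attributes) and reads the url back with the same kind of query. The `SampleInputSketch` class, its helper names, and the temp file name are assumptions made for illustration, not part of the product.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

static class SampleInputSketch
{
    // Property set GUID of the url crawled property, as used in the console application
    const string UrlPropertySet = "11280615-f653-448f-8ed8-2915008789f2";

    // Hypothetical helper: builds an input document shaped the way the console application reads it.
    public static XDocument BuildSampleInput(string url)
    {
        return new XDocument(
            new XElement("Document",
                new XElement("CrawledProperty",
                    new XAttribute("propertySet", UrlPropertySet),
                    new XAttribute("varType", 31),
                    new XAttribute("propertyName", "url"),
                    url)));
    }

    // Hypothetical helper: reads the url value back, or returns null if it is missing.
    public static string ReadUrl(XDocument doc)
    {
        return (from cp in doc.Descendants("CrawledProperty")
                where new Guid(cp.Attribute("propertySet").Value).Equals(new Guid(UrlPropertySet)) &&
                      cp.Attribute("propertyName").Value == "url" &&
                      cp.Attribute("varType").Value == "31"
                select cp.Value).FirstOrDefault();
    }

    static void Main()
    {
        // Assumed file name; any writable location works
        string path = Path.Combine(Path.GetTempPath(), "fs4sp-sample-input.xml");
        BuildSampleInput("http://intranet/sites/sp2010pillars/Projects/M300/").Save(path);

        Console.WriteLine(File.ReadAllText(path));
        Console.WriteLine(ReadUrl(XDocument.Load(path)));
    }
}
```

With the sample file saved, the console application can be run by hand with the input path as the first argument and an output path as the second, and the output file inspected for the Project crawled property.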
At this point we are ready to deploy the application to the FAST Search servers. To do that, we need to copy the executable to each FAST server running document processors and modify the pipelineextensibility.xml file located in the FASTSearch\etc folder on each of those servers. Keep in mind that the pipelineextensibility.xml file can get overwritten if you install a FAST Search Server 2010 for SharePoint update or service pack. Below is the file content, assuming that the executable is located in the FASTSearch\bin folder:
<PipelineExtensibility>
  <Run command="Contoso.ProjectNameExtractor.exe %(input)s %(output)s">
    <Input>
      <CrawledProperty propertySet="11280615-f653-448f-8ed8-2915008789f2" varType="31" propertyName="url"/>
    </Input>
    <Output>
      <CrawledProperty propertySet="21FDF551-3231-49C3-A04C-A258052C4B68" varType="31" propertyName="Project"/>
    </Output>
  </Run>
</PipelineExtensibility>
Once all of the above is in place, simply run the psctrl reset command in the Microsoft FAST Search Server 2010 for SharePoint shell and start a full crawl for the SharePoint content source. When the full crawl is complete, let’s run a search query for “digital camera” and take a look at the Project property value in the results:
As you can see, the managed property is populated with the expected values. In the next post I’ll show how to use this new property as a custom refiner in the FAST Search Center.
References:
- Integrating an External Item Processing Component
- CrawledProperty Element [Pipeline Extensibility Configuration Schema]