Manage SharePoint incremental crawl index deletion

Problem

In SharePoint 2013, we have a content source that connects via BCS. The total items in the External table are around 7.7 million records. We have a full crawl scheduled to crawl the BCS items every month and an incremental crawl to run every 4 hrs.

In our environment the full crawl takes 18 hrs to crawl 7.7 million records and incremental crawl takes 2-3 hrs based on the number of items modified in the external table.

Everything worked fine until an incremental crawl failed to crawl the BCS items because of a permission issue and deleted all the indexed items for the content source. When we checked the crawl log, we found that the incremental crawl had triggered only delete operations in the SharePoint Search Service.

We eventually learned that the SharePoint Search Service enforces policies that delete items from the index when certain error conditions are met. I am not sure whether this behavior can be disabled, but the thresholds can be raised through PowerShell.

If anybody knows how to disable this setting or policy, please let me know.

Solution / Workaround

The following article explains this issue:

http://technet.microsoft.com/en-us/library/hh127009.aspx

However, several of the default values changed in SharePoint 2013.

Comparison of default values between SharePoint 2010 and 2013 (thanks to Simon for sharing this information on his blog):

Property                      SP 2010       SP 2013
ErrorDeleteCountAllowed       30 times      10 times
ErrorDeleteIntervalAllowed    720 hours     240 hours
ErrorCountAllowed             100 times     15 times
ErrorIntervalAllowed          1440 hours    360 hours
RecrawlErrorCount             10 times      5 times
RecrawlErrorInterval          360 hours     120 hours
DeleteUnvisitedMethod         1             1

 

 


Configure item level security in SharePoint BCS

In SharePoint 2007, security trimming could be applied to BDC entities only at query time, whereas in SharePoint 2010 / 2013 it can also be applied at crawl time.

If you wish to trim SharePoint search results for an external system crawled through BCS based on your own logic, there are two approaches:

  • CRAWL time security trimming
  • QUERY time security trimming

Query-time security trimming has to check every item returned by the search query against the current user, which makes it a very heavy process. When permissions on individual items in the external system are assigned to NTLM users, it is better to go with crawl-time security trimming. In this approach, you just need an additional column in the external system table that stores the permission details for each item.

This works well if you are building a new BCS model from scratch. It is very difficult to retrofit crawl-time security trimming into an existing, complex BCS application, so in that case the query-time approach is the better choice.

Please refer to this article: http://msdn.microsoft.com/en-us/library/gg294169.aspx
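As a rough illustration of the crawl-time approach, the fragment below sketches how a BDC model might expose the extra permission column to the crawler. It assumes a hypothetical SpecificFinder named ReadPartItem that returns a byte-array field named SecurityDescriptor holding a Windows security descriptor. All method, parameter, and field names are placeholders, and the use of the WindowsSecurityDescriptorField property should be verified against the article above before relying on it.

```xml
<!-- Sketch only: ReadPartItem, partItem and SecurityDescriptor are hypothetical names -->

<!-- 1. In the return TypeDescriptor of the finder, expose the permission column
        as a byte array (assumed to contain a Windows security descriptor) -->
<TypeDescriptor Name="SecurityDescriptor" TypeName="System.Byte[]" IsCollection="true">
  <TypeDescriptors>
    <TypeDescriptor Name="Item" TypeName="System.Byte" />
  </TypeDescriptors>
</TypeDescriptor>

<!-- 2. On the method instance, tell the crawler which returned field carries the ACL -->
<MethodInstance Name="ReadPartItem" Type="SpecificFinder" ReturnParameterName="partItem">
  <Properties>
    <Property Name="WindowsSecurityDescriptorField" Type="System.String">SecurityDescriptor</Property>
  </Properties>
</MethodInstance>
```

With a layout like this, the ACL is stored in the index at crawl time, so no per-item permission check is needed when a query runs.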

Search Database Related Tables using BCS

The following solution was designed and developed with the help of my friend Sreeharsha and Scot Hillier.

Environment:

SharePoint 2013 Enterprise with all the Services enabled on the same server.

SQL Server 2008 R2 Standard also on the same SharePoint standalone server.

Requirement:

Configure BCS to search a database whose tables have a parent-child relationship. When a search term matches a row in the child table, the result page must display the linked parent item rather than the matching item from the child table.

The requirement makes sense if the child tables are thought of as paragraphs within a document: when you search for a term that occurs in a paragraph, you expect a link to the document to be displayed. After all, the child tables always hold additional information about the parent item.

Design:

For this requirement, the data from the parent table and the child table must be retrieved through two different entities, but both entities need to be within the same LOBInstance. This gives us control over defining the association between these entities within the model.

Here is how the method and the method instance need to be defined:

<Method Name="PartItemToRevisionItem">
  <Properties>
    <Property Name="HideOnProfilePage" Type="System.Boolean">true</Property>
  </Properties>
  <Parameters>
    <Parameter Name="partID" Direction="In">
      <TypeDescriptor Name="PartID" TypeName="System.String" IdentifierEntityName="PartItem" IdentifierEntityNamespace="MultiBCS.MultiEntity" IdentifierName="PartID" ForeignIdentifierAssociationName="PartItemToRevisionItem" />
    </Parameter>
    <Parameter Name="revisionAssociationItemList" Direction="Return">
      <TypeDescriptor Name="RevisionItemList" TypeName="MultiBCS.MultiEntity.RevisionItem[], MultiEntity" IsCollection="true">
        <TypeDescriptors>
          <TypeDescriptor Name="RevisionItem" TypeName="MultiBCS.MultiEntity.RevisionItem, MultiEntity">
            <TypeDescriptors>
              <TypeDescriptor Name="RevisionID" TypeName="System.String" IdentifierEntityName="RevisionItem" IdentifierEntityNamespace="MultiBCS.MultiEntity" IdentifierName="RevisionID" />
              <TypeDescriptor Name="Revision_Name" TypeName="System.String" />
              <TypeDescriptor Name="Revision_Cost" TypeName="System.Int32" />
              <TypeDescriptor Name="PartID" TypeName="System.String" />
              <TypeDescriptor Name="Revision_Type" TypeName="System.String" />
            </TypeDescriptors>
          </TypeDescriptor>
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <Association Name="PartItemToRevisionItem" Type="AssociationNavigator" ReturnParameterName="revisionAssociationItemList" IsCached="false">
      <Properties>
        <Property Name="AttachmentAccessor" Type="System.String"></Property>
      </Properties>
      <SourceEntity Name="PartItem" Namespace="MultiBCS.MultiEntity" />
      <DestinationEntity Name="RevisionItem" Namespace="MultiBCS.MultiEntity" />
    </Association>
  </MethodInstances>
</Method>

The association method can be defined in either entity (parent or child). The tags that define the source and destination entities appear within the method instance.

Silver Bullet:

The most important part of this solution is what makes the child tables get crawled as attachments of the parent item, so that when a search term matches the child table, the linked item from the parent table is displayed. To achieve this, the tag below must be added within the method instance; otherwise, multiple results are displayed for a single record, which is unwanted.

<Property Name="AttachmentAccessor" Type="System.String"></Property>

Thanks to Scot Hillier for the pointer to the "AttachmentAccessor" property, which is key to this requirement.

Resources:

How to: Crawl associated external content types in SharePoint 2013

Modeling Associations in External Data

 

Use of UseClientCachingForSearch property in BDC Model

Environment: 

  • SharePoint 2013 Enterprise with all the Services enabled on the same server.
  • SQL Server 2008 R2 Standard also on the same SharePoint standalone server. 

Problem:

We are using a custom BCS connector to crawl data from SQL Server for SharePoint 2013 search. The SQL table holds 2.1 million records. The data is read in batches and supplied to the crawler (we implemented our own logic to supply the records to the search crawler in batches).

The issue here is the crawl rate. The OOB external content type retrieves the data much faster than our custom process: it crawls 2.1 M items in 8.30 hrs, whereas crawling the same external content type through the custom BCS connector takes 10 hrs. (If the load is high or multiple users log in to the server, it sometimes takes 19 hrs to crawl 1 M items.)

At this point we were unable to identify the cause of the issue or locate the bottleneck.

Solution: 

When we compared the OOB and custom BDC model files (.bdcm), we found that the OOB model used one additional property, UseClientCachingForSearch. We investigated this property and found that it is what gives the OOB BCS crawl its better crawl rate.

The UseClientCachingForSearch property in the BDC model improves the speed of full crawls by caching the item during enumeration. This property is also recommended when implementing incremental crawls that are based on change logs, because it improves incremental crawl speed.

When a model is created with the out-of-the-box tools, the UseClientCachingForSearch property is added automatically. In a custom BCS model, we have to add this property manually.

Important Note:

If BCS crawl items are larger than 30 kilobytes on average, this property should not be set, as it will lead to a significant number of cache misses and negate the performance gains. In our case, the records do not exceed this limit.
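For reference, here is a minimal sketch of how the property might look in a custom model. It assumes the property is attached to the IdEnumerator method instance with an empty value; the method instance and parameter names are hypothetical, so compare the placement with an OOB-generated .bdcm file before applying it.

```xml
<!-- Sketch only: ReadPartIds and partIdList are hypothetical names -->
<MethodInstances>
  <MethodInstance Name="ReadPartIds" Type="IdEnumerator" ReturnParameterName="partIdList">
    <Properties>
      <!-- Assumption: the value is left empty and only the presence of the property matters -->
      <Property Name="UseClientCachingForSearch" Type="System.String"></Property>
    </Properties>
  </MethodInstance>
</MethodInstances>
```

A full crawl is the safest way to measure the effect after changing and re-importing the model.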

Output

After adding this property, our custom BCS crawl reached the same crawl rate as the out-of-the-box BCS crawl. So far this has been measured only in the development environment; we expect a similar improvement in production.

Why Custom BCS Connector?

A custom BCS connector is used here for the following reasons:

  • The out-of-the-box BCS connector has a response limit: the maximum number of rows that can be read through the Database Connector is 1,000,000. The limit can be changed with the 'Set-SPBusinessDataCatalogThrottleConfig' cmdlet in PowerShell, but this is not recommended. To handle this threshold in the search crawler, we wrote the custom BCS connector with batch processing. (Please refer to the SharePoint 2013 software boundaries.)
  • Our external system (Teamcenter) currently holds 2.1 million records, and this may grow to 6 million in the next 4 to 5 years. To cope with this data growth, we implemented batch processing in our custom BCS connector.
  • The OOB SharePoint incremental crawl does not remove items deleted in the external system from the search index until a full crawl is run. To implement this logic for incremental crawls, we chose a custom BCS connector.
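On the last point, the BDC model schema defines a DeletedIdEnumerator method instance type that a connector can implement to report deleted item IDs during a changelog-based incremental crawl. The fragment below is only a sketch of how such a method could be declared. It is not the connector described here: the method, parameter, entity, and namespace names are hypothetical, and the SynchronizationCookie property on the timestamp parameter is an assumption to verify against the BDC model documentation.

```xml
<!-- Sketch only: all names below are hypothetical placeholders -->
<Method Name="GetDeletedPartIds">
  <Parameters>
    <!-- Time of the last crawl, passed in and updated by the method (assumed pattern) -->
    <Parameter Name="lastCrawlTime" Direction="InOut">
      <TypeDescriptor Name="LastCrawlTime" TypeName="System.DateTime">
        <Properties>
          <Property Name="SynchronizationCookie" Type="System.String">x</Property>
        </Properties>
      </TypeDescriptor>
    </Parameter>
    <!-- IDs of the items deleted in the external system since the last crawl -->
    <Parameter Name="deletedPartIds" Direction="Return">
      <TypeDescriptor Name="DeletedPartIds" TypeName="System.String[]" IsCollection="true">
        <TypeDescriptors>
          <TypeDescriptor Name="PartID" TypeName="System.String" IdentifierEntityName="PartItem" IdentifierEntityNamespace="Custom.Connector" IdentifierName="PartID" />
        </TypeDescriptors>
      </TypeDescriptor>
    </Parameter>
  </Parameters>
  <MethodInstances>
    <MethodInstance Name="GetDeletedPartIds" Type="DeletedIdEnumerator" ReturnParameterName="deletedPartIds" />
  </MethodInstances>
</Method>
```

The external system still has to record deletions somewhere (for example, in a change log table) so that the method has something to return.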

Vote of Thanks

For this solution, I would like to thank Subrat Naik, who helped us create the core custom BCS connector. Sivasankar Sabbani Pillai and Prasad Athalye did a great code review to improve performance at the code level. Scot Hillier and Matthew McDermott, both MVPs, helped us a lot in improving crawl and search performance and in getting this solution to perform well in tough testing scenarios.

Last but not least, thanks to Sreeharsha Alagani, the Search Consultant working with me on this project. He has very good knowledge of SharePoint Search applications.

The research, findings, and lessons from this project should make a great knowledge base for all of us.

Feel free to get in touch with me if you have any queries about the custom BCS connector or SharePoint 2013 Search. I am happy to help.

Thank you, everyone! Happy SharePointing!