SharePoint 2013 Search crawls stuck
The search crawls are getting hung for some reason. It all started recently. The crawls were stuck for hours and days if not stopped manually. The farm that I work with would normally finish full crawls in less than 4-5 hours with around 300K size index in the largest content source. Strangely enough, when I kick off the crawl, the crawl is running fine with index count grows, however, I see no commits at the end (crawl never completes). Below are ULS errors I extracted.
Search service events aei87 Warning A database error occurred. Source: .Net SqlClient Data Provider Code: 220 occurred 0 time(s) Description: Error ordinal: 1 Message: Arithmetic overflow error for data type smallint, value = 32768., Class: 16, Number: 220, State: 1 at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.ExecuteNonQuery() at Microsoft.Office.Server.Data.SqlSession.ExecuteNonQuery(SqlCommand command) at Microsoft.Office.Server.Search.ManagedSqlSession.ExecuteNonQuery()
mssearch.exe (0x3D7C) 0x3384 SharePoint Server Search Crawler:Gatherer Plugin ac6s7 High ManagedSqlSession caught a SqlException executing the following command: 'proc_MSS_CommitTransactions'. Connection string was: 'Data Source=XXXXX;Initial Catalog=SSA_CrawlStoreDB;Integrated Security=True;Enlist=False;Pooling=True;Min Pool Size=0;Max Pool Size=100;Connect Timeout=15;Application Name=SharePoint[OWSTIMER][1][Search_Service_Application_CrawlStoreDB_299952a3060f4f628def625b742db001]' Original exception message: 'Arithmetic overflow error for data type smallint, value = 32768. Duplicate key was ignored.' Stack trace ' at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.ExecuteNonQuery() at Microsoft.Office.Server.Data.SqlSession.ExecuteNonQuery(SqlCommand command) at Microsoft.Office.Server.Search.ManagedSqlSession.ExecuteNonQuery()' SQL Errors: Error ordinal: 1 Message: Arithmetic overflow error for data type smallint, value = 32768., Class: 16, Number: 220, State: 1 Error ordinal: 2 Message: Duplicate key was ignored., Class: 0, Number: 3604, State: 0
mssearch.exe (0x3D7C) 0x3384 SharePoint Server Search Crawler:Gatherer Plugin aau6c High CManagedSqlSession::ExecuteNonQuery m_ManagedSqlSession->ExecuteNonQuery fails with Error 0x80131904 Description 'Arithmetic overflow error for data type smallint, value = 32768. Duplicate key was ignored.' Message 'Unknown error 0x80131904' [gathersql.cxx:2243] search\native\gather\server\gathersql.cxx
mssearch.exe (0x3D7C) 0x3384 SharePoint Server Search Crawler:Gatherer Plugin aau2i High CGatherer::CommitTransactions pSession->ExecuteNonQuery for proc_MSS_CommitTransactions failed with Error 0x80131904 [gatherobj.cxx:8212] search\native\gather\server\gatherobj.cxx
#1 Actions Taken:
- Reset Search Index
Index reset resulted the above errors go away. May be this is due to the reason that our full crawls were stuck on completing forever and every time we stop and initiate the crawl, a new full crawl kicks in and get stuck again. So with all this pile up, maybe the temp table sequences in proc_MSS_CommitTransactions procedure go out of smallint limit. Not sure if this theory is right but reset index definitely cleared the above errors but the crawls were getting stuck again with a new set of errors. See below.
mssearch.exe
(0x0FA8)
0x1644 SharePoint Server
Search Crawler:Gatherer
Service
dsg6 Medium
CRobotThread::Thread failed to move file for docID 50093, URL
sts4://site/siteurl=site1/subsite/siteid={0a3cf259-90dc-4abb-bf2e-15bb0d7275e7}/weburl=shared/webid={c54d4bd4-f6a4-4813-9aa8-927039e53575}/listid={20e29c27-f4dc-48fb-8528-8e0bcae2d450}/folderurl=/itemid=1211,
Error 0x80070005. The transaction will be
retried. [robotthrd.cxx:4810]
search\native\gather\gthrsvc\robotthrd.cxx
54ad911a-fc69-4897-920d-f557c1cf9d53
#2 Actions Taken:
- Add Antivirus exclusions (if not already added) on all SharePoint servers.
- Reboot the servers
Result: Crawls completed as expected. The issue is resolved.