IDOL - Things That Might Get You Into Trouble
Posted by Kimmo Pyhältö (M-Files) on 30 December 2019 08:59 AM
1. Initial design
1.1 All engines in one server
In a clustered IDOL environment there are usually two or more servers involved: the first one is for the frontend (DIH/DAH and Daily) and the second, third, fourth, etc. are for the backend (Main Index). This is also the setup recommended by Micro Focus.
The main reason for this is that most of the data and communication is relayed, and possibly processed, through the frontend. This leads to high I/O loads, especially during the migration phase. If the backend and the frontend are on the same server, performance might drop considerably. That in turn might lead to long indexing queues and unreasonably long indexing times.
This scenario is likely especially in vaults with more than 1-5 million documents. A configuration without a separate indexing and search server may be sufficient, though this depends highly on usage and performance expectations. If the rate of object modifications and/or object inflow to the vault is high, separate servers should be considered.
1.2 Using virtual servers
One of the most common questions about the IDOL specification concerns the need for a physical server. More and more environments are moving to virtual servers, yet the IDOL specification still calls specifically for physical ones. Although physical servers are optimal, they are not strictly required. You can run IDOL in a virtual environment, but you have to be aware of the other virtual servers running in parallel that burden the physical host.
Some variables in virtual environments are more difficult to understand and calculate than in a physical environment. For IDOL, the most critical part of the system is I/O. In a virtual environment, where the same physical disks back multiple virtual drives and even the host's own OS and other functions, it is not straightforward to say what the actual system performance is. Another potential problem is the high number of parallel operations running in different systems, which may make it harder to find the actual root cause if problems occur.
The best way to cope with the issues mentioned above is to overdesign the cluster. It is recommended that the cluster is initially designed for the number of documents expected after 3-5 years.
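As a rough illustration of designing for the document count after 3-5 years, the sketch below projects a vault's size with compound yearly growth. The function name, the growth model and the example figures are assumptions made for illustration, not part of any M-Files or IDOL tool; real growth is rarely this regular.

```python
import math


def projected_documents(current_docs: int, yearly_growth: float, years: int) -> int:
    """Project the document count after `years`, assuming compound growth.

    yearly_growth is a fraction, e.g. 0.5 means the vault grows 50 % per year.
    The result is rounded up to a whole document.
    """
    if current_docs < 0 or years < 0 or yearly_growth < -1:
        raise ValueError("inputs must describe a non-negative vault size")
    return math.ceil(current_docs * (1 + yearly_growth) ** years)


if __name__ == "__main__":
    # Hypothetical vault: 2 million documents today, growing 20 % per year.
    for horizon in (3, 5):
        print(horizon, "years:", projected_documents(2_000_000, 0.20, horizon))
```

Sizing the cluster against the 5-year figure rather than today's count is one simple way to apply the overdesign recommendation.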
1.3 Network drives
As with virtual servers, "outsourcing" the storage space is becoming more common. Easier maintenance and sturdier, native data protection methods make network drives a tempting choice for the indexing platform.
Still, there are a few things that might hinder or even prevent the cluster operation. The most obvious one with a network drive is the performance of the network. Transferring index data between different networks, not to mention across the internet, might severely affect index performance. In addition, as with virtual servers, there are more moving parts, some of them unknown, that might affect the overall functionality.
The second possible problem is compatibility between the network drive, IDOL and M-Files. Even if the specification of the network drive appears to be compatible, there might be minor differences that cause unexpected behavior.
2.1 Changing configuration
There are many IDOL-dependent configuration items both in the Windows registry and in the cluster's .cfg files. Those settings should not be changed in any way other than as instructed in the installation guides. Some settings depend on each other in such a way that modifying only one of them might make the system work incorrectly. In addition, there are values that might tempt the user to increase them in the hope of improving the system's performance. Without knowing exactly how the system reacts to the change, the outcome might even be the opposite.
If any additional changes need to be made to the configuration, please discuss them with M-Files personnel first. Preferably, the discussion should start from the use case: what the user is actually trying to achieve with the change. There might be better ways to accomplish it than a configuration change.
One setting that may need to be changed in some scenarios is SplitNumbers, which is located in each engine's .cfg file. SplitNumbers controls how terms that mix letters and digits are saved into the index, and it keeps the index from bloating with garbage words. As an example, take the "word" abc123. By default, SplitNumbers is true, so the word is split at the boundary between letters and digits instead of being saved into the index as one chunk. The consequence is that wildcard searches for that word, e.g. with the search phrase abc*, do not work as expected. This is problematic in some cases, especially where there are structured codes, such as product codes, that have to be searched with wildcards.
Setting SplitNumbers to false is possible, but its use should be considered carefully. The high number of terms in the index will reduce its performance, especially during DRESYNC operations, when the cluster moves its data across the engines. In addition, changing the system to use the false setting requires full re-indexing with DREINITIAL, which empties the contents of all index databases.
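Because SplitNumbers lives in each engine's .cfg file, it can be useful to review the current values before discussing a change. The following is a minimal, read-only sketch. It assumes that the setting appears as a plain SplitNumbers=value line, that lines starting with // are comments, and that all engines are expected to share the same value. The function names and the way the file contents are supplied are hypothetical; the sketch never writes to a configuration file.

```python
import re
from typing import Dict, Optional

# Matches an uncommented "SplitNumbers = value" line (assumed file layout).
_SPLIT_RE = re.compile(
    r"^[ \t]*SplitNumbers[ \t]*=[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def read_split_numbers(cfg_text: str) -> Optional[str]:
    """Return the lowercased SplitNumbers value from one .cfg text, or None."""
    match = _SPLIT_RE.search(cfg_text)
    return match.group(1).lower() if match else None


def inconsistent_engines(cfg_texts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return every engine's value if the engines disagree, else an empty dict.

    cfg_texts maps a hypothetical engine name to the text of its .cfg file.
    """
    values = {name: read_split_numbers(text) for name, text in cfg_texts.items()}
    return values if len(set(values.values())) > 1 else {}


if __name__ == "__main__":
    # Inline sample contents; in practice the texts would be read from disk.
    sample = {
        "engine1": "SplitNumbers=true\n",
        "engine2": "// SplitNumbers=true\nSplitNumbers = false\n",
    }
    print(inconsistent_engines(sample))
```

A non-empty result is only a prompt for a closer look; any actual change should still go through the installation guides and M-Files personnel, as described above.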
3. Daily operations and maintenance
3.1 Too many documents in the index
Discussions about IDOL performance often include terms such as "official limit" and "strict" or "upper limit". For the number of documents in one engine, the official maximum is 1 million, but there are also descriptions like "until 1.5 million you are OK" or "at 1.5 million the system stops working". The reason for this somewhat obscure communication is that indexed material differs and we cannot tell what has been indexed in every situation. Still, counting too much on the strict/upper limit will bring trouble. Exceeding 1 million will first start to affect performance at an accelerating rate. At some point the system will start to behave oddly, e.g. it does not find something it should. Finally, the index might get corrupted. All this will happen at some unknown points between 1 and 1.5 million documents.
To be certain that the index works, the document limit should be kept at 1 million. The overhead exists only to give assurance that the system still works even if 1 million documents is exceeded slightly. This also speaks for overdesigning the cluster in the planning phase.
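The 1 million limit translates directly into a minimum number of engines for a given document count. A small sketch of that arithmetic follows; the function name and the example count are hypothetical, and the calculation deliberately ignores the 1-1.5 million overhead, treating 1 million as the planning limit.

```python
DOCS_PER_ENGINE = 1_000_000  # planning limit per engine, taken from the text


def engines_needed(total_docs: int, docs_per_engine: int = DOCS_PER_ENGINE) -> int:
    """Return the smallest engine count that keeps every engine at or under the limit."""
    if total_docs < 0 or docs_per_engine <= 0:
        raise ValueError("document counts must be non-negative and the limit positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-total_docs // docs_per_engine))


if __name__ == "__main__":
    # Hypothetical vault expected to hold 3.4 million documents.
    print(engines_needed(3_400_000))  # prints 4
```

Feeding in the document count expected after several years, rather than today's count, keeps the result in line with the overdesign advice.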
3.2 Running out of hard drive space
Because of "underdesign", or simply because it is not known what kind of files the filedata includes, the cluster might run out of disk space. If this happens, the system starts to queue the indexed material and writes a clear error message into IDOL's indexing logs. M-Files has the same queue logic: if it cannot send the extracted material to the cluster, it starts to build a queue, which is flushed to the IDOL side once IDOL has enough disk space. This logic prevents harm for some time, even some days. Eventually, however, it will fail and might cause a corrupted index.
If the problem occurs during indexing, the best way to solve it is to do a controlled shutdown of the cluster, add more disk space and then start the cluster again. In some virtual environments, disk space can be added on the fly, though even in those cases it is recommended to shut down the cluster first.
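Running out of space can be noticed earlier by checking the free space on the index drive regularly. The sketch below uses Python's standard shutil.disk_usage. The 20 % threshold, the function names and the example path are assumptions chosen for illustration, not M-Files or IDOL recommendations.

```python
import shutil


def is_low_on_space(total_bytes: int, free_bytes: int,
                    min_free_fraction: float = 0.20) -> bool:
    """Return True when the free share of the drive is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_index_drive(path: str, min_free_fraction: float = 0.20) -> bool:
    """Check the drive that holds `path` (a hypothetical index directory)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return is_low_on_space(usage.total, usage.free, min_free_fraction)


if __name__ == "__main__":
    # "." stands in for the real index location, which is site-specific.
    if check_index_drive("."):
        print("Warning: index drive is running low on disk space")
```

Run from a scheduled task, a check like this gives time to plan a controlled shutdown before the queues start to build up.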
To avoid the situation, the answer is again overdesign. How compactly the system can index files depends highly on the file type. Usually a document's size in the index is about 1-5 % of the original file size, but in some cases the indexed data can take even more space than the original file! If the index size is estimated at 10 % of the filedata size, it should be adequate in most cases.
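The 10 % rule of thumb can be written as a one-line calculation. The sketch below is only that arithmetic; the function name and the 500 GiB example are hypothetical, and the estimate does not cover the exceptional file types mentioned above.

```python
GIB = 1024 ** 3


def estimated_index_bytes(filedata_bytes: int, ratio_percent: int = 10) -> int:
    """Estimate the index size as a percentage of the filedata size, rounded up."""
    if filedata_bytes < 0 or ratio_percent < 0:
        raise ValueError("sizes and ratios must be non-negative")
    # Integer ceiling division keeps the result exact for large byte counts.
    return -(-filedata_bytes * ratio_percent // 100)


if __name__ == "__main__":
    # Hypothetical vault with 500 GiB of filedata.
    print(estimated_index_bytes(500 * GIB) // GIB, "GiB")  # prints 50 GiB
```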
3.3 Limited server control
IDOL is somewhat sensitive to abrupt changes in the host environment. In particular, an unexpected reboot of the server can corrupt the index irreversibly, for example when third-party software causes a major failure in the system and the server reboots. A less severe but still potentially serious issue can come from Windows updates. Although Windows tries to shut down all processes before the reboot, shutting down the IDOL processes might take too long and Windows will kill them, which is hazardous especially if indexing is still in progress.
Because we cannot be 100% sure about the stability of the server, the best way to avoid long re-indexing is to take IDOL .idx backups regularly. Another good practice is to schedule server updates for a period when the cluster can be shut down in a controlled manner before the updates.
3.4 Limited knowledge
Last but not least, there is the human factor. Nowadays, ever sturdier, more stable and more error-tolerant software has made configuration and use very easy and safe. Features like automatic validation before settings change and automatic rollback encourage a trial-and-error methodology. Although IDOL is a robust and powerful system, it needs to be operated strictly and with good knowledge of its inner workings. Since the latter might be challenging to achieve, the best way to get good results is to follow the instructions and recommendations as accurately as possible.