|  07.11.2011, 23:11 | #1 | 
| Участник | fed: Multithreaded MRP and related issues 
			
			Источник: http://fedotenko.info/?p=208 ============== Today, I will share some experience related to setting up and fine tuning MRP for best performance. The most significant MRP-related improvement of DAX2009 over older versions is an ability to execute MRP in multi-threaded mode, when planning process is spread over several independent processes executed in parallel. As usual, multi-threaded execution works only when MRP is executed in batch mode, since helper threads are spawned as additional tasks inside the same batch job. This is the current way to support multi-threaded execution in Axapta. Basic setup The number of helper threads is specified on the second tab (Scheduling Helpers) of MRP start menu parameters. What number of helpers should be used ? To my experience, MRP scales very well. On my current project, we use 15 helpers (and 1 main thread). We tried to increase number of helpers to 31, but have not found any significant performance difference. Generally, it depends on configuration of your batch server(s) and your database server.When you are trying to increase the number of helper threads, you should check resource utilization for both batch server and database server. If one of these servers becomes saturated after yet another increase in helper threads number, then it is time to stop or, maybe, time to upgrade your hardware. I also want to mention, that I have positive experience of allocating MRP onto several batch servers. I would say, that my personal recommendation is as the following: First check every batch server, which will participate in MRP for the number of CPU cores it have. Then configure every batch server to run 2xNumber_Of_Cores batch threads. (In Administration->Setup->Server Configuration). Then specify number of helper as total number of batch threads for all batch servers serving your MRP batch group MINUS 1. Remember, that main batch thread also consumes one thread from your batch server group capacity. Also, if during MRP run, you are going to run some other batch processes on the same batch server(s), you may want to decrease the number even further, so maybe you should set number of helper-threads to total thread capacity minus 2 or 3. Next important parameter to discuss is Item Distribution factor. What does it mean? On the very first stages of MRP (In Update phase to be precise), the system allocates all items (all items in inventTable or all items in inventTable, which fit to the query specified in MRP startup dialog box) into chunks. Every chunk is a kind of unit of work, which is processed by one thread. During the phases of MRP, which are executed in parallel (Data regeneration, Coverage planning, Futures planning and Action Messages), every thread grabs chunk, process it, then grab another chunk, process it and so on, until chunks for given stage and BOM level are marked as processed. Size of the chunk is calculated as Item_Distribution_Factor*(Number_Of_Helpers+1). From one side, smaller chunks ensure smooth processing and even allocation of items between threads. The smaller chunk is, the less are chances for a thread to grab a chunk with high number of computational intensive items. Say, one item takes from 1 to 10 seconds to be coverage planed. If size of the chunk is only 1 item, then in the worst case, we most late thread will finish BOM level coverage only 10 seconds after the most early thread finished. If chunk size is 300 items, then in the worst case scenario, difference between time to process is 300*10Seconds-300*1Second==2700Seconds==45Minutes. It means that for this worst case scenario, there are good chances that most of helper threads will be doing nothing, waiting for 30-45 minutes for the last, unlucky thread to finish processing. It would increase planning time and it also would lead to non-optimal usage of hardware. (Since most of the threads would do nothing, while waiting for end of BOM Level). From other side, allocation of the chunk to a thread is a competitive process, which leads to temporary database locks. Several threads often try to allocate the same chunks in parallel; Only one of these threads succeed, while others repeat allocation process until they grab their own chunk. So, chunk allocation can become a bottleneck itself if number of chunks is too high and size of the chunk is too small. To my experience, reasonable size of the chunk is somewhere between 10 and 60. To find out the optimum chunk size and distribution factor, you can simply make several test with different distribution factors. Also, you can check ‘Track Item Process Duration’ checkbox in MRP Dialog parameters and then check typical item planning time in Master Planning->Inquiries->Unfinished Scheduling Processes->Inquiries->Item Process Duration. If item process duration varies a lot, you can benefit from smaller chunk size; if it does not, then probably increase of chunk size can be more beneficial. Another potential way to improve performance (I never tried it though) is to put randomly-ordered items into a chunk. Now, when the system creates chunks, it simply iterates over inventTable ordered by itemId, so a chunk contain items with similar itemId. Since, usually, items with the same complexity of planning often has sequential itemIds, it often leads to uneven of distribution of items between chunks. Some chunks consist from items which are regular purchased items, while other chunks consist of complex BOM-items, which require complex resource planning on many work center groups. If you add special MRPOrder field to inventtable, fill it with random number during item creation and then sort by this field during chunk generation, you can have more even distribution of items between chunks. Infant mortality issue The frequent problem of parallel MRP run is early termination of helper threads. Say, we started our MPR as usual, but then in 15 minutes, we see that all our helper batch tasks terminated withthe strange message “Nothing to process”. (You can see this message, if you click Log button in Batch Task form before main thread of MRP-batch terminate). Then the only remaining main thread continues to run MRP in single-thread mode (very slowly). Here is what is happening: 
 insert into it sleep(500); statement. This will add 0.5 seconds delay between re-reads of the process status, thus giving to your database server a time to breathe. Another issue to fix is low timeout value for helper threads. If you have realistically large working set of net requirements, deletion of the plan would take much more time then 15 minutes, expected by developers of the functionality. To increase timeout, you should open class declaration part of the mentioned class (reqProcessExternalThread) and modify definition of WAITFORPROCESSSTATUSUPDATE macros. Try to change the value of macros from 15 to at least 60 or maybe even 90. Slow end-of-level processing. Often, you can see in the Unfinished Scheduling Process form, that during coverage phase, in the end of BOM Level he system works very slowly. It looks like it almost hanged for 30-40 minutes, but then, suddenly it advances to the new level. If you start AOS tracing on the batch server, you will see that all helper threads are doing nothing (they simply rereading process status waiting for advance to the next BOM level), while one unlucky thread is busy calculating coverage for portion of items. Also, in Unfinished Scheduling Process form, you can realize that the number of items on the level decreases ! 10 minutes ago you had more then 15000 items on the level, now it is slightly less then 15000, and in a few minutes it will be less then 14500. What’s happening ? The main reason for the issue is incorrect BOM Level data in inventTable. When the system allocates items between chunks, every chunk has attached BOM Level. But sometimes, during explosion of BOM for,say, 5th level, the system finds as a component of exploded BOM the item, BOM Level of which (according to inventTable) is 2. Coverage for this item was already created already. How does the system calculates coverage info for this item again ? 
 cases, when size of this chunk exceeded 1000 items. If planning of every item takes,say,4 seconds, then unlucky thread, which managed to grab this chunk will spend more then 1 hour for processing of it. Since this chunk has highest number for a BOM level, it is usually grabbed by some unlucky thread in the very end of BOM Level processing. Then one thread is processing this chunk for an hour, while all other threads are doing nothing, simply waiting for our unlucky thread. The very first idea of how to fix the issue which came to the mind is to calculate BOM levels before every full regeneration planning. Unfortunately – it does not always help. Sometimes, we have incorrect BOMLevel info in inventTable only because we changed out BOM structure without recalculating BOM levels. But often, the reason for level change is that real life nesting of items in production BOMs does not fit to theoretical BOM structure described in master-data BOM. Say, we have a shining brass-head bolt, which is used only to screw a label with the name of our company to the final assembled good. Naturally, after calculation, it will get BOM Level of 1. Say, then, during production of a deeply nested sub-BOM of our finish good, someone decided to use the same bolt for screwing much smaller label to one of the sub-components. He dropped the line with standard bolt from production order’s BOM and added new line with the shining brass-head one.(Maybe it is not very good practice, but you simply can not update standard BOM structure for every possible small change, requested by a customer). If you recalculate BOM Levels, this bolt will still have BOMLevel 1. But if you try to run MRP, you would find out that it coverage actually is performed in level 5 or 6. By the way, one of the very last stages of MRP is BOMLevel update. During this stage, the system updates InventTable with actual BOM Level data, gathered during MRP processing, not with theoretical BOM Level from master-data BOM Structures. So, the only way to resolve the issue, is to change behavior of the system to create many smaller Spare Chunks for a level, instead of one large Spare Chunk. To accomplish this, you need to modify method getSpareListNum() of ReqProcessItemList table: X++: static server ReqProcessListNum getSpareListNum(ReqProcessId _processId, BOMLevel _level, ReqProcessStatus _status, Connection _con ) { ReqProcessItemList reqProcessItemList; ReqProcessItemListLine reqProcessItemListLine; RandomGenerate randomGenerate=new RandomGenerate(); ; select firstonly ListNum, RecId from reqProcessItemList order by listNum desc where reqProcessItemList.ProcessId == _processId && reqProcessItemList.Level == _level && reqProcessItemList.Spare == true; if (reqProcessItemList) { select count(recid) from reqProcessItemListLine where reqProcessItemListLine.ProcessId==_processId && reqProcessItemListLine.ListNum==reqProcessItemList.ListNum; } if (!reqProcessItemList || (reqProcessItemListLine.RecId>=#MAXLISTSIZE)) //#MAXLISTSIZE is the macros with size of the spare chunk. Should be around 30-60 { randomGenerate.parmSeed(timenow()); try { reqProcessItemList.setConnection(_con); select maxof(ListNum) from reqProcessItemList where reqProcessItemList.ProcessId == _processId; reqProcessItemList.ListNum=reqProcessItemList.ListNum+randomGenerate.randomInt(1,20); reqProcessItemList.ProcessId = _processId; reqProcessItemList.Level = _level; reqProcessItemList.Status = _status; reqProcessItemList.Spare = true; reqProcessItemList.insert(); } catch(Exception::DuplicateKeyException) { retry; } } return reqProcessItemList.ListNum; } Источник: http://fedotenko.info/?p=208 
				__________________ Расскажите о новых и интересных блогах по Microsoft Dynamics, напишите личное сообщение администратору. Последний раз редактировалось oip; 17.11.2011 в 16:17. Причина: Исправил форматирование кода | 
|  | |
| За это сообщение автора поблагодарили: oip (5). | |