Can you give an example in which regular performance measurement (A/B testing) was really difficult to do?
Honestly, in everyday use no A/B test is really comfortable. If you have a recommendation system, you need to split your recommendations into groups, and in some cases you don't provide recommendations at all. That's uncomfortable for the customer, because psychologically they can't rely on the system: one time they get a recommendation, another time they don't.
In many cases you can't do it in a simple way. You need to design an A/B test that fits into the everyday life of a factory, and that's not easy. We did it for different processes, and every time we needed to discuss it in detail in advance, because people have no experience with A/B testing and the concept is somewhat alien to them. We need to introduce the concept first, and then find a way to conduct the test comfortably.
In some cases we can compare groups over time, for example one month against another; in others we need to split events into groups A and B; in still others we need to split them in other ways. It was difficult every time, but each time different problems arose.