Second non-demanded idea

Topic of non-demanded idea is dedicated to scientific ideas and methods that are published, tested and found very effective but on unknown reasons are not used by engineers.

Implicit Regularization Technique

This topic is about very practical engineering technique of obtaining an approximate solution of system of linear algebraic equations in case precise solution is not exists. To be absolutely clear what kind of solutions we consider let us look at the data below

4 * 4 Matrix
14.017.020.012.0
90.011.056.011.0
37.034.033.010.0
37.034.033.010.0
*
Vector
1.0
2.0
3.0
4.0
=
Result
156.0
324.0
244.0
244.0


The result vector is obtained by multiplication matrix and vector shown at the left hand side of equation. As you can see the last row of the matrix is equivalent to preceding one that means the system has infinity of solutions (since we know that it has at least one). If we try to solve system given Matrix and Result relatively to Vector (presuming we do not know it) we run into division by zero or other computational bug depending on the chosen method. In order to make situation much more complex and closer to engineering reality we add random errors to matrix elements and known vector

4 * 4 Matrix
14.117.020.012.0
89.911.156.011.0
37.033.833.010.0
37.034.033.010.2
*
Vector
?
?
?
?
=
Result
155.8
325.0
244.0
244.0


Now we have typical engineering problem. We have matrix equation with elements known approximately. More than that, even if we knew exact elements it would not bring us luck because exact system has infinity of solutions. Now we have one million dollar question is it possible to obtain at least approximately wanted vector (marked as ?) having these very bad data? Since I publish this article the answer can be guessed, of course it is possible and there are several methods of getting the result. One of these methods that perfectly works suggested by Russian mathematician Andrei Tikhonov in the middle of 20th century. I will show and explain it in much simpler way than Mr. Tikhonov did it about 60 years ago. And there is also my own method, which is much simpler. It is so elementary that needs few lines of coding even much less than some classical methods for solution of system of linear algebraic equations. It is stable for very large size systems (such as 1000 * 1000). These two methods are not the only that can be applied. This article only shows two effective solutions without mentioning all of them. All solutions require a book.



Andrei Tikhonov

1. Explicit Regularization According to Tikhonov

If we try to solve system

4 * 4 Matrix
14.117.020.012.0
89.911.156.011.0
37.033.833.010.0
37.034.033.010.2
*
Vector
?
?
?
?
=
Result
155.8
325.0
244.0
244.0


exactly we obtain vector V = {-1.96, 0.89, 8.95, -0.89}. We have to discard this solution on several reasons. We know that data have errors. We know that even if errors are corrected system may not have unique solution. We can easily find that solution is very sensitive to small variation of data. For example, if we change element M[4][4] to 10.0 the solution becomes V = {-4.33, 0.00, 13.68, -4.73}. If engineer who collected the data addressed to mathematician for solution about 50 years ago the conclusion would be that data is useless. But usually engineers have some knowledge about possible result, for example that all elements must be positive or that last element must be the largest and so on. Tikhonov suggested to take this information into consideration and change approach. Tikhonov explanation of solution of this problem is not very clear, so I provide my own explanation because it is more logical and show also his explanation later.

Since both matrix M and vector R are known approximately it does not make sense to look for solution X that makes vector M*X precisely equal to R but we may allow some mismatch expressed as non-negative difference ||M*X - R||2 = delta and we consider it as allowance and at the same time constraint. When delta parameter selected correctly we expect existence of infinity of solutions that satisfies above constraint. From this infinity we select unique one with minimum euclidian norm ||X||2. Why solution with minimum norm is better than any other may become clear when we look at two shown unstable solutions. We can see that numbers are significantly larger our ideal solution {1,2,3,4}. That also related to chosen numerical technique. In the middle of 20th century computers solve systems of linear algebraic equations by Gauss elimination. In this method the matrix is converted to upper triangular by linear transformations. The elements obtained in result of these transformations were typically small and contained accumulated errors. The first element of solution was obtained by division of two small numbers containing accumulated errors, which caused all other elements to be large random numbers. Obviously, better approximation may be obtained by smoothing the solution, this why minimum of ||X||2 is suggested as as constrained solution.

The computational part is not very complicated. The constrained extremum can be found by traditional Lagrange multipliers technique. In this method we introduce functional f(X) as

f(X) = ||X||2 + lambda* ( ||M*X - R||2delta )

Parameter delta is given, parameter lambda is unknown. According to Lagrange method we need to find unconstrained extremum of f(X) as function of lambda and choose correct lambda to satisfy the constraint ||M*X - R||2 = delta. It sounds more complicated than it is. In order to find unconstrained extremum we simply take partial derivatives of f(X) with respect to every component of vector X and assign them all to zero. Those who decide to do this are very welcome others may take my word that the result is new system of linear algebraic equations

(MTM + alpha*I) * X = MT R

where alpha is new parameter equal to 1/lambda and is called parameter of regularization. And I is identity matrix. The word regularization is introduced as characteristic of smoothness. The precise definition can be found in the theory of functions (regular functions) on intuitive level we may think of regularization as of smoothness and stability. The last expression is not a solution yet because it depends on parameter alpha. In order to find the solution that satisfies constraint we need to vary parameter alpha, find multiple solutions Xalpha as function of parameter and choose the one that match constraint

||M*Xalpha - R||2 = delta

Although it looks like complicated computational problem because we need to solve system of linear algebraic equations multiple times with verification if the solution comply with constraint, the actual algorithm is not really complicated. As soon as idea was published algorithmists found very quick and effective ways of solving system of two above shown matrix equations.

Tikhonov solution was the same but with different explanation. He suggested to minimize functional f(X)

f(X) = ||GX||2 + ||M * X - R||2

with constrained that can be optional including the one already introduced and matrix G has to be selected by researcher or engineer considering a priori information about possible solution. Usually, G was chosen as identity matrix multiplied by parameter of regularization, but in some cases it was numerical derrivative matrix that took differences of adjecent elements of X. This functional lead to the same solution but Tikhonov simply skipped the explanation why this functional was introduced in the first place.

After the technique was suggested and found effective, the researchers split into two groups those who introduce more sophisticated conditions of better solution from engineering point of view and those who tried to find more effective ways of numerical solutions, because those matrices were sometimes over 1000*1000 and tasks were running for hours before reaching the condition. I belonged to both groups and published 2 articles one about computational efficiency suggesting my own technique of dealing with computational problem and another where I introduced new criteria of better solution that means new regularization technique. In the second article I suggested to use two parameters of regularization as if one was not enough. As it always happen two years later I found more effective technique that do not need regularization parameter at all and can be programmed by 20 lines of code.

Needless to say that this quick introduction into Tikhonov regularization technique is written in the most possible simple way and do not cover even small part of the theory, showing only principal approach. Hundreds of people achieved Ph.D. in second half of the 20th century on that theory, so those who interested may address to Journal of Computational Mathematics and Mathematical Physics between 1960 and 2000. From the very beginning this regularization technique found its opponents. Some mathematicians were against modification of data. This is how they took parameter of regularization. Since this parameter is added to the data, data is modified and this is, on their opinion, what researcher should not do under any circumstances. No matter how bad is the data researcher should deal with them not try to alter them. Concluding introduction to Tikhonov regularization I show here the solution that can be obtained by its application. It is {0.48, 1.79, 4.08, 3.09}. I remind that exact solution is {1,2,3,4} and that it is one of infinity of solutions for precise data but we are dealing with approximate data, where errors were introduced to the data that has no unique solution. The regularized solution is stable. In case we change M[4][4] from 10.2 to 10.0 the result is changed into {0.72, 1.89, 3.60, 3.46}. We see no magic yet but significant improvement. For engineer to see that the last parameter is somewhere between 3 and 4 is much better than discard data or see it equals to either -0.89 or -4.73. The experiment data may not be reproducable or very expensive.

2. Implicit Regularization

Addressing the major criticism of the method that regularization alter the experimental data I suggested different technique about 18 years ago. Before I move to the matter of the method I need to remind or explain successive approximation because it is the foundation of the suggested technique. Generically, the transformation number to number is called function, transformation function into number is called functional and transformation of function into function or vector into vector is called operator. System of linear algebraic equation is an operator M that converts unknown vector V into known vector R

M * V = R

We may add vector V to both parts (if M is square matrix) and move R to the left-hand side.

M * V + V - R = V

These simple actions make sense if we introduce new operator F(V) = M * V + V - R and convert last formula into iterative procedure

F(Vj) = Vj+1      j=0,1,2,...

This procedure may converge to unique solution starting from arbitrary V0. In case the convergence exists operator is called contractive and the solution is called fixed point. In functional analysis vector is treated as point in multi-dimmensional space and one particular operation is called mapping of one point to another.

At the beginning we consider simple contractive operation with a scalar

xj+1 = 2 * xj - xj2

It has fixed point xk = 1 for every initial x0 from interval 0 < x0 < 2. Obviously, if we start from x0=0.5, we have sequence 0.75, 0.94, 0.99, ... the convergence is very fast, and, if number is small each step almost double the result that means very quick growth of small number and slow down of the growth when result is approaching 1.0. Now we consider same operation with matrices

Aj+1 = 2 * Aj - Aj2

We can find that there is big class of matrices, for which this operator is contraction and the fixed point is identity matrix. As we saw for scalar, in order to reach fixed point the initial value has to be in the certain range. Same with matrices, in order to reach identity operator the original matrix should have certain properties that we consider later, but now we find how we can use this operator for solution of system of linear algebraic equations and more than that we want to find out how this procedure can absolutely replace Tikhonov regularization and provide better approximate solution for bad data without usage of constrained extremum and parameter of regularization. In order to apply this procedure to system of linear algebraic equations

A * x = y

we multiply both parts by scalar 2

2 * A * x = 2 * y

and by matrix A

A * A * x = A * y

then we combine last two equations into one

(2 * A - A2) x = 2 * y - A * y

Looking at expression (2 * A - A2) as new matrix A1 and at expression 2 * y - A * y as new vector y1 we can continue procedure until certain criteria is achieved. If we are looking for unique solution for invertible matrix we expect left-hand side to converge to identity matrix and right-hand side to wanted solution. In case of degenerate matrix, bad conditioning or other bad data we have to look at the sequence of right-hand side vectors and choose the one that matching a priori engineering criteria. For example, for considered data I print all sequences of vectors yi in a table where it is easy to see all properties and compare to regularized solution:

Result of iterative approximations
#component 1component 2component 3component 4norm of vector
12.601.342.030.5912.99
22.491.562.070.6613.35
32.291.922.120.7714.01
42.012.382.210.9315.45
51.742.772.311.1417.34
61.562.862.451.4018.54
71.402.702.651.7719.43
81.192.442.942.3221.44
90.942.143.292.9624.99
100.771.933.523.3928.17
110.721.883.583.5029.10
120.721.883.593.5029.15
130.711.873.603.4929.15
140.701.873.623.4729.15
150.681.863.673.4329.17
160.641.853.763.3629.21
170.551.813.933.2229.39
180.391.754.262.9530.05
190.091.644.852.4632.32
20-0.401.475.821.6739.01
21-1.051.237.130.6053.82
22-1.651.008.33-0.3973.35
23-1.930.908.88-0.8384.05
24-1.960.898.95-0.8985.50
25-1.960.898.95-0.8985.55


As we can see near the best possible solution (iteration 12) the norm of vector and its components are almost stopped changing. They also stopped changing after iteration 25 because the solution reached fixed point. With our knowledge of expected result we choose some iteration near 12th. But I leave this criteria open to engineers. This is the typical picture of sequence of solutions for iterative procedure with degenerate matrix and approximate data. Engineers may use any knowledge about expected data to choose the best solution from the list. The condition

||A*x - y||2 = delta

may be used as well.

3. Some Backup Theory

I need to add simple (as simple as possible) explanation why Aj+1 = 2 * Aj - Aj2 converge and when. Similar as composite number may be factored into product of primes any matrix B can be decomposed into product of three matrices B = P * L * Q , where P is orthogonal and having same size as B, L is diagonal square having size matching number of columns in B and Q is orthogonal square same size as L. We presume here that B has number of rows not less than number of columns. Orthogonality presumes that PT * P = I and QT * Q = I. After left mulitiplication by transpose, BTB becomes equals to QTL2 Q . The diagonal elements of L2 are called eigenvalues. Their product equals to determinant, their sum equals to trace. Division BTB by trace (means division of every element) is equivalent to division of all eigenvalues by trace and therefore making them all less than 1.0. If our matrix A is obtained as [1/tr(BTB)] * BTB it is guaranteed convergence to any fixed point. This is what we need to do before iterations are started. After multiplication by transpose and division by trace matrix become non-negative symmetric with all eigenvalues less than 1.0 and, when procedure Aj+1 = 2 * Aj - Aj2 is applied, it makes transformation of all eigenvalues according to mentioned formula xj+1 = 2 * xj - xj2, which brings non-zero values to 1.0 and leave zero values unchanged. If and when all eigenvalues turn into 1.0 the diagonal matrix turn into identity matrix. In case matrix is degenerate and some of its eigenvalues are zeros it must enter into some fixed stage where it stops changing. But in real numerical transformations due to accumulated errors it does not happen and matrix either converge to identity or start diverging. Obviously, when such behavior is observed the computation must be stopped because point of optimal solution is already far behind.

4. Numerical Illustration

For those who want to see stability of Aj+1 = 2 * Aj - Aj2 iteration I can offer on-line test with Hilbert matrix. Hilbert matrix has elements h[i,j] = 1/(i+j-1) and it is known for making numerical solutions computationally challenging. It is proven that matrix has inverse but due to its properties it is very hard to obtain it. The algorithm is considered computationally stable when it handles size 10 * 10.


Andrew Polar, June 08, 2009.