Urysohn Adaptive Filters

Two bricks in the foundation

The first brick is a model in the form of a straight line between two points, which we call LEFT and RIGHT. The model is fully defined by the argument points $x_{left}, x_{right}$ and the function values $f_{left}, f_{right}$. We need to identify the model from a sequence of known arguments and function values $x_i, f_i$, where all points belong to the line, and therefore $$f_i = (1 - x_i) \cdot f_{left} + x_i \cdot f_{right}.$$ The values $x_i$ are relative distances from the left point and lie in the range [0,1]. We call the values $(1 - x_i)$ and $x_i$ weights and put all the data in a table with one row per observation and three columns: $(1 - x_i)$, $x_i$ and $f_i$.
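As a minimal sketch of the first brick, the table can be built and solved in a few lines of Python. The sample values, the name `solve_first_brick` and the use of a batch least-squares solve through the normal equations are assumptions made for illustration only; an adaptive method such as LMS would process the same rows one at a time.

```python
# Hypothetical samples (x_i, f_i) lying on a line with f_left = 2.0, f_right = 5.0.
samples = [(0.0, 2.0), (0.25, 2.75), (0.5, 3.5), (1.0, 5.0)]

# One table row per sample: weight of LEFT, weight of RIGHT, observed value.
table = [(1.0 - x, x, f) for x, f in samples]


def solve_first_brick(rows):
    """Least-squares estimate of (f_left, f_right) from rows (w_left, w_right, f)."""
    # Normal equations (A^T A) p = A^T f for the two-column weight matrix A.
    a11 = sum(wl * wl for wl, wr, f in rows)
    a12 = sum(wl * wr for wl, wr, f in rows)
    a22 = sum(wr * wr for wl, wr, f in rows)
    b1 = sum(wl * f for wl, wr, f in rows)
    b2 = sum(wr * f for wl, wr, f in rows)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("samples at two or more distinct x values are needed")
    # Cramer's rule for the 2x2 system.
    f_left = (b1 * a22 - b2 * a12) / det
    f_right = (a11 * b2 - a12 * b1) / det
    return f_left, f_right


f_left, f_right = solve_first_brick(table)
print(f_left, f_right)
```

Note that every row of the table has weights that sum to 1, which is what distinguishes this setup from a regression on two independent inputs.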
Obviously, the first two columns form a matrix, the third column forms a known vector, and the two unknown parameters $f_{left}$ and $f_{right}$ can be estimated by least mean squares (LMS) or another known method. This may look like ordinary LMS, but it is different. When two parameters ($a$ and $b$) are estimated by LMS, the model must have two input values ($x$ and $y$): $$z_k = a \cdot x_k + b \cdot y_k.$$ It has two linear terms, each defined by a straight line passing through the origin. In the first brick we have a single input value, yet we can find two points of the model. This difference is a critical property of all the models introduced further.

The second brick is the bilinear model. It is defined on the rectangular area $[x_{min}, x_{max}] \times [y_{min}, y_{max}]$ and is represented by four points, which may be called NorthWest, NorthEast, SouthWest and SouthEast ($NW$, $NE$, $SW$, $SE$). The bilinear model does not assume that all four points belong to the same plane. The function value is computed as $$f = NW + (NE - NW) \cdot x + (SW - NW) \cdot y + ((SE + NW) - (NE + SW)) \cdot x \cdot y,$$ where $x$ is the relative distance from West and $y$ is the relative distance from North. Relative means that they take values from the interval [0,1]. The weight coefficients are defined in the following way: $$w_{NW} = (1 - x)(1 - y)$$ $$w_{NE} = x (1 - y)$$ $$w_{SW} = (1 - x) y$$ $$w_{SE} = x y$$ so that $f = w_{NW} \cdot NW + w_{NE} \cdot NE + w_{SW} \cdot SW + w_{SE} \cdot SE$. If we place them all in a table similar to that of the first brick, then the same LMS is applicable. The difference from ordinary LMS is that, by processing the data, we determine 4 model coefficients from 2 input values.

By applying these two elementary methods it is possible to identify nonlinear models such as Urysohn operators with multiple inputs, $$z(t) = \int_{0}^{T} U[x(t - s), y(t - s)]\,ds,$$ which is explained in detail on this site.

PS. LMS (1960) is not the only way of adaptive filtering. It was actually derived from the projection descent method published by Stefan Kaczmarz in 1937 and has a lot of similarities with it.
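The bilinear brick and the projection idea mentioned in the PS can be combined in a short sketch. The code below is only an illustration under stated assumptions: the corner values in `true_corners`, the function names and the random sampling of $(x, y)$ are hypothetical, and the update is a plain Kaczmarz-style projection step (the normalized LMS update has the same form with a step size `mu` below 1), not necessarily the exact procedure used elsewhere on this site.

```python
import random


def bilinear_weights(x, y):
    """Weights of the corners (NW, NE, SW, SE) for relative x, y in [0, 1]."""
    return [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]


def kaczmarz_step(params, weights, target, mu=1.0):
    """Move params toward the hyperplane weights . params = target.

    With mu = 1.0 this is a full orthogonal projection onto that hyperplane.
    """
    residual = target - sum(w * p for w, p in zip(weights, params))
    # The weights are non-negative and sum to 1, so norm2 is never zero.
    norm2 = sum(w * w for w in weights)
    return [p + mu * residual * w / norm2 for p, w in zip(params, weights)]


# Hypothetical corner values NW, NE, SW, SE (not coplanar on purpose).
true_corners = [1.0, 3.0, -2.0, 4.0]

rng = random.Random(0)          # fixed seed so the run is repeatable
estimate = [0.0, 0.0, 0.0, 0.0]  # initial guess for the four corners

for _ in range(2000):
    # Two input values per observation...
    x, y = rng.random(), rng.random()
    w = bilinear_weights(x, y)
    f = sum(wi * ci for wi, ci in zip(w, true_corners))
    # ...used to update four model coefficients.
    estimate = kaczmarz_step(estimate, w, f)

print([round(e, 6) for e in estimate])
```

Each observation supplies only two inputs, yet the four corner values are recovered because the four weights vary independently enough across the samples.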
