Discussion:
Motivation for matrix algebra
2005-07-19 01:36:06 UTC
Hello,

In order to better understand matrix algebra, I seek the motivation
behind its creation. That is, why are the concept of a matrix and
matrix multiplication defined as they are?

I read that matrix multiplication was defined so as to preserve the
linear relationship under composition of linear functions, but I am
not confident that I understand this.

For instance, suppose

G o F(x) = G(F(x)) where G and F are linear functions whose domains and
ranges are compatible. Then clearly,

G o F(ax + by) = aG(F(x)) + bG(F(y)).

Now, if a linear function is represented in matrix algebra as a matrix,
then why and how does the definition of matrix multiplication suit
this?

That is, why is matrix multiplication,

C = AB <=> C_ij = Sum_{k=1}^{n} A_ik*B_kj

defined as it is?

As is evident, I am confused. The application of the definitions is
fine, but I seek a reason or motivation for the very definitions of the
matrix algebra. What was the intent in the minds of the humans that
created this subject to begin with?

If this is not explained well, please let me know. I truly wish to
understand this.

vsgdp
2005-07-19 01:44:42 UTC
In order to better understand matrix algebra, I seek the motivation
behind its creation. That is, why are the concept of a matrix and
matrix multiplication defined as they are?
So you can write linear combinations as matrix-vector multiplication.
Stephen Montgomery-Smith
2005-07-19 01:54:15 UTC
My sense is that matrix algebra was invented as a neat abstraction for
describing systems of linear equations, e.g

a_11 x_1 + ... + a_1n x_n = b_1
...
a_n1 x_1 + ... + a_nn x_n = b_n

can be written as Ax=b. In order to do this you end up with the usual
definitions of matrix multiplication.

Similarly if you have the equations

c_11 b_1 + ... + c_1n b_n = z_1
...
c_n1 b_1 + ... + c_nn b_n = z_n

i.e. Cb=z, you want to be able to write C(Ax) = CAx = z. So this gives
a model for how CA should be defined, and again you end up with the
usual definition.
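
A quick numerical check of this idea, as a minimal Python sketch. The matrices A and C and the vector x are made-up values, and the helper names mat_vec and mat_mul are illustrative only. It solves nothing; it only confirms that feeding x through Ax=b and then Cb=z gives the same z as applying the single matrix CA, built with the usual formula, to x.

```python
def mat_vec(M, v):
    """Evaluate the left-hand sides of the system with coefficient matrix M at v."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def mat_mul(C, A):
    """Usual product: (CA)_ij = sum over k of C_ik * A_kj."""
    return [[sum(C[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(C))]

A = [[1, 2], [3, 4]]    # coefficients of the system Ax = b
C = [[0, 1], [5, -2]]   # coefficients of the system Cb = z
x = [7, -1]

b = mat_vec(A, x)       # first system
z = mat_vec(C, b)       # second system, fed with the output of the first
D = mat_mul(C, A)       # the single combined system

print(z, mat_vec(D, x))  # the two results agree
```

Integer entries are used so that the comparison is exact rather than subject to floating-point rounding.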

Once you have this abstraction, you begin to find that it is so useful
and convenient, that you keep using it, and end up with a whole study of
these matrices. (For example, the standard Gaussian elimination
approach to solving these linear equations can be conveniently
re-expressed in matrix notation as the LU decomposition.)

Stephen
2005-07-19 02:47:17 UTC
Post by Stephen Montgomery-Smith
My sense is that matrix algebra was invented as a neat abstraction for
describing systems of linear equations, e.g
a_11 x_1 + ... + a_1n x_n = b_1
...
a_n1 x_1 + ... + a_nn x_n = b_n
can be written as Ax=b. In order to do this you end up with the usual
definitions of matrix multiplication.
This is what I once thought was the impetus for the definition, but the
definition seems to go further than that. Your idea does represent
that system of linear equations, but it covers only the case where the
right-hand matrix is n-by-1. That is, it represents the usual Ax = b,
where x and b are n-vectors, but not C = AB where none of A, B, and C
is n-by-1. Unless, of course, I am misunderstanding. In your view, is
the definition just an extension or abstraction of the case you gave?
Post by Stephen Montgomery-Smith
Similarly if you have the equations
c_11 b_1 + ... + c_1n b_n = z_1
...
c_n1 b_1 + ... + c_nn b_n = z_n
i.e. Cb=z, you want to be able to write C(Ax) = CAx = z. So this gives
a model for how CA should be defined, and again you end up with the
usual definition.
I do not follow the "C(Ax) = CAx = z" portion.

Do you mean that what is also wanted is for associativity to hold?
i.e. A(BC) = (AB)C.
Post by Stephen Montgomery-Smith
Once you have this abstraction, you begin to find that it is so useful
and convenient, that you keep using it, and end up with a whole study of
these matrices. (For example, the standard Gaussian elimination
approach to solving these linear equations can be conveniently
re-expressed in matrix notation as the LU decomposition.)
Matrix algebra is quite an interesting subject, I agree. I'm after
the motivation behind the definitions and subject as a whole.

Perhaps someone has books on the history of mathematics that might
shed light on the development of matrix algebra. I've read that
matrices were being used in the form of determinants before an actual
definition of a matrix was created.

Stephen Montgomery-Smith
2005-07-19 02:57:19 UTC
Post by Stephen Montgomery-Smith
My sense is that matrix algebra was invented as a neat abstraction for
describing systems of linear equations, e.g
a_11 x_1 + ... + a_1n x_n = b_1
...
a_n1 x_1 + ... + a_nn x_n = b_n
can be written as Ax=b. In order to do this you end up with the usual
definitions of matrix multiplication.
This is how I once thought was the impetus for the definition, but the
definition goes further than that approach it seems. Your idea is good
and represents that system of linear equations, but is only the case
for when the right-hand matrix is of the form n-by-1. That is, it
represents the normally given Ax = b where x and b are n-vectors, but
not C = AB where each of A,B, and C are non n-by-1 matrices. Unless, of
course, I am not understanding. In your view, is the definition just an
extension or abstraction of the case you gave?
I think that this next section answers your question in a better manner.
Post by Stephen Montgomery-Smith
Similarly if you have the equations
c_11 b_1 + ... + c_1n b_n = z_1
...
c_n1 b_1 + ... + c_nn b_n = z_n
i.e. Cb=z, you want to be able to write C(Ax) = CAx = z. So this gives
a model for how CA should be defined, and again you end up with the
usual definition.
I do not follow the "C(Ax) = CAx = z" portion.
Do you mean that what is also wanted is for associativity to hold?
i.e. A(BC) = (AB)C.
No. What I mean is that you could take this as your definition of CA -
it is that matrix D representing the system of equations

d_11 x_1 + ... + d_1n x_n = z_1, etc.

that is equivalent to Cb=z and Ax=b. You sit down, do the calculations,
and lo and behold it is the same as the usual matrix multiplication.

This idea of "composition" of linear operators is really very important
to the area.
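
The calculation can be sketched as follows, using the same names as above (Ax=b, Cb=z, and D for the combined system). The summation indices i, j, k are labels introduced only for this sketch.

```latex
\begin{align*}
b_j &= \sum_{k=1}^{n} a_{jk}\, x_k, \qquad z_i = \sum_{j=1}^{n} c_{ij}\, b_j \\
z_i &= \sum_{j=1}^{n} c_{ij} \sum_{k=1}^{n} a_{jk}\, x_k
     = \sum_{k=1}^{n} \Bigl( \sum_{j=1}^{n} c_{ij}\, a_{jk} \Bigr) x_k \\
d_{ik} &= \sum_{j=1}^{n} c_{ij}\, a_{jk}
\end{align*}
```

The last line is the coefficient of x_k in the i-th combined equation, and it is exactly the (i,k) entry of the usual product CA.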
Daniel McLaury
2005-07-19 04:20:54 UTC
This is how I once thought was the impetus for the definition, but the
definition goes further than that approach it seems. Your idea is good
and represents that system of linear equations, but is only the case
for when the right-hand matrix is of the form n-by-1. That is, it
represents the normally given Ax = b where x and b are n-vectors, but
not C = AB where each of A,B, and C are non n-by-1 matrices. Unless, of
course, I am not understanding. In your view, is the definition just an
extension or abstraction of the case you gave?
If S and T are linear transformations and v is a vector, and A and B are
the matrices of S and T, then A(Bv) = (AB)v, which is the matrix form of
S(Tv) = (ST)v. That is, the effect of applying S to Tv is the same as
applying ST to v, so you can take a chain of linear operations and
reduce it to a single linear operation.
Post by Stephen Montgomery-Smith
Similarly if you have the equations
c_11 b_1 + ... + c_1n b_n = z_1
...
c_n1 b_1 + ... + c_nn b_n = z_n
i.e. Cb=z, you want to be able to write C(Ax) = CAx = z. So this gives
a model for how CA should be defined, and again you end up with the
usual definition.
I do not follow the "C(Ax) = CAx = z" portion.
Do you mean that what is also wanted is for associativity to hold?
i.e. A(BC) = (AB)C.
Yes.
Post by Stephen Montgomery-Smith
Once you have this abstraction, you begin to find that it is so useful
and convenient, that you keep using it, and end up with a whole study of
these matrices. (For example, the standard Gaussian elimination
approach to solving these linear equations can be conveniently
re-expressed in matrix notation as the LU decomposition.)
Matrix algebra is quite an interesting subject, I agree. I'm after
the motivation behind the definitions and subject as a whole.
Perhaps someone has books on the history of mathematics that might
shed light on the development of matrix algebra. I've read that
matrices were being used in the form of determinants before an actual
definition of a matrix was created.
True; that's in the single-matrix case Tv = 0. It was a subject of
considerable interest whether a system of n equations in n variables
had zero, one, or infinite families of solutions.
Hagen
2005-07-19 08:44:56 UTC
Abstract coordinate free linear algebra is one
motivation to define matrices and their addition and
multiplication in the way they are defined. The following
facts give a guideline:

1. Every finitely generated vector space V (over some field
K) possesses a finite basis and is thus isomorphic to the
vector space K^n for some n (called the dimension of V).

2. A linear map f:V-->W between vector spaces (over the same
field K of course) of finite dimension can be concretely
described using an mxn matrix A, where m=dimW, n=dimV.

The description goes as follows: choose a basis B in V
and a basis C in W.
Let x be the vector of coordinates of an element v of V
with respect to B.
Let y be the vector of coordinates of f(v) with respect
to C.
Then y=Ax in terms of matrix multiplication.

3. The set Hom(V,W) of all linear maps f:V-->W forms
a vector space itself if one defines addition of linear
maps pointwise, and multiplication with a scalar as well.

Using 2 for fixed bases B,C the vector space Hom(V,W)
is isomorphic to the space M(mxn,K) of mxn matrices
with entries in K, where matrix addition is defined in
the usual way.
Indeed this fact can be considered as the reason why
addition is defined in the way it is.

4. The set End(V)=Hom(V,V) is a ring: addition is defined
as in 3 and multiplication is the composition of linear
maps.
Using 2 for a fixed basis B=C this ring is isomorphic to the
ring of square matrices M(nxn,K) with entries in K,
where the multiplication is defined in the usual way.
Again this fact can be considered as the reason to define
matrix multiplication in the way it is defined.

Remark: it is quite common among students to treat
matrices as if they were linear maps. But this point of
view easily leads to a lot of confusion and to
unnecessary work. Rather one should think of matrices
as coordinate descriptions of linear maps.
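
To make the remark concrete, here is a minimal Python sketch for V = W = R^2. The map f (swap the two components) and the second basis are made-up examples, and the helper names coords and matrix_of are hypothetical. It follows the recipe of point 2: column j of the matrix holds the coordinates of f(basis vector j). The same map then gets two different matrices, depending only on the basis chosen.

```python
from fractions import Fraction

def coords(v, basis):
    """Coordinates of v in R^2 relative to basis (b1, b2), via Cramer's rule."""
    (p, q), (r, s) = basis            # b1 = (p, q), b2 = (r, s)
    det = p * s - r * q               # nonzero because b1, b2 form a basis
    return (Fraction(v[0] * s - r * v[1], det),
            Fraction(p * v[1] - q * v[0], det))

def matrix_of(f, basis_V, basis_W):
    """Matrix of f: column j = coordinates of f(basis_V[j]) relative to basis_W."""
    cols = [coords(f(b), basis_W) for b in basis_V]
    return [[col[i] for col in cols] for i in range(2)]

def f(v):
    """Example linear map: swap the two components."""
    return (v[1], v[0])

standard = ((1, 0), (0, 1))
other = ((1, 1), (1, -1))

M_std = matrix_of(f, standard, standard)
M_other = matrix_of(f, other, other)
print(M_std)    # matrix of f in the standard basis
print(M_other)  # matrix of the same f in the other basis
```

In the standard basis the matrix has rows (0, 1) and (1, 0); in the basis ((1,1), (1,-1)) it is diagonal with entries 1 and -1, although the map itself has not changed.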
2005-07-20 16:28:41 UTC
Post by Hagen
Abstract coordinate free linear algebra is one
motivation to define matrices and their addition and
multiplication in the way they are defined. The following
1. Every finitely generated vector space V (over some field
K) possesses a finite basis and is thus isomorphic to the
vector space K^n for some n (called the dimension of V).
I learned that finite-dimensional vector spaces have a finite basis. When
you refer to "over some field K", do you mean whether real numbers,
integers, etc. are used for the entries of the matrices?
Post by Hagen
2. A linear map f:V-->W between vector spaces (over the same
field K of course) of finite dimension can be concretely
described using an mxn matrix A, where m=dimW, n=dimV.
Okay.
Post by Hagen
The description goes as follows: choose a basis B in V
and a basis C in W.
Let x be the vector of coordinates of an element v of V
with respect to B.
Let y be the vector of coordinates of f(v) with respect
to C.
Then y=Ax in terms of matrix multiplication.
Interesting.
Post by Hagen
3. The set Hom(V,W) of all linear maps f:V-->W forms
a vector space itself if one defines addition of linear
maps pointwise, and multiplication with a scalar as well.
Using 2 for fixed bases B,C the vector space Hom(V,W)
is isomorphic to the space M(mxn,K) of mxn matrices
with entries in K, where matrix addition is defined in
the usual way.
Indeed this fact can be considered as the reason why
addition is defined in the way it is.
4. The set End(V)=Hom(V,V) is a ring: addition is defined
as in 3 and multiplication is the composition of linear
maps.
Using 2 for a fixed basis B=C this ring is isomorphic to the
ring of square matrices M(nxn,K) with entries in K,
where the multiplication is defined in the usual way.
Again this fact can be considered as the reason to define
matrix multiplication in the way it is defined.
I'm unable to follow the above argument since I've been taught
linear algebra from the applied sciences perspective and so do not know
what Hom(V,W), End(V), M(mxn, K), refer to. However, I would guess that
M is the set of all mxn matrices over the field K, and Hom(V,W) is the
set of all homomorphisms from V to W (but I've not actually been taught
homomorphisms).
Post by Hagen
Remark: it is quite common among students to treat
matrices as if they were linear maps. But this point of
view easily leads to a lot of confusion and to
unnecessary work. Rather one should think of matrices
as coordinate descriptions of linear maps.
Coordinate descriptions of linear maps because the matrices depend
on which basis is used? I'd appreciate a further elaboration because I may
be one of the students you refer to.

Thank you very much for your assistance. It is more technical than
what I had expected, but I will do further study to understand about
Hom, End, etc.

Hagen
2005-07-21 08:54:04 UTC
Post by Hagen
Abstract coordinate free linear algebra is one
motivation to define matrices and their addition and
multiplication in the way they are defined. The following
facts give a guideline:
1. Every finitely generated vector space V (over some field
K) possesses a finite basis and is thus isomorphic to the
vector space K^n for some n (called the dimension of V).
I learned that finite-dimensional vector spaces have a finite basis. When
you refer to "over some field K", do you mean whether real numbers,
integers, etc. are used for the entries of the matrices?
Not exactly. What I mean is this: a vector space V over
a field K is a set of things (called vectors) that one
can add. (Of course one can make this precise by writing down
the axioms.)
Moreover one can multiply vectors v with elements of
the field K (called scalars) such that the following
rules hold:
(a+b)v=av+bv
(ab)v=a(bv)
a(v+w)=av+aw
where a,b are elements of K, v,w are elements of V.
Post by Hagen
3. The set Hom(V,W) of all linear maps f:V-->W forms
a vector space itself if one defines addition of linear
maps pointwise, and multiplication with a scalar as well.
Using 2 for fixed bases B,C the vector space Hom(V,W)
is isomorphic to the space M(mxn,K) of mxn matrices
with entries in K, where matrix addition is defined in
the usual way.
Indeed this fact can be considered as the reason why
addition is defined in the way it is.
4. The set End(V)=Hom(V,V) is a ring: addition is defined
as in 3 and multiplication is the composition of linear
maps.
Using 2 for a fixed basis B=C this ring is isomorphic to the
ring of square matrices M(nxn,K) with entries in K,
where the multiplication is defined in the usual way.
Again this fact can be considered as the reason to define
matrix multiplication in the way it is defined.
I'm unable to follow the above argument since I've been taught
linear algebra from the applied sciences perspective and so do not know
what Hom(V,W), End(V), M(mxn, K), refer to. However, I would guess that
M is the set of all mxn matrices over the field K, and Hom(V,W) is the
set of all homomorphisms from V to W (but I've not actually been taught
homomorphisms).
That's right. Anyway I defined these sets (in words)
in my original post.
Post by Hagen
Remark: it is quite common among students to treat
matrices as if they were linear maps. But this point of
view easily leads to a lot of confusion and to
unnecessary work. Rather one should think of matrices
as coordinate descriptions of linear maps.
Coordinate descriptions of linear maps because the matrices depend
on which basis is used?
Yes.
I'd appreciate a further elaboration because I may
be one of the students you refer to.
Here is an example that hopefully shows the point:

Let V be the set of all polynomials of the form
Ax+B, where A,B are the real coefficients, and x is
the variable of the polynomial.
So V consists of all polynomials in one variable of
degree <=1.

V is a vector space (over the reals), because you can
add polynomials of degree <=1 and get a polynomial of
the same type. Also you can multiply with a real number.

Now recall your analysis course and consider
differentiation: the first derivative (Ax+B)' of a
polynomial in V equals the constant polynomial A, right?

Consider the map:

D: V-->V, p-->p'

that maps a polynomial in V to its first derivative.

This is a linear map as you can easily check.
(However, no matrix appears here.)

What does a matrix description of the map D look like?

Ok, choose some basis of V. The obvious one is (1,x).
Use this basis >>on both sides of the map V-->V<<,
that is, the two bases called B and C in my original post
are the same.

The coordinate vector of p=Ax+B with respect to (1,x)
is the vector (B,A).
So in particular the coordinate vectors of 1 and x
are (1,0) and (0,1) respectively.
The coordinate vector of D(Ax+B)=A is (A,0).

From this information you can now derive that the
matrix representing the linear map D with respect to
the basis (1,x) is the 2x2 matrix

0 1
0 0
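
A minimal Python sketch of this derivation, under the assumption that the polynomial B + Ax is stored as its coordinate pair (B, A) relative to the basis (1,x). The function names are illustrative only. The matrix is assembled column by column from the derivatives of the two basis polynomials, and then checked against differentiating directly.

```python
def derivative(p):
    """p = (B, A) stands for the polynomial B + A*x; its derivative is the constant A."""
    B, A = p
    return (A, 0)

def mat_vec(M, v):
    """Apply matrix M (list of rows) to the coordinate vector v."""
    return tuple(sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M)))

basis = [(1, 0), (0, 1)]                   # coordinate vectors of 1 and of x
columns = [derivative(e) for e in basis]   # images of the basis polynomials
M = [[columns[j][i] for j in range(2)] for i in range(2)]

p = (5, 3)                                 # the polynomial 3x + 5
print(M)
print(mat_vec(M, p), derivative(p))        # both give the coordinates of the constant 3
```

Applying M twice gives the zero vector for every p, which matches the fact that the second derivative of a polynomial of degree <=1 vanishes.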
2005-07-22 17:03:55 UTC
Post by Hagen
From this information you can now derive that the
matrix representing the linear map D with respect to
the basis (1,x) is the 2x2 matrix
0 1
0 0
Hi,

I appreciate your response. I'm spending time to understand the origin
of all of the requirements, such as why the axioms are defined as such,
multiplication as such, etc. That is, going right down to the
fundamental thought processes that must have occurred within
mathematicians' minds when developing the various parts of the theory.
At the present time that has me figuring out why linear functions are
of such seeming importance and applicable to so much in the world.

I will write back once I have learned more.

2005-07-23 20:24:44 UTC
After some thought, I believe I now better understand the motivation
for matrix algebra, along with the fundamental thought processes.

Basically, matrix algebra, which has as its fundamental core the method
of matrix multiplication, is an algebra to conveniently deal with the
composition of linear functions. Such compositions frequently occur
during a coordinate transformation.

The theory can be developed as follows.

Suppose we have sets of equations representing coordinate
transformations.
x'' = a_11x' + a_12y'
y'' = a_21x' + a_22y'

x' = b_11x + b_12y
y' = b_21x + b_22y

Another representation, which I stress is only an equivalent
representation, is
(a_11 a_12)(x') = (x'')
(a_21 a_22)(y') = (y'')
and
(b_11 b_12)(x) = (x')
(b_21 b_22)(y) = (y')

Now, based on the meaning of '=', that is "equals", we may write
(a_11 a_12)[(b_11 b_12)(x)] = (x'')
(a_21 a_22)[(b_21 b_22)(y)] = (y'')

However, the algebra comes when we try to find a single object to
represent the two original ones. That is the new idea created by the
matrix algebra. Up to this point it has been nothing more than an
alternative representation for sets of equations.

If we perform substitutions into the original equations, we arrive at
the results of the known matrix multiplication. The abstraction of the
process into a formula is then done and the theory developed from
there.
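
Written out for the 2x2 case above, the substitution can be sketched as follows, using the same symbols as the two sets of equations:

```latex
\begin{align*}
x'' &= a_{11}x' + a_{12}y'
     = a_{11}(b_{11}x + b_{12}y) + a_{12}(b_{21}x + b_{22}y) \\
    &= (a_{11}b_{11} + a_{12}b_{21})\,x + (a_{11}b_{12} + a_{12}b_{22})\,y \\
y'' &= a_{21}x' + a_{22}y'
     = a_{21}(b_{11}x + b_{12}y) + a_{22}(b_{21}x + b_{22}y) \\
    &= (a_{21}b_{11} + a_{22}b_{21})\,x + (a_{21}b_{12} + a_{22}b_{22})\,y
\end{align*}
```

The four coefficients in parentheses are exactly the entries Sum_k a_ik*b_kj of the product of the a-matrix with the b-matrix.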

At its core though, this is the basis for the theory as I see it. That
FACT. :)

It's nice to see this. There is no magic in it at all. Just logic and
conciseness.

quasi
2005-07-24 00:27:43 UTC
After some thought, I believe I now better understand the motivation
for matrix algebra, along with the fundamental thought processes.
Basically, matrix algebra, which has as its fundamental core the method
of matrix multiplication, is an algebra to conveniently deal with the
composition of linear functions. Such compositions frequently occur
during a coordinate transformation.
The theory can be developed as follows.
Suppose we have sets of equations representing coordinate
transformations.
x'' = a_11x' + a_12y'
y'' = a_21x' + a_22y'
x' = b_11x + b_12y
y' = b_21x + b_22y
Another representation, which I stress is only an equivalent
representation, is
(a_11 a_12)(x') = (x'')
(a_21 a_22)(y') = (y'')
and
(b_11 b_12)(x) = (x')
(b_21 b_22)(y) = (y')
Now, based on the meaning of '=', that is "equals", we may write
(a_11 a_12)[(b_11 b_12)(x)] = (x'')
(a_21 a_22)[(b_21 b_22)(y)] = (y'')
However, the algebra comes when we try to find a single object to
represent the two original ones. That is the new idea created by the
matrix algebra. Up to this point it has been nothing more than an
alternative representation for sets of equations.
If we perform substitutions into the original equations, we arrive at
the results of the known matrix multiplication. The abstraction of the
process into a formula is then done and the theory developed from
there.
At its core though, this is the basis for the theory as I see it. That
FACT. :)
It's nice to see this. There is no magic in it at all. Just logic and
conciseness.
Well put.

In my opinion, you have succeeded in capturing the main motivation,
ignoring the side issues.

If linear transformations f,g have compatible dimensions so that f
composed with g makes sense, and if A, B are the matrix
representations of f,g respectively, then we would like to define a
product operation on matrices (defined whenever the associated
composition is defined) so that the matrix representation of f
composed with g is A*B.

There is only one such product that works and it's precisely the
standard matrix multiplication.

Note that by viewing matrices as representing linear maps, matrix
multiplication automatically satisfies a number of laws since the
underlying linear maps do, such as the associative law (since
composition is associative), and the distributive law.
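
These laws can be spot-checked numerically. The following is only a sketch on three made-up integer matrices, with illustrative helper names mat_mul and mat_add. It is a sanity check on one example, not a proof; the proof is the argument via composition given above.

```python
def mat_mul(A, B):
    """Usual product: (AB)_ij = sum over k of A_ik * B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, -1]]

assoc_left = mat_mul(mat_mul(A, B), C)    # (AB)C
assoc_right = mat_mul(A, mat_mul(B, C))   # A(BC)

dist_left = mat_mul(A, mat_add(B, C))                # A(B + C)
dist_right = mat_add(mat_mul(A, B), mat_mul(A, C))   # AB + AC

print(assoc_left == assoc_right, dist_left == dist_right)
```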

quasi
2005-07-24 20:06:25 UTC
Post by quasi
Well put.
In my opinion, you have succeeded in capturing the main motivation,
ignoring the side issues.
Thank you.

It was my intent, as stated, to understand the initial motivation
for the theory in terms of ideas and not the terminology or ideas
later developed.

Too often educators, at least many that have taught me, focus on
definitions and terminology or even far too abstract ideas before
initially getting across the actual motivation and specific ideas of
what is being taught. They forget that mathematical
definitions are not created out of thin air, but rather are created in
human brains which are driven by intent and motivation. Before we
students can properly understand the mathematics, we must understand the
specific and fundamental ideas in human terms, not through definitions
created after the fact; nor should we be assumed to have it all fit
together in our minds once we've memorized a set of definitions.

At least, for me, learning the ideas in human terms before anything
else is preferable. Since it's my choice when learning on my own, that
is what I do. And after a while of contemplation and searching the
fundamental ideas are found and after that I can understand why
definitions are given as such and theorems are true. It makes actual
understanding possible for me. I can't be alone in this manner.

What else is interesting is then determining just what it means
that matrix algebra can be applied to some physical problem based on
ideas and not just algebra. That is, where the multiplications,
additions, compositions have physical meaning and then to find that
meaning.

For instance, if a physical problem obeys the linear function
requirement, f(x + y) = f(x) + f(y), then that may hint at a more
fundamental concept of what constitutes x and y. I mean, if f
represents a law, then things, if broken down to x and y, may be able
to be more simply understood than x + y. Well, it is hard to explain.
Post by quasi
Note that by viewing matrices as representing linear maps, matrix
multiplication automatically satisfies a number of laws since the
underlying linear maps do, such as the associative law (since
composition is associative), and the distributive law.
Exactly! That is why I sought the motivation for matrix algebra. It
is discovering that motivation that leads to further ideas that extend
and propel the construction of the theory.

In a text I have it lists the axioms that must be satisfied. They
are introduced in a manner that makes one wonder just why they are
defined as such. The axioms are then used to construct the theory and
show that linear compositions, etc, can be represented. But in
actuality it was the other way around! The compositions motivated the
axioms. To put the theory on firm logical and mathematical ground the
theory is then expressed beginning with such axioms, and of course, it
works out in such a way that the mathematics has wide applicability.

I'm off to further my understanding of matrix algebra now, knowing
what is behind many of the definitions that I've previously learned.

James Dolan
2005-08-06 12:29:32 UTC

|Hello,
|
|In order to better understand matrix algebra, I seek the motivation
|behind its creation. That is, why are the concept of a matrix and
|matrix multiplication defined as they are?
|
|I read that matrix multiplication was defined so as to preserve the
|linear relationship under composition of linear functions, but I am
|not confident that I understand this.
|
|For instance, suppose
|
|G o F(x) = G(F(x)) where G and F are linear functions whose domains
|and ranges are compatible. Then clearly,
|
|G o F(ax + by) = aG(F(x)) + bG(F(y)).
|
|Now, if a linear function is represented in matrix algebra as a matrix,
|then why and how does the definition of matrix multiplication suit
|this?
|
|That is, why is matrix multiplication,
|
|C = AB <=> C_ij = Sum_{k=1}^{n} A_ik*B_kj
|
|defined as it is?
|
|
|As is evident, I am confused. The application of the definitions is
|fine, but I seek a reason or motivation for the very definitions of
|the matrix algebra. What was the intent in the minds of the humans
|that created this subject to begin with?
|
|If this is not explained well, please let me know. I truly wish to
|understand this.

i do find it somewhat difficult trying to understand what your real
question is. (perhaps not as difficult as trying to understand some
of the answers that people gave you, though.) anyway, i only have
something very trivial to say, but i'm not completely sure that it
isn't what you're looking for.

composition of functions is a more fundamental concept than
multiplication of matrixes. that is, you can define composition of
functions by the basic formula

[g o f](x) = g(f(x))

and then multiplication of matrixes "defines itself" (as a special
case of composition of functions) once you specify how to interpret a
matrix

a11 ... am1
 .       .
 .       .
 .       .
a1n ... amn

as a function f, namely by taking

f(x1,...,xm) = (sum_[j=1]^m aj1*xj, ... , sum_[j=1]^m ajn*xj).

thus if you can make yourself happy with the idea of defining linear
functions as those that arise from matrixes in this way, then you
should also be able to see how this compels you to invent the standard
concept of "matrix multiplication".
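
a minimal python sketch of this point, with made-up matrices F and G and hypothetical helper names. each matrix is stored as the list of its displayed rows and turned into a function as described above; the two functions are composed as plain functions, and the matrix of the composite is then read off by feeding it unit vectors. nothing in the composition step knows about matrix multiplication, yet the recovered matrix agrees with the standard product.

```python
def as_function(M):
    """Interpret M (list of displayed rows) as the function x -> (sum_j M[i][j]*x[j])_i."""
    def f(x):
        return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in M)
    return f

def compose(g, f):
    """Plain composition of functions: [g o f](x) = g(f(x))."""
    return lambda x: g(f(x))

def matrix_from(f, m):
    """Recover the matrix of a linear f on m inputs: f(e_j) is column j."""
    cols = [f(tuple(1 if i == j else 0 for i in range(m))) for j in range(m)]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

F = [[1, 2, 0], [0, 1, -1]]      # 3 inputs -> 2 outputs
G = [[2, 1], [0, 3], [1, 1]]     # 2 inputs -> 3 outputs

recovered = matrix_from(compose(as_function(G), as_function(F)), 3)

# standard product GF, computed from the usual formula for comparison
product = [[sum(G[i][k] * F[k][j] for k in range(2)) for j in range(3)]
           for i in range(3)]

print(recovered == product)
```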
