Matrix Formats¶
IceBin generates matrices and vectors that are store on-disk in one of two formats: eigen or compressed.
Eigen Format¶
Eigen Format stores matrices in NetCDF in a straightforward,
flexible format. Consider the following NetCDF file, for
example, which defines a matrix BvA.M and vectors BvA.Mw and
BvA.wM.
netcdf __linear1 {
dimensions:
dimB.dense_extent = 5 ;
dimA.dense_extent = 6 ;
BvA.M.nnz = 6 ;
BvA.M.rank = 2 ;
variables:
int64 dimB(dimB.dense_extent) ;
dimB:sparse_extent = 40LL ;
int64 dimA(dimA.dense_extent) ;
dimA:sparse_extent = 50LL ;
int BvA.info ;
BvA.info:type = "EIGEN" ;
BvA.info:conservative = 0 ;
string BvA.info:dim_names = "BvA.dimB", "BvA.dimA" ;
int64 BvA.M.info ;
BvA.M.info:shape = 5LL, 6LL ;
BvA.M.info:conservative = "f" ;
int BvA.M.indices(BvA.M.nnz, BvA.M.rank) ;
double BvA.M.values(BvA.M.nnz) ;
double BvA.Mw(BvA.dimA.dense_extent) ;
double BvA.wM(BvA.dimB.dense_extent) ;
}
In this case, two vector spaces are involved: \(\cal{A}\) and
\(\cal{B}\). The variable dimA defines a “dense” vector space
by listing the indices of \(\cal{A}\) that are used in vectors and
matrices. The attribute dimA:sparse_extent give the rank of the
original sparse vector space \(\cal{A}\); and the dimension
dimA.dense_extent gives the rank of the densified vector space
defined by dimA.
Vectors are stored in regular dense format in the dense space
dimA. For example, if BvA.Mw[0] == 28.9 and dimA[0] ==
17, then the element at index 17 in the original sparse vector
space \(\cal{A}\) equals 28.9.
Matrices (tensors) are stored in as coordinate-format sparse matrices
within the densified vector space dimA and dimB. For the
matrix BvA.M, coordinate indices are stored in BvA.M.indices
and the sparse matrix value at those indices is stored at
BvA.M.indices. In addition, the attribute
BvA.M.info:conservative declares whether this (regridding) matrix
is conservative; non-conservative regridding matrices require a
conservation correction when applied.
Compressed Format¶
Compressed format uses zlib compression to reduce the on-disk and in-memory footprint of large matrices, while still allowing for them to be multiplied by vectors. Compressed format vectors and matrices are stored in the original sparse vector space \(cal{A}\), allowing for easy use in applications where sparse vector spaces are not densified (such as ModelE). Here is an example:
netcdf __linear3 {
dimensions:
BvA.wM.indices.zsize = 29 ;
BvA.wM.values.zsize = 36 ;
BvA.M.indices.zsize = 41 ;
BvA.M.values.zsize = 30 ;
BvA.Mw.indices.zsize = 30 ;
BvA.Mw.values.zsize = 33 ;
variables:
int BvA.info ;
BvA.info:type = "COMPRESSED" ;
BvA.info:conservative = 0 ;
int BvA.wM.info ;
BvA.wM.info:format = "ZARRAY" ;
BvA.wM.info:rank = 1 ;
BvA.wM.info:nnz = 3LL ;
BvA.wM.info:shape = 40LL ;
ubyte BvA.wM.indices(BvA.wM.indices.zsize) ;
ubyte BvA.wM.values(BvA.wM.values.zsize) ;
int BvA.M.info ;
BvA.M.info:format = "ZARRAY" ;
BvA.M.info:rank = 2 ;
BvA.M.info:nnz = 6LL ;
BvA.M.info:shape = 40LL, 50LL ;
ubyte BvA.M.indices(BvA.M.indices.zsize) ;
ubyte BvA.M.values(BvA.M.values.zsize) ;
int BvA.Mw.info ;
BvA.Mw.info:format = "ZARRAY" ;
BvA.Mw.info:rank = 1 ;
BvA.Mw.info:nnz = 4LL ;
BvA.Mw.info:shape = 50LL ;
ubyte BvA.Mw.indices(BvA.Mw.indices.zsize) ;
ubyte BvA.Mw.values(BvA.Mw.values.zsize) ;
}
Vectors and matrices are both stored as compressed tensors of rank 1
or 2, respectively. In the above case, the variables
BvA.M.indices and BvA.M.values store the matrix in coordinate
format; however, the contents of those arrays is not directly readable
without first running through decompression. IceBin provides
libraries, accessible from Python and C++, that will decompress these
sparse matrices as needed.
Python API¶
The following sample program demonstrates how to read and use sparse matrices to NetCDF using Python. The Python API for both of matrices is the same, shielding the user from implementation differences between the two.
IceBin typically generates a regridding matrix, along with weight
vectors for the two dimensions involved in the matrix. Depending on
the case, the weight vectors may or may not be the same as the sum of
rows and columns of the matrix. The Python class
ibmisc.linear_Weighted holds a matrix plus two vectors, and allows
basic multiplication and dot product operations on them.
Loading¶
To load a linear_Weighted from a NetCDF file, use (for example):
with ibmisc.NcIO('file.nc', 'r') as ncio:
lw = ibmisc.nc_read_weighted(ncio, 'BvA')
Dot Products¶
The matrix in linear_Weighted may be applied to a vector using the
apply_M() method. Note that the input A_s is in the
original sparse vector space \(\cal{A}\):
def apply_M(self, A_s, fill=np.nan, bool force_conservation=True):
"""Applies the regrid matrix to A_s.
A_s: Either:
- A single vector (1-D array) to be transformed.
- A 2-D array of row vectors to be transformed.
fill:
Un-set indices in output array will get this value.
force_conservation: bool
If M is not conservative, apply a conservation correction
at the end"""
Similarly, the method apply_weight() takes inner products with the
weight vectors:
def apply_weight(self, int dim, A_s):
"""Computes dot product of a weight vector with A_s.
dim:
0 = use weight vector for B (output) dimension
1 = use weight vector for A (input) dimension
A_s: Either:
- A single vector (1-D array) to be transformed.
- A 2-D array of row vectors to be transformed."""
def apply_wM(self, A_s):
return self.apply_weight(0,A_s)
def apply_Mw(self, A_s):
return self.apply_weight(1,A_s)
Extraction¶
The method to_coo() returns the matrix in standard Python
coordinate format, as a scipy.sparse.coo_matrix object.
Similarly, the method get_weights(dim) returns the weight vector
(0=B, 1=A) as a numpy.array.