Calculate value of double summation of function over the pairs of vertices of a graph Calculate value of double summation of function over the pairs of vertices of a graph numpy numpy

Calculate value of double summation of function over the pairs of vertices of a graph


The expected answer R_inf[1,2] = 0.8 and R_inf[6,8] = 0.8 of OP's example a is the result of the algorithm applied to the directed graph a.T, not the undirected.

Here is R_2 computed with OP's code (with some minor fixes) and with my code. And R_inf also computed with my code:

after 2 iterations with pp_loop:0.8000 0      0      0      0      0      0      0.8000 0     0      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0      0      0.8000 0.4800 0      0      0      0     0      0      0      0.4800 0.8000 0      0      0      0     0      0.4000 0.4000 0      0      0.8000 0.1600 0      0.16000      0.3200 0.3200 0      0      0.1600 0.8000 0      0.80000.8000 0      0      0      0      0      0      0.8000 0     0      0.3200 0.3200 0      0      0.1600 0.8000 0      0.8000after 2 iterations with OP:0.8000 0      0      0      0      0      0      0.8000 0     0      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0      0      0.8000 0.4800 0      0      0      0     0      0      0      0.4800 0.8000 0      0      0      0     0      0.4000 0.4000 0      0      0.8000 0.1600 0      0.16000      0.3200 0.3200 0      0      0.1600 0.8000 0      0.80000.8000 0      0      0      0      0      0      0.8000 0     0      0.3200 0.3200 0      0      0.1600 0.8000 0      0.8000in fixed point:0.8000 0      0      0      0      0      0      0.8000 0     0      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0.8000 0.8000 0      0      0.4000 0.3200 0      0.32000      0      0      0.8000 0.4800 0.0960 0.0256 0      0.02560      0      0      0.4800 0.8000 0.1280 0      0      0     0      0.4000 0.4000 0.0960 0.1280 0.8000 0.1600 0      0.16000      0.3200 0.3200 0.0256 0      0.1600 0.8000 0      0.80000.8000 0      0      0      0      0      0      0.8000 0     0      0.3200 0.3200 0.0256 0      0.1600 0.8000 0      0.8000

My solution is faster and can compute the fixed point for networks up to ~1000 nodes.

A bit of explanation: The strategy is to express the recurrence relation by matrix algebra, because then we can leverage the highly optimized blas powered linear algebra that comes with numpy/scipy. For example, the inner sums - all of them in one go - can be done by multiplying the matrix R_k of current values with the transposed adjacency matrix. This may seem wasteful considering all the zeros in the adjacency matrix but blas is just very, very fast. Similarly, we can compute the sizes of the intersections by multiplying the adjacency matrix with its transpose, etc.

Assuming that the ultimate goal is not so much the sequence of R's but the fixed point we can save some more time by using a proper linear solver, because the recurrence is linear (+ offset). As writing the n^2xn^2 matrix of this operation is not efficient we use an iterative solver that can use a LinearOperator in place of a matrix.

If tolerances are set tight enough both approaches yield the same fixed point.

Code including OP's with fixes:

import numpy as npfrom scipy import sparsefrom scipy.sparse import linalgdef mock_data(n, p):    data = (np.random.random((n, n)) < p).astype(int)    np.einsum('ii->i', data)[:] = 0    return dataimport copyimport scipyimport scipy.sparse as spimport timeimport warningsclass Algo():    def __init__(self, c=0.8, max_iter=5, is_verbose=False, eps=1e-3):        self.c = c        self.max_iter = max_iter        self.is_verbose = is_verbose        self.eps = eps    def compute_similarity(self, a, R0=None, use_undirected=False,                           no_neighbor_policy='10'):        '''        Note: The adjacency matrix is a (0,1)-matrix with zeros on its         diagonal.        '''                                         a_transpose=np.transpose(a)        a_undirected=np.add(a,a_transpose)        a_use = a_undirected if use_undirected else a        self.num_nodes = np.shape(a)[0]        current_sim = np.eye(self.num_nodes) if R0 is None else R0        prev_sim = np.zeros(shape=(self.num_nodes, self.num_nodes))        #Determine the set of Neighbors for each node        nbs = []        for i in range(self.num_nodes):            nbs.append(np.nonzero(a_use[i])[0])        R = []        for itr in range(self.max_iter):            if scipy.allclose(current_sim, prev_sim, atol=self.eps):                break            prev_sim = copy.deepcopy(current_sim)            R.append(prev_sim)            for i in range(self.num_nodes):                for j in range(self.num_nodes):                    if len(nbs[i]) == 0 and len(nbs[j]) == 0:                        current_sim[i][j] = (                            0 if (no_neighbor_policy=='00' or                                  (no_neighbor_policy=='10' and i!=j))                            else self.c)                    elif i == j:                        current_sim[i][j] = self.c                    elif len(nbs[i]) == 0 or len(nbs[j]) == 0:                        current_sim[i][j] = 0                    else:                        #commonSet=0.0                        differenceSetofp_ij=0.0                        differenceSetofq_ij=0.0                        commonSet=len(np.intersect1d(nbs[i],nbs[j]))/np.maximum(len(np.union1d(nbs[i],nbs[j])), 1)                        for x in np.setdiff1d(nbs[i],nbs[j]):                            for y in nbs[j]:                                differenceSetofp_ij+=prev_sim[x][y]                        differenceSetofp_ij*=1/np.maximum(len(np.union1d(nbs[i],nbs[j]))*len(nbs[j]), 1)                                #print differenceSetofp                        for x in np.setdiff1d(nbs[j],nbs[i]):                            for y in nbs[i]:                                differenceSetofq_ij+=prev_sim[x][y]                        differenceSetofq_ij*=1/np.maximum(len(np.union1d(nbs[j],nbs[i]))*len(nbs[i]), 1)                         current_sim[i][j] = self.c*(commonSet+differenceSetofp_ij+differenceSetofq_ij)        if self.is_verbose:            print('SimRank: converged after {0} iterations'.format(itr))        R.append(current_sim)        return np.array(R)def f_OP(adj, c, max_n, R0=None, use_undirected=False, no_neighbor_policy='10'):    return Algo(c, max_n).compute_similarity(adj, R0, use_undirected,                                             no_neighbor_policy)def f_pp(adj, c, max_n, R0=None, use_undirected=False,         no_neighbor_policy='10', solver=linalg.gcrotmk):    n, n = adj.shape    if use_undirected:         adj = adj | adj.T    aind = np.where(adj.T)    n_neigh = adj.sum(axis=1)    n_inter = adj @ adj.T    n_union = np.add.outer(n_neigh, n_neigh) - n_inter    r_union = c / np.maximum(n_union, 1)    if no_neighbor_policy == '11':        n_inter[n_union==0] = 1        r_union[n_union==0] = c    elif no_neighbor_policy == '10':        np.einsum('ii->i', n_inter)[np.einsum('ii->i', n_union)==0] = 1        np.einsum('ii->i', r_union)[np.einsum('ii->i', n_union)==0] = c    elif no_neighbor_policy != '00':        raise ValueError    avg_ngh = adj.T / np.maximum(n_neigh, 1)    if solver is None:        R = np.empty((max_n+1, n, n))        R[0] = 0 if R0 is None else R0        if R0 is None:            np.einsum('ii->i', R[0])[:] = 1               for i in range(max_n):            rowsums = R[i] @ avg_ngh            rowsums[aind] = 0            rec_sum = adj @ rowsums            R[i+1] = r_union * (n_inter + rec_sum + rec_sum.T)            if np.allclose(R[i+1], R[i], rtol=1e-7, atol=1e-10):                return R[:i+2]        return R    else:        def LO(x):            R = x.reshape(n, n)            rowsums = R @ avg_ngh            rowsums[aind] = 0            rec_sum = adj @ rowsums            return (r_union * (rec_sum + rec_sum.T) - R).ravel()        R0 = np.identity(n) if R0 == None else R0        LO = linalg.LinearOperator((n*n, n*n), matvec=LO)#, rmatvec=LO)        res = solver(LO, (-r_union * n_inter).ravel(), R0.ravel(), tol=1e-9)        assert res[1]==0        return res[0].reshape(n, n)def f_pp_loop(adj, c, max_n, R0=None, use_undirected=False,              no_neighbor_policy='10'):    return f_pp(adj, c, max_n, R0, use_undirected, no_neighbor_policy, None)# test on OP's example:a = np.array([ [0, 1, 1, 0, 0, 1, 0, 0, 0],               [0, 0, 0, 0, 1, 0, 0, 0, 0],               [0, 0, 0, 1, 0, 0, 0, 0, 0],               [0, 0, 0, 0, 0, 0, 0, 0, 0],               [0, 0, 0, 0, 0, 0, 1, 0, 1],               [0, 0, 0, 1, 0, 0, 0, 0, 0],               [0, 0, 0, 0, 0, 1, 0, 0, 0],               [0, 0, 0, 0, 0, 0, 1, 0, 1],               [0, 0, 0, 0, 0, 0, 0, 0, 0]])a = a.Tm, c, nnp = 2, 0.8, '11'sol = {}for f in f_pp_loop, f_OP:    name = f.__name__[2:]    sol[name] = f(a, c, m, None, False, nnp)    print(f"after {len(sol[name])-1} iterations with {name}:")    print("\n".join(" ".join("0     " if i == 0 else f"{i:6.4f}" for i in row) for row in sol[name][-1]))print("in fixed point:")print("\n".join(" ".join("0     " if i == 0 else f"{i:6.4f}" for i in row) for row in f_pp(a, c, 64, None, False, nnp)))print()# demo large networkn, m, p, c, nnp = 1000, 256, 0.02, 0.9, '11'adj = mock_data(n, p)print(f'size {n}, maxiter {m}, connection prob {p}, c {c}')from time import timet = time()spp_loop = f_pp_loop(adj, c, m, None, False, nnp)print('pp_loop: {:8.5f} sec'.format(time()-t))print('squared rel err after {} iterations: {}'.format(len(spp_loop)-1, np.nanmax(((spp_loop[-1]-spp_loop[-2])/(spp_loop[-1]+spp_loop[-2]))**2)))t = time()spp = f_pp(adj, c, m, None, False, nnp)print('pp: {:8.5f} sec'.format(time()-t))def comp(a, b):    print(np.allclose(a, b))    print('abserr', np.nanmax(np.abs(a-b)), 'relerr', np.nanmax(2 * np.abs(a-b)/(a+b)))print('fixed points equal:', end=' ')comp(spp_loop[-1], spp)


I'll use a NumPy array to hold the output (for ease of creation and indexing) and networkx for generating an example of a graph and providing sets of neighbors. Neither module is essential, as I'm not using any advanced features of them. Here is the setup:

import numpy as npimport networkx as nx G = nx.petersen_graph()    # example graphvlist = list(G.nodes)      # list of verticesV = len(vlist)             # number of verticeskmax = 4                   # maximal kC = 2                      # mysterious constant CR = np.zeros((kmax, V, V)) # array of results, R[k, i, j]np.fill_diagonal(R[0], 1)  # initialize R[0] by putting 1 on its diagonal

And here is the loop calculating R[k+1, i, j]. The main characters are Lp and Lq, the sets of neighbors of p and q. Python sets are intersected with Lp & Lq, joined with Lp | Lq, and subtracted with Lp - Lq. The lines with Rij assignments are pretty much verbatim copies of the given formulas.

for k in range(kmax-1):  for i in range(V):    for j in range(V):      Lp, Lq = set(G[vlist[i]]), set(G[vlist[j]])      Rij = len(Lp & Lq) / len(Lp | Lq)      Rij += sum([R[k, p1, q1] for q1 in Lq for p1 in Lp - Lq]) / (len(Lp | Lq) * len(Lq))      Rij += sum([R[k, p1, q1] for p1 in Lp for q1 in Lq - Lp]) / (len(Lp | Lq) * len(Lp))      R[k+1, i, j] = C*Rij