What's the fastest way to find deepest path in a 3D array? What's the fastest way to find deepest path in a 3D array? arrays arrays

What's the fastest way to find deepest path in a 3D array?


I think you should be able to do it in O(N).When you parse your input, assign each node a 'caveNumber' initialized to 0. Set it to a valid number whenever you visit a cave:

CaveCount = 0, IsolatedCaveCount=0AllSizes = new Vector.For each node,    ProcessNode(size:0,depth:0);ProcessNode(size,depth):   If node.isCave and !node.caveNumber        if (size==0) ++CaveCount       if (size==0 and depth!=0) IsolatedCaveCount++       node.caveNumber = CaveCount       AllSizes[CaveCount]++       For each neighbor of node,             if (goingDeeper) depth++            ProcessNode(size+1, depth).

You will visit each node 7 times at worst case: once from the outer loop, and possibly once from each of its six neighbors. But you'll only work on each one once, since after that the caveNumber is set, and you ignore it.

You can do the depth tracking by adding a depth parameter to the recursive ProcessNode call, and only incrementing it when visiting a lower neighbor.


The solution shown below (as a python program) runs in time O(n lg*(n)), where lg*(n) is the nearly-constant iterated-log function often associated with union operations in disjoint-set forests.

In the first pass through all cells, the program creates a disjoint-set forest, using routines called makeset(), findset(), link(), and union(), just as explained in section 22.3 (Disjoint-set forests) of edition 1 of Cormen/Leiserson/Rivest. In later passes through the cells, it counts the number of members of each disjoint forest, checks the depth, etc. The first pass runs in time O(n lg*(n)) and later passes run in time O(n) but by simple program changes some of the passes could run in O(c) or O(b) for c caves with a total of b cells.

Note that the code shown below is not subject to the error contained in a previous answer, where the previous answer's pseudo-code contains the line
if (size==0 and depth!=0) IsolatedCaveCount++
The error in that line is that a cave with a connection to the surface might have underground rising branches, which the other answer would erroneously add to its total of isolated caves.

The code shown below produces the following output:
Deepest: 5 Largest: 22 Isolated: 3
(Note that the count of 24 shown in your diagram should be 22, from 4+9+9.)

v=[0b0000010000000000100111000,   # Cave map   0b0000000100000110001100000,   0b0000000000000001100111000,   0b0000000000111001110111100,   0b0000100000111001110111101]nx, ny, nz = 5, 5, 5inlay, ncells = (nx+1) * ny,  (nx+1) * ny * nzmasks = []for r in range(ny):    masks += [2**j for j in range(nx*ny)][nx*r:nx*r+nx] + [0]p = [-1 for i in range(ncells)]  # parent linksr = [ 0 for i in range(ncells)]  # rankc = [ 0 for i in range(ncells)]  # forest-size countsd = [-1 for i in range(ncells)]  # depthsdef makeset(x): # Ref: CLR 22.3, Disjoint-set forests    p[x] = x    r[x] = 0def findset(x):    if x != p[x]:        p[x] = findset(p[x])    return p[x]def link(x,y):    if r[x] > r[y]:        p[y] = x    else:        p[x] = y        if r[x] == r[y]:            r[y] += 1def union(x,y):    link(findset(x), findset(y))fa = 0                          # fa = floor abovebc = 0                          # bc = floor's base cell #for f in v:                     # f = current-floor map    cn = bc-1                   # cn = cell#    ml = 0    for m in masks:        cn += 1        if m & f:            makeset(cn)            if ml & f:                union(cn, cn-1)            mr = m>>nx            if mr and mr & f:                union(cn, cn-nx-1)            if m & fa:                union(cn, cn-inlay)        ml = m    bc += inlay    fa = ffor i in range(inlay):    findset(i)    if p[i] > -1:        d[p[i]] = 0for i in range(ncells):    if p[i] > -1:        c[findset(i)] += 1        if d[p[i]] > -1:            d[p[i]] = max(d[p[i]], i//inlay)isola = len([i for i in range(ncells) if c[i] > 0 and d[p[i]] < 0])print "Deepest:", 1+max(d), "  Largest:", max(c), "  Isolated:", isola


It sounds like you're solving a "connected components" problem. If your 3D array can be converted to a bit array (e.g. 0 = bedrock, 1 = cave, or vice versa) then you can apply a technique used in image processing to find the number and dimensions of either the foreground or background.

Typically this algorithm is applied in 2D images to find "connected components" or "blobs" of the same color. If possible, find a "single pass" algorithm:

http://en.wikipedia.org/wiki/Connected-component_labeling

The same technique can be applied to 3D data. Googling "connected components 3D" will yield links like this one:

http://www.ecse.rpi.edu/Homepages/wrf/pmwiki/pmwiki.php/Research/ConnectedComponents

Once the algorithm has finished processing your 3D array, you'll have a list of labeled, connected regions, and each region will be a list of voxels (volume elements analogous to image pixels). You can then analyze each labeled region to determine volume, closeness to the surface, height, etc.

Implementing these algorithms can be a little tricky, and you might want to try a 2D implementation first. Thought it might not be as efficient as you like, you could create a 3D connected component labeling algorithm by applying a 2D algorithm iteratively to each layer and then relabeling the connected regions from the top layer to the bottom layer:

  1. For layer 0, find all connected regions using the 2D connected component algorithm
  2. For layer 1, find all connected regions.
  3. If any labeled pixel in layer 0 sits directly over a labeled pixel in layer 1, change all the labels in layer 1 to the label in layer 0.
  4. Apply this labeling technique iteratively through the stack until you reach layer N.

One important considering in connected component labeling is how one considers regions to be connected. In a 2D image (or 2D array) of bits, we can consider either the "4-connected" region of neighbor elements

X 1 X1 C 1X 1 X

where "C" is the center element, "1" indicates neighbors that would be considered connected, and "X" are adjacent neighbors that we do not consider connected. Another option is to consider "8-connected neighbors":

1 1 11 C 11 1 1

That is, every element adjacent to a central pixel is considered connected. At first this may sound like the better option. In real-world 2D image data a chessboard pattern of noise or diagonal string of single noise pixels will be detected as a connected region, so we typically test for 4-connectivity.

For 3D data you can consider either 6-connectivity or 26-connectivity: 6-connectivity considers only the neighbor pixels that share a full cube face with the center voxel, and 26-connectivity considers every adjacent pixel around the center voxel. You mention that "diagonally placed" doesn't count, so 6-connectivity should suffice.