The first key observation: for each d between 1 and D, inclusive, we will compute the number of pairs of vertices that have distance d. For our first solution, we will do this incrementally in T. For T=1, we have one pair of vertices with distance 1, and nothing else. What happens if we already know everything about the tree for some T, and increment it to T+1? First of all, we get 2^T new pairs with distance 1: each of the old vertices with its new child. Additionally, each pair of original vertices u, v will generate four pairs in the new tree: (u,v), (u',v), (u,v'), and (u',v'). Here, u' is the new child of u, and v' the new child of v. If the original pair had distance d, the four new pairs have distances d, d+1, d+1, and d+2. We could now use dynamic programming: if there are 47 different pairs with distance d in the old tree, these would produce 47 pairs with distance d, 94 pairs with distance d+1, and 47 pairs with distance d+2. This would give us an O(DT) solution if we are looking for paths of length D in a tree that is created in T steps. Of course, this would hardly work for T=10^9. Luckily, the dynamic programming only uses linear equations, which means that we can instead compute it using matrix multiplication. We can find a matrix A with the following property: if v=(2^T, c_1, c_2, ..., c_D) is a vector containing the number 2^T and the counts c_1 ... c_D of pairs in distance 1 ... D in the generation-T tree, the vector v*A describes the generation-(T+1) tree in the same way. Once we have constructed this A, the answer can be computed as the last element of the vector (1,0,...,0)*A^T. The above solution computes the answer in O(D^3 * log T). Sadly, with D=500 that is still too slow (but not by much). What can be improved? The final step was to notice that the matrix A has a special form -- it is almost diagonal, and almost all rows look roughly the same. The powers of A will also have a special form, and using that observation we can find a way to muliply two powers of A in O(D^2) time. That gives us a solution that runs in O(D^2 * log T), which is already fast enough.