My question is about artificial intelligence, specifically neural networks.Hi I'm wondering what search method does back propagation use and its search space. I can't find resources that state what it is.

As a physicist who knows (something) about General Relativity, I've costumed to the term "maximally symmetric space"... being an $n$-dimensional manifold with $\frac{n(n+1)}{2}$ Killing vectors. A demonstration of that can be found in Weinberg's book "Gravitation and Cosmology", but it is always assumed that the manifold is Riemannian, and the connection is the Levi-Civita connection.

**Question(s)**: Is the above statement true for affine manifolds? Can you recommend me some bibliography on the subject?

*P.D.*: I'm interested in considering connections with torsion and non-metricity.

From a planar graph $\Gamma$, equipped with an integer-valued weight function $d:E(\Gamma) \sqcup V(\Gamma) \to \mathbb{Z}$, one can build a $3$-manifold $M_{\Gamma}$ as follows. For each vertex $v$, draw a small planar unknot centered at $v$. For each edge $e$ connecting vertices $v$ and $w$, add a series of $d(e)$ clasps between the corresponding unknots (with positive weights represented by right-handed clasps and negative weights represented by left-handed clasps). The result is a link with unknotted components - call it $L_{\Gamma}$. To get the $3$-manifold $M_{\Gamma}$, perform Dehn surgery on each component of $L_{\Gamma}$, with framings $$ f(v) = d(v) + \sum_{e \ni v} d(e). $$

In fact, *every 3-manifold $M$ is diffeomorphic to a manifold of the form $M_{\Gamma}$*. This can be proved using ideas from a paper of Matveev and Polyak (*A Geometrical Presentation of the Surface Mapping Class Group and Surgery*), as follows. Choose a Heegaard splitting for $M$, and write the gluing map as a composition of Lickorish twists. Then use the graphical calculus in sections 3 and 4 of that paper (omitting the twists denoted $\epsilon_i$) to produce a tangle whose plat closure is a framed link of the form $L_{\Gamma}$, such that the result of surgery on this link is $M$. This argument is given by Polyak in slides available on his website.

There is another proof as well, which involves taking an arbitrary link surgery presentation and repeatedly simplifying it. There is a natural way to measure the complexity of a link diagram, such that links of minimal complexity are of the form $L_{\Gamma}$. It is always possible to reduce this complexity by adding cancelling unknotted components and doing handleslides.

One can take the idea further and show that there is a finite set of local moves which suffice to relate any two weighted planar graphs representing the same 3-manifold. Indeed, this follows abstractly from the fact that the mapping class group is finitely presented. However, the moves produced by Wajnryb's presentation (for example) are rather complicated and nasty-looking. It is therefore natural to ask if one can find a more appealing set of moves.

One must certainly include the following moves (please excuse the lack of pictures):

- Self-loops and edges with weight $0$ can be eliminated.
- Any two parallel edges can be combined, at the expense of adding their weights.
- Suppose that $v$ is a vertex incident to exactly one edge $e$, with $d(e) = \pm f(v) = \pm 1$. Then $v$ and $e$ can be deleted, at the expensing of changing the framing on the other endpoint of $e$, call it $w$. If $f(v) = 0$, then $w$ (and all of its incident edges) can be removed from the graph .
- Suppose that $v$ is a vertex incident to exactly two edges $e_1$ and $e_2$, with $d(e_1) = d(e_2) = - f(v) = \pm 1$, then the vertex $v$ can be replaced with a single edge joining the opposite endpoints of $e_1$ and $e_2$, call them $w_1$ and $w_2$. If $d(e_1) = - d(e_2)$ and $f(v) = 0$, then $e_1$ and $e_2$ can be contracted, with the resulting vertex having weight $d(w_1) + d(w_2)$.
- Suppose that $v$ is a vertex incident to exactly three edges, and suppose that $d(e_1) = d(e_2) = -d(e_3) = f(v)$. Then $v$, together with all $e_i$, can be eliminated at the expense of adding a triangle connecting the opposite endpoints of the edges $e_i$.

Let's call any of the above moves a "blowdown", and let's call their inverses "blowups". All of them can be easily deduced using Kirby calculus, or from relations in the mapping class group.

**Question 1**: Are blowups and blowdowns sufficient to relate
any two planar graph presentations of a given $3$-manifold?

If the answer to this question is no, then there are additional non-local moves to consider:

- If two edges $e_1$ and $e_2$ connect the same pair of vertices (but are not necessarily parallel), then they can be combined, at the expense of adding their weights.
- If there is a vertex $v$ which divides $\Gamma$ into multiple components, those components can be "permuted around $v$". Any of these components which is connected to $v$ by a single edge $e$ with weight $\pm 1$ can also be "flipped over", at the expense of changing the weight on $e$.
- If there are two vertices $v$ and $w$ which separate the graph into multiple components, then any component which is joined to both $v$ and $w$ by a single pair of edges with opposite weights in $\{\pm 1\}$ can be "flipped over", at the expense of changing the weights on the edges.

Let's call any of the above moves a ``mutation''.

**Question 2**: Are blowups, blowdowns, and mutations sufficient to relate any two
planar graph presentations of a given $3$-manifold?

If the answer to this question is also no, then it would be good to have a nice answer to the following (admittedly vague) question:

**Question 3**: What is the "simplest possible" set of moves which are sufficient to relate any two planar graph presentations of a given $3$-manifold?

One argument in favor of a positive answer to Questions 1 or 2, or at least a very nice answer to Question 3, is that Kirby calculus itself admits a finite set of simple local moves. One approach might be to find a *canonical* way of simplifying an arbitrary link surgery presentation to a planar graph presentation, and trace the effects of a Kirby move through the simplification process to see which graph moves are required to implement it.

There is also a relationship with double branched covers, which might be relevant. If all vertex weights $d(v) = 0$, then $M_{\Gamma}$ can be identified (after doing surgery on an essential 2-sphere) with the double cover branched over a link $Z \subset S^3$, whose "checkerboard graph" is $\Gamma$. In this picture, Reidemeister moves on $Z$ can be realized by blowdowns and blowups, and Conway mutations can be realized by graph mutations. Note that these moves do not suffice to relate links with diffeomorphic double branched covers - this might be viewed as evidence in favor of a negative answer to Questions 1 and 2.

Let $X$ be the stack of rank $1$ degree $b$ coherent sheaves $E$ with torsion of length at most 1 on an elliptic curve $C$. Let $Y$ be the stack of pairs $E^{'} \subset E$ such that $E \in X$ and $E/E^{'}\cong \mathcal O_x$ for a point $x \in C$. Let $X_1$ be the stack of rank $1$ degree $b$ coherent sheaves whose torsion has length exactly $1$. Question: is it true (and if so, how to see) that $Y$ is the blow-up of $X \times C$ along $X_1$, where $X_1 \subset X \times C$ by $E \mapsto (E, \rm{Torsion}(E))$?

Let $n$ be a positive integer (assume $n$ is prime for simplicity), and let $x_k = \pm1$, for $k = 0,1,2,..., n-1$. Let $\rho$ be an $n-$th root of unity, I am interested in lower bounds for the absolute value of sums of the form:

$$S_n = \sum_{k=0}^{n-1} x_k \rho^k$$ assuming that such sum is not equal to zero (when $n$ is prime this can only happen when all the $x_k$'s have the same sign).

One can obtain an easy lower bound of $\frac{1}{n^{n-1}}$ by multiplying this algebraic integer by all its Galois conjugates, but given that there are $2^n$ sums, I am expecting a better lower bound (hopefully $e^{-Cn}$ for some constant $C$), or maybe is there anything known about the probability $$Pr(|S_n| < e^{-100n})$$ I am expecting this quantity to be exponentially small, I think from an argument in Tao-Vu's paper (https://arxiv.org/abs/1307.4357) related to Nguyen-Vu's optimal Offord-Littlewood inverse Theorem one might be able to show that such probability is smaller than $n^{-C}$ (for any fixed $C$ and $n \to \infty$).

I would be grateful for any information related to sums of this form, similar sums or some understanding in how difficult this question can be.

Thanks!

I would like to construct an atlas for the stack of sheaves E of rank 1 and degree b on an elliptic curve C such that E has torsion of length at most 1. Am I allowed to fix both the determinant L of the sheaf E and the point x on C where E possibly has torsion? In that case I would construct an atlas as PHom(L(-2x),F) where F is the unique nontrivial extension of L(-x) by L(-x). Would this help me to construct an atlas for the original stack? If not, could you suggest a different way to construct an atlas for the original stack?

I came across the following inequality in one of my calculations ($X,Y$ are centered random variables):

$$\operatorname{E}(X^2Y^2)-\operatorname{E}(X^2)\operatorname{E}(Y^2) \geq 2 \operatorname{E}(XY)^2$$

or, written in terms of covariances,

$$\operatorname{Cov}(X^2,Y^2) \geq 2 \operatorname{Cov}(X,Y)^2$$.

If $(X,Y)=(U,V)$ is a two-dimensional centered Gaussian, this becomes an equality and if $(X,Y)=(H_p(U),H_q(V))$, where $(U,V)$ is still a two-dimensional centered Gaussian with $\operatorname{E}(U^2)=\operatorname{E}(V^2)=1$ and $H_k$ denotes the $k$th (probabilists') Hermite polynomial, the inequality above is strict whenever $p,q \geq 2$.

I have a feeling that something like this should be true for arbitrary random variables but couldn't prove it. Does this inequality look familiar to anybody or do you have an idea on how this could be proved/disproved?

Let $X$ be a nonsingular threefold of degree $d$ contained in $\mathbb{P}^4$ and let $P$ be a point on $X$. Is it possible to find a plane $H$ contained in $T_P (X)$ such that $X \cap H$ is not a union of $d$ lines?

Let $G$ be a simple graph with $n$ vertices and $\lambda$ be the largest eigenvalue of its Laplacian operator $L=D-A$. I have some evidence for the following conjecture:

**Conjecture**: If G has diameter $\delta>3$ then $\lambda\leq n-1$.

I need a proof or a counterexample for this conjecture. Does there exist a good general upper-bound for $\lambda$ in terms of $\delta$ which includes the above conjecture (if it is true)?

This is a philosophy:

**step one**: A theorem: suppose $r:\Bbb N\to (0,1)$ is a function given by $r(n)$ is obtained by putting a point at the beginning of $n$ instance $r(34880)=0.34880$ then $r(\Bbb P)$ is dense in the interval $[0.1,1]$. proof

**step two**: a topological space based on the theorem above on odd numbers is made as: Let $Z_1:=\{\pm(2n-1)\mid n\in\Bbb N\}\cup\{0\}$ and $\lt_1$ be a total order relation (not well ordering) on $Z_1$ with: $\begin{cases} \forall m,n\in\Bbb N\\ 2n-1\lt_12m-1 & \text{iff}\quad r(2n-1)\lt r(2m-1),\\ -2n+1\lt_1-2m+1 & \text{iff}\quad r(2n-1)\gt r(2m-1),\\ -2n+1\lt_10\lt_12m-1\end{cases}$

then assume $\mathfrak T$ is a topology on $Z_1$ induced by $\lt_1$ hence $(Z_1,\mathfrak T)$ is a Hausdorff space.

**step three**: importance of prime number theorem: prime number theorem discusses on all prime numbers simultaneously & also contains the limitation concept & the logarithm function.

**step four**: amplifying the theory via algebraic structures: $(Z_1,\star,\circ)$ is an unique factorization domain with: $\begin{cases} \forall m,n\in\Bbb N,\,\forall v\in Z_1,\quad e=0\\ (2m-1)\star(-2m+1)=0\\ (4m-3)\star(4n-3)=4m+4n-5\\ (4m-3)\star(-4n+3)=\begin{cases} 4m-4n+1 & m\lt n\\ 4m-4n-1 & m\gt n\end{cases}\\ (4m-3)\star(4n-1)=4m+4n-3\\ (4m-3)\star(-4n+1)=\begin{cases} 4m-4n-1 & m\le n\\ 4m-4n-3 & m\gt n\end{cases}\\ (-4m+3)\star(-4n+3)=-4m-4n+5\\ (-4m+3)\star(4n-1)=\begin{cases} 4n-4m+1 & m\le n\\ 4n-4m+3 & m\gt n\end{cases}\\ (-4m+3)\star(-4n+1)=-4m-4n+3\\ (4m-1)\star(4n-1)=4m+4n-1\\ (4m-1)\star(-4n+1)=\begin{cases} 4m-4n+1 & m\lt n\\ 4m-4n-1 & m\gt n\end{cases}\\ (-4m+1)\star(-4n+1)=-4m-4n+1\\ 0\circ v=0,\quad1\circ v=v,\quad((-1)\circ v)\star v=0\\ (4m-3)\circ(4n-3)=8mn-4m-4n+1\\ (4m-3)\circ(-4n+3)=-8mn+4m+4n-1\\ (4m-3)\circ(4n-1)=8mn-4n-1\\ (4m-3)\circ(-4n+1)=-8mn+4n+1\\ (-4m+3)\circ(-4n+3)=8mn-4m-4n+1\\ (-4m+3)\circ(4n-1)=-8mn+4n+1\\ (-4m+3)\circ(-4n+1)=8mn-4n-1\\ (4m-1)\circ(4n-1)=8mn-1\\ (4m-1)\circ(-4n+1)=-8mn+1\\ (-4m+1)\circ(-4n+1)=8mn-1\end{cases}$

**step five**: rewriting the algebraic equation of **Goldbach** (in my individual point of view Goldbach is equivalent to the induction axiom, anyhow) in accordance with the UFD:

Guess $1$: $\forall n\in\Bbb N,\,n$ is a prime iff $2n-1$ is an irreducible element in $(Z_1,\star,\circ)$ and we have: $\begin{cases} \forall m,n,r,s\in\Bbb N,\\\\m\pm n=r\qquad\text{iff}\quad(2m-1)\star(\pm(2n-1))=2r-1\\\\m\cdot n=s\qquad\text{iff}\quad(2m-1)\circ(2n-1)=2s-1\quad\text{iff}\\2s-1=(2m-1)\star(2m-1)\star...(2m-1)\,(n\text{ times is summed})\end{cases}$.

Irreducible elements in $(Z_1,\star,\circ)$ except $3$ are of the form $4k-3,\,k\in\Bbb N$.

Guess $2$: $Y:=\{2p-1\mid p\in\Bbb P\setminus\{2\}\}$ is dense in $N_1:=\{n\in Z_1\mid n\gt0\}$.

**Goldbach's conjecture**: $\forall n\in\Bbb N,\,\exists r,s\in\Bbb N$, such that $4n+7=(4r-3)\star(4s-3),$ & $4r-3,4s-3$ are irreducible elements greater than $3$ in $(Z_1,\star,\circ)$, meantime $2r-1,2s-1\in\Bbb P$ & $4n+7$ is of the form $4k-1,\,k\in\Bbb N$.

$\color{red}{\text{step six}}$: analytic number theory: each open set in the topological space $(Z_1,\mathfrak T)$ is equivalent with a mathematical technique in the analytic number theory so we are capable to redefine the topology via some techniques of analytic number theory, and it is worth noting that a subspace topology of the Euclidean topology on $[0.1,1]$ is the same $\mathfrak T$.

**step seven**: the strength of algebraic topology

**step eight** homotopy groups (no idea yet, only I can dream two spheres $S^2$)

$\color{red}{\text{Question}}$: which technique could be planned cited in the step six? (each open set in the topology anchors on a fixed mathematical technique of analytic number theory and on the other side each technique of that type clarifies only an open set so that this relation is two sided).

Thanks a lots in advance.

What are examples of well received mathematical papers in which the author provides detail on how a surprising solution to a problem has been found.

I am especially looking for papers that also document the dead ends of investigation, i.e. ideas that seemed promising but lead nowhere, and where the motivation and inspiration that lead to the right ideas came from.

By "surprising solution" I mean solutions that feel right at first reading and it isn't clear why they haven't been found earlier.

In Tate's famous paper about $p$-divisible groups, for any prime $p$ he asked whether there exist nontrivial $p$-divisible groups over $\mathbb Z$, i.e. $p$-divisible groups which are not a product of powers of $\mu_{p^\infty}$ and $\mathbb Q_p/ \mathbb Z_p $.

In works of V. A. Abrashkin, he claims that there exist nontrivial $p$-divisible groups for $p=2$ and some irregular primes. What about the general case? Do we know a lot of primes $p$ such that every $p$-divisible group over $\mathbb Z$ is trivial?

Motivation: On the other hand, we know there is no abelian scheme over $\mathbb Z$, whose proof involves $p$-divisible groups.

Edit: One answer below suspects the works of V. A. Abrashkin might contain errors, maybe a further reference is good. Anyway, the key question is what we know for general prime $p$, is there any progress towards Tate's question? With the technique of mordern $p$-adic Hodge theory, maybe such question could be solved.

Here's a fair-sequencing problem that doesn't quite match the usual fair-division problems. I think that, like those, the answer should also be the Thue-Morse sequence ("balanced alternation"), because the same heuristic reasoning that suggests it's the fairest way there works here as well, but the problem doesn't seem to *reduce* to them, so it's not obvious. (See here for more on using Thue-Morse for fair division, or this earlier MathOverflow question.)

Anyway, the problem is stage-striking (as is used in certain competitive video games for stage selection :) ). There are two players and $n+1$ objects ("stages"). The two players have different preferences regarding the stages (ideally, *opposite* preferences, but you'll see below why we don't assume that). The two players will take turns (in some order -- thus the question) removing ("striking") stages that have not already been struck; once only a single stage is left, that stage is selected (both players "get" it). The question, then, is what is the fairest order for stage striking; as mentioned above, I suspect it should be Thue-Morse (one player strikes on 0, the other player strikes on 1), for similar reasons that this is the answer to the old problem of what order to take turns in for fair division.

Of course this raises the question of how we're formalizing this and what we mean by "fair". I'll present here the formalization of the problem that (after discussing this with some other people) I think is best, but answers to other ways of formalizing it would also be OK so long as they don't trivialize the problem.

So -- note that if the players assign the stages opposite values (i.e. they agree about which stages give how much of an advantage to who), as you would expect, then the striking order becomes irrelevant, so long as both players get the same number of strikes; regardless of order, the median stage will be selected. So instead we have to assume the players may disagree about which stages advantage who. Also, since we can only really deal with the order of the stages here, we won't allow them to have arbitrary numeric values as in the fair-division problem; rather we'll assume each player assigns the $n+1$ stages the values $0, 1, \ldots, n$, so that the value of a stage to a player depends only on where it falls in their preference ordering.

Now, since perfect information makes the problem trivial, we'll go all the way in the opposite direction -- each player's preferences are uniformly random; or rather, each player sees the other's preferences as uniformly random. What we want to compare, then, and to make as equal as possible, is the expected value that player 1 gets (when player 2 strikes randomly), vs the expected value that player 2 gets (when player 1 strikes randomly).

(I'm pretty sure that, in this formulation, we can assume that each player always strikes their least-preferred stage at each step, and that there is no advantage from deviating from this. But obviously correct me if I'm wrong there...)

So, for instance, in this model, if $n=2$, then the first player to strike gets an expected value of $3/2$ (they eliminate their least preferred stage and get one of the remaining two at random), while the second player to strike gets an expected value of $5/3$ (they have a $2/3$ chance their most-preferred stage is not eliminated, and a $1/3$ chance they have to settle for the median). So we get a difference of $1/6$. You see?

So the question then is, is the Thue-Morse striking order the fairest? Or is it something else? Is it at least the fairest when $n$ is a power of $2$, even if it might not be otherwise?

**EDIT**: Actually, a thought -- maybe it should be *reverse* Thue-Morse? (As in, if $n=12$, you would go $011001101001$ rather than $011010011001$; you just reverse the sequence, and then, if necessary, swap the roles of the players so as to start with a $0$.) This seems possible because here it's going later, rather than going earlier, that seems to confer an advantage. Of course, if $n$ is a power of two, this distinction is irrelevant, as reversing the sequence would merely swap the roles of the players.

This question already has an answer here:

Consider $u,v∈S^{M-1}\subset \mathbb{C}^M$ to be two independent unit norm random vectors on the $M−1$ dimensional complex sphere $S^{M−1}$. In addition, $u$ follows an isotropic distribution, i.e., $u$ is uniformly distributed on the complex sphere $S^{M−1}$. What is the distribution of $Z=|u⋅v|^2$? This question has been asked before (Distribution of dot product of two unit random vectors), but I get a different result. I get that $Z$ follows Beta$(1,M−1)$ distribution by simulation.

I would like to know if there are interesting examples of affine varieties $S_1, \dots, S_m\subseteq \mathbb{A}^n$ with the following property: for any $J\subseteq [m]$, $\cap_{i\in J}S_i$ is non-empty and smooth.

For example, if $S_i$'s are linear spaces, then any intersection is also a linear space, and non-empty because of the origin point.

I understand that when $m$ is a constant, it should be easy to construct many families. I would be more interested in the case when $m$ is large, say polynomial in $n$.

Lebesgue's differentiation theorem states that if $x$ is a point in $\mathbb{R}^n$ and $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is a Lebesgue integrable function, then the limit of $\frac{\int_B f d\lambda}{\lambda(B)}$ over all balls $B$ centered at $x$ as the diameter of $B$ goes to $0$ is equal almost everywhere to $f(x)$. But if you replace balls with other kinds of set with diameter going to $0$, this need not be true. For instance it need not be true if you replace balls with rectangles.

But my question is, what is known about what collections of sets the Lebesgue Differentiation Theorem holds for in one dimension? We know it holds for bounded intervals. Does it hold for finite unions of bounded open intervals? What about bounded Borel sets? What about bounded Lebesgue measurable sets in general?

Or are all these open problems?

Lebesgue's differentiation theorem states that if $x$ is a point in $\mathbb{R}^n$ and $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is a Lebesgue integrable function, then the limit of $\frac{\int_B f d\lambda}{\lambda(B)}$ over all balls $B$ centered at $x$ as the diameter of $B$ goes to $0$ is equal almost everywhere to $f(x)$. But if you replace balls with other kinds of set with diameter going to $0$, this need not be true. For instance it need not be true if you replace balls with rectangles.

It does hold for squares, though, and more generally for $n$-dimensional cubes. So my question is, is it known if it holds for all regular polygons? And more generally, is it known if it hold for all regular polyhedra and regular polytopes?

I ask "is it known" because a lot of problems in this topic tend to be open.

A chain letter starts when someone sends a letter to five other people. Everyone who receives the letter will send it back to five other people who have never received it or did not send the letter at all. For example, it was known that there were 10,000 people who had forwarded the chain letters before finally stopping and no more people received the letter. How many people receive the letter and how many do not send it out?

My attempt :

I used n = mi + 1 where i is number of internal vertices, m is the arry ( 5-arry in this case) and n is number of vertices.

Now the number of people who received the letter, n = mi+1 n = (5)(10000) + 1 = 50001

Now as far as the number of internal vertices is concerned, i have counted the root as a internal vertex.Thats how the book defined the root.But the solution posted on this website excludes the root and i dont understand why.Plus if you look at the example given on this section (section 10.3),the author has done a similar problem in which he counted the root as an internal vertex.May i know why the root should not be counted ,if thats the correct way ?

For the second part, i used l = n-i,l represents leaves.(number of people who did not send out letters).

l = 50001 - 10000 = 40001. So this answer also turns out to be wrong since i used the vertex i got in part a.

I would appreciate your feedback. I would really like an expert to answer this question. If you are not sure, please don't attempt it.

I have been learning some (topological) dimension theory and have gotten through most of the basic material, at this point, and am about to start looking at papers. In particular, I want to get familiar with the standard counterexamples regarding dimension of products, but I haven't noticed the question in the title addressed.

So my question would be what conditions on $X$ (e.g. separable metric, compact metric, continuum, homology manifold) are sufficient to ensure that if $\text{dim}(X \times X) = 2\text{dim}(X)$, then $\text{dim}(X^n) = n\text{dim}(X)$? The initial assumption could also be weakened from $\text{dim}(X \times X) = 2\text{dim}(X)$ to $\text{dim}(X^m) = m\text{dim}(X)$ for some $1 < m \leq n$, or for various subsets of $\lbrace 2, 3, \dots, n \rbrace$.

In particular, have the dimensions of all the powers of the Pontryagin Surface been computed?

Consider the following two definitions and the formula:

**Definition 1:**

$$\sum^{(m)} n^k = \begin{cases} n^k, &\quad\text{if m=0}\\ \sum\limits^{(m-1)} 1^k +\sum\limits^{(m-1)} 2^k + \cdots + \sum\limits^{(m-1)} n^k &\quad\text{if $m \ge 1$}\\ \end{cases} $$

**Recurrence Definition 2 :**

$$\mu[k,r] = \begin{cases} \text{1,} &\quad\text{if r=0}\\ (r)* \mu[k-1,r-1]+(r+1)*\mu[k-1,r] &\quad\text{if $1 \le r \le k$}\\ \text{0,} &\quad\text{r > k}\\ \end{cases} $$

**Formula** :

$$\sum^{(m)} n^k = \sum_{i=0}^k \binom{n+m-1}{m+i} \mu[k,i]$$

How to prove the formula algebraically for all values of $m$?

Related : Pre-print claiming the formula, Math SE question