How to reset Tensorboard when it tries to reuse a killed Windows PID


Hey, sorry to hear that you’re running into issues. It’s entirely plausible that everything that you describe is both accurate and my fault. :-)

How in the name of $deity do I get tensorboard to restart from scratch and forget what it thinks it knows about processes, ports etc.? If I could do that I could hack away at residual path etc. issues...

There is a directory called .tensorboard-info in your temp directory that maintains a best-effort registry of the TensorBoard jobs that we think are running. When TensorBoard launches (in any manner, including with %tensorboard), it writes an “info file” to that directory, and when you use %tensorboard we first check to see if a “compatible instance” (same working directory and CLI args) is still running, and if so reuse it instead. When a TensorBoard instance shuts down cleanly, it removes its own info file. The idea is that as long as TensorBoard is shut down cleanly we should always have an accurate record of which processes are live, and since this registry is in a temp directory any errors due to hard shutdowns will be short-lived.
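To see what the registry currently holds before changing anything, a small sketch like the following may help. It only reads files. The function name `list_registry_entries` and the `temp_root` parameter are made up for illustration, and the attempt to parse each info file as JSON is an assumption about the file format; files that do not parse are still listed, with `None` as their contents.

```python
import json
import tempfile
from pathlib import Path


def list_registry_entries(temp_root=None):
    """Return (filename, parsed contents or None) for each registry file.

    temp_root is a hypothetical override, mainly for testing; by default
    the directory reported by tempfile.gettempdir() is used.
    """
    root = Path(temp_root) if temp_root is not None else Path(tempfile.gettempdir())
    info_dir = root / ".tensorboard-info"
    entries = []
    if not info_dir.is_dir():
        return entries  # no registry directory, so nothing is recorded
    for path in sorted(info_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            # Assumption: info files are JSON documents.
            contents = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            contents = None  # unreadable or not JSON; report the file anyway
        entries.append((path.name, contents))
    return entries
```

Calling `list_registry_entries()` with no arguments inspects the real temp directory. Any entry whose process no longer exists is a stale record left behind by a hard shutdown.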

But this is where I erred: coming from the POSIX world and not being very familiar with Windows application development, I didn’t realize that the Windows temp directory is not actually automatically deleted, ever. Therefore, any bookkeeping errors persist indefinitely.

So, the answer to your question is, “remove the .tensorboard-info directory located under tempfile.gettempdir()” (preferably when you don’t have any actively running TensorBoard instances).
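As a minimal sketch of that cleanup, the following uses only the Python standard library. The helper names `tensorboard_info_dir` and `reset_tensorboard_registry` and the `temp_root` parameter are hypothetical, not part of TensorBoard. It deletes the whole registry directory, so stop any running TensorBoard instances first.

```python
import shutil
import tempfile
from pathlib import Path


def tensorboard_info_dir(temp_root=None):
    """Return the path of the .tensorboard-info registry directory.

    temp_root is a hypothetical override, mainly for testing; by default
    the directory reported by tempfile.gettempdir() is used.
    """
    root = Path(temp_root) if temp_root is not None else Path(tempfile.gettempdir())
    return root / ".tensorboard-info"


def reset_tensorboard_registry(temp_root=None):
    """Delete the registry directory; return True if something was removed."""
    info_dir = tensorboard_info_dir(temp_root)
    if not info_dir.is_dir():
        return False  # nothing to clean up
    shutil.rmtree(info_dir)  # removes the directory and every info file in it
    return True
```

Running `reset_tensorboard_registry()` with no arguments targets the real temp directory; the next TensorBoard launch should then start with an empty registry.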

There are ways that we can plausibly work around this in TensorBoard core: see https://github.com/tensorflow/tensorboard/issues/2483 for a start, and I’ve also considered amortized approaches like letting each TensorBoard instance perform some cleanup of other instances at start time. We haven’t yet gotten around to implementing these.

Let me know if this is helpful or if it fails to address your question.