Checking for dead links locally in a static website (using wget?)


So I think you are heading in the right direction. I would use wget and Python, as they are readily available on many systems, and together they get the job done. What you want is to watch for the line Serving HTTP on 0.0.0.0 in the output of the server process.

So I would start the process with something like this:

python3 -u -m http.server > ./myserver.log &

Note the -u used here for unbuffered output; this is really important, since without it the startup line may sit in Python's output buffer instead of reaching the log file promptly.

The next step is to wait for this text to appear in myserver.log:

timeout 10 awk '/Serving HTTP on 0.0.0.0/{print; exit}' <(tail -f ./myserver.log)

Here, 10 seconds is your maximum wait time; the rest is self-explanatory. Next, about your kill $pid: I don't think it is a problem, but if you want it to mimic the way a user would stop the server, I would change it to

kill -s SIGINT $pid

This is equivalent to pressing CTRL+C after launching the program. I would also handle SIGINT in the bash script itself, using the approach described here:

https://unix.stackexchange.com/questions/313644/execute-command-or-function-when-sigint-or-sigterm-is-send-to-the-parent-script/313648

That answer basically adds the snippet below to the top of the bash script, so it cleans up properly when you kill the script with CTRL+C or an external kill signal:

#!/bin/bash

exit_script() {
    echo "Printing something special!"
    echo "Maybe executing other commands!"
    trap - SIGINT SIGTERM # clear the trap
    kill -- -$$ # Sends SIGTERM to child/sub processes
}

trap exit_script SIGINT SIGTERM
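
Putting the pieces together, a minimal sketch of the whole check could look like the following. The port number, the wget invocation, and the exit-status handling are assumptions for illustration, not taken from the original question; adapt them to your site:

#!/bin/bash
# Minimal sketch: serve the current directory, wait for startup,
# crawl it with wget's spider mode, then shut the server down.
# Port 8000 and the wget flags are illustrative assumptions.
python3 -u -m http.server 8000 > ./myserver.log &
pid=$!

# Wait at most 10 seconds for the startup line to appear
timeout 10 awk '/Serving HTTP on 0.0.0.0/{print; exit}' <(tail -f ./myserver.log)

# --spider checks links without saving pages, -r recurses through the site,
# -nv trims the output; wget exits non-zero if it hit broken links
wget --spider -r -nv http://localhost:8000/
status=$?

kill -s SIGINT "$pid"
exit "$status"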


Tarun Lalwani's answer is correct, and following the advice given there one can write a clean and short shell script (relying on Python and awk). Another solution is to write the script completely in Python, giving a slightly more verbose but arguably cleaner script. The server can be launched in a thread, then the command to check the website is executed, and finally the server is shut down. We no longer need to parse textual output nor send a signal to an external process. The key parts of the script are therefore:

import subprocess
import sys
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

def start_server(port,
                 server_class=HTTPServer,
                 handler_class=SimpleHTTPRequestHandler):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    # Serve in a background thread so the check command can run meanwhile
    thread = threading.Thread(target=httpd.serve_forever)
    thread.start()
    return httpd

def main(cmd, port):
    httpd = start_server(port)
    status = subprocess.call(cmd)
    httpd.shutdown()
    sys.exit(status)
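
As a usage sketch, the entry point could be wired up like this; the wget command and port below are illustrative assumptions, not part of the published script:

if __name__ == '__main__':
    # Crawl the locally served site with wget's spider mode;
    # port 8000 is an arbitrary choice for illustration.
    main(['wget', '--spider', '-r', '-nv', 'http://localhost:8000/'], 8000)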

I wrote a slightly more advanced script (with a bit of command-line option parsing on top of this) and published it here: https://gitlab.com/moy/check-links