How to run a Jupyter notebook with Python code automatically on a daily basis?


Update
Recently I came across papermill, a tool for executing and parameterizing notebooks.

https://github.com/nteract/papermill

papermill local/input.ipynb s3://bkt/output.ipynb -p alpha 0.6 -p l1_ratio 0.1

This seems better than nbconvert, because you can use parameters. You still have to trigger this command with a scheduler. Below is an example with cron on Ubuntu.
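
papermill also has a Python API, so the same run can be started from a small script that the scheduler calls. Below is a minimal sketch, assuming papermill is installed in the environment and that the output goes to a local directory; the helper names dated_output_path and run_daily are made up for illustration. It writes one output notebook per day so earlier runs are not overwritten.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def dated_output_path(output_dir: str, stem: str, run_date: date) -> str:
    """Build a local output path such as out/input-2024-01-31.ipynb."""
    return str(Path(output_dir) / f"{stem}-{run_date.isoformat()}.ipynb")


def run_daily(input_path: str, output_dir: str, parameters: dict,
              run_date: Optional[date] = None) -> str:
    """Execute the notebook with papermill and return the output path."""
    # Imported here so the path helper above works without papermill installed.
    import papermill as pm

    run_date = run_date or date.today()
    output_path = dated_output_path(output_dir, Path(input_path).stem, run_date)
    # Equivalent to: papermill <input> <output> -p alpha 0.6 -p l1_ratio 0.1
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path
```

A call such as run_daily("local/input.ipynb", "out", {"alpha": 0.6, "l1_ratio": 0.1}) would then be what the scheduler triggers. papermill injects the values after the cell tagged "parameters" in the notebook, so it helps to have such a cell holding the defaults.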


Old Answer

nbconvert --execute

can execute a Jupyter notebook. Embedding this command in a cron job will do what you want.

Example setup on Ubuntu:

Create yourscript.sh with the following content:

/opt/anaconda/envs/yourenv/bin/jupyter nbconvert \
    --execute \
    --to notebook /path/to/yournotebook.ipynb \
    --output /path/to/yournotebook-output.ipynb

There are other output options besides --to notebook. I like this one because it leaves you with a fully executable "log" file afterwards.

I recommend running your notebook in a virtual environment, so that future package updates do not break your script. Do not forget to install nbconvert into that environment.
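
If you would rather keep the wrapper in Python than in a shell script, the same nbconvert call can be made with subprocess. This is only a sketch under the assumptions of the example above: the jupyter path and notebook paths are the same placeholders, and the function names are hypothetical.

```python
import subprocess
import sys
from typing import List

# Placeholder path from the example above; point it at your own environment.
JUPYTER = "/opt/anaconda/envs/yourenv/bin/jupyter"


def build_nbconvert_command(notebook: str, output: str,
                            jupyter: str = JUPYTER) -> List[str]:
    """Return the argument list for the nbconvert call used in yourscript.sh."""
    return [
        jupyter, "nbconvert",
        "--execute",
        "--to", "notebook", notebook,
        "--output", output,
    ]


def run_notebook(notebook: str, output: str, jupyter: str = JUPYTER) -> int:
    """Run the notebook and return the exit code (0 means success)."""
    result = subprocess.run(
        build_nbconvert_command(notebook, output, jupyter),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Send nbconvert's error output to stderr so the scheduler can report it.
        print(result.stderr, file=sys.stderr)
    return result.returncode
```

Ending the script with sys.exit(run_notebook("/path/to/yournotebook.ipynb", "/path/to/yournotebook-output.ipynb")) passes the exit code back to whatever scheduler started it.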

Now create a cron job that runs every day, e.g. at 5:10 AM, by typing crontab -e in your terminal and adding this line:

10 5 * * * /path/to/yourscript.sh
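
The five fields in that line are minute, hour, day of month, month and day of week, so "10 5 * * *" means "at 05:10 every day". To make the timing concrete, here is a small sketch that computes when such a daily entry would next fire. It is an illustration only, not part of cron; the function name is made up, and it uses naive datetimes, so daylight saving changes are ignored.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 5, minute: int = 10) -> datetime:
    """Return the next firing time of a '<minute> <hour> * * *' entry after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, called at 04:00 it returns 05:10 on the same day, and called at 23:00 it returns 05:10 on the following day.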


If you want a more robust setup, it is better to combine notebook execution with Airflow. I packaged both in a Docker image: https://github.com/michaelchanwahyan/datalab.

It is done by modifying the open source package nbparameterize and passing in arguments such as execution_date. Graphs can be generated on the fly. The output can be updated and saved inside the notebook.

When it is executed

  • the notebook is read and the parameters are injected
  • the notebook is executed and the output overwrites the file at the original path
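
The injection step can be sketched with plain Python, since an .ipynb file is a JSON document with a list of cells. This is not the code from the image above or from nbparameterize, only an illustration of the idea; the function names and the "injected-parameters" tag are assumptions. It adds a code cell that assigns each parameter, placed after the first cell tagged "parameters" if there is one, otherwise at the top.

```python
import copy


def make_parameter_cell(params: dict) -> dict:
    """Build a code cell that assigns every parameter, e.g. execution_date = '2024-01-31'."""
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    # Newer notebook format versions may also expect a unique "id" per cell;
    # it is left out here to keep the sketch short.
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of the notebook dict with a parameter cell inserted."""
    updated = copy.deepcopy(notebook)
    cells = updated["cells"]
    position = 0
    for index, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            # Insert right after the defaults so the injected values win.
            position = index + 1
            break
    cells.insert(position, make_parameter_cell(params))
    return updated
```

The notebook dict would come from json.load on the .ipynb file and be written back with json.dump before execution. Values are rendered with repr, so this only suits simple literals such as strings and numbers.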

The image also comes with common tools such as Spark, Keras and TensorFlow installed and configured.


Try the SeekWell Chrome Extension. It lets you schedule notebooks to run weekly, daily, hourly or every 5 minutes, right from Jupyter Notebooks. You can also send DataFrames directly to Sheets or Slack if you like.

Here's a demo video, and there is more info in the Chrome Web Store link above as well.

Disclosure: I'm a SeekWell co-founder.