{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Acceleration with Numba\n", "\n", "We explore how the computation of cost functions can be dramatically accelerated with numba's JIT compiler.\n", "\n", "The run-time of iminuit is usually dominated by the execution time of the cost function. To get good performance, it recommended to use array arthimetic and scipy and numpy functions in the body of the cost function. Python loops should be avoided, but if they are unavoidable, [Numba](https://numba.pydata.org/) can help. Numba can also parallelize numerical calculations to make full use of multi-core CPUs and even do computations on the GPU.\n", "\n", "Note: This tutorial shows how one can generate faster pdfs with Numba. Before you start to write your own pdf, please check whether one is already implemented in the [numba_stats library](https://github.com/HDembinski/numba-stats). If you have a pdf that is not included there, please consider contributing it to numba_stats." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# !pip install matplotlib numpy numba scipy iminuit\n", "from iminuit import Minuit\n", "import numpy as np\n", "import numba as nb\n", "import math\n", "from scipy.stats import expon, norm\n", "from matplotlib import pyplot as plt\n", "from argparse import Namespace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The standard fit in particle physics is the fit of a peak over some smooth background. We generate a Gaussian peak over exponential background, using scipy." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAigAAAGwCAYAAACD0J42AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8o6BhiAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAxuklEQVR4nO3de1jVVb7H8c8GAVHZmzQBSVBsDMW8FaY7u1iSTKKTj5ZdmMK0fDJwVCZTy7yWmifTbEqny6hZZKfOo1Oampeyi6RoWl7StPRgKeCpZIsOyOV3/pjjPrPTKbZs2Ivt+/U8+3ncv9/i9/uuxolPa63f+tksy7IEAABgkCB/FwAAAPBLBBQAAGAcAgoAADAOAQUAABiHgAIAAIxDQAEAAMYhoAAAAOM08HcBF6KqqkpHjx5VRESEbDabv8sBAADVYFmWTp48qdjYWAUF/foYSb0MKEePHlVcXJy/ywAAABfgyJEjatmy5a+2qZcBJSIiQtI/O2i32/1cDQAAqA6Xy6W4uDj37/FfUy8DytlpHbvdTkABAKCeqc7yDBbJAgAA4xBQAACAcQgoAADAOPVyDQoAwDxVVVU6c+aMv8uAH4WEhCg4ONgn1yKgAABq7MyZMzp06JCqqqr8XQr8LDIyUjExMTXep4yAAgCoEcuydOzYMQUHBysuLu43N+BCYLIsS6dPn1ZRUZEkqUWLFjW6HgEFAFAjFRUVOn36tGJjY9WoUSN/lwM/Cg8PlyQVFRUpKiqqRtM9xFwAQI1UVlZKkkJDQ/1cCUxwNqSWl5fX6DoEFACAT/BuNEi++3tAQAEAAMYhoAAAAOOwSBYAUCtaj19Vp/c7PCvNq/a9evVSly5dNG/evFqpZ8iQITpx4oRWrFhRK9f3h8OHDyshIUE7duxQly5davVejKAAAADjEFAAAKgnLqadegkoAICLVkVFhbKysuRwOHTppZfqiSeekGVZkqSlS5cqOTlZERERiomJ0T333OPehOysPXv2qF+/frLb7YqIiND111+vb7/99rz3ysvLU/PmzfX000+7jz355JOKiopSRESEHnjgAY0fP95j6mTIkCEaMGCAnnrqKcXGxioxMVGStGvXLt18880KDw9Xs2bNNHz4cJWUlLh/rlevXho9erTH/QcMGKAhQ4a4v7du3VozZszQ0KFDFRERofj4eL300kseP7N161Z17dpVDRs2VHJysnbs2FHtf7Y1xRqU86jOvKm3c50AAPMsWbJEw4YN09atW7Vt2zYNHz5c8fHxevDBB1VeXq7p06crMTFRRUVFys7O1pAhQ/T+++9Lkn744QfdcMMN6tWrlzZu3Ci73a7PPvtMFRUV59xn48aNGjhwoGbPnq3hw4dLkt544w099dRTevHFF9WzZ08tW7ZMc+bMUUJCgsfPbtiwQXa7XevWrZMknTp1SqmpqXI6ncrLy1NRUZEeeOABZWVlafHixV71f86cOZo+fboee+wxvfPOOxoxYoRuvPFGJSYmqqSkRP369dMtt9yi119/XYcOHdKoUaMu4J/yhSGgAAAuWnFxcZo7d65sNpsSExO1a9cuzZ07Vw8++KCGDh3qbtemTRvNnz9f3bp1U0lJiZo0aaIXXnhBDodDy5YtU0hIiCTpiiuuOOcey5cv13333adXXnlFd955p/v4888/r2HDhun++++XJE2aNEkffPCBx0iIJDVu3FivvPKKeyO8l19+WaWlpXrttdfUuHFjSdJf/vIX9e/fX08//bSio6Or3f++ffvq4YcfliSNGzdOc+fO1YcffqjExETl5OSoqqpKr776qho2bKgOHTro+++/14gRI6p9/ZpgigcAcNHq0aOHx8ZiTqdTBw4cUGVlpbZv367+/fsrPj5eERERuvHGGyVJ+fn5kqSdO3fq+uuvd4eT89myZYvuuOMOLV261COcSNL+/ft1zTXXeBz75XdJ6tixo8cuvV9//bU6d+7sDieS1LNnT1VVVWn//v1e9F7q1KmT+882m00xMTHuaayvv/5anTp1UsOGDd1tnE6nV9evCQIKAAC/UFpaqtTUVNntdr3xxhvKy8vT8uXLJf3/QtWz7535NZdffrnatWunv/3tbxe89fu/BpHqCgoKcq+lOet89/9luLLZbMa8kZqAAgC4aG3ZssXj++eff662bdtq3759+vHHHzVr1ixdf/31ateu3TkLZDt16qRPPvnkV4PHpZdeqo0bN+rgwYMaPHiwR9vExETl5eV5tP/l9/Np3769vvzyS506dcp97LPPPlNQUJB7EW3z5s117Ngx9/nKykrt3r37N6/9y/t89dVXKi0tdR/7/PPPvbpGTRBQAAAXrfz8fGVnZ2v//v1688039fzzz2vUqFGKj49XaGionn/+eX333Xd69913NX36dI+fzcrKksvl0l133aVt27bpwIEDWrp06TnTLFFRUdq4caP27dunu+++272IduTIkXr11Ve1ZMkSHThwQE8++aS++uqr33yXTXp6uho2bKiMjAzt3r1bH374oUaOHKl7773Xvf7k5ptv1qpVq7Rq1Srt27dPI0aM0IkTJ7z6Z3PPPffIZrPpwQcf1N69e/X+++/rmWee8eoaNcEiWQBAragPTzved999+sc//qFrrrlGwcHBGjVqlIYPHy6bzabFixfrscce0/z583XVVVfpmWee0R/+8Af3zzZr1kwbN27U2LFjdeONNyo4OFhdunRRz549z7lPTEyMNm7cqF69eik9PV05OTlKT0/Xd999p0ceeUSlpaUaPHiwhgwZoq1bt/5qzY0aNdLatWs1atQodevWTY0aNdKgQYP07LPPutsMHTpUX375pe677z41aNBAY8aM0U033eTVP5smTZrovffe00MPPaSuXbsqKSlJTz/9tAYNGuTVdS6UzfrlJFU94HK55HA4VFxcLLvd7vPr85gxAFRfaWmpDh06pISEBI8FlfDeLbfcopiYGC1dutTfpVywX/v74M3vb0ZQAADwg9OnT2vhwoVKTU1VcHCw3nzzTa1fv96938nFjoACAIAf2Gw2vf/++3rqqadUWlqqxMRE/dd//ZdSUlL8XZoRCCgAAPhBeHi41q9f7+8yjMVTPAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeAAgAAjENAAQDgF4YMGaIBAwbU+X2nTJmiLl261Pl9TcRjxgCA2jHFUcf3K/bZpZ577rlz3gaMukVAAQDgFxyOOg5XOAdTPACAi9Y777yjjh07Kjw8XM2aNVNKSopOnTp1zhTPyZMnlZ6ersaNG6tFixaaO3euevXqpdGjR7vbtG7dWjNmzNDQoUMVERGh+Ph4vfTSSx73GzdunK644go1atRIbdq00RNPPKHy8vI66m39QkABAFyUjh07prvvvltDhw7V119/rY8++kgDBw4879ROdna2PvvsM7377rtat26dPvnkE33xxRfntJszZ46Sk5O1Y8cOPfzwwxoxYoT279/vPh8REaHFixdr7969eu655/Tyyy9r7ty5tdrP+oopHgDARenYsWOqqKjQwIED1apVK0lSx44dz2l38uRJLVmyRDk5Oerdu7ckadGiRYqNjT2nbd++ffXwww9L+udoydy5c/Xhhx8qMTFRkjRx4kR329atW+uRRx7RsmXL9Oijj/q8f/UdAQUAcFHq3LmzevfurY4dOyo1NVV9+vTR7bffrksuucSj3Xfffafy8nJdc8017mMOh8MdOv5Vp06d3H+22WyKiYlRUVGR+9hbb72l+fPn69tvv1VJSYkqKipkt9troXf1n9dTPD/88IP++Mc/qlmzZgoPD1fHjh21bds293nLsjRp0iS1aNFC4eHhSklJ0YEDBzyu8dNPPyk9PV12u12RkZEaNmyYSkpKat4bAACqKTg4WOvWrdPq1auVlJSk559/XomJiTp06NAFXzMkJMTju81mU1VVlSQpNzdX6enp6tu3r1auXKkdO3bo8ccf15kzZ2rUj0DlVUD5+eef1bNnT4WEhGj16tXau3ev5syZ45E2Z8+erfnz52vhwoXasmWLGjdurNTUVJWWlrrbpKena8+ePVq3bp1Wrlypjz/+WMOHD/ddrwAAqAabzaaePXtq6tSp2rFjh0JDQ7V8+XKPNm3atFFISIjy8vLcx4qLi/XNN994da/NmzerVatWevzxx5WcnKy2bdvqv//7v33Sj0Dk1RTP008/rbi4OC1atMh9LCEhwf1ny7I0b948TZw4Ubfddpsk6bXXXlN0dLRWrFihu+66S19//bXWrFmjvLw8JScnS5Kef/559e3bV88888x55/TKyspUVlbm/u5yubzrJQAAv7BlyxZt2LBBffr0UVRUlLZs2aLjx4+rffv2+uqrr9ztIiIilJGRobFjx6pp06aKiorS5MmTFRQUJJvNVu37tW3bVvn5+Vq2bJm6deumVatWnROG8P+8GkF59913lZycrDvuuENRUVHq2rWrXn75Zff5Q4cOqaCgQCkpKe5jDodD3bt3V25urqR/DnFFRka6w4kkpaSkKCgoSFu2bDnvfWfOnCmHw+H+xMXFedVJAAB+yW636+OPP1bfvn11xRVXaOLEiZozZ45uvfXWc9o+++yzcjqd6tevn1JSUtSzZ0+1b99eDRs2rPb9/vCHP2jMmDHKyspSly5dtHnzZj3xxBO+7FJAsVlebJV39n+I7Oxs3XHHHcrLy9OoUaO0cOFCZWRkaPPmzerZs6eOHj2qFi1auH9u8ODBstlseuuttzRjxgwtWbLE47ErSYqKitLUqVM1YsSIc+57vhGUuLg4FRcX18riotbjV/1mm8Oz0nx+XwCoj0pLS3Xo0CElJCR49Qu7Pjt16pQuu+wyzZkzR8OGDfN3OUb5tb8PLpdLDoejWr+/vZriqaqqUnJysmbMmCFJ6tq1q3bv3u0OKLUlLCxMYWFhtXZ9AAB+zY4dO7Rv3z5dc801Ki4u1rRp0yTJvZwBvufVFE+LFi2UlJTkcax9+/bKz8+XJMXExEiSCgsLPdoUFha6z/3ykStJqqio0E8//eRuAwCAaZ555hl17tzZvdvsJ598oksvvdTfZQUsrwJKz549z5ma+eabb9wb3CQkJCgmJkYbNmxwn3e5XNqyZYucTqckyel06sSJE9q+fbu7zcaNG1VVVaXu3btfcEcAAKgtXbt21fbt21VSUqKffvpJ69atO++mbvAdr6Z4xowZo2uvvVYzZszQ4MGDtXXrVr300kvudw3YbDaNHj1aTz75pNq2bauEhAQ98cQTio2Ndb/ToH379vr973+vBx98UAsXLlR5ebmysrJ01113nfcJHgAAcPHxKqB069ZNy5cv14QJEzRt2jQlJCRo3rx5Sk9Pd7d59NFHderUKQ0fPlwnTpzQddddpzVr1ngslHnjjTeUlZWl3r17KygoSIMGDdL8+fN91ysAQJ3z4pkLBDBf/T3w6ikeU3izCvhC8BQPAFRfeXm5Dh48qNjYWDkcDn+XAz/78ccfVVRUpCuuuELBwcEe52rtKR4AAH6pQYMGatSokY4fP66QkBAFBXn9FhUEAMuydPr0aRUVFSkyMvKccOItAgoAoEZsNptatGihQ4cOsXU7FBkZ6ZOncgkoAIAaCw0NVdu2bXnx3UUuJCSkxiMnZxFQAAA+ERQUdNHsJIvax0QhAAAwDgEFAAAYh4ACAACMwxqUC8ReKQAA1B5GUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeAAgAAjENAAQAAxiGgAAAA4xBQAACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACM08DfBQSy1uNX/Wabw7PS6qASAADqF0ZQAACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4bHVfD7BlPgDgYsMICgAAMA4BBQAAGIeAAgAAjENAAQAAxiGgAAAA4xBQAACAcQgoAADAOOyD4mfV2eMEAICLDSMoAADAOF4FlClTpshms3l82rVr5z5fWlqqzMxMNWvWTE2aNNGgQYNUWFjocY38/HylpaWpUaNGioqK0tixY1VRUeGb3gAAgIDg9RRPhw4dtH79+v+/QIP/v8SYMWO0atUqvf3223I4HMrKytLAgQP12WefSZIqKyuVlpammJgYbd68WceOHdN9992nkJAQzZgxwwfdAQAAgcDrgNKgQQPFxMScc7y4uFivvvqqcnJydPPNN0uSFi1apPbt2+vzzz9Xjx499MEHH2jv3r1av369oqOj1aVLF02fPl3jxo3TlClTFBoaWvMeAQCAes/rNSgHDhxQbGys2rRpo/T0dOXn50uStm/frvLycqWkpLjbtmvXTvHx8crNzZUk5ebmqmPHjoqOjna3SU1Nlcvl0p49e/7tPcvKyuRyuTw+AAAgcHkVULp3767FixdrzZo1WrBggQ4dOqTrr79eJ0+eVEFBgUJDQxUZGenxM9HR0SooKJAkFRQUeISTs+fPnvt3Zs6cKYfD4f7ExcV5UzYAAKhnvJriufXWW91/7tSpk7p3765WrVrpP//zPxUeHu7z4s6aMGGCsrOz3d9dLhchBQCAAFajfVAiIyN1xRVX6ODBg7rlllt05swZnThxwmMUpbCw0L1mJSYmRlu3bvW4xtmnfM63ruWssLAwhYWF1aTUgFed/VQOz0qrg0oAAKi5Gu2DUlJSom+//VYtWrTQ1VdfrZCQEG3YsMF9fv/+/crPz5fT6ZQkOZ1O7dq1S0VFRe4269atk91uV1JSUk1KAQAAAcSrEZRHHnlE/fv3V6tWrXT06FFNnjxZwcHBuvvuu+VwODRs2DBlZ2eradOmstvtGjlypJxOp3r06CFJ6tOnj5KSknTvvfdq9uzZKigo0MSJE5WZmckICQAAcPMqoHz//fe6++679eOPP6p58+a67rrr9Pnnn6t58+aSpLlz5yooKEiDBg1SWVmZUlNT9eKLL7p/Pjg4WCtXrtSIESPkdDrVuHFjZWRkaNq0ab7tFQBzTXFUo01x7dcBwGg2y7IsfxfhLZfLJYfDoeLiYtntdp9fP1Dfj8MaFBiBgAJctLz5/c27eAAAgHEIKAAAwDgEFAAAYJwa7YMCAB6qs74EAKqBgAKgfmKxLRDQmOIBAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcnuIBYB4eVwYueoygAAAA4xBQAACAcQgoAADAOAQUAABgHAIKAAAwDk/xXERaj1/1m20Oz0qrs+ugnuHJGgB1iBEUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADAOAQUAABiHgAIAAIzDywLhoTovAgQAoLYxggIAAIxDQAEAAMYhoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeN2uA31dkU7vCstDqoBABgGkZQAACAcQgoAADAOEzxwGhMAwHAxYkRFAAAYJwajaDMmjVLEyZM0KhRozRv3jxJUmlpqf785z9r2bJlKisrU2pqql588UVFR0e7fy4/P18jRozQhx9+qCZNmigjI0MzZ85UgwYM6AQK3ooMAKiJCx5BycvL01//+ld16tTJ4/iYMWP03nvv6e2339amTZt09OhRDRw40H2+srJSaWlpOnPmjDZv3qwlS5Zo8eLFmjRp0oX3AgAABJQLGrIoKSlRenq6Xn75ZT355JPu48XFxXr11VeVk5Ojm2++WZK0aNEitW/fXp9//rl69OihDz74QHv37tX69esVHR2tLl26aPr06Ro3bpymTJmi0NBQ3/QMAKY4qtGmuPbrAOC1CxpByczMVFpamlJSUjyOb9++XeXl5R7H27Vrp/j4eOXm5kqScnNz1bFjR48pn9TUVLlcLu3Zs+e89ysrK5PL5fL4AACAwOX1CMqyZcv0xRdfKC8v75xzBQUFCg0NVWRkpMfx6OhoFRQUuNv8azg5e/7sufOZOXOmpk6d6m2pAACgnvJqBOXIkSMaNWqU3njjDTVs2LC2ajrHhAkTVFxc7P4cOXKkzu4NAADqnlcBZfv27SoqKtJVV12lBg0aqEGDBtq0aZPmz5+vBg0aKDo6WmfOnNGJEyc8fq6wsFAxMTGSpJiYGBUWFp5z/uy58wkLC5Pdbvf4AACAwOVVQOndu7d27dqlnTt3uj/JyclKT093/zkkJEQbNmxw/8z+/fuVn58vp9MpSXI6ndq1a5eKiorcbdatWye73a6kpCQfdQsAANRnXq1BiYiI0JVXXulxrHHjxmrWrJn7+LBhw5Sdna2mTZvKbrdr5MiRcjqd6tGjhySpT58+SkpK0r333qvZs2eroKBAEydOVGZmpsLCwnzULQAAUJ/5fGe0uXPnKigoSIMGDfLYqO2s4OBgrVy5UiNGjJDT6VTjxo2VkZGhadOm+boUAABQT9ksy7L8XYS3XC6XHA6HiouLa2U9Crug1i+8i6eOVGdPkfqIfVCAOuPN72/2lsdFgZcOAkD9wssCAQCAcQgoAADAOAQUAABgHNagAN4I1JfPBeoCWAD1FgEFCHSEDwD1EAEF8ELr0pzfbHO49ssAgIDHGhQAAGAcAgoAADAOAQUAABiHNSjA/+EVBwBgDgIK6j2CBQAEHqZ4AACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADAOAQUAABiHgAIAAIxDQAEAAMZp4O8CgIAzxVGNNsW1XwcA1GOMoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwWyQI+1ro05zfbHK79MgCgXmMEBQAAGIeAAgAAjENAAQAAxiGgAAAA4xBQAACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh51kAVzcpjiq0aa49usA4MGrEZQFCxaoU6dOstvtstvtcjqdWr16tft8aWmpMjMz1axZMzVp0kSDBg1SYWGhxzXy8/OVlpamRo0aKSoqSmPHjlVFRYVvegMAAAKCVyMoLVu21KxZs9S2bVtZlqUlS5botttu044dO9ShQweNGTNGq1at0ttvvy2Hw6GsrCwNHDhQn332mSSpsrJSaWlpiomJ0ebNm3Xs2DHdd999CgkJ0YwZM2qlg4CRqvNf7dW6Dv9lDyAw2SzLsmpygaZNm+o//uM/dPvtt6t58+bKycnR7bffLknat2+f2rdvr9zcXPXo0UOrV69Wv379dPToUUVHR0uSFi5cqHHjxun48eMKDQ2t1j1dLpccDoeKi4tlt9trUv55tR6/yufXBP7V4Yb3+OZC1QkovgpDFzOCIOAT3vz+vuBFspWVlVq2bJlOnTolp9Op7du3q7y8XCkpKe427dq1U3x8vHJzcyVJubm56tixozucSFJqaqpcLpf27Nnzb+9VVlYml8vl8QEAAIHL64Cya9cuNWnSRGFhYXrooYe0fPlyJSUlqaCgQKGhoYqMjPRoHx0drYKCAklSQUGBRzg5e/7suX9n5syZcjgc7k9cXJy3ZQMAgHrE64CSmJionTt3asuWLRoxYoQyMjK0d+/e2qjNbcKECSouLnZ/jhw5Uqv3AwAA/uX1Y8ahoaH63e9+J0m6+uqrlZeXp+eee0533nmnzpw5oxMnTniMohQWFiomJkaSFBMTo61bt3pc7+xTPmfbnE9YWJjCwsK8LRUIfKwvARCgarxRW1VVlcrKynT11VcrJCREGzZscJ/bv3+/8vPz5XQ6JUlOp1O7du1SUVGRu826detkt9uVlJRU01IAAECA8GoEZcKECbr11lsVHx+vkydPKicnRx999JHWrl0rh8OhYcOGKTs7W02bNpXdbtfIkSPldDrVo0cPSVKfPn2UlJSke++9V7Nnz1ZBQYEmTpyozMxMRkgAmIvN3IA651VAKSoq0n333adjx47J4XCoU6dOWrt2rW655RZJ0ty5cxUUFKRBgwaprKxMqampevHFF90/HxwcrJUrV2rEiBFyOp1q3LixMjIyNG3aNN/2CgAA1Gs13gfFH9gHBfWdz/ZBgTkYQQF+kze/v3kXD+AHrUtzfrONaSGmPtYMoP4ioAABjmABoD6q8VM8AAAAvkZAAQAAxiGgAAAA4xBQAACAcQgoAADAODzFAxiKp28AXMwYQQEAAMYhoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGId9UABUa88V/IYpjmq0Ka79OoAAwQgKAAAwDiMoQD3GyAeAQMUICgAAMA4BBQAAGIeAAgAAjENAAQAAxmGRLIB6qToLhA83vKcOKgFQGwgoAHyG0ADAV5jiAQAAxiGgAAAA4zDFA6BOMQ0EoDoYQQEAAMYhoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeAAgAAjENAAQAAxmEnWQDGqc5uswACGyMoAADAOAQUAABgHAIKAAAwDmtQAKCuTHFUo01x7dcB1AOMoAAAAOMQUAAAgHGY4gEQsKrzuPLhhvfUQSUAvOXVCMrMmTPVrVs3RUREKCoqSgMGDND+/fs92pSWliozM1PNmjVTkyZNNGjQIBUWFnq0yc/PV1pamho1aqSoqCiNHTtWFRUVNe8NAAAICF4FlE2bNikzM1Off/651q1bp/LycvXp00enTp1ytxkzZozee+89vf3229q0aZOOHj2qgQMHus9XVlYqLS1NZ86c0ebNm7VkyRItXrxYkyZN8l2vAABAvWazLMu60B8+fvy4oqKitGnTJt1www0qLi5W8+bNlZOTo9tvv12StG/fPrVv3165ubnq0aOHVq9erX79+uno0aOKjo6WJC1cuFDjxo3T8ePHFRoaes59ysrKVFZW5v7ucrkUFxen4uJi2e32Cy3/32o9fpXPrwnATMZN8fAUDwKYy+WSw+Go1u/vGq1BKS7+5/+RmjZtKknavn27ysvLlZKS4m7Trl07xcfHuwNKbm6uOnbs6A4nkpSamqoRI0Zoz5496tq16zn3mTlzpqZOnVqTUgHggrGWBah7FxxQqqqqNHr0aPXs2VNXXnmlJKmgoEChoaGKjIz0aBsdHa2CggJ3m38NJ2fPnz13PhMmTFB2drb7+9kRFAAIOOyVAkiqQUDJzMzU7t279emnn/qynvMKCwtTWFhYrd8HAACY4YL2QcnKytLKlSv14YcfqmXLlu7jMTExOnPmjE6cOOHRvrCwUDExMe42v3yq5+z3s20AAMDFzauAYlmWsrKytHz5cm3cuFEJCQke56+++mqFhIRow4YN7mP79+9Xfn6+nE6nJMnpdGrXrl0qKipyt1m3bp3sdruSkpJq0hcAABAgvJriyczMVE5Ojv7+978rIiLCvWbE4XAoPDxcDodDw4YNU3Z2tpo2bSq73a6RI0fK6XSqR48ekqQ+ffooKSlJ9957r2bPnq2CggJNnDhRmZmZTOMAAABJXgaUBQsWSJJ69erlcXzRokUaMmSIJGnu3LkKCgrSoEGDVFZWptTUVL344ovutsHBwVq5cqVGjBghp9Opxo0bKyMjQ9OmTatZTwDAj3z1pE+1rlOdgoB6zquAUp0tUxo2bKgXXnhBL7zwwr9t06pVK73//vve3BoAAFxEeFkgAAAwDgEFAAAYh4ACAACMQ0ABAADGqdG7eACgvqvOUzMA6h4jKAAAwDgEFAAAYBymeACgjvhsOok3HuMiwAgKAAAwDiMoAFDPsB0+LgYEFAAIREwDoZ4joABAAGKUBfUdAQUALlaMssBgBBQAuEgxygKT8RQPAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADAOAQUAABiHgAIAAIzDu3gAADXDSwdRCwgoAIAaqc5LBzV+1W82OTwrzQfVIFAwxQMAAIxDQAEAAMYhoAAAAOMQUAAAgHFYJAsA+LdaV2NxK1AbCCgAgHqjOoGJp4ECA1M8AADAOAQUAABgHKZ4AAABhWmgwMAICgAAMA4jKAAAIwTqE0OM6FwYAgoA4KJDaDAfUzwAAMA4jKAAAHAejLL4FyMoAADAOAQUAABgHK8Dyscff6z+/fsrNjZWNptNK1as8DhvWZYmTZqkFi1aKDw8XCkpKTpw4IBHm59++knp6emy2+2KjIzUsGHDVFJSUqOOAACAwOH1GpRTp06pc+fOGjp0qAYOHHjO+dmzZ2v+/PlasmSJEhIS9MQTTyg1NVV79+5Vw4YNJUnp6ek6duyY1q1bp/Lyct1///0aPny4cnJyat4jAADqGda7nMtmWZZ1wT9ss2n58uUaMGCApH+OnsTGxurPf/6zHnnkEUlScXGxoqOjtXjxYt111136+uuvlZSUpLy8PCUnJ0uS1qxZo759++r7779XbGzsOfcpKytTWVmZ+7vL5VJcXJyKi4tlt9svtPx/K1CfxQcABD6Tg4zL5ZLD4ajW72+frkE5dOiQCgoKlJKS4j7mcDjUvXt35ebmSpJyc3MVGRnpDieSlJKSoqCgIG3ZsuW81505c6YcDof7ExcX58uyAQCAYXwaUAoKCiRJ0dHRHsejo6Pd5woKChQVFeVxvkGDBmratKm7zS9NmDBBxcXF7s+RI0d8WTYAADBMvdgHJSwsTGFhYf4uAwAA1BGfBpSYmBhJUmFhoVq0aOE+XlhYqC5durjbFBUVefxcRUWFfvrpJ/fPAwCACxMoC259OsWTkJCgmJgYbdiwwX3M5XJpy5YtcjqdkiSn06kTJ05o+/bt7jYbN25UVVWVunfv7styAABAPeX1CEpJSYkOHjzo/n7o0CHt3LlTTZs2VXx8vEaPHq0nn3xSbdu2dT9mHBsb637Sp3379vr973+vBx98UAsXLlR5ebmysrJ01113nfcJHgAAcPHxOqBs27ZNN910k/t7dna2JCkjI0OLFy/Wo48+qlOnTmn48OE6ceKErrvuOq1Zs8a9B4okvfHGG8rKylLv3r0VFBSkQYMGaf78+T7oDgAA+C31YRqoRvug+Is3z1FfCPZBAQBc7GojoPhtHxQAAABfIKAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADAOAQUAABiHgAIAAIxDQAEAAMYhoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeAAgAAjENAAQAAxiGgAAAA4xBQAACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADAOAQUAABiHgAIAAIxDQAEAAMYhoAAAAOMQUAAAgHEIKAAAwDgEFAAAYBwCCgAAMA4BBQAAGIeAAgAAjENAAQAAxiGgAAAA4xBQAACAcfwaUF544QW1bt1aDRs2VPfu3bV161Z/lgMAAAzht4Dy1ltvKTs7W5MnT9YXX3yhzp07KzU1VUVFRf4qCQAAGMJvAeXZZ5/Vgw8+qPvvv19JSUlauHChGjVqpL/97W/+KgkAABiigT9ueubMGW3fvl0TJkxwHwsKClJKSopyc3PPaV9WVqaysjL39+LiYkmSy+Wqlfqqyk7XynUBAKgvauN37NlrWpb1m239ElD+53/+R5WVlYqOjvY4Hh0drX379p3TfubMmZo6deo5x+Pi4mqtRgAALmaOebV37ZMnT8rhcPxqG78EFG9NmDBB2dnZ7u9VVVX66aef1KxZM9lsNp/ey+VyKS4uTkeOHJHdbvfptU1A/+q/QO9joPdPCvw+0r/6r7b6aFmWTp48qdjY2N9s65eAcumllyo4OFiFhYUexwsLCxUTE3NO+7CwMIWFhXkci4yMrM0SZbfbA/YvnkT/AkGg9zHQ+ycFfh/pX/1XG338rZGTs/yySDY0NFRXX321NmzY4D5WVVWlDRs2yOl0+qMkAABgEL9N8WRnZysjI0PJycm65pprNG/ePJ06dUr333+/v0oCAACG8FtAufPOO3X8+HFNmjRJBQUF6tKli9asWXPOwtm6FhYWpsmTJ58zpRQo6F/9F+h9DPT+SYHfR/pX/5nQR5tVnWd9AAAA6hDv4gEAAMYhoAAAAOMQUAAAgHEIKAAAwDgElP/z8ccfq3///oqNjZXNZtOKFSv8XZJPzZw5U926dVNERISioqI0YMAA7d+/399l+cyCBQvUqVMn96ZCTqdTq1ev9ndZtWbWrFmy2WwaPXq0v0vxmSlTpshms3l82rVr5++yfOqHH37QH//4RzVr1kzh4eHq2LGjtm3b5u+yfKZ169bn/G9os9mUmZnp79J8orKyUk888YQSEhIUHh6uyy+/XNOnT6/We2Xqi5MnT2r06NFq1aqVwsPDde211yovL88vtdSLre7rwqlTp9S5c2cNHTpUAwcO9Hc5Prdp0yZlZmaqW7duqqio0GOPPaY+ffpo7969aty4sb/Lq7GWLVtq1qxZatu2rSzL0pIlS3Tbbbdpx44d6tChg7/L86m8vDz99a9/VadOnfxdis916NBB69evd39v0CBw/hX1888/q2fPnrrpppu0evVqNW/eXAcOHNAll1zi79J8Ji8vT5WVle7vu3fv1i233KI77rjDj1X5ztNPP60FCxZoyZIl6tChg7Zt26b7779fDodDf/rTn/xdnk888MAD2r17t5YuXarY2Fi9/vrrSklJ0d69e3XZZZfVbTEWziHJWr58ub/LqFVFRUWWJGvTpk3+LqXWXHLJJdYrr7zi7zJ86uTJk1bbtm2tdevWWTfeeKM1atQof5fkM5MnT7Y6d+7s7zJqzbhx46zrrrvO32XUqVGjRlmXX365VVVV5e9SfCItLc0aOnSox7GBAwda6enpfqrIt06fPm0FBwdbK1eu9Dh+1VVXWY8//nid18MUz0WquLhYktS0aVM/V+J7lZWVWrZsmU6dOhVwr07IzMxUWlqaUlJS/F1KrThw4IBiY2PVpk0bpaenKz8/398l+cy7776r5ORk3XHHHYqKilLXrl318ssv+7usWnPmzBm9/vrrGjp0qM9f6uov1157rTZs2KBvvvlGkvTll1/q008/1a233urnynyjoqJClZWVatiwocfx8PBwffrpp3VeT+CMn6LaqqqqNHr0aPXs2VNXXnmlv8vxmV27dsnpdKq0tFRNmjTR8uXLlZSU5O+yfGbZsmX64osv/DYfXNu6d++uxYsXKzExUceOHdPUqVN1/fXXa/fu3YqIiPB3eTX23XffacGCBcrOztZjjz2mvLw8/elPf1JoaKgyMjL8XZ7PrVixQidOnNCQIUP8XYrPjB8/Xi6XS+3atVNwcLAqKyv11FNPKT093d+l+URERIScTqemT5+u9u3bKzo6Wm+++aZyc3P1u9/9ru4LqvMxm3pAAT7F89BDD1mtWrWyjhw54u9SfKqsrMw6cOCAtW3bNmv8+PHWpZdeau3Zs8ffZflEfn6+FRUVZX355ZfuY4E2xfNLP//8s2W32wNmmi4kJMRyOp0ex0aOHGn16NHDTxXVrj59+lj9+vXzdxk+9eabb1otW7a03nzzTeurr76yXnvtNatp06bW4sWL/V2azxw8eNC64YYbLElWcHCw1a1bNys9Pd1q165dnddCQDmPQA4omZmZVsuWLa3vvvvO36XUut69e1vDhw/3dxk+sXz5cve/MM5+JFk2m80KDg62Kioq/F1irUhOTrbGjx/v7zJ8Ij4+3ho2bJjHsRdffNGKjY31U0W15/Dhw1ZQUJC1YsUKf5fiUy1btrT+8pe/eBybPn26lZiY6KeKak9JSYl19OhRy7Isa/DgwVbfvn3rvAbWoFwkLMtSVlaWli9fro0bNyohIcHfJdW6qqoqlZWV+bsMn+jdu7d27dqlnTt3uj/JyclKT0/Xzp07FRwc7O8Sfa6kpETffvutWrRo4e9SfKJnz57nPNr/zTffqFWrVn6qqPYsWrRIUVFRSktL83cpPnX69GkFBXn+2gwODlZVVZWfKqo9jRs3VosWLfTzzz9r7dq1uu222+q8Btag/J+SkhIdPHjQ/f3QoUPauXOnmjZtqvj4eD9W5huZmZnKycnR3//+d0VERKigoECS5HA4FB4e7ufqam7ChAm69dZbFR8fr5MnTyonJ0cfffSR1q5d6+/SfCIiIuKc9UKNGzdWs2bNAmYd0SOPPKL+/furVatWOnr0qCZPnqzg4GDdfffd/i7NJ8aMGaNrr71WM2bM0ODBg7V161a99NJLeumll/xdmk9VVVVp0aJFysjICKjHxCWpf//+euqppxQfH68OHTpox44devbZZzV06FB/l+Yza9eulWVZSkxM1MGDBzV27Fi1a9dO999/f90XU+djNob68MMPLUnnfDIyMvxdmk+cr2+SrEWLFvm7NJ8YOnSo1apVKys0NNRq3ry51bt3b+uDDz7wd1m1KtDWoNx5551WixYtrNDQUOuyyy6z7rzzTuvgwYP+Lsun3nvvPevKK6+0wsLCrHbt2lkvvfSSv0vyubVr11qSrP379/u7FJ9zuVzWqFGjrPj4eKthw4ZWmzZtrMcff9wqKyvzd2k+89Zbb1lt2rSxQkNDrZiYGCszM9M6ceKEX2qxWVYAbYEHAAACAmtQAACAcQgoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAvzt+/LhiYmI0Y8YM97HNmzcrNDRUGzZs8GNlAPyFlwUCMML777+vAQMGaPPmzUpMTFSXLl1022236dlnn/V3aQD8gIACwBiZmZlav369kpOTtWvXLuXl5SksLMzfZQHwAwIKAGP84x//0JVXXqkjR45o+/bt6tixo79LAuAnrEEBYIxvv/1WR48eVVVVlQ4fPuzvcgD4ESMoAIxw5swZXXPNNerSpYsSExM1b9487dq1S1FRUf4uDYAfEFAAGGHs2LF655139OWXX6pJkya68cYb5XA4tHLlSn+XBsAPmOIB4HcfffSR5s2bp6VLl8putysoKEhLly7VJ598ogULFvi7PAB+wAgKAAAwDiMoAADAOAQUAABgHAIKAAAwDgEFAAAYh4ACAACMQ0ABAADGIaAAAADjEFAAAIBxCCgAAMA4BBQAAGAcAgoAADDO/wJXmefOucAL1AAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "np.random.seed(1) # fix seed\n", "\n", "# true parameters for signal and background\n", "truth = Namespace(n_sig=2000, f_bkg=10, sig=(5.0, 0.5), bkg=(0.0, 4.0))\n", "n_bkg = truth.n_sig * truth.f_bkg\n", "\n", "# make a data set\n", "x = np.empty(truth.n_sig + n_bkg)\n", "\n", "# fill m variables\n", "x[: truth.n_sig] = norm(*truth.sig).rvs(truth.n_sig)\n", "x[truth.n_sig :] = expon(*truth.bkg).rvs(n_bkg)\n", "\n", "# cut a range in x\n", "xrange = np.array((1.0, 9.0))\n", "ma = (xrange[0] < x) & (x < xrange[1])\n", "x = x[ma]\n", "\n", "plt.hist(\n", " (x[truth.n_sig :], x[: truth.n_sig]),\n", " bins=50,\n", " stacked=True,\n", " label=(\"background\", \"signal\"),\n", ")\n", "plt.xlabel(\"x\")\n", "plt.legend();" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "# ideal starting values for iminuit\n", "start = np.array((truth.n_sig, n_bkg, truth.sig[0], truth.sig[1], truth.bkg[1]))\n", "\n", "\n", "# iminuit instance factory, will be called a lot in the benchmarks blow\n", "def m_init(fcn):\n", " m = Minuit(fcn, start, name=(\"ns\", \"nb\", \"mu\", \"sigma\", \"lambd\"))\n", " m.limits = ((0, None), (0, None), None, (0, None), (0, None))\n", " m.errordef = Minuit.LIKELIHOOD\n", " return m" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-103168.78482586428" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# extended likelihood (https://doi.org/10.1016/0168-9002(90)91334-8)\n", "# this version uses numpy and scipy and array arithmetic\n", "def nll(par):\n", " n_sig, n_bkg, mu, sigma, lambd = par\n", " s = norm(mu, sigma)\n", " b = expon(0, lambd)\n", " # normalisation factors are needed for pdfs, since x range is restricted\n", " sn = s.cdf(xrange)\n", " bn = b.cdf(xrange)\n", " sn = sn[1] - sn[0]\n", " bn = bn[1] - bn[0]\n", " return (n_sig + n_bkg) - np.sum(\n", " np.log(s.pdf(x) / sn * n_sig + b.pdf(x) / bn * n_bkg)\n", " )\n", "\n", "\n", "nll(start)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "132 ms ± 2.15 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit -r 3 -n 1\n", "m = m_init(nll) # setup time is negligible\n", "m.migrad();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see whether we can beat that. The code above is already pretty fast, because numpy and scipy routines are fast, and we spend most of the time in those. But these implementations do not parallelize the execution and are not optimised for this particular CPU, unlike numba-jitted functions.\n", "\n", "To use numba, in theory we just need to put the `njit` decorator on top of the function, but often that doesn't work out of the box. numba understands many numpy functions, but no scipy. We must evaluate the code that uses scipy in 'object mode', which is numba-speak for calling into the Python interpreter." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-103168.78482586429" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# first attempt to use numba\n", "@nb.njit(parallel=True)\n", "def nll(par):\n", " n_sig, n_bkg, mu, sigma, lambd = par\n", " with nb.objmode(spdf=\"float64[:]\", bpdf=\"float64[:]\", sn=\"float64\", bn=\"float64\"):\n", " s = norm(mu, sigma)\n", " b = expon(0, lambd)\n", " # normalisation factors are needed for pdfs, since x range is restricted\n", " sn = np.diff(s.cdf(xrange))[0]\n", " bn = np.diff(b.cdf(xrange))[0]\n", " spdf = s.pdf(x)\n", " bpdf = b.pdf(x)\n", " no = n_sig + n_bkg\n", " return no - np.sum(np.log(spdf / sn * n_sig + bpdf / bn * n_bkg))\n", "\n", "\n", "nll(start) # test and warm-up JIT" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "157 ms ± 3.75 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit -r 3 -n 1 m = m_init(nll)\n", "m.migrad()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is even a bit slower. :( Let's break the original function down by parts to see why." ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "475 µs ± 3.24 µs per loop (mean ± std. dev. of 3 runs, 100 loops each)\n", "451 µs ± 5.18 µs per loop (mean ± std. dev. of 3 runs, 500 loops each)\n", "66.3 µs ± 219 ns per loop (mean ± std. dev. of 3 runs, 1,000 loops each)\n" ] } ], "source": [ "# let's time the body of the function\n", "n_sig, n_bkg, mu, sigma, lambd = start\n", "s = norm(mu, sigma)\n", "b = expon(0, lambd)\n", "# normalisation factors are needed for pdfs, since x range is restricted\n", "sn = np.diff(s.cdf(xrange))[0]\n", "bn = np.diff(b.cdf(xrange))[0]\n", "spdf = s.pdf(x)\n", "bpdf = b.pdf(x)\n", "\n", "%timeit -r 3 -n 100 norm(*start[2:4]).pdf(x)\n", "%timeit -r 3 -n 500 expon(0, start[4]).pdf(x)\n", "%timeit -r 3 -n 1000 n_sig + n_bkg - np.sum(np.log(spdf / sn * n_sig + bpdf / bn * n_bkg))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of the time is spend in `norm` and `expon` which numba could not accelerate and the total time is dominated by the slowest part.\n", "\n", "This, unfortunately, means we have to do much more manual work to make the function faster, since we have to replace the scipy routines with Python code that numba can accelerate and run in parallel." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-103168.78482586429" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# when parallel is enabled, also enable associative math\n", "kwd = {\"parallel\": True, \"fastmath\": {\"reassoc\", \"contract\", \"arcp\"}}\n", "\n", "\n", "@nb.njit(**kwd)\n", "def sum_log(fs, spdf, fb, bpdf):\n", " return np.sum(np.log(fs * spdf + fb * bpdf))\n", "\n", "\n", "@nb.njit(**kwd)\n", "def norm_pdf(x, mu, sigma):\n", " invs = 1.0 / sigma\n", " z = (x - mu) * invs\n", " invnorm = 1 / np.sqrt(2 * np.pi) * invs\n", " return np.exp(-0.5 * z ** 2) * invnorm\n", "\n", "\n", "@nb.njit(**kwd)\n", "def nb_erf(x):\n", " y = np.empty_like(x)\n", " for i in nb.prange(len(x)):\n", " y[i] = math.erf(x[i])\n", " return y\n", "\n", "\n", "@nb.njit(**kwd)\n", "def norm_cdf(x, mu, sigma):\n", " invs = 1.0 / (sigma * np.sqrt(2))\n", " z = (x - mu) * invs\n", " return 0.5 * (1 + nb_erf(z))\n", "\n", "\n", "@nb.njit(**kwd)\n", "def expon_pdf(x, lambd):\n", " inv_lambd = 1.0 / lambd\n", " return inv_lambd * np.exp(-inv_lambd * x)\n", "\n", "\n", "@nb.njit(**kwd)\n", "def expon_cdf(x, lambd):\n", " inv_lambd = 1.0 / lambd\n", " return 1.0 - np.exp(-inv_lambd * x)\n", "\n", "\n", "def nll(par):\n", " n_sig, n_bkg, mu, sigma, lambd = par\n", " # normalisation factors are needed for pdfs, since x range is restricted\n", " sn = norm_cdf(xrange, mu, sigma)\n", " bn = expon_cdf(xrange, lambd)\n", " sn = sn[1] - sn[0]\n", " bn = bn[1] - bn[0]\n", " spdf = norm_pdf(x, mu, sigma)\n", " bpdf = expon_pdf(x, lambd)\n", " no = n_sig + n_bkg\n", " return no - sum_log(n_sig / sn, spdf, n_bkg / bn, bpdf)\n", "\n", "\n", "nll(start) # test and warm-up JIT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how well these versions do:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "111 µs ± 18.5 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)\n", "56.9 µs ± 1.02 µs per loop (mean ± std. dev. of 5 runs, 500 loops each)\n", "59.1 µs ± 929 ns per loop (mean ± std. dev. of 5 runs, 1,000 loops each)\n" ] } ], "source": [ "%timeit -r 5 -n 100 norm_pdf(x, *start[2:4])\n", "%timeit -r 5 -n 500 expon_pdf(x, start[4])\n", "%timeit -r 5 -n 1000 sum_log(n_sig / sn, spdf, n_bkg / bn, bpdf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only a minor improvement for `sum_log`, but the pdf calculation was drastically accelerated. Since this was the bottleneck before, we expect also Migrad to finish faster now." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "42.7 ms ± 1.59 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit -r 3 -n 1\n", "m = m_init(nll) # setup time is negligible\n", "m.migrad();" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Success! We managed to get a big speed improvement over the initial code. This is impressive, but it cost us a lot of developer time. This is not always a good trade-off, especially if you consider that library routines are heavily tested, while you always need to test your own code in addition to writing it.\n", "\n", "By putting these faster functions into a library, however, we would only have to pay the developer cost once. You can find those in the [numba-stats](https://github.com/HDembinski/numba-stats) library." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "63.1 µs ± 4.25 µs per loop (mean ± std. dev. of 5 runs, 100 loops each)\n", "63.9 µs ± 1.08 µs per loop (mean ± std. dev. of 5 runs, 500 loops each)\n" ] } ], "source": [ "from numba_stats import norm, expon\n", "\n", "%timeit -r 5 -n 100 norm.pdf(x, *start[2:4])\n", "%timeit -r 5 -n 500 expon.pdf(x, 0, start[4])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The implementation of the normal pdf in numba-stats is even faster than our simple implementation here.\n", "\n", "Try to compile the functions again with `parallel=False` to see how much of the speed increase came from the parallelization and how much from the generally optimized code that `numba` generated for our specific CPU. On my machine, the gain was entirely due to numba.\n", "\n", "In general, it is good advice to not automatically add `parallel=True`, because this comes with an overhead of breaking data into chunks, copy chunks to the individual CPUs and finally merging everything back together. For large arrays, this overhead is negligible, but for small arrays, it can be a net loss.\n", "\n", "So why is `numba` so fast even without parallelization? We can look at the assembly code generated." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "signature: (array(float64, 1d, C), float64, float64)\n", "----------------------------------------------------\n", "\t.section\t__TEXT,__text,regular,pure_instructions\n", "\t.build_version macos, 12, 0\n", "\t.globl\t__ZN8__main__8norm_pdfB3v76B146c8tJTC_2fWQAlbW1yBC0oR6GELEUMELYSPGrIQMVjAQniQcIXKQIMVwoOGKoQDDVQQR1NHAS2FQ9XgSs8w86AhbIsexNXqiUXJBeo6CupFwBRdnJ8MYibn55UUJSaXqNcC7QPGIsRqAA_3d_3dE5ArrayIdLi1E1C7mutable7alignedEdd\n", "\t.p2align\t2\n", "__ZN8__main__8norm_pdfB3v76B146c8tJTC_2fWQAlbW1yBC0oR6GELEUMELYSPGrIQMVjAQniQcIXKQIMVwoOGKoQDDVQQR1NHAS2FQ9XgSs8w86AhbIsexNXqiUXJBeo6CupFwBRdnJ8MYibn55UUJSaXqNcC7QPGIsRqAA_3d_3dE5ArrayIdLi1E1C7mutable7alignedEdd:\n", "\t.cfi_startproc\n", "\tstp\td9, d8, [sp, #-112]!\n", "\tstp\tx28, x27, [sp, #16]\n", "\tstp\tx26, x25, [sp, #32]\n", "\tstp\tx24, x23, [sp, #48]\n", "\tstp\tx22, x21, [sp, #64]\n", "\tstp\tx20, x19, [sp, #80]\n", "\tstp\tx29, x30, [sp, #96]\n", "\tadd\tx29, sp, #96\n", "\tsub\tsp, sp, #368\n", "\tmov\tx19, sp\n", "\t.cfi_def_cfa w29, 16\n", "\t.cfi_offset w30, -8\n", "\t.cfi_offset w29, -16\n", "\t.cfi_offset w19, -24\n", "\t.cfi_offset w20, -32\n", "\t.cfi_offset w21, -40\n", "\t.cfi_offset w22, -48\n", "\t.cfi_offset w23, -56\n", "\t.cfi_offset w24, -64\n", "\t.cfi_offset w25, -72\n", "\t.cfi_offset w26\n", "[...]\n" ] } ], "source": [ "for signature, code in norm_pdf.inspect_asm().items():\n", " print(f\"signature: {signature}\\n{'-'*(len(str(signature)) + 11)}\\n{code[:1000]}\\n[...]\")\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This code section is very long, but the assembly grammar is very simple. Constants start with `.` and `SOMETHING:` is a jump label for the assembly equivalent of `goto`. Everything else is an instruction with its name on the left and the arguments are on the right.\n", "\n", "The SIMD instructions are the interesting commands that operate on multiple values at once. This is where the speed comes from. \n", "- If you are on the **x86** platform, those instructions end with `pd` and `ps`.\n", "- On **arch64**, they contain a dot `.` and some letters/numbers afterwards." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "signature: (array(float64, 1d, C), float64, float64)\n", "----------------------------------------------------\n", "Instructions\n", "add : 44\n", "adds : 2\n", "fmov : 3\n", "fmov.2d : 1\n", "fmul : 5\n", "fmul.2d : 5\n", "fsub : 1\n", "fsub.2d : 2\n", "madd : 6\n", "mov : 108\n", "mov.16b : 6\n", "mov.d : 1\n", "movi.16b : 5\n", "movk : 3\n", "mul : 3\n", "smulh : 1\n", "sub : 23\n", "subs : 1\n" ] } ], "source": [ "import re\n", "from collections import Counter\n", "\n", "for signature, code in norm_pdf.inspect_asm().items():\n", " print(f\"signature: {signature}\\n{'-'*(len(str(signature)) + 11)}\") \n", " instructions = []\n", " for line in code.split(\"\\n\"):\n", " instr = line.strip().split(\"\\t\")[0]\n", " if instr.startswith(\".\"): continue\n", " for match in (\"add\", \"sub\", \"mul\", \"mov\"):\n", " if match in instr:\n", " instructions.append(instr)\n", " c = Counter(instructions)\n", " print(\"Instructions\")\n", " for k in sorted(c):\n", " print(f\"{k:10}: {c[k]:5}\")\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "- `add`: subtract numbers\n", "- `sub`: subtract numbers\n", "- `mul`: multiply numbers\n", "- `mov`: copy values from memory to CPU registers and back\n", "\n", "You can google all the other commands.\n", "\n", "There is a lot of repetition in the assembly code, because the optimizer unrolls loops over subsequences to make them faster. Using an unrolled loop only works if the remaining chunk of data is large enough. Since the compiler does not know the length of the incoming array, it generates sections which handle shorter chunks and all the code to select which section to use. Finally, there is some code which does the translation from and to Python objects with corresponding error handling.\n", "\n", "We don't need to write SIMD instructions by hand, the optimizer does it for us and in a very sophisticated way." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('venv': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "vscode": { "interpreter": { "hash": "bdbf20ff2e92a3ae3002db8b02bd1dd1b287e934c884beb29a73dced9dbd0fa3" } } }, "nbformat": 4, "nbformat_minor": 4 }