{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Performance impact of memory layout\n", "\n", "Modern CPUs are so fast that they are often wait for memory to be transferred. In case of memory access with a regular pattern, the CPU will prefetch memory that is likely to be used next. We can optimise this a little bit by arranging the numbers in memory that they are easy to fetch. For 1D data, there is not much that we can do, but for ND data, we have the choice between two layouts\n", "\n", "- x0, y0, ... x1, y1, ...\n", "- x0, x1, ..., y0, y1, ...\n", "\n", "Which one is more efficient is not obvious, so we try both options here. It turns out that the second option is better and that is the one used internally in the builtin cost functions as well." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from iminuit import Minuit\n", "from iminuit.cost import UnbinnedNLL\n", "import numpy as np\n", "from scipy.stats import multivariate_normal\n", "\n", "rng = np.random.default_rng(1)\n", "\n", "xy1 = rng.normal(size=(1_000_000, 2))\n", "xy2 = rng.normal(size=(2, 1_000_000))\n", "\n", "\n", "def cost1(x, y):\n", " return -np.sum(multivariate_normal.logpdf(xy1, (x, y)))\n", "\n", "cost1.errordef = Minuit.LIKELIHOOD\n", "\n", "def cost2(x, y):\n", " return -np.sum(multivariate_normal.logpdf(xy2.T, (x, y)))\n", "\n", "cost2.errordef = Minuit.LIKELIHOOD\n", "\n", "\n", "def logpdf(xy, x, y):\n", " return multivariate_normal.logpdf(xy.T, (x, y))\n", "\n", "cost3 = UnbinnedNLL(xy2, logpdf, log=True)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.68 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%%timeit -n 1 -r 1\n", "m = Minuit(cost1, x=0, y=0)\n", "m.migrad()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "470 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%%timeit -n 1 -r 1\n", "m = Minuit(cost2, x=0, y=0)\n", "m.migrad()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "528 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)\n" ] } ], "source": [ "%%timeit -n 1 -r 1\n", "m = Minuit(cost3, x=0, y=0)\n", "m.migrad()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "`cost2` and `cost3` are using the \"first all `x` then all `y`\" memory layout. `cost3` measures the small overhead incurred by using the built-in cost function `UnbinnedNLL` compared to a hand-tailored one." ] } ], "metadata": { "interpreter": { "hash": "bdbf20ff2e92a3ae3002db8b02bd1dd1b287e934c884beb29a73dced9dbd0fa3" }, "kernelspec": { "display_name": "Python 3.8.12 ('venv': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }