Code Preference
===============

In general, we want our developers to follow the
`coding guidelines for astropy-affiliated packages <https://docs.astropy.org/en/latest/development/codeguide.html>`_
as much as possible.
A few important conventions and special cases should be outlined here.

Package preference
------------------

Several packages are favored over others when they can solve the problem at hand;
developers should use them whenever possible.


Standard libraries
    The Python standard library has the highest priority, e.g., ``os``, ``re``, etc.
``numpy``, ``scipy``, ``matplotlib``
    The "big three" of Python scientific computing.
``astropy`` and its affiliated packages
    For example, ``astropy.io.fits`` is favored over ``pyfits``.


Parallel computing
------------------

Two packages are preferred for implementing *embarrassingly parallel* computing (i.e., tasks that require no inter-process communication).

- ``multiprocessing``: https://docs.python.org/3/library/multiprocessing.html
- ``joblib``: https://joblib.readthedocs.io/en/latest/

.. literalinclude:: preference/example_multiprocessing.py
    :linenos:
    :language: python
    :caption: an example of using ``multiprocessing`` for parallel computing

The output is

.. code-block::

    Total time cost: 5.095193147659302 sec!
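
The included file is not reproduced here; a minimal sketch of the same pattern, with a hypothetical ``task`` function that simply sleeps for one second, could look like:

.. code-block:: python

    import time
    from multiprocessing import Pool


    def task(i):
        """A hypothetical job standing in for real work; it just sleeps 1 s."""
        time.sleep(1)
        return i * i


    if __name__ == "__main__":
        t0 = time.time()
        # Map 5 tasks onto 5 worker processes; each task sleeps 1 s,
        # so the total wall time is roughly 1 s instead of 5 s.
        with Pool(processes=5) as pool:
            results = pool.map(task, range(5))
        print(results)
        print("Total time cost: {} sec!".format(time.time() - t0))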

.. literalinclude:: preference/example_joblib.py
    :linenos:
    :language: python
    :caption: an example of using ``joblib`` for parallel computing

The output is

.. code-block::

    [Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
    [Parallel(n_jobs=5)]: Done   1 tasks      | elapsed:    5.2s
    [Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    5.2s remaining:    7.8s
    [Parallel(n_jobs=5)]: Done   3 out of   5 | elapsed:    5.2s remaining:    3.5s
    [Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    5.2s remaining:    0.0s
    [Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    5.2s finished
    Total time cost: 5.1958301067352295 sec!

.. tip::
    ``joblib`` is recommended for its concise syntax and verbose progress output: an entire parallel run fits in one statement.
    ``n_jobs`` can be set to ``-1`` to use all available CPUs, and ``backend`` can be set to ``multiprocessing``
    to use the backend built on the standard library ``multiprocessing``, or to ``loky`` (the default), which is claimed to be more robust.
    Visit https://joblib.readthedocs.io/en/latest/ for more information on ``joblib``,
    such as the ``batch_size`` and ``verbose`` parameters.
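
Concretely, the one-statement pattern referred to above looks like the following sketch (with a hypothetical ``task`` function; ``joblib`` must be installed):

.. code-block:: python

    import time

    from joblib import Parallel, delayed


    def task(i):
        """A hypothetical job; it just sleeps for one second."""
        time.sleep(1)
        return i * i


    # One statement: 5 workers, loky backend, verbose progress log on stderr.
    results = Parallel(n_jobs=5, backend="loky", verbose=10)(
        delayed(task)(i) for i in range(5)
    )
    print(results)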

For parallel computing with inter-process communication, or for distributed computing,
we recommend that developers consider ``mpi4py``: https://github.com/mpi4py/mpi4py.


Global variables
----------------

Use of the ``global`` statement is prohibited.
In most cases, variables should be kept in their default scopes.
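
For instance, instead of mutating a module-level variable through ``global``, pass the value in as an argument and return the updated value (a minimal illustration):

.. code-block:: python

    # Discouraged: mutating module-level state.
    total = 0

    def add_global(x):
        global total          # prohibited by this guideline
        total += x

    # Preferred: keep the state in the caller's scope.
    def add(total, x):
        """Return the new total instead of mutating shared state."""
        return total + x

    running = 0
    for value in (1, 2, 3):
        running = add(running, value)
    print(running)  # -> 6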

Using ``subprocess``
--------------------

``subprocess.run()`` is favored over ``subprocess.Popen()`` and ``os.system()``, as suggested in the Python documentation:

- The ``subprocess`` module allows you to spawn new processes, connect to their input/output/error pipes,
  and obtain their return codes. This module intends to replace several older modules and functions:

  .. code-block:: python

     os.system
     os.spawn*

- The recommended approach to invoking subprocesses is to use the ``run()`` function for all use cases
  it can handle. For more advanced use cases, the underlying ``Popen`` interface can be used directly.
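
A minimal, self-contained sketch of the preferred pattern (the command here simply invokes the current Python interpreter):

.. code-block:: python

    import subprocess
    import sys

    # Run a command, capture its output as text, and fail loudly
    # (CalledProcessError) if the exit code is non-zero.
    result = subprocess.run(
        [sys.executable, "-c", "print('hello from a subprocess')"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())   # -> hello from a subprocess
    print(result.returncode)       # -> 0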


NumPy multithreading
--------------------

NumPy sometimes uses multithreading automatically. To see whether you are actually linked against OpenBLAS or MKL, run

.. code-block:: python

    numpy.__config__.show()

To disable this feature, set the following environment variables (in the shell, before starting Python):

.. code-block:: shell

    export MKL_NUM_THREADS=1
    export NUMEXPR_NUM_THREADS=1
    export OMP_NUM_THREADS=1
    export VECLIB_MAXIMUM_THREADS=1

.. note::
    In most cases, this automatic multithreading does not improve performance in practice.
    Therefore, the settings above are applied globally in the CSST pipeline.
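
Because the BLAS/OpenMP thread pools read these variables only once, at load time, they can also be set from Python itself, provided this happens before the first ``import numpy`` anywhere in the process (a sketch):

.. code-block:: python

    import os

    # Must run before NumPy (and the BLAS it links against) is first imported.
    for var in ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS",
                "OMP_NUM_THREADS", "VECLIB_MAXIMUM_THREADS"):
        os.environ[var] = "1"

    import numpy as np  # noqa: E402 -- imported after the variables are set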