Code Preference
===============

In general, we want our developers to follow the
`coding guidelines for astropy-affiliated packages <https://docs.astropy.org/en/latest/development/codeguide.html>`_
as much as possible.
A few important conventions and special cases should be outlined here.

Package preference
------------------

Several packages are favored over others when they can solve the problem at hand;
developers should use them whenever possible.


Standard libraries
    The Python standard library has the highest priority, e.g., ``os``, ``re``, etc.
``numpy``, ``scipy``, ``matplotlib``
    The "big three" of Python scientific computing.
``astropy`` and its affiliated packages
    For example, ``astropy.io.fits`` is favored over ``pyfits``.


Parallel computing
------------------

Two packages are preferred for implementing *embarrassingly parallel* computing (i.e., tasks that require no inter-process communication).

- ``multiprocessing``: https://docs.python.org/3/library/multiprocessing.html
- ``joblib``: https://joblib.readthedocs.io/en/latest/

.. literalinclude:: preference/example_multiprocessing.py
    :linenos:
    :language: python
    :caption: an example of using ``multiprocessing`` for parallel computing

The output is

.. code-block::

    Total time cost: 5.095193147659302 sec!
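
The included file is not reproduced here; a minimal sketch of the same pattern, with a hypothetical ``task`` function that simply sleeps for one second, could look like:

.. code-block:: python

    import time
    from multiprocessing import Pool


    def task(i):
        """A hypothetical job standing in for real work; it just sleeps 1 s."""
        time.sleep(1)
        return i * i


    if __name__ == "__main__":
        t0 = time.time()
        # Map 5 tasks onto 5 worker processes; each task sleeps 1 s,
        # so the total wall time is roughly 1 s instead of 5 s.
        with Pool(processes=5) as pool:
            results = pool.map(task, range(5))
        print(results)
        print("Total time cost: {} sec!".format(time.time() - t0))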

.. literalinclude:: preference/example_joblib.py
    :linenos:
    :language: python
    :caption: an example of using ``joblib`` for parallel computing

The output is

.. code-block::

    [Parallel(n_jobs=5)]: Using backend LokyBackend with 5 concurrent workers.
    [Parallel(n_jobs=5)]: Done   1 tasks      | elapsed:    5.2s
    [Parallel(n_jobs=5)]: Done   2 out of   5 | elapsed:    5.2s remaining:    7.8s
    [Parallel(n_jobs=5)]: Done   3 out of   5 | elapsed:    5.2s remaining:    3.5s
    [Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    5.2s remaining:    0.0s
    [Parallel(n_jobs=5)]: Done   5 out of   5 | elapsed:    5.2s finished
    Total time cost: 5.1958301067352295 sec!

.. tip::
    ``joblib`` is recommended for its concise syntax and verbose progress output: an entire parallel run fits in one statement.
    ``n_jobs`` can be set to ``-1`` to use all available CPUs, and ``backend`` can be set to ``multiprocessing``
    to use the backend built on the standard library ``multiprocessing``, or to ``loky`` (the default), which is claimed to be more robust.
    Visit https://joblib.readthedocs.io/en/latest/ for more information on ``joblib``,
    such as the ``batch_size`` and ``verbose`` parameters.
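
Concretely, the one-statement pattern referred to above looks like the following sketch (with a hypothetical ``task`` function; ``joblib`` must be installed):

.. code-block:: python

    import time

    from joblib import Parallel, delayed


    def task(i):
        """A hypothetical job; it just sleeps for one second."""
        time.sleep(1)
        return i * i


    # One statement: 5 workers, loky backend, verbose progress log on stderr.
    results = Parallel(n_jobs=5, backend="loky", verbose=10)(
        delayed(task)(i) for i in range(5)
    )
    print(results)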

For parallel computing with inter-process communication, or for distributed computing,
we recommend that developers consider ``mpi4py``: https://github.com/mpi4py/mpi4py.


Global variables
----------------

Use of the ``global`` statement is prohibited.
In most cases, variables should be kept in their default scopes.
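
For instance, instead of mutating a module-level variable through ``global``, pass the value in as an argument and return the updated value (a minimal illustration):

.. code-block:: python

    # Discouraged: mutating module-level state.
    total = 0

    def add_global(x):
        global total          # prohibited by this guideline
        total += x

    # Preferred: keep the state in the caller's scope.
    def add(total, x):
        """Return the new total instead of mutating shared state."""
        return total + x

    running = 0
    for value in (1, 2, 3):
        running = add(running, value)
    print(running)  # -> 6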

Using ``subprocess``
--------------------

``subprocess.run()`` is favored over ``subprocess.Popen()`` and ``os.system()``, as suggested in the Python documentation:

- The ``subprocess`` module allows you to spawn new processes, connect to their input/output/error pipes,
  and obtain their return codes. This module intends to replace several older modules and functions:

  .. code-block:: python

     os.system
     os.spawn*

- The recommended approach to invoking subprocesses is to use the ``run()`` function for all use cases
  it can handle. For more advanced use cases, the underlying ``Popen`` interface can be used directly.
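
A minimal, self-contained sketch of the preferred pattern (the command here simply invokes the current Python interpreter):

.. code-block:: python

    import subprocess
    import sys

    # Run a command, capture its output as text, and fail loudly
    # (CalledProcessError) if the exit code is non-zero.
    result = subprocess.run(
        [sys.executable, "-c", "print('hello from a subprocess')"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())   # -> hello from a subprocess
    print(result.returncode)       # -> 0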


NumPy multithreading
--------------------

NumPy sometimes uses multithreading automatically. To see whether you are actually linked against OpenBLAS or MKL, run

.. code-block:: python

    numpy.__config__.show()

To disable this feature, set the following environment variables (in the shell, before starting Python):

.. code-block:: shell

    export MKL_NUM_THREADS=1
    export NUMEXPR_NUM_THREADS=1
    export OMP_NUM_THREADS=1
    export VECLIB_MAXIMUM_THREADS=1

.. note::
    In most cases, this automatic multithreading does not improve performance in practice.
    Therefore, the settings above are applied globally in the CSST pipeline.
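
Because the BLAS/OpenMP thread pools read these variables only once, at load time, they can also be set from Python itself, provided this happens before the first ``import numpy`` anywhere in the process (a sketch):

.. code-block:: python

    import os

    # Must run before NumPy (and the BLAS it links against) is first imported.
    for var in ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS",
                "OMP_NUM_THREADS", "VECLIB_MAXIMUM_THREADS"):
        os.environ[var] = "1"

    import numpy as np  # noqa: E402 -- imported after the variables are set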