This repository provides the following functionalities:
1. [Read or Download a File From S3 Storage](#1-read-or-download-a-file-from-s3-storage)
2. [Commit For File Processing](#2-commit-for-file-processing)
3. [Query a List Of L1/L2 Fits-Files By Metadata Values](#3-query-a-list-of-l1l2-fits-files-by-metadata-values)
4. [Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state)
5. [Query a Star Catalog](#5-query-a-star-catalog)

# 1. Read or Download a File from S3 storage
Two distinct ways of reading from S3 storage are supported:

1) [Download to a local file](#download-from-s3-to-local-storage)
2) [Use open() to get a file object](#open-for-read)

## Configuration
**astropy must be upgraded to 5.3.**  
**The legacy usage works with both the local NAS and cloud S3: any read path that starts with the `s3://` protocol is recognized automatically.**

## Download from S3 to local storage
```python
def get_path(remote_path: str, local_path: str):
    """
    Download a file/folder from s3 to local storage.

    Args:
        remote_path: s3 key
        local_path: Local path to download to.
    """

def info_path(remote_path: str):
    """
    Get information about an s3 file.

    Args:
        remote_path: s3 key
    """

# Example:
from csst_fs import s3_fs
# single file
s3_fs.get_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits', 'v01.fits')
# folder (pass recursive=True)
s3_fs.get_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0', './', recursive=True)
# get file or folder info
s3_fs.info_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits')
```

## Open for read
```python
def open_path(remote_path: str, mode: str = 'r'):
    """
    Get a read-only file object for a file on s3. Use mode='rb' for binary files.

    Args:
        remote_path: s3 key
        mode: File mode, 'r' (default) for text files or 'rb' for binary files.

    Returns:
        File object of the s3 file.
    """

# Example:
from csst_fs import s3_fs
# open single file (s3 or local)
with s3_fs.open_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits', mode='rb') as file:
    file.read()
```
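
Large FITS files do not have to be read with a single `read()` call. The following is a minimal sketch that hashes a binary file object chunk by chunk. It assumes that the object returned by `open_path(..., mode='rb')` supports `read(size)` like a regular binary file object, which this README does not confirm. The helper name `sha256_of` is hypothetical and not part of `csst_fs`.

```python
import hashlib
from typing import BinaryIO


def sha256_of(file: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary file object in fixed-size chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    while True:
        chunk = file.read(chunk_size)
        if not chunk:
            # An empty bytes object signals end of file
            break
        digest.update(chunk)
    return digest.hexdigest()
```

Under the assumption above, it could be used as `with s3_fs.open_path(key, mode='rb') as file: sha256_of(file)`.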

# 2. Commit For File Processing
Submit a file's content and file name to the ingestion API for further processing.
The function returns a successful response as soon as the file content has been stored and queued for further processing. Otherwise, it raises a `RuntimeError` once committing has failed after retries.
A successful response contains a task_id referring to the queued processing task. This can be used in [4. Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state) for querying a processing task's current state.

## Function: `start_ingestion_task`
```python
from typing import Any, Dict, List


def start_ingestion_task(files: List[dict]) -> Dict[str, Any]:
    """
    Submit a list of file contents and file names for ingestion.

    Args:
        files: A list of dicts, one per file:
        [
            {
                file_name (str): The file name for storing the file after ingestion.
                file_content (bytes): The file's content
            },
            {
                ...
            }
        ]

    Returns:
        dict: A dict containing a task_id referring to the queued processing task as well as a field failed, listing the file names for which ingestion failed.
        Example:
        {
            "task_id": "5",
            "failed": List[str] List of file names for which ingestion failed.
        }

    Raises:
        RuntimeError: If committing failed after retries.
    """
```
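
A minimal sketch of preparing the `files` argument from files on local disk. The helper name `build_ingestion_payload` is hypothetical and not part of this repository; it only relies on the `file_name` and `file_content` keys documented above.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


def build_ingestion_payload(paths: Iterable[Union[str, Path]]) -> List[Dict[str, Any]]:
    """Read each local file and wrap it in the dict shape expected by `files`.

    Hypothetical helper, not part of csst_fs.
    """
    payload = []
    for path in paths:
        path = Path(path)
        payload.append({
            "file_name": path.name,             # name used to store the file after ingestion
            "file_content": path.read_bytes(),  # raw bytes of the file
        })
    return payload
```

The result could then be passed on as `start_ingestion_task(build_ingestion_payload([...]))`. A non-empty `failed` list in the response names the files whose ingestion failed.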


# 3. Query a List Of L1/L2 Fits-Files By Metadata Values
Query for file info by metadata values.

## Function: `query_metadata`
```python
from typing import Any, Dict, List


def query_metadata(
    filter: Dict[str, Any],
    key: List[str],
    hdu: int = 0
) -> List[Dict[str, Any]]:
    """
    Query for file info by metadata values.

    Args:
        filter: The filter dict described below.
        key: A list of metadata keys that should be included in the output.
        hdu: The HDU that the filter and key arguments refer to, e.g. 0 or 1. Default is 0.

    Returns:
        A List[Dict] of matching documents, each containing the file path (as 'urn') and the keys requested via 'key' under 'metadata'.
        E.g. with key = ["CABEND", "qc_status"]
            then returns:
            [
                {
                    "urn": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "metadata": {
                        "CABEND": "59785.82529",
                        "qc_status": "0.0"
                    },
                    "removed": false,
                    "created": 1756284502817,
                    "parentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/",
                    "name": "CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "lastModified": 1756284502817,
                    "grandParentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/",
                    "platform": "s3",
                    "tags": [
                        "L1"
                    ]
                }
            ]
    """
```
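
The nested return value is often easier to work with as flat rows. The following sketch assumes the result shape shown in the docstring example above (a `urn` plus a `metadata` dict per document). The helper name `flatten_results` is hypothetical and not part of this repository.

```python
from typing import Any, Dict, List


def flatten_results(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge each document's 'urn' with its 'metadata' values into one flat dict.

    Hypothetical helper; assumes the documented result shape.
    """
    rows = []
    for doc in results:
        row = {"urn": doc["urn"]}
        # Documents without a 'metadata' entry contribute only their urn
        row.update(doc.get("metadata", {}))
        rows.append(row)
    return rows
```

Each row then holds the `urn` next to the requested keys, which makes it straightforward to build a table or pass the paths to `open_path`.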
## Filter Syntax
All filters are combined with logical AND (every clause must match).
1) String equality
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "obs_type": "WIDE",
}
```

2) Numeric equality and ranges
Supported inequality operators are:
- `lt`/`gt`: less/greater than
- `lte`/`gte`: less/greater than or equal
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "ra": {
        "gte": 250,
        "lte": 260
    },
    "qc_status": 0,
}
```

3) List of values
The queried data should match one of the values in the list. String or number values are possible.
```python
filter = {
    "NAXIS": [0, 1]
}
```
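
To make the combination of the three clause types concrete, here is a local, client-side sketch of the matching rules described above: equality, ranges and value lists, all joined with logical AND. It is only an illustration. The service's actual evaluation is not shown in this README and may differ, for example in type coercion. The function name `matches` is hypothetical.

```python
from typing import Any, Dict

# Comparison operators named in the filter syntax above.
_RANGE_OPS = {
    "lt": lambda value, bound: value < bound,
    "gt": lambda value, bound: value > bound,
    "lte": lambda value, bound: value <= bound,
    "gte": lambda value, bound: value >= bound,
}


def matches(metadata: Dict[str, Any], filter: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every clause of `filter` (logical AND)."""
    for key, condition in filter.items():
        if key not in metadata:
            return False
        value = metadata[key]
        if isinstance(condition, dict):
            # Range clause, e.g. {"gte": 250, "lte": 260}
            if not all(_RANGE_OPS[op](value, bound) for op, bound in condition.items()):
                return False
        elif isinstance(condition, list):
            # List clause: the value must equal one of the entries
            if value not in condition:
                return False
        elif value != condition:
            # Plain equality for strings and numbers
            return False
    return True
```

For instance, `matches({"ra": 255.0, "qc_status": 0}, {"ra": {"gte": 250, "lte": 260}, "qc_status": 0})` evaluates both clauses and only succeeds if each one holds. An operator other than the four listed raises a `KeyError` in this sketch.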

# 4. Query a L2 Processing Tasks State
Query the processing state of a processing task given a L2 task id.

## Function: `query_task_state`
```python
from typing import Any, Dict


def query_task_state(
    task_id: str
) -> Dict[str, Any]:
    """
    Query the processing state of a processing task given a L2 task id.

    Args:
        task_id: Task id of the L2 processing task

    Returns:
        A dict describing the current state of the corresponding processing task.
        The following strings are valid state values: tbd
        E.g.
            {
                "state": "submission_pending",
            }
    """
```
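
A task state is usually queried repeatedly until processing has moved on. The following polling sketch is written under explicit assumptions: the list of valid state values is still "tbd" above, so the caller has to pass the set of states that count as pending (only `"submission_pending"` appears in this README). The helper `wait_for_state` and its parameters are hypothetical; the query function is passed in as `query`, so the sketch does not depend on how `query_task_state` is imported.

```python
import time
from typing import Any, Callable, Collection, Dict


def wait_for_state(
    query: Callable[[str], Dict[str, Any]],
    task_id: str,
    pending_states: Collection[str],
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Poll `query(task_id)` until the reported state is no longer pending.

    Hypothetical helper; `pending_states` must be supplied by the caller.
    """
    for attempt in range(max_attempts):
        result = query(task_id)
        if result["state"] not in pending_states:
            return result
        if attempt < max_attempts - 1:
            # Wait before the next poll, but not after the final attempt
            time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} attempts")
```

A call could look like `wait_for_state(query_task_state, "5", {"submission_pending"})`, with the pending set extended once the valid state values are defined.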

# 5. Query a Star Catalog
Query a star catalog by column values given a ra, dec and radius preselection.

## Function: `query_star_catalog`
```python
from typing import Any, Dict, List


def query_star_catalog(
    catalog_name: str,
    filter: Dict[str, Any],
    key: List[str],
) -> List[Dict[str, Any]]:
    """
    Query a star catalog by column values given a ra, dec and radius preselection.

    Args:
        catalog_name: Name of the star catalog (e.g. csst-msc-l1-mbi-catmix)
        filter: The filter dict described below.
            The following keys MUST be set:
            {
                "ra": 40.3,
                "dec": 21.9,
                "radius": 0.2,
            }
            The ra and dec values pinpoint a location; 'radius' defines a radius in [deg] around this point.
            Only star catalog objects within this area are considered for subsequent filtering.
            Setting ranges with (lt, gt, lte, gte) for ra, dec values is not supported.
        key: A list of column names that should be present in the return value.

    Returns:
        A List[Dict] of matching star catalog objects, containing key-value pairs for the keys set as 'key' parameter.
        E.g. with key = ["x", "bulge_flux", "ab"]
            then returns:
            [
                {
                    "x": 995.27,
                    "bulge_flux": "3.2",
                    "ab": 1.2,
                },
            ]
    """
```
## Filter Syntax
All filters are combined with logical AND (every clause must match).
1) String equality
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "detector": "06",
}
```

2) Numeric equality and ranges
Supported inequality operators are:
- `lt`/`gt`: less/greater than
- `lte`/`gte`: less/greater than or equal
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "x": {
        "gte": 996,
        "lte": 1000,
    },
    "ratio_disk": -9999,
}
```
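
Because `ra`, `dec` and `radius` are mandatory and must be plain numbers rather than ranges, it can help to build the filter through a small function that enforces this before the query is sent. This is a sketch only: `cone_filter` is a hypothetical helper, not part of this repository, and the check reflects the rules stated in the docstring above rather than the service's own validation.

```python
from typing import Any, Dict


def cone_filter(ra: float, dec: float, radius: float, **clauses: Any) -> Dict[str, Any]:
    """Build a star catalog filter with the mandatory ra/dec/radius preselection.

    Hypothetical helper; extra keyword arguments become additional filter clauses.
    """
    for name, value in (("ra", ra), ("dec", dec), ("radius", radius)):
        # Ranges (dicts) are not supported for these keys; bool is rejected
        # explicitly because it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a plain number, got {value!r}")
    return {"ra": ra, "dec": dec, "radius": radius, **clauses}
```

For example, `cone_filter(40.3, 21.9, 0.2, x={"gte": 996, "lte": 1000}, ratio_disk=-9999)` produces a dict of the same shape as the range example above, and the result can be passed as the `filter` argument of `query_star_catalog`.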