This repository provides the following functionality:

1. [Read or Download a File From S3 Storage](#1-read-or-download-a-file-from-s3-storage)
2. [Commit For File Processing](#2-commit-for-file-processing)
3. [Query a List Of L1/L2 Fits-Files By Metadata Values](#3-query-a-list-of-l1l2-fits-files-by-metadata-values)
4. [Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state)
5. [Query a Star Catalog](#5-query-a-star-catalog)

# 1. Read or Download a File from S3 storage
Two distinct ways of reading from S3 storage are supported:

1) [Download to a local file](#download-from-s3-to-local-storage)
2) [Use open() to get a file object](#open-for-read)

## Configuration
**astropy must be upgraded to 5.3.**  
**The existing call style works with both local NAS and cloud S3: any read path that starts with the `s3://` protocol is recognized automatically.**

## Download from S3 to local storage
```python
def get_path(remote_path: str, local_path: str):
    """
    Download a file/folder from S3 to local storage.

    Args:
        remote_path: s3 key
        local_path: Local path to download to.
    """

def info_path(remote_path: str):
    """
    Get information about an S3 file.

    Args:
        remote_path: s3 key
    """

# Example:
from csst_fs import s3_fs
# single file
s3_fs.get_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits', 'v01.fits')
# folder
s3_fs.get_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0', './', recursive=True)
# get file or folder info
s3_fs.info_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits')
```

## Open for read
```python
def open_path(remote_path: str, mode: str = 'r'):
    """
    Get a read-only file object for a file on S3. Use mode = 'rb' for binary files.

    Args:
        remote_path: s3 key
        mode: File mode. Use 'rb' for binary files. Default: 'r'.

    Returns:
        File object of the s3 file.
    """

# Example:
from csst_fs import s3_fs
# open single file (s3 or local)
with s3_fs.open_path('projects/csst-pipeline/csst_mbi_sample_dataset/L0/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_01_L0_V01.fits', mode='rb') as file:
    file.read()
```
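
The Configuration note says that a read path is treated as S3 when it starts with the `s3://` protocol. A minimal sketch of that kind of check is shown here. `is_s3_path` is a hypothetical helper written for illustration, not part of `csst_fs`, and the two paths are made-up examples.

```python
from urllib.parse import urlparse


def is_s3_path(path: str) -> bool:
    """Return True if the path uses the s3:// scheme (hypothetical helper)."""
    return urlparse(path).scheme == "s3"


# Illustrative paths only: one S3 URL and one local NAS path.
for p in ("s3://csst/testing/L1/example.fits", "/nas/data/example.fits"):
    print(p, "->", "S3" if is_s3_path(p) else "local")
```

The library performs this detection itself, so callers normally do not need such a helper. It is shown only to make the rule concrete.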

# 2. Commit For File Processing
Submit a file's content and file name to the ingestion API for further processing.
The function returns a successful response as soon as the file content is stored and queued for further processing. If the ingestion API call or the data upload still fails after retries, it raises a `RuntimeError`.
A successful response contains a task_id referring to the queued processing task. This can be used in [4. Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state) for querying a processing task's current state.

## Function: `start_ingestion_task`

```python
def start_ingestion_task(file_content: str, file_name: str) -> dict:
    """
    Submit a file's content and file name to the ingestion API.

    Args:
        file_content (str): The file's content as string representation
        file_name (str): The file name for storing the file after ingestion.
    Returns:
        dict: A dict containing a task_id, referring to the queued processing task's id.
        E.g.
        {
            "task_id": "5",
        }
    Raises:
        RuntimeError: If the ingestion API or data upload fails after retries.
    """
```
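
A caller typically wants the `task_id` on success and a clear error on failure. The sketch below wraps the call accordingly. `submit_for_processing` and `fake_start_ingestion_task` are hypothetical names introduced for illustration. The real `start_ingestion_task` is passed in as an argument because its import path is not stated here.

```python
from typing import Callable, Dict


def submit_for_processing(
    file_content: str,
    file_name: str,
    start_ingestion_task: Callable[[str, str], Dict],
) -> str:
    """Submit a file and return the task_id of the queued processing task."""
    try:
        response = start_ingestion_task(file_content, file_name)
    except RuntimeError as exc:
        # Documented failure mode: API or upload failed after retries.
        raise RuntimeError(f"Ingestion of {file_name!r} failed: {exc}") from exc
    return response["task_id"]


# Stand-in for the real function so the sketch runs without network access.
def fake_start_ingestion_task(file_content: str, file_name: str) -> Dict:
    return {"task_id": "5"}


task_id = submit_for_processing("file content", "example.fits", fake_start_ingestion_task)
print(task_id)
```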


# 3. Query a List Of L1/L2 Fits-Files By Metadata Values
Query for file info by metadata values.
## Function: `query_metadata`
```python
from typing import Any, Dict, List

def query_metadata(
    filter: Dict[str, Any],
    key: List[str],
    hdu: int = 0
) -> List[Dict[str, Any]]:
    """
    Query for file info by metadata values.

    Args:
        filter: The filter dict described below.
        key: A list of string values, corresponding to metadata keys that should be included in the output.
        hdu: The hdu the filter & key arguments refer to. Default is 0. E.g. 0, 1.
    Returns:
        A List[Dict] of matching documents. Each document contains the file's path (the "urn" value
        in the example below) and, under 'metadata', the keys requested via the 'key' parameter.
        E.g. with key = ["CABEND", "qc_status"]
            then returns:
            [
                {
                    "urn": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "metadata": {
                        "CABEND": "59785.82529",
                        "qc_status": "0.0"
                    },
                    "removed": false,
                    "created": 1756284502817,
                    "parentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/",
                    "name": "CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "lastModified": 1756284502817,
                    "grandParentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/",
                    "platform": "s3",
                    "tags": [
                        "L1"
                    ]
                }
            ]
    """
```
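
The example return value above shows metadata values as strings (`"59785.82529"`, `"0.0"`), so callers may need to convert them. The sketch below reduces each document to its `urn` plus the requested keys. `summarize` is a hypothetical helper, and the sample document is a shortened copy of the one in the docstring rather than live query output.

```python
from typing import Any, Dict, List, Tuple


def summarize(documents: List[Dict[str, Any]], key: List[str]) -> List[Tuple[str, Dict[str, Any]]]:
    """Reduce each result document to (urn, {requested key: value})."""
    rows = []
    for doc in documents:
        metadata = doc.get("metadata", {})
        # Missing keys map to None instead of raising KeyError.
        rows.append((doc["urn"], {k: metadata.get(k) for k in key}))
    return rows


# Sample document copied from the docstring example (fields shortened).
sample = [
    {
        "urn": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
        "metadata": {"CABEND": "59785.82529", "qc_status": "0.0"},
        "platform": "s3",
    }
]

rows = summarize(sample, ["CABEND", "qc_status"])
urn, values = rows[0]
cabend = float(values["CABEND"])  # explicit string-to-float conversion
print(urn, cabend)
```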
## Filter Syntax
All filters are combined with logical AND (every clause must match).
1) String equality
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "obs_type": "WIDE",
}
```

2) Numeric equality and ranges

Supported inequality operators are:
- `lt` / `gt`: less than / greater than
- `lte` / `gte`: less than or equal / greater than or equal
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "ra": {
        "gte": 250,
        "lte": 260
    },
    "qc_status": 0,
}
```

3) Timestamp equality and ranges
```python
filter = {
    "created_date": "2015-08-04T11:00:00",
    "obs_date": {
        "gt": "2015-06-01T10:00:00",
        "lt": "2015-07-01T10:00:00",
    },
}
```
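
To make the filter semantics concrete (all clauses combined with AND, plain values meaning equality, nested dicts meaning ranges), here is a small local sketch that evaluates a filter against one metadata record. It only illustrates the rules described above and is not how the service evaluates queries. `matches` and the sample records are hypothetical.

```python
import operator
from typing import Any, Dict

# Map the documented operator names onto Python comparisons.
_OPERATORS = {
    "lt": operator.lt,
    "gt": operator.gt,
    "lte": operator.le,
    "gte": operator.ge,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True only if every clause of the filter matches (logical AND)."""
    for field, condition in flt.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(condition, dict):
            # Range clause: every listed operator must hold.
            if not all(_OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            # Plain value: equality.
            return False
    return True


record = {"dataset": "csst-msc-c11-1000sqdeg-wide-test-v2", "ra": 255.0, "qc_status": 0}
flt = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "ra": {"gte": 250, "lte": 260},
    "qc_status": 0,
}
print(matches(record, flt))
```

In this sketch, timestamps written in the same ISO format (as in the example above) can be compared as plain strings, because their lexical order equals their chronological order.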

# 4. Query a L2 Processing Tasks State
Query the state of a processing task, given an L2 task id.

## Function: `query_task_state`
```python
from typing import Any, Dict

def query_task_state(
    task_id: str
) -> Dict[str, Any]:
    """
    Query the processing state of a processing task given a L2 task id.

    Args:
        task_id: Task id of the L2 processing task
    Returns:
        Dictionary of the following format, including information about the current state of the corresponding processing task.
        The following strings are valid state values: tbd
        E.g.
            {
                "state": "submission_pending",
            }
    """
```
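
Because a task is processed asynchronously, callers will often poll until it reaches a state they care about. The sketch below shows one way to do that. The list of valid state values is still to be defined, so the target states are supplied by the caller, and `"finished"` is only a placeholder. `wait_for_state` and `fake_query_task_state` are hypothetical names, and the real `query_task_state` is passed in as an argument.

```python
import time
from typing import Any, Callable, Dict, Set


def wait_for_state(
    task_id: str,
    query_task_state: Callable[[str], Dict[str, Any]],
    target_states: Set[str],
    interval_s: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until the task state is one of target_states, then return it."""
    for _ in range(max_polls):
        state = query_task_state(task_id)["state"]
        if state in target_states:
            return state
        time.sleep(interval_s)
    raise TimeoutError(f"Task {task_id} did not reach {sorted(target_states)}")


# Stand-in for the real function; "finished" is a placeholder state name.
_fake_states = iter(["submission_pending", "submission_pending", "finished"])


def fake_query_task_state(task_id: str) -> Dict[str, Any]:
    return {"state": next(_fake_states)}


final_state = wait_for_state("5", fake_query_task_state, {"finished"}, interval_s=0.0)
print(final_state)
```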

# 5. Query a Star Catalog
Query a star catalog by column values given a ra, dec and radius preselection.
## Function: `query_star_catalog`
```python
from typing import Any, Dict, List

def query_star_catalog(
    catalog_name: str,
    filter: Dict[str, Any],
    key: List[str],
) -> List[Dict[str, Any]]:
    """
    Query a star catalog by column values given a ra, dec and radius preselection.

    Args:
        catalog_name: Name of the star catalog (e.g. csst-msc-l1-mbi-catmix)
        filter: The filter dict described below.
            The following keys MUST be set:
            {
                "ra": 40.3,
                "dec": 21.9,
                "radius": 0.2,
            }
            Ra, dec values pinpoint a location, 'radius' defines a radius in [deg] around this point.
            Only star catalog objects within this area are considered for subsequent filtering.
            Setting ranges with (lt, gt, lte, gte) for ra, dec values is not supported.
        key: A list of string values, corresponding to the column names that should be present in the return value.
    Returns:
        A List[Dict] of matching star catalog objects, containing key-value pairs for the keys set as 'key' parameter.
        E.g. with key = ["x", "bulge_flux", "ab"]
            then returns:
            [
                {
                    "x": 995.27,
                    "bulge_flux": "3.2",
                    "ab": 1.2,
                },
            ]
    """
```
## Filter Syntax
All filters are combined with logical AND (every clause must match).
1) String equality
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "detector": "06",
}
```

2) Numeric equality and ranges

Supported inequality operators are:
- `lt` / `gt`: less than / greater than
- `lte` / `gte`: less than or equal / greater than or equal
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "x": {
        "gte": 996,
        "lte": 1000,
    },
    "ratio_disk": -9999,
}
```
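
Since `ra`, `dec` and `radius` must always be set and must be plain numbers (ranges are not supported for them), it can help to build the filter through a small function that enforces this before the query is sent. The sketch below is one possible approach. `build_cone_filter` is a hypothetical helper, not part of the library.

```python
from numbers import Real
from typing import Any, Dict


def build_cone_filter(ra: float, dec: float, radius: float, **conditions: Any) -> Dict[str, Any]:
    """Build a star catalog filter with the mandatory ra/dec/radius keys."""
    for name, value in (("ra", ra), ("dec", dec), ("radius", radius)):
        # Range dicts such as {"gte": 40} are rejected for these three keys.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{name} must be a plain number, got {value!r}")
    return {"ra": ra, "dec": dec, "radius": radius, **conditions}


star_filter = build_cone_filter(
    40.3,
    21.9,
    0.2,
    x={"gte": 996, "lte": 1000},
    ratio_disk=-9999,
)
print(star_filter)
```

The resulting dict has the same shape as the examples above and could be passed as the `filter` argument of `query_star_catalog`.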