This repository provides the following functionalities:
1. [Read or Download a File From S3 Storage](#1-read-or-download-a-file-from-s3-storage)
2. [Commit For File Processing](#2-commit-for-file-processing)
3. [Query a List Of L1/L2 Fits-Files By Metadata Values](#3-query-a-list-of-l1l2-fits-files-by-metadata-values)
4. [Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state)
5. [Query a Star Catalog](#5-query-a-star-catalog)

# 1. Read or Download a File from S3 storage
Two distinct ways of reading from S3 storage are supported:

1) [Download to a local file](#download-from-s3-to-local-storage)
2) [Use open() to get a file object](#open-for-read)

## Configuration
**astropy must be upgraded to version 5.3.**  
**The legacy call style works with both local NAS and cloud S3: any read path that starts with the `s3://` protocol is recognized automatically.**  

Reading from S3 requires passing in the S3 key, endpoint, and related configuration; there are two ways to do this.
The S3 bucket to use is configured through an env variable.

## Download from S3 to local storage
```python
def get(key: str, local_path: str):
    """
    Download a file/folder from s3 to local storage.

    Args:
        key: s3 key
        local_path: Local path to download to.
    """

def info(key: str):
    """
    Get information about an s3 file.

    Args:
        key: s3 key
    """

# Example:
from csst_fs import s3_fs
# single file
s3_fs.get('gaia/test/requirements.txt', 'requirements.txt')
# folder (note the additional recursive=True argument)
s3_fs.get('gaia/data', './', recursive=True)
# get file or folder info
s3_fs.info('gaia/data')
```

## Open for read
```python
def open(key: str):
    """
    Get a read-only file object for a file on s3.

    Args:
        key: s3 key
    Returns:
        File object of the s3 file.
    """

# Example:
from csst_fs import s3_fs
# open a single file (s3 or local)
with s3_fs.open('gaia/test/requirements.txt') as file:
    content = file.read()
```


# 2. Commit For File Processing

Submit a file's content and file name to the ingestion API for further processing.
The function returns a successful response as soon as the file content is stored and queued for further processing. If the ingestion API or the data upload still fails after retries, it raises a `RuntimeError`.
A successful response contains a `task_id` referring to the queued processing task. It can be used in [4. Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state) to query the processing task's current state.

## Configuration
The helper sends HTTP requests to an external API. The `CSST_BACKEND_API_URL` env variable must be set accordingly.
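
A minimal sketch of a pre-flight check for this setting. Only the variable name `CSST_BACKEND_API_URL` comes from this README; the helper name `require_backend_url` and the placeholder address are hypothetical.

```python
import os


def require_backend_url(env=os.environ) -> str:
    """Return CSST_BACKEND_API_URL, or raise a clear error if it is unset or empty."""
    url = env.get("CSST_BACKEND_API_URL", "").strip()
    if not url:
        raise RuntimeError("CSST_BACKEND_API_URL is not set")
    return url


# Hypothetical placeholder value; replace it with the real backend address.
os.environ.setdefault("CSST_BACKEND_API_URL", "https://backend.example.invalid")
print(require_backend_url())
```

Running such a check once at start-up surfaces a missing setting early instead of at the first HTTP request.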

## Function: `start_ingestion_task`

```python
def start_ingestion_task(file_content: str, file_name: str) -> dict:
    """
    Submit a file's content and file name to the ingestion API.

    Args:
        file_content (str): The file's content as a string representation.
        file_name (str): The file name for storing the file after ingestion.
    Returns:
        dict: A dict containing a task_id, referring to the queued processing task's id.
        E.g.
        {
            "task_id": "5",
        }
    Raises:
        RuntimeError: If the ingestion API or data upload fails after retries.
    """
```
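
The call pattern described above can be sketched as follows. This is an illustration under assumptions, not part of the package: the wrapper `submit_file` is hypothetical, it assumes the file can be read as text, and the `submit` argument stands in for `start_ingestion_task` so the sketch runs without the backend.

```python
from pathlib import Path
from typing import Callable, Optional


def submit_file(path: str, submit: Callable[[str, str], dict]) -> Optional[str]:
    """Read a text file and submit it; return the task_id, or None on failure.

    `submit` is expected to behave like start_ingestion_task(file_content, file_name).
    """
    file_path = Path(path)
    content = file_path.read_text()  # the API takes the content as a string
    try:
        response = submit(content, file_path.name)
    except RuntimeError as err:
        # Documented failure mode: raised once the helper's retries are exhausted.
        print(f"ingestion failed for {file_path.name}: {err}")
        return None
    return response["task_id"]
```

In real use, `start_ingestion_task` would be passed as `submit`, and the returned `task_id` can then be handed to `query_task_state`.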


# 3. Query a List Of L1/L2 Fits-Files By Metadata Values
Query for file info by metadata values.

## Configuration
The helper sends HTTP requests to an external API. The `CSST_BACKEND_API_URL` env variable must be set accordingly.

## Function: `query_metadata`
```python
from typing import Any, Dict, List

def query_metadata(
    filter: Dict[str, Any],
    key: List[str],
    hdu: int = 0
) -> List[Dict[str, Any]]:
    """
    Query for file info by metadata values.

    Args:
        filter: The filter dict described below.
        key: A list of metadata keys that should be included in the output.
        hdu: The HDU that the filter and key arguments refer to, e.g. 0 or 1. Default is 0.
    Returns:
        A List[Dict] of matching documents. Each document contains the file's path
        ("urn" in the example below) and, under "metadata", the keys requested via 'key'.
        E.g. with key = ["CABEND", "qc_status"]
            then returns:
            [
                {
                    "urn": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "metadata": {
                        "CABEND": "59785.82529",
                        "qc_status": "0.0"
                    },
                    "removed": false,
                    "created": 1756284502817,
                    "parentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/",
                    "name": "CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
                    "lastModified": 1756284502817,
                    "grandParentPath": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/",
                    "platform": "s3",
                    "tags": [
                        "L1"
                    ]
                }
            ]
    """
```
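
The example response shows metadata values arriving as strings. A minimal sketch of post-processing such a result, using a shortened copy of the document from the docstring above as sample data; the helper `extract_column` is hypothetical, and converting every value with `float` is an assumption that only holds for numeric keys.

```python
from typing import Any, Dict, List, Tuple

# One document copied from the docstring above (some fields omitted).
sample_result: List[Dict[str, Any]] = [
    {
        "urn": "s3://csst/testing/L1/MSC/msc-v093-r1/kKwmIwzv/SCI/10109300100413/"
               "CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
        "metadata": {"CABEND": "59785.82529", "qc_status": "0.0"},
        "name": "CSST_MSC_MS_SCI_20231022050242_20231022050512_10109300100413_14_L1_V01.fits",
    }
]


def extract_column(result: List[Dict[str, Any]], key: str) -> List[Tuple[str, float]]:
    """Pair each file's urn with one metadata value converted to float."""
    return [(doc["urn"], float(doc["metadata"][key])) for doc in result]


rows = extract_column(sample_result, "CABEND")
print(rows[0][1])  # 59785.82529
```
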
## Filter Syntax
All filters are combined with logical AND (every clause must match).

1) String equality
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "obs_type": "WIDE",
}
```

2) Numeric equality and ranges

Supported inequality operators are:
- `lt`/`gt`: less/greater than
- `lte`/`gte`: less/greater than or equal
```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "ra": {
        "gte": 250,
        "lte": 260
    },
    "qc_status": 0,
}
```

3) Timestamp equality and ranges
```python
filter = {
    "created_date": "2015-08-04T11:00:00",
    "obs_date": {
        "gt": "2015-06-01T10:00:00",
        "lt": "2015-07-01T10:00:00",
    },
}
```
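
To make the semantics of the three clause types concrete, here is a small local illustration that evaluates a filter against a single record. It is only a reading aid: the function `matches` is hypothetical, and how the backend actually evaluates filters is not documented here.

```python
from typing import Any, Dict

# Operator names from the filter syntax above, mapped to Python comparisons.
_OPS = {
    "lt": lambda value, bound: value < bound,
    "gt": lambda value, bound: value > bound,
    "lte": lambda value, bound: value <= bound,
    "gte": lambda value, bound: value >= bound,
}


def matches(record: Dict[str, Any], filter: Dict[str, Any]) -> bool:
    """Return True if the record satisfies every clause (logical AND)."""
    for field, condition in filter.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(condition, dict):
            # Range clause: every listed operator must hold.
            if not all(_OPS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            # Plain value: equality clause.
            return False
    return True


record = {"dataset": "csst-msc-c11-1000sqdeg-wide-test-v2", "ra": 255.0, "qc_status": 0}
print(matches(record, {"ra": {"gte": 250, "lte": 260}, "qc_status": 0}))  # True
print(matches(record, {"ra": {"gt": 255}}))  # False
```

In this sketch, timestamps written in the same ISO 8601 layout as the examples compare correctly as plain strings, so the range operators work for them as well.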

# 4. Query a L2 Processing Tasks State
Query the processing state of a processing task, given an L2 task id.

## Configuration
The helper sends HTTP requests to an external API. The `CSST_BACKEND_API_URL` env variable must be set accordingly.

## Function: `query_task_state`
```python
from typing import Any, Dict

def query_task_state(
    task_id: str
) -> Dict[str, Any]:
    """
    Query the processing state of a processing task, given an L2 task id.

    Args:
        task_id: Task id of the L2 processing task
    Returns:
        Dictionary of the following format, including information about the current state of the corresponding processing task.
        The following strings are valid state values: tbd
        E.g.
            {
                "state": "submission_pending",
            }
    """
```
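
A typical use is to poll until a task leaves its initial state. The sketch below is an assumption-laden illustration: `wait_while_pending` is a hypothetical helper, the `query` argument stands in for `query_task_state`, and because the list of valid states is still to be defined, the only state name it relies on is `submission_pending` from the example above.

```python
import time
from typing import Any, Callable, Dict, Sequence


def wait_while_pending(
    query: Callable[[str], Dict[str, Any]],
    task_id: str,
    pending_states: Sequence[str] = ("submission_pending",),
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll until the task leaves the given pending states; return the new state."""
    deadline = time.monotonic() + timeout
    while True:
        state = query(task_id)["state"]
        if state not in pending_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still in state {state!r}")
        time.sleep(interval)
```

The polling interval and timeout are arbitrary defaults; they should be tuned to how long processing tasks usually take.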

# 5. Query a Star Catalog
Query a star catalog by column values, given a preselection by ra, dec and radius.

## Configuration
The helper sends HTTP requests to an external API. The `CSST_BACKEND_API_URL` env variable must be set accordingly.

## Function: `query_star_catalog`
```python
from typing import Any, Dict, List

def query_star_catalog(
    catalog_name: str,
    filter: Dict[str, Any],
    key: List[str],
) -> List[Dict[str, Any]]:
    """
    Query a star catalog by column values, given a preselection by ra, dec and radius.

    Args:
        catalog_name: Name of the star catalog (e.g. csst-msc-l1-mbi-catmix)
        filter: The filter dict described below.
            The following keys MUST be set:
            {
                "ra": 40.3,
                "dec": 21.9,
                "radius": 0.2,
            }
            The ra and dec values pinpoint a location; 'radius' defines a radius in [deg] around this point.
            Only star catalog objects within this area are considered for subsequent filtering.
            Setting ranges with (lt, gt, lte, gte) for ra, dec values is not supported.
        key: A list of column names that should be present in the return value.
    Returns:
        A List[Dict] of matching star catalog objects, containing key-value pairs for the keys set as 'key' parameter.
        E.g. with key = ["x", "bulge_flux", "ab"]
            then returns:
            [
                {
                    "x": 995.27,
                    "bulge_flux": "3.2",
                    "ab": 1.2,
                },
            ]
    """
```
## Filter Syntax
All filters are combined with logical AND (every clause must match).

1) String equality
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "detector": "06",
}
```

2) Numeric equality and ranges

Supported inequality operators are:
- `lt`/`gt`: less/greater than
- `lte`/`gte`: less/greater than or equal
```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "x": {
        "gte": 996,
        "lte": 1000,
    },
    "ratio_disk": -9999,
}
```
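
Since `ra`, `dec` and `radius` are mandatory and `ra`/`dec` do not accept ranges, a small builder can catch malformed filters before a request is sent. This is a sketch only: `cone_filter` is a hypothetical helper, and rejecting a non-positive radius is an assumption rather than a documented rule.

```python
from typing import Any, Dict


def cone_filter(ra: float, dec: float, radius: float, **clauses: Any) -> Dict[str, Any]:
    """Build a star catalog filter with the mandatory ra/dec/radius keys."""
    for name, value in (("ra", ra), ("dec", dec), ("radius", radius)):
        # Range dicts (lt/gt/lte/gte) are rejected for the cone keys.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a plain number, got {value!r}")
    if radius <= 0:
        raise ValueError("radius must be positive (degrees)")
    # Additional column clauses are combined with the cone keys (logical AND).
    return {"ra": ra, "dec": dec, "radius": radius, **clauses}


flt = cone_filter(40.3, 21.9, 0.2, detector="06", x={"gte": 996, "lte": 1000})
print(flt)
```

The resulting dict has the same shape as the examples above and can be passed as the `filter` argument of `query_star_catalog`.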