This repository provides the following functionalities:

1. [Read or Download a File From S3 Storage](#1-read-or-download-a-file-from-s3-storage)
2. [Commit For File Processing](#2-commit-for-file-processing)
3. [Query a List Of L1/L2 Fits-Files By Metadata Values](#3-query-a-list-of-l1l2-fits-files-by-metadata-values)
4. [Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state)
5. [Query a Star Catalog](#5-query-a-star-catalog)

# 1. Read or Download a File from S3 storage

Two distinct ways of reading from S3 storage are supported:

1) [Download to a local file](#download)
2) [Use open() to get a file object](#open-for-read)

## Configuration

**astropy must be upgraded to 5.3.**

**The old call style is compatible with both local NAS and cloud S3: any read path that starts with the `s3://` protocol is recognized automatically.**

To read from S3, you need to supply the S3 key, secret, endpoint, and related configuration. There are two ways to do this.

### Method 1: Environment variables

Set the three environment variables below. All methods described in the rest of this document try to read their configuration from these environment variables.

```python
import os

s3_options = {
    'key': os.getenv('S3_KEY'),
    'secret': os.getenv('S3_SECRET'),
    'endpoint_url': os.getenv('S3_ENDPOINT_URL')
}
```

### Method 2: Pass s3_options on every call

Specify `s3_options` as the first keyword argument. Example `s3_options`:

```python
s3_options = {
    "key": "minioadmin",
    "secret": "minioadmin",
    "endpoint_url": "http://localhost:9000"
}
```

## Download from S3 to local

### Download

```python
from csst_fs import s3_fs

# s3_options is the dict shown under Method 2; omit it to fall back to the environment variables

# single file
s3_fs.get('s3://csst-prod/gaia/test/requirements.txt', 'requirements.txt')
s3_fs.get('s3://csst-prod/gaia/test/requirements.txt', 'requirements.txt', s3_options=s3_options)

# folder
s3_fs.get('s3://csst-prod/gaia/data', './', recursive=True)
s3_fs.get('s3://csst-prod/gaia/data', './', s3_options=s3_options, recursive=True)

# get file or folder info
s3_fs.info('s3://csst-prod/gaia/data')
s3_fs.info('s3://csst-prod/gaia/test/requirements.txt', s3_options=s3_options)
```

### Open for read

```python
from csst_fs import s3_fs

# open single file (s3 or local)
with s3_fs.open('s3://csst-prod/gaia/data') as file:
    file.read()
```

# 2. Commit For File Processing

The function returns a successful response as soon as the file content is successfully stored and queued for further processing. Otherwise, the function will handle errors appropriately.
A successful response contains a task_id referring to the queued processing task. This can be used in [4. Query a L2 Processing Tasks State](#4-query-a-l2-processing-tasks-state) for querying a processing task's current state.

## Configuration

The helper sends HTTP requests to an external API. The `INGESTION_API_URL` environment variable should be set accordingly.

## Function: `submit_file_for_ingestion`

```python
def submit_file_for_ingestion(file_content: str, file_name: str) -> dict:
    """
    Submit a file's content and file name to the ingestion API.

    Args:
        file_content (str): The file's content as string representation
        file_name (str): The file name for storing the file after ingestion.

    Returns:
        dict: A dict containing a task_id, referring to the queued processing task's id.
        E.g.
        {
            "task_id": "5",
        }
    """
```

# 3. Query a List Of L1/L2 Fits-Files By Metadata Values

Query for file info by metadata values.

## Configuration

The helper sends HTTP requests to an external API. The `SEARCH_API_URL` environment variable should be set accordingly.

## Function: `search_with_basic_filters`

```python
from typing import Any, Dict, List

def search_with_basic_filters(
    filter: Dict[str, Any],
    key: List[str],
) -> List[Dict[str, Any]]:
    """
    Query for file info by metadata values.

    Args:
        filter: The filter dict described below.
        key: A list of string values, corresponding to metadata keys that should be included in the output.

    Returns:
        A List[Dict] of matching documents containing a file_path value and the keys set as 'key' parameter under 'metadata'.
        E.g. with key = ["dataset", "instrument", "obs_group", "obs_id"]
        then returns:
        [
            {
                "file_path": "CSST_L0/MSC/SCI/60310/10100000000/MS/CSST_MSC_MS_SCIE_20290225043953_20290225044223_10100000000_03_L0_V01.fits",
                "metadata": {
                    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
                    "instrument": "MSC",
                    "obs_group": "W1",
                    "obs_id": "10200000000"
                },
            },
        ]
    """
```

## Filter Syntax

All filters are combined with logical AND (every clause must match).

1) String equality

```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "obs_type": "WIDE",
}
```

2) Numeric equality and ranges

Supported inequality operators are `lt`/`gt` (less/greater than) and `lte`/`gte` (less/greater than or equal).

```python
filter = {
    "dataset": "csst-msc-c11-1000sqdeg-wide-test-v2",
    "ra": {
        "gte": 250,
        "lte": 260
    },
    "qc_status": 0,
}
```

3) Timestamp equality and ranges

```python
filter = {
    "created_date": "2015-08-04T11:00:00",
    "obs_date": {
        "gt": "2015-06-01T10:00:00",
        "lt": "2015-07-01T10:00:00",
    },
}
```

# 4. Query a L2 Processing Tasks State

Query the processing state of a processing task given a L2 task id.

## Configuration

The helper sends HTTP requests to an external API. The `QUERY_TASK_STATE_API_URL` environment variable should be set accordingly.

## Function: `query_processing_task_state`

```python
from typing import Any, Dict

def query_processing_task_state(
    task_id: str
) -> Dict[str, Any]:
    """
    Query the processing state of a processing task given a L2 task id.

    Args:
        task_id: Task id of the L2 processing task

    Returns:
        Dictionary of the following format, including information about the current state of the corresponding processing task.
        The following strings are valid state values: tbd
        E.g.
        {
            "state": "submission_pending",
        }
    """
```

# 5. Query a Star Catalog

Query a star catalog by column values given a ra, dec and radius preselection.

## Configuration

The helper sends HTTP requests to an external API. The `STAR_CATALOG_SEARCH_API_URL` environment variable should be set accordingly.
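
Each helper in this document reads its endpoint from an environment variable, so it can help to check them before making a call. The following is a minimal sketch of such a check. The function `missing_api_urls` is hypothetical and not part of `csst_fs`, and the URL in the example is a placeholder rather than a real endpoint.

```python
import os
from typing import List, Mapping

# Environment variable names taken from the Configuration sections of this document.
API_URL_VARS = (
    "INGESTION_API_URL",
    "SEARCH_API_URL",
    "QUERY_TASK_STATE_API_URL",
    "STAR_CATALOG_SEARCH_API_URL",
)


def missing_api_urls(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: names of API URL variables that are unset or empty."""
    return [name for name in API_URL_VARS if not env.get(name)]


# Placeholder value for illustration only; use the endpoint of your own deployment.
example_env = {"STAR_CATALOG_SEARCH_API_URL": "http://localhost:8000"}
print(missing_api_urls(example_env))
```

Calling `missing_api_urls()` without arguments inspects the current process environment instead of the example mapping.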
## Function: `query_star_catalog`

```python
from typing import Any, Dict, List

def query_star_catalog(
    catalog_name: str,
    filter: Dict[str, Any],
    key: List[str],
) -> List[Dict[str, Any]]:
    """
    Query a star catalog by column values given a ra, dec and radius preselection.

    Args:
        catalog_name: Name of the star catalog (e.g. msc_l1_mbi_catmix)
        filter: The filter dict described below.
            The following keys MUST be set:
            {
                "ra": 40.3,
                "dec": 21.9,
                "radius": 0.2,
            }
            Ra, dec values pinpoint a location, 'radius' defines a radius in [deg] around this point.
            Only star catalog objects within this area are considered for subsequent filtering.
            Setting ranges with (lt, gt, lte, gte) for ra, dec values is not supported.
        key: A list of string values, corresponding to the column names that should be present in the return value.

    Returns:
        A List[Dict] of matching star catalog objects, containing key-value pairs for the keys set as 'key' parameter.
        E.g. with key = ["x", "bulge_flux", "ab"]
        then returns:
        [
            {
                "x": 995.27,
                "bulge_flux": "3.2",
                "ab": 1.2,
            },
        ]
    """
```

## Filter Syntax

All filters are combined with logical AND (every clause must match).

1) String equality

```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "detector": "06",
}
```

2) Numeric equality and ranges

Supported inequality operators are `lt`/`gt` (less/greater than) and `lte`/`gte` (less/greater than or equal).

```python
filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "msc_photid": "00101000703350610200001812",
    "x": {
        "gte": 996,
        "lte": 1000,
    },
    "ratio_disk": -9999,
}
```
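
The filter semantics described above (all clauses combined with AND, plain values for equality, operator dicts for ranges) can be illustrated with a small local matcher. This is only a sketch of the semantics as this document describes them, not the code the search API runs. The helper `matches_filter` and the sample rows are hypothetical, and the `ra`/`dec`/`radius` cone preselection is deliberately skipped.

```python
import operator
from typing import Any, Dict

# Range operators named in the filter syntax.
_RANGE_OPS = {
    "lt": operator.lt,
    "gt": operator.gt,
    "lte": operator.le,
    "gte": operator.ge,
}

# Keys used for the positional preselection, not for column matching.
_CONE_KEYS = {"ra", "dec", "radius"}


def matches_filter(row: Dict[str, Any], filter: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if `row` satisfies every clause of `filter`."""
    for column, condition in filter.items():
        if column in _CONE_KEYS:
            continue  # the cone search is out of scope for this sketch
        if column not in row:
            return False
        value = row[column]
        if isinstance(condition, dict):
            # Range clause: every listed operator must hold.
            for op_name, bound in condition.items():
                if not _RANGE_OPS[op_name](value, bound):
                    return False
        elif value != condition:
            # Plain value: equality clause.
            return False
    return True


star_filter = {
    "ra": 40.3,
    "dec": 21.9,
    "radius": 0.2,
    "detector": "06",
    "x": {"gte": 996, "lte": 1000},
}

# Made-up rows, for illustration only.
rows = [
    {"detector": "06", "x": 998.5},
    {"detector": "06", "x": 995.27},
    {"detector": "07", "x": 999.0},
]
selected = [row for row in rows if matches_filter(row, star_filter)]
print(selected)
```

Only the first row passes: the second fails the `x` range and the third fails the `detector` equality, which shows that every clause has to match.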