Filters
cxg.filters
DatasetFilters
dataclass
Filter criteria for dataset searches.
Repeated values within a single field are combined with OR logic. Different fields are combined with AND logic.
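The OR-within-a-field, AND-across-fields rule can be illustrated with a small stand-alone sketch. The `ToyFilters` dataclass, its `tissue` and `disease` fields, and the flat dataset dicts are hypothetical stand-ins chosen for illustration; they are not the actual `DatasetFilters` definition, whose fields are not listed on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFilters:
    """Hypothetical stand-in for DatasetFilters; field names are assumptions."""

    tissue: list[str] = field(default_factory=list)
    disease: list[str] = field(default_factory=list)


def toy_matches(dataset: dict[str, list[str]], filters: ToyFilters) -> bool:
    """Return True if the dataset satisfies every non-empty filter field."""
    for name in ("tissue", "disease"):
        wanted = {v.lower() for v in getattr(filters, name)}
        if not wanted:
            continue  # an empty field places no constraint
        have = {v.lower() for v in dataset.get(name, [])}
        if not (wanted & have):  # OR within a field: any overlap is enough
            return False  # AND across fields: one failing field rejects
    return True


# Made-up example records, not real CELLxGENE data.
datasets = [
    {"tissue": ["lung"], "disease": ["normal"]},
    {"tissue": ["blood"], "disease": ["COVID-19"]},
    {"tissue": ["lung"], "disease": ["COVID-19"]},
]
filters = ToyFilters(tissue=["lung", "blood"], disease=["COVID-19"])
matched = [d for d in datasets if toy_matches(d, filters)]
```

Here the first record is rejected because it fails the `disease` field even though it passes `tissue`; the other two records pass both fields.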
Source code in src/cxg/filters.py
dataset_id(dataset)
collection_id(dataset)
Extract the collection ID from a dataset dict.
collection_name(dataset)
Extract the collection name from a dataset dict.
dataset_title(dataset)
get_ontology_entries(dataset, field_name)
Extract ontology entries for a field as a list of dicts.
Normalizes scalar and dict values into a consistent list-of-dicts format.
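One way such a normalization could work is sketched below. The function name `normalize_entries` is hypothetical, and wrapping a scalar under a `"label"` key is an assumption made for the sketch; the real output keys of `get_ontology_entries` are not documented on this page.

```python
from typing import Any


def normalize_entries(dataset: dict[str, Any], field_name: str) -> list[dict[str, Any]]:
    """Coerce a field holding a scalar, a dict, or a list into a list of dicts."""
    raw = dataset.get(field_name)
    if raw is None:
        return []  # missing field: nothing to normalize
    items = raw if isinstance(raw, list) else [raw]
    entries: list[dict[str, Any]] = []
    for item in items:
        if isinstance(item, dict):
            entries.append(item)  # already in the target shape
        else:
            # Assumption: a bare scalar is treated as the entry's label.
            entries.append({"label": str(item)})
    return entries
```

With this shape, downstream helpers only ever need to handle a list of dicts, whatever form the field originally took.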
get_labels(dataset, field_name)
Extract human-readable labels for an ontology field.
get_cell_count(dataset)
Extract the cell count from a dataset dict.
get_schema_version(dataset)
get_suspension_types(dataset)
Extract suspension types from a dataset dict as a list of strings.
get_tissue_types(dataset)
Extract tissue_type values nested inside tissue entries.
tissue_type is part of each TissueOntologyTermId entry per
CELLxGene schema v7.0.0, not a top-level dataset field. Returns one
value per matching tissue entry; unique_field_counts deduplicates
per dataset.
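The nesting described above can be pictured with a minimal sketch. The function name `tissue_types_sketch`, the `"tissue"` and `"tissue_type"` key names, and the sample record are assumptions for illustration, not the library's actual implementation or real data.

```python
from typing import Any


def tissue_types_sketch(dataset: dict[str, Any]) -> list[str]:
    """Collect tissue_type from each tissue entry, keeping duplicates."""
    return [
        entry["tissue_type"]
        for entry in dataset.get("tissue", [])
        if isinstance(entry, dict) and "tissue_type" in entry
    ]


# Made-up record: tissue_type lives inside each tissue entry,
# not at the top level of the dataset dict.
dataset = {
    "tissue": [
        {"label": "lung", "tissue_type": "tissue"},
        {"label": "blood", "tissue_type": "tissue"},
        {"label": "lung organoid", "tissue_type": "organoid"},
    ]
}
types = tissue_types_sketch(dataset)
unique_types = sorted(set(types))  # per-dataset deduplication happens later
```

The list keeps one value per tissue entry, so `"tissue"` appears twice here; collapsing to a set is left to the counting step.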
matches_any_substring(values, needles)
Check if any needle is a case-insensitive substring of any value.
Returns True if needles is empty.
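The behavior described above, including the empty-needles case, might look roughly like the following. This is a sketch written from the description, with the hypothetical name `any_substring`; it is not copied from `src/cxg/filters.py`.

```python
def any_substring(values: list[str], needles: list[str]) -> bool:
    """True if any needle occurs, ignoring case, inside any value."""
    if not needles:
        return True  # no needles means the filter is inactive
    lowered = [v.lower() for v in values]
    return any(n.lower() in v for n in needles for v in lowered)
```

Returning `True` for empty `needles` lets a caller apply the check unconditionally: an unset filter field simply passes every dataset.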
matches_any_exact(values, expected)
Check if any expected value exactly matches any value (case-insensitive).
Returns True if expected is empty.
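For contrast with substring matching, a minimal sketch of case-insensitive exact matching follows. The name `any_exact` is hypothetical and the body is inferred from the description rather than taken from the library source.

```python
def any_exact(values: list[str], expected: list[str]) -> bool:
    """True if any expected string equals any value, ignoring case."""
    if not expected:
        return True  # nothing expected means the filter is inactive
    have = {v.lower() for v in values}
    return any(e.lower() in have for e in expected)
```

Unlike the substring check, `"lung"` does not match `"Lung parenchyma"` here; the whole string must agree once case is ignored.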
dataset_matches(dataset, filters)
Test whether a dataset matches all filter criteria.
Returns True if the dataset satisfies every non-empty filter.
apply_filters(datasets, filters)
Filter a list of datasets by the given criteria.
unique_field_counts(datasets, field_name)
Count unique values for a field across datasets.
Returns:
| Type | Description |
|---|---|
| dict[str, int] | A dict mapping each unique value to the number of datasets containing it. |
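The counting rule (each dataset contributes at most once per value) can be sketched as follows. The function name `count_unique` and the assumption that each field holds a flat list of strings are illustrative simplifications; the real function presumably extracts values through the helpers documented above.

```python
from collections import Counter


def count_unique(datasets: list[dict[str, list[str]]], field_name: str) -> dict[str, int]:
    """Map each distinct value to the number of datasets that contain it."""
    counts: Counter[str] = Counter()
    for dataset in datasets:
        # set() deduplicates within one dataset, so a value repeated
        # inside a single dataset is counted once.
        counts.update(set(dataset.get(field_name, [])))
    return dict(counts)


# Made-up records for illustration.
datasets = [
    {"tissue": ["lung", "lung", "blood"]},
    {"tissue": ["lung"]},
    {},
]
counts = count_unique(datasets, "tissue")
```

In this example `"lung"` is counted for two datasets even though it appears three times in total, and the dataset without the field contributes nothing.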