One of the core strengths of cendat is concurrency: it can build, issue, and manage many API queries simultaneously. Up until version 0.4.4, that capability was especially critical because cendat brute-forced the geographic hierarchy, generating many small queries for deeply nested geographies to deliver data in a predictable and timely fashion. But that brute-force approach didn’t make full use of the API’s wildcard paradigm, which allows queries that are generalized over parent geographies to return bigger data objects with fewer queries. Making use of this capability in a completely general tool like cendat is tricky, however, because the Census geographic hierarchy is complex and what works for one summary level may not work for another. Thankfully, the Census Bureau provides all of the necessary information, and this latest version can now make use of it.
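Let’s take a look at the JSON entry for the block group summary level in the 2023 ACS 5-year data product. Its relevant fields are "requires": [ "state", "county", "tract" ], "wildcard": [ "county", "tract" ], and "optionalWithWCFor": "tract".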
Here we can see that in order to query for block groups, an API query requires information on the parent state, county, and tract; that’s what’s indicated by "requires": [ "state", "county", "tract" ]. Prior to 0.4.4, invoking get_data() with block groups set as the target geography yielded API calls that explicitly specified the nesting state, county, and tract. Again, these queries were issued concurrently and the individual data objects were small, so this was okay in many cases. But for national queries, where no nesting geographies are provided to limit the scope, this approach becomes problematic: there are ~85k tracts in the U.S., and each one needs its own query.
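To make the scale concrete, here is a minimal sketch of what fully specified block group queries look like. It is not cendat’s internal code: the function name `brute_force_params` and the sample tract tuples are hypothetical, and the parameter layout assumes the Census API’s `for`/`in` geography predicates.

```python
def brute_force_params(tracts):
    """Build one fully specified geography clause per (state, county, tract)."""
    return [
        {
            "for": "block group:*",
            "in": f"state:{state} county:{county} tract:{tract}",
        }
        for state, county, tract in tracts
    ]


# Illustrative parent tracts only; a national pull would enumerate every tract.
sample_tracts = [
    ("06", "001", "400100"),
    ("06", "001", "400200"),
    ("06", "013", "301000"),
]

queries = brute_force_params(sample_tracts)
print(len(queries))       # one query per parent tract
print(queries[0]["in"])   # every parent level is spelled out explicitly
```

The number of queries grows one-for-one with the number of parent tracts, which is why the approach strains at national scope.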
We can do this job with far fewer pulls by utilizing the information in the wildcard and optionalWithWCFor fields. In the above entry we have "wildcard": [ "county", "tract" ] and "optionalWithWCFor": "tract". That means we really only have to provide state for national results, because county and tract can be wildcarded, and tract can be dropped altogether unless we need to request specific tracts. Consider the following example, in which we pull total population by block group for the whole nation and see how many of them contain more than 10k people.
✅ Parameters created for 1 geo-variable combinations.
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
ℹ️ Making 6 API call(s)...
shape: (4, 5)
┌───────┬────────┬──────┬────────┬────────┐
│ state ┆ n ┆ pct ┆ cumn ┆ cumpct │
╞═══════╪════════╪══════╪════════╪════════╡
│ 06 ┆ 25,607 ┆ 57.8 ┆ 25,607 ┆ 57.8 │
│ 08 ┆ 8 ┆ 0.0 ┆ 25,615 ┆ 57.8 │
│ 48 ┆ 18,638 ┆ 42.0 ┆ 44,253 ┆ 99.8 │
│ 56 ┆ 78 ┆ 0.2 ┆ 44,331 ┆ 100.0 │
└───────┴────────┴──────┴────────┴────────┘
CPU times: user 246 ms, sys: 30.6 ms, total: 276 ms
Wall time: 2.11 s
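The wildcard rules described before this example can be sketched roughly as follows. This is an illustration under stated assumptions, not cendat’s implementation: `build_geo_params` is a hypothetical helper, the `name` key is assumed, and the dictionary carries only the three fields quoted in the text (the real entry may hold more).

```python
# Only the fields quoted in the text, plus an assumed "name" key.
BLOCK_GROUP = {
    "name": "block group",
    "requires": ["state", "county", "tract"],
    "wildcard": ["county", "tract"],
    "optionalWithWCFor": "tract",
}


def build_geo_params(entry, within):
    """Return `for`/`in` predicates, wildcarding or dropping parents where allowed."""
    in_parts = []
    for parent in entry.get("requires", []):
        if parent in within:
            # The caller pinned this parent to a specific code.
            in_parts.append(f"{parent}:{within[parent]}")
        elif parent == entry.get("optionalWithWCFor"):
            # This parent may be omitted entirely when it is not specified.
            continue
        elif parent in entry.get("wildcard", []):
            in_parts.append(f"{parent}:*")
        else:
            raise ValueError(f"'{parent}' must be provided explicitly")
    params = {"for": f"{entry['name']}:*"}
    if in_parts:
        params["in"] = " ".join(in_parts)
    return params


print(build_geo_params(BLOCK_GROUP, {"state": "06"}))
```

Under these rules a single state code is enough to cover every block group in that state: county is wildcarded and tract is left out.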
And we can combine different summary levels as well. Here we get data for states, counties, and county subdivisions:
✅ Geographies set: 'state', 'county' (requires `within` for: state), 'county subdivision' (requires `within` for: county, state)
✅ Parameters created for 3 geo-variable combinations.
✅ Found 1 combinations. Building API queries...
✅ Found 1 combinations. Building API queries...
ℹ️ Discovering parent geographies for: ['state']
✅ Found 52 combinations. Building API queries...
ℹ️ Making 54 API call(s)...
shape: (3, 5)
┌────────┬────────┬──────┬────────┬────────┐
│ sumlev ┆ n ┆ pct ┆ cumn ┆ cumpct │
╞════════╪════════╪══════╪════════╪════════╡
│ 040 ┆ 52 ┆ 0.1 ┆ 52 ┆ 0.1 │
│ 050 ┆ 3,222 ┆ 8.1 ┆ 3,274 ┆ 8.2 │
│ 060 ┆ 36,434 ┆ 91.8 ┆ 39,708 ┆ 100.0 │
└────────┴────────┴──────┴────────┴────────┘
CPU times: user 511 ms, sys: 98.3 ms, total: 609 ms
Wall time: 7.41 s
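The call count in the log (1 + 1 + 52 = 54) follows from which parents can be wildcarded. Below is a rough sketch of that arithmetic. The metadata entries are assumptions chosen to be consistent with the log rather than copied from the Census metadata, and `queries_needed` is a hypothetical helper.

```python
from math import prod

# Assumed metadata, consistent with the log above; verify against the
# product's own geography metadata before relying on it.
LEVELS = {
    "state": {"requires": [], "wildcard": []},
    "county": {"requires": ["state"], "wildcard": ["state"]},
    "county subdivision": {"requires": ["state", "county"], "wildcard": ["county"]},
}

# Parents that cannot be wildcarded must be enumerated; the log found 52 states.
PARENT_COUNTS = {"state": 52}


def queries_needed(entry, parent_counts):
    """One query per combination of parents that cannot be wildcarded."""
    explicit = [p for p in entry["requires"] if p not in entry["wildcard"]]
    return prod(parent_counts[p] for p in explicit)  # prod of nothing is 1


plan = {name: queries_needed(entry, PARENT_COUNTS) for name, entry in LEVELS.items()}
print(plan)
print(sum(plan.values()))
```

Only county subdivisions need their parent states discovered first, which matches the single "Discovering parent geographies" line in the log.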
That’s the last of my planned performance upgrades to core functionality, so (barring as yet unforeseen patches) up next is feature expansion. First up is a biggie: optionally fetching polygons for requested geographies. Stay tuned…
Citation
BibTeX citation:
@online{couzens2025,
author = {Couzens, Lance},
title = {Cendat Ver 0.4.4},
date = {2025-08-26},
url = {https://mostlyunoriginal.github.io/posts/2025-08-26-cendat-fully-wild/},
langid = {en}
}