ACTION_ID: raw_to_structured_array
NAME: Raw to Structured Array
CATEGORY: transform
CREDITS: 0

Convert an array of objects (a JSON string, or a variable from a
previous step) into a structured array. Each output column is read
from the matching property name on every input item.

INDEX:
  1. Inputs
  2. Outputs
  3. How to configure
  4. Key notes
  5. Where it fits in a workflow
  6. When to use
  7. When not to use

================================================================================
1. INPUTS
================================================================================

array (type: raw_array | string, required)
  A raw_array (see https://floqer.com/docs/concepts.txt §4 Data Types) — i.e.
  unstructured nested data passed through as-is from an upstream step,
  typically a JSON array of objects.
  A stringified JSON array is also accepted.
  ⚠ If this input is neither a raw_array nor a stringified JSON array
  (e.g. it's a single object, or already a structured_array), this
  action will likely not work. The action expects an array of objects.
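  A minimal client-side pre-check for the shapes described above,
  assuming you want to reject them before calling Configure Action.
  The Floqer API does not provide this function. The `{{...}}` pattern
  and the rule that every item must be an object are assumptions based
  on this page.

```python
import json
import re

# ASSUMED token shape: one {{...}} reference with nothing around it.
_SINGLE_REF = re.compile(r"\{\{[^{}]+\}\}")


def classify_array_input(value: str) -> str:
    """Return 'reference' or 'literal'; raise ValueError for shapes likely to fail."""
    if _SINGLE_REF.fullmatch(value):
        return "reference"
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        if "{{" in value:
            # Mixed text + reference does not parse.
            raise ValueError("mixed text + {{reference}}: pass the reference on its own")
        raise ValueError("neither a {{reference}} nor valid JSON")
    if isinstance(parsed, dict):
        raise ValueError("single object: wrap it in a JSON array")
    if not isinstance(parsed, list) or not all(isinstance(item, dict) for item in parsed):
        raise ValueError("expected a JSON array of objects")
    return "literal"
```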

================================================================================
2. OUTPUTS
================================================================================

list (type: structured_array) — A list of structured rows built from
each item in the input array. Columns are discovered from the first
item of the input array on the first run; fetch the resolved schema
via Get Action Outputs.

================================================================================
3. HOW TO CONFIGURE
================================================================================

⚠ READ THIS FIRST — this action needs THREE API calls before any
downstream action can reference its columns, not the usual one:

  1. Configure Action (PATCH) — tell it which upstream array to read.
  2. Run a row through it (POST .../run or sheet run) — the action
     parses one real input array and stores a sample so it can infer
     the column schema.
  3. Get Action Outputs (GET .../outputs) — surfaces the discovered
     `columns[]` with `{{<this>.list.<column>}}` references that
     downstream actions can wire to.

Until step 3 runs, the output's `columns[]` is empty and any
downstream `{{<this>.list.<column>}}` reference resolves to
`unresolved_reference`. The first row used for discovery also won't
carry the typed list in ITS cell — re-run that row (Run Rows or
Run Action) after step 3 so downstream consumers can read it.
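A sketch of a guard you might run on the Get Action Outputs response
before wiring downstream references, so an empty `columns[]` fails
loudly instead of producing `unresolved_reference`. The response layout
in the docstring is an assumption; only the field names quoted on this
page are documented.

```python
from typing import Any


def column_references(outputs_response: dict[str, Any]) -> dict[str, str]:
    """Return {column name: structured_array_reference} for the `list` output.

    ASSUMED response shape (only `list`, `structured_array`, `columns[]` and
    `structured_array_reference` come from this page; the rest is a guess):
      {"outputs": [{"name": "list", "type": "structured_array",
                    "columns": [{"name": "...",
                                 "structured_array_reference": "{{...}}"}]}]}
    """
    for output in outputs_response.get("outputs", []):
        if output.get("name") == "list" and output.get("type") == "structured_array":
            columns = output.get("columns") or []
            if not columns:
                # Schema not discovered yet: references would resolve to unresolved_reference.
                raise RuntimeError("columns[] is empty; run a row through the action first")
            return {col["name"]: col["structured_array_reference"] for col in columns}
    raise KeyError("response has no `list` structured_array output")
```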

Configuring `array` as a literal JSON string is NOT a substitute
for running a row — schema discovery only triggers at action
execution time. See the numbered walkthrough below ("Discovering
the output schema") for the full sequence.

Configure Action body
(PATCH /api/v1/workflows/{workflow_id}/sheets/{sheet_id}/actions/{action_instance_id}):

{
  "inputs": {
    "array": "{{<upstream_action_instance_id>.<some_raw_array_output>}}"
  }
}

Or paste a literal JSON-encoded array:

{
  "inputs": {
    "array": "[{\"first_name\": \"Ada\", \"company\": \"Floqer\"}, {\"first_name\": \"Grace\", \"company\": \"Acme\"}]"
  }
}

Field-by-field:
  - array   A reference to an upstream raw_array output, or a literal
            JSON-encoded array of objects. Each item becomes one row in
            the output `list`.

Discovering the output schema:

  1. Configure the action with the body above (Configure Action).

  2. Add ONE example row whose upstream value resolves to a
     representative array:

       POST /api/v1/workflows/{workflow_id}/sheets/{sheet_id}/rows
       Body: { "rows": [{ ...inputs that produce the source array... }] }

  3. Run that row:

       POST /api/v1/workflows/{workflow_id}/sheets/{sheet_id}/run
       Body: { "row_ids": ["<the_example_row_id>"] }

     The worker parses the array and stores a sample under
     the action instance so it can be inspected for column
     shape on the next call.

     Note: configuring `array` as a literal JSON string (see the
     Configure Action body above) is NOT a substitute for running a
     row. Schema discovery only triggers when the action runs, which
     only happens when a row runs. The literal just changes what the
     action reads; it doesn't let you skip this step.

  4. Call Get Action Outputs:

       GET /api/v1/workflows/{workflow_id}/sheets/{sheet_id}/actions/
           {action_instance_id}/outputs

     The response surfaces a single output `list` of type
     `structured_array`. Its `columns[]` lists every discovered column
     with a `structured_array_reference` token shaped
     `{{<action_instance_id>.list.<column>}}` — drop those verbatim
     into downstream actions to address an individual column. The
     handler also persists the discovered column schema back onto the
     action, so subsequent runs return the same set of columns.

  5. ⚠ Important: the example row from step 3 ran BEFORE the column
     schema existed, so its List Rows cell will NOT carry the
     discovered structured_array. To get a populated cell for that
     row, re-run it — Run Rows on the same row_id, or call Run
     Action (`POST /actions/{action_instance_id}/run`) to run rows
     just for this action. Alternatively, delete the example row
     (Delete Rows) and run fresh rows. From this point on, every new
     run sees the persisted column schema and surfaces the typed list
     in `cells[<this_action>].outputs.list`.
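A minimal sketch of the walkthrough as Python request builders, using
only the paths and bodies shown above. The host (`BASE_URL`) is a
placeholder and authentication is not covered on this page, so both
are assumptions. The functions build `urllib.request.Request` objects
but do not send them.

```python
import json
import urllib.request
from typing import Any, Optional

# HYPOTHETICAL host: this page does not name the API base URL.
BASE_URL = "https://api.example.invalid"


def _request(method: str, path: str, body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request. Add your auth header before sending."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers={"Content-Type": "application/json"}, method=method)


def _sheet(workflow_id: str, sheet_id: str) -> str:
    return f"/api/v1/workflows/{workflow_id}/sheets/{sheet_id}"


def configure(workflow_id: str, sheet_id: str, action_id: str, array_input: str) -> urllib.request.Request:
    """Step 1: Configure Action (PATCH)."""
    return _request("PATCH", f"{_sheet(workflow_id, sheet_id)}/actions/{action_id}",
                    {"inputs": {"array": array_input}})


def add_rows(workflow_id: str, sheet_id: str, rows: list[dict[str, Any]]) -> urllib.request.Request:
    """Step 2: add ONE example row whose upstream value resolves to a representative array."""
    return _request("POST", f"{_sheet(workflow_id, sheet_id)}/rows", {"rows": rows})


def run_rows(workflow_id: str, sheet_id: str, row_ids: list[str]) -> urllib.request.Request:
    """Step 3 (discovery run) and step 5 (re-run the same row once the schema exists)."""
    return _request("POST", f"{_sheet(workflow_id, sheet_id)}/run", {"row_ids": row_ids})


def get_outputs(workflow_id: str, sheet_id: str, action_id: str) -> urllib.request.Request:
    """Step 4: Get Action Outputs, which surfaces and persists columns[]."""
    return _request("GET", f"{_sheet(workflow_id, sheet_id)}/actions/{action_id}/outputs")
```

After adding your auth header, send each request with
`urllib.request.urlopen`. The example row's id comes from the add-rows
response, whose shape is not documented here; pass it to `run_rows`
once for discovery (step 3) and again after `get_outputs` (step 5) so
that row's own cell gets the typed list.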

================================================================================
4. KEY NOTES
================================================================================

- Output columns are discovered, not declared. Until at least one row
  has run, the `columns[]` on the `list` output is empty. The example
  row used for discovery also won't surface the typed list in its own
  cell — re-run it or use fresh rows after Get Action Outputs. See
  section 3 for the full walkthrough.
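- Columns come from the FIRST item in the input array. Per-key types
  are inferred from that one sample (string / number / boolean / array
  / json, plus email / url heuristics on strings) and applied to every
  row downstream. Keys that appear only in later items do not become
  columns, so make sure the first item carries every field you need.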
- `array` accepts a single string: either a `{{ref}}` to an upstream
  raw_array output, or a literal JSON-encoded array. Mixed text +
  reference won't parse.
- To fan out the result into one row per item on a new sheet, follow
  with `push_data_to_sheet`.
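The first-item behaviour described in these notes can be approximated
as follows. This is not Floqer's implementation: the exact email / url
patterns, how null is typed, and filling missing keys with None are all
assumptions. The point is to show why a key that first appears in a
later item never becomes a column.

```python
import re
from typing import Any

# ASSUMED heuristics: the docs only say "email / url heuristics on strings".
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
_URL = re.compile(r"https?://\S+")


def infer_type(value: Any) -> str:
    if isinstance(value, bool):  # bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        if _EMAIL.fullmatch(value):
            return "email"
        if _URL.fullmatch(value):
            return "url"
        return "string"
    return "json"  # objects, plus null and anything else (assumed)


def infer_schema(items: list[dict[str, Any]]) -> dict[str, str]:
    """Column name -> type, read from the FIRST item only."""
    return {key: infer_type(value) for key, value in items[0].items()} if items else {}


def to_rows(items: list[dict[str, Any]], schema: dict[str, str]) -> list[dict[str, Any]]:
    """Apply the first-item schema to every item: extra keys are dropped, missing ones become None (assumed)."""
    return [{column: item.get(column) for column in schema} for item in items]
```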

================================================================================
5. WHERE IT FITS IN A WORKFLOW
================================================================================

Pattern: http_api_call / llm_web_agents (returns a JSON array) ->
raw_to_structured_array -> push_data_to_sheet.

Pattern (CRM enrichment fan-out): salesforce_lookup_record (returns a
large array of contacts, each with hundreds of fields) ->
format_data_using_js_expression (slim each contact down to just the
fields you care about) -> raw_to_structured_array (build a structured
array with named columns from the cleaned objects) ->
push_data_to_sheet (expand into one row per contact on a new sheet
for downstream enrichment).

Pattern (web-scrape fan-out): scrape_web_page_using_firecrawl (e.g.
the speaker / exhibitor list on a conference website) ->
format_data_using_js_expression (extract just the fields you need —
company name, title, name — into a clean
array) -> raw_to_structured_array (turn the cleaned array into a
structured array with named columns) -> push_data_to_sheet (one
row per speaker on a new sheet for downstream per-person analysis
and enrichment).
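In the real workflow the trimming step is format_data_using_js_expression,
which takes a JavaScript expression. The Python sketch below only
illustrates the equivalent logic on hypothetical Salesforce-style
contacts (the field names are made up). Giving every item the same keys,
in the same order, also means the first item carries every column
raw_to_structured_array should discover.

```python
import json
from typing import Any


def slim(records: list[dict[str, Any]], fields: list[str]) -> list[dict[str, Any]]:
    """Keep only `fields`, in the given order; a missing field becomes None."""
    return [{field: record.get(field) for field in fields} for record in records]


# HYPOTHETICAL upstream output: real CRM records carry far more fields.
contacts = [
    {"Id": "003A", "Name": "Ada", "Title": "CTO", "Phone": "555-0100"},
    {"Id": "003B", "Name": "Grace"},
]
# A stringified JSON array, suitable as a literal `array` input.
array_input = json.dumps(slim(contacts, ["Name", "Title"]))
```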

================================================================================
6. WHEN TO USE
================================================================================

Use raw_to_structured_array when you have a raw JSON array — typically
the output of an upstream step like an HTTP / API call, a CRM lookup,
or a web scrape — that contains far more nested or noisy data than
you want to act on directly. The raw array is great as source data
but bad as working data: you can't push it cleanly to a sheet, run
per-row enrichment on it, or analyse individual records downstream
until the shape is normalised.

This action lifts that raw array into a structured array whose named
columns are taken from the objects' property names (slim the objects
upstream, e.g. with format_data_using_js_expression, if you only care
about a few of them). The result can then be expanded into individual
rows on a new sheet (via push_data_to_sheet), and each row enriched,
scored, or analysed independently. In short: use it to turn raw,
source-shaped data into clean, per-record working data for downstream
analysis.

================================================================================
7. WHEN NOT TO USE
================================================================================

Have a CSV string instead of a JSON array
  -> csv_to_structured_array_format
     (https://floqer.com/docs/action-detail/csv_to_structured_array_format.txt)

Already have a structured array (e.g. the output of an employee
finder or any other action that already produces a structured_array)
  -> skip this action. Pass the existing structured array directly
     to push_data_to_sheet or downstream consumers; running
     raw_to_structured_array on already-structured data adds no value
     and can flatten or rename columns unnecessarily.

The raw array will be consumed in its entirety by a downstream step
(e.g. passed wholesale into llm_models or llm_web_agents for analysis,
summarisation, or extraction)
  -> skip this action. The LLM step can interpret the raw JSON array
     directly; converting it into a structured array adds an extra
     transformation step with no benefit when the data does not need
     to be extracted into individual rows for human readability or
     per-row processing.

================================================================================

This file is maintained manually. Last updated: 2026-05-04.
Full interactive reference: https://floqer.com/docs/reference
Action catalog: https://floqer.com/docs/action-catalog.txt