Feature formulas and units

Feature documentation must make units, formula conventions, upstream dependencies, and exact output labels explicit. Individual phenotype pages remain authoritative for page-specific formulas. This page summarizes the conventions that cut across them.

Modality: All modalities
Pipeline step: Feature extraction and output documentation
Outputs: Documented formulas, units, column naming, and missingness rules
Maturity: Source-audited method page

Formula conventions

Feature | Current convention | Unit
Stroke volume | EDV - ESV | mL
Ejection fraction | (EDV - ESV) / EDV * 100 | %
Cardiac output | stroke volume * heart rate * 1e-3 | L/min
Cardiac index | cardiac output / BSA | L/min/m^2
Indexed volume or mass | raw value / BSA | mL/m^2 or g/m^2
Aortic equivalent diameter | 2 * sqrt(area / pi) | mm
Aortic distensibility | (area_max - area_min) / (area_min * central pulse pressure) * 1e3 | 10^-3/mmHg
Regurgitant fraction | backward flow / forward flow * 100 | %
Native T1 correction | aggregate-fitted blood-pool correction | ms
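The arithmetic conventions in the formula table can be expressed as small functions, which makes the unit conversions explicit. This is a minimal sketch and does not reproduce the pipeline's implementation. It assumes volumes in mL, heart rate in beats/min, BSA in m^2, areas in mm^2, and pressure in mmHg. All function names are hypothetical. It also assumes that the regurgitant-fraction ratio is scaled by 100 to give %. The aggregate-fitted T1 correction is omitted because the table does not specify its form.

```python
import math

# Hypothetical helpers mirroring the formula table.
# Assumed input units: mL, beats/min, m^2, mm^2, mmHg.

def stroke_volume(edv_ml: float, esv_ml: float) -> float:
    """Stroke volume in mL: EDV - ESV."""
    return edv_ml - esv_ml

def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction in %: (EDV - ESV) / EDV * 100."""
    return (edv_ml - esv_ml) / edv_ml * 100.0

def cardiac_output(sv_ml: float, hr_bpm: float) -> float:
    """Cardiac output in L/min: mL/beat * beats/min gives mL/min, * 1e-3 gives L/min."""
    return sv_ml * hr_bpm * 1e-3

def indexed(value: float, bsa_m2: float) -> float:
    """BSA-indexed value, e.g. cardiac index (L/min/m^2) or indexed volume (mL/m^2)."""
    return value / bsa_m2

def equivalent_diameter(area_mm2: float) -> float:
    """Diameter in mm of a circle with the same cross-sectional area."""
    return 2.0 * math.sqrt(area_mm2 / math.pi)

def distensibility(area_max: float, area_min: float, pulse_pressure_mmhg: float) -> float:
    """Aortic distensibility in 10^-3/mmHg (the * 1e3 rescales 1/mmHg)."""
    return (area_max - area_min) / (area_min * pulse_pressure_mmhg) * 1e3

def regurgitant_fraction(backward_ml: float, forward_ml: float) -> float:
    """Regurgitant fraction in %, assuming the ratio is scaled by 100."""
    return backward_ml / forward_ml * 100.0
```

For example, EDV = 150 mL and ESV = 60 mL give a stroke volume of 90 mL and an ejection fraction of 60%. At 60 beats/min, the cardiac output is 5.4 L/min.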

Unit rules

Feature class | Typical unit | Notes
Volume | mL | Indexed variants should state the denominator
Mass | g | Myocardial density convention belongs on myocardial pages
Ejection/emptying fraction | % | Derived from phase-specific volumes
Strain | % | Sign convention and backend must be documented
Strain rate | 1/s | Peak definitions are method-sensitive
Torsion/recoil | degree/cm and degree/cm/s | Length normalization should be stated
T1 | ms | Interpretation is acquisition- and correction-specific
Flow | mL, mL/s, cm/s, cm^2 | Depends on phase-contrast conventions
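The unit rules can also be stored as a lookup table, so that each feature's declared unit can be checked against its class. This is a hypothetical sketch. The lowercase class keys and `is_typical_unit` are illustrative, and the table is not read from the pipeline. Indexed variants such as mL/m^2 are not listed, so they would need their own entries.

```python
# Hypothetical encoding of the unit-rules table above.
TYPICAL_UNITS: dict[str, frozenset[str]] = {
    "volume": frozenset({"mL"}),
    "mass": frozenset({"g"}),
    "ejection/emptying fraction": frozenset({"%"}),
    "strain": frozenset({"%"}),
    "strain rate": frozenset({"1/s"}),
    "torsion/recoil": frozenset({"degree/cm", "degree/cm/s"}),
    "t1": frozenset({"ms"}),
    "flow": frozenset({"mL", "mL/s", "cm/s", "cm^2"}),
}

def is_typical_unit(feature_class: str, unit: str) -> bool:
    """Return True if `unit` is a typical unit for `feature_class`.

    The class name is matched case-insensitively. Unknown classes raise
    KeyError rather than passing silently.
    """
    allowed = TYPICAL_UNITS.get(feature_class.strip().lower())
    if allowed is None:
        raise KeyError(f"unknown feature class: {feature_class!r}")
    return unit in allowed
```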

Column naming rule

Every documented output should preserve the pipeline column name exactly as emitted. When a CSV column name already includes a unit, keep that unit visible in the documented field name, for example LV: V_ED [mL], Native T1: Myocardium-Global [ms], or Aortic Flow: Regurgitant Fraction [%].
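When a column name ends in a bracketed unit, the unit can be read out without altering the name itself. The sketch below assumes that the "label [unit]" pattern in the examples above applies generally. `split_unit` is a hypothetical helper, and the full original column string should still serve as the key in outputs.

```python
import re
from typing import Optional, Tuple

# Matches a trailing "[unit]" suffix, e.g. "LV: V_ED [mL]" -> ("LV: V_ED", "mL").
# The pattern is inferred from the documented examples, not from the pipeline.
_UNIT_SUFFIX = re.compile(r"^(?P<label>.*?)\s*\[(?P<unit>[^\[\]]+)\]$")

def split_unit(column: str) -> Tuple[str, Optional[str]]:
    """Split a column name into (label, unit). The unit is None if no bracket suffix exists."""
    match = _UNIT_SUFFIX.match(column)
    if match is None:
        return column, None
    return match.group("label"), match.group("unit")
```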

Legacy labels are contracts

Do not silently normalize legacy output strings. If a current CSV contains a typo or old naming convention, document it as schema debt and migrate with an explicit versioned plan.
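One way to keep legacy labels stable while still planning a migration is to use an explicit, versioned mapping. The mapping is applied only when a consumer requests a newer schema version. Everything here is hypothetical: `LabelMigration`, `output_label`, the version numbers, and the placeholder labels in the test are illustrative. None of them describes the pipeline's actual schema.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class LabelMigration:
    legacy: str         # label as it appears in current CSVs (documented schema debt)
    replacement: str    # corrected label
    since_version: int  # first schema version that emits `replacement`

def output_label(label: str, schema_version: int,
                 migrations: Sequence[LabelMigration]) -> str:
    """Return the label to emit for a given schema version.

    Legacy labels are kept unchanged for older schema versions, so nothing is
    renamed silently. Labels without a migration pass through as-is.
    """
    for m in migrations:
        if label == m.legacy and schema_version >= m.since_version:
            return m.replacement
    return label
```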

Missing input behavior

Derived features can be conditional. BSA-indexed rows depend on BSA lookup, distensibility rows depend on pressure data, atrial-contribution rows depend on ECG-derived timing, and strain-rate rows depend on peak detection. Public pages should say when a row is conditional rather than implying universal availability.
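Conditional rows can be made explicit by returning a missing value when an upstream input is absent, instead of dropping the row or filling in a default. This sketch assumes that missing values are encoded as NaN. The pipeline may instead omit the column or use another sentinel. Both function names are hypothetical.

```python
import math
from typing import Optional

# Assumption: a missing upstream input yields NaN for the derived row.

def _usable(x: Optional[float]) -> bool:
    """True if x is present, finite, and positive."""
    return x is not None and math.isfinite(x) and x > 0

def bsa_indexed(value: float, bsa_m2: Optional[float]) -> float:
    """BSA-indexed value, or NaN when the BSA lookup failed."""
    if not _usable(bsa_m2):
        return math.nan
    return value / bsa_m2

def distensibility_or_missing(area_max: float, area_min: float,
                              pulse_pressure_mmhg: Optional[float]) -> float:
    """Distensibility in 10^-3/mmHg, or NaN when pressure data are unavailable."""
    if not _usable(pulse_pressure_mmhg):
        return math.nan
    return (area_max - area_min) / (area_min * pulse_pressure_mmhg) * 1e3
```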

Source audit

  • Formula conventions were checked against current phenotype pages and implementation sources under src/feature_extraction/**.
  • Output names and schema-debt conventions were checked against docs/data/output_column_inventory.yml and docs/data/phenotype_dictionary.yml.
  • Textbook context boundary: broad clinical textbook context is not surfaced here because this page documents formula and schema conventions.