[Paper] Health+: Empowering Individuals via Unifying Health Data
Source: arXiv - 2602.19319v1
Overview
Health+ is a forward‑looking prototype that flips the current healthcare data model on its head: instead of institutions hoarding fragmented medical records, the system puts individuals in the driver’s seat. By letting users upload, query, and share health information across text, images, and other modalities through intuitive interfaces, Health+ aims to make personal health data both usable and privacy‑preserving.
Key Contributions
- Unified multimodal repository – A single backend that stores heterogeneous health artifacts (lab PDFs, radiology images, wearable logs, doctor notes) in a format that supports fast cross‑modal queries.
- User‑centric interaction layer – Low‑code UI widgets and natural‑language query assistants that let non‑technical users retrieve specific data points (e.g., “show me my cholesterol trend last year”).
- Intelligent recommendation engine – Context‑aware suggestions for data sharing (e.g., automatically offering recent ECGs to a new cardiologist) while respecting consent policies.
- Privacy‑by‑design architecture – End‑to‑end encryption, attribute‑based access control, and audit trails that give users granular control over who sees what.
- Scalable integration pipeline – Plug‑and‑play adapters for common health standards (HL7 FHIR, DICOM, CSV) that automatically normalize incoming records into the unified store.
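The plug‑and‑play adapter idea in the last point can be illustrated with a small registry that maps a format name to a parser. This is a minimal sketch, not the authors' pipeline: `HealthRecord`, `ADAPTERS`, `ingest`, the CSV column layout, and the simplified FHIR‑style Observation fields are all assumptions made for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HealthRecord:
    """Hypothetical common record shape; the paper's real schema is not shown here."""
    patient_id: str
    kind: str           # e.g. "lab" or "imaging"
    code: str           # e.g. "cholesterol"
    value: float
    unit: str
    source_format: str  # which adapter produced this record


def from_fhir_observation(raw: str) -> List[HealthRecord]:
    """Parse a simplified FHIR-style Observation JSON document (assumed fields)."""
    obs = json.loads(raw)
    quantity = obs["valueQuantity"]
    return [HealthRecord(
        patient_id=obs["subject"]["reference"].split("/")[-1],
        kind="lab",
        code=obs["code"]["text"],
        value=float(quantity["value"]),
        unit=quantity["unit"],
        source_format="fhir",
    )]


def from_csv(raw: str) -> List[HealthRecord]:
    """Parse CSV rows with the assumed columns patient_id,code,value,unit."""
    reader = csv.DictReader(io.StringIO(raw))
    return [HealthRecord(
        patient_id=row["patient_id"],
        kind="lab",
        code=row["code"],
        value=float(row["value"]),
        unit=row["unit"],
        source_format="csv",
    ) for row in reader]


# Registering a new format means adding one entry here.
ADAPTERS: Dict[str, Callable[[str], List[HealthRecord]]] = {
    "fhir": from_fhir_observation,
    "csv": from_csv,
}


def ingest(source_format: str, raw: str) -> List[HealthRecord]:
    """Dispatch the raw payload to the adapter registered for its format."""
    try:
        adapter = ADAPTERS[source_format]
    except KeyError:
        raise ValueError(f"no adapter registered for {source_format!r}")
    return adapter(raw)
```

With this shape, both a FHIR‑style payload and a CSV export end up as the same `HealthRecord` objects, so downstream storage and query code does not need to know where a record came from.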
Methodology
The authors built a prototype stack composed of three layers:
- Ingestion & Normalization – Open‑source adapters parse incoming files (PDF, DICOM, JSON) and map them to a common schema stored in a graph‑based DB (Neo4j) enriched with vector embeddings for similarity search.
- Secure Data Store – All records are encrypted at rest using per‑user keys. Attribute‑Based Encryption (ABE) enforces fine‑grained policies (e.g., “researchers can see anonymized lab results, but not imaging”).
- Interaction & Recommendation – A lightweight front‑end (React + TypeScript) talks to a backend AI service (GPT‑style LLM fine‑tuned on health‑specific intents) that translates natural language queries into graph traversals and vector searches. The recommendation engine runs a rule‑based policy engine plus a collaborative‑filtering model that learns sharing patterns from consent logs.
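The fine‑grained policy example from the Secure Data Store layer ("researchers can see anonymized lab results, but not imaging") can be sketched as a plain attribute check with an audit log. Note the substitution: this is an ordinary in‑memory policy check, not cryptographic Attribute‑Based Encryption, and `Policy`, `PolicyEngine`, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Policy:
    """Grants access to one record kind when the requester holds all attributes."""
    record_kind: str
    required_attributes: FrozenSet[str]
    anonymized_only: bool = False


@dataclass(frozen=True)
class AccessDecision:
    allowed: bool
    anonymized: bool


@dataclass
class PolicyEngine:
    policies: List[Policy]
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def check(self, requester: str, attributes: FrozenSet[str],
              record_kind: str) -> AccessDecision:
        """Return the first matching policy's decision; deny by default."""
        decision = AccessDecision(allowed=False, anonymized=False)
        for policy in self.policies:
            if (policy.record_kind == record_kind
                    and policy.required_attributes <= attributes):
                decision = AccessDecision(True, policy.anonymized_only)
                break
        # Every request is logged, whether it was allowed or denied.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "requester": requester,
            "record_kind": record_kind,
            "outcome": "allow" if decision.allowed else "deny",
        })
        return decision


engine = PolicyEngine(policies=[
    Policy("lab", frozenset({"researcher"}), anonymized_only=True),
    Policy("imaging", frozenset({"treating_clinician"})),
])
lab = engine.check("r1", frozenset({"researcher"}), "lab")
imaging = engine.check("r1", frozenset({"researcher"}), "imaging")
print(lab.allowed, lab.anonymized, imaging.allowed)  # True True False
```

Denying by default and logging denials as well as grants is a design choice of this sketch; it mirrors the audit‑trail goal described above but is not taken from the paper's implementation.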
The system was evaluated through a series of usability workshops with 15 participants (a mix of patients, clinicians, and developers) and through performance benchmarks on synthetic health datasets (≈1 M records).
Results & Findings
| Metric | Outcome |
|---|---|
| Query latency (multimodal) | Median 420 ms for combined text‑image queries (well under 1 s UI threshold) |
| Data ingestion throughput | 1,200 records/min with parallel adapters |
| User satisfaction (SUS) | 84 / 100 – participants found the natural‑language interface “intuitive” |
| Privacy compliance | Zero policy violations in simulated sharing scenarios; audit logs captured 100 % of access events |
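For readers unfamiliar with the SUS row, the standard System Usability Scale scoring rule is: ten items answered on a 1 to 5 scale, odd‑numbered items contribute (response − 1), even‑numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0 to 100 score. The sketch below applies that rule to made‑up responses; the numbers are illustrative and are not data from the study.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a System Usability Scale score from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, response in enumerate(responses):
        item_number = index + 1
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


# Hypothetical single participant, not taken from the paper.
example = [5, 2, 4, 1, 5, 2, 4, 2, 4, 1]
print(sus_score(example))  # 85.0
```

A reported study‑level SUS is typically the mean of such per‑participant scores.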
The workshops revealed that participants could retrieve specific health facts (e.g., “last MRI report”) without navigating multiple portals, and they trusted the consent UI enough to share data with a new specialist on the spot.
Practical Implications
- For developers building health‑tech apps – Health+ demonstrates a reusable pattern for multimodal data ingestion (FHIR + DICOM adapters) and a blueprint for integrating LLM‑driven query layers without exposing raw PHI.
- For patient‑facing platforms – The consent UI and audit trail can be dropped into existing patient portals to give users transparent control over data sharing, potentially reducing legal risk under HIPAA/GDPR.
- For research data marketplaces – The attribute‑based encryption model enables “privacy‑preserving data licensing,” where researchers can request anonymized subsets while the system enforces consent automatically.
- For interoperability initiatives – By normalizing to a graph + vector store, Health+ sidesteps the need for a single canonical schema, making it easier to plug into regional health information exchanges (HIEs).
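The vector half of the graph + vector store can be illustrated with a tiny similarity search. This is a sketch under stated assumptions: the two‑dimensional embeddings and record IDs are invented, and a real deployment would use a vector index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k record IDs whose embeddings are most similar to the query."""
    scored = [(record_id, cosine_similarity(query, vector))
              for record_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by hypothetical record IDs.
index = {
    "ecg_2024": [1.0, 0.0],
    "mri_2023": [0.0, 1.0],
    "ecg_2023": [0.9, 0.1],
}
results = top_k([1.0, 0.0], index, k=2)
print([record_id for record_id, _ in results])  # ['ecg_2024', 'ecg_2023']
```

Because records from different source formats only need an embedding and an ID to take part in this search, no shared field‑level schema is required for the similarity step.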
Limitations & Future Work
- Prototype scope – Tested on synthetic data and a small user cohort; real‑world deployment would need to handle orders of magnitude larger volumes and stricter regulatory audits.
- LLM reliability – Natural‑language parsing occasionally misinterpreted medical terminology, suggesting a need for further domain‑specific fine‑tuning and fallback keyword parsers.
- Consent complexity – The rule engine covers basic “share/not share” policies; more nuanced scenarios (time‑bounded consent, purpose‑limited sharing) remain to be modeled.
- Integration overhead – While adapters exist for common standards, onboarding legacy EMR systems may still require custom ETL pipelines.
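To make the consent‑complexity gap concrete, the sketch below shows one possible way to represent time‑bounded, purpose‑limited consent. It is an illustration of the open problem only, not something the prototype implements; `ConsentGrant` and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ConsentGrant:
    """Hypothetical consent record limited by recipient, data kind, purpose, and time."""
    grantee: str
    record_kinds: FrozenSet[str]
    purposes: FrozenSet[str]
    not_before: datetime
    expires_at: Optional[datetime] = None  # None means the grant does not expire

    def permits(self, grantee: str, record_kind: str, purpose: str,
                at: datetime) -> bool:
        """True only if every dimension of the request falls inside the grant."""
        if grantee != self.grantee:
            return False
        if record_kind not in self.record_kinds:
            return False
        if purpose not in self.purposes:
            return False
        if at < self.not_before:
            return False
        return self.expires_at is None or at < self.expires_at


grant = ConsentGrant(
    grantee="cardiologist-42",
    record_kinds=frozenset({"ecg"}),
    purposes=frozenset({"treatment"}),
    not_before=datetime(2026, 1, 1, tzinfo=timezone.utc),
    expires_at=datetime(2026, 2, 1, tzinfo=timezone.utc),
)
mid_january = datetime(2026, 1, 15, tzinfo=timezone.utc)
print(grant.permits("cardiologist-42", "ecg", "treatment", mid_january))  # True
print(grant.permits("cardiologist-42", "ecg", "research", mid_january))   # False
```

Using timezone‑aware datetimes throughout avoids comparing naive and aware values, which Python rejects with a `TypeError`.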
Future work outlined by the authors includes scaling the backend to billions of records, extending the recommendation engine with federated learning for cross‑institutional insights, and conducting longitudinal studies to measure health outcomes when patients actively manage their data.
Authors
- Sujaya Maiyya
- Shantanu Sharma
- Avinash Kumar
Paper Information
- arXiv ID: 2602.19319v1
- Categories: cs.MM, cs.AI, cs.CR, cs.DB, cs.DC
- Published: February 22, 2026