Extract Structured Data from Car Listings Using AI in .NET 10

Published: March 10, 2026 at 01:23 AM EDT
6 min read
Source: Dev.to

🚗 From Chaos to Structure: Extracting Car Listings with AI

Ever struggled with parsing messy car listings from different sources? Imagine turning this:

“Check out this stylish Honda City 2018 model for sale, clocked only 30,000 km! Single owner, showroom condition, insurance valid. Yours for just ₹6.5 lakh.”

into this:

{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition", "Insurance valid"],
  "OwnerCount": 1
}

Let me show you how to build this in under 100 lines of C# using the GitHub Models API! 🚀


Why This Matters

Car listings come in all shapes and sizes. Whether you’re building a price‑comparison site, a marketplace aggregator, or an inventory‑management system, you need to:

  • Extract key details (make, model, year, mileage, price)
  • Handle different formats (sale, lease, rent)
  • Deal with missing information gracefully
  • Process data at scale

Parsing all of this by hand is tedious and brittle. Let AI do the heavy lifting! 💪


Tools We’ll Use

  • GitHub Models – free, rate‑limited access to hosted AI models such as gpt-4o-mini (no credit card needed)
  • Microsoft.Extensions.AI – unified AI abstraction for .NET
  • .NET 10 – latest and greatest

Setup

# 1️⃣ Create a new console app
dotnet new console -n TextExtraction
cd TextExtraction

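# 2️⃣ Add required packages
dotnet add package Microsoft.Extensions.AI          # GetResponseAsync<T> structured output
dotnet add package Microsoft.Extensions.AI.OpenAI   # OpenAI / GitHub Models adapter
dotnet add package Microsoft.Extensions.Configuration.UserSecrets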

Store Your GitHub Token Securely

dotnet user-secrets init
dotnet user-secrets set "GitHubModels:Token" "your-github-token"

Define the Extraction Model

Create CarDetails.cs:

using System.Text.Json.Serialization;

[JsonConverter(typeof(JsonStringEnumConverter))]
public enum AvailabilityType
{
    Sale,
    Lease,
    Rent
}

public class CarDetails
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public AvailabilityType? AvailabilityType { get; set; }
    public double? PricePerMonth { get; set; }
    public double? PricePerDay { get; set; }
    public string[]? Features { get; set; }
    public string? Location { get; set; }
    public string ShortSummary { get; set; } = string.Empty;
    public int? OwnerCount { get; set; }
}

Notice the nullable types? They let us handle missing data elegantly! ✨
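Under the hood, System.Text.Json leaves a nullable property null both when the key is missing from the model's reply and when the reply contains an explicit null. The minimal sketch below makes that visible with JsonDocument instead of the full CarDetails class; the OptionalNumber helper and the sample reply are illustrative, not part of the original project.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns the number stored under `name`,
// or null when the key is missing or holds a JSON null.
static double? OptionalNumber(JsonElement root, string name) =>
    root.TryGetProperty(name, out JsonElement value) && value.ValueKind == JsonValueKind.Number
        ? value.GetDouble()
        : null;

// Illustrative model reply for the lease listing: no "Price" key at all,
// and an explicit null for OwnerCount.
const string leaseJson =
    "{ \"Make\": \"Hyundai\", \"Model\": \"Creta SX\", \"PricePerMonth\": 22000, \"OwnerCount\": null }";

using JsonDocument doc = JsonDocument.Parse(leaseJson);
JsonElement root = doc.RootElement;

double? price = OptionalNumber(root, "Price");            // key missing -> null
double? perMonth = OptionalNumber(root, "PricePerMonth"); // present -> 22000
double? owners = OptionalNumber(root, "OwnerCount");      // explicit null -> null

Console.WriteLine($"Price: {price?.ToString() ?? "n/a"}");
Console.WriteLine($"PricePerMonth: {perMonth}");
Console.WriteLine($"OwnerCount: {owners?.ToString() ?? "n/a"}");
```

Either way, downstream code only has to ask one question per field: is it null?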


Core Extraction Logic (Program.cs)

using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using OpenAI;
using System.ClientModel;
using System.Text.Json;
using System.Text.Json.Serialization;

// -------------------------------------------------
// 1️⃣ Configure the client
// -------------------------------------------------
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()   // reads the secrets saved with dotnet user-secrets
    .Build();

var credential = new ApiKeyCredential(
    configuration["GitHubModels:Token"] ??
    throw new InvalidOperationException("Token not found. Set GitHubModels:Token with dotnet user-secrets.")
);

// GitHub Models speaks the OpenAI protocol, so the OpenAI client only needs a custom endpoint.
// Check the GitHub Models docs for the current endpoint; newer docs use
// https://models.github.ai/inference with model IDs such as "openai/gpt-4o-mini".
IChatClient chatClient = new OpenAIClient(credential, new OpenAIClientOptions
{
    Endpoint = new Uri("https://models.inference.ai.azure.com")
})
    .GetChatClient("gpt-4o-mini")
    .AsIChatClient();

// -------------------------------------------------
// 2️⃣ Prompt (schema)
// -------------------------------------------------
var prompt = @"Extract the following details from the car listing and return ONLY a valid JSON object:
{
  ""Make"": ""string - car manufacturer/brand"",
  ""Model"": ""string - car model name"",
  ""Year"": number - manufacturing year,
  ""Mileage"": number - kilometers driven,
  ""Price"": number - sale price in lakhs (null if not for sale),
  ""AvailabilityType"": ""string - one of: Sale, Lease, Rent"",
  ""PricePerMonth"": number - monthly lease/rental price in rupees (null if not mentioned),
  ""PricePerDay"": number - daily rental price in rupees (null if not mentioned),
  ""Features"": array of strings - notable features,
  ""Location"": ""string - city or area (null if not mentioned)"",
  ""ShortSummary"": ""string - brief summary in 10-15 words"",
  ""OwnerCount"": number - number of owners so far (null if not mentioned)
}
Return only the JSON object, no additional text.";

// -------------------------------------------------
// 3️⃣ Sample listings
// -------------------------------------------------
var carListings = new List<string>
{
    "Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
    "Hyundai Creta SX 2020 — premium SUV with sunroof. Monthly lease at ₹22,000.",
    "Toyota Innova Crysta 2019 — spacious 7‑seater, 40,000 km, rent at ₹2,500/day."
};

// -------------------------------------------------
// 4️⃣ Process each listing
// -------------------------------------------------
var outputOptions = new JsonSerializerOptions
{
    WriteIndented = true,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull // skip fields the listing didn't mention
};

Console.WriteLine("Processing car listings...\n");

foreach (var listing in carListings)
{
    // GetResponseAsync<T> asks the model for JSON matching CarDetails and deserializes the reply.
    var response = await chatClient.GetResponseAsync<CarDetails>(
        $"{prompt}\n\nCar Listing:\n{listing}"
    );

    if (response.TryGetResult(out CarDetails? carDetails) && carDetails != null)
    {
        Console.WriteLine($"✅ Extracted: {carDetails.Make} {carDetails.Model}");
        Console.WriteLine(JsonSerializer.Serialize(carDetails, outputOptions));
    }
    else
    {
        Console.WriteLine($"❌ Could not extract details from: {listing}");
    }
}
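GetResponseAsync<CarDetails> asks the model for JSON that matches CarDetails, which relies on the model supporting that response format. If you fall back to the plain, non-generic GetResponseAsync with a model that lacks structured-output support, the reply may arrive wrapped in a Markdown code fence despite the "return only JSON" instruction. A defensive sketch, assuming such a reply: take the text from the first { to the last }, and keep it only if JsonDocument can parse it. TryExtractJsonObject is a hypothetical helper, and it will not untangle replies that contain several separate JSON objects.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: pulls the outermost {...} span out of a raw model reply
// and returns true only if that span parses as JSON.
static bool TryExtractJsonObject(string text, out string json)
{
    json = string.Empty;
    int start = text.IndexOf('{');
    int end = text.LastIndexOf('}');
    if (start < 0 || end <= start)
        return false; // nothing that looks like an object

    string candidate = text.Substring(start, end - start + 1);
    try
    {
        // Parsing validates the candidate; only the check is needed, so dispose right away.
        JsonDocument.Parse(candidate).Dispose();
    }
    catch (JsonException)
    {
        return false; // malformed JSON
    }

    json = candidate;
    return true;
}

// Simulated fenced reply; the fence is built at runtime so the snippet has no literal backtick run.
string fence = new string('`', 3);
string reply = fence + "json\n{ \"Make\": \"Honda\", \"Model\": \"City\" }\n" + fence;

bool ok = TryExtractJsonObject(reply, out string extracted);
Console.WriteLine(ok ? extracted : "No JSON object found");
```

On success, the extracted string can go to JsonSerializer.Deserialize<CarDetails>(...) as usual.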

Run the app:

dotnet run

Sample Output (abbreviated)

Processing car listings...

✅ Extracted: Honda City
{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition"],
  "OwnerCount": 1
}

✅ Extracted: Hyundai Creta SX
{
  "Make": "Hyundai",
  "Model": "Creta SX",
  "Year": 2020,
  "AvailabilityType": "Lease",
  "PricePerMonth": 22000,
  "Features": ["Premium SUV", "Sunroof"]
}
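Note the units in the output above: Price is in lakhs (6.5) while PricePerMonth is in rupees (22000). Normalizing them before you store or compare prices avoids mixing the two. One lakh is 100,000 rupees, so the conversion is a single multiplication; this sketch uses decimal to avoid binary floating-point surprises with money, and LakhsToRupees is a hypothetical helper name.

```csharp
using System;

// Hypothetical helper: converts a nullable price in lakhs to rupees.
// null (no price stated, e.g. a lease listing) passes straight through.
static decimal? LakhsToRupees(double? lakhs) =>
    lakhs.HasValue ? (decimal)lakhs.Value * 100_000m : null; // 1 lakh = 100,000 rupees

decimal? hondaRupees = LakhsToRupees(6.5);   // Price from the Honda City output
decimal? cretaRupees = LakhsToRupees(null);  // the Creta lease has no sale Price

Console.WriteLine($"Honda City: ₹{hondaRupees?.ToString("N0")}");
Console.WriteLine($"Creta SX sale price: {cretaRupees?.ToString("N0") ?? "n/a"}");
```

The "N0" format follows the current culture, so the digit grouping printed depends on the machine's locale settings.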

Extending the Solution

  1. Add More Fields – fuel type, transmission, color

    public string? FuelType { get; set; }       // Petrol/Diesel/Electric
    public string? Transmission { get; set; }   // Manual/Automatic
    public string? Color { get; set; }
    // Add matching entries to the prompt's JSON template so the model knows to fill them in.

  2. Process Real‑Time Data – pull listings from an API or RSS feed

    // FetchListingsFromApi is a hypothetical helper, not a library method.
    var listings = await FetchListingsFromApi("https://api.carmarket.com/listings");

  3. Validate Data

    if (carDetails.Year > DateTime.Now.Year)   // a future year can't be right; null (not mentioned) is not flagged
    {
        Console.WriteLine("⚠️ Invalid year detected");
    }

  4. Persist to a Database

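    // Assumes an EF Core DbContext with a DbSet<CarDetails> named CarListings;
    // CarDetails would also need a key property (for example, int Id) for EF Core to track it.
    await dbContext.CarListings.AddAsync(carDetails);
    await dbContext.SaveChangesAsync();

  5. Swap to a More Capable Model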

    .GetChatClient("gpt-4o")   // Higher accuracy, slightly slower
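Item 3 above checks only the year. A slightly fuller, still hypothetical validator could collect every problem it finds instead of stopping at the first. It takes the nullable CarDetails fields as plain parameters so it stays self-contained; the bounds (1900 as the earliest plausible year, negative mileage or non-positive price as invalid, at least one owner) are illustrative assumptions, not rules from the original project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validator: returns human-readable problems for one extracted listing.
// A null field means "not mentioned" and is never reported as an error,
// because lifted comparisons like (null < 0) evaluate to false.
static List<string> Validate(int? year, double? mileage, double? price, int? ownerCount, int currentYear)
{
    var problems = new List<string>();

    if (year is int y && (y < 1900 || y > currentYear)) // 1900 is an illustrative lower bound
        problems.Add($"Implausible year: {y}");
    if (mileage < 0)
        problems.Add($"Negative mileage: {mileage}");
    if (price <= 0)
        problems.Add($"Non-positive price: {price}");
    if (ownerCount < 1)
        problems.Add($"Owner count below 1: {ownerCount}");

    return problems;
}

int thisYear = DateTime.Now.Year;

// Honda City values from the sample output.
List<string> valid = Validate(2018, 30000, 6.5, 1, thisYear);
// A deliberately broken record, e.g. from a garbled model reply.
List<string> broken = Validate(thisYear + 5, -10, 6.5, 0, thisYear);

Console.WriteLine(valid.Count == 0 ? "✅ Honda City passes validation" : string.Join("; ", valid));
foreach (string problem in broken)
    Console.WriteLine($"⚠️ {problem}");
```

Passing currentYear in as a parameter, rather than reading the clock inside, keeps the check easy to test.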

Best Practices

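  • Keep temperature low for consistent extraction; the default usually works, or set it explicitly with new ChatOptions { Temperature = 0 }.
  • Be explicit in prompts – define the exact JSON format you need.
  • Use nullable types – not every listing contains every field.
  • Batch process – run listings concurrently, but cap how many requests are in flight so you stay within GitHub Models rate limits.
  • Monitor token usage – track costs via response.Usage (input, output, and total token counts).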
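One way to follow the batching advice without tripping rate limits is Parallel.ForEachAsync with a MaxDegreeOfParallelism cap. In this sketch, ExtractMakeAsync is a hypothetical stand-in for the real GetResponseAsync<CarDetails> call (it only waits and returns the first word), and the limit of 2 is an arbitrary example value; check the rate limits that apply to your own account.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real model call: waits briefly to simulate
// network latency, then "extracts" the make as the first word of the listing.
static async Task<string> ExtractMakeAsync(string listing, CancellationToken ct)
{
    await Task.Delay(50, ct);
    return listing.Split(' ')[0];
}

string[] listings =
[
    "Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
    "Hyundai Creta SX 2020 - premium SUV with sunroof. Monthly lease at ₹22,000.",
    "Toyota Innova Crysta 2019 - spacious 7-seater, 40,000 km, rent at ₹2,500/day.",
];

var results = new ConcurrentDictionary<int, string>(); // listing index -> extracted make
var gate = new object();
int inFlight = 0, peakInFlight = 0;

await Parallel.ForEachAsync(
    listings.Select((text, index) => (text, index)),
    new ParallelOptions { MaxDegreeOfParallelism = 2 }, // at most 2 requests in flight
    async (item, ct) =>
    {
        // Track peak concurrency for demonstration only.
        lock (gate) { inFlight++; peakInFlight = Math.Max(peakInFlight, inFlight); }
        try
        {
            results[item.index] = await ExtractMakeAsync(item.text, ct);
        }
        finally
        {
            lock (gate) { inFlight--; }
        }
    });

// Completion order isn't guaranteed, so sort by the original index before printing.
foreach (var entry in results.OrderBy(pair => pair.Key))
    Console.WriteLine($"{entry.Key}: {entry.Value}");

Console.WriteLine($"Peak concurrent requests: {peakInFlight}");
```

Keying results by the listing's index, rather than appending to a list, keeps each result tied to its input even though requests finish out of order.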

🎯 Ready to turn chaotic car ads into clean, structured data?

Give it a try, tweak the schema to your needs, and let AI do the heavy lifting! 🚀

Real-World Applications


🚀 Use‑Cases

  • 🏪 Marketplace Aggregation – Consolidate listings from multiple sources
  • 💰 Price Intelligence – Track pricing trends across markets
  • 📊 Analytics Dashboards – Build insights from unstructured data
  • 🤖 Chatbots – Power car‑recommendation bots
  • 📱 Mobile Apps – Parse user‑submitted listings

📂 Get the Complete Working Example

Grab it from GitHub:

genai-dotnet-basic_llm_tasks/TextExtraction

The repo includes:

  • ✅ Full source code with comments
  • ✅ 9 example car listings
  • ✅ Configuration‑setup guide
  • ✅ Detailed README

🛠️ What You’ll Learn

  • Using GitHub Models API in .NET
  • Strongly‑typed AI responses with GetResponseAsync<T>
  • Schema‑based extraction with AI
  • Handling unstructured data gracefully
  • Building production‑ready text extraction

🎯 Try Extracting

  • 📄 Resume data – name, skills, experience
  • 🧾 Invoices – vendor, amounts, dates
  • 📧 Emails – sender, subject, key points
  • 🏠 Real‑estate listings
  • 🍕 Restaurant menus – dishes, prices, ingredients

The same pattern works for any text‑extraction task!
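Since only the field list changes from one task to the next, the schema part of the prompt can be generated from data. A minimal sketch, assuming each field is described as a (name, description) pair; BuildExtractionPrompt is a hypothetical helper that reproduces the shape of the car-listing prompt, shown here with illustrative invoice fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: renders a field list into the same
// "return ONLY a valid JSON object" prompt shape used for car listings.
static string BuildExtractionPrompt(string documentKind, IReadOnlyList<(string Name, string Description)> fields)
{
    IEnumerable<string> lines = fields.Select(f => $"  \"{f.Name}\": {f.Description}");
    return $"Extract the following details from the {documentKind} and return ONLY a valid JSON object:\n"
         + "{\n"
         + string.Join(",\n", lines)
         + "\n}\n"
         + "Return only the JSON object, no additional text.";
}

// Illustrative invoice schema; the field names are examples, not a standard.
var invoiceFields = new List<(string Name, string Description)>
{
    ("Vendor", "string - company that issued the invoice"),
    ("InvoiceDate", "string - date in YYYY-MM-DD format"),
    ("TotalAmount", "number - grand total including tax"),
    ("Currency", "string - ISO 4217 code such as INR (null if not mentioned)"),
};

string prompt = BuildExtractionPrompt("invoice", invoiceFields);
Console.WriteLine(prompt);
```

Keeping the field list as data also makes it easy to keep the prompt in sync with the C# class you deserialize into.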


💡 What Will You Build?

Drop a comment below! 👇


👍 Like This?

Found this helpful? Give it a ❤️ and follow for more .NET + AI content!


Tags: #dotnet #ai #machinelearning #csharp #github #opensource #textextraction #nlp #automation
