Extract Structured Data from Car Listings Using AI in .NET 10
Source: Dev.to
🚗 From Chaos to Structure: Extracting Car Listings with AI
Ever struggled with parsing messy car listings from different sources? Imagine turning this:
“Check out this stylish Honda City 2018 model for sale, clocked only 30,000 km! Single owner, showroom condition, insurance valid. Yours for just ₹6.5 lakh.”
into this:
{
"Make": "Honda",
"Model": "City",
"Year": 2018,
"Mileage": 30000,
"Price": 6.5,
"AvailabilityType": "Sale",
"Features": ["Single owner", "Showroom condition", "Insurance valid"],
"OwnerCount": 1
}
Let me show you how to build this in under 100 lines of C# using the GitHub Models API! 🚀
Why This Matters
Car listings come in all shapes and sizes. Whether you’re building a price‑comparison site, marketplace aggregator, or inventory‑management system, you need to:
- Extract key details (make, model, year, mileage, price)
- Handle different formats (sale, lease, rent)
- Deal with missing information gracefully
- Process data at scale
Manually parsing this is tedious. Let AI do the heavy lifting! 💪
Tools We’ll Use
- GitHub Models – free access to powerful AI models (no credit card needed)
- Microsoft.Extensions.AI – unified AI abstraction for .NET
- .NET 10 – a long‑term‑support (LTS) release of .NET
Setup
# 1️⃣ Create a new console app
dotnet new console -n TextExtraction
cd TextExtraction
# 2️⃣ Add required packages
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
Store Your GitHub Token Securely
dotnet user-secrets init
dotnet user-secrets set "GitHubModels:Token" "your-github-token"
Define the Extraction Model
Create CarDetails.cs:
using System.Text.Json.Serialization;
[JsonConverter(typeof(JsonStringEnumConverter))]
public enum AvailabilityType
{
Sale,
Lease,
Rent
}
public class CarDetails
{
public string Make { get; set; } = string.Empty;
public string Model { get; set; } = string.Empty;
public int? Year { get; set; }
public double? Mileage { get; set; }
public double? Price { get; set; }
public AvailabilityType? AvailabilityType { get; set; }
public double? PricePerMonth { get; set; }
public double? PricePerDay { get; set; }
public string[]? Features { get; set; }
public string? Location { get; set; }
public string ShortSummary { get; set; } = string.Empty;
public int? OwnerCount { get; set; }
}
Notice the nullable types? They let us handle missing data elegantly! ✨
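To make the point about nullable types concrete, here is a minimal sketch using only System.Text.Json. The `ListingSketch` class, the `Availability` enum, and the JSON payload are illustrative stand-ins (not the article's `CarDetails`), trimmed to a few fields. Fields absent from the JSON simply stay null instead of failing or defaulting to zero.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model reply for a lease listing: no mileage and no sale price.
const string json = """
{ "Make": "Hyundai", "Model": "Creta SX", "Year": 2020,
  "AvailabilityType": "Lease", "PricePerMonth": 22000 }
""";

var car = JsonSerializer.Deserialize<ListingSketch>(json)!;

// Missing properties are left at null, so "unknown" and "zero" stay distinct.
Console.WriteLine($"Mileage: {car.Mileage?.ToString() ?? "not stated"}"); // Mileage: not stated
Console.WriteLine($"Monthly price: {car.PricePerMonth}");                 // Monthly price: 22000
Console.WriteLine($"Availability: {car.AvailabilityType}");               // Availability: Lease

// Illustrative enum, serialized as text ("Lease") rather than as a number.
[JsonConverter(typeof(JsonStringEnumConverter))]
public enum Availability { Sale, Lease, Rent }

// Illustrative, trimmed-down model; names are assumptions for this sketch.
public class ListingSketch
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? PricePerMonth { get; set; }
    public Availability? AvailabilityType { get; set; }
}
```

With a non-nullable `double Mileage`, the same payload would report 0 km, which is indistinguishable from a genuinely unused car.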
Core Extraction Logic (Program.cs)
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using OpenAI;
using System.ClientModel;
using System.Text.Json;
// -------------------------------------------------
// 1️⃣ Configure the client
// -------------------------------------------------
var configuration = new ConfigurationBuilder()
.AddUserSecrets<Program>()
.Build();
var credential = new ApiKeyCredential(
configuration["GitHubModels:Token"] ??
throw new InvalidOperationException("Token not found")
);
IChatClient chatClient = new OpenAIClient(credential, new OpenAIClientOptions
{
Endpoint = new Uri("https://models.inference.ai.azure.com")
})
.GetChatClient("gpt-4o-mini")
.AsIChatClient();
// -------------------------------------------------
// 2️⃣ Prompt (schema)
// -------------------------------------------------
var prompt = @"Extract the following details from the car listing and return ONLY a valid JSON object:
{
""Make"": ""string - car manufacturer/brand"",
""Model"": ""string - car model name"",
""Year"": number - manufacturing year,
""Mileage"": number - kilometers driven,
""Price"": number - price in lakhs,
""AvailabilityType"": ""string - one of: Sale, Lease, Rent"",
""Features"": ""array of strings - notable features"",
""ShortSummary"": ""string - brief summary in 10-15 words"",
""OwnerCount"": number - previous owners (null if not mentioned)
}
Return only the JSON object, no additional text.";
// -------------------------------------------------
// 3️⃣ Sample listings
// -------------------------------------------------
var carListings = new List<string>
{
"Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
"Hyundai Creta SX 2020 — premium SUV with sunroof. Monthly lease at ₹22,000.",
"Toyota Innova Crysta 2019 — spacious 7‑seater, 40,000 km, rent at ₹2,500/day."
};
// -------------------------------------------------
// 4️⃣ Process each listing
// -------------------------------------------------
foreach (var listing in carListings)
{
var response = await chatClient.GetResponseAsync<CarDetails>(
$"{prompt}\n\nCar Listing:\n{listing}"
);
if (response.TryGetResult(out CarDetails? carDetails) && carDetails != null)
{
Console.WriteLine($"✅ Extracted: {carDetails.Make} {carDetails.Model}");
Console.WriteLine(JsonSerializer.Serialize(carDetails,
new JsonSerializerOptions { WriteIndented = true }));
}
}
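The loop above silently skips a listing when the reply cannot be turned into a `CarDetails`. If you ever work with the raw reply text instead, a defensive parse along these lines may help. This is a sketch under assumptions: `JsonReply`, `ReplySketch`, and the sample reply are hypothetical names and data for illustration, and the approach (keep only the outermost `{ ... }`, then deserialize) is one simple option rather than what the library does internally.

```csharp
using System;
using System.Text.Json;

// Hypothetical raw reply where the model added chatter around the JSON.
string raw = "Here is the JSON:\n{ \"Make\": \"Honda\", \"Model\": \"City\", \"Year\": 2018 }\nHope this helps!";

if (JsonReply.TryParse<ReplySketch>(raw, out var car))
    Console.WriteLine($"Parsed: {car!.Make} {car.Model} ({car.Year})"); // Parsed: Honda City (2018)
else
    Console.WriteLine("Could not parse reply; log it and skip this listing.");

public static class JsonReply
{
    // Returns false instead of throwing when the text holds no usable JSON object.
    public static bool TryParse<T>(string raw, out T? value) where T : class
    {
        value = null;

        // Keep only the span from the first '{' to the last '}'.
        int start = raw.IndexOf('{');
        int end = raw.LastIndexOf('}');
        if (start < 0 || end <= start) return false;

        try
        {
            value = JsonSerializer.Deserialize<T>(raw[start..(end + 1)]);
            return value is not null;
        }
        catch (JsonException)
        {
            return false; // Malformed JSON: let the caller decide what to do.
        }
    }
}

// Illustrative, trimmed-down model for this sketch only.
public class ReplySketch
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
}
```

Returning a bool keeps the processing loop simple: one bad reply is logged and skipped instead of aborting the whole run.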
Run the app:
dotnet run
Sample Output
Processing car listings...
✅ Extracted: Honda City
{
"Make": "Honda",
"Model": "City",
"Year": 2018,
"Mileage": 30000,
"Price": 6.5,
"AvailabilityType": "Sale",
"Features": ["Single owner", "Showroom condition"],
"OwnerCount": 1
}
✅ Extracted: Hyundai Creta
{
"Make": "Hyundai",
"Model": "Creta SX",
"Year": 2020,
"AvailabilityType": "Lease",
"PricePerMonth": 22000,
"Features": ["Premium SUV", "Sunroof"]
}
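Note that the sample output mixes units: `Price` is in lakhs (per the prompt), while `PricePerMonth` is in rupees. If downstream code compares or stores amounts, normalizing to one unit is worth considering. A minimal sketch, assuming the standard conversion of 1 lakh = 100,000 rupees; the `Money` helper is a hypothetical name introduced here, not part of the article's code.

```csharp
using System;
using System.Globalization;

// 6.5 lakh, as extracted for the Honda City listing.
decimal rupees = Money.LakhsToRupees(6.5);
Console.WriteLine(rupees.ToString("0", CultureInfo.InvariantCulture)); // 650000

public static class Money
{
    private const decimal RupeesPerLakh = 100_000m;

    // Convert to decimal before multiplying so the stored amount is exact.
    public static decimal LakhsToRupees(double lakhs) => (decimal)lakhs * RupeesPerLakh;
}
```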
Extending the Solution
- Add More Fields – fuel type, transmission, color
public string? FuelType { get; set; } // Petrol/Diesel/Electric
public string? Transmission { get; set; } // Manual/Automatic
public string? Color { get; set; }
- Process Real‑Time Data – pull listings from an API or RSS feed
var listings = await FetchListingsFromApi("https://api.carmarket.com/listings");
- Validate Data
if (carDetails.Year > DateTime.Now.Year) { Console.WriteLine("⚠️ Invalid year detected"); }
- Persist to a Database
await dbContext.CarListings.AddAsync(carDetails);
await dbContext.SaveChangesAsync();
- Swap to a More Capable Model
.GetChatClient("gpt-4o") // Higher accuracy, slightly slower
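The "Validate Data" idea above can be pushed a little further by collecting every problem found rather than printing a single warning. The sketch below is one possible shape, not the article's implementation: `ListingValidator`, the `MinYear` lower bound of 1950, and the "current year + 1" allowance for next-model-year cars are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Deliberately bad values to show both checks firing.
var problems = ListingValidator.Validate(year: 2031, mileageKm: -5, currentYear: 2025);
foreach (var problem in problems)
    Console.WriteLine($"⚠️ {problem}");

public static class ListingValidator
{
    // Assumed lower bound; adjust for the inventory you expect.
    public const int MinYear = 1950;

    // Null values are treated as "not stated" and are not flagged.
    public static List<string> Validate(int? year, double? mileageKm, int currentYear)
    {
        var problems = new List<string>();

        if (year is int y && (y < MinYear || y > currentYear + 1))
            problems.Add($"Year {y} is outside {MinYear}-{currentYear + 1}");

        if (mileageKm is double km && km < 0)
            problems.Add($"Mileage {km} is negative");

        return problems;
    }
}
```

Passing `currentYear` in explicitly (for example `DateTime.Now.Year`) keeps the check deterministic and easy to test.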
Best Practices
- Keep temperature low (default works well) for consistent extraction.
- Be explicit in prompts – define the exact JSON format you need.
- Use nullable types – not every listing contains every field.
- Batch process – handle many listings efficiently.
- Monitor token usage – track costs via response.Usage.
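For the batch-processing point, one common pattern is to run several requests at once while capping concurrency so you stay within rate limits. The sketch below shows that pattern with a `SemaphoreSlim`. The `Batch.RunAsync` helper is a hypothetical name, and the lambda is a stand-in for the real model call (it just waits briefly and returns the first word), so the sketch runs without a token.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var listings = new[] { "Honda City 2018", "Hyundai Creta SX 2020", "Toyota Innova Crysta 2019" };

// Stand-in for the model call: pretend to work, then return the make.
var makes = await Batch.RunAsync(listings, async text =>
{
    await Task.Delay(10);
    return text.Split(' ')[0];
}, maxConcurrency: 2);

Console.WriteLine(string.Join(", ", makes)); // Honda, Hyundai, Toyota

public static class Batch
{
    // Runs `work` over all items with at most `maxConcurrency` in flight.
    // Results come back in the same order as the input items.
    public static async Task<TOut[]> RunAsync<TIn, TOut>(
        IEnumerable<TIn> items, Func<TIn, Task<TOut>> work, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();
            try { return await work(item); }
            finally { gate.Release(); }
        }).ToList();

        return await Task.WhenAll(tasks);
    }
}
```

In the real app the lambda would wrap the `chatClient` call; a small `maxConcurrency` is a sensible starting point for free-tier rate limits.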
🎯 Ready to turn chaotic car ads into clean, structured data?
Give it a try, tweak the schema to your needs, and let AI do the heavy lifting! 🚀
Real‑World Applications
🚀 Use‑Cases
- 🏪 Marketplace Aggregation – Consolidate listings from multiple sources
- 💰 Price Intelligence – Track pricing trends across markets
- 📊 Analytics Dashboards – Build insights from unstructured data
- 🤖 Chatbots – Power car‑recommendation bots
- 📱 Mobile Apps – Parse user‑submitted listings
📂 Get the Complete Working Example
Grab it from GitHub:
genai-dotnet-basic_llm_tasks/TextExtraction
The repo includes:
- ✅ Full source code with comments
- ✅ 9 example car listings
- ✅ Configuration‑setup guide
- ✅ Detailed README
🛠️ What You’ll Learn
- Using GitHub Models API in .NET
- Strongly‑typed AI responses with GetResponseAsync<T>
- Schema‑based extraction with AI
- Handling unstructured data gracefully
- Building production‑ready text extraction
🎯 Try Extracting
- 📄 Resume data – name, skills, experience
- 🧾 Invoices – vendor, amounts, dates
- 📧 Emails – sender, subject, key points
- 🏠 Real‑estate listings
- 🍕 Restaurant menus – dishes, prices, ingredients
The same pattern works for any text‑extraction task!