Generative Simulation Benchmarking for Circular Manufacturing Supply Chains Under Real-Time Policy Constraints

Published: 4 months ago (January 2, 2026, 06:24 AM GMT+9)

13 min read

Source: Dev.to

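Generative Simulation Benchmarking for Circular Manufacturing

A Personal Journey into Complex Systems Simulation

My fascination with this problem began not in a tidy lab but in the messy reality of a mid-sized electronics remanufacturing facility. While consulting on an AI optimization project, I spent weeks watching how policy changes (new environmental regulations, fluctuating material tariffs, sudden supplier disruptions) rippled through the circular supply chain with unpredictable results.

The plant manager showed me spreadsheets containing hundreds of interdependent variables, all adjusted by hand whenever a policy changed.

"We're flying blind," he said. "Every time a regulation changes, we spend weeks recalibrating and thousands of dollars on inefficiencies we didn't see coming."

That experience set off a multi-year research journey into generative simulation. I realized that traditional discrete-event simulation could not capture the compounding complexity of a circular system in which each component passes through multiple lifecycles and policies change in real time. While studying recent papers on multi-agent reinforcement learning and experimenting with quantum-inspired optimization algorithms, I found that what we needed was not just a better simulation but generative benchmarking: a way to automatically generate and evaluate thousands of scenarios under policy constraints.

Technical Foundations: Why Circular Supply Chains Break Traditional Models

Circular manufacturing represents a paradigm shift from the linear "take-make-dispose" model to closed-loop systems in which materials keep circulating at their highest utility. What makes these systems especially hard to simulate is their inherent complexity: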

Multi‑directional material flows (forward, reverse, lateral)
Temporal decoupling (components re‑enter the system after unpredictable delays)
Quality degradation with each lifecycle
Real‑time policy constraints that evolve during simulation

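Traditional supply chain simulations assume essentially linear causality and static constraints. Circular systems exhibit nonlinear emergent behavior, so a small policy change can have a disproportionate impact across several lifecycle stages.

Generative Simulation Architecture

After experimenting with various simulation frameworks, I developed a hybrid architecture that combines several AI techniques: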

from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple

import numpy as np

class MaterialState(Enum):
    VIRGIN = "virgin"
    IN_USE = "in_use"
    RETURNED = "returned"
    REMANUFACTURED = "remanufactured"
    RECYCLED = "recycled"
    DISPOSED = "disposed"

@dataclass
class PolicyConstraint:
    """Real-time policy constraint representation"""
    constraint_type: str
    threshold: float
    activation_time: int
    # Maps the number of steps since activation to an effect size
    decay_function: Callable[[int], float]
    affected_materials: List[str]

    def is_active(self, current_time: int) -> bool:
        """Check if policy is active at simulation time"""
        return current_time >= self.activation_time

class CircularEntity:
    """Base class for circular supply chain entities"""
    def __init__(self, entity_id: str, material_type: str):
        self.id = entity_id
        self.material = material_type
        self.state = MaterialState.VIRGIN
        self.lifecycle_count = 0
        self.quality_score = 1.0
        self.location_history = []
        self.carbon_footprint = 0.0

    def transition_state(self, new_state: MaterialState,
                          quality_degradation: float = 0.05):
        """Handle state transitions with quality degradation"""
        self.state = new_state
        if new_state in [MaterialState.REMANUFACTURED, MaterialState.RECYCLED]:
            self.lifecycle_count += 1
            self.quality_score *= (1 - quality_degradation)

    def apply_policy_effect(self, policy: PolicyConstraint,
                            current_time: int):
        """Apply real‑time policy effects to entity"""
        if policy.is_active(current_time):
            # Policy‑specific effects implementation
            if policy.constraint_type == "carbon_tax":
                tax_rate = policy.decay_function(current_time - policy.activation_time)
                self.carbon_footprint += tax_rate

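An interesting finding from experimenting with this architecture was that representing each entity as an independent agent that remembers its lifecycle history models circular behavior far more accurately than traditional aggregate approaches.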
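The difference between per-entity and aggregate tracking can be illustrated with a small sketch. It assumes the same multiplicative rule as `transition_state` (a 5% quality loss per recovery cycle); the helper `quality_after`, the sample lifecycle counts, and the reuse threshold are hypothetical values chosen for illustration.

```python
from statistics import mean

def quality_after(cycles: int, degradation: float = 0.05) -> float:
    """Quality score after `cycles` recovery loops, starting from 1.0."""
    return (1.0 - degradation) ** cycles

# Hypothetical fleet: lifecycle counts tracked per entity
lifecycle_counts = [0, 1, 1, 6, 12]
per_entity = [quality_after(n) for n in lifecycle_counts]

# An aggregate model that only knows the mean lifecycle count
aggregate = quality_after(round(mean(lifecycle_counts)))

# Entities below a (hypothetical) reuse threshold show up only per entity
threshold = 0.6
below = [n for n, q in zip(lifecycle_counts, per_entity) if q < threshold]
```

In this toy fleet the aggregate estimate stays above the threshold, while the per-entity view flags the component that has already been through twelve cycles.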

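Generative Benchmarking: Creating Realistic Policy Scenarios

The core innovation of my approach is the generative side: automatically producing diverse, realistic benchmarking scenarios that stress-test a circular supply chain as policy conditions change. While exploring generative adversarial networks (GANs) for scenario generation, I found that conventional GANs struggle to maintain the temporal consistency that policy evolution requires.

Policy-Aware Scenario Generation

I developed a transformer-based scenario generator that understands policy semantics: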

import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Config

class PolicyAwareScenarioGenerator(nn.Module):
    """Generates realistic policy evolution scenarios"""

    def __init__(self, vocab_size: int, hidden_dim: int = 768):
        super().__init__()
        config = GPT2Config(
            vocab_size=vocab_size,
            n_embd=hidden_dim,
            n_layer=12,
            n_head=12,
            bos_token_id=0,
            eos_token_id=1,
        )
        self.transformer = GPT2Model(config)
        self.lm_head = nn.Linear(hidden_dim, vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.transformer(input_ids, attention_mask=attention_mask)
        logits = self.lm_head(outputs.last_hidden_state)
        return logits

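The generator can produce internally consistent policy timelines (for example tax rates, subsidy introductions, and regulatory caps) that can be fed directly into the simulation engine for benchmarking.

Additional Model Snippet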

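# Inside PolicyAwareScenarioGenerator.__init__: a smaller configuration
# plus the policy and temporal encoders used by generate_scenario
config = GPT2Config(
    vocab_size=vocab_size,
    n_embd=hidden_dim,
    n_layer=8,
    n_head=8,
)
self.transformer = GPT2Model(config)
self.policy_embedding = nn.Embedding(100, hidden_dim)
self.temporal_encoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)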

def generate_scenario(self,
                      initial_conditions: torch.Tensor,
                      policy_timeline: List[PolicyConstraint],
                      num_steps: int = 100):
    """Generate a complete benchmarking scenario"""

    scenarios = []
    current_state = initial_conditions

    for step in range(num_steps):
        # Encode active policies at this timestep
        active_policies = [p for p in policy_timeline if p.is_active(step)]
        policy_embeddings = self._encode_policies(active_policies)

        # Generate next state with policy constraints
        transformer_input = torch.cat([
            current_state,
            policy_embeddings,
            self._encode_temporal_context(step)
        ], dim=-1)

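        # GPT2Model expects token ids positionally, so float features go in via inputs_embeds;
        # the concatenated width must match the configured n_embd
        next_state = self.transformer(inputs_embeds=transformer_input).last_hidden_state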
        scenarios.append(next_state)
        current_state = next_state

    return torch.stack(scenarios, dim=1)

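def _encode_policies(self, policies: List[PolicyConstraint]) -> torch.Tensor:
    """Encode multiple policies into a single embedding"""
    import zlib  # CRC32 is stable across runs, unlike the built-in str hash

    if not policies:
        # No active policy: fall back to a zero embedding
        return torch.zeros(1, self.policy_embedding.embedding_dim)
    # PolicyConstraint is a non-frozen dataclass and therefore unhashable,
    # so derive the bucket id from its constraint_type instead of hash(p)
    policy_ids = torch.tensor(
        [zlib.crc32(p.constraint_type.encode()) % 100 for p in policies]
    )
    return self.policy_embedding(policy_ids).mean(dim=0, keepdim=True)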

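Causal Attention Mask

While investigating this approach, I learned that introducing a causal attention mask is essential: a policy can influence only future states, never past ones. This temporal causality constraint greatly improved the realism of the scenarios.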
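As a minimal sketch of what such a mask looks like, the snippet below builds a lower-triangular mask in PyTorch and applies it to raw attention scores. The function names are hypothetical and this is not the generator's actual code; GPT-2 style decoders apply a mask of this shape internally.

```python
import torch

def causal_mask(num_steps: int) -> torch.Tensor:
    """Boolean mask where entry [i, j] is True if step i may attend to step j."""
    return torch.tril(torch.ones(num_steps, num_steps, dtype=torch.bool))

def masked_attention_weights(scores: torch.Tensor) -> torch.Tensor:
    """Softmax over raw attention scores with future positions blocked."""
    mask = causal_mask(scores.size(-1))
    blocked = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(blocked, dim=-1)

scores = torch.zeros(4, 4)                  # uniform raw scores for 4 timesteps
weights = masked_attention_weights(scores)  # row i spreads weight over steps 0..i
```

Every entry above the diagonal ends up with zero weight, so the state at step i never sees a policy that activates after step i.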

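Multi-Agent Reinforcement Learning for Adaptive Response

The benchmarking system had to do more than generate scenarios; it also had to evaluate how different control strategies perform. After surveying recent work on multi-agent reinforcement learning, I implemented a decentralized control system in which each supply chain node learns an adaptive response to policy changes.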

from typing import Dict

import gym
from gym import spaces
import numpy as np
import torch
import torch.nn.functional as F
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

class CircularSupplyChainEnv(gym.Env):
    """Custom environment for circular supply chain simulation"""

    def __init__(self, num_nodes: int = 10, max_steps: int = 100):
        super().__init__()
        self.num_nodes = num_nodes
        self.current_policies = []
        # Episode bookkeeping used by step(); the max_steps default is an assumed placeholder
        self.max_steps = max_steps
        self.current_step = 0

        # Define action and observation spaces
        self.action_space = spaces.Box(
            low=0, high=1, shape=(num_nodes * 3,), dtype=np.float32
        )
        self.observation_space = spaces.Dict({
            'inventory_levels': spaces.Box(low=0, high=1000, shape=(num_nodes,)),
            'material_flows': spaces.Box(low=0, high=100, shape=(num_nodes, num_nodes)),
            'policy_embeddings': spaces.Box(low=-1, high=1, shape=(10,)),
            'quality_metrics': spaces.Box(low=0, high=1, shape=(num_nodes,))
        })

    def step(self, actions: np.ndarray):
        """Execute one timestep of the environment"""
        node_actions = self._decode_actions(actions)

        rewards = []
        for node_id, action in enumerate(node_actions):
            reward = self._apply_node_action(node_id, action)
            rewards.append(reward)

        self._update_material_flows()
        self._apply_policy_effects()

        total_reward = self._calculate_system_reward(rewards)
        # Advance the clock so the episode can actually terminate
        self.current_step += 1
        done = self.current_step >= self.max_steps

        return self._get_observation(), total_reward, done, {}

    def _apply_node_action(self, node_id: int, action: Dict) -> float:
        """Apply individual node action with policy constraints"""
        policy_violations = self._check_policy_compliance(node_id, action)

        if policy_violations > 0:
            return -10.0 * policy_violations

        return self._calculate_local_reward(node_id, action)

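One insight from experimenting with this RL approach was that a shared reward structure with individual policy-compliance penalties produced the most robust adaptive behavior. The nodes learned to cooperate while strictly following the evolving constraints.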
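One way to read that reward structure is sketched below. It assumes the shared term is the mean of the nodes' local rewards and reuses the penalty of 10 per violation from `_apply_node_action`; the function name `combined_rewards` and the averaging choice are assumptions, not the article's implementation.

```python
import numpy as np

def combined_rewards(local_rewards, violations, penalty: float = 10.0) -> np.ndarray:
    """Per-node reward: a shared team term minus each node's own penalties."""
    local_rewards = np.asarray(local_rewards, dtype=float)
    violations = np.asarray(violations, dtype=float)
    shared = local_rewards.mean()             # identical for every node
    return shared - penalty * violations      # penalty lands only on the violator

# Three nodes; only the second one breaks a policy constraint
rewards = combined_rewards([4.0, 2.0, 6.0], violations=[0, 1, 0])
```

Because the shared term is identical for all nodes, cooperation raises everyone's reward, while a violation lowers only the offending node's signal.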

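Quantum-Inspired Optimization for Policy Search

The combinatorial explosion of possible policy sequences made traditional optimization methods ineffective, so I implemented a quantum-inspired annealing algorithm.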

import numpy as np
from scipy.optimize import differential_evolution

class QuantumInspiredPolicyOptimizer:
    """Optimizes policy sequences using quantum‑inspired techniques"""

    def __init__(self, num_policies: int, horizon: int):
        self.num_policies = num_policies
        self.horizon = horizon
        self.temperature = 1.0
        self.quantum_tunneling_prob = 0.1

    def optimize_policy_sequence(self,
                                 scenario: np.ndarray,
                                 objective_function: callable) -> np.ndarray:
        """Find optimal policy sequence for given scenario"""

        population_size = 100
        # (Further implementation would follow...)

Quantum-Inspired Annealing for Policy Search

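# Continuation of optimize_policy_sequence (shown unindented)
population = self._initialize_quantum_population(population_size)
best_sequence, best_fitness = None, -np.inf

for iteration in range(1000):
    # Evaluate every candidate sequence (a classical loop; "superposition" is only an analogy)
    fitness = np.array([objective_function(s, scenario)
                        for s in population])

    # Remember the best candidate seen so far (assumes higher fitness is better)
    top = int(np.argmax(fitness))
    if fitness[top] > best_fitness:
        best_sequence, best_fitness = population[top].copy(), fitness[top]

    # Apply quantum-inspired selection pressure
    selected = self._quantum_selection(population, fitness)

    # Crossover and mutation produce the next generation
    offspring = self._quantum_crossover(selected)
    population = self._quantum_mutation(offspring)

    # Occasionally perturb the population to escape local optima
    if np.random.random() < self.quantum_tunneling_prob:
        population = self._quantum_tunneling(population)

return best_sequence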

def _initialize_quantum_population(self, size):
    """Initialize population as measurement probabilities of random qubit states"""
    # np.abs(np.exp(1j * x)) equals 1 for every real x, so a pure phase carries no information.
    # Instead, treat each entry as a qubit with rotation angle theta and apply the Born rule.
    theta = np.random.uniform(0.0, np.pi, size=(size, self.horizon, self.num_policies))
    return np.cos(theta / 2) ** 2                   # Probabilities in [0, 1]

def _quantum_tunneling(self, population: np.ndarray) -> np.ndarray:
    """Quantum tunneling to escape local optima"""
    mask = np.random.random(population.shape) < self.quantum_tunneling_prob
    return np.where(mask, 1 - population, population)
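The tunneling step can be tried in isolation with the standalone sketch below. It mirrors `_quantum_tunneling` but takes an explicit random generator, and it assumes every population entry lies in [0, 1] so that `1 - p` is still a valid probability; the function name `tunnel` and the array shape are illustrative.

```python
import numpy as np

def tunnel(population: np.ndarray, prob: float, rng: np.random.Generator) -> np.ndarray:
    """Flip each entry p to 1 - p independently with probability `prob`."""
    mask = rng.random(population.shape) < prob
    return np.where(mask, 1.0 - population, population)

rng = np.random.default_rng(0)
population = rng.random((4, 3, 2))   # (candidates, horizon, policies), values in [0, 1)

unchanged = tunnel(population, 0.0, rng)   # prob 0: nothing is flipped
flipped = tunnel(population, 1.0, rng)     # prob 1: every entry is flipped
```

With the default `quantum_tunneling_prob` of 0.1, roughly one entry in ten is flipped whenever the step fires, which moves a candidate far from its current position without discarding the rest of the sequence.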

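While experimenting with Neural ODEs, I found them especially effective for modeling smooth policy transitions such as a gradually rising carbon tax or phased material restrictions. The continuous-time formulation captured effects that discrete models missed entirely.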

from typing import List, Tuple

import torch
from torch import nn
from torchdiffeq import odeint

class PolicyODE(nn.Module):
    def __init__(self, state_dim: int, policy_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_dim, 128),
            nn.Tanh(),
            nn.Linear(128, state_dim)
        )
        self.policy_dim = policy_dim

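    def forward(self, t, y):
        """Compute derivatives at time t"""
        state = y[:-1]  # the last entry is the padding slot appended in simulate()
        policy_effect = self._interpolate_policies(t)

        combined = torch.cat([state, policy_effect], dim=-1)
        d_state = self.net(combined)
        # odeint needs a derivative with the same shape as y, so the padding slot gets zero
        return torch.cat([d_state, torch.zeros(1, dtype=y.dtype, device=y.device)], dim=-1)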

    def simulate(self,
                 initial_state: torch.Tensor,
                 policy_schedule: List[Tuple[float, torch.Tensor]],
                 t_span: Tuple[float, float]) -> torch.Tensor:
        """Simulate continuous‑time evolution under a policy schedule"""
        self.policy_schedule = policy_schedule

        solution = odeint(
            self,
            torch.cat([initial_state, torch.zeros(1)]),
            torch.linspace(t_span[0], t_span[1], 100)
        )
        return solution[:, :-1]  # Return only state, not policy dimension

    def _interpolate_policies(self, t: float) -> torch.Tensor:
        """Interpolate policy effects at continuous time t"""
        before = [(time, p) for time, p in self.policy_schedule if time <= t]
        after = [(time, p) for time, p in self.policy_schedule if time > t]
        # (Interpolation logic would follow...)
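The interpolation logic is elided in the original. One plausible way to fill it, sketched here as a standalone function, is piecewise-linear interpolation between the two neighbouring schedule entries, clamped at both ends. The function name and the clamping behaviour are assumptions, not the author's method.

```python
from typing import List, Tuple

import torch

def interpolate_policies(schedule: List[Tuple[float, torch.Tensor]], t) -> torch.Tensor:
    """Piecewise-linear interpolation of policy vectors, clamped outside the schedule."""
    if not schedule:
        raise ValueError("schedule must contain at least one entry")
    schedule = sorted(schedule, key=lambda item: item[0])
    t = float(t)  # odeint passes t as a 0-dim tensor
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, p0), (t1, p1) in zip(schedule, schedule[1:]):
        if t0 <= t < t1:
            w = (t - t0) / (t1 - t0)      # fraction of the way from t0 to t1
            return (1 - w) * p0 + w * p1
    return schedule[-1][1]                # t is at or beyond the last entry

# Hypothetical schedule: a carbon tax ramping from 0 to 1 over ten time units
schedule = [(0.0, torch.tensor([0.0])), (10.0, torch.tensor([1.0]))]
quarter = interpolate_policies(schedule, 2.5)
```

A linear ramp keeps the policy signal continuous in time, which suits the gradual transitions described above; a step function or a smoother spline would be equally valid choices depending on how the regulation is phased in.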
