```html

How to fix timeout errors when working through a proxy

Requests hang, your script crashes with a TimeoutError, and no data comes back. Sound familiar? Timeout errors through proxies are one of the most common problems in scraping and automation. Let's look at the causes and at concrete fixes.

Why timeout errors happen

A timeout is a symptom, not a single problem. Before treating it, find the cause:

Slow proxy server. An overloaded or geographically distant proxy adds latency to every request. If your timeout is 10 seconds and the proxy answers after 12, you get an error.

Blocking by the target site. Instead of refusing outright, a site may deliberately "hold" suspicious requests. This is an anti-bot tactic: the connection is simply kept open indefinitely.

DNS problems. The proxy has to resolve the domain name. If the proxy's DNS server is slow or unreachable, the request stalls at the connection stage.

Wrong timeout settings. Using one catch-all timeout for everything is a common mistake. The connect timeout and the read timeout measure different things and should be set separately.

Network problems. Packet loss, an unstable connection to the proxy, and routing issues can all lead to timeouts.

Timeout types and settings

Most HTTP libraries support several kinds of timeout. Understanding how they differ is the key to configuring them correctly.

Connect timeout

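The time allowed to establish the TCP connection to the proxy and the target server. It fires when the proxy is unreachable or the server does not respond. Recommended value: 5-10 seconds.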

Read timeout

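How long to wait for data once the connection is established. If the server has connected but stays silent, the read timeout fires. For ordinary pages use 15-30 seconds, and for heavy APIs use 60 seconds or more. In requests (and in aiohttp's sock_read) this is the maximum pause between two chunks of incoming data, not a limit on the whole response.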

Total timeout

The limit for the whole request, from start to finish. It is your insurance against connections that hang. A typical value is connect + read + a margin.
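The requests library accepts only a connect timeout and a read timeout, and has no total timeout of its own. A server that sends a few bytes every now and then can therefore keep a request alive far longer than connect + read.

One workaround is to read the body in streaming mode, collect the chunks, and give up once an overall deadline has passed. Below is a minimal sketch under that assumption. The helper name read_with_deadline is hypothetical. The deadline is checked only between chunks, so each individual wait is still bounded by the read timeout.

```python
import time
from typing import Iterable


def read_with_deadline(chunks: Iterable[bytes], deadline: float) -> bytes:
    """Join byte chunks, raising TimeoutError once time.monotonic() passes `deadline`.

    `deadline` is an absolute time.monotonic() value taken before the request starts,
    so connecting and waiting for the headers count against the same budget.
    """
    parts = []
    for chunk in chunks:
        # Checked after each chunk arrives; a single slow chunk is bounded
        # only by the HTTP library's own read timeout.
        if time.monotonic() > deadline:
            raise TimeoutError("total timeout exceeded")
        parts.append(chunk)
    return b"".join(parts)


# Hypothetical usage with requests (not executed here):
# deadline = time.monotonic() + 60          # total budget for the whole request
# with requests.get(url, proxies=proxies, timeout=(10, 30), stream=True) as r:
#     body = read_with_deadline(r.iter_content(8192), deadline)
```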

Example configuration with Python's requests library:

import requests

proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# Tuple: (connect_timeout, read_timeout) in seconds
timeout = (10, 30)

try:
    response = requests.get(
        "https://target-site.com/api/data",
        proxies=proxies,
        timeout=timeout
    )
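except requests.exceptions.ConnectTimeout:
    print("Could not connect to the proxy or the server")
except requests.exceptions.ReadTimeout:
    print("The server did not send data in time")
except requests.exceptions.ProxyError:
    # Depending on the requests/urllib3 versions, a failure to reach the proxy
    # itself may be reported as ProxyError instead of ConnectTimeout
    print("The proxy could not be reached or refused the connection")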

For aiohttp (asynchronous Python):

import aiohttp
import asyncio

async def fetch_with_timeout():
    timeout = aiohttp.ClientTimeout(
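        total=60,      # limit for the whole request
        connect=10,    # getting a connection (includes waiting for a free pool slot)
        sock_read=30   # maximum pause between reads of incoming data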
    )
    
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(
            "https://target-site.com/api/data",
            proxy="http://user:pass@proxy.example.com:8080"
        ) as response:
            return await response.text()

Retry logic: the right approach

A timeout is not always fatal. Often a repeated request goes through. But retries need to be done sensibly.

Exponential backoff

Don't hammer the server with repeated requests without pausing. Use exponential backoff, where each subsequent attempt waits longer than the one before.

import requests
import time
import random

def fetch_with_retry(url, proxies, max_retries=3):
    """재시도 및 지수 백오프를 사용한 요청"""
    
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                proxies=proxies,
                timeout=(10, 30)
            )
            response.raise_for_status()
            return response
            
        except (requests.exceptions.Timeout, 
                requests.exceptions.ConnectionError) as e:
            
            if attempt == max_retries - 1:
                raise  # last attempt: re-raise the error
            
            # Exponential backoff: 1s, 2s, 4s...
            # + random jitter so retries don't arrive in synchronized waves
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed: {e}")
            print(f"Retrying in {delay:.1f}s...")
            time.sleep(delay)
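In fetch_with_retry the delay grows without limit. With max_retries=10, the pause before the final attempt would already be 2**8 = 256 seconds.

The sketch below pulls the same formula into a small, testable function and adds an upper bound. The name backoff_delay and the cap parameter are assumptions and are not part of the code above.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, jitter: float = 1.0) -> float:
    """Pause before the next try after failed attempt number `attempt` (0-based).

    Exponential part: base * 2**attempt, limited to `cap` seconds.
    Jitter: a random 0..jitter seconds so that many clients do not retry in lockstep.
    """
    return min(cap, base * 2 ** attempt) + random.uniform(0, jitter)


# Without jitter the schedule is easy to read:
# [backoff_delay(a, jitter=0) for a in range(6)] -> [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Inside fetch_with_retry, `delay = backoff_delay(attempt)` would replace the inline formula. The first few delays stay the same as before.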

The tenacity library

For production code, a ready-made solution is more convenient:

from tenacity import retry, stop_after_attempt, wait_exponential
from tenacity import retry_if_exception_type
import requests

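@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type((
        requests.exceptions.Timeout,
        requests.exceptions.ConnectionError
    )),
    # Re-raise the original exception after the last attempt
    # (by default tenacity raises RetryError instead)
    reraise=True
)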
def fetch_data(url, proxies):
    response = requests.get(url, proxies=proxies, timeout=(10, 30))
    response.raise_for_status()
    return response.json()

Proxy rotation on timeouts

If one proxy keeps timing out, the problem is in that proxy. The logical fix is to switch to another one.

import requests
from collections import deque
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class ProxyManager:
    """Proxy manager that tracks failed attempts and rotates proxies round-robin"""
    
    proxies: list
    max_failures: int = 3
    cooldown_seconds: int = 300
    _failures: dict = field(default_factory=dict)
    _cooldown_until: dict = field(default_factory=dict)
    _queue: deque = field(init=False)
    
    def __post_init__(self):
        # Rotation queue: the proxy at the front is handed out next
        self._queue = deque(self.proxies)
    
    def get_proxy(self) -> Optional[str]:
        """Return the next proxy that is not in cooldown (round-robin)"""
        current_time = time.time()
        
        for _ in range(len(self._queue)):
            proxy = self._queue[0]
            # Move it to the back so the next call starts with a different proxy
            self._queue.rotate(-1)
            # Skip proxies that are in cooldown
            if self._cooldown_until.get(proxy, 0) > current_time:
                continue
            return proxy
        
        return None  # all proxies are in cooldown
    
    def report_failure(self, proxy: str):
        """Record a failed request"""
        self._failures[proxy] = self._failures.get(proxy, 0) + 1
        
        if self._failures[proxy] >= self.max_failures:
            # Put the proxy into cooldown
            self._cooldown_until[proxy] = time.time() + self.cooldown_seconds
            self._failures[proxy] = 0
            print(f"Proxy {proxy} has been put into cooldown")
    
    def report_success(self, proxy: str):
        """Reset the failure counter after a success"""
        self._failures[proxy] = 0


def fetch_with_rotation(url, proxy_manager, max_attempts=5):
    """Request that switches proxies automatically on errors"""
    
    for attempt in range(max_attempts):
        proxy = proxy_manager.get_proxy()
        
        if not proxy:
            raise Exception("No proxies available")
        
        proxies = {"http": proxy, "https": proxy}
        
        try:
            response = requests.get(url, proxies=proxies, timeout=(10, 30))
            response.raise_for_status()
            proxy_manager.report_success(proxy)
            return response
            
        except (requests.exceptions.Timeout, 
                requests.exceptions.ConnectionError):
            proxy_manager.report_failure(proxy)
            print(f"Request via {proxy} timed out or failed, trying another proxy...")
            continue
    
    raise Exception(f"Could not fetch data after {max_attempts} attempts")

With residential proxies that rotate automatically, this logic gets simpler. The provider switches the IP on every request or at a set interval.

Asynchronous requests with timeout control

For large-scale scraping, synchronous requests are inefficient. An asynchronous approach lets you process hundreds of URLs in parallel, but timeouts have to be handled carefully.

import aiohttp
import asyncio
from typing import List, Optional, Tuple

async def fetch_one(
    session: aiohttp.ClientSession, 
    url: str,
    proxy: str,
    semaphore: asyncio.Semaphore
) -> Tuple[str, Optional[str], Optional[str]]:
    """Load a single URL through the proxy, handling timeouts"""
    
    async with semaphore:  # limit concurrency
        try:
            async with session.get(url, proxy=proxy) as response:
                content = await response.text()
                return (url, content, None)
                
        except asyncio.TimeoutError:
            return (url, None, "timeout")
        except aiohttp.ClientError as e:
            return (url, None, str(e))


async def fetch_all(
    urls: List[str],
    proxy: str,
    max_concurrent: int = 10
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Bulk loading with timeout and concurrency control"""
    
    timeout = aiohttp.ClientTimeout(total=45, connect=10, sock_read=30)
    semaphore = asyncio.Semaphore(max_concurrent)
    
    connector = aiohttp.TCPConnector(
        limit=max_concurrent,
        limit_per_host=5  # at most 5 connections per host
    )
    
    async with aiohttp.ClientSession(
        timeout=timeout,
        connector=connector
    ) as session:
        # The same proxy is passed to every request
        tasks = [
            fetch_one(session, url, proxy, semaphore) 
            for url in urls
        ]
        results = await asyncio.gather(*tasks)
    
    # Statistics (a page can be an empty string, so compare with None)
    success = sum(1 for _, content, _ in results if content is not None)
    timeouts = sum(1 for _, _, error in results if error == "timeout")
    print(f"Succeeded: {success}, timeouts: {timeouts}")
    
    return results


# Usage example
async def main():
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    results = await fetch_all(
        urls, 
        proxy="http://user:pass@proxy.example.com:8080",
        max_concurrent=10
    )

if __name__ == "__main__":
    asyncio.run(main())

Important: don't set concurrency too high. 50-100 simultaneous requests through a single proxy is already a lot. 10-20 at a time, spread across several proxies, works better.
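Spreading load across several proxies, each with a modest concurrency limit, can be sketched with plain asyncio. URLs are assigned to proxies round-robin, and every proxy gets its own semaphore.

The function name run_across_proxies is hypothetical. `fetch` stands for any coroutine that loads one URL through one proxy, for example a variant of fetch_one that receives the proxy as an argument.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_across_proxies(
    urls: List[str],
    proxies: List[str],
    fetch: Callable[[str, str], Awaitable[T]],
    per_proxy_limit: int = 10,
) -> List[T]:
    """Run fetch(url, proxy) for every URL, at most per_proxy_limit at a time per proxy."""
    if not proxies:
        raise ValueError("at least one proxy is required")

    # One semaphore per proxy caps the number of simultaneous requests through it
    limits = {proxy: asyncio.Semaphore(per_proxy_limit) for proxy in proxies}

    async def run_one(url: str, proxy: str) -> T:
        async with limits[proxy]:
            return await fetch(url, proxy)

    # Round-robin assignment: url 0 -> proxy 0, url 1 -> proxy 1, ...
    pairs = zip(urls, itertools.cycle(proxies))
    # gather returns the results in the same order as `urls`
    return await asyncio.gather(*(run_one(url, proxy) for url, proxy in pairs))
```

Keeping a separate limit per proxy means one slow proxy cannot use up the whole concurrency budget. A single global semaphore would allow that.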

Diagnostics: finding the cause

Before changing any settings, figure out what is causing the problem.

Step 1: Check the proxy directly

# Quick test through curl with timing
curl -x http://user:pass@proxy:8080 \
     -w "Connect: %{time_connect}s\nTotal: %{time_total}s\n" \
     -o /dev/null -s \
     https://httpbin.org/get

If time_connect exceeds 5 seconds, the problem is the proxy itself or the network path to it.
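Where curl is not at hand, you can roughly time the connect phase from Python with the standard library by opening a plain TCP connection to the proxy's host and port. Like time_connect, this includes resolving the proxy's hostname. It says nothing about the proxy's own connection to the target site. The helper name below is hypothetical.

```python
import socket
import time


def proxy_connect_time(host: str, port: int, timeout: float = 10.0) -> float:
    """Seconds needed to open (and immediately close) a TCP connection to host:port.

    Raises OSError (including TimeoutError / socket.timeout) if no connection
    can be made within `timeout` seconds.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


# Hypothetical usage; the 5-second threshold is the one suggested in the text:
# elapsed = proxy_connect_time("proxy.example.com", 8080)
# print(f"{elapsed:.2f}s", "slow proxy or network path" if elapsed > 5 else "ok")
```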

Step 2: Compare with a direct request

import requests
import time

def measure_request(url, proxies=None):
    start = time.time()
    try:
        r = requests.get(url, proxies=proxies, timeout=30)
        elapsed = time.time() - start
        return f"OK: {elapsed:.2f}s, status: {r.status_code}"
    except Exception as e:
        elapsed = time.time() - start
        return f"FAIL: {elapsed:.2f}s, error: {type(e).__name__}"

url = "https://target-site.com"
proxy = {"http": "http://proxy:8080", "https": "http://proxy:8080"}

print("직접:", measure_request(url))
print("프록시를 통해:", measure_request(url, proxy))

Step 3: Check different proxy types

Typical latency, and therefore suitable timeouts, depend on the proxy type:

Proxy type	Typical latency	Recommended timeouts
Datacenter	50-200 ms	Connect: 5s, Read: 15s
Residential	200-800 ms	Connect: 10s, Read: 30s
Mobile	300-1500 ms	Connect: 15s, Read: 45s
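To keep these values in one place, the table can be turned into a lookup. The lookup also derives a total timeout as connect + read + margin, following the rule of thumb for the total timeout. The names timeouts_for and Timeouts and the 5-second default margin are assumptions. The connect and read values come from the table.

```python
from typing import NamedTuple


class Timeouts(NamedTuple):
    connect: float
    read: float
    total: float


# (connect, read) in seconds, taken from the table above
_RECOMMENDED = {
    "datacenter": (5, 15),
    "residential": (10, 30),
    "mobile": (15, 45),
}


def timeouts_for(proxy_type: str, margin: float = 5.0) -> Timeouts:
    """Recommended timeouts for a proxy type; total = connect + read + margin."""
    try:
        connect, read = _RECOMMENDED[proxy_type]
    except KeyError:
        raise ValueError(f"unknown proxy type: {proxy_type!r}") from None
    return Timeouts(connect, read, connect + read + margin)
```

The result works with both libraries used in this article:
- requests: `timeout=(t.connect, t.read)`
- aiohttp: `aiohttp.ClientTimeout(total=t.total, connect=t.connect, sock_read=t.read)`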

Step 4: Log the details

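import http.client
import logging
import requests

# Enable debug logging, including urllib3 (which requests uses internally)
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

# Additionally print the raw request and response headers via the standard library
http.client.HTTPConnection.debuglevel = 1

proxies = {"http": "http://proxy:8080", "https": "http://proxy:8080"}
requests.get("https://target-site.com", proxies=proxies, timeout=(10, 30))

# The output now shows the request step by step:
# - when a new connection is opened
# - the request line that was sent and the response status
# - with debuglevel = 1, the headers sent and received
# DNS resolution is not logged as a separate step (with an HTTP proxy,
# the proxy resolves the target's name anyway).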

Checklist for fixing timeout errors

A short step-by-step procedure for when timeouts occur:

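Determine the timeout type: is it connect or read? These are different problems.
Check the proxy on its own: does it work, and how much latency does it add?
Increase the timeouts: the values may be too aggressive for the proxy type.
Add retries with backoff: an occasional timeout is normal; what matters is overall stability.
Set up rotation: switch to another proxy automatically when problems occur.
Limit concurrency: too many simultaneous requests overload the proxy.
Check the target site: it may be blocking or throttling your requests.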

Conclusion

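Timeout errors through proxies are a solvable problem. In most cases it is enough to set timeouts that fit the proxy type, add retry logic, and rotate proxies on failure. For tasks with high reliability requirements, use residential proxies with automatic rotation. You can learn more at proxycove.com.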

```
