Chinese documentation (中文文档): README.zh-CN.md
A practical Douyin downloader for both single-item and profile batch downloads, with progress display, retries, SQLite deduplication, and browser fallback support.
This document targets V2.0 (`main` branch).
For the legacy version, switch to V1.0: `git fetch --all && git switch V1.0`
This project has been significantly upgraded to V2.0. Ongoing feature development and fixes happen mainly on the `main` branch.
V1.0 is still available, but it receives only infrequent maintenance.
- Single video download: `/video/{aweme_id}`
- Single image-note download: `/note/{note_id}`
- Automatic short-link parsing: `https://v.douyin.com/...`
- Profile batch download: `/user/{sec_uid}` + `mode: [post]`
- No-watermark preferred, plus cover/music/avatar/JSON metadata downloads
- Optional video transcription (`transcript`, using the OpenAI Transcriptions API)
- Concurrent downloads, retry logic, and rate limiting
- SQLite deduplication and incremental download (`increase.post`)
- Time filters (`start_time` / `end_time`, currently for `post`)
- Browser fallback when pagination is blocked (manual verification supported)
- Progress bar display (supports `progress.quiet_logs` quiet mode)
Not yet supported:
- `mode: like` liked-content download
- `mode: mix` collection download
- `number.like` / `number.mix` / `increase.like` / `increase.mix`
- `collection` / `mix` links currently have no downloader (explicitly reported as unsupported)
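The supported and unsupported link types above can be told apart by URL shape before any download starts. The following is a minimal sketch, not the project's actual parser: `classify_link` and its return labels are hypothetical, and the `/collection/` and `/mix/` path prefixes are assumptions about what those links look like.

```python
import re
from urllib.parse import urlparse

# Hypothetical path patterns, based only on the URL shapes listed above.
# The collection/mix prefixes are an assumption.
_PATTERNS = [
    ("video", re.compile(r"^/video/(?P<id>\d+)")),
    ("note", re.compile(r"^/note/(?P<id>\d+)")),
    ("user", re.compile(r"^/user/(?P<id>[^/?#]+)")),
    ("unsupported", re.compile(r"^/(?:collection|mix)/")),
]


def classify_link(url):
    """Return (kind, identifier) for a Douyin link; identifier may be None."""
    parsed = urlparse(url)
    if parsed.netloc == "v.douyin.com":
        # Short links carry no usable ID; they must be resolved by
        # following the redirect before they can be classified.
        return ("short", None)
    for kind, pattern in _PATTERNS:
        match = pattern.match(parsed.path)
        if match:
            return (kind, match.groupdict().get("id"))
    return ("unknown", None)
```

Returning an explicit `unsupported` kind, rather than folding it into `unknown`, mirrors the behavior described above where collection/mix links are reported as unsupported instead of silently ignored.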
- Python 3.8+
- macOS / Linux / Windows
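Install dependencies:

    pip install -r requirements.txt

Create your config from the example:

    cp config.example.yml config.yml

Cookies can be captured with the bundled cookie fetcher, which needs Playwright and Chromium:

    pip install playwright
    python -m playwright install chromium
    python -m tools.cookie_fetcher --config config.yml

After logging into Douyin, return to the terminal and press Enter. Cookies will be written to your config automatically.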
    link:
      - https://www.douyin.com/user/MS4wLjABAAAAxxxx
    path: ./Downloaded/
    mode:
      - post
    number:
      post: 0
    thread: 5
    retry_times: 3
    database: true
    progress:
      quiet_logs: true
    cookies:
      msToken: ""
      ttwid: YOUR_TTWID
      odin_tt: YOUR_ODIN_TT
      passport_csrf_token: YOUR_CSRF_TOKEN
      sid_guard: ""
    browser_fallback:
      enabled: true
      headless: false
      max_scrolls: 240
      idle_rounds: 8
      wait_timeout_seconds: 600
    transcript:
      enabled: false
      model: gpt-4o-mini-transcribe
      output_dir: ""
      response_formats: ["txt", "json"]
      api_url: https://api.openai.com/v1/audio/transcriptions
      api_key_env: OPENAI_API_KEY
      api_key: ""

Run:

    python run.py -c config.yml

Or pass options on the command line:

    python run.py -c config.yml \
      -u "https://www.douyin.com/video/7604129988555574538" \
      -t 8 \
      -p ./Downloaded

Arguments:
- `-u, --url`: append download link(s); can be repeated
- `-c, --config`: specify config file
- `-p, --path`: specify download directory
- `-t, --thread`: specify concurrency
- `--show-warnings`: show warning/error logs
- `-v, --verbose`: show info/warning/error logs
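For readers who want to see how a command-line interface with these options fits together, here is a minimal `argparse` sketch. It mirrors only the flags listed above and is not taken from the project's `run.py`; `build_parser` is a hypothetical name, and the `config.yml` default for `--config` is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Douyin downloader CLI sketch")
    # action="append" lets -u be given several times; values accumulate in a list.
    parser.add_argument("-u", "--url", action="append", default=[],
                        help="append download link(s), can be repeated")
    # The default value here is an assumption, not documented behavior.
    parser.add_argument("-c", "--config", default="config.yml",
                        help="specify config file")
    parser.add_argument("-p", "--path", help="specify download directory")
    parser.add_argument("-t", "--thread", type=int, help="specify concurrency")
    parser.add_argument("--show-warnings", action="store_true",
                        help="show warning/error logs")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show info/warning/error logs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Leaving `--path` and `--thread` as `None` when they are omitted makes it easy to tell "not given on the command line" apart from a value that should override the config file.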
Single video:

    link:
      - https://www.douyin.com/video/7604129988555574538

Single image-note:

    link:
      - https://www.douyin.com/note/7341234567890123456

Profile batch download, limited to 50 posts:

    link:
      - https://www.douyin.com/user/MS4wLjABAAAAxxxx
    mode:
      - post
    number:
      post: 50

No item limit (`post: 0`):

    number:
      post: 0

Transcription currently applies to video items only (image-note items do not generate transcripts).
    transcript:
      enabled: true
      model: gpt-4o-mini-transcribe
      output_dir: ""        # empty: same folder as video; non-empty: mirrored to target dir
      response_formats:
        - txt
        - json
      api_key_env: OPENAI_API_KEY
      api_key: ""           # can be set directly, or via environment variable

Providing the key through an environment variable is recommended:

    export OPENAI_API_KEY="sk-xxxx"

When enabled, it generates:
- `xxx.transcript.txt`
- `xxx.transcript.json`
If `database: true`, job status is also recorded in the SQLite table `transcript_job` (success/failed/skipped).
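To illustrate what recording a job status in SQLite can look like, here is a small sketch using Python's standard `sqlite3` module. Only the table name `transcript_job` and the three status values come from this document; the column layout and the `record_transcript_job` helper are assumptions, and the project's real schema may differ.

```python
import sqlite3

VALID_STATUSES = {"success", "failed", "skipped"}


def record_transcript_job(conn, aweme_id, status):
    """Insert or overwrite the transcript status for one item.

    The columns below are an assumed layout, not the project's schema.
    """
    if status not in VALID_STATUSES:
        raise ValueError("unknown status: %r" % (status,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcript_job ("
        "aweme_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "updated_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    # INSERT OR REPLACE keeps exactly one row per item, so a retry that
    # succeeds overwrites an earlier 'failed' entry.
    conn.execute(
        "INSERT OR REPLACE INTO transcript_job (aweme_id, status) VALUES (?, ?)",
        (aweme_id, status),
    )
    conn.commit()
```

A quick way to try it is with an in-memory database: `conn = sqlite3.connect(":memory:")`, then call `record_transcript_job(conn, "123", "success")`.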
- `mode`: currently only `post` is effective
- `number`: currently only `number.post` is effective
- `increase`: currently only `increase.post` is effective
- `start_time` / `end_time`: currently used for `post` time filtering
- `folderstyle`: controls whether to create per-item subdirectories
- `browser_fallback.*`: used for `post` when pagination is restricted
- `progress.quiet_logs`: quiet logs during progress stage
- `transcript.*`: optional transcription after video download
- `auto_cookie`: reserved field, not used in main flow currently
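The `start_time` / `end_time` filter can be pictured as a simple window check on each post's creation time. The sketch below is illustrative only: it assumes both bounds are `YYYY-MM-DD` strings, that an empty string means "no bound", that each post carries a Unix timestamp, and that days are compared in UTC. None of these details are confirmed by this document, and `in_time_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def _parse_day(value):
    """Parse an assumed YYYY-MM-DD string as midnight UTC."""
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def in_time_window(create_time, start_time="", end_time=""):
    """Return True if a Unix timestamp falls within the inclusive day range."""
    moment = datetime.fromtimestamp(create_time, tz=timezone.utc)
    if start_time and moment < _parse_day(start_time):
        return False
    # end_time is treated as inclusive: anything before the next midnight passes.
    if end_time and moment >= _parse_day(end_time) + timedelta(days=1):
        return False
    return True
```

Treating the end date as inclusive is a design choice for this sketch; a half-open interval would work equally well as long as it is applied consistently.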
Default layout with `folderstyle: true`:

    Downloaded/
    ├── download_manifest.jsonl
    └── AuthorName/
        └── post/
            └── 2024-02-07_Title_aweme_id/
                ├── ...mp4
                ├── ..._cover.jpg
                ├── ..._music.mp3
                ├── ..._data.json
                ├── ..._avatar.jpg
                ├── ...transcript.txt   # transcript.enabled=true and includes txt
                └── ...transcript.json  # transcript.enabled=true and includes json
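The per-item directory name in the tree above follows a `date_title_id` pattern. As a rough sketch of how such a path could be assembled, the hypothetical `item_dir` helper below builds it from a post's fields. The sanitizing rule, the UTC date, and the behavior when `folderstyle` is false (files placed directly under `post/`) are all assumptions rather than the project's documented logic.

```python
import re
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Characters that are unsafe in file names on common platforms, plus whitespace.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def item_dir(base, author, create_time, title, aweme_id, folderstyle=True):
    """Return the directory where one post's files would be stored."""
    post_dir = PurePosixPath(base) / author / "post"
    if not folderstyle:
        # Assumption: without per-item subdirectories, files share post/.
        return post_dir
    day = datetime.fromtimestamp(create_time, tz=timezone.utc).strftime("%Y-%m-%d")
    safe_title = _UNSAFE.sub("_", title).strip("_") or "untitled"
    return post_dir / "{}_{}_{}".format(day, safe_title, aweme_id)
```

Keeping the item ID at the end of the name means two posts with the same title and date still get distinct directories.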
If a profile download stops early because pagination is blocked, this is common risk-control behavior on the platform side. Make sure that:
- `browser_fallback.enabled: true`
- `browser_fallback.headless: false`
- you complete verification manually in the browser popup, and do not close it too early
By default, `progress.quiet_logs: true` suppresses logs during the progress stage.
Use `--show-warnings` or `-v` temporarily when debugging.
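One way to picture how the quiet setting and the two flags interact is as a single choice of log level. The sketch below is an assumption about precedence, not the project's implementation: `console_log_level` is a hypothetical helper, and using `CRITICAL` for quiet mode and `INFO` when quiet mode is off are both illustrative choices.

```python
import logging


def console_log_level(quiet_logs=True, show_warnings=False, verbose=False):
    """Pick a logging level from the quiet setting and the two CLI flags."""
    if verbose:
        # -v shows info, warning, and error logs.
        return logging.INFO
    if show_warnings:
        # --show-warnings shows warning and error logs.
        return logging.WARNING
    # Assumption: quiet mode hides everything below CRITICAL.
    return logging.CRITICAL if quiet_logs else logging.INFO
```

The result can be passed to `logging.basicConfig(level=...)` in the usual way.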
To refresh cookies, run:

    python -m tools.cookie_fetcher --config config.yml

If no transcript is generated, check in order:
- whether `transcript.enabled` is `true`
- whether downloaded items are videos (image-notes are not transcribed)
- whether `OPENAI_API_KEY` (or `transcript.api_key`) is valid
- whether `response_formats` includes `txt` or `json`
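The checklist above can be run as a small pre-flight function over an already-loaded config dictionary. This is a sketch under assumptions: `transcript_preflight` is a hypothetical helper, the config is assumed to be a plain dict shaped like the YAML in this document, and the key check only confirms that a key is present, since validity can only be confirmed by calling the API.

```python
import os


def transcript_preflight(config, is_video, env=None):
    """Return the first failed check as a message, or None if all pass."""
    env = os.environ if env is None else env
    transcript = config.get("transcript") or {}
    if not transcript.get("enabled"):
        return "transcript.enabled is not true"
    if not is_video:
        return "item is not a video (image-notes are not transcribed)"
    key_env = transcript.get("api_key_env") or "OPENAI_API_KEY"
    # Presence check only; an invalid key still passes this step.
    if not (env.get(key_env) or transcript.get("api_key")):
        return "no API key found in %s or transcript.api_key" % key_env
    formats = set(transcript.get("response_formats") or [])
    if not formats & {"txt", "json"}:
        return "response_formats includes neither txt nor json"
    return None
```

Returning only the first failure keeps the output aligned with the "check in order" advice: fix one item, run again, and move on to the next.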
If you prefer the legacy script style (V1.0):

    git fetch --all
    git switch V1.0

This project is for technical research, learning, and personal data management only. Please use it legally and responsibly:
- Do not use it to infringe others' privacy, copyright, or other legal rights
- Do not use it for any illegal purpose
- Users are solely responsible for all risks and liabilities arising from usage
- If platform policies or interfaces change and features break, this is a normal technical risk
By continuing to use this project, you acknowledge and accept the statements above.
