
twscrape

A Python library for scraping Twitter/X data via the GraphQL API, with account rotation and session management.

When to use this skill

Use this skill when:

  • Working with Twitter/X data extraction and scraping
  • Bypassing official Twitter API limitations via account rotation
  • Building social media monitoring or analytics tools
  • Extracting tweets, user profiles, followers, or trends from Twitter/X
  • Running async/parallel scraping operations for large-scale data collection
  • Looking for alternatives to the official Twitter API

Quick Reference

Installation

pip install twscrape

Basic Setup

import asyncio
from twscrape import API, gather

async def main():
    api = API()  # Uses accounts.db by default

    # Add accounts (with cookies - more stable)
    cookies = "abc=12; ct0=xyz"
    await api.pool.add_account("user1", "pass1", "email@example.com", "mail_pass", cookies=cookies)

    # Or add accounts (with login/password - less stable)
    await api.pool.add_account("user2", "pass2", "email2@example.com", "mail_pass2")
    await api.pool.login_all()

asyncio.run(main())

Common Operations

# Search tweets
await gather(api.search("elon musk", limit=20))

# Get user info
user = await api.user_by_login("xdevelopers")
user = await api.user_by_id(2244994945)
user_id = user.id

# Get user tweets
await gather(api.user_tweets(user_id, limit=20))
await gather(api.user_tweets_and_replies(user_id, limit=20))
await gather(api.user_media(user_id, limit=20))

# Get followers/following
await gather(api.followers(user_id, limit=20))
await gather(api.following(user_id, limit=20))

# Tweet operations
await api.tweet_details(tweet_id)
await gather(api.retweeters(tweet_id, limit=20))
await gather(api.tweet_replies(tweet_id, limit=20))

# Trends
await gather(api.trends("news"))

Key Features

1. Multiple API Support

  • Search API: Standard Twitter search functionality
  • GraphQL API: Advanced queries and data extraction
  • Automatic switching: Based on rate limits and availability

2. Async/Await Architecture

# Async iteration over results as they arrive
async for tweet in api.search("elon musk"):
    print(tweet.id, tweet.user.username, tweet.rawContent)

3. Account Management

  • Add multiple accounts for rotation
  • Automatic rate limit handling
  • Session persistence across runs
  • Email verification support (IMAP or manual)

4. Data Models

  • SNScrape-compatible models
  • Easy conversion to dict/JSON
  • Raw API response access available

Core API Methods

Search Operations

search(query, limit, kv={})

Search tweets by query string.

Parameters:

  • query (str): Search query (supports Twitter search syntax)
  • limit (int): Maximum number of tweets to return
  • kv (dict): Additional parameters (e.g., {"product": "Top"} for Top tweets)

Returns: AsyncIterator of Tweet objects

Example:

# Latest tweets
async for tweet in api.search("elon musk", limit=20):
    print(tweet.rawContent)

# Top tweets
await gather(api.search("python", limit=20, kv={"product": "Top"}))

User Operations

user_by_login(username)

Get user information by username.

Example:

user = await api.user_by_login("xdevelopers")
print(user.id, user.displayname, user.followersCount)

user_by_id(user_id)

Get user information by user ID.

followers(user_id, limit)

Get user's followers.

following(user_id, limit)

Get users that the user follows.

verified_followers(user_id, limit)

Get only verified followers.

subscriptions(user_id, limit)

Get user's Twitter Blue subscriptions.
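
Follower and following results are plain model objects, so post-processing is ordinary Python. A minimal offline sketch (`mutual_ids` is a hypothetical helper, not part of twscrape) that finds mutual follows once both lists have been gathered:

```python
# Hypothetical post-processing helper (not part of twscrape): given the lists
# returned by gather(api.followers(...)) and gather(api.following(...)),
# mutuals are the user ids that appear in both.
def mutual_ids(followers, following):
    follower_ids = {u.id for u in followers}
    return {u.id for u in following if u.id in follower_ids}
```

This works on any objects exposing an `.id` attribute, such as twscrape's user models.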

Tweet Operations

tweet_details(tweet_id)

Get detailed information about a specific tweet.

tweet_replies(tweet_id, limit)

Get replies to a tweet.

retweeters(tweet_id, limit)

Get users who retweeted a specific tweet.

user_tweets(user_id, limit)

Get tweets from a user (excludes replies).

user_tweets_and_replies(user_id, limit)

Get tweets and replies from a user.

user_media(user_id, limit)

Get tweets with media from a user.
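
Reply and retweeter lists can likewise be analyzed offline. A hypothetical helper (not a twscrape function) that tallies the most frequent repliers, relying only on the `tweet.user.username` attribute shown earlier:

```python
# Hypothetical helper: count which users appear most often in a list of
# replies; twscrape Tweet objects expose .user.username.
from collections import Counter

def top_repliers(replies, n=5):
    return Counter(t.user.username for t in replies).most_common(n)
```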

Other Operations

list_timeline(list_id)

Get tweets from a Twitter list.

trends(category)

Get trending topics by category.

Categories: "news", "sport", "entertainment", etc.

Account Management

Adding Accounts

With cookies (recommended):

cookies = "abc=12; ct0=xyz"  # String or JSON format
await api.pool.add_account("user", "pass", "email@example.com", "mail_pass", cookies=cookies)

With credentials:

await api.pool.add_account("user", "pass", "email@example.com", "mail_pass")
await api.pool.login_all()

CLI Account Management

# Add accounts from file
twscrape add_accounts accounts.txt username:password:email:email_password

# Login all accounts
twscrape login_accounts

# Manual email verification
twscrape login_accounts --manual

# List accounts and status
twscrape accounts

# Re-login specific accounts
twscrape relogin user1 user2

# Retry failed logins
twscrape relogin_failed
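
The file passed to add_accounts is plain text, one account per line, with fields in the order given by the format string on the command line. A sketch with placeholder credentials:

```shell
# Create a sample accounts file (placeholder values) whose fields match the
# username:password:email:email_password format string used above.
cat > accounts.txt <<'EOF'
user1:pass1:user1@example.com:mailpass1
user2:pass2:user2@example.com:mailpass2
EOF
```

Then import it with: twscrape add_accounts accounts.txt username:password:email:email_password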

Proxy Configuration

Per-Account Proxy

proxy = "http://login:pass@example.com:8080"
await api.pool.add_account("user", "pass", "email@example.com", "mail_pass", proxy=proxy)

Global Proxy

api = API(proxy="http://login:pass@example.com:8080")

Environment Variable

export TWS_PROXY=socks5://user:pass@127.0.0.1:1080
twscrape search "elon musk"

Dynamic Proxy Changes

api.proxy = "socks5://user:pass@127.0.0.1:1080"
doc = await api.user_by_login("elonmusk")
api.proxy = None  # Disable proxy

Priority: api.proxy > TWS_PROXY env var > account-specific proxy
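
That priority order can be expressed as a small pure function (illustrative only; twscrape resolves this internally):

```python
# Illustrative only: mirror the documented proxy priority
# (api.proxy > TWS_PROXY env var > account-specific proxy).
import os

def resolve_proxy(api_proxy, account_proxy):
    if api_proxy:
        return api_proxy                      # explicit api.proxy wins
    env_proxy = os.environ.get("TWS_PROXY")
    if env_proxy:
        return env_proxy                      # then the TWS_PROXY env var
    return account_proxy                      # then the per-account proxy
```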

CLI Usage

Search Operations

twscrape search "QUERY" --limit=20
twscrape search "elon musk lang:es" --limit=20 > data.txt
twscrape search "python" --limit=20 --raw  # Raw API responses

User Operations

twscrape user_by_login USERNAME
twscrape user_by_id USER_ID
twscrape followers USER_ID --limit=20
twscrape following USER_ID --limit=20
twscrape verified_followers USER_ID --limit=20
twscrape user_tweets USER_ID --limit=20

Tweet Operations

twscrape tweet_details TWEET_ID
twscrape tweet_replies TWEET_ID --limit=20
twscrape retweeters TWEET_ID --limit=20

Trends

twscrape trends sport
twscrape trends news

Custom Database

twscrape --db custom-accounts.db <command>

Advanced Usage

Raw API Responses

async for response in api.search_raw("elon musk"):
    print(response.status_code, response.json())

Stopping Iteration

from contextlib import aclosing

async with aclosing(api.search("elon musk")) as gen:
    async for tweet in gen:
        if tweet.id < 200:
            break

Convert Models to Dict/JSON

user = await api.user_by_id(user_id)
user_dict = user.dict()
user_json = user.json()

Enable Debug Logging

from twscrape.logger import set_log_level
set_log_level("DEBUG")

Environment Variables

  • TWS_PROXY: global proxy for all accounts (e.g. socks5://user:pass@127.0.0.1:1080)

  • TWS_WAIT_EMAIL_CODE: timeout for email verification, in seconds (default: 30)

  • TWS_RAISE_WHEN_NO_ACCOUNT: raise an exception when no accounts are available instead of waiting; values: false, 0, true, 1 (default: false)

Rate Limits & Limitations

Rate Limits

  • Rate limits reset every 15 minutes per endpoint
  • Each account has separate limits for different operations
  • Accounts automatically rotate when limits are reached

Tweet Limits

  • user_tweets and user_tweets_and_replies return approximately 3,200 tweets maximum per user
  • This is a Twitter/X platform limitation

Account Status

  • Rate limits vary based on:
    • Account age
    • Account verification status
    • Account activity history

Handling Rate Limits

The library automatically:

  • Switches to next available account
  • Waits for rate limit reset if all accounts exhausted
  • Tracks rate limit status per endpoint
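
If you want extra resilience on top of the built-in rotation, a generic retry wrapper can be layered over any call. This is a hypothetical helper, not part of twscrape:

```python
# Hypothetical retry wrapper (not part of twscrape): re-run an async call with
# exponential backoff, on top of the library's own account rotation.
import asyncio

async def with_retries(make_call, attempts=3, base_delay=1.0):
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise                                   # out of attempts
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Usage: await with_retries(lambda: api.user_by_login("xdevelopers"))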

Common Patterns

Large-Scale Data Collection

async def collect_user_data(username):
    user = await api.user_by_login(username)

    # Collect tweets
    tweets = await gather(api.user_tweets(user.id, limit=100))

    # Collect followers
    followers = await gather(api.followers(user.id, limit=100))

    # Collect following
    following = await gather(api.following(user.id, limit=100))

    return {
        'user': user,
        'tweets': tweets,
        'followers': followers,
        'following': following
    }

Search with Filters

# Language filter
await gather(api.search("python lang:en", limit=20))

# Date filter
await gather(api.search("AI since:2024-01-01", limit=20))

# From specific user
await gather(api.search("from:elonmusk", limit=20))

# With media
await gather(api.search("cats filter:media", limit=20))
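
These operators compose by simple concatenation, so queries can be built programmatically. A hypothetical helper (`build_query` is not a twscrape function):

```python
# Hypothetical helper: compose Twitter search operators into a query string.
def build_query(text, lang=None, since=None, from_user=None, media_only=False):
    parts = [text]
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since}")
    if from_user:
        parts.append(f"from:{from_user}")
    if media_only:
        parts.append("filter:media")
    return " ".join(parts)
```

Usage: await gather(api.search(build_query("AI", lang="en", since="2024-01-01"), limit=20))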

Batch Processing

async def process_users(usernames):
    tasks = []
    for username in usernames:
        task = api.user_by_login(username)
        tasks.append(task)

    users = await asyncio.gather(*tasks)
    return users

Troubleshooting

Login Issues

  • Use cookies instead of credentials for more stable authentication
  • Enable manual email verification with --manual flag
  • Check email password is correct for IMAP access

Rate Limit Problems

  • Add more accounts for better rotation
  • Increase wait time between requests
  • Monitor account status with twscrape accounts

No Data Returned

  • Check account status - they may be suspended or rate limited
  • Verify query syntax - use Twitter search syntax
  • Try different accounts - some may have better access

Connection Issues

  • Configure proxy if behind firewall
  • Check network connectivity
  • Verify Twitter/X is accessible from your location

Resources

References

For detailed API documentation and examples, see the reference files in the references/ directory:

  • references/installation.md - Installation and setup
  • references/api_methods.md - Complete API method reference
  • references/account_management.md - Account configuration and management
  • references/cli_usage.md - Command-line interface guide
  • references/proxy_config.md - Proxy configuration options
  • references/examples.md - Code examples and patterns

Repository: https://github.com/vladkens/twscrape
Stars: 1998+
Language: Python
License: MIT