A tool for searching and filtering Telegram groups/channels by keywords and participant count.
- Search Telegram groups/channels by keywords
- Generate city+word combinations for comprehensive regional search
- Automatic deduplication of all text files (queries, cities, words)
- Filter results by minimum participant count
- Resume interrupted operations (progress tracking)
- Rate limiting and flood protection
- Separate search and filtering processes for better performance
- Install dependencies:

  ```bash
  npm install telegram
  ```
- Create Telegram API credentials:
  - Go to https://my.telegram.org/apps
  - Create a new application
  - Get your `API_ID` and `API_HASH`
- Configure environment: create a `.env` file:

  ```
  API_ID=your_api_id
  API_HASH=your_api_hash
  PHONE_NUMBER=+1234567890
  TG_2FA=your_2fa_password  # Optional, only if you have 2FA enabled
  ```
- Prepare search queries: create a `queries.txt` file with one search term per line:

  ```
  crypto
  trading
  bitcoin
  ethereum
  programming
  ```
Step 1: Search for groups
Option A: Regular search

```bash
node parse.js
```

Option B: Cities combinations search

```bash
node parse.js --cities
```

Other options:

```bash
node parse.js --reset-progress  # Reset search progress
node parse.js --help            # Show help
```

This will:
- Automatically remove duplicates from `queries.txt`, `cities.txt`, and `words.txt`
- Search for groups/channels using keywords from `queries.txt` (regular mode) or generated combinations (cities mode)
- Save results with participant counts to `groups.json`
- Track progress in `processed_queries.json`
- Resume from where it left off if interrupted
Cities mode (`--cities`):
- Generates all combinations from `cities.txt` and `words.txt`
- Creates `queries_cities.txt` with combinations like "Moscow work", "Moscow freelance", etc.
- Uses these combinations for search instead of `queries.txt`
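The combination step is a plain cross product of the two lists. A minimal sketch (the function name is illustrative, not taken from the source):

```javascript
// Pair every city with every word, producing queries like "Moscow work".
function cityCombinations(cities, words) {
  const queries = [];
  for (const city of cities) {
    for (const word of words) {
      queries.push(`${city} ${word}`);
    }
  }
  return queries;
}

// Example: cityCombinations(['Moscow'], ['work', 'freelance'])
// → ['Moscow work', 'Moscow freelance']
```

The output size is `cities.length × words.length`, which is why deduplicating `cities.txt` and `words.txt` first matters: duplicate inputs multiply into many redundant queries.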
Step 2: Extract group IDs (optional)
```bash
node extract_ids.js                   # Extract IDs with participant filtering
node extract_ids.js --with-usernames  # Extract IDs with usernames
node extract_ids.js --no-filter      # Extract all IDs without filtering
node extract_ids.js --help           # Show help
```

This will:
- Read groups from `groups.json`
- Filter by minimum participant count (from config)
- Extract only group IDs (one per line)
- Save to `group_ids.txt`
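The filtering and extraction logic can be sketched as below. This is an assumption-level illustration: the function name and option shape are invented here, but the record fields (`id`, `username`, `participants_count`) match the `groups.json` format shown later in this document.

```javascript
// Keep groups at or above the configured minimum participant count and
// emit one line per group: the ID, optionally followed by @username
// (mirroring the --with-usernames flag).
function extractIds(groups, { minParticipants = 1000, includeUsernames = false } = {}) {
  return groups
    .filter((g) => (g.participants_count || 0) >= minParticipants)
    .map((g) =>
      includeUsernames && g.username ? `${g.id} @${g.username}` : `${g.id}`
    );
}
```

Running with `--no-filter` would correspond to skipping the `filter` step entirely, so every record in `groups.json` yields a line.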
Edit `config.json` to customize settings:

```json
{
  "search": {
    "queriesFile": "queries.txt",
    "limitPerQuery": 20,
    "saveFile": "groups.json",
    "processedQueriesFile": "processed_queries.json",
    "twoLevelParsing": {
      "enabled": true,
      "firstLevel": {
        "limitPerQuery": 100,
        "maxWords": 30
      },
      "secondLevel": {
        "limitPerQuery": 20,
        "useAllWords": true
      }
    }
  },
  "extract": {
    "inputFile": "groups.json",
    "outputFile": "group_ids.txt",
    "includeUsernames": false,
    "minParticipants": 1000,
    "filterByParticipants": true
  },
  "throttle": {
    "betweenQueriesMs": 3000,
    "betweenRequestsMs": 1200,
    "maxRetries": 3,
    "retryBackoffMultiplier": 2,
    "floodWaitCapSec": 900
  }
}
```

Search (`search`):
- `queriesFile` - file with keywords for search
- `limitPerQuery` - maximum number of results per query
- `saveFile` - file to save all found groups
- `processedQueriesFile` - file to track processed queries
- `twoLevelParsing` - two-level parsing settings:
  - `enabled` - enable two-level parsing
  - `firstLevel.limitPerQuery` - results limit for the first level (high)
  - `firstLevel.maxWords` - number of first words used for the first level
  - `secondLevel.limitPerQuery` - results limit for the second level (regular)
  - `secondLevel.useAllWords` - use all words for the second level
ID Extraction (`extract`):
- `inputFile` - input file with groups (default `groups.json`)
- `outputFile` - output file with IDs (default `group_ids.txt`)
- `includeUsernames` - whether to add @username after the ID
- `minParticipants` - minimum participant count for filtering
- `filterByParticipants` - enable filtering by participant count
Throttling (`throttle`):
- `betweenQueriesMs` - delay between search queries (ms)
- `betweenRequestsMs` - delay between API requests (ms)
- `maxRetries` - maximum retry attempts on error
- `retryBackoffMultiplier` - delay multiplier for retries
- `floodWaitCapSec` - maximum wait time for FLOOD_WAIT (sec)
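How these throttle settings combine can be sketched as follows. The exact retry logic in `parse.js` may differ; this only illustrates the arithmetic implied by the parameter names:

```javascript
// Exponential backoff: each retry multiplies the base request delay
// by retryBackoffMultiplier. attempt is 0-based, so with the defaults
// (1200 ms, multiplier 2) attempts wait 1200, 2400, 4800 ms, ...
function retryDelayMs(attempt, cfg) {
  return cfg.betweenRequestsMs * Math.pow(cfg.retryBackoffMultiplier, attempt);
}

// FLOOD_WAIT: Telegram tells the client how many seconds to wait;
// floodWaitCapSec caps that wait so one query cannot stall the run
// for hours.
function floodWaitMs(serverSeconds, cfg) {
  return Math.min(serverSeconds, cfg.floodWaitCapSec) * 1000;
}
```

With the default config, a FLOOD_WAIT of 1800 seconds would be capped to 900 seconds (15 minutes) before the script retries or skips the query.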
A modern web interface is available for easier management:
Installation:

```bash
npm install     # Install dependencies
npm run server  # Start API server (port 3001)
npm run dev     # Start React app (port 3000)
```

Access: Open http://localhost:3000 in your browser
Features:
- 🔍 Parsing Control - Start/stop parsing with real-time logs
- 📋 ID Extraction - Extract group IDs with filtering options
- 📁 File Management - Edit queries, cities, words files
- ⚙️ Configuration - Manage all settings including two-level parsing
- 📊 Statistics - Real-time stats with progress tracking
Production:

```bash
npm run build    # Build for production
npm run preview  # Preview production build
```

API Endpoints:
- `POST /api/parse` - Start regular parsing
- `POST /api/parse-cities` - Start cities parsing
- `POST /api/extract` - Extract IDs with options
- `GET /api/files/:filename` - Get file content
- `PUT /api/files/:filename` - Save file content
- `GET /api/config` - Get configuration
- `PUT /api/config` - Save configuration
- `GET /api/stats` - Get statistics
- `ws://localhost:3001/ws` - Real-time logs via WebSocket
New feature for more efficient data collection in cities mode:
How it works:
- First Level: Uses high limit (100) for first N words (30)
- Second Level: Uses regular limit (20) for all words
Benefits:
- More results for popular queries
- Time savings on less popular queries
- Flexible configuration
Example:
- 28 cities × 30 first words = 840 queries with limit 100
- 28 cities × 129 all words = 3612 queries with limit 20
- Total: 4452 queries instead of 3612 in regular mode
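The arithmetic above can be reproduced directly (the function name is illustrative):

```javascript
// First-level queries use only the first maxWords words per city at the
// high limit; second-level queries use all words at the regular limit.
function twoLevelQueryCount(cities, firstWords, allWords) {
  const firstLevel = cities * firstWords;  // 28 * 30  = 840
  const secondLevel = cities * allWords;   // 28 * 129 = 3612
  return { firstLevel, secondLevel, total: firstLevel + secondLevel };
}
```

Note the trade-off: two-level mode issues more queries (4452 vs 3612), but the 840 first-level queries pull up to 100 results each instead of 20, so popular word+city pairs yield far more groups per query.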
Configuration parameters:
- `twoLevelParsing.enabled` - enable two-level parsing
- `firstLevel.limitPerQuery` - results limit for the first level (high)
- `firstLevel.maxWords` - number of first words used for the first level
- `secondLevel.limitPerQuery` - results limit for the second level (regular)
- `secondLevel.useAllWords` - use all words for the second level
Testing:

```bash
node test_two_level.js  # Check two-level parsing settings
```

```bash
npm run dev      # Start React development server
npm run build    # Build for production
npm run preview  # Preview production build
npm run server   # Start API server
npm run parse    # Run parsing via CLI
npm run extract  # Run ID extraction via CLI
node parse.js --reset-progress  # Reset search progress
```

Groups are saved in JSON format:

```json
[
  {
    "id": "123456789",
    "title": "Group Name",
    "username": "group_username",
    "type": "supergroup",
    "access_hash": "hash_value",
    "participants_count": 1500
  }
]
```

Possible `type` values:
- `group` - regular group
- `supergroup` - supergroup
- `channel` - channel
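A quick way to inspect the results by type, for example when deciding whether to exclude channels before extraction. This helper is illustrative only and not part of the tool:

```javascript
// Tally groups.json records by their "type" field
// ("group", "supergroup", or "channel").
function countByType(groups) {
  const counts = {};
  for (const g of groups) {
    counts[g.type] = (counts[g.type] || 0) + 1;
  }
  return counts;
}
```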
Core Scripts:
- `parse.js` - Main search script
- `extract_ids.js` - ID extraction script
- `test_two_level.js` - Two-level parsing test utility

Configuration:
- `config.json` - Main configuration
- `.env` - Environment variables (API keys)

Data Files:
- `queries.txt` - Search keywords (regular mode)
- `cities.txt` - List of cities (for `--cities` mode)
- `words.txt` - List of words (for `--cities` mode)
- `queries_cities.txt` - Generated city+word combinations

Output Files:
- `groups.json` - All found groups with participant counts
- `group_ids.txt` - Extracted group IDs
- `processed_queries.json` - Search progress
- `session.json` - Telegram session

Web Interface:
- `server.js` - Express API server
- `src/` - React application source
- `index.html` - HTML template
- `vite.config.js` - Vite configuration
- `package.json` - Dependencies and scripts
- "Password is empty" error
  - Add your 2FA password to the `.env` file
  - Or enter it when prompted
- FLOOD_WAIT errors
  - The script handles these automatically
  - Increase the delays in the config if needed
- Session expired
  - Delete `session.json` and re-authenticate
- No groups found
  - Check your search terms in `queries.txt`
  - Try more general keywords
- Two-level parsing not working
  - Check the configuration in `config.json`
  - Run `node test_two_level.js` to verify settings
  - Ensure `cities.txt` and `words.txt` exist
- Web interface not loading
  - Ensure the API server is running on port 3001
  - Check that the React dev server is running on port 3000
  - Verify the WebSocket connection in the browser console
- Real-time logs not updating
  - Check the WebSocket connection status
  - Restart both the API server and the React app
  - Clear the browser cache and reload
- For better search results:
  - Use diverse keywords
  - Include synonyms and variations
  - Add both English and Russian terms
  - Don't worry about duplicates in `queries.txt` - they are removed automatically
- For stable operation:
  - Don't run multiple instances simultaneously
  - Regularly check the logs for errors
  - Make backups of results
- For optimization:
  - First run the search with a small number of queries
  - Tune the filtering parameters for your needs
  - Use different minimum participant values for different purposes
```
# queries.txt
cryptocurrency
bitcoin
ethereum
trading
blockchain
DeFi
NFT
```

```
# config.json - increase minimum participants
"minParticipants": 5000
```

Regular mode:
```
# queries.txt
jobs moscow
vacancies
freelance
remote work
programmer
designer
```

```
# config.json - decrease minimum participants
"minParticipants": 500
```

Cities mode (recommended):
```
# cities.txt
Moscow
Saint Petersburg
Kazan
Novosibirsk
```

```
# words.txt
jobs
vacancies
freelance
programmer
designer
```

```bash
# Run
node parse.js --cities
```

```
# queries.txt
programming
python
javascript
courses
learning
IT
```

```
# config.json
"minParticipants": 1000
```