Digging into Twitter's Actual Recommendation Algorithm
April 1, 2026
Disclaimer: I am no Twitter expert, just an engineer who is curious about how the things I use daily work.
In April 2023, Twitter open-sourced the algorithm that powers the "For You" feed. Out of curiosity, I decided to actually read the code, i.e. the real production Scala and Java that determined what X users saw every day.
The last open-source commit to this algorithm was approximately 7 months ago, so I am sure @nikitabier has already made a lot of changes since then.
Every claim below links to actual source code, but in any case, take the advice here with a pinch of salt.
The Pipeline: How a Tweet Reaches Your Feed
The algorithm runs in four stages. ~1500 candidate tweets enter. ~50 make it to your feed.
flowchart LR
A["~1500 Candidates\nfrom multiple sources"] --> B["Light Ranker\n(Earlybird)"]
B --> C["Heavy Ranker\n(Neural Network)"]
C --> D["Filters &\nMixing"]
D --> E["~50 Tweets\nin your feed"]
style A fill:#1d9bf0,color:#fff
style E fill:#1d9bf0,color:#fff
- Candidate Sourcing: Multiple systems generate ~1500 tweet candidates from both accounts you follow (~50%) and accounts you don't (~50%).
- Light Ranker: A fast linear model scores and filters candidates. Tweets below threshold are cut here — they never reach the neural network.
- Heavy Ranker: A neural network predicts 15 types of engagement for each surviving tweet, then combines them into a final score.
- Filters & Mixing: Visibility rules, safety filters, deduplication, and diversity mixing produce the final feed.
I am now going to explain each of these stages in detail:
Stage 1: How Tweets Become Candidates
Your "For You" feed draws from multiple candidate sources running in parallel:
flowchart TD
subgraph In-Network ["In-Network (~50%)"]
EB["Earlybird Search\n(tweets from people you follow)"]
end
subgraph Out-of-Network ["Out-of-Network (~50%)"]
SC["SimClusters\n(community detection)"]
UTEG["UTEG\n(interaction graph)"]
DR["Deep Retrieval\n(neural embeddings)"]
TW["TwHIN\n(knowledge graph)"]
end
EB --> MIX["Candidate Pool\n~1500 tweets"]
SC --> MIX
UTEG --> MIX
DR --> MIX
TW --> MIX
style MIX fill:#1d9bf0,color:#fff
SimClusters: Community-Based Discovery
Twitter's follow graph is decomposed into ~145,000 communities using a custom algorithm. Every active producer is assigned to a primary community based on who follows them.
When you tweet, your tweet embedding starts as an empty vector. Each time someone likes your tweet, their interest vector (derived from which communities they belong to) is added to your tweet's embedding. Over time, your tweet develops a fingerprint that represents which communities find it interesting.
Users see out-of-network tweets when their interest embedding has high cosine similarity with a tweet's embedding. This is why likes from people in your target niche matter more than random likes — they literally shape your tweet's community fingerprint.
This lines up with Nikita Bier's repeated advice to become a subject matter expert in one specific niche on Twitter.
Source:
src/scala/com/twitter/simclusters_v2/README.md: "In production, the Known For dataset covers the top 20M producers and k ~= 145000" and "Tweet embeddings are updated each time the tweet is favorited. Specifically, the InterestedIn vector of each user who Fav-ed the tweet is added to the tweet vector."
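To make the "community fingerprint" idea concrete, here is a minimal Python sketch of the update rule quoted from the README, plus the cosine similarity check. This is my own illustration, not the production code: the cluster ids, the weights, and the function names `add_fav` and `cosine_similarity` are all made up, and embeddings are modeled as plain sparse dicts.

```python
import math
from collections import defaultdict


def add_fav(tweet_embedding, interested_in):
    """Add one liker's InterestedIn vector to the tweet's sparse embedding."""
    for cluster_id, weight in interested_in.items():
        tweet_embedding[cluster_id] += weight


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty embedding matches nobody
    return dot / (norm_a * norm_b)


# Hypothetical cluster ids, for illustration only.
ML, COOKING = 101, 202

tweet = defaultdict(float)               # a new tweet starts with an empty vector
add_fav(tweet, {ML: 0.9, COOKING: 0.1})  # first like, mostly from the ML community
add_fav(tweet, {ML: 0.8})                # second like, purely from the ML community

ml_reader = {ML: 1.0}
cooking_reader = {COOKING: 1.0}
print(cosine_similarity(tweet, ml_reader) > cosine_similarity(tweet, cooking_reader))
```

In this toy setup the tweet ends up far closer to the ML reader than to the cooking reader, which is the mechanism behind "likes from your niche matter more".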
UTEG: The Interaction Graph
The User-Tweet-Entity Graph maintains a real-time bipartite graph of user-tweet interactions from the last 24-48 hours. It tracks these actions (encoded as edge types):
| Edge Type | Value |
|---|---|
| Click | 0 |
| Favorite | 1 |
| Retweet | 2 |
| Reply | 3 |
| Tweet | 4 |
| Mention | 5 |
| Media Tag | 6 |
| Quote | 7 |
When multiple people in someone's follow graph engage with your tweet, UTEG surfaces it as a candidate to that user. Favorites are weighted highest.
Source:
src/scala/com/twitter/recos/user_tweet_entity_graph/UserTweetEdgeTypeMask.scala:44-52
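Here is a rough Python sketch of the "multiple people you follow engaged with it" idea. The enum values mirror the table above; everything else (the function name `social_proof_candidates`, the `min_engagers` cutoff, the tuple layout of an interaction) is my assumption for illustration and not how UTEG is actually implemented. I also left out any per-edge weighting, since the table only gives the encoding.

```python
from enum import IntEnum


class EdgeType(IntEnum):
    """Edge type encoding, copied from the table above."""
    CLICK = 0
    FAVORITE = 1
    RETWEET = 2
    REPLY = 3
    TWEET = 4
    MENTION = 5
    MEDIA_TAG = 6
    QUOTE = 7


def social_proof_candidates(followed, interactions, min_engagers=2):
    """Return tweet ids that at least `min_engagers` followed users interacted with.

    `interactions` is an iterable of (user_id, tweet_id, EdgeType) tuples.
    `min_engagers` is a hypothetical cutoff, not a value from the repo.
    """
    engagers = {}
    for user_id, tweet_id, _edge_type in interactions:
        if user_id in followed:
            engagers.setdefault(tweet_id, set()).add(user_id)
    return sorted(t for t, users in engagers.items() if len(users) >= min_engagers)


recent = [
    ("alice", "t1", EdgeType.FAVORITE),
    ("bob", "t1", EdgeType.RETWEET),
    ("alice", "t2", EdgeType.REPLY),
    ("zed", "t2", EdgeType.FAVORITE),   # zed is not followed, so this does not count
]
candidates = social_proof_candidates({"alice", "bob"}, recent)
print(candidates)
```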
Stage 2: The 15 Ranking Signals
The heavy ranker is a neural network that predicts 15 types of engagement for each candidate tweet, then combines them into a single score using a weighted linear formula:
FinalScore = sum(predicted_score[i] * weight[i]) + epsilon
where epsilon = 0.001. If the combined score is negative, it's penalized further.
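As a sanity check on the formula, here is a tiny Python sketch of the weighted sum. The weights below are invented, and the extra penalty for negative totals is deliberately not modeled, so treat it as a sketch of the positive branch only.

```python
EPSILON = 0.001


def final_score(predicted, weights):
    """Weighted sum of predicted engagement probabilities, plus epsilon.

    Signals missing from `predicted` contribute 0. The additional handling
    of negative totals is not reproduced here.
    """
    return sum(predicted.get(name, 0.0) * w for name, w in weights.items()) + EPSILON


# Hypothetical weights, for illustration only.
weights = {"fav": 1.0, "reply": 2.0, "negative_feedback_v2": -10.0}
predicted = {"fav": 0.2, "reply": 0.05, "negative_feedback_v2": 0.01}

score = final_score(predicted, weights)
print(round(score, 3))
```

Even with made-up numbers, this shows why a small predicted probability of negative feedback can cancel out a decent reply probability when its weight is strongly negative.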
Here are all 15 signals, grouped by category:
Core Engagement
| Signal | Internal Name | What It Predicts | Actionable Takeaway |
|---|---|---|---|
| Favorite | fav | Will the user like this tweet? | The fundamental signal. Write content worth liking. |
| Retweet | retweet | Will the user retweet? | Shareable format: hot takes, data, useful threads. |
| Reply | reply | Will the user reply? | Ask questions. Invite debate. Be conversational. |
| Reply Engaged by Author | reply_engaged_by_author | Will the tweet author engage with replies? | Reply to your own replies. This is its own ranking signal. |
Depth of Engagement
| Signal | Internal Name | What It Predicts | Actionable Takeaway |
|---|---|---|---|
| Good Click (Fav/Reply) | click_engaged | Will the user click into the tweet AND then like or reply? | Write hooks that reward the click-through (threads, rich detail). |
| Good Click (Dwell 2+ min) | click_dwell | Will the user click in and spend 2+ minutes reading? | Long-form content: detailed threads, dense images, infographics. |
| Profile Click | good_profile_click | Will the user click your profile? | Consistent niche + compelling bio converts profile clicks. |
| Dwell Time | dwell | How long will the user look at the tweet? | Images, carousels, and longer text increase dwell. |
| Bookmark | bookmark | Will the user bookmark it? | Post reference material: guides, lists, cheatsheets, templates. |
| Share | share | Will the user share via DM? | Content people send to friends: funny, niche-useful, surprising. |
Video Signals
| Signal | Internal Name | What It Predicts | Actionable Takeaway |
|---|---|---|---|
| Video Quality View | vqv | Will the user watch 10+ seconds? | Front-load the hook. Only eligible for videos >= 10 seconds. |
| Video Quality View (Immersive) | vqv_immersive | Quality view in immersive mode? | Optimize for full-screen vertical video. |
| Video Watch Time | video_watch_time_ms | Total milliseconds of watch time. | Longer watch time = higher score. Retention > virality. |
| Video Quality Watch | video_quality_watched | 10+ second quality watch. | Same 10-second threshold as VQV. |
Negative Signal
| Signal | Internal Name | What It Predicts | Actionable Takeaway |
|---|---|---|---|
| Negative Feedback | negative_feedback_v2 | Will the user hit "Not interested", mute, block, or unfollow? | This carries negative weight. Avoid rage-bait to wrong audiences. |
The video quality view threshold is explicitly 10 seconds in the code:
val isVideoDurationGte10Seconds =
(features.getOrElse(VideoDurationMsFeature, None).getOrElse(0) / 1000.0) >= 10
Source:
home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala: all 15 features defined at lines 62-282, master list at lines 294-311
Scoring formula: home-mixer/server/src/main/scala/com/twitter/home_mixer/util/RerankerUtil.scala:104-136
Stage 3: Your Reputation Score (TweepCred)
This section might be the most actionable one in the post, so read along.
Before your tweet even reaches the heavy ranker, a system called TweepCred computes a reputation score (0-100) for every user using PageRank on the follow graph.
This score feeds into the light ranker, search results, spam detection, and top tweets eligibility. It's one of the most consequential numbers attached to your account.
How TweepCred Is Calculated
flowchart TD
A["Compute Initial Mass\n(UserMass.scala)"] --> B{"Suspended?"}
B -->|Yes| C["Mass = 0\n(Dead account)"]
B -->|No| D{"Verified?"}
D -->|Yes| E["Mass = 100\n(Maximum)"]
D -->|No| F["Compute base mass from:\n- Account age\n- Device validation\n- Restrictions"]
F --> G{"Following > 500\nAND ratio > 0.6?"}
G -->|Yes| H["Penalty #1:\nmass / e^(5.0 x (ratio - 0.6))"]
G -->|No| I["No penalty"]
H --> J["Run PageRank\n(20 iterations, jump probability 0.1)"]
I --> J
E --> J
J --> K{"Following > 2500\nAND ratio > 0.6?"}
K -->|Yes| L["Penalty #2:\npagerank / min(e^(3.0 x (ratio - 0.6) x ln(ln(followings))), 50)"]
K -->|No| M["No penalty"]
L --> N["Convert to 0-100:\nscore = 130 + 5.21 x ln(pagerank)"]
M --> N
style C fill:#e74c3c,color:#fff
style E fill:#2ecc71,color:#fff
style H fill:#e67e22,color:#fff
style L fill:#e67e22,color:#fff
The Two Ratio Penalties: Real Numbers
Penalty 1 — Before PageRank (threshold: 500+ followings, ratio > 0.6):
| Following | Followers | Ratio | Divisor | Effect |
|---|---|---|---|---|
| 400 | 300 | 1.33 | None | Below 500 threshold |
| 600 | 1000 | 0.60 | 1.0x | Exactly at safe boundary |
| 600 | 800 | 0.75 | 2.1x | Half your starting mass |
| 1000 | 500 | 2.00 | 1096x | Mass effectively zero |
Penalty 2 — After PageRank (threshold: 2500+ followings, ratio > 0.6):
| Following | Followers | Ratio | Divisor | Effect |
|---|---|---|---|---|
| 3000 | 5000 | 0.60 | 1.0x | At safe boundary |
| 3000 | 3000 | 1.00 | 12x | Serious reduction |
| 5000 | 2000 | 2.50 | 50x | Maximum penalty (capped) |
Source:
src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:15-17,54-64 and Reputation.scala:28-48
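If you want to plug in your own numbers, here is a small Python sketch of the two divisors and the final score conversion, written from the formulas in the flowchart. A few things are my assumptions: the ratio is plain following / followers as in the tables (so followers must be greater than zero), the function names are mine, and clamping the final score to 0-100 is how I read the "0-100" range rather than something I am quoting.

```python
import math

RATIO_THRESHOLD = 0.6


def penalty1_divisor(following, followers):
    """Divisor applied to the starting mass, before PageRank."""
    ratio = following / followers
    if following > 500 and ratio > RATIO_THRESHOLD:
        return math.exp(5.0 * (ratio - RATIO_THRESHOLD))
    return 1.0


def penalty2_divisor(following, followers):
    """Divisor applied to the PageRank value, capped at 50."""
    ratio = following / followers
    if following > 2500 and ratio > RATIO_THRESHOLD:
        raw = math.exp(3.0 * (ratio - RATIO_THRESHOLD) * math.log(math.log(following)))
        return min(raw, 50.0)
    return 1.0


def to_score(pagerank):
    """Map a positive PageRank value to the 0-100 scale (clamping is an assumption)."""
    return max(0.0, min(100.0, 130.0 + 5.21 * math.log(pagerank)))


for following, followers in [(400, 300), (600, 1000), (600, 800), (1000, 500)]:
    print(following, followers, round(penalty1_divisor(following, followers), 1))

for following, followers in [(3000, 5000), (3000, 3000), (5000, 2000)]:
    print(following, followers, round(penalty2_divisor(following, followers), 1))
```

The two loops walk through the same rows as the tables above, so you can compare the printed divisors against them.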
Where TweepCred Score Directly Affects You
Your TweepCred score feeds into four critical systems:
1. Search Results — Hard Filter
A BadUserRepFilter removes tweets from search results entirely if the author's TweepCred is below a configurable minTweepCred threshold. This is a binary gate — below threshold, you're invisible in search.
Source:
src/java/com/twitter/search/earlybird/search/queries/BadUserRepFilter.java:111-112
2. Spam Detection — Link Penalty
If your TweepCred is below 25 and your tweet contains a non-whitelisted link, it's scored as spam (-0.5) instead of not-spam (+0.5). That's a full 1.0 point swing. The only escape: if your tweet already has at least 1 engagement (like + retweet + reply >= 1).
Source:
src/java/com/twitter/search/earlybird/search/relevance/scoring/SpamVectorScoringFunction.java:18-22
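The rule as I described it fits in a few lines of Python. This is a paraphrase of my reading of the Java, not a port: the function and parameter names are mine, and "at least 1 engagement" is modeled as the simple sum of likes, retweets, and replies.

```python
def spam_vector_score(tweepcred, has_non_whitelisted_link, likes, retweets, replies):
    """Return -0.5 (spam) or 0.5 (not spam) following the rule described above."""
    engagement = likes + retweets + replies
    if tweepcred < 25 and has_non_whitelisted_link and engagement < 1:
        return -0.5
    return 0.5


print(spam_vector_score(20, True, 0, 0, 0))   # low reputation, link, no engagement
print(spam_vector_score(20, True, 1, 0, 0))   # one like is enough to escape
print(spam_vector_score(30, True, 0, 0, 0))   # reputation above the cutoff
```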
3. Top Tweets — Excluded Below 40
Your tweets cannot appear as Top Tweets if your reputation is below 40. They receive the lowest possible score regardless of engagement.
private static final double MIN_USER_REPUTATION = 40.0;
Source:
src/java/com/twitter/search/earlybird/search/relevance/scoring/RetweetBasedTopTweetsScoringFunction.java:50
4. Light Ranker — Direct Score Component
Your reputation is multiplied by a weight and added directly to the tweet's light ranker score:
data.reputationContrib = params.reputationWeight * data.userRep;
If your reputation falls below a minimum threshold (and you are neither followed by the viewer nor verified), your tweet is skipped entirely with SkipReason.LOW_REPUTATION, so it never reaches the heavy ranker.
Source:
src/java/com/twitter/search/earlybird/search/relevance/scoring/LinearScoringFunction.java:44 and FeatureBasedScoringFunction.java:300-306
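Here is how I picture those two pieces fitting together, as a Python sketch. The weight and threshold values in the example are invented, the function names are mine, and I am reading the exemption as "the viewer follows the author, or the author is verified", which is my interpretation rather than a quote from the code.

```python
def reputation_contribution(user_rep, reputation_weight):
    """The reputation term that gets added to the light ranker score."""
    return reputation_weight * user_rep


def skip_reason(user_rep, min_reputation, viewer_follows_author, author_verified):
    """Return "LOW_REPUTATION" when the tweet would be skipped, else None."""
    exempt = viewer_follows_author or author_verified
    if user_rep < min_reputation and not exempt:
        return "LOW_REPUTATION"
    return None


# Hypothetical parameter values, for illustration only.
print(reputation_contribution(user_rep=60.0, reputation_weight=0.5))
print(skip_reason(10.0, 20.0, viewer_follows_author=False, author_verified=False))
print(skip_reason(10.0, 20.0, viewer_follows_author=True, author_verified=False))
```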
What Gets You Downranked or Filtered
The visibilitylib contains a rule engine that evaluates safety labels against content. Actions range from soft downranking to hard drops.
Downranking Tiers
Content is sorted into quality tiers within conversations:
| Rule | Trigger | Tier |
|---|---|---|
| High toxicity (multi-level) | ML toxicity model | High Quality / Low Quality / Abusive Quality |
| Spam replies | DownrankSpamReply label | Abusive Quality |
| Untrusted URLs | UntrustedUrl label | Abusive Quality |
| High crypto spam | HighCryptospamScore | Abusive Quality |
| Spammy content score | HighSpammyTweetContentScore | Abusive Quality |
| High proactive ToS score | HighProactiveTosScore | Abusive Quality |
| High p(spam) score | HighPSpammyTweetScore | Low Quality |
| Enforcement actioned tweet | RitoActionedTweet | Low Quality |
The Inner Circle Exception
Every downranking rule in the codebase is gated by a "Not Inner Circle of Friends" condition. If the viewer has an inner-circle relationship with the tweet author, all downranking rules are bypassed:
abstract class ConditionWithNotInnerCircleOfFriendsRule(action, condition)
extends RuleWithConstantAction(
action,
And(Not(DoesHaveInnerCircleOfFriendsRelationship), condition))
Your close network always sees your content, regardless of moderation signals.
Source:
visibilitylib/src/main/scala/com/twitter/visibility/rules/DownrankingRules.scala and Rule.scala
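The gating pattern is easier to see in a toy version. Below is a Python sketch of the same And(Not(inner circle), condition) composition. The context dict, the helper names, and the way labels are stored are all my own stand-ins and not the visibilitylib API.

```python
def with_not_inner_circle(condition):
    """Wrap a condition so it only fires when viewer and author are not inner-circle friends."""
    def gated(ctx):
        return (not ctx["inner_circle"]) and condition(ctx)
    return gated


def is_spam_reply(ctx):
    """Hypothetical condition: the tweet carries the DownrankSpamReply label."""
    return "DownrankSpamReply" in ctx["labels"]


downrank_spam_reply = with_not_inner_circle(is_spam_reply)

stranger_view = {"inner_circle": False, "labels": {"DownrankSpamReply"}}
friend_view = {"inner_circle": True, "labels": {"DownrankSpamReply"}}

print(downrank_spam_reply(stranger_view))  # the rule applies
print(downrank_spam_reply(friend_view))    # the rule is bypassed
```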
User-Level Labels That Kill Reach
These labels on your account result in a hard Drop — your content is completely hidden from recommendations:
- Abusive, AbusiveHighRecall
- DoNotAmplify
- Compromised
- EngagementSpammer
- LowQuality
- SpamHighRecall
- RecommendationsBlacklist
- SearchBlacklist
Source:
visibilitylib/src/main/scala/com/twitter/visibility/rules/UserLabelRules.scala:16-84
The Playbook: What To Actually Do
Based on everything in the code, here's what concretely matters:
Do
- Reply to replies on your own tweets. ReplyEngagedByAuthor is a standalone neural network prediction head. The algorithm literally rewards authors who engage with their audience.
- Post content worth bookmarking. Bookmarks are a separate ranking signal from likes. Reference material, guides, templates, and lists trigger this.
- Stay in a consistent niche. SimClusters assigns you to one primary community. Scattered topics = weak community signal = poor out-of-network discovery.
- Get likes from people in your target audience. Each like adds that user's community interest vector to your tweet's embedding. Likes from your niche compound your discoverability within that niche.
- Write hooks that reward click-through. Two separate signals track users who click into your tweet and then engage (click_engaged) or dwell for 2+ minutes (click_dwell).
- Keep your follow ratio below 0.6. If you follow more than 500 accounts, maintain a following/followers ratio under 0.6. At ratio 2.0 with 1000 followings, the penalty divisor is ~1096x on your starting reputation mass.
- Verify your account. Verified users start with mass = 100 (the maximum) and bypass the ratio penalty in the initial mass calculation.
- For video: front-load the hook and make it 10+ seconds. The Video Quality View signal only activates for videos >= 10 seconds in duration.
Don't
- Don't follow-for-follow at scale. The algorithm applies exponential penalties at 500+ followings (before PageRank) and again at 2500+ followings (after PageRank). The penalty compounds.
- Don't post links with low reputation. If your TweepCred is below 25, tweets with non-whitelisted links are flagged as spam. Build reputation before link-heavy content.
- Don't rage-bait the wrong audience. Negative feedback (mutes, blocks, "Not interested") carries negative weight in the ranking formula. If the combined score goes negative, it's penalized down to near-zero.
- Don't use automation or bot-like patterns. The Automation and AutomationHighRecall safety labels exist. Detected bot behavior gets the DoNotAmplify label.
- Don't post duplicate content. DuplicateContent is a user-level label that results in a hard drop from recommendations.
- Don't ignore your account age. Accounts under 30 days old receive a logarithmic penalty on their starting mass: mass *= log(1 + age/15). A 1-day-old account gets ~6% of full mass. A 15-day-old account gets ~69%. After 30 days, full credit.
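The account age numbers are easy to reproduce. Here is a short Python sketch of the multiplier, using the natural log. The function name is mine, and capping the multiplier at 1.0 for accounts just under 30 days is my assumption (the raw formula would otherwise drift slightly above 1 after about 26 days).

```python
import math


def age_multiplier(age_days):
    """Fraction of full starting mass granted to an account of the given age."""
    if age_days >= 30:
        return 1.0
    # Cap at 1.0 is an assumption so young accounts never exceed full credit.
    return min(1.0, math.log(1.0 + age_days / 15.0))


for days in (1, 15, 45):
    print(days, round(age_multiplier(days), 3))
```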
Code References
Every claim in this post maps to a specific file in the twitter-recommendation-algorithm repository:
| Claim | File |
|---|---|
| 15 ranking signals | home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala:62-311 |
| Weighted scoring formula | home-mixer/server/src/main/scala/com/twitter/home_mixer/util/RerankerUtil.scala:91-136 |
| Heavy ranker scorer | home-mixer/server/src/main/scala/com/twitter/home_mixer/functional_component/scorer/WeighedModelRerankingScorer.scala:43-44 |
| 145k communities, tweet embeddings | src/scala/com/twitter/simclusters_v2/README.md:33,64-67 |
| UTEG edge types | src/scala/com/twitter/recos/user_tweet_entity_graph/UserTweetEdgeTypeMask.scala:44-52 |
| TweepCred initial mass + ratio penalty #1 | src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:15-64 |
| TweepCred reputation formula + ratio penalty #2 | src/scala/com/twitter/graph/batch/job/tweepcred/Reputation.scala:12-48 |
| PageRank parameters (20 iterations, 0.1 jump) | src/scala/com/twitter/graph/batch/job/tweepcred/PreparePageRankData.scala:36-38 |
| Account age penalty | src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:45 |
| Search filter by reputation | src/java/com/twitter/search/earlybird/search/queries/BadUserRepFilter.java:111-112 |
| Spam scoring for low-rep + links | src/java/com/twitter/search/earlybird/search/relevance/scoring/SpamVectorScoringFunction.java:18-53 |
| Top Tweets minimum reputation (40) | src/java/com/twitter/search/earlybird/search/relevance/scoring/RetweetBasedTopTweetsScoringFunction.java:50,114-115 |
| Light ranker reputation contribution | src/java/com/twitter/search/earlybird/search/relevance/scoring/LinearScoringFunction.java:44 |
| Low reputation skip | src/java/com/twitter/search/earlybird/search/relevance/scoring/FeatureBasedScoringFunction.java:300-306 |
| Downranking rules + inner circle exception | visibilitylib/src/main/scala/com/twitter/visibility/rules/DownrankingRules.scala |
| Inner circle bypass definition | visibilitylib/src/main/scala/com/twitter/visibility/rules/Rule.scala:160-165 |
| User label drops (Abusive, DoNotAmplify, etc.) | visibilitylib/src/main/scala/com/twitter/visibility/rules/UserLabelRules.scala:16-84 |
| Video quality view 10-second threshold | home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala:153-164 |
| Low TweepCred follow store (threshold 40) | follow-recommendations-service/common/src/main/scala/com/twitter/follow_recommendations/common/stores/LowTweepCredFollowStore.scala:24 |
The actual weights for the 15 ranking signals are dynamically configured via feature switches and are not hardcoded — the architecture and signals are public, but the production tuning is not.
I hope this blog post is helpful in your Twitter journey!
And if not, feedback on my writing is always welcome at shivamsinghal5432 [at] gmail [dot] com.