Digging into Twitter's Actual Recommendation Algorithm

April 1, 2026

Disclaimer: I am not a Twitter expert, just an engineer who is curious about how the things I use daily work.

At the end of March 2023, Twitter open-sourced the algorithm that powers the "For You" feed. Out of curiosity, I decided to actually read the code, i.e. the real production Scala and Java that determined what X users saw every day.

The last open-source commit to this repository was about 7 months ago, so I'm sure @nikitabier has changed a lot since then.

Every claim below points to actual source code, but take the advice here with a pinch of salt anyway.

The Pipeline: How a Tweet Reaches Your Feed

The algorithm runs in three stages: candidate sourcing, ranking (light, then heavy), and filtering. Roughly 1500 candidate tweets go in; about 50 make it to your feed.

flowchart LR
    A["~1500 Candidates\nfrom multiple sources"] --> B["Light Ranker\n(Earlybird)"]
    B --> C["Heavy Ranker\n(Neural Network)"]
    C --> D["Filters &\nMixing"]
    D --> E["~50 Tweets\nin your feed"]

    style A fill:#1d9bf0,color:#fff
    style E fill:#1d9bf0,color:#fff

Candidate Sourcing: Multiple systems generate ~1500 tweet candidates from both accounts you follow (~50%) and accounts you don't (~50%).
Light Ranker: A fast linear model scores and filters candidates. Tweets below threshold are cut here — they never reach the neural network.
Heavy Ranker: A neural network predicts 15 types of engagement for each surviving tweet, then combines them into a final score.
Filters & Mixing: Visibility rules, safety filters, deduplication, and diversity mixing produce the final feed.
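To make the funnel concrete, here is a toy Java model of it. Everything in it is hypothetical: the `Candidate` record, the pre-computed scores, the light-ranker threshold and the feed size stand in for much richer production components. It only shows the shape of the pipeline: a cheap gate, an expensive ranking, then a cut.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Toy model of the For You funnel; every name and number here is illustrative.
class FeedFunnel {
    record Candidate(long tweetId, double lightScore, double heavyScore) {}

    // Light ranker: a cheap gate. Candidates below the threshold are dropped
    // here and never reach the expensive model.
    static List<Candidate> lightRank(List<Candidate> pool, double threshold) {
        return pool.stream()
                .filter(c -> c.lightScore() >= threshold)
                .collect(Collectors.toList());
    }

    // Heavy ranker: modeled as sorting by a model score, highest first.
    static List<Candidate> heavyRank(List<Candidate> survivors,
                                     ToDoubleFunction<Candidate> model) {
        return survivors.stream()
                .sorted(Comparator.comparingDouble(model).reversed())
                .collect(Collectors.toList());
    }

    // Filters & mixing, reduced here to "keep the top feedSize tweets".
    static List<Candidate> buildFeed(List<Candidate> pool, double threshold, int feedSize) {
        List<Candidate> ranked = heavyRank(lightRank(pool, threshold), Candidate::heavyScore);
        return ranked.subList(0, Math.min(feedSize, ranked.size()));
    }

    public static void main(String[] args) {
        List<Candidate> pool = List.of(
                new Candidate(1, 0.9, 0.2),
                new Candidate(2, 0.1, 0.99), // best heavy score, but cut early
                new Candidate(3, 0.7, 0.8));
        for (Candidate c : buildFeed(pool, 0.5, 50)) {
            System.out.println(c.tweetId()); // prints 3, then 1
        }
    }
}
```

Tweet 2 would have won the heavy ranking, but it never gets scored by the heavy model, which is the practical consequence of the light-ranker cut.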

I am now going to explain each of these stages in detail:

Stage 1: How Tweets Become Candidates

Your "For You" feed draws from multiple candidate sources running in parallel:

flowchart TD
    subgraph In-Network ["In-Network (~50%)"]
        EB["Earlybird Search\n(tweets from people you follow)"]
    end

    subgraph Out-of-Network ["Out-of-Network (~50%)"]
        SC["SimClusters\n(community detection)"]
        UTEG["UTEG\n(interaction graph)"]
        DR["Deep Retrieval\n(neural embeddings)"]
        TW["TwHIN\n(knowledge graph)"]
    end

    EB --> MIX["Candidate Pool\n~1500 tweets"]
    SC --> MIX
    UTEG --> MIX
    DR --> MIX
    TW --> MIX

    style MIX fill:#1d9bf0,color:#fff

SimClusters: Community-Based Discovery

Twitter's follow graph is decomposed into ~145,000 communities using a custom community-detection algorithm. The top ~20M producers are each assigned to a primary community based on who follows them.

When you tweet, your tweet's embedding starts as an empty vector. Each time someone likes the tweet, their InterestedIn vector (derived from the communities that the accounts they follow are known for) is added to the tweet's embedding. Over time, the tweet develops a fingerprint of which communities find it interesting.

Users see out-of-network tweets when their interest embedding has high cosine similarity with a tweet's embedding. This is why likes from people in your target niche matter more than random likes — they literally shape your tweet's community fingerprint.

This is why Nikita Bier has said multiple times that you should become a subject-matter expert in one specific area on Twitter.
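The update rule is small enough to sketch. The Java below stores sparse vectors as `Map<Integer, Double>` (community id to weight), adds each faver's InterestedIn vector to the tweet vector, and compares vectors by cosine similarity. The class name, method names and community ids are invented, and the sketch is far simpler than the production SimClusters jobs.

```java
import java.util.HashMap;
import java.util.Map;

// Sparse SimClusters-style vectors: community id -> weight.
// All ids and weights are invented for illustration.
class TweetEmbeddingSketch {
    private final Map<Integer, Double> tweetVector = new HashMap<>(); // starts empty

    // Called once per favorite: add the faver's InterestedIn vector.
    void onFavorite(Map<Integer, Double> faverInterestedIn) {
        faverInterestedIn.forEach((community, w) -> tweetVector.merge(community, w, Double::sum));
    }

    Map<Integer, Double> vector() {
        return tweetVector;
    }

    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = norm(a), nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    private static double norm(Map<Integer, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) sumSq += x * x;
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        TweetEmbeddingSketch tweet = new TweetEmbeddingSketch();
        tweet.onFavorite(Map.of(7, 1.0));          // faver fully in community 7
        tweet.onFavorite(Map.of(7, 0.5, 42, 0.5)); // faver split between 7 and 42

        Map<Integer, Double> nicheViewer = Map.of(7, 1.0);
        Map<Integer, Double> outsideViewer = Map.of(99, 1.0);
        System.out.println(cosine(tweet.vector(), nicheViewer));   // about 0.95
        System.out.println(cosine(tweet.vector(), outsideViewer)); // 0.0
    }
}
```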

Source: src/scala/com/twitter/simclusters_v2/README.md — "In production, the Known For dataset covers the top 20M producers and k ~= 145000"

"Tweet embeddings are updated each time the tweet is favorited. Specifically, the InterestedIn vector of each user who Fav-ed the tweet is added to the tweet vector."

UTEG: The Interaction Graph

The User-Tweet-Entity Graph maintains a real-time bipartite graph of user-tweet interactions from the last 24-48 hours. It tracks these actions (encoded as edge types):

Edge Type	Value
Click	0
Favorite	1
Retweet	2
Reply	3
Tweet	4
Mention	5
Media Tag	6
Quote	7

When multiple people in someone's follow graph engage with your tweet, UTEG surfaces it as a candidate to that user. Favorites are weighted highest.

Source: src/scala/com/twitter/recos/user_tweet_entity_graph/UserTweetEdgeTypeMask.scala:44-52
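A rough sketch of that surfacing rule: keep recent interactions as (user, tweet, edge type) triples using the codes above, then return tweets that at least `minEngagers` of the viewer's followings have engaged with. The `Interaction` record, the threshold, and the decision to count distinct engagers instead of weighting edge types are assumptions for illustration; the real service works on an in-memory graph and its weighting isn't reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy UTEG-style candidate generator; names and the threshold are illustrative.
class UtegSketch {
    // Edge-type codes as listed in the table above.
    static final byte CLICK = 0, FAVORITE = 1, RETWEET = 2, REPLY = 3,
            TWEET = 4, MENTION = 5, MEDIA_TAG = 6, QUOTE = 7;

    record Interaction(long userId, long tweetId, byte edgeType) {}

    // Tweets that at least minEngagers distinct followings of the viewer have
    // favorited, retweeted, replied to or quoted (clicks are ignored here).
    static List<Long> candidatesFor(Set<Long> viewerFollowings,
                                    List<Interaction> recent, int minEngagers) {
        Set<Byte> engagementTypes = Set.of(FAVORITE, RETWEET, REPLY, QUOTE);
        Map<Long, Set<Long>> engagersByTweet = new HashMap<>();
        for (Interaction i : recent) {
            if (viewerFollowings.contains(i.userId()) && engagementTypes.contains(i.edgeType())) {
                engagersByTweet.computeIfAbsent(i.tweetId(), t -> new HashSet<>()).add(i.userId());
            }
        }
        List<Long> result = new ArrayList<>();
        engagersByTweet.forEach((tweet, users) -> {
            if (users.size() >= minEngagers) result.add(tweet);
        });
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> follows = Set.of(10L, 11L, 12L);
        List<Interaction> recent = List.of(
                new Interaction(10L, 100L, FAVORITE),
                new Interaction(11L, 100L, RETWEET),
                new Interaction(12L, 200L, FAVORITE),
                new Interaction(99L, 200L, FAVORITE), // 99 is not followed by the viewer
                new Interaction(11L, 300L, CLICK));
        System.out.println(candidatesFor(follows, recent, 2)); // [100]
    }
}
```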

Stage 2: The 15 Ranking Signals

The heavy ranker is a neural network that predicts 15 types of engagement for each candidate tweet, then combines them into a single score using a weighted linear formula:

FinalScore = sum(predicted_score[i] * weight[i]) + epsilon

where epsilon = 0.001. If the combined score is negative, it's penalized further.
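In code, the combination step is a dot product plus epsilon. The weights below are made up (the production weights live in feature switches and aren't public), the signal names are the internal names from the tables that follow, and the extra penalty for negative totals is deliberately left out because its exact form isn't described here.

```java
import java.util.Map;

class WeightedScoreSketch {
    static final double EPSILON = 0.001;

    // FinalScore = sum(predicted_score[i] * weight[i]) + epsilon.
    // A weighted signal with no prediction contributes nothing.
    static double finalScore(Map<String, Double> predicted, Map<String, Double> weights) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            sum += predicted.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        // The additional penalty for negative totals is not modeled here.
        return sum + EPSILON;
    }

    public static void main(String[] args) {
        // Hypothetical weights; real values come from feature switches.
        Map<String, Double> weights = Map.of(
                "fav", 1.0, "reply", 10.0,
                "reply_engaged_by_author", 50.0, "negative_feedback_v2", -50.0);
        Map<String, Double> predicted = Map.of(
                "fav", 0.20, "reply", 0.02,
                "reply_engaged_by_author", 0.001, "negative_feedback_v2", 0.001);
        System.out.println(finalScore(predicted, weights));
    }
}
```

A negative weight is all it takes for a signal like `negative_feedback_v2` to pull the total down: here a 0.1% chance of negative feedback cancels out a 0.1% chance of an author reply.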

Here are all 15 signals, grouped by category:

Core Engagement

Signal	Internal Name	What It Predicts	Actionable Takeaway
Favorite	`fav`	Will the user like this tweet?	The fundamental signal. Write content worth liking.
Retweet	`retweet`	Will the user retweet?	Shareable format: hot takes, data, useful threads.
Reply	`reply`	Will the user reply?	Ask questions. Invite debate. Be conversational.
Reply Engaged by Author	`reply_engaged_by_author`	Will the tweet author engage with replies?	Reply to your own replies. This is its own ranking signal.

Depth of Engagement

Signal	Internal Name	What It Predicts	Actionable Takeaway
Good Click (Fav/Reply)	`click_engaged`	Will the user click into the tweet AND then like or reply?	Write hooks that reward the click-through (threads, rich detail).
Good Click (Dwell 2+ min)	`click_dwell`	Will the user click in and spend 2+ minutes reading?	Long-form content: detailed threads, dense images, infographics.
Profile Click	`good_profile_click`	Will the user click your profile?	Consistent niche + compelling bio converts profile clicks.
Dwell Time	`dwell`	How long will the user look at the tweet?	Images, carousels, and longer text increase dwell.
Bookmark	`bookmark`	Will the user bookmark it?	Post reference material: guides, lists, cheatsheets, templates.
Share	`share`	Will the user share via DM?	Content people send to friends: funny, niche-useful, surprising.

Video Signals

Signal	Internal Name	What It Predicts	Actionable Takeaway
Video Quality View	`vqv`	Will the user watch 10+ seconds?	Front-load the hook. Only eligible for videos >= 10 seconds.
Video Quality View (Immersive)	`vqv_immersive`	Quality view in immersive mode?	Optimize for full-screen vertical video.
Video Watch Time	`video_watch_time_ms`	Total milliseconds of watch time.	Longer watch time = higher score. Retention > virality.
Video Quality Watch	`video_quality_watched`	10+ second quality watch.	Same 10-second threshold as VQV.

Negative Signal

Signal	Internal Name	What It Predicts	Actionable Takeaway
Negative Feedback	`negative_feedback_v2`	Will the user hit "Not interested", mute, block, or unfollow?	This carries negative weight. Avoid rage-bait to wrong audiences.

The video quality view threshold is explicitly 10 seconds in the code:

val isVideoDurationGte10Seconds =
    (features.getOrElse(VideoDurationMsFeature, None).getOrElse(0) / 1000.0) >= 10

Source: home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala — all 15 features defined at lines 62-282, master list at lines 294-311

Scoring formula: home-mixer/server/src/main/scala/com/twitter/home_mixer/util/RerankerUtil.scala:104-136

Stage 3: Your Reputation Score (TweepCred)

This might be the most actionable part of the post, so it's worth reading closely.

Before your tweet even reaches the heavy ranker, a system called TweepCred computes a reputation score (0-100) for every user using PageRank on the follow graph.

This score feeds into the light ranker, search results, spam detection, and top tweets eligibility. It's one of the most consequential numbers attached to your account.

How TweepCred Is Calculated

flowchart TD
    A["Compute Initial Mass\n(UserMass.scala)"] --> B{"Suspended?"}
    B -->|Yes| C["Mass = 0\n(Dead account)"]
    B -->|No| D{"Verified?"}
    D -->|Yes| E["Mass = 100\n(Maximum)"]
    D -->|No| F["Compute base mass from:\n- Account age\n- Device validation\n- Restrictions"]
    F --> G{"Following > 500\nAND ratio > 0.6?"}
    G -->|Yes| H["Penalty #1:\nmass / e^(5.0 x (ratio - 0.6))"]
    G -->|No| I["No penalty"]
    H --> J["Run PageRank\n(20 iterations, jump prob. 0.1)"]
    I --> J
    E --> J
    J --> K{"Following > 2500\nAND ratio > 0.6?"}
    K -->|Yes| L["Penalty #2:\npagerank / min(e^(3.0 x (ratio - 0.6) x ln(ln(followings))), 50)"]
    K -->|No| M["No penalty"]
    L --> N["Convert to 0-100:\nscore = 130 + 5.21 x ln(pagerank)"]
    M --> N

    style C fill:#e74c3c,color:#fff
    style E fill:#2ecc71,color:#fff
    style H fill:#e67e22,color:#fff
    style L fill:#e67e22,color:#fff
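The PageRank box in the middle of that flowchart can be sketched as a plain power iteration with the stated parameters (20 iterations, random-jump probability 0.1), followed by the 130 + 5.21 × ln(pagerank) conversion. Three choices are my assumptions rather than facts from the code: random jumps land on users in proportion to their initial mass, follow edges are unweighted, and the final score is clamped to the 0-100 range.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of TweepCred's PageRank step. 20 iterations and jump probability 0.1
// follow the post; the mass-proportional jump target and the clamp are assumptions.
class TweepCredPageRankSketch {
    static final int ITERATIONS = 20;
    static final double JUMP_PROB = 0.1;

    // follows.get(u) lists the users u follows; mass[u] is u's initial mass.
    static double[] pageRank(List<int[]> follows, double[] mass) {
        int n = mass.length;
        double totalMass = Arrays.stream(mass).sum();
        if (totalMass <= 0.0) throw new IllegalArgumentException("total mass must be positive");
        double[] prior = new double[n];
        for (int u = 0; u < n; u++) prior[u] = mass[u] / totalMass;

        double[] rank = prior.clone(); // total rank stays 1 across iterations
        for (int iter = 0; iter < ITERATIONS; iter++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by users who follow nobody
            for (int u = 0; u < n; u++) {
                int[] out = follows.get(u);
                if (out.length == 0) {
                    dangling += rank[u];
                    continue;
                }
                double share = (1.0 - JUMP_PROB) * rank[u] / out.length;
                for (int v : out) next[v] += share;
            }
            // Random jumps and dangling rank are spread according to the mass prior.
            double redistributed = JUMP_PROB + (1.0 - JUMP_PROB) * dangling;
            for (int u = 0; u < n; u++) next[u] += redistributed * prior[u];
            rank = next;
        }
        return rank;
    }

    // score = 130 + 5.21 * ln(pagerank), clamped to [0, 100] (clamp assumed).
    static double toScore(double pagerank) {
        if (pagerank <= 0.0) return 0.0;
        double score = 130.0 + 5.21 * Math.log(pagerank);
        return Math.max(0.0, Math.min(100.0, score));
    }

    public static void main(String[] args) {
        // User 0 follows 1, user 1 follows 0, user 2 follows 1; equal initial mass.
        List<int[]> follows = List.of(new int[] {1}, new int[] {0}, new int[] {1});
        double[] rank = pageRank(follows, new double[] {1.0, 1.0, 1.0});
        for (int u = 0; u < rank.length; u++) {
            System.out.printf("user %d: pagerank=%.4f, score=%.1f%n", u, rank[u], toScore(rank[u]));
        }
    }
}
```

On a three-user graph every PageRank value is large, so every score clamps to 100. The log scale only separates users once rank is spread over millions of accounts: a PageRank of 1e-8, for example, maps to about 34.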

The Two Ratio Penalties: Real Numbers

Penalty 1 — Before PageRank (threshold: 500+ followings, ratio > 0.6):

Following	Followers	Ratio (following/followers)	Divisor	Effect
400	300	1.33	None	Below 500 threshold
600	1000	0.60	1.0x	Exactly at safe boundary
600	800	0.75	2.1x	Half your starting mass
1000	500	2.00	1096x	Mass effectively zero

Penalty 2 — After PageRank (threshold: 2500+ followings, ratio > 0.6):

Following	Followers	Ratio (following/followers)	Divisor	Effect
3000	5000	0.60	1.0x	At safe boundary
3000	3000	1.00	12x	Serious reduction
5000	2000	2.50	50x	Maximum penalty (capped)

Source: src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:15-17,54-64 and Reputation.scala:28-48
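Both penalty formulas are easy to check numerically. This sketch computes the divisors from the formulas in the flowchart and reproduces the table rows above. One assumption to flag: it uses the plain following/followers ratio, as the tables do; if the production code smooths the counts, the numbers shift slightly.

```java
class RatioPenaltySketch {
    // Plain following/followers ratio, as in the tables. The max(followers, 1)
    // guard against division by zero is an addition for this sketch.
    static double ratio(int following, int followers) {
        return (double) following / Math.max(followers, 1);
    }

    // Penalty #1 (before PageRank): mass / e^(5.0 * (ratio - 0.6)).
    static double penalty1Divisor(int following, int followers) {
        double r = ratio(following, followers);
        if (following > 500 && r > 0.6) return Math.exp(5.0 * (r - 0.6));
        return 1.0;
    }

    // Penalty #2 (after PageRank):
    // pagerank / min(e^(3.0 * (ratio - 0.6) * ln(ln(following))), 50).
    static double penalty2Divisor(int following, int followers) {
        double r = ratio(following, followers);
        if (following > 2500 && r > 0.6) {
            double d = Math.exp(3.0 * (r - 0.6) * Math.log(Math.log(following)));
            return Math.min(d, 50.0);
        }
        return 1.0;
    }

    public static void main(String[] args) {
        int[][] table1 = {{400, 300}, {600, 1000}, {600, 800}, {1000, 500}};
        for (int[] row : table1) {
            System.out.printf("P1 %d/%d -> %.1fx%n", row[0], row[1], penalty1Divisor(row[0], row[1]));
        }
        int[][] table2 = {{3000, 5000}, {3000, 3000}, {5000, 2000}};
        for (int[] row : table2) {
            System.out.printf("P2 %d/%d -> %.1fx%n", row[0], row[1], penalty2Divisor(row[0], row[1]));
        }
    }
}
```

Running it gives 1.0, 1.0, 2.1 and 1096.6 for the first table and 1.0, 12.1 and 50.0 (capped) for the second, matching the rows above.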

Where TweepCred Score Directly Affects You

Your TweepCred score feeds into four critical systems:

1. Search Results — Hard Filter

A BadUserRepFilter removes tweets from search results entirely if the author's TweepCred is below a configurable minTweepCred threshold. This is a binary gate — below threshold, you're invisible in search.

Source: src/java/com/twitter/search/earlybird/search/queries/BadUserRepFilter.java:111-112

2. Spam Detection — Link Penalty

If your TweepCred is below 25 and your tweet contains a non-whitelisted link, it's scored as spam (-0.5) instead of not-spam (+0.5). That's a full 1.0 point swing. The only escape: if your tweet already has at least 1 engagement (like + retweet + reply >= 1).

Source: src/java/com/twitter/search/earlybird/search/relevance/scoring/SpamVectorScoringFunction.java:18-22
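As I read it, the rule reduces to a small function. This is a hypothetical restatement, not the code in SpamVectorScoringFunction.java: the class, method and parameter names are mine, and the whitelist lookup is reduced to a boolean.

```java
class LinkSpamSketch {
    static final double LOW_REP_THRESHOLD = 25.0;

    // Returns -0.5 (spam) or +0.5 (not spam) following the rule described above.
    // Names are hypothetical; the whitelist check is collapsed into a flag.
    static double spamScore(double tweepCred, boolean hasNonWhitelistedLink,
                            int likes, int retweets, int replies) {
        boolean hasEngagement = likes + retweets + replies >= 1;
        boolean looksSpammy = tweepCred < LOW_REP_THRESHOLD
                && hasNonWhitelistedLink
                && !hasEngagement;
        return looksSpammy ? -0.5 : 0.5;
    }

    public static void main(String[] args) {
        System.out.println(spamScore(20, true, 0, 0, 0)); // -0.5: low rep + link, no engagement
        System.out.println(spamScore(20, true, 1, 0, 0)); // 0.5: one like is enough to escape
        System.out.println(spamScore(30, true, 0, 0, 0)); // 0.5: reputation of 25 or more
    }
}
```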

3. Top Tweets — Excluded Below 40

Your tweets cannot appear as Top Tweets if your reputation is below 40. They receive the lowest possible score regardless of engagement.

private static final double MIN_USER_REPUTATION = 40.0;

Source: src/java/com/twitter/search/earlybird/search/relevance/scoring/RetweetBasedTopTweetsScoringFunction.java:50

4. Light Ranker — Direct Score Component

Your reputation is multiplied by a weight and added directly to the tweet's light ranker score:

data.reputationContrib = params.reputationWeight * data.userRep;

If your reputation falls below a minimum threshold (and the viewer doesn't follow you and your account isn't verified), your tweet is skipped entirely with SkipReason.LOW_REPUTATION, so it never reaches the heavy ranker.

Source: src/java/com/twitter/search/earlybird/search/relevance/scoring/LinearScoringFunction.java:44 and FeatureBasedScoringFunction.java:300-306
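The two light-ranker effects fit in one sketch: a hard skip for low-reputation authors and a linear reputation term for everyone else. The weight, the minimum and the exemption flags are placeholders; in the real scoring function the weight comes from `params`, as in the line quoted above. Here an empty `OptionalDouble` stands in for the `SkipReason.LOW_REPUTATION` path.

```java
import java.util.OptionalDouble;

class LightRankerRepSketch {
    // Hypothetical values; production reads its parameters from params.
    static final double REPUTATION_WEIGHT = 0.5;
    static final double MIN_REPUTATION = 20.0;

    // Empty result = tweet skipped for low reputation (never reaches the heavy
    // ranker); otherwise the reputation term added to the linear score.
    static OptionalDouble reputationContrib(double userRep,
                                            boolean viewerFollowsAuthor,
                                            boolean authorVerified) {
        boolean exempt = viewerFollowsAuthor || authorVerified;
        if (userRep < MIN_REPUTATION && !exempt) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(REPUTATION_WEIGHT * userRep);
    }

    public static void main(String[] args) {
        System.out.println(reputationContrib(10, false, false)); // OptionalDouble.empty
        System.out.println(reputationContrib(10, true, false));  // OptionalDouble[5.0]
        System.out.println(reputationContrib(60, false, false)); // OptionalDouble[30.0]
    }
}
```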

What Gets You Downranked or Filtered

The visibilitylib contains a rule engine that evaluates safety labels against content. Actions range from soft downranking to hard drops.

Downranking Tiers

Content is sorted into quality tiers within conversations:

Rule	Trigger	Tier
High toxicity (multi-level)	ML toxicity model	High Quality / Low Quality / Abusive Quality
Spam replies	`DownrankSpamReply` label	Abusive Quality
Untrusted URLs	`UntrustedUrl` label	Abusive Quality
High crypto spam	`HighCryptospamScore`	Abusive Quality
Spammy content score	`HighSpammyTweetContentScore`	Abusive Quality
High proactive ToS score	`HighProactiveTosScore`	Abusive Quality
High p(spam) score	`HighPSpammyTweetScore`	Low Quality
Enforcement actioned tweet	`RitoActionedTweet`	Low Quality

The Inner Circle Exception

Every downranking rule in the codebase is gated by a "Not Inner Circle of Friends" condition. If the viewer has an inner-circle relationship with the tweet author, all downranking rules are bypassed:

abstract class ConditionWithNotInnerCircleOfFriendsRule(
  action: Action,
  condition: Condition)
    extends RuleWithConstantAction(
      action,
      And(Not(DoesHaveInnerCircleOfFriendsRelationship), condition))

In other words, no downranking rule applies between an author and their inner circle, whatever moderation labels the tweet carries.

Source: visibilitylib/src/main/scala/com/twitter/visibility/rules/DownrankingRules.scala and Rule.scala
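The same gating pattern, a constant action guarded by `And(Not(innerCircle), condition)`, can be mirrored with `java.util.function.Predicate`. This is a hypothetical Java re-creation, not visibilitylib's API: the `ViewerContext` record, the `Action` enum and the spam-reply condition are invented for illustration.

```java
import java.util.function.Predicate;

class InnerCircleRuleSketch {
    enum Action { ALLOW, DOWNRANK }

    // Facts a rule can inspect; both fields are invented for this sketch.
    record ViewerContext(boolean innerCircleWithAuthor, boolean spamReplyLabel) {}

    static final Predicate<ViewerContext> HAS_INNER_CIRCLE =
            ViewerContext::innerCircleWithAuthor;

    // Mirrors ConditionWithNotInnerCircleOfFriendsRule: the action fires only
    // when the viewer is NOT in the author's inner circle AND the condition holds.
    record Rule(Action action, Predicate<ViewerContext> condition) {
        Action evaluate(ViewerContext ctx) {
            return HAS_INNER_CIRCLE.negate().and(condition).test(ctx) ? action : Action.ALLOW;
        }
    }

    public static void main(String[] args) {
        Rule downrankSpamReply = new Rule(Action.DOWNRANK, ViewerContext::spamReplyLabel);
        System.out.println(downrankSpamReply.evaluate(new ViewerContext(false, true))); // DOWNRANK
        System.out.println(downrankSpamReply.evaluate(new ViewerContext(true, true)));  // ALLOW
    }
}
```

Building every downranking rule through one base type is what makes the exemption universal: no individual rule can forget the inner-circle check.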

User-Level Labels That Kill Reach

These labels on your account result in a hard Drop — your content is completely hidden from recommendations:

Abusive, AbusiveHighRecall
DoNotAmplify
Compromised
EngagementSpammer
LowQuality
SpamHighRecall
RecommendationsBlacklist
SearchBlacklist

Source: visibilitylib/src/main/scala/com/twitter/visibility/rules/UserLabelRules.scala:16-84

The Playbook: What To Actually Do

Based on everything in the code, here's what concretely matters:

Do

Reply to replies on your own tweets. ReplyEngagedByAuthor is a standalone neural network prediction head. The algorithm literally rewards authors who engage with their audience.
Post content worth bookmarking. Bookmarks are a separate ranking signal from likes. Reference material, guides, templates, and lists trigger this.
Stay in a consistent niche. SimClusters assigns you to one primary community. Scattered topics = weak community signal = poor out-of-network discovery.
Get likes from people in your target audience. Each like adds that user's community interest vector to your tweet's embedding. Likes from your niche compound your discoverability within that niche.
Write hooks that reward click-through. Two separate signals track users who click into your tweet and then engage (click_engaged) or dwell for 2+ minutes (click_dwell).
Keep your follow ratio below 0.6. If you follow more than 500 accounts, maintain a following/followers ratio under 0.6. At ratio 2.0 with 1000 followings, the penalty divisor is ~1096x on your starting reputation mass.
Verify your account. Verified users start with mass = 100 (the maximum) and bypass the ratio penalty in the initial mass calculation.
For video: front-load the hook and make it 10+ seconds. The Video Quality View signal only activates for videos >= 10 seconds in duration.

Don't

Don't follow-for-follow at scale. The algorithm applies exponential penalties at 500+ followings (before PageRank) and again at 2500+ followings (after PageRank). The penalty compounds.
Don't post links with low reputation. If your TweepCred is below 25, tweets with non-whitelisted links are flagged as spam. Build reputation before link-heavy content.
Don't rage-bait the wrong audience. Negative feedback (mutes, blocks, "Not interested") carries negative weight in the ranking formula. If the combined score goes negative, it's penalized down to near-zero.
Don't use automation or bot-like patterns. The Automation and AutomationHighRecall safety labels exist. Detected bot behavior gets the DoNotAmplify label.
Don't post duplicate content. DuplicateContent is a user-level label that results in a hard drop from recommendations.
Don't expect full reach from a brand-new account. Accounts under 30 days old receive a logarithmic penalty on their starting mass: mass *= log(1 + age/15), with age in days. A 1-day-old account gets ~6% of full mass and a 15-day-old account ~69%. After 30 days, full credit.
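The account-age multiplier is easy to tabulate. This sketch applies ln(1 + age/15) for accounts up to 30 days old and 1.0 afterwards. The cap at 1.0 is my assumption: without it, a 29-day-old account would get more than full credit (ln(1 + 29/15) ≈ 1.08). The method name is mine, and the other inputs to starting mass (device validation, restrictions) are left out.

```java
class AccountAgeSketch {
    // Multiplier on the starting mass for an account that is ageDays old.
    // The cap at 1.0 is assumed so young accounts never exceed full credit.
    static double ageFactor(int ageDays) {
        if (ageDays > 30) return 1.0;
        return Math.min(1.0, Math.log(1.0 + ageDays / 15.0));
    }

    public static void main(String[] args) {
        for (int age : new int[] {1, 7, 15, 26, 30, 31}) {
            System.out.printf("day %2d: %.0f%% of full mass%n", age, 100 * ageFactor(age));
        }
    }
}
```

With the cap in place, the multiplier reaches 1.0 at about 26 days (15 × (e − 1) ≈ 25.8), so the 30-day cutoff mostly acts as a shortcut.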

Code References

Every claim in this post maps to a specific file in the twitter/the-algorithm repository:

Claim	File
15 ranking signals	`home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala:62-311`
Weighted scoring formula	`home-mixer/server/src/main/scala/com/twitter/home_mixer/util/RerankerUtil.scala:91-136`
Heavy ranker scorer	`home-mixer/server/src/main/scala/com/twitter/home_mixer/functional_component/scorer/WeighedModelRerankingScorer.scala:43-44`
145k communities, tweet embeddings	`src/scala/com/twitter/simclusters_v2/README.md:33,64-67`
UTEG edge types	`src/scala/com/twitter/recos/user_tweet_entity_graph/UserTweetEdgeTypeMask.scala:44-52`
TweepCred initial mass + ratio penalty #1	`src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:15-64`
TweepCred reputation formula + ratio penalty #2	`src/scala/com/twitter/graph/batch/job/tweepcred/Reputation.scala:12-48`
PageRank parameters (20 iterations, 0.1 jump)	`src/scala/com/twitter/graph/batch/job/tweepcred/PreparePageRankData.scala:36-38`
Account age penalty	`src/scala/com/twitter/graph/batch/job/tweepcred/UserMass.scala:45`
Search filter by reputation	`src/java/com/twitter/search/earlybird/search/queries/BadUserRepFilter.java:111-112`
Spam scoring for low-rep + links	`src/java/com/twitter/search/earlybird/search/relevance/scoring/SpamVectorScoringFunction.java:18-53`
Top Tweets minimum reputation (40)	`src/java/com/twitter/search/earlybird/search/relevance/scoring/RetweetBasedTopTweetsScoringFunction.java:50,114-115`
Light ranker reputation contribution	`src/java/com/twitter/search/earlybird/search/relevance/scoring/LinearScoringFunction.java:44`
Low reputation skip	`src/java/com/twitter/search/earlybird/search/relevance/scoring/FeatureBasedScoringFunction.java:300-306`
Downranking rules + inner circle exception	`visibilitylib/src/main/scala/com/twitter/visibility/rules/DownrankingRules.scala`
Inner circle bypass definition	`visibilitylib/src/main/scala/com/twitter/visibility/rules/Rule.scala:160-165`
User label drops (Abusive, DoNotAmplify, etc.)	`visibilitylib/src/main/scala/com/twitter/visibility/rules/UserLabelRules.scala:16-84`
Video quality view 10-second threshold	`home-mixer/server/src/main/scala/com/twitter/home_mixer/model/PredictedScoreFeature.scala:153-164`
Low TweepCred follow store (threshold 40)	`follow-recommendations-service/common/src/main/scala/com/twitter/follow_recommendations/common/stores/LowTweepCredFollowStore.scala:24`

The actual weights for the 15 ranking signals are dynamically configured via feature switches and are not hardcoded — the architecture and signals are public, but the production tuning is not.

I hope this blog post helps you on your Twitter journey!

And if not, feedback on my writing is always welcome at shivamsinghal5432 [at] gmail [dot] com.