As a tech enthusiast, spending time going through Hacker News (HN) is part of my daily routine. The true value of HN often lies in the thoughtful discussions that follow each link. Yet these conversation trees can grow complex, with multiple branches and sub-discussions that are difficult to follow efficiently. With threads that dive deep into various topics, keeping track of conversations and extracting key insights becomes a challenge.
After countless hours of getting lost in fascinating discussions, we (Ann Catherine and I) built an open source browser extension designed to enhance our HN experience by integrating intelligent navigation and AI-powered summarization. Our solution makes the comments the true hero of HN by helping you navigate, understand, and extract value from these discussions with minimal effort.
The following screenshot shows the discussion summarization in action:
Summarizing and extracting key information is a notable capability of Large Language Models (LLMs). So, using LLMs to summarize discussion threads should be straightforward - right? Well - not quite. Unlike articles that follow a linear narrative, threaded discussions branch in multiple directions with varying depths and relevance. If we prompt the LLM with a plain dump of the discussion text, the LLM will fail to follow the discussion thread.
Additionally, not all comments hold equal weight. Factors such as upvotes, downvotes, and comment positioning influence their perceived value. Addressing these intricacies required specialized prompting techniques that represent conversation trees to LLMs effectively, preserve context, and highlight diverse viewpoints.
Another challenge with HN is tracking a specific author's contributions across the whole discussion. Authors often participate in multiple threads, making it difficult to follow their complete thought process. So, if you find a particular author's perspective valuable (say the author of the primary post is participating in the discussion), it is difficult to track their narrative through the whole discussion thread.
The extension allows users to navigate between comments by the same author using keyboard shortcuts ([ and ]) or visual indicators next to usernames. This functionality simplifies tracking an author's contributions throughout a discussion.
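The same-author jump can be pictured as a search over the list of comment authors in page order. The sketch below is only an illustration in Python, not the extension's code: the function name `next_by_author` and the wrap-around behavior at the ends of the thread are assumptions.

```python
def next_by_author(authors: list[str], current: int, step: int = 1) -> int:
    """Return the index of the next (step=1) or previous (step=-1) comment
    written by the author of the comment at `current`.

    Assumption: the search wraps around the ends of the thread. If the
    author has no other comment, the current index is returned.
    """
    total = len(authors)
    target = authors[current]
    index = (current + step) % total
    # The loop always ends: in the worst case it arrives back at `current`.
    while authors[index] != target:
        index = (index + step) % total
    return index


# Authors of the comments in the order they appear on the page (made-up data).
authors = ["user1", "user2", "user1", "user3", "user1"]
print(next_by_author(authors, 0))      # like pressing ] on the first comment
print(next_by_author(authors, 2, -1))  # like pressing [ on the third comment
```

Keeping the lookup as a pure function over page order makes it easy to drive both the keyboard shortcuts and the visual indicators from the same logic.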
To effectively represent conversation threads, we developed the following format.
[1] user1: Main point as the first reply to the post
[1.1] user2: Supporting argument or counter point in response to [1]
[1.1.1] user3: Additional detail as response to [1.1]
[2] user4: Comment with a theme different from [1]
[3] user5: Another top-level comment with a different perspective
This representation allows us to flatten the thread while preserving the hierarchy for the LLM. Also, we keep the comment author in the format. This allows the LLM to track the same author across multiple replies.
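As a rough illustration of this flattening, the hierarchy paths can be generated with a depth-first walk over the comment tree. This is a minimal sketch in Python, not the extension's implementation; the `Comment` class and the `flatten` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical in-memory comment node (not the extension's actual type)."""
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)


def flatten(comments: list[Comment], prefix: str = "") -> list[str]:
    """Depth-first walk that emits one '[path] author: text' line per comment."""
    lines = []
    for index, comment in enumerate(comments, start=1):
        # A child path is the parent path plus the child's 1-based position.
        path = f"{prefix}.{index}" if prefix else str(index)
        lines.append(f"[{path}] {comment.author}: {comment.text}")
        lines.extend(flatten(comment.replies, path))
    return lines


thread = [
    Comment("user1", "Main point", [
        Comment("user2", "Counter point", [Comment("user3", "Additional detail")]),
    ]),
    Comment("user4", "Different theme"),
]
print("\n".join(flatten(thread)))
```

Because each path embeds its parent's path, the LLM can recover the tree from the flat text without any indentation.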
This still does not capture other critical information from the HN discussion: how highly each comment is rated, how many replies it received, and how many downvotes it has.
To account for the above, we modify the comment format to the following:
[hierarchy_path] (score: X) <replies: Y> {downvotes: Z} Author: Comment
Here is a sample flattened comment that follows the above pattern:
[1] (score: 1000) <replies: 3> {downvotes: 0} user1: Main point as the first reply to the post
[1.1] (score: 800) <replies: 1> {downvotes: 0} user2: Supporting argument or counter point in response to [1]
etc.
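Producing a line in this enriched format is a matter of string formatting once the metadata is available. The helper below is a small Python sketch; the function name `format_comment` and its parameters are assumptions made for illustration, not part of the extension.

```python
def format_comment(path: str, score: int, replies: int, downvotes: int,
                   author: str, text: str) -> str:
    """Render one comment in the
    '[hierarchy_path] (score: X) <replies: Y> {downvotes: Z} Author: Comment'
    layout. Doubled braces in the f-string produce literal '{' and '}'."""
    return (
        f"[{path}] (score: {score}) <replies: {replies}> "
        f"{{downvotes: {downvotes}}} {author}: {text}"
    )


print(format_comment("1.1", 800, 1, 0, "user2", "Supporting argument"))
```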
Capturing this information is only part of the solution. We still need to tell the LLM how to interpret it. That is where a detailed system prompt is important. We went through multiple iterations before arriving at the following prompt. This is the prompt engineering part of the solution, and yes, the prompt is more than 1300 tokens. But without it, even frontier LLMs struggle to produce consistent responses. Here is the final prompt:
You are HackerNewsCompanion, an AI assistant specialized in analyzing and summarizing Hacker News discussions.
Your goal is to help users quickly understand the key discussions and insights from Hacker News threads without having to read through lengthy comment sections. A discussion consists of threaded comments where each comment can have child comments (replies) nested underneath it, forming interconnected conversation branches.
Your task is to provide concise, meaningful summaries that capture the essence of the discussion while prioritizing high quality content.
Follow these guidelines:

1. Discussion Structure Understanding:
Comments are formatted as: [hierarchy_path] (score: X) <replies: Y> {downvotes: Z} Author: Comment
- hierarchy_path: Shows the comment's position in the discussion tree
  - Single number [1] indicates a top-level comment
  - Each additional number represents one level deeper in the reply chain. e.g., [1.2.1] is a reply to [1.2]
  - The full path preserves context of how comments relate to each other
- score: A normalized value between 1000 and 1, representing the comment's relative importance
  - 1000 represents the highest-value comment in the discussion
  - Other scores are proportionally scaled against this maximum
  - Higher scores indicate more upvotes from the community and content quality
- replies: Number of direct responses to this comment
- downvotes: Number of downvotes the comment received
  - Exclude comments with high downvotes from the summary
  - DO NOT include comments that are have 4 or more downvotes

Example discussion:
[1] (score: 1000) <replies: 3> {downvotes: 0} user1: Main point as the first reply to the post
[1.1] (score: 800) <replies: 1> {downvotes: 0} user2: Supporting argument or counter point in response to [1]
[1.1.1] (score: 150) <replies: 0> {downvotes: 6} user3: Additional detail as response to [1.1], but should be excluded due to more than 4 downvotes
[2] (score: 400) <replies: 1> {downvotes: 0} user4: Comment with a theme different from [1]
[2.1] (score: 250) <replies: 0> {downvotes: 1} user2: Counter point to [2], by previous user2, but should have lower priority due to low score and 1 downvote
[3] (score: 200) <replies: 0> {downvotes: 0} user5: Another top-level comment with a different perspective

2. Content Prioritization:
- Focus on high-scoring comments as they represent valuable community insights
- Pay attention to comments with many replies as they sparked discussion
- Track how discussions evolve through the hierarchy
- Consider the combination of score, downvotes AND replies to gauge overall importance, prioritizing insightful, well-reasoned, and informative content

3. Theme Identification:
- Use top-level comments ([1], [2], etc.) to identify main discussion themes
- Identify recurring themes across top-level comments
- Look for comments that address similar aspects of the main post or propose related ideas.
- Group related top-level comments into thematic clusters
- Track how each theme develops through reply chains

4. Quality Assessment:
- Prioritize comments that exhibit a combination of high score, low downvotes, substantial replies, & depth of content
- High scores indicate community agreement, downvotes indicate comments not aligned with Hacker News guidelines or community standards
- Replies suggest engagement and discussion, and depth (often implied by longer or more detailed comments) can signal valuable insights or expertise
- Actively identify and highlight expert explanations or in-depth analyses. These are often found in detailed responses, comments with high scores, or from users who demonstrate expertise on the topic

Based on the above instructions, you should summarize the discussion. Your output should be well-structured, informative, and easily digestible for someone who hasn't read the original thread.
Your response should be formatted using markdown and should have the following structure.

# Overview
Brief summary of the overall discussion in 2-3 sentences - adjust based on complexity and depth of comments.

# Main Themes & Key Insights
[Bulleted list of themes, ordered by community engagement (combination of scores and replies). Order themes based on the overall community engagement they generated. Each bullet should be a summary with 2 or 3 sentences, adjusted based on the complexity of the topic.]

# [Theme 1 title - from the first bullet above]
[Summarize key insights or arguments under this theme in a couple of sentences. Use bullet points.]
[Identify important quotes & include them here with hierarchy_paths so that we can link to the comment in the main page. Include direct "quotations" (with author attribution) where appropriate. You MUST quote directly from users with quotes. You MUST include hierarchy_path as well. Do NOT include comments with 4 or more downvotes. For example:
- [1.1.1] (user3) noted, '...'
- [2.1] (user2) explained that '...'"
- [3] Perspective from (user5) added, "..."
- etc.

# [Theme 2 title - from the second bullet in the main themes section]
[Same structure as above.]

# [Theme 3 title and 4 title - if the discussion has more themes]

# Key Perspectives
[Present contrasting perspectives, noting their community reception. When including key quotes, you MUST include hierarchy_paths and author, so that we can link back to the comment in the main page.]
[Present these concisely and highlight any significant community reactions (agreement, disagreement, etc.)]
[Watch for community consensus or disagreements]

# Notable Side Discussions
[Interesting tangents that added value. When including key quotes, you MUST include hierarchy_paths and author, so that we can link back to the comment in the main page]
Some additional notes on this system prompt.
We instruct the LLM to respond in a specific format because the extension parses the result and converts each 'hierarchy_path' into a back-link to the original comment. So, when the user finds something interesting in the LLM summary, it is easy to dig deeper into that topic by navigating directly to the referenced comment. This is a truly delightful feature, and we are very happy with the result.
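To make the back-link idea concrete, here is a small Python sketch of how bracketed hierarchy paths in a summary could be rewritten as links. It is an illustration under assumptions, not the extension's parser: the names `PATH_PATTERN` and `link_paths`, the `path_to_id` mapping, and the `#<comment id>` anchor format are all hypothetical.

```python
import re

# Matches bracketed paths made only of digits and dots, e.g. [1] or [2.1.3].
PATH_PATTERN = re.compile(r"\[(\d+(?:\.\d+)*)\]")


def link_paths(summary: str, path_to_id: dict[str, str]) -> str:
    """Replace each known [hierarchy_path] with an HTML anchor.

    `path_to_id` is a hypothetical mapping from hierarchy path to the
    comment's element id on the page. Unknown paths are left untouched,
    since the LLM may occasionally emit a path that does not exist.
    """
    def replace(match: re.Match) -> str:
        path = match.group(1)
        comment_id = path_to_id.get(path)
        if comment_id is None:
            return match.group(0)
        return f'<a href="#{comment_id}">[{path}]</a>'

    return PATH_PATTERN.sub(replace, summary)


summary = "- [1.1] (user2) noted, '...' while [9] was never posted"
print(link_paths(summary, {"1.1": "42"}))
```

Leaving unknown paths as plain text is a deliberate choice in this sketch: a broken link would be worse than an unlinked reference.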
Before building our extension, we explored existing methods for interacting with Hacker News data:
Hacker News offers an Algolia-powered API. This API provides structured data, but it does not return comments in the order shown on the page and it lacks downvote information.
Once we get the structured data from Algolia, we enrich it with data that we get from parsing the HN post itself. This includes reordering the comments and adding downvote information. Once we have both pieces of information, we can compute a score for each comment.
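The post does not spell out the scoring formula, so the following Python sketch covers only the normalization step that the system prompt describes: the highest-value comment maps to 1000, the others are scaled proportionally, and nothing drops below 1. The raw per-comment values and the name `normalize_scores` are assumptions made for illustration.

```python
def normalize_scores(raw: list[float]) -> list[int]:
    """Scale raw per-comment values so the maximum becomes 1000 and the
    rest are proportional, with a floor of 1.

    Assumption: `raw` holds some non-negative signal per comment (how it
    is derived is not specified here). If no comment has a positive value,
    all comments are treated as tied at 1000.
    """
    if not raw:
        return []
    highest = max(raw)
    if highest <= 0:
        return [1000] * len(raw)
    return [max(1, round(1000 * value / highest)) for value in raw]


print(normalize_scores([50, 40, 10, 0]))
```

Normalizing to a fixed range keeps the prompt's guidance about scores meaningful regardless of how large or small a discussion is.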
We've designed the extension to work with multiple AI providers to accommodate different user preferences:
For users who prefer powerful cloud-based models, we've integrated with:
For privacy-conscious users, we offer integration with:
In addition to AI summarization, we have built additional capabilities that make the HN reading experience even better.
We've implemented Vim-inspired keyboard shortcuts for intuitive movement through posts and comments:
- j / k for next/previous post or comment
- h / l for navigating between parent/child comments
- [ / ] for jumping between comments by the same author
- s to toggle the summary panel
- ? or / to view all available shortcuts

This keyboard-driven approach significantly reduces the need for mouse usage, making browsing more efficient.
Beyond keyboard shortcuts, we've implemented several features to enhance comment navigation:
Ready to enhance your Hacker News experience? Here's how to get started:
The extension is MIT licensed and open source, so you can also explore or contribute to the code on GitHub.
While the current version offers significant improvements to the Hacker News browsing experience, we're just getting started. In our next blog post, we'll dive deeper into how we refined the summarization capabilities, including:
Stay tuned as we continue to enhance this tool and explore the fascinating challenges of summarizing threaded discussions with AI!