The smarter way to consume HackerNews discussions

What is HN Companion?

As a tech enthusiast, spending time going through HackerNews (HN) is part of my daily routine. The true value of HN often lies in the thoughtful discussions that follow each link. Yet these conversation trees can grow complex, with multiple branches and sub-discussions that are difficult to follow efficiently. With threads that dive deep into various topics, keeping track of conversations and extracting key insights becomes a challenge.

After countless hours of getting lost in fascinating discussions, we (myself and Ann Catherine) built an Open Source browser extension designed to enhance our HN experience by integrating efficient navigation and AI-powered summarization. Our solution makes the comments the true hero of HN by helping you navigate, understand, and extract value from these discussions with minimal effort.

The following screenshot shows how the discussion summarization in action:

Screenshot: AI-powered summarization of a Hacker News discussion thread using the HN Companion extension

🎥 For more details - Watch a quick demo of HN Companion in action

Exploration of How to Handle Threaded Summarization with LLMs

Summarizing and extracting key information is a notable capability of Large Language Models (LLMs). So, using LLMs to summarize discussion threads should be straightforward - right? Well - not quite. Unlike text in articles that follow a linear narrative, threaded discussions branch in multiple directions with varying depths and relevance. If we prompt the LLM with a plain dump of the discussion text, the LLM will fail to follow the conversation thread.

Additionally, not all comments hold equal weight. Factors such as up-votes, downvotes, and comment positioning influence their perceived value. Addressing these intricacies required developing specialized prompting techniques to effectively represent conversation trees to LLMs, preserving context, and highlighting diverse viewpoints.

Following an Author

Another challenge with HN is tracking a specific author's contributions across the whole discussion. Authors often participate in multiple threads, making it difficult to follow their complete thought process. So, if you find a particular author's perspective valuable (say the author of the primary post is participating in the discussion), it is difficult to track their narrative through the whole discussion thread.

The extension allows users to navigate between comments by the same author using keyboard shortcuts ([ and ]) or visual indicators next to usernames. This functionality simplifies tracking an author's contributions throughout a discussion.

Let’s look at how we represent threaded conversations to LLMs

To effectively represent conversation threads, we developed the following format.

[1] user1: Main point as the first reply to the post
   [1.1] user2: Supporting argument or counter point in response to [1]
   [1.1.1] user3: Additional detail as response to [1.1]
   [2] user4: Comment with a theme different from [1]
   [3] user5: Another top-level comment with a different perspective

This representation allows us to flatten the thread while preserving the hierarchy for the LLM. Also, we keep the comment author in the format. This allows the LLM to track the same author across multiple replies.

This still does not capture the following critical information from the HN discussion.

If a comment is downvoted by the community or by the moderators, the LLM will not know about it.
If the discussion thread is very nested it might suggest higher engagement from the community, and depth (often implied by longer or more detailed comments) can signal valuable insights or expertise.
Top-level comments ([1], [2], etc.) usually indicate start of a new discussion themes
Since, up-voted comments move up the rank, we need to instruct the LLM to prioritize those comments.

To account for the above, we modify the comment format to the following:

[hierarchy_path] (score: X) <replies: Y> {downvotes: Z} Author: Comment

Here is a sample flattened comment that follows the above pattern:

[1] (score: 1000) <replies: 3> {downvotes: 0} user1: Main point as the first reply to the post
[1.1] (score: 800) <replies: 1> {downvotes: 0} user2: Supporting argument or counter point in response to [1]
etc.

The System prompt

Capturing this information is part of the solution. We still need to inform the LLM on how to interpret this information. That is where a detailed system prompt is important. We went through multiple iterations and finally came up with the following prompt. This is the prompt engineering part of the solution and yes, the prompt is more than 1300 tokens. But, without this, even frontier LLMs will struggle with producing consistent response. Here is the final prompt:

You are HackerNewsCompanion, an AI assistant specialized in analyzing and summarizing Hacker News discussions.
Your goal is to help users quickly understand the key discussions and insights from Hacker News threads without
having to read through lengthy comment sections. A discussion consists of threaded comments where each comment can
have child comments (replies) nested underneath it, forming interconnected conversation branches.Your task is to
provide concise, meaningful summaries that capture the essence of the discussion while prioritizing high quality content.

Follow these guidelines:
1. Discussion Structure Understanding:
   Comments are formatted as: [hierarchy_path] (score: X) <replies: Y> {downvotes: Z} Author: Comment

   - hierarchy_path: Shows the comment's position in the discussion tree
     - Single number [1] indicates a top-level comment
     - Each additional number represents one level deeper in the reply chain. e.g., [1.2.1] is a reply to [1.2]
     - The full path preserves context of how comments relate to each other

   - score: A normalized value between 1000 and 1, representing the comment's relative importance
     - 1000 represents the highest-value comment in the discussion
     - Other scores are proportionally scaled against this maximum
     - Higher scores indicate more upvotes from the community and content quality

   - replies: Number of direct responses to this comment

   - downvotes: Number of downvotes the comment received
     - Exclude comments with high downvotes from the summary
     - DO NOT include comments that are have 4 or more downvotes

   Example discussion:
   [1] (score: 1000) <replies: 3> {downvotes: 0} user1: Main point as the first reply to the post
   [1.1] (score: 800) <replies: 1> {downvotes: 0} user2: Supporting argument or counter point in response to [1]
   [1.1.1] (score: 150) <replies: 0> {downvotes: 6} user3: Additional detail as response to [1.1],
           but should be excluded due to more than 4 downvotes
   [2] (score: 400) <replies: 1> {downvotes: 0} user4: Comment with a theme different from [1]
   [2.1] (score: 250) <replies: 0> {downvotes: 1} user2: Counter point to [2], by previous user2,
           but should have lower priority due to low score and 1 downvote
   [3] (score: 200) <replies: 0> {downvotes: 0} user5: Another top-level comment with a different perspective

2. Content Prioritization:
   - Focus on high-scoring comments as they represent valuable community insights
   - Pay attention to comments with many replies as they sparked discussion
   - Track how discussions evolve through the hierarchy
   - Consider the combination of score, downvotes AND replies to gauge overall importance,
     prioritizing insightful, well-reasoned, and informative content

3. Theme Identification:
   - Use top-level comments ([1], [2], etc.) to identify main discussion themes
   - Identify recurring themes across top-level comments
   - Look for comments that address similar aspects of the main post or propose related ideas.
   - Group related top-level comments into thematic clusters
   - Track how each theme develops through reply chains

4. Quality Assessment:
    - Prioritize comments that exhibit a combination of high score, low downvotes, substantial replies, & depth of content
    - High scores indicate community agreement, downvotes indicate comments not aligned with Hacker News
      guidelines or community standards
    - Replies suggest engagement and discussion, and depth (often implied by longer or more detailed comments) can
      signal valuable insights or expertise
    - Actively identify and highlight expert explanations or in-depth analyses. These are often found in detailed
      responses,comments with high scores, or from users who demonstrate expertise on the topic

Based on the above instructions, you should summarize the discussion. Your output should be well-structured,
informative, and easily digestible for someone who hasn't read the original thread.

Your response should be formatted using markdown and should have the following structure.

# Overview
Brief summary of the overall discussion in 2-3 sentences - adjust based on complexity and depth of comments.

# Main Themes & Key Insights
[Bulleted list of themes, ordered by community engagement (combination of scores and replies).
Order themes based on the overall community engagement they generated.
Each bullet should be a summary with 2 or 3 sentences, adjusted based on the complexity of the topic.]

# [Theme 1 title - from the first bullet above]
[Summarize key insights or arguments under this theme in a couple of sentences. Use bullet points.]
[Identify important quotes & include them here with hierarchy_paths so that we can link to the comment in the main page.
Include direct "quotations" (with author attribution) where appropriate. You MUST quote directly from users with quotes.
You MUST include hierarchy_path as well. Do NOT include comments with 4 or more downvotes. For example:
- [1.1.1] (user3) noted, '...'
- [2.1] (user2) explained that '...'"
- [3] Perspective from (user5) added, "..."
- etc.

# [Theme 2 title - from the second bullet in the main themes section]
[Same structure as above.]

# [Theme 3 title and 4 title - if the discussion has more themes]

# Key Perspectives
[Present contrasting perspectives, noting their community reception. When including key quotes,
you MUST include hierarchy_paths and author, so that we can link back to the comment in the main page.]
[Present these concisely and highlight any significant community reactions (agreement, disagreement, etc.)]
[Watch for community consensus or disagreements]

# Notable Side Discussions
[Interesting tangents that added value. When including key quotes, you MUST include hierarchy_paths and author,
so that we can link back to the comment in the main page]

Some additional notes on this system prompt.

The goal is very clearly articulated
The structure is very well explained and the expected response is also clearly stated (few-shot prompting)
The expected output structure/format is very well articulated.
We introduce a personality for the LLM (HackerNewsCompanion) - this is important later when we did fine-tuning

We instruct the LLM to respond back in a specific format because the extension will parse the result and convert the ‘hierarchy_path’ and make it a back-link to the original comment thread. So, when the user finds something interesting from the LLM summary, it is easy for them to dig deeper into that topic by navigating directly to the original reference comment. This is truly delightful feature and we are very happy with the result.

How We Get the Data

Before building our extension, we explored existing methods for interacting with Hacker News data:

Algolia API

Hacker News offers Algolia-powered API. This API provides structured data but lacks sorting and down-vote information.

Screen Scraping

Once we get the structured data from Algolia, we enrich it with data that we get from parsing the HN post itself. This includes, reordering the comments and adding down-vote information. Once we have both these information, we can compute a score for each comment.

AI Provider Integration

We've designed the extension to work with multiple AI providers to accommodate different user preferences:

Cloud-Based AI Integration

For users who prefer powerful cloud-based models, we've integrated with:

Anthropic Claude: Offering excellent summarization capabilities with Claude 3.5 Sonnet and Claude 3 Opus
OpenAI: Support for GPT-3.5 Turbo and GPT-4
OpenRouter: A unified API service that provides access to multiple LLMs

Local AI Integration

For privacy-conscious users, we offer integration with:

Ollama: For running models like Llama 3, Mistral, and others locally
Chrome Built-in AI: Leveraging the browser's Gemini Nano capabilities for on-device processing

Browser Extension Features

In addition to AI summarization, we have built additional capabilities that make the HN reading experience even better.

Smart Keyboard Navigation

We've implemented Vim-inspired keyboard shortcuts for intuitive movement through posts and comments:

j/k for next/previous post or comment
h/l for navigating between parent/child comments
[/] for jumping between comments by the same author
s to toggle the summary panel
? or / to view all available shortcuts

This keyboard-driven approach significantly reduces the need for mouse usage, making browsing more efficient.

Beyond keyboard shortcuts, we've implemented several features to enhance comment navigation:

Visual indicators for post authors and comment counts
Collapsible comment threads
Comment path tracking to help users understand the discussion structure

Getting Started

Ready to enhance your Hacker News experience? Here's how to get started:

Install HN Companion from the Chrome Web Store or Firefox Add-ons
Navigate to Hacker News
Press '?' to view keyboard shortcuts
Choose your preferred AI provider in extension settings

The extension is MIT licensed and open source, so you can also explore or contribute to the code on GitHub.

What's Next

While the current version offers significant improvements to the Hacker News browsing experience, we're just getting started. In our next blog post, we'll dive deeper into how we refined the summarization capabilities, including:

Handling input limitations for large discussion threads
How we fine-tuned a model with HN summarization data to make it much more efficient and less costly

Stay tuned as we continue to enhance this tool and explore the fascinating challenges of summarizing threaded discussions with AI!