
Propose modernising hackage-server project #67

Open
qnikst wants to merge 2 commits into haskellfoundation:main from tweag:tweag/proposal-modernising-hackage-server

@qnikst qnikst commented May 21, 2026

This commit introduces Tweag's proposal for the modernising hackage-server project. The project includes a plan to improve hackage-server scalability and resource use by migrating the data store to a relational database, as well as a zero-downtime migration plan.

Rendered document: 0000-modernising-hackage-server.md
Related discussion on Discourse: https://discourse.haskell.org/t/feedback-request-modernising-hackage-server-community-project-proposal/14142


### Migration Sequence

For the migration we 5 distinct phases:

noticed a slight typo!

@LaurentRDC
Contributor

This is wonderful. I'm glad someone is taking a stab at this.

A few thoughts:

I'd like the proposal to go a bit further. Since incremental changes are hard on the current hackage-server, what are some of the ways in which hackage-server-v2 will be forward-looking? How do we ensure that, in 10 years, there isn't a similar proposal for hackage-server-v3 because the architecture for hackage-server-v2 is lacking, or is hard to incrementally change?
Today's problem is horizontal scalability, and the proposal addresses that. What could be tomorrow's problem, and how do we ensure that the new design allows it to be solved? For example, the proposal mentions the use of IO () callbacks as being problematic due to an unclear control flow. What's the alternative being proposed here?

I strongly support the choice to go with Servant. Generating HTML pages is a bit annoying out-of-the-box, but the ability to create a hackage-server-api package is unparalleled.

Finally, one crucial detail to get right here, that I think needs to be addressed, is the solution around long-term data migrations once hackage-server-v2 is the source-of-truth. I'm not familiar with acid-state in practice, but I assume that it involves writing migrations in Haskell. SQL migrations can be painful if not managed appropriately

@hasufell
Contributor

This is direly needed.

But I found the section about flora a bit handwavy... why exactly can this not be used to build a modern hackage-server? Have you reached out to @Kleidukos? It's possible these projects have largely different scope, but it's also possible this may cause more fragmentation that could have been avoided.

I also find it unclear who is going to maintain this project after the proposal is done and implemented.

@gbaz
Collaborator

gbaz commented May 22, 2026

Flora has none of the APIs or backend necessary to be hackage. It is only a database and frontend. Most of the "juice" in hackage is the backend structures, not just the interface it provides to an existing database.

@gbaz
Collaborator

gbaz commented May 22, 2026

On the whole I think this proposal is reasonable and addresses a real problem. The proposed architecture, servant and postgres, is a standard and nice one that makes sense. That said, here are some comments.

HTML is generated manually throughout, as opposed to being a structured, templated system. This means it is prohibitively expensive to do any sort of modernizing of the generated documents, despite them conceptually being simple projections of the data.

This is not true. Many, though not all, pages are generated using the hstringtemplate library, and the usage could be further pursued.

My main question is I don't understand the migration plan. The existing system will take all API requests, no? So how will the new system have the data to serve? Or is the idea all requests go to the new system and then it also "forwards" them to the old one? Additionally, will we need to run both servers at once on the existing hackage box? If so, will that cause even further resource costs on an already resource-starved box? Or is the plan to have a second box as well? (Which is fine, except then the filestore will need to be shared across boxes?).

An additional issue regarding just the proposal text (not the plan) is that we do not need horizontal scaling -- mirrors suffice for the most part, and we can build UI mirroring beyond that. What we need is to reduce the in-memory footprint. The motivation for switching datastores (a much needed thing, and thank you so much for looking at it!) is not scalability in the requests-per-second sense. I believe that a well-written hackage server could comfortably be served for quite some time on a much less beefy box than we now have, if it did not use acid-state. The motivation is just that the quantity of resident memory required by the current architecture is too high per each incremental package upload.

Finally, while on the whole I think a clean API-for-API rewrite would be ideal, I do wonder if there's another "middle balance" for now, which is to not swap the whole of the backend at once from acid-state to postgres, but to just swap the most expensive part, which I believe is the packagedb. It seems from skimming the migration document that most of the lines of code that require touching (20% or so in total) are not related to the packagedb, but rather to the user store, etc -- which are much less costly, I believe, to keep in acid-state for the time being.

A partial migration does not make hackage more horizontally scalable, but as I said above, it is not horizontal scalability that is our obstacle -- it is the single-box-cost of keeping too much data in memory.

@gbaz
Collaborator

gbaz commented May 22, 2026

All that said, if a full rewrite can be done by two engineers in three months as this proposal states, then I think that we should absolutely go for it despite my reservations -- the cost-benefit analysis and my concerns are based on my experience of the very slow development of hackage in the past, and my fear of the scale of a large rewrite. So I would encourage the proposal submitters to really be sure they understand the scope of hackage well enough to give such an estimate (though the inventory of APIs and features indicates they have already thought about this.) If that is a genuine estimate of good engineers with sound timeline judgement, then that very much incentivizes going this path.

@hasufell
Contributor

All that said, if a full rewrite can be done by two engineers in three months

Maybe we should ask directly: is Tweag planning to use AI assistance and if so in what shape or form?

I don't see an LLM contribution policy in the hackage-server project, but this is probably useful to clear up anyway.

@qnikst
Author

qnikst commented May 22, 2026

Thanks for the replies. I'll try to address them:

@LaurentRDC:

I'd like the proposal to go a bit further. Since incremental changes are hard on the current hackage-server, what are some of the ways in which hackage-server-v2 will be forward-looking? How do we ensure that, in 10 years, there isn't a similar proposal for hackage-server-v3 because the architecture for hackage-server-v2 is lacking, or is hard to incrementally change?

Any reply here would be a bit philosophical. No reply will convince everyone, as there is no agreement in the community on what is right or wrong. What we can guarantee is that Tweag will use the best (and safe) practices as of 2026 (and avoid overly experimental approaches). At the very least we will split the storage and query layers, so that it will be possible to change the implementation without affecting other layers of the server, and we will take care of documentation. We believe that the proposed incremental approach to migration will ensure that the codebase can be modified without major rewrites, so there will be no need for a similar proposal.
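
To illustrate what splitting the storage and query layers could look like, here is a minimal sketch in Python rather than Haskell, purely for brevity. Every name in it (`PackageStore`, `InMemoryStore`, `render_versions`) is hypothetical and does not come from the proposal or from hackage-server; the only point is that the handler depends on an interface, so the backing store can be swapped without touching it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class PackageInfo:
    name: str
    versions: Tuple[str, ...]


class PackageStore(Protocol):
    """Hypothetical storage interface; not an actual hackage-server API."""

    def lookup(self, name: str) -> Optional[PackageInfo]: ...

    def add_version(self, name: str, version: str) -> None: ...


class InMemoryStore:
    """One possible backend. A relational backend would expose the same methods."""

    def __init__(self) -> None:
        self._packages: Dict[str, List[str]] = {}

    def lookup(self, name: str) -> Optional[PackageInfo]:
        versions = self._packages.get(name)
        if versions is None:
            return None
        return PackageInfo(name, tuple(versions))

    def add_version(self, name: str, version: str) -> None:
        self._packages.setdefault(name, []).append(version)


def render_versions(store: PackageStore, name: str) -> str:
    """A 'handler' that only knows about the interface, not the backend."""
    info = store.lookup(name)
    if info is None:
        return f"{name}: not found"
    return f"{name}: " + ", ".join(info.versions)
```

Under these assumptions, moving from one data store to another means writing a second class with the same two methods; `render_versions` stays unchanged.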

I'm not familiar with acid-state in practice, but I assume that it involves writing migrations in Haskell. SQL migrations can be painful if not managed appropriately

These days we prefer to use rel8 for working with the database (we already have a proof of concept for that) and sqitch for migrations. Both have been used in various Haskell projects, which supports the sustainability of the solution.

@hasufell, with regard to flora.pm: yes, we are definitely in contact with @Kleidukos. At this point (May 2026) flora.pm has some features that are not compatible with hackage (e.g. because of namespace support), and it has no background task coverage. If we went with flora.pm while keeping hackage-server as it is, that would require significant work in flora.pm itself as well as updates to the tooling, which would have to support the modern API. With all respect to flora.pm, which I believe is a very important project for the entire Haskell ecosystem, modernising hackage-server looks like the better strategy in terms of efficiency and required investment.

I also find it unclear who is going to maintain this project after the proposal is done and implemented.

We expect that the proper long-term strategy is for the Haskell Foundation to own hackage-server, as it's important that the core infrastructure does not depend on a single entity. But Tweag will support code maintenance and address bugs as much as we can.

@gbaz

HTML is generated manually throughout....
This is not true. Many, though not all pages are generated using the hstringtemplate library, and the usage could be further pursued.

Thanks! We will remove the false statement. In the course of the implementation we will check what the best way forward is: whether to pursue it further, or whether there is a safer or more efficient approach.

My main question is I don't understand the migration plan.

We will need to update the document to be more explicit, but long story short, we expect to have a second box for the duration of the migration. The only complex part is sharing access to the data storage during the first step of the migration, but this problem has well-known solutions.

... I do wonder if there's another "middle balance" for now, which is to not swap the whole of the backend at once from acid-state to postgres, but to just swap the most expensive part, which I believe is the packagedb.

This was a part of the migration plan: we first move the package db, and move the users db as a separate step. But now that you mention it, I am starting to think that this step would be a great milestone in our work. When we wrote the proposal we did not anticipate that, and saw the benefit to the community only once all the work was done. We still look forward to doing the complete rewrite, since the current approach to working with data still imposes some limitations. But I think it is worth mentioning the milestone explicitly.

... estimates ...

With regard to the timing: our initial, very safe estimate after the initial work was 6 months with 3 developers, but that would be too costly a request. With our experience of similar-looking packages and a concrete plan, 3 months with 2 developers is an optimistic but still feasible estimate for the iterative migration, even without any AI tools being involved. (Though it's possible that, if we hit unknown unknowns, we will be able to deliver only the packagedb-related milestone in that time.)

@hasufell: we do not plan to use an agentic approach for any code rewrite, i.e. one where the rewrite itself is done solely or largely using AI tools.


Follow-up actions from us:

  • add information about migrations.
  • remove statement about html generation.
  • add details about the migration steps and requirements.
  • add details on whether it's feasible to move only the packagedb-related parts to a relational database.

I'll add another comment once we complete those actions.

@LaurentRDC
Contributor

flora.pm has some features that are not compatible with hackage (e.g. because of namespace support)

Package namespaces are something that comes up quite often. Perhaps you could mention what hackage-server-v2 could do differently from hackage-server in order to allow this feature to be added in the future?

@Kleidukos

Kleidukos commented May 22, 2026

Regarding several things that have been said about https://flora.pm in this thread:

  • Is it ready to act as a package repository today?
    • No
  • Does it want to replace Hackage Central today?
    • No
  • Namespaces are incompatible with what we have, what do?
    • A swift read of https://flora.pm/documentation/namespaces will inform you that currently, namespaces on Flora refer to package repositories that are indexed, because https://flora.pm is a meta-index of Haskell repositories
      • If Cabal finally supports namespaces, then it's not much work on the Flora side.

I don't think flora-server is the adequate choice to replace hackage-server today. I'd like it if work on hackage-server could ensure that the service still works for people, like being able to upload through cabal upload without a timeout error, for instance.

@qnikst
Author

qnikst commented May 22, 2026

@LaurentRDC to be honest I would love Tweag to concentrate on the concrete problem: high memory usage (and the resulting instability, such as the cabal upload problem mentioned above), and to keep the interfaces of hackage-server-v2 fully compatible with v2.

And after this work we will be in a place where we can discuss what can be improved or adopted from other solutions. And think about more advanced features (e.g. namespaces) and the migration path to support them in the central repository. There are many interesting ideas floating around hackage so it's too easy to jump on the endless feature creep path.

For now I, personally, would prefer to leave exploration of the namespaces to the solutions that would solve it better (flora.pm).

@LaurentRDC
Contributor

LaurentRDC commented May 22, 2026

@qnikst I totally agree. I wasn't clear enough. My ask isn't to add this feature (namespaces) or any other; it's to ensure that the new hackage-server-v2 is designed in an extensible way, whereas hackage-server is apparently not, such that future work on hackage-server-v2 is easier than current work on hackage-server

That's a bit of a vague ask, I concede

@gbaz
Collaborator

gbaz commented May 22, 2026

To be clear, hackage-server is currently extensible despite living on (due to its age) an effectively custom web framework. It is modular, and mostly (with the exception of the problems caused by shared in-memory state) well factored. The problem is that the core foundations that it is built on are nonextensible, and along with that the general design feature of the Haskell language that migrating large chunks of code from pure to effectful can be extremely invasive.

Extending hackage with namespaces would not be difficult for technical reasons in the design of hackage. It would be difficult for reasons having to do with the design of namespaces vis-a-vis cabal, the package ecosystem as a whole, how packages are designed and dependencies declared, and even what the purpose would be and getting a large group to agree to it. I think that it's out of scope to worry about such things -- they're not hackage problems.

Ultimately, the problem with hackage as it exists is it was built using what is now a very idiosyncratic stack at its very foundations. This proposal seems sound to me particularly in that it swaps that out for broadly used and maintained code. In fact, I would hope that a full rewrite could lead to a significant drop in source lines, because much of the "framework" in the hackage codebase would not need to be rewritten -- rather that is now duplicative of servant, etc.

I do think the migration plan needs to be thought through in greater detail. On that front, we should check with the rest of the admin team but I understand hackage now runs on a cloud box because it outgrew the physical box we had allocated for it. However, we still have the physical box around -- so a cost-efficient way to have (at least for a while) two servers would be to use the current cloud and the physical box together -- but bear in mind those are not at the same datacenter, so it would not be especially easy for them to directly share disk space.

That said, mirroring packages alone can be done efficiently by polling, because one need only repeatedly check the timestamp.json file to check when the index-01 tarball has been updated, then incrementally refetch that to get the incremental updates to the core package store since last fetch.
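
A rough sketch of that polling loop, in Python for brevity. The two fetch callbacks and the `MirrorState` type are hypothetical stand-ins, not real Hackage client APIs, and the sketch assumes the index only ever grows by appending, so the bytes past the locally known length are exactly the update. The content of the change marker is treated as opaque.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MirrorState:
    marker: str = ""    # last change marker seen upstream (opaque to us)
    index: bytes = b""  # local copy of the index fetched so far


def poll_once(
    state: MirrorState,
    fetch_marker: Callable[[], str],
    fetch_index_from: Callable[[int], bytes],
) -> bool:
    """Run one polling step; return True if new index data was fetched.

    fetch_marker stands in for reading the timestamp file, and
    fetch_index_from(offset) for a ranged download of the index
    starting at that byte offset. Both are assumptions of this sketch.
    """
    marker = fetch_marker()
    if marker == state.marker:
        return False  # nothing changed since the last poll
    # Assumes an append-only index: only the tail is downloaded.
    state.index += fetch_index_from(len(state.index))
    state.marker = marker
    return True
```

A mirror would call `poll_once` on a timer and process the newly appended entries whenever it returns `True`.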

All told, I think the new server would probably be best written not with a balancer at front between old and new, but proxying all requests, and passing through the subset (large at first) that it did not know how to handle.

@LaurentRDC
Contributor

Thanks @gbaz , that's all very helpful!

@blackheaven

A quick word from the SRT: every few months, especially since the rise of LLM-based security audits, we get security reports (we still have ongoing reports).
We don't expect a possible rework to fix all the vulnerabilities, but could the proposal mention keeping security in mind, with regular (possibly async) code review and/or LLM reviews (if Tweag uses them)?

Address all the comments from the GitHub discussion.

Co-authored-by: Sandy Maguire <sandy.maguire@tweag.io>
@qnikst
Author

qnikst commented May 22, 2026

Thanks everyone for the comments.

We believe that we have addressed all of them: removed the false statement about the html generation, and added information about the migrations and security.

@gbaz we have changed the architecture of the solution so that all requests now go through hackage-server-v2, and unknown ones fall through to hackage-server.
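
As a rough illustration of that fall-through (a Python sketch, not the proposed Servant implementation), the front server can be modelled as a lookup over the routes it already handles, with everything else forwarded to the legacy handler. The paths and the `make_front` helper are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def make_front(new_routes: Dict[str, Handler], legacy: Handler) -> Handler:
    """Build a front handler: serve known paths, forward the rest.

    new_routes holds the endpoints already migrated to the new server;
    legacy stands in for proxying the request to the old server.
    """
    def front(path: str) -> str:
        handler = new_routes.get(path)
        if handler is None:
            return legacy(path)  # not migrated yet: pass through
        return handler(path)
    return front


# Example wiring with placeholder handlers and made-up paths.
front = make_front(
    {"/packages": lambda p: "new:" + p},
    lambda p: "old:" + p,
)
```

Migrating an endpoint then amounts to adding an entry to `new_routes`; the pass-through set shrinks as the rewrite progresses.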

As for sharing the disk, we don't think we need that. In the proposal we have 2 alternatives (lsync-based or pull-based), and the best one can be chosen together with the admin team, as it largely depends on the existing infrastructure.
I hope it matches your vision.


We propose a complete rewrite of `hackage-server`, into the following form. Although full rewrites are often hard to justify it is our opinion that this is the best approach forwards (see “Why Incremental Refactoring Is Not Feasible” and “Correctness Guarantees” for the specific details.)

The Hackage Server V2 project represents a *complete rewrite* of the existing infrastructure, utilizing contemporary Haskell libraries and development methodologies. This new version is architected into two primary segments:

@ysangkok ysangkok May 23, 2026


The current version of Hackage is v2, as you can see from the announcement. This version number was previously advertised on the main page. It is confusing to also label this proposal v2; it would be better to call it v3.

EDIT: Here is the Well-Typed blog post: https://www.well-typed.com/blog/2013/09/hackage-2-now-available-for-beta-testing/
