Show HN: I built an ISP infrastructure emulator from scratch with a custom vBNG (aether.saphal.me)

saphalpdyl 21 hours ago

Demo: https://aether.saphal.me GitHub: https://github.com/saphalpdyl/Aether

Aether is a multi-BNG (Broadband Network Gateway) ISP infrastructure lab, built almost from scratch, that emulates IPoE IPv4 subscriber management end-to-end. It runs a Python-based vBNG with RADIUS AAA, per-subscriber traffic shaping, and traffic simulation, all emulated on Containerlab. It is also my first personal networking project, built over roughly a month.

Motivations behind the project

I'm a CS sophomore. About three years ago, as an intern, I was assigned to build an OSS/BSS platform for a regional ISP by myself. Referencing demo.splynx.com, I developed most of the BSS side (bookkeeping, accounting, inventory management), but on the networking side, I managed to install and set up RADIUS and that was about it. I didn't have anyone to mentor me or ask questions to, so I gave up at that point.

Three years later, I decided to try cracking it again. This project is meant to serve as a learning reference for anyone who's been in that same position, i.e., staring at closed-source vendor stacks without proper guidance. This is absolutely not production-grade, but I hope it gives someone a place to start.

Architecture overview

The core component, the BNG, runs on an event-driven architecture where state changes are passed around as messages to avoid handling mutexes and locks. The session manager is the sole owner of the session state. To keep it clean and predictable, the BNG never accepts external input directly. The one exception is the Go RADIUS CoA daemon, which passes CoA messages in via IPC sockets. Everything the BNG produces (events, session snapshots) gets pushed to Redis Streams, where the bng-ingestor picks them up, processes them, and persists them.
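
A minimal sketch of the single-owner, message-passing pattern described above. This is not taken from the Aether codebase: the `Event` and `SessionManager` names, the event kinds, and the session key format are all hypothetical, and an in-memory list stands in for the Redis Streams publisher.

```python
import copy
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str            # hypothetical event names, e.g. "DHCP_ACK"
    session_key: str     # hypothetical key format, e.g. "bng-1|1/0/3"
    payload: dict = field(default_factory=dict)


class SessionManager:
    """Sole owner of session state; other components only enqueue events."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.sessions = {}   # only ever touched by the run() thread
        self.emitted = []    # stand-in for a Redis Streams publisher

    def run(self):
        while True:
            event = self.inbox.get()
            if event is None:          # sentinel: stop the loop
                break
            self.handle(event)

    def handle(self, event):
        if event.kind == "DHCP_ACK":
            self.sessions[event.session_key] = {"state": "ACTIVE", **event.payload}
        elif event.kind == "COA_DISCONNECT":
            self.sessions.pop(event.session_key, None)
        # Emit a deep copy so consumers never share mutable state.
        self.emitted.append((event.kind, copy.deepcopy(self.sessions)))


def demo():
    manager = SessionManager()
    worker = threading.Thread(target=manager.run)
    worker.start()
    manager.inbox.put(Event("DHCP_ACK", "bng-1|1/0/3", {"ip": "10.0.0.10"}))
    manager.inbox.put(Event("COA_DISCONNECT", "bng-1|1/0/3"))
    manager.inbox.put(None)
    worker.join()
    return manager
```

Because only the `run()` thread reads or writes `sessions`, no lock is needed; a CoA daemon or DHCP handler would only ever call `inbox.put(...)`.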

Simulation and meta-configs

I am generating traffic through a simulator node that mounts the host's Docker socket and runs docker exec commands on selected hosts. The topology.yaml used by Containerlab to define the network topology grows bigger as more BNGs and access nodes are added. So aether.config.yaml, a simpler configuration, is consumed by the configuration pipeline to generate the topology.yaml and other files (nginx.conf, kea-dhcp.conf, RADIUS clients.conf, etc.).
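
To illustrate the idea of expanding a compact config into a larger topology file, here is a rough sketch. The input schema (`bngs`, `access_nodes`, `images`) and the image names are invented for this example and are not the real aether.config.yaml format. The output follows the general Containerlab layout of a top-level `name` plus `topology.nodes` and `topology.links` with `endpoints` pairs.

```python
import json


def build_topology(config):
    """Expand a compact config into a Containerlab-style topology dict."""
    nodes = {}
    links = []
    for bng in config["bngs"]:
        bng_name = bng["name"]
        nodes[bng_name] = {"kind": "linux", "image": config["images"]["bng"]}
        # Each access node gets its own BNG-facing interface: eth1, eth2, ...
        for index, access in enumerate(bng["access_nodes"], start=1):
            nodes[access] = {"kind": "linux", "image": config["images"]["access"]}
            links.append({"endpoints": [f"{access}:eth1", f"{bng_name}:eth{index}"]})
    return {"name": config["name"], "topology": {"nodes": nodes, "links": links}}


# Hypothetical compact config: two BNGs and three access nodes.
example_config = {
    "name": "aether-demo",
    "images": {"bng": "example/bng:latest", "access": "example/access:latest"},
    "bngs": [
        {"name": "bng-1", "access_nodes": ["acc-1a", "acc-1b"]},
        {"name": "bng-2", "access_nodes": ["acc-2a"]},
    ],
}

topology = build_topology(example_config)
print(json.dumps(topology, indent=2))
```

The sketch prints JSON to stay dependency-free; since JSON is also valid YAML, the output could be written to a .yaml file as-is, or dumped with a YAML library for readability.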

Known Limitations

- Multiple veth hops through the emulated topology add significant overhead. Profiling with iperf3 (-P 10 -t 10, 9500 MTU, 24 vCPUs) shows BNG→upstream at ~24 Gbit/s, but host→BNG→upstream drops to ~3.5 Gbit/s. The 9500 MTU also isn't representative of real ISP deployments. This gets worse when the actual network is reintroduced, capping my throughput at 1.6 Gbit/s locally.
- The circuit ID format (1/0/X) is non-standard. I simplified it for clarity.
- No iBGP or VLAN support.
- No IPv6 support. I targeted IPv4 networks from the start to avoid going too broad without enough depth.

Nearly everything I know about networking (except some sections from AWS) I learned building this. A lot was figured out on the fly, so engineers will likely spot questionable decisions in the codebase. I'd genuinely appreciate that feedback.

Questions

- Currently, the circuit where the user connects is arbitrarily decided by the demo user. In a real system with thousands of circuits, it'd be very difficult to properly assess which circuit the customer might connect to. When adding a new customer to a service, how does the operator decide, based on the customer's location, which circuit to provide the service to?

error503 14 hours ago

> - Currently, the circuit where the user connects is arbitrarily decided by the demo user. In a real system with thousands of circuits, it'd be very difficult to properly assess which circuit the customer might connect to. When adding a new customer to a service, how does the operator decide, based on the customer's location, which circuit to provide the service to?

I'm not exactly sure what you're asking, but port allocation is, depending on the ISP's deployment model, either going to be fixed at the time the infrastructure was built, or whoever is doing the last metre install will choose a random available port on the switch. The subscriber will be assigned to that port in the RADIUS or equivalent database, and the BNG will query the subscriber based on DHCP Option 82 port information added by the switch. You could also map the subscriber based on MAC address, but this doesn't really work unless you don't support customer provided equipment on their end.

saphalpdyl 14 hours ago

My access edge is injecting DHCP Option 82, and I'm mapping customers based on (bng_id + circuit_id + remote_id). Say a customer on Oakwood Drive ABC wants a service. What is the process between storing the customer's desired address and finding the best circuit to connect them to? Since, as mentioned in this thread, connecting to the wrong circuit can cause network noise for other customers too, how is the "cleanest" circuit + port assigned to a customer in a given location?
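
The (bng_id + circuit_id + remote_id) mapping mentioned above can be sketched roughly as follows. This is an illustration, not Aether's actual code: it assumes ASCII-encoded sub-option values and a hypothetical in-memory `SUBSCRIBERS` table. The sub-option codes (1 for Agent Circuit ID, 2 for Agent Remote ID) come from RFC 3046.

```python
CIRCUIT_ID = 1  # Agent Circuit ID sub-option (RFC 3046)
REMOTE_ID = 2   # Agent Remote ID sub-option (RFC 3046)

# Hypothetical subscriber table keyed by (bng_id, circuit_id, remote_id).
SUBSCRIBERS = {("bng-1", "1/0/3", "acc-1a"): "alice"}


def parse_option82(data: bytes) -> dict:
    """Parse the payload of DHCP option 82 into {sub-option code: value}."""
    suboptions = {}
    offset = 0
    while offset + 2 <= len(data):
        code, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option")
        suboptions[code] = value
        offset += 2 + length
    if offset != len(data):
        raise ValueError("trailing bytes after last sub-option")
    return suboptions


def lookup_subscriber(bng_id: str, option82: bytes):
    """Return the subscriber for this port, or None if unknown."""
    subs = parse_option82(option82)
    if CIRCUIT_ID not in subs or REMOTE_ID not in subs:
        return None
    key = (bng_id, subs[CIRCUIT_ID].decode("ascii"), subs[REMOTE_ID].decode("ascii"))
    return SUBSCRIBERS.get(key)


# Example payload: circuit ID "1/0/3" and remote ID "acc-1a".
raw = bytes([1, 5]) + b"1/0/3" + bytes([2, 6]) + b"acc-1a"
print(lookup_subscriber("bng-1", raw))
```

Real relay agents often encode these sub-options in vendor-specific binary formats, so the ASCII decoding here is only a simplifying assumption.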

error503 12 hours ago

Depends on the access technology and environment. But usually there is not much choice to be made, by design. The cable or equivalent from the customer prem will go to exactly one aggregation location, and in that location, the choice of port generally doesn't matter. Among the potentially multiple cables or ports, they're all meant to be functionally the same. Maybe something is wrong with a cable or port, and that will hopefully come out in post-install testing, but there's not meant to be much of a decision to be made for commodity service like DSL or GPON (anything that'd use BNG). It's typically just going to be up to the last metre installer.

Metro ethernet services will be designed by an engineering team on a case-by-case basis, but they very rarely if ever use BNG.

saphalpdyl 10 hours ago

Thank you! That answers my question. I appreciate your feedback.

yjftsjthsd-h 17 hours ago

Forgive my ignorance, this isn't my strong suit. Am I correct in understanding that this is mostly a simulation layer for the actual physical network, but that you're mostly(?) running off-the-shelf software on top? So this is running the same software that you'd use for a real ISP network, just without having to actually provision all the hardware? Or is part of the actual network management custom as well?

saphalpdyl 15 hours ago

Hello. Containerlab gives me the virtual network topology (links through veth pairs, containers, etc.). The BNG's actual control plane (authentication, authorization, session handling, traffic shaping, event streaming, etc.) is written by me. So it's less running off-the-shelf software on virtualized hardware, and more writing the software and running it on virtualized hardware.

At some point, I did use Nokia SR Linux as my access node + relay, but had issues with configuration and Option 82. Later, I wrote one myself.

chaz6 13 hours ago

Thanks for sharing! I am happy to see open-source BNG projects taking off in the last few months. These are a couple of others to look at:-

https://github.com/codelaboratoryltd/bng-edge-infra

https://github.com/veesix-networks/osvbng

saphalpdyl 10 hours ago

That is awesome! I was walking almost blindfolded here, making decisions based on observations, intuition, some blogs, and RFCs. These projects give me something to compare against as I further develop my skills.

saphalpdyl 17 hours ago

I recently found out about NetBox, which would act as the authoritative source of truth for the network topology and replace the majority of aether.config.yaml. In Splynx, I did not see any mention of an external solution. It seems they have their own stack for that.

A better and UX-friendly implementation would have been Netbox + aether.config.yaml -> configuration pipeline -> topology.yaml + <other generated files>.

john_strinlai 16 hours ago

this looks pretty interesting! i plan to take a closer look after work, but thought i would mention it now: it may be worth a look through the NANOG (north american network operators group) archives (https://nanog.org/nanog-mailing-list/list-archives/) for information around your question if you havent, and/or posting your question to the NANOG mailing list. there are many very friendly people who have experience running ISPs of all sizes.

(or whichever operators group best fits your area. i only subscribe to NANOG, so cant speak to the activity/friendliness of the other groups. you can find a pretty comprehensive list here: https://nanog.org/resources/organizations-our-community/)

saphalpdyl 14 hours ago

Subscribed! I am very new to things here since it's barely been 1.5 months since I touched this area, so many of my questions have probably been answered already. I will search around and post a question if I can't find an answer.

I plan to take all the feedback I can this week, and work on it over spring break.

nineteen999 12 hours ago

This brings back fond memories of my first real job in IT, as the sysadmin for a small boutique mom-n-pop ISP. This was dialup/ISDN days though (back in the late 90's).

Good job!

<sniff>

saphalpdyl 10 hours ago

I appreciate the compliment. I wish this type of knowledge was more easily available to the general public, since it represents an integral part of the modern-day internet. A comment on this thread mentioned other similar ongoing projects, which I'm very happy about and excited to explore.

calebelac 11 hours ago

Thank you for sharing. This is really cool and way more than I accomplished as a sophomore in CS. Keep it up!

saphalpdyl 10 hours ago

Thank you. I'm exploring more on this. Getting feedback from people here has hyped me up even more for the next step.

nonameiguess 17 hours ago

I feel like you were done dirty. When I was in grad school 12 years ago, our networking classes used mininet to simulate networks on a single host. It's mostly meant for developing SDN systems, but probably would have met your needs and supports way more.

On the other hand, building even a tiny subset but doing it yourself from scratch is a great way to learn. I made a very poor man's VM image builder for Hyper-V years back because Packer didn't have a builder for it at the time, and that was a pretty interesting experience. Finally grokked the Windows object model and even though I still don't use it, I at least no longer jeer at PowerShell.

I'm interested in the answer to your question, too, but as a customer of an ISP. I don't work for one. I was the first owner of my house and when they hooked me into their network, whoever did messed up my neighbors badly, putting them on the wrong circuit and bleeding noise into adjacent neighborhoods. For three years, complaint calls would get our network cut by third-party contractors with no warning, then we'd have to call and get it reconnected. I don't know how they're supposed to do it, but know it can cause quite a mess when they do it wrong.

saphalpdyl 15 hours ago

Thank you for the comment.

Mininet did help me a lot during the initial phases. The main reason I switched to containernet (a mininet fork with containers) and then to containerlab was that I wanted to run an actual NOS image as part of my topology. That was really what pushed me to try other options.

Yup, it's a different experience. Sometimes, you end up learning something you never even intended to.

As for the circuit planning, I'd guess that they have circuits mapped in something like NetBox and use an intermediary system that maps a customer's location to the nearest circuit. Though I don't know how they handle the optimization side of it to prevent cases like what happened to your neighbour.
