crazyguitar / gist:c183db310d75cc48beff8922817e4454

Created February 17, 2026 22:18 — forked from gautambak/gist:2836941

CUDA memory example


	// CUDA-C includes
	#include <cuda.h>
	#include <stdio.h>
	#include "../common/book.h"

	extern "C"
	int runCudaPart();

	// CODE goes below

crazyguitar / docs.md

Created February 8, 2026 06:53 — forked from rbiswasfc/docs.md

SGLang 0.4.3

File: README.md

SGLang Documentation

We recommend that new contributors start by writing documentation, which helps you quickly understand the SGLang codebase. Most documentation files are located under the docs/ folder. We prefer Jupyter Notebooks over Markdown so that all examples can be executed and validated by our docs CI pipeline.

Docs Workflow

Install Dependency

crazyguitar / gist:4457149d58218c75c83c9a792bcf938f

Created December 6, 2025 07:22 — forked from gcmurphy/gist:6217834

Templates for loop unrolling / metaprogramming.

	#include <iostream>
	#include <cstdint>

	template<int i, uint64_t val, typename Function>
	class Loop {
	public:
	static inline void call(Function f){
	f(i, val);
	Loop<i-1, val, Function>::call(f);
	}

crazyguitar / memory_check.cpp

Created December 6, 2025 06:51 — forked from thirdwing/memory_check.cpp

C++ code to print out runtime memory usage

	#include <iostream>
	#include <fstream>
	#include <unistd.h>

	void process_mem_usage(double& vm_usage, double& resident_set)
	{
	vm_usage = 0.0;
	resident_set = 0.0;

	// the two fields we want
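The preview above stops before the two fields are read. As a rough Python sketch of the same idea (not the gist's C++ code), the snippet below reads virtual memory size and resident set size from `/proc/self/statm` on Linux. The helper name `parse_statm` and the choice of `statm` as the source file are assumptions made for illustration; the gist itself may read a different file.

```python
import os


def parse_statm(text, page_size_bytes):
    """Convert the first two /proc/<pid>/statm fields (in pages) to KiB."""
    size_pages, resident_pages = (int(field) for field in text.split()[:2])
    page_kb = page_size_bytes / 1024.0
    return size_pages * page_kb, resident_pages * page_kb


def process_mem_usage(statm_path="/proc/self/statm"):
    """Return (vm_usage_kb, resident_set_kb) for the current process (Linux only)."""
    with open(statm_path) as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


# Only attempt the live read where procfs is available.
if os.path.exists("/proc/self/statm"):
    vm_usage, resident_set = process_mem_usage()
    print(f"VM: {vm_usage:.0f} KiB; RSS: {resident_set:.0f} KiB")
```

Keeping the parsing separate from the file read makes the conversion testable on systems without procfs.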

crazyguitar / MoE.py

Created October 4, 2025 04:00 — forked from ruvnet/MoE.py

A PyTorch implementation of a Mixture of Experts (MoE) model resembling the Mixtral 8x7B architecture, with detailed inline comments. This model combines transformer layers with an MoE layer consisting of 8 experts, aiming for high efficiency by activating only 2 experts per token. It's configured with dimensions reflecting the operational effic…

	"""
	This model integrates the MoE concept within a Transformer architecture. Each token's
	representation is processed by a subset of experts, determined by the gating mechanism.
	This architecture allows for efficient and specialized handling of different aspects of the
	data, aiming for the adaptability and efficiency noted in the Mixtral 8x7B model's design
	philosophy. The model activates only a fraction of the available experts for each token,
	significantly reducing the computational resources needed compared to activating all experts
	for all tokens.
	"""

crazyguitar / coro.cpp

Created January 8, 2025 07:23 — forked from Qix-/coro.cpp

C++20 coroutines + LibUV sample, v2

	// Thank you to the folks at the C++ slack channel,
	// along with @lewissbaker for the excellent literature
	// (even though it took me a few days to be convinced
	// it really was so).

	#include <uv.h>

	#include <iostream>
	#include <experimental/coroutine>

crazyguitar / dag.py

Created December 25, 2024 00:13 — forked from OhadRubin/dag.py

	import networkx as nx
	from itertools import product
	"""
	When we compare this code with Airflow, the strengths of your code lie in its simplicity, lightweight nature, and the ability to easily integrate with existing Python code:

	Simplicity: This code provides a simple and straightforward way to model and work with DAGs without needing to go through the process of setting up and configuring a comprehensive system like Airflow. For smaller teams or projects with less complexity, this can be an advantage.

	Lightweight and easy to incorporate: Your code is a compact, single-file solution that can be easily integrated into an existing Python project without having to set up an entire Airflow environment. When your primary focus is on creating task dependencies with parameter combinations, rather than scheduling and monitoring, your code is easier to incorporate.

	Focused on task generation: Your code emphasizes creating a Cartesian product of tasks associated with nodes' parameters. It is geared towards tackling
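The idea of expanding each node's parameters into a Cartesian product of tasks can be sketched as follows. This is a simplified illustration under stated assumptions rather than the gist's code: it uses the standard-library `graphlib` in place of `networkx`, and the node names, parameter grids, and the `expand_tasks` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical DAG: each node maps to the set of nodes it depends on.
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}

# Hypothetical per-node parameter grids.
param_grid = {
    "preprocess": {"tokenizer": ["bpe", "word"]},
    "train": {"lr": [1e-3, 1e-4], "seed": [0, 1]},
    "evaluate": {},
}


def expand_tasks(dependencies, param_grid):
    """Return (node, params) tasks in dependency order, one per parameter combination."""
    tasks = []
    for node in TopologicalSorter(dependencies).static_order():
        grid = param_grid.get(node, {})
        keys = list(grid)
        # product() with no iterables yields a single empty tuple, so a node
        # without parameters still produces exactly one task.
        for values in product(*(grid[k] for k in keys)):
            tasks.append((node, dict(zip(keys, values))))
    return tasks


tasks = expand_tasks(dependencies, param_grid)
for node, params in tasks:
    print(node, params)
```

This sketch expands each node independently; combining parameters across dependent nodes would need an extra product over upstream tasks.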

crazyguitar / nsight.sh

Created October 4, 2024 18:23 — forked from mcarilli/nsight.sh

Favorite Nsight Systems profiling commands for PyTorch scripts

	# This isn't supposed to run as a bash script; I named it with ".sh" for syntax highlighting.

	# https://developer.nvidia.com/nsight-systems
	# https://docs.nvidia.com/nsight-systems/profiling/index.html

	# My preferred nsys (command line executable used to create profiles) commands
	#
	# In your script, write
	# torch.cuda.nvtx.range_push("region name")
	# ...

crazyguitar / bench.py

Created June 15, 2024 02:06 — forked from marians/bench.py

Benchmarking serialization/deserialization in Python using json, pickle and cPickle

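	try:
		import cPickle  # Python 2 only
	except ImportError:
		import pickle as cPickle  # Python 3: pickle already uses the C implementation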
	import pickle
	import json
	import random
	from time import time
	from hashlib import md5

	test_runs = 1000

	def float_list():
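The preview is cut off at this point. A minimal Python 3 sketch of this kind of benchmark, assuming a list of random floats as the payload, might look like the following. The helper names `make_float_list` and `bench` and the run counts are hypothetical and not taken from the gist.

```python
import json
import pickle
import random
from time import perf_counter


def make_float_list(n=1000, seed=0):
    """Build a reproducible list of n random floats."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def bench(dumps, loads, payload, runs=100):
    """Time `runs` serializations and `runs` deserializations; return seconds for each."""
    start = perf_counter()
    for _ in range(runs):
        blob = dumps(payload)
    dump_s = perf_counter() - start

    start = perf_counter()
    for _ in range(runs):
        restored = loads(blob)
    load_s = perf_counter() - start

    # Sanity check: the round trip must reproduce the payload.
    assert restored == payload
    return dump_s, load_s


payload = make_float_list()
results = {
    "json": bench(json.dumps, json.loads, payload),
    "pickle": bench(pickle.dumps, pickle.loads, payload),
}
for name, (dump_s, load_s) in results.items():
    print(f"{name}: dump {dump_s:.4f}s, load {load_s:.4f}s")
```

Timings depend heavily on payload shape and Python version, so the numbers are only meaningful relative to each other on the same machine.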

crazyguitar / commands.md

Created June 11, 2024 20:13 — forked from mcarilli/commands.md

Single- and multiprocess profiling workflow with nvprof and NVVP (Nsight Systems coming soon...)

Ordinary launch commands (no profiling):

Single-process:

python main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/

Multi-process:

python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/

CHANG-NING TSAI crazyguitar
