We recommend that new contributors start by writing documentation, which helps you quickly understand the SGLang codebase. Most documentation files are located under the docs/ folder. We prefer Jupyter Notebooks over Markdown so that all examples can be executed and validated by our docs CI pipeline.
// CUDA-C includes
#include <cuda.h>
#include <stdio.h>
#include "../common/book.h"
extern "C"
int runCudaPart();
// CODE goes below
#include <iostream>
#include <cstdint>
template<int i, uint64_t val, typename Function>
class Loop {
public:
    static inline void call(Function f){
        f(i, val);
        Loop<i-1, val, Function>::call(f);
    }
#include <iostream>
#include <fstream>
#include <unistd.h>
void process_mem_usage(double& vm_usage, double& resident_set)
{
    vm_usage = 0.0;
    resident_set = 0.0;
    // the two fields we want
"""
This model integrates the MoE concept within a Transformer architecture. Each token's
representation is processed by a subset of experts, determined by the gating mechanism.
This architecture allows for efficient and specialized handling of different aspects of the
data, aiming for the adaptability and efficiency noted in the Mixtral 8x7B model's design
philosophy. The model activates only a fraction of the available experts for each token,
significantly reducing the computational resources needed compared to activating all experts
for all tokens.
"""
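The routing idea in the docstring above can be sketched in a few lines of plain Python. This is a minimal illustration, not the gist's model: the names `route_token` and `moe_forward`, the toy experts, the choice of `top_k=2`, and normalizing the gate weights with a softmax over only the selected logits are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(gate_logits, top_k=2):
    """Pick the top_k experts for one token and return [(expert_index, weight), ...].

    Assumption: weights come from a softmax over the selected logits only.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(token, gate_logits, experts, top_k=2):
    """Weighted sum of the outputs of the selected experts; the others never run."""
    output = [0.0] * len(token)
    for index, weight in route_token(gate_logits, top_k):
        expert_out = experts[index](token)
        output = [acc + weight * value for acc, value in zip(output, expert_out)]
    return output

# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda x, scale=i + 1: [scale * v for v in x] for i in range(8)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.3, 1.0]

routes = route_token(gate_logits, top_k=2)
mixed = moe_forward([1.0, 2.0], gate_logits, experts, top_k=2)
print(routes)
print(mixed)
```

With eight experts and `top_k=2`, only two expert functions are called per token, which is the source of the compute saving the docstring describes.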
// Thank you to the folks at the C++ slack channel,
// along with @lewissbaker for the excellent literature
// (even though it took me a few days to be convinced
// it really was so).
#include <uv.h>
#include <iostream>
#include <experimental/coroutine>
import networkx as nx
from itertools import product
"""
When we compare this code with Airflow, the strengths of your code lie in its simplicity, lightweight nature, and the ability to easily integrate with existing Python code:
Simplicity: This code provides a simple and straightforward way to model and work with DAGs without needing to go through the process of setting up and configuring a comprehensive system like Airflow. For smaller teams or projects with less complexity, this can be an advantage.
Lightweight and easy to incorporate: Your code is a compact, single-file solution that can be easily integrated into an existing Python project without having to set up an entire Airflow environment. When your primary focus is on creating task dependencies with parameter combinations, rather than scheduling and monitoring, your code is easier to incorporate.
Focused on task generation: Your code emphasizes creating a Cartesian product of tasks associated with nodes' parameters. It is geared towards tackling
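The "Cartesian product of tasks associated with nodes' parameters" mentioned above could look roughly like the sketch below. The preview does not show the gist's own implementation, so everything here is hypothetical: the `deps` and `params` dictionaries, the `expand_tasks` helper, and the use of the standard library's `graphlib.TopologicalSorter` (Python 3.9+) in place of networkx so the example has no third-party dependency.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical DAG: each node maps to the set of nodes it depends on.
deps = {"load": set(), "train": {"load"}, "eval": {"train"}}

# Hypothetical parameter grids, one per node.
params = {
    "load": {"split": ["train", "test"]},
    "train": {"lr": [0.1, 0.01], "seed": [0, 1]},
    "eval": {},
}

def expand_tasks(deps, params):
    """Return (node, kwargs) pairs in dependency order, one per parameter combination."""
    tasks = []
    for node in TopologicalSorter(deps).static_order():
        grid = params.get(node, {})
        keys = sorted(grid)
        # product() with no iterables yields a single empty tuple,
        # so a node without parameters still produces exactly one task.
        for combo in product(*(grid[k] for k in keys)):
            tasks.append((node, dict(zip(keys, combo))))
    return tasks

tasks = expand_tasks(deps, params)
for node, kwargs in tasks:
    print(node, kwargs)
```

Here `load` expands to two tasks, `train` to four, and `eval` to one, and every task for a node appears after all tasks of the nodes it depends on.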
# This isn't supposed to run as a bash script; I named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
import cPickle
import pickle
import json
import random
from time import time
from hashlib import md5
test_runs = 1000
def float_list():
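The preview above stops before the benchmark itself, so the following is only a guess at its general shape, written for Python 3. `cPickle` exists only in Python 2; in Python 3 the `pickle` module uses its C accelerator automatically when available. The body of `float_list`, the `time_roundtrip` helper, and the run count are assumptions made for this sketch.

```python
import json
import pickle
import random
from time import perf_counter

def float_list(n=1000, seed=0):
    """Hypothetical payload: n pseudo-random floats from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def time_roundtrip(dumps, loads, obj, runs=100):
    """Serialize and deserialize obj `runs` times; return (seconds, last restored copy)."""
    restored = None
    start = perf_counter()
    for _ in range(runs):
        restored = loads(dumps(obj))
    return perf_counter() - start, restored

data = float_list()
codecs = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (json.dumps, json.loads),
}

timings = {}
copies = {}
for name, (dumps, loads) in codecs.items():
    timings[name], copies[name] = time_roundtrip(dumps, loads, data)
    print(f"{name}: {timings[name]:.4f} s for 100 round trips")
```

Absolute timings depend on the machine and the Python version, so the numbers are only meaningful relative to each other within a single run.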
Single-process:
python main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
Multi-process:
python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/