Project author: yinlixiang

Project description:
Learn CPython Internals
Language:
Project URL: git://github.com/yinlixiang/Python_VM.git
Created: 2019-04-23T21:09:24Z
Project community: https://github.com/yinlixiang/Python_VM

License: GNU General Public License v3.0


Reading the CPython Source Code

Python Grammar

Python Grammar:
Introduces Python's grammar, the compile-and-execute process, and UML diagrams of the CPython architecture.

How Python Source Code Is Executed

https://hackmd.io/s/ByMHBMjFe

repo: https://github.com/python/cpython/tree/29d018aa63b72161cfc67602dc3dbd386272da64

  1. main [Programs/python.c]
  2. => Py_Main [Modules/main.c]
  3. => pymain_main
  4. => pymain_init
  5. => _PyRuntime_Initialize
  6. => _Py_InitializeFromWideArgs
  7. => init_python
  8. => _Py_InitializeMainInterpreter
  9. => _Py_RunMain
  10. => PyRun_AnyFileExFlags
  11. => PyParser_ASTFromFileObject
  12. => PyParser_ParseFileObject
  13. => PyTokenizer_FromFile
  14. => parsetok: for (;;) {PyTokenizer_Get}
  15. => PyAST_FromNodeObject
  16. => run_mod
  17. => PyAST_CompileObject
  18. => PySymtable_BuildObject:
  19. symtable_visit_stmt(st,stmt_ty) for stmt_ty in asdl_seq
  20. => compiler_mod
  21. => compiler_enter_scope
  22. => compiler_body:
  23. VISIT(c, stmt, stmt_ty) for stmt_ty in asdl_seq
  24. => compiler_exit_scope
  25. => assemble
  26. => run_eval_code_obj
  27. => PyEval_EvalCode
  28. => PyEval_EvalCodeEx
  29. => _PyEval_EvalCodeWithName
  30. => _PyFrame_New_NoTrack
  31. => PyEval_EvalFrameEx
  32. => eval_frame
  33. => _PyEval_EvalFrameDefault:
  34. main_loop
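The tokenizer stage of the chain above (steps 13 and 14, where parsetok calls PyTokenizer_Get in a loop) can be observed from Python. This is only a rough sketch: it uses the standard-library tokenize module, which is assumed here to be a close enough stand-in for the C tokenizer and is not necessarily the same code path. The helper name token_stream is made up for this example.

```python
import io
import tokenize

source = "x = 1 + 2\n"


def token_stream(text):
    """Yield (token name, token string) pairs, one per token.

    Hypothetical helper: it mirrors the idea of calling PyTokenizer_Get
    repeatedly until the end marker is reached.
    """
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        yield tokenize.tok_name[tok.type], tok.string


tokens = list(token_stream(source))
for name, string in tokens:
    print(f"{name:10} {string!r}")
```

The parser consumes this stream one token at a time; the last token is always the end marker.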

Design of CPython’s Compiler

https://cpython-devguide.readthedocs.io/compiler

Compiler process:

  1. Parse source code into a parse tree (Parser/parsetok.c)
  2. Transform parse tree into an Abstract Syntax Tree (Python/ast.c)
  3. Transform AST into a Control Flow Graph (Python/compile.c)
  4. Emit bytecode based on the Control Flow Graph (Python/compile.c)

Execution:

  1. Executes byte code (Python/ceval.c)
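The stages above can be walked through from Python itself. This is a minimal sketch using the built-in ast module, compile() and exec(); the parse tree and the CFG are internal to the C code and are not visible at this level, so only the AST, the code object and the execution step show up. The file name "<demo>" is an arbitrary placeholder.

```python
import ast
import dis

source = "result = 6 * 7\n"

# Source text -> AST (covers the parse tree and AST steps in one call).
tree = ast.parse(source, filename="<demo>", mode="exec")

# AST -> code object (the CFG and bytecode emission happen inside compile()).
code = compile(tree, filename="<demo>", mode="exec")

# Code object -> execution by the bytecode interpreter.
namespace = {}
exec(code, namespace)
print(namespace["result"])

# Show the bytecode that was executed; the exact listing varies by version.
dis.dis(code)
```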

Parse Trees

CPython's parser is an LL(1) parser; for background see Compilers: Principles, Techniques, and Tools.

Python grammar: Grammar/Grammar, Include/graminit.h

Python tokens: Grammar/Tokens, Include/token.h

The parse tree: Include/node.h

  • CHILD(node *, int)
  • RCHILD(node *, int)
  • NCH(node *): Number of children
  • STR(node *)
  • TYPE(node *)
  • REQ(node *, TYPE)
  • LINENO(node *)
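To make the accessor macros concrete, here is a toy Python model of a parse tree node. It is an illustration only: the real macros are C preprocessor macros over the node struct in Include/node.h, the tree below is flattened far more than a real parse tree would be, and EXPR_STMT is a made-up number rather than the value from graminit.h.

```python
import token
from dataclasses import dataclass, field
from typing import List, Optional

EXPR_STMT = 300  # hypothetical nonterminal number, not taken from graminit.h


@dataclass
class Node:
    """Toy stand-in for the C node struct."""
    n_type: int
    n_str: Optional[str] = None
    n_lineno: int = 0
    n_child: List["Node"] = field(default_factory=list)


def NCH(n):
    """Number of children."""
    return len(n.n_child)


def CHILD(n, i):
    """The i-th child, counted from the left."""
    return n.n_child[i]


def RCHILD(n, i):
    """Child counted from the right; i is negative, so -1 is the last child."""
    return CHILD(n, NCH(n) + i)


def STR(n):
    return n.n_str


def TYPE(n):
    return n.n_type


def LINENO(n):
    return n.n_lineno


def REQ(n, node_type):
    """Assert that the node has the required type."""
    assert TYPE(n) == node_type, f"expected {node_type}, got {TYPE(n)}"


# A heavily flattened tree for the statement "x = 1".
root = Node(EXPR_STMT, n_lineno=1, n_child=[
    Node(token.NAME, "x", 1),
    Node(token.EQUAL, "=", 1),
    Node(token.NUMBER, "1", 1),
])

REQ(root, EXPR_STMT)
print(NCH(root), STR(CHILD(root, 0)), STR(RCHILD(root, -1)))
```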

Parser/parsetok.c

  • parsetok

Abstract Syntax Trees (AST)

The Zephyr Abstract Syntax Description Language - Princeton CS

Python AST nodes: Parser/Python.asdl, Parser/asdl.py

Python/asdl.c, Include/asdl.h

Python/Python-ast.c, Include/Python-ast.h

xxx_ty: the C type of an AST node (for example stmt_ty)

asdl_seq *: a sequence of AST nodes

  • _Py_asdl_seq_new(Py_ssize_t, PyArena *)
  • asdl_seq_GET(asdl_seq *, int)
  • asdl_seq_SET(asdl_seq *, int, stmt_ty)
  • asdl_seq_LEN(asdl_seq *)
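On the Python side, the ast module exposes these sequences as ordinary lists. The sketch below assumes that Module.body is the Python-level view of the asdl_seq of statements held by the C module node; the comments point to the C helper each operation roughly corresponds to.

```python
import ast

module = ast.parse("x = 1\ny = x + 1\nprint(y)\n")

stmts = module.body              # a sequence of statement nodes
count = len(stmts)               # roughly asdl_seq_LEN(seq)
first = stmts[0]                 # roughly asdl_seq_GET(seq, 0)

print(count)
for stmt in stmts:
    print(type(stmt).__name__, "at line", stmt.lineno)
```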

Memory Management

Arena: memory is pooled in a single location so that it can be allocated easily and released all at once.

Include/pyarena.h, Python/pyarena.c

PyArena structure

  • PyArena_New()
  • PyArena_Free()
  • PyArena_AddPyObject()
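The arena idea can be sketched in a few lines of Python. This is a toy model, not how Python/pyarena.c is written: the real arena hands out raw memory blocks in C, whereas the hypothetical Arena class below only records objects so that they can all be released together, which is the part of the behavior that PyArena_AddPyObject() and PyArena_Free() are assumed to share with it.

```python
class Arena:
    """Toy arena: owns every registered object and drops them all at once."""

    def __init__(self):
        self._objects = []
        self.freed = False

    def add(self, obj):
        """Register an object so it stays alive until the arena is freed."""
        if self.freed:
            raise RuntimeError("arena already freed")
        self._objects.append(obj)
        return obj

    def free(self):
        """Release every registered object in one step."""
        self._objects.clear()
        self.freed = True

    def __len__(self):
        return len(self._objects)


arena = Arena()
node_a = arena.add({"kind": "Assign"})
node_b = arena.add({"kind": "Expr"})
size_before = len(arena)

arena.free()
size_after = len(arena)
print(size_before, size_after)
```

The point of the design is that the compiler never frees AST nodes one by one; it frees the arena once compilation is finished.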

Parse Tree to AST

Python/ast.c

  • PyAST_FromNode()
    • PyAST_FromNodeObject()
      • ast_for_xxx => xxx_ty
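The result of this conversion is that each kind of statement ends up as its own node type. The sketch below checks that from Python with the ast module; it shows the outcome only, not the ast_for_xxx functions themselves, and the helper name statement_kind is made up for this example.

```python
import ast

samples = ["x = 1", "if x: pass", "for i in y: pass", "def f(): pass"]


def statement_kind(source):
    """Return the AST class name of the first statement in the source."""
    module = ast.parse(source)
    return type(module.body[0]).__name__


kinds = {src: statement_kind(src) for src in samples}
for src, kind in kinds.items():
    print(f"{src!r:22} -> {kind}")
```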

Control Flow Graphs (CFG)

A CFG is a directed graph that models the flow of a program using basic blocks.

Intermediate representation (IR): here, Python bytecode

Basic block: a block of IR with

  • single entry point
  • possibly multiple exit points

Code is directly generated from the basic blocks (with jump targets adjusted based on the output order) by doing a post-order depth-first search on the CFG following the edges.
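The traversal can be sketched on a toy graph. Everything here is hypothetical (the Block class, the block names, the instruction strings); it only illustrates a post-order depth-first search over basic blocks and an emission order obtained by reversing it, for an if/else shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Toy basic block: a name, some instructions, and outgoing edges."""
    name: str
    instructions: List[str]
    successors: List["Block"] = field(default_factory=list)


def postorder(entry):
    """Return the blocks reachable from entry in DFS post-order."""
    seen = set()
    order = []

    def visit(block):
        if block.name in seen:
            return
        seen.add(block.name)
        for nxt in block.successors:
            visit(nxt)
        order.append(block)  # appended only after all successors

    visit(entry)
    return order


# if cond: a = 1  else: a = 2 ; then fall through to the exit block
exit_block = Block("exit", ["RETURN"])
then_block = Block("then", ["a = 1"], [exit_block])
else_block = Block("else", ["a = 2"], [exit_block])
entry = Block("entry", ["test cond"], [then_block, else_block])

post = [b.name for b in postorder(entry)]
layout = list(reversed(post))  # emission order: entry first, exit last
print(post)
print(layout)
```

In the reversed order every block appears before the blocks it can jump forward to, which is what makes it possible to compute jump offsets once the layout is fixed.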

AST to CFG to Bytecode

  1. transforms the AST into Python bytecode with control flow represented by the edges of the CFG.
  2. creates the namespace: variables can be classified as local, free/cell for closures, or global
  3. flattens the CFG into a list and calculates jump offsets: a post-order depth-first search
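The classification in step 2 can be inspected with the standard-library symtable module, which is assumed here to expose the same information the compiler's symbol table pass computes. The sample source and the helper child_table are made up for this example.

```python
import symtable

source = """
counter = 0
def outer():
    total = 0
    def inner():
        return total + counter
    return inner
"""


def child_table(table, name):
    """Return the nested symbol table with the given name."""
    return next(t for t in table.get_children() if t.get_name() == name)


top = symtable.symtable(source, "<demo>", "exec")
outer = child_table(top, "outer")
inner = child_table(outer, "inner")

total_in_outer = outer.lookup("total")    # bound in outer, captured by inner
total_in_inner = inner.lookup("total")    # free variable in the closure
counter_in_inner = inner.lookup("counter")  # resolved at module level

print("outer.total local:", total_in_outer.is_local())
print("inner.total free:", total_in_inner.is_free())
print("inner.counter global:", counter_in_inner.is_global())
```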

Python/compile.c

  • PyAST_CompileObject()
    • PySymtable_BuildObject(): Python/symtable.c
      • symtable_visit_xxx => symbol table
    • compiler_mod()
      • compiler_body(struct compiler *c, asdl_seq *stmts)
        • VISIT(c, stmt, stmt_ty) for stmt_ty in stmts
      • assemble(struct compiler *c, int addNone) => PyCodeObject *co
        • dfs(c, entryblock, &a, nblocks)
        • assemble_jump_offsets(&a, c)
        • Emit code in reverse postorder from dfs: assemble_emit
        • co = makecode(c, &a)

Code Objects

Include/code.h
PyCodeObject
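A code object's fields are readable from Python as co_* attributes, which gives a quick way to look at what the compiler produced. The sketch below compiles a small module and pulls the nested code object of a function out of the module's constants; the function add and the file name "<demo>" are placeholders.

```python
import types

source = "def add(a, b):\n    c = a + b\n    return c\n"
module_code = compile(source, "<demo>", "exec")

# The function body is compiled to its own code object, stored as a constant.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print(func_code.co_name)       # name of the function
print(func_code.co_argcount)   # number of positional arguments
print(func_code.co_varnames)   # arguments first, then other locals
print(len(func_code.co_code))  # size of the raw bytecode in bytes
```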

Python/ceval.c

  • _PyEval_EvalFrameDefault()
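The core of the evaluation function is a loop that fetches one instruction, dispatches on its opcode, and manipulates a value stack. The following is a toy stack machine in that spirit, not a port of ceval.c: the opcode names are borrowed from CPython of that era for flavor, and the (opcode, argument) tuple encoding and the run function are hypothetical.

```python
def run(instructions, consts):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction
    while True:
        opcode, arg = instructions[pc]
        pc += 1
        if opcode == "LOAD_CONST":
            stack.append(consts[arg])
        elif opcode == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# (2 + 3) * 4
consts = (2, 3, 4)
program = [
    ("LOAD_CONST", 0),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 2),
    ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]

result = run(program, consts)
print(result)
```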

Resources about the architecture of CPython

Current references

| Title | Brief | Author | Version |
| --- | --- | --- | --- |
| A guide from parser to objects, observed using GDB | Code walk from Parser, AST, Sym Table and Objects | Louie Lu | 3.7.a0 |
| Green Tree Snakes | The missing Python AST docs | Thomas Kluyver | 3.6 |
| Yet another guided tour of CPython | A guide for how CPython REPL works | Guido van Rossum | 3.5 |
| Python Asynchronous I/O Walkthrough | How CPython async I/O, generator and coroutine works | Philip Guo | 3.5 |
| Coding Patterns for Python Extensions | Reliable patterns of coding Python Extensions in C | Paul Ross | 3.4 |

Historical references

| Title | Brief | Author | Version |
| --- | --- | --- | --- |
| Python’s Innards Series | ceval, objects, pystate and miscellaneous topics | Yaniv Aknin | 3.1 |
| Eli Bendersky’s Python Internals | Objects, Symbol tables and miscellaneous topics | Eli Bendersky | 3.x |
| A guide from parser to objects, observed using Eclipse | Code walk from Parser, AST, Sym Table and Objects | Prashanth Raghu | 2.7.12 |
| CPython internals: A ten-hour codewalk through the Python interpreter source code | Code walk from source code to generators | Philip Guo | 2.7.8 |