Mctrain's Blog

What I learned in IT, as well as thought about life

ChinaSys Notes (2015.6)


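This year's first-half ChinaSys was held in Xiamen, which suited me perfectly. I took the chance to visit my parents, ate a lot of seafood, had a few glasses of baijiu with relatives, made a modest trip to Gulangyu with a few reserved fellow programmers, and also walked around Xiamen University and Baicheng Beach. All in all it was very pleasant. After two days of the conference, though, I felt more and more that communication among programmers is quite limited: once I realized that most of the attendees and talks were in directions different from mine, I did not know what to chat about. So I did not interact much with others over the two days, which is my main regret about this ChinaSys.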

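Enough rambling; on to the main topic. According to the organizers, this ChinaSys had 105 registrations. Looking around, most were familiar faces, from universities and institutes such as Tsinghua, Peking University, ICT (the Institute of Computing Technology, Chinese Academy of Sciences), Shanghai Jiao Tong, Fudan, USTC, HUST, and BIT, and from companies such as MSRA and Baidu. Judging from the conference as a whole, research at the major universities and companies in China is at a high level, with fairly deep exploration in each area. Because of my own limited background, however, I missed many of the key points, so my notes are rather messy. My biggest takeaway is that I should learn more about each area and get a broad picture of the directions in which computing is developing, including the problems, challenges, and main techniques of each, so that I bring back more than superficial impressions from conferences like this.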

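Below are my notes. They contain few details, partly because much of this is work in progress by others and I should not reveal too much, but mostly because I do not actually understand those details myself. I took the notes directly in markdown during the conference, recording the points that impressed me most, and I have simply tidied them up here as a keepsake. The conference had no clearly separated sessions, so the notes follow the order of the talks. Regrettably, for various reasons I missed quite a few talks, so they are not recorded here. This ChinaSys had 28 talks, one keynote by Professor 金海 of HUST, and one panel on "how to do world-class research in China".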


ChinaSys 2015.6

105 registrations


Persistent B+ trees in Non-Volatile Main Memory

陈世敏 from ICT, CAS

problem: a single B+ tree operation can leave multiple inconsistent states in the cache if a crash occurs

existing solution:

  • write-ahead log: 4x the (clflush & mfence) cost
  • shadowing (like RCU?): a B+ tree needs to change 2 pointers, which is not a single atomic update

solution: re-design B+ tree node.

  • unsorted B+ tree node structure with a slot array / bitmap
  • use an empty slot to store a newly inserted entry
  • atomically update slot array & bitmap
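
To make the node redesign above concrete, here is a minimal sketch of an unsorted leaf whose entries only become visible once a bitmap bit is set. It is my own toy illustration, not the design from the talk: the class name `UnsortedLeaf`, its methods, and the slot count are hypothetical, and the cache-line flushes and fences a real NVM implementation needs appear only as comments.

```python
class UnsortedLeaf:
    """Toy leaf node: entries sit in unsorted slots; a bitmap marks the valid ones."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity   # (key, value) pairs, in no particular order
        self.bitmap = 0                  # bit i set => slots[i] holds a valid entry

    def insert(self, key, value):
        # 1. Find an empty slot (its bit is not set in the bitmap).
        for i in range(self.capacity):
            if not (self.bitmap >> i) & 1:
                break
        else:
            raise OverflowError("leaf is full; a real tree would split here")
        # 2. Write the entry into the empty slot. Readers ignore it for now,
        #    because the bitmap still says the slot is empty.
        self.slots[i] = (key, value)
        # (real NVM code would flush the slot's cache line and fence here)
        # 3. Publish the entry with a single word-sized bitmap update.
        self.bitmap |= 1 << i
        # (real NVM code would flush the bitmap and fence here)
        return i

    def visible_entries(self):
        """Entries a reader, or recovery after a crash, would see."""
        return sorted(self.slots[i] for i in range(self.capacity)
                      if (self.bitmap >> i) & 1)


leaf = UnsortedLeaf(capacity=4)
leaf.insert(30, "c")
leaf.insert(10, "a")
leaf.slots[2] = (20, "b")        # simulated crash: slot written, bitmap not updated
print(leaf.visible_entries())    # [(10, 'a'), (30, 'c')]
```

The half-finished insert leaves no trace in the visible state, which is the point of updating the bitmap last.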

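Twin-Load: A Method for Building Asynchronous Memory Extension on a Synchronous Memory Interface

陈明宇 from ICT, CAS

problem: capacity wall (vs. memory wall): expanding capacity is hard (packaging, structure, process technology)

DRAM system capacity = number of channels * chips per channel * capacity per chip

goal: support asynchronous extension without modifying the general-purpose processor: synchronous interface (generality) + asynchronous protocol (scalability)

solution: split one memory access into a prefetch and a read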
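
As a rough illustration of splitting one access into a prefetch and a read, here is a toy software model. Everything in it is an assumption of mine (the `ExtendedMemory` class, the buffer, the idea that the first load returns nothing useful); it only shows the shape of the idea and is not the hardware protocol from the talk.

```python
class ExtendedMemory:
    """Toy model of slow extended memory that sits behind a small buffer."""

    def __init__(self, data):
        self.data = dict(data)   # address -> value stored in the extended memory
        self.buffer = {}         # address -> value already moved into the buffer
        self.log = []            # trace of the steps, for illustration only

    def first_load(self, addr):
        # Acts as a prefetch: moves the data into the buffer. The value
        # returned on the synchronous interface is not meaningful yet.
        self.buffer[addr] = self.data[addr]
        self.log.append(("prefetch", addr))
        return None

    def second_load(self, addr):
        # Acts as the real read: the data is expected to be buffered by now.
        self.log.append(("read", addr))
        return self.buffer.pop(addr)


def twin_load(mem, addr):
    """Issue one logical memory access as a prefetch followed by a read."""
    mem.first_load(addr)
    return mem.second_load(addr)


mem = ExtendedMemory({0x10: 42})
print(twin_load(mem, 0x10))   # 42
print(mem.log)                # [('prefetch', 16), ('read', 16)]
```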


RecFS: Building Reliable and Efficient Cloud Storage Services with File System in User Space

杨智 from Peking University

background: storage synchronization approaches (Dropbox?): inotify + rsync, checksum matching

problem: high cost + inconsistency

solution: use a user-space file system (FUSE) to intercept write operations and obtain the relevant information (relation table).
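
The sketch below shows why seeing each write directly helps: the layer that handles the write already knows which byte ranges changed, so nothing has to be rediscovered later by checksum matching. It is a plain in-memory stand-in and does not use a real FUSE binding; the `WriteRecorder` class and the `(path, offset, length)` layout of the table are my assumptions, not the RecFS design.

```python
class WriteRecorder:
    """Toy stand-in for a user-space file system layer that sees every write."""

    def __init__(self):
        self.files = {}       # path -> bytearray with the file contents
        self.relations = []   # hypothetical relation table: (path, offset, length)

    def write(self, path, offset, data):
        buf = self.files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))   # pad a sparse region
        buf[offset:offset + len(data)] = data
        # Record what changed at the moment the write is intercepted.
        self.relations.append((path, offset, len(data)))

    def pending_updates(self):
        """Return the changed byte ranges with their current contents, then reset."""
        updates = [(p, off, bytes(self.files[p][off:off + n]))
                   for p, off, n in self.relations]
        self.relations.clear()
        return updates


fs = WriteRecorder()
fs.write("/a.txt", 0, b"hello world")
fs.write("/a.txt", 6, b"WORLD")
print(fs.pending_updates())
# [('/a.txt', 0, b'hello WORLD'), ('/a.txt', 6, b'WORLD')]
```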


Computational Memory Architecture

王颖 from ICT, CAS

Processing in Memory (PIM): compute directly in memory (NDC, NDA …), integrating general-purpose processors, stream processors, etc. into the memory.

problem: PIM is coming back (enabling techniques + demanding apps).

proposal: computational memory (ProPRAM)

  • in-memory computation application
  • CMOS-compatible memory technique

Reuse the resources already inside the memory, with no need to integrate new processors and the like: an in-memory accelerator.


GraM: Scaling Graph Computation to the Trillions

杨凡 from MSRA

background: graph engine, large graph computing

GraM: graph engine - focus on Scalability and Efficiency

design:

  • simple model - message passing
  • multi-core aware RDMA stack

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning

朱晓伟 from Tsinghua University

background: out-of-core processing: use disk, and guarantee locality by partitioning.

insight: if we can guarantee the locality of both source (gather) and destination (scatter) vertices, we are able to merge the 2 phases into 1!

design: 2 phases -> 1 phase
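
Going by the talk title, the locality for both endpoints comes from a grid partition of the edges. Here is a minimal in-memory sketch of that idea as I understood it: edges go into a p x p grid of blocks by (source chunk, destination chunk), and one pass over the blocks applies updates directly to the destinations. The function names and the toy "sum of in-neighbours" computation are mine, and the real system streams blocks from disk, which this sketch does not do.

```python
def build_grid(edges, num_vertices, p):
    """Put edge (src, dst) into block (src chunk, dst chunk) of a p x p grid."""
    chunk = (num_vertices + p - 1) // p          # vertices per chunk
    grid = [[[] for _ in range(p)] for _ in range(p)]
    for src, dst in edges:
        grid[src // chunk][dst // chunk].append((src, dst))
    return grid


def propagate_sum(grid, values):
    """One pass: every vertex receives the sum of its in-neighbours' values.

    Blocks are visited column by column. While one column is processed, all
    writes fall into a single destination chunk, and within one block all
    reads fall into a single source chunk, so updates can be applied to the
    destinations directly instead of in a separate phase.
    """
    p = len(grid)
    result = [0] * len(values)
    for j in range(p):            # destination chunk
        for i in range(p):        # source chunk
            for src, dst in grid[i][j]:
                result[dst] += values[src]
    return result


edges = [(0, 1), (0, 2), (1, 2), (3, 0), (2, 3)]
grid = build_grid(edges, num_vertices=4, p=2)
print(propagate_sum(grid, [1, 10, 100, 1000]))   # [1000, 1, 11, 100]
```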


Hardware Isolation is coming, What’s Next for System Software?

徐天妮 from ICT, CAS

problem: sharing causes interference, … isolating programs from each other on a shared server is hard.

insight: a computer is inherently a network, so network design techniques (tagging) can be applied to the system.

PARD: Programmable Architecture for Resourcing-on-Demand

challenge:

  • how hardware differentiates application requests: tagging each app
  • how to design a control plane for a diversity of apps: table + programming interface + interrupt line

like full-system SRIOV (via tagging in hardware)
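
A small software analogy for the "tag + table + programming interface + interrupt line" combination, under my own assumptions: every request carries a tag, a per-tag table holds the policy, software programs the table, and a callback stands in for the interrupt line. The `ControlPlane` class and the quota policy are hypothetical and only illustrate the structure, not PARD's actual hardware.

```python
class ControlPlane:
    """Toy control plane: a table keyed by tag, plus a programming interface."""

    def __init__(self, on_interrupt):
        self.table = {}                    # tag -> {"quota": int, "used": int}
        self.on_interrupt = on_interrupt   # stands in for the interrupt line

    def program(self, tag, quota):
        # Programming interface: install or change the policy for one tag.
        self.table[tag] = {"quota": quota, "used": 0}

    def handle(self, request):
        # The resource only needs the tag to tell applications apart.
        entry = self.table[request["tag"]]
        if entry["used"] + request["size"] > entry["quota"]:
            self.on_interrupt(request["tag"])   # notify software
            return False
        entry["used"] += request["size"]
        return True


events = []
cp = ControlPlane(on_interrupt=events.append)
cp.program("app-a", quota=100)
cp.program("app-b", quota=10)
print(cp.handle({"tag": "app-a", "size": 60}))   # True
print(cp.handle({"tag": "app-b", "size": 60}))   # False
print(events)                                    # ['app-b']
```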


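Fine-Grained Resource Scheduling for Cloud Gaming

张伟 from HUST

background: video streaming (low concurrency, low resource utilization) & graphics streaming (high requirements on the client device, hard to support across platforms)

workload: game logic + rendering + compression

problem: resource scheduling for cloud gaming

solution:

  • task decoupling: separate logic from rendering
  • fused multi-resource scheduling
  • lightweight workload migration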

Efficient Deterministic Replay with Hardware Virtualization Extensions

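任仕儒 from Peking University

motivation: software-only deterministic replay

Record & replay (R&R) the memory interleaving with HAV (hardware-assisted virtualization) extensions?

When the VM traps (VM exit), record the corresponding accesses through the dirty/accessed bits in the EPT.

truncate chunks using performance counters (BTS)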
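
To picture how accessed/dirty bits can yield a per-chunk record, here is a toy model: page accesses set bits, and at each simulated VM exit the bits are read out and cleared. The `ToyEPT` class and its methods are hypothetical stand-ins of mine; real EPT bits are set by hardware, and the talk's actual recording scheme is not reproduced here.

```python
class ToyEPT:
    """Toy page table that keeps an accessed bit and a dirty bit per page."""

    def __init__(self, num_pages):
        self.accessed = [False] * num_pages
        self.dirty = [False] * num_pages

    def read(self, page):
        self.accessed[page] = True

    def write(self, page):
        self.accessed[page] = True
        self.dirty[page] = True

    def harvest(self):
        """At a simulated VM exit: collect the touched pages, then clear the bits."""
        accessed = {p for p, bit in enumerate(self.accessed) if bit}
        dirty = {p for p, bit in enumerate(self.dirty) if bit}
        self.accessed = [False] * len(self.accessed)
        self.dirty = [False] * len(self.dirty)
        return accessed, dirty


ept = ToyEPT(num_pages=4)
ept.read(0)
ept.write(2)
print(ept.harvest())   # ({0, 2}, {2})
print(ept.harvest())   # (set(), set())
```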


An effective correlation-aware VM placement scheme for reducing SLA violation in data center

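许胜 from ICT, CAS

background: data center power and utilization.

motivation: VM placement algorithm (distributed placement): guarantee SLA performance while reducing the number of physical servers.

solution: constrain the placement capacity of each server and use an SSP-optimized placement algorithm that takes the applications' resource demand characteristics into account.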


Statistics and Analysis of Data Races in the Linux Kernel

石剑君 from BIT

motivation: a summary of Linux kernel data races

approach:

  • sources: Bugzilla, Linux mailing list, changelog
  • classified patterns: use before initialization, use after free, access without sync, access with improper sync

Robust Distributed System Nucleus (rDSN) for Distributed System Study and Research

郭振宇 from MSRA

problem: robustness cannot be achieved at a single point; many research tools fail to be adopted in production.

proposal: come up with a new development framework.

  • need to be able to monitor and manipulate all dependencies and non-determinisms in the system, with good semantic level.
  • well-defined interface for apps, reusable.
  • be practical, do not deviate from existing programming model too far.

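Keynote: Graph Computing

金海 from HUST

Entity association in cyberspace

Where is the data?

  • 1% is Web data: 1/500 is crawlable (the rest cannot be crawled, either for deliberate reasons on the site's side (active blocking) or non-deliberate ones (not following standards, etc.))
  • 99% is non-Web data: human-generated content, QQ, email, Internet of Things…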

Accelerating distributed graph processing with RDMA

高品 from Tsinghua University

motivation: data locality vs. load balance


Efficient Concurrent Search Tree for Epoch-based In-memory Database

张凯源 from Shanghai Jiao Tong University

insight: batch B+ tree node insert, search…

proposal: buffered B+ tree

problem: good insert, but bad search
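
The trade-off in the last line can be seen in a small sketch: inserts only append to a buffer and are merged in one batch at the end of an epoch, while every search has to scan the unmerged buffer first. This is my own illustration, with a sorted array standing in for the B+ tree; the `BufferedIndex` class and `end_epoch` method are hypothetical names, not the structure from the talk.

```python
import bisect


class BufferedIndex:
    """Toy buffered index: a sorted array stands in for the B+ tree."""

    def __init__(self):
        self.keys = []      # sorted keys already merged into the "tree"
        self.vals = []      # vals[i] belongs to keys[i]
        self.buffer = []    # (key, value) pairs inserted in the current epoch

    def insert(self, key, value):
        self.buffer.append((key, value))     # cheap: no tree traversal

    def search(self, key):
        # The newest buffered entry wins, so scan the buffer from the back.
        # This linear scan is what makes search slow while inserts are pending.
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return None

    def end_epoch(self):
        # Apply the whole batch in key order (stable sort keeps insert order per key).
        for k, v in sorted(self.buffer, key=lambda kv: kv[0]):
            i = bisect.bisect_left(self.keys, k)
            if i < len(self.keys) and self.keys[i] == k:
                self.vals[i] = v
            else:
                self.keys.insert(i, k)
                self.vals.insert(i, v)
        self.buffer.clear()


idx = BufferedIndex()
idx.insert(5, "a")
idx.insert(2, "b")
idx.insert(5, "c")
print(idx.search(5))   # c  (found in the buffer)
idx.end_epoch()
print(idx.keys)        # [2, 5]
```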


Toward Optimized Array-based Computing Framework

章明星 from Tsinghua University

background: array-based languages: cannot scale out

motivation: array-based program - (front end) -> array primitive - (back end) ->

an optimizer is missing in the middle

design:

  • distinguish local and distributed data
  • separate computation and communication
  • optimize each locally-computing period

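Accelerating Innovative Design with Open-Source Hardware

刘兴华 from LeMaker

University maker innovation program

Makers vs. DIY

Open-source hardware vs. development boards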


ShiDianNao: Shifting Vision Processing Closer to the Sensor

杜子东 from ICT, CAS

DianNao series: hardware accelerators for neural networks

background: why put an accelerator next to the sensor: power is consumed mainly by memory


Hi-fi Playback: Tolerating Position Errors in Shift Operations of Racetrack Memory

张超 from Peking University

background: Racetrack memory (latency vs. capacity) proposed by IBM

shift position error (the shift stops halfway, or overshoots)


In short, my notes from this ChinaSys are rather shallow; I hope to do better next time.
