style: style improvements

master
JimZhang 2 years ago
parent df18165dfc
commit 17f3063fb6

@ -5,9 +5,15 @@ draft = true
[taxonomies]
tags=["Raft","distributed systems"]
+++
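## Log compaction mechanism
When taking a snapshot, the log must be compacted so that an oversized log does not put heavy pressure on the service.
### Compaction strategies
- When taking a snapshot, delete the data that precedes the previous snapshot
- When taking a snapshot, always retain the last n entries
- The leader computes the minimum next_index across all followers from heartbeats over the last few minutes, then retains the n entries before it
* When taking a snapshot, delete the data that precedes the previous snapshot
* When taking a snapshot, always retain the last n entries
* The leader computes the minimum next_index across all followers from heartbeats over the last few minutes, then retains the n entries before it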
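
The last two strategies can be combined into a single cutoff calculation. The following is a minimal sketch under assumptions, not code from any Raft implementation: the function name `first_index_to_keep`, its parameters, and the follower ids are hypothetical, and log indexes are assumed to start at 1.

```python
def first_index_to_keep(snapshot_index, follower_next_index, keep):
    """Return the smallest log index that should survive compaction.

    snapshot_index: last log index covered by the snapshot being taken.
    follower_next_index: mapping of follower id -> next_index from recent heartbeats.
    keep: number of extra entries (n) to retain before the cutoff.
    """
    # Entries after the snapshot are never discarded, so the cutoff starts there.
    cutoff = snapshot_index + 1
    if follower_next_index:
        # Do not drop entries that the slowest follower still has to receive.
        cutoff = min(cutoff, min(follower_next_index.values()))
    # Retain n more entries before the cutoff; indexes are assumed to start at 1.
    return max(1, cutoff - keep)


followers = {"node-b": 90, "node-c": 120}
start = first_index_to_keep(snapshot_index=100, follower_next_index=followers, keep=10)
```

Everything before `start` could then be truncated. Keeping a margin of n entries means a follower that lags slightly can still catch up from the log instead of needing a full snapshot transfer.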

@ -1,6 +1,7 @@
+++
title = "Hey, this is my blog"
date = 2022-08-25
draft = true
+++
Congratulations!

@ -0,0 +1,11 @@
+++
title = "Google"
date = 2022-12-08
template = "redirect.html"
draft = true
[taxonomies]
tags=["links"]
[extra]
redirect = "https://www.google.co.jp/"
+++
Google

@ -0,0 +1,189 @@
+++
title = "Python async pipelines"
date = 2022-12-09
draft = false
[taxonomies]
tags=["python"]
+++
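I have been using Flink a lot recently, and by comparison, processing large-scale data in Python in the usual `Pythonic` style feels painful. I searched GitHub for Python streaming-pipeline libraries and found [pypeln](https://github.com/cgarciae/pypeln) and [aiostream](https://github.com/vxgmichel/aiostream).
## Using pypeln
> Concurrent data pipelines in Python >>>
pypeln is a concurrent data pipeline library. It is a good fit when Spark, Flink, or Dask feel too heavyweight but processing the data directly is too slow.
### Installation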
```bash
pip install pypeln -i https://pypi.douban.com/simple
```
### Basic usage
```python
import asyncio

## Multi-process mode
# import pypeln.process as operator
## Multi-thread mode
# import pypeln.thread as operator
# Coroutine mode
import pypeln.task as operator


def before_start_hook(database_uri):
    async def wrapper():
        # MongoClient is a placeholder for an async database client (not defined in this snippet)
        return {'database': await MongoClient(database_uri)}
    return wrapper


async def on_done_hook(database):
    await database.close()


async def find_url(data_id, database):
    return await database.get_url_by_id(data_id)


async def mock_http(url):
    # Simulate an HTTP request
    return await asyncio.sleep(3, {'url': url})


async def mock_data_store(doc, database):
    await database.insert_one(doc)


async def mock_data_source():
    for i in range(100):
        yield str(i)


pipes = (
    mock_data_source()
    # on_start injects dependencies into the worker function; on_done is called back when the stage finishes
    | operator.map(find_url, on_start=before_start_hook('data_uri'), on_done=on_done_hook, workers=8, maxsize=8)
    | operator.map(mock_http, maxsize=200, workers=200)
    | operator.each(mock_data_store, on_start=before_start_hook('data_uri'), on_done=on_done_hook, workers=8, maxsize=8)
)
# Run the pipeline
for pipe in pipes:
    pass
```
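### Problems with pypeln
pypeln handles ordinary concurrent tasks well, but it does not implement a buffer operator, so a stream cannot be turned into batches for bulk operations. As a result, writing to a database or to files becomes a bottleneck.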
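
To make the missing piece concrete, here is a minimal sketch of what a size-based buffer does, written with plain `asyncio` rather than pypeln. The names `batched`, `numbers`, and `collect` are hypothetical helpers for illustration only; they are not part of pypeln.

```python
import asyncio


async def batched(source, size):
    """Group items from an async iterable into lists of at most `size` items."""
    batch = []
    async for item in source:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # Flush whatever is left once the source is exhausted.
    if batch:
        yield batch


async def numbers(n):
    for i in range(n):
        yield i


async def collect():
    return [batch async for batch in batched(numbers(7), 3)]


batches = asyncio.run(collect())
```

Each emitted list could then be handed to a bulk write instead of issuing one write per item. This version only flushes by size; it has no timeout, so a slow source can leave a partial batch waiting.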
## aiostream
> Generator-based operators for asynchronous iteration
aiostream is a generator-based asynchronous library. It uses a pull model, so back-pressure comes naturally.
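
The pull model can be illustrated with plain async generators, without aiostream itself. This is a small sketch with hypothetical names (`source`, `slow_consumer`, `events`): the producer only runs when the consumer asks for the next item, so a slow consumer automatically slows the producer down.

```python
import asyncio

events = []


async def source(n):
    for i in range(n):
        # This line only runs when the consumer requests the next item.
        events.append(f"produce {i}")
        yield i


async def slow_consumer():
    async for item in source(3):
        events.append(f"consume {item}")
        # Simulate slow downstream work; the producer stays paused meanwhile.
        await asyncio.sleep(0.01)


asyncio.run(slow_consumer())
```

After the run, `events` alternates strictly between produce and consume entries, which shows that nothing is produced ahead of demand.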
### Installation
```bash
pip install aiostream -i https://pypi.douban.com/simple
```
### Basic usage
```python
import asyncio
from aiostream import stream, pipe


async def mock_http(url):
    # Simulate an HTTP request
    return await asyncio.sleep(3, {'url': url})


async def mock_data_source():
    for i in range(100):
        yield str(i)


async def main():
    # get_database() is a placeholder for an async context manager that yields a database client
    async with get_database() as database:
        async def find_url(data_id):
            return await database.get_url_by_id(data_id)

        async def mock_data_store(docs):
            await database.insert_many(docs)

        # Operators are chained through `pipe`; awaiting the stream runs it to completion.
        # timeout_buffer is a custom operator, not part of aiostream.
        await (
            stream.iterate(mock_data_source())
            | pipe.map(find_url, task_limit=5)
            | pipe.map(mock_http, task_limit=5)
            | pipe.timeout_buffer(100, 3)
            | pipe.map(mock_data_store, task_limit=2)
        )


asyncio.run(main())
```
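The `timeout_buffer` operator used in the example above is not implemented by the library itself. The following sample is based on one the author gave in a GitHub issue: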
```python
from contextlib import asynccontextmanager
import asyncio
from aiostream import pipe, operator, streamcontext


@asynccontextmanager
async def buffer(streamer, size=1):
    queue = asyncio.Queue(maxsize=size)
    sentinel = object()

    async def consume():
        try:
            async for item in streamer:
                await queue.put(item)
        finally:
            await queue.put(sentinel)

    @operator
    async def wrapper():
        while True:
            item = await queue.get()
            if item is sentinel:
                await future
                return
            yield item

    future = asyncio.ensure_future(consume())
    try:
        yield wrapper()
    finally:
        future.cancel()


@operator(pipable=True)
async def catch(source, exc_cls):
    async with streamcontext(source) as streamer:
        try:
            async for item in streamer:
                yield item
        except exc_cls:
            return


@operator(pipable=True)
async def chunks(source, n, timeout):
    async with streamcontext(source) as streamer:
        async with buffer(streamer) as buffered:
            async with streamcontext(buffered) as first_streamer:
                async for first in first_streamer:
                    tail = await (
                        buffered
                        | pipe.timeout(timeout)
                        | catch.pipe(asyncio.TimeoutError)
                        | pipe.take(n - 1)
                        | pipe.list()
                    )
                    yield [first, *tail]


pipe.timeout_buffer = chunks.pipe
```
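### Problems with aiostream
With a pull model, grouping and splitting streams is cumbersome to implement: all the branches have to be brought back together with the `merge` operator and then run by awaiting the result. [RxPY](https://github.com/ReactiveX/RxPY) is a good alternative that uses a push model, but since 3.x back-pressure is no longer officially maintained, and the `reactivex` concepts are hard to understand, so I had to give up on it.
## Applications
Besides stream processing, aiostream is also particularly well suited to web crawlers. After refactoring with aiostream, a crawler's overall structure is clearer, which suits crawlers that need long-term maintenance. Thanks to the performance of Python's async I/O, both resource utilization and crawl throughput improved somewhat.
Some internal projects at my company previously used `scrapy`, but 99% of its features went unused: `scrapy` served only as a fetcher and scheduler, with results written to the database through a pipeline. This year most crawler projects moved onto a k8s cluster for maintenance and no longer rely on scrapy's process supervision, web dashboard, and similar features. So I wrote a simplified version of `scrapy` that is compatible with part of the scrapy API. Every internal crawler that uses scrapy can switch over by swapping the dependency, with no code changes.
Going forward, I am considering using aiostream to rebuild this as an asynchronous scrapy-compatible framework, to reduce the projects' memory and CPU usage.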

@ -0,0 +1,98 @@
+++
title = "Calling APIs with ResteasyClient"
date = 2020-04-08
draft = false
[taxonomies]
tags=["Java"]
+++
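Moving from OkHttp to ResteasyClient
## OkHttp
I used to access API services with OkHttpClient. Handling parameters was cumbersome and required manual encoding:
- Query parameters must be encoded by hand, or the full URL must be built with a URI builder
- The request body needs an explicit type; an object cannot be passed directly as the parameter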
## Retrofit
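When I took over an Android project from an outsourcing company, I came across Retrofit, a framework that uses dynamic proxies to turn Java interfaces into network requests. Its main benefit is decoupling: the API definition is separated from its use. For a programmer, laziness is the primary productive force, so such a convenient way of calling APIs had to be integrated into the project. A quick Google search showed that ResteasyClient also supports calling APIs in proxy mode, so I mercilessly dropped Retrofit. After all, the project already used Keycloak for authentication and authorization, and Keycloak ships with ResteasyClient.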
## ResteasyClient
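Resteasy is a lightweight implementation of the JAX-RS specification, which provides a standard Java API definition for RESTful web services over HTTP.
ResteasyClient is an HTTP client provided by Resteasy for consuming Resteasy APIs. The project uses Maven for dependency management.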
### 1. pom.xml
```xml
<properties>
    <resteasy.version>3.9.1.Final</resteasy.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.jboss.resteasy</groupId>
        <artifactId>resteasy-client</artifactId>
        <version>${resteasy.version}</version>
    </dependency>
    ...
</dependencies>
```
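This is the resteasy-client version bundled with Keycloak. The latest version can be used as well, although the way the Client is built differs slightly in it.
### 2. Code
The client code mainly involves three classes: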
- **Client**
- **WebTarget**
- **Response**
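A WebTarget instance is created by the Client.
Each WebTarget corresponds to one base URL, and a project can have many WebTargets.
There are two ways to create a Client instance:
- via *org.jboss.resteasy.client.ClientRequest*
- via the *ResteasyClientBuilder* class
Here the client is built the second way; the whole point of using ResteasyClient is to be lazy.
### 3. API interface definition
Define the endpoint to request: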
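```java
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;

public interface AddressInterface {
    @GET
    @Path("/spider/geo")
    List<LocationResult> getLocation(@QueryParam("addresses") String addresses);
}
```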
The LocationResult class:
```java
import lombok.Data;

// @Data generates the getter, setter, equals, hashCode, and toString methods
@Data
public class LocationResult {
    private String location;
}
```
Lombok is used here to simplify the code.
### 4. Calling the API
```java
ResteasyClient client = new ResteasyClientBuilder().build();
ResteasyWebTarget target = client.target(UriBuilder.fromPath("https://127.0.0.1:8080"));
// The proxy implements AddressInterface, so it is declared with that type
AddressInterface proxy = target.proxy(AddressInterface.class);
List<LocationResult> results = proxy.getLocation("厦门市思明区塔埔东路169号2层201单元L室");
```
It's that simple.

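@ -9,7 +9,7 @@ The jina framework communicates over gRPC; when deployed to k8s, exposing the service externally requires configuring
k3s uses Traefik Ingress by default. A reference YAML configuration follows: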
```YAML
```yaml
---
# Service

File diff suppressed because one or more lines are too long

@ -9,6 +9,7 @@
--primary-color: #ef5350;
--hover-color: white;
--footer-padding: 1rem;
}
// -------------- THEME SWITCHER -------------- //
@mixin dark-appearance {
@ -85,9 +86,14 @@ li {
min-height: calc(100vh - 8rem);
max-width: 70ch;
margin: 2rem auto;
padding: 2rem 2rem;
padding: 2rem 2rem 0 2rem;
.list {
min-height: 72vh;
}
}
.discus {
width: 100%;
}
hr {
margin: 2rem 0;
text-align: center;
@ -323,10 +329,10 @@ button:hover {
}
footer {
bottom: 0;
position: absolute;
left: 0;
right: 0;
// bottom: 0;
// position: absolute;
// left: 0;
// right: 0;
display: flex;
align-items: center;
flex-direction: column;
@ -360,11 +366,41 @@ footer {
}
}
.nav {
width: 20%;
height: 1.5em;
min-width: max-content;
margin-left: auto;
margin-right: auto;
text-align: justify;
-ms-text-justify: distribute-all-lines;
text-justify: distribute-all-lines;
a, div {
vertical-align: middle;
display: inline-block;
*display: inline;
}
img {
height: 1.5em;
}
}
.stretch {
width: 100%;
display: inline-block;
font-size: 0;
line-height: 0;
}
article {
margin-bottom: 2rem;
.body {
word-wrap: break-word;
min-height: 70vh;
}
}
.pagination {

@ -1,6 +1,6 @@
{% import "macros/macros.html" as post_macros %}
<!DOCTYPE html>
<html>
<html lang="zh">
{% include "partials/header.html" %}

@ -1,6 +1,6 @@
{% macro list_posts(pages, tag_name=false) %}
<ul>
{%- if current_path =="/" %}
{%- if current_path =="/" %}
<li>
{% set url = get_url(path="about") %}
<a href={{ url }}>about me</a>
@ -25,9 +25,9 @@
</li>
{% endfor -%}
{% set pages_len = pages | length %}
{%- if current_path =="/" and pages_len > 7 %}
{%- if current_path =="/" %}
<li>
<a href="/posts">more posts...</a>
<a href="/posts">more posts</a>
</li>
{% endif -%}
</ul>
@ -61,7 +61,9 @@
<h1 >
{{ title }}
</h1>
{% endmacro content %}
{% endmacro page_header %}
{% macro content(page) %}
<main>
@ -70,6 +72,12 @@
{% if time < 1 %}
{% set time = 1 %}
{% endif %}
{% if page.lower %}
{% set_global previous = page.lower %}
{% endif %}
{% if page.higher %}
{% set_global next = page.higher %}
{% endif %}
<article>
<div class="title">
{#<h1 class="title">{{ page.title }}</h1>#}
@ -141,7 +149,23 @@
{{ page.content | safe }}
</section>
{% if previous or next %}
<div class="nav" style="width: 100%;">
{% if next %}
<a href="{{ next.permalink }}">⇦ {{ next.title|truncate(length=16,end="...") }}</a>
{% else %}
<a></a>
{% endif %}
{% if previous%}
<a href="{{ previous.permalink }}">{{ previous.title|truncate(length=16,end="...") }} ⇨</a>
{% else %}
<a></a>
{% endif %}
<span class="stretch"></span>
</div>
{% endif %}
</article>
</main>

@ -1,4 +1,24 @@
{% extends "base.html" %}
{% block main_content %}
{{ post_macros::content(page=page)}}
{# <script src="https://zzl221000.github.io/js/client.js" #}
<script src="https://giscus.vercel.app/client.js"
data-repo="zzl221000/zzl221000.github.io"
data-repo-id="MDEwOlJlcG9zaXRvcnkyMDQyODcxMjY="
data-category="Comments"
data-category-id="DIC_kwDODC0sls4CTRjL"
data-mapping="specific"
data-strict="0"
data-reactions-enabled="1"
data-emit-metadata="0"
data-input-position="top"
data-theme="light"
data-lang="zh-CN"
data-loading="lazy"
crossorigin="anonymous"
async>
</script>
<div class="giscus"></div>
{% endblock main_content %}

@ -14,7 +14,7 @@
</title>
{% else %}
<title>
{{ page.title | default(value=config.title) | default(value="Post") }}
{{ page.title | default(value=config.title) | default(value="Posts") |safe}}
</title>
{% endif %}

@ -5,8 +5,10 @@
<h1>{{ config.title }}</h1>
<main class="list">
{%- if paginator %}
{%- set show_pages = paginator.pages -%}
{% else %}
{% set section = get_section(path="posts/_index.md") %}
{%- set show_pages = section.pages -%}
{% endif -%}

@ -0,0 +1,22 @@
<!DOCTYPE html>
<html lang="{{ page.lang | default(value="en") }}">
<head>
<meta charset="utf-8" />
<meta http-equiv="refresh" content="1;url={{ page.extra.redirect | safe }}"/>
<link rel="canonical" href="{{ page.extra.redirect | safe }}"/>
<script type="text/javascript">
window.location.href = "{{ page.extra.redirect | safe }}"
</script>
<title>Page Redirection</title>
{# Favicon #}
{% if config.extra.favicon %}
<link rel="icon" type="image/png" href={{ config.extra.favicon }} />
{% endif %}
<link rel="stylesheet" type="text/css" media="screen" href={{ get_url(path="no-style-please.css") }} />
</head>
<body>
If you are not redirected automatically, follow <a href='{{ page.extra.redirect | safe }}'>this link</a>.
</body>
</html>