Posted under 资质荣誉 on 2019-09-25 08:30

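Location and add/delete/modify operations, and Python JSON data retrieval and location

JSON data retrieval and location with the jsonPath library

A Python implementation of (simplified) jsonPath location and add/delete/modify operations on JSON data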

by:授客 QQ:1033553122


Test environment

win7 64

Python 3.4.0

jsonpath_ng-1.4.3-py2.py3-none-any.whl

Download address:

Code

# -*- encoding: utf-8 -*-

Usage in detail

# author: 授客

Official examples

>>> from jsonpath_ng import jsonpath, parse

>>> jsonpath_expr = parse('foo[*].baz')

# Extract the matched values

>>> [match.value for match in jsonpath_expr.find({'foo':[{'baz':1}, {'baz':2}]})]

[1, 2]

# Get the path of each matched value

>>> [str(match.full_path) for match in jsonpath_expr.find({'foo': [{'baz': 1}, {'baz': 2}]})]

['foo.[0].baz', 'foo.[1].baz']

# Automatically provided ids

>>> [match.value for match in parse('foo[*].id').find({'foo': [{'id': 'bizzle'}, {'baz': 3}]})]

['bizzle']

>>> jsonpath.auto_id_field = 'id'

>>> [match.value for match in parse('foo[*].id').find({'foo': [{'id': 'bizzle'}, {'baz': 3}]})]

['foo.bizzle', 'foo.[1]']

# One of the extensions: the named operator `parent`

>>> [match.value for match in parse('a.*.b.`parent`.c').find({'a': {'x': {'b': 1, 'c': 'number one'}, 'y': {'b': 2, 'c': 'number two'}}})]

['number one', 'number two']

# The order may instead be ['number two', 'number one'], because dict ordering is not guaranteed on Python 3.4

Using the extended parser

The advantage is access to more powerful extensions

>>> from jsonpath_ng.ext import parse

>>> jsonpath_expr = parse('foo[*].baz')

import re

jsonpath syntax

Basic syntax (atomic expressions)

$ The root object

`this` The current object

`foo` More generally, this syntax allows "named operators" to extend JSONPath in arbitrary ways

field A specific field

[ field ] Same as field

[ idx ] Array access, described below (this is always unambiguous with field access)

def parse_sub_expr(sub_expr):

Examples

    '''

Get the root object

>>> parse('$').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})

[DatumInContext(value={'key2': {'id': 2}, 'key3': [{'id': 3}, {'name': 'shouke'}], 'key1': {'id': 1}}, path=Root(), context=None)]

>>> [match.value for match in parse('$').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[{'key2': {'id': 2}, 'key3': [{'id': 3}, {'name': 'shouke'}], 'key1': {'id': 1}}]

    Parse a sub-expression, i.e. one component of an element path

Get the value of a first-level key

>>> [match.value for match in parse('key1').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[{'id': 1}]

>>> [match.value for match in parse('[key1]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[{'id': 1}]

# Note: when the field or [field] syntax is used on its own, field only matches first-level keys of the dictionary

[{'key2': {'id': 2}, 'key3': [{'id': 3}, {'name': 'shouke'}], 'key1': {'id': 1}}]

>>> [match.value for match in parse(...).find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[]

# Note: when the field or [field] syntax is used on its own, the root object must be a dictionary, not an array

>>> [match.value for match in parse(...).find([{'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]}])]

[]

    :param sub_expr:

Array access

>>> [match.value for match in parse(...).find([{'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]}])]

[{'key2': {'id': 2}, 'key3': [{'id': 3}, {'name': 'shouke'}], 'key1': {'id': 1}}]

    :return:

jsonpath operators

jsonpath1 . jsonpath2 All nodes matched by jsonpath2 starting at any node matching jsonpath1. Note: only applicable to dictionaries

Note: whitespace makes no difference, so jsonpath1.jsonpath2 is equivalent; the same applies below

jsonpath [ whatever ] For a dictionary, same as jsonpath.whatever; for an array, access by index

jsonpath1 .. jsonpath2 All nodes matched by jsonpath2 that descend from any node matching jsonpath1

jsonpath1 where jsonpath2 Any nodes matching jsonpath1 with a direct child matching jsonpath2

jsonpath1 | jsonpath2 The union of all nodes matching jsonpath1 or jsonpath2 (note: the result sometimes seems not to match this description; see the examples)

    '''

Examples

    RIGHT_INDEX_DEFAULT = '200000000'  # default right index, used when none is given, e.g. key[2:], key[:]

jsonpath1 . jsonpath2

>>> [match.value for match in parse('key1.id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[1]

    result = re.findall(r'\[.+\]', sub_expr)

jsonpath [ whatever ]

>>> [match.value for match in parse('key1[id]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[1]

>>> [match.value for match in parse('key3[0]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}]})]

[{'id': 3}]

    if result:  # the sub-expression contains an array index, e.g. [1], key[1], key[1:2], key[2:], key[:3], key[:]

jsonpath1 .. jsonpath2

>>> [match.value for match in parse('key3..id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[3]

>>> [match.value for match in parse('key3..id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[3, 4]
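To make the `..` (descendant) semantics of the two examples above concrete, here is a minimal plain-Python sketch. It is not how jsonpath_ng implements the operator; `find_descendants` is a hypothetical helper introduced only for illustration, and the sample data is trimmed from the examples.

```python
def find_descendants(node, key):
    """Collect the values stored under `key` anywhere below `node` (depth-first)."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the key may appear again deeper down.
            found.extend(find_descendants(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_descendants(item, key))
    return found


# Hypothetical sample data, trimmed from the examples above.
nested = {'key3': {'key4': {'key5': {'id': 3, 'name': 'shouke'}}}}
listed = {'key3': [{'id': 3}, {'name': 'shouke'}, {'id': 4, 'name': 'keshou'}]}

print(find_descendants(nested['key3'], 'id'))  # [3]
print(find_descendants(listed['key3'], 'id'))  # [3, 4]
```

The sketch starts the search at the value of key3, which mirrors the left-hand side of 'key3..id'.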

array_part = result[0]

jsonpath1 where jsonpath2

>>> [match.value for match in parse('key2 where id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'

[{'id': 2}]

Note: the node matching jsonpath2 must be a direct child

>>> [match.value for match in parse('key3 where id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'

[]

>>> [match.value for match in parse('key3 where id').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[]
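The direct-child rule shown by the three `where` examples above can be sketched in plain Python. This is only an illustration of the behaviour seen in those outputs, not jsonpath_ng's implementation; `where_has_child` is a hypothetical name.

```python
def where_has_child(node, child_key):
    """Return [node] if node is a dict that has `child_key` as a direct key, else []."""
    if isinstance(node, dict) and child_key in node:
        return [node]
    return []


# Hypothetical sample data modelled on the examples above.
data = {
    'key2': {'id': 2},
    'key3': [{'id': 3}, {'name': 'shouke'}],
    'key4': {'key5': {'id': 3}},
}

print(where_has_child(data['key2'], 'id'))  # [{'id': 2}]
print(where_has_child(data['key3'], 'id'))  # []  (a list has no direct field)
print(where_has_child(data['key4'], 'id'))  # []  (id is a grandchild, not a child)
```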

array_part = array_part.lstrip('[').rstrip(']')

jsonpath1 | jsonpath2

>>> [match.value for match in parse('key1 | key3 ').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[{'id': 1}, {'key4': {'key5': {'name': 'shouke', 'id': 3}}}]

>>> [match.value for match in parse('key1 | key3.key4 ').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[{'key5': {'name': 'shouke', 'id': 3}}]

key_part = sub_expr[:sub_expr.index('[')]

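Fields

fieldname A field name from the current object

"fieldname" Same as above; quoting allows special characters in the field name

'fieldname' Same as above

* Any field name

field , field Specify multiple fields

        if key_part == '$':  # if the key is $, it is the root: replace it with the data variable name json_data

Examples

            key_part = JSON_DATA_VARNAME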

*

>>> [match.value for match in parse('key1.*').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[1]

Note: jsonpath1.* returns the values of the fields directly under jsonpath1

>>> [match.value for match in parse('key3.*').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':{'key4':{'key5':{'id':3, 'name':'shouke'}}}})]

[{'key5': {'name': 'shouke', 'id': 3}}]

>>> [match.value for match in parse(...).find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[]


>>> [match.value for match in parse(...).find({'root':{'key':[{'id':'tizza'}, {'name':'hizza'}]}})]

[{'id': 'tizza'}, {'name': 'hizza'}]

elif key_part == '*':

field , field Specify multiple fields

>>> [match.value for match in parse('key1, key2, key3').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]

[{'id': 1}, {'id': 2}, [{'id': 3}, {'name': 'shouke'}, {'name': 'keshou', 'id': 4}]]

            key_part = r'\[.+\]'  # if the key is *, replace it with \[.+\] so that it matches ["key1"], ["key2"], ...

Array indexing

[n] Array index

[start?:end?] Same meaning as Python list slicing. Note: the slice excludes end; start, end, or both may be omitted

[*] Any index; returns all elements of the array, equivalent to [:]

        else:

Examples

            key_part = r'\["%s"\]' % key_part

[*]

>>> [match.value for match in parse('key3[*]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[{'id': 3}, {'name': 'shouke'}, {'name': 'keshou', 'id': 4}]

        if array_part == '*':  # if the array index is *, replace it with \[\d+\] so that it matches [0], [1], ...

[start?:end?]

>>> [match.value for match in parse('key3[0:1]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[{'id': 3}]

>>> [match.value for match in parse('key3[0:]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[{'id': 3}, {'name': 'shouke'}, {'name': 'keshou', 'id': 4}]

>>> [match.value for match in parse('key3[:1]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[{'id': 3}]

>>> [match.value for match in parse('key3[:]').find({'key1':{'id': 1}, 'key2':{'id': 2}, 'key3':[{'id':3}, {'name':'shouke'}, {'id':4, 'name':'keshou'}]})]

[{'id': 3}, {'name': 'shouke'}, {'name': 'keshou', 'id': 4}]
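Since the slice syntax is described as having the same meaning as Python slicing, the four slice examples above can be cross-checked against ordinary list slices. The snippet below uses only built-in lists; the variable `key3` simply holds the value of the key3 field from the examples.

```python
# The value stored under 'key3' in the examples above.
key3 = [{'id': 3}, {'name': 'shouke'}, {'id': 4, 'name': 'keshou'}]

print(key3[0:1])  # same elements as 'key3[0:1]': end index is excluded
print(key3[0:])   # same elements as 'key3[0:]': everything from index 0
print(key3[:1])   # same elements as 'key3[:1]'
print(key3[:])    # same elements as 'key3[:]' and 'key3[*]'
```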

For more features, see the official documentation

Reference links

            array_part = r'\[\d+\]'
        else:
            array_part_list = array_part.replace(' ', '').split(':')
            left_index = array_part_list[0:1]
            right_index = array_part_list[1:]

            if left_index:
                left_index = left_index[0]
                if not (left_index and left_index.isdigit()):  # empty string or non-numeric
                    left_index = '0'
            else:
                left_index = '0'

            if right_index:
                right_index = right_index[0]
                if not (right_index and right_index.isdigit()):
                    right_index = RIGHT_INDEX_DEFAULT  # a fairly large value
                array_part = left_index + '-' + right_index
            else:
                array_part = left_index
            # The index pattern becomes \[[n-m]\], intended to match [n], [n+1], ..., [m-1].
            # As a regex character class it actually matches any single digit from n through m.
            array_part = r'\[[%s]\]' % array_part
        return key_part + array_part
    elif sub_expr == '*':
        sub_expr = r'\[.+\]'
    elif sub_expr == '$':
        sub_expr = JSON_DATA_VARNAME
    else:
        sub_expr = r'\["%s"\]' % sub_expr
    return sub_expr
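The idea behind parse_sub_expr is that each dot-separated component of the expression becomes a regex fragment, and the concatenated fragments are matched against subscript-style path strings. The following self-contained sketch shows that idea in reduced form. It is an assumption-laden simplification, not the author's function: `sub_expr_to_regex` and `expr_to_regex` are hypothetical names, and only `key`, `*`, `key[n]`, and `key[*]` are handled (no slices, no `$`).

```python
import re


def sub_expr_to_regex(sub_expr):
    """Translate one path component (key, *, key[n], key[*]) into a regex fragment."""
    match = re.fullmatch(r'([^\[\]]*)\[(\*|\d+)\]', sub_expr)
    if match:
        key, index = match.groups()
        # [*] matches any numeric index; [n] matches exactly that index.
        index_part = r'\[\d+\]' if index == '*' else r'\[%s\]' % index
    else:
        key, index_part = sub_expr, ''
    if key == '*':
        key_part = r'\["[^"]+"\]'  # any quoted key
    elif key:
        key_part = re.escape('["%s"]' % key)  # one specific quoted key
    else:
        key_part = ''
    return key_part + index_part


def expr_to_regex(expr):
    """Join the fragments of all dot-separated components."""
    return ''.join(sub_expr_to_regex(part) for part in expr.split('.'))


pattern = expr_to_regex('data[0].components[*]')
path = 'json_data[0]["data"][0]["components"][1]["enabled"]'
found = re.search(pattern, path)
print(found.group(0))  # ["data"][0]["components"][1]
```

Using re.escape for key names avoids surprises when a key contains regex metacharacters, which is one reason the sketch differs from a plain string substitution.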

def parse_json(json_data, data_struct_link):
    '''
    Recursively walk the JSON data structure and store the path of every element
    :param json_data:
    :param data_struct_link:
    :return:
    '''
    if isinstance(json_data, dict):  # dictionary
        keys_list = json_data.keys()
        for key in keys_list:
            temp_data_struct_link = data_struct_link + '["%s"]' % key
            if not isinstance(json_data[key], (dict, list)):  # the value is neither an array nor a dictionary
                data_struct_list.append(temp_data_struct_link)
            else:
                parse_json(json_data[key], temp_data_struct_link)
    elif isinstance(json_data, list):  # array
        array_length = len(json_data)
        for index in range(0, array_length):
            temp_json_data = json_data[index]
            keys_list = temp_json_data.keys()
            for key in keys_list:
                temp_data_struct_link = data_struct_link + '[%s]["%s"]' % (str(index), key)
                if not isinstance(temp_json_data[key], (dict, list)):  # the value is neither an array nor a dictionary
                    data_struct_list.append(temp_data_struct_link)
                else:
                    parse_json(temp_json_data[key], temp_data_struct_link)
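As a possible variation on parse_json, the same path enumeration can be written as a generator that returns the paths instead of appending to the global data_struct_list. This is a sketch under stated assumptions rather than the author's code: `iter_leaf_paths` is a hypothetical name, and unlike parse_json it also accepts arrays whose elements are not dictionaries.

```python
def iter_leaf_paths(node, prefix='json_data'):
    """Yield a subscript-style path string for every leaf value under `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaf_paths(value, '%s["%s"]' % (prefix, key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaf_paths(value, '%s[%d]' % (prefix, index))
    else:
        # Neither a dictionary nor an array: this is a leaf, so emit its path.
        yield prefix


# Hypothetical sample data with the same shape as the script's json_data.
sample = [{'data': [{'admin': 'a', 'components': [{'enabled': True}]}], 'ok': 'yes'}]
paths = list(iter_leaf_paths(sample))
for path in paths:
    print(path)
```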

if __name__ == '__main__':
    json_data = [{"data": [{
        "admin": "string|cluster owner|||",
        "components": [
            {
                "clusterId": "integer|id of the cluster the component belongs to|||",
                "createTime": "string|component creation time|||",
                "description": "string|component description|||",
                "enabled": "boolean|whether the component is enabled||false|",
            },
            {
                "clusterId": "integer|id of the cluster the component belongs to|||",
                "createTime": "string|component creation time|||",
                "description": "string|component description|||",
                "enabled": "boolean|whether the component is enabled||false|",
            }
        ],
        "createTime": "string|cluster creation time|||",
        "description": "string|cluster description|||",
        "enabled": "boolean|whether the cluster is enabled||false|",
        "id": "integer|cluster id|||",
        "modifyTime": "string|cluster modification time|||",
        "name": "string|cluster name|||"
    }],
        "errMsg": "string||||",
        "ok": "boolean||||",
        "status": "integer||||"
    }]

    JSON_DATA_VARNAME = 'json_data'  # name of the variable that holds the JSON data
    data_struct_list = []  # holds every JSON element path, e.g. json_data[0]["data"][0]["components"][0]["enabled"]
    data_struct_link = 'json_data'  # temporarily holds (part of) a single JSON element path
    parse_json(json_data, data_struct_link)

    print('JSON element paths and element values:')
    for item in data_struct_list:
        print(item, '\t', eval(item))

    # Expressions used for testing
    # expr = '$.data[*].components[0]'  # the JSON data is a dictionary, of the form {...}
    # expr = '$[*].data[0:1].components[*]'  # the JSON data is an array, of the form [{...}]
    expr = 'data[0:1].components[*]'
    # expr = 'data[0:1].components'

    # Translate the expression into a regular expression
    re_pattern = ''
    for sub_expr in expr.split('.'):
        re_pattern += parse_sub_expr(sub_expr)

    print('\nThe jsonpath expression for the element path is: %s' % expr)
    print('The regular expression (re pattern) for the element path is: %s' % re_pattern)

    print('\njsonpath match results:')
    re_pattern = re.compile(re_pattern)
    target_set = set()  # matches can contain duplicates, so use a set
    for item in data_struct_list:
        results = re.findall(re_pattern, item)
        for result in results:
            print('Matched element path: %s' % item)
            print('Regex match result: %s' % result)
            target = item[0:item.index(result) + len(result)]
            print('jsonpath used to extract the data: %s' % target)
            print('Extracted value: %s \n' % eval(target))
            target_set.add(target)

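    # Use the matched targets to modify the JSON data
    for item in target_set:
        target = eval(item)
        if isinstance(target, dict):  # dictionary
            # Change the value of a key
            target['clusterId'] = 10

            # Add a key-value pair
            target['new_key'] = 'key_value'

            # To rename a key, copy the old value to the new key, then delete the old key
            target['description_new'] = target['description']

            # Delete a key-value pair
            del target['description']
        elif isinstance(target, list):  # array
            # Not implemented yet
            pass

    print(json_data)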
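The script resolves each matched path string with eval. Where eval is undesirable, the same subscript-style path can be followed step by step. The sketch below is one possible alternative, not part of the original script: `resolve_path` and `TOKEN` are hypothetical names, and the path format is assumed to be the one produced above (numeric indexes in brackets, keys in double quotes).

```python
import re

# Matches one step of a path: either [3] or ["key"].
TOKEN = re.compile(r'\[(\d+)\]|\["([^"]+)"\]')


def resolve_path(root, path):
    """Follow a path such as json_data[0]["data"][0] without calling eval()."""
    node = root
    for index, key in TOKEN.findall(path):
        # Exactly one of the two groups is non-empty for each step.
        node = node[int(index)] if index else node[key]
    return node


# Hypothetical, trimmed-down data with the same shape as the script's json_data.
json_data = [{'data': [{'components': [{'clusterId': 1, 'description': 'd'}]}]}]
target = resolve_path(json_data, 'json_data[0]["data"][0]["components"][0]')

target['clusterId'] = 10                                # change a value
target['description_new'] = target.pop('description')  # rename a key
print(json_data)
```

Because the returned object is the same dictionary that lives inside json_data, the modifications are visible in the original structure, just as with the eval-based approach.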

Screenshots of the run results: [Images 1 to 6]
