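Profile the distribution of address modes. When generating the synthetic dataset, produce data according to the modes and weights below. The output is a list of mode-weight pairs: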
[
{"mode": "city-district-road-houseNumber", "weight": 0.074},
{"mode": "city-district-township-road-houseNumber", "weight": 0.069},
{"mode": "city-district-road-houseNumber-room", "weight": 0.047}
]
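As a minimal sketch of how these weights could drive generation, the Python below picks one address mode at random in proportion to its weight, then splits it into field names. The function names `sample_address_mode` and `split_mode` are hypothetical, and the sketch assumes sampling only among the entries listed above (their weights do not sum to 1, so they are treated as relative weights).

```python
import random

# Mode weights copied from the output above (only the entries shown there).
ADDRESS_MODES = [
    {"mode": "city-district-road-houseNumber", "weight": 0.074},
    {"mode": "city-district-township-road-houseNumber", "weight": 0.069},
    {"mode": "city-district-road-houseNumber-room", "weight": 0.047},
]


def sample_address_mode(modes, rng):
    """Pick one mode string with probability proportional to its weight.

    Hypothetical helper: random.choices accepts relative weights,
    so the weights do not need to sum to 1.
    """
    weights = [m["weight"] for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]["mode"]


def split_mode(mode):
    """Split a mode such as 'city-district-road' into its field names."""
    return mode.split("-")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    mode = sample_address_mode(ADDRESS_MODES, rng)
    print(mode, split_mode(mode))
```

Each field name returned by `split_mode` would then be filled in by a per-field generator.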
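Profile the mode distribution inside each field, taking the houseNumber field as an example. The composition of houseNumber (`compose`) is a list in which each element is a dict holding a `mode` and a `weight`. The first composition, a number followed by 号, accounts for about 80%; the second, number-number followed by 号, accounts for about 4.7%; …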
[
{
"name":"houseNumber",
"compose":[
{
"mode":[
"@digit(1-10000)",
"号"
],
"weight":0.803077
},
{
"mode":[
"@digit(1-10000)",
"-",
"@digit(1-10000)",
"号"
],
"weight":0.047179
},
{
"mode":[
"@digit(1-10000)",
"弄"
],
"weight":0.12
},
{
"mode":[
"@digit(1-10000)",
"号、",
"@digit(1-10000)",
"号"
],
"weight":0.002051
}]
}]
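The structure above can be expanded into concrete values roughly as follows. This is a sketch under stated assumptions: the token `@digit(a-b)` is interpreted as "a random integer between a and b, inclusive" (the notes do not define it precisely), every other token is copied literally, and the names `render_token` and `generate_field` are hypothetical. The four weights sum to about 0.972 rather than 1, so they are used as relative weights.

```python
import random
import re

# Field definition copied from the output above.
HOUSE_NUMBER = {
    "name": "houseNumber",
    "compose": [
        {"mode": ["@digit(1-10000)", "号"], "weight": 0.803077},
        {"mode": ["@digit(1-10000)", "-", "@digit(1-10000)", "号"], "weight": 0.047179},
        {"mode": ["@digit(1-10000)", "弄"], "weight": 0.12},
        {"mode": ["@digit(1-10000)", "号、", "@digit(1-10000)", "号"], "weight": 0.002051},
    ],
}

# Assumed placeholder syntax: @digit(low-high), bounds taken as inclusive.
DIGIT_TOKEN = re.compile(r"^@digit\((\d+)-(\d+)\)$")


def render_token(token, rng):
    """Replace an @digit(low-high) placeholder; return literals unchanged."""
    match = DIGIT_TOKEN.match(token)
    if match is None:
        return token
    low, high = int(match.group(1)), int(match.group(2))
    return str(rng.randint(low, high))


def generate_field(field, rng):
    """Choose one composition by weight and concatenate its rendered tokens."""
    options = field["compose"]
    weights = [option["weight"] for option in options]
    chosen = rng.choices(options, weights=weights, k=1)[0]
    return "".join(render_token(token, rng) for token in chosen["mode"])


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run is repeatable
    for _ in range(5):
        print(generate_field(HOUSE_NUMBER, rng))
```

Keeping the placeholder parsing in `render_token` means further placeholder types could be added without touching the weighted selection.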
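人工标注后处理结果.json (post-processed manual annotation results), address字段结果分布.json (distribution of address field modes), address字段内部分布.json (distribution inside each address field)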