9.1 Address Distribution Profiling

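This step profiles how the different address patterns are distributed, so that a synthetic dataset can be generated according to the patterns and weights below. The output is a list of mode-weight pairs: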

[
{"mode": "city-district-road-houseNumber", "weight": 0.074}, 
{"mode": "city-district-township-road-houseNumber", "weight": 0.069},
{"mode": "city-district-road-houseNumber-room", "weight": 0.047}
]
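
To illustrate how such a mode-weight list might drive synthetic data generation, here is a minimal sketch. The function names `sample_modes` and `split_fields` are hypothetical and not part of the original program. The three weights shown do not sum to 1, so the sketch assumes they can be treated as relative weights, which is how `random.choices` interprets them.

```python
import json
import random

# The mode-weight output shown above, embedded as a JSON string.
MODES_JSON = """
[
{"mode": "city-district-road-houseNumber", "weight": 0.074},
{"mode": "city-district-township-road-houseNumber", "weight": 0.069},
{"mode": "city-district-road-houseNumber-room", "weight": 0.047}
]
"""

def sample_modes(modes, k, rng):
    """Draw k address modes; weights are treated as relative, not absolute."""
    names = [m["mode"] for m in modes]
    weights = [m["weight"] for m in modes]
    return rng.choices(names, weights=weights, k=k)

def split_fields(mode):
    """Split a mode string into its ordered field names (assumes '-' separators)."""
    return mode.split("-")

modes = json.loads(MODES_JSON)
rng = random.Random(0)  # fixed seed so a run is reproducible
for mode in sample_modes(modes, 3, rng):
    print(split_fields(mode))
```

Each sampled mode yields the ordered list of fields that the generator would then need to fill in one by one.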

9.2 Field Distribution Profiling

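This step profiles the distribution of patterns inside each individual field. Taking the houseNumber field as an example, its composition (compose) is a list in which every element is a dict holding a mode and a weight. The first pattern, a number followed by 号, accounts for about 80% of values; the second, number-number followed by 号, accounts for about 4.7%; …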

[
    {
        "name":"houseNumber",
        "compose":[
            {
                "mode":[
                    "@digit(1-10000)",
                    "号"
                ],
                "weight":0.803077
            },
            {
                "mode":[
                    "@digit(1-10000)",
                    "-",
                    "@digit(1-10000)",
                    "号"
                ],
                "weight":0.047179
            },
            {
                "mode":[
                    "@digit(1-10000)",
                    "弄"
                ],
                "weight":0.12
            },
            {
                "mode":[
                    "@digit(1-10000)",
                    "号、",
                    "@digit(1-10000)",
                    "号"
                ],
                "weight":0.002051
            }]
}]
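
The following sketch shows one way a field specification like the one above could be expanded into concrete values. It assumes that `@digit(1-10000)` means "a random integer between 1 and 10000, inclusive"; the source does not define the placeholder syntax, so this reading, the regular expression, and the helper names `render_token` and `generate_field` are all hypothetical.

```python
import random
import re

# Assumed placeholder syntax: "@digit(low-high)" stands for a random integer.
DIGIT_TOKEN = re.compile(r"^@digit\((\d+)-(\d+)\)$")

# Two of the compose entries from the houseNumber specification above.
HOUSE_NUMBER_SPEC = {
    "name": "houseNumber",
    "compose": [
        {"mode": ["@digit(1-10000)", "号"], "weight": 0.803077},
        {"mode": ["@digit(1-10000)", "-", "@digit(1-10000)", "号"],
         "weight": 0.047179},
    ],
}

def render_token(token, rng):
    """Expand a placeholder token; literal tokens are returned unchanged."""
    match = DIGIT_TOKEN.match(token)
    if match is None:
        return token
    low, high = int(match.group(1)), int(match.group(2))
    return str(rng.randint(low, high))  # randint includes both endpoints

def generate_field(spec, rng):
    """Pick one compose entry by relative weight and render its tokens."""
    compose = spec["compose"]
    weights = [entry["weight"] for entry in compose]
    chosen = rng.choices(compose, weights=weights, k=1)[0]
    return "".join(render_token(token, rng) for token in chosen["mode"])

rng = random.Random(42)  # fixed seed so a run is reproducible
for _ in range(3):
    print(generate_field(HOUSE_NUMBER_SPEC, rng))
```

Keeping the placeholder expansion in a single function means further token types could be supported later without changing the weighted selection logic.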

9.3 Data Profiling Program