In the age of graph data – such as social networks, business relationship graphs, or knowledge graphs – sharing these datasets for research or application purposes is increasingly common. But what if the structure of a graph itself contains sensitive information? Even without revealing the node contents, simply disclosing the existence of edges can lead to privacy breaches.
Traditional approaches to Differential Privacy (DP) focus on protecting data during model training. In this paper, the authors go a step further: they aim to protect privacy at the moment of graph data publishing. They propose an elegant method based on Gaussian Differential Privacy (GDP) that enables learning the structure of a graph while maintaining strong privacy guarantees.
Problem and Assumptions
- We have real graph data $G$ which should not be shared in raw form.
- We want to generate a synthetic graph $\tilde{G}$ that:
  - preserves statistical properties of $G$,
  - enables model training as if on $G$,
  - satisfies differential privacy with respect to $G$.
Mathematical Background
Differential Privacy
A mechanism $M$ satisfies $(\varepsilon, \delta)$-DP if for any neighboring datasets $D, D'$ differing in one element, and any set of outputs $S$:
$$ \Pr[M(D) \in S] \leq e^{\varepsilon} \Pr[M(D') \in S] + \delta $$
For graphs, neighboring datasets are typically taken to differ in a single edge, so the guarantee protects the existence of individual relationships.
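As a concrete illustration (my example, not taken from the paper), randomized response on a single bit satisfies this definition with $\varepsilon = \ln 3$ and $\delta = 0$ when the true bit is reported with probability 3/4:

```python
import math
import random

def randomized_response(bit: int, p_keep: float = 0.75) -> int:
    """Report the true bit with probability p_keep, otherwise flip it."""
    return bit if random.random() < p_keep else 1 - bit

# Privacy check: for any output, the probability ratio between the two
# possible inputs is at most 0.75 / 0.25 = 3, i.e. epsilon = ln(3), delta = 0.
epsilon = math.log(0.75 / 0.25)
```

The same inequality-based definition applies to graph mechanisms; only the notion of "neighboring" changes.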
Gaussian Differential Privacy (GDP)
GDP expresses privacy through a hypothesis-testing lens: a mechanism $M$ is $\mu$-GDP if distinguishing $D$ from $D'$ based on the output of $M$ is at least as hard as distinguishing $\mathcal{N}(0, 1)$ from $\mathcal{N}(\mu, 1)$ from a single sample:
$$ T(M(D), M(D')) \geq G_\mu := T(\mathcal{N}(0, 1), \mathcal{N}(\mu, 1)) $$
where $T$ denotes the trade-off function between type I and type II errors. Smaller $\mu$ means stronger privacy.
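A standard way to achieve $\mu$-GDP (due to Dong, Roth, and Su) is the Gaussian mechanism: add $\mathcal{N}(0, (\Delta/\mu)^2)$ noise to a statistic with sensitivity $\Delta$. A minimal sketch, using a noisy edge count as the released statistic:

```python
import numpy as np

def gaussian_mechanism(value: float, sensitivity: float, mu: float,
                       rng: np.random.Generator) -> float:
    """Release value + N(0, (sensitivity / mu)^2) noise, which is mu-GDP."""
    sigma = sensitivity / mu
    return value + rng.normal(0.0, sigma)

# Example: release the edge count of a graph. Adding or removing one
# edge changes the count by exactly 1, so the sensitivity is 1.
rng = np.random.default_rng(0)
noisy_count = gaussian_mechanism(42.0, sensitivity=1.0, mu=0.5, rng=rng)
```

Lowering `mu` widens the noise (here $\sigma = 1/\mu$), trading accuracy for privacy.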
Parameter Estimation
$$ \ell(\theta; G) = \sum_{(i,j)} y_{ij} \log \sigma(\theta_{ij}) + (1 - y_{ij}) \log(1 - \sigma(\theta_{ij})) $$
where $y_{ij} \in \{0, 1\}$ indicates whether edge $(i,j)$ is present in $G$, and $\sigma$ is the sigmoid function mapping the logit $\theta_{ij}$ to an edge probability.
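To make the estimation step concrete, here is a minimal sketch (my simplification, not the paper's model) in which all edges share a single logit $\theta$, i.e. an Erdős–Rényi model; the likelihood above is then maximized at the logit of the observed edge density:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(theta, y):
    """Bernoulli log-likelihood of edge indicators y under logits theta."""
    p = sigmoid(theta)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Ten possible edges, three of them present: edge density 0.3.
y = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0, 1], dtype=float)
theta_hat = np.log(0.3 / 0.7)  # logit of the density maximizes the likelihood
```

Richer models give each pair $(i, j)$ its own logit $\theta_{ij}$, but the likelihood has the same form.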
Algorithm (briefly)
- Input graph $G$
- Choose probabilistic model of graph
- Estimate $\theta$ with Gaussian noise
- Generate synthetic graph $\tilde{G} \sim P_\theta$
- Publish $\tilde{G}$
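The steps above can be sketched end to end. This toy version (my illustration, not the authors' implementation) uses an Erdős–Rényi model whose single sufficient statistic, the edge count, is privatized with the Gaussian mechanism before the synthetic graph is sampled:

```python
import numpy as np

def publish_synthetic_graph(adj: np.ndarray, mu: float,
                            rng: np.random.Generator) -> np.ndarray:
    """Privately fit an Erdos-Renyi model to adj and sample a synthetic graph."""
    n = adj.shape[0]
    m = n * (n - 1) // 2                 # number of possible undirected edges
    iu = np.triu_indices(n, k=1)
    edge_count = adj[iu].sum()
    # One edge changes the count by 1 (sensitivity 1), so adding
    # N(0, (1 / mu)^2) noise makes the released count mu-GDP.
    noisy_count = edge_count + rng.normal(0.0, 1.0 / mu)
    p_hat = float(np.clip(noisy_count / m, 0.0, 1.0))
    # Sample the synthetic graph from the privately estimated model.
    edges = rng.random(m) < p_hat
    synth = np.zeros_like(adj)
    synth[iu] = edges
    return synth + synth.T
```

Everything downstream of the noisy count is post-processing, so publishing $\tilde{G}$ inherits the $\mu$-GDP guarantee of the estimate.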
Experimental Results
On citation datasets such as Cora and Citeseer:
- Statistical similarity preserved
- Trained models performed well
- Good utility retained even at low $\mu$, i.e. under strong privacy
Conclusion
The GDP-based method allows:
- protecting node relationships,
- generating realistic synthetic graphs,
- training high-performing models.
An important step toward privacy-preserving graph data sharing.
📚 Link
Based on the publication arXiv:2507.19116.